Date: Tue, 15 Dec 2020 10:22:46 -0800
From: Jakub Kicinski
To: "Paul E. McKenney"
Cc: Naresh Kamboju, open list, linux-stable, rcu@vger.kernel.org, Linux ARM, lkft-triage@lists.linaro.org, Netdev, Greg Kroah-Hartman, Sasha Levin, Peter Zijlstra, Thomas Gleixner, Matthew Wilcox
Subject: Re: [stabe-rc 5.9 ] sched: core.c:7270 Illegal context switch in RCU-bh read-side critical section!
Message-ID: <20201215102246.4bdca3d8@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
In-Reply-To: <20201215144531.GZ2657@paulmck-ThinkPad-P72>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 15 Dec 2020 06:45:31 -0800 Paul E. McKenney wrote:
> > Crash log:
> > --------------
> > # selftests: bpf: test_tc_edt.sh
> > [ 503.796362]
> > [ 503.797960] =============================
> > [ 503.802131] WARNING: suspicious RCU usage
> > [ 503.806232] 5.9.15-rc1 #1 Tainted: G W
> > [ 503.811358] -----------------------------
> > [ 503.815444] /usr/src/kernel/kernel/sched/core.c:7270 Illegal
> > context switch in RCU-bh read-side critical section!
> > [ 503.825858]
> > [ 503.825858] other info that might help us debug this:
> > [ 503.825858]
> > [ 503.833998]
> > [ 503.833998] rcu_scheduler_active = 2, debug_locks = 1
> > [ 503.840981] 3 locks held by kworker/u12:1/157:
> > [ 503.845514] #0: ffff0009754ed538
> > ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work+0x208/0x768
> > [ 503.855048] #1: ffff800013e63df0 (net_cleanup_work){+.+.}-{0:0},
> > at: process_one_work+0x208/0x768
> > [ 503.864201] #2: ffff8000129fe3f0 (pernet_ops_rwsem){++++}-{3:3},
> > at: cleanup_net+0x64/0x3b8
> > [ 503.872786]
> > [ 503.872786] stack backtrace:
> > [ 503.877229] CPU: 1 PID: 157 Comm: kworker/u12:1 Tainted: G W
> > 5.9.15-rc1 #1
> > [ 503.885433] Hardware name: ARM Juno development board (r2) (DT)
> > [ 503.891382] Workqueue: netns cleanup_net
> > [ 503.895324] Call trace:
> > [ 503.897786] dump_backtrace+0x0/0x1f8
> > [ 503.901464] show_stack+0x2c/0x38
> > [ 503.904796] dump_stack+0xec/0x158
> > [ 503.908215] lockdep_rcu_suspicious+0xd4/0xf8
> > [ 503.912591] ___might_sleep+0x1e4/0x208
>
> You really are forbidden to invoke ___might_sleep() while in a BH-disable
> region of code, whether due to rcu_read_lock_bh(), local_bh_disable(),
> or whatever else.
>
> I do see the cond_resched() in inet_twsk_purge(), but I don't immediately
> see a BH-disable region of code. Maybe someone more familiar with this
> code would have some ideas.
>
> Or you could place checks for being in a BH-disable further up in
> the code. Or build with CONFIG_DEBUG_INFO=y to allow more precise
> interpretation of this stack trace.

My money would be on the option that whatever ran on this workqueue
before forgot to re-enable BH, but we already have a check for that...

Naresh, do you have the full log? Is there nothing like "BUG: workqueue
leaked lock" above the splat?
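
In case it helps with digging through the full log, here is a rough sketch of a script for that check. It only looks for the two message strings quoted in this thread and collects any "BUG: workqueue leaked lock" lines that appear before the first "WARNING: suspicious RCU usage" line. The function name leaks_before_splat is made up for illustration, and the script assumes the log is plain text with one kernel message per line.

```python
# Sketch only: scan a saved kernel log for a workqueue leak report that
# precedes the RCU splat. The two marker strings are taken from the thread;
# the function name is hypothetical.

SPLAT = "WARNING: suspicious RCU usage"
LEAK = "BUG: workqueue leaked lock"


def leaks_before_splat(log_text):
    """Return the LEAK lines that occur before the first SPLAT line.

    If the log contains no SPLAT line at all, every LEAK line in the log
    is returned.
    """
    hits = []
    for line in log_text.splitlines():
        if SPLAT in line:
            break  # stop at the first splat; later lines are not relevant
        if LEAK in line:
            hits.append(line.strip())
    return hits
```

Something like leaks_before_splat(open(path).read()) on the saved console log would then return an empty list if there was no earlier leak report, or the offending lines if there was one.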