Date: Tue, 24 Mar 2020 08:12:23 -0700
From: "Paul E. McKenney"
To: "Li, Aubrey"
Cc: Joel Fernandes, linux-kernel@vger.kernel.org, vpillai, Aaron Lu,
    Aubrey Li, peterz@infradead.org, Ben Segall, Dietmar Eggemann,
    Ingo Molnar, Juri Lelli, Mel Gorman, Steven Rostedt, Vincent Guittot
Subject: Re: [PATCH] sched: Use RCU-sched in core-scheduling balancing logic
Message-ID: <20200324151223.GA19722@paulmck-ThinkPad-P72>
Reply-To: paulmck@kernel.org
References: <20200313232918.62303-1-joel@joelfernandes.org>
 <20200314003004.GI3199@paulmck-ThinkPad-P72>
 <20200323152126.GA141027@google.com>
 <6d933ce2-75e3-6469-4bb0-08ce9df29139@linux.intel.com>
 <20200324133012.GW3199@paulmck-ThinkPad-P72>
In-Reply-To: <20200324133012.GW3199@paulmck-ThinkPad-P72>

On Tue, Mar 24, 2020 at 06:30:12AM -0700, Paul E. McKenney wrote:
> On Tue, Mar 24, 2020 at 11:01:27AM +0800, Li, Aubrey wrote:
> > On 2020/3/23 23:21, Joel Fernandes wrote:
> > > On Mon, Mar 23, 2020 at 02:58:18PM +0800, Li, Aubrey wrote:
> > >> On 2020/3/14 8:30, Paul E. McKenney wrote:
> > >>> On Fri, Mar 13, 2020 at 07:29:18PM -0400, Joel Fernandes (Google) wrote:
> > >>>> rcu_read_unlock() can incur an infrequent deadlock in
> > >>>> sched_core_balance(). Fix this by using the RCU-sched flavor instead.
> > >>>>
> > >>>> This fixes the following spinlock recursion observed when testing the
> > >>>> core scheduling patches on a PREEMPT=y kernel on ChromeOS:
> > >>>>
> > >>>> [ 14.998590] watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [kworker/0:10:965]
> > >>>>
> > >>>
> > >>> The original could indeed deadlock, and this would avoid that deadlock.
> > >>> (The commit to solve this deadlock is sadly not yet in mainline.)
> > >>>
> > >>> Acked-by: Paul E. McKenney
> > >>
> > >> I saw this in dmesg with this patch, is it expected?
> > >>
> > >> [ 117.000905] =============================
> > >> [ 117.000907] WARNING: suspicious RCU usage
> > >> [ 117.000911] 5.5.7+ #160 Not tainted
> > >> [ 117.000913] -----------------------------
> > >> [ 117.000916] kernel/sched/core.c:4747 suspicious rcu_dereference_check() usage!
> > >> [ 117.000918]
> > >> other info that might help us debug this:
> > >
> > > Sigh, this is because for_each_domain() expects rcu_read_lock().  From an
> > > RCU PoV, the code is correct (the warning doesn't cause any issue).
> > >
> > > To silence the warning, we could replace the rcu_read_lock_sched() in my
> > > patch with:
> > >
> > >     preempt_disable();
> > >     rcu_read_lock();
> > >
> > > and replace the unlock with:
> > >
> > >     rcu_read_unlock();
> > >     preempt_enable();
> > >
> > > That should take care of both the warning and the scheduler-related
> > > deadlock.  Thoughts?
> > >
> >
> > How about this?
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index a01df3e..7ff694e 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4743,7 +4743,6 @@ static void sched_core_balance(struct rq *rq)
> >  	int cpu = cpu_of(rq);
> >
> >  	rcu_read_lock();
> > -	raw_spin_unlock_irq(rq_lockp(rq));
> >  	for_each_domain(cpu, sd) {
> >  		if (!(sd->flags & SD_LOAD_BALANCE))
> >  			break;
> > @@ -4754,7 +4753,6 @@ static void sched_core_balance(struct rq *rq)
> >  		if (steal_cookie_task(cpu, sd))
> >  			break;
> >  	}
> > -	raw_spin_lock_irq(rq_lockp(rq));
> >  	rcu_read_unlock();
> >  }
>
> As an alternative, I am backporting the -rcu commit 2b5e19e20fc2 ("rcu:
> Make rcu_read_unlock_special() safe for rq/pi locks") to v5.6-rc6 and
> testing it.  The other support for this is already in mainline.  I just
> now started testing it, and will let you know how it goes.
>
> If that works for you, and if the bug you are looking to fix is also
> in v5.5 or earlier, please let me know so that we can work out how to
> deal with the older releases.

And the backported patch below passes moderate rcutorture testing.  But
the big question...  Does it work for you?

							Thanx, Paul
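For reference, here is roughly what Joel's suggestion above would look
like when applied to sched_core_balance() as shown in Aubrey's quoted
diff.  This is an illustrative, untested sketch assembled from the
discussion in this thread, not a patch that was posted here; the middle
of the domain loop is abbreviated to the lines visible in the quoted
hunks.

        /*
         * Sketch only: combine preempt_disable() with rcu_read_lock()
         * so that for_each_domain()'s rcu_dereference_check() lockdep
         * check is satisfied, while rcu_read_unlock() cannot take the
         * unsafe wakeup path while the rq lock is dropped.  Helpers
         * such as rq_lockp() and steal_cookie_task() are from the
         * out-of-tree core-scheduling series.
         */
        static void sched_core_balance(struct rq *rq)
        {
                struct sched_domain *sd;
                int cpu = cpu_of(rq);

                preempt_disable();              /* defer unlock-time wakeups */
                rcu_read_lock();                /* keep lockdep happy        */
                raw_spin_unlock_irq(rq_lockp(rq));
                for_each_domain(cpu, sd) {
                        if (!(sd->flags & SD_LOAD_BALANCE))
                                break;
                        /* ... (lines elided between the quoted hunks) ... */
                        if (steal_cookie_task(cpu, sd))
                                break;
                }
                raw_spin_lock_irq(rq_lockp(rq));
                rcu_read_unlock();
                preempt_enable();
        }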
------------------------------------------------------------------------

commit 0ed0648e23f6aca2a6543fee011d74ab674e08e8
Author: Paul E. McKenney
Date:   Tue Mar 24 06:20:14 2020 -0700

    rcu: Make rcu_read_unlock_special() safe for rq/pi locks

    The scheduler is currently required to hold rq/pi locks across the
    entire RCU read-side critical section or not at all.  This is
    inconvenient and leaves traps for the unwary, including the author
    of this commit.

    But now that excessively long grace periods enable scheduling-clock
    interrupts for holdout nohz_full CPUs, the nohz_full rescue logic in
    rcu_read_unlock_special() can be dispensed with.  In other words, the
    rcu_read_unlock_special() function can refrain from doing wakeups
    unless such wakeups are guaranteed safe.

    This commit therefore avoids unsafe wakeups, freeing the scheduler
    to hold rq/pi locks across rcu_read_unlock() even if the
    corresponding RCU read-side critical section might have been
    preempted.

    This commit is inspired by a patch from Lai Jiangshan:
    https://lore.kernel.org/lkml/20191102124559.1135-2-laijs@linux.alibaba.com

    This commit is further intended to be a step towards his goal of
    permitting the inlining of RCU-preempt's rcu_read_lock() and
    rcu_read_unlock().

    Cc: Lai Jiangshan
    [ paulmck: Backported to v5.6-rc6. ]
    Signed-off-by: Paul E. McKenney

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c6ea81cd4189..2e1bfa0cc393 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -614,18 +614,16 @@ static void rcu_read_unlock_special(struct task_struct *t)
 		struct rcu_node *rnp = rdp->mynode;

 		exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) ||
-		      (rdp->grpmask & READ_ONCE(rnp->expmask)) ||
-		      tick_nohz_full_cpu(rdp->cpu);
+		      (rdp->grpmask & READ_ONCE(rnp->expmask));
 		// Need to defer quiescent state until everything is enabled.
-		if (irqs_were_disabled && use_softirq &&
-		    (in_interrupt() ||
-		     (exp && !t->rcu_read_unlock_special.b.deferred_qs))) {
-			// Using softirq, safe to awaken, and we get
-			// no help from enabling irqs, unlike bh/preempt.
+		if (use_softirq && (in_irq() || (exp && !irqs_were_disabled))) {
+			// Using softirq, safe to awaken, and either the
+			// wakeup is free or there is an expedited GP.
 			raise_softirq_irqoff(RCU_SOFTIRQ);
 		} else {
 			// Enabling BH or preempt does reschedule, so...
-			// Also if no expediting or NO_HZ_FULL, slow is OK.
+			// Also if no expediting, slow is OK.
+			// Plus nohz_full CPUs eventually get tick enabled.
 			set_tsk_need_resched(current);
 			set_preempt_need_resched();
 			if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
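(The final lines of the hunk above are truncated in this message.)

To make the effect of the commit concrete: it permits the scheduler to
end an RCU read-side critical section while holding an rq or pi lock,
because the unlock path now defers any wakeup that is not guaranteed
safe.  The fragment below is a hypothetical illustration of that usage,
not code from the patch; rq_lockp() is again the core-scheduling
series helper.

        /*
         * Hypothetical illustration of the usage this commit makes
         * safe.  Before the change, ending a preempted RCU read-side
         * critical section while holding an rq/pi lock could recurse
         * into the scheduler via a wakeup; afterwards the unlock path
         * defers the work (need-resched, softirq, or irq_work) instead.
         */
        rcu_read_lock();
        /* ...may be preempted here on a PREEMPT=y kernel... */
        raw_spin_lock(rq_lockp(rq));    /* scheduler takes the rq lock  */
        /* ...core-scheduling work such as sched_core_balance()...     */
        rcu_read_unlock();              /* now safe with rq lock held   */
        raw_spin_unlock(rq_lockp(rq));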