Received: by 2002:a25:ca44:0:0:0:0:0 with SMTP id a65csp2061790ybg; Thu, 30 Jul 2020 09:22:37 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxY2f5zWPME60d52LQmftkoqDHx3Y0/1dYyu6kFZaXRtUb2ilHFnsDk0yOv5AA/faRgNjkR X-Received: by 2002:a17:906:4094:: with SMTP id u20mr3230245ejj.553.1596126157023; Thu, 30 Jul 2020 09:22:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1596126157; cv=none; d=google.com; s=arc-20160816; b=nZeBk9g9YFX3nQOdaHgEClzInKT9fhI9ZB2HeiQOvnZum1bRz882hAR+Eqezc0mIR/ g1pghdASPW2lnCSPoail94HeqPIP2T2a28t7g0Ub5axehH2XUZBtAesG+CP7EmxqkVyL +KRdmVhL8sNSoNxMLVwU7Z+ZEhZVJ6LT1fL5+Nr8SPj8TsfMvxxbgtt7AdcigL3f/KMA ycxTkvDxmkfGjK+S3nKX3v7YITWPAXCYF3u91277xlDDvctGkMf+KrqVnpn2EUcip2WN RPhfbFEnFKwRmNE2BpgBT30IhJ1w1k5SbMX7KvT/B0SLqJsVugqC3DAWhy697oxs9tw4 xTPw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:reply-to:message-id :subject:cc:to:from:date:dkim-signature; bh=0zJtgahAwAaN+M9xk6FPGWoVj8u0wF0V1Pp3dgzqHA8=; b=ni3jTqmvSVZibe3HqOUVsZ/2AHbSn0P3+N7rX3z5QftyoLw3Pip58lFlkY7TaG4v9a ByWzk8Ox0HQQwQlMZT1RnVbyk2WCnbhTY6FucaiXRQbdQrghjKZ7s7UcoER0iilDYqeX Zxu5OAEDa7WcpwAi94nTEYRGYAknA9JfpSHzbMoiV7uoUaKtkijDoTRivhaJb1TbrOjR PBP97e1ss1d2WTGAsNqOz9bP4/ep471h3afYy19MHFCX19x5dubeLVqCeWnbr6q5oxby nAtQU3ZnnEON1izA1NwvxnokJT/nUFBg4hHHvDCBH2cDa9HeKYlcEB8k1tM2cgsLV91Y rFRg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=TncS1nvG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id qp26si3365115ejb.118.2020.07.30.09.22.14; Thu, 30 Jul 2020 09:22:37 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=TncS1nvG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728495AbgG3QWA (ORCPT + 99 others); Thu, 30 Jul 2020 12:22:00 -0400 Received: from mail.kernel.org ([198.145.29.99]:47888 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728452AbgG3QWA (ORCPT ); Thu, 30 Jul 2020 12:22:00 -0400 Received: from paulmck-ThinkPad-P72.home (unknown [50.45.173.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 605E72082E; Thu, 30 Jul 2020 16:21:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1596126119; bh=9cFcZnfkur34WPfSqhU2MSEk1J8+VuHGVPWpxsCK54w=; h=Date:From:To:Cc:Subject:Reply-To:References:In-Reply-To:From; b=TncS1nvGzCSVxo7Sr9KeUFeJ9Hf7ZU3kLdn7SE8bAdJk0A0ZMURGgMlECP2pumm53 6tWH0uMI2b1S14bMdB7abLeZ8a2DXC1NVJF7ufKI/L6OtfQZi2GVv7pssf6Xd4VYfE F76ZjRPJUR/396oUNHZmuovdnxoHygXq8vFc0meA= Received: by paulmck-ThinkPad-P72.home (Postfix, from userid 1000) id 3E5EF3520751; Thu, 30 Jul 2020 09:21:59 -0700 (PDT) Date: Thu, 30 Jul 2020 09:21:59 -0700 From: "Paul E. McKenney" To: "Joel Fernandes (Google)" Cc: linux-kernel@vger.kernel.org, Neeraj Upadhyay , Josh Triplett , Lai Jiangshan , Mathieu Desnoyers , rcu@vger.kernel.org, Steven Rostedt Subject: Re: [PATCH 1/2] rcu/tree: Add a warning if CPU being onlined did not report QS already Message-ID: <20200730162159.GZ9247@paulmck-ThinkPad-P72> Reply-To: paulmck@kernel.org References: <20200730030221.705255-1-joel@joelfernandes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200730030221.705255-1-joel@joelfernandes.org> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jul 29, 2020 at 11:02:20PM -0400, Joel Fernandes (Google) wrote: > Add a warning if CPU being onlined did not report QS already. This is to > simplify the code in the CPU onlining path and also to make clear about > where QS is reported. The act of QS reporting in CPU onlining path is > is likely unnecessary as shown by code reading and testing with > rcutorture's TREE03 and hotplug parameters. How about something like this for the commit log? ------------------------------------------------------------------------ Currently, rcu_cpu_starting() checks to see if the RCU core expects a quiescent state from the incoming CPU. However, the current interaction between RCU quiescent-state reporting and CPU-hotplug operations should mean that the incoming CPU never needs to report a quiescent state. First, the outgoing CPU reports a quiescent state if needed. Second, the race where the CPU is leaving just as RCU is initializing a new grace period is handled by an explicit check for this condition. Third, the CPU's leaf rcu_node structure's ->lock serializes these checks. This means that if rcu_cpu_starting() ever feels the need to report a quiescent state, then there is a bug somewhere in the CPU hotplug code or the RCU grace-period handling code. This commit therefore adds a WARN_ON_ONCE() to bring that bug to everyone's attention. ------------------------------------------------------------------------ > Cc: Paul E. McKenney > Cc: Neeraj Upadhyay > Suggested-by: Paul E. McKenney > Signed-off-by: Joel Fernandes (Google) > > --- > kernel/rcu/tree.c | 14 +++++++++++++- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 65e1b5e92319..1e51962b565b 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -3996,7 +3996,19 @@ void rcu_cpu_starting(unsigned int cpu) > rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */ > rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq); > rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags); > - if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */ > + > + /* > + * Delete QS reporting from here, by June 2021, if warning does not > + * fire. Let us make the rules for reporting QS for an offline CPUs > + * more explicit. The CPU onlining path does not need to report QS for > + * an offline CPU. Either the QS should have reported during CPU > + * offlining, or during rcu_gp_init() if it detected a race with either > + * CPU offlining or task unblocking on previously offlined CPUs. Note > + * that the FQS loop also does not report QS for an offline CPU any > + * longer (unless it splats due to an offline CPU blocking the GP for > + * too long). > + */ Let's leave at least the WARN_ON_ONCE() indefinitely. If you don't believe me, remove this code in your local tree, have someone give you several branches, some with bugs injected, and then try to figure out which have the bugs and then try to find those bugs. This is not a fastpath, so the overhead of the check is not a concern. Believe me, the difficulty of bug location without this check is a very real concern! ;-) On the other hand, I fully agree with the benefits of documenting the design rules. But is this really the best place to do that from the viewpoint of someone who is trying to figure out how RCU works? Thanx, Paul > + if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */ > rcu_disable_urgency_upon_qs(rdp); > /* Report QS -after- changing ->qsmaskinitnext! */ > rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags); > -- > 2.28.0.rc0.142.g3c755180ce-goog >