Received: by 2002:a05:6a10:a0d1:0:0:0:0 with SMTP id j17csp4516305pxa; Mon, 10 Aug 2020 10:58:14 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz8TGX04AjtyIchr5OAxcqK0ESBsxpuG69m8R/Mh2xfTQbVDCxPmPLpJJFoU1WTjlfLbnXi X-Received: by 2002:a17:907:20db:: with SMTP id qq27mr23630839ejb.550.1597082293904; Mon, 10 Aug 2020 10:58:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1597082293; cv=none; d=google.com; s=arc-20160816; b=LrNfodn9czqCzvluZdvOWKDrmeUGBm/0Y/IMg+UiPgHBJZTmlfMHqMMD9WjDblrulE gSnkfQr7ev6KX9kfCyfOL1RjoSybhgw85uiFkeySXmCGCzthIaZAC/6FST+ZMgox0iEI eQxCeHKP/XDO/LH+nrfV3PFqg1kbtswsX1zEGWsywjYWMbuSdUVEYgc7dasta4SvtiCU GIzc7TRNUEDd1o5mg28O8t1SBq18lPaxjatrElP5ZP1EPkE0GALUCLdnqthJaL/K0YAc H2aUOApGjAvSHbHYP/xAE4k03cHdbZgh2RpPsR+0H55jPMeX1/dVo40QZjquvQ9c+SMv gPoQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:reply-to:message-id :subject:cc:to:from:date:dkim-signature; bh=+kyEqaXC74b510Zbz179tn3mwywnz0lflG40wO/FJ/A=; b=JXtKfQkH6y8vObMZNfYYs0Rolxf0ts6lGUy70ZWBJxys6m73wEIU4xpgjz/N21wOfK Nav8AY5QVQdTWcblmu445Vk+2gQjR1qH/I/ec56MgRLU+X58m2IJg+UZm+if4tBFZNRE EIhtwBR7891bc5g0y86DHMgTD7wQIB/+ubxnv3+fkve39DTL1UShG4/0HWfPNthfIKbt oa019aA/hnBI0BCVeCfzKK802tiQt4ie40+vrFsoQ2PT/AtdTymzEutDhXopiWNVhsdC ROpKshshuF9+UcTnxd6WAXYYbFjyaFBt5lWG5gAjvKefUbZZUGwdugA8WrMoQjtC7NDX 32Hw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=PX4NIdaD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id cb26si11118958edb.2.2020.08.10.10.57.51; Mon, 10 Aug 2020 10:58:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=PX4NIdaD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727969AbgHJR5T (ORCPT + 99 others); Mon, 10 Aug 2020 13:57:19 -0400 Received: from mail.kernel.org ([198.145.29.99]:43736 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726820AbgHJR5T (ORCPT ); Mon, 10 Aug 2020 13:57:19 -0400 Received: from paulmck-ThinkPad-P72.home (unknown [50.45.173.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 1A6B620866; Mon, 10 Aug 2020 17:57:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1597082238; bh=k3MjZ6XxaVvaOS3cbVAXkz+PUn0Cc+I/FN8m/6d5GPA=; h=Date:From:To:Cc:Subject:Reply-To:References:In-Reply-To:From; b=PX4NIdaDL9nliQXjykosBBnoC2HgYkGI2QIGSsLZaAgZRP71kueVIZPomm0JpBra/ d5MCf+/Cxf6n2h2WmviE/Q6RVnTaEHN+2pj5yoCxE3WLglbqhWycWs+7lX/VPDHH4Q 0moOWyIfb0PEzJUHFX8QiZmbdzbMh9K79CIRV4Ng= Received: by paulmck-ThinkPad-P72.home (Postfix, from userid 1000) id B277135228C7; Mon, 10 Aug 2020 10:57:17 -0700 (PDT) Date: Mon, 10 Aug 2020 10:57:17 -0700 From: "Paul E. McKenney" To: Joel Fernandes Cc: linux-kernel@vger.kernel.org, Neeraj Upadhyay , Davidlohr Bueso , Jonathan Corbet , Josh Triplett , Lai Jiangshan , linux-doc@vger.kernel.org, Mathieu Desnoyers , Mauro Carvalho Chehab , peterz@infradead.org, Randy Dunlap , rcu@vger.kernel.org, Steven Rostedt , tglx@linutronix.de, vineethrp@gmail.com Subject: Re: [PATCH v4 1/5] rcu/tree: Add a warning if CPU being onlined did not report QS already Message-ID: <20200810175717.GM4295@paulmck-ThinkPad-P72> Reply-To: paulmck@kernel.org References: <20200807170722.2897328-1-joel@joelfernandes.org> <20200807170722.2897328-2-joel@joelfernandes.org> <20200810154654.GJ4295@paulmck-ThinkPad-P72> <20200810173931.GB2253395@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200810173931.GB2253395@google.com> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Aug 10, 2020 at 01:39:31PM -0400, Joel Fernandes wrote: > On Mon, Aug 10, 2020 at 08:46:54AM -0700, Paul E. McKenney wrote: > > On Fri, Aug 07, 2020 at 01:07:18PM -0400, Joel Fernandes (Google) wrote: > > > Currently, rcu_cpu_starting() checks to see if the RCU core expects a > > > quiescent state from the incoming CPU. However, the current interaction > > > between RCU quiescent-state reporting and CPU-hotplug operations should > > > mean that the incoming CPU never needs to report a quiescent state. > > > First, the outgoing CPU reports a quiescent state if needed. Second, > > > the race where the CPU is leaving just as RCU is initializing a new > > > grace period is handled by an explicit check for this condition. Third, > > > the CPU's leaf rcu_node structure's ->lock serializes these checks. > > > > > > This means that if rcu_cpu_starting() ever feels the need to report > > > a quiescent state, then there is a bug somewhere in the CPU hotplug > > > code or the RCU grace-period handling code. This commit therefore > > > adds a WARN_ON_ONCE() to bring that bug to everyone's attention. > > > > > > Cc: Paul E. McKenney > > > Cc: Neeraj Upadhyay > > > Suggested-by: Paul E. McKenney > > > Signed-off-by: Joel Fernandes (Google) > > > --- > > > kernel/rcu/tree.c | 9 ++++++++- > > > 1 file changed, 8 insertions(+), 1 deletion(-) > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index 65e1b5e92319..a49fa3b60faa 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -3996,7 +3996,14 @@ void rcu_cpu_starting(unsigned int cpu) > > > rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */ > > > rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq); > > > rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags); > > > - if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */ > > > + > > > + /* > > > + * XXX: The following rcu_report_qs_rnp() is redundant. If the below > > > + * warning does not fire, consider replacing it with the "else" block, > > > + * by June 2021 or so (while keeping the warning). Refer to RCU's > > > + * Requirements documentation for the rationale. > > > > Let's suppose that this change is made, and further that in a year or > > two the "if" statement below is replaced with its "else" block. > > > > Now let's suppose that (some years after that) a hard-to-trigger bug > > makes its way into RCU's CPU-hotplug code that would have resulted in > > the WARN_ON_ONCE() triggering, but that this bug turns out to be not so > > hard to trigger in certain large production environments. > > > > Let's suppose further that you have moved on to where you are responsible > > for one of these large production environments. How would this > > hypothetical RCU/CPU-hotplug bug manifest? > > It could manifest as an RCU stall (after the warning triggers) since RCU > would wait forever. > > Were you thinking it is not worth doing this? I thought we wanted to remove > the reundant rcu_report_qs_rnp here to solidify everyone's understanding of > the code and fail early if there's something misunderstood (since such > misunderstanding could mean there are other hidden bugs somewhere). The > counter-argument to that being, making the code robust is more important for > the large production failure scenario where failures are costly. The benefits of removing code that is in theory redundant was my thought at one point, but sleeping on this several times since has made me much less favorable to this change. And perhaps my experiences with my new employer have affected my views on this as well. You never know! ;-) Thanx, Paul > thanks, > > - Joel > > > > Thanx, Paul > > > > > + */ > > > + if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */ > > > rcu_disable_urgency_upon_qs(rdp); > > > /* Report QS -after- changing ->qsmaskinitnext! */ > > > rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags); > > > -- > > > 2.28.0.236.gb10cc79966-goog > > >