Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp2207745imm; Sat, 12 May 2018 07:43:43 -0700 (PDT) X-Google-Smtp-Source: AB8JxZrrWKse2kTf/g7Y/FL/6adXVsxpRIszvvCxn6tfJPh4zU4sMOkLfNE7v3BGo+ephtOHsUHg X-Received: by 2002:a17:902:5610:: with SMTP id h16-v6mr2743849pli.140.1526136223686; Sat, 12 May 2018 07:43:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526136223; cv=none; d=google.com; s=arc-20160816; b=HO/FE5sJjcjIP4695xObgugLe4hguHF0bcQaYlflC+zIYn9AGnk6b45FFoYXnM+6G3 VJNdazilc31ckljiknY7T4/hONqwNUwqKI9dYtzMTmllB22DSZVrPTYGP28yhsAxMll9 OUzwuWEO55lTeZmOoZh8mpol+x89doxtyuyUAFFpoorJRs2Ann1d5R7ed/SPx/2VW4Fy rM9jKAtetHpkflgQ6JQGtAYT/B38yuZJHYVvklD9fu7yLDihxET1x+cyO735TVcheQsq YGMw8tMTNP54lG2gDPGxRfU6CGlO8XrEqoTGiv3C2R1Y/baFaLWyj+kV0NFBo8QBk1or ZgSA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:user-agent:in-reply-to :content-disposition:mime-version:references:reply-to:subject:cc:to :from:date:arc-authentication-results; bh=skgW9eS3opoMESSwlCi6Y4glvElWK2nyq5tXW3TajY4=; b=upmytHh0bwEPxIX+uKFP7C6CGrZQZ3eE0QKWbTS6yHPFzfrUb2C5AIc32rp0IPWu8U UEqb3UNbQCJbVYZ6w0ovzvt6CEvhDLkQvS5+Tue43xRSGDHXaOzzN8q7XSPYNBvAnANw kXJaCbn0fGxls/WtPkvV15Nwkco7l3cpakplPObI/hTzjzXS5fPwq60tiU9/LAgQr6aL 4i62lDKM/tiKtdtlCGzucV5QEwtAPnJCWgmpfPkKH6SKw7CJp8m3NLbG/a7gSNhCYsHf PRoMV58RdWJactq03/3EZMtPLH3IASbxzkg0DjmbJ3KaS2v7bTPO++xlM9FqF8lcEwVG I7sQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 94-v6si5543357ple.56.2018.05.12.07.43.28; Sat, 12 May 2018 07:43:43 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751223AbeELOnR (ORCPT + 99 others); Sat, 12 May 2018 10:43:17 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:37530 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750854AbeELOnP (ORCPT ); Sat, 12 May 2018 10:43:15 -0400 Received: from pps.filterd (m0098393.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w4CEdCG5010193 for ; Sat, 12 May 2018 10:43:15 -0400 Received: from e11.ny.us.ibm.com (e11.ny.us.ibm.com [129.33.205.201]) by mx0a-001b2d01.pphosted.com with ESMTP id 2hws5txy40-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Sat, 12 May 2018 10:43:15 -0400 Received: from localhost by e11.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Sat, 12 May 2018 10:43:14 -0400 Received: from b01cxnp23033.gho.pok.ibm.com (9.57.198.28) by e11.ny.us.ibm.com (146.89.104.198) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Sat, 12 May 2018 10:43:09 -0400 Received: from b01ledav003.gho.pok.ibm.com (b01ledav003.gho.pok.ibm.com [9.57.199.108]) by b01cxnp23033.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id w4CEh8xP58589218; Sat, 12 May 2018 14:43:08 GMT Received: from b01ledav003.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 429ACB204D; Sat, 12 May 2018 11:45:07 -0400 (EDT) Received: from paulmck-ThinkPad-W541 (unknown [9.85.188.179]) by b01ledav003.gho.pok.ibm.com (Postfix) with ESMTP id E565DB2052; Sat, 12 May 2018 11:45:06 -0400 (EDT) Received: by paulmck-ThinkPad-W541 (Postfix, from userid 1000) id 5AF2C16C2C4D; Sat, 12 May 2018 07:44:38 -0700 (PDT) Date: Sat, 12 May 2018 07:44:38 -0700 From: "Paul E. McKenney" To: Joel Fernandes Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, jiangshanlai@gmail.com, dipankar@in.ibm.com, akpm@linux-foundation.org, mathieu.desnoyers@efficios.com, josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com, fweisbec@gmail.com, oleg@redhat.com, joel.opensrc@gmail.com, torvalds@linux-foundation.org, npiggin@gmail.com Subject: Re: [tip/core/rcu,16/21] rcu: Add funnel locking to rcu_start_this_gp() Reply-To: paulmck@linux.vnet.ibm.com References: <1524452624-27589-16-git-send-email-paulmck@linux.vnet.ibm.com> <20180512060325.GA53808@joelaf.mtv.corp.google.com> <20180512144002.GI26088@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180512144002.GI26088@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 18051214-2213-0000-0000-000002A379B9 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00009012; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000260; SDB=6.01031247; UDB=6.00527102; IPR=6.00810378; MB=3.00021071; MTD=3.00000008; XFM=3.00000015; UTC=2018-05-12 14:43:13 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18051214-2214-0000-0000-00005A1768F8 Message-Id: <20180512144438.GA12826@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2018-05-12_04:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1805120147 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, May 12, 2018 at 07:40:02AM -0700, Paul E. McKenney wrote: > On Fri, May 11, 2018 at 11:03:25PM -0700, Joel Fernandes wrote: > > On Sun, Apr 22, 2018 at 08:03:39PM -0700, Paul E. McKenney wrote: > > > The rcu_start_this_gp() function had a simple form of funnel locking that > > > used only the leaves and root of the rcu_node tree, which is fine for > > > systems with only a few hundred CPUs, but sub-optimal for systems having > > > thousands of CPUs. This commit therefore adds full-tree funnel locking. > > > > > > This variant of funnel locking is unusual in the following ways: > > > > > > 1. The leaf-level rcu_node structure's ->lock is held throughout. > > > Other funnel-locking implementations drop the leaf-level lock > > > before progressing to the next level of the tree. > > > > > > 2. Funnel locking can be started at the root, which is convenient > > > for code that already holds the root rcu_node structure's ->lock. > > > Other funnel-locking implementations start at the leaves. > > > > > > 3. If an rcu_node structure other than the initial one believes > > > that a grace period is in progress, it is not necessary to > > > go further up the tree. This is because grace-period cleanup > > > scans the full tree, so that marking the need for a subsequent > > > grace period anywhere in the tree suffices -- but only if > > > a grace period is currently in progress. > > > > > > 4. It is possible that the RCU grace-period kthread has not yet > > > started, and this case must be handled appropriately. > > > > > > However, the general approach of using a tree to control lock contention > > > is still in place. > > > > > > Signed-off-by: Paul E. McKenney > > > --- > > > kernel/rcu/tree.c | 92 +++++++++++++++++++++---------------------------------- > > > 1 file changed, 35 insertions(+), 57 deletions(-) > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index 94519c7d552f..d3c769502929 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -1682,74 +1682,52 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > > > { > > > bool ret = false; > > > struct rcu_state *rsp = rdp->rsp; > > > - struct rcu_node *rnp_root = rcu_get_root(rsp); > > > - > > > - raw_lockdep_assert_held_rcu_node(rnp); > > > - > > > - /* If the specified GP is already known needed, return to caller. */ > > > - trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf")); > > > - if (need_future_gp_element(rnp, c)) { > > > - trace_rcu_this_gp(rnp, rdp, c, TPS("Prestartleaf")); > > > - goto out; > > > - } > > > + struct rcu_node *rnp_root; > > > > > > /* > > > - * If this rcu_node structure believes that a grace period is in > > > - * progress, then we must wait for the one following, which is in > > > - * "c". Because our request will be noticed at the end of the > > > - * current grace period, we don't need to explicitly start one. > > > + * Use funnel locking to either acquire the root rcu_node > > > + * structure's lock or bail out if the need for this grace period > > > + * has already been recorded -- or has already started. If there > > > + * is already a grace period in progress in a non-leaf node, no > > > + * recording is needed because the end of the grace period will > > > + * scan the leaf rcu_node structures. Note that rnp->lock must > > > + * not be released. > > > */ > > > - if (rnp->gpnum != rnp->completed) { > > > - need_future_gp_element(rnp, c) = true; > > > - trace_rcu_this_gp(rnp, rdp, c, TPS("Startedleaf")); > > > - goto out; > > > > Referring to the above negative diff as [1] (which I wanted to refer to later > > in this message..) > > > > > + raw_lockdep_assert_held_rcu_node(rnp); > > > + trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf")); > > > + for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) { > > > + if (rnp_root != rnp) > > > + raw_spin_lock_rcu_node(rnp_root); > > > + if (need_future_gp_element(rnp_root, c) || > > > + ULONG_CMP_GE(rnp_root->gpnum, c) || > > > + (rnp != rnp_root && > > > + rnp_root->gpnum != rnp_root->completed)) { > > > + trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted")); > > > + goto unlock_out; > > > > I was a bit confused about the implementation of the above for loop: > > > > In the previous code (which I refer to in the negative diff [1]), we were > > checking the leaf, and if the leaf believed that RCU was not idle, then we > > were marking the need for the future GP and quitting this function. In the > > new code, it seems like even if the leaf believes RCU is not-idle, we still > > go all the way up the tree. > > > > I think the big change is, in the above new for loop, we either bail of if a > > future GP need was already marked by an intermediate node, or we go marking > > up the whole tree about the need for one. > > > > If a leaf believes RCU is not idle, can we not just mark the future GP need > > like before and return? It seems we would otherwise increase the lock > > contention since now we lock intermediate nodes and then finally even the > > root. Where as before we were not doing that if the leaf believed RCU was not > > idle. > > > > I am sorry if I missed something obvious. > > The trick is that we do the check before we have done the marking. > So if we bailed, we would not have marked at all. If we are at an > intermediate node and a grace period is in progress, we do bail. > > You are right that this means that we (perhaps unnecessarily) acquire > the lock of the parent rcu_node, which might or might not be the root. > And on systems with default fanout with 1024 CPUs or fewer, yes, it will > be the root, and yes, this is the common case. So might be well worth > improving. > > One way to implement the old mark-and-return approach as you suggest > above would be as shown below (untested, probably doesn't build, and > against current rcu/dev). What do you think? > > > The other thing is we now don't have the 'Startedleaf' trace like we did > > before. I sent a patch to remove it, but I think the removal of that is > > somehow connected to what I was just talking about.. and I was thinking if we > > should really remove it. Should we add the case for checking leaves only back > > or is that a bad thing to do? > > Suppose I got hit by a bus and you were stuck with the job of debugging > this. What traces would you want and where would they be? Keeping in > mind that too-frequent traces have their problems as well. > > (Yes, I will be trying very hard to avoid this scenario for as long as > I can, but this might be a good way for you (and everyone else) to be > thinking about this.) > > Thanx, Paul > > ------------------------------------------------------------------------ > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 1abe29a43944..abf3195e01dc 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -1585,6 +1585,8 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > goto unlock_out; > } > rnp_root->gp_seq_needed = c; > + if (rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq))) Right... Make that rnp->gp_seq. Memory locality and all that... Thanx, Paul > + goto unlock_out; > if (rnp_root != rnp && rnp_root->parent != NULL) > raw_spin_unlock_rcu_node(rnp_root); > if (!rnp_root->parent)