Date: Fri, 11 May 2018 23:03:25 -0700
From: Joel Fernandes
To: "Paul E. McKenney"
McKenney" Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, jiangshanlai@gmail.com, dipankar@in.ibm.com, akpm@linux-foundation.org, mathieu.desnoyers@efficios.com, josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com, fweisbec@gmail.com, oleg@redhat.com, joel.opensrc@gmail.com, torvalds@linux-foundation.org, npiggin@gmail.com Subject: Re: [tip/core/rcu,16/21] rcu: Add funnel locking to rcu_start_this_gp() Message-ID: <20180512060325.GA53808@joelaf.mtv.corp.google.com> References: <1524452624-27589-16-git-send-email-paulmck@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1524452624-27589-16-git-send-email-paulmck@linux.vnet.ibm.com> User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Apr 22, 2018 at 08:03:39PM -0700, Paul E. McKenney wrote: > The rcu_start_this_gp() function had a simple form of funnel locking that > used only the leaves and root of the rcu_node tree, which is fine for > systems with only a few hundred CPUs, but sub-optimal for systems having > thousands of CPUs. This commit therefore adds full-tree funnel locking. > > This variant of funnel locking is unusual in the following ways: > > 1. The leaf-level rcu_node structure's ->lock is held throughout. > Other funnel-locking implementations drop the leaf-level lock > before progressing to the next level of the tree. > > 2. Funnel locking can be started at the root, which is convenient > for code that already holds the root rcu_node structure's ->lock. > Other funnel-locking implementations start at the leaves. > > 3. If an rcu_node structure other than the initial one believes > that a grace period is in progress, it is not necessary to > go further up the tree. This is because grace-period cleanup > scans the full tree, so that marking the need for a subsequent > grace period anywhere in the tree suffices -- but only if > a grace period is currently in progress. > > 4. It is possible that the RCU grace-period kthread has not yet > started, and this case must be handled appropriately. > > However, the general approach of using a tree to control lock contention > is still in place. > > Signed-off-by: Paul E. McKenney > --- > kernel/rcu/tree.c | 92 +++++++++++++++++++++---------------------------------- > 1 file changed, 35 insertions(+), 57 deletions(-) > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 94519c7d552f..d3c769502929 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -1682,74 +1682,52 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp, > { > bool ret = false; > struct rcu_state *rsp = rdp->rsp; > - struct rcu_node *rnp_root = rcu_get_root(rsp); > - > - raw_lockdep_assert_held_rcu_node(rnp); > - > - /* If the specified GP is already known needed, return to caller. */ > - trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf")); > - if (need_future_gp_element(rnp, c)) { > - trace_rcu_this_gp(rnp, rdp, c, TPS("Prestartleaf")); > - goto out; > - } > + struct rcu_node *rnp_root; > > /* > - * If this rcu_node structure believes that a grace period is in > - * progress, then we must wait for the one following, which is in > - * "c". Because our request will be noticed at the end of the > - * current grace period, we don't need to explicitly start one. 
> +	 * Use funnel locking to either acquire the root rcu_node
> +	 * structure's lock or bail out if the need for this grace period
> +	 * has already been recorded -- or has already started.  If there
> +	 * is already a grace period in progress in a non-leaf node, no
> +	 * recording is needed because the end of the grace period will
> +	 * scan the leaf rcu_node structures.  Note that rnp->lock must
> +	 * not be released.
>  	 */
> -	if (rnp->gpnum != rnp->completed) {
> -		need_future_gp_element(rnp, c) = true;
> -		trace_rcu_this_gp(rnp, rdp, c, TPS("Startedleaf"));
> -		goto out;

Referring to the above negative diff as [1] (which I want to refer back to
later in this message).

> +	raw_lockdep_assert_held_rcu_node(rnp);
> +	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> +	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> +		if (rnp_root != rnp)
> +			raw_spin_lock_rcu_node(rnp_root);
> +		if (need_future_gp_element(rnp_root, c) ||
> +		    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> +		    (rnp != rnp_root &&
> +		     rnp_root->gpnum != rnp_root->completed)) {
> +			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> +			goto unlock_out;

I was a bit confused by the implementation of the above for loop:

In the previous code (the negative diff [1]), we checked the leaf, and if
the leaf believed that RCU was not idle, we marked the need for the future
GP and quit this function. In the new code, it seems that even if the leaf
believes RCU is not idle, we still go all the way up the tree.

I think the big change is that in the new for loop above, we either bail
out if the need for a future GP was already marked by an intermediate node,
or we mark that need all the way up the tree.

If a leaf believes RCU is not idle, can we not just mark the future GP need
as before and return? Otherwise it seems we increase lock contention, since
we now lock intermediate nodes and finally even the root, whereas before we
did not do that if the leaf believed RCU was not idle. I am sorry if I
missed something obvious.

The other thing is that we no longer have the 'Startedleaf' trace we had
before. I sent a patch to remove it, but I think its removal is somehow
connected to what I was just describing, and I was wondering whether we
should really remove it. Should we add the leaf-only check back, or is that
a bad thing to do?

thanks,

- Joel

> +		}
> +		need_future_gp_element(rnp_root, c) = true;
> +		if (rnp_root != rnp && rnp_root->parent != NULL)
> +			raw_spin_unlock_rcu_node(rnp_root);
> +		if (!rnp_root->parent)
> +			break;	/* At root, and perhaps also leaf. */
> +	}
> [...]
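P.S. To make the question above concrete, below is a toy userspace model of
the new walk. It is purely illustrative: struct node, need_gp, and the
pthread locking are stand-ins I made up, not the kernel's rcu_node tree,
and the wraparound-safe ULONG_CMP_GE() is approximated with a plain >=.
The comment inside the loop marks where the old code's leaf-only shortcut
would have returned:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct node {
	pthread_mutex_t lock;
	struct node *parent;		/* NULL at the root */
	unsigned long gpnum;		/* stand-in for rnp->gpnum */
	unsigned long completed;	/* stand-in for rnp->completed */
	bool need_gp;			/* stand-in for need_future_gp_element() */
};

/* Leaf-to-root walk; the caller holds leaf->lock throughout (point 1). */
static void start_this_gp(struct node *leaf, unsigned long c)
{
	struct node *n;

	for (n = leaf; ; n = n->parent) {
		if (n != leaf)
			pthread_mutex_lock(&n->lock);
		if (n->need_gp || n->gpnum >= c ||
		    (n != leaf && n->gpnum != n->completed)) {
			/* Need already recorded, or a GP is in progress. */
			goto unlock_out;
		}
		n->need_gp = true;
		/*
		 * The old code would have stopped right here when n == leaf
		 * and n->gpnum != n->completed (the "Startedleaf" case),
		 * instead of climbing further.
		 */
		if (n != leaf && n->parent != NULL)
			pthread_mutex_unlock(&n->lock);
		if (n->parent == NULL)
			break;	/* At the root; its lock is still held. */
	}
	/* Here the real code would go on to start the grace period. */
unlock_out:
	if (n != leaf)
		pthread_mutex_unlock(&n->lock);
}

int main(void)
{
	struct node root = { PTHREAD_MUTEX_INITIALIZER, NULL, 0, 0, false };
	struct node leaf = { PTHREAD_MUTEX_INITIALIZER, &root, 0, 0, false };

	pthread_mutex_lock(&leaf.lock);		/* caller holds the leaf lock */
	start_this_gp(&leaf, 1);
	pthread_mutex_unlock(&leaf.lock);
	printf("leaf need_gp=%d, root need_gp=%d\n", leaf.need_gp, root.need_gp);
	return 0;
}

With the leaf-only shortcut at the marked spot, the walk would touch a
single lock instead of the whole chain whenever the leaf already sees a GP
in progress, which is the contention I am worried about.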