Date: Wed, 19 Apr 2017 15:48:35 +0200
From: Peter Zijlstra
To: "Paul E. McKenney"
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, jiangshanlai@gmail.com,
    dipankar@in.ibm.com, akpm@linux-foundation.org,
    mathieu.desnoyers@efficios.com, josh@joshtriplett.org, tglx@linutronix.de,
    rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
    fweisbec@gmail.com, oleg@redhat.com, bobby.prani@gmail.com
Subject: Re: [PATCH tip/core/rcu 04/13] rcu: Make RCU_FANOUT_LEAF help text
    more explicit about skew_tick
Message-ID: <20170419134835.bpuhurle2jjr66hm@hirez.programming.kicks-ass.net>
In-Reply-To: <20170419132226.yvo3jyweb3d2a632@hirez.programming.kicks-ass.net>

On Wed, Apr 19, 2017 at 03:22:26PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 13, 2017 at 11:42:32AM -0700, Paul E. McKenney wrote:
>
> > I believe that you are missing the fact that RCU grace-period
> > initialization and cleanup walks through the rcu_node tree breadth
> > first, using rcu_for_each_node_breadth_first().
>
> Indeed. That is the part I completely missed.
>
> > This macro (shown below)
> > implements this breadth-first walk using a simple sequential traversal of
> > the ->node[] array that provides the structures making up the rcu_node
> > tree. As you can see, this scan is completely independent of how CPU
> > numbers might be mapped to rcu_data slots in the leaf rcu_node structures.
>
> So this code is clearly not a hotpath, but still its performance
> matters?
>
> Seems like you cannot win here :/

So I sort of see what that code does, but I cannot quite grasp from the
comments near there _why_ it is doing this.

My thinking is that normal (active) CPUs will update their state at tick
time through the tree, and once the state reaches the root node, IOW all
CPUs agree they've observed that particular state, we advance the global
state; rinse, repeat. That's how tree-rcu works.

NOHZ-idle stuff would be excluded entirely; that is, if we're allowed to
go idle we're up-to-date, and completely drop out of the state tracking.
When we become active again, we can simply sync the CPU's state to the
active state and go from there -- ignoring whatever happened in the
meantime.

So why do we have to do machine-wide updates? How can we get to the end
of a grace period without all CPUs already agreeing that it's complete?

/me puzzled.
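
To make the two ideas in this thread concrete, here is a rough user-space sketch, not kernel code. It assumes a toy tree of one root and two leaves with two CPUs per leaf; every toy_* name, the fixed sizes, and the mask layout are made up for illustration. It shows (a) that when the nodes are stored in level order, a plain sequential scan of the array visits them breadth first, and (b) the mental model described above, where each CPU reports upward through the tree and the grace period is marked complete once the root's mask empties. Idle CPUs and locking are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy layout: 1 root + 2 leaves, 2 CPUs per leaf. */
#define TOY_NUM_NODES 3
#define TOY_NUM_CPUS  4

struct toy_node {
	struct toy_node *parent;  /* NULL for the root */
	unsigned long qsmask;     /* children or CPUs still being waited on */
	unsigned long grpmask;    /* this node's bit in parent->qsmask */
	int level;                /* 0 for the root, 1 for leaves */
};

struct toy_state {
	struct toy_node node[TOY_NUM_NODES]; /* level order: root first */
	unsigned long gpnum;      /* grace periods started */
	unsigned long completed;  /* grace periods completed */
};

/* Breadth-first walk expressed as a sequential scan of the array. */
#define toy_for_each_node_breadth_first(sp, np) \
	for ((np) = &(sp)->node[0]; (np) < &(sp)->node[TOY_NUM_NODES]; (np)++)

static void toy_init(struct toy_state *sp)
{
	sp->node[0] = (struct toy_node){ .parent = NULL, .level = 0 };
	for (int i = 1; i < TOY_NUM_NODES; i++)
		sp->node[i] = (struct toy_node){
			.parent = &sp->node[0],
			.grpmask = 1UL << (i - 1),
			.level = 1,
		};
	sp->gpnum = 0;
	sp->completed = 0;
}

/* Start a grace period: touch every node, root first, and arm its mask. */
static int toy_gp_init(struct toy_state *sp)
{
	struct toy_node *np;
	int visited = 0;

	sp->gpnum++;
	toy_for_each_node_breadth_first(sp, np) {
		np->qsmask = 0x3; /* two children (root) or two CPUs (leaf) */
		visited++;
	}
	return visited;
}

/*
 * A CPU reports a quiescent state. Clear its bit in the leaf, and keep
 * propagating upward for as long as each node's mask becomes empty.
 * Returns true when the root empties, i.e. the grace period is complete.
 */
static bool toy_report_qs(struct toy_state *sp, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NUM_CPUS);

	struct toy_node *np = &sp->node[1 + cpu / 2];
	unsigned long mask = 1UL << (cpu % 2);

	for (;;) {
		np->qsmask &= ~mask;
		if (np->qsmask)
			return false; /* someone else is still pending here */
		if (!np->parent)
			break;        /* root mask is empty */
		mask = np->grpmask;
		np = np->parent;
	}
	sp->completed = sp->gpnum;
	return true;
}
```

In this sketch the per-CPU reports touch only one leaf and its ancestors, while toy_gp_init() is the one place that touches every node. That full-tree step is the part being asked about here; the sketch only shows where it sits and does not explain why it is needed.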