Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964786Ab3GRBdI (ORCPT ); Wed, 17 Jul 2013 21:33:08 -0400 Received: from mail-wi0-f178.google.com ([209.85.212.178]:52128 "EHLO mail-wi0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756516Ab3GRBdG (ORCPT ); Wed, 17 Jul 2013 21:33:06 -0400 Date: Thu, 18 Jul 2013 03:33:01 +0200 From: Frederic Weisbecker To: "Paul E. McKenney" Cc: linux-kernel@vger.kernel.org, mingo@elte.hu, laijs@cn.fujitsu.com, dipankar@in.ibm.com, akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca, josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com, darren@dvhart.com, sbw@mit.edu Subject: Re: [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine Message-ID: <20130718013259.GA7398@somewhere> References: <20130709012934.GA26058@linux.vnet.ibm.com> <1373333406-26979-1-git-send-email-paulmck@linux.vnet.ibm.com> <1373333406-26979-6-git-send-email-paulmck@linux.vnet.ibm.com> <20130717233119.GA2801@somewhere> <20130718004141.GI4161@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130718004141.GI4161@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6549 Lines: 175 On Wed, Jul 17, 2013 at 05:41:41PM -0700, Paul E. McKenney wrote: > On Thu, Jul 18, 2013 at 01:31:21AM +0200, Frederic Weisbecker wrote: > > I'm missing a key here. > > > > Let's imagine that the timekeeper has finally set full_sysidle_state = RCU_SYSIDLE_FULL_NOTED > > with cmpxchg, what guarantees that this CPU is not seeing a stale RCU_SYSIDLE_SHORT value > > for example? > > Good question! Let's see if I have a reasonable answer. ;-) > > I am going to start with the large-CPU case, so that the state is advanced > only by the grace-period kthread. > > 1. Case 1: the timekeeper CPU invoked rcu_sysidle_force_exit(). > In this case, this is the same CPU that set full_sysidle_state > to RCU_SYSIDLE_FULL_NOTED, so it is guaranteed not to see a > stale value. > > 2. Case 2: Some CPU came out of idle, and invoked rcu_sysidle_exit(). > In this case, if this CPU reads a RCU_SYSIDLE_SHORT from > full_sysidle_state, this read must have come before the > cmpxchg() (on some other CPU) that set it to RCU_SYSIDLE_LONG. > Because this CPU's read from full_sysidle_state was preceded by > an atomic_inc() that updated this CPU's ->dynticks_idle, that > update must also precede the cmpxchg() that set full_sysidle_state > to RCU_SYSIDLE_LONG. Because the state advancing is done from > within a single thread, the subsequent scan is guaranteed to see > the first CPU's update of ->dynticks_idle, and will therefore > refrain from advancing full_sysidle_state to RCU_SYSIDLE_FULL. > > This will in turn prevent the timekeeping thread from advancing > the state to RCU_SYSIDLE_FULL_NOTED, so this scenario cannot > happen. Ok, so IIUC the safety is guaranteed in the following ordering: CPU 0 CPU 1 idle = 1 smp_mb() //for each cpu if (atomic_read(rdp(1)->dyntick_idle) & 1) atomic_inc(rdp->dyntick_idle) idle = 0 smp_mb() if (idle) cmpxchg(full_sysidle_state, SHORT, LONG) while (full_sysidle_state > SHORT) //reset with cmpxchg So it's like: CPU 0 CPU 1 read I write I smp_mb() smp_mb() cmpxchg S read S I still can't find what guarantees we don't read a value in CPU 1 that is way below what we want. > > Unfortunately, the reasoning in #2 above does not hold in the small-CPU > case because there is the possibility of both the timekeeping CPU and > the RCU grace-period kthread concurrently advancing the state machine. > This would be bad, good catch!!! It's not like I spotted anything myself but you're welcome :) > > The patch below (untested) is an attempt to fix this. If it actually > works, I will merge it in with 6/7. > > Anything else I missed? ;-) Well I guess I'll wait one more night before trying to understand the below ;) Thanks! > > Thanx, Paul > > ------------------------------------------------------------------------ > > Signed-off-by: Paul E. McKenney > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c > index aa3f525..fe83085 100644 > --- a/kernel/rcutree.c > +++ b/kernel/rcutree.c > @@ -1364,7 +1364,7 @@ int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in) > } > force_qs_rnp(rsp, dyntick_save_progress_counter, > &isidle, &maxj); > - rcu_sysidle_report(rsp, isidle, maxj); > + rcu_sysidle_report_gp(rsp, isidle, maxj); > fqs_state = RCU_FORCE_QS; > } else { > /* Handle dyntick-idle and offline CPUs. */ > diff --git a/kernel/rcutree.h b/kernel/rcutree.h > index 1602c21..657b415 100644 > --- a/kernel/rcutree.h > +++ b/kernel/rcutree.h > @@ -559,8 +559,8 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, > unsigned long *maxj); > static bool is_sysidle_rcu_state(struct rcu_state *rsp); > static void rcu_bind_gp_kthread(void); > -static void rcu_sysidle_report(struct rcu_state *rsp, int isidle, > - unsigned long maxj); > +static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, > + unsigned long maxj); > static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp); > > #endif /* #ifndef RCU_TREE_NONCORE */ > diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h > index a4d44c3..f65d9c2 100644 > --- a/kernel/rcutree_plugin.h > +++ b/kernel/rcutree_plugin.h > @@ -2655,14 +2655,22 @@ static void rcu_sysidle_cancel(void) > * scan of the CPUs' dyntick-idle state. > */ > static void rcu_sysidle_report(struct rcu_state *rsp, int isidle, > - unsigned long maxj) > + unsigned long maxj, bool gpkt) > { > if (rsp != rcu_sysidle_state) > return; /* Wrong flavor, ignore. */ > - if (isidle) > - rcu_sysidle(maxj); /* More idle! */ > - else > + if (isidle) { > + if (gpkt && nr_cpu_ids > RCU_SYSIDLE_SMALL) > + rcu_sysidle(maxj); /* More idle! */ > + } else { > rcu_sysidle_cancel(); /* Idle is over. */ > + } > +} > + > +static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, > + unsigned long maxj) > +{ > + rcu_sysidle_report(rsp, isidle, maxj, true); > } > > /* Callback and function for forcing an RCU grace period. */ > @@ -2713,7 +2721,8 @@ bool rcu_sys_is_idle(void) > if (!isidle) > break; > } > - rcu_sysidle_report(rcu_sysidle_state, isidle, maxj); > + rcu_sysidle_report(rcu_sysidle_state, > + isidle, maxj, false); > oldrss = rss; > rss = ACCESS_ONCE(full_sysidle_state); > } > @@ -2776,8 +2785,8 @@ static void rcu_bind_gp_kthread(void) > { > } > > -static void rcu_sysidle_report(struct rcu_state *rsp, int isidle, > - unsigned long maxj) > +static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, > + unsigned long maxj) > { > } > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/