From: Andy Lutomirski
Date: Wed, 9 Nov 2016 03:14:35 -0800
Subject: Re: task isolation discussion at Linux Plumbers
To: "Paul E. McKenney"
Cc: Chris Metcalf, Gilad Ben Yossef, Steven Rostedt, Ingo Molnar,
    Peter Zijlstra, Andrew Morton, Rik van Riel, Tejun Heo,
    Frederic Weisbecker, Thomas Gleixner, Christoph Lameter,
    Viresh Kumar, Catalin Marinas, Will Deacon, Daniel Lezcano,
    Francis Giraldeau, Andi Kleen, Arnd Bergmann,
    "linux-kernel@vger.kernel.org"

On Tue, Nov 8, 2016 at 5:40 PM, Paul E. McKenney wrote:

> commit 49961e272333ac720ac4ccbaba45521bfea259ae
> Author: Paul E. McKenney
> Date:   Tue Nov 8 14:25:21 2016 -0800
>
>     rcu: Maintain special bits at bottom of ->dynticks counter
>
>     Currently, IPIs are used to force other CPUs to invalidate their TLBs
>     in response to a kernel virtual-memory mapping change.  This works, but
>     degrades both battery lifetime (for idle CPUs) and real-time response
>     (for nohz_full CPUs), and in addition results in unnecessary IPIs due to
>     the fact that CPUs executing in usermode are unaffected by stale kernel
>     mappings.  It would be better to cause a CPU executing in usermode to
>     wait until it is entering kernel mode to

missing words here?

>
>     This commit therefore reserves a bit at the bottom of the ->dynticks
>     counter, which is checked upon exit from extended quiescent states.  If it
>     is set, it is cleared and then a new rcu_dynticks_special_exit() macro
>     is invoked, which, if not supplied, is an empty single-pass do-while loop.
>     If this bottom bit is set on -entry- to an extended quiescent state,
>     then a WARN_ON_ONCE() triggers.
>
>     This bottom bit may be set using a new rcu_dynticks_special_set()
>     function, which returns true if the bit was set, or false if the CPU
>     turned out to not be in an extended quiescent state.  Please note that
>     this function refuses to set the bit for a non-nohz_full CPU when that
>     CPU is executing in usermode because usermode execution is tracked by
>     RCU as a dyntick-idle extended quiescent state only for nohz_full CPUs.

I'm inclined to suggest s/dynticks/eqs/ in the public API.  To me,
"dynticks" is a feature, whereas "eqs" means "extended quiescent
state" and means something concrete about the CPU state

>
>     Reported-by: Andy Lutomirski
>     Signed-off-by: Paul E. McKenney
>
> diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> index 4f9b2fa2173d..130d911e4ba1 100644
> --- a/include/linux/rcutiny.h
> +++ b/include/linux/rcutiny.h
> @@ -33,6 +33,11 @@ static inline int rcu_dynticks_snap(struct rcu_dynticks *rdtp)
>         return 0;
>  }
>
> +static inline bool rcu_dynticks_special_set(int cpu)
> +{
> +       return false;  /* Never flag non-existent other CPUs! */
> +}
> +
>  static inline unsigned long get_state_synchronize_rcu(void)
>  {
>         return 0;
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index dbf20b058f48..8de83830e86b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -279,23 +279,36 @@ static DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
>  };
>
>  /*
> + * Steal a bit from the bottom of ->dynticks for idle entry/exit
> + * control.  Initially this is for TLB flushing.
> + */
> +#define RCU_DYNTICK_CTRL_MASK 0x1
> +#define RCU_DYNTICK_CTRL_CTR  (RCU_DYNTICK_CTRL_MASK + 1)
> +#ifndef rcu_dynticks_special_exit
> +#define rcu_dynticks_special_exit() do { } while (0)
> +#endif
> +
>  /*
> @@ -305,17 +318,21 @@ static void rcu_dynticks_eqs_enter(void)
>  static void rcu_dynticks_eqs_exit(void)
>  {
>         struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> +       int seq;
>
>         /*
> -        * CPUs seeing atomic_inc() must see prior idle sojourns,
> +        * CPUs seeing atomic_inc_return() must see prior idle sojourns,
>          * and we also must force ordering with the next RCU read-side
>          * critical section.
>          */
> -       smp_mb__before_atomic(); /* See above. */
> -       atomic_inc(&rdtp->dynticks);
> -       smp_mb__after_atomic(); /* See above. */
> +       seq = atomic_inc_return(&rdtp->dynticks);
>         WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
> -                    !(atomic_read(&rdtp->dynticks) & 0x1));
> +                    !(seq & RCU_DYNTICK_CTRL_CTR));
> +       if (seq & RCU_DYNTICK_CTRL_MASK) {
> +               atomic_and(~RCU_DYNTICK_CTRL_MASK, &rdtp->dynticks);
> +               smp_mb__after_atomic(); /* Clear bits before acting on them */
> +               rcu_dynticks_special_exit();

I think this needs to be reversed for NMI safety: do the callback and
then clear the bits.

> +/*
> + * Set the special (bottom) bit of the specified CPU so that it
> + * will take special action (such as flushing its TLB) on the
> + * next exit from an extended quiescent state.  Returns true if
> + * the bit was successfully set, or false if the CPU was not in
> + * an extended quiescent state.
> + */
> +bool rcu_dynticks_special_set(int cpu)
> +{
> +       int old;
> +       int new;
> +       struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
> +
> +       do {
> +               old = atomic_read(&rdtp->dynticks);
> +               if (old & RCU_DYNTICK_CTRL_CTR)
> +                       return false;
> +               new = old | ~RCU_DYNTICK_CTRL_MASK;

Shouldn't this be old | RCU_DYNTICK_CTRL_MASK?

> +       } while (atomic_cmpxchg(&rdtp->dynticks, old, new) != old);
> +       return true;
>  }

--Andy
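
For reference, the counter layout discussed in this review can be modeled in user space. What follows is a minimal sketch and not kernel code: it uses C11 <stdatomic.h> instead of the kernel's atomic_t API, and CTRL_MASK, CTRL_CTR, special_actions, special_set(), eqs_enter() and eqs_exit() are hypothetical stand-ins for the identifiers in the patch. It assumes the two review suggestions are applied: the setter ORs in the mask itself rather than its complement, and the exit path runs the special action before clearing the bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CTRL_MASK 0x1             /* reserved low bit: special action pending */
#define CTRL_CTR  (CTRL_MASK + 1) /* counter step; bit 1 set means "not in EQS" */

static int special_actions;       /* counts placeholder "special actions" run */

/* Ask the modeled CPU to act on its next EQS exit; fails if it is not in EQS. */
static bool special_set(atomic_int *ctr)
{
	int old = atomic_load(ctr);
	int newval;

	do {
		if (old & CTRL_CTR)
			return false;            /* CPU is running: caller must use another path */
		newval = old | CTRL_MASK;    /* old | ~CTRL_MASK would clobber the counter */
	} while (!atomic_compare_exchange_weak(ctr, &old, newval));
	return true;
}

/* Enter an extended quiescent state: bump the counter by one step. */
static void eqs_enter(atomic_int *ctr)
{
	int seq = atomic_fetch_add(ctr, CTRL_CTR) + CTRL_CTR;

	assert(!(seq & CTRL_MASK));      /* the bit must never be set on entry */
}

/* Exit an extended quiescent state; returns true if a special action ran. */
static bool eqs_exit(atomic_int *ctr)
{
	int seq = atomic_fetch_add(ctr, CTRL_CTR) + CTRL_CTR;

	if (seq & CTRL_MASK) {
		special_actions++;                   /* act first (e.g. a TLB flush)... */
		atomic_fetch_and(ctr, ~CTRL_MASK);   /* ...then clear the request bit */
		return true;
	}
	return false;
}
```

Starting from a counter value of CTRL_CTR (not in EQS), special_set() fails; after eqs_enter() it succeeds, and the following eqs_exit() runs the action once and leaves the low bit clear. All operations use the default sequentially consistent ordering, which is stronger than what the kernel patch would need.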