Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751940AbbDZLDA (ORCPT ); Sun, 26 Apr 2015 07:03:00 -0400 Received: from e39.co.us.ibm.com ([32.97.110.160]:59311 "EHLO e39.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751899AbbDZLC5 (ORCPT ); Sun, 26 Apr 2015 07:02:57 -0400 Date: Sun, 26 Apr 2015 03:52:13 -0700 From: "Paul E. McKenney" To: Peter Zijlstra Cc: Manfred Spraul , Oleg Nesterov , Kirill Tkhai , linux-kernel@vger.kernel.org, Ingo Molnar , Josh Poimboeuf Subject: Re: [PATCH 2/2] [PATCH] sched: Add smp_rmb() in task rq locking cycles Message-ID: <20150426105213.GA27191@linux.vnet.ibm.com> Reply-To: paulmck@linux.vnet.ibm.com References: <20150217160532.GW4166@linux.vnet.ibm.com> <20150217183636.GR5029@twins.programming.kicks-ass.net> <20150217215231.GK4166@linux.vnet.ibm.com> <20150218155904.GA27687@redhat.com> <54E4E479.4050003@colorfullife.com> <20150218224317.GC5029@twins.programming.kicks-ass.net> <20150219141905.GA11018@redhat.com> <54E77CC0.5030401@colorfullife.com> <20150220184551.GQ2896@worktop.programming.kicks-ass.net> <20150425195602.GA26676@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150425195602.GA26676@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 15042611-0033-0000-0000-00000459C71A Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 14279 Lines: 353 On Sat, Apr 25, 2015 at 12:56:02PM -0700, Paul E. McKenney wrote: > On Fri, Feb 20, 2015 at 07:45:51PM +0100, Peter Zijlstra wrote: > > On Fri, Feb 20, 2015 at 07:28:16PM +0100, Manfred Spraul wrote: > > > > > >We need the full barrier to serialize STORE's as well, but probably we can > > > >rely on control dependancy and thus we only need rmb(). > > > Do we need a full barrier or not? > > > > > > I don't manage to create a proper line of reasoning. > > > > I think I agree with Oleg in that we only need the smp_rmb(); of course > > that wants a somewhat elaborate comment to go along with it. How about > > something like so: > > > > spin_unlock_wait(&local); And then an smp_read_barrier_depends() would be needed either here or embedded in apin_unlock_wait(). But we also need to check the spin_unlock_wait() implementations to see if any are potentially vulnerable to compiler misbehavior due to lack of ACCESS_ONCE(), READ_ONCE(), or other sources of the required volatility: o alpha lacks ACCESS_ONCE() or READ_ONCE(), though admittedly the potential for damage is at least somewhat limited by the fact that smp_read_barrier_depends() is a full memory barrier on Alpha. o arc: lacks ACCESS_ONCE() or READ_ONCE(), but field is volatile, so OK.. o arm: looks good, the READ_ONCE() in arch_spin_is_locked() should do it. o arm64: ditto. o blackfin: Drops into assembly, which should keep the compiler suitably constrained. o cris: lacks ACCESS_ONCE() or READ_ONCE(). Not sure where SMP cris is getting its arch_spinlock_t from, though. Except for UP, which gets volatile fields. o hexagon: lacks ACCESS_ONCE() or READ_ONCE(), but field is volatile, so OK. o ia64: Uses ACCESS_ONCE(), so OK. o m32r: Volatile cast, so OK. READ_ONCE() might be nicer. o metag: Drops into assembly, so OK. o metag #2: volatile declaration on field, so OK. o mips: ACCESS_ONCE(), so OK. o mn10300: Volatile cast, so OK. READ_ONCE() might be nicer. o parisc: Obscure but very real volatile cast, so OK. Or, in another branch of #ifdef, volatile field, so also OK. o powerpc: Volatile field, so OK. o s390: ACCESS_ONCE(), so OK. o sh: Volatile field, so OK. o sparc: Volatile cast, so OK. READ_ONCE() might be nicer. Or, in another path, volatile field, so also OK. o tile: For 32-bit, looks like a bug. Compares ->current_ticket and ->next_ticket with no obvious protection. The compiler is free to load them in either order, so it is possible that the two fields could compare equal despite never having actually been equal at any given time. Needs something like arm, arm64, mips, or x86 to do single fetch, then compare fields in quantity fetched. Except that this appears to be using int on a 32-bit system, thus might not have a 64-bit load. If that is the case, the trick would be to load them in order. Except that this can be defeated by overflow. Are there really 32-bit tile systems with enough CPUs to overflow an unsigned short? For 64-bit, a READ_ONCE() appears to be in order -- no obvious volatility present. o x86: Uses READ_ONCE(), so OK. o xtensa: Uses volatile field, so OK. o UP: Uses volatile fields, so OK. So tile needs help in addition to the need for smp_read_barrier_depeneds(). One could argue that only Alpha really needs smp_read_barrier_depends(), so the others can avoid it. No opinion myself, at least not at the moment. Thanx, Paul > > /* > > * The above spin_unlock_wait() forms a control dependency with > > * any following stores; because we must first observe the lock > > * unlocked and we cannot speculate stores. > > * > > * Subsequent loads however can easily pass through the loads > > * represented by spin_unlock_wait() and therefore we need the > > * read barrier. > > * > > * This together is stronger than ACQUIRE for @local and > > * therefore we will observe the complete prior critical section > > * of @local. > > */ > > smp_rmb(); > > > > The obvious alternative is using spin_unlock_wait() with an > > smp_load_acquire(), but that might be more expensive on some archs due > > to repeated issuing of memory barriers. > > Please accept my apologies for the late reply. And thank you to Peter > for prodding me (again!) to think more carefully about this. Because we > have a bit of a comedy of errors at the moment. I will start with > mine, namely that we need to support DEC Alpha, which means we cannot > use ACCESS_ONCE() or READ_ONCE() to head a control-dependency chain. > Alpha needs an mb instruction for control dependencies, just as it does > for data dependencies. > > I have the following (untested, probably does not even build) patch > queued to introduce READ_ONCE_CTRL() to head control depedencies, making > them safe for DEC Alpha. And, just as important, also making them more > self-documenting. > > But the fun doesn't end there! Some architectures have "interesting" > implementations of spin_unlock_wait() and arch_spin_is_locked(). I will > deal with these in a follow-up email. > > In the meantime, thoughts on the patch below? > > Thanx, Paul > > ------------------------------------------------------------------------ > > smp: Make control dependencies work on Alpha, improve documentation > > The current formulation of control dependencies fails on DEC Alpha, which > does not respect dependencies of any kind unless an explicit memory is > provided. This means that the current fomulation of control dependencies > fails on Alpha. This commit therefore creates a READ_ONCE_CTRL() that > has the same overhead on non-Alpha systems, but causes Alpha to produce > the needed ordering. This commit also applies READ_ONCE_CTRL() to the > one known use of control dependencies. > > Use of READ_ONCE_CTRL() also has the beneficial effect of adding a bit > of self-documentation to control dependencies. > > Signed-off-by: Paul E. McKenney > > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt > index 52c320e3f107..16897552dc1c 100644 > --- a/Documentation/memory-barriers.txt > +++ b/Documentation/memory-barriers.txt > @@ -617,16 +617,16 @@ case what's actually required is: > However, stores are not speculated. This means that ordering -is- provided > for load-store control dependencies, as in the following example: > > - q = ACCESS_ONCE(a); > + q = READ_ONCE_CTRL(a); > if (q) { > ACCESS_ONCE(b) = p; > } > > -Control dependencies pair normally with other types of barriers. > -That said, please note that ACCESS_ONCE() is not optional! Without the > -ACCESS_ONCE(), might combine the load from 'a' with other loads from > -'a', and the store to 'b' with other stores to 'b', with possible highly > -counterintuitive effects on ordering. > +Control dependencies pair normally with other types of barriers. That > +said, please note that READ_ONCE_CTRL() is not optional! Without the > +READ_ONCE_CTRL(), the compiler might combine the load from 'a' with > +other loads from 'a', and the store to 'b' with other stores to 'b', > +with possible highly counterintuitive effects on ordering. > > Worse yet, if the compiler is able to prove (say) that the value of > variable 'a' is always non-zero, it would be well within its rights > @@ -636,12 +636,15 @@ as follows: > q = a; > b = p; /* BUG: Compiler and CPU can both reorder!!! */ > > -So don't leave out the ACCESS_ONCE(). > +Finally, the READ_ONCE_CTRL() includes an smp_read_barrier_depends() > +that DEC Alpha needs in order to respect control depedencies. > + > +So don't leave out the READ_ONCE_CTRL(). > > It is tempting to try to enforce ordering on identical stores on both > branches of the "if" statement as follows: > > - q = ACCESS_ONCE(a); > + q = READ_ONCE_CTRL(a); > if (q) { > barrier(); > ACCESS_ONCE(b) = p; > @@ -655,7 +658,7 @@ branches of the "if" statement as follows: > Unfortunately, current compilers will transform this as follows at high > optimization levels: > > - q = ACCESS_ONCE(a); > + q = READ_ONCE_CTRL(a); > barrier(); > ACCESS_ONCE(b) = p; /* BUG: No ordering vs. load from a!!! */ > if (q) { > @@ -685,7 +688,7 @@ memory barriers, for example, smp_store_release(): > In contrast, without explicit memory barriers, two-legged-if control > ordering is guaranteed only when the stores differ, for example: > > - q = ACCESS_ONCE(a); > + q = READ_ONCE_CTRL(a); > if (q) { > ACCESS_ONCE(b) = p; > do_something(); > @@ -694,14 +697,14 @@ ordering is guaranteed only when the stores differ, for example: > do_something_else(); > } > > -The initial ACCESS_ONCE() is still required to prevent the compiler from > -proving the value of 'a'. > +The initial READ_ONCE_CTRL() is still required to prevent the compiler > +from proving the value of 'a'. > > In addition, you need to be careful what you do with the local variable 'q', > otherwise the compiler might be able to guess the value and again remove > the needed conditional. For example: > > - q = ACCESS_ONCE(a); > + q = READ_ONCE_CTRL(a); > if (q % MAX) { > ACCESS_ONCE(b) = p; > do_something(); > @@ -714,7 +717,7 @@ If MAX is defined to be 1, then the compiler knows that (q % MAX) is > equal to zero, in which case the compiler is within its rights to > transform the above code into the following: > > - q = ACCESS_ONCE(a); > + q = READ_ONCE_CTRL(a); > ACCESS_ONCE(b) = p; > do_something_else(); > > @@ -725,7 +728,7 @@ is gone, and the barrier won't bring it back. Therefore, if you are > relying on this ordering, you should make sure that MAX is greater than > one, perhaps as follows: > > - q = ACCESS_ONCE(a); > + q = READ_ONCE_CTRL(a); > BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */ > if (q % MAX) { > ACCESS_ONCE(b) = p; > @@ -742,14 +745,15 @@ of the 'if' statement. > You must also be careful not to rely too much on boolean short-circuit > evaluation. Consider this example: > > - q = ACCESS_ONCE(a); > + q = READ_ONCE_CTRL(a); > if (a || 1 > 0) > ACCESS_ONCE(b) = 1; > > -Because the second condition is always true, the compiler can transform > -this example as following, defeating control dependency: > +Because the first condition cannot fault and the second condition is > +always true, the compiler can transform this example as following, > +defeating control dependency: > > - q = ACCESS_ONCE(a); > + q = READ_ONCE_CTRL(a); > ACCESS_ONCE(b) = 1; > > This example underscores the need to ensure that the compiler cannot > @@ -762,8 +766,8 @@ demonstrated by two related examples, with the initial values of > x and y both being zero: > > CPU 0 CPU 1 > - ===================== ===================== > - r1 = ACCESS_ONCE(x); r2 = ACCESS_ONCE(y); > + ======================= ======================= > + r1 = READ_ONCE_CTRL(x); r2 = READ_ONCE_CTRL(y); > if (r1 > 0) if (r2 > 0) > ACCESS_ONCE(y) = 1; ACCESS_ONCE(x) = 1; > > @@ -783,7 +787,8 @@ But because control dependencies do -not- provide transitivity, the above > assertion can fail after the combined three-CPU example completes. If you > need the three-CPU example to provide ordering, you will need smp_mb() > between the loads and stores in the CPU 0 and CPU 1 code fragments, > -that is, just before or just after the "if" statements. > +that is, just before or just after the "if" statements. Furthermore, > +the original two-CPU example is very fragile and should be avoided. > > These two examples are the LB and WWC litmus tests from this paper: > http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this > @@ -791,6 +796,12 @@ site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html. > > In summary: > > + (*) Control dependencies must be headed by READ_ONCE_CTRL(). > + Or, as a much less preferable alternative, interpose > + be headed by READ_ONCE() or an ACCESS_ONCE() read and must > + have smp_read_barrier_depends() between this read and the > + control-dependent write. > + > (*) Control dependencies can order prior loads against later stores. > However, they do -not- guarantee any other sort of ordering: > Not prior loads against later loads, nor prior stores against > diff --git a/include/linux/compiler.h b/include/linux/compiler.h > index 1b45e4a0519b..a57eacde2b84 100644 > --- a/include/linux/compiler.h > +++ b/include/linux/compiler.h > @@ -264,6 +264,22 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s > #define WRITE_ONCE(x, val) \ > ({ typeof(x) __val = (val); __write_once_size(&(x), &__val, sizeof(__val)); __val; }) > > +/** > + * READ_ONCE_CTRL - Read a value heading a control dependency > + * @x: The value to be read, heading the control dependency > + * > + * Control dependencies are tricky. See Documentation/memory-barriers.txt > + * for important information on how to use them. Note that in many cases, > + * use of smp_load_acquire() will be much simpler. Control dependencies > + * should be avoided except on the hottest of hotpaths. > + */ > +#define READ_ONCE_CTRL(x) \ > +({ \ > + typeof(x) __val = READ_ONCE(x); \ > + smp_read_barrier_depends(); /* Enforce control dependency. */ \ > + __val; \ > +}) > + > #endif /* __KERNEL__ */ > > #endif /* __ASSEMBLY__ */ > diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c > index eadb95ce7aac..67548d5de4cb 100644 > --- a/kernel/events/ring_buffer.c > +++ b/kernel/events/ring_buffer.c > @@ -141,7 +141,7 @@ int perf_output_begin(struct perf_output_handle *handle, > perf_output_get_handle(handle); > > do { > - tail = ACCESS_ONCE(rb->user_page->data_tail); > + tail = READ_ONCE_CTRL(rb->user_page->data_tail); > offset = head = local_read(&rb->head); > if (!rb->overwrite && > unlikely(CIRC_SPACE(head, tail, perf_data_size(rb)) < size)) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/