From: Peter Zijlstra
To: Linus Torvalds
Cc: Waiman Long, Will Deacon, Ingo Molnar, Oleg Nesterov,
	Linux Kernel Mailing List, Paul McKenney, Boqun Feng,
	Jonathan Corbet, Michal Hocko, David Howells, Paul Turner
Subject: Re: [PATCH 3/4] locking: Introduce smp_cond_acquire()
Date: Sat, 5 Dec 2015 00:43:37 +0100
Message-ID: <20151204234337.GL17308@twins.programming.kicks-ass.net>

On Fri, Dec 04, 2015 at 02:05:49PM -0800, Linus Torvalds wrote:
> Of course, I suspect we should not use READ_ONCE(), but some
> architecture-overridable version that just defaults to READ_ONCE().
> Same goes for that "smp_rmb()". Because maybe some architectures will
> just prefer an explicit acquire, and I suspect we do *not* want
> architectures having to recreate and override that crazy loop.
>
> How much does this all actually end up mattering, btw?

Not sure, I'll have to let Will quantify that. But the whole reason
we're having this discussion is that ARM64 has a MONITOR/MWAIT-like
construct that they'd like to use to avoid the spinning.

Of course, in order to use that, they _have_ to override the crazy
loop.

Now, Will and I spoke earlier today, and the version proposed by me
(and by you, since it is roughly similar) will indeed work for them,
in that it would allow them to rewrite the thing something like:

	typeof(*ptr) VAL;

	for (;;) {
		VAL = READ_ONCE(*ptr);
		if (expr)
			break;
		cmp_and_wait(ptr, VAL);
	}

Where their cmp_and_wait(ptr, val) looks a little like:

	unsigned int tmp;

	asm volatile(
	/* load-exclusive; arms the exclusive monitor for *ptr */
	"	ldxr	%w0, %1		\n"
	/* did the value change since we last looked? */
	"	sub	%w0, %w0, %w2	\n"
	"	cbnz	%w0, 1f		\n"
	/* unchanged; wait-for-event */
	"	wfe			\n"
	"1:"
	: "=&r" (tmp)
	: "Q" (*ptr), "r" (val));

(excuse my poor ARM asm foo)

This sets up the exclusive monitor with a load-exclusive, compares
whether the loaded value still matches what we previously saw, and if
so issues a wait-for-event. WFE will wake on any event that would
also have invalidated a subsequent stxr (store-exclusive).

ARM64 can also of course choose to use a load-acquire instead of the
READ_ONCE(), or still issue the smp_rmb(); dunno what is best for
them. The load-acquire would (potentially) be issued multiple times,
vs. the rmb only once. I'll let Will sort that out.

In any case, WFE is both better for power consumption and lowers the
cacheline pressure; i.e. nobody keeps trying to pull the line into
shared state all the time while you're trying to get a store done.
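
For reference, the "crazy loop" being overridden, i.e. the generic
smp_cond_acquire() fallback, is roughly the below. This is a sketch of
the approach in the patch (spin with READ_ONCE() until the condition
holds, then upgrade the control dependency to ACQUIRE ordering with an
smp_rmb()), not a verbatim quote of the patch text:

	/*
	 * Sketch of the generic fallback: spin until @cond is true,
	 * then upgrade the control dependency to ACQUIRE ordering.
	 */
	#define smp_cond_acquire(cond)	do {			\
		while (!(cond))					\
			cpu_relax();				\
		smp_rmb(); /* ctrl + rmb := acquire */		\
	} while (0)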
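
And a sketch of how the pieces could compose on arm64, following the
(ptr, expr) form of the rewrite above rather than the generic (cond)
form. cmp_and_wait() is the hypothetical WFE helper sketched earlier,
not an existing kernel interface, and whether the trailing smp_rmb()
stays or the loads become load-acquire (making it redundant) is
exactly the open question left for Will:

	/*
	 * Hypothetical arm64 override; cmp_and_wait() is the WFE
	 * helper from above, not an existing kernel interface.
	 */
	#define smp_cond_acquire(ptr, expr)	do {		\
		typeof(*(ptr)) VAL;				\
		for (;;) {					\
			VAL = READ_ONCE(*(ptr));		\
			if (expr)				\
				break;				\
			cmp_and_wait(ptr, VAL);			\
		}						\
		smp_rmb(); /* or use load-acquire above */	\
	} while (0)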