Date: Fri, 1 Dec 2017 12:18:37 -0500 (EST)
From: Alan Stern
To: Daniel Lustig
cc: Boqun Feng, "Paul E. McKenney", Andrea Parri, Luc Maranget,
    Jade Alglave, Nicholas Piggin, Peter Zijlstra, Will Deacon,
    David Howells, Palmer Dabbelt, Kernel development list
Subject: Re: Unlock-lock questions and the Linux Kernel Memory Model
In-Reply-To: <56ac3b6d-a898-1da0-7ccf-69a6968a923b@nvidia.com>

On Fri, 1 Dec 2017, Daniel Lustig wrote:

> On 12/1/2017 7:32 AM, Alan Stern wrote:
> > On Fri, 1 Dec 2017, Boqun Feng wrote:
> >>> But even on a non-other-multicopy-atomic system, there has to be some
> >>> synchronization between the memory controller and P1's CPU. Otherwise,
> >>> how could the system guarantee that P1's smp_load_acquire would see the
> >>> post-increment value of y? It seems reasonable to assume that this
> >>> synchronization would also cause P1 to see x=1.
> >>>
> >>
> >> I agree with you the "reasonable" part ;-) So basically, memory
> >> controller could only do the write of AMO until P0's second write
> >> propagated to the memory controller (and because of the wmb(), P0's first
> >> write must be already propagated to the memory controller, too), so it
> >> makes sense when the write of AMO propagated from memory controller to
> >> P1, P0's first write is also propagated to P1. IOW, the write of AMO on
> >> memory controller acts at least like a release.
> >>
> >> However, some part of myself is still a little paranoid, because to my
> >> understanding, the point of AMO is to get atomic operations executing
> >> as fast as possible, so maybe, AMO has some fast path for the memory
> >> controller to forward a write to the CPU that issues the AMO, in that
> >> way, it will become unreasonable ;-)
> >
> > It's true that a hardware design in the future might behave differently
> > from current hardware. If that ever happens, we will need to rethink
> > the situation. Maybe the designers will change their hardware to make
> > it match the memory model. Or maybe the memory model will change.
>
> Do you mean all of the above in the context of increment etc, as opposed
> to swap? ARM hardware in the wild is already documented as forwarding
> SWP values to subsequent loads early, even past control dependencies.
> Paul sent this link earlier in the thread.
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0735r0.html
>
> The reason swap is special is because its store value is available to be
> forwarded even before the AMO goes out to the memory controller or
> wherever else it gets its load value from.

I believe the current intention for herd is as follows:

	xchg() and similar RMW operations do not generate an internal
	dependency;

	cmpxchg() and similar RMW operations generate an internal
	control dependency;

	atomic_add() and similar RMW operations generate an internal
	data dependency.

If herd adds support for saturating operations, they will generate at
least a data dependency and maybe also a control dependency.

Alan

> Also, the case I described is an acquire rather than a control
> dependency, but it's similar enough that it doesn't seem completely
> unrealistic to think hardware might try to do this.
>
> Dan
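
As a rough illustration of the three cases listed for herd above, the
sketch below writes each RMW operation as an explicit load followed by a
store, so that the relationship between the loaded value and the store is
visible. This is only a single-threaded model under stated assumptions: the
function names (model_xchg, model_cmpxchg, model_add_return, check_models)
are hypothetical, nothing here is atomic, and it is not how herd or the
kernel implements these operations.

```c
#include <assert.h>

/* xchg-like: the value to store is known before the load completes,
 * so the store does not depend on the loaded value at all. */
static int model_xchg(int *p, int newval)
{
	int old = *p;		/* load half of the RMW */
	*p = newval;		/* store half: value independent of 'old' */
	return old;
}

/* cmpxchg-like: whether the store happens at all is decided by the
 * loaded value (a control dependency); the stored value is not
 * computed from it. */
static int model_cmpxchg(int *p, int expected, int newval)
{
	int old = *p;
	if (old == expected)	/* branch on the loaded value */
		*p = newval;
	return old;
}

/* atomic_add-like: the stored value is computed from the loaded
 * value (a data dependency). */
static int model_add_return(int *p, int delta)
{
	int old = *p;
	int sum = old + delta;	/* store value derived from 'old' */
	*p = sum;
	return sum;
}

/* Hypothetical self-check exercising the three models in sequence. */
void check_models(void)
{
	int v = 1;

	assert(model_xchg(&v, 7) == 1);
	assert(v == 7);

	assert(model_cmpxchg(&v, 3, 9) == 7);	/* mismatch: no store */
	assert(v == 7);
	assert(model_cmpxchg(&v, 7, 9) == 7);	/* match: store happens */
	assert(v == 9);

	assert(model_add_return(&v, 2) == 11);
	assert(v == 11);
}
```

Only in model_xchg() is the stored value available without waiting for the
load, which matches the point quoted above about why swap can be forwarded
early. Calling check_models() from a small main() exercises all three.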