Subject: Re: Unlock-lock questions and the Linux Kernel Memory Model
To: Alan Stern <stern@rowland.harvard.edu>,
        Boqun Feng <boqun.feng@gmail.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
        Andrea Parri <parri.andrea@gmail.com>,
        Luc Maranget <luc.maranget@inria.fr>,
        Jade Alglave <j.alglave@ucl.ac.uk>,
        Nicholas Piggin <npiggin@gmail.com>,
        Peter Zijlstra <peterz@infradead.org>,
        Will Deacon <will.deacon@arm.com>, David Howells <dhowells@redhat.com>,
        Palmer Dabbelt <palmer@dabbelt.com>,
        Kernel development list <linux-kernel@vger.kernel.org>
References: <Pine.LNX.4.44L0.1712011009560.1361-100000@iolanthe.rowland.org>
From: Daniel Lustig <dlustig@nvidia.com>
Message-ID: <56ac3b6d-a898-1da0-7ccf-69a6968a923b@nvidia.com>
Date: Fri, 1 Dec 2017 08:17:04 -0800
In-Reply-To: <Pine.LNX.4.44L0.1712011009560.1361-100000@iolanthe.rowland.org>

On 12/1/2017 7:32 AM, Alan Stern wrote:
> On Fri, 1 Dec 2017, Boqun Feng wrote:
>>> But even on a non-other-multicopy-atomic system, there has to be some 
>>> synchronization between the memory controller and P1's CPU.  Otherwise, 
>>> how could the system guarantee that P1's smp_load_acquire would see the 
>>> post-increment value of y?  It seems reasonable to assume that this 
>>> synchronization would also cause P1 to see x=1.
>>>
>>
>> I agree with you on the "reasonable" part ;-) So basically, the memory
>> controller can only do the write of the AMO after P0's second write has
>> propagated to the memory controller (and because of the wmb(), P0's first
>> write must already have propagated to the memory controller, too), so it
>> makes sense that when the write of the AMO propagates from the memory
>> controller to P1, P0's first write has also propagated to P1. IOW, the
>> write of the AMO on the memory controller acts at least like a release.
>>
>> However, some part of me is still a little paranoid, because to my
>> understanding, the point of AMOs is to get atomic operations executing
>> as fast as possible. So maybe an AMO has some fast path by which the
>> memory controller forwards a write to the CPU that issued the AMO, in
>> which case the assumption becomes unreasonable ;-)
> 
> It's true that a hardware design in the future might behave differently 
> from current hardware.  If that ever happens, we will need to rethink 
> the situation.  Maybe the designers will change their hardware to make 
> it match the memory model.  Or maybe the memory model will change.

Do you mean all of the above in the context of increment etc., as opposed
to swap?  ARM hardware in the wild is already documented as forwarding
SWP values to subsequent loads early, even past control dependencies.
Paul sent this link earlier in the thread:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0735r0.html

Swap is special because its store value is available to be forwarded
even before the AMO goes out to the memory controller or wherever else
it gets its load value from.  An increment, by contrast, cannot know its
store value until its load value comes back.

Also, the case I described uses an acquire rather than a control
dependency, but it's similar enough that it doesn't seem completely
unrealistic to think hardware might try the same early forwarding there.
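
For concreteness, here is a rough userspace C11 sketch of the
swap-then-acquire shape.  It is an illustration under assumptions, not
the litmus test from earlier in the thread: the variable names, the
release store standing in for the wmb() plus plain store, and the
count_forbidden() helper are all hypothetical.  The outcome of interest
is P1's swap reading P0's store to y, the acquire load reading the
swap's own value, and the later load of x still returning 0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared variables for the sketch. */
static atomic_int x, y;
/* P1's observations, read by the parent only after pthread_join(). */
static int r_swap, r_y, r_x;

/* P0: write x, then publish with a release store to y. */
static void *p0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	atomic_store_explicit(&y, 1, memory_order_release);
	return NULL;
}

/*
 * P1: relaxed swap on y, then an acquire load of y, then a load of x.
 * The swap's store value (2) is known before its load value returns,
 * which is what would make early forwarding to the acquire load possible.
 */
static void *p1(void *arg)
{
	(void)arg;
	r_swap = atomic_exchange_explicit(&y, 2, memory_order_relaxed);
	r_y = atomic_load_explicit(&y, memory_order_acquire);
	r_x = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/* Run the two threads 'trials' times; count the outcome of interest. */
int count_forbidden(int trials)
{
	int bad = 0;

	for (int i = 0; i < trials; i++) {
		pthread_t t0, t1;
		int rc;

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		rc = pthread_create(&t0, NULL, p0, NULL);
		assert(rc == 0);
		rc = pthread_create(&t1, NULL, p1, NULL);
		assert(rc == 0);
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);

		/* Swap read P0's y, acquire read the swap, yet x looked stale. */
		if (r_swap == 1 && r_y == 2 && r_x == 0)
			bad++;
	}
	return bad;
}
```

In C11 terms the swap continues the release sequence headed by P0's
store to y, so the acquire load that reads the swap's value
synchronizes with P0 and count_forbidden() should return 0.  A stress
loop like this cannot prove anything about a given machine; it only
pins down which outcome the forwarding question is about.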

Dan