Subject: Re: Unlock-lock questions and the Linux Kernel Memory Model
From: Daniel Lustig
To: Alan Stern, Boqun Feng
CC: "Paul E. McKenney", Andrea Parri, Luc Maranget, Jade Alglave,
    Nicholas Piggin, Peter Zijlstra, Will Deacon, David Howells,
    Palmer Dabbelt, Kernel development list
Date: Fri, 1 Dec 2017 08:17:04 -0800
Message-ID: <56ac3b6d-a898-1da0-7ccf-69a6968a923b@nvidia.com>

On 12/1/2017 7:32 AM, Alan Stern wrote:
> On Fri, 1 Dec 2017, Boqun Feng wrote:
>>> But even on a non-other-multicopy-atomic system, there has to be some
>>> synchronization between the memory controller and P1's CPU. Otherwise,
>>> how could the system guarantee that P1's smp_load_acquire would see the
>>> post-increment value of y? It seems reasonable to assume that this
>>> synchronization would also cause P1 to see x=1.
>>>
>>
>> I agree with you the "reasonable" part ;-) So basically, memory
>> controller could only do the write of AMO until P0's second write
>> propagated to the memory controller (and because of the wmb(), P0's first
>> write must be already propagated to the memory controller, too), so it
>> makes sense when the write of AMO propagated from memory controller to
>> P1, P0's first write is also propagated to P1. IOW, the write of AMO on
>> memory controller acts at least like a release.
>>
>> However, some part of myself is still a little paranoid, because to my
>> understanding, the point of AMO is to get atomic operations executing
>> as fast as possible, so maybe, AMO has some fast path for the memory
>> controller to forward a write to the CPU that issues the AMO, in that
>> way, it will become unreasonable ;-)
>
> It's true that a hardware design in the future might behave differently
> from current hardware. If that ever happens, we will need to rethink
> the situation. Maybe the designers will change their hardware to make
> it match the memory model. Or maybe the memory model will change.

Do you mean all of the above in the context of increment, etc., as
opposed to swap? ARM hardware in the wild is already documented as
forwarding SWP values to subsequent loads early, even past control
dependencies. Paul sent this link earlier in the thread:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0735r0.html

Swap is special because its store value is available to be forwarded
even before the AMO goes out to the memory controller or wherever else
it gets its load value from. Also, the case I described is an acquire
rather than a control dependency, but it's similar enough that it
doesn't seem completely unrealistic to think hardware might try to do
this.

Dan