Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965809AbdGTWwL (ORCPT ); Thu, 20 Jul 2017 18:52:11 -0400 Received: from mail-pf0-f170.google.com ([209.85.192.170]:33892 "EHLO mail-pf0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S964882AbdGTWwI (ORCPT ); Thu, 20 Jul 2017 18:52:08 -0400 Subject: Re: [PATCH] documentation: Fix two-CPU control-dependency example To: "Paul E. McKenney" Cc: Boqun Feng , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Akira Yokosawa References: <0d60072f-3848-a2b7-ce18-e1ed317b5c09@gmail.com> <20170719174334.GH3730@linux.vnet.ibm.com> <101f5108-663e-7fa4-ac2b-e790320e4e6f@gmail.com> <20170719215602.GK3730@linux.vnet.ibm.com> <20170720013112.fmrml6abdhi2nqdt@tardis> <20170720054704.GM3730@linux.vnet.ibm.com> <20170720161152.GQ3730@linux.vnet.ibm.com> <20170720214234.GY3730@linux.vnet.ibm.com> From: Akira Yokosawa Message-ID: <55457ca1-a8db-213c-3b9c-ead441f97200@gmail.com> Date: Fri, 21 Jul 2017 07:52:03 +0900 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: <20170720214234.GY3730@linux.vnet.ibm.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 11269 Lines: 283 On 2017/07/20 14:42:34 -0700, Paul E. McKenney wrote: > On Fri, Jul 21, 2017 at 06:12:56AM +0900, Akira Yokosawa wrote: >> On 2017/07/20 09:11:52AM -0700, Paul E. McKenney wrote: >>> On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote: >>>> On 2017/07/20 14:47, Paul E. McKenney wrote: >>>>> On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote: >>>>>> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote: >>>>>>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote: >>>>>>>> On 2017/07/20 2:43, Paul E. McKenney wrote: >>>>>>>>> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote: >>>>>>>>>> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001 >>>>>>>>>> From: Akira Yokosawa >>>>>>>>>> Date: Mon, 17 Jul 2017 16:25:33 +0900 >>>>>>>>>> Subject: [PATCH] documentation: Fix two-CPU control-dependency example >>>>>>>>>> >>>>>>>>>> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering >>>>>>>>>> no-transitivity example"), the operator in "if" statement of >>>>>>>>>> the two-CPU example was modified from ">=" to ">". >>>>>>>>>> Now the example misses the point because there is no party >>>>>>>>>> who will modify "x" nor "y". So each CPU performs only the >>>>>>>>>> READ_ONCE(). >>>>>>>>>> >>>>>>>>>> The point of this example is to use control dependency for ordering, >>>>>>>>>> and the WRITE_ONCE() should always be executed. >>>>>>>>>> >>>>>>>>>> So it was correct prior to the above mentioned commit. Partial >>>>>>>>>> revert of the commit (with context adjustments regarding other >>>>>>>>>> changes thereafter) restores the point. >>>>>>>>>> >>>>>>>>>> Note that the three-CPU example demonstrating the lack of >>>>>>>>>> transitivity stands regardless of this partial revert. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Akira Yokosawa >>>>>>>>> >>>>>>>>> Hello, Akira, >>>>>>>>> >>>>>>>>> You are quite right that many compilers would generate straightforward >>>>>>>>> code for the code fragment below, and in that case, the assertion could >>>>>>>>> never trigger due to either TSO or control dependencies, depending on >>>>>>>>> the architecture this was running on. >>>>>>>>> >>>>>>>>> However, if the compiler was too smart for our good, it could figure >>>>>>>>> out that "x" and "y" only take on the values zero and one, so that >>>>>>>>> the "if" would always be taken. At that point, the compiler could >>>>>>>>> simply ignore the "if" with the result shown below. >>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> Documentation/memory-barriers.txt | 2 +- >>>>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>>>>>> >>>>>>>>>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt >>>>>>>>>> index c4ddfcd..c1ebe99 100644 >>>>>>>>>> --- a/Documentation/memory-barriers.txt >>>>>>>>>> +++ b/Documentation/memory-barriers.txt >>>>>>>>>> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of >>>>>>>>>> CPU 0 CPU 1 >>>>>>>>>> ======================= ======================= >>>>>>>>>> r1 = READ_ONCE(x); r2 = READ_ONCE(y); >>>>>>>>>> - if (r1 > 0) if (r2 > 0) >>>>>>>>>> + if (r1 >= 0) if (r2 >= 0) >>>>>>>>>> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1); >>>>>>>>>> >>>>>>>>>> assert(!(r1 == 1 && r2 == 1)); >>>>>>>>> >>>>>>>>> Original program: >>>>>>>>> >>>>>>>>> CPU 0 CPU 1 >>>>>>>>> ======================= ======================= >>>>>>>>> r1 = READ_ONCE(x); r2 = READ_ONCE(y); >>>>>>>>> if (r1 >= 0) if (r2 >= 0) >>>>>>>>> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1); >>>>>>>>> >>>>>>>>> assert(!(r1 == 1 && r2 == 1)); >>>>>>>>> >>>>>>>>> Enthusiastically optimized program: >>>>>>>>> >>>>>>>>> CPU 0 CPU 1 >>>>>>>>> ======================= ======================= >>>>>>>>> r1 = READ_ONCE(x); r2 = READ_ONCE(y); >>>>>>>>> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1); >>>>>>>>> >>>>>>>>> assert(!(r1 == 1 && r2 == 1)); >>>>>>>>> >>>>>>>>> This optimized version could trigger the assertion. >>>>>>>>> >>>>>>>>> So we do need to stick with the ">" comparison. >>>>>>>> >>>>>>>> Well but, >>>>>>>> >>>>>>>> Current example: >>>>>>>> >>>>>>>> CPU 0 CPU 1 >>>>>>>> ======================= ======================= >>>>>>>> r1 = READ_ONCE(x); r2 = READ_ONCE(y); >>>>>>>> if (r1 > 0) if (r2 > 0) >>>>>>>> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1); >>>>>>>> >>>>>>>> assert(!(r1 == 1 && r2 == 1)); >>>>>>>> >>>>>>>> Such a clever compiler might be able to prove that "x" and "y" >>>>>>>> are never modified and end up in the following: >>>>>>>> >>>>>> >>>>>> Hi Akira, >>>>>> >>>>>> I wouldn't call that compiler a clever one, it's a broken one ;-) >>>>>> >>>>>> So here is the thing: READ_ONCE() is a *volatile* load, which means the >>>>>> compiler has to generate code that actually does a load, so the values >>>>>> of r1 and r2 depend on the loads, therefore, a sane compiler will not >>>>>> optimize the if()s out because the volatile semantics of READ_ONCE(). >>>>>> Otherwise, I think we have a lot more to worry about than this case. >>>>>> >>>>>>>> CPU 0 CPU 1 >>>>>>>> ======================= ======================= >>>>>>>> r1 = READ_ONCE(x); r2 = READ_ONCE(y); >>>>>>>> >>>>>>>> assert(!(r1 == 1 && r2 == 1)); >>>>>>>> >>>>>>>> This means it is impossible to describe this example in C, >>>>>>>> doesn't it? >>>>>>> >>>>>>> That is a matter of some debate, but it has gotten increasingly more >>>>>>> difficult to get C to do one's bidding over the decades. >>>>>>> >>>>>>>> What am I missing here? >>>>>>> >>>>>>> The compiler has to work harder in your example case, so it is probably >>>>>>> just a matter of time. In the original with ">=", all the compiler needed >>>>>>> to do was look at all the assignments and the initial value. In the >>>>>>> original, it also had to do reasoning based on control dependencies >>>>>>> (which are not yet part of the C standard). >>>>>>> >>>>>>> But yes, the degree to which a compiler can optimize atomics is a matter >>>>>>> of some debate. Here is a proposal to let the developer choose: >>>>>>> >>>>>> >>>>>> Hi Paul, >>>>>> >>>>>> I know the compiler could optimize atomics in very interesting ways, but >>>>>> this case is about volatile, so I guess our case is still fine? ;-) >>>>> >>>>> Hello, Boqun, >>>>> >>>>> When I asked that question, the answer I got was "the compiler must >>>>> emit the load instruction, but is under no obligation to actually use the >>>>> value loaded". >>>> >>>> So, it sounds like the following description found in memory-barriers.txt >>>> just above the example of lack of transitivity: >>>> >>>>> However, stores are not speculated. This means that ordering -is- provided >>>>> for load-store control dependencies, as in the following example: >>>>> >>>>> q = READ_ONCE(a); >>>>> if (q) { >>>>> WRITE_ONCE(b, 1); >>>>> } >>>>> >>>>> Control dependencies pair normally with other types of barriers. >>>>> That said, please note that neither READ_ONCE() nor WRITE_ONCE() >>>>> are optional! Without the READ_ONCE(), the compiler might combine the >>>>> load from 'a' with other loads from 'a'. Without the WRITE_ONCE(), >>>>> the compiler might combine the store to 'b' with other stores to 'b'. >>>>> Either can result in highly counterintuitive effects on ordering. >>>>> >>>>> Worse yet, if the compiler is able to prove (say) that the value of >>>>> variable 'a' is always non-zero, it would be well within its rights >>>>> to optimize the original example by eliminating the "if" statement >>>>> as follows: >>>>> >>>>> q = a; >>>>> b = 1; /* BUG: Compiler and CPU can both reorder!!! */ >>>>> >>>>> So don't leave out the READ_ONCE(). >>>> >>>> does not stand if the answer needs to be taken seriously. >>>> The READ_ONCE() is not good enough to prevent the "if" statement >>>> from being eliminated. >>>> >>>> Hmm... If we do need to care about such an extreme optimization, >>>> we should not rely on any control dependency in C source code >>>> at least until the compiler gets educated about control dependency. >>>> >>>> Is there some reasonable compromise? >>> >>> Right now, what we have are the volatile loads such as from READ_ONCE() >>> and WRITE_ONCE(). My defense of them has been that they need to properly >>> handle MMIO. But not all compiler writers agree with me, so we should >>> program at least a bit defensively. Though we should probably also >>> be writing tests to verify that the compiler is doing the right thing, >>> and those litmus tests should probably include your ">=" litmus test. >>> But we should not encourage developers to rely on it quite yet. >>> >>> My current position is that compiler writers are currently unlikely to >>> make their compilers do deep reasoning around the memory model, and that >>> they are currently unlikely to do cross-translation-unit assigned-value >>> analysis (though LTO might be changing that). But given a static atomic >>> that is visible only within a translation unit, I would expect them >>> to do value-range analysis. Hence my reluctance to present the ">=" >>> variant as a pattern to follow. >> >> So if we declare "x" and "y" are not static (not file local), >> can the ">=" variant be exempted from the possible optimization? >> >> For example: >> >> extern int x, y; >> >> CPU 0 CPU 1 >> ======================= ======================= >> r1 = READ_ONCE(x); r2 = READ_ONCE(y); >> if (r1 >= 0) if (r2 >= 0) >> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1); >> >> assert(!(r1 == 1 && r2 == 1)); >> >> This looks safe to me. > > For the compilers I know about at the present time, yes. So if I respin the patch with the extern, would you still feel reluctant? Regards, Akira > > The underlying problem is that the standard says almost nothing about what > volatile does. I usually argue that it was intended to be used for MMIO, > so any optimization that would prevent a device driver from working must > be prohibited on volatiles. A lot of people really don't like volatile, > and would like to eliminate it, and these people also aren't very happy > about any attempt to better define volatile. > > Keeps things entertaining. ;-) > > Thanx, Paul > >> Thoughts? >> >> Thanks, Akira >> >>> >>> Thanx, Paul >>> >>> >>>> Thanks, Akira >>>> >>>>> >>>>> I don't happen to like that answer, by the way. ;-) >>>>> >>>>> Thanx, Paul >>>>> >>>>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html >>>>>>> >>>>>> >>>>>> Great material to wake up mind in the morning! Thanks. >>>>>> >>>>>> Regards, >>>>>> Boqun >>>>>> >>>>>>> What are your thoughts on this? >>>>>>> >>>>>>> Thanx, Paul >>>>>>> >>>>>>>> Thanks, Akira >>>>>>>> >>>>>>>>> That said, I very much welcome critical reviews of memory-barriers.txt, >>>>>>>>> so please do feel free to continue doing that! >>>>>>>>> >>>>>>>>> Thanx, Paul >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> >> > >