Subject: Re: [PATCH] documentation: Fix two-CPU control-dependency example
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>, linux-doc@vger.kernel.org,
        linux-kernel@vger.kernel.org, Akira Yokosawa <akiyks@gmail.com>
References: <0d60072f-3848-a2b7-ce18-e1ed317b5c09@gmail.com>
 <20170719174334.GH3730@linux.vnet.ibm.com>
 <101f5108-663e-7fa4-ac2b-e790320e4e6f@gmail.com>
 <20170719215602.GK3730@linux.vnet.ibm.com>
 <20170720013112.fmrml6abdhi2nqdt@tardis>
 <20170720054704.GM3730@linux.vnet.ibm.com>
 <bb6050e9-91a8-f04c-809c-7a22aed232a8@gmail.com>
 <20170720161152.GQ3730@linux.vnet.ibm.com>
 <d2d26fb2-dd01-9532-0cfc-3c90bc024cb7@gmail.com>
 <20170720214234.GY3730@linux.vnet.ibm.com>
From: Akira Yokosawa <akiyks@gmail.com>
Message-ID: <55457ca1-a8db-213c-3b9c-ead441f97200@gmail.com>
Date: Fri, 21 Jul 2017 07:52:03 +0900
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101
 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <20170720214234.GY3730@linux.vnet.ibm.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 11269
Lines: 283

On 2017/07/20 14:42:34 -0700, Paul E. McKenney wrote:
> On Fri, Jul 21, 2017 at 06:12:56AM +0900, Akira Yokosawa wrote:
>> On 2017/07/20 09:11:52AM -0700, Paul E. McKenney wrote:
>>> On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote:
>>>> On 2017/07/20 14:47, Paul E. McKenney wrote:
>>>>> On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:
>>>>>> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:
>>>>>>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:
>>>>>>>> On 2017/07/20 2:43, Paul E. McKenney wrote:
>>>>>>>>> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:
>>>>>>>>>> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001
>>>>>>>>>> From: Akira Yokosawa <akiyks@gmail.com>
>>>>>>>>>> Date: Mon, 17 Jul 2017 16:25:33 +0900
>>>>>>>>>> Subject: [PATCH] documentation: Fix two-CPU control-dependency example
>>>>>>>>>>
>>>>>>>>>> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering
>>>>>>>>>> no-transitivity example"), the operator in "if" statement of
>>>>>>>>>> the two-CPU example was modified from ">=" to ">".
>>>>>>>>>> Now the example misses the point because there is no party
>>>>>>>>>> who will modify "x" nor "y". So each CPU performs only the
>>>>>>>>>> READ_ONCE().
>>>>>>>>>>
>>>>>>>>>> The point of this example is to use control dependency for ordering,
>>>>>>>>>> and the WRITE_ONCE() should always be executed.
>>>>>>>>>>
>>>>>>>>>> So it was correct prior to the above mentioned commit. Partial
>>>>>>>>>> revert of the commit (with context adjustments regarding other
>>>>>>>>>> changes thereafter) restores the point.
>>>>>>>>>>
>>>>>>>>>> Note that the three-CPU example demonstrating the lack of
>>>>>>>>>> transitivity stands regardless of this partial revert.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
>>>>>>>>>
>>>>>>>>> Hello, Akira,
>>>>>>>>>
>>>>>>>>> You are quite right that many compilers would generate straightforward
>>>>>>>>> code for the code fragment below, and in that case, the assertion could
>>>>>>>>> never trigger due to either TSO or control dependencies, depending on
>>>>>>>>> the architecture this was running on.
>>>>>>>>>
>>>>>>>>> However, if the compiler was too smart for our good, it could figure
>>>>>>>>> out that "x" and "y" only take on the values zero and one, so that
>>>>>>>>> the "if" would always be taken.  At that point, the compiler could
>>>>>>>>> simply ignore the "if" with the result shown below.
>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>>  Documentation/memory-barriers.txt | 2 +-
>>>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
>>>>>>>>>> index c4ddfcd..c1ebe99 100644
>>>>>>>>>> --- a/Documentation/memory-barriers.txt
>>>>>>>>>> +++ b/Documentation/memory-barriers.txt
>>>>>>>>>> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of
>>>>>>>>>>  	CPU 0                     CPU 1
>>>>>>>>>>  	=======================   =======================
>>>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
>>>>>>>>>> -	if (r1 > 0)               if (r2 > 0)
>>>>>>>>>> +	if (r1 >= 0)              if (r2 >= 0)
>>>>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
>>>>>>>>>>
>>>>>>>>>>  	assert(!(r1 == 1 && r2 == 1));
>>>>>>>>>
>>>>>>>>> Original program:
>>>>>>>>>
>>>>>>>>>  	CPU 0                     CPU 1
>>>>>>>>>  	=======================   =======================
>>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
>>>>>>>>> 	if (r1 >= 0)              if (r2 >= 0)
>>>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
>>>>>>>>>
>>>>>>>>>  	assert(!(r1 == 1 && r2 == 1));
>>>>>>>>>
>>>>>>>>> Enthusiastically optimized program:
>>>>>>>>>
>>>>>>>>>  	CPU 0                     CPU 1
>>>>>>>>>  	=======================   =======================
>>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
>>>>>>>>>  	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
>>>>>>>>>
>>>>>>>>>  	assert(!(r1 == 1 && r2 == 1));
>>>>>>>>>
>>>>>>>>> This optimized version could trigger the assertion.
>>>>>>>>>
>>>>>>>>> So we do need to stick with the ">" comparison.
>>>>>>>>
>>>>>>>> Well but,
>>>>>>>>
>>>>>>>> Current example:
>>>>>>>>
>>>>>>>>  	CPU 0                     CPU 1
>>>>>>>>  	=======================   =======================
>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
>>>>>>>> 	if (r1 > 0)               if (r2 > 0)
>>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
>>>>>>>>
>>>>>>>> 	assert(!(r1 == 1 && r2 == 1));
>>>>>>>>
>>>>>>>> Such a clever compiler might be able to prove that "x" and "y"
>>>>>>>> are never modified and end up in the following:
>>>>>>>>
>>>>>>
>>>>>> Hi Akira,
>>>>>>
>>>>>> I wouldn't call that compiler a clever one, it's a broken one ;-)
>>>>>>
>>>>>> So here is the thing: READ_ONCE() is a *volatile* load, which means the
>>>>>> compiler has to generate code that actually does a load, so the values
>>>>>> of r1 and r2 depend on the loads, therefore, a sane compiler will not 
>>>>>> optimize the if()s out because the volatile semantics of READ_ONCE().
>>>>>> Otherwise, I think we have a lot more to worry about than this case.
>>>>>>
>>>>>>>>  	CPU 0                     CPU 1
>>>>>>>>  	=======================   =======================
>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
>>>>>>>>
>>>>>>>> 	assert(!(r1 == 1 && r2 == 1));
>>>>>>>>
>>>>>>>> This means it is impossible to describe this example in C,
>>>>>>>> doesn't it?
>>>>>>>
>>>>>>> That is a matter of some debate, but it has gotten increasingly more
>>>>>>> difficult to get C to do one's bidding over the decades.
>>>>>>>
>>>>>>>> What am I missing here?
>>>>>>>
>>>>>>> The compiler has to work harder in your example case, so it is probably
>>>>>>> just a matter of time.  In the original with ">=", all the compiler needed
>>>>>>> to do was look at all the assignments and the initial value.  In the
>>>>>>> original, it also had to do reasoning based on control dependencies
>>>>>>> (which are not yet part of the C standard).
>>>>>>>
>>>>>>> But yes, the degree to which a compiler can optimize atomics is a matter
>>>>>>> of some debate.  Here is a proposal to let the developer choose:
>>>>>>>
>>>>>>
>>>>>> Hi Paul,
>>>>>>
>>>>>> I know the compiler could optimize atomics in very interesting ways, but
>>>>>> this case is about volatile, so I guess our case is still fine? ;-)
>>>>>
>>>>> Hello, Boqun,
>>>>>
>>>>> When I asked that question, the answer I got was "the compiler must
>>>>> emit the load instruction, but is under no obligation to actually use the
>>>>> value loaded".
>>>>
>>>> So, it sounds like the following description found in memory-barriers.txt
>>>> just above the example of lack of transitivity:
>>>>
>>>>> However, stores are not speculated.  This means that ordering -is- provided
>>>>> for load-store control dependencies, as in the following example:
>>>>>
>>>>> 	q = READ_ONCE(a);
>>>>> 	if (q) {
>>>>> 		WRITE_ONCE(b, 1);
>>>>> 	}
>>>>>
>>>>> Control dependencies pair normally with other types of barriers.
>>>>> That said, please note that neither READ_ONCE() nor WRITE_ONCE()
>>>>> are optional! Without the READ_ONCE(), the compiler might combine the
>>>>> load from 'a' with other loads from 'a'.  Without the WRITE_ONCE(),
>>>>> the compiler might combine the store to 'b' with other stores to 'b'.
>>>>> Either can result in highly counterintuitive effects on ordering.
>>>>>
>>>>> Worse yet, if the compiler is able to prove (say) that the value of
>>>>> variable 'a' is always non-zero, it would be well within its rights
>>>>> to optimize the original example by eliminating the "if" statement
>>>>> as follows:
>>>>>
>>>>> 	q = a;
>>>>> 	b = 1;  /* BUG: Compiler and CPU can both reorder!!! */
>>>>>
>>>>> So don't leave out the READ_ONCE().
>>>>
>>>> does not stand if the answer needs to be taken seriously.
>>>> The READ_ONCE() is not good enough to prevent the "if" statement
>>>> from being eliminated.
>>>>
>>>> Hmm... If we do need to care about such an extreme optimization,
>>>> we should not rely on any control dependency in C source code
>>>> at least until the compiler gets educated about control dependency.
>>>>
>>>> Is there some reasonable compromise?
>>>
>>> Right now, what we have are the volatile loads such as from READ_ONCE()
>>> and WRITE_ONCE().  My defense of them has been that they need to properly
>>> handle MMIO.  But not all compiler writers agree with me, so we should
>>> program at least a bit defensively.  Though we should probably also
>>> be writing tests to verify that the compiler is doing the right thing,
>>> and those litmus tests should probably include your ">=" litmus test.
>>> But we should not encourage developers to rely on it quite yet.
>>>
>>> My current position is that compiler writers are currently unlikely to
>>> make their compilers do deep reasoning around the memory model, and that
>>> they are currently unlikely to do cross-translation-unit assigned-value
>>> analysis (though LTO might be changing that).  But given a static atomic
>>> that is visible only within a translation unit, I would expect them
>>> to do value-range analysis.  Hence my reluctance to present the ">="
>>> variant as a pattern to follow.
>>
>> So if we declare "x" and "y" are not static (not file local),
>> can the ">=" variant be exempted from the possible optimization?
>>
>> For example:
>>
>> 	extern int x, y;
>>
>> 	CPU 0                     CPU 1
>> 	=======================   =======================
>> 	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
>> 	if (r1 >= 0)              if (r2 >= 0)
>> 	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
>>
>> 	assert(!(r1 == 1 && r2 == 1));
>>
>> This looks safe to me.
> 
> For the compilers I know about at the present time, yes.

So if I respin the patch with the extern, would you still feel reluctant?

             Regards, Akira

> 
> The underlying problem is that the standard says almost nothing about what
> volatile does.  I usually argue that it was intended to be used for MMIO,
> so any optimization that would prevent a device driver from working must
> be prohibited on volatiles.  A lot of people really don't like volatile,
> and would like to eliminate it, and these people also aren't very happy
> about any attempt to better define volatile.
> 
> Keeps things entertaining.  ;-)
> 
> 							Thanx, Paul
> 
>> Thoughts?
>>
>>             Thanks, Akira
>>
>>>
>>> 							Thanx, Paul
>>>
>>>
>>>>            Thanks, Akira
>>>>
>>>>>
>>>>> I don't happen to like that answer, by the way.  ;-)
>>>>>
>>>>> 							Thanx, Paul
>>>>>
>>>>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html
>>>>>>>
>>>>>>
>>>>>> Great material to wake up mind in the morning! Thanks.
>>>>>>
>>>>>> Regards,
>>>>>> Boqun
>>>>>>
>>>>>>> What are your thoughts on this?
>>>>>>>
>>>>>>> 							Thanx, Paul
>>>>>>>
>>>>>>>>             Thanks, Akira
>>>>>>>>
>>>>>>>>> That said, I very much welcome critical reviews of memory-barriers.txt,
>>>>>>>>> so please do feel free to continue doing that!
>>>>>>>>>
>>>>>>>>> 							Thanx, Paul
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 
>