Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965879AbdGTVmi (ORCPT ); Thu, 20 Jul 2017 17:42:38 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:38273 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965350AbdGTVmg (ORCPT ); Thu, 20 Jul 2017 17:42:36 -0400 Date: Thu, 20 Jul 2017 14:42:34 -0700 From: "Paul E. McKenney" To: Akira Yokosawa Cc: Boqun Feng , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] documentation: Fix two-CPU control-dependency example Reply-To: paulmck@linux.vnet.ibm.com References: <0d60072f-3848-a2b7-ce18-e1ed317b5c09@gmail.com> <20170719174334.GH3730@linux.vnet.ibm.com> <101f5108-663e-7fa4-ac2b-e790320e4e6f@gmail.com> <20170719215602.GK3730@linux.vnet.ibm.com> <20170720013112.fmrml6abdhi2nqdt@tardis> <20170720054704.GM3730@linux.vnet.ibm.com> <20170720161152.GQ3730@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 17072021-0040-0000-0000-000003836E3B X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00007395; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000214; SDB=6.00890462; UDB=6.00444890; IPR=6.00670628; BA=6.00005482; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00016302; XFM=3.00000015; UTC=2017-07-20 21:42:34 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17072021-0041-0000-0000-00000777873B Message-Id: <20170720214234.GY3730@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-07-20_11:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1706020000 definitions=main-1707200334 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 11067 Lines: 275 On Fri, Jul 21, 2017 at 06:12:56AM +0900, Akira Yokosawa wrote: > On 2017/07/20 09:11:52AM -0700, Paul E. McKenney wrote: > > On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote: > >> On 2017/07/20 14:47, Paul E. McKenney wrote: > >>> On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote: > >>>> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote: > >>>>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote: > >>>>>> On 2017/07/20 2:43, Paul E. McKenney wrote: > >>>>>>> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote: > >>>>>>>> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001 > >>>>>>>> From: Akira Yokosawa > >>>>>>>> Date: Mon, 17 Jul 2017 16:25:33 +0900 > >>>>>>>> Subject: [PATCH] documentation: Fix two-CPU control-dependency example > >>>>>>>> > >>>>>>>> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering > >>>>>>>> no-transitivity example"), the operator in "if" statement of > >>>>>>>> the two-CPU example was modified from ">=" to ">". > >>>>>>>> Now the example misses the point because there is no party > >>>>>>>> who will modify "x" nor "y". So each CPU performs only the > >>>>>>>> READ_ONCE(). > >>>>>>>> > >>>>>>>> The point of this example is to use control dependency for ordering, > >>>>>>>> and the WRITE_ONCE() should always be executed. > >>>>>>>> > >>>>>>>> So it was correct prior to the above mentioned commit. Partial > >>>>>>>> revert of the commit (with context adjustments regarding other > >>>>>>>> changes thereafter) restores the point. > >>>>>>>> > >>>>>>>> Note that the three-CPU example demonstrating the lack of > >>>>>>>> transitivity stands regardless of this partial revert. > >>>>>>>> > >>>>>>>> Signed-off-by: Akira Yokosawa > >>>>>>> > >>>>>>> Hello, Akira, > >>>>>>> > >>>>>>> You are quite right that many compilers would generate straightforward > >>>>>>> code for the code fragment below, and in that case, the assertion could > >>>>>>> never trigger due to either TSO or control dependencies, depending on > >>>>>>> the architecture this was running on. > >>>>>>> > >>>>>>> However, if the compiler was too smart for our good, it could figure > >>>>>>> out that "x" and "y" only take on the values zero and one, so that > >>>>>>> the "if" would always be taken. At that point, the compiler could > >>>>>>> simply ignore the "if" with the result shown below. > >>>>>>> > >>>>>>>> --- > >>>>>>>> Documentation/memory-barriers.txt | 2 +- > >>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>>>>>>> > >>>>>>>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt > >>>>>>>> index c4ddfcd..c1ebe99 100644 > >>>>>>>> --- a/Documentation/memory-barriers.txt > >>>>>>>> +++ b/Documentation/memory-barriers.txt > >>>>>>>> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of > >>>>>>>> CPU 0 CPU 1 > >>>>>>>> ======================= ======================= > >>>>>>>> r1 = READ_ONCE(x); r2 = READ_ONCE(y); > >>>>>>>> - if (r1 > 0) if (r2 > 0) > >>>>>>>> + if (r1 >= 0) if (r2 >= 0) > >>>>>>>> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1); > >>>>>>>> > >>>>>>>> assert(!(r1 == 1 && r2 == 1)); > >>>>>>> > >>>>>>> Original program: > >>>>>>> > >>>>>>> CPU 0 CPU 1 > >>>>>>> ======================= ======================= > >>>>>>> r1 = READ_ONCE(x); r2 = READ_ONCE(y); > >>>>>>> if (r1 >= 0) if (r2 >= 0) > >>>>>>> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1); > >>>>>>> > >>>>>>> assert(!(r1 == 1 && r2 == 1)); > >>>>>>> > >>>>>>> Enthusiastically optimized program: > >>>>>>> > >>>>>>> CPU 0 CPU 1 > >>>>>>> ======================= ======================= > >>>>>>> r1 = READ_ONCE(x); r2 = READ_ONCE(y); > >>>>>>> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1); > >>>>>>> > >>>>>>> assert(!(r1 == 1 && r2 == 1)); > >>>>>>> > >>>>>>> This optimized version could trigger the assertion. > >>>>>>> > >>>>>>> So we do need to stick with the ">" comparison. > >>>>>> > >>>>>> Well but, > >>>>>> > >>>>>> Current example: > >>>>>> > >>>>>> CPU 0 CPU 1 > >>>>>> ======================= ======================= > >>>>>> r1 = READ_ONCE(x); r2 = READ_ONCE(y); > >>>>>> if (r1 > 0) if (r2 > 0) > >>>>>> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1); > >>>>>> > >>>>>> assert(!(r1 == 1 && r2 == 1)); > >>>>>> > >>>>>> Such a clever compiler might be able to prove that "x" and "y" > >>>>>> are never modified and end up in the following: > >>>>>> > >>>> > >>>> Hi Akira, > >>>> > >>>> I wouldn't call that compiler a clever one, it's a broken one ;-) > >>>> > >>>> So here is the thing: READ_ONCE() is a *volatile* load, which means the > >>>> compiler has to generate code that actually does a load, so the values > >>>> of r1 and r2 depend on the loads, therefore, a sane compiler will not > >>>> optimize the if()s out because the volatile semantics of READ_ONCE(). > >>>> Otherwise, I think we have a lot more to worry about than this case. > >>>> > >>>>>> CPU 0 CPU 1 > >>>>>> ======================= ======================= > >>>>>> r1 = READ_ONCE(x); r2 = READ_ONCE(y); > >>>>>> > >>>>>> assert(!(r1 == 1 && r2 == 1)); > >>>>>> > >>>>>> This means it is impossible to describe this example in C, > >>>>>> doesn't it? > >>>>> > >>>>> That is a matter of some debate, but it has gotten increasingly more > >>>>> difficult to get C to do one's bidding over the decades. > >>>>> > >>>>>> What am I missing here? > >>>>> > >>>>> The compiler has to work harder in your example case, so it is probably > >>>>> just a matter of time. In the original with ">=", all the compiler needed > >>>>> to do was look at all the assignments and the initial value. In the > >>>>> original, it also had to do reasoning based on control dependencies > >>>>> (which are not yet part of the C standard). > >>>>> > >>>>> But yes, the degree to which a compiler can optimize atomics is a matter > >>>>> of some debate. Here is a proposal to let the developer choose: > >>>>> > >>>> > >>>> Hi Paul, > >>>> > >>>> I know the compiler could optimize atomics in very interesting ways, but > >>>> this case is about volatile, so I guess our case is still fine? ;-) > >>> > >>> Hello, Boqun, > >>> > >>> When I asked that question, the answer I got was "the compiler must > >>> emit the load instruction, but is under no obligation to actually use the > >>> value loaded". > >> > >> So, it sounds like the following description found in memory-barriers.txt > >> just above the example of lack of transitivity: > >> > >>> However, stores are not speculated. This means that ordering -is- provided > >>> for load-store control dependencies, as in the following example: > >>> > >>> q = READ_ONCE(a); > >>> if (q) { > >>> WRITE_ONCE(b, 1); > >>> } > >>> > >>> Control dependencies pair normally with other types of barriers. > >>> That said, please note that neither READ_ONCE() nor WRITE_ONCE() > >>> are optional! Without the READ_ONCE(), the compiler might combine the > >>> load from 'a' with other loads from 'a'. Without the WRITE_ONCE(), > >>> the compiler might combine the store to 'b' with other stores to 'b'. > >>> Either can result in highly counterintuitive effects on ordering. > >>> > >>> Worse yet, if the compiler is able to prove (say) that the value of > >>> variable 'a' is always non-zero, it would be well within its rights > >>> to optimize the original example by eliminating the "if" statement > >>> as follows: > >>> > >>> q = a; > >>> b = 1; /* BUG: Compiler and CPU can both reorder!!! */ > >>> > >>> So don't leave out the READ_ONCE(). > >> > >> does not stand if the answer needs to be taken seriously. > >> The READ_ONCE() is not good enough to prevent the "if" statement > >> from being eliminated. > >> > >> Hmm... If we do need to care about such an extreme optimization, > >> we should not rely on any control dependency in C source code > >> at least until the compiler gets educated about control dependency. > >> > >> Is there some reasonable compromise? > > > > Right now, what we have are the volatile loads such as from READ_ONCE() > > and WRITE_ONCE(). My defense of them has been that they need to properly > > handle MMIO. But not all compiler writers agree with me, so we should > > program at least a bit defensively. Though we should probably also > > be writing tests to verify that the compiler is doing the right thing, > > and those litmus tests should probably include your ">=" litmus test. > > But we should not encourage developers to rely on it quite yet. > > > > My current position is that compiler writers are currently unlikely to > > make their compilers do deep reasoning around the memory model, and that > > they are currently unlikely to do cross-translation-unit assigned-value > > analysis (though LTO might be changing that). But given a static atomic > > that is visible only within a translation unit, I would expect them > > to do value-range analysis. Hence my reluctance to present the ">=" > > variant as a pattern to follow. > > So if we declare "x" and "y" are not static (not file local), > can the ">=" variant be exempted from the possible optimization? > > For example: > > extern int x, y; > > CPU 0 CPU 1 > ======================= ======================= > r1 = READ_ONCE(x); r2 = READ_ONCE(y); > if (r1 >= 0) if (r2 >= 0) > WRITE_ONCE(y, 1); WRITE_ONCE(x, 1); > > assert(!(r1 == 1 && r2 == 1)); > > This looks safe to me. For the compilers I know about at the present time, yes. The underlying problem is that the standard says almost nothing about what volatile does. I usually argue that it was intended to be used for MMIO, so any optimization that would prevent a device driver from working must be prohibited on volatiles. A lot of people really don't like volatile, and would like to eliminate it, and these people also aren't very happy about any attempt to better define volatile. Keeps things entertaining. ;-) Thanx, Paul > Thoughts? > > Thanks, Akira > > > > > Thanx, Paul > > > > > >> Thanks, Akira > >> > >>> > >>> I don't happen to like that answer, by the way. ;-) > >>> > >>> Thanx, Paul > >>> > >>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html > >>>>> > >>>> > >>>> Great material to wake up mind in the morning! Thanks. > >>>> > >>>> Regards, > >>>> Boqun > >>>> > >>>>> What are your thoughts on this? > >>>>> > >>>>> Thanx, Paul > >>>>> > >>>>>> Thanks, Akira > >>>>>> > >>>>>>> That said, I very much welcome critical reviews of memory-barriers.txt, > >>>>>>> so please do feel free to continue doing that! > >>>>>>> > >>>>>>> Thanx, Paul > >>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>> > >>> > >> > > > > >