Date: Wed, 12 Dec 2018 09:07:06 -0800
From: "Paul E. McKenney"
To: Alan Stern
Cc: David Goldblatt, mathieu.desnoyers@efficios.com, Florian Weimer,
    triegel@redhat.com, libc-alpha@sourceware.org,
    andrea.parri@amarulasolutions.com, will.deacon@arm.com,
    peterz@infradead.org, boqun.feng@gmail.com, npiggin@gmail.com,
    dhowells@redhat.com, j.alglave@ucl.ac.uk, luc.maranget@inria.fr,
    akiyks@gmail.com, dlustig@nvidia.com, linux-arch@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Linux: Implement membarrier function
Reply-To: paulmck@linux.ibm.com
References: <20181211190801.GO4170@linux.ibm.com> <20181211212204.GR4170@linux.ibm.com>
In-Reply-To: <20181211212204.GR4170@linux.ibm.com>
Message-Id: <20181212170706.GA17397@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Dec 11, 2018 at 01:22:04PM -0800, Paul E. McKenney wrote:
> On Tue, Dec 11, 2018 at 03:09:33PM -0500, Alan Stern wrote:
> > On Tue, 11 Dec 2018, Paul E. McKenney wrote:
> >
> > > > Rewriting the litmus test in these terms gives:
> > > >
> > > >     P0      P1      P2      P3      P4      P5
> > > >     Wa=2    Wb=2    Wc=2    [mb23]  [mb14]  [mb05]
> > > >     mb0s    mb1s    mb2s    Wd=2    We=2    Wf=2
> > > >     mb0e    mb1e    mb2e    Re=0    Rf=0    Ra=0
> > > >     Rb=0    Rc=0    Rd=0
> > > >
> > > > Here the brackets in "[mb23]", "[mb14]", and "[mb05]" mean that the
> > > > positions of these barriers in their respective threads' program
> > > > orderings is undetermined; they need not come at the top as shown.
> > > >
> > > > (Also, in case David is unfamiliar with it, the "Wa=2" notation is
> > > > shorthand for "Write 2 to a" and "Rb=0" is short for "Read 0 from b".)
> > > >
> > > > Finally, here are a few facts which may be well known and obvious, but
> > > > I'll state them anyway:
> > > >
> > > >     A CPU cannot reorder instructions across a memory barrier.
> > > >     If x is po-after a barrier then x executes after the barrier
> > > >     is finished.
> > > >
> > > >     If a store is po-before a barrier then the store propagates
> > > >     to every CPU before the barrier finishes.
> > > >
> > > >     If a store propagates to some CPU before a load on that CPU
> > > >     reads from the same location, then the load will obtain the
> > > >     value from that store or a co-later store. This implies that
> > > >     if a load obtains a value co-earlier than some store then the
> > > >     load must have executed before the store propagated to the
> > > >     load's CPU.
> > > >
> > > > The proof consists of three main stages, each requiring three steps.
> > > > Using the facts that b - f are all read as 0, I'll show that P1
> > > > executes Rc before P3 executes Re, then that P0 executes Rb before P4
> > > > executes Rf, and lastly that P5's Ra must obtain 2, not 0. This will
> > > > demonstrate that the litmus test is not allowed.
> > > >
> > > >     1.  Suppose that mb23 ends up coming po-later than Wd in P3.
> > > >         Then we would have:
> > > >
> > > >             Wd propagates to P2 < mb23 < mb2e < Rd,
> > > >
> > > >         and so Rd would obtain 2, not 0. Hence mb23 must come
> > > >         po-before Wd (as shown in the listing): mb23 < Wd.
> > > >
> > > >     2.  Since mb23 therefore occurs po-before Re and instructions
> > > >         cannot be reordered across barriers, mb23 < Re.
> > > >
> > > >     3.  Since Rc obtains 0, we must have:
> > > >
> > > >             Rc < Wc propagates to P1 < mb2s < mb23 < Re.
> > > >
> > > >         Thus Rc < Re.
> > > >
> > > >     4.  Suppose that mb14 ends up coming po-later than We in P4.
> > > >         Then we would have:
> > > >
> > > >             We propagates to P3 < mb14 < mb1e < Rc < Re,
> > > >
> > > >         and so Re would obtain 2, not 0. Hence mb14 must come
> > > >         po-before We (as shown in the listing): mb14 < We.
> > > >
> > > >     5.  Since mb14 therefore occurs po-before Rf and instructions
> > > >         cannot be reordered across barriers, mb14 < Rf.
> > > >
> > > >     6.  Since Rb obtains 0, we must have:
> > > >
> > > >             Rb < Wb propagates to P0 < mb1s < mb14 < Rf.
> > > >
> > > >         Thus Rb < Rf.
> > > >
> > > >     7.  Suppose that mb05 ends up coming po-later than Wf in P5.
> > > >         Then we would have:
> > > >
> > > >             Wf propagates to P4 < mb05 < mb0e < Rb < Rf,
> > > >
> > > >         and so Rf would obtain 2, not 0. Hence mb05 must come
> > > >         po-before Wf (as shown in the listing): mb05 < Wf.
> > > >
> > > >     8.  Since mb05 therefore occurs po-before Ra and instructions
> > > >         cannot be reordered across barriers, mb05 < Ra.
> > > >
> > > >     9.  Now we have:
> > > >
> > > >             Wa propagates to P5 < mb0s < mb05 < Ra,
> > > >
> > > >         and so Ra must obtain 2, not 0. QED.
> > >
> > > Like this, then, with maximal reordering of P3-P5's reads?
> > >
> > >     P0      P1      P2      P3      P4      P5
> > >     Wa=2
> > >     mb0s
> > >                                             [mb05]
> > >     mb0e                                    Ra=0
> > >     Rb=0    Wb=2
> > >             mb1s
> > >                                     [mb14]
> > >             mb1e                    Rf=0
> > >             Rc=0    Wc=2                    Wf=2
> > >                     mb2s
> > >                             [mb23]
> > >                     mb2e    Re=0
> > >                     Rd=0            We=2
> > >                             Wd=2
> >
> > Yes, that's right.
> > This shows how P5's Ra must obtain 2 instead of 0.
> >
> > > But don't the sys_membarrier() calls affect everyone, especially given
> > > the shared-variable communication?
> >
> > They do, but the other effects are irrelevant for this proof.
>
> If I understand correctly, the shared-variable communication within
> sys_membarrier() is included in your proof in the form of ordering
> between memory barriers in the mainline sys_membarrier() code and
> in the IPI handlers.
> >
> > > If so, why wouldn't this more strict
> > > variant hold?
> > >
> > >     P0      P1      P2      P3      P4      P5
> > >     Wa=2
> > >     mb0s
> > >                             [mb05]  [mb05]  [mb05]
> >
> > You have misunderstood the naming scheme. mb05 is the barrier injected
> > by P0's sys_membarrier call into P5. So the three barriers above
> > should be named "mb03", "mb04", and "mb05". And you left out mb01 and
> > mb02.
>
> The former is a copy-and-paste error on my part, the latter was
> intentional because the IPIs among P0, P1, and P2 don't seem to
> strengthen the ordering.
>
> > >     mb0e
> > >     Rb=0    Wb=2
> > >             mb1s
> > >                             [mb14]  [mb14]  [mb14]
> > >             mb1e
> > >             Rc=0    Wc=2
> > >                     mb2s
> > >                             [mb23]  [mb23]  [mb23]
> > >                     mb2e    Re=0    Rf=0    Ra=0
> > >                     Rd=0            We=2    Wf=2
> > >                             Wd=2
> >
> > Yes, this does hold. But since it doesn't affect the end result,
> > there's no point in mentioning all those other barriers.
> >
> > > In which case, wouldn't this cycle be forbidden even if it had only one
> > > sys_membarrier() call?
> >
> > No, it wouldn't. I don't understand why you might think it would.
>
> Because I hadn't yet thought of the scenario I showed below.
>
> > This is just like RCU, if you imagine a tiny critical section between
> > each adjacent pair of instructions. You wouldn't expect RCU to enforce
> > ordering among six CPUs with only one synchronize_rcu call.
>
> Yes, I do now agree in light of the scenario shown below.
>
> > > Ah, but the IPIs are not necessarily synchronized across the CPUs,
> > > so that the following could happen:
> > >
> > >     P0      P1      P2      P3      P4      P5
> > >     Wa=2
> > >     mb0s
> > >                             [mb05]  [mb05]  [mb05]
> > >     mb0e                                    Ra=0
> > >     Rb=0    Wb=2
> > >             mb1s
> > >                             [mb14]  [mb14]
> > >                                     Rf=0
> > >                                             Wf=2
> > >                                             [mb14]
> > >             mb1e
> > >             Rc=0    Wc=2
> > >                     mb2s
> > >                             [mb23]
> > >                             Re=0
> > >                                     We=2
> > >                                     [mb23]  [mb23]
> > >                     mb2e
> > >                     Rd=0
> > >                             Wd=2
> >
> > Yes it could. But even in this execution you would end up with Ra=2
> > instead of Ra=0.
>
> Agreed. Or I should have said that the above execution is forbidden,
> either way.
>
> > > I guess in light of this post in 2001, I really don't have an excuse,
> > > do I? ;-)
> > >
> > > https://lists.gt.net/linux/kernel/223555
> > >
> > > Or am I still missing something here?
> >
> > You tell me...
>
> I think I am on board. ;-)

And more to the point, here is a three-process variant showing a cycle
that is permitted:

    P0      P1      P2
    Wa=2    Wb=2    Wc=2
    mb0s    [mb01]  [mb02]
    mb0e
    Rb=0    Rc=0    Ra=0

As can be seen by reordering it as follows:

    P0      P1      P2
                    Ra=0
    Wa=2
    mb0s
            [mb01]
            Rc=0    Wc=2
                    [mb02]
    mb0e
    Rb=0    Wb=2

Make sense?

							Thanx, Paul
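
The three-process reordering can be checked mechanically with a small
replay. What follows is only a rough sketch under stated assumptions, not
a model of sys_membarrier() or of the kernel memory model: it assumes all
variables start at 0, that the top-to-bottom order is the order in which
each event takes effect on one flat memory, and that each thread's program
order is "write, then read" as in the listing. The names REORDERED,
reads_match, injected_inside_window, and reordering_respects_barriers are
hypothetical.

```python
# Hypothetical encoding of the reordered three-process execution.
# Each entry is (thread, event); the list order is the assumed global order.
REORDERED = [
    ("P2", "Ra=0"),
    ("P0", "Wa=2"),
    ("P0", "mb0s"),
    ("P1", "[mb01]"),
    ("P1", "Rc=0"),
    ("P2", "Wc=2"),
    ("P2", "[mb02]"),
    ("P0", "mb0e"),
    ("P0", "Rb=0"),
    ("P1", "Wb=2"),
]


def reads_match(order):
    """Return True if every read sees the value it claims (memory starts at 0)."""
    memory = {}
    for _thread, op in order:
        if op[0] == "W":
            memory[op[1]] = int(op[3:])
        elif op[0] == "R":
            if memory.get(op[1], 0) != int(op[3:]):
                return False
    return True


def injected_inside_window(order):
    """Return True if every bracketed barrier lies between mb0s and mb0e."""
    ops = [op for _thread, op in order]
    start, end = ops.index("mb0s"), ops.index("mb0e")
    return all(start < i < end for i, op in enumerate(ops) if op[0] == "[")


def reordering_respects_barriers(order):
    """Return True if no thread hoists its read above its write across
    one of that thread's own barriers.

    Assumes each thread has exactly one write followed (in program
    order) by exactly one read.
    """
    for thread in {t for t, _op in order}:
        index = {"W": None, "R": None}
        barriers = []
        for i, (t, op) in enumerate(order):
            if t != thread:
                continue
            if op[0] in index:
                index[op[0]] = i
            else:
                barriers.append(i)
        w, r = index["W"], index["R"]
        if r < w and any(r < b < w for b in barriers):
            return False
    return True


if __name__ == "__main__":
    print(reads_match(REORDERED))
    print(injected_inside_window(REORDERED))
    print(reordering_respects_barriers(REORDERED))
```

All three checks pass for the order shown above. Moving [mb01] so that it
falls between Rc=0 and Wb=2 makes the last check fail, which is the
situation the six-process proof rules out.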