Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp1315909yba; Thu, 4 Apr 2019 08:24:28 -0700 (PDT) X-Google-Smtp-Source: APXvYqx5qU+eEjD9CVLcrIZTMqqlkIeUkfwuSPamsEsTJ8qqQkgjBlT1LX+cKaJmnEi2SjsXbew+ X-Received: by 2002:a63:4f52:: with SMTP id p18mr6346035pgl.333.1554391468428; Thu, 04 Apr 2019 08:24:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1554391468; cv=none; d=google.com; s=arc-20160816; b=CRiZ6evzHcUEVYvsDsFONkJbHDi0T8FYj7sRkGOZU9R2Up/dPou1D9gaeW9UBvf5vA 5GqjkcYTjcEa5fQGKjIsxqOq0xsxCk9+1UCvG7aPjxI42W6Seqlr0jkm5FlPX7Wl4vRH 81lNFMk1VRAF0Q4jZHUIT4C9TLo/2Bh8CmXJp5ejRzXeGj51KX0PEQ3s0uVv2RinVa4X Tui0lYcKQPyjhNp3DpPByoolKBkovlYfnLaU5nyMwFtkbYm9prJZARzm3shp+ZvNPUmJ Wo2xAn2GpZvnNFsS0IOH4lJcZOq1DqvgaC8bZDJG22zU2EJEV/AEeP4/sZHcAXUsboVj kB9w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:user-agent:in-reply-to :content-disposition:mime-version:references:reply-to:subject:cc:to :from:date; bh=xxr1QaRWUj1QLxhen9H6GIbG0SDquErwV86sqpM/RA4=; b=C6OYOOcosBY4QqSCFdSXqUdRMJ8ciRKHYJQbsz+wFk/r/xYTgqzlpwksKoegrZemVl lWi0rXw45ardEg479BeZA/kFxNUOKPpNHNVo+vfVw4bcFy5e8L+77Qn5ec/gy++rGQbO NkGRl+V/L/ua3GmGODAev3jMHYflE1nXCsK08/0W50m/jiHohedFDpf2jLvsBhBLuhbP 8awSnl7tKY12hzgETGoE3smIYTlw/WEXyki8SizwsHoIlyFlErdLtKE0kDWw1UPnql2u 6y0RytnMQtaOb4sWxFntxk1w4KsXwHAc91T4Kw8vyNOlsvmKld8DX9RAvSPV/SfjX1fv vW3w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g11si11320905pgo.563.2019.04.04.08.24.12; Thu, 04 Apr 2019 08:24:28 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728735AbfDDPXV (ORCPT + 99 others); Thu, 4 Apr 2019 11:23:21 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:56162 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726952AbfDDPXV (ORCPT ); Thu, 4 Apr 2019 11:23:21 -0400 Received: from pps.filterd (m0098399.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x34FJZBU065524 for ; Thu, 4 Apr 2019 11:23:19 -0400 Received: from e13.ny.us.ibm.com (e13.ny.us.ibm.com [129.33.205.203]) by mx0a-001b2d01.pphosted.com with ESMTP id 2rnktqaebw-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 04 Apr 2019 11:23:19 -0400 Received: from localhost by e13.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 4 Apr 2019 16:23:17 +0100 Received: from b01cxnp22033.gho.pok.ibm.com (9.57.198.23) by e13.ny.us.ibm.com (146.89.104.200) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Thu, 4 Apr 2019 16:23:13 +0100 Received: from b01ledav003.gho.pok.ibm.com (b01ledav003.gho.pok.ibm.com [9.57.199.108]) by b01cxnp22033.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x34FNCbS15401106 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 4 Apr 2019 15:23:12 GMT Received: from b01ledav003.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C53A8B2064; Thu, 4 Apr 2019 15:23:12 +0000 (GMT) Received: from b01ledav003.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 88152B205F; Thu, 4 Apr 2019 15:23:12 +0000 (GMT) Received: from paulmck-ThinkPad-W541 (unknown [9.70.82.188]) by b01ledav003.gho.pok.ibm.com (Postfix) with ESMTP; Thu, 4 Apr 2019 15:23:12 +0000 (GMT) Received: by paulmck-ThinkPad-W541 (Postfix, from userid 1000) id 129FF16C07FE; Thu, 4 Apr 2019 08:23:14 -0700 (PDT) Date: Thu, 4 Apr 2019 08:23:14 -0700 From: "Paul E. McKenney" To: Joel Fernandes Cc: Oleg Nesterov , Jann Horn , Kees Cook , "Eric W. Biederman" , LKML , Android Kernel Team , Kernel Hardening , Andrew Morton , Matthew Wilcox , Michal Hocko , "Reshetova, Elena" , Alan Stern Subject: Re: [PATCH] Convert struct pid count to refcount_t Reply-To: paulmck@linux.ibm.com References: <20190328023432.GA93275@google.com> <20190328143738.GA261521@google.com> <20190328162641.GC19441@redhat.com> <20190328173707.GP4102@linux.ibm.com> <20190330023639.GA214473@google.com> <20190331215531.GF4102@linux.ibm.com> <20190401211139.GA5087@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190401211139.GA5087@google.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 19040415-0064-0000-0000-000003C680BB X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00010873; HX=3.00000242; KW=3.00000007; PH=3.00000004; SC=3.00000284; SDB=6.01184255; UDB=6.00620064; IPR=6.00965007; MB=3.00026298; MTD=3.00000008; XFM=3.00000015; UTC=2019-04-04 15:23:17 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19040415-0065-0000-0000-00003CF2AA5E Message-Id: <20190404152314.GG14111@linux.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2019-04-04_07:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904040098 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Apr 01, 2019 at 05:11:39PM -0400, Joel Fernandes wrote: > Thanks a lot Alan and Paul for the replies. I replied inline. > > On Sun, Mar 31, 2019 at 02:55:31PM -0700, Paul E. McKenney wrote: > > On Fri, Mar 29, 2019 at 10:36:39PM -0400, Joel Fernandes wrote: > > > On Thu, Mar 28, 2019 at 10:37:07AM -0700, Paul E. McKenney wrote: > > > > On Thu, Mar 28, 2019 at 05:26:42PM +0100, Oleg Nesterov wrote: > > > > > On 03/28, Jann Horn wrote: > > > > > > > > > > > > Since we're just talking about RCU stuff now, adding Paul McKenney to > > > > > > the thread. > > > > > > > > > > Since you added Paul let me add more confusion to this thread ;) > > > > > > > > Woo-hoo!!! More confusion! Bring it on!!! ;-) > > > > > > Nice to take part in the confusion fun too!!! ;-) > > > > > > > > There were some concerns about the lack of barriers in put_pid(), but I can't > > > > > find that old discussion and I forgot the result of that discussion... > > > > > > > > > > Paul, could you confirm that this code > > > > > > > > > > CPU_0 CPU_1 > > > > > > > > > > X = 1; if (READ_ONCE(Y)) > > > > > mb(); X = 2; > > > > > Y = 1; BUG_ON(X != 2); > > > > > > > > > > > > > > > is correct? I think it is, control dependency pairs with mb(), right? > > > > > > > > The BUG_ON() is supposed to happen at the end of time, correct? > > > > As written, there is (in the strict sense) a data race between the load > > > > of X in the BUG_ON() and CPU_0's store to X. In a less strict sense, > > > > you could of course argue that this data race is harmless, especially > > > > if X is a single byte. But the more I talk to compiler writers, the > > > > less comfortable I become with data races in general. :-/ > > > > > > > > So I would also feel better if the "Y = 1" was WRITE_ONCE(). > > > > > > > > On the other hand, this is a great opportunity to try out Alan Stern's > > > > prototype plain-accesses patch to the Linux Kernel Memory Model (LKMM)! > > > > > > > > https://lkml.kernel.org/r/Pine.LNX.4.44L0.1903191459270.1593-200000@iolanthe.rowland.org > > > > > > > > Also adding Alan on CC. > > > > > > > > Here is what I believe is the litmus test that your are interested in: > > > > > > > > ------------------------------------------------------------------------ > > > > C OlegNesterov-put_pid > > > > > > > > {} > > > > > > > > P0(int *x, int *y) > > > > { > > > > *x = 1; > > > > smp_mb(); > > > > *y = 1; > > > > } > > > > > > > > P1(int *x, int *y) > > > > { > > > > int r1; > > > > > > > > r1 = READ_ONCE(*y); > > > > if (r1) > > > > *x = 2; > > > > } > > > > > > > > exists (1:r1=1 /\ ~x=2) > > > > ------------------------------------------------------------------------ > > > > > > > > Running this through herd with Alan's patch detects the data race > > > > and says that the undesired outcome is allowed: > > > > > > > > $ herd7 -conf linux-kernel.cfg /tmp/OlegNesterov-put_pid.litmus > > > > Test OlegNesterov-put_pid Allowed > > > > States 3 > > > > 1:r1=0; x=1; > > > > 1:r1=1; x=1; > > > > 1:r1=1; x=2; > > > > Ok > > > > Witnesses > > > > Positive: 1 Negative: 2 > > > > Flag data-race > > > > Condition exists (1:r1=1 /\ not (x=2)) > > > > Observation OlegNesterov-put_pid Sometimes 1 2 > > > > Time OlegNesterov-put_pid 0.00 > > > > Hash=a3e0043ad753effa860fea37eeba0a76 > > > > > > > > Using WRITE_ONCE() for P0()'s store to y still allows this outcome, > > > > although it does remove the "Flag data-race". > > > > > > > > Using WRITE_ONCE() for both P0()'s store to y and P1()'s store to x > > > > gets rid of both the "Flag data-race" and the undesired outcome: > > > > > > > > $ herd7 -conf linux-kernel.cfg /tmp/OlegNesterov-put_pid-WO-WO.litmus > > > > Test OlegNesterov-put_pid-WO-WO Allowed > > > > States 2 > > > > 1:r1=0; x=1; > > > > 1:r1=1; x=2; > > > > No > > > > Witnesses > > > > Positive: 0 Negative: 2 > > > > Condition exists (1:r1=1 /\ not (x=2)) > > > > Observation OlegNesterov-put_pid-WO-WO Never 0 2 > > > > Time OlegNesterov-put_pid-WO-WO 0.01 > > > > Hash=6e1643e3c5e4739b590bde0a8e8a918e > > > > > > > > Here is the corresponding litmus test, in case I messed something up: > > > > > > > > ------------------------------------------------------------------------ > > > > C OlegNesterov-put_pid-WO-WO > > > > > > > > {} > > > > > > > > P0(int *x, int *y) > > > > { > > > > *x = 1; > > > > smp_mb(); > > > > WRITE_ONCE(*y, 1); > > > > } > > > > > > > > P1(int *x, int *y) > > > > { > > > > int r1; > > > > > > > > r1 = READ_ONCE(*y); > > > > if (r1) > > > > WRITE_ONCE(*x, 2); > > > > } > > > > > > > > exists (1:r1=1 /\ ~x=2) > > > > > > I ran the above examples too. Its a bit confusing to me why the WRITE_ONCE in > > > P0() is required, and why would the READ_ONCE / WRITE_ONCE in P1() not be > > > sufficient to prevent the exists condition. Shouldn't the compiler know that, > > > in P0(), it should not reorder the store to y=1 before the x=1 because there > > > is an explicit barrier between the 2 stores? Looks me to me like a broken > > > compiler :-|. > > > > > > So I would have expected the following litmus to result in Never, but it > > > doesn't with Alan's patch: > > > > > > P0(int *x, int *y) > > > { > > > *x = 1; > > > smp_mb(); > > > *y = 1; > > > } > > > > > > P1(int *x, int *y) > > > { > > > int r1; > > > > > > r1 = READ_ONCE(*y); > > > if (r1) > > > WRITE_ONCE(*x, 2); > > > } > > > > > > exists (1:r1=1 /\ ~x=2) > > > > The problem is that the compiler can turn both of P0()'s writes into reads: > > > > P0(int *x, int *y) > > { > > if (*x != 1) > > *x = 1; > > smp_mb(); > > if (*y != 1) > > *y = 1; > > } > > > > These reads will not be ordered by smp_wmb(), so you have to do WRITE_ONCE() > > to get an iron-clad ordering guarantee. > > But the snippet above has smp_mb() which does order writes, even for the > plain accesses. True, but that code has a data race, namely P0()'s plain write to y races with P1()'s READ_ONCE(). Data races give the compiler absolutely astonishing levels of freedom to rearrange your code. So, if you as a developer or maintainer choose to have data races, it is your responsibility to ensure that the compiler cannot mess you up. So what you should do in that case is to list the transformed code the compiler's optimizations can produce and feed the corresponding litmus tests to LKMM, using READ_ONCE() and WRITE_ONCE() for the post-optimization accesses. > > > > ------------------------------------------------------------------------ > > > > > > > > > If not, then put_pid() needs atomic_read_acquire() as it was proposed in that > > > > > discussion. > > > > > > > > Good point, let's try with smp_load_acquire() in P1(): > > > > > > > > $ herd7 -conf linux-kernel.cfg /tmp/OlegNesterov-put_pid-WO-sla.litmus > > > > Test OlegNesterov-put_pid-WO-sla Allowed > > > > States 2 > > > > 1:r1=0; x=1; > > > > 1:r1=1; x=2; > > > > No > > > > Witnesses > > > > Positive: 0 Negative: 2 > > > > Condition exists (1:r1=1 /\ not (x=2)) > > > > Observation OlegNesterov-put_pid-WO-sla Never 0 2 > > > > Time OlegNesterov-put_pid-WO-sla 0.01 > > > > Hash=4fb0276eabf924793dec1970199db3a6 > > > > > > > > This also works. Here is the litmus test: > > > > > > > > ------------------------------------------------------------------------ > > > > C OlegNesterov-put_pid-WO-sla > > > > > > > > {} > > > > > > > > P0(int *x, int *y) > > > > { > > > > *x = 1; > > > > smp_mb(); > > > > WRITE_ONCE(*y, 1); > > > > } > > > > > > > > P1(int *x, int *y) > > > > { > > > > int r1; > > > > > > > > r1 = smp_load_acquire(y); > > > > if (r1) > > > > *x = 2; > > > > } > > > > > > > > exists (1:r1=1 /\ ~x=2) > > > > ------------------------------------------------------------------------ > > > > > > > > Demoting P0()'s WRITE_ONCE() to a plain write while leaving P1()'s > > > > smp_load_acquire() gets us a data race and allows the undesired > > > > outcome: > > > > > > Yeah, I think this is also what I was confused about above, is why is that > > > WRITE_ONCE required in P0() because there's already an smp_mb there. Surely > > > I'm missing something. ;-) > > > > The first of P0()'s writes can be a plain write, at least assuming > > sufficient synchronization to avoid the data race. But turning the second > > of P0()'s writes into a plain write is a bit riskier: That is a write of > > a constant, and those really are torn in some cases on some architectures. > > Like x86, for example. > > I understand. Are we talking about load/store tearing being the issue here? > Even under load/store tearing, I feel the program will produce correct > results because r1 is either 0 or 1 (a single bit cannot be torn). The compiler can (at least in theory) also transform *y = 1 as follows: *y = r1; /* Use *y as temporary storage. */ do_something(); r1 = *y; *y = 1; Here r1 is some register and do_something() is an inline function visible to the compiler (hence not implying barrier() upon call and return). I don't know of any compilers that actually do this, but for me use of WRITE_ONCE() is a small price to pay to prevent all past, present, and future compiler from ever destroying^W optimizing my code in this way. In your own code, if you dislike WRITE_ONCE() so much that you are willing to commit to (now and forever!!!) checking all applicable versions of compilers to make sure that they don't do this, well and good, knock yourself out. But it is your responsibility to do that checking, and you can attest to LKMM that you have done so by giving LKMM litmus tests that substitute WRITE_ONCE() for that plain C-language assignment statement. > Further, from the herd simulator output (below), according to the "States", > r1==1 means P1() AFAICS would have already finished the the read and set the > r1 register to 1. Then I am wondering why it couldn't take the branch to set > *x to 2. According to herd, r1 == 1 AND x == 1 is a perfectly valid state > for the below program. I still couldn't see in my mind how for the below > program, this is possible - in terms of compiler optimizations or other kinds > of ordering. Because there is a smp_mb() between the 2 plain writes in P0() > and P1() did establish that r1 is 1 in the positive case. :-/. I am surely > missing something :-) > > ---8<----------------------- > C Joel-put_pid > > {} > > P0(int *x, int *y) > { > *x = 1; > smp_mb(); > *y = 1; > } > > P1(int *x, int *y) > { > int r1; > > r1 = READ_ONCE(*y); > if (r1) > WRITE_ONCE(*x, 2); > } > > exists (1:r1=1 /\ ~x=2) > > ---8<----------------------- > Output: > > Test Joel-put_pid Allowed > States 3 > 1:r1=0; x=1; > 1:r1=1; x=1; <-- Can't figure out why r1=1 and x != 2 here. I must defer to Alan on this, but my guess is that this is due to the fact that there is a data race. Thanx, Paul > 1:r1=1; x=2; > Ok > Witnesses > Positive: 1 Negative: 2 > Flag data-race > Condition exists (1:r1=1 /\ not (x=2)) > Observation Joel-put_pid Sometimes 1 2 > Time OlegNesterov-put_pid-WO-WO 0.01 > Hash=c7bdd50418d42779b7c10ba9128369df > > > > > $ herd7 -conf linux-kernel.cfg /tmp/OlegNesterov-put_pid-sla.litmus > > > > Test OlegNesterov-put_pid-sla Allowed > > > > States 3 > > > > 1:r1=0; x=1; > > > > 1:r1=1; x=1; > > > > 1:r1=1; x=2; > > > > Ok > > > > Witnesses > > > > Positive: 1 Negative: 2 > > > > Flag data-race > > > > Condition exists (1:r1=1 /\ not (x=2)) > > > > Observation OlegNesterov-put_pid-sla Sometimes 1 2 > > > > Time OlegNesterov-put_pid-sla 0.01 > > > > Hash=ec6f71f3d9f7cd6e45a874c872e3d946 > > > > > > > > But what if you are certain that the compiler cannot mess up your use > > > > of plain C-language loads and stores? Then simply tell LKMM that they > > > > are READ_ONCE() and WRITE_ONCE(), respectively. LKMM is admittedly > > > > somewhat paranoid, but real C compilers really do tear stores of certain > > > > constants on systems (like x86) that have store-immediate instructions, > > > > so a bit of paranoia is not misplaced here. ;-) > > > > > > > > Plus please note that this patch to LKMM is prototype and thus subject > > > > to change. > > > > > > Ah I see. Appreciate if Alan can also CC me on future posting of this since > > > I'm quite interested. ;-) > > > > His last posting should be easy to find. But please let me know if not, > > as I would be happy to send it along. > > I found it and I'm going through it. Thanks! > > - Joel >