Received: by 2002:a25:c593:0:0:0:0:0 with SMTP id v141csp55673ybe; Thu, 12 Sep 2019 15:33:01 -0700 (PDT) X-Google-Smtp-Source: APXvYqzN0NClJGcqIyo5ljT8REweZQUHpbmufBotOcpPFQcYVCBwSSg484dl9l9c9dLBFyM2hMMb X-Received: by 2002:a05:6402:1426:: with SMTP id c6mr28654118edx.53.1568327581194; Thu, 12 Sep 2019 15:33:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1568327581; cv=none; d=google.com; s=arc-20160816; b=bguhYsq4cBS0joutrLWRLG1XulsjvLRqWVXuqA/LI6K9oeyvtfON6w/RPl/kSTozbI qxx1cbxVcXUyeaHqo9wzYrcnVNEXTnex54d0h4KlgRojwuWByB7fxwuBLw1NhPnnQt09 t4edTuk8I2ZosdE0d8qTttsSW1/4n+UrGiyzPEdPc9PWQ/d0lNxYbC/EyK8D1ixKM8k0 TI3n7C5KxSGjvR+spiVf9d3EbooN4OI+LGLXkeGCO4UtYVS4bl8yvZ2SWjQUaJ7LCpFY e1ygo7HMYoAxX4JpapQvuRjvYttDdKFtuTUitdUAm7gWLsBZaVoStRg9uRM21i+ZCe8e Murw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:reply-to:message-id :subject:cc:to:from:date:dkim-signature; bh=kjYwGQgKkYHds+xelM+60qW9RVc7XHdwyF2yB9YI4VQ=; b=TLlm8o/R+SiEaaGELF7NO0SUy24LlVSmQSl8onsx/190fFFDO135Y1OCZ0GUiFbYf9 p7/SCNDosL6tUL8YQXqXTttegY46dcqFOncvZn5fXzoQEOo/4NhBRodfaI0m+U2ZJAmv dg9W2MY/1yJlq65sQ3iRrV0X2D01g6QAphaguE91qGDFCye5okBX5Fe1iWob7+iAChhA qlvdgdDuAcib+5HYim9I1mUgComXdwf8U8RILeI4aCcT8SNZ7ltB3Mz1H56atgCog2d2 0R56DSNpIX8S4v0/HP/LKzxQ/e4r5hE4hCiKycRGgzU4FbxnDWB0P0K8qYWIxzOuirQ9 M1Aw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=rS4OAEV6; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id x24si13724634ejs.168.2019.09.12.15.32.37; Thu, 12 Sep 2019 15:33:01 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=rS4OAEV6; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729219AbfILWBg (ORCPT + 99 others); Thu, 12 Sep 2019 18:01:36 -0400 Received: from mail.kernel.org ([198.145.29.99]:43646 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727041AbfILWBf (ORCPT ); Thu, 12 Sep 2019 18:01:35 -0400 Received: from paulmck-ThinkPad-P72 (unknown [193.126.51.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 1335E206A5; Thu, 12 Sep 2019 22:01:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1568325692; bh=56lqsuouZbZdkB8/tzlOSWh/NFCyfpRDSG7aXocbyoU=; h=Date:From:To:Cc:Subject:Reply-To:References:In-Reply-To:From; b=rS4OAEV6QPdnwzWl6HUr0rMmspvjyf3LzUP8NS0WAapZTXRFhVF5JmZGh+xaR7+Tx OWKz+AWiEZ5/W4r1M4awYTOkZ4XkqqoxV0031CdY5AdyPrh6MIweFg6DsbEK4u7SB8 9ie4U7XzMWe/gkJaDTUf/dZGn4aNtlv+jVGzyGfE= Date: Thu, 12 Sep 2019 15:01:26 -0700 From: "Paul E. McKenney" To: Alan Stern Cc: LKMM Maintainers -- Akira Yokosawa , Andrea Parri , Boqun Feng , Daniel Lustig , David Howells , Jade Alglave , Luc Maranget , Nicholas Piggin , Peter Zijlstra , Will Deacon , Kernel development list Subject: Re: Documentation for plain accesses and data races Message-ID: <20190912220126.GA14560@paulmck-ThinkPad-P72> Reply-To: paulmck@kernel.org References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Sep 06, 2019 at 02:11:29PM -0400, Alan Stern wrote: > Folks: > > I have spent some time writing up a section for > tools/memory-model/Documentation/explanation.txt on plain accesses and > data races. The initial version is below. > > I'm afraid it's rather long and perhaps gets too bogged down in > complexities. On the other hand, this is a complicated topic so to > some extent this is unavoidable. > > In any case, I'd like to hear your comments and reviews. Good stuff, thank you for putting this together! Please see below for some questions, comments, and confusion interspersed. > Alan > > ------------------------------------------------------------------------ > > > PLAIN ACCESSES AND DATA RACES > ----------------------------- > > In the LKMM, memory accesses such as READ_ONCE(x), atomic_inc(&y), > smp_load_acquire(&z), and so on are collectively referred to as > "marked" accesses, because they are all annotated with special > operations of one kind or another. Ordinary C-language memory > accesses such as x or y = 0 are simply called "plain" accesses. > > Early versions of the LKMM had nothing to say about plain accesses. > The C standard allows compilers to assume that the variables affected > by plain accesses are not concurrently read or written by any other > threads or CPUs. This leaves compilers free to implement all manner > of transformations or optimizations of code containing plain accesses, > making such code very difficult for a memory model to handle. > > Here is just one example of a possible pitfall: > > int a = 6; > int *x = &a; > > P0() > { > int *r1; > int r2 = 0; > > r1 = x; > if (r1 != NULL) > r2 = READ_ONCE(*r1); > } > > P1() > { > WRITE_ONCE(x, NULL); > } I tried making a litmus test out of this: ------------------------------------------------------------------------ C plain-1 { int a = 6; int *x = &a; } P0(int **x) { int *r1; int r2 = 0; r1 = *x; if (r1 != 0) r2 = READ_ONCE(*r1); } P1(int **x) { WRITE_ONCE(*x, 0); } locations [a; x; r1] exists ~r2=6 /\ ~r2=0 ------------------------------------------------------------------------ However, r1 steadfastly refuses to have any value other than zero. ------------------------------------------------------------------------ $ herd7 -conf linux-kernel.cfg /tmp/argh Test plain-1 Allowed States 1 a=6; r1=0; r2=0; x=0; No Witnesses Positive: 0 Negative: 2 Flag data-race Condition exists (not (r2=6) /\ not (r2=0)) Observation plain-1 Never 0 2 Time plain-1 0.00 Hash=b0fdbd0f627fd65e0cd413bf87f6f4a4 ------------------------------------------------------------------------ What am I doing wrong here? Outdated herd7 version? $ herd7 -version 7.52+7(dev), Rev: c81f1ff06f30d5c28c34d893a29f5f8505334179 Hmmm... I might well be in an inconsistent herd7/ocaml state. If no one sees anything obvious, I will try reinstalling from scratch, but that will not likely happen in the next few days. > On the face of it, one would expect that when this code runs, the only > possible final values for r2 are 6 and 0, depending on whether or not > P1's store to x propagates to P0 before P0's load from x executes. > But since P0's load from x is a plain access, the compiler may decide > to carry out the load twice (for the comparison against NULL, then again > for the READ_ONCE()) and eliminate the temporary variable r1. The > object code generated for P0 could therefore end up looking rather > like this: > > P0() > { > int r2 = 0; > > if (x != NULL) > r2 = READ_ONCE(*x); > } > > And now it is obvious that this code runs the risk of dereferencing a > NULL pointer, because P1's store to x might propagate to P0 after the > test against NULL has been made but before the READ_ONCE() executes. > If the original code had said "r1 = READ_ONCE(x)" instead of "r1 = x", > the compiler would not have performed this optimization and there > would be no possibility of a NULL-pointer dereference. > > Given the possibility of transformations like this one, the LKMM > doesn't try to predict all possible outcomes of code containing plain > accesses. It is content to determine whether the code violates the I suggest starting this sentence with something like "It is instead content to determine", adding "instead", to help the reader transition. > compiler's assumptions, which would render the ultimate outcome > undefined. > > In technical terms, the compiler is allowed to assume that when the > program executes, there will not be any data races. A "data race" > occurs when two conflicting memory accesses execute concurrently; > two memory accesses "conflict" if: > > they access the same location, > > they occur on different CPUs (or in different threads on the > same CPU), > > at least one of them is a plain access, > > and at least one of them is a store. > > The LKMM tries to determine whether a program contains two conflicting > accesses which may execute concurrently; if it does then the LKMM says > there is a potential data race and makes no predictions about the > program's outcome. > > Determining whether two accesses conflict is easy; you can see that > all the concepts involved in the definition above are already part of > the memory model. The hard part is telling whether they may execute > concurrently. The LKMM takes a conservative attitude, assuming that > accesses may be concurrent unless it can prove they cannot. > > If two memory accesses aren't concurrent then one must execute before Should this say "If two memory accesses to the same location aren't concurrent..."? > the other. Therefore the LKMM decides two accesses aren't concurrent > if they can be connected by a sequence of hb, pb, and rb links > (together referred to as xb, for "executes before"). However, there > are two complicating factors. > > If X is a load and X executes before a store Y, then indeed there is > no danger of X and Y being concurrent. After all, Y can't have any > effect on the value obtained by X until the memory subsystem has > propagated Y from its own CPU to X's CPU, which won't happen until > some time after Y executes and thus after X executes. But if X is a > store, then even if X executes before Y it is still possible that X > will propagate to Y's CPU just as Y is executing. In such a case X > could very well interfere somehow with Y, and we would have to > consider X and Y to be concurrent. > > Therefore when X is a store, for X and Y to be non-concurrent the LKMM > requires not only that X must execute before Y but also that X must > propagate to Y's CPU before Y executes. (Or vice versa, of course, if > Y executes before X -- then Y must propagate to X's CPU before X > executes if Y is a store.) This is expressed by the visibility > relation (vis), where X ->vis Y is defined to hold if there is an > intermediate event Z such that: "if there is a marked intermediate event Z such that"? > X is connected to Z by a possibly empty sequence of > cumul-fence links followed by an optional rfe link (if none of > these links are present, X and Z are the same event), > > and either: > > Z is connected to Y by a strong-fence link followed by a > possibly empty sequence of xb links, "possibly empty sequence of xb links from a marked access"? > or: > > Z is on the same CPU as Y and is connected to Y by a possibly > empty sequence of xb links (again, if the sequence is empty it > means Z and Y are the same event). > > The motivations behind this definition are straightforward: > > cumul-fence memory barriers force stores that are po-before > the barrier to propagate to other CPUs before stores that are > po-after the barrier. > > An rfe link from an event W to an event R says that R reads > from W, which certainly means that W must have propagated to > R's CPU before R executed. > > strong-fence memory barriers force stores that are po-before > the barrier, or that propagate to the barrier's CPU before the > barrier executes, to propagate to all CPUs before any events > po-after the barrier can execute. > > To see how this works out in practice, consider our old friend, the MP > pattern (with fences and statement labels, but without the conditional > test): > > int buf = 0, flag = 0; > > P0() > { > X: WRITE_ONCE(buf, 1); > smp_wmb(); > W: WRITE_ONCE(flag, 1); > } > > P1() > { > int r1; > int r2 = 0; > > Z: r1 = READ_ONCE(flag); > smp_rmb(); > Y: r2 = READ_ONCE(buf); > } I have to ask. Why X then W then Z then Y? ;-) (This is MP+fencewmbonceonce+fencermbonceonce.litmus in the current set in tools/memory-model/litmus-tests.) > The smp_wmb() memory barrier gives a cumul-fence link from X to W, and > assuming r1 = 1 at the end, there is an rfe link from W to Z. This > means that the store to buf must propagate from P0 to P1 before Z > executes. Next, Z and Y are on the same CPU and the smp_rmb() fence > provides an xb link from Z to Y (i.e., it forces Z to execute before > Y). Therefore we have X ->vis Y: X must propagate to Y's CPU before Y > executes. > > The second complicating factor mentioned above arises from the fact > that when we are considering data races, some of the memory accesses > are plain. Now, although we have not said so explicitly, up to this > point most of the relations defined by the LKMM (ppo, hb, prop, > cumul-fence, pb, and so on -- including vis) apply only to marked > accesses. > > There are good reasons for this restriction. The compiler is not > allowed to apply fancy transformations to marked accesses, and > consequently each such access in the source code corresponds more or > less directly to a single machine instruction in the object code. But > plain accesses are a different story; the compiler may combine them, > split them up, duplicate them, eliminate them, invent new ones, and > who knows what else. Seeing a plain access in the source code tells > you almost nothing about what machine instructions will end up in the > object code. > > Fortunately, the compiler isn't completely free; it is subject to some > limitations. For one, it is not allowed to introduce a data race into > the object code if the source code does not already contain a data > race (if it could, memory models would be useless and no multithreaded > code would be safe!). For another, it cannot move a plain access past > a compiler barrier. > > A compiler barrier is a kind of fence, but as the name implies, it > only affects the compiler; it does not necessarily have any effect on > how instructions are executed by the CPU. In Linux kernel source > code, the barrier() function is a compiler barrier. It doesn't give > rise directly to any machine instructions in the object code; rather, > it affects how the compiler generates the rest of the object code. > Given source code like this: > > ... some memory accesses ... > barrier(); > ... some other memory accesses ... > > the barrier() function ensures that the machine instructions > corresponding to the first group of accesses will all end po-before > any machine instructions corresponding to the second group of accesses > -- even if some of the accesses are plain. (Of course, the CPU may > then execute some of those accesses out of program order, but we > already know how to deal with such issues.) Without the barrier() > there would be no such guarantee; the two groups of accesses could be > intermingled or even reversed in the object code. > > The LKMM doesn't say much about the barrier() function, but it does > require that all fences are also compiler barriers. In addition, it > requires that the ordering properties of memory barriers such as > smp_rmb() or smp_store_release() apply to plain accesses as well as to > marked accesses. > > This is the key to analyzing data races. Consider the MP pattern > again, now using plain accesses for buf: > > int buf = 0, flag = 0; > > P0() > { > U: buf = 1; > smp_wmb(); > X: WRITE_ONCE(flag, 1); > } > > P1() > { > int r1; > int r2 = 0; > > Y: r1 = READ_ONCE(flag); > if (r1) { > smp_rmb(); > V: r2 = buf; > } > } And same nit, why not just X, Y, and Z? Similar issues with the litmus test: ------------------------------------------------------------------------ C plain-4 { int buf = 0; int flag = 0; } P0(int *buf, int *flag) { *buf = 1; smp_wmb(); WRITE_ONCE(*flag, 1); } P1(int *buf, int *flag) { int r1; int r2 = 0; r1 = READ_ONCE(*flag); if (r1) { smp_rmb(); r2 = *buf; } } exists r1=1 /\ r2=0 ------------------------------------------------------------------------ > This program does not contain a data race. Although the U and V > accesses conflict, the LKMM can prove they are not concurrent as > follows: > > The smp_wmb() fence in P0 is both a compiler barrier and a > cumul-fence. It guarantees that no matter what hash of > machine instructions the compiler generates for the plain > access U, all those instructions will be po-before the fence. > Consequently U's store to buf, no matter how it is carried out > at the machine level, must propagate to P1 before X's store to > flag does. > > X and Y are both marked accesses. Hence an rfe link from X to > Y is a valid indicator that X propagated to P1 before Y > executed, i.e., X ->vis Y. (And if there is no rfe link then > r1 will be 0, so V will not be executed and ipso facto won't > race with U.) > > The smp_rmb() fence in P1 is a compiler barrier as well as a > fence. It guarantees that all the machine-level instructions > corresponding to the access V will be po-after the fence, and > therefore any loads among those instructions will execute > after the fence does and hence after Y does. > > Thus U's store to buf is forced to propagate to P1 before V's load > executes (assuming V does execute), ruling out the possibility of a > data race between them. > > This analysis illustrates how the LKMM deals with plain accesses in > general. Suppose R is a plain load and we want to show that R > executes before some marked access E. We can do this by finding a > marked access X such that R and X are ordered by a suitable fence and > X ->xb* E. If E was also a plain access, we would also look for a > marked access Y such that X ->xb* Y, and Y and E are ordered by a > fence. We describe this arrangement by saying that R is > "post-bounded" by X and E is "pre-bounded" by Y. > > In fact, we go one step further: Since R is a read, we say that R is > "r-post-bounded" by X. Similarly, E would be "r-pre-bounded" or > "w-pre-bounded" by Y, depending on whether E was a store or a load. > This distinction is needed because some fences affect only loads > (i.e., smp_rmb()) and some affect only stores (smp_wmb()); otherwise > the two types of bounds are the same. And as a degenerate case, we > say that a marked access pre-bounds and post-bounds itself (e.g., if R > above were a marked load then X could simply be taken to be R itself.) > > The need to distinguish between r- and w-bounding raises yet another > issue. When the source code contains a plain store, the compiler is > allowed to put plain loads of the same location into the object code. > For example, given the source code: > > x = 1; > > the compiler is theoretically allowed to generate object code that > looks like: > > if (x != 1) > x = 1; > > thereby adding a load (and possibly replacing the store entirely). > For this reason, whenever the LKMM requires a plain store to be > w-pre-bounded or w-post-bounded by a marked access, it also requires > the store to be r-pre-bounded or r-post-bounded, so as to handle cases > where the compiler adds a load. > > (This may be overly cautious. We don't know of any examples where a > compiler has augmented a store with a load in this fashion, and the > Linux kernel developers would probably fight pretty hard to change a > compiler if it ever did this. Still, better safe than sorry.) > > Incidentally, the other tranformation -- augmenting a plain load by > adding in a store to the same location -- is not allowed. This is > because the compiler cannot know whether any other CPUs might perform > a concurrent load from that location. Two concurrent loads don't > constitute a race (they can't interfere with each other), but a store > does race with a concurrent load. Thus adding a store might create a > data race where one was not already present in the source code, > something the compiler is forbidden to do. Augmenting a store with a > load, on the other hand, is acceptable because doing so won't create a > data race unless one already existed. > > The LKMM includes a second way to pre-bound plain accesses, in > addition to fences: an address dependency from a marked load. That > is, in the sequence: > > p = READ_ONCE(ptr); > r = *p; > > the LKMM says that the marked load of ptr pre-bounds the plain load of > *p; the marked load must execute before any of the machine > instructions corresponding to the plain load. This is a reasonable > stipulation, since after all, the CPU can't perform the load of *p > until it knows what value p will hold. Furthermore, without some > assumption like this one, some usages typical of RCU would count as > data races. For example: > > int a = 1, b; herd7 doesn't much like this. > int *ptr = &a; > > P0() > { > b = 2; > rcu_assign_ptr(ptr, &b); rcu_assign_pointer(), globally. > } > > P1() > { > int *p; > int r; > > rcu_read_lock(); > p = rcu_dereference(ptr); > r = *p; > rcu_read_unlock(); > } ------------------------------------------------------------------------ C plain-5 { int a = 1; int b; int *ptr = &a; } P0(int *b, int **ptr) { *b = 2; rcu_assign_pointer(*ptr, b); } P1(int *ptr) { int *r1; int r2; rcu_read_lock(); r1 = rcu_dereference(*ptr); r2 = *r1; rcu_read_unlock(); } exists r1=b /\ r2=1 ------------------------------------------------------------------------ Also strange output: ------------------------------------------------------------------------ $ herd7 -conf linux-kernel.cfg /tmp/argh Test plain-3 Allowed States 1 r1=0; r2=0; No Witnesses Positive: 0 Negative: 2 Condition exists (r1=b /\ r2=1) Observation plain-3 Never 0 2 Time plain-3 0.00 Hash=44e039597a92bdc7efc9217813478126 ------------------------------------------------------------------------ > (In this example the rcu_read_lock() and rcu_read_unlock() calls don't > really do anything, because there aren't any grace periods. They are > included merely for the sake of good form; typically P0 would call > synchronize_rcu() somewhere after the rcu_assign_ptr().) > > rcu_assign_ptr() performs a store-release, so the plain store to b is > definitely w-post-bounded before the store to ptr, and the two stores > will propagate to P1 in that order. However, rcu_dereference() is > only equivalent to READ_ONCE(). While it is a marked access, it is > not a fence or compiler barrier. Hence the only guarantee we have > that the load of ptr in P1 is r-pre-bounded before the load of *p > (thus avoiding a race) is the assumption about address dependencies. > > This is a situation where the compiler can undermine the memory model, > and a certain amount of care is required when programming constructs > like this one. In particular, comparisons between the pointer and > other known addresses can cause trouble. If you have something like: > > p = rcu_dereference(ptr); > if (p == &x) > r = *p; > > then the compiler just might generate object code resembling: > > p = rcu_dereference(ptr); > if (p == &x) > r = x; > > or even: > > rtemp = x; > p = rcu_dereference(ptr); > if (p == &x) > r = rtemp; > > which would invalidate the memory model's assumption, since the CPU > could now perform the load of x before the load of ptr (there might be > a control dependency but no address dependency at the machine level). > > Finally, it turns out there is a situation in which a plain write does > not need to be w-post-bounded: when it is separated from the > conflicting access by a fence. At first glance this may seem > impossible. After all, to be conflicting the second access has to be > on a different CPU from the first, and fences don't link events on > different CPUs. Well, normal fences don't -- but rcu-fence can! > Here's an example: > > int x, y; > > P0() > { > WRITE_ONCE(x, 1); > synchronize_rcu(); > y = 3; > } > > P1() > { > rcu_read_lock(); > if (READ_ONCE(x) == 0) > y = 2; > rcu_read_unlock(); > } I introduced an r1 to create an "exists" clause: ------------------------------------------------------------------------ C plain-6 { int x; int y; } P0(int *x, int *y) { WRITE_ONCE(*x, 1); synchronize_rcu(); *y = 3; } P1(int *x, int *y) { int r1; rcu_read_lock(); r1 = READ_ONCE(*x); if (r1 == 0) *y = 2; rcu_read_unlock(); } exists r1=0 /\ y=2 ------------------------------------------------------------------------ > Do the plain stores to y race? Clearly not if P1 reads a non-zero > value for x, so let's assume the READ_ONCE(x) does obtain 0. This > means that the read-side critical section in P1 must finish executing > before the grace period in P0 does, because RCU's Grace-Period > Guarantee says that otherwise P0's store to x would have propagated to > P1 before the critical section started and so would have been visible > to the READ_ONCE(). (Another way of putting it is that the fre link > from the READ_ONCE() to the WRITE_ONCE() gives rise to an rcu-link > between those two events.) > > This means there is an rcu-fence link from P1's "y = 2" store to P0's > "y = 3" store, and consequently the first must propagate from P1 to P0 > before the second can execute. Therefore the two stores cannot be > concurrent and there is no race, even though P1's plain store to y > isn't w-post-bounded by any marked accesses. > > Putting all this material together yields the following picture. For > two conflicting stores W and W', where W ->co W', the LKMM says the > stores don't race if W can be linked to W' by a > > w-post-bounded ; vis ; w-pre-bounded > > sequence. If W is plain then they also have to be linked by an > > r-post-bounded ; xb* ; w-pre-bounded > > sequence, and if W' is plain then they also have to be linked by a > > w-post-bounded ; vis ; r-pre-bounded > > sequence. For a conflicting load R and store W, the LKMM says the two > accesses don't race if R can be linked to W by an > > r-post-bounded ; xb* ; w-pre-bounded > > sequence or if W can be linked to R by a > > w-post-bounded ; vis ; r-pre-bounded > > sequence. For the cases involving a vis link, the LKMM also accepts > sequences in which W is linked to W' or R by a > > strong-fence ; xb* ; {w and/or r}-pre-bounded > > sequence with no post-bounding, and in every case the LKMM also allows > the link simply to be a fence with no bounding at all. If no sequence > of the appropriate sort exists, the LKMM says that the accesses race. > > There is one more part of the LKMM related to plain accesses (although > not to data races) we should discuss. Recall that many relations such > as hb are limited to marked accesses only. As a result, the > happens-before, propagates-before, and rcu axioms (which state that > various relation must not contain a cycle) doesn't apply to plain > accesses. Nevertheless, we do want to rule out such cycles, because > they don't make sense even for plain accesses. > > To this end, the LKMM imposes three extra restrictions, together > called the "plain-coherence" axiom because of their resemblance to the > coherency rules: > > If R and W conflict and it is possible to link R to W by one > of the xb* sequences listed above, then W ->rfe R is not > allowed (i.e., a load cannot read from a store that it > executes before, even if one or both is plain). > > If W and R conflict and it is possible to link W to R by one > of the vis sequences listed above, then R ->fre W is not > allowed (i.e., if a store is visible to a load then the load > must read from that store or one coherence-after it). > > If W and W' conflict and it is possible to link W to W' by one > of the vis sequences listed above, then W' ->co W is not > allowed (i.e., if one store is visible to another then it must > come after in the coherence order). > > This is the extent to which the LKMM deals with plain accesses. > Perhaps it could say more (for example, plain accesses might > contribute to the ppo relation), but at the moment it seems that this > minimal, conservative approach is good enough. I will need to read this last section again. Perhaps more than once. ;-) Anyway, good stuff!!! Thanx, Paul