2019-01-17 00:39:50

by Andrea Parri

[permalink] [raw]
Subject: Re: Plain accesses and data races in the Linux Kernel Memory Model

[...]

> The difficulty with incorporating plain accesses in the memory model
> is that the compiler has very few constraints on how it treats plain
> accesses. It can eliminate them, duplicate them, rearrange them,
> merge them, split them up, and goodness knows what else. To make some
> sense of this, I have taken the view that a plain access can exist
> (perhaps multiple times) within a certain bounded region of code.
> Ordering of two accesses X and Y means that we guarantee at least one
> instance of the X access must be executed before any instances of the
> Y access. (This is assuming that neither of the accesses is
> completely eliminated by the compiler; otherwise there is nothing to
> order!)
>
> After adding some simple definitions for the sets of plain and marked
> accesses and for compiler barriers, the patch updates the ppo
> relation. The basic idea here is that ppo can be broken down into
> categories: memory barriers, overwrites, and dependencies (including
> dep-rfi).
>
> Memory barriers always provide ordering (compiler barriers do
> not but they have indirect effects).
>
> Overwriting always provides ordering. This may seem
> surprising in the case where both X and Y are plain writes,
> but in that case the memory model will say that X can be
> eliminated unless there is at least a compiler barrier between
> X and Y, and this barrier will enforce the ordering.
>
> Some dependencies provide ordering and some don't. Going by
> cases:
>
> An address dependency to a read provides ordering when
> the source is a marked read, even when the target is a
> plain read. This is necessary if rcu_dereference() is
> to work correctly; it is tantamount to assuming that
> the compiler never speculates address dependencies.
> However, if the source is a plain read then there is
> no ordering. This is because of Alpha, which does not
> respect address dependencies to reads (on Alpha,
> marked reads include a memory barrier to enforce the
> ordering but plain reads do not).

Can the compiler (maybe, it does?) transform, at the C or at the "asm"
level, LB1's P0 in LB2's P0 (LB1 and LB2 are reported below)?

C LB1

{
int *x = &a;
}

P0(int **x, int *y)
{
int *r0;

r0 = rcu_dereference(*x);
*r0 = 0;
smp_wmb();
WRITE_ONCE(*y, 1);
}

P1(int **x, int *y, int *b)
{
int r0;

r0 = READ_ONCE(*y);
rcu_assign_pointer(*x, b);
}

exists (0:r0=b /\ 1:r0=1)


C LB2

{
int *x = &a;
}

P0(int **x, int *y)
{
int *r0;

r0 = rcu_dereference(*x);
if (*r0)
*r0 = 0;
smp_wmb();
WRITE_ONCE(*y, 1);
}

P1(int **x, int *y, int *b)
{
int r0;

r0 = READ_ONCE(*y);
rcu_assign_pointer(*x, b);
}

exists (0:r0=b /\ 1:r0=1)

LB1 and LB2 are data-race free, according to the patch; LB1's "exists"
clause is not satisfiable, while LB2's "exists" clause is satisfiable.

I'm adding Nick to Cc (I never spoke with him, but from what I see in
LKML, he must understand compiler better than I currently do... ;-/ )

Andrea


>
> An address dependency to a write always provides
> ordering. Neither the compiler nor the CPU can
> speculate the address of a write, because a wrong
> guess could generate a data race. (Question: do we
> need to include the case where the source is a plain
> read?)
>
> A data or control dependency to a write provides
> ordering if the target is a marked write. This is
> because the compiler is obliged to translate a marked
> write as a single machine instruction; if it
> speculates such a write there will be no opportunity
> to correct a mistake.
>
> Dep-rfi (i.e., a data or address dependency from a
> read to a write which is then read from on the same
> CPU) provides ordering between the two reads if the
> target is a marked read. This is again because the
> marked read will be translated as a machine-level load
> instruction, and then the CPU will guarantee the
> ordering.
>
> There is a special case (data;rfi) that doesn't
> provide ordering in itself but can contribute to other
> orderings. A data;rfi link corresponds to situations
> where a value is stored in a temporary shared variable
> and then loaded back again. Since the compiler might
> choose to eliminate the temporary, its accesses can't
> be said to be ordered -- but the accesses around it
> might be. As a simple example, consider:
>
> r1 = READ_ONCE(ptr);
> tmp = r1;
> r2 = tmp;
> WRITE_ONCE(*r2, 5);
>
> The plain accesses involving tmp don't have any
> particular ordering requirements, but we do know that
> the READ_ONCE must be ordered before the WRITE_ONCE.
> The chain of relations is:
>
> [marked] ; data ; rfi ; addr ; [marked]
>
> showing that a data;rfi has been inserted into an
> address dependency from a marked read to a marked
> write. In general, any number of data;rfi links can
> be inserted in each of the other kinds of dependencies.


2019-01-17 17:13:18

by Andrea Parri

[permalink] [raw]
Subject: Re: Plain accesses and data races in the Linux Kernel Memory Model

On Wed, Jan 16, 2019 at 10:36:58PM +0100, Andrea Parri wrote:
> [...]
>
> > The difficulty with incorporating plain accesses in the memory model
> > is that the compiler has very few constraints on how it treats plain
> > accesses. It can eliminate them, duplicate them, rearrange them,
> > merge them, split them up, and goodness knows what else. To make some
> > sense of this, I have taken the view that a plain access can exist
> > (perhaps multiple times) within a certain bounded region of code.
> > Ordering of two accesses X and Y means that we guarantee at least one
> > instance of the X access must be executed before any instances of the
> > Y access. (This is assuming that neither of the accesses is
> > completely eliminated by the compiler; otherwise there is nothing to
> > order!)
> >
> > After adding some simple definitions for the sets of plain and marked
> > accesses and for compiler barriers, the patch updates the ppo
> > relation. The basic idea here is that ppo can be broken down into
> > categories: memory barriers, overwrites, and dependencies (including
> > dep-rfi).
> >
> > Memory barriers always provide ordering (compiler barriers do
> > not but they have indirect effects).
> >
> > Overwriting always provides ordering. This may seem
> > surprising in the case where both X and Y are plain writes,
> > but in that case the memory model will say that X can be
> > eliminated unless there is at least a compiler barrier between
> > X and Y, and this barrier will enforce the ordering.
> >
> > Some dependencies provide ordering and some don't. Going by
> > cases:
> >
> > An address dependency to a read provides ordering when
> > the source is a marked read, even when the target is a
> > plain read. This is necessary if rcu_dereference() is
> > to work correctly; it is tantamount to assuming that
> > the compiler never speculates address dependencies.
> > However, if the source is a plain read then there is
> > no ordering. This is because of Alpha, which does not
> > respect address dependencies to reads (on Alpha,
> > marked reads include a memory barrier to enforce the
> > ordering but plain reads do not).
>
> Can the compiler (maybe, it does?) transform, at the C or at the "asm"
> level, LB1's P0 in LB2's P0 (LB1 and LB2 are reported below)?
>
> C LB1
>
> {
> int *x = &a;
> }
>
> P0(int **x, int *y)
> {
> int *r0;
>
> r0 = rcu_dereference(*x);
> *r0 = 0;
> smp_wmb();
> WRITE_ONCE(*y, 1);
> }
>
> P1(int **x, int *y, int *b)
> {
> int r0;
>
> r0 = READ_ONCE(*y);
> rcu_assign_pointer(*x, b);
> }
>
> exists (0:r0=b /\ 1:r0=1)
>
>
> C LB2
>
> {
> int *x = &a;
> }
>
> P0(int **x, int *y)
> {
> int *r0;
>
> r0 = rcu_dereference(*x);
> if (*r0)
> *r0 = 0;
> smp_wmb();
> WRITE_ONCE(*y, 1);
> }
>
> P1(int **x, int *y, int *b)
> {
> int r0;
>
> r0 = READ_ONCE(*y);
> rcu_assign_pointer(*x, b);
> }
>
> exists (0:r0=b /\ 1:r0=1)
>
> LB1 and LB2 are data-race free, according to the patch; LB1's "exists"
> clause is not satisfiable, while LB2's "exists" clause is satisfiable.
>
> I'm adding Nick to Cc (I never spoke with him, but from what I see in
> LKML, he must understand compiler better than I currently do... ;-/ )
>
> Andrea
>
>
> >
> > An address dependency to a write always provides
> > ordering. Neither the compiler nor the CPU can
> > speculate the address of a write, because a wrong
> > guess could generate a data race. (Question: do we
> > need to include the case where the source is a plain
> > read?)
> >
> > A data or control dependency to a write provides
> > ordering if the target is a marked write. This is
> > because the compiler is obliged to translate a marked
> > write as a single machine instruction; if it
> > speculates such a write there will be no opportunity
> > to correct a mistake.
> >
> > Dep-rfi (i.e., a data or address dependency from a
> > read to a write which is then read from on the same
> > CPU) provides ordering between the two reads if the
> > target is a marked read. This is again because the
> > marked read will be translated as a machine-level load
> > instruction, and then the CPU will guarantee the
> > ordering.
> >
> > There is a special case (data;rfi) that doesn't
> > provide ordering in itself but can contribute to other
> > orderings. A data;rfi link corresponds to situations
> > where a value is stored in a temporary shared variable
> > and then loaded back again. Since the compiler might
> > choose to eliminate the temporary, its accesses can't
> > be said to be ordered -- but the accesses around it
> > might be. As a simple example, consider:
> >
> > r1 = READ_ONCE(ptr);
> > tmp = r1;
> > r2 = tmp;
> > WRITE_ONCE(*r2, 5);
> >
> > The plain accesses involving tmp don't have any
> > particular ordering requirements, but we do know that
> > the READ_ONCE must be ordered before the WRITE_ONCE.
> > The chain of relations is:
> >
> > [marked] ; data ; rfi ; addr ; [marked]
> >
> > showing that a data;rfi has been inserted into an
> > address dependency from a marked read to a marked
> > write. In general, any number of data;rfi links can
> > be inserted in each of the other kinds of dependencies.

As a more general comment (disclaimer), I'm not sure we want to/can add
all the constraints above. On one hand, for some of them, I ignore the
existence of current use cases in the source (and I don't quite see my-
self encouraging their adoption...); on the other hand, these certainly
do not make the model "simpler" or easier to maintain (in a sound way).

Moreover, I doubt that runtime checkers a la KTSan will ever be able to
assist the developer by supporting these "dependency orderings". [1]

Maybe we could start by adding those orderings that we know are "widely"
relied upon _and_ used by the developers, and later add more/strengthen
the model as needed (where feasible).

Thoughts?

Andrea

[1] https://groups.google.com/d/msg/ktsan/bVZ1c6H2NE0/gapvllYNBQAJ

2019-01-17 20:23:21

by Alan Stern

[permalink] [raw]
Subject: Re: Plain accesses and data races in the Linux Kernel Memory Model

On Thu, 17 Jan 2019, Andrea Parri wrote:

> > > There is a special case (data;rfi) that doesn't
> > > provide ordering in itself but can contribute to other
> > > orderings. A data;rfi link corresponds to situations
> > > where a value is stored in a temporary shared variable
> > > and then loaded back again. Since the compiler might
> > > choose to eliminate the temporary, its accesses can't
> > > be said to be ordered -- but the accesses around it
> > > might be. As a simple example, consider:
> > >
> > > r1 = READ_ONCE(ptr);
> > > tmp = r1;
> > > r2 = tmp;
> > > WRITE_ONCE(*r2, 5);
> > >
> > > The plain accesses involving tmp don't have any
> > > particular ordering requirements, but we do know that
> > > the READ_ONCE must be ordered before the WRITE_ONCE.
> > > The chain of relations is:
> > >
> > > [marked] ; data ; rfi ; addr ; [marked]
> > >
> > > showing that a data;rfi has been inserted into an
> > > address dependency from a marked read to a marked
> > > write. In general, any number of data;rfi links can
> > > be inserted in each of the other kinds of dependencies.
>
> As a more general comment (disclaimer), I'm not sure we want to/can add
> all the constraints above. On one hand, for some of them, I ignore the
> existence of current use cases in the source (and I don't quite see my-
> self encouraging their adoption...); on the other hand, these certainly
> do not make the model "simpler" or easier to maintain (in a sound way).
>
> Moreover, I doubt that runtime checkers a la KTSan will ever be able to
> assist the developer by supporting these "dependency orderings". [1]
>
> Maybe we could start by adding those orderings that we know are "widely"
> relied upon _and_ used by the developers, and later add more/strengthen
> the model as needed (where feasible).
>
> Thoughts?

Right now I'm inclined to give up on all dependency orderings other
than address dependency from a marked read. But this would mean
missing things like

MR ->addr PR ->data MW

which ought to be a valid ordering (MR stands for "marked read", "PR"
for "plain read", and "MW" for "marked write"). Is that going to be
okay? Or should I also include data and control dependencies from
plain reads to marked writes?

Also, should this still include "[marked] ; (data ; rfi)* ; addr"?
Without it, we wouldn't be able to tell that the following test does
not race:


C non-race4

{
int *x = a;
}

P0(int **x, int *b)
{
*b = 1;
smp_wmb();
rcu_assign_pointer(*x, b);
}

P1(int **x, int **tmp)
{
int *r1;
int *r2;
int r3;

r1 = rcu_dereference(*x);
tmp = r1;
r2 = tmp;
r3 = *r2;
}

exists (1:r1=b /\ 1:r3=0)


And it seems reasonable that this pattern could be used in the kernel.
Although, I admit, in this scenario it's much more likely that tmp
would be a non-shared variable.

Alan


2019-01-17 21:34:47

by Alan Stern

[permalink] [raw]
Subject: Re: Plain accesses and data races in the Linux Kernel Memory Model

On Wed, 16 Jan 2019, Andrea Parri wrote:

> Can the compiler (maybe, it does?) transform, at the C or at the "asm"
> level, LB1's P0 in LB2's P0 (LB1 and LB2 are reported below)?
>
> C LB1
>
> {
> int *x = &a;
> }
>
> P0(int **x, int *y)
> {
> int *r0;
>
> r0 = rcu_dereference(*x);
> *r0 = 0;
> smp_wmb();
> WRITE_ONCE(*y, 1);
> }
>
> P1(int **x, int *y, int *b)
> {
> int r0;
>
> r0 = READ_ONCE(*y);
> rcu_assign_pointer(*x, b);
> }
>
> exists (0:r0=b /\ 1:r0=1)
>
>
> C LB2
>
> {
> int *x = &a;
> }
>
> P0(int **x, int *y)
> {
> int *r0;
>
> r0 = rcu_dereference(*x);
> if (*r0)
> *r0 = 0;
> smp_wmb();
> WRITE_ONCE(*y, 1);
> }
>
> P1(int **x, int *y, int *b)
> {
> int r0;
>
> r0 = READ_ONCE(*y);
> rcu_assign_pointer(*x, b);
> }
>
> exists (0:r0=b /\ 1:r0=1)
>
> LB1 and LB2 are data-race free, according to the patch; LB1's "exists"
> clause is not satisfiable, while LB2's "exists" clause is satisfiable.

Umm. Transforming

*r0 = 0;

to

if (*r0 != 0)
*r0 = 0;

wouldn't work on Alpha if r0 was assigned from a plain read with no
memory barrier between. But when r0 is assigned from an
rcu_dereference call, or if there's no indirection (as in "if (a != 0)
a = 0;"), the compiler is indeed allowed to perform this
transformation.

This means my definition of preserved writes was wrong; a write we
thought had to be preserved could instead be transformed into a read.

This objection throws a serious monkey wrench into my approach. For
one thing, it implies that (as in the example) we can't expect
smp_wmb() always to order plain writes. For another, it means we have
to assume a lot more writes need not be preserved.

I don't know. This may doom the effort to formalize dependencies to
plain accesses. Or at least, those other than address dependencies
from marked reads.

Alan


2019-01-18 15:12:58

by Alan Stern

[permalink] [raw]
Subject: Re: Plain accesses and data races in the Linux Kernel Memory Model

On Thu, 17 Jan 2019, Andrea Parri wrote:

> > Can the compiler (maybe, it does?) transform, at the C or at the "asm"
> > level, LB1's P0 in LB2's P0 (LB1 and LB2 are reported below)?
> >
> > C LB1
> >
> > {
> > int *x = &a;
> > }
> >
> > P0(int **x, int *y)
> > {
> > int *r0;
> >
> > r0 = rcu_dereference(*x);
> > *r0 = 0;
> > smp_wmb();
> > WRITE_ONCE(*y, 1);
> > }
> >
> > P1(int **x, int *y, int *b)
> > {
> > int r0;
> >
> > r0 = READ_ONCE(*y);
> > rcu_assign_pointer(*x, b);
> > }
> >
> > exists (0:r0=b /\ 1:r0=1)
> >
> >
> > C LB2
> >
> > {
> > int *x = &a;
> > }
> >
> > P0(int **x, int *y)
> > {
> > int *r0;
> >
> > r0 = rcu_dereference(*x);
> > if (*r0)
> > *r0 = 0;
> > smp_wmb();
> > WRITE_ONCE(*y, 1);
> > }
> >
> > P1(int **x, int *y, int *b)
> > {
> > int r0;
> >
> > r0 = READ_ONCE(*y);
> > rcu_assign_pointer(*x, b);
> > }
> >
> > exists (0:r0=b /\ 1:r0=1)
> >
> > LB1 and LB2 are data-race free, according to the patch; LB1's "exists"
> > clause is not satisfiable, while LB2's "exists" clause is satisfiable.

A relatively simple solution to this problem would be to say that
smp_wmb doesn't order plain writes.

I think the rest of the memory model would then be okay. However, I am
open to arguments that this approach is too complex and we should
insist on the same kind of strict ordering for race avoidance that the
C11 standard uses (i.e., conflicting accesses separated by full memory
barriers or release & acquire barriers or locking).

Alan


2019-01-18 15:59:30

by Andrea Parri

[permalink] [raw]
Subject: Re: Plain accesses and data races in the Linux Kernel Memory Model

On Fri, Jan 18, 2019 at 10:10:22AM -0500, Alan Stern wrote:
> On Thu, 17 Jan 2019, Andrea Parri wrote:
>
> > > Can the compiler (maybe, it does?) transform, at the C or at the "asm"
> > > level, LB1's P0 in LB2's P0 (LB1 and LB2 are reported below)?
> > >
> > > C LB1
> > >
> > > {
> > > int *x = &a;
> > > }
> > >
> > > P0(int **x, int *y)
> > > {
> > > int *r0;
> > >
> > > r0 = rcu_dereference(*x);
> > > *r0 = 0;
> > > smp_wmb();
> > > WRITE_ONCE(*y, 1);
> > > }
> > >
> > > P1(int **x, int *y, int *b)
> > > {
> > > int r0;
> > >
> > > r0 = READ_ONCE(*y);
> > > rcu_assign_pointer(*x, b);
> > > }
> > >
> > > exists (0:r0=b /\ 1:r0=1)
> > >
> > >
> > > C LB2
> > >
> > > {
> > > int *x = &a;
> > > }
> > >
> > > P0(int **x, int *y)
> > > {
> > > int *r0;
> > >
> > > r0 = rcu_dereference(*x);
> > > if (*r0)
> > > *r0 = 0;
> > > smp_wmb();
> > > WRITE_ONCE(*y, 1);
> > > }
> > >
> > > P1(int **x, int *y, int *b)
> > > {
> > > int r0;
> > >
> > > r0 = READ_ONCE(*y);
> > > rcu_assign_pointer(*x, b);
> > > }
> > >
> > > exists (0:r0=b /\ 1:r0=1)
> > >
> > > LB1 and LB2 are data-race free, according to the patch; LB1's "exists"
> > > clause is not satisfiable, while LB2's "exists" clause is satisfiable.
>
> A relatively simple solution to this problem would be to say that
> smp_wmb doesn't order plain writes.

It seems so; I don't have other solutions to suggest ATM. (But, TBH,
I'm still in the process of reviewing/testing these changes... )

And yes, this is a pain! : I don't have the exact statistics, but I'm
willing to believe that removing this order will take us back ~99% of
the current (~500!) uses of smp_wmb() ;-/

Oh, well, maybe we'll find a better solution one day: after all, that
one doesn't seem worse than what the current LKMM has to say! ;-)


>
> I think the rest of the memory model would then be okay. However, I am
> open to arguments that this approach is too complex and we should
> insist on the same kind of strict ordering for race avoidance that the
> C11 standard uses (i.e., conflicting accesses separated by full memory
> barriers or release & acquire barriers or locking).

Indeed; maybe, we've just found another reason to obsolete smp_wmb()! ;-)

Andrea


>
> Alan
>

2019-01-18 16:44:25

by Alan Stern

[permalink] [raw]
Subject: Re: Plain accesses and data races in the Linux Kernel Memory Model

On Fri, 18 Jan 2019, Andrea Parri wrote:

> > A relatively simple solution to this problem would be to say that
> > smp_wmb doesn't order plain writes.
>
> It seems so; I don't have other solutions to suggest ATM. (But, TBH,
> I'm still in the process of reviewing/testing these changes... )
>
> And yes, this is a pain! : I don't have the exact statistics, but I'm
> willing to believe that removing this order will take us back ~99% of
> the current (~500!) uses of smp_wmb() ;-/
>
> Oh, well, maybe we'll find a better solution one day: after all, that
> one doesn't seem worse than what the current LKMM has to say! ;-)
>
>
> >
> > I think the rest of the memory model would then be okay. However, I am
> > open to arguments that this approach is too complex and we should
> > insist on the same kind of strict ordering for race avoidance that the
> > C11 standard uses (i.e., conflicting accesses separated by full memory
> > barriers or release & acquire barriers or locking).
>
> Indeed; maybe, we've just found another reason to obsolete smp_wmb()! ;-)

Here's another example of how smp_wmb can cause trouble. In this test,
I have replaced "*x = 1" in P1 with "r2 = *x; if (r2 != 1) *x = 1",
which is a perfectly valid transformation for the compiler to make.
But as a result of this transformation, the MP pattern between P1 and
P2 is now allowed!

This shows that when plain accesses are involved, smp_wmb() in the
writing thread is not sufficient to forbid MP.

Alan


C bad-wmb

{}

P0(int *x, int *y)
{
WRITE_ONCE(*x, 1);
smp_store_release(y, 1);
}

P1(int *x, int *y, int *z)
{
int r1;
int r2;

r1 = smp_load_acquire(y);
if (r1) {
/* Instead of *x = 1 ... */
r2 = *x;
if (r2 != 1)
*x = 1;
smp_wmb();
WRITE_ONCE(*z, 1);
}
}

P2(int *x, int *z)
{
int r3;
int r4 = 0;

r3 = READ_ONCE(*z);
if (r3) {
smp_rmb();
r4 = READ_ONCE(*x);
}
}

exists (2:r3=1 /\ 2:r4=0)


2019-01-18 18:55:33

by Paul E. McKenney

[permalink] [raw]
Subject: Re: Plain accesses and data races in the Linux Kernel Memory Model

On Thu, Jan 17, 2019 at 02:43:54PM -0500, Alan Stern wrote:
> On Wed, 16 Jan 2019, Andrea Parri wrote:
>
> > Can the compiler (maybe, it does?) transform, at the C or at the "asm"
> > level, LB1's P0 in LB2's P0 (LB1 and LB2 are reported below)?
> >
> > C LB1
> >
> > {
> > int *x = &a;
> > }
> >
> > P0(int **x, int *y)
> > {
> > int *r0;
> >
> > r0 = rcu_dereference(*x);
> > *r0 = 0;
> > smp_wmb();
> > WRITE_ONCE(*y, 1);
> > }
> >
> > P1(int **x, int *y, int *b)
> > {
> > int r0;
> >
> > r0 = READ_ONCE(*y);
> > rcu_assign_pointer(*x, b);
> > }
> >
> > exists (0:r0=b /\ 1:r0=1)
> >
> >
> > C LB2
> >
> > {
> > int *x = &a;
> > }
> >
> > P0(int **x, int *y)
> > {
> > int *r0;
> >
> > r0 = rcu_dereference(*x);
> > if (*r0)
> > *r0 = 0;
> > smp_wmb();
> > WRITE_ONCE(*y, 1);
> > }
> >
> > P1(int **x, int *y, int *b)
> > {
> > int r0;
> >
> > r0 = READ_ONCE(*y);
> > rcu_assign_pointer(*x, b);
> > }
> >
> > exists (0:r0=b /\ 1:r0=1)
> >
> > LB1 and LB2 are data-race free, according to the patch; LB1's "exists"
> > clause is not satisfiable, while LB2's "exists" clause is satisfiable.
>
> Umm. Transforming
>
> *r0 = 0;
>
> to
>
> if (*r0 != 0)
> *r0 = 0;
>
> wouldn't work on Alpha if r0 was assigned from a plain read with no
> memory barrier between. But when r0 is assigned from an
> rcu_dereference call, or if there's no indirection (as in "if (a != 0)
> a = 0;"), the compiler is indeed allowed to perform this
> transformation.
>
> This means my definition of preserved writes was wrong; a write we
> thought had to be preserved could instead be transformed into a read.
>
> This objection throws a serious monkey wrench into my approach. For
> one thing, it implies that (as in the example) we can't expect
> smp_wmb() always to order plain writes. For another, it means we have
> to assume a lot more writes need not be preserved.
>
> I don't know. This may doom the effort to formalize dependencies to
> plain accesses. Or at least, those other than address dependencies
> from marked reads.

(Catching up, hello from Auckland!)

At this point, I am very much in favor of taking the simpler starting
point. If someone is using any sort of dependency from a plain access,
all bets are off. Similarly, if someone is using a control or data
dependency even from a marked access, the later dependent access must
be marked to guarantee ordering.

I believe that the transformation from "*r0 = 0" should be convincing. ;-)

Thanx, Paul