LinuxLists.cc - Questions about memory barriers

2004-10-12 14:54:23

Subject: Questions about memory barriers

I have a couple of questions regarding memory barriers.

The first question concerns read_barrier_depends(). I'm not sure exactly
what it does.

The documentation in include/asm-i386/system.h says:

* No data-dependent reads from memory-like regions are ever reordered
* over this barrier. All reads preceding this primitive are guaranteed
* to access memory (but not necessarily other CPUs' caches) before any
* reads following this primitive that depend on the data return by
* any of the preceding reads.

Taken at face value, this implies that all reads preceding
read_barrier_depends are guaranteed to access memory before the barrier
finishes. Even reads whose data is not used by a subsequent read. Is
this right?

Furthermore, the text's distinction of reads "that depend on the data
return[ed] by any of the preceding reads" is nearly meaningless. Almost
any read from a non-constant location could fall into that category.
Consider this example:

q = p;
<... millions of instructions ...>
read_barrier_depends();
d = *q;

How is the processor supposed to remember whether or not the value of q
depends on the earlier read of p? Obviously it can't, so it must assume
that such a dependency exists. Only if q had very recently been assigned
a constant value would the processor know otherwise.

Putting these ideas together, they amount to saying that
read_barrier_depends is just like rmb except that reads from a constant
location following the barrier are allowed to be moved before the barrier.
Have I missed anything?

The first code example in system.h is not informative. It says that this
code sequence:

q = p;
read_barrier_depends();
d = *q;

enforces ordering. But that means nothing; the ordering is already forced
by the C language definition. After all, it's impossible for the
processor to load data from *q before it knows what value is stored in q.

The other code example says that

y = b;
read_barrier_depends();
x = a;

enforces nothing since there is no dependency between the read of "b" and
the read of "a". But the other documentation doesn't require such a
dependency to exist; it only requires that the read of "a" depends on data
from a previous read -- which is quite likely unless "a" is a statically
allocated variable. Was that the intention? It's not clear; the example
seems to imply that read_barrier_depends enforces ordering only in
situations where the C language already enforces it.

My second question concerns guarantees about barriers and synchronization
primitives. It doesn't seem to be documented anywhere, but I would assume
the following statements are all true:

Acquiring a semaphore or spinlock implicitly includes a read
barrier.

Releasing a semaphore or spinlock implicitly includes a write
barrier.

Reading the value of an atomic_t implicitly includes a read
barrier.

Setting or changing the value of an atomic_t implicitly includes
a write barrier.

test_bit(), test_and_set_bit(), etc. implicitly include read
barriers.

set_bit(), clear_bit(), test_and_set_bit(), etc. implicitly
include write barriers.

Without some guarantees like these, the synchronization primitives would
be a lot harder to use. Are these statements in fact correct?

Alan Stern

2004-10-13 00:54:51

by Clemens Buchacher

[permalink] [raw]

Subject: Re: Questions about memory barriers

On Tue, Oct 12, 2004 at 10:50:01AM -0400, Alan Stern wrote:
> * No data-dependent reads from memory-like regions are ever reordered
> * over this barrier. All reads preceding this primitive are guaranteed
> * to access memory (but not necessarily other CPUs' caches) before any
> * reads following this primitive that depend on the data return by
> * any of the preceding reads.
>
> Taken at face value, this implies that all reads preceding
> read_barrier_depends are guaranteed to access memory before the barrier
> finishes. Even reads whose data is not used by a subsequent read. Is
> this right?

No. It only affects instructions that would not have completed before a
subsequent (dependent) read.

> Furthermore, the text's distinction of reads "that depend on the data
> return[ed] by any of the preceding reads" is nearly meaningless. Almost
> any read from a non-constant location could fall into that category.
> Consider this example:
>
> q = p;
> <... millions of instructions ...>
> read_barrier_depends();
> d = *q;
>
> How is the processor supposed to remember whether or not the value of q
> depends on the earlier read of p? Obviously it can't, so it must assume
> that such a dependency exists. Only if q had very recently been assigned
> a constant value would the processor know otherwise.

Exactly. And only these recent assignments, i.e. incomplete assignments, are
relevant to instruction reordering anyway. The processor cannot reorder
instructions for which execution is already over. It can move the instruction
in front of the barrier, but not in front of read instructions it depends on.

> The first code example in system.h is not informative. It says that this
> code sequence:
>
> q = p;
> read_barrier_depends();
> d = *q;
>
> enforces ordering. But that means nothing; the ordering is already forced
> by the C language definition. After all, it's impossible for the
> processor to load data from *q before it knows what value is stored in q.

But it _is_ possible. The technique is called 'speculative reading' (or value
prediction). The processor guesses the value of q and reads *q before it
finishes the assignment to q. If it was wrong, the read has to be repeated. But
there is a chance that the stall can be avoided.

> The other code example says that
>
> y = b;
> read_barrier_depends();
> x = a;
>
> enforces nothing since there is no dependency between the read of "b" and
> the read of "a". But the other documentation doesn't require such a
> dependency to exist; it only requires that the read of "a" depends on data
> from a previous read -- which is quite likely unless "a" is a statically
> allocated variable. Was that the intention? It's not clear; the example
> seems to imply that read_barrier_depends enforces ordering only in
> situations where the C language already enforces it.

It has nothing to do with the programming language. Of course data dependencies
have to be respected. But while the processor ensures data consistency of the
program it is running, instruction reordering can have undesired effects when
data is shared with programs running on other processors.

Let's analyze the example from include/asm-i386/system.h:

* For example, the following code would force ordering (the initial
* value of "a" is zero, "b" is one, and "p" is "&a"):
*
* <programlisting>
* CPU 0 CPU 1
*
* b = 2;
* memory_barrier();
* p = &b; q = p;
* read_barrier_depends();
* d = *q;
* </programlisting>

The memory_barrier() on CPU 0 ensures that if p points to b, b already has the
value 2 and not its original value 1 any more. So if q is assigned the value p
on CPU 1, it subsequently points to either a (which is still zero) or to b, in
which case we would expect *q to give the value 2.

If, on the other hand read_barrier_depends() was not there, the processor could
try to predict the value of q and read *q before p is assigned to q. Consider
the following case (actual order of execution):

CPU 0 CPU 1

d = *q; // predict q to become pointer to b
b = 2;
memory_barrier(); [ ... other instructions ... ]
p = &b;
q = p; // q points to b, prediction
// turns out to be correct

Now d suddenly has the value 1, even though p never pointed to a variable
holding that value!

Clemens

2004-10-13 15:25:35

by Alan Stern

[permalink] [raw]

Subject: Re: Questions about memory barriers

On Wed, 13 Oct 2004, Clemens Buchacher wrote:

> On Tue, Oct 12, 2004 at 10:50:01AM -0400, Alan Stern wrote:
> > * No data-dependent reads from memory-like regions are ever reordered
> > * over this barrier. All reads preceding this primitive are guaranteed
> > * to access memory (but not necessarily other CPUs' caches) before any
> > * reads following this primitive that depend on the data return by
> > * any of the preceding reads.
> >
> > Taken at face value, this implies that all reads preceding
> > read_barrier_depends are guaranteed to access memory before the barrier
> > finishes. Even reads whose data is not used by a subsequent read. Is
> > this right?
>
> No. It only affects instructions that would not have completed before a
> subsequent (dependent) read.

Does it _affect_ those instructions, or does it simply wait until those
instructions have completed? And how does the processor know, when it
executes the barrier, which pending reads will be depended on (and so must
be completed) and which won't?

The impression I get from what you wrote below is that the barrier
instruction causes the processor to abandon all speculative reads until
any already-issued reads have completed. Isn't that essentially the same
as saying that no speculative (i.e., non-constant) read can be moved back
past the barrier and that the barrier won't finish until all
already-issued reads have completed (in other words, these reads can't be
moved forward past the barrier)?

> > The first code example in system.h is not informative. It says that this
> > code sequence:
> >
> > q = p;
> > read_barrier_depends();
> > d = *q;
> >
> > enforces ordering. But that means nothing; the ordering is already forced
> > by the C language definition. After all, it's impossible for the
> > processor to load data from *q before it knows what value is stored in q.
>
> But it _is_ possible. The technique is called 'speculative reading' (or value
> prediction). The processor guesses the value of q and reads *q before it
> finishes the assignment to q. If it was wrong, the read has to be repeated. But
> there is a chance that the stall can be avoided.

> It has nothing to do with the programming language. Of course data dependencies
> have to be respected. But while the processor ensures data consistency of the
> program it is running, instruction reordering can have undesired effects when
> data is shared with programs running on other processors.
>
> Let's analyze the example from include/asm-i386/system.h:
>
> * For example, the following code would force ordering (the initial
> * value of "a" is zero, "b" is one, and "p" is "&a"):
> *
> * <programlisting>
> * CPU 0 CPU 1
> *
> * b = 2;
> * memory_barrier();
> * p = &b; q = p;
> * read_barrier_depends();
> * d = *q;
> * </programlisting>
>
> The memory_barrier() on CPU 0 ensures that if p points to b, b already has the
> value 2 and not its original value 1 any more. So if q is assigned the value p
> on CPU 1, it subsequently points to either a (which is still zero) or to b, in
> which case we would expect *q to give the value 2.
>
> If, on the other hand read_barrier_depends() was not there, the processor could
> try to predict the value of q and read *q before p is assigned to q. Consider
> the following case (actual order of execution):
>
> CPU 0 CPU 1
>
> d = *q; // predict q to become pointer to b
> b = 2;
> memory_barrier(); [ ... other instructions ... ]
> p = &b;
> q = p; // q points to b, prediction
> // turns out to be correct
>
> Now d suddenly has the value 1, even though p never pointed to a variable
> holding that value!

Okay, I get it. The example would be even clearer if you stipulate that
initially q = &b; that would give the CPU a good reason for speculatively
fetching the value of b.

This seems like a devilishly easy sort of thing to overlook!

Alan Stern

2004-10-13 17:00:15

by Clemens Buchacher

[permalink] [raw]

Subject: Re: Questions about memory barriers

On Wed, Oct 13, 2004 at 11:25:30AM -0400, Alan Stern wrote:
> The impression I get from what you wrote below is that the barrier
> instruction causes the processor to abandon all speculative reads until
> any already-issued reads have completed. Isn't that essentially the same
> as saying that no speculative (i.e., non-constant) read can be moved back
> past the barrier and that the barrier won't finish until all
> already-issued reads have completed (in other words, these reads can't be
> moved forward past the barrier)?

Well, first you have to invalidate any preceding speculative reads and, as you
correctly stated, abstain from any further speculative reads until all reads
preceding the barrier have completed. This restriction, however, does not force
us to wait at the barrier until all reads have completed (as a read barrier
would). We can continue to execute instructions for which all reads they depend
on have completed.

As a result, all read_barrier_depends() does is to prohibit reordering of
dependent instructions _across_ the barrier.

> This seems like a devilishly easy sort of thing to overlook!

True. Note that Alpha is the only architecture whose read_barrier_depends()
macro expands to something other than a no-op.

Clemens