2021-06-04 18:28:35

by Alan Stern

Subject: Re: [RFC] LKMM: Add volatile_if()

On Fri, Jun 04, 2021 at 06:17:20PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 04, 2021 at 11:51:54AM -0400, Alan Stern wrote:
> > On Fri, Jun 04, 2021 at 05:42:28PM +0200, Peter Zijlstra wrote:
>
> > > #define volatile_if(cond) if (({ bool __t = (cond); BUILD_BUG_ON(__builtin_constant_p(__t)); volatile_cond(__t); }))
> >
> > That won't help with more complicated examples, such as:
> >
> > volatile_if (READ_ONCE(*x) * 0 + READ_ONCE(*y))
>
> That's effectively:
>
> volatile_if (READ_ONCE(*y))
> WRITE_ONCE(*y, 42);

Sorry, what I meant to write was:

volatile_if (READ_ONCE(*x) * 0 + READ_ONCE(*y))
        WRITE_ONCE(*z, 42);

where there is no ordering between *x and *z. It's not daft, and yes, a
macro won't be able to warn about it.

Alan

> which is a valid, but daft, LOAD->STORE order, no? A compiler might
> maybe be able to WARN on that, but that's definitely beyond what we can
> do with macros.


2021-06-04 19:14:31

by Linus Torvalds

Subject: Re: [RFC] LKMM: Add volatile_if()

On Fri, Jun 4, 2021 at 11:27 AM Alan Stern <[email protected]> wrote:
>
> volatile_if (READ_ONCE(*x) * 0 + READ_ONCE(*y))
> WRITE_ONCE(*z, 42);
>
> where there is no ordering between *x and *z.

I wouldn't worry about it.

I think a compiler is allowed to optimize away stupid code.

I get upset when a compiler says "oh, that's undefined, so I will
ignore the obvious meaning of it", but that's a different thing
entirely.

I really wish that the C standards group showed some spine, and said
"there is no undefined, there is only implementation-defined". That
would solve a *lot* of problems.

But I also realize that will never happen. Because "spine" and "good
taste" is not something that I've ever heard of happening in an
industry standards committee.

Side note: it is worth noting that my version of "volatile_if()" has
an added little quirk: it _ONLY_ orders the stuff inside the
if-statement.

I do think it's worth not adding new special cases (especially that
"asm goto" hack that will generate worse code than the compiler could
do), but it means that

x = READ_ONCE(ptr);
volatile_if (x > 0)
        WRITE_ONCE(*z, 42);

has an ordering, but if you write it as

x = READ_ONCE(ptr);
volatile_if (x <= 0)
        return;
WRITE_ONCE(*z, 42);

then I could in theory see the compiler doing that WRITE_ONCE() as
some kind of non-control dependency.

That said, I don't actually see how the compiler could do anything
that actually broke the _semantics_ of the code. Yes, it could do the
write using a magical data dependency on the conditional and turning
it into a store on a conditional address instead (before doing the
branch), but honestly, I don't see how that would actually break
anything.

So this is more of a "in theory, the two sides are not symmetric". The
"asm volatile" in a barrier() will force the compiler to generate the
branch, and the memory clobber in barrier() will most certainly force
any stores inside the "volatile_if()" to be after the branch.

But because the memory clobber is only inside the if-statement true
case, the false case could have the compiler migrate any code in that
false thing to before the if.

Again, semantics do matter, and I don't see how the compiler could
actually break the fundamental issue of "load->conditional->store is a
fundamental ordering even without memory barriers because of basic
causality", because you can't just arbitrarily generate speculative
stores that would be visible to others.

But at the same time, that's *such* a fundamental rule that I really
am intrigued why people think "volatile_if()" is needed in reality (as
opposed to some "in theory, the compiler can know things that are
unknowable thanks to a magical oracle" BS argument).

Linus

2021-06-04 19:21:47

by Linus Torvalds

Subject: Re: [RFC] LKMM: Add volatile_if()

On Fri, Jun 4, 2021 at 12:09 PM Linus Torvalds
<[email protected]> wrote:
>
> Again, semantics do matter, and I don't see how the compiler could
> actually break the fundamental issue of "load->conditional->store is a
> fundamental ordering even without memory barriers because of basic
> causality", because you can't just arbitrarily generate speculative
> stores that would be visible to others.

This, after all, is why we trust that the *hardware* can't do it.

Even if the hardware mis-speculates and goes down the wrong branch,
and speculatively does the store when it shouldn't have, we don't
care: we know that such a speculative store can not possibly become
semantically visible (*) to other threads.

For all the same reasons, I don't see how a compiler can violate
causal ordering of the code (assuming, again, that the test is
_meaningful_ - if we write nonsensical code, that's a different
issue).

If we have compilers that create speculative stores that are visible
to other threads, we need to fix them.

Linus

(*) By "semantically visible" I intend to avoid the whole timing/cache
pattern kind of non-semantic visibility that is all about the spectre
leakage kind of things.

2021-06-04 20:59:52

by Paul E. McKenney

Subject: Re: [RFC] LKMM: Add volatile_if()

On Fri, Jun 04, 2021 at 12:18:43PM -0700, Linus Torvalds wrote:
> On Fri, Jun 4, 2021 at 12:09 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > Again, semantics do matter, and I don't see how the compiler could
> > actually break the fundamental issue of "load->conditional->store is a
> > fundamental ordering even without memory barriers because of basic
> > causality", because you can't just arbitrarily generate speculative
> > stores that would be visible to others.
>
> This, after all, is why we trust that the *hardware* can't do it.
>
> Even if the hardware mis-speculates and goes down the wrong branch,
> and speculatively does the store when it shouldn't have, we don't
> care: we know that such a speculative store can not possibly become
> semantically visible (*) to other threads.
>
> For all the same reasons, I don't see how a compiler can violate
> causal ordering of the code (assuming, again, that the test is
> _meaningful_ - if we write nonsensical code, that's a different
> issue).

I am probably missing your point, but something like this:

if (READ_ONCE(x))
        y = 42;
else
        y = 1729;

Can in theory be transformed into something like this:

y = 1729;
if (READ_ONCE(x))
        y = 42;

The usual way to prevent it is to use WRITE_ONCE().
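
(For reference, a minimal sketch of that form; with volatile stores the
compiler cannot invent the early plain store of 1729:)

if (READ_ONCE(x))
        WRITE_ONCE(y, 42);
else
        WRITE_ONCE(y, 1729);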

Fortunately, register sets are large, and gcc manages to do a single
store and use only %eax.

Thanx, Paul

> If we have compilers that create speculative stores that are visible
> to other threads, we need to fix them.
>
> Linus
>
> (*) By "semantically visible" I intend to avoid the whole timing/cache
> pattern kind of non-semantic visibility that is all about the spectre
> leakage kind of things.

2021-06-04 21:31:03

by Linus Torvalds

Subject: Re: [RFC] LKMM: Add volatile_if()

On Fri, Jun 4, 2021 at 1:56 PM Paul E. McKenney <[email protected]> wrote:
>
> The usual way to prevent it is to use WRITE_ONCE().

The very *documentation example* for "volatile_if()" uses that WRITE_ONCE().

IOW, the patch that started this discussion has this comment in it:

+/**
+ * volatile_if() - Provide a control-dependency
+ *
+ * volatile_if(READ_ONCE(A))
+ * WRITE_ONCE(B, 1);
+ *
+ * will ensure that the STORE to B happens after the LOAD of A.

and my point is that I don't see *ANY THEORETICALLY POSSIBLE* way that
that "volatile_if()" could not be just a perfectly regular "if ()".

Can you?

Because we *literally* depend on the fundamental concept of causality
to make the hardware not re-order those operations.

That is the WHOLE AND ONLY point of this whole construct: we're
avoiding a possibly expensive hardware barrier operation, because we
know we have a more fundamental barrier that is INHERENT TO THE
OPERATION.

And I cannot for the life of me see how a compiler can break that
fundamental concept of causality either.

Seriously. Tell me how a compiler could _possibly_ turn that into
something that breaks the fundamental causal relationship. The same
fundamental causal relationship that is the whole and only reason we
don't need a memory barrier for the hardware.

And no, there is not a way in hell that the above can be written with
some kind of semantically visible speculative store without the
compiler being a total pile of garbage that wouldn't be usable for
compiling a kernel with.

If your argument is that the compiler can magically insert speculative
stores that can then be overwritten later, then MY argument is that
such a compiler could do that for *ANYTHING*. "volatile_if()" wouldn't
save us.

If that's valid compiler behavior in your opinion, then we have
exactly two options:

(a) give up

(b) not use that broken garbage of a compiler.

So I can certainly accept the patch with the simpler implementation of
"volatile_if()", but dammit, I want to see an actual real example
arguing for why it would be relevant and why the compiler would need
our help.

Because the EXACT VERY EXAMPLE that was in the patch as-is sure as
hell is no such thing.

If the intent is to *document* that "this conditional is part of a
load-conditional-store memory ordering pattern", then that is one
thing. But if that's the intent, then we might as well just write it
as

#define volatile_if(x) if (x)

and add a *comment* about why this kind of sequence doesn't need a
memory barrier.

I'd much rather have that kind of documentation, than have barriers
that are magical for theoretical compiler issues that aren't real, and
don't have any grounding in reality.

Without a real and valid example of how this could matter, this is
just voodoo programming.

We don't actually need to walk three times widdershins around the
computer before compiling the kernel. That's not how kernel development
works.

And we don't need to add a "volatile_if()" with magical barriers that
have no possibility of having real semantic meaning.

So I want to know what the semantic meaning of volatile_if() would be,
and why it fixes anything that a plain "if()" wouldn't. I want to see
the sequence where that "volatile_if()" actually *fixes* something.

Linus

2021-06-04 21:44:10

by Paul E. McKenney

Subject: Re: [RFC] LKMM: Add volatile_if()

On Fri, Jun 04, 2021 at 02:27:49PM -0700, Linus Torvalds wrote:
> On Fri, Jun 4, 2021 at 1:56 PM Paul E. McKenney <[email protected]> wrote:
> >
> > The usual way to prevent it is to use WRITE_ONCE().
>
> The very *documentation example* for "volatile_if()" uses that WRITE_ONCE().

Whew! ;-)

> IOW, the patch that started this discussion has this comment in it:
>
> +/**
> + * volatile_if() - Provide a control-dependency
> + *
> + * volatile_if(READ_ONCE(A))
> + * WRITE_ONCE(B, 1);
> + *
> + * will ensure that the STORE to B happens after the LOAD of A.
>
> and my point is that I don't see *ANY THEORETICALLY POSSIBLE* way that
> that "volatile_if()" could not be just a perfectly regular "if ()".
>
> Can you?

I cannot, maybe due to failure of imagination. But please see below.

> Because we *literally* depend on the fundamental concept of causality
> to make the hardware not re-order those operations.
>
> That is the WHOLE AND ONLY point of this whole construct: we're
> avoiding a possibly expensive hardware barrier operation, because we
> know we have a more fundamental barrier that is INHERENT TO THE
> OPERATION.
>
> And I cannot for the life of me see how a compiler can break that
> fundamental concept of causality either.
>
> Seriously. Tell me how a compiler could _possibly_ turn that into
> something that breaks the fundamental causal relationship. The same
> fundamental causal relationship that is the whole and only reason we
> don't need a memory barrier for the hardware.
>
> And no, there is not a way in hell that the above can be written with
> some kind of semantically visible speculative store without the
> compiler being a total pile of garbage that wouldn't be usable for
> compiling a kernel with.
>
> If your argument is that the compiler can magically insert speculative
> stores that can then be overwritten later, then MY argument is that
> such a compiler could do that for *ANYTHING*. "volatile_if()" wouldn't
> save us.
>
> If that's valid compiler behavior in your opinion, then we have
> exactly two options:
>
> (a) give up
>
> (b) not use that broken garbage of a compiler.
>
> So I can certainly accept the patch with the simpler implementation of
> "volatile_if()", but dammit, I want to see an actual real example
> arguing for why it would be relevant and why the compiler would need
> our help.
>
> Because the EXACT VERY EXAMPLE that was in the patch as-is sure as
> hell is no such thing.
>
> If the intent is to *document* that "this conditional is part of a
> load-conditional-store memory ordering pattern", then that is one
> thing. But if that's the intent, then we might as well just write it
> as
>
> #define volatile_if(x) if (x)
>
> and add a *comment* about why this kind of sequence doesn't need a
> memory barrier.
>
> I'd much rather have that kind of documentation, than have barriers
> that are magical for theoretical compiler issues that aren't real, and
> don't have any grounding in reality.
>
> Without a real and valid example of how this could matter, this is
> just voodoo programming.
>
> We don't actually need to walk three times widdershins around the
> computer before compiling the kernel. That's not how kernel development
> works.
>
> And we don't need to add a "volatile_if()" with magical barriers that
> have no possibility of having real semantic meaning.
>
> So I want to know what the semantic meaning of volatile_if() would be,
> and why it fixes anything that a plain "if()" wouldn't. I want to see
> the sequence where that "volatile_if()" actually *fixes* something.

Here is one use case:

volatile_if(READ_ONCE(A)) {
        WRITE_ONCE(B, 1);
        do_something();
} else {
        WRITE_ONCE(B, 1);
        do_something_else();
}

With plain "if", the compiler is within its rights to do this:

tmp = READ_ONCE(A);
WRITE_ONCE(B, 1);
if (tmp)
        do_something();
else
        do_something_else();

On x86, still no problem. But weaker hardware could now reorder the
store to B before the load from A. With volatile_if(), this reordering
would be prevented.

Thanx, Paul

2021-06-04 22:22:26

by Linus Torvalds

Subject: Re: [RFC] LKMM: Add volatile_if()

On Fri, Jun 4, 2021 at 2:40 PM Paul E. McKenney <[email protected]> wrote:
>
> Here is one use case:
>
> volatile_if(READ_ONCE(A)) {
> WRITE_ONCE(B, 1);
> do_something();
> } else {
> WRITE_ONCE(B, 1);
> do_something_else();
> }
>
> With plain "if", the compiler is within its rights to do this:
>
> tmp = READ_ONCE(A);
> WRITE_ONCE(B, 1);
> if (tmp)
> do_something();
> else
> do_something_else();
>
> On x86, still no problem. But weaker hardware could now reorder the
> store to B before the load from A. With volatile_if(), this reordering
> would be prevented.

But *should* it be prevented? For code like the above?

I'm not really seeing that the above is a valid code sequence.

Sure, that "WRITE_ONCE(B, 1)" could be seen as a lock release, and
then it would be wrong to have the read of 'A' happen after the lock
has actually been released. But if that's the case, then it should
have used a smp_store_release() in the first place, not a
WRITE_ONCE().

So I don't see the above as much of a valid example of actual
READ/WRITE_ONCE() use.

If people use READ/WRITE_ONCE() like the above, and they actually
depend on that kind of ordering, I think that code is likely wrong to
begin with. Using "volatile_if()" doesn't make it more valid.

Now, part of this is that I do think that in *general* we should never
use this very subtle load-cond-store pattern to begin with. We should
strive to use more smp_load_acquire() and smp_store_release() if we
care about ordering of accesses. They are typically cheap enough, and
if there's much of an ordering issue, they are the right things to do.
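
For example, Paul's case above could be written with an acquire load
instead of relying on the control dependency (a sketch; the acquire
alone already orders the load of A before the stores to B):

if (smp_load_acquire(&A)) {
        WRITE_ONCE(B, 1);
        do_something();
} else {
        WRITE_ONCE(B, 1);
        do_something_else();
}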

I think the whole "load-to-store ordering" subtle non-ordered case is
for very very special cases, when you literally don't have a general
memory ordering, you just have an ordering for *one* very particular
access. Like some of the very magical code in the rw-semaphore case,
or that smp_cond_load_acquire().

IOW, I would expect that we have a handful of uses of this thing. And
none of them have that "the conditional store is the same on both
sides" pattern, afaik.

And immediately when the conditional store is different, you end up
having a dependency on it that orders it.
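
For example (a sketch: once the two stores differ, the stored value
depends on the loaded value through either the branch or the data
itself, and that dependency is what orders it):

if (READ_ONCE(A))
        WRITE_ONCE(B, 1);
else
        WRITE_ONCE(B, 2);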

But I guess I can accept the above made-up example as an "argument",
even though I feel it is entirely irrelevant to the actual issues and
uses we have.

Linus

2021-06-05 03:15:48

by Alan Stern

Subject: Re: [RFC] LKMM: Add volatile_if()

On Fri, Jun 04, 2021 at 12:09:26PM -0700, Linus Torvalds wrote:
> Side note: it is worth noting that my version of "volatile_if()" has
> an added little quirk: it _ONLY_ orders the stuff inside the
> if-statement.
>
> I do think it's worth not adding new special cases (especially that
> "asm goto" hack that will generate worse code than the compiler could
> do), but it means that
>
> x = READ_ONCE(ptr);
> volatile_if (x > 0)
> WRITE_ONCE(*z, 42);
>
> has an ordering, but if you write it as
>
> x = READ_ONCE(ptr);
> volatile_if (x <= 0)
> return;
> WRITE_ONCE(*z, 42);
>
> then I could in theory see the compiler doing that WRITE_ONCE() as
> some kind of non-control dependency.

This may be a minor point, but can that loophole be closed as follows?

#define volatile_if(x) \
        if ((({ _Bool __x = (x); BUILD_BUG_ON(__builtin_constant_p(__x)); __x; }) && \
             ({ barrier(); 1; })) || ({ barrier(); 0; }))

(It's now a little later at night than when I usually think about this
sort of thing, so my brain isn't firing on all its cylinders. Forgive
me if this is a dumb question.)

Alan

2021-06-05 14:59:41

by Alan Stern

Subject: Re: [RFC] LKMM: Add volatile_if()

On Fri, Jun 04, 2021 at 03:19:11PM -0700, Linus Torvalds wrote:
> Now, part of this is that I do think that in *general* we should never
> use this very subtle load-cond-store pattern to begin with. We should
> strive to use more smp_load_acquire() and smp_store_release() if we
> care about ordering of accesses. They are typically cheap enough, and
> if there's much of an ordering issue, they are the right things to do.
>
> I think the whole "load-to-store ordering" subtle non-ordered case is
> for very very special cases, when you literally don't have a general
> memory ordering, you just have an ordering for *one* very particular
> access. Like some of the very magical code in the rw-semaphore case,
> or that smp_cond_load_acquire().
>
> IOW, I would expect that we have a handful of uses of this thing. And
> none of them have that "the conditional store is the same on both
> sides" pattern, afaik.
>
> And immediately when the conditional store is different, you end up
> having a dependency on it that orders it.
>
> But I guess I can accept the above made-up example as an "argument",
> even though I feel it is entirely irrelevant to the actual issues and
> uses we have.

Indeed, the expansion of the currently proposed version of

volatile_if (A) {
        B;
} else {
        C;
}

is basically the same as

if (A) {
        barrier();
        B;
} else {
        barrier();
        C;
}

which is just about as easy to write by hand. (For some reason my
fingers don't like typing "volatile_"; the letters tend to get
scrambled.)

So given that:

1. Reliance on control dependencies is uncommon in the kernel,
and

2. The loads in A could just be replaced with load_acquires
at a low penalty (or store-releases could go into B and C),

it seems that we may not need volatile_if at all! The only real reason
for having it in the first place was to avoid the penalty of
load-acquire on architectures where it has a significant cost, when the
control dependency would provide the necessary ordering for free. Such
architectures are getting less and less common.

Alan

2021-06-05 16:28:33

by Linus Torvalds

Subject: Re: [RFC] LKMM: Add volatile_if()

On Fri, Jun 4, 2021 at 8:14 PM Alan Stern <[email protected]> wrote:
>
> >
> > then I could in theory see the compiler doing that WRITE_ONCE() as
> > some kind of non-control dependency.
>
> This may be a minor point, but can that loophole be closed as follows?

Note that it's actually entirely sufficient to have the barrier just
on one side.

I brought it up mainly as an oddity, and that it can result in the
compiler generating different code for the two different directions.

The reason that it is sufficient is that with the barrier in place (on
either side), the compiler really can't do much. It can't join either
of the sides, because it has to do that barrier on one side before any
common code.
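
Concretely, something like this (a sketch reusing Paul's A/B example,
with the barrier kept on only one arm):

if (READ_ONCE(A)) {
        barrier();
        WRITE_ONCE(B, 1);
        do_something();
} else {
        WRITE_ONCE(B, 1);
        do_something_else();
}

Only one arm contains the barrier, so there is nothing identical to
merge, and the conditional branch stays in front of the store.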

In fact, even if the compiler decides to first do a conditional call
just around the barrier, and then do any common code (and then do
_another_ conditional branch), it still did that conditional branch
first, and the problem is solved. The CPU doesn't care, it will have
to resolve the branch before any subsequent stores are finalized.

Of course, if the compiler creates a conditional call just around the
barrier, and the barrier is empty (like we do now), and the compiler
leaves no mark of it in the result (like it does seem to do for empty
asm statements), I could imagine some optimizing assembler (or linker)
screwing things up for us, and saying "a conditional branch to the
next instruction can just be removed".

At that point, we've lost again, and it's a toolchain issue. I don't
think that issue can currently happen, but it's an example of yet
another really subtle problem that *could* happen even if *we* do
everything right.

I also do not believe that any of our code that has this pattern would
have that situation where the compiler would generate a branch over
just the barrier. It's kind of similar to Paul's example in that
sense. When we use volatile_if(), the two sides are very very
different entirely regardless of the barrier, so in practice I think
this is all entirely moot.

Linus

2021-06-06 00:16:19

by Paul E. McKenney

Subject: Re: [RFC] LKMM: Add volatile_if()

On Sat, Jun 05, 2021 at 10:57:39AM -0400, Alan Stern wrote:
> On Fri, Jun 04, 2021 at 03:19:11PM -0700, Linus Torvalds wrote:
> > Now, part of this is that I do think that in *general* we should never
> > use this very subtle load-cond-store pattern to begin with. We should
> > strive to use more smp_load_acquire() and smp_store_release() if we
> > care about ordering of accesses. They are typically cheap enough, and
> > if there's much of an ordering issue, they are the right things to do.
> >
> > I think the whole "load-to-store ordering" subtle non-ordered case is
> > for very very special cases, when you literally don't have a general
> > memory ordering, you just have an ordering for *one* very particular
> > access. Like some of the very magical code in the rw-semaphore case,
> > or that smp_cond_load_acquire().
> >
> > IOW, I would expect that we have a handful of uses of this thing. And
> > none of them have that "the conditional store is the same on both
> > sides" pattern, afaik.
> >
> > And immediately when the conditional store is different, you end up
> > having a dependency on it that orders it.
> >
> > But I guess I can accept the above made-up example as an "argument",
> > even though I feel it is entirely irrelevant to the actual issues and
> > uses we have.
>
> Indeed, the expansion of the currently proposed version of
>
> volatile_if (A) {
> B;
> } else {
> C;
> }
>
> is basically the same as
>
> if (A) {
> barrier();
> B;
> } else {
> barrier();
> C;
> }
>
> which is just about as easy to write by hand. (For some reason my
> fingers don't like typing "volatile_"; the letters tend to get
> scrambled.)
>
> So given that:
>
> 1. Reliance on control dependencies is uncommon in the kernel,
> and
>
> 2. The loads in A could just be replaced with load_acquires
> at a low penalty (or store-releases could go into B and C),
>
> it seems that we may not need volatile_if at all! The only real reason
> for having it in the first place was to avoid the penalty of
> load-acquire on architectures where it has a significant cost, when the
> control dependency would provide the necessary ordering for free. Such
> architectures are getting less and less common.

That does sound good, but...

Current compilers beg to differ at -O2: https://godbolt.org/z/5K55Gardn

------------------------------------------------------------------------
#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
#define WRITE_ONCE(x, val) (READ_ONCE(x) = (val))
#define barrier() __asm__ __volatile__("": : :"memory")

int x, y;

int main(int argc, char *argv[])
{
        if (READ_ONCE(x)) {
                barrier();
                WRITE_ONCE(y, 1);
        } else {
                barrier();
                WRITE_ONCE(y, 1);
        }
        return 0;
}
------------------------------------------------------------------------

Both gcc and clang generate a load followed by a store, with no branch.
ARM gets the same results from both compilers.

As Linus suggested, removing one (but not both!) invocations of barrier()
does cause a branch to be emitted, so maybe that is a way forward.
Assuming it is more than just dumb luck, anyway. :-/

Thanx, Paul

2021-06-06 01:37:02

by Alan Stern

Subject: Re: [RFC] LKMM: Add volatile_if()

On Sat, Jun 05, 2021 at 05:14:18PM -0700, Paul E. McKenney wrote:
> On Sat, Jun 05, 2021 at 10:57:39AM -0400, Alan Stern wrote:
> > Indeed, the expansion of the currently proposed version of
> >
> > volatile_if (A) {
> > B;
> > } else {
> > C;
> > }
> >
> > is basically the same as
> >
> > if (A) {
> > barrier();
> > B;
> > } else {
> > barrier();
> > C;
> > }

> That does sound good, but...
>
> Current compilers beg to differ at -O2: https://godbolt.org/z/5K55Gardn
>
> ------------------------------------------------------------------------
> #define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
> #define WRITE_ONCE(x, val) (READ_ONCE(x) = (val))
> #define barrier() __asm__ __volatile__("": : :"memory")
>
> int x, y;
>
> int main(int argc, char *argv[])
> {
> if (READ_ONCE(x)) {
> barrier();
> WRITE_ONCE(y, 1);
> } else {
> barrier();
> WRITE_ONCE(y, 1);
> }
> return 0;
> }
> ------------------------------------------------------------------------
>
> Both gcc and clang generate a load followed by a store, with no branch.
> ARM gets the same results from both compilers.
>
> As Linus suggested, removing one (but not both!) invocations of barrier()
> does cause a branch to be emitted, so maybe that is a way forward.
> Assuming it is more than just dumb luck, anyway. :-/

Interesting. And changing one of the branches from barrier() to __asm__
__volatile__("nop": : :"memory") also causes a branch to be emitted. So
even though the compiler doesn't "look inside" assembly code, it does
compare two pieces at least textually and apparently assumes if they are
identical then they do the same thing.
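
(Concretely, a sketch of the variant I mean, based on Paul's test case
above; which arm gets the nop shouldn't matter:)

if (READ_ONCE(x)) {
        barrier();
        WRITE_ONCE(y, 1);
} else {
        __asm__ __volatile__("nop": : :"memory");
        WRITE_ONCE(y, 1);
}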

Alan

2021-06-06 03:49:50

by Linus Torvalds

Subject: Re: [RFC] LKMM: Add volatile_if()

On Sat, Jun 5, 2021 at 6:29 PM Alan Stern <[email protected]> wrote:
>
> Interesting. And changing one of the branches from barrier() to __asm__
> __volatile__("nop": : :"memory") also causes a branch to be emitted. So
> even though the compiler doesn't "look inside" assembly code, it does
> compare two pieces at least textually and apparently assumes if they are
> identical then they do the same thing.

That's actually a feature in some cases, ie the ability to do CSE on
asm statements (ie the "always has the same output" optimization that
the docs talk about).

So gcc has always looked at the asm string for that reason, afaik.

I think it's something of a bug when it comes to "asm volatile", but
the documentation isn't exactly super-specific.

There is a statement of "Under certain circumstances, GCC may
duplicate (or remove duplicates of) your assembly code when
optimizing" and a suggestion of using "%=" to generate a unique
instance of an asm.

Which might actually be a good idea for "barrier()", just in case.
However, the problem with that is that I don't think we are guaranteed
to have a universal comment character for asm statements.

IOW, it might be a good idea to do something like

#define barrier() \
        __asm__ __volatile__("# barrier %=": : :"memory")

but I'm not 100% convinced that '#' is always a comment in asm code,
so the above might not actually build everywhere.

However, *testing* the above (in my config, where '#' does work as a
comment character) shows that gcc doesn't actually consider them to be
distinct EVEN THEN, and will still merge two barrier statements.

That's distressing.

So the gcc docs are actively wrong, and %= does nothing - it will
still compare as the exact same inline asm, because the string
equality testing is apparently done before any expansion.

Something like this *does* seem to work:

#define ____barrier(id) __asm__ __volatile__("#" #id: : :"memory")
#define __barrier(id) ____barrier(id)
#define barrier() __barrier(__COUNTER__)

which is "interesting" or "disgusting" depending on how you happen to feel.

And again - the above works only as long as "#" is a valid comment
character in the assembler. And I have this very dim memory of us
having comments in inline asm, and it breaking certain configurations
(for when the assembler that the compiler uses is a special
human-unfriendly one that only accepts compiler output).

You could make even more disgusting hacks, and have it generate something like

.pushsection .discard.barrier
.long #id
.popsection

instead of a comment. We already expect that to work and have generic
inline asm cases that generate code like that.

Linus

2021-06-06 04:48:06

by Paul E. McKenney

Subject: Re: [RFC] LKMM: Add volatile_if()

On Sat, Jun 05, 2021 at 08:41:00PM -0700, Linus Torvalds wrote:
> On Sat, Jun 5, 2021 at 6:29 PM Alan Stern <[email protected]> wrote:
> >
> > Interesting. And changing one of the branches from barrier() to __asm__
> > __volatile__("nop": : :"memory") also causes a branch to be emitted. So
> > even though the compiler doesn't "look inside" assembly code, it does
> > compare two pieces at least textually and apparently assumes if they are
> > identical then they do the same thing.
>
> That's actually a feature in some cases, ie the ability to do CSE on
> asm statements (ie the "always has the same output" optimization that
> the docs talk about).

Agreed, albeit reluctantly. ;-)

> So gcc has always looked at the asm string for that reason, afaik.
>
> I think it's something of a bug when it comes to "asm volatile", but
> the documentation isn't exactly super-specific.
>
> There is a statement of "Under certain circumstances, GCC may
> duplicate (or remove duplicates of) your assembly code when
> optimizing" and a suggestion of using "%=" to generate a unique
> instance of an asm.

So gcc might some day note a do-nothing asm and duplicate it for
the sole purpose of collapsing the "then" and "else" clauses. I
guess I need to keep my paranoia for the time being, then. :-/

> Which might actually be a good idea for "barrier()", just in case.
> However, the problem with that is that I don't think we are guaranteed
> to have a universal comment character for asm statements.
>
> IOW, it might be a good idea to do something like
>
> #define barrier() \
> __asm__ __volatile__("# barrier %=": : :"memory")
>
> but I'm not 100% convinced that '#' is always a comment in asm code,
> so the above might not actually build everywhere.
>
> However, *testing* the above (in my config, where '#' does work as a
> comment character) shows that gcc doesn't actually consider them to be
> distinct EVEN THEN, and will still merge two barrier statements.
>
> That's distressing.

If I keep the old definition of barrier() and make a barrier1() as
you defined above:

#define barrier1() __asm__ __volatile__("# barrier %=": : :"memory")

Then putting barrier() in the "then" clause and barrier1() in the
"else" clause works, though clang 12 for whatever reason generates
an extra jump in that case. https://godbolt.org/z/YhbcsxsxG

Increasing the optimization level gets rid of the extra jump.

Of course, there is no guarantee that gcc won't learn about
assembler constants. :-/

> So the gcc docs are actively wrong, and %= does nothing - it will
> still compare as the exact same inline asm, because the string
> equality testing is apparently done before any expansion.
>
> Something like this *does* seem to work:
>
> #define ____barrier(id) __asm__ __volatile__("#" #id: : :"memory")
> #define __barrier(id) ____barrier(id)
> #define barrier() __barrier(__COUNTER__)
>
> which is "interesting" or "disgusting" depending on how you happen to feel.
>
> And again - the above works only as long as "#" is a valid comment
> character in the assembler. And I have this very dim memory of us
> having comments in inline asm, and it breaking certain configurations
> (for when the assembler that the compiler uses is a special
> human-unfriendly one that only accepts compiler output).
>
> You could make even more disgusting hacks, and have it generate something like
>
> .pushsection .discard.barrier
> .long #id
> .popsection
>
> instead of a comment. We already expect that to work and have generic
> inline asm cases that generate code like that.

And that does the trick as well, at least with recent gcc and clang.
https://godbolt.org/z/P8zPv9f9o

Thanx, Paul

2021-06-06 12:02:59

by Segher Boessenkool

Subject: Re: [RFC] LKMM: Add volatile_if()

On Sat, Jun 05, 2021 at 09:29:03PM -0400, Alan Stern wrote:
> Interesting. And changing one of the branches from barrier() to __asm__
> __volatile__("nop": : :"memory") also causes a branch to be emitted. So
> even though the compiler doesn't "look inside" assembly code, it does
> compare two pieces at least textually and apparently assumes if they are
> identical then they do the same thing.

And that is a simple fact, since the same assembler code (at the same
spot in the program) will do the same thing no matter how that ended up
there.

And the compiler always is allowed to duplicate, join, delete, you name
it, inline assembler code. The only thing that it cares about is
semantics of the code, just like for any other code.


Segher

2021-06-06 13:06:29

by Segher Boessenkool

Subject: Re: [RFC] LKMM: Add volatile_if()

On Sat, Jun 05, 2021 at 08:41:00PM -0700, Linus Torvalds wrote:
> On Sat, Jun 5, 2021 at 6:29 PM Alan Stern <[email protected]> wrote:
> > Interesting. And changing one of the branches from barrier() to __asm__
> > __volatile__("nop": : :"memory") also causes a branch to be emitted. So
> > even though the compiler doesn't "look inside" assembly code, it does
> > compare two pieces at least textually and apparently assumes if they are
> > identical then they do the same thing.
>
> That's actually a feature in some cases, ie the ability to do CSE on
> asm statements (ie the "always has the same output" optimization that
> the docs talk about).
>
> So gcc has always looked at the asm string for that reason, afaik.

GCC does not pretend it can understand the asm. But it can see when
two asm statements are identical.

> I think it's something of a bug when it comes to "asm volatile", but
> the documentation isn't exactly super-specific.

Why would that be? "asm volatile" does not prevent optimisation. It
says this code has some unspecified side effect, and that is all! All
the usual C rules cover everything needed: the same side effects have to
be executed in the same order on the real machine as they would on the
abstract machine.

> There is a statement of "Under certain circumstances, GCC may
> duplicate (or remove duplicates of) your assembly code when
> optimizing" and a suggestion of using "%=" to generate a unique
> instance of an asm.

"%=" outputs a number unique for every output instruction (the whole asm
is one instruction; these are GCC internal instructions, not the same
thing as machine instructions). This will not help here. The actual
thing the manual says is
  Under certain circumstances, GCC may duplicate (or remove duplicates
  of) your assembly code when optimizing. This can lead to unexpected
  duplicate symbol errors during compilation if your 'asm' code defines
  symbols or labels. Using '%=' may help resolve this problem.
It helps prevent duplicated symbols and labels. It does not do much
else.
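
For example (a sketch with a made-up label name): an asm that defines a
label can be duplicated without duplicate-symbol errors when the label
uses %=, because every emitted copy gets a different number:

        __asm__ __volatile__("dup_label%=: nop" : : : "memory");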

> Which might actually be a good idea for "barrier()", just in case.
> However, the problem with that is that I don't think we are guaranteed
> to have a universal comment character for asm statements.

That's right. But ";#" works on most systems, you may be able to use
that?

> IOW, it might be a good idea to do something like
>
> #define barrier() \
> __asm__ __volatile__("# barrier %=": : :"memory")
>
> but I'm not 100% convinced that '#' is always a comment in asm code,
> so the above might not actually build everywhere.

Some assemblers use ";", some use "!", and there are more variations.

But this will not do what you want. "%=" is output as a unique number
*after* everything GCC has done with the asm.

> However, *testing* the above (in my config, where '#' does work as a
> comment character) shows that gcc doesn't actually consider them to be
> distinct EVEN THEN, and will still merge two barrier statements.

Yes, the insns have the same templates and output exactly the same text
to the generated assembler code, so they are CSEd.

> So the gcc docs are actively wrong, and %= does nothing - it will
> still compare as the exact same inline asm, because the string
> equality testing is apparently done before any expansion.

They are not wrong. Maybe the doc could be clearer though? Patches
welcome.

> Something like this *does* seem to work:
>
> #define ____barrier(id) __asm__ __volatile__("#" #id: : :"memory")
> #define __barrier(id) ____barrier(id)
> #define barrier() __barrier(__COUNTER__)
>
> which is "interesting" or "disgusting" depending on how you happen to feel.

__COUNTER__ is a preprocessor thing, much more like what you want here:
this does its work *before* everything the compiler does, while %= does
its thing *after* :-)

(Not that I actually understand what you are trying to do with this).


Segher

2021-06-06 13:23:16

by Segher Boessenkool

Subject: Re: [RFC] LKMM: Add volatile_if()

On Sat, Jun 05, 2021 at 09:43:33PM -0700, Paul E. McKenney wrote:
> So gcc might some day note a do-nothing asm and duplicate it for
> the sole purpose of collapsing the "then" and "else" clauses. I
> guess I need to keep my paranoia for the time being, then. :-/

Or a "do-something" asm, even. What it does is make sure it is executed
on the real machine exactly like on the abstract machine. That is how C
is defined, what a compiler *does*.

The programmer does not have any direct control over the generated code.

> Of course, there is no guarantee that gcc won't learn about
> assembler constants. :-/

I am not sure what you call an "assembler constant" here. But you can
be sure that GCC will not start doing anything here. GCC does not try
to understand what you wrote in an inline asm, it just fills in the
operands and that is all. It can do all the same things to it that it
can do to any other code of course: duplicate it, deduplicate it,
frobnicate it, etc.


Segher

2021-06-06 13:48:53

by Alan Stern

Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 06, 2021 at 06:53:36AM -0500, Segher Boessenkool wrote:
> On Sat, Jun 05, 2021 at 09:29:03PM -0400, Alan Stern wrote:
> > Interesting. And changing one of the branches from barrier() to __asm__
> > __volatile__("nop": : :"memory") also causes a branch to be emitted. So
> > even though the compiler doesn't "look inside" assembly code, it does
> > compare two pieces at least textually and apparently assumes if they are
> > identical then they do the same thing.
>
> And that is a simple fact, since the same assembler code (at the same
> spot in the program) will do the same thing no matter how that ended up
> there.

Sure. But the same assembler code at two different spots in the program
might not do the same thing. (Think of code that stores the current EIP
register's value into a variable.)
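
(A hypothetical x86-64 sketch of that idea; the identical asm text
yields a different address at each site where it ends up:)

        void *ip;
        __asm__ __volatile__("leaq 0(%%rip), %0" : "=r" (ip));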

So while de-duplicating such code may be allowed, it will give rise to
observable results at execution time.

Alan

> And the compiler always is allowed to duplicate, join, delete, you name
> it, inline assembler code. The only thing that it cares about is
> semantics of the code, just like for any other code.
>
>
> Segher

2021-06-06 13:52:12

by Alan Stern

Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 06, 2021 at 07:59:55AM -0500, Segher Boessenkool wrote:
> On Sat, Jun 05, 2021 at 08:41:00PM -0700, Linus Torvalds wrote:
> > On Sat, Jun 5, 2021 at 6:29 PM Alan Stern <[email protected]> wrote:
> > > Interesting. And changing one of the branches from barrier() to __asm__
> > > __volatile__("nop": : :"memory") also causes a branch to be emitted. So
> > > even though the compiler doesn't "look inside" assembly code, it does
> > > compare two pieces at least textually and apparently assumes if they are
> > > identical then they do the same thing.
> >
> > That's actually a feature in some cases, ie the ability to do CSE on
> > asm statements (ie the "always has the same output" optimization that
> > the docs talk about).
> >
> > So gcc has always looked at the asm string for that reason, afaik.
>
> GCC does not pretend it can understand the asm. But it can see when
> two asm statements are identical.

How similar do two asm strings have to be before they are considered
identical? For instance, do changes to the amount of leading or
trailing whitespace matter?

Or what about including an empty assembly statement in one but not the
other?

Alan

2021-06-06 17:19:52

by Segher Boessenkool

Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 06, 2021 at 09:47:49AM -0400, Alan Stern wrote:
> > GCC does not pretend it can understand the asm. But it can see when
> > two asm statements are identical.
>
> How similar do two asm strings have to be before they are considered
> identical? For instance, do changes to the amount of leading or
> trailing whitespace matter?

They have to be identical to be considered identical.

> Or what about including an empty assembly statement in one but not the
> other?

GCC does not parse the assembler template.


Segher

2021-06-06 18:14:26

by Linus Torvalds

Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 6, 2021 at 4:56 AM Segher Boessenkool
<[email protected]> wrote:
>
> And that is a simple fact, since the same assembler code (at the same
> spot in the program) will do the same thing no matter how that ended up
> there.

The thing is, that's exactly what gcc violates.

The example - you may not have been cc'd personally on that one - was
something like

if (READ_ONCE(a)) {
        barrier();
        WRITE_ONCE(b, 1);
} else {
        barrier();
        WRITE_ONCE(b, 1);
}

and currently because gcc thinks "same exact code", it will actually
optimize this to (pseudo-asm):

        LD A
        "empty asm"
        ST $1,B

which is very much NOT equivalent to

        LD A
        BEQ over
        "empty asm"
        ST $1,B
        JMP join

over:
        "empty asm"
        ST $1,B

join:

and that's the whole point of the barriers.

It's not equivalent exactly because of memory ordering. In the first
case, there is no ordering on weak architectures. In the second case,
there is always an ordering, because of CPU consistency guarantees.

And no, gcc doesn't understand about memory ordering. But that's
exactly why we use inline asms.

> And the compiler always is allowed to duplicate, join, delete, you name
> it, inline assembler code. The only thing that it cares about is
> semantics of the code, just like for any other code.

See, but it VIOLATES the semantics of the code.

You can't join those two empty asm's (and then remove the branch),
because the semantics of the code really aren't the same any more if
you do. Truly.

Linus

2021-06-06 18:24:20

by Alan Stern

Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 06, 2021 at 11:04:49AM -0700, Linus Torvalds wrote:
> On Sun, Jun 6, 2021 at 4:56 AM Segher Boessenkool
> <[email protected]> wrote:
> >
> > And that is a simple fact, since the same assembler code (at the same
> > spot in the program) will do the same thing no matter how that ended up
> > there.
>
> The thing is, that's exactl;y what gcc violates.
>
> The example - you may not have been cc'd personally on that one - was
> something like
>
> if (READ_ONCE(a)) {
> barrier();
> WRITE_ONCE(b,1);
> } else {
> barrier();
> WRITE_ONCE(b, 1);
> }
>
> and currently because gcc thinks "same exact code", it will actually
> optimize this to (pseudo-asm):
>
> LD A
> "empty asm"
> ST $1,B
>
> which is very much NOT equivalent to
>
> LD A
> BEQ over
> "empty asm"
> ST $1,B
> JMP join
>
> over:
> "empty asm"
> ST $1,B
>
> join:
>
> and that's the whole point of the barriers.
>
> It's not equivalent exactly because of memory ordering. In the first
> case, there is no ordering on weak architectures. In the second case,
> there is always an ordering, because of CPU consistency guarantees.
>
> And no, gcc doesn't understand about memory ordering. But that's
> exactly why we use inline asms.
>
> > And the compiler always is allowed to duplicate, join, delete, you name
> > it, inline assembler code. The only thing that it cares about is
> > semantics of the code, just like for any other code.
>
> See, but it VIOLATES the semantics of the code.
>
> You can't join those two empty asm's (and then remove the branch),
> because the semantics of the code really aren't the same any more if
> you do. Truly.

To be fair, the same argument applies even without the asm code. The
compiler will translate

if (READ_ONCE(a))
        WRITE_ONCE(b, 1);
else
        WRITE_ONCE(b, 1);

to

        LD A
        ST $1,B

instead of

        LD A
        BEQ over
        ST $1,B
        JMP join

over:
        ST $1,B

join:

And these two are different for the same memory ordering reasons as
above.

Alan

2021-06-06 18:38:43

by Linus Torvalds

Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 6, 2021 at 6:03 AM Segher Boessenkool
<[email protected]> wrote:
>
> On Sat, Jun 05, 2021 at 08:41:00PM -0700, Linus Torvalds wrote:
> >
> > I think it's something of a bug when it comes to "asm volatile", but
> > the documentation isn't exactly super-specific.
>
> Why would that be? "asm volatile" does not prevent optimisation.

Sure it does.

That's the whole and only *POINT* of the "volatile".

It's the same as a volatile memory access. That very much prevents
certain optimizations. You can't just join two volatile reads or
writes, because they have side effects.

And the exact same thing is true of inline asm. Even when they are
*identical*, inline asms have side effects that gcc simply doesn't
understand.

And yes, those side effects can - and do - include "you can't just merge these".

> It says this code has some unspecified side effect, and that is all!

And that should be sufficient. But gcc then violates it, because gcc
doesn't understand the side effects.

Now, the side effects may be *subtle*, but they are very very real.
Just placement of code wrt a branch will actually affect memory
ordering, as that one example was.

> > Something like this *does* seem to work:
> >
> > #define ____barrier(id) __asm__ __volatile__("#" #id: : :"memory")
> > #define __barrier(id) ____barrier(id)
> > #define barrier() __barrier(__COUNTER__)
> >
> > which is "interesting" or "disgusting" depending on how you happen to feel.
>
> __COUNTER__ is a preprocessor thing, much more like what you want here:
> this does its work *before* everything the compiler does, while %= does
> its thing *after* :-)
>
> (Not that I actually understand what you are trying to do with this).

See my previous email for why two barriers in two different code
sequences cannot just be joined into one and moved into the common
parent. It actually is semantically meaningful *where* they are, and
they are distinct barriers.

The case we happen to care about is memory ordering issues. The
example quoted may sound pointless and insane, and I actually don't
believe we have real code that triggers the issue, because whenever we
have a conditional barrier, the two sides of the conditional are
generally so different that gcc would never merge any of it anyway.

So the issue is mostly theoretical, but we do have code that is fairly
critical, and that depends on memory ordering, and on some weakly
ordered machines (which is where all these problems would happen),
actual explicit memory barriers are also *much* too expensive.

End result: we have code that depends on the fact that a read-to-write
ordering exists if there is a data dependency or a control dependency
between the two. No actual expensive CPU instruction to specify the
ordering, because the ordering is implicit in the code flow itself.

But that's what we need a compiler barrier for in the first place -
the compiler certainly doesn't understand about this very subtle
memory ordering issue, and we want to make sure that the code sequence
*remains* that "if A then write B".

Linus

2021-06-06 18:45:43

by Alan Stern

Subject: Re: [RFC] LKMM: Add volatile_if()

On Sat, Jun 05, 2021 at 08:41:00PM -0700, Linus Torvalds wrote:
> On Sat, Jun 5, 2021 at 6:29 PM Alan Stern <[email protected]> wrote:
> >
> > Interesting. And changing one of the branches from barrier() to __asm__
> > __volatile__("nop": : :"memory") also causes a branch to be emitted. So
> > even though the compiler doesn't "look inside" assembly code, it does
> > compare two pieces at least textually and apparently assumes if they are
> > identical then they do the same thing.
>
> That's actually a feature in some cases, ie the ability to do CSE on
> asm statements (ie the "always has the same output" optimization that
> the docs talk about).
>
> So gcc has always looked at the asm string for that reason, afaik.
>
> I think it's something of a bug when it comes to "asm volatile", but
> the documentation isn't exactly super-specific.
>
> There is a statement of "Under certain circumstances, GCC may
> duplicate (or remove duplicates of) your assembly code when
> optimizing" and a suggestion of using "%=" to generate a unique
> instance of an asm.
>
> Which might actually be a good idea for "barrier()", just in case.
> However, the problem with that is that I don't think we are guaranteed
> to have a universal comment character for asm statements.
>
> IOW, it might be a good idea to do something like
>
> #define barrier() \
> __asm__ __volatile__("# barrier %=": : :"memory")
>
> but I'm not 100% convinced that '#' is always a comment in asm code,
> so the above might not actually build everywhere.
>
> However, *testing* the above (in my config, where '#' does work as a
> comment character) shows that gcc doesn't actually consider them to be
> distinct EVEN THEN, and will still merge two barrier statements.
>
> That's distressing.
>
> So the gcc docs are actively wrong, and %= does nothing - it will
> still compare as the exact same inline asm, because the string
> equality testing is apparently done before any expansion.
>
> Something like this *does* seem to work:
>
> #define ____barrier(id) __asm__ __volatile__("#" #id: : :"memory")
> #define __barrier(id) ____barrier(id)
> #define barrier() __barrier(__COUNTER__)
>
> which is "interesting" or "disgusting" depending on how you happen to feel.
>
> And again - the above works only as long as "#" is a valid comment
> character in the assembler. And I have this very dim memory of us
> having comments in inline asm, and it breaking certain configurations
> (for when the assembler that the compiler uses is a special
> human-unfriendly one that only accepts compiler output).
>
> You could make even more disgusting hacks, and have it generate something like
>
> .pushsection .discard.barrier
> .long #id
> .popsection
>
> instead of a comment. We already expect that to work and have generic
> inline asm cases that generate code like that.

I tried the experiment with this code:

#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
#define WRITE_ONCE(x, val) (READ_ONCE(x) = (val))
#define barrier() __asm__ __volatile__("": : :"memory")

int x, y;

int main(int argc, char *argv[])
{
        if (READ_ONCE(x)) {
                barrier();
                y = 1;
        } else {
                y = 1;
        }
        return 0;
}

The output from gcc -O2 is:

main:
        mov     eax, DWORD PTR x[rip]
        test    eax, eax
        je      .L2
.L2:
        mov     DWORD PTR y[rip], 1

The output from clang is essentially the same (the mov and test are
replaced by a cmp).

This does what we want, but I wouldn't bet against a future
optimization pass getting rid of the "useless" test and branch.

Alan

2021-06-06 18:46:40

by Linus Torvalds

Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 6, 2021 at 11:22 AM Alan Stern <[email protected]> wrote:
>
> To be fair, the same argument applies even without the asm code. The
> compiler will translate

Yes, yes.

But that is literally why the asm exists in the first place.

It's supposed to be the barrier that makes sure that doesn't happen.

So your point that "but this would happen without the asm" is missing
the whole point. This is exactly the thing that the asm is supposed to
avoid.

And it actually works fine when just one side has the barrier, because
then no merging can take place, because there is nothing to merge.

That's why my suggested fix for "volatile_if()" was this #define

#define barrier_true() ({ barrier(); 1; })
#define volatile_if(x) if ((x) && barrier_true())

because now code like

volatile_if (READ_ONCE(a))
        WRITE_ONCE(b, 1);
else
        WRITE_ONCE(b, 1);

would force that branch. And it's actually fine to merge the
"WRITE(b,1)", as loing as the branch exists, so the above can (and
does) compile to

        LD A
        BEQ over
        "empty asm"
over:
        ST $1,B

and the above is actually perfectly valid code and actually solves the
problem, even if it admittedly looks entirely insane.

With that crazy "conditional jump over nothing" the store to B is
ordered wrt the load from A on real machines.

And again: I do not believe we actually have this kind of code in the
kernel. I could imagine some CPU turning "conditional branch over
nothing" into a nop-op internally, and losing the ordering. And that's
ok, exactly because the above kind of code that *only* does the
WRITE_ONCE() and nothing else is crazy and stupid.

So don't get hung up on the "branch over nothing", that's just for
this insane unreal example.

But I *could* see us having something where both branches do end up
writing to "B", and it might even be the first thing both branches end
up doing. Not the *only* thing they do, but "B" might be a flag for "I
am actively working on this issue", and I could see a situation where
we care that the read of "A" (which might be what specifies *what* the
issue is) would need to be ordered with regards to that "I'm working
on it" flag.

IOW, another CPU might want to know *what* somebody is working on, and do

/* Is somebody working on this */
if (READ_ONCE(B)) {
        smp_rmb();
        READ_ONCE(A); <- this is what they are working on

and the ordering requirement in this all is that B has to be written
after A has been read.

So while the example code is insane and pointless (and you shouldn't
read *too* much into it), conceptually the notion of that pattern of

if (READ_ONCE(a)) {
        WRITE_ONCE(b, 1);
        .. do something ..
} else {
        WRITE_ONCE(b, 1);
        .. do something else ..
}

is not insane or entirely unrealistic - the WRITE_ONCE() might
basically be an ACK for "I have read the value of A and will act on
it".

Odd? Yes. Unusual? Yes. Do we do this now? No. But it does worry me
that we don't seem to have a good way to add that required barrier.

Adding it on one side is good, and works, but what if somebody then does

volatile_if (READ_ONCE(a))
        WRITE_ONCE(b, 1);
else {
        barrier();
        WRITE_ONCE(b, 1);
}

and now we end up with it on both sides again, and then the second
barrier basically undoes the first one..

Linus

2021-06-06 18:48:03

by Segher Boessenkool

Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 06, 2021 at 11:04:49AM -0700, Linus Torvalds wrote:
> if (READ_ONCE(a)) {
> barrier();
> WRITE_ONCE(b,1);
> } else {
> barrier();
> WRITE_ONCE(b, 1);
> }
>
> and currently because gcc thinks "same exact code", it will actually
> optimize this to (pseudo-asm):
>
> LD A
> "empty asm"
> ST $1,B
>
> which is very much NOT equivalent to
>
> LD A
> BEQ over
> "empty asm"
> ST $1,B
> JMP join
>
> over:
> "empty asm"
> ST $1,B
>
> join:
>
> and that's the whole point of the barriers.

You didn't use a barrier with these semantics though. There is nothing
in that code that guarantees a branch.

> See, but it VIOLATES the semantics of the code.

The code violates your expectations of the code.

> You can't join those two empty asm's (and then remove the branch),
> because the semantics of the code really aren't the same any more if
> you do. Truly.

You truly should have written a branch in the asm if you truly wanted
a branch instruction.


Segher

2021-06-06 18:52:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 6, 2021 at 11:43 AM Segher Boessenkool
<[email protected]> wrote:
>
> You truly should have written a branch in the asm if you truly wanted
> a branch instruction.

That's exactly what I don't want to do, and what the original patch by
PeterZ did.

Why?

Because then we need to write that stupid pointless branch for every
single architecture.

And to work well, it needs "asm goto", which is so recent that a lot
of compilers don't support it (thank God for clang dragging gcc
kicking and screaming to implement it at all - I'd asked for it over a
decade ago).

So you get bad code generation in a lot of cases, which entirely
obviates the _point_ of this all - which is that we can avoid an
expensive operation (a memory barrier) by just doing clever code
generation.

So if we can't get the clever code generation, it's all pretty much
moot, imnsho.

A working barrier "just fixes it".

I suspect the best we can do is to just work around the gcc badness
with that __COUNTER__ trick of mine. The lack of a reliable comment
character is the biggest issue with that trick.

Linus

2021-06-06 18:56:05

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 6, 2021 at 11:48 AM Linus Torvalds
<[email protected]> wrote:
> And to work well, it needs "asm goto", which is so recent that a lot
> of compilers don't support it (thank God for clang dragging gcc
> kicking and screaming to implement it at all - I'd asked for it over a
> decade ago).

Oh, actually, I'm wrong on this.

We don't need an output from the asm (the output ends up being in the
targets), so we can use the old-style asm goto that we've been relying
on for a long time.

So the main code generation problem is just (a) all the architectures
and (b) we'd have to use a fixed conditional against zero.

Linus

2021-06-06 19:01:28

by Jakub Jelinek

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Sat, Jun 05, 2021 at 08:41:00PM -0700, Linus Torvalds wrote:
> Something like this *does* seem to work:
>
> #define ____barrier(id) __asm__ __volatile__("#" #id: : :"memory")
> #define __barrier(id) ____barrier(id)
> #define barrier() __barrier(__COUNTER__)
>
> which is "interesting" or "disgusting" depending on how you happen to feel.

I think just
#define barrier() __asm__ __volatile__("" : : "i" (__COUNTER__) : "memory")
should be enough (or "X" instead of "i" if some arch uses -fpic and will not
accept small constants in PIC code), for CSE gcc compares that the asm template
string and all arguments are the same.

As for volatile, that is implicit on asm without any output operands and
it is about whether the inline asm can be DCEd, not whether it can be CSEd.
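
For illustration, a rough sketch of how that unique-argument barrier would
slot into the volatile_if() definition discussed earlier in the thread (the
macro names follow Linus's earlier sketch; this is not a tested patch):

        #define barrier()       __asm__ __volatile__("" : : "i" (__COUNTER__) : "memory")
        #define barrier_true()  ({ barrier(); 1; })
        #define volatile_if(x)  if ((x) && barrier_true())

Each textual expansion of barrier() gets its own __COUNTER__ value, so two
such asms are never "the same" as far as gcc's CSE is concerned and the two
arms of the conditional cannot be merged into one.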

Jakub

2021-06-06 19:10:59

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 06, 2021 at 08:17:40AM -0500, Segher Boessenkool wrote:
> On Sat, Jun 05, 2021 at 09:43:33PM -0700, Paul E. McKenney wrote:
> > So gcc might some day note a do-nothing asm and duplicate it for
> > the sole purpose of collapsing the "then" and "else" clauses. I
> > guess I need to keep my paranoia for the time being, then. :-/
>
> Or a "do-something" asm, even. What it does is make sure it is executed
> on the real machine exactly like on the abstract machine. That is how C
> is defined, what a compiler *does*.
>
> The programmer does not have any direct control over the generated code.

I am not looking for direct control, simply sufficient influence. ;-)

> > Of course, there is no guarantee that gcc won't learn about
> > assembler constants. :-/
>
> I am not sure what you call an "assembler constant" here. But you can
> be sure that GCC will not start doing anything here. GCC does not try
> to understand what you wrote in an inline asm, it just fills in the
> operands and that is all. It can do all the same things to it that it
> can do to any other code of course: duplicate it, deduplicate it,
> frobnicate it, etc.

Apologies, that "assembler constants" should have been "assembler
comments".

Thanx, Paul

2021-06-06 19:17:35

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 06, 2021 at 08:59:22PM +0200, Jakub Jelinek wrote:
> On Sat, Jun 05, 2021 at 08:41:00PM -0700, Linus Torvalds wrote:
> > Something like this *does* seem to work:
> >
> > #define ____barrier(id) __asm__ __volatile__("#" #id: : :"memory")
> > #define __barrier(id) ____barrier(id)
> > #define barrier() __barrier(__COUNTER__)
> >
> > which is "interesting" or "disgusting" depending on how you happen to feel.
>
> I think just
> #define barrier() __asm__ __volatile__("" : : "i" (__COUNTER__) : "memory")
> should be enough (or "X" instead of "i" if some arch uses -fpic and will not
> accept small constants in PIC code), for CSE gcc compares that the asm template
> string and all arguments are the same.

This does seem to do the trick: https://godbolt.org/z/K5j3bYqGT

So thank you for that!

Thanx, Paul

> As for volatile, that is implicit on asm without any output operands and
> it is about whether the inline asm can be DCEd, not whether it can be CSEd.
>
> Jakub
>

2021-06-06 19:25:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 6, 2021 at 11:59 AM Jakub Jelinek <[email protected]> wrote:
>
> I think just
> #define barrier() __asm__ __volatile__("" : : "i" (__COUNTER__) : "memory")
> should be enough

Oh, I like that. Much better.

It avoids all the issues with comments etc, and because it's not using
__COUNTER__ as a string, it doesn't need the preprocessor games with
double expansion either.

So yeah, that seems like a nice solution to the issue, and should make
the barriers all unique to the compiler.

Linus

2021-06-06 19:25:34

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 06, 2021 at 11:25:46AM -0700, Linus Torvalds wrote:
> On Sun, Jun 6, 2021 at 6:03 AM Segher Boessenkool
> <[email protected]> wrote:
> >
> > On Sat, Jun 05, 2021 at 08:41:00PM -0700, Linus Torvalds wrote:
> > >
> > > I think it's something of a bug when it comes to "asm volatile", but
> > > the documentation isn't exactly super-specific.
> >
> > Why would that be? "asm volatile" does not prevent optimisation.
>
> Sure it does.
>
> That's the whole and only *POINT* of the "volatile".
>
> It's the same as a volatile memory access. That very much prevents
> certain optimizations. You can't just join two volatile reads or
> writes, because they have side effects.

You can though. In exactly this same way:

volatile int x;
void g(int);
void f(int n) { if (n) g(x); else g(x); }

==>

f:
movl x(%rip), %edi
jmp g

You can do whatever you want with code with side effects. The only
thing required is that the side effects are executed as often as before
and in the same order. Merging identical sides of a diamond is just
fine.

> And the exact same thing is true of inline asm. Even when they are
> *identical*, inline asms have side effects that gcc simply doesn't
> understand.

Only volatile asm does (including all asm without outputs). But that
still does not mean GCC cannot manipulate the asm!

> And yes, those side effects can - and do - include "you can't just merge these".

They do not. That is not what a side effect is.

> > It says this code has some unspecified side effect, and that is all!
>
> And that should be sufficient. But gcc then violates it, because gcc
> doesn't understand the side effects.
>
> Now, the side effects may be *subtle*, but they are very very real.
> Just placement of code wrt a branch will actually affect memory
> ordering, as that one example was.

You have a different definition of "side effect" than C does apparently.

5.1.2.3/2:
Accessing a volatile object, modifying an object, modifying a file, or
calling a function that does any of those operations are all side
effects, which are changes in the state of the execution environment.
Evaluation of an expression in general includes both value
computations and initiation of side effects. Value computation for an
lvalue expression includes determining the identity of the designated
object.

> But that's what we need a compiler barrier for in the first place -
> the compiler certainly doesn't understand about this very subtle
> memory ordering issue, and we want to make sure that the code sequence
> *remains* that "if A then write B".

The compiler doesn't magically understand your intention, no. Some real
work will need to be done to make this work.


Segher

2021-06-06 20:01:14

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 06, 2021 at 11:48:32AM -0700, Linus Torvalds wrote:
> On Sun, Jun 6, 2021 at 11:43 AM Segher Boessenkool
> <[email protected]> wrote:
> >
> > You truly should have written a branch in the asm if you truly wanted
> > a branch instruction.
>
> That's exactly what I don't want to do, and what the original patch by
> PeterZ did.

Yes, I know. But it is literally the *only* way to *always* get a
conditional branch: by writing one.

> And to work well, it needs "asm goto", which is so recent that a lot
> of compilers don't support it (thank God for clang dragging gcc
> kicking and screaming to implement it at all - I'd asked for it over a
> decade ago).

GCC has had it since 2009.

> So you get bad code generation in a lot of cases, which entirely
> obviates the _point_ of this all - which is that we can avoid an
> expensive operation (a memory barrier) by just doing clever code
> generation.
>
> So if we can't get the clever code generation, it's all pretty much
> moot, imnsho.

Yes.


Segher

2021-06-06 20:16:11

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 6, 2021 at 12:56 PM Segher Boessenkool
<[email protected]> wrote:
>
> Yes, I know. But it is literally the *only* way to *always* get a
> conditional branch: by writing one.

The thing is, I don't actually believe you.

The barrier() thing can work - all we need to do is to simply make it
impossible for gcc to validly create anything but a conditional
branch.

If either side of the thing have an asm that cannot be combined, gcc
simply doesn't have any choice in the matter. There's no other valid
model than a conditional branch around it (of some sort - doing an
indirect branch that has a data dependency isn't wrong either, it just
wouldn't be something that a sane compiler would generate because it's
obviously much slower and more complicated).

We are very used to just making the compiler generate the code we
need. That is, fundamentally, what any use of inline asm is all about.
We want the compiler to generate all the common cases and all the
regular instructions.

The conditional branch itself - and the instructions leading up to it
- are exactly those "common regular instructions" that we'd want the
compiler to generate. That is in fact more true here than for most
inline asm, exactly because there are so many different possible
combinations of conditional branches (equal, not equal, less than,..)
and so many ways to generate the code that generates the condition.

So we are much better off letting the compiler do all that for us -
it's very much what the compiler is good at.

Linus

2021-06-06 20:19:37

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 06, 2021 at 12:22:44PM -0700, Linus Torvalds wrote:
> On Sun, Jun 6, 2021 at 11:59 AM Jakub Jelinek <[email protected]> wrote:
> >
> > I think just
> > #define barrier() __asm__ __volatile__("" : : "i" (__COUNTER__) : "memory")
> > should be enough
>
> Oh, I like that. Much better.
>
> It avoids all the issues with comments etc, and because it's not using
> __COUNTER__ as a string, it doesn't need the preprocessor games with
> double expansion either.
>
> So yeah, that seems like a nice solution to the issue, and should make
> the barriers all unique to the compiler.

__COUNTER__ is a preprocessor thing as well, and it may not do all that
you expect. Ex.:

===
#define fm() __COUNTER__
int gm(void) { return fm(); }
int hm(void) { return fm(); }

int fi(void) { return __COUNTER__; }
int gi(void) { return fi(); }
int hi(void) { return fi(); }
===

The macro version here works as you would hope, but the inlined one has
the same number everywhere.
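
For concreteness, a worked reading of that example (assuming these are the
only __COUNTER__ uses in the translation unit, so the counter starts at 0):

        int gm(void) { return 0; }      /* first macro expansion       */
        int hm(void) { return 1; }      /* second macro expansion      */

        int fi(void) { return 2; }      /* single token, expanded once */
        int gi(void) { return fi(); }   /* inlines to 2                */
        int hi(void) { return fi(); }   /* inlines to 2                */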


Segher

2021-06-06 20:32:25

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 06, 2021 at 01:11:53PM -0700, Linus Torvalds wrote:
> On Sun, Jun 6, 2021 at 12:56 PM Segher Boessenkool
> <[email protected]> wrote:
> >
> > Yes, I know. But it is literally the *only* way to *always* get a
> > conditional branch: by writing one.
>
> The thing is, I don't actually believe you.

Fortune favours the bold!

> The barrier() thing can work - all we need to do is to simply make it
> impossible for gcc to validly create anything but a conditional
> branch.

And the only foolproof way of doing that is by writing a branch.

> If either side of the thing have an asm that cannot be combined, gcc
> simply doesn't have any choice in the matter. There's no other valid
> model than a conditional branch around it (of some sort - doing an
> indirect branch that has a data dependency isn't wrong either, it just
> wouldn't be something that a sane compiler would generate because it's
> obviously much slower and more complicated).

Or push something to the stack and return. Or rewrite the whole thing
as an FSM. Or or or.

(And yes, there are existing compilers that can do both of these things
on some code).

> We are very used to just making the compiler generate the code we
> need. That is, fundamentally, what any use of inline asm is all about.
> We want the compiler to generate all the common cases and all the
> regular instructions.
>
> The conditional branch itself - and the instructions leading up to it
> - are exactly those "common regular instructions" that we'd want the
> compiler to generate. That is in fact more true here than for most
> inline asm, exactly because there are so many different possible
> combinations of conditional branches (equal, not equal, less than,..)
> and so many ways to generate the code that generates the condition.
>
> So we are much better off letting the compiler do all that for us -
> it's very much what the compiler is good at.

Yes, exactly.

I am saying that if you depend on that some C code you write will result
in some particular machine code, without actually *forcing* the compiler
to output that exact machine code, then you will be disappointed. Maybe
not today, and maybe it will take years, if you are lucky.

(s/forcing/instructing/ of course, compilers have feelings too!)


Segher

2021-06-06 21:22:57

by Alexander Monakov

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()



On Sun, 6 Jun 2021, Linus Torvalds wrote:

> On Sun, Jun 6, 2021 at 11:59 AM Jakub Jelinek <[email protected]> wrote:
> >
> > I think just
> > #define barrier() __asm__ __volatile__("" : : "i" (__COUNTER__) : "memory")
> > should be enough
>
> Oh, I like that. Much better.
>
> It avoids all the issues with comments etc, and because it's not using
> __COUNTER__ as a string, it doesn't need the preprocessor games with
> double expansion either.
>
> So yeah, that seems like a nice solution to the issue, and should make
> the barriers all unique to the compiler.

It also plants a nice LTO time-bomb (__COUNTER__ values will be unique
only within each LTO input unit, not across all of them).

Alexander

2021-06-06 22:47:39

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 6, 2021 at 2:19 PM Alexander Monakov <[email protected]> wrote:
>
> > So yeah, that seems like a nice solution to the issue, and should make
> > the barriers all unique to the compiler.
>
> It also plants a nice LTO time-bomb (__COUNTER__ values will be unique
> only within each LTO input unit, not across all of them).

That could be an issue in other circumstances, but for at least
volatile_if() that doesn't much matter. The decision there is purely
local, and it's literally about the two sides of the conditional not
being merged.

Now, an optimizing linker or assembler can of course do anything at
all in theory: and if that ends up being an issue we'd have to have
some way to actually propagate the barrier from being just a compiler
thing. Right now gcc doesn't even output the barrier in the assembly
code, so it's invisible to any optimizing assembler/linker thing.

But I don't think that's an issue with what _currently_ goes on in an
assembler or linker - not even a smart one like LTO.

And such things really are independent of "volatile_if()". We use
barriers for other things where we need to force some kind of
operation ordering, and right now the only thing that re-orders
accesses etc is the compiler.

Btw, since we have compiler people on line, the suggested 'barrier()'
isn't actually perfect for this particular use:

#define barrier() __asm__ __volatile__("" : : "i" (__COUNTER__) : "memory")

in the general barrier case, we very much want to have that "memory"
clobber, because the whole point of the general barrier case is that
we want to make sure that the compiler doesn't cache memory state
across it (ie the traditional use was basically what we now use
"cpu_relax()" for, and you would use it for busy-looping on some
condition).

In the case of "volatile_if()", we actually would like to have not a
memory clobber, but a "memory read". IOW, it would be a barrier for
any writes taking place, but reads can move around it.

I don't know of any way to express that to the compiler. We've used
hacks for it before (in gcc, BLKmode reads turn into that kind of
barrier in practice, so you can do something like make the memory
input to the asm be a big array). But that turned out to be fairly
unreliable, so now we use memory clobbers even if we just mean "reads
random memory".

Example: variable_test_bit(), which generates a "bt" instruction, does

: "m" (*(unsigned long *)addr), "Ir" (nr) : "memory");

and the memory clobber is obviously wrong: 'bt' only *reads* memory,
but since the whole reason we use it is that it's not just that word
at address 'addr', in order to make sure that any previous writes are
actually stable in memory, we use that "memory" clobber.

It would be much nicer to have a "memory read" marker instead, to let
the compiler know "I need to have done all pending writes to memory,
but I can still cache read values over this op because it doesn't
_change_ memory".

Anybody have ideas or suggestions for something like that?

Linus

2021-06-06 23:47:12

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 06, 2021 at 03:26:16PM -0500, Segher Boessenkool wrote:
> On Sun, Jun 06, 2021 at 01:11:53PM -0700, Linus Torvalds wrote:
> > On Sun, Jun 6, 2021 at 12:56 PM Segher Boessenkool
> > <[email protected]> wrote:
> > >
> > > Yes, I know. But it is literally the *only* way to *always* get a
> > > conditional branch: by writing one.
> >
> > The thing is, I don't actually believe you.
>
> Fortune favours the bold!
>
> > The barrier() thing can work - all we need to do is to simply make it
> > impossible for gcc to validly create anything but a conditional
> > branch.
>
> And the only foolproof way of doing that is by writing a branch.
>
> > If either side of the thing have an asm that cannot be combined, gcc
> > simply doesn't have any choice in the matter. There's no other valid
> > model than a conditional branch around it (of some sort - doing an
> > indirect branch that has a data dependency isn't wrong either, it just
> > wouldn't be something that a sane compiler would generate because it's
> > obviously much slower and more complicated).
>
> Or push something to the stack and return. Or rewrite the whole thing
> as an FSM. Or or or.
>
> (And yes, there are existing compilers that can do both of these things
> on some code).
>
> > We are very used to just making the compiler generate the code we
> > need. That is, fundamentally, what any use of inline asm is all about.
> > We want the compiler to generate all the common cases and all the
> > regular instructions.
> >
> > The conditional branch itself - and the instructions leading up to it
> > - are exactly those "common regular instructions" that we'd want the
> > compiler to generate. That is in fact more true here than for most
> > inline asm, exactly because there are so many different possible
> > combinations of conditional branches (equal, not equal, less than,..)
> > and so many ways to generate the code that generates the condition.
> >
> > So we are much better off letting the compiler do all that for us -
> > it's very much what the compiler is good at.
>
> Yes, exactly.
>
> I am saying that if you depend on that some C code you write will result
> in some particular machine code, without actually *forcing* the compiler
> to output that exact machine code, then you will be disappointed. Maybe
> not today, and maybe it will take years, if you are lucky.
>
> (s/forcing/instructing/ of course, compilers have feelings too!)

OK, I will bite...

What would you suggest as a way of instructing the compiler to emit the
conditional branch that we are looking for?

Thanx, Paul

2021-06-06 23:48:17

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On 07/06/2021 00.38, Linus Torvalds wrote:

> Example: variable_test_bit(), which generates a "bt" instruction, does
>
> : "m" (*(unsigned long *)addr), "Ir" (nr) : "memory");
>
> and the memory clobber is obviously wrong: 'bt' only *reads* memory,
> but since the whole reason we use it is that it's not just that word
> at address 'addr', in order to make sure that any previous writes are
> actually stable in memory, we use that "memory" clobber.
>
> It would be much nicer to have a "memory read" marker instead, to let
> the compiler know "I need to have done all pending writes to memory,
> but I can still cache read values over this op because it doesn't
> _change_ memory".
>
> Anybody have ideas or suggestions for something like that?

The obvious thing is to try and mark the function as pure. But when
applied to a static inline, gcc seems to read the contents and say "nah,
you have something here that declares itself to possibly write to
memory". Replacing with a call to an extern function marked pure does
indeed cause gcc to cache the value of y*z, so in theory this should be
possible, if one could convince gcc to "trust me, this really is a pure
function".

https://godbolt.org/z/s4546K6Pj
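
For reference, a minimal sketch of that trick (all names here are made up
for illustration; the extern function is only declared, so the compiler has
to take the attribute at face value):

        /* Promises: no side effects, and the result depends only on the
         * arguments and on (non-volatile) global memory. */
        extern int pure_test_bit(long nr, const unsigned long *addr)
                __attribute__((pure));

        int y, z;

        int f(long nr, const unsigned long *addr)
        {
                int a = y * z;
                if (pure_test_bit(nr, addr))
                        return a;
                return y * z;   /* the pure call cannot have modified y or z,
                                 * so gcc may reuse the earlier y * z */
        }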

Rasmus

2021-06-06 23:50:24

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On 07/06/2021 01.39, Rasmus Villemoes wrote:

> memory". Replacing with a call to an extern function marked pure does
> indeed cause gcc to cache the value of y*z, so in theory this should be
> possible, if one could convince gcc to "trust me, this really is a pure
> function".

Don't know why I didn't think to check before sending, but FWIW clang
doesn't need convincing, it already takes the __pure at face value and
caches y*z.

Rasmus

2021-06-07 08:05:24

by Alexander Monakov

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, 6 Jun 2021, Linus Torvalds wrote:

> On Sun, Jun 6, 2021 at 2:19 PM Alexander Monakov <[email protected]> wrote:
> >
> > > So yeah, that seems like a nice solution to the issue, and should make
> > > the barriers all unique to the compiler.
> >
> > It also plants a nice LTO time-bomb (__COUNTER__ values will be unique
> > only within each LTO input unit, not across all of them).
>
> That could be an issue in other circumstances, but for at least
> volatile_if() that doesn't much matter. The decision there is purely
> local, and it's literally about the two sides of the conditional not
> being merged.
>
> Now, an optimizing linker or assembler can of course do anything at
> all in theory: and if that ends up being an issue we'd have to have
> some way to actually propagate the barrier from being just a compiler
> thing. Right now gcc doesn't even output the barrier in the assembly
> code, so it's invisible to any optimizing assembler/linker thing.
>
> But I don't think that's an issue with what _currently_ goes on in an
> assembler or linker - not even a smart one like LTO.
>
> And such things really are independent of "volatile_if()". We use
> barriers for other things where we need to force some kind of
> operation ordering, and right now the only thing that re-orders
> accesses etc is the compiler.

Uhh... I was not talking about some (non-existent) "optimizing linker".
LTO works by relaunching the compiler from the linker and letting it
consume multiple translation units (which are fully preprocessed by that
point). So the very thing you wanted to avoid -- such barriers appearing
in close proximity where they can be deduplicated -- may arise after a
little bit of cross-unit inlining.

My main point here is that using __COUNTER__ that way (making things
"unique" for the compiler) does not work in general when LTO enters the
picture. As long as that is remembered, I'm happy.

> Btw, since we have compiler people on line, the suggested 'barrier()'
> isn't actually perfect for this particular use:
>
> #define barrier() __asm__ __volatile__("" : : "i" (__COUNTER__) : "memory")
>
> in the general barrier case, we very much want to have that "memory"
> clobber, because the whole point of the general barrier case is that
> we want to make sure that the compiler doesn't cache memory state
> across it (ie the traditional use was basically what we now use
> "cpu_relax()" for, and you would use it for busy-looping on some
> condition).
>
> In the case of "volatile_if()", we actually would like to have not a
> memory clobber, but a "memory read". IOW, it would be a barrier for
> any writes taking place, but reads can move around it.
>
> I don't know of any way to express that to the compiler. We've used
> hacks for it before (in gcc, BLKmode reads turn into that kind of
> barrier in practice, so you can do something like make the memory
> input to the asm be a big array). But that turned out to be fairly
> unreliable, so now we use memory clobbers even if we just mean "reads
> random memory".

So the barrier which is a compiler barrier but not a machine barrier is
__atomic_signal_fence(model), but internally GCC will not treat it smarter
than an asm-with-memory-clobber today.

> Example: variable_test_bit(), which generates a "bt" instruction, does
>
> : "m" (*(unsigned long *)addr), "Ir" (nr) : "memory");
>
> and the memory clobber is obviously wrong: 'bt' only *reads* memory,
> but since the whole reason we use it is that it's not just that word
> at address 'addr', in order to make sure that any previous writes are
> actually stable in memory, we use that "memory" clobber.
>
> It would be much nicer to have a "memory read" marker instead, to let
> the compiler know "I need to have done all pending writes to memory,
> but I can still cache read values over this op because it doesn't
> _change_ memory".
>
> Anybody have ideas or suggestions for something like that?

In the specific case of 'bt', the offset cannot be negative, so I think you
can simply spell out the extent of the array being accessed:

: "m" *(unsigned long (*)[-1UL / 8 / sizeof(long) + 1])addr

In the general case (possibility of negative offsets, or no obvious base to
supply), have you considered adding a "wild read" through a char pointer
that is initialized in a non-transparent way? Like this:

        char *wild_pointer;

        asm(""
            : "=X"(wild_pointer)
            : "X"(base1)
            , "X"(base2));      // unknown value related to given base pointers

        asm("pattern"
            : // normal outputs
            : // normal inputs
            , "m"(*wild_pointer));

The "X" constraint in theory should not tie up neither a register nor a stack
slot.

Alexander

2021-06-07 08:32:57

by Marco Elver

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Mon, 7 Jun 2021 at 10:02, Alexander Monakov <[email protected]> wrote:
> On Sun, 6 Jun 2021, Linus Torvalds wrote:
[...]
> > On Sun, Jun 6, 2021 at 2:19 PM Alexander Monakov <[email protected]> wrote:
[...]
> > Btw, since we have compiler people on line, the suggested 'barrier()'
> > isn't actually perfect for this particular use:
> >
> > #define barrier() __asm__ __volatile__("" : : "i" (__COUNTER__) : "memory")
> >
> > in the general barrier case, we very much want to have that "memory"
> > clobber, because the whole point of the general barrier case is that
> > we want to make sure that the compiler doesn't cache memory state
> > across it (ie the traditional use was basically what we now use
> > "cpu_relax()" for, and you would use it for busy-looping on some
> > condition).
> >
> > In the case of "volatile_if()", we actually would like to have not a
> > memory clobber, but a "memory read". IOW, it would be a barrier for
> > any writes taking place, but reads can move around it.
> >
> > I don't know of any way to express that to the compiler. We've used
> > hacks for it before (in gcc, BLKmode reads turn into that kind of
> > barrier in practice, so you can do something like make the memory
> > input to the asm be a big array). But that turned out to be fairly
> > unreliable, so now we use memory clobbers even if we just mean "reads
> > random memory".
>
> So the barrier which is a compiler barrier but not a machine barrier is
> __atomic_signal_fence(model), but internally GCC will not treat it smarter
> than an asm-with-memory-clobber today.

FWIW, Clang seems to be cleverer about it, and seems to do the optimal
thing if I use a __atomic_signal_fence(__ATOMIC_RELEASE):
https://godbolt.org/z/4v5xojqaY
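
For reference, a sketch of the kind of construct that link exercises (the
macro name is made up; whether this helpful code generation is guaranteed,
rather than just what current compilers happen to do, is a separate
question):

        /* Compiler-only release fence: earlier memory accesses may not be
         * moved past later stores, but previously loaded values may still
         * be kept cached rather than reloaded. */
        #define compiler_wmb()  __atomic_signal_fence(__ATOMIC_RELEASE)

        void publish(int *data, int *flag)
        {
                *data = 42;
                compiler_wmb();         /* keep the store to *data ahead of *flag */
                *flag = 1;
        }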

Thanks,
-- Marco

2021-06-07 14:20:13

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 06, 2021 at 04:37:29PM -0700, Paul E. McKenney wrote:
> > > The barrier() thing can work - all we need to do is to simply make it
> > > impossible for gcc to validly create anything but a conditional
> > > branch.
> >
> > And the only foolproof way of doing that is by writing a branch.

[ ... ]

> > I am saying that if you depend on that some C code you write will result
> > in some particular machine code, without actually *forcing* the compiler
> > to output that exact machine code, then you will be disappointed. Maybe
> > not today, and maybe it will take years, if you are lucky.
> >
> > (s/forcing/instructing/ of course, compilers have feelings too!)
>
> OK, I will bite...
>
> What would you suggest as a way of instructing the compiler to emit the
> conditional branch that we are looking for?

You write it in the assembler code.

Yes, it sucks. But it is the only way to get a branch if you really
want one. Now, you do not really need one here anyway, so there may be
some other way to satisfy the actual requirements.


Segher

2021-06-07 15:28:38

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Mon, Jun 07, 2021 at 09:12:42AM -0500, Segher Boessenkool wrote:
> On Sun, Jun 06, 2021 at 04:37:29PM -0700, Paul E. McKenney wrote:
> > > > The barrier() thing can work - all we need to do is to simply make it
> > > > impossible for gcc to validly create anything but a conditional
> > > > branch.
> > >
> > > And the only foolproof way of doing that is by writing a branch.
>
> [ ... ]
>
> > > I am saying that if you depend on that some C code you write will result
> > > in some particular machine code, without actually *forcing* the compiler
> > > to output that exact machine code, then you will be disappointed. Maybe
> > > not today, and maybe it will take years, if you are lucky.
> > >
> > > (s/forcing/instructing/ of course, compilers have feelings too!)
> >
> > OK, I will bite...
> >
> > What would you suggest as a way of instructing the compiler to emit the
> > conditional branch that we are looking for?
>
> You write it in the assembler code.
>
> Yes, it sucks. But it is the only way to get a branch if you really
> want one. Now, you do not really need one here anyway, so there may be
> some other way to satisfy the actual requirements.

Hmmm... What do you see Peter asking for that is different than what
I am asking for? ;-)

Thanx, Paul

2021-06-07 15:30:02

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Mon, Jun 07, 2021 at 10:27:10AM +0200, Marco Elver wrote:
> On Mon, 7 Jun 2021 at 10:02, Alexander Monakov <[email protected]> wrote:
> > On Sun, 6 Jun 2021, Linus Torvalds wrote:
> [...]
> > > On Sun, Jun 6, 2021 at 2:19 PM Alexander Monakov <[email protected]> wrote:
> [...]
> > > Btw, since we have compiler people on line, the suggested 'barrier()'
> > > isn't actually perfect for this particular use:
> > >
> > > #define barrier() __asm__ __volatile__("" : : "i" (__COUNTER__) : "memory")
> > >
> > > in the general barrier case, we very much want to have that "memory"
> > > clobber, because the whole point of the general barrier case is that
> > > we want to make sure that the compiler doesn't cache memory state
> > > across it (ie the traditional use was basically what we now use
> > > "cpu_relax()" for, and you would use it for busy-looping on some
> > > condition).
> > >
> > > In the case of "volatile_if()", we actually would like to have not a
> > > memory clobber, but a "memory read". IOW, it would be a barrier for
> > > any writes taking place, but reads can move around it.
> > >
> > > I don't know of any way to express that to the compiler. We've used
> > > hacks for it before (in gcc, BLKmode reads turn into that kind of
> > > barrier in practice, so you can do something like make the memory
> > > input to the asm be a big array). But that turned out to be fairly
> > > unreliable, so now we use memory clobbers even if we just mean "reads
> > > random memory".
> >
> > So the barrier which is a compiler barrier but not a machine barrier is
> > __atomic_signal_fence(model), but internally GCC will not treat it smarter
> > than an asm-with-memory-clobber today.
>
> FWIW, Clang seems to be cleverer about it, and seems to do the optimal
> thing if I use a __atomic_signal_fence(__ATOMIC_RELEASE):
> https://godbolt.org/z/4v5xojqaY

Indeed it does! But I don't know of a guarantee for that helpful
behavior.

Thanx, Paul

2021-06-07 17:07:13

by Marco Elver

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Mon, Jun 07, 2021 at 08:28AM -0700, Paul E. McKenney wrote:
> On Mon, Jun 07, 2021 at 10:27:10AM +0200, Marco Elver wrote:
> > On Mon, 7 Jun 2021 at 10:02, Alexander Monakov <[email protected]> wrote:
> > > On Sun, 6 Jun 2021, Linus Torvalds wrote:
> > [...]
> > > > On Sun, Jun 6, 2021 at 2:19 PM Alexander Monakov <[email protected]> wrote:
> > [...]
> > > > Btw, since we have compiler people on line, the suggested 'barrier()'
> > > > isn't actually perfect for this particular use:
> > > >
> > > > #define barrier() __asm__ __volatile__("" : : "i" (__COUNTER__) : "memory")
> > > >
> > > > in the general barrier case, we very much want to have that "memory"
> > > > clobber, because the whole point of the general barrier case is that
> > > > we want to make sure that the compiler doesn't cache memory state
> > > > across it (ie the traditional use was basically what we now use
> > > > "cpu_relax()" for, and you would use it for busy-looping on some
> > > > condition).
> > > >
> > > > In the case of "volatile_if()", we actually would like to have not a
> > > > memory clobber, but a "memory read". IOW, it would be a barrier for
> > > > any writes taking place, but reads can move around it.
> > > >
> > > > I don't know of any way to express that to the compiler. We've used
> > > > hacks for it before (in gcc, BLKmode reads turn into that kind of
> > > > barrier in practice, so you can do something like make the memory
> > > > input to the asm be a big array). But that turned out to be fairly
> > > > unreliable, so now we use memory clobbers even if we just mean "reads
> > > > random memory".
> > >
> > > So the barrier which is a compiler barrier but not a machine barrier is
> > > __atomic_signal_fence(model), but internally GCC will not treat it smarter
> > > than an asm-with-memory-clobber today.
> >
> > FWIW, Clang seems to be cleverer about it, and seems to do the optimal
> > thing if I use a __atomic_signal_fence(__ATOMIC_RELEASE):
> > https://godbolt.org/z/4v5xojqaY
>
> Indeed it does! But I don't know of a guarantee for that helpful
> behavior.

Is there a way we can interpret the standard in such a way that it
should be guaranteed?

If yes, it should be easy to add tests to the compiler repos for
snippets that the Linux kernel relies on (if we decide to use
__atomic_signal_fence() for this).

If no, we can still try to add tests to the compiler repos, but may
receive some push-back at the very latest when some optimization pass
decides to break it. Because the argument then is that it's well within
the language standard.

Adding language extensions will likely be met with resistance, because
some compiler folks are afraid of creating language forks (the reason
why we have '-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang').
That could be solved if we declare Linux-C a "standard", and finally get
-std=linux or such, at which point asking for "volatile if" directly
would probably be easier without jumping through hoops.

The jumping-through-hoops variant would probably be asking for a
__builtin primitive that allows constructing volatile_if() (if we can't
bend existing primitives to do what we want).

Thanks,
-- Marco

2021-06-07 17:48:40

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Sun, Jun 06, 2021 at 03:38:06PM -0700, Linus Torvalds wrote:
> In the case of "volatile_if()", we actually would like to have not a
> memory clobber, but a "memory read". IOW, it would be a barrier for
> any writes taking place, but reads can move around it.
>
> I don't know of any way to express that to the compiler. We've used
> hacks for it before (in gcc, BLKmode reads turn into that kind of
> barrier in practice, so you can do something like make the memory
> input to the asm be a big array). But that turned out to be fairly
> unreliable, so now we use memory clobbers even if we just mean "reads
> random memory".
>
> Example: variable_test_bit(), which generates a "bt" instruction, does
>
> : "m" (*(unsigned long *)addr), "Ir" (nr) : "memory");
>
> and the memory clobber is obviously wrong: 'bt' only *reads* memory,
> but since the whole reason we use it is that it's not just that word
> at address 'addr', in order to make sure that any previous writes are
> actually stable in memory, we use that "memory" clobber.

You can split the "I" version from the "r" version, it does not need
the memory clobber. If you know the actual maximum bit offset used you
don't need the clobber for "r" either. Or you could even write
"m"(((unsigned long *)addr)[nr/32])
That should work for all cases.

> Anybody have ideas or suggestions for something like that?

Is it useful in general for the kernel to have separate "read" and
"write" clobbers in asm expressions? And for other applications?


Segher

2021-06-07 17:58:17

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Mon, Jun 07, 2021 at 11:01:39AM +0300, Alexander Monakov wrote:
> Uhh... I was not talking about some (non-existent) "optimizing linker".
> LTO works by relaunching the compiler from the linker and letting it
> consume multiple translation units (which are fully preprocessed by that
> point). So the very thing you wanted to avoid -- such barriers appearing
> in close proximity where they can be deduplicated -- may arise after a
> little bit of cross-unit inlining.
>
> My main point here is that using __COUNTER__ that way (making things
> "unique" for the compiler) does not work in general when LTO enters the
> picture. As long as that is remembered, I'm happy.

Yup. Exactly the same issue as using this in any function that may end
up inlined.

> > In the case of "volatile_if()", we actually would like to have not a
> > memory clobber, but a "memory read". IOW, it would be a barrier for
> > any writes taking place, but reads can move around it.
> >
> > I don't know of any way to express that to the compiler. We've used
> > hacks for it before (in gcc, BLKmode reads turn into that kind of
> > barrier in practice, so you can do something like make the memory
> > input to the asm be a big array). But that turned out to be fairly
> > unreliable, so now we use memory clobbers even if we just mean "reads
> > random memory".
>
> So the barrier which is a compiler barrier but not a machine barrier is
> __atomic_signal_fence(model), but internally GCC will not treat it smarter
> than an asm-with-memory-clobber today.

It will do nothing for relaxed ordering, and do blockage for everything
else. Can it do anything weaker than that?


Segher

2021-06-07 18:09:37

by Alexander Monakov

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Mon, 7 Jun 2021, Segher Boessenkool wrote:

> > So the barrier which is a compiler barrier but not a machine barrier is
> > __atomic_signal_fence(model), but internally GCC will not treat it smarter
> > than an asm-with-memory-clobber today.
>
> It will do nothing for relaxed ordering, and do blockage for everything
> else. Can it do anything weaker than that?

It's a "blockage instruction" after transitioning to RTL, but before that,
on GIMPLE, the compiler sees it properly as a corresponding built-in, and
may optimize according to given memory model. And on RTL, well, if anyone
cares they'll need to invent RTL representation for it, I guess.

Alexander

2021-06-07 18:25:35

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Mon, Jun 07, 2021 at 09:07:58PM +0300, Alexander Monakov wrote:
> On Mon, 7 Jun 2021, Segher Boessenkool wrote:
>
> > > So the barrier which is a compiler barrier but not a machine barrier is
> > > __atomic_signal_fence(model), but internally GCC will not treat it smarter
> > > than an asm-with-memory-clobber today.
> >
> > It will do nothing for relaxed ordering, and do blockage for everything
> > else. Can it do anything weaker than that?
>
> It's a "blockage instruction" after transitioning to RTL, but before that,
> on GIMPLE, the compiler sees it properly as a corresponding built-in, and
> may optimize according to given memory model. And on RTL, well, if anyone
> cares they'll need to invent RTL representation for it, I guess.

My question was if anything weaker is *valid* :-) (And if so, why!)


Segher

2021-06-07 18:29:33

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Mon, Jun 07, 2021 at 08:27:12AM -0700, Paul E. McKenney wrote:
> > > > > The barrier() thing can work - all we need to do is to simply make it
> > > > > impossible for gcc to validly create anything but a conditional
> > > > > branch.

> > > What would you suggest as a way of instructing the compiler to emit the
> > > conditional branch that we are looking for?
> >
> > You write it in the assembler code.
> >
> > Yes, it sucks. But it is the only way to get a branch if you really
> > want one. Now, you do not really need one here anyway, so there may be
> > some other way to satisfy the actual requirements.
>
> Hmmm... What do you see Peter asking for that is different than what
> I am asking for? ;-)

I don't know what you are referring to, sorry?

I know what you asked for: literally some way to tell the compiler to
emit a conditional branch. If that is what you want, the only way to
make sure that is what you get is by writing exactly that in assembler.


Segher

2021-06-07 19:53:12

by Alan Stern

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Mon, Jun 07, 2021 at 01:23:35PM -0500, Segher Boessenkool wrote:
> On Mon, Jun 07, 2021 at 08:27:12AM -0700, Paul E. McKenney wrote:
> > > > > > The barrier() thing can work - all we need to do is to simply make it
> > > > > > impossible for gcc to validly create anything but a conditional
> > > > > > branch.
>
> > > > What would you suggest as a way of instructing the compiler to emit the
> > > > conditional branch that we are looking for?
> > >
> > > You write it in the assembler code.
> > >
> > > Yes, it sucks. But it is the only way to get a branch if you really
> > > want one. Now, you do not really need one here anyway, so there may be
> > > some other way to satisfy the actual requirements.
> >
> > Hmmm... What do you see Peter asking for that is different than what
> > I am asking for? ;-)
>
> I don't know what you are referring to, sorry?
>
> I know what you asked for: literally some way to tell the compiler to
> emit a conditional branch. If that is what you want, the only way to
> make sure that is what you get is by writing exactly that in assembler.

That's not necessarily it.

People would be happy to have an easy way of telling the compiler that
all writes in the "if" branch of an if statement must be ordered after
any reads that the condition depends on. Or maybe all writes in either
the "if" branch or the "else" branch. And maybe not all reads that the
condition depends on, but just the reads appearing syntactically in the
condition. Or maybe even just the volatile reads appearing in the
condition. Nobody has said exactly.

The exact method used for doing this doesn't matter. It could be
accomplished by treating those reads as load-acquires. Or it could be
done by ensuring that the object code contains a dependency (control or
data) from the reads to the writes. Or it could be done by treating
the writes as store-releases. But we do want the execution-time
penalty to be small.
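
As one concrete illustration, the load-acquire variant would be roughly the
following (using the kernel's existing smp_load_acquire(); it buys the
ordering with an acquire barrier instead of relying on the branch):

        if (smp_load_acquire(&a))       /* ordered before any later store */
                WRITE_ONCE(b, 1);
        else
                WRITE_ONCE(b, 1);

Here the writes to b stay ordered after the read of a even if the compiler
merges the two arms, since the acquire applies to the load itself rather
than to the control dependency.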

In short, we want to guarantee somehow that the conditional writes are
not re-ordered before the reads in the condition. (But note that
"conditional writes" includes identical writes occurring in both
branches.)

Alan

2021-06-07 20:18:41

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Mon, Jun 07, 2021 at 03:51:44PM -0400, Alan Stern wrote:
> On Mon, Jun 07, 2021 at 01:23:35PM -0500, Segher Boessenkool wrote:
> > On Mon, Jun 07, 2021 at 08:27:12AM -0700, Paul E. McKenney wrote:
> > > > > > > The barrier() thing can work - all we need to do is to simply make it
> > > > > > > impossible for gcc to validly create anything but a conditional
> > > > > > > branch.
> >
> > > > > What would you suggest as a way of instructing the compiler to emit the
> > > > > conditional branch that we are looking for?
> > > >
> > > > You write it in the assembler code.
> > > >
> > > > Yes, it sucks. But it is the only way to get a branch if you really
> > > > want one. Now, you do not really need one here anyway, so there may be
> > > > some other way to satisfy the actual requirements.
> > >
> > > Hmmm... What do you see Peter asking for that is different than what
> > > I am asking for? ;-)
> >
> > I don't know what you are referring to, sorry?
> >
> > I know what you asked for: literally some way to tell the compiler to
> > emit a conditional branch. If that is what you want, the only way to
> > make sure that is what you get is by writing exactly that in assembler.
>
> That's not necessarily it.
>
> People would be happy to have an easy way of telling the compiler that
> all writes in the "if" branch of an if statement must be ordered after
> any reads that the condition depends on. Or maybe all writes in either
> the "if" branch or the "else" branch. And maybe not all reads that the
> condition depends on, but just the reads appearing syntactically in the
> condition. Or maybe even just the volatile reads appearing in the
> condition. Nobody has said exactly.
>
> The exact method used for doing this doesn't matter. It could be
> accomplished by treating those reads as load-acquires. Or it could be
> done by ensuring that the object code contains a dependency (control or
> data) from the reads to the writes. Or it could be done by treating
> the writes as store-releases. But we do want the execution-time
> penalty to be small.
>
> In short, we want to guarantee somehow that the conditional writes are
> not re-ordered before the reads in the condition. (But note that
> "conditional writes" includes identical writes occurring in both
> branches.)

What Alan said! ;-)

Thanx, Paul

2021-06-07 20:36:39

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Mon, Jun 7, 2021 at 10:45 AM Segher Boessenkool
<[email protected]> wrote:
>
> On Sun, Jun 06, 2021 at 03:38:06PM -0700, Linus Torvalds wrote:
> >
> > Example: variable_test_bit(), which generates a "bt" instruction, does
> >
> > : "m" (*(unsigned long *)addr), "Ir" (nr) : "memory");
> >
> > and the memory clobber is obviously wrong: 'bt' only *reads* memory,
> > but since the whole reason we use it is that it's not just that word
> > at address 'addr', in order to make sure that any previous writes are
> > actually stable in memory, we use that "memory" clobber.
>
> You can split the "I" version from the "r" version, it does not need
> the memory clobber. If you know the actual maximum bit offset used you
> don't need the clobber for "r" either. Or you could even write
> "m"(((unsigned long *)addr)[nr/32])
> That should work for all cases.

Note that the bit test thing really was just an example.

And some other cases don't actually have an address range at all,
because they affect arbitrary ranges, not - like that bit test - just
one particular range.

To pick a couple of examples of that, think of

(a) write memory barrier. On some architectures it's an explicit
instruction, on x86 it's just a compiler barrier, since writes are
ordered on the CPU anyway, and we only need to make sure that the
compiler doesn't re-order writes around the barrier

Again, we currently use that same "barrier()" macro for that:

#define __smp_wmb() barrier()

but as mentioned, the barrier() thing has a "memory" clobber, and that
means that this write barrier - which is really really cheap on x86 -
also unnecessarily ends up causing pointless reloads from globals. It
obviously doesn't actually *change* memory, but it very much requires
that writes are not moved around it.

(b) things like cache flush and/or invalidate instructions, eg

asm volatile("wbinvd": : :"memory");

Again, this one doesn't actually *modify* memory, and honestly, this
one is not performance critical so the memory clobber is not actually
a problem, but I'm pointing it out as an example of the exact same
issue: the notion of an instruction that we don't want _writes_ to
move around, but reads can happily be moved and/or cached around it.

(c) this whole "volatile_if()" situation: we want to make sure writes
can't move around it, but there's no reason to re-load memory values,
because it doesn't modify memory, and we only need to make sure that
any writes are delayed to after the conditional.

We long long ago (over 20 years by now) used to do things like this:

        struct __dummy { unsigned long a[100]; };
        #define ADDR (*(volatile struct __dummy *) addr)

        __asm__ __volatile__(
                "btl %2,%1\n\tsbbl %0,%0"
                :"=r" (oldbit)
                :"m" (ADDR),"ir" (nr));

for that test-bit thing. Note how the above doesn't need the memory
clobber, because for gcc that ADDR thing (access to a big struct) ends
up being a "BLKmode" read, and then gcc at least used to treat it as
an arbitrary read.

I forget just why we had to stop using that trick; I think it caused
some reload confusion for some gcc version at some point. Probably
exactly because inline asms had issues with some BLKmode thing. That
change happened before 2001, and we didn't have nice changelogs with
detailed commit messages back then, so ...

> > Anybody have ideas or suggestions for something like that?
>
> Is it useful in general for the kernel to have separate "read" and
> "write" clobbers in asm expressions? And for other applications?

See above. It's actually not all that uncommon that you have a "this
doesn't modify memory, but you can't move writes around it". It's
usually very much about cache handling or memory ordering operations,
and that bit test example was probably a bad example exactly because
it made it look like it's about some controlled range.

The "write memory barroer" is likely the best and simplest example,
but it's in not the only one.

Linus

2021-06-07 22:47:59

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Mon, Jun 07, 2021 at 01:16:33PM -0700, Paul E. McKenney wrote:
> On Mon, Jun 07, 2021 at 03:51:44PM -0400, Alan Stern wrote:
> > On Mon, Jun 07, 2021 at 01:23:35PM -0500, Segher Boessenkool wrote:
> > > On Mon, Jun 07, 2021 at 08:27:12AM -0700, Paul E. McKenney wrote:
> > > > > > > > The barrier() thing can work - all we need to do is to simply make it
> > > > > > > > impossible for gcc to validly create anything but a conditional
> > > > > > > > branch.
> > >
> > > > > > What would you suggest as a way of instructing the compiler to emit the
> > > > > > conditional branch that we are looking for?
> > > > >
> > > > > You write it in the assembler code.
> > > > >
> > > > > Yes, it sucks. But it is the only way to get a branch if you really
> > > > > want one. Now, you do not really need one here anyway, so there may be
> > > > > some other way to satisfy the actual requirements.
> > > >
> > > > Hmmm... What do you see Peter asking for that is different than what
> > > > I am asking for? ;-)
> > >
> > > I don't know what you are referring to, sorry?
> > >
> > > I know what you asked for: literally some way to tell the compiler to
> > > emit a conditional branch. If that is what you want, the only way to
> > > make sure that is what you get is by writing exactly that in assembler.
> >
> > That's not necessarily it.
> >
> > People would be happy to have an easy way of telling the compiler that
> > all writes in the "if" branch of an if statement must be ordered after
> > any reads that the condition depends on. Or maybe all writes in either
> > the "if" branch or the "else" branch. And maybe not all reads that the
> > condition depends on, but just the reads appearing syntactically in the
> > condition. Or maybe even just the volatile reads appearing in the
> > condition. Nobody has said exactly.
> >
> > The exact method used for doing this doesn't matter. It could be
> > accomplished by treating those reads as load-acquires. Or it could be
> > done by ensuring that the object code contains a dependency (control or
> > data) from the reads to the writes. Or it could be done by treating
> > the writes as store-releases. But we do want the execution-time
> > penalty to be small.
> >
> > In short, we want to guarantee somehow that the conditional writes are
> > not re-ordered before the reads in the condition. (But note that
> > "conditional writes" includes identical writes occurring in both
> > branches.)
>
> What Alan said! ;-)

Okay, I'll think about that.

But you wrote:

> > > > > > What would you suggest as a way of instructing the compiler to emit the
> > > > > > conditional branch that we are looking for?

... and that is what I answered. I am sorry if you do not like being
taken literally, but that is how I read technical remarks: as literally
what they say. If you say you want a branch, I take it you want a
branch! :-)


Segher

2021-06-07 23:02:50

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Mon, Jun 07, 2021 at 01:31:24PM -0700, Linus Torvalds wrote:
> > Is it useful in general for the kernel to have separate "read" and
> > "write" clobbers in asm expressions? And for other applications?
>
> See above. It's actually not all that uncommon that you have a "this
> doesn't modify memory, but you can't move writes around it". It's
> usually very much about cache handling or memory ordering operations,
> and that bit test example was probably a bad example exactly because
> it made it look like it's about some controlled range.
>
> The "write memory barroer" is likely the best and simplest example,
> but it's in not the only one.

Thanks for the examples! I opened <https://gcc.gnu.org/PR100953> so
that we can easily track it.
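
For orientation, a minimal sketch of the status quo the PR asks to refine:
the only portable compiler-level fence available today is the catch-all
memory clobber, which pessimizes reads and writes alike, so a write-only
ordering (as in the write-barrier example quoted above) has no cheaper
spelling. The kernel's barrier() is essentially:

        /*
         * Full compiler fence: the compiler may not move any memory
         * access, read or write, across it.  There is currently no way
         * to clobber "writes only".
         */
        #define barrier()       __asm__ __volatile__("" : : : "memory")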


Segher

2021-06-07 23:29:55

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Mon, Jun 07, 2021 at 05:40:37PM -0500, Segher Boessenkool wrote:
> On Mon, Jun 07, 2021 at 01:16:33PM -0700, Paul E. McKenney wrote:
> > On Mon, Jun 07, 2021 at 03:51:44PM -0400, Alan Stern wrote:
> > > On Mon, Jun 07, 2021 at 01:23:35PM -0500, Segher Boessenkool wrote:
> > > > On Mon, Jun 07, 2021 at 08:27:12AM -0700, Paul E. McKenney wrote:
> > > > > > > > > The barrier() thing can work - all we need to do is to simply make it
> > > > > > > > > impossible for gcc to validly create anything but a conditional
> > > > > > > > > branch.
> > > >
> > > > > > > What would you suggest as a way of instructing the compiler to emit the
> > > > > > > conditional branch that we are looking for?
> > > > > >
> > > > > > You write it in the assembler code.
> > > > > >
> > > > > > Yes, it sucks. But it is the only way to get a branch if you really
> > > > > > want one. Now, you do not really need one here anyway, so there may be
> > > > > > some other way to satisfy the actual requirements.
> > > > >
> > > > > Hmmm... What do you see Peter asking for that is different than what
> > > > > I am asking for? ;-)
> > > >
> > > > I don't know what you are referring to, sorry?
> > > >
> > > > I know what you asked for: literally some way to tell the compiler to
> > > > emit a conditional branch. If that is what you want, the only way to
> > > > make sure that is what you get is by writing exactly that in assembler.
> > >
> > > That's not necessarily it.
> > >
> > > People would be happy to have an easy way of telling the compiler that
> > > all writes in the "if" branch of an if statement must be ordered after
> > > any reads that the condition depends on. Or maybe all writes in either
> > > the "if" branch or the "else" branch. And maybe not all reads that the
> > > condition depends on, but just the reads appearing syntactically in the
> > > condition. Or maybe even just the volatile reads appearing in the
> > > condition. Nobody has said exactly.
> > >
> > > The exact method used for doing this doesn't matter. It could be
> > > accomplished by treating those reads as load-acquires. Or it could be
> > > done by ensuring that the object code contains a dependency (control or
> > > data) from the reads to the writes. Or it could be done by treating
> > > the writes as store-releases. But we do want the execution-time
> > > penalty to be small.
> > >
> > > In short, we want to guarantee somehow that the conditional writes are
> > > not re-ordered before the reads in the condition. (But note that
> > > "conditional writes" includes identical writes occurring in both
> > > branches.)
> >
> > What Alan said! ;-)
>
> Okay, I'll think about that.
>
> But you wrote:
>
> > > > > > > What would you suggest as a way of instructing the compiler to emit the
> > > > > > > conditional branch that we are looking for?
>
> ... and that is what I answered. I am sorry if you do not like being
> taken literally, but that is how I read technical remarks: as literally
> what they say. If you say you want a branch, I take it you want a
> branch! :-)

When it is the cheapest means of providing the needed ordering, I really
do want a branch. ;-)

And a branch would implement Alan's "control dependency" above.
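
Spelled out as a minimal, editorial sketch (names are illustrative;
READ_ONCE()/WRITE_ONCE() are the existing kernel accessors, and the
LOAD->STORE ordering is exactly the property under discussion rather than
something the C standard promises):

        static void publish_if_flagged(int *flag, int *data)
        {
                /*
                 * The write to *data must not be reordered before the
                 * read of *flag; with a conditional branch in the object
                 * code, the hardware's LOAD->STORE causality provides
                 * that ordering.
                 */
                if (READ_ONCE(*flag))
                        WRITE_ONCE(*data, 42);
        }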

Thanx, Paul

2021-06-08 09:33:54

by Marco Elver

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Mon, 7 Jun 2021 at 19:04, Marco Elver <[email protected]> wrote:
[...]
> > > > So the barrier which is a compiler barrier but not a machine barrier is
> > > > __atomic_signal_fence(model), but internally GCC will not treat it smarter
> > > > than an asm-with-memory-clobber today.
> > >
> > > FWIW, Clang seems to be cleverer about it, and seems to do the optimal
> > > thing if I use a __atomic_signal_fence(__ATOMIC_RELEASE):
> > > https://godbolt.org/z/4v5xojqaY
> >
> > Indeed it does! But I don't know of a guarantee for that helpful
> > behavior.
>
> Is there a way we can interpret the standard in such a way that it
> should be guaranteed?

I figured out why it works, and unfortunately it comes down to suboptimal
codegen. In LLVM, __atomic_signal_fence() turns into a real IR instruction,
which simply emits nothing when lowered to asm. But most optimizations
happen earlier, on the IR, and a "fence" instruction cannot be removed
there. Essentially, imagine there's an invisible instruction; that explains
why it does what it does. Sadly we can't rely on that.
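
One way such a fence might slot into the expression-wrapper form of
volatile_if(); a sketch only, leaning on the observed (and, as noted,
unguaranteed) Clang behaviour, and not necessarily the exact code behind
the godbolt link:

        /*
         * Compiler-only release fence after the condition load: no
         * machine instruction is emitted for it, but current Clang will
         * not hoist the conditional stores above it.
         */
        #define volatile_if(cond)                                       \
                if (({ bool __t = (cond);                               \
                       __atomic_signal_fence(__ATOMIC_RELEASE);         \
                       __t; }))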

> The jumping-through-hoops variant would probably be asking for a
> __builtin primitive that allows constructing volatile_if() (if we can't
> bend existing primitives to do what we want).

I had a think about this. I think if we ask for some primitive
compiler support, "volatile if" as the target is suboptimal design,
because it somewhat limits composability (and of course make it hard
to get as an extension). That primitive should probably also support
for/while/switch. But "volatile if" would also preclude us from
limiting the scope of the source of forced dependency, e.g. say we
have "if (A && B)", but we only care about A.

The cleaner approach would be an expression wrapper, e.g. "if
(ctrl_depends(A) && B) { ... }".

I imagine syntactically it'd be similar to __builtin_expect(..). I
think that's also easier to request an extension for, say
__builtin_ctrl_depends(expr). (If that is appealing, we can try and
propose it as std::ctrl_depends() along with std::dependent_ptr<>.)
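
A usage sketch of the hypothetical builtin (to be clear,
__builtin_ctrl_depends() does not exist in any compiler today; it is the
extension being floated here, and a, b, c are illustrative):

        /*
         * Hypothetical: only the read feeding A is tied to the
         * conditional writes; B remains an ordinary subexpression the
         * compiler may optimize freely.
         */
        if (__builtin_ctrl_depends(READ_ONCE(*a)) && *b > 0)
                WRITE_ONCE(*c, 1);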

Thoughts?

Thanks,
-- Marco

2021-06-08 11:27:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Tue, Jun 08, 2021 at 11:30:36AM +0200, Marco Elver wrote:

> The cleaner approach would be an expression wrapper, e.g. "if
> (ctrl_depends(A) && B) { ... }".
>
> I imagine syntactically it'd be similar to __builtin_expect(..). I
> think that's also easier to request an extension for, say
> __builtin_ctrl_depends(expr). (If that is appealing, we can try and
> propose it as std::ctrl_depends() along with std::dependent_ptr<>.)
>
> Thoughts?

Works for me; and note how it mirrors how we implemented volatile_if()
in the first place, by doing an expression wrapper.

__builtin_ctrl_depends(expr) would have to:

- ensure !__builtin_const_p(expr) (A)
- imply an acquire compiler fence (B)
- ensure cond-branch is emitted (C)

*OR*

- ensure !__builtin_const_p(expr); (A)
- upgrade the load in @expr to load-acquire (D)


A)

This all hinges on there actually being a LOAD; if expr is constant, we
have a malformed program and can emit a compiler error.

B)

We want to capture any store that comes after, not just volatile stores.

The example here is a ring-buffer that loads the (head and) tail pointer
to check for space and then writes data elements; a sketch follows point
D below. It would be 'cumbersome' to have all the data writes as
volatile.

C)

We depend on the load-to-branch data dependency guarding the store to
provide the LOAD->STORE memory order.

D)

Upgrading LOAD to LOAD-ACQUIRE also provides LOAD->STORE ordering, but
it does require that the compiler has access to the LOAD in the first
place, which isn't a given, seeing how much asm() we have around. Also
the architecture should have a cheap LOAD-ACQUIRE in the first place,
otherwise there's no point.

If this is done, the branch is allowed to be optimized away if the
compiler so wants.
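
Returning to the ring-buffer case from (B), a sketch of the producer
side, assuming a ctrl_depends()-style wrapper existed (rb, head, v and
RB_SIZE are illustrative; smp_store_release() is the existing kernel
primitive):

        unsigned int head = rb->head;   /* producer-owned index */

        /*
         * Check for space, then fill the slot with plain writes, then
         * publish.  The data writes must stay after the tail load that
         * established there was room, without marking them volatile.
         */
        if (ctrl_depends(READ_ONCE(rb->tail)) != ((head + 1) % RB_SIZE)) {
                rb->buf[head] = v;                      /* plain data write */
                smp_store_release(&rb->head, (head + 1) % RB_SIZE);
        }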

Now, Will will also want to allow the load-acquire to be run-time
patched between the RCsc and RCpc variant depending on what ARMv8
extensions are available, which will be 'interesting' (although I can
think of ways to actually do that; one would be to keep a special
section that tracks the locations of these __builtin_ctrl_depends()
generated load-acquire instructions).


2021-06-09 04:24:47

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Tue, Jun 08, 2021 at 01:22:58PM +0200, Peter Zijlstra wrote:
> Works for me; and note how it mirrors how we implemented volatile_if()
> in the first place, by doing an expression wrapper.
>
> __builtin_ctrl_depends(expr) would have to:
>
> - ensure !__builtin_const_p(expr) (A)

Why would it be an error if __builtin_constant_p(expr)? In many
programs the compiler can figure out that some expression never changes.
Having a control dependency on something like that is not erroneous.

> - imply an acquire compiler fence (B)
> - ensure cond-branch is emitted (C)

(C) is almost impossible to do. This should be reformulated to talk
about the effect of the generated code, instead.

> *OR*
>
> - ensure !__builtin_const_p(expr); (A)
> - upgrade the load in @expr to load-acquire (D)

So that will only work if there is exactly one read from memory in expr?
That is problematic.

This needs some work.


Segher

2021-06-09 17:28:19

by Marco Elver

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Tue, 8 Jun 2021 at 17:30, Segher Boessenkool
<[email protected]> wrote:
> On Tue, Jun 08, 2021 at 01:22:58PM +0200, Peter Zijlstra wrote:
> > Works for me; and note how it mirrors how we implemented volatile_if()
> > in the first place, by doing an expression wrapper.
> >
> > __builtin_ctrl_depends(expr) would have to:
> >
> > - ensure !__builtin_const_p(expr) (A)
>
> Why would it be an error if __builtin_constant_p(expr)? In many
> programs the compiler can figure out that some expression never changes.
> Having a control dependency on something like that is not erroneous.
>
> > - imply an acquire compiler fence (B)
> > - ensure cond-branch is emitted (C)
>
> (C) is almost impossible to do. This should be reformulated to talk
> about the effect of the generated code, instead.
>
> > *OR*
> >
> > - ensure !__builtin_const_p(expr); (A)
> > - upgrade the load in @expr to load-acquire (D)
>
> So that will only work if there is exactly one read from memory in expr?
> That is problematic.
>
> This needs some work.

There is a valid concern that something at the level of the memory
model requires very precise specification in terms of language
semantics and not generated code. Otherwise it seems difficult to get
compiler folks onboard. And coming up with such a specification may
take a while, especially if we have to venture in the realm of the
C11/C++11 memory model while still trying to somehow make it work for
the LKMM. That seems like a very tricky maze we may want to avoid.

An alternative design would be to use a statement attribute to only
enforce (C) ("__attribute__((mustcontrol))" ?). The rest can be
composed through existing primitives I think (the compiler barriers
need optimizing though), which should give us ctrl_depends().

At least for Clang, it should be doable: https://reviews.llvm.org/D103958
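
A composition sketch along those lines, heavily hypothetical:
"mustcontrol" is just the attribute name floated above, its spelling and
placement are undecided, and barrier() here is the unoptimized stand-in
for the compiler-fence half:

        /*
         * Hypothetical: the attribute would pin the conditional branch
         * (C), while an ordinary compiler barrier provides the fence
         * part; a real version would want something weaker than a full
         * barrier().
         */
        #define ctrl_depends(expr) \
                ({ typeof(expr) __e = (expr); barrier(); __e; })

        __attribute__((mustcontrol))          /* placement is hypothetical */
        if (ctrl_depends(READ_ONCE(*a)))
                WRITE_ONCE(*b, 1);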

Thanks,
-- Marco

2021-06-09 18:06:45

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Wed, Jun 09, 2021 at 02:44:08PM +0200, Marco Elver wrote:
> On Tue, 8 Jun 2021 at 17:30, Segher Boessenkool
> <[email protected]> wrote:
> > This needs some work.
>
> There is a valid concern that something at the level of the memory
> model requires very precise specification in terms of language
> semantics and not generated code.

Yup, exactly. Especially because the meaning of generated code is hard
to describe, and even much more so if you do not limit yourself to a
single machine architecture.

> Otherwise it seems difficult to get
> compiler folks onboard.

It isn't just difficult to get us on board without it; we know it is
simply impossible to do anything sensible without it.

> And coming up with such a specification may
> take a while,

Yes.

> especially if we have to venture in the realm of the
> C11/C++11 memory model while still trying to somehow make it work for
> the LKMM. That seems like a very tricky maze we may want to avoid.

Well, you only need to use the saner parts of the memory model (not the
full thing), and extensions are fine as well of course.

> An alternative design would be to use a statement attribute to only
> enforce (C) ("__attribute__((mustcontrol))" ?).

Statement attributes only exist for empty statements. It is unclear how
(and if!) we could support it for general statements.

Some new builtin seems to fit the requirements better? I haven't looked
too closely though.


Segher