2021-06-04 17:16:18

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Fri, Jun 4, 2021 at 9:37 AM Peter Zijlstra <[email protected]> wrote:
>
> >
> > Why is "volatile_if()" not just
> >
> > #define barier_true() ({ barrier(); 1; })
> >
> > #define volatile_if(x) if ((x) && barrier_true())
>
> Because we weren't sure compilers weren't still allowed to optimize the
> branch away.

This isn't about some "compiler folks think".

The above CANNOT be compiled any other way than with a branch.

A compiler that optimizes a branch away is simply broken.

Of course, the actual condition (ie "x" above) has to be something
that the compiler cannot statically determine is a constant, but since
the whole - and only - point is that there will be a READ_ONCE() or
similar there, that's not an issue.

The compiler *cannot* just say "oh, I'll do that 'volatile asm
barrier' whether the condition is true or not". That would be a
fundamental compiler bug.

It's as if we wrote

if (x) y++;

and the compiler went "Oh, I'll just increment 'y' unconditionally by
one, I'm sure the programmer doesn't mind, the conditional on 'x' is
immaterial".

No. That's not a C compiler. That's a stinking piece of buggy shit.
The compiler has to honor the conditional.

In that "y++" case, a compiler can decide to do it without a branch,
and basically rewrite the above as

y += !!x;

but with a "volatile asm", that would be a bug.

Of course, we might want to make sure that the compiler doesn't go
"oh, empty asm, I can ignore it", but if that's the case then it's not
about "volatile_if()" any more, at that point it's "oh, the compiler
broke our 'barrier()' implementation", and we have bigger issues.

Linus


2021-06-04 17:31:14

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Fri, Jun 04, 2021 at 10:10:29AM -0700, Linus Torvalds wrote:
> The compiler *cannot* just say "oh, I'll do that 'volatile asm
> barrier' whether the condition is true or not". That would be a
> fundamental compiler bug.

Yes.

> Of course, we might want to make sure that the compiler doesn't go
> "oh, empty asm, I can ignore it",

It isn't allowed to do that. GCC has this arguable misfeature where it
doesn't show empty asm in the assembler output, but that has no bearing
on anything but how human-readable the output is.


Segher

2021-06-04 17:42:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Fri, Jun 4, 2021 at 10:27 AM Segher Boessenkool
<[email protected]> wrote:
>
> > Of course, we might want to make sure that the compiler doesn't go
> > "oh, empty asm, I can ignore it",
>
> It isn't allowed to do that. GCC has this arguable misfeature where it
> doesn't show empty asm in the assembler output, but that has no bearing
> on anything but how human-readable the output is.

That sounds about right, but we have had people talking about the
compiler looking inside the asm string before.

So it worries me that some compiler person might at some point go all
breathy-voice on us and say "I am altering the deal. Pray I don't
alter it any further".

Side note: when grepping for what "barrier()" does on different
architectures and different compilers, I note that yes, it really is
just an empty asm volatile with a "memory" barrier. That should in all
way sbe sufficient.

BUT.

There's this really odd comment in <linux/compiler-intel.h> that talks
about some "ECC" compiler:

/* Intel ECC compiler doesn't support gcc specific asm stmts.
* It uses intrinsics to do the equivalent things.
*/

and it defines it as "__memory_barrier()". This seems to be an ia64 thing, but:

- I cannot get google to find me any documentation on such an intrinsic

- it seems to be bogus anyway, since we have "asm volatile" usage in
at least arch/ia64/mm/tlb.c

So I do note that "barrier()" has an odd definition in one odd ia64
case, and I can't find the semantics for it.

Admittedly I also cannot find it in myself to care. I don't think that
"Intel ECC" compiler case actually exists, and even if it does I don't
think itanium is relevant any more. But it was an odd detail on what
"barrier()" actually might mean to the compiler.

Linus

2021-06-04 18:27:15

by Alan Stern

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

On Fri, Jun 04, 2021 at 10:10:29AM -0700, Linus Torvalds wrote:
> On Fri, Jun 4, 2021 at 9:37 AM Peter Zijlstra <[email protected]> wrote:
> >
> > >
> > > Why is "volatile_if()" not just
> > >
> > > #define barier_true() ({ barrier(); 1; })
> > >
> > > #define volatile_if(x) if ((x) && barrier_true())
> >
> > Because we weren't sure compilers weren't still allowed to optimize the
> > branch away.
>
> This isn't about some "compiler folks think".
>
> The above CANNOT be compiled any other way than with a branch.
>
> A compiler that optimizes a branch away is simply broken.
>
> Of course, the actual condition (ie "x" above) has to be something
> that the compiler cannot statically determine is a constant, but since
> the whole - and only - point is that there will be a READ_ONCE() or
> similar there, that's not an issue.

In fact there is one weird case where it is an issue (mentioned in
memory-barriers.txt):

If some obscure arch-specific header file does:

#define FOO 1

and an unwitting programmer writes:

volatile_if (READ_ONCE(*y) % FOO == 0)
WRITE_ONCE(*z, 5);

then the compiler _can_ statically determine that the condition is a
constant, in spite of the READ_ONCE, but this fact isn't apparent to the
programmer. The generated object code will include both the read and
the write, but there won't necessarily be any ordering between them.

I don't know if cases like this exist in the kernel. It wouldn't be
surprising if they did though, particularly in situations where a
feature (like multi-level page tables) may be compiled away.

Alan

2021-06-04 18:31:36

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [RFC] LKMM: Add volatile_if()

Hi!

On Fri, Jun 04, 2021 at 10:38:43AM -0700, Linus Torvalds wrote:
> On Fri, Jun 4, 2021 at 10:27 AM Segher Boessenkool
> <[email protected]> wrote:
> > > Of course, we might want to make sure that the compiler doesn't go
> > > "oh, empty asm, I can ignore it",
> >
> > It isn't allowed to do that. GCC has this arguable misfeature where it
> > doesn't show empty asm in the assembler output, but that has no bearing
> > on anything but how human-readable the output is.
>
> That sounds about right, but we have had people talking about the
> compiler looking inside the asm string before.
>
> So it worries me that some compiler person might at some point go all
> breathy-voice on us and say "I am altering the deal. Pray I don't
> alter it any further".

GCC will never do that. And neither will any other compiler that claims
to implement the GCC asm extensions, if they are true to their word.

GCC *does* look inside the assembler template to estimate what code size
this asm will generate, and it tries to be pessimistic about its
estimate so that this will always work, but it always is possible to
mislead the compiler here, precisely because it does not actually
pretend it understands assembler code (think .irp or anything with
assembler macros for example). In very rare cases this leads to
(assembler) errors ("jump target out of range", that kind of thing).
The most effective workaround is to write less silly code ;-) And of
course this is documented, see
<https://gcc.gnu.org/onlinedocs/gcc/Size-of-an-asm.html>

> Side note: when grepping for what "barrier()" does on different
> architectures and different compilers, I note that yes, it really is
> just an empty asm volatile with a "memory" barrier. That should in all
> way sbe sufficient.
>
> BUT.
>
> There's this really odd comment in <linux/compiler-intel.h> that talks
> about some "ECC" compiler:
>
> /* Intel ECC compiler doesn't support gcc specific asm stmts.
> * It uses intrinsics to do the equivalent things.
> */
>
> and it defines it as "__memory_barrier()". This seems to be an ia64 thing, but:

"ecc" apparently was "icc" but for Itanium. It ceased to exist some
time in the 2.4 era apparently. It was still used in 2003. Searching
for "ecpc" (the C++ compiler driver) will find a bit more.

> - I cannot get google to find me any documentation on such an intrinsic
>
> - it seems to be bogus anyway, since we have "asm volatile" usage in
> at least arch/ia64/mm/tlb.c
>
> So I do note that "barrier()" has an odd definition in one odd ia64
> case, and I can't find the semantics for it.
>
> Admittedly I also cannot find it in myself to care.

Yeah, I love code archaeology, but I have work to do as well :-)

> I don't think that
> "Intel ECC" compiler case actually exists, and even if it does I don't
> think itanium is relevant any more. But it was an odd detail on what
> "barrier()" actually might mean to the compiler.

:-)


Segher