LinuxLists.cc - [PATCH] make atomic_t volatile on all architectures

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

Jesper Juhl wrote:
> On 09/08/2007, Chris Snook <[email protected]> wrote:
>> From: Chris Snook <[email protected]>
>>
>> Some architectures currently do not declare the contents of an atomic_t to be
>> volatile. This causes confusion since atomic_read() might not actually read
>> anything if an optimizing compiler re-uses a value stored in a register, which
>> can break code that loops until something external changes the value of an
>> atomic_t. Avoiding such bugs requires using barrier(), which causes re-loads
>> of all registers used in the loop, thus hurting performance instead of helping
>> it, particularly on architectures where it's unnecessary. Since we generally
>> want to re-read the contents of an atomic variable on every access anyway,
>> let's standardize the behavior across all architectures and avoid the
>> performance and correctness problems of requiring the use of barrier() in
>> loops that expect atomic_t variables to change externally. This is relevant
>> even on non-smp architectures, since drivers may use atomic operations in
>> interrupt handlers.
>>
>> Signed-off-by: Chris Snook <[email protected]>
>>
>
> Hmm, I thought we were trying to move away from volatile since it is
> very weakly defined and towards explicit barriers and locks...
> Points to --> Documentation/volatile-considered-harmful.txt

This is a special case. Usually, the use of volatile is just lazy. In
this case, it's probably necessary on at least some architectures, so we
can't remove it everywhere unless we want to rewrite atomic.h completely
in inline assembler for several architectures, and painstakingly verify
all that work. I would hope it's obvious that having consistent
behavior on all architectures, or even at all compiler optimization
levels within an architecture, can be agreed to be good. Additionally,
atomic_t variables are a rare exception, in that we pretty much always
want to work with the latest value in RAM on every access. Making this
atomic will allow us to remove a bunch of barriers which do nothing but
slow things down on most architectures.

I agree that the use of atomic_t in .c files is generally bad, but in
certain special cases, hiding it inside defined data types may be worth
the slight impurity, just as we sometimes tolerate lines longer than 80
columns when "fixing" it makes things unreadable.

-- Chris

2007-08-08 23:36:37

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

Lennert Buytenhek wrote:
> On Wed, Aug 08, 2007 at 07:07:33PM -0400, Chris Snook wrote:
>
>> From: Chris Snook <[email protected]>
>>
>> Some architectures currently do not declare the contents of an atomic_t to be
>> volatile. This causes confusion since atomic_read() might not actually read
>> anything if an optimizing compiler re-uses a value stored in a register, which
>> can break code that loops until something external changes the value of an
>> atomic_t. Avoiding such bugs requires using barrier(), which causes re-loads
>> of all registers used in the loop, thus hurting performance instead of helping
>> it, particularly on architectures where it's unnecessary. Since we generally
>> want to re-read the contents of an atomic variable on every access anyway,
>> let's standardize the behavior across all architectures and avoid the
>> performance and correctness problems of requiring the use of barrier() in
>> loops that expect atomic_t variables to change externally. This is relevant
>> even on non-smp architectures, since drivers may use atomic operations in
>> interrupt handlers.
>>
>> Signed-off-by: Chris Snook <[email protected]>
>
> Documentation/atomic_ops.txt would need updating:
>
> [...]
>
> One very important aspect of these two routines is that they DO NOT
> require any explicit memory barriers. They need only perform the
> atomic_t counter update in an SMP safe manner.

Thanks, I was looking for that. I'll re-send shortly with my comment
moved there. People are already using atomic_t in a manner that implies
the use of memory barriers and interrupt-safety, which is what the patch
enforces.

-- Chris

2007-08-08 23:51:43

by Jesper Juhl

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On 09/08/2007, Chris Snook <[email protected]> wrote:
> Jesper Juhl wrote:
> > On 09/08/2007, Chris Snook <[email protected]> wrote:
> >> From: Chris Snook <[email protected]>
> >>
> >> Some architectures currently do not declare the contents of an atomic_t to be
> >> volatile. This causes confusion since atomic_read() might not actually read
> >> anything if an optimizing compiler re-uses a value stored in a register, which
> >> can break code that loops until something external changes the value of an
> >> atomic_t. Avoiding such bugs requires using barrier(), which causes re-loads
> >> of all registers used in the loop, thus hurting performance instead of helping
> >> it, particularly on architectures where it's unnecessary. Since we generally
> >> want to re-read the contents of an atomic variable on every access anyway,
> >> let's standardize the behavior across all architectures and avoid the
> >> performance and correctness problems of requiring the use of barrier() in
> >> loops that expect atomic_t variables to change externally. This is relevant
> >> even on non-smp architectures, since drivers may use atomic operations in
> >> interrupt handlers.
> >>
> >> Signed-off-by: Chris Snook <[email protected]>
> >>
> >
> > Hmm, I thought we were trying to move away from volatile since it is
> > very weakly defined and towards explicit barriers and locks...
> > Points to --> Documentation/volatile-considered-harmful.txt
>
> This is a special case.

I had a feeling it might be.

> Usually, the use of volatile is just lazy. In
> this case, it's probably necessary on at least some architectures, so we
> can't remove it everywhere unless we want to rewrite atomic.h completely
> in inline assembler for several architectures, and painstakingly verify
> all that work.

Sounds quite unpleasant and actively harmful - keeping stuff in plain
readable C makes it easier to handle by most people.

> I would hope it's obvious that having consistent
> behavior on all architectures, or even at all compiler optimization
> levels within an architecture, can be agreed to be good.

Agreed, consistency is good.

> Additionally,
> atomic_t variables are a rare exception, in that we pretty much always
> want to work with the latest value in RAM on every access. Making this
> atomic will allow us to remove a bunch of barriers which do nothing but
> slow things down on most architectures.
>
I believe you meant to say "volatile" and not "atomic" above. But
yes, if volatile is sufficiently defined to ensure it does what we
want in this case and using it means we can actually speed up the
kernel, then it does indeed sound reasonable. Especially since it's
confined to the implementation of atomic_t which most people shouldn't
be looking at anyway (but simply use) and when using an atomic_t it's
pretty reasonable to expect that it doesn't need any additional
locking or barriers since it's supposed to be *atomic*.

> I agree that the use of atomic_t in .c files is generally bad, but in
> certain special cases, hiding it inside defined data types may be worth
> the slight impurity, just as we sometimes tolerate lines longer than 80
> columns when "fixing" it makes things unreadable.
>
Can't argue with that.

It seems you've thought it through. I just wanted to be sure that
this approach was actually the one that makes the most sense, and
you've convinced me of that :-)

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2007-08-09 01:05:32

by Herbert Xu

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

Chris Snook <[email protected]> wrote:
>
> Some architectures currently do not declare the contents of an atomic_t to be
> volatile. This causes confusion since atomic_read() might not actually read
> anything if an optimizing compiler re-uses a value stored in a register, which
> can break code that loops until something external changes the value of an
> atomic_t. Avoiding such bugs requires using barrier(), which causes re-loads

Such loops should always use something like cpu_relax() which comes
with a barrier.

> of all registers used in the loop, thus hurting performance instead of helping
> it, particularly on architectures where it's unnecessary. Since we generally

Do you have an example of such a loop where performance is hurt by this?

The IPVS code that led to this patch was simply broken and has been
fixed to use cpu_relax().

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2007-08-09 01:48:36

by David Miller

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

From: Herbert Xu <[email protected]>
Date: Thu, 09 Aug 2007 09:03:27 +0800

> Such loops should always use something like cpu_relax() which comes
> with a barrier.

This is an excellent point.

And it needs to be weighed with the error prone'ness Andrew mentioned.
There probably is a middle ground somewhere.

2007-08-09 03:47:35

by Paul E. McKenney

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On Wed, Aug 08, 2007 at 06:48:24PM -0700, David Miller wrote:
> From: Herbert Xu <[email protected]>
> Date: Thu, 09 Aug 2007 09:03:27 +0800
>
> > Such loops should always use something like cpu_relax() which comes
> > with a barrier.
>
> This is an excellent point.
>
> And it needs to be weighed with the error prone'ness Andrew mentioned.
> There probably is a middle ground somewhere.

OK... I'll bite. ACCESS_ONCE(), see http://lkml.org/lkml/2007/7/11/664.

This would allow ACCESS_ONCE(atomic_read(&x)) to be used where refetching
would be problematic, but allow the compiler free rein in cases where
refetching is OK.

The ACCESS_ONCE() primitive of course has its limitations as well, but
you did ask for a middle ground. ;-)

Thanx, Paul

2007-08-09 04:19:29

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On Wed, 8 Aug 2007, Chris Snook wrote:
>
> Some architectures currently do not declare the contents of an atomic_t to be
> volatile. This causes confusion since atomic_read() might not actually read
> anything if an optimizing compiler re-uses a value stored in a register, which
> can break code that loops until something external changes the value of an
> atomic_t.

I'd be *much* happier with "atomic_read()" doing the "volatile" instead.

The fact is, volatile on data structures is a bug. It's a wart in the C
language. It shouldn't be used.

Volatile accesses in *code* can be ok, and if we have "atomic_read()"
expand to a "*(volatile int *)&(x)->value", then I'd be ok with that.

But marking data structures volatile just makes the compiler screw up
totally, and makes code for initialization sequences etc much worse.

Linus

2007-08-09 05:05:19

by Jerry Jiang

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On Wed, 8 Aug 2007 21:18:25 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:

>
>
> On Wed, 8 Aug 2007, Chris Snook wrote:
> >
> > Some architectures currently do not declare the contents of an atomic_t to be
> > volatile. This causes confusion since atomic_read() might not actually read
> > anything if an optimizing compiler re-uses a value stored in a register, which
> > can break code that loops until something external changes the value of an
> > atomic_t.
>
> I'd be *much* happier with "atomic_read()" doing the "volatile" instead.
>
> The fact is, volatile on data structures is a bug. It's a wart in the C
> language. It shouldn't be used.

Why? It's a wart! Is it due to unclear C standard on volatile related point?

Why the *volatile-accesses-in-code* is acceptable, does C standard make it clear?

-- Jerry

>
> Volatile accesses in *code* can be ok, and if we have "atomic_read()"
> expand to a "*(volatile int *)&(x)->value", then I'd be ok with that.
>
> But marking data structures volatile just makes the compiler screw up
> totally, and makes code for initialization sequences etc much worse.
>
> Linus

2007-08-09 07:38:30

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

Linus Torvalds wrote:
>
> On Wed, 8 Aug 2007, Chris Snook wrote:
>> Some architectures currently do not declare the contents of an atomic_t to be
>> volatile. This causes confusion since atomic_read() might not actually read
>> anything if an optimizing compiler re-uses a value stored in a register, which
>> can break code that loops until something external changes the value of an
>> atomic_t.
>
> I'd be *much* happier with "atomic_read()" doing the "volatile" instead.
>
> The fact is, volatile on data structures is a bug. It's a wart in the C
> language. It shouldn't be used.
>
> Volatile accesses in *code* can be ok, and if we have "atomic_read()"
> expand to a "*(volatile int *)&(x)->value", then I'd be ok with that.
>
> But marking data structures volatile just makes the compiler screw up
> totally, and makes code for initialization sequences etc much worse.
>
> Linus

Fair enough. Casting to (volatile int *) will give us the behavior people
expect when using atomic_t without needing to use inefficient barriers.

While we have the hood up, should we convert all the atomic_t's to non-volatile
and put volatile casts in all the atomic_reads? I don't know enough about the
various arches to say with confidence that those changes alone will preserve
existing behavior. We might need some arch-specific tweaking of the atomic
operations.

-- Chris

2007-08-09 07:49:04

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

Herbert Xu wrote:
> Chris Snook <[email protected]> wrote:
>> Some architectures currently do not declare the contents of an atomic_t to be
>> volatile. This causes confusion since atomic_read() might not actually read
>> anything if an optimizing compiler re-uses a value stored in a register, which
>> can break code that loops until something external changes the value of an
>> atomic_t. Avoiding such bugs requires using barrier(), which causes re-loads
>
> Such loops should always use something like cpu_relax() which comes
> with a barrier.

If they're not doing anything, sure. Plenty of loops actually do some sort of
real work while waiting for their halt condition, possibly even work which is
necessary for their halt condition to occur, and you definitely don't want to be
doing cpu_relax() in this case. On register-rich architectures you can do quite
a lot of work without needing to reuse the register containing the result of the
atomic_read(). Those are precisely the architectures where barrier() hurts the
most.

>> of all registers used in the loop, thus hurting performance instead of helping
>> it, particularly on architectures where it's unnecessary. Since we generally
>
> Do you have an example of such a loop where performance is hurt by this?

Not handy. Perhaps more interesting are cases where we access the same atomic_t
twice in a hot path. If we can remove some of those barriers, those hot paths
will get faster.

Performance was only part of the motivation. The IPVS bug was an example of how
atomic_t is assumed (not always correctly) to work, and other recent discussions
on this list have made it clear that most people assume atomic_read() actually
reads something every time, and don't even think to consult the documentation
until they find out the hard way that it does not. I'm not saying we should
encourage lazy programming, but in this case the assumption is reasonable
because that's how people actually use atomic_t, and making this behavior
uniform across all architectures makes it more convenient to do things the right
way, which we should encourage.

> The IPVS code that led to this patch was simply broken and has been
> fixed to use cpu_relax().

I agree, busy-waiting should be done with cpu_relax(), if at all. I'm more
concerned about cases that are not busy-waiting, but could still get compiled
with the same optimization.

-- Chris

2007-08-09 08:15:14

by Heiko Carstens

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On Thu, Aug 09, 2007 at 03:31:10AM -0400, Chris Snook wrote:
> Linus Torvalds wrote:
> > I'd be *much* happier with "atomic_read()" doing the "volatile" instead.
> > The fact is, volatile on data structures is a bug. It's a wart in the C
> > language. It shouldn't be used. Volatile accesses in *code* can be ok, and
> > if we have "atomic_read()" expand to a "*(volatile int *)&(x)->value", then
> > I'd be ok with that.
> > But marking data structures volatile just makes the compiler screw up
> > totally, and makes code for initialization sequences etc much worse.
> > Linus
>
> Fair enough. Casting to (volatile int *) will give us the behavior people
> expect when using atomic_t without needing to use inefficient barriers.
>
> While we have the hood up, should we convert all the atomic_t's to
> non-volatile and put volatile casts in all the atomic_reads? I don't know
> enough about the various arches to say with confidence that those changes
> alone will preserve existing behavior. We might need some arch-specific
> tweaking of the atomic operations.

If you write that patch could you include the atomic64 variants as well,
please? Besides that just post the patch to linux-arch and maintainers
should speak up.

2007-08-09 09:10:44

by Bodo Eggert

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

Jerry Jiang <[email protected]> wrote:
> On Wed, 8 Aug 2007 21:18:25 -0700 (PDT)
>> On Wed, 8 Aug 2007, Chris Snook wrote:

>> > Some architectures currently do not declare the contents of an atomic_t to
>> > be
>> > volatile. This causes confusion since atomic_read() might not actually
>> > read anything if an optimizing compiler re-uses a value stored in a
>> > register, which can break code that loops until something external changes
>> > the value of an atomic_t.
>>
>> I'd be *much* happier with "atomic_read()" doing the "volatile" instead.
>>
>> The fact is, volatile on data structures is a bug. It's a wart in the C
>> language. It shouldn't be used.
>
> Why? It's a wart! Is it due to unclear C standard on volatile related point?
>
> Why the *volatile-accesses-in-code* is acceptable, does C standard make it
> clear?

http://lwn.net/Articles/233482/
--
Fun things to slip into your budget
Heisenberg Compensator upgrade kit

Fri?, Spammer: [email protected]

2007-08-09 09:19:38

by Jerry Jiang

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On Thu, 09 Aug 2007 11:10:16 +0200
Bodo Eggert <[email protected]> wrote:

> >
> > Why the *volatile-accesses-in-code* is acceptable, does C standard make it
> > clear?
>
> http://lwn.net/Articles/233482/

I have read this article before, but What Linus said only focusing on
the conclusion-- The semantics of it are so unclear as
to be totally useless.

and still not to said "Why the *volatile-accesses-in-code* is
acceptable"

-- Jerry

> --
> Fun things to slip into your budget
> Heisenberg Compensator upgrade kit
>
> Friß, Spammer: [email protected]

2007-08-09 11:09:22

by Herbert Xu

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On Thu, Aug 09, 2007 at 03:47:57AM -0400, Chris Snook wrote:
>
> If they're not doing anything, sure. Plenty of loops actually do some sort
> of real work while waiting for their halt condition, possibly even work
> which is necessary for their halt condition to occur, and you definitely
> don't want to be doing cpu_relax() in this case. On register-rich
> architectures you can do quite a lot of work without needing to reuse the
> register containing the result of the atomic_read(). Those are precisely
> the architectures where barrier() hurts the most.

I have a problem with this argument. The same loop could be
using a non-atomic as long as the updaters are serialised. Would
you suggest that we turn such non-atomics into volatiles too?

Any loop that's waiting for an external halt condition either
has to schedule away (which is a barrier) or you'd be busy
waiting in which case you should use cpu_relax.

Do you have an example where this isn't the case?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2007-08-09 11:44:55

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

Herbert Xu wrote:
> On Thu, Aug 09, 2007 at 03:47:57AM -0400, Chris Snook wrote:
>> If they're not doing anything, sure. Plenty of loops actually do some sort
>> of real work while waiting for their halt condition, possibly even work
>> which is necessary for their halt condition to occur, and you definitely
>> don't want to be doing cpu_relax() in this case. On register-rich
>> architectures you can do quite a lot of work without needing to reuse the
>> register containing the result of the atomic_read(). Those are precisely
>> the architectures where barrier() hurts the most.
>
> I have a problem with this argument. The same loop could be
> using a non-atomic as long as the updaters are serialised. Would
> you suggest that we turn such non-atomics into volatiles too?

No. I'm simply saying that when people call atomic_read, they expect a read to
occur. When this doesn't happen, people get confused. Merely referencing a
variable doesn't carry the same connotation.

Anyway, I'm taking Linus's advice and putting the cast in atomic_read(), rather
than the counter declaration itself. Everything else uses __asm__ __volatile__,
or calls atomic_read() with interrupts disabled. This ensures that
atomic_read() works as expected across all architectures, without the cruft the
compiler generates when you declare the variable itself volatile.

> Any loop that's waiting for an external halt condition either
> has to schedule away (which is a barrier) or you'd be busy
> waiting in which case you should use cpu_relax.

Not necessarily. Some code uses atomic_t for a sort of lightweight semaphore.
If your loop is actually doing real work, perhaps in a softirq handler
negotiating shared resources with a hard irq handler, you're not busy-waiting.

> Do you have an example where this isn't the case?

a) No, and that's sort of the point. We shouldn't have to audit the whole
kernel to find cases where a misleadingly-named function is doing what its users
are not expecting. If we can make it always do the right thing without any
substantial drawbacks, we should.

b) Loops are just one case, which came to mind because of the IPVS bug recently
discussed. I recall seeing some scheduler code recently which does an
atomic_read() twice on the same variable, with a barrier in between. It's in a
very hot path, so if we can remove that barrier, we save a bunch of register
loads. When you're context switching every microsecond in SCHED_RR, that really
matters.

-- Chris

2007-08-09 15:02:21

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On Thu, 9 Aug 2007, Jerry Jiang wrote:
>
> and still not to said "Why the *volatile-accesses-in-code* is
> acceptable"

I don't think volatile is necessarily wonderful in code _either_. So I
think the "atomic_read()" issue would be even better off if we just made
sure everybody behaves well and has the right barriers.

So the difference between "volatile in code" and "volatile on data
structures" is that the latter is *always* wrong (and wrong for very
fundamental reasons).

But "volatile in code" can be perfectly fine. If you hide the volatile in
an accessor function, and just specify that the rules for that function is
that it always loads from memory but doesn't imply any memory barriers,
than that is fine. And I think those kinds of semantics may well be
perfectly sane for "atomic_read()".

So I think a volatile access inside "atomic_read()" at least has
well-defined semantics. Are they the best possible semantics? I dunno. I
can imagine that it's perfectly fine, but on the other hand, it would be
interesting to see any code for which it matters - it's entirely possible
that the code itself is the problem, and should instead have a
"cpu_relax()" or similar that implies a barrier.

Linus

2007-08-09 17:42:36

by Chuck Ebbert

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On 08/09/2007 03:31 AM, Chris Snook wrote:
>
> Fair enough. Casting to (volatile int *) will give us the behavior
> people expect when using atomic_t without needing to use inefficient
> barriers.
>

You can use this forget() macro to make the compiler reread a variable:

#define forget(var) asm volatile ("" : "=m"(var))

2007-08-09 17:54:30

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On Thu, 2007-08-09 at 13:36 -0400, Chuck Ebbert wrote:
> > Fair enough. Casting to (volatile int *) will give us the behavior
> > people expect when using atomic_t without needing to use inefficient
> > barriers.
> >
>
> You can use this forget() macro to make the compiler reread a
> variable:
>
> #define forget(var) asm volatile ("" : "=m"(var))

You need to specify a "m"(var) input constraint as well. Without it the
compiler might remove the initialization of var. E.g.

void fn(void)
{
int var = 0;
forget(var);
/* now var can have any value. */
}

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2007-08-09 18:01:50

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On Thu, 9 Aug 2007, Chuck Ebbert wrote:
>
> You can use this forget() macro to make the compiler reread a variable:
>
> #define forget(var) asm volatile ("" : "=m"(var))

No. That will also make the compiler "forget" any previous writes to it,
so it changes behaviour.

You'd have to use "+m".

Linus

2007-08-09 18:17:26

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On Thu, 2007-08-09 at 10:55 -0700, Linus Torvalds wrote:
> > You can use this forget() macro to make the compiler reread a variable:
> >
> > #define forget(var) asm volatile ("" : "=m"(var))
>
> No. That will also make the compiler "forget" any previous writes to it,
> so it changes behaviour.
>
> You'd have to use "+m".

Yes, though I would use "=m" on the output list and "m" on the input
list. The reason is that I've seen gcc fall on its face with an ICE on
s390 due to "+m". The explanation I've got from our compiler people was
quite esoteric, as far as I remember gcc splits "+m" to an input operand
and an output operand. Now it can happen that the compiler chooses two
different registers to access the same memory location. "+m" requires
that the two memory references are identical which causes the ICE if
they are not. I do not know if the current compilers still do this. Has
anyone else seen this happen ?

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2007-08-12 05:55:20

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

>> You'd have to use "+m".
>
> Yes, though I would use "=m" on the output list and "m" on the input
> list. The reason is that I've seen gcc fall on its face with an ICE on
> s390 due to "+m". The explanation I've got from our compiler people was
> quite esoteric, as far as I remember gcc splits "+m" to an input
> operand
> and an output operand. Now it can happen that the compiler chooses two
> different registers to access the same memory location. "+m" requires
> that the two memory references are identical which causes the ICE if
> they are not.

The problem is very nicely described here, last paragraph:
<http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01816.html>

It's not a problem anymore in (very) recent GCC, although
that of course won't help you in the kernel (yet).

> I do not know if the current compilers still do this. Has
> anyone else seen this happen ?

In recent GCC, it's actually documented:

The ordinary output operands must be write-only; GCC will assume that
the values in these operands before the instruction are dead and need
not be generated. Extended asm supports input-output or read-write
operands. Use the constraint character `+' to indicate such an operand
and list it with the output operands. You should only use read-write
operands when the constraints for the operand (or the operand in which
only some of the bits are to be changed) allow a register.

Note that last line.

Segher

2007-08-12 06:11:00

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On Sun, 12 Aug 2007, Segher Boessenkool wrote:
>
> Note that last line.

Segher, how about you just accept that Linux uses gcc as per reality, and
that sometimes the reality is different from your expectations?

"+m" works. We use it. It's better than the alternatives. Pointing to
stale documentation doesn't change anything.

The same is true of accesses through a volatile pointer. The kernel has
done that for a *loong* time (all MMIO IO is done that way), and it's
totally pointless to say that "volatile" isn't guaranteed to do what it
does. It works, and quite frankly, if it didn't work, it would be a gcc
BUG.

And again, this is not a "C standard" issue. It's a "sane implementation"
issue. Linux expects the compiler to be sane. If it isn't, that's not our
problem. gcc *is* sane, and I don't see why you constantly act as if it
wasn't.

Linus

2007-08-12 09:48:51

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On Sun, 2007-08-12 at 07:53 +0200, Segher Boessenkool wrote:
> > Yes, though I would use "=m" on the output list and "m" on the input
> > list. The reason is that I've seen gcc fall on its face with an ICE on
> > s390 due to "+m". The explanation I've got from our compiler people was
> > quite esoteric, as far as I remember gcc splits "+m" to an input
> > operand
> > and an output operand. Now it can happen that the compiler chooses two
> > different registers to access the same memory location. "+m" requires
> > that the two memory references are identical which causes the ICE if
> > they are not.
>
> The problem is very nicely described here, last paragraph:
> <http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01816.html>
>
> It's not a problem anymore in (very) recent GCC, although
> that of course won't help you in the kernel (yet).

So you are saying that gcc 3.x still has this problem ?

> > I do not know if the current compilers still do this. Has
> > anyone else seen this happen ?
>
> In recent GCC, it's actually documented:
>
> The ordinary output operands must be write-only; GCC will assume that
> the values in these operands before the instruction are dead and need
> not be generated. Extended asm supports input-output or read-write
> operands. Use the constraint character `+' to indicate such an operand
> and list it with the output operands. You should only use read-write
> operands when the constraints for the operand (or the operand in which
> only some of the bits are to be changed) allow a register.
>
> Note that last line.

I see, thanks for the info.

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2007-08-12 09:49:40

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On Sat, 2007-08-11 at 23:09 -0700, Linus Torvalds wrote:
> Segher, how about you just accept that Linux uses gcc as per reality, and
> that sometimes the reality is different from your expectations?
>
> "+m" works. We use it. It's better than the alternatives. Pointing to
> stale documentation doesn't change anything.

Well, perhaps on i386. I've seen some older versions of the s390 gcc die
with an ICE because I have used "+m" in some kernel inline assembly. I'm
happy to hear that this issue is fixed in recent gcc. Now I'll have to
find out if this is already true with gcc 3.x. The duplication "=m" and
"m" with the same constraint is rather annoying.

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2007-08-12 09:55:29

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On Sun, 12 Aug 2007, Martin Schwidefsky wrote:
>
> The duplication "=m" and "m" with the same constraint is rather
> annoying.

It's not only annoying, it causes gcc to generate bad code too. At least
certain versions of gcc will generate the address *twice*, even if there
is obviously only one address used.

If you have problems with "+m", you are often actually better off using
just "m" and then adding a memory clobber. But that has other code
generation downsides.

Linus

2007-08-12 10:28:54

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

>> Note that last line.
>
> Segher, how about you just accept that Linux uses gcc as per reality,
> and
> that sometimes the reality is different from your expectations?
>
> "+m" works.

It works _most of the time_. Ask Martin. Oh you don't even have to,
he told you two mails ago. My last mail simply pointed out that this
isn't a GCC bug, but merely documented behaviour. Well okay, it was
documented only after people found problems with it; it is an edge
case and pretty hard to trigger (impossible to trigger on RISC
architectures, and not very likely on x86 either, for different
reasons).

Since "realistic" people keep using "+m" anyhow, current GCC has added
a workaround that splits "+m" into an "=m" and an "m" constraint. It
likely will be accepted as a permanent feature. However, that
workaround
doesn't yet exist on some of the compilers Linux still allows building
with (but the problem does).

> We use it. It's better than the alternatives.

How is this better than simply writing the slightly more verbose
asm("..." : "=m"(XX) : "m"(XX)) ? It's a few more characters.
asm() is hard to write (correctly) anyway. I don't see the problem.

> Pointing to stale documentation doesn't change anything.

It's not "stale" documentation, it's freshly built from GCC mainline.
It's also not "stale" in the sense that it isn't updated; in fact,
the email I pointed to is from a thread where a change in this exact
paragraph in the documentation was suggested (a couple of weeks ago).

> The same is true of accesses through a volatile pointer. The kernel has
> done that for a *loong* time (all MMIO IO is done that way),

(All MMIO on certain architectures). This however is a completely
different case; there _is_ no underlying C object declared anywhere,
so no way can GCC prove that that object isn't volatile, so it _has_
to assume it is.

Also, in case of e.g. readb(), if the compiler can prove the resulting
value isn't used it doesn't have to perform the access at all -- the
documentation (again, not "stale" documentation) points this out quite
literally.

> and it's
> totally pointless to say that "volatile" isn't guaranteed to do what it
> does.

I actually asked a couple of other GCC developers last night, and
they all agreed with me. If you look at historic GCC problem reports
(I pointed at a few earlier in these threads) you would see that the
situation is far from clear.

> It works, and quite frankly, if it didn't work, it would be a gcc
> BUG.

How can it be a bug, if the semantics you think are guaranteed are
guaranteed in your (and others') minds only?

Of course, GCC likes to cater to its users, and tries to make things
work the way the programmer probably intended. However it a) cannot
*actually* read your mind, it just looks that way; b) it _also_ has
to implement the _correct_ semantics of volatile; c) since this isn't
part of any C version (no, not GNU99 or similar either), sometimes
GCC developers do not think about what some people expect the compiler
should do with such code, but just implement the requirements they
*know* exist. And that's when "BUGs" happen.

If you want GCC to do what you want it to do instead of what it does,
why not file an enhancement request? <http://gcc.gnu.org/bugzilla>.

To save you the trouble, I just did: <gcc.gnu.org/PR33053>. Let's
see what comes from it.

> And again, this is not a "C standard" issue. It's a "sane
> implementation"
> issue.

Maybe what you consider sane isn't as clear-cut as you think? I'm
not alone with this opinion, as the mailing lists archives readily
show.

> Linux expects the compiler to be sane.

And the compiler expects the code it is asked to translate to
be reasonably sane. Nothing new here. The compiler is more
forgiving than the Linux code, IMHO; that's why I suggested
using the obviously correct asm implementation of the atomic
get/set, rather than using the not-so-obviously-correct and
error-prone volatile thing. The semantics of the asm() are
exactly the semantics you expect from an atomic get/set, while
that of the volatile implementation are *not*; and on top of
that it is very well-tested well-understood technology.

> If it isn't, that's not our problem.

What, if the compiler doesn't work as you expect, like a silent
miscompilation, or even only the compiler ICEs, that is not a
problem for users and/or developers of Linux? Interesting
opinion. I'd rather stay clear of known pitfalls.

> gcc *is* sane,

Yes, it is.

> and I don't see why you constantly act as if it wasn't.

I guess I didn't express myself clearly enough then. Hopefully
some other people _did_ understand what I meant. In very harsh
words: "not all (proposed) kernel code is sane".

Segher

2007-08-12 10:36:18

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

>>> Yes, though I would use "=m" on the output list and "m" on the input
>>> list. The reason is that I've seen gcc fall on its face with an ICE
>>> on
>>> s390 due to "+m". The explanation I've got from our compiler people
>>> was
>>> quite esoteric, as far as I remember gcc splits "+m" to an input
>>> operand
>>> and an output operand. Now it can happen that the compiler chooses
>>> two
>>> different registers to access the same memory location. "+m" requires
>>> that the two memory references are identical which causes the ICE if
>>> they are not.
>>
>> The problem is very nicely described here, last paragraph:
>> <http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01816.html>
>>
>> It's not a problem anymore in (very) recent GCC, although
>> that of course won't help you in the kernel (yet).
>
> So you are saying that gcc 3.x still has this problem ?

Yes. A warning ("read-write constraint does not allow a register")
was added for GCC-3.4, but the fix/workaround is more recent
(4.2 I believe, perhaps it was backported to the 4.1 branch).

>>> I do not know if the current compilers still do this. Has
>>> anyone else seen this happen ?
>>
>> In recent GCC, it's actually documented:
>>
>> The ordinary output operands must be write-only; GCC will assume
>> that
>> the values in these operands before the instruction are dead and need
>> not be generated. Extended asm supports input-output or read-write
>> operands. Use the constraint character `+' to indicate such an
>> operand
>> and list it with the output operands. You should only use read-write
>> operands when the constraints for the operand (or the operand in
>> which
>> only some of the bits are to be changed) allow a register.
>>
>> Note that last line.
>
> I see, thanks for the info.

My pleasure.

Segher

2007-08-12 16:36:49

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

>> "+m" works. We use it. It's better than the alternatives. Pointing to
>> stale documentation doesn't change anything.
>
> Well, perhaps on i386. I've seen some older versions of the s390 gcc
> die
> with an ICE because I have used "+m" in some kernel inline assembly.
> I'm
> happy to hear that this issue is fixed in recent gcc. Now I'll have to
> find out if this is already true with gcc 3.x.

It was fixed (that is, "+m" is translated into a separate read
and write by GCC itself) in GCC-4.0.0, I just learnt.

> The duplication "=m" and "m" with the same constraint is rather
> annoying.

Yeah. Compiler errors are more annoying though I dare say ;-)

Segher

2007-08-12 18:00:26

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On Sun, 12 Aug 2007, Segher Boessenkool wrote:
>
> It works _most of the time_.

It used to have problems. Gcc has had problems in various areas.

> Ask Martin. Oh you don't even have to,
> he told you two mails ago. My last mail simply pointed out that this
> isn't a GCC bug, but merely documented behaviour.

It was *not* documented behaviour, and there were gcc people who actively
*encouraged* use of "+m", and actually fixed gcc.

The documentation is buggy.

Live with it.

Linus

2007-08-12 18:12:35

[permalink] [raw]

Subject: Re: [PATCH] make atomic_t volatile on all architectures

On Sun, 12 Aug 2007, Segher Boessenkool wrote:
>
> Yeah. Compiler errors are more annoying though I dare say ;-)

Actually, compile-time errors are fine, and easy to work around. *Much*
more annoying is when gcc actively generates subtly bad code. We've had
use-after-free issues due to incorrect gcc liveness calculations etc, and
inline asm has beeen one of the more common causes - exactly because the
kernel is one of the few users (along with glibc) that uses it at all.

Now *those* are hard to find - the code works most of the time, but the
compiler has inserted a really subtle race condition into the code
(deallocated a local stack entry before last use). We had that with our
semaphore code at some point.

Linus

2007-08-12 19:14:41