2006-01-21 00:55:40

by Chuck Ebbert

[permalink] [raw]
Subject: set_bit() is broken on i386?

/*
* setbit.c -- test the Linux set_bit() function
*
* Compare the output of this program with and without the
* -finline-functions option to GCC.
*
* If they are not the same, set_bit is broken.
*
* Result on i386 with gcc 3.3.2 (Fedora Core 2):
*
* [me@d2 t]$ gcc -O2 -o setbit.ex setbit.c ; ./setbit.ex
* 00010001
* [me@d2 t]$ gcc -O2 -o setbit.ex -finline-functions setbit.c ; ./setbit.ex
* 00000001
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

#define inline __attribute__((always_inline))

/*
* From 2.6.15/include/asm-i386/bitops.h -- needs a memory clobber?
*/
#define ADDR (*(volatile long *) addr)
static inline void set_bit(int nr, volatile unsigned long * addr)
{
__asm__ __volatile__( "lock ; "
"btsl %1,%0"
:"=m" (ADDR)
:"Ir" (nr));
}

unsigned long b[2];

int main(int argc, char * const argv[])
{
b[1] = 0x1;
set_bit(12 * sizeof(unsigned long), b);
printf("%08x\n", b[1]);
return 0;
}
--
Chuck
Currently reading: _Sun Of Suns_ by Karl Schroeder


2006-01-21 01:15:25

by Trond Myklebust

[permalink] [raw]
Subject: Re: set_bit() is broken on i386?

On Fri, 2006-01-20 at 19:53 -0500, Chuck Ebbert wrote:

> #define ADDR (*(volatile long *) addr)
> static inline void set_bit(int nr, volatile unsigned long * addr)
> {
> __asm__ __volatile__( "lock ; "
> "btsl %1,%0"
> :"=m" (ADDR)
> :"Ir" (nr));
> }

The asm needs a memory clobber in order to avoid reordering with the
assignment to b[1]:

static inline void set_bit(int nr, volatile unsigned long * addr)
{
__asm__ __volatile__( "lock ; "
"btsl %1,%0"
:"=m" (ADDR)
:"Ir" (nr)
: "memory");
}

This works consistently correctly for me.

Cheers,
Trond

2006-01-21 01:33:23

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: set_bit() is broken on i386?

On Fri, 20 Jan 2006 19:53:14 EST, Chuck Ebbert said:
> /*
> * setbit.c -- test the Linux set_bit() function
> *
> * Compare the output of this program with and without the
> * -finline-functions option to GCC.
> *
> * If they are not the same, set_bit is broken.
> *
> * Result on i386 with gcc 3.3.2 (Fedora Core 2):
> *
> * [me@d2 t]$ gcc -O2 -o setbit.ex setbit.c ; ./setbit.ex
> * 00010001
> * [me@d2 t]$ gcc -O2 -o setbit.ex -finline-functions setbit.c ; ./setbit.ex
> * 00000001

It certainly seems to be gcc version dependent (and I'd not be surprised if the
exact combo of -O2, -Os, and -mfoo and -fwhatever flags as well). Trond is
probably right that a memory clobber will force gcc to DTIT (Do The Intended
Thing, which may be different from a DTRT) regardless of what gcc's code generator
decides to do....

% gcc -v
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --host=i386-redhat-linux
Thread model: posix
gcc version 4.1.0 20060117 (Red Hat 4.1.0-0.15)
% gcc -O2 -o setbit.ex setbit.c ; ./setbit.ex
00000001
% gcc -O2 -o setbit.ex -finline-functions setbit.c ; ./setbit.ex
00000001

Fedora Core -devel tree as of this morning (so sort-of FC5 test2).


Attachments:
(No filename) (226.00 B)

2006-01-21 01:38:37

by Kenny Simpson

[permalink] [raw]
Subject: Re: set_bit() is broken on i386?

Would it not be more correct to have "+m" to show that the register is both an input and an
output, and nothing more?
Or is set_bit supposed to be a general barrier or fence?

static inline void set_bit(int nr, volatile unsigned long * addr)
{
__asm__ __volatile__( "lock ; "
"btsl %1,%0"
:"+m" (ADDR)
:"Ir" (nr));
}

-Kenny


__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

2006-01-21 01:46:49

by Kenny Simpson

[permalink] [raw]
Subject: Re: set_bit() is broken on i386?

As a follow-up to my previous post... this only works if the ADDR macro is not used:
"+m" (*addr)

What is the purpose of the ADDR macro? (besides calling upon the wrath of the undefined behavior
gremlins?)

-Kenny


__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

2006-01-21 01:46:25

by Andrew Morton

[permalink] [raw]
Subject: Re: set_bit() is broken on i386?

Chuck Ebbert <[email protected]> wrote:
>
> /*
> * setbit.c -- test the Linux set_bit() function
> *
> * Compare the output of this program with and without the
> * -finline-functions option to GCC.
> *
> * If they are not the same, set_bit is broken.
> *
> * Result on i386 with gcc 3.3.2 (Fedora Core 2):
> *
> * [me@d2 t]$ gcc -O2 -o setbit.ex setbit.c ; ./setbit.ex
> * 00010001
> * [me@d2 t]$ gcc -O2 -o setbit.ex -finline-functions setbit.c ; ./setbit.ex
> * 00000001
> */
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
>
> #define inline __attribute__((always_inline))
>
> /*
> * From 2.6.15/include/asm-i386/bitops.h -- needs a memory clobber?
> */
> #define ADDR (*(volatile long *) addr)
> static inline void set_bit(int nr, volatile unsigned long * addr)
> {
> __asm__ __volatile__( "lock ; "
> "btsl %1,%0"
> :"=m" (ADDR)
> :"Ir" (nr));
> }
>
> unsigned long b[2];
>
> int main(int argc, char * const argv[])
> {
> b[1] = 0x1;
> set_bit(12 * sizeof(unsigned long), b);
> printf("%08x\n", b[1]);
> return 0;
> }

Yes, that's a bit of a trap. Seems that any place where the compiler
"knows" the value of b[1], it'll cache it for the printf() push.

It's actually quite rare for the kernel to manipulate an unsigned long as
both a target of the bitops functions and as a plain old unsigned long.

We have smp_mb__before_clear_bit() and smp_mb__after_clear_bit(), which
need to be explicity used in the special case where we're using clear_bit()
in still-rather-poorly-defined circumstances.

But what you have here is misbehaviour on UP, with set_bit(), due to
compiler actions, which is quite unrelated.

So I guess we could add the rule "you need to use barrier() before treating
a bitop-manipulated word as a word" or we can go stick all the barriers
into bitops.h.

<does that>

OK, adding nine "memory"s to i386/bitops.h increases my allnoconfig
vmlinux's text from 774636 bytes to 774928 (292 bytes more).

But the problem is that we've gone and altered the foo_bit() _semantics_ on
x86. So people can now go and write code which will only work properly on
x86 (even more than at present).

It'll be hard to find all the code which treats bitops-targets as scalars.
It would be easier if we'd defined bitops as operating on a bitop_target_t
or whatever, but we didn't do that.

Personally, I'd like to stick the barriers in there and sleep soundly. But
we do need to make architectures all behave the same way (and define that
way clearly), and the impact of the new semantics might be worse on some
other architectures.

2006-01-21 01:50:00

by Andreas Schwab

[permalink] [raw]
Subject: Re: set_bit() is broken on i386?

Trond Myklebust <[email protected]> writes:

> On Fri, 2006-01-20 at 19:53 -0500, Chuck Ebbert wrote:
>
>> #define ADDR (*(volatile long *) addr)
>> static inline void set_bit(int nr, volatile unsigned long * addr)
>> {
>> __asm__ __volatile__( "lock ; "
>> "btsl %1,%0"
>> :"=m" (ADDR)
>> :"Ir" (nr));
>> }
>
> The asm needs a memory clobber in order to avoid reordering with the
> assignment to b[1]:

Check out 2.6.16-rc1, this has already been fixed.

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstra?e 5, 90409 N?rnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2006-01-21 02:01:20

by Grzegorz Kulewski

[permalink] [raw]
Subject: Re: set_bit() is broken on i386?

On Fri, 20 Jan 2006, [email protected] wrote:

> On Fri, 20 Jan 2006 19:53:14 EST, Chuck Ebbert said:
>> /*
>> * setbit.c -- test the Linux set_bit() function
>> *
>> * Compare the output of this program with and without the
>> * -finline-functions option to GCC.
>> *
>> * If they are not the same, set_bit is broken.
>> *
>> * Result on i386 with gcc 3.3.2 (Fedora Core 2):
>> *
>> * [me@d2 t]$ gcc -O2 -o setbit.ex setbit.c ; ./setbit.ex
>> * 00010001
>> * [me@d2 t]$ gcc -O2 -o setbit.ex -finline-functions setbit.c ; ./setbit.ex
>> * 00000001
>
> It certainly seems to be gcc version dependent (and I'd not be surprised if the
> exact combo of -O2, -Os, and -mfoo and -fwhatever flags as well). Trond is
> probably right that a memory clobber will force gcc to DTIT (Do The Intended
> Thing, which may be different from a DTRT) regardless of what gcc's code generator
> decides to do....
>
> % gcc -v
> Using built-in specs.
> Target: i386-redhat-linux
> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --host=i386-redhat-linux
> Thread model: posix
> gcc version 4.1.0 20060117 (Red Hat 4.1.0-0.15)
> % gcc -O2 -o setbit.ex setbit.c ; ./setbit.ex
> 00000001
> % gcc -O2 -o setbit.ex -finline-functions setbit.c ; ./setbit.ex
> 00000001
>
> Fedora Core -devel tree as of this morning (so sort-of FC5 test2).

This is what I am getting under Gentoo with different gcc's:

$ export CC=gcc-3.4.5; $CC --version; rm ./setbit.ex; $CC -O2 -o setbit.ex
setbit.c ; ./setbit.ex; rm ./setbit.ex; $CC -O2 -o setbit.ex
-finline-functions setbit.c ; ./setbit.ex
gcc-3.4.5 (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.

00000001
00000001


$ export CC=gcc-4.0.2; $CC --version; rm ./setbit.ex; $CC -O2 -o setbit.ex
setbit.c ; ./setbit.ex; rm ./setbit.ex; $CC -O2 -o setbit.ex
-finline-functions setbit.c ; ./setbit.ex
gcc-4.0.2 (GCC) 4.0.2 (Gentoo 4.0.2-r3, pie-8.7.8)
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.

00000001
00000001


$ export CC=gcc-4.1.0-beta20060113; $CC --version; rm ./setbit.ex; $CC -O2
-o setbit.ex setbit.c ; ./setbit.ex; rm ./setbit.ex; $CC -O2 -o setbit.ex
-finline-functions setbit.c ; ./setbit.ex
gcc-4.1.0-beta20060113 (GCC) 4.1.0-beta20060113 (prerelease)
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.

00000001
00000001


And on some really old and not upgraded server (with Gentoo too of
course):

$ export CC=gcc32; $CC --version; rm ./setbit.ex; $CC -O2 -o setbit.ex
setbit.c ; ./setbit.ex; rm ./setbit.ex; $CC -O2 -o setbit.ex
-finline-functions setbit.c ; ./setbit.ex
gcc (GCC) 3.3.5-20050130 (Gentoo 3.3.5.20050130-r1, ssp-3.3.5.20050130-1,
pie-8.7.7.1)
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.

00010001
00000001


So either this problem is present only on gcc < 3.4 or Gentoo patched it
somehow in newer versions...


Thanks,

Grzegorz Kulewski

2006-01-21 02:07:04

by Kenny Simpson

[permalink] [raw]
Subject: Re: set_bit() is broken on i386?

Some day I'll learn to wait a few minutes before posting...

Is the issue here because btsl can touch many different bytes in the array, and there is no easy
way to tell gcc that an array is in-out, so "memory" is the best we can do?

-Kenny


__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

2006-01-21 02:39:37

by Andrew Morton

[permalink] [raw]
Subject: Re: set_bit() is broken on i386?

Andreas Schwab <[email protected]> wrote:
>
> Trond Myklebust <[email protected]> writes:
>
> > On Fri, 2006-01-20 at 19:53 -0500, Chuck Ebbert wrote:
> >
> >> #define ADDR (*(volatile long *) addr)
> >> static inline void set_bit(int nr, volatile unsigned long * addr)
> >> {
> >> __asm__ __volatile__( "lock ; "
> >> "btsl %1,%0"
> >> :"=m" (ADDR)
> >> :"Ir" (nr));
> >> }
> >
> > The asm needs a memory clobber in order to avoid reordering with the
> > assignment to b[1]:
>
> Check out 2.6.16-rc1, this has already been fixed.
>

No, that doesn't fix this testcase.

We need to somehow tell the compiler "this assembly statement altered
memory and you can't cache memory contents across it". That's what
"memory" (ie: barrier()) does. I don't think there's a way of telling gcc
_what_ memory was clobbered - just "all of memory".

2006-01-21 07:48:26

by Chuck Ebbert

[permalink] [raw]
Subject: Re: set_bit() is broken on i386?

In-Reply-To: <[email protected]>

On Fri, 20 Jan 2006, Andrew Morton wrote:

> We need to somehow tell the compiler "this assembly statement altered
> memory and you can't cache memory contents across it". That's what
> "memory" (ie: barrier()) does. I don't think there's a way of telling gcc
> _what_ memory was clobbered - just "all of memory".

I think you can do that by specifying an output operand that you
never use in your assembler code, e.g. changing this:

| __asm__ __volatile__( "lock ; "
| "btsl %1,%0"
| :"=m" (ADDR)
| :"Ir" (nr));

to this:

| #define LONGBITS (8 * sizeof(unsigned long))
|
| __asm__ __volatile__( "lock ; "
| "btsl %2,%1"
| :"=m"(*(&ADDR + nr/LONGBITS))
| :"m" (ADDR), "Ir" (nr));

fixes my example program by telling the compiler what memory location
is altered. (Note that %0 is never used inside the asm code.)
So iff 'nr' is a constant you can clobber specific memory locations.
--
Chuck
Currently reading: _Sun Of Suns_ by Karl Schroeder

2006-01-21 19:26:43

by Trond Myklebust

[permalink] [raw]
Subject: Re: set_bit() is broken on i386?

On Fri, 2006-01-20 at 18:38 -0800, Andrew Morton wrote:
> We need to somehow tell the compiler "this assembly statement altered
> memory and you can't cache memory contents across it". That's what
> "memory" (ie: barrier()) does. I don't think there's a way of telling gcc
> _what_ memory was clobbered - just "all of memory".

We _can_ (and do) tell gcc that the unsigned long at address "addr"
will be clobbered. The problem here is that we're actually applying
set_bit() to a bit array that is larger than the single long, so we are
not necessarily clobbering "addr", but rather the long at addr + X.

Most non-386 architectures don't actually have this compiler reordering
problem since they tend to convert the index into the bit array into an
offset for addr + a remainder:

unsigned long *offset = addr + (nr / 8*sizeof(unsigned long));
unsigned long bit = (nr % 8*sizeof(unsigned long));

and then tell the compiler that the long at "offset" will be clobbered.
The remaining architectures appear to already have the general memory
clobber set in their asms (as far as I can see).

IOW: the 386 and x86_64 appear to be the problem cases here, and then
only when applied to large bit arrays.

Cheers
Trond

2006-01-21 20:53:36

by Chuck Ebbert

[permalink] [raw]
Subject: Re: set_bit() is broken on i386?

In-Reply-To: <[email protected]>

(Resent because I forgot the cc: list; Grzegorz, please don't
reply to my original.)

On Sat, 21 Jan 2006, Grzegorz Kulewski wrote:

> > On Fri, 20 Jan 2006 19:53:14 EST, Chuck Ebbert said:
> >> /*
> >> * setbit.c -- test the Linux set_bit() function
> >> *
> >> * Compare the output of this program with and without the
> >> * -finline-functions option to GCC.
> >> *
> >> * If they are not the same, set_bit is broken.

I should have said "If they do not both output '00010001' then
set_bit() is broken."

You got:

> gcc-3.4.5 (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)
> 00000001
> 00000001
>
> gcc-4.0.2 (GCC) 4.0.2 (Gentoo 4.0.2-r3, pie-8.7.8)
> 00000001
> 00000001
>
> gcc-4.1.0-beta20060113 (GCC) 4.1.0-beta20060113 (prerelease)
> 00000001
> 00000001
>
> gcc (GCC) 3.3.5-20050130 (Gentoo 3.3.5.20050130-r1, ssp-3.3.5.20050130-1,

pie-8.7.7.1)
> 00010001
> 00000001

So the macro is broken even without -finline-functions on all but the
last compiler.

I couldn't break this new program, which assumes 'nr' is a constant:

/*
* setbit2.c -- test an alternate Linux set_bit() function
*
* Compare the output of this program with and without the
* -finline-functions option to GCC.
*
* If they do not both contain two ones, set_bit is broken.
*
* Result on i386 with gcc 4.0.2 (Fedora Core 4):
*
* [me@d2 t]$ gcc -O2 -o setbit2.ex setbit2.c ; ./setbit2.ex
* 0x00010001
* [me@d2 t]$ gcc -O2 -o setbit2.ex -finline-functions setbit2.c ;
./setbit2.ex
* 0x00010001
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

#define inline __attribute__((always_inline))
#define LONGBITS (8 * sizeof(unsigned long))

/*
* new set_bit() for testing; assumes nr is a compile-time constant
*/
#define ADDR (*(volatile long *) addr)
static inline void set_bit(int nr, volatile unsigned long * addr)
{
__asm__ __volatile__( "lock ; "
"btsl %2,%1"
:"=m" (*(addr + nr/LONGBITS))
:"m" (*addr),"Ir" (nr),"m" (*(addr + nr/LONGBITS))
);
}

unsigned long a = 1;
unsigned long b[3];

#define WORD 2
int main(int argc, char * const argv[])
{
b[WORD] = a;
set_bit(WORD * LONGBITS + LONGBITS / 2, b);
printf("0x%08x\n", b[WORD]);
return 0;
}

--
Chuck