Hi all...
gcc3 gives this warning when using the __set_64bit_var function:
/usr/src/linux/include/asm/system.h:190: warning: dereferencing type-punned pointer will break strict-aliasing rules
Is it a potential problem?
This seems to cure it:
--- linux-2.4.22-jam1m/include/asm-i386/system.h.orig 2003-08-29 00:26:41.000000000 +0200
+++ linux-2.4.22-jam1m/include/asm-i386/system.h 2003-08-29 00:26:55.000000000 +0200
@@ -181,8 +181,8 @@
 {
 	__set_64bit(ptr,(unsigned int)(value), (unsigned int)((value)>>32ULL));
 }
-#define ll_low(x)	*(((unsigned int*)&(x))+0)
-#define ll_high(x)	*(((unsigned int*)&(x))+1)
+#define ll_low(x)	*(((unsigned int*)(void*)&(x))+0)
+#define ll_high(x)	*(((unsigned int*)(void*)&(x))+1)
 
 static inline void __set_64bit_var (unsigned long long *ptr,
 			unsigned long long value)
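On whether the warning matters: as far as I understand the C aliasing rules, what the compiler objects to is reading an unsigned long long object through an unsigned int lvalue, and routing the cast through void* only hides the warning without changing that access. A minimal sketch of one way to get the two halves without the punned dereference follows. The names split_low and split_high are hypothetical (they are not in system.h), and the code assumes a 32-bit unsigned int and little-endian layout, as on i386.

```c
#include <string.h>

/* Hypothetical helpers, not part of system.h. They copy the bytes out
 * with memcpy instead of dereferencing a punned pointer, so no object
 * is accessed through an incompatible type. Assumes unsigned int is
 * 32 bits and the layout is little-endian, as on i386. */
static inline unsigned int split_low(unsigned long long x)
{
	unsigned int lo;

	memcpy(&lo, &x, sizeof lo);	/* bytes 0..3 */
	return lo;
}

static inline unsigned int split_high(unsigned long long x)
{
	unsigned int hi;

	/* bytes 4..7 */
	memcpy(&hi, (const unsigned char *)&x + sizeof hi, sizeof hi);
	return hi;
}
```

Compilers commonly turn such fixed-size memcpy calls into plain register moves, but that is an optimization, not something the sketch depends on.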
A collateral question: what is the reason for this function?
Are long long assignments not atomic in gcc?
TIA
--
J.A. Magallon <[email protected]> \ Software is like sex:
werewolf.able.es \ It's better when it's free
Mandrake Linux release 9.2 (Cooker) for i586
Linux 2.4.22-jam1m (gcc 3.3.1 (Mandrake Linux 9.2 3.3.1-1mdk))
On Fri, Aug 29, 2003 at 12:35:11AM +0200, J.A. Magallon wrote:
> A collateral question: what is the reason for this function?
> Are long long assignments not atomic in gcc?
On x86, long long int is 64 bits but the chip is 32 bits wide,
so it uses 2 separate memory accesses. There are 64-bit-wide
instructions which do bus-locking so that they are atomic,
but gcc will not use them directly.
(Run "info nasm" and look up "cmpxchg8b" and related friends.)
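To make the "2 separate memory accesses" point concrete, here is a small single-threaded model, offered as a sketch only. torn_store is a hypothetical helper that performs a 64-bit store as two 32-bit writes and returns what a reader could see between them. It assumes little-endian layout, as on x86, so the low half is written first.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: models a 64-bit store done as two separate
 * 32-bit writes, and returns the value visible between the two
 * writes. Assumes little-endian layout (low half at offset 0). */
static uint64_t torn_store(uint64_t *dst, uint64_t value)
{
	uint32_t halves[2];
	uint64_t seen;

	memcpy(halves, &value, sizeof halves);		/* split the new value */
	memcpy(dst, &halves[0], sizeof halves[0]);	/* first 32-bit write */
	seen = *dst;					/* a reader in between */
	memcpy((unsigned char *)dst + 4, &halves[1],
	       sizeof halves[1]);			/* second 32-bit write */
	return seen;
}
```

Starting from 0x1111111122222222 and storing 0xAAAAAAAABBBBBBBB, the value seen in between is 0x11111111BBBBBBBB, which is neither the old nor the new value.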
Greets, Antonio.
On 08.29, Antonio Vargas wrote:
> On Fri, Aug 29, 2003 at 12:35:11AM +0200, J.A. Magallon wrote:
[...]
> >
> > A collateral question: what is the reason for this function?
> > Are long long assignments not atomic in gcc?
>
> On x86, long long int == 64 bits but the chip is 32 bits wide,
> so it uses 2 separate memory accesses. There are 64bit-wide
> instructions which do bus-locking so that they are atomic,
> but gcc will not use them directly.
>
I know; my question was why gcc does not generate cmpxchg8b for
a 64-bit assignment. Or should it not?
On Fri, 29 Aug 2003, J.A. Magallon wrote:
> I know; my question was why gcc does not generate cmpxchg8b for
> a 64-bit assignment. Or should it not?
>
It's not an assignment operator. The fact that you 'could' use
it as one is not relevant. For instance, using XOR you can
exchange the values of two operands. However, you would not
really like a 'C' compiler to do that. Instead, you would
expect it to stash the value in some invisible temporary variable
somewhere, hopefully in a register. If you really want to
swap values using the ^ operator, then you can code it yourself.
Wanna play?
#include <stdio.h>

int main(void)
{
	unsigned int a, b;	/* unsigned, to match the %08x format */

	a = 0xaaaaaaaa;
	b = 0xbbbbbbbb;
	printf("a = %08x b = %08x \n", a, b);
	// Swap
	a ^= b;
	b ^= a;
	a ^= b;
	printf("a = %08x b = %08x \n", a, b);
	return 0;
}
The generated code is awful:
movl -8(%ebp),%edx
xorl %edx,-4(%ebp)
movl -4(%ebp),%edx
xorl %edx,-8(%ebp)
movl -8(%ebp),%edx
xorl %edx,-4(%ebp)
movl -8(%ebp),%eax
gcc doesn't care that some xchg operations are atomic. If there
were an 'atomic_t' type that 'C' (generically) knew about, then
the code-generator might try to find some strange sequence that
would perform 64-bit atomic operations on a 32-bit processor as
a side-effect, which is what an atomic store is for the
compare/exchange-8-bytes opcode.
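For what it's worth, later C standards did add such a type: C11 defines the _Atomic qualifier and <stdatomic.h>. A minimal sketch, assuming a C11 compiler; shared_value, set_shared and get_shared are hypothetical names. How the store is implemented on a 32-bit x86 target is the compiler's choice and depends on target options, and this example does not show or assume any particular instruction sequence.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared variable; _Atomic asks the compiler to make
 * every load and store of it indivisible. */
static _Atomic uint64_t shared_value;

/* Store all 64 bits as one indivisible operation. */
static void set_shared(uint64_t v)
{
	atomic_store(&shared_value, v);
}

/* Load all 64 bits as one indivisible operation. */
static uint64_t get_shared(void)
{
	return atomic_load(&shared_value);
}
```

On some 32-bit targets the compiler may fall back to a helper library for 64-bit atomics, so linking may need an extra library there.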
Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (794.73 BogoMips).
Note 96.31% of all statistics are fiction.
> A collateral question: what is the reason for this function?
> Are long long assignments not atomic in gcc?
Another question: why do we do _double_ store here?
static inline void __set_64bit (unsigned long long * ptr,
		unsigned int low, unsigned int high)
{
	__asm__ __volatile__ (
		"\n1:\t"
		"movl (%0), %%eax\n\t"
		"movl 4(%0), %%edx\n\t"
		"lock cmpxchg8b (%0)\n\t"
		"jnz 1b"
		: /* no outputs */
		:	"D"(ptr),
			"b"(low),
			"c"(high)
		:	"ax","dx","memory");
}
This will execute the expensive locked load-compare-store operation
twice almost always (unless the previous value was already equal
to the value we are about to store).
AFAIK we can safely drop that loop (the jnz instruction).
--
vda
On Fri, Aug 29, 2003 at 03:41:32PM -0400, Richard B. Johnson wrote:
> gcc doesn't care that some xchg operations are atomic. If there
> was an 'atomic_t' type that 'C' (generically) knew about, then
> the code-generator might try to find some strange sequence that
> would perform 64-bit atomic operations on a 32-bit processor as
> a side-effect, which is what it is with the compare/exchange-8-bytes
> opcode.
>
That was my fault for introducing an exchange instruction
into an assignment discussion, but I don't know of any
x86 instruction which can store 64 bits to memory atomically.
Is there any?
Greets, Antonio.
On Sat, Aug 30, 2003 at 08:27:44AM +0200, Antonio Vargas wrote:
> That was my fault for introducing an exchange instruction
> into an assignment discussion, but I don't know of any
> x86 instruction which can store 64 bits to memory atomically.
> Is there any?
Perhaps "pusha", but it will write far more than you need, and I don't know
if it's lockable.
Some MMX instruction might do it too, although I'm not sure.
Willy
Willy Tarreau wrote:
> Perhaps "pusha", but it will write far more than you need, and I don't know
> if it's lockable.
"pusha" does not promise 64 bit writes. It can't be interrupted, but
I see nothing that ensures the multiple 32 bit words are combined into
atomic 64 bit writes as seen by other CPUs or peripherals.
> Some MMX instruction might do it too, although not sure.
Yes, if you want a 64 bit write and don't want to use cmpxchg8b, MMX
will do it.
You can also do it with the floating-point "fistpll" instruction (also
called "fistpq").
-- Jamie
> Another question: why do we do _double_ store here?
>
> static inline void __set_64bit (unsigned long long * ptr,
> unsigned int low, unsigned int high)
> {
> __asm__ __volatile__ (
> "\n1:\t"
> "movl (%0), %%eax\n\t"
> "movl 4(%0), %%edx\n\t"
> "lock cmpxchg8b (%0)\n\t"
> "jnz 1b"
> : /* no outputs */
> : "D"(ptr),
> "b"(low),
> "c"(high)
> : "ax","dx","memory");
> }
>
> This will execute expensive locked load-compare-store operation twice
> almost always (unless previous value was already equal
> to the value we are about to store)
It doesn't double store. cmpxchg8b does:
	compare memory with edx:eax
	if equal, copy ecx:ebx into memory, set zf = 1
	else copy memory into edx:eax, set zf = 0
> AFAIK we can safely drop that loop (jnz instruction)
No. The only possible optimization would be to move the 1: label directly
to the cmpxchg8b. But it won't bring much, because the loop is executed only
if the value was changed after the read and before the cmpxchg.
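The behaviour described above can be written out as a plain C model, as a sketch only. model_cmpxchg8b and model_set_64bit are hypothetical names, and the model is deliberately not atomic: it mirrors the steps of the instruction and of the loop so that the number of attempts can be counted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, non-atomic model of cmpxchg8b, for illustration only.
 * Returns the resulting ZF: true if the store happened. */
static bool model_cmpxchg8b(uint64_t *mem, uint64_t *edx_eax,
			    uint64_t ecx_ebx)
{
	if (*mem == *edx_eax) {
		*mem = ecx_ebx;		/* equal: store new value, zf = 1 */
		return true;
	}
	*edx_eax = *mem;		/* not equal: reload edx:eax, zf = 0 */
	return false;
}

/* Model of the __set_64bit loop; returns how many times the
 * compare-exchange step ran. */
static unsigned int model_set_64bit(uint64_t *ptr, uint64_t value)
{
	unsigned int attempts = 0;
	uint64_t expected;

	do {
		expected = *ptr;	/* movl (%0),%eax ; movl 4(%0),%edx */
		attempts++;
	} while (!model_cmpxchg8b(ptr, &expected, value));	/* jnz 1b */
	return attempts;
}
```

With nobody else touching the memory, the comparison succeeds on the first pass, so the store happens once and the loop body runs once.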
There is another, worse problem: jump instructions are predicted as
taken when they point backwards, so this one gets mispredicted. The jnz
should really point to some other section, linked after .text, that
holds an unconditional jump backwards.
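A sketch of how the same loop might be written in C with gcc builtins, under stated assumptions. set_64bit_hinted is a hypothetical name; the __atomic builtins come from gcc releases newer than the compiler discussed in this thread; and whether the compiler really moves the retry path out of line is up to it. __builtin_expect marks the retry as unlikely, and since a failed compare-exchange already refreshes the expected value, the loop does not re-read memory itself.

```c
#include <stdint.h>

#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical C rendering of the set-64-bit loop. */
static void set_64bit_hinted(uint64_t *ptr, uint64_t value)
{
	/* Initial guess at the current contents; a stale value is
	 * harmless because the compare-exchange validates it. */
	uint64_t expected = __atomic_load_n(ptr, __ATOMIC_RELAXED);

	while (unlikely(!__atomic_compare_exchange_n(ptr, &expected, value,
						     0 /* strong */,
						     __ATOMIC_SEQ_CST,
						     __ATOMIC_RELAXED)))
		;	/* expected now holds the fresh value; retry */
}
```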
Mikulas
On Sat, Aug 30, 2003 at 02:33:08PM +0200, Mikulas Patocka wrote:
> It doesn't double store. cmpxchg8b does:
> compare memory with edx:eax
> if equal, copy ecx:ebx into memory, set zf = 1
> else copy memory into edx:eax, set zf = 0
>
> > AFAIK we can safely drop that loop (jnz instruction)
>
> No. The only possible optimization would be to move the 1: label directly
> to the cmpxchg8b. But it won't bring much, because the loop is executed only
> if the value was changed after the read and before the cmpxchg.
Indeed.
>
> There is another worse problem --- jump instructions are predicted as
> taken when they point backwards, so it gets mispredicted. jnz should
> really point to some other section, that is linked after .text, where
> unconditional jump backwards would be.
On Pentium IV, you can prefix the jump instruction to alter the default
prediction (the encoding reuses one of the segment override prefixes).
Regards,
Gabriel
On 08.29, J.A. Magallon wrote:
> -#define ll_low(x) *(((unsigned int*)&(x))+0)
> -#define ll_high(x) *(((unsigned int*)&(x))+1)
> +#define ll_low(x) *(((unsigned int*)(void*)&(x))+0)
> +#define ll_high(x) *(((unsigned int*)(void*)&(x))+1)
>
How about something like this:
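typedef unsigned long long u64;
typedef unsigned int u32;	/* 32 bits on i386; unsigned long is not 32 bits everywhere */
typedef unsigned short u16;
typedef unsigned char u8;
union u64_split {
	u64 ll;
	u32 l[2];
	u16 w[4];
	u8 b[8];
};
void f(u64 a)
{
	/* The cast to a union type is a gcc extension. */
	u32 l = ((union u64_split)a).l[0];	/* low half on little-endian */
	u32 h = ((union u64_split)a).l[1];	/* high half on little-endian */
}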
The clean way to do it, without pointer arithmetic...
And with a modern gcc (anonymous structs):
union u64_split {
	u64 x;
	struct {
		u32 l32,h32;
	};
	struct {
		u16 ll16,lh16,hl16,hh16;
	};
};
So you just write:
int l = ((union u64_split)a).l32;
u16 word = ((union u64_split)a).ll16;
I think it is better with structs, to take care of endianness if needed.
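If endianness is the worry, shifts sidestep it altogether, since they operate on the value rather than on its memory layout. A small sketch; u64_low32 and u64_high32 are hypothetical names, and the typedefs assume that unsigned int is 32 bits wide.

```c
typedef unsigned long long u64;
typedef unsigned int u32;	/* assumed to be 32 bits wide */

/* Low 32 bits: the conversion to u32 keeps the value modulo 2^32. */
static inline u32 u64_low32(u64 x)
{
	return (u32)x;
}

/* High 32 bits: shift them down first, then truncate. */
static inline u32 u64_high32(u64 x)
{
	return (u32)(x >> 32);
}
```

This is the same cast-and-shift form as the __set_64bit(ptr, (unsigned int)(value), (unsigned int)((value)>>32ULL)) call in the context lines of the original patch.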
> It doesn't double store. cmpxchg8b does:
> compare memory with edx:eax
> if equal, copy ecx:ebx into memory, set zf = 1
> else copy memory into edx:eax, set zf = 0
>
> > AFAIK we can safely drop that loop (jnz instruction)
>
> No. The only possible optimization would be to move the 1: label directly
> to the cmpxchg8b. But it won't bring much, because the loop is executed only
> if the value was changed after the read and before the cmpxchg.
You are right, I misremembered how cmpxchg8b works.
--
vda