2001-12-17 12:46:10

by Jan-Benedict Glaw

[permalink] [raw]
Subject: xchg and GCC's optimisation:-(

Hi!

I recently posted a patch against 2.2.19 which eliminates most
printk()s by (in short words):

#define printk(format, arg...) do {} while(0)

However, I got trouble using the floppy driver, because
./kernel/dma.c:free_dma() seems to get miscompiled:

void free_dma(unsigned int dmanr)
{
if (dmanr >= MAX_DMA_CHANNELS) {
printk("Trying to free DMA%d\n", dmanr);
return;
}

if (xchg(&dma_chan_busy[dmanr].lock, 0) == 0) {
/* ERROR */ printk("Trying to free free DMA%d\n", dmanr);
return;
}

} /* free_dma */

Including a real_printk() at the line marked with ERROR will
result in:
00000088 <free_dma>:
88: 83 ec 0c sub $0xc,%esp
8b: 8b 54 24 10 mov 0x10(%esp,1),%edx
8f: 83 fa 07 cmp $0x7,%edx
92: 77 1e ja b2 <free_dma+0x2a>
94: 31 c0 xor %eax,%eax
96: 87 04 d5 00 00 00 00 xchg %eax,0x0(,%edx,8)
9d: 85 c0 test %eax,%eax
9f: 75 11 jne b2 <free_dma+0x2a>
a1: 83 c4 f8 add $0xfffffff8,%esp
a4: 52 push %edx
a5: 68 11 00 00 00 push $0x11
aa: e8 fc ff ff ff call ab <free_dma+0x23>
af: 83 c4 10 add $0x10,%esp
b2: 83 c4 0c add $0xc,%esp
b5: c3 ret
b6: 8d 76 00 lea 0x0(%esi),%esi
b9: 8d bc 27 00 00 00 00 lea 0x0(%edi,1),%edi

...which is fine and contains the needed xchg call. However,
substituting the printk() with "do {} while (0)" above,
the "if" path seems to be completely removed by the optimizer:

00000088 <free_dma>:
88: c3 ret
89: 8d b4 26 00 00 00 00 lea 0x0(%esi,1),%esi


I've looked at ./include/asm-i386/system.h which does some black
magic with it, and I don't really understand that. However, the
result is that the xchg gets optimized away, rendering at least
the floppy module unuseable:-(

MfG, JBG


2001-12-17 13:20:13

by Momchil Velikov

[permalink] [raw]
Subject: Re: xchg and GCC's optimisation:-(

>>>>> "Jan-Benedict" == Jan-Benedict Glaw <[email protected]> writes:
Jan-Benedict> I've looked at ./include/asm-i386/system.h which does some black
Jan-Benedict> magic with it, and I don't really understand that. However, the
Jan-Benedict> result is that the xchg gets optimized away, rendering at least

Can you try with this patch ...

--- system.h.orig.0 Mon Dec 17 15:03:38 2001
+++ system.h Mon Dec 17 15:16:58 2001
@@ -205,21 +205,15 @@ static inline unsigned long __xchg(unsig
switch (size) {
case 1:
__asm__ __volatile__("xchgb %b0,%1"
- :"=q" (x)
- :"m" (*__xg(ptr)), "0" (x)
- :"memory");
+ :"+q" (x),"=m" (*__xg(ptr)));
break;
case 2:
__asm__ __volatile__("xchgw %w0,%1"
- :"=r" (x)
- :"m" (*__xg(ptr)), "0" (x)
- :"memory");
+ :"+r" (x),"=m" (*__xg(ptr)));
break;
case 4:
__asm__ __volatile__("xchgl %0,%1"
- :"=r" (x)
- :"m" (*__xg(ptr)), "0" (x)
- :"memory");
+ :"+r" (x), "=m" (*__xg(ptr)));
break;
}
return x;

2001-12-17 13:36:30

by Denis Vlasenko

[permalink] [raw]
Subject: Re: xchg and GCC's optimisation:-(

On Monday 17 December 2001 10:45, Jan-Benedict Glaw wrote:

> void free_dma(unsigned int dmanr)
> {
> if (dmanr >= MAX_DMA_CHANNELS) {
> printk("Trying to free DMA%d\n", dmanr);
> return;
> }
>
> if (xchg(&dma_chan_busy[dmanr].lock, 0) == 0) {
> /* ERROR */ printk("Trying to free free DMA%d\n", dmanr);
> return;
> }
>
> } /* free_dma */
>
> Including a real_printk() at the line marked with ERROR will
> result in:
[snip]
> ...which is fine and contains the needed xchg call. However,
> substituting the printk() with "do {} while (0)" above,
> the "if" path seems to be completely removed by the optimizer:
>
> 00000088 <free_dma>:
> 88: c3 ret
> 89: 8d b4 26 00 00 00 00 lea 0x0(%esi,1),%esi
>
>
> I've looked at ./include/asm-i386/system.h which does some black
> magic with it, and I don't really understand that. However, the
> result is that the xchg gets optimized away, rendering at least
> the floppy module unuseable:-(

There is a comment that asm is not 100% valid.
My GCC 3.0.1 does not produce buggy code, guess why?
It does _not_ inline __xchg() even at -O99!

So much of compiler improvement 8-(
--
vda

2001-12-17 13:56:38

by Jan-Benedict Glaw

[permalink] [raw]
Subject: Re: xchg and GCC's optimisation:-(

On Mon, Dec 17, 2001 at 03:18:45PM +0200, Momchil Velikov wrote:
> >>>>> "Jan-Benedict" == Jan-Benedict Glaw <[email protected]> writes:
> Jan-Benedict> I've looked at ./include/asm-i386/system.h which does some black
> Jan-Benedict> magic with it, and I don't really understand that. However, the
> Jan-Benedict> result is that the xchg gets optimized away, rendering at least
>
> Can you try with this patch ...
>
> --- system.h.orig.0 Mon Dec 17 15:03:38 2001
> +++ system.h Mon Dec 17 15:16:58 2001
> @@ -205,21 +205,15 @@ static inline unsigned long __xchg(unsig
> switch (size) {
> case 1:
> __asm__ __volatile__("xchgb %b0,%1"
> - :"=q" (x)
> - :"m" (*__xg(ptr)), "0" (x)
> - :"memory");
> + :"+q" (x),"=m" (*__xg(ptr)));
[...]

The patch fixes the problem. Please also send it for inclusion
in 2.2.x, 2.4.x and 2.5.x, because these kernel version will behave
exactly the same (system.h looks very similar between all these
versions...)

MfG, JBG