2005-10-29 21:56:16

by Alexandre Oliva

[permalink] [raw]
Subject: amd64 bitops fix for -Os

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=171672

This patch fixes a bug that comes up when compiling the kernel for
x86_64 optimizing for size. It affects 2.6.14 for sure, but I'm
pretty sure many earlier kernels are affected as well.

The symptom is that, as soon as some change is made to the root
filesystem (e.g. dmesg > /var/log/dmesg), the kernel mostly hangs. It
was not the first time I'd run into this symptom, but this time I
could track the problem down to enabling size optimizations in the
kernel build. It took some time to narrow down the culprit source
with a binary search, compiling part of the kernel sources with -Os
and part with -O2, but eventually it became clear that bitops itself
was to blame, something the soft lockup oops should have told me from
the start.

The problem is that find_first_zero_bit() fails when called with an
underflown size, because its inline asm assumes at least one iteration
of scasq will run. When this does not hold, the conditional branch
that follows it uses flags from instructions prior to the asm
statement.
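
To make the failure mode concrete, here is a minimal stand-alone
sketch of the fragile pattern (hand-written for illustration, not the
kernel code itself): with a zero word count in %rcx, repe; scasq
executes no iterations and leaves the flags untouched, so the je that
follows tests whatever the compiler's preceding code happened to
leave in them.

  /* sketch.c -- illustrative only; gcc -O2, x86-64 */
  static long hit_non_ones(const unsigned long *addr, unsigned long words)
  {
          long found = 0;
          asm volatile(
                  "repe; scasq\n\t"       /* zero iterations if words == 0 */
                  "je 1f\n\t"             /* ...then this tests stale flags */
                  "movq $1,%0\n"
                  "1:"
                  : "+r" (found), "+c" (words), "+D" (addr)
                  : "a" (~0UL)            /* scan while words are all-ones */
                  : "cc", "memory");
          return found;
  }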

When optimizing for speed, the generated code happens to leave the
flags with the correct value, because the right shift of the size sets
them as a side effect and nothing clobbers them before the asm
statement. When optimizing for size, however, the mov instruction
used to initialize %rax with -1 is replaced with a smaller "or"
instruction, which modifies the flags and thus breaks the
zero-trip-count case.
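
Roughly, the difference looks like this (hand-reconstructed to
illustrate the point; actual compiler output differs in detail):

  # -O2: mov does not touch the flags, so the ZF produced by the
  # earlier shift of the size survives into the asm statement:
          movq    $-1, %rax
          repe; scasq
          je      1f

  # -Os: gcc saves a few bytes with or (sign-extended imm8), which
  # sets the flags and destroys the value the je depended on:
          orq     $-1, %rax
          repe; scasq
          je      1f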

Obviously the asm statement must not rely on the compiler setting up
the flags by chance, so we have to either force the flags to be set
properly or make sure we run scasq at least once. In the
find_first_zero_bit case, this comes at pretty much no cost, since we
already test size for non-zero; we just used to do that before
adjusting it from bits to words, and testing afterwards instead
should have no visible effect on performance.
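
The fix thus boils down to converting before testing, as in the patch
below; an underflown size in the range -63..-1, viewed as unsigned,
wraps around to 0..62 when 63 is added, so the shift yields 0 and we
return before the asm ever runs:

  size += 63;             /* -63..-1 wraps around to 0..62 here... */
  size >>= 6;             /* ...and becomes 0 words here */
  if (!size)
          return 0;       /* never reach the asm with %rcx == 0 */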

As for find_first_bit, it's quite likely that the same bug is present
when it's called by find_next_bit under the same conditions, but
find_first_bit doesn't even test size for zero. AFAICT, it has just
been luckier, so I went ahead and added the same guard code to it.
This unfortunately adds a test to the fast path, but I don't see how
to avoid that without auditing all callers.

I've also introduced means to guard against these cases in the public
wrappers, but the BUG_ONs are disabled by default. I've left a kernel
running with them enabled for a while, and they never hit, which is a
good sign, but I haven't tested them thoroughly or anything. We could
probably do away with these new tests by modifying the find_next*bit
functions so as to not call the find_first*bit functions when they've
already exhausted the range implied by the size argument. I'm not
sure whether that's worth doing, though, so I didn't.
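
For the record, the BUG_ON condition is the usual unsigned wraparound
test; a tiny stand-alone check (illustrative only):

  #include <assert.h>

  int main(void)
  {
          unsigned long ok = 1024;        /* sane size in bits */
          unsigned long bad = -5UL;       /* underflown size */

          assert(!(ok + 63 < ok));        /* no wraparound */
          assert(bad + 63 < bad);         /* wraps past ULONG_MAX */
          return 0;
  }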

While staring at the code and trying to figure out what the problem
was, I removed some needless casts from find_next_zero_bit, by
constifying the automatic pointer properly, and also moved the actual
code from find_first_zero_bit to a separate internal function, such
that we could add the bug-check to the public interface only.

I also noticed find_first_zero_bit was less efficient than
find_first_bit in that the former saved and restored rbx, because GCC
chose that to hold (addr) within the asm statement, instead of using
the readily-available and caller-saved rsi. I've thus changed the
code to prefer rsi, although in a perfect world the compiler would be
able to figure that out by itself.
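
The change is just a constraint tweak: "S" pins the operand to %rsi,
which the x86-64 ABI makes caller-saved, so GCC no longer has to save
and restore %rbx. A toy example of the same idiom (hypothetical, not
from the patch):

  static unsigned long load_first(const unsigned long *p)
  {
          unsigned long v;
          /* "S" forces p into %rsi rather than letting GCC pick a
             callee-saved register such as %rbx. */
          asm("movq (%%rsi),%0" : "=r" (v) : "S" (p) : "memory");
          return v;
  }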

The compiler could do a bit better in find_first_zero_bit: if the
initial size turns out to be zero, it could return immediately, like
it does in find_first_bit; instead, it sets rdx to zero and jumps to
the end of the function, where rdx is copied to rax before the return
statement. This is a negative effect of assigning variable res to rdx
instead of rax, which gets the register allocator to map the pseudo
register representing the return value to rdx, requiring a copy at
the end and preventing (as far as the dumb compiler can see :-) the
direct use of a return in the zero-size case. I've verified that this
is not caused by the additional inline function that I introduced.
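
In other words, the zero-size path currently comes out something like
this (hand-sketched, not actual compiler output):

          xorl    %edx, %edx      # res = 0 in the return-value pseudo
          jmp     1f              # jump to the common epilogue
          ...
  1:      movq    %rdx, %rax      # copy res into the return register
          ret

when a direct "xorl %eax, %eax; ret" would do.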

I tried to change the use of registers so as to enable the better code
for this path, but I couldn't come up with anything that was as
efficient, so I figured I wouldn't try to optimize the exceptional
path at the expense of the common fast path and left it alone. If
anyone can come up with something better, please go ahead.


Anyhow, with this patch I could run 2.6.14 as in the Fedora
development tree, with the only change being the switch to optimizing
for size.

Signed-off-by: Alexandre Oliva <[email protected]>

--- arch/x86_64/lib/bitops.c~ 2005-10-27 22:02:08.000000000 -0200
+++ arch/x86_64/lib/bitops.c 2005-10-29 18:24:27.000000000 -0200
@@ -1,5 +1,11 @@
#include <linux/bitops.h>

+#define BITOPS_CHECK_UNDERFLOW_RANGE 0
+
+#if BITOPS_CHECK_UNDERFLOW_RANGE
+# include <linux/kernel.h>
+#endif
+
#undef find_first_zero_bit
#undef find_next_zero_bit
#undef find_first_bit
@@ -13,11 +19,21 @@
* Returns the bit-number of the first zero bit, not the number of the byte
* containing a bit.
*/
-inline long find_first_zero_bit(const unsigned long * addr, unsigned long size)
+static inline long
+__find_first_zero_bit(const unsigned long * addr, unsigned long size)
{
long d0, d1, d2;
long res;

+ /* We must test the size in words, not in bits, because
+ otherwise incoming sizes in the range -63..-1 will not run
+ any scasq instructions, and then the flags used by the je
+ instruction will have whatever random value was in place
+ before. Nobody should call us like that, but
+ find_next_zero_bit() does when offset and size are at the
+ same word and it fails to find a zero itself. */
+ size += 63;
+ size >>= 6;
if (!size)
return 0;
asm volatile(
@@ -30,11 +46,22 @@
" shlq $3,%%rdi\n"
" addq %%rdi,%%rdx"
:"=d" (res), "=&c" (d0), "=&D" (d1), "=&a" (d2)
- :"0" (0ULL), "1" ((size + 63) >> 6), "2" (addr), "3" (-1ULL),
- [addr] "r" (addr) : "memory");
+ :"0" (0ULL), "1" (size), "2" (addr), "3" (-1ULL),
+ /* Any register here would do, but GCC tends to
+ prefer rbx over rsi, even though rsi is readily
+ available and doesn't have to be saved. */
+ [addr] "S" (addr) : "memory");
return res;
}

+long find_first_zero_bit(const unsigned long * addr, unsigned long size)
+{
+#if BITOPS_CHECK_UNDERFLOW_RANGE
+ BUG_ON (size + 63 < size);
+#endif
+ return __find_first_zero_bit (addr, size);
+}
+
/**
* find_next_zero_bit - find the first zero bit in a memory region
* @addr: The address to base the search on
@@ -43,7 +70,7 @@
*/
long find_next_zero_bit (const unsigned long * addr, long size, long offset)
{
- unsigned long * p = ((unsigned long *) addr) + (offset >> 6);
+ const unsigned long * p = addr + (offset >> 6);
unsigned long set = 0;
unsigned long res, bit = offset&63;

@@ -63,8 +90,8 @@
/*
* No zero yet, search remaining full words for a zero
*/
- res = find_first_zero_bit ((const unsigned long *)p,
- size - 64 * (p - (unsigned long *) addr));
+ res = __find_first_zero_bit (p, size - 64 * (p - addr));
+
return (offset + set + res);
}

@@ -74,6 +101,17 @@
long d0, d1;
long res;

+ /* We must test the size in words, not in bits, because
+ otherwise incoming sizes in the range -63..-1 will not run
+ any scasq instructions, and then the flags used by the jz
+ instruction will have whatever random value was in place
+ before. Nobody should call us like that, but
+ find_next_bit() does when offset and size are at the same
+ word and it fails to find a one itself. */
+ size += 63;
+ size >>= 6;
+ if (!size)
+ return 0;
asm volatile(
" repe; scasq\n"
" jz 1f\n"
@@ -83,8 +121,7 @@
" shlq $3,%%rdi\n"
" addq %%rdi,%%rax"
:"=a" (res), "=&c" (d0), "=&D" (d1)
- :"0" (0ULL),
- "1" ((size + 63) >> 6), "2" (addr),
+ :"0" (0ULL), "1" (size), "2" (addr),
[addr] "r" (addr) : "memory");
return res;
}
@@ -99,6 +136,9 @@
*/
long find_first_bit(const unsigned long * addr, unsigned long size)
{
+#if BITOPS_CHECK_UNDERFLOW_RANGE
+ BUG_ON (size + 63 < size);
+#endif
return __find_first_bit(addr,size);
}



--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}


2005-10-29 23:41:52

by Randy Dunlap

[permalink] [raw]
Subject: Re: amd64 bitops fix for -Os

On Sat, 29 Oct 2005 19:56:02 -0200 Alexandre Oliva wrote:

> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=171672
>
> This patch fixes a bug that comes up when compiling the kernel for
> x86_64 optimizing for size. It affects 2.6.14 for sure, but I'm
> pretty sure many earlier kernels are affected as well.

[snip]

Thanks. I'll test this early next week.
I always use -Os and (on x86_64) I always get a SOFT LOCKUP
during boot or when loading X. For me, it's always in ext3fs,
doing some flavor of bitops, so I have high hopes for this fixing
that problem.


---
~Randy

2005-10-30 10:56:42

by Alexandre Oliva

[permalink] [raw]
Subject: Re: amd64 bitops fix for -Os

On Oct 29, 2005, Alexandre Oliva <[email protected]> wrote:

> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=171672

> In the find_first_zero_bit case, this comes at pretty much no cost,
> since we already test size for non-zero; we just used to do that
> before adjusting it from bits to words, and testing afterwards
> instead should have no visible effect on performance.

I forgot to mention that this approach to fix the problem was
suggested by Al Viro. Sorry, Al.

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}

2005-10-30 19:32:20

by Pavel Machek

[permalink] [raw]
Subject: Re: amd64 bitops fix for -Os

Hi!

> This patch fixes a bug that comes up when compiling the kernel for
> x86_64 optimizing for size. It affects 2.6.14 for sure, but I'm
> pretty sure many earlier kernels are affected as well.

Is the same problem in i386, too?

> --- arch/x86_64/lib/bitops.c~ 2005-10-27 22:02:08.000000000 -0200
> +++ arch/x86_64/lib/bitops.c 2005-10-29 18:24:27.000000000 -0200
> @@ -1,5 +1,11 @@
> #include <linux/bitops.h>
>
> +#define BITOPS_CHECK_UNDERFLOW_RANGE 0
> +
> +#if BITOPS_CHECK_UNDERFLOW_RANGE
> +# include <linux/kernel.h>
> +#endif

Could you drop the ifdefs? Nice for debugging but we don't
want them in mainline.

Plus you want to add signed-off-by: header and send it to [email protected].
Pavel

--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2005-10-31 17:22:43

by Randy Dunlap

[permalink] [raw]
Subject: Re: amd64 bitops fix for -Os

Alexandre Oliva wrote:
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=171672
>
> This patch fixes a bug that comes up when compiling the kernel for
> x86_64 optimizing for size. It affects 2.6.14 for sure, but I'm
> pretty sure many earlier kernels are affected as well.


[snip]


> Anyhow, with this patch I could run 2.6.14 as in the Fedora
> development tree, with the only change being the switch to
> optimizing for size.

Yes, works for me too.

> Signed-off-by: Alexandre Oliva <[email protected]>

Possibly Andrew or Andi have already merged this into their trees.
However, I have a few comments on the patch re Linux style.
They are meant to help inform you and others -- that's all.

> --- arch/x86_64/lib/bitops.c~ 2005-10-27 22:02:08.000000000 -0200
> +++ arch/x86_64/lib/bitops.c 2005-10-29 18:24:27.000000000 -0200

Diffs should start with a top-level name (even if it's entirely
phony), so that they can be applied with the many scripts that are
around and expect that.
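
I.e., something like:

  --- a/arch/x86_64/lib/bitops.c
  +++ b/arch/x86_64/lib/bitops.c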

> @@ -1,5 +1,11 @@
> #include <linux/bitops.h>
>
> +#define BITOPS_CHECK_UNDERFLOW_RANGE 0
> +
> +#if BITOPS_CHECK_UNDERFLOW_RANGE
> +# include <linux/kernel.h>
> +#endif

We don't usually indent inside if/endif.

> @@ -13,11 +19,21 @@
> * Returns the bit-number of the first zero bit, not the number of the byte
> * containing a bit.
> */
> -inline long find_first_zero_bit(const unsigned long * addr, unsigned long size)
> +static inline long
> +__find_first_zero_bit(const unsigned long * addr, unsigned long size)

The only good reason for splitting a function header is if it would
otherwise be > 80 columns, not just to put the function name at the
beginning of the line.

> {
> long d0, d1, d2;
> long res;
>
> + /* We must test the size in words, not in bits, because
> + otherwise incoming sizes in the range -63..-1 will not run
> + any scasq instructions, and then the flags used by the je
> + instruction will have whatever random value was in place
> + before. Nobody should call us like that, but
> + find_next_zero_bit() does when offset and size are at the
> + same word and it fails to find a zero itself. */

Linux long-comment style is:
/*
* line1 words ....
* line2
* line3
*/

> @@ -30,11 +46,22 @@
> " shlq $3,%%rdi\n"
> " addq %%rdi,%%rdx"
> :"=d" (res), "=&c" (d0), "=&D" (d1), "=&a" (d2)
> - :"0" (0ULL), "1" ((size + 63) >> 6), "2" (addr), "3" (-1ULL),
> - [addr] "r" (addr) : "memory");
> + :"0" (0ULL), "1" (size), "2" (addr), "3" (-1ULL),
> + /* Any register here would do, but GCC tends to
> + prefer rbx over rsi, even though rsi is readily
> + available and doesn't have to be saved. */
> + [addr] "S" (addr) : "memory");

A comment in the middle of a difficult-to-read asm statement is
undesirable (IMO).

> return res;
> }
>

> @@ -74,6 +101,17 @@
> long d0, d1;
> long res;
>
> + /* We must test the size in words, not in bits, because
> + otherwise incoming sizes in the range -63..-1 will not run
> + any scasq instructions, and then the flags used by the jz
> + instruction will have whatever random value was in place
> + before. Nobody should call us like that, but
> + find_next_bit() does when offset and size are at the same
> + word and it fails to find a one itself. */

Comment style again.

But I'm happy that my kernel now boots. I didn't realize that it was
a -Os issue until your email. Thanks.

--
~Randy

2005-10-31 20:09:33

by Alexandre Oliva

[permalink] [raw]
Subject: Re: amd64 bitops fix for -Os

On Oct 30, 2005, Pavel Machek <[email protected]> wrote:

>> This patches fixes a bug that comes up when compiling the kernel for
>> x86_64 optimizing for size. It affects 2.6.14 for sure, but I'm
>> pretty sure many earlier kernels are affected as well.

> Is the same problem in i386, too?

No, it doesn't use custom versions of find_first*bit.

>> --- arch/x86_64/lib/bitops.c~ 2005-10-27 22:02:08.000000000 -0200
>> +++ arch/x86_64/lib/bitops.c 2005-10-29 18:24:27.000000000 -0200
>> @@ -1,5 +1,11 @@
>> #include <linux/bitops.h>
>>
>> +#define BITOPS_CHECK_UNDERFLOW_RANGE 0
>> +
>> +#if BITOPS_CHECK_UNDERFLOW_RANGE
>> +# include <linux/kernel.h>
>> +#endif

> Could you drop the ifdefs? Nice for debugging but we don't
> want them in mainline.

Are you absolutely sure we don't? I'd almost left them in, enabled
unconditionally. Note that the ifdef will make no difference
whatsoever for the case I've just fixed, but it would help catch any
other (ab)uses(?) elsewhere that may have gone unnoticed until now.

> Plus you want to add signed-off-by: header and send it to [email protected].
> Pavel

Err... The header was right there. Or do you mean as an e-mail
header, as opposed to a regular line in the e-mail body?

I'll forward the patch to ak.

--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}

2005-10-31 20:30:09

by Alexandre Oliva

[permalink] [raw]
Subject: Re: amd64 bitops fix for -Os


--
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}


Attachments:
amd64-bitops.patch (3.72 kB)

2005-10-31 20:59:06

by Randy Dunlap

[permalink] [raw]
Subject: Re: amd64 bitops fix for -Os

On Mon, 31 Oct 2005, Alexandre Oliva wrote:

> On Oct 31, 2005, Randy Dunlap <[email protected]> wrote:
>
> >> Signed-off-by: Alexandre Oliva <[email protected]>
>
> > Possibly Andrew or Andi have already merged this into their trees.
> > However, I have a few comments on the patch re Linux style.
> > They are meant to help inform you and others -- that's all.
>
> Thanks, I didn't realize I'd deviated from the recommended style. In
> this updated version of the patch, I've removed the ifdefs that could
> sanity-check arguments to the exported entry points and adjusted the
> comments to follow the guidelines.
>
> >> --- arch/x86_64/lib/bitops.c~ 2005-10-27 22:02:08.000000000 -0200
> >> +++ arch/x86_64/lib/bitops.c 2005-10-29 18:24:27.000000000 -0200
>
> > Diffs should start with a top-level names (even if it's entirely
> > phony), so that they can be applied with many scripts that are around
> > and expect that.
>
> I hope you mean -p1 vs -p0. I tend to prefer -p0 myself, but quilt
> makes it easy enough to handle either :-) Fixed in the revised
> version.

Yes, that's all that I meant.

> >> -inline long find_first_zero_bit(const unsigned long * addr, unsigned long size)
> >> +static inline long
> >> +__find_first_zero_bit(const unsigned long * addr, unsigned long size)
>
> > The only good reason for splitting a function header is if it would
> > otherwise be > 80 columns, not just to put the function name at the
> > beginning of the line.
>
> In this case, it would, because I'm adding static and two leading
> underscores. But I'll keep that in mind, since this is quite
> different from the GCC style.
>
> >> + /* Any register here would do, but GCC tends to
> >> + prefer rbx over rsi, even though rsi is readily
> >> + available and doesn't have to be saved. */
> >> + [addr] "S" (addr) : "memory");
>
> > A comment in the middle of a difficult-to-read asm statement is
> > undesirable (IMO).
>
> I hope it is as useful after the statement. Moving it before it would
> move it too far apart from what it refers to IMHO.
>
> Thanks again for the style feedback, it's really appreciated.

and thanks for the changes too.

> Should I have retained the problem description in the patch file, or
> is the Signed-off-by: line enough?

There should be a short problem description in the patch
(before the SOB: line).

See Documentation/SubmittingPatches
or <http://linux.yyz.us/patch-format.html>
or <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt> .

--
~Randy

2005-10-31 21:14:42

by Pavel Machek

[permalink] [raw]
Subject: Re: amd64 bitops fix for -Os

Hi!

> >> --- arch/x86_64/lib/bitops.c~ 2005-10-27 22:02:08.000000000 -0200
> >> +++ arch/x86_64/lib/bitops.c 2005-10-29 18:24:27.000000000 -0200
> >> @@ -1,5 +1,11 @@
> >> #include <linux/bitops.h>
> >>
> >> +#define BITOPS_CHECK_UNDERFLOW_RANGE 0
> >> +
> >> +#if BITOPS_CHECK_UNDERFLOW_RANGE
> >> +# include <linux/kernel.h>
> >> +#endif
>
> > Could you drop the ifdefs? Nice for debugging but we don't
> > want them in mainline.
>
> Are you absolutely sure we don't? I'd almost left them in, enabled
> unconditionally. Note that the ifdef will make no difference
> whatsoever for the case I've just fixed, but it would help catch any
> other (ab)uses(?) elsewhere that may have gone unnoticed until now.

Well, ifdefs need to be gone. Maybe you should unconditionally check
it (I do not think so, but...) but ifdefs are certainly wrong.

> > Plus you want to add signed-off-by: header and send it to [email protected].
>
> Err... The header was right there. Or do you mean as an e-mail
> header, as opposed to a regular line in the e-mail body?

Okay, so I'm blind. Sorry. (Yes, it should be in body).

> I'll forward the patch to ak.

Thanks.
Pavel
--
Thanks, Sharp!

2005-11-03 18:57:09

by Andi Kleen

[permalink] [raw]
Subject: Re: amd64 bitops fix for -Os

On Monday 31 October 2005 18:24, Randy Dunlap wrote:

> Possibly Andrew or Andi have already merged this into their trees.
> However, I have a few comments on the patch re Linux style.
> They are meant to help inform you and others -- that's all.

I have it in my queue, modulo the ugly BUG_ONs.

-Andi