2007-06-15 16:57:00

by Vegard Nossum

[permalink] [raw]
Subject: [PATCH] Optimize is_power_of_2().

From: Vegard Nossum <[email protected]>
Date: Fri, 15 Jun 2007 18:35:49 +0200
Subject: [PATCH] Optimize is_power_of_2().

Rationale: Removes one conditional branch and reduces icache footprint.
Proof: If n is false, the product of n and any value is false. If n is
true, the truth of (n * x) is the truth of x.

Signed-off-by: Vegard Nossum <[email protected]>
---
include/linux/log2.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/log2.h b/include/linux/log2.h
index 1b8a2c1..56196b1 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -51,7 +51,7 @@ int __ilog2_u64(u64 n)
static inline __attribute__((const))
bool is_power_of_2(unsigned long n)
{
- return (n != 0 && ((n & (n - 1)) == 0));
+ return n * !(n & (n - 1));
}

/*
--
1.5.0.6


2007-06-15 18:21:30

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] Optimize is_power_of_2().

Vegard Nossum wrote:
> From: Vegard Nossum <[email protected]>
> Date: Fri, 15 Jun 2007 18:35:49 +0200
> Subject: [PATCH] Optimize is_power_of_2().
>
> Rationale: Removes one conditional branch and reduces icache footprint.
> Proof: If n is false, the product of n and any value is false. If n is
> true, the truth of (n * x) is the truth of x.
>

You realize that on a lot of platforms, multiplication is done by an
out-of-line subroutine, and that even on those that aren't, it's
generally a long-pipeline operation, right?

-hpa

2007-06-15 18:30:24

by Robert P. J. Day

[permalink] [raw]
Subject: Re: [PATCH] Optimize is_power_of_2().

On Fri, 15 Jun 2007, H. Peter Anvin wrote:

> Vegard Nossum wrote:
> > From: Vegard Nossum <[email protected]>
> > Date: Fri, 15 Jun 2007 18:35:49 +0200
> > Subject: [PATCH] Optimize is_power_of_2().
> >
> > Rationale: Removes one conditional branch and reduces icache
> > footprint. Proof: If n is false, the product of n and any value is
> > false. If n is true, the truth of (n * x) is the truth of x.
>
> You realize that on a lot of platforms, multiplication is done by an
> out-of-line subroutine, and that even on those that aren't, it's
> generally a long-pipeline operation, right?

i was *just* about to mention somthing of the sort, but probably not
as succinctly.

rday
--
========================================================================
Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page
========================================================================

2007-06-15 19:47:59

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [PATCH] Optimize is_power_of_2().



On Jun 15 2007 18:56, Vegard Nossum wrote:
> bool is_power_of_2(unsigned long n)
> {
>- return (n != 0 && ((n & (n - 1)) == 0));
>+ return n * !(n & (n - 1));
> }

There is a third way which uses neither * nor &&, but []:

bool is_power_of_2(unsigned long n)
{
static const bool yn[] = {
[0] = false,
[1 ... (8 * sizeof(n) - 1)] = true,
};
return yn[(n & (n - 1)) == 0];
}



Jan
--

2007-06-15 19:54:51

by David M. Lloyd

[permalink] [raw]
Subject: Re: [PATCH] Optimize is_power_of_2().

On Fri, 15 Jun 2007 21:47:50 +0200 (CEST)
Jan Engelhardt <[email protected]> wrote:

> On Jun 15 2007 18:56, Vegard Nossum wrote:
> > bool is_power_of_2(unsigned long n)
> > {
> >- return (n != 0 && ((n & (n - 1)) == 0));
> >+ return n * !(n & (n - 1));
> > }
>
> There is a third way which uses neither * nor &&, but []:

I assume using something GCC-specific is right out?

bool is_power_of_to(unsigned long n)
{
return __builtin_ffsl(n) == 1;
}

- DML

2007-06-15 19:57:51

by David M. Lloyd

[permalink] [raw]
Subject: Re: [PATCH] Optimize is_power_of_2().

On Fri, 15 Jun 2007 14:54:20 -0500
"David M. Lloyd" <[email protected]> wrote:

> On Fri, 15 Jun 2007 21:47:50 +0200 (CEST)
> Jan Engelhardt <[email protected]> wrote:
>
> > On Jun 15 2007 18:56, Vegard Nossum wrote:
> > > bool is_power_of_2(unsigned long n)
> > > {
> > >- return (n != 0 && ((n & (n - 1)) == 0));
> > >+ return n * !(n & (n - 1));
> > > }
> >
> > There is a third way which uses neither * nor &&, but []:
>
> I assume using something GCC-specific is right out?
>
> bool is_power_of_to(unsigned long n)
> {
> return __builtin_ffsl(n) == 1;

Pretend I typed this instead:

bool is_power_of_2(unsigned long n)
{
return __builtin_popcountl(n) == 1;
}

All I can say is, it's really hot here :P

- DML

2007-06-15 20:26:19

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] Optimize is_power_of_2().

Robert P. J. Day wrote:
> On Fri, 15 Jun 2007, H. Peter Anvin wrote:
>
>> Vegard Nossum wrote:
>>> From: Vegard Nossum <[email protected]>
>>> Date: Fri, 15 Jun 2007 18:35:49 +0200
>>> Subject: [PATCH] Optimize is_power_of_2().
>>>
>>> Rationale: Removes one conditional branch and reduces icache
>>> footprint. Proof: If n is false, the product of n and any value is
>>> false. If n is true, the truth of (n * x) is the truth of x.
>> You realize that on a lot of platforms, multiplication is done by an
>> out-of-line subroutine, and that even on those that aren't, it's
>> generally a long-pipeline operation, right?
>
> i was *just* about to mention somthing of the sort, but probably not
> as succinctly.
>

You also realize, I hope, that obfuscating this for the compiler just
hurts. As has already been described, there are versions involving
jumps, selects, multiplication, tables, ffs, and popcount. All of those
operations are hideously expensive on some architectures and cheap on
others.

Personally, I would bet that the version as originally written is
probably best, since select (conditional move) is a fairly commonly
supported operation, especially on architectures where jumps are expensive.

Just to confuse the situation further, here is yet another version:

bool is_power_of_2(unsigned long n)
{
return !!n & !(n & (n-1));
}

... which, on i686 compiles to:
00000000 <is_power_of_2>:
0: 85 c0 test %eax,%eax
2: 8d 50 ff lea 0xffffffff(%eax),%edx
5: 0f 95 c1 setne %cl
8: 85 d0 test %edx,%eax
a: 0f 94 c0 sete %al
d: 0f b6 c0 movzbl %al,%eax
10: 21 c8 and %ecx,%eax
12: c3 ret

In the current version, gcc does generate a jump on i686 and x86-64:

00000000 <is_power_of_2>:
0: 89 c2 mov %eax,%edx
2: 31 c0 xor %eax,%eax
4: 85 d2 test %edx,%edx
6: 74 0b je 13 <is_power_of_2+0x13>
8: 8d 42 ff lea 0xffffffff(%edx),%eax
b: 85 c2 test %eax,%edx
d: 0f 94 c0 sete %al
10: 83 e0 01 and $0x1,%eax
13: f3 c3 repz ret

-hpa