2003-05-01 05:04:08

by Hu Gang

[permalink] [raw]
Subject: Re: [RFC][PATCH] Faster generic_fls

On Wed, 30 Apr 2003 13:55:12 -0700
Andrew Morton <[email protected]> wrote:

> Daniel Phillips <[email protected]> wrote:
> >
> > +static inline unsigned fls8(unsigned n)
> > +{
> > + return n & 0xf0?
> > + n & 0xc0? (n >> 7) + 7: (n >> 5) + 5:
> > + n & 0x0c? (n >> 3) + 3: n - ((n + 1) >> 2);
> > +}
>
> return fls_table[n];
>
>
> That'll be faster in benchmarks, possibly slower in real world. As usual.
>
Here is table version of the fls. Yes it fast than other.
typedef struct fls_table_t {
unsigned min;
unsigned max;
unsigned value;
} fls_table_t;

static fls_table_t fls_table[] = {
{0, 1, 1},
{1, 2, 2},
{2, 4, 3},
{4, 8, 4},
{8, 16, 5},
{16, 32, 6},
{32, 64, 7},
{64, 128, 8},
{128, 256, 9},
{256, 512, 10},
{512, 1024, 11},
{1024, 2048, 12},
{2048, 4096, 13},
{4096, 8192, 14},
{8192, 16384, 15},
{16384, 32768, 16},
{32768, 65536, 17},
{65536, 131072, 18},
{131072, 262144, 19},
{262144, 524288, 20},
{524288, 1048576, 21},
{1048576, 2097152, 22},
{2097152, 4194304, 23},
{4194304, 8388608, 24},
{8388608, 16777216, 25},
{16777216, 33554432, 26},
{33554432, 67108864, 27},
{67108864, 134217728, 28},
{134217728, 268435456, 29},
{268435456, 536870912, 30},
{536870912, 1073741824, 31},
{1073741824, 2147483648, 32},

{0, -1, 32},
};

static inline int fls_table_fls(unsigned n)
{
unsigned i = 0;

do {
i++;
} while (n <= fls_table[i].max && n > fls_table[i].min);

return fls_table[i].value;
}

--test log is here----
model name : Intel(R) Pentium(R) 4 Mobile CPU 1.60GHz
Reading specs from /usr/lib/gcc-lib/i386-linux/2.95.4/specs
gcc version 2.95.4 20011002 (Debian prerelease)
==== -march=i686 -O3 ====
test_fls.c:110: warning: decimal constant is so large that it is unsigned
4294967295 iterations of nil... checksum = 4294967295

real 0m22.311s
user 0m21.850s
sys 0m0.010s
4294967295 iterations of old... checksum = 4294967265

real 0m34.673s
user 0m33.960s
sys 0m0.000s
4294967295 iterations of new... checksum = 4294967265

real 0m57.702s
user 0m56.530s
sys 0m0.040s
4294967295 iterations of new2... checksum = 4294967265

real 0m41.709s
user 0m40.810s
sys 0m0.000s
4294967295 iterations of table... checksum = 4294967295

real 0m22.274s
user 0m21.640s
sys 0m0.010s
------------------
==== -march=i686 -O2 ====
test_fls.c:110: warning: decimal constant is so large that it is unsigned
4294967295 iterations of nil... checksum = 4294967295

real 0m29.501s
user 0m28.880s
sys 0m0.040s
4294967295 iterations of old... checksum = 4294967265

real 0m49.054s
user 0m47.840s
sys 0m0.040s
4294967295 iterations of new... checksum = 4294967265

real 0m52.331s
user 0m51.270s
sys 0m0.010s
4294967295 iterations of new2... checksum = 4294967265

real 1m7.043s
user 1m5.660s
sys 0m0.010s
4294967295 iterations of table... checksum = 4294967295

real 0m45.783s
user 0m44.860s
sys 0m0.010s
------------------
==== -O2 ====
test_fls.c:110: warning: decimal constant is so large that it is unsigned
4294967295 iterations of nil... checksum = 4294967295

real 0m34.395s
user 0m33.670s
sys 0m0.000s
4294967295 iterations of old... checksum = 4294967265

real 0m52.277s
user 0m51.230s
sys 0m0.000s
4294967295 iterations of new... checksum = 4294967265

real 0m54.750s
user 0m53.640s
sys 0m0.010s
4294967295 iterations of new2... checksum = 4294967265

real 0m57.728s
user 0m56.590s
sys 0m0.000s
4294967295 iterations of table... checksum = 4294967295

real 0m44.540s
user 0m43.600s
sys 0m0.020s
------------------
==== -O3 ====
test_fls.c:110: warning: decimal constant is so large that it is unsigned
4294967295 iterations of nil... checksum = 4294967295

real 0m16.196s
user 0m15.880s
sys 0m0.000s
4294967295 iterations of old... checksum = 4294967265

real 0m34.342s
user 0m33.630s
sys 0m0.010s
4294967295 iterations of new... checksum = 4294967265

real 0m58.554s
user 0m57.380s
sys 0m0.000s
4294967295 iterations of new2... checksum = 4294967265

real 0m37.140s
user 0m36.320s
sys 0m0.040s
4294967295 iterations of table... checksum = 4294967295

real 0m21.723s
user 0m21.270s
sys 0m0.000s
------------------

all of the files can download from:
http://soulinfo.com/~hugang/kernel/
--
Hu Gang / Steve
Email : [email protected], [email protected]
GPG FinePrint: 4099 3F1D AE01 1817 68F7 D499 A6C2 C418 86C8 610E
ICQ# : 205800361
Registered Linux User : 204016


2003-05-01 10:10:55

by Willy Tarreau

[permalink] [raw]
Subject: Re: [RFC][PATCH] Faster generic_fls

On Thu, May 01, 2003 at 01:16:05PM +0800, hugang wrote:

> Here is table version of the fls. Yes it fast than other.

Sorry, but this code returns wrong results. Test it with less
iterations, it doesn't return the same result.

> unsigned i = 0;
>
> do {
> i++;
> } while (n <= fls_table[i].max && n > fls_table[i].min);

You never even compare with table[0] !

> --test log is here----

I recoded a table based on your idea, and it's clearly faster than others, and
even faster than yours :

============
static unsigned tbl[33] = { 2147483648, 1073741824, 536870912, 268435456,
134217728, 67108864, 33554432, 16777216,
8388608, 4194304, 2097152, 1048576,
524288, 262144, 131072, 65536,
32768, 16384, 8192, 4096,
2048, 1024, 512, 256,
128, 64, 32, 16,
8, 4, 2, 1, 0 };


static inline int fls_tbl_fls(unsigned n)
{
unsigned i = 0;

while (n < tbl[i])
i++;

return 32 - i;
}
===========

it returns the right result. Compiled with gcc 2.95.3 -march=i686 -O3, athlon
1.533 GHz (1800+) :

4294967295 iterations of new... checksum = 4294967265

real 1m6.182s
user 1m6.060s
sys 0m0.133s
4294967295 iterations of new2... checksum = 4294967265

real 0m36.504s
user 0m36.294s
sys 0m0.202s
4294967295 iterations of fls_table... checksum = 4294967295

real 0m21.962s
user 0m21.833s
sys 0m0.124s
4294967295 iterations of fls_tbl... checksum = 4294967265

real 0m19.268s
user 0m19.102s
sys 0m0.168s

The same compiled with gcc-3.2.3 :

4294967295 iterations of fls_table... checksum = 4294967295

real 0m14.211s
user 0m14.210s
sys 0m0.000s

Cheers,
Willy

2003-05-01 11:05:40

by Hu Gang

[permalink] [raw]
Subject: Re: [RFC][PATCH] Faster generic_fls

On Thu, 1 May 2003 12:22:30 +0200
Willy TARREAU <[email protected]> wrote:

> On Thu, May 01, 2003 at 01:16:05PM +0800, hugang wrote:
>
> > Here is table version of the fls. Yes it fast than other.
>
> Sorry, but this code returns wrong results. Test it with less
> iterations, it doesn't return the same result.
>
> > unsigned i = 0;
> >
> > do {
> > i++;
> > } while (n <= fls_table[i].max && n > fls_table[i].min);
>
> You never even compare with table[0] !
>
> > --test log is here----
>
> I recoded a table based on your idea, and it's clearly faster than others, and
> even faster than yours :
>
> ============
> static unsigned tbl[33] = { 2147483648, 1073741824, 536870912, 268435456,
> 134217728, 67108864, 33554432, 16777216,
> 8388608, 4194304, 2097152, 1048576,
> 524288, 262144, 131072, 65536,
> 32768, 16384, 8192, 4096,
> 2048, 1024, 512, 256,
> 128, 64, 32, 16,
> 8, 4, 2, 1, 0 };
>
>
> static inline int fls_tbl_fls(unsigned n)
> {
> unsigned i = 0;
>
> while (n < tbl[i])
> i++;
>
> return 32 - i;
> }
> ===========

Do see the mail by Andrew Morton? -----

From: Andrew Morton <[email protected]>
To: hugang <[email protected]>
Cc: [email protected]
Subject: Re: [RFC][PATCH] Faster generic_fls
Date: Wed, 30 Apr 2003 22:11:29 -0700
Sender: [email protected]
X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i586-pc-linux-gnu)

hugang <[email protected]> wrote:
>
> Here is table version of the fls. Yes it fast than other.

nooo.. That has a big cache footprint. At the very least you should use
a binary search. gcc will do it for you:

switch (n) {
case 0 ... 1:
return 1;
case 2 ... 3:
return 2;
case 4 ... 7:
return 3;
case 8 ... 15:
return 4;

etc.
----------
I agree him, The Lastest code my posted has works fine, Please check it.

--
Hu Gang / Steve
Email : [email protected], [email protected]
GPG FinePrint: 4099 3F1D AE01 1817 68F7 D499 A6C2 C418 86C8 610E
ICQ# : 205800361
Registered Linux User : 204016

2003-05-01 11:37:58

by Willy Tarreau

[permalink] [raw]
Subject: Re: [RFC][PATCH] Faster generic_fls

On Thu, May 01, 2003 at 07:17:30PM +0800, hugang wrote:

> Do see the mail by Andrew Morton? -----

no sorry, I didn't see it, but I've read it now. I agree with him, that's
what I noticed here too. I also tried a binary search through the table in a
tree-like fashion, but the CPU doesn't like going back and forth through the
table at all ! I'm retrying with an "elastic mask".

Cheers
Willy