2015-02-02 03:23:57

by George Spelvin

Subject: Re: [PATCH v2 1/3] lib: find_*_bit reimplementation

Yury Norov <[email protected]> wrote:
> New implementations take less space in the source file (see diffstat)
> and in the object file. For me it's 710 vs 453 bytes of text.
>
> Patch was boot-tested on x86_64 and MIPS (big-endian) machines.
> Performance tests were run in userspace with code like this:
>
> /* addr[] is filled from /dev/urandom */
> start = clock();
> while (ret < nbits)
>         ret = find_next_bit(addr, nbits, ret + 1);
>
> end = clock();
> printf("%ld\t", (unsigned long) end - start);

> On Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, results are as follows:
> (for find_next_bit, nbits is 8M; for find_first_bit, 80K)
>
> find_next_bit:    find_first_bit:
> new      current  new      current
> 26932    43151    14777    14925
> 26947    43182    14521    15423

I'll look at this more carefully, but one immediate thought is that this
is an unrealistic benchmark. It will almost never need to look at more
than one word of the array, but real arrays have long runs of zero
bits to skip over.

So the smaller code size is appreciated, but the time benefits may be the
result of optimizing for the wrong thing.

I'd try filling the array with mostly-identical bits, flipping with odds
of 1/256 or so.

For full generality, I'd test different 1->0 and 0->1 transition
probabilities. (But powers of two are probably enough for benchmarking.)
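
For concreteness, a fill along these lines would do it (an untested
userspace sketch; fill_runs, p01 and p10 are names I just made up, and
BITS_PER_LONG is defined locally since userspace lacks the kernel macro):

#include <stdlib.h>
#include <string.h>

#define BITS_PER_LONG	(8 * sizeof(unsigned long))

/*
 * Fill addr[] with long runs of identical bits.  The fill value flips
 * 0->1 with probability 1/p01 and 1->0 with probability 1/p10 at each
 * bit, so mean run lengths are about p01 (zeros) and p10 (ones).
 */
void fill_runs(unsigned long *addr, unsigned long nbits,
	       unsigned int p01, unsigned int p10)
{
	unsigned long i;
	int bit = 0;

	memset(addr, 0, (nbits + BITS_PER_LONG - 1) / BITS_PER_LONG *
			sizeof(unsigned long));

	for (i = 0; i < nbits; i++) {
		if (rand() % (bit ? p10 : p01) == 0)
			bit ^= 1;
		if (bit)
			addr[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
	}
}

Calling it with p01 == p10 == 256 gives mean run lengths of 256 bits;
different powers of two for the two parameters cover the asymmetric cases.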


2015-02-04 23:07:32

by Yury Norov

Subject: Re: [PATCH v2 1/3] lib: find_*_bit reimplementation


On 02.02.2015 06:17, George Spelvin wrote:
> Yury Norov <[email protected]> wrote:
>> New implementations take less space in the source file (see diffstat)
>> and in the object file. For me it's 710 vs 453 bytes of text.
>>
>> Patch was boot-tested on x86_64 and MIPS (big-endian) machines.
>> Performance tests were run in userspace with code like this:
>>
>> /* addr[] is filled from /dev/urandom */
>> start = clock();
>> while (ret < nbits)
>>         ret = find_next_bit(addr, nbits, ret + 1);
>>
>> end = clock();
>> printf("%ld\t", (unsigned long) end - start);
>>
>> On Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, results are as follows:
>> (for find_next_bit, nbits is 8M; for find_first_bit, 80K)
>>
>> find_next_bit:    find_first_bit:
>> new      current  new      current
>> 26932    43151    14777    14925
>> 26947    43182    14521    15423
> I'll look at this more carefully, but one immediate thought is that this
> is an unrealistic benchmark. It will almost never need to look at more
> than one word of the array, but real arrays have long runs of zero
> bits to skip over.
>
> So the smaller code size is appreciated, but the time benefits may be the
> result of optimizing for the wrong thing.
>
> I'd try filling the array with mostly-identical bits, flipping with odds
> of 1/256 or so.
>
> For full generality, I'd test different 1->0 and 0->1 transition
> probabilities. (But powers of two are probably enough for benchmarking.)
>
I think the test with random values represents at least one realistic situation: well-fragmented
memory after a long time of work. (This is what I really have in my project.) On the other hand,
if long zero runs are typical behavior for one's system, that is a good opportunity for
improvements, I think. Anyway, the idea of testing find_bit on long runs is good. Thank you.
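
For reference, a driver for such a long-runs test might look roughly like
this (an untested sketch; it assumes a userspace copy of the kernel's
find_next_bit() is linked in, and it reuses the hypothetical fill_runs()
helper sketched earlier in the thread):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BITS_PER_LONG	(8 * sizeof(unsigned long))

/* Assumed to be linked in from a userspace build of the kernel code. */
extern unsigned long find_next_bit(const unsigned long *addr,
				   unsigned long size, unsigned long offset);
/* The hypothetical run-filling helper sketched earlier in the thread. */
extern void fill_runs(unsigned long *addr, unsigned long nbits,
		      unsigned int p01, unsigned int p10);

int main(void)
{
	const unsigned long nbits = 8UL * 1024 * 1024;	/* 8M bits, as before */
	unsigned long nlongs = (nbits + BITS_PER_LONG - 1) / BITS_PER_LONG;
	unsigned long *addr = calloc(nlongs, sizeof(*addr));
	unsigned long ret = 0;
	clock_t start, end;

	if (!addr)
		return 1;

	fill_runs(addr, nbits, 256, 256);	/* ~1/256 transition odds */

	start = clock();
	while (ret < nbits)
		ret = find_next_bit(addr, nbits, ret + 1);
	end = clock();

	printf("%ld\n", (long)(end - start));
	free(addr);
	return 0;
}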