2010-04-02 20:17:00

by Ingo Molnar

[permalink] [raw]
Subject: Re: Discrepancy between comments for sched_find_first_bit


* Peter Zijlstra <[email protected]> wrote:

> On Sun, 2010-03-28 at 23:37 -0400, Matt Turner wrote:
> > include/asm-generic/bitops/sched.h says
> > /*
> > * Every architecture must define this function. It's the fastest
> > * way of searching a 100-bit bitmap. It's guaranteed that at least
> > * one of the 100 bits is cleared.
> > */
> >
> > arch/alpha/include/asm/bitops.h says
> > /*
> > * Every architecture must define this function. It's the fastest
> > * way of searching a 140-bit bitmap where the first 100 bits are
> > * unlikely to be set. It's guaranteed that at least one of the 140
> > * bits is set.
> > */
> >
> > Is the guarantee that one of the first 100-bits set (and that the last
> > 40 are useless?), or 140-bits? If it's just the first 100 bits we care
> > about, then the alpha version needs to be fixed.
> >
> > I'm just checking this out, because gcc produces horrendous code for
> > sched_find_first_bit on alpha. I rewrote it in assembly and it's
> > better than 4 times faster.
> >
> > Also, is it even worth optimizing that function? It looks like it's
> > only used in kernel/sched_rt.c.
>
> (might help if you CC the scheduler people on scheduler functions :-)
>
> Right, so it used to be 140 bits with the old O(1) scheduler, currently
> (as you noted) sched_rt is the only user left and will therefore only
> care about the first 100 bits.
>
> As it stands I think it might still make sense to optimize this as for
> rt loads it still on a hot path.
>
> As to the 100 vs 140 length, would it really make much of difference to
> shorten the implementation to 100? If not I'd leave it at 140.
>
> Ingo, any comments?

I guess getting below the 128 bits boundary would shave an instruction and a
branch off or so?

Ingo


2010-04-02 20:50:15

by Matt Turner

[permalink] [raw]
Subject: Re: Discrepancy between comments for sched_find_first_bit

On Fri, Apr 2, 2010 at 4:16 PM, Ingo Molnar <[email protected]> wrote:
>
> * Peter Zijlstra <[email protected]> wrote:
>
>> On Sun, 2010-03-28 at 23:37 -0400, Matt Turner wrote:
>> > include/asm-generic/bitops/sched.h says
>> > /*
>> > ?* Every architecture must define this function. It's the fastest
>> > ?* way of searching a 100-bit bitmap. ?It's guaranteed that at least
>> > ?* one of the 100 bits is cleared.
>> > ?*/
>> >
>> > arch/alpha/include/asm/bitops.h says
>> > /*
>> > ?* Every architecture must define this function. It's the fastest
>> > ?* way of searching a 140-bit bitmap where the first 100 bits are
>> > ?* unlikely to be set. It's guaranteed that at least one of the 140
>> > ?* bits is set.
>> > ?*/
>> >
>> > Is the guarantee that one of the first 100-bits set (and that the last
>> > 40 are useless?), or 140-bits? If it's just the first 100 bits we care
>> > about, then the alpha version needs to be fixed.
>> >
>> > I'm just checking this out, because gcc produces horrendous code for
>> > sched_find_first_bit on alpha. I rewrote it in assembly and it's
>> > better than 4 times faster.
>> >
>> > Also, is it even worth optimizing that function? It looks like it's
>> > only used in kernel/sched_rt.c.
>>
>> (might help if you CC the scheduler people on scheduler functions :-)
>>
>> Right, so it used to be 140 bits with the old O(1) scheduler, currently
>> (as you noted) sched_rt is the only user left and will therefore only
>> care about the first 100 bits.
>>
>> As it stands I think it might still make sense to optimize this as for
>> rt loads it still on a hot path.
>>
>> As to the 100 vs 140 length, would it really make much of difference to
>> shorten the implementation to 100? If not I'd leave it at 140.
>>
>> Ingo, any comments?
>
> I guess getting below the 128 bits boundary would shave an instruction and a
> branch off or so?
>
> ? ? ? ?Ingo
>

That's right. I should be able to get rid of a cmov, which kind of
counts as two instructions in EV6 scheduling.

So I should send a patch to reduce this to the first 100 (128) bits?

Thanks guys,
Matt

2010-04-02 21:25:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: Discrepancy between comments for sched_find_first_bit


* Matt Turner <[email protected]> wrote:

> On Fri, Apr 2, 2010 at 4:16 PM, Ingo Molnar <[email protected]> wrote:
> >
> > * Peter Zijlstra <[email protected]> wrote:
> >
> >> On Sun, 2010-03-28 at 23:37 -0400, Matt Turner wrote:
> >> > include/asm-generic/bitops/sched.h says
> >> > /*
> >> > ?* Every architecture must define this function. It's the fastest
> >> > ?* way of searching a 100-bit bitmap. ?It's guaranteed that at least
> >> > ?* one of the 100 bits is cleared.
> >> > ?*/
> >> >
> >> > arch/alpha/include/asm/bitops.h says
> >> > /*
> >> > ?* Every architecture must define this function. It's the fastest
> >> > ?* way of searching a 140-bit bitmap where the first 100 bits are
> >> > ?* unlikely to be set. It's guaranteed that at least one of the 140
> >> > ?* bits is set.
> >> > ?*/
> >> >
> >> > Is the guarantee that one of the first 100-bits set (and that the last
> >> > 40 are useless?), or 140-bits? If it's just the first 100 bits we care
> >> > about, then the alpha version needs to be fixed.
> >> >
> >> > I'm just checking this out, because gcc produces horrendous code for
> >> > sched_find_first_bit on alpha. I rewrote it in assembly and it's
> >> > better than 4 times faster.
> >> >
> >> > Also, is it even worth optimizing that function? It looks like it's
> >> > only used in kernel/sched_rt.c.
> >>
> >> (might help if you CC the scheduler people on scheduler functions :-)
> >>
> >> Right, so it used to be 140 bits with the old O(1) scheduler, currently
> >> (as you noted) sched_rt is the only user left and will therefore only
> >> care about the first 100 bits.
> >>
> >> As it stands I think it might still make sense to optimize this as for
> >> rt loads it still on a hot path.
> >>
> >> As to the 100 vs 140 length, would it really make much of difference to
> >> shorten the implementation to 100? If not I'd leave it at 140.
> >>
> >> Ingo, any comments?
> >
> > I guess getting below the 128 bits boundary would shave an instruction and a
> > branch off or so?
> >
> > ? ? ? ?Ingo
> >
>
> That's right. I should be able to get rid of a cmov, which kind of
> counts as two instructions in EV6 scheduling.
>
> So I should send a patch to reduce this to the first 100 (128) bits?

Sure, why not, every instruction counts :-)

Note, if you do it then please also include a disassembly of the area that
changed, so that we document the effect.

Ingo