2003-05-01 04:28:50

by Linus Torvalds

Subject: Re: [RFC][PATCH] Faster generic_fls

In article <[email protected]>, Andi Kleen <[email protected]> wrote:
>Linus Torvalds <[email protected]> writes:
>
>> Yeah, except if you want best code generation you should probably use
>>
>> static inline int fls(int x)
>> {
>> int bit;
>> /* Only well-defined for non-zero */
>> asm("bsrl %1,%0":"=r" (bit):"rm" (x));
>
>"g" should work for the second operand too and it'll give gcc
>slightly more choices with possibly better code.

"g" allows immediates, which is _not_ correct.
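
A completed version of the quoted fls() might look like the sketch below. Only the asm line comes from the quoted mail. The zero check and the `+ 1` follow the usual kernel fls() convention (fls(0) == 0, fls(1) == 1, top bit set gives 32) and are assumptions, as are the hypothetical name fls_bsr and the portable fallback for non-x86 builds. The source operand uses "rm" because bsr accepts a register or memory source but has no immediate form; "g" also permits immediates, so it would let gcc emit an invalid instruction for a constant argument.

```c
#include <assert.h>

/* Hypothetical name: x86 bsr-based fls with an assumed kernel-style
 * contract: fls(0) == 0, fls(1) == 1, all bits set gives 32. */
static inline int fls_bsr(int x)
{
	int bit;

	if (!x)
		return 0;	/* bsr leaves the destination undefined for 0 */
#if defined(__i386__) || defined(__x86_64__)
	/* "rm": bsr takes a register or memory source, never an immediate,
	 * so "g" (which also allows immediates) is the wrong constraint. */
	__asm__("bsrl %1,%0" : "=r" (bit) : "rm" (x));
#else
	/* Portable fallback (assumption) so the sketch builds off x86. */
	unsigned int u = (unsigned int)x;
	for (bit = 0; u >>= 1; bit++)
		;
#endif
	return bit + 1;		/* bsr yields a 0-based bit index */
}
```

The `__asm__` spelling is the same GNU extension as `asm`; it just keeps compiling under strict `-std=c99`/`-std=c11`.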

>But the __builtin is probably preferable if gcc supports it because
>a builtin can be scheduled, inline assembly can't.

You're overestimating gcc builtins, and underestimating inline
assembly.

gcc builtins usually generate _worse_ code than well-written inline
assembly, since builtins try to be generic (at least the ones that are
cross-architecture).
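
For comparison, the builtin route Andi suggests could look like the sketch below. __builtin_clz() (count leading zeros) is a real GCC and Clang builtin, but it is generic across architectures and undefined for a zero argument, so the wrapper must add its own zero test; whether a given gcc release provides it at all is exactly Andi's "if gcc supports it" caveat. The name fls_builtin is hypothetical, and the same assumed fls() convention is used.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical builtin-based fls: 1-based index of the most significant
 * set bit, 0 for x == 0 (assumed kernel-style convention). */
static inline int fls_builtin(unsigned int x)
{
	/* __builtin_clz(0) is undefined, so zero is special-cased here. */
	return x ? (int)(sizeof(x) * CHAR_BIT) - __builtin_clz(x) : 0;
}
```

How much code the extra zero test costs, compared with the asm version, depends on the compiler and target, which is the trade-off being argued in this thread.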

And inline asms schedule as well as any built-in, unless they are marked
volatile (either explicitly or implicitly by not having any outputs).
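
To make the volatile distinction concrete: an asm with outputs and no volatile is treated as a value computed from its inputs, so gcc may move it, merge identical copies, or drop it if the result is unused; an asm with no outputs is implicitly volatile and must be kept. The sketch uses empty asm templates so it is portable; the helper names are hypothetical, and it shows what the compiler is permitted to do, not what any particular gcc version actually does.

```c
#include <assert.h>

/* Non-volatile asm with an output: gcc may treat it as a pure function
 * of 'in', so it can be hoisted, combined with an identical asm, or
 * deleted if 'out' is unused. The empty template leaves 'in' in place. */
static inline unsigned int opaque_copy(unsigned int in)
{
	unsigned int out;
	__asm__("" : "=r" (out) : "0" (in));
	return out;
}

/* No outputs: implicitly volatile, so gcc must keep this statement.
 * The "memory" clobber also makes it a compiler-level memory barrier. */
static inline void compiler_barrier(void)
{
	__asm__("" : : : "memory");
}

/* 'x' is loop-invariant, so gcc is free to evaluate opaque_copy(x)
 * once and reuse the result on every iteration. */
static unsigned int sum_invariant(unsigned int x, int n)
{
	unsigned int total = 0;
	for (int i = 0; i < n; i++)
		total += opaque_copy(x);
	compiler_barrier();
	return total;
}
```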

And the proof is in the pudding: I'll bet you a dollar my code generates
better code. AND my code works on the Intel compiler _and_ with older
gcc versions.

The fact is, gcc builtins are almost never worth it.

Linus


2003-05-01 23:49:54

by David Miller

Subject: Re: [RFC][PATCH] Faster generic_fls

On Wed, 2003-04-30 at 21:40, Linus Torvalds wrote:
> And inline asms schedule as well as any built-in, unless they are marked
> volatile (either explicitly or implicitly by not having any outputs).

This actually is false. GCC does not know what resources the
instruction uses, therefore it cannot perform accurate instruction
scheduling.

Richard and I have discussed, at some point, providing a way for inline
asms to specify instruction attributes.

An easier way is to provide a per-platform way to get at the
"weird" instructions a cpu has. This is precisely what GCC currently
provides in the form of __builtin_${CPU}_${WEIRD_INSN}(args...) type
interfaces. These give you what you want (complete control) yet
also let GCC schedule the thing accurately.
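
As an illustration of the per-platform approach, fls() can be written against an x86 intrinsic instead of inline asm, so the compiler knows exactly which instruction it is emitting. This assumes a GCC or Clang toolchain whose <x86intrin.h> declares __bsrd(); to my knowledge GCC's header implements it on top of __builtin_ia32_bsrsi, which fits the __builtin_${CPU}_${WEIRD_INSN} pattern, but treat that as an assumption. The name fls_intrinsic and the non-x86 fallback are hypothetical.

```c
#include <assert.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>	/* assumed to declare __bsrd() on GCC and Clang */
#endif

/* Hypothetical fls via a CPU-specific intrinsic; assumes 32-bit int and
 * the kernel-style convention fls(0) == 0, fls(1) == 1. */
static inline int fls_intrinsic(int x)
{
	if (!x)
		return 0;	/* bsr is undefined for a zero source */
#if defined(__i386__) || defined(__x86_64__)
	return __bsrd(x) + 1;	/* __bsrd returns a 0-based bit index */
#else
	/* Non-x86 fallback (assumption): the generic builtin. */
	return 32 - __builtin_clz((unsigned int)x);
#endif
}
```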

I know you think GCC is a pile of dogshit, and in many ways I do too, but
I think it's just as important to point out the good parts where they
exist.

--
David S. Miller <[email protected]>

2003-05-02 05:01:01

by Linus Torvalds

Subject: Re: [RFC][PATCH] Faster generic_fls


On 1 May 2003, David S. Miller wrote:
>
> This actually is false. GCC does not know what resources the
> instruction uses, therefore it cannot perform accurate instruction
> scheduling.

Yeah, and sadly the fact that gcc-3.2.x does better instruction scheduling
has turned out not to _matter_ that much. I'm quite impressed by the new
scheduler, but gcc-2.x still seems to generate faster code in too many
cases.

CPUs are getting smarter, not dumber. And that implies, among other
things, that instruction scheduling matters less and less. What matters
on the P4, for example, seems to be the _choice_ of instructions, not the
scheduling of them.
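
The point about instruction choice is easiest to see by setting the single bsr in the quoted fls() against a portable fallback that must reach the same answer through a chain of tests and shifts. The sketch below is a binary-search fls in the style of the kernel's asm-generic bitops; fls_generic is a hypothetical name, it assumes a 32-bit unsigned int, and no claim is made about how either version performs on a P4.

```c
#include <assert.h>

/* Hypothetical portable fls by binary search over bit positions:
 * 1-based index of the most significant set bit, 0 for x == 0.
 * Unsigned shifts avoid signed-overflow problems. */
static inline int fls_generic(unsigned int x)
{
	int r = 32;

	if (!x)
		return 0;
	if (!(x & 0xffff0000u)) { x <<= 16; r -= 16; }	/* top 16 bits empty */
	if (!(x & 0xff000000u)) { x <<= 8;  r -= 8;  }	/* top 8 bits empty */
	if (!(x & 0xf0000000u)) { x <<= 4;  r -= 4;  }	/* top 4 bits empty */
	if (!(x & 0xc0000000u)) { x <<= 2;  r -= 2;  }	/* top 2 bits empty */
	if (!(x & 0x80000000u)) { x <<= 1;  r -= 1;  }	/* top bit empty */
	return r;
}
```

Five compare-and-branch steps (plus the zero test) stand in for one bsr, which is the kind of instruction-selection difference that no amount of scheduling recovers.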

Linus