Date: Thu, 1 May 2003 13:15:39 -0400
From: Chuck Ebbert <76306.1226@compuserve.com>
Subject: Re: [RFC][PATCH] Faster generic_fls
To: Linus Torvalds <torvalds@transmeta.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>
Message-ID: <200305011318_MC3-1-36ED-528D@compuserve.com>

Linus Torvalds wrote:

>>  BTW, has anyone benchmarked BSF/BSR on x86? It should be
>> faster, but it's also possible that poor microcode implements it with a
>> one-bit-per-cycle algorithm, in which case the single instruction would not
>> be as fast as your code.
>
> I think the original i386 did it with a one-bit-per-cycle algorithm;
> anything since should be fine. In particular, on a P4 where I just tested,
> the bsf seems to be 4 cycles over the whole input set (actually, my whole
> loop was 4 cycles per iteration, so 4 cycles is worst-case. I'm assuming
> the rest could have been done in parallel).

 
 Just for comparison, the Pentium (Classic) manual says 6-43 clocks for
bsfl and 7-72 (!) for bsrl.
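
For anyone wanting to compare on their own hardware, here is a minimal
sketch of the two approaches under discussion. It is not the kernel's code:
the names fls_sketch, fls_builtin and fls_selfcheck are made up for
illustration, unsigned int is assumed to be 32 bits, and __builtin_clz is
a GCC/Clang builtin that on x86 typically compiles to a bit-scan
instruction (the exact instruction depends on compiler and target flags).

```c
#include <assert.h>

/* Hypothetical portable find-last-set using a shift-based binary search.
 * Returns the 1-based position of the most significant set bit,
 * or 0 when x == 0. Assumes a 32-bit unsigned int. */
static int fls_sketch(unsigned int x)
{
    int r = 32;

    if (!x)
        return 0;
    if (!(x & 0xffff0000u)) { x <<= 16; r -= 16; }
    if (!(x & 0xff000000u)) { x <<= 8;  r -= 8;  }
    if (!(x & 0xf0000000u)) { x <<= 4;  r -= 4;  }
    if (!(x & 0xc0000000u)) { x <<= 2;  r -= 2;  }
    if (!(x & 0x80000000u)) { x <<= 1;  r -= 1;  }
    return r;
}

/* Same result via the compiler builtin. __builtin_clz(0) is undefined,
 * so zero is handled separately. */
static int fls_builtin(unsigned int x)
{
    return x ? 32 - __builtin_clz(x) : 0;
}

/* Cross-check the two versions on the low 16-bit range and on
 * every single-bit input. */
static void fls_selfcheck(void)
{
    unsigned int x;
    int bit;

    for (x = 0; x < (1u << 16); x++)
        assert(fls_sketch(x) == fls_builtin(x));
    for (bit = 0; bit < 32; bit++)
        assert(fls_sketch(1u << bit) == bit + 1);
}
```

Timing either function in a loop would show whether the single instruction
or the branchy version wins on a given CPU; the clock counts quoted above
suggest the answer may differ between processor generations.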

------
 Chuck