Date: Thu, 20 Jul 2017 17:27:49 +0200
From: Peter Zijlstra
To: Linus Torvalds
Cc: Anshul Garg, Davidlohr Bueso, Linux Kernel Mailing List, anshul.g@samsung.com, Thomas Gleixner, joe@perches.com
Subject: Re: [PATCH] lib/int_sqrt.c: Optimize square root function
Message-ID: <20170720152749.k7al6xsvckczolzi@hirez.programming.kicks-ass.net>
References: <1422897162-111998-1-git-send-email-aksgarg1989@gmail.com> <20170720112449.6xvc2ghaj3jh6w7l@hirez.programming.kicks-ass.net>
In-Reply-To: <20170720112449.6xvc2ghaj3jh6w7l@hirez.programming.kicks-ass.net>

On Thu, Jul 20, 2017 at 01:24:49PM +0200, Peter Zijlstra wrote:
> ~/tmp$ gcc -o sqrt sqrt.c -lm -O2 -DLOOPS=10000000 -DNEW=1 -DFLS=1 -DANSHUL=1 ; perf stat --repeat 10 -e cycles:u -e instructions:u ./sqrt
>
>  Performance counter stats for './sqrt' (10 runs):
>
>        328,415,775      cycles:u                                          ( +-  0.15% )
>      1,138,579,704      instructions:u   #    3.47  insn per cycle        ( +-  0.00% )
>
>        0.088703205 seconds time elapsed
>
> static __always_inline unsigned long fls(unsigned long word)
> {
> 	asm("rep; bsr %1,%0"
> 			: "=r" (word)
> 			: "rm" (word));
> 	return BITS_PER_LONG - 1 - word;
> }

That is actually "lzcnt", if I used the regular fls implementation:

static __always_inline unsigned long __fls(unsigned long word)
{
	asm("bsr %1,%0"
			: "=r" (word)
			: "rm" (word));
	return word;
}

It ends up slightly more expensive:

~/tmp$ gcc -o sqrt sqrt.c -lm -O2 -DLOOPS=10000000 -DNEW=1 -DFLS=1 -DANSHUL=1 ; perf stat --repeat 10 -e cycles:u -e instructions:u ./sqrt

 Performance counter stats for './sqrt' (10 runs):

       384,842,215      cycles:u                                          ( +-  0.08% )
     1,118,579,712      instructions:u   #    2.91  insn per cycle        ( +-  0.00% )

       0.103018001 seconds time elapsed

Still loads cheaper than pretty much any other combination.
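
For readers without sqrt.c at hand (it is not included in this message), here is a rough sketch of how an fls-style helper could seed a digit-by-digit integer square root. It is an illustration under assumptions, not the benchmarked code: the names fls_index and int_sqrt_sketch are hypothetical, and the GCC/Clang builtin __builtin_clzl stands in for the inline asm above. The subtraction in fls_index mirrors the "BITS_PER_LONG - 1 - word" step that the lzcnt variant needs, since lzcnt counts leading zeros while bsr returns the bit index directly.

```c
#include <assert.h>
#include <limits.h>

/* Width of unsigned long in bits on the build target. */
#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Hypothetical helper: index of the most significant set bit.
 * __builtin_clzl (GCC/Clang) counts leading zeros, so the index is
 * recovered by subtracting from the top bit position.
 * The result is undefined for word == 0, hence the assert.
 */
static inline unsigned long fls_index(unsigned long word)
{
	assert(word != 0);
	return BITS_PER_LONG - 1 - __builtin_clzl(word);
}

/*
 * Sketch of a digit-by-digit integer square root (floor of sqrt(x)).
 * Assumption: the benchmark uses fls only to choose the starting bit,
 * so the loop skips the leading zero bit pairs of x.
 */
static unsigned long int_sqrt_sketch(unsigned long x)
{
	unsigned long b, m, y = 0;

	if (x <= 1)
		return x;

	/* Largest power of four that does not exceed x. */
	m = 1UL << (fls_index(x) & ~1UL);
	while (m != 0) {
		b = y + m;
		y >>= 1;

		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}

	return y;
}
```

Calling int_sqrt_sketch(1000000UL) walks ten loop iterations instead of one per bit pair of the full word, which is where the starting-bit computation pays off. Note that the "rep; bsr" encoding is only decoded as lzcnt on CPUs that support it; elsewhere the prefix is ignored and plain bsr runs, so the subtraction in the quoted fls() would then give the wrong index.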