Date: Fri, 21 Jul 2017 15:26:21 +0200
From: Peter Zijlstra
To: Joe Perches
Cc: Linus Torvalds, Anshul Garg, Davidlohr Bueso, Linux Kernel Mailing List, "anshul.g@samsung.com", Thomas Gleixner, Ingo Molnar, Will Deacon
Subject: Re: [PATCH] lib/int_sqrt.c: Optimize square root function
Message-ID: <20170721132621.4a52p2qbqwakchkc@hirez.programming.kicks-ass.net>

On Fri, Jul 21, 2017 at 05:15:10AM -0700, Joe Perches wrote:
> On Fri, 2017-07-21 at 13:40 +0200, Peter Zijlstra wrote:
> > @@ -21,7 +22,11 @@ unsigned long int_sqrt(unsigned long x)
> >  	if (x <= 1)
> >  		return x;
> >  
> > -	m = 1UL << (BITS_PER_LONG - 2);
> > +	m = 1UL << (__fls(x) & ~1U);
> > +
> > +	while (m > x)
> > +		m >>= 2;
>
> while (m > x) ?
>
> Belt and suspenders if __fls is broken?

Hmm... you're right, that should not happen.
It is a remnant from when I rounded up, like:

	m = 1UL << ((__fls(x) + 1) & ~1UL);

I did that because I worried about the case where m == x, which is not
covered by the loop above (but it works when you look at the actual
computation loop, and it passes VALIDATE=1).

But check this... I cannot explain it :/

When I remove that loop we, as fully expected, lose 1 branch, but the
cycle count for the branch-cold case shoots up. Must be something GCC
does.

EVENT=0 -DNEW=1 -DFLS=1
event: 19.626050 +- 0.038995
EVENT=0 -DNEW=1 -DFLS=1 -DWIPE_BTB=1
event: 109.610670 +- 0.425667
EVENT=0 -DNEW=1 -DFLS=1 -DANSHUL=1
event: 21.445680 +- 0.043782
EVENT=0 -DNEW=1 -DFLS=1 -DANSHUL=1 -DWIPE_BTB=1
event: 83.590420 +- 0.142126

EVENT=4 -DNEW=1 -DFLS=1
event: 20.252330 +- 0.005265
EVENT=4 -DNEW=1 -DFLS=1 -DWIPE_BTB=1
event: 20.252340 +- 0.005265
EVENT=4 -DNEW=1 -DFLS=1 -DANSHUL=1
event: 21.252300 +- 0.005266
EVENT=4 -DNEW=1 -DFLS=1 -DANSHUL=1 -DWIPE_BTB=1
event: 21.252300 +- 0.005266

EVENT=5 -DNEW=1 -DFLS=1
event: 0.019370 +- 0.000732
EVENT=5 -DNEW=1 -DFLS=1 -DWIPE_BTB=1
event: 3.665240 +- 0.005309
EVENT=5 -DNEW=1 -DFLS=1 -DANSHUL=1
event: 0.020150 +- 0.000755
EVENT=5 -DNEW=1 -DFLS=1 -DANSHUL=1 -DWIPE_BTB=1
event: 2.225330 +- 0.004875

Let me dig out another GCC version; current: gcc (Debian 6.3.0-18) 6.3.0 20170516