From: Linus Torvalds
Date: Thu, 20 Jul 2017 11:31:36 -0700
Subject: Re: [PATCH] lib/int_sqrt.c: Optimize square root function
To: Peter Zijlstra
Cc: Anshul Garg, Davidlohr Bueso, Linux Kernel Mailing List,
    "anshul.g@samsung.com", Thomas Gleixner, Joe Perches

How did this two-year old thread get resurrected?

Anyway, it got resurrected without even answering one core question:

On Thu, Jul 20, 2017 at 4:24 AM, Peter Zijlstra wrote:
> On Mon, Feb 02, 2015 at 11:13:44AM -0800, Linus Torvalds wrote:
>> On Mon, Feb 2, 2015 at 11:00 AM, Linus Torvalds wrote:
>> >
>> > (I'm also not entirely sure what uses int_sqrt() that ends up being so
>> > performance-critical, so it would be good to document that too, since
>> > that probably also matters for the "what's the normal argument range"
>> > question..)

This is still the case. Which of the (very few) users really _care_?
And what are the normal values for that?

For example, the 802.11 minstrel code does a "MINSTREL_TRUNC()" on a
"unsigned int" value that is in millions. It's not even "unsigned
long", so we know it's not many thousands of millions, and
MINSTREL_TRUNC shifts it down by 12 bits.
So we know we have at most a 20-bit argument.

The one case that uses actual unsigned long seems to be
"slow_is_prime_number()", and honestly, the sqrt() is the *least* of
our problems there.

There's a few drivers and filesystems that use it. I do not believe
performance matters in those cases. Even if you do a "int_sqrt()" per
network packet on some high-performance network (and none of them
look anything like that).

And there's a couple of VM users. They don't look particularly critical
either.

So why do you care? Because honestly, calling int_sqrt() once in a
blue moon with caches cold and no branch prediction information tends
to have very different performance characteristics from calling it in
a loop with very predictable input.

So I think your "benchmark" is just garbage, in that it's testing
something entirely different than the actual load.

Also, since this is a generic library routine, no way can we depend on
fls being fast.

But we could certainly improve on the initial value a lot. It's just
that we should probably strive to improve on it without adding extra
branch misprediction or I$ misses - both things that your benchmark
isn't actually testing at all, since it does the exact opposite of
that by basically preloading both.

And the *most* important question is that first one:

  "Why does this matter, and what is the range it matters for?"

             Linus
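
For reference, the argument-range reasoning above can be sketched in a
few lines of C. This is only an illustration, not code from this thread
or from the kernel tree: max_after_shift() and isqrt_sketch() are
hypothetical names, a 32-bit "unsigned int" is assumed, and the square
root is a generic shift-and-subtract routine that picks its starting
bit with a plain loop rather than fls.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: the largest value an "unsigned int" can still
 * hold after being shifted right by `shift` bits. With the assumed
 * 32-bit unsigned int and a shift of 12, that leaves 20 bits.
 */
static unsigned int max_after_shift(unsigned int shift)
{
	assert(shift < sizeof(unsigned int) * CHAR_BIT);
	return UINT_MAX >> shift;
}

/*
 * Hypothetical stand-in for an integer square root: classic
 * shift-and-subtract, returning floor(sqrt(x)).
 */
static unsigned long isqrt_sketch(unsigned long x)
{
	unsigned long b, m, y = 0;

	if (x <= 1)
		return x;

	/* Start at the highest power of four that fits in unsigned long... */
	m = 1UL << (sizeof(unsigned long) * CHAR_BIT - 2);

	/* ...and lower it until it no longer exceeds x (no fls needed). */
	while (m > x)
		m >>= 2;

	while (m != 0) {
		b = y + m;
		y >>= 1;
		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}
	return y;
}
```

Under those assumptions, max_after_shift(12) is 0xFFFFF, and
isqrt_sketch() of that value is 1023, so a 20-bit argument never
produces a result wider than 10 bits. Whether the starting-bit loop
helps or hurts depends on exactly the cold-cache and branch-prediction
effects discussed above, which this sketch does not measure.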