MIME-Version: 1.0
In-Reply-To: <20170720112449.6xvc2ghaj3jh6w7l@hirez.programming.kicks-ass.net>
References: <aksgarg1989@gmail.com> <1422897162-111998-1-git-send-email-aksgarg1989@gmail.com>
 <CA+55aFw97rV3yHVtCoS0CySLpziYOMBxY+U4QsVEw+8o8gZDXQ@mail.gmail.com>
 <CA+55aFxnq=QB0xAR1KW+25WcD+Z2aFrBhPa9oVhGr=PjqMTsmA@mail.gmail.com> <20170720112449.6xvc2ghaj3jh6w7l@hirez.programming.kicks-ass.net>
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu, 20 Jul 2017 11:31:36 -0700
Message-ID: <CA+55aFyBxxzNHH_z2BDNP5kmupSMa07wKK+6j=aURHN-tbMSQg@mail.gmail.com>
Subject: Re: [PATCH] lib/int_sqrt.c: Optimize square root function
To: Peter Zijlstra <peterz@infradead.org>
Cc: Anshul Garg <aksgarg1989@gmail.com>,
        Davidlohr Bueso <dave@stgolabs.net>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        "anshul.g@samsung.com" <anshul.g@samsung.com>,
        Thomas Gleixner <tglx@linutronix.de>, Joe Perches <joe@perches.com>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2361
Lines: 56

How did this two-year-old thread get resurrected?

Anyway, it got resurrected without even answering one core question:

On Thu, Jul 20, 2017 at 4:24 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Feb 02, 2015 at 11:13:44AM -0800, Linus Torvalds wrote:
>> On Mon, Feb 2, 2015 at 11:00 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> >
>> > (I'm also not entirely sure what uses int_sqrt() that ends up being so
>> > performance-critical, so it would be good to document that too, since
>> > that probably also matters for the "what's the normal argument range"
>> > question..)

This is still the case. Which of the (very few) users really _care_?
And what are the normal values for that?

For example, the 802.11 minstrel code does a "MINSTREL_TRUNC()" on a
"unsigned int" value that is in millions. It's not even "unsigned
long", so we know it's not many thousands of millions, and
MINSTREL_TRUNC shifts it down by 12 bits.

So we know we have at most a 20-bit argument.
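
The bound can be checked with a few lines of C. This is only a sketch:
it assumes a 32-bit "unsigned int", and SKETCH_SCALE / SKETCH_TRUNC are
hypothetical stand-ins for the shift-by-12 macro described above, not
the actual minstrel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the macro described above: drop 12 bits. */
#define SKETCH_SCALE 12
#define SKETCH_TRUNC(val) ((val) >> SKETCH_SCALE)

/* Largest value a 32-bit unsigned argument can have after truncation. */
static uint32_t max_truncated_arg(void)
{
	uint32_t max = SKETCH_TRUNC(UINT32_MAX);

	/* 32 bits minus the 12 shifted out leaves at most 20 bits. */
	assert(max < (UINT32_C(1) << 20));
	return max;
}
```

With those assumptions the largest possible argument is 0xFFFFF
(1048575), whose integer square root is 1023, so the result never needs
more than 10 bits.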

The one case that uses actual unsigned long seems to be
"slow_is_prime_number()", and honestly, the sqrt() is the *least* of
our problems there.

There are a few drivers and filesystems that use it. I do not believe
performance matters in those cases, even if you did an "int_sqrt()" per
network packet on some high-performance network (and none of them
look anything like that).

And there's a couple of VM users. They don't look particularly critical either.

So why do you care? Because honestly, calling int_sqrt() once in a
blue moon with caches cold and no branch prediction information tends
to have very different performance characteristics from calling it in
a loop with very predictable input.

So I think your "benchmark" is just garbage, in that it's testing
something entirely different than the actual load.

Also, since this is a generic library routine, no way can we depend on
fls being fast.

But we could certainly improve on the initial value a lot. It's just
that we should probably strive to improve on it without adding extra
branch misprediction or I$ misses - both things that your benchmark
isn't actually testing at all, since it does the exact opposite of
that by basically preloading both.

And the *most* important question is that first one:

 "Why does this matter, and what is the range it matters for?"

                  Linus