Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965891AbdGTXYf (ORCPT ); Thu, 20 Jul 2017 19:24:35 -0400 Received: from mail-oi0-f50.google.com ([209.85.218.50]:36233 "EHLO mail-oi0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S964809AbdGTXYd (ORCPT ); Thu, 20 Jul 2017 19:24:33 -0400 MIME-Version: 1.0 In-Reply-To: <20170720223416.fxkgdtvuqwxxmf3y@hirez.programming.kicks-ass.net> References: <1422897162-111998-1-git-send-email-aksgarg1989@gmail.com> <20170720112449.6xvc2ghaj3jh6w7l@hirez.programming.kicks-ass.net> <20170720223416.fxkgdtvuqwxxmf3y@hirez.programming.kicks-ass.net> From: Linus Torvalds Date: Thu, 20 Jul 2017 16:24:32 -0700 X-Google-Sender-Auth: vMAD3ZzUnIWrXCAR27TmQBUANdk Message-ID: Subject: Re: [PATCH] lib/int_sqrt.c: Optimize square root function To: Peter Zijlstra Cc: Anshul Garg , Davidlohr Bueso , Linux Kernel Mailing List , "anshul.g@samsung.com" , Thomas Gleixner , Joe Perches Content-Type: multipart/mixed; boundary="001a113a1af016681b0554c80d9e" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3109 Lines: 70 --001a113a1af016681b0554c80d9e Content-Type: text/plain; charset="UTF-8" On Thu, Jul 20, 2017 at 3:34 PM, Peter Zijlstra wrote: >> >> "Why does this matter, and what is the range it matters for?" > > I was looking to do some work on the idle estimator. Parts of that keep > online avg and variance for normal distributions. I wanted to bias the > avg downwards, the way to do that is to subtract a scaled stdev from it. > Computing the stdev from a variance requires the sqrt. > > In any case, I suppose the range of values would be in the order of > TICK_NSEC, so the variance would be a number of orders below that. So > we're looking at fairly small numbers <1e5. Ok. So that already cuts down the problem space a lot. And yes, the current code is very suboptimal for exactly the small number case. It's also typiocally the case that right-shifting is more expensive than left-shifting, so the fact that we left-shift twice in that loop for all the big values is extra expensive. So I would actually suggest a different approach: - start with a small 'm' - and grow it up. The "small m" means that there is a small constant, which is good. And growing it up is a single trivial left shift by two, which can also be done just two adds or as a single lea, so it works fine even on architectures with slow shifters etc. So mayube something like the attached? NOTE NOTE NOTE! This may be pure garbage. It's untested. But the code generation looks ok even if gcc is being way too clever and turns that first loop into a counted loop instead. Whatever, maybe it's the right thing to do. But note that I might have broken the sqrt for some case, so you need to double-check my logic there. The *intention* is that it's the exact same thing as it used to do, just initializing 'm' differently. Linus --001a113a1af016681b0554c80d9e Content-Type: text/plain; charset="US-ASCII"; name="patch.diff" Content-Disposition: attachment; filename="patch.diff" Content-Transfer-Encoding: base64 X-Attachment-Id: f_j5d2d2ow0 IGxpYi9pbnRfc3FydC5jIHwgMTggKysrKysrKysrKysrKy0tLS0tCiAxIGZpbGUgY2hhbmdlZCwg MTMgaW5zZXJ0aW9ucygrKSwgNSBkZWxldGlvbnMoLSkKCmRpZmYgLS1naXQgYS9saWIvaW50X3Nx cnQuYyBiL2xpYi9pbnRfc3FydC5jCmluZGV4IDFlZjRjYzM0NDk3Ny4uZGEzYjNkYWJhZDhlIDEw MDY0NAotLS0gYS9saWIvaW50X3NxcnQuYworKysgYi9saWIvaW50X3NxcnQuYwpAQCAtMTYsMTQg KzE2LDIyIEBACiAgKi8KIHVuc2lnbmVkIGxvbmcgaW50X3NxcnQodW5zaWduZWQgbG9uZyB4KQog ewotCXVuc2lnbmVkIGxvbmcgYiwgbSwgeSA9IDA7CisJdW5zaWduZWQgbG9uZyBtLCB5OwogCiAJ aWYgKHggPD0gMSkKIAkJcmV0dXJuIHg7CiAKLQltID0gMVVMIDw8IChCSVRTX1BFUl9MT05HIC0g Mik7Ci0Jd2hpbGUgKG0gIT0gMCkgewotCQliID0geSArIG07CisJbSA9IDY0OworCWRvIHsKKwkJ dW5zaWduZWQgbG9uZyBuZXdfbSA9IG0gPDwgMjsKKwkJaWYgKCFuZXdfbSkKKwkJCWJyZWFrOwor CQltID0gbmV3X207CisJfSB3aGlsZSAobSA8IHgpOworCisJeSA9IDA7CisJZG8geworCQl1bnNp Z25lZCBsb25nIGIgPSB5ICsgbTsKIAkJeSA+Pj0gMTsKIAogCQlpZiAoeCA+PSBiKSB7CkBAIC0z MSw3ICszOSw3IEBAIHVuc2lnbmVkIGxvbmcgaW50X3NxcnQodW5zaWduZWQgbG9uZyB4KQogCQkJ eSArPSBtOwogCQl9CiAJCW0gPj49IDI7Ci0JfQorCX0gd2hpbGUgKG0pOwogCiAJcmV0dXJuIHk7 CiB9Cg== --001a113a1af016681b0554c80d9e--