Date: Thu, 5 Feb 2015 10:20:29 -0800
From: Linus Torvalds
To: Anshul Garg
Cc: Davidlohr Bueso, Linux Kernel Mailing List, "anshul.g@samsung.com"
Subject: Re: [PATCH] lib/int_sqrt.c: Optimize square root function
References: <1422897162-111998-1-git-send-email-aksgarg1989@gmail.com> <1422938843.2293.4.camel@stgolabs.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 3, 2015 at 7:42 AM, Anshul Garg wrote:
>
> I have done profiling of int_sqrt function using perf tool for 10 times.
> For this purpose i have created a userspace program which uses sqrt function
> from 1 to a million.

Hmm. I did that too, and your patch doesn't improve things for me. In
fact, it makes it slower (-DREDUCE is the reduction loop from the patch,
no flag is the current code):

    [torvalds@i7 ~]$ gcc -Wall -O2 -DREDUCE int_sqrt.c ; time ./a.out

    real    0m2.098s
    user    0m2.095s
    sys     0m0.000s

    [torvalds@i7 ~]$ gcc -Wall -O2 int_sqrt.c ; time ./a.out

    real    0m1.886s
    user    0m1.883s
    sys     0m0.000s

and the profile shows that 35% of the time is spent on the loop-closing
branch of the initial reduction loop.

In contrast, my suggested "reduce just once" does seem to improve
things:

    [torvalds@i7 ~]$ gcc -Wall -O2 -DONCE int_sqrt.c ; time ./a.out

    real    0m1.436s
    user    0m1.434s
    sys     0m0.000s

but it's kind of hacky.

NOTE! This probably depends a lot on microarchitecture details,
including very much branch predictor etc.
And I didn't actually check that the "reduce once" version gives the
right result, but I do think that this optimization needs to be looked
at more if we want to do it.

I was running this on an i7-4770S, fwiw.

Attached is the stupid test-program I used to do the above. Maybe I
did something wrong.

                    Linus

Attachment: int_sqrt.c

/*
 * Copyright (C) 2013 Davidlohr Bueso <davidlohr.bueso@hp.com>
 *
 *  Based on the shift-and-subtract algorithm for computing integer
 *  square root from Guy L. Steele.
 */

#define BITS_PER_LONG (8*sizeof(long))

/**
 * int_sqrt - rough approximation to sqrt
 * @x: integer of which to calculate the sqrt
 *
 * A very rough approximation to the sqrt() function.
 */
unsigned long __attribute__((noinline)) int_sqrt(unsigned long x)
{
	unsigned long b, m, y = 0;

	if (x <= 1)
		return x;

	m = 1UL << (BITS_PER_LONG - 2);

	/*
	 * -DREDUCE: drop m by a power of four per iteration until m <= x.
	 * -DONCE:   one compare; start at 1UL << (BITS_PER_LONG/2 - 2)
	 *           when x is no larger than that.
	 * neither:  always start from the top bit pair.
	 */
#ifdef REDUCE
	while (m > x)
		m >>= 2;
#elif defined(ONCE)
	{
		unsigned long n = m >> (BITS_PER_LONG/2);
		m = (n >= x) ? n : m;
	}
#endif
	while (m != 0) {
		b = y + m;
		y >>= 1;

		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}

	return y;
}

int main(int argc, char **argv)
{
	unsigned long i;

	for (i = 0; i < 100000000; i++) {
		unsigned long a = int_sqrt(i);
		/* empty asm that uses 'a', so the call cannot be optimized away */
		asm volatile("": : "r" (a));
	}
	return 0;
}