From: Crt Mori
Date: Fri, 22 Dec 2017 14:44:35 +0100
Subject: Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
To: David Laight
Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
    Kees Cook, Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
    Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
    "linux-kernel@vger.kernel.org", "linux-iio@vger.kernel.org", Joe Perches
X-Mailing-List: linux-kernel@vger.kernel.org

From simple strong typing of the existing int_sqrt we have come to
something a bit more complex, or better. Can we decide now which one we
want in, or should I submit v12 and we decide then (although it would
not really be a v12, but a whole new thing)?

On 21 December 2017 at 15:48, David Laight wrote:
> From: Peter Zijlstra
>> Sent: 21 December 2017 14:12
> ...
>> > > This part above looks like FLS
>> > It also does the rest of the required shifts.
>>
>> Still, fls() + shift is way faster on hardware that has an fls
>> instruction.
>>
>> Writing out that binary search doesn't make sense.
>
> If the hardware doesn't have an appropriate fls instruction
> the soft fls() will be worse.
>
> If you used fls() you'd still need quite a bit of code
> to generate the correct shift and loop count adjustment.
> Given the cost of the loop iterations the 3 tests are noise.
> The open coded version is obviously correct...
>
> I didn't add the 4th one because the code always does 2 iterations.
>
> If you were really worried about performance there are faster
> algorithms (even doing 2 or 4 bits at a time is faster).
>
>	David
>