Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753437AbcD0UJB (ORCPT ); Wed, 27 Apr 2016 16:09:01 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:48864 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752786AbcD0UJA (ORCPT ); Wed, 27 Apr 2016 16:09:00 -0400 Date: Wed, 27 Apr 2016 13:08:58 -0700 From: Andrew Morton To: zengzhaoxiu@163.com Cc: linux@horizon.com, peterz@infradead.org, linux-kernel@vger.kernel.org, Zhaoxiu Zeng Subject: Re: [patch V2] lib: GCD: add binary GCD algorithm Message-Id: <20160427130858.b64a654d0b0d228d8a06a4d4@linux-foundation.org> In-Reply-To: <1461744350-24156-1-git-send-email-zengzhaoxiu@163.com> References: <1461744350-24156-1-git-send-email-zengzhaoxiu@163.com> X-Mailer: Sylpheed 3.4.1 (GTK+ 2.24.23; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3109 Lines: 101 On Wed, 27 Apr 2016 16:05:50 +0800 zengzhaoxiu@163.com wrote: > From: Zhaoxiu Zeng > > Because some architectures (alpha, armv6, etc.) don't provide hardware division, > the mod operation is slow! Binary GCD algorithm uses simple arithmetic operations, > it replaces division with arithmetic shifts, comparisons, and subtraction. > > I have compiled successfully with x86_64_defconfig and i386_defconfig. > > I use the following code to test: > > ... > > Compiled with "-O2", on "VirtualBox 4.2.0-35-generic #40-Ubuntu x86_64" got: > > zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd > gcd0: elapsed 92281 > gcd1: elapsed 55005 > gcd2: elapsed 91088 > zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd > gcd0: elapsed 115546 > gcd1: elapsed 55928 > gcd2: elapsed 91938 > zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd > gcd0: elapsed 91189 > gcd1: elapsed 55493 > gcd2: elapsed 90078 > zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd > gcd0: elapsed 157364 > gcd1: elapsed 55204 > gcd2: elapsed 90058 > zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd > gcd0: elapsed 91667 > gcd1: elapsed 54641 > gcd2: elapsed 91364 Can you please include a summary which describes the overall impact of the patch? eg, how faster does a%b become on > --- a/lib/gcd.c > +++ b/lib/gcd.c > @@ -2,19 +2,82 @@ > #include > #include > > -/* Greatest common divisor */ > +/* > + * use __ffs if the CPU has efficient __ffs > + */ > +#if (defined(CONFIG_ALPHA) && defined(CONFIG_ALPHA_EV6) && defined(CONFIG_ALPHA_EV67)) || \ > + defined(CONFIG_ARC) || \ > + (defined(CONFIG_ARM) && __LINUX_ARM_ARCH__ >= 5) || defined(CONFIG_ARM64) || \ > + defined(CONFIG_AVR32) || \ > + defined(CONFIG_BLACKFIN) || \ > + defined(CONFIG_C6X) || \ > + defined(CONFIG_CRIS) || \ > + defined(CONFIG_FRV) || \ > + defined(CONFIG_HEXAGON) || \ > + defined(CONFIG_IA64) || \ > + (defined(CONFIG_M68K) && \ > + (!defined(CONFIG_CPU_HAS_NO_BITFIELDS) || \ > + ((defined(__mcfisaaplus__) || defined(__mcfisac__)) && \ > + !defined(CONFIG_M68000) && !defined(CONFIG_MCPU32)))) || \ > + defined(CONFIG_MN10300) || \ > + defined(CONFIG_OPENRISC) || \ > + defined(CONFIG_POWERPC) || \ > + defined(CONFIG_S390) || \ > + defined(CONFIG_TILE) || \ > + defined(CONFIG_UNICORE32) || \ > + defined(CONFIG_X86) || \ > + defined(CONFIG_XTENSA) > +# define USE_FFS 1 > +#elif defined(CONFIG_MIPS) > +# define USE_FFS (__builtin_constant_p(cpu_has_clo_clz) && cpu_has_clo_clz) > +#else > +# define USE_FFS 0 > +#endif Yikes. Normally we'd create a new Kconfig variable and do this in the individual arch/XXX/Kconfig files. Can we just assume CONFIG_ARCH_HAS_SLOW_FFS is false and then set it to true for MIPS, arm, etc? > +/* > + * This implements the binary GCD algorithm. (Often attributed to Stein, > + * but as Knith has noted, appears a first-century Chinese math text.) > + */ Knith might be Knuth? > unsigned long gcd(unsigned long a, unsigned long b) > { > - unsigned long r; > + unsigned long r = a | b; > + > + if (!a || !b) > + return r;