Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756177AbcLTRWR (ORCPT ); Tue, 20 Dec 2016 12:22:17 -0500 Received: from mail-pf0-f196.google.com ([209.85.192.196]:35331 "EHLO mail-pf0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755220AbcLTRWO (ORCPT ); Tue, 20 Dec 2016 12:22:14 -0500 Subject: Re: [PATCH v6 2/5] lib: implement __arch_bitrev8x4() To: Will Deacon References: <6c1c052d3c1d0c02a791aaaf8e114360ab1cb4e7.1481918884.git.stillcompiling@gmail.com> <20161219100614.GC4508@arm.com> Cc: Alan Tull , Moritz Fischer , Russell King , Catalin Marinas , Shawn Guo , Sascha Hauer , Fabio Estevam , Mark Rutland , Rob Herring , Anatolij Gustschin , devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fpga@vger.kernel.org From: Joshua Clayton Message-ID: <70fd505b-5a53-c380-8b9e-341f2be84e2d@gmail.com> Date: Tue, 20 Dec 2016 09:22:11 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <20161219100614.GC4508@arm.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2197 Lines: 66 Will, On 12/19/2016 02:06 AM, Will Deacon wrote: > On Fri, Dec 16, 2016 at 03:17:51PM -0800, Joshua Clayton wrote: >> Implement faster bitrev8x4() for arm, arm64 and mips, all the platforms >> with CONFIG_HAVE_ARCH_BITREVERSE. >> ARM platforms just need a byteswap added to the existing __arch_bitrev32() >> Amusingly, the mips implementation is exactly the opposite, requiring >> removal of the byteswap from its __arch_bitrev32() >> >> Signed-off-by: Joshua Clayton >> --- >> arch/arm/include/asm/bitrev.h | 6 ++++++ >> arch/arm64/include/asm/bitrev.h | 6 ++++++ >> arch/mips/include/asm/bitrev.h | 6 ++++++ >> include/linux/bitrev.h | 1 + >> 4 files changed, 19 insertions(+) >> >> diff --git a/arch/arm/include/asm/bitrev.h b/arch/arm/include/asm/bitrev.h >> index ec291c3..9482f78 100644 >> --- a/arch/arm/include/asm/bitrev.h >> +++ b/arch/arm/include/asm/bitrev.h >> @@ -17,4 +17,10 @@ static __always_inline __attribute_const__ u8 __arch_bitrev8(u8 x) >> return __arch_bitrev32((u32)x) >> 24; >> } >> >> +static __always_inline __attribute_const__ u32 __arch_bitrev8x4(u32 x) >> +{ >> + __asm__ ("rbit %0, %1; rev %0, %0" : "=r" (x) : "r" (x)); >> + return x; >> +} >> + >> #endif >> diff --git a/arch/arm64/include/asm/bitrev.h b/arch/arm64/include/asm/bitrev.h >> index a5a0c36..1801078 100644 >> --- a/arch/arm64/include/asm/bitrev.h >> +++ b/arch/arm64/include/asm/bitrev.h >> @@ -16,4 +16,10 @@ static __always_inline __attribute_const__ u8 __arch_bitrev8(u8 x) >> return __arch_bitrev32((u32)x) >> 24; >> } >> >> +static __always_inline __attribute_const__ u32 __arch_bitrev8x4(u32 x) >> +{ >> + __asm__ ("rbit %0, %1; rev %0, %0" : "=r" (x) : "r" (x)); > This is broken -- you're operating on 64-bit registers. I only noticed > because if you write: Ugh. mea culpa. I squinted at the AARCH64 asm and erroneously believed it to be the same as arm. > swab32(bitrev32(x)) > > then GCC generates: > > rbit w0, w0 > rev w0, w0 > > so perhaps we should just implement the asm-generic version like that > and not bother with the arch-specific stuff? > > Will You are so right. That is exactly what I will do. Thanks, Joshua