From: Eric Biggers Subject: Re: [PATCH v3 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS Date: Mon, 18 Jun 2018 14:56:57 -0700 Message-ID: <20180618215657.GB8022@google.com> References: <20180214184223.254359-1-ebiggers@google.com> <20180214184223.254359-4-ebiggers@google.com> <8396d433caf1155f9ca422c6bad3200b@agner.ch> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Cc: Jeffrey Walton , Greg Kaiser , Herbert Xu , Michael Halcrow , Patrik Torstensson , Stefan Agner , Paul Lawrence , linux-fscrypt@vger.kernel.org, "open list:HARDWARE RANDOM NUMBER GENERATOR CORE" , Greg Kroah-Hartman , Alex Cope , linux-crypto-owner@vger.kernel.org, linux-arm-kernel , Paul Crowley To: Ard Biesheuvel Return-path: Content-Disposition: inline In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=m.gmane.org@lists.infradead.org List-Id: linux-crypto.vger.kernel.org On Sun, Jun 17, 2018 at 01:10:41PM +0200, Ard Biesheuvel wrote: > >>>>> + > >>>>> + // One-time XTS preparation > >>>>> + > >>>>> + /* > >>>>> + * Allocate stack space to store 128 bytes worth of tweaks. For > >>>>> + * performance, this space is aligned to a 16-byte boundary so that we > >>>>> + * can use the load/store instructions that declare 16-byte alignment. > >>>>> + */ > >>>>> + sub sp, #128 > >>>>> + bic sp, #0xf > >>>> > >>>> > >>>> This fails here when building with CONFIG_THUMB2_KERNEL=y > >>>> > >>>> AS arch/arm/crypto/speck-neon-core.o > >>>> > >>>> arch/arm/crypto/speck-neon-core.S: Assembler messages: > >>>> > >>>> arch/arm/crypto/speck-neon-core.S:419: Error: r13 not allowed here -- > >>>> `bic sp,#0xf' > >>>> arch/arm/crypto/speck-neon-core.S:423: Error: r13 not allowed here -- > >>>> `bic sp,#0xf' > >>>> arch/arm/crypto/speck-neon-core.S:427: Error: r13 not allowed here -- > >>>> `bic sp,#0xf' > >>>> arch/arm/crypto/speck-neon-core.S:431: Error: r13 not allowed here -- > >>>> `bic sp,#0xf' > >>>> > >>>> In a quick hack this change seems to address it: > >>>> > >>>> > >>>> - sub sp, #128 > >>>> - bic sp, #0xf > >>>> + mov r6, sp > >>>> + sub r6, #128 > >>>> + bic r6, #0xf > >>>> + mov sp, r6 > >>>> > >>>> But there is probably a better solution to address this. > >>>> > >>> > >>> Given that there is no NEON on M class cores, I recommend we put something like > >>> > >>> THUMB(bx pc) > >>> THUMB(nop.w) > >>> THUMB(.arm) > >>> > >>> at the beginning and be done with it. > >> > >> I mean nop.n or just nop, of course, and we may need a '.align 2' at > >> the beginning as well. > > > > Wouldn't it be preferable to have it assemble it in Thumb2 too? It seems > > that bic sp,#0xf is the only issue... > > > > Well, in general, yes. In the case of NEON code, not really, since the > resulting code will not be smaller anyway, because the Thumb2 NEON > opcodes are all 4 bytes. Also, Thumb2-only cores don't have NEON > units, so all cores that this code can run on will be able to run in > ARM mode. > > So from a maintainability pov, having code that only assembles in one > way is better than having code that must compile both to ARM and to > Thumb2 opcodes. > > Just my 2 cents, anyway. I don't have too much of a preference, though Stefan's suggested 4 instructions can be reduced to 3, which also matches what aes-neonbs-core.S does: sub r12, sp, #128 bic r12, #0xf mov sp, r12 Ard, is the following what you're suggesting instead? diff --git a/arch/arm/crypto/speck-neon-core.S b/arch/arm/crypto/speck-neon-core.S index 3c1e203e53b9..c989ce3dc057 100644 --- a/arch/arm/crypto/speck-neon-core.S +++ b/arch/arm/crypto/speck-neon-core.S @@ -8,6 +8,7 @@ */ #include +#include .text .fpu neon @@ -233,6 +234,12 @@ * nonzero multiple of 128. */ .macro _speck_xts_crypt n, decrypting + + .align 2 + THUMB(bx pc) + THUMB(nop) + THUMB(.arm) + push {r4-r7} mov r7, sp @@ -413,6 +420,8 @@ mov sp, r7 pop {r4-r7} bx lr + + THUMB(.thumb) .endm ENTRY(speck128_xts_encrypt_neon)