Date: Fri, 19 Oct 2018 22:51:37 -0700
From: Eric Biggers
To: Ard Biesheuvel
Cc: "open list:HARDWARE RANDOM NUMBER GENERATOR CORE" ,
	linux-fscrypt@vger.kernel.org,
	linux-arm-kernel ,
	Linux Kernel Mailing List ,
	Herbert Xu ,
	Paul Crowley ,
	Greg Kaiser ,
	Michael Halcrow ,
	"Jason A . Donenfeld" ,
	Samuel Neves ,
	Tomer Ashur
Subject: Re: [RFC PATCH v2 10/12] crypto: arm/nhpoly1305 - add NEON-accelerated NHPoly1305
Message-ID: <20181020055136.GD876@sol.localdomain>
References: <20181015175424.97147-1-ebiggers@kernel.org>
 <20181015175424.97147-11-ebiggers@kernel.org>
In-Reply-To:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Oct 20, 2018 at 12:12:56PM +0800, Ard Biesheuvel wrote:
> On 16 October 2018 at 01:54, Eric Biggers wrote:
> > From: Eric Biggers
> >
> > Add an ARM NEON implementation of NHPoly1305, an ε-almost-∆-universal
> > hash function used in the Adiantum encryption mode. For now, only the
> > NH portion is actually NEON-accelerated; the Poly1305 part is less
> > performance-critical so is just implemented in C.
> >
> > Signed-off-by: Eric Biggers
> > ---
> >  arch/arm/crypto/Kconfig                |   5 ++
> >  arch/arm/crypto/Makefile               |   2 +
> >  arch/arm/crypto/nh-neon-core.S         | 116 +++++++++++++++++++++++++
> >  arch/arm/crypto/nhpoly1305-neon-glue.c |  78 +++++++++++++++++
> >  4 files changed, 201 insertions(+)
> >  create mode 100644 arch/arm/crypto/nh-neon-core.S
> >  create mode 100644 arch/arm/crypto/nhpoly1305-neon-glue.c
> >
> > diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
> > index cc932d9bba561..458562a34aabe 100644
> > --- a/arch/arm/crypto/Kconfig
> > +++ b/arch/arm/crypto/Kconfig
> > @@ -122,4 +122,9 @@ config CRYPTO_CHACHA20_NEON
> >  	select CRYPTO_BLKCIPHER
> >  	select CRYPTO_CHACHA20
> >
> > +config CRYPTO_NHPOLY1305_NEON
> > +	tristate "NEON accelerated NHPoly1305 hash function (for Adiantum)"
> > +	depends on KERNEL_MODE_NEON
> > +	select CRYPTO_NHPOLY1305
> > +
> >  endif
> > diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
> > index 005482ff95047..b65d6bfab8e6b 100644
> > --- a/arch/arm/crypto/Makefile
> > +++ b/arch/arm/crypto/Makefile
> > @@ -10,6 +10,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
> >  obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
> >  obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
> >  obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha-neon.o
> > +obj-$(CONFIG_CRYPTO_NHPOLY1305_NEON) += nhpoly1305-neon.o
> >
> >  ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
> >  ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
> > @@ -53,6 +54,7 @@ ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o
> >  crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
> >  crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
> >  chacha-neon-y := chacha-neon-core.o chacha-neon-glue.o
> > +nhpoly1305-neon-y := nh-neon-core.o nhpoly1305-neon-glue.o
> >
> >  ifdef REGENERATE_ARM_CRYPTO
> >  quiet_cmd_perl = PERL $@
> > diff --git a/arch/arm/crypto/nh-neon-core.S b/arch/arm/crypto/nh-neon-core.S
> > new file mode 100644
> > index 0000000000000..434d80ab531c2
> > --- /dev/null
> > +++ b/arch/arm/crypto/nh-neon-core.S
> > @@ -0,0 +1,116 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * NH - ε-almost-universal hash function, NEON accelerated version
> > + *
> > + * Copyright 2018 Google LLC
> > + *
> > + * Author: Eric Biggers
> > + */
> > +
> > +#include
> > +
> > +	.text
> > +	.fpu		neon
> > +
> > +	KEY		.req	r0
> > +	MESSAGE		.req	r1
> > +	MESSAGE_LEN	.req	r2
> > +	HASH		.req	r3
> > +
> > +	PASS0_SUMS	.req	q0
> > +	PASS0_SUM_A	.req	d0
> > +	PASS0_SUM_B	.req	d1
> > +	PASS1_SUMS	.req	q1
> > +	PASS1_SUM_A	.req	d2
> > +	PASS1_SUM_B	.req	d3
> > +	PASS2_SUMS	.req	q2
> > +	PASS2_SUM_A	.req	d4
> > +	PASS2_SUM_B	.req	d5
> > +	PASS3_SUMS	.req	q3
> > +	PASS3_SUM_A	.req	d6
> > +	PASS3_SUM_B	.req	d7
> > +	K0		.req	q4
> > +	K1		.req	q5
> > +	K2		.req	q6
> > +	K3		.req	q7
> > +	T0		.req	q8
> > +	T0_L		.req	d16
> > +	T0_H		.req	d17
> > +	T1		.req	q9
> > +	T1_L		.req	d18
> > +	T1_H		.req	d19
> > +	T2		.req	q10
> > +	T2_L		.req	d20
> > +	T2_H		.req	d21
> > +	T3		.req	q11
> > +	T3_L		.req	d22
> > +	T3_H		.req	d23
> > +
> > +.macro _nh_stride	k0, k1, k2, k3
> > +
> > +	// Load next message stride
> > +	vld1.8		{T3}, [MESSAGE]!
> > +
> > +	// Load next key stride
> > +	vld1.32		{\k3}, [KEY]!
> > +
> > +	// Add message words to key words
> > +	vadd.u32	T0, T3, \k0
> > +	vadd.u32	T1, T3, \k1
> > +	vadd.u32	T2, T3, \k2
> > +	vadd.u32	T3, T3, \k3
> > +
> > +	// Multiply 32x32 => 64 and accumulate
> > +	vmlal.u32	PASS0_SUMS, T0_L, T0_H
> > +	vmlal.u32	PASS1_SUMS, T1_L, T1_H
> > +	vmlal.u32	PASS2_SUMS, T2_L, T2_H
> > +	vmlal.u32	PASS3_SUMS, T3_L, T3_H
> > +.endm
> > +
>
> Since we seem to have some spare NEON registers: would it help to have
> a double round version of this macro?
>

It helps a little bit, but not much. The loads are the only part that can be
optimized further. I think I'd rather have the shorter + simpler version,
unless the loads can be optimized significantly more on other processors.
Also, originally I had it loading the key and message for the next stride
while doing the current one, but that didn't seem worthwhile either.

> > +/*
> > + * void nh_neon(const u32 *key, const u8 *message, size_t message_len,
> > + *		u8 hash[NH_HASH_BYTES])
> > + *
> > + * It's guaranteed that message_len % 16 == 0.
> > + */
> > +ENTRY(nh_neon)
> > +
> > +	vld1.32		{K0,K1}, [KEY]!
> > +	vmov.u64	PASS0_SUMS, #0
> > +	vmov.u64	PASS1_SUMS, #0
> > +	vld1.32		{K2}, [KEY]!
> > +	vmov.u64	PASS2_SUMS, #0
> > +	vmov.u64	PASS3_SUMS, #0
> > +
> > +	subs		MESSAGE_LEN, MESSAGE_LEN, #64
> > +	blt		.Lloop4_done
> > +.Lloop4:
> > +	_nh_stride	K0, K1, K2, K3
> > +	_nh_stride	K1, K2, K3, K0
> > +	_nh_stride	K2, K3, K0, K1
> > +	_nh_stride	K3, K0, K1, K2
> > +	subs		MESSAGE_LEN, MESSAGE_LEN, #64
> > +	bge		.Lloop4
> > +
> > +.Lloop4_done:
> > +	ands		MESSAGE_LEN, MESSAGE_LEN, #63
> > +	beq		.Ldone
> > +	_nh_stride	K0, K1, K2, K3
> > +
> > +	subs		MESSAGE_LEN, MESSAGE_LEN, #16
> > +	beq		.Ldone
> > +	_nh_stride	K1, K2, K3, K0
> > +
> > +	subs		MESSAGE_LEN, MESSAGE_LEN, #16
> > +	beq		.Ldone
> > +	_nh_stride	K2, K3, K0, K1
> > +
> > +.Ldone:
> > +	// Sum the accumulators for each pass, then store the sums to 'hash'
> > +	vadd.u64	T0_L, PASS0_SUM_A, PASS0_SUM_B
> > +	vadd.u64	T0_H, PASS1_SUM_A, PASS1_SUM_B
> > +	vadd.u64	T1_L, PASS2_SUM_A, PASS2_SUM_B
> > +	vadd.u64	T1_H, PASS3_SUM_A, PASS3_SUM_B
> > +	vst1.8		{T0-T1}, [HASH]
> > +	bx		lr
> > +ENDPROC(nh_neon)
> > diff --git a/arch/arm/crypto/nhpoly1305-neon-glue.c b/arch/arm/crypto/nhpoly1305-neon-glue.c
> > new file mode 100644
> > index 0000000000000..df48a00f4c50f
> > --- /dev/null
> > +++ b/arch/arm/crypto/nhpoly1305-neon-glue.c
> > @@ -0,0 +1,78 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * NHPoly1305 - ε-almost-∆-universal hash function for Adiantum
> > + * (NEON accelerated version)
> > + *
> > + * Copyright 2018 Google LLC
> > + */
> > +
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +
> > +asmlinkage void nh_neon(const u32 *key, const u8 *message, size_t message_len,
> > +			u8 hash[NH_HASH_BYTES]);
> > +
> > +static void _nh_neon(const u32 *key, const u8 *message, size_t message_len,
> > +		     __le64 hash[NH_NUM_PASSES])
> > +{
> > +	nh_neon(key, message, message_len, (u8 *)hash);
> > +}
> > +
>
> Why do we need this function?
>

For now it's not needed, so I should probably just remove it, but it seems
likely that indirect calls to assembly functions in the kernel will be going
away in order to add support for CFI (control flow integrity). The android-4.9
and android-4.14 kernels support CFI on arm64, so you might notice that some of
the arm64 crypto code had to be patched for this reason.

- Eric
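For readers following the assembly, here is a scalar C model of what nh_neon() computes. It is a minimal sketch, not the kernel's own generic NH code: `nh_scalar` and `load_le32` are hypothetical names, message words are read as little-endian (matching `vld1.8` on a little-endian ARM kernel), and the four pass sums are returned as host-order `uint64_t` values rather than `__le64`. Per 16-byte stride, each pass adds the four message words to four key words (`vadd.u32`, wrapping mod 2^32), multiplies word 0 by word 2 and word 1 by word 3 (`vmlal.u32` on the low and high halves of a T register), and accumulates into 64-bit sums. Pass p of stride s uses key words 4(s+p) through 4(s+p)+3, which is why the macro rotates K0..K3 and loads only one new key vector per stride.

```c
#include <stddef.h>
#include <stdint.h>

#define NH_NUM_PASSES   4   /* PASS0..PASS3 in the NEON code */
#define NH_STRIDE_BYTES 16  /* one _nh_stride consumes 16 message bytes */

/* Read a little-endian 32-bit word from a possibly unaligned pointer. */
static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical scalar model of nh_neon().  Assumes message_len is a
 * multiple of 16 and that key holds message_len / 4 + 12 words.
 */
static void nh_scalar(const uint32_t *key, const uint8_t *message,
		      size_t message_len, uint64_t hash[NH_NUM_PASSES])
{
	uint64_t sums[NH_NUM_PASSES] = { 0 };

	for (; message_len >= NH_STRIDE_BYTES; message_len -= NH_STRIDE_BYTES) {
		uint32_t m0 = load_le32(message + 0);
		uint32_t m1 = load_le32(message + 4);
		uint32_t m2 = load_le32(message + 8);
		uint32_t m3 = load_le32(message + 12);

		for (int p = 0; p < NH_NUM_PASSES; p++) {
			/* Pass p is offset by 4 key words per pass. */
			const uint32_t *k = key + 4 * p;
			/* 32-bit adds wrap mod 2^32, like vadd.u32. */
			uint32_t a0 = m0 + k[0], a1 = m1 + k[1];
			uint32_t a2 = m2 + k[2], a3 = m3 + k[3];
			/* 32x32 => 64 products, accumulated mod 2^64. */
			sums[p] += (uint64_t)a0 * a2 + (uint64_t)a1 * a3;
		}
		/* Slide the key window by one stride (4 words). */
		key += 4;
		message += NH_STRIDE_BYTES;
	}
	for (int p = 0; p < NH_NUM_PASSES; p++)
		hash[p] = sums[p];
}
```

Unrolling by four strides, as nh_neon does in .Lloop4, only changes scheduling; the code after .Lloop4_done handles the remaining zero to three strides the same way.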
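The CFI point in the reply can be illustrated with a small userspace sketch. Clang's `-fsanitize=cfi-icall` checks, at each indirect call, that the target's type matches the function-pointer type used at the call site. A C wrapper like `_nh_neon`, whose prototype matches the callback type exactly, can be called indirectly by shared code, while the assembly routine with its different `u8 hash[]` prototype is only ever called directly. All names here (`nh_fn_t`, `nh_asm_stub`, `nh_wrapper`, `run_nh`) are hypothetical. `nh_asm_stub` is a C stand-in for the assembly function, and since the quoted part of the glue file does not show how `_nh_neon` is used, the callback type is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NH_NUM_PASSES 4
#define NH_HASH_BYTES (NH_NUM_PASSES * 8)

/* Assumed callback type used by shared code for indirect calls. */
typedef void (*nh_fn_t)(const uint32_t *key, const uint8_t *message,
			size_t message_len, uint64_t hash[NH_NUM_PASSES]);

/*
 * Hypothetical stand-in for the assembly routine: note its byte-oriented
 * prototype differs from nh_fn_t.  It writes a recognizable pattern.
 */
static void nh_asm_stub(const uint32_t *key, const uint8_t *message,
			size_t message_len, uint8_t hash[NH_HASH_BYTES])
{
	(void)key;
	(void)message;
	memset(hash, 0, NH_HASH_BYTES);
	hash[0] = (uint8_t)message_len;
}

/* C wrapper with exactly the nh_fn_t type; only this is called indirectly. */
static void nh_wrapper(const uint32_t *key, const uint8_t *message,
		       size_t message_len, uint64_t hash[NH_NUM_PASSES])
{
	/* Direct call: the type mismatch is resolved by this cast. */
	nh_asm_stub(key, message, message_len, (uint8_t *)hash);
}

/* Shared code that only knows the callback type. */
static void run_nh(nh_fn_t nh, const uint32_t *key, const uint8_t *msg,
		   size_t len, uint64_t hash[NH_NUM_PASSES])
{
	/* Indirect call whose target type matches nh_fn_t exactly. */
	nh(key, msg, len, hash);
}
```

Passing `nh_asm_stub` itself to `run_nh` would need a cast between incompatible function-pointer types, which is exactly the kind of indirect call a type-checking CFI scheme rejects.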