Date: Wed, 23 Mar 2022 18:37:42 -0700
From: Eric Biggers
To: Nathan Huckleberry
Cc: linux-crypto@vger.kernel.org, Herbert Xu, "David S. Miller",
	linux-arm-kernel@lists.infradead.org, Paul Crowley, Sami Tolvanen,
	Ard Biesheuvel
Subject: Re: [PATCH v3 7/8] crypto: arm64/polyval: Add PMULL accelerated
 implementation of POLYVAL
References: <20220315230035.3792663-1-nhuck@google.com>
 <20220315230035.3792663-8-nhuck@google.com>
In-Reply-To: <20220315230035.3792663-8-nhuck@google.com>
X-Mailing-List: linux-crypto@vger.kernel.org

On Tue, Mar 15, 2022 at 11:00:34PM +0000, Nathan Huckleberry wrote:
> Add hardware accelerated version of POLYVAL for ARM64 CPUs with
> Crypto Extension support.

Nit: It's "Crypto Extensions", not "Crypto Extension".

> +config CRYPTO_POLYVAL_ARM64_CE
> +	tristate "POLYVAL using ARMv8 Crypto Extensions (for HCTR2)"
> +	depends on KERNEL_MODE_NEON
> +	select CRYPTO_CRYPTD
> +	select CRYPTO_HASH
> +	select CRYPTO_POLYVAL

CRYPTO_POLYVAL already selects CRYPTO_HASH, so there's no need to select
it here.

> +/*
> + * Perform polynomial evaluation as specified by POLYVAL. This computes:
> + *	h^n * accumulator + h^n * m_0 + ... + h^1 * m_{n-1}
> + * where n=nblocks, h is the hash key, and m_i are the message blocks.
> + *
> + * x0 - pointer to message blocks
> + * x1 - pointer to precomputed key powers h^8 ... h^1
> + * x2 - number of blocks to hash
> + * x3 - pointer to accumulator
> + *
> + * void pmull_polyval_update(const u8 *in, const struct polyval_ctx *ctx,
> + *			     size_t nblocks, u8 *accumulator);
> + */
> +SYM_FUNC_START(pmull_polyval_update)
> +	adr	TMP, .Lgstar
> +	ld1	{GSTAR.2d}, [TMP]
> +	ld1	{SUM.16b}, [x3]
> +	ands	PARTIAL_LEFT, BLOCKS_LEFT, #7
> +	beq	.LskipPartial
> +	partial_stride
> +.LskipPartial:
> +	subs	BLOCKS_LEFT, BLOCKS_LEFT, #NUM_PRECOMPUTE_POWERS
> +	blt	.LstrideLoopExit
> +	ld1	{KEY8.16b, KEY7.16b, KEY6.16b, KEY5.16b}, [x1], #64
> +	ld1	{KEY4.16b, KEY3.16b, KEY2.16b, KEY1.16b}, [x1], #64
> +	full_stride 0
> +	subs	BLOCKS_LEFT, BLOCKS_LEFT, #NUM_PRECOMPUTE_POWERS
> +	blt	.LstrideLoopExitReduce
> +.LstrideLoop:
> +	full_stride 1
> +	subs	BLOCKS_LEFT, BLOCKS_LEFT, #NUM_PRECOMPUTE_POWERS
> +	bge	.LstrideLoop
> +.LstrideLoopExitReduce:
> +	montgomery_reduction
> +	mov	SUM.16b, PH.16b
> +.LstrideLoopExit:
> +	st1	{SUM.16b}, [x3]
> +	ret
> +SYM_FUNC_END(pmull_polyval_update)

Is there a reason why partial_stride is done first in the arm64
implementation, but last in the x86 implementation? It would be nice if
the implementations worked the same way. Probably last would be better?
What is the advantage of doing it first?

Besides that, many of the comments I made on the x86 implementation
apply to the arm64 implementation too.

- Eric
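[For readers following along: the accumulation that the quoted comment describes, and the full-stride/partial-stride split Eric asks about, can be sketched in Python. This is purely illustrative: it models the GF(2^128) field as integers modulo a small prime and omits the Montgomery reduction entirely, so the prime, the helper names, and the stride width are assumptions for exposition, not the kernel's arithmetic.]

```python
# Sketch of the POLYVAL update structure:
#   acc' = h^n * acc + h^n * m_0 + ... + h^1 * m_{n-1}
# Real POLYVAL works in GF(2^128); here the field is replaced by
# integers mod a stand-in prime purely to show how precomputed key
# powers (h^8 ... h^1) let 8 blocks be folded per pass.

P = 2**61 - 1  # stand-in modulus; NOT the POLYVAL field polynomial

def update_horner(acc, h, blocks):
    """Process one block at a time: acc = (acc + m) * h."""
    for m in blocks:
        acc = (acc + m) * h % P
    return acc

def update_wide(acc, h, blocks, stride=8):
    """Fold `stride` blocks per pass using precomputed powers
    h^stride ... h^1 (mirroring full_stride in the assembly).
    Leftover blocks are handled one at a time, shown last here
    as in the x86 version Eric references."""
    powers = [pow(h, k, P) for k in range(stride, 0, -1)]  # h^8 ... h^1
    n = len(blocks) - len(blocks) % stride
    for i in range(0, n, stride):
        chunk = blocks[i:i + stride]
        # h^8 * acc + h^8*m_0 + h^7*m_1 + ... + h^1*m_7, one "reduction"
        acc = (acc * powers[0]
               + sum(p * m for p, m in zip(powers, chunk))) % P
    return update_horner(acc, h, blocks[n:])  # partial stride
```

Both routines compute the same polynomial, which is why the partial stride can in principle go on either end as long as the key powers are applied consistently.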