From: Nathan Huckleberry
Date: Mon, 4 Apr 2022 20:55:54 -0500
Subject: Re: [PATCH v3 7/8] crypto: arm64/polyval: Add PMULL accelerated implementation of POLYVAL
To: Eric Biggers
Cc: linux-crypto@vger.kernel.org, Herbert Xu, "David S. Miller", linux-arm-kernel@lists.infradead.org, Paul Crowley, Sami Tolvanen, Ard Biesheuvel
References: <20220315230035.3792663-1-nhuck@google.com> <20220315230035.3792663-8-nhuck@google.com>
X-Mailing-List: linux-crypto@vger.kernel.org

On Wed, Mar 23, 2022 at 8:37 PM Eric Biggers wrote:
>
> On Tue, Mar 15, 2022 at 11:00:34PM +0000, Nathan Huckleberry wrote:
> > Add hardware accelerated version of POLYVAL for ARM64 CPUs with
> > Crypto Extension support.
>
> Nit: It's "Crypto Extensions", not "Crypto Extension".
>
> > +config CRYPTO_POLYVAL_ARM64_CE
> > +	tristate "POLYVAL using ARMv8 Crypto Extensions (for HCTR2)"
> > +	depends on KERNEL_MODE_NEON
> > +	select CRYPTO_CRYPTD
> > +	select CRYPTO_HASH
> > +	select CRYPTO_POLYVAL
>
> CRYPTO_POLYVAL selects CRYPTO_HASH already, so there's no need to select it
> here.
>
> > +/*
> > + * Perform polynomial evaluation as specified by POLYVAL.  This computes:
> > + *	h^n * accumulator + h^n * m_0 + ... + h^1 * m_{n-1}
> > + * where n=nblocks, h is the hash key, and m_i are the message blocks.
> > + *
> > + * x0 - pointer to message blocks
> > + * x1 - pointer to precomputed key powers h^8 ... h^1
> > + * x2 - number of blocks to hash
> > + * x3 - pointer to accumulator
> > + *
> > + * void pmull_polyval_update(const u8 *in, const struct polyval_ctx *ctx,
> > + *			     size_t nblocks, u8 *accumulator);
> > + */
> > +SYM_FUNC_START(pmull_polyval_update)
> > +	adr	TMP, .Lgstar
> > +	ld1	{GSTAR.2d}, [TMP]
> > +	ld1	{SUM.16b}, [x3]
> > +	ands	PARTIAL_LEFT, BLOCKS_LEFT, #7
> > +	beq	.LskipPartial
> > +	partial_stride
> > +.LskipPartial:
> > +	subs	BLOCKS_LEFT, BLOCKS_LEFT, #NUM_PRECOMPUTE_POWERS
> > +	blt	.LstrideLoopExit
> > +	ld1	{KEY8.16b, KEY7.16b, KEY6.16b, KEY5.16b}, [x1], #64
> > +	ld1	{KEY4.16b, KEY3.16b, KEY2.16b, KEY1.16b}, [x1], #64
> > +	full_stride 0
> > +	subs	BLOCKS_LEFT, BLOCKS_LEFT, #NUM_PRECOMPUTE_POWERS
> > +	blt	.LstrideLoopExitReduce
> > +.LstrideLoop:
> > +	full_stride 1
> > +	subs	BLOCKS_LEFT, BLOCKS_LEFT, #NUM_PRECOMPUTE_POWERS
> > +	bge	.LstrideLoop
> > +.LstrideLoopExitReduce:
> > +	montgomery_reduction
> > +	mov	SUM.16b, PH.16b
> > +.LstrideLoopExit:
> > +	st1	{SUM.16b}, [x3]
> > +	ret
> > +SYM_FUNC_END(pmull_polyval_update)
>
> Is there a reason why partial_stride is done first in the arm64 implementation,
> but last in the x86 implementation?  It would be nice if the implementations
> worked the same way.  Probably last would be better?  What is the advantage of
> doing it first?

It was so I could return early without loading keys into registers,
since I only need them if there's a full stride. I was able to rewrite
it in the same way that the x86 implementation works.

>
> Besides that, many of the comments I made on the x86 implementation apply to the
> arm64 implementation too.
>
> - Eric
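
For anyone following the partial-stride question above, here is a minimal Python model of the evaluation described in the quoted comment. It is not part of the patch or of this thread. It assumes the POLYVAL field and dot product as defined in RFC 8452, and the helper names (dot, stride, update_partial_first, update_partial_last) are made up for illustration. It only shows that handling the partial stride first or last yields the same accumulator; it says nothing about which ordering is faster in assembly.

```python
import random

# POLYVAL field polynomial x^128 + x^127 + x^126 + x^121 + 1 (RFC 8452).
# Bit i of an integer is the coefficient of x^i (little-endian convention).
POLY = (1 << 128) | (1 << 127) | (1 << 126) | (1 << 121) | 1
STRIDE = 8  # mirrors NUM_PRECOMPUTE_POWERS in the quoted patch


def clmul(a, b):
    """Carry-less (GF(2)[x]) multiplication of two polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def dot(a, b):
    """POLYVAL dot product: a * b * x^-128 mod POLY."""
    c = clmul(a, b)
    # Divide by x 128 times; adding POLY first clears bit 0 when needed.
    for _ in range(128):
        if c & 1:
            c ^= POLY
        c >>= 1
    return c


def key_powers(h, count=STRIDE):
    """Return [h^1, h^2, ..., h^count], powers taken with dot()."""
    powers = [h]
    for _ in range(count - 1):
        powers.append(dot(powers[-1], h))
    return powers


def horner(h, blocks, acc=0):
    """Reference evaluation, one block at a time."""
    for m in blocks:
        acc = dot(acc ^ m, h)
    return acc


def stride(powers, blocks, acc):
    """Process 1..STRIDE blocks at once: block i is multiplied by h^(k-i)."""
    k = len(blocks)
    out = 0
    for i, m in enumerate(blocks):
        if i == 0:
            m ^= acc  # fold the running accumulator into the first block
        out ^= dot(m, powers[k - 1 - i])
    return out


def update_partial_first(h, blocks, acc=0):
    """Partial stride first, then full strides (ordering in the quoted v3)."""
    powers = key_powers(h)
    partial = len(blocks) % STRIDE
    if partial:
        acc = stride(powers, blocks[:partial], acc)
    for i in range(partial, len(blocks), STRIDE):
        acc = stride(powers, blocks[i:i + STRIDE], acc)
    return acc


def update_partial_last(h, blocks, acc=0):
    """Full strides first, then the partial stride (ordering Eric suggests)."""
    powers = key_powers(h)
    full = len(blocks) - len(blocks) % STRIDE
    for i in range(0, full, STRIDE):
        acc = stride(powers, blocks[i:i + STRIDE], acc)
    if full < len(blocks):
        acc = stride(powers, blocks[full:], acc)
    return acc


if __name__ == "__main__":
    rng = random.Random(0)
    h = rng.getrandbits(128)
    msg = [rng.getrandbits(128) for _ in range(19)]  # 2 full strides + 3
    print(update_partial_first(h, msg) == update_partial_last(h, msg))
```

Both orderings agree with the block-at-a-time reference because the dot product is associative and distributes over XOR, so the choice between them comes down to code structure and register usage rather than correctness.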