From: Ard Biesheuvel
Date: Tue, 28 Sep 2021 23:04:03 +0200
Subject: Re: [PATCH] crypto: arm64/gcm-ce - unroll factors to 4-way interleave of aes and ghash
To: Eric Biggers
Cc: XiaokangQian, Herbert Xu, "David S. Miller", Catalin Marinas, Will Deacon, nd, Linux Crypto Mailing List, Linux ARM, Linux Kernel Mailing List
References: <20210923063027.166247-1-xiaokang.qian@arm.com>
X-Mailing-List: linux-crypto@vger.kernel.org

On Tue, 28 Sept 2021 at 08:27, Eric Biggers wrote:
>
> On Thu, Sep 23, 2021 at 06:30:25AM +0000, XiaokangQian wrote:
> > To improve performance on cores with deep piplines such as A72,N1,
> > implement gcm(aes) using a 4-way interleave of aes and ghash (totally
> > 8 blocks in parallel), which can make full utilize of pipelines rather
> > than the 4-way interleave we used currently. It can gain about 20% for
> > big data sizes such that 8k.
> >
> > This is a complete new version of the GCM part of the combined GCM/GHASH
> > driver, it will co-exist with the old driver, only serve for big data
> > sizes. Instead of interleaving four invocations of AES where each chunk
> > of 64 bytes is encrypted first and then ghashed, the new version uses a
> > more coarse grained approach where a chunk of 64 bytes is encrypted and
> > at the same time, one chunk of 64 bytes is ghashed (or ghashed and
> > decrypted in the converse case).
> >
> > The table below compares the performance of the old driver and the new
> > one on various micro-architectures and running in various modes with
> > various data sizes.
> >
> >        |      AES-128      |      AES-192      |      AES-256      |
> > #bytes | 1024 | 1420 | 8k  | 1024 | 1420 | 8k  | 1024 | 1420 | 8k  |
> > -------+------+------+-----+------+------+-----+------+------+-----+
> > A72    | 5.5% | 12%  | 25% | 2.2% | 9.5% | 23% | -1%  | 6.7% | 19% |
> > A57    |-0.5% | 9.3% | 32% | -3%  | 6.3% | 26% | -6%  | 3.3% | 21% |
> > N1     | 0.4% | 7.6% |24.5%| -2%  | 5%   | 22% | -4%  | 2.7% | 20% |
> >
> > Signed-off-by: XiaokangQian
>
> Does this pass the self-tests, including the fuzz tests which are enabled by
> CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y?
>

Please test both little-endian and big-endian. (Note that you don't
need a big-endian user space for this - the self tests are executed
before the rootfs is mounted.)

Also, you will have to rebase this onto the latest cryptodev tree,
which carries some changes I made recently to this driver.

Finally, I'd like to discuss whether we really need two separate
drivers here. The 1k data point is not as relevant as the other ones,
which show a worthwhile speedup for all micro-architectures and data
sizes (although I will give this a spin on TX2 myself when I have the
chance).

*If* we switch to this implementation completely, I would like to keep
the improvement I added recently to the decrypt path to compare the
tag using SIMD code, rather than copying it out and using memcmp().
Could you look into adopting this for this version as well?

--
Ard.
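
For reference, the test runs requested above could be set up with a
config fragment along the following lines. This is only a sketch: apart
from CONFIG_CRYPTO_MANAGER_EXTRA_TESTS, none of these option names
appear in this thread. They are assumptions based on the mainline
Kconfig of this period, so check them against the tree you build.

```
# Sketch only. All symbols except CONFIG_CRYPTO_MANAGER_EXTRA_TESTS are
# assumptions, not taken from this thread.

# The boot-time self-tests must not be disabled.
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set

# Enable the extra (fuzz) tests that Eric mentions.
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y

# Build the arm64 GHASH/GCM CE driver in, so that its self-tests run at
# boot, before the rootfs is mounted (assumed symbol for this driver).
CONFIG_CRYPTO_GHASH_ARM64_CE=y

# For the second run only: build a big-endian kernel. Leave this unset
# for the little-endian run.
CONFIG_CPU_BIG_ENDIAN=y
```

Building twice, once with and once without the last option, covers both
endiannesses. As noted above, the big-endian run does not need a
matching user space.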