Received: by 2002:a05:6a10:d5a5:0:0:0:0 with SMTP id gn37csp396839pxb; Thu, 30 Sep 2021 08:21:06 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwjYGsWsmWFMZEFFmFB3pYWe5ylgSJ6+KAnnhSHJ1oI2jr20bTv7tMWSO/QhB/dtoQ3gupi X-Received: by 2002:a50:da0a:: with SMTP id z10mr7838159edj.95.1633015265919; Thu, 30 Sep 2021 08:21:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1633015265; cv=none; d=google.com; s=arc-20160816; b=OUnVPF4GEkG5F1F3kS21nmgWNKaebeGZ7tfa1uE5lEKeKfaUVdw+MQqhEMYPxCafJO vN+12G0Zj63dWjk+oexHuJ0M2Jspq8dFd/TPWCy1LlEOVfGl0VheUbBxmCYOrtAzMVkg yN5HdBMMtvUyqA+61Jf1XEoCHFGoaQ6xNk+4kJoufpFkdkSxn1nX/BbGzvDtTYPBMd47 GJzReuwo4OtAMpaTz217NH6XMuO3WYMn9pJ1uDTAXN8qTLAptbA5ZZLszBQ6yCu1zypy 6cTrYB+hSaugDMbTLaJ7WHmkAr6xCsJKd0phFwmCjJJ7YHvLg29cTna6yzLq2NEqjePk l5Zw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=3ZhH8CsyIjZoP6++T/o40BNakkgYAqRzLIuRIqB696c=; b=Nk5pQ6LYTIhEHUZjOCB8lemQZs12Q5y5fkaickNOkY4SdPDb8MKxHUc7a2W1DHYuj3 PFaIy30JnkBD05PhcXq9Nk4M5cL/Uk8CqF4eIl+lflA7Lsw0aNcd/UAjWS+sNntkgxld ouOrAZkjsEhVQY1kmc8t03wrUn94leIDFSCCpsunCw5oQR15AYrosXiSNOrxAs6oZsXw HMM0uW8Z5LvBVdrAla0PW/SEZgfpa4OESlaxj/3v912kQ/k0jAOGQnbDjyySzxiljmJh Em4zD6g8WwEP3X8Pfr3Gqf6IBdu8yoyOHWHFEsuAF1vs1zgdpp7MA8sRtvdZcbwfo/fp ibgw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=qU8OrPaM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id mp6si5124746ejc.384.2021.09.30.08.20.38; Thu, 30 Sep 2021 08:21:05 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=qU8OrPaM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1352027AbhI3O7I (ORCPT + 99 others); Thu, 30 Sep 2021 10:59:08 -0400 Received: from mail.kernel.org ([198.145.29.99]:45868 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1352018AbhI3O7E (ORCPT ); Thu, 30 Sep 2021 10:59:04 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id E18966128E; Thu, 30 Sep 2021 14:57:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1633013841; bh=Ul6eldZNU2CyYIjwQhqq+NM/92KkckTfZvBzAyidh0U=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=qU8OrPaMSytRZUhHLphRmXUU8eJafDbVZdicAkGC6IPTwxIHpPMfnF4bqsgWXNEXn lv4xdKF/dclZfeqCvb9Dx4xCxEGOntmdjjDRAhcyPiwq2yFrpO+S5PsPbFqzxZOOvt ZSBdRCvQej1tY9T+7OYM/c3k1WhoDaJKLQPKCsEuJG5VlJoD9EzLrmoRwp55NGmPqj 2G2uGw9BdnWp+iYb7FpjWHxDKH5BJJWV+cptWJMgUvdrrmkbAbsyE34HAmUMY+9nWb F1zLc41Om3yeVXR/dHYY0Pw7M2Po0D4Vybl5GO6RN/qnAlqqHIxbMmdzvM7LlPXBLM XIjFyg/ruuQ/w== Received: by mail-oi1-f182.google.com with SMTP id s69so7554699oie.13; Thu, 30 Sep 2021 07:57:21 -0700 (PDT) X-Gm-Message-State: AOAM5314oN1eR4Cav+aP7uniNQmPh1/RMyrepz4KqBLR6/HXfC+dgm/2 mTdA820sHUVkVzcrb+2xEfN6HOhOpJXYzu5xagE= X-Received: by 2002:aca:32c2:: with SMTP id y185mr3140111oiy.47.1633013841293; Thu, 30 Sep 2021 07:57:21 -0700 (PDT) MIME-Version: 1.0 References: <20210923063027.166247-1-xiaokang.qian@arm.com> In-Reply-To: From: Ard Biesheuvel Date: Thu, 30 Sep 2021 16:57:09 +0200 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH] crypto: arm64/gcm-ce - unroll factors to 4-way interleave of aes and ghash To: Xiaokang Qian Cc: Eric Biggers , Herbert Xu , "David S. Miller" , Catalin Marinas , Will Deacon , nd , Linux Crypto Mailing List , Linux ARM , Linux Kernel Mailing List Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 30 Sept 2021 at 03:32, Xiaokang Qian wrote: > > Thanks for the review. > > I will firstly change the decrypt path to compare the tag using SIMD code= , and then pass all of the self tests include fuzz tests(enabled by CONFIG= _CRYPTO_MANAGER_EXTRA_TESTS=3Dy), big endian ,little endian tests. > OK > About the 1K data point, I just remember that the 1420 bytes packet is co= mmonly used in IPSEC. > Yes, but your code is faster than the existing code for 1420 byte packets, right? So why should we keep the original code? We don't use GCM for block storage, and if IPsec throughput is a key performance metric for your system, you are likely to be using the maximum packet size so 1420 bytes not 1k. > > -----Original Message----- > From: Ard Biesheuvel > Sent: Wednesday, September 29, 2021 5:04 AM > To: Eric Biggers > Cc: Xiaokang Qian ; Herbert Xu ; David S. Miller ; Catalin Marinas ; Will Deacon ; nd ; Linux Cr= ypto Mailing List ; Linux ARM ; Linux Kernel Mailing List > Subject: Re: [PATCH] crypto: arm64/gcm-ce - unroll factors to 4-way inter= leave of aes and ghash > > On Tue, 28 Sept 2021 at 08:27, Eric Biggers wrote: > > > > On Thu, Sep 23, 2021 at 06:30:25AM +0000, XiaokangQian wrote: > > > To improve performance on cores with deep piplines such as A72,N1, > > > implement gcm(aes) using a 4-way interleave of aes and ghash > > > (totally > > > 8 blocks in parallel), which can make full utilize of pipelines > > > rather than the 4-way interleave we used currently. It can gain > > > about 20% for big data sizes such that 8k. > > > > > > This is a complete new version of the GCM part of the combined > > > GCM/GHASH driver, it will co-exist with the old driver, only serve > > > for big data sizes. Instead of interleaving four invocations of AES > > > where each chunk of 64 bytes is encrypted first and then ghashed, > > > the new version uses a more coarse grained approach where a chunk of > > > 64 bytes is encrypted and at the same time, one chunk of 64 bytes is > > > ghashed (or ghashed and decrypted in the converse case). > > > > > > The table below compares the performance of the old driver and the > > > new one on various micro-architectures and running in various modes > > > with various data sizes. > > > > > > | AES-128 | AES-192 | AES-256 = | > > > #bytes | 1024 | 1420 | 8k | 1024 | 1420 | 8k | 1024 | 1420 | = 8k | > > > -------+------+------+-----+------+------+-----+------+------+--= ---+ > > > A72 | 5.5% | 12% | 25% | 2.2% | 9.5%| 23%| -1% | 6.7%| 1= 9% | > > > A57 |-0.5% | 9.3%| 32% | -3% | 6.3%| 26%| -6% | 3.3%| 2= 1% | > > > N1 | 0.4% | 7.6%|24.5%| -2% | 5% | 22%| -4% | 2.7%| > > > 20% | > > > > > > Signed-off-by: XiaokangQian > > > > Does this pass the self-tests, including the fuzz tests which are > > enabled by CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=3Dy? > > > > Please test both little-endian and big-endian. (Note that you don't need = a big-endian user space for this - the self tests are executed before the r= ootfs is mounted) > > Also, you will have to rebase this onto the latest cryptodev tree, which = carries some changes I made recently to this driver. > > Finally, I'd like to discuss whether we really need two separate drivers = here. The 1k data point is not as relevant as the other ones, which show a = worthwhile speedup for all micro architectures and data sizes (although I w= ill give this a spin on TX2 myself when I have the > chance) > > *If* we switch to this implementation completely, I would like to keep th= e improvement I added recently to the decrypt path to compare the tag using= SIMD code, rather than copying it out and using memcmp(). > Could you look into adopting this for this version as well? > > -- > Ard.