Received: by 2002:ab2:6991:0:b0:1f7:f6c3:9cb1 with SMTP id v17csp647383lqo; Wed, 8 May 2024 10:19:41 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCV43ZhxJ37EJL7MgNPyQb20P9In+athstD2qgGnRZKbjB14WSytlhaUb0cHq4qHYRO91Feq3jXTKWZNAyh3TEGyOIdwMSUuNRDH/R1Ecg== X-Google-Smtp-Source: AGHT+IGLYxjqywQ8Bt55GTp9YQvbA+cSUAb1BfHVqjlSSlUCYjSdpSHGCeTnVJ63z0U4hKBqEO1V X-Received: by 2002:a05:6214:411b:b0:6a0:7b74:62ee with SMTP id 6a1803df08f44-6a1514c7f70mr32035936d6.47.1715188781186; Wed, 08 May 2024 10:19:41 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1715188781; cv=pass; d=google.com; s=arc-20160816; b=0q6zcdvJ9jK+MBBofOGwPgW1nN7RfAaOyZbICT+P5yFy4IHFzgWmQc7fdJQRLC5yPn 18hEdh49Ij88KN1ZtxgFhwkxdg90XeJxhOomF7110RflrauF/A3UMVTOUcIQKIHTjlM8 Z2UenWvwl4yrXY60gHJsTbXkPA4JC9VZbyj6jGtqF4CCpnkQRJKfEN2P1HFTRS7JnfQK 4V3SRotPuTrPyytrO1TNkCCNKjhwp0I7xFtlZtrgcrannGXeuMWVBZ1ARZi4R5JjPt5n devhXNA4IO8RAI8B/PBuATPZKHA68sYOyqQqdsV48YLHNqShNsev+Mswh0GcbC9TIh5P WW5g== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date:dkim-signature; bh=F3H2ZaNVI9Pa5QbZGe8TUFkh7sUABllCE8PXgDpqd2I=; fh=oE2crgUXKZzvBo8h5exJwAAsAFhN1DUPMjE56EudZV4=; b=Ix31o/Hnry9HEUBpQVjiZ71DH8nKeE+MqRLwufx234gBUL4VJ0VrExnP4erL6EHsze 4jxMc0o7DdODutEadBy8/KJCGt2gDCVybhxwxiWNdWWjvpwJnsJmtaX3+WnXjuTrMnC8 0dB8Q2d2a+NYgoUxpy4jx+f63DN6gVa4NCB53hjPSCaOKNwCZb4EpUTBXD+QNheumef3 5RmpeGi+CHbocH93UkaiRjmCoBatJGGNF38L6tH8UALNDj0O9wimjviQq40F5YUySzWh xGhRSFQ2mCcbN0DGrXli90572E2/zNnaDaAx+tYeyzQnFxGHdObTkatLOaD6hMjymMGi 660w==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=gOPGAxJC; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-crypto+bounces-4071-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-crypto+bounces-4071-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id f4-20020a0562141d2400b006a0c94d5cbesi14998800qvd.42.2024.05.08.10.19.41 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 08 May 2024 10:19:41 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-crypto+bounces-4071-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=gOPGAxJC; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-crypto+bounces-4071-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-crypto+bounces-4071-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id AFE291C21A94 for ; Wed, 8 May 2024 17:19:40 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 40B9E4CE19; Wed, 8 May 2024 17:19:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="gOPGAxJC" X-Original-To: linux-crypto@vger.kernel.org Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EF78114A8C; Wed, 8 May 2024 17:19:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715188776; cv=none; b=QSSPk5xp6dG8PwvOWwm7gejQZyRvlj0oUIQAnRtYJSzDOIm7cpQg0uze/4IOmD35ts73sel131NDGhfFN2HxQn8RoXnHz7bXrXJfGnhcAxkaDDswYngK5Hjbq1zxMKU+FDowhpah6Z9JQFg11fm/E4C1enPqLqnlulkRS9BRoVg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715188776; c=relaxed/simple; bh=pb5+k8XHSQ0D1eISbFZIj5PJPfHcwqj3VC8RT4WOmu4=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=ZHyHe7ciRBBe3xzIwi5FV1nwo+QvQwapeO3tnYyEmrATl6d3ty52XSaiCZEjYyKund01Fz4ouZkxCT60OCFtGgtLjDSGKUbJuV/MmGjhoDkTictUyMnO7X52SOvhF6MPhrzcLXLBkYUlgy2WRI+ySxKyiHGZbt1e2QlWaFYBVp4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=gOPGAxJC; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id A7F62C2BD11; Wed, 8 May 2024 17:19:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1715188775; bh=pb5+k8XHSQ0D1eISbFZIj5PJPfHcwqj3VC8RT4WOmu4=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=gOPGAxJCwfQ//lFHHvGPvIptaT5wyEznoXu97Z6asxiORAI817VzhWZyu3W2mRax+ CUXO9V6guRw9JfI7R7h1be/J57Syt0Fhcs8G1u5vnojLk74DwZWeBKQ9nGnvZhVGeq C0bLXWbiq4rP5Bv/N0qRnIfbJBYX/N48iy2O1bYWOXT7nCWI7rmCYZTC3HdwmMKVQS Q36M7iPaC0mmDlClVh92kGaDFYZf8QTPER40KADP8jUN/5NQDFyTyA3rsSaGaPQayJ HTEdMYIHGHio+1YpKc91NAj6m4qkR28PuX92R22tnDEqru7MlXeQKaSWDjrOiz1MSQ jcRtgJeOiMYIg== Date: Wed, 8 May 2024 10:19:34 -0700 From: Eric Biggers To: Ard Biesheuvel Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] crypto: x86/aes-gcm - add VAES and AVX512 / AVX10 optimized AES-GCM Message-ID: <20240508171934.GA19059@sol.localdomain> References: <20240508071734.199462-1-ebiggers@kernel.org> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: On Wed, May 08, 2024 at 11:01:25AM +0200, Ard Biesheuvel wrote: > > diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c > > index 5b25d2a58aeb..e4dec49023af 100644 > > --- a/arch/x86/crypto/aesni-intel_glue.c > > +++ b/arch/x86/crypto/aesni-intel_glue.c > > @@ -1212,17 +1212,481 @@ static struct simd_skcipher_alg *aes_xts_simdalg_##suffix > > DEFINE_XTS_ALG(aesni_avx, "xts-aes-aesni-avx", 500); > > #if defined(CONFIG_AS_VAES) && defined(CONFIG_AS_VPCLMULQDQ) > > DEFINE_XTS_ALG(vaes_avx2, "xts-aes-vaes-avx2", 600); > > DEFINE_XTS_ALG(vaes_avx10_256, "xts-aes-vaes-avx10_256", 700); > > DEFINE_XTS_ALG(vaes_avx10_512, "xts-aes-vaes-avx10_512", 800); > > -#endif > > + > > +#define NUM_KEY_POWERS 16 /* excludes zero padding */ > > +#define FULL_NUM_KEY_POWERS (NUM_KEY_POWERS + 3) /* includes zero padding */ > > + > > +struct aes_gcm_key_avx10 { > > + struct crypto_aes_ctx aes_key AESNI_ALIGN_ATTR; > > + u32 rfc4106_nonce AESNI_ALIGN_ATTR; > > Is the alignment needed here? > > > + u8 ghash_key_powers[FULL_NUM_KEY_POWERS][16] AESNI_ALIGN_ATTR; > > +}; It only has an effect on ghash_key_powers, so I'll probably drop it from the other two fields. This struct is also missing a comment, so I'll add one. As for why ghash_key_powers is aligned (and also aes_key by virtue of being at the start of the struct), it slightly improves performance of the load instructions vmovdqu8 and vbroadcasti32x4. It's not absolutely required though. I.e., the assembly doesn't use instructions like vmovdqa8 that actually enforce alignment. Semi-related note: I forgot to account for the alignment of the whole struct in aead_alg::base::cra_ctxsize. I'll fix that too. > > +static int gcm_setkey_vaes_avx10(struct crypto_aead *tfm, const u8 *raw_key, > > + unsigned int keylen, bool vl256) > > +{ > > + struct aes_gcm_key_avx10 *key = aes_align_addr(crypto_aead_ctx(tfm)); > > + int err; > > + > > + /* The assembly code assumes the following offsets. */ > > + BUILD_BUG_ON(offsetof(typeof(*key), aes_key.key_enc) != 0); > > + BUILD_BUG_ON(offsetof(typeof(*key), aes_key.key_length) != 480); > > + BUILD_BUG_ON(offsetof(typeof(*key), ghash_key_powers) != 512); > > + > > + if (likely(crypto_simd_usable())) { > > Is it really necessary to have 3 different code paths here? If so, > could you add a comment why? I think it's worth optimizing setkey for AES-GCM, since the assembly version of setkey is over 30x faster than the portable C version, and there are circumstances in which setkey performance could be important. For example latency of connection establishment, and throughput for users who change keys frequently. The portable C version takes roughly the same amount of time as encrypting 37 KB of data, which is a lot. There are also cases where new AES-GCM keys might be set even more frequently than expected. These do not necessarily apply to Linux right now. But these are issues that could easily arise, especially if my assembly code gets reused in userspace projects where it could see a wider range of use. So I wanted to have the assembly support pre-computing the hash key powers from the start. First, sometimes people work around AES-GCM's short nonce length by using an XChaCha-like construction. This results in a unique AES-GCM key for every message even if the key for the top-level mode does not change. Second, people don't always use APIs in the optimal way. The ability to pre-set a key can be supported in the API, but some people will write a function like 'byte[] encrypt(byte[] key, byte[] data) anyway, resulting in a key being set individually for every message, even if it doesn't change. The OpenSSL API also has a huge pitfall where to *not* set a new key, you have to explicitly pass NULL, which I'm sure many people get wrong. As for why have both aes_gcm_precompute_vaes_avx10_256() and aes_gcm_precompute_vaes_avx10_512(), the value of the 512-bit one is marginal, but it's easy to provide, and it does work nicely because the total size of the key powers being computed is always a multiple of 512 bits anyway. So the full 512 bits will always be used. I.e. there's no issue like there is for the AAD, where the AAD is usually <= 256 bits so I made it just use 256-bit vectors. As for why have the portable C code path when there is assembly too, I think that unfortunately this is required to fulfill the skcipher API contract on x86. Assuming that crypto_skcipher_setkey() can be called in the same contexts as crypto_skcipher_encrypt(), it can be called in a no-SIMD context. - Eric