From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: x86@kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v4 0/2] x86_64 AES-GCM improvements
Date: Mon, 27 May 2024 00:56:24 -0700
Message-ID: <20240527075626.142576-1-ebiggers@kernel.org>

This patchset adds a VAES and AVX512 / AVX10 implementation of AES-GCM
(Galois/Counter Mode), which improves AES-GCM performance by up to 162%.
In addition, it replaces the old AES-NI GCM code from Intel with new
code that is slightly faster and fixes a number of issues including the
massive binary size of over 250 KB.  See the patches for details.

The end state of the x86_64 AES-GCM assembly code is that we end up with
two assembly files, one that generates AES-NI code with or without AVX,
and one that generates VAES code with AVX512 / AVX10 with 256-bit or
512-bit vectors.  There's no support for VAES alone (without AVX512 /
AVX10).  This differs slightly from what I did with AES-XTS where one
file generates both AVX and AVX512 / AVX10 code including code using
VAES alone (without AVX512 / AVX10), and another file generates non-AVX
code only.  For now this seems like the right choice for each particular
algorithm, though: being limited to 16 SIMD registers and 128-bit
vectors led to significantly different design choices for AES-GCM, but
less so for AES-XTS.  CPUs shipping with VAES alone also seem to be a
temporary thing, so we perhaps shouldn't go too much out of our way to
support that combination.

Changed in v4:
- Added AES-NI rewrite patch.
- Adjusted the VAES-AVX10 patch slightly to make it possible to cleanly
  add the AES-NI support on top of it.

Changed in v3:
- Optimized the finalization code slightly.
- Fixed a minor issue in my userspace benchmark program (guard page
  after key struct made "AVX512_Cloudflare" extra slow on some input
  lengths) and regenerated tables 3-4.  Also upgraded to Emerald Rapids.
- Eliminated an instruction from _aes_gcm_precompute.

Changed in v2:
- Additional assembly optimizations
- Improved some comments
- Aligned key struct to 64 bytes
- Added comparison with Cloudflare's implementation of AES-GCM
- Other cleanups

Eric Biggers (2):
  crypto: x86/aes-gcm - add VAES and AVX512 / AVX10 optimized AES-GCM
  crypto: x86/aes-gcm - rewrite the AES-NI optimized AES-GCM

 arch/x86/crypto/Kconfig                  |    1 +
 arch/x86/crypto/Makefile                 |    8 +-
 arch/x86/crypto/aes-gcm-aesni-x86_64.S   | 1131 +++++++++
 arch/x86/crypto/aes-gcm-avx10-x86_64.S   | 1222 ++++++++++
 arch/x86/crypto/aesni-intel_asm.S        | 1503 +-----------
 arch/x86/crypto/aesni-intel_avx-x86_64.S | 2804 ----------------------
 arch/x86/crypto/aesni-intel_glue.c       | 1269 ++++++----
 7 files changed, 3128 insertions(+), 4810 deletions(-)
 create mode 100644 arch/x86/crypto/aes-gcm-aesni-x86_64.S
 create mode 100644 arch/x86/crypto/aes-gcm-avx10-x86_64.S
 delete mode 100644 arch/x86/crypto/aesni-intel_avx-x86_64.S

base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
-- 
2.45.1