Received: by 2002:a05:7412:ba23:b0:fa:4c10:6cad with SMTP id jp35csp483532rdb; Thu, 18 Jan 2024 09:08:13 -0800 (PST) X-Google-Smtp-Source: AGHT+IFdpbg3nLxBTczMzSwo+CMQdsOgHfFOKS43TJ9TfECvzwORqxcWbVy++0yZyYfoA+3io9mA X-Received: by 2002:a17:90a:f998:b0:28e:82df:ea30 with SMTP id cq24-20020a17090af99800b0028e82dfea30mr1053683pjb.9.1705597693593; Thu, 18 Jan 2024 09:08:13 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1705597693; cv=pass; d=google.com; s=arc-20160816; b=XBKkIwDOCJK8a0zsVSDnroX7IbFEvMBrNDTpvEV9KBKN90RW/vgTaQ0ltsuk8phBBt jH8UwIiozWR5vMdgAQdcgXYjtlIqr03ulWEhATdEqZZC91eqUyV6/V5Px3HxrPehYRCM 0+RJ/V4yFty9yRqFto9Rzm4Y7BWmcvdrjG+gQEo0UXSTW+hMI3UMwSwWhJzqxapfCUwT dr/mZPJWWbLZx08Hs0gHhBGiPNmKCv1tWP/cYDLBlcjOWPMpVhT/1EcmaALV1J5Mhryn 9hNldqBuCeSjyLCiTc8TGNtGKWPl3NXpxW8kvi6v8id+NYvo7f9tVS8HR6zklBd47dmN xR8Q== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:from:subject:message-id:references:mime-version :list-unsubscribe:list-subscribe:list-id:precedence:in-reply-to:date :dkim-signature; bh=cIgUgl/Kj0Nyil4FuwM1hWjLVD7oyZgze427wzrk/WE=; fh=MoNanF5d0vjGrp692+ghhk+JSPJJjsLNvHcs25SnBB4=; b=ASuRiErxTu3w4/jhvnK0mu6GZUaqZGp+amqgqaw7mRBrbI7qwvJ1nIhkuhX6fsHOLe ZxY5/X1fbaPomCLveO45dr+3NznEj7elbMnBuhhpyx0xm8KQZabTp4T02/mbHAydw93z mGrFpUW2mSPqZSIVAOcuCk1IXyUE8dGxTG5dhd5f4E3uB2IMoVpS8OcB99UPuG9FbQSQ cGukavBfP8ueiCuVIrjASjTT3tsx7T/o7i13B266aGIddUxDcFldKySaftc6STnOf5Gg XeTEOC3lBCUaDPN+aDiwWx89z2ONkv4gi59KccAk+8jxOuB+hl5rAHAu3R9aJi1z2gvm AYLg== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=Zhpl7XNJ; arc=pass (i=1 spf=pass spfdomain=flex--ardb.bounces.google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-crypto+bounces-1490-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-crypto+bounces-1490-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id y2-20020a17090a86c200b0028bf7ab5a92si1718314pjv.147.2024.01.18.09.08.13 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Jan 2024 09:08:13 -0800 (PST) Received-SPF: pass (google.com: domain of linux-crypto+bounces-1490-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=Zhpl7XNJ; arc=pass (i=1 spf=pass spfdomain=flex--ardb.bounces.google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-crypto+bounces-1490-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-crypto+bounces-1490-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 90DE8284573 for ; Thu, 18 Jan 2024 17:07:56 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 0472C2C843; Thu, 18 Jan 2024 17:07:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Zhpl7XNJ" X-Original-To: linux-crypto@vger.kernel.org Received: from mail-wr1-f73.google.com (mail-wr1-f73.google.com [209.85.221.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 495F22C84D for ; Thu, 18 Jan 2024 17:07:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.73 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705597674; cv=none; b=TAZ0y4/VU6Svp6vUY5dxU0crhb2pC47jif2PYpnr6OvwFYNQgQEUw/EQzosajMplcpzUlrrgTnWy+PANStRocwL0rT/7+EJOpuwvsXdrcmorGWrMop82xvSHD8AKF8QaXEXqGSYu+qojGVq0d1944gvdB4MB8en6B2M22XRe/AM= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705597674; c=relaxed/simple; bh=E8xa5pPwKdPObbfeUy3dAiPldcHAbAySa96B8+aC2vc=; h=Received:DKIM-Signature:X-Google-DKIM-Signature: X-Gm-Message-State:X-Google-Smtp-Source:X-Received:Date: In-Reply-To:Mime-Version:References:X-Developer-Key: X-Developer-Signature:X-Mailer:Message-ID:Subject:From:To:Cc: Content-Type; b=XkHtFssn7chMHO9hUC9iaztUpWzYfiymSm6oxZJ6w2QwlK2kzOIgYN4kERvIN+sHG0V0wnCsy8m7eTasT9rkNkam2ZO0YI4aoVE9wqTf3dbjdGcsiVnTAIjvXmJDHGhJJImtGCSQuRC8cR+AYLHvI2lH75oWA2nPfZLsiQHx+08= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--ardb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Zhpl7XNJ; arc=none smtp.client-ip=209.85.221.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--ardb.bounces.google.com Received: by mail-wr1-f73.google.com with SMTP id ffacd0b85a97d-337d2ccaab0so302662f8f.0 for ; Thu, 18 Jan 2024 09:07:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1705597671; x=1706202471; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=cIgUgl/Kj0Nyil4FuwM1hWjLVD7oyZgze427wzrk/WE=; b=Zhpl7XNJQ6cwpMPY0NGnHUgt1Oi0IOY/XROjT1N6j/Jj5/2hJ0ve9dgRfMOAIYQkAN RvQNsqg4Sd/V3udAGoICPfWVwB21tGMYOThhlxdeepU1HermT0pz0N/B229nVbkipRcI h//qyA9+x9gS3w9ZTNzyrA/8SH6zdPNOnF0HlctYLZOAR0suwU0+FkdiTUwi90biEVxS idOgOoLVH/2R23caGKGv4nU6dHl3558v1pkgmTpZcd2pjqYA9EzJZA4cQOJgM1mH4Zv9 pV0oXwft3kKBQnCMiUBl19O1ClTpKDC7uS5q30cCjo+zvYdRDGCf+HS62pSEMwJwcp7x eIxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705597671; x=1706202471; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=cIgUgl/Kj0Nyil4FuwM1hWjLVD7oyZgze427wzrk/WE=; b=K4mUomH1QoH/wQO0uqqPEEOE3HL1DQhZZNJ06KqrEgG6B6CHe+JOHyAvkhurQlAoyO C8VXBD3WRgNe+RE8DZHzbQQL2wd5kDGydDbRyXEE4IUdEgp9frxwVHBxtM0MxulyE05L qhcp1PfVGb1pGaRLB0E2rl7eipXRMKN/Qynz/kkA4HMgQmvKMtx54UtC6iW5SIL0veoS jn+rfn0WssGLiEi1iuFVuA1q/IGuKOJp9Cet5aaLqzpF9PMsW4A3kFS8F4bq/cY0Bwo/ aty0T/dxl6IfKGIM696KKR3QYONaC9WRflzNtq4LxJv1n0TI26TZv9LADQnSMGKK0h3t aGnA== X-Gm-Message-State: AOJu0YzW/i9+E7aoyreJfv0BgwfPafOuMe48/gvLC6wxFZ7+wagfECUt imzDjVcCtARo02st+PwbblhnlPzH9Wdpyf2zfzKgXXD8z5GL+PpZBY9UHR5qVsMr5uEYXU+1ySk mcNsrZb0w69Xne1kw2Hj+THnMzMZyIkyLn5bgRdP8Mbvl1nsYeerHGbxpi0VLGugOTebC8CRz9k n4KnZ6dRrtvjYWeAib5MgDKdIS4FWWug== X-Received: from palermo.c.googlers.com ([fda3:e722:ac3:cc00:28:9cb1:c0a8:118a]) (user=ardb job=sendgmr) by 2002:adf:db4d:0:b0:337:61b6:e4af with SMTP id f13-20020adfdb4d000000b0033761b6e4afmr3219wrj.9.1705597671397; Thu, 18 Jan 2024 09:07:51 -0800 (PST) Date: Thu, 18 Jan 2024 18:06:35 +0100 In-Reply-To: <20240118170628.3049797-10-ardb+git@google.com> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240118170628.3049797-10-ardb+git@google.com> X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 X-Developer-Signature: v=1; a=openpgp-sha256; l=4505; i=ardb@kernel.org; h=from:subject; bh=BZ/DOHhRiS0C26Q9Ksj11LPomr42R+ccgChBBs+l+0U=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIXVl1GzbqgdMnT8+hDwodXzQ+Jyn5PaTqwWqe+d7PZWzP crCL7mxo5SFQYyDQVZMkUVg9t93O09PlKp1niULM4eVCWQIAxenAExEwJLhn+qGG4L3FtmvaZh6 zNbametjVeoe2wfTlN07fzYbMgn+uMLIMKtoW5y8zqnEe5OEv8w8lzN50lG92xVBfKs2piTPfJO qzgkA X-Mailer: git-send-email 2.43.0.381.gb435a96ce8-goog Message-ID: <20240118170628.3049797-16-ardb+git@google.com> Subject: [PATCH v2 6/8] crypto: arm64/aes-ccm - Cache round keys and unroll AES loops From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: ebiggers@kernel.org, herbert@gondor.apana.org.au, Ard Biesheuvel Content-Type: text/plain; charset="UTF-8" From: Ard Biesheuvel The CCM code as originally written attempted to use as few NEON registers as possible, to avoid having to eagerly preserve/restore the entire NEON register file at every call to kernel_neon_begin/end. At that time, this API took a number of NEON registers as a parameter, and only preserved that many registers. Today, the NEON register file is restored lazily, and the old API is long gone. This means we can use as many NEON registers as we can make meaningful use of, which means in the AES case that we can keep all round keys in registers rather than reloading each of them for each AES block processed. On Cortex-A53, this results in a speedup of more than 50%. (From 4 cycles per byte to 2.6 cycles per byte) Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce-ccm-core.S | 95 ++++++++------------ 1 file changed, 38 insertions(+), 57 deletions(-) diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S index 0132872bd780..0ec59fc4ef3e 100644 --- a/arch/arm64/crypto/aes-ce-ccm-core.S +++ b/arch/arm64/crypto/aes-ce-ccm-core.S @@ -14,40 +14,46 @@ .text .arch armv8-a+crypto + .macro load_round_keys, rk, nr, tmp + sub w\tmp, \nr, #10 + add \tmp, \rk, w\tmp, sxtw #4 + ld1 {v10.4s-v13.4s}, [\rk] + ld1 {v14.4s-v17.4s}, [\tmp], #64 + ld1 {v18.4s-v21.4s}, [\tmp], #64 + ld1 {v3.4s-v5.4s}, [\tmp] + .endm + + .macro dround, va, vb, vk + aese \va\().16b, \vk\().16b + aesmc \va\().16b, \va\().16b + aese \vb\().16b, \vk\().16b + aesmc \vb\().16b, \vb\().16b + .endm + + .macro aes_encrypt, va, vb, nr + tbz \nr, #2, .L\@ + dround \va, \vb, v10 + dround \va, \vb, v11 + tbz \nr, #1, .L\@ + dround \va, \vb, v12 + dround \va, \vb, v13 +.L\@: .irp v, v14, v15, v16, v17, v18, v19, v20, v21, v3 + dround \va, \vb, \v + .endr + aese \va\().16b, v4.16b + aese \vb\().16b, v4.16b + .endm + /* * void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u8 const rk[], * u32 rounds); */ SYM_FUNC_START(ce_aes_ccm_final) - ld1 {v3.4s}, [x2], #16 /* load first round key */ ld1 {v0.16b}, [x0] /* load mac */ - cmp w3, #12 /* which key size? */ - sub w3, w3, #2 /* modified # of rounds */ ld1 {v1.16b}, [x1] /* load 1st ctriv */ - bmi 0f - bne 3f - mov v5.16b, v3.16b - b 2f -0: mov v4.16b, v3.16b -1: ld1 {v5.4s}, [x2], #16 /* load next round key */ - aese v0.16b, v4.16b - aesmc v0.16b, v0.16b - aese v1.16b, v4.16b - aesmc v1.16b, v1.16b -2: ld1 {v3.4s}, [x2], #16 /* load next round key */ - aese v0.16b, v5.16b - aesmc v0.16b, v0.16b - aese v1.16b, v5.16b - aesmc v1.16b, v1.16b -3: ld1 {v4.4s}, [x2], #16 /* load next round key */ - subs w3, w3, #3 - aese v0.16b, v3.16b - aesmc v0.16b, v0.16b - aese v1.16b, v3.16b - aesmc v1.16b, v1.16b - bpl 1b - aese v0.16b, v4.16b - aese v1.16b, v4.16b + + aes_encrypt v0, v1, w3 + /* final round key cancels out */ eor v0.16b, v0.16b, v1.16b /* en-/decrypt the mac */ st1 {v0.16b}, [x0] /* store result */ @@ -55,6 +61,8 @@ SYM_FUNC_START(ce_aes_ccm_final) SYM_FUNC_END(ce_aes_ccm_final) .macro aes_ccm_do_crypt,enc + load_round_keys x3, w4, x10 + cbz x2, 5f ldr x8, [x6, #8] /* load lower ctr */ ld1 {v0.16b}, [x5] /* load mac */ @@ -64,37 +72,10 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */ prfm pldl1strm, [x1] add x8, x8, #1 rev x9, x8 - cmp w4, #12 /* which key size? */ - sub w7, w4, #2 /* get modified # of rounds */ ins v1.d[1], x9 /* no carry in lower ctr */ - ld1 {v3.4s}, [x3] /* load first round key */ - add x10, x3, #16 - bmi 1f - bne 4f - mov v5.16b, v3.16b - b 3f -1: mov v4.16b, v3.16b - ld1 {v5.4s}, [x10], #16 /* load 2nd round key */ -2: /* inner loop: 3 rounds, 2x interleaved */ - aese v0.16b, v4.16b - aesmc v0.16b, v0.16b - aese v1.16b, v4.16b - aesmc v1.16b, v1.16b -3: ld1 {v3.4s}, [x10], #16 /* load next round key */ - aese v0.16b, v5.16b - aesmc v0.16b, v0.16b - aese v1.16b, v5.16b - aesmc v1.16b, v1.16b -4: ld1 {v4.4s}, [x10], #16 /* load next round key */ - subs w7, w7, #3 - aese v0.16b, v3.16b - aesmc v0.16b, v0.16b - aese v1.16b, v3.16b - aesmc v1.16b, v1.16b - ld1 {v5.4s}, [x10], #16 /* load next round key */ - bpl 2b - aese v0.16b, v4.16b - aese v1.16b, v4.16b + + aes_encrypt v0, v1, w4 + subs w2, w2, #16 bmi 6f /* partial block? */ ld1 {v2.16b}, [x1], #16 /* load next input block */ -- 2.43.0.381.gb435a96ce8-goog