Received: by 2002:a05:7412:3b8b:b0:fc:a2b0:25d7 with SMTP id nd11csp1638224rdb; Sat, 10 Feb 2024 10:12:47 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCUFFsJK84A1ajMaXDssAz3fFntjoNFBO0XH9a3nboKqFaCgcvkxDvm7IOKWgXU4W08Q8nIBwhXYBHaSwx9SwXY5pZK/eMpd0/wxs1KMFA== X-Google-Smtp-Source: AGHT+IHPSlevPxBICn3uAL2PMrt3GxVWYc7JRAf8xvD5ayFOBm0E0LuzmhD9BwoWZXh6PZnIdxF4 X-Received: by 2002:a05:620a:560e:b0:785:af0a:54b7 with SMTP id vu14-20020a05620a560e00b00785af0a54b7mr2579525qkn.11.1707588767395; Sat, 10 Feb 2024 10:12:47 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707588767; cv=pass; d=google.com; s=arc-20160816; b=vMzbIP7++R9+esOrWYkL5ZL16bhGV8vr37sX9eU4NrTQuQTR0u2LJkv6dvePwDtl9E Iv6XxHX4nLt8RcRq4YMqUcwWcjXYDhaCc5Gsp8cCUmg3gey7rxTIlBJQhDMvGZqswt+1 xLe2ZMhRjfIZ/j4ZlnGvh1fKU5qXXrpJWhJrRANQRCxNEdzCCXmFneuAhDzEiOMOy9eD kZmCIVguAxiUMgyHJgCXilJU3atmJ8ci2AGr+N1Zaqz3iaTBdX3lNqo7Q80zg6pqcuDp ZRF0hGfjrTD0gl+5ETd7LFNQp1Ubya5t7kqU3c+HTRB1alCyVWegGLAEzFRX1tEcK9Rq Py7A== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date:dkim-signature; bh=YNzYplMs0ZcA8YsK7a38h5oH/12rA0FqPSCV7kT8X50=; fh=Cpt7nvvJ6ywDjT+t5oF498P0liAIKNYzvaerPfIYOHY=; b=dE0hNP9Nc7CjOelQnFJhrQIMjhYBtnKQYacf28IgaDHVHEOLEEXUOInK4ASqjZQTNE RieZaGSPPKSsylnAzqb1qIzqTOhY6PLHEETzpzyfmfEHo79FpPpP/0HbUdG2xvCi7Th9 h7NSc92rOkZ4OZTTSa50mj4qbJiC5/IpEfwhjFf0mbVOfvik/aneapmzkOqf552ZyDo8 Ta+9S53pDW1ItFKiEuW02w/NA1Fmdkh+m5QyCYj/UZZl9bOL5CwwjRKOHdsYoeg0fhBo NlBTUokitrCpSynTo4KdGDXCDF/nncmjp8/PCvgomVRKxTKYAKcCxDWBNnywgfzfrU85 J4Jg==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=Xz+Co7sw; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-crypto+bounces-1951-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-crypto+bounces-1951-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org X-Forwarded-Encrypted: i=2; AJvYcCXbvlezt4cFyDA71P6JWM2fPFH67JBbKqDSBJF8Y+1ilVceXIROfOgQTNGH7TPATy+yRgRhlgrx8s63UeHv9/9UKT6JVkRwclMzgtnzzA== Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id d13-20020a05620a136d00b007859d979764si2126052qkl.438.2024.02.10.10.12.47 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 10 Feb 2024 10:12:47 -0800 (PST) Received-SPF: pass (google.com: domain of linux-crypto+bounces-1951-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=Xz+Co7sw; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-crypto+bounces-1951-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-crypto+bounces-1951-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 181841C21A1A for ; Sat, 10 Feb 2024 18:12:47 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 30BA35DF06; Sat, 10 Feb 2024 18:12:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Xz+Co7sw" X-Original-To: linux-crypto@vger.kernel.org Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E47965C8F3 for ; Sat, 10 Feb 2024 18:12:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707588763; cv=none; b=HBHiJmbT0s8+nQa4VFAUbAs+zKSkhTGbc210w4r4v/PpjFKJ9i3I3UgWsIMKmmLOATKhFHzf8FvbKdb6B0SIsE71pK9O+ornNrbETtjUn6lNPQcQqx66KnXk4udKeYBWg41X54x4cWweg2UB8uysazGsQdwteMwpJjLk3TEC0as= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707588763; c=relaxed/simple; bh=J7eVMcLYjTAhRw0qnGWZ6vRpGFo/1AYZBX0WuO/8k7Y=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=lpDvPSxtG7JqCY4o2JvBZfajuDDAUNzzVFud1puLr3pI/gc4Za7Asd8JmZYHwyilz5r0Fqqlxz/Lik76OXNezBLM8VhkChO+Ay6yk8FOmKLVTLOkcqd8btvziwMWjogM5HpeQFdz0Ev4mY32yc3P6EZjvK6hUf7Cw+LV8Je+kDE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Xz+Co7sw; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7800EC433C7; Sat, 10 Feb 2024 18:12:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707588762; bh=J7eVMcLYjTAhRw0qnGWZ6vRpGFo/1AYZBX0WuO/8k7Y=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=Xz+Co7swEy5nyM4XgXp4SdtPmHBaKFobM9EGybWj8lp5YcOFb6A2L5D0nGHoNeZ3I nsXtDOAxgihjFSc/4Rguq09TyplRyQB6ixlaA911DZ3Ki+Nms4UkxGPPhjAhxBeRlH QF6oR9EgElTAxoQei553oKwedMkOMqCke9d3f5h60B/E3K9KUk00bnL8y9DaCbg1Ok TZS/2+uMP+GXeu8YPe2PHZ7KxcHk9RfKXGvxRdG6w40mYBR+UYS/HEawKu9MUHfgkQ MM7pHfAytxhnXrmPjrsPWa/rake1gHuLNvf2x0ts7FpHASenLMl0nlXJMyqigz1J7a m1ohsWGaihqBw== Date: Sat, 10 Feb 2024 10:12:40 -0800 From: Eric Biggers To: Jerry Shih Cc: linux-riscv@lists.infradead.org, Palmer Dabbelt , linux-crypto@vger.kernel.org, Christoph =?iso-8859-1?Q?M=FCllner?= , Heiko Stuebner , Phoebe Chen , Andy Chiu Subject: Re: [PATCH riscv/for-next] crypto: riscv - parallelize AES-CBC decryption Message-ID: <20240210181240.GA1098@sol.localdomain> References: <20240208060851.154129-1-ebiggers@kernel.org> <04703246-6EF6-4B54-B8F1-96EDEC2FBA6B@sifive.com> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <04703246-6EF6-4B54-B8F1-96EDEC2FBA6B@sifive.com> On Sat, Feb 10, 2024 at 11:25:27PM +0800, Jerry Shih wrote: > > .macro aes_cbc_decrypt keylen > > + srli LEN, LEN, 2 // Convert LEN from bytes to words > > vle32.v v16, (IVP) // Load IV > > 1: > > - vle32.v v17, (INP) // Load ciphertext block > > - vmv.v.v v18, v17 // Save ciphertext block > > - aes_decrypt v17, \keylen // Decrypt > > - vxor.vv v17, v17, v16 // XOR with IV or prev ciphertext block > > - vse32.v v17, (OUTP) // Store plaintext block > > - vmv.v.v v16, v18 // Next "IV" is prev ciphertext block > > - addi INP, INP, 16 > > - addi OUTP, OUTP, 16 > > - addi LEN, LEN, -16 > > + vsetvli t0, LEN, e32, m4, ta, ma > > + vle32.v v20, (INP) // Load ciphertext blocks > > + vslideup.vi v16, v20, 4 // Setup prev ciphertext blocks > > + addi t1, t0, -4 > > + vslidedown.vx v24, v20, t1 // Save last ciphertext block > > Do we need to setup the `e32, len=t0` for next IV? > I think we only need 128bit IV (with VL=4). > > > + aes_decrypt v20, \keylen // Decrypt the blocks > > + vxor.vv v20, v20, v16 // XOR with prev ciphertext blocks > > + vse32.v v20, (OUTP) // Store plaintext blocks > > + vmv.v.v v16, v24 // Next "IV" is last ciphertext block > > Same VL issue here. It's true that the vslidedown.vx and vmv.v.v only need vl=4. But it also works fine with vl unchanged. It just results in some extra data being moved in the registers. My hypothesis is that this is going to be faster than having the three extra instructions per loop iteration to change the vl to 4 twice. I still have no real hardware to test on, so I have no quantitative data. All I can do is go with my instinct which is that the shorter version will be better. If you have access to a real CPU that supports the RISC-V vector crypto extensions, I'd be interested in the performance you get from each variant. (Of course, different RISC-V CPU implementations may have quite different performance characteristics, so that still won't be definitive.) Here is the alternative variant given as a diff from this patch: diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.S b/arch/riscv/crypto/aes-riscv64-zvkned.S index 43541aad6386c..ef380771f606a 100644 --- a/arch/riscv/crypto/aes-riscv64-zvkned.S +++ b/arch/riscv/crypto/aes-riscv64-zvkned.S @@ -146,10 +146,13 @@ SYM_FUNC_END(aes_ecb_decrypt_zvkned) vle32.v v20, (INP) // Load ciphertext blocks vslideup.vi v16, v20, 4 // Setup prev ciphertext blocks addi t1, t0, -4 + vsetivli zero, 4, e32, m4, ta, ma vslidedown.vx v24, v20, t1 // Save last ciphertext block + vsetvli t0, LEN, e32, m4, ta, ma aes_decrypt v20, \keylen // Decrypt the blocks vxor.vv v20, v20, v16 // XOR with prev ciphertext blocks vse32.v v20, (OUTP) // Store plaintext blocks + vsetivli zero, 4, e32, m4, ta, ma vmv.v.v v16, v24 // Next "IV" is last ciphertext block slli t1, t0, 2 // Words to bytes add INP, INP, t1 @@ -157,7 +160,6 @@ SYM_FUNC_END(aes_ecb_decrypt_zvkned) sub LEN, LEN, t0 bnez LEN, 1b - vsetivli zero, 4, e32, m1, ta, ma vse32.v v16, (IVP) // Store next IV ret .endm A third variant would be to just replace vmv.v.v with vmv1r.v. In general, this level of micro-optimization probably needs to be wait until there are a variety of CPUs to test on. We know that parallelizing the algorithms is helpful, so we should do that, as this patch does. But the effects of small variations in the instruction sequences are currently unclear. - Eric