Received: by 2002:ab2:620c:0:b0:1ef:ffd0:ce49 with SMTP id o12csp1249850lqt; Tue, 19 Mar 2024 18:49:04 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCWaobRTYpKw6CdEWO5jq5Ah2I3xYCGa/0b1o1aHhqRM6orAf0pjPAwaCkcWk3sS8Ga9Diu6IF0Sq3z6n0GErtUrOcMK+jlsdkH7aGYVmA== X-Google-Smtp-Source: AGHT+IG3SY8GQp1nglgqSZ3PJ87cLSRUMrVFZZcD0UghprQRZhADk3WCpDCG6xsJayYKVE/mTecd X-Received: by 2002:a50:d7de:0:b0:568:c6d5:e13a with SMTP id m30-20020a50d7de000000b00568c6d5e13amr4109455edj.15.1710899343907; Tue, 19 Mar 2024 18:49:03 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1710899343; cv=pass; d=google.com; s=arc-20160816; b=xdqYzs6CyrBZOrKhQx8z3UdmwmMwJVqvNSVfJpUT4qTBgCg6Kdk+n6RlZd37ClIv3v dd+jleu8kEjHmPBscs4JDqnzmY0cYeOWlvQwmbwBL8InKIqOJZHNop2s7SJD9iK3FHfj OaQsc3iATFzLqQt3xahoM59W6KmmEetI/8ZYCateEuPyjTVDHzCqkFR4ay+qhlfXYy4N Ff1gPrrcf+3xT7Oc6mPGauKTFNdCuuAT0ynneMlQNNR3Kx+OzRbMN5sFfvg2mf0+w6Eh N5Xa6vaX++AQu8msSpP9kuWFIZwsRtD1UwL2XsnC12ZnMmkaVVrWFrup2W99B7AhDRBO XRmw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:message-id:to:from:cc:in-reply-to :subject:date:dkim-signature; bh=0XFWeUGhARFGLtFMmrEktN1r5EqqYdHQ1yDObB41VHU=; fh=hurpz4Z7KztngXJvDIVDG9w3lWi0K6K78EZNOnXwUE0=; b=PI45aq/+4kee8QwUG10lB17Naf+rpTyhsoe6MoqA8qmfRBOyx7azjSkJdtZouaraMG L9/499Fgu733AMTtSzOKSPJkGKSsQ00WqCBVi8CBwlkqY+J6uhzXFyIw6UoN2RPv2yWj z5P3acJPPjSSU0v5tR6la66Ubbawbqamc1HOxkQ/OC6PxP1V8tgoQvJ06IeB/z3Fn6xn 9mWPPzG0tXmPy0uGbX+rAHHsyD1LlLHoRPgy982Zhn/wmsIrSOxgGUyymTSduZ+H1sdj A2uAX4kmIJ7PmgFqZt2FECPhrRyD/AI2VPSuAUYQactqDkiWOFRTDZ/EPrFCFEYH5Q7A lcOA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@dabbelt-com.20230601.gappssmtp.com header.s=20230601 header.b=bgQHemEv; arc=pass (i=1 spf=pass spfdomain=dabbelt.com dkim=pass dkdomain=dabbelt-com.20230601.gappssmtp.com); spf=pass (google.com: domain of linux-crypto+bounces-2759-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-crypto+bounces-2759-linux.lists.archive=gmail.com@vger.kernel.org" Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [2604:1380:4601:e00::3]) by mx.google.com with ESMTPS id ek17-20020a056402371100b0056ba785da85si498217edb.595.2024.03.19.18.49.03 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Mar 2024 18:49:03 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-crypto+bounces-2759-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) client-ip=2604:1380:4601:e00::3; Authentication-Results: mx.google.com; dkim=pass header.i=@dabbelt-com.20230601.gappssmtp.com header.s=20230601 header.b=bgQHemEv; arc=pass (i=1 spf=pass spfdomain=dabbelt.com dkim=pass dkdomain=dabbelt-com.20230601.gappssmtp.com); spf=pass (google.com: domain of linux-crypto+bounces-2759-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-crypto+bounces-2759-linux.lists.archive=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 3FDE41F2445A for ; Wed, 20 Mar 2024 01:49:03 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 5560E7464; Wed, 20 Mar 2024 01:48:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=dabbelt-com.20230601.gappssmtp.com header.i=@dabbelt-com.20230601.gappssmtp.com header.b="bgQHemEv" X-Original-To: linux-crypto@vger.kernel.org Received: from mail-pf1-f172.google.com (mail-pf1-f172.google.com [209.85.210.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EE58F6FC3 for ; Wed, 20 Mar 2024 01:48:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.172 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710899338; cv=none; b=PP+6yJEFDaT1dsml/Rxx/jIbMUfVxMdx6Vi5ToYrw185y13XeuXZ5/ahEkO12ezpjk4b95TDPU6Z8rmzOMJMHJMaAhrHfNKkZ1bfQKab9IyWFjIKrWUd1Dx1vdpZ2ZZXodN3v1A3GRlEhjkA8VLR1bpyvoi6ZhJb1XGTtqLJVJ0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710899338; c=relaxed/simple; bh=qI5Tub3LHNOZFOE4/6hw3EHPShN11Tc+qwy45TA6jzI=; h=Date:Subject:In-Reply-To:CC:From:To:Message-ID:Mime-Version: Content-Type; b=eRXzwmA6PuxT82t1ZDPacICKxFelWCKKhoU4BJ3DXU93l4I9Ja/Zu9vlPa5o3bXHdU+0J8TaD/GVYXB5dQQbLo0w4DB174/YiWYapoXfB35YQW+6nl/rfWkJTMa6UA+dNGyB1lkfIptL+/e5OQgUnXY54F6hTDWHzucIwrs3ABc= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=dabbelt.com; spf=pass smtp.mailfrom=dabbelt.com; dkim=pass (2048-bit key) header.d=dabbelt-com.20230601.gappssmtp.com header.i=@dabbelt-com.20230601.gappssmtp.com header.b=bgQHemEv; arc=none smtp.client-ip=209.85.210.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=dabbelt.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=dabbelt.com Received: by mail-pf1-f172.google.com with SMTP id d2e1a72fcca58-6e709e0c123so3055301b3a.1 for ; Tue, 19 Mar 2024 18:48:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dabbelt-com.20230601.gappssmtp.com; s=20230601; t=1710899335; x=1711504135; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:message-id:to:from:cc :in-reply-to:subject:date:from:to:cc:subject:date:message-id :reply-to; bh=0XFWeUGhARFGLtFMmrEktN1r5EqqYdHQ1yDObB41VHU=; b=bgQHemEvN05cdGU6PpvED1zoIXJgO5LBFTlnx50deFNOuq+GHUAaD7l6X6AZeWsNJQ 1plwhb/c1gHnH8CX4ofdt0KwXvycCbWRqrjaMWEoAZqkDU0abR8J6zUK964XMUM84i3y m0jGe4hOoPUABl+sORRsya0599TlvqK56yuJe7Xhp7fNPbL+YSSYdeec6RTgnBrNpZob xybC4lfWlAccgPfc+HFxo6jhah3Zeoe172z0ZQvSdKPMfJRUfsJKkp9FHnK8ZX2VfhdE mgwrO2RO2QPXa3nenA2EbMNh4RdKOzhMpBndkRFcmGXwEZN4OplqdUak+gI7NCAHIv+0 tLdA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1710899335; x=1711504135; h=content-transfer-encoding:mime-version:message-id:to:from:cc :in-reply-to:subject:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=0XFWeUGhARFGLtFMmrEktN1r5EqqYdHQ1yDObB41VHU=; b=OUsnXzgCVN0ae36mr4MJz7Uw4lkUL1gZPA26l+n3HVqcuMfp8M/tVw75wptwTEbCgs 3JwA3ksMyz2qWEI6w/6jQHac0RHR9bXIXkYtaKw5tybz7hJ9qBiVSQY5w8XEXyWggTqO z93OlIV9hKhmDkB4KEOEjrotAzJKCctbJ5lXbVwq2qcq4T4Wa9vtqr6/SeU/9sUgtNoU buaCCrHnwNnMsG5HTVygMrg8zk+BccspFofpiW8heiLx9vr63rxw18NDtolntvzyhLff yiNxRxKBB4VsRC4AxU44RE++pFpBCzQl2M+Beg4hfBYb8GFGEbdvcAP8KIyNN+mybE+8 sOsA== X-Forwarded-Encrypted: i=1; AJvYcCU4wNn3At/yZUY9PQKf5ouNe3jOHM2chJA27dNPupP9GLI8ARFjMY7vzMHVx+8l3LbS5jcxVn/tViu7vLY+tje6HbR7AB8oOX/UDmhM X-Gm-Message-State: AOJu0YwnevWW67bJi70Bl74uqmUTy4X1Kz27kfggV+un5PBAVQi2MW42 pBofR22ZQ9mvOn/AYDOb+ogBwh1kuM5Yegk3T6Bhp9en6tlMe5DD32ciW6GrcDQ= X-Received: by 2002:a05:6a20:b915:b0:1a3:69a9:e3e7 with SMTP id fe21-20020a056a20b91500b001a369a9e3e7mr4012756pzb.18.1710899334899; Tue, 19 Mar 2024 18:48:54 -0700 (PDT) Received: from localhost ([50.213.54.97]) by smtp.gmail.com with ESMTPSA id ld11-20020a170902facb00b001db5079b705sm1628209plb.36.2024.03.19.18.48.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Mar 2024 18:48:54 -0700 (PDT) Date: Tue, 19 Mar 2024 18:48:54 -0700 (PDT) X-Google-Original-Date: Tue, 19 Mar 2024 18:48:50 PDT (-0700) Subject: Re: [PATCH riscv/for-next] crypto: riscv - parallelize AES-CBC decryption In-Reply-To: <20240210181240.GA1098@sol.localdomain> CC: jerry.shih@sifive.com, linux-riscv@lists.infradead.org, linux-crypto@vger.kernel.org, christoph.muellner@vrull.eu, Heiko Stuebner , phoebe.chen@sifive.com, andy.chiu@sifive.com From: Palmer Dabbelt To: ebiggers@kernel.org Message-ID: Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 (MHng) Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit On Sat, 10 Feb 2024 10:12:40 PST (-0800), ebiggers@kernel.org wrote: > On Sat, Feb 10, 2024 at 11:25:27PM +0800, Jerry Shih wrote: >> > .macro aes_cbc_decrypt keylen >> > + srli LEN, LEN, 2 // Convert LEN from bytes to words >> > vle32.v v16, (IVP) // Load IV >> > 1: >> > - vle32.v v17, (INP) // Load ciphertext block >> > - vmv.v.v v18, v17 // Save ciphertext block >> > - aes_decrypt v17, \keylen // Decrypt >> > - vxor.vv v17, v17, v16 // XOR with IV or prev ciphertext block >> > - vse32.v v17, (OUTP) // Store plaintext block >> > - vmv.v.v v16, v18 // Next "IV" is prev ciphertext block >> > - addi INP, INP, 16 >> > - addi OUTP, OUTP, 16 >> > - addi LEN, LEN, -16 >> > + vsetvli t0, LEN, e32, m4, ta, ma >> > + vle32.v v20, (INP) // Load ciphertext blocks >> > + vslideup.vi v16, v20, 4 // Setup prev ciphertext blocks >> > + addi t1, t0, -4 >> > + vslidedown.vx v24, v20, t1 // Save last ciphertext block >> >> Do we need to setup the `e32, len=t0` for next IV? >> I think we only need 128bit IV (with VL=4). >> >> > + aes_decrypt v20, \keylen // Decrypt the blocks >> > + vxor.vv v20, v20, v16 // XOR with prev ciphertext blocks >> > + vse32.v v20, (OUTP) // Store plaintext blocks >> > + vmv.v.v v16, v24 // Next "IV" is last ciphertext block >> >> Same VL issue here. > > It's true that the vslidedown.vx and vmv.v.v only need vl=4. But it also works > fine with vl unchanged. It just results in some extra data being moved in the > registers. My hypothesis is that this is going to be faster than having the > three extra instructions per loop iteration to change the vl to 4 twice. > > I still have no real hardware to test on, so I have no quantitative data. All I > can do is go with my instinct which is that the shorter version will be better. > > If you have access to a real CPU that supports the RISC-V vector crypto > extensions, I'd be interested in the performance you get from each variant. > (Of course, different RISC-V CPU implementations may have quite different > performance characteristics, so that still won't be definitive.) We're stacking up a lot of stuff with HW-dependent performance questions, I think it's fine to just take what's reasonably simple for now. If we try to speculate about what future hardware might do we're just going to go crazy with possibilities, IMO we're way better off just optimizing as things show up. > Here is the alternative variant given as a diff from this patch: > > diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.S b/arch/riscv/crypto/aes-riscv64-zvkned.S > index 43541aad6386c..ef380771f606a 100644 > --- a/arch/riscv/crypto/aes-riscv64-zvkned.S > +++ b/arch/riscv/crypto/aes-riscv64-zvkned.S > @@ -146,10 +146,13 @@ SYM_FUNC_END(aes_ecb_decrypt_zvkned) > vle32.v v20, (INP) // Load ciphertext blocks > vslideup.vi v16, v20, 4 // Setup prev ciphertext blocks > addi t1, t0, -4 > + vsetivli zero, 4, e32, m4, ta, ma > vslidedown.vx v24, v20, t1 // Save last ciphertext block > + vsetvli t0, LEN, e32, m4, ta, ma > aes_decrypt v20, \keylen // Decrypt the blocks > vxor.vv v20, v20, v16 // XOR with prev ciphertext blocks > vse32.v v20, (OUTP) // Store plaintext blocks > + vsetivli zero, 4, e32, m4, ta, ma > vmv.v.v v16, v24 // Next "IV" is last ciphertext block > slli t1, t0, 2 // Words to bytes > add INP, INP, t1 > @@ -157,7 +160,6 @@ SYM_FUNC_END(aes_ecb_decrypt_zvkned) > sub LEN, LEN, t0 > bnez LEN, 1b > > - vsetivli zero, 4, e32, m1, ta, ma > vse32.v v16, (IVP) // Store next IV > ret > .endm > > A third variant would be to just replace vmv.v.v with vmv1r.v. > > In general, this level of micro-optimization probably needs to be wait until > there are a variety of CPUs to test on. We know that parallelizing the > algorithms is helpful, so we should do that, as this patch does. But the > effects of small variations in the instruction sequences are currently unclear. Ya, I agree. So I'm fine with this, it's a base and we can always improve it when there's something concrete to run on. > - Eric