2023-07-11 16:05:39

by Heiko Stuebner

[permalink] [raw]
Subject: [PATCH v4 00/12] RISC-V: support some cryptography accelerations

From: Heiko Stuebner <[email protected]>

This series provides cryptographic implementations using the vector
crypto extensions.

v13 of the vector patchset dropped the patches for in-kernel usage of
vector instructions, I picked the ones from v12 over into this series
for now.

My basic goal was to not re-invent cryptographic code, so the heavy
lifting is done by those perl-asm scripts used in openssl and the perl
code used here-in stems from code that is targetted at openssl [0] and is
unmodified from there to limit needed review effort.

With a matching qemu (there are patches for vector-crypto flying around)
the in-kernel crypto-selftests (also the extended ones) are very happy
so far.


changes in v4:
- split off from scalar crypto patches but base on top of them
- adapt to pending openssl code [0] using the now frozen vector crypto
extensions - with all its changes
[0] https://github.com/openssl/openssl/pull/20149

changes in v3:
- rebase on top of 6.3-rc2
- rebase on top of vector-v14 patchset
- add the missing Co-developed-by mentions to showcase
the people that did the actual openSSL crypto code

changes in v2:
- rebased on 6.2 + zbb series, so don't include already
applied changes anymore
- refresh code picked from openssl as that side matures
- more algorithms (SHA512, AES, SM3, SM4)

Greentime Hu (2):
riscv: Add support for kernel mode vector
riscv: Add vector extension XOR implementation

Heiko Stuebner (10):
RISC-V: add helper function to read the vector VLEN
RISC-V: add vector crypto extension detection
RISC-V: crypto: update perl include with helpers for vector (crypto)
instructions
RISC-V: crypto: add Zvbb+Zvbc accelerated GCM GHASH implementation
RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation
RISC-V: crypto: add a vector-crypto-accelerated SHA256 implementation
RISC-V: crypto: add a vector-crypto-accelerated SHA512 implementation
RISC-V: crypto: add Zvkned accelerated AES encryption implementation
RISC-V: crypto: add Zvksed accelerated SM4 encryption implementation
RISC-V: crypto: add Zvksh accelerated SM3 hash implementation

arch/riscv/crypto/Kconfig | 68 ++-
arch/riscv/crypto/Makefile | 44 +-
arch/riscv/crypto/aes-riscv-glue.c | 168 ++++++
arch/riscv/crypto/aes-riscv64-zvkned.pl | 530 ++++++++++++++++++
arch/riscv/crypto/ghash-riscv64-glue.c | 245 ++++++++
arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl | 380 +++++++++++++
arch/riscv/crypto/ghash-riscv64-zvkg.pl | 168 ++++++
arch/riscv/crypto/riscv.pm | 433 +++++++++++++-
arch/riscv/crypto/sha256-riscv64-glue.c | 115 ++++
.../crypto/sha256-riscv64-zvbb-zvknha.pl | 314 +++++++++++
arch/riscv/crypto/sha512-riscv64-glue.c | 106 ++++
.../crypto/sha512-riscv64-zvbb-zvknhb.pl | 377 +++++++++++++
arch/riscv/crypto/sm3-riscv64-glue.c | 112 ++++
arch/riscv/crypto/sm3-riscv64-zvksh.pl | 225 ++++++++
arch/riscv/crypto/sm4-riscv64-glue.c | 162 ++++++
arch/riscv/crypto/sm4-riscv64-zvksed.pl | 300 ++++++++++
arch/riscv/include/asm/hwcap.h | 9 +
arch/riscv/include/asm/vector.h | 28 +
arch/riscv/include/asm/xor.h | 82 +++
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/cpu.c | 8 +
arch/riscv/kernel/cpufeature.c | 50 ++
arch/riscv/kernel/kernel_mode_vector.c | 132 +++++
arch/riscv/lib/Makefile | 1 +
arch/riscv/lib/xor.S | 81 +++
25 files changed, 4136 insertions(+), 3 deletions(-)
create mode 100644 arch/riscv/crypto/aes-riscv-glue.c
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.pl
create mode 100644 arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl
create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.pl
create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sha256-riscv64-zvbb-zvknha.pl
create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sha512-riscv64-zvbb-zvknhb.pl
create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh.pl
create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed.pl
create mode 100644 arch/riscv/include/asm/xor.h
create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
create mode 100644 arch/riscv/lib/xor.S

--
2.39.2



2023-07-11 16:06:07

by Heiko Stuebner

[permalink] [raw]
Subject: [PATCH v4 09/12] RISC-V: crypto: add a vector-crypto-accelerated SHA512 implementation

From: Heiko Stuebner <[email protected]>

This adds an accelerated SHA512 algorithm using either the Zvknhb
vector crypto extension.

Co-developed-by: Charalampos Mitrodimas <[email protected]>
Signed-off-by: Charalampos Mitrodimas <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
---
arch/riscv/crypto/Kconfig | 11 +
arch/riscv/crypto/Makefile | 8 +-
arch/riscv/crypto/sha512-riscv64-glue.c | 106 +++++
.../crypto/sha512-riscv64-zvbb-zvknhb.pl | 377 ++++++++++++++++++
4 files changed, 501 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sha512-riscv64-zvbb-zvknhb.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 860919d230aa..e564f861d95e 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -28,4 +28,15 @@ config CRYPTO_SHA256_RISCV64
Architecture: riscv64 using
- Zvknha or Zvknhb vector crypto extensions

+config CRYPTO_SHA512_RISCV64
+ tristate "Hash functions: SHA-512"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_HASH
+ select CRYPTO_SHA512
+ help
+ SHA-512 secure hash algorithm (FIPS 180)
+
+ Architecture: riscv64
+ - Zvknhb vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index cae2f255ceae..b12c925172db 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -15,6 +15,9 @@ endif
obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvbb-zvknha.o

+obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
+sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvbb-zvknhb.o
+
quiet_cmd_perlasm = PERLASM $@
cmd_perlasm = $(PERL) $(<) void $(@)

@@ -30,5 +33,8 @@ $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
$(obj)/sha256-riscv64-zvbb-zvknha.S: $(src)/sha256-riscv64-zvbb-zvknha.pl
$(call cmd,perlasm)

+$(obj)/sha512-riscv64-zvbb-zvknhb.S: $(src)/sha512-riscv64-zvbb-zvknhb.pl
+ $(call cmd,perlasm)
+
clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S
-clean-files += sha256-riscv64-zvknha.S
+clean-files += sha256-riscv64-zvknha.S sha512-riscv64-zvknhb.S
diff --git a/arch/riscv/crypto/sha512-riscv64-glue.c b/arch/riscv/crypto/sha512-riscv64-glue.c
new file mode 100644
index 000000000000..92ea1542c22a
--- /dev/null
+++ b/arch/riscv/crypto/sha512-riscv64-glue.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SHA512 implementation for RISCV64
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sha2.h>
+#include <crypto/sha512_base.h>
+
+asmlinkage void sha512_block_data_order_zvbb_zvknhb(u64 *digest, const void *data,
+ unsigned int num_blks);
+
+
+static void __sha512_block_data_order(struct sha512_state *sst, u8 const *src,
+ int blocks)
+{
+ sha512_block_data_order_zvbb_zvknhb(sst->state, src, blocks);
+}
+
+static int sha512_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ if (crypto_simd_usable()) {
+ int ret;
+
+ kernel_rvv_begin();
+ ret = sha512_base_do_update(desc, data, len,
+ __sha512_block_data_order);
+ kernel_rvv_end();
+ return ret;
+ } else {
+ return crypto_sha512_update(desc, data, len);
+ }
+}
+
+static int sha512_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ if (!crypto_simd_usable())
+ return crypto_sha512_finup(desc, data, len, out);
+
+ kernel_rvv_begin();
+ if (len)
+ sha512_base_do_update(desc, data, len,
+ __sha512_block_data_order);
+
+ sha512_base_do_finalize(desc, __sha512_block_data_order);
+ kernel_rvv_end();
+
+ return sha512_base_finish(desc, out);
+}
+
+static int sha512_final(struct shash_desc *desc, u8 *out)
+{
+ return sha512_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sha512_alg = {
+ .digestsize = SHA512_DIGEST_SIZE,
+ .init = sha512_base_init,
+ .update = sha512_update,
+ .final = sha512_final,
+ .finup = sha512_finup,
+ .descsize = sizeof(struct sha512_state),
+ .base.cra_name = "sha512",
+ .base.cra_driver_name = "sha512-riscv64-zvknhb",
+ .base.cra_priority = 150,
+ .base.cra_blocksize = SHA512_BLOCK_SIZE,
+ .base.cra_module = THIS_MODULE,
+};
+
+static int __init sha512_mod_init(void)
+{
+ /* sha512 needs at least a vlen of 256 to work correctly */
+ if (riscv_isa_extension_available(NULL, ZVKNHB) &&
+ riscv_isa_extension_available(NULL, ZVBB) &&
+ riscv_vector_vlen() >= 256)
+ return crypto_register_shash(&sha512_alg);
+
+ return 0;
+}
+
+static void __exit sha512_mod_fini(void)
+{
+ if (riscv_isa_extension_available(NULL, ZVKNHB) &&
+ riscv_isa_extension_available(NULL, ZVBB) &&
+ riscv_vector_vlen() >= 256)
+ crypto_unregister_shash(&sha512_alg);
+}
+
+module_init(sha512_mod_init);
+module_exit(sha512_mod_fini);
+
+MODULE_DESCRIPTION("SHA-512 secure hash for riscv64");
+MODULE_AUTHOR("Andy Polyakov <[email protected]>");
+MODULE_AUTHOR("Ard Biesheuvel <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sha512");
diff --git a/arch/riscv/crypto/sha512-riscv64-zvbb-zvknhb.pl b/arch/riscv/crypto/sha512-riscv64-zvbb-zvknhb.pl
new file mode 100644
index 000000000000..4bd09443dcdd
--- /dev/null
+++ b/arch/riscv/crypto/sha512-riscv64-zvbb-zvknhb.pl
@@ -0,0 +1,377 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 256
+# - Vector Bit-manipulation used in Cryptography ('Zvbb')
+# - Vector SHA-2 Secure Hash ('Zvknhb')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+my ($V0, $V10, $V11, $V12, $V13, $V14, $V15, $V16, $V17) = ("v0", "v10", "v11", "v12", "v13", "v14","v15", "v16", "v17");
+my ($V26, $V27) = ("v26", "v27");
+
+my $K512 = "K512";
+
+# Function arguments
+my ($H, $INP, $LEN, $KT, $STRIDE) = ("a0", "a1", "a2", "a3", "t3");
+
+################################################################################
+# void sha512_block_data_order_zvbb_zvknhb(void *c, const void *p, size_t len)
+$code .= <<___;
+.p2align 2
+.globl sha512_block_data_order_zvbb_zvknhb
+.type sha512_block_data_order_zvbb_zvknhb,\@function
+sha512_block_data_order_zvbb_zvknhb:
+ @{[vsetivli__x0_4_e64_m1_ta_ma]}
+
+ # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c}
+ # We achieve this by reading with a negative stride followed by
+ # element sliding.
+ li $STRIDE, -8
+ addi $H, $H, 24
+ @{[vlse64_v $V16, $H, $STRIDE]} # {d,c,b,a}
+ addi $H, $H, 32
+ @{[vlse64_v $V17, $H, $STRIDE]} # {h,g,f,e}
+ # Keep H advanced by 24
+ addi $H, $H, -32
+
+ @{[vmv_v_v $V27, $V16]} # {d,c,b,a}
+ @{[vslidedown_vi $V26, $V16, 2]} # {b,a,X,X}
+ @{[vslidedown_vi $V16, $V17, 2]} # {f,e,X,X}
+ @{[vslideup_vi $V16, $V26, 2]} # {f,e,b,a}
+ @{[vslideup_vi $V17, $V27, 2]} # {h,g,d,c}
+
+ # Keep the old state as we need it later: H' = H+{a',b',c',...,h'}.
+ @{[vmv_v_v $V26, $V16]}
+ @{[vmv_v_v $V27, $V17]}
+
+L_round_loop:
+ la $KT, $K512 # Load round constants K512
+
+ # Load the 1024-bits of the message block in v10-v13 and perform
+ # an endian swap on each 4 bytes element.
+ @{[vle64_v $V10, $INP]}
+ @{[vrev8_v $V10, $V10]}
+ add $INP, $INP, 32
+ @{[vle64_v $V11, $INP]}
+ @{[vrev8_v $V11, $V11]}
+ add $INP, $INP, 32
+ @{[vle64_v $V12, $INP]}
+ @{[vrev8_v $V12, $V12]}
+ add $INP, $INP, 32
+ @{[vle64_v $V13, $INP]}
+ @{[vrev8_v $V13, $V13]}
+ add $INP, $INP, 32
+
+ # Decrement length by 1
+ add $LEN, $LEN, -1
+
+ # Set v0 up for the vmerge that replaces the first word (idx==0)
+ @{[vid_v $V0]}
+ @{[vmseq_vi $V0, $V0, 0x0]} # v0.mask[i] = (i == 0 ? 1 : 0)
+
+ # Quad-round 0 (+0, v10->v11->v12->v13)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V10]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+ @{[vsha2ms_vv $V10, $V14, $V13]}
+
+ # Quad-round 1 (+1, v11->v12->v13->v10)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V11]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+ @{[vsha2ms_vv $V11, $V14, $V10]}
+
+ # Quad-round 2 (+2, v12->v13->v10->v11)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V12]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+ @{[vsha2ms_vv $V12, $V14, $V11]}
+
+ # Quad-round 3 (+3, v13->v10->v11->v12)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V13]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+ @{[vsha2ms_vv $V13, $V14, $V12]}
+
+ # Quad-round 4 (+0, v10->v11->v12->v13)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V10]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+ @{[vsha2ms_vv $V10, $V14, $V13]}
+
+ # Quad-round 5 (+1, v11->v12->v13->v10)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V11]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+ @{[vsha2ms_vv $V11, $V14, $V10]}
+
+ # Quad-round 6 (+2, v12->v13->v10->v11)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V12]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+ @{[vsha2ms_vv $V12, $V14, $V11]}
+
+ # Quad-round 7 (+3, v13->v10->v11->v12)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V13]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+ @{[vsha2ms_vv $V13, $V14, $V12]}
+
+ # Quad-round 8 (+0, v10->v11->v12->v13)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V10]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+ @{[vsha2ms_vv $V10, $V14, $V13]}
+
+ # Quad-round 9 (+1, v11->v12->v13->v10)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V11]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+ @{[vsha2ms_vv $V11, $V14, $V10]}
+
+ # Quad-round 10 (+2, v12->v13->v10->v11)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V12]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+ @{[vsha2ms_vv $V12, $V14, $V11]}
+
+ # Quad-round 11 (+3, v13->v10->v11->v12)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V13]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+ @{[vsha2ms_vv $V13, $V14, $V12]}
+
+ # Quad-round 12 (+0, v10->v11->v12->v13)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V10]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+ @{[vsha2ms_vv $V10, $V14, $V13]}
+
+ # Quad-round 13 (+1, v11->v12->v13->v10)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V11]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+ @{[vsha2ms_vv $V11, $V14, $V10]}
+
+ # Quad-round 14 (+2, v12->v13->v10->v11)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V12]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+ @{[vsha2ms_vv $V12, $V14, $V11]}
+
+ # Quad-round 15 (+3, v13->v10->v11->v12)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V13]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+ @{[vsha2ms_vv $V13, $V14, $V12]}
+
+ # Quad-round 16 (+0, v10->v11->v12->v13)
+ # Note that we stop generating new message schedule words (Wt, v10-13)
+ # as we already generated all the words we end up consuming (i.e., W[79:76]).
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V10]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+
+ # Quad-round 17 (+1, v11->v12->v13->v10)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V11]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+
+ # Quad-round 18 (+2, v12->v13->v10->v11)
+ @{[vle64_v $V15, ($KT)]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V14, $V15, $V12]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+ @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+
+ # Quad-round 19 (+3, v13->v10->v11->v12)
+ @{[vle64_v $V15, ($KT)]}
+ # No t1 increment needed.
+ @{[vadd_vv $V14, $V15, $V13]}
+ @{[vsha2cl_vv $V17, $V16, $V14]}
+ @{[vsha2ch_vv $V16, $V17, $V14]}
+
+ # H' = H+{a',b',c',...,h'}
+ @{[vadd_vv $V16, $V26, $V16]}
+ @{[vadd_vv $V17, $V27, $V17]}
+ @{[vmv_v_v $V26, $V16]}
+ @{[vmv_v_v $V27, $V17]}
+ bnez $LEN, L_round_loop
+
+ # v26 = v16 = {f,e,b,a}
+ # v27 = v17 = {h,g,d,c}
+ # Let's do the opposit transformation like on entry.
+
+ @{[vslideup_vi $V17, $V16, 2]} # {h,g,f,e}
+
+ @{[vslidedown_vi $V16, $V27, 2]} # {d,c,X,X}
+ @{[vslidedown_vi $V26, $V26, 2]} # {b,a,X,X}
+ @{[vslideup_vi $V16, $V26, 2]} # {d,c,b,a}
+
+ # H is already advanced by 24
+ @{[vsse64_v $V16, $H, $STRIDE]} # {a,b,c,d}
+ addi $H, $H, 32
+ @{[vsse64_v $V17, $H, $STRIDE]} # {e,f,g,h}
+
+ ret
+.size sha512_block_data_order_zvbb_zvknhb,.-sha512_block_data_order_zvbb_zvknhb
+
+.p2align 3
+.type $K512,\@object
+$K512:
+ .dword 0x428a2f98d728ae22, 0x7137449123ef65cd
+ .dword 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc
+ .dword 0x3956c25bf348b538, 0x59f111f1b605d019
+ .dword 0x923f82a4af194f9b, 0xab1c5ed5da6d8118
+ .dword 0xd807aa98a3030242, 0x12835b0145706fbe
+ .dword 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2
+ .dword 0x72be5d74f27b896f, 0x80deb1fe3b1696b1
+ .dword 0x9bdc06a725c71235, 0xc19bf174cf692694
+ .dword 0xe49b69c19ef14ad2, 0xefbe4786384f25e3
+ .dword 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65
+ .dword 0x2de92c6f592b0275, 0x4a7484aa6ea6e483
+ .dword 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5
+ .dword 0x983e5152ee66dfab, 0xa831c66d2db43210
+ .dword 0xb00327c898fb213f, 0xbf597fc7beef0ee4
+ .dword 0xc6e00bf33da88fc2, 0xd5a79147930aa725
+ .dword 0x06ca6351e003826f, 0x142929670a0e6e70
+ .dword 0x27b70a8546d22ffc, 0x2e1b21385c26c926
+ .dword 0x4d2c6dfc5ac42aed, 0x53380d139d95b3df
+ .dword 0x650a73548baf63de, 0x766a0abb3c77b2a8
+ .dword 0x81c2c92e47edaee6, 0x92722c851482353b
+ .dword 0xa2bfe8a14cf10364, 0xa81a664bbc423001
+ .dword 0xc24b8b70d0f89791, 0xc76c51a30654be30
+ .dword 0xd192e819d6ef5218, 0xd69906245565a910
+ .dword 0xf40e35855771202a, 0x106aa07032bbd1b8
+ .dword 0x19a4c116b8d2d0c8, 0x1e376c085141ab53
+ .dword 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8
+ .dword 0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb
+ .dword 0x5b9cca4f7763e373, 0x682e6ff3d6b2b8a3
+ .dword 0x748f82ee5defb2fc, 0x78a5636f43172f60
+ .dword 0x84c87814a1f0ab72, 0x8cc702081a6439ec
+ .dword 0x90befffa23631e28, 0xa4506cebde82bde9
+ .dword 0xbef9a3f7b2c67915, 0xc67178f2e372532b
+ .dword 0xca273eceea26619c, 0xd186b8c721c0c207
+ .dword 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178
+ .dword 0x06f067aa72176fba, 0x0a637dc5a2c898a6
+ .dword 0x113f9804bef90dae, 0x1b710b35131c471b
+ .dword 0x28db77f523047d84, 0x32caab7b40c72493
+ .dword 0x3c9ebe0a15c9bebc, 0x431d67c49c100d4c
+ .dword 0x4cc5d4becb3e42b6, 0x597f299cfc657e2a
+ .dword 0x5fcb6fab3ad6faec, 0x6c44198c4a475817
+.size $K512,.-$K512
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
--
2.39.2


2023-07-13 07:49:35

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations

On Tue, Jul 11, 2023 at 05:37:31PM +0200, Heiko Stuebner wrote:
> From: Heiko Stuebner <[email protected]>
>
> This series provides cryptographic implementations using the vector
> crypto extensions.
>
> v13 of the vector patchset dropped the patches for in-kernel usage of
> vector instructions, I picked the ones from v12 over into this series
> for now.
>
> My basic goal was to not re-invent cryptographic code, so the heavy
> lifting is done by those perl-asm scripts used in openssl and the perl
> code used here-in stems from code that is targetted at openssl [0] and is
> unmodified from there to limit needed review effort.
>
> With a matching qemu (there are patches for vector-crypto flying around)
> the in-kernel crypto-selftests (also the extended ones) are very happy
> so far.

Where does this patchset apply to? I tried torvalds/master, linux-next/master,
riscv/for-next, and cryptodev/master. Nothing worked. When sending a
patch(set), please always use the '--base' option to 'git format-patch', or
explicitly mention where it applies to, or provide a link to a git repo.

- Eric

2023-07-14 06:33:13

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations

On Thu, Jul 13, 2023 at 12:40:42AM -0700, Eric Biggers wrote:
> On Tue, Jul 11, 2023 at 05:37:31PM +0200, Heiko Stuebner wrote:
> > From: Heiko Stuebner <[email protected]>
> >
> > This series provides cryptographic implementations using the vector
> > crypto extensions.
> >
> > v13 of the vector patchset dropped the patches for in-kernel usage of
> > vector instructions, I picked the ones from v12 over into this series
> > for now.
> >
> > My basic goal was to not re-invent cryptographic code, so the heavy
> > lifting is done by those perl-asm scripts used in openssl and the perl
> > code used here-in stems from code that is targetted at openssl [0] and is
> > unmodified from there to limit needed review effort.
> >
> > With a matching qemu (there are patches for vector-crypto flying around)
> > the in-kernel crypto-selftests (also the extended ones) are very happy
> > so far.
>
> Where does this patchset apply to? I tried torvalds/master, linux-next/master,
> riscv/for-next, and cryptodev/master. Nothing worked. When sending a
> patch(set), please always use the '--base' option to 'git format-patch', or
> explicitly mention where it applies to, or provide a link to a git repo.
>

Hi Heiko, any update on this? I would like to review, and maybe test, this
patchset but there's no way for me to do so.

- Eric

2023-07-14 07:15:06

by Heiko Stuebner

[permalink] [raw]
Subject: Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations

Hi Eric,

Am Freitag, 14. Juli 2023, 08:27:08 CEST schrieb Eric Biggers:
> On Thu, Jul 13, 2023 at 12:40:42AM -0700, Eric Biggers wrote:
> > On Tue, Jul 11, 2023 at 05:37:31PM +0200, Heiko Stuebner wrote:
> > > From: Heiko Stuebner <[email protected]>
> > >
> > > This series provides cryptographic implementations using the vector
> > > crypto extensions.
> > >
> > > v13 of the vector patchset dropped the patches for in-kernel usage of
> > > vector instructions, I picked the ones from v12 over into this series
> > > for now.
> > >
> > > My basic goal was to not re-invent cryptographic code, so the heavy
> > > lifting is done by those perl-asm scripts used in openssl and the perl
> > > code used here-in stems from code that is targetted at openssl [0] and is
> > > unmodified from there to limit needed review effort.
> > >
> > > With a matching qemu (there are patches for vector-crypto flying around)
> > > the in-kernel crypto-selftests (also the extended ones) are very happy
> > > so far.
> >
> > Where does this patchset apply to? I tried torvalds/master, linux-next/master,
> > riscv/for-next, and cryptodev/master. Nothing worked. When sending a
> > patch(set), please always use the '--base' option to 'git format-patch', or
> > explicitly mention where it applies to, or provide a link to a git repo.
> >
>
> Hi Heiko, any update on this? I would like to review, and maybe test, this
> patchset but there's no way for me to do so.

sorry about that. As you said, this should've been mentioned in the
cover-letter.

This patchset goes on top of the v6 scalar one [0] which in turn
goes on top of the arch-random patchset [1] and that in turn sits
on top of 6.5-rc1 for me.


Heiko


[0] https://lore.kernel.org/r/[email protected]
[1] https://lore.kernel.org/r/[email protected]



2023-07-21 05:16:26

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations

Hi Heiko,

On Tue, Jul 11, 2023 at 05:37:31PM +0200, Heiko Stuebner wrote:
> From: Heiko Stuebner <[email protected]>
>
> This series provides cryptographic implementations using the vector
> crypto extensions.
>
> v13 of the vector patchset dropped the patches for in-kernel usage of
> vector instructions, I picked the ones from v12 over into this series
> for now.
>
> My basic goal was to not re-invent cryptographic code, so the heavy
> lifting is done by those perl-asm scripts used in openssl and the perl
> code used here-in stems from code that is targetted at openssl [0] and is
> unmodified from there to limit needed review effort.
>
> With a matching qemu (there are patches for vector-crypto flying around)
> the in-kernel crypto-selftests (also the extended ones) are very happy
> so far.
>
>
> changes in v4:
> - split off from scalar crypto patches but base on top of them
> - adapt to pending openssl code [0] using the now frozen vector crypto
> extensions - with all its changes
> [0] https://github.com/openssl/openssl/pull/20149
>
> changes in v3:
> - rebase on top of 6.3-rc2
> - rebase on top of vector-v14 patchset
> - add the missing Co-developed-by mentions to showcase
> the people that did the actual openSSL crypto code
>
> changes in v2:
> - rebased on 6.2 + zbb series, so don't include already
> applied changes anymore
> - refresh code picked from openssl as that side matures
> - more algorithms (SHA512, AES, SM3, SM4)
>
> Greentime Hu (2):
> riscv: Add support for kernel mode vector
> riscv: Add vector extension XOR implementation
>
> Heiko Stuebner (10):
> RISC-V: add helper function to read the vector VLEN
> RISC-V: add vector crypto extension detection
> RISC-V: crypto: update perl include with helpers for vector (crypto)
> instructions
> RISC-V: crypto: add Zvbb+Zvbc accelerated GCM GHASH implementation
> RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation
> RISC-V: crypto: add a vector-crypto-accelerated SHA256 implementation
> RISC-V: crypto: add a vector-crypto-accelerated SHA512 implementation
> RISC-V: crypto: add Zvkned accelerated AES encryption implementation
> RISC-V: crypto: add Zvksed accelerated SM4 encryption implementation
> RISC-V: crypto: add Zvksh accelerated SM3 hash implementation
>
> arch/riscv/crypto/Kconfig | 68 ++-
> arch/riscv/crypto/Makefile | 44 +-
> arch/riscv/crypto/aes-riscv-glue.c | 168 ++++++
> arch/riscv/crypto/aes-riscv64-zvkned.pl | 530 ++++++++++++++++++
> arch/riscv/crypto/ghash-riscv64-glue.c | 245 ++++++++
> arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl | 380 +++++++++++++
> arch/riscv/crypto/ghash-riscv64-zvkg.pl | 168 ++++++
> arch/riscv/crypto/riscv.pm | 433 +++++++++++++-
> arch/riscv/crypto/sha256-riscv64-glue.c | 115 ++++
> .../crypto/sha256-riscv64-zvbb-zvknha.pl | 314 +++++++++++
> arch/riscv/crypto/sha512-riscv64-glue.c | 106 ++++
> .../crypto/sha512-riscv64-zvbb-zvknhb.pl | 377 +++++++++++++
> arch/riscv/crypto/sm3-riscv64-glue.c | 112 ++++
> arch/riscv/crypto/sm3-riscv64-zvksh.pl | 225 ++++++++
> arch/riscv/crypto/sm4-riscv64-glue.c | 162 ++++++
> arch/riscv/crypto/sm4-riscv64-zvksed.pl | 300 ++++++++++
> arch/riscv/include/asm/hwcap.h | 9 +
> arch/riscv/include/asm/vector.h | 28 +
> arch/riscv/include/asm/xor.h | 82 +++
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/cpu.c | 8 +
> arch/riscv/kernel/cpufeature.c | 50 ++
> arch/riscv/kernel/kernel_mode_vector.c | 132 +++++
> arch/riscv/lib/Makefile | 1 +
> arch/riscv/lib/xor.S | 81 +++
> 25 files changed, 4136 insertions(+), 3 deletions(-)
> create mode 100644 arch/riscv/crypto/aes-riscv-glue.c
> create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.pl
> create mode 100644 arch/riscv/crypto/ghash-riscv64-zvbb-zvbc.pl
> create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.pl
> create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c
> create mode 100644 arch/riscv/crypto/sha256-riscv64-zvbb-zvknha.pl
> create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c
> create mode 100644 arch/riscv/crypto/sha512-riscv64-zvbb-zvknhb.pl
> create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c
> create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh.pl
> create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c
> create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed.pl
> create mode 100644 arch/riscv/include/asm/xor.h
> create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
> create mode 100644 arch/riscv/lib/xor.S
>

Thanks for working on this patchset! I'm glad to see that you and others are
working on this and the code in OpenSSL. And thanks for running all the kernel
crypto self-tests and verifying that they pass.

I'm still a bit worried about there being two competing sets of crypto
extensions for RISC-V: scalar and vector.

However the vector crypto extensions are moving forwards (they were recently
frozen), from what I've heard are being implemented in CPUs, and based on this
patchset implementations of most algorithms are ready already.

So I'm wondering: do you still think that it's valuable to continue with your
other patchset that adds GHASH acceleration using the scalar extensions (which
this patchset is still based on)?

I'm wondering if we should be 100% focused on the vector extensions for now to
avoid fragmentation of effort.

It's just not super clear to me what is driving the scalar crypto support right
now. Maybe embedded systems? Maybe it was just a mistep, perhaps due to being
started before the CPU even had a vector unit? I don't know. If you do indeed
have a strong reason for it, then you can go ahead -- I just wanted to make sure
we don't end up doing twice as much work unnecessarily.

- Eric

2023-09-14 00:12:35

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations

On Tue, Jul 11, 2023 at 05:37:31PM +0200, Heiko Stuebner wrote:
> From: Heiko Stuebner <[email protected]>
>
> This series provides cryptographic implementations using the vector
> crypto extensions.
>
> v13 of the vector patchset dropped the patches for in-kernel usage of
> vector instructions, I picked the ones from v12 over into this series
> for now.
>
> My basic goal was to not re-invent cryptographic code, so the heavy
> lifting is done by those perl-asm scripts used in openssl and the perl
> code used here-in stems from code that is targetted at openssl [0] and is
> unmodified from there to limit needed review effort.
>
> With a matching qemu (there are patches for vector-crypto flying around)
> the in-kernel crypto-selftests (also the extended ones) are very happy
> so far.

Hi Heiko! Are you still working on this patchset? And which of its
prerequisites still haven't been merged upstream?

- Eric

2023-10-06 19:47:55

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations

On Fri, Sep 15, 2023 at 11:21:28AM +0800, Jerry Shih wrote:
> On Sep 15, 2023, at 09:48, He-Jie Shih <[email protected]> wrote:
>
> > On Sep 14, 2023, at 09:10, Charlie Jenkins <[email protected]> wrote:
> >
> >> On Wed, Sep 13, 2023 at 05:11:44PM -0700, Eric Biggers wrote:
> >>> On Tue, Jul 11, 2023 at 05:37:31PM +0200, Heiko Stuebner wrote:
> >>>
> >>> Hi Heiko! Are you still working on this patchset? And which of its
> >>> prerequisites still haven't been merged upstream?
> >>>
> >>> - Eric
> >> It is my understanding that Heiko is taking a break from development, I
> >> don't think he will be working on this soon.
> >
> > We would like to take over these RISC-V vector crypto implementations.
> > Our proposed implementations is under reviewing in OpenSSL pr.
> > And I will check the gluing parts between Linux kernel and OpenSSL.
>
> The OpenSSL PR is at [1].
> And we are from SiFive.
>
> -Jerry
>
> [1]
> https://github.com/openssl/openssl/pull/21923

Hi Jerry, I'm wondering if you have an update on this? Do you need any help?

I'm also wondering about riscv.pm and the choice of generating the crypto
instructions from .words instead of using the assembler. It makes it
significantly harder to review the code, IMO. Can we depend on assembler
support for these instructions, or is that just not ready yet?

- Eric

2023-10-06 23:34:08

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations

On Fri, 6 Oct 2023 at 23:01, He-Jie Shih <[email protected]> wrote:
>
> On Oct 7, 2023, at 03:47, Eric Biggers <[email protected]> wrote:
> > On Fri, Sep 15, 2023 at 11:21:28AM +0800, Jerry Shih wrote:
> >> On Sep 15, 2023, at 09:48, He-Jie Shih <[email protected]> wrote:
> >> The OpenSSL PR is at [1].
> >> And we are from SiFive.
> >>
> >> -Jerry
> >>
> >> [1]
> >> https://github.com/openssl/openssl/pull/21923
> >
> > Hi Jerry, I'm wondering if you have an update on this? Do you need any help?
>
> We have specialized aes-cbc/ecb/ctr patch locally and pass the `testmgr` test
> cases. But the test patterns in `testmgr` are quite simple, I think it doesn't test the
> corner case(e.g. aes-xts with tail element).
>

There should be test cases for that.

> For aes-xts, I'm trying to update the implementation from OpenSSL. The design
> philosophy is different between OpenSSL and linux. In linux crypto, the data will
> be split into `scatterlist`. I need to preserve the aes-xts's iv for each scatterlist
> entry call.

Yes, this applies to all block ciphers that take an IV.

> And I'm thinking about how to handle the tail data in a simple way.

The RISC-V vector ISA is quite advanced, so there may be a better
trick using predicates etc but otherwise, I suppose you could reuse
the same trick that other asm implementations use, which is to use
unaligned loads and stores for the final blocks, and to use a vector
permute with a permute table to shift the bytes in the registers. But
this is not performance critical, given that existing in-kernel users
use sector or page size inputs only.

> By the way, the `xts(aes)` implementation for arm and x86 are using
> `cra_blocksize= AES_BLOCK_SIZE`. I don't know why we need to handle the tail
> element. I think we will hit `EINVAL` error in `skcipher_walk_next()` if the data size
> it not be a multiple of block size.
>

No, both XTS and CBC-CTS permit inputs that are not a multiple of the
block size, and will use some form of ciphertext stealing for the
final tail. There is a generic CTS template that wraps CBC but
combining them in the same way (e.g., using vector permute) will speed
up things substantially. *However*, I'm not sure how relevant CBC-CTS
is in the kernel, given that only fscrypt uses it IIRC but actually
prefers something else so for new systems perhaps you shouldn't
bother.

> Overall, we will have
> 1) aes cipher
> 2) aes with cbc/ecb/ctr/xts mode
> 3) sha256/512 for `vlen>=128` platform
> 4) sm3 for `vlen>=128` platform
> 5) sm4
> 6) ghash
> 7) `chacha20` stream cipher
>
> The vector crypto pr in OpenSSL is under reviewing, we are still updating the
> perl file into linux.
>
> The most complicated `gcm(aes)` mode will be in our next plan.
>
> > I'm also wondering about riscv.pm and the choice of generating the crypto
> > instructions from .words instead of using the assembler. It makes it
> > significantly harder to review the code, IMO. Can we depend on assembler
> > support for these instructions, or is that just not ready yet?
>
> I have asked the same question before[1]. The reason is that Openssl could use
> very old compiler for compiling. Thus, the assembler might not know the standard
> rvv 1.0[2] and other vector crypto[3] instructions. That's why we use opcode for all
> vector instructions. IMO, I would prefer to use opcode for `vector crypto` only. The
> gcc-12 and clang-14 are already supporting rvv 1.0. Actually, I just read the `perl`
> file instead of the actually generated opcode for OpenSSL pr reviewing. And it's
> not hard to read the perl code.
>

I understand the desire to reuse code, and OpenSSL already relies on
so-called perlasm for this, but I think this is not a great choice,
and I actually think this was a mistake for RISC-V. OpenSSL relies on
perlasm for things like emittting different function pro-/epilogues
depending on the calling convention (SysV versus MS on x86_64, for
instance), but RISC-V does not have that much variety, and already
supports the insn_r / insn_i pseudo instructions to emit arbitrary
opcodes while still supporting named registers as usual. [Maybe my
experience does not quite extrapolate to the vector ISA, but I managed
to implement scalar AES [0] using the insn_r and insn_i pseudo
instructions (which are generally provided by the assembler but Linux
has fallback CPP macros for them as well), and this results on much
more maintainable code IMO.[

We are using some of the OpenSSL perlasm in the kernel already (and
some of it was introduced by me) but I don't think we should blindly
reuse all of the RISC-V code if some of it can straight-forwardly be
written as normal .S files.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=riscv-scalar-aes

2023-10-31 02:17:27

by Jerry Shih

[permalink] [raw]
Subject: Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations

On Oct 7, 2023, at 03:47, Eric Biggers <[email protected]> wrote:
> On Fri, Sep 15, 2023 at 11:21:28AM +0800, Jerry Shih wrote:
>> On Sep 15, 2023, at 09:48, He-Jie Shih <[email protected]> wrote:
>>
>> The OpenSSL PR is at [1].
>> And we are from SiFive.
>>
>> -Jerry
>>
>> [1]
>> https://github.com/openssl/openssl/pull/21923
>
> Hi Jerry, I'm wondering if you have an update on this? Do you need any help?

The RISC-V vector crypto OpenSSL pr[1] is merged.
And we also sent the vector-crypto patch based on Heiko's and OpenSSL
works.
Here is the link:
https://lore.kernel.org/all/[email protected]/

[1]
https://github.com/openssl/openssl/pull/21923

> I'm also wondering about riscv.pm and the choice of generating the crypto
> instructions from .words instead of using the assembler. It makes it
> significantly harder to review the code, IMO. Can we depend on assembler
> support for these instructions, or is that just not ready yet?
>
> - Eric

There is no public assembler supports the vector-crypto asm mnemonics.
We should still use `opcode` for vector-crypto instructions. But we might
use asm for standard rvv parts.
In order to reuse the codes in OpenSSL as much as possible, we still use
the `riscv.pm` for all standard rvv and vector-crypto instructions. If the asm
mnemonic is still a better approach, I will `rewrite` all standard rvv parts
with asm mnemonics in next patch.

-Jerry


2023-11-02 04:04:08

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations

Hi Jerry,

(Just so you know, you still need to fix your email configuration. Your emails
have a bogus Reply-To header, which makes replies not be sent to you by default.
I had to manually set the "To:" address when replying.)

On Tue, Oct 31, 2023 at 10:17:11AM +0800, Jerry Shih wrote:
>
> The RISC-V vector crypto OpenSSL pr[1] is merged.
> And we also sent the vector-crypto patch based on Heiko's and OpenSSL
> works.
> Here is the link:
> https://lore.kernel.org/all/[email protected]/
>
> [1]
> https://github.com/openssl/openssl/pull/21923

Awesome, thanks!

>
> > I'm also wondering about riscv.pm and the choice of generating the crypto
> > instructions from .words instead of using the assembler. It makes it
> > significantly harder to review the code, IMO. Can we depend on assembler
> > support for these instructions, or is that just not ready yet?
> >
> > - Eric
>
> There is no public assembler supports the vector-crypto asm mnemonics.
> We should still use `opcode` for vector-crypto instructions. But we might
> use asm for standard rvv parts.
> In order to reuse the codes in OpenSSL as much as possible, we still use
> the `riscv.pm` for all standard rvv and vector-crypto instructions. If the asm
> mnemonic is still a better approach, I will `rewrite` all standard rvv parts
> with asm mnemonics in next patch.

Tip-of-tree gcc + binutils seems to support them. Building some of the sample
code from the riscv-crypto repository:

$ riscv64-linux-gnu-as --version
GNU assembler (GNU Binutils) 2.41.50.20231021
$ riscv64-linux-gnu-gcc --version
riscv64-linux-gnu-gcc (GCC) 14.0.0 20231021 (experimental)
$ riscv64-linux-gnu-gcc -march=rv64ivzvkned -c riscv-crypto/doc/vector/code-samples/zvkned.s

And tip-of-tree clang supports them experimentally:

$ clang --version
clang version 18.0.0 (https://github.com/llvm/llvm-project 30416f39be326b403e19f23da387009736483119)
$ clang -menable-experimental-extensions -target riscv64-linux-gnu -march=rv64ivzvkned1 -c riscv-crypto/doc/vector/code-samples/zvkned.s

It would be nice to use a real assembler, so that people won't have to worry
about potential mistakes or inconsistencies in the perl-based "assembler". Also
keep in mind that if we allow people to compile this code without the real
assembler support from the beginning, it might end up staying that way for quite
a while in order to avoid breaking the build for people.

Ultimately it's up to you though; I think that you and others who have been
working on RISC-V crypto can make the best decision about what to do here. I
also don't want this patchset to be delayed waiting for other projects, so maybe
that indeed means the perl-based "assembler" needs to be used for now.

- Eric

2023-11-23 23:45:10

by Christoph Müllner

[permalink] [raw]
Subject: Re: [PATCH v4 00/12] RISC-V: support some cryptography accelerations

On Thu, Nov 23, 2023 at 12:43 AM Eric Biggers <[email protected]> wrote:
>
> On Wed, Nov 22, 2023 at 03:58:17PM +0800, Jerry Shih wrote:
> > On Nov 22, 2023, at 07:51, Eric Biggers <[email protected]> wrote:
> > > On Wed, Nov 01, 2023 at 09:03:33PM -0700, Eric Biggers wrote:
> > >>
> > >> It would be nice to use a real assembler, so that people won't have to worry
> > >> about potential mistakes or inconsistencies in the perl-based "assembler". Also
> > >> keep in mind that if we allow people to compile this code without the real
> > >> assembler support from the beginning, it might end up staying that way for quite
> > >> a while in order to avoid breaking the build for people.
> > >>
> > >> Ultimately it's up to you though; I think that you and others who have been
> > >> working on RISC-V crypto can make the best decision about what to do here. I
> > >> also don't want this patchset to be delayed waiting for other projects, so maybe
> > >> that indeed means the perl-based "assembler" needs to be used for now.
> > >
> > > Just wanted to bump up this discussion again. In binutils, the vector crypto
> > > v1.0.0 support was released 4 months ago in 2.41. See the NEWS file at
> > > https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob_plain;f=binutils/NEWS;hb=refs/heads/binutils-2_41-branch
> > >
> > > * The RISC-V port now supports the following new standard extensions:
> > > - Zicond (conditional zero instructions)
> > > - Zfa (additional floating-point instructions)
> > > - Zvbb, Zvbc, Zvkg, Zvkned, Zvknh[ab], Zvksed, Zvksh, Zvkn, Zvknc, Zvkng,
> > > Zvks, Zvksc, Zvkg, Zvkt (vector crypto instructions)
> > >
> > > That's every extension listed in the vector crypto v1.0.0 specification
> > > (https://github.com/riscv/riscv-crypto/releases/download/v1.0.0/riscv-crypto-spec-vector.pdf).
> >
> > It doesn't fit all v1.0 spec.
> > The `Zvkb` is missed in binutils. It's the subset of `Zvbb`. We needs some extra
> > works if user just try to use `Zvkb`.
> > https://github.com/riscv/riscv-crypto/blob/main/doc/vector/riscv-crypto-vector-zvkb.adoc
> > Some crypto algorithms are already checking for `Zvkb` instead of `Zvbb`.
>
> Yeah, that's unfortunate that Zvkb got missed in binutils. However, since all
> Zvkb instructions are part of Zvbb, which is supported, assembling Zvkb
> instructions should still work --- right?

Not forgotten, but the Zvkb extension did not exist when the patchset
was merged.
RISC-V extension support is typically merged when specifications are "frozen".
This means a high bar for changes, but they are possible until the
spec is ratified.
Often nothing is changed until ratification, but here Zvkb has been
(re-)introduced.

I was not aware of this untils I read this email, so I just wrote a
patch that fills the gap:
https://sourceware.org/pipermail/binutils/2023-November/130762.html

Thanks for reporting!

BR
Christoph

>
> > > LLVM still has the vector crypto extensions marked as "experimental" extensions.
> > > However, there is an open pull request to mark them non-experimental:
> > > https://github.com/llvm/llvm-project/pull/69000
> > >
> > > Could we just go ahead and remove riscv.pm, develop with binutils for now, and
> > > prioritize getting the LLVM support finished?
> >
> > Yes, we could.
> > But we need to handle the extensions checking for toolchains like:
> > https://github.com/torvalds/linux/commit/b6fcdb191e36f82336f9b5e126d51c02e7323480
> > I could do that, but I think I need some times to test the builds. And it will introduce
> > more dependency patch which is not related to actual crypto algorithms and the
> > gluing code in kernel. I will send another patch for toolchain part after the v2 patch.
> > And I am working for v2 patch with your new review comments. The v2 will still
> > use `perlasm`.
>
> Note that perlasm (.pl) vs assembly (.S), and ".inst" vs real assembler
> instructions, are actually separate concerns. We could use real assembler
> instructions while still using perlasm. Or we could use assembly while still
> using macros that generate the instructions as .inst.
>
> My preference is indeed both: assembly (.S) with real assembler instructions.
> That would keep things more straightforward.
>
> We do not necessarily need to do both before merging the code, though. It will
> be beneficial to get this code merged sooner rather than later, so that other
> people can work on improving it.
>
> I would prioritize the change to use real assembler instructions. I do think
> it's worth thinking about getting that change in from the beginning, so that the
> toolchain prerequisites are properly in place from the beginning and people can
> properly account for them and prioritize the toolchain work as needed.
>
> - Eric