2023-11-27 07:07:30

by Jerry Shih

Subject: [PATCH v2 00/13] RISC-V: provide some accelerated cryptography implementations using vector extensions

This series provides cryptographic implementations using the vector crypto
extensions[1] including:
1. AES cipher
2. AES with CBC/CTR/ECB/XTS block modes
3. ChaCha20 stream cipher
4. GHASH for GCM
5. SHA-224/256 and SHA-384/512 hash
6. SM3 hash
7. SM4 cipher

This patch set is based on Heiko Stuebner's work at:
Link: https://lore.kernel.org/all/[email protected]/

The implementations reuse the perl-asm scripts from OpenSSL[2] with some
changes to adapt them to the kernel crypto framework.
The perl-asm scripts emit raw opcodes (`.word` directives) into the `.S`
files instead of asm mnemonics. We use opcodes because we don't want to
re-implement all of the crypto functions from OpenSSL. We will try to
replace the perl-asm scripts with asm mnemonics in the future, but that
requires toolchain support checks for many extensions. The kernel already
has RVV 1.0 support, so we will replace the plain RVV opcodes with asm
mnemonics first.

All changes pass the kernel run-time crypto self-tests and the extra tests
under vector-crypto-enabled QEMU.
Link: https://lists.gnu.org/archive/html/qemu-devel/2023-11/msg00281.html

This series depends on:
1. kernel 6.6-rc7
Link: https://github.com/torvalds/linux/commit/05d3ef8bba77c1b5f98d941d8b2d4aeab8118ef1
2. kernel-mode vector support
Link: https://lore.kernel.org/all/[email protected]/
3. vector crypto extensions detection
Link: https://lore.kernel.org/lkml/[email protected]/
4. the fix for the error message:
alg: skcipher: skipping comparison tests for xts-aes-aesni because
xts(ecb(aes-generic)) is unavailable
Link: https://lore.kernel.org/linux-crypto/[email protected]/

Here is a branch on GitHub with all the dependent patches applied:
Link: https://github.com/JerryShih/linux/tree/dev/jerrys/vector-crypto-upstream-v2

[1]
Link: https://github.com/riscv/riscv-crypto/blob/56ed7952d13eb5bdff92e2b522404668952f416d/doc/vector/riscv-crypto-spec-vector.adoc
[2]
Link: https://github.com/openssl/openssl/pull/21923

Updated patches (in current order): 4, 7, 8, 9, 10, 11, 12, 13
New patch: 6
Unchanged patches: 1, 2, 3, 5
Deleted patches: -

Changelog v2:
- Do not turn on the RISC-V accelerated crypto kconfig options by
default.
- Assume the RISC-V vector extension supports unaligned access in the
kernel.
- Switch to the simd skcipher interface for AES-CBC/CTR/ECB/XTS and
ChaCha20.
- Rename the crypto files and driver names so that the most important
extension comes first.

Heiko Stuebner (2):
RISC-V: add helper function to read the vector VLEN
RISC-V: hook new crypto subdir into build-system

Jerry Shih (11):
RISC-V: crypto: add OpenSSL perl module for vector instructions
RISC-V: crypto: add Zvkned accelerated AES implementation
crypto: simd - Update `walksize` in simd skcipher
crypto: scatterwalk - Add scatterwalk_next() to get the next
scatterlist in scatter_walk
RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation
RISC-V: crypto: add Zvknha/b accelerated SHA224/256 implementations
RISC-V: crypto: add Zvknhb accelerated SHA384/512 implementations
RISC-V: crypto: add Zvksed accelerated SM4 implementation
RISC-V: crypto: add Zvksh accelerated SM3 implementation
RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation

arch/riscv/Kbuild | 1 +
arch/riscv/crypto/Kconfig | 110 ++
arch/riscv/crypto/Makefile | 68 +
.../crypto/aes-riscv64-block-mode-glue.c | 514 +++++++
arch/riscv/crypto/aes-riscv64-glue.c | 151 ++
arch/riscv/crypto/aes-riscv64-glue.h | 18 +
.../crypto/aes-riscv64-zvkned-zvbb-zvkg.pl | 949 ++++++++++++
arch/riscv/crypto/aes-riscv64-zvkned-zvkb.pl | 415 +++++
arch/riscv/crypto/aes-riscv64-zvkned.pl | 1339 +++++++++++++++++
arch/riscv/crypto/chacha-riscv64-glue.c | 122 ++
arch/riscv/crypto/chacha-riscv64-zvkb.pl | 321 ++++
arch/riscv/crypto/ghash-riscv64-glue.c | 175 +++
arch/riscv/crypto/ghash-riscv64-zvkg.pl | 100 ++
arch/riscv/crypto/riscv.pm | 828 ++++++++++
arch/riscv/crypto/sha256-riscv64-glue.c | 145 ++
.../sha256-riscv64-zvknha_or_zvknhb-zvkb.pl | 318 ++++
arch/riscv/crypto/sha512-riscv64-glue.c | 139 ++
.../crypto/sha512-riscv64-zvknhb-zvkb.pl | 266 ++++
arch/riscv/crypto/sm3-riscv64-glue.c | 124 ++
arch/riscv/crypto/sm3-riscv64-zvksh.pl | 230 +++
arch/riscv/crypto/sm4-riscv64-glue.c | 121 ++
arch/riscv/crypto/sm4-riscv64-zvksed.pl | 268 ++++
arch/riscv/include/asm/vector.h | 11 +
crypto/Kconfig | 3 +
crypto/cryptd.c | 1 +
crypto/simd.c | 1 +
include/crypto/scatterwalk.h | 9 +-
27 files changed, 6745 insertions(+), 2 deletions(-)
create mode 100644 arch/riscv/crypto/Kconfig
create mode 100644 arch/riscv/crypto/Makefile
create mode 100644 arch/riscv/crypto/aes-riscv64-block-mode-glue.c
create mode 100644 arch/riscv/crypto/aes-riscv64-glue.c
create mode 100644 arch/riscv/crypto/aes-riscv64-glue.h
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned-zvbb-zvkg.pl
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned-zvkb.pl
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.pl
create mode 100644 arch/riscv/crypto/chacha-riscv64-glue.c
create mode 100644 arch/riscv/crypto/chacha-riscv64-zvkb.pl
create mode 100644 arch/riscv/crypto/ghash-riscv64-glue.c
create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.pl
create mode 100644 arch/riscv/crypto/riscv.pm
create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.pl
create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.pl
create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh.pl
create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed.pl

--
2.28.0


2023-11-27 07:07:57

by Jerry Shih

Subject: [PATCH v2 01/13] RISC-V: add helper function to read the vector VLEN

From: Heiko Stuebner <[email protected]>

VLEN describes the length of each vector register, and some instructions
need a specific minimum VLEN to work correctly.

The vector code already includes a variable, riscv_v_vsize, that holds the
total size of the 32 vector registers (32 * vlenb) and gets filled during
boot. vlenb is the value of the CSR_VLENB register and represents
"VLEN / 8", i.e. the vector register length in bytes.

So add riscv_vector_vlen() to return the actual VLEN value for in-kernel
users when they need to check the available VLEN.
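
For example, a minimal sketch of how an in-kernel user might gate a vector
crypto fast path on the runtime VLEN (the helper name here is made up; the
128-bit minimum mirrors the checks used by the glue code later in this
series):

  #include <asm/vector.h>

  static bool example_vector_crypto_usable(void)
  {
  	/* The Zvk* routines in this series assume VLEN >= 128 bits. */
  	return riscv_vector_vlen() >= 128;
  }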

Signed-off-by: Heiko Stuebner <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
arch/riscv/include/asm/vector.h | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 9fb2dea66abd..1fd3e5510b64 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -244,4 +244,15 @@ void kernel_vector_allow_preemption(void);
#define kernel_vector_allow_preemption() do {} while (0)
#endif

+/*
+ * Return the implementation's vlen value.
+ *
+ * riscv_v_vsize contains the value of "32 vector registers with vlenb length"
+ * so rebuild the vlen value in bits from it.
+ */
+static inline int riscv_vector_vlen(void)
+{
+ return riscv_v_vsize / 32 * 8;
+}
+
#endif /* ! __ASM_RISCV_VECTOR_H */
--
2.28.0

2023-11-27 07:08:03

by Jerry Shih

Subject: [PATCH v2 05/13] crypto: simd - Update `walksize` in simd skcipher

The `walksize` field is not propagated when instantiating the cryptd and
simd skcipher wrappers. Copy it from the inner algorithm, as is already
done for `chunksize` and the key sizes.
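
As a minimal illustration (hypothetical algorithm, not part of this patch),
an inner skcipher can advertise a `walksize` larger than its `chunksize` so
that the walk layer hands it multi-block chunks; without copying the field,
the cryptd/simd wrapper would fall back to the default walksize:

  #include <crypto/chacha.h>
  #include <crypto/internal/skcipher.h>

  /* Callbacks are omitted; only the geometry fields matter here. */
  static struct skcipher_alg example_alg = {
  	.base.cra_name		= "chacha20",
  	.base.cra_driver_name	= "chacha20-example",
  	.base.cra_blocksize	= 1,
  	.min_keysize		= CHACHA_KEY_SIZE,
  	.max_keysize		= CHACHA_KEY_SIZE,
  	.ivsize			= CHACHA_IV_SIZE,
  	.chunksize		= CHACHA_BLOCK_SIZE,
  	.walksize		= 4 * CHACHA_BLOCK_SIZE,
  };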

Signed-off-by: Jerry Shih <[email protected]>
---
crypto/cryptd.c | 1 +
crypto/simd.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/crypto/cryptd.c b/crypto/cryptd.c
index bbcc368b6a55..253d13504ccb 100644
--- a/crypto/cryptd.c
+++ b/crypto/cryptd.c
@@ -405,6 +405,7 @@ static int cryptd_create_skcipher(struct crypto_template *tmpl,
(alg->base.cra_flags & CRYPTO_ALG_INTERNAL);
inst->alg.ivsize = crypto_skcipher_alg_ivsize(alg);
inst->alg.chunksize = crypto_skcipher_alg_chunksize(alg);
+ inst->alg.walksize = crypto_skcipher_alg_walksize(alg);
inst->alg.min_keysize = crypto_skcipher_alg_min_keysize(alg);
inst->alg.max_keysize = crypto_skcipher_alg_max_keysize(alg);

diff --git a/crypto/simd.c b/crypto/simd.c
index edaa479a1ec5..ea0caabf90f1 100644
--- a/crypto/simd.c
+++ b/crypto/simd.c
@@ -181,6 +181,7 @@ struct simd_skcipher_alg *simd_skcipher_create_compat(const char *algname,

alg->ivsize = ialg->ivsize;
alg->chunksize = ialg->chunksize;
+ alg->walksize = ialg->walksize;
alg->min_keysize = ialg->min_keysize;
alg->max_keysize = ialg->max_keysize;

--
2.28.0

2023-11-27 07:08:12

by Jerry Shih

Subject: [PATCH v2 02/13] RISC-V: hook new crypto subdir into build-system

From: Heiko Stuebner <[email protected]>

Create a crypto subdirectory for added accelerated cryptography routines
and hook it into the riscv Kbuild and the main crypto Kconfig.

Signed-off-by: Heiko Stuebner <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
arch/riscv/Kbuild | 1 +
arch/riscv/crypto/Kconfig | 5 +++++
arch/riscv/crypto/Makefile | 4 ++++
crypto/Kconfig | 3 +++
4 files changed, 13 insertions(+)
create mode 100644 arch/riscv/crypto/Kconfig
create mode 100644 arch/riscv/crypto/Makefile

diff --git a/arch/riscv/Kbuild b/arch/riscv/Kbuild
index d25ad1c19f88..2c585f7a0b6e 100644
--- a/arch/riscv/Kbuild
+++ b/arch/riscv/Kbuild
@@ -2,6 +2,7 @@

obj-y += kernel/ mm/ net/
obj-$(CONFIG_BUILTIN_DTB) += boot/dts/
+obj-$(CONFIG_CRYPTO) += crypto/
obj-y += errata/
obj-$(CONFIG_KVM) += kvm/

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
new file mode 100644
index 000000000000..10d60edc0110
--- /dev/null
+++ b/arch/riscv/crypto/Kconfig
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
+
+endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
new file mode 100644
index 000000000000..b3b6332c9f6d
--- /dev/null
+++ b/arch/riscv/crypto/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# linux/arch/riscv/crypto/Makefile
+#
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 650b1b3620d8..c7b23d2c58e4 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1436,6 +1436,9 @@ endif
if PPC
source "arch/powerpc/crypto/Kconfig"
endif
+if RISCV
+source "arch/riscv/crypto/Kconfig"
+endif
if S390
source "arch/s390/crypto/Kconfig"
endif
--
2.28.0

2023-11-27 07:08:12

by Jerry Shih

Subject: [PATCH v2 04/13] RISC-V: crypto: add Zvkned accelerated AES implementation

Add an AES cipher implementation using the Zvkned vector crypto extension,
ported from OpenSSL (openssl/openssl#21923).
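
The cipher registers under the generic name "aes" (driver
"aes-riscv64-zvkned") with priority 300, so existing users of the
single-block cipher API pick it up transparently. A minimal usage sketch
(the function name and error handling are illustrative only):

  #include <crypto/aes.h>
  #include <crypto/internal/cipher.h>
  #include <linux/err.h>

  static int example_aes_encrypt_block(const u8 *key, unsigned int keylen,
  				     const u8 in[AES_BLOCK_SIZE],
  				     u8 out[AES_BLOCK_SIZE])
  {
  	struct crypto_cipher *tfm;
  	int err;

  	tfm = crypto_alloc_cipher("aes", 0, 0);
  	if (IS_ERR(tfm))
  		return PTR_ERR(tfm);

  	err = crypto_cipher_setkey(tfm, key, keylen);
  	if (!err)
  		crypto_cipher_encrypt_one(tfm, out, in);

  	crypto_free_cipher(tfm);
  	return err;
  }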

Co-developed-by: Christoph Müllner <[email protected]>
Signed-off-by: Christoph Müllner <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Co-developed-by: Phoebe Chen <[email protected]>
Signed-off-by: Phoebe Chen <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
Changelog v2:
- Do not turn on the kconfig `AES_RISCV64` option by default.
- Switch to the `crypto_aes_ctx` structure for the AES key.
- Use the `Zvkned` extension for AES-128/256 key expansion.
- Export the riscv64_aes_* symbols for other modules.
- Add the `asmlinkage` qualifier to the crypto asm functions.
- Initialize the riscv64_aes_alg_zvkned structure members in the order
they are declared.
---
arch/riscv/crypto/Kconfig | 11 +
arch/riscv/crypto/Makefile | 11 +
arch/riscv/crypto/aes-riscv64-glue.c | 151 ++++++
arch/riscv/crypto/aes-riscv64-glue.h | 18 +
arch/riscv/crypto/aes-riscv64-zvkned.pl | 593 ++++++++++++++++++++++++
5 files changed, 784 insertions(+)
create mode 100644 arch/riscv/crypto/aes-riscv64-glue.c
create mode 100644 arch/riscv/crypto/aes-riscv64-glue.h
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 10d60edc0110..65189d4d47b3 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -2,4 +2,15 @@

menu "Accelerated Cryptographic Algorithms for CPU (riscv)"

+config CRYPTO_AES_RISCV64
+ tristate "Ciphers: AES"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_ALGAPI
+ select CRYPTO_LIB_AES
+ help
+ Block ciphers: AES cipher algorithms (FIPS-197)
+
+ Architecture: riscv64 using:
+ - Zvkned vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index b3b6332c9f6d..90ca91d8df26 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -2,3 +2,14 @@
#
# linux/arch/riscv/crypto/Makefile
#
+
+obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
+aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o
+
+quiet_cmd_perlasm = PERLASM $@
+ cmd_perlasm = $(PERL) $(<) void $(@)
+
+$(obj)/aes-riscv64-zvkned.S: $(src)/aes-riscv64-zvkned.pl
+ $(call cmd,perlasm)
+
+clean-files += aes-riscv64-zvkned.S
diff --git a/arch/riscv/crypto/aes-riscv64-glue.c b/arch/riscv/crypto/aes-riscv64-glue.c
new file mode 100644
index 000000000000..091e368edb30
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-glue.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Port of the OpenSSL AES implementation for RISC-V
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/aes.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/simd.h>
+#include <linux/crypto.h>
+#include <linux/linkage.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+#include "aes-riscv64-glue.h"
+
+/* aes cipher using zvkned vector crypto extension */
+asmlinkage int rv64i_zvkned_set_encrypt_key(const u8 *user_key, const int bytes,
+ const struct crypto_aes_ctx *key);
+asmlinkage void rv64i_zvkned_encrypt(const u8 *in, u8 *out,
+ const struct crypto_aes_ctx *key);
+asmlinkage void rv64i_zvkned_decrypt(const u8 *in, u8 *out,
+ const struct crypto_aes_ctx *key);
+
+int riscv64_aes_setkey(struct crypto_aes_ctx *ctx, const u8 *key,
+ unsigned int keylen)
+{
+ int ret;
+
+ ret = aes_check_keylen(keylen);
+ if (ret < 0)
+ return -EINVAL;
+
+ /*
+ * The RISC-V AES vector crypto key expanding doesn't support AES-192.
+ * Use the generic software key expanding for that case.
+ */
+ if ((keylen == 16 || keylen == 32) && crypto_simd_usable()) {
+ /*
+ * All zvkned-based functions use encryption expanding keys for both
+ * encryption and decryption.
+ */
+ kernel_vector_begin();
+ rv64i_zvkned_set_encrypt_key(key, keylen, ctx);
+ kernel_vector_end();
+ } else {
+ ret = aes_expandkey(ctx, key, keylen);
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL(riscv64_aes_setkey);
+
+void riscv64_aes_encrypt_zvkned(const struct crypto_aes_ctx *ctx, u8 *dst,
+ const u8 *src)
+{
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ rv64i_zvkned_encrypt(src, dst, ctx);
+ kernel_vector_end();
+ } else {
+ aes_encrypt(ctx, dst, src);
+ }
+}
+EXPORT_SYMBOL(riscv64_aes_encrypt_zvkned);
+
+void riscv64_aes_decrypt_zvkned(const struct crypto_aes_ctx *ctx, u8 *dst,
+ const u8 *src)
+{
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ rv64i_zvkned_decrypt(src, dst, ctx);
+ kernel_vector_end();
+ } else {
+ aes_decrypt(ctx, dst, src);
+ }
+}
+EXPORT_SYMBOL(riscv64_aes_decrypt_zvkned);
+
+static int aes_setkey(struct crypto_tfm *tfm, const u8 *key,
+ unsigned int keylen)
+{
+ struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ return riscv64_aes_setkey(ctx, key, keylen);
+}
+
+static void aes_encrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+ const struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ riscv64_aes_encrypt_zvkned(ctx, dst, src);
+}
+
+static void aes_decrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+ const struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ riscv64_aes_decrypt_zvkned(ctx, dst, src);
+}
+
+static struct crypto_alg riscv64_aes_alg_zvkned = {
+ .cra_flags = CRYPTO_ALG_TYPE_CIPHER,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct crypto_aes_ctx),
+ .cra_priority = 300,
+ .cra_name = "aes",
+ .cra_driver_name = "aes-riscv64-zvkned",
+ .cra_cipher = {
+ .cia_min_keysize = AES_MIN_KEY_SIZE,
+ .cia_max_keysize = AES_MAX_KEY_SIZE,
+ .cia_setkey = aes_setkey,
+ .cia_encrypt = aes_encrypt_zvkned,
+ .cia_decrypt = aes_decrypt_zvkned,
+ },
+ .cra_module = THIS_MODULE,
+};
+
+static inline bool check_aes_ext(void)
+{
+ return riscv_isa_extension_available(NULL, ZVKNED) &&
+ riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_aes_mod_init(void)
+{
+ if (check_aes_ext())
+ return crypto_register_alg(&riscv64_aes_alg_zvkned);
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_aes_mod_fini(void)
+{
+ crypto_unregister_alg(&riscv64_aes_alg_zvkned);
+}
+
+module_init(riscv64_aes_mod_init);
+module_exit(riscv64_aes_mod_fini);
+
+MODULE_DESCRIPTION("AES (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/riscv/crypto/aes-riscv64-glue.h b/arch/riscv/crypto/aes-riscv64-glue.h
new file mode 100644
index 000000000000..0416bbc4318e
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-glue.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef AES_RISCV64_GLUE_H
+#define AES_RISCV64_GLUE_H
+
+#include <crypto/aes.h>
+#include <linux/types.h>
+
+int riscv64_aes_setkey(struct crypto_aes_ctx *ctx, const u8 *key,
+ unsigned int keylen);
+
+void riscv64_aes_encrypt_zvkned(const struct crypto_aes_ctx *ctx, u8 *dst,
+ const u8 *src);
+
+void riscv64_aes_decrypt_zvkned(const struct crypto_aes_ctx *ctx, u8 *dst,
+ const u8 *src);
+
+#endif /* AES_RISCV64_GLUE_H */
diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl
new file mode 100644
index 000000000000..303e82d9f6f0
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl
@@ -0,0 +1,593 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <[email protected]>
+# Copyright (c) 2023, Phoebe Chen <[email protected]>
+# Copyright (c) 2023, Jerry Shih <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector AES block cipher extension ('Zvkned')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+ $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+ $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+ $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+{
+################################################################################
+# int rv64i_zvkned_set_encrypt_key(const unsigned char *userKey, const int bytes,
+# AES_KEY *key)
+my ($UKEY, $BYTES, $KEYP) = ("a0", "a1", "a2");
+my ($T0) = ("t0");
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_set_encrypt_key
+.type rv64i_zvkned_set_encrypt_key,\@function
+rv64i_zvkned_set_encrypt_key:
+ beqz $UKEY, L_fail_m1
+ beqz $KEYP, L_fail_m1
+
+ # Store the key length.
+ sw $BYTES, 480($KEYP)
+
+ li $T0, 32
+ beq $BYTES, $T0, L_set_key_256
+ li $T0, 16
+ beq $BYTES, $T0, L_set_key_128
+
+ j L_fail_m2
+
+L_set_key_128:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ # Load the key
+ @{[vle32_v $V10, $UKEY]}
+
+ # Generate keys for round 2-11 into registers v11-v20.
+ @{[vaeskf1_vi $V11, $V10, 1]} # v11 <- rk2 (w[ 4, 7])
+ @{[vaeskf1_vi $V12, $V11, 2]} # v12 <- rk3 (w[ 8,11])
+ @{[vaeskf1_vi $V13, $V12, 3]} # v13 <- rk4 (w[12,15])
+ @{[vaeskf1_vi $V14, $V13, 4]} # v14 <- rk5 (w[16,19])
+ @{[vaeskf1_vi $V15, $V14, 5]} # v15 <- rk6 (w[20,23])
+ @{[vaeskf1_vi $V16, $V15, 6]} # v16 <- rk7 (w[24,27])
+ @{[vaeskf1_vi $V17, $V16, 7]} # v17 <- rk8 (w[28,31])
+ @{[vaeskf1_vi $V18, $V17, 8]} # v18 <- rk9 (w[32,35])
+ @{[vaeskf1_vi $V19, $V18, 9]} # v19 <- rk10 (w[36,39])
+ @{[vaeskf1_vi $V20, $V19, 10]} # v20 <- rk11 (w[40,43])
+
+ # Store the round keys
+ @{[vse32_v $V10, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V11, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V12, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V13, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V14, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V15, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V16, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V17, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V18, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V19, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V20, $KEYP]}
+
+ li a0, 1
+ ret
+
+L_set_key_256:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ # Load the key
+ @{[vle32_v $V10, $UKEY]}
+ addi $UKEY, $UKEY, 16
+ @{[vle32_v $V11, $UKEY]}
+
+ @{[vmv_v_v $V12, $V10]}
+ @{[vaeskf2_vi $V12, $V11, 2]}
+ @{[vmv_v_v $V13, $V11]}
+ @{[vaeskf2_vi $V13, $V12, 3]}
+ @{[vmv_v_v $V14, $V12]}
+ @{[vaeskf2_vi $V14, $V13, 4]}
+ @{[vmv_v_v $V15, $V13]}
+ @{[vaeskf2_vi $V15, $V14, 5]}
+ @{[vmv_v_v $V16, $V14]}
+ @{[vaeskf2_vi $V16, $V15, 6]}
+ @{[vmv_v_v $V17, $V15]}
+ @{[vaeskf2_vi $V17, $V16, 7]}
+ @{[vmv_v_v $V18, $V16]}
+ @{[vaeskf2_vi $V18, $V17, 8]}
+ @{[vmv_v_v $V19, $V17]}
+ @{[vaeskf2_vi $V19, $V18, 9]}
+ @{[vmv_v_v $V20, $V18]}
+ @{[vaeskf2_vi $V20, $V19, 10]}
+ @{[vmv_v_v $V21, $V19]}
+ @{[vaeskf2_vi $V21, $V20, 11]}
+ @{[vmv_v_v $V22, $V20]}
+ @{[vaeskf2_vi $V22, $V21, 12]}
+ @{[vmv_v_v $V23, $V21]}
+ @{[vaeskf2_vi $V23, $V22, 13]}
+ @{[vmv_v_v $V24, $V22]}
+ @{[vaeskf2_vi $V24, $V23, 14]}
+
+ @{[vse32_v $V10, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V11, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V12, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V13, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V14, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V15, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V16, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V17, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V18, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V19, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V20, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V21, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V22, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V23, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vse32_v $V24, $KEYP]}
+
+ li a0, 1
+ ret
+.size rv64i_zvkned_set_encrypt_key,.-rv64i_zvkned_set_encrypt_key
+___
+}
+
+{
+################################################################################
+# void rv64i_zvkned_encrypt(const unsigned char *in, unsigned char *out,
+# const AES_KEY *key);
+my ($INP, $OUTP, $KEYP) = ("a0", "a1", "a2");
+my ($T0) = ("t0");
+my ($KEY_LEN) = ("a3");
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_encrypt
+.type rv64i_zvkned_encrypt,\@function
+rv64i_zvkned_encrypt:
+ # Load key length.
+ lwu $KEY_LEN, 480($KEYP)
+
+ # Get proper routine for key length.
+ li $T0, 32
+ beq $KEY_LEN, $T0, L_enc_256
+ li $T0, 24
+ beq $KEY_LEN, $T0, L_enc_192
+ li $T0, 16
+ beq $KEY_LEN, $T0, L_enc_128
+
+ j L_fail_m2
+.size rv64i_zvkned_encrypt,.-rv64i_zvkned_encrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_enc_128:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ @{[vle32_v $V1, $INP]}
+
+ @{[vle32_v $V10, $KEYP]}
+ @{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, $KEYP]}
+ @{[vaesem_vs $V1, $V11]} # with round key w[ 4, 7]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V12, $KEYP]}
+ @{[vaesem_vs $V1, $V12]} # with round key w[ 8,11]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V13, $KEYP]}
+ @{[vaesem_vs $V1, $V13]} # with round key w[12,15]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V14, $KEYP]}
+ @{[vaesem_vs $V1, $V14]} # with round key w[16,19]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V15, $KEYP]}
+ @{[vaesem_vs $V1, $V15]} # with round key w[20,23]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V16, $KEYP]}
+ @{[vaesem_vs $V1, $V16]} # with round key w[24,27]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V17, $KEYP]}
+ @{[vaesem_vs $V1, $V17]} # with round key w[28,31]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V18, $KEYP]}
+ @{[vaesem_vs $V1, $V18]} # with round key w[32,35]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V19, $KEYP]}
+ @{[vaesem_vs $V1, $V19]} # with round key w[36,39]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V20, $KEYP]}
+ @{[vaesef_vs $V1, $V20]} # with round key w[40,43]
+
+ @{[vse32_v $V1, $OUTP]}
+
+ ret
+.size L_enc_128,.-L_enc_128
+___
+
+$code .= <<___;
+.p2align 3
+L_enc_192:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ @{[vle32_v $V1, $INP]}
+
+ @{[vle32_v $V10, $KEYP]}
+ @{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, $KEYP]}
+ @{[vaesem_vs $V1, $V11]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V12, $KEYP]}
+ @{[vaesem_vs $V1, $V12]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V13, $KEYP]}
+ @{[vaesem_vs $V1, $V13]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V14, $KEYP]}
+ @{[vaesem_vs $V1, $V14]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V15, $KEYP]}
+ @{[vaesem_vs $V1, $V15]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V16, $KEYP]}
+ @{[vaesem_vs $V1, $V16]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V17, $KEYP]}
+ @{[vaesem_vs $V1, $V17]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V18, $KEYP]}
+ @{[vaesem_vs $V1, $V18]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V19, $KEYP]}
+ @{[vaesem_vs $V1, $V19]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V20, $KEYP]}
+ @{[vaesem_vs $V1, $V20]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V21, $KEYP]}
+ @{[vaesem_vs $V1, $V21]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V22, $KEYP]}
+ @{[vaesef_vs $V1, $V22]}
+
+ @{[vse32_v $V1, $OUTP]}
+ ret
+.size L_enc_192,.-L_enc_192
+___
+
+$code .= <<___;
+.p2align 3
+L_enc_256:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ @{[vle32_v $V1, $INP]}
+
+ @{[vle32_v $V10, $KEYP]}
+ @{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, $KEYP]}
+ @{[vaesem_vs $V1, $V11]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V12, $KEYP]}
+ @{[vaesem_vs $V1, $V12]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V13, $KEYP]}
+ @{[vaesem_vs $V1, $V13]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V14, $KEYP]}
+ @{[vaesem_vs $V1, $V14]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V15, $KEYP]}
+ @{[vaesem_vs $V1, $V15]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V16, $KEYP]}
+ @{[vaesem_vs $V1, $V16]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V17, $KEYP]}
+ @{[vaesem_vs $V1, $V17]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V18, $KEYP]}
+ @{[vaesem_vs $V1, $V18]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V19, $KEYP]}
+ @{[vaesem_vs $V1, $V19]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V20, $KEYP]}
+ @{[vaesem_vs $V1, $V20]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V21, $KEYP]}
+ @{[vaesem_vs $V1, $V21]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V22, $KEYP]}
+ @{[vaesem_vs $V1, $V22]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V23, $KEYP]}
+ @{[vaesem_vs $V1, $V23]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V24, $KEYP]}
+ @{[vaesef_vs $V1, $V24]}
+
+ @{[vse32_v $V1, $OUTP]}
+ ret
+.size L_enc_256,.-L_enc_256
+___
+
+################################################################################
+# void rv64i_zvkned_decrypt(const unsigned char *in, unsigned char *out,
+# const AES_KEY *key);
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_decrypt
+.type rv64i_zvkned_decrypt,\@function
+rv64i_zvkned_decrypt:
+ # Load key length.
+ lwu $KEY_LEN, 480($KEYP)
+
+ # Get proper routine for key length.
+ li $T0, 32
+ beq $KEY_LEN, $T0, L_dec_256
+ li $T0, 24
+ beq $KEY_LEN, $T0, L_dec_192
+ li $T0, 16
+ beq $KEY_LEN, $T0, L_dec_128
+
+ j L_fail_m2
+.size rv64i_zvkned_decrypt,.-rv64i_zvkned_decrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_dec_128:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ @{[vle32_v $V1, $INP]}
+
+ addi $KEYP, $KEYP, 160
+ @{[vle32_v $V20, $KEYP]}
+ @{[vaesz_vs $V1, $V20]} # with round key w[40,43]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V19, $KEYP]}
+ @{[vaesdm_vs $V1, $V19]} # with round key w[36,39]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V18, $KEYP]}
+ @{[vaesdm_vs $V1, $V18]} # with round key w[32,35]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V17, $KEYP]}
+ @{[vaesdm_vs $V1, $V17]} # with round key w[28,31]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V16, $KEYP]}
+ @{[vaesdm_vs $V1, $V16]} # with round key w[24,27]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V15, $KEYP]}
+ @{[vaesdm_vs $V1, $V15]} # with round key w[20,23]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V14, $KEYP]}
+ @{[vaesdm_vs $V1, $V14]} # with round key w[16,19]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V13, $KEYP]}
+ @{[vaesdm_vs $V1, $V13]} # with round key w[12,15]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V12, $KEYP]}
+ @{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V11, $KEYP]}
+ @{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V10, $KEYP]}
+ @{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3]
+
+ @{[vse32_v $V1, $OUTP]}
+
+ ret
+.size L_dec_128,.-L_dec_128
+___
+
+$code .= <<___;
+.p2align 3
+L_dec_192:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ @{[vle32_v $V1, $INP]}
+
+ addi $KEYP, $KEYP, 192
+ @{[vle32_v $V22, $KEYP]}
+ @{[vaesz_vs $V1, $V22]} # with round key w[48,51]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V21, $KEYP]}
+ @{[vaesdm_vs $V1, $V21]} # with round key w[44,47]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V20, $KEYP]}
+ @{[vaesdm_vs $V1, $V20]} # with round key w[40,43]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V19, $KEYP]}
+ @{[vaesdm_vs $V1, $V19]} # with round key w[36,39]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V18, $KEYP]}
+ @{[vaesdm_vs $V1, $V18]} # with round key w[32,35]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V17, $KEYP]}
+ @{[vaesdm_vs $V1, $V17]} # with round key w[28,31]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V16, $KEYP]}
+ @{[vaesdm_vs $V1, $V16]} # with round key w[24,27]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V15, $KEYP]}
+ @{[vaesdm_vs $V1, $V15]} # with round key w[20,23]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V14, $KEYP]}
+ @{[vaesdm_vs $V1, $V14]} # with round key w[16,19]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V13, $KEYP]}
+ @{[vaesdm_vs $V1, $V13]} # with round key w[12,15]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V12, $KEYP]}
+ @{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V11, $KEYP]}
+ @{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V10, $KEYP]}
+ @{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3]
+
+ @{[vse32_v $V1, $OUTP]}
+
+ ret
+.size L_dec_192,.-L_dec_192
+___
+
+$code .= <<___;
+.p2align 3
+L_dec_256:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ @{[vle32_v $V1, $INP]}
+
+ addi $KEYP, $KEYP, 224
+ @{[vle32_v $V24, $KEYP]}
+ @{[vaesz_vs $V1, $V24]} # with round key w[56,59]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V23, $KEYP]}
+ @{[vaesdm_vs $V1, $V23]} # with round key w[52,55]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V22, $KEYP]}
+ @{[vaesdm_vs $V1, $V22]} # with round key w[48,51]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V21, $KEYP]}
+ @{[vaesdm_vs $V1, $V21]} # with round key w[44,47]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V20, $KEYP]}
+ @{[vaesdm_vs $V1, $V20]} # with round key w[40,43]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V19, $KEYP]}
+ @{[vaesdm_vs $V1, $V19]} # with round key w[36,39]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V18, $KEYP]}
+ @{[vaesdm_vs $V1, $V18]} # with round key w[32,35]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V17, $KEYP]}
+ @{[vaesdm_vs $V1, $V17]} # with round key w[28,31]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V16, $KEYP]}
+ @{[vaesdm_vs $V1, $V16]} # with round key w[24,27]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V15, $KEYP]}
+ @{[vaesdm_vs $V1, $V15]} # with round key w[20,23]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V14, $KEYP]}
+ @{[vaesdm_vs $V1, $V14]} # with round key w[16,19]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V13, $KEYP]}
+ @{[vaesdm_vs $V1, $V13]} # with round key w[12,15]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V12, $KEYP]}
+ @{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V11, $KEYP]}
+ @{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V10, $KEYP]}
+ @{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3]
+
+ @{[vse32_v $V1, $OUTP]}
+
+ ret
+.size L_dec_256,.-L_dec_256
+___
+}
+
+$code .= <<___;
+L_fail_m1:
+ li a0, -1
+ ret
+.size L_fail_m1,.-L_fail_m1
+
+L_fail_m2:
+ li a0, -2
+ ret
+.size L_fail_m2,.-L_fail_m2
+
+L_end:
+ ret
+.size L_end,.-L_end
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
--
2.28.0

2023-11-27 07:08:34

by Jerry Shih

Subject: [PATCH v2 03/13] RISC-V: crypto: add OpenSSL perl module for vector instructions

OpenSSL has some RISC-V vector cryptography implementations which could be
reused for the kernel. These implementations use a number of perl helpers
for emitting vector and vector-crypto-extension instructions. This patch
takes those perl helpers from OpenSSL (openssl/openssl#21923). The unused
scalar crypto instructions from the original perl module are skipped.

Co-developed-by: Christoph Müllner <[email protected]>
Signed-off-by: Christoph Müllner <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Co-developed-by: Phoebe Chen <[email protected]>
Signed-off-by: Phoebe Chen <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
arch/riscv/crypto/riscv.pm | 828 +++++++++++++++++++++++++++++++++++++
1 file changed, 828 insertions(+)
create mode 100644 arch/riscv/crypto/riscv.pm

diff --git a/arch/riscv/crypto/riscv.pm b/arch/riscv/crypto/riscv.pm
new file mode 100644
index 000000000000..e188f7476e3e
--- /dev/null
+++ b/arch/riscv/crypto/riscv.pm
@@ -0,0 +1,828 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <[email protected]>
+# Copyright (c) 2023, Jerry Shih <[email protected]>
+# Copyright (c) 2023, Phoebe Chen <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+use strict;
+use warnings;
+
+# Set $have_stacktrace to 1 if we have Devel::StackTrace
+my $have_stacktrace = 0;
+if (eval {require Devel::StackTrace;1;}) {
+ $have_stacktrace = 1;
+}
+
+my @regs = map("x$_",(0..31));
+# Mapping from the RISC-V psABI ABI mnemonic names to the register number.
+my @regaliases = ('zero','ra','sp','gp','tp','t0','t1','t2','s0','s1',
+ map("a$_",(0..7)),
+ map("s$_",(2..11)),
+ map("t$_",(3..6))
+);
+
+my %reglookup;
+@reglookup{@regs} = @regs;
+@reglookup{@regaliases} = @regs;
+
+# Takes a register name, possibly an alias, and converts it to a register index
+# from 0 to 31
+sub read_reg {
+ my $reg = lc shift;
+ if (!exists($reglookup{$reg})) {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("Unknown register ".$reg."\n".$trace);
+ }
+ my $regstr = $reglookup{$reg};
+ if (!($regstr =~ /^x([0-9]+)$/)) {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("Could not process register ".$reg."\n".$trace);
+ }
+ return $1;
+}
+
+# Read the sew setting(8, 16, 32 and 64) and convert to vsew encoding.
+sub read_sew {
+ my $sew_setting = shift;
+
+ if ($sew_setting eq "e8") {
+ return 0;
+ } elsif ($sew_setting eq "e16") {
+ return 1;
+ } elsif ($sew_setting eq "e32") {
+ return 2;
+ } elsif ($sew_setting eq "e64") {
+ return 3;
+ } else {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("Unsupported SEW setting:".$sew_setting."\n".$trace);
+ }
+}
+
+# Read the LMUL settings and convert to vlmul encoding.
+sub read_lmul {
+ my $lmul_setting = shift;
+
+ if ($lmul_setting eq "mf8") {
+ return 5;
+ } elsif ($lmul_setting eq "mf4") {
+ return 6;
+ } elsif ($lmul_setting eq "mf2") {
+ return 7;
+ } elsif ($lmul_setting eq "m1") {
+ return 0;
+ } elsif ($lmul_setting eq "m2") {
+ return 1;
+ } elsif ($lmul_setting eq "m4") {
+ return 2;
+ } elsif ($lmul_setting eq "m8") {
+ return 3;
+ } else {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("Unsupported LMUL setting:".$lmul_setting."\n".$trace);
+ }
+}
+
+# Read the tail policy settings and convert to vta encoding.
+sub read_tail_policy {
+ my $tail_setting = shift;
+
+ if ($tail_setting eq "ta") {
+ return 1;
+ } elsif ($tail_setting eq "tu") {
+ return 0;
+ } else {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("Unsupported tail policy setting:".$tail_setting."\n".$trace);
+ }
+}
+
+# Read the mask policy settings and convert to vma encoding.
+sub read_mask_policy {
+ my $mask_setting = shift;
+
+ if ($mask_setting eq "ma") {
+ return 1;
+ } elsif ($mask_setting eq "mu") {
+ return 0;
+ } else {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("Unsupported mask policy setting:".$mask_setting."\n".$trace);
+ }
+}
+
+my @vregs = map("v$_",(0..31));
+my %vreglookup;
+@vreglookup{@vregs} = @vregs;
+
+sub read_vreg {
+ my $vreg = lc shift;
+ if (!exists($vreglookup{$vreg})) {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("Unknown vector register ".$vreg."\n".$trace);
+ }
+ if (!($vreg =~ /^v([0-9]+)$/)) {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("Could not process vector register ".$vreg."\n".$trace);
+ }
+ return $1;
+}
+
+# Read the vm settings and convert to mask encoding.
+sub read_mask_vreg {
+ my $vreg = shift;
+ # The default value is unmasked.
+ my $mask_bit = 1;
+
+ if (defined($vreg)) {
+ my $reg_id = read_vreg $vreg;
+ if ($reg_id == 0) {
+ $mask_bit = 0;
+ } else {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("The ".$vreg." is not the mask register v0.\n".$trace);
+ }
+ }
+ return $mask_bit;
+}
+
+# Vector instructions
+
+sub vadd_vv {
+ # vadd.vv vd, vs2, vs1, vm
+ my $template = 0b000000_0_00000_00000_000_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vadd_vx {
+ # vadd.vx vd, vs2, rs1, vm
+ my $template = 0b000000_0_00000_00000_100_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vsub_vv {
+ # vsub.vv vd, vs2, vs1, vm
+ my $template = 0b000010_0_00000_00000_000_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vsub_vx {
+ # vsub.vx vd, vs2, rs1, vm
+ my $template = 0b000010_0_00000_00000_100_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vid_v {
+ # vid.v vd
+ my $template = 0b0101001_00000_10001_010_00000_1010111;
+ my $vd = read_vreg shift;
+ return ".word ".($template | ($vd << 7));
+}
+
+sub viota_m {
+ # viota.m vd, vs2, vm
+ my $template = 0b010100_0_00000_10000_010_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vle8_v {
+ # vle8.v vd, (rs1), vm
+ my $template = 0b000000_0_00000_00000_000_00000_0000111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vle32_v {
+ # vle32.v vd, (rs1), vm
+ my $template = 0b000000_0_00000_00000_110_00000_0000111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vle64_v {
+ # vle64.v vd, (rs1)
+ my $template = 0b0000001_00000_00000_111_00000_0000111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vlse32_v {
+ # vlse32.v vd, (rs1), rs2
+ my $template = 0b0000101_00000_00000_110_00000_0000111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $rs2 = read_reg shift;
+ return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vlsseg_nf_e32_v {
+ # vlsseg<nf>e32.v vd, (rs1), rs2
+ my $template = 0b0000101_00000_00000_110_00000_0000111;
+ my $nf = shift;
+ $nf -= 1;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $rs2 = read_reg shift;
+ return ".word ".($template | ($nf << 29) | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vlse64_v {
+ # vlse64.v vd, (rs1), rs2
+ my $template = 0b0000101_00000_00000_111_00000_0000111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $rs2 = read_reg shift;
+ return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vluxei8_v {
+ # vluxei8.v vd, (rs1), vs2, vm
+ my $template = 0b000001_0_00000_00000_000_00000_0000111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vs2 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vmerge_vim {
+ # vmerge.vim vd, vs2, imm, v0
+ my $template = 0b0101110_00000_00000_011_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $imm = shift;
+ return ".word ".($template | ($vs2 << 20) | ($imm << 15) | ($vd << 7));
+}
+
+sub vmerge_vvm {
+ # vmerge.vvm vd vs2 vs1
+ my $template = 0b0101110_00000_00000_000_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7))
+}
+
+sub vmseq_vi {
+ # vmseq.vi vd vs1, imm
+ my $template = 0b0110001_00000_00000_011_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ my $imm = shift;
+ return ".word ".($template | ($vs1 << 20) | ($imm << 15) | ($vd << 7))
+}
+
+sub vmsgtu_vx {
+ # vmsgtu.vx vd vs2, rs1, vm
+ my $template = 0b011110_0_00000_00000_100_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7))
+}
+
+sub vmv_v_i {
+ # vmv.v.i vd, imm
+ my $template = 0b0101111_00000_00000_011_00000_1010111;
+ my $vd = read_vreg shift;
+ my $imm = shift;
+ return ".word ".($template | ($imm << 15) | ($vd << 7));
+}
+
+sub vmv_v_x {
+ # vmv.v.x vd, rs1
+ my $template = 0b0101111_00000_00000_100_00000_1010111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vmv_v_v {
+ # vmv.v.v vd, vs1
+ my $template = 0b0101111_00000_00000_000_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vor_vv_v0t {
+ # vor.vv vd, vs2, vs1, v0.t
+ my $template = 0b0010100_00000_00000_000_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vse8_v {
+ # vse8.v vd, (rs1), vm
+ my $template = 0b000000_0_00000_00000_000_00000_0100111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vse32_v {
+ # vse32.v vd, (rs1), vm
+ my $template = 0b000000_0_00000_00000_110_00000_0100111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vssseg_nf_e32_v {
+ # vssseg<nf>e32.v vs3, (rs1), rs2
+ my $template = 0b0000101_00000_00000_110_00000_0100111;
+ my $nf = shift;
+ $nf -= 1;
+ my $vs3 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $rs2 = read_reg shift;
+ return ".word ".($template | ($nf << 29) | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7));
+}
+
+sub vsuxei8_v {
+ # vsuxei8.v vs3, (rs1), vs2, vm
+ my $template = 0b000001_0_00000_00000_000_00000_0100111;
+ my $vs3 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vs2 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vs3 << 7));
+}
+
+sub vse64_v {
+ # vse64.v vd, (rs1)
+ my $template = 0b0000001_00000_00000_111_00000_0100111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vsetivli__x0_2_e64_m1_tu_mu {
+ # vsetivli x0, 2, e64, m1, tu, mu
+ return ".word 0xc1817057";
+}
+
+sub vsetivli__x0_4_e32_m1_tu_mu {
+ # vsetivli x0, 4, e32, m1, tu, mu
+ return ".word 0xc1027057";
+}
+
+sub vsetivli__x0_4_e64_m1_tu_mu {
+ # vsetivli x0, 4, e64, m1, tu, mu
+ return ".word 0xc1827057";
+}
+
+sub vsetivli__x0_8_e32_m1_tu_mu {
+ # vsetivli x0, 8, e32, m1, tu, mu
+ return ".word 0xc1047057";
+}
+
+sub vsetvli {
+ # vsetvli rd, rs1, vtypei
+ my $template = 0b0_00000000000_00000_111_00000_1010111;
+ my $rd = read_reg shift;
+ my $rs1 = read_reg shift;
+ my $sew = read_sew shift;
+ my $lmul = read_lmul shift;
+ my $tail_policy = read_tail_policy shift;
+ my $mask_policy = read_mask_policy shift;
+ my $vtypei = ($mask_policy << 7) | ($tail_policy << 6) | ($sew << 3) | $lmul;
+
+ return ".word ".($template | ($vtypei << 20) | ($rs1 << 15) | ($rd << 7));
+}
+
+sub vsetivli {
+ # vsetivli rd, uimm, vtypei
+ my $template = 0b11_0000000000_00000_111_00000_1010111;
+ my $rd = read_reg shift;
+ my $uimm = shift;
+ my $sew = read_sew shift;
+ my $lmul = read_lmul shift;
+ my $tail_policy = read_tail_policy shift;
+ my $mask_policy = read_mask_policy shift;
+ my $vtypei = ($mask_policy << 7) | ($tail_policy << 6) | ($sew << 3) | $lmul;
+
+ return ".word ".($template | ($vtypei << 20) | ($uimm << 15) | ($rd << 7));
+}
+
+sub vslidedown_vi {
+ # vslidedown.vi vd, vs2, uimm
+ my $template = 0b0011111_00000_00000_011_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vslidedown_vx {
+ # vslidedown.vx vd, vs2, rs1
+ my $template = 0b0011111_00000_00000_100_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vslideup_vi_v0t {
+ # vslideup.vi vd, vs2, uimm, v0.t
+ my $template = 0b0011100_00000_00000_011_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vslideup_vi {
+ # vslideup.vi vd, vs2, uimm
+ my $template = 0b0011101_00000_00000_011_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsll_vi {
+ # vsll.vi vd, vs2, uimm, vm
+ my $template = 0b1001011_00000_00000_011_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsrl_vx {
+ # vsrl.vx vd, vs2, rs1
+ my $template = 0b1010001_00000_00000_100_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vsse32_v {
+ # vsse32.v vs3, (rs1), rs2
+ my $template = 0b0000101_00000_00000_110_00000_0100111;
+ my $vs3 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $rs2 = read_reg shift;
+ return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7));
+}
+
+sub vsse64_v {
+ # vsse64.v vs3, (rs1), rs2
+ my $template = 0b0000101_00000_00000_111_00000_0100111;
+ my $vs3 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $rs2 = read_reg shift;
+ return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7));
+}
+
+sub vxor_vv_v0t {
+ # vxor.vv vd, vs2, vs1, v0.t
+ my $template = 0b0010110_00000_00000_000_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vxor_vv {
+ # vxor.vv vd, vs2, vs1
+ my $template = 0b0010111_00000_00000_000_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vzext_vf2 {
+ # vzext.vf2 vd, vs2, vm
+ my $template = 0b010010_0_00000_00110_010_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7));
+}
+
+# Vector crypto instructions
+
+## Zvbb and Zvkb instructions
+##
+## vandn (also in zvkb)
+## vbrev
+## vbrev8 (also in zvkb)
+## vrev8 (also in zvkb)
+## vclz
+## vctz
+## vcpop
+## vrol (also in zvkb)
+## vror (also in zvkb)
+## vwsll
+
+sub vbrev8_v {
+ # vbrev8.v vd, vs2, vm
+ my $template = 0b010010_0_00000_01000_010_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vrev8_v {
+ # vrev8.v vd, vs2, vm
+ my $template = 0b010010_0_00000_01001_010_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vror_vi {
+ # vror.vi vd, vs2, uimm
+ my $template = 0b01010_0_1_00000_00000_011_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ my $uimm_i5 = $uimm >> 5;
+ my $uimm_i4_0 = $uimm & 0b11111;
+
+ return ".word ".($template | ($uimm_i5 << 26) | ($vs2 << 20) | ($uimm_i4_0 << 15) | ($vd << 7));
+}
+
+sub vwsll_vv {
+ # vwsll.vv vd, vs2, vs1, vm
+ my $template = 0b110101_0_00000_00000_000_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+## Zvbc instructions
+
+sub vclmulh_vx {
+ # vclmulh.vx vd, vs2, rs1
+ my $template = 0b0011011_00000_00000_110_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vclmul_vx_v0t {
+ # vclmul.vx vd, vs2, rs1, v0.t
+ my $template = 0b0011000_00000_00000_110_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vclmul_vx {
+ # vclmul.vx vd, vs2, rs1
+ my $template = 0b0011001_00000_00000_110_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+## Zvkg instructions
+
+sub vghsh_vv {
+ # vghsh.vv vd, vs2, vs1
+ my $template = 0b1011001_00000_00000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vgmul_vv {
+ # vgmul.vv vd, vs2
+ my $template = 0b1010001_00000_10001_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## Zvkned instructions
+
+sub vaesdf_vs {
+ # vaesdf.vs vd, vs2
+ my $template = 0b101001_1_00000_00001_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesdm_vs {
+ # vaesdm.vs vd, vs2
+ my $template = 0b101001_1_00000_00000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesef_vs {
+ # vaesef.vs vd, vs2
+ my $template = 0b101001_1_00000_00011_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesem_vs {
+ # vaesem.vs vd, vs2
+ my $template = 0b101001_1_00000_00010_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaeskf1_vi {
+ # vaeskf1.vi vd, vs2, uimm
+ my $template = 0b100010_1_00000_00000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ return ".word ".($template | ($uimm << 15) | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaeskf2_vi {
+ # vaeskf2.vi vd, vs2, uimm
+ my $template = 0b101010_1_00000_00000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vaesz_vs {
+ # vaesz.vs vd, vs2
+ my $template = 0b101001_1_00000_00111_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## Zvknha and Zvknhb instructions
+
+sub vsha2ms_vv {
+ # vsha2ms.vv vd, vs2, vs1
+ my $template = 0b1011011_00000_00000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7));
+}
+
+sub vsha2ch_vv {
+ # vsha2ch.vv vd, vs2, vs1
+ my $template = 0b101110_10000_00000_001_00000_01110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7));
+}
+
+sub vsha2cl_vv {
+ # vsha2cl.vv vd, vs2, vs1
+ my $template = 0b101111_10000_00000_001_00000_01110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7));
+}
+
+## Zvksed instructions
+
+sub vsm4k_vi {
+ # vsm4k.vi vd, vs2, uimm
+ my $template = 0b1000011_00000_00000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsm4r_vs {
+ # vsm4r.vs vd, vs2
+ my $template = 0b1010011_00000_10000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## Zvksh instructions
+
+sub vsm3c_vi {
+ # vsm3c.vi vd, vs2, uimm
+ my $template = 0b1010111_00000_00000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ return ".word ".($template | ($vs2 << 20) | ($uimm << 15 ) | ($vd << 7));
+}
+
+sub vsm3me_vv {
+ # vsm3me.vv vd, vs2, vs1
+ my $template = 0b1000001_00000_00000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vs1 << 15 ) | ($vd << 7));
+}
+
+1;
--
2.28.0

2023-11-27 07:09:01

by Jerry Shih

[permalink] [raw]
Subject: [PATCH v2 06/13] crypto: scatterwalk - Add scatterwalk_next() to get the next scatterlist in scatter_walk

In some situations, we might split the `skcipher_request` into several
segments. When moving to the next segment, we currently have to use
`scatterwalk_ffwd()` to find the corresponding `scatterlist`, iterating from
the head of the original `scatterlist` every time.

This helper function instead gathers the position information already
tracked in the `skcipher_walk` and moves to the next `scatterlist` directly.
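
For illustration, a caller that has just finished walking the head part of a
request could switch to the tail part roughly as the AES-XTS glue later in
this series does (sketch; `sg_src`, `sg_dst` and `tail_bytes` are local
variables of that caller):

    dst = src = scatterwalk_next(sg_src, &walk.in);
    if (req->dst != req->src)
        dst = scatterwalk_next(sg_dst, &walk.out);
    skcipher_request_set_crypt(req, src, dst, tail_bytes, req->iv);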

Signed-off-by: Jerry Shih <[email protected]>
---
include/crypto/scatterwalk.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/crypto/scatterwalk.h b/include/crypto/scatterwalk.h
index 32fc4473175b..b1a90afe695d 100644
--- a/include/crypto/scatterwalk.h
+++ b/include/crypto/scatterwalk.h
@@ -98,7 +98,12 @@ void scatterwalk_map_and_copy(void *buf, struct scatterlist *sg,
unsigned int start, unsigned int nbytes, int out);

struct scatterlist *scatterwalk_ffwd(struct scatterlist dst[2],
- struct scatterlist *src,
- unsigned int len);
+ struct scatterlist *src, unsigned int len);
+
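+/*
+ * Return a scatterlist positioned at the point the scatter_walk currently
+ * refers to, reusing the caller-provided dst[2] chain, instead of walking
+ * again from the head of the original scatterlist.
+ */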
+static inline struct scatterlist *scatterwalk_next(struct scatterlist dst[2],
+ struct scatter_walk *src)
+{
+ return scatterwalk_ffwd(dst, src->sg, src->offset - src->sg->offset);
+}

#endif /* _CRYPTO_SCATTERWALK_H */
--
2.28.0

2023-11-27 07:09:11

by Jerry Shih

[permalink] [raw]
Subject: [PATCH v2 07/13] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations

Port the vector-crypto accelerated CBC, CTR, ECB and XTS block modes for
the AES cipher from OpenSSL (openssl/openssl#21923).
In addition, support the XTS-AES-192 mode, which does not exist in OpenSSL.

Co-developed-by: Phoebe Chen <[email protected]>
Signed-off-by: Phoebe Chen <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
Changelog v2:
- Do not turn on kconfig `AES_BLOCK_RISCV64` option by default.
- Update asm function for using aes key in `crypto_aes_ctx` structure.
- Switch to the simd skcipher interface for the AES-CBC/CTR/ECB/XTS modes.
The kernel-mode vector implementation is still under discussion. Until it
is finalized, use the simd skcipher interface so that no fallback path is
needed for the AES modes in any calling context. If kernel-mode vector can
always be enabled in softirq in the future, we could bring the plain
synchronous skcipher algorithms back.
- Refine aes-xts comments for head and tail blocks handling.
- Update the VLEN constraint for the aes-xts mode.
- Add `asmlinkage` qualifier for crypto asm function.
- Rename aes-riscv64-zvbb-zvkg-zvkned to aes-riscv64-zvkned-zvbb-zvkg.
- Rename aes-riscv64-zvkb-zvkned to aes-riscv64-zvkned-zvkb.
- Reorder structure riscv64_aes_algs_zvkned, riscv64_aes_alg_zvkned_zvkb
and riscv64_aes_alg_zvkned_zvbb_zvkg members initialization in the
order declared.
---
arch/riscv/crypto/Kconfig | 21 +
arch/riscv/crypto/Makefile | 11 +
.../crypto/aes-riscv64-block-mode-glue.c | 514 ++++++++++
.../crypto/aes-riscv64-zvkned-zvbb-zvkg.pl | 949 ++++++++++++++++++
arch/riscv/crypto/aes-riscv64-zvkned-zvkb.pl | 415 ++++++++
arch/riscv/crypto/aes-riscv64-zvkned.pl | 746 ++++++++++++++
6 files changed, 2656 insertions(+)
create mode 100644 arch/riscv/crypto/aes-riscv64-block-mode-glue.c
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned-zvbb-zvkg.pl
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned-zvkb.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 65189d4d47b3..9d991ddda289 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -13,4 +13,25 @@ config CRYPTO_AES_RISCV64
Architecture: riscv64 using:
- Zvkned vector crypto extension

+config CRYPTO_AES_BLOCK_RISCV64
+ tristate "Ciphers: AES, modes: ECB/CBC/CTR/XTS"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_AES_RISCV64
+ select CRYPTO_SIMD
+ select CRYPTO_SKCIPHER
+ help
+ Length-preserving ciphers: AES cipher algorithms (FIPS-197)
+ with block cipher modes:
+ - ECB (Electronic Codebook) mode (NIST SP 800-38A)
+ - CBC (Cipher Block Chaining) mode (NIST SP 800-38A)
+ - CTR (Counter) mode (NIST SP 800-38A)
+ - XTS (XOR Encrypt XOR Tweakable Block Cipher with Ciphertext
+ Stealing) mode (NIST SP 800-38E and IEEE 1619)
+
+ Architecture: riscv64 using:
+ - Zvkned vector crypto extension
+ - Zvbb vector extension (XTS)
+ - Zvkb vector crypto extension (CTR/XTS)
+ - Zvkg vector crypto extension (XTS)
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 90ca91d8df26..9574b009762f 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -6,10 +6,21 @@
obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o

+obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o
+aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvkned-zvbb-zvkg.o aes-riscv64-zvkned-zvkb.o
+
quiet_cmd_perlasm = PERLASM $@
cmd_perlasm = $(PERL) $(<) void $(@)

$(obj)/aes-riscv64-zvkned.S: $(src)/aes-riscv64-zvkned.pl
$(call cmd,perlasm)

+$(obj)/aes-riscv64-zvkned-zvbb-zvkg.S: $(src)/aes-riscv64-zvkned-zvbb-zvkg.pl
+ $(call cmd,perlasm)
+
+$(obj)/aes-riscv64-zvkned-zvkb.S: $(src)/aes-riscv64-zvkned-zvkb.pl
+ $(call cmd,perlasm)
+
clean-files += aes-riscv64-zvkned.S
+clean-files += aes-riscv64-zvkned-zvbb-zvkg.S
+clean-files += aes-riscv64-zvkned-zvkb.S
diff --git a/arch/riscv/crypto/aes-riscv64-block-mode-glue.c b/arch/riscv/crypto/aes-riscv64-block-mode-glue.c
new file mode 100644
index 000000000000..36fdd83b11ef
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-block-mode-glue.c
@@ -0,0 +1,514 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Port of the OpenSSL AES block mode implementations for RISC-V
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/aes.h>
+#include <crypto/ctr.h>
+#include <crypto/xts.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/simd.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/scatterwalk.h>
+#include <linux/crypto.h>
+#include <linux/linkage.h>
+#include <linux/math.h>
+#include <linux/minmax.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+#include "aes-riscv64-glue.h"
+
+struct riscv64_aes_xts_ctx {
+ struct crypto_aes_ctx ctx1;
+ struct crypto_aes_ctx ctx2;
+};
+
+/* aes cbc block mode using zvkned vector crypto extension */
+asmlinkage void rv64i_zvkned_cbc_encrypt(const u8 *in, u8 *out, size_t length,
+ const struct crypto_aes_ctx *key,
+ u8 *ivec);
+asmlinkage void rv64i_zvkned_cbc_decrypt(const u8 *in, u8 *out, size_t length,
+ const struct crypto_aes_ctx *key,
+ u8 *ivec);
+/* aes ecb block mode using zvkned vector crypto extension */
+asmlinkage void rv64i_zvkned_ecb_encrypt(const u8 *in, u8 *out, size_t length,
+ const struct crypto_aes_ctx *key);
+asmlinkage void rv64i_zvkned_ecb_decrypt(const u8 *in, u8 *out, size_t length,
+ const struct crypto_aes_ctx *key);
+
+/* aes ctr block mode using zvkb and zvkned vector crypto extension */
+/* This function operates on a 32-bit counter. The caller must handle overflow. */
+asmlinkage void
+rv64i_zvkb_zvkned_ctr32_encrypt_blocks(const u8 *in, u8 *out, size_t length,
+ const struct crypto_aes_ctx *key,
+ u8 *ivec);
+
+/* aes xts block mode using zvbb, zvkg and zvkned vector crypto extension */
+asmlinkage void
+rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt(const u8 *in, u8 *out, size_t length,
+ const struct crypto_aes_ctx *key, u8 *iv,
+ int update_iv);
+asmlinkage void
+rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt(const u8 *in, u8 *out, size_t length,
+ const struct crypto_aes_ctx *key, u8 *iv,
+ int update_iv);
+
+typedef void (*aes_xts_func)(const u8 *in, u8 *out, size_t length,
+ const struct crypto_aes_ctx *key, u8 *iv,
+ int update_iv);
+
+/* ecb */
+static int aes_setkey(struct crypto_skcipher *tfm, const u8 *in_key,
+ unsigned int key_len)
+{
+ struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+
+ return riscv64_aes_setkey(ctx, in_key, key_len);
+}
+
+static int ecb_encrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ int err;
+
+ /* If there is an error here, `nbytes` will be zero. */
+ err = skcipher_walk_virt(&walk, req, false);
+ while ((nbytes = walk.nbytes)) {
+ kernel_vector_begin();
+ rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
+ nbytes & (~(AES_BLOCK_SIZE - 1)), ctx);
+ kernel_vector_end();
+ err = skcipher_walk_done(&walk, nbytes & (AES_BLOCK_SIZE - 1));
+ }
+
+ return err;
+}
+
+static int ecb_decrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, false);
+ while ((nbytes = walk.nbytes)) {
+ kernel_vector_begin();
+ rv64i_zvkned_ecb_decrypt(walk.src.virt.addr, walk.dst.virt.addr,
+ nbytes & (~(AES_BLOCK_SIZE - 1)), ctx);
+ kernel_vector_end();
+ err = skcipher_walk_done(&walk, nbytes & (AES_BLOCK_SIZE - 1));
+ }
+
+ return err;
+}
+
+/* cbc */
+static int cbc_encrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, false);
+ while ((nbytes = walk.nbytes)) {
+ kernel_vector_begin();
+ rv64i_zvkned_cbc_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
+ nbytes & (~(AES_BLOCK_SIZE - 1)), ctx,
+ walk.iv);
+ kernel_vector_end();
+ err = skcipher_walk_done(&walk, nbytes & (AES_BLOCK_SIZE - 1));
+ }
+
+ return err;
+}
+
+static int cbc_decrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, false);
+ while ((nbytes = walk.nbytes)) {
+ kernel_vector_begin();
+ rv64i_zvkned_cbc_decrypt(walk.src.virt.addr, walk.dst.virt.addr,
+ nbytes & (~(AES_BLOCK_SIZE - 1)), ctx,
+ walk.iv);
+ kernel_vector_end();
+ err = skcipher_walk_done(&walk, nbytes & (AES_BLOCK_SIZE - 1));
+ }
+
+ return err;
+}
+
+/* ctr */
+static int ctr_encrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ unsigned int ctr32;
+ unsigned int nbytes;
+ unsigned int blocks;
+ unsigned int current_blocks;
+ unsigned int current_length;
+ int err;
+
+ /* The last 32 bits of the CTR IV hold the block counter in big-endian form. */
+ ctr32 = get_unaligned_be32(req->iv + 12);
+ err = skcipher_walk_virt(&walk, req, false);
+ while ((nbytes = walk.nbytes)) {
+ if (nbytes != walk.total) {
+ nbytes &= (~(AES_BLOCK_SIZE - 1));
+ blocks = nbytes / AES_BLOCK_SIZE;
+ } else {
+ /* This is the last walk. We should handle the tail data. */
+ blocks = DIV_ROUND_UP(nbytes, AES_BLOCK_SIZE);
+ }
+ ctr32 += blocks;
+
+ kernel_vector_begin();
+ /*
+ * If `ctr32 >= blocks` after the addition above, the 32-bit counter
+ * did not wrap around and the whole walk is handled in one call.
+ * Otherwise, split the work at the wrap-around point and carry into
+ * the upper 96 bits of the IV in between.
+ */
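+ /*
+ * Worked example with hypothetical numbers: if ctr32 was 0xfffffffd
+ * before this walk and blocks = 5, ctr32 wraps to 2. The first call
+ * then handles blocks - ctr32 = 3 blocks, crypto_inc() carries into
+ * the upper 96 bits of the IV, and the second call handles the
+ * remaining 2 blocks.
+ */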
+ if (ctr32 >= blocks) {
+ rv64i_zvkb_zvkned_ctr32_encrypt_blocks(
+ walk.src.virt.addr, walk.dst.virt.addr, nbytes,
+ ctx, req->iv);
+ } else {
+ /* use 2 ctr32 function calls for overflow case */
+ current_blocks = blocks - ctr32;
+ current_length =
+ min(nbytes, current_blocks * AES_BLOCK_SIZE);
+ rv64i_zvkb_zvkned_ctr32_encrypt_blocks(
+ walk.src.virt.addr, walk.dst.virt.addr,
+ current_length, ctx, req->iv);
+ crypto_inc(req->iv, 12);
+
+ if (ctr32) {
+ rv64i_zvkb_zvkned_ctr32_encrypt_blocks(
+ walk.src.virt.addr +
+ current_blocks * AES_BLOCK_SIZE,
+ walk.dst.virt.addr +
+ current_blocks * AES_BLOCK_SIZE,
+ nbytes - current_length, ctx, req->iv);
+ }
+ }
+ kernel_vector_end();
+
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+ }
+
+ return err;
+}
+
+/* xts */
+static int xts_setkey(struct crypto_skcipher *tfm, const u8 *in_key,
+ unsigned int key_len)
+{
+ struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+ unsigned int xts_single_key_len = key_len / 2;
+ int ret;
+
+ ret = xts_verify_key(tfm, in_key, key_len);
+ if (ret)
+ return ret;
+ ret = riscv64_aes_setkey(&ctx->ctx1, in_key, xts_single_key_len);
+ if (ret)
+ return ret;
+ return riscv64_aes_setkey(&ctx->ctx2, in_key + xts_single_key_len,
+ xts_single_key_len);
+}
+
+static int xts_crypt(struct skcipher_request *req, aes_xts_func func)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_request sub_req;
+ struct scatterlist sg_src[2], sg_dst[2];
+ struct scatterlist *src, *dst;
+ struct skcipher_walk walk;
+ unsigned int walk_size = crypto_skcipher_walksize(tfm);
+ unsigned int tail_bytes;
+ unsigned int head_bytes;
+ unsigned int nbytes;
+ unsigned int update_iv = 1;
+ int err;
+
+ /* The XTS input size must be at least AES_BLOCK_SIZE bytes. */
+ if (req->cryptlen < AES_BLOCK_SIZE)
+ return -EINVAL;
+
+ /*
+ * We split the AES-XTS operation into a `head` part and a `tail` part.
+ * The head part covers the leading input and never needs the
+ * `ciphertext stealing` method.
+ * The tail part covers the end of the input and, when ciphertext
+ * stealing is needed, contains at least two AES blocks including the
+ * partial block.
+ */
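+ /*
+ * Worked example with hypothetical sizes: for cryptlen = 300 and
+ * walksize = 128 (AES_BLOCK_SIZE * 8), the partial block is
+ * 300 % 16 = 12 bytes, so tail_bytes = 128 + 12 - 16 = 124 and
+ * head_bytes = 176 (11 full AES blocks).
+ */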
+ if (req->cryptlen <= walk_size) {
+ /*
+ * All data fits in one `walk`, so it can be handled with a single
+ * AES-XTS call at the end.
+ */
+ tail_bytes = req->cryptlen;
+ head_bytes = 0;
+ } else {
+ if (req->cryptlen & (AES_BLOCK_SIZE - 1)) {
+ /*
+ * With ciphertext stealing:
+ *
+ * Find the largest tail size that is smaller than the `walk` size
+ * while the head part stays aligned to the AES block boundary.
+ */
+ tail_bytes = req->cryptlen & (AES_BLOCK_SIZE - 1);
+ tail_bytes = walk_size + tail_bytes - AES_BLOCK_SIZE;
+ head_bytes = req->cryptlen - tail_bytes;
+ } else {
+ /* no ciphertext stealing */
+ tail_bytes = 0;
+ head_bytes = req->cryptlen;
+ }
+ }
+
+ riscv64_aes_encrypt_zvkned(&ctx->ctx2, req->iv, req->iv);
+
+ if (head_bytes && tail_bytes) {
+ /* If we have two parts, set up a new request for the head part only. */
+ skcipher_request_set_tfm(&sub_req, tfm);
+ skcipher_request_set_callback(
+ &sub_req, skcipher_request_flags(req), NULL, NULL);
+ skcipher_request_set_crypt(&sub_req, req->src, req->dst,
+ head_bytes, req->iv);
+ req = &sub_req;
+ }
+
+ if (head_bytes) {
+ err = skcipher_walk_virt(&walk, req, false);
+ while ((nbytes = walk.nbytes)) {
+ if (nbytes == walk.total)
+ update_iv = (tail_bytes > 0);
+
+ nbytes &= (~(AES_BLOCK_SIZE - 1));
+ kernel_vector_begin();
+ func(walk.src.virt.addr, walk.dst.virt.addr, nbytes,
+ &ctx->ctx1, req->iv, update_iv);
+ kernel_vector_end();
+
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+ }
+ if (err || !tail_bytes)
+ return err;
+
+ /*
+ * Set up a new request for the tail part.
+ * Use `scatterwalk_next()` to continue from the last walk position
+ * instead of iterating from the beginning of the scatterlist.
+ */
+ dst = src = scatterwalk_next(sg_src, &walk.in);
+ if (req->dst != req->src)
+ dst = scatterwalk_next(sg_dst, &walk.out);
+ skcipher_request_set_crypt(req, src, dst, tail_bytes, req->iv);
+ }
+
+ /* tail */
+ err = skcipher_walk_virt(&walk, req, false);
+ if (err)
+ return err;
+ if (walk.nbytes != tail_bytes)
+ return -EINVAL;
+ kernel_vector_begin();
+ func(walk.src.virt.addr, walk.dst.virt.addr, walk.nbytes, &ctx->ctx1,
+ req->iv, 0);
+ kernel_vector_end();
+
+ return skcipher_walk_done(&walk, 0);
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+ return xts_crypt(req, rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+ return xts_crypt(req, rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt);
+}
+
+static struct skcipher_alg riscv64_aes_algs_zvkned[] = {
+ {
+ .setkey = aes_setkey,
+ .encrypt = ecb_encrypt,
+ .decrypt = ecb_decrypt,
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .walksize = AES_BLOCK_SIZE * 8,
+ .base = {
+ .cra_flags = CRYPTO_ALG_INTERNAL,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct crypto_aes_ctx),
+ .cra_priority = 300,
+ .cra_name = "__ecb(aes)",
+ .cra_driver_name = "__ecb-aes-riscv64-zvkned",
+ .cra_module = THIS_MODULE,
+ },
+ }, {
+ .setkey = aes_setkey,
+ .encrypt = cbc_encrypt,
+ .decrypt = cbc_decrypt,
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .ivsize = AES_BLOCK_SIZE,
+ .walksize = AES_BLOCK_SIZE * 8,
+ .base = {
+ .cra_flags = CRYPTO_ALG_INTERNAL,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct crypto_aes_ctx),
+ .cra_priority = 300,
+ .cra_name = "__cbc(aes)",
+ .cra_driver_name = "__cbc-aes-riscv64-zvkned",
+ .cra_module = THIS_MODULE,
+ },
+ }
+};
+
+static struct simd_skcipher_alg
+ *riscv64_aes_simd_algs_zvkned[ARRAY_SIZE(riscv64_aes_algs_zvkned)];
+
+static struct skcipher_alg riscv64_aes_alg_zvkned_zvkb[] = {
+ {
+ .setkey = aes_setkey,
+ .encrypt = ctr_encrypt,
+ .decrypt = ctr_encrypt,
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .ivsize = AES_BLOCK_SIZE,
+ .chunksize = AES_BLOCK_SIZE,
+ .walksize = AES_BLOCK_SIZE * 8,
+ .base = {
+ .cra_flags = CRYPTO_ALG_INTERNAL,
+ .cra_blocksize = 1,
+ .cra_ctxsize = sizeof(struct crypto_aes_ctx),
+ .cra_priority = 300,
+ .cra_name = "__ctr(aes)",
+ .cra_driver_name = "__ctr-aes-riscv64-zvkned-zvkb",
+ .cra_module = THIS_MODULE,
+ },
+ }
+};
+
+static struct simd_skcipher_alg *riscv64_aes_simd_alg_zvkned_zvkb[ARRAY_SIZE(
+ riscv64_aes_alg_zvkned_zvkb)];
+
+static struct skcipher_alg riscv64_aes_alg_zvkned_zvbb_zvkg[] = {
+ {
+ .setkey = xts_setkey,
+ .encrypt = xts_encrypt,
+ .decrypt = xts_decrypt,
+ .min_keysize = AES_MIN_KEY_SIZE * 2,
+ .max_keysize = AES_MAX_KEY_SIZE * 2,
+ .ivsize = AES_BLOCK_SIZE,
+ .chunksize = AES_BLOCK_SIZE,
+ .walksize = AES_BLOCK_SIZE * 8,
+ .base = {
+ .cra_flags = CRYPTO_ALG_INTERNAL,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct riscv64_aes_xts_ctx),
+ .cra_priority = 300,
+ .cra_name = "__xts(aes)",
+ .cra_driver_name = "__xts-aes-riscv64-zvkned-zvbb-zvkg",
+ .cra_module = THIS_MODULE,
+ },
+ }
+};
+
+static struct simd_skcipher_alg
+ *riscv64_aes_simd_alg_zvkned_zvbb_zvkg[ARRAY_SIZE(
+ riscv64_aes_alg_zvkned_zvbb_zvkg)];
+
+static int __init riscv64_aes_block_mod_init(void)
+{
+ int ret = -ENODEV;
+
+ if (riscv_isa_extension_available(NULL, ZVKNED) &&
+ riscv_vector_vlen() >= 128 && riscv_vector_vlen() <= 2048) {
+ ret = simd_register_skciphers_compat(
+ riscv64_aes_algs_zvkned,
+ ARRAY_SIZE(riscv64_aes_algs_zvkned),
+ riscv64_aes_simd_algs_zvkned);
+ if (ret)
+ return ret;
+
+ if (riscv_isa_extension_available(NULL, ZVBB)) {
+ ret = simd_register_skciphers_compat(
+ riscv64_aes_alg_zvkned_zvkb,
+ ARRAY_SIZE(riscv64_aes_alg_zvkned_zvkb),
+ riscv64_aes_simd_alg_zvkned_zvkb);
+ if (ret)
+ goto unregister_zvkned;
+
+ if (riscv_isa_extension_available(NULL, ZVKG)) {
+ ret = simd_register_skciphers_compat(
+ riscv64_aes_alg_zvkned_zvbb_zvkg,
+ ARRAY_SIZE(
+ riscv64_aes_alg_zvkned_zvbb_zvkg),
+ riscv64_aes_simd_alg_zvkned_zvbb_zvkg);
+ if (ret)
+ goto unregister_zvkned_zvkb;
+ }
+ }
+ }
+
+ return ret;
+
+unregister_zvkned_zvkb:
+ simd_unregister_skciphers(riscv64_aes_alg_zvkned_zvkb,
+ ARRAY_SIZE(riscv64_aes_alg_zvkned_zvkb),
+ riscv64_aes_simd_alg_zvkned_zvkb);
+unregister_zvkned:
+ simd_unregister_skciphers(riscv64_aes_algs_zvkned,
+ ARRAY_SIZE(riscv64_aes_algs_zvkned),
+ riscv64_aes_simd_algs_zvkned);
+
+ return ret;
+}
+
+static void __exit riscv64_aes_block_mod_fini(void)
+{
+ simd_unregister_skciphers(riscv64_aes_alg_zvkned_zvbb_zvkg,
+ ARRAY_SIZE(riscv64_aes_alg_zvkned_zvbb_zvkg),
+ riscv64_aes_simd_alg_zvkned_zvbb_zvkg);
+ simd_unregister_skciphers(riscv64_aes_alg_zvkned_zvkb,
+ ARRAY_SIZE(riscv64_aes_alg_zvkned_zvkb),
+ riscv64_aes_simd_alg_zvkned_zvkb);
+ simd_unregister_skciphers(riscv64_aes_algs_zvkned,
+ ARRAY_SIZE(riscv64_aes_algs_zvkned),
+ riscv64_aes_simd_algs_zvkned);
+}
+
+module_init(riscv64_aes_block_mod_init);
+module_exit(riscv64_aes_block_mod_fini);
+
+MODULE_DESCRIPTION("AES-ECB/CBC/CTR/XTS (RISC-V accelerated)");
+MODULE_AUTHOR("Jerry Shih <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("cbc(aes)");
+MODULE_ALIAS_CRYPTO("ctr(aes)");
+MODULE_ALIAS_CRYPTO("ecb(aes)");
+MODULE_ALIAS_CRYPTO("xts(aes)");
diff --git a/arch/riscv/crypto/aes-riscv64-zvkned-zvbb-zvkg.pl b/arch/riscv/crypto/aes-riscv64-zvkned-zvbb-zvkg.pl
new file mode 100644
index 000000000000..6b6aad1cc97a
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-zvkned-zvbb-zvkg.pl
@@ -0,0 +1,949 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Jerry Shih <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128 && VLEN <= 2048
+# - RISC-V Vector Bit-manipulation extension ('Zvbb')
+# - RISC-V Vector GCM/GMAC extension ('Zvkg')
+# - RISC-V Vector AES block cipher extension ('Zvkned')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+{
+################################################################################
+# void rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt(const unsigned char *in,
+# unsigned char *out, size_t length,
+# const AES_KEY *key,
+# unsigned char iv[16],
+# int update_iv)
+my ($INPUT, $OUTPUT, $LENGTH, $KEY, $IV, $UPDATE_IV) = ("a0", "a1", "a2", "a3", "a4", "a5");
+my ($TAIL_LENGTH) = ("a6");
+my ($VL) = ("a7");
+my ($T0, $T1, $T2, $T3) = ("t0", "t1", "t2", "t3");
+my ($STORE_LEN32) = ("t4");
+my ($LEN32) = ("t5");
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+ $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+ $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+ $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+# load iv to v28
+sub load_xts_iv0 {
+ my $code=<<___;
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V28, $IV]}
+___
+
+ return $code;
+}
+
+# prepare input data(v24), iv(v28), bit-reversed-iv(v16), bit-reversed-iv-multiplier(v20)
+sub init_first_round {
+ my $code=<<___;
+ # load input
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ @{[vle32_v $V24, $INPUT]}
+
+ li $T0, 5
+ # We could simplify the initialization steps if we have `block<=1`.
+ blt $LEN32, $T0, 1f
+
+ # Note: We use `vgmul` for GF(2^128) multiplication. The `vgmul`
+ # instruction uses a different order of coefficients, so we use `vbrev8`
+ # to reverse the data whenever we work with `vgmul`.
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vbrev8_v $V0, $V28]}
+ @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
+ @{[vmv_v_i $V16, 0]}
+ # v16: [r-IV0, r-IV0, ...]
+ @{[vaesz_vs $V16, $V0]}
+
+ # Prepare GF(2^128) multiplier [1, x, x^2, x^3, ...] in v8.
+ # We use `vwsll` to get power of 2 multipliers. Current rvv spec only
+ # supports `SEW<=64`. So, the maximum `VLEN` for this approach is `2048`.
+ # SEW64_BITS * AES_BLOCK_SIZE / LMUL
+ # = 64 * 128 / 4 = 2048
+ #
+ # TODO: truncate the vl to `2048` for `vlen>2048` case.
+ slli $T0, $LEN32, 2
+ @{[vsetvli "zero", $T0, "e32", "m1", "ta", "ma"]}
+ # v2: [`1`, `1`, `1`, `1`, ...]
+ @{[vmv_v_i $V2, 1]}
+ # v3: [`0`, `1`, `2`, `3`, ...]
+ @{[vid_v $V3]}
+ @{[vsetvli "zero", $T0, "e64", "m2", "ta", "ma"]}
+ # v4: [`1`, 0, `1`, 0, `1`, 0, `1`, 0, ...]
+ @{[vzext_vf2 $V4, $V2]}
+ # v6: [`0`, 0, `1`, 0, `2`, 0, `3`, 0, ...]
+ @{[vzext_vf2 $V6, $V3]}
+ slli $T0, $LEN32, 1
+ @{[vsetvli "zero", $T0, "e32", "m2", "ta", "ma"]}
+ # v8: [1<<0=1, 0, 0, 0, 1<<1=x, 0, 0, 0, 1<<2=x^2, 0, 0, 0, ...]
+ @{[vwsll_vv $V8, $V4, $V6]}
+
+ # Compute [r-IV0*1, r-IV0*x, r-IV0*x^2, r-IV0*x^3, ...] in v16
+ @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
+ @{[vbrev8_v $V8, $V8]}
+ @{[vgmul_vv $V16, $V8]}
+
+ # Compute [IV0*1, IV0*x, IV0*x^2, IV0*x^3, ...] in v28.
+ # Reverse the bits order back.
+ @{[vbrev8_v $V28, $V16]}
+
+ # Prepare the x^n multiplier in v20. The `n` is the aes-xts block number
+ # in a LMUL=4 register group.
+ # n = ((VLEN*LMUL)/(32*4)) = ((VLEN*4)/(32*4))
+ # = (VLEN/32)
+ # We could use vsetvli with `e32, m1` to compute the `n` number.
+ @{[vsetvli $T0, "zero", "e32", "m1", "ta", "ma"]}
+ li $T1, 1
+ sll $T0, $T1, $T0
+ @{[vsetivli "zero", 2, "e64", "m1", "ta", "ma"]}
+ @{[vmv_v_i $V0, 0]}
+ @{[vsetivli "zero", 1, "e64", "m1", "tu", "ma"]}
+ @{[vmv_v_x $V0, $T0]}
+ @{[vsetivli "zero", 2, "e64", "m1", "ta", "ma"]}
+ @{[vbrev8_v $V0, $V0]}
+ @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
+ @{[vmv_v_i $V20, 0]}
+ @{[vaesz_vs $V20, $V0]}
+
+ j 2f
+1:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vbrev8_v $V16, $V28]}
+2:
+___
+
+ return $code;
+}
+
+# prepare xts enc last block's input(v24) and iv(v28)
+sub handle_xts_enc_last_block {
+ my $code=<<___;
+ bnez $TAIL_LENGTH, 2f
+
+ beqz $UPDATE_IV, 1f
+ ## Store next IV
+ addi $VL, $VL, -4
+ @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]}
+ # multiplier
+ @{[vslidedown_vx $V16, $V16, $VL]}
+
+ # setup `x` multiplier with byte-reversed order
+ # 0b00000010 => 0b01000000 (0x40)
+ li $T0, 0x40
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vmv_v_i $V28, 0]}
+ @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]}
+ @{[vmv_v_x $V28, $T0]}
+
+ # IV * `x`
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vgmul_vv $V16, $V28]}
+ # Reverse the IV's bits order back to big-endian
+ @{[vbrev8_v $V28, $V16]}
+
+ @{[vse32_v $V28, $IV]}
+1:
+
+ ret
+2:
+ # slidedown second to last block
+ addi $VL, $VL, -4
+ @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]}
+ # ciphertext
+ @{[vslidedown_vx $V24, $V24, $VL]}
+ # multiplier
+ @{[vslidedown_vx $V16, $V16, $VL]}
+
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vmv_v_v $V25, $V24]}
+
+ # load last block into v24
+ # note: We must load the last block before storing the second-to-last
+ # block so that in-place operation works.
+ @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]}
+ @{[vle8_v $V24, $INPUT]}
+
+ # setup `x` multiplier with byte-reversed order
+ # 0b00000010 => 0b01000000 (0x40)
+ li $T0, 0x40
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vmv_v_i $V28, 0]}
+ @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]}
+ @{[vmv_v_x $V28, $T0]}
+
+ # compute IV for last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vgmul_vv $V16, $V28]}
+ @{[vbrev8_v $V28, $V16]}
+
+ # store second to last block
+ @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "ta", "ma"]}
+ @{[vse8_v $V25, $OUTPUT]}
+___
+
+ return $code;
+}
+
+# prepare the xts dec second-to-last block's input (v24) and iv (v29), and
+# the last block's iv (v28)
+sub handle_xts_dec_last_block {
+ my $code=<<___;
+ bnez $TAIL_LENGTH, 2f
+
+ beqz $UPDATE_IV, 1f
+ ## Store next IV
+ # setup `x` multiplier with byte-reversed order
+ # 0b00000010 => 0b01000000 (0x40)
+ li $T0, 0x40
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vmv_v_i $V28, 0]}
+ @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]}
+ @{[vmv_v_x $V28, $T0]}
+
+ beqz $LENGTH, 3f
+ addi $VL, $VL, -4
+ @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]}
+ # multiplier
+ @{[vslidedown_vx $V16, $V16, $VL]}
+
+3:
+ # IV * `x`
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vgmul_vv $V16, $V28]}
+ # Reverse the IV's bits order back to big-endian
+ @{[vbrev8_v $V28, $V16]}
+
+ @{[vse32_v $V28, $IV]}
+1:
+
+ ret
+2:
+ # load second to last block's ciphertext
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V24, $INPUT]}
+ addi $INPUT, $INPUT, 16
+
+ # setup `x` multiplier with byte-reversed order
+ # 0b00000010 => 0b01000000 (0x40)
+ li $T0, 0x40
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vmv_v_i $V20, 0]}
+ @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]}
+ @{[vmv_v_x $V20, $T0]}
+
+ beqz $LENGTH, 1f
+ # slidedown third to last block
+ addi $VL, $VL, -4
+ @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]}
+ # multiplier
+ @{[vslidedown_vx $V16, $V16, $VL]}
+
+ # compute IV for last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vgmul_vv $V16, $V20]}
+ @{[vbrev8_v $V28, $V16]}
+
+ # compute IV for second to last block
+ @{[vgmul_vv $V16, $V20]}
+ @{[vbrev8_v $V29, $V16]}
+ j 2f
+1:
+ # compute IV for second to last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vgmul_vv $V16, $V20]}
+ @{[vbrev8_v $V29, $V16]}
+2:
+___
+
+ return $code;
+}
+
+# Load all 11 round keys to v1-v11 registers.
+sub aes_128_load_key {
+ my $code=<<___;
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V2, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V3, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V4, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V5, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V6, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V7, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V8, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V9, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V10, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V11, $KEY]}
+___
+
+ return $code;
+}
+
+# Load all 13 round keys to v1-v13 registers.
+sub aes_192_load_key {
+ my $code=<<___;
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V2, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V3, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V4, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V5, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V6, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V7, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V8, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V9, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V10, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V11, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V12, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V13, $KEY]}
+___
+
+ return $code;
+}
+
+# Load all 15 round keys to v1-v15 registers.
+sub aes_256_load_key {
+ my $code=<<___;
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V2, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V3, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V4, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V5, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V6, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V7, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V8, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V9, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V10, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V11, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V12, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V13, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V14, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V15, $KEY]}
+___
+
+ return $code;
+}
+
+# aes-128 enc with round keys v1-v11
+sub aes_128_enc {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V1]}
+ @{[vaesem_vs $V24, $V2]}
+ @{[vaesem_vs $V24, $V3]}
+ @{[vaesem_vs $V24, $V4]}
+ @{[vaesem_vs $V24, $V5]}
+ @{[vaesem_vs $V24, $V6]}
+ @{[vaesem_vs $V24, $V7]}
+ @{[vaesem_vs $V24, $V8]}
+ @{[vaesem_vs $V24, $V9]}
+ @{[vaesem_vs $V24, $V10]}
+ @{[vaesef_vs $V24, $V11]}
+___
+
+ return $code;
+}
+
+# aes-128 dec with round keys v1-v11
+sub aes_128_dec {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V11]}
+ @{[vaesdm_vs $V24, $V10]}
+ @{[vaesdm_vs $V24, $V9]}
+ @{[vaesdm_vs $V24, $V8]}
+ @{[vaesdm_vs $V24, $V7]}
+ @{[vaesdm_vs $V24, $V6]}
+ @{[vaesdm_vs $V24, $V5]}
+ @{[vaesdm_vs $V24, $V4]}
+ @{[vaesdm_vs $V24, $V3]}
+ @{[vaesdm_vs $V24, $V2]}
+ @{[vaesdf_vs $V24, $V1]}
+___
+
+ return $code;
+}
+
+# aes-192 enc with round keys v1-v13
+sub aes_192_enc {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V1]}
+ @{[vaesem_vs $V24, $V2]}
+ @{[vaesem_vs $V24, $V3]}
+ @{[vaesem_vs $V24, $V4]}
+ @{[vaesem_vs $V24, $V5]}
+ @{[vaesem_vs $V24, $V6]}
+ @{[vaesem_vs $V24, $V7]}
+ @{[vaesem_vs $V24, $V8]}
+ @{[vaesem_vs $V24, $V9]}
+ @{[vaesem_vs $V24, $V10]}
+ @{[vaesem_vs $V24, $V11]}
+ @{[vaesem_vs $V24, $V12]}
+ @{[vaesef_vs $V24, $V13]}
+___
+
+ return $code;
+}
+
+# aes-192 dec with round keys v1-v13
+sub aes_192_dec {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V13]}
+ @{[vaesdm_vs $V24, $V12]}
+ @{[vaesdm_vs $V24, $V11]}
+ @{[vaesdm_vs $V24, $V10]}
+ @{[vaesdm_vs $V24, $V9]}
+ @{[vaesdm_vs $V24, $V8]}
+ @{[vaesdm_vs $V24, $V7]}
+ @{[vaesdm_vs $V24, $V6]}
+ @{[vaesdm_vs $V24, $V5]}
+ @{[vaesdm_vs $V24, $V4]}
+ @{[vaesdm_vs $V24, $V3]}
+ @{[vaesdm_vs $V24, $V2]}
+ @{[vaesdf_vs $V24, $V1]}
+___
+
+ return $code;
+}
+
+# aes-256 enc with round keys v1-v15
+sub aes_256_enc {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V1]}
+ @{[vaesem_vs $V24, $V2]}
+ @{[vaesem_vs $V24, $V3]}
+ @{[vaesem_vs $V24, $V4]}
+ @{[vaesem_vs $V24, $V5]}
+ @{[vaesem_vs $V24, $V6]}
+ @{[vaesem_vs $V24, $V7]}
+ @{[vaesem_vs $V24, $V8]}
+ @{[vaesem_vs $V24, $V9]}
+ @{[vaesem_vs $V24, $V10]}
+ @{[vaesem_vs $V24, $V11]}
+ @{[vaesem_vs $V24, $V12]}
+ @{[vaesem_vs $V24, $V13]}
+ @{[vaesem_vs $V24, $V14]}
+ @{[vaesef_vs $V24, $V15]}
+___
+
+ return $code;
+}
+
+# aes-256 dec with round keys v1-v15
+sub aes_256_dec {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V15]}
+ @{[vaesdm_vs $V24, $V14]}
+ @{[vaesdm_vs $V24, $V13]}
+ @{[vaesdm_vs $V24, $V12]}
+ @{[vaesdm_vs $V24, $V11]}
+ @{[vaesdm_vs $V24, $V10]}
+ @{[vaesdm_vs $V24, $V9]}
+ @{[vaesdm_vs $V24, $V8]}
+ @{[vaesdm_vs $V24, $V7]}
+ @{[vaesdm_vs $V24, $V6]}
+ @{[vaesdm_vs $V24, $V5]}
+ @{[vaesdm_vs $V24, $V4]}
+ @{[vaesdm_vs $V24, $V3]}
+ @{[vaesdm_vs $V24, $V2]}
+ @{[vaesdf_vs $V24, $V1]}
+___
+
+ return $code;
+}
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt
+.type rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt,\@function
+rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt:
+ @{[load_xts_iv0]}
+
+ # aes block size is 16
+ andi $TAIL_LENGTH, $LENGTH, 15
+ mv $STORE_LEN32, $LENGTH
+ beqz $TAIL_LENGTH, 1f
+ sub $LENGTH, $LENGTH, $TAIL_LENGTH
+ addi $STORE_LEN32, $LENGTH, -16
+1:
+ # Convert `LENGTH` into a number of 32-bit (e32) elements.
+ srli $LEN32, $LENGTH, 2
+ srli $STORE_LEN32, $STORE_LEN32, 2
+
+ # Load key length.
+ lwu $T0, 480($KEY)
+ li $T1, 32
+ li $T2, 24
+ li $T3, 16
+ beq $T0, $T1, aes_xts_enc_256
+ beq $T0, $T2, aes_xts_enc_192
+ beq $T0, $T3, aes_xts_enc_128
+.size rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt,.-rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_enc_128:
+ @{[init_first_round]}
+ @{[aes_128_load_key]}
+
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ j 1f
+
+.Lenc_blocks_128:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ # load plaintext into v24
+ @{[vle32_v $V24, $INPUT]}
+ # update iv
+ @{[vgmul_vv $V16, $V20]}
+ # reverse the iv's bits order back
+ @{[vbrev8_v $V28, $V16]}
+1:
+ @{[vxor_vv $V24, $V24, $V28]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+ add $INPUT, $INPUT, $T0
+ @{[aes_128_enc]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store ciphertext
+ @{[vsetvli "zero", $STORE_LEN32, "e32", "m4", "ta", "ma"]}
+ @{[vse32_v $V24, $OUTPUT]}
+ add $OUTPUT, $OUTPUT, $T0
+ sub $STORE_LEN32, $STORE_LEN32, $VL
+
+ bnez $LEN32, .Lenc_blocks_128
+
+ @{[handle_xts_enc_last_block]}
+
+ # xts last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V28]}
+ @{[aes_128_enc]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store last block ciphertext
+ addi $OUTPUT, $OUTPUT, -16
+ @{[vse32_v $V24, $OUTPUT]}
+
+ ret
+.size aes_xts_enc_128,.-aes_xts_enc_128
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_enc_192:
+ @{[init_first_round]}
+ @{[aes_192_load_key]}
+
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ j 1f
+
+.Lenc_blocks_192:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ # load plaintext into v24
+ @{[vle32_v $V24, $INPUT]}
+ # update iv
+ @{[vgmul_vv $V16, $V20]}
+ # reverse the iv's bits order back
+ @{[vbrev8_v $V28, $V16]}
+1:
+ @{[vxor_vv $V24, $V24, $V28]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+ add $INPUT, $INPUT, $T0
+ @{[aes_192_enc]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store ciphertext
+ @{[vsetvli "zero", $STORE_LEN32, "e32", "m4", "ta", "ma"]}
+ @{[vse32_v $V24, $OUTPUT]}
+ add $OUTPUT, $OUTPUT, $T0
+ sub $STORE_LEN32, $STORE_LEN32, $VL
+
+ bnez $LEN32, .Lenc_blocks_192
+
+ @{[handle_xts_enc_last_block]}
+
+ # xts last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V28]}
+ @{[aes_192_enc]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store last block ciphertext
+ addi $OUTPUT, $OUTPUT, -16
+ @{[vse32_v $V24, $OUTPUT]}
+
+ ret
+.size aes_xts_enc_192,.-aes_xts_enc_192
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_enc_256:
+ @{[init_first_round]}
+ @{[aes_256_load_key]}
+
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ j 1f
+
+.Lenc_blocks_256:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ # load plaintext into v24
+ @{[vle32_v $V24, $INPUT]}
+ # update iv
+ @{[vgmul_vv $V16, $V20]}
+ # reverse the iv's bits order back
+ @{[vbrev8_v $V28, $V16]}
+1:
+ @{[vxor_vv $V24, $V24, $V28]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+ add $INPUT, $INPUT, $T0
+ @{[aes_256_enc]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store ciphertext
+ @{[vsetvli "zero", $STORE_LEN32, "e32", "m4", "ta", "ma"]}
+ @{[vse32_v $V24, $OUTPUT]}
+ add $OUTPUT, $OUTPUT, $T0
+ sub $STORE_LEN32, $STORE_LEN32, $VL
+
+ bnez $LEN32, .Lenc_blocks_256
+
+ @{[handle_xts_enc_last_block]}
+
+ # xts last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V28]}
+ @{[aes_256_enc]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store last block ciphertext
+ addi $OUTPUT, $OUTPUT, -16
+ @{[vse32_v $V24, $OUTPUT]}
+
+ ret
+.size aes_xts_enc_256,.-aes_xts_enc_256
+___
+
+################################################################################
+# void rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt(const unsigned char *in,
+# unsigned char *out, size_t length,
+# const AES_KEY *key,
+# unsigned char iv[16],
+# int update_iv)
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt
+.type rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt,\@function
+rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt:
+ @{[load_xts_iv0]}
+
+ # aes block size is 16
+ andi $TAIL_LENGTH, $LENGTH, 15
+ beqz $TAIL_LENGTH, 1f
+ sub $LENGTH, $LENGTH, $TAIL_LENGTH
+ addi $LENGTH, $LENGTH, -16
+1:
+ # Convert `LENGTH` into a number of 32-bit (e32) elements.
+ srli $LEN32, $LENGTH, 2
+
+ # Load key length.
+ lwu $T0, 480($KEY)
+ li $T1, 32
+ li $T2, 24
+ li $T3, 16
+ beq $T0, $T1, aes_xts_dec_256
+ beq $T0, $T2, aes_xts_dec_192
+ beq $T0, $T3, aes_xts_dec_128
+.size rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt,.-rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_dec_128:
+ @{[init_first_round]}
+ @{[aes_128_load_key]}
+
+ beqz $LEN32, 2f
+
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ j 1f
+
+.Ldec_blocks_128:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ # load ciphertext into v24
+ @{[vle32_v $V24, $INPUT]}
+ # update iv
+ @{[vgmul_vv $V16, $V20]}
+ # reverse the iv's bits order back
+ @{[vbrev8_v $V28, $V16]}
+1:
+ @{[vxor_vv $V24, $V24, $V28]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+ add $INPUT, $INPUT, $T0
+ @{[aes_128_dec]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store plaintext
+ @{[vse32_v $V24, $OUTPUT]}
+ add $OUTPUT, $OUTPUT, $T0
+
+ bnez $LEN32, .Ldec_blocks_128
+
+2:
+ @{[handle_xts_dec_last_block]}
+
+ ## xts second to last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V29]}
+ @{[aes_128_dec]}
+ @{[vxor_vv $V24, $V24, $V29]}
+ @{[vmv_v_v $V25, $V24]}
+
+ # load last block ciphertext
+ @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]}
+ @{[vle8_v $V24, $INPUT]}
+
+ # store the last block's partial plaintext (the tail)
+ addi $T0, $OUTPUT, 16
+ @{[vse8_v $V25, $T0]}
+
+ ## xts last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V28]}
+ @{[aes_128_dec]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store second to last block plaintext
+ @{[vse32_v $V24, $OUTPUT]}
+
+ ret
+.size aes_xts_dec_128,.-aes_xts_dec_128
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_dec_192:
+ @{[init_first_round]}
+ @{[aes_192_load_key]}
+
+ beqz $LEN32, 2f
+
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ j 1f
+
+.Ldec_blocks_192:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ # load ciphertext into v24
+ @{[vle32_v $V24, $INPUT]}
+ # update iv
+ @{[vgmul_vv $V16, $V20]}
+ # reverse the iv's bits order back
+ @{[vbrev8_v $V28, $V16]}
+1:
+ @{[vxor_vv $V24, $V24, $V28]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+ add $INPUT, $INPUT, $T0
+ @{[aes_192_dec]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store plaintext
+ @{[vse32_v $V24, $OUTPUT]}
+ add $OUTPUT, $OUTPUT, $T0
+
+ bnez $LEN32, .Ldec_blocks_192
+
+2:
+ @{[handle_xts_dec_last_block]}
+
+ ## xts second to last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V29]}
+ @{[aes_192_dec]}
+ @{[vxor_vv $V24, $V24, $V29]}
+ @{[vmv_v_v $V25, $V24]}
+
+ # load last block ciphertext
+ @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]}
+ @{[vle8_v $V24, $INPUT]}
+
+ # store the last block's partial plaintext (the tail)
+ addi $T0, $OUTPUT, 16
+ @{[vse8_v $V25, $T0]}
+
+ ## xts last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V28]}
+ @{[aes_192_dec]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store second to last block plaintext
+ @{[vse32_v $V24, $OUTPUT]}
+
+ ret
+.size aes_xts_dec_192,.-aes_xts_dec_192
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_dec_256:
+ @{[init_first_round]}
+ @{[aes_256_load_key]}
+
+ beqz $LEN32, 2f
+
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ j 1f
+
+.Ldec_blocks_256:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ # load ciphertext into v24
+ @{[vle32_v $V24, $INPUT]}
+ # update iv
+ @{[vgmul_vv $V16, $V20]}
+ # reverse the iv's bits order back
+ @{[vbrev8_v $V28, $V16]}
+1:
+ @{[vxor_vv $V24, $V24, $V28]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+ add $INPUT, $INPUT, $T0
+ @{[aes_256_dec]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store plaintext
+ @{[vse32_v $V24, $OUTPUT]}
+ add $OUTPUT, $OUTPUT, $T0
+
+ bnez $LEN32, .Ldec_blocks_256
+
+2:
+ @{[handle_xts_dec_last_block]}
+
+ ## xts second to last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V29]}
+ @{[aes_256_dec]}
+ @{[vxor_vv $V24, $V24, $V29]}
+ @{[vmv_v_v $V25, $V24]}
+
+ # load last block ciphertext
+ @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]}
+ @{[vle8_v $V24, $INPUT]}
+
+ # store the last block's partial plaintext (the tail)
+ addi $T0, $OUTPUT, 16
+ @{[vse8_v $V25, $T0]}
+
+ ## xts last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V28]}
+ @{[aes_256_dec]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store second to last block plaintext
+ @{[vse32_v $V24, $OUTPUT]}
+
+ ret
+.size aes_xts_dec_256,.-aes_xts_dec_256
+___
+}
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
diff --git a/arch/riscv/crypto/aes-riscv64-zvkned-zvkb.pl b/arch/riscv/crypto/aes-riscv64-zvkned-zvkb.pl
new file mode 100644
index 000000000000..3b8c324bc4d5
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-zvkned-zvkb.pl
@@ -0,0 +1,415 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Jerry Shih <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Vector AES block cipher extension ('Zvkned')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+################################################################################
+# void rv64i_zvkb_zvkned_ctr32_encrypt_blocks(const unsigned char *in,
+# unsigned char *out, size_t length,
+# const void *key,
+# unsigned char ivec[16]);
+{
+my ($INP, $OUTP, $LEN, $KEYP, $IVP) = ("a0", "a1", "a2", "a3", "a4");
+my ($T0, $T1, $T2, $T3) = ("t0", "t1", "t2", "t3");
+my ($VL) = ("t4");
+my ($LEN32) = ("t5");
+my ($CTR) = ("t6");
+my ($MASK) = ("v0");
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+ $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+ $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+ $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+# Prepare the AES ctr input data into v16.
+sub init_aes_ctr_input {
+ my $code=<<___;
+ # Set up the mask in v0.
+ # The mask pattern selects every 4*N-th element, i.e. the counter word
+ # of each AES block.
+ # mask v0: [000100010001....]
+ # Note:
+ # We could set up the mask just for the maximum element length instead
+ # of the VLMAX.
+ li $T0, 0b10001000
+ @{[vsetvli $T2, "zero", "e8", "m1", "ta", "ma"]}
+ @{[vmv_v_x $MASK, $T0]}
+ # Load IV.
+ # v31:[IV0, IV1, IV2, big-endian count]
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V31, $IVP]}
+ # Convert the big-endian counter into little-endian.
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]}
+ @{[vrev8_v $V31, $V31, $MASK]}
+ # Splat the IV to v16
+ @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
+ @{[vmv_v_i $V16, 0]}
+ @{[vaesz_vs $V16, $V31]}
+ # Prepare the ctr pattern into v20
+ # v20: [x, x, x, 0, x, x, x, 1, x, x, x, 2, ...]
+ @{[viota_m $V20, $MASK, $MASK]}
+ # v16:[IV0, IV1, IV2, count+0, IV0, IV1, IV2, count+1, ...]
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]}
+ @{[vadd_vv $V16, $V16, $V20, $MASK]}
+___
+
+ return $code;
+}
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkb_zvkned_ctr32_encrypt_blocks
+.type rv64i_zvkb_zvkned_ctr32_encrypt_blocks,\@function
+rv64i_zvkb_zvkned_ctr32_encrypt_blocks:
+ # The aes block size is 16 bytes.
+ # Round the length up to a whole number of AES blocks (tail included).
+ addi $T0, $LEN, 15
+ # the minimum block number
+ srli $T0, $T0, 4
+ # Convert the block count into a number of 32-bit (e32) elements.
+ slli $LEN32, $T0, 2
+
+ # Load key length.
+ lwu $T0, 480($KEYP)
+ li $T1, 32
+ li $T2, 24
+ li $T3, 16
+
+ beq $T0, $T1, ctr32_encrypt_blocks_256
+ beq $T0, $T2, ctr32_encrypt_blocks_192
+ beq $T0, $T3, ctr32_encrypt_blocks_128
+
+ ret
+.size rv64i_zvkb_zvkned_ctr32_encrypt_blocks,.-rv64i_zvkb_zvkned_ctr32_encrypt_blocks
+___
+
+$code .= <<___;
+.p2align 3
+ctr32_encrypt_blocks_128:
+ # Load all 11 round keys to v1-v11 registers.
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V2, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V3, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V4, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V5, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V6, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V7, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V8, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V9, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V10, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, $KEYP]}
+
+ @{[init_aes_ctr_input]}
+
+ ##### AES body
+ j 2f
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]}
+ # Increase ctr in v16.
+ @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+2:
+ # Prepare the AES ctr input into v24.
+ # The ctr data uses big-endian form.
+ @{[vmv_v_v $V24, $V16]}
+ @{[vrev8_v $V24, $V24, $MASK]}
+ srli $CTR, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ # Load plaintext in bytes into v20.
+ @{[vsetvli $T0, $LEN, "e8", "m4", "ta", "ma"]}
+ @{[vle8_v $V20, $INP]}
+ sub $LEN, $LEN, $T0
+ add $INP, $INP, $T0
+
+ @{[vsetvli "zero", $VL, "e32", "m4", "ta", "ma"]}
+ @{[vaesz_vs $V24, $V1]}
+ @{[vaesem_vs $V24, $V2]}
+ @{[vaesem_vs $V24, $V3]}
+ @{[vaesem_vs $V24, $V4]}
+ @{[vaesem_vs $V24, $V5]}
+ @{[vaesem_vs $V24, $V6]}
+ @{[vaesem_vs $V24, $V7]}
+ @{[vaesem_vs $V24, $V8]}
+ @{[vaesem_vs $V24, $V9]}
+ @{[vaesem_vs $V24, $V10]}
+ @{[vaesef_vs $V24, $V11]}
+
+ # ciphertext
+ @{[vsetvli "zero", $T0, "e8", "m4", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V20]}
+
+ # Store the ciphertext.
+ @{[vse8_v $V24, $OUTP]}
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN, 1b
+
+ ## store ctr iv
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]}
+ # Increase ctr in v16.
+ @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+ # Convert ctr data back to big-endian.
+ @{[vrev8_v $V16, $V16, $MASK]}
+ @{[vse32_v $V16, $IVP]}
+
+ ret
+.size ctr32_encrypt_blocks_128,.-ctr32_encrypt_blocks_128
+___
+
+$code .= <<___;
+.p2align 3
+ctr32_encrypt_blocks_192:
+ # Load all 13 round keys to v1-v13 registers.
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V2, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V3, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V4, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V5, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V6, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V7, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V8, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V9, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V10, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V12, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V13, $KEYP]}
+
+ @{[init_aes_ctr_input]}
+
+ ##### AES body
+ j 2f
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]}
+ # Increase ctr in v16.
+ @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+2:
+ # Prepare the AES ctr input into v24.
+ # The ctr data uses big-endian form.
+ @{[vmv_v_v $V24, $V16]}
+ @{[vrev8_v $V24, $V24, $MASK]}
+ srli $CTR, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ # Load plaintext in bytes into v20.
+ @{[vsetvli $T0, $LEN, "e8", "m4", "ta", "ma"]}
+ @{[vle8_v $V20, $INP]}
+ sub $LEN, $LEN, $T0
+ add $INP, $INP, $T0
+
+ @{[vsetvli "zero", $VL, "e32", "m4", "ta", "ma"]}
+ @{[vaesz_vs $V24, $V1]}
+ @{[vaesem_vs $V24, $V2]}
+ @{[vaesem_vs $V24, $V3]}
+ @{[vaesem_vs $V24, $V4]}
+ @{[vaesem_vs $V24, $V5]}
+ @{[vaesem_vs $V24, $V6]}
+ @{[vaesem_vs $V24, $V7]}
+ @{[vaesem_vs $V24, $V8]}
+ @{[vaesem_vs $V24, $V9]}
+ @{[vaesem_vs $V24, $V10]}
+ @{[vaesem_vs $V24, $V11]}
+ @{[vaesem_vs $V24, $V12]}
+ @{[vaesef_vs $V24, $V13]}
+
+ # ciphertext
+ @{[vsetvli "zero", $T0, "e8", "m4", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V20]}
+
+ # Store the ciphertext.
+ @{[vse8_v $V24, $OUTP]}
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN, 1b
+
+ ## store ctr iv
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]}
+ # Increase ctr in v16.
+ @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+ # Convert ctr data back to big-endian.
+ @{[vrev8_v $V16, $V16, $MASK]}
+ @{[vse32_v $V16, $IVP]}
+
+ ret
+.size ctr32_encrypt_blocks_192,.-ctr32_encrypt_blocks_192
+___
+
+$code .= <<___;
+.p2align 3
+ctr32_encrypt_blocks_256:
+ # Load all 15 round keys to v1-v15 registers.
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V2, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V3, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V4, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V5, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V6, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V7, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V8, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V9, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V10, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V12, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V13, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V14, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V15, $KEYP]}
+
+ @{[init_aes_ctr_input]}
+
+ ##### AES body
+ j 2f
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]}
+ # Increase ctr in v16.
+ @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+2:
+ # Prepare the AES ctr input into v24.
+ # The ctr data uses big-endian form.
+ @{[vmv_v_v $V24, $V16]}
+ @{[vrev8_v $V24, $V24, $MASK]}
+ srli $CTR, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ # Load plaintext in bytes into v20.
+ @{[vsetvli $T0, $LEN, "e8", "m4", "ta", "ma"]}
+ @{[vle8_v $V20, $INP]}
+ sub $LEN, $LEN, $T0
+ add $INP, $INP, $T0
+
+ @{[vsetvli "zero", $VL, "e32", "m4", "ta", "ma"]}
+ @{[vaesz_vs $V24, $V1]}
+ @{[vaesem_vs $V24, $V2]}
+ @{[vaesem_vs $V24, $V3]}
+ @{[vaesem_vs $V24, $V4]}
+ @{[vaesem_vs $V24, $V5]}
+ @{[vaesem_vs $V24, $V6]}
+ @{[vaesem_vs $V24, $V7]}
+ @{[vaesem_vs $V24, $V8]}
+ @{[vaesem_vs $V24, $V9]}
+ @{[vaesem_vs $V24, $V10]}
+ @{[vaesem_vs $V24, $V11]}
+ @{[vaesem_vs $V24, $V12]}
+ @{[vaesem_vs $V24, $V13]}
+ @{[vaesem_vs $V24, $V14]}
+ @{[vaesef_vs $V24, $V15]}
+
+ # ciphertext
+ @{[vsetvli "zero", $T0, "e8", "m4", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V20]}
+
+ # Store the ciphertext.
+ @{[vse8_v $V24, $OUTP]}
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN, 1b
+
+ ## store ctr iv
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]}
+ # Increase ctr in v16.
+ @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+ # Convert ctr data back to big-endian.
+ @{[vrev8_v $V16, $V16, $MASK]}
+ @{[vse32_v $V16, $IVP]}
+
+ ret
+.size ctr32_encrypt_blocks_256,.-ctr32_encrypt_blocks_256
+___
+}
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl
index 303e82d9f6f0..71a9248320c0 100644
--- a/arch/riscv/crypto/aes-riscv64-zvkned.pl
+++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl
@@ -67,6 +67,752 @@ my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
$V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
) = map("v$_",(0..31));

+# Load all 11 round keys to v1-v11 registers.
+sub aes_128_load_key {
+ my $KEYP = shift;
+
+ my $code=<<___;
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V2, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V3, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V4, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V5, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V6, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V7, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V8, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V9, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V10, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, $KEYP]}
+___
+
+ return $code;
+}
+
+# Load all 13 round keys to v1-v13 registers.
+sub aes_192_load_key {
+ my $KEYP = shift;
+
+ my $code=<<___;
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V2, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V3, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V4, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V5, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V6, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V7, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V8, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V9, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V10, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V12, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V13, $KEYP]}
+___
+
+ return $code;
+}
+
+# Load all 15 round keys to v1-v15 registers.
+sub aes_256_load_key {
+ my $KEYP = shift;
+
+ my $code=<<___;
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V2, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V3, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V4, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V5, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V6, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V7, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V8, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V9, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V10, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V12, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V13, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V14, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V15, $KEYP]}
+___
+
+ return $code;
+}
+
+# aes-128 encryption with round keys v1-v11
+sub aes_128_encrypt {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V1]} # with round key w[ 0, 3]
+ @{[vaesem_vs $V24, $V2]} # with round key w[ 4, 7]
+ @{[vaesem_vs $V24, $V3]} # with round key w[ 8,11]
+ @{[vaesem_vs $V24, $V4]} # with round key w[12,15]
+ @{[vaesem_vs $V24, $V5]} # with round key w[16,19]
+ @{[vaesem_vs $V24, $V6]} # with round key w[20,23]
+ @{[vaesem_vs $V24, $V7]} # with round key w[24,27]
+ @{[vaesem_vs $V24, $V8]} # with round key w[28,31]
+ @{[vaesem_vs $V24, $V9]} # with round key w[32,35]
+ @{[vaesem_vs $V24, $V10]} # with round key w[36,39]
+ @{[vaesef_vs $V24, $V11]} # with round key w[40,43]
+___
+
+ return $code;
+}
+
+# aes-128 decryption with round keys v1-v11
+sub aes_128_decrypt {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V11]} # with round key w[40,43]
+ @{[vaesdm_vs $V24, $V10]} # with round key w[36,39]
+ @{[vaesdm_vs $V24, $V9]} # with round key w[32,35]
+ @{[vaesdm_vs $V24, $V8]} # with round key w[28,31]
+ @{[vaesdm_vs $V24, $V7]} # with round key w[24,27]
+ @{[vaesdm_vs $V24, $V6]} # with round key w[20,23]
+ @{[vaesdm_vs $V24, $V5]} # with round key w[16,19]
+ @{[vaesdm_vs $V24, $V4]} # with round key w[12,15]
+ @{[vaesdm_vs $V24, $V3]} # with round key w[ 8,11]
+ @{[vaesdm_vs $V24, $V2]} # with round key w[ 4, 7]
+ @{[vaesdf_vs $V24, $V1]} # with round key w[ 0, 3]
+___
+
+ return $code;
+}
+
+# aes-192 encryption with round keys v1-v13
+sub aes_192_encrypt {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V1]} # with round key w[ 0, 3]
+ @{[vaesem_vs $V24, $V2]} # with round key w[ 4, 7]
+ @{[vaesem_vs $V24, $V3]} # with round key w[ 8,11]
+ @{[vaesem_vs $V24, $V4]} # with round key w[12,15]
+ @{[vaesem_vs $V24, $V5]} # with round key w[16,19]
+ @{[vaesem_vs $V24, $V6]} # with round key w[20,23]
+ @{[vaesem_vs $V24, $V7]} # with round key w[24,27]
+ @{[vaesem_vs $V24, $V8]} # with round key w[28,31]
+ @{[vaesem_vs $V24, $V9]} # with round key w[32,35]
+ @{[vaesem_vs $V24, $V10]} # with round key w[36,39]
+ @{[vaesem_vs $V24, $V11]} # with round key w[40,43]
+ @{[vaesem_vs $V24, $V12]} # with round key w[44,47]
+ @{[vaesef_vs $V24, $V13]} # with round key w[48,51]
+___
+
+ return $code;
+}
+
+# aes-192 decryption with round keys v1-v13
+sub aes_192_decrypt {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V13]} # with round key w[48,51]
+ @{[vaesdm_vs $V24, $V12]} # with round key w[44,47]
+ @{[vaesdm_vs $V24, $V11]} # with round key w[40,43]
+ @{[vaesdm_vs $V24, $V10]} # with round key w[36,39]
+ @{[vaesdm_vs $V24, $V9]} # with round key w[32,35]
+ @{[vaesdm_vs $V24, $V8]} # with round key w[28,31]
+ @{[vaesdm_vs $V24, $V7]} # with round key w[24,27]
+ @{[vaesdm_vs $V24, $V6]} # with round key w[20,23]
+ @{[vaesdm_vs $V24, $V5]} # with round key w[16,19]
+ @{[vaesdm_vs $V24, $V4]} # with round key w[12,15]
+ @{[vaesdm_vs $V24, $V3]} # with round key w[ 8,11]
+ @{[vaesdm_vs $V24, $V2]} # with round key w[ 4, 7]
+ @{[vaesdf_vs $V24, $V1]} # with round key w[ 0, 3]
+___
+
+ return $code;
+}
+
+# aes-256 encryption with round keys v1-v15
+sub aes_256_encrypt {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V1]} # with round key w[ 0, 3]
+ @{[vaesem_vs $V24, $V2]} # with round key w[ 4, 7]
+ @{[vaesem_vs $V24, $V3]} # with round key w[ 8,11]
+ @{[vaesem_vs $V24, $V4]} # with round key w[12,15]
+ @{[vaesem_vs $V24, $V5]} # with round key w[16,19]
+ @{[vaesem_vs $V24, $V6]} # with round key w[20,23]
+ @{[vaesem_vs $V24, $V7]} # with round key w[24,27]
+ @{[vaesem_vs $V24, $V8]} # with round key w[28,31]
+ @{[vaesem_vs $V24, $V9]} # with round key w[32,35]
+ @{[vaesem_vs $V24, $V10]} # with round key w[36,39]
+ @{[vaesem_vs $V24, $V11]} # with round key w[40,43]
+ @{[vaesem_vs $V24, $V12]} # with round key w[44,47]
+ @{[vaesem_vs $V24, $V13]} # with round key w[48,51]
+ @{[vaesem_vs $V24, $V14]} # with round key w[52,55]
+ @{[vaesef_vs $V24, $V15]} # with round key w[56,59]
+___
+
+ return $code;
+}
+
+# aes-256 decryption with round keys v1-v15
+sub aes_256_decrypt {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V15]} # with round key w[56,59]
+ @{[vaesdm_vs $V24, $V14]} # with round key w[52,55]
+ @{[vaesdm_vs $V24, $V13]} # with round key w[48,51]
+ @{[vaesdm_vs $V24, $V12]} # with round key w[44,47]
+ @{[vaesdm_vs $V24, $V11]} # with round key w[40,43]
+ @{[vaesdm_vs $V24, $V10]} # with round key w[36,39]
+ @{[vaesdm_vs $V24, $V9]} # with round key w[32,35]
+ @{[vaesdm_vs $V24, $V8]} # with round key w[28,31]
+ @{[vaesdm_vs $V24, $V7]} # with round key w[24,27]
+ @{[vaesdm_vs $V24, $V6]} # with round key w[20,23]
+ @{[vaesdm_vs $V24, $V5]} # with round key w[16,19]
+ @{[vaesdm_vs $V24, $V4]} # with round key w[12,15]
+ @{[vaesdm_vs $V24, $V3]} # with round key w[ 8,11]
+ @{[vaesdm_vs $V24, $V2]} # with round key w[ 4, 7]
+ @{[vaesdf_vs $V24, $V1]} # with round key w[ 0, 3]
+___
+
+ return $code;
+}
+
+{
+###############################################################################
+# void rv64i_zvkned_cbc_encrypt(const unsigned char *in, unsigned char *out,
+# size_t length, const AES_KEY *key,
+# unsigned char *ivec, const int enc);
+my ($INP, $OUTP, $LEN, $KEYP, $IVP, $ENC) = ("a0", "a1", "a2", "a3", "a4", "a5");
+my ($T0, $T1) = ("t0", "t1");
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_cbc_encrypt
+.type rv64i_zvkned_cbc_encrypt,\@function
+rv64i_zvkned_cbc_encrypt:
+ # check whether the length is a multiple of 16 and >= 16
+ li $T1, 16
+ blt $LEN, $T1, L_end
+ andi $T1, $LEN, 15
+ bnez $T1, L_end
+
+ # Load key length.
+ lwu $T0, 480($KEYP)
+
+ # Get proper routine for key length.
+ li $T1, 16
+ beq $T1, $T0, L_cbc_enc_128
+
+ li $T1, 24
+ beq $T1, $T0, L_cbc_enc_192
+
+ li $T1, 32
+ beq $T1, $T0, L_cbc_enc_256
+
+ ret
+.size rv64i_zvkned_cbc_encrypt,.-rv64i_zvkned_cbc_encrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_enc_128:
+ # Load all 11 round keys to v1-v11 registers.
+ @{[aes_128_load_key $KEYP]}
+
+ # Load IV.
+ @{[vle32_v $V16, $IVP]}
+
+ @{[vle32_v $V24, $INP]}
+ @{[vxor_vv $V24, $V24, $V16]}
+ j 2f
+
+1:
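+    # v24 still holds the previous ciphertext block at this point, so
+    # XOR-ing the next plaintext block into it implements the CBC chaining.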
+ @{[vle32_v $V17, $INP]}
+ @{[vxor_vv $V24, $V24, $V17]}
+
+2:
+ # AES body
+ @{[aes_128_encrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ addi $INP, $INP, 16
+ addi $OUTP, $OUTP, 16
+ addi $LEN, $LEN, -16
+
+ bnez $LEN, 1b
+
+ @{[vse32_v $V24, $IVP]}
+
+ ret
+.size L_cbc_enc_128,.-L_cbc_enc_128
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_enc_192:
+ # Load all 13 round keys to v1-v13 registers.
+ @{[aes_192_load_key $KEYP]}
+
+ # Load IV.
+ @{[vle32_v $V16, $IVP]}
+
+ @{[vle32_v $V24, $INP]}
+ @{[vxor_vv $V24, $V24, $V16]}
+ j 2f
+
+1:
+ @{[vle32_v $V17, $INP]}
+ @{[vxor_vv $V24, $V24, $V17]}
+
+2:
+ # AES body
+ @{[aes_192_encrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ addi $INP, $INP, 16
+ addi $OUTP, $OUTP, 16
+ addi $LEN, $LEN, -16
+
+ bnez $LEN, 1b
+
+ @{[vse32_v $V24, $IVP]}
+
+ ret
+.size L_cbc_enc_192,.-L_cbc_enc_192
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_enc_256:
+ # Load all 15 round keys to v1-v15 registers.
+ @{[aes_256_load_key $KEYP]}
+
+ # Load IV.
+ @{[vle32_v $V16, $IVP]}
+
+ @{[vle32_v $V24, $INP]}
+ @{[vxor_vv $V24, $V24, $V16]}
+ j 2f
+
+1:
+ @{[vle32_v $V17, $INP]}
+ @{[vxor_vv $V24, $V24, $V17]}
+
+2:
+ # AES body
+ @{[aes_256_encrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ addi $INP, $INP, 16
+ addi $OUTP, $OUTP, 16
+ addi $LEN, $LEN, -16
+
+ bnez $LEN, 1b
+
+ @{[vse32_v $V24, $IVP]}
+
+ ret
+.size L_cbc_enc_256,.-L_cbc_enc_256
+___
+
+###############################################################################
+# void rv64i_zvkned_cbc_decrypt(const unsigned char *in, unsigned char *out,
+# size_t length, const AES_KEY *key,
+# unsigned char *ivec, const int enc);
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_cbc_decrypt
+.type rv64i_zvkned_cbc_decrypt,\@function
+rv64i_zvkned_cbc_decrypt:
+ # check whether the length is a multiple of 16 and >= 16
+ li $T1, 16
+ blt $LEN, $T1, L_end
+ andi $T1, $LEN, 15
+ bnez $T1, L_end
+
+ # Load key length.
+ lwu $T0, 480($KEYP)
+
+ # Get proper routine for key length.
+ li $T1, 16
+ beq $T1, $T0, L_cbc_dec_128
+
+ li $T1, 24
+ beq $T1, $T0, L_cbc_dec_192
+
+ li $T1, 32
+ beq $T1, $T0, L_cbc_dec_256
+
+ ret
+.size rv64i_zvkned_cbc_decrypt,.-rv64i_zvkned_cbc_decrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_dec_128:
+ # Load all 11 round keys to v1-v11 registers.
+ @{[aes_128_load_key $KEYP]}
+
+ # Load IV.
+ @{[vle32_v $V16, $IVP]}
+
+ @{[vle32_v $V24, $INP]}
+ @{[vmv_v_v $V17, $V24]}
+ j 2f
+
+1:
+ @{[vle32_v $V24, $INP]}
+ @{[vmv_v_v $V17, $V24]}
+ addi $OUTP, $OUTP, 16
+
+2:
+ # AES body
+ @{[aes_128_decrypt]}
+
+ @{[vxor_vv $V24, $V24, $V16]}
+ @{[vse32_v $V24, $OUTP]}
+ @{[vmv_v_v $V16, $V17]}
+
+ addi $LEN, $LEN, -16
+ addi $INP, $INP, 16
+
+ bnez $LEN, 1b
+
+ @{[vse32_v $V16, $IVP]}
+
+ ret
+.size L_cbc_dec_128,.-L_cbc_dec_128
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_dec_192:
+ # Load all 13 round keys to v1-v13 registers.
+ @{[aes_192_load_key $KEYP]}
+
+ # Load IV.
+ @{[vle32_v $V16, $IVP]}
+
+ @{[vle32_v $V24, $INP]}
+ @{[vmv_v_v $V17, $V24]}
+ j 2f
+
+1:
+ @{[vle32_v $V24, $INP]}
+ @{[vmv_v_v $V17, $V24]}
+ addi $OUTP, $OUTP, 16
+
+2:
+ # AES body
+ @{[aes_192_decrypt]}
+
+ @{[vxor_vv $V24, $V24, $V16]}
+ @{[vse32_v $V24, $OUTP]}
+ @{[vmv_v_v $V16, $V17]}
+
+ addi $LEN, $LEN, -16
+ addi $INP, $INP, 16
+
+ bnez $LEN, 1b
+
+ @{[vse32_v $V16, $IVP]}
+
+ ret
+.size L_cbc_dec_192,.-L_cbc_dec_192
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_dec_256:
+ # Load all 15 round keys to v1-v15 registers.
+ @{[aes_256_load_key $KEYP]}
+
+ # Load IV.
+ @{[vle32_v $V16, $IVP]}
+
+ @{[vle32_v $V24, $INP]}
+ @{[vmv_v_v $V17, $V24]}
+ j 2f
+
+1:
+ @{[vle32_v $V24, $INP]}
+ @{[vmv_v_v $V17, $V24]}
+ addi $OUTP, $OUTP, 16
+
+2:
+ # AES body
+ @{[aes_256_decrypt]}
+
+ @{[vxor_vv $V24, $V24, $V16]}
+ @{[vse32_v $V24, $OUTP]}
+ @{[vmv_v_v $V16, $V17]}
+
+ addi $LEN, $LEN, -16
+ addi $INP, $INP, 16
+
+ bnez $LEN, 1b
+
+ @{[vse32_v $V16, $IVP]}
+
+ ret
+.size L_cbc_dec_256,.-L_cbc_dec_256
+___
+}
+
+{
+###############################################################################
+# void rv64i_zvkned_ecb_encrypt(const unsigned char *in, unsigned char *out,
+# size_t length, const AES_KEY *key,
+# const int enc);
+my ($INP, $OUTP, $LEN, $KEYP, $ENC) = ("a0", "a1", "a2", "a3", "a4");
+my ($VL) = ("a5");
+my ($LEN32) = ("a6");
+my ($T0, $T1) = ("t0", "t1");
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_ecb_encrypt
+.type rv64i_zvkned_ecb_encrypt,\@function
+rv64i_zvkned_ecb_encrypt:
+    # Convert LEN from a byte count to an e32 (32-bit element) count.
+ srli $LEN32, $LEN, 2
+
+ # Load key length.
+ lwu $T0, 480($KEYP)
+
+ # Get proper routine for key length.
+ li $T1, 16
+ beq $T1, $T0, L_ecb_enc_128
+
+ li $T1, 24
+ beq $T1, $T0, L_ecb_enc_192
+
+ li $T1, 32
+ beq $T1, $T0, L_ecb_enc_256
+
+ ret
+.size rv64i_zvkned_ecb_encrypt,.-rv64i_zvkned_ecb_encrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_enc_128:
+ # Load all 11 round keys to v1-v11 registers.
+ @{[aes_128_load_key $KEYP]}
+
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
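+    # VL counts 32-bit elements, so this iteration handles VL * 4 bytes.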
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ @{[vle32_v $V24, $INP]}
+
+ # AES body
+ @{[aes_128_encrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ add $INP, $INP, $T0
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN32, 1b
+
+ ret
+.size L_ecb_enc_128,.-L_ecb_enc_128
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_enc_192:
+ # Load all 13 round keys to v1-v13 registers.
+ @{[aes_192_load_key $KEYP]}
+
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ @{[vle32_v $V24, $INP]}
+
+ # AES body
+ @{[aes_192_encrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ add $INP, $INP, $T0
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN32, 1b
+
+ ret
+.size L_ecb_enc_192,.-L_ecb_enc_192
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_enc_256:
+ # Load all 15 round keys to v1-v15 registers.
+ @{[aes_256_load_key $KEYP]}
+
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ @{[vle32_v $V24, $INP]}
+
+ # AES body
+ @{[aes_256_encrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ add $INP, $INP, $T0
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN32, 1b
+
+ ret
+.size L_ecb_enc_256,.-L_ecb_enc_256
+___
+
+###############################################################################
+# void rv64i_zvkned_ecb_decrypt(const unsigned char *in, unsigned char *out,
+# size_t length, const AES_KEY *key,
+# const int enc);
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_ecb_decrypt
+.type rv64i_zvkned_ecb_decrypt,\@function
+rv64i_zvkned_ecb_decrypt:
+    # Convert LEN from a byte count to an e32 (32-bit element) count.
+ srli $LEN32, $LEN, 2
+
+ # Load key length.
+ lwu $T0, 480($KEYP)
+
+ # Get proper routine for key length.
+ li $T1, 16
+ beq $T1, $T0, L_ecb_dec_128
+
+ li $T1, 24
+ beq $T1, $T0, L_ecb_dec_192
+
+ li $T1, 32
+ beq $T1, $T0, L_ecb_dec_256
+
+ ret
+.size rv64i_zvkned_ecb_decrypt,.-rv64i_zvkned_ecb_decrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_dec_128:
+ # Load all 11 round keys to v1-v11 registers.
+ @{[aes_128_load_key $KEYP]}
+
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ @{[vle32_v $V24, $INP]}
+
+ # AES body
+ @{[aes_128_decrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ add $INP, $INP, $T0
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN32, 1b
+
+ ret
+.size L_ecb_dec_128,.-L_ecb_dec_128
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_dec_192:
+ # Load all 13 round keys to v1-v13 registers.
+ @{[aes_192_load_key $KEYP]}
+
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ @{[vle32_v $V24, $INP]}
+
+ # AES body
+ @{[aes_192_decrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ add $INP, $INP, $T0
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN32, 1b
+
+ ret
+.size L_ecb_dec_192,.-L_ecb_dec_192
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_dec_256:
+ # Load all 15 round keys to v1-v15 registers.
+ @{[aes_256_load_key $KEYP]}
+
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ @{[vle32_v $V24, $INP]}
+
+ # AES body
+ @{[aes_256_decrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ add $INP, $INP, $T0
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN32, 1b
+
+ ret
+.size L_ecb_dec_256,.-L_ecb_dec_256
+___
+}
+
{
################################################################################
# int rv64i_zvkned_set_encrypt_key(const unsigned char *userKey, const int bytes,
--
2.28.0

2023-11-27 07:09:16

by Jerry Shih

[permalink] [raw]
Subject: [PATCH v2 09/13] RISC-V: crypto: add Zvknha/b accelerated SHA224/256 implementations

Add SHA-224 and SHA-256 implementations using the Zvknha or Zvknhb vector
crypto extensions from OpenSSL (openssl/openssl#21923).

Co-developed-by: Charalampos Mitrodimas <[email protected]>
Signed-off-by: Charalampos Mitrodimas <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Co-developed-by: Phoebe Chen <[email protected]>
Signed-off-by: Phoebe Chen <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
Changelog v2:
- Do not turn on kconfig `SHA256_RISCV64` option by default.
- Add `asmlinkage` qualifier for crypto asm function.
- Rename sha256-riscv64-zvkb-zvknha_or_zvknhb to
sha256-riscv64-zvknha_or_zvknhb-zvkb.
- Reorder structure sha256_algs members initialization in the order
declared.
---
arch/riscv/crypto/Kconfig | 11 +
arch/riscv/crypto/Makefile | 7 +
arch/riscv/crypto/sha256-riscv64-glue.c | 145 ++++++++
.../sha256-riscv64-zvknha_or_zvknhb-zvkb.pl | 318 ++++++++++++++++++
4 files changed, 481 insertions(+)
create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 6863f01a2ab0..d31af9190717 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -44,4 +44,15 @@ config CRYPTO_GHASH_RISCV64
Architecture: riscv64 using:
- Zvkg vector crypto extension

+config CRYPTO_SHA256_RISCV64
+ tristate "Hash functions: SHA-224 and SHA-256"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_SHA256
+ help
+ SHA-224 and SHA-256 secure hash algorithm (FIPS 180)
+
+ Architecture: riscv64 using:
+ - Zvknha or Zvknhb vector crypto extensions
+ - Zvkb vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 94a7f8eaa8a7..e9d7717ec943 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -12,6 +12,9 @@ aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvkned-zvbb-zvk
obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o

+obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
+sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknha_or_zvknhb-zvkb.o
+
quiet_cmd_perlasm = PERLASM $@
cmd_perlasm = $(PERL) $(<) void $(@)

@@ -27,7 +30,11 @@ $(obj)/aes-riscv64-zvkned-zvkb.S: $(src)/aes-riscv64-zvkned-zvkb.pl
$(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
$(call cmd,perlasm)

+$(obj)/sha256-riscv64-zvknha_or_zvknhb-zvkb.S: $(src)/sha256-riscv64-zvknha_or_zvknhb-zvkb.pl
+ $(call cmd,perlasm)
+
clean-files += aes-riscv64-zvkned.S
clean-files += aes-riscv64-zvkned-zvbb-zvkg.S
clean-files += aes-riscv64-zvkned-zvkb.S
clean-files += ghash-riscv64-zvkg.S
+clean-files += sha256-riscv64-zvknha_or_zvknhb-zvkb.S
diff --git a/arch/riscv/crypto/sha256-riscv64-glue.c b/arch/riscv/crypto/sha256-riscv64-glue.c
new file mode 100644
index 000000000000..760d89031d1c
--- /dev/null
+++ b/arch/riscv/crypto/sha256-riscv64-glue.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SHA256 implementation for RISC-V 64
+ *
+ * Copyright (C) 2022 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <linux/linkage.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sha256_base.h>
+
+/*
+ * sha256 using zvkb and zvknha/b vector crypto extension
+ *
+ * This asm function just treats the first 256 bits behind the
+ * `struct sha256_state` pointer as the SHA-256 internal state.
+ */
+asmlinkage void
+sha256_block_data_order_zvkb_zvknha_or_zvknhb(struct sha256_state *digest,
+ const u8 *data, int num_blks);
+
+static int riscv64_sha256_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ int ret = 0;
+
+ /*
+ * Make sure struct sha256_state begins directly with the SHA256
+ * 256-bit internal state, as this is what the asm function expects.
+ */
+ BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ ret = sha256_base_do_update(
+ desc, data, len,
+ sha256_block_data_order_zvkb_zvknha_or_zvknhb);
+ kernel_vector_end();
+ } else {
+ ret = crypto_sha256_update(desc, data, len);
+ }
+
+ return ret;
+}
+
+static int riscv64_sha256_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ if (len)
+ sha256_base_do_update(
+ desc, data, len,
+ sha256_block_data_order_zvkb_zvknha_or_zvknhb);
+ sha256_base_do_finalize(
+ desc, sha256_block_data_order_zvkb_zvknha_or_zvknhb);
+ kernel_vector_end();
+
+ return sha256_base_finish(desc, out);
+ }
+
+ return crypto_sha256_finup(desc, data, len, out);
+}
+
+static int riscv64_sha256_final(struct shash_desc *desc, u8 *out)
+{
+ return riscv64_sha256_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sha256_algs[] = {
+ {
+ .init = sha256_base_init,
+ .update = riscv64_sha256_update,
+ .final = riscv64_sha256_final,
+ .finup = riscv64_sha256_finup,
+ .descsize = sizeof(struct sha256_state),
+ .digestsize = SHA256_DIGEST_SIZE,
+ .base = {
+ .cra_blocksize = SHA256_BLOCK_SIZE,
+ .cra_priority = 150,
+ .cra_name = "sha256",
+ .cra_driver_name = "sha256-riscv64-zvknha_or_zvknhb-zvkb",
+ .cra_module = THIS_MODULE,
+ },
+ }, {
+ .init = sha224_base_init,
+ .update = riscv64_sha256_update,
+ .final = riscv64_sha256_final,
+ .finup = riscv64_sha256_finup,
+ .descsize = sizeof(struct sha256_state),
+ .digestsize = SHA224_DIGEST_SIZE,
+ .base = {
+ .cra_blocksize = SHA224_BLOCK_SIZE,
+ .cra_priority = 150,
+ .cra_name = "sha224",
+ .cra_driver_name = "sha224-riscv64-zvknha_or_zvknhb-zvkb",
+ .cra_module = THIS_MODULE,
+ },
+ },
+};
+
+static inline bool check_sha256_ext(void)
+{
+ /*
+ * From the spec:
+ * The Zvknhb ext supports both SHA-256 and SHA-512 and Zvknha only
+ * supports SHA-256.
+ */
+ return (riscv_isa_extension_available(NULL, ZVKNHA) ||
+ riscv_isa_extension_available(NULL, ZVKNHB)) &&
+ riscv_isa_extension_available(NULL, ZVKB) &&
+ riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_sha256_mod_init(void)
+{
+ if (check_sha256_ext())
+ return crypto_register_shashes(sha256_algs,
+ ARRAY_SIZE(sha256_algs));
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_sha256_mod_fini(void)
+{
+ crypto_unregister_shashes(sha256_algs, ARRAY_SIZE(sha256_algs));
+}
+
+module_init(riscv64_sha256_mod_init);
+module_exit(riscv64_sha256_mod_fini);
+
+MODULE_DESCRIPTION("SHA-256 (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sha224");
+MODULE_ALIAS_CRYPTO("sha256");
diff --git a/arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.pl b/arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.pl
new file mode 100644
index 000000000000..51b2d9d4f8f1
--- /dev/null
+++ b/arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.pl
@@ -0,0 +1,318 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <[email protected]>
+# Copyright (c) 2023, Phoebe Chen <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Vector SHA-2 Secure Hash extension ('Zvknha' or 'Zvknhb')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+ $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+ $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+ $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+my $K256 = "K256";
+
+# Function arguments
+my ($H, $INP, $LEN, $KT, $H2, $INDEX_PATTERN) = ("a0", "a1", "a2", "a3", "t3", "t4");
+
+sub sha_256_load_constant {
+ my $code=<<___;
+ la $KT, $K256 # Load round constants K256
+ @{[vle32_v $V10, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V11, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V12, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V13, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V14, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V15, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V16, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V17, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V18, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V19, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V20, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V21, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V22, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V23, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V24, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V25, $KT]}
+___
+
+ return $code;
+}
+
+################################################################################
+# void sha256_block_data_order_zvkb_zvknha_or_zvknhb(void *c, const void *p, size_t len)
+$code .= <<___;
+.p2align 2
+.globl sha256_block_data_order_zvkb_zvknha_or_zvknhb
+.type sha256_block_data_order_zvkb_zvknha_or_zvknhb,\@function
+sha256_block_data_order_zvkb_zvknha_or_zvknhb:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ @{[sha_256_load_constant]}
+
+ # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c}
+ # The dst vtype is e32m1 and the index vtype is e8mf4.
+ # We use index-load with the following index pattern at v26.
+ # i8 index:
+ # 20, 16, 4, 0
+    # Instead of setting the i8 index, we could use a single 32-bit
+ # little-endian value to cover the 4xi8 index.
+ # i32 value:
+ # 0x 00 04 10 14
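+    # (State byte offsets: 0 -> a, 4 -> b, 16 -> e, 20 -> f, so the indexed
+    #  load below yields {f,e,b,a} in element order 0..3.)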
+ li $INDEX_PATTERN, 0x00041014
+ @{[vsetivli "zero", 1, "e32", "m1", "ta", "ma"]}
+ @{[vmv_v_x $V26, $INDEX_PATTERN]}
+
+ addi $H2, $H, 8
+
+ # Use index-load to get {f,e,b,a},{h,g,d,c}
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vluxei8_v $V6, $H, $V26]}
+ @{[vluxei8_v $V7, $H2, $V26]}
+
+ # Setup v0 mask for the vmerge to replace the first word (idx==0) in key-scheduling.
+    # The AVL is 4 in SHA, so a single e8 element (8 mask bits) is enough
+    # for the mask.
+ @{[vsetivli "zero", 1, "e8", "m1", "ta", "ma"]}
+ @{[vmv_v_i $V0, 0x01]}
+
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+L_round_loop:
+ # Decrement length by 1
+ add $LEN, $LEN, -1
+
+ # Keep the current state as we need it later: H' = H+{a',b',c',...,h'}.
+ @{[vmv_v_v $V30, $V6]}
+ @{[vmv_v_v $V31, $V7]}
+
+    # Load the 512 bits of the message block into v1-v4 and perform
+    # an endian swap on each 4-byte element.
+ @{[vle32_v $V1, $INP]}
+ @{[vrev8_v $V1, $V1]}
+ add $INP, $INP, 16
+ @{[vle32_v $V2, $INP]}
+ @{[vrev8_v $V2, $V2]}
+ add $INP, $INP, 16
+ @{[vle32_v $V3, $INP]}
+ @{[vrev8_v $V3, $V3]}
+ add $INP, $INP, 16
+ @{[vle32_v $V4, $INP]}
+ @{[vrev8_v $V4, $V4]}
+ add $INP, $INP, 16
+
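+    # Each quad-round below performs four of the 64 SHA-256 rounds:
+    # vsha2cl/vsha2ch each do two compression rounds on the staged words in
+    # v5 (message words plus round constants), and vsha2ms produces the next
+    # four message-schedule words.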
+ # Quad-round 0 (+0, Wt from oldest to newest in v1->v2->v3->v4)
+ @{[vadd_vv $V5, $V10, $V1]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V3, $V2, $V0]}
+ @{[vsha2ms_vv $V1, $V5, $V4]} # Generate W[19:16]
+
+ # Quad-round 1 (+1, v2->v3->v4->v1)
+ @{[vadd_vv $V5, $V11, $V2]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V4, $V3, $V0]}
+ @{[vsha2ms_vv $V2, $V5, $V1]} # Generate W[23:20]
+
+ # Quad-round 2 (+2, v3->v4->v1->v2)
+ @{[vadd_vv $V5, $V12, $V3]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V1, $V4, $V0]}
+ @{[vsha2ms_vv $V3, $V5, $V2]} # Generate W[27:24]
+
+ # Quad-round 3 (+3, v4->v1->v2->v3)
+ @{[vadd_vv $V5, $V13, $V4]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V2, $V1, $V0]}
+ @{[vsha2ms_vv $V4, $V5, $V3]} # Generate W[31:28]
+
+ # Quad-round 4 (+0, v1->v2->v3->v4)
+ @{[vadd_vv $V5, $V14, $V1]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V3, $V2, $V0]}
+ @{[vsha2ms_vv $V1, $V5, $V4]} # Generate W[35:32]
+
+ # Quad-round 5 (+1, v2->v3->v4->v1)
+ @{[vadd_vv $V5, $V15, $V2]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V4, $V3, $V0]}
+ @{[vsha2ms_vv $V2, $V5, $V1]} # Generate W[39:36]
+
+ # Quad-round 6 (+2, v3->v4->v1->v2)
+ @{[vadd_vv $V5, $V16, $V3]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V1, $V4, $V0]}
+ @{[vsha2ms_vv $V3, $V5, $V2]} # Generate W[43:40]
+
+ # Quad-round 7 (+3, v4->v1->v2->v3)
+ @{[vadd_vv $V5, $V17, $V4]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V2, $V1, $V0]}
+ @{[vsha2ms_vv $V4, $V5, $V3]} # Generate W[47:44]
+
+ # Quad-round 8 (+0, v1->v2->v3->v4)
+ @{[vadd_vv $V5, $V18, $V1]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V3, $V2, $V0]}
+ @{[vsha2ms_vv $V1, $V5, $V4]} # Generate W[51:48]
+
+ # Quad-round 9 (+1, v2->v3->v4->v1)
+ @{[vadd_vv $V5, $V19, $V2]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V4, $V3, $V0]}
+ @{[vsha2ms_vv $V2, $V5, $V1]} # Generate W[55:52]
+
+ # Quad-round 10 (+2, v3->v4->v1->v2)
+ @{[vadd_vv $V5, $V20, $V3]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V1, $V4, $V0]}
+ @{[vsha2ms_vv $V3, $V5, $V2]} # Generate W[59:56]
+
+ # Quad-round 11 (+3, v4->v1->v2->v3)
+ @{[vadd_vv $V5, $V21, $V4]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V2, $V1, $V0]}
+ @{[vsha2ms_vv $V4, $V5, $V3]} # Generate W[63:60]
+
+ # Quad-round 12 (+0, v1->v2->v3->v4)
+    # Note that we stop generating new message schedule words (Wt, v1-v4)
+ # as we already generated all the words we end up consuming (i.e., W[63:60]).
+ @{[vadd_vv $V5, $V22, $V1]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+
+ # Quad-round 13 (+1, v2->v3->v4->v1)
+ @{[vadd_vv $V5, $V23, $V2]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+
+ # Quad-round 14 (+2, v3->v4->v1->v2)
+ @{[vadd_vv $V5, $V24, $V3]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+
+ # Quad-round 15 (+3, v4->v1->v2->v3)
+ @{[vadd_vv $V5, $V25, $V4]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+
+ # H' = H+{a',b',c',...,h'}
+ @{[vadd_vv $V6, $V30, $V6]}
+ @{[vadd_vv $V7, $V31, $V7]}
+ bnez $LEN, L_round_loop
+
+ # Store {f,e,b,a},{h,g,d,c} back to {a,b,c,d},{e,f,g,h}.
+ @{[vsuxei8_v $V6, $H, $V26]}
+ @{[vsuxei8_v $V7, $H2, $V26]}
+
+ ret
+.size sha256_block_data_order_zvkb_zvknha_or_zvknhb,.-sha256_block_data_order_zvkb_zvknha_or_zvknhb
+
+.p2align 2
+.type $K256,\@object
+$K256:
+ .word 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5
+ .word 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5
+ .word 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3
+ .word 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174
+ .word 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc
+ .word 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da
+ .word 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7
+ .word 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967
+ .word 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13
+ .word 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85
+ .word 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3
+ .word 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070
+ .word 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5
+ .word 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3
+ .word 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208
+ .word 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
+.size $K256,.-$K256
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
--
2.28.0

2023-11-27 07:09:27

by Jerry Shih

[permalink] [raw]
Subject: [PATCH v2 10/13] RISC-V: crypto: add Zvknhb accelerated SHA384/512 implementations

Add SHA-384 and SHA-512 implementations using the Zvknhb vector crypto
extension from OpenSSL (openssl/openssl#21923).

Co-developed-by: Charalampos Mitrodimas <[email protected]>
Signed-off-by: Charalampos Mitrodimas <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Co-developed-by: Phoebe Chen <[email protected]>
Signed-off-by: Phoebe Chen <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
Changelog v2:
- Do not turn on kconfig `SHA512_RISCV64` option by default.
- Add `asmlinkage` qualifier for crypto asm function.
- Rename sha512-riscv64-zvkb-zvknhb to sha512-riscv64-zvknhb-zvkb.
- Reorder structure sha512_algs members initialization in the order
declared.
---
arch/riscv/crypto/Kconfig | 11 +
arch/riscv/crypto/Makefile | 7 +
arch/riscv/crypto/sha512-riscv64-glue.c | 139 +++++++++
.../crypto/sha512-riscv64-zvknhb-zvkb.pl | 266 ++++++++++++++++++
4 files changed, 423 insertions(+)
create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index d31af9190717..ad0b08a13c9a 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -55,4 +55,15 @@ config CRYPTO_SHA256_RISCV64
- Zvknha or Zvknhb vector crypto extensions
- Zvkb vector crypto extension

+config CRYPTO_SHA512_RISCV64
+ tristate "Hash functions: SHA-384 and SHA-512"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_SHA512
+ help
+ SHA-384 and SHA-512 secure hash algorithm (FIPS 180)
+
+ Architecture: riscv64 using:
+ - Zvknhb vector crypto extension
+ - Zvkb vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index e9d7717ec943..8aabef950ad3 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -15,6 +15,9 @@ ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o
obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknha_or_zvknhb-zvkb.o

+obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
+sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvknhb-zvkb.o
+
quiet_cmd_perlasm = PERLASM $@
cmd_perlasm = $(PERL) $(<) void $(@)

@@ -33,8 +36,12 @@ $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
$(obj)/sha256-riscv64-zvknha_or_zvknhb-zvkb.S: $(src)/sha256-riscv64-zvknha_or_zvknhb-zvkb.pl
$(call cmd,perlasm)

+$(obj)/sha512-riscv64-zvknhb-zvkb.S: $(src)/sha512-riscv64-zvknhb-zvkb.pl
+ $(call cmd,perlasm)
+
clean-files += aes-riscv64-zvkned.S
clean-files += aes-riscv64-zvkned-zvbb-zvkg.S
clean-files += aes-riscv64-zvkned-zvkb.S
clean-files += ghash-riscv64-zvkg.S
clean-files += sha256-riscv64-zvknha_or_zvknhb-zvkb.S
+clean-files += sha512-riscv64-zvknhb-zvkb.S
diff --git a/arch/riscv/crypto/sha512-riscv64-glue.c b/arch/riscv/crypto/sha512-riscv64-glue.c
new file mode 100644
index 000000000000..3dd8e1c9d402
--- /dev/null
+++ b/arch/riscv/crypto/sha512-riscv64-glue.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SHA512 implementation for RISC-V 64
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <linux/linkage.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sha512_base.h>
+
+/*
+ * sha512 using zvkb and zvknhb vector crypto extension
+ *
+ * This asm function just treats the first 512 bits behind the
+ * `struct sha512_state` pointer as the SHA-512 internal state.
+ */
+asmlinkage void sha512_block_data_order_zvkb_zvknhb(struct sha512_state *digest,
+ const u8 *data,
+ int num_blks);
+
+static int riscv64_sha512_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ int ret = 0;
+
+ /*
+ * Make sure struct sha512_state begins directly with the SHA512
+ * 512-bit internal state, as this is what the asm function expects.
+ */
+ BUILD_BUG_ON(offsetof(struct sha512_state, state) != 0);
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ ret = sha512_base_do_update(
+ desc, data, len, sha512_block_data_order_zvkb_zvknhb);
+ kernel_vector_end();
+ } else {
+ ret = crypto_sha512_update(desc, data, len);
+ }
+
+ return ret;
+}
+
+static int riscv64_sha512_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ if (len)
+ sha512_base_do_update(
+ desc, data, len,
+ sha512_block_data_order_zvkb_zvknhb);
+ sha512_base_do_finalize(desc,
+ sha512_block_data_order_zvkb_zvknhb);
+ kernel_vector_end();
+
+ return sha512_base_finish(desc, out);
+ }
+
+ return crypto_sha512_finup(desc, data, len, out);
+}
+
+static int riscv64_sha512_final(struct shash_desc *desc, u8 *out)
+{
+ return riscv64_sha512_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sha512_algs[] = {
+ {
+ .init = sha512_base_init,
+ .update = riscv64_sha512_update,
+ .final = riscv64_sha512_final,
+ .finup = riscv64_sha512_finup,
+ .descsize = sizeof(struct sha512_state),
+ .digestsize = SHA512_DIGEST_SIZE,
+ .base = {
+ .cra_blocksize = SHA512_BLOCK_SIZE,
+ .cra_priority = 150,
+ .cra_name = "sha512",
+ .cra_driver_name = "sha512-riscv64-zvknhb-zvkb",
+ .cra_module = THIS_MODULE,
+ },
+ },
+ {
+ .init = sha384_base_init,
+ .update = riscv64_sha512_update,
+ .final = riscv64_sha512_final,
+ .finup = riscv64_sha512_finup,
+ .descsize = sizeof(struct sha512_state),
+ .digestsize = SHA384_DIGEST_SIZE,
+ .base = {
+ .cra_blocksize = SHA384_BLOCK_SIZE,
+ .cra_priority = 150,
+ .cra_name = "sha384",
+ .cra_driver_name = "sha384-riscv64-zvknhb-zvkb",
+ .cra_module = THIS_MODULE,
+ },
+ },
+};
+
+static inline bool check_sha512_ext(void)
+{
+ return riscv_isa_extension_available(NULL, ZVKNHB) &&
+ riscv_isa_extension_available(NULL, ZVKB) &&
+ riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_sha512_mod_init(void)
+{
+ if (check_sha512_ext())
+ return crypto_register_shashes(sha512_algs,
+ ARRAY_SIZE(sha512_algs));
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_sha512_mod_fini(void)
+{
+ crypto_unregister_shashes(sha512_algs, ARRAY_SIZE(sha512_algs));
+}
+
+module_init(riscv64_sha512_mod_init);
+module_exit(riscv64_sha512_mod_fini);
+
+MODULE_DESCRIPTION("SHA-512 (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sha384");
+MODULE_ALIAS_CRYPTO("sha512");
diff --git a/arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.pl b/arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.pl
new file mode 100644
index 000000000000..4be448266a59
--- /dev/null
+++ b/arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.pl
@@ -0,0 +1,266 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <[email protected]>
+# Copyright (c) 2023, Phoebe Chen <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Vector SHA-2 Secure Hash extension ('Zvknhb')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+ $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+ $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+ $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+my $K512 = "K512";
+
+# Function arguments
+my ($H, $INP, $LEN, $KT, $H2, $INDEX_PATTERN) = ("a0", "a1", "a2", "a3", "t3", "t4");
+
+################################################################################
+# void sha512_block_data_order_zvkb_zvknhb(void *c, const void *p, size_t len)
+$code .= <<___;
+.p2align 2
+.globl sha512_block_data_order_zvkb_zvknhb
+.type sha512_block_data_order_zvkb_zvknhb,\@function
+sha512_block_data_order_zvkb_zvknhb:
+ @{[vsetivli "zero", 4, "e64", "m2", "ta", "ma"]}
+
+ # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c}
+ # The dst vtype is e64m2 and the index vtype is e8mf4.
+ # We use index-load with the following index pattern at v1.
+ # i8 index:
+ # 40, 32, 8, 0
+    # Instead of setting the i8 index, we could use a single 32-bit
+ # little-endian value to cover the 4xi8 index.
+ # i32 value:
+ # 0x 00 08 20 28
+ li $INDEX_PATTERN, 0x00082028
+ @{[vsetivli "zero", 1, "e32", "m1", "ta", "ma"]}
+ @{[vmv_v_x $V1, $INDEX_PATTERN]}
+
+ addi $H2, $H, 16
+
+ # Use index-load to get {f,e,b,a},{h,g,d,c}
+ @{[vsetivli "zero", 4, "e64", "m2", "ta", "ma"]}
+ @{[vluxei8_v $V22, $H, $V1]}
+ @{[vluxei8_v $V24, $H2, $V1]}
+
+ # Setup v0 mask for the vmerge to replace the first word (idx==0) in key-scheduling.
+    # The AVL is 4 in SHA, so a single e8 element (8 mask bits) is enough
+    # for the mask.
+ @{[vsetivli "zero", 1, "e8", "m1", "ta", "ma"]}
+ @{[vmv_v_i $V0, 0x01]}
+
+ @{[vsetivli "zero", 4, "e64", "m2", "ta", "ma"]}
+
+L_round_loop:
+ # Load round constants K512
+ la $KT, $K512
+
+ # Decrement length by 1
+ addi $LEN, $LEN, -1
+
+ # Keep the current state as we need it later: H' = H+{a',b',c',...,h'}.
+ @{[vmv_v_v $V26, $V22]}
+ @{[vmv_v_v $V28, $V24]}
+
+    # Load the 1024 bits of the message block into v10-v16 and perform the
+    # endian swap on each 8-byte element.
+ @{[vle64_v $V10, $INP]}
+ @{[vrev8_v $V10, $V10]}
+ addi $INP, $INP, 32
+ @{[vle64_v $V12, $INP]}
+ @{[vrev8_v $V12, $V12]}
+ addi $INP, $INP, 32
+ @{[vle64_v $V14, $INP]}
+ @{[vrev8_v $V14, $V14]}
+ addi $INP, $INP, 32
+ @{[vle64_v $V16, $INP]}
+ @{[vrev8_v $V16, $V16]}
+ addi $INP, $INP, 32
+
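+    # SHA-512 has 80 rounds, i.e. 20 quad-rounds. The .rept block below
+    # covers quad-rounds 0-15 (4 unrolled iterations of 4 quad-rounds each);
+    # quad-rounds 16-19 follow it without further message expansion.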
+ .rept 4
+ # Quad-round 0 (+0, v10->v12->v14->v16)
+ @{[vle64_v $V20, $KT]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V18, $V20, $V10]}
+ @{[vsha2cl_vv $V24, $V22, $V18]}
+ @{[vsha2ch_vv $V22, $V24, $V18]}
+ @{[vmerge_vvm $V18, $V14, $V12, $V0]}
+ @{[vsha2ms_vv $V10, $V18, $V16]}
+
+ # Quad-round 1 (+1, v12->v14->v16->v10)
+ @{[vle64_v $V20, $KT]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V18, $V20, $V12]}
+ @{[vsha2cl_vv $V24, $V22, $V18]}
+ @{[vsha2ch_vv $V22, $V24, $V18]}
+ @{[vmerge_vvm $V18, $V16, $V14, $V0]}
+ @{[vsha2ms_vv $V12, $V18, $V10]}
+
+ # Quad-round 2 (+2, v14->v16->v10->v12)
+ @{[vle64_v $V20, $KT]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V18, $V20, $V14]}
+ @{[vsha2cl_vv $V24, $V22, $V18]}
+ @{[vsha2ch_vv $V22, $V24, $V18]}
+ @{[vmerge_vvm $V18, $V10, $V16, $V0]}
+ @{[vsha2ms_vv $V14, $V18, $V12]}
+
+ # Quad-round 3 (+3, v16->v10->v12->v14)
+ @{[vle64_v $V20, $KT]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V18, $V20, $V16]}
+ @{[vsha2cl_vv $V24, $V22, $V18]}
+ @{[vsha2ch_vv $V22, $V24, $V18]}
+ @{[vmerge_vvm $V18, $V12, $V10, $V0]}
+ @{[vsha2ms_vv $V16, $V18, $V14]}
+ .endr
+
+ # Quad-round 16 (+0, v10->v12->v14->v16)
+ # Note that we stop generating new message schedule words (Wt, v10-16)
+ # as we already generated all the words we end up consuming (i.e., W[79:76]).
+ @{[vle64_v $V20, $KT]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V18, $V20, $V10]}
+ @{[vsha2cl_vv $V24, $V22, $V18]}
+ @{[vsha2ch_vv $V22, $V24, $V18]}
+
+ # Quad-round 17 (+1, v12->v14->v16->v10)
+ @{[vle64_v $V20, $KT]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V18, $V20, $V12]}
+ @{[vsha2cl_vv $V24, $V22, $V18]}
+ @{[vsha2ch_vv $V22, $V24, $V18]}
+
+ # Quad-round 18 (+2, v14->v16->v10->v12)
+ @{[vle64_v $V20, $KT]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V18, $V20, $V14]}
+ @{[vsha2cl_vv $V24, $V22, $V18]}
+ @{[vsha2ch_vv $V22, $V24, $V18]}
+
+ # Quad-round 19 (+3, v16->v10->v12->v14)
+ @{[vle64_v $V20, $KT]}
+    # No constant-pointer increment is needed for the last round constant.
+ @{[vadd_vv $V18, $V20, $V16]}
+ @{[vsha2cl_vv $V24, $V22, $V18]}
+ @{[vsha2ch_vv $V22, $V24, $V18]}
+
+ # H' = H+{a',b',c',...,h'}
+ @{[vadd_vv $V22, $V26, $V22]}
+ @{[vadd_vv $V24, $V28, $V24]}
+ bnez $LEN, L_round_loop
+
+ # Store {f,e,b,a},{h,g,d,c} back to {a,b,c,d},{e,f,g,h}.
+ @{[vsuxei8_v $V22, $H, $V1]}
+ @{[vsuxei8_v $V24, $H2, $V1]}
+
+ ret
+.size sha512_block_data_order_zvkb_zvknhb,.-sha512_block_data_order_zvkb_zvknhb
+
+.p2align 3
+.type $K512,\@object
+$K512:
+ .dword 0x428a2f98d728ae22, 0x7137449123ef65cd
+ .dword 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc
+ .dword 0x3956c25bf348b538, 0x59f111f1b605d019
+ .dword 0x923f82a4af194f9b, 0xab1c5ed5da6d8118
+ .dword 0xd807aa98a3030242, 0x12835b0145706fbe
+ .dword 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2
+ .dword 0x72be5d74f27b896f, 0x80deb1fe3b1696b1
+ .dword 0x9bdc06a725c71235, 0xc19bf174cf692694
+ .dword 0xe49b69c19ef14ad2, 0xefbe4786384f25e3
+ .dword 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65
+ .dword 0x2de92c6f592b0275, 0x4a7484aa6ea6e483
+ .dword 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5
+ .dword 0x983e5152ee66dfab, 0xa831c66d2db43210
+ .dword 0xb00327c898fb213f, 0xbf597fc7beef0ee4
+ .dword 0xc6e00bf33da88fc2, 0xd5a79147930aa725
+ .dword 0x06ca6351e003826f, 0x142929670a0e6e70
+ .dword 0x27b70a8546d22ffc, 0x2e1b21385c26c926
+ .dword 0x4d2c6dfc5ac42aed, 0x53380d139d95b3df
+ .dword 0x650a73548baf63de, 0x766a0abb3c77b2a8
+ .dword 0x81c2c92e47edaee6, 0x92722c851482353b
+ .dword 0xa2bfe8a14cf10364, 0xa81a664bbc423001
+ .dword 0xc24b8b70d0f89791, 0xc76c51a30654be30
+ .dword 0xd192e819d6ef5218, 0xd69906245565a910
+ .dword 0xf40e35855771202a, 0x106aa07032bbd1b8
+ .dword 0x19a4c116b8d2d0c8, 0x1e376c085141ab53
+ .dword 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8
+ .dword 0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb
+ .dword 0x5b9cca4f7763e373, 0x682e6ff3d6b2b8a3
+ .dword 0x748f82ee5defb2fc, 0x78a5636f43172f60
+ .dword 0x84c87814a1f0ab72, 0x8cc702081a6439ec
+ .dword 0x90befffa23631e28, 0xa4506cebde82bde9
+ .dword 0xbef9a3f7b2c67915, 0xc67178f2e372532b
+ .dword 0xca273eceea26619c, 0xd186b8c721c0c207
+ .dword 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178
+ .dword 0x06f067aa72176fba, 0x0a637dc5a2c898a6
+ .dword 0x113f9804bef90dae, 0x1b710b35131c471b
+ .dword 0x28db77f523047d84, 0x32caab7b40c72493
+ .dword 0x3c9ebe0a15c9bebc, 0x431d67c49c100d4c
+ .dword 0x4cc5d4becb3e42b6, 0x597f299cfc657e2a
+ .dword 0x5fcb6fab3ad6faec, 0x6c44198c4a475817
+.size $K512,.-$K512
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
--
2.28.0

2023-11-27 07:09:41

by Jerry Shih

[permalink] [raw]
Subject: [PATCH v2 11/13] RISC-V: crypto: add Zvksed accelerated SM4 implementation

Add an SM4 implementation using the Zvksed vector crypto extension from
OpenSSL (openssl/openssl#21923).

The perlasm here differs from the original OpenSSL implementation. In
OpenSSL, SM4 has separate set_encrypt_key and set_decrypt_key functions.
In the kernel, these set_key functions are merged into a single one in
order to skip the redundant key-expansion work, as sketched below.
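
A rough sketch of the difference (the OpenSSL-side prototypes are only
illustrative; the merged prototype is the one this patch actually adds):

  /* OpenSSL: two separate key-schedule entry points (illustrative). */
  int sm4_set_encrypt_key(const u8 *user_key, u32 *enc_key);
  int sm4_set_decrypt_key(const u8 *user_key, u32 *dec_key);

  /* Kernel: one call expands the key once and emits both schedules. */
  int rv64i_zvksed_sm4_set_key(const u8 *user_key, unsigned int key_len,
                               u32 *enc_key, u32 *dec_key);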

Co-developed-by: Christoph Müllner <[email protected]>
Signed-off-by: Christoph Müllner <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
Changelog v2:
- Do not turn on kconfig `SM4_RISCV64` option by default.
- Add the missed `static` declaration for riscv64_sm4_zvksed_alg.
- Add `asmlinkage` qualifier for crypto asm function.
- Rename sm4-riscv64-zvkb-zvksed to sm4-riscv64-zvksed-zvkb.
- Reorder structure riscv64_sm4_zvksed_zvkb_alg members initialization
in the order declared.
---
arch/riscv/crypto/Kconfig | 17 ++
arch/riscv/crypto/Makefile | 7 +
arch/riscv/crypto/sm4-riscv64-glue.c | 121 +++++++++++
arch/riscv/crypto/sm4-riscv64-zvksed.pl | 268 ++++++++++++++++++++++++
4 files changed, 413 insertions(+)
create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index ad0b08a13c9a..b28cf1972250 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -66,4 +66,21 @@ config CRYPTO_SHA512_RISCV64
- Zvknhb vector crypto extension
- Zvkb vector crypto extension

+config CRYPTO_SM4_RISCV64
+ tristate "Ciphers: SM4 (ShangMi 4)"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_ALGAPI
+ select CRYPTO_SM4
+ help
+ SM4 cipher algorithms (OSCCA GB/T 32907-2016,
+ ISO/IEC 18033-3:2010/Amd 1:2021)
+
+ SM4 (GBT.32907-2016) is a cryptographic standard issued by the
+ Organization of State Commercial Administration of China (OSCCA)
+	  as an authorized cryptographic algorithm for use within China.
+
+ Architecture: riscv64 using:
+ - Zvksed vector crypto extension
+ - Zvkb vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 8aabef950ad3..8e34861bba34 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -18,6 +18,9 @@ sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknha_or_zvknhb-zvkb.o
obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvknhb-zvkb.o

+obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o
+sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed.o
+
quiet_cmd_perlasm = PERLASM $@
cmd_perlasm = $(PERL) $(<) void $(@)

@@ -39,9 +42,13 @@ $(obj)/sha256-riscv64-zvknha_or_zvknhb-zvkb.S: $(src)/sha256-riscv64-zvknha_or_z
$(obj)/sha512-riscv64-zvknhb-zvkb.S: $(src)/sha512-riscv64-zvknhb-zvkb.pl
$(call cmd,perlasm)

+$(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl
+ $(call cmd,perlasm)
+
clean-files += aes-riscv64-zvkned.S
clean-files += aes-riscv64-zvkned-zvbb-zvkg.S
clean-files += aes-riscv64-zvkned-zvkb.S
clean-files += ghash-riscv64-zvkg.S
clean-files += sha256-riscv64-zvknha_or_zvknhb-zvkb.S
clean-files += sha512-riscv64-zvknhb-zvkb.S
+clean-files += sm4-riscv64-zvksed.S
diff --git a/arch/riscv/crypto/sm4-riscv64-glue.c b/arch/riscv/crypto/sm4-riscv64-glue.c
new file mode 100644
index 000000000000..9d9d24b67ee3
--- /dev/null
+++ b/arch/riscv/crypto/sm4-riscv64-glue.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Linux/riscv64 port of the OpenSSL SM4 implementation for RISC-V 64
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/sm4.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/simd.h>
+#include <linux/crypto.h>
+#include <linux/delay.h>
+#include <linux/err.h>
+#include <linux/linkage.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+/* sm4 using zvksed vector crypto extension */
+asmlinkage void rv64i_zvksed_sm4_encrypt(const u8 *in, u8 *out, const u32 *key);
+asmlinkage void rv64i_zvksed_sm4_decrypt(const u8 *in, u8 *out, const u32 *key);
+asmlinkage int rv64i_zvksed_sm4_set_key(const u8 *user_key,
+ unsigned int key_len, u32 *enc_key,
+ u32 *dec_key);
+
+static int riscv64_sm4_setkey_zvksed(struct crypto_tfm *tfm, const u8 *key,
+ unsigned int key_len)
+{
+ struct sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+ int ret = 0;
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ if (rv64i_zvksed_sm4_set_key(key, key_len, ctx->rkey_enc,
+ ctx->rkey_dec))
+ ret = -EINVAL;
+ kernel_vector_end();
+ } else {
+ ret = sm4_expandkey(ctx, key, key_len);
+ }
+
+ return ret;
+}
+
+static void riscv64_sm4_encrypt_zvksed(struct crypto_tfm *tfm, u8 *dst,
+ const u8 *src)
+{
+ const struct sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ rv64i_zvksed_sm4_encrypt(src, dst, ctx->rkey_enc);
+ kernel_vector_end();
+ } else {
+ sm4_crypt_block(ctx->rkey_enc, dst, src);
+ }
+}
+
+static void riscv64_sm4_decrypt_zvksed(struct crypto_tfm *tfm, u8 *dst,
+ const u8 *src)
+{
+ const struct sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ rv64i_zvksed_sm4_decrypt(src, dst, ctx->rkey_dec);
+ kernel_vector_end();
+ } else {
+ sm4_crypt_block(ctx->rkey_dec, dst, src);
+ }
+}
+
+static struct crypto_alg riscv64_sm4_zvksed_zvkb_alg = {
+ .cra_flags = CRYPTO_ALG_TYPE_CIPHER,
+ .cra_blocksize = SM4_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct sm4_ctx),
+ .cra_priority = 300,
+ .cra_name = "sm4",
+ .cra_driver_name = "sm4-riscv64-zvksed-zvkb",
+ .cra_cipher = {
+ .cia_min_keysize = SM4_KEY_SIZE,
+ .cia_max_keysize = SM4_KEY_SIZE,
+ .cia_setkey = riscv64_sm4_setkey_zvksed,
+ .cia_encrypt = riscv64_sm4_encrypt_zvksed,
+ .cia_decrypt = riscv64_sm4_decrypt_zvksed,
+ },
+ .cra_module = THIS_MODULE,
+};
+
+static inline bool check_sm4_ext(void)
+{
+ return riscv_isa_extension_available(NULL, ZVKSED) &&
+ riscv_isa_extension_available(NULL, ZVKB) &&
+ riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_sm4_mod_init(void)
+{
+ if (check_sm4_ext())
+ return crypto_register_alg(&riscv64_sm4_zvksed_zvkb_alg);
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_sm4_mod_fini(void)
+{
+ crypto_unregister_alg(&riscv64_sm4_zvksed_zvkb_alg);
+}
+
+module_init(riscv64_sm4_mod_init);
+module_exit(riscv64_sm4_mod_fini);
+
+MODULE_DESCRIPTION("SM4 (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sm4");
diff --git a/arch/riscv/crypto/sm4-riscv64-zvksed.pl b/arch/riscv/crypto/sm4-riscv64-zvksed.pl
new file mode 100644
index 000000000000..dab1d026a360
--- /dev/null
+++ b/arch/riscv/crypto/sm4-riscv64-zvksed.pl
@@ -0,0 +1,268 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <[email protected]>
+# Copyright (c) 2023, Jerry Shih <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Vector SM4 Block Cipher extension ('Zvksed')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+####
+# int rv64i_zvksed_sm4_set_key(const u8 *user_key, unsigned int key_len,
+# u32 *enc_key, u32 *dec_key);
+#
+{
+my ($ukey,$key_len,$enc_key,$dec_key)=("a0","a1","a2","a3");
+my ($fk,$stride)=("a4","a5");
+my ($t0,$t1)=("t0","t1");
+my ($vukey,$vfk,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_set_key
+.type rv64i_zvksed_sm4_set_key,\@function
+rv64i_zvksed_sm4_set_key:
+ li $t0, 16
+ beq $t0, $key_len, 1f
+ li a0, 1
+ ret
+1:
+
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ # Load the user key
+ @{[vle32_v $vukey, $ukey]}
+ @{[vrev8_v $vukey, $vukey]}
+
+ # Load the FK.
+ la $fk, FK
+ @{[vle32_v $vfk, $fk]}
+
+ # Generate round keys.
+ @{[vxor_vv $vukey, $vukey, $vfk]}
+ @{[vsm4k_vi $vk0, $vukey, 0]} # rk[0:3]
+ @{[vsm4k_vi $vk1, $vk0, 1]} # rk[4:7]
+ @{[vsm4k_vi $vk2, $vk1, 2]} # rk[8:11]
+ @{[vsm4k_vi $vk3, $vk2, 3]} # rk[12:15]
+ @{[vsm4k_vi $vk4, $vk3, 4]} # rk[16:19]
+ @{[vsm4k_vi $vk5, $vk4, 5]} # rk[20:23]
+ @{[vsm4k_vi $vk6, $vk5, 6]} # rk[24:27]
+ @{[vsm4k_vi $vk7, $vk6, 7]} # rk[28:31]
+
+ # Store enc round keys
+ @{[vse32_v $vk0, $enc_key]} # rk[0:3]
+ addi $enc_key, $enc_key, 16
+ @{[vse32_v $vk1, $enc_key]} # rk[4:7]
+ addi $enc_key, $enc_key, 16
+ @{[vse32_v $vk2, $enc_key]} # rk[8:11]
+ addi $enc_key, $enc_key, 16
+ @{[vse32_v $vk3, $enc_key]} # rk[12:15]
+ addi $enc_key, $enc_key, 16
+ @{[vse32_v $vk4, $enc_key]} # rk[16:19]
+ addi $enc_key, $enc_key, 16
+ @{[vse32_v $vk5, $enc_key]} # rk[20:23]
+ addi $enc_key, $enc_key, 16
+ @{[vse32_v $vk6, $enc_key]} # rk[24:27]
+ addi $enc_key, $enc_key, 16
+ @{[vse32_v $vk7, $enc_key]} # rk[28:31]
+
+ # Store dec round keys in reverse order
+ addi $dec_key, $dec_key, 12
+ li $stride, -4
+ @{[vsse32_v $vk7, $dec_key, $stride]} # rk[31:28]
+ addi $dec_key, $dec_key, 16
+ @{[vsse32_v $vk6, $dec_key, $stride]} # rk[27:24]
+ addi $dec_key, $dec_key, 16
+ @{[vsse32_v $vk5, $dec_key, $stride]} # rk[23:20]
+ addi $dec_key, $dec_key, 16
+ @{[vsse32_v $vk4, $dec_key, $stride]} # rk[19:16]
+ addi $dec_key, $dec_key, 16
+ @{[vsse32_v $vk3, $dec_key, $stride]} # rk[15:12]
+ addi $dec_key, $dec_key, 16
+ @{[vsse32_v $vk2, $dec_key, $stride]} # rk[11:8]
+ addi $dec_key, $dec_key, 16
+ @{[vsse32_v $vk1, $dec_key, $stride]} # rk[7:4]
+ addi $dec_key, $dec_key, 16
+ @{[vsse32_v $vk0, $dec_key, $stride]} # rk[3:0]
+
+ li a0, 0
+ ret
+.size rv64i_zvksed_sm4_set_key,.-rv64i_zvksed_sm4_set_key
+___
+}
+
+####
+# void rv64i_zvksed_sm4_encrypt(const unsigned char *in, unsigned char *out,
+# const SM4_KEY *key);
+#
+{
+my ($in,$out,$keys,$stride)=("a0","a1","a2","t0");
+my ($vdata,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7,$vgen)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_encrypt
+.type rv64i_zvksed_sm4_encrypt,\@function
+rv64i_zvksed_sm4_encrypt:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ # Load input data
+ @{[vle32_v $vdata, $in]}
+ @{[vrev8_v $vdata, $vdata]}
+
+ # Order of elements was adjusted in sm4_set_key()
+ # Encrypt with all keys
+ @{[vle32_v $vk0, $keys]} # rk[0:3]
+ @{[vsm4r_vs $vdata, $vk0]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk1, $keys]} # rk[4:7]
+ @{[vsm4r_vs $vdata, $vk1]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk2, $keys]} # rk[8:11]
+ @{[vsm4r_vs $vdata, $vk2]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk3, $keys]} # rk[12:15]
+ @{[vsm4r_vs $vdata, $vk3]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk4, $keys]} # rk[16:19]
+ @{[vsm4r_vs $vdata, $vk4]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk5, $keys]} # rk[20:23]
+ @{[vsm4r_vs $vdata, $vk5]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk6, $keys]} # rk[24:27]
+ @{[vsm4r_vs $vdata, $vk6]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk7, $keys]} # rk[28:31]
+ @{[vsm4r_vs $vdata, $vk7]}
+
+ # Save the ciphertext (in reverse element order)
+ @{[vrev8_v $vdata, $vdata]}
+ li $stride, -4
+ addi $out, $out, 12
+ @{[vsse32_v $vdata, $out, $stride]}
+
+ ret
+.size rv64i_zvksed_sm4_encrypt,.-rv64i_zvksed_sm4_encrypt
+___
+}
+
+####
+# void rv64i_zvksed_sm4_decrypt(const unsigned char *in, unsigned char *out,
+# const SM4_KEY *key);
+#
+{
+my ($in,$out,$keys,$stride)=("a0","a1","a2","t0");
+my ($vdata,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7,$vgen)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_decrypt
+.type rv64i_zvksed_sm4_decrypt,\@function
+rv64i_zvksed_sm4_decrypt:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ # Load input data
+ @{[vle32_v $vdata, $in]}
+ @{[vrev8_v $vdata, $vdata]}
+
+ # Order of key elements was adjusted in sm4_set_key()
+ # Decrypt with all keys
+ @{[vle32_v $vk7, $keys]} # rk[31:28]
+ @{[vsm4r_vs $vdata, $vk7]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk6, $keys]} # rk[27:24]
+ @{[vsm4r_vs $vdata, $vk6]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk5, $keys]} # rk[23:20]
+ @{[vsm4r_vs $vdata, $vk5]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk4, $keys]} # rk[19:16]
+ @{[vsm4r_vs $vdata, $vk4]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk3, $keys]} # rk[15:12]
+ @{[vsm4r_vs $vdata, $vk3]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk2, $keys]} # rk[11:8]
+ @{[vsm4r_vs $vdata, $vk2]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk1, $keys]} # rk[7:4]
+ @{[vsm4r_vs $vdata, $vk1]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk0, $keys]} # rk[3:0]
+ @{[vsm4r_vs $vdata, $vk0]}
+
+ # Save the plaintext (in reverse element order)
+ @{[vrev8_v $vdata, $vdata]}
+ li $stride, -4
+ addi $out, $out, 12
+ @{[vsse32_v $vdata, $out, $stride]}
+
+ ret
+.size rv64i_zvksed_sm4_decrypt,.-rv64i_zvksed_sm4_decrypt
+___
+}
+
+$code .= <<___;
+# Family Key (little-endian 32-bit chunks)
+.p2align 3
+FK:
+ .word 0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC
+.size FK,.-FK
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
--
2.28.0

2023-11-27 07:09:46

by Jerry Shih

[permalink] [raw]
Subject: [PATCH v2 12/13] RISC-V: crypto: add Zvksh accelerated SM3 implementation

Add an SM3 implementation using the Zvksh vector crypto extension from
OpenSSL (openssl/openssl#21923).

Co-developed-by: Christoph Müllner <[email protected]>
Signed-off-by: Christoph Müllner <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
Changelog v2:
- Do not turn on kconfig `SM3_RISCV64` option by default.
- Add `asmlinkage` qualifier for crypto asm function.
- Rename sm3-riscv64-zvkb-zvksh to sm3-riscv64-zvksh-zvkb.
- Reorder structure sm3_alg members initialization in the order declared.
---
arch/riscv/crypto/Kconfig | 12 ++
arch/riscv/crypto/Makefile | 7 +
arch/riscv/crypto/sm3-riscv64-glue.c | 124 +++++++++++++
arch/riscv/crypto/sm3-riscv64-zvksh.pl | 230 +++++++++++++++++++++++++
4 files changed, 373 insertions(+)
create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index b28cf1972250..7415fb303785 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -66,6 +66,18 @@ config CRYPTO_SHA512_RISCV64
- Zvknhb vector crypto extension
- Zvkb vector crypto extension

+config CRYPTO_SM3_RISCV64
+ tristate "Hash functions: SM3 (ShangMi 3)"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_HASH
+ select CRYPTO_SM3
+ help
+ SM3 (ShangMi 3) secure hash function (OSCCA GM/T 0004-2012)
+
+ Architecture: riscv64 using:
+ - Zvksh vector crypto extension
+ - Zvkb vector crypto extension
+
config CRYPTO_SM4_RISCV64
tristate "Ciphers: SM4 (ShangMi 4)"
depends on 64BIT && RISCV_ISA_V
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 8e34861bba34..b1f857695c1c 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -18,6 +18,9 @@ sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknha_or_zvknhb-zvkb.o
obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvknhb-zvkb.o

+obj-$(CONFIG_CRYPTO_SM3_RISCV64) += sm3-riscv64.o
+sm3-riscv64-y := sm3-riscv64-glue.o sm3-riscv64-zvksh.o
+
obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o
sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed.o

@@ -42,6 +45,9 @@ $(obj)/sha256-riscv64-zvknha_or_zvknhb-zvkb.S: $(src)/sha256-riscv64-zvknha_or_z
$(obj)/sha512-riscv64-zvknhb-zvkb.S: $(src)/sha512-riscv64-zvknhb-zvkb.pl
$(call cmd,perlasm)

+$(obj)/sm3-riscv64-zvksh.S: $(src)/sm3-riscv64-zvksh.pl
+ $(call cmd,perlasm)
+
$(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl
$(call cmd,perlasm)

@@ -51,4 +57,5 @@ clean-files += aes-riscv64-zvkned-zvkb.S
clean-files += ghash-riscv64-zvkg.S
clean-files += sha256-riscv64-zvknha_or_zvknhb-zvkb.S
clean-files += sha512-riscv64-zvknhb-zvkb.S
+clean-files += sm3-riscv64-zvksh.S
clean-files += sm4-riscv64-zvksed.S
diff --git a/arch/riscv/crypto/sm3-riscv64-glue.c b/arch/riscv/crypto/sm3-riscv64-glue.c
new file mode 100644
index 000000000000..63c7af338877
--- /dev/null
+++ b/arch/riscv/crypto/sm3-riscv64-glue.c
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SM3 implementation for RISC-V 64
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <linux/linkage.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sm3_base.h>
+
+/*
+ * sm3 using zvksh vector crypto extension
+ *
+ * This asm function takes just the first 256 bits at the `struct sm3_state`
+ * pointer as the SM3 state.
+ */
+asmlinkage void ossl_hwsm3_block_data_order_zvksh(struct sm3_state *digest,
+ u8 const *o, int num);
+
+static int riscv64_sm3_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ int ret = 0;
+
+ /*
+ * Make sure struct sm3_state begins directly with the SM3 256-bit internal
+ * state, as this is what the asm function expects.
+ */
+ BUILD_BUG_ON(offsetof(struct sm3_state, state) != 0);
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ ret = sm3_base_do_update(desc, data, len,
+ ossl_hwsm3_block_data_order_zvksh);
+ kernel_vector_end();
+ } else {
+ sm3_update(shash_desc_ctx(desc), data, len);
+ }
+
+ return ret;
+}
+
+static int riscv64_sm3_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ struct sm3_state *ctx;
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ if (len)
+ sm3_base_do_update(desc, data, len,
+ ossl_hwsm3_block_data_order_zvksh);
+ sm3_base_do_finalize(desc, ossl_hwsm3_block_data_order_zvksh);
+ kernel_vector_end();
+
+ return sm3_base_finish(desc, out);
+ }
+
+ ctx = shash_desc_ctx(desc);
+ if (len)
+ sm3_update(ctx, data, len);
+ sm3_final(ctx, out);
+
+ return 0;
+}
+
+static int riscv64_sm3_final(struct shash_desc *desc, u8 *out)
+{
+ return riscv64_sm3_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sm3_alg = {
+ .init = sm3_base_init,
+ .update = riscv64_sm3_update,
+ .final = riscv64_sm3_final,
+ .finup = riscv64_sm3_finup,
+ .descsize = sizeof(struct sm3_state),
+ .digestsize = SM3_DIGEST_SIZE,
+ .base = {
+ .cra_blocksize = SM3_BLOCK_SIZE,
+ .cra_priority = 150,
+ .cra_name = "sm3",
+ .cra_driver_name = "sm3-riscv64-zvksh-zvkb",
+ .cra_module = THIS_MODULE,
+ },
+};
+
+static inline bool check_sm3_ext(void)
+{
+ return riscv_isa_extension_available(NULL, ZVKSH) &&
+ riscv_isa_extension_available(NULL, ZVKB) &&
+ riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_sm3_mod_init(void)
+{
+ if (check_sm3_ext())
+ return crypto_register_shash(&sm3_alg);
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_sm3_mod_fini(void)
+{
+ crypto_unregister_shash(&sm3_alg);
+}
+
+module_init(riscv64_sm3_mod_init);
+module_exit(riscv64_sm3_mod_fini);
+
+MODULE_DESCRIPTION("SM3 (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sm3");
diff --git a/arch/riscv/crypto/sm3-riscv64-zvksh.pl b/arch/riscv/crypto/sm3-riscv64-zvksh.pl
new file mode 100644
index 000000000000..942d78d982e9
--- /dev/null
+++ b/arch/riscv/crypto/sm3-riscv64-zvksh.pl
@@ -0,0 +1,230 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <[email protected]>
+# Copyright (c) 2023, Jerry Shih <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Vector SM3 Secure Hash extension ('Zvksh')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+################################################################################
+# ossl_hwsm3_block_data_order_zvksh(SM3_CTX *c, const void *p, size_t num);
+{
+my ($CTX, $INPUT, $NUM) = ("a0", "a1", "a2");
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+ $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+ $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+ $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+$code .= <<___;
+.text
+.p2align 3
+.globl ossl_hwsm3_block_data_order_zvksh
+.type ossl_hwsm3_block_data_order_zvksh,\@function
+ossl_hwsm3_block_data_order_zvksh:
+ @{[vsetivli "zero", 8, "e32", "m2", "ta", "ma"]}
+
+ # Load initial state of hash context (c->A-H).
+ @{[vle32_v $V0, $CTX]}
+ @{[vrev8_v $V0, $V0]}
+
+L_sm3_loop:
+ # Copy the previous state to v2.
+ # It will be XOR'ed with the current state at the end of the round.
+ @{[vmv_v_v $V2, $V0]}
+
+ # Load the 64B block in 2x32B chunks.
+ @{[vle32_v $V6, $INPUT]} # v6 := {w7, ..., w0}
+ addi $INPUT, $INPUT, 32
+
+ @{[vle32_v $V8, $INPUT]} # v8 := {w15, ..., w8}
+ addi $INPUT, $INPUT, 32
+
+ addi $NUM, $NUM, -1
+
+ # As vsm3c consumes only w0, w1, w4, w5 we need to slide the input
+ # 2 elements down so we process elements w2, w3, w6, w7
+ # This will be repeated for each odd round.
+ @{[vslidedown_vi $V4, $V6, 2]} # v4 := {X, X, w7, ..., w2}
+
+ @{[vsm3c_vi $V0, $V6, 0]}
+ @{[vsm3c_vi $V0, $V4, 1]}
+
+ # Prepare a vector with {w11, ..., w4}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w7, ..., w4}
+ @{[vslideup_vi $V4, $V8, 4]} # v4 := {w11, w10, w9, w8, w7, w6, w5, w4}
+
+ @{[vsm3c_vi $V0, $V4, 2]}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w11, w10, w9, w8, w7, w6}
+ @{[vsm3c_vi $V0, $V4, 3]}
+
+ @{[vsm3c_vi $V0, $V8, 4]}
+ @{[vslidedown_vi $V4, $V8, 2]} # v4 := {X, X, w15, w14, w13, w12, w11, w10}
+ @{[vsm3c_vi $V0, $V4, 5]}
+
+ @{[vsm3me_vv $V6, $V8, $V6]} # v6 := {w23, w22, w21, w20, w19, w18, w17, w16}
+
+ # Prepare a register with {w19, w18, w17, w16, w15, w14, w13, w12}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w15, w14, w13, w12}
+ @{[vslideup_vi $V4, $V6, 4]} # v4 := {w19, w18, w17, w16, w15, w14, w13, w12}
+
+ @{[vsm3c_vi $V0, $V4, 6]}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w19, w18, w17, w16, w15, w14}
+ @{[vsm3c_vi $V0, $V4, 7]}
+
+ @{[vsm3c_vi $V0, $V6, 8]}
+ @{[vslidedown_vi $V4, $V6, 2]} # v4 := {X, X, w23, w22, w21, w20, w19, w18}
+ @{[vsm3c_vi $V0, $V4, 9]}
+
+ @{[vsm3me_vv $V8, $V6, $V8]} # v8 := {w31, w30, w29, w28, w27, w26, w25, w24}
+
+ # Prepare a register with {w27, w26, w25, w24, w23, w22, w21, w20}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w23, w22, w21, w20}
+ @{[vslideup_vi $V4, $V8, 4]} # v4 := {w27, w26, w25, w24, w23, w22, w21, w20}
+
+ @{[vsm3c_vi $V0, $V4, 10]}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w27, w26, w25, w24, w23, w22}
+ @{[vsm3c_vi $V0, $V4, 11]}
+
+ @{[vsm3c_vi $V0, $V8, 12]}
+ @{[vslidedown_vi $V4, $V8, 2]} # v4 := {X, X, w31, w30, w29, w28, w27, w26}
+ @{[vsm3c_vi $V0, $V4, 13]}
+
+ @{[vsm3me_vv $V6, $V8, $V6]} # v6 := {w39, w38, w37, w36, w35, w34, w33, w32}
+
+ # Prepare a register with {w35, w34, w33, w32, w31, w30, w29, w28}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w31, w30, w29, w28}
+ @{[vslideup_vi $V4, $V6, 4]} # v4 := {w35, w34, w33, w32, w31, w30, w29, w28}
+
+ @{[vsm3c_vi $V0, $V4, 14]}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w35, w34, w33, w32, w31, w30}
+ @{[vsm3c_vi $V0, $V4, 15]}
+
+ @{[vsm3c_vi $V0, $V6, 16]}
+ @{[vslidedown_vi $V4, $V6, 2]} # v4 := {X, X, w39, w38, w37, w36, w35, w34}
+ @{[vsm3c_vi $V0, $V4, 17]}
+
+ @{[vsm3me_vv $V8, $V6, $V8]} # v8 := {w47, w46, w45, w44, w43, w42, w41, w40}
+
+ # Prepare a register with {w43, w42, w41, w40, w39, w38, w37, w36}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w39, w38, w37, w36}
+ @{[vslideup_vi $V4, $V8, 4]} # v4 := {w43, w42, w41, w40, w39, w38, w37, w36}
+
+ @{[vsm3c_vi $V0, $V4, 18]}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w43, w42, w41, w40, w39, w38}
+ @{[vsm3c_vi $V0, $V4, 19]}
+
+ @{[vsm3c_vi $V0, $V8, 20]}
+ @{[vslidedown_vi $V4, $V8, 2]} # v4 := {X, X, w47, w46, w45, w44, w43, w42}
+ @{[vsm3c_vi $V0, $V4, 21]}
+
+ @{[vsm3me_vv $V6, $V8, $V6]} # v6 := {w55, w54, w53, w52, w51, w50, w49, w48}
+
+ # Prepare a register with {w51, w50, w49, w48, w47, w46, w45, w44}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w47, w46, w45, w44}
+ @{[vslideup_vi $V4, $V6, 4]} # v4 := {w51, w50, w49, w48, w47, w46, w45, w44}
+
+ @{[vsm3c_vi $V0, $V4, 22]}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w51, w50, w49, w48, w47, w46}
+ @{[vsm3c_vi $V0, $V4, 23]}
+
+ @{[vsm3c_vi $V0, $V6, 24]}
+ @{[vslidedown_vi $V4, $V6, 2]} # v4 := {X, X, w55, w54, w53, w52, w51, w50}
+ @{[vsm3c_vi $V0, $V4, 25]}
+
+ @{[vsm3me_vv $V8, $V6, $V8]} # v8 := {w63, w62, w61, w60, w59, w58, w57, w56}
+
+ # Prepare a register with {w59, w58, w57, w56, w55, w54, w53, w52}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w55, w54, w53, w52}
+ @{[vslideup_vi $V4, $V8, 4]} # v4 := {w59, w58, w57, w56, w55, w54, w53, w52}
+
+ @{[vsm3c_vi $V0, $V4, 26]}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w59, w58, w57, w56, w55, w54}
+ @{[vsm3c_vi $V0, $V4, 27]}
+
+ @{[vsm3c_vi $V0, $V8, 28]}
+ @{[vslidedown_vi $V4, $V8, 2]} # v4 := {X, X, w63, w62, w61, w60, w59, w58}
+ @{[vsm3c_vi $V0, $V4, 29]}
+
+ @{[vsm3me_vv $V6, $V8, $V6]} # v6 := {w71, w70, w69, w68, w67, w66, w65, w64}
+
+ # Prepare a register with {w67, w66, w65, w64, w63, w62, w61, w60}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w63, w62, w61, w60}
+ @{[vslideup_vi $V4, $V6, 4]} # v4 := {w67, w66, w65, w64, w63, w62, w61, w60}
+
+ @{[vsm3c_vi $V0, $V4, 30]}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w67, w66, w65, w64, w63, w62}
+ @{[vsm3c_vi $V0, $V4, 31]}
+
+ # XOR in the previous state.
+ @{[vxor_vv $V0, $V0, $V2]}
+
+ bnez $NUM, L_sm3_loop # Check if there are any more blocks to process
+L_sm3_end:
+ @{[vrev8_v $V0, $V0]}
+ @{[vse32_v $V0, $CTX]}
+ ret
+
+.size ossl_hwsm3_block_data_order_zvksh,.-ossl_hwsm3_block_data_order_zvksh
+___
+}
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
--
2.28.0

2023-11-27 07:09:53

by Jerry Shih

[permalink] [raw]
Subject: [PATCH v2 08/13] RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation

Add a GCM GHASH implementation using the Zvkg vector crypto extension from
OpenSSL (openssl/openssl#21923).

The perlasm here is different from the original implementation in OpenSSL.
OpenSSL assumes that H is stored in little-endian form, so it has to convert
H to big-endian before using the Zvkg instructions. In the kernel, H is
already big-endian, so no endian conversion is needed.
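
For reference, both the Zvkg path and the generic fallback fold each 16-byte
block into the running digest with the standard GHASH recurrence, where H and
the X_i are 128-bit big-endian field elements:

    X_i = (X_{i-1} \oplus \mathrm{inp}_i) \cdot H \qquad \text{in } \mathrm{GF}(2^{128})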

Co-developed-by: Christoph Müllner <[email protected]>
Signed-off-by: Christoph Müllner <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
Changelog v2:
- Do not turn on kconfig `GHASH_RISCV64` option by default.
- Add `asmlinkage` qualifier for crypto asm function.
- Update the ghash fallback path in ghash_blocks().
- Rename structure riscv64_ghash_context to riscv64_ghash_tfm_ctx.
- Fold ghash_update_zvkg() and ghash_final_zvkg().
- Reorder structure riscv64_ghash_alg_zvkg members initialization in the
order declared.
---
arch/riscv/crypto/Kconfig | 10 ++
arch/riscv/crypto/Makefile | 7 +
arch/riscv/crypto/ghash-riscv64-glue.c | 175 ++++++++++++++++++++++++
arch/riscv/crypto/ghash-riscv64-zvkg.pl | 100 ++++++++++++++
4 files changed, 292 insertions(+)
create mode 100644 arch/riscv/crypto/ghash-riscv64-glue.c
create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 9d991ddda289..6863f01a2ab0 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -34,4 +34,14 @@ config CRYPTO_AES_BLOCK_RISCV64
- Zvkb vector crypto extension (CTR/XTS)
- Zvkg vector crypto extension (XTS)

+config CRYPTO_GHASH_RISCV64
+ tristate "Hash functions: GHASH"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_GCM
+ help
+ GCM GHASH function (NIST SP 800-38D)
+
+ Architecture: riscv64 using:
+ - Zvkg vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 9574b009762f..94a7f8eaa8a7 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -9,6 +9,9 @@ aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o
obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o
aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvkned-zvbb-zvkg.o aes-riscv64-zvkned-zvkb.o

+obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
+ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o
+
quiet_cmd_perlasm = PERLASM $@
cmd_perlasm = $(PERL) $(<) void $(@)

@@ -21,6 +24,10 @@ $(obj)/aes-riscv64-zvkned-zvbb-zvkg.S: $(src)/aes-riscv64-zvkned-zvbb-zvkg.pl
$(obj)/aes-riscv64-zvkned-zvkb.S: $(src)/aes-riscv64-zvkned-zvkb.pl
$(call cmd,perlasm)

+$(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
+ $(call cmd,perlasm)
+
clean-files += aes-riscv64-zvkned.S
clean-files += aes-riscv64-zvkned-zvbb-zvkg.S
clean-files += aes-riscv64-zvkned-zvkb.S
+clean-files += ghash-riscv64-zvkg.S
diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c
new file mode 100644
index 000000000000..b01ab5714677
--- /dev/null
+++ b/arch/riscv/crypto/ghash-riscv64-glue.c
@@ -0,0 +1,175 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * RISC-V optimized GHASH routines
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/ghash.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <linux/crypto.h>
+#include <linux/linkage.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+/* ghash using zvkg vector crypto extension */
+asmlinkage void gcm_ghash_rv64i_zvkg(be128 *Xi, const be128 *H, const u8 *inp,
+ size_t len);
+
+struct riscv64_ghash_tfm_ctx {
+ be128 key;
+};
+
+struct riscv64_ghash_desc_ctx {
+ be128 shash;
+ u8 buffer[GHASH_BLOCK_SIZE];
+ u32 bytes;
+};
+
+static inline void ghash_blocks(const struct riscv64_ghash_tfm_ctx *tctx,
+ struct riscv64_ghash_desc_ctx *dctx,
+ const u8 *src, size_t srclen)
+{
+ /* The srclen is nonzero and a multiple of 16. */
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ gcm_ghash_rv64i_zvkg(&dctx->shash, &tctx->key, src, srclen);
+ kernel_vector_end();
+ } else {
+ do {
+ crypto_xor((u8 *)&dctx->shash, src, GHASH_BLOCK_SIZE);
+ gf128mul_lle(&dctx->shash, &tctx->key);
+ srclen -= GHASH_BLOCK_SIZE;
+ src += GHASH_BLOCK_SIZE;
+ } while (srclen);
+ }
+}
+
+static int ghash_init(struct shash_desc *desc)
+{
+ struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+
+ *dctx = (struct riscv64_ghash_desc_ctx){};
+
+ return 0;
+}
+
+static int ghash_update_zvkg(struct shash_desc *desc, const u8 *src,
+ unsigned int srclen)
+{
+ size_t len;
+ const struct riscv64_ghash_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
+ struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+
+ if (dctx->bytes) {
+ if (dctx->bytes + srclen < GHASH_BLOCK_SIZE) {
+ memcpy(dctx->buffer + dctx->bytes, src, srclen);
+ dctx->bytes += srclen;
+ return 0;
+ }
+ memcpy(dctx->buffer + dctx->bytes, src,
+ GHASH_BLOCK_SIZE - dctx->bytes);
+
+ ghash_blocks(tctx, dctx, dctx->buffer, GHASH_BLOCK_SIZE);
+
+ src += GHASH_BLOCK_SIZE - dctx->bytes;
+ srclen -= GHASH_BLOCK_SIZE - dctx->bytes;
+ dctx->bytes = 0;
+ }
+ len = srclen & ~(GHASH_BLOCK_SIZE - 1);
+
+ if (len) {
+ ghash_blocks(tctx, dctx, src, len);
+ src += len;
+ srclen -= len;
+ }
+
+ if (srclen) {
+ memcpy(dctx->buffer, src, srclen);
+ dctx->bytes = srclen;
+ }
+
+ return 0;
+}
+
+static int ghash_final_zvkg(struct shash_desc *desc, u8 *out)
+{
+ const struct riscv64_ghash_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
+ struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+ int i;
+
+ if (dctx->bytes) {
+ for (i = dctx->bytes; i < GHASH_BLOCK_SIZE; i++)
+ dctx->buffer[i] = 0;
+
+ ghash_blocks(tctx, dctx, dctx->buffer, GHASH_BLOCK_SIZE);
+ }
+
+ memcpy(out, &dctx->shash, GHASH_DIGEST_SIZE);
+
+ return 0;
+}
+
+static int ghash_setkey(struct crypto_shash *tfm, const u8 *key,
+ unsigned int keylen)
+{
+ struct riscv64_ghash_tfm_ctx *tctx = crypto_shash_ctx(tfm);
+
+ if (keylen != GHASH_BLOCK_SIZE)
+ return -EINVAL;
+
+ memcpy(&tctx->key, key, GHASH_BLOCK_SIZE);
+
+ return 0;
+}
+
+static struct shash_alg riscv64_ghash_alg_zvkg = {
+ .init = ghash_init,
+ .update = ghash_update_zvkg,
+ .final = ghash_final_zvkg,
+ .setkey = ghash_setkey,
+ .descsize = sizeof(struct riscv64_ghash_desc_ctx),
+ .digestsize = GHASH_DIGEST_SIZE,
+ .base = {
+ .cra_blocksize = GHASH_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct riscv64_ghash_tfm_ctx),
+ .cra_priority = 303,
+ .cra_name = "ghash",
+ .cra_driver_name = "ghash-riscv64-zvkg",
+ .cra_module = THIS_MODULE,
+ },
+};
+
+static inline bool check_ghash_ext(void)
+{
+ return riscv_isa_extension_available(NULL, ZVKG) &&
+ riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_ghash_mod_init(void)
+{
+ if (check_ghash_ext())
+ return crypto_register_shash(&riscv64_ghash_alg_zvkg);
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_ghash_mod_fini(void)
+{
+ crypto_unregister_shash(&riscv64_ghash_alg_zvkg);
+}
+
+module_init(riscv64_ghash_mod_init);
+module_exit(riscv64_ghash_mod_fini);
+
+MODULE_DESCRIPTION("GCM GHASH (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("ghash");
diff --git a/arch/riscv/crypto/ghash-riscv64-zvkg.pl b/arch/riscv/crypto/ghash-riscv64-zvkg.pl
new file mode 100644
index 000000000000..4beea4ac9cbe
--- /dev/null
+++ b/arch/riscv/crypto/ghash-riscv64-zvkg.pl
@@ -0,0 +1,100 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <[email protected]>
+# Copyright (c) 2023, Jerry Shih <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector GCM/GMAC extension ('Zvkg')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+###############################################################################
+# void gcm_ghash_rv64i_zvkg(be128 *Xi, const be128 *H, const u8 *inp, size_t len)
+#
+# input: Xi: current hash value
+# H: hash key
+# inp: pointer to input data
+# len: length of input data in bytes (multiple of block size)
+# output: Xi: Xi+1 (next hash value Xi)
+{
+my ($Xi,$H,$inp,$len) = ("a0","a1","a2","a3");
+my ($vXi,$vH,$vinp,$Vzero) = ("v1","v2","v3","v4");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_ghash_rv64i_zvkg
+.type gcm_ghash_rv64i_zvkg,\@function
+gcm_ghash_rv64i_zvkg:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $vH, $H]}
+ @{[vle32_v $vXi, $Xi]}
+
+Lstep:
+ @{[vle32_v $vinp, $inp]}
+ add $inp, $inp, 16
+ add $len, $len, -16
+ @{[vghsh_vv $vXi, $vH, $vinp]}
+ bnez $len, Lstep
+
+ @{[vse32_v $vXi, $Xi]}
+ ret
+
+.size gcm_ghash_rv64i_zvkg,.-gcm_ghash_rv64i_zvkg
+___
+}
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
--
2.28.0

2023-11-27 07:09:57

by Jerry Shih

[permalink] [raw]
Subject: [PATCH v2 13/13] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation

Add a ChaCha20 vector implementation from OpenSSL (openssl/openssl#21923).

Signed-off-by: Jerry Shih <[email protected]>
---
Changelog v2:
- Do not turn on kconfig `CHACHA20_RISCV64` option by default.
- Use simd skcipher interface.
- Add `asmlinkage` qualifier for crypto asm function.
- Reorder structure riscv64_chacha_alg_zvkb members initialization in
the order declared.
- Use smaller iv buffer instead of whole state matrix as chacha20's
input.
---
arch/riscv/crypto/Kconfig | 12 +
arch/riscv/crypto/Makefile | 7 +
arch/riscv/crypto/chacha-riscv64-glue.c | 122 +++++++++
arch/riscv/crypto/chacha-riscv64-zvkb.pl | 321 +++++++++++++++++++++++
4 files changed, 462 insertions(+)
create mode 100644 arch/riscv/crypto/chacha-riscv64-glue.c
create mode 100644 arch/riscv/crypto/chacha-riscv64-zvkb.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 7415fb303785..1932297a1e73 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -34,6 +34,18 @@ config CRYPTO_AES_BLOCK_RISCV64
- Zvkb vector crypto extension (CTR/XTS)
- Zvkg vector crypto extension (XTS)

+config CRYPTO_CHACHA20_RISCV64
+ tristate "Ciphers: ChaCha20"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_SIMD
+ select CRYPTO_SKCIPHER
+ select CRYPTO_LIB_CHACHA_GENERIC
+ help
+ Length-preserving ciphers: ChaCha20 stream cipher algorithm
+
+ Architecture: riscv64 using:
+ - Zvkb vector crypto extension
+
config CRYPTO_GHASH_RISCV64
tristate "Hash functions: GHASH"
depends on 64BIT && RISCV_ISA_V
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index b1f857695c1c..748c53aa38dc 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -9,6 +9,9 @@ aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o
obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o
aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvkned-zvbb-zvkg.o aes-riscv64-zvkned-zvkb.o

+obj-$(CONFIG_CRYPTO_CHACHA20_RISCV64) += chacha-riscv64.o
+chacha-riscv64-y := chacha-riscv64-glue.o chacha-riscv64-zvkb.o
+
obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o

@@ -36,6 +39,9 @@ $(obj)/aes-riscv64-zvkned-zvbb-zvkg.S: $(src)/aes-riscv64-zvkned-zvbb-zvkg.pl
$(obj)/aes-riscv64-zvkned-zvkb.S: $(src)/aes-riscv64-zvkned-zvkb.pl
$(call cmd,perlasm)

+$(obj)/chacha-riscv64-zvkb.S: $(src)/chacha-riscv64-zvkb.pl
+ $(call cmd,perlasm)
+
$(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
$(call cmd,perlasm)

@@ -54,6 +60,7 @@ $(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl
clean-files += aes-riscv64-zvkned.S
clean-files += aes-riscv64-zvkned-zvbb-zvkg.S
clean-files += aes-riscv64-zvkned-zvkb.S
+clean-files += chacha-riscv64-zvkb.S
clean-files += ghash-riscv64-zvkg.S
clean-files += sha256-riscv64-zvknha_or_zvknhb-zvkb.S
clean-files += sha512-riscv64-zvknhb-zvkb.S
diff --git a/arch/riscv/crypto/chacha-riscv64-glue.c b/arch/riscv/crypto/chacha-riscv64-glue.c
new file mode 100644
index 000000000000..96047cb75222
--- /dev/null
+++ b/arch/riscv/crypto/chacha-riscv64-glue.c
@@ -0,0 +1,122 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Port of the OpenSSL ChaCha20 implementation for RISC-V 64
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/internal/chacha.h>
+#include <crypto/internal/simd.h>
+#include <crypto/internal/skcipher.h>
+#include <linux/crypto.h>
+#include <linux/linkage.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+/* chacha20 using zvkb vector crypto extension */
+asmlinkage void ChaCha20_ctr32_zvkb(u8 *out, const u8 *input, size_t len,
+ const u32 *key, const u32 *counter);
+
+static int chacha20_encrypt(struct skcipher_request *req)
+{
+ u32 iv[CHACHA_IV_SIZE / sizeof(u32)];
+ u8 block_buffer[CHACHA_BLOCK_SIZE];
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ unsigned int tail_bytes;
+ int err;
+
+ iv[0] = get_unaligned_le32(req->iv);
+ iv[1] = get_unaligned_le32(req->iv + 4);
+ iv[2] = get_unaligned_le32(req->iv + 8);
+ iv[3] = get_unaligned_le32(req->iv + 12);
+
+ err = skcipher_walk_virt(&walk, req, false);
+ while (walk.nbytes) {
+ nbytes = walk.nbytes & (~(CHACHA_BLOCK_SIZE - 1));
+ tail_bytes = walk.nbytes & (CHACHA_BLOCK_SIZE - 1);
+ kernel_vector_begin();
+ if (nbytes) {
+ ChaCha20_ctr32_zvkb(walk.dst.virt.addr,
+ walk.src.virt.addr, nbytes,
+ ctx->key, iv);
+ iv[0] += nbytes / CHACHA_BLOCK_SIZE;
+ }
+ if (walk.nbytes == walk.total && tail_bytes > 0) {
+ memcpy(block_buffer, walk.src.virt.addr + nbytes,
+ tail_bytes);
+ ChaCha20_ctr32_zvkb(block_buffer, block_buffer,
+ CHACHA_BLOCK_SIZE, ctx->key, iv);
+ memcpy(walk.dst.virt.addr + nbytes, block_buffer,
+ tail_bytes);
+ tail_bytes = 0;
+ }
+ kernel_vector_end();
+
+ err = skcipher_walk_done(&walk, tail_bytes);
+ }
+
+ return err;
+}
+
+static struct skcipher_alg riscv64_chacha_alg_zvkb[] = {
+ {
+ .setkey = chacha20_setkey,
+ .encrypt = chacha20_encrypt,
+ .decrypt = chacha20_encrypt,
+ .min_keysize = CHACHA_KEY_SIZE,
+ .max_keysize = CHACHA_KEY_SIZE,
+ .ivsize = CHACHA_IV_SIZE,
+ .chunksize = CHACHA_BLOCK_SIZE,
+ .walksize = CHACHA_BLOCK_SIZE * 4,
+ .base = {
+ .cra_flags = CRYPTO_ALG_INTERNAL,
+ .cra_blocksize = 1,
+ .cra_ctxsize = sizeof(struct chacha_ctx),
+ .cra_priority = 300,
+ .cra_name = "__chacha20",
+ .cra_driver_name = "__chacha20-riscv64-zvkb",
+ .cra_module = THIS_MODULE,
+ },
+ }
+};
+
+static struct simd_skcipher_alg
+ *riscv64_chacha_simd_alg_zvkb[ARRAY_SIZE(riscv64_chacha_alg_zvkb)];
+
+static inline bool check_chacha20_ext(void)
+{
+ return riscv_isa_extension_available(NULL, ZVKB) &&
+ riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_chacha_mod_init(void)
+{
+ if (check_chacha20_ext())
+ return simd_register_skciphers_compat(
+ riscv64_chacha_alg_zvkb,
+ ARRAY_SIZE(riscv64_chacha_alg_zvkb),
+ riscv64_chacha_simd_alg_zvkb);
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_chacha_mod_fini(void)
+{
+ simd_unregister_skciphers(riscv64_chacha_alg_zvkb,
+ ARRAY_SIZE(riscv64_chacha_alg_zvkb),
+ riscv64_chacha_simd_alg_zvkb);
+}
+
+module_init(riscv64_chacha_mod_init);
+module_exit(riscv64_chacha_mod_fini);
+
+MODULE_DESCRIPTION("ChaCha20 (RISC-V accelerated)");
+MODULE_AUTHOR("Jerry Shih <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("chacha20");
diff --git a/arch/riscv/crypto/chacha-riscv64-zvkb.pl b/arch/riscv/crypto/chacha-riscv64-zvkb.pl
new file mode 100644
index 000000000000..6602ef79452f
--- /dev/null
+++ b/arch/riscv/crypto/chacha-riscv64-zvkb.pl
@@ -0,0 +1,321 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023-2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You may not use
+# this file except in compliance with the License. You can obtain a copy
+# in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Jerry Shih <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT, ">$output";
+
+my $code = <<___;
+.text
+___
+
+# void ChaCha20_ctr32_zvkb(unsigned char *out, const unsigned char *inp,
+# size_t len, const unsigned int key[8],
+# const unsigned int counter[4]);
+################################################################################
+my ( $OUTPUT, $INPUT, $LEN, $KEY, $COUNTER ) = ( "a0", "a1", "a2", "a3", "a4" );
+my ( $T0 ) = ( "t0" );
+my ( $CONST_DATA0, $CONST_DATA1, $CONST_DATA2, $CONST_DATA3 ) =
+ ( "a5", "a6", "a7", "t1" );
+my ( $KEY0, $KEY1, $KEY2,$KEY3, $KEY4, $KEY5, $KEY6, $KEY7,
+ $COUNTER0, $COUNTER1, $NONCE0, $NONCE1
+) = ( "s0", "s1", "s2", "s3", "s4", "s5", "s6",
+ "s7", "s8", "s9", "s10", "s11" );
+my ( $VL, $STRIDE, $CHACHA_LOOP_COUNT ) = ( "t2", "t3", "t4" );
+my (
+ $V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, $V8, $V9, $V10,
+ $V11, $V12, $V13, $V14, $V15, $V16, $V17, $V18, $V19, $V20, $V21,
+ $V22, $V23, $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map( "v$_", ( 0 .. 31 ) );
+
+sub chacha_quad_round_group {
+ my (
+ $A0, $B0, $C0, $D0, $A1, $B1, $C1, $D1,
+ $A2, $B2, $C2, $D2, $A3, $B3, $C3, $D3
+ ) = @_;
+
+ my $code = <<___;
+ # a += b; d ^= a; d <<<= 16;
+ @{[vadd_vv $A0, $A0, $B0]}
+ @{[vadd_vv $A1, $A1, $B1]}
+ @{[vadd_vv $A2, $A2, $B2]}
+ @{[vadd_vv $A3, $A3, $B3]}
+ @{[vxor_vv $D0, $D0, $A0]}
+ @{[vxor_vv $D1, $D1, $A1]}
+ @{[vxor_vv $D2, $D2, $A2]}
+ @{[vxor_vv $D3, $D3, $A3]}
+ @{[vror_vi $D0, $D0, 32 - 16]}
+ @{[vror_vi $D1, $D1, 32 - 16]}
+ @{[vror_vi $D2, $D2, 32 - 16]}
+ @{[vror_vi $D3, $D3, 32 - 16]}
+ # c += d; b ^= c; b <<<= 12;
+ @{[vadd_vv $C0, $C0, $D0]}
+ @{[vadd_vv $C1, $C1, $D1]}
+ @{[vadd_vv $C2, $C2, $D2]}
+ @{[vadd_vv $C3, $C3, $D3]}
+ @{[vxor_vv $B0, $B0, $C0]}
+ @{[vxor_vv $B1, $B1, $C1]}
+ @{[vxor_vv $B2, $B2, $C2]}
+ @{[vxor_vv $B3, $B3, $C3]}
+ @{[vror_vi $B0, $B0, 32 - 12]}
+ @{[vror_vi $B1, $B1, 32 - 12]}
+ @{[vror_vi $B2, $B2, 32 - 12]}
+ @{[vror_vi $B3, $B3, 32 - 12]}
+ # a += b; d ^= a; d <<<= 8;
+ @{[vadd_vv $A0, $A0, $B0]}
+ @{[vadd_vv $A1, $A1, $B1]}
+ @{[vadd_vv $A2, $A2, $B2]}
+ @{[vadd_vv $A3, $A3, $B3]}
+ @{[vxor_vv $D0, $D0, $A0]}
+ @{[vxor_vv $D1, $D1, $A1]}
+ @{[vxor_vv $D2, $D2, $A2]}
+ @{[vxor_vv $D3, $D3, $A3]}
+ @{[vror_vi $D0, $D0, 32 - 8]}
+ @{[vror_vi $D1, $D1, 32 - 8]}
+ @{[vror_vi $D2, $D2, 32 - 8]}
+ @{[vror_vi $D3, $D3, 32 - 8]}
+ # c += d; b ^= c; b <<<= 7;
+ @{[vadd_vv $C0, $C0, $D0]}
+ @{[vadd_vv $C1, $C1, $D1]}
+ @{[vadd_vv $C2, $C2, $D2]}
+ @{[vadd_vv $C3, $C3, $D3]}
+ @{[vxor_vv $B0, $B0, $C0]}
+ @{[vxor_vv $B1, $B1, $C1]}
+ @{[vxor_vv $B2, $B2, $C2]}
+ @{[vxor_vv $B3, $B3, $C3]}
+ @{[vror_vi $B0, $B0, 32 - 7]}
+ @{[vror_vi $B1, $B1, 32 - 7]}
+ @{[vror_vi $B2, $B2, 32 - 7]}
+ @{[vror_vi $B3, $B3, 32 - 7]}
+___
+
+ return $code;
+}
+
+$code .= <<___;
+.p2align 3
+.globl ChaCha20_ctr32_zvkb
+.type ChaCha20_ctr32_zvkb,\@function
+ChaCha20_ctr32_zvkb:
+ srli $LEN, $LEN, 6
+ beqz $LEN, .Lend
+
+ addi sp, sp, -96
+ sd s0, 0(sp)
+ sd s1, 8(sp)
+ sd s2, 16(sp)
+ sd s3, 24(sp)
+ sd s4, 32(sp)
+ sd s5, 40(sp)
+ sd s6, 48(sp)
+ sd s7, 56(sp)
+ sd s8, 64(sp)
+ sd s9, 72(sp)
+ sd s10, 80(sp)
+ sd s11, 88(sp)
+
+ li $STRIDE, 64
+
+ #### chacha block data
+ # "expa" little endian
+ li $CONST_DATA0, 0x61707865
+ # "nd 3" little endian
+ li $CONST_DATA1, 0x3320646e
+ # "2-by" little endian
+ li $CONST_DATA2, 0x79622d32
+ # "te k" little endian
+ li $CONST_DATA3, 0x6b206574
+
+ lw $KEY0, 0($KEY)
+ lw $KEY1, 4($KEY)
+ lw $KEY2, 8($KEY)
+ lw $KEY3, 12($KEY)
+ lw $KEY4, 16($KEY)
+ lw $KEY5, 20($KEY)
+ lw $KEY6, 24($KEY)
+ lw $KEY7, 28($KEY)
+
+ lw $COUNTER0, 0($COUNTER)
+ lw $COUNTER1, 4($COUNTER)
+ lw $NONCE0, 8($COUNTER)
+ lw $NONCE1, 12($COUNTER)
+
+.Lblock_loop:
+ @{[vsetvli $VL, $LEN, "e32", "m1", "ta", "ma"]}
+
+ # init chacha const states
+ @{[vmv_v_x $V0, $CONST_DATA0]}
+ @{[vmv_v_x $V1, $CONST_DATA1]}
+ @{[vmv_v_x $V2, $CONST_DATA2]}
+ @{[vmv_v_x $V3, $CONST_DATA3]}
+
+ # init chacha key states
+ @{[vmv_v_x $V4, $KEY0]}
+ @{[vmv_v_x $V5, $KEY1]}
+ @{[vmv_v_x $V6, $KEY2]}
+ @{[vmv_v_x $V7, $KEY3]}
+ @{[vmv_v_x $V8, $KEY4]}
+ @{[vmv_v_x $V9, $KEY5]}
+ @{[vmv_v_x $V10, $KEY6]}
+ @{[vmv_v_x $V11, $KEY7]}
+
+ # init chacha counter states
+ @{[vid_v $V12]}
+ @{[vadd_vx $V12, $V12, $COUNTER0]}
+ @{[vmv_v_x $V13, $COUNTER1]}
+
+ # init chacha nonce states
+ @{[vmv_v_x $V14, $NONCE0]}
+ @{[vmv_v_x $V15, $NONCE1]}
+
+ # load the top-half of input data
+ @{[vlsseg_nf_e32_v 8, $V16, $INPUT, $STRIDE]}
+
+ li $CHACHA_LOOP_COUNT, 10
+.Lround_loop:
+ addi $CHACHA_LOOP_COUNT, $CHACHA_LOOP_COUNT, -1
+ @{[chacha_quad_round_group
+ $V0, $V4, $V8, $V12,
+ $V1, $V5, $V9, $V13,
+ $V2, $V6, $V10, $V14,
+ $V3, $V7, $V11, $V15]}
+ @{[chacha_quad_round_group
+ $V0, $V5, $V10, $V15,
+ $V1, $V6, $V11, $V12,
+ $V2, $V7, $V8, $V13,
+ $V3, $V4, $V9, $V14]}
+ bnez $CHACHA_LOOP_COUNT, .Lround_loop
+
+ # load the bottom-half of input data
+ addi $T0, $INPUT, 32
+ @{[vlsseg_nf_e32_v 8, $V24, $T0, $STRIDE]}
+
+ # add chacha top-half initial block states
+ @{[vadd_vx $V0, $V0, $CONST_DATA0]}
+ @{[vadd_vx $V1, $V1, $CONST_DATA1]}
+ @{[vadd_vx $V2, $V2, $CONST_DATA2]}
+ @{[vadd_vx $V3, $V3, $CONST_DATA3]}
+ @{[vadd_vx $V4, $V4, $KEY0]}
+ @{[vadd_vx $V5, $V5, $KEY1]}
+ @{[vadd_vx $V6, $V6, $KEY2]}
+ @{[vadd_vx $V7, $V7, $KEY3]}
+ # xor with the top-half input
+ @{[vxor_vv $V16, $V16, $V0]}
+ @{[vxor_vv $V17, $V17, $V1]}
+ @{[vxor_vv $V18, $V18, $V2]}
+ @{[vxor_vv $V19, $V19, $V3]}
+ @{[vxor_vv $V20, $V20, $V4]}
+ @{[vxor_vv $V21, $V21, $V5]}
+ @{[vxor_vv $V22, $V22, $V6]}
+ @{[vxor_vv $V23, $V23, $V7]}
+
+ # save the top-half of output
+ @{[vssseg_nf_e32_v 8, $V16, $OUTPUT, $STRIDE]}
+
+ # add chacha bottom-half initial block states
+ @{[vadd_vx $V8, $V8, $KEY4]}
+ @{[vadd_vx $V9, $V9, $KEY5]}
+ @{[vadd_vx $V10, $V10, $KEY6]}
+ @{[vadd_vx $V11, $V11, $KEY7]}
+ @{[vid_v $V0]}
+ @{[vadd_vx $V12, $V12, $COUNTER0]}
+ @{[vadd_vx $V13, $V13, $COUNTER1]}
+ @{[vadd_vx $V14, $V14, $NONCE0]}
+ @{[vadd_vx $V15, $V15, $NONCE1]}
+ @{[vadd_vv $V12, $V12, $V0]}
+ # xor with the bottom-half input
+ @{[vxor_vv $V24, $V24, $V8]}
+ @{[vxor_vv $V25, $V25, $V9]}
+ @{[vxor_vv $V26, $V26, $V10]}
+ @{[vxor_vv $V27, $V27, $V11]}
+ @{[vxor_vv $V29, $V29, $V13]}
+ @{[vxor_vv $V28, $V28, $V12]}
+ @{[vxor_vv $V30, $V30, $V14]}
+ @{[vxor_vv $V31, $V31, $V15]}
+
+ # save the bottom-half of output
+ addi $T0, $OUTPUT, 32
+ @{[vssseg_nf_e32_v 8, $V24, $T0, $STRIDE]}
+
+ # update counter
+ add $COUNTER0, $COUNTER0, $VL
+ sub $LEN, $LEN, $VL
+ # each block is 4 * 16 = 64 bytes, so advance the offset by 64 * VL
+ slli $T0, $VL, 6
+ add $INPUT, $INPUT, $T0
+ add $OUTPUT, $OUTPUT, $T0
+ bnez $LEN, .Lblock_loop
+
+ ld s0, 0(sp)
+ ld s1, 8(sp)
+ ld s2, 16(sp)
+ ld s3, 24(sp)
+ ld s4, 32(sp)
+ ld s5, 40(sp)
+ ld s6, 48(sp)
+ ld s7, 56(sp)
+ ld s8, 64(sp)
+ ld s9, 72(sp)
+ ld s10, 80(sp)
+ ld s11, 88(sp)
+ addi sp, sp, 96
+
+.Lend:
+ ret
+.size ChaCha20_ctr32_zvkb,.-ChaCha20_ctr32_zvkb
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
--
2.28.0

2023-11-28 03:46:08

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 01/13] RISC-V: add helper function to read the vector VLEN

On Mon, Nov 27, 2023 at 03:06:51PM +0800, Jerry Shih wrote:
> From: Heiko Stuebner <[email protected]>
>
> VLEN describes the length of each vector register and some instructions
> need specific minimal VLENs to work correctly.
>
> The vector code already includes a variable riscv_v_vsize that contains
> the value of "32 vector registers with vlenb length" that gets filled
> during boot. vlenb is the value contained in the CSR_VLENB register and
> the value represents "VLEN / 8".
>
> So add riscv_vector_vlen() to return the actual VLEN value for in-kernel
> users when they need to check the available VLEN.
>
> Signed-off-by: Heiko Stuebner <[email protected]>
> Signed-off-by: Jerry Shih <[email protected]>
> ---
> arch/riscv/include/asm/vector.h | 11 +++++++++++
> 1 file changed, 11 insertions(+)
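
For context, since vlenb = VLEN / 8 and riscv_v_vsize holds 32 * vlenb, the
new helper presumably reduces to something like the following sketch (the
exact code lives in the patch and is not reproduced here):

	/* Sketch only: VLEN in bits, derived from riscv_v_vsize = 32 * vlenb. */
	static inline int riscv_vector_vlen(void)
	{
		return riscv_v_vsize / 32 * 8;
	}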

Reviewed-by: Eric Biggers <[email protected]>

- Eric

2023-11-28 03:46:10

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 02/13] RISC-V: hook new crypto subdir into build-system

On Mon, Nov 27, 2023 at 03:06:52PM +0800, Jerry Shih wrote:
> From: Heiko Stuebner <[email protected]>
>
> Create a crypto subdirectory for added accelerated cryptography routines
> and hook it into the riscv Kbuild and the main crypto Kconfig.
>
> Signed-off-by: Heiko Stuebner <[email protected]>
> Signed-off-by: Jerry Shih <[email protected]>
> ---
> arch/riscv/Kbuild | 1 +
> arch/riscv/crypto/Kconfig | 5 +++++
> arch/riscv/crypto/Makefile | 4 ++++
> crypto/Kconfig | 3 +++
> 4 files changed, 13 insertions(+)
> create mode 100644 arch/riscv/crypto/Kconfig
> create mode 100644 arch/riscv/crypto/Makefile

Reviewed-by: Eric Biggers <[email protected]>

- Eric

2023-11-28 03:56:43

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] RISC-V: crypto: add Zvkned accelerated AES implementation

On Mon, Nov 27, 2023 at 03:06:54PM +0800, Jerry Shih wrote:
> +int riscv64_aes_setkey(struct crypto_aes_ctx *ctx, const u8 *key,
> + unsigned int keylen)
> +{
> + int ret;
> +
> + ret = aes_check_keylen(keylen);
> + if (ret < 0)
> + return -EINVAL;
> +
> + /*
> + * The RISC-V AES vector crypto key expanding doesn't support AES-192.
> + * Use the generic software key expanding for that case.
> + */
> + if ((keylen == 16 || keylen == 32) && crypto_simd_usable()) {
> + /*
> + * All zvkned-based functions use encryption expanding keys for both
> + * encryption and decryption.
> + */
> + kernel_vector_begin();
> + rv64i_zvkned_set_encrypt_key(key, keylen, ctx);
> + kernel_vector_end();
> + } else {
> + ret = aes_expandkey(ctx, key, keylen);
> + }

rv64i_zvkned_set_encrypt_key() does not initialize crypto_aes_ctx::key_dec.
So, decryption results will be incorrect if !crypto_simd_usable() later.

> +static int aes_setkey(struct crypto_tfm *tfm, const u8 *key,
> + unsigned int keylen)

It's best to avoid generic-sounding function names like this that could collide
with functions in crypto/ or lib/crypto/. A better name for this function, for
example, would be aes_setkey_zvkned().

> diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl
> new file mode 100644
> index 000000000000..303e82d9f6f0
> --- /dev/null
> +++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl
[...]
> +L_enc_128:
[...]
> +L_enc_192:
[...]
> +L_enc_256:

There's some severe source code duplication going on in the AES assembly, with
the three AES variants having separate source code. You can just leave this
as-is since this is what was merged into OpenSSL and we are borrowing that for
now, but I do expect that we'll want to clean this up later.

- Eric

2023-11-28 03:58:27

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 05/13] crypto: simd - Update `walksize` in simd skcipher

On Mon, Nov 27, 2023 at 03:06:55PM +0800, Jerry Shih wrote:
> The `walksize` assignment is missed in simd skcipher.
>
> Signed-off-by: Jerry Shih <[email protected]>
> ---
> crypto/cryptd.c | 1 +
> crypto/simd.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/crypto/cryptd.c b/crypto/cryptd.c
> index bbcc368b6a55..253d13504ccb 100644
> --- a/crypto/cryptd.c
> +++ b/crypto/cryptd.c
> @@ -405,6 +405,7 @@ static int cryptd_create_skcipher(struct crypto_template *tmpl,
> (alg->base.cra_flags & CRYPTO_ALG_INTERNAL);
> inst->alg.ivsize = crypto_skcipher_alg_ivsize(alg);
> inst->alg.chunksize = crypto_skcipher_alg_chunksize(alg);
> + inst->alg.walksize = crypto_skcipher_alg_walksize(alg);
> inst->alg.min_keysize = crypto_skcipher_alg_min_keysize(alg);
> inst->alg.max_keysize = crypto_skcipher_alg_max_keysize(alg);
>
> diff --git a/crypto/simd.c b/crypto/simd.c
> index edaa479a1ec5..ea0caabf90f1 100644
> --- a/crypto/simd.c
> +++ b/crypto/simd.c
> @@ -181,6 +181,7 @@ struct simd_skcipher_alg *simd_skcipher_create_compat(const char *algname,
>
> alg->ivsize = ialg->ivsize;
> alg->chunksize = ialg->chunksize;
> + alg->walksize = ialg->walksize;
> alg->min_keysize = ialg->min_keysize;
> alg->max_keysize = ialg->max_keysize;

What are the consequences of this bug? I wonder if it actually matters? The
"inner" algorithm is the one that actually gets used for the "walk", right?

- Eric

2023-11-28 04:07:31

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 07/13] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations

On Mon, Nov 27, 2023 at 03:06:57PM +0800, Jerry Shih wrote:
> +typedef void (*aes_xts_func)(const u8 *in, u8 *out, size_t length,
> + const struct crypto_aes_ctx *key, u8 *iv,
> + int update_iv);

There's no need for this indirection, because the function pointer can only have
one value.

Note also that when Control Flow Integrity is enabled, assembly functions can
only be called indirectly when they use SYM_TYPED_FUNC_START. That's another
reason to avoid indirect calls that aren't actually necessary.

> + nbytes &= (~(AES_BLOCK_SIZE - 1));

Expressions like ~(n - 1) should not have another set of parentheses around them.
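
For example, the line quoted above would simply read:

	nbytes &= ~(AES_BLOCK_SIZE - 1);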

> +static int xts_crypt(struct skcipher_request *req, aes_xts_func func)
> +{
> + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> + const struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
> + struct skcipher_request sub_req;
> + struct scatterlist sg_src[2], sg_dst[2];
> + struct scatterlist *src, *dst;
> + struct skcipher_walk walk;
> + unsigned int walk_size = crypto_skcipher_walksize(tfm);
> + unsigned int tail_bytes;
> + unsigned int head_bytes;
> + unsigned int nbytes;
> + unsigned int update_iv = 1;
> + int err;
> +
> + /* xts input size should be bigger than AES_BLOCK_SIZE */
> + if (req->cryptlen < AES_BLOCK_SIZE)
> + return -EINVAL;
> +
> + /*
> + * We split xts-aes cryption into `head` and `tail` parts.
> + * The head block contains the input from the beginning which doesn't need
> + * `ciphertext stealing` method.
> + * The tail block contains at least two AES blocks including ciphertext
> + * stealing data from the end.
> + */
> + if (req->cryptlen <= walk_size) {
> + /*
> + * All data is in one `walk`. We could handle it within one AES-XTS call in
> + * the end.
> + */
> + tail_bytes = req->cryptlen;
> + head_bytes = 0;
> + } else {
> + if (req->cryptlen & (AES_BLOCK_SIZE - 1)) {
> + /*
> + * with ciphertext stealing
> + *
> + * Find the largest tail size which is small than `walk` size while the
> + * head part still fits AES block boundary.
> + */
> + tail_bytes = req->cryptlen & (AES_BLOCK_SIZE - 1);
> + tail_bytes = walk_size + tail_bytes - AES_BLOCK_SIZE;
> + head_bytes = req->cryptlen - tail_bytes;
> + } else {
> + /* no ciphertext stealing */
> + tail_bytes = 0;
> + head_bytes = req->cryptlen;
> + }
> + }
> +
> + riscv64_aes_encrypt_zvkned(&ctx->ctx2, req->iv, req->iv);
> +
> + if (head_bytes && tail_bytes) {
> + /* If we have to parts, setup new request for head part only. */
> + skcipher_request_set_tfm(&sub_req, tfm);
> + skcipher_request_set_callback(
> + &sub_req, skcipher_request_flags(req), NULL, NULL);
> + skcipher_request_set_crypt(&sub_req, req->src, req->dst,
> + head_bytes, req->iv);
> + req = &sub_req;
> + }
> +
> + if (head_bytes) {
> + err = skcipher_walk_virt(&walk, req, false);
> + while ((nbytes = walk.nbytes)) {
> + if (nbytes == walk.total)
> + update_iv = (tail_bytes > 0);
> +
> + nbytes &= (~(AES_BLOCK_SIZE - 1));
> + kernel_vector_begin();
> + func(walk.src.virt.addr, walk.dst.virt.addr, nbytes,
> + &ctx->ctx1, req->iv, update_iv);
> + kernel_vector_end();
> +
> + err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
> + }
> + if (err || !tail_bytes)
> + return err;
> +
> + /*
> + * Setup new request for tail part.
> + * We use `scatterwalk_next()` to find the next scatterlist from last
> + * walk instead of iterating from the beginning.
> + */
> + dst = src = scatterwalk_next(sg_src, &walk.in);
> + if (req->dst != req->src)
> + dst = scatterwalk_next(sg_dst, &walk.out);
> + skcipher_request_set_crypt(req, src, dst, tail_bytes, req->iv);
> + }
> +
> + /* tail */
> + err = skcipher_walk_virt(&walk, req, false);
> + if (err)
> + return err;
> + if (walk.nbytes != tail_bytes)
> + return -EINVAL;
> + kernel_vector_begin();
> + func(walk.src.virt.addr, walk.dst.virt.addr, walk.nbytes, &ctx->ctx1,
> + req->iv, 0);
> + kernel_vector_end();
> +
> + return skcipher_walk_done(&walk, 0);
> +}

Did you consider writing xts_crypt() the way that arm64 and x86 do it? The
above seems to reinvent sort of the same thing from first principles. I'm
wondering if you should just copy the existing approach for now. Then there
would be no need to add the scatterwalk_next() function, and also the handling
of inputs that don't need ciphertext stealing would be a bit more streamlined.
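
For reference, a rough sketch of that flow (simplified and illustrative only;
the variable names and the exact split are not taken from this patch):

	int tail = req->cryptlen % AES_BLOCK_SIZE;
	struct scatterlist sg_src[2], sg_dst[2];
	struct scatterlist *src = req->src, *dst = req->dst;

	if (tail) {
		/* Keep the last full block together with the partial block. */
		skcipher_request_set_crypt(req, req->src, req->dst,
					   req->cryptlen - tail - AES_BLOCK_SIZE,
					   req->iv);
	}

	/* ... skcipher_walk_virt() loop over the full AES blocks ... */

	if (tail) {
		unsigned int offset = req->cryptlen - tail - AES_BLOCK_SIZE;

		dst = src = scatterwalk_ffwd(sg_src, req->src, offset);
		if (req->dst != req->src)
			dst = scatterwalk_ffwd(sg_dst, req->dst, offset);
		skcipher_request_set_crypt(req, src, dst,
					   AES_BLOCK_SIZE + tail, req->iv);
		/* ... one final call that performs the ciphertext stealing ... */
	}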

> +static int __init riscv64_aes_block_mod_init(void)
> +{
> + int ret = -ENODEV;
> +
> + if (riscv_isa_extension_available(NULL, ZVKNED) &&
> + riscv_vector_vlen() >= 128 && riscv_vector_vlen() <= 2048) {
> + ret = simd_register_skciphers_compat(
> + riscv64_aes_algs_zvkned,
> + ARRAY_SIZE(riscv64_aes_algs_zvkned),
> + riscv64_aes_simd_algs_zvkned);
> + if (ret)
> + return ret;
> +
> + if (riscv_isa_extension_available(NULL, ZVBB)) {
> + ret = simd_register_skciphers_compat(
> + riscv64_aes_alg_zvkned_zvkb,
> + ARRAY_SIZE(riscv64_aes_alg_zvkned_zvkb),
> + riscv64_aes_simd_alg_zvkned_zvkb);
> + if (ret)
> + goto unregister_zvkned;

This makes the registration of the zvkned-zvkb algorithm conditional on zvbb,
not zvkb. Shouldn't the extension checks actually look like:

ZVKNED
ZVKB
ZVBB && ZVKG

- Eric

2023-11-28 04:12:53

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 09/13] RISC-V: crypto: add Zvknha/b accelerated SHA224/256 implementations

On Mon, Nov 27, 2023 at 03:06:59PM +0800, Jerry Shih wrote:
> +/*
> + * sha256 using zvkb and zvknha/b vector crypto extension
> + *
> + * This asm function will just take the first 256-bit as the sha256 state from
> + * the pointer to `struct sha256_state`.
> + */
> +asmlinkage void
> +sha256_block_data_order_zvkb_zvknha_or_zvknhb(struct sha256_state *digest,
> + const u8 *data, int num_blks);

The SHA-2 and SM3 assembly functions are potentially being called using indirect
calls, depending on whether the compiler optimizes out the indirect call that
exists in the code or not. These assembly functions also are not defined using
SYM_TYPED_FUNC_START. This is not compatible with Control Flow Integrity
(CONFIG_CFI_CLANG); these indirect calls might generate CFI failures.

I recommend using wrapper functions to avoid this issue, like what is done in
arch/arm64/crypto/sha2-ce-glue.c.
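
Roughly (a sketch only; the asm symbol follows this patch, the wrapper name is
made up):

	static void riscv64_sha256_block_fn(struct sha256_state *sst,
					    const u8 *src, int blocks)
	{
		sha256_block_data_order_zvkb_zvknha_or_zvknhb(sst, src, blocks);
	}

The wrapper is a C symbol with a proper CFI type, so it can be passed to
sha256_base_do_update()/sha256_base_do_finalize() instead of the asm entry
point.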

- Eric

2023-11-28 04:13:28

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 12/13] RISC-V: crypto: add Zvksh accelerated SM3 implementation

On Mon, Nov 27, 2023 at 03:07:02PM +0800, Jerry Shih wrote:
> +static int __init riscv64_riscv64_sm3_mod_init(void)

There's an extra "_riscv64" in this function name.

- Eric

2023-11-28 04:22:51

by Jerry Shih

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] RISC-V: crypto: add Zvkned accelerated AES implementation

On Nov 28, 2023, at 11:56, Eric Biggers <[email protected]> wrote:
> On Mon, Nov 27, 2023 at 03:06:54PM +0800, Jerry Shih wrote:
>> +int riscv64_aes_setkey(struct crypto_aes_ctx *ctx, const u8 *key,
>> + unsigned int keylen)
>> +{
>> + int ret;
>> +
>> + ret = aes_check_keylen(keylen);
>> + if (ret < 0)
>> + return -EINVAL;
>> +
>> + /*
>> + * The RISC-V AES vector crypto key expanding doesn't support AES-192.
>> + * Use the generic software key expanding for that case.
>> + */
>> + if ((keylen == 16 || keylen == 32) && crypto_simd_usable()) {
>> + /*
>> + * All zvkned-based functions use encryption expanding keys for both
>> + * encryption and decryption.
>> + */
>> + kernel_vector_begin();
>> + rv64i_zvkned_set_encrypt_key(key, keylen, ctx);
>> + kernel_vector_end();
>> + } else {
>> + ret = aes_expandkey(ctx, key, keylen);
>> + }
>
> rv64i_zvkned_set_encrypt_key() does not initialize crypto_aes_ctx::key_dec.
> So, decryption results will be incorrect if !crypto_simd_usable() later.

Could we hit a situation where the `crypto_simd_usable()` result is not consistent
between aes_setkey() and aes_enc/dec()? If so, every accelerated (or HW-specific)
crypto algorithm would have to mirror the sw fallback implementation, since
`crypto_simd_usable()` could change back and forth.

>> +static int aes_setkey(struct crypto_tfm *tfm, const u8 *key,
>> + unsigned int keylen)
>
> It's best to avoid generic-sounding function names like this that could collide
> with functions in crypto/ or lib/crypto/. A better name for this function, for
> example, would be aes_setkey_zvkned().

Thx, I will fix that.

>> diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl
>> new file mode 100644
>> index 000000000000..303e82d9f6f0
>> --- /dev/null
>> +++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl
> [...]
>> +L_enc_128:
> [...]
>> +L_enc_192:
> [...]
>> +L_enc_256:
>
> There's some severe source code duplication going on in the AES assembly, with
> the three AES variants having separate source code. You can just leave this
> as-is since this is what was merged into OpenSSL and we are borrowing that for
> now, but I do expect that we'll want to clean this up later.

Do we prefer the code with branches instead of the specialized per-key-length
implementations? We could handle AES-128/192/256 together like:

@{[vaesz_vs $V24, $V1]}
@{[vaesem_vs $V24, $V2]}
@{[vaesem_vs $V24, $V3]}
@{[vaesem_vs $V24, $V4]}
@{[vaesem_vs $V24, $V5]}
@{[vaesem_vs $V24, $V6]}
@{[vaesem_vs $V24, $V7]}
@{[vaesem_vs $V24, $V8]}
@{[vaesem_vs $V24, $V9]}
@{[vaesem_vs $V24, $V10]}
beq $ROUND, $ROUND_11, 1f
@{[vaesem_vs $V24, $V11]}
@{[vaesem_vs $V24, $V12]}
beq $ROUND, $ROUND_13, 1f
@{[vaesem_vs $V24, $V13]}
@{[vaesem_vs $V24, $V14]}
1:
@{[vaesef_vs $V24, $V15]}

But we would incur the additional cost of the branches.

> - Eric



2023-11-28 04:25:15

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 13/13] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation

On Mon, Nov 27, 2023 at 03:07:03PM +0800, Jerry Shih wrote:
> +config CRYPTO_CHACHA20_RISCV64

Can you call this kconfig option just CRYPTO_CHACHA_RISCV64? I.e. drop the
"20". The ChaCha family of ciphers includes more than just ChaCha20.

The other architectures do use "CHACHA20" in their equivalent option, even when
they implement XChaCha12 too. But that's for historical reasons -- we didn't
want to break anything by renaming the kconfig options. For a new option we
should use the more general name from the beginning, even if initially only
ChaCha20 is implemented (which is fine).

> +static int chacha20_encrypt(struct skcipher_request *req)

riscv64_chacha_crypt(), please. chacha20_encrypt() is dangerously close to
being the same name as chacha20_crypt() which already exists in crypto/chacha.h.

> +static inline bool check_chacha20_ext(void)
> +{
> + return riscv_isa_extension_available(NULL, ZVKB) &&
> + riscv_vector_vlen() >= 128;
> +}

Just to double check: your intent is to simply require VLEN >= 128 for all the
RISC-V vector crypto code, even when some might work with a shorter VLEN? I
don't see anything in chacha-riscv64-zvkb.pl that assumes VLEN >= 128, for
example. I think it would even work with VLEN == 32.

I think requiring VLEN >= 128 anyway makes sense so that we don't have to worry
about validating the code with shorter VLEN. And "application processors" are
supposed to have VLEN >= 128. But I just wanted to make sure this is what you
intended too.

- Eric

2023-11-28 04:43:16

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] RISC-V: crypto: add Zvkned accelerated AES implementation

On Tue, Nov 28, 2023 at 12:22:26PM +0800, Jerry Shih wrote:
> On Nov 28, 2023, at 11:56, Eric Biggers <[email protected]> wrote:
> > On Mon, Nov 27, 2023 at 03:06:54PM +0800, Jerry Shih wrote:
> >> +int riscv64_aes_setkey(struct crypto_aes_ctx *ctx, const u8 *key,
> >> + unsigned int keylen)
> >> +{
> >> + int ret;
> >> +
> >> + ret = aes_check_keylen(keylen);
> >> + if (ret < 0)
> >> + return -EINVAL;
> >> +
> >> + /*
> >> + * The RISC-V AES vector crypto key expanding doesn't support AES-192.
> >> + * Use the generic software key expanding for that case.
> >> + */
> >> + if ((keylen == 16 || keylen == 32) && crypto_simd_usable()) {
> >> + /*
> >> + * All zvkned-based functions use encryption expanding keys for both
> >> + * encryption and decryption.
> >> + */
> >> + kernel_vector_begin();
> >> + rv64i_zvkned_set_encrypt_key(key, keylen, ctx);
> >> + kernel_vector_end();
> >> + } else {
> >> + ret = aes_expandkey(ctx, key, keylen);
> >> + }
> >
> > rv64i_zvkned_set_encrypt_key() does not initialize crypto_aes_ctx::key_dec.
> > So, decryption results will be incorrect if !crypto_simd_usable() later.
>
> Will we have the situation that `crypto_simd_usable()` condition is not consistent
> during the aes_setkey(), aes_enc/dec()? If yes, all accelerated(or HW specific)
> crypto algorithms should do the same implementations as the sw fallback path
> since the `crypto_simd_usable()` will change back and forth.

Yes, the calls to one "crypto_cipher" can happen in different contexts. For
example, crypto_simd_usable() can be true during setkey and false during
decrypt, or vice versa.

If the RISC-V decryption code wants to use the regular key schedule (key_enc)
instead of the "Equivalent Inverse Cipher key schedule" (key_dec), that's
perfectly fine, but setkey still needs to initialize key_dec in case the
fallback to aes_decrypt() gets taken.
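
I.e. something along these lines (a minimal sketch, not the final code):

	/* Generic expansion fills both key_enc and key_dec. */
	ret = aes_expandkey(ctx, key, keylen);
	if (ret)
		return ret;

	if ((keylen == 16 || keylen == 32) && crypto_simd_usable()) {
		/*
		 * Overwrite key_enc with the zvkned-expanded schedule;
		 * key_dec stays valid for the aes_decrypt() fallback.
		 */
		kernel_vector_begin();
		rv64i_zvkned_set_encrypt_key(key, keylen, ctx);
		kernel_vector_end();
	}
	return 0;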

> >> diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl
> >> new file mode 100644
> >> index 000000000000..303e82d9f6f0
> >> --- /dev/null
> >> +++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl
> > [...]
> >> +L_enc_128:
> > [...]
> >> +L_enc_192:
> > [...]
> >> +L_enc_256:
> >
> > There's some severe source code duplication going on in the AES assembly, with
> > the three AES variants having separate source code. You can just leave this
> > as-is since this is what was merged into OpenSSL and we are borrowing that for
> > now, but I do expect that we'll want to clean this up later.
>
> Do we prefer the code with the branches instead of the specified implementation?
> We could make AES-128/192/256 together like:
>
> @{[vaesz_vs $V24, $V1]}
> @{[vaesem_vs $V24, $V2]}
> @{[vaesem_vs $V24, $V3]}
> @{[vaesem_vs $V24, $V4]}
> @{[vaesem_vs $V24, $V5]}
> @{[vaesem_vs $V24, $V6]}
> @{[vaesem_vs $V24, $V7]}
> @{[vaesem_vs $V24, $V8]}
> @{[vaesem_vs $V24, $V9]}
> @{[vaesem_vs $V24, $V10]}
> beq $ROUND, $ROUND_11, 1f
> @{[vaesem_vs $V24, $V11]}
> @{[vaesem_vs $V24, $V12]}
> beq $ROUND, $ROUND_13, 1f
> @{[vaesem_vs $V24, $V13]}
> @{[vaesem_vs $V24, $V14]}
> 1:
> @{[vaesef_vs $V24, $V15]}
>
> But we will have the additional costs for the branches.
>

That needs to be decided on a case by case basis depending on the performance
impact and how much binary code is saved. On some architectures, separate
binary code for AES-{128,192,256} has been found to be worthwhile. However,
that does *not* mean that they need to have separate source code. Take a look
at how arch/x86/crypto/aes_ctrby8_avx-x86_64.S generates code for all the AES
variants using macros, for example.

Anyway, I don't think you should bother making too many changes to the "perlasm"
files. If we decide to make major cleanups I think we should just replace them
with .S files (which already support macros).

- Eric

2023-11-28 05:38:47

by Jerry Shih

[permalink] [raw]
Subject: Re: [PATCH v2 05/13] crypto: simd - Update `walksize` in simd skcipher

On Nov 28, 2023, at 11:58, Eric Biggers <[email protected]> wrote:
> On Mon, Nov 27, 2023 at 03:06:55PM +0800, Jerry Shih wrote:
>> The `walksize` assignment is missed in simd skcipher.
>>
>> Signed-off-by: Jerry Shih <[email protected]>
>> ---
>> crypto/cryptd.c | 1 +
>> crypto/simd.c | 1 +
>> 2 files changed, 2 insertions(+)
>>
>> diff --git a/crypto/cryptd.c b/crypto/cryptd.c
>> index bbcc368b6a55..253d13504ccb 100644
>> --- a/crypto/cryptd.c
>> +++ b/crypto/cryptd.c
>> @@ -405,6 +405,7 @@ static int cryptd_create_skcipher(struct crypto_template *tmpl,
>> (alg->base.cra_flags & CRYPTO_ALG_INTERNAL);
>> inst->alg.ivsize = crypto_skcipher_alg_ivsize(alg);
>> inst->alg.chunksize = crypto_skcipher_alg_chunksize(alg);
>> + inst->alg.walksize = crypto_skcipher_alg_walksize(alg);
>> inst->alg.min_keysize = crypto_skcipher_alg_min_keysize(alg);
>> inst->alg.max_keysize = crypto_skcipher_alg_max_keysize(alg);
>>
>> diff --git a/crypto/simd.c b/crypto/simd.c
>> index edaa479a1ec5..ea0caabf90f1 100644
>> --- a/crypto/simd.c
>> +++ b/crypto/simd.c
>> @@ -181,6 +181,7 @@ struct simd_skcipher_alg *simd_skcipher_create_compat(const char *algname,
>>
>> alg->ivsize = ialg->ivsize;
>> alg->chunksize = ialg->chunksize;
>> + alg->walksize = ialg->walksize;
>> alg->min_keysize = ialg->min_keysize;
>> alg->max_keysize = ialg->max_keysize;
>
> What are the consequences of this bug? I wonder if it actually matters? The
> "inner" algorithm is the one that actually gets used for the "walk", right?
>
> - Eric

Without this, we might still use chunksize or cra_blocksize as the walksize
even though we set up a larger walksize.

Here is the code for the walksize default value:
static int skcipher_prepare_alg(struct skcipher_alg *alg)
{
	...
	if (!alg->chunksize)
		alg->chunksize = base->cra_blocksize;
	if (!alg->walksize)
		alg->walksize = alg->chunksize;

And x86 aes-xts already sets a bigger walksize:
	.base = {
		.cra_name = "__xts(aes)",
		...
	},
	.walksize = 2 * AES_BLOCK_SIZE,

The x86 aes-xts only uses one `walk` to handle the tail elements. It assumes
that the walksize contains 2 aes blocks. If walksize is not set correctly, some
tail elements might not be processed in simd-cipher mode for x86 aes-xts.

-Jerry

2023-11-28 07:17:18

by Jerry Shih

[permalink] [raw]
Subject: Re: [PATCH v2 09/13] RISC-V: crypto: add Zvknha/b accelerated SHA224/256 implementations

On Nov 28, 2023, at 12:12, Eric Biggers <[email protected]> wrote:
> On Mon, Nov 27, 2023 at 03:06:59PM +0800, Jerry Shih wrote:
>> +/*
>> + * sha256 using zvkb and zvknha/b vector crypto extension
>> + *
>> + * This asm function will just take the first 256-bit as the sha256 state from
>> + * the pointer to `struct sha256_state`.
>> + */
>> +asmlinkage void
>> +sha256_block_data_order_zvkb_zvknha_or_zvknhb(struct sha256_state *digest,
>> + const u8 *data, int num_blks);
>
> The SHA-2 and SM3 assembly functions are potentially being called using indirect
> calls, depending on whether the compiler optimizes out the indirect call that
> exists in the code or not. These assembly functions also are not defined using
> SYM_TYPED_FUNC_START. This is not compatible with Control Flow Integrity
> (CONFIG_CFI_CLANG); these indirect calls might generate CFI failures.
>
> I recommend using wrapper functions to avoid this issue, like what is done in
> arch/arm64/crypto/sha2-ce-glue.c.
>
> - Eric

Here is the previous review comment for the assembly function wrapper:
> > +asmlinkage void sha256_block_data_order_zvbb_zvknha(u32 *digest, const void *data,
> > + unsigned int num_blks);
> > +
> > +static void __sha256_block_data_order(struct sha256_state *sst, u8 const *src,
> > + int blocks)
> > +{
> > + sha256_block_data_order_zvbb_zvknha(sst->state, src, blocks);
> > +}
> Having a double-underscored function wrap around a non-underscored one like this
> isn't conventional for Linux kernel code. IIRC some of the other crypto code
> happens to do this, but it really is supposed to be the other way around.
>
> I think you should just declare the assembly function to take a 'struct
> sha256_state', with a comment mentioning that only the 'u32 state[8]' at the
> beginning is actually used. That's what arch/x86/crypto/sha256_ssse3_glue.c
> does, for example. Then, __sha256_block_data_order() would be unneeded.

Do you mean that we need the wrapper functions back for both SHA-* and SM3?
If so, we would also no longer need the state-offset check like:
BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);

Could we just use the `SYM_TYPED_FUNC_START` in asm directly without the
wrappers?

-Jerry

2023-11-28 08:57:54

by Jerry Shih

[permalink] [raw]
Subject: Re: [PATCH v2 13/13] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation

On Nov 28, 2023, at 12:25, Eric Biggers <[email protected]> wrote:
> On Mon, Nov 27, 2023 at 03:07:03PM +0800, Jerry Shih wrote:
>> +config CRYPTO_CHACHA20_RISCV64
>
> Can you call this kconfig option just CRYPTO_CHACHA_RISCV64? I.e. drop the
> "20". The ChaCha family of ciphers includes more than just ChaCha20.
>
> The other architectures do use "CHACHA20" in their equivalent option, even when
> they implement XChaCha12 too. But that's for historical reasons -- we didn't
> want to break anything by renaming the kconfig options. For a new option we
> should use the more general name from the beginning, even if initially only
> ChaCha20 is implemented (which is fine).

I will use `CRYPTO_CHACHA_RISCV64` instead.

>> +static int chacha20_encrypt(struct skcipher_request *req)
>
> riscv64_chacha_crypt(), please. chacha20_encrypt() is dangerously close to
> being the same name as chacha20_crypt() which already exists in crypto/chacha.h.

The function will have an additional prefix/suffix.

>> +static inline bool check_chacha20_ext(void)
>> +{
>> + return riscv_isa_extension_available(NULL, ZVKB) &&
>> + riscv_vector_vlen() >= 128;
>> +}
>
> Just to double check: your intent is to simply require VLEN >= 128 for all the
> RISC-V vector crypto code, even when some might work with a shorter VLEN? I
> don't see anything in chacha-riscv64-zvkb.pl that assumes VLEN >= 128, for
> example. I think it would even work with VLEN == 32.

Yes, the chacha algorithm here only needs VLEN >= 32. But I don't think we would
get any benefit on that kind of hw.

> I think requiring VLEN >= 128 anyway makes sense so that we don't have to worry
> about validating the code with shorter VLEN. And "application processors" are
> supposed to have VLEN >= 128. But I just wanted to make sure this is what you
> intended too.

The standard "V" extension assumes VLEN>=128. I just follow that assumption.
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#183-v-vector-extension-for-application-processors

-Jerry

2023-11-28 17:23:19

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 05/13] crypto: simd - Update `walksize` in simd skcipher

On Tue, Nov 28, 2023 at 01:38:29PM +0800, Jerry Shih wrote:
> On Nov 28, 2023, at 11:58, Eric Biggers <[email protected]> wrote:
> > On Mon, Nov 27, 2023 at 03:06:55PM +0800, Jerry Shih wrote:
> >> The `walksize` assignment is missed in simd skcipher.
> >>
> >> Signed-off-by: Jerry Shih <[email protected]>
> >> ---
> >> crypto/cryptd.c | 1 +
> >> crypto/simd.c | 1 +
> >> 2 files changed, 2 insertions(+)
> >>
> >> diff --git a/crypto/cryptd.c b/crypto/cryptd.c
> >> index bbcc368b6a55..253d13504ccb 100644
> >> --- a/crypto/cryptd.c
> >> +++ b/crypto/cryptd.c
> >> @@ -405,6 +405,7 @@ static int cryptd_create_skcipher(struct crypto_template *tmpl,
> >> (alg->base.cra_flags & CRYPTO_ALG_INTERNAL);
> >> inst->alg.ivsize = crypto_skcipher_alg_ivsize(alg);
> >> inst->alg.chunksize = crypto_skcipher_alg_chunksize(alg);
> >> + inst->alg.walksize = crypto_skcipher_alg_walksize(alg);
> >> inst->alg.min_keysize = crypto_skcipher_alg_min_keysize(alg);
> >> inst->alg.max_keysize = crypto_skcipher_alg_max_keysize(alg);
> >>
> >> diff --git a/crypto/simd.c b/crypto/simd.c
> >> index edaa479a1ec5..ea0caabf90f1 100644
> >> --- a/crypto/simd.c
> >> +++ b/crypto/simd.c
> >> @@ -181,6 +181,7 @@ struct simd_skcipher_alg *simd_skcipher_create_compat(const char *algname,
> >>
> >> alg->ivsize = ialg->ivsize;
> >> alg->chunksize = ialg->chunksize;
> >> + alg->walksize = ialg->walksize;
> >> alg->min_keysize = ialg->min_keysize;
> >> alg->max_keysize = ialg->max_keysize;
> >
> > What are the consequences of this bug? I wonder if it actually matters? The
> > "inner" algorithm is the one that actually gets used for the "walk", right?
> >
> > - Eric
>
> Without this, we might still use chunksize or cra_blocksize as the walksize
> even though we setup with the larger walksize.
>
> Here is the code for the walksize default value:
> static int skcipher_prepare_alg(struct skcipher_alg *alg)
> {
> ...
> if (!alg->chunksize)
> alg->chunksize = base->cra_blocksize;
> if (!alg->walksize)
> alg->walksize = alg->chunksize;
>
> And we already have the bigger walksize for x86 aes-xts.
> .base = {
> .cra_name = "__xts(aes)",
> ...
> },
> .walksize = 2 * AES_BLOCK_SIZE,
>
> The x86 aes-xts only uses one `walk` to handle the tail elements. It assumes
> that the walksize contains 2 aes blocks. If walksize is not set correctly, maybe
> some tail elements is not processed in simd-cipher mode for x86 aes-xts.

With the SIMD helper there are three "algorithms": the underlying algorithm, the
cryptd algorithm, and the simd algorithm. This patch makes the "walksize"
property be propagated from the underlying algorithm to the cryptd and simd
algorithms. I don't see how that actually makes a difference, since the only
place the skcipher_walk happens is on the underlying algorithm. So it uses the
"walksize" from the underlying algorithm, right?

- Eric

2023-11-28 17:23:57

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 09/13] RISC-V: crypto: add Zvknha/b accelerated SHA224/256 implementations

On Tue, Nov 28, 2023 at 03:16:53PM +0800, Jerry Shih wrote:
> On Nov 28, 2023, at 12:12, Eric Biggers <[email protected]> wrote:
> > On Mon, Nov 27, 2023 at 03:06:59PM +0800, Jerry Shih wrote:
> >> +/*
> >> + * sha256 using zvkb and zvknha/b vector crypto extension
> >> + *
> >> + * This asm function will just take the first 256-bit as the sha256 state from
> >> + * the pointer to `struct sha256_state`.
> >> + */
> >> +asmlinkage void
> >> +sha256_block_data_order_zvkb_zvknha_or_zvknhb(struct sha256_state *digest,
> >> + const u8 *data, int num_blks);
> >
> > The SHA-2 and SM3 assembly functions are potentially being called using indirect
> > calls, depending on whether the compiler optimizes out the indirect call that
> > exists in the code or not. These assembly functions also are not defined using
> > SYM_TYPED_FUNC_START. This is not compatible with Control Flow Integrity
> > (CONFIG_CFI_CLANG); these indirect calls might generate CFI failures.
> >
> > I recommend using wrapper functions to avoid this issue, like what is done in
> > arch/arm64/crypto/sha2-ce-glue.c.
> >
> > - Eric
>
> Here is the previous review comment for the assembly function wrapper:
> > > +asmlinkage void sha256_block_data_order_zvbb_zvknha(u32 *digest, const void *data,
> > > + unsigned int num_blks);
> > > +
> > > +static void __sha256_block_data_order(struct sha256_state *sst, u8 const *src,
> > > + int blocks)
> > > +{
> > > + sha256_block_data_order_zvbb_zvknha(sst->state, src, blocks);
> > > +}
> > Having a double-underscored function wrap around a non-underscored one like this
> > isn't conventional for Linux kernel code. IIRC some of the other crypto code
> > happens to do this, but it really is supposed to be the other way around.
> >
> > I think you should just declare the assembly function to take a 'struct
> > sha256_state', with a comment mentioning that only the 'u32 state[8]' at the
> > beginning is actually used. That's what arch/x86/crypto/sha256_ssse3_glue.c
> > does, for example. Then, __sha256_block_data_order() would be unneeded.
>
> Do you mean that we need the wrapper functions back for both SHA-* and SM3?
> If yes, we also don't need to check the state offset like:
> BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
>
> Could we just use the `SYM_TYPED_FUNC_START` in asm directly without the
> wrappers?

Sorry, I forgot that I had recommended against wrapper functions earlier. I
didn't realize that SYM_TYPED_FUNC_START was missing. Yes, you can also do it
without wrapper functions if you add SYM_TYPED_FUNC_START to the assembly.

- Eric

2023-11-28 17:55:26

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] RISC-V: crypto: add Zvkned accelerated AES implementation

> +static inline bool check_aes_ext(void)
> +{
> + return riscv_isa_extension_available(NULL, ZVKNED) &&
> + riscv_vector_vlen() >= 128;
> +}

I'm not keen on this construct, where you are checking vlen greater than
128 and the presence of Zvkned without checking for the presence of V
itself. Can you use "has_vector()" in any places where you depend on the
presence of vector please?
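
I.e. something like (sketch only, reusing the helper name from the patch):

	static inline bool check_aes_ext(void)
	{
		return has_vector() &&
		       riscv_isa_extension_available(NULL, ZVKNED) &&
		       riscv_vector_vlen() >= 128;
	}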

Also, there are potentially a lot of places in these drivers where you
can replace "riscv_isa_extension_available()" with
"riscv_has_extension_likely()". The latter is optimised with
alternatives, so in places that are going to be evaluated frequently it
may be beneficial for you.

Cheers,
Conor.



2023-11-28 20:13:04

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] RISC-V: crypto: add Zvkned accelerated AES implementation

On Tue, Nov 28, 2023 at 05:54:49PM +0000, Conor Dooley wrote:
> > +static inline bool check_aes_ext(void)
> > +{
> > + return riscv_isa_extension_available(NULL, ZVKNED) &&
> > + riscv_vector_vlen() >= 128;
> > +}
>
> I'm not keen on this construct, where you are checking vlen greater than
> 128 and the presence of Zvkned without checking for the presence of V
> itself. Can you use "has_vector()" in any places where you depend on the
> presence of vector please?

Shouldn't both of those things imply vector support already?

> Also, there are potentially a lot of places in this drivers where you
> can replace "riscv_isa_extension_available()" with
> "riscv_has_extension_likely()". The latter is optimised with
> alternatives, so in places that are going to be evaluated frequently it
> may be beneficial for you.

These extension checks are only executed in module_init functions, so they're
not performance critical.

- Eric

2023-11-29 02:40:21

by Jerry Shih

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] RISC-V: crypto: add Zvkned accelerated AES implementation

On Nov 29, 2023, at 04:12, Eric Biggers <[email protected]> wrote:
> On Tue, Nov 28, 2023 at 05:54:49PM +0000, Conor Dooley wrote:
>>> +static inline bool check_aes_ext(void)
>>> +{
>>> + return riscv_isa_extension_available(NULL, ZVKNED) &&
>>> + riscv_vector_vlen() >= 128;
>>> +}
>>
>> I'm not keen on this construct, where you are checking vlen greater than
>> 128 and the presence of Zvkned without checking for the presence of V
>> itself. Can you use "has_vector()" in any places where you depend on the
>> presence of vector please?
>
> Shouldn't both of those things imply vector support already?

The vector crypto extensions imply the `V` extension. Do we still need to check
for `V` explicitly?
https://github.com/riscv/riscv-crypto/blob/main/doc/vector/riscv-crypto-spec-vector.adoc#1-extensions-overview

>> Also, there are potentially a lot of places in this drivers where you
>> can replace "riscv_isa_extension_available()" with
>> "riscv_has_extension_likely()". The latter is optimised with
>> alternatives, so in places that are going to be evaluated frequently it
>> may be beneficial for you.
>
> These extension checks are only executed in module_init functions, so they're
> not performance critical.

All `riscv_isa_extension_available()` calls in the crypto drivers happen only once,
in the module init functions. Do we still need `riscv_has_extension_likely()`, given
that it adds a little more code size?

> - Eric

2023-11-29 05:32:33

by Jerry Shih

[permalink] [raw]
Subject: Re: [PATCH v2 12/13] RISC-V: crypto: add Zvksh accelerated SM3 implementation

On Nov 28, 2023, at 12:13, Eric Biggers <[email protected]> wrote:
> On Mon, Nov 27, 2023 at 03:07:02PM +0800, Jerry Shih wrote:
>> +static int __init riscv64_riscv64_sm3_mod_init(void)
>
> There's an extra "_riscv64" in this function name.
>
> - Eric

Fixed.

2023-11-29 07:57:49

by Jerry Shih

[permalink] [raw]
Subject: Re: [PATCH v2 07/13] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations

On Nov 28, 2023, at 12:07, Eric Biggers <[email protected]> wrote:
> On Mon, Nov 27, 2023 at 03:06:57PM +0800, Jerry Shih wrote:
>> +typedef void (*aes_xts_func)(const u8 *in, u8 *out, size_t length,
>> + const struct crypto_aes_ctx *key, u8 *iv,
>> + int update_iv);
>
> There's no need for this indirection, because the function pointer can only have
> one value.
>
> Note also that when Control Flow Integrity is enabled, assembly functions can
> only be called indirectly when they use SYM_TYPED_FUNC_START. That's another
> reason to avoid indirect calls that aren't actually necessary.

We have two function pointers for encryption and decryption.
static int xts_encrypt(struct skcipher_request *req)
{
return xts_crypt(req, rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt);
}

static int xts_decrypt(struct skcipher_request *req)
{
return xts_crypt(req, rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt);
}
The enc and dec paths could be folded together into `xts_crypt()`, but we would then
need extra branches for the enc/decryption path if we want to avoid the indirect calls.
Using `SYM_TYPED_FUNC_START` in the asm might be better.

>> + nbytes &= (~(AES_BLOCK_SIZE - 1));
>
> Expressions like ~(n - 1) should not have another set of parentheses around them

Fixed.

>> +static int xts_crypt(struct skcipher_request *req, aes_xts_func func)
>> +{
>> + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
>> + const struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
>> + struct skcipher_request sub_req;
>> + struct scatterlist sg_src[2], sg_dst[2];
>> + struct scatterlist *src, *dst;
>> + struct skcipher_walk walk;
>> + unsigned int walk_size = crypto_skcipher_walksize(tfm);
>> + unsigned int tail_bytes;
>> + unsigned int head_bytes;
>> + unsigned int nbytes;
>> + unsigned int update_iv = 1;
>> + int err;
>> +
>> + /* xts input size should be bigger than AES_BLOCK_SIZE */
>> + if (req->cryptlen < AES_BLOCK_SIZE)
>> + return -EINVAL;
>> +
>> + /*
>> + * We split xts-aes cryption into `head` and `tail` parts.
>> + * The head block contains the input from the beginning which doesn't need
>> + * `ciphertext stealing` method.
>> + * The tail block contains at least two AES blocks including ciphertext
>> + * stealing data from the end.
>> + */
>> + if (req->cryptlen <= walk_size) {
>> + /*
>> + * All data is in one `walk`. We could handle it within one AES-XTS call in
>> + * the end.
>> + */
>> + tail_bytes = req->cryptlen;
>> + head_bytes = 0;
>> + } else {
>> + if (req->cryptlen & (AES_BLOCK_SIZE - 1)) {
>> + /*
>> + * with ciphertext stealing
>> + *
>> + * Find the largest tail size which is small than `walk` size while the
>> + * head part still fits AES block boundary.
>> + */
>> + tail_bytes = req->cryptlen & (AES_BLOCK_SIZE - 1);
>> + tail_bytes = walk_size + tail_bytes - AES_BLOCK_SIZE;
>> + head_bytes = req->cryptlen - tail_bytes;
>> + } else {
>> + /* no ciphertext stealing */
>> + tail_bytes = 0;
>> + head_bytes = req->cryptlen;
>> + }
>> + }
>> +
>> + riscv64_aes_encrypt_zvkned(&ctx->ctx2, req->iv, req->iv);
>> +
>> + if (head_bytes && tail_bytes) {
>> + /* If we have to parts, setup new request for head part only. */
>> + skcipher_request_set_tfm(&sub_req, tfm);
>> + skcipher_request_set_callback(
>> + &sub_req, skcipher_request_flags(req), NULL, NULL);
>> + skcipher_request_set_crypt(&sub_req, req->src, req->dst,
>> + head_bytes, req->iv);
>> + req = &sub_req;
>> + }
>> +
>> + if (head_bytes) {
>> + err = skcipher_walk_virt(&walk, req, false);
>> + while ((nbytes = walk.nbytes)) {
>> + if (nbytes == walk.total)
>> + update_iv = (tail_bytes > 0);
>> +
>> + nbytes &= (~(AES_BLOCK_SIZE - 1));
>> + kernel_vector_begin();
>> + func(walk.src.virt.addr, walk.dst.virt.addr, nbytes,
>> + &ctx->ctx1, req->iv, update_iv);
>> + kernel_vector_end();
>> +
>> + err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
>> + }
>> + if (err || !tail_bytes)
>> + return err;
>> +
>> + /*
>> + * Setup new request for tail part.
>> + * We use `scatterwalk_next()` to find the next scatterlist from last
>> + * walk instead of iterating from the beginning.
>> + */
>> + dst = src = scatterwalk_next(sg_src, &walk.in);
>> + if (req->dst != req->src)
>> + dst = scatterwalk_next(sg_dst, &walk.out);
>> + skcipher_request_set_crypt(req, src, dst, tail_bytes, req->iv);
>> + }
>> +
>> + /* tail */
>> + err = skcipher_walk_virt(&walk, req, false);
>> + if (err)
>> + return err;
>> + if (walk.nbytes != tail_bytes)
>> + return -EINVAL;
>> + kernel_vector_begin();
>> + func(walk.src.virt.addr, walk.dst.virt.addr, walk.nbytes, &ctx->ctx1,
>> + req->iv, 0);
>> + kernel_vector_end();
>> +
>> + return skcipher_walk_done(&walk, 0);
>> +}
>
> Did you consider writing xts_crypt() the way that arm64 and x86 do it? The
> above seems to reinvent sort of the same thing from first principles. I'm
> wondering if you should just copy the existing approach for now. Then there
> would be no need to add the scatterwalk_next() function, and also the handling
> of inputs that don't need ciphertext stealing would be a bit more streamlined.

I will check the arm64 and x86 implementations.
But the `scatterwalk_next()` proposed in this series does the same thing as the
`scatterwalk_ffwd()` call in the arm64 and x86 implementations.
scatterwalk_ffwd() iterates from the beginning of the scatterlist (O(n)), while
scatterwalk_next() just continues from the end point of the last used
scatterlist (O(1)).

>> +static int __init riscv64_aes_block_mod_init(void)
>> +{
>> + int ret = -ENODEV;
>> +
>> + if (riscv_isa_extension_available(NULL, ZVKNED) &&
>> + riscv_vector_vlen() >= 128 && riscv_vector_vlen() <= 2048) {
>> + ret = simd_register_skciphers_compat(
>> + riscv64_aes_algs_zvkned,
>> + ARRAY_SIZE(riscv64_aes_algs_zvkned),
>> + riscv64_aes_simd_algs_zvkned);
>> + if (ret)
>> + return ret;
>> +
>> + if (riscv_isa_extension_available(NULL, ZVBB)) {
>> + ret = simd_register_skciphers_compat(
>> + riscv64_aes_alg_zvkned_zvkb,
>> + ARRAY_SIZE(riscv64_aes_alg_zvkned_zvkb),
>> + riscv64_aes_simd_alg_zvkned_zvkb);
>> + if (ret)
>> + goto unregister_zvkned;
>
> This makes the registration of the zvkned-zvkb algorithm conditional on zvbb,
> not zvkb. Shouldn't the extension checks actually look like:
>
> ZVKNED
> ZVKB
> ZVBB && ZVKG

Fixed.
But we will end up with nested conditions like:
if (ZVKNED) {
	reg_cipher_1();
	if (ZVKB) {
		reg_cipher_2();
	}
	if (ZVBB && ZVKG) {
		reg_cipher_3();
	}
}

> - Eric

2023-11-29 11:15:40

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] RISC-V: crypto: add Zvkned accelerated AES implementation

On Wed, Nov 29, 2023 at 10:39:56AM +0800, Jerry Shih wrote:
> On Nov 29, 2023, at 04:12, Eric Biggers <[email protected]> wrote:
> > On Tue, Nov 28, 2023 at 05:54:49PM +0000, Conor Dooley wrote:
> >>> +static inline bool check_aes_ext(void)
> >>> +{
> >>> + return riscv_isa_extension_available(NULL, ZVKNED) &&
> >>> + riscv_vector_vlen() >= 128;
> >>> +}
> >>
> >> I'm not keen on this construct, where you are checking vlen greater than
> >> 128 and the presence of Zvkned without checking for the presence of V
> >> itself. Can you use "has_vector()" in any places where you depend on the
> >> presence of vector please?
> >
> > Shouldn't both of those things imply vector support already?
>
> The vector crypto extensions imply `V` extension. Should we still need to check
> the `V` explicitly?
> https://github.com/riscv/riscv-crypto/blob/main/doc/vector/riscv-crypto-spec-vector.adoc#1-extensions-overview

The check for Zvkned only tells you whether or not Zvkned has been provided
in the DT or ACPI tables; it doesn't mean that the kernel supports the V
extension. I could see something like a hypervisor that does not support
vector parsing the "v" out of the DT or ACPI tables but not eliminating
every single extension that may depend on vector support.

The latter check is, IMO, an implementation detail and also should not
be used to imply that vector is supported.

Actually, Andy - questions for you. If the vsize is not homogeneous we do
not support vector for userspace and we disable vector in hwcap, but
riscv_v_vsize will have been set by riscv_fill_hwcap(). Is the disabling
of vector propagated to other locations in the kernel that inform
userspace, like hwprobe? I only skimmed the in-kernel vector patchset,
but I could not see anything there that ensures homogeneity either.
Should has_vector() calls start to fail if the vsize is not homogeneous?
I feel like they should, but I might very well be missing something here.

> >> Also, there are potentially a lot of places in this drivers where you
> >> can replace "riscv_isa_extension_available()" with
> >> "riscv_has_extension_likely()". The latter is optimised with
> >> alternatives, so in places that are going to be evaluated frequently it
> >> may be beneficial for you.
> >
> > These extension checks are only executed in module_init functions, so they're
> > not performance critical.

That's fine, they can continue as they are so.

Cheers,
Conor.



2023-11-29 20:16:34

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 07/13] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations

On Wed, Nov 29, 2023 at 03:57:25PM +0800, Jerry Shih wrote:
> On Nov 28, 2023, at 12:07, Eric Biggers <[email protected]> wrote:
> > On Mon, Nov 27, 2023 at 03:06:57PM +0800, Jerry Shih wrote:
> >> +typedef void (*aes_xts_func)(const u8 *in, u8 *out, size_t length,
> >> + const struct crypto_aes_ctx *key, u8 *iv,
> >> + int update_iv);
> >
> > There's no need for this indirection, because the function pointer can only have
> > one value.
> >
> > Note also that when Control Flow Integrity is enabled, assembly functions can
> > only be called indirectly when they use SYM_TYPED_FUNC_START. That's another
> > reason to avoid indirect calls that aren't actually necessary.
>
> We have two function pointers for encryption and decryption.
> static int xts_encrypt(struct skcipher_request *req)
> {
> return xts_crypt(req, rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt);
> }
>
> static int xts_decrypt(struct skcipher_request *req)
> {
> return xts_crypt(req, rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt);
> }
> The enc and dec path could be folded together into `xts_crypt()`, but we will have
> additional branches for enc/decryption path if we don't want to have the indirect calls.
> Use `SYM_TYPED_FUNC_START` in asm might be better.
>

Right. Normal branches are still more efficient and straightforward than
indirect calls, though, and they don't need any special considerations for CFI.
So I'd just add a 'bool encrypt' or 'bool decrypt' argument to xts_crypt(), and
make xts_crypt() call the appropriate assembly function based on that.
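
Something like the following (sketch only; the asm symbol names are the ones
used in this patch):

	static int xts_crypt(struct skcipher_request *req, bool encrypt)
	{
		...
		kernel_vector_begin();
		if (encrypt)
			rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt(
				walk.src.virt.addr, walk.dst.virt.addr,
				nbytes, &ctx->ctx1, req->iv, update_iv);
		else
			rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt(
				walk.src.virt.addr, walk.dst.virt.addr,
				nbytes, &ctx->ctx1, req->iv, update_iv);
		kernel_vector_end();
		...
	}

	static int xts_encrypt(struct skcipher_request *req)
	{
		return xts_crypt(req, true);
	}

	static int xts_decrypt(struct skcipher_request *req)
	{
		return xts_crypt(req, false);
	}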

> > Did you consider writing xts_crypt() the way that arm64 and x86 do it? The
> > above seems to reinvent sort of the same thing from first principles. I'm
> > wondering if you should just copy the existing approach for now. Then there
> > would be no need to add the scatterwalk_next() function, and also the handling
> > of inputs that don't need ciphertext stealing would be a bit more streamlined.
>
> I will check the arm and x86's implementations.
> But the `scatterwalk_next()` proposed in this series does the same thing as the
> call `scatterwalk_ffwd()` in arm and x86's implementations.
> The scatterwalk_ffwd() iterates from the beginning of scatterlist(O(n)), but the
> scatterwalk_next() is just iterates from the end point of the last used
> scatterlist(O(1)).

Sure, but your scatterwalk_next() only matters when there are multiple
scatterlist entries and the AES-XTS message length isn't a multiple of the AES
block size. That's not an important case, so there's little need to
micro-optimize it. The case that actually matters for AES-XTS is a single-entry
scatterlist containing a whole number of AES blocks.

- Eric

2023-11-29 20:26:20

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] RISC-V: crypto: add Zvkned accelerated AES implementation

On Wed, Nov 29, 2023 at 11:12:16AM +0000, Conor Dooley wrote:
> On Wed, Nov 29, 2023 at 10:39:56AM +0800, Jerry Shih wrote:
> > On Nov 29, 2023, at 04:12, Eric Biggers <[email protected]> wrote:
> > > On Tue, Nov 28, 2023 at 05:54:49PM +0000, Conor Dooley wrote:
> > >>> +static inline bool check_aes_ext(void)
> > >>> +{
> > >>> + return riscv_isa_extension_available(NULL, ZVKNED) &&
> > >>> + riscv_vector_vlen() >= 128;
> > >>> +}
> > >>
> > >> I'm not keen on this construct, where you are checking vlen greater than
> > >> 128 and the presence of Zvkned without checking for the presence of V
> > >> itself. Can you use "has_vector()" in any places where you depend on the
> > >> presence of vector please?
> > >
> > > Shouldn't both of those things imply vector support already?
> >
> > The vector crypto extensions imply `V` extension. Should we still need to check
> > the `V` explicitly?
> > https://github.com/riscv/riscv-crypto/blob/main/doc/vector/riscv-crypto-spec-vector.adoc#1-extensions-overview
>
> The check for Zkvned is only for whether or not Zvkned has been provided
> in the DT or ACPI tables, it doesn't mean that the kernel supports the V
> extension. I could see something like a hypervisor that does not support
> vector parsing the "v" out of the DT or ACPI tables but not eliminating
> every single extension that may depend on vector support.
>
> The latter check is, IMO, an implementation detail and also should not
> be used to imply that vector is supported.

First, the RISC-V crypto files are only compiled when CONFIG_RISCV_ISA_V=y.
So in those files, we know that the kernel supports V if the hardware does.

If the hardware can indeed declare extensions like Zvkned without declaring V,
that sounds problematic. Would /proc/cpuinfo end up with the same misleading
information in that case, in which case userspace would have the same problem
too? I think that such misconfigurations are best handled centrally by having
the low-level architecture code in the kernel clear all extensions that depend
on missing extensions. IIRC there have been issues like this on x86, and that
was the fix that was implemented. See arch/x86/kernel/cpu/cpuid-deps.c

- Eric

2023-12-01 02:10:33

by Jerry Shih

[permalink] [raw]
Subject: Re: [PATCH v2 05/13] crypto: simd - Update `walksize` in simd skcipher

On Nov 29, 2023, at 01:22, Eric Biggers <[email protected]> wrote:
> On Tue, Nov 28, 2023 at 01:38:29PM +0800, Jerry Shih wrote:
>> On Nov 28, 2023, at 11:58, Eric Biggers <[email protected]> wrote:
>>> On Mon, Nov 27, 2023 at 03:06:55PM +0800, Jerry Shih wrote:
>>>> The `walksize` assignment is missed in simd skcipher.
>>>>
>>>> Signed-off-by: Jerry Shih <[email protected]>
>>>> ---
>>>> crypto/cryptd.c | 1 +
>>>> crypto/simd.c | 1 +
>>>> 2 files changed, 2 insertions(+)
>>>>
>>>> diff --git a/crypto/cryptd.c b/crypto/cryptd.c
>>>> index bbcc368b6a55..253d13504ccb 100644
>>>> --- a/crypto/cryptd.c
>>>> +++ b/crypto/cryptd.c
>>>> @@ -405,6 +405,7 @@ static int cryptd_create_skcipher(struct crypto_template *tmpl,
>>>> (alg->base.cra_flags & CRYPTO_ALG_INTERNAL);
>>>> inst->alg.ivsize = crypto_skcipher_alg_ivsize(alg);
>>>> inst->alg.chunksize = crypto_skcipher_alg_chunksize(alg);
>>>> + inst->alg.walksize = crypto_skcipher_alg_walksize(alg);
>>>> inst->alg.min_keysize = crypto_skcipher_alg_min_keysize(alg);
>>>> inst->alg.max_keysize = crypto_skcipher_alg_max_keysize(alg);
>>>>
>>>> diff --git a/crypto/simd.c b/crypto/simd.c
>>>> index edaa479a1ec5..ea0caabf90f1 100644
>>>> --- a/crypto/simd.c
>>>> +++ b/crypto/simd.c
>>>> @@ -181,6 +181,7 @@ struct simd_skcipher_alg *simd_skcipher_create_compat(const char *algname,
>>>>
>>>> alg->ivsize = ialg->ivsize;
>>>> alg->chunksize = ialg->chunksize;
>>>> + alg->walksize = ialg->walksize;
>>>> alg->min_keysize = ialg->min_keysize;
>>>> alg->max_keysize = ialg->max_keysize;
>>>
>>> What are the consequences of this bug? I wonder if it actually matters? The
>>> "inner" algorithm is the one that actually gets used for the "walk", right?
>>>
>>> - Eric
>>
>> Without this, we might still use chunksize or cra_blocksize as the walksize
>> even though we setup with the larger walksize.
>>
>> Here is the code for the walksize default value:
>> static int skcipher_prepare_alg(struct skcipher_alg *alg)
>> {
>> ...
>> if (!alg->chunksize)
>> alg->chunksize = base->cra_blocksize;
>> if (!alg->walksize)
>> alg->walksize = alg->chunksize;
>>
>> And we already have the bigger walksize for x86 aes-xts.
>> .base = {
>> .cra_name = "__xts(aes)",
>> ...
>> },
>> .walksize = 2 * AES_BLOCK_SIZE,
>>
>> The x86 aes-xts only uses one `walk` to handle the tail elements. It assumes
>> that the walksize contains 2 aes blocks. If walksize is not set correctly, maybe
>> some tail elements is not processed in simd-cipher mode for x86 aes-xts.
>
> With the SIMD helper there are three "algorithms": the underlying algorithm, the
> cryptd algorithm, and the simd algorithm. This patch makes the "walksize"
> property be propagated from the underlying algorithm to the cryptd and simd
> algorithms. I don't see how that actually makes a difference, since the only
> place the skcipher_walk happens is on the underlying algorithm. So it uses the
> "walksize" from the underlying algorithm, right?
>
> - Eric

Yes, you are right.
I re-checked the cryptd and simd cipher flow. They use the underlying algorithms.
So the actual `walksize` of the underlying algorithm is the one the user sets in
the skcipher_alg definition.
The x86 aes-xts works correctly for both the cryptd and simd-cipher cases.

With that, this patch only fixes the `walksize` value displayed in `/proc/crypto`.

The aes-xts skcipher_alg def:
	...
	.ivsize = AES_BLOCK_SIZE,
	.chunksize = AES_BLOCK_SIZE,
	.walksize = AES_BLOCK_SIZE * 8,
	.base = {
		.cra_flags = CRYPTO_ALG_INTERNAL,
		.cra_name = "__xts(aes)",
		.cra_driver_name = "__xts-aes-riscv64-zvkned-zvbb-zvkg",
		...
	},


Without patch:
The original skcipher:
name : __xts(aes)
driver : __xts-aes-riscv64-zvkned-zvbb-zvkg
internal : yes
async : no
...
walksize : 128

The async skcipher registered by simd_register_skciphers_compat:
name : xts(aes)
driver : xts-aes-riscv64-zvkned-zvbb-zvkg
internal : no
async : yes
...
walksize : 16

...
name : __xts(aes)
driver : cryptd(__xts-aes-riscv64-zvkned-zvbb-zvkg)
internal : yes
async : yes
...
walksize : 16

With patch:
name : xts(aes)
driver : xts-aes-riscv64-zvkned-zvbb-zvkg
internal : no
async : yes
...
walksize : 128

...
name : __xts(aes)
driver : cryptd(__xts-aes-riscv64-zvkned-zvbb-zvkg)
internal : yes
async : yes
...
walksize : 128

2023-12-02 13:21:00

by Jerry Shih

[permalink] [raw]
Subject: Re: [PATCH v2 07/13] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations

On Nov 30, 2023, at 04:16, Eric Biggers <[email protected]> wrote:
> On Wed, Nov 29, 2023 at 03:57:25PM +0800, Jerry Shih wrote:
>> On Nov 28, 2023, at 12:07, Eric Biggers <[email protected]> wrote:
>>> On Mon, Nov 27, 2023 at 03:06:57PM +0800, Jerry Shih wrote:
>>>> +typedef void (*aes_xts_func)(const u8 *in, u8 *out, size_t length,
>>>> + const struct crypto_aes_ctx *key, u8 *iv,
>>>> + int update_iv);
>>>
>>> There's no need for this indirection, because the function pointer can only have
>>> one value.
>>>
>>> Note also that when Control Flow Integrity is enabled, assembly functions can
>>> only be called indirectly when they use SYM_TYPED_FUNC_START. That's another
>>> reason to avoid indirect calls that aren't actually necessary.
>>
>> We have two function pointers for encryption and decryption.
>> static int xts_encrypt(struct skcipher_request *req)
>> {
>> return xts_crypt(req, rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt);
>> }
>>
>> static int xts_decrypt(struct skcipher_request *req)
>> {
>> return xts_crypt(req, rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt);
>> }
>> The enc and dec path could be folded together into `xts_crypt()`, but we will have
>> additional branches for enc/decryption path if we don't want to have the indirect calls.
>> Use `SYM_TYPED_FUNC_START` in asm might be better.
>>
>
> Right. Normal branches are still more efficient and straightforward than
> indirect calls, though, and they don't need any special considerations for CFI.
> So I'd just add a 'bool encrypt' or 'bool decrypt' argument to xts_crypt(), and
> make xts_crypt() call the appropriate assembly function based on that.

Fixed.
The xts_crypt() now has an additional bool argument for enc/decryption.

>>> Did you consider writing xts_crypt() the way that arm64 and x86 do it? The
>>> above seems to reinvent sort of the same thing from first principles. I'm
>>> wondering if you should just copy the existing approach for now. Then there
>>> would be no need to add the scatterwalk_next() function, and also the handling
>>> of inputs that don't need ciphertext stealing would be a bit more streamlined.
>>
>> I will check the arm and x86's implementations.
>> But the `scatterwalk_next()` proposed in this series does the same thing as the
>> call `scatterwalk_ffwd()` in arm and x86's implementations.
>> The scatterwalk_ffwd() iterates from the beginning of scatterlist(O(n)), but the
>> scatterwalk_next() is just iterates from the end point of the last used
>> scatterlist(O(1)).
>
> Sure, but your scatterwalk_next() only matters when there are multiple
> scatterlist entries and the AES-XTS message length isn't a multiple of the AES
> block size. That's not an important case, so there's little need to
> micro-optimize it. The case that actually matters for AES-XTS is a single-entry
> scatterlist containing a whole number of AES blocks.

The v3 patch will remove the `scatterwalk_next()` and use `scatterwalk_ffwd()`
instead.

-Jerry

2023-12-08 04:06:17

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v2 05/13] crypto: simd - Update `walksize` in simd skcipher

On Mon, Nov 27, 2023 at 03:06:55PM +0800, Jerry Shih wrote:
> The `walksize` assignment is missed in simd skcipher.
>
> Signed-off-by: Jerry Shih <[email protected]>
> ---
> crypto/cryptd.c | 1 +
> crypto/simd.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/crypto/cryptd.c b/crypto/cryptd.c
> index bbcc368b6a55..253d13504ccb 100644
> --- a/crypto/cryptd.c
> +++ b/crypto/cryptd.c
> @@ -405,6 +405,7 @@ static int cryptd_create_skcipher(struct crypto_template *tmpl,
> (alg->base.cra_flags & CRYPTO_ALG_INTERNAL);
> inst->alg.ivsize = crypto_skcipher_alg_ivsize(alg);
> inst->alg.chunksize = crypto_skcipher_alg_chunksize(alg);
> + inst->alg.walksize = crypto_skcipher_alg_walksize(alg);
> inst->alg.min_keysize = crypto_skcipher_alg_min_keysize(alg);
> inst->alg.max_keysize = crypto_skcipher_alg_max_keysize(alg);

Sorry but this patch doesn't apply any more now that we have
lskcipher.
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2023-12-08 04:19:00

by Jerry Shih

[permalink] [raw]
Subject: Re: [PATCH v2 05/13] crypto: simd - Update `walksize` in simd skcipher

On Dec 8, 2023, at 12:05, Herbert Xu <[email protected]> wrote:
> On Mon, Nov 27, 2023 at 03:06:55PM +0800, Jerry Shih wrote:
>> The `walksize` assignment is missed in simd skcipher.
>>
>> Signed-off-by: Jerry Shih <[email protected]>
>> ---
>> crypto/cryptd.c | 1 +
>> crypto/simd.c | 1 +
>> 2 files changed, 2 insertions(+)
>>
>> diff --git a/crypto/cryptd.c b/crypto/cryptd.c
>> index bbcc368b6a55..253d13504ccb 100644
>> --- a/crypto/cryptd.c
>> +++ b/crypto/cryptd.c
>> @@ -405,6 +405,7 @@ static int cryptd_create_skcipher(struct crypto_template *tmpl,
>> (alg->base.cra_flags & CRYPTO_ALG_INTERNAL);
>> inst->alg.ivsize = crypto_skcipher_alg_ivsize(alg);
>> inst->alg.chunksize = crypto_skcipher_alg_chunksize(alg);
>> + inst->alg.walksize = crypto_skcipher_alg_walksize(alg);
>> inst->alg.min_keysize = crypto_skcipher_alg_min_keysize(alg);
>> inst->alg.max_keysize = crypto_skcipher_alg_max_keysize(alg);
>
> Sorry but this patch doesn't apply any more now that we have
> lskcipher.

The lskcipher work was merged in kernel `6.7`. I will rebase the v3 series onto `6.7` later.
Link: https://lore.kernel.org/all/[email protected]/

Some of the dependent patches no longer apply to `6.7`. I will check the status of the
dependent patches.

-Jerry