2023-10-25 18:37:09

by Jerry Shih

Subject: [PATCH 00/12] RISC-V: provide some accelerated cryptography implementations using vector extensions

This series provides cryptographic implementations using the vector crypto
extensions[1], including:
1. AES cipher
2. AES with CBC/CTR/ECB/XTS block modes
3. ChaCha20 stream cipher
4. GHASH for GCM
5. SHA-224/256 and SHA-384/512 hash
6. SM3 hash
7. SM4 cipher

This patch set is based on Heiko Stuebner's work at:
Link: https://lore.kernel.org/all/[email protected]/

The implementations reuse the perl-asm scripts from OpenSSL[2] with some
changes adapting them to the kernel crypto framework.
The perl-asm scripts emit raw opcodes (`.word` directives) into the generated
`.S` files instead of assembler mnemonics, because the assembler does not yet
support the vector crypto extensions.

All changes pass the kernel run-time crypto self-tests, including the extra
tests, with vector-crypto-enabled QEMU.
Link: https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg08822.html

This series depends on:
1. kernel 6.6-rc7
Link: https://github.com/torvalds/linux/commit/05d3ef8bba77c1b5f98d941d8b2d4aeab8118ef1
2. kernel-mode vector support
Link: https://lore.kernel.org/all/[email protected]/
3. vector crypto extensions detection
Link: https://lore.kernel.org/lkml/[email protected]/
4. a fix for the error message:
alg: skcipher: skipping comparison tests for xts-aes-aesni because
xts(ecb(aes-generic)) is unavailable
Link: https://lore.kernel.org/linux-crypto/[email protected]/

Here is a branch on GitHub with all dependent patches applied:
Link: https://github.com/JerryShih/linux/tree/dev/jerrys/vector-crypto-upstream

[1]
Link: https://github.com/riscv/riscv-crypto/blob/56ed7952d13eb5bdff92e2b522404668952f416d/doc/vector/riscv-crypto-spec-vector.adoc
[2]
Link: https://github.com/openssl/openssl/pull/21923

Heiko Stuebner (2):
RISC-V: add helper function to read the vector VLEN
RISC-V: hook new crypto subdir into build-system

Jerry Shih (10):
RISC-V: crypto: add OpenSSL perl module for vector instructions
RISC-V: crypto: add Zvkned accelerated AES implementation
crypto: scatterwalk - Add scatterwalk_next() to get the next
scatterlist in scatter_walk
RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations
RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation
RISC-V: crypto: add Zvknha/b accelerated SHA224/256 implementations
RISC-V: crypto: add Zvknhb accelerated SHA384/512 implementations
RISC-V: crypto: add Zvksed accelerated SM4 implementation
RISC-V: crypto: add Zvksh accelerated SM3 implementation
RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation

arch/riscv/Kbuild | 1 +
arch/riscv/crypto/Kconfig | 122 ++
arch/riscv/crypto/Makefile | 68 +
.../crypto/aes-riscv64-block-mode-glue.c | 486 +++++++
arch/riscv/crypto/aes-riscv64-glue.c | 163 +++
arch/riscv/crypto/aes-riscv64-glue.h | 28 +
.../crypto/aes-riscv64-zvbb-zvkg-zvkned.pl | 944 +++++++++++++
arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl | 416 ++++++
arch/riscv/crypto/aes-riscv64-zvkned.pl | 1198 +++++++++++++++++
arch/riscv/crypto/chacha-riscv64-glue.c | 120 ++
arch/riscv/crypto/chacha-riscv64-zvkb.pl | 322 +++++
arch/riscv/crypto/ghash-riscv64-glue.c | 191 +++
arch/riscv/crypto/ghash-riscv64-zvkg.pl | 100 ++
arch/riscv/crypto/riscv.pm | 828 ++++++++++++
arch/riscv/crypto/sha256-riscv64-glue.c | 140 ++
.../sha256-riscv64-zvkb-zvknha_or_zvknhb.pl | 318 +++++
arch/riscv/crypto/sha512-riscv64-glue.c | 133 ++
.../crypto/sha512-riscv64-zvkb-zvknhb.pl | 266 ++++
arch/riscv/crypto/sm3-riscv64-glue.c | 121 ++
arch/riscv/crypto/sm3-riscv64-zvksh.pl | 230 ++++
arch/riscv/crypto/sm4-riscv64-glue.c | 120 ++
arch/riscv/crypto/sm4-riscv64-zvksed.pl | 268 ++++
arch/riscv/include/asm/vector.h | 11 +
crypto/Kconfig | 3 +
include/crypto/scatterwalk.h | 9 +-
25 files changed, 6604 insertions(+), 2 deletions(-)
create mode 100644 arch/riscv/crypto/Kconfig
create mode 100644 arch/riscv/crypto/Makefile
create mode 100644 arch/riscv/crypto/aes-riscv64-block-mode-glue.c
create mode 100644 arch/riscv/crypto/aes-riscv64-glue.c
create mode 100644 arch/riscv/crypto/aes-riscv64-glue.h
create mode 100644 arch/riscv/crypto/aes-riscv64-zvbb-zvkg-zvkned.pl
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.pl
create mode 100644 arch/riscv/crypto/chacha-riscv64-glue.c
create mode 100644 arch/riscv/crypto/chacha-riscv64-zvkb.pl
create mode 100644 arch/riscv/crypto/ghash-riscv64-glue.c
create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.pl
create mode 100644 arch/riscv/crypto/riscv.pm
create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl
create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sha512-riscv64-zvkb-zvknhb.pl
create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh.pl
create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed.pl

--
2.28.0


2023-10-25 18:37:20

by Jerry Shih

Subject: [PATCH 02/12] RISC-V: hook new crypto subdir into build-system

From: Heiko Stuebner <[email protected]>

Create a crypto subdirectory for the accelerated cryptography routines being
added, and hook it into the riscv Kbuild and the main crypto Kconfig.

Signed-off-by: Heiko Stuebner <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
arch/riscv/Kbuild | 1 +
arch/riscv/crypto/Kconfig | 5 +++++
arch/riscv/crypto/Makefile | 4 ++++
crypto/Kconfig | 3 +++
4 files changed, 13 insertions(+)
create mode 100644 arch/riscv/crypto/Kconfig
create mode 100644 arch/riscv/crypto/Makefile

diff --git a/arch/riscv/Kbuild b/arch/riscv/Kbuild
index d25ad1c19f88..2c585f7a0b6e 100644
--- a/arch/riscv/Kbuild
+++ b/arch/riscv/Kbuild
@@ -2,6 +2,7 @@

obj-y += kernel/ mm/ net/
obj-$(CONFIG_BUILTIN_DTB) += boot/dts/
+obj-$(CONFIG_CRYPTO) += crypto/
obj-y += errata/
obj-$(CONFIG_KVM) += kvm/

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
new file mode 100644
index 000000000000..10d60edc0110
--- /dev/null
+++ b/arch/riscv/crypto/Kconfig
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
+
+endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
new file mode 100644
index 000000000000..b3b6332c9f6d
--- /dev/null
+++ b/arch/riscv/crypto/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# linux/arch/riscv/crypto/Makefile
+#
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 650b1b3620d8..c7b23d2c58e4 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1436,6 +1436,9 @@ endif
if PPC
source "arch/powerpc/crypto/Kconfig"
endif
+if RISCV
+source "arch/riscv/crypto/Kconfig"
+endif
if S390
source "arch/s390/crypto/Kconfig"
endif
--
2.28.0

2023-10-25 18:37:40

by Jerry Shih

Subject: [PATCH 04/12] RISC-V: crypto: add Zvkned accelerated AES implementation

Add an AES implementation using the Zvkned vector crypto extension, ported
from OpenSSL (openssl/openssl#21923).

Co-developed-by: Christoph Müllner <[email protected]>
Signed-off-by: Christoph Müllner <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Co-developed-by: Phoebe Chen <[email protected]>
Signed-off-by: Phoebe Chen <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
arch/riscv/crypto/Kconfig | 12 +
arch/riscv/crypto/Makefile | 11 +
arch/riscv/crypto/aes-riscv64-glue.c | 163 +++++++++
arch/riscv/crypto/aes-riscv64-glue.h | 28 ++
arch/riscv/crypto/aes-riscv64-zvkned.pl | 451 ++++++++++++++++++++++++
5 files changed, 665 insertions(+)
create mode 100644 arch/riscv/crypto/aes-riscv64-glue.c
create mode 100644 arch/riscv/crypto/aes-riscv64-glue.h
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 10d60edc0110..500938317e71 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -2,4 +2,16 @@

menu "Accelerated Cryptographic Algorithms for CPU (riscv)"

+config CRYPTO_AES_RISCV64
+ default y if RISCV_ISA_V
+ tristate "Ciphers: AES"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_AES
+ select CRYPTO_ALGAPI
+ help
+ Block ciphers: AES cipher algorithms (FIPS-197)
+
+ Architecture: riscv64 using:
+ - Zvkned vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index b3b6332c9f6d..90ca91d8df26 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -2,3 +2,14 @@
#
# linux/arch/riscv/crypto/Makefile
#
+
+obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
+aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o
+
+quiet_cmd_perlasm = PERLASM $@
+ cmd_perlasm = $(PERL) $(<) void $(@)
+
+$(obj)/aes-riscv64-zvkned.S: $(src)/aes-riscv64-zvkned.pl
+ $(call cmd,perlasm)
+
+clean-files += aes-riscv64-zvkned.S
diff --git a/arch/riscv/crypto/aes-riscv64-glue.c b/arch/riscv/crypto/aes-riscv64-glue.c
new file mode 100644
index 000000000000..5c4f1018d3aa
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-glue.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Port of the OpenSSL AES implementation for RISC-V
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/aes.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/simd.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+#include "aes-riscv64-glue.h"
+
+/*
+ * aes cipher using zvkned vector crypto extension
+ *
+ * All zvkned-based functions use the encryption-side expanded key for both encryption
+ * and decryption.
+ */
+void rv64i_zvkned_encrypt(const u8 *in, u8 *out, const struct aes_key *key);
+void rv64i_zvkned_decrypt(const u8 *in, u8 *out, const struct aes_key *key);
+
+static inline int aes_round_num(unsigned int keylen)
+{
+ switch (keylen) {
+ case AES_KEYSIZE_128:
+ return 10;
+ case AES_KEYSIZE_192:
+ return 12;
+ case AES_KEYSIZE_256:
+ return 14;
+ default:
+ return 0;
+ }
+}
+
+int riscv64_aes_setkey(struct riscv64_aes_ctx *ctx, const u8 *key,
+ unsigned int keylen)
+{
+ /*
+ * The RISC-V AES vector crypto key expansion doesn't support AES-192.
+ * We just use the generic software key expansion here to simplify the key
+ * expansion flow.
+ */
+ u32 aes_rounds;
+ u32 key_length;
+ int ret;
+
+ ret = aes_expandkey(&ctx->fallback_ctx, key, keylen);
+ if (ret < 0)
+ return -EINVAL;
+
+ /*
+ * Copy the key from `crypto_aes_ctx` to `aes_key` for zvkned-based AES
+ * implementations.
+ */
+ aes_rounds = aes_round_num(keylen);
+ ctx->key.rounds = aes_rounds;
+ key_length = AES_BLOCK_SIZE * (aes_rounds + 1);
+ memcpy(ctx->key.key, ctx->fallback_ctx.key_enc, key_length);
+
+ return 0;
+}
+
+void riscv64_aes_encrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst,
+ const u8 *src)
+{
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ rv64i_zvkned_encrypt(src, dst, &ctx->key);
+ kernel_vector_end();
+ } else {
+ aes_encrypt(&ctx->fallback_ctx, dst, src);
+ }
+}
+
+void riscv64_aes_decrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst,
+ const u8 *src)
+{
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ rv64i_zvkned_decrypt(src, dst, &ctx->key);
+ kernel_vector_end();
+ } else {
+ aes_decrypt(&ctx->fallback_ctx, dst, src);
+ }
+}
+
+static int aes_setkey(struct crypto_tfm *tfm, const u8 *key,
+ unsigned int keylen)
+{
+ struct riscv64_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ return riscv64_aes_setkey(ctx, key, keylen);
+}
+
+static void aes_encrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+ const struct riscv64_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ riscv64_aes_encrypt_zvkned(ctx, dst, src);
+}
+
+static void aes_decrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+ const struct riscv64_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ riscv64_aes_decrypt_zvkned(ctx, dst, src);
+}
+
+static struct crypto_alg riscv64_aes_alg_zvkned = {
+ .cra_name = "aes",
+ .cra_driver_name = "aes-riscv64-zvkned",
+ .cra_module = THIS_MODULE,
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_TYPE_CIPHER,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct riscv64_aes_ctx),
+ .cra_cipher = {
+ .cia_min_keysize = AES_MIN_KEY_SIZE,
+ .cia_max_keysize = AES_MAX_KEY_SIZE,
+ .cia_setkey = aes_setkey,
+ .cia_encrypt = aes_encrypt_zvkned,
+ .cia_decrypt = aes_decrypt_zvkned,
+ },
+};
+
+static inline bool check_aes_ext(void)
+{
+ return riscv_isa_extension_available(NULL, ZVKNED) &&
+ riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_aes_mod_init(void)
+{
+ if (check_aes_ext())
+ return crypto_register_alg(&riscv64_aes_alg_zvkned);
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_aes_mod_fini(void)
+{
+ if (check_aes_ext())
+ crypto_unregister_alg(&riscv64_aes_alg_zvkned);
+}
+
+module_init(riscv64_aes_mod_init);
+module_exit(riscv64_aes_mod_fini);
+
+MODULE_DESCRIPTION("AES (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/riscv/crypto/aes-riscv64-glue.h b/arch/riscv/crypto/aes-riscv64-glue.h
new file mode 100644
index 000000000000..7f1f675aca0d
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-glue.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef AES_RISCV64_GLUE_H
+#define AES_RISCV64_GLUE_H
+
+#include <crypto/aes.h>
+#include <linux/types.h>
+
+struct aes_key {
+ u32 key[AES_MAX_KEYLENGTH_U32];
+ u32 rounds;
+};
+
+struct riscv64_aes_ctx {
+ struct aes_key key;
+ struct crypto_aes_ctx fallback_ctx;
+};
+
+int riscv64_aes_setkey(struct riscv64_aes_ctx *ctx, const u8 *key,
+ unsigned int keylen);
+
+void riscv64_aes_encrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst,
+ const u8 *src);
+
+void riscv64_aes_decrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst,
+ const u8 *src);
+
+#endif /* AES_RISCV64_GLUE_H */
diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl
new file mode 100644
index 000000000000..c0ecde77bf56
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl
@@ -0,0 +1,451 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <[email protected]>
+# Copyright (c) 2023, Phoebe Chen <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector AES block cipher extension ('Zvkned')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+ $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+ $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+ $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+{
+################################################################################
+# void rv64i_zvkned_encrypt(const unsigned char *in, unsigned char *out,
+# const AES_KEY *key);
+my ($INP, $OUTP, $KEYP) = ("a0", "a1", "a2");
+my ($T0, $T1, $ROUNDS, $T6) = ("a3", "a4", "t5", "t6");
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_encrypt
+.type rv64i_zvkned_encrypt,\@function
+rv64i_zvkned_encrypt:
+ # Load number of rounds
+ lwu $ROUNDS, 240($KEYP)
+
+ # Get proper routine for key size
+ li $T6, 14
+ beq $ROUNDS, $T6, L_enc_256
+ li $T6, 10
+ beq $ROUNDS, $T6, L_enc_128
+ li $T6, 12
+ beq $ROUNDS, $T6, L_enc_192
+
+ j L_fail_m2
+.size rv64i_zvkned_encrypt,.-rv64i_zvkned_encrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_enc_128:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ @{[vle32_v $V1, ($INP)]}
+
+ @{[vle32_v $V10, ($KEYP)]}
+ @{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, ($KEYP)]}
+ @{[vaesem_vs $V1, $V11]} # with round key w[ 4, 7]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V12, ($KEYP)]}
+ @{[vaesem_vs $V1, $V12]} # with round key w[ 8,11]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V13, ($KEYP)]}
+ @{[vaesem_vs $V1, $V13]} # with round key w[12,15]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V14, ($KEYP)]}
+ @{[vaesem_vs $V1, $V14]} # with round key w[16,19]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V15, ($KEYP)]}
+ @{[vaesem_vs $V1, $V15]} # with round key w[20,23]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V16, ($KEYP)]}
+ @{[vaesem_vs $V1, $V16]} # with round key w[24,27]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V17, ($KEYP)]}
+ @{[vaesem_vs $V1, $V17]} # with round key w[28,31]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V18, ($KEYP)]}
+ @{[vaesem_vs $V1, $V18]} # with round key w[32,35]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V19, ($KEYP)]}
+ @{[vaesem_vs $V1, $V19]} # with round key w[36,39]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V20, ($KEYP)]}
+ @{[vaesef_vs $V1, $V20]} # with round key w[40,43]
+
+ @{[vse32_v $V1, ($OUTP)]}
+
+ ret
+.size L_enc_128,.-L_enc_128
+___
+
+$code .= <<___;
+.p2align 3
+L_enc_192:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ @{[vle32_v $V1, ($INP)]}
+
+ @{[vle32_v $V10, ($KEYP)]}
+ @{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, ($KEYP)]}
+ @{[vaesem_vs $V1, $V11]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V12, ($KEYP)]}
+ @{[vaesem_vs $V1, $V12]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V13, ($KEYP)]}
+ @{[vaesem_vs $V1, $V13]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V14, ($KEYP)]}
+ @{[vaesem_vs $V1, $V14]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V15, ($KEYP)]}
+ @{[vaesem_vs $V1, $V15]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V16, ($KEYP)]}
+ @{[vaesem_vs $V1, $V16]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V17, ($KEYP)]}
+ @{[vaesem_vs $V1, $V17]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V18, ($KEYP)]}
+ @{[vaesem_vs $V1, $V18]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V19, ($KEYP)]}
+ @{[vaesem_vs $V1, $V19]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V20, ($KEYP)]}
+ @{[vaesem_vs $V1, $V20]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V21, ($KEYP)]}
+ @{[vaesem_vs $V1, $V21]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V22, ($KEYP)]}
+ @{[vaesef_vs $V1, $V22]}
+
+ @{[vse32_v $V1, ($OUTP)]}
+ ret
+.size L_enc_192,.-L_enc_192
+___
+
+$code .= <<___;
+.p2align 3
+L_enc_256:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ @{[vle32_v $V1, ($INP)]}
+
+ @{[vle32_v $V10, ($KEYP)]}
+ @{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3]
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, ($KEYP)]}
+ @{[vaesem_vs $V1, $V11]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V12, ($KEYP)]}
+ @{[vaesem_vs $V1, $V12]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V13, ($KEYP)]}
+ @{[vaesem_vs $V1, $V13]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V14, ($KEYP)]}
+ @{[vaesem_vs $V1, $V14]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V15, ($KEYP)]}
+ @{[vaesem_vs $V1, $V15]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V16, ($KEYP)]}
+ @{[vaesem_vs $V1, $V16]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V17, ($KEYP)]}
+ @{[vaesem_vs $V1, $V17]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V18, ($KEYP)]}
+ @{[vaesem_vs $V1, $V18]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V19, ($KEYP)]}
+ @{[vaesem_vs $V1, $V19]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V20, ($KEYP)]}
+ @{[vaesem_vs $V1, $V20]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V21, ($KEYP)]}
+ @{[vaesem_vs $V1, $V21]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V22, ($KEYP)]}
+ @{[vaesem_vs $V1, $V22]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V23, ($KEYP)]}
+ @{[vaesem_vs $V1, $V23]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V24, ($KEYP)]}
+ @{[vaesef_vs $V1, $V24]}
+
+ @{[vse32_v $V1, ($OUTP)]}
+ ret
+.size L_enc_256,.-L_enc_256
+___
+
+################################################################################
+# void rv64i_zvkned_decrypt(const unsigned char *in, unsigned char *out,
+# const AES_KEY *key);
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_decrypt
+.type rv64i_zvkned_decrypt,\@function
+rv64i_zvkned_decrypt:
+ # Load number of rounds
+ lwu $ROUNDS, 240($KEYP)
+
+ # Get proper routine for key size
+ li $T6, 14
+ beq $ROUNDS, $T6, L_dec_256
+ li $T6, 10
+ beq $ROUNDS, $T6, L_dec_128
+ li $T6, 12
+ beq $ROUNDS, $T6, L_dec_192
+
+ j L_fail_m2
+.size rv64i_zvkned_decrypt,.-rv64i_zvkned_decrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_dec_128:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ @{[vle32_v $V1, ($INP)]}
+
+ addi $KEYP, $KEYP, 160
+ @{[vle32_v $V20, ($KEYP)]}
+ @{[vaesz_vs $V1, $V20]} # with round key w[40,43]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V19, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V19]} # with round key w[36,39]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V18, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V18]} # with round key w[32,35]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V17, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V17]} # with round key w[28,31]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V16, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V16]} # with round key w[24,27]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V15, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V15]} # with round key w[20,23]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V14, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V14]} # with round key w[16,19]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V13, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V13]} # with round key w[12,15]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V12, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V11, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V10, ($KEYP)]}
+ @{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3]
+
+ @{[vse32_v $V1, ($OUTP)]}
+
+ ret
+.size L_dec_128,.-L_dec_128
+___
+
+$code .= <<___;
+.p2align 3
+L_dec_192:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ @{[vle32_v $V1, ($INP)]}
+
+ addi $KEYP, $KEYP, 192
+ @{[vle32_v $V22, ($KEYP)]}
+ @{[vaesz_vs $V1, $V22]} # with round key w[48,51]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V21, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V21]} # with round key w[44,47]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V20, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V20]} # with round key w[40,43]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V19, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V19]} # with round key w[36,39]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V18, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V18]} # with round key w[32,35]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V17, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V17]} # with round key w[28,31]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V16, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V16]} # with round key w[24,27]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V15, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V15]} # with round key w[20,23]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V14, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V14]} # with round key w[16,19]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V13, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V13]} # with round key w[12,15]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V12, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V11, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V10, ($KEYP)]}
+ @{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3]
+
+ @{[vse32_v $V1, ($OUTP)]}
+
+ ret
+.size L_dec_192,.-L_dec_192
+___
+
+$code .= <<___;
+.p2align 3
+L_dec_256:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ @{[vle32_v $V1, ($INP)]}
+
+ addi $KEYP, $KEYP, 224
+ @{[vle32_v $V24, ($KEYP)]}
+ @{[vaesz_vs $V1, $V24]} # with round key w[56,59]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V23, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V23]} # with round key w[52,55]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V22, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V22]} # with round key w[48,51]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V21, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V21]} # with round key w[44,47]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V20, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V20]} # with round key w[40,43]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V19, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V19]} # with round key w[36,39]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V18, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V18]} # with round key w[32,35]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V17, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V17]} # with round key w[28,31]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V16, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V16]} # with round key w[24,27]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V15, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V15]} # with round key w[20,23]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V14, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V14]} # with round key w[16,19]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V13, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V13]} # with round key w[12,15]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V12, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V11, ($KEYP)]}
+ @{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7]
+ addi $KEYP, $KEYP, -16
+ @{[vle32_v $V10, ($KEYP)]}
+ @{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3]
+
+ @{[vse32_v $V1, ($OUTP)]}
+
+ ret
+.size L_dec_256,.-L_dec_256
+___
+}
+
+$code .= <<___;
+L_fail_m1:
+ li a0, -1
+ ret
+.size L_fail_m1,.-L_fail_m1
+
+L_fail_m2:
+ li a0, -2
+ ret
+.size L_fail_m2,.-L_fail_m2
+
+L_end:
+ ret
+.size L_end,.-L_end
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
--
2.28.0

2023-10-25 18:37:42

by Jerry Shih

Subject: [PATCH 03/12] RISC-V: crypto: add OpenSSL perl module for vector instructions

OpenSSL has some RISC-V vector cryptography implementations which could be
reused in the kernel. These implementations use a number of perl helpers
for emitting vector and vector-crypto-extension instructions.
This patch takes these perl helpers from OpenSSL (openssl/openssl#21923).
The unused scalar crypto instructions from the original perl module are
skipped.

Co-developed-by: Christoph Müllner <[email protected]>
Signed-off-by: Christoph Müllner <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Co-developed-by: Phoebe Chen <[email protected]>
Signed-off-by: Phoebe Chen <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
arch/riscv/crypto/riscv.pm | 828 +++++++++++++++++++++++++++++++++++++
1 file changed, 828 insertions(+)
create mode 100644 arch/riscv/crypto/riscv.pm

diff --git a/arch/riscv/crypto/riscv.pm b/arch/riscv/crypto/riscv.pm
new file mode 100644
index 000000000000..e188f7476e3e
--- /dev/null
+++ b/arch/riscv/crypto/riscv.pm
@@ -0,0 +1,828 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <[email protected]>
+# Copyright (c) 2023, Jerry Shih <[email protected]>
+# Copyright (c) 2023, Phoebe Chen <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+use strict;
+use warnings;
+
+# Set $have_stacktrace to 1 if we have Devel::StackTrace
+my $have_stacktrace = 0;
+if (eval {require Devel::StackTrace;1;}) {
+ $have_stacktrace = 1;
+}
+
+my @regs = map("x$_",(0..31));
+# Mapping from the RISC-V psABI ABI mnemonic names to the register number.
+my @regaliases = ('zero','ra','sp','gp','tp','t0','t1','t2','s0','s1',
+ map("a$_",(0..7)),
+ map("s$_",(2..11)),
+ map("t$_",(3..6))
+);
+
+my %reglookup;
+@reglookup{@regs} = @regs;
+@reglookup{@regaliases} = @regs;
+
+# Takes a register name, possibly an alias, and converts it to a register index
+# from 0 to 31
+sub read_reg {
+ my $reg = lc shift;
+ if (!exists($reglookup{$reg})) {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("Unknown register ".$reg."\n".$trace);
+ }
+ my $regstr = $reglookup{$reg};
+ if (!($regstr =~ /^x([0-9]+)$/)) {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("Could not process register ".$reg."\n".$trace);
+ }
+ return $1;
+}
+
+# Read the sew setting(8, 16, 32 and 64) and convert to vsew encoding.
+sub read_sew {
+ my $sew_setting = shift;
+
+ if ($sew_setting eq "e8") {
+ return 0;
+ } elsif ($sew_setting eq "e16") {
+ return 1;
+ } elsif ($sew_setting eq "e32") {
+ return 2;
+ } elsif ($sew_setting eq "e64") {
+ return 3;
+ } else {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("Unsupported SEW setting:".$sew_setting."\n".$trace);
+ }
+}
+
+# Read the LMUL settings and convert to vlmul encoding.
+sub read_lmul {
+ my $lmul_setting = shift;
+
+ if ($lmul_setting eq "mf8") {
+ return 5;
+ } elsif ($lmul_setting eq "mf4") {
+ return 6;
+ } elsif ($lmul_setting eq "mf2") {
+ return 7;
+ } elsif ($lmul_setting eq "m1") {
+ return 0;
+ } elsif ($lmul_setting eq "m2") {
+ return 1;
+ } elsif ($lmul_setting eq "m4") {
+ return 2;
+ } elsif ($lmul_setting eq "m8") {
+ return 3;
+ } else {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("Unsupported LMUL setting:".$lmul_setting."\n".$trace);
+ }
+}
+
+# Read the tail policy settings and convert to vta encoding.
+sub read_tail_policy {
+ my $tail_setting = shift;
+
+ if ($tail_setting eq "ta") {
+ return 1;
+ } elsif ($tail_setting eq "tu") {
+ return 0;
+ } else {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("Unsupported tail policy setting:".$tail_setting."\n".$trace);
+ }
+}
+
+# Read the mask policy settings and convert to vma encoding.
+sub read_mask_policy {
+ my $mask_setting = shift;
+
+ if ($mask_setting eq "ma") {
+ return 1;
+ } elsif ($mask_setting eq "mu") {
+ return 0;
+ } else {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("Unsupported mask policy setting:".$mask_setting."\n".$trace);
+ }
+}
+
+my @vregs = map("v$_",(0..31));
+my %vreglookup;
+@vreglookup{@vregs} = @vregs;
+
+sub read_vreg {
+ my $vreg = lc shift;
+ if (!exists($vreglookup{$vreg})) {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("Unknown vector register ".$vreg."\n".$trace);
+ }
+ if (!($vreg =~ /^v([0-9]+)$/)) {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("Could not process vector register ".$vreg."\n".$trace);
+ }
+ return $1;
+}
+
+# Read the vm settings and convert to mask encoding.
+sub read_mask_vreg {
+ my $vreg = shift;
+ # The default value is unmasked.
+ my $mask_bit = 1;
+
+ if (defined($vreg)) {
+ my $reg_id = read_vreg $vreg;
+ if ($reg_id == 0) {
+ $mask_bit = 0;
+ } else {
+ my $trace = "";
+ if ($have_stacktrace) {
+ $trace = Devel::StackTrace->new->as_string;
+ }
+ die("The ".$vreg." is not the mask register v0.\n".$trace);
+ }
+ }
+ return $mask_bit;
+}
+
+# Vector instructions
+
+sub vadd_vv {
+ # vadd.vv vd, vs2, vs1, vm
+ my $template = 0b000000_0_00000_00000_000_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vadd_vx {
+ # vadd.vx vd, vs2, rs1, vm
+ my $template = 0b000000_0_00000_00000_100_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vsub_vv {
+ # vsub.vv vd, vs2, vs1, vm
+ my $template = 0b000010_0_00000_00000_000_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vsub_vx {
+ # vsub.vx vd, vs2, rs1, vm
+ my $template = 0b000010_0_00000_00000_100_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vid_v {
+ # vid.v vd
+ my $template = 0b0101001_00000_10001_010_00000_1010111;
+ my $vd = read_vreg shift;
+ return ".word ".($template | ($vd << 7));
+}
+
+sub viota_m {
+ # viota.m vd, vs2, vm
+ my $template = 0b010100_0_00000_10000_010_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vle8_v {
+ # vle8.v vd, (rs1), vm
+ my $template = 0b000000_0_00000_00000_000_00000_0000111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vle32_v {
+ # vle32.v vd, (rs1), vm
+ my $template = 0b000000_0_00000_00000_110_00000_0000111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vle64_v {
+ # vle64.v vd, (rs1)
+ my $template = 0b0000001_00000_00000_111_00000_0000111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vlse32_v {
+ # vlse32.v vd, (rs1), rs2
+ my $template = 0b0000101_00000_00000_110_00000_0000111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $rs2 = read_reg shift;
+ return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vlsseg_nf_e32_v {
+ # vlsseg<nf>e32.v vd, (rs1), rs2
+ my $template = 0b0000101_00000_00000_110_00000_0000111;
+ my $nf = shift;
+ $nf -= 1;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $rs2 = read_reg shift;
+ return ".word ".($template | ($nf << 29) | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vlse64_v {
+ # vlse64.v vd, (rs1), rs2
+ my $template = 0b0000101_00000_00000_111_00000_0000111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $rs2 = read_reg shift;
+ return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vluxei8_v {
+ # vluxei8.v vd, (rs1), vs2, vm
+ my $template = 0b000001_0_00000_00000_000_00000_0000111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vs2 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vmerge_vim {
+ # vmerge.vim vd, vs2, imm, v0
+ my $template = 0b0101110_00000_00000_011_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $imm = shift;
+ return ".word ".($template | ($vs2 << 20) | ($imm << 15) | ($vd << 7));
+}
+
+sub vmerge_vvm {
+ # vmerge.vvm vd vs2 vs1
+ my $template = 0b0101110_00000_00000_000_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7))
+}
+
+sub vmseq_vi {
+ # vmseq.vi vd vs1, imm
+ my $template = 0b0110001_00000_00000_011_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ my $imm = shift;
+ return ".word ".($template | ($vs1 << 20) | ($imm << 15) | ($vd << 7))
+}
+
+sub vmsgtu_vx {
+ # vmsgtu.vx vd vs2, rs1, vm
+ my $template = 0b011110_0_00000_00000_100_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7))
+}
+
+sub vmv_v_i {
+ # vmv.v.i vd, imm
+ my $template = 0b0101111_00000_00000_011_00000_1010111;
+ my $vd = read_vreg shift;
+ my $imm = shift;
+ return ".word ".($template | ($imm << 15) | ($vd << 7));
+}
+
+sub vmv_v_x {
+ # vmv.v.x vd, rs1
+ my $template = 0b0101111_00000_00000_100_00000_1010111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vmv_v_v {
+ # vmv.v.v vd, vs1
+ my $template = 0b0101111_00000_00000_000_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vor_vv_v0t {
+ # vor.vv vd, vs2, vs1, v0.t
+ my $template = 0b0010100_00000_00000_000_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vse8_v {
+ # vse8.v vd, (rs1), vm
+ my $template = 0b000000_0_00000_00000_000_00000_0100111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vse32_v {
+ # vse32.v vd, (rs1), vm
+ my $template = 0b000000_0_00000_00000_110_00000_0100111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vssseg_nf_e32_v {
+ # vssseg<nf>e32.v vs3, (rs1), rs2
+ my $template = 0b0000101_00000_00000_110_00000_0100111;
+ my $nf = shift;
+ $nf -= 1;
+ my $vs3 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $rs2 = read_reg shift;
+ return ".word ".($template | ($nf << 29) | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7));
+}
+
+sub vsuxei8_v {
+ # vsuxei8.v vs3, (rs1), vs2, vm
+ my $template = 0b000001_0_00000_00000_000_00000_0100111;
+ my $vs3 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $vs2 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vs3 << 7));
+}
+
+sub vse64_v {
+ # vse64.v vd, (rs1)
+ my $template = 0b0000001_00000_00000_111_00000_0100111;
+ my $vd = read_vreg shift;
+ my $rs1 = read_reg shift;
+ return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vsetivli__x0_2_e64_m1_tu_mu {
+ # vsetivli x0, 2, e64, m1, tu, mu
+ return ".word 0xc1817057";
+}
+
+sub vsetivli__x0_4_e32_m1_tu_mu {
+ # vsetivli x0, 4, e32, m1, tu, mu
+ return ".word 0xc1027057";
+}
+
+sub vsetivli__x0_4_e64_m1_tu_mu {
+ # vsetivli x0, 4, e64, m1, tu, mu
+ return ".word 0xc1827057";
+}
+
+sub vsetivli__x0_8_e32_m1_tu_mu {
+ # vsetivli x0, 8, e32, m1, tu, mu
+ return ".word 0xc1047057";
+}
+
+sub vsetvli {
+ # vsetvli rd, rs1, vtypei
+ my $template = 0b0_00000000000_00000_111_00000_1010111;
+ my $rd = read_reg shift;
+ my $rs1 = read_reg shift;
+ my $sew = read_sew shift;
+ my $lmul = read_lmul shift;
+ my $tail_policy = read_tail_policy shift;
+ my $mask_policy = read_mask_policy shift;
+ my $vtypei = ($mask_policy << 7) | ($tail_policy << 6) | ($sew << 3) | $lmul;
+
+ return ".word ".($template | ($vtypei << 20) | ($rs1 << 15) | ($rd << 7));
+}
+
+sub vsetivli {
+ # vsetivli rd, uimm, vtypei
+ my $template = 0b11_0000000000_00000_111_00000_1010111;
+ my $rd = read_reg shift;
+ my $uimm = shift;
+ my $sew = read_sew shift;
+ my $lmul = read_lmul shift;
+ my $tail_policy = read_tail_policy shift;
+ my $mask_policy = read_mask_policy shift;
+ my $vtypei = ($mask_policy << 7) | ($tail_policy << 6) | ($sew << 3) | $lmul;
+
+ return ".word ".($template | ($vtypei << 20) | ($uimm << 15) | ($rd << 7));
+}
+
+sub vslidedown_vi {
+ # vslidedown.vi vd, vs2, uimm
+ my $template = 0b0011111_00000_00000_011_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vslidedown_vx {
+ # vslidedown.vx vd, vs2, rs1
+ my $template = 0b0011111_00000_00000_100_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vslideup_vi_v0t {
+ # vslideup.vi vd, vs2, uimm, v0.t
+ my $template = 0b0011100_00000_00000_011_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vslideup_vi {
+ # vslideup.vi vd, vs2, uimm
+ my $template = 0b0011101_00000_00000_011_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsll_vi {
+ # vsll.vi vd, vs2, uimm, vm
+ my $template = 0b1001011_00000_00000_011_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsrl_vx {
+ # vsrl.vx vd, vs2, rs1
+ my $template = 0b1010001_00000_00000_100_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vsse32_v {
+ # vsse32.v vs3, (rs1), rs2
+ my $template = 0b0000101_00000_00000_110_00000_0100111;
+ my $vs3 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $rs2 = read_reg shift;
+ return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7));
+}
+
+sub vsse64_v {
+ # vsse64.v vs3, (rs1), rs2
+ my $template = 0b0000101_00000_00000_111_00000_0100111;
+ my $vs3 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ my $rs2 = read_reg shift;
+ return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7));
+}
+
+sub vxor_vv_v0t {
+ # vxor.vv vd, vs2, vs1, v0.t
+ my $template = 0b0010110_00000_00000_000_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vxor_vv {
+ # vxor.vv vd, vs2, vs1
+ my $template = 0b0010111_00000_00000_000_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vzext_vf2 {
+ # vzext.vf2 vd, vs2, vm
+ my $template = 0b010010_0_00000_00110_010_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7));
+}
+
+# Vector crypto instructions
+
+## Zvbb and Zvkb instructions
+##
+## vandn (also in zvkb)
+## vbrev
+## vbrev8 (also in zvkb)
+## vrev8 (also in zvkb)
+## vclz
+## vctz
+## vcpop
+## vrol (also in zvkb)
+## vror (also in zvkb)
+## vwsll
+
+sub vbrev8_v {
+ # vbrev8.v vd, vs2, vm
+ my $template = 0b010010_0_00000_01000_010_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vrev8_v {
+ # vrev8.v vd, vs2, vm
+ my $template = 0b010010_0_00000_01001_010_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vror_vi {
+ # vror.vi vd, vs2, uimm
+ my $template = 0b01010_0_1_00000_00000_011_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ my $uimm_i5 = $uimm >> 5;
+ my $uimm_i4_0 = $uimm & 0b11111;
+
+ return ".word ".($template | ($uimm_i5 << 26) | ($vs2 << 20) | ($uimm_i4_0 << 15) | ($vd << 7));
+}
+
+sub vwsll_vv {
+ # vwsll.vv vd, vs2, vs1, vm
+ my $template = 0b110101_0_00000_00000_000_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ my $vm = read_mask_vreg shift;
+ return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+## Zvbc instructions
+
+sub vclmulh_vx {
+ # vclmulh.vx vd, vs2, rs1
+ my $template = 0b0011011_00000_00000_110_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vclmul_vx_v0t {
+ # vclmul.vx vd, vs2, rs1, v0.t
+ my $template = 0b0011000_00000_00000_110_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vclmul_vx {
+ # vclmul.vx vd, vs2, rs1
+ my $template = 0b0011001_00000_00000_110_00000_1010111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $rs1 = read_reg shift;
+ return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+## Zvkg instructions
+
+sub vghsh_vv {
+ # vghsh.vv vd, vs2, vs1
+ my $template = 0b1011001_00000_00000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vgmul_vv {
+ # vgmul.vv vd, vs2
+ my $template = 0b1010001_00000_10001_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## Zvkned instructions
+
+sub vaesdf_vs {
+ # vaesdf.vs vd, vs2
+ my $template = 0b101001_1_00000_00001_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesdm_vs {
+ # vaesdm.vs vd, vs2
+ my $template = 0b101001_1_00000_00000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesef_vs {
+ # vaesef.vs vd, vs2
+ my $template = 0b101001_1_00000_00011_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesem_vs {
+ # vaesem.vs vd, vs2
+ my $template = 0b101001_1_00000_00010_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaeskf1_vi {
+ # vaeskf1.vi vd, vs2, uimm
+ my $template = 0b100010_1_00000_00000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ return ".word ".($template | ($uimm << 15) | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaeskf2_vi {
+ # vaeskf2.vi vd, vs2, uimm
+ my $template = 0b101010_1_00000_00000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vaesz_vs {
+ # vaesz.vs vd, vs2
+ my $template = 0b101001_1_00000_00111_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## Zvknha and Zvknhb instructions
+
+sub vsha2ms_vv {
+ # vsha2ms.vv vd, vs2, vs1
+ my $template = 0b1011011_00000_00000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7));
+}
+
+sub vsha2ch_vv {
+ # vsha2ch.vv vd, vs2, vs1
+ my $template = 0b101110_10000_00000_001_00000_01110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7));
+}
+
+sub vsha2cl_vv {
+ # vsha2cl.vv vd, vs2, vs1
+ my $template = 0b101111_10000_00000_001_00000_01110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7));
+}
+
+## Zvksed instructions
+
+sub vsm4k_vi {
+ # vsm4k.vi vd, vs2, uimm
+ my $template = 0b1000011_00000_00000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsm4r_vs {
+ # vsm4r.vs vd, vs2
+ my $template = 0b1010011_00000_10000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## zvksh instructions
+
+sub vsm3c_vi {
+ # vsm3c.vi vd, vs2, uimm
+ my $template = 0b1010111_00000_00000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $uimm = shift;
+ return ".word ".($template | ($vs2 << 20) | ($uimm << 15 ) | ($vd << 7));
+}
+
+sub vsm3me_vv {
+ # vsm3me.vv vd, vs2, vs1
+ my $template = 0b1000001_00000_00000_010_00000_1110111;
+ my $vd = read_vreg shift;
+ my $vs2 = read_vreg shift;
+ my $vs1 = read_vreg shift;
+ return ".word ".($template | ($vs2 << 20) | ($vs1 << 15 ) | ($vd << 7));
+}
+
+1;
--
2.28.0

2023-10-25 18:37:57

by Jerry Shih

Subject: [PATCH 05/12] crypto: scatterwalk - Add scatterwalk_next() to get the next scatterlist in scatter_walk

In some situations, we might split the `skcipher_request` into several
segments. When we move to the next segment, we currently have to use
`scatterwalk_ffwd()` to find the corresponding `scatterlist`, iterating
from the head of the `scatterlist` every time.

This helper function instead gathers the position information already
tracked in the `scatter_walk` and moves to the next `scatterlist` directly.
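
A minimal usage sketch (hypothetical caller; only scatterwalk_next() itself
comes from this patch, and `walk->in` is the input scatter_walk embedded in
the caller's skcipher_walk):

static struct scatterlist *next_src_segment(struct skcipher_walk *walk,
					    struct scatterlist sg[2])
{
	/*
	 * Resume from the walk's current position instead of fast-forwarding
	 * with scatterwalk_ffwd() from the head of the request scatterlist.
	 */
	return scatterwalk_next(sg, &walk->in);
}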

Signed-off-by: Jerry Shih <[email protected]>
---
include/crypto/scatterwalk.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/crypto/scatterwalk.h b/include/crypto/scatterwalk.h
index 32fc4473175b..b1a90afe695d 100644
--- a/include/crypto/scatterwalk.h
+++ b/include/crypto/scatterwalk.h
@@ -98,7 +98,12 @@ void scatterwalk_map_and_copy(void *buf, struct scatterlist *sg,
unsigned int start, unsigned int nbytes, int out);

struct scatterlist *scatterwalk_ffwd(struct scatterlist dst[2],
- struct scatterlist *src,
- unsigned int len);
+ struct scatterlist *src, unsigned int len);
+
+static inline struct scatterlist *scatterwalk_next(struct scatterlist dst[2],
+ struct scatter_walk *src)
+{
+ return scatterwalk_ffwd(dst, src->sg, src->offset - src->sg->offset);
+}

#endif /* _CRYPTO_SCATTERWALK_H */
--
2.28.0

2023-10-25 18:38:09

by Jerry Shih

Subject: [PATCH 01/12] RISC-V: add helper function to read the vector VLEN

From: Heiko Stuebner <[email protected]>

VLEN describes the length of each vector register and some instructions
need specific minimal VLENs to work correctly.

The vector code already includes a variable, riscv_v_vsize, that holds the
size of "32 vector registers of vlenb length" and gets filled during boot.
vlenb is the value contained in the CSR_VLENB register and represents
"VLEN / 8".

So add riscv_vector_vlen() to return the actual VLEN value for in-kernel
users when they need to check the available VLEN.
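
As a usage sketch, an algorithm module can gate its registration on both the
extension and a minimum VLEN (this mirrors the check used by the AES glue
code later in this series; ZVKNED and the 128-bit minimum come from there):

static inline bool check_aes_ext(void)
{
	/* Zvkned needs VLEN >= 128 to process a full 128-bit AES block. */
	return riscv_isa_extension_available(NULL, ZVKNED) &&
	       riscv_vector_vlen() >= 128;
}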

Signed-off-by: Heiko Stuebner <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
arch/riscv/include/asm/vector.h | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 9fb2dea66abd..1fd3e5510b64 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -244,4 +244,15 @@ void kernel_vector_allow_preemption(void);
#define kernel_vector_allow_preemption() do {} while (0)
#endif

+/*
+ * Return the implementation's vlen value.
+ *
+ * riscv_v_vsize contains the value of "32 vector registers with vlenb length"
+ * so rebuild the vlen value in bits from it.
+ */
+static inline int riscv_vector_vlen(void)
+{
+ return riscv_v_vsize / 32 * 8;
+}
+
#endif /* ! __ASM_RISCV_VECTOR_H */
--
2.28.0

2023-10-25 18:38:13

by Jerry Shih

Subject: [PATCH 07/12] RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation

Add a GCM GHASH implementation using the Zvkg extension, ported from
OpenSSL (openssl/openssl#21923).

The perlasm here differs from the original implementation in OpenSSL.
OpenSSL assumes that H is stored in little-endian order, so it has to
convert H to big-endian for the Zvkg instructions. In the kernel, H is
already big-endian, so no endian conversion is needed.
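
For illustration, a setkey along these lines (a sketch only, not necessarily
the exact code in this patch; `riscv64_ghash_context` is the context struct
from the glue code below) can store H verbatim since no byte-swapping is
required:

static int ghash_setkey(struct crypto_shash *tfm, const u8 *key,
			unsigned int keylen)
{
	struct riscv64_ghash_context *ctx = crypto_shash_ctx(tfm);

	if (keylen != GHASH_BLOCK_SIZE)
		return -EINVAL;

	/* H is already big-endian, exactly as the Zvkg routine expects it. */
	memcpy(&ctx->key, key, GHASH_BLOCK_SIZE);

	return 0;
}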

Co-developed-by: Christoph Müllner <[email protected]>
Signed-off-by: Christoph Müllner <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
arch/riscv/crypto/Kconfig | 14 ++
arch/riscv/crypto/Makefile | 7 +
arch/riscv/crypto/ghash-riscv64-glue.c | 191 ++++++++++++++++++++++++
arch/riscv/crypto/ghash-riscv64-zvkg.pl | 100 +++++++++++++
4 files changed, 312 insertions(+)
create mode 100644 arch/riscv/crypto/ghash-riscv64-glue.c
create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index dfa9d0146d26..00be7177eb1e 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -35,4 +35,18 @@ config CRYPTO_AES_BLOCK_RISCV64
- Zvkg vector crypto extension (XTS)
- Zvkned vector crypto extension

+config CRYPTO_GHASH_RISCV64
+ default y if RISCV_ISA_V
+ tristate "Hash functions: GHASH"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_GCM
+ select CRYPTO_GHASH
+ select CRYPTO_HASH
+ select CRYPTO_LIB_GF128MUL
+ help
+ GCM GHASH function (NIST SP 800-38D)
+
+ Architecture: riscv64 using:
+ - Zvkg vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 42a4e8ec79cf..532316cc1758 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -9,6 +9,9 @@ aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o
obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o
aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvbb-zvkg-zvkned.o aes-riscv64-zvkb-zvkned.o

+obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
+ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o
+
quiet_cmd_perlasm = PERLASM $@
cmd_perlasm = $(PERL) $(<) void $(@)

@@ -21,6 +24,10 @@ $(obj)/aes-riscv64-zvbb-zvkg-zvkned.S: $(src)/aes-riscv64-zvbb-zvkg-zvkned.pl
$(obj)/aes-riscv64-zvkb-zvkned.S: $(src)/aes-riscv64-zvkb-zvkned.pl
$(call cmd,perlasm)

+$(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
+ $(call cmd,perlasm)
+
clean-files += aes-riscv64-zvkned.S
clean-files += aes-riscv64-zvbb-zvkg-zvkned.S
clean-files += aes-riscv64-zvkb-zvkned.S
+clean-files += ghash-riscv64-zvkg.S
diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c
new file mode 100644
index 000000000000..d5b7f0e4f612
--- /dev/null
+++ b/arch/riscv/crypto/ghash-riscv64-glue.c
@@ -0,0 +1,191 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * RISC-V optimized GHASH routines
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/ghash.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+/* ghash using zvkg vector crypto extension */
+void gcm_ghash_rv64i_zvkg(be128 *Xi, const be128 *H, const u8 *inp, size_t len);
+
+struct riscv64_ghash_context {
+ be128 key;
+};
+
+struct riscv64_ghash_desc_ctx {
+ be128 shash;
+ u8 buffer[GHASH_BLOCK_SIZE];
+ u32 bytes;
+};
+
+typedef void (*ghash_func)(be128 *Xi, const be128 *H, const u8 *inp,
+ size_t len);
+
+static inline void ghash_blocks(const struct riscv64_ghash_context *ctx,
+ struct riscv64_ghash_desc_ctx *dctx,
+ const u8 *src, size_t srclen, ghash_func func)
+{
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ func(&dctx->shash, &ctx->key, src, srclen);
+ kernel_vector_end();
+ } else {
+ while (srclen >= GHASH_BLOCK_SIZE) {
+ crypto_xor((u8 *)&dctx->shash, src, GHASH_BLOCK_SIZE);
+ gf128mul_lle(&dctx->shash, &ctx->key);
+ srclen -= GHASH_BLOCK_SIZE;
+ src += GHASH_BLOCK_SIZE;
+ }
+ }
+}
+
+static int ghash_update(struct shash_desc *desc, const u8 *src, size_t srclen,
+ ghash_func func)
+{
+ size_t len;
+ const struct riscv64_ghash_context *ctx =
+ crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
+ struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+
+ if (dctx->bytes) {
+ if (dctx->bytes + srclen < GHASH_BLOCK_SIZE) {
+ memcpy(dctx->buffer + dctx->bytes, src, srclen);
+ dctx->bytes += srclen;
+ return 0;
+ }
+ memcpy(dctx->buffer + dctx->bytes, src,
+ GHASH_BLOCK_SIZE - dctx->bytes);
+
+ ghash_blocks(ctx, dctx, dctx->buffer, GHASH_BLOCK_SIZE, func);
+
+ src += GHASH_BLOCK_SIZE - dctx->bytes;
+ srclen -= GHASH_BLOCK_SIZE - dctx->bytes;
+ dctx->bytes = 0;
+ }
+ len = srclen & ~(GHASH_BLOCK_SIZE - 1);
+
+ if (len) {
+ ghash_blocks(ctx, dctx, src, len, func);
+ src += len;
+ srclen -= len;
+ }
+
+ if (srclen) {
+ memcpy(dctx->buffer, src, srclen);
+ dctx->bytes = srclen;
+ }
+
+ return 0;
+}
+
+static int ghash_final(struct shash_desc *desc, u8 *out, ghash_func func)
+{
+ const struct riscv64_ghash_context *ctx =
+ crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
+ struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+ int i;
+
+ if (dctx->bytes) {
+ for (i = dctx->bytes; i < GHASH_BLOCK_SIZE; i++)
+ dctx->buffer[i] = 0;
+
+ ghash_blocks(ctx, dctx, dctx->buffer, GHASH_BLOCK_SIZE, func);
+ dctx->bytes = 0;
+ }
+
+ memcpy(out, &dctx->shash, GHASH_DIGEST_SIZE);
+
+ return 0;
+}
+
+static int ghash_init(struct shash_desc *desc)
+{
+ struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+
+ *dctx = (struct riscv64_ghash_desc_ctx){};
+
+ return 0;
+}
+
+static int ghash_update_zvkg(struct shash_desc *desc, const u8 *src,
+ unsigned int srclen)
+{
+ return ghash_update(desc, src, srclen, gcm_ghash_rv64i_zvkg);
+}
+
+static int ghash_final_zvkg(struct shash_desc *desc, u8 *out)
+{
+ return ghash_final(desc, out, gcm_ghash_rv64i_zvkg);
+}
+
+static int ghash_setkey(struct crypto_shash *tfm, const u8 *key,
+ unsigned int keylen)
+{
+ struct riscv64_ghash_context *ctx =
+ crypto_tfm_ctx(crypto_shash_tfm(tfm));
+
+ if (keylen != GHASH_BLOCK_SIZE)
+ return -EINVAL;
+
+ memcpy(&ctx->key, key, GHASH_BLOCK_SIZE);
+
+ return 0;
+}
+
+static struct shash_alg riscv64_ghash_alg_zvkg = {
+ .digestsize = GHASH_DIGEST_SIZE,
+ .init = ghash_init,
+ .update = ghash_update_zvkg,
+ .final = ghash_final_zvkg,
+ .setkey = ghash_setkey,
+ .descsize = sizeof(struct riscv64_ghash_desc_ctx),
+ .base = {
+ .cra_name = "ghash",
+ .cra_driver_name = "ghash-riscv64-zvkg",
+ .cra_priority = 303,
+ .cra_blocksize = GHASH_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct riscv64_ghash_context),
+ .cra_module = THIS_MODULE,
+ },
+};
+
+static inline bool check_ghash_ext(void)
+{
+ return riscv_isa_extension_available(NULL, ZVKG) &&
+ riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_ghash_mod_init(void)
+{
+ if (check_ghash_ext())
+ return crypto_register_shash(&riscv64_ghash_alg_zvkg);
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_ghash_mod_fini(void)
+{
+ if (check_ghash_ext())
+ crypto_unregister_shash(&riscv64_ghash_alg_zvkg);
+}
+
+module_init(riscv64_ghash_mod_init);
+module_exit(riscv64_ghash_mod_fini);
+
+MODULE_DESCRIPTION("GCM GHASH (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("ghash");
diff --git a/arch/riscv/crypto/ghash-riscv64-zvkg.pl b/arch/riscv/crypto/ghash-riscv64-zvkg.pl
new file mode 100644
index 000000000000..4beea4ac9cbe
--- /dev/null
+++ b/arch/riscv/crypto/ghash-riscv64-zvkg.pl
@@ -0,0 +1,100 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <[email protected]>
+# Copyright (c) 2023, Jerry Shih <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector GCM/GMAC extension ('Zvkg')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+###############################################################################
+# void gcm_ghash_rv64i_zvkg(be128 *Xi, const be128 *H, const u8 *inp, size_t len)
+#
+# input: Xi: current hash value
+# H: hash key
+# inp: pointer to input data
+# len: length of input data in bytes (multiple of block size)
+# output: Xi: Xi+1 (next hash value Xi)
+{
+my ($Xi,$H,$inp,$len) = ("a0","a1","a2","a3");
+my ($vXi,$vH,$vinp,$Vzero) = ("v1","v2","v3","v4");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_ghash_rv64i_zvkg
+.type gcm_ghash_rv64i_zvkg,\@function
+gcm_ghash_rv64i_zvkg:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $vH, $H]}
+ @{[vle32_v $vXi, $Xi]}
+
+Lstep:
+ @{[vle32_v $vinp, $inp]}
+ add $inp, $inp, 16
+ add $len, $len, -16
+ @{[vghsh_vv $vXi, $vH, $vinp]}
+ bnez $len, Lstep
+
+ @{[vse32_v $vXi, $Xi]}
+ ret
+
+.size gcm_ghash_rv64i_zvkg,.-gcm_ghash_rv64i_zvkg
+___
+}
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
--
2.28.0

2023-10-25 18:38:28

by Jerry Shih

[permalink] [raw]
Subject: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations

Port the vector-crypto accelerated CBC, CTR, ECB and XTS block modes for
the AES cipher from OpenSSL (openssl/openssl#21923).
In addition, support the XTS-AES-192 mode which does not exist in OpenSSL.
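
For reference, a minimal sketch (not part of this patch; the function name,
wait pattern and error handling are illustrative) of requesting the
XTS-AES-192 mode mentioned above through the skcipher API, i.e. with a
2 * 24 = 48-byte key:

#include <crypto/skcipher.h>
#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

static int xts_aes192_demo(const u8 key[48], u8 iv[16],
			   struct scatterlist *sg, unsigned int len)
{
	struct crypto_skcipher *tfm;
	struct skcipher_request *req;
	DECLARE_CRYPTO_WAIT(wait);
	int err;

	/* Resolves to xts-aes-riscv64-zvbb-zvkg-zvkned when available. */
	tfm = crypto_alloc_skcipher("xts(aes)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_skcipher_setkey(tfm, key, 48);
	if (err)
		goto free_tfm;

	req = skcipher_request_alloc(tfm, GFP_KERNEL);
	if (!req) {
		err = -ENOMEM;
		goto free_tfm;
	}

	/* XTS needs len >= AES_BLOCK_SIZE; encrypt `sg` in place. */
	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP,
				      crypto_req_done, &wait);
	skcipher_request_set_crypt(req, sg, sg, len, iv);
	err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);

	skcipher_request_free(req);
free_tfm:
	crypto_free_skcipher(tfm);
	return err;
}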

Co-developed-by: Phoebe Chen <[email protected]>
Signed-off-by: Phoebe Chen <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
arch/riscv/crypto/Kconfig | 21 +
arch/riscv/crypto/Makefile | 11 +
.../crypto/aes-riscv64-block-mode-glue.c | 486 +++++++++
.../crypto/aes-riscv64-zvbb-zvkg-zvkned.pl | 944 ++++++++++++++++++
arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl | 416 ++++++++
arch/riscv/crypto/aes-riscv64-zvkned.pl | 927 +++++++++++++++--
6 files changed, 2715 insertions(+), 90 deletions(-)
create mode 100644 arch/riscv/crypto/aes-riscv64-block-mode-glue.c
create mode 100644 arch/riscv/crypto/aes-riscv64-zvbb-zvkg-zvkned.pl
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 500938317e71..dfa9d0146d26 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -14,4 +14,25 @@ config CRYPTO_AES_RISCV64
Architecture: riscv64 using:
- Zvkned vector crypto extension

+config CRYPTO_AES_BLOCK_RISCV64
+ default y if RISCV_ISA_V
+ tristate "Ciphers: AES, modes: ECB/CBC/CTR/XTS"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_AES_RISCV64
+ select CRYPTO_SKCIPHER
+ help
+ Length-preserving ciphers: AES cipher algorithms (FIPS-197)
+ with block cipher modes:
+ - ECB (Electronic Codebook) mode (NIST SP 800-38A)
+ - CBC (Cipher Block Chaining) mode (NIST SP 800-38A)
+ - CTR (Counter) mode (NIST SP 800-38A)
+ - XTS (XOR Encrypt XOR Tweakable Block Cipher with Ciphertext
+ Stealing) mode (NIST SP 800-38E and IEEE 1619)
+
+ Architecture: riscv64 using:
+ - Zvbb vector extension (XTS)
+ - Zvkb vector crypto extension (CTR/XTS)
+ - Zvkg vector crypto extension (XTS)
+ - Zvkned vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 90ca91d8df26..42a4e8ec79cf 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -6,10 +6,21 @@
obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o

+obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o
+aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvbb-zvkg-zvkned.o aes-riscv64-zvkb-zvkned.o
+
quiet_cmd_perlasm = PERLASM $@
cmd_perlasm = $(PERL) $(<) void $(@)

$(obj)/aes-riscv64-zvkned.S: $(src)/aes-riscv64-zvkned.pl
$(call cmd,perlasm)

+$(obj)/aes-riscv64-zvbb-zvkg-zvkned.S: $(src)/aes-riscv64-zvbb-zvkg-zvkned.pl
+ $(call cmd,perlasm)
+
+$(obj)/aes-riscv64-zvkb-zvkned.S: $(src)/aes-riscv64-zvkb-zvkned.pl
+ $(call cmd,perlasm)
+
clean-files += aes-riscv64-zvkned.S
+clean-files += aes-riscv64-zvbb-zvkg-zvkned.S
+clean-files += aes-riscv64-zvkb-zvkned.S
diff --git a/arch/riscv/crypto/aes-riscv64-block-mode-glue.c b/arch/riscv/crypto/aes-riscv64-block-mode-glue.c
new file mode 100644
index 000000000000..c33e902eff5e
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-block-mode-glue.c
@@ -0,0 +1,486 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Port of the OpenSSL AES block mode implementations for RISC-V
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/aes.h>
+#include <crypto/ctr.h>
+#include <crypto/xts.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/simd.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/scatterwalk.h>
+#include <linux/crypto.h>
+#include <linux/math.h>
+#include <linux/minmax.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+#include "aes-riscv64-glue.h"
+
+#define AES_BLOCK_VALID_SIZE_MASK (~(AES_BLOCK_SIZE - 1))
+#define AES_BLOCK_REMAINING_SIZE_MASK (AES_BLOCK_SIZE - 1)
+
+struct riscv64_aes_xts_ctx {
+ struct riscv64_aes_ctx ctx1;
+ struct riscv64_aes_ctx ctx2;
+};
+
+/* aes cbc block mode using zvkned vector crypto extension */
+void rv64i_zvkned_cbc_encrypt(const u8 *in, u8 *out, size_t length,
+ const struct aes_key *key, u8 *ivec);
+void rv64i_zvkned_cbc_decrypt(const u8 *in, u8 *out, size_t length,
+ const struct aes_key *key, u8 *ivec);
+/* aes ecb block mode using zvkned vector crypto extension */
+void rv64i_zvkned_ecb_encrypt(const u8 *in, u8 *out, size_t length,
+ const struct aes_key *key);
+void rv64i_zvkned_ecb_decrypt(const u8 *in, u8 *out, size_t length,
+ const struct aes_key *key);
+
+/* aes ctr block mode using zvkb and zvkned vector crypto extension */
+/* This function operates on a 32-bit counter. The caller must handle overflow. */
+void rv64i_zvkb_zvkned_ctr32_encrypt_blocks(const u8 *in, u8 *out,
+ size_t length,
+ const struct aes_key *key,
+ u8 *ivec);
+
+/* aes xts block mode using zvbb, zvkg and zvkned vector crypto extension */
+void rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt(const u8 *in, u8 *out,
+ size_t length,
+ const struct aes_key *key, u8 *iv,
+ int update_iv);
+void rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt(const u8 *in, u8 *out,
+ size_t length,
+ const struct aes_key *key, u8 *iv,
+ int update_iv);
+
+typedef void (*aes_xts_func)(const u8 *in, u8 *out, size_t length,
+ const struct aes_key *key, u8 *iv, int update_iv);
+
+/* ecb */
+static int aes_setkey(struct crypto_skcipher *tfm, const u8 *in_key,
+ unsigned int key_len)
+{
+ struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+
+ return riscv64_aes_setkey(ctx, in_key, key_len);
+}
+
+static int ecb_encrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ int err;
+
+ /* If an error occurs here, `nbytes` will be zero. */
+ err = skcipher_walk_virt(&walk, req, false);
+ while ((nbytes = walk.nbytes)) {
+ kernel_vector_begin();
+ rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
+ nbytes & AES_BLOCK_VALID_SIZE_MASK,
+ &ctx->key);
+ kernel_vector_end();
+ err = skcipher_walk_done(
+ &walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
+ }
+
+ return err;
+}
+
+static int ecb_decrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, false);
+ while ((nbytes = walk.nbytes)) {
+ kernel_vector_begin();
+ rv64i_zvkned_ecb_decrypt(walk.src.virt.addr, walk.dst.virt.addr,
+ nbytes & AES_BLOCK_VALID_SIZE_MASK,
+ &ctx->key);
+ kernel_vector_end();
+ err = skcipher_walk_done(
+ &walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
+ }
+
+ return err;
+}
+
+/* cbc */
+static int cbc_encrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, false);
+ while ((nbytes = walk.nbytes)) {
+ kernel_vector_begin();
+ rv64i_zvkned_cbc_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
+ nbytes & AES_BLOCK_VALID_SIZE_MASK,
+ &ctx->key, walk.iv);
+ kernel_vector_end();
+ err = skcipher_walk_done(
+ &walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
+ }
+
+ return err;
+}
+
+static int cbc_decrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, false);
+ while ((nbytes = walk.nbytes)) {
+ kernel_vector_begin();
+ rv64i_zvkned_cbc_decrypt(walk.src.virt.addr, walk.dst.virt.addr,
+ nbytes & AES_BLOCK_VALID_SIZE_MASK,
+ &ctx->key, walk.iv);
+ kernel_vector_end();
+ err = skcipher_walk_done(
+ &walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
+ }
+
+ return err;
+}
+
+/* ctr */
+static int ctr_encrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ unsigned int ctr32;
+ unsigned int nbytes;
+ unsigned int blocks;
+ unsigned int current_blocks;
+ unsigned int current_length;
+ int err;
+
+ /* The CTR IV is stored in big-endian form. */
+ ctr32 = get_unaligned_be32(req->iv + 12);
+ err = skcipher_walk_virt(&walk, req, false);
+ while ((nbytes = walk.nbytes)) {
+ if (nbytes != walk.total) {
+ nbytes &= AES_BLOCK_VALID_SIZE_MASK;
+ blocks = nbytes / AES_BLOCK_SIZE;
+ } else {
+ /* This is the last walk. We should handle the tail data. */
+ blocks = (nbytes + (AES_BLOCK_SIZE - 1)) /
+ AES_BLOCK_SIZE;
+ }
+ ctr32 += blocks;
+
+ kernel_vector_begin();
+ /*
+ * The `if` branch below handles the common case where the 32-bit counter
+ * does not wrap; the `else` branch splits the request at the wrap point.
+ */
+ if (ctr32 >= blocks) {
+ rv64i_zvkb_zvkned_ctr32_encrypt_blocks(
+ walk.src.virt.addr, walk.dst.virt.addr, nbytes,
+ &ctx->key, req->iv);
+ } else {
+ /* use 2 ctr32 function calls for overflow case */
+ current_blocks = blocks - ctr32;
+ current_length =
+ min(nbytes, current_blocks * AES_BLOCK_SIZE);
+ rv64i_zvkb_zvkned_ctr32_encrypt_blocks(
+ walk.src.virt.addr, walk.dst.virt.addr,
+ current_length, &ctx->key, req->iv);
+ crypto_inc(req->iv, 12);
+
+ if (ctr32) {
+ rv64i_zvkb_zvkned_ctr32_encrypt_blocks(
+ walk.src.virt.addr +
+ current_blocks * AES_BLOCK_SIZE,
+ walk.dst.virt.addr +
+ current_blocks * AES_BLOCK_SIZE,
+ nbytes - current_length, &ctx->key,
+ req->iv);
+ }
+ }
+ kernel_vector_end();
+
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+ }
+
+ return err;
+}
+
+/* xts */
+static int xts_setkey(struct crypto_skcipher *tfm, const u8 *in_key,
+ unsigned int key_len)
+{
+ struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+ unsigned int xts_single_key_len = key_len / 2;
+ int ret;
+
+ ret = xts_verify_key(tfm, in_key, key_len);
+ if (ret)
+ return ret;
+ ret = riscv64_aes_setkey(&ctx->ctx1, in_key, xts_single_key_len);
+ if (ret)
+ return ret;
+ return riscv64_aes_setkey(&ctx->ctx2, in_key + xts_single_key_len,
+ xts_single_key_len);
+}
+
+static int xts_crypt(struct skcipher_request *req, aes_xts_func func)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_request sub_req;
+ struct scatterlist sg_src[2], sg_dst[2];
+ struct scatterlist *src, *dst;
+ struct skcipher_walk walk;
+ unsigned int walk_size = crypto_skcipher_walksize(tfm);
+ unsigned int tail_bytes;
+ unsigned int head_bytes;
+ unsigned int nbytes;
+ unsigned int update_iv = 1;
+ int err;
+
+ /* The XTS input size must be at least AES_BLOCK_SIZE. */
+ if (req->cryptlen < AES_BLOCK_SIZE)
+ return -EINVAL;
+
+ /*
+ * Keep the tail smaller than walk_size so it is handled in one walk, while
+ * still keeping it larger than AES_BLOCK_SIZE for ciphertext stealing.
+ */
+ if (req->cryptlen <= walk_size) {
+ tail_bytes = req->cryptlen;
+ head_bytes = 0;
+ } else {
+ if (req->cryptlen & AES_BLOCK_REMAINING_SIZE_MASK) {
+ tail_bytes = req->cryptlen &
+ AES_BLOCK_REMAINING_SIZE_MASK;
+ tail_bytes = walk_size + tail_bytes - AES_BLOCK_SIZE;
+ head_bytes = req->cryptlen - tail_bytes;
+ } else {
+ tail_bytes = 0;
+ head_bytes = req->cryptlen;
+ }
+ }
+
+ riscv64_aes_encrypt_zvkned(&ctx->ctx2, req->iv, req->iv);
+
+ if (head_bytes && tail_bytes) {
+ skcipher_request_set_tfm(&sub_req, tfm);
+ skcipher_request_set_callback(
+ &sub_req, skcipher_request_flags(req), NULL, NULL);
+ skcipher_request_set_crypt(&sub_req, req->src, req->dst,
+ head_bytes, req->iv);
+ req = &sub_req;
+ }
+
+ if (head_bytes) {
+ err = skcipher_walk_virt(&walk, req, false);
+ while ((nbytes = walk.nbytes)) {
+ if (nbytes == walk.total)
+ update_iv = (tail_bytes > 0);
+
+ nbytes &= AES_BLOCK_VALID_SIZE_MASK;
+ kernel_vector_begin();
+ func(walk.src.virt.addr, walk.dst.virt.addr, nbytes,
+ &ctx->ctx1.key, req->iv, update_iv);
+ kernel_vector_end();
+
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+ }
+ if (err || !tail_bytes)
+ return err;
+
+ dst = src = scatterwalk_next(sg_src, &walk.in);
+ if (req->dst != req->src)
+ dst = scatterwalk_next(sg_dst, &walk.out);
+ skcipher_request_set_crypt(req, src, dst, tail_bytes, req->iv);
+ }
+
+ /* tail */
+ err = skcipher_walk_virt(&walk, req, false);
+ if (err)
+ return err;
+ if (walk.nbytes != tail_bytes)
+ return -EINVAL;
+ kernel_vector_begin();
+ func(walk.src.virt.addr, walk.dst.virt.addr, walk.nbytes,
+ &ctx->ctx1.key, req->iv, 0);
+ kernel_vector_end();
+
+ return skcipher_walk_done(&walk, 0);
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+ return xts_crypt(req, rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+ return xts_crypt(req, rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt);
+}
+
+static struct skcipher_alg riscv64_aes_alg_zvkned[] = { {
+ .base = {
+ .cra_name = "ecb(aes)",
+ .cra_driver_name = "ecb-aes-riscv64-zvkned",
+ .cra_priority = 300,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct riscv64_aes_ctx),
+ .cra_module = THIS_MODULE,
+ },
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .walksize = AES_BLOCK_SIZE * 8,
+ .setkey = aes_setkey,
+ .encrypt = ecb_encrypt,
+ .decrypt = ecb_decrypt,
+}, {
+ .base = {
+ .cra_name = "cbc(aes)",
+ .cra_driver_name = "cbc-aes-riscv64-zvkned",
+ .cra_priority = 300,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct riscv64_aes_ctx),
+ .cra_module = THIS_MODULE,
+ },
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .ivsize = AES_BLOCK_SIZE,
+ .walksize = AES_BLOCK_SIZE * 8,
+ .setkey = aes_setkey,
+ .encrypt = cbc_encrypt,
+ .decrypt = cbc_decrypt,
+} };
+
+static struct skcipher_alg riscv64_aes_alg_zvkb_zvkned[] = { {
+ .base = {
+ .cra_name = "ctr(aes)",
+ .cra_driver_name = "ctr-aes-riscv64-zvkb-zvkned",
+ .cra_priority = 300,
+ .cra_blocksize = 1,
+ .cra_ctxsize = sizeof(struct riscv64_aes_ctx),
+ .cra_module = THIS_MODULE,
+ },
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .ivsize = AES_BLOCK_SIZE,
+ .chunksize = AES_BLOCK_SIZE,
+ .walksize = AES_BLOCK_SIZE * 8,
+ .setkey = aes_setkey,
+ .encrypt = ctr_encrypt,
+ .decrypt = ctr_encrypt,
+} };
+
+static struct skcipher_alg riscv64_aes_alg_zvbb_zvkg_zvkned[] = { {
+ .base = {
+ .cra_name = "xts(aes)",
+ .cra_driver_name = "xts-aes-riscv64-zvbb-zvkg-zvkned",
+ .cra_priority = 300,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct riscv64_aes_xts_ctx),
+ .cra_module = THIS_MODULE,
+ },
+ .min_keysize = AES_MIN_KEY_SIZE * 2,
+ .max_keysize = AES_MAX_KEY_SIZE * 2,
+ .ivsize = AES_BLOCK_SIZE,
+ .chunksize = AES_BLOCK_SIZE,
+ .walksize = AES_BLOCK_SIZE * 8,
+ .setkey = xts_setkey,
+ .encrypt = xts_encrypt,
+ .decrypt = xts_decrypt,
+} };
+
+static int __init riscv64_aes_block_mod_init(void)
+{
+ int ret = -ENODEV;
+
+ if (riscv_isa_extension_available(NULL, ZVKNED) &&
+ riscv_vector_vlen() >= 128) {
+ ret = crypto_register_skciphers(
+ riscv64_aes_alg_zvkned,
+ ARRAY_SIZE(riscv64_aes_alg_zvkned));
+ if (ret)
+ return ret;
+
+ if (riscv_isa_extension_available(NULL, ZVBB)) {
+ ret = crypto_register_skciphers(
+ riscv64_aes_alg_zvkb_zvkned,
+ ARRAY_SIZE(riscv64_aes_alg_zvkb_zvkned));
+ if (ret)
+ goto unregister_zvkned;
+
+ if (riscv_isa_extension_available(NULL, ZVKG)) {
+ ret = crypto_register_skciphers(
+ riscv64_aes_alg_zvbb_zvkg_zvkned,
+ ARRAY_SIZE(
+ riscv64_aes_alg_zvbb_zvkg_zvkned));
+ if (ret)
+ goto unregister_zvkb_zvkned;
+ }
+ }
+ }
+
+ return ret;
+
+unregister_zvkb_zvkned:
+ crypto_unregister_skciphers(riscv64_aes_alg_zvkb_zvkned,
+ ARRAY_SIZE(riscv64_aes_alg_zvkb_zvkned));
+unregister_zvkned:
+ crypto_unregister_skciphers(riscv64_aes_alg_zvkned,
+ ARRAY_SIZE(riscv64_aes_alg_zvkned));
+
+ return ret;
+}
+
+static void __exit riscv64_aes_block_mod_fini(void)
+{
+ if (riscv_isa_extension_available(NULL, ZVKNED) &&
+ riscv_vector_vlen() >= 128) {
+ crypto_unregister_skciphers(riscv64_aes_alg_zvkned,
+ ARRAY_SIZE(riscv64_aes_alg_zvkned));
+
+ if (riscv_isa_extension_available(NULL, ZVBB)) {
+ crypto_unregister_skciphers(
+ riscv64_aes_alg_zvkb_zvkned,
+ ARRAY_SIZE(riscv64_aes_alg_zvkb_zvkned));
+
+ if (riscv_isa_extension_available(NULL, ZVKG)) {
+ crypto_unregister_skciphers(
+ riscv64_aes_alg_zvbb_zvkg_zvkned,
+ ARRAY_SIZE(
+ riscv64_aes_alg_zvbb_zvkg_zvkned));
+ }
+ }
+ }
+}
+
+module_init(riscv64_aes_block_mod_init);
+module_exit(riscv64_aes_block_mod_fini);
+
+MODULE_DESCRIPTION("AES-ECB/CBC/CTR/XTS (RISC-V accelerated)");
+MODULE_AUTHOR("Jerry Shih <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("cbc(aes)");
+MODULE_ALIAS_CRYPTO("ctr(aes)");
+MODULE_ALIAS_CRYPTO("ecb(aes)");
+MODULE_ALIAS_CRYPTO("xts(aes)");
diff --git a/arch/riscv/crypto/aes-riscv64-zvbb-zvkg-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvbb-zvkg-zvkned.pl
new file mode 100644
index 000000000000..0daec9c38574
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-zvbb-zvkg-zvkned.pl
@@ -0,0 +1,944 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Jerry Shih <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Bit-manipulation extension ('Zvbb')
+# - RISC-V Vector GCM/GMAC extension ('Zvkg')
+# - RISC-V Vector AES block cipher extension ('Zvkned')
+# - RISC-V Zicclsm (main memory supports misaligned loads/stores)
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+{
+################################################################################
+# void rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt(const unsigned char *in,
+# unsigned char *out, size_t length,
+# const AES_KEY *key,
+# unsigned char iv[16],
+# int update_iv)
+my ($INPUT, $OUTPUT, $LENGTH, $KEY, $IV, $UPDATE_IV) = ("a0", "a1", "a2", "a3", "a4", "a5");
+my ($TAIL_LENGTH) = ("a6");
+my ($VL) = ("a7");
+my ($T0, $T1, $T2, $T3) = ("t0", "t1", "t2", "t3");
+my ($STORE_LEN32) = ("t4");
+my ($LEN32) = ("t5");
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+ $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+ $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+ $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+# load iv to v28
+sub load_xts_iv0 {
+ my $code=<<___;
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V28, $IV]}
+___
+
+ return $code;
+}
+
+# prepare input data(v24), iv(v28), bit-reversed-iv(v16), bit-reversed-iv-multiplier(v20)
+sub init_first_round {
+ my $code=<<___;
+ # load input
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ @{[vle32_v $V24, $INPUT]}
+
+ li $T0, 5
+ # The initialization can be simplified when there is at most one block (`block <= 1`).
+ blt $LEN32, $T0, 1f
+
+ # Note: We use `vgmul` for the GF(2^128) multiplication. `vgmul` expects
+ # the coefficients in bit-reversed order, so the data is passed through
+ # `vbrev8` whenever `vgmul` is used.
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vbrev8_v $V0, $V28]}
+ @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
+ @{[vmv_v_i $V16, 0]}
+ # v16: [r-IV0, r-IV0, ...]
+ @{[vaesz_vs $V16, $V0]}
+
+ # Prepare GF(2^128) multiplier [1, x, x^2, x^3, ...] in v8.
+ slli $T0, $LEN32, 2
+ @{[vsetvli "zero", $T0, "e32", "m1", "ta", "ma"]}
+ # v2: [`1`, `1`, `1`, `1`, ...]
+ @{[vmv_v_i $V2, 1]}
+ # v3: [`0`, `1`, `2`, `3`, ...]
+ @{[vid_v $V3]}
+ @{[vsetvli "zero", $T0, "e64", "m2", "ta", "ma"]}
+ # v4: [`1`, 0, `1`, 0, `1`, 0, `1`, 0, ...]
+ @{[vzext_vf2 $V4, $V2]}
+ # v6: [`0`, 0, `1`, 0, `2`, 0, `3`, 0, ...]
+ @{[vzext_vf2 $V6, $V3]}
+ slli $T0, $LEN32, 1
+ @{[vsetvli "zero", $T0, "e32", "m2", "ta", "ma"]}
+ # v8: [1<<0=1, 0, 0, 0, 1<<1=x, 0, 0, 0, 1<<2=x^2, 0, 0, 0, ...]
+ @{[vwsll_vv $V8, $V4, $V6]}
+
+ # Compute [r-IV0*1, r-IV0*x, r-IV0*x^2, r-IV0*x^3, ...] in v16
+ @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
+ @{[vbrev8_v $V8, $V8]}
+ @{[vgmul_vv $V16, $V8]}
+
+ # Compute [IV0*1, IV0*x, IV0*x^2, IV0*x^3, ...] in v28.
+ # Reverse the bits order back.
+ @{[vbrev8_v $V28, $V16]}
+
+ # Prepare the x^n multiplier in v20. The `n` is the aes-xts block number
+ # in a LMUL=4 register group.
+ # n = ((VLEN*LMUL)/(32*4)) = ((VLEN*4)/(32*4))
+ # = (VLEN/32)
+ # We could use vsetvli with `e32, m1` to compute the `n` number.
+ @{[vsetvli $T0, "zero", "e32", "m1", "ta", "ma"]}
+ li $T1, 1
+ sll $T0, $T1, $T0
+ @{[vsetivli "zero", 2, "e64", "m1", "ta", "ma"]}
+ @{[vmv_v_i $V0, 0]}
+ @{[vsetivli "zero", 1, "e64", "m1", "tu", "ma"]}
+ @{[vmv_v_x $V0, $T0]}
+ @{[vsetivli "zero", 2, "e64", "m1", "ta", "ma"]}
+ @{[vbrev8_v $V0, $V0]}
+ @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
+ @{[vmv_v_i $V20, 0]}
+ @{[vaesz_vs $V20, $V0]}
+
+ j 2f
+1:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vbrev8_v $V16, $V28]}
+2:
+___
+
+ return $code;
+}
+
+# prepare xts enc last block's input(v24) and iv(v28)
+sub handle_xts_enc_last_block {
+ my $code=<<___;
+ bnez $TAIL_LENGTH, 2f
+
+ beqz $UPDATE_IV, 1f
+ ## Store next IV
+ addi $VL, $VL, -4
+ @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]}
+ # multiplier
+ @{[vslidedown_vx $V16, $V16, $VL]}
+
+ # setup `x` multiplier with byte-reversed order
+ # 0b00000010 => 0b01000000 (0x40)
+ li $T0, 0x40
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vmv_v_i $V28, 0]}
+ @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]}
+ @{[vmv_v_x $V28, $T0]}
+
+ # IV * `x`
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vgmul_vv $V16, $V28]}
+ # Reverse the IV's bits order back to big-endian
+ @{[vbrev8_v $V28, $V16]}
+
+ @{[vse32_v $V28, $IV]}
+1:
+
+ ret
+2:
+ # slidedown second to last block
+ addi $VL, $VL, -4
+ @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]}
+ # ciphertext
+ @{[vslidedown_vx $V24, $V24, $VL]}
+ # multiplier
+ @{[vslidedown_vx $V16, $V16, $VL]}
+
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vmv_v_v $V25, $V24]}
+
+ # load last block into v24
+ # Note: We must load the last block before storing the second-to-last
+ # block so that in-place operation works correctly.
+ @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]}
+ @{[vle8_v $V24, $INPUT]}
+
+ # setup `x` multiplier with byte-reversed order
+ # 0b00000010 => 0b01000000 (0x40)
+ li $T0, 0x40
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vmv_v_i $V28, 0]}
+ @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]}
+ @{[vmv_v_x $V28, $T0]}
+
+ # compute IV for last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vgmul_vv $V16, $V28]}
+ @{[vbrev8_v $V28, $V16]}
+
+ # store second to last block
+ @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "ta", "ma"]}
+ @{[vse8_v $V25, $OUTPUT]}
+___
+
+ return $code;
+}
+
+# prepare xts dec second to last block's input(v24) and iv(v29) and
+# last block's and iv(v28)
+sub handle_xts_dec_last_block {
+ my $code=<<___;
+ bnez $TAIL_LENGTH, 2f
+
+ beqz $UPDATE_IV, 1f
+ ## Store next IV
+ # setup `x` multiplier with byte-reversed order
+ # 0b00000010 => 0b01000000 (0x40)
+ li $T0, 0x40
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vmv_v_i $V28, 0]}
+ @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]}
+ @{[vmv_v_x $V28, $T0]}
+
+ beqz $LENGTH, 3f
+ addi $VL, $VL, -4
+ @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]}
+ # multiplier
+ @{[vslidedown_vx $V16, $V16, $VL]}
+
+3:
+ # IV * `x`
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vgmul_vv $V16, $V28]}
+ # Reverse the IV's bits order back to big-endian
+ @{[vbrev8_v $V28, $V16]}
+
+ @{[vse32_v $V28, $IV]}
+1:
+
+ ret
+2:
+ # load second to last block's ciphertext
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V24, $INPUT]}
+ addi $INPUT, $INPUT, 16
+
+ # setup `x` multiplier with byte-reversed order
+ # 0b00000010 => 0b01000000 (0x40)
+ li $T0, 0x40
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vmv_v_i $V20, 0]}
+ @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]}
+ @{[vmv_v_x $V20, $T0]}
+
+ beqz $LENGTH, 1f
+ # slidedown third to last block
+ addi $VL, $VL, -4
+ @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]}
+ # multiplier
+ @{[vslidedown_vx $V16, $V16, $VL]}
+
+ # compute IV for last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vgmul_vv $V16, $V20]}
+ @{[vbrev8_v $V28, $V16]}
+
+ # compute IV for second to last block
+ @{[vgmul_vv $V16, $V20]}
+ @{[vbrev8_v $V29, $V16]}
+ j 2f
+1:
+ # compute IV for second to last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vgmul_vv $V16, $V20]}
+ @{[vbrev8_v $V29, $V16]}
+2:
+___
+
+ return $code;
+}
+
+# Load all 11 round keys to v1-v11 registers.
+sub aes_128_load_key {
+ my $code=<<___;
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V2, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V3, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V4, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V5, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V6, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V7, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V8, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V9, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V10, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V11, $KEY]}
+___
+
+ return $code;
+}
+
+# Load all 13 round keys to v1-v13 registers.
+sub aes_192_load_key {
+ my $code=<<___;
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V2, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V3, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V4, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V5, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V6, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V7, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V8, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V9, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V10, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V11, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V12, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V13, $KEY]}
+___
+
+ return $code;
+}
+
+# Load all 15 round keys to v1-v15 registers.
+sub aes_256_load_key {
+ my $code=<<___;
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V2, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V3, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V4, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V5, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V6, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V7, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V8, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V9, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V10, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V11, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V12, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V13, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V14, $KEY]}
+ addi $KEY, $KEY, 16
+ @{[vle32_v $V15, $KEY]}
+___
+
+ return $code;
+}
+
+# aes-128 enc with round keys v1-v11
+sub aes_128_enc {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V1]}
+ @{[vaesem_vs $V24, $V2]}
+ @{[vaesem_vs $V24, $V3]}
+ @{[vaesem_vs $V24, $V4]}
+ @{[vaesem_vs $V24, $V5]}
+ @{[vaesem_vs $V24, $V6]}
+ @{[vaesem_vs $V24, $V7]}
+ @{[vaesem_vs $V24, $V8]}
+ @{[vaesem_vs $V24, $V9]}
+ @{[vaesem_vs $V24, $V10]}
+ @{[vaesef_vs $V24, $V11]}
+___
+
+ return $code;
+}
+
+# aes-128 dec with round keys v1-v11
+sub aes_128_dec {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V11]}
+ @{[vaesdm_vs $V24, $V10]}
+ @{[vaesdm_vs $V24, $V9]}
+ @{[vaesdm_vs $V24, $V8]}
+ @{[vaesdm_vs $V24, $V7]}
+ @{[vaesdm_vs $V24, $V6]}
+ @{[vaesdm_vs $V24, $V5]}
+ @{[vaesdm_vs $V24, $V4]}
+ @{[vaesdm_vs $V24, $V3]}
+ @{[vaesdm_vs $V24, $V2]}
+ @{[vaesdf_vs $V24, $V1]}
+___
+
+ return $code;
+}
+
+# aes-192 enc with round keys v1-v13
+sub aes_192_enc {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V1]}
+ @{[vaesem_vs $V24, $V2]}
+ @{[vaesem_vs $V24, $V3]}
+ @{[vaesem_vs $V24, $V4]}
+ @{[vaesem_vs $V24, $V5]}
+ @{[vaesem_vs $V24, $V6]}
+ @{[vaesem_vs $V24, $V7]}
+ @{[vaesem_vs $V24, $V8]}
+ @{[vaesem_vs $V24, $V9]}
+ @{[vaesem_vs $V24, $V10]}
+ @{[vaesem_vs $V24, $V11]}
+ @{[vaesem_vs $V24, $V12]}
+ @{[vaesef_vs $V24, $V13]}
+___
+
+ return $code;
+}
+
+# aes-192 dec with round keys v1-v13
+sub aes_192_dec {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V13]}
+ @{[vaesdm_vs $V24, $V12]}
+ @{[vaesdm_vs $V24, $V11]}
+ @{[vaesdm_vs $V24, $V10]}
+ @{[vaesdm_vs $V24, $V9]}
+ @{[vaesdm_vs $V24, $V8]}
+ @{[vaesdm_vs $V24, $V7]}
+ @{[vaesdm_vs $V24, $V6]}
+ @{[vaesdm_vs $V24, $V5]}
+ @{[vaesdm_vs $V24, $V4]}
+ @{[vaesdm_vs $V24, $V3]}
+ @{[vaesdm_vs $V24, $V2]}
+ @{[vaesdf_vs $V24, $V1]}
+___
+
+ return $code;
+}
+
+# aes-256 enc with round keys v1-v15
+sub aes_256_enc {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V1]}
+ @{[vaesem_vs $V24, $V2]}
+ @{[vaesem_vs $V24, $V3]}
+ @{[vaesem_vs $V24, $V4]}
+ @{[vaesem_vs $V24, $V5]}
+ @{[vaesem_vs $V24, $V6]}
+ @{[vaesem_vs $V24, $V7]}
+ @{[vaesem_vs $V24, $V8]}
+ @{[vaesem_vs $V24, $V9]}
+ @{[vaesem_vs $V24, $V10]}
+ @{[vaesem_vs $V24, $V11]}
+ @{[vaesem_vs $V24, $V12]}
+ @{[vaesem_vs $V24, $V13]}
+ @{[vaesem_vs $V24, $V14]}
+ @{[vaesef_vs $V24, $V15]}
+___
+
+ return $code;
+}
+
+# aes-256 dec with round keys v1-v15
+sub aes_256_dec {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V15]}
+ @{[vaesdm_vs $V24, $V14]}
+ @{[vaesdm_vs $V24, $V13]}
+ @{[vaesdm_vs $V24, $V12]}
+ @{[vaesdm_vs $V24, $V11]}
+ @{[vaesdm_vs $V24, $V10]}
+ @{[vaesdm_vs $V24, $V9]}
+ @{[vaesdm_vs $V24, $V8]}
+ @{[vaesdm_vs $V24, $V7]}
+ @{[vaesdm_vs $V24, $V6]}
+ @{[vaesdm_vs $V24, $V5]}
+ @{[vaesdm_vs $V24, $V4]}
+ @{[vaesdm_vs $V24, $V3]}
+ @{[vaesdm_vs $V24, $V2]}
+ @{[vaesdf_vs $V24, $V1]}
+___
+
+ return $code;
+}
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt
+.type rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt,\@function
+rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt:
+ @{[load_xts_iv0]}
+
+ # aes block size is 16
+ andi $TAIL_LENGTH, $LENGTH, 15
+ mv $STORE_LEN32, $LENGTH
+ beqz $TAIL_LENGTH, 1f
+ sub $LENGTH, $LENGTH, $TAIL_LENGTH
+ addi $STORE_LEN32, $LENGTH, -16
+1:
+ # Convert `LENGTH` (bytes) into a number of e32 elements.
+ srli $LEN32, $LENGTH, 2
+ srli $STORE_LEN32, $STORE_LEN32, 2
+
+ # Load number of rounds
+ lwu $T0, 240($KEY)
+ li $T1, 14
+ li $T2, 12
+ li $T3, 10
+ beq $T0, $T1, aes_xts_enc_256
+ beq $T0, $T2, aes_xts_enc_192
+ beq $T0, $T3, aes_xts_enc_128
+.size rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt,.-rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_enc_128:
+ @{[init_first_round]}
+ @{[aes_128_load_key]}
+
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ j 1f
+
+.Lenc_blocks_128:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ # load plaintext into v24
+ @{[vle32_v $V24, $INPUT]}
+ # update iv
+ @{[vgmul_vv $V16, $V20]}
+ # reverse the iv's bits order back
+ @{[vbrev8_v $V28, $V16]}
+1:
+ @{[vxor_vv $V24, $V24, $V28]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+ add $INPUT, $INPUT, $T0
+ @{[aes_128_enc]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store ciphertext
+ @{[vsetvli "zero", $STORE_LEN32, "e32", "m4", "ta", "ma"]}
+ @{[vse32_v $V24, $OUTPUT]}
+ add $OUTPUT, $OUTPUT, $T0
+ sub $STORE_LEN32, $STORE_LEN32, $VL
+
+ bnez $LEN32, .Lenc_blocks_128
+
+ @{[handle_xts_enc_last_block]}
+
+ # xts last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V28]}
+ @{[aes_128_enc]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store last block ciphertext
+ addi $OUTPUT, $OUTPUT, -16
+ @{[vse32_v $V24, $OUTPUT]}
+
+ ret
+.size aes_xts_enc_128,.-aes_xts_enc_128
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_enc_192:
+ @{[init_first_round]}
+ @{[aes_192_load_key]}
+
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ j 1f
+
+.Lenc_blocks_192:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ # load plaintext into v24
+ @{[vle32_v $V24, $INPUT]}
+ # update iv
+ @{[vgmul_vv $V16, $V20]}
+ # reverse the iv's bits order back
+ @{[vbrev8_v $V28, $V16]}
+1:
+ @{[vxor_vv $V24, $V24, $V28]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+ add $INPUT, $INPUT, $T0
+ @{[aes_192_enc]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store ciphertext
+ @{[vsetvli "zero", $STORE_LEN32, "e32", "m4", "ta", "ma"]}
+ @{[vse32_v $V24, $OUTPUT]}
+ add $OUTPUT, $OUTPUT, $T0
+ sub $STORE_LEN32, $STORE_LEN32, $VL
+
+ bnez $LEN32, .Lenc_blocks_192
+
+ @{[handle_xts_enc_last_block]}
+
+ # xts last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V28]}
+ @{[aes_192_enc]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store last block ciphertext
+ addi $OUTPUT, $OUTPUT, -16
+ @{[vse32_v $V24, $OUTPUT]}
+
+ ret
+.size aes_xts_enc_192,.-aes_xts_enc_192
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_enc_256:
+ @{[init_first_round]}
+ @{[aes_256_load_key]}
+
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ j 1f
+
+.Lenc_blocks_256:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ # load plaintext into v24
+ @{[vle32_v $V24, $INPUT]}
+ # update iv
+ @{[vgmul_vv $V16, $V20]}
+ # reverse the iv's bits order back
+ @{[vbrev8_v $V28, $V16]}
+1:
+ @{[vxor_vv $V24, $V24, $V28]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+ add $INPUT, $INPUT, $T0
+ @{[aes_256_enc]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store ciphertext
+ @{[vsetvli "zero", $STORE_LEN32, "e32", "m4", "ta", "ma"]}
+ @{[vse32_v $V24, $OUTPUT]}
+ add $OUTPUT, $OUTPUT, $T0
+ sub $STORE_LEN32, $STORE_LEN32, $VL
+
+ bnez $LEN32, .Lenc_blocks_256
+
+ @{[handle_xts_enc_last_block]}
+
+ # xts last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V28]}
+ @{[aes_256_enc]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store last block ciphertext
+ addi $OUTPUT, $OUTPUT, -16
+ @{[vse32_v $V24, $OUTPUT]}
+
+ ret
+.size aes_xts_enc_256,.-aes_xts_enc_256
+___
+
+################################################################################
+# void rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt(const unsigned char *in,
+# unsigned char *out, size_t length,
+# const AES_KEY *key,
+# unsigned char iv[16],
+# int update_iv)
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt
+.type rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt,\@function
+rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt:
+ @{[load_xts_iv0]}
+
+ # aes block size is 16
+ andi $TAIL_LENGTH, $LENGTH, 15
+ beqz $TAIL_LENGTH, 1f
+ sub $LENGTH, $LENGTH, $TAIL_LENGTH
+ addi $LENGTH, $LENGTH, -16
+1:
+ # Convert `LENGTH` (bytes) into a number of e32 elements.
+ srli $LEN32, $LENGTH, 2
+
+ # Load number of rounds
+ lwu $T0, 240($KEY)
+ li $T1, 14
+ li $T2, 12
+ li $T3, 10
+ beq $T0, $T1, aes_xts_dec_256
+ beq $T0, $T2, aes_xts_dec_192
+ beq $T0, $T3, aes_xts_dec_128
+.size rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt,.-rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_dec_128:
+ @{[init_first_round]}
+ @{[aes_128_load_key]}
+
+ beqz $LEN32, 2f
+
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ j 1f
+
+.Ldec_blocks_128:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ # load ciphertext into v24
+ @{[vle32_v $V24, $INPUT]}
+ # update iv
+ @{[vgmul_vv $V16, $V20]}
+ # reverse the iv's bits order back
+ @{[vbrev8_v $V28, $V16]}
+1:
+ @{[vxor_vv $V24, $V24, $V28]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+ add $INPUT, $INPUT, $T0
+ @{[aes_128_dec]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store plaintext
+ @{[vse32_v $V24, $OUTPUT]}
+ add $OUTPUT, $OUTPUT, $T0
+
+ bnez $LEN32, .Ldec_blocks_128
+
+2:
+ @{[handle_xts_dec_last_block]}
+
+ ## xts second to last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V29]}
+ @{[aes_128_dec]}
+ @{[vxor_vv $V24, $V24, $V29]}
+ @{[vmv_v_v $V25, $V24]}
+
+ # load last block ciphertext
+ @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]}
+ @{[vle8_v $V24, $INPUT]}
+
+ # store second to last block plaintext
+ addi $T0, $OUTPUT, 16
+ @{[vse8_v $V25, $T0]}
+
+ ## xts last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V28]}
+ @{[aes_128_dec]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store second to last block plaintext
+ @{[vse32_v $V24, $OUTPUT]}
+
+ ret
+.size aes_xts_dec_128,.-aes_xts_dec_128
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_dec_192:
+ @{[init_first_round]}
+ @{[aes_192_load_key]}
+
+ beqz $LEN32, 2f
+
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ j 1f
+
+.Ldec_blocks_192:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ # load ciphertext into v24
+ @{[vle32_v $V24, $INPUT]}
+ # update iv
+ @{[vgmul_vv $V16, $V20]}
+ # reverse the iv's bits order back
+ @{[vbrev8_v $V28, $V16]}
+1:
+ @{[vxor_vv $V24, $V24, $V28]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+ add $INPUT, $INPUT, $T0
+ @{[aes_192_dec]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store plaintext
+ @{[vse32_v $V24, $OUTPUT]}
+ add $OUTPUT, $OUTPUT, $T0
+
+ bnez $LEN32, .Ldec_blocks_192
+
+2:
+ @{[handle_xts_dec_last_block]}
+
+ ## xts second to last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V29]}
+ @{[aes_192_dec]}
+ @{[vxor_vv $V24, $V24, $V29]}
+ @{[vmv_v_v $V25, $V24]}
+
+ # load last block ciphertext
+ @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]}
+ @{[vle8_v $V24, $INPUT]}
+
+ # store second to last block plaintext
+ addi $T0, $OUTPUT, 16
+ @{[vse8_v $V25, $T0]}
+
+ ## xts last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V28]}
+ @{[aes_192_dec]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store second to last block plaintext
+ @{[vse32_v $V24, $OUTPUT]}
+
+ ret
+.size aes_xts_dec_192,.-aes_xts_dec_192
+___
+
+$code .= <<___;
+.p2align 3
+aes_xts_dec_256:
+ @{[init_first_round]}
+ @{[aes_256_load_key]}
+
+ beqz $LEN32, 2f
+
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ j 1f
+
+.Ldec_blocks_256:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ # load ciphertext into v24
+ @{[vle32_v $V24, $INPUT]}
+ # update iv
+ @{[vgmul_vv $V16, $V20]}
+ # reverse the iv's bits order back
+ @{[vbrev8_v $V28, $V16]}
+1:
+ @{[vxor_vv $V24, $V24, $V28]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+ add $INPUT, $INPUT, $T0
+ @{[aes_256_dec]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store plaintext
+ @{[vse32_v $V24, $OUTPUT]}
+ add $OUTPUT, $OUTPUT, $T0
+
+ bnez $LEN32, .Ldec_blocks_256
+
+2:
+ @{[handle_xts_dec_last_block]}
+
+ ## xts second to last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V29]}
+ @{[aes_256_dec]}
+ @{[vxor_vv $V24, $V24, $V29]}
+ @{[vmv_v_v $V25, $V24]}
+
+ # load last block ciphertext
+ @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]}
+ @{[vle8_v $V24, $INPUT]}
+
+ # store second to last block plaintext
+ addi $T0, $OUTPUT, 16
+ @{[vse8_v $V25, $T0]}
+
+ ## xts last block
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V28]}
+ @{[aes_256_dec]}
+ @{[vxor_vv $V24, $V24, $V28]}
+
+ # store second to last block plaintext
+ @{[vse32_v $V24, $OUTPUT]}
+
+ ret
+.size aes_xts_dec_256,.-aes_xts_dec_256
+___
+}
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
diff --git a/arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl
new file mode 100644
index 000000000000..bc659da44c53
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl
@@ -0,0 +1,416 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Jerry Shih <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Vector AES block cipher extension ('Zvkned')
+# - RISC-V Zicclsm (main memory supports misaligned loads/stores)
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+################################################################################
+# void rv64i_zvkb_zvkned_ctr32_encrypt_blocks(const unsigned char *in,
+# unsigned char *out, size_t length,
+# const void *key,
+# unsigned char ivec[16]);
+{
+my ($INP, $OUTP, $LEN, $KEYP, $IVP) = ("a0", "a1", "a2", "a3", "a4");
+my ($T0, $T1, $T2, $T3) = ("t0", "t1", "t2", "t3");
+my ($VL) = ("t4");
+my ($LEN32) = ("t5");
+my ($CTR) = ("t6");
+my ($MASK) = ("v0");
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+ $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+ $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+ $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+# Prepare the AES ctr input data into v16.
+sub init_aes_ctr_input {
+ my $code=<<___;
+ # Setup mask into v0
+ # The mask pattern for 4*N-th elements
+ # mask v0: [000100010001....]
+ # Note:
+ # We could set up the mask just for the maximum element length instead
+ # of the VLMAX.
+ li $T0, 0b10001000
+ @{[vsetvli $T2, "zero", "e8", "m1", "ta", "ma"]}
+ @{[vmv_v_x $MASK, $T0]}
+ # Load IV.
+ # v31:[IV0, IV1, IV2, big-endian count]
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V31, $IVP]}
+ # Convert the big-endian counter into little-endian.
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]}
+ @{[vrev8_v $V31, $V31, $MASK]}
+ # Splat the IV to v16
+ @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
+ @{[vmv_v_i $V16, 0]}
+ @{[vaesz_vs $V16, $V31]}
+ # Prepare the ctr pattern into v20
+ # v20: [x, x, x, 0, x, x, x, 1, x, x, x, 2, ...]
+ @{[viota_m $V20, $MASK, $MASK]}
+ # v16:[IV0, IV1, IV2, count+0, IV0, IV1, IV2, count+1, ...]
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]}
+ @{[vadd_vv $V16, $V16, $V20, $MASK]}
+___
+
+ return $code;
+}
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkb_zvkned_ctr32_encrypt_blocks
+.type rv64i_zvkb_zvkned_ctr32_encrypt_blocks,\@function
+rv64i_zvkb_zvkned_ctr32_encrypt_blocks:
+ # The aes block size is 16 bytes.
+ # Round up to get the number of aes blocks, including the tail data.
+ addi $T0, $LEN, 15
+ # the minimum block number
+ srli $T0, $T0, 4
+ # Convert the block count into a number of e32 elements.
+ slli $LEN32, $T0, 2
+
+ # Load number of rounds
+ lwu $T0, 240($KEYP)
+ li $T1, 14
+ li $T2, 12
+ li $T3, 10
+
+ beq $T0, $T1, ctr32_encrypt_blocks_256
+ beq $T0, $T2, ctr32_encrypt_blocks_192
+ beq $T0, $T3, ctr32_encrypt_blocks_128
+
+ ret
+.size rv64i_zvkb_zvkned_ctr32_encrypt_blocks,.-rv64i_zvkb_zvkned_ctr32_encrypt_blocks
+___
+
+$code .= <<___;
+.p2align 3
+ctr32_encrypt_blocks_128:
+ # Load all 11 round keys to v1-v11 registers.
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V2, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V3, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V4, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V5, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V6, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V7, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V8, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V9, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V10, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, $KEYP]}
+
+ @{[init_aes_ctr_input]}
+
+ ##### AES body
+ j 2f
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]}
+ # Increase ctr in v16.
+ @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+2:
+ # Prepare the AES ctr input into v24.
+ # The ctr data uses big-endian form.
+ @{[vmv_v_v $V24, $V16]}
+ @{[vrev8_v $V24, $V24, $MASK]}
+ srli $CTR, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ # Load plaintext in bytes into v20.
+ @{[vsetvli $T0, $LEN, "e8", "m4", "ta", "ma"]}
+ @{[vle8_v $V20, $INP]}
+ sub $LEN, $LEN, $T0
+ add $INP, $INP, $T0
+
+ @{[vsetvli "zero", $VL, "e32", "m4", "ta", "ma"]}
+ @{[vaesz_vs $V24, $V1]}
+ @{[vaesem_vs $V24, $V2]}
+ @{[vaesem_vs $V24, $V3]}
+ @{[vaesem_vs $V24, $V4]}
+ @{[vaesem_vs $V24, $V5]}
+ @{[vaesem_vs $V24, $V6]}
+ @{[vaesem_vs $V24, $V7]}
+ @{[vaesem_vs $V24, $V8]}
+ @{[vaesem_vs $V24, $V9]}
+ @{[vaesem_vs $V24, $V10]}
+ @{[vaesef_vs $V24, $V11]}
+
+ # ciphertext
+ @{[vsetvli "zero", $T0, "e8", "m4", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V20]}
+
+ # Store the ciphertext.
+ @{[vse8_v $V24, $OUTP]}
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN, 1b
+
+ ## store ctr iv
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]}
+ # Increase ctr in v16.
+ @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+ # Convert ctr data back to big-endian.
+ @{[vrev8_v $V16, $V16, $MASK]}
+ @{[vse32_v $V16, $IVP]}
+
+ ret
+.size ctr32_encrypt_blocks_128,.-ctr32_encrypt_blocks_128
+___
+
+$code .= <<___;
+.p2align 3
+ctr32_encrypt_blocks_192:
+ # Load all 13 round keys to v1-v13 registers.
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V2, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V3, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V4, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V5, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V6, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V7, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V8, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V9, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V10, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V12, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V13, $KEYP]}
+
+ @{[init_aes_ctr_input]}
+
+ ##### AES body
+ j 2f
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]}
+ # Increase ctr in v16.
+ @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+2:
+ # Prepare the AES ctr input into v24.
+ # The ctr data uses big-endian form.
+ @{[vmv_v_v $V24, $V16]}
+ @{[vrev8_v $V24, $V24, $MASK]}
+ srli $CTR, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ # Load plaintext in bytes into v20.
+ @{[vsetvli $T0, $LEN, "e8", "m4", "ta", "ma"]}
+ @{[vle8_v $V20, $INP]}
+ sub $LEN, $LEN, $T0
+ add $INP, $INP, $T0
+
+ @{[vsetvli "zero", $VL, "e32", "m4", "ta", "ma"]}
+ @{[vaesz_vs $V24, $V1]}
+ @{[vaesem_vs $V24, $V2]}
+ @{[vaesem_vs $V24, $V3]}
+ @{[vaesem_vs $V24, $V4]}
+ @{[vaesem_vs $V24, $V5]}
+ @{[vaesem_vs $V24, $V6]}
+ @{[vaesem_vs $V24, $V7]}
+ @{[vaesem_vs $V24, $V8]}
+ @{[vaesem_vs $V24, $V9]}
+ @{[vaesem_vs $V24, $V10]}
+ @{[vaesem_vs $V24, $V11]}
+ @{[vaesem_vs $V24, $V12]}
+ @{[vaesef_vs $V24, $V13]}
+
+ # ciphertext
+ @{[vsetvli "zero", $T0, "e8", "m4", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V20]}
+
+ # Store the ciphertext.
+ @{[vse8_v $V24, $OUTP]}
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN, 1b
+
+ ## store ctr iv
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]}
+ # Increase ctr in v16.
+ @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+ # Convert ctr data back to big-endian.
+ @{[vrev8_v $V16, $V16, $MASK]}
+ @{[vse32_v $V16, $IVP]}
+
+ ret
+.size ctr32_encrypt_blocks_192,.-ctr32_encrypt_blocks_192
+___
+
+$code .= <<___;
+.p2align 3
+ctr32_encrypt_blocks_256:
+ # Load all 15 round keys to v1-v15 registers.
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V2, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V3, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V4, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V5, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V6, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V7, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V8, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V9, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V10, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V12, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V13, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V14, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V15, $KEYP]}
+
+ @{[init_aes_ctr_input]}
+
+ ##### AES body
+ j 2f
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]}
+ # Increase ctr in v16.
+ @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+2:
+ # Prepare the AES ctr input into v24.
+ # The ctr data uses big-endian form.
+ @{[vmv_v_v $V24, $V16]}
+ @{[vrev8_v $V24, $V24, $MASK]}
+ srli $CTR, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ # Load plaintext in bytes into v20.
+ @{[vsetvli $T0, $LEN, "e8", "m4", "ta", "ma"]}
+ @{[vle8_v $V20, $INP]}
+ sub $LEN, $LEN, $T0
+ add $INP, $INP, $T0
+
+ @{[vsetvli "zero", $VL, "e32", "m4", "ta", "ma"]}
+ @{[vaesz_vs $V24, $V1]}
+ @{[vaesem_vs $V24, $V2]}
+ @{[vaesem_vs $V24, $V3]}
+ @{[vaesem_vs $V24, $V4]}
+ @{[vaesem_vs $V24, $V5]}
+ @{[vaesem_vs $V24, $V6]}
+ @{[vaesem_vs $V24, $V7]}
+ @{[vaesem_vs $V24, $V8]}
+ @{[vaesem_vs $V24, $V9]}
+ @{[vaesem_vs $V24, $V10]}
+ @{[vaesem_vs $V24, $V11]}
+ @{[vaesem_vs $V24, $V12]}
+ @{[vaesem_vs $V24, $V13]}
+ @{[vaesem_vs $V24, $V14]}
+ @{[vaesef_vs $V24, $V15]}
+
+ # ciphertext
+ @{[vsetvli "zero", $T0, "e8", "m4", "ta", "ma"]}
+ @{[vxor_vv $V24, $V24, $V20]}
+
+ # Store the ciphertext.
+ @{[vse8_v $V24, $OUTP]}
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN, 1b
+
+ ## store ctr iv
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]}
+ # Increase ctr in v16.
+ @{[vadd_vx $V16, $V16, $CTR, $MASK]}
+ # Convert ctr data back to big-endian.
+ @{[vrev8_v $V16, $V16, $MASK]}
+ @{[vse32_v $V16, $IVP]}
+
+ ret
+.size ctr32_encrypt_blocks_256,.-ctr32_encrypt_blocks_256
+___
+}
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl
index c0ecde77bf56..4689f878463a 100644
--- a/arch/riscv/crypto/aes-riscv64-zvkned.pl
+++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl
@@ -66,6 +66,753 @@ my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
$V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
) = map("v$_",(0..31));

+# Load all 11 round keys to v1-v11 registers.
+sub aes_128_load_key {
+ my $KEYP = shift;
+
+ my $code=<<___;
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V2, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V3, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V4, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V5, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V6, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V7, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V8, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V9, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V10, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, $KEYP]}
+___
+
+ return $code;
+}
+
+# Load all 13 round keys to v1-v13 registers.
+sub aes_192_load_key {
+ my $KEYP = shift;
+
+ my $code=<<___;
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V2, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V3, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V4, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V5, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V6, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V7, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V8, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V9, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V10, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V12, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V13, $KEYP]}
+___
+
+ return $code;
+}
+
+# Load all 15 round keys to v1-v15 registers.
+sub aes_256_load_key {
+ my $KEYP = shift;
+
+ my $code=<<___;
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vle32_v $V1, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V2, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V3, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V4, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V5, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V6, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V7, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V8, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V9, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V10, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V11, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V12, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V13, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V14, $KEYP]}
+ addi $KEYP, $KEYP, 16
+ @{[vle32_v $V15, $KEYP]}
+___
+
+ return $code;
+}
+
+# aes-128 encryption with round keys v1-v11
+sub aes_128_encrypt {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V1]} # with round key w[ 0, 3]
+ @{[vaesem_vs $V24, $V2]} # with round key w[ 4, 7]
+ @{[vaesem_vs $V24, $V3]} # with round key w[ 8,11]
+ @{[vaesem_vs $V24, $V4]} # with round key w[12,15]
+ @{[vaesem_vs $V24, $V5]} # with round key w[16,19]
+ @{[vaesem_vs $V24, $V6]} # with round key w[20,23]
+ @{[vaesem_vs $V24, $V7]} # with round key w[24,27]
+ @{[vaesem_vs $V24, $V8]} # with round key w[28,31]
+ @{[vaesem_vs $V24, $V9]} # with round key w[32,35]
+ @{[vaesem_vs $V24, $V10]} # with round key w[36,39]
+ @{[vaesef_vs $V24, $V11]} # with round key w[40,43]
+___
+
+ return $code;
+}
+
+# aes-128 decryption with round keys v1-v11
+sub aes_128_decrypt {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V11]} # with round key w[40,43]
+ @{[vaesdm_vs $V24, $V10]} # with round key w[36,39]
+ @{[vaesdm_vs $V24, $V9]} # with round key w[32,35]
+ @{[vaesdm_vs $V24, $V8]} # with round key w[28,31]
+ @{[vaesdm_vs $V24, $V7]} # with round key w[24,27]
+ @{[vaesdm_vs $V24, $V6]} # with round key w[20,23]
+ @{[vaesdm_vs $V24, $V5]} # with round key w[16,19]
+ @{[vaesdm_vs $V24, $V4]} # with round key w[12,15]
+ @{[vaesdm_vs $V24, $V3]} # with round key w[ 8,11]
+ @{[vaesdm_vs $V24, $V2]} # with round key w[ 4, 7]
+ @{[vaesdf_vs $V24, $V1]} # with round key w[ 0, 3]
+___
+
+ return $code;
+}
+
+# aes-192 encryption with round keys v1-v13
+sub aes_192_encrypt {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V1]} # with round key w[ 0, 3]
+ @{[vaesem_vs $V24, $V2]} # with round key w[ 4, 7]
+ @{[vaesem_vs $V24, $V3]} # with round key w[ 8,11]
+ @{[vaesem_vs $V24, $V4]} # with round key w[12,15]
+ @{[vaesem_vs $V24, $V5]} # with round key w[16,19]
+ @{[vaesem_vs $V24, $V6]} # with round key w[20,23]
+ @{[vaesem_vs $V24, $V7]} # with round key w[24,27]
+ @{[vaesem_vs $V24, $V8]} # with round key w[28,31]
+ @{[vaesem_vs $V24, $V9]} # with round key w[32,35]
+ @{[vaesem_vs $V24, $V10]} # with round key w[36,39]
+ @{[vaesem_vs $V24, $V11]} # with round key w[40,43]
+ @{[vaesem_vs $V24, $V12]} # with round key w[44,47]
+ @{[vaesef_vs $V24, $V13]} # with round key w[48,51]
+___
+
+ return $code;
+}
+
+# aes-192 decryption with round keys v1-v13
+sub aes_192_decrypt {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V13]} # with round key w[48,51]
+ @{[vaesdm_vs $V24, $V12]} # with round key w[44,47]
+ @{[vaesdm_vs $V24, $V11]} # with round key w[40,43]
+ @{[vaesdm_vs $V24, $V10]} # with round key w[36,39]
+ @{[vaesdm_vs $V24, $V9]} # with round key w[32,35]
+ @{[vaesdm_vs $V24, $V8]} # with round key w[28,31]
+ @{[vaesdm_vs $V24, $V7]} # with round key w[24,27]
+ @{[vaesdm_vs $V24, $V6]} # with round key w[20,23]
+ @{[vaesdm_vs $V24, $V5]} # with round key w[16,19]
+ @{[vaesdm_vs $V24, $V4]} # with round key w[12,15]
+ @{[vaesdm_vs $V24, $V3]} # with round key w[ 8,11]
+ @{[vaesdm_vs $V24, $V2]} # with round key w[ 4, 7]
+ @{[vaesdf_vs $V24, $V1]} # with round key w[ 0, 3]
+___
+
+ return $code;
+}
+
+# aes-256 encryption with round keys v1-v15
+sub aes_256_encrypt {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V1]} # with round key w[ 0, 3]
+ @{[vaesem_vs $V24, $V2]} # with round key w[ 4, 7]
+ @{[vaesem_vs $V24, $V3]} # with round key w[ 8,11]
+ @{[vaesem_vs $V24, $V4]} # with round key w[12,15]
+ @{[vaesem_vs $V24, $V5]} # with round key w[16,19]
+ @{[vaesem_vs $V24, $V6]} # with round key w[20,23]
+ @{[vaesem_vs $V24, $V7]} # with round key w[24,27]
+ @{[vaesem_vs $V24, $V8]} # with round key w[28,31]
+ @{[vaesem_vs $V24, $V9]} # with round key w[32,35]
+ @{[vaesem_vs $V24, $V10]} # with round key w[36,39]
+ @{[vaesem_vs $V24, $V11]} # with round key w[40,43]
+ @{[vaesem_vs $V24, $V12]} # with round key w[44,47]
+ @{[vaesem_vs $V24, $V13]} # with round key w[48,51]
+ @{[vaesem_vs $V24, $V14]} # with round key w[52,55]
+ @{[vaesef_vs $V24, $V15]} # with round key w[56,59]
+___
+
+ return $code;
+}
+
+# aes-256 decryption with round keys v1-v15
+sub aes_256_decrypt {
+ my $code=<<___;
+ @{[vaesz_vs $V24, $V15]} # with round key w[56,59]
+ @{[vaesdm_vs $V24, $V14]} # with round key w[52,55]
+ @{[vaesdm_vs $V24, $V13]} # with round key w[48,51]
+ @{[vaesdm_vs $V24, $V12]} # with round key w[44,47]
+ @{[vaesdm_vs $V24, $V11]} # with round key w[40,43]
+ @{[vaesdm_vs $V24, $V10]} # with round key w[36,39]
+ @{[vaesdm_vs $V24, $V9]} # with round key w[32,35]
+ @{[vaesdm_vs $V24, $V8]} # with round key w[28,31]
+ @{[vaesdm_vs $V24, $V7]} # with round key w[24,27]
+ @{[vaesdm_vs $V24, $V6]} # with round key w[20,23]
+ @{[vaesdm_vs $V24, $V5]} # with round key w[16,19]
+ @{[vaesdm_vs $V24, $V4]} # with round key w[12,15]
+ @{[vaesdm_vs $V24, $V3]} # with round key w[ 8,11]
+ @{[vaesdm_vs $V24, $V2]} # with round key w[ 4, 7]
+ @{[vaesdf_vs $V24, $V1]} # with round key w[ 0, 3]
+___
+
+ return $code;
+}
+
+{
+###############################################################################
+# void rv64i_zvkned_cbc_encrypt(const unsigned char *in, unsigned char *out,
+# size_t length, const AES_KEY *key,
+# unsigned char *ivec, const int enc);
+my ($INP, $OUTP, $LEN, $KEYP, $IVP, $ENC) = ("a0", "a1", "a2", "a3", "a4", "a5");
+my ($T0, $T1, $ROUNDS) = ("t0", "t1", "t2");
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_cbc_encrypt
+.type rv64i_zvkned_cbc_encrypt,\@function
+rv64i_zvkned_cbc_encrypt:
+ # check whether the length is a multiple of 16 and >= 16
+ li $T1, 16
+ blt $LEN, $T1, L_end
+ andi $T1, $LEN, 15
+ bnez $T1, L_end
+
+ # Load number of rounds
+ lwu $ROUNDS, 240($KEYP)
+
+ # Get proper routine for key size
+ li $T0, 10
+ beq $ROUNDS, $T0, L_cbc_enc_128
+
+ li $T0, 12
+ beq $ROUNDS, $T0, L_cbc_enc_192
+
+ li $T0, 14
+ beq $ROUNDS, $T0, L_cbc_enc_256
+
+ ret
+.size rv64i_zvkned_cbc_encrypt,.-rv64i_zvkned_cbc_encrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_enc_128:
+ # Load all 11 round keys to v1-v11 registers.
+ @{[aes_128_load_key $KEYP]}
+
+ # Load IV.
+ @{[vle32_v $V16, $IVP]}
+
+ @{[vle32_v $V24, $INP]}
+ @{[vxor_vv $V24, $V24, $V16]}
+ j 2f
+
+1:
+ @{[vle32_v $V17, $INP]}
+ @{[vxor_vv $V24, $V24, $V17]}
+
+2:
+ # AES body
+ @{[aes_128_encrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ addi $INP, $INP, 16
+ addi $OUTP, $OUTP, 16
+ addi $LEN, $LEN, -16
+
+ bnez $LEN, 1b
+
+ @{[vse32_v $V24, $IVP]}
+
+ ret
+.size L_cbc_enc_128,.-L_cbc_enc_128
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_enc_192:
+ # Load all 13 round keys to v1-v13 registers.
+ @{[aes_192_load_key $KEYP]}
+
+ # Load IV.
+ @{[vle32_v $V16, $IVP]}
+
+ @{[vle32_v $V24, $INP]}
+ @{[vxor_vv $V24, $V24, $V16]}
+ j 2f
+
+1:
+ @{[vle32_v $V17, $INP]}
+ @{[vxor_vv $V24, $V24, $V17]}
+
+2:
+ # AES body
+ @{[aes_192_encrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ addi $INP, $INP, 16
+ addi $OUTP, $OUTP, 16
+ addi $LEN, $LEN, -16
+
+ bnez $LEN, 1b
+
+ @{[vse32_v $V24, $IVP]}
+
+ ret
+.size L_cbc_enc_192,.-L_cbc_enc_192
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_enc_256:
+ # Load all 15 round keys to v1-v15 registers.
+ @{[aes_256_load_key $KEYP]}
+
+ # Load IV.
+ @{[vle32_v $V16, $IVP]}
+
+ @{[vle32_v $V24, $INP]}
+ @{[vxor_vv $V24, $V24, $V16]}
+ j 2f
+
+1:
+ @{[vle32_v $V17, $INP]}
+ @{[vxor_vv $V24, $V24, $V17]}
+
+2:
+ # AES body
+ @{[aes_256_encrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ addi $INP, $INP, 16
+ addi $OUTP, $OUTP, 16
+ addi $LEN, $LEN, -16
+
+ bnez $LEN, 1b
+
+ @{[vse32_v $V24, $IVP]}
+
+ ret
+.size L_cbc_enc_256,.-L_cbc_enc_256
+___
+
+###############################################################################
+# void rv64i_zvkned_cbc_decrypt(const unsigned char *in, unsigned char *out,
+# size_t length, const AES_KEY *key,
+# unsigned char *ivec, const int enc);
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_cbc_decrypt
+.type rv64i_zvkned_cbc_decrypt,\@function
+rv64i_zvkned_cbc_decrypt:
+ # check whether the length is a multiple of 16 and >= 16
+ li $T1, 16
+ blt $LEN, $T1, L_end
+ andi $T1, $LEN, 15
+ bnez $T1, L_end
+
+ # Load number of rounds
+ lwu $ROUNDS, 240($KEYP)
+
+ # Get proper routine for key size
+ li $T0, 10
+ beq $ROUNDS, $T0, L_cbc_dec_128
+
+ li $T0, 12
+ beq $ROUNDS, $T0, L_cbc_dec_192
+
+ li $T0, 14
+ beq $ROUNDS, $T0, L_cbc_dec_256
+
+ ret
+.size rv64i_zvkned_cbc_decrypt,.-rv64i_zvkned_cbc_decrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_dec_128:
+ # Load all 11 round keys to v1-v11 registers.
+ @{[aes_128_load_key $KEYP]}
+
+ # Load IV.
+ @{[vle32_v $V16, $IVP]}
+
+ @{[vle32_v $V24, $INP]}
+ @{[vmv_v_v $V17, $V24]}
+ j 2f
+
+1:
+ @{[vle32_v $V24, $INP]}
+ @{[vmv_v_v $V17, $V24]}
+ addi $OUTP, $OUTP, 16
+
+2:
+ # AES body
+ @{[aes_128_decrypt]}
+
+ @{[vxor_vv $V24, $V24, $V16]}
+ @{[vse32_v $V24, $OUTP]}
+ @{[vmv_v_v $V16, $V17]}
+
+ addi $LEN, $LEN, -16
+ addi $INP, $INP, 16
+
+ bnez $LEN, 1b
+
+ @{[vse32_v $V16, $IVP]}
+
+ ret
+.size L_cbc_dec_128,.-L_cbc_dec_128
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_dec_192:
+ # Load all 13 round keys to v1-v13 registers.
+ @{[aes_192_load_key $KEYP]}
+
+ # Load IV.
+ @{[vle32_v $V16, $IVP]}
+
+ @{[vle32_v $V24, $INP]}
+ @{[vmv_v_v $V17, $V24]}
+ j 2f
+
+1:
+ @{[vle32_v $V24, $INP]}
+ @{[vmv_v_v $V17, $V24]}
+ addi $OUTP, $OUTP, 16
+
+2:
+ # AES body
+ @{[aes_192_decrypt]}
+
+ @{[vxor_vv $V24, $V24, $V16]}
+ @{[vse32_v $V24, $OUTP]}
+ @{[vmv_v_v $V16, $V17]}
+
+ addi $LEN, $LEN, -16
+ addi $INP, $INP, 16
+
+ bnez $LEN, 1b
+
+ @{[vse32_v $V16, $IVP]}
+
+ ret
+.size L_cbc_dec_192,.-L_cbc_dec_192
+___
+
+$code .= <<___;
+.p2align 3
+L_cbc_dec_256:
+ # Load all 15 round keys to v1-v15 registers.
+ @{[aes_256_load_key $KEYP]}
+
+ # Load IV.
+ @{[vle32_v $V16, $IVP]}
+
+ @{[vle32_v $V24, $INP]}
+ @{[vmv_v_v $V17, $V24]}
+ j 2f
+
+1:
+ @{[vle32_v $V24, $INP]}
+ @{[vmv_v_v $V17, $V24]}
+ addi $OUTP, $OUTP, 16
+
+2:
+ # AES body
+ @{[aes_256_decrypt]}
+
+ @{[vxor_vv $V24, $V24, $V16]}
+ @{[vse32_v $V24, $OUTP]}
+ @{[vmv_v_v $V16, $V17]}
+
+ addi $LEN, $LEN, -16
+ addi $INP, $INP, 16
+
+ bnez $LEN, 1b
+
+ @{[vse32_v $V16, $IVP]}
+
+ ret
+.size L_cbc_dec_256,.-L_cbc_dec_256
+___
+}
+
+{
+###############################################################################
+# void rv64i_zvkned_ecb_encrypt(const unsigned char *in, unsigned char *out,
+# size_t length, const AES_KEY *key,
+# const int enc);
+my ($INP, $OUTP, $LEN, $KEYP, $ENC) = ("a0", "a1", "a2", "a3", "a4");
+my ($REMAIN_LEN) = ("a5");
+my ($VL) = ("a6");
+my ($T0, $T1, $ROUNDS) = ("t0", "t1", "t2");
+my ($LEN32) = ("t3");
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_ecb_encrypt
+.type rv64i_zvkned_ecb_encrypt,\@function
+rv64i_zvkned_ecb_encrypt:
+    # Convert the byte length into a 32-bit element count.
+ srli $LEN32, $LEN, 2
+
+ # Load number of rounds
+ lwu $ROUNDS, 240($KEYP)
+
+ # Get proper routine for key size
+ li $T0, 10
+ beq $ROUNDS, $T0, L_ecb_enc_128
+
+ li $T0, 12
+ beq $ROUNDS, $T0, L_ecb_enc_192
+
+ li $T0, 14
+ beq $ROUNDS, $T0, L_ecb_enc_256
+
+ ret
+.size rv64i_zvkned_ecb_encrypt,.-rv64i_zvkned_ecb_encrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_enc_128:
+ # Load all 11 round keys to v1-v11 registers.
+ @{[aes_128_load_key $KEYP]}
+
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ @{[vle32_v $V24, $INP]}
+
+ # AES body
+ @{[aes_128_encrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ add $INP, $INP, $T0
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN32, 1b
+
+ ret
+.size L_ecb_enc_128,.-L_ecb_enc_128
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_enc_192:
+ # Load all 13 round keys to v1-v13 registers.
+ @{[aes_192_load_key $KEYP]}
+
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ @{[vle32_v $V24, $INP]}
+
+ # AES body
+ @{[aes_192_encrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ add $INP, $INP, $T0
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN32, 1b
+
+ ret
+.size L_ecb_enc_192,.-L_ecb_enc_192
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_enc_256:
+ # Load all 15 round keys to v1-v15 registers.
+ @{[aes_256_load_key $KEYP]}
+
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ @{[vle32_v $V24, $INP]}
+
+ # AES body
+ @{[aes_256_encrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ add $INP, $INP, $T0
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN32, 1b
+
+ ret
+.size L_ecb_enc_256,.-L_ecb_enc_256
+___
+
+###############################################################################
+# void rv64i_zvkned_ecb_decrypt(const unsigned char *in, unsigned char *out,
+# size_t length, const AES_KEY *key,
+# const int enc);
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_ecb_decrypt
+.type rv64i_zvkned_ecb_decrypt,\@function
+rv64i_zvkned_ecb_decrypt:
+    # Convert the byte length into a 32-bit element count.
+ srli $LEN32, $LEN, 2
+
+ # Load number of rounds
+ lwu $ROUNDS, 240($KEYP)
+
+ # Get proper routine for key size
+ li $T0, 10
+ beq $ROUNDS, $T0, L_ecb_dec_128
+
+ li $T0, 12
+ beq $ROUNDS, $T0, L_ecb_dec_192
+
+ li $T0, 14
+ beq $ROUNDS, $T0, L_ecb_dec_256
+
+ ret
+.size rv64i_zvkned_ecb_decrypt,.-rv64i_zvkned_ecb_decrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_dec_128:
+ # Load all 11 round keys to v1-v11 registers.
+ @{[aes_128_load_key $KEYP]}
+
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ @{[vle32_v $V24, $INP]}
+
+ # AES body
+ @{[aes_128_decrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ add $INP, $INP, $T0
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN32, 1b
+
+ ret
+.size L_ecb_dec_128,.-L_ecb_dec_128
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_dec_192:
+ # Load all 13 round keys to v1-v13 registers.
+ @{[aes_192_load_key $KEYP]}
+
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ @{[vle32_v $V24, $INP]}
+
+ # AES body
+ @{[aes_192_decrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ add $INP, $INP, $T0
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN32, 1b
+
+ ret
+.size L_ecb_dec_192,.-L_ecb_dec_192
+___
+
+$code .= <<___;
+.p2align 3
+L_ecb_dec_256:
+ # Load all 15 round keys to v1-v15 registers.
+ @{[aes_256_load_key $KEYP]}
+
+1:
+ @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
+ slli $T0, $VL, 2
+ sub $LEN32, $LEN32, $VL
+
+ @{[vle32_v $V24, $INP]}
+
+ # AES body
+ @{[aes_256_decrypt]}
+
+ @{[vse32_v $V24, $OUTP]}
+
+ add $INP, $INP, $T0
+ add $OUTP, $OUTP, $T0
+
+ bnez $LEN32, 1b
+
+ ret
+.size L_ecb_dec_256,.-L_ecb_dec_256
+___
+}
+
{
################################################################################
# void rv64i_zvkned_encrypt(const unsigned char *in, unsigned char *out,
@@ -98,42 +845,42 @@ $code .= <<___;
L_enc_128:
@{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}

- @{[vle32_v $V1, ($INP)]}
+ @{[vle32_v $V1, $INP]}

- @{[vle32_v $V10, ($KEYP)]}
+ @{[vle32_v $V10, $KEYP]}
@{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3]
addi $KEYP, $KEYP, 16
- @{[vle32_v $V11, ($KEYP)]}
+ @{[vle32_v $V11, $KEYP]}
@{[vaesem_vs $V1, $V11]} # with round key w[ 4, 7]
addi $KEYP, $KEYP, 16
- @{[vle32_v $V12, ($KEYP)]}
+ @{[vle32_v $V12, $KEYP]}
@{[vaesem_vs $V1, $V12]} # with round key w[ 8,11]
addi $KEYP, $KEYP, 16
- @{[vle32_v $V13, ($KEYP)]}
+ @{[vle32_v $V13, $KEYP]}
@{[vaesem_vs $V1, $V13]} # with round key w[12,15]
addi $KEYP, $KEYP, 16
- @{[vle32_v $V14, ($KEYP)]}
+ @{[vle32_v $V14, $KEYP]}
@{[vaesem_vs $V1, $V14]} # with round key w[16,19]
addi $KEYP, $KEYP, 16
- @{[vle32_v $V15, ($KEYP)]}
+ @{[vle32_v $V15, $KEYP]}
@{[vaesem_vs $V1, $V15]} # with round key w[20,23]
addi $KEYP, $KEYP, 16
- @{[vle32_v $V16, ($KEYP)]}
+ @{[vle32_v $V16, $KEYP]}
@{[vaesem_vs $V1, $V16]} # with round key w[24,27]
addi $KEYP, $KEYP, 16
- @{[vle32_v $V17, ($KEYP)]}
+ @{[vle32_v $V17, $KEYP]}
@{[vaesem_vs $V1, $V17]} # with round key w[28,31]
addi $KEYP, $KEYP, 16
- @{[vle32_v $V18, ($KEYP)]}
+ @{[vle32_v $V18, $KEYP]}
@{[vaesem_vs $V1, $V18]} # with round key w[32,35]
addi $KEYP, $KEYP, 16
- @{[vle32_v $V19, ($KEYP)]}
+ @{[vle32_v $V19, $KEYP]}
@{[vaesem_vs $V1, $V19]} # with round key w[36,39]
addi $KEYP, $KEYP, 16
- @{[vle32_v $V20, ($KEYP)]}
+ @{[vle32_v $V20, $KEYP]}
@{[vaesef_vs $V1, $V20]} # with round key w[40,43]

- @{[vse32_v $V1, ($OUTP)]}
+ @{[vse32_v $V1, $OUTP]}

ret
.size L_enc_128,.-L_enc_128
@@ -144,48 +891,48 @@ $code .= <<___;
L_enc_192:
@{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}

- @{[vle32_v $V1, ($INP)]}
+ @{[vle32_v $V1, $INP]}

- @{[vle32_v $V10, ($KEYP)]}
+ @{[vle32_v $V10, $KEYP]}
@{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3]
addi $KEYP, $KEYP, 16
- @{[vle32_v $V11, ($KEYP)]}
+ @{[vle32_v $V11, $KEYP]}
@{[vaesem_vs $V1, $V11]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V12, ($KEYP)]}
+ @{[vle32_v $V12, $KEYP]}
@{[vaesem_vs $V1, $V12]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V13, ($KEYP)]}
+ @{[vle32_v $V13, $KEYP]}
@{[vaesem_vs $V1, $V13]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V14, ($KEYP)]}
+ @{[vle32_v $V14, $KEYP]}
@{[vaesem_vs $V1, $V14]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V15, ($KEYP)]}
+ @{[vle32_v $V15, $KEYP]}
@{[vaesem_vs $V1, $V15]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V16, ($KEYP)]}
+ @{[vle32_v $V16, $KEYP]}
@{[vaesem_vs $V1, $V16]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V17, ($KEYP)]}
+ @{[vle32_v $V17, $KEYP]}
@{[vaesem_vs $V1, $V17]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V18, ($KEYP)]}
+ @{[vle32_v $V18, $KEYP]}
@{[vaesem_vs $V1, $V18]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V19, ($KEYP)]}
+ @{[vle32_v $V19, $KEYP]}
@{[vaesem_vs $V1, $V19]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V20, ($KEYP)]}
+ @{[vle32_v $V20, $KEYP]}
@{[vaesem_vs $V1, $V20]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V21, ($KEYP)]}
+ @{[vle32_v $V21, $KEYP]}
@{[vaesem_vs $V1, $V21]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V22, ($KEYP)]}
+ @{[vle32_v $V22, $KEYP]}
@{[vaesef_vs $V1, $V22]}

- @{[vse32_v $V1, ($OUTP)]}
+ @{[vse32_v $V1, $OUTP]}
ret
.size L_enc_192,.-L_enc_192
___
@@ -195,54 +942,54 @@ $code .= <<___;
L_enc_256:
@{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}

- @{[vle32_v $V1, ($INP)]}
+ @{[vle32_v $V1, $INP]}

- @{[vle32_v $V10, ($KEYP)]}
+ @{[vle32_v $V10, $KEYP]}
@{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3]
addi $KEYP, $KEYP, 16
- @{[vle32_v $V11, ($KEYP)]}
+ @{[vle32_v $V11, $KEYP]}
@{[vaesem_vs $V1, $V11]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V12, ($KEYP)]}
+ @{[vle32_v $V12, $KEYP]}
@{[vaesem_vs $V1, $V12]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V13, ($KEYP)]}
+ @{[vle32_v $V13, $KEYP]}
@{[vaesem_vs $V1, $V13]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V14, ($KEYP)]}
+ @{[vle32_v $V14, $KEYP]}
@{[vaesem_vs $V1, $V14]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V15, ($KEYP)]}
+ @{[vle32_v $V15, $KEYP]}
@{[vaesem_vs $V1, $V15]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V16, ($KEYP)]}
+ @{[vle32_v $V16, $KEYP]}
@{[vaesem_vs $V1, $V16]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V17, ($KEYP)]}
+ @{[vle32_v $V17, $KEYP]}
@{[vaesem_vs $V1, $V17]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V18, ($KEYP)]}
+ @{[vle32_v $V18, $KEYP]}
@{[vaesem_vs $V1, $V18]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V19, ($KEYP)]}
+ @{[vle32_v $V19, $KEYP]}
@{[vaesem_vs $V1, $V19]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V20, ($KEYP)]}
+ @{[vle32_v $V20, $KEYP]}
@{[vaesem_vs $V1, $V20]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V21, ($KEYP)]}
+ @{[vle32_v $V21, $KEYP]}
@{[vaesem_vs $V1, $V21]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V22, ($KEYP)]}
+ @{[vle32_v $V22, $KEYP]}
@{[vaesem_vs $V1, $V22]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V23, ($KEYP)]}
+ @{[vle32_v $V23, $KEYP]}
@{[vaesem_vs $V1, $V23]}
addi $KEYP, $KEYP, 16
- @{[vle32_v $V24, ($KEYP)]}
+ @{[vle32_v $V24, $KEYP]}
@{[vaesef_vs $V1, $V24]}

- @{[vse32_v $V1, ($OUTP)]}
+ @{[vse32_v $V1, $OUTP]}
ret
.size L_enc_256,.-L_enc_256
___
@@ -275,43 +1022,43 @@ $code .= <<___;
L_dec_128:
@{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}

- @{[vle32_v $V1, ($INP)]}
+ @{[vle32_v $V1, $INP]}

addi $KEYP, $KEYP, 160
- @{[vle32_v $V20, ($KEYP)]}
+ @{[vle32_v $V20, $KEYP]}
@{[vaesz_vs $V1, $V20]} # with round key w[40,43]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V19, ($KEYP)]}
+ @{[vle32_v $V19, $KEYP]}
@{[vaesdm_vs $V1, $V19]} # with round key w[36,39]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V18, ($KEYP)]}
+ @{[vle32_v $V18, $KEYP]}
@{[vaesdm_vs $V1, $V18]} # with round key w[32,35]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V17, ($KEYP)]}
+ @{[vle32_v $V17, $KEYP]}
@{[vaesdm_vs $V1, $V17]} # with round key w[28,31]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V16, ($KEYP)]}
+ @{[vle32_v $V16, $KEYP]}
@{[vaesdm_vs $V1, $V16]} # with round key w[24,27]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V15, ($KEYP)]}
+ @{[vle32_v $V15, $KEYP]}
@{[vaesdm_vs $V1, $V15]} # with round key w[20,23]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V14, ($KEYP)]}
+ @{[vle32_v $V14, $KEYP]}
@{[vaesdm_vs $V1, $V14]} # with round key w[16,19]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V13, ($KEYP)]}
+ @{[vle32_v $V13, $KEYP]}
@{[vaesdm_vs $V1, $V13]} # with round key w[12,15]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V12, ($KEYP)]}
+ @{[vle32_v $V12, $KEYP]}
@{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V11, ($KEYP)]}
+ @{[vle32_v $V11, $KEYP]}
@{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V10, ($KEYP)]}
+ @{[vle32_v $V10, $KEYP]}
@{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3]

- @{[vse32_v $V1, ($OUTP)]}
+ @{[vse32_v $V1, $OUTP]}

ret
.size L_dec_128,.-L_dec_128
@@ -322,49 +1069,49 @@ $code .= <<___;
L_dec_192:
@{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}

- @{[vle32_v $V1, ($INP)]}
+ @{[vle32_v $V1, $INP]}

addi $KEYP, $KEYP, 192
- @{[vle32_v $V22, ($KEYP)]}
+ @{[vle32_v $V22, $KEYP]}
@{[vaesz_vs $V1, $V22]} # with round key w[48,51]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V21, ($KEYP)]}
+ @{[vle32_v $V21, $KEYP]}
@{[vaesdm_vs $V1, $V21]} # with round key w[44,47]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V20, ($KEYP)]}
+ @{[vle32_v $V20, $KEYP]}
@{[vaesdm_vs $V1, $V20]} # with round key w[40,43]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V19, ($KEYP)]}
+ @{[vle32_v $V19, $KEYP]}
@{[vaesdm_vs $V1, $V19]} # with round key w[36,39]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V18, ($KEYP)]}
+ @{[vle32_v $V18, $KEYP]}
@{[vaesdm_vs $V1, $V18]} # with round key w[32,35]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V17, ($KEYP)]}
+ @{[vle32_v $V17, $KEYP]}
@{[vaesdm_vs $V1, $V17]} # with round key w[28,31]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V16, ($KEYP)]}
+ @{[vle32_v $V16, $KEYP]}
@{[vaesdm_vs $V1, $V16]} # with round key w[24,27]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V15, ($KEYP)]}
+ @{[vle32_v $V15, $KEYP]}
@{[vaesdm_vs $V1, $V15]} # with round key w[20,23]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V14, ($KEYP)]}
+ @{[vle32_v $V14, $KEYP]}
@{[vaesdm_vs $V1, $V14]} # with round key w[16,19]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V13, ($KEYP)]}
+ @{[vle32_v $V13, $KEYP]}
@{[vaesdm_vs $V1, $V13]} # with round key w[12,15]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V12, ($KEYP)]}
+ @{[vle32_v $V12, $KEYP]}
@{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V11, ($KEYP)]}
+ @{[vle32_v $V11, $KEYP]}
@{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V10, ($KEYP)]}
+ @{[vle32_v $V10, $KEYP]}
@{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3]

- @{[vse32_v $V1, ($OUTP)]}
+ @{[vse32_v $V1, $OUTP]}

ret
.size L_dec_192,.-L_dec_192
@@ -375,55 +1122,55 @@ $code .= <<___;
L_dec_256:
@{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}

- @{[vle32_v $V1, ($INP)]}
+ @{[vle32_v $V1, $INP]}

addi $KEYP, $KEYP, 224
- @{[vle32_v $V24, ($KEYP)]}
+ @{[vle32_v $V24, $KEYP]}
@{[vaesz_vs $V1, $V24]} # with round key w[56,59]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V23, ($KEYP)]}
+ @{[vle32_v $V23, $KEYP]}
@{[vaesdm_vs $V1, $V23]} # with round key w[52,55]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V22, ($KEYP)]}
+ @{[vle32_v $V22, $KEYP]}
@{[vaesdm_vs $V1, $V22]} # with round key w[48,51]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V21, ($KEYP)]}
+ @{[vle32_v $V21, $KEYP]}
@{[vaesdm_vs $V1, $V21]} # with round key w[44,47]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V20, ($KEYP)]}
+ @{[vle32_v $V20, $KEYP]}
@{[vaesdm_vs $V1, $V20]} # with round key w[40,43]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V19, ($KEYP)]}
+ @{[vle32_v $V19, $KEYP]}
@{[vaesdm_vs $V1, $V19]} # with round key w[36,39]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V18, ($KEYP)]}
+ @{[vle32_v $V18, $KEYP]}
@{[vaesdm_vs $V1, $V18]} # with round key w[32,35]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V17, ($KEYP)]}
+ @{[vle32_v $V17, $KEYP]}
@{[vaesdm_vs $V1, $V17]} # with round key w[28,31]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V16, ($KEYP)]}
+ @{[vle32_v $V16, $KEYP]}
@{[vaesdm_vs $V1, $V16]} # with round key w[24,27]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V15, ($KEYP)]}
+ @{[vle32_v $V15, $KEYP]}
@{[vaesdm_vs $V1, $V15]} # with round key w[20,23]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V14, ($KEYP)]}
+ @{[vle32_v $V14, $KEYP]}
@{[vaesdm_vs $V1, $V14]} # with round key w[16,19]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V13, ($KEYP)]}
+ @{[vle32_v $V13, $KEYP]}
@{[vaesdm_vs $V1, $V13]} # with round key w[12,15]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V12, ($KEYP)]}
+ @{[vle32_v $V12, $KEYP]}
@{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V11, ($KEYP)]}
+ @{[vle32_v $V11, $KEYP]}
@{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7]
addi $KEYP, $KEYP, -16
- @{[vle32_v $V10, ($KEYP)]}
+ @{[vle32_v $V10, $KEYP]}
@{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3]

- @{[vse32_v $V1, ($OUTP)]}
+ @{[vse32_v $V1, $OUTP]}

ret
.size L_dec_256,.-L_dec_256
--
2.28.0

2023-10-25 18:38:44

by Jerry Shih

[permalink] [raw]
Subject: [PATCH 10/12] RISC-V: crypto: add Zvksed accelerated SM4 implementation

Add SM4 implementation using Zvksed vector crypto extension from OpenSSL
(openssl/openssl#21923).

The perlasm here differs from the original implementation in OpenSSL.
In OpenSSL, SM4 has separate set_encrypt_key and set_decrypt_key
functions. In the kernel, these set_key functions are merged into a
single one in order to skip the redundant key-expansion work.
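
To illustrate the idea, here is a minimal C sketch (not the code in this
patch; sm4_expand_enc_key() is a hypothetical stand-in for the Zvksed key
schedule): SM4's decryption round keys are the encryption round keys in
reverse order, so a single expansion can fill both schedules.

	/* Sketch only: expand the key once, then mirror it for decryption. */
	static void sm4_set_key_merged(const u8 user_key[16],
				       u32 rk_enc[32], u32 rk_dec[32])
	{
		int i;

		/* Hypothetical helper: one pass of the SM4 key schedule. */
		sm4_expand_enc_key(user_key, rk_enc);

		/* The decryption schedule is the encryption schedule reversed. */
		for (i = 0; i < 32; i++)
			rk_dec[i] = rk_enc[31 - i];
	}

The perlasm below does the same in one pass: it computes the eight vsm4k.vi
key groups once, stores them forward for the encryption keys and with a
negative stride for the decryption keys.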

Co-developed-by: Christoph Müllner <[email protected]>
Signed-off-by: Christoph Müllner <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
arch/riscv/crypto/Kconfig | 19 ++
arch/riscv/crypto/Makefile | 7 +
arch/riscv/crypto/sm4-riscv64-glue.c | 120 +++++++++++
arch/riscv/crypto/sm4-riscv64-zvksed.pl | 268 ++++++++++++++++++++++++
4 files changed, 414 insertions(+)
create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index a5a34777d9c3..ff004afa2874 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -75,4 +75,23 @@ config CRYPTO_SHA512_RISCV64
- Zvkb vector crypto extension
- Zvknhb vector crypto extension

+config CRYPTO_SM4_RISCV64
+ default y if RISCV_ISA_V
+ tristate "Ciphers: SM4 (ShangMi 4)"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_ALGAPI
+ select CRYPTO_SM4
+ select CRYPTO_SM4_GENERIC
+ help
+ SM4 cipher algorithms (OSCCA GB/T 32907-2016,
+ ISO/IEC 18033-3:2010/Amd 1:2021)
+
+ SM4 (GBT.32907-2016) is a cryptographic standard issued by the
+ Organization of State Commercial Administration of China (OSCCA)
+	  as an authorized cryptographic algorithm for use within China.
+
+ Architecture: riscv64 using:
+ - Zvkb vector crypto extension
+ - Zvksed vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index ebb4e861ebdc..ffad095e531f 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -18,6 +18,9 @@ sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvkb-zvknha_or_zvknhb.o
obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvkb-zvknhb.o

+obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o
+sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed.o
+
quiet_cmd_perlasm = PERLASM $@
cmd_perlasm = $(PERL) $(<) void $(@)

@@ -39,9 +42,13 @@ $(obj)/sha256-riscv64-zvkb-zvknha_or_zvknhb.S: $(src)/sha256-riscv64-zvkb-zvknha
$(obj)/sha512-riscv64-zvkb-zvknhb.S: $(src)/sha512-riscv64-zvkb-zvknhb.pl
$(call cmd,perlasm)

+$(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl
+ $(call cmd,perlasm)
+
clean-files += aes-riscv64-zvkned.S
clean-files += aes-riscv64-zvbb-zvkg-zvkned.S
clean-files += aes-riscv64-zvkb-zvkned.S
clean-files += ghash-riscv64-zvkg.S
clean-files += sha256-riscv64-zvkb-zvknha_or_zvknhb.S
clean-files += sha512-riscv64-zvkb-zvknhb.S
+clean-files += sm4-riscv64-zvksed.S
diff --git a/arch/riscv/crypto/sm4-riscv64-glue.c b/arch/riscv/crypto/sm4-riscv64-glue.c
new file mode 100644
index 000000000000..2d9983a4b1ee
--- /dev/null
+++ b/arch/riscv/crypto/sm4-riscv64-glue.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Linux/riscv64 port of the OpenSSL SM4 implementation for RISC-V 64
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/sm4.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/simd.h>
+#include <linux/crypto.h>
+#include <linux/delay.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+/* sm4 using zvksed vector crypto extension */
+void rv64i_zvksed_sm4_encrypt(const u8 *in, u8 *out, const u32 *key);
+void rv64i_zvksed_sm4_decrypt(const u8 *in, u8 *out, const u32 *key);
+int rv64i_zvksed_sm4_set_key(const u8 *userKey, unsigned int key_len,
+ u32 *enc_key, u32 *dec_key);
+
+static int riscv64_sm4_setkey_zvksed(struct crypto_tfm *tfm, const u8 *key,
+ unsigned int key_len)
+{
+ struct sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+ int ret = 0;
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ if (rv64i_zvksed_sm4_set_key(key, key_len, ctx->rkey_enc,
+ ctx->rkey_dec))
+ ret = -EINVAL;
+ kernel_vector_end();
+ } else {
+ ret = sm4_expandkey(ctx, key, key_len);
+ }
+
+ return ret;
+}
+
+static void riscv64_sm4_encrypt_zvksed(struct crypto_tfm *tfm, u8 *dst,
+ const u8 *src)
+{
+ const struct sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ rv64i_zvksed_sm4_encrypt(src, dst, ctx->rkey_enc);
+ kernel_vector_end();
+ } else {
+ sm4_crypt_block(ctx->rkey_enc, dst, src);
+ }
+}
+
+static void riscv64_sm4_decrypt_zvksed(struct crypto_tfm *tfm, u8 *dst,
+ const u8 *src)
+{
+ const struct sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ rv64i_zvksed_sm4_decrypt(src, dst, ctx->rkey_dec);
+ kernel_vector_end();
+ } else {
+ sm4_crypt_block(ctx->rkey_dec, dst, src);
+ }
+}
+
+struct crypto_alg riscv64_sm4_zvksed_alg = {
+ .cra_name = "sm4",
+ .cra_driver_name = "sm4-riscv64-zvkb-zvksed",
+ .cra_module = THIS_MODULE,
+ .cra_priority = 300,
+ .cra_flags = CRYPTO_ALG_TYPE_CIPHER,
+ .cra_blocksize = SM4_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct sm4_ctx),
+ .cra_cipher = {
+ .cia_min_keysize = SM4_KEY_SIZE,
+ .cia_max_keysize = SM4_KEY_SIZE,
+ .cia_setkey = riscv64_sm4_setkey_zvksed,
+ .cia_encrypt = riscv64_sm4_encrypt_zvksed,
+ .cia_decrypt = riscv64_sm4_decrypt_zvksed,
+ },
+};
+
+static inline bool check_sm4_ext(void)
+{
+ return riscv_isa_extension_available(NULL, ZVKSED) &&
+ riscv_isa_extension_available(NULL, ZVKB) &&
+ riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_sm4_mod_init(void)
+{
+ if (check_sm4_ext())
+ return crypto_register_alg(&riscv64_sm4_zvksed_alg);
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_sm4_mod_fini(void)
+{
+ if (check_sm4_ext())
+ crypto_unregister_alg(&riscv64_sm4_zvksed_alg);
+}
+
+module_init(riscv64_sm4_mod_init);
+module_exit(riscv64_sm4_mod_fini);
+
+MODULE_DESCRIPTION("SM4 (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sm4");
diff --git a/arch/riscv/crypto/sm4-riscv64-zvksed.pl b/arch/riscv/crypto/sm4-riscv64-zvksed.pl
new file mode 100644
index 000000000000..a764668c8398
--- /dev/null
+++ b/arch/riscv/crypto/sm4-riscv64-zvksed.pl
@@ -0,0 +1,268 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <[email protected]>
+# Copyright (c) 2023, Jerry Shih <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Vector SM4 Block Cipher extension ('Zvksed')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+####
+# int rv64i_zvksed_sm4_set_key(const u8 *userKey, unsigned int key_len,
+# u32 *enc_key, u32 *dec_key);
+#
+{
+my ($ukey,$key_len,$enc_key,$dec_key)=("a0","a1","a2","a3");
+my ($fk,$stride)=("a4","a5");
+my ($t0,$t1)=("t0","t1");
+my ($vukey,$vfk,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_set_key
+.type rv64i_zvksed_sm4_set_key,\@function
+rv64i_zvksed_sm4_set_key:
+ li $t0, 16
+ beq $t0, $key_len, 1f
+ li a0, 1
+ ret
+1:
+
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ # Load the user key
+ @{[vle32_v $vukey, $ukey]}
+ @{[vrev8_v $vukey, $vukey]}
+
+ # Load the FK.
+ la $fk, FK
+ @{[vle32_v $vfk, $fk]}
+
+ # Generate round keys.
+ @{[vxor_vv $vukey, $vukey, $vfk]}
+ @{[vsm4k_vi $vk0, $vukey, 0]} # rk[0:3]
+ @{[vsm4k_vi $vk1, $vk0, 1]} # rk[4:7]
+ @{[vsm4k_vi $vk2, $vk1, 2]} # rk[8:11]
+ @{[vsm4k_vi $vk3, $vk2, 3]} # rk[12:15]
+ @{[vsm4k_vi $vk4, $vk3, 4]} # rk[16:19]
+ @{[vsm4k_vi $vk5, $vk4, 5]} # rk[20:23]
+ @{[vsm4k_vi $vk6, $vk5, 6]} # rk[24:27]
+ @{[vsm4k_vi $vk7, $vk6, 7]} # rk[28:31]
+
+ # Store enc round keys
+ @{[vse32_v $vk0, $enc_key]} # rk[0:3]
+ addi $enc_key, $enc_key, 16
+ @{[vse32_v $vk1, $enc_key]} # rk[4:7]
+ addi $enc_key, $enc_key, 16
+ @{[vse32_v $vk2, $enc_key]} # rk[8:11]
+ addi $enc_key, $enc_key, 16
+ @{[vse32_v $vk3, $enc_key]} # rk[12:15]
+ addi $enc_key, $enc_key, 16
+ @{[vse32_v $vk4, $enc_key]} # rk[16:19]
+ addi $enc_key, $enc_key, 16
+ @{[vse32_v $vk5, $enc_key]} # rk[20:23]
+ addi $enc_key, $enc_key, 16
+ @{[vse32_v $vk6, $enc_key]} # rk[24:27]
+ addi $enc_key, $enc_key, 16
+ @{[vse32_v $vk7, $enc_key]} # rk[28:31]
+
+ # Store dec round keys in reverse order
+ addi $dec_key, $dec_key, 12
+ li $stride, -4
+ @{[vsse32_v $vk7, $dec_key, $stride]} # rk[31:28]
+ addi $dec_key, $dec_key, 16
+ @{[vsse32_v $vk6, $dec_key, $stride]} # rk[27:24]
+ addi $dec_key, $dec_key, 16
+ @{[vsse32_v $vk5, $dec_key, $stride]} # rk[23:20]
+ addi $dec_key, $dec_key, 16
+ @{[vsse32_v $vk4, $dec_key, $stride]} # rk[19:16]
+ addi $dec_key, $dec_key, 16
+ @{[vsse32_v $vk3, $dec_key, $stride]} # rk[15:12]
+ addi $dec_key, $dec_key, 16
+ @{[vsse32_v $vk2, $dec_key, $stride]} # rk[11:8]
+ addi $dec_key, $dec_key, 16
+ @{[vsse32_v $vk1, $dec_key, $stride]} # rk[7:4]
+ addi $dec_key, $dec_key, 16
+ @{[vsse32_v $vk0, $dec_key, $stride]} # rk[3:0]
+
+ li a0, 0
+ ret
+.size rv64i_zvksed_sm4_set_key,.-rv64i_zvksed_sm4_set_key
+___
+}
+
+####
+# void rv64i_zvksed_sm4_encrypt(const unsigned char *in, unsigned char *out,
+# const SM4_KEY *key);
+#
+{
+my ($in,$out,$keys,$stride)=("a0","a1","a2","t0");
+my ($vdata,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7,$vgen)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_encrypt
+.type rv64i_zvksed_sm4_encrypt,\@function
+rv64i_zvksed_sm4_encrypt:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ # Load input data
+ @{[vle32_v $vdata, $in]}
+ @{[vrev8_v $vdata, $vdata]}
+
+ # Order of elements was adjusted in sm4_set_key()
+ # Encrypt with all keys
+ @{[vle32_v $vk0, $keys]} # rk[0:3]
+ @{[vsm4r_vs $vdata, $vk0]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk1, $keys]} # rk[4:7]
+ @{[vsm4r_vs $vdata, $vk1]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk2, $keys]} # rk[8:11]
+ @{[vsm4r_vs $vdata, $vk2]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk3, $keys]} # rk[12:15]
+ @{[vsm4r_vs $vdata, $vk3]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk4, $keys]} # rk[16:19]
+ @{[vsm4r_vs $vdata, $vk4]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk5, $keys]} # rk[20:23]
+ @{[vsm4r_vs $vdata, $vk5]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk6, $keys]} # rk[24:27]
+ @{[vsm4r_vs $vdata, $vk6]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk7, $keys]} # rk[28:31]
+ @{[vsm4r_vs $vdata, $vk7]}
+
+ # Save the ciphertext (in reverse element order)
+ @{[vrev8_v $vdata, $vdata]}
+ li $stride, -4
+ addi $out, $out, 12
+ @{[vsse32_v $vdata, $out, $stride]}
+
+ ret
+.size rv64i_zvksed_sm4_encrypt,.-rv64i_zvksed_sm4_encrypt
+___
+}
+
+####
+# void rv64i_zvksed_sm4_decrypt(const unsigned char *in, unsigned char *out,
+# const SM4_KEY *key);
+#
+{
+my ($in,$out,$keys,$stride)=("a0","a1","a2","t0");
+my ($vdata,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7,$vgen)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_decrypt
+.type rv64i_zvksed_sm4_decrypt,\@function
+rv64i_zvksed_sm4_decrypt:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ # Load input data
+ @{[vle32_v $vdata, $in]}
+ @{[vrev8_v $vdata, $vdata]}
+
+ # Order of key elements was adjusted in sm4_set_key()
+ # Decrypt with all keys
+ @{[vle32_v $vk7, $keys]} # rk[31:28]
+ @{[vsm4r_vs $vdata, $vk7]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk6, $keys]} # rk[27:24]
+ @{[vsm4r_vs $vdata, $vk6]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk5, $keys]} # rk[23:20]
+ @{[vsm4r_vs $vdata, $vk5]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk4, $keys]} # rk[19:16]
+ @{[vsm4r_vs $vdata, $vk4]}
+ addi $keys, $keys, 16
+    @{[vle32_v $vk3, $keys]} # rk[15:12]
+ @{[vsm4r_vs $vdata, $vk3]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk2, $keys]} # rk[11:8]
+ @{[vsm4r_vs $vdata, $vk2]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk1, $keys]} # rk[7:4]
+ @{[vsm4r_vs $vdata, $vk1]}
+ addi $keys, $keys, 16
+ @{[vle32_v $vk0, $keys]} # rk[3:0]
+ @{[vsm4r_vs $vdata, $vk0]}
+
+    # Save the plaintext (in reverse element order)
+ @{[vrev8_v $vdata, $vdata]}
+ li $stride, -4
+ addi $out, $out, 12
+ @{[vsse32_v $vdata, $out, $stride]}
+
+ ret
+.size rv64i_zvksed_sm4_decrypt,.-rv64i_zvksed_sm4_decrypt
+___
+}
+
+$code .= <<___;
+# Family Key (little-endian 32-bit chunks)
+.p2align 3
+FK:
+ .word 0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC
+.size FK,.-FK
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
--
2.28.0

2023-10-25 18:38:53

by Jerry Shih

[permalink] [raw]
Subject: [PATCH 09/12] RISC-V: crypto: add Zvknhb accelerated SHA384/512 implementations

Add SHA-384 and SHA-512 implementations using the Zvknhb vector crypto
extension from OpenSSL (openssl/openssl#21923).

Co-developed-by: Charalampos Mitrodimas <[email protected]>
Signed-off-by: Charalampos Mitrodimas <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Co-developed-by: Phoebe Chen <[email protected]>
Signed-off-by: Phoebe Chen <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
arch/riscv/crypto/Kconfig | 13 +
arch/riscv/crypto/Makefile | 7 +
arch/riscv/crypto/sha512-riscv64-glue.c | 133 +++++++++
.../crypto/sha512-riscv64-zvkb-zvknhb.pl | 266 ++++++++++++++++++
4 files changed, 419 insertions(+)
create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sha512-riscv64-zvkb-zvknhb.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index dbc86592063a..a5a34777d9c3 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -62,4 +62,17 @@ config CRYPTO_SHA256_RISCV64
- Zvkb vector crypto extension
- Zvknha or Zvknhb vector crypto extensions

+config CRYPTO_SHA512_RISCV64
+ default y if RISCV_ISA_V
+ tristate "Hash functions: SHA-384 and SHA-512"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_HASH
+ select CRYPTO_SHA512
+ help
+ SHA-512 secure hash algorithm (FIPS 180)
+
+ Architecture: riscv64 using:
+ - Zvkb vector crypto extension
+ - Zvknhb vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 49577fb72cca..ebb4e861ebdc 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -15,6 +15,9 @@ ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o
obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvkb-zvknha_or_zvknhb.o

+obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
+sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvkb-zvknhb.o
+
quiet_cmd_perlasm = PERLASM $@
cmd_perlasm = $(PERL) $(<) void $(@)

@@ -33,8 +36,12 @@ $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
$(obj)/sha256-riscv64-zvkb-zvknha_or_zvknhb.S: $(src)/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl
$(call cmd,perlasm)

+$(obj)/sha512-riscv64-zvkb-zvknhb.S: $(src)/sha512-riscv64-zvkb-zvknhb.pl
+ $(call cmd,perlasm)
+
clean-files += aes-riscv64-zvkned.S
clean-files += aes-riscv64-zvbb-zvkg-zvkned.S
clean-files += aes-riscv64-zvkb-zvkned.S
clean-files += ghash-riscv64-zvkg.S
clean-files += sha256-riscv64-zvkb-zvknha_or_zvknhb.S
+clean-files += sha512-riscv64-zvkb-zvknhb.S
diff --git a/arch/riscv/crypto/sha512-riscv64-glue.c b/arch/riscv/crypto/sha512-riscv64-glue.c
new file mode 100644
index 000000000000..31202b7b8c9b
--- /dev/null
+++ b/arch/riscv/crypto/sha512-riscv64-glue.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SHA512 implementation for RISC-V 64
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sha512_base.h>
+
+/*
+ * sha512 using zvkb and zvknhb vector crypto extensions
+ *
+ * This asm function takes the first 512 bits at the pointer to
+ * `struct sha512_state` as the SHA-512 internal state.
+ */
+void sha512_block_data_order_zvkb_zvknhb(struct sha512_state *digest,
+ const u8 *data, int num_blks);
+
+static int riscv64_sha512_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ int ret = 0;
+
+	 * Make sure struct sha512_state begins directly with the SHA-512
+	 * 512-bit internal state, as this is what the asm function expects.
+ * 256-bit internal state, as this is what the asm function expect.
+ */
+ BUILD_BUG_ON(offsetof(struct sha512_state, state) != 0);
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ ret = sha512_base_do_update(
+ desc, data, len, sha512_block_data_order_zvkb_zvknhb);
+ kernel_vector_end();
+ } else {
+ ret = crypto_sha512_update(desc, data, len);
+ }
+
+ return ret;
+}
+
+static int riscv64_sha512_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ if (len)
+ sha512_base_do_update(
+ desc, data, len,
+ sha512_block_data_order_zvkb_zvknhb);
+ sha512_base_do_finalize(desc,
+ sha512_block_data_order_zvkb_zvknhb);
+ kernel_vector_end();
+
+ return sha512_base_finish(desc, out);
+ }
+
+ return crypto_sha512_finup(desc, data, len, out);
+}
+
+static int riscv64_sha512_final(struct shash_desc *desc, u8 *out)
+{
+ return riscv64_sha512_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sha512_alg[] = {
+ {
+ .digestsize = SHA512_DIGEST_SIZE,
+ .init = sha512_base_init,
+ .update = riscv64_sha512_update,
+ .final = riscv64_sha512_final,
+ .finup = riscv64_sha512_finup,
+ .descsize = sizeof(struct sha512_state),
+ .base.cra_name = "sha512",
+ .base.cra_driver_name = "sha512-riscv64-zvkb-zvknhb",
+ .base.cra_priority = 150,
+ .base.cra_blocksize = SHA512_BLOCK_SIZE,
+ .base.cra_module = THIS_MODULE,
+ },
+ {
+ .digestsize = SHA384_DIGEST_SIZE,
+ .init = sha384_base_init,
+ .update = riscv64_sha512_update,
+ .final = riscv64_sha512_final,
+ .finup = riscv64_sha512_finup,
+ .descsize = sizeof(struct sha512_state),
+ .base.cra_name = "sha384",
+ .base.cra_driver_name = "sha384-riscv64-zvkb-zvknhb",
+ .base.cra_priority = 150,
+ .base.cra_blocksize = SHA384_BLOCK_SIZE,
+ .base.cra_module = THIS_MODULE,
+ }
+};
+
+static inline bool check_sha512_ext(void)
+{
+ return riscv_isa_extension_available(NULL, ZVKNHB) &&
+ riscv_isa_extension_available(NULL, ZVKB) &&
+ riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_sha512_mod_init(void)
+{
+ if (check_sha512_ext())
+ return crypto_register_shashes(sha512_alg,
+ ARRAY_SIZE(sha512_alg));
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_sha512_mod_fini(void)
+{
+ if (check_sha512_ext())
+ crypto_unregister_shashes(sha512_alg, ARRAY_SIZE(sha512_alg));
+}
+
+module_init(riscv64_sha512_mod_init);
+module_exit(riscv64_sha512_mod_fini);
+
+MODULE_DESCRIPTION("SHA-512 (RISC-V accelerated)");
+MODULE_AUTHOR("Andy Polyakov <[email protected]>");
+MODULE_AUTHOR("Ard Biesheuvel <[email protected]>");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sha384");
+MODULE_ALIAS_CRYPTO("sha512");
diff --git a/arch/riscv/crypto/sha512-riscv64-zvkb-zvknhb.pl b/arch/riscv/crypto/sha512-riscv64-zvkb-zvknhb.pl
new file mode 100644
index 000000000000..4be448266a59
--- /dev/null
+++ b/arch/riscv/crypto/sha512-riscv64-zvkb-zvknhb.pl
@@ -0,0 +1,266 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <[email protected]>
+# Copyright (c) 2023, Phoebe Chen <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Vector SHA-2 Secure Hash extension ('Zvknhb')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+ $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+ $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+ $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+my $K512 = "K512";
+
+# Function arguments
+my ($H, $INP, $LEN, $KT, $H2, $INDEX_PATTERN) = ("a0", "a1", "a2", "a3", "t3", "t4");
+
+################################################################################
+# void sha512_block_data_order_zvkb_zvknhb(void *c, const void *p, size_t len)
+$code .= <<___;
+.p2align 2
+.globl sha512_block_data_order_zvkb_zvknhb
+.type sha512_block_data_order_zvkb_zvknhb,\@function
+sha512_block_data_order_zvkb_zvknhb:
+ @{[vsetivli "zero", 4, "e64", "m2", "ta", "ma"]}
+
+ # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c}
+ # The dst vtype is e64m2 and the index vtype is e8mf4.
+ # We use index-load with the following index pattern at v1.
+ # i8 index:
+ # 40, 32, 8, 0
+ # Instead of setting the i8 index, we could use a single 32bit
+ # little-endian value to cover the 4xi8 index.
+ # i32 value:
+ # 0x 00 08 20 28
+ li $INDEX_PATTERN, 0x00082028
+ @{[vsetivli "zero", 1, "e32", "m1", "ta", "ma"]}
+ @{[vmv_v_x $V1, $INDEX_PATTERN]}
+
+ addi $H2, $H, 16
+
+ # Use index-load to get {f,e,b,a},{h,g,d,c}
+ @{[vsetivli "zero", 4, "e64", "m2", "ta", "ma"]}
+ @{[vluxei8_v $V22, $H, $V1]}
+ @{[vluxei8_v $V24, $H2, $V1]}
+
+ # Set up the v0 mask for the vmerge that replaces the first word (idx==0) in key-scheduling.
+ # The AVL is 4 in SHA, so a single e8 element (an 8-bit mask) is enough for masking.
+ @{[vsetivli "zero", 1, "e8", "m1", "ta", "ma"]}
+ @{[vmv_v_i $V0, 0x01]}
+
+ @{[vsetivli "zero", 4, "e64", "m2", "ta", "ma"]}
+
+L_round_loop:
+ # Load round constants K512
+ la $KT, $K512
+
+ # Decrement length by 1
+ addi $LEN, $LEN, -1
+
+ # Keep the current state as we need it later: H' = H+{a',b',c',...,h'}.
+ @{[vmv_v_v $V26, $V22]}
+ @{[vmv_v_v $V28, $V24]}
+
+ # Load the 1024-bits of the message block in v10-v16 and perform the endian
+ # swap.
+ @{[vle64_v $V10, $INP]}
+ @{[vrev8_v $V10, $V10]}
+ addi $INP, $INP, 32
+ @{[vle64_v $V12, $INP]}
+ @{[vrev8_v $V12, $V12]}
+ addi $INP, $INP, 32
+ @{[vle64_v $V14, $INP]}
+ @{[vrev8_v $V14, $V14]}
+ addi $INP, $INP, 32
+ @{[vle64_v $V16, $INP]}
+ @{[vrev8_v $V16, $V16]}
+ addi $INP, $INP, 32
+
+ .rept 4
+ # Quad-round 0 (+0, v10->v12->v14->v16)
+ @{[vle64_v $V20, $KT]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V18, $V20, $V10]}
+ @{[vsha2cl_vv $V24, $V22, $V18]}
+ @{[vsha2ch_vv $V22, $V24, $V18]}
+ @{[vmerge_vvm $V18, $V14, $V12, $V0]}
+ @{[vsha2ms_vv $V10, $V18, $V16]}
+
+ # Quad-round 1 (+1, v12->v14->v16->v10)
+ @{[vle64_v $V20, $KT]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V18, $V20, $V12]}
+ @{[vsha2cl_vv $V24, $V22, $V18]}
+ @{[vsha2ch_vv $V22, $V24, $V18]}
+ @{[vmerge_vvm $V18, $V16, $V14, $V0]}
+ @{[vsha2ms_vv $V12, $V18, $V10]}
+
+ # Quad-round 2 (+2, v14->v16->v10->v12)
+ @{[vle64_v $V20, $KT]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V18, $V20, $V14]}
+ @{[vsha2cl_vv $V24, $V22, $V18]}
+ @{[vsha2ch_vv $V22, $V24, $V18]}
+ @{[vmerge_vvm $V18, $V10, $V16, $V0]}
+ @{[vsha2ms_vv $V14, $V18, $V12]}
+
+ # Quad-round 3 (+3, v16->v10->v12->v14)
+ @{[vle64_v $V20, $KT]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V18, $V20, $V16]}
+ @{[vsha2cl_vv $V24, $V22, $V18]}
+ @{[vsha2ch_vv $V22, $V24, $V18]}
+ @{[vmerge_vvm $V18, $V12, $V10, $V0]}
+ @{[vsha2ms_vv $V16, $V18, $V14]}
+ .endr
+
+ # Quad-round 16 (+0, v10->v12->v14->v16)
+ # Note that we stop generating new message schedule words (Wt, v10-16)
+ # as we already generated all the words we end up consuming (i.e., W[79:76]).
+ @{[vle64_v $V20, $KT]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V18, $V20, $V10]}
+ @{[vsha2cl_vv $V24, $V22, $V18]}
+ @{[vsha2ch_vv $V22, $V24, $V18]}
+
+ # Quad-round 17 (+1, v12->v14->v16->v10)
+ @{[vle64_v $V20, $KT]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V18, $V20, $V12]}
+ @{[vsha2cl_vv $V24, $V22, $V18]}
+ @{[vsha2ch_vv $V22, $V24, $V18]}
+
+ # Quad-round 18 (+2, v14->v16->v10->v12)
+ @{[vle64_v $V20, $KT]}
+ addi $KT, $KT, 32
+ @{[vadd_vv $V18, $V20, $V14]}
+ @{[vsha2cl_vv $V24, $V22, $V18]}
+ @{[vsha2ch_vv $V22, $V24, $V18]}
+
+ # Quad-round 19 (+3, v16->v10->v12->v14)
+ @{[vle64_v $V20, $KT]}
+ # No K512 pointer increment needed for the final quad-round.
+ @{[vadd_vv $V18, $V20, $V16]}
+ @{[vsha2cl_vv $V24, $V22, $V18]}
+ @{[vsha2ch_vv $V22, $V24, $V18]}
+
+ # H' = H+{a',b',c',...,h'}
+ @{[vadd_vv $V22, $V26, $V22]}
+ @{[vadd_vv $V24, $V28, $V24]}
+ bnez $LEN, L_round_loop
+
+ # Store {f,e,b,a},{h,g,d,c} back to {a,b,c,d},{e,f,g,h}.
+ @{[vsuxei8_v $V22, $H, $V1]}
+ @{[vsuxei8_v $V24, $H2, $V1]}
+
+ ret
+.size sha512_block_data_order_zvkb_zvknhb,.-sha512_block_data_order_zvkb_zvknhb
+
+.p2align 3
+.type $K512,\@object
+$K512:
+ .dword 0x428a2f98d728ae22, 0x7137449123ef65cd
+ .dword 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc
+ .dword 0x3956c25bf348b538, 0x59f111f1b605d019
+ .dword 0x923f82a4af194f9b, 0xab1c5ed5da6d8118
+ .dword 0xd807aa98a3030242, 0x12835b0145706fbe
+ .dword 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2
+ .dword 0x72be5d74f27b896f, 0x80deb1fe3b1696b1
+ .dword 0x9bdc06a725c71235, 0xc19bf174cf692694
+ .dword 0xe49b69c19ef14ad2, 0xefbe4786384f25e3
+ .dword 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65
+ .dword 0x2de92c6f592b0275, 0x4a7484aa6ea6e483
+ .dword 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5
+ .dword 0x983e5152ee66dfab, 0xa831c66d2db43210
+ .dword 0xb00327c898fb213f, 0xbf597fc7beef0ee4
+ .dword 0xc6e00bf33da88fc2, 0xd5a79147930aa725
+ .dword 0x06ca6351e003826f, 0x142929670a0e6e70
+ .dword 0x27b70a8546d22ffc, 0x2e1b21385c26c926
+ .dword 0x4d2c6dfc5ac42aed, 0x53380d139d95b3df
+ .dword 0x650a73548baf63de, 0x766a0abb3c77b2a8
+ .dword 0x81c2c92e47edaee6, 0x92722c851482353b
+ .dword 0xa2bfe8a14cf10364, 0xa81a664bbc423001
+ .dword 0xc24b8b70d0f89791, 0xc76c51a30654be30
+ .dword 0xd192e819d6ef5218, 0xd69906245565a910
+ .dword 0xf40e35855771202a, 0x106aa07032bbd1b8
+ .dword 0x19a4c116b8d2d0c8, 0x1e376c085141ab53
+ .dword 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8
+ .dword 0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb
+ .dword 0x5b9cca4f7763e373, 0x682e6ff3d6b2b8a3
+ .dword 0x748f82ee5defb2fc, 0x78a5636f43172f60
+ .dword 0x84c87814a1f0ab72, 0x8cc702081a6439ec
+ .dword 0x90befffa23631e28, 0xa4506cebde82bde9
+ .dword 0xbef9a3f7b2c67915, 0xc67178f2e372532b
+ .dword 0xca273eceea26619c, 0xd186b8c721c0c207
+ .dword 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178
+ .dword 0x06f067aa72176fba, 0x0a637dc5a2c898a6
+ .dword 0x113f9804bef90dae, 0x1b710b35131c471b
+ .dword 0x28db77f523047d84, 0x32caab7b40c72493
+ .dword 0x3c9ebe0a15c9bebc, 0x431d67c49c100d4c
+ .dword 0x4cc5d4becb3e42b6, 0x597f299cfc657e2a
+ .dword 0x5fcb6fab3ad6faec, 0x6c44198c4a475817
+.size $K512,.-$K512
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
--
2.28.0

2023-10-25 18:39:05

by Jerry Shih

[permalink] [raw]
Subject: [PATCH 11/12] RISC-V: crypto: add Zvksh accelerated SM3 implementation

Add SM3 implementation using Zvksh vector crypto extension from OpenSSL
(openssl/openssl#21923).

Co-developed-by: Christoph Müllner <[email protected]>
Signed-off-by: Christoph Müllner <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
arch/riscv/crypto/Kconfig | 13 ++
arch/riscv/crypto/Makefile | 7 +
arch/riscv/crypto/sm3-riscv64-glue.c | 121 +++++++++++++
arch/riscv/crypto/sm3-riscv64-zvksh.pl | 230 +++++++++++++++++++++++++
4 files changed, 371 insertions(+)
create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index ff004afa2874..2797b37394bb 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -75,6 +75,19 @@ config CRYPTO_SHA512_RISCV64
- Zvkb vector crypto extension
- Zvknhb vector crypto extension

+config CRYPTO_SM3_RISCV64
+ default y if RISCV_ISA_V
+ tristate "Hash functions: SM3 (ShangMi 3)"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_HASH
+ select CRYPTO_SM3
+ help
+ SM3 (ShangMi 3) secure hash function (OSCCA GM/T 0004-2012)
+
+ Architecture: riscv64 using:
+ - Zvkb vector crypto extension
+ - Zvksh vector crypto extension
+
config CRYPTO_SM4_RISCV64
default y if RISCV_ISA_V
tristate "Ciphers: SM4 (ShangMi 4)"
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index ffad095e531f..b772417703fd 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -18,6 +18,9 @@ sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvkb-zvknha_or_zvknhb.o
obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvkb-zvknhb.o

+obj-$(CONFIG_CRYPTO_SM3_RISCV64) += sm3-riscv64.o
+sm3-riscv64-y := sm3-riscv64-glue.o sm3-riscv64-zvksh.o
+
obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o
sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed.o

@@ -42,6 +45,9 @@ $(obj)/sha256-riscv64-zvkb-zvknha_or_zvknhb.S: $(src)/sha256-riscv64-zvkb-zvknha
$(obj)/sha512-riscv64-zvkb-zvknhb.S: $(src)/sha512-riscv64-zvkb-zvknhb.pl
$(call cmd,perlasm)

+$(obj)/sm3-riscv64-zvksh.S: $(src)/sm3-riscv64-zvksh.pl
+ $(call cmd,perlasm)
+
$(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl
$(call cmd,perlasm)

@@ -51,4 +57,5 @@ clean-files += aes-riscv64-zvkb-zvkned.S
clean-files += ghash-riscv64-zvkg.S
clean-files += sha256-riscv64-zvkb-zvknha_or_zvknhb.S
clean-files += sha512-riscv64-zvkb-zvknhb.S
+clean-files += sm3-riscv64-zvksh.S
clean-files += sm4-riscv64-zvksed.S
diff --git a/arch/riscv/crypto/sm3-riscv64-glue.c b/arch/riscv/crypto/sm3-riscv64-glue.c
new file mode 100644
index 000000000000..32d37b602e54
--- /dev/null
+++ b/arch/riscv/crypto/sm3-riscv64-glue.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SM3 implementation for RISC-V 64
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sm3_base.h>
+
+/*
+ * sm3 using zvksh vector crypto extension
+ *
+ * This asm function only uses the first 256 bits of the passed-in
+ * `struct sm3_state` as the SM3 internal state.
+ */
+void ossl_hwsm3_block_data_order_zvksh(struct sm3_state *digest, u8 const *o,
+ int num);
+
+static int riscv64_sm3_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ int ret = 0;
+
+ /*
+ * Make sure struct sm3_state begins directly with the SM3 256-bit internal
+ * state, as this is what the asm function expects.
+ */
+ BUILD_BUG_ON(offsetof(struct sm3_state, state) != 0);
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ ret = sm3_base_do_update(desc, data, len,
+ ossl_hwsm3_block_data_order_zvksh);
+ kernel_vector_end();
+ } else {
+ sm3_update(shash_desc_ctx(desc), data, len);
+ }
+
+ return ret;
+}
+
+static int riscv64_sm3_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ struct sm3_state *ctx;
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ if (len)
+ sm3_base_do_update(desc, data, len,
+ ossl_hwsm3_block_data_order_zvksh);
+ sm3_base_do_finalize(desc, ossl_hwsm3_block_data_order_zvksh);
+ kernel_vector_end();
+
+ return sm3_base_finish(desc, out);
+ }
+
+ ctx = shash_desc_ctx(desc);
+ if (len)
+ sm3_update(ctx, data, len);
+ sm3_final(ctx, out);
+
+ return 0;
+}
+
+static int riscv64_sm3_final(struct shash_desc *desc, u8 *out)
+{
+ return riscv64_sm3_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sm3_alg = {
+ .digestsize = SM3_DIGEST_SIZE,
+ .init = sm3_base_init,
+ .update = riscv64_sm3_update,
+ .final = riscv64_sm3_final,
+ .finup = riscv64_sm3_finup,
+ .descsize = sizeof(struct sm3_state),
+ .base.cra_name = "sm3",
+ .base.cra_driver_name = "sm3-riscv64-zvkb-zvksh",
+ .base.cra_priority = 150,
+ .base.cra_blocksize = SM3_BLOCK_SIZE,
+ .base.cra_module = THIS_MODULE,
+};
+
+static inline bool check_sm3_ext(void)
+{
+ return riscv_isa_extension_available(NULL, ZVKSH) &&
+ riscv_isa_extension_available(NULL, ZVKB) &&
+ riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_sm3_mod_init(void)
+{
+ if (check_sm3_ext())
+ return crypto_register_shash(&sm3_alg);
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_sm3_mod_fini(void)
+{
+ if (check_sm3_ext())
+ crypto_unregister_shash(&sm3_alg);
+}
+
+module_init(riscv64_sm3_mod_init);
+module_exit(riscv64_sm3_mod_fini);
+
+MODULE_DESCRIPTION("SM3 (RISC-V accelerated)");
+MODULE_AUTHOR("Andy Polyakov <[email protected]>");
+MODULE_AUTHOR("Ard Biesheuvel <[email protected]>");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sm3");
diff --git a/arch/riscv/crypto/sm3-riscv64-zvksh.pl b/arch/riscv/crypto/sm3-riscv64-zvksh.pl
new file mode 100644
index 000000000000..942d78d982e9
--- /dev/null
+++ b/arch/riscv/crypto/sm3-riscv64-zvksh.pl
@@ -0,0 +1,230 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <[email protected]>
+# Copyright (c) 2023, Jerry Shih <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Vector SM3 Secure Hash extension ('Zvksh')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+################################################################################
+# ossl_hwsm3_block_data_order_zvksh(SM3_CTX *c, const void *p, size_t num);
+{
+my ($CTX, $INPUT, $NUM) = ("a0", "a1", "a2");
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+ $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+ $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+ $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+$code .= <<___;
+.text
+.p2align 3
+.globl ossl_hwsm3_block_data_order_zvksh
+.type ossl_hwsm3_block_data_order_zvksh,\@function
+ossl_hwsm3_block_data_order_zvksh:
+ @{[vsetivli "zero", 8, "e32", "m2", "ta", "ma"]}
+
+ # Load initial state of hash context (c->A-H).
+ @{[vle32_v $V0, $CTX]}
+ @{[vrev8_v $V0, $V0]}
+
+L_sm3_loop:
+ # Copy the previous state to v2.
+ # It will be XOR'ed with the current state at the end of the round.
+ @{[vmv_v_v $V2, $V0]}
+
+ # Load the 64B block in 2x32B chunks.
+ @{[vle32_v $V6, $INPUT]} # v6 := {w7, ..., w0}
+ addi $INPUT, $INPUT, 32
+
+ @{[vle32_v $V8, $INPUT]} # v8 := {w15, ..., w8}
+ addi $INPUT, $INPUT, 32
+
+ addi $NUM, $NUM, -1
+
+ # As vsm3c consumes only w0, w1, w4, w5 we need to slide the input
+ # 2 elements down so we process elements w2, w3, w6, w7
+ # This will be repeated for each odd round.
+ @{[vslidedown_vi $V4, $V6, 2]} # v4 := {X, X, w7, ..., w2}
+
+ @{[vsm3c_vi $V0, $V6, 0]}
+ @{[vsm3c_vi $V0, $V4, 1]}
+
+ # Prepare a vector with {w11, ..., w4}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w7, ..., w4}
+ @{[vslideup_vi $V4, $V8, 4]} # v4 := {w11, w10, w9, w8, w7, w6, w5, w4}
+
+ @{[vsm3c_vi $V0, $V4, 2]}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w11, w10, w9, w8, w7, w6}
+ @{[vsm3c_vi $V0, $V4, 3]}
+
+ @{[vsm3c_vi $V0, $V8, 4]}
+ @{[vslidedown_vi $V4, $V8, 2]} # v4 := {X, X, w15, w14, w13, w12, w11, w10}
+ @{[vsm3c_vi $V0, $V4, 5]}
+
+ @{[vsm3me_vv $V6, $V8, $V6]} # v6 := {w23, w22, w21, w20, w19, w18, w17, w16}
+
+ # Prepare a register with {w19, w18, w17, w16, w15, w14, w13, w12}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w15, w14, w13, w12}
+ @{[vslideup_vi $V4, $V6, 4]} # v4 := {w19, w18, w17, w16, w15, w14, w13, w12}
+
+ @{[vsm3c_vi $V0, $V4, 6]}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w19, w18, w17, w16, w15, w14}
+ @{[vsm3c_vi $V0, $V4, 7]}
+
+ @{[vsm3c_vi $V0, $V6, 8]}
+ @{[vslidedown_vi $V4, $V6, 2]} # v4 := {X, X, w23, w22, w21, w20, w19, w18}
+ @{[vsm3c_vi $V0, $V4, 9]}
+
+ @{[vsm3me_vv $V8, $V6, $V8]} # v8 := {w31, w30, w29, w28, w27, w26, w25, w24}
+
+ # Prepare a register with {w27, w26, w25, w24, w23, w22, w21, w20}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w23, w22, w21, w20}
+ @{[vslideup_vi $V4, $V8, 4]} # v4 := {w27, w26, w25, w24, w23, w22, w21, w20}
+
+ @{[vsm3c_vi $V0, $V4, 10]}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w27, w26, w25, w24, w23, w22}
+ @{[vsm3c_vi $V0, $V4, 11]}
+
+ @{[vsm3c_vi $V0, $V8, 12]}
+ @{[vslidedown_vi $V4, $V8, 2]} # v4 := {X, X, w31, w30, w29, w28, w27, w26}
+ @{[vsm3c_vi $V0, $V4, 13]}
+
+ @{[vsm3me_vv $V6, $V8, $V6]} # v6 := {w39, w38, w37, w36, w35, w34, w33, w32}
+
+ # Prepare a register with {w35, w34, w33, w32, w31, w30, w29, w28}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w31, w30, w29, w28}
+ @{[vslideup_vi $V4, $V6, 4]} # v4 := {w35, w34, w33, w32, w31, w30, w29, w28}
+
+ @{[vsm3c_vi $V0, $V4, 14]}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w35, w34, w33, w32, w31, w30}
+ @{[vsm3c_vi $V0, $V4, 15]}
+
+ @{[vsm3c_vi $V0, $V6, 16]}
+ @{[vslidedown_vi $V4, $V6, 2]} # v4 := {X, X, w39, w38, w37, w36, w35, w34}
+ @{[vsm3c_vi $V0, $V4, 17]}
+
+ @{[vsm3me_vv $V8, $V6, $V8]} # v8 := {w47, w46, w45, w44, w43, w42, w41, w40}
+
+ # Prepare a register with {w43, w42, w41, w40, w39, w38, w37, w36}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w39, w38, w37, w36}
+ @{[vslideup_vi $V4, $V8, 4]} # v4 := {w43, w42, w41, w40, w39, w38, w37, w36}
+
+ @{[vsm3c_vi $V0, $V4, 18]}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w43, w42, w41, w40, w39, w38}
+ @{[vsm3c_vi $V0, $V4, 19]}
+
+ @{[vsm3c_vi $V0, $V8, 20]}
+ @{[vslidedown_vi $V4, $V8, 2]} # v4 := {X, X, w47, w46, w45, w44, w43, w42}
+ @{[vsm3c_vi $V0, $V4, 21]}
+
+ @{[vsm3me_vv $V6, $V8, $V6]} # v6 := {w55, w54, w53, w52, w51, w50, w49, w48}
+
+ # Prepare a register with {w51, w50, w49, w48, w47, w46, w45, w44}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w47, w46, w45, w44}
+ @{[vslideup_vi $V4, $V6, 4]} # v4 := {w51, w50, w49, w48, w47, w46, w45, w44}
+
+ @{[vsm3c_vi $V0, $V4, 22]}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w51, w50, w49, w48, w47, w46}
+ @{[vsm3c_vi $V0, $V4, 23]}
+
+ @{[vsm3c_vi $V0, $V6, 24]}
+ @{[vslidedown_vi $V4, $V6, 2]} # v4 := {X, X, w55, w54, w53, w52, w51, w50}
+ @{[vsm3c_vi $V0, $V4, 25]}
+
+ @{[vsm3me_vv $V8, $V6, $V8]} # v8 := {w63, w62, w61, w60, w59, w58, w57, w56}
+
+ # Prepare a register with {w59, w58, w57, w56, w55, w54, w53, w52}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w55, w54, w53, w52}
+ @{[vslideup_vi $V4, $V8, 4]} # v4 := {w59, w58, w57, w56, w55, w54, w53, w52}
+
+ @{[vsm3c_vi $V0, $V4, 26]}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w59, w58, w57, w56, w55, w54}
+ @{[vsm3c_vi $V0, $V4, 27]}
+
+ @{[vsm3c_vi $V0, $V8, 28]}
+ @{[vslidedown_vi $V4, $V8, 2]} # v4 := {X, X, w63, w62, w61, w60, w59, w58}
+ @{[vsm3c_vi $V0, $V4, 29]}
+
+ @{[vsm3me_vv $V6, $V8, $V6]} # v6 := {w71, w70, w69, w68, w67, w66, w65, w64}
+
+ # Prepare a register with {w67, w66, w65, w64, w63, w62, w61, w60}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w63, w62, w61, w60}
+ @{[vslideup_vi $V4, $V6, 4]} # v4 := {w67, w66, w65, w64, w63, w62, w61, w60}
+
+ @{[vsm3c_vi $V0, $V4, 30]}
+ @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w67, w66, w65, w64, w63, w62}
+ @{[vsm3c_vi $V0, $V4, 31]}
+
+ # XOR in the previous state.
+ @{[vxor_vv $V0, $V0, $V2]}
+
+ bnez $NUM, L_sm3_loop # Check if there are any more blocks to process
+L_sm3_end:
+ @{[vrev8_v $V0, $V0]}
+ @{[vse32_v $V0, $CTX]}
+ ret
+
+.size ossl_hwsm3_block_data_order_zvksh,.-ossl_hwsm3_block_data_order_zvksh
+___
+}
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
--
2.28.0

2023-10-25 18:39:08

by Jerry Shih

[permalink] [raw]
Subject: [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation

Add a ChaCha20 vector implementation from OpenSSL (openssl/openssl#21923).

Signed-off-by: Jerry Shih <[email protected]>
---
arch/riscv/crypto/Kconfig | 12 +
arch/riscv/crypto/Makefile | 7 +
arch/riscv/crypto/chacha-riscv64-glue.c | 120 +++++++++
arch/riscv/crypto/chacha-riscv64-zvkb.pl | 322 +++++++++++++++++++++++
4 files changed, 461 insertions(+)
create mode 100644 arch/riscv/crypto/chacha-riscv64-glue.c
create mode 100644 arch/riscv/crypto/chacha-riscv64-zvkb.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 2797b37394bb..41ce453afafa 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -35,6 +35,18 @@ config CRYPTO_AES_BLOCK_RISCV64
- Zvkg vector crypto extension (XTS)
- Zvkned vector crypto extension

+config CRYPTO_CHACHA20_RISCV64
+ default y if RISCV_ISA_V
+ tristate "Ciphers: ChaCha20"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_SKCIPHER
+ select CRYPTO_LIB_CHACHA_GENERIC
+ help
+ Length-preserving ciphers: ChaCha20 stream cipher algorithm
+
+ Architecture: riscv64 using:
+ - Zvkb vector crypto extension
+
config CRYPTO_GHASH_RISCV64
default y if RISCV_ISA_V
tristate "Hash functions: GHASH"
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index b772417703fd..80b0ebc956a3 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -9,6 +9,9 @@ aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o
obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o
aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvbb-zvkg-zvkned.o aes-riscv64-zvkb-zvkned.o

+obj-$(CONFIG_CRYPTO_CHACHA20_RISCV64) += chacha-riscv64.o
+chacha-riscv64-y := chacha-riscv64-glue.o chacha-riscv64-zvkb.o
+
obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o

@@ -36,6 +39,9 @@ $(obj)/aes-riscv64-zvbb-zvkg-zvkned.S: $(src)/aes-riscv64-zvbb-zvkg-zvkned.pl
$(obj)/aes-riscv64-zvkb-zvkned.S: $(src)/aes-riscv64-zvkb-zvkned.pl
$(call cmd,perlasm)

+$(obj)/chacha-riscv64-zvkb.S: $(src)/chacha-riscv64-zvkb.pl
+ $(call cmd,perlasm)
+
$(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
$(call cmd,perlasm)

@@ -54,6 +60,7 @@ $(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl
clean-files += aes-riscv64-zvkned.S
clean-files += aes-riscv64-zvbb-zvkg-zvkned.S
clean-files += aes-riscv64-zvkb-zvkned.S
+clean-files += chacha-riscv64-zvkb.S
clean-files += ghash-riscv64-zvkg.S
clean-files += sha256-riscv64-zvkb-zvknha_or_zvknhb.S
clean-files += sha512-riscv64-zvkb-zvknhb.S
diff --git a/arch/riscv/crypto/chacha-riscv64-glue.c b/arch/riscv/crypto/chacha-riscv64-glue.c
new file mode 100644
index 000000000000..72011949f705
--- /dev/null
+++ b/arch/riscv/crypto/chacha-riscv64-glue.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Port of the OpenSSL ChaCha20 implementation for RISC-V 64
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/internal/chacha.h>
+#include <crypto/internal/simd.h>
+#include <crypto/internal/skcipher.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+#define CHACHA_BLOCK_VALID_SIZE_MASK (~(CHACHA_BLOCK_SIZE - 1))
+#define CHACHA_BLOCK_REMAINING_SIZE_MASK (CHACHA_BLOCK_SIZE - 1)
+#define CHACHA_KEY_OFFSET 4
+#define CHACHA_IV_OFFSET 12
+
+/* chacha20 using zvkb vector crypto extension */
+void ChaCha20_ctr32_zvkb(u8 *out, const u8 *input, size_t len, const u32 *key,
+ const u32 *counter);
+
+static int chacha20_encrypt(struct skcipher_request *req)
+{
+ u32 state[CHACHA_STATE_WORDS];
+ u8 block_buffer[CHACHA_BLOCK_SIZE];
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ unsigned int tail_bytes;
+ int err;
+
+ chacha_init_generic(state, ctx->key, req->iv);
+
+ err = skcipher_walk_virt(&walk, req, false);
+ while (walk.nbytes) {
+ nbytes = walk.nbytes & CHACHA_BLOCK_VALID_SIZE_MASK;
+ tail_bytes = walk.nbytes & CHACHA_BLOCK_REMAINING_SIZE_MASK;
+ kernel_vector_begin();
+ if (nbytes) {
+ ChaCha20_ctr32_zvkb(walk.dst.virt.addr,
+ walk.src.virt.addr, nbytes,
+ state + CHACHA_KEY_OFFSET,
+ state + CHACHA_IV_OFFSET);
+ state[CHACHA_IV_OFFSET] += nbytes / CHACHA_BLOCK_SIZE;
+ }
+ if (walk.nbytes == walk.total && tail_bytes > 0) {
+ memcpy(block_buffer, walk.src.virt.addr + nbytes,
+ tail_bytes);
+ ChaCha20_ctr32_zvkb(block_buffer, block_buffer,
+ CHACHA_BLOCK_SIZE,
+ state + CHACHA_KEY_OFFSET,
+ state + CHACHA_IV_OFFSET);
+ memcpy(walk.dst.virt.addr + nbytes, block_buffer,
+ tail_bytes);
+ tail_bytes = 0;
+ }
+ kernel_vector_end();
+
+ err = skcipher_walk_done(&walk, tail_bytes);
+ }
+
+ return err;
+}
+
+static struct skcipher_alg riscv64_chacha_alg_zvkb[] = { {
+ .base = {
+ .cra_name = "chacha20",
+ .cra_driver_name = "chacha20-riscv64-zvkb",
+ .cra_priority = 300,
+ .cra_blocksize = 1,
+ .cra_ctxsize = sizeof(struct chacha_ctx),
+ .cra_module = THIS_MODULE,
+ },
+ .min_keysize = CHACHA_KEY_SIZE,
+ .max_keysize = CHACHA_KEY_SIZE,
+ .ivsize = CHACHA_IV_SIZE,
+ .chunksize = CHACHA_BLOCK_SIZE,
+ .walksize = CHACHA_BLOCK_SIZE * 4,
+ .setkey = chacha20_setkey,
+ .encrypt = chacha20_encrypt,
+ .decrypt = chacha20_encrypt,
+} };
+
+static inline bool check_chacha20_ext(void)
+{
+ return riscv_isa_extension_available(NULL, ZVKB) &&
+ riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_chacha_mod_init(void)
+{
+ if (check_chacha20_ext())
+ return crypto_register_skciphers(
+ riscv64_chacha_alg_zvkb,
+ ARRAY_SIZE(riscv64_chacha_alg_zvkb));
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_chacha_mod_fini(void)
+{
+ if (check_chacha20_ext())
+ crypto_unregister_skciphers(
+ riscv64_chacha_alg_zvkb,
+ ARRAY_SIZE(riscv64_chacha_alg_zvkb));
+}
+
+module_init(riscv64_chacha_mod_init);
+module_exit(riscv64_chacha_mod_fini);
+
+MODULE_DESCRIPTION("ChaCha20 (RISC-V accelerated)");
+MODULE_AUTHOR("Jerry Shih <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("chacha20");
diff --git a/arch/riscv/crypto/chacha-riscv64-zvkb.pl b/arch/riscv/crypto/chacha-riscv64-zvkb.pl
new file mode 100644
index 000000000000..9caf7b247804
--- /dev/null
+++ b/arch/riscv/crypto/chacha-riscv64-zvkb.pl
@@ -0,0 +1,322 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023-2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You may not use
+# this file except in compliance with the License. You can obtain a copy
+# in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Jerry Shih <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Zicclsm (main memory supports misaligned loads/stores)
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT, ">$output";
+
+my $code = <<___;
+.text
+___
+
+# void ChaCha20_ctr32_zvkb(unsigned char *out, const unsigned char *inp,
+# size_t len, const unsigned int key[8],
+# const unsigned int counter[4]);
+################################################################################
+my ( $OUTPUT, $INPUT, $LEN, $KEY, $COUNTER ) = ( "a0", "a1", "a2", "a3", "a4" );
+my ( $T0 ) = ( "t0" );
+my ( $CONST_DATA0, $CONST_DATA1, $CONST_DATA2, $CONST_DATA3 ) =
+ ( "a5", "a6", "a7", "t1" );
+my ( $KEY0, $KEY1, $KEY2,$KEY3, $KEY4, $KEY5, $KEY6, $KEY7,
+ $COUNTER0, $COUNTER1, $NONCE0, $NONCE1
+) = ( "s0", "s1", "s2", "s3", "s4", "s5", "s6",
+ "s7", "s8", "s9", "s10", "s11" );
+my ( $VL, $STRIDE, $CHACHA_LOOP_COUNT ) = ( "t2", "t3", "t4" );
+my (
+ $V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, $V8, $V9, $V10,
+ $V11, $V12, $V13, $V14, $V15, $V16, $V17, $V18, $V19, $V20, $V21,
+ $V22, $V23, $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map( "v$_", ( 0 .. 31 ) );
+
+sub chacha_quad_round_group {
+ my (
+ $A0, $B0, $C0, $D0, $A1, $B1, $C1, $D1,
+ $A2, $B2, $C2, $D2, $A3, $B3, $C3, $D3
+ ) = @_;
+
+ my $code = <<___;
+ # a += b; d ^= a; d <<<= 16;
+ @{[vadd_vv $A0, $A0, $B0]}
+ @{[vadd_vv $A1, $A1, $B1]}
+ @{[vadd_vv $A2, $A2, $B2]}
+ @{[vadd_vv $A3, $A3, $B3]}
+ @{[vxor_vv $D0, $D0, $A0]}
+ @{[vxor_vv $D1, $D1, $A1]}
+ @{[vxor_vv $D2, $D2, $A2]}
+ @{[vxor_vv $D3, $D3, $A3]}
+ @{[vror_vi $D0, $D0, 32 - 16]}
+ @{[vror_vi $D1, $D1, 32 - 16]}
+ @{[vror_vi $D2, $D2, 32 - 16]}
+ @{[vror_vi $D3, $D3, 32 - 16]}
+ # c += d; b ^= c; b <<<= 12;
+ @{[vadd_vv $C0, $C0, $D0]}
+ @{[vadd_vv $C1, $C1, $D1]}
+ @{[vadd_vv $C2, $C2, $D2]}
+ @{[vadd_vv $C3, $C3, $D3]}
+ @{[vxor_vv $B0, $B0, $C0]}
+ @{[vxor_vv $B1, $B1, $C1]}
+ @{[vxor_vv $B2, $B2, $C2]}
+ @{[vxor_vv $B3, $B3, $C3]}
+ @{[vror_vi $B0, $B0, 32 - 12]}
+ @{[vror_vi $B1, $B1, 32 - 12]}
+ @{[vror_vi $B2, $B2, 32 - 12]}
+ @{[vror_vi $B3, $B3, 32 - 12]}
+ # a += b; d ^= a; d <<<= 8;
+ @{[vadd_vv $A0, $A0, $B0]}
+ @{[vadd_vv $A1, $A1, $B1]}
+ @{[vadd_vv $A2, $A2, $B2]}
+ @{[vadd_vv $A3, $A3, $B3]}
+ @{[vxor_vv $D0, $D0, $A0]}
+ @{[vxor_vv $D1, $D1, $A1]}
+ @{[vxor_vv $D2, $D2, $A2]}
+ @{[vxor_vv $D3, $D3, $A3]}
+ @{[vror_vi $D0, $D0, 32 - 8]}
+ @{[vror_vi $D1, $D1, 32 - 8]}
+ @{[vror_vi $D2, $D2, 32 - 8]}
+ @{[vror_vi $D3, $D3, 32 - 8]}
+ # c += d; b ^= c; b <<<= 7;
+ @{[vadd_vv $C0, $C0, $D0]}
+ @{[vadd_vv $C1, $C1, $D1]}
+ @{[vadd_vv $C2, $C2, $D2]}
+ @{[vadd_vv $C3, $C3, $D3]}
+ @{[vxor_vv $B0, $B0, $C0]}
+ @{[vxor_vv $B1, $B1, $C1]}
+ @{[vxor_vv $B2, $B2, $C2]}
+ @{[vxor_vv $B3, $B3, $C3]}
+ @{[vror_vi $B0, $B0, 32 - 7]}
+ @{[vror_vi $B1, $B1, 32 - 7]}
+ @{[vror_vi $B2, $B2, 32 - 7]}
+ @{[vror_vi $B3, $B3, 32 - 7]}
+___
+
+ return $code;
+}
+
+$code .= <<___;
+.p2align 3
+.globl ChaCha20_ctr32_zvkb
+.type ChaCha20_ctr32_zvkb,\@function
+ChaCha20_ctr32_zvkb:
+ srli $LEN, $LEN, 6
+ beqz $LEN, .Lend
+
+ addi sp, sp, -96
+ sd s0, 0(sp)
+ sd s1, 8(sp)
+ sd s2, 16(sp)
+ sd s3, 24(sp)
+ sd s4, 32(sp)
+ sd s5, 40(sp)
+ sd s6, 48(sp)
+ sd s7, 56(sp)
+ sd s8, 64(sp)
+ sd s9, 72(sp)
+ sd s10, 80(sp)
+ sd s11, 88(sp)
+
+ li $STRIDE, 64
+
+ #### chacha block data
+ # "expa" little endian
+ li $CONST_DATA0, 0x61707865
+ # "nd 3" little endian
+ li $CONST_DATA1, 0x3320646e
+ # "2-by" little endian
+ li $CONST_DATA2, 0x79622d32
+ # "te k" little endian
+ li $CONST_DATA3, 0x6b206574
+
+ lw $KEY0, 0($KEY)
+ lw $KEY1, 4($KEY)
+ lw $KEY2, 8($KEY)
+ lw $KEY3, 12($KEY)
+ lw $KEY4, 16($KEY)
+ lw $KEY5, 20($KEY)
+ lw $KEY6, 24($KEY)
+ lw $KEY7, 28($KEY)
+
+ lw $COUNTER0, 0($COUNTER)
+ lw $COUNTER1, 4($COUNTER)
+ lw $NONCE0, 8($COUNTER)
+ lw $NONCE1, 12($COUNTER)
+
+.Lblock_loop:
+ @{[vsetvli $VL, $LEN, "e32", "m1", "ta", "ma"]}
+
+ # init chacha const states
+ @{[vmv_v_x $V0, $CONST_DATA0]}
+ @{[vmv_v_x $V1, $CONST_DATA1]}
+ @{[vmv_v_x $V2, $CONST_DATA2]}
+ @{[vmv_v_x $V3, $CONST_DATA3]}
+
+ # init chacha key states
+ @{[vmv_v_x $V4, $KEY0]}
+ @{[vmv_v_x $V5, $KEY1]}
+ @{[vmv_v_x $V6, $KEY2]}
+ @{[vmv_v_x $V7, $KEY3]}
+ @{[vmv_v_x $V8, $KEY4]}
+ @{[vmv_v_x $V9, $KEY5]}
+ @{[vmv_v_x $V10, $KEY6]}
+ @{[vmv_v_x $V11, $KEY7]}
+
+ # init chacha counter states
+ @{[vid_v $V12]}
+ @{[vadd_vx $V12, $V12, $COUNTER0]}
+ @{[vmv_v_x $V13, $COUNTER1]}
+
+ # init chacha nonce states
+ @{[vmv_v_x $V14, $NONCE0]}
+ @{[vmv_v_x $V15, $NONCE1]}
+
+ # load the top-half of input data
+ @{[vlsseg_nf_e32_v 8, $V16, $INPUT, $STRIDE]}
+
+ li $CHACHA_LOOP_COUNT, 10
+.Lround_loop:
+ addi $CHACHA_LOOP_COUNT, $CHACHA_LOOP_COUNT, -1
+ @{[chacha_quad_round_group
+ $V0, $V4, $V8, $V12,
+ $V1, $V5, $V9, $V13,
+ $V2, $V6, $V10, $V14,
+ $V3, $V7, $V11, $V15]}
+ @{[chacha_quad_round_group
+ $V0, $V5, $V10, $V15,
+ $V1, $V6, $V11, $V12,
+ $V2, $V7, $V8, $V13,
+ $V3, $V4, $V9, $V14]}
+ bnez $CHACHA_LOOP_COUNT, .Lround_loop
+
+ # load the bottom-half of input data
+ addi $T0, $INPUT, 32
+ @{[vlsseg_nf_e32_v 8, $V24, $T0, $STRIDE]}
+
+ # add chacha top-half initial block states
+ @{[vadd_vx $V0, $V0, $CONST_DATA0]}
+ @{[vadd_vx $V1, $V1, $CONST_DATA1]}
+ @{[vadd_vx $V2, $V2, $CONST_DATA2]}
+ @{[vadd_vx $V3, $V3, $CONST_DATA3]}
+ @{[vadd_vx $V4, $V4, $KEY0]}
+ @{[vadd_vx $V5, $V5, $KEY1]}
+ @{[vadd_vx $V6, $V6, $KEY2]}
+ @{[vadd_vx $V7, $V7, $KEY3]}
+ # xor with the top-half input
+ @{[vxor_vv $V16, $V16, $V0]}
+ @{[vxor_vv $V17, $V17, $V1]}
+ @{[vxor_vv $V18, $V18, $V2]}
+ @{[vxor_vv $V19, $V19, $V3]}
+ @{[vxor_vv $V20, $V20, $V4]}
+ @{[vxor_vv $V21, $V21, $V5]}
+ @{[vxor_vv $V22, $V22, $V6]}
+ @{[vxor_vv $V23, $V23, $V7]}
+
+ # save the top-half of output
+ @{[vssseg_nf_e32_v 8, $V16, $OUTPUT, $STRIDE]}
+
+ # add chacha bottom-half initial block states
+ @{[vadd_vx $V8, $V8, $KEY4]}
+ @{[vadd_vx $V9, $V9, $KEY5]}
+ @{[vadd_vx $V10, $V10, $KEY6]}
+ @{[vadd_vx $V11, $V11, $KEY7]}
+ @{[vid_v $V0]}
+ @{[vadd_vx $V12, $V12, $COUNTER0]}
+ @{[vadd_vx $V13, $V13, $COUNTER1]}
+ @{[vadd_vx $V14, $V14, $NONCE0]}
+ @{[vadd_vx $V15, $V15, $NONCE1]}
+ @{[vadd_vv $V12, $V12, $V0]}
+ # xor with the bottom-half input
+ @{[vxor_vv $V24, $V24, $V8]}
+ @{[vxor_vv $V25, $V25, $V9]}
+ @{[vxor_vv $V26, $V26, $V10]}
+ @{[vxor_vv $V27, $V27, $V11]}
+ @{[vxor_vv $V29, $V29, $V13]}
+ @{[vxor_vv $V28, $V28, $V12]}
+ @{[vxor_vv $V30, $V30, $V14]}
+ @{[vxor_vv $V31, $V31, $V15]}
+
+ # save the bottom-half of output
+ addi $T0, $OUTPUT, 32
+ @{[vssseg_nf_e32_v 8, $V24, $T0, $STRIDE]}
+
+ # update counter
+ add $COUNTER0, $COUNTER0, $VL
+ sub $LEN, $LEN, $VL
+ # advance the input/output offset by 64 * VL bytes (4 bytes * 16 words * VL blocks)
+ slli $T0, $VL, 6
+ add $INPUT, $INPUT, $T0
+ add $OUTPUT, $OUTPUT, $T0
+ bnez $LEN, .Lblock_loop
+
+ ld s0, 0(sp)
+ ld s1, 8(sp)
+ ld s2, 16(sp)
+ ld s3, 24(sp)
+ ld s4, 32(sp)
+ ld s5, 40(sp)
+ ld s6, 48(sp)
+ ld s7, 56(sp)
+ ld s8, 64(sp)
+ ld s9, 72(sp)
+ ld s10, 80(sp)
+ ld s11, 88(sp)
+ addi sp, sp, 96
+
+.Lend:
+ ret
+.size ChaCha20_ctr32_zvkb,.-ChaCha20_ctr32_zvkb
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
--
2.28.0

2023-10-25 18:39:34

by Jerry Shih

[permalink] [raw]
Subject: [PATCH 08/12] RISC-V: crypto: add Zvknha/b accelerated SHA224/256 implementations

Add SHA-224 and SHA-256 implementations using the Zvknha or Zvknhb vector
crypto extensions from OpenSSL (openssl/openssl#21923).

Co-developed-by: Charalampos Mitrodimas <[email protected]>
Signed-off-by: Charalampos Mitrodimas <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Co-developed-by: Phoebe Chen <[email protected]>
Signed-off-by: Phoebe Chen <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
---
arch/riscv/crypto/Kconfig | 13 +
arch/riscv/crypto/Makefile | 7 +
arch/riscv/crypto/sha256-riscv64-glue.c | 140 ++++++++
.../sha256-riscv64-zvkb-zvknha_or_zvknhb.pl | 318 ++++++++++++++++++
4 files changed, 478 insertions(+)
create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 00be7177eb1e..dbc86592063a 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -49,4 +49,17 @@ config CRYPTO_GHASH_RISCV64
Architecture: riscv64 using:
- Zvkg vector crypto extension

+config CRYPTO_SHA256_RISCV64
+ default y if RISCV_ISA_V
+ tristate "Hash functions: SHA-224 and SHA-256"
+ depends on 64BIT && RISCV_ISA_V
+ select CRYPTO_HASH
+ select CRYPTO_SHA256
+ help
+ SHA-256 secure hash algorithm (FIPS 180)
+
+ Architecture: riscv64 using:
+ - Zvkb vector crypto extension
+ - Zvknha or Zvknhb vector crypto extensions
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 532316cc1758..49577fb72cca 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -12,6 +12,9 @@ aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvbb-zvkg-zvkne
obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o

+obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
+sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvkb-zvknha_or_zvknhb.o
+
quiet_cmd_perlasm = PERLASM $@
cmd_perlasm = $(PERL) $(<) void $(@)

@@ -27,7 +30,11 @@ $(obj)/aes-riscv64-zvkb-zvkned.S: $(src)/aes-riscv64-zvkb-zvkned.pl
$(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
$(call cmd,perlasm)

+$(obj)/sha256-riscv64-zvkb-zvknha_or_zvknhb.S: $(src)/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl
+ $(call cmd,perlasm)
+
clean-files += aes-riscv64-zvkned.S
clean-files += aes-riscv64-zvbb-zvkg-zvkned.S
clean-files += aes-riscv64-zvkb-zvkned.S
clean-files += ghash-riscv64-zvkg.S
+clean-files += sha256-riscv64-zvkb-zvknha_or_zvknhb.S
diff --git a/arch/riscv/crypto/sha256-riscv64-glue.c b/arch/riscv/crypto/sha256-riscv64-glue.c
new file mode 100644
index 000000000000..35acfb6e4b2d
--- /dev/null
+++ b/arch/riscv/crypto/sha256-riscv64-glue.c
@@ -0,0 +1,140 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SHA256 implementation for RISC-V 64
+ *
+ * Copyright (C) 2022 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sha256_base.h>
+
+/*
+ * sha256 using zvkb and zvknha/b vector crypto extension
+ *
+ * This asm function only uses the first 256 bits of the passed-in
+ * `struct sha256_state` as the SHA-256 internal state.
+ */
+void sha256_block_data_order_zvkb_zvknha_or_zvknhb(struct sha256_state *digest,
+ const u8 *data,
+ int num_blks);
+
+static int riscv64_sha256_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ int ret = 0;
+
+ /*
+ * Make sure struct sha256_state begins directly with the SHA256
+ * 256-bit internal state, as this is what the asm function expects.
+ */
+ BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ ret = sha256_base_do_update(
+ desc, data, len,
+ sha256_block_data_order_zvkb_zvknha_or_zvknhb);
+ kernel_vector_end();
+ } else {
+ ret = crypto_sha256_update(desc, data, len);
+ }
+
+ return ret;
+}
+
+static int riscv64_sha256_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ if (len)
+ sha256_base_do_update(
+ desc, data, len,
+ sha256_block_data_order_zvkb_zvknha_or_zvknhb);
+ sha256_base_do_finalize(
+ desc, sha256_block_data_order_zvkb_zvknha_or_zvknhb);
+ kernel_vector_end();
+
+ return sha256_base_finish(desc, out);
+ }
+
+ return crypto_sha256_finup(desc, data, len, out);
+}
+
+static int riscv64_sha256_final(struct shash_desc *desc, u8 *out)
+{
+ return riscv64_sha256_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sha256_alg[] = {
+ {
+ .digestsize = SHA256_DIGEST_SIZE,
+ .init = sha256_base_init,
+ .update = riscv64_sha256_update,
+ .final = riscv64_sha256_final,
+ .finup = riscv64_sha256_finup,
+ .descsize = sizeof(struct sha256_state),
+ .base.cra_name = "sha256",
+ .base.cra_driver_name = "sha256-riscv64-zvkb-zvknha_or_zvknhb",
+ .base.cra_priority = 150,
+ .base.cra_blocksize = SHA256_BLOCK_SIZE,
+ .base.cra_module = THIS_MODULE,
+ },
+ {
+ .digestsize = SHA224_DIGEST_SIZE,
+ .init = sha224_base_init,
+ .update = riscv64_sha256_update,
+ .final = riscv64_sha256_final,
+ .finup = riscv64_sha256_finup,
+ .descsize = sizeof(struct sha256_state),
+ .base.cra_name = "sha224",
+ .base.cra_driver_name = "sha224-riscv64-zvkb-zvknha_or_zvknhb",
+ .base.cra_priority = 150,
+ .base.cra_blocksize = SHA224_BLOCK_SIZE,
+ .base.cra_module = THIS_MODULE,
+ }
+};
+
+static inline bool check_sha256_ext(void)
+{
+ /*
+ * From the spec:
+ * The Zvknhb extension supports both SHA-256 and SHA-512, while
+ * Zvknha supports only SHA-256.
+ */
+ return (riscv_isa_extension_available(NULL, ZVKNHA) ||
+ riscv_isa_extension_available(NULL, ZVKNHB)) &&
+ riscv_isa_extension_available(NULL, ZVKB) &&
+ riscv_vector_vlen() >= 128;
+}
+
+static int __init riscv64_sha256_mod_init(void)
+{
+ if (check_sha256_ext())
+ return crypto_register_shashes(sha256_alg,
+ ARRAY_SIZE(sha256_alg));
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_sha256_mod_fini(void)
+{
+ if (check_sha256_ext())
+ crypto_unregister_shashes(sha256_alg, ARRAY_SIZE(sha256_alg));
+}
+
+module_init(riscv64_sha256_mod_init);
+module_exit(riscv64_sha256_mod_fini);
+
+MODULE_DESCRIPTION("SHA-256 (RISC-V accelerated)");
+MODULE_AUTHOR("Andy Polyakov <[email protected]>");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sha224");
+MODULE_ALIAS_CRYPTO("sha256");
diff --git a/arch/riscv/crypto/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl b/arch/riscv/crypto/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl
new file mode 100644
index 000000000000..51b2d9d4f8f1
--- /dev/null
+++ b/arch/riscv/crypto/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl
@@ -0,0 +1,318 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner <[email protected]>
+# Copyright (c) 2023, Phoebe Chen <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# 1. Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# 2. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V Vector ('V') with VLEN >= 128
+# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+# - RISC-V Vector SHA-2 Secure Hash extension ('Zvknha' or 'Zvknhb')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7,
+ $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15,
+ $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23,
+ $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31,
+) = map("v$_",(0..31));
+
+my $K256 = "K256";
+
+# Function arguments
+my ($H, $INP, $LEN, $KT, $H2, $INDEX_PATTERN) = ("a0", "a1", "a2", "a3", "t3", "t4");
+
+sub sha_256_load_constant {
+ my $code=<<___;
+ la $KT, $K256 # Load round constants K256
+ @{[vle32_v $V10, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V11, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V12, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V13, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V14, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V15, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V16, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V17, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V18, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V19, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V20, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V21, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V22, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V23, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V24, $KT]}
+ addi $KT, $KT, 16
+ @{[vle32_v $V25, $KT]}
+___
+
+ return $code;
+}
+
+################################################################################
+# void sha256_block_data_order_zvkb_zvknha_or_zvknhb(void *c, const void *p, size_t len)
+$code .= <<___;
+.p2align 2
+.globl sha256_block_data_order_zvkb_zvknha_or_zvknhb
+.type sha256_block_data_order_zvkb_zvknha_or_zvknhb,\@function
+sha256_block_data_order_zvkb_zvknha_or_zvknhb:
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+ @{[sha_256_load_constant]}
+
+ # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c}
+ # The dst vtype is e32m1 and the index vtype is e8mf4.
+ # We use index-load with the following index pattern at v26.
+ # i8 index:
+ # 20, 16, 4, 0
+ # Instead of setting the i8 index, we could use a single 32bit
+ # little-endian value to cover the 4xi8 index.
+ # i32 value:
+ # 0x 00 04 10 14
+ li $INDEX_PATTERN, 0x00041014
+ @{[vsetivli "zero", 1, "e32", "m1", "ta", "ma"]}
+ @{[vmv_v_x $V26, $INDEX_PATTERN]}
+
+ addi $H2, $H, 8
+
+ # Use index-load to get {f,e,b,a},{h,g,d,c}
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+ @{[vluxei8_v $V6, $H, $V26]}
+ @{[vluxei8_v $V7, $H2, $V26]}
+
+ # Set up the v0 mask for the vmerge that replaces the first word (idx==0) in key-scheduling.
+ # The AVL is 4 in SHA, so a single e8 element (an 8-bit mask) is enough for masking.
+ @{[vsetivli "zero", 1, "e8", "m1", "ta", "ma"]}
+ @{[vmv_v_i $V0, 0x01]}
+
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
+
+L_round_loop:
+ # Decrement length by 1
+ addi $LEN, $LEN, -1
+
+ # Keep the current state as we need it later: H' = H+{a',b',c',...,h'}.
+ @{[vmv_v_v $V30, $V6]}
+ @{[vmv_v_v $V31, $V7]}
+
+ # Load the 512-bits of the message block in v1-v4 and perform
+ # an endian swap on each 4 bytes element.
+ @{[vle32_v $V1, $INP]}
+ @{[vrev8_v $V1, $V1]}
+ addi $INP, $INP, 16
+ @{[vle32_v $V2, $INP]}
+ @{[vrev8_v $V2, $V2]}
+ addi $INP, $INP, 16
+ @{[vle32_v $V3, $INP]}
+ @{[vrev8_v $V3, $V3]}
+ addi $INP, $INP, 16
+ @{[vle32_v $V4, $INP]}
+ @{[vrev8_v $V4, $V4]}
+ addi $INP, $INP, 16
+
+ # Quad-round 0 (+0, Wt from oldest to newest in v1->v2->v3->v4)
+ @{[vadd_vv $V5, $V10, $V1]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V3, $V2, $V0]}
+ @{[vsha2ms_vv $V1, $V5, $V4]} # Generate W[19:16]
+
+ # Quad-round 1 (+1, v2->v3->v4->v1)
+ @{[vadd_vv $V5, $V11, $V2]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V4, $V3, $V0]}
+ @{[vsha2ms_vv $V2, $V5, $V1]} # Generate W[23:20]
+
+ # Quad-round 2 (+2, v3->v4->v1->v2)
+ @{[vadd_vv $V5, $V12, $V3]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V1, $V4, $V0]}
+ @{[vsha2ms_vv $V3, $V5, $V2]} # Generate W[27:24]
+
+ # Quad-round 3 (+3, v4->v1->v2->v3)
+ @{[vadd_vv $V5, $V13, $V4]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V2, $V1, $V0]}
+ @{[vsha2ms_vv $V4, $V5, $V3]} # Generate W[31:28]
+
+ # Quad-round 4 (+0, v1->v2->v3->v4)
+ @{[vadd_vv $V5, $V14, $V1]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V3, $V2, $V0]}
+ @{[vsha2ms_vv $V1, $V5, $V4]} # Generate W[35:32]
+
+ # Quad-round 5 (+1, v2->v3->v4->v1)
+ @{[vadd_vv $V5, $V15, $V2]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V4, $V3, $V0]}
+ @{[vsha2ms_vv $V2, $V5, $V1]} # Generate W[39:36]
+
+ # Quad-round 6 (+2, v3->v4->v1->v2)
+ @{[vadd_vv $V5, $V16, $V3]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V1, $V4, $V0]}
+ @{[vsha2ms_vv $V3, $V5, $V2]} # Generate W[43:40]
+
+ # Quad-round 7 (+3, v4->v1->v2->v3)
+ @{[vadd_vv $V5, $V17, $V4]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V2, $V1, $V0]}
+ @{[vsha2ms_vv $V4, $V5, $V3]} # Generate W[47:44]
+
+ # Quad-round 8 (+0, v1->v2->v3->v4)
+ @{[vadd_vv $V5, $V18, $V1]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V3, $V2, $V0]}
+ @{[vsha2ms_vv $V1, $V5, $V4]} # Generate W[51:48]
+
+ # Quad-round 9 (+1, v2->v3->v4->v1)
+ @{[vadd_vv $V5, $V19, $V2]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V4, $V3, $V0]}
+ @{[vsha2ms_vv $V2, $V5, $V1]} # Generate W[55:52]
+
+ # Quad-round 10 (+2, v3->v4->v1->v2)
+ @{[vadd_vv $V5, $V20, $V3]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V1, $V4, $V0]}
+ @{[vsha2ms_vv $V3, $V5, $V2]} # Generate W[59:56]
+
+ # Quad-round 11 (+3, v4->v1->v2->v3)
+ @{[vadd_vv $V5, $V21, $V4]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+ @{[vmerge_vvm $V5, $V2, $V1, $V0]}
+ @{[vsha2ms_vv $V4, $V5, $V3]} # Generate W[63:60]
+
+ # Quad-round 12 (+0, v1->v2->v3->v4)
+ # Note that we stop generating new message schedule words (Wt, v1-v4)
+ # as we already generated all the words we end up consuming (i.e., W[63:60]).
+ @{[vadd_vv $V5, $V22, $V1]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+
+ # Quad-round 13 (+1, v2->v3->v4->v1)
+ @{[vadd_vv $V5, $V23, $V2]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+
+ # Quad-round 14 (+2, v3->v4->v1->v2)
+ @{[vadd_vv $V5, $V24, $V3]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+
+ # Quad-round 15 (+3, v4->v1->v2->v3)
+ @{[vadd_vv $V5, $V25, $V4]}
+ @{[vsha2cl_vv $V7, $V6, $V5]}
+ @{[vsha2ch_vv $V6, $V7, $V5]}
+
+ # H' = H+{a',b',c',...,h'}
+ @{[vadd_vv $V6, $V30, $V6]}
+ @{[vadd_vv $V7, $V31, $V7]}
+ bnez $LEN, L_round_loop
+
+ # Store {f,e,b,a},{h,g,d,c} back to {a,b,c,d},{e,f,g,h}.
+ @{[vsuxei8_v $V6, $H, $V26]}
+ @{[vsuxei8_v $V7, $H2, $V26]}
+
+ ret
+.size sha256_block_data_order_zvkb_zvknha_or_zvknhb,.-sha256_block_data_order_zvkb_zvknha_or_zvknhb
+
+.p2align 2
+.type $K256,\@object
+$K256:
+ .word 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5
+ .word 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5
+ .word 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3
+ .word 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174
+ .word 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc
+ .word 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da
+ .word 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7
+ .word 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967
+ .word 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13
+ .word 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85
+ .word 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3
+ .word 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070
+ .word 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5
+ .word 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3
+ .word 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208
+ .word 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
+.size $K256,.-$K256
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
--
2.28.0

2023-11-02 04:52:06

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH 04/12] RISC-V: crypto: add Zvkned accelerated AES implementation

On Thu, Oct 26, 2023 at 02:36:36AM +0800, Jerry Shih wrote:
> diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
> index 10d60edc0110..500938317e71 100644
> --- a/arch/riscv/crypto/Kconfig
> +++ b/arch/riscv/crypto/Kconfig
> @@ -2,4 +2,16 @@
>
> menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
>
> +config CRYPTO_AES_RISCV64
> + default y if RISCV_ISA_V
> + tristate "Ciphers: AES"
> + depends on 64BIT && RISCV_ISA_V
> + select CRYPTO_AES
> + select CRYPTO_ALGAPI
> + help
> + Block ciphers: AES cipher algorithms (FIPS-197)
> +
> + Architecture: riscv64 using:
> + - Zvkned vector crypto extension

kconfig options should default to off.

I.e., remove the line "default y if RISCV_ISA_V"
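
That is, roughly (a sketch, with the rest of the entry unchanged):

	config CRYPTO_AES_RISCV64
		tristate "Ciphers: AES"
		depends on 64BIT && RISCV_ISA_V
		select CRYPTO_AES
		select CRYPTO_ALGAPI
		help
		  ...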

> + *
> + * All zvkned-based functions use encryption expending keys for both encryption
> + * and decryption.
> + */

The above comment is a bit confusing. It's describing the 'key' field of struct
aes_key; maybe there should be a comment there instead:

struct aes_key {
u32 key[AES_MAX_KEYLENGTH_U32]; /* round keys in encryption order */
u32 rounds;
};

> +int riscv64_aes_setkey(struct riscv64_aes_ctx *ctx, const u8 *key,
> + unsigned int keylen)
> +{
> + /*
> + * The RISC-V AES vector crypto key expending doesn't support AES-192.
> + * We just use the generic software key expending here to simplify the key
> + * expending flow.
> + */

expending => expanding

> + u32 aes_rounds;
> + u32 key_length;
> + int ret;
> +
> + ret = aes_expandkey(&ctx->fallback_ctx, key, keylen);
> + if (ret < 0)
> + return -EINVAL;
> +
> + /*
> + * Copy the key from `crypto_aes_ctx` to `aes_key` for zvkned-based AES
> + * implementations.
> + */
> + aes_rounds = aes_round_num(keylen);
> + ctx->key.rounds = aes_rounds;
> + key_length = AES_BLOCK_SIZE * (aes_rounds + 1);
> + memcpy(ctx->key.key, ctx->fallback_ctx.key_enc, key_length);
> +
> + return 0;
> +}

Ideally this would use the same crypto_aes_ctx for both the fallback and the
assembly code. I suppose we don't want to diverge from the OpenSSL code (unless
it gets rewritten), though. So I guess this is fine for now.
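
For reference, a hypothetical sketch of what "one context" could look like,
assuming the assembly were adjusted to consume the crypto_aes_ctx layout
(key_enc and key_length) directly:

	struct riscv64_aes_ctx {
		struct crypto_aes_ctx key;	/* shared by asm and C fallback */
	};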

> void riscv64_aes_encrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst,
> const u8 *src)

These functions can be called from a different module (aes-block-riscv64), so
they need EXPORT_SYMBOL_GPL.
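
E.g. (sketch, with the existing body elided):

	void riscv64_aes_encrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst,
					const u8 *src)
	{
		/* ... existing Zvkned encryption ... */
	}
	EXPORT_SYMBOL_GPL(riscv64_aes_encrypt_zvkned);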

> +static inline bool check_aes_ext(void)
> +{
> + return riscv_isa_extension_available(NULL, ZVKNED) &&
> + riscv_vector_vlen() >= 128;
> +}
> +
> +static int __init riscv64_aes_mod_init(void)
> +{
> + if (check_aes_ext())
> + return crypto_register_alg(&riscv64_aes_alg_zvkned);
> +
> + return -ENODEV;
> +}
> +
> +static void __exit riscv64_aes_mod_fini(void)
> +{
> + if (check_aes_ext())
> + crypto_unregister_alg(&riscv64_aes_alg_zvkned);
> +}
> +
> +module_init(riscv64_aes_mod_init);
> +module_exit(riscv64_aes_mod_fini);

module_exit can only run if module_init succeeded. So, in cases like this it's
not necessary to check for CPU features before unregistering the algorithm.
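
I.e., roughly:

	static void __exit riscv64_aes_mod_fini(void)
	{
		crypto_unregister_alg(&riscv64_aes_alg_zvkned);
	}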

- Eric

2023-11-02 05:16:51

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations

On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
> +config CRYPTO_AES_BLOCK_RISCV64
> + default y if RISCV_ISA_V
> + tristate "Ciphers: AES, modes: ECB/CBC/CTR/XTS"
> + depends on 64BIT && RISCV_ISA_V
> + select CRYPTO_AES_RISCV64
> + select CRYPTO_SKCIPHER
> + help
> + Length-preserving ciphers: AES cipher algorithms (FIPS-197)
> + with block cipher modes:
> + - ECB (Electronic Codebook) mode (NIST SP 800-38A)
> + - CBC (Cipher Block Chaining) mode (NIST SP 800-38A)
> + - CTR (Counter) mode (NIST SP 800-38A)
> + - XTS (XOR Encrypt XOR Tweakable Block Cipher with Ciphertext
> + Stealing) mode (NIST SP 800-38E and IEEE 1619)
> +
> + Architecture: riscv64 using:
> + - Zvbb vector extension (XTS)
> + - Zvkb vector crypto extension (CTR/XTS)
> + - Zvkg vector crypto extension (XTS)
> + - Zvkned vector crypto extension

Maybe list Zvkned first since it's the most important one in this context.

> +#define AES_BLOCK_VALID_SIZE_MASK (~(AES_BLOCK_SIZE - 1))
> +#define AES_BLOCK_REMAINING_SIZE_MASK (AES_BLOCK_SIZE - 1)

I think it would be easier to read if these values were just used directly.
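
Roughly, in ecb_encrypt() that would be:

	rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
				 nbytes & ~(AES_BLOCK_SIZE - 1), &ctx->key);
	kernel_vector_end();
	err = skcipher_walk_done(&walk, nbytes & (AES_BLOCK_SIZE - 1));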

> +static int ecb_encrypt(struct skcipher_request *req)
> +{
> + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> + const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
> + struct skcipher_walk walk;
> + unsigned int nbytes;
> + int err;
> +
> + /* If we have error here, the `nbytes` will be zero. */
> + err = skcipher_walk_virt(&walk, req, false);
> + while ((nbytes = walk.nbytes)) {
> + kernel_vector_begin();
> + rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
> + nbytes & AES_BLOCK_VALID_SIZE_MASK,
> + &ctx->key);
> + kernel_vector_end();
> + err = skcipher_walk_done(
> + &walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
> + }
> +
> + return err;
> +}

There's no fallback for !crypto_simd_usable() here. I really like it this way.
However, for it to work (for skciphers and aeads), RISC-V needs to allow the
vector registers to be used in softirq context. Is that already the case?

> +/* ctr */
> +static int ctr_encrypt(struct skcipher_request *req)
> +{
> + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> + const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
> + struct skcipher_walk walk;
> + unsigned int ctr32;
> + unsigned int nbytes;
> + unsigned int blocks;
> + unsigned int current_blocks;
> + unsigned int current_length;
> + int err;
> +
> + /* the ctr iv uses big endian */
> + ctr32 = get_unaligned_be32(req->iv + 12);
> + err = skcipher_walk_virt(&walk, req, false);
> + while ((nbytes = walk.nbytes)) {
> + if (nbytes != walk.total) {
> + nbytes &= AES_BLOCK_VALID_SIZE_MASK;
> + blocks = nbytes / AES_BLOCK_SIZE;
> + } else {
> + /* This is the last walk. We should handle the tail data. */
> + blocks = (nbytes + (AES_BLOCK_SIZE - 1)) /
> + AES_BLOCK_SIZE;

'(nbytes + (AES_BLOCK_SIZE - 1)) / AES_BLOCK_SIZE' can be replaced with
'DIV_ROUND_UP(nbytes, AES_BLOCK_SIZE)'

> +static int xts_crypt(struct skcipher_request *req, aes_xts_func func)
> +{
> + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> + const struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
> + struct skcipher_request sub_req;
> + struct scatterlist sg_src[2], sg_dst[2];
> + struct scatterlist *src, *dst;
> + struct skcipher_walk walk;
> + unsigned int walk_size = crypto_skcipher_walksize(tfm);
> + unsigned int tail_bytes;
> + unsigned int head_bytes;
> + unsigned int nbytes;
> + unsigned int update_iv = 1;
> + int err;
> +
> + /* xts input size should be bigger than AES_BLOCK_SIZE */
> + if (req->cryptlen < AES_BLOCK_SIZE)
> + return -EINVAL;
> +
> + /*
> + * The tail size should be small than walk_size. Thus, we could make sure the
> + * walk size for tail elements could be bigger than AES_BLOCK_SIZE.
> + */
> + if (req->cryptlen <= walk_size) {
> + tail_bytes = req->cryptlen;
> + head_bytes = 0;
> + } else {
> + if (req->cryptlen & AES_BLOCK_REMAINING_SIZE_MASK) {
> + tail_bytes = req->cryptlen &
> + AES_BLOCK_REMAINING_SIZE_MASK;
> + tail_bytes = walk_size + tail_bytes - AES_BLOCK_SIZE;
> + head_bytes = req->cryptlen - tail_bytes;
> + } else {
> + tail_bytes = 0;
> + head_bytes = req->cryptlen;
> + }
> + }
> +
> + riscv64_aes_encrypt_zvkned(&ctx->ctx2, req->iv, req->iv);
> +
> + if (head_bytes && tail_bytes) {
> + skcipher_request_set_tfm(&sub_req, tfm);
> + skcipher_request_set_callback(
> + &sub_req, skcipher_request_flags(req), NULL, NULL);
> + skcipher_request_set_crypt(&sub_req, req->src, req->dst,
> + head_bytes, req->iv);
> + req = &sub_req;
> + }
> +
> + if (head_bytes) {
> + err = skcipher_walk_virt(&walk, req, false);
> + while ((nbytes = walk.nbytes)) {
> + if (nbytes == walk.total)
> + update_iv = (tail_bytes > 0);
> +
> + nbytes &= AES_BLOCK_VALID_SIZE_MASK;
> + kernel_vector_begin();
> + func(walk.src.virt.addr, walk.dst.virt.addr, nbytes,
> + &ctx->ctx1.key, req->iv, update_iv);
> + kernel_vector_end();
> +
> + err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
> + }
> + if (err || !tail_bytes)
> + return err;
> +
> + dst = src = scatterwalk_next(sg_src, &walk.in);
> + if (req->dst != req->src)
> + dst = scatterwalk_next(sg_dst, &walk.out);
> + skcipher_request_set_crypt(req, src, dst, tail_bytes, req->iv);
> + }
> +
> + /* tail */
> + err = skcipher_walk_virt(&walk, req, false);
> + if (err)
> + return err;
> + if (walk.nbytes != tail_bytes)
> + return -EINVAL;
> + kernel_vector_begin();
> + func(walk.src.virt.addr, walk.dst.virt.addr, walk.nbytes,
> + &ctx->ctx1.key, req->iv, 0);
> + kernel_vector_end();
> +
> + return skcipher_walk_done(&walk, 0);
> +}

This function looks a bit weird. I see it's also the only caller of the
scatterwalk_next() function that you're adding. I haven't looked at this super
closely, but I expect that there's a cleaner way of handling the "tail" than
this -- maybe use scatterwalk_map_and_copy() to copy from/to a stack buffer?

- Eric

2023-11-02 05:43:37

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation

On Thu, Oct 26, 2023 at 02:36:44AM +0800, Jerry Shih wrote:
> +static struct skcipher_alg riscv64_chacha_alg_zvkb[] = { {
> + .base = {
> + .cra_name = "chacha20",
> + .cra_driver_name = "chacha20-riscv64-zvkb",
> + .cra_priority = 300,
> + .cra_blocksize = 1,
> + .cra_ctxsize = sizeof(struct chacha_ctx),
> + .cra_module = THIS_MODULE,
> + },
> + .min_keysize = CHACHA_KEY_SIZE,
> + .max_keysize = CHACHA_KEY_SIZE,
> + .ivsize = CHACHA_IV_SIZE,
> + .chunksize = CHACHA_BLOCK_SIZE,
> + .walksize = CHACHA_BLOCK_SIZE * 4,
> + .setkey = chacha20_setkey,
> + .encrypt = chacha20_encrypt,
> + .decrypt = chacha20_encrypt,
> +} };
> +
> +static inline bool check_chacha20_ext(void)
> +{
> + return riscv_isa_extension_available(NULL, ZVKB) &&
> + riscv_vector_vlen() >= 128;
> +}
> +
> +static int __init riscv64_chacha_mod_init(void)
> +{
> + if (check_chacha20_ext())
> + return crypto_register_skciphers(
> + riscv64_chacha_alg_zvkb,
> + ARRAY_SIZE(riscv64_chacha_alg_zvkb));
> +
> + return -ENODEV;
> +}
> +
> +static void __exit riscv64_chacha_mod_fini(void)
> +{
> + if (check_chacha20_ext())
> + crypto_unregister_skciphers(
> + riscv64_chacha_alg_zvkb,
> + ARRAY_SIZE(riscv64_chacha_alg_zvkb));
> +}

When there's just one algorithm being registered/unregistered,
crypto_register_skcipher() and crypto_unregister_skcipher() can be used.
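
I.e., assuming riscv64_chacha_alg_zvkb is changed from a one-element array to
a plain 'struct skcipher_alg', roughly:

	static int __init riscv64_chacha_mod_init(void)
	{
		if (check_chacha20_ext())
			return crypto_register_skcipher(&riscv64_chacha_alg_zvkb);

		return -ENODEV;
	}

	static void __exit riscv64_chacha_mod_fini(void)
	{
		/* No feature re-check needed here, per the comment on patch 04. */
		crypto_unregister_skcipher(&riscv64_chacha_alg_zvkb);
	}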

> +# - RV64I
> +# - RISC-V Vector ('V') with VLEN >= 128
> +# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
> +# - RISC-V Zicclsm(Main memory supports misaligned loads/stores)

How is the presence of the Zicclsm extension guaranteed?

- Eric

2023-11-02 05:58:56

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH 10/12] RISC-V: crypto: add Zvksed accelerated SM4 implementation

On Thu, Oct 26, 2023 at 02:36:42AM +0800, Jerry Shih wrote:
> +struct crypto_alg riscv64_sm4_zvksed_alg = {
> + .cra_name = "sm4",
> + .cra_driver_name = "sm4-riscv64-zvkb-zvksed",
> + .cra_module = THIS_MODULE,
> + .cra_priority = 300,
> + .cra_flags = CRYPTO_ALG_TYPE_CIPHER,
> + .cra_blocksize = SM4_BLOCK_SIZE,
> + .cra_ctxsize = sizeof(struct sm4_ctx),
> + .cra_cipher = {
> + .cia_min_keysize = SM4_KEY_SIZE,
> + .cia_max_keysize = SM4_KEY_SIZE,
> + .cia_setkey = riscv64_sm4_setkey_zvksed,
> + .cia_encrypt = riscv64_sm4_encrypt_zvksed,
> + .cia_decrypt = riscv64_sm4_decrypt_zvksed,
> + },
> +};

This should be 'static'.

- Eric