2024-01-02 06:50:18

by Eric Biggers

[permalink] [raw]
Subject: [RFC PATCH 00/13] RISC-V crypto with reworked asm files

As discussed previously, the proposed use of the so-called perlasm for
the RISC-V crypto assembly files makes them difficult to read, and these
files have some other issues such extensively duplicating source code
for the different AES key lengths and for the unrolled hash functions.
There is/was a desire to share code with OpenSSL, but many of the files
have already diverged significantly; also, for most of the algorithms
the source code can be quite short anyway, due to the native support for
them in the RISC-V vector crypto extensions combined with the way the
RISC-V vector extension naturally scales to arbitrary vector lengths.

Since we're still waiting for prerequisite patches to be merged anyway,
we have a bit more time to get this cleaned up properly. So I've had a
go at cleaning up the patchset to use standard .S files, with the code
duplication fixed. I also made some tweaks to make the different
algorithms consistent with each other and with what exists in the kernel
already for other architectures, and tried to improve comments.

The result is this series, which passes all tests and is about 2400
lines shorter than the latest version with the perlasm
(https://lore.kernel.org/linux-crypto/[email protected]/).
All the same functionality and general optimizations are still included,
except for some minor optimizations in XTS that I dropped since it's not
clear they are worth the complexity. (Note that almost all users of XTS
in the kernel only use it with block-aligned messages, so it's not very
important to optimize the ciphertext stealing case.)

I'd appreciate people's thoughts on this series. Jerry, I hope I'm not
stepping on your toes too much here, but I think there are some big
improvements here.

This series is based on riscv/for-next
(https://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git/log/?h=for-next)
commit f352a28cc2fb4ee8d08c6a6362c9a861fcc84236, and for convenience
I've included the prerequisite patches too.

Andy Chiu (1):
riscv: vector: make Vector always available for softirq context

Eric Biggers (1):
RISC-V: add TOOLCHAIN_HAS_VECTOR_CRYPTO

Greentime Hu (1):
riscv: Add support for kernel mode vector

Heiko Stuebner (2):
RISC-V: add helper function to read the vector VLEN
RISC-V: hook new crypto subdir into build-system

Jerry Shih (8):
crypto: riscv - add vector crypto accelerated AES
crypto: riscv - add vector crypto accelerated AES-{ECB,CBC,CTR,XTS}
crypto: riscv - add vector crypto accelerated ChaCha20
crypto: riscv - add vector crypto accelerated GHASH
crypto: riscv - add vector crypto accelerated SHA-{256,224}
crypto: riscv - add vector crypto accelerated SHA-{512,384}
crypto: riscv - add vector crypto accelerated SM3
crypto: riscv - add vector crypto accelerated SM4

arch/riscv/Kbuild | 1 +
arch/riscv/Kconfig | 7 +
arch/riscv/crypto/Kconfig | 108 +++++
arch/riscv/crypto/Makefile | 28 ++
arch/riscv/crypto/aes-macros.S | 156 +++++++
.../crypto/aes-riscv64-block-mode-glue.c | 435 ++++++++++++++++++
arch/riscv/crypto/aes-riscv64-glue.c | 123 +++++
arch/riscv/crypto/aes-riscv64-glue.h | 15 +
.../crypto/aes-riscv64-zvkned-zvbb-zvkg.S | 300 ++++++++++++
arch/riscv/crypto/aes-riscv64-zvkned-zvkb.S | 146 ++++++
arch/riscv/crypto/aes-riscv64-zvkned.S | 180 ++++++++
arch/riscv/crypto/chacha-riscv64-glue.c | 101 ++++
arch/riscv/crypto/chacha-riscv64-zvkb.S | 294 ++++++++++++
arch/riscv/crypto/ghash-riscv64-glue.c | 170 +++++++
arch/riscv/crypto/ghash-riscv64-zvkg.S | 72 +++
arch/riscv/crypto/sha256-riscv64-glue.c | 137 ++++++
.../sha256-riscv64-zvknha_or_zvknhb-zvkb.S | 225 +++++++++
arch/riscv/crypto/sha512-riscv64-glue.c | 133 ++++++
.../riscv/crypto/sha512-riscv64-zvknhb-zvkb.S | 203 ++++++++
arch/riscv/crypto/sm3-riscv64-glue.c | 112 +++++
arch/riscv/crypto/sm3-riscv64-zvksh-zvkb.S | 123 +++++
arch/riscv/crypto/sm4-riscv64-glue.c | 109 +++++
arch/riscv/crypto/sm4-riscv64-zvksed-zvkb.S | 117 +++++
arch/riscv/include/asm/processor.h | 14 +-
arch/riscv/include/asm/simd.h | 48 ++
arch/riscv/include/asm/vector.h | 20 +
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/kernel_mode_vector.c | 126 +++++
arch/riscv/kernel/process.c | 1 +
crypto/Kconfig | 3 +
30 files changed, 3507 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/crypto/Kconfig
create mode 100644 arch/riscv/crypto/Makefile
create mode 100644 arch/riscv/crypto/aes-macros.S
create mode 100644 arch/riscv/crypto/aes-riscv64-block-mode-glue.c
create mode 100644 arch/riscv/crypto/aes-riscv64-glue.c
create mode 100644 arch/riscv/crypto/aes-riscv64-glue.h
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned-zvbb-zvkg.S
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned-zvkb.S
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.S
create mode 100644 arch/riscv/crypto/chacha-riscv64-glue.c
create mode 100644 arch/riscv/crypto/chacha-riscv64-zvkb.S
create mode 100644 arch/riscv/crypto/ghash-riscv64-glue.c
create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.S
create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.S
create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.S
create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh-zvkb.S
create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed-zvkb.S
create mode 100644 arch/riscv/include/asm/simd.h
create mode 100644 arch/riscv/kernel/kernel_mode_vector.c


base-commit: f352a28cc2fb4ee8d08c6a6362c9a861fcc84236
--
2.43.0



2024-01-02 06:50:35

by Eric Biggers

[permalink] [raw]
Subject: [RFC PATCH 01/13] riscv: Add support for kernel mode vector

From: Greentime Hu <[email protected]>

Add kernel_vector_begin() and kernel_vector_end() function declarations
and corresponding definitions in kernel_mode_vector.c

These are needed to wrap uses of vector in kernel mode.

Co-developed-by: Vincent Chen <[email protected]>
Signed-off-by: Vincent Chen <[email protected]>
Signed-off-by: Greentime Hu <[email protected]>
Signed-off-by: Andy Chiu <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/riscv/include/asm/processor.h | 13 ++-
arch/riscv/include/asm/simd.h | 44 ++++++++++
arch/riscv/include/asm/vector.h | 9 ++
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/kernel_mode_vector.c | 116 +++++++++++++++++++++++++
arch/riscv/kernel/process.c | 1 +
6 files changed, 183 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/include/asm/simd.h
create mode 100644 arch/riscv/kernel/kernel_mode_vector.c

diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index f19f861cda549..28d19aea24b1d 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -66,29 +66,40 @@
#define TASK_UNMAPPED_BASE PAGE_ALIGN((UL(1) << MMAP_MIN_VA_BITS) / 3)
#else
#define TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE / 3)
#endif

#ifndef __ASSEMBLY__

struct task_struct;
struct pt_regs;

+/*
+ * We use a flag to track in-kernel Vector context. Currently the flag has the
+ * following meaning:
+ *
+ * - bit 0: indicates whether the in-kernel Vector context is active. The
+ * activation of this state disables the preemption. Currently only 0 and 1
+ * are valid value for this field. Other values are reserved for future uses.
+ */
+#define RISCV_KERNEL_MODE_V 0x1
+
/* CPU-specific state of a task */
struct thread_struct {
/* Callee-saved registers */
unsigned long ra;
unsigned long sp; /* Kernel mode stack */
unsigned long s[12]; /* s[0]: frame pointer */
struct __riscv_d_ext_state fstate;
unsigned long bad_cause;
- unsigned long vstate_ctrl;
+ u32 riscv_v_flags;
+ u32 vstate_ctrl;
struct __riscv_v_ext_state vstate;
unsigned long align_ctl;
};

/* Whitelist the fstate from the task_struct for hardened usercopy */
static inline void arch_thread_struct_whitelist(unsigned long *offset,
unsigned long *size)
{
*offset = offsetof(struct thread_struct, fstate);
*size = sizeof_field(struct thread_struct, fstate);
diff --git a/arch/riscv/include/asm/simd.h b/arch/riscv/include/asm/simd.h
new file mode 100644
index 0000000000000..ef8af413a9fc7
--- /dev/null
+++ b/arch/riscv/include/asm/simd.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2017 Linaro Ltd. <[email protected]>
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef __ASM_SIMD_H
+#define __ASM_SIMD_H
+
+#include <linux/compiler.h>
+#include <linux/irqflags.h>
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+#include <linux/types.h>
+
+#include <asm/vector.h>
+
+#ifdef CONFIG_RISCV_ISA_V
+/*
+ * may_use_simd - whether it is allowable at this time to issue vector
+ * instructions or access the vector register file
+ *
+ * Callers must not assume that the result remains true beyond the next
+ * preempt_enable() or return from softirq context.
+ */
+static __must_check inline bool may_use_simd(void)
+{
+ /*
+ * RISCV_KERNEL_MODE_V is only set while preemption is disabled,
+ * and is clear whenever preemption is enabled.
+ */
+ return !in_hardirq() && !in_nmi() && !(riscv_v_flags() & RISCV_KERNEL_MODE_V);
+}
+
+#else /* ! CONFIG_RISCV_ISA_V */
+
+static __must_check inline bool may_use_simd(void)
+{
+ return false;
+}
+
+#endif /* ! CONFIG_RISCV_ISA_V */
+
+#endif
diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 87aaef656257c..71af3404fda14 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -15,20 +15,29 @@
#include <linux/sched.h>
#include <linux/sched/task_stack.h>
#include <asm/ptrace.h>
#include <asm/cpufeature.h>
#include <asm/csr.h>
#include <asm/asm.h>

extern unsigned long riscv_v_vsize;
int riscv_v_setup_vsize(void);
bool riscv_v_first_use_handler(struct pt_regs *regs);
+void kernel_vector_begin(void);
+void kernel_vector_end(void);
+void get_cpu_vector_context(void);
+void put_cpu_vector_context(void);
+
+static inline u32 riscv_v_flags(void)
+{
+ return current->thread.riscv_v_flags;
+}

static __always_inline bool has_vector(void)
{
return riscv_has_extension_unlikely(RISCV_ISA_EXT_v);
}

static inline void __riscv_v_vstate_clean(struct pt_regs *regs)
{
regs->status = (regs->status & ~SR_VS) | SR_VS_CLEAN;
}
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index fee22a3d1b534..8c58595696b32 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -56,20 +56,21 @@ obj-y += riscv_ksyms.o
obj-y += stacktrace.o
obj-y += cacheinfo.o
obj-y += patch.o
obj-y += probes/
obj-y += tests/
obj-$(CONFIG_MMU) += vdso.o vdso/

obj-$(CONFIG_RISCV_MISALIGNED) += traps_misaligned.o
obj-$(CONFIG_FPU) += fpu.o
obj-$(CONFIG_RISCV_ISA_V) += vector.o
+obj-$(CONFIG_RISCV_ISA_V) += kernel_mode_vector.o
obj-$(CONFIG_SMP) += smpboot.o
obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_SMP) += cpu_ops.o

obj-$(CONFIG_RISCV_BOOT_SPINWAIT) += cpu_ops_spinwait.o
obj-$(CONFIG_MODULES) += module.o
obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o

obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
new file mode 100644
index 0000000000000..114cf4f0a0eb6
--- /dev/null
+++ b/arch/riscv/kernel/kernel_mode_vector.c
@@ -0,0 +1,116 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Catalin Marinas <[email protected]>
+ * Copyright (C) 2017 Linaro Ltd. <[email protected]>
+ * Copyright (C) 2021 SiFive
+ */
+#include <linux/compiler.h>
+#include <linux/irqflags.h>
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+#include <linux/types.h>
+
+#include <asm/vector.h>
+#include <asm/switch_to.h>
+#include <asm/simd.h>
+
+static inline void riscv_v_flags_set(u32 flags)
+{
+ current->thread.riscv_v_flags = flags;
+}
+
+static inline void riscv_v_start(u32 flags)
+{
+ int orig;
+
+ orig = riscv_v_flags();
+ BUG_ON((orig & flags) != 0);
+ riscv_v_flags_set(orig | flags);
+}
+
+static inline void riscv_v_stop(u32 flags)
+{
+ int orig;
+
+ orig = riscv_v_flags();
+ BUG_ON((orig & flags) == 0);
+ riscv_v_flags_set(orig & ~flags);
+}
+
+/*
+ * Claim ownership of the CPU vector context for use by the calling context.
+ *
+ * The caller may freely manipulate the vector context metadata until
+ * put_cpu_vector_context() is called.
+ */
+void get_cpu_vector_context(void)
+{
+ preempt_disable();
+
+ riscv_v_start(RISCV_KERNEL_MODE_V);
+}
+
+/*
+ * Release the CPU vector context.
+ *
+ * Must be called from a context in which get_cpu_vector_context() was
+ * previously called, with no call to put_cpu_vector_context() in the
+ * meantime.
+ */
+void put_cpu_vector_context(void)
+{
+ riscv_v_stop(RISCV_KERNEL_MODE_V);
+
+ preempt_enable();
+}
+
+/*
+ * kernel_vector_begin(): obtain the CPU vector registers for use by the calling
+ * context
+ *
+ * Must not be called unless may_use_simd() returns true.
+ * Task context in the vector registers is saved back to memory as necessary.
+ *
+ * A matching call to kernel_vector_end() must be made before returning from the
+ * calling context.
+ *
+ * The caller may freely use the vector registers until kernel_vector_end() is
+ * called.
+ */
+void kernel_vector_begin(void)
+{
+ if (WARN_ON(!has_vector()))
+ return;
+
+ BUG_ON(!may_use_simd());
+
+ get_cpu_vector_context();
+
+ riscv_v_vstate_save(current, task_pt_regs(current));
+
+ riscv_v_enable();
+}
+EXPORT_SYMBOL_GPL(kernel_vector_begin);
+
+/*
+ * kernel_vector_end(): give the CPU vector registers back to the current task
+ *
+ * Must be called from a context in which kernel_vector_begin() was previously
+ * called, with no call to kernel_vector_end() in the meantime.
+ *
+ * The caller must not use the vector registers after this function is called,
+ * unless kernel_vector_begin() is called again in the meantime.
+ */
+void kernel_vector_end(void)
+{
+ if (WARN_ON(!has_vector()))
+ return;
+
+ riscv_v_vstate_restore(current, task_pt_regs(current));
+
+ riscv_v_disable();
+
+ put_cpu_vector_context();
+}
+EXPORT_SYMBOL_GPL(kernel_vector_end);
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 4f21d970a1292..4a1275db11460 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -214,14 +214,15 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
*childregs = *(current_pt_regs());
/* Turn off status.VS */
riscv_v_vstate_off(childregs);
if (usp) /* User fork */
childregs->sp = usp;
if (clone_flags & CLONE_SETTLS)
childregs->tp = tls;
childregs->a0 = 0; /* Return value of fork() */
p->thread.s[0] = 0;
}
+ p->thread.riscv_v_flags = 0;
p->thread.ra = (unsigned long)ret_from_fork;
p->thread.sp = (unsigned long)childregs; /* kernel sp */
return 0;
}
--
2.43.0


2024-01-02 06:50:49

by Eric Biggers

[permalink] [raw]
Subject: [RFC PATCH 02/13] riscv: vector: make Vector always available for softirq context

From: Andy Chiu <[email protected]>

The goal of this patch is to provide full support of Vector in kernel
softirq context. So that some of the crypto alogrithms won't need scalar
fallbacks.

By disabling bottom halves in active kernel-mode Vector, softirq will
not be able to nest on top of any kernel-mode Vector. So, softirq
context is able to use Vector whenever it runs.

After this patch, Vector context cannot start with irqs disabled.
Otherwise local_bh_enable() may run in a wrong context.

Disabling bh is not enough for RT-kernel to prevent preeemption. So
we must disable preemption, which also implies disabling bh on RT.

Related-to: commit 696207d4258b ("arm64/sve: Make kernel FPU protection RT friendly")
Related-to: commit 66c3ec5a7120 ("arm64: neon: Forbid when irqs are disabled")
Signed-off-by: Andy Chiu <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/riscv/include/asm/processor.h | 5 +++--
arch/riscv/include/asm/simd.h | 6 +++++-
arch/riscv/kernel/kernel_mode_vector.c | 14 ++++++++++++--
3 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 28d19aea24b1d..e768397890673 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -71,22 +71,23 @@
#ifndef __ASSEMBLY__

struct task_struct;
struct pt_regs;

/*
* We use a flag to track in-kernel Vector context. Currently the flag has the
* following meaning:
*
* - bit 0: indicates whether the in-kernel Vector context is active. The
- * activation of this state disables the preemption. Currently only 0 and 1
- * are valid value for this field. Other values are reserved for future uses.
+ * activation of this state disables the preemption. On a non-RT kernel, it
+ * also disable bh. Currently only 0 and 1 are valid value for this field.
+ * Other values are reserved for future uses.
*/
#define RISCV_KERNEL_MODE_V 0x1

/* CPU-specific state of a task */
struct thread_struct {
/* Callee-saved registers */
unsigned long ra;
unsigned long sp; /* Kernel mode stack */
unsigned long s[12]; /* s[0]: frame pointer */
struct __riscv_d_ext_state fstate;
diff --git a/arch/riscv/include/asm/simd.h b/arch/riscv/include/asm/simd.h
index ef8af413a9fc7..4d699e16c9a96 100644
--- a/arch/riscv/include/asm/simd.h
+++ b/arch/riscv/include/asm/simd.h
@@ -21,22 +21,26 @@
* instructions or access the vector register file
*
* Callers must not assume that the result remains true beyond the next
* preempt_enable() or return from softirq context.
*/
static __must_check inline bool may_use_simd(void)
{
/*
* RISCV_KERNEL_MODE_V is only set while preemption is disabled,
* and is clear whenever preemption is enabled.
+ *
+ * Kernel-mode Vector temporarily disables bh. So we must not return
+ * true on irq_disabled(). Otherwise we would fail the lockdep check
+ * calling local_bh_enable()
*/
- return !in_hardirq() && !in_nmi() && !(riscv_v_flags() & RISCV_KERNEL_MODE_V);
+ return !in_hardirq() && !in_nmi() && !irqs_disabled() && !(riscv_v_flags() & RISCV_KERNEL_MODE_V);
}

#else /* ! CONFIG_RISCV_ISA_V */

static __must_check inline bool may_use_simd(void)
{
return false;
}

#endif /* ! CONFIG_RISCV_ISA_V */
diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
index 114cf4f0a0eb6..2fc145edae3dd 100644
--- a/arch/riscv/kernel/kernel_mode_vector.c
+++ b/arch/riscv/kernel/kernel_mode_vector.c
@@ -39,37 +39,47 @@ static inline void riscv_v_stop(u32 flags)
}

/*
* Claim ownership of the CPU vector context for use by the calling context.
*
* The caller may freely manipulate the vector context metadata until
* put_cpu_vector_context() is called.
*/
void get_cpu_vector_context(void)
{
- preempt_disable();
+ /*
+ * disable softirqs so it is impossible for softirqs to nest
+ * get_cpu_vector_context() when kernel is actively using Vector.
+ */
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+ local_bh_disable();
+ else
+ preempt_disable();

riscv_v_start(RISCV_KERNEL_MODE_V);
}

/*
* Release the CPU vector context.
*
* Must be called from a context in which get_cpu_vector_context() was
* previously called, with no call to put_cpu_vector_context() in the
* meantime.
*/
void put_cpu_vector_context(void)
{
riscv_v_stop(RISCV_KERNEL_MODE_V);

- preempt_enable();
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+ local_bh_enable();
+ else
+ preempt_enable();
}

/*
* kernel_vector_begin(): obtain the CPU vector registers for use by the calling
* context
*
* Must not be called unless may_use_simd() returns true.
* Task context in the vector registers is saved back to memory as necessary.
*
* A matching call to kernel_vector_end() must be made before returning from the
--
2.43.0


2024-01-02 06:51:07

by Eric Biggers

[permalink] [raw]
Subject: [RFC PATCH 03/13] RISC-V: add helper function to read the vector VLEN

From: Heiko Stuebner <[email protected]>

VLEN describes the length of each vector register and some instructions
need specific minimal VLENs to work correctly.

The vector code already includes a variable riscv_v_vsize that contains
the value of "32 vector registers with vlenb length" that gets filled
during boot. vlenb is the value contained in the CSR_VLENB register and
the value represents "VLEN / 8".

So add riscv_vector_vlen() to return the actual VLEN value for in-kernel
users when they need to check the available VLEN.

Signed-off-by: Heiko Stuebner <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/riscv/include/asm/vector.h | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 71af3404fda14..ae724e016fe24 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -218,11 +218,22 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
#define riscv_v_vsize (0)
#define riscv_v_vstate_discard(regs) do {} while (0)
#define riscv_v_vstate_save(task, regs) do {} while (0)
#define riscv_v_vstate_restore(task, regs) do {} while (0)
#define __switch_to_vector(__prev, __next) do {} while (0)
#define riscv_v_vstate_off(regs) do {} while (0)
#define riscv_v_vstate_on(regs) do {} while (0)

#endif /* CONFIG_RISCV_ISA_V */

+/*
+ * Return the implementation's vlen value.
+ *
+ * riscv_v_vsize contains the value of "32 vector registers with vlenb length"
+ * so rebuild the vlen value in bits from it.
+ */
+static inline int riscv_vector_vlen(void)
+{
+ return riscv_v_vsize / 32 * 8;
+}
+
#endif /* ! __ASM_RISCV_VECTOR_H */
--
2.43.0


2024-01-02 06:51:21

by Eric Biggers

[permalink] [raw]
Subject: [RFC PATCH 04/13] RISC-V: add TOOLCHAIN_HAS_VECTOR_CRYPTO

From: Eric Biggers <[email protected]>

Add a kconfig symbol that indicates whether the toolchain supports the
vector crypto extensions. This is needed by the RISC-V crypto code.

Signed-off-by: Eric Biggers <[email protected]>
---
arch/riscv/Kconfig | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 95a2a06acc6a6..bf275a713b1b5 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -526,20 +526,27 @@ config RISCV_ISA_V_DEFAULT_ENABLE
If you don't know what to do here, say Y.

config TOOLCHAIN_HAS_ZBB
bool
default y
depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbb)
depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbb)
depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
depends on AS_HAS_OPTION_ARCH

+# This symbol indicates that the toolchain supports all v1.0 vector crypto
+# extensions, including Zvk*, Zvbb, and Zvbc. LLVM added all of these at once.
+# binutils added all except Zvkb, then added Zvkb. So we just check for Zvkb.
+config TOOLCHAIN_HAS_VECTOR_CRYPTO
+ def_bool $(as-instr, .option arch$(comma) +zvkb)
+ depends on AS_HAS_OPTION_ARCH
+
config RISCV_ISA_ZBB
bool "Zbb extension support for bit manipulation instructions"
depends on TOOLCHAIN_HAS_ZBB
depends on MMU
depends on RISCV_ALTERNATIVE
default y
help
Adds support to dynamically detect the presence of the ZBB
extension (basic bit manipulation) and enable its usage.

--
2.43.0


2024-01-02 06:51:35

by Eric Biggers

[permalink] [raw]
Subject: [RFC PATCH 05/13] RISC-V: hook new crypto subdir into build-system

From: Heiko Stuebner <[email protected]>

Create a crypto subdirectory for added accelerated cryptography routines
and hook it into the riscv Kbuild and the main crypto Kconfig.

Signed-off-by: Heiko Stuebner <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/riscv/Kbuild | 1 +
arch/riscv/crypto/Kconfig | 5 +++++
arch/riscv/crypto/Makefile | 4 ++++
crypto/Kconfig | 3 +++
4 files changed, 13 insertions(+)
create mode 100644 arch/riscv/crypto/Kconfig
create mode 100644 arch/riscv/crypto/Makefile

diff --git a/arch/riscv/Kbuild b/arch/riscv/Kbuild
index d25ad1c19f881..2c585f7a0b6ef 100644
--- a/arch/riscv/Kbuild
+++ b/arch/riscv/Kbuild
@@ -1,11 +1,12 @@
# SPDX-License-Identifier: GPL-2.0-only

obj-y += kernel/ mm/ net/
obj-$(CONFIG_BUILTIN_DTB) += boot/dts/
+obj-$(CONFIG_CRYPTO) += crypto/
obj-y += errata/
obj-$(CONFIG_KVM) += kvm/

obj-$(CONFIG_ARCH_SUPPORTS_KEXEC_PURGATORY) += purgatory/

# for cleaning
subdir- += boot
diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
new file mode 100644
index 0000000000000..10d60edc0110a
--- /dev/null
+++ b/arch/riscv/crypto/Kconfig
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
+
+endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
new file mode 100644
index 0000000000000..b3b6332c9f6d0
--- /dev/null
+++ b/arch/riscv/crypto/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# linux/arch/riscv/crypto/Makefile
+#
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 70661f58ee41c..c8fd2b83e589b 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1512,20 +1512,23 @@ source "arch/arm64/crypto/Kconfig"
endif
if LOONGARCH
source "arch/loongarch/crypto/Kconfig"
endif
if MIPS
source "arch/mips/crypto/Kconfig"
endif
if PPC
source "arch/powerpc/crypto/Kconfig"
endif
+if RISCV
+source "arch/riscv/crypto/Kconfig"
+endif
if S390
source "arch/s390/crypto/Kconfig"
endif
if SPARC
source "arch/sparc/crypto/Kconfig"
endif
if X86
source "arch/x86/crypto/Kconfig"
endif
endif
--
2.43.0


2024-01-02 06:52:04

by Eric Biggers

[permalink] [raw]
Subject: [RFC PATCH 06/13] crypto: riscv - add vector crypto accelerated AES

From: Jerry Shih <[email protected]>

Add an implementation of AES using the Zvkned extension. The assembly
code is derived from OpenSSL code (openssl/openssl#21923) that was
dual-licensed so that it could be reused in the kernel. Nevertheless,
the assembly has been significantly reworked for integration with the
kernel, for example by using a regular .S file instead of the so-called
perlasm, using the assembler instead of bare '.inst', greatly reducing
code duplication, supporting AES-192, and making the code use the same
AES key structure as the C code.

Co-developed-by: Christoph Müllner <[email protected]>
Signed-off-by: Christoph Müllner <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Co-developed-by: Phoebe Chen <[email protected]>
Signed-off-by: Phoebe Chen <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
Co-developed-by: Eric Biggers <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/riscv/crypto/Kconfig | 11 ++
arch/riscv/crypto/Makefile | 3 +
arch/riscv/crypto/aes-macros.S | 156 +++++++++++++++++++++++++
arch/riscv/crypto/aes-riscv64-glue.c | 123 +++++++++++++++++++
arch/riscv/crypto/aes-riscv64-glue.h | 15 +++
arch/riscv/crypto/aes-riscv64-zvkned.S | 84 +++++++++++++
6 files changed, 392 insertions(+)
create mode 100644 arch/riscv/crypto/aes-macros.S
create mode 100644 arch/riscv/crypto/aes-riscv64-glue.c
create mode 100644 arch/riscv/crypto/aes-riscv64-glue.h
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.S

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 10d60edc0110a..2a7c365f2a86c 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -1,5 +1,16 @@
# SPDX-License-Identifier: GPL-2.0

menu "Accelerated Cryptographic Algorithms for CPU (riscv)"

+config CRYPTO_AES_RISCV64
+ tristate "Ciphers: AES"
+ depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
+ select CRYPTO_ALGAPI
+ select CRYPTO_LIB_AES
+ help
+ Block ciphers: AES cipher algorithms (FIPS-197)
+
+ Architecture: riscv64 using:
+ - Zvkned vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index b3b6332c9f6d0..dca698c5cba3e 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -1,4 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
#
# linux/arch/riscv/crypto/Makefile
#
+
+obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
+aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o
diff --git a/arch/riscv/crypto/aes-macros.S b/arch/riscv/crypto/aes-macros.S
new file mode 100644
index 0000000000000..2ada0c70f4a6a
--- /dev/null
+++ b/arch/riscv/crypto/aes-macros.S
@@ -0,0 +1,156 @@
+/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
+//
+// This file is dual-licensed, meaning that you can use it under your
+// choice of either of the following two licenses:
+//
+// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+//
+// Licensed under the Apache License 2.0 (the "License"). You can obtain
+// a copy in the file LICENSE in the source distribution or at
+// https://www.openssl.org/source/license.html
+//
+// or
+//
+// Copyright (c) 2023, Christoph Müllner <[email protected]>
+// Copyright (c) 2023, Phoebe Chen <[email protected]>
+// Copyright (c) 2023, Jerry Shih <[email protected]>
+// Copyright 2024 Google LLC
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. INP NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER INP CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING INP ANY WAY OUTP OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+// This file contains macros that are shared by the other aes-*.S files. The
+// generated code of these macros depends on the following RISC-V extensions:
+// - RV64I
+// - RISC-V Vector ('V') with VLEN >= 128
+// - RISC-V Vector AES block cipher extension ('Zvkned')
+
+// Loads the AES round keys from \keyp into vector registers and jumps to code
+// specific to the length of the key. Specifically:
+// - If AES-128, loads round keys into v1-v11 and jumps to \label128.
+// - If AES-192, loads round keys into v1-v13 and jumps to \label192.
+// - If AES-256, loads round keys into v1-v15 and continues onwards.
+//
+// Also sets vl=4 and vtype=e32,m1,ta,ma. Clobbers t0 and t1.
+.macro aes_begin keyp, label128, label192
+ lwu t0, 480(\keyp) // t0 = key length in bytes
+ li t1, 24 // t1 = key length for AES-192
+ vsetivli zero, 4, e32, m1, ta, ma
+ vle32.v v1, (\keyp)
+ addi \keyp, \keyp, 16
+ vle32.v v2, (\keyp)
+ addi \keyp, \keyp, 16
+ vle32.v v3, (\keyp)
+ addi \keyp, \keyp, 16
+ vle32.v v4, (\keyp)
+ addi \keyp, \keyp, 16
+ vle32.v v5, (\keyp)
+ addi \keyp, \keyp, 16
+ vle32.v v6, (\keyp)
+ addi \keyp, \keyp, 16
+ vle32.v v7, (\keyp)
+ addi \keyp, \keyp, 16
+ vle32.v v8, (\keyp)
+ addi \keyp, \keyp, 16
+ vle32.v v9, (\keyp)
+ addi \keyp, \keyp, 16
+ vle32.v v10, (\keyp)
+ addi \keyp, \keyp, 16
+ vle32.v v11, (\keyp)
+ blt t0, t1, \label128 // If AES-128, goto label128.
+ addi \keyp, \keyp, 16
+ vle32.v v12, (\keyp)
+ addi \keyp, \keyp, 16
+ vle32.v v13, (\keyp)
+ beq t0, t1, \label192 // If AES-192, goto label192.
+ // Else, it's AES-256.
+ addi \keyp, \keyp, 16
+ vle32.v v14, (\keyp)
+ addi \keyp, \keyp, 16
+ vle32.v v15, (\keyp)
+.endm
+
+// Encrypts \data using zvkned instructions, using the round keys loaded into
+// v1-v11 (for AES-128), v1-v13 (for AES-192), or v1-v15 (for AES-256). \keylen
+// is the AES key length in bits. vl and vtype must already be set
+// appropriately. Note that if vl > 4, multiple blocks are encrypted.
+.macro aes_encrypt data, keylen
+ vaesz.vs \data, v1
+ vaesem.vs \data, v2
+ vaesem.vs \data, v3
+ vaesem.vs \data, v4
+ vaesem.vs \data, v5
+ vaesem.vs \data, v6
+ vaesem.vs \data, v7
+ vaesem.vs \data, v8
+ vaesem.vs \data, v9
+ vaesem.vs \data, v10
+.if \keylen == 128
+ vaesef.vs \data, v11
+.elseif \keylen == 192
+ vaesem.vs \data, v11
+ vaesem.vs \data, v12
+ vaesef.vs \data, v13
+.else
+ vaesem.vs \data, v11
+ vaesem.vs \data, v12
+ vaesem.vs \data, v13
+ vaesem.vs \data, v14
+ vaesef.vs \data, v15
+.endif
+.endm
+
+// Same as aes_encrypt, but decrypts instead of encrypts.
+.macro aes_decrypt data, keylen
+.if \keylen == 128
+ vaesz.vs \data, v11
+.elseif \keylen == 192
+ vaesz.vs \data, v13
+ vaesdm.vs \data, v12
+ vaesdm.vs \data, v11
+.else
+ vaesz.vs \data, v15
+ vaesdm.vs \data, v14
+ vaesdm.vs \data, v13
+ vaesdm.vs \data, v12
+ vaesdm.vs \data, v11
+.endif
+ vaesdm.vs \data, v10
+ vaesdm.vs \data, v9
+ vaesdm.vs \data, v8
+ vaesdm.vs \data, v7
+ vaesdm.vs \data, v6
+ vaesdm.vs \data, v5
+ vaesdm.vs \data, v4
+ vaesdm.vs \data, v3
+ vaesdm.vs \data, v2
+ vaesdf.vs \data, v1
+.endm
+
+// Expands to aes_encrypt or aes_decrypt according to \enc, which is 1 or 0.
+.macro aes_crypt data, enc, keylen
+.if \enc
+ aes_encrypt \data, \keylen
+.else
+ aes_decrypt \data, \keylen
+.endif
+.endm
diff --git a/arch/riscv/crypto/aes-riscv64-glue.c b/arch/riscv/crypto/aes-riscv64-glue.c
new file mode 100644
index 0000000000000..f9c7b1a638f2d
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-glue.c
@@ -0,0 +1,123 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AES using the RISC-V vector crypto extensions
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/aes.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/simd.h>
+#include <linux/linkage.h>
+#include <linux/module.h>
+
+#include "aes-riscv64-glue.h"
+
+asmlinkage void aes_encrypt_zvkned(const struct crypto_aes_ctx *key,
+ const u8 in[AES_BLOCK_SIZE],
+ u8 out[AES_BLOCK_SIZE]);
+asmlinkage void aes_decrypt_zvkned(const struct crypto_aes_ctx *key,
+ const u8 in[AES_BLOCK_SIZE],
+ u8 out[AES_BLOCK_SIZE]);
+
+int __riscv64_aes_setkey(struct crypto_aes_ctx *ctx, const u8 *key,
+ unsigned int keylen)
+{
+ /*
+ * Always use the generic key expansion routine, for two reasons:
+ *
+ * - zvkned's key expansion instructions don't support AES-192.
+ *
+ * - ctx->key_dec always needs to be initialized with the round keys for
+ * the Equivalent Inverse Cipher, in case the no-SIMD fallback is
+ * taken during decryption. But the zvkned code does not use this.
+ */
+ return aes_expandkey(ctx, key, keylen);
+}
+EXPORT_SYMBOL_GPL(__riscv64_aes_setkey);
+
+void __riscv64_aes_encrypt(const struct crypto_aes_ctx *ctx,
+ u8 *dst, const u8 *src)
+{
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ aes_encrypt_zvkned(ctx, src, dst);
+ kernel_vector_end();
+ } else {
+ aes_encrypt(ctx, dst, src);
+ }
+}
+EXPORT_SYMBOL_GPL(__riscv64_aes_encrypt);
+
+static int riscv64_aes_setkey(struct crypto_tfm *tfm,
+ const u8 *key, unsigned int keylen)
+{
+ struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ return __riscv64_aes_setkey(ctx, key, keylen);
+}
+
+static void riscv64_aes_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+ const struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ __riscv64_aes_encrypt(ctx, dst, src);
+}
+
+static void riscv64_aes_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+ const struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ aes_decrypt_zvkned(ctx, src, dst);
+ kernel_vector_end();
+ } else {
+ aes_decrypt(ctx, dst, src);
+ }
+}
+
+static struct crypto_alg riscv64_aes_alg = {
+ .cra_flags = CRYPTO_ALG_TYPE_CIPHER,
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct crypto_aes_ctx),
+ .cra_priority = 300,
+ .cra_name = "aes",
+ .cra_driver_name = "aes-riscv64-zvkned",
+ .cra_cipher = {
+ .cia_min_keysize = AES_MIN_KEY_SIZE,
+ .cia_max_keysize = AES_MAX_KEY_SIZE,
+ .cia_setkey = riscv64_aes_setkey,
+ .cia_encrypt = riscv64_aes_encrypt,
+ .cia_decrypt = riscv64_aes_decrypt,
+ },
+ .cra_module = THIS_MODULE,
+};
+
+static int __init riscv64_aes_mod_init(void)
+{
+ if (riscv_isa_extension_available(NULL, ZVKNED) &&
+ riscv_vector_vlen() >= 128)
+ return crypto_register_alg(&riscv64_aes_alg);
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_aes_mod_fini(void)
+{
+ crypto_unregister_alg(&riscv64_aes_alg);
+}
+
+module_init(riscv64_aes_mod_init);
+module_exit(riscv64_aes_mod_fini);
+
+MODULE_DESCRIPTION("AES (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/riscv/crypto/aes-riscv64-glue.h b/arch/riscv/crypto/aes-riscv64-glue.h
new file mode 100644
index 0000000000000..0d38613380784
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-glue.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef AES_RISCV64_GLUE_H
+#define AES_RISCV64_GLUE_H
+
+#include <crypto/aes.h>
+#include <linux/types.h>
+
+int __riscv64_aes_setkey(struct crypto_aes_ctx *ctx,
+ const u8 *key, unsigned int keylen);
+
+void __riscv64_aes_encrypt(const struct crypto_aes_ctx *ctx,
+ u8 *dst, const u8 *src);
+
+#endif /* AES_RISCV64_GLUE_H */
diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.S b/arch/riscv/crypto/aes-riscv64-zvkned.S
new file mode 100644
index 0000000000000..3346978b89d6a
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-zvkned.S
@@ -0,0 +1,84 @@
+/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
+//
+// This file is dual-licensed, meaning that you can use it under your
+// choice of either of the following two licenses:
+//
+// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+//
+// Licensed under the Apache License 2.0 (the "License"). You can obtain
+// a copy in the file LICENSE in the source distribution or at
+// https://www.openssl.org/source/license.html
+//
+// or
+//
+// Copyright (c) 2023, Christoph Müllner <[email protected]>
+// Copyright (c) 2023, Phoebe Chen <[email protected]>
+// Copyright (c) 2023, Jerry Shih <[email protected]>
+// Copyright 2024 Google LLC
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. INP NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER INP CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING INP ANY WAY OUTP OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+// The generated code of this file depends on the following RISC-V extensions:
+// - RV64I
+// - RISC-V Vector ('V') with VLEN >= 128
+// - RISC-V Vector AES block cipher extension ('Zvkned')
+
+#include <linux/linkage.h>
+
+.text
+.option arch, +zvkned
+
+#include "aes-macros.S"
+
+#define KEYP a0
+#define INP a1
+#define OUTP a2
+#define LEN a3
+#define IVP a4
+
+.macro __aes_crypt_zvkned enc, keylen
+ vle32.v v16, (INP)
+ aes_crypt v16, \enc, \keylen
+ vse32.v v16, (OUTP)
+ ret
+.endm
+
+.macro aes_crypt_zvkned enc
+ aes_begin KEYP, 111f, 222f
+ __aes_crypt_zvkned \enc, 256
+111:
+ __aes_crypt_zvkned \enc, 128
+222:
+ __aes_crypt_zvkned \enc, 192
+.endm
+
+// void aes_encrypt_zvkned(const struct crypto_aes_ctx *key,
+// const u8 in[16], u8 out[16]);
+SYM_FUNC_START(aes_encrypt_zvkned)
+ aes_crypt_zvkned 1
+SYM_FUNC_END(aes_encrypt_zvkned)
+
+// Same prototype and calling convention as the encryption function
+SYM_FUNC_START(aes_decrypt_zvkned)
+ aes_crypt_zvkned 0
+SYM_FUNC_END(aes_decrypt_zvkned)
--
2.43.0


2024-01-02 06:52:21

by Eric Biggers

[permalink] [raw]
Subject: [RFC PATCH 07/13] crypto: riscv - add vector crypto accelerated AES-{ECB,CBC,CTR,XTS}

From: Jerry Shih <[email protected]>

Add implementations of AES-ECB, AES-CBC, AES-CTR, and AES-XTS using the
RISC-V vector crypto extensions. The assembly code is derived from
OpenSSL code (openssl/openssl#21923) that was dual-licensed so that it
could be reused in the kernel. Nevertheless, the assembly has been
significantly reworked for integration with the kernel, for example by
using regular .S files instead of the so-called perlasm, using the
assembler instead of bare '.inst', greatly reducing code duplication,
supporting AES-192, and making the code use the same AES key structure
as the C code.

Co-developed-by: Phoebe Chen <[email protected]>
Signed-off-by: Phoebe Chen <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
Co-developed-by: Eric Biggers <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/riscv/crypto/Kconfig | 20 +
arch/riscv/crypto/Makefile | 3 +
.../crypto/aes-riscv64-block-mode-glue.c | 435 ++++++++++++++++++
.../crypto/aes-riscv64-zvkned-zvbb-zvkg.S | 300 ++++++++++++
arch/riscv/crypto/aes-riscv64-zvkned-zvkb.S | 146 ++++++
arch/riscv/crypto/aes-riscv64-zvkned.S | 96 ++++
6 files changed, 1000 insertions(+)
create mode 100644 arch/riscv/crypto/aes-riscv64-block-mode-glue.c
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned-zvbb-zvkg.S
create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned-zvkb.S

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 2a7c365f2a86c..db8a5bcbea785 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -6,11 +6,31 @@ config CRYPTO_AES_RISCV64
tristate "Ciphers: AES"
depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
select CRYPTO_ALGAPI
select CRYPTO_LIB_AES
help
Block ciphers: AES cipher algorithms (FIPS-197)

Architecture: riscv64 using:
- Zvkned vector crypto extension

+config CRYPTO_AES_BLOCK_RISCV64
+ tristate "Ciphers: AES, modes: ECB/CBC/CTR/XTS"
+ depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
+ select CRYPTO_AES_RISCV64
+ select CRYPTO_SKCIPHER
+ help
+ Length-preserving ciphers: AES cipher algorithms (FIPS-197)
+ with block cipher modes:
+ - ECB (Electronic Codebook) mode (NIST SP 800-38A)
+ - CBC (Cipher Block Chaining) mode (NIST SP 800-38A)
+ - CTR (Counter) mode (NIST SP 800-38A)
+ - XTS (XOR Encrypt XOR Tweakable Block Cipher with Ciphertext
+ Stealing) mode (NIST SP 800-38E and IEEE 1619)
+
+ Architecture: riscv64 using:
+ - Zvkned vector crypto extension
+ - Zvbb vector extension (XTS)
+ - Zvkb vector crypto extension (CTR/XTS)
+ - Zvkg vector crypto extension (XTS)
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index dca698c5cba3e..5dd91f34f0d52 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -1,7 +1,10 @@
# SPDX-License-Identifier: GPL-2.0-only
#
# linux/arch/riscv/crypto/Makefile
#

obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o
+
+obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o
+aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvkned-zvbb-zvkg.o aes-riscv64-zvkned-zvkb.o
diff --git a/arch/riscv/crypto/aes-riscv64-block-mode-glue.c b/arch/riscv/crypto/aes-riscv64-block-mode-glue.c
new file mode 100644
index 0000000000000..3250f6a0c1aae
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-block-mode-glue.c
@@ -0,0 +1,435 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AES in ECB, CBC, CTR, and XTS modes using the RISC-V vector crypto extensions
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/aes.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/xts.h>
+#include <linux/linkage.h>
+#include <linux/module.h>
+
+#include "aes-riscv64-glue.h"
+
+struct riscv64_aes_xts_ctx {
+ struct crypto_aes_ctx ctx1;
+ struct crypto_aes_ctx ctx2;
+};
+
+asmlinkage void aes_ecb_encrypt_zvkned(const struct crypto_aes_ctx *key,
+ const u8 *in, u8 *out, size_t len);
+asmlinkage void aes_ecb_decrypt_zvkned(const struct crypto_aes_ctx *key,
+ const u8 *in, u8 *out, size_t len);
+
+asmlinkage void aes_cbc_encrypt_zvkned(const struct crypto_aes_ctx *key,
+ const u8 *in, u8 *out, size_t len,
+ u8 iv[AES_BLOCK_SIZE]);
+asmlinkage void aes_cbc_decrypt_zvkned(const struct crypto_aes_ctx *key,
+ const u8 *in, u8 *out, size_t len,
+ u8 iv[AES_BLOCK_SIZE]);
+
+asmlinkage void aes_ctr32_crypt_zvkned_zvkb(const struct crypto_aes_ctx *key,
+ const u8 *in, u8 *out, size_t len,
+ u8 iv[AES_BLOCK_SIZE]);
+
+asmlinkage void aes_xts_encrypt_zvkned_zvbb_zvkg(
+ const struct crypto_aes_ctx *key,
+ const u8 *in, u8 *out, size_t len,
+ u8 tweak[AES_BLOCK_SIZE]);
+
+asmlinkage void aes_xts_decrypt_zvkned_zvbb_zvkg(
+ const struct crypto_aes_ctx *key,
+ const u8 *in, u8 *out, size_t len,
+ u8 tweak[AES_BLOCK_SIZE]);
+
+static int riscv64_aes_setkey(struct crypto_skcipher *tfm, const u8 *key,
+ unsigned int keylen)
+{
+ struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+
+ return __riscv64_aes_setkey(ctx, key, keylen);
+}
+
+static inline int riscv64_aes_ecb_crypt(struct skcipher_request *req, bool enc)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ int err;
+
+ /* If we have error here, the `nbytes` will be zero. */
+ err = skcipher_walk_virt(&walk, req, false);
+ while ((nbytes = walk.nbytes) != 0) {
+ kernel_vector_begin();
+ if (enc)
+ aes_ecb_encrypt_zvkned(ctx, walk.src.virt.addr,
+ walk.dst.virt.addr,
+ nbytes & ~(AES_BLOCK_SIZE - 1));
+ else
+ aes_ecb_decrypt_zvkned(ctx, walk.src.virt.addr,
+ walk.dst.virt.addr,
+ nbytes & ~(AES_BLOCK_SIZE - 1));
+ kernel_vector_end();
+ err = skcipher_walk_done(&walk, nbytes & (AES_BLOCK_SIZE - 1));
+ }
+
+ return err;
+}
+
+static int riscv64_aes_ecb_encrypt(struct skcipher_request *req)
+{
+ return riscv64_aes_ecb_crypt(req, true);
+}
+
+static int riscv64_aes_ecb_decrypt(struct skcipher_request *req)
+{
+ return riscv64_aes_ecb_crypt(req, false);
+}
+
+static inline int riscv64_aes_cbc_crypt(struct skcipher_request *req, bool enc)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, false);
+ while ((nbytes = walk.nbytes) != 0) {
+ kernel_vector_begin();
+ if (enc)
+ aes_cbc_encrypt_zvkned(ctx, walk.src.virt.addr,
+ walk.dst.virt.addr,
+ nbytes & ~(AES_BLOCK_SIZE - 1),
+ walk.iv);
+ else
+ aes_cbc_decrypt_zvkned(ctx, walk.src.virt.addr,
+ walk.dst.virt.addr,
+ nbytes & ~(AES_BLOCK_SIZE - 1),
+ walk.iv);
+ kernel_vector_end();
+ err = skcipher_walk_done(&walk, nbytes & (AES_BLOCK_SIZE - 1));
+ }
+
+ return err;
+}
+
+static int riscv64_aes_cbc_encrypt(struct skcipher_request *req)
+{
+ return riscv64_aes_cbc_crypt(req, true);
+}
+
+static int riscv64_aes_cbc_decrypt(struct skcipher_request *req)
+{
+ return riscv64_aes_cbc_crypt(req, false);
+}
+
+static int riscv64_aes_ctr_crypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ unsigned int ctr32;
+ unsigned int nbytes;
+ unsigned int blocks;
+ unsigned int current_blocks;
+ unsigned int current_length;
+ int err;
+
+ /* the ctr iv uses big endian */
+ ctr32 = get_unaligned_be32(req->iv + 12);
+ err = skcipher_walk_virt(&walk, req, false);
+ while ((nbytes = walk.nbytes) != 0) {
+ if (nbytes != walk.total) {
+ nbytes &= ~(AES_BLOCK_SIZE - 1);
+ blocks = nbytes / AES_BLOCK_SIZE;
+ } else {
+ /* This is the last walk. We should handle the tail data. */
+ blocks = DIV_ROUND_UP(nbytes, AES_BLOCK_SIZE);
+ }
+ ctr32 += blocks;
+
+ kernel_vector_begin();
+ /*
+ * The `if` block below detects the overflow, which is then handled by
+ * limiting the amount of blocks to the exact overflow point.
+ */
+ if (ctr32 >= blocks) {
+ aes_ctr32_crypt_zvkned_zvkb(ctx, walk.src.virt.addr,
+ walk.dst.virt.addr, nbytes,
+ req->iv);
+ } else {
+ /* use 2 ctr32 function calls for overflow case */
+ current_blocks = blocks - ctr32;
+ current_length =
+ min(nbytes, current_blocks * AES_BLOCK_SIZE);
+ aes_ctr32_crypt_zvkned_zvkb(ctx, walk.src.virt.addr,
+ walk.dst.virt.addr,
+ current_length, req->iv);
+ crypto_inc(req->iv, 12);
+
+ if (ctr32) {
+ aes_ctr32_crypt_zvkned_zvkb(
+ ctx,
+ walk.src.virt.addr +
+ current_blocks * AES_BLOCK_SIZE,
+ walk.dst.virt.addr +
+ current_blocks * AES_BLOCK_SIZE,
+ nbytes - current_length, req->iv);
+ }
+ }
+ kernel_vector_end();
+
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+ }
+
+ return err;
+}
+
+static int riscv64_aes_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+ unsigned int keylen)
+{
+ struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+ int err;
+
+ err = xts_verify_key(tfm, key, keylen);
+ if (err)
+ return err;
+ err = __riscv64_aes_setkey(&ctx->ctx1, key, keylen / 2);
+ if (err)
+ return err;
+ return __riscv64_aes_setkey(&ctx->ctx2, key + keylen / 2, keylen / 2);
+}
+
+static int riscv64_aes_xts_crypt(struct skcipher_request *req, bool enc)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+ int tail = req->cryptlen % AES_BLOCK_SIZE;
+ struct scatterlist sg_src[2], sg_dst[2];
+ struct skcipher_request subreq;
+ struct scatterlist *src, *dst;
+ struct skcipher_walk walk;
+ int err;
+
+ if (req->cryptlen < AES_BLOCK_SIZE)
+ return -EINVAL;
+
+ __riscv64_aes_encrypt(&ctx->ctx2, req->iv, req->iv);
+
+ err = skcipher_walk_virt(&walk, req, false);
+
+ if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
+ int xts_blocks = DIV_ROUND_UP(req->cryptlen,
+ AES_BLOCK_SIZE) - 2;
+
+ skcipher_walk_abort(&walk);
+
+ skcipher_request_set_tfm(&subreq, tfm);
+ skcipher_request_set_callback(&subreq,
+ skcipher_request_flags(req),
+ NULL, NULL);
+ skcipher_request_set_crypt(&subreq, req->src, req->dst,
+ xts_blocks * AES_BLOCK_SIZE,
+ req->iv);
+ req = &subreq;
+ err = skcipher_walk_virt(&walk, req, false);
+ } else {
+ tail = 0;
+ }
+
+ while (walk.nbytes >= AES_BLOCK_SIZE) {
+ unsigned int nbytes = walk.nbytes;
+
+ if (walk.nbytes < walk.total)
+ nbytes &= ~(AES_BLOCK_SIZE - 1);
+
+ kernel_vector_begin();
+ if (enc)
+ aes_xts_encrypt_zvkned_zvbb_zvkg(
+ &ctx->ctx1, walk.src.virt.addr,
+ walk.dst.virt.addr, nbytes, req->iv);
+ else
+ aes_xts_decrypt_zvkned_zvbb_zvkg(
+ &ctx->ctx1, walk.src.virt.addr,
+ walk.dst.virt.addr, nbytes, req->iv);
+ kernel_vector_end();
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+ }
+
+ if (err || likely(!tail))
+ return err;
+
+ dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+ if (req->dst != req->src)
+ dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+
+ skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+ req->iv);
+
+ err = skcipher_walk_virt(&walk, &subreq, false);
+ if (err)
+ return err;
+
+ kernel_vector_begin();
+ if (enc)
+ aes_xts_encrypt_zvkned_zvbb_zvkg(
+ &ctx->ctx1, walk.src.virt.addr,
+ walk.dst.virt.addr, walk.nbytes, req->iv);
+ else
+ aes_xts_decrypt_zvkned_zvbb_zvkg(
+ &ctx->ctx1, walk.src.virt.addr,
+ walk.dst.virt.addr, walk.nbytes, req->iv);
+ kernel_vector_end();
+
+ return skcipher_walk_done(&walk, 0);
+}
+
+static int riscv64_aes_xts_encrypt(struct skcipher_request *req)
+{
+ return riscv64_aes_xts_crypt(req, true);
+}
+
+static int riscv64_aes_xts_decrypt(struct skcipher_request *req)
+{
+ return riscv64_aes_xts_crypt(req, false);
+}
+
+static struct skcipher_alg riscv64_zvkned_aes_algs[] = {
+ {
+ .setkey = riscv64_aes_setkey,
+ .encrypt = riscv64_aes_ecb_encrypt,
+ .decrypt = riscv64_aes_ecb_decrypt,
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .walksize = AES_BLOCK_SIZE * 8,
+ .base = {
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct crypto_aes_ctx),
+ .cra_priority = 300,
+ .cra_name = "ecb(aes)",
+ .cra_driver_name = "ecb-aes-riscv64-zvkned",
+ .cra_module = THIS_MODULE,
+ },
+ }, {
+ .setkey = riscv64_aes_setkey,
+ .encrypt = riscv64_aes_cbc_encrypt,
+ .decrypt = riscv64_aes_cbc_decrypt,
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .ivsize = AES_BLOCK_SIZE,
+ .walksize = AES_BLOCK_SIZE * 8,
+ .base = {
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct crypto_aes_ctx),
+ .cra_priority = 300,
+ .cra_name = "cbc(aes)",
+ .cra_driver_name = "cbc-aes-riscv64-zvkned",
+ .cra_module = THIS_MODULE,
+ },
+ }
+};
+
+static struct skcipher_alg riscv64_zvkned_zvkb_aes_alg = {
+ .setkey = riscv64_aes_setkey,
+ .encrypt = riscv64_aes_ctr_crypt,
+ .decrypt = riscv64_aes_ctr_crypt,
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .ivsize = AES_BLOCK_SIZE,
+ .chunksize = AES_BLOCK_SIZE,
+ .walksize = AES_BLOCK_SIZE * 8,
+ .base = {
+ .cra_blocksize = 1,
+ .cra_ctxsize = sizeof(struct crypto_aes_ctx),
+ .cra_priority = 300,
+ .cra_name = "ctr(aes)",
+ .cra_driver_name = "ctr-aes-riscv64-zvkned-zvkb",
+ .cra_module = THIS_MODULE,
+ },
+};
+
+static struct skcipher_alg riscv64_zvkned_zvbb_zvkg_aes_alg = {
+ .setkey = riscv64_aes_xts_setkey,
+ .encrypt = riscv64_aes_xts_encrypt,
+ .decrypt = riscv64_aes_xts_decrypt,
+ .min_keysize = AES_MIN_KEY_SIZE * 2,
+ .max_keysize = AES_MAX_KEY_SIZE * 2,
+ .ivsize = AES_BLOCK_SIZE,
+ .chunksize = AES_BLOCK_SIZE,
+ .walksize = AES_BLOCK_SIZE * 8,
+ .base = {
+ .cra_blocksize = AES_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct riscv64_aes_xts_ctx),
+ .cra_priority = 300,
+ .cra_name = "xts(aes)",
+ .cra_driver_name = "xts-aes-riscv64-zvkned-zvbb-zvkg",
+ .cra_module = THIS_MODULE,
+ },
+};
+
+static int __init riscv64_aes_block_mod_init(void)
+{
+ int err = -ENODEV;
+
+ if (riscv_isa_extension_available(NULL, ZVKNED) &&
+ riscv_vector_vlen() >= 128) {
+ err = crypto_register_skciphers(
+ riscv64_zvkned_aes_algs,
+ ARRAY_SIZE(riscv64_zvkned_aes_algs));
+ if (err)
+ return err;
+
+ if (riscv_isa_extension_available(NULL, ZVKB)) {
+ err = crypto_register_skcipher(
+ &riscv64_zvkned_zvkb_aes_alg);
+ if (err)
+ goto unregister_zvkned;
+ }
+
+ if (riscv_isa_extension_available(NULL, ZVBB) &&
+ riscv_isa_extension_available(NULL, ZVKG) &&
+ riscv_vector_vlen() < 2048 /* XTS impl limitation */) {
+ err = crypto_register_skcipher(
+ &riscv64_zvkned_zvbb_zvkg_aes_alg);
+ if (err)
+ goto unregister_zvkned_zvkb;
+ }
+ }
+
+ return err;
+
+unregister_zvkned_zvkb:
+ if (riscv_isa_extension_available(NULL, ZVKB))
+ crypto_unregister_skcipher(&riscv64_zvkned_zvkb_aes_alg);
+unregister_zvkned:
+ crypto_unregister_skciphers(riscv64_zvkned_aes_algs,
+ ARRAY_SIZE(riscv64_zvkned_aes_algs));
+ return err;
+}
+
+static void __exit riscv64_aes_block_mod_fini(void)
+{
+ crypto_unregister_skcipher(&riscv64_zvkned_zvbb_zvkg_aes_alg);
+ crypto_unregister_skcipher(&riscv64_zvkned_zvkb_aes_alg);
+ crypto_unregister_skciphers(riscv64_zvkned_aes_algs,
+ ARRAY_SIZE(riscv64_zvkned_aes_algs));
+}
+
+module_init(riscv64_aes_block_mod_init);
+module_exit(riscv64_aes_block_mod_fini);
+
+MODULE_DESCRIPTION("AES-ECB/CBC/CTR/XTS (RISC-V accelerated)");
+MODULE_AUTHOR("Jerry Shih <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("ecb(aes)");
+MODULE_ALIAS_CRYPTO("cbc(aes)");
+MODULE_ALIAS_CRYPTO("ctr(aes)");
+MODULE_ALIAS_CRYPTO("xts(aes)");
diff --git a/arch/riscv/crypto/aes-riscv64-zvkned-zvbb-zvkg.S b/arch/riscv/crypto/aes-riscv64-zvkned-zvbb-zvkg.S
new file mode 100644
index 0000000000000..41eb618e8a5d9
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-zvkned-zvbb-zvkg.S
@@ -0,0 +1,300 @@
+/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
+//
+// This file is dual-licensed, meaning that you can use it under your
+// choice of either of the following two licenses:
+//
+// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+//
+// Licensed under the Apache License 2.0 (the "License"). You can obtain
+// a copy in the file LICENSE in the source distribution or at
+// https://www.openssl.org/source/license.html
+//
+// or
+//
+// Copyright (c) 2023, Jerry Shih <[email protected]>
+// Copyright 2024 Google LLC
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+// The generated code of this file depends on the following RISC-V extensions:
+// - RV64I
+// - RISC-V Vector ('V') with VLEN >= 128 && VLEN < 2048
+// - RISC-V Vector AES block cipher extension ('Zvkned')
+// - RISC-V Vector Bit-manipulation extension ('Zvbb')
+// - RISC-V Vector GCM/GMAC extension ('Zvkg')
+
+#include <linux/linkage.h>
+
+.text
+.option arch, +zvkned, +zvbb, +zvkg
+
+#include "aes-macros.S"
+
+#define KEYP a0
+#define INP a1
+#define OUTP a2
+#define LEN a3
+#define TWEAKP a4
+
+#define LEN32 a5
+#define TAIL_LEN a6
+#define VL a7
+
+// v1-v15 contain the AES round keys, but they are used for temporaries before
+// the AES round keys have been loaded.
+#define TWEAKS v16 // LMUL=4 (most of the time)
+#define TWEAKS_BREV v20 // LMUL=4 (most of the time)
+#define MULTS_BREV v24 // LMUL=4 (most of the time)
+#define TMP0 v28
+#define TMP1 v29
+#define TMP2 v30
+#define TMP3 v31
+
+// xts_init initializes the following values:
+//
+// TWEAKS: N 128-bit tweaks T*(x^i) for i in 0..(N - 1)
+// TWEAKS_BREV: same as TWEAKS, but bit-reversed
+// MULTS_BREV: N 128-bit values x^N, bit-reversed. Only if N > 1.
+//
+// N is the maximum number of blocks that will be processed per loop iteration,
+// computed using vsetvli.
+//
+// The field convention used by XTS is the same as that of GHASH, but with the
+// bits reversed within each byte. The zvkg extension provides the vgmul
+// instruction which does multiplication in this field. Therefore, for tweak
+// computation we use vgmul to do multiplications in parallel, instead of
+// serially multiplying by x using shifting+xoring. Note that for this to work,
+// the inputs and outputs to vgmul must be bit-reversed (we do it with vbrev8).
+.macro xts_init
+
+ // Load the first tweak T.
+ vsetivli zero, 4, e32, m1, ta, ma
+ vle32.v TWEAKS, (TWEAKP)
+
+ // If there's only one block (or no blocks at all), then skip the tweak
+ // sequence computation because (at most) T itself is needed.
+ li t0, 16
+ ble LEN, t0, .Linit_single_block\@
+
+ // Save a copy of T bit-reversed in v12.
+ vbrev8.v v12, TWEAKS
+
+ //
+ // Generate x^i for i in 0..(N - 1), i.e. 128-bit values 1 << i assuming
+ // that N <= 128. Though, this code actually requires N < 64 (or
+ // equivalently VLEN < 2048) due to the use of 64-bit intermediate
+ // values here and in the x^N computation later.
+ //
+ vsetvli VL, LEN32, e32, m4, ta, ma
+ srli t0, VL, 2 // t0 = N (num blocks)
+ // Generate two sequences, each with N 32-bit values:
+ // v0=[1, 1, 1, ...] and v1=[0, 1, 2, ...].
+ vsetvli zero, t0, e32, m1, ta, ma
+ vmv.v.i v0, 1
+ vid.v v1
+ // Use vzext to zero-extend the sequences to 64 bits. Reinterpret them
+ // as two sequences, each with 2*N 32-bit values:
+ // v2=[1, 0, 1, 0, 1, 0, ...] and v4=[0, 0, 1, 0, 2, 0, ...].
+ vsetvli zero, t0, e64, m2, ta, ma
+ vzext.vf2 v2, v0
+ vzext.vf2 v4, v1
+ slli t1, t0, 1 // t1 = 2*N
+ vsetvli zero, t1, e32, m2, ta, ma
+ // Use vwsll to compute [1<<0, 0<<0, 1<<1, 0<<0, 1<<2, 0<<0, ...],
+ // widening to 64 bits per element. When reinterpreted as N 128-bit
+ // values, this is the needed sequence of 128-bit values 1 << i (x^i).
+ vwsll.vv v8, v2, v4
+
+ // Copy the bit-reversed T to all N elements of TWEAKS_BREV, then
+ // multiply by x^i. This gives the sequence T*(x^i), bit-reversed.
+ vsetvli zero, LEN32, e32, m4, ta, ma
+ vmv.v.i TWEAKS_BREV, 0
+ vaesz.vs TWEAKS_BREV, v12
+ vbrev8.v v8, v8
+ vgmul.vv TWEAKS_BREV, v8
+
+ // Save a copy of the sequence T*(x^i) with the bit reversal undone.
+ vbrev8.v TWEAKS, TWEAKS_BREV
+
+ // Generate N copies of x^N, i.e. 128-bit values 1 << N, bit-reversed.
+ li t1, 1
+ sll t1, t1, t0 // t1 = 1 << N
+ vsetivli zero, 2, e64, m1, ta, ma
+ vmv.v.i v0, 0
+ vsetivli zero, 1, e64, m1, tu, ma
+ vmv.v.x v0, t1
+ vbrev8.v v0, v0
+ vsetvli zero, LEN32, e32, m4, ta, ma
+ vmv.v.i MULTS_BREV, 0
+ vaesz.vs MULTS_BREV, v0
+
+ j .Linit_done\@
+
+.Linit_single_block\@:
+ vbrev8.v TWEAKS_BREV, TWEAKS
+.Linit_done\@:
+.endm
+
+// Set the first 128 bits of MULTS_BREV to 0x40, i.e. 'x' bit-reversed. This is
+// the multiplier required to advance the tweak by one.
+.macro load_x
+ li t0, 0x40
+ vsetivli zero, 4, e32, m1, ta, ma
+ vmv.v.i MULTS_BREV, 0
+ vsetivli zero, 1, e8, m1, tu, ma
+ vmv.v.x MULTS_BREV, t0
+.endm
+
+.macro __aes_xts_crypt enc, keylen
+ // With 16 < len <= 31, there's no main loop, just ciphertext stealing.
+ beqz LEN32, .Lcts_without_main_loop\@
+
+ vsetvli VL, LEN32, e32, m4, ta, ma
+ j 2f
+1:
+ vsetvli VL, LEN32, e32, m4, ta, ma
+ // Compute the next sequence of tweaks by multiplying the previous
+ // sequence by x^N. Store the result in both bit-reversed order and
+ // regular order (i.e. with the bit reversal undone).
+ vgmul.vv TWEAKS_BREV, MULTS_BREV
+ vbrev8.v TWEAKS, TWEAKS_BREV
+2:
+ // Encrypt or decrypt VL/4 blocks.
+ vle32.v TMP0, (INP)
+ vxor.vv TMP0, TMP0, TWEAKS
+ aes_crypt TMP0, \enc, \keylen
+ vxor.vv TMP0, TMP0, TWEAKS
+ vse32.v TMP0, (OUTP)
+
+ // Update the pointers and the remaining length.
+ slli t0, VL, 2
+ add INP, INP, t0
+ add OUTP, OUTP, t0
+ sub LEN32, LEN32, VL
+
+ // Repeat if more blocks remain.
+ bnez LEN32, 1b
+
+.Lmain_loop_done\@:
+ load_x
+
+ // Compute the next tweak.
+ addi t0, VL, -4
+ vsetivli zero, 4, e32, m4, ta, ma
+ vslidedown.vx TWEAKS_BREV, TWEAKS_BREV, t0 // Extract last tweak
+ vsetivli zero, 4, e32, m1, ta, ma
+ vgmul.vv TWEAKS_BREV, MULTS_BREV // Advance to next tweak
+
+ bnez TAIL_LEN, .Lcts\@
+
+ // Update *TWEAKP to contain the next tweak.
+ vbrev8.v TWEAKS, TWEAKS_BREV
+ vse32.v TWEAKS, (TWEAKP)
+ ret
+
+.Lcts_without_main_loop\@:
+ load_x
+.Lcts\@:
+ // TWEAKS_BREV now contains the next tweak. Compute the one after that.
+ vsetivli zero, 4, e32, m1, ta, ma
+ vmv.v.v TMP0, TWEAKS_BREV
+ vgmul.vv TMP0, MULTS_BREV
+ // Undo the bit reversal of the next two tweaks and store them in TMP1
+ // and TMP2, such that TMP1 is the first needed and TMP2 the second.
+.if \enc
+ vbrev8.v TMP1, TWEAKS_BREV
+ vbrev8.v TMP2, TMP0
+.else
+ vbrev8.v TMP1, TMP0
+ vbrev8.v TMP2, TWEAKS_BREV
+.endif
+
+ // Encrypt/decrypt the last full block.
+ vle32.v TMP0, (INP)
+ vxor.vv TMP0, TMP0, TMP1
+ aes_crypt TMP0, \enc, \keylen
+ vxor.vv TMP0, TMP0, TMP1
+
+ // Swap the first TAIL_LEN bytes of the above result with the tail.
+ // Note that to support in-place encryption/decryption, the load from
+ // the input tail must happen before the store to the output tail.
+ addi t0, INP, 16
+ addi t1, OUTP, 16
+ vmv.v.v TMP3, TMP0
+ vsetvli zero, TAIL_LEN, e8, m1, tu, ma
+ vle8.v TMP0, (t0)
+ vse8.v TMP3, (t1)
+
+ // Encrypt/decrypt again and store the last full block.
+ vsetivli zero, 4, e32, m1, ta, ma
+ vxor.vv TMP0, TMP0, TMP2
+ aes_crypt TMP0, \enc, \keylen
+ vxor.vv TMP0, TMP0, TMP2
+ vse32.v TMP0, (OUTP)
+
+ ret
+.endm
+
+.macro aes_xts_crypt enc
+
+ // Check whether the length is a multiple of the AES block size.
+ andi TAIL_LEN, LEN, 15
+ beqz TAIL_LEN, 1f
+
+ // The length isn't a multiple of the AES block size, so ciphertext
+ // stealing will be required. Ciphertext stealing involves special
+ // handling of the partial block and the last full block, so subtract
+ // the length of both from the length to be processed in the main loop.
+ sub LEN, LEN, TAIL_LEN
+ addi LEN, LEN, -16
+1:
+ srli LEN32, LEN, 2
+ // LEN and LEN32 now contain the total length of the blocks that will be
+ // processed in the main loop, in bytes and 32-bit words respectively.
+
+ xts_init
+ aes_begin KEYP, 111f, 222f
+ __aes_xts_crypt \enc, 256
+111:
+ __aes_xts_crypt \enc, 128
+222:
+ __aes_xts_crypt \enc, 192
+.endm
+
+// void aes_xts_encrypt_zvkned_zvbb_zvkg(const struct crypto_aes_ctx *key,
+// const u8 *in, u8 *out, size_t len,
+// u8 tweak[16]);
+//
+// |key| is the data key. |tweak| contains the next tweak; the encryption of
+// the original IV with the tweak key was already done. This function supports
+// incremental computation, but |len| must always be >= 16 (AES_BLOCK_SIZE), and
+// |len| must be a multiple of 16 except on the last call. If |len| is a
+// multiple of 16, then this function updates |tweak| to contain the next tweak.
+SYM_FUNC_START(aes_xts_encrypt_zvkned_zvbb_zvkg)
+ aes_xts_crypt 1
+SYM_FUNC_END(aes_xts_encrypt_zvkned_zvbb_zvkg)
+
+// Same prototype and calling convention as the encryption function
+SYM_FUNC_START(aes_xts_decrypt_zvkned_zvbb_zvkg)
+ aes_xts_crypt 0
+SYM_FUNC_END(aes_xts_decrypt_zvkned_zvbb_zvkg)
diff --git a/arch/riscv/crypto/aes-riscv64-zvkned-zvkb.S b/arch/riscv/crypto/aes-riscv64-zvkned-zvkb.S
new file mode 100644
index 0000000000000..4a48f2f72c927
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-zvkned-zvkb.S
@@ -0,0 +1,146 @@
+/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
+//
+// This file is dual-licensed, meaning that you can use it under your
+// choice of either of the following two licenses:
+//
+// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+//
+// Licensed under the Apache License 2.0 (the "License"). You can obtain
+// a copy in the file LICENSE in the source distribution or at
+// https://www.openssl.org/source/license.html
+//
+// or
+//
+// Copyright (c) 2023, Jerry Shih <[email protected]>
+// Copyright 2024 Google LLC
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+// The generated code of this file depends on the following RISC-V extensions:
+// - RV64I
+// - RISC-V Vector ('V') with VLEN >= 128
+// - RISC-V Vector AES block cipher extension ('Zvkned')
+// - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+
+#include <linux/linkage.h>
+
+.text
+.option arch, +zvkned, +zvkb
+
+#include "aes-macros.S"
+
+#define KEYP a0
+#define INP a1
+#define OUTP a2
+#define LEN a3
+#define IVP a4
+
+#define LEN32 a5
+#define VL_E32 a6
+#define VL_BLOCKS a7
+
+.macro aes_ctr32_crypt keylen
+ // LEN32 = number of blocks, rounded up, in e32's
+ addi t0, LEN, 15
+ srli t0, t0, 4
+ slli LEN32, t0, 2
+
+ // Create a mask that selects the last 32-bit word of each 128-bit
+ // block. This is the word that contains the (big-endian) counter.
+ li t0, 0b10001000
+ vsetvli t1, zero, e8, m1, ta, ma
+ vmv.v.x v0, t0
+
+ // Load the IV into v31. The last 32-bit word contains the counter.
+ vsetivli zero, 4, e32, m1, ta, ma
+ vle32.v v31, (IVP)
+
+ // Convert the big-endian counter into little-endian.
+ vsetivli zero, 4, e32, m1, ta, mu
+ vrev8.v v31, v31, v0.t
+
+ // Splat the IV to v16 (with LMUL=4). The number of copies is the
+ // maximum number of blocks that will be processed per iteration.
+ vsetvli zero, LEN32, e32, m4, ta, ma
+ vmv.v.i v16, 0
+ vaesz.vs v16, v31
+
+ // v20 = [x, x, x, 0, x, x, x, 1, ...]
+ viota.m v20, v0, v0.t
+ // v16 = [IV0, IV1, IV2, counter+0, IV0, IV1, IV2, counter+1, ...]
+ vsetvli VL_E32, LEN32, e32, m4, ta, mu
+ vadd.vv v16, v16, v20, v0.t
+
+ j 2f
+1:
+ // Set the number of blocks to process in this iteration. vl=VL_E32 is
+ // the length in 32-bit elements, i.e. 4 times the number of blocks.
+ vsetvli VL_E32, LEN32, e32, m4, ta, mu
+
+ // Increment the counters by the number of blocks processed in the
+ // previous iteration.
+ vadd.vx v16, v16, VL_BLOCKS, v0.t
+2:
+ // Prepare the AES inputs into v24.
+ vmv.v.v v24, v16
+ vrev8.v v24, v24, v0.t // Convert back to big-endian.
+
+ // Encrypt the AES inputs to create the next portion of the keystream.
+ aes_encrypt v24, \keylen
+
+ // XOR the data with the keystream.
+ vsetvli t0, LEN, e8, m4, ta, ma
+ vle8.v v20, (INP)
+ vxor.vv v20, v20, v24
+ vse8.v v20, (OUTP)
+
+ // Advance the pointers and update the remaining length.
+ add INP, INP, t0
+ add OUTP, OUTP, t0
+ sub LEN, LEN, t0
+ sub LEN32, LEN32, VL_E32
+ srli VL_BLOCKS, VL_E32, 2
+
+ // Repeat if more data remains.
+ bnez LEN, 1b
+
+ // Update *IVP to contain the next counter.
+ vsetivli zero, 4, e32, m1, ta, mu
+ vadd.vx v16, v16, VL_BLOCKS, v0.t
+ vrev8.v v16, v16, v0.t // Convert back to big-endian.
+ vse32.v v16, (IVP)
+
+ ret
+.endm
+
+// void aes_ctr32_crypt_zvkned_zvkb(const struct crypto_aes_ctx *key,
+// const u8 *in, u8 *out, size_t len,
+// u8 iv[AES_BLOCK_SIZE]);
+SYM_FUNC_START(aes_ctr32_crypt_zvkned_zvkb)
+ aes_begin KEYP, 111f, 222f
+ aes_ctr32_crypt 256
+111:
+ aes_ctr32_crypt 128
+222:
+ aes_ctr32_crypt 192
+SYM_FUNC_END(aes_ctr32_crypt_zvkned_zvkb)
diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.S b/arch/riscv/crypto/aes-riscv64-zvkned.S
index 3346978b89d6a..6f7e8b0f31423 100644
--- a/arch/riscv/crypto/aes-riscv64-zvkned.S
+++ b/arch/riscv/crypto/aes-riscv64-zvkned.S
@@ -75,10 +75,106 @@
// void aes_encrypt_zvkned(const struct crypto_aes_ctx *key,
// const u8 in[16], u8 out[16]);
SYM_FUNC_START(aes_encrypt_zvkned)
aes_crypt_zvkned 1
SYM_FUNC_END(aes_encrypt_zvkned)

// Same prototype and calling convention as the encryption function
SYM_FUNC_START(aes_decrypt_zvkned)
aes_crypt_zvkned 0
SYM_FUNC_END(aes_decrypt_zvkned)
+
+.macro __aes_ecb_crypt enc, keylen
+ srli t0, LEN, 2
+ // t0 is the remaining length in 32-bit words. It's a multiple of 4.
+1:
+ vsetvli t1, t0, e32, m4, ta, ma
+ sub t0, t0, t1 // Subtract number of words processed
+ slli t1, t1, 2 // Words to bytes
+ vle32.v v16, (INP)
+ aes_crypt v16, \enc, \keylen
+ vse32.v v16, (OUTP)
+ add INP, INP, t1
+ add OUTP, OUTP, t1
+ bnez t0, 1b
+
+ ret
+.endm
+
+.macro aes_ecb_crypt enc
+ aes_begin KEYP, 111f, 222f
+ __aes_ecb_crypt \enc, 256
+111:
+ __aes_ecb_crypt \enc, 128
+222:
+ __aes_ecb_crypt \enc, 192
+.endm
+
+// void aes_ecb_encrypt_zvkned(const struct crypto_aes_ctx *key,
+// const u8 *in, u8 *out, size_t len);
+//
+// |len| must be nonzero and a multiple of 16 (AES_BLOCK_SIZE).
+SYM_FUNC_START(aes_ecb_encrypt_zvkned)
+ aes_ecb_crypt 1
+SYM_FUNC_END(aes_ecb_encrypt_zvkned)
+
+// Same prototype and calling convention as the encryption function
+SYM_FUNC_START(aes_ecb_decrypt_zvkned)
+ aes_ecb_crypt 0
+SYM_FUNC_END(aes_ecb_decrypt_zvkned)
+
+.macro aes_cbc_encrypt keylen
+ vle32.v v16, (IVP) // Load IV
+1:
+ vle32.v v17, (INP) // Load plaintext block
+ vxor.vv v16, v16, v17 // XOR with IV or prev ciphertext block
+ aes_encrypt v16, \keylen // Encrypt
+ vse32.v v16, (OUTP) // Store ciphertext block
+ addi INP, INP, 16
+ addi OUTP, OUTP, 16
+ addi LEN, LEN, -16
+ bnez LEN, 1b
+
+ vse32.v v16, (IVP) // Store next IV
+ ret
+.endm
+
+.macro aes_cbc_decrypt keylen
+ vle32.v v16, (IVP) // Load IV
+1:
+ vle32.v v17, (INP) // Load ciphertext block
+ vmv.v.v v18, v17 // Save ciphertext block
+ aes_decrypt v17, \keylen // Decrypt
+ vxor.vv v17, v17, v16 // XOR with IV or prev ciphertext block
+ vse32.v v17, (OUTP) // Store plaintext block
+ vmv.v.v v16, v18 // Next "IV" is prev ciphertext block
+ addi INP, INP, 16
+ addi OUTP, OUTP, 16
+ addi LEN, LEN, -16
+ bnez LEN, 1b
+
+ vse32.v v16, (IVP) // Store next IV
+ ret
+.endm
+
+// void aes_cbc_encrypt_zvkned(const struct crypto_aes_ctx *key,
+// const u8 *in, u8 *out, size_t len, u8 iv[16]);
+//
+// |len| must be nonzero and a multiple of 16 (AES_BLOCK_SIZE).
+SYM_FUNC_START(aes_cbc_encrypt_zvkned)
+ aes_begin KEYP, 111f, 222f
+ aes_cbc_encrypt 256
+111:
+ aes_cbc_encrypt 128
+222:
+ aes_cbc_encrypt 192
+SYM_FUNC_END(aes_cbc_encrypt_zvkned)
+
+// Same prototype and calling convention as the encryption function
+SYM_FUNC_START(aes_cbc_decrypt_zvkned)
+ aes_begin KEYP, 111f, 222f
+ aes_cbc_decrypt 256
+111:
+ aes_cbc_decrypt 128
+222:
+ aes_cbc_decrypt 192
+SYM_FUNC_END(aes_cbc_decrypt_zvkned)
--
2.43.0


2024-01-02 06:52:35

by Eric Biggers

[permalink] [raw]
Subject: [RFC PATCH 08/13] crypto: riscv - add vector crypto accelerated ChaCha20

From: Jerry Shih <[email protected]>

Add an implementation of ChaCha20 using the Zvkb extension. The
assembly code is derived from OpenSSL code (openssl/openssl#21923) that
was dual-licensed so that it could be reused in the kernel.
Nevertheless, the assembly has been significantly reworked for
integration with the kernel, for example by using a regular .S file
instead of the so-called perlasm, using the assembler instead of bare
'.inst', and reducing code duplication.

Signed-off-by: Jerry Shih <[email protected]>
Co-developed-by: Eric Biggers <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/riscv/crypto/Kconfig | 11 +
arch/riscv/crypto/Makefile | 3 +
arch/riscv/crypto/chacha-riscv64-glue.c | 101 ++++++++
arch/riscv/crypto/chacha-riscv64-zvkb.S | 294 ++++++++++++++++++++++++
4 files changed, 409 insertions(+)
create mode 100644 arch/riscv/crypto/chacha-riscv64-glue.c
create mode 100644 arch/riscv/crypto/chacha-riscv64-zvkb.S

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index db8a5bcbea785..d9a6920df9e99 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -26,11 +26,22 @@ config CRYPTO_AES_BLOCK_RISCV64
- CTR (Counter) mode (NIST SP 800-38A)
- XTS (XOR Encrypt XOR Tweakable Block Cipher with Ciphertext
Stealing) mode (NIST SP 800-38E and IEEE 1619)

Architecture: riscv64 using:
- Zvkned vector crypto extension
- Zvbb vector extension (XTS)
- Zvkb vector crypto extension (CTR/XTS)
- Zvkg vector crypto extension (XTS)

+config CRYPTO_CHACHA_RISCV64
+ tristate "Ciphers: ChaCha"
+ depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
+ select CRYPTO_SKCIPHER
+ select CRYPTO_LIB_CHACHA_GENERIC
+ help
+ Length-preserving ciphers: ChaCha20 stream cipher algorithm
+
+ Architecture: riscv64 using:
+ - Zvkb vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 5dd91f34f0d52..7b1e3a3f2041f 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -1,10 +1,13 @@
# SPDX-License-Identifier: GPL-2.0-only
#
# linux/arch/riscv/crypto/Makefile
#

obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o

obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o
aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvkned-zvbb-zvkg.o aes-riscv64-zvkned-zvkb.o
+
+obj-$(CONFIG_CRYPTO_CHACHA_RISCV64) += chacha-riscv64.o
+chacha-riscv64-y := chacha-riscv64-glue.o chacha-riscv64-zvkb.o
diff --git a/arch/riscv/crypto/chacha-riscv64-glue.c b/arch/riscv/crypto/chacha-riscv64-glue.c
new file mode 100644
index 0000000000000..0fc2e43de401a
--- /dev/null
+++ b/arch/riscv/crypto/chacha-riscv64-glue.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * ChaCha20 using the RISC-V vector crypto extensions
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/internal/chacha.h>
+#include <crypto/internal/skcipher.h>
+#include <linux/linkage.h>
+#include <linux/module.h>
+
+asmlinkage void chacha20_zvkb(const u32 key[8], const u8 *in, u8 *out,
+ size_t len, const u32 iv[4]);
+
+static int riscv64_chacha20_crypt(struct skcipher_request *req)
+{
+ u32 iv[CHACHA_IV_SIZE / sizeof(u32)];
+ u8 block_buffer[CHACHA_BLOCK_SIZE];
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ unsigned int nbytes;
+ unsigned int tail_bytes;
+ int err;
+
+ iv[0] = get_unaligned_le32(req->iv);
+ iv[1] = get_unaligned_le32(req->iv + 4);
+ iv[2] = get_unaligned_le32(req->iv + 8);
+ iv[3] = get_unaligned_le32(req->iv + 12);
+
+ err = skcipher_walk_virt(&walk, req, false);
+ while (walk.nbytes) {
+ nbytes = walk.nbytes & ~(CHACHA_BLOCK_SIZE - 1);
+ tail_bytes = walk.nbytes & (CHACHA_BLOCK_SIZE - 1);
+ kernel_vector_begin();
+ if (nbytes) {
+ chacha20_zvkb(ctx->key, walk.src.virt.addr,
+ walk.dst.virt.addr, nbytes, iv);
+ iv[0] += nbytes / CHACHA_BLOCK_SIZE;
+ }
+ if (walk.nbytes == walk.total && tail_bytes > 0) {
+ memcpy(block_buffer, walk.src.virt.addr + nbytes,
+ tail_bytes);
+ chacha20_zvkb(ctx->key, block_buffer, block_buffer,
+ CHACHA_BLOCK_SIZE, iv);
+ memcpy(walk.dst.virt.addr + nbytes, block_buffer,
+ tail_bytes);
+ tail_bytes = 0;
+ }
+ kernel_vector_end();
+
+ err = skcipher_walk_done(&walk, tail_bytes);
+ }
+
+ return err;
+}
+
+static struct skcipher_alg riscv64_chacha_alg = {
+ .setkey = chacha20_setkey,
+ .encrypt = riscv64_chacha20_crypt,
+ .decrypt = riscv64_chacha20_crypt,
+ .min_keysize = CHACHA_KEY_SIZE,
+ .max_keysize = CHACHA_KEY_SIZE,
+ .ivsize = CHACHA_IV_SIZE,
+ .chunksize = CHACHA_BLOCK_SIZE,
+ .walksize = CHACHA_BLOCK_SIZE * 4,
+ .base = {
+ .cra_blocksize = 1,
+ .cra_ctxsize = sizeof(struct chacha_ctx),
+ .cra_priority = 300,
+ .cra_name = "chacha20",
+ .cra_driver_name = "chacha20-riscv64-zvkb",
+ .cra_module = THIS_MODULE,
+ },
+};
+
+static int __init riscv64_chacha_mod_init(void)
+{
+ if (riscv_isa_extension_available(NULL, ZVKB) &&
+ riscv_vector_vlen() >= 128)
+ return crypto_register_skcipher(&riscv64_chacha_alg);
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_chacha_mod_fini(void)
+{
+ crypto_unregister_skcipher(&riscv64_chacha_alg);
+}
+
+module_init(riscv64_chacha_mod_init);
+module_exit(riscv64_chacha_mod_fini);
+
+MODULE_DESCRIPTION("ChaCha20 (RISC-V accelerated)");
+MODULE_AUTHOR("Jerry Shih <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("chacha20");
diff --git a/arch/riscv/crypto/chacha-riscv64-zvkb.S b/arch/riscv/crypto/chacha-riscv64-zvkb.S
new file mode 100644
index 0000000000000..642256dbee0f9
--- /dev/null
+++ b/arch/riscv/crypto/chacha-riscv64-zvkb.S
@@ -0,0 +1,294 @@
+/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
+//
+// This file is dual-licensed, meaning that you can use it under your
+// choice of either of the following two licenses:
+//
+// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+//
+// Licensed under the Apache License 2.0 (the "License"). You can obtain
+// a copy in the file LICENSE in the source distribution or at
+// https://www.openssl.org/source/license.html
+//
+// or
+//
+// Copyright (c) 2023, Jerry Shih <[email protected]>
+// Copyright 2024 Google LLC
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+// The generated code of this file depends on the following RISC-V extensions:
+// - RV64I
+// - RISC-V Vector ('V') with VLEN >= 128
+// - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+
+#include <linux/linkage.h>
+
+.text
+.option arch, +zvkb
+
+#define KEYP a0
+#define INP a1
+#define OUTP a2
+#define LEN a3
+#define IVP a4
+
+#define CONSTS0 a5
+#define CONSTS1 a6
+#define CONSTS2 a7
+#define CONSTS3 t0
+#define TMP t1
+#define VL t2
+#define STRIDE t3
+#define NROUNDS t4
+#define KEY0 s0
+#define KEY1 s1
+#define KEY2 s2
+#define KEY3 s3
+#define KEY4 s4
+#define KEY5 s5
+#define KEY6 s6
+#define KEY7 s7
+#define COUNTER s8
+#define NONCE0 s9
+#define NONCE1 s10
+#define NONCE2 s11
+
+.macro chacha_round a0, b0, c0, d0, a1, b1, c1, d1, \
+ a2, b2, c2, d2, a3, b3, c3, d3
+ // a += b; d ^= a; d = rol(d, 16);
+ vadd.vv \a0, \a0, \b0
+ vadd.vv \a1, \a1, \b1
+ vadd.vv \a2, \a2, \b2
+ vadd.vv \a3, \a3, \b3
+ vxor.vv \d0, \d0, \a0
+ vxor.vv \d1, \d1, \a1
+ vxor.vv \d2, \d2, \a2
+ vxor.vv \d3, \d3, \a3
+ vror.vi \d0, \d0, 32 - 16
+ vror.vi \d1, \d1, 32 - 16
+ vror.vi \d2, \d2, 32 - 16
+ vror.vi \d3, \d3, 32 - 16
+
+ // c += d; b ^= c; b = rol(b, 12);
+ vadd.vv \c0, \c0, \d0
+ vadd.vv \c1, \c1, \d1
+ vadd.vv \c2, \c2, \d2
+ vadd.vv \c3, \c3, \d3
+ vxor.vv \b0, \b0, \c0
+ vxor.vv \b1, \b1, \c1
+ vxor.vv \b2, \b2, \c2
+ vxor.vv \b3, \b3, \c3
+ vror.vi \b0, \b0, 32 - 12
+ vror.vi \b1, \b1, 32 - 12
+ vror.vi \b2, \b2, 32 - 12
+ vror.vi \b3, \b3, 32 - 12
+
+ // a += b; d ^= a; d = rol(d, 8);
+ vadd.vv \a0, \a0, \b0
+ vadd.vv \a1, \a1, \b1
+ vadd.vv \a2, \a2, \b2
+ vadd.vv \a3, \a3, \b3
+ vxor.vv \d0, \d0, \a0
+ vxor.vv \d1, \d1, \a1
+ vxor.vv \d2, \d2, \a2
+ vxor.vv \d3, \d3, \a3
+ vror.vi \d0, \d0, 32 - 8
+ vror.vi \d1, \d1, 32 - 8
+ vror.vi \d2, \d2, 32 - 8
+ vror.vi \d3, \d3, 32 - 8
+
+ // c += d; b ^= c; b = rol(b, 7);
+ vadd.vv \c0, \c0, \d0
+ vadd.vv \c1, \c1, \d1
+ vadd.vv \c2, \c2, \d2
+ vadd.vv \c3, \c3, \d3
+ vxor.vv \b0, \b0, \c0
+ vxor.vv \b1, \b1, \c1
+ vxor.vv \b2, \b2, \c2
+ vxor.vv \b3, \b3, \c3
+ vror.vi \b0, \b0, 32 - 7
+ vror.vi \b1, \b1, 32 - 7
+ vror.vi \b2, \b2, 32 - 7
+ vror.vi \b3, \b3, 32 - 7
+.endm
+
+// void chacha20_zvkb(const u32 key[8], const u8 *in, u8 *out, size_t len,
+// const u32 iv[4]);
+//
+// |len| must be nonzero and a multiple of 64 (CHACHA_BLOCK_SIZE).
+// The counter is treated as 32-bit, following the RFC7539 convention.
+SYM_FUNC_START(chacha20_zvkb)
+ srli LEN, LEN, 6 // Bytes to blocks
+
+ addi sp, sp, -96
+ sd s0, 0(sp)
+ sd s1, 8(sp)
+ sd s2, 16(sp)
+ sd s3, 24(sp)
+ sd s4, 32(sp)
+ sd s5, 40(sp)
+ sd s6, 48(sp)
+ sd s7, 56(sp)
+ sd s8, 64(sp)
+ sd s9, 72(sp)
+ sd s10, 80(sp)
+ sd s11, 88(sp)
+
+ li STRIDE, 64
+
+ // Set up the initial state matrix in scalar registers.
+ li CONSTS0, 0x61707865 // "expa" little endian
+ li CONSTS1, 0x3320646e // "nd 3" little endian
+ li CONSTS2, 0x79622d32 // "2-by" little endian
+ li CONSTS3, 0x6b206574 // "te k" little endian
+ lw KEY0, 0(KEYP)
+ lw KEY1, 4(KEYP)
+ lw KEY2, 8(KEYP)
+ lw KEY3, 12(KEYP)
+ lw KEY4, 16(KEYP)
+ lw KEY5, 20(KEYP)
+ lw KEY6, 24(KEYP)
+ lw KEY7, 28(KEYP)
+ lw COUNTER, 0(IVP)
+ lw NONCE0, 4(IVP)
+ lw NONCE1, 8(IVP)
+ lw NONCE2, 12(IVP)
+
+.Lblock_loop:
+ // Set vl to the number of blocks to process in this iteration.
+ vsetvli VL, LEN, e32, m1, ta, ma
+
+ // Set up the initial state matrix for the next VL blocks in v0-v15.
+ // v{i} holds the i'th 32-bit word of the state matrix for all blocks.
+ // Note that only the counter word, at index 12, differs across blocks.
+ vmv.v.x v0, CONSTS0
+ vmv.v.x v1, CONSTS1
+ vmv.v.x v2, CONSTS2
+ vmv.v.x v3, CONSTS3
+ vmv.v.x v4, KEY0
+ vmv.v.x v5, KEY1
+ vmv.v.x v6, KEY2
+ vmv.v.x v7, KEY3
+ vmv.v.x v8, KEY4
+ vmv.v.x v9, KEY5
+ vmv.v.x v10, KEY6
+ vmv.v.x v11, KEY7
+ vid.v v12
+ vadd.vx v12, v12, COUNTER
+ vmv.v.x v13, NONCE0
+ vmv.v.x v14, NONCE1
+ vmv.v.x v15, NONCE2
+
+ // Load the first half of the input data for each block into v16-v23.
+ // v{16+i} holds the i'th 32-bit word for all blocks.
+ vlsseg8e32.v v16, (INP), STRIDE
+
+ li NROUNDS, 20
+.Lround_loop:
+ addi NROUNDS, NROUNDS, -2
+ // column round
+ chacha_round v0, v4, v8, v12, v1, v5, v9, v13, \
+ v2, v6, v10, v14, v3, v7, v11, v15
+ // diagonal round
+ chacha_round v0, v5, v10, v15, v1, v6, v11, v12, \
+ v2, v7, v8, v13, v3, v4, v9, v14
+ bnez NROUNDS, .Lround_loop
+
+ // Load the second half of the input data for each block into v24-v31.
+ // v{24+i} holds the {8+i}'th 32-bit word for all blocks.
+ addi TMP, INP, 32
+ vlsseg8e32.v v24, (TMP), STRIDE
+
+ // Finalize the first half of the keystream for each block.
+ vadd.vx v0, v0, CONSTS0
+ vadd.vx v1, v1, CONSTS1
+ vadd.vx v2, v2, CONSTS2
+ vadd.vx v3, v3, CONSTS3
+ vadd.vx v4, v4, KEY0
+ vadd.vx v5, v5, KEY1
+ vadd.vx v6, v6, KEY2
+ vadd.vx v7, v7, KEY3
+
+ // Encrypt/decrypt the first half of the data for each block.
+ vxor.vv v16, v16, v0
+ vxor.vv v17, v17, v1
+ vxor.vv v18, v18, v2
+ vxor.vv v19, v19, v3
+ vxor.vv v20, v20, v4
+ vxor.vv v21, v21, v5
+ vxor.vv v22, v22, v6
+ vxor.vv v23, v23, v7
+
+ // Store the first half of the output data for each block.
+ vssseg8e32.v v16, (OUTP), STRIDE
+
+ // Finalize the second half of the keystream for each block.
+ vadd.vx v8, v8, KEY4
+ vadd.vx v9, v9, KEY5
+ vadd.vx v10, v10, KEY6
+ vadd.vx v11, v11, KEY7
+ vid.v v0
+ vadd.vx v12, v12, COUNTER
+ vadd.vx v13, v13, NONCE0
+ vadd.vx v14, v14, NONCE1
+ vadd.vx v15, v15, NONCE2
+ vadd.vv v12, v12, v0
+
+ // Encrypt/decrypt the second half of the data for each block.
+ vxor.vv v24, v24, v8
+ vxor.vv v25, v25, v9
+ vxor.vv v26, v26, v10
+ vxor.vv v27, v27, v11
+ vxor.vv v29, v29, v13
+ vxor.vv v28, v28, v12
+ vxor.vv v30, v30, v14
+ vxor.vv v31, v31, v15
+
+ // Store the second half of the output data for each block.
+ addi TMP, OUTP, 32
+ vssseg8e32.v v24, (TMP), STRIDE
+
+ // Update the counter, the remaining number of blocks, and the input and
+ // output pointers according to the number of blocks processed (VL).
+ add COUNTER, COUNTER, VL
+ sub LEN, LEN, VL
+ slli TMP, VL, 6
+ add OUTP, OUTP, TMP
+ add INP, INP, TMP
+ bnez LEN, .Lblock_loop
+
+ ld s0, 0(sp)
+ ld s1, 8(sp)
+ ld s2, 16(sp)
+ ld s3, 24(sp)
+ ld s4, 32(sp)
+ ld s5, 40(sp)
+ ld s6, 48(sp)
+ ld s7, 56(sp)
+ ld s8, 64(sp)
+ ld s9, 72(sp)
+ ld s10, 80(sp)
+ ld s11, 88(sp)
+ addi sp, sp, 96
+ ret
+SYM_FUNC_END(chacha20_zvkb)
--
2.43.0


2024-01-02 06:52:54

by Eric Biggers

[permalink] [raw]
Subject: [RFC PATCH 09/13] crypto: riscv - add vector crypto accelerated GHASH

From: Jerry Shih <[email protected]>

Add an implementation of GHASH using the zvkg extension. The assembly
code is derived from OpenSSL code (openssl/openssl#21923) that was
dual-licensed so that it could be reused in the kernel. Nevertheless,
the assembly has been significantly reworked for integration with the
kernel, for example by using a regular .S file instead of the so-called
perlasm, using the assembler instead of bare '.inst', reducing code
duplication, and eliminating unnecessary endianness conversions.

Co-developed-by: Christoph Müllner <[email protected]>
Signed-off-by: Christoph Müllner <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
Co-developed-by: Eric Biggers <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/riscv/crypto/Kconfig | 10 ++
arch/riscv/crypto/Makefile | 3 +
arch/riscv/crypto/ghash-riscv64-glue.c | 170 +++++++++++++++++++++++++
arch/riscv/crypto/ghash-riscv64-zvkg.S | 72 +++++++++++
4 files changed, 255 insertions(+)
create mode 100644 arch/riscv/crypto/ghash-riscv64-glue.c
create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.S

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index d9a6920df9e99..573818bb3e677 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -37,11 +37,21 @@ config CRYPTO_CHACHA_RISCV64
tristate "Ciphers: ChaCha"
depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
select CRYPTO_SKCIPHER
select CRYPTO_LIB_CHACHA_GENERIC
help
Length-preserving ciphers: ChaCha20 stream cipher algorithm

Architecture: riscv64 using:
- Zvkb vector crypto extension

+config CRYPTO_GHASH_RISCV64
+ tristate "Hash functions: GHASH"
+ depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
+ select CRYPTO_GCM
+ help
+ GCM GHASH function (NIST SP 800-38D)
+
+ Architecture: riscv64 using:
+ - Zvkg vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 7b1e3a3f2041f..d21d3a3fc157c 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -4,10 +4,13 @@
#

obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o

obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o
aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvkned-zvbb-zvkg.o aes-riscv64-zvkned-zvkb.o

obj-$(CONFIG_CRYPTO_CHACHA_RISCV64) += chacha-riscv64.o
chacha-riscv64-y := chacha-riscv64-glue.o chacha-riscv64-zvkb.o
+
+obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
+ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o
diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c
new file mode 100644
index 0000000000000..d20c929771f05
--- /dev/null
+++ b/arch/riscv/crypto/ghash-riscv64-glue.c
@@ -0,0 +1,170 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * GHASH using the RISC-V vector crypto extensions
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/ghash.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <linux/linkage.h>
+#include <linux/module.h>
+
+asmlinkage void ghash_zvkg(be128 *accumulator, const be128 *key,
+ const u8 *data, size_t len);
+
+struct riscv64_ghash_tfm_ctx {
+ be128 key;
+};
+
+struct riscv64_ghash_desc_ctx {
+ be128 shash;
+ u8 buffer[GHASH_BLOCK_SIZE];
+ u32 bytes;
+};
+
+static int riscv64_ghash_setkey(struct crypto_shash *tfm, const u8 *key,
+ unsigned int keylen)
+{
+ struct riscv64_ghash_tfm_ctx *tctx = crypto_shash_ctx(tfm);
+
+ if (keylen != GHASH_BLOCK_SIZE)
+ return -EINVAL;
+
+ memcpy(&tctx->key, key, GHASH_BLOCK_SIZE);
+
+ return 0;
+}
+
+static int riscv64_ghash_init(struct shash_desc *desc)
+{
+ struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+
+ *dctx = (struct riscv64_ghash_desc_ctx){};
+
+ return 0;
+}
+
+static inline void
+riscv64_ghash_blocks(const struct riscv64_ghash_tfm_ctx *tctx,
+ struct riscv64_ghash_desc_ctx *dctx,
+ const u8 *src, size_t srclen)
+{
+ /* The srclen is nonzero and a multiple of 16. */
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ ghash_zvkg(&dctx->shash, &tctx->key, src, srclen);
+ kernel_vector_end();
+ } else {
+ do {
+ crypto_xor((u8 *)&dctx->shash, src, GHASH_BLOCK_SIZE);
+ gf128mul_lle(&dctx->shash, &tctx->key);
+ srclen -= GHASH_BLOCK_SIZE;
+ src += GHASH_BLOCK_SIZE;
+ } while (srclen);
+ }
+}
+
+static int riscv64_ghash_update(struct shash_desc *desc, const u8 *src,
+ unsigned int srclen)
+{
+ size_t len;
+ const struct riscv64_ghash_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
+ struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+
+ if (dctx->bytes) {
+ if (dctx->bytes + srclen < GHASH_BLOCK_SIZE) {
+ memcpy(dctx->buffer + dctx->bytes, src, srclen);
+ dctx->bytes += srclen;
+ return 0;
+ }
+ memcpy(dctx->buffer + dctx->bytes, src,
+ GHASH_BLOCK_SIZE - dctx->bytes);
+
+ riscv64_ghash_blocks(tctx, dctx, dctx->buffer,
+ GHASH_BLOCK_SIZE);
+
+ src += GHASH_BLOCK_SIZE - dctx->bytes;
+ srclen -= GHASH_BLOCK_SIZE - dctx->bytes;
+ dctx->bytes = 0;
+ }
+ len = srclen & ~(GHASH_BLOCK_SIZE - 1);
+
+ if (len) {
+ riscv64_ghash_blocks(tctx, dctx, src, len);
+ src += len;
+ srclen -= len;
+ }
+
+ if (srclen) {
+ memcpy(dctx->buffer, src, srclen);
+ dctx->bytes = srclen;
+ }
+
+ return 0;
+}
+
+static int riscv64_ghash_final(struct shash_desc *desc, u8 *out)
+{
+ const struct riscv64_ghash_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
+ struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+ int i;
+
+ if (dctx->bytes) {
+ for (i = dctx->bytes; i < GHASH_BLOCK_SIZE; i++)
+ dctx->buffer[i] = 0;
+
+ riscv64_ghash_blocks(tctx, dctx, dctx->buffer,
+ GHASH_BLOCK_SIZE);
+ }
+
+ memcpy(out, &dctx->shash, GHASH_DIGEST_SIZE);
+
+ return 0;
+}
+
+static struct shash_alg riscv64_ghash_alg = {
+ .init = riscv64_ghash_init,
+ .update = riscv64_ghash_update,
+ .final = riscv64_ghash_final,
+ .setkey = riscv64_ghash_setkey,
+ .descsize = sizeof(struct riscv64_ghash_desc_ctx),
+ .digestsize = GHASH_DIGEST_SIZE,
+ .base = {
+ .cra_blocksize = GHASH_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct riscv64_ghash_tfm_ctx),
+ .cra_priority = 300,
+ .cra_name = "ghash",
+ .cra_driver_name = "ghash-riscv64-zvkg",
+ .cra_module = THIS_MODULE,
+ },
+};
+
+static int __init riscv64_ghash_mod_init(void)
+{
+ if (riscv_isa_extension_available(NULL, ZVKG) &&
+ riscv_vector_vlen() >= 128)
+ return crypto_register_shash(&riscv64_ghash_alg);
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_ghash_mod_fini(void)
+{
+ crypto_unregister_shash(&riscv64_ghash_alg);
+}
+
+module_init(riscv64_ghash_mod_init);
+module_exit(riscv64_ghash_mod_fini);
+
+MODULE_DESCRIPTION("GHASH (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("ghash");
diff --git a/arch/riscv/crypto/ghash-riscv64-zvkg.S b/arch/riscv/crypto/ghash-riscv64-zvkg.S
new file mode 100644
index 0000000000000..7d406ea743220
--- /dev/null
+++ b/arch/riscv/crypto/ghash-riscv64-zvkg.S
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
+//
+// This file is dual-licensed, meaning that you can use it under your
+// choice of either of the following two licenses:
+//
+// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+//
+// Licensed under the Apache License 2.0 (the "License"). You can obtain
+// a copy in the file LICENSE in the source distribution or at
+// https://www.openssl.org/source/license.html
+//
+// or
+//
+// Copyright (c) 2023, Christoph Müllner <[email protected]>
+// Copyright (c) 2023, Jerry Shih <[email protected]>
+// Copyright 2024 Google LLC
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+// The generated code of this file depends on the following RISC-V extensions:
+// - RV64I
+// - RISC-V Vector ('V') with VLEN >= 128
+// - RISC-V Vector GCM/GMAC extension ('Zvkg')
+
+#include <linux/linkage.h>
+
+.text
+.option arch, +zvkg
+
+#define ACCUMULATOR a0
+#define KEY a1
+#define DATA a2
+#define LEN a3
+
+// void ghash_zvkg(be128 *accumulator, const be128 *key, const u8 *data,
+// size_t len);
+//
+// |len| must be nonzero and a multiple of 16 (GHASH_BLOCK_SIZE).
+SYM_FUNC_START(ghash_zvkg)
+ vsetivli zero, 4, e32, m1, ta, ma
+ vle32.v v1, (ACCUMULATOR)
+ vle32.v v2, (KEY)
+.Lnext_block:
+ vle32.v v3, (DATA)
+ addi DATA, DATA, 16
+ addi LEN, LEN, -16
+ vghsh.vv v1, v2, v3
+ bnez LEN, .Lnext_block
+
+ vse32.v v1, (ACCUMULATOR)
+ ret
+SYM_FUNC_END(ghash_zvkg)
--
2.43.0


2024-01-02 06:53:10

by Eric Biggers

[permalink] [raw]
Subject: [RFC PATCH 10/13] crypto: riscv - add vector crypto accelerated SHA-{256,224}

From: Jerry Shih <[email protected]>

Add an implementation of SHA-224 and SHA-256 using the Zvknha or Zvknhb
extension. The assembly code is derived from OpenSSL code
(openssl/openssl#21923) that was dual-licensed so that it could be
reused in the kernel. Nevertheless, the assembly has been significantly
reworked for integration with the kernel, for example by using a regular
.S file instead of the so-called perlasm, using the assembler instead of
bare '.inst', and greatly reducing code duplication.

Co-developed-by: Charalampos Mitrodimas <[email protected]>
Signed-off-by: Charalampos Mitrodimas <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Co-developed-by: Phoebe Chen <[email protected]>
Signed-off-by: Phoebe Chen <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
Co-developed-by: Eric Biggers <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/riscv/crypto/Kconfig | 11 +
arch/riscv/crypto/Makefile | 3 +
arch/riscv/crypto/sha256-riscv64-glue.c | 137 +++++++++++
.../sha256-riscv64-zvknha_or_zvknhb-zvkb.S | 225 ++++++++++++++++++
4 files changed, 376 insertions(+)
create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.S

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 573818bb3e677..533bc6def123a 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -47,11 +47,22 @@ config CRYPTO_CHACHA_RISCV64
config CRYPTO_GHASH_RISCV64
tristate "Hash functions: GHASH"
depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
select CRYPTO_GCM
help
GCM GHASH function (NIST SP 800-38D)

Architecture: riscv64 using:
- Zvkg vector crypto extension

+config CRYPTO_SHA256_RISCV64
+ tristate "Hash functions: SHA-224 and SHA-256"
+ depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
+ select CRYPTO_SHA256
+ help
+ SHA-224 and SHA-256 secure hash algorithm (FIPS 180)
+
+ Architecture: riscv64 using:
+ - Zvknha or Zvknhb vector crypto extensions
+ - Zvkb vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index d21d3a3fc157c..28a58e89927ae 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -7,10 +7,13 @@ obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o

obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o
aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvkned-zvbb-zvkg.o aes-riscv64-zvkned-zvkb.o

obj-$(CONFIG_CRYPTO_CHACHA_RISCV64) += chacha-riscv64.o
chacha-riscv64-y := chacha-riscv64-glue.o chacha-riscv64-zvkb.o

obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o
+
+obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
+sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknha_or_zvknhb-zvkb.o
diff --git a/arch/riscv/crypto/sha256-riscv64-glue.c b/arch/riscv/crypto/sha256-riscv64-glue.c
new file mode 100644
index 0000000000000..5c3fa3d0174a4
--- /dev/null
+++ b/arch/riscv/crypto/sha256-riscv64-glue.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * SHA-256 and SHA-224 using the RISC-V vector crypto extensions
+ *
+ * Copyright (C) 2022 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sha256_base.h>
+#include <linux/linkage.h>
+#include <linux/module.h>
+
+/*
+ * Note: the asm function only uses the 'state' field of struct sha256_state.
+ * It is assumed to be the first field.
+ */
+asmlinkage void sha256_transform_zvknha_or_zvknhb_zvkb(
+ struct sha256_state *state, const u8 *data, int num_blocks);
+
+static int riscv64_sha256_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ /*
+ * Ensure struct sha256_state begins directly with the SHA-256
+ * 256-bit internal state, as this is what the asm function expects.
+ */
+ BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ sha256_base_do_update(desc, data, len,
+ sha256_transform_zvknha_or_zvknhb_zvkb);
+ kernel_vector_end();
+ } else {
+ crypto_sha256_update(desc, data, len);
+ }
+ return 0;
+}
+
+static int riscv64_sha256_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ if (len)
+ sha256_base_do_update(
+ desc, data, len,
+ sha256_transform_zvknha_or_zvknhb_zvkb);
+ sha256_base_do_finalize(
+ desc, sha256_transform_zvknha_or_zvknhb_zvkb);
+ kernel_vector_end();
+
+ return sha256_base_finish(desc, out);
+ }
+
+ return crypto_sha256_finup(desc, data, len, out);
+}
+
+static int riscv64_sha256_final(struct shash_desc *desc, u8 *out)
+{
+ return riscv64_sha256_finup(desc, NULL, 0, out);
+}
+
+static int riscv64_sha256_digest(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ return sha256_base_init(desc) ?:
+ riscv64_sha256_finup(desc, data, len, out);
+}
+
+static struct shash_alg riscv64_sha256_algs[] = {
+ {
+ .init = sha256_base_init,
+ .update = riscv64_sha256_update,
+ .final = riscv64_sha256_final,
+ .finup = riscv64_sha256_finup,
+ .digest = riscv64_sha256_digest,
+ .descsize = sizeof(struct sha256_state),
+ .digestsize = SHA256_DIGEST_SIZE,
+ .base = {
+ .cra_blocksize = SHA256_BLOCK_SIZE,
+ .cra_priority = 300,
+ .cra_name = "sha256",
+ .cra_driver_name = "sha256-riscv64-zvknha_or_zvknhb-zvkb",
+ .cra_module = THIS_MODULE,
+ },
+ }, {
+ .init = sha224_base_init,
+ .update = riscv64_sha256_update,
+ .final = riscv64_sha256_final,
+ .finup = riscv64_sha256_finup,
+ .descsize = sizeof(struct sha256_state),
+ .digestsize = SHA224_DIGEST_SIZE,
+ .base = {
+ .cra_blocksize = SHA224_BLOCK_SIZE,
+ .cra_priority = 300,
+ .cra_name = "sha224",
+ .cra_driver_name = "sha224-riscv64-zvknha_or_zvknhb-zvkb",
+ .cra_module = THIS_MODULE,
+ },
+ },
+};
+
+static int __init riscv64_sha256_mod_init(void)
+{
+ /* Both zvknha and zvknhb provide the SHA-256 instructions. */
+ if ((riscv_isa_extension_available(NULL, ZVKNHA) ||
+ riscv_isa_extension_available(NULL, ZVKNHB)) &&
+ riscv_isa_extension_available(NULL, ZVKB) &&
+ riscv_vector_vlen() >= 128)
+ return crypto_register_shashes(riscv64_sha256_algs,
+ ARRAY_SIZE(riscv64_sha256_algs));
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_sha256_mod_fini(void)
+{
+ crypto_unregister_shashes(riscv64_sha256_algs,
+ ARRAY_SIZE(riscv64_sha256_algs));
+}
+
+module_init(riscv64_sha256_mod_init);
+module_exit(riscv64_sha256_mod_fini);
+
+MODULE_DESCRIPTION("SHA-256 (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sha256");
+MODULE_ALIAS_CRYPTO("sha224");
diff --git a/arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.S b/arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.S
new file mode 100644
index 0000000000000..faa7ee2b129b1
--- /dev/null
+++ b/arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.S
@@ -0,0 +1,225 @@
+/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
+//
+// This file is dual-licensed, meaning that you can use it under your
+// choice of either of the following two licenses:
+//
+// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+//
+// Licensed under the Apache License 2.0 (the "License"). You can obtain
+// a copy in the file LICENSE in the source distribution or at
+// https://www.openssl.org/source/license.html
+//
+// or
+//
+// Copyright (c) 2023, Christoph Müllner <[email protected]>
+// Copyright (c) 2023, Phoebe Chen <[email protected]>
+// Copyright 2024 Google LLC
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+// The generated code of this file depends on the following RISC-V extensions:
+// - RV64I
+// - RISC-V Vector ('V') with VLEN >= 128
+// - RISC-V Vector SHA-2 Secure Hash extension ('Zvknha' or 'Zvknhb')
+// - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+
+#include <linux/cfi_types.h>
+
+.text
+.option arch, +zvknha, +zvkb
+
+#define STATEP a0
+#define DATA a1
+#define NUM_BLOCKS a2
+
+#define STATEP_C a3
+
+#define MASK v0
+#define INDICES v1
+#define W0 v2
+#define W1 v3
+#define W2 v4
+#define W3 v5
+#define VTMP v6
+#define FEBA v7
+#define HGDC v8
+#define K0 v10
+#define K1 v11
+#define K2 v12
+#define K3 v13
+#define K4 v14
+#define K5 v15
+#define K6 v16
+#define K7 v17
+#define K8 v18
+#define K9 v19
+#define K10 v20
+#define K11 v21
+#define K12 v22
+#define K13 v23
+#define K14 v24
+#define K15 v25
+#define PREV_FEBA v30
+#define PREV_HGDC v31
+
+// Do 4 rounds of SHA-256. w0 contains the current 4 message schedule words.
+//
+// If not all the message schedule words have been computed yet, then this also
+// computes 4 more message schedule words. w1-w3 contain the next 3 groups of 4
+// message schedule words; this macro computes the group after w3 and writes it
+// to w0. This means that the next (w0, w1, w2, w3) is the current (w1, w2, w3,
+// w0), so the caller must cycle through the registers accordingly.
+.macro sha256_4rounds last, k, w0, w1, w2, w3
+ vadd.vv VTMP, \k, \w0
+ vsha2cl.vv HGDC, FEBA, VTMP
+ vsha2ch.vv FEBA, HGDC, VTMP
+.if !\last
+ vmerge.vvm VTMP, \w2, \w1, MASK
+ vsha2ms.vv \w0, VTMP, \w3
+.endif
+.endm
+
+.macro sha256_16rounds last, k0, k1, k2, k3
+ sha256_4rounds \last, \k0, W0, W1, W2, W3
+ sha256_4rounds \last, \k1, W1, W2, W3, W0
+ sha256_4rounds \last, \k2, W2, W3, W0, W1
+ sha256_4rounds \last, \k3, W3, W0, W1, W2
+.endm
+
+// void sha256_transform_zvknha_or_zvknhb_zvkb(u32 state[8], const u8 *data,
+// int num_blocks);
+SYM_TYPED_FUNC_START(sha256_transform_zvknha_or_zvknhb_zvkb)
+
+ // Load the round constants into K0-K15.
+ vsetivli zero, 4, e32, m1, ta, ma
+ la t0, K256
+ vle32.v K0, (t0)
+ addi t0, t0, 16
+ vle32.v K1, (t0)
+ addi t0, t0, 16
+ vle32.v K2, (t0)
+ addi t0, t0, 16
+ vle32.v K3, (t0)
+ addi t0, t0, 16
+ vle32.v K4, (t0)
+ addi t0, t0, 16
+ vle32.v K5, (t0)
+ addi t0, t0, 16
+ vle32.v K6, (t0)
+ addi t0, t0, 16
+ vle32.v K7, (t0)
+ addi t0, t0, 16
+ vle32.v K8, (t0)
+ addi t0, t0, 16
+ vle32.v K9, (t0)
+ addi t0, t0, 16
+ vle32.v K10, (t0)
+ addi t0, t0, 16
+ vle32.v K11, (t0)
+ addi t0, t0, 16
+ vle32.v K12, (t0)
+ addi t0, t0, 16
+ vle32.v K13, (t0)
+ addi t0, t0, 16
+ vle32.v K14, (t0)
+ addi t0, t0, 16
+ vle32.v K15, (t0)
+
+ // Setup mask for the vmerge to replace the first word (idx==0) in
+ // message scheduling. There are 4 words, so an 8-bit mask suffices.
+ vsetivli zero, 1, e8, m1, ta, ma
+ vmv.v.i MASK, 0x01
+
+ // Load the state. The state is stored as {a,b,c,d,e,f,g,h}, but we
+ // need {f,e,b,a},{h,g,d,c}. The dst vtype is e32m1 and the index vtype
+ // is e8mf4. We use index-load with the i8 indices {20, 16, 4, 0},
+ // loaded using the 32-bit little endian value 0x00041014.
+ li t0, 0x00041014
+ vsetivli zero, 1, e32, m1, ta, ma
+ vmv.v.x INDICES, t0
+ addi STATEP_C, STATEP, 8
+ vsetivli zero, 4, e32, m1, ta, ma
+ vluxei8.v FEBA, (STATEP), INDICES
+ vluxei8.v HGDC, (STATEP_C), INDICES
+
+.Lnext_block:
+ addi NUM_BLOCKS, NUM_BLOCKS, -1
+
+ // Save the previous state, as it's needed later.
+ vmv.v.v PREV_FEBA, FEBA
+ vmv.v.v PREV_HGDC, HGDC
+
+ // Load the next 512-bit message block and endian-swap each 32-bit word.
+ vle32.v W0, (DATA)
+ vrev8.v W0, W0
+ addi DATA, DATA, 16
+ vle32.v W1, (DATA)
+ vrev8.v W1, W1
+ addi DATA, DATA, 16
+ vle32.v W2, (DATA)
+ vrev8.v W2, W2
+ addi DATA, DATA, 16
+ vle32.v W3, (DATA)
+ vrev8.v W3, W3
+ addi DATA, DATA, 16
+
+ // Do the 64 rounds of SHA-256.
+ sha256_16rounds 0, K0, K1, K2, K3
+ sha256_16rounds 0, K4, K5, K6, K7
+ sha256_16rounds 0, K8, K9, K10, K11
+ sha256_16rounds 1, K12, K13, K14, K15
+
+ // Add the previous state.
+ vadd.vv FEBA, FEBA, PREV_FEBA
+ vadd.vv HGDC, HGDC, PREV_HGDC
+
+ // Repeat if more blocks remain.
+ bnez NUM_BLOCKS, .Lnext_block
+
+ // Store the new state and return.
+ vsuxei8.v FEBA, (STATEP), INDICES
+ vsuxei8.v HGDC, (STATEP_C), INDICES
+ ret
+SYM_FUNC_END(sha256_transform_zvknha_or_zvknhb_zvkb)
+
+.section ".rodata"
+.p2align 2
+.type K256, @object
+K256:
+ .word 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5
+ .word 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5
+ .word 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3
+ .word 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174
+ .word 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc
+ .word 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da
+ .word 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7
+ .word 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967
+ .word 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13
+ .word 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85
+ .word 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3
+ .word 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070
+ .word 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5
+ .word 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3
+ .word 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208
+ .word 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
+.size K256, . - K256
--
2.43.0


2024-01-02 06:53:27

by Eric Biggers

[permalink] [raw]
Subject: [RFC PATCH 11/13] crypto: riscv - add vector crypto accelerated SHA-{512,384}

From: Jerry Shih <[email protected]>

Add an implementation of SHA-384 and SHA-512 using the Zvknhb extension.
The assembly code is derived from OpenSSL code (openssl/openssl#21923)
that was dual-licensed so that it could be reused in the kernel.
Nevertheless, the assembly has been significantly reworked for
integration with the kernel, for example by using a regular .S file
instead of the so-called perlasm, using the assembler instead of bare
'.inst', and greatly reducing code duplication.

Co-developed-by: Charalampos Mitrodimas <[email protected]>
Signed-off-by: Charalampos Mitrodimas <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Co-developed-by: Phoebe Chen <[email protected]>
Signed-off-by: Phoebe Chen <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
Co-developed-by: Eric Biggers <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/riscv/crypto/Kconfig | 11 +
arch/riscv/crypto/Makefile | 3 +
arch/riscv/crypto/sha512-riscv64-glue.c | 133 ++++++++++++
.../riscv/crypto/sha512-riscv64-zvknhb-zvkb.S | 203 ++++++++++++++++++
4 files changed, 350 insertions(+)
create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.S

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 533bc6def123a..ca13895c3e0f6 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -58,11 +58,22 @@ config CRYPTO_SHA256_RISCV64
tristate "Hash functions: SHA-224 and SHA-256"
depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
select CRYPTO_SHA256
help
SHA-224 and SHA-256 secure hash algorithm (FIPS 180)

Architecture: riscv64 using:
- Zvknha or Zvknhb vector crypto extensions
- Zvkb vector crypto extension

+config CRYPTO_SHA512_RISCV64
+ tristate "Hash functions: SHA-384 and SHA-512"
+ depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
+ select CRYPTO_SHA512
+ help
+ SHA-384 and SHA-512 secure hash algorithm (FIPS 180)
+
+ Architecture: riscv64 using:
+ - Zvknhb vector crypto extension
+ - Zvkb vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 28a58e89927ae..e30a1bcc788bf 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -10,10 +10,13 @@ obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o
aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvkned-zvbb-zvkg.o aes-riscv64-zvkned-zvkb.o

obj-$(CONFIG_CRYPTO_CHACHA_RISCV64) += chacha-riscv64.o
chacha-riscv64-y := chacha-riscv64-glue.o chacha-riscv64-zvkb.o

obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o

obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknha_or_zvknhb-zvkb.o
+
+obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
+sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvknhb-zvkb.o
diff --git a/arch/riscv/crypto/sha512-riscv64-glue.c b/arch/riscv/crypto/sha512-riscv64-glue.c
new file mode 100644
index 0000000000000..f30f723742cbe
--- /dev/null
+++ b/arch/riscv/crypto/sha512-riscv64-glue.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * SHA-512 and SHA-384 using the RISC-V vector crypto extensions
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sha512_base.h>
+#include <linux/linkage.h>
+#include <linux/module.h>
+
+/*
+ * Note: the asm function only uses the 'state' field of struct sha512_state.
+ * It is assumed to be the first field.
+ */
+asmlinkage void sha512_transform_zvknhb_zvkb(
+ struct sha512_state *state, const u8 *data, int num_blocks);
+
+static int riscv64_sha512_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ /*
+ * Ensure struct sha512_state begins directly with the SHA-512
+ * 512-bit internal state, as this is what the asm function expects.
+ */
+ BUILD_BUG_ON(offsetof(struct sha512_state, state) != 0);
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ sha512_base_do_update(desc, data, len,
+ sha512_transform_zvknhb_zvkb);
+ kernel_vector_end();
+ } else {
+ crypto_sha512_update(desc, data, len);
+ }
+ return 0;
+}
+
+static int riscv64_sha512_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ if (len)
+ sha512_base_do_update(desc, data, len,
+ sha512_transform_zvknhb_zvkb);
+ sha512_base_do_finalize(desc, sha512_transform_zvknhb_zvkb);
+ kernel_vector_end();
+
+ return sha512_base_finish(desc, out);
+ }
+
+ return crypto_sha512_finup(desc, data, len, out);
+}
+
+static int riscv64_sha512_final(struct shash_desc *desc, u8 *out)
+{
+ return riscv64_sha512_finup(desc, NULL, 0, out);
+}
+
+static int riscv64_sha512_digest(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ return sha512_base_init(desc) ?:
+ riscv64_sha512_finup(desc, data, len, out);
+}
+
+static struct shash_alg riscv64_sha512_algs[] = {
+ {
+ .init = sha512_base_init,
+ .update = riscv64_sha512_update,
+ .final = riscv64_sha512_final,
+ .finup = riscv64_sha512_finup,
+ .digest = riscv64_sha512_digest,
+ .descsize = sizeof(struct sha512_state),
+ .digestsize = SHA512_DIGEST_SIZE,
+ .base = {
+ .cra_blocksize = SHA512_BLOCK_SIZE,
+ .cra_priority = 300,
+ .cra_name = "sha512",
+ .cra_driver_name = "sha512-riscv64-zvknhb-zvkb",
+ .cra_module = THIS_MODULE,
+ },
+ }, {
+ .init = sha384_base_init,
+ .update = riscv64_sha512_update,
+ .final = riscv64_sha512_final,
+ .finup = riscv64_sha512_finup,
+ .descsize = sizeof(struct sha512_state),
+ .digestsize = SHA384_DIGEST_SIZE,
+ .base = {
+ .cra_blocksize = SHA384_BLOCK_SIZE,
+ .cra_priority = 300,
+ .cra_name = "sha384",
+ .cra_driver_name = "sha384-riscv64-zvknhb-zvkb",
+ .cra_module = THIS_MODULE,
+ },
+ },
+};
+
+static int __init riscv64_sha512_mod_init(void)
+{
+ if (riscv_isa_extension_available(NULL, ZVKNHB) &&
+ riscv_isa_extension_available(NULL, ZVKB) &&
+ riscv_vector_vlen() >= 128)
+ return crypto_register_shashes(riscv64_sha512_algs,
+ ARRAY_SIZE(riscv64_sha512_algs));
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_sha512_mod_fini(void)
+{
+ crypto_unregister_shashes(riscv64_sha512_algs,
+ ARRAY_SIZE(riscv64_sha512_algs));
+}
+
+module_init(riscv64_sha512_mod_init);
+module_exit(riscv64_sha512_mod_fini);
+
+MODULE_DESCRIPTION("SHA-512 (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sha512");
+MODULE_ALIAS_CRYPTO("sha384");
diff --git a/arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.S b/arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.S
new file mode 100644
index 0000000000000..3a9ae210f9158
--- /dev/null
+++ b/arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.S
@@ -0,0 +1,203 @@
+/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
+//
+// This file is dual-licensed, meaning that you can use it under your
+// choice of either of the following two licenses:
+//
+// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+//
+// Licensed under the Apache License 2.0 (the "License"). You can obtain
+// a copy in the file LICENSE in the source distribution or at
+// https://www.openssl.org/source/license.html
+//
+// or
+//
+// Copyright (c) 2023, Christoph Müllner <[email protected]>
+// Copyright (c) 2023, Phoebe Chen <[email protected]>
+// Copyright 2024 Google LLC
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+// The generated code of this file depends on the following RISC-V extensions:
+// - RV64I
+// - RISC-V Vector ('V') with VLEN >= 128
+// - RISC-V Vector SHA-2 Secure Hash extension ('Zvknhb')
+// - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+
+#include <linux/cfi_types.h>
+
+.text
+.option arch, +zvknhb, +zvkb
+
+#define STATEP a0
+#define DATA a1
+#define NUM_BLOCKS a2
+
+#define STATEP_C a3
+#define K a4
+
+#define MASK v0
+#define INDICES v1
+#define W0 v10 // LMUL=2
+#define W1 v12 // LMUL=2
+#define W2 v14 // LMUL=2
+#define W3 v16 // LMUL=2
+#define VTMP v20 // LMUL=2
+#define FEBA v22 // LMUL=2
+#define HGDC v24 // LMUL=2
+#define PREV_FEBA v26 // LMUL=2
+#define PREV_HGDC v28 // LMUL=2
+
+// Do 4 rounds of SHA-512. w0 contains the current 4 message schedule words.
+//
+// If not all the message schedule words have been computed yet, then this also
+// computes 4 more message schedule words. w1-w3 contain the next 3 groups of 4
+// message schedule words; this macro computes the group after w3 and writes it
+// to w0. This means that the next (w0, w1, w2, w3) is the current (w1, w2, w3,
+// w0), so the caller must cycle through the registers accordingly.
+.macro sha512_4rounds last, w0, w1, w2, w3
+ vle64.v VTMP, (K)
+ addi K, K, 32
+ vadd.vv VTMP, VTMP, \w0
+ vsha2cl.vv HGDC, FEBA, VTMP
+ vsha2ch.vv FEBA, HGDC, VTMP
+.if !\last
+ vmerge.vvm VTMP, \w2, \w1, MASK
+ vsha2ms.vv \w0, VTMP, \w3
+.endif
+.endm
+
+.macro sha512_16rounds last
+ sha512_4rounds \last, W0, W1, W2, W3
+ sha512_4rounds \last, W1, W2, W3, W0
+ sha512_4rounds \last, W2, W3, W0, W1
+ sha512_4rounds \last, W3, W0, W1, W2
+.endm
+
+// void sha512_transform_zvknhb_zvkb(u64 state[8], const u8 *data,
+// int num_blocks);
+SYM_TYPED_FUNC_START(sha512_transform_zvknhb_zvkb)
+
+ // Setup mask for the vmerge to replace the first word (idx==0) in
+ // message scheduling. There are 4 words, so an 8-bit mask suffices.
+ vsetivli zero, 1, e8, m1, ta, ma
+ vmv.v.i MASK, 0x01
+
+ // Load the state. The state is stored as {a,b,c,d,e,f,g,h}, but we
+ // need {f,e,b,a},{h,g,d,c}. The dst vtype is e64m2 and the index vtype
+ // is e8mf4. We use index-load with the i8 indices {40, 32, 8, 0},
+ // loaded using the 32-bit little endian value 0x00082028.
+ li t0, 0x00082028
+ vsetivli zero, 1, e32, m1, ta, ma
+ vmv.v.x INDICES, t0
+ addi STATEP_C, STATEP, 16
+ vsetivli zero, 4, e64, m2, ta, ma
+ vluxei8.v FEBA, (STATEP), INDICES
+ vluxei8.v HGDC, (STATEP_C), INDICES
+
+.Lnext_block:
+ la K, K512
+ addi NUM_BLOCKS, NUM_BLOCKS, -1
+
+ // Save the previous state, as it's needed later.
+ vmv.v.v PREV_FEBA, FEBA
+ vmv.v.v PREV_HGDC, HGDC
+
+ // Load the next 1024-bit message block and endian-swap each 64-bit word
+ vle64.v W0, (DATA)
+ vrev8.v W0, W0
+ addi DATA, DATA, 32
+ vle64.v W1, (DATA)
+ vrev8.v W1, W1
+ addi DATA, DATA, 32
+ vle64.v W2, (DATA)
+ vrev8.v W2, W2
+ addi DATA, DATA, 32
+ vle64.v W3, (DATA)
+ vrev8.v W3, W3
+ addi DATA, DATA, 32
+
+ // Do the 80 rounds of SHA-512.
+ sha512_16rounds 0
+ sha512_16rounds 0
+ sha512_16rounds 0
+ sha512_16rounds 0
+ sha512_16rounds 1
+
+ // Add the previous state.
+ vadd.vv FEBA, FEBA, PREV_FEBA
+ vadd.vv HGDC, HGDC, PREV_HGDC
+
+ // Repeat if more blocks remain.
+ bnez NUM_BLOCKS, .Lnext_block
+
+ // Store the new state and return.
+ vsuxei8.v FEBA, (STATEP), INDICES
+ vsuxei8.v HGDC, (STATEP_C), INDICES
+ ret
+SYM_FUNC_END(sha512_transform_zvknhb_zvkb)
+
+.section ".rodata"
+.p2align 3
+.type K512, @object
+K512:
+ .dword 0x428a2f98d728ae22, 0x7137449123ef65cd
+ .dword 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc
+ .dword 0x3956c25bf348b538, 0x59f111f1b605d019
+ .dword 0x923f82a4af194f9b, 0xab1c5ed5da6d8118
+ .dword 0xd807aa98a3030242, 0x12835b0145706fbe
+ .dword 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2
+ .dword 0x72be5d74f27b896f, 0x80deb1fe3b1696b1
+ .dword 0x9bdc06a725c71235, 0xc19bf174cf692694
+ .dword 0xe49b69c19ef14ad2, 0xefbe4786384f25e3
+ .dword 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65
+ .dword 0x2de92c6f592b0275, 0x4a7484aa6ea6e483
+ .dword 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5
+ .dword 0x983e5152ee66dfab, 0xa831c66d2db43210
+ .dword 0xb00327c898fb213f, 0xbf597fc7beef0ee4
+ .dword 0xc6e00bf33da88fc2, 0xd5a79147930aa725
+ .dword 0x06ca6351e003826f, 0x142929670a0e6e70
+ .dword 0x27b70a8546d22ffc, 0x2e1b21385c26c926
+ .dword 0x4d2c6dfc5ac42aed, 0x53380d139d95b3df
+ .dword 0x650a73548baf63de, 0x766a0abb3c77b2a8
+ .dword 0x81c2c92e47edaee6, 0x92722c851482353b
+ .dword 0xa2bfe8a14cf10364, 0xa81a664bbc423001
+ .dword 0xc24b8b70d0f89791, 0xc76c51a30654be30
+ .dword 0xd192e819d6ef5218, 0xd69906245565a910
+ .dword 0xf40e35855771202a, 0x106aa07032bbd1b8
+ .dword 0x19a4c116b8d2d0c8, 0x1e376c085141ab53
+ .dword 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8
+ .dword 0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb
+ .dword 0x5b9cca4f7763e373, 0x682e6ff3d6b2b8a3
+ .dword 0x748f82ee5defb2fc, 0x78a5636f43172f60
+ .dword 0x84c87814a1f0ab72, 0x8cc702081a6439ec
+ .dword 0x90befffa23631e28, 0xa4506cebde82bde9
+ .dword 0xbef9a3f7b2c67915, 0xc67178f2e372532b
+ .dword 0xca273eceea26619c, 0xd186b8c721c0c207
+ .dword 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178
+ .dword 0x06f067aa72176fba, 0x0a637dc5a2c898a6
+ .dword 0x113f9804bef90dae, 0x1b710b35131c471b
+ .dword 0x28db77f523047d84, 0x32caab7b40c72493
+ .dword 0x3c9ebe0a15c9bebc, 0x431d67c49c100d4c
+ .dword 0x4cc5d4becb3e42b6, 0x597f299cfc657e2a
+ .dword 0x5fcb6fab3ad6faec, 0x6c44198c4a475817
+.size K512, . - K512
--
2.43.0


2024-01-02 06:53:43

by Eric Biggers

[permalink] [raw]
Subject: [RFC PATCH 12/13] crypto: riscv - add vector crypto accelerated SM3

From: Jerry Shih <[email protected]>

Add an implementation of SM3 using the Zvksh extension. The assembly
code is derived from OpenSSL code (openssl/openssl#21923) that was
dual-licensed so that it could be reused in the kernel. Nevertheless,
the assembly has been significantly reworked for integration with the
kernel, for example by using a regular .S file instead of the so-called
perlasm, using the assembler instead of bare '.inst', and greatly
reducing code duplication.

Co-developed-by: Christoph Müllner <[email protected]>
Signed-off-by: Christoph Müllner <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
Co-developed-by: Eric Biggers <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/riscv/crypto/Kconfig | 12 ++
arch/riscv/crypto/Makefile | 3 +
arch/riscv/crypto/sm3-riscv64-glue.c | 112 +++++++++++++++++++
arch/riscv/crypto/sm3-riscv64-zvksh-zvkb.S | 123 +++++++++++++++++++++
4 files changed, 250 insertions(+)
create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh-zvkb.S

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index ca13895c3e0f6..d44911853fa2d 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -69,11 +69,23 @@ config CRYPTO_SHA512_RISCV64
tristate "Hash functions: SHA-384 and SHA-512"
depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
select CRYPTO_SHA512
help
SHA-384 and SHA-512 secure hash algorithm (FIPS 180)

Architecture: riscv64 using:
- Zvknhb vector crypto extension
- Zvkb vector crypto extension

+config CRYPTO_SM3_RISCV64
+ tristate "Hash functions: SM3 (ShangMi 3)"
+ depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
+ select CRYPTO_HASH
+ select CRYPTO_SM3
+ help
+ SM3 (ShangMi 3) secure hash function (OSCCA GM/T 0004-2012)
+
+ Architecture: riscv64 using:
+ - Zvksh vector crypto extension
+ - Zvkb vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index e30a1bcc788bf..4b5efabd1162a 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -13,10 +13,13 @@ obj-$(CONFIG_CRYPTO_CHACHA_RISCV64) += chacha-riscv64.o
chacha-riscv64-y := chacha-riscv64-glue.o chacha-riscv64-zvkb.o

obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o

obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknha_or_zvknhb-zvkb.o

obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvknhb-zvkb.o
+
+obj-$(CONFIG_CRYPTO_SM3_RISCV64) += sm3-riscv64.o
+sm3-riscv64-y := sm3-riscv64-glue.o sm3-riscv64-zvksh-zvkb.o
diff --git a/arch/riscv/crypto/sm3-riscv64-glue.c b/arch/riscv/crypto/sm3-riscv64-glue.c
new file mode 100644
index 0000000000000..c9816ce3043f2
--- /dev/null
+++ b/arch/riscv/crypto/sm3-riscv64-glue.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * SM3 using the RISC-V vector crypto extensions
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sm3_base.h>
+#include <linux/linkage.h>
+#include <linux/module.h>
+
+/*
+ * Note: the asm function only uses the 'state' field of struct sm3_state.
+ * It is assumed to be the first field.
+ */
+asmlinkage void sm3_transform_zvksh_zvkb(
+ struct sm3_state *state, const u8 *data, int num_blocks);
+
+static int riscv64_sm3_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ /*
+ * Ensure struct sm3_state begins directly with the SM3
+ * 256-bit internal state, as this is what the asm function expects.
+ */
+ BUILD_BUG_ON(offsetof(struct sm3_state, state) != 0);
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ sm3_base_do_update(desc, data, len, sm3_transform_zvksh_zvkb);
+ kernel_vector_end();
+ } else {
+ sm3_update(shash_desc_ctx(desc), data, len);
+ }
+ return 0;
+}
+
+static int riscv64_sm3_finup(struct shash_desc *desc, const u8 *data,
+ unsigned int len, u8 *out)
+{
+ struct sm3_state *ctx;
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ if (len)
+ sm3_base_do_update(desc, data, len,
+ sm3_transform_zvksh_zvkb);
+ sm3_base_do_finalize(desc, sm3_transform_zvksh_zvkb);
+ kernel_vector_end();
+
+ return sm3_base_finish(desc, out);
+ }
+
+ ctx = shash_desc_ctx(desc);
+ if (len)
+ sm3_update(ctx, data, len);
+ sm3_final(ctx, out);
+
+ return 0;
+}
+
+static int riscv64_sm3_final(struct shash_desc *desc, u8 *out)
+{
+ return riscv64_sm3_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg riscv64_sm3_alg = {
+ .init = sm3_base_init,
+ .update = riscv64_sm3_update,
+ .final = riscv64_sm3_final,
+ .finup = riscv64_sm3_finup,
+ .descsize = sizeof(struct sm3_state),
+ .digestsize = SM3_DIGEST_SIZE,
+ .base = {
+ .cra_blocksize = SM3_BLOCK_SIZE,
+ .cra_priority = 300,
+ .cra_name = "sm3",
+ .cra_driver_name = "sm3-riscv64-zvksh-zvkb",
+ .cra_module = THIS_MODULE,
+ },
+};
+
+static int __init riscv64_sm3_mod_init(void)
+{
+ if (riscv_isa_extension_available(NULL, ZVKSH) &&
+ riscv_isa_extension_available(NULL, ZVKB) &&
+ riscv_vector_vlen() >= 128)
+ return crypto_register_shash(&riscv64_sm3_alg);
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_sm3_mod_fini(void)
+{
+ crypto_unregister_shash(&riscv64_sm3_alg);
+}
+
+module_init(riscv64_sm3_mod_init);
+module_exit(riscv64_sm3_mod_fini);
+
+MODULE_DESCRIPTION("SM3 (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sm3");
diff --git a/arch/riscv/crypto/sm3-riscv64-zvksh-zvkb.S b/arch/riscv/crypto/sm3-riscv64-zvksh-zvkb.S
new file mode 100644
index 0000000000000..a2b65d961c04a
--- /dev/null
+++ b/arch/riscv/crypto/sm3-riscv64-zvksh-zvkb.S
@@ -0,0 +1,123 @@
+/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
+//
+// This file is dual-licensed, meaning that you can use it under your
+// choice of either of the following two licenses:
+//
+// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+//
+// Licensed under the Apache License 2.0 (the "License"). You can obtain
+// a copy in the file LICENSE in the source distribution or at
+// https://www.openssl.org/source/license.html
+//
+// or
+//
+// Copyright (c) 2023, Christoph Müllner <[email protected]>
+// Copyright (c) 2023, Jerry Shih <[email protected]>
+// Copyright 2024 Google LLC
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+// The generated code of this file depends on the following RISC-V extensions:
+// - RV64I
+// - RISC-V Vector ('V') with VLEN >= 128
+// - RISC-V Vector SM3 Secure Hash extension ('Zvksh')
+// - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+
+#include <linux/cfi_types.h>
+
+.text
+.option arch, +zvksh, +zvkb
+
+#define STATEP a0
+#define DATA a1
+#define NUM_BLOCKS a2
+
+#define STATE v0 // LMUL=2
+#define PREV_STATE v2 // LMUL=2
+#define W0 v4 // LMUL=2
+#define W1 v6 // LMUL=2
+#define VTMP v8 // LMUL=2
+
+.macro sm3_8rounds i, w0, w1
+ // Do 4 rounds using W_{0+i}..W_{7+i}.
+ vsm3c.vi STATE, \w0, \i + 0
+ vslidedown.vi VTMP, \w0, 2
+ vsm3c.vi STATE, VTMP, \i + 1
+
+ // Compute W_{4+i}..W_{11+i}.
+ vslidedown.vi VTMP, \w0, 4
+ vslideup.vi VTMP, \w1, 4
+
+ // Do 4 rounds using W_{4+i}..W_{11+i}.
+ vsm3c.vi STATE, VTMP, \i + 2
+ vslidedown.vi VTMP, VTMP, 2
+ vsm3c.vi STATE, VTMP, \i + 3
+
+.if \i < 28
+ // Compute W_{16+i}..W_{23+i}.
+ vsm3me.vv \w0, \w1, \w0
+.endif
+ // For the next 8 rounds, w0 and w1 are swapped.
+.endm
+
+// void sm3_transform_zvksh_zvkb(u32 state[8], const u8 *data, int num_blocks);
+SYM_TYPED_FUNC_START(sm3_transform_zvksh_zvkb)
+
+ // Load the state and endian-swap each 32-bit word.
+ vsetivli zero, 8, e32, m2, ta, ma
+ vle32.v STATE, (STATEP)
+ vrev8.v STATE, STATE
+
+.Lnext_block:
+ addi NUM_BLOCKS, NUM_BLOCKS, -1
+
+ // Save the previous state, as it's needed later.
+ vmv.v.v PREV_STATE, STATE
+
+ // Load the next 512-bit message block into W0-W1.
+ vle32.v W0, (DATA)
+ addi DATA, DATA, 32
+ vle32.v W1, (DATA)
+ addi DATA, DATA, 32
+
+ // Do the 64 rounds of SM3.
+ sm3_8rounds 0, W0, W1
+ sm3_8rounds 4, W1, W0
+ sm3_8rounds 8, W0, W1
+ sm3_8rounds 12, W1, W0
+ sm3_8rounds 16, W0, W1
+ sm3_8rounds 20, W1, W0
+ sm3_8rounds 24, W0, W1
+ sm3_8rounds 28, W1, W0
+
+ // XOR in the previous state.
+ vxor.vv STATE, STATE, PREV_STATE
+
+ // Repeat if more blocks remain.
+ bnez NUM_BLOCKS, .Lnext_block
+
+ // Store the new state and return.
+ vrev8.v STATE, STATE
+ vse32.v STATE, (STATEP)
+ ret
+SYM_FUNC_END(sm3_transform_zvksh_zvkb)
--
2.43.0


2024-01-02 06:54:01

by Eric Biggers

[permalink] [raw]
Subject: [RFC PATCH 13/13] crypto: riscv - add vector crypto accelerated SM4

From: Jerry Shih <[email protected]>

Add an implementation of SM4 using the Zvksed extension. The assembly
code is derived from OpenSSL code (openssl/openssl#21923) that was
dual-licensed so that it could be reused in the kernel. Nevertheless,
the assembly has been significantly reworked for integration with the
kernel, for example by using a regular .S file instead of the so-called
perlasm, using the assembler instead of bare '.inst', and greatly
reducing code duplication.

Co-developed-by: Christoph Müllner <[email protected]>
Signed-off-by: Christoph Müllner <[email protected]>
Co-developed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Signed-off-by: Jerry Shih <[email protected]>
Co-developed-by: Eric Biggers <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
arch/riscv/crypto/Kconfig | 17 +++
arch/riscv/crypto/Makefile | 3 +
arch/riscv/crypto/sm4-riscv64-glue.c | 109 ++++++++++++++++++
arch/riscv/crypto/sm4-riscv64-zvksed-zvkb.S | 117 ++++++++++++++++++++
4 files changed, 246 insertions(+)
create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c
create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed-zvkb.S

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index d44911853fa2d..9ba225d968289 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -81,11 +81,28 @@ config CRYPTO_SM3_RISCV64
depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
select CRYPTO_HASH
select CRYPTO_SM3
help
SM3 (ShangMi 3) secure hash function (OSCCA GM/T 0004-2012)

Architecture: riscv64 using:
- Zvksh vector crypto extension
- Zvkb vector crypto extension

+config CRYPTO_SM4_RISCV64
+ tristate "Ciphers: SM4 (ShangMi 4)"
+ depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
+ select CRYPTO_ALGAPI
+ select CRYPTO_SM4
+ help
+ SM4 block cipher algorithm (OSCCA GB/T 32907-2016,
+ ISO/IEC 18033-3:2010/Amd 1:2021)
+
+ SM4 (GBT.32907-2016) is a cryptographic standard issued by the
+ Organization of State Commercial Administration of China (OSCCA)
+ as an authorized cryptographic algorithm for use within China.
+
+ Architecture: riscv64 using:
+ - Zvksed vector crypto extension
+ - Zvkb vector crypto extension
+
endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 4b5efabd1162a..70d60060d9b26 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -16,10 +16,13 @@ obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o

obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknha_or_zvknhb-zvkb.o

obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvknhb-zvkb.o

obj-$(CONFIG_CRYPTO_SM3_RISCV64) += sm3-riscv64.o
sm3-riscv64-y := sm3-riscv64-glue.o sm3-riscv64-zvksh-zvkb.o
+
+obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o
+sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed-zvkb.o
diff --git a/arch/riscv/crypto/sm4-riscv64-glue.c b/arch/riscv/crypto/sm4-riscv64-glue.c
new file mode 100644
index 0000000000000..b8993f1411f45
--- /dev/null
+++ b/arch/riscv/crypto/sm4-riscv64-glue.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * SM4 using the RISC-V vector crypto extensions
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <[email protected]>
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih <[email protected]>
+ */
+
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sm4.h>
+#include <linux/linkage.h>
+#include <linux/module.h>
+
+asmlinkage void sm4_expandkey_zvksed_zvkb(const u8 user_key[SM4_KEY_SIZE],
+ u32 rkey_enc[SM4_RKEY_WORDS],
+ u32 rkey_dec[SM4_RKEY_WORDS]);
+asmlinkage void sm4_crypt_zvksed_zvkb(const u32 rkey[SM4_RKEY_WORDS],
+ const u8 in[SM4_BLOCK_SIZE],
+ u8 out[SM4_BLOCK_SIZE]);
+
+static int riscv64_sm4_setkey(struct crypto_tfm *tfm, const u8 *key,
+ unsigned int key_len)
+{
+ struct sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ if (key_len != SM4_KEY_SIZE)
+ return -EINVAL;
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ sm4_expandkey_zvksed_zvkb(key, ctx->rkey_enc, ctx->rkey_dec);
+ kernel_vector_end();
+ } else {
+ sm4_expandkey(ctx, key, key_len);
+ }
+ return 0;
+}
+
+static void riscv64_sm4_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+ const struct sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ sm4_crypt_zvksed_zvkb(ctx->rkey_enc, src, dst);
+ kernel_vector_end();
+ } else {
+ sm4_crypt_block(ctx->rkey_enc, dst, src);
+ }
+}
+
+static void riscv64_sm4_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+ const struct sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ if (crypto_simd_usable()) {
+ kernel_vector_begin();
+ sm4_crypt_zvksed_zvkb(ctx->rkey_dec, src, dst);
+ kernel_vector_end();
+ } else {
+ sm4_crypt_block(ctx->rkey_dec, dst, src);
+ }
+}
+
+static struct crypto_alg riscv64_sm4_alg = {
+ .cra_flags = CRYPTO_ALG_TYPE_CIPHER,
+ .cra_blocksize = SM4_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct sm4_ctx),
+ .cra_priority = 300,
+ .cra_name = "sm4",
+ .cra_driver_name = "sm4-riscv64-zvksed-zvkb",
+ .cra_cipher = {
+ .cia_min_keysize = SM4_KEY_SIZE,
+ .cia_max_keysize = SM4_KEY_SIZE,
+ .cia_setkey = riscv64_sm4_setkey,
+ .cia_encrypt = riscv64_sm4_encrypt,
+ .cia_decrypt = riscv64_sm4_decrypt,
+ },
+ .cra_module = THIS_MODULE,
+};
+
+static int __init riscv64_sm4_mod_init(void)
+{
+ if (riscv_isa_extension_available(NULL, ZVKSED) &&
+ riscv_isa_extension_available(NULL, ZVKB) &&
+ riscv_vector_vlen() >= 128)
+ return crypto_register_alg(&riscv64_sm4_alg);
+
+ return -ENODEV;
+}
+
+static void __exit riscv64_sm4_mod_fini(void)
+{
+ crypto_unregister_alg(&riscv64_sm4_alg);
+}
+
+module_init(riscv64_sm4_mod_init);
+module_exit(riscv64_sm4_mod_fini);
+
+MODULE_DESCRIPTION("SM4 (RISC-V accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("sm4");
diff --git a/arch/riscv/crypto/sm4-riscv64-zvksed-zvkb.S b/arch/riscv/crypto/sm4-riscv64-zvksed-zvkb.S
new file mode 100644
index 0000000000000..143e60ddf3fe4
--- /dev/null
+++ b/arch/riscv/crypto/sm4-riscv64-zvksed-zvkb.S
@@ -0,0 +1,117 @@
+/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
+//
+// This file is dual-licensed, meaning that you can use it under your
+// choice of either of the following two licenses:
+//
+// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+//
+// Licensed under the Apache License 2.0 (the "License"). You can obtain
+// a copy in the file LICENSE in the source distribution or at
+// https://www.openssl.org/source/license.html
+//
+// or
+//
+// Copyright (c) 2023, Christoph Müllner <[email protected]>
+// Copyright (c) 2023, Jerry Shih <[email protected]>
+// Copyright 2024 Google LLC
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+// The generated code of this file depends on the following RISC-V extensions:
+// - RV64I
+// - RISC-V Vector ('V') with VLEN >= 128
+// - RISC-V Vector SM4 Block Cipher extension ('Zvksed')
+// - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
+
+#include <linux/linkage.h>
+
+.text
+.option arch, +zvksed, +zvkb
+
+// void sm4_expandkey_zksed_zvkb(const u8 user_key[16], u32 rkey_enc[32],
+// u32 rkey_dec[32]);
+SYM_FUNC_START(sm4_expandkey_zvksed_zvkb)
+ vsetivli zero, 4, e32, m1, ta, ma
+
+ // Load the user key.
+ vle32.v v1, (a0)
+ vrev8.v v1, v1
+
+ // XOR the user key with the family key.
+ la t0, FAMILY_KEY
+ vle32.v v2, (t0)
+ vxor.vv v1, v1, v2
+
+ // Compute the round keys. Store them in forwards order in rkey_enc
+ // and in reverse order in rkey_dec.
+ addi a2, a2, 31*4
+ li t0, -4
+ .set i, 0
+.rept 8
+ vsm4k.vi v1, v1, i
+ vse32.v v1, (a1) // Store to rkey_enc.
+ vsse32.v v1, (a2), t0 // Store to rkey_dec.
+.if i < 7
+ addi a1, a1, 16
+ addi a2, a2, -16
+.endif
+ .set i, i + 1
+.endr
+
+ ret
+SYM_FUNC_END(sm4_expandkey_zvksed_zvkb)
+
+// void sm4_crypt_zvksed_zvkb(const u32 rkey[32], const u8 in[16], u8 out[16]);
+SYM_FUNC_START(sm4_crypt_zvksed_zvkb)
+ vsetivli zero, 4, e32, m1, ta, ma
+
+ // Load the input data.
+ vle32.v v1, (a1)
+ vrev8.v v1, v1
+
+ // Do the 32 rounds of SM4, 4 at a time.
+ .set i, 0
+.rept 8
+ vle32.v v2, (a0)
+ vsm4r.vs v1, v2
+.if i < 7
+ addi a0, a0, 16
+.endif
+ .set i, i + 1
+.endr
+
+ // Store the output data (in reverse element order).
+ vrev8.v v1, v1
+ li t0, -4
+ addi a2, a2, 12
+ vsse32.v v1, (a2), t0
+
+ ret
+SYM_FUNC_END(sm4_crypt_zvksed_zvkb)
+
+.section ".rodata"
+.p2align 2
+.type FAMILY_KEY, @object
+FAMILY_KEY:
+ .word 0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC
+.size FAMILY_KEY, .-FAMILY_KEY
--
2.43.0


2024-01-03 14:00:49

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [RFC PATCH 00/13] RISC-V crypto with reworked asm files

On Tue, 2 Jan 2024 at 07:50, Eric Biggers <[email protected]> wrote:
>
> As discussed previously, the proposed use of the so-called perlasm for
> the RISC-V crypto assembly files makes them difficult to read, and these
> files have some other issues such extensively duplicating source code
> for the different AES key lengths and for the unrolled hash functions.
> There is/was a desire to share code with OpenSSL, but many of the files
> have already diverged significantly; also, for most of the algorithms
> the source code can be quite short anyway, due to the native support for
> them in the RISC-V vector crypto extensions combined with the way the
> RISC-V vector extension naturally scales to arbitrary vector lengths.
>
> Since we're still waiting for prerequisite patches to be merged anyway,
> we have a bit more time to get this cleaned up properly. So I've had a
> go at cleaning up the patchset to use standard .S files, with the code
> duplication fixed. I also made some tweaks to make the different
> algorithms consistent with each other and with what exists in the kernel
> already for other architectures, and tried to improve comments.
>
> The result is this series, which passes all tests and is about 2400
> lines shorter than the latest version with the perlasm
> (https://lore.kernel.org/linux-crypto/[email protected]/).
> All the same functionality and general optimizations are still included,
> except for some minor optimizations in XTS that I dropped since it's not
> clear they are worth the complexity. (Note that almost all users of XTS
> in the kernel only use it with block-aligned messages, so it's not very
> important to optimize the ciphertext stealing case.)
>
> I'd appreciate people's thoughts on this series. Jerry, I hope I'm not
> stepping on your toes too much here, but I think there are some big
> improvements here.
>

As I have indicated before, I fully agree with Eric here that avoiding
perlasm is preferable: sharing code with OpenSSL is great if we can
simply adopt the exact same code (and track OpenSSL as its upstream)
but this never really worked that well for skciphers, due to API
differences. (The SHA transforms can be reused a bit more easily)

I will also note that perlasm is not as useful for RISC-V as it is for
other architectures: in OpenSSL, perlasm is also used to abstract
differences in calling conventions between, e.g., x86_64 on Linux vs
Windows, or to support building with outdated [proprietary]
toolchains.

I do wonder if we could also use .req directives for register aliases
instead of CPP defines? It shouldn't matter for working code, but the
diagnostics tend to be a bit more useful if the aliases are visible to
the assembler.







> This series is based on riscv/for-next
> (https://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git/log/?h=for-next)
> commit f352a28cc2fb4ee8d08c6a6362c9a861fcc84236, and for convenience
> I've included the prerequisite patches too.
>
> Andy Chiu (1):
> riscv: vector: make Vector always available for softirq context
>
> Eric Biggers (1):
> RISC-V: add TOOLCHAIN_HAS_VECTOR_CRYPTO
>
> Greentime Hu (1):
> riscv: Add support for kernel mode vector
>
> Heiko Stuebner (2):
> RISC-V: add helper function to read the vector VLEN
> RISC-V: hook new crypto subdir into build-system
>
> Jerry Shih (8):
> crypto: riscv - add vector crypto accelerated AES
> crypto: riscv - add vector crypto accelerated AES-{ECB,CBC,CTR,XTS}
> crypto: riscv - add vector crypto accelerated ChaCha20
> crypto: riscv - add vector crypto accelerated GHASH
> crypto: riscv - add vector crypto accelerated SHA-{256,224}
> crypto: riscv - add vector crypto accelerated SHA-{512,384}
> crypto: riscv - add vector crypto accelerated SM3
> crypto: riscv - add vector crypto accelerated SM4
>
> arch/riscv/Kbuild | 1 +
> arch/riscv/Kconfig | 7 +
> arch/riscv/crypto/Kconfig | 108 +++++
> arch/riscv/crypto/Makefile | 28 ++
> arch/riscv/crypto/aes-macros.S | 156 +++++++
> .../crypto/aes-riscv64-block-mode-glue.c | 435 ++++++++++++++++++
> arch/riscv/crypto/aes-riscv64-glue.c | 123 +++++
> arch/riscv/crypto/aes-riscv64-glue.h | 15 +
> .../crypto/aes-riscv64-zvkned-zvbb-zvkg.S | 300 ++++++++++++
> arch/riscv/crypto/aes-riscv64-zvkned-zvkb.S | 146 ++++++
> arch/riscv/crypto/aes-riscv64-zvkned.S | 180 ++++++++
> arch/riscv/crypto/chacha-riscv64-glue.c | 101 ++++
> arch/riscv/crypto/chacha-riscv64-zvkb.S | 294 ++++++++++++
> arch/riscv/crypto/ghash-riscv64-glue.c | 170 +++++++
> arch/riscv/crypto/ghash-riscv64-zvkg.S | 72 +++
> arch/riscv/crypto/sha256-riscv64-glue.c | 137 ++++++
> .../sha256-riscv64-zvknha_or_zvknhb-zvkb.S | 225 +++++++++
> arch/riscv/crypto/sha512-riscv64-glue.c | 133 ++++++
> .../riscv/crypto/sha512-riscv64-zvknhb-zvkb.S | 203 ++++++++
> arch/riscv/crypto/sm3-riscv64-glue.c | 112 +++++
> arch/riscv/crypto/sm3-riscv64-zvksh-zvkb.S | 123 +++++
> arch/riscv/crypto/sm4-riscv64-glue.c | 109 +++++
> arch/riscv/crypto/sm4-riscv64-zvksed-zvkb.S | 117 +++++
> arch/riscv/include/asm/processor.h | 14 +-
> arch/riscv/include/asm/simd.h | 48 ++
> arch/riscv/include/asm/vector.h | 20 +
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/kernel_mode_vector.c | 126 +++++
> arch/riscv/kernel/process.c | 1 +
> crypto/Kconfig | 3 +
> 30 files changed, 3507 insertions(+), 1 deletion(-)
> create mode 100644 arch/riscv/crypto/Kconfig
> create mode 100644 arch/riscv/crypto/Makefile
> create mode 100644 arch/riscv/crypto/aes-macros.S
> create mode 100644 arch/riscv/crypto/aes-riscv64-block-mode-glue.c
> create mode 100644 arch/riscv/crypto/aes-riscv64-glue.c
> create mode 100644 arch/riscv/crypto/aes-riscv64-glue.h
> create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned-zvbb-zvkg.S
> create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned-zvkb.S
> create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.S
> create mode 100644 arch/riscv/crypto/chacha-riscv64-glue.c
> create mode 100644 arch/riscv/crypto/chacha-riscv64-zvkb.S
> create mode 100644 arch/riscv/crypto/ghash-riscv64-glue.c
> create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.S
> create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c
> create mode 100644 arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.S
> create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c
> create mode 100644 arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.S
> create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c
> create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh-zvkb.S
> create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c
> create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed-zvkb.S
> create mode 100644 arch/riscv/include/asm/simd.h
> create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
>
>
> base-commit: f352a28cc2fb4ee8d08c6a6362c9a861fcc84236
> --
> 2.43.0
>

2024-01-03 14:36:10

by Eric Biggers

[permalink] [raw]
Subject: Re: [RFC PATCH 00/13] RISC-V crypto with reworked asm files

On Wed, Jan 03, 2024 at 03:00:29PM +0100, Ard Biesheuvel wrote:
> On Tue, 2 Jan 2024 at 07:50, Eric Biggers <[email protected]> wrote:
> >
> > As discussed previously, the proposed use of the so-called perlasm for
> > the RISC-V crypto assembly files makes them difficult to read, and these
> > files have some other issues such extensively duplicating source code
> > for the different AES key lengths and for the unrolled hash functions.
> > There is/was a desire to share code with OpenSSL, but many of the files
> > have already diverged significantly; also, for most of the algorithms
> > the source code can be quite short anyway, due to the native support for
> > them in the RISC-V vector crypto extensions combined with the way the
> > RISC-V vector extension naturally scales to arbitrary vector lengths.
> >
> > Since we're still waiting for prerequisite patches to be merged anyway,
> > we have a bit more time to get this cleaned up properly. So I've had a
> > go at cleaning up the patchset to use standard .S files, with the code
> > duplication fixed. I also made some tweaks to make the different
> > algorithms consistent with each other and with what exists in the kernel
> > already for other architectures, and tried to improve comments.
> >
> > The result is this series, which passes all tests and is about 2400
> > lines shorter than the latest version with the perlasm
> > (https://lore.kernel.org/linux-crypto/[email protected]/).
> > All the same functionality and general optimizations are still included,
> > except for some minor optimizations in XTS that I dropped since it's not
> > clear they are worth the complexity. (Note that almost all users of XTS
> > in the kernel only use it with block-aligned messages, so it's not very
> > important to optimize the ciphertext stealing case.)
> >
> > I'd appreciate people's thoughts on this series. Jerry, I hope I'm not
> > stepping on your toes too much here, but I think there are some big
> > improvements here.
> >
>
> As I have indicated before, I fully agree with Eric here that avoiding
> perlasm is preferable: sharing code with OpenSSL is great if we can
> simply adopt the exact same code (and track OpenSSL as its upstream)
> but this never really worked that well for skciphers, due to API
> differences. (The SHA transforms can be reused a bit more easily)
>
> I will also note that perlasm is not as useful for RISC-V as it is for
> other architectures: in OpenSSL, perlasm is also used to abstract
> differences in calling conventions between, e.g., x86_64 on Linux vs
> Windows, or to support building with outdated [proprietary]
> toolchains.
>
> I do wonder if we could also use .req directives for register aliases
> instead of CPP defines? It shouldn't matter for working code, but the
> diagnostics tend to be a bit more useful if the aliases are visible to
> the assembler.

.req unfortunately isn't an option since it is specific to AArch64 and ARM
assembly. So we have to use #defines like x86 does. Ultimately, the effect is
about the same.

- Eric

2024-01-03 14:50:54

by Eric Biggers

[permalink] [raw]
Subject: Re: [RFC PATCH 07/13] crypto: riscv - add vector crypto accelerated AES-{ECB,CBC,CTR,XTS}

On Tue, Jan 02, 2024 at 12:47:33AM -0600, Eric Biggers wrote:
> diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
> index dca698c5cba3e..5dd91f34f0d52 100644
> --- a/arch/riscv/crypto/Makefile
> +++ b/arch/riscv/crypto/Makefile
> @@ -1,7 +1,10 @@
> # SPDX-License-Identifier: GPL-2.0-only
> #
> # linux/arch/riscv/crypto/Makefile
> #
>
> obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
> aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o
> +
> +obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o
> +aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvkned-zvbb-zvkg.o aes-riscv64-zvkned-zvkb.o

A bug I noticed (which is also present in Jerry's patchset) is that some of the
code of the aes-block-riscv64 module is located in aes-riscv64-zvkned.S, which
isn't built into that module but rather into aes-riscv64. This causes a build
error when both CONFIG_CRYPTO_AES_RISCV64 and CONFIG_CRYPTO_AES_BLOCK_RISCV64
are set to 'm':

ERROR: modpost: "aes_cbc_decrypt_zvkned" [arch/riscv/crypto/aes-block-riscv64.ko] undefined!
ERROR: modpost: "aes_ecb_decrypt_zvkned" [arch/riscv/crypto/aes-block-riscv64.ko] undefined!
ERROR: modpost: "aes_cbc_encrypt_zvkned" [arch/riscv/crypto/aes-block-riscv64.ko] undefined!
ERROR: modpost: "aes_ecb_encrypt_zvkned" [arch/riscv/crypto/aes-block-riscv64.ko] undefined!

To fix this, I think we should just merge the two modules and kconfig options
together so that there is one module that provides both the AES modes and the
AES single-block cipher. That's how x86's aesni-intel works, for example.

- Eric

2024-01-04 10:23:55

by Jerry Shih

[permalink] [raw]
Subject: Re: [RFC PATCH 00/13] RISC-V crypto with reworked asm files

On Jan 3, 2024, at 22:35, Eric Biggers <[email protected]> wrote:
> On Wed, Jan 03, 2024 at 03:00:29PM +0100, Ard Biesheuvel wrote:
>> On Tue, 2 Jan 2024 at 07:50, Eric Biggers <[email protected]> wrote:
>>>
>>> As discussed previously, the proposed use of the so-called perlasm for
>>> the RISC-V crypto assembly files makes them difficult to read, and these
>>> files have some other issues such extensively duplicating source code
>>> for the different AES key lengths and for the unrolled hash functions.
>>> There is/was a desire to share code with OpenSSL, but many of the files
>>> have already diverged significantly; also, for most of the algorithms
>>> the source code can be quite short anyway, due to the native support for
>>> them in the RISC-V vector crypto extensions combined with the way the
>>> RISC-V vector extension naturally scales to arbitrary vector lengths.
>>>
>>> Since we're still waiting for prerequisite patches to be merged anyway,
>>> we have a bit more time to get this cleaned up properly. So I've had a
>>> go at cleaning up the patchset to use standard .S files, with the code
>>> duplication fixed. I also made some tweaks to make the different
>>> algorithms consistent with each other and with what exists in the kernel

Do you mean the xts gluing part only? Do we have the different algorithms
for the actual implementation in .S?

>>> already for other architectures, and tried to improve comments.

The improved comments are better. Thanks.

>>> The result is this series, which passes all tests and is about 2400
>>> lines shorter than the latest version with the perlasm
>>> (https://lore.kernel.org/linux-crypto/[email protected]/).

For the unrolled hash functions case(sha256/512), the .S implementation uses
macro instead. But I think the macro expanded code will be the same. Do we
care about the source code size instead of the final binary code size?

For the AES variants part, the .S uses branches inside the inner loop. Does the
branch inside the inner loop better than the bigger code size?

>>> All the same functionality and general optimizations are still included,
>>> except for some minor optimizations in XTS that I dropped since it's not
>>> clear they are worth the complexity. (Note that almost all users of XTS
>>> in the kernel only use it with block-aligned messages, so it's not very
>>> important to optimize the ciphertext stealing case.)

The aesni/neon are SIMD, so I think the rvv version could have the different
design. And I think my implementation is very similar to x86-xts except the
tail block numbers for ciphertext stealing case.
The x86-xts-like implementation uses the fixed 2 block for the ciphertext
stealing case.

+ int xts_blocks = DIV_ROUND_UP(req->cryptlen,
+ AES_BLOCK_SIZE) - 2;

>>> I'd appreciate people's thoughts on this series. Jerry, I hope I'm not
>>> stepping on your toes too much here, but I think there are some big
>>> improvements here.
>>>
>>
>> As I have indicated before, I fully agree with Eric here that avoiding
>> perlasm is preferable: sharing code with OpenSSL is great if we can
>> simply adopt the exact same code (and track OpenSSL as its upstream)
>> but this never really worked that well for skciphers, due to API
>> differences. (The SHA transforms can be reused a bit more easily)

In my opinion, I would prefer the perlasm with the usage of asm mnemonics.
I could see the expanded asm source from perlsm. I don't know how to dump the
expanded asm source when we use asm `.macro` directives. I use objdump to
see the final asm.
And we could use function to modularize the asm implementations. The macro
might do the same things, but it's more clear for me for the argument passing
and more powerful string manipulation.
And I could have scope for the register name binding. It's more clear to me
comparing with a long list of `#define xxx`.

If the pure .S is still more preferable in kernel, let's do that.

-Jerry

>> I will also note that perlasm is not as useful for RISC-V as it is for
>> other architectures: in OpenSSL, perlasm is also used to abstract
>> differences in calling conventions between, e.g., x86_64 on Linux vs
>> Windows, or to support building with outdated [proprietary]
>> toolchains.
>>
>> I do wonder if we could also use .req directives for register aliases
>> instead of CPP defines? It shouldn't matter for working code, but the
>> diagnostics tend to be a bit more useful if the aliases are visible to
>> the assembler.
>
> .req unfortunately isn't an option since it is specific to AArch64 and ARM
> assembly. So we have to use #defines like x86 does. Ultimately, the effect is
> about the same.
>
> - Eric


2024-01-04 17:24:16

by Eric Biggers

[permalink] [raw]
Subject: Re: [RFC PATCH 07/13] crypto: riscv - add vector crypto accelerated AES-{ECB,CBC,CTR,XTS}

On Thu, Jan 04, 2024 at 04:47:16PM +0800, Jerry Shih wrote:
> On Jan 3, 2024, at 22:50, Eric Biggers <[email protected]> wrote:
> > On Tue, Jan 02, 2024 at 12:47:33AM -0600, Eric Biggers wrote:
> >> diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
> >> index dca698c5cba3e..5dd91f34f0d52 100644
> >> --- a/arch/riscv/crypto/Makefile
> >> +++ b/arch/riscv/crypto/Makefile
> >> @@ -1,7 +1,10 @@
> >> # SPDX-License-Identifier: GPL-2.0-only
> >> #
> >> # linux/arch/riscv/crypto/Makefile
> >> #
> >>
> >> obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
> >> aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o
> >> +
> >> +obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o
> >> +aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvkned-zvbb-zvkg.o aes-riscv64-zvkned-zvkb.o
> >
> > A bug I noticed (which is also present in Jerry's patchset) is that some of the
> > code of the aes-block-riscv64 module is located in aes-riscv64-zvkned.S, which
> > isn't built into that module but rather into aes-riscv64. This causes a build
> > error when both CONFIG_CRYPTO_AES_RISCV64 and CONFIG_CRYPTO_AES_BLOCK_RISCV64
> > are set to 'm':
> >
> > ERROR: modpost: "aes_cbc_decrypt_zvkned" [arch/riscv/crypto/aes-block-riscv64.ko] undefined!
> > ERROR: modpost: "aes_ecb_decrypt_zvkned" [arch/riscv/crypto/aes-block-riscv64.ko] undefined!
> > ERROR: modpost: "aes_cbc_encrypt_zvkned" [arch/riscv/crypto/aes-block-riscv64.ko] undefined!
> > ERROR: modpost: "aes_ecb_encrypt_zvkned" [arch/riscv/crypto/aes-block-riscv64.ko] undefined!
> >
> > To fix this, I think we should just merge the two modules and kconfig options
> > together so that there is one module that provides both the AES modes and the
> > AES single-block cipher. That's how x86's aesni-intel works, for example.
> >
> > - Eric
>
> That's a bug in my patchset.
> I don't test with all options with `M` settings since I can't boot to qemu with all `M` settings.
>
> Could we move the cbc and ecb from `aes-riscv64-zvkned` to `aes-block-riscv64` instead of merging
> these two modules?
> Thus, we could still enable the single aes block cipher without other extensions(e.g. zvbb or zvkg).

My proposal to merge the two modules still results in the single-block AES
cipher being registered when zvkned alone is present. It just won't be
selectable separately via kconfig. I think that's fine.

- Eric

2024-01-04 18:03:07

by Eric Biggers

[permalink] [raw]
Subject: Re: [RFC PATCH 00/13] RISC-V crypto with reworked asm files

On Thu, Jan 04, 2024 at 06:23:42PM +0800, Jerry Shih wrote:
> On Jan 3, 2024, at 22:35, Eric Biggers <[email protected]> wrote:
> > On Wed, Jan 03, 2024 at 03:00:29PM +0100, Ard Biesheuvel wrote:
> >> On Tue, 2 Jan 2024 at 07:50, Eric Biggers <[email protected]> wrote:
> >>>
> >>> As discussed previously, the proposed use of the so-called perlasm for
> >>> the RISC-V crypto assembly files makes them difficult to read, and these
> >>> files have some other issues such extensively duplicating source code
> >>> for the different AES key lengths and for the unrolled hash functions.
> >>> There is/was a desire to share code with OpenSSL, but many of the files
> >>> have already diverged significantly; also, for most of the algorithms
> >>> the source code can be quite short anyway, due to the native support for
> >>> them in the RISC-V vector crypto extensions combined with the way the
> >>> RISC-V vector extension naturally scales to arbitrary vector lengths.
> >>>
> >>> Since we're still waiting for prerequisite patches to be merged anyway,
> >>> we have a bit more time to get this cleaned up properly. So I've had a
> >>> go at cleaning up the patchset to use standard .S files, with the code
> >>> duplication fixed. I also made some tweaks to make the different
> >>> algorithms consistent with each other and with what exists in the kernel
>
> Do you mean the xts gluing part only? Do we have the different algorithms
> for the actual implementation in .S?

I did change both the XTS assembly and XTS glue code slightly. However, the
general approach is still the same, except for a couple minor optimizations that
I dropped. See what I wrote below:

All the same functionality and general optimizations are still included,
except for some minor optimizations in XTS that I dropped since it's not
clear they are worth the complexity. (Note that almost all users of XTS
in the kernel only use it with block-aligned messages, so it's not very
important to optimize the ciphertext stealing case.)

> >>> The result is this series, which passes all tests and is about 2400
> >>> lines shorter than the latest version with the perlasm
> >>> (https://lore.kernel.org/linux-crypto/[email protected]/).
>
> For the unrolled hash functions case(sha256/512), the .S implementation uses
> macro instead. But I think the macro expanded code will be the same. Do we
> care about the source code size instead of the final binary code size?

Yes, because it's very difficult to review code that's been hand-unrolled.
Essentially what I had been doing to review your code was to reverse engineer
what the loop that was being unrolled was, and then go through each unrolled
iteration and verify that it matches what it is supposed to. (While also
ignoring the code comments, as some of the comments had been copy+pasted without
being updated to be correct for what iteration they were commenting.)

It's much simpler to just use .rept to unroll the code for us.

> For the AES variants part, the .S uses branches inside the inner loop.

No, it does not. The macros generate a separate inner loop for each AES key
length, just like what the perl scripts had. The source code just isn't
duplicated 3 times anymore.

>
> >>> All the same functionality and general optimizations are still included,
> >>> except for some minor optimizations in XTS that I dropped since it's not
> >>> clear they are worth the complexity. (Note that almost all users of XTS
> >>> in the kernel only use it with block-aligned messages, so it's not very
> >>> important to optimize the ciphertext stealing case.)
>
> The aesni/neon are SIMD, so I think the rvv version could have the different
> design. And I think my implementation is very similar to x86-xts except the
> tail block numbers for ciphertext stealing case.
> The x86-xts-like implementation uses the fixed 2 block for the ciphertext
> stealing case.
>
> + int xts_blocks = DIV_ROUND_UP(req->cryptlen,
> + AES_BLOCK_SIZE) - 2;

The two specific XTS optimizations that I dropped are:

1. There was a parameter 'update_iv' that told the assembly code whether to
calculate and return the next tweak or not. Computing the next tweak is a
fairly small computation; it's not clear to me that it's worth adding a
branch and extra code complexity to skip that computation.

2. For XTS encryption with ciphertext stealing, the assembly code encrypted the
last full block during the main loop instead of separately, in order to take
better advantage of parallelism. I don't like how this optimization only
applied to inputs whose length is not a multiple of 16, which are very rare.
Also, it only applied to encryption, not to decryption. That meant that
separate code was needed for encryption and decryption. Note that for disk
encryption, decryption is more performance-critical than encryption.

Again, we could bring these back later. I'd just like to error on the side of
simplicity for the initial version. If we include optimizations that aren't
actually worthwhile, it might be hard to remove them later.

> >>> I'd appreciate people's thoughts on this series. Jerry, I hope I'm not
> >>> stepping on your toes too much here, but I think there are some big
> >>> improvements here.
> >>>
> >>
> >> As I have indicated before, I fully agree with Eric here that avoiding
> >> perlasm is preferable: sharing code with OpenSSL is great if we can
> >> simply adopt the exact same code (and track OpenSSL as its upstream)
> >> but this never really worked that well for skciphers, due to API
> >> differences. (The SHA transforms can be reused a bit more easily)
>
> In my opinion, I would prefer the perlasm with the usage of asm mnemonics.
> I could see the expanded asm source from perlsm. I don't know how to dump the
> expanded asm source when we use asm `.macro` directives. I use objdump to
> see the final asm.

Yes, you can always use objdump to see everything expanded out. But there's not
much of a need to do that when you can just read the source code instead.

> And we could use function to modularize the asm implementations. The macro
> might do the same things, but it's more clear for me for the argument passing
> and more powerful string manipulation.

The perl scripts were not using any advanced capabilities of perl, and in some
cases they even emitted assembly macros like .rept instead of using perl. So
the use of perl seemed unnecessary and just over-complicated things.

BTW, we could always use perlasm for specific algorithms where we actually need
advanced scripting, like to interleave two independent computations with each
other. It doesn't have to be all one or the other. It's just that the code
that's been written for RISC-V so far is relatively straightforward and doesn't
seem to benefit from perlasm.

> And I could have scope for the register name binding. It's more clear to me
> comparing with a long list of `#define xxx`.

I think it's good enough. We can always re-#define things later in the file, or
just split functions with different parameters into different files.

BTW, the reason that some of the lists of defines got longer is that I actually
gave things proper names, whereas some the perl scripts used trivial names like
"V28" for the v28 register.

- Eric