2021-03-04 07:26:34

by Ard Biesheuvel

Subject: [PATCH v2 0/9] running kernel mode SIMD with softirqs disabled

[ TL;DR for the non-ARM folks on CC: disabling softirq processing when using
SIMD in kernel mode could reduce complexity and improve performance, but we
need to decide whether we can do this, and how much softirq processing
latency we can tolerate. If we can find a satisfactory solution for this,
we might do the same for x86 and 32-bit ARM as well.

However, based on preliminary off-list discussions with peterz and luto, it
seems that for x86, there is a preference for using per-CPU buffers to
preserve/restore the task context's kernel mode SIMD state when the task is
interrupted to perform kernel mode SIMD in softirq context. On arm64, we
actually had this arrangement before, and removed it because it made
reasoning about preserving/restoring userland SVE state (32 SIMD registers
of up to 2 kbit in size) rather complex. ]

The crypto API provides two ways to invoke symmetric encryption algorithms:
- synchronously, where the transformation is guaranteed to be done by the
time the function returns;
- asynchronously, where the function may return with a -EINPROGRESS return code,
and a completion will be signalled when the transformation is done.
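
As a rough illustration (not part of this series; tfm error checks, setkey and
scatterlist setup are omitted, and example_encrypt() is a made-up name), both
flavours are driven through the same request API, and the only difference for
the caller is whether it may have to wait for a completion:

#include <crypto/skcipher.h>
#include <linux/crypto.h>
#include <linux/scatterlist.h>

static int example_encrypt(struct scatterlist *sg, unsigned int len, u8 *iv)
{
        /* passing CRYPTO_ALG_ASYNC as the mask (third argument) instead of 0
           would limit the lookup to synchronous implementations */
        struct crypto_skcipher *tfm = crypto_alloc_skcipher("cbc(aes)", 0, 0);
        struct skcipher_request *req = skcipher_request_alloc(tfm, GFP_KERNEL);
        DECLARE_CRYPTO_WAIT(wait);
        int err;

        skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
                                           CRYPTO_TFM_REQ_MAY_SLEEP,
                                      crypto_req_done, &wait);
        skcipher_request_set_crypt(req, sg, sg, len, iv);

        /* an async implementation may return -EINPROGRESS here, and
           crypto_wait_req() sleeps until the completion fires; a synchronous
           one is guaranteed to be done when the call returns */
        err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);

        skcipher_request_free(req);
        crypto_free_skcipher(tfm);
        return err;
}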

The asynchronous mode is mainly intended for h/w accelerators, where the
throughput would otherwise be severely limited by the latency. However, it is
also being used for software algorithms based on SIMD instructions, which
cannot be issued from arbitrary contexts (the rules differ between
architectures, but typically, SIMD can be used in task context, or in softirq
context provided that the softirq was not taken while the SIMD was already in
use in kernel mode).
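
Concretely, this is why the arm64 glue code currently brackets every NEON
section with a usability check and carries a scalar fallback. A condensed
sketch of that pattern (modelled on the fallback paths removed later in this
series; ce_aes_encrypt() and num_rounds() stand in for the per-driver helpers):

#include <asm/neon.h>
#include <crypto/aes.h>
#include <crypto/internal/simd.h>

static void encrypt_one_block(struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src)
{
        if (crypto_simd_usable()) {
                /* task context, or a softirq that did not interrupt NEON code */
                kernel_neon_begin();
                ce_aes_encrypt(dst, src, ctx->key_enc, num_rounds(ctx));
                kernel_neon_end();
        } else {
                /* scalar fallback, ~20x slower than the AES instructions */
                aes_encrypt(ctx, dst, src);
        }
}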

Many users of the crypto API in the kernel today opt out of this asynchronous
interface (802.11, macsec, kerberos, sw kTLS), or use a library interface that
is fundamentally synchronous (wireguard). This means we end up with a degraded
mode both for the contended case (a scalar fallback) and for the uncontended
case (generic GCM/CCM/CTR chaining mode templates wrapped around the SIMD
cipher, rather than accelerated implementations of the full chaining modes in
question). Note that scalar AES runs ~20x slower than the SIMD-instruction
based version.

So let's address this for arm64 by reorganizing kernel mode SIMD support so
that the SIMD unit can always be assumed to be available. This means we need
to defer softirq processing when grabbing the NEON unit in task context, so
that any use of it in softirq context is guaranteed not to interrupt any code
that was already using the NEON.
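
In practice this amounts to taking the per-CPU FPSIMD context with bottom
halves disabled rather than just preemption disabled; simplified from patch 3
(arch/arm64/kernel/fpsimd.c):

static void get_cpu_fpsimd_context(void)
{
        local_bh_disable();             /* was: preempt_disable() */
        __get_cpu_fpsimd_context();
}

static void put_cpu_fpsimd_context(void)
{
        __put_cpu_fpsimd_context();
        local_bh_enable();              /* was: preempt_enable() */
}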

This obviously impacts softirq processing latency, which is why the existing
conditional yield support is modified to take pending softirqs into account.
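
For reference, a rough C rendition of the conditions the updated cond_yield
macro checks (this mirrors the macro's comment block; the real implementation
is assembly, see patch 3):

#include <linux/interrupt.h>
#include <linux/preempt.h>
#include <linux/sched.h>

static bool should_yield(void)
{
        if (in_serving_softirq())
                return false;   /* softirqs run to completion anyway */

        /* yield if re-enabling preemption a single time would reschedule */
        if (IS_ENABLED(CONFIG_PREEMPTION) &&
            preempt_count() == PREEMPT_DISABLE_OFFSET && need_resched())
                return true;

        /* or if a softirq is pending on this CPU and waiting for us */
        return local_softirq_pending();
}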

Changes since RFC/v1:
- add patch to remove obsolete cond_yield_neon macros
- rebased onto new, simplified cond_yield macro
- include patches to remove the async path from all arm64 crypto skciphers
and AEADs

Previous RFC version:
[0] https://lore.kernel.org/linux-arm-kernel/[email protected]/

The first 3 patches will need to go through the arm64 tree, so once this
series is reviewed, some coordination is required between the arm64 and
crypto trees to get this merged without conflicts.

Cc: Dave Martin <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>

Ard Biesheuvel (9):
arm64: assembler: remove conditional NEON yield macros
arm64: assembler: introduce wxN aliases for wN registers
arm64: fpsimd: run kernel mode NEON with softirqs disabled
crypto: aead - disallow en/decrypt for non-task or non-softirq context
crypto: skcipher - disallow en/decrypt for non-task or non-softirq
context
crypto: arm64/gcm-aes-ce - remove non-SIMD fallback path
crypto: arm64/aes-ccm - remove non-SIMD fallback path
crypto: arm64/aes-ce - stop using SIMD helper for skciphers
crypto: arm64/aes-neonbs - stop using SIMD helper for skciphers

arch/arm64/crypto/Kconfig | 3 -
arch/arm64/crypto/aes-ce-ccm-glue.c | 151 +++-----------
arch/arm64/crypto/aes-glue.c | 102 ++--------
arch/arm64/crypto/aes-modes.S | 2 +-
arch/arm64/crypto/aes-neonbs-glue.c | 122 +-----------
arch/arm64/crypto/ghash-ce-glue.c | 209 +++++---------------
arch/arm64/crypto/sha1-ce-core.S | 2 +-
arch/arm64/crypto/sha2-ce-core.S | 2 +-
arch/arm64/crypto/sha3-ce-core.S | 4 +-
arch/arm64/crypto/sha512-ce-core.S | 2 +-
arch/arm64/include/asm/assembler.h | 106 +++-------
arch/arm64/kernel/asm-offsets.c | 2 +
arch/arm64/kernel/fpsimd.c | 4 +-
crypto/aead.c | 10 +
crypto/skcipher.c | 10 +
15 files changed, 162 insertions(+), 569 deletions(-)

--
2.30.1


2021-03-04 07:26:34

by Ard Biesheuvel

Subject: [PATCH v2 3/9] arm64: fpsimd: run kernel mode NEON with softirqs disabled

Kernel mode NEON can be used in task or softirq context, but only in
a non-nesting manner, i.e., softirq context is only permitted if the
interrupt was not taken at a point where the kernel was using the NEON
in task context.

This means all users of kernel mode NEON have to be aware of this
limitation, and either need to provide scalar fallbacks that may be much
slower (up to 20x for AES instructions) and potentially less safe, or
use an asynchronous interface that defers processing to a later time
when the NEON is guaranteed to be available.

Given that grabbing and releasing the NEON is cheap, we can relax this
restriction, by increasing the granularity of kernel mode NEON code, and
always disabling softirq processing while the NEON is being used in task
context.
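
[ Sketch, for illustration only: the contract this establishes for users of
the existing kernel_neon_begin()/kernel_neon_end() API; some_neon_user() is a
made-up name. ]

void some_neon_user(void)
{
        kernel_neon_begin();    /* now also defers softirq processing on this CPU */
        /* NEON code here can no longer be interrupted by a softirq that
           itself uses the NEON, so no scalar fallback is needed */
        kernel_neon_end();      /* pending softirqs (and preemption) run here */
}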

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/arm64/crypto/aes-modes.S | 2 +-
arch/arm64/crypto/sha1-ce-core.S | 2 +-
arch/arm64/crypto/sha2-ce-core.S | 2 +-
arch/arm64/crypto/sha3-ce-core.S | 4 +--
arch/arm64/crypto/sha512-ce-core.S | 2 +-
arch/arm64/include/asm/assembler.h | 28 +++++++++++++++-----
arch/arm64/kernel/asm-offsets.c | 2 ++
arch/arm64/kernel/fpsimd.c | 4 +--
8 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index bbdb54702aa7..ab6c14ef9f4e 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -700,7 +700,7 @@ AES_FUNC_START(aes_mac_update)
cbz w5, .Lmacout
encrypt_block v0, w2, x1, x7, w8
st1 {v0.16b}, [x4] /* return dg */
- cond_yield .Lmacout, x7
+ cond_yield .Lmacout, x7, x8
b .Lmacloop4x
.Lmac1x:
add w3, w3, #4
diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S
index 8c02bbc2684e..889ca0f8972b 100644
--- a/arch/arm64/crypto/sha1-ce-core.S
+++ b/arch/arm64/crypto/sha1-ce-core.S
@@ -121,7 +121,7 @@ CPU_LE( rev32 v11.16b, v11.16b )
add dgav.4s, dgav.4s, dg0v.4s

cbz w2, 2f
- cond_yield 3f, x5
+ cond_yield 3f, x5, x6
b 0b

/*
diff --git a/arch/arm64/crypto/sha2-ce-core.S b/arch/arm64/crypto/sha2-ce-core.S
index 6cdea7d56059..491179922f49 100644
--- a/arch/arm64/crypto/sha2-ce-core.S
+++ b/arch/arm64/crypto/sha2-ce-core.S
@@ -129,7 +129,7 @@ CPU_LE( rev32 v19.16b, v19.16b )

/* handled all input blocks? */
cbz w2, 2f
- cond_yield 3f, x5
+ cond_yield 3f, x5, x6
b 0b

/*
diff --git a/arch/arm64/crypto/sha3-ce-core.S b/arch/arm64/crypto/sha3-ce-core.S
index 6f5208414fe3..9c77313f5a60 100644
--- a/arch/arm64/crypto/sha3-ce-core.S
+++ b/arch/arm64/crypto/sha3-ce-core.S
@@ -184,11 +184,11 @@ SYM_FUNC_START(sha3_ce_transform)
eor v0.16b, v0.16b, v31.16b

cbnz w8, 3b
- cond_yield 3f, x8
+ cond_yield 4f, x8, x9
cbnz w2, 0b

/* save state */
-3: st1 { v0.1d- v3.1d}, [x0], #32
+4: st1 { v0.1d- v3.1d}, [x0], #32
st1 { v4.1d- v7.1d}, [x0], #32
st1 { v8.1d-v11.1d}, [x0], #32
st1 {v12.1d-v15.1d}, [x0], #32
diff --git a/arch/arm64/crypto/sha512-ce-core.S b/arch/arm64/crypto/sha512-ce-core.S
index d6e7f6c95fa6..b6a3a36e15f5 100644
--- a/arch/arm64/crypto/sha512-ce-core.S
+++ b/arch/arm64/crypto/sha512-ce-core.S
@@ -195,7 +195,7 @@ CPU_LE( rev64 v19.16b, v19.16b )
add v10.2d, v10.2d, v2.2d
add v11.2d, v11.2d, v3.2d

- cond_yield 3f, x4
+ cond_yield 3f, x4, x5
/* handled all input blocks? */
cbnz w2, 0b

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 7b076ccd1a54..6ac38f7cf824 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -15,6 +15,7 @@
#include <asm-generic/export.h>

#include <asm/asm-offsets.h>
+#include <asm/alternative.h>
#include <asm/cpufeature.h>
#include <asm/cputype.h>
#include <asm/debug-monitors.h>
@@ -701,19 +702,32 @@ USER(\label, ic ivau, \tmp2) // invalidate I line PoU
.endm

/*
- * Check whether preempt-disabled code should yield as soon as it
- * is able. This is the case if re-enabling preemption a single
- * time results in a preempt count of zero, and the TIF_NEED_RESCHED
- * flag is set. (Note that the latter is stored negated in the
- * top word of the thread_info::preempt_count field)
+ * Check whether preempt/bh-disabled asm code should yield as soon as
+ * it is able. This is the case if we are currently running in task
+ * context, and either a softirq is pending, or the TIF_NEED_RESCHED
+ * flag is set and re-enabling preemption a single time would result in
+ * a preempt count of zero. (Note that the TIF_NEED_RESCHED flag is
+ * stored negated in the top word of the thread_info::preempt_count
+ * field)
*/
- .macro cond_yield, lbl:req, tmp:req
-#ifdef CONFIG_PREEMPTION
+ .macro cond_yield, lbl:req, tmp:req, tmp2:req
get_current_task \tmp
ldr \tmp, [\tmp, #TSK_TI_PREEMPT]
+ /*
+ * If we are serving a softirq, there is no point in yielding: the
+ * softirq will not be preempted no matter what we do, so we should
+ * run to completion as quickly as we can.
+ */
+ tbnz \tmp, #SOFTIRQ_SHIFT, .Lnoyield_\@
+#ifdef CONFIG_PREEMPTION
sub \tmp, \tmp, #PREEMPT_DISABLE_OFFSET
cbz \tmp, \lbl
#endif
+ adr_l \tmp, irq_stat + IRQ_CPUSTAT_SOFTIRQ_PENDING
+ this_cpu_offset \tmp2
+ ldr w\tmp, [\tmp, \tmp2]
+ cbnz w\tmp, \lbl // yield on pending softirq in task context
+.Lnoyield_\@:
.endm

/*
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index a36e2fc330d4..cc7267a24bf7 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -95,6 +95,8 @@ int main(void)
DEFINE(DMA_FROM_DEVICE, DMA_FROM_DEVICE);
BLANK();
DEFINE(PREEMPT_DISABLE_OFFSET, PREEMPT_DISABLE_OFFSET);
+ DEFINE(SOFTIRQ_SHIFT, SOFTIRQ_SHIFT);
+ DEFINE(IRQ_CPUSTAT_SOFTIRQ_PENDING, offsetof(irq_cpustat_t, __softirq_pending));
BLANK();
DEFINE(CPU_BOOT_STACK, offsetof(struct secondary_data, stack));
DEFINE(CPU_BOOT_TASK, offsetof(struct secondary_data, task));
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 062b21f30f94..823e3a8a8871 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -180,7 +180,7 @@ static void __get_cpu_fpsimd_context(void)
*/
static void get_cpu_fpsimd_context(void)
{
- preempt_disable();
+ local_bh_disable();
__get_cpu_fpsimd_context();
}

@@ -201,7 +201,7 @@ static void __put_cpu_fpsimd_context(void)
static void put_cpu_fpsimd_context(void)
{
__put_cpu_fpsimd_context();
- preempt_enable();
+ local_bh_enable();
}

static bool have_cpu_fpsimd_context(void)
--
2.30.1

2021-03-04 07:26:37

by Ard Biesheuvel

Subject: [PATCH v2 6/9] crypto: arm64/gcm-aes-ce - remove non-SIMD fallback path

Now that kernel mode SIMD is guaranteed to be available when executing
in task or softirq context, we no longer need scalar fallbacks to use
when the NEON is unavailable. So get rid of them.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/arm64/crypto/ghash-ce-glue.c | 209 +++++---------------
1 file changed, 51 insertions(+), 158 deletions(-)

diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
index 720cd3a58da3..15794fe21a0b 100644
--- a/arch/arm64/crypto/ghash-ce-glue.c
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -362,84 +362,36 @@ static int gcm_encrypt(struct aead_request *req)

err = skcipher_walk_aead_encrypt(&walk, req, false);

- if (likely(crypto_simd_usable())) {
- do {
- const u8 *src = walk.src.virt.addr;
- u8 *dst = walk.dst.virt.addr;
- int nbytes = walk.nbytes;
-
- tag = (u8 *)&lengths;
-
- if (unlikely(nbytes > 0 && nbytes < AES_BLOCK_SIZE)) {
- src = dst = memcpy(buf + sizeof(buf) - nbytes,
- src, nbytes);
- } else if (nbytes < walk.total) {
- nbytes &= ~(AES_BLOCK_SIZE - 1);
- tag = NULL;
- }
-
- kernel_neon_begin();
- pmull_gcm_encrypt(nbytes, dst, src, ctx->ghash_key.h,
- dg, iv, ctx->aes_key.key_enc, nrounds,
- tag);
- kernel_neon_end();
-
- if (unlikely(!nbytes))
- break;
-
- if (unlikely(nbytes > 0 && nbytes < AES_BLOCK_SIZE))
- memcpy(walk.dst.virt.addr,
- buf + sizeof(buf) - nbytes, nbytes);
-
- err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
- } while (walk.nbytes);
- } else {
- while (walk.nbytes >= AES_BLOCK_SIZE) {
- int blocks = walk.nbytes / AES_BLOCK_SIZE;
- const u8 *src = walk.src.virt.addr;
- u8 *dst = walk.dst.virt.addr;
- int remaining = blocks;
-
- do {
- aes_encrypt(&ctx->aes_key, buf, iv);
- crypto_xor_cpy(dst, src, buf, AES_BLOCK_SIZE);
- crypto_inc(iv, AES_BLOCK_SIZE);
-
- dst += AES_BLOCK_SIZE;
- src += AES_BLOCK_SIZE;
- } while (--remaining > 0);
-
- ghash_do_update(blocks, dg, walk.dst.virt.addr,
- &ctx->ghash_key, NULL);
-
- err = skcipher_walk_done(&walk,
- walk.nbytes % AES_BLOCK_SIZE);
- }
-
- /* handle the tail */
- if (walk.nbytes) {
- aes_encrypt(&ctx->aes_key, buf, iv);
+ do {
+ const u8 *src = walk.src.virt.addr;
+ u8 *dst = walk.dst.virt.addr;
+ int nbytes = walk.nbytes;

- crypto_xor_cpy(walk.dst.virt.addr, walk.src.virt.addr,
- buf, walk.nbytes);
+ tag = (u8 *)&lengths;

- memcpy(buf, walk.dst.virt.addr, walk.nbytes);
- memset(buf + walk.nbytes, 0, sizeof(buf) - walk.nbytes);
+ if (unlikely(nbytes > 0 && nbytes < AES_BLOCK_SIZE)) {
+ src = dst = memcpy(buf + sizeof(buf) - nbytes,
+ src, nbytes);
+ } else if (nbytes < walk.total) {
+ nbytes &= ~(AES_BLOCK_SIZE - 1);
+ tag = NULL;
}

- tag = (u8 *)&lengths;
- ghash_do_update(1, dg, tag, &ctx->ghash_key,
- walk.nbytes ? buf : NULL);
+ kernel_neon_begin();
+ pmull_gcm_encrypt(nbytes, dst, src, ctx->ghash_key.h,
+ dg, iv, ctx->aes_key.key_enc, nrounds,
+ tag);
+ kernel_neon_end();

- if (walk.nbytes)
- err = skcipher_walk_done(&walk, 0);
+ if (unlikely(!nbytes))
+ break;

- put_unaligned_be64(dg[1], tag);
- put_unaligned_be64(dg[0], tag + 8);
- put_unaligned_be32(1, iv + GCM_IV_SIZE);
- aes_encrypt(&ctx->aes_key, iv, iv);
- crypto_xor(tag, iv, AES_BLOCK_SIZE);
- }
+ if (unlikely(nbytes > 0 && nbytes < AES_BLOCK_SIZE))
+ memcpy(walk.dst.virt.addr,
+ buf + sizeof(buf) - nbytes, nbytes);
+
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+ } while (walk.nbytes);

if (err)
return err;
@@ -464,6 +416,7 @@ static int gcm_decrypt(struct aead_request *req)
u64 dg[2] = {};
be128 lengths;
u8 *tag;
+ int ret;
int err;

lengths.a = cpu_to_be64(req->assoclen * 8);
@@ -481,101 +434,41 @@ static int gcm_decrypt(struct aead_request *req)

err = skcipher_walk_aead_decrypt(&walk, req, false);

- if (likely(crypto_simd_usable())) {
- int ret;
-
- do {
- const u8 *src = walk.src.virt.addr;
- u8 *dst = walk.dst.virt.addr;
- int nbytes = walk.nbytes;
-
- tag = (u8 *)&lengths;
-
- if (unlikely(nbytes > 0 && nbytes < AES_BLOCK_SIZE)) {
- src = dst = memcpy(buf + sizeof(buf) - nbytes,
- src, nbytes);
- } else if (nbytes < walk.total) {
- nbytes &= ~(AES_BLOCK_SIZE - 1);
- tag = NULL;
- }
-
- kernel_neon_begin();
- ret = pmull_gcm_decrypt(nbytes, dst, src,
- ctx->ghash_key.h,
- dg, iv, ctx->aes_key.key_enc,
- nrounds, tag, otag, authsize);
- kernel_neon_end();
-
- if (unlikely(!nbytes))
- break;
-
- if (unlikely(nbytes > 0 && nbytes < AES_BLOCK_SIZE))
- memcpy(walk.dst.virt.addr,
- buf + sizeof(buf) - nbytes, nbytes);
-
- err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
- } while (walk.nbytes);
-
- if (err)
- return err;
- if (ret)
- return -EBADMSG;
- } else {
- while (walk.nbytes >= AES_BLOCK_SIZE) {
- int blocks = walk.nbytes / AES_BLOCK_SIZE;
- const u8 *src = walk.src.virt.addr;
- u8 *dst = walk.dst.virt.addr;
-
- ghash_do_update(blocks, dg, walk.src.virt.addr,
- &ctx->ghash_key, NULL);
-
- do {
- aes_encrypt(&ctx->aes_key, buf, iv);
- crypto_xor_cpy(dst, src, buf, AES_BLOCK_SIZE);
- crypto_inc(iv, AES_BLOCK_SIZE);
-
- dst += AES_BLOCK_SIZE;
- src += AES_BLOCK_SIZE;
- } while (--blocks > 0);
+ do {
+ const u8 *src = walk.src.virt.addr;
+ u8 *dst = walk.dst.virt.addr;
+ int nbytes = walk.nbytes;

- err = skcipher_walk_done(&walk,
- walk.nbytes % AES_BLOCK_SIZE);
- }
+ tag = (u8 *)&lengths;

- /* handle the tail */
- if (walk.nbytes) {
- memcpy(buf, walk.src.virt.addr, walk.nbytes);
- memset(buf + walk.nbytes, 0, sizeof(buf) - walk.nbytes);
+ if (unlikely(nbytes > 0 && nbytes < AES_BLOCK_SIZE)) {
+ src = dst = memcpy(buf + sizeof(buf) - nbytes,
+ src, nbytes);
+ } else if (nbytes < walk.total) {
+ nbytes &= ~(AES_BLOCK_SIZE - 1);
+ tag = NULL;
}

- tag = (u8 *)&lengths;
- ghash_do_update(1, dg, tag, &ctx->ghash_key,
- walk.nbytes ? buf : NULL);
-
- if (walk.nbytes) {
- aes_encrypt(&ctx->aes_key, buf, iv);
+ kernel_neon_begin();
+ ret = pmull_gcm_decrypt(nbytes, dst, src, ctx->ghash_key.h,
+ dg, iv, ctx->aes_key.key_enc,
+ nrounds, tag, otag, authsize);
+ kernel_neon_end();

- crypto_xor_cpy(walk.dst.virt.addr, walk.src.virt.addr,
- buf, walk.nbytes);
+ if (unlikely(!nbytes))
+ break;

- err = skcipher_walk_done(&walk, 0);
- }
+ if (unlikely(nbytes > 0 && nbytes < AES_BLOCK_SIZE))
+ memcpy(walk.dst.virt.addr,
+ buf + sizeof(buf) - nbytes, nbytes);

- if (err)
- return err;
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+ } while (walk.nbytes);

- put_unaligned_be64(dg[1], tag);
- put_unaligned_be64(dg[0], tag + 8);
- put_unaligned_be32(1, iv + GCM_IV_SIZE);
- aes_encrypt(&ctx->aes_key, iv, iv);
- crypto_xor(tag, iv, AES_BLOCK_SIZE);
+ if (err)
+ return err;

- if (crypto_memneq(tag, otag, authsize)) {
- memzero_explicit(tag, AES_BLOCK_SIZE);
- return -EBADMSG;
- }
- }
- return 0;
+ return ret ? -EBADMSG : 0;
}

static struct aead_alg gcm_aes_alg = {
--
2.30.1

2021-03-04 07:26:40

by Ard Biesheuvel

Subject: [PATCH v2 4/9] crypto: aead - disallow en/decrypt for non-task or non-softirq context

In order to ensure that kernel mode SIMD routines will not need a scalar
fallback if they run with softirqs disabled, disallow any use of the
AEAD encrypt and decrypt routines from outside of task or softirq context.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
crypto/aead.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/crypto/aead.c b/crypto/aead.c
index 16991095270d..b5304b3d3314 100644
--- a/crypto/aead.c
+++ b/crypto/aead.c
@@ -87,6 +87,11 @@ int crypto_aead_encrypt(struct aead_request *req)
unsigned int cryptlen = req->cryptlen;
int ret;

+ if (!(alg->cra_flags & CRYPTO_ALG_ASYNC) &&
+ WARN_ONCE(!in_task() && !in_serving_softirq(),
+ "synchronous call from invalid context\n"))
+ return -EBUSY;
+
crypto_stats_get(alg);
if (crypto_aead_get_flags(aead) & CRYPTO_TFM_NEED_KEY)
ret = -ENOKEY;
@@ -104,6 +109,11 @@ int crypto_aead_decrypt(struct aead_request *req)
unsigned int cryptlen = req->cryptlen;
int ret;

+ if (!(alg->cra_flags & CRYPTO_ALG_ASYNC) &&
+ WARN_ONCE(!in_task() && !in_serving_softirq(),
+ "synchronous call from invalid context\n"))
+ return -EBUSY;
+
crypto_stats_get(alg);
if (crypto_aead_get_flags(aead) & CRYPTO_TFM_NEED_KEY)
ret = -ENOKEY;
--
2.30.1

2021-03-04 07:26:41

by Ard Biesheuvel

Subject: [PATCH v2 7/9] crypto: arm64/aes-ccm - remove non-SIMD fallback path

AES/CCM on arm64 is implemented as a synchronous AEAD, and so it is
guaranteed by the API that it is only invoked in task or softirq
context. Since softirqs are now only handled when the SIMD is not
being used in the task context that was interrupted to service the
softirq, we no longer need a fallback path. Let's remove it.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/arm64/crypto/aes-ce-ccm-glue.c | 151 ++++----------------
1 file changed, 30 insertions(+), 121 deletions(-)

diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
index f6d19b0dc893..e6a7243825a2 100644
--- a/arch/arm64/crypto/aes-ce-ccm-glue.c
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -99,36 +99,10 @@ static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[],
u32 abytes, u32 *macp)
{
- if (crypto_simd_usable()) {
- kernel_neon_begin();
- ce_aes_ccm_auth_data(mac, in, abytes, macp, key->key_enc,
- num_rounds(key));
- kernel_neon_end();
- } else {
- if (*macp > 0 && *macp < AES_BLOCK_SIZE) {
- int added = min(abytes, AES_BLOCK_SIZE - *macp);
-
- crypto_xor(&mac[*macp], in, added);
-
- *macp += added;
- in += added;
- abytes -= added;
- }
-
- while (abytes >= AES_BLOCK_SIZE) {
- aes_encrypt(key, mac, mac);
- crypto_xor(mac, in, AES_BLOCK_SIZE);
-
- in += AES_BLOCK_SIZE;
- abytes -= AES_BLOCK_SIZE;
- }
-
- if (abytes > 0) {
- aes_encrypt(key, mac, mac);
- crypto_xor(mac, in, abytes);
- *macp = abytes;
- }
- }
+ kernel_neon_begin();
+ ce_aes_ccm_auth_data(mac, in, abytes, macp, key->key_enc,
+ num_rounds(key));
+ kernel_neon_end();
}

static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
@@ -171,54 +145,6 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
} while (len);
}

-static int ccm_crypt_fallback(struct skcipher_walk *walk, u8 mac[], u8 iv0[],
- struct crypto_aes_ctx *ctx, bool enc)
-{
- u8 buf[AES_BLOCK_SIZE];
- int err = 0;
-
- while (walk->nbytes) {
- int blocks = walk->nbytes / AES_BLOCK_SIZE;
- u32 tail = walk->nbytes % AES_BLOCK_SIZE;
- u8 *dst = walk->dst.virt.addr;
- u8 *src = walk->src.virt.addr;
- u32 nbytes = walk->nbytes;
-
- if (nbytes == walk->total && tail > 0) {
- blocks++;
- tail = 0;
- }
-
- do {
- u32 bsize = AES_BLOCK_SIZE;
-
- if (nbytes < AES_BLOCK_SIZE)
- bsize = nbytes;
-
- crypto_inc(walk->iv, AES_BLOCK_SIZE);
- aes_encrypt(ctx, buf, walk->iv);
- aes_encrypt(ctx, mac, mac);
- if (enc)
- crypto_xor(mac, src, bsize);
- crypto_xor_cpy(dst, src, buf, bsize);
- if (!enc)
- crypto_xor(mac, dst, bsize);
- dst += bsize;
- src += bsize;
- nbytes -= bsize;
- } while (--blocks);
-
- err = skcipher_walk_done(walk, tail);
- }
-
- if (!err) {
- aes_encrypt(ctx, buf, iv0);
- aes_encrypt(ctx, mac, mac);
- crypto_xor(mac, buf, AES_BLOCK_SIZE);
- }
- return err;
-}
-
static int ccm_encrypt(struct aead_request *req)
{
struct crypto_aead *aead = crypto_aead_reqtfm(req);
@@ -241,30 +167,22 @@ static int ccm_encrypt(struct aead_request *req)

err = skcipher_walk_aead_encrypt(&walk, req, false);

- if (crypto_simd_usable()) {
- while (walk.nbytes) {
- u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+ while (walk.nbytes) {
+ u32 tail = walk.nbytes % AES_BLOCK_SIZE;

- if (walk.nbytes == walk.total)
- tail = 0;
+ if (walk.nbytes == walk.total)
+ tail = 0;

- kernel_neon_begin();
- ce_aes_ccm_encrypt(walk.dst.virt.addr,
- walk.src.virt.addr,
- walk.nbytes - tail, ctx->key_enc,
- num_rounds(ctx), mac, walk.iv);
- kernel_neon_end();
+ kernel_neon_begin();
+ ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+ walk.nbytes - tail, ctx->key_enc,
+ num_rounds(ctx), mac, walk.iv);

- err = skcipher_walk_done(&walk, tail);
- }
- if (!err) {
- kernel_neon_begin();
- ce_aes_ccm_final(mac, buf, ctx->key_enc,
- num_rounds(ctx));
- kernel_neon_end();
- }
- } else {
- err = ccm_crypt_fallback(&walk, mac, buf, ctx, true);
+ if (walk.nbytes == walk.total)
+ ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+ kernel_neon_end();
+
+ err = skcipher_walk_done(&walk, tail);
}
if (err)
return err;
@@ -299,32 +217,23 @@ static int ccm_decrypt(struct aead_request *req)

err = skcipher_walk_aead_decrypt(&walk, req, false);

- if (crypto_simd_usable()) {
- while (walk.nbytes) {
- u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+ while (walk.nbytes) {
+ u32 tail = walk.nbytes % AES_BLOCK_SIZE;

- if (walk.nbytes == walk.total)
- tail = 0;
+ if (walk.nbytes == walk.total)
+ tail = 0;

- kernel_neon_begin();
- ce_aes_ccm_decrypt(walk.dst.virt.addr,
- walk.src.virt.addr,
- walk.nbytes - tail, ctx->key_enc,
- num_rounds(ctx), mac, walk.iv);
- kernel_neon_end();
+ kernel_neon_begin();
+ ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
+ walk.nbytes - tail, ctx->key_enc,
+ num_rounds(ctx), mac, walk.iv);

- err = skcipher_walk_done(&walk, tail);
- }
- if (!err) {
- kernel_neon_begin();
- ce_aes_ccm_final(mac, buf, ctx->key_enc,
- num_rounds(ctx));
- kernel_neon_end();
- }
- } else {
- err = ccm_crypt_fallback(&walk, mac, buf, ctx, false);
- }
+ if (walk.nbytes == walk.total)
+ ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+ kernel_neon_end();

+ err = skcipher_walk_done(&walk, tail);
+ }
if (err)
return err;

--
2.30.1

2021-03-04 07:27:26

by Ard Biesheuvel

Subject: [PATCH v2 1/9] arm64: assembler: remove conditional NEON yield macros

The users of the conditional NEON yield macros have all been switched to
the simplified cond_yield macro, and so the NEON specific ones can be
removed.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/arm64/include/asm/assembler.h | 70 --------------------
1 file changed, 70 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index ca31594d3d6c..e0fc1d424f9b 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -692,76 +692,6 @@ USER(\label, ic ivau, \tmp2) // invalidate I line PoU
isb
.endm

-/*
- * Check whether to yield to another runnable task from kernel mode NEON code
- * (which runs with preemption disabled).
- *
- * if_will_cond_yield_neon
- * // pre-yield patchup code
- * do_cond_yield_neon
- * // post-yield patchup code
- * endif_yield_neon <label>
- *
- * where <label> is optional, and marks the point where execution will resume
- * after a yield has been performed. If omitted, execution resumes right after
- * the endif_yield_neon invocation. Note that the entire sequence, including
- * the provided patchup code, will be omitted from the image if
- * CONFIG_PREEMPTION is not defined.
- *
- * As a convenience, in the case where no patchup code is required, the above
- * sequence may be abbreviated to
- *
- * cond_yield_neon <label>
- *
- * Note that the patchup code does not support assembler directives that change
- * the output section, any use of such directives is undefined.
- *
- * The yield itself consists of the following:
- * - Check whether the preempt count is exactly 1 and a reschedule is also
- * needed. If so, calling of preempt_enable() in kernel_neon_end() will
- * trigger a reschedule. If it is not the case, yielding is pointless.
- * - Disable and re-enable kernel mode NEON, and branch to the yield fixup
- * code.
- *
- * This macro sequence may clobber all CPU state that is not guaranteed by the
- * AAPCS to be preserved across an ordinary function call.
- */
-
- .macro cond_yield_neon, lbl
- if_will_cond_yield_neon
- do_cond_yield_neon
- endif_yield_neon \lbl
- .endm
-
- .macro if_will_cond_yield_neon
-#ifdef CONFIG_PREEMPTION
- get_current_task x0
- ldr x0, [x0, #TSK_TI_PREEMPT]
- sub x0, x0, #PREEMPT_DISABLE_OFFSET
- cbz x0, .Lyield_\@
- /* fall through to endif_yield_neon */
- .subsection 1
-.Lyield_\@ :
-#else
- .section ".discard.cond_yield_neon", "ax"
-#endif
- .endm
-
- .macro do_cond_yield_neon
- bl kernel_neon_end
- bl kernel_neon_begin
- .endm
-
- .macro endif_yield_neon, lbl
- .ifnb \lbl
- b \lbl
- .else
- b .Lyield_out_\@
- .endif
- .previous
-.Lyield_out_\@ :
- .endm
-
/*
* Check whether preempt-disabled code should yield as soon as it
* is able. This is the case if re-enabling preemption a single
--
2.30.1

2021-03-04 07:27:26

by Ard Biesheuvel

Subject: [PATCH v2 5/9] crypto: skcipher - disallow en/decrypt for non-task or non-softirq context

In order to ensure that kernel mode SIMD routines will not need a scalar
fallback if they run with softirqs disabled, disallow any use of the
skcipher encrypt and decrypt routines from outside of task or softirq
context.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
crypto/skcipher.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index a15376245416..d1a2ba6eacbe 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -623,6 +623,11 @@ int crypto_skcipher_encrypt(struct skcipher_request *req)
unsigned int cryptlen = req->cryptlen;
int ret;

+ if (!(alg->cra_flags & CRYPTO_ALG_ASYNC) &&
+ WARN_ONCE(!in_task() && !in_serving_softirq(),
+ "synchronous call from invalid context\n"))
+ return -EBUSY;
+
crypto_stats_get(alg);
if (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
ret = -ENOKEY;
@@ -640,6 +645,11 @@ int crypto_skcipher_decrypt(struct skcipher_request *req)
unsigned int cryptlen = req->cryptlen;
int ret;

+ if (!(alg->cra_flags & CRYPTO_ALG_ASYNC) &&
+ WARN_ONCE(!in_task() && !in_serving_softirq(),
+ "synchronous call from invalid context\n"))
+ return -EBUSY;
+
crypto_stats_get(alg);
if (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
ret = -ENOKEY;
--
2.30.1

2021-03-30 09:54:53

by Will Deacon

Subject: Re: [PATCH v2 1/9] arm64: assembler: remove conditional NEON yield macros

On Tue, Mar 02, 2021 at 10:01:10AM +0100, Ard Biesheuvel wrote:
> The users of the conditional NEON yield macros have all been switched to
> the simplified cond_yield macro, and so the NEON specific ones can be
> removed.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> arch/arm64/include/asm/assembler.h | 70 --------------------
> 1 file changed, 70 deletions(-)

Acked-by: Will Deacon <[email protected]>

Will

2021-03-30 10:37:38

by Will Deacon

Subject: Re: [PATCH v2 3/9] arm64: fpsimd: run kernel mode NEON with softirqs disabled

On Tue, Mar 02, 2021 at 10:01:12AM +0100, Ard Biesheuvel wrote:
> Kernel mode NEON can be used in task or softirq context, but only in
> a non-nesting manner, i.e., softirq context is only permitted if the
> interrupt was not taken at a point where the kernel was using the NEON
> in task context.
>
> This means all users of kernel mode NEON have to be aware of this
> limitation, and either need to provide scalar fallbacks that may be much
> slower (up to 20x for AES instructions) and potentially less safe, or
> use an asynchronous interface that defers processing to a later time
> when the NEON is guaranteed to be available.
>
> Given that grabbing and releasing the NEON is cheap, we can relax this
> restriction, by increasing the granularity of kernel mode NEON code, and
> always disabling softirq processing while the NEON is being used in task
> context.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> arch/arm64/crypto/aes-modes.S | 2 +-
> arch/arm64/crypto/sha1-ce-core.S | 2 +-
> arch/arm64/crypto/sha2-ce-core.S | 2 +-
> arch/arm64/crypto/sha3-ce-core.S | 4 +--
> arch/arm64/crypto/sha512-ce-core.S | 2 +-
> arch/arm64/include/asm/assembler.h | 28 +++++++++++++++-----
> arch/arm64/kernel/asm-offsets.c | 2 ++
> arch/arm64/kernel/fpsimd.c | 4 +--
> 8 files changed, 31 insertions(+), 15 deletions(-)

Acked-by: Will Deacon <[email protected]>

Will

2021-04-12 08:40:58

by Ard Biesheuvel

Subject: Re: [PATCH v2 1/9] arm64: assembler: remove conditional NEON yield macros

On Tue, 30 Mar 2021 at 11:52, Will Deacon <[email protected]> wrote:
>
> On Tue, Mar 02, 2021 at 10:01:10AM +0100, Ard Biesheuvel wrote:
> > The users of the conditional NEON yield macros have all been switched to
> > the simplified cond_yield macro, and so the NEON specific ones can be
> > removed.
> >
> > Signed-off-by: Ard Biesheuvel <[email protected]>
> > ---
> > arch/arm64/include/asm/assembler.h | 70 --------------------
> > 1 file changed, 70 deletions(-)
>
> Acked-by: Will Deacon <[email protected]>
>

Thanks.

Catalin,

Please consider taking the first 3 patches of this series through the
arm64 tree. I will then resubmit the rest to be taken via the crypto
tree for the next cycle.

Thanks,
Ard.

2021-04-12 13:13:18

by Catalin Marinas

Subject: Re: (subset) [PATCH v2 0/9] running kernel mode SIMD with softirqs disabled

On Tue, 2 Mar 2021 10:01:09 +0100, Ard Biesheuvel wrote:
> [ TL;DR for the non-ARM folks on CC: disabling softirq processing when using
> SIMD in kernel mode could reduce complexity and improve performance, but we
> need to decide whether we can do this, and how much softirq processing
> latency we can tolerate. If we can find a satisfactory solution for this,
> we might do the same for x86 and 32-bit ARM as well.
>
> However, based on preliminary off-list discussions with peterz and luto, it
> seems that for x86, there is a preference for using per-CPU buffers to
> preserve/restore the task context's kernel mode SIMD state when the task is
> interrupted to perform kernel mode SIMD in softirq context. On arm64, we
> actually had this arrangement before, and removed it because it made
> reasoning about preserving/restoring userland SVE state (32 SIMD registers
> of up to 2 kbit in size) rather complex. ]
>
> [...]

Applied to arm64 (for-next/neon-softirqs-disabled), thanks!

[1/9] arm64: assembler: remove conditional NEON yield macros
https://git.kernel.org/arm64/c/27248fe1abb2
[2/9] arm64: assembler: introduce wxN aliases for wN registers
https://git.kernel.org/arm64/c/4c4dcd3541f8
[3/9] arm64: fpsimd: run kernel mode NEON with softirqs disabled
https://git.kernel.org/arm64/c/13150149aa6d

--
Catalin