2023-12-08 12:37:08

by Ard Biesheuvel

Subject: [PATCH v4 0/4] arm64: Run kernel mode NEON with preemption enabled

From: Ard Biesheuvel <[email protected]>

Currently, kernel mode NEON (SIMD) support is implemented in a way that
requires preemption to be disabled while the SIMD registers are live.
The reason for this is that those registers are not in the set that is
preserved/restored on exception entry/exit and context switch, as this
would impact performance generally, even for workloads where kernel mode
SIMD is not the bottleneck.

However, doing substantial work with preemption disabled is not great,
as it affects scheduling latency, which is especially problematic for
real-time use cases. So ideally, we should keep preemption enabled when
we can, and find another way to ensure that this does not corrupt the
NEON register state of in-kernel SIMD users.

This series implements a suggestion by Mark Rutland, and introduces a
thread_info flag TIF_KERNEL_FPSTATE, which indicates to the thread
switch machinery that the task in question has live kernel mode SIMD
state which needs to be preserved and restored. The space needed for
this is allocated in thread_struct. (*)

Given that currently, we run kernel mode NEON with softirqs disabled (to
avoid the need for preserving kernel mode NEON context belonging to task
context while the SIMD unit is being used by code running in softirq
context), just removing the preempt_disable/enable calls is not
sufficient, and we also need to leave softirqs enabled. This means that
we may need to preserve kernel mode NEON state not only on a context
switch, but also when code running in softirq context takes ownership of
the SIMD unit, but this is straightforward once we add the scratch
space to thread_struct. (On PREEMPT_RT, softirqs execute with preemption
enabled, making kernel mode FPSIMD in softirq context preemptible as
well. We rely on the fact that the task that hosts the softirq dispatch
logic does not itself use kernel mode FPSIMD in task context to ensure
that there is only a single kernel mode FPSIMD state that may need to be
preserved and restored.)

(*) We might decide to allocate this space (~512 bytes) dynamically, if
the thread_struct memory footprint causes issues. However, we should
also explore doing the same for the user space FPSIMD state, as kernel
threads never return to user space and have no need for this allocation.

v4:
- for the time being, make the existing yield logic depend on
CONFIG_PREEMPT_VOLUNTARY, so it can be retired once the prerequisite
core preempt changes (which remove CONFIG_PREEMPT_VOLUNTARY) have been
merged [0]
- incorporate feedback from Mark Rutland and include his acks

v3:
- add patch to drop yield logic from crypto C glue code
- add R-b from Mark

v2:
- tweak some commit logs for clarity
- integrate with the existing lazy restore logic
- add Mark's R-b to patch #1

[0] https://lore.kernel.org/lkml/[email protected]/

Cc: Marc Zyngier <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>

Ard Biesheuvel (4):
arm64: fpsimd: Drop unneeded 'busy' flag
arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
arm64: crypto: Disable yielding logic unless
CONFIG_PREEMPT_VOLUNTARY=y

arch/arm64/crypto/aes-ce-ccm-glue.c | 8 +-
arch/arm64/crypto/chacha-neon-glue.c | 5 +-
arch/arm64/crypto/crct10dif-ce-glue.c | 6 +-
arch/arm64/crypto/nhpoly1305-neon-glue.c | 5 +-
arch/arm64/crypto/poly1305-glue.c | 5 +-
arch/arm64/crypto/polyval-ce-glue.c | 9 +-
arch/arm64/include/asm/assembler.h | 4 +-
arch/arm64/include/asm/processor.h | 3 +
arch/arm64/include/asm/simd.h | 11 +-
arch/arm64/include/asm/thread_info.h | 1 +
arch/arm64/kernel/fpsimd.c | 163 +++++++++++++-------
11 files changed, 140 insertions(+), 80 deletions(-)

--
2.43.0.472.g3155946c3a-goog



2023-12-08 12:37:19

by Ard Biesheuvel

Subject: [PATCH v4 1/4] arm64: fpsimd: Drop unneeded 'busy' flag

From: Ard Biesheuvel <[email protected]>

Kernel mode NEON will preserve the user mode FPSIMD state by saving it
into the task struct before clobbering the registers. In order to avoid
the need for preserving kernel mode state too, we disallow nested use of
kernel mode NEON, i.e., use in softirq context while the interrupted
task context was using kernel mode NEON too.

Originally, this policy was implemented using a per-CPU flag which was
exposed via may_use_simd(), requiring the users of the kernel mode NEON
to deal with the possibility that it might return false, and having NEON
and non-NEON code paths. This policy was changed by commit
13150149aa6ded1 ("arm64: fpsimd: run kernel mode NEON with softirqs
disabled"), and now, softirq processing is disabled entirely instead,
and so may_use_simd() can never fail when called from task or softirq
context.

This means we can drop the fpsimd_context_busy flag entirely, and
instead, ensure that we disable softirq processing in places where we
formerly relied on the flag for preventing races in the FPSIMD preserve
routines.

Signed-off-by: Ard Biesheuvel <[email protected]>
Reviewed-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/simd.h | 11 +---
arch/arm64/kernel/fpsimd.c | 53 +++++---------------
2 files changed, 13 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
index 6a75d7ecdcaa..8e86c9e70e48 100644
--- a/arch/arm64/include/asm/simd.h
+++ b/arch/arm64/include/asm/simd.h
@@ -12,8 +12,6 @@
#include <linux/preempt.h>
#include <linux/types.h>

-DECLARE_PER_CPU(bool, fpsimd_context_busy);
-
#ifdef CONFIG_KERNEL_MODE_NEON

/*
@@ -28,17 +26,10 @@ static __must_check inline bool may_use_simd(void)
/*
* We must make sure that the SVE has been initialized properly
* before using the SIMD in kernel.
- * fpsimd_context_busy is only set while preemption is disabled,
- * and is clear whenever preemption is enabled. Since
- * this_cpu_read() is atomic w.r.t. preemption, fpsimd_context_busy
- * cannot change under our feet -- if it's set we cannot be
- * migrated, and if it's clear we cannot be migrated to a CPU
- * where it is set.
*/
return !WARN_ON(!system_capabilities_finalized()) &&
system_supports_fpsimd() &&
- !in_hardirq() && !irqs_disabled() && !in_nmi() &&
- !this_cpu_read(fpsimd_context_busy);
+ !in_hardirq() && !irqs_disabled() && !in_nmi();
}

#else /* ! CONFIG_KERNEL_MODE_NEON */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 1559c706d32d..ccc4a78a70e4 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -85,13 +85,13 @@
* softirq kicks in. Upon vcpu_put(), KVM will save the vcpu FP state and
* flag the register state as invalid.
*
- * In order to allow softirq handlers to use FPSIMD, kernel_neon_begin() may
- * save the task's FPSIMD context back to task_struct from softirq context.
- * To prevent this from racing with the manipulation of the task's FPSIMD state
- * from task context and thereby corrupting the state, it is necessary to
- * protect any manipulation of a task's fpsimd_state or TIF_FOREIGN_FPSTATE
- * flag with {, __}get_cpu_fpsimd_context(). This will still allow softirqs to
- * run but prevent them to use FPSIMD.
+ * In order to allow softirq handlers to use FPSIMD, kernel_neon_begin() may be
+ * called from softirq context, which will save the task's FPSIMD context back
+ * to task_struct. To prevent this from racing with the manipulation of the
+ * task's FPSIMD state from task context and thereby corrupting the state, it
+ * is necessary to protect any manipulation of a task's fpsimd_state or
+ * TIF_FOREIGN_FPSTATE flag with get_cpu_fpsimd_context(), which will suspend
+ * softirq servicing entirely until put_cpu_fpsimd_context() is called.
*
* For a certain task, the sequence may look something like this:
* - the task gets scheduled in; if both the task's fpsimd_cpu field
@@ -209,27 +209,14 @@ static inline void sme_free(struct task_struct *t) { }

#endif

-DEFINE_PER_CPU(bool, fpsimd_context_busy);
-EXPORT_PER_CPU_SYMBOL(fpsimd_context_busy);
-
static void fpsimd_bind_task_to_cpu(void);

-static void __get_cpu_fpsimd_context(void)
-{
- bool busy = __this_cpu_xchg(fpsimd_context_busy, true);
-
- WARN_ON(busy);
-}
-
/*
* Claim ownership of the CPU FPSIMD context for use by the calling context.
*
* The caller may freely manipulate the FPSIMD context metadata until
* put_cpu_fpsimd_context() is called.
*
- * The double-underscore version must only be called if you know the task
- * can't be preempted.
- *
* On RT kernels local_bh_disable() is not sufficient because it only
* serializes soft interrupt related sections via a local lock, but stays
* preemptible. Disabling preemption is the right choice here as bottom
@@ -242,14 +229,6 @@ static void get_cpu_fpsimd_context(void)
local_bh_disable();
else
preempt_disable();
- __get_cpu_fpsimd_context();
-}
-
-static void __put_cpu_fpsimd_context(void)
-{
- bool busy = __this_cpu_xchg(fpsimd_context_busy, false);
-
- WARN_ON(!busy); /* No matching get_cpu_fpsimd_context()? */
}

/*
@@ -261,18 +240,12 @@ static void __put_cpu_fpsimd_context(void)
*/
static void put_cpu_fpsimd_context(void)
{
- __put_cpu_fpsimd_context();
if (!IS_ENABLED(CONFIG_PREEMPT_RT))
local_bh_enable();
else
preempt_enable();
}

-static bool have_cpu_fpsimd_context(void)
-{
- return !preemptible() && __this_cpu_read(fpsimd_context_busy);
-}
-
unsigned int task_get_vl(const struct task_struct *task, enum vec_type type)
{
return task->thread.vl[type];
@@ -383,7 +356,7 @@ static void task_fpsimd_load(void)
bool restore_ffr;

WARN_ON(!system_supports_fpsimd());
- WARN_ON(!have_cpu_fpsimd_context());
+ WARN_ON(preemptible());

if (system_supports_sve() || system_supports_sme()) {
switch (current->thread.fp_type) {
@@ -467,7 +440,7 @@ static void fpsimd_save(void)
unsigned int vl;

WARN_ON(!system_supports_fpsimd());
- WARN_ON(!have_cpu_fpsimd_context());
+ WARN_ON(preemptible());

if (test_thread_flag(TIF_FOREIGN_FPSTATE))
return;
@@ -1507,7 +1480,7 @@ void fpsimd_thread_switch(struct task_struct *next)
if (!system_supports_fpsimd())
return;

- __get_cpu_fpsimd_context();
+ WARN_ON_ONCE(!irqs_disabled());

/* Save unsaved fpsimd state, if any: */
fpsimd_save();
@@ -1523,8 +1496,6 @@ void fpsimd_thread_switch(struct task_struct *next)

update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
wrong_task || wrong_cpu);
-
- __put_cpu_fpsimd_context();
}

static void fpsimd_flush_thread_vl(enum vec_type type)
@@ -1829,10 +1800,10 @@ void fpsimd_save_and_flush_cpu_state(void)
if (!system_supports_fpsimd())
return;
WARN_ON(preemptible());
- __get_cpu_fpsimd_context();
+ get_cpu_fpsimd_context();
fpsimd_save();
fpsimd_flush_cpu_state();
- __put_cpu_fpsimd_context();
+ put_cpu_fpsimd_context();
}

#ifdef CONFIG_KERNEL_MODE_NEON
--
2.43.0.472.g3155946c3a-goog


2023-12-08 12:37:28

by Ard Biesheuvel

Subject: [PATCH v4 2/4] arm64: fpsimd: Preserve/restore kernel mode NEON at context switch

From: Ard Biesheuvel <[email protected]>

Currently, the FPSIMD register file is not preserved and restored along
with the general registers on exception entry/exit or context switch.
For this reason, we disable preemption when enabling FPSIMD for kernel
mode use in task context, and suspend the processing of softirqs so that
there are no concurrent uses in the kernel. (Kernel mode FPSIMD may not
be used at all in other contexts).

Disabling preemption while doing CPU-intensive work on inputs of
potentially unbounded size is bad for real-time performance, which is
why we try to ensure that SIMD crypto code does not operate on more
than ~4k at a time. This is an arbitrary limit, and one that requires
assembler code to implement efficiently.

We can avoid the need for disabling preemption if we can ensure that any
in-kernel users of the NEON will not lose the FPSIMD register state
across a context switch. And given that disabling softirqs implicitly
disables preemption as well, we will also have to ensure that a softirq
that runs code using FPSIMD can safely interrupt an in-kernel user.

So introduce a thread_info flag TIF_USING_KMODE_FPSIMD, and modify the
context switch hook for FPSIMD to preserve and restore the kernel mode
FPSIMD to/from struct thread_struct when it is set. This avoids any
scheduling blackouts due to prolonged use of FPSIMD in kernel mode,
without the need for manual yielding.

In order to support softirq processing while FPSIMD is being used in
kernel task context, use the same flag to decide whether the kernel mode
FPSIMD state needs to be preserved and restored before allowing FPSIMD
to be used in softirq context.

Signed-off-by: Ard Biesheuvel <[email protected]>
Reviewed-by: Mark Brown <[email protected]>
Reviewed-by: Mark Rutland <[email protected]>
---
arch/arm64/include/asm/processor.h | 2 +
arch/arm64/include/asm/thread_info.h | 1 +
arch/arm64/kernel/fpsimd.c | 92 ++++++++++++++++----
3 files changed, 77 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index e5bc54522e71..ce6eebd6c08b 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -167,6 +167,8 @@ struct thread_struct {
unsigned long fault_address; /* fault info */
unsigned long fault_code; /* ESR_EL1 value */
struct debug_info debug; /* debugging */
+
+ struct user_fpsimd_state kernel_fpsimd_state;
#ifdef CONFIG_ARM64_PTR_AUTH
struct ptrauth_keys_user keys_user;
#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 553d1bc559c6..e72a3bf9e563 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -80,6 +80,7 @@ void arch_setup_new_exec(void);
#define TIF_TAGGED_ADDR 26 /* Allow tagged user addresses */
#define TIF_SME 27 /* SME in use */
#define TIF_SME_VL_INHERIT 28 /* Inherit SME vl_onexec across exec */
+#define TIF_KERNEL_FPSTATE 29 /* Task is in a kernel mode FPSIMD section */

#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ccc4a78a70e4..c2d05de677d1 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -357,6 +357,7 @@ static void task_fpsimd_load(void)

WARN_ON(!system_supports_fpsimd());
WARN_ON(preemptible());
+ WARN_ON(test_thread_flag(TIF_KERNEL_FPSTATE));

if (system_supports_sve() || system_supports_sme()) {
switch (current->thread.fp_type) {
@@ -379,7 +380,7 @@ static void task_fpsimd_load(void)
default:
/*
* This indicates either a bug in
- * fpsimd_save() or memory corruption, we
+ * fpsimd_save_user_state() or memory corruption, we
* should always record an explicit format
* when we save. We always at least have the
* memory allocated for FPSMID registers so
@@ -430,7 +431,7 @@ static void task_fpsimd_load(void)
* than via current, if we are saving KVM state then it will have
* ensured that the type of registers to save is set in last->to_save.
*/
-static void fpsimd_save(void)
+static void fpsimd_save_user_state(void)
{
struct cpu_fp_state const *last =
this_cpu_ptr(&fpsimd_last_state);
@@ -861,7 +862,7 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type,
if (task == current) {
get_cpu_fpsimd_context();

- fpsimd_save();
+ fpsimd_save_user_state();
}

fpsimd_flush_task_state(task);
@@ -1473,6 +1474,16 @@ void do_fpsimd_exc(unsigned long esr, struct pt_regs *regs)
current);
}

+static void fpsimd_load_kernel_state(struct task_struct *task)
+{
+ fpsimd_load_state(&task->thread.kernel_fpsimd_state);
+}
+
+static void fpsimd_save_kernel_state(struct task_struct *task)
+{
+ fpsimd_save_state(&task->thread.kernel_fpsimd_state);
+}
+
void fpsimd_thread_switch(struct task_struct *next)
{
bool wrong_task, wrong_cpu;
@@ -1483,19 +1494,28 @@ void fpsimd_thread_switch(struct task_struct *next)
WARN_ON_ONCE(!irqs_disabled());

/* Save unsaved fpsimd state, if any: */
- fpsimd_save();
+ if (test_thread_flag(TIF_KERNEL_FPSTATE))
+ fpsimd_save_kernel_state(current);
+ else
+ fpsimd_save_user_state();

- /*
- * Fix up TIF_FOREIGN_FPSTATE to correctly describe next's
- * state. For kernel threads, FPSIMD registers are never loaded
- * and wrong_task and wrong_cpu will always be true.
- */
- wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
- &next->thread.uw.fpsimd_state;
- wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
+ if (test_tsk_thread_flag(next, TIF_KERNEL_FPSTATE)) {
+ fpsimd_load_kernel_state(next);
+ set_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE);
+ } else {
+ /*
+ * Fix up TIF_FOREIGN_FPSTATE to correctly describe next's
+ * state. For kernel threads, FPSIMD registers are never
+ * loaded with user mode FPSIMD state and so wrong_task and
+ * wrong_cpu will always be true.
+ */
+ wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
+ &next->thread.uw.fpsimd_state;
+ wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();

- update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
- wrong_task || wrong_cpu);
+ update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
+ wrong_task || wrong_cpu);
+ }
}

static void fpsimd_flush_thread_vl(enum vec_type type)
@@ -1585,7 +1605,7 @@ void fpsimd_preserve_current_state(void)
return;

get_cpu_fpsimd_context();
- fpsimd_save();
+ fpsimd_save_user_state();
put_cpu_fpsimd_context();
}

@@ -1801,7 +1821,7 @@ void fpsimd_save_and_flush_cpu_state(void)
return;
WARN_ON(preemptible());
get_cpu_fpsimd_context();
- fpsimd_save();
+ fpsimd_save_user_state();
fpsimd_flush_cpu_state();
put_cpu_fpsimd_context();
}
@@ -1835,10 +1855,37 @@ void kernel_neon_begin(void)
get_cpu_fpsimd_context();

/* Save unsaved fpsimd state, if any: */
- fpsimd_save();
+ if (test_thread_flag(TIF_KERNEL_FPSTATE)) {
+ BUG_ON(IS_ENABLED(CONFIG_PREEMPT_RT) || !in_serving_softirq());
+ fpsimd_save_kernel_state(current);
+ } else {
+ fpsimd_save_user_state();
+
+ /*
+ * Set the thread flag so that the kernel mode FPSIMD state
+ * will be context switched along with the rest of the task
+ * state.
+ *
+ * On non-PREEMPT_RT, softirqs may interrupt task level kernel
+ * mode FPSIMD, but the task will not be preemptible so setting
+ * TIF_KERNEL_FPSTATE for those would be both wrong (as it
+ * would mark the task context FPSIMD state as requiring a
+ * context switch) and unnecessary.
+ *
+ * On PREEMPT_RT, softirqs are serviced from a separate thread,
+ * which is scheduled as usual, and this guarantees that these
+ * softirqs are not interrupting use of the FPSIMD in kernel
+ * mode in task context. So in this case, setting the flag here
+ * is always appropriate.
+ */
+ if (IS_ENABLED(CONFIG_PREEMPT_RT) || !in_serving_softirq())
+ set_thread_flag(TIF_KERNEL_FPSTATE);
+ }

/* Invalidate any task state remaining in the fpsimd regs: */
fpsimd_flush_cpu_state();
+
+ put_cpu_fpsimd_context();
}
EXPORT_SYMBOL_GPL(kernel_neon_begin);

@@ -1856,7 +1903,16 @@ void kernel_neon_end(void)
if (!system_supports_fpsimd())
return;

- put_cpu_fpsimd_context();
+ /*
+ * If we are returning from a nested use of kernel mode FPSIMD, restore
+ * the task context kernel mode FPSIMD state. This can only happen when
+ * running in softirq context on non-PREEMPT_RT.
+ */
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT) && in_serving_softirq() &&
+ test_thread_flag(TIF_KERNEL_FPSTATE))
+ fpsimd_load_kernel_state(current);
+ else
+ clear_thread_flag(TIF_KERNEL_FPSTATE);
}
EXPORT_SYMBOL_GPL(kernel_neon_end);

--
2.43.0.472.g3155946c3a-goog


2023-12-08 12:37:37

by Ard Biesheuvel

Subject: [PATCH v4 3/4] arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD

From: Ard Biesheuvel <[email protected]>

Now that kernel mode FPSIMD state is context switched along with other
task state, we can enable the existing logic that keeps track of which
task's FPSIMD state the CPU is holding in its registers. If it is the
context of the task that we are switching to, we can elide the reload of
the FPSIMD state from memory.

Note that we also need to check whether the FPSIMD state on this CPU is
the most recent: if a task gets migrated away and back again, the state
in memory may be more recent than the state in the CPU. So add another
CPU id field to task_struct to keep track of this. (We could reuse the
existing CPU id field used for user mode context, but that might result
in user state being discarded unnecessarily, given that two distinct
CPUs could be holding the most recent user mode state and the most
recent kernel mode state)

Signed-off-by: Ard Biesheuvel <[email protected]>
Reviewed-by: Mark Brown <[email protected]>
Acked-by: Mark Rutland <[email protected]>
---
arch/arm64/include/asm/processor.h | 1 +
arch/arm64/kernel/fpsimd.c | 18 ++++++++++++++++++
2 files changed, 19 insertions(+)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index ce6eebd6c08b..5b0a04810b23 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -169,6 +169,7 @@ struct thread_struct {
struct debug_info debug; /* debugging */

struct user_fpsimd_state kernel_fpsimd_state;
+ unsigned int kernel_fpsimd_cpu;
#ifdef CONFIG_ARM64_PTR_AUTH
struct ptrauth_keys_user keys_user;
#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index c2d05de677d1..50ae93d9baec 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1476,12 +1476,30 @@ void do_fpsimd_exc(unsigned long esr, struct pt_regs *regs)

static void fpsimd_load_kernel_state(struct task_struct *task)
{
+ struct cpu_fp_state *last = this_cpu_ptr(&fpsimd_last_state);
+
+ /*
+ * Elide the load if this CPU holds the most recent kernel mode
+ * FPSIMD context of the current task.
+ */
+ if (last->st == &task->thread.kernel_fpsimd_state &&
+ task->thread.kernel_fpsimd_cpu == smp_processor_id())
+ return;
+
fpsimd_load_state(&task->thread.kernel_fpsimd_state);
}

static void fpsimd_save_kernel_state(struct task_struct *task)
{
+ struct cpu_fp_state cpu_fp_state = {
+ .st = &task->thread.kernel_fpsimd_state,
+ .to_save = FP_STATE_FPSIMD,
+ };
+
fpsimd_save_state(&task->thread.kernel_fpsimd_state);
+ fpsimd_bind_state_to_cpu(&cpu_fp_state);
+
+ task->thread.kernel_fpsimd_cpu = smp_processor_id();
}

void fpsimd_thread_switch(struct task_struct *next)
--
2.43.0.472.g3155946c3a-goog


2023-12-08 12:37:47

by Ard Biesheuvel

Subject: [PATCH v4 4/4] arm64: crypto: Disable yielding logic unless CONFIG_PREEMPT_VOLUNTARY=y

From: Ard Biesheuvel <[email protected]>

Now that kernel mode use of SIMD runs with preemption enabled, the
explicit yield logic is redundant for preemptible builds, and since it
should not actually be used at all on non-preemptible builds (where
kernel work is supposed to run to completion and not give up its time
slice prematurely), let's make it depend on CONFIG_PREEMPT_VOLUNTARY.

Once CONFIG_PREEMPT_VOLUNTARY is removed, all the logic it guards can be
removed as well.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/arm64/crypto/aes-ce-ccm-glue.c | 8 ++++++--
arch/arm64/crypto/chacha-neon-glue.c | 5 ++++-
arch/arm64/crypto/crct10dif-ce-glue.c | 6 ++++--
arch/arm64/crypto/nhpoly1305-neon-glue.c | 5 ++++-
arch/arm64/crypto/poly1305-glue.c | 5 ++++-
arch/arm64/crypto/polyval-ce-glue.c | 9 +++++++--
arch/arm64/include/asm/assembler.h | 4 ++--
7 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
index 25cd3808ecbe..82e293a698ff 100644
--- a/arch/arm64/crypto/aes-ce-ccm-glue.c
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -125,13 +125,17 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
scatterwalk_start(&walk, sg_next(walk.sg));
n = scatterwalk_clamp(&walk, len);
}
- n = min_t(u32, n, SZ_4K); /* yield NEON at least every 4k */
+
+ if (IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY))
+ n = min_t(u32, n, SZ_4K); /* yield NEON at least every 4k */
+
p = scatterwalk_map(&walk);

macp = ce_aes_ccm_auth_data(mac, p, n, macp, ctx->key_enc,
num_rounds(ctx));

- if (len / SZ_4K > (len - n) / SZ_4K) {
+ if (IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY) &&
+ len / SZ_4K > (len - n) / SZ_4K) {
kernel_neon_end();
kernel_neon_begin();
}
diff --git a/arch/arm64/crypto/chacha-neon-glue.c b/arch/arm64/crypto/chacha-neon-glue.c
index af2bbca38e70..655b250cef4a 100644
--- a/arch/arm64/crypto/chacha-neon-glue.c
+++ b/arch/arm64/crypto/chacha-neon-glue.c
@@ -88,7 +88,10 @@ void chacha_crypt_arch(u32 *state, u8 *dst, const u8 *src, unsigned int bytes,
return chacha_crypt_generic(state, dst, src, bytes, nrounds);

do {
- unsigned int todo = min_t(unsigned int, bytes, SZ_4K);
+ unsigned int todo = bytes;
+
+ if (IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY))
+ todo = min_t(unsigned int, todo, SZ_4K);

kernel_neon_begin();
chacha_doneon(state, dst, src, todo, nrounds);
diff --git a/arch/arm64/crypto/crct10dif-ce-glue.c b/arch/arm64/crypto/crct10dif-ce-glue.c
index 09eb1456aed4..c6e8cf4f56da 100644
--- a/arch/arm64/crypto/crct10dif-ce-glue.c
+++ b/arch/arm64/crypto/crct10dif-ce-glue.c
@@ -40,7 +40,8 @@ static int crct10dif_update_pmull_p8(struct shash_desc *desc, const u8 *data,
do {
unsigned int chunk = length;

- if (chunk > SZ_4K + CRC_T10DIF_PMULL_CHUNK_SIZE)
+ if (IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY) &&
+ chunk > SZ_4K + CRC_T10DIF_PMULL_CHUNK_SIZE)
chunk = SZ_4K;

kernel_neon_begin();
@@ -65,7 +66,8 @@ static int crct10dif_update_pmull_p64(struct shash_desc *desc, const u8 *data,
do {
unsigned int chunk = length;

- if (chunk > SZ_4K + CRC_T10DIF_PMULL_CHUNK_SIZE)
+ if (IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY) &&
+ chunk > SZ_4K + CRC_T10DIF_PMULL_CHUNK_SIZE)
chunk = SZ_4K;

kernel_neon_begin();
diff --git a/arch/arm64/crypto/nhpoly1305-neon-glue.c b/arch/arm64/crypto/nhpoly1305-neon-glue.c
index e4a0b463f080..cbbc51b27d93 100644
--- a/arch/arm64/crypto/nhpoly1305-neon-glue.c
+++ b/arch/arm64/crypto/nhpoly1305-neon-glue.c
@@ -23,7 +23,10 @@ static int nhpoly1305_neon_update(struct shash_desc *desc,
return crypto_nhpoly1305_update(desc, src, srclen);

do {
- unsigned int n = min_t(unsigned int, srclen, SZ_4K);
+ unsigned int n = srclen;
+
+ if (IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY))
+ n = min_t(unsigned int, n, SZ_4K);

kernel_neon_begin();
crypto_nhpoly1305_update_helper(desc, src, n, nh_neon);
diff --git a/arch/arm64/crypto/poly1305-glue.c b/arch/arm64/crypto/poly1305-glue.c
index 1fae18ba11ed..27f84f5bfc98 100644
--- a/arch/arm64/crypto/poly1305-glue.c
+++ b/arch/arm64/crypto/poly1305-glue.c
@@ -144,7 +144,10 @@ void poly1305_update_arch(struct poly1305_desc_ctx *dctx, const u8 *src,

if (static_branch_likely(&have_neon) && crypto_simd_usable()) {
do {
- unsigned int todo = min_t(unsigned int, len, SZ_4K);
+ unsigned int todo = len;
+
+ if (IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY))
+ todo = min_t(unsigned int, todo, SZ_4K);

kernel_neon_begin();
poly1305_blocks_neon(&dctx->h, src, todo, 1);
diff --git a/arch/arm64/crypto/polyval-ce-glue.c b/arch/arm64/crypto/polyval-ce-glue.c
index 0a3b5718df85..c4c0fb3fcaf4 100644
--- a/arch/arm64/crypto/polyval-ce-glue.c
+++ b/arch/arm64/crypto/polyval-ce-glue.c
@@ -123,8 +123,13 @@ static int polyval_arm64_update(struct shash_desc *desc,
}

while (srclen >= POLYVAL_BLOCK_SIZE) {
- /* allow rescheduling every 4K bytes */
- nblocks = min(srclen, 4096U) / POLYVAL_BLOCK_SIZE;
+ unsigned int len = srclen;
+
+ if (IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY))
+ /* allow rescheduling every 4K bytes */
+ len = min(len, 4096U);
+
+ nblocks = len / POLYVAL_BLOCK_SIZE;
internal_polyval_update(tctx, src, nblocks, dctx->buffer);
srclen -= nblocks * POLYVAL_BLOCK_SIZE;
src += nblocks * POLYVAL_BLOCK_SIZE;
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 376a980f2bad..0180ac1f9b8b 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -769,6 +769,7 @@ alternative_endif
* field)
*/
.macro cond_yield, lbl:req, tmp:req, tmp2:req
+#ifdef CONFIG_PREEMPT_VOLUNTARY
get_current_task \tmp
ldr \tmp, [\tmp, #TSK_TI_PREEMPT]
/*
@@ -777,15 +778,14 @@ alternative_endif
* run to completion as quickly as we can.
*/
tbnz \tmp, #SOFTIRQ_SHIFT, .Lnoyield_\@
-#ifdef CONFIG_PREEMPTION
sub \tmp, \tmp, #PREEMPT_DISABLE_OFFSET
cbz \tmp, \lbl
-#endif
adr_l \tmp, irq_stat + IRQ_CPUSTAT_SOFTIRQ_PENDING
get_this_cpu_offset \tmp2
ldr w\tmp, [\tmp, \tmp2]
cbnz w\tmp, \lbl // yield on pending softirq in task context
.Lnoyield_\@:
+#endif
.endm

/*
--
2.43.0.472.g3155946c3a-goog


2023-12-11 20:27:46

by Will Deacon

Subject: Re: [PATCH v4 0/4] arm64: Run kernel mode NEON with preemption enabled

Hey Ard,

On Fri, 8 Dec 2023 12:32:19 +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <[email protected]>
>
> Currently, kernel mode NEON (SIMD) support is implemented in a way that
> requires preemption to be disabled while the SIMD registers are live.
> The reason for this is that those registers are not in the set that is
> preserved/restored on exception entry/exit and context switch, as this
> would impact performance generally, even for workloads where kernel mode
> SIMD is not the bottleneck.
>
> [...]

I applied the first three patches to for-next/fpsimd:

[1/4] arm64: fpsimd: Drop unneeded 'busy' flag
https://git.kernel.org/arm64/c/e109130b0e5e
[2/4] arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
https://git.kernel.org/arm64/c/1e3a3de1ff6c
[3/4] arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
https://git.kernel.org/arm64/c/035262623959

It would be nice to have an Ack from Herbert on the last one so that
he's aware of the possible conflicts.

The other thing I tangentially wondered about is what happens now if code
calls uaccess routines (e.g. get_user()) within a kernel_neon_{begin,end}
section? I think previously the fact that preemption had to be disabled
would've caused the might_fault() to explode, but now I suppose the BUG_ON()
in kernel_neon_begin() will save us. Is that right?

Cheers,
--
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

2023-12-12 08:16:37

by Ard Biesheuvel

Subject: Re: [PATCH v4 0/4] arm64: Run kernel mode NEON with preemption enabled

On Mon, 11 Dec 2023 at 21:27, Will Deacon <[email protected]> wrote:
>
> Hey Ard,
>
> On Fri, 8 Dec 2023 12:32:19 +0100, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <[email protected]>
> >
> > Currently, kernel mode NEON (SIMD) support is implemented in a way that
> > requires preemption to be disabled while the SIMD registers are live.
> > The reason for this is that those registers are not in the set that is
> > preserved/restored on exception entry/exit and context switch, as this
> > would impact performance generally, even for workloads where kernel mode
> > SIMD is not the bottleneck.
> >
> > [...]
>
> I applied the first three patches to for-next/fpsimd:
>

Thanks

> [1/4] arm64: fpsimd: Drop unneeded 'busy' flag
> https://git.kernel.org/arm64/c/e109130b0e5e
> [2/4] arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
> https://git.kernel.org/arm64/c/1e3a3de1ff6c

I spotted a typo in the commit log of this patch:

TIF_USING_KMODE_FPSIMD -> TIF_KERNEL_FPSTATE


> [3/4] arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
> https://git.kernel.org/arm64/c/035262623959
>
> It would be nice to have an Ack from Herbert on the last one so that
> he's aware of the possible conflicts.
>
> The other thing I tangentially wondered about is what happens now if code
> calls uaccess routines (e.g. get_user()) within a kernel_neon_{begin,end}
> section? I think previously the fact that preemption had to be disabled
> would've caused the might_fault() to explode, but now I suppose the BUG_ON()
> in kernel_neon_begin() will save us. Is that right?
>

Not sure what you mean by 'save us'. Is there anything fundamentally
wrong with doing user access from a kernel mode NEON region if
preemption remains enabled?

The BUG_ON() will only catch uses from hardirq/NMI context, or cases
where FP/SIMD is not implemented/enabled in the first place, so it will
not trigger on a user access.

2023-12-12 10:55:24

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] arm64: Run kernel mode NEON with preemption enabled

On Tue, Dec 12, 2023 at 09:16:13AM +0100, Ard Biesheuvel wrote:
> On Mon, 11 Dec 2023 at 21:27, Will Deacon <[email protected]> wrote:
> > [1/4] arm64: fpsimd: Drop unneeded 'busy' flag
> > https://git.kernel.org/arm64/c/e109130b0e5e
> > [2/4] arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
> > https://git.kernel.org/arm64/c/1e3a3de1ff6c
>
> I spotted a typo in the commit log of this patch:
>
> TIF_USING_KMODE_FPSIMD -> TIF_KERNEL_FPSTATE

Cheers, I'll go in and fix that (so the SHAs will change).

> > [3/4] arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
> > https://git.kernel.org/arm64/c/035262623959
> >
> > It would be nice to have an Ack from Herbert on the last one so that
> > he's aware of the possible conflicts.
> >
> > The other thing I tangentially wondered about is what happens now if code
> > calls uaccess routines (e.g. get_user()) within a kernel_neon_{begin,end}
> > section? I think previously the fact that preemption had to be disabled
> > would've caused the might_fault() to explode, but now I suppose the BUG_ON()
> > in kernel_neon_begin() will save us. Is that right?
> >
>
> Not sure what you mean by 'save us'. Is there anything fundamentally
> wrong with doing user access from a kernel mode NEON region if
> preemption remains enabled?
>
> The BUG_ON() will only catch uses from hardirq/NMI context, or cases
> where FP/SIMD is not implemented/enabled in the first place, so it will
> not trigger on a user access.

As discussed off-list, the vague concern was if kernel_neon_begin() is
nested off the back of a user fault. The BUG_ON() should fire in that case,
so we're all good.

Thanks!

Will

2023-12-12 11:28:13

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] arm64: fpsimd: Drop unneeded 'busy' flag

Hi Ard,

On Fri, Dec 8, 2023 at 12:34 PM Ard Biesheuvel <[email protected]> wrote:
> From: Ard Biesheuvel <[email protected]>
> Kernel mode NEON will preserve the user mode FPSIMD state by saving it
> into the task struct before clobbering the registers. In order to avoid
> the need for preserving kernel mode state too, we disallow nested use of
> kernel mode NEON, i.e., use in softirq context while the interrupted
> task context was using kernel mode NEON too.
>
> Originally, this policy was implemented using a per-CPU flag which was
> exposed via may_use_simd(), requiring the users of the kernel mode NEON
> to deal with the possibility that it might return false, and having NEON
> and non-NEON code paths. This policy was changed by commit
> 13150149aa6ded1 ("arm64: fpsimd: run kernel mode NEON with softirqs
> disabled"), and now, softirq processing is disabled entirely instead,
> and so may_use_simd() can never fail when called from task or softirq
> context.
>
> This means we can drop the fpsimd_context_busy flag entirely, and
> instead, ensure that we disable softirq processing in places where we
> formerly relied on the flag for preventing races in the FPSIMD preserve
> routines.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> Reviewed-by: Mark Brown <[email protected]>

Thanks for your patch, which is now commit e109130b0e5ec3fd
("arm64: fpsimd: Drop unneeded 'busy' flag") in arm64/for-next/core
and next-20231212.

I have bisected the following warning during boot (on Salvator-XS with
R-Car H3 ES2.0 and on White-Hawk with R-Car V4H) followed by a lock-up
(on Salvator-XS) to this commit:

Reverting commits 035262623959cbe1 ("arm64: fpsimd: Implement lazy
restore for kernel mode FPSIMD"), 1e3a3de1ff6ca6b1 ("arm64: fpsimd:
Preserve/restore kernel mode NEON at context switch"), and this commit
fixes the issue.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-12-12 12:30:08

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] arm64: fpsimd: Drop unneeded 'busy' flag

Hi Geert,

Cheers for the report.

On Tue, Dec 12, 2023 at 12:27:50PM +0100, Geert Uytterhoeven wrote:
> On Fri, Dec 8, 2023 at 12:34 PM Ard Biesheuvel <[email protected]> wrote:
> > From: Ard Biesheuvel <[email protected]>
> > Kernel mode NEON will preserve the user mode FPSIMD state by saving it
> > into the task struct before clobbering the registers. In order to avoid
> > the need for preserving kernel mode state too, we disallow nested use of
> > > kernel mode NEON, i.e., use in softirq context while the interrupted
> > task context was using kernel mode NEON too.
> >
> > Originally, this policy was implemented using a per-CPU flag which was
> > exposed via may_use_simd(), requiring the users of the kernel mode NEON
> > to deal with the possibility that it might return false, and having NEON
> > and non-NEON code paths. This policy was changed by commit
> > 13150149aa6ded1 ("arm64: fpsimd: run kernel mode NEON with softirqs
> > disabled"), and now, softirq processing is disabled entirely instead,
> > and so may_use_simd() can never fail when called from task or softirq
> > context.
> >
> > This means we can drop the fpsimd_context_busy flag entirely, and
> > instead, ensure that we disable softirq processing in places where we
> > formerly relied on the flag for preventing races in the FPSIMD preserve
> > routines.
> >
> > Signed-off-by: Ard Biesheuvel <[email protected]>
> > Reviewed-by: Mark Brown <[email protected]>
>
> Thanks for your patch, which is now commit e109130b0e5ec3fd
> ("arm64: fpsimd: Drop unneeded 'busy' flag") in arm64/for-next/core
> and next-20231212.
>
> I have bisected the following warning during boot (on Salvator-XS with
> R-Car H3 ES2.0 and on White-Hawk with R-Car V4H) followed by a lock-up
> (on Salvator-XS) to this commit:

Please can you provide the output from the warning and, if possible, a
pointer to your .config?

Cheers,

Will

2023-12-12 12:35:01

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] arm64: fpsimd: Drop unneeded 'busy' flag

Hi Will,

On Tue, Dec 12, 2023 at 1:30 PM Will Deacon <[email protected]> wrote:
> On Tue, Dec 12, 2023 at 12:27:50PM +0100, Geert Uytterhoeven wrote:
> > On Fri, Dec 8, 2023 at 12:34 PM Ard Biesheuvel <[email protected]> wrote:
> > > From: Ard Biesheuvel <[email protected]>
> > > Kernel mode NEON will preserve the user mode FPSIMD state by saving it
> > > into the task struct before clobbering the registers. In order to avoid
> > > the need for preserving kernel mode state too, we disallow nested use of
> > > > kernel mode NEON, i.e., use in softirq context while the interrupted
> > > task context was using kernel mode NEON too.
> > >
> > > Originally, this policy was implemented using a per-CPU flag which was
> > > exposed via may_use_simd(), requiring the users of the kernel mode NEON
> > > to deal with the possibility that it might return false, and having NEON
> > > and non-NEON code paths. This policy was changed by commit
> > > 13150149aa6ded1 ("arm64: fpsimd: run kernel mode NEON with softirqs
> > > disabled"), and now, softirq processing is disabled entirely instead,
> > > and so may_use_simd() can never fail when called from task or softirq
> > > context.
> > >
> > > This means we can drop the fpsimd_context_busy flag entirely, and
> > > instead, ensure that we disable softirq processing in places where we
> > > formerly relied on the flag for preventing races in the FPSIMD preserve
> > > routines.
> > >
> > > Signed-off-by: Ard Biesheuvel <[email protected]>
> > > Reviewed-by: Mark Brown <[email protected]>
> >
> > Thanks for your patch, which is now commit e109130b0e5ec3fd
> > ("arm64: fpsimd: Drop unneeded 'busy' flag") in arm64/for-next/core
> > and next-20231212.
> >
> > I have bisected the following warning during boot (on Salvator-XS with
> > R-Car H3 ES2.0 and on White-Hawk with R-Car V4H) followed by a lock-up
> > (on Salvator-XS) to this commit:
>
> Please can you provide the output from the warning and, if possible, a

Oops, how did that log disappear?

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/softirq.c:361 __local_bh_enable_ip+0x1a4/0x1c8
CPU: 0 PID: 0 Comm: swapper/0 Not tainted
6.7.0-rc3-arm64-renesas-00001-ge109130b0e5e #2383
Hardware name: Renesas Salvator-X 2nd version board based on r8a77951 (DT)
pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __local_bh_enable_ip+0x1a4/0x1c8
lr : fpsimd_save_and_flush_cpu_state+0x4c/0x58
sp : ffff800081703bb0
x29: ffff800081703bb0 x28: ffff80008171c6b0 x27: ffff8000800167fc
x26: 0000000000000000 x25: 0000000000000000 x24: ffff800081710a68
x23: ffff8000813c4008 x22: ffff800081703ca4 x21: 0000000000000000
x20: ffff8000800167c8 x19: 0000000000000200 x18: ffffffffffffffff
x17: ffff0004c27f2400 x16: 0000000000000001 x15: 0000000000000003
x14: 0000000000000000 x13: 0000000000000003 x12: 0000000000000000
x11: 071c71c71c71c71c x10: ffff800082086b88 x9 : 0000000000000000
x8 : ffff80008203ab50 x7 : 0000000000000bb0 x6 : ffff80008203b700
x5 : ffff80067e2ee000 x4 : 0000000000000202 x3 : ffff80067e2ee000
x2 : ffff80067e2ee000 x1 : ffff800081719fc0 x0 : 0000000100000202
Call trace:
__local_bh_enable_ip+0x1a4/0x1c8
fpsimd_save_and_flush_cpu_state+0x4c/0x58
fpsimd_cpu_pm_notifier+0x1c/0x28
notifier_call_chain+0x9c/0x174
raw_notifier_call_chain_robust+0x40/0x98
cpu_pm_enter+0x3c/0x70
psci_enter_idle_state+0x28/0x78
cpuidle_enter_state+0xe4/0x568
cpuidle_enter+0x34/0x48
do_idle+0x214/0x290
cpu_startup_entry+0x34/0x38
rest_init+0xf8/0x188
arch_post_acpi_subsys_init+0x0/0x8
start_kernel+0x4fc/0x5ec
__primary_switched+0xb4/0xbc
irq event stamp: 15351
hardirqs last enabled at (15349): [<ffff80008015d9bc>]
tick_nohz_idle_enter+0xcc/0x198
hardirqs last disabled at (15350): [<ffff8000800e27a8>] do_idle+0xc4/0x290
softirqs last enabled at (15330): [<ffff800080010614>] __do_softirq+0x494/0x4dc
softirqs last disabled at (15351): [<ffff8000800167c8>]
fpsimd_save_and_flush_cpu_state+0x24/0x58
---[ end trace 0000000000000000 ]---

> pointer to your .config?

renesas-defconfig from
https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel.git/log/?h=topic/renesas-defconfig

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-12-12 12:59:50

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] arm64: fpsimd: Drop unneeded 'busy' flag

On Tue, 12 Dec 2023 at 13:34, Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Will,
>
> On Tue, Dec 12, 2023 at 1:30 PM Will Deacon <[email protected]> wrote:
> > On Tue, Dec 12, 2023 at 12:27:50PM +0100, Geert Uytterhoeven wrote:
> > > On Fri, Dec 8, 2023 at 12:34 PM Ard Biesheuvel <[email protected]> wrote:
> > > > From: Ard Biesheuvel <[email protected]>
> > > > Kernel mode NEON will preserve the user mode FPSIMD state by saving it
> > > > into the task struct before clobbering the registers. In order to avoid
> > > > the need for preserving kernel mode state too, we disallow nested use of
> > > > > kernel mode NEON, i.e., use in softirq context while the interrupted
> > > > task context was using kernel mode NEON too.
> > > >
> > > > Originally, this policy was implemented using a per-CPU flag which was
> > > > exposed via may_use_simd(), requiring the users of the kernel mode NEON
> > > > to deal with the possibility that it might return false, and having NEON
> > > > and non-NEON code paths. This policy was changed by commit
> > > > 13150149aa6ded1 ("arm64: fpsimd: run kernel mode NEON with softirqs
> > > > disabled"), and now, softirq processing is disabled entirely instead,
> > > > and so may_use_simd() can never fail when called from task or softirq
> > > > context.
> > > >
> > > > This means we can drop the fpsimd_context_busy flag entirely, and
> > > > instead, ensure that we disable softirq processing in places where we
> > > > formerly relied on the flag for preventing races in the FPSIMD preserve
> > > > routines.
> > > >
> > > > Signed-off-by: Ard Biesheuvel <[email protected]>
> > > > Reviewed-by: Mark Brown <[email protected]>
> > >
> > > Thanks for your patch, which is now commit e109130b0e5ec3fd
> > > ("arm64: fpsimd: Drop unneeded 'busy' flag") in arm64/for-next/core
> > > and next-20231212.
> > >
> > > I have bisected the following warning during boot (on Salvator-XS with
> > > R-Car H3 ES2.0 and on White-Hawk with R-Car V4H) followed by a lock-up
> > > (on Salvator-XS) to this commit:
> >
> > Please can you provide the output from the warning and, if possible, a
>
> Oops, how did that log disappear?
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at kernel/softirq.c:361 __local_bh_enable_ip+0x1a4/0x1c8
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> 6.7.0-rc3-arm64-renesas-00001-ge109130b0e5e #2383
> Hardware name: Renesas Salvator-X 2nd version board based on r8a77951 (DT)
> pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : __local_bh_enable_ip+0x1a4/0x1c8
> lr : fpsimd_save_and_flush_cpu_state+0x4c/0x58
> sp : ffff800081703bb0
> x29: ffff800081703bb0 x28: ffff80008171c6b0 x27: ffff8000800167fc
> x26: 0000000000000000 x25: 0000000000000000 x24: ffff800081710a68
> x23: ffff8000813c4008 x22: ffff800081703ca4 x21: 0000000000000000
> x20: ffff8000800167c8 x19: 0000000000000200 x18: ffffffffffffffff
> x17: ffff0004c27f2400 x16: 0000000000000001 x15: 0000000000000003
> x14: 0000000000000000 x13: 0000000000000003 x12: 0000000000000000
> x11: 071c71c71c71c71c x10: ffff800082086b88 x9 : 0000000000000000
> x8 : ffff80008203ab50 x7 : 0000000000000bb0 x6 : ffff80008203b700
> x5 : ffff80067e2ee000 x4 : 0000000000000202 x3 : ffff80067e2ee000
> x2 : ffff80067e2ee000 x1 : ffff800081719fc0 x0 : 0000000100000202
> Call trace:
> __local_bh_enable_ip+0x1a4/0x1c8
> fpsimd_save_and_flush_cpu_state+0x4c/0x58
> fpsimd_cpu_pm_notifier+0x1c/0x28
> notifier_call_chain+0x9c/0x174
> raw_notifier_call_chain_robust+0x40/0x98
> cpu_pm_enter+0x3c/0x70
> psci_enter_idle_state+0x28/0x78
> cpuidle_enter_state+0xe4/0x568
> cpuidle_enter+0x34/0x48
> do_idle+0x214/0x290

Thanks for the report. I missed the fact that this is called from the
idle path. The old code just set and cleared the 'busy' flag there.

Could you please check whether this fixes the issue? Thanks.

--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1835,13 +1835,15 @@ static void fpsimd_flush_cpu_state(void)
*/
void fpsimd_save_and_flush_cpu_state(void)
{
+ unsigned long flags;
+
if (!system_supports_fpsimd())
return;
WARN_ON(preemptible());
- get_cpu_fpsimd_context();
+ local_irq_save(flags);
fpsimd_save_user_state();
fpsimd_flush_cpu_state();
- put_cpu_fpsimd_context();
+ local_irq_restore(flags);
}

#ifdef CONFIG_KERNEL_MODE_NEON

2023-12-12 13:54:55

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] arm64: fpsimd: Drop unneeded 'busy' flag

Hi Ard,

On Tue, Dec 12, 2023 at 1:59 PM Ard Biesheuvel <[email protected]> wrote:
> On Tue, 12 Dec 2023 at 13:34, Geert Uytterhoeven <[email protected]> wrote:
> > On Tue, Dec 12, 2023 at 1:30 PM Will Deacon <[email protected]> wrote:
> > > On Tue, Dec 12, 2023 at 12:27:50PM +0100, Geert Uytterhoeven wrote:
> > > > On Fri, Dec 8, 2023 at 12:34 PM Ard Biesheuvel <[email protected]> wrote:
> > > > > From: Ard Biesheuvel <[email protected]>
> > > > > Kernel mode NEON will preserve the user mode FPSIMD state by saving it
> > > > > into the task struct before clobbering the registers. In order to avoid
> > > > > the need for preserving kernel mode state too, we disallow nested use of
> > > > > kernel mode NEON, i..e, use in softirq context while the interrupted
> > > > > task context was using kernel mode NEON too.
> > > > >
> > > > > Originally, this policy was implemented using a per-CPU flag which was
> > > > > exposed via may_use_simd(), requiring the users of the kernel mode NEON
> > > > > to deal with the possibility that it might return false, and having NEON
> > > > > and non-NEON code paths. This policy was changed by commit
> > > > > 13150149aa6ded1 ("arm64: fpsimd: run kernel mode NEON with softirqs
> > > > > disabled"), and now, softirq processing is disabled entirely instead,
> > > > > and so may_use_simd() can never fail when called from task or softirq
> > > > > context.
> > > > >
> > > > > This means we can drop the fpsimd_context_busy flag entirely, and
> > > > > instead, ensure that we disable softirq processing in places where we
> > > > > formerly relied on the flag for preventing races in the FPSIMD preserve
> > > > > routines.
> > > > >
> > > > > Signed-off-by: Ard Biesheuvel <[email protected]>
> > > > > Reviewed-by: Mark Brown <[email protected]>
> > > >
> > > > Thanks for your patch, which is now commit e109130b0e5ec3fd
> > > > ("arm64: fpsimd: Drop unneeded 'busy' flag") in arm64/for-next/core
> > > > and next-20231212.
> > > >
> > > > I have bisected the following warning during boot (on Salvator-XS with
> > > > R-Car H3 ES2.0 and on White-Hawk with R-Car V4H) followed by a lock-up
> > > > (on Salvator-XS) to this commit:
> > >
> > > Please can you provide the output from the warning and, if possible, a
> >
> > Oops, how did that log disappear?
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 0 at kernel/softirq.c:361 __local_bh_enable_ip+0x1a4/0x1c8
> > CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> > 6.7.0-rc3-arm64-renesas-00001-ge109130b0e5e #2383
> > Hardware name: Renesas Salvator-X 2nd version board based on r8a77951 (DT)
> > pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > pc : __local_bh_enable_ip+0x1a4/0x1c8
> > lr : fpsimd_save_and_flush_cpu_state+0x4c/0x58
> > sp : ffff800081703bb0
> > x29: ffff800081703bb0 x28: ffff80008171c6b0 x27: ffff8000800167fc
> > x26: 0000000000000000 x25: 0000000000000000 x24: ffff800081710a68
> > x23: ffff8000813c4008 x22: ffff800081703ca4 x21: 0000000000000000
> > x20: ffff8000800167c8 x19: 0000000000000200 x18: ffffffffffffffff
> > x17: ffff0004c27f2400 x16: 0000000000000001 x15: 0000000000000003
> > x14: 0000000000000000 x13: 0000000000000003 x12: 0000000000000000
> > x11: 071c71c71c71c71c x10: ffff800082086b88 x9 : 0000000000000000
> > x8 : ffff80008203ab50 x7 : 0000000000000bb0 x6 : ffff80008203b700
> > x5 : ffff80067e2ee000 x4 : 0000000000000202 x3 : ffff80067e2ee000
> > x2 : ffff80067e2ee000 x1 : ffff800081719fc0 x0 : 0000000100000202
> > Call trace:
> > __local_bh_enable_ip+0x1a4/0x1c8
> > fpsimd_save_and_flush_cpu_state+0x4c/0x58
> > fpsimd_cpu_pm_notifier+0x1c/0x28
> > notifier_call_chain+0x9c/0x174
> > raw_notifier_call_chain_robust+0x40/0x98
> > cpu_pm_enter+0x3c/0x70
> > psci_enter_idle_state+0x28/0x78
> > cpuidle_enter_state+0xe4/0x568
> > cpuidle_enter+0x34/0x48
> > do_idle+0x214/0x290
>
> Thanks for the report. I missed the fact that this is called from the
> idle path. The old code just set and cleared the 'busy' flag there
>
> Could you please check whether this fixes the issue? Thanks.

Thanks, that fixed the issue!

Tested-by: Geert Uytterhoeven <[email protected]>

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-12-12 16:36:03

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] arm64: Run kernel mode NEON with preemption enabled

On Tue, Dec 12, 2023 at 10:55:13AM +0000, Will Deacon wrote:
> As discussed off-list, the vague concern was if kernel_neon_begin() is
> nested off the back of a user fault. The BUG_ON() should fire in that case,
> so we're all good.

I think we should really document all these rules, and Samuel's series
to add the portable FPU API would be the right place for it. I'd also
really love to see objtool support for enforcing the various rules
where possible.