Hi all,
This patch series keep softirqs enabled while touching FPSIMD/SVE state. For
more details on the impact see patch #3.
This patch series has been benchmarked on Linux 5.1-rc4 with defconfig.
On Juno2:
* hackbench 100 process 1000 (10 times)
* .7% quicker
On ThunderX 2:
* hackbench 1000 process 1000 (20 times)
* 3.4% quicker
Note that while the benchmark has been done on 5.1-rc4, the patch series is
based on kvm-arm/next as it has few conflicts with the SVE KVM series.
Cheers,
Julien Grall (3):
arm64/fpsimd: Remove the prototype for sve_flush_cpu_state()
arch/arm64: fpsimd: Introduce fpsimd_save_and_flush_cpu_state() and
use it
arm64/fpsimd: Don't disable softirq when touching FPSIMD/SVE state
arch/arm64/include/asm/fpsimd.h | 6 +-
arch/arm64/include/asm/simd.h | 4 +-
arch/arm64/kernel/fpsimd.c | 148 +++++++++++++++++++++++++++++-----------
arch/arm64/kvm/fpsimd.c | 3 +-
4 files changed, 112 insertions(+), 49 deletions(-)
--
2.11.0
The only external user of fpsimd_save() and fpsimd_flush_cpu_state() is
the KVM FPSIMD code.
A follow-up patch will mandate hold of the FPSIMD context while
modifying it. Rather than having to export the new helpers to get/put
the context, we can just introduce a new function to combine
fpsimd_save() and fpsimd_flush_cpu_state().
This has also the advantage to remove any external call of fpsimd_save()
and fpsimd_flush_cpu_state(), so they can be turned static.
Lastly, the new function can also be used in the PM notifier.
Signed-off-by: Julien Grall <[email protected]>
---
Changes in v2:
- Patch added
---
arch/arm64/include/asm/fpsimd.h | 5 ++---
arch/arm64/kernel/fpsimd.c | 17 +++++++++++++----
arch/arm64/kvm/fpsimd.c | 3 +--
3 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 32abbfd852e8..88c223f6ded6 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -48,11 +48,11 @@ struct task_struct;
extern void fpsimd_save_state(struct user_fpsimd_state *state);
extern void fpsimd_load_state(struct user_fpsimd_state *state);
-extern void fpsimd_save(void);
-
extern void fpsimd_thread_switch(struct task_struct *next);
extern void fpsimd_flush_thread(void);
+extern void fpsimd_save_and_flush_cpu_state(void);
+
extern void fpsimd_signal_preserve_current_state(void);
extern void fpsimd_preserve_current_state(void);
extern void fpsimd_restore_current_state(void);
@@ -63,7 +63,6 @@ extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
void *sve_state, unsigned int sve_vl);
extern void fpsimd_flush_task_state(struct task_struct *target);
-extern void fpsimd_flush_cpu_state(void);
/* Maximum VL that SVE VL-agnostic software can transparently support */
#define SVE_VL_ARCH_MAX 0x100
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 577296bba730..9e4e4b6acd93 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -241,7 +241,7 @@ static void task_fpsimd_load(void)
*
* Softirqs (and preemption) must be disabled.
*/
-void fpsimd_save(void)
+static void fpsimd_save(void)
{
struct fpsimd_last_state_struct const *last =
this_cpu_ptr(&fpsimd_last_state);
@@ -1117,12 +1117,22 @@ void fpsimd_flush_task_state(struct task_struct *t)
* Invalidate any task's FPSIMD state that is present on this cpu.
* This function must be called with softirqs disabled.
*/
-void fpsimd_flush_cpu_state(void)
+static void fpsimd_flush_cpu_state(void)
{
__this_cpu_write(fpsimd_last_state.st, NULL);
set_thread_flag(TIF_FOREIGN_FPSTATE);
}
+/*
+ * Save the FPSIMD state to memory and invalidate cpu view.
+ * This function must be called with softirqs (and preemption) disabled.
+ */
+void fpsimd_save_and_flush_cpu_state(void)
+{
+ fpsimd_save();
+ fpsimd_flush_cpu_state();
+}
+
#ifdef CONFIG_KERNEL_MODE_NEON
DEFINE_PER_CPU(bool, kernel_neon_busy);
@@ -1279,8 +1289,7 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
{
switch (cmd) {
case CPU_PM_ENTER:
- fpsimd_save();
- fpsimd_flush_cpu_state();
+ fpsimd_save_and_flush_cpu_state();
break;
case CPU_PM_EXIT:
break;
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 6e3c9c8b2df9..01bd2e862506 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -113,8 +113,7 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
u64 *guest_zcr = &vcpu->arch.ctxt.sys_regs[ZCR_EL1];
/* Clean guest FP state to memory and invalidate cpu view */
- fpsimd_save();
- fpsimd_flush_cpu_state();
+ fpsimd_save_and_flush_cpu_state();
if (guest_has_sve)
*guest_zcr = read_sysreg_s(SYS_ZCR_EL12);
--
2.11.0
When the kernel is compiled with CONFIG_KERNEL_MODE_NEON, some part of
the kernel may be able to use FPSIMD/SVE. This is for instance the case
for crypto code.
Any use of FPSIMD/SVE in the kernel are clearly marked by using the
function kernel_neon_{begin, end}. Furthermore, this can only be used
when may_use_simd() returns true.
The current implementation of may_use_simd() allows softirq to use
FPSIMD/SVE unless it is currently in used (i.e kernel_neon_busy is true).
When in used, softirqs usually fallback to a software method.
At the moment, as a softirq may use FPSIMD/SVE, softirqs are disabled
when touching the FPSIMD/SVE context. This has the drawback to disable
all softirqs even if they are not using FPSIMD/SVE.
As a softirq should not rely on been able to use simd at a given time,
there are limited reason to keep softirq disabled when touching the
FPSIMD/SVE context. Instead, we can only disable preemption and tell
the NEON unit is currently in use.
This patch introduces two new helpers {get, put}_cpu_fpsimd_context to
mark the area using FPSIMD/SVE context and use them in replacement of
local_bh_{disable, enable}. The functions kernel_neon_{begin, end} are
also re-implemented to use the new helpers.
Additionally, this patch introduced a double-underscored version of each
helper that can be used when preemption is disabled. This avoid to
disable/enable preemption for again and helps documenting places where
context can only be used by one instance.
This patch has been benchmarked on Linux 5.1-rc4 with defconfig.
On Juno2:
* hackbench 100 process 1000 (10 times)
* .7% quicker
On ThunderX 2:
* hackbench 1000 process 1000 (20 times)
* 3.4% quicker
Signed-off-by: Julien Grall <[email protected]>
---
Changes in v2:
- Remove spurious call to kernel_neon_enable in kernel_neon_begin.
- Rename kernel_neon_{enable, disable} to {get, put}_cpu_fpsimd_context
- Introduce a double-underscore version of the helpers for case
where preemption is already disabled
- Introduce have_cpu_fpsimd_context() and use it in WARN_ON(...)
- Surround more places in the code with the new helpers
- Rework the comments
- Update the commit message with the benchmark result
---
arch/arm64/include/asm/simd.h | 4 +-
arch/arm64/kernel/fpsimd.c | 133 ++++++++++++++++++++++++++++++------------
2 files changed, 97 insertions(+), 40 deletions(-)
diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
index 6495cc51246f..94c0dac508aa 100644
--- a/arch/arm64/include/asm/simd.h
+++ b/arch/arm64/include/asm/simd.h
@@ -15,10 +15,10 @@
#include <linux/preempt.h>
#include <linux/types.h>
-#ifdef CONFIG_KERNEL_MODE_NEON
-
DECLARE_PER_CPU(bool, kernel_neon_busy);
+#ifdef CONFIG_KERNEL_MODE_NEON
+
/*
* may_use_simd - whether it is allowable at this time to issue SIMD
* instructions or access the SIMD register file
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 9e4e4b6acd93..761d848fb26d 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -92,7 +92,8 @@
* To prevent this from racing with the manipulation of the task's FPSIMD state
* from task context and thereby corrupting the state, it is necessary to
* protect any manipulation of a task's fpsimd_state or TIF_FOREIGN_FPSTATE
- * flag with local_bh_disable() unless softirqs are already masked.
+ * flag with kernel_neon_{disable, enable}. This will still allow softirqs to
+ * run but prevent them to use FPSIMD.
*
* For a certain task, the sequence may look something like this:
* - the task gets scheduled in; if both the task's fpsimd_cpu field
@@ -150,6 +151,58 @@ extern void __percpu *efi_sve_state;
#endif /* ! CONFIG_ARM64_SVE */
+DEFINE_PER_CPU(bool, kernel_neon_busy);
+EXPORT_PER_CPU_SYMBOL(kernel_neon_busy);
+
+/*
+ * Obtain the CPU FPSIMD context for use by the calling context.
+ *
+ * The caller may freely modify FPSIMD context until *put_cpu_fpsimd_context()
+ * is called.
+ *
+ * The double-underscore version must only be called if you know the task
+ * can't be preempted.
+ *
+ * __get_cpu_fpsimd_context() *must* be in pair with __put_cpu_fpsimd_context()
+ * get_cpu_fpsimd_context() *must* be in pair with put_cpu_fpsimd_context()
+ */
+static void __get_cpu_fpsimd_context(void)
+{
+ bool busy = __this_cpu_xchg(kernel_neon_busy, true);
+
+ WARN_ON(busy);
+}
+
+static void get_cpu_fpsimd_context(void)
+{
+ preempt_disable();
+ __get_cpu_fpsimd_context();
+}
+
+/*
+ * Release the CPU FPSIMD context.
+ *
+ * Must be called from a context in which *get_cpu_fpsimd_context() was
+ * previously called, with no call to *put_cpu_fpsimd_context() in the
+ * meantime.
+ */
+static void __put_cpu_fpsimd_context(void)
+{
+ bool busy = __this_cpu_xchg(kernel_neon_busy, false);
+ WARN_ON(!busy); /* No matching get_cpu_fpsimd_context()? */
+}
+
+static void put_cpu_fpsimd_context(void)
+{
+ __put_cpu_fpsimd_context();
+ preempt_enable();
+}
+
+static bool have_cpu_fpsimd_context(void)
+{
+ return (!preemptible() && __this_cpu_read(kernel_neon_busy));
+}
+
/*
* Call __sve_free() directly only if you know task can't be scheduled
* or preempted.
@@ -221,11 +274,12 @@ static void sve_free(struct task_struct *task)
* thread_struct is known to be up to date, when preparing to enter
* userspace.
*
- * Softirqs (and preemption) must be disabled.
+ * The FPSIMD context must be acquired with get_cpu_fpsimd_context()
+ * before calling this function.
*/
static void task_fpsimd_load(void)
{
- WARN_ON(!in_softirq() && !irqs_disabled());
+ WARN_ON(!have_cpu_fpsimd_context());
if (system_supports_sve() && test_thread_flag(TIF_SVE))
sve_load_state(sve_pffr(¤t->thread),
@@ -239,15 +293,22 @@ static void task_fpsimd_load(void)
* Ensure FPSIMD/SVE storage in memory for the loaded context is up to
* date with respect to the CPU registers.
*
- * Softirqs (and preemption) must be disabled.
+ * The FPSIMD context must be acquired with get_cpu_fpsimd_context()
+ * before calling this function.
*/
static void fpsimd_save(void)
{
struct fpsimd_last_state_struct const *last =
this_cpu_ptr(&fpsimd_last_state);
/* set by fpsimd_bind_task_to_cpu() or fpsimd_bind_state_to_cpu() */
+ WARN_ON(!have_cpu_fpsimd_context());
- WARN_ON(!in_softirq() && !irqs_disabled());
+ if ( !have_cpu_fpsimd_context() )
+ {
+ printk("preemptible() = %u kernel_neon_busy = %u\n",
+ preemptible(), __this_cpu_read(kernel_neon_busy));
+ while (1);
+ }
if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
if (system_supports_sve() && test_thread_flag(TIF_SVE)) {
@@ -352,7 +413,8 @@ static int __init sve_sysctl_init(void) { return 0; }
* task->thread.sve_state.
*
* Task can be a non-runnable task, or current. In the latter case,
- * softirqs (and preemption) must be disabled.
+ * the FPSIMD context must be acquired with get_fpu_fpsimd_context()
+ * before calling this function.
* task->thread.sve_state must point to at least sve_state_size(task)
* bytes of allocated kernel memory.
* task->thread.uw.fpsimd_state must be up to date before calling this
@@ -379,7 +441,8 @@ static void fpsimd_to_sve(struct task_struct *task)
* task->thread.uw.fpsimd_state.
*
* Task can be a non-runnable task, or current. In the latter case,
- * softirqs (and preemption) must be disabled.
+ * the FPSIMD context must be acquired with get_fpu_fpsimd_context()
+ * before calling this function.
* task->thread.sve_state must point to at least sve_state_size(task)
* bytes of allocated kernel memory.
* task->thread.sve_state must be up to date before calling this function.
@@ -539,7 +602,7 @@ int sve_set_vector_length(struct task_struct *task,
* non-SVE thread.
*/
if (task == current) {
- local_bh_disable();
+ get_cpu_fpsimd_context();
fpsimd_save();
}
@@ -549,7 +612,7 @@ int sve_set_vector_length(struct task_struct *task,
sve_to_fpsimd(task);
if (task == current)
- local_bh_enable();
+ put_cpu_fpsimd_context();
/*
* Force reallocation of task SVE state to the correct size
@@ -862,7 +925,7 @@ asmlinkage void do_sve_acc(unsigned int esr, struct pt_regs *regs)
sve_alloc(current);
- local_bh_disable();
+ get_cpu_fpsimd_context();
fpsimd_save();
@@ -873,7 +936,7 @@ asmlinkage void do_sve_acc(unsigned int esr, struct pt_regs *regs)
if (test_and_set_thread_flag(TIF_SVE))
WARN_ON(1); /* SVE access shouldn't have trapped */
- local_bh_enable();
+ put_cpu_fpsimd_context();
}
/*
@@ -917,6 +980,8 @@ void fpsimd_thread_switch(struct task_struct *next)
if (!system_supports_fpsimd())
return;
+ __get_cpu_fpsimd_context();
+
/* Save unsaved fpsimd state, if any: */
fpsimd_save();
@@ -931,6 +996,8 @@ void fpsimd_thread_switch(struct task_struct *next)
update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
wrong_task || wrong_cpu);
+
+ __put_cpu_fpsimd_context();
}
void fpsimd_flush_thread(void)
@@ -940,7 +1007,7 @@ void fpsimd_flush_thread(void)
if (!system_supports_fpsimd())
return;
- local_bh_disable();
+ get_cpu_fpsimd_context();
fpsimd_flush_task_state(current);
memset(¤t->thread.uw.fpsimd_state, 0,
@@ -981,7 +1048,7 @@ void fpsimd_flush_thread(void)
current->thread.sve_vl_onexec = 0;
}
- local_bh_enable();
+ put_cpu_fpsimd_context();
}
/*
@@ -993,9 +1060,9 @@ void fpsimd_preserve_current_state(void)
if (!system_supports_fpsimd())
return;
- local_bh_disable();
+ get_cpu_fpsimd_context();
fpsimd_save();
- local_bh_enable();
+ put_cpu_fpsimd_context();
}
/*
@@ -1012,7 +1079,8 @@ void fpsimd_signal_preserve_current_state(void)
/*
* Associate current's FPSIMD context with this cpu
- * Preemption must be disabled when calling this function.
+ * The FPSIMD context should be acquired with get_cpu_fpsimd_context()
+ * before calling this function.
*/
void fpsimd_bind_task_to_cpu(void)
{
@@ -1058,14 +1126,14 @@ void fpsimd_restore_current_state(void)
if (!system_supports_fpsimd())
return;
- local_bh_disable();
+ get_cpu_fpsimd_context();
if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
task_fpsimd_load();
fpsimd_bind_task_to_cpu();
}
- local_bh_enable();
+ put_cpu_fpsimd_context();
}
/*
@@ -1078,7 +1146,7 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
if (!system_supports_fpsimd())
return;
- local_bh_disable();
+ get_cpu_fpsimd_context();
current->thread.uw.fpsimd_state = *state;
if (system_supports_sve() && test_thread_flag(TIF_SVE))
@@ -1089,7 +1157,7 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
clear_thread_flag(TIF_FOREIGN_FPSTATE);
- local_bh_enable();
+ put_cpu_fpsimd_context();
}
/*
@@ -1115,7 +1183,8 @@ void fpsimd_flush_task_state(struct task_struct *t)
/*
* Invalidate any task's FPSIMD state that is present on this cpu.
- * This function must be called with softirqs disabled.
+ * The FPSIMD context should be acquired with get_cpu_fpsimd_context()
+ * before calling this function.
*/
static void fpsimd_flush_cpu_state(void)
{
@@ -1125,19 +1194,18 @@ static void fpsimd_flush_cpu_state(void)
/*
* Save the FPSIMD state to memory and invalidate cpu view.
- * This function must be called with softirqs (and preemption) disabled.
+ * This function must be called with preemption disabled.
*/
void fpsimd_save_and_flush_cpu_state(void)
{
+ __get_cpu_fpsimd_context();
fpsimd_save();
fpsimd_flush_cpu_state();
+ __put_cpu_fpsimd_context();
}
#ifdef CONFIG_KERNEL_MODE_NEON
-DEFINE_PER_CPU(bool, kernel_neon_busy);
-EXPORT_PER_CPU_SYMBOL(kernel_neon_busy);
-
/*
* Kernel-side NEON support functions
*/
@@ -1162,19 +1230,13 @@ void kernel_neon_begin(void)
BUG_ON(!may_use_simd());
- local_bh_disable();
-
- __this_cpu_write(kernel_neon_busy, true);
+ get_cpu_fpsimd_context();
/* Save unsaved fpsimd state, if any: */
fpsimd_save();
/* Invalidate any task state remaining in the fpsimd regs: */
fpsimd_flush_cpu_state();
-
- preempt_disable();
-
- local_bh_enable();
}
EXPORT_SYMBOL(kernel_neon_begin);
@@ -1189,15 +1251,10 @@ EXPORT_SYMBOL(kernel_neon_begin);
*/
void kernel_neon_end(void)
{
- bool busy;
-
if (!system_supports_fpsimd())
return;
- busy = __this_cpu_xchg(kernel_neon_busy, false);
- WARN_ON(!busy); /* No matching kernel_neon_begin()? */
-
- preempt_enable();
+ put_cpu_fpsimd_context();
}
EXPORT_SYMBOL(kernel_neon_end);
--
2.11.0
The function sve_flush_cpu_state() has been removed in commit
21cdd7fd76e3 "KVM: arm64: Remove eager host SVE state saving".
So remove the associated prototype in asm/fpsimd.h.
Signed-off-by: Julien Grall <[email protected]>
---
Changes in v2:
- Patch added
---
arch/arm64/include/asm/fpsimd.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index ad6d2e41eb37..32abbfd852e8 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -64,7 +64,6 @@ extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
extern void fpsimd_flush_task_state(struct task_struct *target);
extern void fpsimd_flush_cpu_state(void);
-extern void sve_flush_cpu_state(void);
/* Maximum VL that SVE VL-agnostic software can transparently support */
#define SVE_VL_ARCH_MAX 0x100
--
2.11.0
On Fri, Apr 12, 2019 at 06:14:18PM +0100, Julien Grall wrote:
> The function sve_flush_cpu_state() has been removed in commit
> 21cdd7fd76e3 "KVM: arm64: Remove eager host SVE state saving".
>
> So remove the associated prototype in asm/fpsimd.h.
>
> Signed-off-by: Julien Grall <[email protected]>
Good spot, thanks.
Reviewed-by: Dave Martin <[email protected]>
>
> ---
> Changes in v2:
> - Patch added
> ---
> arch/arm64/include/asm/fpsimd.h | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index ad6d2e41eb37..32abbfd852e8 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -64,7 +64,6 @@ extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
>
> extern void fpsimd_flush_task_state(struct task_struct *target);
> extern void fpsimd_flush_cpu_state(void);
> -extern void sve_flush_cpu_state(void);
>
> /* Maximum VL that SVE VL-agnostic software can transparently support */
> #define SVE_VL_ARCH_MAX 0x100
> --
> 2.11.0
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Fri, Apr 12, 2019 at 06:14:19PM +0100, Julien Grall wrote:
> The only external user of fpsimd_save() and fpsimd_flush_cpu_state() is
> the KVM FPSIMD code.
>
> A follow-up patch will mandate hold of the FPSIMD context while
This is a bit hard to read without understanding the following patch.
Maybe elaborate a bit with something like
"The following patch will introduce a mechanism to acquire ownership of the FPSIMD/SVE context for performing context management operations. Rather than having to export this [...]"
> modifying it. Rather than having to export the new helpers to get/put
> the context, we can just introduce a new function to combine
> fpsimd_save() and fpsimd_flush_cpu_state().
>
> This has also the advantage to remove any external call of fpsimd_save()
> and fpsimd_flush_cpu_state(), so they can be turned static.
>
> Lastly, the new function can also be used in the PM notifier.
>
> Signed-off-by: Julien Grall <[email protected]>
>
> ---
> Changes in v2:
> - Patch added
> ---
> arch/arm64/include/asm/fpsimd.h | 5 ++---
> arch/arm64/kernel/fpsimd.c | 17 +++++++++++++----
> arch/arm64/kvm/fpsimd.c | 3 +--
> 3 files changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 32abbfd852e8..88c223f6ded6 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -48,11 +48,11 @@ struct task_struct;
> extern void fpsimd_save_state(struct user_fpsimd_state *state);
> extern void fpsimd_load_state(struct user_fpsimd_state *state);
>
> -extern void fpsimd_save(void);
> -
> extern void fpsimd_thread_switch(struct task_struct *next);
> extern void fpsimd_flush_thread(void);
>
> +extern void fpsimd_save_and_flush_cpu_state(void);
> +
Nit, bit I think this makes more sense alongside the other flush
functions.
Can you just drop this new declaration in place of the old declaration
for sve_flush_cpu_state()?
> extern void fpsimd_signal_preserve_current_state(void);
> extern void fpsimd_preserve_current_state(void);
> extern void fpsimd_restore_current_state(void);
> @@ -63,7 +63,6 @@ extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
> void *sve_state, unsigned int sve_vl);
>
> extern void fpsimd_flush_task_state(struct task_struct *target);
> -extern void fpsimd_flush_cpu_state(void);
>
> /* Maximum VL that SVE VL-agnostic software can transparently support */
> #define SVE_VL_ARCH_MAX 0x100
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 577296bba730..9e4e4b6acd93 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -241,7 +241,7 @@ static void task_fpsimd_load(void)
> *
> * Softirqs (and preemption) must be disabled.
> */
> -void fpsimd_save(void)
> +static void fpsimd_save(void)
> {
> struct fpsimd_last_state_struct const *last =
> this_cpu_ptr(&fpsimd_last_state);
> @@ -1117,12 +1117,22 @@ void fpsimd_flush_task_state(struct task_struct *t)
> * Invalidate any task's FPSIMD state that is present on this cpu.
> * This function must be called with softirqs disabled.
> */
> -void fpsimd_flush_cpu_state(void)
> +static void fpsimd_flush_cpu_state(void)
> {
> __this_cpu_write(fpsimd_last_state.st, NULL);
> set_thread_flag(TIF_FOREIGN_FPSTATE);
> }
>
> +/*
> + * Save the FPSIMD state to memory and invalidate cpu view.
> + * This function must be called with softirqs (and preemption) disabled.
> + */
> +void fpsimd_save_and_flush_cpu_state(void)
> +{
> + fpsimd_save();
> + fpsimd_flush_cpu_state();
> +}
> +
> #ifdef CONFIG_KERNEL_MODE_NEON
>
> DEFINE_PER_CPU(bool, kernel_neon_busy);
> @@ -1279,8 +1289,7 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
> {
> switch (cmd) {
> case CPU_PM_ENTER:
> - fpsimd_save();
> - fpsimd_flush_cpu_state();
> + fpsimd_save_and_flush_cpu_state();
Any reason not to do this in kernel_neon_begin() too?
> break;
> case CPU_PM_EXIT:
> break;
> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> index 6e3c9c8b2df9..01bd2e862506 100644
> --- a/arch/arm64/kvm/fpsimd.c
> +++ b/arch/arm64/kvm/fpsimd.c
> @@ -113,8 +113,7 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
> u64 *guest_zcr = &vcpu->arch.ctxt.sys_regs[ZCR_EL1];
>
> /* Clean guest FP state to memory and invalidate cpu view */
You can lose the comment here, since the call is now self-explanatory.
> - fpsimd_save();
> - fpsimd_flush_cpu_state();
> + fpsimd_save_and_flush_cpu_state();
[...]
Otherwise, looks sensible.
Cheers
---Dave
Hi Dave,
On 4/16/19 12:53 PM, Dave Martin wrote:
> On Fri, Apr 12, 2019 at 06:14:19PM +0100, Julien Grall wrote:
>> The only external user of fpsimd_save() and fpsimd_flush_cpu_state() is
>> the KVM FPSIMD code.
>>
>> A follow-up patch will mandate hold of the FPSIMD context while
>
> This is a bit hard to read without understanding the following patch.
>
> Maybe elaborate a bit with something like
>
> "The following patch will introduce a mechanism to acquire ownership of the FPSIMD/SVE context for performing context management operations. Rather than having to export this [...]"
I will update the commit message with your suggestion.
>
>> modifying it. Rather than having to export the new helpers to get/put
>> the context, we can just introduce a new function to combine
>> fpsimd_save() and fpsimd_flush_cpu_state().
>>
>> This has also the advantage to remove any external call of fpsimd_save()
>> and fpsimd_flush_cpu_state(), so they can be turned static.
>>
>> Lastly, the new function can also be used in the PM notifier.
>>
>> Signed-off-by: Julien Grall <[email protected]>
>>
>> ---
>> Changes in v2:
>> - Patch added
>> ---
>> arch/arm64/include/asm/fpsimd.h | 5 ++---
>> arch/arm64/kernel/fpsimd.c | 17 +++++++++++++----
>> arch/arm64/kvm/fpsimd.c | 3 +--
>> 3 files changed, 16 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
>> index 32abbfd852e8..88c223f6ded6 100644
>> --- a/arch/arm64/include/asm/fpsimd.h
>> +++ b/arch/arm64/include/asm/fpsimd.h
>> @@ -48,11 +48,11 @@ struct task_struct;
>> extern void fpsimd_save_state(struct user_fpsimd_state *state);
>> extern void fpsimd_load_state(struct user_fpsimd_state *state);
>>
>> -extern void fpsimd_save(void);
>> -
>> extern void fpsimd_thread_switch(struct task_struct *next);
>> extern void fpsimd_flush_thread(void);
>>
>> +extern void fpsimd_save_and_flush_cpu_state(void);
>> +
>
> Nit, bit I think this makes more sense alongside the other flush
> functions.
>
> Can you just drop this new declaration in place of the old declaration
> for sve_flush_cpu_state()?
Sure, I will do it in the next version.
>
>> extern void fpsimd_signal_preserve_current_state(void);
>> extern void fpsimd_preserve_current_state(void);
>> extern void fpsimd_restore_current_state(void);
>> @@ -63,7 +63,6 @@ extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
>> void *sve_state, unsigned int sve_vl);
>>
>> extern void fpsimd_flush_task_state(struct task_struct *target);
>> -extern void fpsimd_flush_cpu_state(void);
>>
>> /* Maximum VL that SVE VL-agnostic software can transparently support */
>> #define SVE_VL_ARCH_MAX 0x100
>> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
>> index 577296bba730..9e4e4b6acd93 100644
>> --- a/arch/arm64/kernel/fpsimd.c
>> +++ b/arch/arm64/kernel/fpsimd.c
>> @@ -241,7 +241,7 @@ static void task_fpsimd_load(void)
>> *
>> * Softirqs (and preemption) must be disabled.
>> */
>> -void fpsimd_save(void)
>> +static void fpsimd_save(void)
>> {
>> struct fpsimd_last_state_struct const *last =
>> this_cpu_ptr(&fpsimd_last_state);
>> @@ -1117,12 +1117,22 @@ void fpsimd_flush_task_state(struct task_struct *t)
>> * Invalidate any task's FPSIMD state that is present on this cpu.
>> * This function must be called with softirqs disabled.
>> */
>> -void fpsimd_flush_cpu_state(void)
>> +static void fpsimd_flush_cpu_state(void)
>> {
>> __this_cpu_write(fpsimd_last_state.st, NULL);
>> set_thread_flag(TIF_FOREIGN_FPSTATE);
>> }
>>
>> +/*
>> + * Save the FPSIMD state to memory and invalidate cpu view.
>> + * This function must be called with softirqs (and preemption) disabled.
>> + */
>> +void fpsimd_save_and_flush_cpu_state(void)
>> +{
>> + fpsimd_save();
>> + fpsimd_flush_cpu_state();
>> +}
>> +
>> #ifdef CONFIG_KERNEL_MODE_NEON
>>
>> DEFINE_PER_CPU(bool, kernel_neon_busy);
>> @@ -1279,8 +1289,7 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
>> {
>> switch (cmd) {
>> case CPU_PM_ENTER:
>> - fpsimd_save();
>> - fpsimd_flush_cpu_state();
>> + fpsimd_save_and_flush_cpu_state();
>
> Any reason not to do this in kernel_neon_begin() too?
In the next patch, fpsimd_save_and_flush_cpu_state() will also call
{get, put}_cpu_fpsimd_context(). So we avoid to expose the two new
functions outside of the file.
I could introduce a double-underscore version of this function if you
think it is worth it.
>
>> break;
>> case CPU_PM_EXIT:
>> break;
>> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
>> index 6e3c9c8b2df9..01bd2e862506 100644
>> --- a/arch/arm64/kvm/fpsimd.c
>> +++ b/arch/arm64/kvm/fpsimd.c
>> @@ -113,8 +113,7 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
>> u64 *guest_zcr = &vcpu->arch.ctxt.sys_regs[ZCR_EL1];
>>
>> /* Clean guest FP state to memory and invalidate cpu view */
>
> You can lose the comment here, since the call is now self-explanatory.
Sure.
>
>> - fpsimd_save();
>> - fpsimd_flush_cpu_state();
>> + fpsimd_save_and_flush_cpu_state();
>
> [...]
>
> Otherwise, looks sensible.
Thank you for the review!
Cheers,
--
Julien Grall
On Fri, Apr 12, 2019 at 06:14:20PM +0100, Julien Grall wrote:
> When the kernel is compiled with CONFIG_KERNEL_MODE_NEON, some part of
> the kernel may be able to use FPSIMD/SVE. This is for instance the case
> for crypto code.
>
> Any use of FPSIMD/SVE in the kernel are clearly marked by using the
> function kernel_neon_{begin, end}. Furthermore, this can only be used
> when may_use_simd() returns true.
>
> The current implementation of may_use_simd() allows softirq to use
> FPSIMD/SVE unless it is currently in used (i.e kernel_neon_busy is true).
Nit: "in used" -> "in use"
> When in used, softirqs usually fallback to a software method.
Likewise.
Nit: "fallback" -> "fall back"
> At the moment, as a softirq may use FPSIMD/SVE, softirqs are disabled
> when touching the FPSIMD/SVE context. This has the drawback to disable
> all softirqs even if they are not using FPSIMD/SVE.
>
> As a softirq should not rely on been able to use simd at a given time,
> there are limited reason to keep softirq disabled when touching the
The implication is not totally clear to me here. Maybe write something
like
"Since a softirq is supposed to check may_use_simd() anyway before
attempting to use FPSIMD/SVE, there is limited reason to keep softirq
disabled when touching the FPSIMD/SVE context [...]"
> FPSIMD/SVE context. Instead, we can only disable preemption and tell
I'd put "just" or "simply" instead of "only" here.
> the NEON unit is currently in use.
Maybe "mark the FPSIMD/SVE context as in use by setting the
CPU's kernel_neon_busy flag".
> This patch introduces two new helpers {get, put}_cpu_fpsimd_context to
> mark the area using FPSIMD/SVE context and use them in replacement of
uses
> local_bh_{disable, enable}. The functions kernel_neon_{begin, end} are
> also re-implemented to use the new helpers.
>
> Additionally, this patch introduced a double-underscored version of each
introduces
> helper that can be used when preemption is disabled. This avoid to
> disable/enable preemption for again and helps documenting places where
The wording seems a bit mangled here? Also, these are not for general
use, so maybe say something like
"For use in the fpsimd_thread_switch(), which is a critical path where
preemption is already disabled, double-underscored versions of the
helpers are provided to avoid disabling preemption again."
(I'm assuming here that we don't need to use these elsewhere -- see
other comments.)
> context can only be used by one instance.
>
> This patch has been benchmarked on Linux 5.1-rc4 with defconfig.
>
> On Juno2:
> * hackbench 100 process 1000 (10 times)
> * .7% quicker
>
> On ThunderX 2:
> * hackbench 1000 process 1000 (20 times)
> * 3.4% quicker
>
> Signed-off-by: Julien Grall <[email protected]>
>
> ---
> Changes in v2:
> - Remove spurious call to kernel_neon_enable in kernel_neon_begin.
> - Rename kernel_neon_{enable, disable} to {get, put}_cpu_fpsimd_context
> - Introduce a double-underscore version of the helpers for case
> where preemption is already disabled
> - Introduce have_cpu_fpsimd_context() and use it in WARN_ON(...)
> - Surround more places in the code with the new helpers
> - Rework the comments
> - Update the commit message with the benchmark result
> ---
> arch/arm64/include/asm/simd.h | 4 +-
> arch/arm64/kernel/fpsimd.c | 133 ++++++++++++++++++++++++++++++------------
> 2 files changed, 97 insertions(+), 40 deletions(-)
>
> diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
> index 6495cc51246f..94c0dac508aa 100644
> --- a/arch/arm64/include/asm/simd.h
> +++ b/arch/arm64/include/asm/simd.h
> @@ -15,10 +15,10 @@
> #include <linux/preempt.h>
> #include <linux/types.h>
>
> -#ifdef CONFIG_KERNEL_MODE_NEON
> -
> DECLARE_PER_CPU(bool, kernel_neon_busy);
>
> +#ifdef CONFIG_KERNEL_MODE_NEON
> +
> /*
> * may_use_simd - whether it is allowable at this time to issue SIMD
> * instructions or access the SIMD register file
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 9e4e4b6acd93..761d848fb26d 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -92,7 +92,8 @@
> * To prevent this from racing with the manipulation of the task's FPSIMD state
> * from task context and thereby corrupting the state, it is necessary to
> * protect any manipulation of a task's fpsimd_state or TIF_FOREIGN_FPSTATE
> - * flag with local_bh_disable() unless softirqs are already masked.
> + * flag with kernel_neon_{disable, enable}. This will still allow softirqs to
These names don't match the code now.
> + * run but prevent them to use FPSIMD.
> *
> * For a certain task, the sequence may look something like this:
> * - the task gets scheduled in; if both the task's fpsimd_cpu field
> @@ -150,6 +151,58 @@ extern void __percpu *efi_sve_state;
>
> #endif /* ! CONFIG_ARM64_SVE */
>
> +DEFINE_PER_CPU(bool, kernel_neon_busy);
> +EXPORT_PER_CPU_SYMBOL(kernel_neon_busy);
This feels mis-named now. Maybe "fpsimd_context_busy" would be a better
name?
> +
> +/*
> + * Obtain the CPU FPSIMD context for use by the calling context.
> + *
> + * The caller may freely modify FPSIMD context until *put_cpu_fpsimd_context()
Nit: Why *? This makes it look a bit like get_cpu_fpsimd_context()
returns a pointer and you're saying something about dereferencing that
pointer here.
> + * is called.
> + *
> + * The double-underscore version must only be called if you know the task
> + * can't be preempted.
> + *
> + * __get_cpu_fpsimd_context() *must* be in pair with __put_cpu_fpsimd_context()
> + * get_cpu_fpsimd_context() *must* be in pair with put_cpu_fpsimd_context()
"in pair" -> "paired with"?
I'd move each of these comments to be next to the function it applies
to.
> + */
> +static void __get_cpu_fpsimd_context(void)
> +{
> + bool busy = __this_cpu_xchg(kernel_neon_busy, true);
> +
I don't mind whether there is a blank line here or not, but please make
it consistent with __put_cpu_fpsimd_context().
> + WARN_ON(busy);
> +}
> +
> +static void get_cpu_fpsimd_context(void)
> +{
> + preempt_disable();
> + __get_cpu_fpsimd_context();
> +}
> +
> +/*
> + * Release the CPU FPSIMD context.
> + *
> + * Must be called from a context in which *get_cpu_fpsimd_context() was
Nit: Why *?
> + * previously called, with no call to *put_cpu_fpsimd_context() in the
> + * meantime.
> + */
> +static void __put_cpu_fpsimd_context(void)
> +{
> + bool busy = __this_cpu_xchg(kernel_neon_busy, false);
> + WARN_ON(!busy); /* No matching get_cpu_fpsimd_context()? */
> +}
> +
> +static void put_cpu_fpsimd_context(void)
> +{
> + __put_cpu_fpsimd_context();
> + preempt_enable();
> +}
> +
> +static bool have_cpu_fpsimd_context(void)
> +{
> + return (!preemptible() && __this_cpu_read(kernel_neon_busy));
Nit: Redundant ()
> +}
> +
> /*
> * Call __sve_free() directly only if you know task can't be scheduled
> * or preempted.
> @@ -221,11 +274,12 @@ static void sve_free(struct task_struct *task)
> * thread_struct is known to be up to date, when preparing to enter
> * userspace.
> *
> - * Softirqs (and preemption) must be disabled.
> + * The FPSIMD context must be acquired with get_cpu_fpsimd_context()
or __get_cpu_fpsimd_context()? Since this is effectively documented by
the WARN_ON() and this is a local function anyway, maybe it would be
simpler just to drop this comment here?
> + * before calling this function.
> */
> static void task_fpsimd_load(void)
> {
> - WARN_ON(!in_softirq() && !irqs_disabled());
> + WARN_ON(!have_cpu_fpsimd_context());
>
> if (system_supports_sve() && test_thread_flag(TIF_SVE))
> sve_load_state(sve_pffr(¤t->thread),
> @@ -239,15 +293,22 @@ static void task_fpsimd_load(void)
> * Ensure FPSIMD/SVE storage in memory for the loaded context is up to
> * date with respect to the CPU registers.
> *
> - * Softirqs (and preemption) must be disabled.
> + * The FPSIMD context must be acquired with get_cpu_fpsimd_context()
Likewise.
> + * before calling this function.
> */
> static void fpsimd_save(void)
> {
> struct fpsimd_last_state_struct const *last =
> this_cpu_ptr(&fpsimd_last_state);
> /* set by fpsimd_bind_task_to_cpu() or fpsimd_bind_state_to_cpu() */
> + WARN_ON(!have_cpu_fpsimd_context());
>
> - WARN_ON(!in_softirq() && !irqs_disabled());
> + if ( !have_cpu_fpsimd_context() )
Nit: Redundant whitespace around expression.
> + {
> + printk("preemptible() = %u kernel_neon_busy = %u\n",
> + preemptible(), __this_cpu_read(kernel_neon_busy));
> + while (1);
> + }
>
> if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> if (system_supports_sve() && test_thread_flag(TIF_SVE)) {
> @@ -352,7 +413,8 @@ static int __init sve_sysctl_init(void) { return 0; }
> * task->thread.sve_state.
> *
> * Task can be a non-runnable task, or current. In the latter case,
> - * softirqs (and preemption) must be disabled.
> + * the FPSIMD context must be acquired with get_fpu_fpsimd_context()
> + * before calling this function.
> * task->thread.sve_state must point to at least sve_state_size(task)
> * bytes of allocated kernel memory.
> * task->thread.uw.fpsimd_state must be up to date before calling this
> @@ -379,7 +441,8 @@ static void fpsimd_to_sve(struct task_struct *task)
> * task->thread.uw.fpsimd_state.
> *
> * Task can be a non-runnable task, or current. In the latter case,
> - * softirqs (and preemption) must be disabled.
> + * the FPSIMD context must be acquired with get_fpu_fpsimd_context()
> + * before calling this function.
> * task->thread.sve_state must point to at least sve_state_size(task)
> * bytes of allocated kernel memory.
> * task->thread.sve_state must be up to date before calling this function.
> @@ -539,7 +602,7 @@ int sve_set_vector_length(struct task_struct *task,
> * non-SVE thread.
> */
> if (task == current) {
> - local_bh_disable();
> + get_cpu_fpsimd_context();
>
> fpsimd_save();
> }
> @@ -549,7 +612,7 @@ int sve_set_vector_length(struct task_struct *task,
> sve_to_fpsimd(task);
>
> if (task == current)
> - local_bh_enable();
> + put_cpu_fpsimd_context();
>
> /*
> * Force reallocation of task SVE state to the correct size
> @@ -862,7 +925,7 @@ asmlinkage void do_sve_acc(unsigned int esr, struct pt_regs *regs)
>
> sve_alloc(current);
>
> - local_bh_disable();
> + get_cpu_fpsimd_context();
>
> fpsimd_save();
>
> @@ -873,7 +936,7 @@ asmlinkage void do_sve_acc(unsigned int esr, struct pt_regs *regs)
> if (test_and_set_thread_flag(TIF_SVE))
> WARN_ON(1); /* SVE access shouldn't have trapped */
>
> - local_bh_enable();
> + put_cpu_fpsimd_context();
> }
>
> /*
> @@ -917,6 +980,8 @@ void fpsimd_thread_switch(struct task_struct *next)
> if (!system_supports_fpsimd())
> return;
>
> + __get_cpu_fpsimd_context();
> +
> /* Save unsaved fpsimd state, if any: */
> fpsimd_save();
>
> @@ -931,6 +996,8 @@ void fpsimd_thread_switch(struct task_struct *next)
>
> update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
> wrong_task || wrong_cpu);
> +
> + __put_cpu_fpsimd_context();
> }
>
> void fpsimd_flush_thread(void)
> @@ -940,7 +1007,7 @@ void fpsimd_flush_thread(void)
> if (!system_supports_fpsimd())
> return;
>
> - local_bh_disable();
> + get_cpu_fpsimd_context();
>
> fpsimd_flush_task_state(current);
> memset(¤t->thread.uw.fpsimd_state, 0,
> @@ -981,7 +1048,7 @@ void fpsimd_flush_thread(void)
> current->thread.sve_vl_onexec = 0;
> }
>
> - local_bh_enable();
> + put_cpu_fpsimd_context();
> }
>
> /*
> @@ -993,9 +1060,9 @@ void fpsimd_preserve_current_state(void)
> if (!system_supports_fpsimd())
> return;
>
> - local_bh_disable();
> + get_cpu_fpsimd_context();
> fpsimd_save();
> - local_bh_enable();
> + put_cpu_fpsimd_context();
> }
>
> /*
> @@ -1012,7 +1079,8 @@ void fpsimd_signal_preserve_current_state(void)
>
> /*
> * Associate current's FPSIMD context with this cpu
> - * Preemption must be disabled when calling this function.
> + * The FPSIMD context should be acquired with get_cpu_fpsimd_context()
> + * before calling this function.
> */
> void fpsimd_bind_task_to_cpu(void)
> {
> @@ -1058,14 +1126,14 @@ void fpsimd_restore_current_state(void)
> if (!system_supports_fpsimd())
> return;
>
> - local_bh_disable();
> + get_cpu_fpsimd_context();
>
> if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
> task_fpsimd_load();
> fpsimd_bind_task_to_cpu();
> }
>
> - local_bh_enable();
> + put_cpu_fpsimd_context();
> }
>
> /*
> @@ -1078,7 +1146,7 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
> if (!system_supports_fpsimd())
> return;
>
> - local_bh_disable();
> + get_cpu_fpsimd_context();
>
> current->thread.uw.fpsimd_state = *state;
> if (system_supports_sve() && test_thread_flag(TIF_SVE))
> @@ -1089,7 +1157,7 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
>
> clear_thread_flag(TIF_FOREIGN_FPSTATE);
>
> - local_bh_enable();
> + put_cpu_fpsimd_context();
> }
>
> /*
> @@ -1115,7 +1183,8 @@ void fpsimd_flush_task_state(struct task_struct *t)
>
> /*
> * Invalidate any task's FPSIMD state that is present on this cpu.
> - * This function must be called with softirqs disabled.
> + * The FPSIMD context should be acquired with get_cpu_fpsimd_context()
> + * before calling this function.
> */
> static void fpsimd_flush_cpu_state(void)
> {
> @@ -1125,19 +1194,18 @@ static void fpsimd_flush_cpu_state(void)
>
> /*
> * Save the FPSIMD state to memory and invalidate cpu view.
> - * This function must be called with softirqs (and preemption) disabled.
> + * This function must be called with preemption disabled.
> */
> void fpsimd_save_and_flush_cpu_state(void)
> {
> + __get_cpu_fpsimd_context();
> fpsimd_save();
> fpsimd_flush_cpu_state();
> + __put_cpu_fpsimd_context();
It may be cleaner to avoid the assumption about preemption already being
disabled here. fpsimd_thread_switch() is rather a special case, but for
this one is this really used on a hot path that justifies the assumption?
If not, we could just move to the regular (non-__) functions here and
drop that comment.
[...]
Cheers
---Dave
Hi Dave,
On 16/04/2019 13:30, Dave Martin wrote:
> On Fri, Apr 12, 2019 at 06:14:20PM +0100, Julien Grall wrote:
>> When the kernel is compiled with CONFIG_KERNEL_MODE_NEON, some part of
>> the kernel may be able to use FPSIMD/SVE. This is for instance the case
>> for crypto code.
>>
>> Any use of FPSIMD/SVE in the kernel are clearly marked by using the
>> function kernel_neon_{begin, end}. Furthermore, this can only be used
>> when may_use_simd() returns true.
>>
>> The current implementation of may_use_simd() allows softirq to use
>> FPSIMD/SVE unless it is currently in used (i.e kernel_neon_busy is true).
>
> Nit: "in used" -> "in use"
>
>> When in used, softirqs usually fallback to a software method.
>
> Likewise.
>
> Nit: "fallback" -> "fall back"
>
>> At the moment, as a softirq may use FPSIMD/SVE, softirqs are disabled
>> when touching the FPSIMD/SVE context. This has the drawback to disable
>> all softirqs even if they are not using FPSIMD/SVE.
>>
>> As a softirq should not rely on been able to use simd at a given time,
>> there are limited reason to keep softirq disabled when touching the
>
> The implication is not totally clear to me here. Maybe write something
> like
>
> "Since a softirq is supposed to check may_use_simd() anyway before
> attempting to use FPSIMD/SVE, there is limited reason to keep softirq
> disabled when touching the FPSIMD/SVE context [...]"
I will update the commit message.
>
>> FPSIMD/SVE context. Instead, we can only disable preemption and tell
>
> I'd put "just" or "simply" instead of "only" here.
>
>> the NEON unit is currently in use.
>
> Maybe "mark the FPSIMD/SVE context as in use by setting the
> CPU's kernel_neon_busy flag".
Sounds better as this flag does not protect only the hardware but some part of
context that resides in memory.
>
>> This patch introduces two new helpers {get, put}_cpu_fpsimd_context to
>> mark the area using FPSIMD/SVE context and use them in replacement of
>
> uses
>
>> local_bh_{disable, enable}. The functions kernel_neon_{begin, end} are
>> also re-implemented to use the new helpers.
>>
>> Additionally, this patch introduced a double-underscored version of each
>
> introduces
>
>> helper that can be used when preemption is disabled. This avoid to
>> disable/enable preemption for again and helps documenting places where
>
> The wording seems a bit mangled here?
Oops yes.
> Also, these are not for general use, so maybe say something like
>
> "For use in the fpsimd_thread_switch(), which is a critical path where
> preemption is already disabled, double-underscored versions of the
> helpers are provided to avoid disabling preemption again."
>
> (I'm assuming here that we don't need to use these elsewhere -- see
> other comments.)
I will comment on this below.
>
>> context can only be used by one instance.
>>
>> This patch has been benchmarked on Linux 5.1-rc4 with defconfig.
>>
>> On Juno2:
>> * hackbench 100 process 1000 (10 times)
>> * .7% quicker
>>
>> On ThunderX 2:
>> * hackbench 1000 process 1000 (20 times)
>> * 3.4% quicker
>>
>> Signed-off-by: Julien Grall <[email protected]>
>>
>> ---
>> Changes in v2:
>> - Remove spurious call to kernel_neon_enable in kernel_neon_begin.
>> - Rename kernel_neon_{enable, disable} to {get, put}_cpu_fpsimd_context
>> - Introduce a double-underscore version of the helpers for case
>> where preemption is already disabled
>> - Introduce have_cpu_fpsimd_context() and use it in WARN_ON(...)
>> - Surround more places in the code with the new helpers
>> - Rework the comments
>> - Update the commit message with the benchmark result
>> ---
>> arch/arm64/include/asm/simd.h | 4 +-
>> arch/arm64/kernel/fpsimd.c | 133 ++++++++++++++++++++++++++++++------------
>> 2 files changed, 97 insertions(+), 40 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
>> index 6495cc51246f..94c0dac508aa 100644
>> --- a/arch/arm64/include/asm/simd.h
>> +++ b/arch/arm64/include/asm/simd.h
>> @@ -15,10 +15,10 @@
>> #include <linux/preempt.h>
>> #include <linux/types.h>
>>
>> -#ifdef CONFIG_KERNEL_MODE_NEON
>> -
>> DECLARE_PER_CPU(bool, kernel_neon_busy);
>>
>> +#ifdef CONFIG_KERNEL_MODE_NEON
>> +
>> /*
>> * may_use_simd - whether it is allowable at this time to issue SIMD
>> * instructions or access the SIMD register file
>> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
>> index 9e4e4b6acd93..761d848fb26d 100644
>> --- a/arch/arm64/kernel/fpsimd.c
>> +++ b/arch/arm64/kernel/fpsimd.c
>> @@ -92,7 +92,8 @@
>> * To prevent this from racing with the manipulation of the task's FPSIMD state
>> * from task context and thereby corrupting the state, it is necessary to
>> * protect any manipulation of a task's fpsimd_state or TIF_FOREIGN_FPSTATE
>> - * flag with local_bh_disable() unless softirqs are already masked.
>> + * flag with kernel_neon_{disable, enable}. This will still allow softirqs to
>
> These names don't match the code now.
>
>> + * run but prevent them to use FPSIMD.
>> *
>> * For a certain task, the sequence may look something like this:
>> * - the task gets scheduled in; if both the task's fpsimd_cpu field
>> @@ -150,6 +151,58 @@ extern void __percpu *efi_sve_state;
>>
>> #endif /* ! CONFIG_ARM64_SVE */
>>
>> +DEFINE_PER_CPU(bool, kernel_neon_busy);
>> +EXPORT_PER_CPU_SYMBOL(kernel_neon_busy);
>
> This feels mis-named now. Maybe "fpsimd_context_busy" would be a better
> name?
Make sense. I will update it.
>
>> +
>> +/*
>> + * Obtain the CPU FPSIMD context for use by the calling context.
>> + *
>> + * The caller may freely modify FPSIMD context until *put_cpu_fpsimd_context()
>
> Nit: Why *? This makes it look a bit like get_cpu_fpsimd_context()
> returns a pointer and you're saying something about dereferencing that
> pointer here.
I tend to use * for wildcard. In this context it used to refers to both the
double-underscored version and the one without.
I can use {,__}put_cpu_fpsimd_context() instead.
>
>> + * is called.
>> + *
>> + * The double-underscore version must only be called if you know the task
>> + * can't be preempted.
>> + *
>> + * __get_cpu_fpsimd_context() *must* be in pair with __put_cpu_fpsimd_context()
>> + * get_cpu_fpsimd_context() *must* be in pair with put_cpu_fpsimd_context()
>
> "in pair" -> "paired with"?
Sure.
>
> I'd move each of these comments to be next to the function it applies
> to.
Do you mean on top of {,__}put_cpu_ or {,__}get_cpu?
>
>> + */
>> +static void __get_cpu_fpsimd_context(void)
>> +{
>> + bool busy = __this_cpu_xchg(kernel_neon_busy, true);
>> +
>
> I don't mind whether there is a blank line here or not, but please make
> it consistent with __put_cpu_fpsimd_context().
I will modify __put_cpu_fpsimd_context() to add a newline.
>
>> + WARN_ON(busy);
>> +}
>> +
>> +static void get_cpu_fpsimd_context(void)
>> +{
>> + preempt_disable();
>> + __get_cpu_fpsimd_context();
>> +}
>> +
>> +/*
>> + * Release the CPU FPSIMD context.
>> + *
>> + * Must be called from a context in which *get_cpu_fpsimd_context() was
>
> Nit: Why *?
Same as above, I can update to use {,__} instead of *.
>
>> + * previously called, with no call to *put_cpu_fpsimd_context() in the
>> + * meantime.
>> + */
>> +static void __put_cpu_fpsimd_context(void)
>> +{
>> + bool busy = __this_cpu_xchg(kernel_neon_busy, false);
>> + WARN_ON(!busy); /* No matching get_cpu_fpsimd_context()? */
>> +}
>> +
>> +static void put_cpu_fpsimd_context(void)
>> +{
>> + __put_cpu_fpsimd_context();
>> + preempt_enable();
>> +}
>> +
>> +static bool have_cpu_fpsimd_context(void)
>> +{
>> + return (!preemptible() && __this_cpu_read(kernel_neon_busy));
>
> Nit: Redundant ()
>
>> +}
>> +
>> /*
>> * Call __sve_free() directly only if you know task can't be scheduled
>> * or preempted.
>> @@ -221,11 +274,12 @@ static void sve_free(struct task_struct *task)
>> * thread_struct is known to be up to date, when preparing to enter
>> * userspace.
>> *
>> - * Softirqs (and preemption) must be disabled.
>> + * The FPSIMD context must be acquired with get_cpu_fpsimd_context()
>
> or __get_cpu_fpsimd_context()? Since this is effectively documented by
> the WARN_ON() and this is a local function anyway, maybe it would be
> simpler just to drop this comment here?
I am fine with that.
>
>> + * before calling this function.
>> */
>> static void task_fpsimd_load(void)
>> {
>> - WARN_ON(!in_softirq() && !irqs_disabled());
>> + WARN_ON(!have_cpu_fpsimd_context());
>>
>> if (system_supports_sve() && test_thread_flag(TIF_SVE))
>> sve_load_state(sve_pffr(¤t->thread),
>> @@ -239,15 +293,22 @@ static void task_fpsimd_load(void)
>> * Ensure FPSIMD/SVE storage in memory for the loaded context is up to
>> * date with respect to the CPU registers.
>> *
>> - * Softirqs (and preemption) must be disabled.
>> + * The FPSIMD context must be acquired with get_cpu_fpsimd_context()
>
> Likewise.
Ditto.
>
>> + * before calling this function.
>> */
>> static void fpsimd_save(void)
>> {
>> struct fpsimd_last_state_struct const *last =
>> this_cpu_ptr(&fpsimd_last_state);
>> /* set by fpsimd_bind_task_to_cpu() or fpsimd_bind_state_to_cpu() */
>> + WARN_ON(!have_cpu_fpsimd_context());
>>
>> - WARN_ON(!in_softirq() && !irqs_disabled());
>> + if ( !have_cpu_fpsimd_context() )
>
> Nit: Redundant whitespace around expression.
This hunk should actually be dropped. I was using for debugging and forgot to
remove it before sending the series :/.
>
>> + {
>> + printk("preemptible() = %u kernel_neon_busy = %u\n",
>> + preemptible(), __this_cpu_read(kernel_neon_busy));
>> + while (1);
>> + }
>>
>> if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
>> if (system_supports_sve() && test_thread_flag(TIF_SVE)) {
>> @@ -352,7 +413,8 @@ static int __init sve_sysctl_init(void) { return 0; }
>> * task->thread.sve_state.
>> *
>> * Task can be a non-runnable task, or current. In the latter case,
>> - * softirqs (and preemption) must be disabled.
>> + * the FPSIMD context must be acquired with get_fpu_fpsimd_context()
>> + * before calling this function.
I noticed you didn't comment about the usage of get_cpu_fpsimd_context here. Do
you want to add a WARN(..) in the function, or just using {,___} here?
>> * task->thread.sve_state must point to at least sve_state_size(task)
>> * bytes of allocated kernel memory.
>> * task->thread.uw.fpsimd_state must be up to date before calling this
>> @@ -379,7 +441,8 @@ static void fpsimd_to_sve(struct task_struct *task)
>> * task->thread.uw.fpsimd_state.
>> *
>> * Task can be a non-runnable task, or current. In the latter case,
>> - * softirqs (and preemption) must be disabled.
>> + * the FPSIMD context must be acquired with get_fpu_fpsimd_context()
>> + * before calling this function.
Same question here.
[...]
>> @@ -1012,7 +1079,8 @@ void fpsimd_signal_preserve_current_state(void)
>>
>> /*
>> * Associate current's FPSIMD context with this cpu
>> - * Preemption must be disabled when calling this function.
>> + * The FPSIMD context should be acquired with get_cpu_fpsimd_context()
>> + * before calling this function.
Same question here.
>> /*
>> * Invalidate any task's FPSIMD state that is present on this cpu.
>> - * This function must be called with softirqs disabled.
>> + * The FPSIMD context should be acquired with get_cpu_fpsimd_context()
>> + * before calling this function.
>> */
>> static void fpsimd_flush_cpu_state(void)
>> {
>> @@ -1125,19 +1194,18 @@ static void fpsimd_flush_cpu_state(void)
>>
>> /*
>> * Save the FPSIMD state to memory and invalidate cpu view.
>> - * This function must be called with softirqs (and preemption) disabled.
>> + * This function must be called with preemption disabled.
>> */
>> void fpsimd_save_and_flush_cpu_state(void)
>> {
>> + __get_cpu_fpsimd_context();
>> fpsimd_save();
>> fpsimd_flush_cpu_state();
>> + __put_cpu_fpsimd_context();
>
> It may be cleaner to avoid the assumption about preemption already being
> disabled here. fpsimd_thread_switch() is rather a special case, but for
> this one is this really used on a hot path that justifies the assumption?
It is currently only called with preemption disabled. So I thought it would be
better to avoid disabling preemption again. But I am happy to use the non-__
version if you think it is better.
Cheers,
--
Julien Grall
On Wed, Apr 17, 2019 at 12:37:57PM +0100, Julien Grall wrote:
> Hi Dave,
>
> On 16/04/2019 13:30, Dave Martin wrote:
> >On Fri, Apr 12, 2019 at 06:14:20PM +0100, Julien Grall wrote:
[...]
> >>+
> >>+/*
> >>+ * Obtain the CPU FPSIMD context for use by the calling context.
If we say something like "Claim ownership of the CPU FPSIMD context for
use by the calling context", this may be easier to reference elsewhere
(see below).
> >>+ *
> >>+ * The caller may freely modify FPSIMD context until *put_cpu_fpsimd_context()
> >
> >Nit: Why *? This makes it look a bit like get_cpu_fpsimd_context()
> >returns a pointer and you're saying something about dereferencing that
> >pointer here.
>
> I tend to use * for wildcard. In this context it used to refers to both the
> double-underscored version and the one without.
Ah, right. * makes sense in general, but here it confused me.
> I can use {,__}put_cpu_fpsimd_context() instead.
Maybe, but the caller is not supposed to pair get_cpu_fpsimd_context()
with __put_cpu_fpsimd_context().
What if we just comment the non-underscore versions. The purpose of
the __ versions is then relatively obvious without additional commenting.
>
> >
> >>+ * is called.
> >>+ *
> >>+ * The double-underscore version must only be called if you know the task
> >>+ * can't be preempted.
> >>+ *
> >>+ * __get_cpu_fpsimd_context() *must* be in pair with __put_cpu_fpsimd_context()
> >>+ * get_cpu_fpsimd_context() *must* be in pair with put_cpu_fpsimd_context()
> >
> >"in pair" -> "paired with"?
>
> Sure.
>
> >
> >I'd move each of these comments to be next to the function it applies
> >to.
>
> Do you mean on top of {,__}put_cpu_ or {,__}get_cpu?
It doesn't matter. It's the statement about get_cpu_fpsimd_context()
next to the definition of __get_cpu_fpsimd_context() that I find a bit
weird.
Arguably we could solve this by just having less commenting: the
expectation that a "get" should be paired with a subsequent matching
"put" is a common pattern (which was the reason for suggesting those
names).
> >>+ */
> >>+static void __get_cpu_fpsimd_context(void)
> >>+{
> >>+ bool busy = __this_cpu_xchg(kernel_neon_busy, true);
> >>+
> >
> >I don't mind whether there is a blank line here or not, but please make
> >it consistent with __put_cpu_fpsimd_context().
>
> I will modify __put_cpu_fpsimd_context() to add a newline.
>
> >
> >>+ WARN_ON(busy);
> >>+}
> >>+
> >>+static void get_cpu_fpsimd_context(void)
> >>+{
> >>+ preempt_disable();
> >>+ __get_cpu_fpsimd_context();
> >>+}
> >>+
> >>+/*
> >>+ * Release the CPU FPSIMD context.
> >>+ *
> >>+ * Must be called from a context in which *get_cpu_fpsimd_context() was
> >
> >Nit: Why *?
>
> Same as above, I can update to use {,__} instead of *.
>
> >
> >>+ * previously called, with no call to *put_cpu_fpsimd_context() in the
> >>+ * meantime.
> >>+ */
> >>+static void __put_cpu_fpsimd_context(void)
> >>+{
> >>+ bool busy = __this_cpu_xchg(kernel_neon_busy, false);
> >>+ WARN_ON(!busy); /* No matching get_cpu_fpsimd_context()? */
> >>+}
> >>+
> >>+static void put_cpu_fpsimd_context(void)
> >>+{
> >>+ __put_cpu_fpsimd_context();
> >>+ preempt_enable();
> >>+}
> >>+
> >>+static bool have_cpu_fpsimd_context(void)
> >>+{
> >>+ return (!preemptible() && __this_cpu_read(kernel_neon_busy));
> >
> >Nit: Redundant ()
> >
> >>+}
> >>+
> >> /*
> >> * Call __sve_free() directly only if you know task can't be scheduled
> >> * or preempted.
> >>@@ -221,11 +274,12 @@ static void sve_free(struct task_struct *task)
> >> * thread_struct is known to be up to date, when preparing to enter
> >> * userspace.
> >> *
> >>- * Softirqs (and preemption) must be disabled.
> >>+ * The FPSIMD context must be acquired with get_cpu_fpsimd_context()
> >
> >or __get_cpu_fpsimd_context()? Since this is effectively documented by
> >the WARN_ON() and this is a local function anyway, maybe it would be
> >simpler just to drop this comment here?
>
> I am fine with that.
>
> >
> >>+ * before calling this function.
> >> */
> >> static void task_fpsimd_load(void)
> >> {
> >>- WARN_ON(!in_softirq() && !irqs_disabled());
> >>+ WARN_ON(!have_cpu_fpsimd_context());
> >> if (system_supports_sve() && test_thread_flag(TIF_SVE))
> >> sve_load_state(sve_pffr(¤t->thread),
> >>@@ -239,15 +293,22 @@ static void task_fpsimd_load(void)
> >> * Ensure FPSIMD/SVE storage in memory for the loaded context is up to
> >> * date with respect to the CPU registers.
> >> *
> >>- * Softirqs (and preemption) must be disabled.
> >>+ * The FPSIMD context must be acquired with get_cpu_fpsimd_context()
> >
> >Likewise.
>
> Ditto.
>
> >
> >>+ * before calling this function.
> >> */
> >> static void fpsimd_save(void)
> >> {
> >> struct fpsimd_last_state_struct const *last =
> >> this_cpu_ptr(&fpsimd_last_state);
> >> /* set by fpsimd_bind_task_to_cpu() or fpsimd_bind_state_to_cpu() */
> >>+ WARN_ON(!have_cpu_fpsimd_context());
> >>- WARN_ON(!in_softirq() && !irqs_disabled());
> >>+ if ( !have_cpu_fpsimd_context() )
> >
> >Nit: Redundant whitespace around expression.
>
> This hunk should actually be dropped. I was using for debugging and forgot
> to remove it before sending the series :/.
Ah, right. I didn't spot it -- yes, please remove.
> >
> >>+ {
> >>+ printk("preemptible() = %u kernel_neon_busy = %u\n",
> >>+ preemptible(), __this_cpu_read(kernel_neon_busy));
> >>+ while (1);
> >>+ }
> >> if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> >> if (system_supports_sve() && test_thread_flag(TIF_SVE)) {
> >>@@ -352,7 +413,8 @@ static int __init sve_sysctl_init(void) { return 0; }
> >> * task->thread.sve_state.
> >> *
> >> * Task can be a non-runnable task, or current. In the latter case,
> >>- * softirqs (and preemption) must be disabled.
> >>+ * the FPSIMD context must be acquired with get_fpu_fpsimd_context()
> >>+ * before calling this function.
>
> I noticed you didn't comment about the usage of get_cpu_fpsimd_context here.
> Do you want to add a WARN(..) in the function, or just using {,___} here?
Partly because of things getting repetitive.
WARN() is not free, so I suggest we don't add any new ones for now.
From a comment point of view, if we just say something "the caller must
have ownership of the cpu FPSIMD context" to refer back to the
commenting for get_cpu_fpsimd_context(), that should cover the general
case.
>
> >> * task->thread.sve_state must point to at least sve_state_size(task)
> >> * bytes of allocated kernel memory.
> >> * task->thread.uw.fpsimd_state must be up to date before calling this
> >>@@ -379,7 +441,8 @@ static void fpsimd_to_sve(struct task_struct *task)
> >> * task->thread.uw.fpsimd_state.
> >> *
> >> * Task can be a non-runnable task, or current. In the latter case,
> >>- * softirqs (and preemption) must be disabled.
> >>+ * the FPSIMD context must be acquired with get_fpu_fpsimd_context()
> >>+ * before calling this function.
>
> Same question here.
>
> [...]
>
> >>@@ -1012,7 +1079,8 @@ void fpsimd_signal_preserve_current_state(void)
> >> /*
> >> * Associate current's FPSIMD context with this cpu
> >>- * Preemption must be disabled when calling this function.
> >>+ * The FPSIMD context should be acquired with get_cpu_fpsimd_context()
> >>+ * before calling this function.
>
> Same question here.
Same for these.
> >> /*
> >> * Invalidate any task's FPSIMD state that is present on this cpu.
> >>- * This function must be called with softirqs disabled.
> >>+ * The FPSIMD context should be acquired with get_cpu_fpsimd_context()
> >>+ * before calling this function.
> >> */
> >> static void fpsimd_flush_cpu_state(void)
> >> {
> >>@@ -1125,19 +1194,18 @@ static void fpsimd_flush_cpu_state(void)
> >> /*
> >> * Save the FPSIMD state to memory and invalidate cpu view.
> >>- * This function must be called with softirqs (and preemption) disabled.
> >>+ * This function must be called with preemption disabled.
> >> */
> >> void fpsimd_save_and_flush_cpu_state(void)
> >> {
> >>+ __get_cpu_fpsimd_context();
> >> fpsimd_save();
> >> fpsimd_flush_cpu_state();
> >>+ __put_cpu_fpsimd_context();
> >
> >It may be cleaner to avoid the assumption about preemption already being
> >disabled here. fpsimd_thread_switch() is rather a special case, but for
> >this one is this really used on a hot path that justifies the assumption?
>
> It is currently only called with preemption disabled. So I thought it would
> be better to avoid disabling preemption again. But I am happy to use the
> non-__ version if you think it is better.
Hmmm, this works either way. Since this is not fast-path and has an
external caller, it might be worth adding a WARN_ON(preemptible()) here
if you want to stick with the __ functions.
Otherwise, I don't have a strong opinion.
Cheers
---Dave
Hi Dave,
On 4/17/19 3:01 PM, Dave Martin wrote:
> On Wed, Apr 17, 2019 at 12:37:57PM +0100, Julien Grall wrote:
>> Hi Dave,
>>
>> On 16/04/2019 13:30, Dave Martin wrote:
>>> On Fri, Apr 12, 2019 at 06:14:20PM +0100, Julien Grall wrote:
>
> [...]
>
>>>> +
>>>> +/*
>>>> + * Obtain the CPU FPSIMD context for use by the calling context.
>
> If we say something like "Claim ownership of the CPU FPSIMD context for
> use by the calling context", this may be easier to reference elsewhere
> (see below).
>
>>>> + *
>>>> + * The caller may freely modify FPSIMD context until *put_cpu_fpsimd_context()
>>>
>>> Nit: Why *? This makes it look a bit like get_cpu_fpsimd_context()
>>> returns a pointer and you're saying something about dereferencing that
>>> pointer here.
>>
>> I tend to use * for wildcard. In this context it used to refers to both the
>> double-underscored version and the one without.
>
> Ah, right. * makes sense in general, but here it confused me.
>
>> I can use {,__}put_cpu_fpsimd_context() instead.
>
> Maybe, but the caller is not supposed to pair get_cpu_fpsimd_context()
> with __put_cpu_fpsimd_context().
>
> What if we just comment the non-underscore versions. The purpose of
> the __ versions is then relatively obvious without additional commenting.
I am happy with this solution as well.
[...]
>
>>>
>>>> + {
>>>> + printk("preemptible() = %u kernel_neon_busy = %u\n",
>>>> + preemptible(), __this_cpu_read(kernel_neon_busy));
>>>> + while (1);
>>>> + }
>>>> if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
>>>> if (system_supports_sve() && test_thread_flag(TIF_SVE)) {
>>>> @@ -352,7 +413,8 @@ static int __init sve_sysctl_init(void) { return 0; }
>>>> * task->thread.sve_state.
>>>> *
>>>> * Task can be a non-runnable task, or current. In the latter case,
>>>> - * softirqs (and preemption) must be disabled.
>>>> + * the FPSIMD context must be acquired with get_fpu_fpsimd_context()
>>>> + * before calling this function.
>>
>> I noticed you didn't comment about the usage of get_cpu_fpsimd_context here.
>> Do you want to add a WARN(..) in the function, or just using {,___} here?
>
> Partly because of things getting repetitive.
>
> WARN() is not free, so I suggest we don't add any new ones for now.
>
> From a comment point of view, if we just say something "the caller must
> have ownership of the cpu FPSIMD context" to refer back to the
> commenting for get_cpu_fpsimd_context(), that should cover the general
> case.
I am fine with this suggestion.
>>
>>>> * task->thread.sve_state must point to at least sve_state_size(task)
>>>> * bytes of allocated kernel memory.
>>>> * task->thread.uw.fpsimd_state must be up to date before calling this
>>>> @@ -379,7 +441,8 @@ static void fpsimd_to_sve(struct task_struct *task)
>>>> * task->thread.uw.fpsimd_state.
>>>> *
>>>> * Task can be a non-runnable task, or current. In the latter case,
>>>> - * softirqs (and preemption) must be disabled.
>>>> + * the FPSIMD context must be acquired with get_fpu_fpsimd_context()
>>>> + * before calling this function.
>>
>> Same question here.
>>
>> [...]
>>
>>>> @@ -1012,7 +1079,8 @@ void fpsimd_signal_preserve_current_state(void)
>>>> /*
>>>> * Associate current's FPSIMD context with this cpu
>>>> - * Preemption must be disabled when calling this function.
>>>> + * The FPSIMD context should be acquired with get_cpu_fpsimd_context()
>>>> + * before calling this function.
>>
>> Same question here.
>
> Same for these.
>
>>>> /*
>>>> * Invalidate any task's FPSIMD state that is present on this cpu.
>>>> - * This function must be called with softirqs disabled.
>>>> + * The FPSIMD context should be acquired with get_cpu_fpsimd_context()
>>>> + * before calling this function.
>>>> */
>>>> static void fpsimd_flush_cpu_state(void)
>>>> {
>>>> @@ -1125,19 +1194,18 @@ static void fpsimd_flush_cpu_state(void)
>>>> /*
>>>> * Save the FPSIMD state to memory and invalidate cpu view.
>>>> - * This function must be called with softirqs (and preemption) disabled.
>>>> + * This function must be called with preemption disabled.
>>>> */
>>>> void fpsimd_save_and_flush_cpu_state(void)
>>>> {
>>>> + __get_cpu_fpsimd_context();
>>>> fpsimd_save();
>>>> fpsimd_flush_cpu_state();
>>>> + __put_cpu_fpsimd_context();
>>>
>>> It may be cleaner to avoid the assumption about preemption already being
>>> disabled here. fpsimd_thread_switch() is rather a special case, but for
>>> this one is this really used on a hot path that justifies the assumption?
>>
>> It is currently only called with preemption disabled. So I thought it would
>> be better to avoid disabling preemption again. But I am happy to use the
>> non-__ version if you think it is better.
>
> Hmmm, this works either way. Since this is not fast-path and has an
> external caller, it might be worth adding a WARN_ON(preemptible()) here
> if you want to stick with the __ functions.
As it is not a fastpath, I will use the non-underscore version.
Cheers,
--
Julien Grall
On Tue, Apr 23, 2019 at 11:59:44AM +0100, Julien Grall wrote:
> Hi Dave,
>
> On 4/17/19 3:01 PM, Dave Martin wrote:
> >On Wed, Apr 17, 2019 at 12:37:57PM +0100, Julien Grall wrote:
> >>Hi Dave,
> >>
> >>On 16/04/2019 13:30, Dave Martin wrote:
> >>>On Fri, Apr 12, 2019 at 06:14:20PM +0100, Julien Grall wrote:
[...]
> >>>>@@ -1125,19 +1194,18 @@ static void fpsimd_flush_cpu_state(void)
> >>>> /*
> >>>> * Save the FPSIMD state to memory and invalidate cpu view.
> >>>>- * This function must be called with softirqs (and preemption) disabled.
> >>>>+ * This function must be called with preemption disabled.
> >>>> */
> >>>> void fpsimd_save_and_flush_cpu_state(void)
> >>>> {
> >>>>+ __get_cpu_fpsimd_context();
> >>>> fpsimd_save();
> >>>> fpsimd_flush_cpu_state();
> >>>>+ __put_cpu_fpsimd_context();
> >>>
> >>>It may be cleaner to avoid the assumption about preemption already being
> >>>disabled here. fpsimd_thread_switch() is rather a special case, but for
> >>>this one is this really used on a hot path that justifies the assumption?
> >>
> >>It is currently only called with preemption disabled. So I thought it would
> >>be better to avoid disabling preemption again. But I am happy to use the
> >>non-__ version if you think it is better.
> >
> >Hmmm, this works either way. Since this is not fast-path and has an
> >external caller, it might be worth adding a WARN_ON(preemptible()) here
> >if you want to stick with the __ functions.
>
> As it is not a fastpath, I will use the non-underscore version.
OK, either is good for me.
Cheers
---Dave