2024-06-05 08:36:40

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 0/3, v3] x86/fpu: Remove the thread::fpu pointer

This series is one of the dependencies of the fast-headers work,
which aims to reduce header complexity by removing <asm/processor.h>
from the <linux/sched.h> dependency chain, which headers are headers
are fat enough already even if we do not combine them.

To achieve that decoupling, one of the key steps is to not embedd any
C types from <asm/processor.h> into task_struct.

The only architecture that relies on that in a serious fashion is x86,
via 'struct thread::fpu', and the series below attempts to resolve it
by using a calculated &task->thread.fpu value via the x86_task_fpu()
helper.

Code generation: without CONFIG_X86_DEBUG_FPU=y we get a small reduction:

text data bss dec hex filename
26475783 10439082 1740804 38655669 24dd6b5 vmlinux.before
26475601 10435146 1740804 38651551 24dc69f vmlinux.after

With the new debug code - which I think we'll remove soon-ish, there's an
expected increase:

text data bss dec hex filename
26476198 10439286 1740804 38656288 24dd920 vmlinux.CONFIG_X86_DEBUG_FPU.before
26477000 10435458 1740804 38653262 24dcd4e vmlinux.CONFIG_X86_DEBUG_FPU.after

The Git tree can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/fpu

Thanks,

Ingo

===============>
Ingo Molnar (3):
x86/fpu: Make task_struct::thread constant size
x86/fpu: Remove the thread::fpu pointer
x86/fpu: Remove init_task FPU state dependencies, add debugging warning

arch/x86/include/asm/fpu/sched.h | 2 +-
arch/x86/include/asm/processor.h | 19 ++++++++++---------
arch/x86/kernel/fpu/context.h | 4 ++--
arch/x86/kernel/fpu/core.c | 57 +++++++++++++++++++++++++++++++--------------------------
arch/x86/kernel/fpu/init.c | 23 +++++++++++++----------
arch/x86/kernel/fpu/regset.c | 22 +++++++++++-----------
arch/x86/kernel/fpu/signal.c | 18 +++++++++---------
arch/x86/kernel/fpu/xstate.c | 23 ++++++++++-------------
arch/x86/kernel/fpu/xstate.h | 6 +++---
arch/x86/kernel/process.c | 6 ++----
arch/x86/kernel/signal.c | 6 +++---
arch/x86/kernel/traps.c | 2 +-
arch/x86/math-emu/fpu_aux.c | 2 +-
arch/x86/math-emu/fpu_entry.c | 4 ++--
arch/x86/math-emu/fpu_system.h | 2 +-
arch/x86/mm/extable.c | 2 +-
include/linux/sched.h | 13 +++----------
17 files changed, 104 insertions(+), 107 deletions(-)

--
2.43.0



2024-06-05 08:37:29

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer

As suggested by Oleg, remove the thread::fpu pointer, as we can
calculate it via x86_task_fpu() at compile-time.

This improves code generation a bit:

kepler:~/tip> size vmlinux.before vmlinux.after
text data bss dec hex filename
26475405 10435342 1740804 38651551 24dc69f vmlinux.before
26475339 10959630 1216516 38651485 24dc65d vmlinux.after

Suggested-by: Oleg Nesterov <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

==================>
arch/x86/include/asm/fpu/sched.h | 2 +-
arch/x86/include/asm/processor.h | 5 ++---
arch/x86/include/asm/vm86.h | 2 +-
arch/x86/kernel/fpu/context.h | 2 +-
arch/x86/kernel/fpu/core.c | 30 ++++++++++++++----------------
arch/x86/kernel/fpu/init.c | 7 +++----
arch/x86/kernel/fpu/regset.c | 22 +++++++++++-----------
arch/x86/kernel/fpu/signal.c | 18 +++++++++---------
arch/x86/kernel/fpu/xstate.c | 22 +++++++++++-----------
arch/x86/kernel/fpu/xstate.h | 6 +++---
arch/x86/kernel/process.c | 6 ++----
arch/x86/kernel/signal.c | 6 +++---
arch/x86/kernel/traps.c | 2 +-
arch/x86/kernel/vmlinux.lds.S | 4 ++++
arch/x86/math-emu/fpu_aux.c | 2 +-
arch/x86/math-emu/fpu_entry.c | 4 ++--
arch/x86/math-emu/fpu_system.h | 2 +-
arch/x86/mm/extable.c | 2 +-
18 files changed, 71 insertions(+), 73 deletions(-)

Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Uros Bizjak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/fpu/sched.h | 2 +-
arch/x86/include/asm/processor.h | 5 ++---
arch/x86/kernel/fpu/context.h | 2 +-
arch/x86/kernel/fpu/core.c | 30 ++++++++++++++----------------
arch/x86/kernel/fpu/init.c | 7 +++----
arch/x86/kernel/fpu/regset.c | 22 +++++++++++-----------
arch/x86/kernel/fpu/signal.c | 18 +++++++++---------
arch/x86/kernel/fpu/xstate.c | 22 +++++++++++-----------
arch/x86/kernel/fpu/xstate.h | 6 +++---
arch/x86/kernel/process.c | 6 ++----
arch/x86/kernel/signal.c | 6 +++---
arch/x86/kernel/traps.c | 2 +-
arch/x86/kernel/vmlinux.lds.S | 4 ++++
arch/x86/math-emu/fpu_aux.c | 2 +-
arch/x86/math-emu/fpu_entry.c | 4 ++--
arch/x86/math-emu/fpu_system.h | 2 +-
arch/x86/mm/extable.c | 2 +-
17 files changed, 70 insertions(+), 72 deletions(-)

diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index 3cf20ab49b5f..1feaa68b7567 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -41,7 +41,7 @@ static inline void switch_fpu_prepare(struct task_struct *old, int cpu)
{
if (cpu_feature_enabled(X86_FEATURE_FPU) &&
!(old->flags & (PF_KTHREAD | PF_USER_WORKER))) {
- struct fpu *old_fpu = old->thread.fpu;
+ struct fpu *old_fpu = x86_task_fpu(old);

save_fpregs_to_fpstate(old_fpu);
/*
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 920b0beebd11..249c5fa20de4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -502,11 +502,10 @@ struct thread_struct {

struct thread_shstk shstk;
#endif
-
- /* Floating point and extended processor state */
- struct fpu *fpu;
};

+#define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
+
/*
* X86 doesn't need any embedded-FPU-struct quirks:
*/
diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index 96d1f34179b3..10d0a720659c 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -53,7 +53,7 @@ static inline void fpregs_activate(struct fpu *fpu)
/* Internal helper for switch_fpu_return() and signal frame setup */
static inline void fpregs_restore_userregs(void)
{
- struct fpu *fpu = current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
int cpu = smp_processor_id();

if (WARN_ON_ONCE(current->flags & (PF_KTHREAD | PF_USER_WORKER)))
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 816d48978da4..0ccabcd3bf62 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -204,7 +204,7 @@ static void fpu_init_guest_permissions(struct fpu_guest *gfpu)
return;

spin_lock_irq(&current->sighand->siglock);
- fpuperm = &current->group_leader->thread.fpu->guest_perm;
+ fpuperm = &x86_task_fpu(current->group_leader)->guest_perm;
perm = fpuperm->__state_perm;

/* First fpstate allocation locks down permissions. */
@@ -316,7 +316,7 @@ EXPORT_SYMBOL_GPL(fpu_update_guest_xfd);
*/
void fpu_sync_guest_vmexit_xfd_state(void)
{
- struct fpstate *fps = current->thread.fpu->fpstate;
+ struct fpstate *fps = x86_task_fpu(current)->fpstate;

lockdep_assert_irqs_disabled();
if (fpu_state_size_dynamic()) {
@@ -330,7 +330,7 @@ EXPORT_SYMBOL_GPL(fpu_sync_guest_vmexit_xfd_state);
int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
{
struct fpstate *guest_fps = guest_fpu->fpstate;
- struct fpu *fpu = current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
struct fpstate *cur_fps = fpu->fpstate;

fpregs_lock();
@@ -430,7 +430,7 @@ void kernel_fpu_begin_mask(unsigned int kfpu_mask)
if (!(current->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
!test_thread_flag(TIF_NEED_FPU_LOAD)) {
set_thread_flag(TIF_NEED_FPU_LOAD);
- save_fpregs_to_fpstate(current->thread.fpu);
+ save_fpregs_to_fpstate(x86_task_fpu(current));
}
__cpu_invalidate_fpregs_state();

@@ -458,7 +458,7 @@ EXPORT_SYMBOL_GPL(kernel_fpu_end);
*/
void fpu_sync_fpstate(struct fpu *fpu)
{
- WARN_ON_FPU(fpu != current->thread.fpu);
+ WARN_ON_FPU(fpu != x86_task_fpu(current));

fpregs_lock();
trace_x86_fpu_before_save(fpu);
@@ -543,7 +543,7 @@ void fpstate_reset(struct fpu *fpu)
static inline void fpu_inherit_perms(struct fpu *dst_fpu)
{
if (fpu_state_size_dynamic()) {
- struct fpu *src_fpu = current->group_leader->thread.fpu;
+ struct fpu *src_fpu = x86_task_fpu(current->group_leader);

spin_lock_irq(&current->sighand->siglock);
/* Fork also inherits the permissions of the parent */
@@ -563,7 +563,7 @@ static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
if (!ssp)
return 0;

- xstate = get_xsave_addr(&dst->thread.fpu->fpstate->regs.xsave,
+ xstate = get_xsave_addr(&x86_task_fpu(dst)->fpstate->regs.xsave,
XFEATURE_CET_USER);

/*
@@ -591,13 +591,11 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
* This is safe because task_struct size is a multiple of cacheline size.
*/
struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
- struct fpu *src_fpu = current->thread.fpu;
+ struct fpu *src_fpu = x86_task_fpu(current);

BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
BUG_ON(!src_fpu);

- dst->thread.fpu = dst_fpu;
-
/* The new task's FPU state cannot be valid in the hardware. */
dst_fpu->last_cpu = -1;

@@ -678,7 +676,7 @@ void fpu__drop(struct fpu *fpu)
{
preempt_disable();

- if (fpu == current->thread.fpu) {
+ if (fpu == x86_task_fpu(current)) {
/* Ignore delayed exceptions from user space */
asm volatile("1: fwait\n"
"2:\n"
@@ -712,7 +710,7 @@ static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
*/
static void fpu_reset_fpregs(void)
{
- struct fpu *fpu = current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);

fpregs_lock();
__fpu_invalidate_fpregs_state(fpu);
@@ -741,7 +739,7 @@ static void fpu_reset_fpregs(void)
*/
void fpu__clear_user_states(struct fpu *fpu)
{
- WARN_ON_FPU(fpu != current->thread.fpu);
+ WARN_ON_FPU(fpu != x86_task_fpu(current));

fpregs_lock();
if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
@@ -774,7 +772,7 @@ void fpu__clear_user_states(struct fpu *fpu)

void fpu_flush_thread(void)
{
- fpstate_reset(current->thread.fpu);
+ fpstate_reset(x86_task_fpu(current));
fpu_reset_fpregs();
}
/*
@@ -815,7 +813,7 @@ void fpregs_lock_and_load(void)
*/
void fpregs_assert_state_consistent(void)
{
- struct fpu *fpu = current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);

if (test_thread_flag(TIF_NEED_FPU_LOAD))
return;
@@ -827,7 +825,7 @@ EXPORT_SYMBOL_GPL(fpregs_assert_state_consistent);

void fpregs_mark_activate(void)
{
- struct fpu *fpu = current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);

fpregs_activate(fpu);
fpu->last_cpu = smp_processor_id();
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index de618ec509aa..11aa31410df2 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -38,7 +38,7 @@ static void fpu__init_cpu_generic(void)
/* Flush out any pending x87 state: */
#ifdef CONFIG_MATH_EMULATION
if (!boot_cpu_has(X86_FEATURE_FPU))
- fpstate_init_soft(&current->thread.fpu->fpstate->regs.soft);
+ fpstate_init_soft(&x86_task_fpu(current)->fpstate->regs.soft);
else
#endif
asm volatile ("fninit");
@@ -78,7 +78,6 @@ static void __init fpu__init_system_early_generic(void)
int this_cpu = smp_processor_id();

fpstate_reset(&x86_init_fpu);
- current->thread.fpu = &x86_init_fpu;
per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
x86_init_fpu.last_cpu = this_cpu;

@@ -165,7 +164,7 @@ static void __init fpu__init_task_struct_size(void)
* Subtract off the static size of the register state.
* It potentially has a bunch of padding.
*/
- task_size -= sizeof(current->thread.fpu->__fpstate.regs);
+ task_size -= sizeof(x86_task_fpu(current)->__fpstate.regs);

/*
* Add back the dynamically-calculated register state
@@ -210,7 +209,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
fpu_kernel_cfg.default_size = size;
fpu_user_cfg.max_size = size;
fpu_user_cfg.default_size = size;
- fpstate_reset(current->thread.fpu);
+ fpstate_reset(x86_task_fpu(current));
}

/*
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 38bc0b390d02..19fd159217f7 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -45,7 +45,7 @@ int regset_xregset_fpregs_active(struct task_struct *target, const struct user_r
*/
static void sync_fpstate(struct fpu *fpu)
{
- if (fpu == current->thread.fpu)
+ if (fpu == x86_task_fpu(current))
fpu_sync_fpstate(fpu);
}

@@ -63,7 +63,7 @@ static void fpu_force_restore(struct fpu *fpu)
* Only stopped child tasks can be used to modify the FPU
* state in the fpstate buffer:
*/
- WARN_ON_FPU(fpu == current->thread.fpu);
+ WARN_ON_FPU(fpu == x86_task_fpu(current));

__fpu_invalidate_fpregs_state(fpu);
}
@@ -71,7 +71,7 @@ static void fpu_force_restore(struct fpu *fpu)
int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);

if (!cpu_feature_enabled(X86_FEATURE_FXSR))
return -ENODEV;
@@ -91,7 +91,7 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct fpu *fpu = target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct fxregs_state newstate;
int ret;

@@ -133,7 +133,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
if (!cpu_feature_enabled(X86_FEATURE_XSAVE))
return -ENODEV;

- sync_fpstate(target->thread.fpu);
+ sync_fpstate(x86_task_fpu(target));

copy_xstate_to_uabi_buf(to, target, XSTATE_COPY_XSAVE);
return 0;
@@ -143,7 +143,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct fpu *fpu = target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct xregs_state *tmpbuf = NULL;
int ret;

@@ -187,7 +187,7 @@ int ssp_active(struct task_struct *target, const struct user_regset *regset)
int ssp_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct cet_user_state *cetregs;

if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
@@ -213,7 +213,7 @@ int ssp_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct fpu *fpu = target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct xregs_state *xsave = &fpu->fpstate->regs.xsave;
struct cet_user_state *cetregs;
unsigned long user_ssp;
@@ -367,7 +367,7 @@ static void __convert_from_fxsr(struct user_i387_ia32_struct *env,
void
convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
{
- __convert_from_fxsr(env, tsk, &tsk->thread.fpu->fpstate->regs.fxsave);
+ __convert_from_fxsr(env, tsk, &x86_task_fpu(tsk)->fpstate->regs.fxsave);
}

void convert_to_fxsr(struct fxregs_state *fxsave,
@@ -400,7 +400,7 @@ void convert_to_fxsr(struct fxregs_state *fxsave,
int fpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct user_i387_ia32_struct env;
struct fxregs_state fxsave, *fx;

@@ -432,7 +432,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct fpu *fpu = target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct user_i387_ia32_struct env;
int ret;

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 6b262d0b8fc1..b203bf4617fc 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -38,7 +38,7 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
/* Check for the first magic field and other error scenarios. */
if (fx_sw->magic1 != FP_XSTATE_MAGIC1 ||
fx_sw->xstate_size < min_xstate_size ||
- fx_sw->xstate_size > current->thread.fpu->fpstate->user_size ||
+ fx_sw->xstate_size > x86_task_fpu(current)->fpstate->user_size ||
fx_sw->xstate_size > fx_sw->extended_size)
goto setfx;

@@ -54,7 +54,7 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
if (likely(magic2 == FP_XSTATE_MAGIC2))
return true;
setfx:
- trace_x86_fpu_xstate_check_failed(current->thread.fpu);
+ trace_x86_fpu_xstate_check_failed(x86_task_fpu(current));

/* Set the parameters for fx only state */
fx_sw->magic1 = 0;
@@ -69,13 +69,13 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
static inline bool save_fsave_header(struct task_struct *tsk, void __user *buf)
{
if (use_fxsr()) {
- struct xregs_state *xsave = &tsk->thread.fpu->fpstate->regs.xsave;
+ struct xregs_state *xsave = &x86_task_fpu(tsk)->fpstate->regs.xsave;
struct user_i387_ia32_struct env;
struct _fpstate_32 __user *fp = buf;

fpregs_lock();
if (!test_thread_flag(TIF_NEED_FPU_LOAD))
- fxsave(&tsk->thread.fpu->fpstate->regs.fxsave);
+ fxsave(&x86_task_fpu(tsk)->fpstate->regs.fxsave);
fpregs_unlock();

convert_from_fxsr(&env, tsk);
@@ -188,7 +188,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
{
struct task_struct *tsk = current;
- struct fpstate *fpstate = tsk->thread.fpu->fpstate;
+ struct fpstate *fpstate = x86_task_fpu(tsk)->fpstate;
bool ia32_fxstate = (buf != buf_fx);
int ret;

@@ -276,7 +276,7 @@ static int __restore_fpregs_from_user(void __user *buf, u64 ufeatures,
*/
static bool restore_fpregs_from_user(void __user *buf, u64 xrestore, bool fx_only)
{
- struct fpu *fpu = current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
int ret;

/* Restore enabled features only. */
@@ -336,7 +336,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
bool ia32_fxstate)
{
struct task_struct *tsk = current;
- struct fpu *fpu = tsk->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(tsk);
struct user_i387_ia32_struct env;
bool success, fx_only = false;
union fpregs_state *fpregs;
@@ -456,7 +456,7 @@ static inline unsigned int xstate_sigframe_size(struct fpstate *fpstate)
*/
bool fpu__restore_sig(void __user *buf, int ia32_frame)
{
- struct fpu *fpu = current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
void __user *buf_fx = buf;
bool ia32_fxstate = false;
bool success = false;
@@ -503,7 +503,7 @@ unsigned long
fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
unsigned long *buf_fx, unsigned long *size)
{
- unsigned long frame_size = xstate_sigframe_size(current->thread.fpu->fpstate);
+ unsigned long frame_size = xstate_sigframe_size(x86_task_fpu(current)->fpstate);

*buf_fx = sp = round_down(sp - frame_size, 64);
if (ia32_frame && use_fxsr()) {
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 1d1f6599731b..90b11671e943 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -735,7 +735,7 @@ static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
*/
init_fpstate.xfd = 0;

- fpstate_reset(current->thread.fpu);
+ fpstate_reset(x86_task_fpu(current));
}

/*
@@ -845,7 +845,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
goto out_disable;

/* Reset the state for the current task */
- fpstate_reset(current->thread.fpu);
+ fpstate_reset(x86_task_fpu(current));

/*
* Update info used for ptrace frames; use standard-format size and no
@@ -919,7 +919,7 @@ void fpu__resume_cpu(void)
}

if (fpu_state_size_dynamic())
- wrmsrl(MSR_IA32_XFD, current->thread.fpu->fpstate->xfd);
+ wrmsrl(MSR_IA32_XFD, x86_task_fpu(current)->fpstate->xfd);
}

/*
@@ -1188,8 +1188,8 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
enum xstate_copy_mode copy_mode)
{
- __copy_xstate_to_uabi_buf(to, tsk->thread.fpu->fpstate,
- tsk->thread.fpu->fpstate->user_xfeatures,
+ __copy_xstate_to_uabi_buf(to, x86_task_fpu(tsk)->fpstate,
+ x86_task_fpu(tsk)->fpstate->user_xfeatures,
tsk->thread.pkru, copy_mode);
}

@@ -1329,7 +1329,7 @@ int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u
int copy_sigframe_from_user_to_xstate(struct task_struct *tsk,
const void __user *ubuf)
{
- return copy_uabi_to_xstate(tsk->thread.fpu->fpstate, NULL, ubuf, &tsk->thread.pkru);
+ return copy_uabi_to_xstate(x86_task_fpu(tsk)->fpstate, NULL, ubuf, &tsk->thread.pkru);
}

static bool validate_independent_components(u64 mask)
@@ -1423,7 +1423,7 @@ static bool xstate_op_valid(struct fpstate *fpstate, u64 mask, bool rstor)
* The XFD MSR does not match fpstate->xfd. That's invalid when
* the passed in fpstate is current's fpstate.
*/
- if (fpstate->xfd == current->thread.fpu->fpstate->xfd)
+ if (fpstate->xfd == x86_task_fpu(current)->fpstate->xfd)
return false;

/*
@@ -1500,7 +1500,7 @@ void fpstate_free(struct fpu *fpu)
static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
unsigned int usize, struct fpu_guest *guest_fpu)
{
- struct fpu *fpu = current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
struct fpstate *curfps, *newfps = NULL;
unsigned int fpsize;
bool in_use;
@@ -1593,7 +1593,7 @@ static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
* AVX512.
*/
bool compacted = cpu_feature_enabled(X86_FEATURE_XCOMPACTED);
- struct fpu *fpu = current->group_leader->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current->group_leader);
struct fpu_state_perm *perm;
unsigned int ksize, usize;
u64 mask;
@@ -1696,7 +1696,7 @@ int __xfd_enable_feature(u64 xfd_err, struct fpu_guest *guest_fpu)
return -EPERM;
}

- fpu = current->group_leader->thread.fpu;
+ fpu = x86_task_fpu(current->group_leader);
perm = guest_fpu ? &fpu->guest_perm : &fpu->perm;
ksize = perm->__state_size;
usize = perm->__user_state_size;
@@ -1801,7 +1801,7 @@ long fpu_xstate_prctl(int option, unsigned long arg2)
*/
static void avx512_status(struct seq_file *m, struct task_struct *task)
{
- unsigned long timestamp = READ_ONCE(task->thread.fpu->avx512_timestamp);
+ unsigned long timestamp = READ_ONCE(x86_task_fpu(task)->avx512_timestamp);
long delta;

if (!timestamp) {
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 1c720c376a69..391fa3c69e5e 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -22,7 +22,7 @@ static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)

static inline u64 xstate_get_group_perm(bool guest)
{
- struct fpu *fpu = current->group_leader->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current->group_leader);
struct fpu_state_perm *perm;

/* Pairs with WRITE_ONCE() in xstate_request_perm() */
@@ -265,7 +265,7 @@ static inline int xsave_to_user_sigframe(struct xregs_state __user *buf)
* internally, e.g. PKRU. That's user space ABI and also required
* to allow the signal handler to modify PKRU.
*/
- struct fpstate *fpstate = current->thread.fpu->fpstate;
+ struct fpstate *fpstate = x86_task_fpu(current)->fpstate;
u64 mask = fpstate->user_xfeatures;
u32 lmask;
u32 hmask;
@@ -296,7 +296,7 @@ static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64
u32 hmask = mask >> 32;
int err;

- xfd_validate_state(current->thread.fpu->fpstate, mask, true);
+ xfd_validate_state(x86_task_fpu(current)->fpstate, mask, true);

stac();
XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 5f3f48713870..4184c085627e 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -96,8 +96,6 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
#ifdef CONFIG_VM86
dst->thread.vm86 = NULL;
#endif
- /* Drop the copied pointer to current's fpstate */
- dst->thread.fpu = NULL;

return 0;
}
@@ -106,7 +104,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
void arch_release_task_struct(struct task_struct *tsk)
{
if (fpu_state_size_dynamic())
- fpstate_free(tsk->thread.fpu);
+ fpstate_free(x86_task_fpu(tsk));
}
#endif

@@ -116,7 +114,7 @@ void arch_release_task_struct(struct task_struct *tsk)
void exit_thread(struct task_struct *tsk)
{
struct thread_struct *t = &tsk->thread;
- struct fpu *fpu = t->fpu;
+ struct fpu *fpu = x86_task_fpu(tsk);

if (test_thread_flag(TIF_IO_BITMAP))
io_bitmap_exit(tsk);
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 7e709ae8e99a..d7906f5979a4 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -228,7 +228,7 @@ static void
handle_signal(struct ksignal *ksig, struct pt_regs *regs)
{
bool stepping, failed;
- struct fpu *fpu = current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);

if (v8086_mode(regs))
save_v86_state((struct kernel_vm86_regs *) regs, VM86_SIGNAL);
@@ -396,14 +396,14 @@ bool sigaltstack_size_valid(size_t ss_size)
if (!fpu_state_size_dynamic() && !strict_sigaltstack_size)
return true;

- fsize += current->group_leader->thread.fpu->perm.__user_state_size;
+ fsize += x86_task_fpu(current->group_leader)->perm.__user_state_size;
if (likely(ss_size > fsize))
return true;

if (strict_sigaltstack_size)
return ss_size > fsize;

- mask = current->group_leader->thread.fpu->perm.__state_perm;
+ mask = x86_task_fpu(current->group_leader)->perm.__state_perm;
if (mask & XFEATURE_MASK_USER_DYNAMIC)
return ss_size > fsize;

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c79e36c4071b..b19fb494c57c 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1148,7 +1148,7 @@ DEFINE_IDTENTRY_RAW(exc_debug)
static void math_error(struct pt_regs *regs, int trapnr)
{
struct task_struct *task = current;
- struct fpu *fpu = task->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(task);
int si_code;
char *str = (trapnr == X86_TRAP_MF) ? "fpu exception" :
"simd exception";
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 3509afc6a672..226244a894da 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -170,6 +170,10 @@ SECTIONS
/* equivalent to task_pt_regs(&init_task) */
__top_init_kernel_stack = __end_init_stack - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE;

+ __x86_init_fpu_begin = .;
+ . = __x86_init_fpu_begin + 128*PAGE_SIZE;
+ __x86_init_fpu_end = .;
+
#ifdef CONFIG_X86_32
/* 32 bit has nosave before _edata */
NOSAVE_DATA
diff --git a/arch/x86/math-emu/fpu_aux.c b/arch/x86/math-emu/fpu_aux.c
index 8087953164fc..5f253ae406b6 100644
--- a/arch/x86/math-emu/fpu_aux.c
+++ b/arch/x86/math-emu/fpu_aux.c
@@ -53,7 +53,7 @@ void fpstate_init_soft(struct swregs_state *soft)

void finit(void)
{
- fpstate_init_soft(&current->thread.fpu->fpstate->regs.soft);
+ fpstate_init_soft(&x86_task_fpu(current)->fpstate->regs.soft);
}

/*
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 30400d95d9d0..5034df617740 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -641,7 +641,7 @@ int fpregs_soft_set(struct task_struct *target,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct swregs_state *s387 = &target->thread.fpu->fpstate->regs.soft;
+ struct swregs_state *s387 = &x86_task_fpu(target)->fpstate->regs.soft;
void *space = s387->st_space;
int ret;
int offset, other, i, tags, regnr, tag, newtop;
@@ -692,7 +692,7 @@ int fpregs_soft_get(struct task_struct *target,
const struct user_regset *regset,
struct membuf to)
{
- struct swregs_state *s387 = &target->thread.fpu->fpstate->regs.soft;
+ struct swregs_state *s387 = &x86_task_fpu(target)->fpstate->regs.soft;
const void *space = s387->st_space;
int offset = (S387->ftop & 7) * 10, other = 80 - offset;

diff --git a/arch/x86/math-emu/fpu_system.h b/arch/x86/math-emu/fpu_system.h
index 3417337e7d99..5e238e930fe3 100644
--- a/arch/x86/math-emu/fpu_system.h
+++ b/arch/x86/math-emu/fpu_system.h
@@ -73,7 +73,7 @@ static inline bool seg_writable(struct desc_struct *d)
return (d->type & SEG_TYPE_EXECUTE_MASK) == SEG_TYPE_WRITABLE;
}

-#define I387 (&current->thread.fpu->fpstate->regs)
+#define I387 (&x86_task_fpu(current)->fpstate->regs)
#define FPU_info (I387->soft.info)

#define FPU_CS (*(unsigned short *) &(FPU_info->regs->cs))
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 9400c1c29fc8..1359ad75da3a 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -111,7 +111,7 @@ static bool ex_handler_sgx(const struct exception_table_entry *fixup,

/*
* Handler for when we fail to restore a task's FPU state. We should never get
- * here because the FPU state of a task using the FPU (task->thread.fpu->state)
+ * here because the FPU state of a task using the FPU (x86_task_fpu(task)->state)
* should always be valid. However, past bugs have allowed userspace to set
* reserved bits in the XSAVE area using PTRACE_SETREGSET or sys_rt_sigreturn().
* These caused XRSTOR to fail when switching to the task, leaking the FPU
--
2.43.0


2024-06-05 09:07:05

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning

init_task's FPU state initialization was a bit of a hack:

__x86_init_fpu_begin = .;
. = __x86_init_fpu_begin + 128*PAGE_SIZE;
__x86_init_fpu_end = .;

But the init task isn't supposed to be using the FPU in any case,
so remove the hack and add in some debug warnings.

Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Uros Bizjak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/processor.h | 6 +++++-
arch/x86/kernel/fpu/core.c | 12 +++++++++---
arch/x86/kernel/fpu/init.c | 5 ++---
arch/x86/kernel/fpu/xstate.c | 3 ---
arch/x86/kernel/vmlinux.lds.S | 4 ----
5 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 249c5fa20de4..ed8981866f4d 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -504,7 +504,11 @@ struct thread_struct {
#endif
};

-#define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
+#ifdef CONFIG_X86_DEBUG_FPU
+extern struct fpu *x86_task_fpu(struct task_struct *task);
+#else
+# define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
+#endif

/*
* X86 doesn't need any embedded-FPU-struct quirks:
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 0ccabcd3bf62..fdc3b227800d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -51,6 +51,15 @@ static DEFINE_PER_CPU(bool, in_kernel_fpu);
*/
DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);

+#ifdef CONFIG_X86_DEBUG_FPU
+struct fpu *x86_task_fpu(struct task_struct *task)
+{
+ WARN_ON_ONCE(task == &init_task);
+
+ return (void *)task + sizeof(*task);
+}
+#endif
+
/*
* Can we use the FPU in kernel mode with the
* whole "kernel_fpu_begin/end()" sequence?
@@ -591,10 +600,8 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
* This is safe because task_struct size is a multiple of cacheline size.
*/
struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
- struct fpu *src_fpu = x86_task_fpu(current);

BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
- BUG_ON(!src_fpu);

/* The new task's FPU state cannot be valid in the hardware. */
dst_fpu->last_cpu = -1;
@@ -657,7 +664,6 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
if (update_fpu_shstk(dst, ssp))
return 1;

- trace_x86_fpu_copy_src(src_fpu);
trace_x86_fpu_copy_dst(dst_fpu);

return 0;
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 11aa31410df2..53580e59e5db 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -38,7 +38,7 @@ static void fpu__init_cpu_generic(void)
/* Flush out any pending x87 state: */
#ifdef CONFIG_MATH_EMULATION
if (!boot_cpu_has(X86_FEATURE_FPU))
- fpstate_init_soft(&x86_task_fpu(current)->fpstate->regs.soft);
+ ;
else
#endif
asm volatile ("fninit");
@@ -164,7 +164,7 @@ static void __init fpu__init_task_struct_size(void)
* Subtract off the static size of the register state.
* It potentially has a bunch of padding.
*/
- task_size -= sizeof(x86_task_fpu(current)->__fpstate.regs);
+ task_size -= sizeof(union fpregs_state);

/*
* Add back the dynamically-calculated register state
@@ -209,7 +209,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
fpu_kernel_cfg.default_size = size;
fpu_user_cfg.max_size = size;
fpu_user_cfg.default_size = size;
- fpstate_reset(x86_task_fpu(current));
}

/*
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 90b11671e943..1f37da22ddbe 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -844,9 +844,6 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
if (err)
goto out_disable;

- /* Reset the state for the current task */
- fpstate_reset(x86_task_fpu(current));
-
/*
* Update info used for ptrace frames; use standard-format size and no
* supervisor xstates:
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 226244a894da..3509afc6a672 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -170,10 +170,6 @@ SECTIONS
/* equivalent to task_pt_regs(&init_task) */
__top_init_kernel_stack = __end_init_stack - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE;

- __x86_init_fpu_begin = .;
- . = __x86_init_fpu_begin + 128*PAGE_SIZE;
- __x86_init_fpu_end = .;
-
#ifdef CONFIG_X86_32
/* 32 bit has nosave before _edata */
NOSAVE_DATA
--
2.43.0


2024-06-05 09:08:47

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 1/3] x86/fpu: Make task_struct::thread constant size

Turn thread.fpu into a pointer. Since most FPU code internals work by passing
around the FPU pointer already, the code generation impact is small.

This allows us to remove the old kludge of task_struct being variable size:

struct task_struct {

...
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
*/
randomized_struct_fields_end

/* CPU-specific state of this task: */
struct thread_struct thread;

/*
* WARNING: on x86, 'thread_struct' contains a variable-sized
* structure. It *MUST* be at the end of 'task_struct'.
*
* Do not put anything below here!
*/
};

... which creates a number of problems, such as requiring thread_struct to be
the last member of the struct - not allowing it to be struct-randomized, etc.

But the primary motivation is to allow the decoupling of task_struct from
hardware details (<asm/processor.h> in particular), and to eventually allow
the per-task infrastructure:

DECLARE_PER_TASK(type, name);
...
per_task(current, name) = val;

... which requires task_struct to be a constant size struct.

The fpu_thread_struct_whitelist() quirk to hardened usercopy can be removed,
now that the FPU structure is not embedded in the task struct anymore, which
reduces text footprint a bit.

Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Uros Bizjak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/fpu/sched.h | 2 +-
arch/x86/include/asm/processor.h | 14 ++++++--------
arch/x86/kernel/fpu/context.h | 4 ++--
arch/x86/kernel/fpu/core.c | 51 ++++++++++++++++++++++++++-------------------------
arch/x86/kernel/fpu/init.c | 25 +++++++++++++++----------
arch/x86/kernel/fpu/regset.c | 22 +++++++++++-----------
arch/x86/kernel/fpu/signal.c | 18 +++++++++---------
arch/x86/kernel/fpu/xstate.c | 22 +++++++++++-----------
arch/x86/kernel/fpu/xstate.h | 6 +++---
arch/x86/kernel/process.c | 6 +++---
arch/x86/kernel/signal.c | 6 +++---
arch/x86/kernel/traps.c | 2 +-
arch/x86/math-emu/fpu_aux.c | 2 +-
arch/x86/math-emu/fpu_entry.c | 4 ++--
arch/x86/math-emu/fpu_system.h | 2 +-
arch/x86/mm/extable.c | 2 +-
include/linux/sched.h | 13 +++----------
17 files changed, 99 insertions(+), 102 deletions(-)

diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index c485f1944c5f..3cf20ab49b5f 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -41,7 +41,7 @@ static inline void switch_fpu_prepare(struct task_struct *old, int cpu)
{
if (cpu_feature_enabled(X86_FEATURE_FPU) &&
!(old->flags & (PF_KTHREAD | PF_USER_WORKER))) {
- struct fpu *old_fpu = &old->thread.fpu;
+ struct fpu *old_fpu = old->thread.fpu;

save_fpregs_to_fpstate(old_fpu);
/*
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index cb4f6c513c48..920b0beebd11 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -504,19 +504,17 @@ struct thread_struct {
#endif

/* Floating point and extended processor state */
- struct fpu fpu;
- /*
- * WARNING: 'fpu' is dynamically-sized. It *MUST* be at
- * the end.
- */
+ struct fpu *fpu;
};

-extern void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size);
-
+/*
+ * X86 doesn't need any embedded-FPU-struct quirks:
+ */
static inline void arch_thread_struct_whitelist(unsigned long *offset,
unsigned long *size)
{
- fpu_thread_struct_whitelist(offset, size);
+ *offset = 0;
+ *size = 0;
}

static inline void
diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index f6d856bd50bc..96d1f34179b3 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -53,7 +53,7 @@ static inline void fpregs_activate(struct fpu *fpu)
/* Internal helper for switch_fpu_return() and signal frame setup */
static inline void fpregs_restore_userregs(void)
{
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = current->thread.fpu;
int cpu = smp_processor_id();

if (WARN_ON_ONCE(current->flags & (PF_KTHREAD | PF_USER_WORKER)))
@@ -67,7 +67,7 @@ static inline void fpregs_restore_userregs(void)
* If PKRU is enabled, then the PKRU value is already
* correct because it was either set in switch_to() or in
* flush_thread(). So it is excluded because it might be
- * not up to date in current->thread.fpu.xsave state.
+ * not up to date in current->thread.fpu->xsave state.
*
* XFD state is handled in restore_fpregs_from_fpstate().
*/
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1209c7aebb21..816d48978da4 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -204,7 +204,7 @@ static void fpu_init_guest_permissions(struct fpu_guest *gfpu)
return;

spin_lock_irq(&current->sighand->siglock);
- fpuperm = &current->group_leader->thread.fpu.guest_perm;
+ fpuperm = &current->group_leader->thread.fpu->guest_perm;
perm = fpuperm->__state_perm;

/* First fpstate allocation locks down permissions. */
@@ -316,7 +316,7 @@ EXPORT_SYMBOL_GPL(fpu_update_guest_xfd);
*/
void fpu_sync_guest_vmexit_xfd_state(void)
{
- struct fpstate *fps = current->thread.fpu.fpstate;
+ struct fpstate *fps = current->thread.fpu->fpstate;

lockdep_assert_irqs_disabled();
if (fpu_state_size_dynamic()) {
@@ -330,7 +330,7 @@ EXPORT_SYMBOL_GPL(fpu_sync_guest_vmexit_xfd_state);
int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
{
struct fpstate *guest_fps = guest_fpu->fpstate;
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = current->thread.fpu;
struct fpstate *cur_fps = fpu->fpstate;

fpregs_lock();
@@ -430,7 +430,7 @@ void kernel_fpu_begin_mask(unsigned int kfpu_mask)
if (!(current->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
!test_thread_flag(TIF_NEED_FPU_LOAD)) {
set_thread_flag(TIF_NEED_FPU_LOAD);
- save_fpregs_to_fpstate(&current->thread.fpu);
+ save_fpregs_to_fpstate(current->thread.fpu);
}
__cpu_invalidate_fpregs_state();

@@ -458,7 +458,7 @@ EXPORT_SYMBOL_GPL(kernel_fpu_end);
*/
void fpu_sync_fpstate(struct fpu *fpu)
{
- WARN_ON_FPU(fpu != &current->thread.fpu);
+ WARN_ON_FPU(fpu != current->thread.fpu);

fpregs_lock();
trace_x86_fpu_before_save(fpu);
@@ -543,7 +543,7 @@ void fpstate_reset(struct fpu *fpu)
static inline void fpu_inherit_perms(struct fpu *dst_fpu)
{
if (fpu_state_size_dynamic()) {
- struct fpu *src_fpu = &current->group_leader->thread.fpu;
+ struct fpu *src_fpu = current->group_leader->thread.fpu;

spin_lock_irq(&current->sighand->siglock);
/* Fork also inherits the permissions of the parent */
@@ -563,7 +563,7 @@ static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
if (!ssp)
return 0;

- xstate = get_xsave_addr(&dst->thread.fpu.fpstate->regs.xsave,
+ xstate = get_xsave_addr(&dst->thread.fpu->fpstate->regs.xsave,
XFEATURE_CET_USER);

/*
@@ -584,8 +584,19 @@ static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
unsigned long ssp)
{
- struct fpu *src_fpu = &current->thread.fpu;
- struct fpu *dst_fpu = &dst->thread.fpu;
+ /*
+ * We allocate the new FPU structure right after the end of the task struct.
+ * task allocation size already took this into account.
+ *
+ * This is safe because task_struct size is a multiple of cacheline size.
+ */
+ struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
+ struct fpu *src_fpu = current->thread.fpu;
+
+ BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
+ BUG_ON(!src_fpu);
+
+ dst->thread.fpu = dst_fpu;

/* The new task's FPU state cannot be valid in the hardware. */
dst_fpu->last_cpu = -1;
@@ -654,16 +665,6 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
return 0;
}

-/*
- * Whitelist the FPU register state embedded into task_struct for hardened
- * usercopy.
- */
-void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
-{
- *offset = offsetof(struct thread_struct, fpu.__fpstate.regs);
- *size = fpu_kernel_cfg.default_size;
-}
-
/*
* Drops current FPU state: deactivates the fpregs and
* the fpstate. NOTE: it still leaves previous contents
@@ -677,7 +678,7 @@ void fpu__drop(struct fpu *fpu)
{
preempt_disable();

- if (fpu == &current->thread.fpu) {
+ if (fpu == current->thread.fpu) {
/* Ignore delayed exceptions from user space */
asm volatile("1: fwait\n"
"2:\n"
@@ -711,7 +712,7 @@ static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
*/
static void fpu_reset_fpregs(void)
{
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = current->thread.fpu;

fpregs_lock();
__fpu_invalidate_fpregs_state(fpu);
@@ -740,7 +741,7 @@ static void fpu_reset_fpregs(void)
*/
void fpu__clear_user_states(struct fpu *fpu)
{
- WARN_ON_FPU(fpu != &current->thread.fpu);
+ WARN_ON_FPU(fpu != current->thread.fpu);

fpregs_lock();
if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
@@ -773,7 +774,7 @@ void fpu__clear_user_states(struct fpu *fpu)

void fpu_flush_thread(void)
{
- fpstate_reset(&current->thread.fpu);
+ fpstate_reset(current->thread.fpu);
fpu_reset_fpregs();
}
/*
@@ -814,7 +815,7 @@ void fpregs_lock_and_load(void)
*/
void fpregs_assert_state_consistent(void)
{
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = current->thread.fpu;

if (test_thread_flag(TIF_NEED_FPU_LOAD))
return;
@@ -826,7 +827,7 @@ EXPORT_SYMBOL_GPL(fpregs_assert_state_consistent);

void fpregs_mark_activate(void)
{
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = current->thread.fpu;

fpregs_activate(fpu);
fpu->last_cpu = smp_processor_id();
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 998a08f17e33..de618ec509aa 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -38,7 +38,7 @@ static void fpu__init_cpu_generic(void)
/* Flush out any pending x87 state: */
#ifdef CONFIG_MATH_EMULATION
if (!boot_cpu_has(X86_FEATURE_FPU))
- fpstate_init_soft(&current->thread.fpu.fpstate->regs.soft);
+ fpstate_init_soft(&current->thread.fpu->fpstate->regs.soft);
else
#endif
asm volatile ("fninit");
@@ -71,8 +71,17 @@ static bool __init fpu__probe_without_cpuid(void)
return fsw == 0 && (fcw & 0x103f) == 0x003f;
}

+static struct fpu x86_init_fpu __read_mostly;
+
static void __init fpu__init_system_early_generic(void)
{
+ int this_cpu = smp_processor_id();
+
+ fpstate_reset(&x86_init_fpu);
+ current->thread.fpu = &x86_init_fpu;
+ per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
+ x86_init_fpu.last_cpu = this_cpu;
+
if (!boot_cpu_has(X86_FEATURE_CPUID) &&
!test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
if (fpu__probe_without_cpuid())
@@ -150,11 +159,13 @@ static void __init fpu__init_task_struct_size(void)
{
int task_size = sizeof(struct task_struct);

+ task_size += sizeof(struct fpu);
+
/*
* Subtract off the static size of the register state.
* It potentially has a bunch of padding.
*/
- task_size -= sizeof(current->thread.fpu.__fpstate.regs);
+ task_size -= sizeof(current->thread.fpu->__fpstate.regs);

/*
* Add back the dynamically-calculated register state
@@ -164,14 +175,9 @@ static void __init fpu__init_task_struct_size(void)

/*
* We dynamically size 'struct fpu', so we require that
- * it be at the end of 'thread_struct' and that
- * 'thread_struct' be at the end of 'task_struct'. If
- * you hit a compile error here, check the structure to
- * see if something got added to the end.
+ * 'state' be at the end of 'it:
*/
CHECK_MEMBER_AT_END_OF(struct fpu, __fpstate);
- CHECK_MEMBER_AT_END_OF(struct thread_struct, fpu);
- CHECK_MEMBER_AT_END_OF(struct task_struct, thread);

arch_task_struct_size = task_size;
}
@@ -204,7 +210,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
fpu_kernel_cfg.default_size = size;
fpu_user_cfg.max_size = size;
fpu_user_cfg.default_size = size;
- fpstate_reset(&current->thread.fpu);
+ fpstate_reset(current->thread.fpu);
}

/*
@@ -213,7 +219,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
*/
void __init fpu__init_system(void)
{
- fpstate_reset(&current->thread.fpu);
fpu__init_system_early_generic();

/*
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 6bc1eb2a21bd..38bc0b390d02 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -45,7 +45,7 @@ int regset_xregset_fpregs_active(struct task_struct *target, const struct user_r
*/
static void sync_fpstate(struct fpu *fpu)
{
- if (fpu == &current->thread.fpu)
+ if (fpu == current->thread.fpu)
fpu_sync_fpstate(fpu);
}

@@ -63,7 +63,7 @@ static void fpu_force_restore(struct fpu *fpu)
* Only stopped child tasks can be used to modify the FPU
* state in the fpstate buffer:
*/
- WARN_ON_FPU(fpu == &current->thread.fpu);
+ WARN_ON_FPU(fpu == current->thread.fpu);

__fpu_invalidate_fpregs_state(fpu);
}
@@ -71,7 +71,7 @@ static void fpu_force_restore(struct fpu *fpu)
int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = target->thread.fpu;

if (!cpu_feature_enabled(X86_FEATURE_FXSR))
return -ENODEV;
@@ -91,7 +91,7 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = target->thread.fpu;
struct fxregs_state newstate;
int ret;

@@ -133,7 +133,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
if (!cpu_feature_enabled(X86_FEATURE_XSAVE))
return -ENODEV;

- sync_fpstate(&target->thread.fpu);
+ sync_fpstate(target->thread.fpu);

copy_xstate_to_uabi_buf(to, target, XSTATE_COPY_XSAVE);
return 0;
@@ -143,7 +143,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = target->thread.fpu;
struct xregs_state *tmpbuf = NULL;
int ret;

@@ -187,7 +187,7 @@ int ssp_active(struct task_struct *target, const struct user_regset *regset)
int ssp_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = target->thread.fpu;
struct cet_user_state *cetregs;

if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
@@ -213,7 +213,7 @@ int ssp_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = target->thread.fpu;
struct xregs_state *xsave = &fpu->fpstate->regs.xsave;
struct cet_user_state *cetregs;
unsigned long user_ssp;
@@ -367,7 +367,7 @@ static void __convert_from_fxsr(struct user_i387_ia32_struct *env,
void
convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
{
- __convert_from_fxsr(env, tsk, &tsk->thread.fpu.fpstate->regs.fxsave);
+ __convert_from_fxsr(env, tsk, &tsk->thread.fpu->fpstate->regs.fxsave);
}

void convert_to_fxsr(struct fxregs_state *fxsave,
@@ -400,7 +400,7 @@ void convert_to_fxsr(struct fxregs_state *fxsave,
int fpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = target->thread.fpu;
struct user_i387_ia32_struct env;
struct fxregs_state fxsave, *fx;

@@ -432,7 +432,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = target->thread.fpu;
struct user_i387_ia32_struct env;
int ret;

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 247f2225aa9f..6b262d0b8fc1 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -38,7 +38,7 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
/* Check for the first magic field and other error scenarios. */
if (fx_sw->magic1 != FP_XSTATE_MAGIC1 ||
fx_sw->xstate_size < min_xstate_size ||
- fx_sw->xstate_size > current->thread.fpu.fpstate->user_size ||
+ fx_sw->xstate_size > current->thread.fpu->fpstate->user_size ||
fx_sw->xstate_size > fx_sw->extended_size)
goto setfx;

@@ -54,7 +54,7 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
if (likely(magic2 == FP_XSTATE_MAGIC2))
return true;
setfx:
- trace_x86_fpu_xstate_check_failed(&current->thread.fpu);
+ trace_x86_fpu_xstate_check_failed(current->thread.fpu);

/* Set the parameters for fx only state */
fx_sw->magic1 = 0;
@@ -69,13 +69,13 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
static inline bool save_fsave_header(struct task_struct *tsk, void __user *buf)
{
if (use_fxsr()) {
- struct xregs_state *xsave = &tsk->thread.fpu.fpstate->regs.xsave;
+ struct xregs_state *xsave = &tsk->thread.fpu->fpstate->regs.xsave;
struct user_i387_ia32_struct env;
struct _fpstate_32 __user *fp = buf;

fpregs_lock();
if (!test_thread_flag(TIF_NEED_FPU_LOAD))
- fxsave(&tsk->thread.fpu.fpstate->regs.fxsave);
+ fxsave(&tsk->thread.fpu->fpstate->regs.fxsave);
fpregs_unlock();

convert_from_fxsr(&env, tsk);
@@ -188,7 +188,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
{
struct task_struct *tsk = current;
- struct fpstate *fpstate = tsk->thread.fpu.fpstate;
+ struct fpstate *fpstate = tsk->thread.fpu->fpstate;
bool ia32_fxstate = (buf != buf_fx);
int ret;

@@ -276,7 +276,7 @@ static int __restore_fpregs_from_user(void __user *buf, u64 ufeatures,
*/
static bool restore_fpregs_from_user(void __user *buf, u64 xrestore, bool fx_only)
{
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = current->thread.fpu;
int ret;

/* Restore enabled features only. */
@@ -336,7 +336,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
bool ia32_fxstate)
{
struct task_struct *tsk = current;
- struct fpu *fpu = &tsk->thread.fpu;
+ struct fpu *fpu = tsk->thread.fpu;
struct user_i387_ia32_struct env;
bool success, fx_only = false;
union fpregs_state *fpregs;
@@ -456,7 +456,7 @@ static inline unsigned int xstate_sigframe_size(struct fpstate *fpstate)
*/
bool fpu__restore_sig(void __user *buf, int ia32_frame)
{
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = current->thread.fpu;
void __user *buf_fx = buf;
bool ia32_fxstate = false;
bool success = false;
@@ -503,7 +503,7 @@ unsigned long
fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
unsigned long *buf_fx, unsigned long *size)
{
- unsigned long frame_size = xstate_sigframe_size(current->thread.fpu.fpstate);
+ unsigned long frame_size = xstate_sigframe_size(current->thread.fpu->fpstate);

*buf_fx = sp = round_down(sp - frame_size, 64);
if (ia32_frame && use_fxsr()) {
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c5a026fee5e0..1d1f6599731b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -735,7 +735,7 @@ static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
*/
init_fpstate.xfd = 0;

- fpstate_reset(&current->thread.fpu);
+ fpstate_reset(current->thread.fpu);
}

/*
@@ -845,7 +845,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
goto out_disable;

/* Reset the state for the current task */
- fpstate_reset(&current->thread.fpu);
+ fpstate_reset(current->thread.fpu);

/*
* Update info used for ptrace frames; use standard-format size and no
@@ -919,7 +919,7 @@ void fpu__resume_cpu(void)
}

if (fpu_state_size_dynamic())
- wrmsrl(MSR_IA32_XFD, current->thread.fpu.fpstate->xfd);
+ wrmsrl(MSR_IA32_XFD, current->thread.fpu->fpstate->xfd);
}

/*
@@ -1188,8 +1188,8 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
enum xstate_copy_mode copy_mode)
{
- __copy_xstate_to_uabi_buf(to, tsk->thread.fpu.fpstate,
- tsk->thread.fpu.fpstate->user_xfeatures,
+ __copy_xstate_to_uabi_buf(to, tsk->thread.fpu->fpstate,
+ tsk->thread.fpu->fpstate->user_xfeatures,
tsk->thread.pkru, copy_mode);
}

@@ -1329,7 +1329,7 @@ int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u
int copy_sigframe_from_user_to_xstate(struct task_struct *tsk,
const void __user *ubuf)
{
- return copy_uabi_to_xstate(tsk->thread.fpu.fpstate, NULL, ubuf, &tsk->thread.pkru);
+ return copy_uabi_to_xstate(tsk->thread.fpu->fpstate, NULL, ubuf, &tsk->thread.pkru);
}

static bool validate_independent_components(u64 mask)
@@ -1423,7 +1423,7 @@ static bool xstate_op_valid(struct fpstate *fpstate, u64 mask, bool rstor)
* The XFD MSR does not match fpstate->xfd. That's invalid when
* the passed in fpstate is current's fpstate.
*/
- if (fpstate->xfd == current->thread.fpu.fpstate->xfd)
+ if (fpstate->xfd == current->thread.fpu->fpstate->xfd)
return false;

/*
@@ -1500,7 +1500,7 @@ void fpstate_free(struct fpu *fpu)
static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
unsigned int usize, struct fpu_guest *guest_fpu)
{
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = current->thread.fpu;
struct fpstate *curfps, *newfps = NULL;
unsigned int fpsize;
bool in_use;
@@ -1593,7 +1593,7 @@ static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
* AVX512.
*/
bool compacted = cpu_feature_enabled(X86_FEATURE_XCOMPACTED);
- struct fpu *fpu = &current->group_leader->thread.fpu;
+ struct fpu *fpu = current->group_leader->thread.fpu;
struct fpu_state_perm *perm;
unsigned int ksize, usize;
u64 mask;
@@ -1696,7 +1696,7 @@ int __xfd_enable_feature(u64 xfd_err, struct fpu_guest *guest_fpu)
return -EPERM;
}

- fpu = &current->group_leader->thread.fpu;
+ fpu = current->group_leader->thread.fpu;
perm = guest_fpu ? &fpu->guest_perm : &fpu->perm;
ksize = perm->__state_size;
usize = perm->__user_state_size;
@@ -1801,7 +1801,7 @@ long fpu_xstate_prctl(int option, unsigned long arg2)
*/
static void avx512_status(struct seq_file *m, struct task_struct *task)
{
- unsigned long timestamp = READ_ONCE(task->thread.fpu.avx512_timestamp);
+ unsigned long timestamp = READ_ONCE(task->thread.fpu->avx512_timestamp);
long delta;

if (!timestamp) {
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 05df04f39628..1c720c376a69 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -22,7 +22,7 @@ static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)

static inline u64 xstate_get_group_perm(bool guest)
{
- struct fpu *fpu = &current->group_leader->thread.fpu;
+ struct fpu *fpu = current->group_leader->thread.fpu;
struct fpu_state_perm *perm;

/* Pairs with WRITE_ONCE() in xstate_request_perm() */
@@ -265,7 +265,7 @@ static inline int xsave_to_user_sigframe(struct xregs_state __user *buf)
* internally, e.g. PKRU. That's user space ABI and also required
* to allow the signal handler to modify PKRU.
*/
- struct fpstate *fpstate = current->thread.fpu.fpstate;
+ struct fpstate *fpstate = current->thread.fpu->fpstate;
u64 mask = fpstate->user_xfeatures;
u32 lmask;
u32 hmask;
@@ -296,7 +296,7 @@ static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64
u32 hmask = mask >> 32;
int err;

- xfd_validate_state(current->thread.fpu.fpstate, mask, true);
+ xfd_validate_state(current->thread.fpu->fpstate, mask, true);

stac();
XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b8441147eb5e..5f3f48713870 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -97,7 +97,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
dst->thread.vm86 = NULL;
#endif
/* Drop the copied pointer to current's fpstate */
- dst->thread.fpu.fpstate = NULL;
+ dst->thread.fpu = NULL;

return 0;
}
@@ -106,7 +106,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
void arch_release_task_struct(struct task_struct *tsk)
{
if (fpu_state_size_dynamic())
- fpstate_free(&tsk->thread.fpu);
+ fpstate_free(tsk->thread.fpu);
}
#endif

@@ -116,7 +116,7 @@ void arch_release_task_struct(struct task_struct *tsk)
void exit_thread(struct task_struct *tsk)
{
struct thread_struct *t = &tsk->thread;
- struct fpu *fpu = &t->fpu;
+ struct fpu *fpu = t->fpu;

if (test_thread_flag(TIF_IO_BITMAP))
io_bitmap_exit(tsk);
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 31b6f5dddfc2..7e709ae8e99a 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -228,7 +228,7 @@ static void
handle_signal(struct ksignal *ksig, struct pt_regs *regs)
{
bool stepping, failed;
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = current->thread.fpu;

if (v8086_mode(regs))
save_v86_state((struct kernel_vm86_regs *) regs, VM86_SIGNAL);
@@ -396,14 +396,14 @@ bool sigaltstack_size_valid(size_t ss_size)
if (!fpu_state_size_dynamic() && !strict_sigaltstack_size)
return true;

- fsize += current->group_leader->thread.fpu.perm.__user_state_size;
+ fsize += current->group_leader->thread.fpu->perm.__user_state_size;
if (likely(ss_size > fsize))
return true;

if (strict_sigaltstack_size)
return ss_size > fsize;

- mask = current->group_leader->thread.fpu.perm.__state_perm;
+ mask = current->group_leader->thread.fpu->perm.__state_perm;
if (mask & XFEATURE_MASK_USER_DYNAMIC)
return ss_size > fsize;

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4fa0b17e5043..c79e36c4071b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1148,7 +1148,7 @@ DEFINE_IDTENTRY_RAW(exc_debug)
static void math_error(struct pt_regs *regs, int trapnr)
{
struct task_struct *task = current;
- struct fpu *fpu = &task->thread.fpu;
+ struct fpu *fpu = task->thread.fpu;
int si_code;
char *str = (trapnr == X86_TRAP_MF) ? "fpu exception" :
"simd exception";
diff --git a/arch/x86/math-emu/fpu_aux.c b/arch/x86/math-emu/fpu_aux.c
index d62662bdd460..8087953164fc 100644
--- a/arch/x86/math-emu/fpu_aux.c
+++ b/arch/x86/math-emu/fpu_aux.c
@@ -53,7 +53,7 @@ void fpstate_init_soft(struct swregs_state *soft)

void finit(void)
{
- fpstate_init_soft(&current->thread.fpu.fpstate->regs.soft);
+ fpstate_init_soft(&current->thread.fpu->fpstate->regs.soft);
}

/*
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 91c52ead1226..30400d95d9d0 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -641,7 +641,7 @@ int fpregs_soft_set(struct task_struct *target,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct swregs_state *s387 = &target->thread.fpu.fpstate->regs.soft;
+ struct swregs_state *s387 = &target->thread.fpu->fpstate->regs.soft;
void *space = s387->st_space;
int ret;
int offset, other, i, tags, regnr, tag, newtop;
@@ -692,7 +692,7 @@ int fpregs_soft_get(struct task_struct *target,
const struct user_regset *regset,
struct membuf to)
{
- struct swregs_state *s387 = &target->thread.fpu.fpstate->regs.soft;
+ struct swregs_state *s387 = &target->thread.fpu->fpstate->regs.soft;
const void *space = s387->st_space;
int offset = (S387->ftop & 7) * 10, other = 80 - offset;

diff --git a/arch/x86/math-emu/fpu_system.h b/arch/x86/math-emu/fpu_system.h
index eec3e4805c75..3417337e7d99 100644
--- a/arch/x86/math-emu/fpu_system.h
+++ b/arch/x86/math-emu/fpu_system.h
@@ -73,7 +73,7 @@ static inline bool seg_writable(struct desc_struct *d)
return (d->type & SEG_TYPE_EXECUTE_MASK) == SEG_TYPE_WRITABLE;
}

-#define I387 (&current->thread.fpu.fpstate->regs)
+#define I387 (&current->thread.fpu->fpstate->regs)
#define FPU_info (I387->soft.info)

#define FPU_CS (*(unsigned short *) &(FPU_info->regs->cs))
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 51986e8a9d35..9400c1c29fc8 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -111,7 +111,7 @@ static bool ex_handler_sgx(const struct exception_table_entry *fixup,

/*
* Handler for when we fail to restore a task's FPU state. We should never get
- * here because the FPU state of a task using the FPU (task->thread.fpu.state)
+ * here because the FPU state of a task using the FPU (task->thread.fpu->state)
* should always be valid. However, past bugs have allowed userspace to set
* reserved bits in the XSAVE area using PTRACE_SETREGSET or sys_rt_sigreturn().
* These caused XRSTOR to fail when switching to the task, leaking the FPU
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 61591ac6eab6..215a7380e41c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1554,21 +1554,14 @@ struct task_struct {
struct user_event_mm *user_event_mm;
#endif

+ /* CPU-specific state of this task: */
+ struct thread_struct thread;
+
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
*/
randomized_struct_fields_end
-
- /* CPU-specific state of this task: */
- struct thread_struct thread;
-
- /*
- * WARNING: on x86, 'thread_struct' contains a variable-sized
- * structure. It *MUST* be at the end of 'task_struct'.
- *
- * Do not put anything below here!
- */
};

#define TASK_REPORT_IDLE (TASK_REPORT + 1)
--
2.43.0


2024-06-05 13:52:27

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer

On 06/05, Ingo Molnar wrote:
>
> @@ -591,13 +591,11 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
> * This is safe because task_struct size is a multiple of cacheline size.
> */
> struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
> - struct fpu *src_fpu = current->thread.fpu;
> + struct fpu *src_fpu = x86_task_fpu(current);

I think this patch can also change

struct fpu *dst_fpu = (void *)dst + sizeof(*dst);

above to use x86_task_fpu(dst).

Oleg.


2024-06-05 14:19:23

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning

On 06/05, Ingo Molnar wrote:
>
> But the init task isn't supposed to be using the FPU in any case,

Afaics, the same is true for any PF_KTHREAD/USER_WORKER thread?

> +#ifdef CONFIG_X86_DEBUG_FPU
> +struct fpu *x86_task_fpu(struct task_struct *task)
> +{
> + WARN_ON_ONCE(task == &init_task);
> +
> + return (void *)task + sizeof(*task);
> +}

So perhaps WARN_ON_ONCE(task->flags & PF_KTHREAD) makes more sense?

Although in this case fpu_clone() can't use x86_task_fpu(dst) as I tried
to suggest in reply to 2/3.




This is offtopic right now, but I am wondering if it makes any sense to
make arch_task_struct_size depend on PF_KTHREAD... I mean, create another
task_struct_kthread_cachep used by alloc_task_struct_node() if PF_KTHREAD.

Oleg.


2024-06-05 16:17:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning

On Wed, 5 Jun 2024 at 07:19, Oleg Nesterov <[email protected]> wrote:
>
> On 06/05, Ingo Molnar wrote:
> >
> > But the init task isn't supposed to be using the FPU in any case,
>
> Afaics, the same is true for any PF_KTHREAD/USER_WORKER thread?

I don't think so. We have various users of kernel_fpu_begin()/end()
that are very much about things like crypto and RAID xor memory copies
etc that will be used by kernel worker threads.

In fact, as far as I know, we'll use the FPU in the init_task too
thanks to irq_fpu_usable(). Look up the nft_pipapo_avx2_lookup()
function.

Maybe other patches removed that, I didn't check the context, but the
"init_task doesn't use FPU" doesn't actually sound true to me.

Linus

2024-06-05 16:27:56

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning

On 06/05, Linus Torvalds wrote:
>
> On Wed, 5 Jun 2024 at 07:19, Oleg Nesterov <[email protected]> wrote:
> >
> > On 06/05, Ingo Molnar wrote:
> > >
> > > But the init task isn't supposed to be using the FPU in any case,
> >
> > Afaics, the same is true for any PF_KTHREAD/USER_WORKER thread?
>
> I don't think so. We have various users of kernel_fpu_begin()/end()
> that are very much about things like crypto and RAID xor memory copies
> etc that will be used by kernel worker threads.

Yes, but kernel_fpu_begin() never does save_fpregs_to_fpstate() if
current->flags & PF_KTHREAD ?

Oleg.


2024-06-05 17:28:32

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86/fpu: Remove init_task FPU state dependencies, add debugging warning

On Wed, 5 Jun 2024 at 09:27, Oleg Nesterov <[email protected]> wrote:
>
> Yes, but kernel_fpu_begin() never does save_fpregs_to_fpstate() if
> current->flags & PF_KTHREAD ?

Ahh, and init_thread does have PF_KTHREAD.

Ok, looks fine to me, except I think the commit message should be cleared up.

The whole sentence about

"But the init task isn't supposed to be using the FPU in any case ..."

is just simply not true.

It should be more along the lines of "kernel threads don't need an FPU
save area, because their FPU use is not preemptible or reentrant and
they don't return to user space".

Linus

2024-06-05 19:05:37

by Chang S. Bae

[permalink] [raw]
Subject: Re: [PATCH 1/3] x86/fpu: Make task_struct::thread constant size

On 6/5/2024 1:35 AM, Ingo Molnar wrote:
>
> /*
> * Handler for when we fail to restore a task's FPU state. We should never get
> - * here because the FPU state of a task using the FPU (task->thread.fpu.state)
> + * here because the FPU state of a task using the FPU (task->thread.fpu->state)

Just a nitpick:
fpu::fpstate now points to the active FPU in-memory storage.

Thanks,
Chang

2024-06-05 21:21:59

by Brian Gerst

[permalink] [raw]
Subject: Re: [PATCH 0/3, v3] x86/fpu: Remove the thread::fpu pointer

On Wed, Jun 5, 2024 at 4:36 AM Ingo Molnar <[email protected]> wrote:
>
> This series is one of the dependencies of the fast-headers work,
> which aims to reduce header complexity by removing <asm/processor.h>
> from the <linux/sched.h> dependency chain, which headers are headers
> are fat enough already even if we do not combine them.
>
> To achieve that decoupling, one of the key steps is to not embedd any
> C types from <asm/processor.h> into task_struct.
>
> The only architecture that relies on that in a serious fashion is x86,
> via 'struct thread::fpu', and the series below attempts to resolve it
> by using a calculated &task->thread.fpu value via the x86_task_fpu()
> helper.
>
> Code generation: without CONFIG_X86_DEBUG_FPU=y we get a small reduction:
>
> text data bss dec hex filename
> 26475783 10439082 1740804 38655669 24dd6b5 vmlinux.before
> 26475601 10435146 1740804 38651551 24dc69f vmlinux.after
>
> With the new debug code - which I think we'll remove soon-ish, there's an
> expected increase:
>
> text data bss dec hex filename
> 26476198 10439286 1740804 38656288 24dd920 vmlinux.CONFIG_X86_DEBUG_FPU.before
> 26477000 10435458 1740804 38653262 24dcd4e vmlinux.CONFIG_X86_DEBUG_FPU.after
>
> The Git tree can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/fpu
>
> Thanks,
>
> Ingo
>
> ===============>
> Ingo Molnar (3):
> x86/fpu: Make task_struct::thread constant size
> x86/fpu: Remove the thread::fpu pointer
> x86/fpu: Remove init_task FPU state dependencies, add debugging warning
>
> arch/x86/include/asm/fpu/sched.h | 2 +-
> arch/x86/include/asm/processor.h | 19 ++++++++++---------
> arch/x86/kernel/fpu/context.h | 4 ++--
> arch/x86/kernel/fpu/core.c | 57 +++++++++++++++++++++++++++++++--------------------------
> arch/x86/kernel/fpu/init.c | 23 +++++++++++++----------
> arch/x86/kernel/fpu/regset.c | 22 +++++++++++-----------
> arch/x86/kernel/fpu/signal.c | 18 +++++++++---------
> arch/x86/kernel/fpu/xstate.c | 23 ++++++++++-------------
> arch/x86/kernel/fpu/xstate.h | 6 +++---
> arch/x86/kernel/process.c | 6 ++----
> arch/x86/kernel/signal.c | 6 +++---
> arch/x86/kernel/traps.c | 2 +-
> arch/x86/math-emu/fpu_aux.c | 2 +-
> arch/x86/math-emu/fpu_entry.c | 4 ++--
> arch/x86/math-emu/fpu_system.h | 2 +-
> arch/x86/mm/extable.c | 2 +-
> include/linux/sched.h | 13 +++----------
> 17 files changed, 104 insertions(+), 107 deletions(-)

This series would be better if you added the x86_task_fpu() helper in
an initial patch without any other changes. That would make the
actual changes more visible with less code churn.

Brian Gerst

2024-06-06 08:31:17

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks


* Linus Torvalds <[email protected]> wrote:

> On Wed, 5 Jun 2024 at 09:27, Oleg Nesterov <[email protected]> wrote:
> >
> > Yes, but kernel_fpu_begin() never does save_fpregs_to_fpstate() if
> > current->flags & PF_KTHREAD ?
>
> Ahh, and init_thread does have PF_KTHREAD.
>
> Ok, looks fine to me, except I think the commit message should be cleared up.
>
> The whole sentence about
>
> "But the init task isn't supposed to be using the FPU in any case ..."
>
> is just simply not true.
>
> It should be more along the lines of "kernel threads don't need an FPU
> save area, because their FPU use is not preemptible or reentrant and
> they don't return to user space".

Indeed - I fixed & clarified the changelog accordingly - see the new patch
further below.

I changed the debug check to test for PF_KTHREAD, and to return NULL:

+#ifdef CONFIG_X86_DEBUG_FPU
+struct fpu *x86_task_fpu(struct task_struct *task)
+{
+ if (WARN_ON_ONCE(task->flags & PF_KTHREAD))
+ return NULL;
+
+ return (void *)task + sizeof(*task);
+}
+#endif

... and the NULL we return will likely crash & exit any kthreads attempting
to use the FPU context area - which I think is preferable to returning
invalid memory that may then be corrupted.

Hopefully this remains a hypothethical concern. :-)

Alternatively, this may be one of the very few cases where a BUG_ON() might
be justified? This condition is not recoverable in any sane fashion IMO.

Thanks,

Ingo

============================>
From: Ingo Molnar <[email protected]>
Date: Wed, 5 Jun 2024 10:35:57 +0200
Subject: [PATCH] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks

init_task's FPU state initialization was a bit of a hack:

__x86_init_fpu_begin = .;
. = __x86_init_fpu_begin + 128*PAGE_SIZE;
__x86_init_fpu_end = .;

But the init task isn't supposed to be using the FPU context
in any case, so remove the hack and add in some debug warnings.

As Linus noted in the discussion, the init task (and other
PF_KTHREAD tasks) *can* use the FPU via kernel_fpu_begin()/_end(),
but they don't need the context area because their FPU use is not
preemptible or reentrant, and they don't return to user-space.

Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Uros Bizjak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/processor.h | 6 +++++-
arch/x86/kernel/fpu/core.c | 13 ++++++++++---
arch/x86/kernel/fpu/init.c | 5 ++---
arch/x86/kernel/fpu/xstate.c | 3 ---
arch/x86/kernel/vmlinux.lds.S | 4 ----
5 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 249c5fa20de4..ed8981866f4d 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -504,7 +504,11 @@ struct thread_struct {
#endif
};

-#define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
+#ifdef CONFIG_X86_DEBUG_FPU
+extern struct fpu *x86_task_fpu(struct task_struct *task);
+#else
+# define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
+#endif

/*
* X86 doesn't need any embedded-FPU-struct quirks:
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 0ccabcd3bf62..d9ff8ca5b32d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -51,6 +51,16 @@ static DEFINE_PER_CPU(bool, in_kernel_fpu);
*/
DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);

+#ifdef CONFIG_X86_DEBUG_FPU
+struct fpu *x86_task_fpu(struct task_struct *task)
+{
+ if (WARN_ON_ONCE(task->flags & PF_KTHREAD))
+ return NULL;
+
+ return (void *)task + sizeof(*task);
+}
+#endif
+
/*
* Can we use the FPU in kernel mode with the
* whole "kernel_fpu_begin/end()" sequence?
@@ -591,10 +601,8 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
* This is safe because task_struct size is a multiple of cacheline size.
*/
struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
- struct fpu *src_fpu = x86_task_fpu(current);

BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
- BUG_ON(!src_fpu);

/* The new task's FPU state cannot be valid in the hardware. */
dst_fpu->last_cpu = -1;
@@ -657,7 +665,6 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
if (update_fpu_shstk(dst, ssp))
return 1;

- trace_x86_fpu_copy_src(src_fpu);
trace_x86_fpu_copy_dst(dst_fpu);

return 0;
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 11aa31410df2..53580e59e5db 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -38,7 +38,7 @@ static void fpu__init_cpu_generic(void)
/* Flush out any pending x87 state: */
#ifdef CONFIG_MATH_EMULATION
if (!boot_cpu_has(X86_FEATURE_FPU))
- fpstate_init_soft(&x86_task_fpu(current)->fpstate->regs.soft);
+ ;
else
#endif
asm volatile ("fninit");
@@ -164,7 +164,7 @@ static void __init fpu__init_task_struct_size(void)
* Subtract off the static size of the register state.
* It potentially has a bunch of padding.
*/
- task_size -= sizeof(x86_task_fpu(current)->__fpstate.regs);
+ task_size -= sizeof(union fpregs_state);

/*
* Add back the dynamically-calculated register state
@@ -209,7 +209,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
fpu_kernel_cfg.default_size = size;
fpu_user_cfg.max_size = size;
fpu_user_cfg.default_size = size;
- fpstate_reset(x86_task_fpu(current));
}

/*
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 90b11671e943..1f37da22ddbe 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -844,9 +844,6 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
if (err)
goto out_disable;

- /* Reset the state for the current task */
- fpstate_reset(x86_task_fpu(current));
-
/*
* Update info used for ptrace frames; use standard-format size and no
* supervisor xstates:
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 226244a894da..3509afc6a672 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -170,10 +170,6 @@ SECTIONS
/* equivalent to task_pt_regs(&init_task) */
__top_init_kernel_stack = __end_init_stack - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE;

- __x86_init_fpu_begin = .;
- . = __x86_init_fpu_begin + 128*PAGE_SIZE;
- __x86_init_fpu_end = .;
-
#ifdef CONFIG_X86_32
/* 32 bit has nosave before _edata */
NOSAVE_DATA

2024-06-06 08:46:36

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 4/3] x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call

This encapsulates the fpu__drop() functionality better, and it
will also enable other changes that want to check a task for
PF_KTHREAD before calling x86_task_fpu().

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/sched.h | 2 +-
arch/x86/kernel/fpu/core.c | 4 +++-
arch/x86/kernel/process.c | 3 +--
3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index 1feaa68b7567..5fd12634bcc4 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -10,7 +10,7 @@
#include <asm/trace/fpu.h>

extern void save_fpregs_to_fpstate(struct fpu *fpu);
-extern void fpu__drop(struct fpu *fpu);
+extern void fpu__drop(struct task_struct *tsk);
extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
unsigned long shstk_addr);
extern void fpu_flush_thread(void);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index d9ff8ca5b32d..3223eb3dd09d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -679,8 +679,10 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
* a state-restore is coming: either an explicit one,
* or a reschedule.
*/
-void fpu__drop(struct fpu *fpu)
+void fpu__drop(struct task_struct *tsk)
{
+ struct fpu *fpu = x86_task_fpu(tsk);
+
preempt_disable();

if (fpu == x86_task_fpu(current)) {
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 4184c085627e..0a24997c8cc6 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -114,7 +114,6 @@ void arch_release_task_struct(struct task_struct *tsk)
void exit_thread(struct task_struct *tsk)
{
struct thread_struct *t = &tsk->thread;
- struct fpu *fpu = x86_task_fpu(tsk);

if (test_thread_flag(TIF_IO_BITMAP))
io_bitmap_exit(tsk);
@@ -122,7 +121,7 @@ void exit_thread(struct task_struct *tsk)
free_vm86(t);

shstk_free(tsk);
- fpu__drop(fpu);
+ fpu__drop(tsk);
}

static int set_new_tls(struct task_struct *p, unsigned long tls)

2024-06-06 08:51:48

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks


* Ingo Molnar <[email protected]> wrote:

> I changed the debug check to test for PF_KTHREAD, and to return NULL:
>
> +#ifdef CONFIG_X86_DEBUG_FPU
> +struct fpu *x86_task_fpu(struct task_struct *task)
> +{
> + if (WARN_ON_ONCE(task->flags & PF_KTHREAD))
> + return NULL;
> +
> + return (void *)task + sizeof(*task);
> +}
> +#endif
>
> ... and the NULL we return will likely crash & exit any kthreads attempting
> to use the FPU context area - which I think is preferable to returning
> invalid memory that may then be corrupted.
>
> Hopefully this remains a hypothethical concern. :-)
>
> Alternatively, this may be one of the very few cases where a BUG_ON() might
> be justified? This condition is not recoverable in any sane fashion IMO.

And promptly this triggered in live testing, because while kthreads do not
use the FPU context area, the current fpu__drop() code does call
x86_task_cpu() even for kthreads...

See the two new 4/3 and 5/3 patches in this thread I've sent that clean
this up:

[PATCH 4/3] x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call
[PATCH 5/3] x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit

I'll also reorder the patches to apply these fpu__drop() changes before
adding the debug warning.

Thanks,

Ingo

2024-06-06 08:56:18

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer


* Oleg Nesterov <[email protected]> wrote:

> On 06/05, Ingo Molnar wrote:
> >
> > @@ -591,13 +591,11 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
> > * This is safe because task_struct size is a multiple of cacheline size.
> > */
> > struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
> > - struct fpu *src_fpu = current->thread.fpu;
> > + struct fpu *src_fpu = x86_task_fpu(current);
>
> I think this patch can also change
>
> struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
>
> above to use x86_task_fpu(dst).

Yeah, so I'd prefer to keep it open coded, because of the comment and the
debug check makes a lot more sense if the pointer calculation is visible:

/*
* We allocate the new FPU structure right after the end of the task struct.
* task allocation size already took this into account.
*
* This is safe because task_struct size is a multiple of cacheline size.
*/
struct fpu *dst_fpu = (void *)dst + sizeof(*dst);

BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);

But yes, you are right, this is technically an open-coded x86_task_fpu().

Thanks,

Ingo

2024-06-06 08:56:42

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 5/3] x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit

fpu__drop() calls x86_task_fpu() unconditionally, while the FPU context
area will not be present if it's the init task, and should not be in
use when it's some other type of kthread.

Return early for PF_KTHREAD tasks. The debug warning in x86_task_fpu()
will catch any kthreads attempting to use the FPU save area.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/core.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3223eb3dd09d..db608afa686f 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -681,7 +681,13 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
*/
void fpu__drop(struct task_struct *tsk)
{
- struct fpu *fpu = x86_task_fpu(tsk);
+ struct fpu *fpu;
+
+ /* PF_KTHREAD tasks do not use the FPU context area: */
+ if (tsk->flags & PF_KTHREAD)
+ return;
+
+ fpu = x86_task_fpu(tsk);

preempt_disable();


2024-06-06 09:16:19

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH] x86/fpu: Introduce the x86_task_fpu() helper method


* Brian Gerst <[email protected]> wrote:

> > 17 files changed, 104 insertions(+), 107 deletions(-)
>
> This series would be better if you added the x86_task_fpu() helper in
> an initial patch without any other changes. That would make the
> actual changes more visible with less code churn.

Makes sense - I've split out the patch below and adjusted the rest of the
series. Is this what you had in mind?

Note that I also robustified the macro a bit:

-# define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
+# define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))

Thanks,

Ingo

========================>
From: Ingo Molnar <[email protected]>
Date: Thu, 6 Jun 2024 11:01:14 +0200
Subject: [PATCH] x86/fpu: Introduce the x86_task_fpu() helper method

The per-task FPU context/save area is allocated right
next to task_struct() - introduce the x86_task_fpu()
helper that calculates this explicitly from the
task pointer.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/processor.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 920b0beebd11..fb6f030f0692 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -507,6 +507,8 @@ struct thread_struct {
struct fpu *fpu;
};

+#define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))
+
/*
* X86 doesn't need any embedded-FPU-struct quirks:
*/

2024-06-06 09:32:39

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH] x86/fpu: Fix stale comment in ex_handler_fprestore()


* Chang S. Bae <[email protected]> wrote:

> On 6/5/2024 1:35 AM, Ingo Molnar wrote:
> > /*
> > * Handler for when we fail to restore a task's FPU state. We should never get
> > - * here because the FPU state of a task using the FPU (task->thread.fpu.state)
> > + * here because the FPU state of a task using the FPU (task->thread.fpu->state)
>
> Just a nitpick:
> fpu::fpstate now points to the active FPU in-memory storage.

Yeah, that is a stale comment, but this was just a 'sed' job in essence,
and I'd like to keep that particular patch semi-automated.

Note that later on that comment gets further mangled into:

/*
* Handler for when we fail to restore a task's FPU state. We should never get
* here because the FPU state of a task using the FPU (x86_task_fpu(task)->state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
* should always be valid. However, past bugs have allowed userspace to set
* reserved bits in the XSAVE area using PTRACE_SETREGSET or sys_rt_sigreturn().
* These caused XRSTOR to fail when switching to the task, leaking the FPU
* registers of the task previously executing on the CPU. Mitigate this class
* of vulnerability by restoring from the initial state (essentially, zeroing
* out all the FPU registers) if we can't restore from the task's FPU state.
*/

... which didn't improve clarity either. :-)

How about the patch below?

Thanks,

Ingo

=========================>
From: Ingo Molnar <[email protected]>
Date: Thu, 6 Jun 2024 11:27:57 +0200
Subject: [PATCH] x86/fpu: Fix stale comment in ex_handler_fprestore()

The state -> fpstate rename of the fpu::fpstate field didn't
get propagated to the comment describing ex_handler_fprestore(),
fix it.

Reported-by: Chang S. Bae <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/mm/extable.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 1359ad75da3a..bf8dab18be97 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -111,7 +111,7 @@ static bool ex_handler_sgx(const struct exception_table_entry *fixup,

/*
* Handler for when we fail to restore a task's FPU state. We should never get
- * here because the FPU state of a task using the FPU (x86_task_fpu(task)->state)
+ * here because the FPU state of a task using the FPU (struct fpu::fpstate)
* should always be valid. However, past bugs have allowed userspace to set
* reserved bits in the XSAVE area using PTRACE_SETREGSET or sys_rt_sigreturn().
* These caused XRSTOR to fail when switching to the task, leaking the FPU

2024-06-06 12:02:25

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks

On 06/06, Ingo Molnar wrote:
>
> I changed the debug check to test for PF_KTHREAD, and to return NULL:
>
> +#ifdef CONFIG_X86_DEBUG_FPU
> +struct fpu *x86_task_fpu(struct task_struct *task)
> +{
> + if (WARN_ON_ONCE(task->flags & PF_KTHREAD))
> + return NULL;
> +
> + return (void *)task + sizeof(*task);
> +}
> +#endif

How many users enable CONFIG_X86_DEBUG_FPU? Perhaps it makes sense
to check PF_KTHREAD unconditionally for the start, them add
if (IS_ENABLED(X86_DEBUG_FPU)). But I won't insist.


For the record, I think we can later change this code to check

task->flags & (PF_KTHREAD | PF_USER_WORKER)

but I guess this needs some (simple) changes in the ptrace/coredump
paths.

Oleg.


2024-06-06 15:47:27

by Brian Gerst

[permalink] [raw]
Subject: Re: [PATCH] x86/fpu: Introduce the x86_task_fpu() helper method

On Thu, Jun 6, 2024 at 5:06 AM Ingo Molnar <[email protected]> wrote:
>
>
> * Brian Gerst <[email protected]> wrote:
>
> > > 17 files changed, 104 insertions(+), 107 deletions(-)
> >
> > This series would be better if you added the x86_task_fpu() helper in
> > an initial patch without any other changes. That would make the
> > actual changes more visible with less code churn.
>
> Makes sense - I've split out the patch below and adjusted the rest of the
> series. Is this what you had in mind?
>
> Note that I also robustified the macro a bit:
>
> -# define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
> +# define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))
>
> Thanks,
>
> Ingo
>
> ========================>
> From: Ingo Molnar <[email protected]>
> Date: Thu, 6 Jun 2024 11:01:14 +0200
> Subject: [PATCH] x86/fpu: Introduce the x86_task_fpu() helper method
>
> The per-task FPU context/save area is allocated right
> next to task_struct() - introduce the x86_task_fpu()
> helper that calculates this explicitly from the
> task pointer.
>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> arch/x86/include/asm/processor.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 920b0beebd11..fb6f030f0692 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -507,6 +507,8 @@ struct thread_struct {
> struct fpu *fpu;
> };
>
> +#define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))
> +
> /*
> * X86 doesn't need any embedded-FPU-struct quirks:
> */

Since this should be the first patch in the series, It would be:

#define #define x86_task_fpu(task) (&(task)->thread.fpu)

along with converting the existing accesses to task->thread.fpu in one
patch with no other functional changes. Then you could change how the
fpu struct is allocated without touching every access site again.


Brian Gerst

2024-06-06 15:56:42

by Chang S. Bae

[permalink] [raw]
Subject: Re: [PATCH] x86/fpu: Fix stale comment in ex_handler_fprestore()

On 6/6/2024 2:30 AM, Ingo Molnar wrote:
>
> From: Ingo Molnar <[email protected]>
> Date: Thu, 6 Jun 2024 11:27:57 +0200
> Subject: [PATCH] x86/fpu: Fix stale comment in ex_handler_fprestore()
>
> The state -> fpstate rename of the fpu::fpstate field didn't
> get propagated to the comment describing ex_handler_fprestore(),
> fix it.
>
> Reported-by: Chang S. Bae <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>

Yeah, looks good to me. Feel free to add:

Reviewed-by Chang S. Bae <[email protected]>

Thanks,
Chang

2024-06-07 10:57:02

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 3/3, v4] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks


* Oleg Nesterov <[email protected]> wrote:

> On 06/06, Ingo Molnar wrote:
> >
> > I changed the debug check to test for PF_KTHREAD, and to return NULL:
> >
> > +#ifdef CONFIG_X86_DEBUG_FPU
> > +struct fpu *x86_task_fpu(struct task_struct *task)
> > +{
> > + if (WARN_ON_ONCE(task->flags & PF_KTHREAD))
> > + return NULL;
> > +
> > + return (void *)task + sizeof(*task);
> > +}
> > +#endif
>
> How many users enable CONFIG_X86_DEBUG_FPU?

Ubuntu does:

kepler:~/tip> grep X86_DEBUG_FPU /boot/config-6.*
/boot/config-6.8.0-35-generic:CONFIG_X86_DEBUG_FPU=y

Fedora doesn't:

kepler:~/s/tmp> grep X86_DEBUG_FPU ./lib/modules/6.9.0-64.fc41.x86_64/config
# CONFIG_X86_DEBUG_FPU is not set

So it's a bit of a hit and miss, but at least one major distribution does,
which is all that we need really.

> [...] Perhaps it makes sense to check PF_KTHREAD unconditionally for the
> start, them add if (IS_ENABLED(X86_DEBUG_FPU)). But I won't insist.

I think Ubuntu ought to add enough debug coverage - and this check isn't
exactly cheap, given how it replaces a simple build time structure offset
with a full-blown function call ...

> For the record, I think we can later change this code to check
>
> task->flags & (PF_KTHREAD | PF_USER_WORKER)
>
> but I guess this needs some (simple) changes in the ptrace/coredump
> paths.

Sounds useful, would you be interested in cooking up a series on top of
tip:master (or tip:WIP.x86/fpu), if you are interested and have the time?

I'll send out one last iteration of this series today, otherwise the
changes seem close to final for an upstream merge via the tip:x86/fpu tree.

Thanks,

Ingo

2024-06-07 11:38:49

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/fpu: Introduce the x86_task_fpu() helper method


* Brian Gerst <[email protected]> wrote:

> On Thu, Jun 6, 2024 at 5:06 AM Ingo Molnar <[email protected]> wrote:
> >
> >
> > * Brian Gerst <[email protected]> wrote:
> >
> > > > 17 files changed, 104 insertions(+), 107 deletions(-)
> > >
> > > This series would be better if you added the x86_task_fpu() helper in
> > > an initial patch without any other changes. That would make the
> > > actual changes more visible with less code churn.
> >
> > Makes sense - I've split out the patch below and adjusted the rest of the
> > series. Is this what you had in mind?
> >
> > Note that I also robustified the macro a bit:
> >
> > -# define x86_task_fpu(task) ((struct fpu *)((void *)task + sizeof(*task)))
> > +# define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))
> >
> > Thanks,
> >
> > Ingo
> >
> > ========================>
> > From: Ingo Molnar <[email protected]>
> > Date: Thu, 6 Jun 2024 11:01:14 +0200
> > Subject: [PATCH] x86/fpu: Introduce the x86_task_fpu() helper method
> >
> > The per-task FPU context/save area is allocated right
> > next to task_struct() - introduce the x86_task_fpu()
> > helper that calculates this explicitly from the
> > task pointer.
> >
> > Signed-off-by: Ingo Molnar <[email protected]>
> > ---
> > arch/x86/include/asm/processor.h | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> > index 920b0beebd11..fb6f030f0692 100644
> > --- a/arch/x86/include/asm/processor.h
> > +++ b/arch/x86/include/asm/processor.h
> > @@ -507,6 +507,8 @@ struct thread_struct {
> > struct fpu *fpu;
> > };
> >
> > +#define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))
> > +
> > /*
> > * X86 doesn't need any embedded-FPU-struct quirks:
> > */
>
> Since this should be the first patch in the series, It would be:
>
> #define #define x86_task_fpu(task) (&(task)->thread.fpu)
>
> along with converting the existing accesses to task->thread.fpu in one
> patch with no other functional changes. Then you could change how the
> fpu struct is allocated without touching every access site again.

Yeah, you are right of course - I've restructured the series accordingly,
it's now indeed a lot easier to review internally now, but has the same end
result. Will post it later today.

Thanks,

Ingo

2024-06-08 06:55:48

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer


* Ingo Molnar <[email protected]> wrote:

>
> * Oleg Nesterov <[email protected]> wrote:
>
> > On 06/05, Ingo Molnar wrote:
> > >
> > > @@ -591,13 +591,11 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
> > > * This is safe because task_struct size is a multiple of cacheline size.
> > > */
> > > struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
> > > - struct fpu *src_fpu = current->thread.fpu;
> > > + struct fpu *src_fpu = x86_task_fpu(current);
> >
> > I think this patch can also change
> >
> > struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
> >
> > above to use x86_task_fpu(dst).
>
> Yeah, so I'd prefer to keep it open coded, because of the comment and the
> debug check makes a lot more sense if the pointer calculation is visible:

On a second thought I changed it to your suggested variant:

struct fpu *src_fpu = x86_task_fpu(current);
struct fpu *dst_fpu = x86_task_fpu(dst);

because you are right, it's in fact easier to read this way.

Thanks,

Ingo

2024-06-08 07:27:01

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer


* Ingo Molnar <[email protected]> wrote:

>
> * Ingo Molnar <[email protected]> wrote:
>
> >
> > * Oleg Nesterov <[email protected]> wrote:
> >
> > > On 06/05, Ingo Molnar wrote:
> > > >
> > > > @@ -591,13 +591,11 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
> > > > * This is safe because task_struct size is a multiple of cacheline size.
> > > > */
> > > > struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
> > > > - struct fpu *src_fpu = current->thread.fpu;
> > > > + struct fpu *src_fpu = x86_task_fpu(current);
> > >
> > > I think this patch can also change
> > >
> > > struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
> > >
> > > above to use x86_task_fpu(dst).
> >
> > Yeah, so I'd prefer to keep it open coded, because of the comment and the
> > debug check makes a lot more sense if the pointer calculation is visible:
>
> On a second thought I changed it to your suggested variant:
>
> struct fpu *src_fpu = x86_task_fpu(current);
> struct fpu *dst_fpu = x86_task_fpu(dst);
>
> because you are right, it's in fact easier to read this way.

On a third thought, while more readable, this doesn't work in practice with
the current scheme, because x86_task_fpu() gets called on kthreads in
fpu_clone(), which trips up the new debugging code.

We could resolve it by special-casing PF_KTHREAD here too, but that weakens
the whole readability argument. I'll leave it as-is for now.

Thanks,

Ingo

2024-06-08 10:12:35

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/fpu: Remove the thread::fpu pointer

On 06/08, Ingo Molnar wrote:
>
> On a third thought, while more readable, this doesn't work in practice with
> the current scheme, because x86_task_fpu() gets called on kthreads in
> fpu_clone(), which trips up the new debugging code.

Yes, yes, I even mentioned this in reply to 3/3.

> We could resolve it by special-casing PF_KTHREAD here too, but that weakens
> the whole readability argument. I'll leave it as-is for now.

Agreed.

Note that PF_KTHREAD | PF_USER_WORKER is already a special case in fpu_clone(),
see the "if (minimal)" check. We can probably do more changes later, for example
I don't even understand why do we need to initialize dst_fpu->fpstate->regs,
switch_fpu_return() is only called by arch_exit_to_user_mode_prepare().

Oleg.