2024-06-08 07:32:26

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 00/10, -v4] x86/fpu: Remove thread::fpu

This series is one of the dependencies of the fast-headers work,
which aims to reduce header complexity by removing <asm/processor.h>
from the <linux/sched.h> dependency chain, which headers are headers
are fat enough already even if we do not combine them.

To achieve that decoupling, one of the key steps is to not embedd any
C types from <asm/processor.h> into task_struct.

The only architecture that relies on that in a serious fashion is x86,
via the 'struct thread::fpu' variable size structure. The series below
attempts to resolve it by using a calculated fpu context area address
value via the x86_task_fpu() helper. The allocation layout of
task_struct + fpu-save-area doesn't change.

Changes in -v3:

- Restructure the series to be easier to review
- Extend the debug checks to all PF_KTHREAD tasks
- A few cleanups on top

The Git tree can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/fpu

Thanks,

Ingo

======================>

Ingo Molnar (9):
x86/fpu: Introduce the x86_task_fpu() helper method
x86/fpu: Convert task_struct::thread.fpu accesses to use x86_task_fpu()
x86/fpu: Make task_struct::thread constant size
x86/fpu: Remove the thread::fpu pointer
x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call
x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit
x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks
x86/fpu: Use 'fpstate' variable names consistently
x86/fpu: Fix stale comment in ex_handler_fprestore()

arch/x86/include/asm/fpu/api.h | 2 +-
arch/x86/include/asm/fpu/sched.h | 4 +-
arch/x86/include/asm/processor.h | 23 ++++++------
arch/x86/kernel/fpu/context.h | 4 +-
arch/x86/kernel/fpu/core.c | 80 +++++++++++++++++++++++-----------------
arch/x86/kernel/fpu/init.c | 23 +++++++-----
arch/x86/kernel/fpu/regset.c | 22 +++++------
arch/x86/kernel/fpu/signal.c | 18 ++++-----
arch/x86/kernel/fpu/xstate.c | 27 ++++++--------
arch/x86/kernel/fpu/xstate.h | 6 +--
arch/x86/kernel/process.c | 7 +---
arch/x86/kernel/signal.c | 6 +--
arch/x86/kernel/traps.c | 2 +-
arch/x86/math-emu/fpu_aux.c | 2 +-
arch/x86/math-emu/fpu_entry.c | 4 +-
arch/x86/math-emu/fpu_system.h | 2 +-
arch/x86/mm/extable.c | 2 +-
include/linux/sched.h | 13 ++-----
18 files changed, 126 insertions(+), 121 deletions(-)


2024-06-08 07:32:30

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 1/9] x86/fpu: Introduce the x86_task_fpu() helper method

The per-task FPU context/save area is allocated right
next to task_struct, currently in a variable-size
array via task_struct::thread.fpu[], but we plan to
fully hide it from the C type scope.

Introduce the x86_task_fpu() accessor that gets to the
FPU context pointer explicitly from the task pointer.

Right now this is a simple (task)->thread.fpu wrapper.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/processor.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index cb4f6c513c48..35aa8f652964 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -511,6 +511,8 @@ struct thread_struct {
*/
};

+#define x86_task_fpu(task) (&(task)->thread.fpu)
+
extern void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size);

static inline void arch_thread_struct_whitelist(unsigned long *offset,
--
2.43.0


2024-06-08 07:32:48

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 3/9] x86/fpu: Make task_struct::thread constant size

Turn thread.fpu into a pointer. Since most FPU code internals work by passing
around the FPU pointer already, the code generation impact is small.

This allows us to remove the old kludge of task_struct being variable size:

struct task_struct {

...
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
*/
randomized_struct_fields_end

/* CPU-specific state of this task: */
struct thread_struct thread;

/*
* WARNING: on x86, 'thread_struct' contains a variable-sized
* structure. It *MUST* be at the end of 'task_struct'.
*
* Do not put anything below here!
*/
};

... which creates a number of problems, such as requiring thread_struct to be
the last member of the struct - not allowing it to be struct-randomized, etc.

But the primary motivation is to allow the decoupling of task_struct from
hardware details (<asm/processor.h> in particular), and to eventually allow
the per-task infrastructure:

DECLARE_PER_TASK(type, name);
...
per_task(current, name) = val;

... which requires task_struct to be a constant size struct.

The fpu_thread_struct_whitelist() quirk to hardened usercopy can be removed,
now that the FPU structure is not embedded in the task struct anymore, which
reduces text footprint a bit.

Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Uros Bizjak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/processor.h | 20 +++++++++-----------
arch/x86/kernel/fpu/core.c | 23 ++++++++++++-----------
arch/x86/kernel/fpu/init.c | 19 ++++++++++++-------
arch/x86/kernel/process.c | 2 +-
include/linux/sched.h | 13 +++----------
5 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 35aa8f652964..64509c7f26c8 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -504,21 +504,19 @@ struct thread_struct {
#endif

/* Floating point and extended processor state */
- struct fpu fpu;
- /*
- * WARNING: 'fpu' is dynamically-sized. It *MUST* be at
- * the end.
- */
+ struct fpu *fpu;
};

-#define x86_task_fpu(task) (&(task)->thread.fpu)
+#define x86_task_fpu(task) ((task)->thread.fpu)

-extern void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size);
-
-static inline void arch_thread_struct_whitelist(unsigned long *offset,
- unsigned long *size)
+/*
+ * X86 doesn't need any embedded-FPU-struct quirks:
+ */
+static inline void
+arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
{
- fpu_thread_struct_whitelist(offset, size);
+ *offset = 0;
+ *size = 0;
}

static inline void
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index ca6745f8ac2a..f0c4367804b3 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -584,8 +584,19 @@ static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
unsigned long ssp)
{
+ /*
+ * We allocate the new FPU structure right after the end of the task struct.
+ * task allocation size already took this into account.
+ *
+ * This is safe because task_struct size is a multiple of cacheline size.
+ */
struct fpu *src_fpu = x86_task_fpu(current);
- struct fpu *dst_fpu = x86_task_fpu(dst);
+ struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
+
+ BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
+ BUG_ON(!src_fpu);
+
+ dst->thread.fpu = dst_fpu;

/* The new task's FPU state cannot be valid in the hardware. */
dst_fpu->last_cpu = -1;
@@ -654,16 +665,6 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
return 0;
}

-/*
- * Whitelist the FPU register state embedded into task_struct for hardened
- * usercopy.
- */
-void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
-{
- *offset = offsetof(struct thread_struct, fpu.__fpstate.regs);
- *size = fpu_kernel_cfg.default_size;
-}
-
/*
* Drops current FPU state: deactivates the fpregs and
* the fpstate. NOTE: it still leaves previous contents
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index ad5cb2943d37..4e8d37b5a90b 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -71,8 +71,17 @@ static bool __init fpu__probe_without_cpuid(void)
return fsw == 0 && (fcw & 0x103f) == 0x003f;
}

+static struct fpu x86_init_fpu __read_mostly;
+
static void __init fpu__init_system_early_generic(void)
{
+ int this_cpu = smp_processor_id();
+
+ fpstate_reset(&x86_init_fpu);
+ current->thread.fpu = &x86_init_fpu;
+ per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
+ x86_init_fpu.last_cpu = this_cpu;
+
if (!boot_cpu_has(X86_FEATURE_CPUID) &&
!test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
if (fpu__probe_without_cpuid())
@@ -150,6 +159,8 @@ static void __init fpu__init_task_struct_size(void)
{
int task_size = sizeof(struct task_struct);

+ task_size += sizeof(struct fpu);
+
/*
* Subtract off the static size of the register state.
* It potentially has a bunch of padding.
@@ -164,14 +175,9 @@ static void __init fpu__init_task_struct_size(void)

/*
* We dynamically size 'struct fpu', so we require that
- * it be at the end of 'thread_struct' and that
- * 'thread_struct' be at the end of 'task_struct'. If
- * you hit a compile error here, check the structure to
- * see if something got added to the end.
+ * 'state' be at the end of 'it:
*/
CHECK_MEMBER_AT_END_OF(struct fpu, __fpstate);
- CHECK_MEMBER_AT_END_OF(struct thread_struct, fpu);
- CHECK_MEMBER_AT_END_OF(struct task_struct, thread);

arch_task_struct_size = task_size;
}
@@ -213,7 +219,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
*/
void __init fpu__init_system(void)
{
- fpstate_reset(x86_task_fpu(current));
fpu__init_system_early_generic();

/*
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index adfeefd6375a..5bb73bc0e31a 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -97,7 +97,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
dst->thread.vm86 = NULL;
#endif
/* Drop the copied pointer to current's fpstate */
- x86_task_fpu(dst)->fpstate = NULL;
+ dst->thread.fpu = NULL;

return 0;
}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 61591ac6eab6..215a7380e41c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1554,21 +1554,14 @@ struct task_struct {
struct user_event_mm *user_event_mm;
#endif

+ /* CPU-specific state of this task: */
+ struct thread_struct thread;
+
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
*/
randomized_struct_fields_end
-
- /* CPU-specific state of this task: */
- struct thread_struct thread;
-
- /*
- * WARNING: on x86, 'thread_struct' contains a variable-sized
- * structure. It *MUST* be at the end of 'task_struct'.
- *
- * Do not put anything below here!
- */
};

#define TASK_REPORT_IDLE (TASK_REPORT + 1)
--
2.43.0


2024-06-08 07:32:54

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 4/9] x86/fpu: Remove the thread::fpu pointer

As suggested by Oleg, remove the thread::fpu pointer, as we can
calculate it via x86_task_fpu() at compile-time.

This improves code generation a bit:

kepler:~/tip> size vmlinux.before vmlinux.after
text data bss dec hex filename
26475405 10435342 1740804 38651551 24dc69f vmlinux.before
26475339 10959630 1216516 38651485 24dc65d vmlinux.after

Suggested-by: Oleg Nesterov <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Uros Bizjak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/processor.h | 5 +----
arch/x86/kernel/fpu/core.c | 4 +---
arch/x86/kernel/fpu/init.c | 1 -
arch/x86/kernel/process.c | 2 --
arch/x86/kernel/vmlinux.lds.S | 4 ++++
5 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 64509c7f26c8..3de609aad0af 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -502,12 +502,9 @@ struct thread_struct {

struct thread_shstk shstk;
#endif
-
- /* Floating point and extended processor state */
- struct fpu *fpu;
};

-#define x86_task_fpu(task) ((task)->thread.fpu)
+#define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))

/*
* X86 doesn't need any embedded-FPU-struct quirks:
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index f0c4367804b3..167a9c7ed6d3 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -591,13 +591,11 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
* This is safe because task_struct size is a multiple of cacheline size.
*/
struct fpu *src_fpu = x86_task_fpu(current);
- struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
+ struct fpu *dst_fpu = x86_task_fpu(dst);

BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
BUG_ON(!src_fpu);

- dst->thread.fpu = dst_fpu;
-
/* The new task's FPU state cannot be valid in the hardware. */
dst_fpu->last_cpu = -1;

diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 4e8d37b5a90b..794682b52373 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -78,7 +78,6 @@ static void __init fpu__init_system_early_generic(void)
int this_cpu = smp_processor_id();

fpstate_reset(&x86_init_fpu);
- current->thread.fpu = &x86_init_fpu;
per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
x86_init_fpu.last_cpu = this_cpu;

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 5bb73bc0e31a..4184c085627e 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -96,8 +96,6 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
#ifdef CONFIG_VM86
dst->thread.vm86 = NULL;
#endif
- /* Drop the copied pointer to current's fpstate */
- dst->thread.fpu = NULL;

return 0;
}
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 3509afc6a672..226244a894da 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -170,6 +170,10 @@ SECTIONS
/* equivalent to task_pt_regs(&init_task) */
__top_init_kernel_stack = __end_init_stack - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE;

+ __x86_init_fpu_begin = .;
+ . = __x86_init_fpu_begin + 128*PAGE_SIZE;
+ __x86_init_fpu_end = .;
+
#ifdef CONFIG_X86_32
/* 32 bit has nosave before _edata */
NOSAVE_DATA
--
2.43.0


2024-06-08 07:32:55

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 2/9] x86/fpu: Convert task_struct::thread.fpu accesses to use x86_task_fpu()

This will make the removal of the task_struct::thread.fpu array
easier.

No change in functionality - code generated before and after this
commit is identical on x86-defconfig:

kepler:~/tip> diff -up vmlinux.before.asm vmlinux.after.asm
kepler:~/tip>

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/sched.h | 2 +-
arch/x86/kernel/fpu/context.h | 4 ++--
arch/x86/kernel/fpu/core.c | 30 +++++++++++++++---------------
arch/x86/kernel/fpu/init.c | 8 ++++----
arch/x86/kernel/fpu/regset.c | 22 +++++++++++-----------
arch/x86/kernel/fpu/signal.c | 18 +++++++++---------
arch/x86/kernel/fpu/xstate.c | 22 +++++++++++-----------
arch/x86/kernel/fpu/xstate.h | 6 +++---
arch/x86/kernel/process.c | 6 +++---
arch/x86/kernel/signal.c | 6 +++---
arch/x86/kernel/traps.c | 2 +-
arch/x86/math-emu/fpu_aux.c | 2 +-
arch/x86/math-emu/fpu_entry.c | 4 ++--
arch/x86/math-emu/fpu_system.h | 2 +-
arch/x86/mm/extable.c | 2 +-
15 files changed, 68 insertions(+), 68 deletions(-)

diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index c485f1944c5f..1feaa68b7567 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -41,7 +41,7 @@ static inline void switch_fpu_prepare(struct task_struct *old, int cpu)
{
if (cpu_feature_enabled(X86_FEATURE_FPU) &&
!(old->flags & (PF_KTHREAD | PF_USER_WORKER))) {
- struct fpu *old_fpu = &old->thread.fpu;
+ struct fpu *old_fpu = x86_task_fpu(old);

save_fpregs_to_fpstate(old_fpu);
/*
diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index f6d856bd50bc..10d0a720659c 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -53,7 +53,7 @@ static inline void fpregs_activate(struct fpu *fpu)
/* Internal helper for switch_fpu_return() and signal frame setup */
static inline void fpregs_restore_userregs(void)
{
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
int cpu = smp_processor_id();

if (WARN_ON_ONCE(current->flags & (PF_KTHREAD | PF_USER_WORKER)))
@@ -67,7 +67,7 @@ static inline void fpregs_restore_userregs(void)
* If PKRU is enabled, then the PKRU value is already
* correct because it was either set in switch_to() or in
* flush_thread(). So it is excluded because it might be
- * not up to date in current->thread.fpu.xsave state.
+ * not up to date in current->thread.fpu->xsave state.
*
* XFD state is handled in restore_fpregs_from_fpstate().
*/
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1209c7aebb21..ca6745f8ac2a 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -204,7 +204,7 @@ static void fpu_init_guest_permissions(struct fpu_guest *gfpu)
return;

spin_lock_irq(&current->sighand->siglock);
- fpuperm = &current->group_leader->thread.fpu.guest_perm;
+ fpuperm = &x86_task_fpu(current->group_leader)->guest_perm;
perm = fpuperm->__state_perm;

/* First fpstate allocation locks down permissions. */
@@ -316,7 +316,7 @@ EXPORT_SYMBOL_GPL(fpu_update_guest_xfd);
*/
void fpu_sync_guest_vmexit_xfd_state(void)
{
- struct fpstate *fps = current->thread.fpu.fpstate;
+ struct fpstate *fps = x86_task_fpu(current)->fpstate;

lockdep_assert_irqs_disabled();
if (fpu_state_size_dynamic()) {
@@ -330,7 +330,7 @@ EXPORT_SYMBOL_GPL(fpu_sync_guest_vmexit_xfd_state);
int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
{
struct fpstate *guest_fps = guest_fpu->fpstate;
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
struct fpstate *cur_fps = fpu->fpstate;

fpregs_lock();
@@ -430,7 +430,7 @@ void kernel_fpu_begin_mask(unsigned int kfpu_mask)
if (!(current->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
!test_thread_flag(TIF_NEED_FPU_LOAD)) {
set_thread_flag(TIF_NEED_FPU_LOAD);
- save_fpregs_to_fpstate(&current->thread.fpu);
+ save_fpregs_to_fpstate(x86_task_fpu(current));
}
__cpu_invalidate_fpregs_state();

@@ -458,7 +458,7 @@ EXPORT_SYMBOL_GPL(kernel_fpu_end);
*/
void fpu_sync_fpstate(struct fpu *fpu)
{
- WARN_ON_FPU(fpu != &current->thread.fpu);
+ WARN_ON_FPU(fpu != x86_task_fpu(current));

fpregs_lock();
trace_x86_fpu_before_save(fpu);
@@ -543,7 +543,7 @@ void fpstate_reset(struct fpu *fpu)
static inline void fpu_inherit_perms(struct fpu *dst_fpu)
{
if (fpu_state_size_dynamic()) {
- struct fpu *src_fpu = &current->group_leader->thread.fpu;
+ struct fpu *src_fpu = x86_task_fpu(current->group_leader);

spin_lock_irq(&current->sighand->siglock);
/* Fork also inherits the permissions of the parent */
@@ -563,7 +563,7 @@ static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
if (!ssp)
return 0;

- xstate = get_xsave_addr(&dst->thread.fpu.fpstate->regs.xsave,
+ xstate = get_xsave_addr(&x86_task_fpu(dst)->fpstate->regs.xsave,
XFEATURE_CET_USER);

/*
@@ -584,8 +584,8 @@ static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
unsigned long ssp)
{
- struct fpu *src_fpu = &current->thread.fpu;
- struct fpu *dst_fpu = &dst->thread.fpu;
+ struct fpu *src_fpu = x86_task_fpu(current);
+ struct fpu *dst_fpu = x86_task_fpu(dst);

/* The new task's FPU state cannot be valid in the hardware. */
dst_fpu->last_cpu = -1;
@@ -677,7 +677,7 @@ void fpu__drop(struct fpu *fpu)
{
preempt_disable();

- if (fpu == &current->thread.fpu) {
+ if (fpu == x86_task_fpu(current)) {
/* Ignore delayed exceptions from user space */
asm volatile("1: fwait\n"
"2:\n"
@@ -711,7 +711,7 @@ static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
*/
static void fpu_reset_fpregs(void)
{
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);

fpregs_lock();
__fpu_invalidate_fpregs_state(fpu);
@@ -740,7 +740,7 @@ static void fpu_reset_fpregs(void)
*/
void fpu__clear_user_states(struct fpu *fpu)
{
- WARN_ON_FPU(fpu != &current->thread.fpu);
+ WARN_ON_FPU(fpu != x86_task_fpu(current));

fpregs_lock();
if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
@@ -773,7 +773,7 @@ void fpu__clear_user_states(struct fpu *fpu)

void fpu_flush_thread(void)
{
- fpstate_reset(&current->thread.fpu);
+ fpstate_reset(x86_task_fpu(current));
fpu_reset_fpregs();
}
/*
@@ -814,7 +814,7 @@ void fpregs_lock_and_load(void)
*/
void fpregs_assert_state_consistent(void)
{
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);

if (test_thread_flag(TIF_NEED_FPU_LOAD))
return;
@@ -826,7 +826,7 @@ EXPORT_SYMBOL_GPL(fpregs_assert_state_consistent);

void fpregs_mark_activate(void)
{
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);

fpregs_activate(fpu);
fpu->last_cpu = smp_processor_id();
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 998a08f17e33..ad5cb2943d37 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -38,7 +38,7 @@ static void fpu__init_cpu_generic(void)
/* Flush out any pending x87 state: */
#ifdef CONFIG_MATH_EMULATION
if (!boot_cpu_has(X86_FEATURE_FPU))
- fpstate_init_soft(&current->thread.fpu.fpstate->regs.soft);
+ fpstate_init_soft(&x86_task_fpu(current)->fpstate->regs.soft);
else
#endif
asm volatile ("fninit");
@@ -154,7 +154,7 @@ static void __init fpu__init_task_struct_size(void)
* Subtract off the static size of the register state.
* It potentially has a bunch of padding.
*/
- task_size -= sizeof(current->thread.fpu.__fpstate.regs);
+ task_size -= sizeof(union fpregs_state);

/*
* Add back the dynamically-calculated register state
@@ -204,7 +204,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
fpu_kernel_cfg.default_size = size;
fpu_user_cfg.max_size = size;
fpu_user_cfg.default_size = size;
- fpstate_reset(&current->thread.fpu);
+ fpstate_reset(x86_task_fpu(current));
}

/*
@@ -213,7 +213,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
*/
void __init fpu__init_system(void)
{
- fpstate_reset(&current->thread.fpu);
+ fpstate_reset(x86_task_fpu(current));
fpu__init_system_early_generic();

/*
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 6bc1eb2a21bd..19fd159217f7 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -45,7 +45,7 @@ int regset_xregset_fpregs_active(struct task_struct *target, const struct user_r
*/
static void sync_fpstate(struct fpu *fpu)
{
- if (fpu == &current->thread.fpu)
+ if (fpu == x86_task_fpu(current))
fpu_sync_fpstate(fpu);
}

@@ -63,7 +63,7 @@ static void fpu_force_restore(struct fpu *fpu)
* Only stopped child tasks can be used to modify the FPU
* state in the fpstate buffer:
*/
- WARN_ON_FPU(fpu == &current->thread.fpu);
+ WARN_ON_FPU(fpu == x86_task_fpu(current));

__fpu_invalidate_fpregs_state(fpu);
}
@@ -71,7 +71,7 @@ static void fpu_force_restore(struct fpu *fpu)
int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);

if (!cpu_feature_enabled(X86_FEATURE_FXSR))
return -ENODEV;
@@ -91,7 +91,7 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct fxregs_state newstate;
int ret;

@@ -133,7 +133,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
if (!cpu_feature_enabled(X86_FEATURE_XSAVE))
return -ENODEV;

- sync_fpstate(&target->thread.fpu);
+ sync_fpstate(x86_task_fpu(target));

copy_xstate_to_uabi_buf(to, target, XSTATE_COPY_XSAVE);
return 0;
@@ -143,7 +143,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct xregs_state *tmpbuf = NULL;
int ret;

@@ -187,7 +187,7 @@ int ssp_active(struct task_struct *target, const struct user_regset *regset)
int ssp_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct cet_user_state *cetregs;

if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
@@ -213,7 +213,7 @@ int ssp_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct xregs_state *xsave = &fpu->fpstate->regs.xsave;
struct cet_user_state *cetregs;
unsigned long user_ssp;
@@ -367,7 +367,7 @@ static void __convert_from_fxsr(struct user_i387_ia32_struct *env,
void
convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
{
- __convert_from_fxsr(env, tsk, &tsk->thread.fpu.fpstate->regs.fxsave);
+ __convert_from_fxsr(env, tsk, &x86_task_fpu(tsk)->fpstate->regs.fxsave);
}

void convert_to_fxsr(struct fxregs_state *fxsave,
@@ -400,7 +400,7 @@ void convert_to_fxsr(struct fxregs_state *fxsave,
int fpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct user_i387_ia32_struct env;
struct fxregs_state fxsave, *fx;

@@ -432,7 +432,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct fpu *fpu = &target->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(target);
struct user_i387_ia32_struct env;
int ret;

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 247f2225aa9f..b203bf4617fc 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -38,7 +38,7 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
/* Check for the first magic field and other error scenarios. */
if (fx_sw->magic1 != FP_XSTATE_MAGIC1 ||
fx_sw->xstate_size < min_xstate_size ||
- fx_sw->xstate_size > current->thread.fpu.fpstate->user_size ||
+ fx_sw->xstate_size > x86_task_fpu(current)->fpstate->user_size ||
fx_sw->xstate_size > fx_sw->extended_size)
goto setfx;

@@ -54,7 +54,7 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
if (likely(magic2 == FP_XSTATE_MAGIC2))
return true;
setfx:
- trace_x86_fpu_xstate_check_failed(&current->thread.fpu);
+ trace_x86_fpu_xstate_check_failed(x86_task_fpu(current));

/* Set the parameters for fx only state */
fx_sw->magic1 = 0;
@@ -69,13 +69,13 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
static inline bool save_fsave_header(struct task_struct *tsk, void __user *buf)
{
if (use_fxsr()) {
- struct xregs_state *xsave = &tsk->thread.fpu.fpstate->regs.xsave;
+ struct xregs_state *xsave = &x86_task_fpu(tsk)->fpstate->regs.xsave;
struct user_i387_ia32_struct env;
struct _fpstate_32 __user *fp = buf;

fpregs_lock();
if (!test_thread_flag(TIF_NEED_FPU_LOAD))
- fxsave(&tsk->thread.fpu.fpstate->regs.fxsave);
+ fxsave(&x86_task_fpu(tsk)->fpstate->regs.fxsave);
fpregs_unlock();

convert_from_fxsr(&env, tsk);
@@ -188,7 +188,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
{
struct task_struct *tsk = current;
- struct fpstate *fpstate = tsk->thread.fpu.fpstate;
+ struct fpstate *fpstate = x86_task_fpu(tsk)->fpstate;
bool ia32_fxstate = (buf != buf_fx);
int ret;

@@ -276,7 +276,7 @@ static int __restore_fpregs_from_user(void __user *buf, u64 ufeatures,
*/
static bool restore_fpregs_from_user(void __user *buf, u64 xrestore, bool fx_only)
{
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
int ret;

/* Restore enabled features only. */
@@ -336,7 +336,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
bool ia32_fxstate)
{
struct task_struct *tsk = current;
- struct fpu *fpu = &tsk->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(tsk);
struct user_i387_ia32_struct env;
bool success, fx_only = false;
union fpregs_state *fpregs;
@@ -456,7 +456,7 @@ static inline unsigned int xstate_sigframe_size(struct fpstate *fpstate)
*/
bool fpu__restore_sig(void __user *buf, int ia32_frame)
{
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
void __user *buf_fx = buf;
bool ia32_fxstate = false;
bool success = false;
@@ -503,7 +503,7 @@ unsigned long
fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
unsigned long *buf_fx, unsigned long *size)
{
- unsigned long frame_size = xstate_sigframe_size(current->thread.fpu.fpstate);
+ unsigned long frame_size = xstate_sigframe_size(x86_task_fpu(current)->fpstate);

*buf_fx = sp = round_down(sp - frame_size, 64);
if (ia32_frame && use_fxsr()) {
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c5a026fee5e0..90b11671e943 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -735,7 +735,7 @@ static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
*/
init_fpstate.xfd = 0;

- fpstate_reset(&current->thread.fpu);
+ fpstate_reset(x86_task_fpu(current));
}

/*
@@ -845,7 +845,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
goto out_disable;

/* Reset the state for the current task */
- fpstate_reset(&current->thread.fpu);
+ fpstate_reset(x86_task_fpu(current));

/*
* Update info used for ptrace frames; use standard-format size and no
@@ -919,7 +919,7 @@ void fpu__resume_cpu(void)
}

if (fpu_state_size_dynamic())
- wrmsrl(MSR_IA32_XFD, current->thread.fpu.fpstate->xfd);
+ wrmsrl(MSR_IA32_XFD, x86_task_fpu(current)->fpstate->xfd);
}

/*
@@ -1188,8 +1188,8 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
enum xstate_copy_mode copy_mode)
{
- __copy_xstate_to_uabi_buf(to, tsk->thread.fpu.fpstate,
- tsk->thread.fpu.fpstate->user_xfeatures,
+ __copy_xstate_to_uabi_buf(to, x86_task_fpu(tsk)->fpstate,
+ x86_task_fpu(tsk)->fpstate->user_xfeatures,
tsk->thread.pkru, copy_mode);
}

@@ -1329,7 +1329,7 @@ int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u
int copy_sigframe_from_user_to_xstate(struct task_struct *tsk,
const void __user *ubuf)
{
- return copy_uabi_to_xstate(tsk->thread.fpu.fpstate, NULL, ubuf, &tsk->thread.pkru);
+ return copy_uabi_to_xstate(x86_task_fpu(tsk)->fpstate, NULL, ubuf, &tsk->thread.pkru);
}

static bool validate_independent_components(u64 mask)
@@ -1423,7 +1423,7 @@ static bool xstate_op_valid(struct fpstate *fpstate, u64 mask, bool rstor)
* The XFD MSR does not match fpstate->xfd. That's invalid when
* the passed in fpstate is current's fpstate.
*/
- if (fpstate->xfd == current->thread.fpu.fpstate->xfd)
+ if (fpstate->xfd == x86_task_fpu(current)->fpstate->xfd)
return false;

/*
@@ -1500,7 +1500,7 @@ void fpstate_free(struct fpu *fpu)
static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
unsigned int usize, struct fpu_guest *guest_fpu)
{
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);
struct fpstate *curfps, *newfps = NULL;
unsigned int fpsize;
bool in_use;
@@ -1593,7 +1593,7 @@ static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
* AVX512.
*/
bool compacted = cpu_feature_enabled(X86_FEATURE_XCOMPACTED);
- struct fpu *fpu = &current->group_leader->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current->group_leader);
struct fpu_state_perm *perm;
unsigned int ksize, usize;
u64 mask;
@@ -1696,7 +1696,7 @@ int __xfd_enable_feature(u64 xfd_err, struct fpu_guest *guest_fpu)
return -EPERM;
}

- fpu = &current->group_leader->thread.fpu;
+ fpu = x86_task_fpu(current->group_leader);
perm = guest_fpu ? &fpu->guest_perm : &fpu->perm;
ksize = perm->__state_size;
usize = perm->__user_state_size;
@@ -1801,7 +1801,7 @@ long fpu_xstate_prctl(int option, unsigned long arg2)
*/
static void avx512_status(struct seq_file *m, struct task_struct *task)
{
- unsigned long timestamp = READ_ONCE(task->thread.fpu.avx512_timestamp);
+ unsigned long timestamp = READ_ONCE(x86_task_fpu(task)->avx512_timestamp);
long delta;

if (!timestamp) {
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 05df04f39628..391fa3c69e5e 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -22,7 +22,7 @@ static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)

static inline u64 xstate_get_group_perm(bool guest)
{
- struct fpu *fpu = &current->group_leader->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current->group_leader);
struct fpu_state_perm *perm;

/* Pairs with WRITE_ONCE() in xstate_request_perm() */
@@ -265,7 +265,7 @@ static inline int xsave_to_user_sigframe(struct xregs_state __user *buf)
* internally, e.g. PKRU. That's user space ABI and also required
* to allow the signal handler to modify PKRU.
*/
- struct fpstate *fpstate = current->thread.fpu.fpstate;
+ struct fpstate *fpstate = x86_task_fpu(current)->fpstate;
u64 mask = fpstate->user_xfeatures;
u32 lmask;
u32 hmask;
@@ -296,7 +296,7 @@ static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64
u32 hmask = mask >> 32;
int err;

- xfd_validate_state(current->thread.fpu.fpstate, mask, true);
+ xfd_validate_state(x86_task_fpu(current)->fpstate, mask, true);

stac();
XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b8441147eb5e..adfeefd6375a 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -97,7 +97,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
dst->thread.vm86 = NULL;
#endif
/* Drop the copied pointer to current's fpstate */
- dst->thread.fpu.fpstate = NULL;
+ x86_task_fpu(dst)->fpstate = NULL;

return 0;
}
@@ -106,7 +106,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
void arch_release_task_struct(struct task_struct *tsk)
{
if (fpu_state_size_dynamic())
- fpstate_free(&tsk->thread.fpu);
+ fpstate_free(x86_task_fpu(tsk));
}
#endif

@@ -116,7 +116,7 @@ void arch_release_task_struct(struct task_struct *tsk)
void exit_thread(struct task_struct *tsk)
{
struct thread_struct *t = &tsk->thread;
- struct fpu *fpu = &t->fpu;
+ struct fpu *fpu = x86_task_fpu(tsk);

if (test_thread_flag(TIF_IO_BITMAP))
io_bitmap_exit(tsk);
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 31b6f5dddfc2..d7906f5979a4 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -228,7 +228,7 @@ static void
handle_signal(struct ksignal *ksig, struct pt_regs *regs)
{
bool stepping, failed;
- struct fpu *fpu = &current->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(current);

if (v8086_mode(regs))
save_v86_state((struct kernel_vm86_regs *) regs, VM86_SIGNAL);
@@ -396,14 +396,14 @@ bool sigaltstack_size_valid(size_t ss_size)
if (!fpu_state_size_dynamic() && !strict_sigaltstack_size)
return true;

- fsize += current->group_leader->thread.fpu.perm.__user_state_size;
+ fsize += x86_task_fpu(current->group_leader)->perm.__user_state_size;
if (likely(ss_size > fsize))
return true;

if (strict_sigaltstack_size)
return ss_size > fsize;

- mask = current->group_leader->thread.fpu.perm.__state_perm;
+ mask = x86_task_fpu(current->group_leader)->perm.__state_perm;
if (mask & XFEATURE_MASK_USER_DYNAMIC)
return ss_size > fsize;

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4fa0b17e5043..b19fb494c57c 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1148,7 +1148,7 @@ DEFINE_IDTENTRY_RAW(exc_debug)
static void math_error(struct pt_regs *regs, int trapnr)
{
struct task_struct *task = current;
- struct fpu *fpu = &task->thread.fpu;
+ struct fpu *fpu = x86_task_fpu(task);
int si_code;
char *str = (trapnr == X86_TRAP_MF) ? "fpu exception" :
"simd exception";
diff --git a/arch/x86/math-emu/fpu_aux.c b/arch/x86/math-emu/fpu_aux.c
index d62662bdd460..5f253ae406b6 100644
--- a/arch/x86/math-emu/fpu_aux.c
+++ b/arch/x86/math-emu/fpu_aux.c
@@ -53,7 +53,7 @@ void fpstate_init_soft(struct swregs_state *soft)

void finit(void)
{
- fpstate_init_soft(&current->thread.fpu.fpstate->regs.soft);
+ fpstate_init_soft(&x86_task_fpu(current)->fpstate->regs.soft);
}

/*
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 91c52ead1226..5034df617740 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -641,7 +641,7 @@ int fpregs_soft_set(struct task_struct *target,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct swregs_state *s387 = &target->thread.fpu.fpstate->regs.soft;
+ struct swregs_state *s387 = &x86_task_fpu(target)->fpstate->regs.soft;
void *space = s387->st_space;
int ret;
int offset, other, i, tags, regnr, tag, newtop;
@@ -692,7 +692,7 @@ int fpregs_soft_get(struct task_struct *target,
const struct user_regset *regset,
struct membuf to)
{
- struct swregs_state *s387 = &target->thread.fpu.fpstate->regs.soft;
+ struct swregs_state *s387 = &x86_task_fpu(target)->fpstate->regs.soft;
const void *space = s387->st_space;
int offset = (S387->ftop & 7) * 10, other = 80 - offset;

diff --git a/arch/x86/math-emu/fpu_system.h b/arch/x86/math-emu/fpu_system.h
index eec3e4805c75..5e238e930fe3 100644
--- a/arch/x86/math-emu/fpu_system.h
+++ b/arch/x86/math-emu/fpu_system.h
@@ -73,7 +73,7 @@ static inline bool seg_writable(struct desc_struct *d)
return (d->type & SEG_TYPE_EXECUTE_MASK) == SEG_TYPE_WRITABLE;
}

-#define I387 (&current->thread.fpu.fpstate->regs)
+#define I387 (&x86_task_fpu(current)->fpstate->regs)
#define FPU_info (I387->soft.info)

#define FPU_CS (*(unsigned short *) &(FPU_info->regs->cs))
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 51986e8a9d35..1359ad75da3a 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -111,7 +111,7 @@ static bool ex_handler_sgx(const struct exception_table_entry *fixup,

/*
* Handler for when we fail to restore a task's FPU state. We should never get
- * here because the FPU state of a task using the FPU (task->thread.fpu.state)
+ * here because the FPU state of a task using the FPU (x86_task_fpu(task)->state)
* should always be valid. However, past bugs have allowed userspace to set
* reserved bits in the XSAVE area using PTRACE_SETREGSET or sys_rt_sigreturn().
* These caused XRSTOR to fail when switching to the task, leaking the FPU
--
2.43.0


2024-06-08 07:33:08

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 5/9] x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call

This encapsulates the fpu__drop() functionality better, and it
will also enable other changes that want to check a task for
PF_KTHREAD before calling x86_task_fpu().

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/sched.h | 2 +-
arch/x86/kernel/fpu/core.c | 4 +++-
arch/x86/kernel/process.c | 3 +--
3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index 1feaa68b7567..5fd12634bcc4 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -10,7 +10,7 @@
#include <asm/trace/fpu.h>

extern void save_fpregs_to_fpstate(struct fpu *fpu);
-extern void fpu__drop(struct fpu *fpu);
+extern void fpu__drop(struct task_struct *tsk);
extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
unsigned long shstk_addr);
extern void fpu_flush_thread(void);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 167a9c7ed6d3..c85667c0695d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -672,8 +672,10 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
* a state-restore is coming: either an explicit one,
* or a reschedule.
*/
-void fpu__drop(struct fpu *fpu)
+void fpu__drop(struct task_struct *tsk)
{
+ struct fpu *fpu = x86_task_fpu(tsk);
+
preempt_disable();

if (fpu == x86_task_fpu(current)) {
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 4184c085627e..0a24997c8cc6 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -114,7 +114,6 @@ void arch_release_task_struct(struct task_struct *tsk)
void exit_thread(struct task_struct *tsk)
{
struct thread_struct *t = &tsk->thread;
- struct fpu *fpu = x86_task_fpu(tsk);

if (test_thread_flag(TIF_IO_BITMAP))
io_bitmap_exit(tsk);
@@ -122,7 +121,7 @@ void exit_thread(struct task_struct *tsk)
free_vm86(t);

shstk_free(tsk);
- fpu__drop(fpu);
+ fpu__drop(tsk);
}

static int set_new_tls(struct task_struct *p, unsigned long tls)
--
2.43.0


2024-06-08 07:33:11

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 6/9] x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit

fpu__drop() calls x86_task_fpu() unconditionally, while the FPU context
area will not be present if it's the init task, and should not be in
use when it's some other type of kthread.

Return early for PF_KTHREAD tasks. The debug warning in x86_task_fpu()
will catch any kthreads attempting to use the FPU save area.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/core.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index c85667c0695d..52d5843c886c 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -674,7 +674,13 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
*/
void fpu__drop(struct task_struct *tsk)
{
- struct fpu *fpu = x86_task_fpu(tsk);
+ struct fpu *fpu;
+
+ /* PF_KTHREAD tasks do not use the FPU context area: */
+ if (tsk->flags & PF_KTHREAD)
+ return;
+
+ fpu = x86_task_fpu(tsk);

preempt_disable();

--
2.43.0


2024-06-08 07:33:43

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 8/9] x86/fpu: Use 'fpstate' variable names consistently

A few uses of 'fps' snuck in, which is rather confusing
(to me) as it suggests frames-per-second. ;-)

Rename them to the canonical 'fpstate' name.

No change in functionality.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/api.h | 2 +-
arch/x86/kernel/fpu/core.c | 14 +++++++-------
arch/x86/kernel/fpu/xstate.c | 4 ++--
3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index f86ad3335529..708ea13ab2b0 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -139,7 +139,7 @@ static inline void fpstate_free(struct fpu *fpu) { }
#endif

/* fpstate-related functions which are exported to KVM */
-extern void fpstate_clear_xstate_component(struct fpstate *fps, unsigned int xfeature);
+extern void fpstate_clear_xstate_component(struct fpstate *fpstate, unsigned int xfeature);

extern u64 xstate_get_guest_group_perm(void);

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index db608afa686f..6aef0116add4 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -266,16 +266,16 @@ EXPORT_SYMBOL_GPL(fpu_alloc_guest_fpstate);

void fpu_free_guest_fpstate(struct fpu_guest *gfpu)
{
- struct fpstate *fps = gfpu->fpstate;
+ struct fpstate *fpstate = gfpu->fpstate;

- if (!fps)
+ if (!fpstate)
return;

- if (WARN_ON_ONCE(!fps->is_valloc || !fps->is_guest || fps->in_use))
+ if (WARN_ON_ONCE(!fpstate->is_valloc || !fpstate->is_guest || fpstate->in_use))
return;

gfpu->fpstate = NULL;
- vfree(fps);
+ vfree(fpstate);
}
EXPORT_SYMBOL_GPL(fpu_free_guest_fpstate);

@@ -326,12 +326,12 @@ EXPORT_SYMBOL_GPL(fpu_update_guest_xfd);
*/
void fpu_sync_guest_vmexit_xfd_state(void)
{
- struct fpstate *fps = x86_task_fpu(current)->fpstate;
+ struct fpstate *fpstate = x86_task_fpu(current)->fpstate;

lockdep_assert_irqs_disabled();
if (fpu_state_size_dynamic()) {
- rdmsrl(MSR_IA32_XFD, fps->xfd);
- __this_cpu_write(xfd_state, fps->xfd);
+ rdmsrl(MSR_IA32_XFD, fpstate->xfd);
+ __this_cpu_write(xfd_state, fpstate->xfd);
}
}
EXPORT_SYMBOL_GPL(fpu_sync_guest_vmexit_xfd_state);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 1f37da22ddbe..68dea76b4afc 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1392,9 +1392,9 @@ void xrstors(struct xregs_state *xstate, u64 mask)
}

#if IS_ENABLED(CONFIG_KVM)
-void fpstate_clear_xstate_component(struct fpstate *fps, unsigned int xfeature)
+void fpstate_clear_xstate_component(struct fpstate *fpstate, unsigned int xfeature)
{
- void *addr = get_xsave_addr(&fps->regs.xsave, xfeature);
+ void *addr = get_xsave_addr(&fpstate->regs.xsave, xfeature);

if (addr)
memset(addr, 0, xstate_sizes[xfeature]);
--
2.43.0


2024-06-08 07:33:45

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 7/9] x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks

init_task's FPU state initialization was a bit of a hack:

__x86_init_fpu_begin = .;
. = __x86_init_fpu_begin + 128*PAGE_SIZE;
__x86_init_fpu_end = .;

But the init task isn't supposed to be using the FPU context
in any case, so remove the hack and add in some debug warnings.

As Linus noted in the discussion, the init task (and other
PF_KTHREAD tasks) *can* use the FPU via kernel_fpu_begin()/_end(),
but they don't need the context area because their FPU use is not
preemptible or reentrant, and they don't return to user-space.

Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Uros Bizjak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/processor.h | 6 +++++-
arch/x86/kernel/fpu/core.c | 15 +++++++++++----
arch/x86/kernel/fpu/init.c | 3 +--
arch/x86/kernel/fpu/xstate.c | 3 ---
arch/x86/kernel/vmlinux.lds.S | 4 ----
5 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 3de609aad0af..4fd3364dbc73 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -504,7 +504,11 @@ struct thread_struct {
#endif
};

-#define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))
+#ifdef CONFIG_X86_DEBUG_FPU
+extern struct fpu *x86_task_fpu(struct task_struct *task);
+#else
+# define x86_task_fpu(task) ((struct fpu *)((void *)(task) + sizeof(*(task))))
+#endif

/*
* X86 doesn't need any embedded-FPU-struct quirks:
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 52d5843c886c..db608afa686f 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -51,6 +51,16 @@ static DEFINE_PER_CPU(bool, in_kernel_fpu);
*/
DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);

+#ifdef CONFIG_X86_DEBUG_FPU
+struct fpu *x86_task_fpu(struct task_struct *task)
+{
+ if (WARN_ON_ONCE(task->flags & PF_KTHREAD))
+ return NULL;
+
+ return (void *)task + sizeof(*task);
+}
+#endif
+
/*
* Can we use the FPU in kernel mode with the
* whole "kernel_fpu_begin/end()" sequence?
@@ -590,11 +600,9 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
*
* This is safe because task_struct size is a multiple of cacheline size.
*/
- struct fpu *src_fpu = x86_task_fpu(current);
- struct fpu *dst_fpu = x86_task_fpu(dst);
+ struct fpu *dst_fpu = (void *)dst + sizeof(*dst);

BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
- BUG_ON(!src_fpu);

/* The new task's FPU state cannot be valid in the hardware. */
dst_fpu->last_cpu = -1;
@@ -657,7 +665,6 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
if (update_fpu_shstk(dst, ssp))
return 1;

- trace_x86_fpu_copy_src(src_fpu);
trace_x86_fpu_copy_dst(dst_fpu);

return 0;
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 794682b52373..53580e59e5db 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -38,7 +38,7 @@ static void fpu__init_cpu_generic(void)
/* Flush out any pending x87 state: */
#ifdef CONFIG_MATH_EMULATION
if (!boot_cpu_has(X86_FEATURE_FPU))
- fpstate_init_soft(&x86_task_fpu(current)->fpstate->regs.soft);
+ ;
else
#endif
asm volatile ("fninit");
@@ -209,7 +209,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
fpu_kernel_cfg.default_size = size;
fpu_user_cfg.max_size = size;
fpu_user_cfg.default_size = size;
- fpstate_reset(x86_task_fpu(current));
}

/*
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 90b11671e943..1f37da22ddbe 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -844,9 +844,6 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
if (err)
goto out_disable;

- /* Reset the state for the current task */
- fpstate_reset(x86_task_fpu(current));
-
/*
* Update info used for ptrace frames; use standard-format size and no
* supervisor xstates:
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 226244a894da..3509afc6a672 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -170,10 +170,6 @@ SECTIONS
/* equivalent to task_pt_regs(&init_task) */
__top_init_kernel_stack = __end_init_stack - TOP_OF_KERNEL_STACK_PADDING - PTREGS_SIZE;

- __x86_init_fpu_begin = .;
- . = __x86_init_fpu_begin + 128*PAGE_SIZE;
- __x86_init_fpu_end = .;
-
#ifdef CONFIG_X86_32
/* 32 bit has nosave before _edata */
NOSAVE_DATA
--
2.43.0


2024-06-08 07:33:52

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 9/9] x86/fpu: Fix stale comment in ex_handler_fprestore()

The state -> fpstate rename of the fpu::fpstate field didn't
get propagated to the comment describing ex_handler_fprestore(),
fix it.

Reported-by: Chang S. Bae <[email protected]>
Reviewed-by Chang S. Bae <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/mm/extable.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 1359ad75da3a..bf8dab18be97 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -111,7 +111,7 @@ static bool ex_handler_sgx(const struct exception_table_entry *fixup,

/*
* Handler for when we fail to restore a task's FPU state. We should never get
- * here because the FPU state of a task using the FPU (x86_task_fpu(task)->state)
+ * here because the FPU state of a task using the FPU (struct fpu::fpstate)
* should always be valid. However, past bugs have allowed userspace to set
* reserved bits in the XSAVE area using PTRACE_SETREGSET or sys_rt_sigreturn().
* These caused XRSTOR to fail when switching to the task, leaking the FPU
--
2.43.0


2024-06-10 10:25:25

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 6/9] x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD tasks during exit

The whole series looks good to me, and afaics 7/9 allows more
cleanups / improvements.

But let me ask a stupid question about fpu__drop(), I know nothing
about fpu asm.

fpu__drop() does

/* Ignore delayed exceptions from user space */
asm volatile("1: fwait\n"

and this comment predates the git history. Could someone explain
why exactly the exiting user-space thread needs fwait ?

And if it is needed, suppose that a kernel thread exits right
after kernel_fpu_end(), can this lead to the delayed exception?


And otoh, perhaps fpu__drop() can set TIF_NEED_FPU_LOAD to avoid
switch_fpu_prepare()->save_fpregs_to_fpstate() on its path to the
final schedule?


On 06/08, Ingo Molnar wrote:
>
> void fpu__drop(struct task_struct *tsk)
> {
> - struct fpu *fpu = x86_task_fpu(tsk);
> + struct fpu *fpu;
> +
> + /* PF_KTHREAD tasks do not use the FPU context area: */
> + if (tsk->flags & PF_KTHREAD)
> + return;

I think it can already do

if (tsk->flags & (PF_KTHREAD | PF_USER_WORKER))
return;

This matches other similar checks. But I won't insist, and I
think all these checks need some cleanups anyway.

Oleg.


2024-06-10 21:14:01

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH 3/9] x86/fpu: Make task_struct::thread constant size

Hi Ingo,

On Sat, Jun 08, 2024 at 09:31:28AM +0200, Ingo Molnar wrote:
> Turn thread.fpu into a pointer. Since most FPU code internals work by passing
> around the FPU pointer already, the code generation impact is small.
>
> This allows us to remove the old kludge of task_struct being variable size:
>
> struct task_struct {
>
> ...
> /*
> * New fields for task_struct should be added above here, so that
> * they are included in the randomized portion of task_struct.
> */
> randomized_struct_fields_end
>
> /* CPU-specific state of this task: */
> struct thread_struct thread;
>
> /*
> * WARNING: on x86, 'thread_struct' contains a variable-sized
> * structure. It *MUST* be at the end of 'task_struct'.
> *
> * Do not put anything below here!
> */
> };
>
> ... which creates a number of problems, such as requiring thread_struct to be
> the last member of the struct - not allowing it to be struct-randomized, etc.
>
> But the primary motivation is to allow the decoupling of task_struct from
> hardware details (<asm/processor.h> in particular), and to eventually allow
> the per-task infrastructure:
>
> DECLARE_PER_TASK(type, name);
> ...
> per_task(current, name) = val;
>
> ... which requires task_struct to be a constant size struct.
>
> The fpu_thread_struct_whitelist() quirk to hardened usercopy can be removed,
> now that the FPU structure is not embedded in the task struct anymore, which
> reduces text footprint a bit.
>
> Signed-off-by: Ingo Molnar <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Fenghua Yu <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Uros Bizjak <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> ---
> arch/x86/include/asm/processor.h | 20 +++++++++-----------
> arch/x86/kernel/fpu/core.c | 23 ++++++++++++-----------
> arch/x86/kernel/fpu/init.c | 19 ++++++++++++-------
> arch/x86/kernel/process.c | 2 +-
> include/linux/sched.h | 13 +++----------
> 5 files changed, 37 insertions(+), 40 deletions(-)
>
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 35aa8f652964..64509c7f26c8 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -504,21 +504,19 @@ struct thread_struct {
> #endif
>
> /* Floating point and extended processor state */
> - struct fpu fpu;
> - /*
> - * WARNING: 'fpu' is dynamically-sized. It *MUST* be at
> - * the end.
> - */
> + struct fpu *fpu;
> };
>
> -#define x86_task_fpu(task) (&(task)->thread.fpu)
> +#define x86_task_fpu(task) ((task)->thread.fpu)
>
> -extern void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size);
> -
> -static inline void arch_thread_struct_whitelist(unsigned long *offset,
> - unsigned long *size)
> +/*
> + * X86 doesn't need any embedded-FPU-struct quirks:
> + */
> +static inline void
> +arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
> {
> - fpu_thread_struct_whitelist(offset, size);
> + *offset = 0;
> + *size = 0;
> }
>
> static inline void
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index ca6745f8ac2a..f0c4367804b3 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -584,8 +584,19 @@ static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
> int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
> unsigned long ssp)
> {
> + /*
> + * We allocate the new FPU structure right after the end of the task struct.
> + * task allocation size already took this into account.
> + *
> + * This is safe because task_struct size is a multiple of cacheline size.
> + */
> struct fpu *src_fpu = x86_task_fpu(current);
> - struct fpu *dst_fpu = x86_task_fpu(dst);
> + struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
> +
> + BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
> + BUG_ON(!src_fpu);
> +
> + dst->thread.fpu = dst_fpu;
>
> /* The new task's FPU state cannot be valid in the hardware. */
> dst_fpu->last_cpu = -1;
> @@ -654,16 +665,6 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
> return 0;
> }
>
> -/*
> - * Whitelist the FPU register state embedded into task_struct for hardened
> - * usercopy.
> - */
> -void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
> -{
> - *offset = offsetof(struct thread_struct, fpu.__fpstate.regs);
> - *size = fpu_kernel_cfg.default_size;
> -}
> -
> /*
> * Drops current FPU state: deactivates the fpregs and
> * the fpstate. NOTE: it still leaves previous contents
> diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
> index ad5cb2943d37..4e8d37b5a90b 100644
> --- a/arch/x86/kernel/fpu/init.c
> +++ b/arch/x86/kernel/fpu/init.c
> @@ -71,8 +71,17 @@ static bool __init fpu__probe_without_cpuid(void)
> return fsw == 0 && (fcw & 0x103f) == 0x003f;
> }
>
> +static struct fpu x86_init_fpu __read_mostly;
> +
> static void __init fpu__init_system_early_generic(void)
> {
> + int this_cpu = smp_processor_id();
> +
> + fpstate_reset(&x86_init_fpu);
> + current->thread.fpu = &x86_init_fpu;
> + per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
> + x86_init_fpu.last_cpu = this_cpu;
> +
> if (!boot_cpu_has(X86_FEATURE_CPUID) &&
> !test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
> if (fpu__probe_without_cpuid())
> @@ -150,6 +159,8 @@ static void __init fpu__init_task_struct_size(void)
> {
> int task_size = sizeof(struct task_struct);
>
> + task_size += sizeof(struct fpu);
> +
> /*
> * Subtract off the static size of the register state.
> * It potentially has a bunch of padding.
> @@ -164,14 +175,9 @@ static void __init fpu__init_task_struct_size(void)
>
> /*
> * We dynamically size 'struct fpu', so we require that
> - * it be at the end of 'thread_struct' and that
> - * 'thread_struct' be at the end of 'task_struct'. If
> - * you hit a compile error here, check the structure to
> - * see if something got added to the end.
> + * 'state' be at the end of 'it:
> */
> CHECK_MEMBER_AT_END_OF(struct fpu, __fpstate);
> - CHECK_MEMBER_AT_END_OF(struct thread_struct, fpu);
> - CHECK_MEMBER_AT_END_OF(struct task_struct, thread);
>
> arch_task_struct_size = task_size;
> }
> @@ -213,7 +219,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
> */
> void __init fpu__init_system(void)
> {
> - fpstate_reset(x86_task_fpu(current));
> fpu__init_system_early_generic();
>
> /*
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index adfeefd6375a..5bb73bc0e31a 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -97,7 +97,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> dst->thread.vm86 = NULL;
> #endif
> /* Drop the copied pointer to current's fpstate */
> - x86_task_fpu(dst)->fpstate = NULL;
> + dst->thread.fpu = NULL;
>
> return 0;
> }
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 61591ac6eab6..215a7380e41c 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1554,21 +1554,14 @@ struct task_struct {
> struct user_event_mm *user_event_mm;
> #endif
>
> + /* CPU-specific state of this task: */
> + struct thread_struct thread;
> +
> /*
> * New fields for task_struct should be added above here, so that
> * they are included in the randomized portion of task_struct.
> */
> randomized_struct_fields_end
> -
> - /* CPU-specific state of this task: */
> - struct thread_struct thread;
> -
> - /*
> - * WARNING: on x86, 'thread_struct' contains a variable-sized
> - * structure. It *MUST* be at the end of 'task_struct'.
> - *
> - * Do not put anything below here!
> - */
> };
>
> #define TASK_REPORT_IDLE (TASK_REPORT + 1)
> --
> 2.43.0
>

I am seeing a crash with this change in -next as
commit 018d456409d6 ("x86/fpu: Make task_struct::thread constant size")
and I still see it in WIP.x86/fpu from commit 4f4a9b399357 ("x86/fpu:
Make task_struct::thread constant size").

$ make -skj"$(nproc)" ARCH=i386 CROSS_COMPILE=i386-linux- defconfig bzImage

$ qemu-system-i386 \
-display none \
-nodefaults \
-M q35 \
-d unimp,guest_errors \
-append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
-kernel arch/x86/boot/bzImage \
-initrd rootfs.cpio \
-cpu host \
-enable-kvm \
-m 512m \
-smp 8 \
-serial mon:stdio
[ 0.000000] Linux version 6.10.0-rc2-00003-g4f4a9b399357 (nathan@thelio-3990X) (i386-linux-gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.41) #1 SMP PREEMPT_DYNAMIC Mon Jun 10 13:58:20 MST 2024
...
[ 0.337514] ------------[ cut here ]------------
[ 0.338184] Bad FPU state detected at restore_fpregs_from_fpstate+0x38/0x6c, reinitializing FPU registers.
[ 0.338195] WARNING: CPU: 2 PID: 100 at arch/x86/mm/extable.c:127 fixup_exception+0x41e/0x45c
[ 0.340506] Modules linked in:
[ 0.340905] CPU: 2 PID: 100 Comm: modprobe Not tainted 6.10.0-rc2-00003-g4f4a9b399357 #1
[ 0.341939] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 0.343363] EIP: fixup_exception+0x41e/0x45c
[ 0.343916] Code: e8 cb c0 00 00 0f 0b eb cb 0f 0b ba 4c e9 7f da eb b6 b2 01 88 15 16 bf 6c da 89 44 24 04 c7 04 24 18 39 3b da e8 a6 c0 00 00 <0f> 0b e9 ee fd ff ff 8d b4 26 00 00 00 00 0f 0b ba 4c e9 7f da e9
[ 0.346288] EAX: 0000005e EBX: da4e53f0 ECX: 00000000 EDX: da5d606c
[ 0.347082] ESI: c1ba5ef0 EDI: 0000000d EBP: c1ba5e58 ESP: c1ba5dd8
[ 0.347882] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010086
[ 0.348744] CR0: 80050033 CR2: bfe4afcb CR3: 012f5000 CR4: 00350ed0
[ 0.349540] Call Trace:
[ 0.349855] ? show_regs+0x4d/0x54
[ 0.350300] ? fixup_exception+0x41e/0x45c
[ 0.350828] ? __warn+0x84/0x150
[ 0.351242] ? fixup_exception+0x41e/0x45c
[ 0.351765] ? fixup_exception+0x41e/0x45c
[ 0.352289] ? report_bug+0x186/0x1b0
[ 0.352756] ? exc_overflow+0x50/0x50
[ 0.353230] ? handle_bug+0x2d/0x50
[ 0.353675] ? exc_invalid_op+0x1b/0x70
[ 0.354171] ? console_unlock+0x53/0xc4
[ 0.354658] ? handle_exception+0x14b/0x14b
[ 0.355196] ? exc_overflow+0x50/0x50
[ 0.355663] ? fixup_exception+0x41e/0x45c
[ 0.356184] ? exc_overflow+0x50/0x50
[ 0.356651] ? fixup_exception+0x41e/0x45c
[ 0.357172] ? restore_fpregs_from_fpstate+0x38/0x6c
[ 0.357809] ? _get_random_bytes+0x65/0x190
[ 0.358341] ? mt_find+0xd1/0x458
[ 0.358766] ? exc_bounds+0xac/0xac
[ 0.359213] exc_general_protection+0x97/0x358
[ 0.359775] ? randomize_page+0x37/0x54
[ 0.360271] ? exc_bounds+0xac/0xac
[ 0.360722] handle_exception+0x14b/0x14b
[ 0.361232] EIP: restore_fpregs_from_fpstate+0x38/0x6c
[ 0.361885] Code: 7d fc 89 ca eb 09 cc cc cc db e2 0f 77 db 03 3e 8d 74 26 00 8b 3d ec 31 46 da 8b 0d e8 31 46 da 21 fa 8d 7b 40 21 c8 0f c7 1f <8b> 5d f8 8b 7d fc 89 ec 5d c3 66 90 3e 8d 74 26 00 0f ae 4b 40 8b
[ 0.364284] EAX: 00000007 EBX: c1a73860 ECX: 00000007 EDX: 00000000
[ 0.365077] ESI: c1a73820 EDI: c1a738a0 EBP: c1ba5f54 ESP: c1ba5f4c
[ 0.365870] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010002
[ 0.366728] ? exc_bounds+0xac/0xac
[ 0.367171] ? exc_bounds+0xac/0xac
[ 0.367621] ? restore_fpregs_from_fpstate+0x35/0x6c
[ 0.368255] switch_fpu_return+0x49/0xd0
[ 0.368756] syscall_exit_to_user_mode+0x181/0x1a8
[ 0.369364] ? call_usermodehelper_exec_async+0xbe/0x1ac
[ 0.370040] ? call_usermodehelper+0x8c/0x8c
[ 0.370586] ret_from_fork+0x23/0x44
[ 0.371048] ? call_usermodehelper+0x8c/0x8c
[ 0.371592] ret_from_fork_asm+0x12/0x18
[ 0.372092] entry_INT80_32+0x108/0x108
[ 0.372581] EIP: 0xb7f77087
[ 0.372941] Code: Unable to access opcode bytes at 0xb7f7705d.
[ 0.373679] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
[ 0.374477] ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: bfe4aed0
[ 0.375271] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 007b EFLAGS: 00000200
[ 0.376134] ---[ end trace 0000000000000000 ]---
...

The rootfs is from [1] if you should need it (x86-rootfs.cpio.zst,
decompress with zstd first). I am more than happy to provide any
additional information or test patches as necessary.

[1]: https://github.com/ClangBuiltLinux/boot-utils/releases

Cheers,
Nathan

# bad: [d35b2284e966c0bef3e2182a5c5ea02177dd32e4] Add linux-next specific files for 20240607
# good: [8a92980606e3585d72d510a03b59906e96755b8a] Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect start 'd35b2284e966c0bef3e2182a5c5ea02177dd32e4' '8a92980606e3585d72d510a03b59906e96755b8a'
# good: [faef37a085e57f29479f853624948cdc7df6e366] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git
git bisect good faef37a085e57f29479f853624948cdc7df6e366
# good: [822946749b5f02d9f49dde4b4cb8f2535c247ce5] Merge branch 'drm-xe-next' of https://gitlab.freedesktop.org/drm/xe/kernel
git bisect good 822946749b5f02d9f49dde4b4cb8f2535c247ce5
# bad: [9ffebdc86bc23fdf0622eb0c38d395c2b99b7f32] Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm.git
git bisect bad 9ffebdc86bc23fdf0622eb0c38d395c2b99b7f32
# good: [9a85e49be89a9150fd2ec9964f48013a00c261d1] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
git bisect good 9a85e49be89a9150fd2ec9964f48013a00c261d1
# bad: [0435675ef6ba0b3e14859bd4c6edb0d81093d28e] Merge branch 'edac-for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git
git bisect bad 0435675ef6ba0b3e14859bd4c6edb0d81093d28e
# bad: [8f4652aac08030f2d5110be87229ad0f15c33496] Merge branch into tip/master: 'timers/core'
git bisect bad 8f4652aac08030f2d5110be87229ad0f15c33496
# bad: [0724c2b228bef4ea71cf5be0ab64de1065e32299] Merge branch into tip/master: 'irq/core'
git bisect bad 0724c2b228bef4ea71cf5be0ab64de1065e32299
# good: [51d8bfbcae9aeedee48cddefe6776c96acb2d83e] Merge branch into tip/master: 'x86/urgent'
git bisect good 51d8bfbcae9aeedee48cddefe6776c96acb2d83e
# bad: [80b691d02d005beafcd120a3b92c951155db543a] x86/fpu: Use 'fpstate' variable names consistently
git bisect bad 80b691d02d005beafcd120a3b92c951155db543a
# bad: [eb6428fd2eb4f72c735ba6fddd62147ac8d544c2] x86/fpu: Remove the thread::fpu pointer
git bisect bad eb6428fd2eb4f72c735ba6fddd62147ac8d544c2
# bad: [7329f3c69f07a502eb2ab5a6b4d27cd6a067579b] x86/fpu: Introduce the x86_task_fpu() helper method
git bisect bad 7329f3c69f07a502eb2ab5a6b4d27cd6a067579b
# bad: [018d456409d6c9ef4046eb5db95ce357acfeba23] x86/fpu: Make task_struct::thread constant size
git bisect bad 018d456409d6c9ef4046eb5db95ce357acfeba23
# first bad commit: [018d456409d6c9ef4046eb5db95ce357acfeba23] x86/fpu: Make task_struct::thread constant size

2024-06-11 12:44:11

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/9] x86/fpu: Make task_struct::thread constant size

I don't think this can explain the problem reported by Nathan, but.

On 06/08, Ingo Molnar wrote:
>
> +static struct fpu x86_init_fpu __read_mostly;
> +
> static void __init fpu__init_system_early_generic(void)
> {
> + int this_cpu = smp_processor_id();
> +
> + fpstate_reset(&x86_init_fpu);
> + current->thread.fpu = &x86_init_fpu;

OK,

> + per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
> + x86_init_fpu.last_cpu = this_cpu;

Why? I think it should do

x86_init_fpu.last_cpu = -1;
set_thread_flag(TIF_NEED_FPU_LOAD);

And the next patch should kill x86_init_fpu altogether, but keep
TIF_NEED_FPU_LOAD. It should be never cleared if PF_KTHREAD.

Oleg.


2024-06-12 08:17:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 3/9] x86/fpu: Make task_struct::thread constant size


* Oleg Nesterov <[email protected]> wrote:

> I don't think this can explain the problem reported by Nathan, but.
>
> On 06/08, Ingo Molnar wrote:
> >
> > +static struct fpu x86_init_fpu __read_mostly;
> > +
> > static void __init fpu__init_system_early_generic(void)
> > {
> > + int this_cpu = smp_processor_id();
> > +
> > + fpstate_reset(&x86_init_fpu);
> > + current->thread.fpu = &x86_init_fpu;
>
> OK,
>
> > + per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
> > + x86_init_fpu.last_cpu = this_cpu;
>
> Why? I think it should do
>
> x86_init_fpu.last_cpu = -1;
> set_thread_flag(TIF_NEED_FPU_LOAD);
>
> And the next patch should kill x86_init_fpu altogether, but keep
> TIF_NEED_FPU_LOAD. It should be never cleared if PF_KTHREAD.

So I applied the patch further below on top of:

4f4a9b399357 x86/fpu: Make task_struct::thread constant size

And Nathan's 32-bit kernel testcase [but running with 1 CPU to simplify it]
still crashes in a similar fashion in the (first?) modprobe instance with a
bad FPU state exception:

x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
[...]

netconsole: network logging started
cfg80211: Loading compiled-in X.509 certificates for regulatory database
------------[ cut here ]------------
Bad FPU state detected at restore_fpregs_from_fpstate+0x38/0x6c, reinitializing FPU registers.
WARNING: CPU: 0 PID: 60 at arch/x86/mm/extable.c:127 fixup_exception+0x41e/0x45c
Modules linked in:
CPU: 0 PID: 60 Comm: modprobe Not tainted 6.10.0-rc2-00003-g4f4a9b399357-dirty #39
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
...

... and the kernel goes down shortly afterwards - full crashlog attached.

What am I missing?

Thanks,

Ingo

===================>

arch/x86/kernel/fpu/init.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 4e8d37b5a90b..8f912f564fb1 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -75,12 +75,11 @@ static struct fpu x86_init_fpu __read_mostly;

static void __init fpu__init_system_early_generic(void)
{
- int this_cpu = smp_processor_id();
-
fpstate_reset(&x86_init_fpu);
current->thread.fpu = &x86_init_fpu;
- per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
- x86_init_fpu.last_cpu = this_cpu;
+
+ x86_init_fpu.last_cpu = -1;
+ set_thread_flag(TIF_NEED_FPU_LOAD);

if (!boot_cpu_has(X86_FEATURE_CPUID) &&
!test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {


Attachments:
(No filename) (2.87 kB)
crash.log (24.97 kB)
Download all attachments

2024-06-12 09:45:30

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/9] x86/fpu: Make task_struct::thread constant size

On 06/12, Ingo Molnar wrote:
>
> * Oleg Nesterov <[email protected]> wrote:
>
> > > + per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
> > > + x86_init_fpu.last_cpu = this_cpu;
> >
> > Why? I think it should do
> >
> > x86_init_fpu.last_cpu = -1;
> > set_thread_flag(TIF_NEED_FPU_LOAD);
> >
> > And the next patch should kill x86_init_fpu altogether, but keep
> > TIF_NEED_FPU_LOAD. It should be never cleared if PF_KTHREAD.
>
> So I applied the patch further below on top of:
>
> 4f4a9b399357 x86/fpu: Make task_struct::thread constant size
>
> And Nathan's 32-bit kernel testcase [but running with 1 CPU to simplify it]
> still crashes in a similar fashion

Yes, I didn't expect it can fix the problem. Still makes sense, I think.

> in the (first?) modprobe instance with a
> bad FPU state exception:

OK, I reproduced it too. I see nothing wrong in the usermodehelper or
kernel_execve paths... and fpu_clone() looks fine, "minimal" is still
true if init_task or another PF_KTHREAD calls user_mode_thread().

So I appiled the patch below and save_fpregs_to_fpstate() in
fpu__init_system() triggers the WARN_ON_FPU(err) in os_xsave()

[ 0.014609] RESTORED !!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[ 0.014958] ------------[ cut here ]------------
[ 0.014958] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.h:189 save_fpregs_to_fpstate+0x74/0x80
...

so I _think_ we can probably forget about modprobe/etc.

Oleg.


diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index 10d0a720659c..9fa78f75b2e5 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -56,8 +56,8 @@ static inline void fpregs_restore_userregs(void)
struct fpu *fpu = x86_task_fpu(current);
int cpu = smp_processor_id();

- if (WARN_ON_ONCE(current->flags & (PF_KTHREAD | PF_USER_WORKER)))
- return;
+// if (WARN_ON_ONCE(current->flags & (PF_KTHREAD | PF_USER_WORKER)))
+// return;

if (!fpregs_state_valid(fpu, cpu)) {
/*
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 4e8d37b5a90b..0e63d54595aa 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -5,11 +5,11 @@
#include <asm/fpu/api.h>
#include <asm/tlbflush.h>
#include <asm/setup.h>
-
+#include <asm/fpu/signal.h>
#include <linux/sched.h>
#include <linux/sched/task.h>
#include <linux/init.h>
-
+#include "context.h"
#include "internal.h"
#include "legacy.h"
#include "xstate.h"
@@ -75,12 +75,12 @@ static struct fpu x86_init_fpu __read_mostly;

static void __init fpu__init_system_early_generic(void)
{
- int this_cpu = smp_processor_id();
+// int this_cpu = smp_processor_id();

fpstate_reset(&x86_init_fpu);
current->thread.fpu = &x86_init_fpu;
- per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
- x86_init_fpu.last_cpu = this_cpu;
+ set_thread_flag(TIF_NEED_FPU_LOAD);
+ x86_init_fpu.last_cpu = -1;

if (!boot_cpu_has(X86_FEATURE_CPUID) &&
!test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
@@ -217,6 +217,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
* Called on the boot CPU once per system bootup, to set up the initial
* FPU state that is later cloned into all processes:
*/
+void save_fpregs_to_fpstate(struct fpu *fpu);
void __init fpu__init_system(void)
{
fpu__init_system_early_generic();
@@ -231,4 +232,10 @@ void __init fpu__init_system(void)
fpu__init_system_xstate_size_legacy();
fpu__init_system_xstate(fpu_kernel_cfg.max_size);
fpu__init_task_struct_size();
+
+ BUG_ON(x86_task_fpu(current) != &x86_init_fpu);
+ fpregs_restore_userregs();
+ pr_crit("RESTORED !!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n");
+ save_fpregs_to_fpstate(&x86_init_fpu);
+ pr_crit("SAVED !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n");
}


2024-06-12 19:03:21

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/9] x86/fpu: Make task_struct::thread constant size

The patch below seems to fix the problem.

Again, the changes in fpu__init_system_early_generic() are not
strictly needed to fix it, but I believe make sense anyway.

Oleg.


diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 4e8d37b5a90b..848ea79886ba 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -71,16 +71,14 @@ static bool __init fpu__probe_without_cpuid(void)
return fsw == 0 && (fcw & 0x103f) == 0x003f;
}

-static struct fpu x86_init_fpu __read_mostly;
+static struct fpu x86_init_fpu __attribute__ ((aligned (64))) __read_mostly;

static void __init fpu__init_system_early_generic(void)
{
- int this_cpu = smp_processor_id();
-
fpstate_reset(&x86_init_fpu);
current->thread.fpu = &x86_init_fpu;
- per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
- x86_init_fpu.last_cpu = this_cpu;
+ set_thread_flag(TIF_NEED_FPU_LOAD);
+ x86_init_fpu.last_cpu = -1;

if (!boot_cpu_has(X86_FEATURE_CPUID) &&
!test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 215a7380e41c..ec22b9bf27f5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1562,7 +1562,7 @@ struct task_struct {
* they are included in the randomized portion of task_struct.
*/
randomized_struct_fields_end
-};
+} __attribute__ ((aligned (64)));

#define TASK_REPORT_IDLE (TASK_REPORT + 1)
#define TASK_REPORT_MAX (TASK_REPORT_IDLE << 1)


2024-06-12 20:32:47

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/9] x86/fpu: Make task_struct::thread constant size

Damn, sorry for the additional spam, but if I was not clear...

On 06/12, Oleg Nesterov wrote:
>
> The patch below seems to fix the problem.

Of course I am not trying to propose this change. This is just
the debugging patch. Which can hopefully explain the problem.

> Again, the changes in fpu__init_system_early_generic() are not
> strictly needed to fix it, but I believe make sense anyway.

...

> static void __init fpu__init_system_early_generic(void)
> {
> - int this_cpu = smp_processor_id();
> -
> fpstate_reset(&x86_init_fpu);
> current->thread.fpu = &x86_init_fpu;
> - per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
> - x86_init_fpu.last_cpu = this_cpu;
> + set_thread_flag(TIF_NEED_FPU_LOAD);
> + x86_init_fpu.last_cpu = -1;

Yes, in particular set_thread_flag(TIF_NEED_FPU_LOAD). And x86_init_fpu
should die after the next patch.

Oleg.


2024-06-13 09:36:53

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH 10/9] x86/fpu: Fix 'struct fpu' misalignment on 32-bit kernels


* Oleg Nesterov <[email protected]> wrote:

> The patch below seems to fix the problem.
>
> Again, the changes in fpu__init_system_early_generic() are not
> strictly needed to fix it, but I believe make sense anyway.
>
> Oleg.
>
>
> diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
> index 4e8d37b5a90b..848ea79886ba 100644
> --- a/arch/x86/kernel/fpu/init.c
> +++ b/arch/x86/kernel/fpu/init.c
> @@ -71,16 +71,14 @@ static bool __init fpu__probe_without_cpuid(void)
> return fsw == 0 && (fcw & 0x103f) == 0x003f;
> }
>
> -static struct fpu x86_init_fpu __read_mostly;
> +static struct fpu x86_init_fpu __attribute__ ((aligned (64))) __read_mostly;
>
> static void __init fpu__init_system_early_generic(void)
> {
> - int this_cpu = smp_processor_id();
> -
> fpstate_reset(&x86_init_fpu);
> current->thread.fpu = &x86_init_fpu;
> - per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
> - x86_init_fpu.last_cpu = this_cpu;
> + set_thread_flag(TIF_NEED_FPU_LOAD);
> + x86_init_fpu.last_cpu = -1;
>
> if (!boot_cpu_has(X86_FEATURE_CPUID) &&
> !test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 215a7380e41c..ec22b9bf27f5 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1562,7 +1562,7 @@ struct task_struct {
> * they are included in the randomized portion of task_struct.
> */
> randomized_struct_fields_end
> -};
> +} __attribute__ ((aligned (64)));

Oh ... indeed, FPU context save area must be 64 bytes aligned!

On 64-bit kernels this was a given, accidentally, but on 32-bit kernels
init_task was only 32-byte aligned:

c22f04e0 D init_task

... which misaligned the struct fpu as well, I think. With your fix:

c22f0500 D init_task

What happened is that due to my series 'struct task_struct' lost its
64-byte alignment attribute, which broke the fpu struct allocation code on
32-bit kernels and made the 64-bit one probably unrobust as well.

To add insult to injury, I was aware of the alignment requirement, and
tried to cover it with an assert, but doubly mis-coded it:

+ BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);

Which is buggy:

- As on 32-bit kernels CONFIG_X86_L1_CACHE_SHIFT=5, ie. 32 bytes ...

- Nor does it really check the alignment of the FPU context save area
within struct fpu as it's allocated after task_struct ...

The interim patch below against the full WIP.x86/fpu series is what fixes
Nathan's 32-bit testcase.

Further improvements:

- The extra alignment attribute in <linux/sched.h> will affect other
architecture as well, although in practice the alignment of init_task is
not critical, and is very likely at least 32 bytes, probably more.
Still, it's a bit ugly in its current form.

- Also, because this was pretty hard to debug, we should probably add an
alignment check to fpu__init_task_struct_size() where we allocate the
fpu context structure, and fix the buggy size-assert.

Thanks a lot for your help Oleg! I've added this tag of yours:

Fixed-by: Oleg Nesterov <[email protected]>

... and would appreciate your Acked-by or Reviewed-by for the eventual
final version of the series, but I don't insist. ;-)

Ingo

=================>
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 53580e59e5db..16b6611634c3 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -71,15 +71,13 @@ static bool __init fpu__probe_without_cpuid(void)
return fsw == 0 && (fcw & 0x103f) == 0x003f;
}

-static struct fpu x86_init_fpu __read_mostly;
+static struct fpu x86_init_fpu __attribute__ ((aligned (64))) __read_mostly;

static void __init fpu__init_system_early_generic(void)
{
- int this_cpu = smp_processor_id();
-
fpstate_reset(&x86_init_fpu);
- per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
- x86_init_fpu.last_cpu = this_cpu;
+ set_thread_flag(TIF_NEED_FPU_LOAD);
+ x86_init_fpu.last_cpu = -1;

if (!boot_cpu_has(X86_FEATURE_CPUID) &&
!test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 215a7380e41c..ec22b9bf27f5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1562,7 +1562,7 @@ struct task_struct {
* they are included in the randomized portion of task_struct.
*/
randomized_struct_fields_end
-};
+} __attribute__ ((aligned (64)));

#define TASK_REPORT_IDLE (TASK_REPORT + 1)
#define TASK_REPORT_MAX (TASK_REPORT_IDLE << 1)

2024-06-14 15:19:25

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 10/9] x86/fpu: Fix 'struct fpu' misalignment on 32-bit kernels

Hi Ingo, sorry for delay.

On 06/13, Ingo Molnar wrote:
>
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1562,7 +1562,7 @@ struct task_struct {
> > * they are included in the randomized portion of task_struct.
> > */
> > randomized_struct_fields_end
> > -};
> > +} __attribute__ ((aligned (64)));

I guess __aligned(64) will look a bit better, but this is minor.

> What happened is that due to my series 'struct task_struct' lost its
> 64-byte alignment attribute, which broke the fpu struct allocation code on
> 32-bit kernels and made the 64-bit one probably unrobust as well.

Yes, and note that struct fpstate has the same __aligned(64), that is
how I noticed the potential problem and decided to check.

But Ingo, it was a shot in the dark ;) I still don't really understand
what exactly should be aligned. Is it the fpstate->regs member? Or what?
If yes, perhaps this member needs __aligned(64) too to be safe?

> ... and would appreciate your Acked-by or Reviewed-by for the eventual
> final version of the series, but I don't insist. ;-)

Thanks ;) will do.

Oleg.


2024-06-15 10:25:49

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 10/9] x86/fpu: Fix 'struct fpu' misalignment on 32-bit kernels

On 06/14, Oleg Nesterov wrote:
>
> On 06/13, Ingo Molnar wrote:
> >
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -1562,7 +1562,7 @@ struct task_struct {
> > > * they are included in the randomized portion of task_struct.
> > > */
> > > randomized_struct_fields_end
> > > -};
> > > +} __attribute__ ((aligned (64)));
>
> I guess __aligned(64) will look a bit better, but this is minor.

...

> But Ingo, it was a shot in the dark ;) I still don't really understand
> what exactly should be aligned. Is it the fpstate->regs member? Or what?
> If yes, perhaps this member needs __aligned(64) too to be safe?

Ah, I didn't notice that fpregs_state->xregs_state has
__attribute__ ((packed, aligned (64))), so everything is clear.

From your previous email:

- The extra alignment attribute in <linux/sched.h> will affect other
architecture as well, although in practice the alignment of init_task is
not critical, and is very likely at least 32 bytes, probably more.
Still, it's a bit ugly in its current form.

Agreed, but afaics we need to align task_struct only to ensure that

(void *)task + sizeof(*task);

doesn't break the alignment.

So perhaps we can (later) change x86_task_fpu(), fpu_clone(), and
fpu__init_task_struct_size() to use

ALIGN(sizeof(struct task_struct), 64)

and remove the alignment attribute in sched.h?

Or use ARCH_MIN_TASKALIGN == __alignof__(union fpregs_state) which is
also used in fork_init()->kmem_cache_create().

Oleg.


2024-06-16 10:57:53

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 10/9] x86/fpu: Fix 'struct fpu' misalignment on 32-bit kernels

On 06/15, Oleg Nesterov wrote:
>
> So perhaps we can (later) change x86_task_fpu(), fpu_clone(), and
> fpu__init_task_struct_size() to use
>
> ALIGN(sizeof(struct task_struct), 64)
>
> and remove the alignment attribute in sched.h?

On the 2nd thought, perhaps this makes sense from the very beginning?
See the patch below, up to you.

> Or use ARCH_MIN_TASKALIGN == __alignof__(union fpregs_state) which is
> also used in fork_init()->kmem_cache_create().

Either way, I hope that CONFIG_X86_VSMP can't define ARCH_MIN_TASKALIGN
less than __alignof__(fpregs_state).

Oleg.
---

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 64509c7f26c8..7887e9493330 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -507,6 +507,9 @@ struct thread_struct {
struct fpu *fpu;
};

+#define X86_TASK_SIZE \
+ ALIGN(sizeof(struct task_struct), __alignof__(union fpregs_state))
+
#define x86_task_fpu(task) ((task)->thread.fpu)

/*
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index f0c4367804b3..613198372764 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -591,7 +591,7 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
* This is safe because task_struct size is a multiple of cacheline size.
*/
struct fpu *src_fpu = x86_task_fpu(current);
- struct fpu *dst_fpu = (void *)dst + sizeof(*dst);
+ struct fpu *dst_fpu = (void *)dst + X86_TASK_SIZE;

BUILD_BUG_ON(sizeof(*dst) % SMP_CACHE_BYTES != 0);
BUG_ON(!src_fpu);
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 4e8d37b5a90b..8b43c83b82c7 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -71,16 +71,14 @@ static bool __init fpu__probe_without_cpuid(void)
return fsw == 0 && (fcw & 0x103f) == 0x003f;
}

-static struct fpu x86_init_fpu __read_mostly;
+static struct fpu x86_init_fpu __aligned(64) __read_mostly;

static void __init fpu__init_system_early_generic(void)
{
- int this_cpu = smp_processor_id();
-
fpstate_reset(&x86_init_fpu);
current->thread.fpu = &x86_init_fpu;
- per_cpu(fpu_fpregs_owner_ctx, this_cpu) = &x86_init_fpu;
- x86_init_fpu.last_cpu = this_cpu;
+ set_thread_flag(TIF_NEED_FPU_LOAD);
+ x86_init_fpu.last_cpu = -1;

if (!boot_cpu_has(X86_FEATURE_CPUID) &&
!test_bit(X86_FEATURE_FPU, (unsigned long *)cpu_caps_cleared)) {
@@ -157,7 +155,7 @@ static void __init fpu__init_system_generic(void)
*/
static void __init fpu__init_task_struct_size(void)
{
- int task_size = sizeof(struct task_struct);
+ int task_size = X86_TASK_SIZE;

task_size += sizeof(struct fpu);