These patches defer FPU state loading until return to userspace.
This has the advantage of not clobbering the FPU state of one task
with that of another, when that other task only stays in kernel mode.
It also allows us to skip the FPU restore in kernel_fpu_end(), which
will help tasks that do multiple invocations of kernel_fpu_begin/end
without returning to userspace, for example KVM VCPU tasks.
We could also skip the restore of the KVM VCPU guest FPU state at
guest entry time, if it is still valid, but I have not implemented
that yet.
The code that loads FPU context directly into registers from user
space memory, or saves directly to user space memory, is wrapped
in a retry loop that ensures the FPU state is correctly set up
at the start, and verifies that it is still valid at the end.
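As a rough sketch (simplified from the signal save path in one of the
patches), the save-to-user-memory case ends up looking like this:

do {
	make_fpregs_active();
	err = copy_xregs_to_user(buf);	/* or the fxsr/fsave variants */
} while (unlikely(!fpregs_active()));

If the task got context switched away in the middle, fpregs_active()
is false afterwards and the whole operation is simply redone.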
I have stress tested these patches with various FPU test programs,
and things seem to survive.
However, I have not found any good test suites that mix FPU
use and signal handlers. Close scrutiny of these patches would
be appreciated.
From: Rik van Riel <[email protected]>
Add helper functions that ensure a task's floating point registers are
set up the way they need to be - either with the task's floating point
state loaded in, or ready to accept a task's new floating point state.
These helpers can be called from code that accesses the floating point
state from a preemptible state, in preparation for the lazier floating
point loading code, using loops like this:
do {
	make_fpregs_active();
	...
} while (unlikely(!fpregs_active()));
This way a task can safely do things like saving the floating point
state of a task to user space memory (the signal handling code does
this), without requiring that the floating point state is restored
at every context switch.
If the floating point registers are still active when leaving the
loop, the floating point state has ended up in its destination
(registers or memory) in one piece, and will be saved at the next
context switch.
When the floating point state is already present, these functions
do nothing.
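The loadnew variant (added below) follows the same pattern for the
case where the register contents are about to be replaced anyway and
do not need to be loaded first, for example (a hypothetical sketch,
not taken verbatim from a later patch):

do {
	make_fpregs_active_loadnew();
	err = copy_user_to_fpregs_zeroing(buf_fx, xfeatures, fx_only);
} while (unlikely(!fpregs_active()));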
Signed-off-by: Rik van Riel <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 53 +++++++++++++++++++++++++++++++++----
1 file changed, 48 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 621ba3bfa2a7..d40deb337807 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -473,6 +473,14 @@ static inline void copy_kernel_to_fpregs(union fpregs_state *fpstate)
extern int copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
/*
+ * Is this thread using floating point state.
+ */
+static inline int fpstate_active(void)
+{
+ return current->thread.fpu.fpstate_active;
+}
+
+/*
* FPU context switch related helper methods:
*/
@@ -548,6 +556,44 @@ static inline int fpregs_active(void)
}
/*
+ * Ensure the floating point registers are ready for this task, and
+ * contain the task's floating point state.
+ *
+ * The loadnew variant can be used when loading new floating point
+ * state, and the old floating point register state does not matter.
+ */
+static inline void __make_fpregs_active(struct fpu *fpu, int cpu)
+{
+ if (!fpregs_state_valid(fpu, cpu))
+ copy_kernel_to_fpregs(&fpu->state);
+ fpregs_activate(fpu);
+}
+
+static inline void make_fpregs_active(void)
+{
+ struct fpu *fpu = &current->thread.fpu;
+
+ if (fpregs_active())
+ return;
+
+ preempt_disable();
+ __make_fpregs_active(fpu, raw_smp_processor_id());
+ preempt_enable();
+}
+
+static inline void make_fpregs_active_loadnew(void)
+{
+ struct fpu *fpu = &current->thread.fpu;
+
+ if (fpregs_active())
+ return;
+
+ preempt_disable();
+ fpregs_activate(fpu);
+ preempt_enable();
+}
+
+/*
* FPU state switching for scheduling.
*
* This is a two-stage process:
@@ -587,11 +633,8 @@ static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
bool preload = static_cpu_has(X86_FEATURE_FPU) &&
new_fpu->fpstate_active;
- if (preload) {
- if (!fpregs_state_valid(new_fpu, cpu))
- copy_kernel_to_fpregs(&new_fpu->state);
- fpregs_activate(new_fpu);
- }
+ if (preload)
+ __make_fpregs_active(new_fpu, cpu);
}
/*
--
2.7.4
From: Rik van Riel <[email protected]>
The patch that defers loading of FPU state until return to userspace
can result in a task's FPU state not being loaded into the registers
when certain kernel code paths want to access it.
Wrap those code paths in a loop that ensures the FPU state is valid before
any operations begin, and still valid afterwards.
Right now this wrapping changes nothing, since the FPU state of a task
is loaded at context switch time and will always be valid.
After the patch that defers loading of FPU state, the condition at the end
of the loop will ensure that the operation is restarted if the task was
context switched away during the operation.
Some of these loops involve copying FPU state from or to user space
memory. The potential for page faults means this code cannot be made
non-preemptible, making a retry loop the easiest option.
Sufficiently fast operations to or from kernel memory can simply be run
with preempt disabled.
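For the kernel memory case the pattern is simply (a sketch mirroring
the pkey hunks below):

preempt_disable();
make_fpregs_active();
/* read/modify/write register-resident state, e.g. PKRU */
preempt_enable();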
Signed-off-by: Rik van Riel <[email protected]>
---
arch/x86/kernel/fpu/core.c | 2 ++
arch/x86/kernel/fpu/signal.c | 28 +++++++++++++++++-----------
arch/x86/kernel/fpu/xstate.c | 5 +++++
arch/x86/mm/pkeys.c | 11 +++++++----
4 files changed, 31 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 30f11ab6c07e..34ba9d47c20f 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -490,11 +490,13 @@ void fpu__clear(struct fpu *fpu)
/* FPU state will be reallocated lazily at the first use. */
fpu__drop(fpu);
} else {
+ preempt_disable();
if (!fpu->fpstate_active) {
fpu__activate_curr(fpu);
user_fpu_begin();
}
copy_init_fpstate_to_fpregs();
+ preempt_enable();
}
}
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 83c23c230b4c..15128532aa81 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -121,12 +121,16 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
{
int err;
- if (use_xsave())
- err = copy_xregs_to_user(buf);
- else if (use_fxsr())
- err = copy_fxregs_to_user((struct fxregs_state __user *) buf);
- else
- err = copy_fregs_to_user((struct fregs_state __user *) buf);
+ do {
+ make_fpregs_active();
+
+ if (use_xsave())
+ err = copy_xregs_to_user(buf);
+ else if (use_fxsr())
+ err = copy_fxregs_to_user((struct fxregs_state __user *) buf);
+ else
+ err = copy_fregs_to_user((struct fregs_state __user *) buf);
+ } while (unlikely(!fpregs_active()));
if (unlikely(err) && __clear_user(buf, fpu_user_xstate_size))
err = -EFAULT;
@@ -350,11 +354,13 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
* For 64-bit frames and 32-bit fsave frames, restore the user
* state to the registers directly (with exceptions handled).
*/
- user_fpu_begin();
- if (copy_user_to_fpregs_zeroing(buf_fx, xfeatures, fx_only)) {
- fpu__clear(fpu);
- return -1;
- }
+ do {
+ user_fpu_begin();
+ if (copy_user_to_fpregs_zeroing(buf_fx, xfeatures, fx_only)) {
+ fpu__clear(fpu);
+ return -1;
+ }
+ } while (unlikely(!fpregs_active()));
}
return 0;
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 76bc2a1a3a79..798cfb242b18 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -896,6 +896,9 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
/* Shift the bits in to the correct place in PKRU for pkey: */
new_pkru_bits <<= pkey_shift;
+ preempt_disable();
+ make_fpregs_active();
+
/* Get old PKRU and mask off any old bits in place: */
old_pkru = read_pkru();
old_pkru &= ~((PKRU_AD_BIT|PKRU_WD_BIT) << pkey_shift);
@@ -903,6 +906,8 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
/* Write old part along with new part: */
write_pkru(old_pkru | new_pkru_bits);
+ preempt_enable();
+
return 0;
}
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index e8c474451928..9eba07353404 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -32,10 +32,13 @@ int __execute_only_pkey(struct mm_struct *mm)
* can make fpregs inactive.
*/
preempt_disable();
- if (fpregs_active() &&
- !__pkru_allows_read(read_pkru(), PKEY_DEDICATED_EXECUTE_ONLY)) {
- preempt_enable();
- return PKEY_DEDICATED_EXECUTE_ONLY;
+ if (fpstate_active()) {
+ make_fpregs_active();
+ if (!__pkru_allows_read(read_pkru(),
+ PKEY_DEDICATED_EXECUTE_ONLY)) {
+ preempt_enable();
+ return PKEY_DEDICATED_EXECUTE_ONLY;
+ }
}
preempt_enable();
ret = arch_set_user_pkey_access(current, PKEY_DEDICATED_EXECUTE_ONLY,
--
2.7.4
From: Rik van Riel <[email protected]>
Defer loading of FPU state until return to userspace. This gives
the kernel the potential to skip loading FPU state for tasks that
stay in kernel mode, or for tasks that end up with repeated
invocations of kernel_fpu_begin.
It also increases the chances that a task's FPU state will remain
valid in the FPU registers until it is scheduled back in, allowing
us to skip restoring that task's FPU state altogether.
This also prepares the ground work for not having to restore
qemu userspace FPU state in KVM VCPU threads, when merely returning
to the host kernel because the guest went idle, or is running a
kernel thread. That functionality will come in a later patch.
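The resulting flow, simplified from the diff below:

/* at context switch, in __switch_to(): */
switch_fpu_prepare(prev_fpu, cpu);	/* save the old task's state */
...
switch_fpu_finish();			/* just sets TIF_LOAD_FPU */

/* on return to userspace, in prepare_exit_to_usermode(): */
if (unlikely(cached_flags & _TIF_LOAD_FPU)) {
	clear_thread_flag(TIF_LOAD_FPU);
	switch_fpu_return();		/* load the task's FPU state */
}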
Signed-off-by: Rik van Riel <[email protected]>
---
arch/x86/entry/common.c | 9 +++++++++
arch/x86/include/asm/fpu/api.h | 5 +++++
arch/x86/include/asm/fpu/internal.h | 13 +++++--------
arch/x86/include/asm/thread_info.h | 4 +++-
arch/x86/kernel/fpu/core.c | 28 ++++++++++++++++++++++++----
arch/x86/kernel/process_32.c | 5 ++---
arch/x86/kernel/process_64.c | 5 ++---
7 files changed, 50 insertions(+), 19 deletions(-)
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index bdd9cc59d20f..0c11ee22f90b 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -27,6 +27,7 @@
#include <asm/vdso.h>
#include <asm/uaccess.h>
#include <asm/cpufeature.h>
+#include <asm/fpu/api.h>
#define CREATE_TRACE_POINTS
#include <trace/events/syscalls.h>
@@ -189,6 +190,14 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
exit_to_usermode_loop(regs, cached_flags);
+ /* Reload ti->flags; we may have rescheduled above. */
+ cached_flags = READ_ONCE(ti->flags);
+
+ if (unlikely(cached_flags & _TIF_LOAD_FPU)) {
+ clear_thread_flag(TIF_LOAD_FPU);
+ switch_fpu_return();
+ }
+
#ifdef CONFIG_COMPAT
/*
* Compat syscalls set TS_COMPAT. Make sure we clear it before
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 1429a7c736db..d7ef49a03b51 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -37,6 +37,11 @@ extern int irq_ts_save(void);
extern void irq_ts_restore(int TS_state);
/*
+ * Load the task FPU state before returning to userspace.
+ */
+extern void switch_fpu_return(void);
+
+/*
* Query the presence of one or more xfeatures. Works on any legacy CPU as well.
*
* If 'feature_name' is set then put a human-readable description of
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index d40deb337807..cccc0c059b41 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -601,8 +601,8 @@ static inline void make_fpregs_active_loadnew(void)
* - switch_fpu_prepare() saves the old state.
* This is done within the context of the old process.
*
- * - switch_fpu_finish() restores the new state as
- * necessary.
+ * - switch_fpu_finish() sets TIF_LOAD_FPU; the floating point state
+ * will get loaded on return to userspace, or when the kernel needs it.
*/
static inline void
switch_fpu_prepare(struct fpu *old_fpu, int cpu)
@@ -628,13 +628,10 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
* Set up the userspace FPU context for the new task, if the task
* has used the FPU.
*/
-static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
+static inline void switch_fpu_finish(void)
{
- bool preload = static_cpu_has(X86_FEATURE_FPU) &&
- new_fpu->fpstate_active;
-
- if (preload)
- __make_fpregs_active(new_fpu, cpu);
+ if (static_cpu_has(X86_FEATURE_FPU))
+ set_thread_flag(TIF_LOAD_FPU);
}
/*
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 2aaca53c0974..9941d118f2cc 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -90,6 +90,7 @@ struct task_struct;
#define TIF_SYSCALL_TRACEPOINT 28 /* syscall tracepoint instrumentation */
#define TIF_ADDR32 29 /* 32-bit address space on 64 bits */
#define TIF_X32 30 /* 32-bit native x86-64 binary */
+#define TIF_LOAD_FPU 31 /* load FPU on return to userspace */
#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
@@ -112,6 +113,7 @@ struct task_struct;
#define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT)
#define _TIF_ADDR32 (1 << TIF_ADDR32)
#define _TIF_X32 (1 << TIF_X32)
+#define _TIF_LOAD_FPU (1 << TIF_LOAD_FPU)
/*
* work to do in syscall_trace_enter(). Also includes TIF_NOHZ for
@@ -125,7 +127,7 @@ struct task_struct;
/* work to do on any return to user space */
#define _TIF_ALLWORK_MASK \
((0x0000FFFF & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT | \
- _TIF_NOHZ)
+ _TIF_NOHZ | _TIF_LOAD_FPU)
/* flags to check in __switch_to() */
#define _TIF_WORK_CTXSW \
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 34ba9d47c20f..09c4254a6e26 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -99,14 +99,14 @@ void __kernel_fpu_begin(void)
kernel_fpu_disable();
+ __cpu_invalidate_fpregs_state();
+
if (fpu->fpregs_active) {
/*
* Ignore return value -- we don't care if reg state
* is clobbered.
*/
copy_fpregs_to_fpstate(fpu);
- } else {
- __cpu_invalidate_fpregs_state();
}
}
EXPORT_SYMBOL(__kernel_fpu_begin);
@@ -115,8 +115,10 @@ void __kernel_fpu_end(void)
{
struct fpu *fpu = &current->thread.fpu;
- if (fpu->fpregs_active)
- copy_kernel_to_fpregs(&fpu->state);
+ if (fpu->fpregs_active) {
+ switch_fpu_finish();
+ fpu->fpregs_active = 0;
+ }
kernel_fpu_enable();
}
@@ -501,6 +503,24 @@ void fpu__clear(struct fpu *fpu)
}
/*
+ * Load FPU context before returning to userspace.
+ */
+void switch_fpu_return(void)
+{
+ if (!static_cpu_has(X86_FEATURE_FPU))
+ return;
+
+ /*
+ * We should never return to user space without the task's
+ * own FPU contents loaded into the registers. That makes it
+ * a bug to not have the task's FPU state set up.
+ */
+ WARN_ON_FPU(!fpstate_active());
+
+ make_fpregs_active();
+}
+
+/*
* x87 math exception handling:
*/
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 7dc8c9c3d801..9103871f80f5 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -229,7 +229,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
struct thread_struct *prev = &prev_p->thread,
*next = &next_p->thread;
struct fpu *prev_fpu = &prev->fpu;
- struct fpu *next_fpu = &next->fpu;
int cpu = smp_processor_id();
struct tss_struct *tss = &per_cpu(cpu_tss, cpu);
@@ -294,9 +293,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
if (prev->gs | next->gs)
lazy_load_gs(next->gs);
- switch_fpu_finish(next_fpu, cpu);
-
this_cpu_write(current_task, next_p);
+ switch_fpu_finish();
+
return prev_p;
}
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 705669efb762..4b228e1e4423 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -260,7 +260,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
struct thread_struct *prev = &prev_p->thread;
struct thread_struct *next = &next_p->thread;
struct fpu *prev_fpu = &prev->fpu;
- struct fpu *next_fpu = &next->fpu;
int cpu = smp_processor_id();
struct tss_struct *tss = &per_cpu(cpu_tss, cpu);
unsigned prev_fsindex, prev_gsindex;
@@ -415,8 +414,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
prev->gsbase = 0;
prev->gsindex = prev_gsindex;
- switch_fpu_finish(next_fpu, cpu);
-
/*
* Switch the PDA and FPU contexts.
*/
@@ -425,6 +422,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
/* Reload esp0 and ss1. This changes current_thread_info(). */
load_sp0(tss, next);
+ switch_fpu_finish();
+
/*
* Now maybe reload the debug registers and handle I/O bitmaps
*/
--
2.7.4
On Mon, Oct 17, 2016 at 1:09 PM, <[email protected]> wrote:
> From: Rik van Riel <[email protected]>
>
> Add helper functions that ensure a task's floating point registers are
> set up the way they need to be - either with the task's floating point
> state loaded in, or ready to accept a task's new floating point state.
>
> These helpers can be called from code that accesses the floating point
> state from a preemptible state, in preparation for the lazier floating
> point loading code, using loops like this:
>
> do {
> 	make_fpregs_active();
> 	...
> } while (unlikely(!fpregs_active()));
>
> This way a task can safely do things like saving the floating point
> state of a task to user space memory (the signal handling code does
> this), without requiring that the floating point state is restored
> at every context switch.
Sadly, I think this model is problematic. An attacker can set up some
memory that causes writes to block in a controlled manner (using
userfaultfd, FUSE, madvise() hammering, etc). The attacker can
arrange for the uaccess write in the "..." to block and then for some
privileged target task to be scheduled. The attacker then gets their
task to be scheduled next and the privileged xstate gets written to
the attacker's memory. Then the attacker either reads it back from a
different thread or arranges for the next iteration of the loop to
fail outright. Now the attacker has read another task's xstate.
Dave and/or Yu-cheng: didn't one of you have some code to allow a user
xstate buffer to be filled from the copy in kernel memory? If we did
that, we could avoid this mess entirely.
Alternatively, there could be flag that causes FPU loads to be
temporarily eager. Maybe the sequence would look like:
pin_fpregs_active();
...
unpin_fpregs_active();
or maybe get_fpregs() / put_fpregs().
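Roughly, and completely untested (fpregs_pinned would be a new field
in struct fpu):

static inline void pin_fpregs_active(void)
{
	struct fpu *fpu = &current->thread.fpu;

	preempt_disable();
	fpu->fpregs_pinned++;		/* hypothetical new field */
	__make_fpregs_active(fpu, raw_smp_processor_id());
	preempt_enable();
}

static inline void unpin_fpregs_active(void)
{
	current->thread.fpu.fpregs_pinned--;
}

with the context switch code restoring the FPU state eagerly, rather
than setting up lazy loading, whenever the incoming task's pin count
is nonzero.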
--Andy
On Mon, Oct 17, 2016 at 1:09 PM, <[email protected]> wrote:
> From: Rik van Riel <[email protected]>
>
> Defer loading of FPU state until return to userspace. This gives
> the kernel the potential to skip loading FPU state for tasks that
> stay in kernel mode, or for tasks that end up with repeated
> invocations of kernel_fpu_begin.
>
> It also increases the chances that a task's FPU state will remain
> valid in the FPU registers until it is scheduled back in, allowing
> us to skip restoring that task's FPU state altogether.
>
> This also prepares the ground work for not having to restore
> qemu userspace FPU state in KVM VCPU threads, when merely returning
> to the host kernel because the guest went idle, or is running a
> kernel thread. That functionality will come in a later patch.
>
> Signed-off-by: Rik van Riel <[email protected]>
> ---
> arch/x86/entry/common.c | 9 +++++++++
> arch/x86/include/asm/fpu/api.h | 5 +++++
> arch/x86/include/asm/fpu/internal.h | 13 +++++--------
> arch/x86/include/asm/thread_info.h | 4 +++-
> arch/x86/kernel/fpu/core.c | 28 ++++++++++++++++++++++++----
> arch/x86/kernel/process_32.c | 5 ++---
> arch/x86/kernel/process_64.c | 5 ++---
> 7 files changed, 50 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index bdd9cc59d20f..0c11ee22f90b 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -27,6 +27,7 @@
> #include <asm/vdso.h>
> #include <asm/uaccess.h>
> #include <asm/cpufeature.h>
> +#include <asm/fpu/api.h>
>
> #define CREATE_TRACE_POINTS
> #include <trace/events/syscalls.h>
> @@ -189,6 +190,14 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
> if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
> exit_to_usermode_loop(regs, cached_flags);
>
> + /* Reload ti->flags; we may have rescheduled above. */
> + cached_flags = READ_ONCE(ti->flags);
Stick this bit in the "if" above, please.
> +
> + if (unlikely(cached_flags & _TIF_LOAD_FPU)) {
> + clear_thread_flag(TIF_LOAD_FPU);
> + switch_fpu_return();
> + }
> +
But I still don't see how this can work correctly with PKRU.
--Andy
On Mon, Oct 17, 2016 at 01:57:06PM -0700, Andy Lutomirski wrote:
> Dave and/or Yu-cheng: didn't one of you have some code to allow a user
> xstate buffer to be filled from the copy in kernel memory? If we did
> that, we could avoid this mess entirely.
In copy_fpstate_to_sigframe() (arch/x86/kernel/fpu/signal.c), the
assumption was we have lazy fpu:
if (fpregs_active() || we want an #NM exception)
	copy_fpregs_to_sigframe();
else
	copy kernel buffer to user buffer;
But this is not true anymore. Or do you mean something else?
-- Yu-cheng
On Mon, Oct 17, 2016 at 4:04 PM, Yu-cheng Yu <[email protected]> wrote:
> On Mon, Oct 17, 2016 at 01:57:06PM -0700, Andy Lutomirski wrote:
>> Dave and/or Yu-cheng: didn't one of you have some code to allow a user
>> xstate buffer to be filled from the copy in kernel memory? If we did
>> that, we could avoid this mess entirely.
>
> In copy_fpstate_to_sigframe() (arch/x86/kernel/fpu/signal.c), the
> assumption was we have lazy fpu:
>
> if (fpregs_active() || we want an #NM exception)
> 	copy_fpregs_to_sigframe();
> else
> 	copy kernel buffer to user buffer;
>
> But this is not true anymore. Or do you mean something else?
Rik wants to add a different form of FPU laziness, and it would be
simpler if we could just always copy from a kernel buffer. Does code
to do that exist in the tree?
--Andy
On Mon, Oct 17, 2016 at 04:30:47PM -0700, Andy Lutomirski wrote:
> Rik wants to add a different form of FPU laziness, and it would be
> simpler if we could just always copy from a kernel buffer. Does code
> to do that exist in the tree?
If we know fpregs are not active at the time of the copy and compacted format
is the concern, can we use copyout_from_xsaves() in xstate.c?
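I.e. something along these lines in the signal save path (a rough
sketch only; I am going from memory on the copyout_from_xsaves()
arguments):

if (!fpregs_active()) {
	/* assumed arguments: (pos, count, kbuf, ubuf, xsave) */
	if (copyout_from_xsaves(0, fpu_user_xstate_size, NULL, buf,
				&fpu->state.xsave))
		return -EFAULT;
}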
Yu-cheng
On Mon, 2016-10-17 at 13:58 -0700, Andy Lutomirski wrote:
> On Mon, Oct 17, 2016 at 1:09 PM, <[email protected]> wrote:
> >
> > From: Rik van Riel <[email protected]>
> >
> > Defer loading of FPU state until return to userspace. This gives
> > the kernel the potential to skip loading FPU state for tasks that
> > stay in kernel mode, or for tasks that end up with repeated
> > invocations of kernel_fpu_begin.
> > #define CREATE_TRACE_POINTS
> > #include <trace/events/syscalls.h>
> > @@ -189,6 +190,14 @@ __visible inline void
> > prepare_exit_to_usermode(struct pt_regs *regs)
> > if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
> > exit_to_usermode_loop(regs, cached_flags);
> >
> > + /* Reload ti->flags; we may have rescheduled above. */
> > + cached_flags = READ_ONCE(ti->flags);
>
> Stick this bit in the "if" above, please.
Will do.
> But I still don't see how this can work correctly with PKRU.
OK, Andy and I talked on IRC, and we have some ideas on how
to fix & improve this series:
1) pin/unpin_fpregs_active to prevent leaking of other
users' fpregs contents to userspace (patch 1)
2) eagerly switch PKRU state (only), at task switch time,
if the incoming task has different protection keys from
the outgoing task (somewhat unlikely), just like the
KVM vcpu entry & exit code is already doing (rough sketch below)
3) remove stts from the KVM VMX code (Andy may get
to this before me)
4) enhance __kernel_fpu_begin() to take an fpu argument,
and let the caller (really just kvm_load_guest_fpu)
know whether that fpu state is still present in the
registers, allowing it to skip __copy_kernel_to_fpregs
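For (2), the context switch code could do something like this
(untested sketch; it assumes the incoming task's PKRU value can be
pulled out of its xsave buffer with get_xsave_addr()):

/* Called at context switch time, next_fpu is the incoming task's fpu: */
static inline void switch_pkru(struct fpu *next_fpu)
{
	struct pkru_state *pk;
	u32 next_pkru = 0;

	if (!boot_cpu_has(X86_FEATURE_OSPKE))
		return;

	pk = get_xsave_addr(&next_fpu->state.xsave, XFEATURE_MASK_PKRU);
	if (pk)
		next_pkru = pk->pkru;

	/* WRPKRU is not free; skip it when the value does not change. */
	if (next_pkru != read_pkru())
		write_pkru(next_pkru);
}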
--
All Rights Reversed.
On Mon, Oct 17, 2016 at 4:45 PM, Yu-cheng Yu <[email protected]> wrote:
> On Mon, Oct 17, 2016 at 04:30:47PM -0700, Andy Lutomirski wrote:
>> Rik wants to add a different form of FPU laziness, and it would be
>> simpler if we could just always copy from a kernel buffer. Does code
>> to do that exist in the tree?
>
> If we know fpregs are not active at the time of the copy and compacted format
> is the concern, can we use copyout_from_xsaves() in xstate.c?
>
Right, that. Rik, that could be an alternative solution.
--Andy
* [email protected] <[email protected]> wrote:
> These patches defer FPU state loading until return to userspace.
>
> This has the advantage of not clobbering the FPU state of one task
> with that of another, when that other task only stays in kernel mode.
>
> It also allows us to skip the FPU restore in kernel_fpu_end(), which
> will help tasks that do multiple invocations of kernel_fpu_begin/end
> without returning to userspace, for example KVM VCPU tasks.
>
> We could also skip the restore of the KVM VCPU guest FPU state at
> guest entry time, if it is still valid, but I have not implemented
> that yet.
>
> The code that loads FPU context directly into registers from user
> space memory, or saves directly to user space memory, is wrapped
> in a retry loop, that ensures the FPU state is correctly set up
> at the start, and verifies that it is still valid at the end.
>
> I have stress tested these patches with various FPU test programs,
> and things seem to survive.
>
> However, I have not found any good test suites that mix FPU
> use and signal handlers. Close scrutiny of these patches would
> be appreciated.
BTW., for the next version it would be nice to also have a benchmark that shows
the advantages (and proves that it's not causing measurable overhead elsewhere).
Either an FPU-aware extension to 'perf bench sched' or a separate 'perf bench fpu'
suite would be nice.
Thanks,
Ingo