2011-03-09 19:17:17

by Hans Rosenfeld

Subject: [RFC 0/8] rework of extended state handling, LWP support

This patch set is a general cleanup and rework of the code related to
handling of FPU and other extended states.

All extended states, including the FPU state, are now handled through
xsave/xrstor wrappers that fall back to fxsave/fxrstor, or even
fsave/frstor, if hardware support for those features is lacking.
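
As a rough sketch, the save path ends up as a single dispatch; this is
essentially the xsave() helper as it looks after patches 2/8 and 5/8
below, where fpu_save() in turn picks fxsave or fsave:

void xsave(struct xsave_struct *x, u64 mask)
{
        clts();

        if (use_xsave())                  /* hardware has xsave/xsaveopt */
                fpu_xsave(x, mask);
        else if (mask & XCNTXT_LAZY)      /* fall back to fxsave/fsave */
                fpu_save(&x->i387);

        if (mask & XCNTXT_LAZY)
                fpu_clean(&x->i387);      /* fnclex + FXSAVE leak workaround */

        stts();
}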

Non-lazy xstates, which cannot be restored lazily, can now be easily
supported with almost no processing overhead. This makes adding basic
support for AMD's LWP almost trivial.

Since non-lazy xstates are inherently incompatible with lazy allocation
of the xstate area, completely removing lazy allocation to further
reduce code complexity should be considered. Because SSE-optimized
library functions are widely used today, most processes would have an
xstate area anyway, so the memory overhead wouldn't be big enough to be
much of an issue.
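
To put a rough number on that: patch 8/8 below quotes 112 to 960 bytes
per xstate area depending on the enabled features, so even a system
with a thousand tasks would spend well under 1 MB on always-allocated
xstate areas.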


Hans Rosenfeld (8):
x86, xsave: cleanup fpu/xsave support
x86, xsave: rework fpu/xsave support
x86, xsave: cleanup fpu/xsave signal frame setup
x86, xsave: remove unused code
x86, xsave: more cleanups
x86, xsave: add support for non-lazy xstates
x86, xsave: add kernel support for AMDs Lightweight Profiling (LWP)
x86, xsave: remove lazy allocation of xstate area

arch/x86/ia32/ia32_signal.c | 4 +-
arch/x86/include/asm/i387.h | 226 +++++++-------------------
arch/x86/include/asm/processor.h | 12 ++
arch/x86/include/asm/sigcontext.h | 12 ++
arch/x86/include/asm/thread_info.h | 4 +-
arch/x86/include/asm/xsave.h | 99 ++---------
arch/x86/kernel/i387.c | 310 ++++--------------------------------
arch/x86/kernel/process_32.c | 29 ++---
arch/x86/kernel/process_64.c | 28 +---
arch/x86/kernel/signal.c | 4 +-
arch/x86/kernel/traps.c | 47 +-----
arch/x86/kernel/xsave.c | 313 +++++++++++++++++++++++-------------
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 11 +-
arch/x86/math-emu/fpu_entry.c | 8 +-
drivers/lguest/x86/core.c | 2 +-
16 files changed, 373 insertions(+), 738 deletions(-)


2011-03-09 19:15:19

by Hans Rosenfeld

Subject: [RFC 2/8] x86, xsave: rework fpu/xsave support

This is a complete rework of the code that handles FPU and related
extended states. Since FPU, XMM and YMM states are just variants of what
xsave handles, all of the old FPU-specific state handling code will be
hidden behind a set of functions that resemble xsave and xrstor. For
hardware that does not support xsave, the code falls back to
fxsave/fxrstor or even fsave/frstor.

An xstate_mask member will be added to the thread_info structure that
will control which states are to be saved by xsave. It is set to include
all "lazy" states (that is, all states currently supported: FPU, XMM and
YMM) by the #NM handler when a lazy restore is triggered, or by
switch_to() when the task's FPU context is preloaded. xstate_mask is
intended to completely replace TS_USEDFPU in a later cleanup patch.
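
Condensed from the save_xstates() implementation below: the lazy-restore
heuristic that used to live in __switch_to() (the "used fpu in the last
5 timeslices" preload check) now simply manipulates xstate_mask:

void save_xstates(struct task_struct *tsk)
{
        struct thread_info *ti = task_thread_info(tsk);

        xsave(&tsk->thread.fpu, ti->xstate_mask);

        /*
         * If the task hasn't used the fpu for the last few timeslices,
         * drop the lazy states from xstate_mask; restore_xstates() in
         * __switch_to() then skips them, and the next FPU use traps
         * through #NM to reload them on demand.
         */
        if (tsk->fpu_counter < 5)
                ti->xstate_mask &= ~XCNTXT_LAZY;
}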

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 44 +++++++++++++++++++---
arch/x86/include/asm/thread_info.h | 2 +
arch/x86/include/asm/xsave.h | 14 ++++++-
arch/x86/kernel/i387.c | 11 ++++--
arch/x86/kernel/process_32.c | 27 +++++---------
arch/x86/kernel/process_64.c | 26 ++++----------
arch/x86/kernel/traps.c | 11 +++---
arch/x86/kernel/xsave.c | 71 ++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 7 ++--
drivers/lguest/x86/core.c | 2 +-
10 files changed, 158 insertions(+), 57 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index d908383..939af08 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -224,12 +224,46 @@ static inline void fpu_fxsave(struct fpu *fpu)
/*
* These must be called with preempt disabled
*/
+static inline void fpu_restore(struct fpu *fpu)
+{
+ fxrstor_checking(&fpu->state->fxsave);
+}
+
+static inline void fpu_save(struct fpu *fpu)
+{
+ if (use_fxsr()) {
+ fpu_fxsave(fpu);
+ } else {
+ asm volatile("fsave %[fx]; fwait"
+ : [fx] "=m" (fpu->state->fsave));
+ }
+}
+
+static inline void fpu_clean(struct fpu *fpu)
+{
+ u32 swd = (use_fxsr() || use_xsave()) ?
+ fpu->state->fxsave.swd : fpu->state->fsave.swd;
+
+ if (unlikely(swd & X87_FSW_ES))
+ asm volatile("fnclex");
+
+ /* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception
+ is pending. Clear the x87 state here by setting it to fixed
+ values. safe_address is a random variable that should be in L1 */
+ alternative_input(
+ ASM_NOP8 ASM_NOP2,
+ "emms\n\t" /* clear stack tags */
+ "fildl %P[addr]", /* set F?P to defined value */
+ X86_FEATURE_FXSAVE_LEAK,
+ [addr] "m" (safe_address));
+}
+
static inline void fpu_save_init(struct fpu *fpu)
{
if (use_xsave()) {
struct xsave_struct *xstate = &fpu->state->xsave;

- fpu_xsave(xstate);
+ fpu_xsave(xstate, -1);

/*
* xsave header may indicate the init state of the FP.
@@ -295,18 +329,16 @@ static inline void __clear_fpu(struct task_struct *tsk)
"2:\n"
_ASM_EXTABLE(1b, 2b));
task_thread_info(tsk)->status &= ~TS_USEDFPU;
+ task_thread_info(tsk)->xstate_mask &= ~XCNTXT_LAZY;
stts();
}
}

static inline void kernel_fpu_begin(void)
{
- struct thread_info *me = current_thread_info();
preempt_disable();
- if (me->status & TS_USEDFPU)
- __save_init_fpu(me->task);
- else
- clts();
+ save_xstates(current_thread_info()->task);
+ clts();
}

static inline void kernel_fpu_end(void)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index f0b6e5d..5c92d21 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -26,6 +26,7 @@ struct exec_domain;
struct thread_info {
struct task_struct *task; /* main task structure */
struct exec_domain *exec_domain; /* execution domain */
+ __u64 xstate_mask; /* xstates in use */
__u32 flags; /* low level flags */
__u32 status; /* thread synchronous flags */
__u32 cpu; /* current CPU */
@@ -47,6 +48,7 @@ struct thread_info {
{ \
.task = &tsk, \
.exec_domain = &default_exec_domain, \
+ .xstate_mask = 0, \
.flags = 0, \
.cpu = 0, \
.preempt_count = INIT_PREEMPT_COUNT, \
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 8bcbbce..6052a84 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -25,6 +25,8 @@
*/
#define XCNTXT_MASK (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)

+#define XCNTXT_LAZY XCNTXT_MASK
+
#ifdef CONFIG_X86_64
#define REX_PREFIX "0x48, "
#else
@@ -35,6 +37,11 @@ extern unsigned int xstate_size;
extern u64 pcntxt_mask;
extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];

+extern void xsave(struct fpu *, u64);
+extern void xrstor(struct fpu *, u64);
+extern void save_xstates(struct task_struct *);
+extern void restore_xstates(struct task_struct *, u64);
+
extern void xsave_init(void);
extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
extern int init_fpu(struct task_struct *child);
@@ -113,15 +120,18 @@ static inline void xsave_state(struct xsave_struct *fx, u64 mask)
: "memory");
}

-static inline void fpu_xsave(struct xsave_struct *fx)
+static inline void fpu_xsave(struct xsave_struct *fx, u64 mask)
{
+ u32 lmask = mask;
+ u32 hmask = mask >> 32;
+
/* This, however, we can work around by forcing the compiler to select
an addressing mode that doesn't require extended registers. */
alternative_input(
".byte " REX_PREFIX "0x0f,0xae,0x27",
".byte " REX_PREFIX "0x0f,0xae,0x37",
X86_FEATURE_XSAVEOPT,
- [fx] "D" (fx), "a" (-1), "d" (-1) :
+ [fx] "D" (fx), "a" (lmask), "d" (hmask) :
"memory");
}
#endif
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index e60c38c..5ab66ec 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -152,8 +152,11 @@ int init_fpu(struct task_struct *tsk)
int ret;

if (tsk_used_math(tsk)) {
- if (HAVE_HWFP && tsk == current)
- unlazy_fpu(tsk);
+ if (HAVE_HWFP && tsk == current) {
+ preempt_disable();
+ save_xstates(tsk);
+ preempt_enable();
+ }
return 0;
}

@@ -600,7 +603,9 @@ int save_i387_xstate_ia32(void __user *buf)
NULL, fp) ? -1 : 1;
}

- unlazy_fpu(tsk);
+ preempt_disable();
+ save_xstates(tsk);
+ preempt_enable();

if (cpu_has_xsave)
return save_i387_xsave(fp);
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8d12878..8df07c3 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -185,7 +185,9 @@ void release_thread(struct task_struct *dead_task)
*/
void prepare_to_copy(struct task_struct *tsk)
{
- unlazy_fpu(tsk);
+ preempt_disable();
+ save_xstates(tsk);
+ preempt_enable();
}

int copy_thread(unsigned long clone_flags, unsigned long sp,
@@ -294,21 +296,13 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
*next = &next_p->thread;
int cpu = smp_processor_id();
struct tss_struct *tss = &per_cpu(init_tss, cpu);
- bool preload_fpu;

/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */

- /*
- * If the task has used fpu the last 5 timeslices, just do a full
- * restore of the math state immediately to avoid the trap; the
- * chances of needing FPU soon are obviously high now
- */
- preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;
-
- __unlazy_fpu(prev_p);
+ save_xstates(prev_p);

/* we're going to use this soon, after a few expensive things */
- if (preload_fpu)
+ if (task_thread_info(next_p)->xstate_mask)
prefetch(next->fpu.state);

/*
@@ -349,11 +343,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT))
__switch_to_xtra(prev_p, next_p, tss);

- /* If we're going to preload the fpu context, make sure clts
- is run while we're batching the cpu state updates. */
- if (preload_fpu)
- clts();
-
/*
* Leave lazy mode, flushing any hypercalls made here.
* This must be done before restoring TLS segments so
@@ -363,8 +352,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
*/
arch_end_context_switch(next_p);

- if (preload_fpu)
- __math_state_restore();
+ /*
+ * Restore enabled extended states for the task.
+ */
+ restore_xstates(next_p, task_thread_info(next_p)->xstate_mask);

/*
* Restore %gs if needed (which is common)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index bd387e8..67c5838 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -249,7 +249,9 @@ static inline u32 read_32bit_tls(struct task_struct *t, int tls)
*/
void prepare_to_copy(struct task_struct *tsk)
{
- unlazy_fpu(tsk);
+ preempt_disable();
+ save_xstates(tsk);
+ preempt_enable();
}

int copy_thread(unsigned long clone_flags, unsigned long sp,
@@ -378,17 +380,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
int cpu = smp_processor_id();
struct tss_struct *tss = &per_cpu(init_tss, cpu);
unsigned fsindex, gsindex;
- bool preload_fpu;
-
- /*
- * If the task has used fpu the last 5 timeslices, just do a full
- * restore of the math state immediately to avoid the trap; the
- * chances of needing FPU soon are obviously high now
- */
- preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;

/* we're going to use this soon, after a few expensive things */
- if (preload_fpu)
+ if (task_thread_info(next_p)->xstate_mask)
prefetch(next->fpu.state);

/*
@@ -420,11 +414,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
load_TLS(next, cpu);

/* Must be after DS reload */
- __unlazy_fpu(prev_p);
-
- /* Make sure cpu is ready for new context */
- if (preload_fpu)
- clts();
+ save_xstates(prev_p);

/*
* Leave lazy mode, flushing any hypercalls made here.
@@ -485,11 +475,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
__switch_to_xtra(prev_p, next_p, tss);

/*
- * Preload the FPU context, now that we've determined that the
- * task is likely to be using it.
+ * Restore enabled extended states for the task.
*/
- if (preload_fpu)
- __math_state_restore();
+ restore_xstates(next_p, task_thread_info(next_p)->xstate_mask);

return prev_p;
}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 32f3043..072c30e 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -625,7 +625,10 @@ void math_error(struct pt_regs *regs, int error_code, int trapnr)
/*
* Save the info for the exception handler and clear the error.
*/
- save_init_fpu(task);
+ preempt_disable();
+ save_xstates(task);
+ preempt_enable();
+
task->thread.trap_no = trapnr;
task->thread.error_code = error_code;
info.si_signo = SIGFPE;
@@ -734,7 +737,7 @@ void __math_state_restore(void)
return;
}

- thread->status |= TS_USEDFPU; /* So we fnsave on switch_to() */
+ thread->status |= TS_USEDFPU; /* So we fnsave on switch_to() */
tsk->fpu_counter++;
}

@@ -768,9 +771,7 @@ asmlinkage void math_state_restore(void)
local_irq_disable();
}

- clts(); /* Allow maths ops (or we recurse) */
-
- __math_state_restore();
+ restore_xstates(tsk, XCNTXT_LAZY);
}
EXPORT_SYMBOL_GPL(math_state_restore);

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index e204b07..c422527 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -5,6 +5,7 @@
*/
#include <linux/bootmem.h>
#include <linux/compat.h>
+#include <linux/module.h>
#include <asm/i387.h>
#ifdef CONFIG_IA32_EMULATION
#include <asm/sigcontext32.h>
@@ -474,3 +475,73 @@ void __cpuinit xsave_init(void)
next_func = xstate_enable;
this_func();
}
+
+void xsave(struct fpu *fpu, u64 mask)
+{
+ clts();
+
+ if (use_xsave())
+ fpu_xsave(&fpu->state->xsave, mask);
+ else if (mask & XCNTXT_LAZY)
+ fpu_save(fpu);
+
+ if (mask & XCNTXT_LAZY)
+ fpu_clean(fpu);
+
+ stts();
+}
+EXPORT_SYMBOL(xsave);
+
+void save_xstates(struct task_struct *tsk)
+{
+ struct thread_info *ti = task_thread_info(tsk);
+
+ if (!fpu_allocated(&tsk->thread.fpu))
+ return;
+
+ xsave(&tsk->thread.fpu, ti->xstate_mask);
+
+ if (!(ti->xstate_mask & XCNTXT_LAZY))
+ tsk->fpu_counter = 0;
+
+ /*
+ * If the task hasn't used the fpu the last 5 timeslices,
+ * force a lazy restore of the math states by clearing them
+ * from xstate_mask.
+ */
+ if (tsk->fpu_counter < 5)
+ ti->xstate_mask &= ~XCNTXT_LAZY;
+
+ ti->status &= ~TS_USEDFPU;
+}
+EXPORT_SYMBOL(save_xstates);
+
+void xrstor(struct fpu *fpu, u64 mask)
+{
+ clts();
+
+ if (use_xsave())
+ xrstor_state(&fpu->state->xsave, mask);
+ else if (mask & XCNTXT_LAZY)
+ fpu_restore(fpu);
+
+ if (!(mask & XCNTXT_LAZY))
+ stts();
+}
+EXPORT_SYMBOL(xrstor);
+
+void restore_xstates(struct task_struct *tsk, u64 mask)
+{
+ struct thread_info *ti = task_thread_info(tsk);
+
+ if (!fpu_allocated(&tsk->thread.fpu))
+ return;
+
+ xrstor(&tsk->thread.fpu, mask);
+
+ ti->xstate_mask |= mask;
+ ti->status |= TS_USEDFPU;
+ if (mask & XCNTXT_LAZY)
+ tsk->fpu_counter++;
+}
+EXPORT_SYMBOL(restore_xstates);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bcc0efc..8fb21ea 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -58,6 +58,7 @@
#include <asm/xcr.h>
#include <asm/pvclock.h>
#include <asm/div64.h>
+#include <asm/xsave.h>

#define MAX_IO_MSRS 256
#define CR0_RESERVED_BITS \
@@ -5793,8 +5794,8 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
*/
kvm_put_guest_xcr0(vcpu);
vcpu->guest_fpu_loaded = 1;
- unlazy_fpu(current);
- fpu_restore_checking(&vcpu->arch.guest_fpu);
+ save_xstates(current);
+ xrstor(&vcpu->arch.guest_fpu, -1);
trace_kvm_fpu(1);
}

@@ -5806,7 +5807,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
return;

vcpu->guest_fpu_loaded = 0;
- fpu_save_init(&vcpu->arch.guest_fpu);
+ xsave(&vcpu->arch.guest_fpu, -1);
++vcpu->stat.fpu_reload;
kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
trace_kvm_fpu(0);
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 9f1659c..ef62289 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -204,7 +204,7 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
* uses the FPU.
*/
if (cpu->ts)
- unlazy_fpu(current);
+ save_xstates(current);

/*
* SYSENTER is an optimized way of doing system calls. We can't allow
--
1.5.6.5

2011-03-09 19:15:23

by Hans Rosenfeld

Subject: [RFC 1/8] x86, xsave: cleanup fpu/xsave support

Removed the functions fpu_fxrstor_checking() and restore_fpu_checking()
because they were nothing more than trivial wrappers. Removed redundant
xsave/xrstor implementations. Since xsave/xrstor is not specific to the
FPU, and also for consistency, all xsave/xrstor functions now take an
xsave_struct argument.
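
As an example of the new calling convention, fpu_restore_checking() now
reads (see the i387.h hunk below):

static inline int fpu_restore_checking(struct fpu *fpu)
{
        if (use_xsave())
                return xrstor_checking(&fpu->state->xsave, -1);
        else
                return fxrstor_checking(&fpu->state->fxsave);
}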

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 20 +++-------
arch/x86/include/asm/xsave.h | 81 +++++++++++++++---------------------------
arch/x86/kernel/traps.c | 2 +-
arch/x86/kernel/xsave.c | 4 +-
4 files changed, 38 insertions(+), 69 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index ef32890..d908383 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -227,12 +227,14 @@ static inline void fpu_fxsave(struct fpu *fpu)
static inline void fpu_save_init(struct fpu *fpu)
{
if (use_xsave()) {
- fpu_xsave(fpu);
+ struct xsave_struct *xstate = &fpu->state->xsave;
+
+ fpu_xsave(xstate);

/*
* xsave header may indicate the init state of the FP.
*/
- if (!(fpu->state->xsave.xsave_hdr.xstate_bv & XSTATE_FP))
+ if (!(xstate->xsave_hdr.xstate_bv & XSTATE_FP))
return;
} else if (use_fxsr()) {
fpu_fxsave(fpu);
@@ -262,22 +264,12 @@ static inline void __save_init_fpu(struct task_struct *tsk)
task_thread_info(tsk)->status &= ~TS_USEDFPU;
}

-static inline int fpu_fxrstor_checking(struct fpu *fpu)
-{
- return fxrstor_checking(&fpu->state->fxsave);
-}
-
static inline int fpu_restore_checking(struct fpu *fpu)
{
if (use_xsave())
- return fpu_xrstor_checking(fpu);
+ return xrstor_checking(&fpu->state->xsave, -1);
else
- return fpu_fxrstor_checking(fpu);
-}
-
-static inline int restore_fpu_checking(struct task_struct *tsk)
-{
- return fpu_restore_checking(&tsk->thread.fpu);
+ return fxrstor_checking(&fpu->state->fxsave);
}

/*
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index c6ce245..8bcbbce 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -42,10 +42,11 @@ extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
void __user *fpstate,
struct _fpx_sw_bytes *sw);

-static inline int fpu_xrstor_checking(struct fpu *fpu)
+static inline int xrstor_checking(struct xsave_struct *fx, u64 mask)
{
- struct xsave_struct *fx = &fpu->state->xsave;
int err;
+ u32 lmask = mask;
+ u32 hmask = mask >> 32;

asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
"2:\n"
@@ -55,13 +56,23 @@ static inline int fpu_xrstor_checking(struct fpu *fpu)
".previous\n"
_ASM_EXTABLE(1b, 3b)
: [err] "=r" (err)
- : "D" (fx), "m" (*fx), "a" (-1), "d" (-1), "0" (0)
+ : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask), "0" (0)
: "memory");

return err;
}

-static inline int xsave_user(struct xsave_struct __user *buf)
+static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
+{
+ u32 lmask = mask;
+ u32 hmask = mask >> 32;
+
+ asm volatile(".byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
+ : : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
+ : "memory");
+}
+
+static inline int xsave_checking(struct xsave_struct __user *buf)
{
int err;

@@ -74,58 +85,24 @@ static inline int xsave_user(struct xsave_struct __user *buf)
if (unlikely(err))
return -EFAULT;

- __asm__ __volatile__("1: .byte " REX_PREFIX "0x0f,0xae,0x27\n"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- ".section __ex_table,\"a\"\n"
- _ASM_ALIGN "\n"
- _ASM_PTR "1b,3b\n"
- ".previous"
- : [err] "=r" (err)
- : "D" (buf), "a" (-1), "d" (-1), "0" (0)
- : "memory");
+ asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x27\n"
+ "2:\n"
+ ".section .fixup,\"ax\"\n"
+ "3: movl $-1,%[err]\n"
+ " jmp 2b\n"
+ ".previous\n"
+ _ASM_EXTABLE(1b,3b)
+ : [err] "=r" (err)
+ : "D" (buf), "a" (-1), "d" (-1), "0" (0)
+ : "memory");
+
if (unlikely(err) && __clear_user(buf, xstate_size))
err = -EFAULT;
- /* No need to clear here because the caller clears USED_MATH */
- return err;
-}
-
-static inline int xrestore_user(struct xsave_struct __user *buf, u64 mask)
-{
- int err;
- struct xsave_struct *xstate = ((__force struct xsave_struct *)buf);
- u32 lmask = mask;
- u32 hmask = mask >> 32;

- __asm__ __volatile__("1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- ".section __ex_table,\"a\"\n"
- _ASM_ALIGN "\n"
- _ASM_PTR "1b,3b\n"
- ".previous"
- : [err] "=r" (err)
- : "D" (xstate), "a" (lmask), "d" (hmask), "0" (0)
- : "memory"); /* memory required? */
+ /* No need to clear here because the caller clears USED_MATH */
return err;
}

-static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
-{
- u32 lmask = mask;
- u32 hmask = mask >> 32;
-
- asm volatile(".byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
- : : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
- : "memory");
-}
-
static inline void xsave_state(struct xsave_struct *fx, u64 mask)
{
u32 lmask = mask;
@@ -136,7 +113,7 @@ static inline void xsave_state(struct xsave_struct *fx, u64 mask)
: "memory");
}

-static inline void fpu_xsave(struct fpu *fpu)
+static inline void fpu_xsave(struct xsave_struct *fx)
{
/* This, however, we can work around by forcing the compiler to select
an addressing mode that doesn't require extended registers. */
@@ -144,7 +121,7 @@ static inline void fpu_xsave(struct fpu *fpu)
".byte " REX_PREFIX "0x0f,0xae,0x27",
".byte " REX_PREFIX "0x0f,0xae,0x37",
X86_FEATURE_XSAVEOPT,
- [fx] "D" (&fpu->state->xsave), "a" (-1), "d" (-1) :
+ [fx] "D" (fx), "a" (-1), "d" (-1) :
"memory");
}
#endif
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b9b6716..32f3043 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -728,7 +728,7 @@ void __math_state_restore(void)
/*
* Paranoid restore. send a SIGSEGV if we fail to restore the state.
*/
- if (unlikely(restore_fpu_checking(tsk))) {
+ if (unlikely(fpu_restore_checking(&tsk->thread.fpu))) {
stts();
force_sig(SIGSEGV, tsk);
return;
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 5471285..e204b07 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -170,7 +170,7 @@ int save_i387_xstate(void __user *buf)

if (task_thread_info(tsk)->status & TS_USEDFPU) {
if (use_xsave())
- err = xsave_user(buf);
+ err = xsave_checking(buf);
else
err = fxsave_user(buf);

@@ -247,7 +247,7 @@ static int restore_user_xstate(void __user *buf)
/*
* restore the state passed by the user.
*/
- err = xrestore_user(buf, mask);
+ err = xrstor_checking((__force struct xsave_struct *)buf, mask);
if (err)
return err;

--
1.5.6.5

2011-03-09 19:15:27

by Hans Rosenfeld

Subject: [RFC 6/8] x86, xsave: add support for non-lazy xstates

Non-lazy xstates are, as the name suggests, extended states that cannot
be saved or restored lazily. The state for AMD's LWP feature is an
example of this.

This patch adds support for this kind of xstate. If any such states are
present and supported on the running system, they will always be enabled
in xstate_mask so that they are always restored in switch_to(). Since lazy
allocation of the xstate area won't work when non-lazy xstates are used,
all tasks will always have an xstate area preallocated.
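
The split of the feature mask is straightforward (from the xsave.h hunk
below); patch 7/8 then presumably only has to add the LWP bit to
XCNTXT_NONLAZY to have that state switched non-lazily:

#define XCNTXT_LAZY    (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
#define XCNTXT_NONLAZY 0       /* patch 7/8 presumably adds the LWP bit here */

#define XCNTXT_MASK    (XCNTXT_LAZY | XCNTXT_NONLAZY)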

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 11 +++++++++++
arch/x86/include/asm/xsave.h | 5 +++--
arch/x86/kernel/process_32.c | 2 +-
arch/x86/kernel/process_64.c | 2 +-
arch/x86/kernel/xsave.c | 9 +++++++++
5 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index c81b63e..b3b3f17 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -335,6 +335,17 @@ static inline void fpu_copy(struct fpu *dst, struct fpu *src)

extern void fpu_finit(struct fpu *fpu);

+static inline void fpu_clear(struct fpu *fpu)
+{
+ if (pcntxt_mask & XCNTXT_NONLAZY) {
+ memset(fpu->state, 0, xstate_size);
+ fpu_finit(fpu);
+ set_used_math();
+ } else {
+ fpu_free(fpu);
+ }
+}
+
#endif /* __ASSEMBLY__ */

#endif /* _ASM_X86_I387_H */
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index fbbc7db..18401cc 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -23,9 +23,10 @@
/*
* These are the features that the OS can handle currently.
*/
-#define XCNTXT_MASK (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
+#define XCNTXT_LAZY (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
+#define XCNTXT_NONLAZY 0

-#define XCNTXT_LAZY XCNTXT_MASK
+#define XCNTXT_MASK (XCNTXT_LAZY | XCNTXT_NONLAZY)

#ifdef CONFIG_X86_64
#define REX_PREFIX "0x48, "
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8df07c3..a878736 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -257,7 +257,7 @@ start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
/*
* Free the old FP and other extended state
*/
- free_thread_xstate(current);
+ fpu_clear(&current->thread.fpu);
}
EXPORT_SYMBOL_GPL(start_thread);

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 67c5838..67a6bc9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -344,7 +344,7 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
/*
* Free the old FP and other extended state
*/
- free_thread_xstate(current);
+ fpu_clear(&current->thread.fpu);
}

void
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index b6d6f38..d4050fa 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -16,6 +16,7 @@
* Supported feature mask by the CPU and the kernel.
*/
u64 pcntxt_mask;
+EXPORT_SYMBOL(pcntxt_mask);

/*
* Represents init state for the supported extended state.
@@ -479,6 +480,14 @@ static void __init xstate_enable_boot_cpu(void)
printk(KERN_INFO "xsave/xrstor: enabled xstate_bv 0x%llx, "
"cntxt size 0x%x\n",
pcntxt_mask, xstate_size);
+
+ if (pcntxt_mask & XCNTXT_NONLAZY) {
+ static union thread_xstate x;
+
+ task_thread_info(&init_task)->xstate_mask |= XCNTXT_NONLAZY;
+ init_task.thread.fpu.state = &x;
+ fpu_finit(&init_task.thread.fpu);
+ }
}

/*
--
1.5.6.5

2011-03-09 19:15:30

by Hans Rosenfeld

Subject: [RFC 8/8] x86, xsave: remove lazy allocation of xstate area

This patch completely removes lazy allocation of the xstate area. All
tasks will always have an xstate area preallocated, just like they
already do when non-lazy features are present. The size of the xsave
area ranges from 112 to 960 bytes, depending on the xstates present and
enabled. Since it is common to use SSE etc. for optimization, the actual
overhead is expected to be negligible.

This removes some of the special-case handling of non-lazy xstates. It
also greatly simplifies init_fpu() by removing the allocation code, the
check for the presence of the xstate area, and the init_fpu() return value.
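
For the callers this means the error handling around init_fpu() simply
disappears, e.g. (as in the i387.c hunks below):

-       ret = init_fpu(target);
-       if (ret)
-               return ret;
+       init_fpu(target);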

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 12 +++-------
arch/x86/kernel/i387.c | 46 +++++++++++-----------------------------
arch/x86/kernel/traps.c | 16 +------------
arch/x86/kernel/xsave.c | 21 ++----------------
arch/x86/kvm/x86.c | 4 +-
arch/x86/math-emu/fpu_entry.c | 8 +-----
6 files changed, 26 insertions(+), 81 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index b3b3f17..70136e3 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -40,7 +40,7 @@
extern unsigned int sig_xstate_size;
extern void fpu_init(void);
extern void mxcsr_feature_mask_init(void);
-extern int init_fpu(struct task_struct *child);
+extern void init_fpu(struct task_struct *child);
extern asmlinkage void math_state_restore(void);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);

@@ -337,13 +337,9 @@ extern void fpu_finit(struct fpu *fpu);

static inline void fpu_clear(struct fpu *fpu)
{
- if (pcntxt_mask & XCNTXT_NONLAZY) {
- memset(fpu->state, 0, xstate_size);
- fpu_finit(fpu);
- set_used_math();
- } else {
- fpu_free(fpu);
- }
+ memset(fpu->state, 0, xstate_size);
+ fpu_finit(fpu);
+ set_used_math();
}

#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 88fefba..32b3c8d 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -42,6 +42,8 @@ void __cpuinit mxcsr_feature_mask_init(void)

static void __cpuinit init_thread_xstate(void)
{
+ static union thread_xstate x;
+
/*
* Note that xstate_size might be overwriten later during
* xsave_init().
@@ -62,6 +64,9 @@ static void __cpuinit init_thread_xstate(void)
xstate_size = sizeof(struct i387_fxsave_struct);
else
xstate_size = sizeof(struct i387_fsave_struct);
+
+ init_task.thread.fpu.state = &x;
+ fpu_finit(&init_task.thread.fpu);
}

/*
@@ -127,30 +132,20 @@ EXPORT_SYMBOL_GPL(fpu_finit);
* value at reset if we support XMM instructions and then
* remeber the current task has used the FPU.
*/
-int init_fpu(struct task_struct *tsk)
+void init_fpu(struct task_struct *tsk)
{
- int ret;
-
if (tsk_used_math(tsk)) {
if (HAVE_HWFP && tsk == current) {
preempt_disable();
save_xstates(tsk);
preempt_enable();
}
- return 0;
+ return;
}

- /*
- * Memory allocation at the first usage of the FPU and other state.
- */
- ret = fpu_alloc(&tsk->thread.fpu);
- if (ret)
- return ret;
-
fpu_finit(&tsk->thread.fpu);

set_stopped_child_used_math(tsk);
- return 0;
}
EXPORT_SYMBOL_GPL(init_fpu);

@@ -173,14 +168,10 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
- int ret;
-
if (!cpu_has_fxsr)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (use_xsaveopt())
sanitize_i387_state(target);
@@ -198,9 +189,7 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_fxsr)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (use_xsaveopt())
sanitize_i387_state(target);
@@ -232,9 +221,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_xsave)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

/*
* Copy the 48bytes defined by the software first into the xstate
@@ -262,9 +249,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_xsave)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.fpu.state->xsave, 0, -1);
@@ -427,11 +412,8 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
void *kbuf, void __user *ubuf)
{
struct user_i387_ia32_struct env;
- int ret;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (!HAVE_HWFP)
return fpregs_soft_get(target, regset, pos, count, kbuf, ubuf);
@@ -462,9 +444,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
struct user_i387_ia32_struct env;
int ret;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (use_xsaveopt())
sanitize_i387_state(target);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 872fc78..c8fbd04 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -734,20 +734,8 @@ asmlinkage void math_state_restore(void)
struct thread_info *thread = current_thread_info();
struct task_struct *tsk = thread->task;

- if (!tsk_used_math(tsk)) {
- local_irq_enable();
- /*
- * does a slab alloc which can sleep
- */
- if (init_fpu(tsk)) {
- /*
- * ran out of memory!
- */
- do_group_exit(SIGKILL);
- return;
- }
- local_irq_disable();
- }
+ if (!tsk_used_math(tsk))
+ init_fpu(tsk);

restore_xstates(tsk, XCNTXT_LAZY);
}
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index d4050fa..78f7a1c 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -262,7 +262,6 @@ int restore_xstates_sigframe(void __user *buf, unsigned int size)
struct _fpstate_ia32 __user *fp = buf;
struct xsave_struct *xsave;
int xstate_invalid = 0;
- int err;

if (!buf) {
if (used_math()) {
@@ -275,11 +274,8 @@ int restore_xstates_sigframe(void __user *buf, unsigned int size)
if (!access_ok(VERIFY_READ, buf, size))
return -EACCES;

- if (!used_math()) {
- err = init_fpu(tsk);
- if (err)
- return err;
- }
+ if (!used_math())
+ init_fpu(tsk);

if (!HAVE_HWFP) {
set_used_math();
@@ -481,13 +477,8 @@ static void __init xstate_enable_boot_cpu(void)
"cntxt size 0x%x\n",
pcntxt_mask, xstate_size);

- if (pcntxt_mask & XCNTXT_NONLAZY) {
- static union thread_xstate x;
-
+ if (pcntxt_mask & XCNTXT_NONLAZY)
task_thread_info(&init_task)->xstate_mask |= XCNTXT_NONLAZY;
- init_task.thread.fpu.state = &x;
- fpu_finit(&init_task.thread.fpu);
- }
}

/*
@@ -530,9 +521,6 @@ void save_xstates(struct task_struct *tsk)
{
struct thread_info *ti = task_thread_info(tsk);

- if (!fpu_allocated(&tsk->thread.fpu))
- return;
-
xsave(&tsk->thread.fpu.state->xsave, ti->xstate_mask);

if (!(ti->xstate_mask & XCNTXT_LAZY))
@@ -566,9 +554,6 @@ void restore_xstates(struct task_struct *tsk, u64 mask)
{
struct thread_info *ti = task_thread_info(tsk);

- if (!fpu_allocated(&tsk->thread.fpu))
- return;
-
xrstor(&tsk->thread.fpu.state->xsave, mask);

ti->xstate_mask |= mask;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 10aeb04..bd71b12 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5377,8 +5377,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
int r;
sigset_t sigsaved;

- if (!tsk_used_math(current) && init_fpu(current))
- return -ENOMEM;
+ if (!tsk_used_math(current))
+ init_fpu(current);

if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 7718541..472e2b9 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -147,12 +147,8 @@ void math_emulate(struct math_emu_info *info)
unsigned long code_limit = 0; /* Initialized to stop compiler warnings */
struct desc_struct code_descriptor;

- if (!used_math()) {
- if (init_fpu(current)) {
- do_group_exit(SIGKILL);
- return;
- }
- }
+ if (!used_math())
+ init_fpu(current);

#ifdef RE_ENTRANT_CHECKING
if (emulating) {
--
1.5.6.5

2011-03-09 19:15:33

by Hans Rosenfeld

Subject: [RFC 3/8] x86, xsave: cleanup fpu/xsave signal frame setup

There are currently two code paths that handle the fpu/xsave context in
a signal frame for 32bit and 64bit tasks. These two code paths differ
only in that they have or lack certain micro-optimizations or do some
additional work (fsave compatibility for 32bit). The code is complex,
mostly duplicated, and hard to understand and maintain.

This patch replaces them with two new, unified and cleaned-up functions.
Besides avoiding the duplicated code, it is now obvious what is done in
which situation. The micro-optimization for xsave (saving and restoring
directly to/from the user buffer) is gone, and with it the headaches of
validating the buffer alignment and contents and of catching possible
xsave/xrstor faults.
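
Both the 32bit and the 64bit signal paths now call the same pair of
functions and differ only in the size they pass (see the ia32_signal.c
and signal.c hunks below):

/* native / 64bit */
save_xstates_sigframe(*fpstate, sig_xstate_size);
restore_xstates_sigframe(buf, sig_xstate_size);

/* ia32 compat */
save_xstates_sigframe(*fpstate, sig_xstate_ia32_size);
restore_xstates_sigframe(buf, sig_xstate_ia32_size);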

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/ia32/ia32_signal.c | 4 +-
arch/x86/include/asm/i387.h | 20 ++++
arch/x86/include/asm/xsave.h | 4 +-
arch/x86/kernel/i387.c | 32 ++-----
arch/x86/kernel/signal.c | 4 +-
arch/x86/kernel/xsave.c | 199 ++++++++++++++++++++++++++++++++++++++++--
6 files changed, 227 insertions(+), 36 deletions(-)

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 588a7aa..2605fae 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -255,7 +255,7 @@ static int ia32_restore_sigcontext(struct pt_regs *regs,

get_user_ex(tmp, &sc->fpstate);
buf = compat_ptr(tmp);
- err |= restore_i387_xstate_ia32(buf);
+ err |= restore_xstates_sigframe(buf, sig_xstate_ia32_size);

get_user_ex(*pax, &sc->ax);
} get_user_catch(err);
@@ -396,7 +396,7 @@ static void __user *get_sigframe(struct k_sigaction *ka, struct pt_regs *regs,
if (used_math()) {
sp = sp - sig_xstate_ia32_size;
*fpstate = (struct _fpstate_ia32 *) sp;
- if (save_i387_xstate_ia32(*fpstate) < 0)
+ if (save_xstates_sigframe(*fpstate, sig_xstate_ia32_size) < 0)
return (void __user *) -1L;
}

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 939af08..30930bf 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -25,6 +25,20 @@
#include <asm/uaccess.h>
#include <asm/xsave.h>

+#ifdef CONFIG_X86_64
+# include <asm/sigcontext32.h>
+# include <asm/user32.h>
+#else
+# define save_i387_xstate_ia32 save_i387_xstate
+# define restore_i387_xstate_ia32 restore_i387_xstate
+# define _fpstate_ia32 _fpstate
+# define _xstate_ia32 _xstate
+# define sig_xstate_ia32_size sig_xstate_size
+# define fx_sw_reserved_ia32 fx_sw_reserved
+# define user_i387_ia32_struct user_i387_struct
+# define user32_fxsr_struct user_fxsr_struct
+#endif
+
extern unsigned int sig_xstate_size;
extern void fpu_init(void);
extern void mxcsr_feature_mask_init(void);
@@ -33,6 +47,9 @@ extern asmlinkage void math_state_restore(void);
extern void __math_state_restore(void);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);

+extern void convert_from_fxsr(struct user_i387_ia32_struct *, struct task_struct *);
+extern void convert_to_fxsr(struct task_struct *, const struct user_i387_ia32_struct *);
+
extern user_regset_active_fn fpregs_active, xfpregs_active;
extern user_regset_get_fn fpregs_get, xfpregs_get, fpregs_soft_get,
xstateregs_get;
@@ -46,6 +63,7 @@ extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set,
#define xstateregs_active fpregs_active

extern struct _fpx_sw_bytes fx_sw_reserved;
+extern unsigned int mxcsr_feature_mask;
#ifdef CONFIG_IA32_EMULATION
extern unsigned int sig_xstate_ia32_size;
extern struct _fpx_sw_bytes fx_sw_reserved_ia32;
@@ -56,8 +74,10 @@ extern int restore_i387_xstate_ia32(void __user *buf);
#endif

#ifdef CONFIG_MATH_EMULATION
+# define HAVE_HWFP (boot_cpu_data.hard_math)
extern void finit_soft_fpu(struct i387_soft_struct *soft);
#else
+# define HAVE_HWFP 1
static inline void finit_soft_fpu(struct i387_soft_struct *soft) {}
#endif

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 6052a84..200c56d 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -41,12 +41,14 @@ extern void xsave(struct fpu *, u64);
extern void xrstor(struct fpu *, u64);
extern void save_xstates(struct task_struct *);
extern void restore_xstates(struct task_struct *, u64);
+extern int save_xstates_sigframe(void __user *, unsigned int);
+extern int restore_xstates_sigframe(void __user *, unsigned int);

extern void xsave_init(void);
extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
extern int init_fpu(struct task_struct *child);
extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
- void __user *fpstate,
+ unsigned int size,
struct _fpx_sw_bytes *sw);

static inline int xrstor_checking(struct xsave_struct *fx, u64 mask)
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 5ab66ec..5cec7c2 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -18,27 +18,7 @@
#include <asm/i387.h>
#include <asm/user.h>

-#ifdef CONFIG_X86_64
-# include <asm/sigcontext32.h>
-# include <asm/user32.h>
-#else
-# define save_i387_xstate_ia32 save_i387_xstate
-# define restore_i387_xstate_ia32 restore_i387_xstate
-# define _fpstate_ia32 _fpstate
-# define _xstate_ia32 _xstate
-# define sig_xstate_ia32_size sig_xstate_size
-# define fx_sw_reserved_ia32 fx_sw_reserved
-# define user_i387_ia32_struct user_i387_struct
-# define user32_fxsr_struct user_fxsr_struct
-#endif
-
-#ifdef CONFIG_MATH_EMULATION
-# define HAVE_HWFP (boot_cpu_data.hard_math)
-#else
-# define HAVE_HWFP 1
-#endif
-
-static unsigned int mxcsr_feature_mask __read_mostly = 0xffffffffu;
+unsigned int mxcsr_feature_mask __read_mostly = 0xffffffffu;
unsigned int xstate_size;
EXPORT_SYMBOL_GPL(xstate_size);
unsigned int sig_xstate_ia32_size = sizeof(struct _fpstate_ia32);
@@ -375,7 +355,7 @@ static inline u32 twd_fxsr_to_i387(struct i387_fxsave_struct *fxsave)
* FXSR floating point environment conversions.
*/

-static void
+void
convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
{
struct i387_fxsave_struct *fxsave = &tsk->thread.fpu.state->fxsave;
@@ -412,8 +392,8 @@ convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
memcpy(&to[i], &from[i], sizeof(to[0]));
}

-static void convert_to_fxsr(struct task_struct *tsk,
- const struct user_i387_ia32_struct *env)
+void convert_to_fxsr(struct task_struct *tsk,
+ const struct user_i387_ia32_struct *env)

{
struct i387_fxsave_struct *fxsave = &tsk->thread.fpu.state->fxsave;
@@ -653,7 +633,9 @@ static int restore_i387_xsave(void __user *buf)
u64 mask;
int err;

- if (check_for_xstate(fx, buf, &fx_sw_user))
+ if (check_for_xstate(fx, sig_xstate_ia32_size -
+ offsetof(struct _fpstate_ia32, _fxsr_env),
+ &fx_sw_user))
goto fx_only;

mask = fx_sw_user.xstate_bv;
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 4fd173c..f6705ff 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -117,7 +117,7 @@ restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
regs->orig_ax = -1; /* disable syscall checks */

get_user_ex(buf, &sc->fpstate);
- err |= restore_i387_xstate(buf);
+ err |= restore_xstates_sigframe(buf, sig_xstate_size);

get_user_ex(*pax, &sc->ax);
} get_user_catch(err);
@@ -252,7 +252,7 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
return (void __user *)-1L;

/* save i387 state */
- if (used_math() && save_i387_xstate(*fpstate) < 0)
+ if (used_math() && save_xstates_sigframe(*fpstate, sig_xstate_size) < 0)
return (void __user *)-1L;

return (void __user *)sp;
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index c422527..df9b0bb 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -103,8 +103,7 @@ void __sanitize_i387_state(struct task_struct *tsk)
* Check for the presence of extended state information in the
* user fpstate pointer in the sigcontext.
*/
-int check_for_xstate(struct i387_fxsave_struct __user *buf,
- void __user *fpstate,
+int check_for_xstate(struct i387_fxsave_struct __user *buf, unsigned int size,
struct _fpx_sw_bytes *fx_sw_user)
{
int min_xstate_size = sizeof(struct i387_fxsave_struct) +
@@ -131,11 +130,11 @@ int check_for_xstate(struct i387_fxsave_struct __user *buf,
fx_sw_user->xstate_size > fx_sw_user->extended_size)
return -EINVAL;

- err = __get_user(magic2, (__u32 *) (((void *)fpstate) +
- fx_sw_user->extended_size -
+ err = __get_user(magic2, (__u32 *) (((void *)buf) + size -
FP_XSTATE_MAGIC2_SIZE));
if (err)
return err;
+
/*
* Check for the presence of second magic word at the end of memory
* layout. This detects the case where the user just copied the legacy
@@ -148,11 +147,109 @@ int check_for_xstate(struct i387_fxsave_struct __user *buf,
return 0;
}

-#ifdef CONFIG_X86_64
/*
* Signal frame handlers.
*/
+int save_xstates_sigframe(void __user *buf, unsigned int size)
+{
+ void __user *buf_fxsave = buf;
+ struct task_struct *tsk = current;
+ struct xsave_struct *xsave = &tsk->thread.fpu.state->xsave;
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ int ia32 = size == sig_xstate_ia32_size;
+#endif
+ int err;
+
+ if (!access_ok(VERIFY_WRITE, buf, size))
+ return -EACCES;
+
+ BUG_ON(size < xstate_size);
+
+ if (!used_math())
+ return 0;
+
+ clear_used_math(); /* trigger finit */
+
+ if (!HAVE_HWFP)
+ return fpregs_soft_get(current, NULL, 0,
+ sizeof(struct user_i387_ia32_struct), NULL,
+ (struct _fpstate_ia32 __user *) buf) ? -1 : 1;
+
+ save_xstates(tsk);
+ sanitize_i387_state(tsk);
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ if (ia32) {
+ if (cpu_has_xsave || cpu_has_fxsr) {
+ struct user_i387_ia32_struct env;
+ struct _fpstate_ia32 __user *fp = buf;
+
+ convert_from_fxsr(&env, tsk);
+ if (__copy_to_user(buf, &env, sizeof(env)))
+ return -1;
+
+ err = __put_user(xsave->i387.swd, &fp->status);
+ err |= __put_user(X86_FXSR_MAGIC, &fp->magic);
+
+ if (err)
+ return -1;
+
+ buf_fxsave = fp->_fxsr_env;
+ size -= offsetof(struct _fpstate_ia32, _fxsr_env);
+#if defined(CONFIG_X86_64)
+ buf = buf_fxsave;
+#endif
+ } else {
+ struct i387_fsave_struct *fsave =
+ &tsk->thread.fpu.state->fsave;
+
+ fsave->status = fsave->swd;
+ }
+ }
+#endif
+
+ if (__copy_to_user(buf_fxsave, xsave, size))
+ return -1;
+
+ if (cpu_has_xsave) {
+ struct _fpstate __user *fp = buf;
+ struct _xstate __user *x = buf;
+ u64 xstate_bv = xsave->xsave_hdr.xstate_bv;
+
+ err = __copy_to_user(&fp->sw_reserved,
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ ia32 ? &fx_sw_reserved_ia32 :
+#endif
+ &fx_sw_reserved,
+ sizeof (struct _fpx_sw_bytes));
+
+ err |= __put_user(FP_XSTATE_MAGIC2,
+ (__u32 __user *) (buf_fxsave + size
+ - FP_XSTATE_MAGIC2_SIZE));
+
+ /*
+ * For legacy compatible, we always set FP/SSE bits in the bit
+ * vector while saving the state to the user context. This will
+ * enable us capturing any changes(during sigreturn) to
+ * the FP/SSE bits by the legacy applications which don't touch
+ * xstate_bv in the xsave header.
+ *
+ * xsave aware apps can change the xstate_bv in the xsave
+ * header as well as change any contents in the memory layout.
+ * xrestore as part of sigreturn will capture all the changes.
+ */
+ xstate_bv |= XSTATE_FPSSE;
+
+ err |= __put_user(xstate_bv, &x->xstate_hdr.xstate_bv);
+
+ if (err)
+ return err;
+ }

+ return 1;
+}
+
+#ifdef CONFIG_X86_64
int save_i387_xstate(void __user *buf)
{
struct task_struct *tsk = current;
@@ -240,7 +337,7 @@ static int restore_user_xstate(void __user *buf)
int err;

if (((unsigned long)buf % 64) ||
- check_for_xstate(buf, buf, &fx_sw_user))
+ check_for_xstate(buf, sig_xstate_size, &fx_sw_user))
goto fx_only;

mask = fx_sw_user.xstate_bv;
@@ -315,6 +412,96 @@ clear:
}
#endif

+int restore_xstates_sigframe(void __user *buf, unsigned int size)
+{
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ struct user_i387_ia32_struct env;
+ int ia32 = size == sig_xstate_ia32_size;
+#endif
+ struct _fpx_sw_bytes fx_sw_user;
+ struct task_struct *tsk = current;
+ struct _fpstate_ia32 __user *fp = buf;
+ struct xsave_struct *xsave;
+ int xstate_invalid = 0;
+ int err;
+
+ if (!buf) {
+ if (used_math()) {
+ clear_fpu(tsk);
+ clear_used_math();
+ }
+ return 0;
+ }
+
+ if (!access_ok(VERIFY_READ, buf, size))
+ return -EACCES;
+
+ if (!used_math()) {
+ err = init_fpu(tsk);
+ if (err)
+ return err;
+ }
+
+ if (!HAVE_HWFP) {
+ set_used_math();
+ return fpregs_soft_set(current, NULL,
+ 0, sizeof(struct user_i387_ia32_struct),
+ NULL, fp) != 0;
+ }
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ if (ia32 && (cpu_has_xsave || cpu_has_fxsr)) {
+ if (__copy_from_user(&env, buf, sizeof(env)))
+ return -1;
+ buf = fp->_fxsr_env;
+ size -= offsetof(struct _fpstate_ia32, _fxsr_env);
+ }
+#endif
+
+ if (cpu_has_xsave)
+ xstate_invalid = check_for_xstate(buf, size, &fx_sw_user);
+
+ xsave = &tsk->thread.fpu.state->xsave;
+ if (__copy_from_user(xsave, buf, xstate_size))
+ return -1;
+
+ if (cpu_has_xsave) {
+ u64 *xstate_bv = &xsave->xsave_hdr.xstate_bv;
+
+ /*
+ * If this is no valid xstate, disable all extended states.
+ *
+ * For valid xstates, clear any illegal bits and any bits
+ * that have been cleared in fx_sw_user.xstate_bv.
+ */
+ if (xstate_invalid)
+ *xstate_bv = XSTATE_FPSSE;
+ else
+ *xstate_bv &= pcntxt_mask & fx_sw_user.xstate_bv;
+
+ task_thread_info(tsk)->xstate_mask |= *xstate_bv;
+
+ xsave->xsave_hdr.reserved1[0] =
+ xsave->xsave_hdr.reserved1[1] = 0;
+ } else {
+ task_thread_info(tsk)->xstate_mask |= XCNTXT_LAZY;
+ }
+
+ if (cpu_has_xsave || cpu_has_fxsr) {
+ xsave->i387.mxcsr &= mxcsr_feature_mask;
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ if (ia32)
+ convert_to_fxsr(tsk, &env);
+#endif
+ }
+
+ set_used_math();
+ restore_xstates(tsk, task_thread_info(tsk)->xstate_mask);
+
+ return 0;
+}
+
/*
* Prepare the SW reserved portion of the fxsave memory layout, indicating
* the presence of the extended state information in the memory layout
--
1.5.6.5

2011-03-09 19:16:10

by Hans Rosenfeld

Subject: [RFC 5/8] x86, xsave: more cleanups

Removed some unused declarations from headers.

Retired TS_USEDFPU; it has been replaced by the XCNTXT_* bits in
xstate_mask.

There is no reason functions like fpu_fxsave() etc. need to know or
handle anything other than a buffer to save/restore their state to/from.

sanitize_i387_state() is extra work that is only needed when xsaveopt is
used. There is no point in hiding this in an inline wrapper just to save
a single if() in the five places it is used; doing so only obscures a
fact that might well be interesting to whoever is reading the code,
without gaining anything.
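
At the call sites this just makes the condition explicit, e.g. (as in
the i387.c hunks below):

-       sanitize_i387_state(target);
+       if (use_xsaveopt())
+               sanitize_i387_state(target);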

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 48 ++++++++++++------------------------
arch/x86/include/asm/thread_info.h | 2 -
arch/x86/include/asm/xsave.h | 9 ++----
arch/x86/kernel/i387.c | 12 ++++++---
arch/x86/kernel/xsave.c | 32 +++++++++++------------
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 4 +-
7 files changed, 45 insertions(+), 64 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 97867ea..c81b63e 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -42,7 +42,6 @@ extern void fpu_init(void);
extern void mxcsr_feature_mask_init(void);
extern int init_fpu(struct task_struct *child);
extern asmlinkage void math_state_restore(void);
-extern void __math_state_restore(void);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);

extern void convert_from_fxsr(struct user_i387_ia32_struct *, struct task_struct *);
@@ -60,15 +59,10 @@ extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set,
*/
#define xstateregs_active fpregs_active

-extern struct _fpx_sw_bytes fx_sw_reserved;
extern unsigned int mxcsr_feature_mask;
+
#ifdef CONFIG_IA32_EMULATION
extern unsigned int sig_xstate_ia32_size;
-extern struct _fpx_sw_bytes fx_sw_reserved_ia32;
-struct _fpstate_ia32;
-struct _xstate_ia32;
-extern int save_i387_xstate_ia32(void __user *buf);
-extern int restore_i387_xstate_ia32(void __user *buf);
#endif

#ifdef CONFIG_MATH_EMULATION
@@ -76,7 +70,7 @@ extern int restore_i387_xstate_ia32(void __user *buf);
extern void finit_soft_fpu(struct i387_soft_struct *soft);
#else
# define HAVE_HWFP 1
-static inline void finit_soft_fpu(struct i387_soft_struct *soft) {}
+# define finit_soft_fpu(x)
#endif

#define X87_FSW_ES (1 << 7) /* Exception Summary */
@@ -96,15 +90,6 @@ static __always_inline __pure bool use_fxsr(void)
return static_cpu_has(X86_FEATURE_FXSR);
}

-extern void __sanitize_i387_state(struct task_struct *);
-
-static inline void sanitize_i387_state(struct task_struct *tsk)
-{
- if (!use_xsaveopt())
- return;
- __sanitize_i387_state(tsk);
-}
-
#ifdef CONFIG_X86_64
static inline void fxrstor(struct i387_fxsave_struct *fx)
{
@@ -118,7 +103,7 @@ static inline void fxrstor(struct i387_fxsave_struct *fx)
#endif
}

-static inline void fpu_fxsave(struct fpu *fpu)
+static inline void fpu_fxsave(struct i387_fxsave_struct *fx)
{
/* Using "rex64; fxsave %0" is broken because, if the memory operand
uses any extended registers for addressing, a second REX prefix
@@ -129,7 +114,7 @@ static inline void fpu_fxsave(struct fpu *fpu)
/* Using "fxsaveq %0" would be the ideal choice, but is only supported
starting with gas 2.16. */
__asm__ __volatile__("fxsaveq %0"
- : "=m" (fpu->state->fxsave));
+ : "=m" (*fx));
#else
/* Using, as a workaround, the properly prefixed form below isn't
accepted by any binutils version so far released, complaining that
@@ -140,8 +125,8 @@ static inline void fpu_fxsave(struct fpu *fpu)
This, however, we can work around by forcing the compiler to select
an addressing mode that doesn't require extended registers. */
asm volatile("rex64/fxsave (%[fx])"
- : "=m" (fpu->state->fxsave)
- : [fx] "R" (&fpu->state->fxsave));
+ : "=m" (*fx)
+ : [fx] "R" (fx));
#endif
}

@@ -161,10 +146,10 @@ static inline void fxrstor(struct i387_fxsave_struct *fx)
"m" (*fx));
}

-static inline void fpu_fxsave(struct fpu *fpu)
+static inline void fpu_fxsave(struct i387_fxsave_struct *fx)
{
asm volatile("fxsave %[fx]"
- : [fx] "=m" (fpu->state->fxsave));
+ : [fx] "=m" (*fx));
}

#endif /* CONFIG_X86_64 */
@@ -181,25 +166,25 @@ static inline void fpu_fxsave(struct fpu *fpu)
/*
* These must be called with preempt disabled
*/
-static inline void fpu_restore(struct fpu *fpu)
+static inline void fpu_restore(struct i387_fxsave_struct *fx)
{
- fxrstor(&fpu->state->fxsave);
+ fxrstor(fx);
}

-static inline void fpu_save(struct fpu *fpu)
+static inline void fpu_save(struct i387_fxsave_struct *fx)
{
if (use_fxsr()) {
- fpu_fxsave(fpu);
+ fpu_fxsave(fx);
} else {
asm volatile("fsave %[fx]; fwait"
- : [fx] "=m" (fpu->state->fsave));
+ : [fx] "=m" (*fx));
}
}

-static inline void fpu_clean(struct fpu *fpu)
+static inline void fpu_clean(struct i387_fxsave_struct *fx)
{
u32 swd = (use_fxsr() || use_xsave()) ?
- fpu->state->fxsave.swd : fpu->state->fsave.swd;
+ fx->swd : ((struct i387_fsave_struct *)fx)->swd;

if (unlikely(swd & X87_FSW_ES))
asm volatile("fnclex");
@@ -217,12 +202,11 @@ static inline void fpu_clean(struct fpu *fpu)

static inline void __clear_fpu(struct task_struct *tsk)
{
- if (task_thread_info(tsk)->status & TS_USEDFPU) {
+ if (task_thread_info(tsk)->xstate_mask & XCNTXT_LAZY) {
/* Ignore delayed exceptions from user space */
asm volatile("1: fwait\n"
"2:\n"
_ASM_EXTABLE(1b, 2b));
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
task_thread_info(tsk)->xstate_mask &= ~XCNTXT_LAZY;
stts();
}
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 5c92d21..13de316 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -238,8 +238,6 @@ static inline struct thread_info *current_thread_info(void)
* ever touches our thread-synchronous status, so we don't
* have to worry about atomic accesses.
*/
-#define TS_USEDFPU 0x0001 /* FPU was used by this task
- this quantum (SMP) */
#define TS_COMPAT 0x0002 /* 32bit syscall active (64BIT)*/
#define TS_POLLING 0x0004 /* idle task polling need_resched,
skip sending interrupt */
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 742da4a..fbbc7db 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -37,8 +37,8 @@ extern unsigned int xstate_size;
extern u64 pcntxt_mask;
extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];

-extern void xsave(struct fpu *, u64);
-extern void xrstor(struct fpu *, u64);
+extern void xsave(struct xsave_struct *, u64);
+extern void xrstor(struct xsave_struct *, u64);
extern void save_xstates(struct task_struct *);
extern void restore_xstates(struct task_struct *, u64);
extern int save_xstates_sigframe(void __user *, unsigned int);
@@ -46,10 +46,7 @@ extern int restore_xstates_sigframe(void __user *, unsigned int);

extern void xsave_init(void);
extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
-extern int init_fpu(struct task_struct *child);
-extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
- unsigned int size,
- struct _fpx_sw_bytes *sw);
+extern void sanitize_i387_state(struct task_struct *);

static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
{
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index d2d2b69..88fefba 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -182,7 +182,8 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;

- sanitize_i387_state(target);
+ if (use_xsaveopt())
+ sanitize_i387_state(target);

return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
&target->thread.fpu.state->fxsave, 0, -1);
@@ -201,7 +202,8 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;

- sanitize_i387_state(target);
+ if (use_xsaveopt())
+ sanitize_i387_state(target);

ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.fpu.state->fxsave, 0, -1);
@@ -440,7 +442,8 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
-1);
}

- sanitize_i387_state(target);
+ if (use_xsaveopt())
+ sanitize_i387_state(target);

if (kbuf && pos == 0 && count == sizeof(env)) {
convert_from_fxsr(kbuf, target);
@@ -463,7 +466,8 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;

- sanitize_i387_state(target);
+ if (use_xsaveopt())
+ sanitize_i387_state(target);

if (!HAVE_HWFP)
return fpregs_soft_set(target, regset, pos, count, kbuf, ubuf);
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index a10c13e..b6d6f38 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -39,7 +39,7 @@ static unsigned int *xstate_offsets, *xstate_sizes, xstate_features;
* that the user doesn't see some stale state in the memory layout during
* signal handling, debugging etc.
*/
-void __sanitize_i387_state(struct task_struct *tsk)
+void sanitize_i387_state(struct task_struct *tsk)
{
u64 xstate_bv;
int feature_bit = 0x2;
@@ -48,7 +48,7 @@ void __sanitize_i387_state(struct task_struct *tsk)
if (!fx)
return;

- BUG_ON(task_thread_info(tsk)->status & TS_USEDFPU);
+ BUG_ON(task_thread_info(tsk)->xstate_mask & XCNTXT_LAZY);

xstate_bv = tsk->thread.fpu.state->xsave.xsave_hdr.xstate_bv;

@@ -103,8 +103,8 @@ void __sanitize_i387_state(struct task_struct *tsk)
* Check for the presence of extended state information in the
* user fpstate pointer in the sigcontext.
*/
-int check_for_xstate(struct i387_fxsave_struct __user *buf, unsigned int size,
- struct _fpx_sw_bytes *fx_sw_user)
+static int check_for_xstate(struct i387_fxsave_struct __user *buf, unsigned int size,
+ struct _fpx_sw_bytes *fx_sw_user)
{
int min_xstate_size = sizeof(struct i387_fxsave_struct) +
sizeof(struct xsave_hdr_struct);
@@ -176,7 +176,8 @@ int save_xstates_sigframe(void __user *buf, unsigned int size)
(struct _fpstate_ia32 __user *) buf) ? -1 : 1;

save_xstates(tsk);
- sanitize_i387_state(tsk);
+ if (use_xsaveopt())
+ sanitize_i387_state(tsk);

#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
if (ia32) {
@@ -500,17 +501,17 @@ void __cpuinit xsave_init(void)
this_func();
}

-void xsave(struct fpu *fpu, u64 mask)
+void xsave(struct xsave_struct *x, u64 mask)
{
clts();

if (use_xsave())
- fpu_xsave(&fpu->state->xsave, mask);
+ fpu_xsave(x, mask);
else if (mask & XCNTXT_LAZY)
- fpu_save(fpu);
+ fpu_save(&x->i387);

if (mask & XCNTXT_LAZY)
- fpu_clean(fpu);
+ fpu_clean(&x->i387);

stts();
}
@@ -523,7 +524,7 @@ void save_xstates(struct task_struct *tsk)
if (!fpu_allocated(&tsk->thread.fpu))
return;

- xsave(&tsk->thread.fpu, ti->xstate_mask);
+ xsave(&tsk->thread.fpu.state->xsave, ti->xstate_mask);

if (!(ti->xstate_mask & XCNTXT_LAZY))
tsk->fpu_counter = 0;
@@ -535,19 +536,17 @@ void save_xstates(struct task_struct *tsk)
*/
if (tsk->fpu_counter < 5)
ti->xstate_mask &= ~XCNTXT_LAZY;
-
- ti->status &= ~TS_USEDFPU;
}
EXPORT_SYMBOL(save_xstates);

-void xrstor(struct fpu *fpu, u64 mask)
+void xrstor(struct xsave_struct *x, u64 mask)
{
clts();

if (use_xsave())
- xrstor_state(&fpu->state->xsave, mask);
+ xrstor_state(x, mask);
else if (mask & XCNTXT_LAZY)
- fpu_restore(fpu);
+ fpu_restore(&x->i387);

if (!(mask & XCNTXT_LAZY))
stts();
@@ -561,10 +560,9 @@ void restore_xstates(struct task_struct *tsk, u64 mask)
if (!fpu_allocated(&tsk->thread.fpu))
return;

- xrstor(&tsk->thread.fpu, mask);
+ xrstor(&tsk->thread.fpu.state->xsave, mask);

ti->xstate_mask |= mask;
- ti->status |= TS_USEDFPU;
if (mask & XCNTXT_LAZY)
tsk->fpu_counter++;
}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bf89ec2..d79bf2f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -876,7 +876,7 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
#ifdef CONFIG_X86_64
wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
#endif
- if (current_thread_info()->status & TS_USEDFPU)
+ if (current_thread_info()->xstate_mask & XCNTXT_LAZY)
clts();
load_gdt(&__get_cpu_var(host_gdt));
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8fb21ea..10aeb04 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5795,7 +5795,7 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
kvm_put_guest_xcr0(vcpu);
vcpu->guest_fpu_loaded = 1;
save_xstates(current);
- xrstor(&vcpu->arch.guest_fpu, -1);
+ xrstor(&vcpu->arch.guest_fpu.state->xsave, -1);
trace_kvm_fpu(1);
}

@@ -5807,7 +5807,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
return;

vcpu->guest_fpu_loaded = 0;
- xsave(&vcpu->arch.guest_fpu, -1);
+ xsave(&vcpu->arch.guest_fpu.state->xsave, -1);
++vcpu->stat.fpu_reload;
kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
trace_kvm_fpu(0);
--
1.5.6.5

2011-03-09 19:16:14

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC 7/8] x86, xsave: add kernel support for AMDs Lightweight Profiling (LWP)

This patch extends the xsave structure to support the LWP state. The
xstate feature bit for LWP is added to XCNTXT_NONLAZY, thereby enabling
kernel support for saving/restoring LWP state.
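
For illustration only (not part of the patch): a tiny standalone program
that mirrors the mask macros from <asm/xsave.h> after this change, using
a made-up pcntxt_mask value, to show how the LWP bit lands in the
non-lazy set:

#include <stdint.h>
#include <stdio.h>

#define XSTATE_FP	0x1ULL
#define XSTATE_SSE	0x2ULL
#define XSTATE_YMM	0x4ULL
#define XSTATE_LWP	(1ULL << 62)

#define XCNTXT_LAZY	(XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
#define XCNTXT_NONLAZY	(XSTATE_LWP)
#define XCNTXT_MASK	(XCNTXT_LAZY | XCNTXT_NONLAZY)

int main(void)
{
	/* made-up example: the CPU reports FP, SSE, YMM and LWP support */
	uint64_t pcntxt_mask = XCNTXT_MASK;

	if (pcntxt_mask & XCNTXT_NONLAZY)
		printf("non-lazy xstates supported: 0x%llx\n",
		       (unsigned long long)(pcntxt_mask & XCNTXT_NONLAZY));
	return 0;
}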

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/processor.h | 12 ++++++++++++
arch/x86/include/asm/sigcontext.h | 12 ++++++++++++
arch/x86/include/asm/xsave.h | 3 ++-
3 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 4c25ab4..df2cbd4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -353,6 +353,17 @@ struct ymmh_struct {
u32 ymmh_space[64];
};

+struct lwp_struct {
+ u64 lwpcb_addr;
+ u32 flags;
+ u32 buf_head_offset;
+ u64 buf_base;
+ u32 buf_size;
+ u32 filters;
+ u64 saved_event_record[4];
+ u32 event_counter[16];
+};
+
struct xsave_hdr_struct {
u64 xstate_bv;
u64 reserved1[2];
@@ -363,6 +374,7 @@ struct xsave_struct {
struct i387_fxsave_struct i387;
struct xsave_hdr_struct xsave_hdr;
struct ymmh_struct ymmh;
+ struct lwp_struct lwp;
/* new processor state extensions will go here */
} __attribute__ ((packed, aligned (64)));

diff --git a/arch/x86/include/asm/sigcontext.h b/arch/x86/include/asm/sigcontext.h
index 04459d2..0a58b82 100644
--- a/arch/x86/include/asm/sigcontext.h
+++ b/arch/x86/include/asm/sigcontext.h
@@ -274,6 +274,17 @@ struct _ymmh_state {
__u32 ymmh_space[64];
};

+struct _lwp_state {
+ __u64 lwpcb_addr;
+ __u32 flags;
+ __u32 buf_head_offset;
+ __u64 buf_base;
+ __u32 buf_size;
+ __u32 filters;
+ __u64 saved_event_record[4];
+ __u32 event_counter[16];
+};
+
/*
* Extended state pointed by the fpstate pointer in the sigcontext.
* In addition to the fpstate, information encoded in the xstate_hdr
@@ -284,6 +295,7 @@ struct _xstate {
struct _fpstate fpstate;
struct _xsave_hdr xstate_hdr;
struct _ymmh_state ymmh;
+ struct _lwp_state lwp;
/* new processor state extensions go here */
};

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 18401cc..a169115 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -9,6 +9,7 @@
#define XSTATE_FP 0x1
#define XSTATE_SSE 0x2
#define XSTATE_YMM 0x4
+#define XSTATE_LWP (1ULL << 62)

#define XSTATE_FPSSE (XSTATE_FP | XSTATE_SSE)

@@ -24,7 +25,7 @@
* These are the features that the OS can handle currently.
*/
#define XCNTXT_LAZY (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
-#define XCNTXT_NONLAZY 0
+#define XCNTXT_NONLAZY (XSTATE_LWP)

#define XCNTXT_MASK (XCNTXT_LAZY | XCNTXT_NONLAZY)

--
1.5.6.5

2011-03-09 19:16:07

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC 4/8] x86, xsave: remove unused code

The patches to rework the fpu/xsave handling and signal frame setup have
made a lot of code unused. This patch removes all of that now-unused code.

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 155 ++----------------------------
arch/x86/include/asm/xsave.h | 51 ----------
arch/x86/kernel/i387.c | 221 ------------------------------------------
arch/x86/kernel/traps.c | 22 ----
arch/x86/kernel/xsave.c | 163 -------------------------------
5 files changed, 7 insertions(+), 605 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 30930bf..97867ea 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -29,8 +29,6 @@
# include <asm/sigcontext32.h>
# include <asm/user32.h>
#else
-# define save_i387_xstate_ia32 save_i387_xstate
-# define restore_i387_xstate_ia32 restore_i387_xstate
# define _fpstate_ia32 _fpstate
# define _xstate_ia32 _xstate
# define sig_xstate_ia32_size sig_xstate_size
@@ -108,75 +106,16 @@ static inline void sanitize_i387_state(struct task_struct *tsk)
}

#ifdef CONFIG_X86_64
-static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
+static inline void fxrstor(struct i387_fxsave_struct *fx)
{
- int err;
-
- /* See comment in fxsave() below. */
-#ifdef CONFIG_AS_FXSAVEQ
- asm volatile("1: fxrstorq %[fx]\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err)
- : [fx] "m" (*fx), "0" (0));
-#else
- asm volatile("1: rex64/fxrstor (%[fx])\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err)
- : [fx] "R" (fx), "m" (*fx), "0" (0));
-#endif
- return err;
-}
-
-static inline int fxsave_user(struct i387_fxsave_struct __user *fx)
-{
- int err;
-
- /*
- * Clear the bytes not touched by the fxsave and reserved
- * for the SW usage.
- */
- err = __clear_user(&fx->sw_reserved,
- sizeof(struct _fpx_sw_bytes));
- if (unlikely(err))
- return -EFAULT;
-
/* See comment in fxsave() below. */
#ifdef CONFIG_AS_FXSAVEQ
- asm volatile("1: fxsaveq %[fx]\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err), [fx] "=m" (*fx)
- : "0" (0));
+ asm volatile("fxrstorq %[fx]\n\t"
+ : : [fx] "m" (*fx));
#else
- asm volatile("1: rex64/fxsave (%[fx])\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err), "=m" (*fx)
- : [fx] "R" (fx), "0" (0));
+ asm volatile("rex64/fxrstor (%[fx])\n\t"
+ : : [fx] "R" (fx), "m" (*fx));
#endif
- if (unlikely(err) &&
- __clear_user(fx, sizeof(struct i387_fxsave_struct)))
- err = -EFAULT;
- /* No need to clear here because the caller clears USED_MATH */
- return err;
}

static inline void fpu_fxsave(struct fpu *fpu)
@@ -209,7 +148,7 @@ static inline void fpu_fxsave(struct fpu *fpu)
#else /* CONFIG_X86_32 */

/* perform fxrstor iff the processor has extended states, otherwise frstor */
-static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
+static inline void fxrstor(struct i387_fxsave_struct *fx)
{
/*
* The "nop" is needed to make the instructions the same
@@ -220,8 +159,6 @@ static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
"fxrstor %1",
X86_FEATURE_FXSR,
"m" (*fx));
-
- return 0;
}

static inline void fpu_fxsave(struct fpu *fpu)
@@ -246,7 +183,7 @@ static inline void fpu_fxsave(struct fpu *fpu)
*/
static inline void fpu_restore(struct fpu *fpu)
{
- fxrstor_checking(&fpu->state->fxsave);
+ fxrstor(&fpu->state->fxsave);
}

static inline void fpu_save(struct fpu *fpu)
@@ -278,69 +215,6 @@ static inline void fpu_clean(struct fpu *fpu)
[addr] "m" (safe_address));
}

-static inline void fpu_save_init(struct fpu *fpu)
-{
- if (use_xsave()) {
- struct xsave_struct *xstate = &fpu->state->xsave;
-
- fpu_xsave(xstate, -1);
-
- /*
- * xsave header may indicate the init state of the FP.
- */
- if (!(xstate->xsave_hdr.xstate_bv & XSTATE_FP))
- return;
- } else if (use_fxsr()) {
- fpu_fxsave(fpu);
- } else {
- asm volatile("fsave %[fx]; fwait"
- : [fx] "=m" (fpu->state->fsave));
- return;
- }
-
- if (unlikely(fpu->state->fxsave.swd & X87_FSW_ES))
- asm volatile("fnclex");
-
- /* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception
- is pending. Clear the x87 state here by setting it to fixed
- values. safe_address is a random variable that should be in L1 */
- alternative_input(
- ASM_NOP8 ASM_NOP2,
- "emms\n\t" /* clear stack tags */
- "fildl %P[addr]", /* set F?P to defined value */
- X86_FEATURE_FXSAVE_LEAK,
- [addr] "m" (safe_address));
-}
-
-static inline void __save_init_fpu(struct task_struct *tsk)
-{
- fpu_save_init(&tsk->thread.fpu);
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
-}
-
-static inline int fpu_restore_checking(struct fpu *fpu)
-{
- if (use_xsave())
- return xrstor_checking(&fpu->state->xsave, -1);
- else
- return fxrstor_checking(&fpu->state->fxsave);
-}
-
-/*
- * Signal frame handlers...
- */
-extern int save_i387_xstate(void __user *buf);
-extern int restore_i387_xstate(void __user *buf);
-
-static inline void __unlazy_fpu(struct task_struct *tsk)
-{
- if (task_thread_info(tsk)->status & TS_USEDFPU) {
- __save_init_fpu(tsk);
- stts();
- } else
- tsk->fpu_counter = 0;
-}
-
static inline void __clear_fpu(struct task_struct *tsk)
{
if (task_thread_info(tsk)->status & TS_USEDFPU) {
@@ -409,21 +283,6 @@ static inline void irq_ts_restore(int TS_state)
/*
* These disable preemption on their own and are safe
*/
-static inline void save_init_fpu(struct task_struct *tsk)
-{
- preempt_disable();
- __save_init_fpu(tsk);
- stts();
- preempt_enable();
-}
-
-static inline void unlazy_fpu(struct task_struct *tsk)
-{
- preempt_disable();
- __unlazy_fpu(tsk);
- preempt_enable();
-}
-
static inline void clear_fpu(struct task_struct *tsk)
{
preempt_disable();
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 200c56d..742da4a 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -51,26 +51,6 @@ extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
unsigned int size,
struct _fpx_sw_bytes *sw);

-static inline int xrstor_checking(struct xsave_struct *fx, u64 mask)
-{
- int err;
- u32 lmask = mask;
- u32 hmask = mask >> 32;
-
- asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err)
- : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask), "0" (0)
- : "memory");
-
- return err;
-}
-
static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
{
u32 lmask = mask;
@@ -81,37 +61,6 @@ static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
: "memory");
}

-static inline int xsave_checking(struct xsave_struct __user *buf)
-{
- int err;
-
- /*
- * Clear the xsave header first, so that reserved fields are
- * initialized to zero.
- */
- err = __clear_user(&buf->xsave_hdr,
- sizeof(struct xsave_hdr_struct));
- if (unlikely(err))
- return -EFAULT;
-
- asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x27\n"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b,3b)
- : [err] "=r" (err)
- : "D" (buf), "a" (-1), "d" (-1), "0" (0)
- : "memory");
-
- if (unlikely(err) && __clear_user(buf, xstate_size))
- err = -EFAULT;
-
- /* No need to clear here because the caller clears USED_MATH */
- return err;
-}
-
static inline void xsave_state(struct xsave_struct *fx, u64 mask)
{
u32 lmask = mask;
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 5cec7c2..d2d2b69 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -490,227 +490,6 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
}

/*
- * Signal frame handlers.
- */
-
-static inline int save_i387_fsave(struct _fpstate_ia32 __user *buf)
-{
- struct task_struct *tsk = current;
- struct i387_fsave_struct *fp = &tsk->thread.fpu.state->fsave;
-
- fp->status = fp->swd;
- if (__copy_to_user(buf, fp, sizeof(struct i387_fsave_struct)))
- return -1;
- return 1;
-}
-
-static int save_i387_fxsave(struct _fpstate_ia32 __user *buf)
-{
- struct task_struct *tsk = current;
- struct i387_fxsave_struct *fx = &tsk->thread.fpu.state->fxsave;
- struct user_i387_ia32_struct env;
- int err = 0;
-
- convert_from_fxsr(&env, tsk);
- if (__copy_to_user(buf, &env, sizeof(env)))
- return -1;
-
- err |= __put_user(fx->swd, &buf->status);
- err |= __put_user(X86_FXSR_MAGIC, &buf->magic);
- if (err)
- return -1;
-
- if (__copy_to_user(&buf->_fxsr_env[0], fx, xstate_size))
- return -1;
- return 1;
-}
-
-static int save_i387_xsave(void __user *buf)
-{
- struct task_struct *tsk = current;
- struct _fpstate_ia32 __user *fx = buf;
- int err = 0;
-
-
- sanitize_i387_state(tsk);
-
- /*
- * For legacy compatible, we always set FP/SSE bits in the bit
- * vector while saving the state to the user context.
- * This will enable us capturing any changes(during sigreturn) to
- * the FP/SSE bits by the legacy applications which don't touch
- * xstate_bv in the xsave header.
- *
- * xsave aware applications can change the xstate_bv in the xsave
- * header as well as change any contents in the memory layout.
- * xrestore as part of sigreturn will capture all the changes.
- */
- tsk->thread.fpu.state->xsave.xsave_hdr.xstate_bv |= XSTATE_FPSSE;
-
- if (save_i387_fxsave(fx) < 0)
- return -1;
-
- err = __copy_to_user(&fx->sw_reserved, &fx_sw_reserved_ia32,
- sizeof(struct _fpx_sw_bytes));
- err |= __put_user(FP_XSTATE_MAGIC2,
- (__u32 __user *) (buf + sig_xstate_ia32_size
- - FP_XSTATE_MAGIC2_SIZE));
- if (err)
- return -1;
-
- return 1;
-}
-
-int save_i387_xstate_ia32(void __user *buf)
-{
- struct _fpstate_ia32 __user *fp = (struct _fpstate_ia32 __user *) buf;
- struct task_struct *tsk = current;
-
- if (!used_math())
- return 0;
-
- if (!access_ok(VERIFY_WRITE, buf, sig_xstate_ia32_size))
- return -EACCES;
- /*
- * This will cause a "finit" to be triggered by the next
- * attempted FPU operation by the 'current' process.
- */
- clear_used_math();
-
- if (!HAVE_HWFP) {
- return fpregs_soft_get(current, NULL,
- 0, sizeof(struct user_i387_ia32_struct),
- NULL, fp) ? -1 : 1;
- }
-
- preempt_disable();
- save_xstates(tsk);
- preempt_enable();
-
- if (cpu_has_xsave)
- return save_i387_xsave(fp);
- if (cpu_has_fxsr)
- return save_i387_fxsave(fp);
- else
- return save_i387_fsave(fp);
-}
-
-static inline int restore_i387_fsave(struct _fpstate_ia32 __user *buf)
-{
- struct task_struct *tsk = current;
-
- return __copy_from_user(&tsk->thread.fpu.state->fsave, buf,
- sizeof(struct i387_fsave_struct));
-}
-
-static int restore_i387_fxsave(struct _fpstate_ia32 __user *buf,
- unsigned int size)
-{
- struct task_struct *tsk = current;
- struct user_i387_ia32_struct env;
- int err;
-
- err = __copy_from_user(&tsk->thread.fpu.state->fxsave, &buf->_fxsr_env[0],
- size);
- /* mxcsr reserved bits must be masked to zero for security reasons */
- tsk->thread.fpu.state->fxsave.mxcsr &= mxcsr_feature_mask;
- if (err || __copy_from_user(&env, buf, sizeof(env)))
- return 1;
- convert_to_fxsr(tsk, &env);
-
- return 0;
-}
-
-static int restore_i387_xsave(void __user *buf)
-{
- struct _fpx_sw_bytes fx_sw_user;
- struct _fpstate_ia32 __user *fx_user =
- ((struct _fpstate_ia32 __user *) buf);
- struct i387_fxsave_struct __user *fx =
- (struct i387_fxsave_struct __user *) &fx_user->_fxsr_env[0];
- struct xsave_hdr_struct *xsave_hdr =
- &current->thread.fpu.state->xsave.xsave_hdr;
- u64 mask;
- int err;
-
- if (check_for_xstate(fx, sig_xstate_ia32_size -
- offsetof(struct _fpstate_ia32, _fxsr_env),
- &fx_sw_user))
- goto fx_only;
-
- mask = fx_sw_user.xstate_bv;
-
- err = restore_i387_fxsave(buf, fx_sw_user.xstate_size);
-
- xsave_hdr->xstate_bv &= pcntxt_mask;
- /*
- * These bits must be zero.
- */
- xsave_hdr->reserved1[0] = xsave_hdr->reserved1[1] = 0;
-
- /*
- * Init the state that is not present in the memory layout
- * and enabled by the OS.
- */
- mask = ~(pcntxt_mask & ~mask);
- xsave_hdr->xstate_bv &= mask;
-
- return err;
-fx_only:
- /*
- * Couldn't find the extended state information in the memory
- * layout. Restore the FP/SSE and init the other extended state
- * enabled by the OS.
- */
- xsave_hdr->xstate_bv = XSTATE_FPSSE;
- return restore_i387_fxsave(buf, sizeof(struct i387_fxsave_struct));
-}
-
-int restore_i387_xstate_ia32(void __user *buf)
-{
- int err;
- struct task_struct *tsk = current;
- struct _fpstate_ia32 __user *fp = (struct _fpstate_ia32 __user *) buf;
-
- if (HAVE_HWFP)
- clear_fpu(tsk);
-
- if (!buf) {
- if (used_math()) {
- clear_fpu(tsk);
- clear_used_math();
- }
-
- return 0;
- } else
- if (!access_ok(VERIFY_READ, buf, sig_xstate_ia32_size))
- return -EACCES;
-
- if (!used_math()) {
- err = init_fpu(tsk);
- if (err)
- return err;
- }
-
- if (HAVE_HWFP) {
- if (cpu_has_xsave)
- err = restore_i387_xsave(buf);
- else if (cpu_has_fxsr)
- err = restore_i387_fxsave(fp, sizeof(struct
- i387_fxsave_struct));
- else
- err = restore_i387_fsave(fp);
- } else {
- err = fpregs_soft_set(current, NULL,
- 0, sizeof(struct user_i387_ia32_struct),
- NULL, fp) != 0;
- }
- set_used_math();
-
- return err;
-}
-
-/*
* FPU state for core dumps.
* This is only used for a.out dumps now.
* It is declared generically using elf_fpregset_t (which is
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 072c30e..872fc78 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -720,28 +720,6 @@ asmlinkage void __attribute__((weak)) smp_threshold_interrupt(void)
}

/*
- * __math_state_restore assumes that cr0.TS is already clear and the
- * fpu state is all ready for use. Used during context switch.
- */
-void __math_state_restore(void)
-{
- struct thread_info *thread = current_thread_info();
- struct task_struct *tsk = thread->task;
-
- /*
- * Paranoid restore. send a SIGSEGV if we fail to restore the state.
- */
- if (unlikely(fpu_restore_checking(&tsk->thread.fpu))) {
- stts();
- force_sig(SIGSEGV, tsk);
- return;
- }
-
- thread->status |= TS_USEDFPU; /* So we fnsave on switch_to() */
- tsk->fpu_counter++;
-}
-
-/*
* 'math_state_restore()' saves the current math information in the
* old math state array, and gets the new ones from the current task
*
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index df9b0bb..a10c13e 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -249,169 +249,6 @@ int save_xstates_sigframe(void __user *buf, unsigned int size)
return 1;
}

-#ifdef CONFIG_X86_64
-int save_i387_xstate(void __user *buf)
-{
- struct task_struct *tsk = current;
- int err = 0;
-
- if (!access_ok(VERIFY_WRITE, buf, sig_xstate_size))
- return -EACCES;
-
- BUG_ON(sig_xstate_size < xstate_size);
-
- if ((unsigned long)buf % 64)
- printk("save_i387_xstate: bad fpstate %p\n", buf);
-
- if (!used_math())
- return 0;
-
- if (task_thread_info(tsk)->status & TS_USEDFPU) {
- if (use_xsave())
- err = xsave_checking(buf);
- else
- err = fxsave_user(buf);
-
- if (err)
- return err;
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
- stts();
- } else {
- sanitize_i387_state(tsk);
- if (__copy_to_user(buf, &tsk->thread.fpu.state->fxsave,
- xstate_size))
- return -1;
- }
-
- clear_used_math(); /* trigger finit */
-
- if (use_xsave()) {
- struct _fpstate __user *fx = buf;
- struct _xstate __user *x = buf;
- u64 xstate_bv;
-
- err = __copy_to_user(&fx->sw_reserved, &fx_sw_reserved,
- sizeof(struct _fpx_sw_bytes));
-
- err |= __put_user(FP_XSTATE_MAGIC2,
- (__u32 __user *) (buf + sig_xstate_size
- - FP_XSTATE_MAGIC2_SIZE));
-
- /*
- * Read the xstate_bv which we copied (directly from the cpu or
- * from the state in task struct) to the user buffers and
- * set the FP/SSE bits.
- */
- err |= __get_user(xstate_bv, &x->xstate_hdr.xstate_bv);
-
- /*
- * For legacy compatible, we always set FP/SSE bits in the bit
- * vector while saving the state to the user context. This will
- * enable us capturing any changes(during sigreturn) to
- * the FP/SSE bits by the legacy applications which don't touch
- * xstate_bv in the xsave header.
- *
- * xsave aware apps can change the xstate_bv in the xsave
- * header as well as change any contents in the memory layout.
- * xrestore as part of sigreturn will capture all the changes.
- */
- xstate_bv |= XSTATE_FPSSE;
-
- err |= __put_user(xstate_bv, &x->xstate_hdr.xstate_bv);
-
- if (err)
- return err;
- }
-
- return 1;
-}
-
-/*
- * Restore the extended state if present. Otherwise, restore the FP/SSE
- * state.
- */
-static int restore_user_xstate(void __user *buf)
-{
- struct _fpx_sw_bytes fx_sw_user;
- u64 mask;
- int err;
-
- if (((unsigned long)buf % 64) ||
- check_for_xstate(buf, sig_xstate_size, &fx_sw_user))
- goto fx_only;
-
- mask = fx_sw_user.xstate_bv;
-
- /*
- * restore the state passed by the user.
- */
- err = xrstor_checking((__force struct xsave_struct *)buf, mask);
- if (err)
- return err;
-
- /*
- * init the state skipped by the user.
- */
- mask = pcntxt_mask & ~mask;
- if (unlikely(mask))
- xrstor_state(init_xstate_buf, mask);
-
- return 0;
-
-fx_only:
- /*
- * couldn't find the extended state information in the
- * memory layout. Restore just the FP/SSE and init all
- * the other extended state.
- */
- xrstor_state(init_xstate_buf, pcntxt_mask & ~XSTATE_FPSSE);
- return fxrstor_checking((__force struct i387_fxsave_struct *)buf);
-}
-
-/*
- * This restores directly out of user space. Exceptions are handled.
- */
-int restore_i387_xstate(void __user *buf)
-{
- struct task_struct *tsk = current;
- int err = 0;
-
- if (!buf) {
- if (used_math())
- goto clear;
- return 0;
- } else
- if (!access_ok(VERIFY_READ, buf, sig_xstate_size))
- return -EACCES;
-
- if (!used_math()) {
- err = init_fpu(tsk);
- if (err)
- return err;
- }
-
- if (!(task_thread_info(current)->status & TS_USEDFPU)) {
- clts();
- task_thread_info(current)->status |= TS_USEDFPU;
- }
- if (use_xsave())
- err = restore_user_xstate(buf);
- else
- err = fxrstor_checking((__force struct i387_fxsave_struct *)
- buf);
- if (unlikely(err)) {
- /*
- * Encountered an error while doing the restore from the
- * user buffer, clear the fpu state.
- */
-clear:
- clear_fpu(tsk);
- clear_used_math();
- }
- return err;
-}
-#endif
-
int restore_xstates_sigframe(void __user *buf, unsigned int size)
{
#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
--
1.5.6.5

2011-03-23 15:28:07

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v2 6/8] x86, xsave: add support for non-lazy xstates

Non-lazy xstates are, as the name suggests, extended states that cannot
be saved or restored lazily. The state for AMD's LWP feature is an
example of this.

This patch adds support for this kind of xstate. If any such states are
present and supported on the running system, they will always be enabled
in xstate_mask so that they are always restored in switch_to(). Since lazy
allocation of the xstate area won't work when non-lazy xstates are used,
all tasks will always have an xstate area preallocated.
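
As a quick illustration of the allocation policy this implies, here is a
standalone sketch (not the kernel code; struct toy_fpu, toy_fpu_clear()
and the pcntxt_mask value are stand-ins for the kernel's struct fpu,
fpu_clear() and the real supported-feature mask):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define XSTATE_LWP	(1ULL << 62)
#define XCNTXT_NONLAZY	(XSTATE_LWP)

struct toy_fpu {
	void	*state;		/* stand-in for the task's xstate area */
	size_t	size;
};

static uint64_t pcntxt_mask = XCNTXT_NONLAZY;	/* example: LWP present */

static void toy_fpu_clear(struct toy_fpu *fpu)
{
	if (pcntxt_mask & XCNTXT_NONLAZY) {
		/* non-lazy states: keep the area, wipe and reuse it */
		memset(fpu->state, 0, fpu->size);
	} else {
		/* lazy-only: drop the area, reallocate on first FPU use */
		free(fpu->state);
		fpu->state = NULL;
	}
}

int main(void)
{
	struct toy_fpu fpu = { .size = 512 };

	fpu.state = calloc(1, fpu.size);
	toy_fpu_clear(&fpu);	/* area survives when a non-lazy bit is set */
	free(fpu.state);
	return 0;
}

With XCNTXT_NONLAZY empty this degenerates to the old behaviour of simply
freeing the area.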

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 11 +++++++++++
arch/x86/include/asm/xsave.h | 5 +++--
arch/x86/kernel/process_32.c | 2 +-
arch/x86/kernel/process_64.c | 2 +-
arch/x86/kernel/xsave.c | 11 ++++++++++-
5 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index b8f9617..22ad24c 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -330,6 +330,17 @@ static inline void fpu_copy(struct fpu *dst, struct fpu *src)

extern void fpu_finit(struct fpu *fpu);

+static inline void fpu_clear(struct fpu *fpu)
+{
+ if (pcntxt_mask & XCNTXT_NONLAZY) {
+ memset(fpu->state, 0, xstate_size);
+ fpu_finit(fpu);
+ set_used_math();
+ } else {
+ fpu_free(fpu);
+ }
+}
+
#endif /* __ASSEMBLY__ */

#endif /* _ASM_X86_I387_H */
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index b8861d4..4ccee3c 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -23,9 +23,10 @@
/*
* These are the features that the OS can handle currently.
*/
-#define XCNTXT_MASK (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
+#define XCNTXT_LAZY (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
+#define XCNTXT_NONLAZY 0

-#define XCNTXT_LAZY XCNTXT_MASK
+#define XCNTXT_MASK (XCNTXT_LAZY | XCNTXT_NONLAZY)

#ifdef CONFIG_X86_64
#define REX_PREFIX "0x48, "
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8df07c3..a878736 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -257,7 +257,7 @@ start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
/*
* Free the old FP and other extended state
*/
- free_thread_xstate(current);
+ fpu_clear(&current->thread.fpu);
}
EXPORT_SYMBOL_GPL(start_thread);

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 67c5838..67a6bc9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -344,7 +344,7 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
/*
* Free the old FP and other extended state
*/
- free_thread_xstate(current);
+ fpu_clear(&current->thread.fpu);
}

void
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 4e5bf58..7b08d32 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -16,6 +16,7 @@
* Supported feature mask by the CPU and the kernel.
*/
u64 pcntxt_mask;
+EXPORT_SYMBOL(pcntxt_mask);

/*
* Represents init state for the supported extended state.
@@ -260,7 +261,7 @@ int restore_xstates_sigframe(void __user *buf, unsigned int size)
struct task_struct *tsk = current;
struct _fpstate_ia32 __user *fp = buf;
struct xsave_struct *xsave;
- u64 xstate_mask = 0;
+ u64 xstate_mask = pcntxt_mask & XCNTXT_NONLAZY;
int err;

if (!buf) {
@@ -477,6 +478,14 @@ static void __init xstate_enable_boot_cpu(void)
printk(KERN_INFO "xsave/xrstor: enabled xstate_bv 0x%llx, "
"cntxt size 0x%x\n",
pcntxt_mask, xstate_size);
+
+ if (pcntxt_mask & XCNTXT_NONLAZY) {
+ static union thread_xstate x;
+
+ task_thread_info(&init_task)->xstate_mask |= XCNTXT_NONLAZY;
+ init_task.thread.fpu.state = &x;
+ fpu_finit(&init_task.thread.fpu);
+ }
}

/*
--
1.5.6.5

2011-03-23 15:28:11

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v2 2/8] x86, xsave: rework fpu/xsave support

This is a complete rework of the code that handles FPU and related
extended states. Since FPU, XMM and YMM states are just variants of what
xsave handles, all of the old FPU-specific state handling code will be
hidden behind a set of functions that resemble xsave and xrstor. For
hardware that does not support xsave, the code falls back to
fxsave/fxrstor or even fsave/frstor.

An xstate_mask member will be added to the thread_info structure that
will control which states are to be saved by xsave. It is set to include
all "lazy" states (that is, all states currently supported: FPU, XMM and
YMM) by the #NM handler when a lazy restore is triggered or by
switch_to() when the task's FPU context is preloaded. Xstate_mask is
intended to completely replace TS_USEDFPU in a later cleanup patch.
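
To make the fallback idea concrete, here is a condensed, standalone
sketch (illustrative only; have_xsave/have_fxsr and the hw_*() stubs
stand in for use_xsave()/use_fxsr() and the real xsave/fxsave/fsave
instructions used by the new wrappers):

#include <stdint.h>
#include <stdio.h>

static int have_xsave = 0, have_fxsr = 1;	/* example: fxsr-only CPU */

static void hw_xsave(uint64_t mask)
{
	printf("xsave, mask=0x%llx\n", (unsigned long long)mask);
}
static void hw_fxsave(void) { printf("fxsave\n"); }
static void hw_fsave(void)  { printf("fsave\n"); }

static void save_extended_state(uint64_t mask)
{
	if (have_xsave)
		hw_xsave(mask);		/* saves exactly the states in mask */
	else if (have_fxsr)
		hw_fxsave();		/* x87 + SSE state only */
	else
		hw_fsave();		/* legacy x87 state only */
}

int main(void)
{
	save_extended_state(~0ULL);
	return 0;
}

The real xsave()/xrstor() wrappers in arch/x86/kernel/xsave.c below
additionally toggle CR0.TS via clts()/stts() and split the mask into its
lazy and non-lazy parts.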

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 44 +++++++++++++++++++---
arch/x86/include/asm/thread_info.h | 2 +
arch/x86/include/asm/xsave.h | 14 ++++++-
arch/x86/kernel/i387.c | 11 ++++--
arch/x86/kernel/process_32.c | 27 +++++---------
arch/x86/kernel/process_64.c | 26 ++++----------
arch/x86/kernel/traps.c | 11 +++---
arch/x86/kernel/xsave.c | 71 ++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 7 ++--
drivers/lguest/x86/core.c | 2 +-
10 files changed, 158 insertions(+), 57 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index d908383..939af08 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -224,12 +224,46 @@ static inline void fpu_fxsave(struct fpu *fpu)
/*
* These must be called with preempt disabled
*/
+static inline void fpu_restore(struct fpu *fpu)
+{
+ fxrstor_checking(&fpu->state->fxsave);
+}
+
+static inline void fpu_save(struct fpu *fpu)
+{
+ if (use_fxsr()) {
+ fpu_fxsave(fpu);
+ } else {
+ asm volatile("fsave %[fx]; fwait"
+ : [fx] "=m" (fpu->state->fsave));
+ }
+}
+
+static inline void fpu_clean(struct fpu *fpu)
+{
+ u32 swd = (use_fxsr() || use_xsave()) ?
+ fpu->state->fxsave.swd : fpu->state->fsave.swd;
+
+ if (unlikely(swd & X87_FSW_ES))
+ asm volatile("fnclex");
+
+ /* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception
+ is pending. Clear the x87 state here by setting it to fixed
+ values. safe_address is a random variable that should be in L1 */
+ alternative_input(
+ ASM_NOP8 ASM_NOP2,
+ "emms\n\t" /* clear stack tags */
+ "fildl %P[addr]", /* set F?P to defined value */
+ X86_FEATURE_FXSAVE_LEAK,
+ [addr] "m" (safe_address));
+}
+
static inline void fpu_save_init(struct fpu *fpu)
{
if (use_xsave()) {
struct xsave_struct *xstate = &fpu->state->xsave;

- fpu_xsave(xstate);
+ fpu_xsave(xstate, -1);

/*
* xsave header may indicate the init state of the FP.
@@ -295,18 +329,16 @@ static inline void __clear_fpu(struct task_struct *tsk)
"2:\n"
_ASM_EXTABLE(1b, 2b));
task_thread_info(tsk)->status &= ~TS_USEDFPU;
+ task_thread_info(tsk)->xstate_mask &= ~XCNTXT_LAZY;
stts();
}
}

static inline void kernel_fpu_begin(void)
{
- struct thread_info *me = current_thread_info();
preempt_disable();
- if (me->status & TS_USEDFPU)
- __save_init_fpu(me->task);
- else
- clts();
+ save_xstates(current_thread_info()->task);
+ clts();
}

static inline void kernel_fpu_end(void)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index f0b6e5d..5c92d21 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -26,6 +26,7 @@ struct exec_domain;
struct thread_info {
struct task_struct *task; /* main task structure */
struct exec_domain *exec_domain; /* execution domain */
+ __u64 xstate_mask; /* xstates in use */
__u32 flags; /* low level flags */
__u32 status; /* thread synchronous flags */
__u32 cpu; /* current CPU */
@@ -47,6 +48,7 @@ struct thread_info {
{ \
.task = &tsk, \
.exec_domain = &default_exec_domain, \
+ .xstate_mask = 0, \
.flags = 0, \
.cpu = 0, \
.preempt_count = INIT_PREEMPT_COUNT, \
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 8bcbbce..6052a84 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -25,6 +25,8 @@
*/
#define XCNTXT_MASK (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)

+#define XCNTXT_LAZY XCNTXT_MASK
+
#ifdef CONFIG_X86_64
#define REX_PREFIX "0x48, "
#else
@@ -35,6 +37,11 @@ extern unsigned int xstate_size;
extern u64 pcntxt_mask;
extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];

+extern void xsave(struct fpu *, u64);
+extern void xrstor(struct fpu *, u64);
+extern void save_xstates(struct task_struct *);
+extern void restore_xstates(struct task_struct *, u64);
+
extern void xsave_init(void);
extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
extern int init_fpu(struct task_struct *child);
@@ -113,15 +120,18 @@ static inline void xsave_state(struct xsave_struct *fx, u64 mask)
: "memory");
}

-static inline void fpu_xsave(struct xsave_struct *fx)
+static inline void fpu_xsave(struct xsave_struct *fx, u64 mask)
{
+ u32 lmask = mask;
+ u32 hmask = mask >> 32;
+
/* This, however, we can work around by forcing the compiler to select
an addressing mode that doesn't require extended registers. */
alternative_input(
".byte " REX_PREFIX "0x0f,0xae,0x27",
".byte " REX_PREFIX "0x0f,0xae,0x37",
X86_FEATURE_XSAVEOPT,
- [fx] "D" (fx), "a" (-1), "d" (-1) :
+ [fx] "D" (fx), "a" (lmask), "d" (hmask) :
"memory");
}
#endif
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index e60c38c..5ab66ec 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -152,8 +152,11 @@ int init_fpu(struct task_struct *tsk)
int ret;

if (tsk_used_math(tsk)) {
- if (HAVE_HWFP && tsk == current)
- unlazy_fpu(tsk);
+ if (HAVE_HWFP && tsk == current) {
+ preempt_disable();
+ save_xstates(tsk);
+ preempt_enable();
+ }
return 0;
}

@@ -600,7 +603,9 @@ int save_i387_xstate_ia32(void __user *buf)
NULL, fp) ? -1 : 1;
}

- unlazy_fpu(tsk);
+ preempt_disable();
+ save_xstates(tsk);
+ preempt_enable();

if (cpu_has_xsave)
return save_i387_xsave(fp);
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8d12878..8df07c3 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -185,7 +185,9 @@ void release_thread(struct task_struct *dead_task)
*/
void prepare_to_copy(struct task_struct *tsk)
{
- unlazy_fpu(tsk);
+ preempt_disable();
+ save_xstates(tsk);
+ preempt_enable();
}

int copy_thread(unsigned long clone_flags, unsigned long sp,
@@ -294,21 +296,13 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
*next = &next_p->thread;
int cpu = smp_processor_id();
struct tss_struct *tss = &per_cpu(init_tss, cpu);
- bool preload_fpu;

/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */

- /*
- * If the task has used fpu the last 5 timeslices, just do a full
- * restore of the math state immediately to avoid the trap; the
- * chances of needing FPU soon are obviously high now
- */
- preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;
-
- __unlazy_fpu(prev_p);
+ save_xstates(prev_p);

/* we're going to use this soon, after a few expensive things */
- if (preload_fpu)
+ if (task_thread_info(next_p)->xstate_mask)
prefetch(next->fpu.state);

/*
@@ -349,11 +343,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT))
__switch_to_xtra(prev_p, next_p, tss);

- /* If we're going to preload the fpu context, make sure clts
- is run while we're batching the cpu state updates. */
- if (preload_fpu)
- clts();
-
/*
* Leave lazy mode, flushing any hypercalls made here.
* This must be done before restoring TLS segments so
@@ -363,8 +352,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
*/
arch_end_context_switch(next_p);

- if (preload_fpu)
- __math_state_restore();
+ /*
+ * Restore enabled extended states for the task.
+ */
+ restore_xstates(next_p, task_thread_info(next_p)->xstate_mask);

/*
* Restore %gs if needed (which is common)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index bd387e8..67c5838 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -249,7 +249,9 @@ static inline u32 read_32bit_tls(struct task_struct *t, int tls)
*/
void prepare_to_copy(struct task_struct *tsk)
{
- unlazy_fpu(tsk);
+ preempt_disable();
+ save_xstates(tsk);
+ preempt_enable();
}

int copy_thread(unsigned long clone_flags, unsigned long sp,
@@ -378,17 +380,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
int cpu = smp_processor_id();
struct tss_struct *tss = &per_cpu(init_tss, cpu);
unsigned fsindex, gsindex;
- bool preload_fpu;
-
- /*
- * If the task has used fpu the last 5 timeslices, just do a full
- * restore of the math state immediately to avoid the trap; the
- * chances of needing FPU soon are obviously high now
- */
- preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;

/* we're going to use this soon, after a few expensive things */
- if (preload_fpu)
+ if (task_thread_info(next_p)->xstate_mask)
prefetch(next->fpu.state);

/*
@@ -420,11 +414,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
load_TLS(next, cpu);

/* Must be after DS reload */
- __unlazy_fpu(prev_p);
-
- /* Make sure cpu is ready for new context */
- if (preload_fpu)
- clts();
+ save_xstates(prev_p);

/*
* Leave lazy mode, flushing any hypercalls made here.
@@ -485,11 +475,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
__switch_to_xtra(prev_p, next_p, tss);

/*
- * Preload the FPU context, now that we've determined that the
- * task is likely to be using it.
+ * Restore enabled extended states for the task.
*/
- if (preload_fpu)
- __math_state_restore();
+ restore_xstates(next_p, task_thread_info(next_p)->xstate_mask);

return prev_p;
}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 32f3043..072c30e 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -625,7 +625,10 @@ void math_error(struct pt_regs *regs, int error_code, int trapnr)
/*
* Save the info for the exception handler and clear the error.
*/
- save_init_fpu(task);
+ preempt_disable();
+ save_xstates(task);
+ preempt_enable();
+
task->thread.trap_no = trapnr;
task->thread.error_code = error_code;
info.si_signo = SIGFPE;
@@ -734,7 +737,7 @@ void __math_state_restore(void)
return;
}

- thread->status |= TS_USEDFPU; /* So we fnsave on switch_to() */
+ thread->status |= TS_USEDFPU; /* So we fnsave on switch_to() */
tsk->fpu_counter++;
}

@@ -768,9 +771,7 @@ asmlinkage void math_state_restore(void)
local_irq_disable();
}

- clts(); /* Allow maths ops (or we recurse) */
-
- __math_state_restore();
+ restore_xstates(tsk, XCNTXT_LAZY);
}
EXPORT_SYMBOL_GPL(math_state_restore);

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index e204b07..c422527 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -5,6 +5,7 @@
*/
#include <linux/bootmem.h>
#include <linux/compat.h>
+#include <linux/module.h>
#include <asm/i387.h>
#ifdef CONFIG_IA32_EMULATION
#include <asm/sigcontext32.h>
@@ -474,3 +475,73 @@ void __cpuinit xsave_init(void)
next_func = xstate_enable;
this_func();
}
+
+void xsave(struct fpu *fpu, u64 mask)
+{
+ clts();
+
+ if (use_xsave())
+ fpu_xsave(&fpu->state->xsave, mask);
+ else if (mask & XCNTXT_LAZY)
+ fpu_save(fpu);
+
+ if (mask & XCNTXT_LAZY)
+ fpu_clean(fpu);
+
+ stts();
+}
+EXPORT_SYMBOL(xsave);
+
+void save_xstates(struct task_struct *tsk)
+{
+ struct thread_info *ti = task_thread_info(tsk);
+
+ if (!fpu_allocated(&tsk->thread.fpu))
+ return;
+
+ xsave(&tsk->thread.fpu, ti->xstate_mask);
+
+ if (!(ti->xstate_mask & XCNTXT_LAZY))
+ tsk->fpu_counter = 0;
+
+ /*
+ * If the task hasn't used the fpu the last 5 timeslices,
+ * force a lazy restore of the math states by clearing them
+ * from xstate_mask.
+ */
+ if (tsk->fpu_counter < 5)
+ ti->xstate_mask &= ~XCNTXT_LAZY;
+
+ ti->status &= ~TS_USEDFPU;
+}
+EXPORT_SYMBOL(save_xstates);
+
+void xrstor(struct fpu *fpu, u64 mask)
+{
+ clts();
+
+ if (use_xsave())
+ xrstor_state(&fpu->state->xsave, mask);
+ else if (mask & XCNTXT_LAZY)
+ fpu_restore(fpu);
+
+ if (!(mask & XCNTXT_LAZY))
+ stts();
+}
+EXPORT_SYMBOL(xrstor);
+
+void restore_xstates(struct task_struct *tsk, u64 mask)
+{
+ struct thread_info *ti = task_thread_info(tsk);
+
+ if (!fpu_allocated(&tsk->thread.fpu))
+ return;
+
+ xrstor(&tsk->thread.fpu, mask);
+
+ ti->xstate_mask |= mask;
+ ti->status |= TS_USEDFPU;
+ if (mask & XCNTXT_LAZY)
+ tsk->fpu_counter++;
+}
+EXPORT_SYMBOL(restore_xstates);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bcc0efc..8fb21ea 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -58,6 +58,7 @@
#include <asm/xcr.h>
#include <asm/pvclock.h>
#include <asm/div64.h>
+#include <asm/xsave.h>

#define MAX_IO_MSRS 256
#define CR0_RESERVED_BITS \
@@ -5793,8 +5794,8 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
*/
kvm_put_guest_xcr0(vcpu);
vcpu->guest_fpu_loaded = 1;
- unlazy_fpu(current);
- fpu_restore_checking(&vcpu->arch.guest_fpu);
+ save_xstates(current);
+ xrstor(&vcpu->arch.guest_fpu, -1);
trace_kvm_fpu(1);
}

@@ -5806,7 +5807,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
return;

vcpu->guest_fpu_loaded = 0;
- fpu_save_init(&vcpu->arch.guest_fpu);
+ xsave(&vcpu->arch.guest_fpu, -1);
++vcpu->stat.fpu_reload;
kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
trace_kvm_fpu(0);
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 9f1659c..ef62289 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -204,7 +204,7 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
* uses the FPU.
*/
if (cpu->ts)
- unlazy_fpu(current);
+ save_xstates(current);

/*
* SYSENTER is an optimized way of doing system calls. We can't allow
--
1.5.6.5

2011-03-23 15:28:14

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v2 7/8] x86, xsave: add kernel support for AMDs Lightweight Profiling (LWP)

This patch extends the xsave structure to support the LWP state. The
xstate feature bit for LWP is added to XCNTXT_NONLAZY, thereby enabling
kernel support for saving/restoring LWP state. The LWP state is also
saved/restored on signal entry/return, just like all other xstates. LWP
state needs to be reset (disabled) on entry to a signal handler.
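
In the patch the state is saved first, and only then is profiling shut
off by clearing the LWP control-block MSR. A standalone sketch of just
that ordering (stubs only; save_task_xstates() and wrmsr64() stand in
for save_xstates() and wrmsrl()):

#include <stdint.h>
#include <stdio.h>

#define MSR_AMD64_LWP_CBADDR	0xc0000106
#define XSTATE_LWP		(1ULL << 62)

static uint64_t pcntxt_mask = XSTATE_LWP;	/* example: LWP supported */

static void save_task_xstates(void)		/* stand-in for save_xstates() */
{
	printf("xsave: LWP state written to the task's xstate area\n");
}

static void wrmsr64(uint32_t msr, uint64_t val)	/* stand-in for wrmsrl() */
{
	printf("wrmsr 0x%x <- 0x%llx\n", msr, (unsigned long long)val);
}

static void lwp_signal_entry(void)
{
	/* 1. dump all xstates, including LWP, for the signal frame */
	save_task_xstates();

	/* 2. disable LWP while the handler runs; sigreturn restores it */
	if (pcntxt_mask & XSTATE_LWP)
		wrmsr64(MSR_AMD64_LWP_CBADDR, 0);
}

int main(void)
{
	lwp_signal_entry();
	return 0;
}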

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/processor.h | 12 ++++++++++++
arch/x86/include/asm/sigcontext.h | 12 ++++++++++++
arch/x86/include/asm/xsave.h | 3 ++-
arch/x86/kernel/xsave.c | 2 ++
5 files changed, 29 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 823d482..0ba2150 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -130,6 +130,7 @@
#define MSR_AMD64_IBSDCPHYSAD 0xc0011039
#define MSR_AMD64_IBSCTL 0xc001103a
#define MSR_AMD64_IBSBRTARGET 0xc001103b
+#define MSR_AMD64_LWP_CBADDR 0xc0000106

/* Fam 15h MSRs */
#define MSR_F15H_PERF_CTL 0xc0010200
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 4c25ab4..df2cbd4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -353,6 +353,17 @@ struct ymmh_struct {
u32 ymmh_space[64];
};

+struct lwp_struct {
+ u64 lwpcb_addr;
+ u32 flags;
+ u32 buf_head_offset;
+ u64 buf_base;
+ u32 buf_size;
+ u32 filters;
+ u64 saved_event_record[4];
+ u32 event_counter[16];
+};
+
struct xsave_hdr_struct {
u64 xstate_bv;
u64 reserved1[2];
@@ -363,6 +374,7 @@ struct xsave_struct {
struct i387_fxsave_struct i387;
struct xsave_hdr_struct xsave_hdr;
struct ymmh_struct ymmh;
+ struct lwp_struct lwp;
/* new processor state extensions will go here */
} __attribute__ ((packed, aligned (64)));

diff --git a/arch/x86/include/asm/sigcontext.h b/arch/x86/include/asm/sigcontext.h
index 04459d2..0a58b82 100644
--- a/arch/x86/include/asm/sigcontext.h
+++ b/arch/x86/include/asm/sigcontext.h
@@ -274,6 +274,17 @@ struct _ymmh_state {
__u32 ymmh_space[64];
};

+struct _lwp_state {
+ __u64 lwpcb_addr;
+ __u32 flags;
+ __u32 buf_head_offset;
+ __u64 buf_base;
+ __u32 buf_size;
+ __u32 filters;
+ __u64 saved_event_record[4];
+ __u32 event_counter[16];
+};
+
/*
* Extended state pointed by the fpstate pointer in the sigcontext.
* In addition to the fpstate, information encoded in the xstate_hdr
@@ -284,6 +295,7 @@ struct _xstate {
struct _fpstate fpstate;
struct _xsave_hdr xstate_hdr;
struct _ymmh_state ymmh;
+ struct _lwp_state lwp;
/* new processor state extensions go here */
};

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 4ccee3c..be89f0e 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -9,6 +9,7 @@
#define XSTATE_FP 0x1
#define XSTATE_SSE 0x2
#define XSTATE_YMM 0x4
+#define XSTATE_LWP (1ULL << 62)

#define XSTATE_FPSSE (XSTATE_FP | XSTATE_SSE)

@@ -24,7 +25,7 @@
* These are the features that the OS can handle currently.
*/
#define XCNTXT_LAZY (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
-#define XCNTXT_NONLAZY 0
+#define XCNTXT_NONLAZY (XSTATE_LWP)

#define XCNTXT_MASK (XCNTXT_LAZY | XCNTXT_NONLAZY)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 7b08d32..d3dc65e 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -177,6 +177,8 @@ int save_xstates_sigframe(void __user *buf, unsigned int size)
(struct _fpstate_ia32 __user *) buf) ? -1 : 1;

save_xstates(tsk);
+ if (pcntxt_mask & XSTATE_LWP)
+ wrmsrl(MSR_AMD64_LWP_CBADDR, 0);
if (use_xsaveopt())
sanitize_i387_state(tsk);

--
1.5.6.5

2011-03-23 15:28:38

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v2 4/8] x86, xsave: remove unused code

The patches to rework the fpu/xsave handling and signal frame setup have
made a lot of code unused. This patch removes all of that now-unused code.

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 155 ++----------------------------
arch/x86/include/asm/xsave.h | 51 ----------
arch/x86/kernel/i387.c | 221 ------------------------------------------
arch/x86/kernel/traps.c | 22 ----
arch/x86/kernel/xsave.c | 163 -------------------------------
5 files changed, 7 insertions(+), 605 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 30930bf..97867ea 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -29,8 +29,6 @@
# include <asm/sigcontext32.h>
# include <asm/user32.h>
#else
-# define save_i387_xstate_ia32 save_i387_xstate
-# define restore_i387_xstate_ia32 restore_i387_xstate
# define _fpstate_ia32 _fpstate
# define _xstate_ia32 _xstate
# define sig_xstate_ia32_size sig_xstate_size
@@ -108,75 +106,16 @@ static inline void sanitize_i387_state(struct task_struct *tsk)
}

#ifdef CONFIG_X86_64
-static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
+static inline void fxrstor(struct i387_fxsave_struct *fx)
{
- int err;
-
- /* See comment in fxsave() below. */
-#ifdef CONFIG_AS_FXSAVEQ
- asm volatile("1: fxrstorq %[fx]\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err)
- : [fx] "m" (*fx), "0" (0));
-#else
- asm volatile("1: rex64/fxrstor (%[fx])\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err)
- : [fx] "R" (fx), "m" (*fx), "0" (0));
-#endif
- return err;
-}
-
-static inline int fxsave_user(struct i387_fxsave_struct __user *fx)
-{
- int err;
-
- /*
- * Clear the bytes not touched by the fxsave and reserved
- * for the SW usage.
- */
- err = __clear_user(&fx->sw_reserved,
- sizeof(struct _fpx_sw_bytes));
- if (unlikely(err))
- return -EFAULT;
-
/* See comment in fxsave() below. */
#ifdef CONFIG_AS_FXSAVEQ
- asm volatile("1: fxsaveq %[fx]\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err), [fx] "=m" (*fx)
- : "0" (0));
+ asm volatile("fxrstorq %[fx]\n\t"
+ : : [fx] "m" (*fx));
#else
- asm volatile("1: rex64/fxsave (%[fx])\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err), "=m" (*fx)
- : [fx] "R" (fx), "0" (0));
+ asm volatile("rex64/fxrstor (%[fx])\n\t"
+ : : [fx] "R" (fx), "m" (*fx));
#endif
- if (unlikely(err) &&
- __clear_user(fx, sizeof(struct i387_fxsave_struct)))
- err = -EFAULT;
- /* No need to clear here because the caller clears USED_MATH */
- return err;
}

static inline void fpu_fxsave(struct fpu *fpu)
@@ -209,7 +148,7 @@ static inline void fpu_fxsave(struct fpu *fpu)
#else /* CONFIG_X86_32 */

/* perform fxrstor iff the processor has extended states, otherwise frstor */
-static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
+static inline void fxrstor(struct i387_fxsave_struct *fx)
{
/*
* The "nop" is needed to make the instructions the same
@@ -220,8 +159,6 @@ static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
"fxrstor %1",
X86_FEATURE_FXSR,
"m" (*fx));
-
- return 0;
}

static inline void fpu_fxsave(struct fpu *fpu)
@@ -246,7 +183,7 @@ static inline void fpu_fxsave(struct fpu *fpu)
*/
static inline void fpu_restore(struct fpu *fpu)
{
- fxrstor_checking(&fpu->state->fxsave);
+ fxrstor(&fpu->state->fxsave);
}

static inline void fpu_save(struct fpu *fpu)
@@ -278,69 +215,6 @@ static inline void fpu_clean(struct fpu *fpu)
[addr] "m" (safe_address));
}

-static inline void fpu_save_init(struct fpu *fpu)
-{
- if (use_xsave()) {
- struct xsave_struct *xstate = &fpu->state->xsave;
-
- fpu_xsave(xstate, -1);
-
- /*
- * xsave header may indicate the init state of the FP.
- */
- if (!(xstate->xsave_hdr.xstate_bv & XSTATE_FP))
- return;
- } else if (use_fxsr()) {
- fpu_fxsave(fpu);
- } else {
- asm volatile("fsave %[fx]; fwait"
- : [fx] "=m" (fpu->state->fsave));
- return;
- }
-
- if (unlikely(fpu->state->fxsave.swd & X87_FSW_ES))
- asm volatile("fnclex");
-
- /* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception
- is pending. Clear the x87 state here by setting it to fixed
- values. safe_address is a random variable that should be in L1 */
- alternative_input(
- ASM_NOP8 ASM_NOP2,
- "emms\n\t" /* clear stack tags */
- "fildl %P[addr]", /* set F?P to defined value */
- X86_FEATURE_FXSAVE_LEAK,
- [addr] "m" (safe_address));
-}
-
-static inline void __save_init_fpu(struct task_struct *tsk)
-{
- fpu_save_init(&tsk->thread.fpu);
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
-}
-
-static inline int fpu_restore_checking(struct fpu *fpu)
-{
- if (use_xsave())
- return xrstor_checking(&fpu->state->xsave, -1);
- else
- return fxrstor_checking(&fpu->state->fxsave);
-}
-
-/*
- * Signal frame handlers...
- */
-extern int save_i387_xstate(void __user *buf);
-extern int restore_i387_xstate(void __user *buf);
-
-static inline void __unlazy_fpu(struct task_struct *tsk)
-{
- if (task_thread_info(tsk)->status & TS_USEDFPU) {
- __save_init_fpu(tsk);
- stts();
- } else
- tsk->fpu_counter = 0;
-}
-
static inline void __clear_fpu(struct task_struct *tsk)
{
if (task_thread_info(tsk)->status & TS_USEDFPU) {
@@ -409,21 +283,6 @@ static inline void irq_ts_restore(int TS_state)
/*
* These disable preemption on their own and are safe
*/
-static inline void save_init_fpu(struct task_struct *tsk)
-{
- preempt_disable();
- __save_init_fpu(tsk);
- stts();
- preempt_enable();
-}
-
-static inline void unlazy_fpu(struct task_struct *tsk)
-{
- preempt_disable();
- __unlazy_fpu(tsk);
- preempt_enable();
-}
-
static inline void clear_fpu(struct task_struct *tsk)
{
preempt_disable();
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 200c56d..742da4a 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -51,26 +51,6 @@ extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
unsigned int size,
struct _fpx_sw_bytes *sw);

-static inline int xrstor_checking(struct xsave_struct *fx, u64 mask)
-{
- int err;
- u32 lmask = mask;
- u32 hmask = mask >> 32;
-
- asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err)
- : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask), "0" (0)
- : "memory");
-
- return err;
-}
-
static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
{
u32 lmask = mask;
@@ -81,37 +61,6 @@ static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
: "memory");
}

-static inline int xsave_checking(struct xsave_struct __user *buf)
-{
- int err;
-
- /*
- * Clear the xsave header first, so that reserved fields are
- * initialized to zero.
- */
- err = __clear_user(&buf->xsave_hdr,
- sizeof(struct xsave_hdr_struct));
- if (unlikely(err))
- return -EFAULT;
-
- asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x27\n"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b,3b)
- : [err] "=r" (err)
- : "D" (buf), "a" (-1), "d" (-1), "0" (0)
- : "memory");
-
- if (unlikely(err) && __clear_user(buf, xstate_size))
- err = -EFAULT;
-
- /* No need to clear here because the caller clears USED_MATH */
- return err;
-}
-
static inline void xsave_state(struct xsave_struct *fx, u64 mask)
{
u32 lmask = mask;
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 5cec7c2..d2d2b69 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -490,227 +490,6 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
}

/*
- * Signal frame handlers.
- */
-
-static inline int save_i387_fsave(struct _fpstate_ia32 __user *buf)
-{
- struct task_struct *tsk = current;
- struct i387_fsave_struct *fp = &tsk->thread.fpu.state->fsave;
-
- fp->status = fp->swd;
- if (__copy_to_user(buf, fp, sizeof(struct i387_fsave_struct)))
- return -1;
- return 1;
-}
-
-static int save_i387_fxsave(struct _fpstate_ia32 __user *buf)
-{
- struct task_struct *tsk = current;
- struct i387_fxsave_struct *fx = &tsk->thread.fpu.state->fxsave;
- struct user_i387_ia32_struct env;
- int err = 0;
-
- convert_from_fxsr(&env, tsk);
- if (__copy_to_user(buf, &env, sizeof(env)))
- return -1;
-
- err |= __put_user(fx->swd, &buf->status);
- err |= __put_user(X86_FXSR_MAGIC, &buf->magic);
- if (err)
- return -1;
-
- if (__copy_to_user(&buf->_fxsr_env[0], fx, xstate_size))
- return -1;
- return 1;
-}
-
-static int save_i387_xsave(void __user *buf)
-{
- struct task_struct *tsk = current;
- struct _fpstate_ia32 __user *fx = buf;
- int err = 0;
-
-
- sanitize_i387_state(tsk);
-
- /*
- * For legacy compatible, we always set FP/SSE bits in the bit
- * vector while saving the state to the user context.
- * This will enable us capturing any changes(during sigreturn) to
- * the FP/SSE bits by the legacy applications which don't touch
- * xstate_bv in the xsave header.
- *
- * xsave aware applications can change the xstate_bv in the xsave
- * header as well as change any contents in the memory layout.
- * xrestore as part of sigreturn will capture all the changes.
- */
- tsk->thread.fpu.state->xsave.xsave_hdr.xstate_bv |= XSTATE_FPSSE;
-
- if (save_i387_fxsave(fx) < 0)
- return -1;
-
- err = __copy_to_user(&fx->sw_reserved, &fx_sw_reserved_ia32,
- sizeof(struct _fpx_sw_bytes));
- err |= __put_user(FP_XSTATE_MAGIC2,
- (__u32 __user *) (buf + sig_xstate_ia32_size
- - FP_XSTATE_MAGIC2_SIZE));
- if (err)
- return -1;
-
- return 1;
-}
-
-int save_i387_xstate_ia32(void __user *buf)
-{
- struct _fpstate_ia32 __user *fp = (struct _fpstate_ia32 __user *) buf;
- struct task_struct *tsk = current;
-
- if (!used_math())
- return 0;
-
- if (!access_ok(VERIFY_WRITE, buf, sig_xstate_ia32_size))
- return -EACCES;
- /*
- * This will cause a "finit" to be triggered by the next
- * attempted FPU operation by the 'current' process.
- */
- clear_used_math();
-
- if (!HAVE_HWFP) {
- return fpregs_soft_get(current, NULL,
- 0, sizeof(struct user_i387_ia32_struct),
- NULL, fp) ? -1 : 1;
- }
-
- preempt_disable();
- save_xstates(tsk);
- preempt_enable();
-
- if (cpu_has_xsave)
- return save_i387_xsave(fp);
- if (cpu_has_fxsr)
- return save_i387_fxsave(fp);
- else
- return save_i387_fsave(fp);
-}
-
-static inline int restore_i387_fsave(struct _fpstate_ia32 __user *buf)
-{
- struct task_struct *tsk = current;
-
- return __copy_from_user(&tsk->thread.fpu.state->fsave, buf,
- sizeof(struct i387_fsave_struct));
-}
-
-static int restore_i387_fxsave(struct _fpstate_ia32 __user *buf,
- unsigned int size)
-{
- struct task_struct *tsk = current;
- struct user_i387_ia32_struct env;
- int err;
-
- err = __copy_from_user(&tsk->thread.fpu.state->fxsave, &buf->_fxsr_env[0],
- size);
- /* mxcsr reserved bits must be masked to zero for security reasons */
- tsk->thread.fpu.state->fxsave.mxcsr &= mxcsr_feature_mask;
- if (err || __copy_from_user(&env, buf, sizeof(env)))
- return 1;
- convert_to_fxsr(tsk, &env);
-
- return 0;
-}
-
-static int restore_i387_xsave(void __user *buf)
-{
- struct _fpx_sw_bytes fx_sw_user;
- struct _fpstate_ia32 __user *fx_user =
- ((struct _fpstate_ia32 __user *) buf);
- struct i387_fxsave_struct __user *fx =
- (struct i387_fxsave_struct __user *) &fx_user->_fxsr_env[0];
- struct xsave_hdr_struct *xsave_hdr =
- &current->thread.fpu.state->xsave.xsave_hdr;
- u64 mask;
- int err;
-
- if (check_for_xstate(fx, sig_xstate_ia32_size -
- offsetof(struct _fpstate_ia32, _fxsr_env),
- &fx_sw_user))
- goto fx_only;
-
- mask = fx_sw_user.xstate_bv;
-
- err = restore_i387_fxsave(buf, fx_sw_user.xstate_size);
-
- xsave_hdr->xstate_bv &= pcntxt_mask;
- /*
- * These bits must be zero.
- */
- xsave_hdr->reserved1[0] = xsave_hdr->reserved1[1] = 0;
-
- /*
- * Init the state that is not present in the memory layout
- * and enabled by the OS.
- */
- mask = ~(pcntxt_mask & ~mask);
- xsave_hdr->xstate_bv &= mask;
-
- return err;
-fx_only:
- /*
- * Couldn't find the extended state information in the memory
- * layout. Restore the FP/SSE and init the other extended state
- * enabled by the OS.
- */
- xsave_hdr->xstate_bv = XSTATE_FPSSE;
- return restore_i387_fxsave(buf, sizeof(struct i387_fxsave_struct));
-}
-
-int restore_i387_xstate_ia32(void __user *buf)
-{
- int err;
- struct task_struct *tsk = current;
- struct _fpstate_ia32 __user *fp = (struct _fpstate_ia32 __user *) buf;
-
- if (HAVE_HWFP)
- clear_fpu(tsk);
-
- if (!buf) {
- if (used_math()) {
- clear_fpu(tsk);
- clear_used_math();
- }
-
- return 0;
- } else
- if (!access_ok(VERIFY_READ, buf, sig_xstate_ia32_size))
- return -EACCES;
-
- if (!used_math()) {
- err = init_fpu(tsk);
- if (err)
- return err;
- }
-
- if (HAVE_HWFP) {
- if (cpu_has_xsave)
- err = restore_i387_xsave(buf);
- else if (cpu_has_fxsr)
- err = restore_i387_fxsave(fp, sizeof(struct
- i387_fxsave_struct));
- else
- err = restore_i387_fsave(fp);
- } else {
- err = fpregs_soft_set(current, NULL,
- 0, sizeof(struct user_i387_ia32_struct),
- NULL, fp) != 0;
- }
- set_used_math();
-
- return err;
-}
-
-/*
* FPU state for core dumps.
* This is only used for a.out dumps now.
* It is declared generically using elf_fpregset_t (which is
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 072c30e..872fc78 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -720,28 +720,6 @@ asmlinkage void __attribute__((weak)) smp_threshold_interrupt(void)
}

/*
- * __math_state_restore assumes that cr0.TS is already clear and the
- * fpu state is all ready for use. Used during context switch.
- */
-void __math_state_restore(void)
-{
- struct thread_info *thread = current_thread_info();
- struct task_struct *tsk = thread->task;
-
- /*
- * Paranoid restore. send a SIGSEGV if we fail to restore the state.
- */
- if (unlikely(fpu_restore_checking(&tsk->thread.fpu))) {
- stts();
- force_sig(SIGSEGV, tsk);
- return;
- }
-
- thread->status |= TS_USEDFPU; /* So we fnsave on switch_to() */
- tsk->fpu_counter++;
-}
-
-/*
* 'math_state_restore()' saves the current math information in the
* old math state array, and gets the new ones from the current task
*
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 5d07a88..f2714ea 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -249,169 +249,6 @@ int save_xstates_sigframe(void __user *buf, unsigned int size)
return 1;
}

-#ifdef CONFIG_X86_64
-int save_i387_xstate(void __user *buf)
-{
- struct task_struct *tsk = current;
- int err = 0;
-
- if (!access_ok(VERIFY_WRITE, buf, sig_xstate_size))
- return -EACCES;
-
- BUG_ON(sig_xstate_size < xstate_size);
-
- if ((unsigned long)buf % 64)
- printk("save_i387_xstate: bad fpstate %p\n", buf);
-
- if (!used_math())
- return 0;
-
- if (task_thread_info(tsk)->status & TS_USEDFPU) {
- if (use_xsave())
- err = xsave_checking(buf);
- else
- err = fxsave_user(buf);
-
- if (err)
- return err;
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
- stts();
- } else {
- sanitize_i387_state(tsk);
- if (__copy_to_user(buf, &tsk->thread.fpu.state->fxsave,
- xstate_size))
- return -1;
- }
-
- clear_used_math(); /* trigger finit */
-
- if (use_xsave()) {
- struct _fpstate __user *fx = buf;
- struct _xstate __user *x = buf;
- u64 xstate_bv;
-
- err = __copy_to_user(&fx->sw_reserved, &fx_sw_reserved,
- sizeof(struct _fpx_sw_bytes));
-
- err |= __put_user(FP_XSTATE_MAGIC2,
- (__u32 __user *) (buf + sig_xstate_size
- - FP_XSTATE_MAGIC2_SIZE));
-
- /*
- * Read the xstate_bv which we copied (directly from the cpu or
- * from the state in task struct) to the user buffers and
- * set the FP/SSE bits.
- */
- err |= __get_user(xstate_bv, &x->xstate_hdr.xstate_bv);
-
- /*
- * For legacy compatible, we always set FP/SSE bits in the bit
- * vector while saving the state to the user context. This will
- * enable us capturing any changes(during sigreturn) to
- * the FP/SSE bits by the legacy applications which don't touch
- * xstate_bv in the xsave header.
- *
- * xsave aware apps can change the xstate_bv in the xsave
- * header as well as change any contents in the memory layout.
- * xrestore as part of sigreturn will capture all the changes.
- */
- xstate_bv |= XSTATE_FPSSE;
-
- err |= __put_user(xstate_bv, &x->xstate_hdr.xstate_bv);
-
- if (err)
- return err;
- }
-
- return 1;
-}
-
-/*
- * Restore the extended state if present. Otherwise, restore the FP/SSE
- * state.
- */
-static int restore_user_xstate(void __user *buf)
-{
- struct _fpx_sw_bytes fx_sw_user;
- u64 mask;
- int err;
-
- if (((unsigned long)buf % 64) ||
- check_for_xstate(buf, sig_xstate_size, &fx_sw_user))
- goto fx_only;
-
- mask = fx_sw_user.xstate_bv;
-
- /*
- * restore the state passed by the user.
- */
- err = xrstor_checking((__force struct xsave_struct *)buf, mask);
- if (err)
- return err;
-
- /*
- * init the state skipped by the user.
- */
- mask = pcntxt_mask & ~mask;
- if (unlikely(mask))
- xrstor_state(init_xstate_buf, mask);
-
- return 0;
-
-fx_only:
- /*
- * couldn't find the extended state information in the
- * memory layout. Restore just the FP/SSE and init all
- * the other extended state.
- */
- xrstor_state(init_xstate_buf, pcntxt_mask & ~XSTATE_FPSSE);
- return fxrstor_checking((__force struct i387_fxsave_struct *)buf);
-}
-
-/*
- * This restores directly out of user space. Exceptions are handled.
- */
-int restore_i387_xstate(void __user *buf)
-{
- struct task_struct *tsk = current;
- int err = 0;
-
- if (!buf) {
- if (used_math())
- goto clear;
- return 0;
- } else
- if (!access_ok(VERIFY_READ, buf, sig_xstate_size))
- return -EACCES;
-
- if (!used_math()) {
- err = init_fpu(tsk);
- if (err)
- return err;
- }
-
- if (!(task_thread_info(current)->status & TS_USEDFPU)) {
- clts();
- task_thread_info(current)->status |= TS_USEDFPU;
- }
- if (use_xsave())
- err = restore_user_xstate(buf);
- else
- err = fxrstor_checking((__force struct i387_fxsave_struct *)
- buf);
- if (unlikely(err)) {
- /*
- * Encountered an error while doing the restore from the
- * user buffer, clear the fpu state.
- */
-clear:
- clear_fpu(tsk);
- clear_used_math();
- }
- return err;
-}
-#endif
-
int restore_xstates_sigframe(void __user *buf, unsigned int size)
{
#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
--
1.5.6.5

2011-03-23 15:28:05

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v2 0/8] x86, xsave: rework of extended state handling, LWP support

Changes since last patch set:
* fixed race in signal return path that could lead to xstate corruptions
* avoid LWP state inconsistency after signal return by disabling LWP
when entering a signal handler


This patch set is a general cleanup and rework of the code related to
handling of FPU and other extended states.

All handling of extended states, including the FPU state, is now handled
by xsave/xrstor wrappers that fall back to fxsave/fxrstor, or even
fsave/frstor, if hardware support for those features is lacking.

Non-lazy xstates, which cannot be restored lazily, can now be easily
supported with almost no processing overhead. This makes adding basic
support for AMDs LWP almost trivial.

Since non-lazy xstates are inherently incompatible with lazy allocation
of the xstate area, the complete removal of lazy allocation to further
reduce code complexity should be considered. Since SSE-optimized library
functions are widely used today, most processes will have an xstate area
anyway, so the memory overhead wouldn't be big enough to be much of an
issue.


Hans Rosenfeld (8):
x86, xsave: cleanup fpu/xsave support
x86, xsave: rework fpu/xsave support
x86, xsave: cleanup fpu/xsave signal frame setup
x86, xsave: remove unused code
x86, xsave: more cleanups
x86, xsave: add support for non-lazy xstates
x86, xsave: add kernel support for AMDs Lightweight Profiling (LWP)
x86, xsave: remove lazy allocation of xstate area

arch/x86/ia32/ia32_signal.c | 4 +-
arch/x86/include/asm/i387.h | 243 ++++++++--------------------
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/processor.h | 12 ++
arch/x86/include/asm/sigcontext.h | 12 ++
arch/x86/include/asm/thread_info.h | 4 +-
arch/x86/include/asm/xsave.h | 100 +++----------
arch/x86/kernel/i387.c | 310 ++++--------------------------------
arch/x86/kernel/process_32.c | 29 ++---
arch/x86/kernel/process_64.c | 28 +---
arch/x86/kernel/signal.c | 4 +-
arch/x86/kernel/traps.c | 47 +-----
arch/x86/kernel/xsave.c | 311 +++++++++++++++++++++++-------------
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 11 +-
arch/x86/math-emu/fpu_entry.c | 8 +-
drivers/lguest/x86/core.c | 2 +-
17 files changed, 380 insertions(+), 748 deletions(-)

2011-03-23 15:28:57

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v2 8/8] x86, xsave: remove lazy allocation of xstate area

This patch completely removes lazy allocation of the xstate area. All
tasks will always have an xstate area preallocated, just like they
already do when non-lazy features are present. The size of the xsave
area ranges from 112 to 960 bytes, depending on the xstates present and
enabled. Since it is common to use SSE etc. for optimization, the actual
overhead is expected to be negligible.

This removes some of the special-case handling of non-lazy xstates. It
also greatly simplifies init_fpu() by removing the allocation code, the
check for the presence of the xstate area, and the init_fpu() return value.

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 12 +++-------
arch/x86/kernel/i387.c | 46 +++++++++++-----------------------------
arch/x86/kernel/traps.c | 16 +------------
arch/x86/kernel/xsave.c | 21 ++----------------
arch/x86/kvm/x86.c | 4 +-
arch/x86/math-emu/fpu_entry.c | 8 +-----
6 files changed, 26 insertions(+), 81 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 22ad24c..0448f45 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -40,7 +40,7 @@
extern unsigned int sig_xstate_size;
extern void fpu_init(void);
extern void mxcsr_feature_mask_init(void);
-extern int init_fpu(struct task_struct *child);
+extern void init_fpu(struct task_struct *child);
extern asmlinkage void math_state_restore(void);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);

@@ -332,13 +332,9 @@ extern void fpu_finit(struct fpu *fpu);

static inline void fpu_clear(struct fpu *fpu)
{
- if (pcntxt_mask & XCNTXT_NONLAZY) {
- memset(fpu->state, 0, xstate_size);
- fpu_finit(fpu);
- set_used_math();
- } else {
- fpu_free(fpu);
- }
+ memset(fpu->state, 0, xstate_size);
+ fpu_finit(fpu);
+ set_used_math();
}

#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 88fefba..32b3c8d 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -42,6 +42,8 @@ void __cpuinit mxcsr_feature_mask_init(void)

static void __cpuinit init_thread_xstate(void)
{
+ static union thread_xstate x;
+
/*
* Note that xstate_size might be overwriten later during
* xsave_init().
@@ -62,6 +64,9 @@ static void __cpuinit init_thread_xstate(void)
xstate_size = sizeof(struct i387_fxsave_struct);
else
xstate_size = sizeof(struct i387_fsave_struct);
+
+ init_task.thread.fpu.state = &x;
+ fpu_finit(&init_task.thread.fpu);
}

/*
@@ -127,30 +132,20 @@ EXPORT_SYMBOL_GPL(fpu_finit);
* value at reset if we support XMM instructions and then
* remeber the current task has used the FPU.
*/
-int init_fpu(struct task_struct *tsk)
+void init_fpu(struct task_struct *tsk)
{
- int ret;
-
if (tsk_used_math(tsk)) {
if (HAVE_HWFP && tsk == current) {
preempt_disable();
save_xstates(tsk);
preempt_enable();
}
- return 0;
+ return;
}

- /*
- * Memory allocation at the first usage of the FPU and other state.
- */
- ret = fpu_alloc(&tsk->thread.fpu);
- if (ret)
- return ret;
-
fpu_finit(&tsk->thread.fpu);

set_stopped_child_used_math(tsk);
- return 0;
}
EXPORT_SYMBOL_GPL(init_fpu);

@@ -173,14 +168,10 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
- int ret;
-
if (!cpu_has_fxsr)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (use_xsaveopt())
sanitize_i387_state(target);
@@ -198,9 +189,7 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_fxsr)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (use_xsaveopt())
sanitize_i387_state(target);
@@ -232,9 +221,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_xsave)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

/*
* Copy the 48bytes defined by the software first into the xstate
@@ -262,9 +249,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_xsave)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.fpu.state->xsave, 0, -1);
@@ -427,11 +412,8 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
void *kbuf, void __user *ubuf)
{
struct user_i387_ia32_struct env;
- int ret;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (!HAVE_HWFP)
return fpregs_soft_get(target, regset, pos, count, kbuf, ubuf);
@@ -462,9 +444,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
struct user_i387_ia32_struct env;
int ret;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (use_xsaveopt())
sanitize_i387_state(target);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 872fc78..c8fbd04 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -734,20 +734,8 @@ asmlinkage void math_state_restore(void)
struct thread_info *thread = current_thread_info();
struct task_struct *tsk = thread->task;

- if (!tsk_used_math(tsk)) {
- local_irq_enable();
- /*
- * does a slab alloc which can sleep
- */
- if (init_fpu(tsk)) {
- /*
- * ran out of memory!
- */
- do_group_exit(SIGKILL);
- return;
- }
- local_irq_disable();
- }
+ if (!tsk_used_math(tsk))
+ init_fpu(tsk);

restore_xstates(tsk, XCNTXT_LAZY);
}
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index d3dc65e..81f54e9 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -264,7 +264,6 @@ int restore_xstates_sigframe(void __user *buf, unsigned int size)
struct _fpstate_ia32 __user *fp = buf;
struct xsave_struct *xsave;
u64 xstate_mask = pcntxt_mask & XCNTXT_NONLAZY;
- int err;

if (!buf) {
if (used_math()) {
@@ -277,11 +276,8 @@ int restore_xstates_sigframe(void __user *buf, unsigned int size)
if (!access_ok(VERIFY_READ, buf, size))
return -EACCES;

- if (!used_math()) {
- err = init_fpu(tsk);
- if (err)
- return err;
- }
+ if (!used_math())
+ init_fpu(tsk);

if (!HAVE_HWFP) {
set_used_math();
@@ -481,13 +477,8 @@ static void __init xstate_enable_boot_cpu(void)
"cntxt size 0x%x\n",
pcntxt_mask, xstate_size);

- if (pcntxt_mask & XCNTXT_NONLAZY) {
- static union thread_xstate x;
-
+ if (pcntxt_mask & XCNTXT_NONLAZY)
task_thread_info(&init_task)->xstate_mask |= XCNTXT_NONLAZY;
- init_task.thread.fpu.state = &x;
- fpu_finit(&init_task.thread.fpu);
- }
}

/*
@@ -530,9 +521,6 @@ void save_xstates(struct task_struct *tsk)
{
struct thread_info *ti = task_thread_info(tsk);

- if (!fpu_allocated(&tsk->thread.fpu))
- return;
-
xsave(&tsk->thread.fpu.state->xsave, ti->xstate_mask);

if (!(ti->xstate_mask & XCNTXT_LAZY))
@@ -566,9 +554,6 @@ void restore_xstates(struct task_struct *tsk, u64 mask)
{
struct thread_info *ti = task_thread_info(tsk);

- if (!fpu_allocated(&tsk->thread.fpu))
- return;
-
xrstor(&tsk->thread.fpu.state->xsave, mask);

ti->xstate_mask |= mask;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 10aeb04..bd71b12 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5377,8 +5377,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
int r;
sigset_t sigsaved;

- if (!tsk_used_math(current) && init_fpu(current))
- return -ENOMEM;
+ if (!tsk_used_math(current))
+ init_fpu(current);

if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 7718541..472e2b9 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -147,12 +147,8 @@ void math_emulate(struct math_emu_info *info)
unsigned long code_limit = 0; /* Initialized to stop compiler warnings */
struct desc_struct code_descriptor;

- if (!used_math()) {
- if (init_fpu(current)) {
- do_group_exit(SIGKILL);
- return;
- }
- }
+ if (!used_math())
+ init_fpu(current);

#ifdef RE_ENTRANT_CHECKING
if (emulating) {
--
1.5.6.5

2011-03-23 15:28:59

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v2 1/8] x86, xsave: cleanup fpu/xsave support

Removed the functions fpu_fxrstor_checking() and restore_fpu_checking()
because they weren't doing anything. Removed redundant xsave/xrstor
implementations. Since xsave/xrstor is not specific to the FPU, and also
for consistency, all xsave/xrstor functions now take a xsave_struct
argument.

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 20 +++-------
arch/x86/include/asm/xsave.h | 81 +++++++++++++++---------------------------
arch/x86/kernel/traps.c | 2 +-
arch/x86/kernel/xsave.c | 4 +-
4 files changed, 38 insertions(+), 69 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index ef32890..d908383 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -227,12 +227,14 @@ static inline void fpu_fxsave(struct fpu *fpu)
static inline void fpu_save_init(struct fpu *fpu)
{
if (use_xsave()) {
- fpu_xsave(fpu);
+ struct xsave_struct *xstate = &fpu->state->xsave;
+
+ fpu_xsave(xstate);

/*
* xsave header may indicate the init state of the FP.
*/
- if (!(fpu->state->xsave.xsave_hdr.xstate_bv & XSTATE_FP))
+ if (!(xstate->xsave_hdr.xstate_bv & XSTATE_FP))
return;
} else if (use_fxsr()) {
fpu_fxsave(fpu);
@@ -262,22 +264,12 @@ static inline void __save_init_fpu(struct task_struct *tsk)
task_thread_info(tsk)->status &= ~TS_USEDFPU;
}

-static inline int fpu_fxrstor_checking(struct fpu *fpu)
-{
- return fxrstor_checking(&fpu->state->fxsave);
-}
-
static inline int fpu_restore_checking(struct fpu *fpu)
{
if (use_xsave())
- return fpu_xrstor_checking(fpu);
+ return xrstor_checking(&fpu->state->xsave, -1);
else
- return fpu_fxrstor_checking(fpu);
-}
-
-static inline int restore_fpu_checking(struct task_struct *tsk)
-{
- return fpu_restore_checking(&tsk->thread.fpu);
+ return fxrstor_checking(&fpu->state->fxsave);
}

/*
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index c6ce245..8bcbbce 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -42,10 +42,11 @@ extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
void __user *fpstate,
struct _fpx_sw_bytes *sw);

-static inline int fpu_xrstor_checking(struct fpu *fpu)
+static inline int xrstor_checking(struct xsave_struct *fx, u64 mask)
{
- struct xsave_struct *fx = &fpu->state->xsave;
int err;
+ u32 lmask = mask;
+ u32 hmask = mask >> 32;

asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
"2:\n"
@@ -55,13 +56,23 @@ static inline int fpu_xrstor_checking(struct fpu *fpu)
".previous\n"
_ASM_EXTABLE(1b, 3b)
: [err] "=r" (err)
- : "D" (fx), "m" (*fx), "a" (-1), "d" (-1), "0" (0)
+ : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask), "0" (0)
: "memory");

return err;
}

-static inline int xsave_user(struct xsave_struct __user *buf)
+static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
+{
+ u32 lmask = mask;
+ u32 hmask = mask >> 32;
+
+ asm volatile(".byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
+ : : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
+ : "memory");
+}
+
+static inline int xsave_checking(struct xsave_struct __user *buf)
{
int err;

@@ -74,58 +85,24 @@ static inline int xsave_user(struct xsave_struct __user *buf)
if (unlikely(err))
return -EFAULT;

- __asm__ __volatile__("1: .byte " REX_PREFIX "0x0f,0xae,0x27\n"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- ".section __ex_table,\"a\"\n"
- _ASM_ALIGN "\n"
- _ASM_PTR "1b,3b\n"
- ".previous"
- : [err] "=r" (err)
- : "D" (buf), "a" (-1), "d" (-1), "0" (0)
- : "memory");
+ asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x27\n"
+ "2:\n"
+ ".section .fixup,\"ax\"\n"
+ "3: movl $-1,%[err]\n"
+ " jmp 2b\n"
+ ".previous\n"
+ _ASM_EXTABLE(1b,3b)
+ : [err] "=r" (err)
+ : "D" (buf), "a" (-1), "d" (-1), "0" (0)
+ : "memory");
+
if (unlikely(err) && __clear_user(buf, xstate_size))
err = -EFAULT;
- /* No need to clear here because the caller clears USED_MATH */
- return err;
-}
-
-static inline int xrestore_user(struct xsave_struct __user *buf, u64 mask)
-{
- int err;
- struct xsave_struct *xstate = ((__force struct xsave_struct *)buf);
- u32 lmask = mask;
- u32 hmask = mask >> 32;

- __asm__ __volatile__("1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- ".section __ex_table,\"a\"\n"
- _ASM_ALIGN "\n"
- _ASM_PTR "1b,3b\n"
- ".previous"
- : [err] "=r" (err)
- : "D" (xstate), "a" (lmask), "d" (hmask), "0" (0)
- : "memory"); /* memory required? */
+ /* No need to clear here because the caller clears USED_MATH */
return err;
}

-static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
-{
- u32 lmask = mask;
- u32 hmask = mask >> 32;
-
- asm volatile(".byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
- : : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
- : "memory");
-}
-
static inline void xsave_state(struct xsave_struct *fx, u64 mask)
{
u32 lmask = mask;
@@ -136,7 +113,7 @@ static inline void xsave_state(struct xsave_struct *fx, u64 mask)
: "memory");
}

-static inline void fpu_xsave(struct fpu *fpu)
+static inline void fpu_xsave(struct xsave_struct *fx)
{
/* This, however, we can work around by forcing the compiler to select
an addressing mode that doesn't require extended registers. */
@@ -144,7 +121,7 @@ static inline void fpu_xsave(struct fpu *fpu)
".byte " REX_PREFIX "0x0f,0xae,0x27",
".byte " REX_PREFIX "0x0f,0xae,0x37",
X86_FEATURE_XSAVEOPT,
- [fx] "D" (&fpu->state->xsave), "a" (-1), "d" (-1) :
+ [fx] "D" (fx), "a" (-1), "d" (-1) :
"memory");
}
#endif
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b9b6716..32f3043 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -728,7 +728,7 @@ void __math_state_restore(void)
/*
* Paranoid restore. send a SIGSEGV if we fail to restore the state.
*/
- if (unlikely(restore_fpu_checking(tsk))) {
+ if (unlikely(fpu_restore_checking(&tsk->thread.fpu))) {
stts();
force_sig(SIGSEGV, tsk);
return;
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 5471285..e204b07 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -170,7 +170,7 @@ int save_i387_xstate(void __user *buf)

if (task_thread_info(tsk)->status & TS_USEDFPU) {
if (use_xsave())
- err = xsave_user(buf);
+ err = xsave_checking(buf);
else
err = fxsave_user(buf);

@@ -247,7 +247,7 @@ static int restore_user_xstate(void __user *buf)
/*
* restore the state passed by the user.
*/
- err = xrestore_user(buf, mask);
+ err = xrstor_checking((__force struct xsave_struct *)buf, mask);
if (err)
return err;

--
1.5.6.5

2011-03-23 15:29:22

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v2 5/8] x86, xsave: more cleanups

Removed some unused declarations from headers.

Retired TS_USEDFPU; it has been replaced by the XCNTXT_* bits in
xstate_mask.

There is no reason functions like fpu_fxsave() etc. need to know or
handle anything other than a buffer to save/restore their stuff to/from.

The sanitize_i387_state() call is extra work that is only needed when
xsaveopt is used. There is no point in hiding this in an inline wrapper,
adding extra code lines just to save a single if() in the five places it
is used. Also, it obscures a fact that might well be interesting to
whoever is reading the code, while gaining nothing.

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 67 ++++++++++++-----------------------
arch/x86/include/asm/thread_info.h | 2 -
arch/x86/include/asm/xsave.h | 14 +++----
arch/x86/kernel/i387.c | 12 ++++--
arch/x86/kernel/xsave.c | 32 ++++++++---------
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 4 +-
7 files changed, 55 insertions(+), 78 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 97867ea..b8f9617 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -42,7 +42,6 @@ extern void fpu_init(void);
extern void mxcsr_feature_mask_init(void);
extern int init_fpu(struct task_struct *child);
extern asmlinkage void math_state_restore(void);
-extern void __math_state_restore(void);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);

extern void convert_from_fxsr(struct user_i387_ia32_struct *, struct task_struct *);
@@ -60,15 +59,10 @@ extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set,
*/
#define xstateregs_active fpregs_active

-extern struct _fpx_sw_bytes fx_sw_reserved;
extern unsigned int mxcsr_feature_mask;
+
#ifdef CONFIG_IA32_EMULATION
extern unsigned int sig_xstate_ia32_size;
-extern struct _fpx_sw_bytes fx_sw_reserved_ia32;
-struct _fpstate_ia32;
-struct _xstate_ia32;
-extern int save_i387_xstate_ia32(void __user *buf);
-extern int restore_i387_xstate_ia32(void __user *buf);
#endif

#ifdef CONFIG_MATH_EMULATION
@@ -76,7 +70,7 @@ extern int restore_i387_xstate_ia32(void __user *buf);
extern void finit_soft_fpu(struct i387_soft_struct *soft);
#else
# define HAVE_HWFP 1
-static inline void finit_soft_fpu(struct i387_soft_struct *soft) {}
+# define finit_soft_fpu(x)
#endif

#define X87_FSW_ES (1 << 7) /* Exception Summary */
@@ -96,15 +90,6 @@ static __always_inline __pure bool use_fxsr(void)
return static_cpu_has(X86_FEATURE_FXSR);
}

-extern void __sanitize_i387_state(struct task_struct *);
-
-static inline void sanitize_i387_state(struct task_struct *tsk)
-{
- if (!use_xsaveopt())
- return;
- __sanitize_i387_state(tsk);
-}
-
#ifdef CONFIG_X86_64
static inline void fxrstor(struct i387_fxsave_struct *fx)
{
@@ -118,7 +103,7 @@ static inline void fxrstor(struct i387_fxsave_struct *fx)
#endif
}

-static inline void fpu_fxsave(struct fpu *fpu)
+static inline void fpu_fxsave(struct i387_fxsave_struct *fx)
{
/* Using "rex64; fxsave %0" is broken because, if the memory operand
uses any extended registers for addressing, a second REX prefix
@@ -129,7 +114,7 @@ static inline void fpu_fxsave(struct fpu *fpu)
/* Using "fxsaveq %0" would be the ideal choice, but is only supported
starting with gas 2.16. */
__asm__ __volatile__("fxsaveq %0"
- : "=m" (fpu->state->fxsave));
+ : "=m" (*fx));
#else
/* Using, as a workaround, the properly prefixed form below isn't
accepted by any binutils version so far released, complaining that
@@ -140,8 +125,8 @@ static inline void fpu_fxsave(struct fpu *fpu)
This, however, we can work around by forcing the compiler to select
an addressing mode that doesn't require extended registers. */
asm volatile("rex64/fxsave (%[fx])"
- : "=m" (fpu->state->fxsave)
- : [fx] "R" (&fpu->state->fxsave));
+ : "=m" (*fx)
+ : [fx] "R" (fx));
#endif
}

@@ -161,10 +146,10 @@ static inline void fxrstor(struct i387_fxsave_struct *fx)
"m" (*fx));
}

-static inline void fpu_fxsave(struct fpu *fpu)
+static inline void fpu_fxsave(struct i387_fxsave_struct *fx)
{
asm volatile("fxsave %[fx]"
- : [fx] "=m" (fpu->state->fxsave));
+ : [fx] "=m" (*fx));
}

#endif /* CONFIG_X86_64 */
@@ -181,25 +166,25 @@ static inline void fpu_fxsave(struct fpu *fpu)
/*
* These must be called with preempt disabled
*/
-static inline void fpu_restore(struct fpu *fpu)
+static inline void fpu_restore(struct i387_fxsave_struct *fx)
{
- fxrstor(&fpu->state->fxsave);
+ fxrstor(fx);
}

-static inline void fpu_save(struct fpu *fpu)
+static inline void fpu_save(struct i387_fxsave_struct *fx)
{
if (use_fxsr()) {
- fpu_fxsave(fpu);
+ fpu_fxsave(fx);
} else {
asm volatile("fsave %[fx]; fwait"
- : [fx] "=m" (fpu->state->fsave));
+ : [fx] "=m" (*fx));
}
}

-static inline void fpu_clean(struct fpu *fpu)
+static inline void fpu_clean(struct i387_fxsave_struct *fx)
{
u32 swd = (use_fxsr() || use_xsave()) ?
- fpu->state->fxsave.swd : fpu->state->fsave.swd;
+ fx->swd : ((struct i387_fsave_struct *)fx)->swd;

if (unlikely(swd & X87_FSW_ES))
asm volatile("fnclex");
@@ -215,19 +200,6 @@ static inline void fpu_clean(struct fpu *fpu)
[addr] "m" (safe_address));
}

-static inline void __clear_fpu(struct task_struct *tsk)
-{
- if (task_thread_info(tsk)->status & TS_USEDFPU) {
- /* Ignore delayed exceptions from user space */
- asm volatile("1: fwait\n"
- "2:\n"
- _ASM_EXTABLE(1b, 2b));
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
- task_thread_info(tsk)->xstate_mask &= ~XCNTXT_LAZY;
- stts();
- }
-}
-
static inline void kernel_fpu_begin(void)
{
preempt_disable();
@@ -286,7 +258,14 @@ static inline void irq_ts_restore(int TS_state)
static inline void clear_fpu(struct task_struct *tsk)
{
preempt_disable();
- __clear_fpu(tsk);
+ if (task_thread_info(tsk)->xstate_mask & XCNTXT_LAZY) {
+ /* Ignore delayed exceptions from user space */
+ asm volatile("1: fwait\n"
+ "2:\n"
+ _ASM_EXTABLE(1b, 2b));
+ task_thread_info(tsk)->xstate_mask &= ~XCNTXT_LAZY;
+ stts();
+ }
preempt_enable();
}

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 5c92d21..13de316 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -238,8 +238,6 @@ static inline struct thread_info *current_thread_info(void)
* ever touches our thread-synchronous status, so we don't
* have to worry about atomic accesses.
*/
-#define TS_USEDFPU 0x0001 /* FPU was used by this task
- this quantum (SMP) */
#define TS_COMPAT 0x0002 /* 32bit syscall active (64BIT)*/
#define TS_POLLING 0x0004 /* idle task polling need_resched,
skip sending interrupt */
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 742da4a..b8861d4 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -37,8 +37,8 @@ extern unsigned int xstate_size;
extern u64 pcntxt_mask;
extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];

-extern void xsave(struct fpu *, u64);
-extern void xrstor(struct fpu *, u64);
+extern void xsave(struct xsave_struct *, u64);
+extern void xrstor(struct xsave_struct *, u64);
extern void save_xstates(struct task_struct *);
extern void restore_xstates(struct task_struct *, u64);
extern int save_xstates_sigframe(void __user *, unsigned int);
@@ -46,10 +46,7 @@ extern int restore_xstates_sigframe(void __user *, unsigned int);

extern void xsave_init(void);
extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
-extern int init_fpu(struct task_struct *child);
-extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
- unsigned int size,
- struct _fpx_sw_bytes *sw);
+extern void sanitize_i387_state(struct task_struct *);

static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
{
@@ -71,7 +68,7 @@ static inline void xsave_state(struct xsave_struct *fx, u64 mask)
: "memory");
}

-static inline void fpu_xsave(struct xsave_struct *fx, u64 mask)
+static inline void xsaveopt_state(struct xsave_struct *fx, u64 mask)
{
u32 lmask = mask;
u32 hmask = mask >> 32;
@@ -82,7 +79,8 @@ static inline void fpu_xsave(struct xsave_struct *fx, u64 mask)
".byte " REX_PREFIX "0x0f,0xae,0x27",
".byte " REX_PREFIX "0x0f,0xae,0x37",
X86_FEATURE_XSAVEOPT,
- [fx] "D" (fx), "a" (lmask), "d" (hmask) :
+ "D" (fx), "a" (lmask), "d" (hmask) :
"memory");
}
+
#endif
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index d2d2b69..88fefba 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -182,7 +182,8 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;

- sanitize_i387_state(target);
+ if (use_xsaveopt())
+ sanitize_i387_state(target);

return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
&target->thread.fpu.state->fxsave, 0, -1);
@@ -201,7 +202,8 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;

- sanitize_i387_state(target);
+ if (use_xsaveopt())
+ sanitize_i387_state(target);

ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.fpu.state->fxsave, 0, -1);
@@ -440,7 +442,8 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
-1);
}

- sanitize_i387_state(target);
+ if (use_xsaveopt())
+ sanitize_i387_state(target);

if (kbuf && pos == 0 && count == sizeof(env)) {
convert_from_fxsr(kbuf, target);
@@ -463,7 +466,8 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;

- sanitize_i387_state(target);
+ if (use_xsaveopt())
+ sanitize_i387_state(target);

if (!HAVE_HWFP)
return fpregs_soft_set(target, regset, pos, count, kbuf, ubuf);
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index f2714ea..4e5bf58 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -39,7 +39,7 @@ static unsigned int *xstate_offsets, *xstate_sizes, xstate_features;
* that the user doesn't see some stale state in the memory layout during
* signal handling, debugging etc.
*/
-void __sanitize_i387_state(struct task_struct *tsk)
+void sanitize_i387_state(struct task_struct *tsk)
{
u64 xstate_bv;
int feature_bit = 0x2;
@@ -48,7 +48,7 @@ void __sanitize_i387_state(struct task_struct *tsk)
if (!fx)
return;

- BUG_ON(task_thread_info(tsk)->status & TS_USEDFPU);
+ BUG_ON(task_thread_info(tsk)->xstate_mask & XCNTXT_LAZY);

xstate_bv = tsk->thread.fpu.state->xsave.xsave_hdr.xstate_bv;

@@ -103,8 +103,8 @@ void __sanitize_i387_state(struct task_struct *tsk)
* Check for the presence of extended state information in the
* user fpstate pointer in the sigcontext.
*/
-int check_for_xstate(struct i387_fxsave_struct __user *buf, unsigned int size,
- struct _fpx_sw_bytes *fx_sw_user)
+static int check_for_xstate(struct i387_fxsave_struct __user *buf, unsigned int size,
+ struct _fpx_sw_bytes *fx_sw_user)
{
int min_xstate_size = sizeof(struct i387_fxsave_struct) +
sizeof(struct xsave_hdr_struct);
@@ -176,7 +176,8 @@ int save_xstates_sigframe(void __user *buf, unsigned int size)
(struct _fpstate_ia32 __user *) buf) ? -1 : 1;

save_xstates(tsk);
- sanitize_i387_state(tsk);
+ if (use_xsaveopt())
+ sanitize_i387_state(tsk);

#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
if (ia32) {
@@ -498,17 +499,17 @@ void __cpuinit xsave_init(void)
this_func();
}

-void xsave(struct fpu *fpu, u64 mask)
+void xsave(struct xsave_struct *x, u64 mask)
{
clts();

if (use_xsave())
- fpu_xsave(&fpu->state->xsave, mask);
+ xsaveopt_state(x, mask);
else if (mask & XCNTXT_LAZY)
- fpu_save(fpu);
+ fpu_save(&x->i387);

if (mask & XCNTXT_LAZY)
- fpu_clean(fpu);
+ fpu_clean(&x->i387);

stts();
}
@@ -521,7 +522,7 @@ void save_xstates(struct task_struct *tsk)
if (!fpu_allocated(&tsk->thread.fpu))
return;

- xsave(&tsk->thread.fpu, ti->xstate_mask);
+ xsave(&tsk->thread.fpu.state->xsave, ti->xstate_mask);

if (!(ti->xstate_mask & XCNTXT_LAZY))
tsk->fpu_counter = 0;
@@ -533,19 +534,17 @@ void save_xstates(struct task_struct *tsk)
*/
if (tsk->fpu_counter < 5)
ti->xstate_mask &= ~XCNTXT_LAZY;
-
- ti->status &= ~TS_USEDFPU;
}
EXPORT_SYMBOL(save_xstates);

-void xrstor(struct fpu *fpu, u64 mask)
+void xrstor(struct xsave_struct *x, u64 mask)
{
clts();

if (use_xsave())
- xrstor_state(&fpu->state->xsave, mask);
+ xrstor_state(x, mask);
else if (mask & XCNTXT_LAZY)
- fpu_restore(fpu);
+ fpu_restore(&x->i387);

if (!(mask & XCNTXT_LAZY))
stts();
@@ -559,10 +558,9 @@ void restore_xstates(struct task_struct *tsk, u64 mask)
if (!fpu_allocated(&tsk->thread.fpu))
return;

- xrstor(&tsk->thread.fpu, mask);
+ xrstor(&tsk->thread.fpu.state->xsave, mask);

ti->xstate_mask |= mask;
- ti->status |= TS_USEDFPU;
if (mask & XCNTXT_LAZY)
tsk->fpu_counter++;
}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bf89ec2..d79bf2f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -876,7 +876,7 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
#ifdef CONFIG_X86_64
wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
#endif
- if (current_thread_info()->status & TS_USEDFPU)
+ if (current_thread_info()->xstate_mask & XCNTXT_LAZY)
clts();
load_gdt(&__get_cpu_var(host_gdt));
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8fb21ea..10aeb04 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5795,7 +5795,7 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
kvm_put_guest_xcr0(vcpu);
vcpu->guest_fpu_loaded = 1;
save_xstates(current);
- xrstor(&vcpu->arch.guest_fpu, -1);
+ xrstor(&vcpu->arch.guest_fpu.state->xsave, -1);
trace_kvm_fpu(1);
}

@@ -5807,7 +5807,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
return;

vcpu->guest_fpu_loaded = 0;
- xsave(&vcpu->arch.guest_fpu, -1);
+ xsave(&vcpu->arch.guest_fpu.state->xsave, -1);
++vcpu->stat.fpu_reload;
kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
trace_kvm_fpu(0);
--
1.5.6.5

2011-03-23 15:29:35

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v2 3/8] x86, xsave: cleanup fpu/xsave signal frame setup

There are currently two code paths that handle the fpu/xsave context in
a signal frame for 32bit and 64bit tasks. These two code paths differ
only in that they have or lack certain micro-optimizations or do some
additional work (fsave compatibility for 32bit). The code is complex,
mostly duplicated, and hard to understand and maintain.

This patch creates a set of two new, unified and cleaned up functions to
replace them. Besides avoiding the duplicate code, it is now obvious
what is done in which situations. The micro-optimization w.r.t. xsave
(saving to and restoring directly from the user buffer) is gone, and with
it the headaches it caused with validating the buffer alignment and
contents and catching possible xsave/xrstor faults.

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/ia32/ia32_signal.c | 4 +-
arch/x86/include/asm/i387.h | 20 ++++
arch/x86/include/asm/xsave.h | 4 +-
arch/x86/kernel/i387.c | 32 ++------
arch/x86/kernel/signal.c | 4 +-
arch/x86/kernel/xsave.c | 197 ++++++++++++++++++++++++++++++++++++++++--
6 files changed, 225 insertions(+), 36 deletions(-)

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 588a7aa..2605fae 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -255,7 +255,7 @@ static int ia32_restore_sigcontext(struct pt_regs *regs,

get_user_ex(tmp, &sc->fpstate);
buf = compat_ptr(tmp);
- err |= restore_i387_xstate_ia32(buf);
+ err |= restore_xstates_sigframe(buf, sig_xstate_ia32_size);

get_user_ex(*pax, &sc->ax);
} get_user_catch(err);
@@ -396,7 +396,7 @@ static void __user *get_sigframe(struct k_sigaction *ka, struct pt_regs *regs,
if (used_math()) {
sp = sp - sig_xstate_ia32_size;
*fpstate = (struct _fpstate_ia32 *) sp;
- if (save_i387_xstate_ia32(*fpstate) < 0)
+ if (save_xstates_sigframe(*fpstate, sig_xstate_ia32_size) < 0)
return (void __user *) -1L;
}

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 939af08..30930bf 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -25,6 +25,20 @@
#include <asm/uaccess.h>
#include <asm/xsave.h>

+#ifdef CONFIG_X86_64
+# include <asm/sigcontext32.h>
+# include <asm/user32.h>
+#else
+# define save_i387_xstate_ia32 save_i387_xstate
+# define restore_i387_xstate_ia32 restore_i387_xstate
+# define _fpstate_ia32 _fpstate
+# define _xstate_ia32 _xstate
+# define sig_xstate_ia32_size sig_xstate_size
+# define fx_sw_reserved_ia32 fx_sw_reserved
+# define user_i387_ia32_struct user_i387_struct
+# define user32_fxsr_struct user_fxsr_struct
+#endif
+
extern unsigned int sig_xstate_size;
extern void fpu_init(void);
extern void mxcsr_feature_mask_init(void);
@@ -33,6 +47,9 @@ extern asmlinkage void math_state_restore(void);
extern void __math_state_restore(void);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);

+extern void convert_from_fxsr(struct user_i387_ia32_struct *, struct task_struct *);
+extern void convert_to_fxsr(struct task_struct *, const struct user_i387_ia32_struct *);
+
extern user_regset_active_fn fpregs_active, xfpregs_active;
extern user_regset_get_fn fpregs_get, xfpregs_get, fpregs_soft_get,
xstateregs_get;
@@ -46,6 +63,7 @@ extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set,
#define xstateregs_active fpregs_active

extern struct _fpx_sw_bytes fx_sw_reserved;
+extern unsigned int mxcsr_feature_mask;
#ifdef CONFIG_IA32_EMULATION
extern unsigned int sig_xstate_ia32_size;
extern struct _fpx_sw_bytes fx_sw_reserved_ia32;
@@ -56,8 +74,10 @@ extern int restore_i387_xstate_ia32(void __user *buf);
#endif

#ifdef CONFIG_MATH_EMULATION
+# define HAVE_HWFP (boot_cpu_data.hard_math)
extern void finit_soft_fpu(struct i387_soft_struct *soft);
#else
+# define HAVE_HWFP 1
static inline void finit_soft_fpu(struct i387_soft_struct *soft) {}
#endif

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 6052a84..200c56d 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -41,12 +41,14 @@ extern void xsave(struct fpu *, u64);
extern void xrstor(struct fpu *, u64);
extern void save_xstates(struct task_struct *);
extern void restore_xstates(struct task_struct *, u64);
+extern int save_xstates_sigframe(void __user *, unsigned int);
+extern int restore_xstates_sigframe(void __user *, unsigned int);

extern void xsave_init(void);
extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
extern int init_fpu(struct task_struct *child);
extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
- void __user *fpstate,
+ unsigned int size,
struct _fpx_sw_bytes *sw);

static inline int xrstor_checking(struct xsave_struct *fx, u64 mask)
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 5ab66ec..5cec7c2 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -18,27 +18,7 @@
#include <asm/i387.h>
#include <asm/user.h>

-#ifdef CONFIG_X86_64
-# include <asm/sigcontext32.h>
-# include <asm/user32.h>
-#else
-# define save_i387_xstate_ia32 save_i387_xstate
-# define restore_i387_xstate_ia32 restore_i387_xstate
-# define _fpstate_ia32 _fpstate
-# define _xstate_ia32 _xstate
-# define sig_xstate_ia32_size sig_xstate_size
-# define fx_sw_reserved_ia32 fx_sw_reserved
-# define user_i387_ia32_struct user_i387_struct
-# define user32_fxsr_struct user_fxsr_struct
-#endif
-
-#ifdef CONFIG_MATH_EMULATION
-# define HAVE_HWFP (boot_cpu_data.hard_math)
-#else
-# define HAVE_HWFP 1
-#endif
-
-static unsigned int mxcsr_feature_mask __read_mostly = 0xffffffffu;
+unsigned int mxcsr_feature_mask __read_mostly = 0xffffffffu;
unsigned int xstate_size;
EXPORT_SYMBOL_GPL(xstate_size);
unsigned int sig_xstate_ia32_size = sizeof(struct _fpstate_ia32);
@@ -375,7 +355,7 @@ static inline u32 twd_fxsr_to_i387(struct i387_fxsave_struct *fxsave)
* FXSR floating point environment conversions.
*/

-static void
+void
convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
{
struct i387_fxsave_struct *fxsave = &tsk->thread.fpu.state->fxsave;
@@ -412,8 +392,8 @@ convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
memcpy(&to[i], &from[i], sizeof(to[0]));
}

-static void convert_to_fxsr(struct task_struct *tsk,
- const struct user_i387_ia32_struct *env)
+void convert_to_fxsr(struct task_struct *tsk,
+ const struct user_i387_ia32_struct *env)

{
struct i387_fxsave_struct *fxsave = &tsk->thread.fpu.state->fxsave;
@@ -653,7 +633,9 @@ static int restore_i387_xsave(void __user *buf)
u64 mask;
int err;

- if (check_for_xstate(fx, buf, &fx_sw_user))
+ if (check_for_xstate(fx, sig_xstate_ia32_size -
+ offsetof(struct _fpstate_ia32, _fxsr_env),
+ &fx_sw_user))
goto fx_only;

mask = fx_sw_user.xstate_bv;
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 4fd173c..f6705ff 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -117,7 +117,7 @@ restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
regs->orig_ax = -1; /* disable syscall checks */

get_user_ex(buf, &sc->fpstate);
- err |= restore_i387_xstate(buf);
+ err |= restore_xstates_sigframe(buf, sig_xstate_size);

get_user_ex(*pax, &sc->ax);
} get_user_catch(err);
@@ -252,7 +252,7 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
return (void __user *)-1L;

/* save i387 state */
- if (used_math() && save_i387_xstate(*fpstate) < 0)
+ if (used_math() && save_xstates_sigframe(*fpstate, sig_xstate_size) < 0)
return (void __user *)-1L;

return (void __user *)sp;
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index c422527..5d07a88 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -103,8 +103,7 @@ void __sanitize_i387_state(struct task_struct *tsk)
* Check for the presence of extended state information in the
* user fpstate pointer in the sigcontext.
*/
-int check_for_xstate(struct i387_fxsave_struct __user *buf,
- void __user *fpstate,
+int check_for_xstate(struct i387_fxsave_struct __user *buf, unsigned int size,
struct _fpx_sw_bytes *fx_sw_user)
{
int min_xstate_size = sizeof(struct i387_fxsave_struct) +
@@ -131,11 +130,11 @@ int check_for_xstate(struct i387_fxsave_struct __user *buf,
fx_sw_user->xstate_size > fx_sw_user->extended_size)
return -EINVAL;

- err = __get_user(magic2, (__u32 *) (((void *)fpstate) +
- fx_sw_user->extended_size -
+ err = __get_user(magic2, (__u32 *) (((void *)buf) + size -
FP_XSTATE_MAGIC2_SIZE));
if (err)
return err;
+
/*
* Check for the presence of second magic word at the end of memory
* layout. This detects the case where the user just copied the legacy
@@ -148,11 +147,109 @@ int check_for_xstate(struct i387_fxsave_struct __user *buf,
return 0;
}

-#ifdef CONFIG_X86_64
/*
* Signal frame handlers.
*/
+int save_xstates_sigframe(void __user *buf, unsigned int size)
+{
+ void __user *buf_fxsave = buf;
+ struct task_struct *tsk = current;
+ struct xsave_struct *xsave = &tsk->thread.fpu.state->xsave;
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ int ia32 = size == sig_xstate_ia32_size;
+#endif
+ int err;
+
+ if (!access_ok(VERIFY_WRITE, buf, size))
+ return -EACCES;
+
+ BUG_ON(size < xstate_size);
+
+ if (!used_math())
+ return 0;
+
+ clear_used_math(); /* trigger finit */
+
+ if (!HAVE_HWFP)
+ return fpregs_soft_get(current, NULL, 0,
+ sizeof(struct user_i387_ia32_struct), NULL,
+ (struct _fpstate_ia32 __user *) buf) ? -1 : 1;
+
+ save_xstates(tsk);
+ sanitize_i387_state(tsk);
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ if (ia32) {
+ if (use_xsave() || use_fxsr()) {
+ struct user_i387_ia32_struct env;
+ struct _fpstate_ia32 __user *fp = buf;
+
+ convert_from_fxsr(&env, tsk);
+ if (__copy_to_user(buf, &env, sizeof(env)))
+ return -1;
+
+ err = __put_user(xsave->i387.swd, &fp->status);
+ err |= __put_user(X86_FXSR_MAGIC, &fp->magic);
+
+ if (err)
+ return -1;
+
+ buf_fxsave = fp->_fxsr_env;
+ size -= offsetof(struct _fpstate_ia32, _fxsr_env);
+#if defined(CONFIG_X86_64)
+ buf = buf_fxsave;
+#endif
+ } else {
+ struct i387_fsave_struct *fsave =
+ &tsk->thread.fpu.state->fsave;
+
+ fsave->status = fsave->swd;
+ }
+ }
+#endif

+ if (__copy_to_user(buf_fxsave, xsave, size))
+ return -1;
+
+ if (use_xsave()) {
+ struct _fpstate __user *fp = buf;
+ struct _xstate __user *x = buf;
+ u64 xstate_bv = xsave->xsave_hdr.xstate_bv;
+
+ err = __copy_to_user(&fp->sw_reserved,
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ ia32 ? &fx_sw_reserved_ia32 :
+#endif
+ &fx_sw_reserved,
+ sizeof (struct _fpx_sw_bytes));
+
+ err |= __put_user(FP_XSTATE_MAGIC2,
+ (__u32 __user *) (buf_fxsave + size
+ - FP_XSTATE_MAGIC2_SIZE));
+
+ /*
+ * For legacy compatible, we always set FP/SSE bits in the bit
+ * vector while saving the state to the user context. This will
+ * enable us capturing any changes(during sigreturn) to
+ * the FP/SSE bits by the legacy applications which don't touch
+ * xstate_bv in the xsave header.
+ *
+ * xsave aware apps can change the xstate_bv in the xsave
+ * header as well as change any contents in the memory layout.
+ * xrestore as part of sigreturn will capture all the changes.
+ */
+ xstate_bv |= XSTATE_FPSSE;
+
+ err |= __put_user(xstate_bv, &x->xstate_hdr.xstate_bv);
+
+ if (err)
+ return err;
+ }
+
+ return 1;
+}
+
+#ifdef CONFIG_X86_64
int save_i387_xstate(void __user *buf)
{
struct task_struct *tsk = current;
@@ -240,7 +337,7 @@ static int restore_user_xstate(void __user *buf)
int err;

if (((unsigned long)buf % 64) ||
- check_for_xstate(buf, buf, &fx_sw_user))
+ check_for_xstate(buf, sig_xstate_size, &fx_sw_user))
goto fx_only;

mask = fx_sw_user.xstate_bv;
@@ -315,6 +412,94 @@ clear:
}
#endif

+int restore_xstates_sigframe(void __user *buf, unsigned int size)
+{
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ struct user_i387_ia32_struct env;
+ int ia32 = size == sig_xstate_ia32_size;
+#endif
+ struct _fpx_sw_bytes fx_sw_user;
+ struct task_struct *tsk = current;
+ struct _fpstate_ia32 __user *fp = buf;
+ struct xsave_struct *xsave;
+ u64 xstate_mask = 0;
+ int err;
+
+ if (!buf) {
+ if (used_math()) {
+ clear_fpu(tsk);
+ clear_used_math();
+ }
+ return 0;
+ }
+
+ if (!access_ok(VERIFY_READ, buf, size))
+ return -EACCES;
+
+ if (!used_math()) {
+ err = init_fpu(tsk);
+ if (err)
+ return err;
+ }
+
+ if (!HAVE_HWFP) {
+ set_used_math();
+ return fpregs_soft_set(current, NULL,
+ 0, sizeof(struct user_i387_ia32_struct),
+ NULL, fp) != 0;
+ }
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ if (ia32 && (use_xsave() || use_fxsr())) {
+ if (__copy_from_user(&env, buf, sizeof(env)))
+ return -1;
+ buf = fp->_fxsr_env;
+ size -= offsetof(struct _fpstate_ia32, _fxsr_env);
+ }
+#endif
+
+ xsave = &tsk->thread.fpu.state->xsave;
+ task_thread_info(tsk)->xstate_mask = 0;
+ if (__copy_from_user(xsave, buf, xstate_size))
+ return -1;
+
+ if (use_xsave()) {
+ u64 *xstate_bv = &xsave->xsave_hdr.xstate_bv;
+
+ /*
+ * If this is no valid xstate, disable all extended states.
+ *
+ * For valid xstates, clear any illegal bits and any bits
+ * that have been cleared in fx_sw_user.xstate_bv.
+ */
+ if (check_for_xstate(buf, size, &fx_sw_user))
+ *xstate_bv = XSTATE_FPSSE;
+ else
+ *xstate_bv &= pcntxt_mask & fx_sw_user.xstate_bv;
+
+ xstate_mask |= *xstate_bv;
+
+ xsave->xsave_hdr.reserved1[0] =
+ xsave->xsave_hdr.reserved1[1] = 0;
+ } else {
+ xstate_mask |= XCNTXT_LAZY;
+ }
+
+ if (use_xsave() || use_fxsr()) {
+ xsave->i387.mxcsr &= mxcsr_feature_mask;
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ if (ia32)
+ convert_to_fxsr(tsk, &env);
+#endif
+ }
+
+ set_used_math();
+ restore_xstates(tsk, xstate_mask);
+
+ return 0;
+}
+
/*
* Prepare the SW reserved portion of the fxsave memory layout, indicating
* the presence of the extended state information in the memory layout
--
1.5.6.5

2011-03-24 11:39:15

by Brian Gerst

[permalink] [raw]
Subject: Re: [RFC v2 8/8] x86, xsave: remove lazy allocation of xstate area

On Wed, Mar 23, 2011 at 11:27 AM, Hans Rosenfeld <[email protected]> wrote:
> This patch completely removes lazy allocation of the xstate area. All
> tasks will always have an xstate area preallocated, just like they
> already do when non-lazy features are present. The size of the xsave
> area ranges from 112 to 960 bytes, depending on the xstates present and
> enabled. Since it is common to use SSE etc. for optimization, the actual
> overhead is expected to negligible.
>
> This removes some of the special-case handling of non-lazy xstates. It
> also greatly simplifies init_fpu() by removing the allocation code, the
> check for presence of the xstate area or init_fpu() return value.
>
> Signed-off-by: Hans Rosenfeld <[email protected]>

I'm not sure I like this. I did a quick test on 64-bit, and found
that while most if not all user processes allocated the fpu save area
(probably because of glibc blindly initializing the fpu), kernel
threads did not. This patch would force kernel threads to allocate
memory they would never use.

--
Brian Gerst

2011-03-29 14:17:28

by Hans Rosenfeld

[permalink] [raw]
Subject: Re: [RFC v2 8/8] x86, xsave: remove lazy allocation of xstate area

On Thu, Mar 24, 2011 at 07:39:13AM -0400, Brian Gerst wrote:
> On Wed, Mar 23, 2011 at 11:27 AM, Hans Rosenfeld <[email protected]> wrote:
> > This patch completely removes lazy allocation of the xstate area. All
> > tasks will always have an xstate area preallocated, just like they
> > already do when non-lazy features are present. The size of the xsave
> > area ranges from 112 to 960 bytes, depending on the xstates present and
> > enabled. Since it is common to use SSE etc. for optimization, the actual
> > overhead is expected to negligible.
> >
> > This removes some of the special-case handling of non-lazy xstates. It
> > also greatly simplifies init_fpu() by removing the allocation code, the
> > check for presence of the xstate area or init_fpu() return value.
> >
> > Signed-off-by: Hans Rosenfeld <[email protected]>
>
> I'm not sure I like this. I did a quick test on 64-bit, and found
> that while most if not all user processes allocated the fpu save area
> (probably because of glibc blindly initializing the fpu), kernel
> threads did not. This patch would force kernel threads to allocate
> memory they would never use.

Yes, up to a few kilobytes would be wasted by kernel threads. The
related code gets much simpler. I think that is a good thing.

Anyway, the patch is not essential for the rework and LWP support, so I
don't really care that much about it.


Did you take a look at the other patches? I haven't yet received a
single comment on them.


Hans


--
%SYSTEM-F-ANARCHISM, The operating system has been overthrown

2011-03-29 15:28:17

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC v2 8/8] x86, xsave: remove lazy allocation of xstate area

On 03/29/2011 07:17 AM, Hans Rosenfeld wrote:
>>
>> I'm not sure I like this. I did a quick test on 64-bit, and found
>> that while most if not all user processes allocated the fpu save area
>> (probably because of glibc blindly initializing the fpu), kernel
>> threads did not. This patch would force kernel threads to allocate
>> memory they would never use.
>
> Yes, up to a few kilobytes would be wasted by kernel threads. The
> related code gets much simpler. I think that is a good thing.
>

This is silly. It shouldn't be very hard to allocate this for user
threads while avoiding the allocation for kernel threads. The only
excuse for allocating it for user threads is if it becomes part of the
kernel stack allocation.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2011-03-30 13:12:04

by Hans Rosenfeld

[permalink] [raw]
Subject: Re: [RFC v2 8/8] x86, xsave: remove lazy allocation of xstate area

On Tue, Mar 29, 2011 at 11:27:50AM -0400, H. Peter Anvin wrote:
> On 03/29/2011 07:17 AM, Hans Rosenfeld wrote:
> >>
> >> I'm not sure I like this. I did a quick test on 64-bit, and found
> >> that while most if not all user processes allocated the fpu save area
> >> (probably because of glibc blindly initializing the fpu), kernel
> >> threads did not. This patch would force kernel threads to allocate
> >> memory they would never use.
> >
> > Yes, up to a few kilobytes would be wasted by kernel threads. The
> > related code gets much simpler. I think that is a good thing.
> >
>
> This is silly. It shouldn't be very hard to allocate this for user
> threads while avoiding the allocation for kernel threads. The only
> excuse for allocating it for user threads is if it becomes part of the
> kernel stack allocation.

The allocation itself is not what I'm concerned about. I'm more worried
about the code that always has to check whether a thread has an xstate
area allocated or not. But I will try to find out how to get this done the
way you suggested.
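
A rough sketch of what that could look like (just an illustration, assuming
the preallocation ends up in arch_dup_task_struct() and that PF_KTHREAD is
usable there to tell kernel threads apart):

int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
{
	int ret;

	*dst = *src;
	memset(&dst->thread.fpu, 0, sizeof(dst->thread.fpu));

	/*
	 * Kernel threads never execute user FPU code, so skip the
	 * allocation for them. A kernel thread that later execs a user
	 * program would still need the area allocated at exec time.
	 */
	if (dst->flags & PF_KTHREAD)
		return 0;

	ret = fpu_alloc(&dst->thread.fpu);
	if (ret)
		return ret;

	if (fpu_allocated(&src->thread.fpu))
		fpu_copy(&dst->thread.fpu, &src->thread.fpu);

	return 0;
}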


Meanwhile, could you please review the other patches? They are much more
important to me.


Hans


--
%SYSTEM-F-ANARCHISM, The operating system has been overthrown

2011-04-05 15:51:20

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v3 7/8] x86, xsave: add kernel support for AMDs Lightweight Profiling (LWP)

This patch extends the xsave structure to support the LWP state. The
xstate feature bit for LWP is added to XCNTXT_NONLAZY, thereby enabling
kernel support for saving/restoring LWP state. The LWP state is also
saved/restored on signal entry/return, just like all other xstates. LWP
state needs to be reset (disabled) when entering a signal handler.

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/processor.h | 12 ++++++++++++
arch/x86/include/asm/sigcontext.h | 12 ++++++++++++
arch/x86/include/asm/xsave.h | 3 ++-
arch/x86/kernel/xsave.c | 2 ++
5 files changed, 29 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index fd5a1f3..55edab6 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -131,6 +131,7 @@
#define MSR_AMD64_IBSDCPHYSAD 0xc0011039
#define MSR_AMD64_IBSCTL 0xc001103a
#define MSR_AMD64_IBSBRTARGET 0xc001103b
+#define MSR_AMD64_LWP_CBADDR 0xc0000106

/* Fam 15h MSRs */
#define MSR_F15H_PERF_CTL 0xc0010200
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 4c25ab4..df2cbd4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -353,6 +353,17 @@ struct ymmh_struct {
u32 ymmh_space[64];
};

+struct lwp_struct {
+ u64 lwpcb_addr;
+ u32 flags;
+ u32 buf_head_offset;
+ u64 buf_base;
+ u32 buf_size;
+ u32 filters;
+ u64 saved_event_record[4];
+ u32 event_counter[16];
+};
+
struct xsave_hdr_struct {
u64 xstate_bv;
u64 reserved1[2];
@@ -363,6 +374,7 @@ struct xsave_struct {
struct i387_fxsave_struct i387;
struct xsave_hdr_struct xsave_hdr;
struct ymmh_struct ymmh;
+ struct lwp_struct lwp;
/* new processor state extensions will go here */
} __attribute__ ((packed, aligned (64)));

diff --git a/arch/x86/include/asm/sigcontext.h b/arch/x86/include/asm/sigcontext.h
index 04459d2..0a58b82 100644
--- a/arch/x86/include/asm/sigcontext.h
+++ b/arch/x86/include/asm/sigcontext.h
@@ -274,6 +274,17 @@ struct _ymmh_state {
__u32 ymmh_space[64];
};

+struct _lwp_state {
+ __u64 lwpcb_addr;
+ __u32 flags;
+ __u32 buf_head_offset;
+ __u64 buf_base;
+ __u32 buf_size;
+ __u32 filters;
+ __u64 saved_event_record[4];
+ __u32 event_counter[16];
+};
+
/*
* Extended state pointed by the fpstate pointer in the sigcontext.
* In addition to the fpstate, information encoded in the xstate_hdr
@@ -284,6 +295,7 @@ struct _xstate {
struct _fpstate fpstate;
struct _xsave_hdr xstate_hdr;
struct _ymmh_state ymmh;
+ struct _lwp_state lwp;
/* new processor state extensions go here */
};

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 4ccee3c..be89f0e 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -9,6 +9,7 @@
#define XSTATE_FP 0x1
#define XSTATE_SSE 0x2
#define XSTATE_YMM 0x4
+#define XSTATE_LWP (1ULL << 62)

#define XSTATE_FPSSE (XSTATE_FP | XSTATE_SSE)

@@ -24,7 +25,7 @@
* These are the features that the OS can handle currently.
*/
#define XCNTXT_LAZY (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
-#define XCNTXT_NONLAZY 0
+#define XCNTXT_NONLAZY (XSTATE_LWP)

#define XCNTXT_MASK (XCNTXT_LAZY | XCNTXT_NONLAZY)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 56ab3d3..a188362 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -177,6 +177,8 @@ int save_xstates_sigframe(void __user *buf, unsigned int size)
(struct _fpstate_ia32 __user *) buf) ? -1 : 1;

save_xstates(tsk);
+ if (pcntxt_mask & XSTATE_LWP)
+ wrmsrl(MSR_AMD64_LWP_CBADDR, 0);
if (use_xsaveopt())
sanitize_i387_state(tsk);

--
1.5.6.5

2011-04-05 15:51:34

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v3 5/8] x86, xsave: more cleanups

Removed some unused declarations from headers.

Retired TS_USEDFPU; it has been replaced by the XCNTXT_* bits in
xstate_mask.

There is no reason why functions like fpu_fxsave() etc. need to know
about or handle anything other than the buffer they save/restore their
state to/from.

sanitize_i387_state() is extra work that is only needed when xsaveopt is
used. There is no point in hiding this check in an inline wrapper, adding
extra code just to save a single if() in the five places it is used. It
also obscures a fact that may well be interesting to whoever is reading
the code, while gaining nothing.

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 67 ++++++++++++-----------------------
arch/x86/include/asm/thread_info.h | 2 -
arch/x86/include/asm/xsave.h | 14 +++----
arch/x86/kernel/i387.c | 12 ++++--
arch/x86/kernel/xsave.c | 32 ++++++++---------
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 4 +-
7 files changed, 55 insertions(+), 78 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 97867ea..b8f9617 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -42,7 +42,6 @@ extern void fpu_init(void);
extern void mxcsr_feature_mask_init(void);
extern int init_fpu(struct task_struct *child);
extern asmlinkage void math_state_restore(void);
-extern void __math_state_restore(void);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);

extern void convert_from_fxsr(struct user_i387_ia32_struct *, struct task_struct *);
@@ -60,15 +59,10 @@ extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set,
*/
#define xstateregs_active fpregs_active

-extern struct _fpx_sw_bytes fx_sw_reserved;
extern unsigned int mxcsr_feature_mask;
+
#ifdef CONFIG_IA32_EMULATION
extern unsigned int sig_xstate_ia32_size;
-extern struct _fpx_sw_bytes fx_sw_reserved_ia32;
-struct _fpstate_ia32;
-struct _xstate_ia32;
-extern int save_i387_xstate_ia32(void __user *buf);
-extern int restore_i387_xstate_ia32(void __user *buf);
#endif

#ifdef CONFIG_MATH_EMULATION
@@ -76,7 +70,7 @@ extern int restore_i387_xstate_ia32(void __user *buf);
extern void finit_soft_fpu(struct i387_soft_struct *soft);
#else
# define HAVE_HWFP 1
-static inline void finit_soft_fpu(struct i387_soft_struct *soft) {}
+# define finit_soft_fpu(x)
#endif

#define X87_FSW_ES (1 << 7) /* Exception Summary */
@@ -96,15 +90,6 @@ static __always_inline __pure bool use_fxsr(void)
return static_cpu_has(X86_FEATURE_FXSR);
}

-extern void __sanitize_i387_state(struct task_struct *);
-
-static inline void sanitize_i387_state(struct task_struct *tsk)
-{
- if (!use_xsaveopt())
- return;
- __sanitize_i387_state(tsk);
-}
-
#ifdef CONFIG_X86_64
static inline void fxrstor(struct i387_fxsave_struct *fx)
{
@@ -118,7 +103,7 @@ static inline void fxrstor(struct i387_fxsave_struct *fx)
#endif
}

-static inline void fpu_fxsave(struct fpu *fpu)
+static inline void fpu_fxsave(struct i387_fxsave_struct *fx)
{
/* Using "rex64; fxsave %0" is broken because, if the memory operand
uses any extended registers for addressing, a second REX prefix
@@ -129,7 +114,7 @@ static inline void fpu_fxsave(struct fpu *fpu)
/* Using "fxsaveq %0" would be the ideal choice, but is only supported
starting with gas 2.16. */
__asm__ __volatile__("fxsaveq %0"
- : "=m" (fpu->state->fxsave));
+ : "=m" (*fx));
#else
/* Using, as a workaround, the properly prefixed form below isn't
accepted by any binutils version so far released, complaining that
@@ -140,8 +125,8 @@ static inline void fpu_fxsave(struct fpu *fpu)
This, however, we can work around by forcing the compiler to select
an addressing mode that doesn't require extended registers. */
asm volatile("rex64/fxsave (%[fx])"
- : "=m" (fpu->state->fxsave)
- : [fx] "R" (&fpu->state->fxsave));
+ : "=m" (*fx)
+ : [fx] "R" (fx));
#endif
}

@@ -161,10 +146,10 @@ static inline void fxrstor(struct i387_fxsave_struct *fx)
"m" (*fx));
}

-static inline void fpu_fxsave(struct fpu *fpu)
+static inline void fpu_fxsave(struct i387_fxsave_struct *fx)
{
asm volatile("fxsave %[fx]"
- : [fx] "=m" (fpu->state->fxsave));
+ : [fx] "=m" (*fx));
}

#endif /* CONFIG_X86_64 */
@@ -181,25 +166,25 @@ static inline void fpu_fxsave(struct fpu *fpu)
/*
* These must be called with preempt disabled
*/
-static inline void fpu_restore(struct fpu *fpu)
+static inline void fpu_restore(struct i387_fxsave_struct *fx)
{
- fxrstor(&fpu->state->fxsave);
+ fxrstor(fx);
}

-static inline void fpu_save(struct fpu *fpu)
+static inline void fpu_save(struct i387_fxsave_struct *fx)
{
if (use_fxsr()) {
- fpu_fxsave(fpu);
+ fpu_fxsave(fx);
} else {
asm volatile("fsave %[fx]; fwait"
- : [fx] "=m" (fpu->state->fsave));
+ : [fx] "=m" (*fx));
}
}

-static inline void fpu_clean(struct fpu *fpu)
+static inline void fpu_clean(struct i387_fxsave_struct *fx)
{
u32 swd = (use_fxsr() || use_xsave()) ?
- fpu->state->fxsave.swd : fpu->state->fsave.swd;
+ fx->swd : ((struct i387_fsave_struct *)fx)->swd;

if (unlikely(swd & X87_FSW_ES))
asm volatile("fnclex");
@@ -215,19 +200,6 @@ static inline void fpu_clean(struct fpu *fpu)
[addr] "m" (safe_address));
}

-static inline void __clear_fpu(struct task_struct *tsk)
-{
- if (task_thread_info(tsk)->status & TS_USEDFPU) {
- /* Ignore delayed exceptions from user space */
- asm volatile("1: fwait\n"
- "2:\n"
- _ASM_EXTABLE(1b, 2b));
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
- task_thread_info(tsk)->xstate_mask &= ~XCNTXT_LAZY;
- stts();
- }
-}
-
static inline void kernel_fpu_begin(void)
{
preempt_disable();
@@ -286,7 +258,14 @@ static inline void irq_ts_restore(int TS_state)
static inline void clear_fpu(struct task_struct *tsk)
{
preempt_disable();
- __clear_fpu(tsk);
+ if (task_thread_info(tsk)->xstate_mask & XCNTXT_LAZY) {
+ /* Ignore delayed exceptions from user space */
+ asm volatile("1: fwait\n"
+ "2:\n"
+ _ASM_EXTABLE(1b, 2b));
+ task_thread_info(tsk)->xstate_mask &= ~XCNTXT_LAZY;
+ stts();
+ }
preempt_enable();
}

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index ec12d62..0e691c6 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -244,8 +244,6 @@ static inline struct thread_info *current_thread_info(void)
* ever touches our thread-synchronous status, so we don't
* have to worry about atomic accesses.
*/
-#define TS_USEDFPU 0x0001 /* FPU was used by this task
- this quantum (SMP) */
#define TS_COMPAT 0x0002 /* 32bit syscall active (64BIT)*/
#define TS_POLLING 0x0004 /* idle task polling need_resched,
skip sending interrupt */
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 742da4a..b8861d4 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -37,8 +37,8 @@ extern unsigned int xstate_size;
extern u64 pcntxt_mask;
extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];

-extern void xsave(struct fpu *, u64);
-extern void xrstor(struct fpu *, u64);
+extern void xsave(struct xsave_struct *, u64);
+extern void xrstor(struct xsave_struct *, u64);
extern void save_xstates(struct task_struct *);
extern void restore_xstates(struct task_struct *, u64);
extern int save_xstates_sigframe(void __user *, unsigned int);
@@ -46,10 +46,7 @@ extern int restore_xstates_sigframe(void __user *, unsigned int);

extern void xsave_init(void);
extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
-extern int init_fpu(struct task_struct *child);
-extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
- unsigned int size,
- struct _fpx_sw_bytes *sw);
+extern void sanitize_i387_state(struct task_struct *);

static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
{
@@ -71,7 +68,7 @@ static inline void xsave_state(struct xsave_struct *fx, u64 mask)
: "memory");
}

-static inline void fpu_xsave(struct xsave_struct *fx, u64 mask)
+static inline void xsaveopt_state(struct xsave_struct *fx, u64 mask)
{
u32 lmask = mask;
u32 hmask = mask >> 32;
@@ -82,7 +79,8 @@ static inline void fpu_xsave(struct xsave_struct *fx, u64 mask)
".byte " REX_PREFIX "0x0f,0xae,0x27",
".byte " REX_PREFIX "0x0f,0xae,0x37",
X86_FEATURE_XSAVEOPT,
- [fx] "D" (fx), "a" (lmask), "d" (hmask) :
+ "D" (fx), "a" (lmask), "d" (hmask) :
"memory");
}
+
#endif
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index ca33c0b..dd9644a 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -182,7 +182,8 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;

- sanitize_i387_state(target);
+ if (use_xsaveopt())
+ sanitize_i387_state(target);

return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
&target->thread.fpu.state->fxsave, 0, -1);
@@ -201,7 +202,8 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;

- sanitize_i387_state(target);
+ if (use_xsaveopt())
+ sanitize_i387_state(target);

ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.fpu.state->fxsave, 0, -1);
@@ -440,7 +442,8 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
-1);
}

- sanitize_i387_state(target);
+ if (use_xsaveopt())
+ sanitize_i387_state(target);

if (kbuf && pos == 0 && count == sizeof(env)) {
convert_from_fxsr(kbuf, target);
@@ -463,7 +466,8 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;

- sanitize_i387_state(target);
+ if (use_xsaveopt())
+ sanitize_i387_state(target);

if (!HAVE_HWFP)
return fpregs_soft_set(target, regset, pos, count, kbuf, ubuf);
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 9ecc791..d42810f 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -39,7 +39,7 @@ static unsigned int *xstate_offsets, *xstate_sizes, xstate_features;
* that the user doesn't see some stale state in the memory layout during
* signal handling, debugging etc.
*/
-void __sanitize_i387_state(struct task_struct *tsk)
+void sanitize_i387_state(struct task_struct *tsk)
{
u64 xstate_bv;
int feature_bit = 0x2;
@@ -48,7 +48,7 @@ void __sanitize_i387_state(struct task_struct *tsk)
if (!fx)
return;

- BUG_ON(task_thread_info(tsk)->status & TS_USEDFPU);
+ BUG_ON(task_thread_info(tsk)->xstate_mask & XCNTXT_LAZY);

xstate_bv = tsk->thread.fpu.state->xsave.xsave_hdr.xstate_bv;

@@ -103,8 +103,8 @@ void __sanitize_i387_state(struct task_struct *tsk)
* Check for the presence of extended state information in the
* user fpstate pointer in the sigcontext.
*/
-int check_for_xstate(struct i387_fxsave_struct __user *buf, unsigned int size,
- struct _fpx_sw_bytes *fx_sw_user)
+static int check_for_xstate(struct i387_fxsave_struct __user *buf, unsigned int size,
+ struct _fpx_sw_bytes *fx_sw_user)
{
int min_xstate_size = sizeof(struct i387_fxsave_struct) +
sizeof(struct xsave_hdr_struct);
@@ -176,7 +176,8 @@ int save_xstates_sigframe(void __user *buf, unsigned int size)
(struct _fpstate_ia32 __user *) buf) ? -1 : 1;

save_xstates(tsk);
- sanitize_i387_state(tsk);
+ if (use_xsaveopt())
+ sanitize_i387_state(tsk);

#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
if (ia32) {
@@ -498,17 +499,17 @@ void __cpuinit xsave_init(void)
this_func();
}

-void xsave(struct fpu *fpu, u64 mask)
+void xsave(struct xsave_struct *x, u64 mask)
{
clts();

if (use_xsave())
- fpu_xsave(&fpu->state->xsave, mask);
+ xsaveopt_state(x, mask);
else if (mask & XCNTXT_LAZY)
- fpu_save(fpu);
+ fpu_save(&x->i387);

if (mask & XCNTXT_LAZY)
- fpu_clean(fpu);
+ fpu_clean(&x->i387);

stts();
}
@@ -521,7 +522,7 @@ void save_xstates(struct task_struct *tsk)
if (!fpu_allocated(&tsk->thread.fpu))
return;

- xsave(&tsk->thread.fpu, ti->xstate_mask);
+ xsave(&tsk->thread.fpu.state->xsave, ti->xstate_mask);

if (!(ti->xstate_mask & XCNTXT_LAZY))
tsk->fpu_counter = 0;
@@ -533,19 +534,17 @@ void save_xstates(struct task_struct *tsk)
*/
if (tsk->fpu_counter < 5)
ti->xstate_mask &= ~XCNTXT_LAZY;
-
- ti->status &= ~TS_USEDFPU;
}
EXPORT_SYMBOL(save_xstates);

-void xrstor(struct fpu *fpu, u64 mask)
+void xrstor(struct xsave_struct *x, u64 mask)
{
clts();

if (use_xsave())
- xrstor_state(&fpu->state->xsave, mask);
+ xrstor_state(x, mask);
else if (mask & XCNTXT_LAZY)
- fpu_restore(fpu);
+ fpu_restore(&x->i387);

if (!(mask & XCNTXT_LAZY))
stts();
@@ -559,10 +558,9 @@ void restore_xstates(struct task_struct *tsk, u64 mask)
if (!fpu_allocated(&tsk->thread.fpu))
return;

- xrstor(&tsk->thread.fpu, mask);
+ xrstor(&tsk->thread.fpu.state->xsave, mask);

ti->xstate_mask |= mask;
- ti->status |= TS_USEDFPU;
if (mask & XCNTXT_LAZY)
tsk->fpu_counter++;
}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5b4cdcb..f756c95 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -876,7 +876,7 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
#ifdef CONFIG_X86_64
wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
#endif
- if (current_thread_info()->status & TS_USEDFPU)
+ if (current_thread_info()->xstate_mask & XCNTXT_LAZY)
clts();
load_gdt(&__get_cpu_var(host_gdt));
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index aae9e8f..bc04e15 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5805,7 +5805,7 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
kvm_put_guest_xcr0(vcpu);
vcpu->guest_fpu_loaded = 1;
save_xstates(current);
- xrstor(&vcpu->arch.guest_fpu, -1);
+ xrstor(&vcpu->arch.guest_fpu.state->xsave, -1);
trace_kvm_fpu(1);
}

@@ -5817,7 +5817,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
return;

vcpu->guest_fpu_loaded = 0;
- xsave(&vcpu->arch.guest_fpu, -1);
+ xsave(&vcpu->arch.guest_fpu.state->xsave, -1);
++vcpu->stat.fpu_reload;
kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
trace_kvm_fpu(0);
--
1.5.6.5

2011-04-05 15:51:46

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v3 2/8] x86, xsave: rework fpu/xsave support

This is a complete rework of the code that handles FPU and related
extended states. Since FPU, XMM and YMM states are just variants of what
xsave handles, all of the old FPU-specific state handling code will be
hidden behind a set of functions that resemble xsave and xrstor. For
hardware that does not support xsave, the code falls back to
fxsave/fxrstor or even fsave/frstor.

An xstate_mask member will be added to the thread_info structure that
will control which states are to be saved by xsave. It is set to include
all "lazy" states (that is, all states currently supported: FPU, XMM and
YMM) by the #NM handler when a lazy restore is triggered or by
switch_to() when the task's FPU context is preloaded. xstate_mask is
intended to completely replace TS_USEDFPU in a later cleanup patch.

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 44 +++++++++++++++++++---
arch/x86/include/asm/thread_info.h | 2 +
arch/x86/include/asm/xsave.h | 14 ++++++-
arch/x86/kernel/i387.c | 11 ++++--
arch/x86/kernel/process_32.c | 27 +++++---------
arch/x86/kernel/process_64.c | 26 ++++----------
arch/x86/kernel/traps.c | 11 +++---
arch/x86/kernel/xsave.c | 71 ++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 7 ++--
drivers/lguest/x86/core.c | 2 +-
10 files changed, 158 insertions(+), 57 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index d908383..939af08 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -224,12 +224,46 @@ static inline void fpu_fxsave(struct fpu *fpu)
/*
* These must be called with preempt disabled
*/
+static inline void fpu_restore(struct fpu *fpu)
+{
+ fxrstor_checking(&fpu->state->fxsave);
+}
+
+static inline void fpu_save(struct fpu *fpu)
+{
+ if (use_fxsr()) {
+ fpu_fxsave(fpu);
+ } else {
+ asm volatile("fsave %[fx]; fwait"
+ : [fx] "=m" (fpu->state->fsave));
+ }
+}
+
+static inline void fpu_clean(struct fpu *fpu)
+{
+ u32 swd = (use_fxsr() || use_xsave()) ?
+ fpu->state->fxsave.swd : fpu->state->fsave.swd;
+
+ if (unlikely(swd & X87_FSW_ES))
+ asm volatile("fnclex");
+
+ /* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception
+ is pending. Clear the x87 state here by setting it to fixed
+ values. safe_address is a random variable that should be in L1 */
+ alternative_input(
+ ASM_NOP8 ASM_NOP2,
+ "emms\n\t" /* clear stack tags */
+ "fildl %P[addr]", /* set F?P to defined value */
+ X86_FEATURE_FXSAVE_LEAK,
+ [addr] "m" (safe_address));
+}
+
static inline void fpu_save_init(struct fpu *fpu)
{
if (use_xsave()) {
struct xsave_struct *xstate = &fpu->state->xsave;

- fpu_xsave(xstate);
+ fpu_xsave(xstate, -1);

/*
* xsave header may indicate the init state of the FP.
@@ -295,18 +329,16 @@ static inline void __clear_fpu(struct task_struct *tsk)
"2:\n"
_ASM_EXTABLE(1b, 2b));
task_thread_info(tsk)->status &= ~TS_USEDFPU;
+ task_thread_info(tsk)->xstate_mask &= ~XCNTXT_LAZY;
stts();
}
}

static inline void kernel_fpu_begin(void)
{
- struct thread_info *me = current_thread_info();
preempt_disable();
- if (me->status & TS_USEDFPU)
- __save_init_fpu(me->task);
- else
- clts();
+ save_xstates(current_thread_info()->task);
+ clts();
}

static inline void kernel_fpu_end(void)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 1f2e61e..ec12d62 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -26,6 +26,7 @@ struct exec_domain;
struct thread_info {
struct task_struct *task; /* main task structure */
struct exec_domain *exec_domain; /* execution domain */
+ __u64 xstate_mask; /* xstates in use */
__u32 flags; /* low level flags */
__u32 status; /* thread synchronous flags */
__u32 cpu; /* current CPU */
@@ -47,6 +48,7 @@ struct thread_info {
{ \
.task = &tsk, \
.exec_domain = &default_exec_domain, \
+ .xstate_mask = 0, \
.flags = 0, \
.cpu = 0, \
.preempt_count = INIT_PREEMPT_COUNT, \
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 8bcbbce..6052a84 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -25,6 +25,8 @@
*/
#define XCNTXT_MASK (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)

+#define XCNTXT_LAZY XCNTXT_MASK
+
#ifdef CONFIG_X86_64
#define REX_PREFIX "0x48, "
#else
@@ -35,6 +37,11 @@ extern unsigned int xstate_size;
extern u64 pcntxt_mask;
extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];

+extern void xsave(struct fpu *, u64);
+extern void xrstor(struct fpu *, u64);
+extern void save_xstates(struct task_struct *);
+extern void restore_xstates(struct task_struct *, u64);
+
extern void xsave_init(void);
extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
extern int init_fpu(struct task_struct *child);
@@ -113,15 +120,18 @@ static inline void xsave_state(struct xsave_struct *fx, u64 mask)
: "memory");
}

-static inline void fpu_xsave(struct xsave_struct *fx)
+static inline void fpu_xsave(struct xsave_struct *fx, u64 mask)
{
+ u32 lmask = mask;
+ u32 hmask = mask >> 32;
+
/* This, however, we can work around by forcing the compiler to select
an addressing mode that doesn't require extended registers. */
alternative_input(
".byte " REX_PREFIX "0x0f,0xae,0x27",
".byte " REX_PREFIX "0x0f,0xae,0x37",
X86_FEATURE_XSAVEOPT,
- [fx] "D" (fx), "a" (-1), "d" (-1) :
+ [fx] "D" (fx), "a" (lmask), "d" (hmask) :
"memory");
}
#endif
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 12aff25..1088ac5 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -152,8 +152,11 @@ int init_fpu(struct task_struct *tsk)
int ret;

if (tsk_used_math(tsk)) {
- if (HAVE_HWFP && tsk == current)
- unlazy_fpu(tsk);
+ if (HAVE_HWFP && tsk == current) {
+ preempt_disable();
+ save_xstates(tsk);
+ preempt_enable();
+ }
return 0;
}

@@ -600,7 +603,9 @@ int save_i387_xstate_ia32(void __user *buf)
NULL, fp) ? -1 : 1;
}

- unlazy_fpu(tsk);
+ preempt_disable();
+ save_xstates(tsk);
+ preempt_enable();

if (cpu_has_xsave)
return save_i387_xsave(fp);
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8d12878..8df07c3 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -185,7 +185,9 @@ void release_thread(struct task_struct *dead_task)
*/
void prepare_to_copy(struct task_struct *tsk)
{
- unlazy_fpu(tsk);
+ preempt_disable();
+ save_xstates(tsk);
+ preempt_enable();
}

int copy_thread(unsigned long clone_flags, unsigned long sp,
@@ -294,21 +296,13 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
*next = &next_p->thread;
int cpu = smp_processor_id();
struct tss_struct *tss = &per_cpu(init_tss, cpu);
- bool preload_fpu;

/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */

- /*
- * If the task has used fpu the last 5 timeslices, just do a full
- * restore of the math state immediately to avoid the trap; the
- * chances of needing FPU soon are obviously high now
- */
- preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;
-
- __unlazy_fpu(prev_p);
+ save_xstates(prev_p);

/* we're going to use this soon, after a few expensive things */
- if (preload_fpu)
+ if (task_thread_info(next_p)->xstate_mask)
prefetch(next->fpu.state);

/*
@@ -349,11 +343,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT))
__switch_to_xtra(prev_p, next_p, tss);

- /* If we're going to preload the fpu context, make sure clts
- is run while we're batching the cpu state updates. */
- if (preload_fpu)
- clts();
-
/*
* Leave lazy mode, flushing any hypercalls made here.
* This must be done before restoring TLS segments so
@@ -363,8 +352,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
*/
arch_end_context_switch(next_p);

- if (preload_fpu)
- __math_state_restore();
+ /*
+ * Restore enabled extended states for the task.
+ */
+ restore_xstates(next_p, task_thread_info(next_p)->xstate_mask);

/*
* Restore %gs if needed (which is common)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6c9dd92..cbf1a67 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -249,7 +249,9 @@ static inline u32 read_32bit_tls(struct task_struct *t, int tls)
*/
void prepare_to_copy(struct task_struct *tsk)
{
- unlazy_fpu(tsk);
+ preempt_disable();
+ save_xstates(tsk);
+ preempt_enable();
}

int copy_thread(unsigned long clone_flags, unsigned long sp,
@@ -378,17 +380,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
int cpu = smp_processor_id();
struct tss_struct *tss = &per_cpu(init_tss, cpu);
unsigned fsindex, gsindex;
- bool preload_fpu;
-
- /*
- * If the task has used fpu the last 5 timeslices, just do a full
- * restore of the math state immediately to avoid the trap; the
- * chances of needing FPU soon are obviously high now
- */
- preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;

/* we're going to use this soon, after a few expensive things */
- if (preload_fpu)
+ if (task_thread_info(next_p)->xstate_mask)
prefetch(next->fpu.state);

/*
@@ -420,11 +414,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
load_TLS(next, cpu);

/* Must be after DS reload */
- __unlazy_fpu(prev_p);
-
- /* Make sure cpu is ready for new context */
- if (preload_fpu)
- clts();
+ save_xstates(prev_p);

/*
* Leave lazy mode, flushing any hypercalls made here.
@@ -485,11 +475,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
__switch_to_xtra(prev_p, next_p, tss);

/*
- * Preload the FPU context, now that we've determined that the
- * task is likely to be using it.
+ * Restore enabled extended states for the task.
*/
- if (preload_fpu)
- __math_state_restore();
+ restore_xstates(next_p, task_thread_info(next_p)->xstate_mask);

return prev_p;
}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 32f3043..072c30e 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -625,7 +625,10 @@ void math_error(struct pt_regs *regs, int error_code, int trapnr)
/*
* Save the info for the exception handler and clear the error.
*/
- save_init_fpu(task);
+ preempt_disable();
+ save_xstates(task);
+ preempt_enable();
+
task->thread.trap_no = trapnr;
task->thread.error_code = error_code;
info.si_signo = SIGFPE;
@@ -734,7 +737,7 @@ void __math_state_restore(void)
return;
}

- thread->status |= TS_USEDFPU; /* So we fnsave on switch_to() */
+ thread->status |= TS_USEDFPU; /* So we fnsave on switch_to() */
tsk->fpu_counter++;
}

@@ -768,9 +771,7 @@ asmlinkage void math_state_restore(void)
local_irq_disable();
}

- clts(); /* Allow maths ops (or we recurse) */
-
- __math_state_restore();
+ restore_xstates(tsk, XCNTXT_LAZY);
}
EXPORT_SYMBOL_GPL(math_state_restore);

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 6b063d7..d9fa41f 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -5,6 +5,7 @@
*/
#include <linux/bootmem.h>
#include <linux/compat.h>
+#include <linux/module.h>
#include <asm/i387.h>
#ifdef CONFIG_IA32_EMULATION
#include <asm/sigcontext32.h>
@@ -474,3 +475,73 @@ void __cpuinit xsave_init(void)
next_func = xstate_enable;
this_func();
}
+
+void xsave(struct fpu *fpu, u64 mask)
+{
+ clts();
+
+ if (use_xsave())
+ fpu_xsave(&fpu->state->xsave, mask);
+ else if (mask & XCNTXT_LAZY)
+ fpu_save(fpu);
+
+ if (mask & XCNTXT_LAZY)
+ fpu_clean(fpu);
+
+ stts();
+}
+EXPORT_SYMBOL(xsave);
+
+void save_xstates(struct task_struct *tsk)
+{
+ struct thread_info *ti = task_thread_info(tsk);
+
+ if (!fpu_allocated(&tsk->thread.fpu))
+ return;
+
+ xsave(&tsk->thread.fpu, ti->xstate_mask);
+
+ if (!(ti->xstate_mask & XCNTXT_LAZY))
+ tsk->fpu_counter = 0;
+
+ /*
+ * If the task hasn't used the fpu the last 5 timeslices,
+ * force a lazy restore of the math states by clearing them
+ * from xstate_mask.
+ */
+ if (tsk->fpu_counter < 5)
+ ti->xstate_mask &= ~XCNTXT_LAZY;
+
+ ti->status &= ~TS_USEDFPU;
+}
+EXPORT_SYMBOL(save_xstates);
+
+void xrstor(struct fpu *fpu, u64 mask)
+{
+ clts();
+
+ if (use_xsave())
+ xrstor_state(&fpu->state->xsave, mask);
+ else if (mask & XCNTXT_LAZY)
+ fpu_restore(fpu);
+
+ if (!(mask & XCNTXT_LAZY))
+ stts();
+}
+EXPORT_SYMBOL(xrstor);
+
+void restore_xstates(struct task_struct *tsk, u64 mask)
+{
+ struct thread_info *ti = task_thread_info(tsk);
+
+ if (!fpu_allocated(&tsk->thread.fpu))
+ return;
+
+ xrstor(&tsk->thread.fpu, mask);
+
+ ti->xstate_mask |= mask;
+ ti->status |= TS_USEDFPU;
+ if (mask & XCNTXT_LAZY)
+ tsk->fpu_counter++;
+}
+EXPORT_SYMBOL(restore_xstates);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58f517b..aae9e8f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -58,6 +58,7 @@
#include <asm/xcr.h>
#include <asm/pvclock.h>
#include <asm/div64.h>
+#include <asm/xsave.h>

#define MAX_IO_MSRS 256
#define CR0_RESERVED_BITS \
@@ -5803,8 +5804,8 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
*/
kvm_put_guest_xcr0(vcpu);
vcpu->guest_fpu_loaded = 1;
- unlazy_fpu(current);
- fpu_restore_checking(&vcpu->arch.guest_fpu);
+ save_xstates(current);
+ xrstor(&vcpu->arch.guest_fpu, -1);
trace_kvm_fpu(1);
}

@@ -5816,7 +5817,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
return;

vcpu->guest_fpu_loaded = 0;
- fpu_save_init(&vcpu->arch.guest_fpu);
+ xsave(&vcpu->arch.guest_fpu, -1);
++vcpu->stat.fpu_reload;
kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
trace_kvm_fpu(0);
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 9f1659c..ef62289 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -204,7 +204,7 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
* uses the FPU.
*/
if (cpu->ts)
- unlazy_fpu(current);
+ save_xstates(current);

/*
* SYSENTER is an optimized way of doing system calls. We can't allow
--
1.5.6.5

2011-04-05 15:51:53

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v3 3/8] x86, xsave: cleanup fpu/xsave signal frame setup

There are currently two code paths that handle the fpu/xsave context in
a signal frame for 32bit and 64bit tasks. These two code paths differ
only in that they have or lack certain micro-optimizations or do some
additional work (fsave compatibility for 32bit). The code is complex,
mostly duplicated, and hard to understand and maintain.

This patch creates a set of two new, unified and cleaned-up functions to
replace them. Besides avoiding the duplicated code, it is now obvious
what is done in which situation. The micro-optimization w.r.t. xsave
(saving and restoring directly from the user buffer) is gone, and with
it the headaches it caused about validating the buffer alignment and
contents and catching possible xsave/xrstor faults.

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/ia32/ia32_signal.c | 4 +-
arch/x86/include/asm/i387.h | 20 ++++
arch/x86/include/asm/xsave.h | 4 +-
arch/x86/kernel/i387.c | 32 ++------
arch/x86/kernel/signal.c | 4 +-
arch/x86/kernel/xsave.c | 197 ++++++++++++++++++++++++++++++++++++++++--
6 files changed, 225 insertions(+), 36 deletions(-)

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 588a7aa..2605fae 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -255,7 +255,7 @@ static int ia32_restore_sigcontext(struct pt_regs *regs,

get_user_ex(tmp, &sc->fpstate);
buf = compat_ptr(tmp);
- err |= restore_i387_xstate_ia32(buf);
+ err |= restore_xstates_sigframe(buf, sig_xstate_ia32_size);

get_user_ex(*pax, &sc->ax);
} get_user_catch(err);
@@ -396,7 +396,7 @@ static void __user *get_sigframe(struct k_sigaction *ka, struct pt_regs *regs,
if (used_math()) {
sp = sp - sig_xstate_ia32_size;
*fpstate = (struct _fpstate_ia32 *) sp;
- if (save_i387_xstate_ia32(*fpstate) < 0)
+ if (save_xstates_sigframe(*fpstate, sig_xstate_ia32_size) < 0)
return (void __user *) -1L;
}

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 939af08..30930bf 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -25,6 +25,20 @@
#include <asm/uaccess.h>
#include <asm/xsave.h>

+#ifdef CONFIG_X86_64
+# include <asm/sigcontext32.h>
+# include <asm/user32.h>
+#else
+# define save_i387_xstate_ia32 save_i387_xstate
+# define restore_i387_xstate_ia32 restore_i387_xstate
+# define _fpstate_ia32 _fpstate
+# define _xstate_ia32 _xstate
+# define sig_xstate_ia32_size sig_xstate_size
+# define fx_sw_reserved_ia32 fx_sw_reserved
+# define user_i387_ia32_struct user_i387_struct
+# define user32_fxsr_struct user_fxsr_struct
+#endif
+
extern unsigned int sig_xstate_size;
extern void fpu_init(void);
extern void mxcsr_feature_mask_init(void);
@@ -33,6 +47,9 @@ extern asmlinkage void math_state_restore(void);
extern void __math_state_restore(void);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);

+extern void convert_from_fxsr(struct user_i387_ia32_struct *, struct task_struct *);
+extern void convert_to_fxsr(struct task_struct *, const struct user_i387_ia32_struct *);
+
extern user_regset_active_fn fpregs_active, xfpregs_active;
extern user_regset_get_fn fpregs_get, xfpregs_get, fpregs_soft_get,
xstateregs_get;
@@ -46,6 +63,7 @@ extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set,
#define xstateregs_active fpregs_active

extern struct _fpx_sw_bytes fx_sw_reserved;
+extern unsigned int mxcsr_feature_mask;
#ifdef CONFIG_IA32_EMULATION
extern unsigned int sig_xstate_ia32_size;
extern struct _fpx_sw_bytes fx_sw_reserved_ia32;
@@ -56,8 +74,10 @@ extern int restore_i387_xstate_ia32(void __user *buf);
#endif

#ifdef CONFIG_MATH_EMULATION
+# define HAVE_HWFP (boot_cpu_data.hard_math)
extern void finit_soft_fpu(struct i387_soft_struct *soft);
#else
+# define HAVE_HWFP 1
static inline void finit_soft_fpu(struct i387_soft_struct *soft) {}
#endif

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 6052a84..200c56d 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -41,12 +41,14 @@ extern void xsave(struct fpu *, u64);
extern void xrstor(struct fpu *, u64);
extern void save_xstates(struct task_struct *);
extern void restore_xstates(struct task_struct *, u64);
+extern int save_xstates_sigframe(void __user *, unsigned int);
+extern int restore_xstates_sigframe(void __user *, unsigned int);

extern void xsave_init(void);
extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
extern int init_fpu(struct task_struct *child);
extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
- void __user *fpstate,
+ unsigned int size,
struct _fpx_sw_bytes *sw);

static inline int xrstor_checking(struct xsave_struct *fx, u64 mask)
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 1088ac5..69625a8 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -18,27 +18,7 @@
#include <asm/i387.h>
#include <asm/user.h>

-#ifdef CONFIG_X86_64
-# include <asm/sigcontext32.h>
-# include <asm/user32.h>
-#else
-# define save_i387_xstate_ia32 save_i387_xstate
-# define restore_i387_xstate_ia32 restore_i387_xstate
-# define _fpstate_ia32 _fpstate
-# define _xstate_ia32 _xstate
-# define sig_xstate_ia32_size sig_xstate_size
-# define fx_sw_reserved_ia32 fx_sw_reserved
-# define user_i387_ia32_struct user_i387_struct
-# define user32_fxsr_struct user_fxsr_struct
-#endif
-
-#ifdef CONFIG_MATH_EMULATION
-# define HAVE_HWFP (boot_cpu_data.hard_math)
-#else
-# define HAVE_HWFP 1
-#endif
-
-static unsigned int mxcsr_feature_mask __read_mostly = 0xffffffffu;
+unsigned int mxcsr_feature_mask __read_mostly = 0xffffffffu;
unsigned int xstate_size;
EXPORT_SYMBOL_GPL(xstate_size);
unsigned int sig_xstate_ia32_size = sizeof(struct _fpstate_ia32);
@@ -375,7 +355,7 @@ static inline u32 twd_fxsr_to_i387(struct i387_fxsave_struct *fxsave)
* FXSR floating point environment conversions.
*/

-static void
+void
convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
{
struct i387_fxsave_struct *fxsave = &tsk->thread.fpu.state->fxsave;
@@ -412,8 +392,8 @@ convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
memcpy(&to[i], &from[i], sizeof(to[0]));
}

-static void convert_to_fxsr(struct task_struct *tsk,
- const struct user_i387_ia32_struct *env)
+void convert_to_fxsr(struct task_struct *tsk,
+ const struct user_i387_ia32_struct *env)

{
struct i387_fxsave_struct *fxsave = &tsk->thread.fpu.state->fxsave;
@@ -653,7 +633,9 @@ static int restore_i387_xsave(void __user *buf)
u64 mask;
int err;

- if (check_for_xstate(fx, buf, &fx_sw_user))
+ if (check_for_xstate(fx, sig_xstate_ia32_size -
+ offsetof(struct _fpstate_ia32, _fxsr_env),
+ &fx_sw_user))
goto fx_only;

mask = fx_sw_user.xstate_bv;
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 4fd173c..f6705ff 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -117,7 +117,7 @@ restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
regs->orig_ax = -1; /* disable syscall checks */

get_user_ex(buf, &sc->fpstate);
- err |= restore_i387_xstate(buf);
+ err |= restore_xstates_sigframe(buf, sig_xstate_size);

get_user_ex(*pax, &sc->ax);
} get_user_catch(err);
@@ -252,7 +252,7 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
return (void __user *)-1L;

/* save i387 state */
- if (used_math() && save_i387_xstate(*fpstate) < 0)
+ if (used_math() && save_xstates_sigframe(*fpstate, sig_xstate_size) < 0)
return (void __user *)-1L;

return (void __user *)sp;
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index d9fa41f..08b2fe8 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -103,8 +103,7 @@ void __sanitize_i387_state(struct task_struct *tsk)
* Check for the presence of extended state information in the
* user fpstate pointer in the sigcontext.
*/
-int check_for_xstate(struct i387_fxsave_struct __user *buf,
- void __user *fpstate,
+int check_for_xstate(struct i387_fxsave_struct __user *buf, unsigned int size,
struct _fpx_sw_bytes *fx_sw_user)
{
int min_xstate_size = sizeof(struct i387_fxsave_struct) +
@@ -131,11 +130,11 @@ int check_for_xstate(struct i387_fxsave_struct __user *buf,
fx_sw_user->xstate_size > fx_sw_user->extended_size)
return -EINVAL;

- err = __get_user(magic2, (__u32 *) (((void *)fpstate) +
- fx_sw_user->extended_size -
+ err = __get_user(magic2, (__u32 *) (((void *)buf) + size -
FP_XSTATE_MAGIC2_SIZE));
if (err)
return err;
+
/*
* Check for the presence of second magic word at the end of memory
* layout. This detects the case where the user just copied the legacy
@@ -148,11 +147,109 @@ int check_for_xstate(struct i387_fxsave_struct __user *buf,
return 0;
}

-#ifdef CONFIG_X86_64
/*
* Signal frame handlers.
*/
+int save_xstates_sigframe(void __user *buf, unsigned int size)
+{
+ void __user *buf_fxsave = buf;
+ struct task_struct *tsk = current;
+ struct xsave_struct *xsave = &tsk->thread.fpu.state->xsave;
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ int ia32 = size == sig_xstate_ia32_size;
+#endif
+ int err;
+
+ if (!access_ok(VERIFY_WRITE, buf, size))
+ return -EACCES;
+
+ BUG_ON(size < xstate_size);
+
+ if (!used_math())
+ return 0;
+
+ clear_used_math(); /* trigger finit */
+
+ if (!HAVE_HWFP)
+ return fpregs_soft_get(current, NULL, 0,
+ sizeof(struct user_i387_ia32_struct), NULL,
+ (struct _fpstate_ia32 __user *) buf) ? -1 : 1;
+
+ save_xstates(tsk);
+ sanitize_i387_state(tsk);
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ if (ia32) {
+ if (use_xsave() || use_fxsr()) {
+ struct user_i387_ia32_struct env;
+ struct _fpstate_ia32 __user *fp = buf;
+
+ convert_from_fxsr(&env, tsk);
+ if (__copy_to_user(buf, &env, sizeof(env)))
+ return -1;
+
+ err = __put_user(xsave->i387.swd, &fp->status);
+ err |= __put_user(X86_FXSR_MAGIC, &fp->magic);
+
+ if (err)
+ return -1;
+
+ buf_fxsave = fp->_fxsr_env;
+ size -= offsetof(struct _fpstate_ia32, _fxsr_env);
+#if defined(CONFIG_X86_64)
+ buf = buf_fxsave;
+#endif
+ } else {
+ struct i387_fsave_struct *fsave =
+ &tsk->thread.fpu.state->fsave;
+
+ fsave->status = fsave->swd;
+ }
+ }
+#endif

+ if (__copy_to_user(buf_fxsave, xsave, size))
+ return -1;
+
+ if (use_xsave()) {
+ struct _fpstate __user *fp = buf;
+ struct _xstate __user *x = buf;
+ u64 xstate_bv = xsave->xsave_hdr.xstate_bv;
+
+ err = __copy_to_user(&fp->sw_reserved,
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ ia32 ? &fx_sw_reserved_ia32 :
+#endif
+ &fx_sw_reserved,
+ sizeof (struct _fpx_sw_bytes));
+
+ err |= __put_user(FP_XSTATE_MAGIC2,
+ (__u32 __user *) (buf_fxsave + size
+ - FP_XSTATE_MAGIC2_SIZE));
+
+ /*
+ * For legacy compatible, we always set FP/SSE bits in the bit
+ * vector while saving the state to the user context. This will
+ * enable us capturing any changes(during sigreturn) to
+ * the FP/SSE bits by the legacy applications which don't touch
+ * xstate_bv in the xsave header.
+ *
+ * xsave aware apps can change the xstate_bv in the xsave
+ * header as well as change any contents in the memory layout.
+ * xrestore as part of sigreturn will capture all the changes.
+ */
+ xstate_bv |= XSTATE_FPSSE;
+
+ err |= __put_user(xstate_bv, &x->xstate_hdr.xstate_bv);
+
+ if (err)
+ return err;
+ }
+
+ return 1;
+}
+
+#ifdef CONFIG_X86_64
int save_i387_xstate(void __user *buf)
{
struct task_struct *tsk = current;
@@ -240,7 +337,7 @@ static int restore_user_xstate(void __user *buf)
int err;

if (((unsigned long)buf % 64) ||
- check_for_xstate(buf, buf, &fx_sw_user))
+ check_for_xstate(buf, sig_xstate_size, &fx_sw_user))
goto fx_only;

mask = fx_sw_user.xstate_bv;
@@ -315,6 +412,94 @@ clear:
}
#endif

+int restore_xstates_sigframe(void __user *buf, unsigned int size)
+{
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ struct user_i387_ia32_struct env;
+ int ia32 = size == sig_xstate_ia32_size;
+#endif
+ struct _fpx_sw_bytes fx_sw_user;
+ struct task_struct *tsk = current;
+ struct _fpstate_ia32 __user *fp = buf;
+ struct xsave_struct *xsave;
+ u64 xstate_mask = 0;
+ int err;
+
+ if (!buf) {
+ if (used_math()) {
+ clear_fpu(tsk);
+ clear_used_math();
+ }
+ return 0;
+ }
+
+ if (!access_ok(VERIFY_READ, buf, size))
+ return -EACCES;
+
+ if (!used_math()) {
+ err = init_fpu(tsk);
+ if (err)
+ return err;
+ }
+
+ if (!HAVE_HWFP) {
+ set_used_math();
+ return fpregs_soft_set(current, NULL,
+ 0, sizeof(struct user_i387_ia32_struct),
+ NULL, fp) != 0;
+ }
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ if (ia32 && (use_xsave() || use_fxsr())) {
+ if (__copy_from_user(&env, buf, sizeof(env)))
+ return -1;
+ buf = fp->_fxsr_env;
+ size -= offsetof(struct _fpstate_ia32, _fxsr_env);
+ }
+#endif
+
+ xsave = &tsk->thread.fpu.state->xsave;
+ task_thread_info(tsk)->xstate_mask = 0;
+ if (__copy_from_user(xsave, buf, xstate_size))
+ return -1;
+
+ if (use_xsave()) {
+ u64 *xstate_bv = &xsave->xsave_hdr.xstate_bv;
+
+ /*
+ * If this is no valid xstate, disable all extended states.
+ *
+ * For valid xstates, clear any illegal bits and any bits
+ * that have been cleared in fx_sw_user.xstate_bv.
+ */
+ if (check_for_xstate(buf, size, &fx_sw_user))
+ *xstate_bv = XSTATE_FPSSE;
+ else
+ *xstate_bv &= pcntxt_mask & fx_sw_user.xstate_bv;
+
+ xstate_mask |= *xstate_bv;
+
+ xsave->xsave_hdr.reserved1[0] =
+ xsave->xsave_hdr.reserved1[1] = 0;
+ } else {
+ xstate_mask |= XCNTXT_LAZY;
+ }
+
+ if (use_xsave() || use_fxsr()) {
+ xsave->i387.mxcsr &= mxcsr_feature_mask;
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ if (ia32)
+ convert_to_fxsr(tsk, &env);
+#endif
+ }
+
+ set_used_math();
+ restore_xstates(tsk, xstate_mask);
+
+ return 0;
+}
+
/*
* Prepare the SW reserved portion of the fxsave memory layout, indicating
* the presence of the extended state information in the memory layout
--
1.5.6.5

2011-04-05 15:52:02

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v3 4/8] x86, xsave: remove unused code

The patches to rework the fpu/xsave handling and signal frame setup have
made a lot of code unused. This patch removes all of this now-unused code.

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 155 ++----------------------------
arch/x86/include/asm/xsave.h | 51 ----------
arch/x86/kernel/i387.c | 221 ------------------------------------------
arch/x86/kernel/traps.c | 22 ----
arch/x86/kernel/xsave.c | 163 -------------------------------
5 files changed, 7 insertions(+), 605 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 30930bf..97867ea 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -29,8 +29,6 @@
# include <asm/sigcontext32.h>
# include <asm/user32.h>
#else
-# define save_i387_xstate_ia32 save_i387_xstate
-# define restore_i387_xstate_ia32 restore_i387_xstate
# define _fpstate_ia32 _fpstate
# define _xstate_ia32 _xstate
# define sig_xstate_ia32_size sig_xstate_size
@@ -108,75 +106,16 @@ static inline void sanitize_i387_state(struct task_struct *tsk)
}

#ifdef CONFIG_X86_64
-static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
+static inline void fxrstor(struct i387_fxsave_struct *fx)
{
- int err;
-
- /* See comment in fxsave() below. */
-#ifdef CONFIG_AS_FXSAVEQ
- asm volatile("1: fxrstorq %[fx]\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err)
- : [fx] "m" (*fx), "0" (0));
-#else
- asm volatile("1: rex64/fxrstor (%[fx])\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err)
- : [fx] "R" (fx), "m" (*fx), "0" (0));
-#endif
- return err;
-}
-
-static inline int fxsave_user(struct i387_fxsave_struct __user *fx)
-{
- int err;
-
- /*
- * Clear the bytes not touched by the fxsave and reserved
- * for the SW usage.
- */
- err = __clear_user(&fx->sw_reserved,
- sizeof(struct _fpx_sw_bytes));
- if (unlikely(err))
- return -EFAULT;
-
/* See comment in fxsave() below. */
#ifdef CONFIG_AS_FXSAVEQ
- asm volatile("1: fxsaveq %[fx]\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err), [fx] "=m" (*fx)
- : "0" (0));
+ asm volatile("fxrstorq %[fx]\n\t"
+ : : [fx] "m" (*fx));
#else
- asm volatile("1: rex64/fxsave (%[fx])\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err), "=m" (*fx)
- : [fx] "R" (fx), "0" (0));
+ asm volatile("rex64/fxrstor (%[fx])\n\t"
+ : : [fx] "R" (fx), "m" (*fx));
#endif
- if (unlikely(err) &&
- __clear_user(fx, sizeof(struct i387_fxsave_struct)))
- err = -EFAULT;
- /* No need to clear here because the caller clears USED_MATH */
- return err;
}

static inline void fpu_fxsave(struct fpu *fpu)
@@ -209,7 +148,7 @@ static inline void fpu_fxsave(struct fpu *fpu)
#else /* CONFIG_X86_32 */

/* perform fxrstor iff the processor has extended states, otherwise frstor */
-static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
+static inline void fxrstor(struct i387_fxsave_struct *fx)
{
/*
* The "nop" is needed to make the instructions the same
@@ -220,8 +159,6 @@ static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
"fxrstor %1",
X86_FEATURE_FXSR,
"m" (*fx));
-
- return 0;
}

static inline void fpu_fxsave(struct fpu *fpu)
@@ -246,7 +183,7 @@ static inline void fpu_fxsave(struct fpu *fpu)
*/
static inline void fpu_restore(struct fpu *fpu)
{
- fxrstor_checking(&fpu->state->fxsave);
+ fxrstor(&fpu->state->fxsave);
}

static inline void fpu_save(struct fpu *fpu)
@@ -278,69 +215,6 @@ static inline void fpu_clean(struct fpu *fpu)
[addr] "m" (safe_address));
}

-static inline void fpu_save_init(struct fpu *fpu)
-{
- if (use_xsave()) {
- struct xsave_struct *xstate = &fpu->state->xsave;
-
- fpu_xsave(xstate, -1);
-
- /*
- * xsave header may indicate the init state of the FP.
- */
- if (!(xstate->xsave_hdr.xstate_bv & XSTATE_FP))
- return;
- } else if (use_fxsr()) {
- fpu_fxsave(fpu);
- } else {
- asm volatile("fsave %[fx]; fwait"
- : [fx] "=m" (fpu->state->fsave));
- return;
- }
-
- if (unlikely(fpu->state->fxsave.swd & X87_FSW_ES))
- asm volatile("fnclex");
-
- /* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception
- is pending. Clear the x87 state here by setting it to fixed
- values. safe_address is a random variable that should be in L1 */
- alternative_input(
- ASM_NOP8 ASM_NOP2,
- "emms\n\t" /* clear stack tags */
- "fildl %P[addr]", /* set F?P to defined value */
- X86_FEATURE_FXSAVE_LEAK,
- [addr] "m" (safe_address));
-}
-
-static inline void __save_init_fpu(struct task_struct *tsk)
-{
- fpu_save_init(&tsk->thread.fpu);
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
-}
-
-static inline int fpu_restore_checking(struct fpu *fpu)
-{
- if (use_xsave())
- return xrstor_checking(&fpu->state->xsave, -1);
- else
- return fxrstor_checking(&fpu->state->fxsave);
-}
-
-/*
- * Signal frame handlers...
- */
-extern int save_i387_xstate(void __user *buf);
-extern int restore_i387_xstate(void __user *buf);
-
-static inline void __unlazy_fpu(struct task_struct *tsk)
-{
- if (task_thread_info(tsk)->status & TS_USEDFPU) {
- __save_init_fpu(tsk);
- stts();
- } else
- tsk->fpu_counter = 0;
-}
-
static inline void __clear_fpu(struct task_struct *tsk)
{
if (task_thread_info(tsk)->status & TS_USEDFPU) {
@@ -409,21 +283,6 @@ static inline void irq_ts_restore(int TS_state)
/*
* These disable preemption on their own and are safe
*/
-static inline void save_init_fpu(struct task_struct *tsk)
-{
- preempt_disable();
- __save_init_fpu(tsk);
- stts();
- preempt_enable();
-}
-
-static inline void unlazy_fpu(struct task_struct *tsk)
-{
- preempt_disable();
- __unlazy_fpu(tsk);
- preempt_enable();
-}
-
static inline void clear_fpu(struct task_struct *tsk)
{
preempt_disable();
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 200c56d..742da4a 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -51,26 +51,6 @@ extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
unsigned int size,
struct _fpx_sw_bytes *sw);

-static inline int xrstor_checking(struct xsave_struct *fx, u64 mask)
-{
- int err;
- u32 lmask = mask;
- u32 hmask = mask >> 32;
-
- asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err)
- : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask), "0" (0)
- : "memory");
-
- return err;
-}
-
static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
{
u32 lmask = mask;
@@ -81,37 +61,6 @@ static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
: "memory");
}

-static inline int xsave_checking(struct xsave_struct __user *buf)
-{
- int err;
-
- /*
- * Clear the xsave header first, so that reserved fields are
- * initialized to zero.
- */
- err = __clear_user(&buf->xsave_hdr,
- sizeof(struct xsave_hdr_struct));
- if (unlikely(err))
- return -EFAULT;
-
- asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x27\n"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b,3b)
- : [err] "=r" (err)
- : "D" (buf), "a" (-1), "d" (-1), "0" (0)
- : "memory");
-
- if (unlikely(err) && __clear_user(buf, xstate_size))
- err = -EFAULT;
-
- /* No need to clear here because the caller clears USED_MATH */
- return err;
-}
-
static inline void xsave_state(struct xsave_struct *fx, u64 mask)
{
u32 lmask = mask;
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 69625a8..ca33c0b 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -490,227 +490,6 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
}

/*
- * Signal frame handlers.
- */
-
-static inline int save_i387_fsave(struct _fpstate_ia32 __user *buf)
-{
- struct task_struct *tsk = current;
- struct i387_fsave_struct *fp = &tsk->thread.fpu.state->fsave;
-
- fp->status = fp->swd;
- if (__copy_to_user(buf, fp, sizeof(struct i387_fsave_struct)))
- return -1;
- return 1;
-}
-
-static int save_i387_fxsave(struct _fpstate_ia32 __user *buf)
-{
- struct task_struct *tsk = current;
- struct i387_fxsave_struct *fx = &tsk->thread.fpu.state->fxsave;
- struct user_i387_ia32_struct env;
- int err = 0;
-
- convert_from_fxsr(&env, tsk);
- if (__copy_to_user(buf, &env, sizeof(env)))
- return -1;
-
- err |= __put_user(fx->swd, &buf->status);
- err |= __put_user(X86_FXSR_MAGIC, &buf->magic);
- if (err)
- return -1;
-
- if (__copy_to_user(&buf->_fxsr_env[0], fx, xstate_size))
- return -1;
- return 1;
-}
-
-static int save_i387_xsave(void __user *buf)
-{
- struct task_struct *tsk = current;
- struct _fpstate_ia32 __user *fx = buf;
- int err = 0;
-
-
- sanitize_i387_state(tsk);
-
- /*
- * For legacy compatible, we always set FP/SSE bits in the bit
- * vector while saving the state to the user context.
- * This will enable us capturing any changes(during sigreturn) to
- * the FP/SSE bits by the legacy applications which don't touch
- * xstate_bv in the xsave header.
- *
- * xsave aware applications can change the xstate_bv in the xsave
- * header as well as change any contents in the memory layout.
- * xrestore as part of sigreturn will capture all the changes.
- */
- tsk->thread.fpu.state->xsave.xsave_hdr.xstate_bv |= XSTATE_FPSSE;
-
- if (save_i387_fxsave(fx) < 0)
- return -1;
-
- err = __copy_to_user(&fx->sw_reserved, &fx_sw_reserved_ia32,
- sizeof(struct _fpx_sw_bytes));
- err |= __put_user(FP_XSTATE_MAGIC2,
- (__u32 __user *) (buf + sig_xstate_ia32_size
- - FP_XSTATE_MAGIC2_SIZE));
- if (err)
- return -1;
-
- return 1;
-}
-
-int save_i387_xstate_ia32(void __user *buf)
-{
- struct _fpstate_ia32 __user *fp = (struct _fpstate_ia32 __user *) buf;
- struct task_struct *tsk = current;
-
- if (!used_math())
- return 0;
-
- if (!access_ok(VERIFY_WRITE, buf, sig_xstate_ia32_size))
- return -EACCES;
- /*
- * This will cause a "finit" to be triggered by the next
- * attempted FPU operation by the 'current' process.
- */
- clear_used_math();
-
- if (!HAVE_HWFP) {
- return fpregs_soft_get(current, NULL,
- 0, sizeof(struct user_i387_ia32_struct),
- NULL, fp) ? -1 : 1;
- }
-
- preempt_disable();
- save_xstates(tsk);
- preempt_enable();
-
- if (cpu_has_xsave)
- return save_i387_xsave(fp);
- if (cpu_has_fxsr)
- return save_i387_fxsave(fp);
- else
- return save_i387_fsave(fp);
-}
-
-static inline int restore_i387_fsave(struct _fpstate_ia32 __user *buf)
-{
- struct task_struct *tsk = current;
-
- return __copy_from_user(&tsk->thread.fpu.state->fsave, buf,
- sizeof(struct i387_fsave_struct));
-}
-
-static int restore_i387_fxsave(struct _fpstate_ia32 __user *buf,
- unsigned int size)
-{
- struct task_struct *tsk = current;
- struct user_i387_ia32_struct env;
- int err;
-
- err = __copy_from_user(&tsk->thread.fpu.state->fxsave, &buf->_fxsr_env[0],
- size);
- /* mxcsr reserved bits must be masked to zero for security reasons */
- tsk->thread.fpu.state->fxsave.mxcsr &= mxcsr_feature_mask;
- if (err || __copy_from_user(&env, buf, sizeof(env)))
- return 1;
- convert_to_fxsr(tsk, &env);
-
- return 0;
-}
-
-static int restore_i387_xsave(void __user *buf)
-{
- struct _fpx_sw_bytes fx_sw_user;
- struct _fpstate_ia32 __user *fx_user =
- ((struct _fpstate_ia32 __user *) buf);
- struct i387_fxsave_struct __user *fx =
- (struct i387_fxsave_struct __user *) &fx_user->_fxsr_env[0];
- struct xsave_hdr_struct *xsave_hdr =
- &current->thread.fpu.state->xsave.xsave_hdr;
- u64 mask;
- int err;
-
- if (check_for_xstate(fx, sig_xstate_ia32_size -
- offsetof(struct _fpstate_ia32, _fxsr_env),
- &fx_sw_user))
- goto fx_only;
-
- mask = fx_sw_user.xstate_bv;
-
- err = restore_i387_fxsave(buf, fx_sw_user.xstate_size);
-
- xsave_hdr->xstate_bv &= pcntxt_mask;
- /*
- * These bits must be zero.
- */
- xsave_hdr->reserved1[0] = xsave_hdr->reserved1[1] = 0;
-
- /*
- * Init the state that is not present in the memory layout
- * and enabled by the OS.
- */
- mask = ~(pcntxt_mask & ~mask);
- xsave_hdr->xstate_bv &= mask;
-
- return err;
-fx_only:
- /*
- * Couldn't find the extended state information in the memory
- * layout. Restore the FP/SSE and init the other extended state
- * enabled by the OS.
- */
- xsave_hdr->xstate_bv = XSTATE_FPSSE;
- return restore_i387_fxsave(buf, sizeof(struct i387_fxsave_struct));
-}
-
-int restore_i387_xstate_ia32(void __user *buf)
-{
- int err;
- struct task_struct *tsk = current;
- struct _fpstate_ia32 __user *fp = (struct _fpstate_ia32 __user *) buf;
-
- if (HAVE_HWFP)
- clear_fpu(tsk);
-
- if (!buf) {
- if (used_math()) {
- clear_fpu(tsk);
- clear_used_math();
- }
-
- return 0;
- } else
- if (!access_ok(VERIFY_READ, buf, sig_xstate_ia32_size))
- return -EACCES;
-
- if (!used_math()) {
- err = init_fpu(tsk);
- if (err)
- return err;
- }
-
- if (HAVE_HWFP) {
- if (cpu_has_xsave)
- err = restore_i387_xsave(buf);
- else if (cpu_has_fxsr)
- err = restore_i387_fxsave(fp, sizeof(struct
- i387_fxsave_struct));
- else
- err = restore_i387_fsave(fp);
- } else {
- err = fpregs_soft_set(current, NULL,
- 0, sizeof(struct user_i387_ia32_struct),
- NULL, fp) != 0;
- }
- set_used_math();
-
- return err;
-}
-
-/*
* FPU state for core dumps.
* This is only used for a.out dumps now.
* It is declared generically using elf_fpregset_t (which is
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 072c30e..872fc78 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -720,28 +720,6 @@ asmlinkage void __attribute__((weak)) smp_threshold_interrupt(void)
}

/*
- * __math_state_restore assumes that cr0.TS is already clear and the
- * fpu state is all ready for use. Used during context switch.
- */
-void __math_state_restore(void)
-{
- struct thread_info *thread = current_thread_info();
- struct task_struct *tsk = thread->task;
-
- /*
- * Paranoid restore. send a SIGSEGV if we fail to restore the state.
- */
- if (unlikely(fpu_restore_checking(&tsk->thread.fpu))) {
- stts();
- force_sig(SIGSEGV, tsk);
- return;
- }
-
- thread->status |= TS_USEDFPU; /* So we fnsave on switch_to() */
- tsk->fpu_counter++;
-}
-
-/*
* 'math_state_restore()' saves the current math information in the
* old math state array, and gets the new ones from the current task
*
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 08b2fe8..9ecc791 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -249,169 +249,6 @@ int save_xstates_sigframe(void __user *buf, unsigned int size)
return 1;
}

-#ifdef CONFIG_X86_64
-int save_i387_xstate(void __user *buf)
-{
- struct task_struct *tsk = current;
- int err = 0;
-
- if (!access_ok(VERIFY_WRITE, buf, sig_xstate_size))
- return -EACCES;
-
- BUG_ON(sig_xstate_size < xstate_size);
-
- if ((unsigned long)buf % 64)
- printk("save_i387_xstate: bad fpstate %p\n", buf);
-
- if (!used_math())
- return 0;
-
- if (task_thread_info(tsk)->status & TS_USEDFPU) {
- if (use_xsave())
- err = xsave_checking(buf);
- else
- err = fxsave_user(buf);
-
- if (err)
- return err;
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
- stts();
- } else {
- sanitize_i387_state(tsk);
- if (__copy_to_user(buf, &tsk->thread.fpu.state->fxsave,
- xstate_size))
- return -1;
- }
-
- clear_used_math(); /* trigger finit */
-
- if (use_xsave()) {
- struct _fpstate __user *fx = buf;
- struct _xstate __user *x = buf;
- u64 xstate_bv;
-
- err = __copy_to_user(&fx->sw_reserved, &fx_sw_reserved,
- sizeof(struct _fpx_sw_bytes));
-
- err |= __put_user(FP_XSTATE_MAGIC2,
- (__u32 __user *) (buf + sig_xstate_size
- - FP_XSTATE_MAGIC2_SIZE));
-
- /*
- * Read the xstate_bv which we copied (directly from the cpu or
- * from the state in task struct) to the user buffers and
- * set the FP/SSE bits.
- */
- err |= __get_user(xstate_bv, &x->xstate_hdr.xstate_bv);
-
- /*
- * For legacy compatible, we always set FP/SSE bits in the bit
- * vector while saving the state to the user context. This will
- * enable us capturing any changes(during sigreturn) to
- * the FP/SSE bits by the legacy applications which don't touch
- * xstate_bv in the xsave header.
- *
- * xsave aware apps can change the xstate_bv in the xsave
- * header as well as change any contents in the memory layout.
- * xrestore as part of sigreturn will capture all the changes.
- */
- xstate_bv |= XSTATE_FPSSE;
-
- err |= __put_user(xstate_bv, &x->xstate_hdr.xstate_bv);
-
- if (err)
- return err;
- }
-
- return 1;
-}
-
-/*
- * Restore the extended state if present. Otherwise, restore the FP/SSE
- * state.
- */
-static int restore_user_xstate(void __user *buf)
-{
- struct _fpx_sw_bytes fx_sw_user;
- u64 mask;
- int err;
-
- if (((unsigned long)buf % 64) ||
- check_for_xstate(buf, sig_xstate_size, &fx_sw_user))
- goto fx_only;
-
- mask = fx_sw_user.xstate_bv;
-
- /*
- * restore the state passed by the user.
- */
- err = xrstor_checking((__force struct xsave_struct *)buf, mask);
- if (err)
- return err;
-
- /*
- * init the state skipped by the user.
- */
- mask = pcntxt_mask & ~mask;
- if (unlikely(mask))
- xrstor_state(init_xstate_buf, mask);
-
- return 0;
-
-fx_only:
- /*
- * couldn't find the extended state information in the
- * memory layout. Restore just the FP/SSE and init all
- * the other extended state.
- */
- xrstor_state(init_xstate_buf, pcntxt_mask & ~XSTATE_FPSSE);
- return fxrstor_checking((__force struct i387_fxsave_struct *)buf);
-}
-
-/*
- * This restores directly out of user space. Exceptions are handled.
- */
-int restore_i387_xstate(void __user *buf)
-{
- struct task_struct *tsk = current;
- int err = 0;
-
- if (!buf) {
- if (used_math())
- goto clear;
- return 0;
- } else
- if (!access_ok(VERIFY_READ, buf, sig_xstate_size))
- return -EACCES;
-
- if (!used_math()) {
- err = init_fpu(tsk);
- if (err)
- return err;
- }
-
- if (!(task_thread_info(current)->status & TS_USEDFPU)) {
- clts();
- task_thread_info(current)->status |= TS_USEDFPU;
- }
- if (use_xsave())
- err = restore_user_xstate(buf);
- else
- err = fxrstor_checking((__force struct i387_fxsave_struct *)
- buf);
- if (unlikely(err)) {
- /*
- * Encountered an error while doing the restore from the
- * user buffer, clear the fpu state.
- */
-clear:
- clear_fpu(tsk);
- clear_used_math();
- }
- return err;
-}
-#endif
-
int restore_xstates_sigframe(void __user *buf, unsigned int size)
{
#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
--
1.5.6.5

2011-04-05 15:51:55

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v3 1/8] x86, xsave: cleanup fpu/xsave support

Removed the functions fpu_fxrstor_checking() and restore_fpu_checking()
because they weren't doing anything. Removed redundant xsave/xrstor
implementations. Since xsave/xrstor is not specific to the FPU, and also
for consistency, all xsave/xrstor functions now take an xsave_struct
argument.
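
A minimal call-site sketch (not part of the patch, illustrative only):
with the mask now an explicit argument, a hypothetical caller could
restore just a subset of the enabled states instead of passing -1. The
names xrstor_checking(), XSTATE_FP and XSTATE_SSE are the ones used in
the hunks below; the error handling mirrors the #NM failure path.

        /* restore only the legacy FP/SSE states, leaving other states alone */
        err = xrstor_checking(&tsk->thread.fpu.state->xsave,
                              XSTATE_FP | XSTATE_SSE);
        if (unlikely(err)) {
                stts();
                force_sig(SIGSEGV, tsk);
        }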

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 20 +++-------
arch/x86/include/asm/xsave.h | 81 +++++++++++++++---------------------------
arch/x86/kernel/traps.c | 2 +-
arch/x86/kernel/xsave.c | 4 +-
4 files changed, 38 insertions(+), 69 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index ef32890..d908383 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -227,12 +227,14 @@ static inline void fpu_fxsave(struct fpu *fpu)
static inline void fpu_save_init(struct fpu *fpu)
{
if (use_xsave()) {
- fpu_xsave(fpu);
+ struct xsave_struct *xstate = &fpu->state->xsave;
+
+ fpu_xsave(xstate);

/*
* xsave header may indicate the init state of the FP.
*/
- if (!(fpu->state->xsave.xsave_hdr.xstate_bv & XSTATE_FP))
+ if (!(xstate->xsave_hdr.xstate_bv & XSTATE_FP))
return;
} else if (use_fxsr()) {
fpu_fxsave(fpu);
@@ -262,22 +264,12 @@ static inline void __save_init_fpu(struct task_struct *tsk)
task_thread_info(tsk)->status &= ~TS_USEDFPU;
}

-static inline int fpu_fxrstor_checking(struct fpu *fpu)
-{
- return fxrstor_checking(&fpu->state->fxsave);
-}
-
static inline int fpu_restore_checking(struct fpu *fpu)
{
if (use_xsave())
- return fpu_xrstor_checking(fpu);
+ return xrstor_checking(&fpu->state->xsave, -1);
else
- return fpu_fxrstor_checking(fpu);
-}
-
-static inline int restore_fpu_checking(struct task_struct *tsk)
-{
- return fpu_restore_checking(&tsk->thread.fpu);
+ return fxrstor_checking(&fpu->state->fxsave);
}

/*
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index c6ce245..8bcbbce 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -42,10 +42,11 @@ extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
void __user *fpstate,
struct _fpx_sw_bytes *sw);

-static inline int fpu_xrstor_checking(struct fpu *fpu)
+static inline int xrstor_checking(struct xsave_struct *fx, u64 mask)
{
- struct xsave_struct *fx = &fpu->state->xsave;
int err;
+ u32 lmask = mask;
+ u32 hmask = mask >> 32;

asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
"2:\n"
@@ -55,13 +56,23 @@ static inline int fpu_xrstor_checking(struct fpu *fpu)
".previous\n"
_ASM_EXTABLE(1b, 3b)
: [err] "=r" (err)
- : "D" (fx), "m" (*fx), "a" (-1), "d" (-1), "0" (0)
+ : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask), "0" (0)
: "memory");

return err;
}

-static inline int xsave_user(struct xsave_struct __user *buf)
+static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
+{
+ u32 lmask = mask;
+ u32 hmask = mask >> 32;
+
+ asm volatile(".byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
+ : : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
+ : "memory");
+}
+
+static inline int xsave_checking(struct xsave_struct __user *buf)
{
int err;

@@ -74,58 +85,24 @@ static inline int xsave_user(struct xsave_struct __user *buf)
if (unlikely(err))
return -EFAULT;

- __asm__ __volatile__("1: .byte " REX_PREFIX "0x0f,0xae,0x27\n"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- ".section __ex_table,\"a\"\n"
- _ASM_ALIGN "\n"
- _ASM_PTR "1b,3b\n"
- ".previous"
- : [err] "=r" (err)
- : "D" (buf), "a" (-1), "d" (-1), "0" (0)
- : "memory");
+ asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x27\n"
+ "2:\n"
+ ".section .fixup,\"ax\"\n"
+ "3: movl $-1,%[err]\n"
+ " jmp 2b\n"
+ ".previous\n"
+ _ASM_EXTABLE(1b,3b)
+ : [err] "=r" (err)
+ : "D" (buf), "a" (-1), "d" (-1), "0" (0)
+ : "memory");
+
if (unlikely(err) && __clear_user(buf, xstate_size))
err = -EFAULT;
- /* No need to clear here because the caller clears USED_MATH */
- return err;
-}
-
-static inline int xrestore_user(struct xsave_struct __user *buf, u64 mask)
-{
- int err;
- struct xsave_struct *xstate = ((__force struct xsave_struct *)buf);
- u32 lmask = mask;
- u32 hmask = mask >> 32;

- __asm__ __volatile__("1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- ".section __ex_table,\"a\"\n"
- _ASM_ALIGN "\n"
- _ASM_PTR "1b,3b\n"
- ".previous"
- : [err] "=r" (err)
- : "D" (xstate), "a" (lmask), "d" (hmask), "0" (0)
- : "memory"); /* memory required? */
+ /* No need to clear here because the caller clears USED_MATH */
return err;
}

-static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
-{
- u32 lmask = mask;
- u32 hmask = mask >> 32;
-
- asm volatile(".byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
- : : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
- : "memory");
-}
-
static inline void xsave_state(struct xsave_struct *fx, u64 mask)
{
u32 lmask = mask;
@@ -136,7 +113,7 @@ static inline void xsave_state(struct xsave_struct *fx, u64 mask)
: "memory");
}

-static inline void fpu_xsave(struct fpu *fpu)
+static inline void fpu_xsave(struct xsave_struct *fx)
{
/* This, however, we can work around by forcing the compiler to select
an addressing mode that doesn't require extended registers. */
@@ -144,7 +121,7 @@ static inline void fpu_xsave(struct fpu *fpu)
".byte " REX_PREFIX "0x0f,0xae,0x27",
".byte " REX_PREFIX "0x0f,0xae,0x37",
X86_FEATURE_XSAVEOPT,
- [fx] "D" (&fpu->state->xsave), "a" (-1), "d" (-1) :
+ [fx] "D" (fx), "a" (-1), "d" (-1) :
"memory");
}
#endif
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b9b6716..32f3043 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -728,7 +728,7 @@ void __math_state_restore(void)
/*
* Paranoid restore. send a SIGSEGV if we fail to restore the state.
*/
- if (unlikely(restore_fpu_checking(tsk))) {
+ if (unlikely(fpu_restore_checking(&tsk->thread.fpu))) {
stts();
force_sig(SIGSEGV, tsk);
return;
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index a391134..6b063d7 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -170,7 +170,7 @@ int save_i387_xstate(void __user *buf)

if (task_thread_info(tsk)->status & TS_USEDFPU) {
if (use_xsave())
- err = xsave_user(buf);
+ err = xsave_checking(buf);
else
err = fxsave_user(buf);

@@ -247,7 +247,7 @@ static int restore_user_xstate(void __user *buf)
/*
* restore the state passed by the user.
*/
- err = xrestore_user(buf, mask);
+ err = xrstor_checking((__force struct xsave_struct *)buf, mask);
if (err)
return err;

--
1.5.6.5

2011-04-05 15:52:35

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v3 8/8] x86, xsave: remove lazy allocation of xstate area

This patch completely removes lazy allocation of the xstate area. All
user tasks will always have an xstate area preallocated, just like they
already do when non-lazy features are present. The size of the xsave
area ranges from 112 to 960 bytes, depending on the xstates present and
enabled. Since it is common to use SSE etc. for optimization, the actual
overhead is expected to be negligible.

This removes some of the special-case handling of non-lazy xstates. It
also greatly simplifies init_fpu() by removing the allocation code, the
check for the presence of the xstate area, and the need to check the
init_fpu() return value.
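
A call-site sketch of what the simplification buys (based on the hunks
below, illustrative only, not a standalone compilation unit): with the
xstate area preallocated, init_fpu() returns void and callers lose
their error paths entirely.

        /* before: allocation could fail and had to be handled */
        if (!used_math()) {
                if (init_fpu(current))
                        do_group_exit(SIGKILL);
        }

        /* after: init_fpu() cannot fail, no -ENOMEM handling needed */
        if (!used_math())
                init_fpu(current);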

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 20 ++++++++------------
arch/x86/kernel/i387.c | 41 +++++++++--------------------------------
arch/x86/kernel/traps.c | 16 ++--------------
arch/x86/kernel/xsave.c | 8 ++------
arch/x86/kvm/x86.c | 4 ++--
arch/x86/math-emu/fpu_entry.c | 8 ++------
6 files changed, 25 insertions(+), 72 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index efe1476..989c0ac 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -40,7 +40,7 @@
extern unsigned int sig_xstate_size;
extern void fpu_init(void);
extern void mxcsr_feature_mask_init(void);
-extern int init_fpu(struct task_struct *child);
+extern void init_fpu(struct task_struct *child);
extern asmlinkage void math_state_restore(void);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);

@@ -333,18 +333,14 @@ static union thread_xstate __init_xstate, *init_xstate = &__init_xstate;

static inline void fpu_clear(struct fpu *fpu)
{
- if (pcntxt_mask & XCNTXT_NONLAZY) {
- if (!fpu_allocated(fpu)) {
- BUG_ON(init_xstate == NULL);
- fpu->state = init_xstate;
- init_xstate = NULL;
- }
- memset(fpu->state, 0, xstate_size);
- fpu_finit(fpu);
- set_used_math();
- } else {
- fpu_free(fpu);
+ if (!fpu_allocated(fpu)) {
+ BUG_ON(init_xstate == NULL);
+ fpu->state = init_xstate;
+ init_xstate = NULL;
}
+ memset(fpu->state, 0, xstate_size);
+ fpu_finit(fpu);
+ set_used_math();
}

#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index dd9644a..df0b139 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -127,9 +127,9 @@ EXPORT_SYMBOL_GPL(fpu_finit);
* value at reset if we support XMM instructions and then
* remember the current task has used the FPU.
*/
-int init_fpu(struct task_struct *tsk)
+void init_fpu(struct task_struct *tsk)
{
- int ret;
+ BUG_ON(tsk->flags & PF_KTHREAD);

if (tsk_used_math(tsk)) {
if (HAVE_HWFP && tsk == current) {
@@ -137,20 +137,12 @@ int init_fpu(struct task_struct *tsk)
save_xstates(tsk);
preempt_enable();
}
- return 0;
+ return;
}

- /*
- * Memory allocation at the first usage of the FPU and other state.
- */
- ret = fpu_alloc(&tsk->thread.fpu);
- if (ret)
- return ret;
-
fpu_finit(&tsk->thread.fpu);

set_stopped_child_used_math(tsk);
- return 0;
}
EXPORT_SYMBOL_GPL(init_fpu);

@@ -173,14 +165,10 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
- int ret;
-
if (!cpu_has_fxsr)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (use_xsaveopt())
sanitize_i387_state(target);
@@ -198,9 +186,7 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_fxsr)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (use_xsaveopt())
sanitize_i387_state(target);
@@ -232,9 +218,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_xsave)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

/*
* Copy the 48bytes defined by the software first into the xstate
@@ -262,9 +246,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_xsave)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.fpu.state->xsave, 0, -1);
@@ -427,11 +409,8 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
void *kbuf, void __user *ubuf)
{
struct user_i387_ia32_struct env;
- int ret;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (!HAVE_HWFP)
return fpregs_soft_get(target, regset, pos, count, kbuf, ubuf);
@@ -462,9 +441,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
struct user_i387_ia32_struct env;
int ret;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (use_xsaveopt())
sanitize_i387_state(target);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 872fc78..c8fbd04 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -734,20 +734,8 @@ asmlinkage void math_state_restore(void)
struct thread_info *thread = current_thread_info();
struct task_struct *tsk = thread->task;

- if (!tsk_used_math(tsk)) {
- local_irq_enable();
- /*
- * does a slab alloc which can sleep
- */
- if (init_fpu(tsk)) {
- /*
- * ran out of memory!
- */
- do_group_exit(SIGKILL);
- return;
- }
- local_irq_disable();
- }
+ if (!tsk_used_math(tsk))
+ init_fpu(tsk);

restore_xstates(tsk, XCNTXT_LAZY);
}
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index a188362..62f2df8 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -264,7 +264,6 @@ int restore_xstates_sigframe(void __user *buf, unsigned int size)
struct _fpstate_ia32 __user *fp = buf;
struct xsave_struct *xsave;
u64 xstate_mask = pcntxt_mask & XCNTXT_NONLAZY;
- int err;

if (!buf) {
if (used_math()) {
@@ -277,11 +276,8 @@ int restore_xstates_sigframe(void __user *buf, unsigned int size)
if (!access_ok(VERIFY_READ, buf, size))
return -EACCES;

- if (!used_math()) {
- err = init_fpu(tsk);
- if (err)
- return err;
- }
+ if (!used_math())
+ init_fpu(tsk);

if (!HAVE_HWFP) {
set_used_math();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bc04e15..17e52a9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5386,8 +5386,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
int r;
sigset_t sigsaved;

- if (!tsk_used_math(current) && init_fpu(current))
- return -ENOMEM;
+ if (!tsk_used_math(current))
+ init_fpu(current);

if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 7718541..472e2b9 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -147,12 +147,8 @@ void math_emulate(struct math_emu_info *info)
unsigned long code_limit = 0; /* Initialized to stop compiler warnings */
struct desc_struct code_descriptor;

- if (!used_math()) {
- if (init_fpu(current)) {
- do_group_exit(SIGKILL);
- return;
- }
- }
+ if (!used_math())
+ init_fpu(current);

#ifdef RE_ENTRANT_CHECKING
if (emulating) {
--
1.5.6.5

2011-04-05 15:52:29

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v3 6/8] x86, xsave: add support for non-lazy xstates

Non-lazy xstates are, as the name suggests, extended states that cannot
be saved or restored lazily. The state for AMD's LWP feature is an
example of this.

This patch adds support for this kind of xstate. If any such states are
present and supported on the running system, they will always be enabled
in xstate_mask so that they are always restored in switch_to(). Since lazy
allocation of the xstate area won't work when non-lazy xstates are used,
all user tasks will always have an xstate area preallocated.
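
A short sketch of the resulting mask handling (illustrative only; the
names pcntxt_mask, XCNTXT_NONLAZY, xstate_mask and restore_xstates()
come from this patch and the earlier rework patch): the non-lazy bits
are or-ed into xstate_mask once for init_task and are never cleared, so
every later restore includes them.

        u64 nonlazy = pcntxt_mask & XCNTXT_NONLAZY;

        /* done once at boot, see the xsave.c hunk below */
        task_thread_info(&init_task)->xstate_mask |= nonlazy;

        /* any switch-in then restores at least the non-lazy states */
        restore_xstates(tsk, task_thread_info(tsk)->xstate_mask);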

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 17 +++++++++++++++++
arch/x86/include/asm/xsave.h | 5 +++--
arch/x86/kernel/process_32.c | 2 +-
arch/x86/kernel/process_64.c | 2 +-
arch/x86/kernel/xsave.c | 6 +++++-
5 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index b8f9617..efe1476 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -329,6 +329,23 @@ static inline void fpu_copy(struct fpu *dst, struct fpu *src)
}

extern void fpu_finit(struct fpu *fpu);
+static union thread_xstate __init_xstate, *init_xstate = &__init_xstate;
+
+static inline void fpu_clear(struct fpu *fpu)
+{
+ if (pcntxt_mask & XCNTXT_NONLAZY) {
+ if (!fpu_allocated(fpu)) {
+ BUG_ON(init_xstate == NULL);
+ fpu->state = init_xstate;
+ init_xstate = NULL;
+ }
+ memset(fpu->state, 0, xstate_size);
+ fpu_finit(fpu);
+ set_used_math();
+ } else {
+ fpu_free(fpu);
+ }
+}

#endif /* __ASSEMBLY__ */

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index b8861d4..4ccee3c 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -23,9 +23,10 @@
/*
* These are the features that the OS can handle currently.
*/
-#define XCNTXT_MASK (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
+#define XCNTXT_LAZY (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
+#define XCNTXT_NONLAZY 0

-#define XCNTXT_LAZY XCNTXT_MASK
+#define XCNTXT_MASK (XCNTXT_LAZY | XCNTXT_NONLAZY)

#ifdef CONFIG_X86_64
#define REX_PREFIX "0x48, "
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8df07c3..a878736 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -257,7 +257,7 @@ start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
/*
* Free the old FP and other extended state
*/
- free_thread_xstate(current);
+ fpu_clear(&current->thread.fpu);
}
EXPORT_SYMBOL_GPL(start_thread);

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index cbf1a67..8ff35fc 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -344,7 +344,7 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
/*
* Free the old FP and other extended state
*/
- free_thread_xstate(current);
+ fpu_clear(&current->thread.fpu);
}

void
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index d42810f..56ab3d3 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -16,6 +16,7 @@
* Supported feature mask by the CPU and the kernel.
*/
u64 pcntxt_mask;
+EXPORT_SYMBOL(pcntxt_mask);

/*
* Represents init state for the supported extended state.
@@ -260,7 +261,7 @@ int restore_xstates_sigframe(void __user *buf, unsigned int size)
struct task_struct *tsk = current;
struct _fpstate_ia32 __user *fp = buf;
struct xsave_struct *xsave;
- u64 xstate_mask = 0;
+ u64 xstate_mask = pcntxt_mask & XCNTXT_NONLAZY;
int err;

if (!buf) {
@@ -477,6 +478,9 @@ static void __init xstate_enable_boot_cpu(void)
printk(KERN_INFO "xsave/xrstor: enabled xstate_bv 0x%llx, "
"cntxt size 0x%x\n",
pcntxt_mask, xstate_size);
+
+ if (pcntxt_mask & XCNTXT_NONLAZY)
+ task_thread_info(&init_task)->xstate_mask |= XCNTXT_NONLAZY;
}

/*
--
1.5.6.5

2011-04-05 15:52:43

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v3 0/8] x86, xsave: rework of extended state handling, LWP support

Changes since last patch set:
* fixed pre-allocation of xsave area to exclude kernel threads


This patch set is a general cleanup and rework of the code related to
handling of FPU and other extended states.

All handling of extended states, including the FPU state, is now handled
by xsave/xrstor wrappers that fall back to fxsave/fxrstor, or even
fsave/frstor, if hardware support for those features is lacking.

Non-lazy xstates, which cannot be restored lazily, can now be easily
supported with almost no processing overhead. This makes adding basic
support for AMD's LWP almost trivial.

Since non-lazy xstates are inherently incompatible with lazy allocation
of the xstate area, the complete removal of lazy allocation to further
reduce code complexity should be considered. Since SSE-optimized library
functions are widely used today, most processes will have an xstate area
anyway, so the memory overhead wouldn't be big enough to be much of an
issue.

Hans Rosenfeld (8):
x86, xsave: cleanup fpu/xsave support
x86, xsave: rework fpu/xsave support
x86, xsave: cleanup fpu/xsave signal frame setup
x86, xsave: remove unused code
x86, xsave: more cleanups
x86, xsave: add support for non-lazy xstates
x86, xsave: add kernel support for AMDs Lightweight Profiling (LWP)
x86, xsave: remove lazy allocation of xstate area

arch/x86/ia32/ia32_signal.c | 4 +-
arch/x86/include/asm/i387.h | 249 ++++++++--------------------
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/processor.h | 12 ++
arch/x86/include/asm/sigcontext.h | 12 ++
arch/x86/include/asm/thread_info.h | 4 +-
arch/x86/include/asm/xsave.h | 100 ++---------
arch/x86/kernel/i387.c | 305 +++-------------------------------
arch/x86/kernel/process_32.c | 29 +--
arch/x86/kernel/process_64.c | 28 +---
arch/x86/kernel/signal.c | 4 +-
arch/x86/kernel/traps.c | 47 +-----
arch/x86/kernel/xsave.c | 317 +++++++++++++++++++++++-------------
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 11 +-
arch/x86/math-emu/fpu_entry.c | 8 +-
drivers/lguest/x86/core.c | 2 +-
17 files changed, 388 insertions(+), 747 deletions(-)

2011-04-06 22:04:09

by Hans Rosenfeld

[permalink] [raw]
Subject: [tip:x86/xsave] x86, xsave: cleanup fpu/xsave support

Commit-ID: 26bce4e4c56f5929f96a4b82e7eb10ec2f61998d
Gitweb: http://git.kernel.org/tip/26bce4e4c56f5929f96a4b82e7eb10ec2f61998d
Author: Hans Rosenfeld <[email protected]>
AuthorDate: Tue, 5 Apr 2011 17:50:49 +0200
Committer: H. Peter Anvin <[email protected]>
CommitDate: Wed, 6 Apr 2011 14:15:16 -0700

x86, xsave: cleanup fpu/xsave support

Removed the functions fpu_fxrstor_checking() and restore_fpu_checking()
because they weren't doing anything. Removed redundant xsave/xrstor
implementations. Since xsave/xrstor is not specific to the FPU, and also
for consistency, all xsave/xrstor functions now take an xsave_struct
argument.

Signed-off-by: Hans Rosenfeld <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/include/asm/i387.h | 20 +++-------
arch/x86/include/asm/xsave.h | 81 +++++++++++++++---------------------------
arch/x86/kernel/traps.c | 2 +-
arch/x86/kernel/xsave.c | 4 +-
4 files changed, 38 insertions(+), 69 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index ef32890..d908383 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -227,12 +227,14 @@ static inline void fpu_fxsave(struct fpu *fpu)
static inline void fpu_save_init(struct fpu *fpu)
{
if (use_xsave()) {
- fpu_xsave(fpu);
+ struct xsave_struct *xstate = &fpu->state->xsave;
+
+ fpu_xsave(xstate);

/*
* xsave header may indicate the init state of the FP.
*/
- if (!(fpu->state->xsave.xsave_hdr.xstate_bv & XSTATE_FP))
+ if (!(xstate->xsave_hdr.xstate_bv & XSTATE_FP))
return;
} else if (use_fxsr()) {
fpu_fxsave(fpu);
@@ -262,22 +264,12 @@ static inline void __save_init_fpu(struct task_struct *tsk)
task_thread_info(tsk)->status &= ~TS_USEDFPU;
}

-static inline int fpu_fxrstor_checking(struct fpu *fpu)
-{
- return fxrstor_checking(&fpu->state->fxsave);
-}
-
static inline int fpu_restore_checking(struct fpu *fpu)
{
if (use_xsave())
- return fpu_xrstor_checking(fpu);
+ return xrstor_checking(&fpu->state->xsave, -1);
else
- return fpu_fxrstor_checking(fpu);
-}
-
-static inline int restore_fpu_checking(struct task_struct *tsk)
-{
- return fpu_restore_checking(&tsk->thread.fpu);
+ return fxrstor_checking(&fpu->state->fxsave);
}

/*
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index c6ce245..8bcbbce 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -42,10 +42,11 @@ extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
void __user *fpstate,
struct _fpx_sw_bytes *sw);

-static inline int fpu_xrstor_checking(struct fpu *fpu)
+static inline int xrstor_checking(struct xsave_struct *fx, u64 mask)
{
- struct xsave_struct *fx = &fpu->state->xsave;
int err;
+ u32 lmask = mask;
+ u32 hmask = mask >> 32;

asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
"2:\n"
@@ -55,13 +56,23 @@ static inline int fpu_xrstor_checking(struct fpu *fpu)
".previous\n"
_ASM_EXTABLE(1b, 3b)
: [err] "=r" (err)
- : "D" (fx), "m" (*fx), "a" (-1), "d" (-1), "0" (0)
+ : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask), "0" (0)
: "memory");

return err;
}

-static inline int xsave_user(struct xsave_struct __user *buf)
+static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
+{
+ u32 lmask = mask;
+ u32 hmask = mask >> 32;
+
+ asm volatile(".byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
+ : : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
+ : "memory");
+}
+
+static inline int xsave_checking(struct xsave_struct __user *buf)
{
int err;

@@ -74,58 +85,24 @@ static inline int xsave_user(struct xsave_struct __user *buf)
if (unlikely(err))
return -EFAULT;

- __asm__ __volatile__("1: .byte " REX_PREFIX "0x0f,0xae,0x27\n"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- ".section __ex_table,\"a\"\n"
- _ASM_ALIGN "\n"
- _ASM_PTR "1b,3b\n"
- ".previous"
- : [err] "=r" (err)
- : "D" (buf), "a" (-1), "d" (-1), "0" (0)
- : "memory");
+ asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x27\n"
+ "2:\n"
+ ".section .fixup,\"ax\"\n"
+ "3: movl $-1,%[err]\n"
+ " jmp 2b\n"
+ ".previous\n"
+ _ASM_EXTABLE(1b,3b)
+ : [err] "=r" (err)
+ : "D" (buf), "a" (-1), "d" (-1), "0" (0)
+ : "memory");
+
if (unlikely(err) && __clear_user(buf, xstate_size))
err = -EFAULT;
- /* No need to clear here because the caller clears USED_MATH */
- return err;
-}
-
-static inline int xrestore_user(struct xsave_struct __user *buf, u64 mask)
-{
- int err;
- struct xsave_struct *xstate = ((__force struct xsave_struct *)buf);
- u32 lmask = mask;
- u32 hmask = mask >> 32;

- __asm__ __volatile__("1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- ".section __ex_table,\"a\"\n"
- _ASM_ALIGN "\n"
- _ASM_PTR "1b,3b\n"
- ".previous"
- : [err] "=r" (err)
- : "D" (xstate), "a" (lmask), "d" (hmask), "0" (0)
- : "memory"); /* memory required? */
+ /* No need to clear here because the caller clears USED_MATH */
return err;
}

-static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
-{
- u32 lmask = mask;
- u32 hmask = mask >> 32;
-
- asm volatile(".byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
- : : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
- : "memory");
-}
-
static inline void xsave_state(struct xsave_struct *fx, u64 mask)
{
u32 lmask = mask;
@@ -136,7 +113,7 @@ static inline void xsave_state(struct xsave_struct *fx, u64 mask)
: "memory");
}

-static inline void fpu_xsave(struct fpu *fpu)
+static inline void fpu_xsave(struct xsave_struct *fx)
{
/* This, however, we can work around by forcing the compiler to select
an addressing mode that doesn't require extended registers. */
@@ -144,7 +121,7 @@ static inline void fpu_xsave(struct fpu *fpu)
".byte " REX_PREFIX "0x0f,0xae,0x27",
".byte " REX_PREFIX "0x0f,0xae,0x37",
X86_FEATURE_XSAVEOPT,
- [fx] "D" (&fpu->state->xsave), "a" (-1), "d" (-1) :
+ [fx] "D" (fx), "a" (-1), "d" (-1) :
"memory");
}
#endif
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b9b6716..32f3043 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -728,7 +728,7 @@ void __math_state_restore(void)
/*
* Paranoid restore. send a SIGSEGV if we fail to restore the state.
*/
- if (unlikely(restore_fpu_checking(tsk))) {
+ if (unlikely(fpu_restore_checking(&tsk->thread.fpu))) {
stts();
force_sig(SIGSEGV, tsk);
return;
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index a391134..6b063d7 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -170,7 +170,7 @@ int save_i387_xstate(void __user *buf)

if (task_thread_info(tsk)->status & TS_USEDFPU) {
if (use_xsave())
- err = xsave_user(buf);
+ err = xsave_checking(buf);
else
err = fxsave_user(buf);

@@ -247,7 +247,7 @@ static int restore_user_xstate(void __user *buf)
/*
* restore the state passed by the user.
*/
- err = xrestore_user(buf, mask);
+ err = xrstor_checking((__force struct xsave_struct *)buf, mask);
if (err)
return err;

2011-04-06 22:04:35

by Hans Rosenfeld

[permalink] [raw]
Subject: [tip:x86/xsave] x86, xsave: rework fpu/xsave support

Commit-ID: 7f4f0a56a7d391f43cee4fb56f8c3f949fd22029
Gitweb: http://git.kernel.org/tip/7f4f0a56a7d391f43cee4fb56f8c3f949fd22029
Author: Hans Rosenfeld <[email protected]>
AuthorDate: Tue, 5 Apr 2011 17:50:50 +0200
Committer: H. Peter Anvin <[email protected]>
CommitDate: Wed, 6 Apr 2011 14:15:17 -0700

x86, xsave: rework fpu/xsave support

This is a complete rework of the code that handles FPU and related
extended states. Since FPU, XMM and YMM states are just variants of what
xsave handles, all of the old FPU-specific state handling code will be
hidden behind a set of functions that resemble xsave and xrstor. For
hardware that does not support xsave, the code falls back to
fxsave/fxrstor or even fsave/frstor.

An xstate_mask member will be added to the thread_info structure that
will control which states are to be saved by xsave. It is set to include
all "lazy" states (that is, all states currently supported: FPU, XMM and
YMM) by the #NM handler when a lazy restore is triggered or by
switch_to() when the task's FPU context is preloaded. Xstate_mask is
intended to completely replace TS_USEDFPU in a later cleanup patch.
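
A condensed flow sketch of how xstate_mask drives the helpers added by
this patch (save_xstates(), xsave(), restore_xstates(); see the xsave.c
hunk below). Illustrative only, not a standalone compilation unit:

        /* #NM trap: re-enable the lazy states for this task */
        restore_xstates(tsk, XCNTXT_LAZY);      /* sets xstate_mask |= XCNTXT_LAZY */

        /* switch-out: save whatever the mask says is live */
        xsave(&tsk->thread.fpu, ti->xstate_mask);
        if (tsk->fpu_counter < 5)
                ti->xstate_mask &= ~XCNTXT_LAZY; /* force a lazy restore next time */

        /* switch-in: restore exactly the states recorded in the mask */
        restore_xstates(next_p, task_thread_info(next_p)->xstate_mask);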

Signed-off-by: Hans Rosenfeld <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/include/asm/i387.h | 44 +++++++++++++++++++---
arch/x86/include/asm/thread_info.h | 2 +
arch/x86/include/asm/xsave.h | 14 ++++++-
arch/x86/kernel/i387.c | 11 ++++--
arch/x86/kernel/process_32.c | 27 +++++---------
arch/x86/kernel/process_64.c | 26 ++++----------
arch/x86/kernel/traps.c | 11 +++---
arch/x86/kernel/xsave.c | 71 ++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 7 ++--
drivers/lguest/x86/core.c | 2 +-
10 files changed, 158 insertions(+), 57 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index d908383..6622ed2 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -224,12 +224,46 @@ static inline void fpu_fxsave(struct fpu *fpu)
/*
* These must be called with preempt disabled
*/
+static inline void fpu_restore(struct fpu *fpu)
+{
+ fxrstor_checking(&fpu->state->fxsave);
+}
+
+static inline void fpu_save(struct fpu *fpu)
+{
+ if (use_fxsr()) {
+ fpu_fxsave(fpu);
+ } else {
+ asm volatile("fsave %[fx]; fwait"
+ : [fx] "=m" (fpu->state->fsave));
+ }
+}
+
+static inline void fpu_clean(struct fpu *fpu)
+{
+ u32 swd = (use_fxsr() || use_xsave()) ?
+ fpu->state->fxsave.swd : fpu->state->fsave.swd;
+
+ if (unlikely(swd & X87_FSW_ES))
+ asm volatile("fnclex");
+
+ /* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception
+ is pending. Clear the x87 state here by setting it to fixed
+ values. safe_address is a random variable that should be in L1 */
+ alternative_input(
+ ASM_NOP8 ASM_NOP2,
+ "emms\n\t" /* clear stack tags */
+ "fildl %P[addr]", /* set F?P to defined value */
+ X86_FEATURE_FXSAVE_LEAK,
+ [addr] "m" (safe_address));
+}
+
static inline void fpu_save_init(struct fpu *fpu)
{
if (use_xsave()) {
struct xsave_struct *xstate = &fpu->state->xsave;

- fpu_xsave(xstate);
+ fpu_xsave(xstate, -1);

/*
* xsave header may indicate the init state of the FP.
@@ -295,18 +329,16 @@ static inline void __clear_fpu(struct task_struct *tsk)
"2:\n"
_ASM_EXTABLE(1b, 2b));
task_thread_info(tsk)->status &= ~TS_USEDFPU;
+ task_thread_info(tsk)->xstate_mask &= ~XCNTXT_LAZY;
stts();
}
}

static inline void kernel_fpu_begin(void)
{
- struct thread_info *me = current_thread_info();
preempt_disable();
- if (me->status & TS_USEDFPU)
- __save_init_fpu(me->task);
- else
- clts();
+ save_xstates(current_thread_info()->task);
+ clts();
}

static inline void kernel_fpu_end(void)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 1f2e61e..ec12d62 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -26,6 +26,7 @@ struct exec_domain;
struct thread_info {
struct task_struct *task; /* main task structure */
struct exec_domain *exec_domain; /* execution domain */
+ __u64 xstate_mask; /* xstates in use */
__u32 flags; /* low level flags */
__u32 status; /* thread synchronous flags */
__u32 cpu; /* current CPU */
@@ -47,6 +48,7 @@ struct thread_info {
{ \
.task = &tsk, \
.exec_domain = &default_exec_domain, \
+ .xstate_mask = 0, \
.flags = 0, \
.cpu = 0, \
.preempt_count = INIT_PREEMPT_COUNT, \
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 8bcbbce..6052a84 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -25,6 +25,8 @@
*/
#define XCNTXT_MASK (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)

+#define XCNTXT_LAZY XCNTXT_MASK
+
#ifdef CONFIG_X86_64
#define REX_PREFIX "0x48, "
#else
@@ -35,6 +37,11 @@ extern unsigned int xstate_size;
extern u64 pcntxt_mask;
extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];

+extern void xsave(struct fpu *, u64);
+extern void xrstor(struct fpu *, u64);
+extern void save_xstates(struct task_struct *);
+extern void restore_xstates(struct task_struct *, u64);
+
extern void xsave_init(void);
extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
extern int init_fpu(struct task_struct *child);
@@ -113,15 +120,18 @@ static inline void xsave_state(struct xsave_struct *fx, u64 mask)
: "memory");
}

-static inline void fpu_xsave(struct xsave_struct *fx)
+static inline void fpu_xsave(struct xsave_struct *fx, u64 mask)
{
+ u32 lmask = mask;
+ u32 hmask = mask >> 32;
+
/* This, however, we can work around by forcing the compiler to select
an addressing mode that doesn't require extended registers. */
alternative_input(
".byte " REX_PREFIX "0x0f,0xae,0x27",
".byte " REX_PREFIX "0x0f,0xae,0x37",
X86_FEATURE_XSAVEOPT,
- [fx] "D" (fx), "a" (-1), "d" (-1) :
+ [fx] "D" (fx), "a" (lmask), "d" (hmask) :
"memory");
}
#endif
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 12aff25..1088ac5 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -152,8 +152,11 @@ int init_fpu(struct task_struct *tsk)
int ret;

if (tsk_used_math(tsk)) {
- if (HAVE_HWFP && tsk == current)
- unlazy_fpu(tsk);
+ if (HAVE_HWFP && tsk == current) {
+ preempt_disable();
+ save_xstates(tsk);
+ preempt_enable();
+ }
return 0;
}

@@ -600,7 +603,9 @@ int save_i387_xstate_ia32(void __user *buf)
NULL, fp) ? -1 : 1;
}

- unlazy_fpu(tsk);
+ preempt_disable();
+ save_xstates(tsk);
+ preempt_enable();

if (cpu_has_xsave)
return save_i387_xsave(fp);
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8d12878..8df07c3 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -185,7 +185,9 @@ void release_thread(struct task_struct *dead_task)
*/
void prepare_to_copy(struct task_struct *tsk)
{
- unlazy_fpu(tsk);
+ preempt_disable();
+ save_xstates(tsk);
+ preempt_enable();
}

int copy_thread(unsigned long clone_flags, unsigned long sp,
@@ -294,21 +296,13 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
*next = &next_p->thread;
int cpu = smp_processor_id();
struct tss_struct *tss = &per_cpu(init_tss, cpu);
- bool preload_fpu;

/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */

- /*
- * If the task has used fpu the last 5 timeslices, just do a full
- * restore of the math state immediately to avoid the trap; the
- * chances of needing FPU soon are obviously high now
- */
- preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;
-
- __unlazy_fpu(prev_p);
+ save_xstates(prev_p);

/* we're going to use this soon, after a few expensive things */
- if (preload_fpu)
+ if (task_thread_info(next_p)->xstate_mask)
prefetch(next->fpu.state);

/*
@@ -349,11 +343,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT))
__switch_to_xtra(prev_p, next_p, tss);

- /* If we're going to preload the fpu context, make sure clts
- is run while we're batching the cpu state updates. */
- if (preload_fpu)
- clts();
-
/*
* Leave lazy mode, flushing any hypercalls made here.
* This must be done before restoring TLS segments so
@@ -363,8 +352,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
*/
arch_end_context_switch(next_p);

- if (preload_fpu)
- __math_state_restore();
+ /*
+ * Restore enabled extended states for the task.
+ */
+ restore_xstates(next_p, task_thread_info(next_p)->xstate_mask);

/*
* Restore %gs if needed (which is common)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6c9dd92..cbf1a67 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -249,7 +249,9 @@ static inline u32 read_32bit_tls(struct task_struct *t, int tls)
*/
void prepare_to_copy(struct task_struct *tsk)
{
- unlazy_fpu(tsk);
+ preempt_disable();
+ save_xstates(tsk);
+ preempt_enable();
}

int copy_thread(unsigned long clone_flags, unsigned long sp,
@@ -378,17 +380,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
int cpu = smp_processor_id();
struct tss_struct *tss = &per_cpu(init_tss, cpu);
unsigned fsindex, gsindex;
- bool preload_fpu;
-
- /*
- * If the task has used fpu the last 5 timeslices, just do a full
- * restore of the math state immediately to avoid the trap; the
- * chances of needing FPU soon are obviously high now
- */
- preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;

/* we're going to use this soon, after a few expensive things */
- if (preload_fpu)
+ if (task_thread_info(next_p)->xstate_mask)
prefetch(next->fpu.state);

/*
@@ -420,11 +414,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
load_TLS(next, cpu);

/* Must be after DS reload */
- __unlazy_fpu(prev_p);
-
- /* Make sure cpu is ready for new context */
- if (preload_fpu)
- clts();
+ save_xstates(prev_p);

/*
* Leave lazy mode, flushing any hypercalls made here.
@@ -485,11 +475,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
__switch_to_xtra(prev_p, next_p, tss);

/*
- * Preload the FPU context, now that we've determined that the
- * task is likely to be using it.
+ * Restore enabled extended states for the task.
*/
- if (preload_fpu)
- __math_state_restore();
+ restore_xstates(next_p, task_thread_info(next_p)->xstate_mask);

return prev_p;
}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 32f3043..072c30e 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -625,7 +625,10 @@ void math_error(struct pt_regs *regs, int error_code, int trapnr)
/*
* Save the info for the exception handler and clear the error.
*/
- save_init_fpu(task);
+ preempt_disable();
+ save_xstates(task);
+ preempt_enable();
+
task->thread.trap_no = trapnr;
task->thread.error_code = error_code;
info.si_signo = SIGFPE;
@@ -734,7 +737,7 @@ void __math_state_restore(void)
return;
}

- thread->status |= TS_USEDFPU; /* So we fnsave on switch_to() */
+ thread->status |= TS_USEDFPU; /* So we fnsave on switch_to() */
tsk->fpu_counter++;
}

@@ -768,9 +771,7 @@ asmlinkage void math_state_restore(void)
local_irq_disable();
}

- clts(); /* Allow maths ops (or we recurse) */
-
- __math_state_restore();
+ restore_xstates(tsk, XCNTXT_LAZY);
}
EXPORT_SYMBOL_GPL(math_state_restore);

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 6b063d7..d9fa41f 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -5,6 +5,7 @@
*/
#include <linux/bootmem.h>
#include <linux/compat.h>
+#include <linux/module.h>
#include <asm/i387.h>
#ifdef CONFIG_IA32_EMULATION
#include <asm/sigcontext32.h>
@@ -474,3 +475,73 @@ void __cpuinit xsave_init(void)
next_func = xstate_enable;
this_func();
}
+
+void xsave(struct fpu *fpu, u64 mask)
+{
+ clts();
+
+ if (use_xsave())
+ fpu_xsave(&fpu->state->xsave, mask);
+ else if (mask & XCNTXT_LAZY)
+ fpu_save(fpu);
+
+ if (mask & XCNTXT_LAZY)
+ fpu_clean(fpu);
+
+ stts();
+}
+EXPORT_SYMBOL(xsave);
+
+void save_xstates(struct task_struct *tsk)
+{
+ struct thread_info *ti = task_thread_info(tsk);
+
+ if (!fpu_allocated(&tsk->thread.fpu))
+ return;
+
+ xsave(&tsk->thread.fpu, ti->xstate_mask);
+
+ if (!(ti->xstate_mask & XCNTXT_LAZY))
+ tsk->fpu_counter = 0;
+
+ /*
+ * If the task hasn't used the fpu the last 5 timeslices,
+ * force a lazy restore of the math states by clearing them
+ * from xstate_mask.
+ */
+ if (tsk->fpu_counter < 5)
+ ti->xstate_mask &= ~XCNTXT_LAZY;
+
+ ti->status &= ~TS_USEDFPU;
+}
+EXPORT_SYMBOL(save_xstates);
+
+void xrstor(struct fpu *fpu, u64 mask)
+{
+ clts();
+
+ if (use_xsave())
+ xrstor_state(&fpu->state->xsave, mask);
+ else if (mask & XCNTXT_LAZY)
+ fpu_restore(fpu);
+
+ if (!(mask & XCNTXT_LAZY))
+ stts();
+}
+EXPORT_SYMBOL(xrstor);
+
+void restore_xstates(struct task_struct *tsk, u64 mask)
+{
+ struct thread_info *ti = task_thread_info(tsk);
+
+ if (!fpu_allocated(&tsk->thread.fpu))
+ return;
+
+ xrstor(&tsk->thread.fpu, mask);
+
+ ti->xstate_mask |= mask;
+ ti->status |= TS_USEDFPU;
+ if (mask & XCNTXT_LAZY)
+ tsk->fpu_counter++;
+}
+EXPORT_SYMBOL(restore_xstates);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58f517b..aae9e8f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -58,6 +58,7 @@
#include <asm/xcr.h>
#include <asm/pvclock.h>
#include <asm/div64.h>
+#include <asm/xsave.h>

#define MAX_IO_MSRS 256
#define CR0_RESERVED_BITS \
@@ -5803,8 +5804,8 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
*/
kvm_put_guest_xcr0(vcpu);
vcpu->guest_fpu_loaded = 1;
- unlazy_fpu(current);
- fpu_restore_checking(&vcpu->arch.guest_fpu);
+ save_xstates(current);
+ xrstor(&vcpu->arch.guest_fpu, -1);
trace_kvm_fpu(1);
}

@@ -5816,7 +5817,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
return;

vcpu->guest_fpu_loaded = 0;
- fpu_save_init(&vcpu->arch.guest_fpu);
+ xsave(&vcpu->arch.guest_fpu, -1);
++vcpu->stat.fpu_reload;
kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
trace_kvm_fpu(0);
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 9f1659c..ef62289 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -204,7 +204,7 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
* uses the FPU.
*/
if (cpu->ts)
- unlazy_fpu(current);
+ save_xstates(current);

/*
* SYSENTER is an optimized way of doing system calls. We can't allow

2011-04-06 22:05:04

by Hans Rosenfeld

[permalink] [raw]
Subject: [tip:x86/xsave] x86, xsave: cleanup fpu/xsave signal frame setup

Commit-ID: 0c11e6f1aed142f9c08782ad08cd920541afd4d2
Gitweb: http://git.kernel.org/tip/0c11e6f1aed142f9c08782ad08cd920541afd4d2
Author: Hans Rosenfeld <[email protected]>
AuthorDate: Tue, 5 Apr 2011 17:50:51 +0200
Committer: H. Peter Anvin <[email protected]>
CommitDate: Wed, 6 Apr 2011 14:15:18 -0700

x86, xsave: cleanup fpu/xsave signal frame setup

There are currently two code paths that handle the fpu/xsave context in
a signal frame for 32bit and 64bit tasks. These two code paths differ
only in that they have or lack certain micro-optimizations or do some
additional work (fsave compatibility for 32bit). The code is complex,
mostly duplicated, and hard to understand and maintain.

This patch creates a set of two new, unified and cleaned up functions to
replace them. Besides avoiding the duplicate code, it is now obvious
what is done in which situations. The micro-optimization w.r.t. xsave
(saving and restoring directly from the user buffer) is gone, and with
it the headaches of validating the buffer alignment and contents and of
catching possible xsave/xrstor faults.
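
A call-site sketch of the unified pair (as changed in signal.c and
ia32_signal.c by this patch; illustrative only): both the 32-bit and
the 64-bit paths now go through the same two helpers.

        /* signal delivery (get_sigframe) */
        if (used_math() && save_xstates_sigframe(*fpstate, sig_xstate_size) < 0)
                return (void __user *)-1L;

        /* sigreturn (restore_sigcontext) */
        err |= restore_xstates_sigframe(buf, sig_xstate_size);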

Signed-off-by: Hans Rosenfeld <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/ia32/ia32_signal.c | 4 +-
arch/x86/include/asm/i387.h | 20 ++++
arch/x86/include/asm/xsave.h | 4 +-
arch/x86/kernel/i387.c | 32 ++------
arch/x86/kernel/signal.c | 4 +-
arch/x86/kernel/xsave.c | 197 ++++++++++++++++++++++++++++++++++++++++--
6 files changed, 225 insertions(+), 36 deletions(-)

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 588a7aa..2605fae 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -255,7 +255,7 @@ static int ia32_restore_sigcontext(struct pt_regs *regs,

get_user_ex(tmp, &sc->fpstate);
buf = compat_ptr(tmp);
- err |= restore_i387_xstate_ia32(buf);
+ err |= restore_xstates_sigframe(buf, sig_xstate_ia32_size);

get_user_ex(*pax, &sc->ax);
} get_user_catch(err);
@@ -396,7 +396,7 @@ static void __user *get_sigframe(struct k_sigaction *ka, struct pt_regs *regs,
if (used_math()) {
sp = sp - sig_xstate_ia32_size;
*fpstate = (struct _fpstate_ia32 *) sp;
- if (save_i387_xstate_ia32(*fpstate) < 0)
+ if (save_xstates_sigframe(*fpstate, sig_xstate_ia32_size) < 0)
return (void __user *) -1L;
}

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 6622ed2..fc716bc 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -25,6 +25,20 @@
#include <asm/uaccess.h>
#include <asm/xsave.h>

+#ifdef CONFIG_X86_64
+# include <asm/sigcontext32.h>
+# include <asm/user32.h>
+#else
+# define save_i387_xstate_ia32 save_i387_xstate
+# define restore_i387_xstate_ia32 restore_i387_xstate
+# define _fpstate_ia32 _fpstate
+# define _xstate_ia32 _xstate
+# define sig_xstate_ia32_size sig_xstate_size
+# define fx_sw_reserved_ia32 fx_sw_reserved
+# define user_i387_ia32_struct user_i387_struct
+# define user32_fxsr_struct user_fxsr_struct
+#endif
+
extern unsigned int sig_xstate_size;
extern void fpu_init(void);
extern void mxcsr_feature_mask_init(void);
@@ -33,6 +47,9 @@ extern asmlinkage void math_state_restore(void);
extern void __math_state_restore(void);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);

+extern void convert_from_fxsr(struct user_i387_ia32_struct *, struct task_struct *);
+extern void convert_to_fxsr(struct task_struct *, const struct user_i387_ia32_struct *);
+
extern user_regset_active_fn fpregs_active, xfpregs_active;
extern user_regset_get_fn fpregs_get, xfpregs_get, fpregs_soft_get,
xstateregs_get;
@@ -46,6 +63,7 @@ extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set,
#define xstateregs_active fpregs_active

extern struct _fpx_sw_bytes fx_sw_reserved;
+extern unsigned int mxcsr_feature_mask;
#ifdef CONFIG_IA32_EMULATION
extern unsigned int sig_xstate_ia32_size;
extern struct _fpx_sw_bytes fx_sw_reserved_ia32;
@@ -56,8 +74,10 @@ extern int restore_i387_xstate_ia32(void __user *buf);
#endif

#ifdef CONFIG_MATH_EMULATION
+# define HAVE_HWFP (boot_cpu_data.hard_math)
extern void finit_soft_fpu(struct i387_soft_struct *soft);
#else
+# define HAVE_HWFP 1
static inline void finit_soft_fpu(struct i387_soft_struct *soft) {}
#endif

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 6052a84..200c56d 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -41,12 +41,14 @@ extern void xsave(struct fpu *, u64);
extern void xrstor(struct fpu *, u64);
extern void save_xstates(struct task_struct *);
extern void restore_xstates(struct task_struct *, u64);
+extern int save_xstates_sigframe(void __user *, unsigned int);
+extern int restore_xstates_sigframe(void __user *, unsigned int);

extern void xsave_init(void);
extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
extern int init_fpu(struct task_struct *child);
extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
- void __user *fpstate,
+ unsigned int size,
struct _fpx_sw_bytes *sw);

static inline int xrstor_checking(struct xsave_struct *fx, u64 mask)
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 1088ac5..69625a8 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -18,27 +18,7 @@
#include <asm/i387.h>
#include <asm/user.h>

-#ifdef CONFIG_X86_64
-# include <asm/sigcontext32.h>
-# include <asm/user32.h>
-#else
-# define save_i387_xstate_ia32 save_i387_xstate
-# define restore_i387_xstate_ia32 restore_i387_xstate
-# define _fpstate_ia32 _fpstate
-# define _xstate_ia32 _xstate
-# define sig_xstate_ia32_size sig_xstate_size
-# define fx_sw_reserved_ia32 fx_sw_reserved
-# define user_i387_ia32_struct user_i387_struct
-# define user32_fxsr_struct user_fxsr_struct
-#endif
-
-#ifdef CONFIG_MATH_EMULATION
-# define HAVE_HWFP (boot_cpu_data.hard_math)
-#else
-# define HAVE_HWFP 1
-#endif
-
-static unsigned int mxcsr_feature_mask __read_mostly = 0xffffffffu;
+unsigned int mxcsr_feature_mask __read_mostly = 0xffffffffu;
unsigned int xstate_size;
EXPORT_SYMBOL_GPL(xstate_size);
unsigned int sig_xstate_ia32_size = sizeof(struct _fpstate_ia32);
@@ -375,7 +355,7 @@ static inline u32 twd_fxsr_to_i387(struct i387_fxsave_struct *fxsave)
* FXSR floating point environment conversions.
*/

-static void
+void
convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
{
struct i387_fxsave_struct *fxsave = &tsk->thread.fpu.state->fxsave;
@@ -412,8 +392,8 @@ convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
memcpy(&to[i], &from[i], sizeof(to[0]));
}

-static void convert_to_fxsr(struct task_struct *tsk,
- const struct user_i387_ia32_struct *env)
+void convert_to_fxsr(struct task_struct *tsk,
+ const struct user_i387_ia32_struct *env)

{
struct i387_fxsave_struct *fxsave = &tsk->thread.fpu.state->fxsave;
@@ -653,7 +633,9 @@ static int restore_i387_xsave(void __user *buf)
u64 mask;
int err;

- if (check_for_xstate(fx, buf, &fx_sw_user))
+ if (check_for_xstate(fx, sig_xstate_ia32_size -
+ offsetof(struct _fpstate_ia32, _fxsr_env),
+ &fx_sw_user))
goto fx_only;

mask = fx_sw_user.xstate_bv;
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 4fd173c..f6705ff 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -117,7 +117,7 @@ restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
regs->orig_ax = -1; /* disable syscall checks */

get_user_ex(buf, &sc->fpstate);
- err |= restore_i387_xstate(buf);
+ err |= restore_xstates_sigframe(buf, sig_xstate_size);

get_user_ex(*pax, &sc->ax);
} get_user_catch(err);
@@ -252,7 +252,7 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
return (void __user *)-1L;

/* save i387 state */
- if (used_math() && save_i387_xstate(*fpstate) < 0)
+ if (used_math() && save_xstates_sigframe(*fpstate, sig_xstate_size) < 0)
return (void __user *)-1L;

return (void __user *)sp;
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index d9fa41f..08b2fe8 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -103,8 +103,7 @@ void __sanitize_i387_state(struct task_struct *tsk)
* Check for the presence of extended state information in the
* user fpstate pointer in the sigcontext.
*/
-int check_for_xstate(struct i387_fxsave_struct __user *buf,
- void __user *fpstate,
+int check_for_xstate(struct i387_fxsave_struct __user *buf, unsigned int size,
struct _fpx_sw_bytes *fx_sw_user)
{
int min_xstate_size = sizeof(struct i387_fxsave_struct) +
@@ -131,11 +130,11 @@ int check_for_xstate(struct i387_fxsave_struct __user *buf,
fx_sw_user->xstate_size > fx_sw_user->extended_size)
return -EINVAL;

- err = __get_user(magic2, (__u32 *) (((void *)fpstate) +
- fx_sw_user->extended_size -
+ err = __get_user(magic2, (__u32 *) (((void *)buf) + size -
FP_XSTATE_MAGIC2_SIZE));
if (err)
return err;
+
/*
* Check for the presence of second magic word at the end of memory
* layout. This detects the case where the user just copied the legacy
@@ -148,11 +147,109 @@ int check_for_xstate(struct i387_fxsave_struct __user *buf,
return 0;
}

-#ifdef CONFIG_X86_64
/*
* Signal frame handlers.
*/
+int save_xstates_sigframe(void __user *buf, unsigned int size)
+{
+ void __user *buf_fxsave = buf;
+ struct task_struct *tsk = current;
+ struct xsave_struct *xsave = &tsk->thread.fpu.state->xsave;
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ int ia32 = size == sig_xstate_ia32_size;
+#endif
+ int err;
+
+ if (!access_ok(VERIFY_WRITE, buf, size))
+ return -EACCES;
+
+ BUG_ON(size < xstate_size);
+
+ if (!used_math())
+ return 0;
+
+ clear_used_math(); /* trigger finit */
+
+ if (!HAVE_HWFP)
+ return fpregs_soft_get(current, NULL, 0,
+ sizeof(struct user_i387_ia32_struct), NULL,
+ (struct _fpstate_ia32 __user *) buf) ? -1 : 1;
+
+ save_xstates(tsk);
+ sanitize_i387_state(tsk);
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ if (ia32) {
+ if (use_xsave() || use_fxsr()) {
+ struct user_i387_ia32_struct env;
+ struct _fpstate_ia32 __user *fp = buf;
+
+ convert_from_fxsr(&env, tsk);
+ if (__copy_to_user(buf, &env, sizeof(env)))
+ return -1;
+
+ err = __put_user(xsave->i387.swd, &fp->status);
+ err |= __put_user(X86_FXSR_MAGIC, &fp->magic);
+
+ if (err)
+ return -1;
+
+ buf_fxsave = fp->_fxsr_env;
+ size -= offsetof(struct _fpstate_ia32, _fxsr_env);
+#if defined(CONFIG_X86_64)
+ buf = buf_fxsave;
+#endif
+ } else {
+ struct i387_fsave_struct *fsave =
+ &tsk->thread.fpu.state->fsave;
+
+ fsave->status = fsave->swd;
+ }
+ }
+#endif

+ if (__copy_to_user(buf_fxsave, xsave, size))
+ return -1;
+
+ if (use_xsave()) {
+ struct _fpstate __user *fp = buf;
+ struct _xstate __user *x = buf;
+ u64 xstate_bv = xsave->xsave_hdr.xstate_bv;
+
+ err = __copy_to_user(&fp->sw_reserved,
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ ia32 ? &fx_sw_reserved_ia32 :
+#endif
+ &fx_sw_reserved,
+ sizeof (struct _fpx_sw_bytes));
+
+ err |= __put_user(FP_XSTATE_MAGIC2,
+ (__u32 __user *) (buf_fxsave + size
+ - FP_XSTATE_MAGIC2_SIZE));
+
+ /*
+ * For legacy compatible, we always set FP/SSE bits in the bit
+ * vector while saving the state to the user context. This will
+ * enable us capturing any changes(during sigreturn) to
+ * the FP/SSE bits by the legacy applications which don't touch
+ * xstate_bv in the xsave header.
+ *
+ * xsave aware apps can change the xstate_bv in the xsave
+ * header as well as change any contents in the memory layout.
+ * xrestore as part of sigreturn will capture all the changes.
+ */
+ xstate_bv |= XSTATE_FPSSE;
+
+ err |= __put_user(xstate_bv, &x->xstate_hdr.xstate_bv);
+
+ if (err)
+ return err;
+ }
+
+ return 1;
+}
+
+#ifdef CONFIG_X86_64
int save_i387_xstate(void __user *buf)
{
struct task_struct *tsk = current;
@@ -240,7 +337,7 @@ static int restore_user_xstate(void __user *buf)
int err;

if (((unsigned long)buf % 64) ||
- check_for_xstate(buf, buf, &fx_sw_user))
+ check_for_xstate(buf, sig_xstate_size, &fx_sw_user))
goto fx_only;

mask = fx_sw_user.xstate_bv;
@@ -315,6 +412,94 @@ clear:
}
#endif

+int restore_xstates_sigframe(void __user *buf, unsigned int size)
+{
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ struct user_i387_ia32_struct env;
+ int ia32 = size == sig_xstate_ia32_size;
+#endif
+ struct _fpx_sw_bytes fx_sw_user;
+ struct task_struct *tsk = current;
+ struct _fpstate_ia32 __user *fp = buf;
+ struct xsave_struct *xsave;
+ u64 xstate_mask = 0;
+ int err;
+
+ if (!buf) {
+ if (used_math()) {
+ clear_fpu(tsk);
+ clear_used_math();
+ }
+ return 0;
+ }
+
+ if (!access_ok(VERIFY_READ, buf, size))
+ return -EACCES;
+
+ if (!used_math()) {
+ err = init_fpu(tsk);
+ if (err)
+ return err;
+ }
+
+ if (!HAVE_HWFP) {
+ set_used_math();
+ return fpregs_soft_set(current, NULL,
+ 0, sizeof(struct user_i387_ia32_struct),
+ NULL, fp) != 0;
+ }
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ if (ia32 && (use_xsave() || use_fxsr())) {
+ if (__copy_from_user(&env, buf, sizeof(env)))
+ return -1;
+ buf = fp->_fxsr_env;
+ size -= offsetof(struct _fpstate_ia32, _fxsr_env);
+ }
+#endif
+
+ xsave = &tsk->thread.fpu.state->xsave;
+ task_thread_info(tsk)->xstate_mask = 0;
+ if (__copy_from_user(xsave, buf, xstate_size))
+ return -1;
+
+ if (use_xsave()) {
+ u64 *xstate_bv = &xsave->xsave_hdr.xstate_bv;
+
+ /*
+ * If this is no valid xstate, disable all extended states.
+ *
+ * For valid xstates, clear any illegal bits and any bits
+ * that have been cleared in fx_sw_user.xstate_bv.
+ */
+ if (check_for_xstate(buf, size, &fx_sw_user))
+ *xstate_bv = XSTATE_FPSSE;
+ else
+ *xstate_bv &= pcntxt_mask & fx_sw_user.xstate_bv;
+
+ xstate_mask |= *xstate_bv;
+
+ xsave->xsave_hdr.reserved1[0] =
+ xsave->xsave_hdr.reserved1[1] = 0;
+ } else {
+ xstate_mask |= XCNTXT_LAZY;
+ }
+
+ if (use_xsave() || use_fxsr()) {
+ xsave->i387.mxcsr &= mxcsr_feature_mask;
+
+#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+ if (ia32)
+ convert_to_fxsr(tsk, &env);
+#endif
+ }
+
+ set_used_math();
+ restore_xstates(tsk, xstate_mask);
+
+ return 0;
+}
+
/*
* Prepare the SW reserved portion of the fxsave memory layout, indicating
* the presence of the extended state information in the memory layout

2011-04-06 22:05:31

by Hans Rosenfeld

[permalink] [raw]
Subject: [tip:x86/xsave] x86, xsave: remove unused code

Commit-ID: 2efd67935eb7aa6ee4c124e7315d964245a1bfe8
Gitweb: http://git.kernel.org/tip/2efd67935eb7aa6ee4c124e7315d964245a1bfe8
Author: Hans Rosenfeld <[email protected]>
AuthorDate: Tue, 5 Apr 2011 17:50:52 +0200
Committer: H. Peter Anvin <[email protected]>
CommitDate: Wed, 6 Apr 2011 14:15:18 -0700

x86, xsave: remove unused code

The patches to rework the fpu/xsave handling and signal frame setup have
made a lot of code unused. This patch removes all of this now-unused code.

Signed-off-by: Hans Rosenfeld <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/include/asm/i387.h | 155 ++----------------------------
arch/x86/include/asm/xsave.h | 51 ----------
arch/x86/kernel/i387.c | 221 ------------------------------------------
arch/x86/kernel/traps.c | 22 ----
arch/x86/kernel/xsave.c | 163 -------------------------------
5 files changed, 7 insertions(+), 605 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index fc716bc..75c0800 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -29,8 +29,6 @@
# include <asm/sigcontext32.h>
# include <asm/user32.h>
#else
-# define save_i387_xstate_ia32 save_i387_xstate
-# define restore_i387_xstate_ia32 restore_i387_xstate
# define _fpstate_ia32 _fpstate
# define _xstate_ia32 _xstate
# define sig_xstate_ia32_size sig_xstate_size
@@ -108,75 +106,16 @@ static inline void sanitize_i387_state(struct task_struct *tsk)
}

#ifdef CONFIG_X86_64
-static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
+static inline void fxrstor(struct i387_fxsave_struct *fx)
{
- int err;
-
- /* See comment in fxsave() below. */
-#ifdef CONFIG_AS_FXSAVEQ
- asm volatile("1: fxrstorq %[fx]\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err)
- : [fx] "m" (*fx), "0" (0));
-#else
- asm volatile("1: rex64/fxrstor (%[fx])\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err)
- : [fx] "R" (fx), "m" (*fx), "0" (0));
-#endif
- return err;
-}
-
-static inline int fxsave_user(struct i387_fxsave_struct __user *fx)
-{
- int err;
-
- /*
- * Clear the bytes not touched by the fxsave and reserved
- * for the SW usage.
- */
- err = __clear_user(&fx->sw_reserved,
- sizeof(struct _fpx_sw_bytes));
- if (unlikely(err))
- return -EFAULT;
-
/* See comment in fxsave() below. */
#ifdef CONFIG_AS_FXSAVEQ
- asm volatile("1: fxsaveq %[fx]\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err), [fx] "=m" (*fx)
- : "0" (0));
+ asm volatile("fxrstorq %[fx]\n\t"
+ : : [fx] "m" (*fx));
#else
- asm volatile("1: rex64/fxsave (%[fx])\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err), "=m" (*fx)
- : [fx] "R" (fx), "0" (0));
+ asm volatile("rex64/fxrstor (%[fx])\n\t"
+ : : [fx] "R" (fx), "m" (*fx));
#endif
- if (unlikely(err) &&
- __clear_user(fx, sizeof(struct i387_fxsave_struct)))
- err = -EFAULT;
- /* No need to clear here because the caller clears USED_MATH */
- return err;
}

static inline void fpu_fxsave(struct fpu *fpu)
@@ -209,7 +148,7 @@ static inline void fpu_fxsave(struct fpu *fpu)
#else /* CONFIG_X86_32 */

/* perform fxrstor iff the processor has extended states, otherwise frstor */
-static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
+static inline void fxrstor(struct i387_fxsave_struct *fx)
{
/*
* The "nop" is needed to make the instructions the same
@@ -220,8 +159,6 @@ static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
"fxrstor %1",
X86_FEATURE_FXSR,
"m" (*fx));
-
- return 0;
}

static inline void fpu_fxsave(struct fpu *fpu)
@@ -246,7 +183,7 @@ static inline void fpu_fxsave(struct fpu *fpu)
*/
static inline void fpu_restore(struct fpu *fpu)
{
- fxrstor_checking(&fpu->state->fxsave);
+ fxrstor(&fpu->state->fxsave);
}

static inline void fpu_save(struct fpu *fpu)
@@ -278,69 +215,6 @@ static inline void fpu_clean(struct fpu *fpu)
[addr] "m" (safe_address));
}

-static inline void fpu_save_init(struct fpu *fpu)
-{
- if (use_xsave()) {
- struct xsave_struct *xstate = &fpu->state->xsave;
-
- fpu_xsave(xstate, -1);
-
- /*
- * xsave header may indicate the init state of the FP.
- */
- if (!(xstate->xsave_hdr.xstate_bv & XSTATE_FP))
- return;
- } else if (use_fxsr()) {
- fpu_fxsave(fpu);
- } else {
- asm volatile("fsave %[fx]; fwait"
- : [fx] "=m" (fpu->state->fsave));
- return;
- }
-
- if (unlikely(fpu->state->fxsave.swd & X87_FSW_ES))
- asm volatile("fnclex");
-
- /* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception
- is pending. Clear the x87 state here by setting it to fixed
- values. safe_address is a random variable that should be in L1 */
- alternative_input(
- ASM_NOP8 ASM_NOP2,
- "emms\n\t" /* clear stack tags */
- "fildl %P[addr]", /* set F?P to defined value */
- X86_FEATURE_FXSAVE_LEAK,
- [addr] "m" (safe_address));
-}
-
-static inline void __save_init_fpu(struct task_struct *tsk)
-{
- fpu_save_init(&tsk->thread.fpu);
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
-}
-
-static inline int fpu_restore_checking(struct fpu *fpu)
-{
- if (use_xsave())
- return xrstor_checking(&fpu->state->xsave, -1);
- else
- return fxrstor_checking(&fpu->state->fxsave);
-}
-
-/*
- * Signal frame handlers...
- */
-extern int save_i387_xstate(void __user *buf);
-extern int restore_i387_xstate(void __user *buf);
-
-static inline void __unlazy_fpu(struct task_struct *tsk)
-{
- if (task_thread_info(tsk)->status & TS_USEDFPU) {
- __save_init_fpu(tsk);
- stts();
- } else
- tsk->fpu_counter = 0;
-}
-
static inline void __clear_fpu(struct task_struct *tsk)
{
if (task_thread_info(tsk)->status & TS_USEDFPU) {
@@ -409,21 +283,6 @@ static inline void irq_ts_restore(int TS_state)
/*
* These disable preemption on their own and are safe
*/
-static inline void save_init_fpu(struct task_struct *tsk)
-{
- preempt_disable();
- __save_init_fpu(tsk);
- stts();
- preempt_enable();
-}
-
-static inline void unlazy_fpu(struct task_struct *tsk)
-{
- preempt_disable();
- __unlazy_fpu(tsk);
- preempt_enable();
-}
-
static inline void clear_fpu(struct task_struct *tsk)
{
preempt_disable();
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 200c56d..742da4a 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -51,26 +51,6 @@ extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
unsigned int size,
struct _fpx_sw_bytes *sw);

-static inline int xrstor_checking(struct xsave_struct *fx, u64 mask)
-{
- int err;
- u32 lmask = mask;
- u32 hmask = mask >> 32;
-
- asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n\t"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b, 3b)
- : [err] "=r" (err)
- : "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask), "0" (0)
- : "memory");
-
- return err;
-}
-
static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
{
u32 lmask = mask;
@@ -81,37 +61,6 @@ static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
: "memory");
}

-static inline int xsave_checking(struct xsave_struct __user *buf)
-{
- int err;
-
- /*
- * Clear the xsave header first, so that reserved fields are
- * initialized to zero.
- */
- err = __clear_user(&buf->xsave_hdr,
- sizeof(struct xsave_hdr_struct));
- if (unlikely(err))
- return -EFAULT;
-
- asm volatile("1: .byte " REX_PREFIX "0x0f,0xae,0x27\n"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl $-1,%[err]\n"
- " jmp 2b\n"
- ".previous\n"
- _ASM_EXTABLE(1b,3b)
- : [err] "=r" (err)
- : "D" (buf), "a" (-1), "d" (-1), "0" (0)
- : "memory");
-
- if (unlikely(err) && __clear_user(buf, xstate_size))
- err = -EFAULT;
-
- /* No need to clear here because the caller clears USED_MATH */
- return err;
-}
-
static inline void xsave_state(struct xsave_struct *fx, u64 mask)
{
u32 lmask = mask;
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 69625a8..ca33c0b 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -490,227 +490,6 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
}

/*
- * Signal frame handlers.
- */
-
-static inline int save_i387_fsave(struct _fpstate_ia32 __user *buf)
-{
- struct task_struct *tsk = current;
- struct i387_fsave_struct *fp = &tsk->thread.fpu.state->fsave;
-
- fp->status = fp->swd;
- if (__copy_to_user(buf, fp, sizeof(struct i387_fsave_struct)))
- return -1;
- return 1;
-}
-
-static int save_i387_fxsave(struct _fpstate_ia32 __user *buf)
-{
- struct task_struct *tsk = current;
- struct i387_fxsave_struct *fx = &tsk->thread.fpu.state->fxsave;
- struct user_i387_ia32_struct env;
- int err = 0;
-
- convert_from_fxsr(&env, tsk);
- if (__copy_to_user(buf, &env, sizeof(env)))
- return -1;
-
- err |= __put_user(fx->swd, &buf->status);
- err |= __put_user(X86_FXSR_MAGIC, &buf->magic);
- if (err)
- return -1;
-
- if (__copy_to_user(&buf->_fxsr_env[0], fx, xstate_size))
- return -1;
- return 1;
-}
-
-static int save_i387_xsave(void __user *buf)
-{
- struct task_struct *tsk = current;
- struct _fpstate_ia32 __user *fx = buf;
- int err = 0;
-
-
- sanitize_i387_state(tsk);
-
- /*
- * For legacy compatible, we always set FP/SSE bits in the bit
- * vector while saving the state to the user context.
- * This will enable us capturing any changes(during sigreturn) to
- * the FP/SSE bits by the legacy applications which don't touch
- * xstate_bv in the xsave header.
- *
- * xsave aware applications can change the xstate_bv in the xsave
- * header as well as change any contents in the memory layout.
- * xrestore as part of sigreturn will capture all the changes.
- */
- tsk->thread.fpu.state->xsave.xsave_hdr.xstate_bv |= XSTATE_FPSSE;
-
- if (save_i387_fxsave(fx) < 0)
- return -1;
-
- err = __copy_to_user(&fx->sw_reserved, &fx_sw_reserved_ia32,
- sizeof(struct _fpx_sw_bytes));
- err |= __put_user(FP_XSTATE_MAGIC2,
- (__u32 __user *) (buf + sig_xstate_ia32_size
- - FP_XSTATE_MAGIC2_SIZE));
- if (err)
- return -1;
-
- return 1;
-}
-
-int save_i387_xstate_ia32(void __user *buf)
-{
- struct _fpstate_ia32 __user *fp = (struct _fpstate_ia32 __user *) buf;
- struct task_struct *tsk = current;
-
- if (!used_math())
- return 0;
-
- if (!access_ok(VERIFY_WRITE, buf, sig_xstate_ia32_size))
- return -EACCES;
- /*
- * This will cause a "finit" to be triggered by the next
- * attempted FPU operation by the 'current' process.
- */
- clear_used_math();
-
- if (!HAVE_HWFP) {
- return fpregs_soft_get(current, NULL,
- 0, sizeof(struct user_i387_ia32_struct),
- NULL, fp) ? -1 : 1;
- }
-
- preempt_disable();
- save_xstates(tsk);
- preempt_enable();
-
- if (cpu_has_xsave)
- return save_i387_xsave(fp);
- if (cpu_has_fxsr)
- return save_i387_fxsave(fp);
- else
- return save_i387_fsave(fp);
-}
-
-static inline int restore_i387_fsave(struct _fpstate_ia32 __user *buf)
-{
- struct task_struct *tsk = current;
-
- return __copy_from_user(&tsk->thread.fpu.state->fsave, buf,
- sizeof(struct i387_fsave_struct));
-}
-
-static int restore_i387_fxsave(struct _fpstate_ia32 __user *buf,
- unsigned int size)
-{
- struct task_struct *tsk = current;
- struct user_i387_ia32_struct env;
- int err;
-
- err = __copy_from_user(&tsk->thread.fpu.state->fxsave, &buf->_fxsr_env[0],
- size);
- /* mxcsr reserved bits must be masked to zero for security reasons */
- tsk->thread.fpu.state->fxsave.mxcsr &= mxcsr_feature_mask;
- if (err || __copy_from_user(&env, buf, sizeof(env)))
- return 1;
- convert_to_fxsr(tsk, &env);
-
- return 0;
-}
-
-static int restore_i387_xsave(void __user *buf)
-{
- struct _fpx_sw_bytes fx_sw_user;
- struct _fpstate_ia32 __user *fx_user =
- ((struct _fpstate_ia32 __user *) buf);
- struct i387_fxsave_struct __user *fx =
- (struct i387_fxsave_struct __user *) &fx_user->_fxsr_env[0];
- struct xsave_hdr_struct *xsave_hdr =
- &current->thread.fpu.state->xsave.xsave_hdr;
- u64 mask;
- int err;
-
- if (check_for_xstate(fx, sig_xstate_ia32_size -
- offsetof(struct _fpstate_ia32, _fxsr_env),
- &fx_sw_user))
- goto fx_only;
-
- mask = fx_sw_user.xstate_bv;
-
- err = restore_i387_fxsave(buf, fx_sw_user.xstate_size);
-
- xsave_hdr->xstate_bv &= pcntxt_mask;
- /*
- * These bits must be zero.
- */
- xsave_hdr->reserved1[0] = xsave_hdr->reserved1[1] = 0;
-
- /*
- * Init the state that is not present in the memory layout
- * and enabled by the OS.
- */
- mask = ~(pcntxt_mask & ~mask);
- xsave_hdr->xstate_bv &= mask;
-
- return err;
-fx_only:
- /*
- * Couldn't find the extended state information in the memory
- * layout. Restore the FP/SSE and init the other extended state
- * enabled by the OS.
- */
- xsave_hdr->xstate_bv = XSTATE_FPSSE;
- return restore_i387_fxsave(buf, sizeof(struct i387_fxsave_struct));
-}
-
-int restore_i387_xstate_ia32(void __user *buf)
-{
- int err;
- struct task_struct *tsk = current;
- struct _fpstate_ia32 __user *fp = (struct _fpstate_ia32 __user *) buf;
-
- if (HAVE_HWFP)
- clear_fpu(tsk);
-
- if (!buf) {
- if (used_math()) {
- clear_fpu(tsk);
- clear_used_math();
- }
-
- return 0;
- } else
- if (!access_ok(VERIFY_READ, buf, sig_xstate_ia32_size))
- return -EACCES;
-
- if (!used_math()) {
- err = init_fpu(tsk);
- if (err)
- return err;
- }
-
- if (HAVE_HWFP) {
- if (cpu_has_xsave)
- err = restore_i387_xsave(buf);
- else if (cpu_has_fxsr)
- err = restore_i387_fxsave(fp, sizeof(struct
- i387_fxsave_struct));
- else
- err = restore_i387_fsave(fp);
- } else {
- err = fpregs_soft_set(current, NULL,
- 0, sizeof(struct user_i387_ia32_struct),
- NULL, fp) != 0;
- }
- set_used_math();
-
- return err;
-}
-
-/*
* FPU state for core dumps.
* This is only used for a.out dumps now.
* It is declared generically using elf_fpregset_t (which is
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 072c30e..872fc78 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -720,28 +720,6 @@ asmlinkage void __attribute__((weak)) smp_threshold_interrupt(void)
}

/*
- * __math_state_restore assumes that cr0.TS is already clear and the
- * fpu state is all ready for use. Used during context switch.
- */
-void __math_state_restore(void)
-{
- struct thread_info *thread = current_thread_info();
- struct task_struct *tsk = thread->task;
-
- /*
- * Paranoid restore. send a SIGSEGV if we fail to restore the state.
- */
- if (unlikely(fpu_restore_checking(&tsk->thread.fpu))) {
- stts();
- force_sig(SIGSEGV, tsk);
- return;
- }
-
- thread->status |= TS_USEDFPU; /* So we fnsave on switch_to() */
- tsk->fpu_counter++;
-}
-
-/*
* 'math_state_restore()' saves the current math information in the
* old math state array, and gets the new ones from the current task
*
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 08b2fe8..9ecc791 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -249,169 +249,6 @@ int save_xstates_sigframe(void __user *buf, unsigned int size)
return 1;
}

-#ifdef CONFIG_X86_64
-int save_i387_xstate(void __user *buf)
-{
- struct task_struct *tsk = current;
- int err = 0;
-
- if (!access_ok(VERIFY_WRITE, buf, sig_xstate_size))
- return -EACCES;
-
- BUG_ON(sig_xstate_size < xstate_size);
-
- if ((unsigned long)buf % 64)
- printk("save_i387_xstate: bad fpstate %p\n", buf);
-
- if (!used_math())
- return 0;
-
- if (task_thread_info(tsk)->status & TS_USEDFPU) {
- if (use_xsave())
- err = xsave_checking(buf);
- else
- err = fxsave_user(buf);
-
- if (err)
- return err;
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
- stts();
- } else {
- sanitize_i387_state(tsk);
- if (__copy_to_user(buf, &tsk->thread.fpu.state->fxsave,
- xstate_size))
- return -1;
- }
-
- clear_used_math(); /* trigger finit */
-
- if (use_xsave()) {
- struct _fpstate __user *fx = buf;
- struct _xstate __user *x = buf;
- u64 xstate_bv;
-
- err = __copy_to_user(&fx->sw_reserved, &fx_sw_reserved,
- sizeof(struct _fpx_sw_bytes));
-
- err |= __put_user(FP_XSTATE_MAGIC2,
- (__u32 __user *) (buf + sig_xstate_size
- - FP_XSTATE_MAGIC2_SIZE));
-
- /*
- * Read the xstate_bv which we copied (directly from the cpu or
- * from the state in task struct) to the user buffers and
- * set the FP/SSE bits.
- */
- err |= __get_user(xstate_bv, &x->xstate_hdr.xstate_bv);
-
- /*
- * For legacy compatible, we always set FP/SSE bits in the bit
- * vector while saving the state to the user context. This will
- * enable us capturing any changes(during sigreturn) to
- * the FP/SSE bits by the legacy applications which don't touch
- * xstate_bv in the xsave header.
- *
- * xsave aware apps can change the xstate_bv in the xsave
- * header as well as change any contents in the memory layout.
- * xrestore as part of sigreturn will capture all the changes.
- */
- xstate_bv |= XSTATE_FPSSE;
-
- err |= __put_user(xstate_bv, &x->xstate_hdr.xstate_bv);
-
- if (err)
- return err;
- }
-
- return 1;
-}
-
-/*
- * Restore the extended state if present. Otherwise, restore the FP/SSE
- * state.
- */
-static int restore_user_xstate(void __user *buf)
-{
- struct _fpx_sw_bytes fx_sw_user;
- u64 mask;
- int err;
-
- if (((unsigned long)buf % 64) ||
- check_for_xstate(buf, sig_xstate_size, &fx_sw_user))
- goto fx_only;
-
- mask = fx_sw_user.xstate_bv;
-
- /*
- * restore the state passed by the user.
- */
- err = xrstor_checking((__force struct xsave_struct *)buf, mask);
- if (err)
- return err;
-
- /*
- * init the state skipped by the user.
- */
- mask = pcntxt_mask & ~mask;
- if (unlikely(mask))
- xrstor_state(init_xstate_buf, mask);
-
- return 0;
-
-fx_only:
- /*
- * couldn't find the extended state information in the
- * memory layout. Restore just the FP/SSE and init all
- * the other extended state.
- */
- xrstor_state(init_xstate_buf, pcntxt_mask & ~XSTATE_FPSSE);
- return fxrstor_checking((__force struct i387_fxsave_struct *)buf);
-}
-
-/*
- * This restores directly out of user space. Exceptions are handled.
- */
-int restore_i387_xstate(void __user *buf)
-{
- struct task_struct *tsk = current;
- int err = 0;
-
- if (!buf) {
- if (used_math())
- goto clear;
- return 0;
- } else
- if (!access_ok(VERIFY_READ, buf, sig_xstate_size))
- return -EACCES;
-
- if (!used_math()) {
- err = init_fpu(tsk);
- if (err)
- return err;
- }
-
- if (!(task_thread_info(current)->status & TS_USEDFPU)) {
- clts();
- task_thread_info(current)->status |= TS_USEDFPU;
- }
- if (use_xsave())
- err = restore_user_xstate(buf);
- else
- err = fxrstor_checking((__force struct i387_fxsave_struct *)
- buf);
- if (unlikely(err)) {
- /*
- * Encountered an error while doing the restore from the
- * user buffer, clear the fpu state.
- */
-clear:
- clear_fpu(tsk);
- clear_used_math();
- }
- return err;
-}
-#endif
-
int restore_xstates_sigframe(void __user *buf, unsigned int size)
{
#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)

2011-04-06 22:05:49

by Hans Rosenfeld

[permalink] [raw]
Subject: [tip:x86/xsave] x86, xsave: more cleanups

Commit-ID: 324cbb83e215fd8abc73021ef9fd98b151225e01
Gitweb: http://git.kernel.org/tip/324cbb83e215fd8abc73021ef9fd98b151225e01
Author: Hans Rosenfeld <[email protected]>
AuthorDate: Tue, 5 Apr 2011 17:50:53 +0200
Committer: H. Peter Anvin <[email protected]>
CommitDate: Wed, 6 Apr 2011 14:15:19 -0700

x86, xsave: more cleanups

Removed some unused declarations from headers.

Retired TS_USEDFPU; it has been replaced by the XCNTXT_* bits in
xstate_mask.

There is no reason why functions like fpu_fxsave() etc. need to know
about or handle anything other than the buffer they save/restore their
state to/from.

sanitize_i387_state() is extra work that is only needed when xsaveopt is
used. There is no point in hiding this in an inline function, adding
extra code just to save a single if() in the five places it is used.
Doing so also obscures a fact that might well be interesting to whoever
is reading the code, while gaining nothing.

Signed-off-by: Hans Rosenfeld <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/include/asm/i387.h | 67 ++++++++++++-----------------------
arch/x86/include/asm/thread_info.h | 2 -
arch/x86/include/asm/xsave.h | 14 +++----
arch/x86/kernel/i387.c | 12 ++++--
arch/x86/kernel/xsave.c | 32 ++++++++---------
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 4 +-
7 files changed, 55 insertions(+), 78 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 75c0800..0381578 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -42,7 +42,6 @@ extern void fpu_init(void);
extern void mxcsr_feature_mask_init(void);
extern int init_fpu(struct task_struct *child);
extern asmlinkage void math_state_restore(void);
-extern void __math_state_restore(void);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);

extern void convert_from_fxsr(struct user_i387_ia32_struct *, struct task_struct *);
@@ -60,15 +59,10 @@ extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set,
*/
#define xstateregs_active fpregs_active

-extern struct _fpx_sw_bytes fx_sw_reserved;
extern unsigned int mxcsr_feature_mask;
+
#ifdef CONFIG_IA32_EMULATION
extern unsigned int sig_xstate_ia32_size;
-extern struct _fpx_sw_bytes fx_sw_reserved_ia32;
-struct _fpstate_ia32;
-struct _xstate_ia32;
-extern int save_i387_xstate_ia32(void __user *buf);
-extern int restore_i387_xstate_ia32(void __user *buf);
#endif

#ifdef CONFIG_MATH_EMULATION
@@ -76,7 +70,7 @@ extern int restore_i387_xstate_ia32(void __user *buf);
extern void finit_soft_fpu(struct i387_soft_struct *soft);
#else
# define HAVE_HWFP 1
-static inline void finit_soft_fpu(struct i387_soft_struct *soft) {}
+# define finit_soft_fpu(x)
#endif

#define X87_FSW_ES (1 << 7) /* Exception Summary */
@@ -96,15 +90,6 @@ static __always_inline __pure bool use_fxsr(void)
return static_cpu_has(X86_FEATURE_FXSR);
}

-extern void __sanitize_i387_state(struct task_struct *);
-
-static inline void sanitize_i387_state(struct task_struct *tsk)
-{
- if (!use_xsaveopt())
- return;
- __sanitize_i387_state(tsk);
-}
-
#ifdef CONFIG_X86_64
static inline void fxrstor(struct i387_fxsave_struct *fx)
{
@@ -118,7 +103,7 @@ static inline void fxrstor(struct i387_fxsave_struct *fx)
#endif
}

-static inline void fpu_fxsave(struct fpu *fpu)
+static inline void fpu_fxsave(struct i387_fxsave_struct *fx)
{
/* Using "rex64; fxsave %0" is broken because, if the memory operand
uses any extended registers for addressing, a second REX prefix
@@ -129,7 +114,7 @@ static inline void fpu_fxsave(struct fpu *fpu)
/* Using "fxsaveq %0" would be the ideal choice, but is only supported
starting with gas 2.16. */
__asm__ __volatile__("fxsaveq %0"
- : "=m" (fpu->state->fxsave));
+ : "=m" (*fx));
#else
/* Using, as a workaround, the properly prefixed form below isn't
accepted by any binutils version so far released, complaining that
@@ -140,8 +125,8 @@ static inline void fpu_fxsave(struct fpu *fpu)
This, however, we can work around by forcing the compiler to select
an addressing mode that doesn't require extended registers. */
asm volatile("rex64/fxsave (%[fx])"
- : "=m" (fpu->state->fxsave)
- : [fx] "R" (&fpu->state->fxsave));
+ : "=m" (*fx)
+ : [fx] "R" (fx));
#endif
}

@@ -161,10 +146,10 @@ static inline void fxrstor(struct i387_fxsave_struct *fx)
"m" (*fx));
}

-static inline void fpu_fxsave(struct fpu *fpu)
+static inline void fpu_fxsave(struct i387_fxsave_struct *fx)
{
asm volatile("fxsave %[fx]"
- : [fx] "=m" (fpu->state->fxsave));
+ : [fx] "=m" (*fx));
}

#endif /* CONFIG_X86_64 */
@@ -181,25 +166,25 @@ static inline void fpu_fxsave(struct fpu *fpu)
/*
* These must be called with preempt disabled
*/
-static inline void fpu_restore(struct fpu *fpu)
+static inline void fpu_restore(struct i387_fxsave_struct *fx)
{
- fxrstor(&fpu->state->fxsave);
+ fxrstor(fx);
}

-static inline void fpu_save(struct fpu *fpu)
+static inline void fpu_save(struct i387_fxsave_struct *fx)
{
if (use_fxsr()) {
- fpu_fxsave(fpu);
+ fpu_fxsave(fx);
} else {
asm volatile("fsave %[fx]; fwait"
- : [fx] "=m" (fpu->state->fsave));
+ : [fx] "=m" (*fx));
}
}

-static inline void fpu_clean(struct fpu *fpu)
+static inline void fpu_clean(struct i387_fxsave_struct *fx)
{
u32 swd = (use_fxsr() || use_xsave()) ?
- fpu->state->fxsave.swd : fpu->state->fsave.swd;
+ fx->swd : ((struct i387_fsave_struct *)fx)->swd;

if (unlikely(swd & X87_FSW_ES))
asm volatile("fnclex");
@@ -215,19 +200,6 @@ static inline void fpu_clean(struct fpu *fpu)
[addr] "m" (safe_address));
}

-static inline void __clear_fpu(struct task_struct *tsk)
-{
- if (task_thread_info(tsk)->status & TS_USEDFPU) {
- /* Ignore delayed exceptions from user space */
- asm volatile("1: fwait\n"
- "2:\n"
- _ASM_EXTABLE(1b, 2b));
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
- task_thread_info(tsk)->xstate_mask &= ~XCNTXT_LAZY;
- stts();
- }
-}
-
static inline void kernel_fpu_begin(void)
{
preempt_disable();
@@ -286,7 +258,14 @@ static inline void irq_ts_restore(int TS_state)
static inline void clear_fpu(struct task_struct *tsk)
{
preempt_disable();
- __clear_fpu(tsk);
+ if (task_thread_info(tsk)->xstate_mask & XCNTXT_LAZY) {
+ /* Ignore delayed exceptions from user space */
+ asm volatile("1: fwait\n"
+ "2:\n"
+ _ASM_EXTABLE(1b, 2b));
+ task_thread_info(tsk)->xstate_mask &= ~XCNTXT_LAZY;
+ stts();
+ }
preempt_enable();
}

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index ec12d62..0e691c6 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -244,8 +244,6 @@ static inline struct thread_info *current_thread_info(void)
* ever touches our thread-synchronous status, so we don't
* have to worry about atomic accesses.
*/
-#define TS_USEDFPU 0x0001 /* FPU was used by this task
- this quantum (SMP) */
#define TS_COMPAT 0x0002 /* 32bit syscall active (64BIT)*/
#define TS_POLLING 0x0004 /* idle task polling need_resched,
skip sending interrupt */
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 742da4a..b8861d4 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -37,8 +37,8 @@ extern unsigned int xstate_size;
extern u64 pcntxt_mask;
extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];

-extern void xsave(struct fpu *, u64);
-extern void xrstor(struct fpu *, u64);
+extern void xsave(struct xsave_struct *, u64);
+extern void xrstor(struct xsave_struct *, u64);
extern void save_xstates(struct task_struct *);
extern void restore_xstates(struct task_struct *, u64);
extern int save_xstates_sigframe(void __user *, unsigned int);
@@ -46,10 +46,7 @@ extern int restore_xstates_sigframe(void __user *, unsigned int);

extern void xsave_init(void);
extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
-extern int init_fpu(struct task_struct *child);
-extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
- unsigned int size,
- struct _fpx_sw_bytes *sw);
+extern void sanitize_i387_state(struct task_struct *);

static inline void xrstor_state(struct xsave_struct *fx, u64 mask)
{
@@ -71,7 +68,7 @@ static inline void xsave_state(struct xsave_struct *fx, u64 mask)
: "memory");
}

-static inline void fpu_xsave(struct xsave_struct *fx, u64 mask)
+static inline void xsaveopt_state(struct xsave_struct *fx, u64 mask)
{
u32 lmask = mask;
u32 hmask = mask >> 32;
@@ -82,7 +79,8 @@ static inline void fpu_xsave(struct xsave_struct *fx, u64 mask)
".byte " REX_PREFIX "0x0f,0xae,0x27",
".byte " REX_PREFIX "0x0f,0xae,0x37",
X86_FEATURE_XSAVEOPT,
- [fx] "D" (fx), "a" (lmask), "d" (hmask) :
+ "D" (fx), "a" (lmask), "d" (hmask) :
"memory");
}
+
#endif
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index ca33c0b..dd9644a 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -182,7 +182,8 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;

- sanitize_i387_state(target);
+ if (use_xsaveopt())
+ sanitize_i387_state(target);

return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
&target->thread.fpu.state->fxsave, 0, -1);
@@ -201,7 +202,8 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;

- sanitize_i387_state(target);
+ if (use_xsaveopt())
+ sanitize_i387_state(target);

ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.fpu.state->fxsave, 0, -1);
@@ -440,7 +442,8 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
-1);
}

- sanitize_i387_state(target);
+ if (use_xsaveopt())
+ sanitize_i387_state(target);

if (kbuf && pos == 0 && count == sizeof(env)) {
convert_from_fxsr(kbuf, target);
@@ -463,7 +466,8 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;

- sanitize_i387_state(target);
+ if (use_xsaveopt())
+ sanitize_i387_state(target);

if (!HAVE_HWFP)
return fpregs_soft_set(target, regset, pos, count, kbuf, ubuf);
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 9ecc791..d42810f 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -39,7 +39,7 @@ static unsigned int *xstate_offsets, *xstate_sizes, xstate_features;
* that the user doesn't see some stale state in the memory layout during
* signal handling, debugging etc.
*/
-void __sanitize_i387_state(struct task_struct *tsk)
+void sanitize_i387_state(struct task_struct *tsk)
{
u64 xstate_bv;
int feature_bit = 0x2;
@@ -48,7 +48,7 @@ void __sanitize_i387_state(struct task_struct *tsk)
if (!fx)
return;

- BUG_ON(task_thread_info(tsk)->status & TS_USEDFPU);
+ BUG_ON(task_thread_info(tsk)->xstate_mask & XCNTXT_LAZY);

xstate_bv = tsk->thread.fpu.state->xsave.xsave_hdr.xstate_bv;

@@ -103,8 +103,8 @@ void __sanitize_i387_state(struct task_struct *tsk)
* Check for the presence of extended state information in the
* user fpstate pointer in the sigcontext.
*/
-int check_for_xstate(struct i387_fxsave_struct __user *buf, unsigned int size,
- struct _fpx_sw_bytes *fx_sw_user)
+static int check_for_xstate(struct i387_fxsave_struct __user *buf, unsigned int size,
+ struct _fpx_sw_bytes *fx_sw_user)
{
int min_xstate_size = sizeof(struct i387_fxsave_struct) +
sizeof(struct xsave_hdr_struct);
@@ -176,7 +176,8 @@ int save_xstates_sigframe(void __user *buf, unsigned int size)
(struct _fpstate_ia32 __user *) buf) ? -1 : 1;

save_xstates(tsk);
- sanitize_i387_state(tsk);
+ if (use_xsaveopt())
+ sanitize_i387_state(tsk);

#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
if (ia32) {
@@ -498,17 +499,17 @@ void __cpuinit xsave_init(void)
this_func();
}

-void xsave(struct fpu *fpu, u64 mask)
+void xsave(struct xsave_struct *x, u64 mask)
{
clts();

if (use_xsave())
- fpu_xsave(&fpu->state->xsave, mask);
+ xsaveopt_state(x, mask);
else if (mask & XCNTXT_LAZY)
- fpu_save(fpu);
+ fpu_save(&x->i387);

if (mask & XCNTXT_LAZY)
- fpu_clean(fpu);
+ fpu_clean(&x->i387);

stts();
}
@@ -521,7 +522,7 @@ void save_xstates(struct task_struct *tsk)
if (!fpu_allocated(&tsk->thread.fpu))
return;

- xsave(&tsk->thread.fpu, ti->xstate_mask);
+ xsave(&tsk->thread.fpu.state->xsave, ti->xstate_mask);

if (!(ti->xstate_mask & XCNTXT_LAZY))
tsk->fpu_counter = 0;
@@ -533,19 +534,17 @@ void save_xstates(struct task_struct *tsk)
*/
if (tsk->fpu_counter < 5)
ti->xstate_mask &= ~XCNTXT_LAZY;
-
- ti->status &= ~TS_USEDFPU;
}
EXPORT_SYMBOL(save_xstates);

-void xrstor(struct fpu *fpu, u64 mask)
+void xrstor(struct xsave_struct *x, u64 mask)
{
clts();

if (use_xsave())
- xrstor_state(&fpu->state->xsave, mask);
+ xrstor_state(x, mask);
else if (mask & XCNTXT_LAZY)
- fpu_restore(fpu);
+ fpu_restore(&x->i387);

if (!(mask & XCNTXT_LAZY))
stts();
@@ -559,10 +558,9 @@ void restore_xstates(struct task_struct *tsk, u64 mask)
if (!fpu_allocated(&tsk->thread.fpu))
return;

- xrstor(&tsk->thread.fpu, mask);
+ xrstor(&tsk->thread.fpu.state->xsave, mask);

ti->xstate_mask |= mask;
- ti->status |= TS_USEDFPU;
if (mask & XCNTXT_LAZY)
tsk->fpu_counter++;
}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5b4cdcb..f756c95 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -876,7 +876,7 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
#ifdef CONFIG_X86_64
wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
#endif
- if (current_thread_info()->status & TS_USEDFPU)
+ if (current_thread_info()->xstate_mask & XCNTXT_LAZY)
clts();
load_gdt(&__get_cpu_var(host_gdt));
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index aae9e8f..bc04e15 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5805,7 +5805,7 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
kvm_put_guest_xcr0(vcpu);
vcpu->guest_fpu_loaded = 1;
save_xstates(current);
- xrstor(&vcpu->arch.guest_fpu, -1);
+ xrstor(&vcpu->arch.guest_fpu.state->xsave, -1);
trace_kvm_fpu(1);
}

@@ -5817,7 +5817,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
return;

vcpu->guest_fpu_loaded = 0;
- xsave(&vcpu->arch.guest_fpu, -1);
+ xsave(&vcpu->arch.guest_fpu.state->xsave, -1);
++vcpu->stat.fpu_reload;
kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
trace_kvm_fpu(0);

2011-04-06 22:06:17

by Hans Rosenfeld

[permalink] [raw]
Subject: [tip:x86/xsave] x86, xsave: add support for non-lazy xstates

Commit-ID: 4182a4d68bac5782bf76743193c1d9f7d17a34a4
Gitweb: http://git.kernel.org/tip/4182a4d68bac5782bf76743193c1d9f7d17a34a4
Author: Hans Rosenfeld <[email protected]>
AuthorDate: Tue, 5 Apr 2011 17:50:54 +0200
Committer: H. Peter Anvin <[email protected]>
CommitDate: Wed, 6 Apr 2011 14:15:19 -0700

x86, xsave: add support for non-lazy xstates

Non-lazy xstates are, as the name suggests, extended states that cannot
be saved or restored lazily. The state for AMD's LWP feature is an
example of this.

This patch adds support for this kind of xstate. If any such states are
present and supported on the running system, they will always be enabled
in xstate_mask so that they are always restored in switch_to(). Since
lazy allocation of the xstate area won't work when non-lazy xstates are
used, all user tasks will always have an xstate area preallocated.
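
For illustration only (not part of this patch): user space can see which
xstate features the kernel has enabled by reading XCR0 with xgetbv; on a
kernel carrying this series that mask covers the lazy and non-lazy
features alike. A minimal sketch, assuming an XSAVE-capable CPU with
CR4.OSXSAVE set (otherwise xgetbv raises #UD):

#include <stdio.h>

/*
 * Read extended control register 0 (XCR0): the set of xstate features
 * enabled by the kernel via xsetbv.
 */
static unsigned long long read_xcr0(void)
{
	unsigned int lo, hi;

	asm volatile("xgetbv" : "=a" (lo), "=d" (hi) : "c" (0));
	return ((unsigned long long)hi << 32) | lo;
}

int main(void)
{
	printf("XCR0 (enabled xstate features): 0x%016llx\n", read_xcr0());
	return 0;
}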

Signed-off-by: Hans Rosenfeld <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/include/asm/i387.h | 17 +++++++++++++++++
arch/x86/include/asm/xsave.h | 5 +++--
arch/x86/kernel/process_32.c | 2 +-
arch/x86/kernel/process_64.c | 2 +-
arch/x86/kernel/xsave.c | 6 +++++-
5 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 0381578..da55ab6 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -329,6 +329,23 @@ static inline void fpu_copy(struct fpu *dst, struct fpu *src)
}

extern void fpu_finit(struct fpu *fpu);
+static union thread_xstate __init_xstate, *init_xstate = &__init_xstate;
+
+static inline void fpu_clear(struct fpu *fpu)
+{
+ if (pcntxt_mask & XCNTXT_NONLAZY) {
+ if (!fpu_allocated(fpu)) {
+ BUG_ON(init_xstate == NULL);
+ fpu->state = init_xstate;
+ init_xstate = NULL;
+ }
+ memset(fpu->state, 0, xstate_size);
+ fpu_finit(fpu);
+ set_used_math();
+ } else {
+ fpu_free(fpu);
+ }
+}

#endif /* __ASSEMBLY__ */

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index b8861d4..4ccee3c 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -23,9 +23,10 @@
/*
* These are the features that the OS can handle currently.
*/
-#define XCNTXT_MASK (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
+#define XCNTXT_LAZY (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
+#define XCNTXT_NONLAZY 0

-#define XCNTXT_LAZY XCNTXT_MASK
+#define XCNTXT_MASK (XCNTXT_LAZY | XCNTXT_NONLAZY)

#ifdef CONFIG_X86_64
#define REX_PREFIX "0x48, "
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8df07c3..a878736 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -257,7 +257,7 @@ start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
/*
* Free the old FP and other extended state
*/
- free_thread_xstate(current);
+ fpu_clear(&current->thread.fpu);
}
EXPORT_SYMBOL_GPL(start_thread);

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index cbf1a67..8ff35fc 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -344,7 +344,7 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
/*
* Free the old FP and other extended state
*/
- free_thread_xstate(current);
+ fpu_clear(&current->thread.fpu);
}

void
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index d42810f..56ab3d3 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -16,6 +16,7 @@
* Supported feature mask by the CPU and the kernel.
*/
u64 pcntxt_mask;
+EXPORT_SYMBOL(pcntxt_mask);

/*
* Represents init state for the supported extended state.
@@ -260,7 +261,7 @@ int restore_xstates_sigframe(void __user *buf, unsigned int size)
struct task_struct *tsk = current;
struct _fpstate_ia32 __user *fp = buf;
struct xsave_struct *xsave;
- u64 xstate_mask = 0;
+ u64 xstate_mask = pcntxt_mask & XCNTXT_NONLAZY;
int err;

if (!buf) {
@@ -477,6 +478,9 @@ static void __init xstate_enable_boot_cpu(void)
printk(KERN_INFO "xsave/xrstor: enabled xstate_bv 0x%llx, "
"cntxt size 0x%x\n",
pcntxt_mask, xstate_size);
+
+ if (pcntxt_mask & XCNTXT_NONLAZY)
+ task_thread_info(&init_task)->xstate_mask |= XCNTXT_NONLAZY;
}

/*

2011-04-06 22:06:36

by Hans Rosenfeld

[permalink] [raw]
Subject: [tip:x86/xsave] x86, xsave: add kernel support for AMDs Lightweight Profiling (LWP)

Commit-ID: 1039b306b1c68c2b4183b22a131c5f031dfedc2b
Gitweb: http://git.kernel.org/tip/1039b306b1c68c2b4183b22a131c5f031dfedc2b
Author: Hans Rosenfeld <[email protected]>
AuthorDate: Tue, 5 Apr 2011 17:50:55 +0200
Committer: H. Peter Anvin <[email protected]>
CommitDate: Wed, 6 Apr 2011 14:15:20 -0700

x86, xsave: add kernel support for AMDs Lightweight Profiling (LWP)

This patch extends the xsave structure to support the LWP state. The
xstate feature bit for LWP is added to XCNTXT_NONLAZY, thereby enabling
kernel support for saving/restoring LWP state. The LWP state is also
saved/restored on signal entry/return, just like all other xstates. LWP
state needs to be reset (disabled) when entering a signal handler.
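
Aside (illustration only, not part of this patch): for any xsave-managed
component, CPUID leaf 0x0d with the feature bit number as the sub-leaf
reports the component's size and its offset in the standard xsave area
layout. A hedged user-space sketch for the LWP bit (62, as defined by
this patch), assuming a CPU that implements CPUID leaf 0x0d; CPUs
without LWP simply report zeros here:

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int size, offset, ecx, edx;

	/* CPUID.(EAX=0dh, ECX=62): size and offset of xstate component 62. */
	__cpuid_count(0x0d, 62, size, offset, ecx, edx);
	printf("LWP xstate component: %u bytes at offset %u\n", size, offset);
	return 0;
}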

Signed-off-by: Hans Rosenfeld <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/processor.h | 12 ++++++++++++
arch/x86/include/asm/sigcontext.h | 12 ++++++++++++
arch/x86/include/asm/xsave.h | 3 ++-
arch/x86/kernel/xsave.c | 2 ++
5 files changed, 29 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index fd5a1f3..55edab6 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -131,6 +131,7 @@
#define MSR_AMD64_IBSDCPHYSAD 0xc0011039
#define MSR_AMD64_IBSCTL 0xc001103a
#define MSR_AMD64_IBSBRTARGET 0xc001103b
+#define MSR_AMD64_LWP_CBADDR 0xc0000106

/* Fam 15h MSRs */
#define MSR_F15H_PERF_CTL 0xc0010200
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 4c25ab4..df2cbd4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -353,6 +353,17 @@ struct ymmh_struct {
u32 ymmh_space[64];
};

+struct lwp_struct {
+ u64 lwpcb_addr;
+ u32 flags;
+ u32 buf_head_offset;
+ u64 buf_base;
+ u32 buf_size;
+ u32 filters;
+ u64 saved_event_record[4];
+ u32 event_counter[16];
+};
+
struct xsave_hdr_struct {
u64 xstate_bv;
u64 reserved1[2];
@@ -363,6 +374,7 @@ struct xsave_struct {
struct i387_fxsave_struct i387;
struct xsave_hdr_struct xsave_hdr;
struct ymmh_struct ymmh;
+ struct lwp_struct lwp;
/* new processor state extensions will go here */
} __attribute__ ((packed, aligned (64)));

diff --git a/arch/x86/include/asm/sigcontext.h b/arch/x86/include/asm/sigcontext.h
index 04459d2..0a58b82 100644
--- a/arch/x86/include/asm/sigcontext.h
+++ b/arch/x86/include/asm/sigcontext.h
@@ -274,6 +274,17 @@ struct _ymmh_state {
__u32 ymmh_space[64];
};

+struct _lwp_state {
+ __u64 lwpcb_addr;
+ __u32 flags;
+ __u32 buf_head_offset;
+ __u64 buf_base;
+ __u32 buf_size;
+ __u32 filters;
+ __u64 saved_event_record[4];
+ __u32 event_counter[16];
+};
+
/*
* Extended state pointed by the fpstate pointer in the sigcontext.
* In addition to the fpstate, information encoded in the xstate_hdr
@@ -284,6 +295,7 @@ struct _xstate {
struct _fpstate fpstate;
struct _xsave_hdr xstate_hdr;
struct _ymmh_state ymmh;
+ struct _lwp_state lwp;
/* new processor state extensions go here */
};

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 4ccee3c..be89f0e 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -9,6 +9,7 @@
#define XSTATE_FP 0x1
#define XSTATE_SSE 0x2
#define XSTATE_YMM 0x4
+#define XSTATE_LWP (1ULL << 62)

#define XSTATE_FPSSE (XSTATE_FP | XSTATE_SSE)

@@ -24,7 +25,7 @@
* These are the features that the OS can handle currently.
*/
#define XCNTXT_LAZY (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
-#define XCNTXT_NONLAZY 0
+#define XCNTXT_NONLAZY (XSTATE_LWP)

#define XCNTXT_MASK (XCNTXT_LAZY | XCNTXT_NONLAZY)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 56ab3d3..a188362 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -177,6 +177,8 @@ int save_xstates_sigframe(void __user *buf, unsigned int size)
(struct _fpstate_ia32 __user *) buf) ? -1 : 1;

save_xstates(tsk);
+ if (pcntxt_mask & XSTATE_LWP)
+ wrmsrl(MSR_AMD64_LWP_CBADDR, 0);
if (use_xsaveopt())
sanitize_i387_state(tsk);

2011-04-06 22:06:59

by Hans Rosenfeld

[permalink] [raw]
Subject: [tip:x86/xsave] x86, xsave: remove lazy allocation of xstate area

Commit-ID: 66beba27e8b5c3f61818cc58bd6c9e0e3cfd7711
Gitweb: http://git.kernel.org/tip/66beba27e8b5c3f61818cc58bd6c9e0e3cfd7711
Author: Hans Rosenfeld <[email protected]>
AuthorDate: Tue, 5 Apr 2011 17:50:56 +0200
Committer: H. Peter Anvin <[email protected]>
CommitDate: Wed, 6 Apr 2011 14:15:21 -0700

x86, xsave: remove lazy allocation of xstate area

This patch completely removes lazy allocation of the xstate area. All
user tasks will always have an xstate area preallocated, just like they
already do when non-lazy features are present. The size of the xsave
area ranges from 112 to 960 bytes, depending on the xstates present and
enabled. Since it is common to use SSE etc. for optimization, the actual
overhead is expected to be negligible.

This removes some of the special-case handling of non-lazy xstates. It
also greatly simplifies init_fpu() by removing the allocation code, the
check for the presence of the xstate area, and the need to check the
init_fpu() return value.
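
For reference (illustration only, not part of this patch): the xsave
area size the kernel computes matches what CPUID leaf 0x0d, sub-leaf 0
reports for the features enabled in XCR0; the 112-byte lower bound above
corresponds to the plain fsave layout used when neither fxsr nor xsave
is available. A minimal user-space sketch, assuming an XSAVE-capable CPU:

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID.(EAX=0dh, ECX=0): supported xstate mask and area sizes. */
	__cpuid_count(0x0d, 0, eax, ebx, ecx, edx);
	printf("supported xstate mask:           0x%016llx\n",
	       ((unsigned long long)edx << 32) | eax);
	printf("xsave area size (enabled, XCR0): %u bytes\n", ebx);
	printf("xsave area size (all supported): %u bytes\n", ecx);
	return 0;
}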

Signed-off-by: Hans Rosenfeld <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/include/asm/i387.h | 20 ++++++++------------
arch/x86/kernel/i387.c | 41 +++++++++--------------------------------
arch/x86/kernel/traps.c | 16 ++--------------
arch/x86/kernel/xsave.c | 8 ++------
arch/x86/kvm/x86.c | 4 ++--
arch/x86/math-emu/fpu_entry.c | 8 ++------
6 files changed, 25 insertions(+), 72 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index da55ab6..3ca900b 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -40,7 +40,7 @@
extern unsigned int sig_xstate_size;
extern void fpu_init(void);
extern void mxcsr_feature_mask_init(void);
-extern int init_fpu(struct task_struct *child);
+extern void init_fpu(struct task_struct *child);
extern asmlinkage void math_state_restore(void);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);

@@ -333,18 +333,14 @@ static union thread_xstate __init_xstate, *init_xstate = &__init_xstate;

static inline void fpu_clear(struct fpu *fpu)
{
- if (pcntxt_mask & XCNTXT_NONLAZY) {
- if (!fpu_allocated(fpu)) {
- BUG_ON(init_xstate == NULL);
- fpu->state = init_xstate;
- init_xstate = NULL;
- }
- memset(fpu->state, 0, xstate_size);
- fpu_finit(fpu);
- set_used_math();
- } else {
- fpu_free(fpu);
+ if (!fpu_allocated(fpu)) {
+ BUG_ON(init_xstate == NULL);
+ fpu->state = init_xstate;
+ init_xstate = NULL;
}
+ memset(fpu->state, 0, xstate_size);
+ fpu_finit(fpu);
+ set_used_math();
}

#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index dd9644a..df0b139 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -127,9 +127,9 @@ EXPORT_SYMBOL_GPL(fpu_finit);
* value at reset if we support XMM instructions and then
* remember the current task has used the FPU.
*/
-int init_fpu(struct task_struct *tsk)
+void init_fpu(struct task_struct *tsk)
{
- int ret;
+ BUG_ON(tsk->flags & PF_KTHREAD);

if (tsk_used_math(tsk)) {
if (HAVE_HWFP && tsk == current) {
@@ -137,20 +137,12 @@ int init_fpu(struct task_struct *tsk)
save_xstates(tsk);
preempt_enable();
}
- return 0;
+ return;
}

- /*
- * Memory allocation at the first usage of the FPU and other state.
- */
- ret = fpu_alloc(&tsk->thread.fpu);
- if (ret)
- return ret;
-
fpu_finit(&tsk->thread.fpu);

set_stopped_child_used_math(tsk);
- return 0;
}
EXPORT_SYMBOL_GPL(init_fpu);

@@ -173,14 +165,10 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
- int ret;
-
if (!cpu_has_fxsr)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (use_xsaveopt())
sanitize_i387_state(target);
@@ -198,9 +186,7 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_fxsr)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (use_xsaveopt())
sanitize_i387_state(target);
@@ -232,9 +218,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_xsave)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

/*
* Copy the 48bytes defined by the software first into the xstate
@@ -262,9 +246,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_xsave)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.fpu.state->xsave, 0, -1);
@@ -427,11 +409,8 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
void *kbuf, void __user *ubuf)
{
struct user_i387_ia32_struct env;
- int ret;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (!HAVE_HWFP)
return fpregs_soft_get(target, regset, pos, count, kbuf, ubuf);
@@ -462,9 +441,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
struct user_i387_ia32_struct env;
int ret;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (use_xsaveopt())
sanitize_i387_state(target);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 872fc78..c8fbd04 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -734,20 +734,8 @@ asmlinkage void math_state_restore(void)
struct thread_info *thread = current_thread_info();
struct task_struct *tsk = thread->task;

- if (!tsk_used_math(tsk)) {
- local_irq_enable();
- /*
- * does a slab alloc which can sleep
- */
- if (init_fpu(tsk)) {
- /*
- * ran out of memory!
- */
- do_group_exit(SIGKILL);
- return;
- }
- local_irq_disable();
- }
+ if (!tsk_used_math(tsk))
+ init_fpu(tsk);

restore_xstates(tsk, XCNTXT_LAZY);
}
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index a188362..62f2df8 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -264,7 +264,6 @@ int restore_xstates_sigframe(void __user *buf, unsigned int size)
struct _fpstate_ia32 __user *fp = buf;
struct xsave_struct *xsave;
u64 xstate_mask = pcntxt_mask & XCNTXT_NONLAZY;
- int err;

if (!buf) {
if (used_math()) {
@@ -277,11 +276,8 @@ int restore_xstates_sigframe(void __user *buf, unsigned int size)
if (!access_ok(VERIFY_READ, buf, size))
return -EACCES;

- if (!used_math()) {
- err = init_fpu(tsk);
- if (err)
- return err;
- }
+ if (!used_math())
+ init_fpu(tsk);

if (!HAVE_HWFP) {
set_used_math();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bc04e15..17e52a9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5386,8 +5386,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
int r;
sigset_t sigsaved;

- if (!tsk_used_math(current) && init_fpu(current))
- return -ENOMEM;
+ if (!tsk_used_math(current))
+ init_fpu(current);

if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 7718541..472e2b9 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -147,12 +147,8 @@ void math_emulate(struct math_emu_info *info)
unsigned long code_limit = 0; /* Initialized to stop compiler warnings */
struct desc_struct code_descriptor;

- if (!used_math()) {
- if (init_fpu(current)) {
- do_group_exit(SIGKILL);
- return;
- }
- }
+ if (!used_math())
+ init_fpu(current);

#ifdef RE_ENTRANT_CHECKING
if (emulating) {

2011-04-07 07:24:15

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC v3 0/8] x86, xsave: rework of extended state handling, LWP support


FYI, the bits in tip:x86/xsave crash on boot on an AMD X2 testbox:

[ 10.823492] Freeing unused kernel memory: 616k freed
[ 11.087787] ------------[ cut here ]------------
[ 11.088312] Kernel BUG at ffffffff8100a140 [verbose debug info unavailable]
[ 11.088312] invalid opcode: 0000 [#1] SMP
[ 11.088312] last sysfs file:
[ 11.088312] CPU 1
[ 11.088312] Modules linked in:
[ 11.088312]
[ 11.088312] Pid: 41, comm: modprobe Not tainted 2.6.39-rc2-tip+ #113394
[ 11.088312] RIP: 0010:[<ffffffff8100a140>] [<ffffffff8100a140>] start_thread_common.constprop.1+0x100/0x110
[ 11.088312] RSP: 0018:ffff88003d7c5c40 EFLAGS: 00010246
[ 11.088312] RAX: ffff88003d7c5fd8 RBX: ffff88003d74bd40 RCX: 0000000000000033
[ 11.088312] RDX: 00007ffffffff000 RSI: 000000310f600ac0 RDI: 0000000000000000
[ 11.088312] RBP: ffff88003d7c5c60 R08: 0000000000000000 R09: 0000000000000004
[ 11.088312] R10: 00007fff4ae4dd68 R11: 0000000000000000 R12: 00007fff4ae4dd60
[ 11.088312] R13: 000000310f600ac0 R14: 0000000000000033 R15: ffff88003d74bd40
[ 11.088312] FS: 00007f48d909f780(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[ 11.088312] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 11.088312] CR2: 00007fff4ae4def9 CR3: 000000003d7af000 CR4: 00000000000006e0
[ 11.088312] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 11.088312] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 11.088312] Process modprobe (pid: 41, threadinfo ffff88003d7c4000, task ffff88003d74bd40)
[ 11.088312] Stack:
[ 11.088312] ffff88003d72c400 ffff88003d60a400 0000000000000000 ffff88003d7c5e80
[ 11.088312] ffff88003d7c5c70 ffffffff8100a546 ffff88003d7c5d90 ffffffff8117c7de
[ 11.088312] ffff88003d74bd40 0000000000000004 00007fff4ae4dda8 00007fff4ae4dd68
[ 11.088312] Call Trace:
[ 11.088312] [<ffffffff8100a546>] start_thread+0x16/0x20
[ 11.088312] [<ffffffff8117c7de>] load_elf_binary+0x14fe/0x1980
[ 11.088312] [<ffffffff81138392>] search_binary_handler+0xc2/0x2a0
[ 11.088312] [<ffffffff8117b2e0>] ? load_elf_library+0x2b0/0x2b0
[ 11.088312] [<ffffffff8113a35c>] do_execve+0x24c/0x2d0
[ 11.088312] [<ffffffff81014b97>] sys_execve+0x47/0x80
[ 11.088312] [<ffffffff8145b698>] kernel_execve+0x68/0xd0
[ 11.088312] [<ffffffff8106ca83>] ? ____call_usermodehelper+0x93/0xa0
[ 11.088312] [<ffffffff8145b624>] kernel_thread_helper+0x4/0x10
[ 11.088312] [<ffffffff81459f54>] ? retint_restore_args+0x13/0x13
[ 11.088312] [<ffffffff8106c9f0>] ? call_usermodehelper_setup+0xe0/0xe0
[ 11.088312] [<ffffffff8145b620>] ? gs_change+0x13/0x13
[ 11.088312] Code: f0 4c 8b 75 f8 c9 c3 0f 1f 40 00 48 8b 3d 19 01 64 00 48 85 ff 74 14 48 89 bb a0 04 00 00 48 c7 05 02 01 64 00 00 00 00 00 eb a1 <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66
[ 11.088312] RIP [<ffffffff8100a140>] start_thread_common.constprop.1+0x100/0x110
[ 11.088312] RSP <ffff88003d7c5c40>

Full crash log and kernel config attached. I've excluded x86/xsave from
tip:master for now.

Thanks,

Ingo


Attachments:
(No filename) (3.02 kB)
config-Thu_Apr__7_09_46_14_CEST_2011.bad (71.45 kB)
crash.log (187.69 kB)

2011-04-07 15:30:43

by Hans Rosenfeld

[permalink] [raw]
Subject: Re: [RFC v3 0/8] x86, xsave: rework of extended state handling, LWP support

On Thu, Apr 07, 2011 at 03:23:05AM -0400, Ingo Molnar wrote:
>
> FYI, the bits in tip:x86/xsave crash on boot on an AMD X2 testbox:
>
> [ 10.823492] Freeing unused kernel memory: 616k freed
> [ 11.087787] ------------[ cut here ]------------
> [ 11.088312] Kernel BUG at ffffffff8100a140 [verbose debug info unavailable]
> [...]
> [ 11.088312] RIP [<ffffffff8100a140>] start_thread_common.constprop.1+0x100/0x110

Sorry for that, it seems I made a wrong assumption about
kernel_execve() usage. Updated patches will follow shortly.


Hans


--
%SYSTEM-F-ANARCHISM, The operating system has been overthrown

2011-04-07 16:09:08

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v4 6/8] x86, xsave: add support for non-lazy xstates

Non-lazy xstates are, as the name suggests, extended states that cannot
be saved or restored lazily. The state for AMDs LWP feature is an
example of this.

This patch adds support for this kind of xstates. If any such states are
present and supported on the running system, they will always be enabled
in xstate_mask so that they are always restored in switch_to. Since lazy
allocation of the xstate area won't work when non-lazy xstates are used,
all user tasks will always have an xstate area preallocated.

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 14 ++++++++++++++
arch/x86/include/asm/xsave.h | 5 +++--
arch/x86/kernel/process_32.c | 2 +-
arch/x86/kernel/process_64.c | 2 +-
arch/x86/kernel/xsave.c | 6 +++++-
5 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index b8f9617..67233a5 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -330,6 +330,20 @@ static inline void fpu_copy(struct fpu *dst, struct fpu *src)

extern void fpu_finit(struct fpu *fpu);

+static inline void fpu_clear(struct fpu *fpu)
+{
+ if (pcntxt_mask & XCNTXT_NONLAZY) {
+ if (!fpu_allocated(fpu) && fpu_alloc(fpu))
+ do_group_exit(SIGKILL);
+
+ memset(fpu->state, 0, xstate_size);
+ fpu_finit(fpu);
+ set_used_math();
+ } else {
+ fpu_free(fpu);
+ }
+}
+
#endif /* __ASSEMBLY__ */

#endif /* _ASM_X86_I387_H */
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index b8861d4..4ccee3c 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -23,9 +23,10 @@
/*
* These are the features that the OS can handle currently.
*/
-#define XCNTXT_MASK (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
+#define XCNTXT_LAZY (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
+#define XCNTXT_NONLAZY 0

-#define XCNTXT_LAZY XCNTXT_MASK
+#define XCNTXT_MASK (XCNTXT_LAZY | XCNTXT_NONLAZY)

#ifdef CONFIG_X86_64
#define REX_PREFIX "0x48, "
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8df07c3..a878736 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -257,7 +257,7 @@ start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
/*
* Free the old FP and other extended state
*/
- free_thread_xstate(current);
+ fpu_clear(&current->thread.fpu);
}
EXPORT_SYMBOL_GPL(start_thread);

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index cbf1a67..8ff35fc 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -344,7 +344,7 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
/*
* Free the old FP and other extended state
*/
- free_thread_xstate(current);
+ fpu_clear(&current->thread.fpu);
}

void
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index d42810f..56ab3d3 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -16,6 +16,7 @@
* Supported feature mask by the CPU and the kernel.
*/
u64 pcntxt_mask;
+EXPORT_SYMBOL(pcntxt_mask);

/*
* Represents init state for the supported extended state.
@@ -260,7 +261,7 @@ int restore_xstates_sigframe(void __user *buf, unsigned int size)
struct task_struct *tsk = current;
struct _fpstate_ia32 __user *fp = buf;
struct xsave_struct *xsave;
- u64 xstate_mask = 0;
+ u64 xstate_mask = pcntxt_mask & XCNTXT_NONLAZY;
int err;

if (!buf) {
@@ -477,6 +478,9 @@ static void __init xstate_enable_boot_cpu(void)
printk(KERN_INFO "xsave/xrstor: enabled xstate_bv 0x%llx, "
"cntxt size 0x%x\n",
pcntxt_mask, xstate_size);
+
+ if (pcntxt_mask & XCNTXT_NONLAZY)
+ task_thread_info(&init_task)->xstate_mask |= XCNTXT_NONLAZY;
}

/*
--
1.5.6.5

2011-04-07 16:09:00

by Hans Rosenfeld

[permalink] [raw]
Subject: [RFC v4 8/8] x86, xsave: remove lazy allocation of xstate area

This patch completely removes lazy allocation of the xstate area. All
user tasks will always have an xstate area preallocated, just like they
already do when non-lazy features are present. The size of the xsave
area ranges from 112 to 960 bytes, depending on the xstates present and
enabled. Since it is common to use SSE etc. for optimization, the actual
overhead is expected to be negligible.

This removes some of the special-case handling of non-lazy xstates. It
also greatly simplifies init_fpu() by removing the allocation code, the
check for the presence of the xstate area, and the init_fpu() return value.

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 16 ++++++----------
arch/x86/kernel/i387.c | 41 +++++++++--------------------------------
arch/x86/kernel/traps.c | 16 ++--------------
arch/x86/kernel/xsave.c | 8 ++------
arch/x86/kvm/x86.c | 4 ++--
arch/x86/math-emu/fpu_entry.c | 8 ++------
6 files changed, 23 insertions(+), 70 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 67233a5..833b6f1 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -40,7 +40,7 @@
extern unsigned int sig_xstate_size;
extern void fpu_init(void);
extern void mxcsr_feature_mask_init(void);
-extern int init_fpu(struct task_struct *child);
+extern void init_fpu(struct task_struct *child);
extern asmlinkage void math_state_restore(void);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);

@@ -332,16 +332,12 @@ extern void fpu_finit(struct fpu *fpu);

static inline void fpu_clear(struct fpu *fpu)
{
- if (pcntxt_mask & XCNTXT_NONLAZY) {
- if (!fpu_allocated(fpu) && fpu_alloc(fpu))
- do_group_exit(SIGKILL);
+ if (!fpu_allocated(fpu) && fpu_alloc(fpu))
+ do_group_exit(SIGKILL);

- memset(fpu->state, 0, xstate_size);
- fpu_finit(fpu);
- set_used_math();
- } else {
- fpu_free(fpu);
- }
+ memset(fpu->state, 0, xstate_size);
+ fpu_finit(fpu);
+ set_used_math();
}

#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index dd9644a..df0b139 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -127,9 +127,9 @@ EXPORT_SYMBOL_GPL(fpu_finit);
* value at reset if we support XMM instructions and then
* remember the current task has used the FPU.
*/
-int init_fpu(struct task_struct *tsk)
+void init_fpu(struct task_struct *tsk)
{
- int ret;
+ BUG_ON(tsk->flags & PF_KTHREAD);

if (tsk_used_math(tsk)) {
if (HAVE_HWFP && tsk == current) {
@@ -137,20 +137,12 @@ int init_fpu(struct task_struct *tsk)
save_xstates(tsk);
preempt_enable();
}
- return 0;
+ return;
}

- /*
- * Memory allocation at the first usage of the FPU and other state.
- */
- ret = fpu_alloc(&tsk->thread.fpu);
- if (ret)
- return ret;
-
fpu_finit(&tsk->thread.fpu);

set_stopped_child_used_math(tsk);
- return 0;
}
EXPORT_SYMBOL_GPL(init_fpu);

@@ -173,14 +165,10 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
- int ret;
-
if (!cpu_has_fxsr)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (use_xsaveopt())
sanitize_i387_state(target);
@@ -198,9 +186,7 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_fxsr)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (use_xsaveopt())
sanitize_i387_state(target);
@@ -232,9 +218,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_xsave)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

/*
* Copy the 48bytes defined by the software first into the xstate
@@ -262,9 +246,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_xsave)
return -ENODEV;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.fpu.state->xsave, 0, -1);
@@ -427,11 +409,8 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
void *kbuf, void __user *ubuf)
{
struct user_i387_ia32_struct env;
- int ret;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (!HAVE_HWFP)
return fpregs_soft_get(target, regset, pos, count, kbuf, ubuf);
@@ -462,9 +441,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
struct user_i387_ia32_struct env;
int ret;

- ret = init_fpu(target);
- if (ret)
- return ret;
+ init_fpu(target);

if (use_xsaveopt())
sanitize_i387_state(target);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 872fc78..c8fbd04 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -734,20 +734,8 @@ asmlinkage void math_state_restore(void)
struct thread_info *thread = current_thread_info();
struct task_struct *tsk = thread->task;

- if (!tsk_used_math(tsk)) {
- local_irq_enable();
- /*
- * does a slab alloc which can sleep
- */
- if (init_fpu(tsk)) {
- /*
- * ran out of memory!
- */
- do_group_exit(SIGKILL);
- return;
- }
- local_irq_disable();
- }
+ if (!tsk_used_math(tsk))
+ init_fpu(tsk);

restore_xstates(tsk, XCNTXT_LAZY);
}
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index a188362..62f2df8 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -264,7 +264,6 @@ int restore_xstates_sigframe(void __user *buf, unsigned int size)
struct _fpstate_ia32 __user *fp = buf;
struct xsave_struct *xsave;
u64 xstate_mask = pcntxt_mask & XCNTXT_NONLAZY;
- int err;

if (!buf) {
if (used_math()) {
@@ -277,11 +276,8 @@ int restore_xstates_sigframe(void __user *buf, unsigned int size)
if (!access_ok(VERIFY_READ, buf, size))
return -EACCES;

- if (!used_math()) {
- err = init_fpu(tsk);
- if (err)
- return err;
- }
+ if (!used_math())
+ init_fpu(tsk);

if (!HAVE_HWFP) {
set_used_math();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bc04e15..17e52a9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5386,8 +5386,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
int r;
sigset_t sigsaved;

- if (!tsk_used_math(current) && init_fpu(current))
- return -ENOMEM;
+ if (!tsk_used_math(current))
+ init_fpu(current);

if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 7718541..472e2b9 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -147,12 +147,8 @@ void math_emulate(struct math_emu_info *info)
unsigned long code_limit = 0; /* Initialized to stop compiler warnings */
struct desc_struct code_descriptor;

- if (!used_math()) {
- if (init_fpu(current)) {
- do_group_exit(SIGKILL);
- return;
- }
- }
+ if (!used_math())
+ init_fpu(current);

#ifdef RE_ENTRANT_CHECKING
if (emulating) {
--
1.5.6.5

2011-04-13 10:58:37

by Hans Rosenfeld

[permalink] [raw]
Subject: [PATCH] x86, xsave: fix non-lazy allocation of the xsave area

A single static xsave area just for init is not enough, since there are
more user processes that are directly executed by kernel threads. Use
fpu_alloc(), and SIGKILL the process if that fails.

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 9 +++------
1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 989c0ac..833b6f1 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -329,15 +329,12 @@ static inline void fpu_copy(struct fpu *dst, struct fpu *src)
}

extern void fpu_finit(struct fpu *fpu);
-static union thread_xstate __init_xstate, *init_xstate = &__init_xstate;

static inline void fpu_clear(struct fpu *fpu)
{
- if (!fpu_allocated(fpu)) {
- BUG_ON(init_xstate == NULL);
- fpu->state = init_xstate;
- init_xstate = NULL;
- }
+ if (!fpu_allocated(fpu) && fpu_alloc(fpu))
+ do_group_exit(SIGKILL);
+
memset(fpu->state, 0, xstate_size);
fpu_finit(fpu);
set_used_math();
--
1.5.6.5

2011-04-13 23:22:19

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86, xsave: fix non-lazy allocation of the xsave area

On 04/13/2011 03:58 AM, Hans Rosenfeld wrote:
> A single static xsave area just for init is not enough, since there are
> more user processes that are directly executed by kernel threads. Use
> fpu_alloc(), and SIGKILL the process if that fails.
>
> Signed-off-by: Hans Rosenfeld <[email protected]>
> ---
> arch/x86/include/asm/i387.h | 9 +++------
> 1 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
> index 989c0ac..833b6f1 100644
> --- a/arch/x86/include/asm/i387.h
> +++ b/arch/x86/include/asm/i387.h
> @@ -329,15 +329,12 @@ static inline void fpu_copy(struct fpu *dst, struct fpu *src)
> }
>
> extern void fpu_finit(struct fpu *fpu);
> -static union thread_xstate __init_xstate, *init_xstate = &__init_xstate;
>
> static inline void fpu_clear(struct fpu *fpu)
> {
> - if (!fpu_allocated(fpu)) {
> - BUG_ON(init_xstate == NULL);
> - fpu->state = init_xstate;
> - init_xstate = NULL;
> - }
> + if (!fpu_allocated(fpu) && fpu_alloc(fpu))
> + do_group_exit(SIGKILL);
> +
> memset(fpu->state, 0, xstate_size);
> fpu_finit(fpu);
> set_used_math();

Ideally this should be done earlier, while it is still possible to
ENOMEM the exec. Specifically, it probably should be done from a new
arch hook at the top in flush_old_exec(). I'm not sure how much it
matters in practice, because if we are that memory-constrained we'll
probably die shortly anyway, and to a kernel thread it is probably not
that much of a difference if the exec'd process dies with SIGKILL or if
it gets ENOMEM from the exec() -- it will typically be visible only from
the parent thread anyway.

-hpa

2011-04-15 16:47:34

by Hans Rosenfeld

[permalink] [raw]
Subject: [PATCH 1/1] x86, xsave: fix non-lazy allocation of the xsave area

A single static xsave area just for init is not enough, since there are
more user processes that are directly executed by kernel threads. Add a
call to a new arch-specific function to flush_old_exec(), which will in
turn call fpu_alloc() to allocate an xsave area if necessary.

Signed-off-by: Hans Rosenfeld <[email protected]>
---
arch/x86/include/asm/i387.h | 6 ------
arch/x86/kernel/process.c | 7 +++++++
fs/exec.c | 9 +++++++++
3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index 989c0ac..0448f45 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -329,15 +329,9 @@ static inline void fpu_copy(struct fpu *dst, struct fpu *src)
}

extern void fpu_finit(struct fpu *fpu);
-static union thread_xstate __init_xstate, *init_xstate = &__init_xstate;

static inline void fpu_clear(struct fpu *fpu)
{
- if (!fpu_allocated(fpu)) {
- BUG_ON(init_xstate == NULL);
- fpu->state = init_xstate;
- init_xstate = NULL;
- }
memset(fpu->state, 0, xstate_size);
fpu_finit(fpu);
set_used_math();
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0382f98..3edfbf2 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -26,6 +26,13 @@
struct kmem_cache *task_xstate_cachep;
EXPORT_SYMBOL_GPL(task_xstate_cachep);

+int arch_prealloc_fpu(struct task_struct *tsk)
+{
+ if (!fpu_allocated(&tsk->thread.fpu))
+ return fpu_alloc(&tsk->thread.fpu);
+ return 0;
+}
+
int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
{
int ret;
diff --git a/fs/exec.c b/fs/exec.c
index 5e62d26..c5b5c1e 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1022,10 +1022,19 @@ void set_task_comm(struct task_struct *tsk, char *buf)
perf_event_comm(tsk);
}

+int __attribute__((weak)) arch_prealloc_fpu(struct task_struct *tsk)
+{
+ return 0;
+}
+
int flush_old_exec(struct linux_binprm * bprm)
{
int retval;

+ retval = arch_prealloc_fpu(current);
+ if (retval)
+ goto out;
+
/*
* Make sure we have a private signal table and that
* we are unassociated from the previous thread group.
--
1.5.6.5

2011-05-16 19:10:48

by Hans Rosenfeld

[permalink] [raw]
Subject: Re: [RFC v3 0/8] x86, xsave: rework of extended state handling, LWP support

Hi,

On Thu, Apr 07, 2011 at 03:23:05AM -0400, Ingo Molnar wrote:
>
> FYI, the bits in tip:x86/xsave crash on boot on an AMD X2 testbox:

> Full crash log and kernel config attached. I've excluded x86/xsave from
> tip:master for now.

this issue has been fixed a few weeks ago.

Are there any plans to include x86/xsave into tip:master again?


Hans


--
%SYSTEM-F-ANARCHISM, The operating system has been overthrown

2011-05-17 11:30:42

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC v3 0/8] x86, xsave: rework of extended state handling, LWP support


* Hans Rosenfeld <[email protected]> wrote:

> Hi,
>
> On Thu, Apr 07, 2011 at 03:23:05AM -0400, Ingo Molnar wrote:
> >
> > FYI, the bits in tip:x86/xsave crash on boot on an AMD X2 testbox:
>
> > Full crash log and kernel config attached. I've excluded x86/xsave from
> > tip:master for now.
>
> this issue has been fixed a few weeks ago.
>
> Are there any plans to include x86/xsave into tip:master again?

Regarding the LWP bits, that branch was indeed excluded because of that crash,
while re-checking the branch today i noticed at least one serious design error
in it, which makes me reconsider the whole thing:

- Where is the hardware interrupt that signals the ring-buffer-full condition
exposed to user-space and how can user-space wait for ring buffer events?
AFAICS this needs to set the LWP_CFG MSR and needs an irq handler, which
needs kernel side support - but that is not included in these patches.

The way we solved this with Intel's BTS (and PEBS) feature is that there's
a per task hardware buffer that is coupled with the event ring buffer, so
both setup and 'waiting' for the ring-buffer happens automatically and
transparently because tools can already wait on the ring-buffer.

Considerable effort went into that model on the Intel side before we merged
it and i see no reason why an AMD hw-tracing feature should not have this
too...

[ If that is implemented we can expose LWP to user-space as well (which can
choose to utilize it directly and buffer into its own memory area without
irqs and using polling, but i'd generally discourage such crude event
collection methods). ]

- LWP is exposed indiscriminately, without giving user-space a chance to
disable it on a per task basis. Security-conscious apps would want to disable
access to the LWP instructions - which are all ring 3 and unprivileged! We
already allow this for the TSC for example. Right now sandboxed code like
seccomp would get access to LWP as well - not good. Some intelligent
(optional) control is needed, probably using cr0's lwp-enabled bit.

There are a couple of other items as well:

- The LWP_CFG has other features as well, such as the ability to aggregate
events amongst cores. This is not exposed either. This looks like a lower
prio, optional item which could be offered after the first patches went
upstream.

- like we do it for PEBS with the perf_attr.precise attribute, it would be nice
to report not RIP+1 but the real RIP itself. On Intel we use LBR to discover
the previous instruction, this might not be possible on AMD CPUs.

One solution would be to disassemble the sampled instruction and approximate
the previous one by assuming that it's the preceding instruction (for
branches and calls this might not be true). If we do this then the event::FUS
bit has to be taken into account - in case the CPU has fused the instruction
and we have a two instructions delay in reporting.

In any case, this is an optional item too and v1 support can be merged
without trying to implement precise RIP support.

- there are a few interesting looking event details that we'd want to expose
in a generalized manner: branch taken/not taken bit, branch prediction
hit/miss bit, etc.

This too is optional.

- The LWPVAL instruction allows the user-space generation of samples. There
needs to be a matching generic event for it, which is then inserted into the
perf ring-buffer. Similarly, LWPINS needs to have a matching generic record
as well, so that user-space can decode it.

This too looks optional to me.

- You'd eventually want to expose the randomization (bits 60-63 in the LWPCB)
feature as well, via an attribute bit. Ditto for filtering such as cache
latency filtering, which looks the most useful. The low/high IP filter could
be exposed as well. All optional. For remaining featurities if there's no sane
way to expose them generally we can expose a raw event field as
well and have a raw event configuration space to twiddle these details.

In general LWP is pretty neat and i agree that we want to offer it, it offers
access to five top categories of hw events (which we also have generalized):

- instructions
- branches
- the most important types of cache misses
- CPU cycles
- constant (bus) cycles

- user-space generated events/samples

So it will fit nicely into our existing scheme of how we handle PMU features
and generalizations.

Here are a couple of suggestions to LWP hardware designers:

- the fact that LWP cannot count kernel events right now is unfortunate -
there's no reason not to allow privileged user-space to request ring 3
events as well - hopefully this misfeature will be fixed in future
iterations of the hardware.

- it would be nice to allow the per task masking/unmasking of LWP without
having to modify the cr0 (which can be expensive). A third mode
implemented in the LWP_CFG MSR would suffice: it would make the LWP
instructions privileged, but would otherwise allow LWP event collection
to occur even on sandboxed code.

- it would be nice to also log the previous retired instruction in the
trace entry, to ease decoding of the real instruction that generated
an event. (Fused instructions can generate their RIP at the first
instruction.)

Thanks,

Ingo

2011-05-17 15:23:05

by Hans Rosenfeld

[permalink] [raw]
Subject: Re: [RFC v3 0/8] x86, xsave: rework of extended state handling, LWP support

On Tue, May 17, 2011 at 07:30:20AM -0400, Ingo Molnar wrote:
> Regarding the LWP bits, that branch was indeed excluded because of that crash,
> while re-checking the branch today i noticed at least one serious design error
> in it, which makes me reconsider the whole thing:

If you don't like the patch to enable LWP, you could leave that one out
for now. The other xsave rework patches are necessary for LWP, but they
also make sense in their own right.

> - Where is the hardware interrupt that signals the ring-buffer-full condition
> exposed to user-space and how can user-space wait for ring buffer events?
> AFAICS this needs to set the LWP_CFG MSR and needs an irq handler, which
> needs kernel side support - but that is not included in these
> patches.

This is not strictly necessary. All that the LWP patch does is enable a
new instruction set that can be used without any support for interrupts.
A user process tracing itself with LWP can always poll the ring buffer.

> The way we solved this with Intel's BTS (and PEBS) feature is that there's
> a per task hardware buffer that is coupled with the event ring buffer, so
> both setup and 'waiting' for the ring-buffer happens automatically and
> transparently because tools can already wait on the ring-buffer.
>
> Considerable effort went into that model on the Intel side before we merged
> it and i see no reason why an AMD hw-tracing feature should not have this
> too...

I don't see how that is related to LWP, which by design only works in
user space and directly logs to user space buffers.

> [ If that is implemented we can expose LWP to user-space as well (which can
> choose to utilize it directly and buffer into its own memory area without
> irqs and using polling, but i'd generally discourage such crude event
> collection methods). ]

Well, that's exactly how LWP is supposed to work. It's all user space. It
works only in user mode and it logs directly to a buffer in virtual
address space of the process being traced. The kernel doesn't have to
care at all about LWP for basic functionality, given that it enables the
instruction set and saving/restoring of the LWP state. Enabling the LWP
interrupt and relaying that as a signal or whatever is completely
optional and can be done later if necessary.

> - LWP is exposed indiscriminately, without giving user-space a chance to
> disable it on a per task basis. Security-conscious apps would want to disable
> access to the LWP instructions - which are all ring 3 and unprivileged! We
> already allow this for the TSC for example. Right now sandboxed code like
> seccomp would get access to LWP as well - not good. Some intelligent
> (optional) control is needed, probably using cr0's lwp-enabled bit.

What exactly is the point here? If a program doesn't want to use LWP for
whatever reason, it doesn't have to. No state is saved/restored by
XSAVE/XRSTOR for LWP if it is unused. A security-conscious app would
also not allow any LD_PRELOADs or anything like that which could use LWP
behind its back. What exactly is gained by disabling it, except for
breaking the specification?

Note that there is only one way to disable LWP, and that is clearing the
LWP bit in the XFEATURE_ENABLED_MASK in XCR0. Messing with that in a
running system will cause a lot of pain.

> There are a couple of other items as well:
>
> - The LWP_CFG has other features as well, such as the ability to aggregate
> events amongst cores. This is not exposed either. This looks like a lower
> prio, optional item which could be offered after the first patches went
> upstream.

I don't see that anywhere in the specification, where did you find that?

> - like we do it for PEBS with the perf_attr.precise attribute, it would be nice
> to report not RIP+1 but the real RIP itself. On Intel we use LBR to discover
> the previous instruction, this might not be possible on AMD CPUs.
>
> One solution would be to disassemble the sampled instruction and approximate
> the previous one by assuming that it's the preceding instruction (for
> branches and calls this might not be true). If we do this then the event::FUS
> bit has to be taken into account - in case the CPU has fused the instruction
> and we have a two instructions delay in reporting.
>
> In any case, this is an optional item too and v1 support can be merged
> without trying to implement precise RIP support.
>
> - there are a few interesting looking event details that we'd want to expose
> in a generalized manner: branch taken/not taken bit, branch prediction
> hit/miss bit, etc.
>
> This too is optional.
>
> - The LWPVAL instruction allows the user-space generation of samples. There
> needs to be a matching generic event for it, which is then inserted into the
> perf ring-buffer. Similarly, LWPINS needs to have a matching generic record
> as well, so that user-space can decode it.
>
> This too looks optional to me.
>
> - You'd eventually want to expose the randomization (bits 60-63 in the LWPCB)
> feature as well, via an attribute bit. Ditto for filtering such as cache
> latency filtering, which looks the most useful. The low/high IP filter could
> be exposed as well. All optional. For remaining featurities if there's no sane
> way to expose them generally we can expose a raw event field as
> well and have a raw event configuration space to twiddle these details.
>
> In general LWP is pretty neat and i agree that we want to offer it, it offers
> access to five top categories of hw events (which we also have generalized):
>
> - instructions
> - branches
> - the most important types of cache misses
> - CPU cycles
> - constant (bus) cycles
>
> - user-space generated events/samples
>
> So it will fit nicely into our existing scheme of how we handle PMU features
> and generalizations.

I don't quite understand what you are proposing here. The LWPCB is
controlled by the user space application that traces itself, so all of
it is already exposed by the hardware. The samples are directly logged to
the user space buffer by the hardware, so there is no work to do for the
kernel here. Any post-processing of the samples (for precise RIP or
such) needs to be done in the user space.

We had some discussions about how to make LWP more accessible to
users. Having LWP support in perf would certainly be nice, but the
implementation would be very much different from that for other PMUs.
LWP does almost everything in hardware that perf does in the kernel.

As I said before, with this patch I'm enabling a new instruction set and
associated extended state. How exactly user programs use it, and how it
might fit into existing PMU APIs and tools is not really that important
now.

> Here are a couple of suggestions to LWP hardware designers:
>
> - the fact that LWP cannot count kernel events right now is unfortunate -
> there's no reason not to allow privileged user-space to request ring 3
> events as well - hopefully this misfeature will be fixed in future
> iterations of the hardware.
>
> - it would be nice to allow the per task masking/unmasking of LWP without
> having to modify the cr0 (which can be expensive). A third mode
> implemented in the LWP_CFG MSG would suffice: it would make the LWP
> instructions privileged, but would otherwise allow LWP event collection
> to occur even on sandboxed code.
>
> - it would be nice to also log the previous retired instruction in the
> trace entry, to ease decoding of the real instruction that generated
> an event. (Fused instructions can generate their RIP at the first
> instruction.)

I will forward this to our hardware designers, but I have my doubts
about the first two of your suggestions. They seem to be orthogonal to
what LWP is supposed to be.


Hans


--
%SYSTEM-F-ANARCHISM, The operating system has been overthrown

2011-05-18 08:16:57

by Joerg Roedel

[permalink] [raw]
Subject: Re: [RFC v3 0/8] x86, xsave: rework of extended state handling, LWP support

Hi Ingo,

thanks for your thoughts on this. I have some comments below.

On Tue, May 17, 2011 at 01:30:20PM +0200, Ingo Molnar wrote:

> - Where is the hardware interrupt that signals the ring-buffer-full condition
> exposed to user-space and how can user-space wait for ring buffer events?
> AFAICS this needs to set the LWP_CFG MSR and needs an irq handler, which
> needs kernel side support - but that is not included in these patches.
>
> The way we solved this with Intel's BTS (and PEBS) feature is that there's
> a per task hardware buffer that is coupled with the event ring buffer, so
> both setup and 'waiting' for the ring-buffer happens automatically and
> transparently because tools can already wait on the ring-buffer.
>
> Considerable effort went into that model on the Intel side before we merged
> it and i see no reason why an AMD hw-tracing feature should not have this
> too...
>
> [ If that is implemented we can expose LWP to user-space as well (which can
> choose to utilize it directly and buffer into its own memory area without
> irqs and using polling, but i'd generally discourage such crude event
> collection methods). ]

If I understand this correctly you suggest propagating the lwp-events
through perf into user-space. This is certainly good because it provides
a unified interface, but it somewhat eliminates the 'lightweight' part
of LWP because the samples need to be read by the kernel from user-space
memory (the lwp-ring-buffer needs to be in user-space memory), converted
to perf samples, and copied back to user-space. The benefit is the
unified interface but the 'lightweight' and low-impact part vanishes to
some degree.

Also, LWP is somewhat different from the old-style PMU. LWP is designed
for self-monitoring of applications that want to optimize themself at
runtime, like JIT compilers (Java, LVMM, ...) or databases. For those
applications it would be good to keep LWP as lightweight as possible.

The missing support for interrupts is certainly a problem here which
significantly limits the usefulness of the feature for now. My idea was
to expose the interrupt event through perf to user-space so that the
application can wait on that event to read out the LWP ring-buffer.

But to come back to your idea, it probably could be done in a way to
enable profiling of other applications using LWP. The kernel needs to
allocate the lwp ring-buffer and setup lwp itself. The problem is that
the buffer needs to be user-accessible and where to map this buffer:

a) On the kernel-part of the address space. Problematic because
every process can read the buffer of other tasks. So this is
a no-go from a security point-of-view.

b) Change the address space layout in a compatible way to allow
the kernel to map it (e.g. make a small part of the
kernel-address space per-process). Somewhat intrusive to
current x86 code, also not sure this feature is worth it.

c) Some way to let userspace setup such a buffer and give the
address to the kernel, or we mmap it directly into user
address space. But that may cause other problems with
applications that have strict requirements for their
address-space layout.

Bottom-line is, we need a good and secure way to setup a user-accessible
buffer per-process in the kernel. If we have that we can use LWP to
monitor other applications (unless the application decides to use LWP of
its own).

I like the idea, but we should also make sure that we don't prevent the
low-impact self-monitoring use-case for applications that want it.

> - LWP is exposed indiscriminately, without giving user-space a chance to
> disable it on a per task basis. Security-conscious apps would want to disable
> access to the LWP instructions - which are all ring 3 and unprivileged! We
> already allow this for the TSC for example. Right now sandboxed code like
> seccomp would get access to LWP as well - not good. Some intelligent
> (optional) control is needed, probably using cr0's lwp-enabled bit.

That could certainly be done, but requires an xcr0 write at
context-switch. JFI, how can the tsc be disabled for a task from
userspace?

Regards,

Joerg

2011-05-18 11:00:16

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC v3 0/8] x86, xsave: rework of extended state handling, LWP support


* Joerg Roedel <[email protected]> wrote:

> Hi Ingo,
>
> thanks for your thoughts on this. I have some comments below.
>
> On Tue, May 17, 2011 at 01:30:20PM +0200, Ingo Molnar wrote:
>
> > - Where is the hardware interrupt that signals the ring-buffer-full condition
> > exposed to user-space and how can user-space wait for ring buffer events?
> > AFAICS this needs to set the LWP_CFG MSR and needs an irq handler, which
> > needs kernel side support - but that is not included in these patches.
> >
> > The way we solved this with Intel's BTS (and PEBS) feature is that there's
> > a per task hardware buffer that is coupled with the event ring buffer, so
> > both setup and 'waiting' for the ring-buffer happens automatically and
> > transparently because tools can already wait on the ring-buffer.
> >
> > Considerable effort went into that model on the Intel side before we merged
> > it and i see no reason why an AMD hw-tracing feature should not have this
> > too...
> >
> > [ If that is implemented we can expose LWP to user-space as well (which can
> > choose to utilize it directly and buffer into its own memory area without
> > irqs and using polling, but i'd generally discourage such crude event
> > collection methods). ]
>
> If I understand this correctly you suggest propagating the lwp-events
> through perf into user-space. This is certainly good because it provides
> a unified interface, but it somewhat eliminates the 'lightweight' part
> of LWP because the samples need to be read by the kernel from user-space
> memory (the lwp-ring-buffer needs to be in user-space memory), converted
> to perf samples, and copied back to user-space. The benefit is the
> unified interface but the 'lightweight' and low-impact part vanishes to
> some degree.

I have two arguments here.

1) it does not matter much in practice

Say we have a large amount of samples: a hundred thousand samples for a second
worth of application execution. This 100 KHz sampling is already 100 times
larger than the default we use in tools.

100k samples - the 'lightweight' comes from not having to incur the cost of
100,000 PMU interrupts spread out with 1000+ overhead cycles each - but being
able to batch it up in groups.

The copying of the 100k samples (32 bytes per LWP event record) means the
handling of 3.2 MB of data per second. The copying itself is *negligible* -
this is from an ancient AMD box:

phoenix:~> perf bench mem memcpy
# Running mem/memcpy benchmark...
# Copying 1MB Bytes ...

727.802038 MB/Sec
1.949227 GB/Sec (with prefault)

On modern CPUs it ought to be in the 0.1% overhead range. For usual sampling
rates the copying would be in the 0.001% overhead range.

And for that we get a much better abstraction and much better tooling model.
The decision is a no-brainer really.

Note that if user-space *really* wants to get rid of even this overhead it can
use the instructions in a raw way. I expect that to have the fate of
sendfile(): zero-copy was trumpeted to be a big performance thing but in
practice it rarely mattered, usability was what kept people on read()/write().

[ and compared to raw LWP instructions the usability disadvantage of sendfile()
is almost non-existent. ]

2) there's no contradiction: lightweight access can be supported in the perf
abstraction as well

While the PEBS buffer is not exposed to user-space, we can expose the buffer in
the LWP case and make 'raw collection' possible. As long as the standard
facilities are used to *configure* profiling and as long as the standard
facilities are used for the threshold irq functionality, etc. this is not
something i object to.

And if zero copying matters a lot, then regular tools will use that facility as
well.

> Also, LWP is somewhat different from the old-style PMU. LWP is designed
> for self-monitoring of applications that want to optimize themselves at
> runtime, like JIT compilers (Java, LLVM, ...) or databases. For those
> applications it would be good to keep LWP as lightweight as possible.

That goal does not contradict the sane resource management and synchronization
requirements i outlined.

> The missing support for interrupts is certainly a problem here which
> significantly limits the usefulness of the feature for now. [...]

Yes, that's the key observation.

> [...] My idea was to expose the interrupt event through perf to user-space so
> that the application can wait on that event to read out the LWP ring-buffer.

The (much) better thing (which you seem to realize later in your mail) is to
just integrate the buffer and teach the kernel to parse it.

Then *all* tools will be able to utilize this (useful looking) hardware feature
straight away, with very little modifications needed - the advantage of
standardized kernel interfaces.

If the CPU guys give us a 'measure kernel mode' bit it as well in the future
then it will be even more useful all around.

So this is not just about the current first generation hardware, it's also
about what LWP could very well turn out to look like in the future, using
obvious extensions.

By making it a limited user-space hack just because LWP *can* be used as such a
hack we would really risk condemning a very valuable piece of silicon to that
stupid role forever. It does not have to be used as such a hack and it does not
have to be condemned to that role.

> But to come back to your idea, it probably could be done in a way to
> enable profiling of other applications using LWP. The kernel needs to
> allocate the lwp ring-buffer and setup lwp itself. [...]

Yes.

> [...] The problem is that the buffer needs to be user-accessible and where to
> map this buffer:
>
> a) On the kernel-part of the address space. Problematic because
> every process can read the buffer of other tasks. So this is
> a no-go from a security point-of-view.

No, the hardware buffer can (and should) be in user memory. We also want to
expose it (see raw decoding above), just like we expose raw events.

> b) Change the address space layout in a compatible way to allow
> the kernel to map it (e.g. make a small part of the
> kernel-address space per-process). Somewhat intrusive to
> current x86 code, also not sure this feature is worth it.

There's nothing wrong with allocating user memory on behalf of the task, if it
asks for it (or if the parent or some other controlling task wants to profile
the task) - we do it in a couple of places in the kernel - the perf subsystem
itself does it.

> c) Some way to let userspace setup such a buffer and give the
> address to the kernel, or we mmap it directly into user
> address space. But that may cause other problems with
> applications that have strict requirements for their
> address-space layout.

A buffer has to be allocated no matter who does it.

> Bottom-line is, we need a good and secure way to setup a user-accessible
> buffer per-process in the kernel. [...]

Correct. It can either be a do_mmap() call, or if we want to handle any aspect
of it ourselves then it can be done like arch/x86/vdso/vma.c::init_vdso_vars()
sets up the vdso vma.
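
As a rough sketch of the do_mmap() variant (using the interface as it existed
in the kernels discussed in this thread; the function name, the size argument
and the flag choice are illustrative, and error handling plus accounting
against the perf mlock limits are left out):

/*
 * Sketch only: allocate an anonymous, locked, user-visible buffer on
 * behalf of 'current' and return its user virtual address, which could
 * then be written into the task's LWPCB.
 */
static unsigned long lwp_alloc_user_buffer(unsigned long size)
{
	unsigned long addr;

	down_write(&current->mm->mmap_sem);
	addr = do_mmap(NULL, 0, size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, 0);
	up_write(&current->mm->mmap_sem);

	return addr;
}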

We also obviously want to mlock this area (within the perf page-locking
limits). While the LWP hardware is robust enough to not crash on a not present
(paged out or not yet paged in) page, spuriously losing samples is not good.

> [...] If we have that we can use LWP to monitor other applications (unless
> the application decides to use LWP of its own).

Yes!

That's not surprising: the hw feature itself looks pretty decently done, except
the few things i noted in my first mail which limit its utility needlessly.
[ They could ask us next time around they add a feature like this. ]

> I like the idea, but we should also make sure that we don't prevent the
> low-impact self-monitoring use-case for applications that want it.

Yes, while i dont find that a too interesting usecase i see no problem with
exposing this 'raw' area to apps that want to parse it directly (in fact the hw
forces that, because the area has to be ring 3 writable) - as long as the whole
resource infrastructure of creating and managing it is sane.

The kernel is a resource manager and this is a useful CPU resource.

> > - LWP is exposed indiscriminately, without giving user-space a chance to
> > disable it on a per task basis. Security-conscious apps would want to disable
> > access to the LWP instructions - which are all ring 3 and unprivileged! We
> > already allow this for the TSC for example. Right now sandboxed code like
> > seccomp would get access to LWP as well - not good. Some intelligent
> > (optional) control is needed, probably using cr0's lwp-enabled bit.
>
> That could certainly be done, but requires an xcr0 write at
> context-switch. JFI, how can the tsc be disabled for a task from
> userspace?

See prctl_set_seccomp()'s disable_TSC call. The scheduler notices the TIF_NOTSC
flag and twiddles CR4::TSD. TSC disablement is implicit in the seccomp
execution model.

Here there should be a TIF_NOLWP, tied into seccomp by default and twiddling
xcr0 at context-switch.

This will be zero overhead by default.
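
A rough sketch of what that could look like, modelled on the existing
TIF_NOTSC handling in __switch_to_xtra(). TIF_NOLWP and the helper names are
made up for this example; the LWP state bit in XCR0 is bit 62 per the AMD
LWP spec, and the xgetbv()/xsetbv() helpers are the existing ones from
asm/xcr.h:

/* Sketch only: TIF_NOLWP is a hypothetical flag mirroring TIF_NOTSC. */
#define XCR0_LWP	(1ULL << 62)	/* LWP state bit in XCR0 (AMD spec) */

static void lwp_disable(void)
{
	xsetbv(XCR_XFEATURE_ENABLED_MASK,
	       xgetbv(XCR_XFEATURE_ENABLED_MASK) & ~XCR0_LWP);
}

static void lwp_enable(void)
{
	xsetbv(XCR_XFEATURE_ENABLED_MASK,
	       xgetbv(XCR_XFEATURE_ENABLED_MASK) | XCR0_LWP);
}

/* Called from __switch_to_xtra() when the flag differs between prev/next. */
static void toggle_lwp(struct task_struct *prev_p, struct task_struct *next_p)
{
	if (test_tsk_thread_flag(next_p, TIF_NOLWP))
		lwp_disable();
	else
		lwp_enable();
}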

Thanks,

Ingo

2011-05-18 11:23:11

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC v3 0/8] x86, xsave: rework of extended state handling, LWP support


* Hans Rosenfeld <[email protected]> wrote:

> > Here are a couple of suggestions to LWP hardware designers:
> >
> > - the fact that LWP cannot count kernel events right now is unfortunate -
> > there's no reason not to allow privileged user-space to request ring 3
> > events as well - hopefully this misfeature will be fixed in future
> > iterations of the hardware.
> >
> > - it would be nice to allow the per task masking/unmasking of LWP without
> > having to modify the cr0 (which can be expensive). A third mode
> > implemented in the LWP_CFG MSG would suffice: it would make the LWP
> > instructions privileged, but would otherwise allow LWP event collection
> > to occur even on sandboxed code.
> >
> > - it would be nice to also log the previous retired instruction in the
> > trace entry, to ease decoding of the real instruction that generated
> > an event. (Fused instructions can generate their RIP at the first
> > instruction.)
>
> I will forward this to our hardware designers, but I have my doubts about the
> first two of your suggestions. They seem to be orthogonal to what LWP is
> supposed to be.

Not sure why you think those two suggestions are 'orthogonal to LWP', they are
not:

- the second suggestion adds a third security model to the current
all-or-nothing nature of LWP instructions.

- the first suggestion is a variation of its current security model as well:
it allows LWP driven event collection in kernel mode, not just user mode.

There is nothing fundamentally ring-3-only about the concept of 'light weight
profiling' - while ring-3-only event collection is understandably necessary for
unprivileged user-space, it is not the only interesting mode of lightweight
event collection.

Thanks,

Ingo

2011-05-18 13:52:17

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC v3 0/8] x86, xsave: rework of extended state handling, LWP support


* Hans Rosenfeld <[email protected]> wrote:

> On Tue, May 17, 2011 at 07:30:20AM -0400, Ingo Molnar wrote:
> > Regarding the LWP bits, that branch was indeed excluded because of that crash,
> > while re-checking the branch today i noticed at least one serious design error
> > in it, which makes me reconsider the whole thing:
>
> If you don't like the patch to enable LWP, you could leave that one out
> for now. The other xsave rework patches are necessary for LWP, but they
> also make sense in their own right.
>
> > - Where is the hardware interrupt that signals the ring-buffer-full condition
> > exposed to user-space and how can user-space wait for ring buffer events?
> > AFAICS this needs to set the LWP_CFG MSR and needs an irq handler, which
> > needs kernel side support - but that is not included in these
> > patches.
>
> This is not strictly necessary. All that the LWP patch does is enable a
> new instruction set that can be used without any support for interrupts.
> A user process tracing itself with LWP can always poll the ring buffer.

Only allowing the buffer to be polled is like 1980's technology, we can (and
must) do better than that to expose useful hardware resources ...

LWP has a threshold interrupt and if you utilize it as i suggested it will
solve this problem.

> > The way we solved this with Intel's BTS (and PEBS) feature is that there's
> > a per task hardware buffer that is coupled with the event ring buffer, so
> > both setup and 'waiting' for the ring-buffer happens automatically and
> > transparently because tools can already wait on the ring-buffer.
> >
> > Considerable effort went into that model on the Intel side before we merged
> > it and i see no reason why an AMD hw-tracing feature should not have this
> > too...
>
> I don't see how that is related to LWP, which by design only works in user
> space and directly logs to user space buffers.

Both PEBS/BTS and LWP hw-logs to a buffer in virtual memory - full stop.

PEBS allows this buffer to be kernel privileged. That does not change the
fundamental model though: the best way to expose such capabilities is via a
standardized event interface.

> > [ If that is implemented we can expose LWP to user-space as well (which can
> > choose to utilize it directly and buffer into its own memory area without
> > irqs and using polling, but i'd generally discourage such crude event
> > collection methods). ]
>
> Well, that's exactly how LWP is supposed to work. It's all user space. It works
> only in user mode and it logs directly to a buffer in virtual address space
> of the process being traced. The kernel doesn't have to care at all about LWP
> for basic functionality, given that it enables the instruction set and
> saving/restoring of the LWP state. Enabling the LWP interrupt and relaying
> that as a signal or whatever is completely optional and can be done later if
> necessary.

This is a very poor model of exposing a useful (and valuable) CPU hardware
resource to user-space. Especially since we already have working example of how
to do a proper model via the PEBS/BTS code.

To give an example of where we turn a low level hardware resource into a higher
level concept: for example USB disks are primarily designed to be throw-away
containers of user space controlled trash - so is the right way to expose
them to give the raw USB disk to user-space?

I don't think so, instead what we do is that we have kernel support for USB
disks which enumerates and organizes them into storage devices and we put
filesystems on them - so that apps can access an USB stick via standardized
APIs - without actually tools being particularly USB-aware.

There are still raw USB devices available which you can use if you really want
(or need) to, but 99% of the usage is via standardized, higher level
interfaces.

I'm sure you'll agree that this kind of standardization and abstraction helps!

We are trying to do something rather similar with PEBS/BTS and want to fit LWP
into that as well.

> > - LWP is exposed indiscriminately, without giving user-space a chance to
> > disable it on a per task basis. Security-conscious apps would want to disable
> > access to the LWP instructions - which are all ring 3 and unprivileged! We
> > already allow this for the TSC for example. Right now sandboxed code like
> > seccomp would get access to LWP as well - not good. Some intelligent
> > (optional) control is needed, probably using cr0's lwp-enabled bit.
>
> What exactly is the point here? If a program doesn't want to use LWP for
> whatever reason, it doesn't have to. [...]

The point is risk (attack surface) reduction: in Linux there's support for
heavily sandboxed code, for which today we even turn the RDTSC instruction off
(see my mail to Joerg for specific details about seccomp).

There are security models where untrusted assembly code has to be run but the
attack surface must be reduced as much as possible. Such a sandboxing host
wants to be able to exclude the LWP instructions from being used by the
sandboxed code.
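
For comparison, this is roughly how a sandboxing host revokes RDTSC today via
prctl(); the suggestion is an analogous, optional per-task switch for the LWP
instructions. No such LWP knob exists yet - the sketch only illustrates the
existing TSC precedent:

#include <stdio.h>
#include <sys/prctl.h>

/*
 * Before running untrusted code, the sandboxing host revokes direct
 * RDTSC access so that the instruction raises SIGSEGV.  A per-task
 * switch for the LWP instructions does not exist yet; this only shows
 * the existing precedent mentioned above.
 */
static int lock_down_task(void)
{
        if (prctl(PR_SET_TSC, PR_TSC_SIGSEGV, 0, 0, 0) < 0) {
                perror("PR_SET_TSC");
                return -1;
        }
        /* ... enter seccomp mode, drop further privileges, etc. ... */
        return 0;
}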

> [...] No state is saved/restored by XSAVE/XRSTOR for LWP if it is unused. A
> security-conscious app would also not allow any LD_PRELOADs or anything like
> that which could use LWP behind its back. What exactly is gained by disabling
> it, except for breaking the specification?

The gain is to optionally allow a sandboxing host to turn off CPU resources it
does not want to expose. LWP is a whole new set of (rather complex) CPU logic.

> Note that there is only one way to disable LWP, and that is clearing the LWP
> bit in the XFEATURE_ENABLED_MASK in XCR0. Messing with that in a running
> system will cause a lot of pain.

We already modify cr0 in certain circumstances, so it's possible and robust,
but indeed it's not particularly common at the moment - nor will it be the
common case in the future.
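
To make the point of contention concrete, a per-task disable would have to
look roughly like the sketch below - toggling the LWP feature bit in XCR0 on
context switch. The xgetbv()/xsetbv() helpers mirror the existing kernel
wrappers, and XSTATE_LWP as bit 62 follows the AMD documentation; whether
flipping XCR0 like this at runtime is acceptable is exactly what is being
debated above:

/*
 * Sketch only: clear or set the LWP feature bit in XCR0 when switching
 * to or from a task that must not see LWP.  Assumes xgetbv()/xsetbv()
 * wrappers for the XGETBV/XSETBV instructions and the LWP state bit
 * being bit 62, as described in the AMD documentation.
 */
#define XCR_XFEATURE_ENABLED_MASK       0
#define XSTATE_LWP                      (1ULL << 62)

static void set_lwp_enabled(bool enable)
{
        u64 xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);

        if (enable)
                xcr0 |= XSTATE_LWP;
        else
                xcr0 &= ~XSTATE_LWP;

        xsetbv(XCR_XFEATURE_ENABLED_MASK, xcr0);
}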

> > There are a couple of other items as well:
> >
> > - The LWP_CFG has other features as well, such as the ability to aggregate
> > events amongst cores. This is not exposed either. This looks like a lower
> > prio, optional item which could be offered after the first patches went
> > upstream.
>
> I don't see that anywhere in the specification, where did you find that?

There's a COREID field in the LWP_CFG MSR, which the spec says should be
initialized by the OS to the local APIC ID. I did not see this done in the
patches: they just enable LWP.

The core ID allows a buffering model where software can use a single target
buffer with multiple threads logging into it. The CoreID field is then used to
demultiplex which core the event originated from.

The LWP_CFG::COREID field has a reset value of 0, so with the current patches,
if the BIOS forgets to initialize the MSR (not unheard of), we are left with
only partially working LWP: events whose COREID is all zeroes and no ability
to demux.

( Furthermore, COREID has a hardware limit of 256, so any system bigger than
  256 CPUs should gracefully bail out should software attempt to set up a
  shared buffer for, say, 512 cores. )

(I have looked at the revision 3.08 PDF.)
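
For illustration, the per-cpu initialization being asked for would look
roughly like this. The MSR number (0xc0000105) and the COREID field position
(assumed to be bits 39:32) are assumptions based on the AMD documentation,
not something lifted from these patches, and must be checked against the spec:

/*
 * Kernel-side sketch: program the COREID field of the LWP_CFG MSR on
 * each CPU so that events logged into a shared buffer can be
 * demultiplexed by the core they originated from.  COREID is 8 bits
 * wide, hence the 256-core limit mentioned above.
 */
#define MSR_AMD64_LWP_CFG       0xc0000105
#define LWP_CFG_COREID_SHIFT    32
#define LWP_CFG_COREID_MASK     (0xffULL << LWP_CFG_COREID_SHIFT)

static void lwp_cfg_set_coreid(void *unused)
{
        u64 cfg;

        rdmsrl(MSR_AMD64_LWP_CFG, cfg);
        cfg &= ~LWP_CFG_COREID_MASK;
        cfg |= (u64)hard_smp_processor_id() << LWP_CFG_COREID_SHIFT;
        wrmsrl(MSR_AMD64_LWP_CFG, cfg);
}

static void __init lwp_cfg_init(void)
{
        /* run on every CPU at boot, before LWP is handed to user-space */
        on_each_cpu(lwp_cfg_set_coreid, NULL, 1);
}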

> > - like we do it for PEBS with the perf_attr.precise attribute, it would be nice
> > to report not RIP+1 but the real RIP itself. On Intel we use LBR to discover
> > the previous instruction, this might not be possible on AMD CPUs.
> >
> > One solution would be to disassemble the sampled instruction and approximate
> > the previous one by assuming that it's the preceding instruction (for
> > branches and calls this might not be true). If we do this then the event::FUS
> > bit has to be taken into account - in case the CPU has fused the instruction
> > and we have a two instructions delay in reporting.
> >
> > In any case, this is an optional item too and v1 support can be merged
> > without trying to implement precise RIP support.
> >
> > - there are a few interesting looking event details that we'd want to expose
> > in a generalized manner: branch taken/not taken bit, branch prediction
> > hit/miss bit, etc.
> >
> > This too is optional.
> >
> > - The LWPVAL instruction allows the user-space generation of samples. There
> > needs to be a matching generic event for it, which is then inserted into the
> > perf ring-buffer. Similarly, LWPINS needs to have a matching generic record
> > as well, so that user-space can decode it.
> >
> > This too looks optional to me.
> >
> > - You'd eventually want to expose the randomization (bits 60-63 in the LWPCB)
> > feature as well, via an attribute bit. Ditto for filtering such as cache
> > latency filtering, which looks the most useful. The low/high IP filter could
> > be exposed as well. All optional. For remaining featurities if there's no sane
> > way to expose them generally we can expose a raw event field as
> > well and have a raw event configuration space to twiddle these details.
> >
> > In general LWP is pretty neat and i agree that we want to offer it, it offers
> > access to five top categories of hw events (which we also have generalized):
> >
> > - instructions
> > - branches
> > - the most important types of cache misses
> > - CPU cycles
> > - constant (bus) cycles
> >
> > - user-space generated events/samples
> >
> > So it will fit nicely into our existing scheme of how we handle PMU features
> > and generalizations.
>
> I don't quite understand what you are proposing here. [...]

See Joerg's mail and my mail to Joerg; they outline the model. Check out how we
support PEBS today - LWP support will look quite similar to it, the main
difference being how the buffer is allocated (for LWP it's a userspace buffer
in the target task's mm, shared with the monitoring task, which can be the
traced task itself) and of course the hw-specific configuration and parsing of
the events.
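
For context, this is how a precise (PEBS-backed) sample stream is requested
through the standardized interface today; the suggestion above is that LWP
samples would eventually arrive through the same perf mmap ring buffer. The
event choice and period here are arbitrary - only the interface matters:

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/*
 * Open a precise cycles event the way PEBS is consumed today: the
 * kernel fills an mmap'ed ring buffer with samples, and the tool only
 * talks to the generic perf interface.  An LWP backend would surface
 * its samples through the same channel.
 */
static int open_precise_cycles(pid_t pid)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.sample_period = 100000;
        attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
        attr.precise_ip = 2;    /* ask for precise RIP (PEBS on Intel) */

        return syscall(__NR_perf_event_open, &attr, pid, -1, -1, 0);
}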

> [...] The LWPCB is controlled by the user space application that traces
> itself, so all of it is already exposed by the hardware. The samples are
> directly logged to the user space buffer by the hardware, so there is no work
> to do for the kernel here. Any post-processing of the samples (for precise
> RIP or such) needs to be done in the user space.

Yes, this is similar to the PEBS and BTS model: there too the hardware logs
automatically, and the kernel gets a threshold interrupt (or drains the queue
explicitly).

> We had some discussions about how to make LWP more accessible to users.
> Having LWP support in perf would certainly be nice, but the implementation
> would be very much different from that for other PMUs. LWP does almost
> everything in hardware that perf does in the kernel.

Yes, it should be supported similarly to how we support PEBS and BTS today.
This is a problem that has already been solved; the AMD LWP model is a newer
incarnation of the concept.

> As I said before, with this patch I'm enabling a new instruction set and
> associated extended state. How exactly user programs use it, and how it might
> fit into existing PMU APIs and tools is not really that important now.

How support for a new x86 CPU resource is integrated into the kernel is highly
important and relevant to me both as an x86 maintainer and as a perf/PMU
maintainer.

Thanks,

Ingo

2011-05-18 18:02:31

by Andreas Herrmann

[permalink] [raw]
Subject: Re: [RFC v3 0/8] x86, xsave: rework of extended state handling, LWP support

On Tue, May 17, 2011 at 01:30:20PM +0200, Ingo Molnar wrote:
>
> * Hans Rosenfeld <[email protected]> wrote:
>
> > Hi,
> >
> > On Thu, Apr 07, 2011 at 03:23:05AM -0400, Ingo Molnar wrote:
> > >
> > > FYI, the bits in tip:x86/xsave crash on boot on an AMD X2 testbox:
> >
> > > Full crashlog and kernel config attached. I've excluded x86/xsave from
> > > tip:master for now.
> >
> > this issue was fixed a few weeks ago.
> >
> > Are there any plans to include x86/xsave into tip:master again?
>
> Regarding the LWP bits, that branch was indeed excluded because of that crash,
> while re-checking the branch today i noticed at least one serious design error
> in it, which makes me reconsider the whole thing:

Independent of all the concerns and useful comments regarding LWP ... it seems
to me that patches 1-7 are still of use (cleaning up the save/restore code and
making it more maintainable).

Wouldn't it make sense to add patches 1-7 to a branch that is tested
with linux-next to find potential regressions?


Andreas