2016-03-04 18:15:52

by Yu-cheng Yu

Subject: [PATCH v4 0/10] x86/xsaves: Fix XSAVES known issues

XSAVES is a kernel-mode instruction. It offers a compacted xsave-area
format and a memory-write optimization. These patches fix known issues in
the first implementation. They are intended for discussion and feedback
before actually being applied.

Version 4 adds a new patch (#9) to fix missing XSAVE legacy region
offset/size values.

Patches 1, 2, and 4 convert between the kernel-mode xstate area and
signal frames.

Patch 3 fixes optimization issues that XSAVES introduced in the
init_fpstate buffer.

Patches 5 and 6 deal with xstate component offsets.

Patch 7 converts between the kernel-mode xstate area and ptrace frames.

Patch 8 fixes the xstate component offset printout.

Patch 9 fixes missing offset/size values for the XSAVE legacy region.

Patch 10 re-enables XSAVES.

Yu-cheng Yu (10):
x86/xsaves: Define and use user_xstate_size for xstate size in signal
context
x86/xsaves: Rename xstate_size to kernel_xstate_size to explicitly
distinguish xstate size in kernel from user space
x86/xsaves: Keep init_fpstate.xsave.header.xfeatures as zero for init
optimization
x86/xsaves: Introduce a new check that allows correct xstates copy
from kernel to user directly
x86/xsaves: Align xstate components according to CPUID
x86/xsaves: Supervisor state component offset
x86/xsaves: Fix PTRACE frames for XSAVES
x86/xsaves: Fix XSTATE component offset print out
x86/xsaves: Fix xstate_offsets, xstate_sizes for non-extended states
x86/xsaves: Re-enable XSAVES

arch/x86/include/asm/fpu/types.h | 2 +
arch/x86/include/asm/fpu/xstate.h | 8 +-
arch/x86/include/asm/processor.h | 3 +-
arch/x86/kernel/fpu/core.c | 6 +-
arch/x86/kernel/fpu/init.c | 32 +--
arch/x86/kernel/fpu/regset.c | 56 ++++--
arch/x86/kernel/fpu/signal.c | 69 ++++++-
arch/x86/kernel/fpu/xstate.c | 411 ++++++++++++++++++++++++++++++--------
8 files changed, 449 insertions(+), 138 deletions(-)

--
1.9.1


2016-03-04 18:16:09

by Yu-cheng Yu

Subject: [PATCH v4 01/10] x86/xsaves: Define and use user_xstate_size for xstate size in signal context

If "xsaves" is enabled, kernel always uses compacted format of xsave area.
But user space still uses standard format of xsave area. Thus, xstate size
in kernel's xsave area is smaller than xstate size in user's xsave area.
The xstate in signal frame should be in standard format for user's signal
handler to access.

In no "xsaves" case, xsave area in both user space and kernel space are in
standard format. Therefore, user's and kernel's xstate sizes are equal.

In "xsaves" case, xsave area in user space is in standard format while
xsave area in kernel space is in compacted format. Therefore, kernel's
xstate size is smaller than user's xstate size.

So here is the problem: the kernel currently assumes that its own xstate
size is also the signal frame's xstate size. That is harmless in the
non-"xsaves" case, but it is an issue in the "xsaves" case because the
kernel's xstate size is smaller than the user's xstate size. In
fpu__alloc_mathframe(), a too-small fpstate buffer is allocated for the
standard-format xstate in the signal frame. The kernel then saves only
part of the xstate registers into this smaller user fpstate buffer, and
the user sees only part of the xstate registers in the signal context. A
similar issue happens after returning from the signal handler: the kernel
restores only part of the xstate registers from the user's fpstate buffer
in the signal frame.

This patch defines and uses user_xstate_size for the xstate size in the
signal frame. It is read from the value returned in EBX by CPUID leaf
0x0D, subleaf 0x0. That is the size required by the states currently
enabled in XCR0, and it may differ from ECX (the maximum size for all
supported states) when states at the end of the xsave area are not
enabled. This value is the size XSAVE needs to save all enabled user
states in the legacy/standard format.
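
As a quick illustration (not part of the patch), a minimal user-space
sketch that reads the same CPUID leaf could look like the following. It
assumes GCC's <cpuid.h> and its __cpuid_count() macro:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID leaf 0x0D, subleaf 0: user (XCR0) xstate sizes. */
	__cpuid_count(0x0d, 0, eax, ebx, ecx, edx);

	/* EBX: bytes XSAVE needs for the states currently enabled in XCR0. */
	printf("standard-format size, enabled states:       %u bytes\n", ebx);
	/* ECX: bytes needed if every supported user state were enabled. */
	printf("standard-format size, all supported states: %u bytes\n", ecx);

	return 0;
}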

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Yu-cheng Yu <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
---
arch/x86/include/asm/fpu/xstate.h | 1 -
arch/x86/include/asm/processor.h | 1 +
arch/x86/kernel/fpu/init.c | 5 ++-
arch/x86/kernel/fpu/signal.c | 26 ++++++++++----
arch/x86/kernel/fpu/xstate.c | 71 ++++++++++++++++++++++++---------------
5 files changed, 67 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index af30fde..c6667f2 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -39,7 +39,6 @@
#define REX_PREFIX
#endif

-extern unsigned int xstate_size;
extern u64 xfeatures_mask;
extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 20c11d1..01e127e 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -366,6 +366,7 @@ DECLARE_PER_CPU(struct irq_stack *, softirq_stack);
#endif /* X86_64 */

extern unsigned int xstate_size;
+extern unsigned int user_xstate_size;

struct perf_event;

diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 6d9f0a7..4ac2561 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -193,7 +193,7 @@ static void __init fpu__init_task_struct_size(void)
}

/*
- * Set up the xstate_size based on the legacy FPU context size.
+ * Set up the user and kernel xstate_size based on the legacy FPU context size.
*
* We set this up first, and later it will be overwritten by
* fpu__init_system_xstate() if the CPU knows about xstates.
@@ -224,6 +224,9 @@ static void __init fpu__init_system_xstate_size_legacy(void)
else
xstate_size = sizeof(struct fregs_state);
}
+
+ user_xstate_size = xstate_size;
+
/*
* Quirk: we don't yet handle the XSAVES* instructions
* correctly, as we don't correctly convert between
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 31c6a60..ee6d662 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -31,7 +31,7 @@ static inline int check_for_xstate(struct fxregs_state __user *buf,
/* Check for the first magic field and other error scenarios. */
if (fx_sw->magic1 != FP_XSTATE_MAGIC1 ||
fx_sw->xstate_size < min_xstate_size ||
- fx_sw->xstate_size > xstate_size ||
+ fx_sw->xstate_size > user_xstate_size ||
fx_sw->xstate_size > fx_sw->extended_size)
return -1;

@@ -88,7 +88,7 @@ static inline int save_xstate_epilog(void __user *buf, int ia32_frame)
if (!use_xsave())
return err;

- err |= __put_user(FP_XSTATE_MAGIC2, (__u32 *)(buf + xstate_size));
+ err |= __put_user(FP_XSTATE_MAGIC2, (__u32 *)(buf + user_xstate_size));

/*
* Read the xfeatures which we copied (directly from the cpu or
@@ -125,7 +125,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
else
err = copy_fregs_to_user((struct fregs_state __user *) buf);

- if (unlikely(err) && __clear_user(buf, xstate_size))
+ if (unlikely(err) && __clear_user(buf, user_xstate_size))
err = -EFAULT;
return err;
}
@@ -175,8 +175,19 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
if (ia32_fxstate)
copy_fxregs_to_kernel(&tsk->thread.fpu);
} else {
+ /*
+ * It is a *bug* if kernel uses compacted-format for xsave
+ * area and we copy it out directly to a signal frame. It
+ * should have been handled above by saving the registers
+ * directly.
+ */
+ if (boot_cpu_has(X86_FEATURE_XSAVES)) {
+ WARN_ONCE(1, "x86/fpu: saving compacted-format xsave area to a signal frame!\n");
+ return -1;
+ }
+
fpstate_sanitize_xstate(&tsk->thread.fpu);
- if (__copy_to_user(buf_fx, xsave, xstate_size))
+ if (__copy_to_user(buf_fx, xsave, user_xstate_size))
return -1;
}

@@ -341,7 +352,8 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)

static inline int xstate_sigframe_size(void)
{
- return use_xsave() ? xstate_size + FP_XSTATE_MAGIC2_SIZE : xstate_size;
+ return use_xsave() ? user_xstate_size + FP_XSTATE_MAGIC2_SIZE :
+ user_xstate_size;
}

/*
@@ -385,12 +397,12 @@ fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
*/
void fpu__init_prepare_fx_sw_frame(void)
{
- int size = xstate_size + FP_XSTATE_MAGIC2_SIZE;
+ int size = user_xstate_size + FP_XSTATE_MAGIC2_SIZE;

fx_sw_reserved.magic1 = FP_XSTATE_MAGIC1;
fx_sw_reserved.extended_size = size;
fx_sw_reserved.xfeatures = xfeatures_mask;
- fx_sw_reserved.xstate_size = xstate_size;
+ fx_sw_reserved.xstate_size = user_xstate_size;

if (config_enabled(CONFIG_IA32_EMULATION) ||
config_enabled(CONFIG_X86_32)) {
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index d425cda5..9d41c0f 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -35,6 +35,8 @@ static unsigned int xstate_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] =
static unsigned int xstate_sizes[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1};
static unsigned int xstate_comp_offsets[sizeof(xfeatures_mask)*8];

+unsigned int user_xstate_size;
+
/*
* Clear all of the X86_FEATURE_* bits that are unavailable
* when the CPU has no XSAVE support.
@@ -159,7 +161,7 @@ void fpstate_sanitize_xstate(struct fpu *fpu)
*/
while (xfeatures) {
if (xfeatures & 0x1) {
- int offset = xstate_offsets[feature_bit];
+ int offset = xstate_comp_offsets[feature_bit];
int size = xstate_sizes[feature_bit];

memcpy((void *)fx + offset,
@@ -518,8 +520,9 @@ static void do_extra_xstate_size_checks(void)
XSTATE_WARN_ON(paranoid_xstate_size != xstate_size);
}

+
/*
- * Calculate total size of enabled xstates in XCR0/xfeatures_mask.
+ * Get total size of enabled xstates in XCR0/xfeatures_mask.
*
* Note the SDM's wording here. "sub-function 0" only enumerates
* the size of the *user* states. If we use it to size a buffer
@@ -529,34 +532,33 @@ static void do_extra_xstate_size_checks(void)
* Note that we do not currently set any bits on IA32_XSS so
* 'XCR0 | IA32_XSS == XCR0' for now.
*/
-static unsigned int __init calculate_xstate_size(void)
+static unsigned int __init get_xsaves_size(void)
{
unsigned int eax, ebx, ecx, edx;
- unsigned int calculated_xstate_size;
+ /*
+ * - CPUID function 0DH, sub-function 1:
+ * EBX enumerates the size (in bytes) required by
+ * the XSAVES instruction for an XSAVE area
+ * containing all the state components
+ * corresponding to bits currently set in
+ * XCR0 | IA32_XSS.
+ */
+ cpuid_count(XSTATE_CPUID, 1, &eax, &ebx, &ecx, &edx);
+ return ebx;
+}

- if (!cpu_has_xsaves) {
- /*
- * - CPUID function 0DH, sub-function 0:
- * EBX enumerates the size (in bytes) required by
- * the XSAVE instruction for an XSAVE area
- * containing all the *user* state components
- * corresponding to bits currently set in XCR0.
- */
- cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
- calculated_xstate_size = ebx;
- } else {
- /*
- * - CPUID function 0DH, sub-function 1:
- * EBX enumerates the size (in bytes) required by
- * the XSAVES instruction for an XSAVE area
- * containing all the state components
- * corresponding to bits currently set in
- * XCR0 | IA32_XSS.
- */
- cpuid_count(XSTATE_CPUID, 1, &eax, &ebx, &ecx, &edx);
- calculated_xstate_size = ebx;
- }
- return calculated_xstate_size;
+static unsigned int __init get_xsave_size(void)
+{
+ unsigned int eax, ebx, ecx, edx;
+ /*
+ * - CPUID function 0DH, sub-function 0:
+ * EBX enumerates the size (in bytes) required by
+ * the XSAVE instruction for an XSAVE area
+ * containing all the *user* state components
+ * corresponding to bits currently set in XCR0.
+ */
+ cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
+ return ebx;
}

/*
@@ -576,7 +578,15 @@ static bool is_supported_xstate_size(unsigned int test_xstate_size)
static int init_xstate_size(void)
{
/* Recompute the context size for enabled features: */
- unsigned int possible_xstate_size = calculate_xstate_size();
+ unsigned int possible_xstate_size;
+ unsigned int xsave_size;
+
+ xsave_size = get_xsave_size();
+
+ if (cpu_has_xsaves)
+ possible_xstate_size = get_xsaves_size();
+ else
+ possible_xstate_size = xsave_size;

/* Ensure we have the space to store all enabled: */
if (!is_supported_xstate_size(possible_xstate_size))
@@ -588,6 +598,11 @@ static int init_xstate_size(void)
*/
xstate_size = possible_xstate_size;
do_extra_xstate_size_checks();
+
+ /*
+ * User space is always in standard format.
+ */
+ user_xstate_size = xsave_size;
return 0;
}

--
1.9.1

2016-03-04 18:16:56

by Yu-cheng Yu

Subject: [PATCH v4 02/10] x86/xsaves: Rename xstate_size to kernel_xstate_size to explicitly distinguish xstate size in kernel from user space

User space uses the standard-format xsave area, so the fpstate in the
signal frame must have the standard-format size.

To explicitly distinguish the xstate size in kernel space from the one in
user space, rename xstate_size to kernel_xstate_size. This patch does not
fix a bug; it only makes the kernel code clearer.

The xsave area sizes are now held in two global variables:

kernel_xstate_size (previously xstate_size): the size of the xsave area
allocated in the kernel
user_xstate_size: the size of the xsave area used by user space.

In the non-"xsaves" case, the xsave areas in both user space and kernel
space are in standard format. Therefore, kernel_xstate_size and
user_xstate_size are equal.

In the "xsaves" case, the xsave area in user space is in standard format
while the xsave area in kernel space is in compacted format. Therefore,
the kernel's xstate size is less than the user's xstate size.

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Yu-cheng Yu <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
---
arch/x86/include/asm/processor.h | 2 +-
arch/x86/kernel/fpu/core.c | 6 +++---
arch/x86/kernel/fpu/init.c | 18 +++++++++---------
arch/x86/kernel/fpu/signal.c | 2 +-
arch/x86/kernel/fpu/xstate.c | 8 ++++----
5 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 01e127e..68a5fa4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -365,7 +365,7 @@ DECLARE_PER_CPU(struct irq_stack *, hardirq_stack);
DECLARE_PER_CPU(struct irq_stack *, softirq_stack);
#endif /* X86_64 */

-extern unsigned int xstate_size;
+extern unsigned int kernel_xstate_size;
extern unsigned int user_xstate_size;

struct perf_event;
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index d25097c..faa00c0 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -214,7 +214,7 @@ void fpstate_init(union fpregs_state *state)
return;
}

- memset(state, 0, xstate_size);
+ memset(state, 0, kernel_xstate_size);

if (cpu_has_fxsr)
fpstate_init_fxstate(&state->fxsave);
@@ -238,7 +238,7 @@ static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
* leak into the child task:
*/
if (use_eager_fpu())
- memset(&dst_fpu->state.xsave, 0, xstate_size);
+ memset(&dst_fpu->state.xsave, 0, kernel_xstate_size);

/*
* Save current FPU registers directly into the child
@@ -258,7 +258,7 @@ static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
*/
preempt_disable();
if (!copy_fpregs_to_fpstate(dst_fpu)) {
- memcpy(&src_fpu->state, &dst_fpu->state, xstate_size);
+ memcpy(&src_fpu->state, &dst_fpu->state, kernel_xstate_size);
fpregs_deactivate(src_fpu);
}
preempt_enable();
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 4ac2561..b5952d5 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -143,8 +143,8 @@ static void __init fpu__init_system_generic(void)
* This is inherent to the XSAVE architecture which puts all state
* components into a single, continuous memory block:
*/
-unsigned int xstate_size;
-EXPORT_SYMBOL_GPL(xstate_size);
+unsigned int kernel_xstate_size;
+EXPORT_SYMBOL_GPL(kernel_xstate_size);

/* Get alignment of the TYPE. */
#define TYPE_ALIGN(TYPE) offsetof(struct { char x; TYPE test; }, test)
@@ -176,7 +176,7 @@ static void __init fpu__init_task_struct_size(void)
* Add back the dynamically-calculated register state
* size.
*/
- task_size += xstate_size;
+ task_size += kernel_xstate_size;

/*
* We dynamically size 'struct fpu', so we require that
@@ -193,7 +193,7 @@ static void __init fpu__init_task_struct_size(void)
}

/*
- * Set up the user and kernel xstate_size based on the legacy FPU context size.
+ * Set up the user and kernel xstate sizes based on the legacy FPU context size.
*
* We set this up first, and later it will be overwritten by
* fpu__init_system_xstate() if the CPU knows about xstates.
@@ -206,7 +206,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
on_boot_cpu = 0;

/*
- * Note that xstate_size might be overwriten later during
+ * Note that xstate sizes might be overwriten later during
* fpu__init_system_xstate().
*/

@@ -217,15 +217,15 @@ static void __init fpu__init_system_xstate_size_legacy(void)
*/
setup_clear_cpu_cap(X86_FEATURE_XSAVE);
setup_clear_cpu_cap(X86_FEATURE_XSAVEOPT);
- xstate_size = sizeof(struct swregs_state);
+ kernel_xstate_size = sizeof(struct swregs_state);
} else {
if (cpu_has_fxsr)
- xstate_size = sizeof(struct fxregs_state);
+ kernel_xstate_size = sizeof(struct fxregs_state);
else
- xstate_size = sizeof(struct fregs_state);
+ kernel_xstate_size = sizeof(struct fregs_state);
}

- user_xstate_size = xstate_size;
+ user_xstate_size = kernel_xstate_size;

/*
* Quirk: we don't yet handle the XSAVES* instructions
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index ee6d662..0fbf60c 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -261,7 +261,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
int ia32_fxstate = (buf != buf_fx);
struct task_struct *tsk = current;
struct fpu *fpu = &tsk->thread.fpu;
- int state_size = xstate_size;
+ int state_size = kernel_xstate_size;
u64 xfeatures = 0;
int fx_only = 0;

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 9d41c0f..b8d6d98 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -517,7 +517,7 @@ static void do_extra_xstate_size_checks(void)
*/
paranoid_xstate_size += xfeature_size(i);
}
- XSTATE_WARN_ON(paranoid_xstate_size != xstate_size);
+ XSTATE_WARN_ON(paranoid_xstate_size != kernel_xstate_size);
}


@@ -596,7 +596,7 @@ static int init_xstate_size(void)
* The size is OK, we are definitely going to use xsave,
* make it known to the world that we need more space.
*/
- xstate_size = possible_xstate_size;
+ kernel_xstate_size = possible_xstate_size;
do_extra_xstate_size_checks();

/*
@@ -659,14 +659,14 @@ void __init fpu__init_system_xstate(void)
return;
}

- update_regset_xstate_info(xstate_size, xfeatures_mask);
+ update_regset_xstate_info(kernel_xstate_size, xfeatures_mask);
fpu__init_prepare_fx_sw_frame();
setup_init_fpu_buf();
setup_xstate_comp();

pr_info("x86/fpu: Enabled xstate features 0x%llx, context size is %d bytes, using '%s' format.\n",
xfeatures_mask,
- xstate_size,
+ kernel_xstate_size,
cpu_has_xsaves ? "compacted" : "standard");
}

--
1.9.1

2016-03-04 18:17:14

by Yu-cheng Yu

Subject: [PATCH v4 03/10] x86/xsaves: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization

Keep init_fpstate.xsave.header.xfeatures as zero for the init
optimization. This matters because the init optimization is implemented
in the processor: if the bit corresponding to an xstate component in
XSTATE_BV is 0, that component is in its init state and is not read from
memory into the processor during XRSTOR/XRSTORS. This has a large impact
on context-switch performance.
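
To make the rule above concrete, here is a small user-space sketch
(illustration only, not kernel code) that, given an XSTATE_BV value and
the mask of enabled components, shows what XRSTOR would do with each
component; the masks in main() are made up:

#include <stdio.h>

static void show_xrstor_behavior(unsigned long long xstate_bv,
				 unsigned long long enabled_mask)
{
	int i;

	for (i = 0; i < 64; i++) {
		unsigned long long bit = 1ULL << i;

		if (!(enabled_mask & bit))
			continue;
		if (xstate_bv & bit)
			printf("component %d: restored from memory\n", i);
		else
			printf("component %d: set to init state, not read\n", i);
	}
}

int main(void)
{
	/* Hypothetical masks: FP/SSE/AVX enabled, only FP and SSE in use. */
	show_xrstor_behavior(0x3, 0x7);
	return 0;
}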

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Yu-cheng Yu <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
---
arch/x86/kernel/fpu/xstate.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index b8d6d98..25e11a1 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -311,13 +311,11 @@ static void __init setup_init_fpu_buf(void)
setup_xstate_features();
print_xstate_features();

- if (cpu_has_xsaves) {
+ if (cpu_has_xsaves)
init_fpstate.xsave.header.xcomp_bv = (u64)1 << 63 | xfeatures_mask;
- init_fpstate.xsave.header.xfeatures = xfeatures_mask;
- }

/*
- * Init all the features state with header_bv being 0x0
+ * Init all the features state with header.xfeatures being 0x0
*/
copy_kernel_to_xregs_booting(&init_fpstate.xsave);

--
1.9.1

2016-03-04 18:17:20

by Yu-cheng Yu

Subject: [PATCH v4 06/10] x86/xsaves: Supervisor state component offset

CPUID function 0x0d, subleaf i (i > 1), returns in EBX the offset of
xstate component i. Zero is returned for a supervisor state. A supervisor
state can only be saved by XSAVES, and XSAVES uses a compacted format, so
there is no fixed offset for a supervisor state. This patch checks and
makes sure a supervisor state's offset is not recorded or misused. This
has no effect in practice, as we currently use no supervisor states, but
it is good to fix.
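
A user-space counterpart of that check (a sketch only, assuming GCC's
<cpuid.h>; it mirrors the helper added below) makes the point visible:
for a supervisor component, ECX[0] is set and the EBX offset reads as
zero:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;
	int i;

	/* Extended xstate components occupy subleaves 2..62 of leaf 0x0D. */
	for (i = 2; i <= 62; i++) {
		__cpuid_count(0x0d, i, eax, ebx, ecx, edx);
		if (!eax)
			continue;	/* component not reported */

		/*
		 * EAX = size, EBX = standard-format offset (zero for a
		 * supervisor component), ECX[0] = supervisor flag.
		 */
		printf("component %2d: size %4u, offset %4u, %s\n",
		       i, eax, ebx,
		       (ecx & 1) ? "supervisor (no fixed offset)" : "user");
	}
	return 0;
}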

Signed-off-by: Yu-cheng Yu <[email protected]>
---
arch/x86/include/asm/fpu/types.h | 2 ++
arch/x86/include/asm/fpu/xstate.h | 3 ++
arch/x86/kernel/fpu/xstate.c | 62 ++++++++++++++++++++++++---------------
3 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 1c6f6ac..11466cf 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -108,6 +108,7 @@ enum xfeature {
XFEATURE_OPMASK,
XFEATURE_ZMM_Hi256,
XFEATURE_Hi16_ZMM,
+ XFEATURE_PT,

XFEATURE_MAX,
};
@@ -120,6 +121,7 @@ enum xfeature {
#define XFEATURE_MASK_OPMASK (1 << XFEATURE_OPMASK)
#define XFEATURE_MASK_ZMM_Hi256 (1 << XFEATURE_ZMM_Hi256)
#define XFEATURE_MASK_Hi16_ZMM (1 << XFEATURE_Hi16_ZMM)
+#define XFEATURE_MASK_PT (1 << XFEATURE_PT)

#define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
#define XFEATURE_MASK_AVX512 (XFEATURE_MASK_OPMASK \
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index c6667f2..b4f5d94 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -18,6 +18,9 @@
#define XSAVE_YMM_SIZE 256
#define XSAVE_YMM_OFFSET (XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET)

+/* Supervisor features */
+#define XFEATURE_MASK_SUPERVISOR (XFEATURE_MASK_PT)
+
/* Supported features which support lazy state saving */
#define XFEATURE_MASK_LAZY (XFEATURE_MASK_FP | \
XFEATURE_MASK_SSE)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 6e42b87..aaab0d3 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -95,6 +95,27 @@ int cpu_has_xfeatures(u64 xfeatures_needed, const char **feature_name)
}
EXPORT_SYMBOL_GPL(cpu_has_xfeatures);

+static int xfeature_is_supervisor(int xfeature_nr)
+{
+ /*
+ * We currently do not support supervisor states, but if
+ * we did, we could find out like this.
+ *
+ * SDM says: If state component i is a user state component,
+ * ECX[0] return 0; if state component i is a supervisor
+ * state component, ECX[0] returns 1.
+ */
+ u32 eax, ebx, ecx, edx;
+
+ cpuid_count(XSTATE_CPUID, xfeature_nr, &eax, &ebx, &ecx, &edx);
+ return !!(ecx & 1);
+}
+
+static int xfeature_is_user(int xfeature_nr)
+{
+ return !xfeature_is_supervisor(xfeature_nr);
+}
+
/*
* When executing XSAVEOPT (or other optimized XSAVE instructions), if
* a processor implementation detects that an FPU state component is still
@@ -213,7 +234,14 @@ static void __init setup_xstate_features(void)
continue;

cpuid_count(XSTATE_CPUID, i, &eax, &ebx, &ecx, &edx);
- xstate_offsets[i] = ebx;
+
+ /*
+ * If an xfeature is supervisor state, the offset
+ * in ebx is invalid. We leave it to -1.
+ */
+ if (xfeature_is_user(i))
+ xstate_offsets[i] = ebx;
+
xstate_sizes[i] = eax;
/*
* In our xstate size checks, we assume that the
@@ -357,32 +385,20 @@ static void __init setup_init_fpu_buf(void)
copy_xregs_to_kernel_booting(&init_fpstate.xsave);
}

-static int xfeature_is_supervisor(int xfeature_nr)
-{
- /*
- * We currently do not support supervisor states, but if
- * we did, we could find out like this.
- *
- * SDM says: If state component i is a user state component,
- * ECX[0] return 0; if state component i is a supervisor
- * state component, ECX[0] returns 1.
- u32 eax, ebx, ecx, edx;
- cpuid_count(XSTATE_CPUID, xfeature_nr, &eax, &ebx, &ecx, &edx;
- return !!(ecx & 1);
- */
- return 0;
-}
-/*
-static int xfeature_is_user(int xfeature_nr)
-{
- return !xfeature_is_supervisor(xfeature_nr);
-}
-*/
-
static int xfeature_uncompacted_offset(int xfeature_nr)
{
u32 eax, ebx, ecx, edx;

+ /*
+ * Only XSAVES supports supervisor states and it uses compacted
+ * format. Checking a supervisor state's uncompacted offset is
+ * an error.
+ */
+ if (XFEATURE_MASK_SUPERVISOR & (1 << xfeature_nr)) {
+ WARN_ONCE(1, "No fixed offset for xstate %d\n", xfeature_nr);
+ return -1;
+ }
+
CHECK_XFEATURE(xfeature_nr);
cpuid_count(XSTATE_CPUID, xfeature_nr, &eax, &ebx, &ecx, &edx);
return ebx;
--
1.9.1

2016-03-04 18:17:33

by Yu-cheng Yu

Subject: [PATCH v4 08/10] x86/xsaves: Fix XSTATE component offset print out

The component offset printout was incorrect for XSAVES. Correct it and
move it into a separate function.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
arch/x86/kernel/fpu/xstate.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index b9d4d59..2e80d6f 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -251,8 +251,6 @@ static void __init setup_xstate_features(void)
WARN_ONCE(last_good_offset > xstate_offsets[i],
"x86/fpu: misordered xstate at %d\n", last_good_offset);
last_good_offset = xstate_offsets[i];
-
- printk(KERN_INFO "x86/fpu: xstate_offset[%d]: %4d, xstate_sizes[%d]: %4d\n", i, ebx, i, eax);
}
}

@@ -355,6 +353,21 @@ static void __init setup_xstate_comp(void)
}

/*
+ * Print out xstate component offsets and sizes
+ */
+static void __init print_xstate_offset_size(void)
+{
+ int i;
+
+ for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
+ if (!xfeature_enabled(i))
+ continue;
+ pr_info("x86/fpu: xstate_offset[%d]: %4d, xstate_sizes[%d]: %4d\n",
+ i, xstate_comp_offsets[i], i, xstate_sizes[i]);
+ }
+}
+
+/*
* setup the xstate image representing the init state
*/
static void __init setup_init_fpu_buf(void)
@@ -687,6 +700,7 @@ void __init fpu__init_system_xstate(void)
fpu__init_prepare_fx_sw_frame();
setup_init_fpu_buf();
setup_xstate_comp();
+ print_xstate_offset_size();

pr_info("x86/fpu: Enabled xstate features 0x%llx, context size is %d bytes, using '%s' format.\n",
xfeatures_mask,
--
1.9.1

2016-03-04 18:17:30

by Yu-cheng Yu

Subject: [PATCH v4 07/10] x86/xsaves: Fix PTRACE frames for XSAVES

XSAVES is a kernel instruction and uses the compacted format. For ptrace,
the kernel should present standard-format, non-supervisor state data.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
arch/x86/include/asm/fpu/xstate.h | 4 +
arch/x86/kernel/fpu/regset.c | 56 +++++++++---
arch/x86/kernel/fpu/xstate.c | 173 +++++++++++++++++++++++++++++++++++++-
3 files changed, 217 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index b4f5d94..a5cd808 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -50,5 +50,9 @@ extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
void fpu__xstate_clear_all_cpu_caps(void);
void *get_xsave_addr(struct xregs_state *xsave, int xstate);
const void *get_xsave_field_ptr(int xstate_field);
+int copyout_from_xsaves(unsigned int pos, unsigned int count, void *kbuf,
+ void __user *ubuf, const struct xregs_state *xsave);
+int copyin_to_xsaves(const void *kbuf, const void __user *ubuf,
+ struct xregs_state *xsave);

#endif
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 0bc3490..61fe8e9 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -4,6 +4,7 @@
#include <asm/fpu/internal.h>
#include <asm/fpu/signal.h>
#include <asm/fpu/regset.h>
+#include <asm/fpu/xstate.h>

/*
* The xstateregs_active() routine is the same as the regset_fpregs_active() routine,
@@ -82,21 +83,30 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_xsave)
return -ENODEV;

+ xsave = &fpu->state.xsave;
+
fpu__activate_fpstate_read(fpu);

- xsave = &fpu->state.xsave;
+ if (boot_cpu_has(X86_FEATURE_XSAVES)) {
+ ret = copyout_from_xsaves(pos, count, kbuf, ubuf, xsave);
+ } else {
+ fpstate_sanitize_xstate(fpu);
+
+ /*
+ * Copy the 48 bytes defined by the software into the xsave
+ * area in the thread struct, so that we can copy the whole
+ * area to user using one user_regset_copyout().
+ */
+ memcpy(&xsave->i387.sw_reserved,
+ xstate_fx_sw_bytes, sizeof(xstate_fx_sw_bytes));
+
+ /*
+ * Copy the xstate memory layout.
+ */
+ ret = user_regset_copyout(&pos,
+ &count, &kbuf, &ubuf, xsave, 0, -1);
+ }

- /*
- * Copy the 48bytes defined by the software first into the xstate
- * memory layout in the thread struct, so that we can copy the entire
- * xstateregs to the user using one user_regset_copyout().
- */
- memcpy(&xsave->i387.sw_reserved,
- xstate_fx_sw_bytes, sizeof(xstate_fx_sw_bytes));
- /*
- * Copy the xstate memory layout.
- */
- ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, xsave, 0, -1);
return ret;
}

@@ -111,11 +121,29 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_xsave)
return -ENODEV;

- fpu__activate_fpstate_write(fpu);
+ /*
+ * A whole standard-format XSAVE buffer is needed.
+ */
+ if ((pos != 0) || (count < user_xstate_size))
+ return -EFAULT;

xsave = &fpu->state.xsave;

- ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, xsave, 0, -1);
+ fpu__activate_fpstate_write(fpu);
+
+ if (boot_cpu_has(X86_FEATURE_XSAVES))
+ ret = copyin_to_xsaves(kbuf, ubuf, xsave);
+ else
+ ret = user_regset_copyin(&pos,
+ &count, &kbuf, &ubuf, xsave, 0, -1);
+
+ /*
+ * In case of failure, mark all states as init.
+ */
+
+ if (ret)
+ fpstate_init(&fpu->state);
+
/*
* mxcsr reserved bits must be masked to zero for security reasons.
*/
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index aaab0d3..b9d4d59 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -677,7 +677,13 @@ void __init fpu__init_system_xstate(void)
return;
}

- update_regset_xstate_info(kernel_xstate_size, xfeatures_mask);
+ /*
+ * Update info used for ptrace frames; use standard-format size and no
+ * supervisor xstates.
+ */
+ update_regset_xstate_info(user_xstate_size,
+ xfeatures_mask & ~XFEATURE_MASK_SUPERVISOR);
+
fpu__init_prepare_fx_sw_frame();
setup_init_fpu_buf();
setup_xstate_comp();
@@ -701,6 +707,15 @@ void fpu__resume_cpu(void)
}

/*
+ * Get an xstate component address for either kernel or user mode.
+ */
+static void *get_xsave_addr_no_check(const struct xregs_state *xsave,
+ int feature_nr)
+{
+ return (void *)xsave + xstate_comp_offsets[feature_nr];
+}
+
+/*
* Given the xsave area and a state inside, this function returns the
* address of the state.
*
@@ -748,7 +763,7 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
if (!(xsave->header.xfeatures & xstate_feature))
return NULL;

- return (void *)xsave + xstate_comp_offsets[feature_nr];
+ return get_xsave_addr_no_check(xsave, feature_nr);
}
EXPORT_SYMBOL_GPL(get_xsave_addr);

@@ -783,3 +798,157 @@ const void *get_xsave_field_ptr(int xsave_state)

return get_xsave_addr(&fpu->state.xsave, xsave_state);
}
+
+/*
+ * This is similar to user_regset_copyout(), but will not add offset to
+ * the source data pointer or increment pos, count, kbuf, and ubuf.
+ */
+static inline int xstate_copyout(unsigned int pos, unsigned int count,
+ void *kbuf, void __user *ubuf,
+ const void *data, const int start_pos,
+ const int end_pos)
+{
+ if ((count == 0) || (pos < start_pos))
+ return 0;
+
+ if (end_pos < 0 || pos < end_pos) {
+ unsigned int copy =
+ (end_pos < 0 ? count : min(count, end_pos - pos));
+
+ if (kbuf)
+ memcpy(kbuf + pos, data, copy);
+ else if (__copy_to_user(ubuf + pos, data, copy))
+ return -EFAULT;
+ }
+ return 0;
+}
+
+/*
+ * Convert from kernel XSAVES compacted format to standard format and copy
+ * to a ptrace buffer. It supports partial copy but pos always starts from
+ * zero. This is called from xstateregs_get() and there we check the cpu
+ * has XSAVES.
+ */
+int copyout_from_xsaves(unsigned int pos, unsigned int count, void *kbuf,
+ void __user *ubuf, const struct xregs_state *xsave)
+{
+ unsigned int offset, size;
+ int ret, i;
+ struct xstate_header header;
+
+ /*
+ * Currently copy_regset_to_user() starts from pos 0.
+ */
+ if (unlikely(pos != 0))
+ return -EFAULT;
+
+ /*
+ * The destination is a ptrace buffer; we put in only user xstates.
+ */
+ memset(&header, 0, sizeof(header));
+ header.xfeatures = xsave->header.xfeatures;
+ header.xfeatures &= ~XFEATURE_MASK_SUPERVISOR;
+
+ /*
+ * Copy xregs_state->header.
+ */
+ offset = offsetof(struct xregs_state, header);
+ size = sizeof(header);
+
+ ret = xstate_copyout(offset, size, kbuf, ubuf, &header, 0, count);
+
+ if (ret)
+ return ret;
+
+ for (i = 0; i < XFEATURE_MAX; i++) {
+ /*
+ * Copy only in-use xstates.
+ */
+ if (((header.xfeatures >> i) & 1) && xfeature_enabled(i)) {
+ void *src = get_xsave_addr_no_check(xsave, i);
+
+ offset = xstate_offsets[i];
+ size = xstate_sizes[i];
+
+ ret = xstate_copyout(offset, size, kbuf, ubuf, src, 0,
+ count);
+
+ if (ret)
+ return ret;
+
+ if (offset + size >= count)
+ break;
+ }
+ }
+
+ /*
+ * Fill xsave->i387.sw_reserved value for ptrace frame.
+ */
+ offset = offsetof(struct fxregs_state, sw_reserved);
+ size = sizeof(xstate_fx_sw_bytes);
+
+ ret = xstate_copyout(offset, size, kbuf, ubuf, xstate_fx_sw_bytes, 0,
+ count);
+
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+/*
+ * Convert from a ptrace standard-format buffer to kernel XSAVES format
+ * and copy to the target thread. This is called from xstateregs_set() and
+ * there we check the cpu has XSAVES and a whole standard-sized buffer
+ * exists.
+ */
+int copyin_to_xsaves(const void *kbuf, const void __user *ubuf,
+ struct xregs_state *xsave)
+{
+ unsigned int offset, size;
+ int i;
+ u64 xfeatures;
+
+ offset = offsetof(struct xregs_state, header);
+ size = sizeof(xfeatures);
+
+ if (kbuf)
+ memcpy(&xfeatures, kbuf + offset, size);
+ else if (__copy_from_user(&xfeatures, ubuf + offset, size))
+ return -EFAULT;
+
+ /*
+ * Reject if the user tries to set any supervisor xstates.
+ */
+ if (xfeatures & XFEATURE_MASK_SUPERVISOR)
+ return -EINVAL;
+
+ for (i = 0; i < XFEATURE_MAX; i++) {
+ u64 mask = ((u64)1 << i);
+
+ if ((xfeatures & mask) && xfeature_enabled(i)) {
+ void *dst = get_xsave_addr_no_check(xsave, i);
+
+ offset = xstate_offsets[i];
+ size = xstate_sizes[i];
+
+ if (kbuf)
+ memcpy(dst, kbuf + offset, size);
+ else if (__copy_from_user(dst, ubuf + offset, size))
+ return -EFAULT;
+ }
+ }
+
+ /*
+ * The state that came in from userspace was user-state only.
+ * Mask all the user states out of 'xfeatures'.
+ */
+ xsave->header.xfeatures &= XFEATURE_MASK_SUPERVISOR;
+
+ /*
+ * Add back in the features that came in from userspace.
+ */
+ xsave->header.xfeatures |= xfeatures;
+
+ return 0;
+}
--
1.9.1

2016-03-04 18:17:28

by Yu-cheng Yu

Subject: [PATCH v4 09/10] x86/xsaves: Fix xstate_offsets, xstate_sizes for legacy components

The arrays xstate_offsets[] and xstate_sizes[] record XSAVE area
offsets and sizes. Values for legacy components i387 and XMMs were
not initialized. Fix it.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
arch/x86/kernel/fpu/xstate.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 2e80d6f..06618ea 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -228,6 +228,15 @@ static void __init setup_xstate_features(void)
/* start at the beginnning of the "extended state" */
unsigned int last_good_offset = offsetof(struct xregs_state,
extended_state_area);
+ /*
+ * The FP xstates and SSE xstates are legacy states. They are always
+ * in the fixed offsets in the xsave area in either compacted form
+ * or standard form.
+ */
+ xstate_offsets[0] = 0;
+ xstate_sizes[0] = offsetof(struct fxregs_state, xmm_space);
+ xstate_offsets[1] = xstate_sizes[0];
+ xstate_sizes[1] = FIELD_SIZEOF(struct fxregs_state, xmm_space);

for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++) {
if (!xfeature_enabled(i))
--
1.9.1

2016-03-04 18:17:26

by Yu-cheng Yu

Subject: [PATCH v4 10/10] x86/xsaves: Re-enable XSAVES

We did not handle the XSAVES* instructions correctly. There were issues
in converting between the standard and compacted formats when interfacing
with user space. These issues have been corrected.

Add a WARN_ONCE() to make it clear that XSAVES supervisor states are not
yet implemented.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
arch/x86/kernel/fpu/init.c | 15 ---------------
arch/x86/kernel/fpu/xstate.c | 10 ++++++++++
2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index b5952d5..6bfa059 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -226,21 +226,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
}

user_xstate_size = kernel_xstate_size;
-
- /*
- * Quirk: we don't yet handle the XSAVES* instructions
- * correctly, as we don't correctly convert between
- * standard and compacted format when interfacing
- * with user-space - so disable it for now.
- *
- * The difference is small: with recent CPUs the
- * compacted format is only marginally smaller than
- * the standard FPU state format.
- *
- * ( This is easy to backport while we are fixing
- * XSAVES* support. )
- */
- setup_clear_cpu_cap(X86_FEATURE_XSAVES);
}

/*
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 06618ea..541f56e 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -204,6 +204,16 @@ void fpu__init_cpu_xstate(void)
if (!cpu_has_xsave || !xfeatures_mask)
return;

+ /*
+ * Make it clear that XSAVES supervisor states are not yet
+ * implemented should anyone expect it to work by changing
+ * bits in XFEATURE_MASK_* macros and XCR0.
+ */
+ WARN_ONCE((xfeatures_mask & XFEATURE_MASK_SUPERVISOR),
+ "x86/fpu: XSAVES supervisor states are not yet implemented.\n");
+
+ xfeatures_mask &= ~XFEATURE_MASK_SUPERVISOR;
+
cr4_set_bits(X86_CR4_OSXSAVE);
xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures_mask);
}
--
1.9.1

2016-03-04 18:17:16

by Yu-cheng Yu

Subject: [PATCH v4 04/10] x86/xsaves: Introduce a new check that allows correct xstates copy from kernel to user directly

XSAVES is a kernel instruction and uses a compacted format. When working
with user space, the kernel should provide standard-format, non-supervisor
state data. We cannot do __copy_to_user() from a compacted-format kernel
xstate area to a signal frame.

Note that the path to copy_fpstate_to_sigframe() already checks whether
the thread has used the FPU, but add a WARN_ONCE() there to detect any
potential misuse, i.e. saving invalid register values directly to a
signal frame.

Dave Hansen proposed this method to simplify copying xstate directly to
user space.

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Yu-cheng Yu <[email protected]>
---
arch/x86/kernel/fpu/signal.c | 41 ++++++++++++++++++++++++++++++++++++++++-
1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 0fbf60c..09945f1 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -130,6 +130,45 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
return err;
}

+static int may_copy_fpregs_to_sigframe(void)
+{
+ /*
+ * In signal handling path, the kernel already checks if
+ * FPU instructions have been used before it calls
+ * copy_fpstate_to_sigframe(). We check this here again
+ * to detect any potential mis-use and saving invalid
+ * register values directly to a signal frame.
+ */
+ WARN_ONCE(!current->thread.fpu.fpstate_active,
+ "direct FPU save with no math use\n");
+
+ /*
+ * In the case that we are using a compacted kernel
+ * xsave area, we can not copy the thread.fpu.state
+ * directly to userspace and *must* save it from the
+ * registers directly.
+ */
+ if (boot_cpu_has(X86_FEATURE_XSAVES))
+ return 1;
+
+ /*
+ * fpregs_active() means "Can I use the FPU hardware
+ * without taking a device-not-available exception?" This
+ * means that saving the registers directly will be
+ * cheaper than copying their contents out of
+ * thread.fpu.state.
+ *
+ * Note that fpregs_active() is inherently racy and may
+ * become false at any time. If this race happens, we
+ * will take a harmless device-not-available exception
+ * when we attempt the FPU save instruction.
+ */
+ if (fpregs_active())
+ return 1;
+
+ return 0;
+}
+
/*
* Save the fpu, extended register state to the user signal frame.
*
@@ -167,7 +206,7 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
sizeof(struct user_i387_ia32_struct), NULL,
(struct _fpstate_32 __user *) buf) ? -1 : 1;

- if (fpregs_active()) {
+ if (may_copy_fpregs_to_sigframe()) {
/* Save the live register state to the user directly. */
if (copy_fpregs_to_sigframe(buf_fx))
return -1;
--
1.9.1

2016-03-04 18:18:39

by Yu-cheng Yu

Subject: [PATCH v4 05/10] x86/xsaves: Align xstate components according to CPUID

CPUID function 0x0d, subleaf i (i > 1), returns in ECX[1] the alignment
requirement of component i when the compacted format is used.

If ECX[1] is 0, component i is located immediately following the preceding
component. If ECX[1] is 1, component i is located on the next 64-byte
boundary following the preceding component.
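
As an illustration of that layout rule (a sketch only; the component
sizes and alignment flags below are made up rather than read from CPUID),
compacted offsets are computed like this:

#include <stdio.h>

#define ALIGN64(x)	(((x) + 63u) & ~63u)

int main(void)
{
	/* Hypothetical extended components: size and ECX[1] alignment bit. */
	unsigned int size[]    = { 256, 64, 512, 1024 };
	unsigned int aligned[] = { 0,   1,  0,   1    };
	/* The compacted extended region starts right after the 512-byte
	 * legacy area plus the 64-byte XSAVE header. */
	unsigned int offset = 576;
	int i;

	for (i = 0; i < 4; i++) {
		if (aligned[i])
			offset = ALIGN64(offset);
		printf("component %d: offset %4u, size %4u\n",
		       i, offset, size[i]);
		offset += size[i];
	}
	return 0;
}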

Signed-off-by: Yu-cheng Yu <[email protected]>
---
arch/x86/kernel/fpu/xstate.c | 60 +++++++++++++++++++++++---------------------
1 file changed, 32 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 25e11a1..6e42b87 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -252,6 +252,33 @@ static void __init print_xstate_features(void)
}

/*
+ * This check is important because it is easy to get XSTATE_*
+ * confused with XSTATE_BIT_*.
+ */
+#define CHECK_XFEATURE(nr) do { \
+ WARN_ON(nr < FIRST_EXTENDED_XFEATURE); \
+ WARN_ON(nr >= XFEATURE_MAX); \
+} while (0)
+
+/*
+ * We could cache this like xstate_size[], but we only use
+ * it here, so it would be a waste of space.
+ */
+static int xfeature_is_aligned(int xfeature_nr)
+{
+ u32 eax, ebx, ecx, edx;
+
+ CHECK_XFEATURE(xfeature_nr);
+ cpuid_count(XSTATE_CPUID, xfeature_nr, &eax, &ebx, &ecx, &edx);
+ /*
+ * The value returned by ECX[1] indicates the alignment
+ * of state component i when the compacted format
+ * of the extended region of an XSAVE area is used
+ */
+ return !!(ecx & 2);
+}
+
+/*
* This function sets up offsets and sizes of all extended states in
* xsave area. This supports both standard format and compacted format
* of the xsave aread.
@@ -288,10 +315,14 @@ static void __init setup_xstate_comp(void)
else
xstate_comp_sizes[i] = 0;

- if (i > FIRST_EXTENDED_XFEATURE)
+ if (i > FIRST_EXTENDED_XFEATURE) {
xstate_comp_offsets[i] = xstate_comp_offsets[i-1]
+ xstate_comp_sizes[i-1];

+ if (xfeature_is_aligned(i))
+ xstate_comp_offsets[i] =
+ ALIGN(xstate_comp_offsets[i], 64);
+ }
}
}

@@ -348,33 +379,6 @@ static int xfeature_is_user(int xfeature_nr)
}
*/

-/*
- * This check is important because it is easy to get XSTATE_*
- * confused with XSTATE_BIT_*.
- */
-#define CHECK_XFEATURE(nr) do { \
- WARN_ON(nr < FIRST_EXTENDED_XFEATURE); \
- WARN_ON(nr >= XFEATURE_MAX); \
-} while (0)
-
-/*
- * We could cache this like xstate_size[], but we only use
- * it here, so it would be a waste of space.
- */
-static int xfeature_is_aligned(int xfeature_nr)
-{
- u32 eax, ebx, ecx, edx;
-
- CHECK_XFEATURE(xfeature_nr);
- cpuid_count(XSTATE_CPUID, xfeature_nr, &eax, &ebx, &ecx, &edx);
- /*
- * The value returned by ECX[1] indicates the alignment
- * of state component i when the compacted format
- * of the extended region of an XSAVE area is used
- */
- return !!(ecx & 2);
-}
-
static int xfeature_uncompacted_offset(int xfeature_nr)
{
u32 eax, ebx, ecx, edx;
--
1.9.1

2016-04-29 18:09:26

by Dave Hansen

Subject: Re: [PATCH v4 0/10] x86/xsaves: Fix XSAVES known issues

Hi Folks,

I've heard through the grapevine that there's some concern that we
should not be bothering to enable XSAVES because there's not a
sufficient use case for it. Maybe it's meager today, but I still think
we should do it.

I'll try to lay out why.

Today, on every Skylake system, this patch saves 128 bytes in each
task_struct. If there were an Atom system with XSAVES it would save 384
bytes since there is no AVX support on Atom. If there were a future
processor which has an xstate _past_ AVX-512, but that does not have
AVX-512 itself, that savings goes up to 2048+384 bytes. I believe it is
*inevitable* that the savings will become substantial.
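
The per-task saving is easy to measure on a given machine; a user-space
sketch (assuming GCC's <cpuid.h> and a CPU that supports XSAVES, so that
leaf 0x0D subleaf 1 reports the compacted size) is:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;
	unsigned int std_size, compacted_size;

	/* Leaf 0x0D, subleaf 0: EBX = standard-format size (XSAVE). */
	__cpuid_count(0x0d, 0, eax, ebx, ecx, edx);
	std_size = ebx;

	/* Leaf 0x0D, subleaf 1: EBX = compacted-format size (XSAVES). */
	__cpuid_count(0x0d, 1, eax, ebx, ecx, edx);
	compacted_size = ebx;

	printf("standard: %u bytes, compacted: %u bytes, saving: %u bytes per task\n",
	       std_size, compacted_size, std_size - compacted_size);
	return 0;
}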

Plus, if the processors ever start supporting a supervisor state that we
_need_ in Linux, we have to have XSAVES support anyway.

It's inevitable that we _will_ need it.

Why do it today?

Now that Skylake is out, we _can_ get reasonable testing of this feature
from early adopters in the wild. If we turn this on today, and it
breaks, we break a relatively modest number of Skylake systems (1%? 2%?
0.1%?). Let's say we wait $X years when the benefits are greater. We
turn it on, and something breaks. We'll break 50% (or 40% or whatever)
of the systems in production.

Once we *HAVE* XSAVES support, it also opens up the possibilities for
doing things like dynamic XSAVE buffer allocation. For instance, let
threads that are not _using_ AVX-512 not waste the 2k of space for it.

So why wait?

2016-04-29 19:43:46

by Ingo Molnar

Subject: Re: [PATCH v4 0/10] x86/xsaves: Fix XSAVES known issues


* Dave Hansen <[email protected]> wrote:

> Hi Folks,
>
> I've heard through the grapevine that there's some concern that we
> should not be bothering to enable XSAVES because there's not a
> sufficient use case for it. [...]

So I have no fundamental objections against this series - I didn't apply it back
in March because not all patches had your Reviewed-by tag. Basically after you
sorted out all the XSAVE dynamic feature detection/sizing issues I was a happy
camper and have no objection against XSAVES.

Could you please send a refreshed version against the latest tip:master?

Thanks,

Ingo

2016-04-29 20:01:49

by Yu-cheng Yu

Subject: Re: [PATCH v4 0/10] x86/xsaves: Fix XSAVES known issues

On Fri, Apr 29, 2016 at 11:09:23AM -0700, Dave Hansen wrote:

> Once we *HAVE* XSAVES support, it also opens up the possibilities for
> doing things like dynamic XSAVE buffer allocation. For instance, let
> threads that are not _using_ AVX-512 not waste the 2k of space for it.

If we can somehow modify the exec* system calls to scan the executable
binary (in user space) and pass along a bitmask of the xfeatures the
binary uses, and XSAVES is enabled in the kernel, we can easily save a lot
of memory. The kernel only needs to allocate space for tasks that actually
use xstates; most of them do not.

2016-04-29 20:03:46

by Dave Hansen

Subject: Re: [PATCH v4 0/10] x86/xsaves: Fix XSAVES known issues

On 04/29/2016 12:57 PM, Yu-cheng Yu wrote:
> On Fri, Apr 29, 2016 at 11:09:23AM -0700, Dave Hansen wrote:
>> Once we *HAVE* XSAVES support, it also opens up the possibilities for
>> doing things like dynamic XSAVE buffer allocation. For instance, let
>> threads that are not _using_ AVX-512 not waste the 2k of space for
>> it.
>
> If we can somehow modify exec* system call to scan the executable
> binary (in user space) and pass along a bitmask containing xfeatures
> used in the binary, and XSAVES is enabled in the kernel, we can
> easily save a lot of memory. The kernel only needs to allocate space
> for tasks that actually use xstates; most of them do not.

That's not feasible. Think of dynamic libraries or just-in-time
compilers. What instruction set does /usr/bin/java use, for instance? :)

2016-04-29 20:09:11

by Dave Hansen

Subject: Re: [PATCH v4 04/10] x86/xsaves: Introduce a new check that allows correct xstates copy from kernel to user directly

On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> index 0fbf60c..09945f1 100644
> --- a/arch/x86/kernel/fpu/signal.c
> +++ b/arch/x86/kernel/fpu/signal.c
> @@ -130,6 +130,45 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
> return err;
> }
>
> +static int may_copy_fpregs_to_sigframe(void)
> +{
> + /*
> + * In signal handling path, the kernel already checks if
> + * FPU instructions have been used before it calls
> + * copy_fpstate_to_sigframe(). We check this here again
> + * to detect any potential mis-use and saving invalid
> + * register values directly to a signal frame.
> + */
> + WARN_ONCE(!current->thread.fpu.fpstate_active,
> + "direct FPU save with no math use\n");

This is probably an OK check for this _particular_ context (since this
context is all ready to copy_to_user() the fpu state). But is it good
generally? Why couldn't you have a !fpstate_active thread that _was_
fpregs_active?

Such a thread _could_ do a direct XSAVE with no issues.

2016-04-29 20:11:34

by Yu-cheng Yu

Subject: Re: [PATCH v4 0/10] x86/xsaves: Fix XSAVES known issues

On Fri, Apr 29, 2016 at 01:03:43PM -0700, Dave Hansen wrote:
> That's not feasible. Think of dynamic libraries or just-in-time
> compilers. What instruction set does /usr/bin/java use, for instance? :)

The java argument is true. In that case or when the bitmask is missing, we can allocate for all supported features.

2016-04-29 20:12:04

by Dave Hansen

Subject: Re: [PATCH v4 05/10] x86/xsaves: Align xstate components according to CPUID

On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
> CPUID function 0x0d, sub function (i, i > 1) returns in ecx[1] the
> alignment requirement of component i when the compacted format is used.
>
> If ecx[1] is 0, component i is located immediately following the preceding
> component. If ecx[1] is 1, component i is located on the next 64-byte
> boundary following the preceding component.

Reviewed-by: Dave Hansen <[email protected]>

2016-04-29 20:17:04

by Dave Hansen

Subject: Re: [PATCH v4 07/10] x86/xsaves: Fix PTRACE frames for XSAVES

On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
> + if (boot_cpu_has(X86_FEATURE_XSAVES)) {
> + ret = copyout_from_xsaves(pos, count, kbuf, ubuf, xsave);

On a higher level, we really should stop using
"boot_cpu_has(X86_FEATURE_XSAVES)" as a proxy for "the kernel XSAVE
buffer is in the XSAVES format". I think our use of the _hardware_
CPUID bit is confusing at least the KVM folks.

We probably want a software X86_FEATURE_OS_XSAVES or something.

2016-04-29 20:25:32

by Andy Lutomirski

Subject: Re: [PATCH v4 0/10] x86/xsaves: Fix XSAVES known issues

On Fri, Apr 29, 2016 at 1:07 PM, Yu-cheng Yu <[email protected]> wrote:
> On Fri, Apr 29, 2016 at 01:03:43PM -0700, Dave Hansen wrote:
>> That's not feasible. Think of dynamic libraries or just-in-time
>> compilers. What instruction set does /usr/bin/java use, for instance? :)
>
> The java argument is true. In that case or when the bitmask is missing, we can allocate for all supported features.
>

I actually want to see us moving in the direction of unconditionally
allocating everything on process startup. If we can stop using CR0.TS
entirely, I think everything will be better.

--Andy

2016-04-29 20:25:39

by Dave Hansen

Subject: Re: [PATCH v4 07/10] x86/xsaves: Fix PTRACE frames for XSAVES

On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
> + for (i = 0; i < XFEATURE_MAX; i++) {
> + /*
> + * Copy only in-use xstates.
> + */
> + if (((header.xfeatures >> i) & 1) && xfeature_enabled(i)) {
> + void *src = get_xsave_addr_no_check(xsave, i);

How could a bit in header.xfeatures get set if it is not set in
xfeature_enabled() aka xfeatures_mask aka XCR0?

...
> +int copyin_to_xsaves(const void *kbuf, const void __user *ubuf,
> + struct xregs_state *xsave)
> +{
> + unsigned int offset, size;
> + int i;
> + u64 xfeatures;
> +
> + offset = offsetof(struct xregs_state, header);
> + size = sizeof(xfeatures);
> +
> + if (kbuf)
> + memcpy(&xfeatures, kbuf + offset, size);
> + else if (__copy_from_user(&xfeatures, ubuf + offset, size))
> + return -EFAULT;
> +
> + /*
> + * Reject if the user tries to set any supervisor xstates.
> + */
> + if (xfeatures & XFEATURE_MASK_SUPERVISOR)
> + return -EINVAL;
> +
> + for (i = 0; i < XFEATURE_MAX; i++) {
> + u64 mask = ((u64)1 << i);
> +
> + if ((xfeatures & mask) && xfeature_enabled(i)) {
> + void *dst = get_xsave_addr_no_check(xsave, i);
> +
> + offset = xstate_offsets[i];
> + size = xstate_sizes[i];
> +
> + if (kbuf)
> + memcpy(dst, kbuf + offset, size);
> + else if (__copy_from_user(dst, ubuf + offset, size))
> + return -EFAULT;
> + }
> + }

If a caller tries to pass a non-enabled xfeature in, we appear to just
silently drop it and return success. Is that really what we want to do
or do we want to error out?

2016-04-29 20:26:56

by Dave Hansen

Subject: Re: [PATCH v4 08/10] x86/xsaves: Fix XSTATE component offset print out

On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
> Component offset print out was incorrect for XSAVES. Correct it and move
> to a separate function.

Reviewed-by: Dave Hansen <[email protected]>

2016-04-29 20:28:34

by Dave Hansen

Subject: Re: [PATCH v4 09/10] x86/xsaves: Fix xstate_offsets, xstate_sizes for legacy components

On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
> The arrays xstate_offsets[] and xstate_sizes[] record XSAVE area
> offsets and sizes. Values for legacy components i387 and XMMs were
> not initialized. Fix it.

Is this just a completeness thing or does it actually break something?

In any case:

Reviewed-by: Dave Hansen <[email protected]>

2016-04-29 20:32:19

by Dave Hansen

Subject: Re: [PATCH v4 10/10] x86/xsaves: Re-enable XSAVES

On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
> We did not handle XSAVES* instructions correctly. There were issues in
> converting between standard and compacted format when interfacing with
> user-space. These issues have been corrected.
>
> Add a WARN_ONCE() to make it clear that XSAVES supervisor states are not
> yet implemented.

The reason I haven't acked this patch is that I want to be _sure_ that
we've audited all of the call paths that access the XSAVE buffer to
ensure that they can all either handle the XSAVES format *or* don't care
for whatever reason.

Could you share the steps that you've taken to assure yourself that all
of the call paths are handled and we don't have more bugs?

FWIW, this was the single biggest lesson I learned from the failure the
last time this got turned on: we simply didn't go look for all the
places that the new format had to be handled. Let's be sure we don't
repeat that.

If we get this *wrong* in another user/kernel interface like we did for
ptrace and the signal save/restore and we ever enable a supervisor state
we've got an almost certain immediate root hole of some kind. I think
we need to exercise some serious caution here. Thank $DEITY we don't
have any supported supervisor states at the moment.

2016-04-29 20:37:26

by Dave Hansen

Subject: Re: [PATCH v4 0/10] x86/xsaves: Fix XSAVES known issues

On 04/29/2016 01:07 PM, Yu-cheng Yu wrote:
> On Fri, Apr 29, 2016 at 01:03:43PM -0700, Dave Hansen wrote:
>>> That's not feasible. Think of dynamic libraries or just-in-time
>>> compilers. What instruction set does /usr/bin/java use, for
>>> instance? :)
> The java argument is true. In that case or when the bitmask is
> missing, we can allocate for all supported features.

Remember, execve() doesn't replace the task_struct. How do we resize
the task_struct at execve() time? If /bin/bash doesn't use AVX, then
fork()s and execve()s an AVX-using program, what do we do?

2016-04-29 20:38:51

by Borislav Petkov

Subject: Re: [PATCH v4 07/10] x86/xsaves: Fix PTRACE frames for XSAVES

On Fri, Apr 29, 2016 at 01:16:59PM -0700, Dave Hansen wrote:
> We probably want a software X86_FEATURE_OS_XSAVES or something.

... or a simple variable.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2016-04-29 20:40:12

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v4 0/10] x86/xsaves: Fix XSAVES known issues

On 04/29/2016 01:25 PM, Andy Lutomirski wrote:
> On Fri, Apr 29, 2016 at 1:07 PM, Yu-cheng Yu <[email protected]> wrote:
>> On Fri, Apr 29, 2016 at 01:03:43PM -0700, Dave Hansen wrote:
>>> That's not feasible. Think of dynamic libraries or just-in-time
>>> compilers. What instruction set does /usr/bin/java use, for instance? :)
>>
>> The java argument is true. In that case or when the bitmask is
>> missing, we can allocate for all supported features.
>
> I actually want to see us moving in the direction of unconditionally
> allocating everything on process startup. If we can stop using CR0.TS
> entirely, I think everything will be better.

We can absolutely allocate the worst-case XSAVE buffer at task startup
for folks that never want to see a latency spike in the life of the app
no matter what.

But I also think it would be pretty nice if 'ls' didn't pay the 2k cost
to have AVX-512 state if it's not using AVX-512. We also don't have to
do this with CR0.TS. We'd actually use a combination of out-of-line
(not appended to task_struct) XSAVE buffers and XGETBV1 to check the
size of our XSAVE buffer before we call XSAVE* and resize it when needed.

Maybe nobody will ever care enough about 2kbytes/thread, though.
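
To make the XGETBV1 idea concrete, here is a rough sketch. It assumes an
out-of-line buffer whose size is tracked in a hypothetical
fpu->xstate_buf_size field; xgetbv(), xstate_sizes[], FXSAVE_SIZE,
XSAVE_HDR_SIZE and FIRST_EXTENDED_XFEATURE are used as in the existing FPU
code, and component alignment is ignored for simplicity:

	/*
	 * XGETBV with ECX=1 returns XCR0 & XINUSE, i.e. which user states
	 * are currently in a non-init configuration.
	 */
	static bool xsave_buf_large_enough(struct fpu *fpu)
	{
		u64 in_use = xgetbv(1);
		unsigned int need = FXSAVE_SIZE + XSAVE_HDR_SIZE;
		int i;

		for (i = FIRST_EXTENDED_XFEATURE; i < XFEATURE_MAX; i++)
			if (in_use & (1ULL << i))
				need += xstate_sizes[i];

		/* xstate_buf_size is a hypothetical per-task field. */
		return need <= fpu->xstate_buf_size;
	}

Before calling XSAVE*, the context-switch path would check
xsave_buf_large_enough() and grow the out-of-line buffer when it returns
false.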

2016-04-29 20:40:57

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86/xsaves: Fix PTRACE frames for XSAVES

On 04/29/2016 01:38 PM, Borislav Petkov wrote:
> On Fri, Apr 29, 2016 at 01:16:59PM -0700, Dave Hansen wrote:
>> > We probably want a software X86_FEATURE_OS_XSAVES or something.
> ... or a simple variable.

I think we do some instruction patching based on it, so I was just
suggesting the software X86_FEATURE because it would plug into the
existing scheme more easily.

2016-04-29 20:47:04

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86/xsaves: Fix PTRACE frames for XSAVES

On Fri, Apr 29, 2016 at 01:40:51PM -0700, Dave Hansen wrote:
> I think we do some instruction patching based on it, so I was just
> suggesting the software X86_FEATURE because it would plug into the
> existing scheme more easily.

Ah ok, that's fine then.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2016-04-29 20:49:50

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v4 0/10] x86/xsaves: Fix XSAVES known issues

On Fri, Apr 29, 2016 at 1:40 PM, Dave Hansen
<[email protected]> wrote:
> On 04/29/2016 01:25 PM, Andy Lutomirski wrote:
>> On Fri, Apr 29, 2016 at 1:07 PM, Yu-cheng Yu <[email protected]> wrote:
>>> On Fri, Apr 29, 2016 at 01:03:43PM -0700, Dave Hansen wrote:
>>>> That's not feasible. Think of dynamic libraries or just-in-time
>>>> compilers. What instruction set does /usr/bin/java use, for instance? :)
>>>
>>> The java argument is true. In that case or when the bitmask is
>>> missing, we can allocate for all supported features.
>>
>> I actually want to see us moving in the direction of unconditionally
>> allocating everything on process startup. If we can stop using CR0.TS
>> entirely, I think everything will be better.
>
> We can absolutely allocate the worst-case XSAVE buffer at task startup
> for folks that never want to see a latency spike in the life of the app
> no matter what.
>
> But I also think it would be pretty nice if 'ls' didn't pay the 2k cost
> to have AVX-512 state if it's not using AVX-512. We also don't have to
> do this with CR0.TS. We'd actually use a combination of out-of-line
> (not appended to task_struct) XSAVE buffers and XGETBV1 to check the
> size of our XSAVE buffer before we call XSAVE* and resize it when needed.
>
> Maybe nobody will ever care enough about 2kbytes/thread, though.

I suspect we're so far above 2k/thread that no one cares.

That being said, when I wrote this email, I wasn't thinking about
the compacted form at all. I think we should allocate a viable xstate
area of some sort on startup and use xsaves/xrstors/xsaveopt/whatever
without fiddling with TS and eagerly save and restore even if no
extended state whatsoever has been used. I'm certainly okay in
principle with reallocating.

However, what do we do if we run out of memory when trying to reallocate?

--
Andy Lutomirski
AMA Capital Management, LLC

2016-04-29 22:11:36

by Yu-cheng Yu

[permalink] [raw]
Subject: Re: [PATCH v4 09/10] x86/xsaves: Fix xstate_offsets, xstate_sizes for legacy components

On Fri, Apr 29, 2016 at 01:28:31PM -0700, Dave Hansen wrote:
> On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
> > The arrays xstate_offsets[] and xstate_sizes[] record XSAVE area
> > offsets and sizes. Values for legacy components i387 and XMMs were
> > not initialized. Fix it.
>
> Is this just a completeness thing or does it actually break something?

PTRACE format conversion needs them.

2016-04-29 22:13:36

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v4 09/10] x86/xsaves: Fix xstate_offsets, xstate_sizes for legacy components

On 04/29/2016 03:07 PM, Yu-cheng Yu wrote:
> On Fri, Apr 29, 2016 at 01:28:31PM -0700, Dave Hansen wrote:
>> On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
>>> The arrays xstate_offsets[] and xstate_sizes[] record XSAVE area
>>> offsets and sizes. Values for legacy components i387 and XMMs were
>>> not initialized. Fix it.
>>
>> Is this just a completeness thing or does it actually break something?
>
> PTRACE format conversion needs them.

If you respin these, can you please note that in the changelog and/or
comments?

2016-04-29 22:20:03

by Yu-cheng Yu

[permalink] [raw]
Subject: Re: [PATCH v4 09/10] x86/xsaves: Fix xstate_offsets, xstate_sizes for legacy components

On Fri, Apr 29, 2016 at 03:13:33PM -0700, Dave Hansen wrote:
> On 04/29/2016 03:07 PM, Yu-cheng Yu wrote:
> > On Fri, Apr 29, 2016 at 01:28:31PM -0700, Dave Hansen wrote:
> >> On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
> >>> The arrays xstate_offsets[] and xstate_sizes[] record XSAVE area
> >>> offsets and sizes. Values for legacy components i387 and XMMs were
> >>> not initialized. Fix it.
> >>
> >> Is this just a completeness thing or does it actually break something?
> >
> > PTRACE format conversion needs them.
>
> If you respin these, can you please note that in the changelog and/or
> comments?

Do you mean when it is rebased? I will do that.


2016-04-29 22:34:58

by Yu-cheng Yu

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86/xsaves: Fix PTRACE frames for XSAVES

On Fri, Apr 29, 2016 at 01:25:34PM -0700, Dave Hansen wrote:
> On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
> > + for (i = 0; i < XFEATURE_MAX; i++) {
> > + /*
> > + * Copy only in-use xstates.
> > + */
> > + if (((header.xfeatures >> i) & 1) && xfeature_enabled(i)) {
> > + void *src = get_xsave_addr_no_check(xsave, i);
>
> How could a bit in header.xfeatures get set if it is not set in
> xfeature_enabled() aka xfeatures_mask aka XCR0?

Do you mean, we should test xfeature_enabled(i) first, like,

if (xfeature_enabled(i) && ((header.xfeatures >> i) & 1)) ?

The result will be the same, like you said, if XCR0[i] is not set,
header.xfeatures[i] cannot be set. But if XCR0[i] is set,
header.xfeatures[i] can be cleared.

>
> If a caller tries to pass a non-enabled xfeature in, we appear to just
> silently drop it and return success. Is that really what we want to do
> or do we want to error out?

Let it fail. I will change it.

2016-04-29 22:36:19

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86/xsaves: Fix PTRACE frames for XSAVES

On 04/29/2016 03:30 PM, Yu-cheng Yu wrote:
> On Fri, Apr 29, 2016 at 01:25:34PM -0700, Dave Hansen wrote:
>> On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
>>> + for (i = 0; i < XFEATURE_MAX; i++) {
>>> + /*
>>> + * Copy only in-use xstates.
>>> + */
>>> + if (((header.xfeatures >> i) & 1) && xfeature_enabled(i)) {
>>> + void *src = get_xsave_addr_no_check(xsave, i);
>>
>> How could a bit in header.xfeatures get set if it is not set in
>> xfeature_enabled() aka xfeatures_mask aka XCR0?
>
> Do you mean, we should test xfeature_enabled(i) first, like,
>
> if (xfeature_enabled(i) && ((header.xfeatures >> i) & 1)) ?
>
> The result will be the same, like you said, if XCR0[i] is not set,
> header.xfeatures[i] cannot be set. But if XCR0[i] is set,
> header.xfeatures[i] can be cleared.

I think the xfeature_enabled(i) is probably redundant. Does it serve
any actual purpose?


2016-04-29 22:42:17

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v4 0/10] x86/xsaves: Fix XSAVES known issues

On 04/29/2016 01:49 PM, Andy Lutomirski wrote:
>> >
>> > But I also think it would be pretty nice if 'ls' didn't pay the 2k cost
>> > to have AVX-512 state if it's not using AVX-512. We also don't have to
>> > do this with CR0.TS. We'd actually use a combination of out-of-line
>> > (not appended to task_struct) XSAVE buffers and XGETBV1 to check the
>> > size of our XSAVE buffer before we call XSAVE* and resize it when needed.
>> >
>> > Maybe nobody will ever care enough about 2kbytes/thread, though.
> I suspect we're so far above 2k/thread that no one cares.
>
...
> However, what do we do if we run out when memory when trying to reallocate?

The thread has to die a horrible death. We can switch away from it, but
can never switch back to it. Well, we can switch to it; we just can't
return to userspace.

Actually, though... On my laptop 1/3 of the task_struct is XSAVE state
(it's ~3k). With AVX-512, ~3/5 of it will be XSAVE, and it will be ~5k
(>1 page). Breaking it into two pieces makes it overall less likely
that we'd have to fail an allocation.

We'd be in a situation where we probably can't fork *anyway* if that
happened.

2016-04-29 22:43:12

by Yu-cheng Yu

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86/xsaves: Fix PTRACE frames for XSAVES

On Fri, Apr 29, 2016 at 03:36:12PM -0700, Dave Hansen wrote:
> On 04/29/2016 03:30 PM, Yu-cheng Yu wrote:
> > On Fri, Apr 29, 2016 at 01:25:34PM -0700, Dave Hansen wrote:
> >> On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
> >>> + for (i = 0; i < XFEATURE_MAX; i++) {
> >>> + /*
> >>> + * Copy only in-use xstates.
> >>> + */
> >>> + if (((header.xfeatures >> i) & 1) && xfeature_enabled(i)) {
> >>> + void *src = get_xsave_addr_no_check(xsave, i);
> >>
> >> How could a bit in header.xfeatures get set if it is not set in
> >> xfeature_enabled() aka xfeatures_mask aka XCR0?
> >
> > Do you mean, we should test xfeature_enabled(i) first, like,
> >
> > if (xfeature_enabled(i) && ((header.xfeatures >> i) & 1)) ?
> >
> > The result will be the same, like you said, if XCR0[i] is not set,
> > header.xfeatures[i] cannot be set. But if XCR0[i] is set,
> > header.xfeatures[i] can be cleared.
>
> I think the xfeature_enabled(i) is probably redundant. Does it serve
> any actual purpose?

Got it.

2016-04-29 22:47:46

by Yu-cheng Yu

[permalink] [raw]
Subject: Re: [PATCH v4 04/10] x86/xsaves: Introduce a new check that allows correct xstates copy from kernel to user directly

On Fri, Apr 29, 2016 at 01:09:07PM -0700, Dave Hansen wrote:
> On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
> > diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> > index 0fbf60c..09945f1 100644
> > --- a/arch/x86/kernel/fpu/signal.c
> > +++ b/arch/x86/kernel/fpu/signal.c
> > @@ -130,6 +130,45 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
> > return err;
> > }
> >
> > +static int may_copy_fpregs_to_sigframe(void)
> > +{
> > + /*
> > + * In signal handling path, the kernel already checks if
> > + * FPU instructions have been used before it calls
> > + * copy_fpstate_to_sigframe(). We check this here again
> > + * to detect any potential misuse and avoid saving invalid
> > + * register values directly to a signal frame.
> > + */
> > + WARN_ONCE(!current->thread.fpu.fpstate_active,
> > + "direct FPU save with no math use\n");
>
> This is probably an OK check for this _particular_ context (since this
> context is all ready to copy_to_user() the fpu state). But is it good
> generally? Why couldn't you have a !fpstate_active thread that _was_
> fpregs_active?
>
> Such a thread _could_ do a direct XSAVE with no issues.

But it won't come to this function unless fpstate_active is true?

2016-04-29 23:17:00

by Yu-cheng Yu

[permalink] [raw]
Subject: Re: [PATCH v4 10/10] x86/xsaves: Re-enable XSAVES

On Fri, Apr 29, 2016 at 01:32:15PM -0700, Dave Hansen wrote:
> The reason I haven't acked this patch is that I want to be _sure_ that
> we've audited all of the call paths that access the XSAVE buffer to
> ensure that they can all either handle the XSAVES format *or* don't care
> for whatever reason.
>
> Could you share the steps that you've taken to assure yourself that all
> of the call paths are handled and we don't have more bugs?
>

We tested for signal, ptrace, context switch, avx, and mpx. We also ran
these tests with your audit patch to detect any format mismatch.
That said, I cannot be sure there are no more bugs. As you said, we want
to get this feature tested in the field and find potential issues early.

2016-04-30 00:36:51

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v4 04/10] x86/xsaves: Introduce a new check that allows correct xstates copy from kernel to user directly

On 04/29/2016 03:43 PM, Yu-cheng Yu wrote:
> On Fri, Apr 29, 2016 at 01:09:07PM -0700, Dave Hansen wrote:
>> On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
>>> +static int may_copy_fpregs_to_sigframe(void)
>>> +{
>>> + /*
>>> + * In signal handling path, the kernel already checks if
>>> + * FPU instructions have been used before it calls
>>> + * copy_fpstate_to_sigframe(). We check this here again
>>> + * to detect any potential misuse and avoid saving invalid
>>> + * register values directly to a signal frame.
>>> + */
>>> + WARN_ONCE(!current->thread.fpu.fpstate_active,
>>> + "direct FPU save with no math use\n");
>>
>> This is probably an OK check for this _particular_ context (since this
>> context is all ready to copy_to_user() the fpu state). But is it good
>> generally? Why couldn't you have a !fpstate_active thread that _was_
>> fpregs_active?
>>
>> Such a thread _could_ do a direct XSAVE with no issues.
>
> But it won't come to this function unless fpstate_active is true?

If may_copy_fpregs_to_sigframe() were called from a slightly different
context, or if we change the call-site, what breaks?

In other words, if we can still "may_copy_fpregs_to_sigframe()" no
matter the state of fpu.fpstate_active, then I don't think we should be
checking it in may_copy_fpregs_to_sigframe().
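
As an illustration only (not something from this patch set), one way to act
on that would be to keep the sanity check at the call site in
copy_fpstate_to_sigframe() and leave the helper itself context-free:

	/* Caller-side check: the signal path requires an active fpstate. */
	if (WARN_ON_ONCE(!current->thread.fpu.fpstate_active))
		return -EINVAL;

	if (may_copy_fpregs_to_sigframe())
		err = copy_fpregs_to_sigframe(buf);	/* direct XSAVE to the user frame */

The fragment assumes the caller's local variables (err, the user-space frame
pointer buf); the only point is where the fpstate_active check lives.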

2016-04-30 00:40:47

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v4 10/10] x86/xsaves: Re-enable XSAVES

On 04/29/2016 04:12 PM, Yu-cheng Yu wrote:
> On Fri, Apr 29, 2016 at 01:32:15PM -0700, Dave Hansen wrote:
>> The reason I haven't acked this patch is that I want to be _sure_ that
>> we've audited all of the call paths that access the XSAVE buffer to
>> ensure that they can all either handle the XSAVES format *or* don't care
>> for whatever reason.
>>
>> Could you share the steps that you've taken to assure yourself that all
>> of the call paths are handled and we don't have more bugs?
>
> We tested for signal, ptrace, context switch, avx, and mpx. We also ran
> these tests with your audit patch to detect any format mismatch.
> That said, I cannot be sure there are no more bugs. As you said, we want
> to get this feature tested in the field and find potential issues early.

That's better than what we had before, but it relies entirely on testing
coverage and runtime checks.

Is it too much to ask that you also take a look and audit all the places
the XSAVE buffer is accessed in the kernel and ensure that they either
have code to handle standard vs. compacted/supervisor or don't care for
some reason?

I did such an audit once upon a time, but I think it would be a good
exercise to repeat both by a second set of eyes and because some time
has passed.

2016-04-30 07:53:50

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v4 0/10] x86/xsaves: Fix XSAVES known issues


* Dave Hansen <[email protected]> wrote:

> But I also think it would be pretty nice if 'ls' didn't pay the 2k cost to have
> AVX-512 state if it's not using AVX-512. [...]

A C library might decide to use AVX-512 memset(). RAM is cheap, while allocation
complexity, especially in the kernel, has various other costs.

I mean, we should not worry about per-thread allocation sizes that are comparable
to the kernel stack size.

We can still use the compacted area handling instructions, because presumably
those are the fastest and are also the most optimized ones? But I wouldn't use
them to do dynamic allocation: just allocate the maximum possible FPU save area at
task creation time and never again worry about that detail.

Ok?

Thanks,

Ingo
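
A minimal sketch of how that worst-case size could be obtained, assuming
XSTATE_CPUID and cpuid_count() as used in the existing xstate code (where
the allocation itself hooks in at task-creation time is left open):

	static unsigned int max_xstate_size(void)
	{
		unsigned int eax, ebx, ecx, edx;

		/*
		 * CPUID.(EAX=0DH,ECX=0):ECX reports the XSAVE area size that
		 * would be needed if every feature this CPU supports were
		 * enabled in XCR0, i.e. the worst case for this machine.
		 */
		cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
		return ecx;
	}

Allocating max_xstate_size() bytes per task at creation time would make
later reallocation unnecessary, which is the trade-off argued for above.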