2010-02-02 02:03:33

by Suresh Siddha

[permalink] [raw]
Subject: [patch] x86: ptrace and core-dump extensions for xstate

Kernel ptrace, core dump extensions to support AVX state etc. This interface
(PTRACE_GETXSTATEREGS, PTRACE_SETXSTATEREGS, NT_X86_XSTATE) is designed to
support all the future state that gets supported using xsave/xrstor
infrastructure.

Looking at the memory layout saved by "xsave", one can't say which state
is represented in the memory layout. This is because if a particular state is
in init state, in the xsave hdr it can be represented by bit '0'. And hence
we can't really say by the xsave header wether a state is in init state or
the state is not saved in the memory layout.

And hence the xsave memory layout available through the PTRACE_GETXSTATEREGS
and the core dump(NT_X86_XSTATE) uses SW usable bytes [464..511] to convey
what state is represented in the memory layout.

First 8 bytes of the sw_usable_bytes[464..467] will be set to OS enabled xstate
mask(which is same as the 64bit mask returned by the xgetbv's xCR0).

For more information on how to use this API by users like debuggers and core
dump, please refer to comments in arch/x86/include/asm/ptrace-abi.h

Signed-off-by: Suresh Siddha <[email protected]>
Signed-off-by: Hongjiu Lu <[email protected]>
---

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index ebfb8a9..1cd5d43 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -32,9 +32,11 @@ extern void __math_state_restore(void);
extern void init_thread_xstate(void);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);

-extern user_regset_active_fn fpregs_active, xfpregs_active;
-extern user_regset_get_fn fpregs_get, xfpregs_get, fpregs_soft_get;
-extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set;
+extern user_regset_active_fn fpregs_active, xfpregs_active, xstateregs_active;
+extern user_regset_get_fn fpregs_get, xfpregs_get, fpregs_soft_get,
+ xstateregs_get;
+extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set,
+ xstateregs_set;

extern struct _fpx_sw_bytes fx_sw_reserved;
#ifdef CONFIG_IA32_EMULATION
diff --git a/arch/x86/include/asm/ptrace-abi.h b/arch/x86/include/asm/ptrace-abi.h
index 8672303..6fe60cb 100644
--- a/arch/x86/include/asm/ptrace-abi.h
+++ b/arch/x86/include/asm/ptrace-abi.h
@@ -80,6 +80,44 @@

#define PTRACE_SINGLEBLOCK 33 /* resume execution until next branch */

+/*
+ * Structure layout used in PTRACE_GETXSTATEREGS/PTRACE_SETXSTATEREGS is same
+ * as the memory layout of xsave used by the processor (except for the bytes
+ * 464..511 which can be used by the software). Size of the structure that users
+ * need to use for these two interfaces can be obtained by doing:
+ * cpuid_count(0xd, 0, &eax, &ptrace_xstateregs_struct_size, &ecx, &edx);
+ * i.e., cpuid.(eax=0xd,ecx=0).ebx will be the size that user (debuggers etc)
+ * need to use.
+ *
+ * And format of this structure will be like:
+ * struct {
+ * fxsave_bytes[0..463]
+ * sw_usable_bytes[464..511]
+ * xsave_hdr_bytes[512..575]
+ * avx_bytes[576..831]
+ * future_state etc
+ * }
+ *
+ * Same memory layout will be used for the coredump NT_X86_XSTATE representing
+ * the xstate registers.
+ *
+ * For now, only first 8 bytes of the sw_usable_bytes[464..467] will be used and
+ * will be set to OS enabled xstate mask(which is same as the 64bit mask
+ * returned by the xgetbv's xCR0). Users (analyzing core dump remotely etc)
+ * can use this mask aswell as the mask saved in the xstate_hdr bytes and
+ * interpret what states the processor/OS supports and what states are in
+ * modified/initialized conditions for the particular process/thread.
+ *
+ * Also when the user modifies certain state FP/SSE/etc through this
+ * PTRACE_SETXSTATEREGS, they must ensure that the xsave_hdr.xstate_bv
+ * bytes[512..519] of the above memory layout are updated correspondingly.
+ * i.e., for example when FP state is modified to a non-init state,
+ * xsave_hdr.xstate_bv's bit 0 must be set to '1', when SSE is modified to
+ * non-init state, xsave_hdr.xstate_bv's bit 1 must to be set to '1' etc..
+ */
+#define PTRACE_GETXSTATEREGS 34
+#define PTRACE_SETXSTATEREGS 35
+
#ifndef __ASSEMBLY__
#include <linux/types.h>

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 727acc1..9e26171 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -27,9 +27,11 @@
extern unsigned int xstate_size;
extern u64 pcntxt_mask;
extern struct xsave_struct *init_xstate_buf;
+extern u64 xstate_fx_sw_bytes[6];

extern void xsave_cntxt_init(void);
extern void xsave_init(void);
+extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
extern int init_fpu(struct task_struct *child);
extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
void __user *fpstate,
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index f2f8540..dffdf91 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -224,6 +224,85 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
return ret;
}

+int xstateregs_active(struct task_struct *target,
+ const struct user_regset *regset)
+{
+ return (cpu_has_xsave && tsk_used_math(target)) ? xstate_size : 0;
+}
+
+int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ void *kbuf, void __user *ubuf)
+{
+ int ret;
+
+ if (!cpu_has_xsave)
+ return -ENODEV;
+
+ ret = init_fpu(target);
+ if (ret)
+ return ret;
+
+ /*
+ * First copy the fxsave bytes 0..463
+ */
+ ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ &target->thread.xstate->xsave, 0,
+ (sizeof(struct i387_fxsave_struct) -
+ sizeof(xstate_fx_sw_bytes)));
+ /*
+ * Copy the 48bytes defined by software
+ */
+ ret |= user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ xstate_fx_sw_bytes,
+ (sizeof(struct i387_fxsave_struct) -
+ sizeof(xstate_fx_sw_bytes)),
+ sizeof(struct i387_fxsave_struct));
+ /*
+ * Copy the rest of xstate memory layout
+ */
+ ret |= user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ &target->thread.xstate->xsave.xsave_hdr,
+ sizeof(struct i387_fxsave_struct), -1);
+ return ret;
+}
+
+int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ const void *kbuf, const void __user *ubuf)
+{
+ int ret;
+ struct xsave_hdr_struct *xsave_hdr =
+ &current->thread.xstate->xsave.xsave_hdr;
+
+
+ if (!cpu_has_xsave)
+ return -ENODEV;
+
+ ret = init_fpu(target);
+ if (ret)
+ return ret;
+
+ set_stopped_child_used_math(target);
+
+ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+ &target->thread.xstate->xsave, 0, -1);
+
+ /*
+ * mxcsr reserved bits must be masked to zero for security reasons.
+ */
+ target->thread.xstate->fxsave.mxcsr &= mxcsr_feature_mask;
+
+ xsave_hdr->xstate_bv &= pcntxt_mask;
+ /*
+ * These bits must be zero.
+ */
+ xsave_hdr->reserved1[0] = xsave_hdr->reserved1[1] = 0;
+
+
+ return ret;
+}
+
#if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION

/*
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 73554a3..4bb5c1c 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -48,6 +48,7 @@ enum x86_regset {
REGSET_FP,
REGSET_XFP,
REGSET_IOPERM64 = REGSET_XFP,
+ REGSET_XSTATE,
REGSET_TLS,
REGSET_IOPERM32,
};
@@ -1208,6 +1209,20 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
0, sizeof(struct user_i387_struct),
datap);

+ case PTRACE_GETXSTATEREGS: /* Get the child extended state. */
+ return copy_regset_to_user(child,
+ task_user_regset_view(current),
+ REGSET_XSTATE,
+ 0, xstate_size,
+ datap);
+
+ case PTRACE_SETXSTATEREGS: /* Set the child extended state. */
+ return copy_regset_from_user(child,
+ task_user_regset_view(current),
+ REGSET_XSTATE,
+ 0, xstate_size,
+ datap);
+
#ifdef CONFIG_X86_32
case PTRACE_GETFPXREGS: /* Get the child extended FPU state. */
return copy_regset_to_user(child, &user_x86_32_view,
@@ -1537,6 +1552,16 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
sizeof(struct user32_fxsr_struct),
datap);

+ case PTRACE_GETXSTATEREGS: /* Get the child extended state. */
+ return copy_regset_to_user(child, &user_x86_32_view,
+ REGSET_XSTATE, 0, xstate_size,
+ datap);
+
+ case PTRACE_SETXSTATEREGS: /* Set the child extended state. */
+ return copy_regset_from_user(child, &user_x86_32_view,
+ REGSET_XSTATE, 0, xstate_size,
+ datap);
+
case PTRACE_GET_THREAD_AREA:
case PTRACE_SET_THREAD_AREA:
#ifdef CONFIG_X86_PTRACE_BTS
@@ -1560,7 +1585,7 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,

#ifdef CONFIG_X86_64

-static const struct user_regset x86_64_regsets[] = {
+static struct user_regset x86_64_regsets[] __read_mostly = {
[REGSET_GENERAL] = {
.core_note_type = NT_PRSTATUS,
.n = sizeof(struct user_regs_struct) / sizeof(long),
@@ -1573,6 +1598,12 @@ static const struct user_regset x86_64_regsets[] = {
.size = sizeof(long), .align = sizeof(long),
.active = xfpregs_active, .get = xfpregs_get, .set = xfpregs_set
},
+ [REGSET_XSTATE] = {
+ .core_note_type = NT_X86_XSTATE,
+ .size = sizeof(long), .align = sizeof(long),
+ .active = xstateregs_active, .get = xstateregs_get,
+ .set = xstateregs_set
+ },
[REGSET_IOPERM64] = {
.core_note_type = NT_386_IOPERM,
.n = IO_BITMAP_LONGS,
@@ -1598,7 +1629,7 @@ static const struct user_regset_view user_x86_64_view = {
#endif /* CONFIG_X86_64 */

#if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION
-static const struct user_regset x86_32_regsets[] = {
+static struct user_regset x86_32_regsets[] __read_mostly = {
[REGSET_GENERAL] = {
.core_note_type = NT_PRSTATUS,
.n = sizeof(struct user_regs_struct32) / sizeof(u32),
@@ -1617,6 +1648,12 @@ static const struct user_regset x86_32_regsets[] = {
.size = sizeof(u32), .align = sizeof(u32),
.active = xfpregs_active, .get = xfpregs_get, .set = xfpregs_set
},
+ [REGSET_XSTATE] = {
+ .core_note_type = NT_X86_XSTATE,
+ .size = sizeof(u32), .align = sizeof(u32),
+ .active = xstateregs_active, .get = xstateregs_get,
+ .set = xstateregs_set
+ },
[REGSET_TLS] = {
.core_note_type = NT_386_TLS,
.n = GDT_ENTRY_TLS_ENTRIES, .bias = GDT_ENTRY_TLS_MIN,
@@ -1639,6 +1676,18 @@ static const struct user_regset_view user_x86_32_view = {
};
#endif

+u64 xstate_fx_sw_bytes[6];
+void update_regset_xstate_info(unsigned int size, u64 xstate_mask)
+{
+#ifdef CONFIG_X86_64
+ x86_64_regsets[REGSET_XSTATE].n = size / sizeof(long);
+#endif
+#if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION
+ x86_32_regsets[REGSET_XSTATE].n = size / sizeof(u32);
+#endif
+ xstate_fx_sw_bytes[0] = xstate_mask;
+}
+
const struct user_regset_view *task_user_regset_view(struct task_struct *task)
{
#ifdef CONFIG_IA32_EMULATION
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index c5ee17e..782c3a3 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -337,6 +337,7 @@ void __ref xsave_cntxt_init(void)
cpuid_count(0xd, 0, &eax, &ebx, &ecx, &edx);
xstate_size = ebx;

+ update_regset_xstate_info(xstate_size, pcntxt_mask);
prepare_fx_sw_frame();

setup_xstate_init();
diff --git a/include/linux/elf.h b/include/linux/elf.h
index 0cc4d55..a8c4af0 100644
--- a/include/linux/elf.h
+++ b/include/linux/elf.h
@@ -361,6 +361,7 @@ typedef struct elf64_shdr {
#define NT_PPC_VSX 0x102 /* PowerPC VSX registers */
#define NT_386_TLS 0x200 /* i386 TLS slots (struct user_desc) */
#define NT_386_IOPERM 0x201 /* x86 io permission bitmap (1=deny) */
+#define NT_X86_XSTATE 0x202 /* x86 extended state using xsave */
#define NT_S390_HIGH_GPRS 0x300 /* s390 upper register halves */




2010-02-02 02:34:08

by Suresh Siddha

[permalink] [raw]
Subject: [tip:x86/ptrace] x86: ptrace and core-dump extensions for xstate

Commit-ID: c741196df8a42602965f781e0f09462de81742e2
Gitweb: http://git.kernel.org/tip/c741196df8a42602965f781e0f09462de81742e2
Author: Suresh Siddha <[email protected]>
AuthorDate: Mon, 1 Feb 2010 18:00:25 -0800
Committer: H. Peter Anvin <[email protected]>
CommitDate: Mon, 1 Feb 2010 18:26:54 -0800

x86: ptrace and core-dump extensions for xstate

Kernel ptrace, core dump extensions to support AVX state etc. This interface
(PTRACE_GETXSTATEREGS, PTRACE_SETXSTATEREGS, NT_X86_XSTATE) is designed to
support all the future state that gets supported using xsave/xrstor
infrastructure.

Looking at the memory layout saved by "xsave", one can't say which state
is represented in the memory layout. This is because if a particular state is
in init state, in the xsave hdr it can be represented by bit '0'. And hence
we can't really say by the xsave header wether a state is in init state or
the state is not saved in the memory layout.

And hence the xsave memory layout available through the PTRACE_GETXSTATEREGS
and the core dump(NT_X86_XSTATE) uses SW usable bytes [464..511] to convey
what state is represented in the memory layout.

First 8 bytes of the sw_usable_bytes[464..467] will be set to OS enabled xstate
mask(which is same as the 64bit mask returned by the xgetbv's xCR0).

For more information on how to use this API by users like debuggers and core
dump, please refer to comments in arch/x86/include/asm/ptrace-abi.h

Signed-off-by: Suresh Siddha <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Hongjiu Lu <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/include/asm/i387.h | 8 ++-
arch/x86/include/asm/ptrace-abi.h | 38 ++++++++++++++++++
arch/x86/include/asm/xsave.h | 2 +
arch/x86/kernel/i387.c | 79 +++++++++++++++++++++++++++++++++++++
arch/x86/kernel/ptrace.c | 53 ++++++++++++++++++++++++-
arch/x86/kernel/xsave.c | 1 +
include/linux/elf.h | 1 +
7 files changed, 177 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index ebfb8a9..1cd5d43 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -32,9 +32,11 @@ extern void __math_state_restore(void);
extern void init_thread_xstate(void);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);

-extern user_regset_active_fn fpregs_active, xfpregs_active;
-extern user_regset_get_fn fpregs_get, xfpregs_get, fpregs_soft_get;
-extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set;
+extern user_regset_active_fn fpregs_active, xfpregs_active, xstateregs_active;
+extern user_regset_get_fn fpregs_get, xfpregs_get, fpregs_soft_get,
+ xstateregs_get;
+extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set,
+ xstateregs_set;

extern struct _fpx_sw_bytes fx_sw_reserved;
#ifdef CONFIG_IA32_EMULATION
diff --git a/arch/x86/include/asm/ptrace-abi.h b/arch/x86/include/asm/ptrace-abi.h
index 8672303..447cfb6 100644
--- a/arch/x86/include/asm/ptrace-abi.h
+++ b/arch/x86/include/asm/ptrace-abi.h
@@ -80,6 +80,44 @@

#define PTRACE_SINGLEBLOCK 33 /* resume execution until next branch */

+/*
+ * Structure layout used in PTRACE_GETXSTATEREGS/PTRACE_SETXSTATEREGS is same
+ * as the memory layout of xsave used by the processor (except for the bytes
+ * 464..511 which can be used by the software). Size of the structure that users
+ * need to use for these two interfaces can be obtained by doing:
+ * cpuid_count(0xd, 0, &eax, &ptrace_xstateregs_struct_size, &ecx, &edx);
+ * i.e., cpuid.(eax=0xd,ecx=0).ebx will be the size that user (debuggers etc)
+ * need to use.
+ *
+ * And format of this structure will be like:
+ * struct {
+ * fxsave_bytes[0..463]
+ * sw_usable_bytes[464..511]
+ * xsave_hdr_bytes[512..575]
+ * avx_bytes[576..831]
+ * future_state etc
+ * }
+ *
+ * Same memory layout will be used for the coredump NT_X86_XSTATE representing
+ * the xstate registers.
+ *
+ * For now, only first 8 bytes of the sw_usable_bytes[464..467] will be used and
+ * will be set to OS enabled xstate mask(which is same as the 64bit mask
+ * returned by the xgetbv's xCR0). Users (analyzing core dump remotely etc)
+ * can use this mask aswell as the mask saved in the xstate_hdr bytes and
+ * interpret what states the processor/OS supports and what states are in
+ * modified/initialized conditions for the particular process/thread.
+ *
+ * Also when the user modifies certain state FP/SSE/etc through this
+ * PTRACE_SETXSTATEREGS, they must ensure that the xsave_hdr.xstate_bv
+ * bytes[512..519] of the above memory layout are updated correspondingly.
+ * i.e., for example when FP state is modified to a non-init state,
+ * xsave_hdr.xstate_bv's bit 0 must be set to '1', when SSE is modified to
+ * non-init state, xsave_hdr.xstate_bv's bit 1 must to be set to '1' etc..
+ */
+#define PTRACE_GETXSTATEREGS 34
+#define PTRACE_SETXSTATEREGS 35
+
#ifndef __ASSEMBLY__
#include <linux/types.h>

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 727acc1..9e26171 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -27,9 +27,11 @@
extern unsigned int xstate_size;
extern u64 pcntxt_mask;
extern struct xsave_struct *init_xstate_buf;
+extern u64 xstate_fx_sw_bytes[6];

extern void xsave_cntxt_init(void);
extern void xsave_init(void);
+extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
extern int init_fpu(struct task_struct *child);
extern int check_for_xstate(struct i387_fxsave_struct __user *buf,
void __user *fpstate,
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index f2f8540..dffdf91 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -224,6 +224,85 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
return ret;
}

+int xstateregs_active(struct task_struct *target,
+ const struct user_regset *regset)
+{
+ return (cpu_has_xsave && tsk_used_math(target)) ? xstate_size : 0;
+}
+
+int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ void *kbuf, void __user *ubuf)
+{
+ int ret;
+
+ if (!cpu_has_xsave)
+ return -ENODEV;
+
+ ret = init_fpu(target);
+ if (ret)
+ return ret;
+
+ /*
+ * First copy the fxsave bytes 0..463
+ */
+ ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ &target->thread.xstate->xsave, 0,
+ (sizeof(struct i387_fxsave_struct) -
+ sizeof(xstate_fx_sw_bytes)));
+ /*
+ * Copy the 48bytes defined by software
+ */
+ ret |= user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ xstate_fx_sw_bytes,
+ (sizeof(struct i387_fxsave_struct) -
+ sizeof(xstate_fx_sw_bytes)),
+ sizeof(struct i387_fxsave_struct));
+ /*
+ * Copy the rest of xstate memory layout
+ */
+ ret |= user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ &target->thread.xstate->xsave.xsave_hdr,
+ sizeof(struct i387_fxsave_struct), -1);
+ return ret;
+}
+
+int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ const void *kbuf, const void __user *ubuf)
+{
+ int ret;
+ struct xsave_hdr_struct *xsave_hdr =
+ &current->thread.xstate->xsave.xsave_hdr;
+
+
+ if (!cpu_has_xsave)
+ return -ENODEV;
+
+ ret = init_fpu(target);
+ if (ret)
+ return ret;
+
+ set_stopped_child_used_math(target);
+
+ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+ &target->thread.xstate->xsave, 0, -1);
+
+ /*
+ * mxcsr reserved bits must be masked to zero for security reasons.
+ */
+ target->thread.xstate->fxsave.mxcsr &= mxcsr_feature_mask;
+
+ xsave_hdr->xstate_bv &= pcntxt_mask;
+ /*
+ * These bits must be zero.
+ */
+ xsave_hdr->reserved1[0] = xsave_hdr->reserved1[1] = 0;
+
+
+ return ret;
+}
+
#if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION

/*
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 017d937..df9add9 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -48,6 +48,7 @@ enum x86_regset {
REGSET_FP,
REGSET_XFP,
REGSET_IOPERM64 = REGSET_XFP,
+ REGSET_XSTATE,
REGSET_TLS,
REGSET_IOPERM32,
};
@@ -1232,6 +1233,20 @@ long arch_ptrace(struct task_struct *child, long request, long addr, long data)
0, sizeof(struct user_i387_struct),
datap);

+ case PTRACE_GETXSTATEREGS: /* Get the child extended state. */
+ return copy_regset_to_user(child,
+ task_user_regset_view(current),
+ REGSET_XSTATE,
+ 0, xstate_size,
+ datap);
+
+ case PTRACE_SETXSTATEREGS: /* Set the child extended state. */
+ return copy_regset_from_user(child,
+ task_user_regset_view(current),
+ REGSET_XSTATE,
+ 0, xstate_size,
+ datap);
+
#ifdef CONFIG_X86_32
case PTRACE_GETFPXREGS: /* Get the child extended FPU state. */
return copy_regset_to_user(child, &user_x86_32_view,
@@ -1561,6 +1576,16 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
sizeof(struct user32_fxsr_struct),
datap);

+ case PTRACE_GETXSTATEREGS: /* Get the child extended state. */
+ return copy_regset_to_user(child, &user_x86_32_view,
+ REGSET_XSTATE, 0, xstate_size,
+ datap);
+
+ case PTRACE_SETXSTATEREGS: /* Set the child extended state. */
+ return copy_regset_from_user(child, &user_x86_32_view,
+ REGSET_XSTATE, 0, xstate_size,
+ datap);
+
case PTRACE_GET_THREAD_AREA:
case PTRACE_SET_THREAD_AREA:
#ifdef CONFIG_X86_PTRACE_BTS
@@ -1584,7 +1609,7 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,

#ifdef CONFIG_X86_64

-static const struct user_regset x86_64_regsets[] = {
+static struct user_regset x86_64_regsets[] __read_mostly = {
[REGSET_GENERAL] = {
.core_note_type = NT_PRSTATUS,
.n = sizeof(struct user_regs_struct) / sizeof(long),
@@ -1597,6 +1622,12 @@ static const struct user_regset x86_64_regsets[] = {
.size = sizeof(long), .align = sizeof(long),
.active = xfpregs_active, .get = xfpregs_get, .set = xfpregs_set
},
+ [REGSET_XSTATE] = {
+ .core_note_type = NT_X86_XSTATE,
+ .size = sizeof(long), .align = sizeof(long),
+ .active = xstateregs_active, .get = xstateregs_get,
+ .set = xstateregs_set
+ },
[REGSET_IOPERM64] = {
.core_note_type = NT_386_IOPERM,
.n = IO_BITMAP_LONGS,
@@ -1622,7 +1653,7 @@ static const struct user_regset_view user_x86_64_view = {
#endif /* CONFIG_X86_64 */

#if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION
-static const struct user_regset x86_32_regsets[] = {
+static struct user_regset x86_32_regsets[] __read_mostly = {
[REGSET_GENERAL] = {
.core_note_type = NT_PRSTATUS,
.n = sizeof(struct user_regs_struct32) / sizeof(u32),
@@ -1641,6 +1672,12 @@ static const struct user_regset x86_32_regsets[] = {
.size = sizeof(u32), .align = sizeof(u32),
.active = xfpregs_active, .get = xfpregs_get, .set = xfpregs_set
},
+ [REGSET_XSTATE] = {
+ .core_note_type = NT_X86_XSTATE,
+ .size = sizeof(u32), .align = sizeof(u32),
+ .active = xstateregs_active, .get = xstateregs_get,
+ .set = xstateregs_set
+ },
[REGSET_TLS] = {
.core_note_type = NT_386_TLS,
.n = GDT_ENTRY_TLS_ENTRIES, .bias = GDT_ENTRY_TLS_MIN,
@@ -1663,6 +1700,18 @@ static const struct user_regset_view user_x86_32_view = {
};
#endif

+u64 xstate_fx_sw_bytes[6];
+void update_regset_xstate_info(unsigned int size, u64 xstate_mask)
+{
+#ifdef CONFIG_X86_64
+ x86_64_regsets[REGSET_XSTATE].n = size / sizeof(long);
+#endif
+#if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION
+ x86_32_regsets[REGSET_XSTATE].n = size / sizeof(u32);
+#endif
+ xstate_fx_sw_bytes[0] = xstate_mask;
+}
+
const struct user_regset_view *task_user_regset_view(struct task_struct *task)
{
#ifdef CONFIG_IA32_EMULATION
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index c5ee17e..782c3a3 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -337,6 +337,7 @@ void __ref xsave_cntxt_init(void)
cpuid_count(0xd, 0, &eax, &ebx, &ecx, &edx);
xstate_size = ebx;

+ update_regset_xstate_info(xstate_size, pcntxt_mask);
prepare_fx_sw_frame();

setup_xstate_init();
diff --git a/include/linux/elf.h b/include/linux/elf.h
index 0cc4d55..a8c4af0 100644
--- a/include/linux/elf.h
+++ b/include/linux/elf.h
@@ -361,6 +361,7 @@ typedef struct elf64_shdr {
#define NT_PPC_VSX 0x102 /* PowerPC VSX registers */
#define NT_386_TLS 0x200 /* i386 TLS slots (struct user_desc) */
#define NT_386_IOPERM 0x201 /* x86 io permission bitmap (1=deny) */
+#define NT_X86_XSTATE 0x202 /* x86 extended state using xsave */
#define NT_S390_HIGH_GPRS 0x300 /* s390 upper register halves */

2010-02-03 23:37:38

by Roland McGrath

[permalink] [raw]
Subject: Re: [patch] x86: ptrace and core-dump extensions for xstate

Please also CC Oleg on things related to user_regset and/or ptrace,
as per MAINTAINERS.

Please make this two patches. The first one should add the regset, which
implicitly adds it to core dumps, and makes fixed the note layout aspect of
the permanent userland ABI. The second one should add the new ptrace
requests, which is a further addition to the userland ABI.

> For more information on how to use this API by users like debuggers and core
> dump, please refer to comments in arch/x86/include/asm/ptrace-abi.h

There are some obvious typos in that comment. Please fix those up.

> +int xstateregs_active(struct task_struct *target,
> + const struct user_regset *regset)
> +{
> + return (cpu_has_xsave && tsk_used_math(target)) ? xstate_size : 0;
> +}

The return value here is "n" not "n * size", so xstate_size is wrong. You
can just use regset->n instead. I'll further note that when !cpu_has_xsave,
regset->n will be zero. So all you really need is:

return tsk_used_math(target) ? regset->n : 0;

i.e., the same as fpregs_active. So I would just use:

#define xstateregs_active fpregs_active

with a comment explaining that they are same and why.

> + /*
> + * First copy the fxsave bytes 0..463
> + */
> + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> + &target->thread.xstate->xsave, 0,
> + (sizeof(struct i387_fxsave_struct) -
> + sizeof(xstate_fx_sw_bytes)));
> + /*
> + * Copy the 48bytes defined by software
> + */
> + ret |= user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> + xstate_fx_sw_bytes,
> + (sizeof(struct i387_fxsave_struct) -
> + sizeof(xstate_fx_sw_bytes)),
> + sizeof(struct i387_fxsave_struct));
> + /*
> + * Copy the rest of xstate memory layout
> + */
> + ret |= user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> + &target->thread.xstate->xsave.xsave_hdr,
> + sizeof(struct i387_fxsave_struct), -1);

This is wrong for the error cases. user_regset_copyout returns an error
code or 0, so "|=" is a strange thing to with its return value. In fact,
it only ever returns 0 or -EFAULT, so |= will produce the value you want.
But | is a pretty strange thing to do with two error codes.

It's problematically wrong for a different reason. In the error case,
user_regset_copyout returns without updating pos, count, and ubuf (i.e.
it returns having done nothing when returning an error, which is a pretty
normal convention). So, if the first call fails then the second call will
have pos < start_pos and hit the BUG_ON. IOW, be sure to test your new
ptrace call with an invalid userland pointer (e.g. NULL) passed in.

In short, replace "ret |=" with "if (likely(!ret))\n\tret = ".


This is up to you, but personally I would define something akin to:

struct xstate_info {
union {
struct i387_fxsave_struct fxsave;
struct {
u64 i387[58];
u64 xstate_fx_sw[XSTATE_FX_SW_WORDS];
};
};
struct xsave_hdr_struct xsave_hdr;
u64 xsave_state[0];
};

and then use offsetof rather than manual arithmetic in those
user_regset_copyout calls. IMHO it would be better to have this in
ptrace-abi.h along with some macros for what the fx_sw indices are (I guess
only 0 is defined now, but whatever) rather than relegate that info to
magic numbers in a comment and userland code doing manual arithmetic.
But even if you just put it locally in ptrace.c, it makes the ptrace
code less prone to possible arithmetic typos.


> + case PTRACE_GETXSTATEREGS: /* Get the child extended state. */
> + return copy_regset_to_user(child,
> + task_user_regset_view(current),
> + REGSET_XSTATE,
> + 0, xstate_size,
> + datap);
> +
> + case PTRACE_SETXSTATEREGS: /* Set the child extended state. */
> + return copy_regset_from_user(child,
> + task_user_regset_view(current),
> + REGSET_XSTATE,
> + 0, xstate_size,
> + datap);

I think this is a poor choice of interface for this. The other existing
PTRACE_GET*REGS calls use a fixed-size and fixed-layout block that is a
known constant in the userland ABI. Here, userland has no way to know
xstate_size, so you are accessing a chunk of user memory where userland
doesn't really know how much you are going to access.

AFAICT from skimming the Intel manuals, there is no specified upper bound
on how big the xsave size might grow in future processors. I would think
that it might be limited to a page, since there is no way to indicate a
partial copy to restart after a page fault. So for unpinned access (such
as in user mode, or the user memory access in the signal frame setup), in
full-thrashing situations an xsave spanning multiple pages might thrash
totally and never make progress. OTOH, the manual does not say that the
buffer can't span two pages today, just that it has to be 64-byte aligned.
So perhaps we already have that issue (for signal frame setup or for direct
user-space uses of xsave/xrstor) and it's just not really an issue that
matters (thrashing is thrashing--it's already pathological, so who cares if
it's literal livelock or not). The upshot being, I don't think there is an
obvious upper bound that userland can presume statically here.

AFAICT, the only way for userland to guess xstate_size is to use cpuid
itself. Even if that is actually reliable, or even if the kernel gave
userland some other means to know the kernel's xstate_size value, IMHO that
is a pretty dubious and error-prone way to construct the ABI. Usually when
userland supplies a buffer whose size is not a permanent constant of the
ABI, userland says how big a buffer it's passing in.

The most obvious change would be just s/xstate_size/MIN(addr, xstate_size)/.
i.e. userland passes in the maximum size it wants in the other argument.

But IMHO this is a fine opportunity not to add another one-off request for
a particular regset type, and then never add another one again. Instead,
we can add a general-purpose request to handle any regset based on what is
already part of the userland ABI: the NT_* codes and the regset layouts
they imply on each machine.

e.g.
struct iovec iov = { mybuffer, mylength };
ret = ptrace(PTRACE_GETREGSET, NT_X86_XSTATE, &iov);

or something along those lines. We could implement a generic
PTRACE_GETREGSET and PTRACE_SETREGSET in {,compat_}ptrace_request() on
CONFIG_HAVE_ARCH_TRACEHOOK machines.

Then all any arch ever has to do in future is just define a new user_regset
in its user_regset_view for a new thing.

I'll make Oleg implement it if you don't do it yourself ;-). We can all
work out together exactly what the interface should be, I don't have any
special fixed ideas.


Thanks,
Roland

2010-02-03 23:45:18

by Lu, Hongjiu

[permalink] [raw]
Subject: RE: [patch] x86: ptrace and core-dump extensions for xstate

>
> > + case PTRACE_GETXSTATEREGS: /* Get the child extended state. */
> > + return copy_regset_to_user(child,
> > + task_user_regset_view(current),
> > + REGSET_XSTATE,
> > + 0, xstate_size,
> > + datap);
> > +
> > + case PTRACE_SETXSTATEREGS: /* Set the child extended state. */
> > + return copy_regset_from_user(child,
> > + task_user_regset_view(current),
> > + REGSET_XSTATE,
> > + 0, xstate_size,
> > + datap);
>
> I think this is a poor choice of interface for this. The other existing
> PTRACE_GET*REGS calls use a fixed-size and fixed-layout block that is a
> known constant in the userland ABI. Here, userland has no way to know
> xstate_size, so you are accessing a chunk of user memory where userland
> doesn't really know how much you are going to access.
>
> AFAICT from skimming the Intel manuals, there is no specified upper bound
> on how big the xsave size might grow in future processors. I would think
> that it might be limited to a page, since there is no way to indicate a
> partial copy to restart after a page fault. So for unpinned access (such
> as in user mode, or the user memory access in the signal frame setup), in
> full-thrashing situations an xsave spanning multiple pages might thrash
> totally and never make progress. OTOH, the manual does not say that the
> buffer can't span two pages today, just that it has to be 64-byte aligned.
> So perhaps we already have that issue (for signal frame setup or for direct
> user-space uses of xsave/xrstor) and it's just not really an issue that
> matters (thrashing is thrashing--it's already pathological, so who cares if
> it's literal livelock or not). The upshot being, I don't think there is an
> obvious upper bound that userland can presume statically here.
>
> AFAICT, the only way for userland to guess xstate_size is to use cpuid
> itself. Even if that is actually reliable, or even if the kernel gave
> userland some other means to know the kernel's xstate_size value, IMHO that
> is a pretty dubious and error-prone way to construct the ABI. Usually when
> userland supplies a buffer whose size is not a permanent constant of the
> ABI, userland says how big a buffer it's passing in.
>

Gdb calls cpuid during startup time to find out the actual xstate_size
on the target machine. It has to be reliable. There is no need for
another ptrace option. But we should document it clearly in ABI.


H.J.

2010-02-04 02:02:39

by Roland McGrath

[permalink] [raw]
Subject: RE: [patch] x86: ptrace and core-dump extensions for xstate

> Gdb calls cpuid during startup time to find out the actual xstate_size
> on the target machine. It has to be reliable. There is no need for
> another ptrace option. But we should document it clearly in ABI.

As I said, I think that is an extremely poor API choice. There is no such
arcane requirement in the API now, so we do not have add one.


Thanks,
Roland

2010-02-04 02:05:18

by Lu, Hongjiu

[permalink] [raw]
Subject: RE: [patch] x86: ptrace and core-dump extensions for xstate

>
> > Gdb calls cpuid during startup time to find out the actual xstate_size
> > on the target machine. It has to be reliable. There is no need for
> > another ptrace option. But we should document it clearly in ABI.
>
> As I said, I think that is an extremely poor API choice. There is no such
> arcane requirement in the API now, so we do not have add one.
>

That is how processor works.


H.J.

2010-02-04 02:18:42

by Roland McGrath

[permalink] [raw]
Subject: RE: [patch] x86: ptrace and core-dump extensions for xstate

> That is how processor works.

We're not talking about the processor.
We're talking about the API of a Linux system call.


Thanks,
Roland

2010-02-04 02:23:08

by Lu, Hongjiu

[permalink] [raw]
Subject: RE: [patch] x86: ptrace and core-dump extensions for xstate

> > That is how processor works.
>
> We're not talking about the processor.
> We're talking about the API of a Linux system call.
>
>

To support XSAVE, gdb needs to know XCR0 as well as XSTATE size.
We can get those info from kernel via system call or cpuid. I
prefer cpuid over system call.


H.J.

2010-02-04 04:55:26

by Roland McGrath

[permalink] [raw]
Subject: RE: [patch] x86: ptrace and core-dump extensions for xstate

> To support XSAVE, gdb needs to know XCR0 as well as XSTATE size.
> We can get those info from kernel via system call or cpuid. I
> prefer cpuid over system call.

Suresh's patch puts this value in the xsave block, in what Suresh calls
"sw_usable_bytes". See the asm/ptrace-abi.h comment in the patch you
signed off on.

How is that not sufficient? If it is indeed not sufficient to usefully
interpret the xsave block, then how could an xsave block in a core dump
file ever possibly be examined if it might not have been generated on the
same system and kernel where the debugger is doing the examination? If
the NT_X86_XSTATE note as implemented in Suresh's patch is indeed not
entirely self-contained in this way, then NAK on that new note format.


Thanks,
Roland

2010-02-04 05:01:36

by Lu, Hongjiu

[permalink] [raw]
Subject: RE: [patch] x86: ptrace and core-dump extensions for xstate

> > To support XSAVE, gdb needs to know XCR0 as well as XSTATE size.
> > We can get those info from kernel via system call or cpuid. I
> > prefer cpuid over system call.
>
> Suresh's patch puts this value in the xsave block, in what Suresh calls
> "sw_usable_bytes". See the asm/ptrace-abi.h comment in the patch you
> signed off on.
>
> How is that not sufficient? If it is indeed not sufficient to usefully
> interpret the xsave block, then how could an xsave block in a core dump
> file ever possibly be examined if it might not have been generated on the
> same system and kernel where the debugger is doing the examination? If
> the NT_X86_XSTATE note as implemented in Suresh's patch is indeed not
> entirely self-contained in this way, then NAK on that new note format.
>

I use it when reading core dump, which doesn't involve a system call.
I can analyze it on a totally different machine.

FWIW, there is the header file I have in gdb

---
/* XSAVE extended state information. */

/* The extended state status. */
enum xstate_status
{
XSTATE_UNKNOWN = 0,
XSTATE_UNSUPPORTED,
XSTATE_INITIALIZED
};

/* The extended state feature bits. */
#define bit_XSTATE_X87 (1ULL << 0)
#define bit_XSTATE_SSE (1ULL << 1)
#define bit_XSTATE_AVX (1ULL << 2)

/* Supported mask and size of the extended state. Used for qSupport. */
#define XSTATE_SSE_MASK 0x3
#define XSTATE_AVX_MASK 0x7
#define XSTATE_MAX_MASK 0x7

#define XSTATE_SSE_MASK_STRING "0x3"
#define XSTATE_AVX_MASK_STRING "0x7"
#define XSTATE_MAX_MASK_STRING "0x7"

#define XSTATE_SSE_SIZE 576
#define XSTATE_AVX_SIZE 832
#define XSTATE_MAX_SIZE 832

#define XSTATE_SSE_SIZE_STRING "576"
#define XSTATE_AVX_SIZE_STRING "832"
#define XSTATE_MAX_SIZE_STRING "832"

struct i386_xstate_type
{
/* The extended state status. */
enum xstate_status status;
/* The extended control register 0 (the XFEATURE_ENABLED_MASK
register). */
unsigned long long xcr0;
/* The extended state size in bytes. */
unsigned int size;
/* The extended state size in unit of int64. We use array of
int64 for better alignment. */
unsigned int n_of_int64;
};

extern struct i386_xstate_type i386_xstate;

extern void i386_xstate_init (void);
---

I don't need a system call to tell me xcr0 and size. If you
really want, you can put it in glibc and doesn't involve kernel.
We can use __cpu_features for it.


H.J.

2010-02-04 05:18:34

by Roland McGrath

[permalink] [raw]
Subject: RE: [patch] x86: ptrace and core-dump extensions for xstate

> > Suresh's patch puts this value in the xsave block, in what Suresh calls
> > "sw_usable_bytes". See the asm/ptrace-abi.h comment in the patch you
> > signed off on.
> >
> > How is that not sufficient? If it is indeed not sufficient to usefully
> > interpret the xsave block, then how could an xsave block in a core dump
> > file ever possibly be examined if it might not have been generated on the
> > same system and kernel where the debugger is doing the examination? If
> > the NT_X86_XSTATE note as implemented in Suresh's patch is indeed not
> > entirely self-contained in this way, then NAK on that new note format.
> >
>
> I use it when reading core dump, which doesn't involve a system call.
> I can analyze it on a totally different machine.

You did not answer any of my questions.
Perhaps Suresh can be more helpful in explaining the situation.


Thanks,
Roland

2010-02-04 05:33:34

by Lu, Hongjiu

[permalink] [raw]
Subject: RE: [patch] x86: ptrace and core-dump extensions for xstate

>
> > > Suresh's patch puts this value in the xsave block, in what Suresh calls
> > > "sw_usable_bytes". See the asm/ptrace-abi.h comment in the patch you
> > > signed off on.
> > >
> > > How is that not sufficient? If it is indeed not sufficient to usefully
> > > interpret the xsave block, then how could an xsave block in a core dump
> > > file ever possibly be examined if it might not have been generated on the
> > > same system and kernel where the debugger is doing the examination? If
> > > the NT_X86_XSTATE note as implemented in Suresh's patch is indeed not
> > > entirely self-contained in this way, then NAK on that new note format.
> > >
> >
> > I use it when reading core dump, which doesn't involve a system call.
> > I can analyze it on a totally different machine.
>
> You did not answer any of my questions.
> Perhaps Suresh can be more helpful in explaining the situation.
>

When gdb reads a core section, it only processes up to the maximum
xsave size it supports and ignores the rest. It also reads the first 8
byte in sw_usable_bytes for xcr0. It only exams the xcr0 bits it supports.
With this info, gdb can exams xsave core dump up to the state it supports.

When it calls ptrace with XSTATE, it uses a buffer of the
size returned by cpuid. But it only reads up to the xsave size
it supports.

The difference here is the size of core dump is recorded in core
dump and I have to use cpuid returned size to make a ptrace call.

H.J.

2010-02-04 20:30:05

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch] x86: ptrace and core-dump extensions for xstate

On Wed, 2010-02-03 at 15:08 -0800, Roland McGrath wrote:
> Please also CC Oleg on things related to user_regset and/or ptrace,
> as per MAINTAINERS.

Sure.

>
> Please make this two patches. The first one should add the regset, which
> implicitly adds it to core dumps, and makes fixed the note layout aspect of
> the permanent userland ABI. The second one should add the new ptrace
> requests, which is a further addition to the userland ABI.

Sure. Let's come to the agreement on the ABI changes and then I can
split the patches accordingly.

>
> > For more information on how to use this API by users like debuggers and core
> > dump, please refer to comments in arch/x86/include/asm/ptrace-abi.h
>
> There are some obvious typos in that comment. Please fix those up.

Can you please elaborate?

>
> > +int xstateregs_active(struct task_struct *target,
> > + const struct user_regset *regset)
> > +{
> > + return (cpu_has_xsave && tsk_used_math(target)) ? xstate_size : 0;
> > +}
>
> The return value here is "n" not "n * size", so xstate_size is wrong. You
> can just use regset->n instead. I'll further note that when !cpu_has_xsave,
> regset->n will be zero. So all you really need is:
>
> return tsk_used_math(target) ? regset->n : 0;
>
> i.e., the same as fpregs_active. So I would just use:
>
> #define xstateregs_active fpregs_active
>
> with a comment explaining that they are same and why.

Makes sense.

>
> > + /*
> > + * First copy the fxsave bytes 0..463
> > + */
> > + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> > + &target->thread.xstate->xsave, 0,
> > + (sizeof(struct i387_fxsave_struct) -
> > + sizeof(xstate_fx_sw_bytes)));
> > + /*
> > + * Copy the 48bytes defined by software
> > + */
> > + ret |= user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> > + xstate_fx_sw_bytes,
> > + (sizeof(struct i387_fxsave_struct) -
> > + sizeof(xstate_fx_sw_bytes)),
> > + sizeof(struct i387_fxsave_struct));
> > + /*
> > + * Copy the rest of xstate memory layout
> > + */
> > + ret |= user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> > + &target->thread.xstate->xsave.xsave_hdr,
> > + sizeof(struct i387_fxsave_struct), -1);
>
> This is wrong for the error cases. user_regset_copyout returns an error
> code or 0, so "|=" is a strange thing to with its return value. In fact,
> it only ever returns 0 or -EFAULT, so |= will produce the value you want.
> But | is a pretty strange thing to do with two error codes.
>
> It's problematically wrong for a different reason. In the error case,
> user_regset_copyout returns without updating pos, count, and ubuf (i.e.
> it returns having done nothing when returning an error, which is a pretty
> normal convention). So, if the first call fails then the second call will
> have pos < start_pos and hit the BUG_ON. IOW, be sure to test your new
> ptrace call with an invalid userland pointer (e.g. NULL) passed in.

oops. Will fix it.

>
> In short, replace "ret |=" with "if (likely(!ret))\n\tret = ".
>
>
> This is up to you, but personally I would define something akin to:
>
> struct xstate_info {
> union {
> struct i387_fxsave_struct fxsave;
> struct {
> u64 i387[58];
> u64 xstate_fx_sw[XSTATE_FX_SW_WORDS];
> };
> };
> struct xsave_hdr_struct xsave_hdr;
> u64 xsave_state[0];
> };
>
> and then use offsetof rather than manual arithmetic in those
> user_regset_copyout calls. IMHO it would be better to have this in
> ptrace-abi.h along with some macros for what the fx_sw indices are (I guess
> only 0 is defined now, but whatever) rather than relegate that info to
> magic numbers in a comment and userland code doing manual arithmetic.
> But even if you just put it locally in ptrace.c, it makes the ptrace
> code less prone to possible arithmetic typos.

Agree. Will update it.

>
>
> > + case PTRACE_GETXSTATEREGS: /* Get the child extended state. */
> > + return copy_regset_to_user(child,
> > + task_user_regset_view(current),
> > + REGSET_XSTATE,
> > + 0, xstate_size,
> > + datap);
> > +
> > + case PTRACE_SETXSTATEREGS: /* Set the child extended state. */
> > + return copy_regset_from_user(child,
> > + task_user_regset_view(current),
> > + REGSET_XSTATE,
> > + 0, xstate_size,
> > + datap);
>
> I think this is a poor choice of interface for this. The other existing
> PTRACE_GET*REGS calls use a fixed-size and fixed-layout block that is a
> known constant in the userland ABI. Here, userland has no way to know
> xstate_size, so you are accessing a chunk of user memory where userland
> doesn't really know how much you are going to access.
>
> AFAICT from skimming the Intel manuals, there is no specified upper bound
> on how big the xsave size might grow in future processors. I would think
> that it might be limited to a page, since there is no way to indicate a
> partial copy to restart after a page fault. So for unpinned access (such
> as in user mode, or the user memory access in the signal frame setup), in
> full-thrashing situations an xsave spanning multiple pages might thrash
> totally and never make progress. OTOH, the manual does not say that the
> buffer can't span two pages today, just that it has to be 64-byte aligned.
> So perhaps we already have that issue (for signal frame setup or for direct
> user-space uses of xsave/xrstor) and it's just not really an issue that
> matters (thrashing is thrashing--it's already pathological, so who cares if
> it's literal livelock or not).

This issue is not new and gets handled in the same way as for existing
fxsave/fxrstor, as they don't specify page alignment restrictions.

> The upshot being, I don't think there is an
> obvious upper bound that userland can presume statically here.
>
> AFAICT, the only way for userland to guess xstate_size is to use cpuid
> itself. Even if that is actually reliable, or even if the kernel gave
> userland some other means to know the kernel's xstate_size value, IMHO that
> is a pretty dubious and error-prone way to construct the ABI. Usually when
> userland supplies a buffer whose size is not a permanent constant of the
> ABI, userland says how big a buffer it's passing in.
>
> The most obvious change would be just s/xstate_size/MIN(addr, xstate_size)/.
> i.e. userland passes in the maximum size it wants in the other argument.

Ok. I would like to enforce that "addr == xstate_size", so that we don't
have to complicate the kernel implementation in trying to interpret what
state the user knows based on the size passed by the user etc. User
should use cpuid to get the size of the xstate buffer supported by the
OS and processor. Kernel can check and throw an error with out silently
corrupting the user who is not following the requirements.

> But IMHO this is a fine opportunity not to add another one-off request for
> a particular regset type, and then never add another one again. Instead,
> we can add a general-purpose request to handle any regset based on what is
> already part of the userland ABI: the NT_* codes and the regset layouts
> they imply on each machine.
>
> e.g.
> struct iovec iov = { mybuffer, mylength };
> ret = ptrace(PTRACE_GETREGSET, NT_X86_XSTATE, &iov);
>
> or something along those lines. We could implement a generic
> PTRACE_GETREGSET and PTRACE_SETREGSET in {,compat_}ptrace_request() on
> CONFIG_HAVE_ARCH_TRACEHOOK machines.
>
> Then all any arch ever has to do in future is just define a new user_regset
> in its user_regset_view for a new thing.
>
> I'll make Oleg implement it if you don't do it yourself ;-). We can all
> work out together exactly what the interface should be, I don't have any
> special fixed ideas.

We probably have to extend regset infrastructure to track which NT_*
types are part of PTRACE_[GET|SET]REGSET and which are not. Also, I am
not sure if pushing the ptrace interpretation of the user pointer into
the regset routines is a good idea. Potentially this regset get/set
routines can be used by signal handling (in addition to kernel core dump
etc) aswell and for compatibility reasons etc their format might be
different from ptrace. Your idea sounds like a good idea in-general and
if there is a clean interface, then I can work with Oleg to make this
generic for future.

thanks,
suresh

2010-02-04 20:56:00

by Roland McGrath

[permalink] [raw]
Subject: Re: [patch] x86: ptrace and core-dump extensions for xstate

> Sure. Let's come to the agreement on the ABI changes and then I can
> split the patches accordingly.

The main point of splitting the patch is that the two new ABI components
are quite separate and the former need not wait for the latter. There is
no real question about the regset format, so (with fixes) you can send the
first patch now and we'll all sign off on it. Only the ptrace addition
really remains to be hashed out.

> > There are some obvious typos in that comment. Please fix those up.
>
> Can you please elaborate?

Just proofread it. I really did mean "obvious typos", i.e. spelling,
whitespace, punctuation, nothing more.

> This issue is not new and gets handled in the same way as for existing
> fxsave/fxrstor, as they don't specify page alignment restrictions.

I didn't suggest it was new. I was looking for some confirmation that
there is in fact no permanent (i.e. compile-time) size limit.

> Ok. I would like to enforce that "addr == xstate_size", so that we don't
> have to complicate the kernel implementation in trying to interpret what
> state the user knows based on the size passed by the user etc.

I don't think this is the right way to think about it. The regset code
does not need to do anything different at all. There will in future be
other callers of the regset hooks, that's what the whole interface is there
for. Regardless of whether modification is full or partial, you just
enforce the various bitmasks on the resultant buffer as you already do, and
that's all there is to it. If userland stores partial contents with a
bogus format, that's its problem. It's just like the program itself had
used xrstor in user mode with the same bogus buffer contents.

> User should use cpuid to get the size of the xstate buffer supported by
> the OS and processor.

They can if they want. But they can also just use a smaller size that
corresponds to the bitmasks they set in the buffer they pass in.

> Kernel can check and throw an error with out silently corrupting the user
> who is not following the requirements.

ptrace is simply giving you a way to edit the buffer passed to xrstor. You
can put garbage there via ptrace just like you can put garbage there with a
direct xrstor instruction in user mode. This is no different from all
other ptrace register access, and there is no reason to enforce anything
special about it.

> We probably have to extend regset infrastructure to track which NT_*
> types are part of PTRACE_[GET|SET]REGSET and which are not.

I don't understand what you mean. The point of the generic requests is
that they apply to any user_regset you want. user_regset does not need
anything new.

> Also, I am not sure if pushing the ptrace interpretation of the user
> pointer into the regset routines is a good idea.

I don't understand what you mean here at all. I did not suggest anything
that affects what the regset routines themselves do in any way at all.

It is an unacceptably bad idea to have any new ptrace interfaces for regset
access that do anything different than exactly let you get/set all or part
of a regset exactly as the arch's user_regset provides it.


Thanks,
Roland

2010-02-04 22:06:21

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch] x86: ptrace and core-dump extensions for xstate

On Thu, 2010-02-04 at 12:55 -0800, Roland McGrath wrote:
> Just proofread it. I really did mean "obvious typos", i.e. spelling,
> whitespace, punctuation, nothing more.

Roland, All I found after double checking is:

> For now, only first 8 bytes of the sw_usable_bytes[464..467]

should be

> For now, only first 8 bytes of the sw_usable_bytes[464..471]

Let me know if I am overlooking something.

>
> > This issue is not new and gets handled in the same way as for existing
> > fxsave/fxrstor, as they don't specify page alignment restrictions.
>
> I didn't suggest it was new. I was looking for some confirmation that
> there is in fact no permanent (i.e. compile-time) size limit.

Yes. No size limit as of now.

> I don't think this is the right way to think about it. The regset code
> does not need to do anything different at all. There will in future be
> other callers of the regset hooks, that's what the whole interface is there
> for. Regardless of whether modification is full or partial, you just
> enforce the various bitmasks on the resultant buffer as you already do, and
> that's all there is to it. If userland stores partial contents with a
> bogus format, that's its problem. It's just like the program itself had
> used xrstor in user mode with the same bogus buffer contents.

Ok. I think I can agree, if we are ok with giving room for the ptrace
(or any other user of the API) to make a mistake and corrupt reg-state
of the process under debug, if it doesn't follow rules.

> > We probably have to extend regset infrastructure to track which NT_*
> > types are part of PTRACE_[GET|SET]REGSET and which are not.
>
> I don't understand what you mean. The point of the generic requests is
> that they apply to any user_regset you want. user_regset does not need
> anything new.

Thought some of them might be only relevant to core-dump or based on
permissions etc. But I guess get/set routines of the regset should be
able to take care of this?

> > Also, I am not sure if pushing the ptrace interpretation of the user
> > pointer into the regset routines is a good idea.
>
> I don't understand what you mean here at all. I did not suggest anything
> that affects what the regset routines themselves do in any way at all.
>
> It is an unacceptably bad idea to have any new ptrace interfaces for regset
> access that do anything different than exactly let you get/set all or part
> of a regset exactly as the arch's user_regset provides it.

So in the example you provided before:

struct iovec iov = { mybuffer, mylength };
ret = ptrace(PTRACE_GETREGSET, NT_X86_XSTATE, &iov);

You wanted to propose common data format (iov) for all of the NT_* ?

thanks,
suresh

2010-02-04 22:21:53

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [patch] x86: ptrace and core-dump extensions for xstate

On 02/04/2010 02:05 PM, Suresh Siddha wrote:
>
> So in the example you provided before:
>
> struct iovec iov = { mybuffer, mylength };
> ret = ptrace(PTRACE_GETREGSET, NT_X86_XSTATE, &iov);
>
> You wanted to propose common data format (iov) for all of the NT_* ?
>

How about encoding the regset number into the command, e.g.
ptrace(PTRACE_GETREGS(NT_X86_XSTATE), length, buffer)

... where we have ...

#define PTRACE_GETREGS(r) (((r) << 16) | PTRACE_GETREGS_CMD)

... or something like that?

-hpa

2010-02-05 19:47:33

by Lu, Hongjiu

[permalink] [raw]
Subject: RE: [patch] x86: ptrace and core-dump extensions for xstate

>
> On 02/04/2010 02:05 PM, Suresh Siddha wrote:
> >
> > So in the example you provided before:
> >
> > struct iovec iov = { mybuffer, mylength };
> > ret = ptrace(PTRACE_GETREGSET, NT_X86_XSTATE, &iov);
> >
> > You wanted to propose common data format (iov) for all of the NT_* ?
> >
>
> How about encoding the regset number into the command, e.g.
> ptrace(PTRACE_GETREGS(NT_X86_XSTATE), length, buffer)
>
> ... where we have ...
>
> #define PTRACE_GETREGS(r) (((r) << 16) | PTRACE_GETREGS_CMD)
>
> ... or something like that?
>

I like this idea.

BTW, it should be

ptrace(PTRACE_GETREGS(NT_X86_XSTATE), pid, length, buffer)


H.J.

2010-02-05 21:05:21

by Roland McGrath

[permalink] [raw]
Subject: Re: [patch] x86: ptrace and core-dump extensions for xstate

> Roland, All I found after double checking is:
>
> > For now, only first 8 bytes of the sw_usable_bytes[464..467]
>
> should be
>
> > For now, only first 8 bytes of the sw_usable_bytes[464..471]

I didn't notice that one, because I didn't check any of your arithmetic.

> Let me know if I am overlooking something.

I only meant the pure English errors. Here is the comment with some
grammar and punctuation fixed:

+/*
+ * The structure layout used in PTRACE_GETXSTATEREGS/PTRACE_SETXSTATEREGS is
+ * the same as the memory layout of xsave used by the processor (except
+ * for the bytes 464..511, which can be used by the software). The size
+ * of the structure that users need to use for these two interfaces can be
+ * obtained by doing:
+ * cpuid_count(0xd, 0, &eax, &ptrace_xstateregs_struct_size, &ecx, &edx);
+ * i.e., cpuid.(eax=0xd,ecx=0).ebx will be the size that user (debuggers, etc.)
+ * need to use.
+ *
+ * The format of this structure will be like:
+ * struct {
+ * fxsave_bytes[0..463]
+ * sw_usable_bytes[464..511]
+ * xsave_hdr_bytes[512..575]
+ * avx_bytes[576..831]
+ * future_state etc
+ * }
+ *
+ * The same memory layout will be used for the core-dump NT_X86_XSTATE
+ * note representing the xstate registers.
+ *
+ * For now, only the first 8 bytes of the sw_usable_bytes[464..471] will
+ * be used and will be set to OS enabled xstate mask (which is same as the
+ * 64bit mask returned by the xgetbv's xCR0). Users (analyzing core dump
+ * remotely, etc.) can use this mask as well as the mask saved in the
+ * xstate_hdr bytes and interpret what states the processor/OS supports
+ * and what states are in modified/initialized conditions for the
+ * particular process/thread.
+ *
+ * Also when the user modifies certain state FP/SSE/etc through this
+ * PTRACE_SETXSTATEREGS, they must ensure that the xsave_hdr.xstate_bv
+ * bytes[512..519] of the above memory layout are updated correspondingly.
+ * i.e., for example when FP state is modified to a non-init state,
+ * xsave_hdr.xstate_bv's bit 0 must be set to '1', when SSE is modified to
+ * non-init state, xsave_hdr.xstate_bv's bit 1 must to be set to '1', etc.
+ */

> Yes. No size limit as of now.

Ok. Thanks for the clear answer.

> Ok. I think I can agree, if we are ok with giving room for the ptrace
> (or any other user of the API) to make a mistake and corrupt reg-state
> of the process under debug, if it doesn't follow rules.

This is not new, and it's not a problem. The bottom line is that ptrace
can do whatever the process could have done to itself.

> Thought some of them might be only relevant to core-dump or based on
> permissions etc. But I guess get/set routines of the regset should be
> able to take care of this?

The whole point of user_regset is that whatever user-space machine state
there is to be had can be accessed consistently by whatever means. If
there is something that you don't want debuggers to be able to access, then
user_regset is not where it belongs. There is no rationale I can imagine
for having something in a core dump but not readable by debuggers when the
process is still alive.

The only issue that I can imagine you might be referring to when you say
"based on permissions etc." is state that you cannot set arbitrarily from
user mode. The only such example we have is NT_386_IOPERM, which you just
cannot set at all via user_regset. If we were to allow debuggers to change
that data at all, then its .set function would need to do some permission
checking.

For everything else that is really part of the user-mode register state per
se, it just doesn't make sense to think about any kind of "permission
checking" beyond the simple masks applied to eflags, mxcsr, etc. The
debugger can make the user process do anything that the process could do
itself, which of course includes changing all its registers.

> So in the example you provided before:
>
> struct iovec iov = { mybuffer, mylength };
> ret = ptrace(PTRACE_GETREGSET, NT_X86_XSTATE, &iov);
>
> You wanted to propose common data format (iov) for all of the NT_* ?

I'm not sure I understand your question. The iovec is just API trivia,
part of communicating "I want this regset and I want these bytes of it".
This has nothing to do with the actual data format of each regset.


Thanks,
Roland

2010-02-05 21:16:16

by Roland McGrath

[permalink] [raw]
Subject: Re: [patch] x86: ptrace and core-dump extensions for xstate

> #define PTRACE_GETREGS(r) (((r) << 16) | PTRACE_GETREGS_CMD)
>
> ... or something like that?

(You can't use that exact name, it's taken.) IMHO this is some spurious
obfuscation that is not warranted by saving the two get_user calls in the
kernel. (OTOH, my suggestion requires a whole extra 5 lines of code or so
in compat_sys_ptrace because the indirection in the ABI is sensitive to
userland word size.) But I don't feel strongly about the particulars of
the ptrace API addition, just that it be generic to cover any regset and
not be prone to implicit buffer-size miscommunications. I'll leave it to
whatever Oleg wants to implement.


Thanks,
Roland

2010-02-05 21:40:21

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch] x86: ptrace and core-dump extensions for xstate

On Fri, 2010-02-05 at 13:15 -0800, Roland McGrath wrote:
> > #define PTRACE_GETREGS(r) (((r) << 16) | PTRACE_GETREGS_CMD)
> >
> > ... or something like that?
>
> (You can't use that exact name, it's taken.) IMHO this is some spurious
> obfuscation that is not warranted by saving the two get_user calls in the
> kernel.

Also value of NT_PRXFPREG complicates things bit more

#define NT_PRXFPREG 0x46e62b7f /* copied from gdb5.1/include/elf/common.h */

In this context, we have to perhaps use bits 31 and 29 to represent this
generic ptrace interface and the corresponding GET/SET commands.

> (OTOH, my suggestion requires a whole extra 5 lines of code or so
> in compat_sys_ptrace because the indirection in the ABI is sensitive to
> userland word size.) But I don't feel strongly about the particulars of
> the ptrace API addition, just that it be generic to cover any regset and
> not be prone to implicit buffer-size miscommunications. I'll leave it to
> whatever Oleg wants to implement.

Ok. I will split the previous patch in to two patches and re-post it. I
can help Oleg with reviewing and testing the generic implementation
whenever it is ready.

thanks,
suresh

2010-02-09 17:29:12

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] x86: ptrace and core-dump extensions for xstate

Sorry for delay!

On 02/05, Suresh Siddha wrote:
>
> On Fri, 2010-02-05 at 13:15 -0800, Roland McGrath wrote:
>
> > (OTOH, my suggestion requires a whole extra 5 lines of code or so
> > in compat_sys_ptrace because the indirection in the ABI is sensitive to
> > userland word size.) But I don't feel strongly about the particulars of
> > the ptrace API addition, just that it be generic to cover any regset and
> > not be prone to implicit buffer-size miscommunications. I'll leave it to
> > whatever Oleg wants to implement.
>
> Ok. I will split the previous patch in to two patches and re-post it. I
> can help Oleg with reviewing and testing the generic implementation
> whenever it is ready.

OK, I'll try to do this "asap", this generic implementation looks easy.

Except, I am not sure I really understand the problem, need to read
the regset code and then re-read this thread ;)

Oleg.