2022-08-05 23:04:33

by Kyle Huey

[permalink] [raw]
Subject: [PATCH v4 1/2] x86/fpu: Allow PKRU to be (once again) written by ptrace.

From: Kyle Huey <[email protected]>

When management of the PKRU register was moved away from XSTATE, emulation
of PKRU's existence in XSTATE was added for APIs that read XSTATE, but not
for APIs that write XSTATE. This can be seen by running gdb and executing
`p $pkru`, `set $pkru = 42`, and `p $pkru`. On affected kernels (5.14+) the
write to the PKRU register (which gdb performs through ptrace) is ignored.

There are three relevant APIs: PTRACE_SETREGSET with NT_X86_XSTATE,
sigreturn, and KVM_SET_XSAVE. KVM_SET_XSAVE has its own special handling to
make PKRU writes take effect (in fpu_copy_uabi_to_guest_fpstate). Push that
down into copy_uabi_to_xstate and have PTRACE_SETREGSET with NT_X86_XSTATE
and sigreturn pass in pointers to the appropriate PKRU value.

This also adds code to initialize the PKRU value to the hardware init value
(namely 0) if the PKRU bit is not set in the XSTATE header to match XRSTOR.
This is a change to the current KVM_SET_XSAVE behavior.

Changelog since v3:
- The v3 patch is now part 1 of 2.
- Adds a selftest in part 2 of 2.

Changelog since v2:
- Removed now unused variables in fpu_copy_uabi_to_guest_fpstate

Changelog since v1:
- Handles the error case of copy_to_buffer().

Signed-off-by: Kyle Huey <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: [email protected] # For edge case behavior of KVM_SET_XSAVE
Cc: [email protected] # 5.14+
Fixes: e84ba47e313d ("x86/fpu: Hook up PKRU into ptrace()")
---
arch/x86/kernel/fpu/core.c | 13 +------------
arch/x86/kernel/fpu/regset.c | 2 +-
arch/x86/kernel/fpu/signal.c | 2 +-
arch/x86/kernel/fpu/xstate.c | 28 +++++++++++++++++++++++-----
arch/x86/kernel/fpu/xstate.h | 4 ++--
5 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3b28c5b25e12..46b935bc87c8 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -391,8 +391,6 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
{
struct fpstate *kstate = gfpu->fpstate;
const union fpregs_state *ustate = buf;
- struct pkru_state *xpkru;
- int ret;

if (!cpu_feature_enabled(X86_FEATURE_XSAVE)) {
if (ustate->xsave.header.xfeatures & ~XFEATURE_MASK_FPSSE)
@@ -406,16 +404,7 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
if (ustate->xsave.header.xfeatures & ~xcr0)
return -EINVAL;

- ret = copy_uabi_from_kernel_to_xstate(kstate, ustate);
- if (ret)
- return ret;
-
- /* Retrieve PKRU if not in init state */
- if (kstate->regs.xsave.header.xfeatures & XFEATURE_MASK_PKRU) {
- xpkru = get_xsave_addr(&kstate->regs.xsave, XFEATURE_PKRU);
- *vpkru = xpkru->pkru;
- }
- return 0;
+ return copy_uabi_from_kernel_to_xstate(kstate, ustate, vpkru);
}
EXPORT_SYMBOL_GPL(fpu_copy_uabi_to_guest_fpstate);
#endif /* CONFIG_KVM */
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 75ffaef8c299..6d056b68f4ed 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -167,7 +167,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
}

fpu_force_restore(fpu);
- ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf);
+ ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf, &target->thread.pkru);

out:
vfree(tmpbuf);
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 91d4b6de58ab..558076dbde5b 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -396,7 +396,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,

fpregs = &fpu->fpstate->regs;
if (use_xsave() && !fx_only) {
- if (copy_sigframe_from_user_to_xstate(fpu->fpstate, buf_fx))
+ if (copy_sigframe_from_user_to_xstate(tsk, buf_fx))
return false;
} else {
if (__copy_from_user(&fpregs->fxsave, buf_fx,
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c8340156bfd2..e01d3514ae68 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1197,7 +1197,7 @@ static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size,


static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
- const void __user *ubuf)
+ const void __user *ubuf, u32 *pkru)
{
struct xregs_state *xsave = &fpstate->regs.xsave;
unsigned int offset, size;
@@ -1235,6 +1235,24 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
for (i = 0; i < XFEATURE_MAX; i++) {
mask = BIT_ULL(i);

+ if (i == XFEATURE_PKRU) {
+ /*
+ * Retrieve PKRU if not in init state, otherwise
+ * initialize it.
+ */
+ if (hdr.xfeatures & mask) {
+ struct pkru_state xpkru = {0};
+
+ if (copy_from_buffer(&xpkru, xstate_offsets[i],
+ sizeof(xpkru), kbuf, ubuf))
+ return -EFAULT;
+
+ *pkru = xpkru.pkru;
+ } else {
+ *pkru = 0;
+ }
+ }
+
if (hdr.xfeatures & mask) {
void *dst = __raw_xsave_addr(xsave, i);

@@ -1264,9 +1282,9 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
* Convert from a ptrace standard-format kernel buffer to kernel XSAVE[S]
* format and copy to the target thread. Used by ptrace and KVM.
*/
-int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf)
+int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u32 *pkru)
{
- return copy_uabi_to_xstate(fpstate, kbuf, NULL);
+ return copy_uabi_to_xstate(fpstate, kbuf, NULL, pkru);
}

/*
@@ -1274,10 +1292,10 @@ int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf)
* XSAVE[S] format and copy to the target thread. This is called from the
* sigreturn() and rt_sigreturn() system calls.
*/
-int copy_sigframe_from_user_to_xstate(struct fpstate *fpstate,
+int copy_sigframe_from_user_to_xstate(struct task_struct *tsk,
const void __user *ubuf)
{
- return copy_uabi_to_xstate(fpstate, NULL, ubuf);
+ return copy_uabi_to_xstate(tsk->thread.fpu.fpstate, NULL, ubuf, &tsk->thread.pkru);
}

static bool validate_independent_components(u64 mask)
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 5ad47031383b..a4ecb04d8d64 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -46,8 +46,8 @@ extern void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
u32 pkru_val, enum xstate_copy_mode copy_mode);
extern void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
enum xstate_copy_mode mode);
-extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf);
-extern int copy_sigframe_from_user_to_xstate(struct fpstate *fpstate, const void __user *ubuf);
+extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u32 *pkru);
+extern int copy_sigframe_from_user_to_xstate(struct task_struct *tsk, const void __user *ubuf);


extern void fpu__init_cpu_xstate(void);
--
2.37.0


2022-08-05 23:05:57

by Kyle Huey

[permalink] [raw]
Subject: [PATCH v4 2/2] selftests/vm/pkeys: Add a regression test for setting PKRU through ptrace

From: Kyle Huey <[email protected]>

This tests PTRACE_SETREGSET with NT_X86_XSTATE modifying PKRU directly and
removing the PKRU bit from XSTATE_BV.

Signed-off-by: Kyle Huey <[email protected]>
---
tools/testing/selftests/vm/pkey-x86.h | 12 +++
tools/testing/selftests/vm/protection_keys.c | 88 +++++++++++++++++++-
2 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-x86.h b/tools/testing/selftests/vm/pkey-x86.h
index b078ce9c6d2a..72c14cd3ddc7 100644
--- a/tools/testing/selftests/vm/pkey-x86.h
+++ b/tools/testing/selftests/vm/pkey-x86.h
@@ -104,6 +104,18 @@ static inline int cpu_has_pkeys(void)
return 1;
}

+static inline int cpu_max_xsave_size(void)
+{
+ unsigned long XSTATE_CPUID = 0xd;
+ unsigned int eax;
+ unsigned int ebx;
+ unsigned int ecx;
+ unsigned int edx;
+
+ __cpuid_count(XSTATE_CPUID, 0, eax, ebx, ecx, edx);
+ return ecx;
+}
+
static inline u32 pkey_bit_position(int pkey)
{
return pkey * PKEY_BITS_PER_PKEY;
diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
index 291bc1e07842..27759d3ed9cd 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -18,12 +18,13 @@
* do a plain mprotect() to a mprotect_pkey() area and make sure the pkey sticks
*
* Compile like this:
- * gcc -o protection_keys -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
- * gcc -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
+ * gcc -mxsave -o protection_keys -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
+ * gcc -mxsave -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
*/
#define _GNU_SOURCE
#define __SANE_USERSPACE_TYPES__
#include <errno.h>
+#include <linux/elf.h>
#include <linux/futex.h>
#include <time.h>
#include <sys/time.h>
@@ -1550,6 +1551,86 @@ void test_implicit_mprotect_exec_only_memory(int *ptr, u16 pkey)
do_not_expect_pkey_fault("plain read on recently PROT_EXEC area");
}

+#if defined(__i386__) || defined(__x86_64__)
+void test_ptrace_modifies_pkru(int *ptr, u16 pkey)
+{
+ pid_t child;
+ int status, ret;
+ int pkey_offset = pkey_reg_xstate_offset();
+ size_t xsave_size = cpu_max_xsave_size();
+ void *xsave;
+ u32 *pkey_register;
+ u64 *xstate_bv;
+ struct iovec iov;
+
+ child = fork();
+ pkey_assert(child >= 0);
+ dprintf3("[%d] fork() ret: %d\n", getpid(), child);
+ if (!child) {
+ u32 pkey_register = read_pkey_reg();
+
+ ptrace(PTRACE_TRACEME, 0, 0, 0);
+ raise(SIGSTOP);
+
+ /*
+ * need __read_pkey_reg() version so we do not do shadow_pkey_reg
+ * checking
+ */
+ if (pkey_register == __read_pkey_reg())
+ exit(1);
+
+ raise(SIGSTOP);
+
+ exit(__read_pkey_reg());
+ }
+
+ pkey_assert(child == waitpid(child, &status, 0));
+ dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
+ pkey_assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
+
+ xsave = (void *)malloc(xsave_size);
+ pkey_assert(xsave > 0);
+
+ iov.iov_base = xsave;
+ iov.iov_len = xsave_size;
+ ret = ptrace(PTRACE_GETREGSET, child, (void *)NT_X86_XSTATE, &iov);
+ pkey_assert(ret == 0);
+
+ pkey_register = (u32 *)(xsave + pkey_offset);
+ pkey_assert(*pkey_register == read_pkey_reg());
+
+ *pkey_register = !read_pkey_reg();
+
+ ret = ptrace(PTRACE_SETREGSET, child, (void *)NT_X86_XSTATE, &iov);
+ pkey_assert(ret == 0);
+
+ ret = ptrace(PTRACE_CONT, child, 0, 0);
+ pkey_assert(ret == 0);
+
+ pkey_assert(child == waitpid(child, &status, 0));
+ dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
+ pkey_assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
+
+ ret = ptrace(PTRACE_GETREGSET, child, (void *)NT_X86_XSTATE, &iov);
+ pkey_assert(ret == 0);
+
+ xstate_bv = (u64 *)(xsave + 512);
+ *xstate_bv &= ~(1 << 9);
+
+ ret = ptrace(PTRACE_SETREGSET, child, (void *)NT_X86_XSTATE, &iov);
+ pkey_assert(ret == 0);
+
+ ret = ptrace(PTRACE_CONT, child, 0, 0);
+ pkey_assert(ret == 0);
+
+ pkey_assert(child == waitpid(child, &status, 0));
+ dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
+ pkey_assert(WIFEXITED(status));
+ pkey_assert(WEXITSTATUS(status) == 0);
+ free(xsave);
+}
+#endif
+
void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 pkey)
{
int size = PAGE_SIZE;
@@ -1585,6 +1666,9 @@ void (*pkey_tests[])(int *ptr, u16 pkey) = {
test_pkey_syscalls_bad_args,
test_pkey_alloc_exhaust,
test_pkey_alloc_free_attach_pkey0,
+#if defined(__i386__) || defined(__x86_64__)
+ test_ptrace_modifies_pkru,
+#endif
};

void run_tests_once(void)
--
2.37.0

2022-08-06 08:59:46

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] selftests/vm/pkeys: Add a regression test for setting PKRU through ptrace


* Kyle Huey <[email protected]> wrote:

> From: Kyle Huey <[email protected]>
>
> This tests PTRACE_SETREGSET with NT_X86_XSTATE modifying PKRU directly and
> removing the PKRU bit from XSTATE_BV.
>
> Signed-off-by: Kyle Huey <[email protected]>
> ---
> tools/testing/selftests/vm/pkey-x86.h | 12 +++
> tools/testing/selftests/vm/protection_keys.c | 88 +++++++++++++++++++-
> 2 files changed, 98 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/vm/pkey-x86.h b/tools/testing/selftests/vm/pkey-x86.h
> index b078ce9c6d2a..72c14cd3ddc7 100644
> --- a/tools/testing/selftests/vm/pkey-x86.h
> +++ b/tools/testing/selftests/vm/pkey-x86.h
> @@ -104,6 +104,18 @@ static inline int cpu_has_pkeys(void)
> return 1;
> }
>
> +static inline int cpu_max_xsave_size(void)
> +{
> + unsigned long XSTATE_CPUID = 0xd;
> + unsigned int eax;
> + unsigned int ebx;
> + unsigned int ecx;
> + unsigned int edx;
> +
> + __cpuid_count(XSTATE_CPUID, 0, eax, ebx, ecx, edx);
> + return ecx;
> +}
> +
> static inline u32 pkey_bit_position(int pkey)
> {
> return pkey * PKEY_BITS_PER_PKEY;
> diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
> index 291bc1e07842..27759d3ed9cd 100644
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -18,12 +18,13 @@
> * do a plain mprotect() to a mprotect_pkey() area and make sure the pkey sticks
> *
> * Compile like this:
> - * gcc -o protection_keys -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> - * gcc -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> + * gcc -mxsave -o protection_keys -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> + * gcc -mxsave -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> */
> #define _GNU_SOURCE
> #define __SANE_USERSPACE_TYPES__
> #include <errno.h>
> +#include <linux/elf.h>
> #include <linux/futex.h>
> #include <time.h>
> #include <sys/time.h>
> @@ -1550,6 +1551,86 @@ void test_implicit_mprotect_exec_only_memory(int *ptr, u16 pkey)
> do_not_expect_pkey_fault("plain read on recently PROT_EXEC area");
> }
>
> +#if defined(__i386__) || defined(__x86_64__)
> +void test_ptrace_modifies_pkru(int *ptr, u16 pkey)
> +{
> + pid_t child;
> + int status, ret;
> + int pkey_offset = pkey_reg_xstate_offset();
> + size_t xsave_size = cpu_max_xsave_size();
> + void *xsave;
> + u32 *pkey_register;
> + u64 *xstate_bv;
> + struct iovec iov;
> +
> + child = fork();
> + pkey_assert(child >= 0);
> + dprintf3("[%d] fork() ret: %d\n", getpid(), child);
> + if (!child) {
> + u32 pkey_register = read_pkey_reg();
> +
> + ptrace(PTRACE_TRACEME, 0, 0, 0);
> + raise(SIGSTOP);
> +
> + /*
> + * need __read_pkey_reg() version so we do not do shadow_pkey_reg
> + * checking
> + */
> + if (pkey_register == __read_pkey_reg())
> + exit(1);
> +
> + raise(SIGSTOP);
> +
> + exit(__read_pkey_reg());
> + }
> +
> + pkey_assert(child == waitpid(child, &status, 0));
> + dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
> + pkey_assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
> +
> + xsave = (void *)malloc(xsave_size);
> + pkey_assert(xsave > 0);
> +
> + iov.iov_base = xsave;
> + iov.iov_len = xsave_size;
> + ret = ptrace(PTRACE_GETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> + pkey_assert(ret == 0);
> +
> + pkey_register = (u32 *)(xsave + pkey_offset);
> + pkey_assert(*pkey_register == read_pkey_reg());
> +
> + *pkey_register = !read_pkey_reg();
> +
> + ret = ptrace(PTRACE_SETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> + pkey_assert(ret == 0);
> +
> + ret = ptrace(PTRACE_CONT, child, 0, 0);
> + pkey_assert(ret == 0);
> +
> + pkey_assert(child == waitpid(child, &status, 0));
> + dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
> + pkey_assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
> +
> + ret = ptrace(PTRACE_GETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> + pkey_assert(ret == 0);
> +
> + xstate_bv = (u64 *)(xsave + 512);
> + *xstate_bv &= ~(1 << 9);
> +
> + ret = ptrace(PTRACE_SETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> + pkey_assert(ret == 0);
> +
> + ret = ptrace(PTRACE_CONT, child, 0, 0);
> + pkey_assert(ret == 0);
> +
> + pkey_assert(child == waitpid(child, &status, 0));
> + dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
> + pkey_assert(WIFEXITED(status));
> + pkey_assert(WEXITSTATUS(status) == 0);
> + free(xsave);

LGTM.

May I ask for a bit more in terms of testing the ABI: writing some
non-trivial (not all-zero and not all-ones) value into the PKRU register,
forcing the child task to go through a FPU save/restore context switch
and then reading it back and verifying the value, or something like that?

Thanks,

Ingo

2022-08-06 13:00:41

by Kyle Huey

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] selftests/vm/pkeys: Add a regression test for setting PKRU through ptrace

On Sat, Aug 6, 2022 at 1:52 AM Ingo Molnar <[email protected]> wrote:
>
>
> * Kyle Huey <[email protected]> wrote:
>
> > From: Kyle Huey <[email protected]>
> >
> > This tests PTRACE_SETREGSET with NT_X86_XSTATE modifying PKRU directly and
> > removing the PKRU bit from XSTATE_BV.
> >
> > Signed-off-by: Kyle Huey <[email protected]>
> > ---
> > tools/testing/selftests/vm/pkey-x86.h | 12 +++
> > tools/testing/selftests/vm/protection_keys.c | 88 +++++++++++++++++++-
> > 2 files changed, 98 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/testing/selftests/vm/pkey-x86.h b/tools/testing/selftests/vm/pkey-x86.h
> > index b078ce9c6d2a..72c14cd3ddc7 100644
> > --- a/tools/testing/selftests/vm/pkey-x86.h
> > +++ b/tools/testing/selftests/vm/pkey-x86.h
> > @@ -104,6 +104,18 @@ static inline int cpu_has_pkeys(void)
> > return 1;
> > }
> >
> > +static inline int cpu_max_xsave_size(void)
> > +{
> > + unsigned long XSTATE_CPUID = 0xd;
> > + unsigned int eax;
> > + unsigned int ebx;
> > + unsigned int ecx;
> > + unsigned int edx;
> > +
> > + __cpuid_count(XSTATE_CPUID, 0, eax, ebx, ecx, edx);
> > + return ecx;
> > +}
> > +
> > static inline u32 pkey_bit_position(int pkey)
> > {
> > return pkey * PKEY_BITS_PER_PKEY;
> > diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
> > index 291bc1e07842..27759d3ed9cd 100644
> > --- a/tools/testing/selftests/vm/protection_keys.c
> > +++ b/tools/testing/selftests/vm/protection_keys.c
> > @@ -18,12 +18,13 @@
> > * do a plain mprotect() to a mprotect_pkey() area and make sure the pkey sticks
> > *
> > * Compile like this:
> > - * gcc -o protection_keys -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> > - * gcc -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> > + * gcc -mxsave -o protection_keys -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> > + * gcc -mxsave -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> > */
> > #define _GNU_SOURCE
> > #define __SANE_USERSPACE_TYPES__
> > #include <errno.h>
> > +#include <linux/elf.h>
> > #include <linux/futex.h>
> > #include <time.h>
> > #include <sys/time.h>
> > @@ -1550,6 +1551,86 @@ void test_implicit_mprotect_exec_only_memory(int *ptr, u16 pkey)
> > do_not_expect_pkey_fault("plain read on recently PROT_EXEC area");
> > }
> >
> > +#if defined(__i386__) || defined(__x86_64__)
> > +void test_ptrace_modifies_pkru(int *ptr, u16 pkey)
> > +{
> > + pid_t child;
> > + int status, ret;
> > + int pkey_offset = pkey_reg_xstate_offset();
> > + size_t xsave_size = cpu_max_xsave_size();
> > + void *xsave;
> > + u32 *pkey_register;
> > + u64 *xstate_bv;
> > + struct iovec iov;
> > +
> > + child = fork();
> > + pkey_assert(child >= 0);
> > + dprintf3("[%d] fork() ret: %d\n", getpid(), child);
> > + if (!child) {
> > + u32 pkey_register = read_pkey_reg();
> > +
> > + ptrace(PTRACE_TRACEME, 0, 0, 0);
> > + raise(SIGSTOP);
> > +
> > + /*
> > + * need __read_pkey_reg() version so we do not do shadow_pkey_reg
> > + * checking
> > + */
> > + if (pkey_register == __read_pkey_reg())
> > + exit(1);
> > +
> > + raise(SIGSTOP);
> > +
> > + exit(__read_pkey_reg());
> > + }
> > +
> > + pkey_assert(child == waitpid(child, &status, 0));
> > + dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
> > + pkey_assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
> > +
> > + xsave = (void *)malloc(xsave_size);
> > + pkey_assert(xsave > 0);
> > +
> > + iov.iov_base = xsave;
> > + iov.iov_len = xsave_size;
> > + ret = ptrace(PTRACE_GETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> > + pkey_assert(ret == 0);
> > +
> > + pkey_register = (u32 *)(xsave + pkey_offset);
> > + pkey_assert(*pkey_register == read_pkey_reg());
> > +
> > + *pkey_register = !read_pkey_reg();
> > +
> > + ret = ptrace(PTRACE_SETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> > + pkey_assert(ret == 0);
> > +
> > + ret = ptrace(PTRACE_CONT, child, 0, 0);
> > + pkey_assert(ret == 0);
> > +
> > + pkey_assert(child == waitpid(child, &status, 0));
> > + dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
> > + pkey_assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
> > +
> > + ret = ptrace(PTRACE_GETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> > + pkey_assert(ret == 0);
> > +
> > + xstate_bv = (u64 *)(xsave + 512);
> > + *xstate_bv &= ~(1 << 9);
> > +
> > + ret = ptrace(PTRACE_SETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> > + pkey_assert(ret == 0);
> > +
> > + ret = ptrace(PTRACE_CONT, child, 0, 0);
> > + pkey_assert(ret == 0);
> > +
> > + pkey_assert(child == waitpid(child, &status, 0));
> > + dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
> > + pkey_assert(WIFEXITED(status));
> > + pkey_assert(WEXITSTATUS(status) == 0);
> > + free(xsave);
>
> LGTM.
>
> May I ask for a bit more in terms of testing the ABI: writing some
> non-trivial (not all-zero and not all-ones) value into the PKRU register,
> forcing the child task to go through a FPU save/restore context switch
> and then reading it back and verifying the value, or something like that?

Can you elaborate a bit on what you mean here? I'm not sure what "a
FPU save/restore context switch" is. The XSTATE (and everything else)
will be saved/restored at the ptrace stops (for the raise(SIGSTOP)s)
already.

- Kyle

> Thanks,
>
> Ingo

2022-08-06 19:23:57

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] selftests/vm/pkeys: Add a regression test for setting PKRU through ptrace


* Kyle Huey <[email protected]> wrote:

> On Sat, Aug 6, 2022 at 1:52 AM Ingo Molnar <[email protected]> wrote:
> >
> >
> > * Kyle Huey <[email protected]> wrote:
> >
> > > From: Kyle Huey <[email protected]>
> > >
> > > This tests PTRACE_SETREGSET with NT_X86_XSTATE modifying PKRU directly and
> > > removing the PKRU bit from XSTATE_BV.
> > >
> > > Signed-off-by: Kyle Huey <[email protected]>
> > > ---
> > > tools/testing/selftests/vm/pkey-x86.h | 12 +++
> > > tools/testing/selftests/vm/protection_keys.c | 88 +++++++++++++++++++-
> > > 2 files changed, 98 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/tools/testing/selftests/vm/pkey-x86.h b/tools/testing/selftests/vm/pkey-x86.h
> > > index b078ce9c6d2a..72c14cd3ddc7 100644
> > > --- a/tools/testing/selftests/vm/pkey-x86.h
> > > +++ b/tools/testing/selftests/vm/pkey-x86.h
> > > @@ -104,6 +104,18 @@ static inline int cpu_has_pkeys(void)
> > > return 1;
> > > }
> > >
> > > +static inline int cpu_max_xsave_size(void)
> > > +{
> > > + unsigned long XSTATE_CPUID = 0xd;
> > > + unsigned int eax;
> > > + unsigned int ebx;
> > > + unsigned int ecx;
> > > + unsigned int edx;
> > > +
> > > + __cpuid_count(XSTATE_CPUID, 0, eax, ebx, ecx, edx);
> > > + return ecx;
> > > +}
> > > +
> > > static inline u32 pkey_bit_position(int pkey)
> > > {
> > > return pkey * PKEY_BITS_PER_PKEY;
> > > diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
> > > index 291bc1e07842..27759d3ed9cd 100644
> > > --- a/tools/testing/selftests/vm/protection_keys.c
> > > +++ b/tools/testing/selftests/vm/protection_keys.c
> > > @@ -18,12 +18,13 @@
> > > * do a plain mprotect() to a mprotect_pkey() area and make sure the pkey sticks
> > > *
> > > * Compile like this:
> > > - * gcc -o protection_keys -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> > > - * gcc -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> > > + * gcc -mxsave -o protection_keys -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> > > + * gcc -mxsave -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> > > */
> > > #define _GNU_SOURCE
> > > #define __SANE_USERSPACE_TYPES__
> > > #include <errno.h>
> > > +#include <linux/elf.h>
> > > #include <linux/futex.h>
> > > #include <time.h>
> > > #include <sys/time.h>
> > > @@ -1550,6 +1551,86 @@ void test_implicit_mprotect_exec_only_memory(int *ptr, u16 pkey)
> > > do_not_expect_pkey_fault("plain read on recently PROT_EXEC area");
> > > }
> > >
> > > +#if defined(__i386__) || defined(__x86_64__)
> > > +void test_ptrace_modifies_pkru(int *ptr, u16 pkey)
> > > +{
> > > + pid_t child;
> > > + int status, ret;
> > > + int pkey_offset = pkey_reg_xstate_offset();
> > > + size_t xsave_size = cpu_max_xsave_size();
> > > + void *xsave;
> > > + u32 *pkey_register;
> > > + u64 *xstate_bv;
> > > + struct iovec iov;
> > > +
> > > + child = fork();
> > > + pkey_assert(child >= 0);
> > > + dprintf3("[%d] fork() ret: %d\n", getpid(), child);
> > > + if (!child) {
> > > + u32 pkey_register = read_pkey_reg();
> > > +
> > > + ptrace(PTRACE_TRACEME, 0, 0, 0);
> > > + raise(SIGSTOP);
> > > +
> > > + /*
> > > + * need __read_pkey_reg() version so we do not do shadow_pkey_reg
> > > + * checking
> > > + */
> > > + if (pkey_register == __read_pkey_reg())
> > > + exit(1);
> > > +
> > > + raise(SIGSTOP);
> > > +
> > > + exit(__read_pkey_reg());
> > > + }
> > > +
> > > + pkey_assert(child == waitpid(child, &status, 0));
> > > + dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
> > > + pkey_assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
> > > +
> > > + xsave = (void *)malloc(xsave_size);
> > > + pkey_assert(xsave > 0);
> > > +
> > > + iov.iov_base = xsave;
> > > + iov.iov_len = xsave_size;
> > > + ret = ptrace(PTRACE_GETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> > > + pkey_assert(ret == 0);
> > > +
> > > + pkey_register = (u32 *)(xsave + pkey_offset);
> > > + pkey_assert(*pkey_register == read_pkey_reg());
> > > +
> > > + *pkey_register = !read_pkey_reg();
> > > +
> > > + ret = ptrace(PTRACE_SETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> > > + pkey_assert(ret == 0);
> > > +
> > > + ret = ptrace(PTRACE_CONT, child, 0, 0);
> > > + pkey_assert(ret == 0);
> > > +
> > > + pkey_assert(child == waitpid(child, &status, 0));
> > > + dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
> > > + pkey_assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
> > > +
> > > + ret = ptrace(PTRACE_GETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> > > + pkey_assert(ret == 0);
> > > +
> > > + xstate_bv = (u64 *)(xsave + 512);
> > > + *xstate_bv &= ~(1 << 9);
> > > +
> > > + ret = ptrace(PTRACE_SETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> > > + pkey_assert(ret == 0);
> > > +
> > > + ret = ptrace(PTRACE_CONT, child, 0, 0);
> > > + pkey_assert(ret == 0);
> > > +
> > > + pkey_assert(child == waitpid(child, &status, 0));
> > > + dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
> > > + pkey_assert(WIFEXITED(status));
> > > + pkey_assert(WEXITSTATUS(status) == 0);
> > > + free(xsave);
> >
> > LGTM.
> >
> > May I ask for a bit more in terms of testing the ABI: writing some
> > non-trivial (not all-zero and not all-ones) value into the PKRU register,
> > forcing the child task to go through a FPU save/restore context switch
> > and then reading it back and verifying the value, or something like that?
>
> Can you elaborate a bit on what you mean here? I'm not sure what "a
> FPU save/restore context switch" is. The XSTATE (and everything else)
> will be saved/restored at the ptrace stops (for the raise(SIGSTOP)s)
> already.

Yeah, here I meant that the ptraced child actually has to execute to carry
the new values - and AFAICS that already happens in your testcase, as
there's a PTRACE_CONT+waitpid() between the PTRACE_SETREGSET and the second
PTRACE_GETREGSET call, right?

If so, then the testcase should be mostly fine, except would it make sense
to use something less trivial than clearing the permission bitmask:

> > > + xstate_bv = (u64 *)(xsave + 512);
> > > + *xstate_bv &= ~(1 << 9);

if I'm reading the code right? A 01010101 bitmask perhaps?

Thanks,

Ingo

2022-08-06 19:36:57

by Kyle Huey

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] selftests/vm/pkeys: Add a regression test for setting PKRU through ptrace

On Sat, Aug 6, 2022 at 11:55 AM Ingo Molnar <[email protected]> wrote:
>
>
> * Kyle Huey <[email protected]> wrote:
>
> > On Sat, Aug 6, 2022 at 1:52 AM Ingo Molnar <[email protected]> wrote:
> > >
> > >
> > > * Kyle Huey <[email protected]> wrote:
> > >
> > > > From: Kyle Huey <[email protected]>
> > > >
> > > > This tests PTRACE_SETREGSET with NT_X86_XSTATE modifying PKRU directly and
> > > > removing the PKRU bit from XSTATE_BV.
> > > >
> > > > Signed-off-by: Kyle Huey <[email protected]>
> > > > ---
> > > > tools/testing/selftests/vm/pkey-x86.h | 12 +++
> > > > tools/testing/selftests/vm/protection_keys.c | 88 +++++++++++++++++++-
> > > > 2 files changed, 98 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/tools/testing/selftests/vm/pkey-x86.h b/tools/testing/selftests/vm/pkey-x86.h
> > > > index b078ce9c6d2a..72c14cd3ddc7 100644
> > > > --- a/tools/testing/selftests/vm/pkey-x86.h
> > > > +++ b/tools/testing/selftests/vm/pkey-x86.h
> > > > @@ -104,6 +104,18 @@ static inline int cpu_has_pkeys(void)
> > > > return 1;
> > > > }
> > > >
> > > > +static inline int cpu_max_xsave_size(void)
> > > > +{
> > > > + unsigned long XSTATE_CPUID = 0xd;
> > > > + unsigned int eax;
> > > > + unsigned int ebx;
> > > > + unsigned int ecx;
> > > > + unsigned int edx;
> > > > +
> > > > + __cpuid_count(XSTATE_CPUID, 0, eax, ebx, ecx, edx);
> > > > + return ecx;
> > > > +}
> > > > +
> > > > static inline u32 pkey_bit_position(int pkey)
> > > > {
> > > > return pkey * PKEY_BITS_PER_PKEY;
> > > > diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
> > > > index 291bc1e07842..27759d3ed9cd 100644
> > > > --- a/tools/testing/selftests/vm/protection_keys.c
> > > > +++ b/tools/testing/selftests/vm/protection_keys.c
> > > > @@ -18,12 +18,13 @@
> > > > * do a plain mprotect() to a mprotect_pkey() area and make sure the pkey sticks
> > > > *
> > > > * Compile like this:
> > > > - * gcc -o protection_keys -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> > > > - * gcc -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> > > > + * gcc -mxsave -o protection_keys -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> > > > + * gcc -mxsave -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> > > > */
> > > > #define _GNU_SOURCE
> > > > #define __SANE_USERSPACE_TYPES__
> > > > #include <errno.h>
> > > > +#include <linux/elf.h>
> > > > #include <linux/futex.h>
> > > > #include <time.h>
> > > > #include <sys/time.h>
> > > > @@ -1550,6 +1551,86 @@ void test_implicit_mprotect_exec_only_memory(int *ptr, u16 pkey)
> > > > do_not_expect_pkey_fault("plain read on recently PROT_EXEC area");
> > > > }
> > > >
> > > > +#if defined(__i386__) || defined(__x86_64__)
> > > > +void test_ptrace_modifies_pkru(int *ptr, u16 pkey)
> > > > +{
> > > > + pid_t child;
> > > > + int status, ret;
> > > > + int pkey_offset = pkey_reg_xstate_offset();
> > > > + size_t xsave_size = cpu_max_xsave_size();
> > > > + void *xsave;
> > > > + u32 *pkey_register;
> > > > + u64 *xstate_bv;
> > > > + struct iovec iov;
> > > > +
> > > > + child = fork();
> > > > + pkey_assert(child >= 0);
> > > > + dprintf3("[%d] fork() ret: %d\n", getpid(), child);
> > > > + if (!child) {
> > > > + u32 pkey_register = read_pkey_reg();
> > > > +
> > > > + ptrace(PTRACE_TRACEME, 0, 0, 0);
> > > > + raise(SIGSTOP);
> > > > +
> > > > + /*
> > > > + * need __read_pkey_reg() version so we do not do shadow_pkey_reg
> > > > + * checking
> > > > + */
> > > > + if (pkey_register == __read_pkey_reg())
> > > > + exit(1);
> > > > +
> > > > + raise(SIGSTOP);
> > > > +
> > > > + exit(__read_pkey_reg());
> > > > + }
> > > > +
> > > > + pkey_assert(child == waitpid(child, &status, 0));
> > > > + dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
> > > > + pkey_assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
> > > > +
> > > > + xsave = (void *)malloc(xsave_size);
> > > > + pkey_assert(xsave > 0);
> > > > +
> > > > + iov.iov_base = xsave;
> > > > + iov.iov_len = xsave_size;
> > > > + ret = ptrace(PTRACE_GETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> > > > + pkey_assert(ret == 0);
> > > > +
> > > > + pkey_register = (u32 *)(xsave + pkey_offset);
> > > > + pkey_assert(*pkey_register == read_pkey_reg());
> > > > +
> > > > + *pkey_register = !read_pkey_reg();
> > > > +
> > > > + ret = ptrace(PTRACE_SETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> > > > + pkey_assert(ret == 0);
> > > > +
> > > > + ret = ptrace(PTRACE_CONT, child, 0, 0);
> > > > + pkey_assert(ret == 0);
> > > > +
> > > > + pkey_assert(child == waitpid(child, &status, 0));
> > > > + dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
> > > > + pkey_assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
> > > > +
> > > > + ret = ptrace(PTRACE_GETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> > > > + pkey_assert(ret == 0);
> > > > +
> > > > + xstate_bv = (u64 *)(xsave + 512);
> > > > + *xstate_bv &= ~(1 << 9);
> > > > +
> > > > + ret = ptrace(PTRACE_SETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> > > > + pkey_assert(ret == 0);
> > > > +
> > > > + ret = ptrace(PTRACE_CONT, child, 0, 0);
> > > > + pkey_assert(ret == 0);
> > > > +
> > > > + pkey_assert(child == waitpid(child, &status, 0));
> > > > + dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
> > > > + pkey_assert(WIFEXITED(status));
> > > > + pkey_assert(WEXITSTATUS(status) == 0);
> > > > + free(xsave);
> > >
> > > LGTM.
> > >
> > > May I ask for a bit more in terms of testing the ABI: writing some
> > > non-trivial (not all-zero and not all-ones) value into the PKRU register,
> > > forcing the child task to go through a FPU save/restore context switch
> > > and then reading it back and verifying the value, or something like that?
> >
> > Can you elaborate a bit on what you mean here? I'm not sure what "a
> > FPU save/restore context switch" is. The XSTATE (and everything else)
> > will be saved/restored at the ptrace stops (for the raise(SIGSTOP)s)
> > already.
>
> Yeah, here I meant that the ptraced child actually has to execute to carry
> the new values - and AFAICS that already happens in your testcase, as
> there's a PTRACE_CONT+waitpid() between the PTRACE_SETREGSET and the second
> PTRACE_GETREGSET call, right?

Yeah. The gdb command sequence I reported is essentially doing
PTRACE_SETREGSET immediately followed by PTRACE_GETREGSET with no
intervening execution. And while that is a visible manifestation of
the bug, what I really care about is the modifications being reflected
in the execution of the ptracee. I could add code to read back the
value via PTRACE_GETREGSET too if desired.

> If so, then the testcase should be mostly fine, except would it make sense
> to use something less trivial than clearing the permission bitmask:
>
> > > > + xstate_bv = (u64 *)(xsave + 512);
> > > > + *xstate_bv &= ~(1 << 9);
>
> if I'm reading the code right? A 01010101 bitmask perhaps?

So there's two tests (the two PTRACE_SETREGSET calls). The first one
tests modifying the PKRU value stored in the XSTATE. I actually meant
this to negate the existing PKRU value and flip all the bits but I
realize now I wrote ! instead of ~ (! is the bitwise negation operator
in Rust, which is what I do most of the time). !PKRU is still
different from PKRU though, so the test does fail on affected kernels
and pass after the fix which the child sees that the PKRU value did
change somehow and doesn't take the early exit(1).

The second part, which you've highlighted, tests clearing the PKRU bit
in the XSTATE_BV field. This would cause an XRSTOR to set PKRU to the
hardware init value (0) and the test checks that the value is indeed
zero (by having the child exit with the PKRU register value as its
return code, and then checking that the child exited with code 0).

- Kyle

> Thanks,
>
> Ingo

2022-08-06 19:40:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] selftests/vm/pkeys: Add a regression test for setting PKRU through ptrace


* Kyle Huey <[email protected]> wrote:

> On Sat, Aug 6, 2022 at 11:55 AM Ingo Molnar <[email protected]> wrote:
> >
> >
> > * Kyle Huey <[email protected]> wrote:
> >
> > > On Sat, Aug 6, 2022 at 1:52 AM Ingo Molnar <[email protected]> wrote:
> > > >
> > > >
> > > > * Kyle Huey <[email protected]> wrote:
> > > >
> > > > > From: Kyle Huey <[email protected]>
> > > > >
> > > > > This tests PTRACE_SETREGSET with NT_X86_XSTATE modifying PKRU directly and
> > > > > removing the PKRU bit from XSTATE_BV.
> > > > >
> > > > > Signed-off-by: Kyle Huey <[email protected]>
> > > > > ---
> > > > > tools/testing/selftests/vm/pkey-x86.h | 12 +++
> > > > > tools/testing/selftests/vm/protection_keys.c | 88 +++++++++++++++++++-
> > > > > 2 files changed, 98 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/tools/testing/selftests/vm/pkey-x86.h b/tools/testing/selftests/vm/pkey-x86.h
> > > > > index b078ce9c6d2a..72c14cd3ddc7 100644
> > > > > --- a/tools/testing/selftests/vm/pkey-x86.h
> > > > > +++ b/tools/testing/selftests/vm/pkey-x86.h
> > > > > @@ -104,6 +104,18 @@ static inline int cpu_has_pkeys(void)
> > > > > return 1;
> > > > > }
> > > > >
> > > > > +static inline int cpu_max_xsave_size(void)
> > > > > +{
> > > > > + unsigned long XSTATE_CPUID = 0xd;
> > > > > + unsigned int eax;
> > > > > + unsigned int ebx;
> > > > > + unsigned int ecx;
> > > > > + unsigned int edx;
> > > > > +
> > > > > + __cpuid_count(XSTATE_CPUID, 0, eax, ebx, ecx, edx);
> > > > > + return ecx;
> > > > > +}
> > > > > +
> > > > > static inline u32 pkey_bit_position(int pkey)
> > > > > {
> > > > > return pkey * PKEY_BITS_PER_PKEY;
> > > > > diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
> > > > > index 291bc1e07842..27759d3ed9cd 100644
> > > > > --- a/tools/testing/selftests/vm/protection_keys.c
> > > > > +++ b/tools/testing/selftests/vm/protection_keys.c
> > > > > @@ -18,12 +18,13 @@
> > > > > * do a plain mprotect() to a mprotect_pkey() area and make sure the pkey sticks
> > > > > *
> > > > > * Compile like this:
> > > > > - * gcc -o protection_keys -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> > > > > - * gcc -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> > > > > + * gcc -mxsave -o protection_keys -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> > > > > + * gcc -mxsave -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> > > > > */
> > > > > #define _GNU_SOURCE
> > > > > #define __SANE_USERSPACE_TYPES__
> > > > > #include <errno.h>
> > > > > +#include <linux/elf.h>
> > > > > #include <linux/futex.h>
> > > > > #include <time.h>
> > > > > #include <sys/time.h>
> > > > > @@ -1550,6 +1551,86 @@ void test_implicit_mprotect_exec_only_memory(int *ptr, u16 pkey)
> > > > > do_not_expect_pkey_fault("plain read on recently PROT_EXEC area");
> > > > > }
> > > > >
> > > > > +#if defined(__i386__) || defined(__x86_64__)
> > > > > +void test_ptrace_modifies_pkru(int *ptr, u16 pkey)
> > > > > +{
> > > > > + pid_t child;
> > > > > + int status, ret;
> > > > > + int pkey_offset = pkey_reg_xstate_offset();
> > > > > + size_t xsave_size = cpu_max_xsave_size();
> > > > > + void *xsave;
> > > > > + u32 *pkey_register;
> > > > > + u64 *xstate_bv;
> > > > > + struct iovec iov;
> > > > > +
> > > > > + child = fork();
> > > > > + pkey_assert(child >= 0);
> > > > > + dprintf3("[%d] fork() ret: %d\n", getpid(), child);
> > > > > + if (!child) {
> > > > > + u32 pkey_register = read_pkey_reg();
> > > > > +
> > > > > + ptrace(PTRACE_TRACEME, 0, 0, 0);
> > > > > + raise(SIGSTOP);
> > > > > +
> > > > > + /*
> > > > > + * need __read_pkey_reg() version so we do not do shadow_pkey_reg
> > > > > + * checking
> > > > > + */
> > > > > + if (pkey_register == __read_pkey_reg())
> > > > > + exit(1);
> > > > > +
> > > > > + raise(SIGSTOP);
> > > > > +
> > > > > + exit(__read_pkey_reg());
> > > > > + }
> > > > > +
> > > > > + pkey_assert(child == waitpid(child, &status, 0));
> > > > > + dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
> > > > > + pkey_assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
> > > > > +
> > > > > + xsave = (void *)malloc(xsave_size);
> > > > > + pkey_assert(xsave > 0);
> > > > > +
> > > > > + iov.iov_base = xsave;
> > > > > + iov.iov_len = xsave_size;
> > > > > + ret = ptrace(PTRACE_GETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> > > > > + pkey_assert(ret == 0);
> > > > > +
> > > > > + pkey_register = (u32 *)(xsave + pkey_offset);
> > > > > + pkey_assert(*pkey_register == read_pkey_reg());
> > > > > +
> > > > > + *pkey_register = !read_pkey_reg();
> > > > > +
> > > > > + ret = ptrace(PTRACE_SETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> > > > > + pkey_assert(ret == 0);
> > > > > +
> > > > > + ret = ptrace(PTRACE_CONT, child, 0, 0);
> > > > > + pkey_assert(ret == 0);
> > > > > +
> > > > > + pkey_assert(child == waitpid(child, &status, 0));
> > > > > + dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
> > > > > + pkey_assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
> > > > > +
> > > > > + ret = ptrace(PTRACE_GETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> > > > > + pkey_assert(ret == 0);
> > > > > +
> > > > > + xstate_bv = (u64 *)(xsave + 512);
> > > > > + *xstate_bv &= ~(1 << 9);
> > > > > +
> > > > > + ret = ptrace(PTRACE_SETREGSET, child, (void *)NT_X86_XSTATE, &iov);
> > > > > + pkey_assert(ret == 0);
> > > > > +
> > > > > + ret = ptrace(PTRACE_CONT, child, 0, 0);
> > > > > + pkey_assert(ret == 0);
> > > > > +
> > > > > + pkey_assert(child == waitpid(child, &status, 0));
> > > > > + dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
> > > > > + pkey_assert(WIFEXITED(status));
> > > > > + pkey_assert(WEXITSTATUS(status) == 0);
> > > > > + free(xsave);
> > > >
> > > > LGTM.
> > > >
> > > > May I ask for a bit more in terms of testing the ABI: writing some
> > > > non-trivial (not all-zero and not all-ones) value into the PKRU register,
> > > > forcing the child task to go through a FPU save/restore context switch
> > > > and then reading it back and verifying the value, or something like that?
> > >
> > > Can you elaborate a bit on what you mean here? I'm not sure what "a
> > > FPU save/restore context switch" is. The XSTATE (and everything else)
> > > will be saved/restored at the ptrace stops (for the raise(SIGSTOP)s)
> > > already.
> >
> > Yeah, here I meant that the ptraced child actually has to execute to carry
> > the new values - and AFAICS that already happens in your testcase, as
> > there's a PTRACE_CONT+waitpid() between the PTRACE_SETREGSET and the second
> > PTRACE_GETREGSET call, right?
>
> Yeah. The gdb command sequence I reported is essentially doing
> PTRACE_SETREGSET immediately followed by PTRACE_GETREGSET with no
> intervening execution. And while that is a visible manifestation of
> the bug, what I really care about is the modifications being reflected
> in the execution of the ptracee. I could add code to read back the
> value via PTRACE_GETREGSET too if desired.

Yeah, that would be nice and completes the test cycle pretty robustly if
the value written isn't "trivial" - and 'flipping' is fine:

> > If so, then the testcase should be mostly fine, except would it make sense
> > to use something less trivial than clearing the permission bitmask:
> >
> > > > > + xstate_bv = (u64 *)(xsave + 512);
> > > > > + *xstate_bv &= ~(1 << 9);
> >
> > if I'm reading the code right? A 01010101 bitmask perhaps?
>
> So there's two tests (the two PTRACE_SETREGSET calls). The first one
> tests modifying the PKRU value stored in the XSTATE. I actually meant
> this to negate the existing PKRU value and flip all the bits but I
> realize now I wrote ! instead of ~ (! is the bitwise negation operator
> in Rust, which is what I do most of the time). !PKRU is still
> different from PKRU though, so the test does fail on affected kernels
> and pass after the fix which the child sees that the PKRU value did
> change somehow and doesn't take the early exit(1).

Yeah, flipping the bits is even better, as it excercises all the bits.

> The second part, which you've highlighted, tests clearing the PKRU bit in
> the XSTATE_BV field. This would cause an XRSTOR to set PKRU to the
> hardware init value (0) and the test checks that the value is indeed zero
> (by having the child exit with the PKRU register value as its return
> code, and then checking that the child exited with code 0).

OK - I missed the 'PKRU == exit code' trick in the child, although
*technically* the range of exit codes is restricted, with silent clipping
of bits, so it's not a full 32-bit return code. I'd suggest exiting with a
known exit code instead if the PKRU value departs from expectations. A
debug session in that case will tell all the details - in the general case
we don't really expect these tests to fail.

Thanks,

Ingo