2021-06-02 10:21:39

by Thomas Gleixner

[permalink] [raw]
Subject: [patch 0/8] x86/fpu: Mop up XSAVES and related damage

syszbot reported a warnon for XRSTOR raising #GP:

https://lore.kernel.org/r/[email protected]

with a syzcaller reproducer and a conclusive bisect result.

It took a while to destill a simple C reproducer out of it which allowed to
pin point the root cause: The recent addition of supervisor XSTATEs broke
the signal restore path for the case where the signal handler wreckaged the
XSTATE on stack because it does not sanitize the XSTATE header which causes
a subsequent XRSTOR to fail and #GP.

The following series addresses the problem and fixes related issues which
were found while inspecting the related changes.

Thanks to Andy and Dave for working on this with me!

Thanks,

tglx
---
arch/x86/include/asm/fpu/xstate.h | 4
arch/x86/kernel/fpu/core.c | 62 ++++++---
arch/x86/kernel/fpu/regset.c | 43 ++----
arch/x86/kernel/fpu/signal.c | 30 +++-
arch/x86/kernel/fpu/xstate.c | 95 +++++----------
b/tools/testing/selftests/x86/corrupt_xstate_header.c | 114 ++++++++++++++++++
tools/testing/selftests/x86/Makefile | 3
7 files changed, 234 insertions(+), 117 deletions(-)




2021-06-02 21:30:33

by Yu-cheng Yu

[permalink] [raw]
Subject: Re: [patch 0/8] x86/fpu: Mop up XSAVES and related damage

On 6/2/2021 2:55 AM, Thomas Gleixner wrote:
> syszbot reported a warnon for XRSTOR raising #GP:
>
> https://lore.kernel.org/r/[email protected]
>
> with a syzcaller reproducer and a conclusive bisect result.
>
> It took a while to destill a simple C reproducer out of it which allowed to
> pin point the root cause: The recent addition of supervisor XSTATEs broke
> the signal restore path for the case where the signal handler wreckaged the
> XSTATE on stack because it does not sanitize the XSTATE header which causes
> a subsequent XRSTOR to fail and #GP.
>
> The following series addresses the problem and fixes related issues which
> were found while inspecting the related changes.
>
> Thanks to Andy and Dave for working on this with me!
>
> Thanks,
>
> tglx
> ---
> arch/x86/include/asm/fpu/xstate.h | 4
> arch/x86/kernel/fpu/core.c | 62 ++++++---
> arch/x86/kernel/fpu/regset.c | 43 ++----
> arch/x86/kernel/fpu/signal.c | 30 +++-
> arch/x86/kernel/fpu/xstate.c | 95 +++++----------
> b/tools/testing/selftests/x86/corrupt_xstate_header.c | 114 ++++++++++++++++++
> tools/testing/selftests/x86/Makefile | 3
> 7 files changed, 234 insertions(+), 117 deletions(-)
>

With the series applied, glibc pkey test fails sometimes. I will try to
find out the cause.

The test:

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/tst-pkey.c

The output:

error: ../sysdeps/unix/sysv/linux/tst-pkey.c:161: not true:
!check_page_access (i, false)
error: ../sysdeps/unix/sysv/linux/tst-pkey.c:162: not true:
!check_page_access (i, true)
error: ../sysdeps/unix/sysv/linux/tst-pkey.c:161: not true:
!check_page_access (i, false)
error: ../sysdeps/unix/sysv/linux/tst-pkey.c:162: not true:
!check_page_access (i, true)
error: ../sysdeps/unix/sysv/linux/tst-pkey.c:161: not true:
!check_page_access (i, false)
error: ../sysdeps/unix/sysv/linux/tst-pkey.c:162: not true:
!check_page_access (i, true)
../sysdeps/unix/sysv/linux/tst-pkey.c:238: numeric comparison failure
left: 0 (0x0); from: result->access_rights[i]
right: 1 (0x1); from: PKEY_DISABLE_ACCESS
../sysdeps/unix/sysv/linux/tst-pkey.c:238: numeric comparison failure
left: 0 (0x0); from: result->access_rights[i]
right: 1 (0x1); from: PKEY_DISABLE_ACCESS
../sysdeps/unix/sysv/linux/tst-pkey.c:238: numeric comparison failure
left: 0 (0x0); from: result->access_rights[i]
right: 1 (0x1); from: PKEY_DISABLE_ACCESS
../sysdeps/unix/sysv/linux/tst-pkey.c:242: numeric comparison failure
left: 0 (0x0); from: result->access_rights[i]
right: 1 (0x1); from: PKEY_DISABLE_ACCESS
../sysdeps/unix/sysv/linux/tst-pkey.c:242: numeric comparison failure
left: 0 (0x0); from: result->access_rights[i]
right: 1 (0x1); from: PKEY_DISABLE_ACCESS
../sysdeps/unix/sysv/linux/tst-pkey.c:242: numeric comparison failure
left: 0 (0x0); from: result->access_rights[i]
right: 1 (0x1); from: PKEY_DISABLE_ACCESS
error: 12 test failures

2021-06-04 14:07:32

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 0/8] x86/fpu: Mop up XSAVES and related damage

On Wed, Jun 02 2021 at 14:28, Yu-cheng Yu wrote:
> On 6/2/2021 2:55 AM, Thomas Gleixner wrote:
>
> With the series applied, glibc pkey test fails sometimes. I will try to
> find out the cause.

That fails not sometimes. It fails always due to patch 7/8. The reason
is that before that patch fpu__clear_all() reinitialized the hardware
which includes writing the initial PKRU value.

But fpu__initialize() does not store the initial PKRU value in the
task's xstate which breaks PKRU. As that patch is not urgent we can
postpone it for not.

But looking at the above, it's not clear to me why that PKRU stuff works
at all (upstream), but I'll figure it out eventually. I'm quite sure
that it does work by pure chance not by design.

Thanks,

tglx

2021-06-04 16:31:45

by Yu-cheng Yu

[permalink] [raw]
Subject: Re: [patch 0/8] x86/fpu: Mop up XSAVES and related damage

On 6/4/2021 7:05 AM, Thomas Gleixner wrote:
> On Wed, Jun 02 2021 at 14:28, Yu-cheng Yu wrote:
>> On 6/2/2021 2:55 AM, Thomas Gleixner wrote:
>>
>> With the series applied, glibc pkey test fails sometimes. I will try to
>> find out the cause.
>
> That fails not sometimes. It fails always due to patch 7/8. The reason
> is that before that patch fpu__clear_all() reinitialized the hardware
> which includes writing the initial PKRU value.
>
> But fpu__initialize() does not store the initial PKRU value in the
> task's xstate which breaks PKRU. As that patch is not urgent we can
> postpone it for not.
>
> But looking at the above, it's not clear to me why that PKRU stuff works
> at all (upstream), but I'll figure it out eventually. I'm quite sure
> that it does work by pure chance not by design.
>

Thanks for the update. I will continue looking at it.

Yu-cheng

2021-06-04 17:50:35

by Dave Hansen

[permalink] [raw]
Subject: Re: [patch 0/8] x86/fpu: Mop up XSAVES and related damage

On 6/4/21 7:05 AM, Thomas Gleixner wrote:
> But looking at the above, it's not clear to me why that PKRU stuff works
> at all (upstream), but I'll figure it out eventually. I'm quite sure
> that it does work by pure chance not by design.

The upstream flush_thread() code appears correct and even intentionally
so. Just how we must eagerly load PKRU on a context switch, the
fpu__clear*() code eagerly "clears" PKRU. It doesn't actually zero it,
of course, but reverts the register and the fpstate back to the
'init_pkru_value':

flush_thread()->fpu__clear_all()->fpu__clear(user_only=false)
copy_init_fpstate_to_fpregs()
copy_kernel_to_xregs(init_fpu) // fpstate
copy_init_pkru_to_fpregs()
write_pkru(init_pkru_value_snapshot) // fpregs

Andy said you have a fix for this, but I think the new fpu__clear_all()
is failing to do the eager write_pkru().

2021-06-04 18:18:12

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 0/8] x86/fpu: Mop up XSAVES and related damage

On Fri, Jun 04 2021 at 10:46, Dave Hansen wrote:
> On 6/4/21 7:05 AM, Thomas Gleixner wrote:
>> But looking at the above, it's not clear to me why that PKRU stuff works
>> at all (upstream), but I'll figure it out eventually. I'm quite sure
>> that it does work by pure chance not by design.
>
> The upstream flush_thread() code appears correct and even intentionally
> so. Just how we must eagerly load PKRU on a context switch, the
> fpu__clear*() code eagerly "clears" PKRU. It doesn't actually zero it,
> of course, but reverts the register and the fpstate back to the
> 'init_pkru_value':
>
> flush_thread()->fpu__clear_all()->fpu__clear(user_only=false)
> copy_init_fpstate_to_fpregs()
> copy_kernel_to_xregs(init_fpu) // fpstate
> copy_init_pkru_to_fpregs()
> write_pkru(init_pkru_value_snapshot) // fpregs
>
> Andy said you have a fix for this, but I think the new fpu__clear_all()
> is failing to do the eager write_pkru().

Yes, that's the reason and it took some time until I realized that
fpu__initialize() is inconsistent vs. PKRU.

We can't use copy_init_pkru_to_fregs() either because that's not
updating the xsaves area because XFEATURE_PKRU has been cleared.
Yay for consistency!

I'll post a fix soonish after testing it.

Thanks,

tglx

2021-06-04 22:08:15

by Dave Hansen

[permalink] [raw]
Subject: Re: [patch 0/8] x86/fpu: Mop up XSAVES and related damage

The nice Intel 0day folks threw some tests at this series. It managed
to trigger an oops. I can't right this moment publish the source for
the test, but it looks pretty trivial. It basically creates a 0'd XSAVE
buffer, sets XCOMP_BV to:

#define XSAVES_FEATURES ( \
XFEATURE_MASK_PT | \
XFEATURE_MASK_SHSTK_USER | \
XFEATURE_MASK_SHSTK_KERNEL | \
0x8000000000000000 \
)

Then passes that buffer in to ptrace(PTRACE_SETREGSET, ...).

The oops is below. It doesn't *look* to be XSAVES-related. The oops
looks like it's happening in xstateregs_set() itself as opposed to down
in the code actually concerned with supervisor state.

No bug is jumping out of the code as I took a brief look at it. The
xbuf versus kbuf code looks a bit wonky, but I can't find a hole in it.

I'll see if I can reproduce this locally.

> [ 38.612897] BAT XSAVE
> [ 38.612902]
> [ 38.619525] os
> [ 38.619528]
> [ 38.625885] enumeration
> [ 38.625890]
> [ 38.633098] schedule check
> [ 38.633105]
> [ 38.640308] fork check
> [ 38.640321]
> [ 38.647154] signal check
> [ 38.647164]
> [ 38.654065] thread check
> [ 38.654072]
> [ 38.661036] multi thread check
> [ 38.661042]
> [ 75.771095] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 75.779596] #PF: supervisor read access in kernel mode
> [ 75.785980] #PF: error_code(0x0000) - not-present page
> [ 75.792321] PGD 0 P4D 0
> [ 75.795742] Oops: 0000 [#1] SMP PTI
> [ 75.800233] CPU: 94 PID: 2145 Comm: ptrace_sys_stat Not tainted 5.13.0-rc4-00096-g2c38e60b245f #1
> [ 75.810753] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0067.R02.1507221722 07/22/2015
> [ 75.822934] RIP: 0010:xstateregs_set+0x9f/0x140
> [ 75.828591] Code: e0 5d 41 5c 41 5d 41 5e 41 5f c3 41 89 cf 4d 89 ce 4c 89 ff e8 32 87 2e 00 48 89 c5 48 85 c0 0f 84 8a 00 00 00 45 85 e4 74 20 <48> 8b 34 25 00 00 00 00 48 85 f6 74 2f 4c 89 fa 48 89 c7 e8 89 78
> [ 75.850813] RSP: 0018:ffffc90023a47e38 EFLAGS: 00010202
> [ 75.857241] RAX: ffffc9000c9dd000 RBX: ffff88c107824f80 RCX: 8000000000000063
> [ 75.865827] RDX: 0000000000000001 RSI: ffff888100000000 RDI: ffff888100203328
> [ 75.874376] RBP: ffffc9000c9dd000 R08: ffff88810cdffef0 R09: 8000000000000063
> [ 75.882928] R10: 0000000000000ee8 R11: ffff88a08159c3f0 R12: 0000000000000340
> [ 75.891427] R13: ffff88c1078273c0 R14: 000055576f8960c0 R15: 0000000000000340
> [ 75.899961] FS: 00007f50ad027500(0000) GS:ffff88bfff980000(0000) knlGS:0000000000000000
> [ 75.909533] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 75.916458] CR2: 0000000000000000 CR3: 000000208df04004 CR4: 00000000001706e0
> [ 75.924975] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 75.933434] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 75.941911] Call Trace:
> [ 75.945118] ptrace_request+0x4cb/0x600
> [ 75.949893] arch_ptrace+0x125/0x300
> [ 75.954350] ? ptrace_check_attach+0x69/0x140
> [ 75.959690] __x64_sys_ptrace+0x80/0x140
> [ 75.964532] do_syscall_64+0x40/0x80
> [ 75.969004] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 75.975120] RIP: 0033:0x7f50acf5692f
> [ 75.979581] Code: c7 44 24 10 18 00 00 00 48 89 44 24 18 48 8d 44 24 30 8b 70 08 48 8b 50 10 4c 0f 43 50 18 48 89 44 24 20 b8 65 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 48 85 c0 78 06 41 83 f8 02 76 1e 48 8b 4c
> [ 76.001535] RSP: 002b:00007fff18a9f6f0 EFLAGS: 00000202 ORIG_RAX: 0000000000000065
> [ 76.010465] RAX: ffffffffffffffda RBX: 0000000000000862 RCX: 00007f50acf5692f
> [ 76.018960] RDX: 0000000000000202 RSI: 0000000000000862 RDI: 0000000000004205
> [ 76.027433] RBP: 000055576f8960c0 R08: 0000000000004204 R09: 00007f50ad027500
> [ 76.035947] R10: 00007fff18a9f770 R11: 0000000000000202 R12: 0000000000000340
> [ 76.044428] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 76.052957] Modules linked in: xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm mgag200 irqbypass crct10dif_pclmul drm_kms_helper crc32_pclmul ipmi_ssif crc32c_intel ghash_clmulni_intel rapl syscopyarea sysfillrect intel_cstate ahci sysimgblt fb_sys_fops libahci mpt3sas acpi_ipmi intel_uncore raid_class drm libata ipmi_si joydev scsi_transport_sas wmi ipmi_devintf ipmi_msghandler acpi_pad ip_tables
> [ 76.104215] CR2: 0000000000000000
> [ 76.108641] ---[ end trace dd34396ee89519a4 ]---
> [ 76.114448] RIP: 0010:xstateregs_set+0x9f/0x140
> [ 76.120185] Code: e0 5d 41 5c 41 5d 41 5e 41 5f c3 41 89 cf 4d 89 ce 4c 89 ff e8 32 87 2e 00 48 89 c5 48 85 c0 0f 84 8a 00 00 00 45 85 e4 74 20 <48> 8b 34 25 00 00 00 00 48 85 f6 74 2f 4c 89 fa 48 89 c7 e8 89 78
> [ 76.142571] RSP: 0018:ffffc90023a47e38 EFLAGS: 00010202
> [ 76.149152] RAX: ffffc9000c9dd000 RBX: ffff88c107824f80 RCX: 8000000000000063
> [ 76.157855] RDX: 0000000000000001 RSI: ffff888100000000 RDI: ffff888100203328
> [ 76.166562] RBP: ffffc9000c9dd000 R08: ffff88810cdffef0 R09: 8000000000000063
> [ 76.175265] R10: 0000000000000ee8 R11: ffff88a08159c3f0 R12: 0000000000000340
> [ 76.183967] R13: ffff88c1078273c0 R14: 000055576f8960c0 R15: 0000000000000340
> [ 76.192662] FS: 00007f50ad027500(0000) GS:ffff88bfff980000(0000) knlGS:0000000000000000
> [ 76.202425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 76.209590] CR2: 0000000000000000 CR3: 000000208df04004 CR4: 00000000001706e0
> [ 76.218266] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 76.226957] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 76.235597] Kernel panic - not syncing: Fatal exception
> [ 76.284889] Kernel Offset: disabled
> ACPI MEMORY or I/O RESET_REG.

2021-06-05 10:22:36

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 0/8] x86/fpu: Mop up XSAVES and related damage

On Fri, Jun 04 2021 at 15:04, Dave Hansen wrote:
> The nice Intel 0day folks threw some tests at this series. It managed
> to trigger an oops. I can't right this moment publish the source for
> the test, but it looks pretty trivial. It basically creates a 0'd XSAVE
> buffer, sets XCOMP_BV to:
>
> #define XSAVES_FEATURES ( \
> XFEATURE_MASK_PT | \
> XFEATURE_MASK_SHSTK_USER | \
> XFEATURE_MASK_SHSTK_KERNEL | \
> 0x8000000000000000 \
> )
>
> Then passes that buffer in to ptrace(PTRACE_SETREGSET, ...).
>
> The oops is below. It doesn't *look* to be XSAVES-related. The oops
> looks like it's happening in xstateregs_set() itself as opposed to down
> in the code actually concerned with supervisor state.
>
> No bug is jumping out of the code as I took a brief look at it. The
> xbuf versus kbuf code looks a bit wonky, but I can't find a hole in it.

I can....

--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -128,7 +128,7 @@ int xstateregs_set(struct task_struct *t
xbuf = vmalloc(count);
if (!xbuf)
return -ENOMEM;
- ret = user_regset_copyin(&pos, &count, NULL, &ubuf, xbuf, 0, -1);
+ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, xbuf, 0, -1);
if (ret)
goto out;
}

2021-06-05 11:58:11

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 0/8] x86/fpu: Mop up XSAVES and related damage

On Sat, Jun 05 2021 at 12:18, Thomas Gleixner wrote:
> On Fri, Jun 04 2021 at 15:04, Dave Hansen wrote:
>> No bug is jumping out of the code as I took a brief look at it. The
>> xbuf versus kbuf code looks a bit wonky, but I can't find a hole in it.
>
> I can....
>
> --- a/arch/x86/kernel/fpu/regset.c
> +++ b/arch/x86/kernel/fpu/regset.c
> @@ -128,7 +128,7 @@ int xstateregs_set(struct task_struct *t
> xbuf = vmalloc(count);
> if (!xbuf)
> return -ENOMEM;
> - ret = user_regset_copyin(&pos, &count, NULL, &ubuf, xbuf, 0, -1);
> + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, xbuf, 0, -1);
> if (ret)
> goto out;
> }

But this whole user_regset_copyin() is pointless here. See below.

Thanks,

tglx
---
include/asm/fpu/xstate.h | 4 ----
kernel/fpu/regset.c | 40 ++++++++++++++++------------------------
kernel/fpu/xstate.c | 12 +++++++-----
3 files changed, 23 insertions(+), 33 deletions(-)

--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -112,8 +112,4 @@ void copy_supervisor_to_kernel(struct xr
void copy_dynamic_supervisor_to_kernel(struct xregs_state *xstate, u64 mask);
void copy_kernel_to_dynamic_supervisor(struct xregs_state *xstate, u64 mask);

-
-/* Validate an xstate header supplied by userspace (ptrace or sigreturn) */
-int validate_user_xstate_header(const struct xstate_header *hdr);
-
#endif
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -6,6 +6,9 @@
#include <asm/fpu/signal.h>
#include <asm/fpu/regset.h>
#include <asm/fpu/xstate.h>
+
+#include <linux/vmalloc.h>
+
#include <linux/sched/task_stack.h>

/*
@@ -108,7 +111,7 @@ int xstateregs_set(struct task_struct *t
const void *kbuf, const void __user *ubuf)
{
struct fpu *fpu = &target->thread.fpu;
- struct xregs_state *xsave;
+ struct xregs_state *xbuf = NULL;
int ret;

if (!boot_cpu_has(X86_FEATURE_XSAVE))
@@ -120,32 +123,21 @@ int xstateregs_set(struct task_struct *t
if (pos != 0 || count != fpu_user_xstate_size)
return -EFAULT;

- xsave = &fpu->state.xsave;
-
- fpu__prepare_write(fpu);
-
- if (using_compacted_format()) {
- if (kbuf)
- ret = copy_kernel_to_xstate(xsave, kbuf);
- else
- ret = copy_user_to_xstate(xsave, ubuf);
- } else {
- ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, xsave, 0, -1);
- if (!ret)
- ret = validate_user_xstate_header(&xsave->header);
+ if (!kbuf) {
+ xbuf = vmalloc(count);
+ if (!xbuf)
+ return -ENOMEM;
+ if (copy_from_user(xbuf, ubuf, count)) {
+ ret = -EFAULT;
+ goto out;
+ }
}

- /*
- * mxcsr reserved bits must be masked to zero for security reasons.
- */
- xsave->i387.mxcsr &= mxcsr_feature_mask;
-
- /*
- * In case of failure, mark all states as init:
- */
- if (ret)
- fpstate_init(&fpu->state);
+ fpu__prepare_write(fpu);
+ ret = copy_kernel_to_xstate(&fpu->state.xsave, kbuf ? kbuf : xbuf);

+out:
+ vfree(xbuf);
return ret;
}

--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -515,7 +515,7 @@ int using_compacted_format(void)
}

/* Validate an xstate header supplied by userspace (ptrace or sigreturn) */
-int validate_user_xstate_header(const struct xstate_header *hdr)
+static int validate_user_xstate_header(const struct xstate_header *hdr)
{
/* No unknown or supervisor features may be set */
if (hdr->xfeatures & ~xfeatures_mask_user())
@@ -1172,14 +1172,16 @@ int copy_kernel_to_xstate(struct xregs_s
*/
xsave->header.xfeatures |= hdr.xfeatures;

+ /* mxcsr reserved bits must be masked to zero for security reasons. */
+ xsave->i387.mxcsr &= mxcsr_feature_mask;
+
return 0;
}

/*
- * Convert from a ptrace or sigreturn standard-format user-space buffer to
- * kernel XSAVES format and copy to the target thread. This is called from
- * xstateregs_set(), as well as potentially from the sigreturn() and
- * rt_sigreturn() system calls.
+ * Convert from a sigreturn standard-format user-space buffer to kernel
+ * XSAVES format and copy to the target thread. This is called from the
+ * sigreturn() and rt_sigreturn() system calls.
*/
int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf)
{