2023-12-06 01:22:41

by Michal Luczaj

[permalink] [raw]
Subject: [PATCH 0/2] x86: UMIP emulation leaking kernel addresses

User space executing opcode SGDT on a UMIP-enabled CPU results in
#GP(0). In an effort to support legacy applications, #GP handler calls
fixup_umip_exception() to patch up the exception and return a dummy
value:

if (static_cpu_has(X86_FEATURE_UMIP)) {
if (user_mode(regs) && fixup_umip_exception(regs))
goto exit;
}

SGDT is emulated by decoding the instruction and copying dummy data to
the memory address specified by the operand:

uaddr = insn_get_addr_ref(&insn, regs);
if ((unsigned long)uaddr == -1L)
return false;

nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
if (nr_copied > 0) {
/*
* If copy fails, send a signal and tell caller that
* fault was fixed up.
*/
force_sig_info_umip_fault(uaddr, regs);
return true;
}

Decoder handles segmentation, so for "sgdt %ss:(%rax)" the value of
`uaddr` will be correctly (in compatibility mode) shifted by the base
address of the segment. There's a catch though: decoder does not check
segment's DPL, nor its type.

ptrace() can be used to prepare a RPL=3 selector for a S=0/DPL=0
segment, potentially one with a kernel space base address. On return to
user space, CPU will reject such selector load; exception will be
raised. But because the #GP handler sees no distinction between
SGDT-induced #GP(0) and IRET-induced #GP(selector), emulator will kick
in and process the malformed @regs->ss.

If the first 4 GiB of user space are unmapped or non-writable,
copy_to_user() will fail, and signal to user will reveal `uaddr` -- i.e.
the (part of) kernel address. On x86_64, addresses of GDT_ENTRY_TSS (for
each CPU) and GDT_ENTRY_LDT (when in use) can be fully leaked in a few
steps.

Introducing a DPL check in insn_get_seg_base(), or even in get_desc(),
seems enough to prevent the decoder from disclosing data.

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 558a605929db..4c1eea736519 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -725,6 +725,18 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx)
if (!get_desc(&desc, sel))
return -1L;

+ /*
+ * Some segment selectors coming from @regs do not necessarily reflect
+ * the state of CPU; see get_segment_selector(). Their values might
+ * have been altered by ptrace. Thus, the instruction decoder can be
+ * tricked into "dereferencing" a segment descriptor that would
+ * otherwise cause a CPU exception -- for example due to mismatched
+ * privilege levels. This opens up the possibility to expose kernel
+ * space base address of DPL=0 segments.
+ */
+ if (desc.dpl < (regs->cs & SEGMENT_RPL_MASK))
+ return -1L;
+
return get_desc_base(&desc);
}

That said, I guess instead of trying to harden the decoder, it would be
better to ensure any emulation is attempted only when @regs do for sure
reflect the state of CPU. I.e. when #GP comes directly from the user
space, not via a bad IRET.

In other words, can the context be somehow tainted during bad IRET
handling -- signalling to the #GP handler that emulation should be
avoided?

Now, I'm far from being competent, but here's an idea I've tried: tell
the #GP handler that UMIP-related exceptions come only as #GP(0):

if (static_cpu_has(X86_FEATURE_UMIP)) {
- if (user_mode(regs) && fixup_umip_exception(regs))
+ if (user_mode(regs) && !error_code && fixup_umip_exception(regs))
goto exit;
}

To my understanding of Intel SDM, a bad IRET can theoretically cause a
#GP(0), too. So for that occasion, would it be okay to "poison" the
value of error code in fixup_bad_iret()? Along the lines of:

/* Copy the remainder of the stack from the current stack. */
__memcpy(&tmp, bad_regs, offsetof(struct pt_regs, ip));

+ /* Suppress the error code. */
+ tmp.orig_ax = -1UL;
+
/* Update the entry stack */
__memcpy(new_stack, &tmp, sizeof(tmp));

Admittedly, this feels hackish. And I've realized there's also the case
of ESPFIX: #DF handler explicitly sets up a #GP(0) frame before
forwarding the execution to the #GP handler.

Thanks,
Michal

Michal Luczaj (2):
x86/traps: Attempt UMIP fixup only on #GP(0)
selftests/x86: UMIP DPL=0 segment base address info leak test

arch/x86/kernel/traps.c | 2 +-
tools/testing/selftests/x86/Makefile | 6 +-
tools/testing/selftests/x86/umip_leak_seg.c | 249 ++++++++++++++++++++
3 files changed, 255 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/x86/umip_leak_seg.c

--
2.43.0


2023-12-09 15:53:38

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 0/2] x86: UMIP emulation leaking kernel addresses

On Wed, Dec 06, 2023 at 01:43:43AM +0100, Michal Luczaj wrote:
> Introducing a DPL check in insn_get_seg_base(), or even in get_desc(),
> seems enough to prevent the decoder from disclosing data.
>
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 558a605929db..4c1eea736519 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -725,6 +725,18 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx)
> if (!get_desc(&desc, sel))
> return -1L;
>
> + /*
> + * Some segment selectors coming from @regs do not necessarily reflect
> + * the state of CPU; see get_segment_selector(). Their values might
> + * have been altered by ptrace. Thus, the instruction decoder can be
> + * tricked into "dereferencing" a segment descriptor that would
> + * otherwise cause a CPU exception -- for example due to mismatched
> + * privilege levels. This opens up the possibility to expose kernel
> + * space base address of DPL=0 segments.
> + */
> + if (desc.dpl < (regs->cs & SEGMENT_RPL_MASK))
> + return -1L;
> +
> return get_desc_base(&desc);
> }
>
> That said, I guess instead of trying to harden the decoder,

Well, here's what my CPU manual says:

"4.10.1 Accessing Data Segments

...

The processor compares the effective privilege level with the DPL in the
descriptor-table entry referenced by the segment selector. If the
effective privilege level is greater than or equal to (numerically
lower-than or equal-to) the DPL, then the processor loads the segment
register with the data-segment selector.

If the effective privilege level is lower than (numerically
greater-than) the DPL, a general-protection exception (#GP) occurs and
the segment register is not loaded.

...

4.10.2 Accessing Stack Segments

The processor compares the CPL with the DPL in the descriptor-table
entry referenced by the segment selector. The two values must be equal.
If they are not equal, a #GP occurs and the SS register is not loaded."

So *actually* doing those checks in the insn decoder is the proper thing
to do, IMNSVHO.

> Now, I'm far from being competent, but here's an idea I've tried: tell
> the #GP handler that UMIP-related exceptions come only as #GP(0):
>
> if (static_cpu_has(X86_FEATURE_UMIP)) {
> - if (user_mode(regs) && fixup_umip_exception(regs))
> + if (user_mode(regs) && !error_code && fixup_umip_exception(regs))
> goto exit;
> }

And yap, as you've realized, that alone doesn't fix the leaking.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-12-09 17:16:45

by Brian Gerst

[permalink] [raw]
Subject: Re: [PATCH 0/2] x86: UMIP emulation leaking kernel addresses

On Tue, Dec 5, 2023 at 8:22 PM Michal Luczaj <[email protected]> wrote:
>
> User space executing opcode SGDT on a UMIP-enabled CPU results in
> #GP(0). In an effort to support legacy applications, #GP handler calls
> fixup_umip_exception() to patch up the exception and return a dummy
> value:
>
> if (static_cpu_has(X86_FEATURE_UMIP)) {
> if (user_mode(regs) && fixup_umip_exception(regs))
> goto exit;
> }
>
> SGDT is emulated by decoding the instruction and copying dummy data to
> the memory address specified by the operand:
>
> uaddr = insn_get_addr_ref(&insn, regs);
> if ((unsigned long)uaddr == -1L)
> return false;
>
> nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
> if (nr_copied > 0) {
> /*
> * If copy fails, send a signal and tell caller that
> * fault was fixed up.
> */
> force_sig_info_umip_fault(uaddr, regs);
> return true;
> }
>
> Decoder handles segmentation, so for "sgdt %ss:(%rax)" the value of
> `uaddr` will be correctly (in compatibility mode) shifted by the base
> address of the segment. There's a catch though: decoder does not check
> segment's DPL, nor its type.
>
> ptrace() can be used to prepare a RPL=3 selector for a S=0/DPL=0
> segment, potentially one with a kernel space base address. On return to
> user space, CPU will reject such selector load; exception will be
> raised. But because the #GP handler sees no distinction between
> SGDT-induced #GP(0) and IRET-induced #GP(selector), emulator will kick
> in and process the malformed @regs->ss.
>
> If the first 4 GiB of user space are unmapped or non-writable,
> copy_to_user() will fail, and signal to user will reveal `uaddr` -- i.e.
> the (part of) kernel address. On x86_64, addresses of GDT_ENTRY_TSS (for
> each CPU) and GDT_ENTRY_LDT (when in use) can be fully leaked in a few
> steps.

A different way to plug this is to harden ptrace (and sigreturn) to
verify that the segments are code or data type segments instead of
relying on an IRET fault.

Brian Gerst

2023-12-09 20:10:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/2] x86: UMIP emulation leaking kernel addresses

On Sat, 9 Dec 2023 at 09:16, Brian Gerst <[email protected]> wrote:
>
> A different way to plug this is to harden ptrace (and sigreturn) to
> verify that the segments are code or data type segments instead of
> relying on an IRET fault.

I think that is likely a good idea regardless of this particular issue.

And I don't think you need to even check the segment for any kind of
validity - all you need to check that it's a valid selector.

And we *kind* of do that already, with the x86 ptrace code checking

static inline bool invalid_selector(u16 value)
{
return unlikely(value != 0 && (value & SEGMENT_RPL_MASK) != USER_RPL);
}

but the thing is, I think we could limit that a lot more.

I think the only valid GDT entries are 0-15 (that includes the default
kernel segments, but they don't contain anything interesting), so we
could tighten that selector check to say that it has to be either a
LDT entry or a selector < 15.

So add some kind of requirement for "(value & 4) || (value < 8*16)", perhaps?

Linus

2023-12-10 17:13:34

by Michal Luczaj

[permalink] [raw]
Subject: Re: [PATCH 0/2] x86: UMIP emulation leaking kernel addresses

On 12/9/23 16:53, Borislav Petkov wrote:
> On Wed, Dec 06, 2023 at 01:43:43AM +0100, Michal Luczaj wrote:
>> Introducing a DPL check in insn_get_seg_base(), or even in get_desc(),
>> seems enough to prevent the decoder from disclosing data.
>>
>> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
>> index 558a605929db..4c1eea736519 100644
>> --- a/arch/x86/lib/insn-eval.c
>> +++ b/arch/x86/lib/insn-eval.c
>> @@ -725,6 +725,18 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx)
>> if (!get_desc(&desc, sel))
>> return -1L;
>>
>> + /*
>> + * Some segment selectors coming from @regs do not necessarily reflect
>> + * the state of CPU; see get_segment_selector(). Their values might
>> + * have been altered by ptrace. Thus, the instruction decoder can be
>> + * tricked into "dereferencing" a segment descriptor that would
>> + * otherwise cause a CPU exception -- for example due to mismatched
>> + * privilege levels. This opens up the possibility to expose kernel
>> + * space base address of DPL=0 segments.
>> + */
>> + if (desc.dpl < (regs->cs & SEGMENT_RPL_MASK))
>> + return -1L;
>> +
>> return get_desc_base(&desc);
>> }
>>
>> That said, I guess instead of trying to harden the decoder,
>
> Well, here's what my CPU manual says:
>
> "4.10.1 Accessing Data Segments
>
> ...
>
> The processor compares the effective privilege level with the DPL in the
> descriptor-table entry referenced by the segment selector. If the
> effective privilege level is greater than or equal to (numerically
> lower-than or equal-to) the DPL, then the processor loads the segment
> register with the data-segment selector.
>
> If the effective privilege level is lower than (numerically
> greater-than) the DPL, a general-protection exception (#GP) occurs and
> the segment register is not loaded.
>
> ...
>
> 4.10.2 Accessing Stack Segments
>
> The processor compares the CPL with the DPL in the descriptor-table
> entry referenced by the segment selector. The two values must be equal.
> If they are not equal, a #GP occurs and the SS register is not loaded."
>
> So *actually* doing those checks in the insn decoder is the proper thing
> to do, IMNSVHO.

Are you suggesting checking only CPL vs. DPL or making sure the insn
decoder faithfully emulates all the other stuff CPU does? Like the desc.s
issue described below.

>> Now, I'm far from being competent, but here's an idea I've tried: tell
>> the #GP handler that UMIP-related exceptions come only as #GP(0):
>>
>> if (static_cpu_has(X86_FEATURE_UMIP)) {
>> - if (user_mode(regs) && fixup_umip_exception(regs))
>> + if (user_mode(regs) && !error_code && fixup_umip_exception(regs))
>> goto exit;
>> }
>
> And yap, as you've realized, that alone doesn't fix the leaking.

With this fix applied, I can't see any way to sufficiently confuse the
UMIP emulation with a non-ESPFIX bad IRET. It appears that #GP(selector)
takes precedence over #GP(0), so tripping IRET with any malformed selector
always ends up with #GP handler's error_code != 0, even if conditions were
met for #GP(0) just as well. Is there something I'm missing?

That said, there's still the case of #DF handler feeding #GP handler after
a fault in ESPFIX. Consider

cs = (GDT_ENTRY_TSS << 3) | USER_RPL
ss = (SOME_LDT_ENTRY << 3) | SEGMENT_LDT | USER_RPL
ip = "sgdt %cs:(%reg)"

Attempting IRET with such illegal CS raises #GP(selector), but (because of
ESPFIX) this #GP is promoted to #DF where it becomes #GP(0). And UMIP
emulation is triggered.

UMIP emulator starts by fetching code from user. insn decoder does
`copy_from_user(buf, (void __user *)ip, MAX_INSN_SIZE)` where `ip` is
CS.base+IP and CS.base here is actually a (part of) GDT_ENTRY_TSS.base, a
kernel address. With IP under user's control, no fault while copying.

Next, insn_get_code_seg_params() concludes that, given TSS as a code
segment, address and operand size are both 16-bit. Prefix SGDT with size
overrides, and we're back to 32-bit. Then insn_get_addr_ref() and
copy_to_user() does the leaking.

I don't know if/how to deal with ESPFIX losing #GP's error code. As for
telling insn decoder that system segments aren't code:

--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -809,6 +809,10 @@ int insn_get_code_seg_params(struct pt_regs *regs)
if (!get_desc(&desc, sel))
return -EINVAL;

+ /* System segments are not code. */
+ if (!desc.s)
+ return -EINVAL;
+
/*
* The most significant byte of the Type field of the segment descriptor
* determines whether a segment contains data or code. If this is a data

Is this something in the right direction?

(Note, get_segment_selector() is broken for selectors with the high bit
set. I'll send patch later.)

thanks,
Michal