2009-11-06 13:00:30

by matthieu castet

Subject: Using x86 segments against NULL pointer deference exploit

Hi,

I am wondering why we can't set the KERNEL_DS data segment so that it does not
contain the first page, i.e. change it from a R/W flat model to a R/W expand-down
segment covering 4096 to 0xffffffff.

The modification seems simple: change GDT_ENTRY_KERNEL_DS [1], plus some
changes for the syscall entry point that doesn't support segments (sysenter).

The drawback is that the kernel can no longer access data in the first
page. Is that access needed for applications like wine or dosemu?


Regards,

Matthieu

PS: why do the x86_64 segments have the accessed bit set while the x86_32 ones don't?

[1]
something like
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cc25c2b..898a569 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -101,7 +101,7 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = {
.gdt = {
[GDT_ENTRY_DEFAULT_USER_CS] = GDT_ENTRY_INIT(0xa0fb, 0, 0xfffff),
#else
[GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(0xc09a, 0, 0xfffff),
- [GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(0xc096, 0, 0x00001),
[GDT_ENTRY_DEFAULT_USER_CS] = GDT_ENTRY_INIT(0xc0fa, 0, 0xfffff),
[GDT_ENTRY_DEFAULT_USER_DS] = GDT_ENTRY_INIT(0xc0f2, 0, 0xfffff),
/*
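
For readers decoding the descriptor values in the patch above, here is a minimal C sketch, not kernel code. It assumes the first GDT_ENTRY_INIT argument carries the access byte in its low 8 bits and the G flag in bit 15, and it applies the limit-checking rules for 32-bit data segments as the Intel SDM describes them. The names seg_range, seg_valid_range and seg_accessed are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Range of valid offsets inside a 32-bit data segment (illustrative type). */
struct seg_range {
    uint32_t lowest;   /* lowest valid offset  */
    uint32_t highest;  /* highest valid offset */
};

#define SEG_TYPE_ACCESSED   0x0001u /* access byte bit 0: accessed */
#define SEG_TYPE_EXPAND_DN  0x0004u /* access byte bit 2: expand-down (data) */
#define SEG_FLAG_G          0x8000u /* granularity: limit in 4 KiB units */

/* True if the descriptor starts out with the accessed bit already set. */
static bool seg_accessed(uint16_t flags)
{
    return (flags & SEG_TYPE_ACCESSED) != 0;
}

/*
 * Compute the valid offsets for a data segment with base 0.
 * Assumption: the D/B flag is set, so the upper bound of an
 * expand-down segment is 0xffffffff.
 */
static struct seg_range seg_valid_range(uint16_t flags, uint32_t limit)
{
    struct seg_range r;
    uint32_t eff;

    assert(limit <= 0xfffffu); /* the limit field is 20 bits wide */

    /* With G set, the low 12 bits of an offset are not checked. */
    eff = (flags & SEG_FLAG_G) ? ((limit << 12) | 0xfffu) : limit;

    if (flags & SEG_TYPE_EXPAND_DN) {
        /* Expand-down: valid offsets lie strictly above the limit. */
        assert(eff != 0xffffffffu); /* would describe an empty segment */
        r.lowest = eff + 1;
        r.highest = 0xffffffffu;
    } else {
        /* Expand-up: valid offsets run from 0 to the limit. */
        r.lowest = 0;
        r.highest = eff;
    }
    return r;
}
```

Under these assumptions, seg_valid_range(0xc096, 0x00001) gives a lowest valid offset of 0x2000, so the value in the diff would exclude the first two pages, while a limit of 0 would start the segment at 0x1000 (4096). seg_accessed(0xc092) is false and seg_accessed(0xa0fb) is true, which is the difference the PS asks about.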


2009-11-06 13:11:41

by Alan

Subject: Re: Using x86 segments against NULL pointer deference exploit

On Fri, 06 Nov 2009 13:59:49 +0100
matthieu castet wrote:

> Hi,
>
> I am wondering why we can't set the KERNEL_DS data segment to not contain the
> first page, ie changing it from R/W flat model to R/W expand down from
> 0xffffffff to 4096.

For one, it is enormously expensive, because the moment you have segment
limits all sorts of stuff goes slower. You also do sometimes need low 4K
access for wine/dosemu etc., as you guess, and for APM and so on. Plus, in
64-bit mode you don't have a lot of those features anyway.

> The drawback of this it that the kernel can't access anymore data in the first
> segment. Is it needed for application like wine or dosemu ?

Yes.

2009-11-06 20:18:54

by Andi Kleen

Subject: Re: Using x86 segments against NULL pointer deference exploit

matthieu castet writes:

> Hi,
>
> I am wondering why we can't set the KERNEL_DS data segment to not contain the
> first page, ie changing it from R/W flat model to R/W expand down from
> 0xffffffff to 4096.

As Alan pointed out, setting segment limits/bases has large penalties.

This has already been addressed at the VM level by the mmap limit
defaults, which disallow placing anything on the zero page.

In fact a lot of systems should already run with that default.
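
Whether a given system actually refuses a mapping at address 0 can be checked from user space. The following is a small, hedged C sketch for Linux. can_map_page_zero is a name made up here, and the exact errno on refusal (often EPERM) may vary with the privileges of the caller and the security modules in use.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Try to map one anonymous page at address 0.
 * Returns 1 if the kernel allowed it (and unmaps the page again),
 * 0 if it refused; *err receives errno on refusal, 0 on success.
 */
static int can_map_page_zero(int *err)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p;

    assert(page > 0);

    /* MAP_FIXED forces the address; without it 0 would only be a hint. */
    p = mmap((void *)0, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED) {
        *err = errno;
        return 0;
    }

    *err = 0;
    munmap(p, (size_t)page);
    return 1;
}
```

On a system where the mmap limit is non-zero, an unprivileged caller should get 0 back; a result of 1 means a process like this one could place data on the zero page.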

> PS : why x86_64 segment got access bit set and x86_32 doesn't ?

It's an extremely minor optimization, but the CPU sets it on the first
access anyway.

-Andi

--
[email protected] -- Speaking for myself only.

2009-11-06 20:35:56

by matthieu castet

Subject: Re: Using x86 segments against NULL pointer deference exploit

Hi Alan,

Alan Cox wrote:
> On Fri, 06 Nov 2009 13:59:49 +0100
> [email protected] wrote:
>
>> Hi,
>>
>> I am wondering why we can't set the KERNEL_DS data segment to not contain the
>> first page, ie changing it from R/W flat model to R/W expand down from
>> 0xffffffff to 4096.
>
> For one it is enormously expensive because the moment you have segment
> limits all sorts of stuff goes slower.
We could always imagine a lazy mechanism that enables the segment limit only when page 0 is mapped.
That would slow down the machine only while wine & co are running.

> and for APM and so on.
APM clears all segments before calling the BIOS (APM_ZERO_SEGS is defined for detecting buggy BIOSes),
and pnpbios seems to have its own segment (GDT_ENTRY_PNPBIOS). There is also GDT_ENTRY_APMBIOS_BASE,
but that seems unused.

> You also do sometimes need low 4K
> access for wine/dosemu etc as you guess -
That's a bigger problem. If there are not many such accesses, we could imagine handling them with a trap/single-step mechanism.

> 64bit you don't have a lot of those features ayway.
Yes.

Maybe the sane way would be to forbid mapping page 0 and to run applications that need page 0 in an
emulator. After all, it is only needed for special cases [1]:
- Win16 binaries for wine
- upstream versions of dosemu and qemu have workarounds

But some distros still set mmap_min_addr to 0 (ubuntu+wine, ...) :(

Matthieu

[1] http://wiki.debian.org/mmap_min_addr

2009-11-06 20:58:04

by Jiri Kosina

Subject: Re: Using x86 segments against NULL pointer deference exploit

On Fri, 6 Nov 2009, matthieu castet wrote:

> I am wondering why we can't set the KERNEL_DS data segment to not
> contain the first page, ie changing it from R/W flat model to R/W expand
> down from 0xffffffff to 4096.
> The modification seems simple : change GDT_ENTRY_KERNEL_DS [1], and some
> modification for syscall entry point that doesn't support segment (sysenter).

The question is -- why bother? We already have mmap_min_addr ... does this
potentially provide any additional advantage?

> The drawback of this it that the kernel can't access anymore data in the first
> segment. Is it needed for application like wine or dosemu ?
> Regards,
>
> Matthieu
>
> PS : why x86_64 segment got access bit set and x86_32 doesn't ?
>
> [1]
> something like
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index cc25c2b..898a569 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -101,7 +101,7 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = {
> .gdt = {
> [GDT_ENTRY_DEFAULT_USER_CS] = GDT_ENTRY_INIT(0xa0fb, 0, 0xfffff),
> #else
> [GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(0xc09a, 0, 0xfffff),
> - [GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
> + [GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(0xc096, 0, 0x00001),
> [GDT_ENTRY_DEFAULT_USER_CS] = GDT_ENTRY_INIT(0xc0fa, 0, 0xfffff),
> [GDT_ENTRY_DEFAULT_USER_DS] = GDT_ENTRY_INIT(0xc0f2, 0, 0xfffff),
> /*

It's not that simple for various reasons ... PaX/Grsecurity people already
did this in their patchset quite some time ago.

See http://www.grsecurity.net/~spender/uderef.txt

--
Jiri Kosina
SUSE Labs, Novell Inc.

2009-11-06 22:54:33

by H. Peter Anvin

Subject: Re: Using x86 segments against NULL pointer deference exploit

On 11/06/2009 04:59 AM, matthieu castet wrote:
> Hi,
>
> I am wondering why we can't set the KERNEL_DS data segment to not contain the
> first page, ie changing it from R/W flat model to R/W expand down from
> 0xffffffff to 4096.
>
> The modification seems simple : change GDT_ENTRY_KERNEL_DS [1], and some
> modification for syscall entry point that doesn't support segment (sysenter).
>
> The drawback of this it that the kernel can't access anymore data in the first
> segment. Is it needed for application like wine or dosemu ?
>

Yes, it is. On 32 bits it is possible to switch around segments and do
this (in which case you want it to only cover the actual kernel area,
and use USER_DS for all user-space references.) This also lets you drop
nearly all pointer-range checks, since they are now redundant. However,
there is a cost -- it pretty much requires a segment register for
USER_DS (this used to be fs once upon a time, hence set_fs) and probably
would break Xen and possibly other virtualization solutions.
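
To make the "pointer-range checks" concrete, the sketch below shows the kind of software bounds check that a hardware segment limit would make redundant. It is not the kernel's actual access_ok implementation. user_range_ok is a hypothetical helper, and the 0xc0000000 boundary in the usage note assumes the common 3G/1G split on 32-bit x86.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-pointer range check (illustration only).
 * Returns true if [addr, addr + size) lies entirely below user_limit,
 * where user_limit is the first address that is NOT user space.
 * The comparison is arranged so that addr + size is never computed,
 * which avoids 32-bit wrap-around.
 */
static bool user_range_ok(uint32_t addr, uint32_t size, uint32_t user_limit)
{
    return size <= user_limit && addr <= user_limit - size;
}
```

For example, user_range_ok(0xbfffff00, 0x200, 0xc0000000) is false because the range crosses into the kernel area, and user_range_ok(0xffffffff, 2, 0xc0000000) is false even though the naive sum addr + size would wrap around to a small value. With a separate USER_DS whose limit ends at the kernel boundary, the CPU would fault on such accesses without any explicit check.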

> PS : why x86_64 segment got access bit set and x86_32 doesn't ?

It is trivially faster to start out with the accessed bit set -- the
hardware will set the accessed bit anyway.

-hpa

2009-11-07 10:20:07

by Jiri Kosina

Subject: Re: Using x86 segments against NULL pointer deference exploit

On Fri, 6 Nov 2009, H. Peter Anvin wrote:

> Yes, it is. On 32 bits it is possible to switch around segments and do
> this (in which case you want it to only cover the actual kernel area,
> and use USER_DS for all user-space references.) This also lets you drop
> nearly all pointer-range checks, since they are now redundant.
> However, there is a cost -- it pretty much requires a segment register
> for USER_DS (this used to be fs once upon a time, hence set_fs) and
> probably would break Xen and possibly other virtualization solutions.

There are ways to work around this though (the UDEREF implementation of this
technique in PaX explicitly checks for a VMware signature and handles such a
case differently ... I guess the same could be done for other
virtualization solutions).

Not that it would be particularly nice of course ...

--
Jiri Kosina
SUSE Labs, Novell Inc.

2009-11-08 14:38:56

by matthieu castet

Subject: Re: Using x86 segments against NULL pointer deference exploit

Hi,

Andi Kleen wrote:
> [email protected] writes:
>
>> Hi,
>>
>> I am wondering why we can't set the KERNEL_DS data segment to not contain the
>> first page, ie changing it from R/W flat model to R/W expand down from
>> 0xffffffff to 4096.
>
> As Alan pointed out setting segment limits/bases has large penalties.
>
> This has been already addressed by the mmap limit defaults
> on the VM level by disallowing to place something on the zero page.
>
> In fact a lot of systems should already run with that default.
Yes, but lots of systems run with access to the zero page enabled.
The mmap limit was added nearly 2 years ago, but this summer lots of machines were
still vulnerable to 'NULL dereference exploits'.
Why?

Maybe because the kernel still allows it (mmap_min_addr is 0 by default). OpenBSD enforces it.
There are lots of ways to bypass it (root, RAW_IO cap, personality, ...).
Also, some distros don't enable it because it breaks some applications. For example, with it enabled,
vm86 can't be used by dosemu or wine.

I attach a basic (slow and probably buggy) protection using segments. It works with wine and dosemu, and catches kernel accesses to page 0.

I believe a better solution would be to implement a new vm86 syscall. This syscall would allow running code in
virtual 8086 mode without that code having to live in low pages.
For that, an extra argument pointing to the code region could be added.
On syscall entry the kernel would:
- duplicate the memory mapping of the calling thread
- map the code to run at the low pages (zero page and up)
- switch to this mapping
- enter vm86 mode
...
- exit vm86 mode
- switch back to the original mapping (without page 0)
- return to user space

With that new syscall, there should be fewer programs that need a page 0 mapping.


Matthieu


>
>> PS : why x86_64 segment got access bit set and x86_32 doesn't ?
>
> It's a extremly minor optimization, but the CPU sets it on the first
> access anyways.
Setting it for x86_32 would allow merging them outside the ifdef. Not that it is important...


Attachments:
kernel_page0_prot.diff (4.92 kB)

2009-11-08 19:41:09

by Andi Kleen

Subject: Re: Using x86 segments against NULL pointer deference exploit


Please see 12.3.3.3 of
http://www.intel.com/Assets/PDF/manual/248966.pdf
for details why this is a bad idea.

-Andi

2009-11-09 06:30:51

by H. Peter Anvin

Subject: Re: Using x86 segments against NULL pointer deference exploit

On 11/06/2009 12:35 PM, matthieu castet wrote:
>
> May be the sane way should be to forbid mapping page 0, and make run
> application needing page 0 in a
> emulator. After all it is for special case [1] :
> - Win16 binary for wine
> - upstream version of dosemu and qemu have workaround
>
> But some distro still set mmap_min_addr to 0 (ubuntu+wine, ...) :(
>

Don't Do That, Then[TM].

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-11-10 17:00:26

by Markku Savela

Subject: Re: Using x86 segments against NULL pointer deference exploit


I'm wondering why, on architectures whose memory management actually
supports an EXECUTE permission, user space is mapped into the kernel
with EXECUTE enabled!!!??

If user space were not mapped with EXECUTE enabled, the
restriction on mapping the 0-page in user space would no longer be
required.

2009-11-11 14:12:00

by Jiri Kosina

Subject: Re: Using x86 segments against NULL pointer deference exploit

On Tue, 10 Nov 2009, Markku Savela wrote:

> I'm wondering why on architectures that actually have EXECUTE
> permission memory management, the user space is mapped into kernel
> with EXECUTE enabled!!!??

x86, to give one random example?

--
Jiri Kosina
SUSE Labs, Novell Inc.