we'd like to announce the availability of the following kernel patch:
http://redhat.com/~mingo/nx-patches/nx-2.6.7-rc2-bk2-AE
which makes use of the 'NX' x86 feature pioneered in AMD64 CPUs and for
which support has also been announced by Intel. (Other x86 CPU vendors,
Transmeta and VIA, have announced support as well. Windows support for NX has
also been announced by Microsoft, for their next service pack.) The NX
feature is also being marketed as 'Enhanced Virus Protection'. This
patch makes sure Linux has full support for this hardware feature on x86
too.
What does this patch do? The pagetable format of current x86 CPUs does
not have an 'execute' bit. This means that even if an application maps a
memory area without PROT_EXEC, the CPU will still allow code to be
executed in this memory. This property is often abused by exploits when
they manage to inject hostile code into this memory, for example via a
buffer overflow.
The NX feature changes this and adds a 'don't execute' bit to the PAE
pagetable format. But since the flag defaults to zero (for compatibility
reasons), all pages are executable by default and the kernel has to be
taught to make use of this bit.
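As a rough illustration of the bit described above (a user-space sketch, not code from the patch): in the 64-bit PAE page-table entry format the no-execute flag is the top bit, bit 63. The macro and helper names below are made up for this example, and the entry value is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* NX is the top bit (bit 63) of a 64-bit PAE page-table entry. */
#define SKETCH_PTE_NX (UINT64_C(1) << 63)

/* Hypothetical helpers, named for this sketch only. */
uint64_t pte_set_noexec(uint64_t pte) { return pte | SKETCH_PTE_NX; }
uint64_t pte_set_exec(uint64_t pte)   { return pte & ~SKETCH_PTE_NX; }
bool     pte_is_exec(uint64_t pte)    { return (pte & SKETCH_PTE_NX) == 0; }

/* An entry written by NX-unaware code has bit 63 clear, so it is executable. */
void pte_nx_demo(void)
{
    uint64_t legacy = UINT64_C(0x1067);   /* arbitrary example entry */

    assert(pte_is_exec(legacy));
    assert(!pte_is_exec(pte_set_noexec(legacy)));
    assert(pte_set_exec(pte_set_noexec(legacy)) == legacy);
}
```

Because a cleared bit means "executable", page tables set up without any knowledge of NX keep working unchanged; protection only appears once the kernel starts setting the bit.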
If the NX feature is supported by the CPU then the patched kernel turns
on NX and it will enforce userspace executability constraints such as a
no-exec stack and no-exec mmap and data areas. This means less chance
for stack overflows and buffer-overflows to cause exploits.
furthermore, the patch also implements 'NX protection' for kernelspace
code: only the kernel code and modules are executable - so even
kernel-space overflows are harder (in some cases, impossible) to
exploit. Here is how kernel code that tries to execute off the stack is
stopped:
kernel tried to access NX-protected page - exploit attempt? (uid: 500)
Unable to handle kernel paging request at virtual address f78d0f40
printing eip:
...
The patch is based on a prototype NX patch written for 2.4 by Intel -
special thanks go to Suresh Siddha and Jun Nakajima @ Intel. The
existing NX support in the 64-bit x86_64 kernels has been written by
Andi Kleen and this patch is modeled after his code.
Arjan van de Ven has also provided lots of feedback and he has
integrated the patch into the Fedora Core 2 kernel. Test rpms are
available for download at:
http://redhat.com/~arjanv/2.6/RPMS.kernel/
the kernel-2.6.6-1.411 rpms have the NX patch applied.
here's a quickstart to recompile the vanilla kernel from source with the
NX patch:
http://redhat.com/~mingo/nx-patches/QuickStart-NX.txt
Ingo
the _GPL export of vmalloc_exec is silly, it's a trivial __vmalloc wrapper
and __vmalloc is exported. You might be better off just killing it anyway,
I don't see much use for it outside module support.
apropos modules, in SuSE's 2.4 kernels Andi had a nice optimization to
not use vmalloc if we could get high enough order allocations, might be
worth resurrecting that.
* Christoph Hellwig <[email protected]> wrote:
> the _GPL export of vmalloc_exec is silly, it's a trivial __vmalloc
> wrapper and __vmalloc is exported. You might be better off just
> killing it anyway, I don't see much use for it outside module support.
ok, agreed.
> apropos modules, in SuSE's 2.4 kernels Andi had a nice optimization to
> not use vmalloc if we could get high enough order allocations, might
> be worth resurrecting that.
yeah, this might make sense.
Ingo
On Wed, 2 Jun 2004, Ingo Molnar wrote:
>
> If the NX feature is supported by the CPU then the patched kernel turns
> on NX and it will enforce userspace executability constraints such as a
> no-exec stack and no-exec mmap and data areas. This means less chance
> for stack overflows and buffer-overflows to cause exploits.
Just out of interest - how many legacy apps are broken by this? I assume
it's a non-zero number, but wouldn't mind being happily surprised.
And do we have some way to say, on a per-process basis, "avoid NX because
this old version of Oracle/flash/whatever-binary-thing doesn't run with
it"?
Linus
On Wed, Jun 02, 2004 at 02:13:13PM -0700, Linus Torvalds wrote:
>
>
> On Wed, 2 Jun 2004, Ingo Molnar wrote:
> >
> > If the NX feature is supported by the CPU then the patched kernel turns
> > on NX and it will enforce userspace executability constraints such as a
> > no-exec stack and no-exec mmap and data areas. This means less chance
> > for stack overflows and buffer-overflows to cause exploits.
>
> Just out of interest - how many legacy apps are broken by this? I assume
> it's a non-zero number, but wouldn't mind to be happily surprised.
based on execshield in FC1.. about zero.
>
> And do we have some way of on a per-process basis say "avoid NX because
> this old version of Oracle/flash/whatever-binary-thing doesn't run with
> it"?
yes those aren't compiled with the PT_GNU_STACK elf flag and run with the
stack executable just fine. GCC will also emit a "make the stack executable"
flag when it emits code that puts stack trampolines up.
That all JustWorks(tm).
Arjan van de Ven <[email protected]> writes:
> On Wed, Jun 02, 2004 at 02:13:13PM -0700, Linus Torvalds wrote:
>>
>>
>> Just out of interest - how many legacy apps are broken by this? I assume
>> it's a non-zero number, but wouldn't mind to be happily surprised.
>
> based on execshield in FC1.. about zero.
IIRC, Lisp systems like CMUCL and SBCL (plus commercial Lisps) had
problems with FC1 due to execshield. They tend to do things like
compile code on the fly to heap memory and expect to be able to run
it.
>> And do we have some way of on a per-process basis say "avoid NX because
>> this old version of Oracle/flash/whatever-binary-thing doesn't run with
>> it"?
>
> yes those aren't compiled with the PT_GNU_STACK elf flag and run with the
> stack executable just fine. GCC will also emit a "make the stack executable"
> flag when it emits code that puts stack trampolines up.
> That all JustWorks(tm).
Given this I wonder why they had trouble...
Disclaimer: I don't hack on free Lisps; I just follow some of the
mailing lists, so I may have important details wrong. :)
-Doug
On Wed, 2 Jun 2004 22:50:25 +0200
Ingo Molnar <[email protected]> wrote:
>
> we'd like to announce the availability of the following kernel patch:
>
> http://redhat.com/~mingo/nx-patches/nx-2.6.7-rc2-bk2-AE
I think you still have the change_page_attr() bug that so far I haven't
managed to fix properly on x86-64 either (I had one patch, but it caused
mysterious other failures). Currently x86-64 only has a nasty, partly
broken workaround for this, which can cause illegal aliases.
The bug: when change_page_attr() splits a kernel text page and later
reverts it, the NX bit gets set on the kernel text page. This happens
when AGP allocates a page that lies within 2MB of the kernel mapping.
-Andi
On Thu, 2004-06-03 at 06:50, Ingo Molnar wrote:
> furthermore, the patch also implements 'NX protection' for kernelspace
> code: only the kernel code and modules are executable - so even
No, actually, it doesn't quite do that:
--- linux/kernel/module.c.orig
+++ linux/kernel/module.c
@@ -1431,7 +1431,7 @@ static struct module *load_module(void _
/* Suck in entire file: we'll want most of it. */
/* vmalloc barfs on "unusual" numbers. Check here */
- if (len > 64 * 1024 * 1024 || (hdr = vmalloc(len)) == NULL)
+ if (len > 64 * 1024 * 1024 || (hdr = vmalloc_exec(len)) == NULL)
return ERR_PTR(-ENOMEM);
if (copy_from_user(hdr, umod, len) != 0) {
err = -EFAULT;
This is where we suck the module file into kernel memory to parse it,
not where we actually copy the memory.
You want to replace the arch-specific module_alloc() function for this.
Or even better, reset the NX bit only on executable sections (in the
arch-specific module_finalize(), using mod->core_text_size and
mod->init_text_size). No generic changes necessary.
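The second suggestion can be pictured with a little page arithmetic. This is only a user-space sketch under stated assumptions: 4 KiB pages, a page-aligned region whose first text_size bytes are code (as core_text_size and init_text_size describe), and helper names invented for the example. It is not the module loader's code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumed x86 page size */

/* Pages, counted from the start of a page-aligned region, that overlap
 * its first text_size bytes and therefore must stay executable. */
size_t exec_pages(size_t text_size)
{
    return (text_size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Bytes of non-text data that share the last executable page. */
size_t exec_spill(size_t text_size, size_t region_size)
{
    size_t exec_end = exec_pages(text_size) * SKETCH_PAGE_SIZE;

    assert(text_size <= region_size);
    if (exec_end > region_size)
        exec_end = region_size;
    return exec_end - text_size;
}
```

A non-zero exec_spill() means some data still sits on an executable page, which is the case page-aligning the first non-text section would remove.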
What surprises me is that this error didn't cause your kernel to explode
the moment you inserted a module containing a function...
Hope that helps!
Rusty.
--
Anyone who quotes me in their signature is an idiot -- Rusty Russell
Rusty Russell wrote:
> On Thu, 2004-06-03 at 06:50, Ingo Molnar wrote:
>
>>furthermore, the patch also implements 'NX protection' for kernelspace
>>code: only the kernel code and modules are executable - so even
>
>
> No, actually, it doesn't quite do that:
>
> --- linux/kernel/module.c.orig
> +++ linux/kernel/module.c
> @@ -1431,7 +1431,7 @@ static struct module *load_module(void _
>
> /* Suck in entire file: we'll want most of it. */
> /* vmalloc barfs on "unusual" numbers. Check here */
> - if (len > 64 * 1024 * 1024 || (hdr = vmalloc(len)) == NULL)
> + if (len > 64 * 1024 * 1024 || (hdr = vmalloc_exec(len)) == NULL)
> return ERR_PTR(-ENOMEM);
> if (copy_from_user(hdr, umod, len) != 0) {
> err = -EFAULT;
>
> This is where we suck the module file into kernel memory to parse it,
> not where we actually copy the memory.
>
> You want to replace the arch-specific module_alloc() function for this.
> Or even better, reset the NX bit only on executable sections (in the
> arch-specific module_finalize(), using mod->core_text_size and
> mod->init_text_size). No generic changes necessary.
>
> What surprises me is that this error didn't cause your kernel to explode
> the moment you inserted a module containing a function...
bah, modules are for lame people who don't want to squeeze that last
0.00001% of additional performance out of their kernel by reducing TLB
and I-cache misses...
Jeff
On Wed, Jun 02, 2004 at 11:17:14PM +0200, Arjan van de Ven wrote:
> On Wed, Jun 02, 2004 at 02:13:13PM -0700, Linus Torvalds wrote:
> > Just out of interest - how many legacy apps are broken by this? I assume
> > it's a non-zero number, but wouldn't mind to be happily surprised.
>
> based on execshield in FC1.. about zero.
Doesn't Sun's JDK break here?
Joel
--
"Maybe the time has drawn the faces I recall.
But things in this life change very slowly,
If they ever change at all."
Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: [email protected]
Phone: (650) 506-8127
On Wed, 2 Jun 2004 18:12:53 -0700
Joel Becker <[email protected]> wrote:
> On Wed, Jun 02, 2004 at 11:17:14PM +0200, Arjan van de Ven wrote:
> > On Wed, Jun 02, 2004 at 02:13:13PM -0700, Linus Torvalds wrote:
> > > Just out of interest - how many legacy apps are broken by this? I assume
> > > it's a non-zero number, but wouldn't mind to be happily surprised.
> >
> > based on execshield in FC1.. about zero.
>
> Doesn't Sun's JDK break here?
Nope, since it doesn't have the ELF header bit set that says it can support
that.
-Andi
On Wed, Jun 02, 2004 at 06:12:53PM -0700, Joel Becker wrote:
> On Wed, Jun 02, 2004 at 11:17:14PM +0200, Arjan van de Ven wrote:
> > On Wed, Jun 02, 2004 at 02:13:13PM -0700, Linus Torvalds wrote:
> > > Just out of interest - how many legacy apps are broken by this? I assume
> > > it's a non-zero number, but wouldn't mind to be happily surprised.
> >
> > based on execshield in FC1.. about zero.
>
> Doesn't Sun's JDK break here?
nope.
It broke with 4g/4g because it thought pointers in the upper 1GB of address
space were errors, but it doesn't break with execshield.
* Linus Torvalds <[email protected]> wrote:
> > If the NX feature is supported by the CPU then the patched kernel turns
> > on NX and it will enforce userspace executability constraints such as a
> > no-exec stack and no-exec mmap and data areas. This means less chance
> > for stack overflows and buffer-overflows to cause exploits.
>
> Just out of interest - how many legacy apps are broken by this? I
> assume it's a non-zero number, but wouldn't mind to be happily
> surprised.
in the full install of FC1 and FC2 the number is zero - and Fedora has
exec-shield which does a couple of things more: it makes the heap
non-executable as well [this broke X], it randomizes the address-space
layout and has a 4:4 VM [which broke the Sun JVM].
> And do we have some way of on a per-process basis say "avoid NX
> because this old version of Oracle/flash/whatever-binary-thing doesn't
> run with it"?
we have three mechanisms for this in Fedora:
1) the PT_GNU_STACK flag itself - you can turn executability on/off
compile-time or even after the fact via the execstack(8) utility
Jakub wrote. This only affects the stack's executability - if an
application assumes a non-PROT_EXEC mmap() can be executed it might
still break with NX - but based on experience with Fedora Core i'd
say there's almost no such application.
this method works in 2.6 too, since it supports PT_GNU_STACK. gcc's
PT_GNU_STACK mechanism is very conservative - e.g. if an application
does an asm() then gcc assumes that it might rely on stack executability
and emits the X flag. [applications can then turn this off in the source
if stack executability is not required.] Likewise, if gcc emits
trampolines then the X flag is emitted too. (glibc knows about
PT_GNU_STACK all across - so e.g. if a nonexec stack application
dlopen()s a library that needs stack executability then glibc makes the
stack executable on the fly via PROT_GROWSDOWN/GROWSUP.)
2) via a runtime method: via the i386 personality. So an application can
trigger the 'legacy' Linux VM layout by e.g. doing 'i386 java
./test.class'.
this is a hack in Fedora - we wanted to have a finegrained runtime
mechanism just in case. But it would be nice to have this upstream too -
e.g. via a PERSONALITY_3G?
3) via a kernel boot parameter (exec-shield=0)
with the NX patch this becomes noexec=off [the same flag works on x86_64
too]. This method is the most inflexible one, and is a last-resort
thing. (Fedora also has a runtime global switch to turn off the VM
layout changes.)
here's a list of applications that we had to fix/work around in Fedora
when the VM layout changed:
- emacs _rebuild_. (it coredumps itself during build ... xemacs is OK.)
- some JDKs. Since they generate code and try to be as fast as possible
they tend to rely more on VM details than normal applications.
- X's module loader assumed that brk was executable. (fixed)
- Wine. (it implements another OS so it's by definition very sensitive
to layout changes.)
most of the breakages were unclean x86-only code that would have broken
if ported over to 64-bit anyway.
old, legacy applications don't have the PT_GNU_STACK flag so they all
work fine.
Ingo
* Rusty Russell <[email protected]> wrote:
> You want to replace the arch-specific module_alloc() function for
> this. Or even better, reset the NX bit only on executable sections (in
> the arch-specific module_finalize(), using mod->core_text_size and
> mod->init_text_size). No generic changes necessary.
ok!
Ingo
* Rusty Russell <[email protected]> wrote:
> You want to replace the arch-specific module_alloc() function for
> this. Or even better, reset the NX bit only on executable sections (in
> the arch-specific module_finalize(), using mod->core_text_size and
> mod->init_text_size). No generic changes necessary.
does the .exit.text section have to be taken into account as well? This
is the normal section order of x86 .ko objects:
.text
.init.text
.exit.text
.rodata
.modinfo
.rodata.str1.1
.data
__obsparm
.gnu.linkonce.this_module
.comment
.gnu_debuglink
we load the module up including the .data section? Or do we load the
whole thing?
Ingo
* Rusty Russell <[email protected]> wrote:
> You want to replace the arch-specific module_alloc() function for
> this. Or even better, reset the NX bit only on executable sections (in
> the arch-specific module_finalize(), using mod->core_text_size and
> mod->init_text_size). No generic changes necessary.
this reminds me of another issue: x86_64 currently seems to manually map
the whole module via PAGE_KERNEL_EXEC. Andi, we could change it to use
vmalloc_exec(), right?
and yet another sub-topic: when building modules we should align .rodata
(the first non-executable section) to page boundary. This adds ~2K to
the module size but it's not an issue i think. Data section overflows do
happen and if it has a function pointer that can be used as a trampoline
then we want the whole data section to be non-executable.
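The size cost mentioned above follows from simple alignment arithmetic. A small sketch, assuming 4 KiB pages and function names invented for illustration: the padding is whatever it takes to round the end of the executable sections up to the next page boundary, so if that end falls at an effectively random offset within a page, the average is about half a page, in line with the ~2K estimate.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumed x86 page size */

/* Round an offset up to the next page boundary. */
size_t page_align_up(size_t off)
{
    return (off + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Padding inserted before .rodata when the executable sections
 * end at offset text_end. */
size_t rodata_padding(size_t text_end)
{
    size_t aligned = page_align_up(text_end);

    assert(aligned >= text_end);   /* holds unless the offset overflows */
    return aligned - text_end;
}
```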
Ingo
here's the latest NX patch:
http://redhat.com/~mingo/nx-patches/nx-2.6.7-rc2-bk2-AF
Changes since -AE:
- use vmalloc_exec() in module_alloc() (bug noticed by Rusty Russell)
- unexport vmalloc_exec() (suggested by Christoph Hellwig)
- fix compilation warning when !PAE (Andrew Morton)
Ingo
Ingo Molnar <[email protected]> writes:
> * Rusty Russell <[email protected]> wrote:
>
>> You want to replace the arch-specific module_alloc() function for
>> this. Or even better, reset the NX bit only on executable sections (in
>> the arch-specific module_finalize(), using mod->core_text_size and
>> mod->init_text_size). No generic changes necessary.
>
> this reminds me of another issue: x86_64 currently seems to manually map
> the whole module via PAGE_KERNEL_EXEC. Andi, we could change it to use
> vmalloc_exec(), right?
Nope, the manual map is needed. On x86-64 kernels are linked in the
"kernel" code model and the modules must be within 2GB of the main
kernel text. vmalloc space is elsewhere.
To fix this you would need to link the modules with -fPIC and
add a full dynamic linker to the module loader, which would
probably not be worth the effort.
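A simplified way to picture the 2GB constraint: references between module and kernel text have to fit a signed 32-bit displacement. The check below is a user-space sketch; the function names and both example addresses are illustrative assumptions, not the actual x86-64 memory layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the distance from 'from' to 'to' fits a signed 32-bit offset. */
bool fits_rel32(uint64_t from, uint64_t to)
{
    /* Unsigned subtraction wraps modulo 2^64; the cast reinterprets
     * the result as a signed distance. */
    int64_t delta = (int64_t)(to - from);

    return delta >= INT32_MIN && delta <= INT32_MAX;
}

void rel32_demo(void)
{
    uint64_t kernel_text = UINT64_C(0xffffffff80100000); /* illustrative */
    uint64_t near_area   = UINT64_C(0xffffffffa0000000); /* illustrative */
    uint64_t far_area    = UINT64_C(0xffffff0000000000); /* illustrative */

    assert(fits_rel32(kernel_text, near_area));
    assert(!fits_rel32(kernel_text, far_area));
}
```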
-Andi
* Ingo Molnar <[email protected]> wrote:
> > And do we have some way of on a per-process basis say "avoid NX
> > because this old version of Oracle/flash/whatever-binary-thing doesn't
> > run with it"?
[...]
> 2) via a runtime method: via the i386 personality. So an application can
> trigger the 'legacy' Linux VM layout by e.g doing 'i386 java
> ./test.class'.
>
> this is a hack in Fedora - we wanted to have a finegrained runtime
> mechanism just in case. But it would be nice to have this upstream too -
> e.g. via a PERSONALITY_3G?
i've attached a patch that provides a cleaner solution. It does 3
changes:
- it adds an ADDR_SPACE_EXECUTABLE bit to the personality 'bug bits'
section. This bit if set will make the stack executable. (if in the
future we decide to make the malloc() heap non-exec [which i definitely
think we should], that property will also listen to this bit.)
- in elf.h, it changes the x86 personality inheritance code to match
that of x86_64 - which is a much saner method. This means that when a
complex app does exec()s, the children will all run with the
personality of the parent(s).
- in exec.c, since address-space executability is a security-relevant
item, we must clear the personality when we exec a setuid binary. I
believe this is also a (small) security robustness fix for current
64-bit architectures.
(the patch also adds a break to the elf_ex.e_phnum loop - there can only
be one STACK header in the binary and once we found it we should not
iterate through the remaining program headers (if any).)
we didn't want to add a non-standard personality flag to Fedora so we
abused PER_LINUX32 as the compatibility flag - but this only works on
x86. With the ADDR_SPACE_EXECUTABLE flag there would be a standard
method to fall back to 'legacy' executability assumptions Linux
applications might make.
hm?
Ingo
--- linux/include/linux/personality.h.orig
+++ linux/include/linux/personality.h
@@ -30,6 +30,7 @@ extern int abi_fake_utsname;
*/
enum {
MMAP_PAGE_ZERO = 0x0100000,
+ ADDR_SPACE_EXECUTABLE = 0x0200000,
ADDR_LIMIT_32BIT = 0x0800000,
SHORT_INODE = 0x1000000,
WHOLE_SECONDS = 0x2000000,
--- linux/include/asm-i386/elf.h.orig
+++ linux/include/asm-i386/elf.h
@@ -117,7 +117,8 @@ typedef struct user_fxsr_struct elf_fpxr
#define AT_SYSINFO_EHDR 33
#ifdef __KERNEL__
-#define SET_PERSONALITY(ex, ibcs2) set_personality((ibcs2)?PER_SVR4:PER_LINUX)
+/* child inherits the personality of the parent */
+#define SET_PERSONALITY(ex, ibcs2) do { } while (0)
extern int dump_task_regs (struct task_struct *, elf_gregset_t *);
extern int dump_task_fpu (struct task_struct *, elf_fpregset_t *);
--- linux/fs/exec.c.orig
+++ linux/fs/exec.c
@@ -886,8 +886,11 @@ int prepare_binprm(struct linux_binprm *
if(!(bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)) {
/* Set-uid? */
- if (mode & S_ISUID)
+ if (mode & S_ISUID) {
bprm->e_uid = inode->i_uid;
+ /* reset personality */
+ current->personality = PER_LINUX;
+ }
/* Set-gid? */
/*
@@ -895,8 +898,11 @@ int prepare_binprm(struct linux_binprm *
* is a candidate for mandatory locking, not a setgid
* executable.
*/
- if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
+ if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
bprm->e_gid = inode->i_gid;
+ /* reset personality */
+ current->personality = PER_LINUX;
+ }
}
/* fill in binprm security blob */
--- linux/fs/binfmt_elf.c.orig
+++ linux/fs/binfmt_elf.c
@@ -490,7 +490,7 @@ static int load_elf_binary(struct linux_
struct exec interp_ex;
char passed_fileno[6];
struct files_struct *files;
- int executable_stack = EXSTACK_DEFAULT;
+ int executable_stack;
/* Get the exec-header */
elf_ex = *((struct elfhdr *) bprm->buf);
@@ -616,13 +616,19 @@ static int load_elf_binary(struct linux_
}
elf_ppnt = elf_phdata;
- for (i = 0; i < elf_ex.e_phnum; i++, elf_ppnt++)
- if (elf_ppnt->p_type == PT_GNU_STACK) {
- if (elf_ppnt->p_flags & PF_X)
- executable_stack = EXSTACK_ENABLE_X;
- else
- executable_stack = EXSTACK_DISABLE_X;
- }
+ if (current->personality & ADDR_SPACE_EXECUTABLE)
+ executable_stack = EXSTACK_ENABLE_X;
+ else {
+ executable_stack = EXSTACK_DEFAULT;
+ for (i = 0; i < elf_ex.e_phnum; i++, elf_ppnt++)
+ if (elf_ppnt->p_type == PT_GNU_STACK) {
+ if (elf_ppnt->p_flags & PF_X)
+ executable_stack = EXSTACK_ENABLE_X;
+ else
+ executable_stack = EXSTACK_DISABLE_X;
+ break;
+ }
+ }
/* Some simple consistency checks for the interpreter */
if (elf_interpreter) {
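The decision made by the binfmt_elf.c hunk above can be modelled in user space roughly as follows. This is a sketch, not the kernel code: the struct is a cut-down stand-in for an ELF program header, the SKETCH_ constants are defined locally (PT_GNU_STACK is 0x6474e551 and PF_X is 0x1 in the ELF headers), and ADDR_SPACE_EXECUTABLE is the bit proposed in this patch, not an existing interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PT_GNU_STACK   0x6474e551u
#define SKETCH_PF_X           0x1u
#define ADDR_SPACE_EXECUTABLE 0x0200000UL   /* proposed in the patch above */

/* Cut-down stand-in for an ELF program header. */
struct sketch_phdr {
    uint32_t p_type;
    uint32_t p_flags;
};

enum exstack { EXSTACK_DEFAULT, EXSTACK_DISABLE_X, EXSTACK_ENABLE_X };

enum exstack stack_policy(unsigned long personality,
                          const struct sketch_phdr *phdr, size_t phnum)
{
    assert(phdr != NULL || phnum == 0);

    /* The personality bit overrides whatever the binary asks for. */
    if (personality & ADDR_SPACE_EXECUTABLE)
        return EXSTACK_ENABLE_X;

    /* Otherwise the first PT_GNU_STACK header decides. */
    for (size_t i = 0; i < phnum; i++) {
        if (phdr[i].p_type == SKETCH_PT_GNU_STACK)
            return (phdr[i].p_flags & SKETCH_PF_X) ? EXSTACK_ENABLE_X
                                                   : EXSTACK_DISABLE_X;
    }

    /* No header at all: a legacy binary, left to the arch default. */
    return EXSTACK_DEFAULT;
}
```

Returning from inside the loop plays the same role as the break added by the patch: once the stack header has been found, the remaining program headers are not examined.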
Ingo Molnar wrote:
> * Linus Torvalds <[email protected]> wrote:
>
>
>>>If the NX feature is supported by the CPU then the patched kernel turns
>>>on NX and it will enforce userspace executability constraints such as a
>>>no-exec stack and no-exec mmap and data areas. This means less chance
>>>for stack overflows and buffer-overflows to cause exploits.
>>
>>Just out of interest - how many legacy apps are broken by this? I
>>assume it's a non-zero number, but wouldn't mind to be happily
>>surprised.
>
>
> in the full install of FC1 and FC2 the number is zero - and Fedora has
> exec-shield which does a couple of things more: it makes the heap
> non-executable as well [this broke X], it randomizes the address-space
> layout and has a 4:4 VM [which broke the Sun JVM].
>
>
>>And do we have some way of on a per-process basis say "avoid NX
>>because this old version of Oracle/flash/whatever-binary-thing doesn't
>>run with it"?
>
>
> we have three mechanisms for this in Fedora:
>
> 1) the PT_GNU_STACK flag itself - you can turn executability on/off
> compile-time or even after the fact via the execstack(8) utility
> Jakub wrote. This only affects the stack's executability - if an
> application assumes a non-PROT_EXEC mmap() can be executed it might
> still break with NX - but based on experience with Fedora Core i'd
> say there's almost no such application.
>
> this method works in 2.6 too, since it supports PT_GNU_STACK. gcc's
> PT_GNU_STACK mechanism is very conservative - e.g. if an application
> does an asm() then gcc assumes that it might rely on stack executability
> and emits the X flag. [applications can then turn this off in the source
> if stack executability is not required.] Likewise, if gcc emits
> trampolines then the X flag is emitted too. (glibc knows about
> PT_GNU_STACK all across - so e.g. if a nonexec stack application
> dlopen()s a library that needs stack executability then glibc makes the
> stack executable on the fly via PROT_GROWSDOWN/GROWSUP.)
>
> 2) via a runtime method: via the i386 personality. So an application can
> trigger the 'legacy' Linux VM layout by e.g doing 'i386 java
> ./test.class'.
>
> this is a hack in Fedora - we wanted to have a finegrained runtime
> mechanism just in case. But it would be nice to have this upstream too -
> e.g. via a PERSONALITY_3G?
>
> 3) via a kernel boot parameter (exec-shield=0)
>
> with the NX patch this becomes noexec=off [the same flag works on x86_64
> too]. This method is the most inflexible one, and is a last-resort
> thing. (Fedora also has a runtime global switch to turn off the VM
> layout changes.)
>
> here's a list of applications that we had to fix/work around in Fedora
> when the VM layout changed:
>
> - emacs _rebuild_. (it coredumps itself during build ... xemacs is OK.)
>
> - some JDKs. Since they generate code and try to be as fast as possible
> they tend to rely more on VM details than normal applications.
>
> - X's module loader assumed that brk was executable. (fixed)
>
> - Wine. (it implements another OS so it's by definition very sensitive
> to layout changes.)
Wine breaks because of the part of exec-shield that relocates shared
libs to low addresses, where the (stripped) Windows binaries expect to
be loaded at. NX stack doesn't affect it.
--
Brian Gerst
> kernel tried to access NX-protected page - exploit attempt? (uid: 500)
> Unable to handle kernel paging request at virtual address f78d0f40
> printing eip:
> ...
Just a small nitpick...
Can you please drop the "- exploit attempt" from the error? Buffer
overflows aren't always exploits.
I already have a problem with jumpy users who blame everything on "hackers".
I'd much rather have someone qualified come to that conclusion than
have the kernel make a bad guess at it.
Gerhard
--
Gerhard Mack
[email protected]
<>< As a computer I find your faith in technology amusing.
On Thu, 3 Jun 2004 14:44:48 +0200
Ingo Molnar <[email protected]> wrote:
> - in exec.c, since address-space executability is a security-relevant
> item, we must clear the personality when we exec a setuid binary. I
> believe this is also a (small) security robustness fix for current
> 64-bit architectures.
I'm not sure I like that. This means I cannot easily force an i386 uname
or 3GB address space on suid programs anymore on x86-64.
While in theory it could be a small security problem I think the utility
is much greater.
It's hard to see how setting NX could cause a security hole. The program
may crash, but it is unlikely to be exploitable.
-Andi
On Thu, 2004-06-03 at 16:36, Gerhard Mack wrote:
> > kernel tried to access NX-protected page - exploit attempt? (uid: 500)
> > Unable to handle kernel paging request at virtual address f78d0f40
> > printing eip:
> > ...
>
> Just a small nitpick...
>
> Can you please drop the "- exploit attempt" from the error? Buffer
> overflows aren't always exploits.
buffer overflows that then also execute code are pretty much always
exploits tho ;)
Ingo Molnar wrote:
> gcc's
> PT_GNU_STACK mechanism is very conservative - e.g. if an application
> does an asm() then gcc assumes that it might rely on stack executability
> and emits the X flag.
Actually, this isn't the case. asm() statements alone don't trigger
this. There are far too many of them in use, and there has never been a
reported problem. Only trampolines etc. cause gcc to automatically
request the X flag to be set. In case an asm() does cause problems, the
user can pass the -z execstack option to the linker.
It's all explained in
http://people.redhat.com/drepper/nonselsec.pdf
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
On Thursday 03 June 2004 05:44, Ingo Molnar wrote:
>
> * Ingo Molnar <[email protected]> wrote:
>
> > > And do we have some way of on a per-process basis say "avoid NX
> > > because this old version of Oracle/flash/whatever-binary-thing doesn't
> > > run with it"?
>
> [...]
> > 2) via a runtime method: via the i386 personality. So an application can
> > trigger the 'legacy' Linux VM layout by e.g doing 'i386 java
> > ./test.class'.
> >
> > this is a hack in Fedora - we wanted to have a finegrained runtime
> > mechanism just in case. But it would be nice to have this upstream too -
> > e.g. via a PERSONALITY_3G?
>
> i've attached a patch that provides a cleaner solution. It does 3
> changes:
>
> - it adds an ADDR_SPACE_EXECUTABLE bit to the personality 'bug bits'
> section. This bit if set will make the stack executable. (if in the
> future we decide to make the malloc() heap non-exec [which i definitely
> think we should], that property will also listen to this bit.)
Ingo,
What do you mean by "in the future"? on x86, with the current no execute
patch, malloc() will be non-exec
thanks,
suresh
> What do you mean by "in the future"? on x86, with the current no execute
> patch, malloc() will be non-exec
On x86-64 the heap is executable right now at least.
-Andi
On Wed, Jun 02, 2004 at 11:17:14PM +0200, Arjan van de Ven wrote:
> On Wed, Jun 02, 2004 at 02:13:13PM -0700, Linus Torvalds wrote:
> >
> > Just out of interest - how many legacy apps are broken by this? I assume
> > it's a non-zero number, but wouldn't mind to be happily surprised.
>
> based on execshield in FC1.. about zero.
FWIW, applix runs fine on FC1 (well, after you install the necessary
libraries.) This is probably about the oldest binary-only legacy app
for linux. The date on the executable is Jun 1996.
Thanks,
Jim
Hi Linus,
On Wed, Jun 02, 2004 at 02:13:13PM -0700, Linus Torvalds wrote:
> Just out of interest - how many legacy apps are broken by this? I assume
> it's a non-zero number, but wouldn't mind to be happily surprised.
>
> And do we have some way of on a per-process basis say "avoid NX because
> this old version of Oracle/flash/whatever-binary-thing doesn't run with
> it"?
That's why the PT_GNU_STACK section is searched for ;-)
If it's not found, we assume a legacy app and go with the arch default.
I tested this -- on x86-64.
Regards,
--
Kurt Garloff <[email protected]> [Koeln, DE]
Physics:Plasma modeling <[email protected]> [TU Eindhoven, NL]
Linux: SUSE Labs (Head) <[email protected]> [SUSE Nuernberg, DE]
On Thursday 03 June 2004 13:37, Andi Kleen wrote:
> > What do you mean by "in the future"? on x86, with the current no execute
> > patch, malloc() will be non-exec
>
> On x86-64 the heap is executable right now at least.
>
oh! I see. Looks like only Ingo's exec-shield patch is doing that.
thanks,
suresh
Andi Kleen wrote:
> On Thu, 3 Jun 2004 14:44:48 +0200
> Ingo Molnar <[email protected]> wrote:
>
>
>
>>- in exec.c, since address-space executability is a security-relevant
>>item, we must clear the personality when we exec a setuid binary. I
>>believe this is also a (small) security robustness fix for current
>>64-bit architectures.
>
>
> I'm not sure I like that. This means I cannot easily force an i386 uname
> or 3GB address space on suid programs anymore on x86-64.
>
> While in theory it could be a small security problem I think the utility
> is much greater.
>
> It's hard to see how setting NX could cause a security hole. The program
> may crash, but it is unlikely to be exploitable.
The whole point of NX, though, is that it prevents certain classes of
exploits. If a setuid binary is vulnerable to one of these, then Ingo's
patch "fixes" it. Your approach breaks that.
I don't like Ingo's fix either, though. At least it should check
CAP_PTRACE or some such. A better fix would be for LSM to pass down a flag
indicating a change of security context. I'll throw that in to my
caps/apply_creds cleanup, in case that ever gets applied.
--Andy
>
> -Andi
On Thu, Jun 03, 2004 at 03:58:27PM -0700, Siddha, Suresh B wrote:
> On Thursday 03 June 2004 13:37, Andi Kleen wrote:
> > > What do you mean by "in the future"? on x86, with the current no execute
> > > patch, malloc() will be non-exec
> >
> > On x86-64 the heap is executable right now at least.
> >
>
> oh! I see. Looks like only Ingo's exec-shield patch is doing that.
Maybe adding a new ELF header flag for that would be a good idea too.
Just not sure if gcc should set it by default or not.
But I fear changing it for x86-64 generically could break programs
again.
-andi
On Thu, Jun 03, 2004 at 04:01:57PM -0700, Andy Lutomirski wrote:
> Andi Kleen wrote:
> >On Thu, 3 Jun 2004 14:44:48 +0200
> >Ingo Molnar <[email protected]> wrote:
> >
> >
> >
> >>- in exec.c, since address-space executability is a security-relevant
> >>item, we must clear the personality when we exec a setuid binary. I
> >>believe this is also a (small) security robustness fix for current
> >>64-bit architectures.
> >
> >
> >I'm not sure I like that. This means I cannot easily force an i386 uname
> >or 3GB address space on suid programs anymore on x86-64.
> >
> >While in theory it could be a small security problem I think the utility
> >is much greater.
> >
> >It's hard to see how setting NX could cause a security hole. The program
> >may crash, but it is unlikely to be exploitable.
>
> The whole point of NX, though, is that it prevents certain classes of
> exploits. If a setuid binary is vulnerable to one of these, then Ingo's
> patch "fixes" it. Your approach breaks that.
Good point.
But that only applies to the NX personality bit. For the uname emulation
it is not an issue.
So maybe the dropping on exec should only zero a few selected
personality bits, but not all.
>
> I don't like Ingo's fix either, though. At least it should check
> CAP_PTRACE or some such. A better fix would be for LSM to pass down a flag
> indicating a change of security context. I'll throw that in to my
> caps/apply_creds cleanup, in case that ever gets applied.
Don't think we should require an LSM module for that. That's
far overkill.
-Andi
Andi Kleen wrote:
>>The whole point of NX, though, is that it prevents certain classes of
>>exploits. If a setuid binary is vulnerable to one of these, then Ingo's
>>patch "fixes" it. Your approach breaks that.
>
>
> Good point.
>
> But that only applies to the NX personality bit. For the uname emulation
> it is not an issue.
>
> So maybe the dropping on exec should only zero a few selected
> personality bits, but not all.
True.
>>I don't like Ingo's fix either, though. At least it should check
>>CAP_PTRACE or some such. A better fix would be for LSM to pass down a flag
>>indicating a change of security context. I'll throw that in to my
>>caps/apply_creds cleanup, in case that ever gets applied.
>
>
> Don't think we should require an LSM module for that. That's
> far overkill.
I'm not suggesting a new LSM module. I'm suggesting modifying the existing
LSM code to handle this cleanly. We already have a function
(security_bprm_secureexec) that does something like this, and, in fact,
it's probably the right thing to test here.
I'm currently compiling a new patch (modified from my last caps cleanup)
that makes a new bitfield for this stuff. I don't know if it's worth
applying, but I'll send it off to Andrew once I convince myself it works.
--Andy
On Thu, 2004-06-03 at 18:53, Ingo Molnar wrote:
> * Rusty Russell <[email protected]> wrote:
>
> > You want to replace the arch-specific module_alloc() function for
> > this. Or even better, reset the NX bit only on executable sections (in
> > the arch-specific module_finalize(), using mod->core_text_size and
> > mod->init_text_size). No generic changes necessary.
...
> and yet another sub-topic: when building modules we should align .rodata
> (the first non-executable section) to page boundary. This adds ~2K to
> the module size but it's not an issue i think. Data section overflows do
> happen and if it has a function pointer that can be used as a trampoline
> then we want the whole data section to be non-executable.
Yes. It would add ~4k (if you want to do it for the init sections as
well as the core sections of the module: might not be worth it).
You can set the alignment requirement in module_frob_arch_sections(),
but beware that this alignment will only be relative to the allocation
returned by module_alloc(), so to do this you'll want module_alloc() to
return page-aligned memory.
Note the section sorting done in kernel/module.c:layout_sections(): in
particular, all executable sections are placed FIRST in the module,
which makes your life easier here.
Hope that helps!
Rusty.
--
Anyone who quotes me in their signature is an idiot -- Rusty Russell
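The size estimates in the exchange above (~2K for one padded boundary, ~4K if both the core and init sections are padded) follow from rounding the end of the executable sections up to a page boundary: the padding is between 0 and one page minus one byte, so about half a page on average. A minimal userspace sketch of that arithmetic, assuming a 4096-byte page; the names SKETCH_PAGE_SIZE, page_align_up() and padding_after_text() are hypothetical and are not kernel APIs:

```c
/* Assumed page size for illustration; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/* Round addr up to the next multiple of SKETCH_PAGE_SIZE.
 * Works because the page size is a power of two. */
unsigned long page_align_up(unsigned long addr)
{
	return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Bytes of padding needed after text_size bytes of executable sections
 * so that the first non-executable section starts on a page boundary. */
unsigned long padding_after_text(unsigned long text_size)
{
	return page_align_up(text_size) - text_size;
}
```

As Rusty notes, this only helps if the base address returned by module_alloc() is itself page-aligned; the sketch assumes offsets relative to such a base.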
Andy Lutomirski wrote:
>>> I don't like Ingo's fix either, though. At least it should check
>>> CAP_PTRACE or some such. A better fix would be for LSM to pass down
>>> a flag indicating a change of security context. I'll throw that in
>>> to my caps/apply_creds cleanup, in case that ever gets applied.
>>
>>
>>
>> Don't think we should require an LSM module for that. That's far
>> overkill.
>
>
> I'm not suggesting a new LSM module. I'm suggesting modifying the
> existing LSM code to handle this cleanly. We already have a function
> (security_bprm_secureexec) that does something like this, and, in fact,
> it's probably the right thing to test here.
... or not.
secureexec will return true even if you have whatever cap you want the user
to have for this to work.
What use do you see for having this flag survive setuid? The only (safe)
use I can see is for debugging, in which case just copying the binary and
running it non-setuid should be OK.
In that case, secureexec is a better test than setuid-ness, because with
LSMs (like SELinux) setuid is not the only way that privileges can be
elevated.
--Andy
* Andi Kleen <[email protected]> wrote:
> > The whole point of NX, though, is that it prevents certain classes of
> > exploits. If a setuid binary is vulnerable to one of these, then Ingo's
> > patch "fixes" it. Your approach breaks that.
>
> Good point.
>
> But that only applies to the NX personality bit. For the uname
> emulation it is not an issue.
>
> So maybe the dropping on exec should only zero a few selected
> personality bits, but not all.
ok, how about the attached patch then? There's a PERS_DROP_ON_SUID mask
that we drop upon setuid - all the other personality bits get inherited.
Ingo
--- linux/include/linux/personality.h.orig
+++ linux/include/linux/personality.h
@@ -30,6 +30,7 @@ extern int abi_fake_utsname;
*/
enum {
MMAP_PAGE_ZERO = 0x0100000,
+ ADDR_SPACE_EXECUTABLE = 0x0200000,
ADDR_LIMIT_32BIT = 0x0800000,
SHORT_INODE = 0x1000000,
WHOLE_SECONDS = 0x2000000,
@@ -37,6 +38,8 @@ enum {
ADDR_LIMIT_3GB = 0x8000000,
};
+#define PERS_DROP_ON_SUID (MMAP_PAGE_ZERO|ADDR_SPACE_EXECUTABLE)
+
/*
* Personality types.
*
--- linux/include/asm-i386/elf.h.orig
+++ linux/include/asm-i386/elf.h
@@ -117,7 +117,8 @@ typedef struct user_fxsr_struct elf_fpxr
#define AT_SYSINFO_EHDR 33
#ifdef __KERNEL__
-#define SET_PERSONALITY(ex, ibcs2) set_personality((ibcs2)?PER_SVR4:PER_LINUX)
+/* child inherits the personality of the parent */
+#define SET_PERSONALITY(ex, ibcs2) do { } while (0)
extern int dump_task_regs (struct task_struct *, elf_gregset_t *);
extern int dump_task_fpu (struct task_struct *, elf_fpregset_t *);
--- linux/fs/exec.c.orig
+++ linux/fs/exec.c
@@ -886,8 +886,11 @@ int prepare_binprm(struct linux_binprm *
if(!(bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)) {
/* Set-uid? */
- if (mode & S_ISUID)
+ if (mode & S_ISUID) {
bprm->e_uid = inode->i_uid;
+ /* reset personality */
+ current->personality &= ~PERS_DROP_ON_SUID;
+ }
/* Set-gid? */
/*
@@ -895,8 +898,11 @@ int prepare_binprm(struct linux_binprm *
* is a candidate for mandatory locking, not a setgid
* executable.
*/
- if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
+ if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
bprm->e_gid = inode->i_gid;
+ /* reset personality */
+ current->personality &= ~PERS_DROP_ON_SUID;
+ }
}
/* fill in binprm security blob */
--- linux/fs/binfmt_elf.c.orig
+++ linux/fs/binfmt_elf.c
@@ -490,7 +490,7 @@ static int load_elf_binary(struct linux_
struct exec interp_ex;
char passed_fileno[6];
struct files_struct *files;
- int executable_stack = EXSTACK_DEFAULT;
+ int executable_stack;
/* Get the exec-header */
elf_ex = *((struct elfhdr *) bprm->buf);
@@ -616,13 +616,19 @@ static int load_elf_binary(struct linux_
 	}
 	elf_ppnt = elf_phdata;
-	for (i = 0; i < elf_ex.e_phnum; i++, elf_ppnt++)
-		if (elf_ppnt->p_type == PT_GNU_STACK) {
-			if (elf_ppnt->p_flags & PF_X)
-				executable_stack = EXSTACK_ENABLE_X;
-			else
-				executable_stack = EXSTACK_DISABLE_X;
-		}
+	if (current->personality & ADDR_SPACE_EXECUTABLE)
+		executable_stack = EXSTACK_ENABLE_X;
+	else {
+		executable_stack = EXSTACK_DEFAULT;
+		for (i = 0; i < elf_ex.e_phnum; i++, elf_ppnt++)
+			if (elf_ppnt->p_type == PT_GNU_STACK) {
+				if (elf_ppnt->p_flags & PF_X)
+					executable_stack = EXSTACK_ENABLE_X;
+				else
+					executable_stack = EXSTACK_DISABLE_X;
+				break;
+			}
+	}
 	/* Some simple consistency checks for the interpreter */
 	if (elf_interpreter) {
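The effect of the PERS_DROP_ON_SUID mask in the patch above can be tried outside the kernel. The following is only a userspace sketch: the flag values are copied from the personality.h hunk, while personality_after_suid_exec() is a hypothetical helper standing in for the `current->personality &= ~PERS_DROP_ON_SUID` statement.

```c
/* Flag values copied from the personality.h hunk in the patch. */
enum {
	MMAP_PAGE_ZERO        = 0x0100000,
	ADDR_SPACE_EXECUTABLE = 0x0200000,
	ADDR_LIMIT_32BIT      = 0x0800000,
	ADDR_LIMIT_3GB        = 0x8000000,
};

#define PERS_DROP_ON_SUID (MMAP_PAGE_ZERO | ADDR_SPACE_EXECUTABLE)

/* Hypothetical helper: the personality a task would keep after
 * exec'ing a setuid/setgid binary under the proposed policy.
 * Only the security-relevant bits are cleared; everything else
 * (e.g. the 3GB address limit) is inherited. */
unsigned long personality_after_suid_exec(unsigned long personality)
{
	return personality & ~(unsigned long)PERS_DROP_ON_SUID;
}
```

For example, a task running with ADDR_SPACE_EXECUTABLE | ADDR_LIMIT_3GB would keep only ADDR_LIMIT_3GB, which is the behaviour Andi asked for: the address-space emulation survives while the executable-address-space bit does not.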
* Suresh Siddha <[email protected]> wrote:
> On Thursday 03 June 2004 13:37, Andi Kleen wrote:
> > > What do you mean by "in the future"? on x86, with the current no execute
> > > patch, malloc() will be non-exec
> >
> > On x86-64 the heap is executable right now at least.
>
> oh! I see. Looks like only Ingo's exec-shield patch is doing that.
yep. The patch also detaches the brk area from the binary's image and
bss, and randomizes it. (this isolates them better and makes it harder
to overflow between these sections.)
For the segment-limit method on non-NX CPUs a non-executable brk (heap,
malloc() space) has another significance: since it must be above the
binary image [there's simply not enough brk space below the binary], the
CS segment limit does not cover the binary's .data/bss sections - hence
that is non-executable as well. [for NX there's no difference - the
.data/bss sections are non-executable.]
but for the mainstream kernel the most important step would be to make
brk non-executable.
Ingo
* Gerhard Mack <[email protected]> wrote:
> > kernel tried to access NX-protected page - exploit attempt? (uid: 500)
> > Unable to handle kernel paging request at virtual address f78d0f40
> > printing eip:
> > ...
>
> Just a small nitpick...
>
> Can you please drop the "- exploit attempt" from the error? Buffer
> overflows aren't always exploits.
this message will only trigger if the kernel tries to _execute_ a
non-executable kernel page - which almost never happens even considering
kernel crashes. Normal kernel oopses will still look like they used to.
Ingo
* Brian Gerst <[email protected]> wrote:
> Wine breaks because of the part of exec-shield that relocates shared
> libs to low addresses, where the (stripped) Windows binaries expect to
> be loaded at. NX stack doesn't affect it.
I think Wine could get around this by creating a dummy ELF section in
the Wine binary that covers the first 1GB or so. Wine could still use
ordinary dynamic libraries - those would go above that 1GB. Then once
Wine has loaded up it can munmap() that first 1GB.
(this would not work if Wine has to dlopen() new libraries after this
phase - does that happen?)
Ingo
On Fri, Jun 04, 2004 at 11:39:58AM +0200, Ingo Molnar wrote:
> I think Wine could get around this by creating a dummy ELF section in
> the Wine binary that covers the first 1GB or so. Wine could still use
> ordinary dynamic libraries - those would go above that 1GB. Then once
> Wine has loaded up it can munmap() that first 1GB.
>
> (this would not work if Wine has to dlopen() new libraries after this
> phase - does that happen?)
Why can't wine just implement its own binfmt_pecoff? Sounds like the
much simpler solution.
On Fri, Jun 04, 2004 at 11:39:58AM +0200, Ingo Molnar wrote:
>> I think Wine could get around this by creating a dummy ELF section in
>> the Wine binary that covers the first 1GB or so. Wine could still use
>> ordinary dynamic libraries - those would go above that 1GB. Then once
>> Wine has loaded up it can munmap() that first 1GB.
>> (this would not work if Wine has to dlopen() new libraries after this
>> phase - does that happen?)
On Fri, Jun 04, 2004 at 11:41:08AM +0100, Christoph Hellwig wrote:
> Why can't wine just implement its own binfmt_pecoff? Sounds like the
> much simpler solution.
I'd be in favor of this also. An executable format with wide enough
usage is worth adding kernel support for loading it.
-- wli
Ingo Molnar wrote:
>* Gerhard Mack <[email protected]> wrote:
>
>>> kernel tried to access NX-protected page - exploit attempt? (uid: 500)
>>> Unable to handle kernel paging request at virtual address f78d0f40
>>> printing eip:
>>> ...
>>>
>>>
>>Just a small nitpick...
>>
>>Can you please drop the "- exploit attempt" from the error? Buffer
>>overflows aren't always exploits.
>>
>>
>
>this message will only trigger if the kernel tries to _execute_ a
>non-executable kernel page - which almost never happens even considering
>kernel crashes. Normal kernel oopses will still look like they used to.
>
>
Perhaps the message should read like so:
kernel tried to execute NX-protected page - exploit attempt? (uid: 500)
Unable to handle kernel paging request at virtual address f78d0f40
printing eip:
- Steve
On Friday 04 June 2004 02:25, Ingo Molnar wrote:
>
> * Andi Kleen <[email protected]> wrote:
>
> > > The whole point of NX, though, is that it prevents certain classes of
> > > exploits. If a setuid binary is vulnerable to one of these, then Ingo's
> > > patch "fixes" it. Your approach breaks that.
> >
> > Good point.
> >
> > But that only applies to the NX personality bit. For the uname
> > emulation it is not an issue.
> >
> > So maybe the dropping on exec should only zero a few selected
> > personality bits, but not all.
>
> ok, how about the attached patch then? There's a PERS_DROP_ON_SUID mask
> that we drop upon setuid - all the other personality bits get inherited.
>
> Ingo
This is wrong on SELinux (and presumably with other LSMs). It also does
unexpected things if you fail to exec a setuid executable.
Here's a (completely untested, applies-with-some-offset-on-non-mm) version
that's less wrong.
Note the less. It will at least do funny things to your audit log on
SELinux -- I don't like it _that_ much.
--Andy
fs/binfmt_elf.c | 22 ++++++++++++++--------
fs/exec.c | 3 +++
include/asm-i386/elf.h | 3 ++-
include/linux/personality.h | 3 +++
4 files changed, 22 insertions(+), 9 deletions(-)
--- 2.6.7-rc1-mm1/fs/binfmt_elf.c~ingo 2004-05-28 09:02:06.000000000 -0700
+++ 2.6.7-rc1-mm1/fs/binfmt_elf.c 2004-06-04 08:13:33.192127641 -0700
@@ -487,7 +487,7 @@ static int load_elf_binary(struct linux_
struct exec interp_ex;
char passed_fileno[6];
struct files_struct *files;
- int executable_stack = EXSTACK_DEFAULT;
+ int executable_stack;
/* Get the exec-header */
elf_ex = *((struct elfhdr *) bprm->buf);
@@ -613,13 +613,19 @@ static int load_elf_binary(struct linux_
 	}
 	elf_ppnt = elf_phdata;
-	for (i = 0; i < elf_ex.e_phnum; i++, elf_ppnt++)
-		if (elf_ppnt->p_type == PT_GNU_STACK) {
-			if (elf_ppnt->p_flags & PF_X)
-				executable_stack = EXSTACK_ENABLE_X;
-			else
-				executable_stack = EXSTACK_DISABLE_X;
-		}
+	if (current->personality & ADDR_SPACE_EXECUTABLE)
+		executable_stack = EXSTACK_ENABLE_X;
+	else {
+		executable_stack = EXSTACK_DEFAULT;
+		for (i = 0; i < elf_ex.e_phnum; i++, elf_ppnt++)
+			if (elf_ppnt->p_type == PT_GNU_STACK) {
+				if (elf_ppnt->p_flags & PF_X)
+					executable_stack = EXSTACK_ENABLE_X;
+				else
+					executable_stack = EXSTACK_DISABLE_X;
+				break;
+			}
+	}
 	/* Some simple consistency checks for the interpreter */
 	if (elf_interpreter) {
--- 2.6.7-rc1-mm1/fs/exec.c~ingo 2004-06-04 08:12:26.347465895 -0700
+++ 2.6.7-rc1-mm1/fs/exec.c 2004-06-04 08:19:43.226203330 -0700
@@ -934,6 +934,9 @@ void compute_creds(struct linux_binprm *
unsafe = unsafe_exec(current);
security_bprm_apply_creds(bprm, unsafe);
task_unlock(current);
+
+ if (security_bprm_secureexec(bprm))
+ current->personality &= ~PERS_DROP_ON_SUID;
}
EXPORT_SYMBOL(compute_creds);
--- 2.6.7-rc1-mm1/include/linux/personality.h~ingo 2004-05-09 19:32:37.000000000 -0700
+++ 2.6.7-rc1-mm1/include/linux/personality.h 2004-06-04 08:13:33.166133605 -0700
@@ -30,6 +30,7 @@ extern int abi_fake_utsname;
*/
enum {
MMAP_PAGE_ZERO = 0x0100000,
+ ADDR_SPACE_EXECUTABLE = 0x0200000,
ADDR_LIMIT_32BIT = 0x0800000,
SHORT_INODE = 0x1000000,
WHOLE_SECONDS = 0x2000000,
@@ -37,6 +38,8 @@ enum {
ADDR_LIMIT_3GB = 0x8000000,
};
+#define PERS_DROP_ON_SUID (MMAP_PAGE_ZERO|ADDR_SPACE_EXECUTABLE)
+
/*
* Personality types.
*
--- 2.6.7-rc1-mm1/include/asm-i386/elf.h~ingo 2004-05-09 19:32:53.000000000 -0700
+++ 2.6.7-rc1-mm1/include/asm-i386/elf.h 2004-06-04 08:13:33.182129935 -0700
@@ -117,7 +117,8 @@ typedef struct user_fxsr_struct elf_fpxr
#define AT_SYSINFO_EHDR 33
#ifdef __KERNEL__
-#define SET_PERSONALITY(ex, ibcs2) set_personality((ibcs2)?PER_SVR4:PER_LINUX)
+/* child inherits the personality of the parent */
+#define SET_PERSONALITY(ex, ibcs2) do { } while (0)
extern int dump_task_regs (struct task_struct *, elf_gregset_t *);
extern int dump_task_fpu (struct task_struct *, elf_fpregset_t *);
On Fri, 4 Jun 2004, Andy Lutomirski wrote:
>
> This is wrong on SELinux (and presumably with other LSMs). It also does
> unexpected things if you fail to exec a setuid executable.
Let's not do this at all.
Anything that changes subtle behaviour at suid-execute time is just wrong.
Imagine an app that has been tested in normal use, and then has a subtle
bug when executed set-uid, simply because the address space layout
changes. Or something that mysteriously works when you're root, but not
when you're anything else. Ouch.
I think we should just look at the executable itself, not whether it is
suid. If the executable says it is "NX-approved", then it's NX-approved.
End of story - just try to make sure that as many executables as possible
get compiled with the newer compiler suite that enables it.
Add a tool to let people turn on/off the NX bit on an executable if it
turns out the executable can't work with it (let's say it was compiled and
tested on a CPU without NX support), and everybody should be happy. You
can have a trivial script that turns on the NX bit on all the legacy apps
too, and then if testing shows it wasn't a good idea, you can turn it off
again on a per-executable basis.
No?
Linus
On Fri, Jun 04, 2004 at 08:36:15AM -0700, Linus Torvalds wrote:
>
> Add a tool to let people turn on/off the NX bit on an executable if it
the prelink rpm on Fedora has such a tool already fwiw.
(it's part of prelink because the elf manipulations needed are quite similar
to the ones prelink does so infrastructure is shared)
On Fri, 4 Jun 2004, Arjan van de Ven wrote:
>
> the prelink rpm on Fedora has such a tool already fwiw.
> (it's part of prelink because the elf manipulations needed are quite similar
> to the ones prelink does so infrastructure is shared)
Just for fun, can somebody that has the required hardware just test old
apps with NX turned on?
I know we used to put the signal handler trampoline on the stack, but
these days that should all be handled with the magic executable syscall
page, so _normally_ I don't think an old application should even really
care.
In fact, it would be interesting to just hear somebody running an older
distribution with a new CPU and a new kernel, and see just how many
programs need to be marked non-NX in "normal running".
Linus
On Fri, Jun 04, 2004 at 08:47:11AM -0700, Linus Torvalds wrote:
>
>
> On Fri, 4 Jun 2004, Arjan van de Ven wrote:
> >
> > the prelink rpm on Fedora has such a tool already fwiw.
> > (it's part of prelink because the elf manipulations needed are quite similar
> > to the ones prelink does so infrastructure is shared)
>
> Just for fun, can somebody that has the required hardware just test old
> apps with NX turned on?
well anyone with an amd64 qualifies.. old apps work for me.
(fwiw FC1 and FC2 already run without the stack being executable if you use
the default distro kernel even on "traditional" x86 cpus, via the segment limit hack)
> In fact, it would be interesting to just hear somebody running an older
> distribution with a new CPU and a new kernel, and see just how many
> programs need to be marked non-NX in "normal running".
I know that in a FC1 full install there are less than 5 binaries that don't
run with NX. (one uses nested functions in C and passes function pointers to
the inner function around which causes gcc to emit a stack trampoline, and
gcc then marks the binary as non-NX, the others have asm in them that we
didn't fix in time to be properly marked).
On Fri, 4 Jun 2004, Arjan van de Ven wrote:
>
> I know that in a FC1 full install there are less than 5 binaries that don't
> run with NX. (one uses nested functions in C and passes function pointers to
> the inner function around which causes gcc to emit a stack trampoline, and
> gcc then marks the binary as non-NX, the others have asm in them that we
> didn't fix in time to be properly marked).
If things are really that good, why are we even worrying about this?
It sounds like we should just have NX on by default even for executables
that don't have any NX info records, and have some way of marking the
(very few) executables that don't want it. Maybe have the NX fault print a
warning when it happens for an executable that defaulted to NX on.
I think most people have seen the security disaster that causes most of
the emails on the net to be spam. So this should be _trivial_ to explain
to people when they complain about default behaviour breaking their
strange legacy app. Especially if there's a trivial tool to add an elf
section to make it work again.
So instead of having complex things to try to turn NX on for suid, we
should aim to turn it on as widely as possible, _even_if_ that means that
people who upgrade hardware might have to do some trivial MIS stuff.
Make a kernel bootup option to default to legacy mode if somebody
literally has trouble booting and fixing their thing due to "init" or
similar being one of the problematic cases. Together with a printk() that
says which executable triggered, it should be trivial to clean up a
system.
No?
Linus
* Linus Torvalds <[email protected]> wrote:
> I think we should just look at the executable itself, not whether it
> is suid. If the executable says it is "NX-approved", then it's
> NX-approved. End of story - just try to make sure that as many
> executables as possible get compiled with the newer compiler suite
> that enables it.
right now the 'x' bit in the PT_GNU_STACK ELF program header has the
narrow meaning of specifying the stack's executability. How should we
handle the brk area's executability? A good portion of recent attacks
came over heap overflows.
we could use the following 3 values:
PT_GNU_STACK not present: legacy app, stack and heap executable
PT_GNU_STACK present but !X: heap non-executable, stack executable
PT_GNU_STACK present and X: both heap and stack are executable.
this method is what is used in Fedora and it works pretty well.
(in fact Fedora also does VM-layout changes to get more brk/mmap space
on x86 and to put executable code close to each other - this too is
turned off if PT_GNU_STACK is not present.)
Ingo
On Fri, 4 Jun 2004 09:02:26 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:
>
>
> On Fri, 4 Jun 2004, Arjan van de Ven wrote:
> >
> > I know that in a FC1 full install there are less than 5 binaries that don't
> > run with NX. (one uses nested functions in C and passes function pointers to
> > the inner function around which causes gcc to emit a stack trampoline, and
> > gcc then marks the binary as non-NX, the others have asm in them that we
> > didn't fix in time to be properly marked).
>
> If things are really that good, why are we even worrying about this?
It's only that good because gcc handles it transparently.
Also more weird exe
> So instead of having complex things to try to turn NX on for suid, we
> should aim to turn it on as widely as possible, _even_if_ that means that
> people who upgrade hardware might have to do some trivial MIS stuff.
That is what is currently done on x86-64 in the major distributions
(SUSE,RedHat) at least.
Anything compiled with the new gcc is marked NX on unless
it has a trampoline. Old executables are run with NX off.
I would keep the default for old executables off, because
the applications which need the gcc trampolines are more
widespread than one first thinks (e.g. most Ada programs
compiled by GNAT and GNAT itself require this and we
got a few other reports of third party programs breaking)
But that's handled by the policy of only doing it for programs
compiled by the new gcc.
Of course that is only for the stack. Making the heap non executable
is another can of worms. I don't know if Fedora does that
too, SUSE and mainline x86-64 don't.
> Make a kernel bootup option to default to legacy mode if somebody
> literally has trouble booting and fixing their thing due to "init" or
> similar being one of the problematic cases. Together with a printk() that
> says which executable triggered, it should be trivial to clean up a
> system.
That exists too on x86-64:
/* noexec32=opt{,opt}
Control the no exec default for 32bit processes. Can be also overwritten
per executable using ELF header flags (e.g. needed for the X server)
Requires noexec=on or noexec=noforce to be effective.
Valid options:
all,on Heap,stack,data is non executable.
off (default) Heap,stack,data is executable
stack Stack is non executable, heap/data is.
force Don't imply PROT_EXEC for PROT_READ
compat (default) Imply PROT_EXEC for PROT_READ
*/
and the same for 64bit processes:
/* noexec=on|off
Control non executable mappings for 64bit processes.
on Enable
off Disable
noforce (default) Don't enable by default for heap/stack/data,
but allow PROT_EXEC to be effective
*/
BTW 64bit processes mostly have the same problem - there are some
that break since the first x86-64 distributions didn't use NX.
-Andi
On Fri, Jun 04, 2004 at 06:13:04PM +0200, Andi Kleen wrote:
> > So instead of having complex things to try to turn NX on for suid, we
> > should aim to turn it on as widely as possible, _even_if_ that means that
> > people who upgrade hardware might have to do some trivial MIS stuff.
>
> That is what is currently done on x86-64 in the major distributions
> (SUSE,Red Hat) at least.
yep.
> Of course that is only for the stack. Making the heap non executable
> is another can of worms. I don't know if Fedora does that
> too, SUSE and mainline x86-64 doesn't.
Fedora makes the heap non executable too; it only broke X which has its own
shared library loader (which btw had all the right mprotects in place, just
ifdef'd for freebsd, ia64 and a few other architectures that default to
non-executable heap, so we just added x86(-64) to that)
Greetings,
Arjan van de Ven
On Fri, Jun 04, 2004 at 06:37:54PM +0200, Arjan van de Ven wrote:
> Fedora makes the heap non executable too; it only broke X which has its own
> shared library loader (which btw had all the right mprotects in place, just
> ifdef'd for freebsd, ia64 and a few other architectures that default to
> non-executable heap, so we just added x86(-64) to that)
Maybe you should just call mprotect always to be safe? :) OTOH I guess
the world would end if a X release had less ifdefs than the previous one..
Linus Torvalds wrote:
> If things are really that good, why are we even worrying about this?
>
> It sounds like we should just have NX on by default even for executables
> that don't have any NX info records,
This is possible in one of the modes the FC kernel supports but not a
good default.
While most of the code we ship has no problems, 3rd party code is a
completely different story. Most of the time this code is not as
cleanly written as the (cleaned-up) code we ship. If anything, you can
announce your intention to change the default in a few years and urge
people to clean up their code. If you want the maximum protection now
go with Ingo's exec-shield patch and the /proc/sys/kernel/exec-shield
entry which can be set to 2 to enable the strict mode. That's certainly
the best solution for edge servers but not for application servers
running lots of dubious 3rd party code.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
correction to the table:
> PT_GNU_STACK not present: legacy app, stack and heap executable
> PT_GNU_STACK present but X: heap non-executable, stack executable
> PT_GNU_STACK present and !X: both heap and stack are non-executable.
>
> this method is what is used in Fedora and it works pretty well.
the patch below implements this simple and pretty robust logic on top of
the -AF NX patch.
in fact it's more conservative than what we have in Fedora because it
will turn on executability even for data mmap()s. (in theory there could
be third party apps that expect a data mmap to be executable on x86 even
if it's not PROT_EXEC.)
I've test-booted it on an athlon64 box running FC2 and have tested an
old PT_GNU_STACK-less binary and it indeed has all data mappings
executable, explicitly. (I've also test-booted it on an x86 box with an
older distribution installed - works as expected.)
newly-compiled applications that have the PT_GNU_STACK flag (either as X
or NX) will have the heap non-executable, and the stack executable
depending on the value of the flag.
hm?
Ingo
--- linux/fs/binfmt_elf.c
+++ linux/fs/binfmt_elf.c
@@ -491,6 +491,7 @@ static int load_elf_binary(struct linux_
char passed_fileno[6];
struct files_struct *files;
int executable_stack = EXSTACK_DEFAULT;
+ unsigned long def_flags = 0;
/* Get the exec-header */
elf_ex = *((struct elfhdr *) bprm->buf);
@@ -622,7 +623,10 @@ static int load_elf_binary(struct linux_
executable_stack = EXSTACK_ENABLE_X;
else
executable_stack = EXSTACK_DISABLE_X;
+ break;
}
+ if (i == elf_ex.e_phnum)
+ def_flags |= VM_EXEC | VM_MAYEXEC;
/* Some simple consistency checks for the interpreter */
if (elf_interpreter) {
@@ -690,6 +694,7 @@ static int load_elf_binary(struct linux_
current->mm->end_code = 0;
current->mm->mmap = NULL;
current->flags &= ~PF_FORKNOEXEC;
+ current->mm->def_flags = def_flags;
/* Do this immediately, since STACK_TOP as used in setup_arg_pages
may depend on the personality. */
--- linux/fs/exec.c
+++ linux/fs/exec.c
@@ -431,6 +431,7 @@ int setup_arg_pages(struct linux_binprm
mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
else
mpnt->vm_flags = VM_STACK_FLAGS;
+ mpnt->vm_flags |= mm->def_flags;
mpnt->vm_page_prot = protection_map[mpnt->vm_flags & 0x7];
insert_vm_struct(mm, mpnt);
mm->total_vm = (mpnt->vm_end - mpnt->vm_start) >> PAGE_SHIFT;
--- linux/include/asm-i386/page.h
+++ linux/include/asm-i386/page.h
@@ -138,7 +138,7 @@ static __inline__ int get_order(unsigned
#define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT)
-#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_EXEC | \
+#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | \
VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
#endif /* __KERNEL__ */
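The three-way policy from the corrected table (no PT_GNU_STACK: everything executable; PT_GNU_STACK with X: heap non-executable, stack executable; PT_GNU_STACK without X: both non-executable) can be written down as a small decision function. This is a userspace sketch only, not the kernel code path: struct sketch_phdr, struct exec_policy and decide_exec_policy() are hypothetical names, and the two constants mirror the PT_GNU_STACK and PF_X values from <elf.h>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PT_GNU_STACK 0x6474e551u /* same value as PT_GNU_STACK */
#define SKETCH_PF_X         0x1u        /* same value as PF_X */

/* Reduced program header: only the two fields the policy looks at. */
struct sketch_phdr {
	uint32_t p_type;
	uint32_t p_flags;
};

struct exec_policy {
	bool stack_exec;
	bool heap_exec;
};

/* Scan the program headers and apply the table from the mail above. */
struct exec_policy decide_exec_policy(const struct sketch_phdr *ph, size_t n)
{
	/* Legacy default: no PT_GNU_STACK means stack and heap executable. */
	struct exec_policy pol = { true, true };
	size_t i;

	for (i = 0; i < n; i++) {
		if (ph[i].p_type == SKETCH_PT_GNU_STACK) {
			/* Header present: heap is always non-executable,
			 * the stack follows the X flag. */
			pol.heap_exec = false;
			pol.stack_exec = (ph[i].p_flags & SKETCH_PF_X) != 0;
			break; /* first match wins, as with the added break */
		}
	}
	return pol;
}
```

The patch expresses the legacy case by setting VM_EXEC | VM_MAYEXEC in mm->def_flags when the loop runs off the end of the program headers; the sketch collapses that into the default value of the returned structure.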
* Ingo Molnar <[email protected]> wrote:
> the patch below implements this simple and pretty robust logic on top of
> the -AF NX patch.
this also means we finally reach the end of the road - in typical
applications strictly those mappings are executable that contain code:
saturn:~> cat /proc/self/maps | grep xp
00aca000-00adf000 r-xp 00000000 03:41 3434109 /lib/ld-2.3.3.so
00ae3000-00bf8000 r-xp 00000000 03:41 3434110 /lib/tls/libc-2.3.3.so
08048000-0804c000 r-xp 00000000 03:41 4431437 /bin/cat
Ingo
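The grep above can also be done programmatically, which is handy for checking many processes. A minimal sketch, assuming the maps line layout shown in the output above ("start-end perms offset dev inode path"); maps_line_is_executable() and count_executable_mappings() are hypothetical helper names:

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the maps line describes an executable mapping,
 * 0 if it does not, and -1 if the line cannot be parsed. */
int maps_line_is_executable(const char *line)
{
	unsigned long start, end;
	char perms[5];

	/* Format: "start-end perms ...", perms being four chars like "r-xp". */
	if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
		return -1;
	if (strlen(perms) != 4)
		return -1;
	return perms[2] == 'x';
}

/* Count executable mappings in an already opened maps file.
 * Lines longer than the buffer are read in pieces; the trailing
 * pieces normally fail to parse and are therefore not counted. */
int count_executable_mappings(FILE *maps)
{
	char line[512];
	int count = 0;

	while (fgets(line, sizeof line, maps))
		if (maps_line_is_executable(line) == 1)
			count++;
	return count;
}
```

Calling count_executable_mappings() on a stream opened with fopen("/proc/self/maps", "r") gives the number of mappings that carry the x permission for the calling process.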
>>>>> On Fri, 4 Jun 2004 17:40:32 +0100, Christoph Hellwig <[email protected]> said:
Christoph> On Fri, Jun 04, 2004 at 06:37:54PM +0200, Arjan van de
Christoph> Ven wrote:
>> Fedora makes the heap non executable too; it only broke X which
>> has its own shared library loader (which btw had all the right
>> mprotects in place, just ifdef'd for freebsd, ia64 and a few
>> other architectures that default to non-executable heap, so we
>> just added x86(-64) to that)
Christoph> Maybe you should just call mprotect always to be safe? :)
Christoph> OTOH I guess the world would end if a X release had less
Christoph> ifdefs than the previous one..
No kidding!
--david
this is what the maps look like on distributions that don't have
PT_GNU_STACK:
08048000-0804c000 r-xp 00000000 03:01 495264 /bin/cat
0804c000-0804d000 rwxp 00003000 03:01 495264 /bin/cat
0804d000-0806e000 rwxp 0804d000 00:00 0
40000000-40014000 r-xp 00000000 03:01 319552 /lib/ld-2.3.3.so
40014000-40015000 r--p 00014000 03:01 319552 /lib/ld-2.3.3.so
40015000-40016000 rwxp 00015000 03:01 319552 /lib/ld-2.3.3.so
40029000-4015d000 r-xp 00000000 03:01 176042 /lib/tls/libc-2.3.3.so
4015d000-4015f000 r--p 00134000 03:01 176042 /lib/tls/libc-2.3.3.so
4015f000-40161000 rwxp 00136000 03:01 176042 /lib/tls/libc-2.3.3.so
40161000-40164000 rwxp 40161000 00:00 0
40164000-40364000 r-xp 00000000 03:01 465837 /usr/lib/locale/locale-archive
40364000-4036a000 r-xp 00902000 03:01 465837 /usr/lib/locale/locale-archive
4036a000-40397000 r-xp 0090c000 03:01 465837 /usr/lib/locale/locale-archive
40397000-40398000 r-xp 00942000 03:01 465837 /usr/lib/locale/locale-archive
bfffe000-c0000000 rwxp bfffe000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
so we can turn on NX safely - all but the library/binary .data areas are
executable. (and if any code assumes executability there then it
deserves that SEGFAULT ...)
Ingo
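The listing above shows the data, heap and stack mappings as rwxp on a
distribution without PT_GNU_STACK. As a rough sketch (not part of the
patch), the following Python helper picks out mappings that are both
writable and executable from text in /proc/<pid>/maps format. The function
name writable_and_executable and the "[anonymous]" label are made up for
this illustration.

```python
def writable_and_executable(maps_text):
    """Return (address_range, perms, name) for every mapping that is w and x.

    maps_text is text in the /proc/<pid>/maps format shown above.
    """
    hits = []
    for line in maps_text.splitlines():
        # Fields: range, perms, offset, device, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        addr_range, perms = fields[0], fields[1]
        # Anonymous mappings (heap, stack, bss) have no path field.
        name = fields[5].strip() if len(fields) > 5 else "[anonymous]"
        if "w" in perms and "x" in perms:
            hits.append((addr_range, perms, name))
    return hits
```

On a live system it could be fed open("/proc/self/maps").read(); an empty
result means no mapping of the process is writable and executable at the
same time.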
On Fri, 4 Jun 2004 17:40:32 +0100
Christoph Hellwig <[email protected]> wrote:
> On Fri, Jun 04, 2004 at 06:37:54PM +0200, Arjan van de Ven wrote:
> > Fedora makes the heap non executable too; it only broke X which has its own
> > shared library loader (which btw had all the right mprotects in place, just
> > ifdef'd for freebsd, ia64 and a few other architectures that default to
> > non-executable heap, so we just added x86(-64) to that)
>
> Maybe you should just call mprotect always to be safe? :) OTOH I guess
> the world would end if a X release had less ifdefs than the previous one..
If you do that please also fix the X server to always use the /proc/bus/pci/*
access methods too instead of banging on 0xcf8 directly. The current
method is racing with the kernel and can actually cause hard disk corruption
on x86-64 machines that use the IOMMU (IOMMU flush uses pci config space
access too)
-Andi
On Fri, 4 Jun 2004, Arjan van de Ven wrote:
> I know that in a FC1 full install there are less than 5 binaries that don't
> run with NX. (one uses nested functions in C and passes function pointers to
> the inner function around which causes gcc to emit a stack trampoline, and
> gcc then marks the binary as non-NX, the others have asm in them that we
> didn't fix in time to be properly marked).
Can you tell if GCC uses trampolines for all uses of function pointers or
just ones that use nested functions?
Also, what is the fastest way to check if GCC is marking non-NX?
Gerhard
--
Gerhard Mack
[email protected]
<>< As a computer I find your faith in technology amusing.
On Fri, Jun 04, 2004 at 02:11:00PM -0400, Gerhard Mack wrote:
> On Fri, 4 Jun 2004, Arjan van de Ven wrote:
>
> > I know that in a FC1 full install there are less than 5 binaries that don't
> > run with NX. (one uses nested functions in C and passes function pointers to
> > the inner function around which causes gcc to emit a stack trampoline, and
> > gcc then marks the binary as non-NX, the others have asm in them that we
> > didn't fix in time to be properly marked).
>
> Can you tell if GCC uses trampolines for all uses of function pointers or
> just ones that use nested functions?
just for nested functions
>
> Also, what is the fastest way to check if GCC is marking non-NX?
readelf -l <binary>
should give output with a GNU_STACK line; if its flags say "RW" then it's
NX ok, if they say "RWE" then it's non-NX
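The same check can be done programmatically by looking for the
PT_GNU_STACK program header and testing its execute bit. The sketch below
is only an illustration in Python, assuming the standard ELF32/ELF64
header layouts; the function names gnu_stack_flags and stack_verdict are
invented here and are not part of any toolchain.

```python
import struct

PT_GNU_STACK = 0x6474E551  # program header type used by the GNU toolchain
PF_X = 0x1                 # execute bit in a program header's p_flags


def gnu_stack_flags(elf_bytes):
    """Return p_flags of the PT_GNU_STACK header, or None if there is none."""
    if elf_bytes[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = elf_bytes[4] == 2                # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if elf_bytes[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    if is_64:
        (e_phoff,) = struct.unpack_from(end + "Q", elf_bytes, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(end + "HH", elf_bytes, 0x36)
    else:
        (e_phoff,) = struct.unpack_from(end + "I", elf_bytes, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(end + "HH", elf_bytes, 0x2A)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from(end + "I", elf_bytes, off)
        if p_type == PT_GNU_STACK:
            # p_flags directly follows p_type in ELF64; in ELF32 it is at +24.
            flags_off = off + (4 if is_64 else 24)
            (p_flags,) = struct.unpack_from(end + "I", elf_bytes, flags_off)
            return p_flags
    return None


def stack_verdict(p_flags):
    """Map the flags to the wording used in this thread."""
    if p_flags is None:
        return "unmarked"
    return "non-NX" if p_flags & PF_X else "NX ok"
```

For example, stack_verdict(gnu_stack_flags(open("/bin/cat", "rb").read()))
reports how that binary was marked; "unmarked" corresponds to binaries
built by a toolchain that does not emit PT_GNU_STACK at all.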
On Wed, Jun 02, 2004 at 05:31:37PM -0400, Doug McNaught wrote:
> Arjan van de Ven <[email protected]> writes:
>
> > On Wed, Jun 02, 2004 at 02:13:13PM -0700, Linus Torvalds wrote:
> >>
> >>
> >> Just out of interest - how many legacy apps are broken by this? I assume
> >> it's a non-zero number, but wouldn't mind to be happily surprised.
> >
> > based on execshield in FC1.. about zero.
>
> IIRC, Lisp systems like CMUCL and SBCL (plus commercial Lisps) had
> problems with FC1 due to execshield. They tend to do things like
> compile code on the fly to heap memory and expect to be able to run
> it.
They will still work, as long as you don't recompile them with a recent
toolchain.
When you recompile them, they either need to be taught to DTRT (i.e.
use mmap with PROT_EXEC for executable stuff), or can be linked with
-Wl,-z,execstack to mark them as needing executable stack.
The prelink package also contains the execstack(8) utility, which can be
used on already linked binaries/shared libraries.
Jakub
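As an illustration of "use mmap with PROT_EXEC for executable stuff", here
is a minimal sketch using Python's mmap module on a Unix system: the
buffer that will hold generated code is requested with execute permission
explicitly, rather than relying on heap memory happening to be
executable. The helper name alloc_code_buffer is hypothetical, and
locked-down systems may refuse a mapping that is writable and executable
at once.

```python
import mmap


def alloc_code_buffer(size):
    """Return an anonymous private mapping that explicitly asks for PROT_EXEC.

    Raises OSError if the system's policy refuses such a mapping.
    """
    return mmap.mmap(
        -1,  # no backing file: anonymous memory
        size,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC,
    )
```

A runtime that compiles code on the fly would copy its generated
instructions into such a buffer, for example buf[:len(code)] = code,
before jumping to it.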
On Fri, Jun 04, 2004 at 06:13:04PM +0200, Andi Kleen wrote:
> Of course that is only for the stack. Making the heap non executable
> is another can of worms. I don't know if Fedora does that
> too, SUSE and mainline x86-64 don't.
When I added PT_GNU_STACK, it was meant from the beginning as
stack+heap+mmap w/o PROT_EXEC executability/non-executability.
I don't think it makes any sense to have separate bits for heap and stack.
Any program which assumes PROT_READ implies PROT_EXEC just can be marked
PT_GNU_STACK PF_X.
Jakub
On Tue, 8 Jun 2004 05:07:12 -0400
Jakub Jelinek <[email protected]> wrote:
> On Fri, Jun 04, 2004 at 06:13:04PM +0200, Andi Kleen wrote:
> > Of course that is only for the stack. Making the heap non executable
> > is another can of worms. I don't know if Fedora does that
> > too, SUSE and mainline x86-64 don't.
>
> When I added PT_GNU_STACK, it was meant from the beginning as
> stack+heap+mmap w/o PROT_EXEC executability/non-executability.
> I don't think it makes any sense to have separate bits for heap and stack.
> Any program which assumes PROT_READ implies PROT_EXEC just can be marked
> PT_GNU_STACK PF_X.
heap execution seems to be a lot more common than stack execution.
-Andi
> > When I added PT_GNU_STACK, it was meant from the beginning as
> > stack+heap+mmap w/o PROT_EXEC executability/non-executability.
> > I don't think it makes any sense to have separate bits for heap and stack.
> > Any program which assumes PROT_READ implies PROT_EXEC just can be marked
> > PT_GNU_STACK PF_X.
>
> heap execution seems to be a lot more common than stack execution.
yep, but because *BSD and ia64 and .. and .. already require the correct
mprotect/mmap flags for the heap, most code already gets it right.
(OK, X had broken ifdefs ;)
Ulrich Drepper wrote:
> Linus Torvalds wrote:
>
>
>>If things are really that good, why are we even worrying about this?
>>
>>It sounds like we should just have NX on by default even for executables
>>that don't have any NX info records,
>
>
> This is possible in one of the modes the FC kernel supports but not a
> good default.
>
> While most of the code we ship has no problems, 3rd party code is a
> completely different story. Most of the time this code is not as
> cleanly written as the (cleaned-up) code we ship. If anything, you can
> announce your intention to change the default in a few years and urge
> people to clean up their code. If you want the maximum protection now
> go with Ingo's exec-shield patch and the /proc/sys/kernel/exec-shield
> entry which can be set to 2 to enable the strict mode. That's certainly
> the best solution for edge servers but not for application servers
> running lots of dubious 3rd party code.
>
I have complained about breaking existing programs many times, but in
this case I think the default should be no-exec on all user-writable
data, and let the admin relax the security as needed. Or at least make a
really obnoxious warning the default, so people will know they have to
deal with what's coming in some (near) future release.
And this should be noted at the top of the ChangeLog for versions where
it changes, please! This will bite people, I just think it's needed to
keep Linux the most secure OS around. Yes, I know about OpenBSD...
--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me