The traditional intended behavior of Linux is that anonymous memory
has the EXECUTE permission turned on. The reason I say "intended"
behavior is that there appears to be an old bug in the kernel in this
respect. Specifically, the ELF data section is normally mapped with
READ+WRITE permission (no EXECUTE permission). The initial break
value starts at the end of the bss segment and if that value does not
fall on a page boundary (usually it doesn't), it means that the first
few bytes allocated with sbrk() will NOT be EXECUTABLE. Now, on x86
this doesn't matter, because READ permission implies EXECUTE permission,
but on ia64 and any other architecture that has a separate EXECUTE
permission, this means that programs cannot rely on memory returned by
malloc() being executable. There are obviously several ways to fix
this problem, but I'm wondering whether it's not time to just say NO
to turning on execute permission by default on anonymous memory.
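
To make the page-boundary arithmetic concrete, here is a minimal sketch
in C. The helper name is hypothetical and the addresses in the usage
note are made up; it only computes how many bytes above the break still
sit on the page shared with the end of the data/bss segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of bytes between the break and the next page boundary.
 * Memory handed out by sbrk() in this range lives on the page that also
 * holds the end of the data/bss segment, so it inherits that page's
 * permissions instead of getting a fresh anonymous mapping.
 */
size_t bytes_sharing_data_page(uintptr_t brk_addr, size_t page_size)
{
    /* page_size must be a power of two for the mask below to be valid. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    size_t offset = (size_t)(brk_addr & (page_size - 1));
    return offset == 0 ? 0 : page_size - offset;
}
```

In a real program one would pass (uintptr_t)sbrk(0) and
sysconf(_SC_PAGESIZE). With a break of 0x601234 and 4 KB pages, for
instance, the first 3532 bytes returned by sbrk() would share the data
page's READ+WRITE mapping.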
I discussed this briefly with Linus and his comments are reproduced
below. While I see Linus' points, I do think that turning off EXECUTE
permission on anonymous memory would improve the security in practice, if not
in theory. But I'd be interested in other people's opinion. Also, as
a practical matter, we currently have special hacks in the ia64 page
fault handler that are needed to work around performance problems that
arise from the fact that we map anonymous memory with EXECUTE rights
by default. Those hacks avoid having to flush the cache for memory
that's mapped executable but never really executed. So clearly there
are technical advantages to not turning on EXECUTE permission, even if
we ignore the security argument.
What I'm wondering: how do others feel about this issue? Since x86
won't be affected by this, I'm especially interested in the opinion of
the maintainers of non-x86 platforms.
It seems to me that for portability reasons, dynamic code generators
should always do an mmap() call to ensure that the generated code is
executable. If we can agree on this as the recommended practice, then
I don't see much of a problem with not turning on the EXECUTE right by
default.
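
As a rough sketch of that recommended practice, assuming a POSIX-style
mmap() with anonymous mappings: the helper names alloc_code_buffer() and
free_code_buffer() are hypothetical, and a real code generator may also
need an instruction-cache flush (e.g. GCC's __builtin___clear_cache)
before jumping into the buffer on some architectures.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Some older systems spell the flag MAP_ANON. */
#ifndef MAP_ANONYMOUS
# define MAP_ANONYMOUS MAP_ANON
#endif

/*
 * Hypothetical helper: returns a private anonymous mapping of at least
 * len bytes that is readable, writable and executable, or NULL on failure.
 */
void *alloc_code_buffer(size_t len)
{
    assert(len > 0);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Releases a buffer obtained from alloc_code_buffer(); 0 on success. */
int free_code_buffer(void *p, size_t len)
{
    return munmap(p, len);
}
```

Because the EXECUTE right is requested explicitly, this does not depend
on what the platform chooses as the default for brk()/malloc() memory.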
Opinions?
--david
--------------
Comments by Linus:
I would say that the BSS has to be mapped the same way brk() maps things.
They _are_ the same thing, after all - I consider "brk()" to be a system
call that just dynamically changes the BSS limits.
So I would say that we have a few options:
- just explicitly make bss/brk() be non-executable, and tell Compaq that
if they want to do dynamic code generation they should use an anonymous
mapping with PROT_EXEC.
- make both of them always be executable, and say "this is how x86 does
it, security issues don't help", x86 is the Borg, and you _have_ been
assimilated.
- add a per-process flag (that gets copied at fork() and stays alive over
exec()) that allows the system to decide between the two above
dynamically on a process-per-process basis. We could default to the
stricter thing, and people who aren't happy would just make a wrapper
executable (no setuid needed) that sets the flag and executes whatever
process it wants to run that needs to execute BSS.
Quite frankly, my personal preference is "We are the borg of x86" choice,
especially on ia64. The security issue with stack smashing etc is a
complete non-issue: if the program allows a buffer overrun it is insecure
whether EXEC is set or not.
But I suspect you should talk this over on ia64 lists and possibly people
like Alan &co. Feel free to quote this email.
Linus
From: David Mosberger <[email protected]>
Date: Mon, 7 Jan 2002 16:25:10 -0800
> Also, as a practical matter, we currently have special hacks in the
> ia64 page fault handler that are needed to work around performance
> problems that arise from the fact that we map anonymous memory with
> EXECUTE rights by default. Those hacks avoid having to flush the
> cache for memory that's mapped executable but never really
> executed. So clearly there are technical advantages to not turning
> on EXECUTE permission, even if we ignore the security argument.
I assume this hack is "have a software EXECUTE bit, initially only
set the software one, when we take a fault on execute set the hardware
bit and maybe flush the Icache". If so, what is the big deal? :-)
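
The scheme described above can be modelled in user space roughly as
follows. This is only a toy model: the struct, its fields and the
function names are hypothetical and do not correspond to actual kernel
code; the counter stands in for the real i-cache flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one page-table entry; field names are hypothetical. */
struct toy_pte {
    bool sw_exec;      /* mapping allows execution (software-only bit) */
    bool hw_exec;      /* execute right actually visible to the MMU */
    unsigned flushes;  /* how many i-cache flushes this page has cost */
};

/* Map a page that is nominally executable without paying for a flush. */
struct toy_pte toy_map_anonymous(bool executable)
{
    struct toy_pte pte = { .sw_exec = executable, .hw_exec = false, .flushes = 0 };
    return pte;
}

/*
 * Called when the CPU faults on an instruction fetch.
 * Returns true if the fault was fixed up, false if it should be SIGSEGV.
 */
bool toy_exec_fault(struct toy_pte *pte)
{
    assert(pte != NULL);

    if (!pte->sw_exec)
        return false;         /* really not executable */
    if (!pte->hw_exec) {
        pte->flushes++;       /* stand-in for the real i-cache flush */
        pte->hw_exec = true;  /* later fetches no longer fault */
    }
    return true;
}
```

Pages that are mapped executable but never executed cost no flush at
all; pages that are executed pay one extra fault and one flush.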
> What I'm wondering: how do others feel about this issue? Since x86
> won't be affected by this, I'm especially interested in the opinion of
> the maintainers of non-x86 platforms.
>
> It seems to me that for portability reasons, dynamic code generators
> should always do an mmap() call to ensure that the generated code is
> executable. If we can agree on this as the recommended practice, then
> I don't see much of a problem with not turning on the EXECUTE right by
> default.
>
> Opinions?
I think changing this behavior is going to silently break things on
many architectures. Secondly, I do not see any real gain from any
of this and my ports are those that have I-cache coherency issues :-)
Franks a lot,
David S. Miller
[email protected]
> Opinions?
>
> Quite frankly, my personal preference is "We are the borg of x86" choice,
> especially on ia64. The security issue with stack smashing etc is a
> complete non-issue: if the program allows a buffer overrun it is insecure
> whether EXEC is set or not.
I semi agree with Linus' comment. However it is a lot easier to make attacks
_hard_ especially on a 64-bit box by having non executable areas. My
personal feeling is that for an existing production world port like Alpha
you fix the sbrk bug so you always get executable memory. For the IA64
it's a new platform and you either say "No it isn't executable" or let ld.so
and malloc do the remapping based on environment variable settings.
We are borg of x86 is true for the near future, but codifying an x86ism for
all ports for ever seems unwise.
For IA32 on IA64 binaries you would however need to keep the executable
data behaviour.
Alan
>>>>> On Mon, 07 Jan 2002 22:02:08 -0800 (PST), "David S. Miller" <[email protected]> said:
DaveM> I assume this hack is "have a software EXECUTE bit, initially
DaveM> only set the software one, when we take a fault on execute
DaveM> set the hardware bit and maybe flush the Icache". If so,
DaveM> what is the big deal? :-)
Yes. Hey, don't get me wrong: I'm *proud* of that solution, but if
the alternative is to completely get rid of the problem in the first
place, that is always preferable (simplicity rules).
DaveM> I think changing this behavior is going to silently break
DaveM> things on many architectures.
I don't consider SIGSEGV to be a silent failure. Also, I think
all the evidence is that it's unlikely to break many existing
apps:
o The bug I described has been present for *years* on
Alpha and probably all other platforms other than x86;
even on ia64 it took almost two years before someone
noticed. It's possible that nobody noticed because
the code generators were part of a larger program,
but it's very likely that anyone writing a test program
would have allocated the non-executable memory, so you'd
expect *someone* to have run into it at some point.
o Certain libraries such as the Boehm Garbage Collector
already turn off execute permission by default. While
there may not be that many apps that use it in a production
environment, it is my impression that many developers are
using it as a memory-leak detector (e.g., Mozilla does that).
DaveM> Secondly, I do not see any
DaveM> real gain from any of this and my ports are those that have
DaveM> I-cache coherency issues :-)
I think that's fine. If the consensus is that apps *should* use
mprotect() to get executable permission (Linus implied as much) and
it's an architecture specific choice as to whether this is enforced,
I'm happy. My belief is that we could make this change on ia64
without undue burden on programmers. If not, I'm sure I'll find out
about it and I'm willing to take the responsibility.
--david
>>>>> On Tue, 8 Jan 2002 13:23:15 +0000 (GMT), Alan Cox <[email protected]> said:
Alan> We are borg of x86 is true for the near future, but codifying
Alan> an x86ism for all ports for ever seems unwise.
Glad to hear that.
Alan> For IA32 on IA64 binaries you would however need to keep the
Alan> executable data behaviour.
Yes. I don't recall off hand whether the x86 emulation hardware (aka
"IVE") automatically takes care of that. I'll work on prototyping
this.
Thanks,
--david
David Mosberger writes:
> I think that's fine. If the consensus is that apps *should* use
> mprotect() to get executable permission (Linus implied as much) and
> it's an architecture specific choice as to whether this is enforced,
> I'm happy. My belief is that we could make this change on ia64
> without undue burden on programmers. If not, I'm sure I'll find out
> about it and I'm willing to take the responsibility.
If you turn off executable permission right now, you can add it
back at some future date.
If you leave the executable permission, we're stuck with it as
the ABI becomes set in stone.
So turn it off ASAP.
Followup to: <[email protected]>
By author: David Mosberger <[email protected]>
In newsgroup: linux.dev.kernel
>
> I don't consider SIGSEGV to be a silent failure. Also, I think
> all the evidence is that it's unlikely to break many existing
> apps:
>
> o The bug I described has been present for *years* on
> Alpha and probably all other platforms other than x86;
> even on ia64 it took almost two years before someone
> noticed. It's possible that nobody noticed because
> the code generators were part of a larger program,
> but it's very likely that anyone writing a test program
> would have allocated the non-executable memory, so you'd
> expect *someone* to have run into it at some point.
>
> o Certain libraries such as the Boehm Garbage Collector
> already turn off execute permission by default. While
> there may not be that many apps that use it in a production
> environment, it is my impression that many developers are
> using it as a memory-leak detector (e.g., Mozilla does that).
>
>
> DaveM> Secondly, I do not see any
> DaveM> real gain from any of this and my ports are those that have
> DaveM> I-cache coherency issues :-)
>
> I think that's fine. If the consensus is that apps *should* use
> mprotect() to get executable permission (Linus implied as much) and
> it's an architecture specific choice as to whether this is enforced,
> I'm happy. My belief is that we could make this change on ia64
> without undue burden on programmers. If not, I'm sure I'll find out
> about it and I'm willing to take the responsibility.
>
One way to do this would be to create a newbrk() syscall which takes a
permission argument (for new pages.)
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>
On 8 Jan 2002, H. Peter Anvin wrote:
> One way to do this would be to create a newbrk() syscall which takes a
> permission argument (for new pages.)
ITYM mmap(2)
Rik
--
"Linux holds advantages over the single-vendor commercial OS"
-- Microsoft's "Competing with Linux" document
http://www.surriel.com/ http://distro.conectiva.com/
Rik van Riel wrote:
> On 8 Jan 2002, H. Peter Anvin wrote:
>
>
>>One way to do this would be to create a newbrk() syscall which takes a
>>permission argument (for new pages.)
>>
>
> ITYM mmap(2)
>
That's an idea, too. WTF do we actually need brk() for? If it's only
there to be annoying, let's get rid of it completely and let the C
library implement it -- stating its assumptions explicitly.
-hpa
> One way to do this would be to create a newbrk() syscall which takes a
> permission argument (for new pages.)
brk(), mmap().
Welcome to libc 8)
On Wed Jan 09, 2002 at 03:11:22AM +0000, Alan Cox wrote:
> > One way to do this would be to create a newbrk() syscall which takes a
> > permission argument (for new pages.)
>
> brk(), mmap().
>
> Welcome to libc 8)
Umm. How can libc implement mmap without the kernel
handing out the pages? I don't get it.
-Erik
--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--
On Tuesday 08 January 2002 09:52 pm, H. Peter Anvin wrote:
> Rik van Riel wrote:
> > On 8 Jan 2002, H. Peter Anvin wrote:
> >>One way to do this would be to create a newbrk() syscall which takes a
> >>permission argument (for new pages.)
> >
> > ITYM mmap(2)
>
> That's an idea, too. WTF do we actually need brk() for? If it's only
> there to be annoying, let's get rid of it completely and let the C
> library implement it -- stating its assumptions explicitly.
>
> -hpa
There was a fun little panel a few months back at Atlanta Linux Showcase
(which was inexplicably held in California this year, but I'm told they're
patching that in the next release... :)
Apparently, for mallocs below a certain size, glibc uses brk, and above a
certain size, it uses mmap. And the mmap variant is something like 10 times
slower than the brk variant, because of all the soft page faults, fiddling
with page tables, associated cache trashing, etc. (The guy found it trying
to figure out why his app was SLOWER on Linux than on Irix, and eventually
found a glibc variable that could make linux faster: raising glibc's
threshold where it does brk for malloc to infinity.)
Glibc does mmap instead of brk because theoretically brk can leave wasted
memory between fragments, although apparently nobody's ever seen more than
10% waste in a live program, and the speed penalty of taking a soft page
fault at access time to muck about with the page tables is a LOT bigger than
10%...
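
The message does not name the glibc variable involved. As a sketch,
assuming the relevant knobs are glibc's mallopt() parameters
M_MMAP_THRESHOLD and M_MMAP_MAX, one way to keep malloc from issuing its
own mmap() for large requests is to set M_MMAP_MAX to 0. The function
names below are hypothetical, and this is glibc-specific.

```c
#include <assert.h>
#include <malloc.h>   /* glibc-specific: mallopt(), M_MMAP_MAX */
#include <stdlib.h>

/*
 * Ask glibc's malloc not to satisfy individual requests with mmap(),
 * so that allocations come from the brk heap where possible.
 * Returns 1 on success and 0 on error, mirroring mallopt().
 */
int malloc_prefer_brk(void)
{
    return mallopt(M_MMAP_MAX, 0);
}

/* Example use: apply the setting, then make one large allocation. */
void *large_alloc_from_heap(size_t n)
{
    int ok = malloc_prefer_brk();
    assert(ok == 1);
    (void)ok;             /* keep NDEBUG builds warning-free */
    return malloc(n);
}
```

Whether this is a win depends on the workload; it trades the page-fault
cost described above against the fragmentation that mmap() was meant to
avoid.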
Rob
Rob Landley <[email protected]> writes:
> Glibc does mmap instead of brk because theoretically brk can leave wasted
> memory between fragments, although apparently nobody's ever seen more than
> 10% waste in a live program, and the speed penalty of taking a soft page
> fault at access time to muck about with the page tables is a LOT bigger than
> 10%...
The other reason glibc uses mmap() is because your shared libraries
are (usually) mapped smack dab in the middle of your address space.
brk() assumes a contiguous heap, so when it hits your libraries, it
has to stop, even if there is a gig of VM above the libs. mmap() can
give you an arbitrary chunk of the address space, so glibc uses it for
'large' allocations.
-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863
David Mosberger writes:
> The traditional intended behavior of Linux is that anonymous memory
> has the EXECUTE permission turned on. The reason I say "intended"
> behavior is that there appears to be an old bug in the kernel in this
> respect. Specifically, the ELF data section is normally mapped with
> READ+WRITE permission (no EXECUTE permission). The initial break
The permissions come from the ELF program header, so if your data
section is mapped without execute permission, then I would see that as
a binutils issue rather than a kernel issue.
> I discussed this briefly with Linus and his comments are reproduced
> below. While I see Linus' points, I do think that turning off EXECUTE
> permission on anonymous would improve the security in practice, if not
> in theory. But I'd be interested in other people's opinion. Also, as
> a practical matter, we currently have special hacks in the ia64 page
> fault handler that are needed to work around performance problems that
> arise from the fact that we map anonymous memory with EXECUTE rights
> by default. Those hacks avoid having to flush the cache for memory
> that's mapped executable but never really executed. So clearly there
> are technical advantages to not turning on EXECUTE permission, even if
> we ignore the security argument.
We have something of a similar issue on PPC with the need to flush the
cache. I now have a new version of the cache-flush avoidance changes
for PPC which does things a little differently to the old version. I
now use the PG_arch_1 bit to indicate that a page is icache-clean only
for page cache pages (including swap cache pages). They are flushed
in flush_icache_page, if dirty, regardless of whether the page has
execute permission or not.
Anonymous pages are flushed unconditionally in copy_user_page but not
in clear_user_page since 0 is an illegal instruction. (Not flushing
in clear_user_page actually saves us an awful lot of kernel time.)
If a user program jumps to a part of an anonymous memory region that
it has never written to, I don't think it has a right to expect any
particular behaviour, such as getting an illegal instruction signal
at the address it jumped to (which is what would happen if we
flushed).
> What I'm wondering: how do others feel about this issue? Since x86
> won't be affected by this, I'm especially interested in the opinion of
> the maintainers of non-x86 platforms.
I think that if you have per-page execute permission, you should mark
dirty pages as non-executable and flush if the user process tries to
execute from them - which sounds like what you are doing already.
With that there is no performance advantage to having anonymous memory
being non-executable.
BTW, where do you put your sigreturn trampoline? On PPC we put it on
the stack, as on ia32. If you do too, and you make the stack
non-executable then clearly you will need to find somewhere else for
it.
As to whether it is better from a security point of view to make
anonymous memory non-executable, it probably is. I guess you have the
opportunity on ia64 to do that since there aren't a lot of ia64
machines around yet. If you want to do that, now is the time to do it
and find whatever bugs there are in glibc relating to that.
Whatever you decide won't have much impact on PPC since very few
PowerPC chips support per-page execute permission.
Paul.
>>>>> On Thu, 10 Jan 2002 12:04:22 +1100 (EST), Paul Mackerras <[email protected]> said:
Paul> David Mosberger writes:
>> The traditional intended behavior of Linux is that anonymous
>> memory has the EXECUTE permission turned on. The reason I say
>> "intended" behavior is that there appears to be an old bug in the
>> kernel in this respect. Specifically, the ELF data section is
>> normally mapped with READ+WRITE permission (no EXECUTE
>> permission). The initial break
Paul> The permissions come from the ELF program header, so if your
Paul> data section is mapped without execute permission, then I
Paul> would see that as a binutils issue rather than a kernel issue.
Yes, that's one (among many other possible) solution.
>> What I'm wondering: how do others feel about this issue? Since
>> x86 won't be affected by this, I'm especially interested in the
>> opinion of the maintainers of non-x86 platforms.
Paul> I think that if you have per-page execute permission, you
Paul> should mark dirty pages as non-executable and flush if the
Paul> user process tries to execute from them - which sounds like
Paul> what you are doing already. With that there is no performance
Paul> advantage to having anonymous memory being non-executable.
That's what we do on ia64 also. There is a performance penalty though
for programs that *do* generate code dynamically (in the form of
additional page faults). I don't think it's a huge issue, but like I
said earlier: if there is a simple solution that gets rid of the
problem entirely, I'd prefer that.
Paul> BTW, where do you put your sigreturn trampoline? On PPC we
Paul> put it on the stack, as on ia32. If you do too, and you make
Paul> the stack non-executable then clearly you will need to find
Paul> somewhere else for it.
On ia64, we simply map the trampoline in the kernel's gate page. This
is a special page that can be executed by user level, but not read or
written (in the future, we may use this page for system calls, too).
No dynamic code generation needed for this (indeed, we even share the
TLB entry across all processes ;-).
Wouldn't it be better to use the sa_restorer approach on PPC like x86
and Alpha (and probably others) do? That would avoid dynamic code
generation.
Paul> As to whether it is better from a security point of view to
Paul> make anonymous memory non-executable, it probably is. I guess
Paul> you have the opportunity on ia64 to do that since there aren't
Paul> a lot of ia64 machines around yet. If you want to do that,
Paul> now is the time to do it and find whatever bugs there are in
Paul> glibc relating to that.
Yes, my thinking exactly.
Paul> Whatever you decide won't have much impact on PPC since very
Paul> few PowerPC chips support per-page execute permission.
Right. I hope it's fair to say that the principle could/should be:
- applications must use mprotect() to ensure malloc'd memory
is executable
- Linux platforms *may* choose to enforce this by making
sbrk() memory not EXECUTABLE by default
This way, we can turn off execute permission on ia64 and the other
platforms have the choice whether to follow suit or to leave things
as they are.
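
A minimal sketch of the first point, assuming a POSIX mprotect(): the
helper names are hypothetical. Since mprotect() operates on whole pages,
the range is widened to page boundaries, which means neighbouring heap
data on the same pages becomes executable too.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: make [p, p + len) executable while keeping it
 * readable and writable. Returns 0 on success, -1 on failure (errno is
 * set by mprotect()).
 */
int make_executable(void *p, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && len > 0);

    uintptr_t mask  = (uintptr_t)page - 1;
    uintptr_t start = (uintptr_t)p & ~mask;                 /* round down */
    uintptr_t end   = ((uintptr_t)p + len + mask) & ~mask;  /* round up */

    return mprotect((void *)start, (size_t)(end - start),
                    PROT_READ | PROT_WRITE | PROT_EXEC);
}

/* malloc() a buffer and request execute permission for it explicitly. */
void *malloc_executable(size_t len)
{
    void *p = malloc(len);
    if (p != NULL && make_executable(p, len) != 0) {
        free(p);
        return NULL;
    }
    return p;
}
```

Note that the pages stay executable after free(); a careful program
would rather use a dedicated mmap() region for generated code.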
Thanks,
--david
How about the attached patch? It gives platform-dependent code the
option to turn off execute permission on data pages by defining a
suitable value for DATA_PAGE_DEFAULT_RIGHTS in asm/page.h. If a
platform doesn't define this macro, the old behavior applies (data
pages continue to be executable by default). For IA-64, the macro is
defined such that data pages will be executable by default only for
x86 processes (unlike real x86 CPUs, the x86 hardware emulator inside
Itanium does check the execute bit, so this is really needed).
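
The effect on the flag values can be checked with a small user-space
model. This is only an illustration: the VM_* values are the ones from
include/linux/mm.h, but the function names and the is_ia32_process
parameter (standing in for the current->personality test) are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in include/linux/mm.h. */
#define VM_READ  0x00000001UL
#define VM_WRITE 0x00000002UL
#define VM_EXEC  0x00000004UL

/* Default used when a platform does not override the macro. */
unsigned long default_data_rights(void)
{
    return VM_READ | VM_WRITE | VM_EXEC;
}

/* ia64-style policy: only ia32 (x86) processes get executable data. */
unsigned long ia64_data_rights(bool is_ia32_process)
{
    return VM_READ | VM_WRITE | (is_ia32_process ? VM_EXEC : 0);
}

/* Mirrors the patched VM_STACK_FLAGS definition. */
unsigned long stack_flags(unsigned long data_rights)
{
    /* Only the three access bits may come from the platform macro. */
    assert((data_rights & ~(VM_READ | VM_WRITE | VM_EXEC)) == 0);
    return 0x00000170UL | data_rights;
}
```

With the default rights, stack_flags() yields 0x177, the old hard-coded
VM_STACK_FLAGS value, so platforms that do not define the macro see no
change; a native ia64 process gets 0x173 instead.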
I have booted an ia64 machine with this patch applied without any
problems and also tested an x86 program that does dynamic code
generation so the basics appear to be right.
Oh, I dropped the call to calc_vm_flags() in do_brk(). I didn't see
the point of it. Perhaps I missed something, though.
If it looks OK to you, would you mind applying this for 2.5? (The
patch is relative to 2.5.0, but it's trivial enough that this won't be
a problem, hopefully).
--david
PS: Note that this patch does not solve the original bug I reported
for platforms that do have an EXECUTABLE permission bit. If you
don't want to risk breaking backwards compatibility, your best bet
is probably to follow Paul's suggestion and modify binutils so the
data section gets mapped with RWX rights (won't help with existing
binaries, of course).
--- linux-2.5.0/mm/mmap.c Mon Nov 5 18:29:05 2001
+++ lia64-kdb/mm/mmap.c Thu Jan 10 18:01:39 2002
@@ -1046,10 +1052,7 @@
 	if (!vm_enough_memory(len >> PAGE_SHIFT))
 		return -ENOMEM;
 
-	flags = calc_vm_flags(PROT_READ|PROT_WRITE|PROT_EXEC,
-				MAP_FIXED|MAP_PRIVATE) | mm->def_flags;
-
-	flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
+	flags = DATA_PAGE_DEFAULT_RIGHTS | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
 
 	/* Can we just expand an old anonymous mapping? */
 	if (rb_parent && vma_merge(mm, prev, rb_parent, addr, addr + len, flags))
--- linux-2.5.0/include/linux/mm.h Mon Nov 26 21:29:07 2001
+++ lia64-kdb/include/linux/mm.h Thu Jan 10 21:05:41 2002
@@ -103,7 +103,14 @@
 #define VM_DONTEXPAND 0x00040000 /* Cannot expand with mremap() */
 #define VM_RESERVED 0x00080000 /* Don't unmap it from swap_out */
 
-#define VM_STACK_FLAGS 0x00000177
+#ifndef DATA_PAGE_DEFAULT_RIGHTS
+ /* Historically, Linux mapped data with execute rights, but some
+ platforms (e.g., ia64) use non-executable data by default. Those
+ platforms define their own value for this macro. */
+# define DATA_PAGE_DEFAULT_RIGHTS (VM_READ|VM_WRITE|VM_EXEC)
+#endif
+
+#define VM_STACK_FLAGS (0x00000170 | DATA_PAGE_DEFAULT_RIGHTS)
 
 #define VM_READHINTMASK (VM_SEQ_READ | VM_RAND_READ)
 #define VM_ClearReadHint(v) (v)->vm_flags &= ~VM_READHINTMASK
--- linux-2.5.0/include/asm-ia64/page.h Mon Nov 26 11:19:18 2001
+++ lia64-kdb/include/asm-ia64/page.h Thu Jan 10 18:48:34 2002
@@ -148,6 +148,13 @@
 # define __pgprot(x) (x)
 #endif /* !STRICT_MM_TYPECHECKS */
 
-#define PAGE_OFFSET 0xe000000000000000
+#define PAGE_OFFSET 0xe000000000000000
+
+#ifdef CONFIG_IA32_SUPPORT
+# define DATA_PAGE_DEFAULT_RIGHTS (VM_READ|VM_WRITE | \
+ ((current->personality == PER_LINUX32) ? VM_EXEC : 0))
+#else
+# define DATA_PAGE_DEFAULT_RIGHTS (VM_READ|VM_WRITE)
+#endif
 
 #endif /* _ASM_IA64_PAGE_H */