With some programs the 2.6 kernel can end up allocating memory
at address zero, for a non-MAP_FIXED mmap call! This causes
problems with some programs and is generally rude to do. This
simple patch fixes the problem in my tests.
Make sure that we don't allocate memory all the way down to zero,
so the NULL pointer never gets covered up with anonymous memory
and we don't end up violating the C standard.
Signed-off-by: Rik van Riel <[email protected]>
--- linux-2.6.9/mm/mmap.c.nullmmap 2005-01-25 18:00:26.000000000 -0500
+++ linux-2.6.9/mm/mmap.c 2005-01-26 08:48:03.438701673 -0500
@@ -1114,6 +1114,8 @@ void arch_unmap_area(struct vm_area_stru
area->vm_mm->free_area_cache = area->vm_start;
}
+#define SHLIB_BASE 0x00111000
+
/*
* This mmap-allocator allocates new areas top-down from below the
* stack's low limit (the base):
@@ -1162,6 +1164,13 @@ try_again:
return addr;
/*
+ * Make sure we don't allocate all the way down to
+ * zero, which would break NULL pointer detection.
+ */
+ if (addr < SHLIB_BASE)
+ goto fail;
+
+ /*
* new region fits between prev_vma->vm_end and
* vma->vm_start, use it:
*/
@@ -1258,8 +1267,6 @@ get_unmapped_area_prot(struct file *file
EXPORT_SYMBOL(get_unmapped_area_prot);
-#define SHLIB_BASE 0x00111000
-
unsigned long arch_get_unmapped_exec_area(struct file *filp, unsigned long
addr0,
unsigned long len0, unsigned long pgoff, unsigned long flags)
{
On Wed, 26 Jan 2005, Rik van Riel wrote:
> With some programs the 2.6 kernel can end up allocating memory
> at address zero, for a non-MAP_FIXED mmap call! This causes
> problems with some programs and is generally rude to do. This
> simple patch fixes the problem in my tests.
Does this mean that we can't mmap the screen regen buffer at
0x000b8000 anymore?
How do I look at the real-mode interrupt table starting at
offset 0? You know that the return value of mmap is to be
checked for MAP_FAILED, not for NULL, don't you?
What 'C' standard do you refer to? Seg-faults on null pointers
have nothing to do with the 'C' standard and everything to
do with the platform.
I don't think you should apply this patch. It puts "your"
policy in the kernel. Now, your policy might be "good" for
you, but the kernel is not supposed to have such policy
inside.
>
> Make sure that we don't allocate memory all the way down to zero,
> so the NULL pointer never gets covered up with anonymous memory
> and we don't end up violating the C standard.
>
> Signed-off-by: Rik van Riel <[email protected]>
>
> --- linux-2.6.9/mm/mmap.c.nullmmap 2005-01-25 18:00:26.000000000 -0500
> +++ linux-2.6.9/mm/mmap.c 2005-01-26 08:48:03.438701673 -0500
> @@ -1114,6 +1114,8 @@ void arch_unmap_area(struct vm_area_stru
> area->vm_mm->free_area_cache = area->vm_start;
> }
>
> +#define SHLIB_BASE 0x00111000
> +
> /*
> * This mmap-allocator allocates new areas top-down from below the
> * stack's low limit (the base):
> @@ -1162,6 +1164,13 @@ try_again:
> return addr;
>
> /*
> + * Make sure we don't allocate all the way down to
> + * zero, which would break NULL pointer detection.
> + */
> + if (addr < SHLIB_BASE)
> + goto fail;
> +
> + /*
> * new region fits between prev_vma->vm_end and
> * vma->vm_start, use it:
> */
> @@ -1258,8 +1267,6 @@ get_unmapped_area_prot(struct file *file
> EXPORT_SYMBOL(get_unmapped_area_prot);
>
>
> -#define SHLIB_BASE 0x00111000
> -
> unsigned long arch_get_unmapped_exec_area(struct file *filp, unsigned long
> addr0,
> unsigned long len0, unsigned long pgoff, unsigned long flags)
> {
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
Cheers,
Dick Johnson
Penguin : Linux version 2.6.10 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.
On Wed, 26 Jan 2005, Olivier Galibert wrote:
> On Wed, Jan 26, 2005 at 11:38:15AM -0500, linux-os wrote:
>> On Wed, 26 Jan 2005, Rik van Riel wrote:
>>
>>> With some programs the 2.6 kernel can end up allocating memory
>>> at address zero, for a non-MAP_FIXED mmap call! This causes
>>> problems with some programs and is generally rude to do. This
>>> simple patch fixes the problem in my tests.
>>
>> Does this mean that we can't mmap the screen regen buffer at
>> 0x000b8000 anymore?
>
> No. Missed the "non-MAP_FIXED" part? You can always map at 0, you
> just have to ask for it.
>
Okay.
>
>> What 'C' standard do you refer to?
>
> Malloc uses mmap to get more memory. Malloc returning 0 means no
> memory, not "the memory happens to be at 0". Not that easy to fix in
> the glibc if you want to keep the "segfault on null pointer accesses"
> debugging help too.
>
malloc is a runtime library. It has its own documented rules.
> Given that the man page itself says that unless you're using MAP_FIXED
> start is only a hint and you should use 0 if you don't care things can
> get real annoying real fast. Imagine if you want to mmap a <4K file
> and mmap then returns 0, i.e. NULL, as the mapping address as you
> asked. It's illegal from the point of view of susv3[1] and it's real
> annoying in a C/C++ program.
mmap() can (will) return 0 if you use 0 as the hint and use MAP_FIXED
at 0. That's the reason why one does NOT check for NULL with mmap() but
for MAP_FAILED (which on this system is (void *)-1.
>
> OG.
>
> [1]
> When MAP_FIXED is not set, the implementation uses addr in an
> implementation-defined manner to arrive at pa. The pa so chosen
> shall be an area of the address space that the implementation deems
> suitable for a mapping of len bytes to the file. All implementations
> interpret an addr value of 0 as granting the implementation complete
> freedom in selecting pa, subject to constraints described below. A
> non-zero value of addr is taken to be a suggestion of a process
> address near which the mapping should be placed. When the
> implementation selects a value for pa, it never places a mapping at
> address 0, nor does it replace any extant mapping.
>
Cheers,
Dick Johnson
Penguin : Linux version 2.6.10 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.
On Wed, 2005-01-26 at 17:34, Chris Friesen wrote:
> linux-os wrote:
>
> > Does this mean that we can't mmap the screen regen buffer at
> > 0x000b8000 anymore?
> >
> > How do I look at the real-mode interrupt table starting at
> > offset 0? You know that the return value of mmap is to be
> > checked for MAP_FAILED, not for NULL, don't you?
>
> Can't you still map those physical addresses to other virtual addresses?
>
I think that's the case. The 0 address as refered to here is only for
the user virtual address space.
> > What 'C' standard do you refer to? Seg-faults on null pointers
> > have nothing to do with the 'C' standard and everything to
> > do with the platform.
>
> I believe the ISO/IEC 9899:1999 C Standard explicitly states that
> dereferencing a null pointer with the unary * operator results in
> undefined behavior.
Exactly. Undefined. VAX/UNIX allowed assignment to null pointers. BUT
it's now such a commonly held assumption that a null pointer is not
valid that things will break if this is changed. Doesn't glibc malloc
use mmap for small allocations? From the man page:
RETURN VALUE
For calloc() and malloc(), the value returned is a pointer to the
allocated memory, which is suitably aligned for any kind of
variable, or NULL if the request fails.
This could get pretty confusing if NULL was a valid address...
Cheers,
Bryn.
The DBX manual contrasts this behaviour on different systems:
http://acs.ucsd.edu/info/dbx.debug.php
On Wed, Jan 26, 2005 at 11:38:15AM -0500, linux-os wrote:
> On Wed, 26 Jan 2005, Rik van Riel wrote:
>
> >With some programs the 2.6 kernel can end up allocating memory
> >at address zero, for a non-MAP_FIXED mmap call! This causes
> >problems with some programs and is generally rude to do. This
> >simple patch fixes the problem in my tests.
>
> Does this mean that we can't mmap the screen regen buffer at
> 0x000b8000 anymore?
No. Missed the "non-MAP_FIXED" part? You can always map at 0, you
just have to ask for it.
> What 'C' standard do you refer to?
Malloc uses mmap to get more memory. Malloc returning 0 means no
memory, not "the memory happens to be at 0". Not that easy to fix in
the glibc if you want to keep the "segfault on null pointer accesses"
debugging help too.
Given that the man page itself says that unless you're using MAP_FIXED
start is only a hint and you should use 0 if you don't care things can
get real annoying real fast. Imagine if you want to mmap a <4K file
and mmap then returns 0, i.e. NULL, as the mapping address as you
asked. It's illegal from the point of view of susv3[1] and it's real
annoying in a C/C++ program.
OG.
[1]
When MAP_FIXED is not set, the implementation uses addr in an
implementation-defined manner to arrive at pa. The pa so chosen
shall be an area of the address space that the implementation deems
suitable for a mapping of len bytes to the file. All implementations
interpret an addr value of 0 as granting the implementation complete
freedom in selecting pa, subject to constraints described below. A
non-zero value of addr is taken to be a suggestion of a process
address near which the mapping should be placed. When the
implementation selects a value for pa, it never places a mapping at
address 0, nor does it replace any extant mapping.
On Wed, 26 Jan 2005, Andy Isaacson wrote:
> Any particular thoughts as to how large a window should be reserved?
> SHLIB_BASE is a bit more than 1MB, which is fairly small in the grand
> scheme of things, but I guess I don't see why you'd reserve more than
> PAGE_SIZE (or maybe PAGE_SIZE*2, though I can't actually articulate why
> that seems like a good idea).
You also want to catch not-quite-NULL pointer dereferences,
where you dereference the member of a large struct or array,
but the array or struct pointer itself was NULL.
Eg. you try to fetch the 100,000th from an array, but you
got passed a NULL array pointer. Getting a segfault is
much, much better than corrupting data.
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
On Jan 26, 2005, at 11:38, linux-os wrote:
> Does this mean that we can't mmap the screen regen buffer at
> 0x000b8000 anymore?
I believe the point of this is to ensure that *non*-MAP_FIXED
mmap calls won't use 0, IOW, it keeps things from accidentally
being mapped at 0 when the user didn't intend to, like shared
libs and such.
Cheers,
Kyle Moffett
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$ r
!y?(-)
------END GEEK CODE BLOCK------
linux-os wrote:
> On Wed, 26 Jan 2005, Rik van Riel wrote:
>
>> With some programs the 2.6 kernel can end up allocating memory
>> at address zero, for a non-MAP_FIXED mmap call! This causes
>> problems with some programs and is generally rude to do. This
>> simple patch fixes the problem in my tests.
>
>
> Does this mean that we can't mmap the screen regen buffer at
> 0x000b8000 anymore?
>
> How do I look at the real-mode interrupt table starting at
> offset 0? You know that the return value of mmap is to be
> checked for MAP_FAILED, not for NULL, don't you?
This does not affect the case where the user requests a specific
virrtual address (ie. vm86 stuff).
--
Brian Gerst
On Wed, Jan 26, 2005 at 11:38:15AM -0500, linux-os wrote:
> On Wed, 26 Jan 2005, Rik van Riel wrote:
> >With some programs the 2.6 kernel can end up allocating memory
> >at address zero, for a non-MAP_FIXED mmap call! This causes
> >problems with some programs and is generally rude to do. This
> >simple patch fixes the problem in my tests.
>
> Does this mean that we can't mmap the screen regen buffer at
> 0x000b8000 anymore?
That would be a MAP_FIXED call, so not affected by this change.
> How do I look at the real-mode interrupt table starting at
> offset 0? You know that the return value of mmap is to be
> checked for MAP_FAILED, not for NULL, don't you?
All MAP_FIXED, too.
> What 'C' standard do you refer to? Seg-faults on null pointers
> have nothing to do with the 'C' standard and everything to
> do with the platform.
Obviously having malloc() return NULL for a successful allocation would
be a bad thing, no? That's precisely what could happen if an anonymous
allocation got mapped at 0x0.
> >+#define SHLIB_BASE 0x00111000
Any particular thoughts as to how large a window should be reserved?
SHLIB_BASE is a bit more than 1MB, which is fairly small in the grand
scheme of things, but I guess I don't see why you'd reserve more than
PAGE_SIZE (or maybe PAGE_SIZE*2, though I can't actually articulate why
that seems like a good idea).
FWIW, mmap(2) also will return NULL if length==0. That sure did confuse
me the first time I noticed.
-andy
On Wed, 2005-01-26 at 11:38 -0500, linux-os wrote:
> On Wed, 26 Jan 2005, Rik van Riel wrote:
>
> > With some programs the 2.6 kernel can end up allocating memory
> > at address zero, for a non-MAP_FIXED mmap call! This causes
> > problems with some programs and is generally rude to do. This
> > simple patch fixes the problem in my tests.
>
> Does this mean that we can't mmap the screen regen buffer at
> 0x000b8000 anymore?
you're confusing virtual and physical addresses
linux-os wrote:
> The seg-fault you get when you de-reference a pointer to NULL
> is caused by the kernel. You are attempting to access memory
> that has not been mapped into your address space. Once that
> memory gets mmap()ed, you will no longer get a seg-fault.
> Again, the seg-fault has nothing to do with 'C'. It's an
> implementation behavior that can be changed with mmap().
The segfault *does* have something to do with C. The standard says that
the result of dereferencing a NULL pointer is *undefined*. Not
implementation-defined, but undefined. Anything relying on
dereferencing NULL pointers is not valid C code.
Chris
Bryn Reeves wrote:
> RETURN VALUE
> For calloc() and malloc(), the value returned is a pointer to the
> allocated memory, which is suitably aligned for any kind of
> variable, or NULL if the request fails.
>
> This could get pretty confusing if NULL was a valid address...
Internally the library can use mmap(). Presumably they will map a
MAP_FAILED return code from mmap() to a NULL return code in malloc().
Chris
On Wed, 26 Jan 2005, Olivier Galibert wrote:
> On Wed, Jan 26, 2005 at 01:20:53PM -0500, linux-os wrote:
>> On Wed, 26 Jan 2005, Olivier Galibert wrote:
>>> Given that the man page itself says that unless you're using MAP_FIXED
>>> start is only a hint and you should use 0 if you don't care things can
>>> get real annoying real fast. Imagine if you want to mmap a <4K file
>>> and mmap then returns 0, i.e. NULL, as the mapping address as you
>>> asked. It's illegal from the point of view of susv3[1] and it's real
>>> annoying in a C/C++ program.
>>
>> mmap() can (will) return 0 if you use 0 as the hint and use MAP_FIXED
>> at 0. That's the reason why one does NOT check for NULL with mmap() but
>> for MAP_FAILED (which on this system is (void *)-1.
>
> All the paragraph was under an obvious "when you do not use MAP_FIXED"
> precondition. The patch is not supposed to change anything to the
> MAP_FIXED case. Malloc does not use MAP_FIXED. Usual file mmaping
> as an alternative to read() does not use MAP_FIXED.
>
> OG.
Okay. That's fine.
Cheers,
Dick Johnson
Penguin : Linux version 2.6.10 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.
On Wed, 26 Jan 2005, Rik van Riel wrote:
> On Wed, 26 Jan 2005, linux-os wrote:
>
>> Wrong! A returned value of 0 is perfectly correct for mmap()
>> when mapping a fixed address. The attached code shows it working
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> The code that is patched is only run in case of a non-MAP_FIXED
> mmap() call...
>
That's good then. I needed to make sure. Lots of embedded stuff
peeks and pokes at ix86 low-memory physical addresses.
Cheers,
Dick Johnson
Penguin : Linux version 2.6.10 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.
On Wed, 26 Jan 2005, Bryn Reeves wrote:
> On Wed, 2005-01-26 at 17:34, Chris Friesen wrote:
>> linux-os wrote:
>>
>>> Does this mean that we can't mmap the screen regen buffer at
>>> 0x000b8000 anymore?
>>>
>>> How do I look at the real-mode interrupt table starting at
>>> offset 0? You know that the return value of mmap is to be
>>> checked for MAP_FAILED, not for NULL, don't you?
>>
>> Can't you still map those physical addresses to other virtual addresses?
>>
>
> I think that's the case. The 0 address as refered to here is only for
> the user virtual address space.
>
>>> What 'C' standard do you refer to? Seg-faults on null pointers
>>> have nothing to do with the 'C' standard and everything to
>>> do with the platform.
>>
>> I believe the ISO/IEC 9899:1999 C Standard explicitly states that
>> dereferencing a null pointer with the unary * operator results in
>> undefined behavior.
>
> Exactly. Undefined. VAX/UNIX allowed assignment to null pointers. BUT
> it's now such a commonly held assumption that a null pointer is not
> valid that things will break if this is changed. Doesn't glibc malloc
> use mmap for small allocations? From the man page:
Wrong! A returned value of 0 is perfectly correct for mmap()
when mapping a fixed address. The attached code shows it working
perfectly while returning NULL, that event being put on the
terminal.
Any kernel changes that break this code are wrong. There are
many 'C' runtime library functions that signal a problem (or
should signal a problem) by returning NULL. That is, however,
the documented behavior of those procedures. Even malloc()
which __should__ show a failure to allocate by returning NULL,
doesn't work that way because the allocation never fails
(even if you are out of memory) as long as the user address
space hasn't been corrupted by the user (overwriting buffers).
The seg-fault you get when you de-reference a pointer to NULL
is caused by the kernel. You are attempting to access memory
that has not been mapped into your address space. Once that
memory gets mmap()ed, you will no longer get a seg-fault.
Again, the seg-fault has nothing to do with 'C'. It's an
implementation behavior that can be changed with mmap().
[SNIPPED...]
Cheers,
Dick Johnson
Penguin : Linux version 2.6.10 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.
On Wed, Jan 26, 2005 at 01:20:53PM -0500, linux-os wrote:
> On Wed, 26 Jan 2005, Olivier Galibert wrote:
> >Given that the man page itself says that unless you're using MAP_FIXED
> >start is only a hint and you should use 0 if you don't care things can
> >get real annoying real fast. Imagine if you want to mmap a <4K file
> >and mmap then returns 0, i.e. NULL, as the mapping address as you
> >asked. It's illegal from the point of view of susv3[1] and it's real
> >annoying in a C/C++ program.
>
> mmap() can (will) return 0 if you use 0 as the hint and use MAP_FIXED
> at 0. That's the reason why one does NOT check for NULL with mmap() but
> for MAP_FAILED (which on this system is (void *)-1.
All the paragraph was under an obvious "when you do not use MAP_FIXED"
precondition. The patch is not supposed to change anything to the
MAP_FIXED case. Malloc does not use MAP_FIXED. Usual file mmaping
as an alternative to read() does not use MAP_FIXED.
OG.
On Wed, 26 Jan 2005, linux-os wrote:
> Wrong! A returned value of 0 is perfectly correct for mmap()
> when mapping a fixed address. The attached code shows it working
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The code that is patched is only run in case of a non-MAP_FIXED
mmap() call...
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
linux-os wrote:
> Does this mean that we can't mmap the screen regen buffer at
> 0x000b8000 anymore?
>
> How do I look at the real-mode interrupt table starting at
> offset 0? You know that the return value of mmap is to be
> checked for MAP_FAILED, not for NULL, don't you?
Can't you still map those physical addresses to other virtual addresses?
> What 'C' standard do you refer to? Seg-faults on null pointers
> have nothing to do with the 'C' standard and everything to
> do with the platform.
I believe the ISO/IEC 9899:1999 C Standard explicitly states that
dereferencing a null pointer with the unary * operator results in
undefined behavior.
Chris
On Wed, Jan 26, 2005 at 11:18:08AM -0500, Rik van Riel wrote:
> With some programs the 2.6 kernel can end up allocating memory
> at address zero, for a non-MAP_FIXED mmap call! This causes
> problems with some programs and is generally rude to do. This
> simple patch fixes the problem in my tests.
> Make sure that we don't allocate memory all the way down to zero,
> so the NULL pointer never gets covered up with anonymous memory
> and we don't end up violating the C standard.
> Signed-off-by: Rik van Riel <[email protected]>
SHLIB_BASE does not appear to be present in 2.6.9; perhaps something
else is going on.
I think we are better off:
(a) checking for hitting zero explicitly as opposed to
enforcing a randomly-chosen lower limit for addresses
(b) enforcing vma allocation above FIRST_USER_PGD_NR*PGDIR_SIZE,
to which SHLIB_BASE bears no relation.
-- wli
On Wed, Jan 26, 2005 at 11:38:15AM -0500, linux-os wrote:
> On Wed, 26 Jan 2005, Rik van Riel wrote:
>
> >With some programs the 2.6 kernel can end up allocating memory
> >at address zero, for a non-MAP_FIXED mmap call! This causes
> >problems with some programs and is generally rude to do. This
> >simple patch fixes the problem in my tests.
>
> Does this mean that we can't mmap the screen regen buffer at
> 0x000b8000 anymore?
If you would have looked inside mmap.c, you would have seen that his check
is executed *after* trying for a specific address if it was given. Mmapping
0x000b8000 should still work. I don't know if this patch was very clean (it
probably isn't) but what it's supposed to do is only fail if no specific
address has been given to it.
> How do I look at the real-mode interrupt table starting at
> offset 0? You know that the return value of mmap is to be
> checked for MAP_FAILED, not for NULL, don't you?
>
> What 'C' standard do you refer to? Seg-faults on null pointers
> have nothing to do with the 'C' standard and everything to
> do with the platform.
Oh come on. Every normal program checks whether a variable has been allocated
or not by comparing it to NULL. I have no knowledge of the internals of glibc
though, and wouldn't know whether this should be handled inside the kernel or
if having it checked in glibc and userspace programs that use mmap directly
should be enough, but AFAIK every C coder assumes that NULL pointers point to
nothing.
Sytse
On Wed, Jan 26, 2005 at 11:18:08AM -0500, Rik van Riel wrote:
>> With some programs the 2.6 kernel can end up allocating memory
>> at address zero, for a non-MAP_FIXED mmap call! This causes
>> problems with some programs and is generally rude to do. This
>> simple patch fixes the problem in my tests.
>> Make sure that we don't allocate memory all the way down to zero,
>> so the NULL pointer never gets covered up with anonymous memory
>> and we don't end up violating the C standard.
>> Signed-off-by: Rik van Riel <[email protected]>
On Wed, Jan 26, 2005 at 09:25:38AM -0800, William Lee Irwin III wrote:
> SHLIB_BASE does not appear to be present in 2.6.9; perhaps something
> else is going on.
> I think we are better off:
> (a) checking for hitting zero explicitly as opposed to
> enforcing a randomly-chosen lower limit for addresses
> (b) enforcing vma allocation above FIRST_USER_PGD_NR*PGDIR_SIZE,
> to which SHLIB_BASE bears no relation.
There's a long discussion here, in which no one appears to have noticed
that SHLIB_BASE does not exist in mainline. Is anyone else awake here?
-- wli
On Wed, Jan 26, 2005 at 09:09:27PM -0800, William Lee Irwin III wrote:
> On Wed, Jan 26, 2005 at 11:18:08AM -0500, Rik van Riel wrote:
> >> With some programs the 2.6 kernel can end up allocating memory
> >> at address zero, for a non-MAP_FIXED mmap call! This causes
> >> problems with some programs and is generally rude to do. This
> >> simple patch fixes the problem in my tests.
> >> Make sure that we don't allocate memory all the way down to zero,
> >> so the NULL pointer never gets covered up with anonymous memory
> >> and we don't end up violating the C standard.
> >> Signed-off-by: Rik van Riel <[email protected]>
>
> On Wed, Jan 26, 2005 at 09:25:38AM -0800, William Lee Irwin III wrote:
> > SHLIB_BASE does not appear to be present in 2.6.9; perhaps something
> > else is going on.
> > I think we are better off:
> > (a) checking for hitting zero explicitly as opposed to
> > enforcing a randomly-chosen lower limit for addresses
> > (b) enforcing vma allocation above FIRST_USER_PGD_NR*PGDIR_SIZE,
> > to which SHLIB_BASE bears no relation.
>
> There's a long discussion here, in which no one appears to have noticed
> that SHLIB_BASE does not exist in mainline. Is anyone else awake here?
It's an exec-shield'ism. Rik likely was working off a Red Hat/Fedora kernel tree.
Dave
On Wed, Jan 26, 2005 at 09:09:27PM -0800, William Lee Irwin III wrote:
>> There's a long discussion here, in which no one appears to have noticed
>> that SHLIB_BASE does not exist in mainline. Is anyone else awake here?
On Thu, Jan 27, 2005 at 12:18:56AM -0500, Dave Jones wrote:
> It's an exec-shield'ism. Rik likely was working off a Red Hat/Fedora
> kernel tree.
It does change things in a substantial way. Namely, the lower address
space boundary doesn't exist in mainline at all, and so its enforcement
requires introduction. Furthermore, mainline must concern itself with
all Linux-supported architectures (not distro-supported architectures),
not just ia32/x86-64, so there is also FIRST_USER_PGD_NR to take into
account as opposed to pulling magic ia32/x86-64 -specific numbers out
of a hat and splattering them all over core code. Not that you had
anything to do with any of this.
-- wli
William Lee Irwin III writes:
> On Wed, Jan 26, 2005 at 11:18:08AM -0500, Rik van Riel wrote:
> >> With some programs the 2.6 kernel can end up allocating memory
> >> at address zero, for a non-MAP_FIXED mmap call! This causes
> >> problems with some programs and is generally rude to do. This
> >> simple patch fixes the problem in my tests.
> >> Make sure that we don't allocate memory all the way down to zero,
> >> so the NULL pointer never gets covered up with anonymous memory
> >> and we don't end up violating the C standard.
> >> Signed-off-by: Rik van Riel <[email protected]>
>
> On Wed, Jan 26, 2005 at 09:25:38AM -0800, William Lee Irwin III wrote:
> > SHLIB_BASE does not appear to be present in 2.6.9; perhaps something
> > else is going on.
> > I think we are better off:
> > (a) checking for hitting zero explicitly as opposed to
> > enforcing a randomly-chosen lower limit for addresses
> > (b) enforcing vma allocation above FIRST_USER_PGD_NR*PGDIR_SIZE,
> > to which SHLIB_BASE bears no relation.
>
> There's a long discussion here, in which no one appears to have noticed
> that SHLIB_BASE does not exist in mainline. Is anyone else awake here?
About the only kernel-level enforcement I would feel comfortable with is
to have non-fixed mmap()s refuse to grab the _page_ at address 0. Any range
larger than this is policy, and hence needs a user-space override mechanism.
Also, if user-space wants to catch accesses in a larger region above 0 then
it can do that itself, by checking the result of mmap(), or by doing a fixed
mmap() at address 0 with suitable size and rwx protection disabled.
/Mikael
William Lee Irwin III writes:
>> There's a long discussion here, in which no one appears to have noticed
>> that SHLIB_BASE does not exist in mainline. Is anyone else awake here?
On Thu, Jan 27, 2005 at 10:29:12AM +0100, Mikael Pettersson wrote:
> About the only kernel-level enforcement I would feel comfortable with is
> to have non-fixed mmap()s refuse to grab the _page_ at address 0. Any range
> larger than this is policy, and hence needs a user-space override mechanism.
> Also, if user-space wants to catch accesses in a larger region above 0 then
> it can do that itself, by checking the result of mmap(), or by doing a fixed
> mmap() at address 0 with suitable size and rwx protection disabled.
FIRST_USER_PGD_NR is a matter of killing the entire box dead where it
exists, not any kind of process' preference. Userspace should be
prevented from setting up vmas below FIRST_USER_PGD_NR. It has zero to
do with what userspace's own concerns, but rather the kernel trying to
avoid system-critical data from being stomped on by userspace. It would
be like accidentally allowing userspace to use the IDT for malloc() on
x86, destroying one's ability to handle interrupts, page faults, etc.,
to allow userspace to go below FIRST_USER_PGD_NR on ARM.
-- wli
On Thu, Jan 27, 2005 at 04:52:54AM -0800, William Lee Irwin III wrote:
> On Thu, Jan 27, 2005 at 10:29:12AM +0100, Mikael Pettersson wrote:
> > About the only kernel-level enforcement I would feel comfortable with is
> > to have non-fixed mmap()s refuse to grab the _page_ at address 0. Any range
> > larger than this is policy, and hence needs a user-space override mechanism.
> > Also, if user-space wants to catch accesses in a larger region above 0 then
> > it can do that itself, by checking the result of mmap(), or by doing a fixed
> > mmap() at address 0 with suitable size and rwx protection disabled.
>
> FIRST_USER_PGD_NR is a matter of killing the entire box dead where it
> exists, not any kind of process' preference. Userspace should be
> prevented from setting up vmas below FIRST_USER_PGD_NR.
No it should not. The PGD index is FAR to coarse to use - each PGD on
ARM maps 1MB of virtual address space. Userspace text starts at 32K.
The protection against mmap() MAP_FIXED fiddling with the first page is
handled by the arch-specific mmap() wrappers, so generic code doesn't
have to worry about it.
What generic code _does_ have to worry about is:
(a) not removing the very first page.
(b) not removing the very first pointer to the 2nd level table in the
1st level tables.
and that is all. Maybe FIRST_USER_PGD_NR was a bad way of achieving
this, but in the instance of the VM upon which it was originally
implemented (somewhere between 2.2 and 2.4), it was deemed (by others
iirc) to be the best way of achieving it at the time.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
William Lee Irwin III writes:
> William Lee Irwin III writes:
> >> There's a long discussion here, in which no one appears to have noticed
> >> that SHLIB_BASE does not exist in mainline. Is anyone else awake here?
>
> On Thu, Jan 27, 2005 at 10:29:12AM +0100, Mikael Pettersson wrote:
> > About the only kernel-level enforcement I would feel comfortable with is
> > to have non-fixed mmap()s refuse to grab the _page_ at address 0. Any range
> > larger than this is policy, and hence needs a user-space override mechanism.
> > Also, if user-space wants to catch accesses in a larger region above 0 then
> > it can do that itself, by checking the result of mmap(), or by doing a fixed
> > mmap() at address 0 with suitable size and rwx protection disabled.
>
> FIRST_USER_PGD_NR is a matter of killing the entire box dead where it
> exists, not any kind of process' preference. Userspace should be
> prevented from setting up vmas below FIRST_USER_PGD_NR. It has zero to
> do with what userspace's own concerns, but rather the kernel trying to
> avoid system-critical data from being stomped on by userspace. It would
> be like accidentally allowing userspace to use the IDT for malloc() on
> x86, destroying one's ability to handle interrupts, page faults, etc.,
> to allow userspace to go below FIRST_USER_PGD_NR on ARM.
My argument only applied to the "protect the user" discussion.
_If_ the kernel needs to reserve parts of the user's address
space for IDTs etc, then certainly mmap() mustn't map those.
But that's orthogonal (in spirit, if not in implementation)
from the "protect the user" discussion.
/Mkael
On Thu, Jan 27, 2005 at 04:52:54AM -0800, William Lee Irwin III wrote:
>> FIRST_USER_PGD_NR is a matter of killing the entire box dead where it
>> exists, not any kind of process' preference. Userspace should be
>> prevented from setting up vmas below FIRST_USER_PGD_NR.
On Thu, Jan 27, 2005 at 02:25:00PM +0000, Russell King wrote:
> No it should not. The PGD index is FAR to coarse to use - each PGD on
> ARM maps 1MB of virtual address space. Userspace text starts at 32K.
> The protection against mmap() MAP_FIXED fiddling with the first page is
> handled by the arch-specific mmap() wrappers, so generic code doesn't
> have to worry about it.
> What generic code _does_ have to worry about is:
>
> (a) not removing the very first page.
> (b) not removing the very first pointer to the 2nd level table in the
> 1st level tables.
> and that is all. Maybe FIRST_USER_PGD_NR was a bad way of achieving
> this, but in the instance of the VM upon which it was originally
> implemented (somewhere between 2.2 and 2.4), it was deemed (by others
> iirc) to be the best way of achieving it at the time.
The only claim above is the effect of clobbering virtual page 0 and
referring to this phenomenon by the macro. I was rather careful not to
claim a specific lower boundary to the address space.
-- wli
On Thu, 27 Jan 2005, William Lee Irwin III wrote:
> The only claim above is the effect of clobbering virtual page 0 and
> referring to this phenomenon by the macro. I was rather careful not to
> claim a specific lower boundary to the address space.
OK, here is a patch that does compile against the current
2.6 kernel. It it obvious we need a cutoff somewhere, so
I've chosen one that's sure to spark up a conversation
and I'll let others decide what that cutoff value is ;)))
===== mm/mmap.c 1.161 vs edited =====
--- 1.161/mm/mmap.c Wed Jan 12 11:26:28 2005
+++ edited/mm/mmap.c Thu Jan 27 14:19:21 2005
@@ -1259,6 +1259,13 @@
return addr;
/*
+ * Make sure we don't allocate all the way down to
+ * zero, which would break NULL pointer detection.
+ */
+ if (addr < mm->brk)
+ goto fail;
+
+ /*
* new region fits between prev_vma->vm_end and
* vma->vm_start, use it:
*/
On Thu, 27 Jan 2005, William Lee Irwin III wrote:
>> The only claim above is the effect of clobbering virtual page 0 and
>> referring to this phenomenon by the macro. I was rather careful not to
>> claim a specific lower boundary to the address space.
On Thu, Jan 27, 2005 at 02:22:50PM -0500, Rik van Riel wrote:
> OK, here is a patch that does compile against the current
> 2.6 kernel. It it obvious we need a cutoff somewhere, so
> I've chosen one that's sure to spark up a conversation
> and I'll let others decide what that cutoff value is ;)))
Okay, 2 comments:
(a) The most permissive scheme possible given architectural constraints
I'm aware of (thanks, rmk) would only disallow below PAGE_SIZE,
or basically !vma->vm_start. mm->brk affects executables with
high load addresses and prevents the use of the lower 128MB on
ia32, which I personally make extensive use of in order to move
program stacks out of the way of larger mmap()'s and to conserve
pagetable memory.
(b) sys_mremap() isn't covered.
Does the following give you any new ideas?
-- wli
Index: mm1-2.6.11-rc2/mm/mmap.c
===================================================================
--- mm1-2.6.11-rc2.orig/mm/mmap.c 2005-01-26 00:30:38.000000000 -0800
+++ mm1-2.6.11-rc2/mm/mmap.c 2005-01-27 12:33:34.000000000 -0800
@@ -1258,7 +1258,7 @@
* return with success:
*/
vma = find_vma(mm, addr);
- if (!vma || addr+len <= vma->vm_start)
+ if (addr && (!vma || addr+len <= vma->vm_start))
/* remember the address as a hint for next time */
return (mm->free_area_cache = addr);
Index: mm1-2.6.11-rc2/mm/mremap.c
===================================================================
--- mm1-2.6.11-rc2.orig/mm/mremap.c 2005-01-26 00:26:43.000000000 -0800
+++ mm1-2.6.11-rc2/mm/mremap.c 2005-01-27 12:34:34.000000000 -0800
@@ -297,6 +297,8 @@
if (flags & MREMAP_FIXED) {
if (new_addr & ~PAGE_MASK)
goto out;
+ if (!new_addr)
+ goto out;
if (!(flags & MREMAP_MAYMOVE))
goto out;
On Thu, 27 Jan 2005, William Lee Irwin III wrote:
> (b) sys_mremap() isn't covered.
AFAICS it is covered.
> --- mm1-2.6.11-rc2.orig/mm/mremap.c 2005-01-26 00:26:43.000000000 -0800
> +++ mm1-2.6.11-rc2/mm/mremap.c 2005-01-27 12:34:34.000000000 -0800
> @@ -297,6 +297,8 @@
> if (flags & MREMAP_FIXED) {
> if (new_addr & ~PAGE_MASK)
> goto out;
> + if (!new_addr)
> + goto out;
This looks broken, look at the MREMAP_FIXED part...
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
On Thu, 27 Jan 2005, William Lee Irwin III wrote:
>> (b) sys_mremap() isn't covered.
On Thu, Jan 27, 2005 at 03:58:12PM -0500, Rik van Riel wrote:
> AFAICS it is covered.
> >--- mm1-2.6.11-rc2.orig/mm/mremap.c 2005-01-26 00:26:43.000000000 -0800
> >+++ mm1-2.6.11-rc2/mm/mremap.c 2005-01-27 12:34:34.000000000 -0800
> >@@ -297,6 +297,8 @@
> > if (flags & MREMAP_FIXED) {
> > if (new_addr & ~PAGE_MASK)
> > goto out;
> >+ if (!new_addr)
> >+ goto out;
>
> This looks broken, look at the MREMAP_FIXED part...
The only way I can make sense of this is if you're trying to say that
because the user is trying to pass in a fixed address, that 0 should
then be permitted.
The intention was to disallow vmas starting at 0 categorically. i.e. it
is very intentional to deny the MREMAP_FIXED to 0 case of mremap().
It was also the intention to deny the MAP_FIXED to 0 case of mmap(),
though I didn't actually sweep that much (if at all).
-- wli
On Thu, 27 Jan 2005, William Lee Irwin III wrote:
> The intention was to disallow vmas starting at 0 categorically. i.e. it
> is very intentional to deny the MREMAP_FIXED to 0 case of mremap().
> It was also the intention to deny the MAP_FIXED to 0 case of mmap(),
> though I didn't actually sweep that much (if at all).
We can't do that, look at line 944 of fs/binfmt_elf.c:
if (current->personality & MMAP_PAGE_ZERO) {
/* Why this, you ask??? Well SVr4 maps page 0 as read-only,
and some applications "depend" upon this behavior.
Since we do not have the power to recompile these, we
emulate the SVr4 behavior. Sigh. */
down_write(¤t->mm->mmap_sem);
error = do_mmap(NULL, 0, PAGE_SIZE, PROT_READ | PROT_EXEC,
MAP_FIXED | MAP_PRIVATE, 0);
up_write(¤t->mm->mmap_sem);
}
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
On Thu, 27 Jan 2005, William Lee Irwin III wrote:
> On Thu, 27 Jan 2005, William Lee Irwin III wrote:
>>> (b) sys_mremap() isn't covered.
>
> On Thu, Jan 27, 2005 at 03:58:12PM -0500, Rik van Riel wrote:
>> AFAICS it is covered.
>>> --- mm1-2.6.11-rc2.orig/mm/mremap.c 2005-01-26 00:26:43.000000000 -0800
>>> +++ mm1-2.6.11-rc2/mm/mremap.c 2005-01-27 12:34:34.000000000 -0800
>>> @@ -297,6 +297,8 @@
>>> if (flags & MREMAP_FIXED) {
>>> if (new_addr & ~PAGE_MASK)
>>> goto out;
>>> + if (!new_addr)
>>> + goto out;
>>
>> This looks broken, look at the MREMAP_FIXED part...
>
> The only way I can make sense of this is if you're trying to say that
> because the user is trying to pass in a fixed address, that 0 should
> then be permitted.
>
> The intention was to disallow vmas starting at 0 categorically. i.e. it
> is very intentional to deny the MREMAP_FIXED to 0 case of mremap().
> It was also the intention to deny the MAP_FIXED to 0 case of mmap(),
> though I didn't actually sweep that much (if at all).
>
>
> -- wli
No! Then you can't make a tool that will be able to look at
the entire x86 address-space! You end up with an offset that will
eventually wrap to zero which you are denying. You must leave
MAP_FIXED alone. Ignore the 'C' pedants, a pointer is properly
initialized if it points to a mapped address. It would be absurd
to have to make the CPU calculate the address at run-time just
because you thew some rocks in the way.
Cheers,
Dick Johnson
Penguin : Linux version 2.6.10 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.
On Thu, 27 Jan 2005, William Lee Irwin III wrote:
>> The intention was to disallow vmas starting at 0 categorically. i.e. it
>> is very intentional to deny the MREMAP_FIXED to 0 case of mremap().
>> It was also the intention to deny the MAP_FIXED to 0 case of mmap(),
>> though I didn't actually sweep that much (if at all).
On Thu, Jan 27, 2005 at 04:28:19PM -0500, Rik van Riel wrote:
> We can't do that, look at line 944 of fs/binfmt_elf.c:
> if (current->personality & MMAP_PAGE_ZERO) {
> /* Why this, you ask??? Well SVr4 maps page 0 as read-only,
> and some applications "depend" upon this behavior.
> Since we do not have the power to recompile these, we
> emulate the SVr4 behavior. Sigh. */
> down_write(¤t->mm->mmap_sem);
> error = do_mmap(NULL, 0, PAGE_SIZE, PROT_READ | PROT_EXEC,
> MAP_FIXED | MAP_PRIVATE, 0);
> up_write(¤t->mm->mmap_sem);
> }
You seem to be on about something else, e.g. only forbidding the vma
allocator to return a vma starting at 0 when not specifically requested.
In that case vma->vm_start < mm->brk and similar are all fine.
-- wli
On Thu, 27 Jan 2005, William Lee Irwin III wrote:
> You seem to be on about something else, e.g. only forbidding the vma
> allocator to return a vma starting at 0 when not specifically requested.
> In that case vma->vm_start < mm->brk and similar are all fine.
Yes.
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
On Fri, 28 Jan 2005, Rik van Riel wrote:
> On Thu, 27 Jan 2005, William Lee Irwin III wrote:
>
> > You seem to be on about something else, e.g. only forbidding the vma
> > allocator to return a vma starting at 0 when not specifically requested.
> > In that case vma->vm_start < mm->brk and similar are all fine.
>
> Yes.
Prohibiting "addr < mm->brk" (unless FIXED) seems arbitrary policy to me,
and at variance with the comment "would break NULL pointer detection".
Why have you chosen to prohibit all that rather than just "!addr"?
(Other than to stoke up this controversy you expect ;)
But I have to admit that it's much less arbitrary policy than the
TASK_UNMAPPED_BASE used when going bottom up - I much prefer your
your mm->brk test to that.
I had imagined that top down (non-FIXED) would continue to make
more space available, the space below the text, just cutting off
at PAGE_SIZE. There was a more serious lower limit on ARM under
discussion before, but ARM doesn't use top down so far as I can see.
Perhaps you're coming from experience of various buggy apps
that get into difficulties if mmaps are found below mm->brk?
I'm not sure that we should be cutting others' address space to
make life easier for those, they should be ADDR_COMPAT_LAYOUT.
Part (all?) of the point of topdown was to make more address
space available, wasn't it?
arch/ppc64/mm/hugetlbpage.c (odd place to find it) has its own
arch_get_unmapped_area_topdown, should be given a similar fix.
Hugh
On Fri, Jan 28, 2005 at 02:14:36PM +0000, Hugh Dickins wrote:
> I had imagined that top down (non-FIXED) would continue to make
> more space available, the space below the text, just cutting off
> at PAGE_SIZE. There was a more serious lower limit on ARM under
> discussion before, but ARM doesn't use top down so far as I can see.
rmk chimed in at one point and made it clear that ARM's actual lower
bounday was PAGE_SIZE (before that, all I knew was:
0 < x <= (FIRST_USER_PGD_NR << PGDIR_SHIFT)).
-- wli
On Fri, 28 Jan 2005, Hugh Dickins wrote:
> Perhaps you're coming from experience of various buggy apps
> that get into difficulties if mmaps are found below mm->brk?
> I'm not sure that we should be cutting others' address space to
> make life easier for those, they should be ADDR_COMPAT_LAYOUT.
The main thing I would really like to preserve is the
space used for "near-NULL" pointer detection. That is,
detection of trying to access a large index in a NULL
pointer array, etc.
I'd be happy to have some arbitrary value for the lower
boundary...
> arch/ppc64/mm/hugetlbpage.c (odd place to find it) has its own
> arch_get_unmapped_area_topdown, should be given a similar fix.
Good point, though a 64 bit architecture is, umm, less
likely to run all the way down to zero within our lifetime.
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
On Fri, 28 Jan 2005, Rik van Riel wrote:
>
> The main thing I would really like to preserve is the
> space used for "near-NULL" pointer detection. That is,
> detection of trying to access a large index in a NULL
> pointer array, etc.
>
> I'd be happy to have some arbitrary value for the lower
> boundary...
Almost everything will still have that not-so-near-NULL pointer
detection. It only gets limited to a PAGE_SIZE detection extent
in the case when the app mmaps as much as it possibly can.
I think it should be allowed make that tradeoff.
> > arch/ppc64/mm/hugetlbpage.c (odd place to find it) has its own
> > arch_get_unmapped_area_topdown, should be given a similar fix.
>
> Good point, though a 64 bit architecture is, umm, less
> likely to run all the way down to zero within our lifetime.
I hadn't looked at it that way!
Hugh