2016-04-02 19:18:09

by Piotr Kwapulinski

[permalink] [raw]
Subject: [PATCH 0/3] mm/mmap.c: don't unmap the overlapping VMA(s)

Currently the mmap(MAP_FIXED) discards the overlapping part of the
existing VMA(s).
Introduce the new MAP_DONTUNMAP flag which forces the mmap to fail
with ENOMEM whenever the overlapping occurs and MAP_FIXED is set.
No existing mapping(s) is discarded.
The implementation tests the MAP_DONTUNMAP flag right before unmapping
the VMA. The tile arch is the dependency of mmap_flags.

I did the isolated tests and also tested it with Gentoo full
installation.

Signed-off-by: Piotr Kwapulinski <[email protected]>
---
arch/tile/mm/elf.c | 1 +
include/linux/mm.h | 3 ++-
include/uapi/asm-generic/mman-common.h | 1 +
mm/mmap.c | 10 +++++++---
4 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/tile/mm/elf.c b/arch/tile/mm/elf.c
index 6225cc9..dae4b33 100644
--- a/arch/tile/mm/elf.c
+++ b/arch/tile/mm/elf.c
@@ -142,6 +142,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
if (!retval) {
unsigned long addr = MEM_USER_INTRPT;
addr = mmap_region(NULL, addr, INTRPT_SIZE,
+ MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE,
VM_READ|VM_EXEC|
VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0);
if (addr > (unsigned long) -PAGE_SIZE)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ed6407d..31dcdfb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2048,7 +2048,8 @@ extern int install_special_mapping(struct mm_struct *mm,
extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);

extern unsigned long mmap_region(struct file *file, unsigned long addr,
- unsigned long len, vm_flags_t vm_flags, unsigned long pgoff);
+ unsigned long len, unsigned long mmap_flags,
+ vm_flags_t vm_flags, unsigned long pgoff);
extern unsigned long do_mmap(struct file *file, unsigned long addr,
unsigned long len, unsigned long prot, unsigned long flags,
vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate);
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 5827438..3655be3 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -19,6 +19,7 @@
#define MAP_TYPE 0x0f /* Mask for type of mapping */
#define MAP_FIXED 0x10 /* Interpret addr exactly */
#define MAP_ANONYMOUS 0x20 /* don't use a file */
+#define MAP_DONTUNMAP 0x40 /* don't unmap overlapping VMA */
#ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
# define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be uninitialized */
#else
diff --git a/mm/mmap.c b/mm/mmap.c
index bd2e1a53..ab429c3 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1286,7 +1286,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
vm_flags |= VM_NORESERVE;
}

- addr = mmap_region(file, addr, len, vm_flags, pgoff);
+ addr = mmap_region(file, addr, len, flags, vm_flags, pgoff);
if (!IS_ERR_VALUE(addr) &&
((vm_flags & VM_LOCKED) ||
(flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
@@ -1422,7 +1422,8 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
}

unsigned long mmap_region(struct file *file, unsigned long addr,
- unsigned long len, vm_flags_t vm_flags, unsigned long pgoff)
+ unsigned long len, unsigned long mmap_flags,
+ vm_flags_t vm_flags, unsigned long pgoff)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma, *prev;
@@ -1448,7 +1449,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/* Clear old maps */
while (find_vma_links(mm, addr, addr + len, &prev, &rb_link,
&rb_parent)) {
- if (do_munmap(mm, addr, len))
+ const bool dont_unmap =
+ (mmap_flags & (MAP_DONTUNMAP | MAP_FIXED))
+ == (MAP_DONTUNMAP | MAP_FIXED);
+ if (dont_unmap || do_munmap(mm, addr, len))
return -ENOMEM;
}

--
2.7.4


2016-04-02 19:18:20

by Piotr Kwapulinski

[permalink] [raw]
Subject: [PATCH 1/3] man/mmap.2: don't unmap the overlapping VMA(s)

mmap.2 man page update for MAP_DONTUNMAP flag of mmap.

Signed-off-by: Piotr Kwapulinski <[email protected]>
---
It should be considered to be merged only in case the patch
"[PATCH 0/3] mm/mmap.c: don't unmap the overlapping VMA(s)"
is merged.
---
man2/mmap.2 | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/man2/mmap.2 b/man2/mmap.2
index 0f2f277..0fc5879 100644
--- a/man2/mmap.2
+++ b/man2/mmap.2
@@ -37,7 +37,7 @@
.\" 2007-07-10, mtk, Added an example program.
.\" 2008-11-18, mtk, document MAP_STACK
.\"
-.TH MMAP 2 2016-03-15 "Linux" "Linux Programmer's Manual"
+.TH MMAP 2 2016-04-02 "Linux" "Linux Programmer's Manual"
.SH NAME
mmap, munmap \- map or unmap files or devices into memory
.SH SYNOPSIS
@@ -213,7 +213,9 @@ If the memory region specified by
.I addr
and
.I len
-overlaps pages of any existing mapping(s), then the overlapped
+overlaps pages of any existing mapping(s) and
+.B MAP_DONTUNMAP
+is not set, then the overlapped
part of the existing mapping(s) will be discarded.
If the specified address cannot be used,
.BR mmap ()
@@ -221,6 +223,23 @@ will fail.
Because requiring a fixed address for a mapping is less portable,
the use of this option is discouraged.
.TP
+.BR MAP_DONTUNMAP " (since Linux 4.6)"
+If this flag and
+.B MAP_FIXED
+are set and the memory region specified by
+.I addr
+and
+.I length
+overlaps pages of any existing mapping(s), then the
+.BR mmap ()
+will fail with
+.BR ENOMEM .
+No existing mapping(s) will be
+discarded.
+
+Note: currently, this flag is not implemented in the glibc wrapper.
+Use the numerical hex value 40, if you want to use it.
+.TP
.B MAP_GROWSDOWN
Used for stacks.
Indicates to the kernel virtual memory system that the mapping
@@ -477,6 +496,15 @@ No memory is available.
.TP
.B ENOMEM
The process's maximum number of mappings would have been exceeded.
+Or, both the
+.B MAP_FIXED
+and
+.B MAP_DONTUNMAP
+flags are set and the memory region specified by
+.I addr
+and
+.I length
+overlaps pages of any existing mapping(s).
This error can also occur for
.BR munmap (2),
when unmapping a region in the middle of an existing mapping,
--
2.7.4

2016-04-02 19:18:41

by Piotr Kwapulinski

[permalink] [raw]
Subject: [PATCH 3/3] man/mremap.2: don't unmap the overlapping VMA(s)

mremap.2 man page update for MREAP_DONTUNMAP flag of mremap.

Signed-off-by: Piotr Kwapulinski <[email protected]>
---
It should be considered to be merged only in case the patch
"[PATCH 2/3] mm/mremap.c: don't unmap the overlapping VMA(s)"
is merged.
---
man2/mremap.2 | 50 +++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 45 insertions(+), 5 deletions(-)

diff --git a/man2/mremap.2 b/man2/mremap.2
index e4998a4..978d509 100644
--- a/man2/mremap.2
+++ b/man2/mremap.2
@@ -27,7 +27,7 @@
.\" Update for Linux 1.3.87 and later
.\" 2005-10-11 mtk: Added NOTES for MREMAP_FIXED; revised EINVAL text.
.\"
-.TH MREMAP 2 2015-12-05 "Linux" "Linux Programmer's Manual"
+.TH MREMAP 2 2016-04-02 "Linux" "Linux Programmer's Manual"
.SH NAME
mremap \- remap a virtual memory address
.SH SYNOPSIS
@@ -104,7 +104,9 @@ accepts a fifth argument,
.IR "void\ *new_address" ,
which specifies a page-aligned address to which the mapping must
be moved.
-Any previous mapping at the address range specified by
+If
+.B MREMAP_DONTUNMAP
+is not set then any previous mapping at the address range specified by
.I new_address
and
.I new_size
@@ -114,6 +116,30 @@ If
is specified, then
.B MREMAP_MAYMOVE
must also be specified.
+.TP
+.BR MREMAP_DONTUNMAP " (since Linux 4.6)"
+This flag is similar to
+.B MAP_DONTUNMAP
+flag of
+.BR mmap (2).
+If this flag and
+.B MREMAP_FIXED
+are set and the memory region specified by
+.I new_address
+and
+.I new_size
+overlaps pages of any existing mapping(s), then the
+.BR mremap ()
+will fail with
+.BR ENOMEM .
+No existing mapping(s) will be discarded. If
+.B MREMAP_DONTUNMAP
+is specified, then
+.B MREMAP_FIXED
+must also be specified.
+
+Note: currently, this flag is not implemented in the glibc wrapper.
+Use the numerical value 4, if you want to use it.
.PP
If the memory segment specified by
.I old_address
@@ -156,8 +182,9 @@ page aligned; a value other than
.B MREMAP_MAYMOVE
or
.B MREMAP_FIXED
-was specified in
-.IR flags ;
+or
+.B MREMAP_DONTUNMAP
+was specified in \fIflags\fP;
.I new_size
was zero;
.I new_size
@@ -175,13 +202,26 @@ and
or
.B MREMAP_FIXED
was specified without also specifying
-.BR MREMAP_MAYMOVE .
+.BR MREMAP_MAYMOVE ;
+or
+.B MREMAP_DONTUNMAP
+was specified without also specifying
+.BR MREMAP_FIXED .
.TP
.B ENOMEM
The memory area cannot be expanded at the current virtual address, and the
.B MREMAP_MAYMOVE
flag is not set in \fIflags\fP.
Or, there is not enough (virtual) memory available.
+Or, both the
+.B MREMAP_FIXED
+and
+.B MREMAP_DONTUNMAP
+flags are set and the memory region specified by
+.I new_address
+and
+.I new_size
+overlaps pages of any existing mapping(s).
.SH CONFORMING TO
This call is Linux-specific, and should not be used in programs
intended to be portable.
--
2.7.4

2016-04-02 19:18:29

by Piotr Kwapulinski

[permalink] [raw]
Subject: [PATCH 2/3] mm/mremap.c: don't unmap the overlapping VMA(s)

Currently the
mremap(new_size, MREMAP_MAYMOVE | MREMAP_FIXED, new_address)
discards the part of existing VMA(s) if it overlaps the memory region
specified by new_address and new_size.
Introduce the new MREMAP_DONTUNMAP flag which forces the mremap to
fail with ENOMEM whenever the overlapping occurs. No existing
mapping(s) is discarded.
The implementation tests the MAP_DONTUNMAP flag and scans the AS for
the overlapping VMA(s) right before unmapping the area.

I did the isolated tests and also tested it with Gentoo full
installation.

Signed-off-by: Piotr Kwapulinski <[email protected]>
---
include/uapi/linux/mman.h | 5 +++--
mm/mremap.c | 23 +++++++++++++++++------
2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
index ade4acd..bc6478e 100644
--- a/include/uapi/linux/mman.h
+++ b/include/uapi/linux/mman.h
@@ -3,8 +3,9 @@

#include <asm/mman.h>

-#define MREMAP_MAYMOVE 1
-#define MREMAP_FIXED 2
+#define MREMAP_MAYMOVE 1
+#define MREMAP_FIXED 2
+#define MREMAP_DONTUNMAP 4

#define OVERCOMMIT_GUESS 0
#define OVERCOMMIT_ALWAYS 1
diff --git a/mm/mremap.c b/mm/mremap.c
index 3fa0a467..f57d396 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -397,7 +397,8 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
}

static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
- unsigned long new_addr, unsigned long new_len, bool *locked)
+ unsigned long new_addr, unsigned long new_len,
+ unsigned long flags, bool *locked)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
@@ -415,9 +416,16 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
if (addr + old_len > new_addr && new_addr + new_len > addr)
goto out;

- ret = do_munmap(mm, new_addr, new_len);
- if (ret)
- goto out;
+ if (flags & MREMAP_DONTUNMAP) {
+ if (find_vma_intersection(mm, new_addr, new_len)) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ } else {
+ ret = do_munmap(mm, new_addr, new_len);
+ if (ret)
+ goto out;
+ }

if (old_len >= new_len) {
ret = do_munmap(mm, addr+new_len, old_len - new_len);
@@ -482,12 +490,15 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
unsigned long charged = 0;
bool locked = false;

- if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
+ if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
return ret;

if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
return ret;

+ if (flags & MREMAP_DONTUNMAP && !(flags & MREMAP_FIXED))
+ return ret;
+
if (offset_in_page(addr))
return ret;

@@ -505,7 +516,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
down_write(&current->mm->mmap_sem);

if (flags & MREMAP_FIXED) {
- ret = mremap_to(addr, old_len, new_addr, new_len,
+ ret = mremap_to(addr, old_len, new_addr, new_len, flags,
&locked);
goto out;
}
--
2.7.4

2016-04-02 21:54:49

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH 0/3] mm/mmap.c: don't unmap the overlapping VMA(s)

On Sat, Apr 02, 2016 at 09:17:31PM +0200, Piotr Kwapulinski wrote:
> @@ -19,6 +19,7 @@
> #define MAP_TYPE 0x0f /* Mask for type of mapping */
> #define MAP_FIXED 0x10 /* Interpret addr exactly */
> #define MAP_ANONYMOUS 0x20 /* don't use a file */
> +#define MAP_DONTUNMAP 0x40 /* don't unmap overlapping VMA */

NAK.

arch/powerpc/include/uapi/asm/mman.h:#define MAP_NORESERVE 0x40 /* don't reserve swap pages */
arch/sparc/include/uapi/asm/mman.h:#define MAP_NORESERVE 0x40 /* don't reserve swap pages */
arch/x86/include/uapi/asm/mman.h:#define MAP_32BIT 0x40 /* only give out 32bit addresses */

--
Kirill A. Shutemov

2016-04-03 05:52:41

by Konstantin Khlebnikov

[permalink] [raw]
Subject: Re: [PATCH 0/3] mm/mmap.c: don't unmap the overlapping VMA(s)

On Sat, Apr 2, 2016 at 10:17 PM, Piotr Kwapulinski
<[email protected]> wrote:
> Currently the mmap(MAP_FIXED) discards the overlapping part of the
> existing VMA(s).
> Introduce the new MAP_DONTUNMAP flag which forces the mmap to fail
> with ENOMEM whenever the overlapping occurs and MAP_FIXED is set.
> No existing mapping(s) is discarded.

How userspace is supposed to use this and handle failure?

For now you can get the same behavior in couple syscalls:
mmap without MAP_FIXED if resulting address differs unmmap and handle error.
Twice slower but this is error-path so you anyway have to some extra actions.

> The implementation tests the MAP_DONTUNMAP flag right before unmapping
> the VMA. The tile arch is the dependency of mmap_flags.
>
> I did the isolated tests and also tested it with Gentoo full
> installation.
>
> Signed-off-by: Piotr Kwapulinski <[email protected]>
> ---
> arch/tile/mm/elf.c | 1 +
> include/linux/mm.h | 3 ++-
> include/uapi/asm-generic/mman-common.h | 1 +
> mm/mmap.c | 10 +++++++---
> 4 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/arch/tile/mm/elf.c b/arch/tile/mm/elf.c
> index 6225cc9..dae4b33 100644
> --- a/arch/tile/mm/elf.c
> +++ b/arch/tile/mm/elf.c
> @@ -142,6 +142,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
> if (!retval) {
> unsigned long addr = MEM_USER_INTRPT;
> addr = mmap_region(NULL, addr, INTRPT_SIZE,
> + MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE,
> VM_READ|VM_EXEC|
> VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0);
> if (addr > (unsigned long) -PAGE_SIZE)
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ed6407d..31dcdfb 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2048,7 +2048,8 @@ extern int install_special_mapping(struct mm_struct *mm,
> extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
>
> extern unsigned long mmap_region(struct file *file, unsigned long addr,
> - unsigned long len, vm_flags_t vm_flags, unsigned long pgoff);
> + unsigned long len, unsigned long mmap_flags,
> + vm_flags_t vm_flags, unsigned long pgoff);
> extern unsigned long do_mmap(struct file *file, unsigned long addr,
> unsigned long len, unsigned long prot, unsigned long flags,
> vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate);
> diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
> index 5827438..3655be3 100644
> --- a/include/uapi/asm-generic/mman-common.h
> +++ b/include/uapi/asm-generic/mman-common.h
> @@ -19,6 +19,7 @@
> #define MAP_TYPE 0x0f /* Mask for type of mapping */
> #define MAP_FIXED 0x10 /* Interpret addr exactly */
> #define MAP_ANONYMOUS 0x20 /* don't use a file */
> +#define MAP_DONTUNMAP 0x40 /* don't unmap overlapping VMA */
> #ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
> # define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be uninitialized */
> #else
> diff --git a/mm/mmap.c b/mm/mmap.c
> index bd2e1a53..ab429c3 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1286,7 +1286,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> vm_flags |= VM_NORESERVE;
> }
>
> - addr = mmap_region(file, addr, len, vm_flags, pgoff);
> + addr = mmap_region(file, addr, len, flags, vm_flags, pgoff);
> if (!IS_ERR_VALUE(addr) &&
> ((vm_flags & VM_LOCKED) ||
> (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
> @@ -1422,7 +1422,8 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
> }
>
> unsigned long mmap_region(struct file *file, unsigned long addr,
> - unsigned long len, vm_flags_t vm_flags, unsigned long pgoff)
> + unsigned long len, unsigned long mmap_flags,
> + vm_flags_t vm_flags, unsigned long pgoff)
> {
> struct mm_struct *mm = current->mm;
> struct vm_area_struct *vma, *prev;
> @@ -1448,7 +1449,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> /* Clear old maps */
> while (find_vma_links(mm, addr, addr + len, &prev, &rb_link,
> &rb_parent)) {
> - if (do_munmap(mm, addr, len))
> + const bool dont_unmap =
> + (mmap_flags & (MAP_DONTUNMAP | MAP_FIXED))
> + == (MAP_DONTUNMAP | MAP_FIXED);
> + if (dont_unmap || do_munmap(mm, addr, len))
> return -ENOMEM;
> }
>
> --
> 2.7.4
>

2016-04-04 07:31:07

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 0/3] mm/mmap.c: don't unmap the overlapping VMA(s)

On Sat 02-04-16 21:17:31, Piotr Kwapulinski wrote:
> Currently the mmap(MAP_FIXED) discards the overlapping part of the
> existing VMA(s).
> Introduce the new MAP_DONTUNMAP flag which forces the mmap to fail
> with ENOMEM whenever the overlapping occurs and MAP_FIXED is set.
> No existing mapping(s) is discarded.

You forgot to tell us what is the use case for this new flag.
--
Michal Hocko
SUSE Labs

2016-04-04 15:26:51

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 0/3] mm/mmap.c: don't unmap the overlapping VMA(s)

On 04/04/2016 09:31 AM, Michal Hocko wrote:
> On Sat 02-04-16 21:17:31, Piotr Kwapulinski wrote:
>> Currently the mmap(MAP_FIXED) discards the overlapping part of the
>> existing VMA(s).
>> Introduce the new MAP_DONTUNMAP flag which forces the mmap to fail
>> with ENOMEM whenever the overlapping occurs and MAP_FIXED is set.
>> No existing mapping(s) is discarded.
>
> You forgot to tell us what is the use case for this new flag.

Exactly. Also, returning ENOMEM is strange, EINVAL might be a better
match, otherwise how would you distinguish a "geunine" ENOMEM from
passing a wrong address?


2016-04-07 16:11:37

by Piotr Kwapulinski

[permalink] [raw]
Subject: Re: [PATCH 0/3] mm/mmap.c: don't unmap the overlapping VMA(s)

On Mon, Apr 04, 2016 at 05:26:43PM +0200, Vlastimil Babka wrote:
> On 04/04/2016 09:31 AM, Michal Hocko wrote:
> >On Sat 02-04-16 21:17:31, Piotr Kwapulinski wrote:
> >>Currently the mmap(MAP_FIXED) discards the overlapping part of the
> >>existing VMA(s).
> >>Introduce the new MAP_DONTUNMAP flag which forces the mmap to fail
> >>with ENOMEM whenever the overlapping occurs and MAP_FIXED is set.
> >>No existing mapping(s) is discarded.
> >
> >You forgot to tell us what is the use case for this new flag.
>
> Exactly. Also, returning ENOMEM is strange, EINVAL might be a better match,
> otherwise how would you distinguish a "geunine" ENOMEM from passing a wrong
> address?
>
>

Thanks to all for suggestions. I'll fix them.

The example use case:
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

void main(void)
{
void* addr = (void*)0x1000000;
size_t size = 0x600000;
void* start = 0;
start = mmap(addr,
size,
PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
-1, 0);

strcpy(start, "PPPP");
printf("%s\n", start); // == PPPP

addr = (void*)0x1000000;
size = 0x9000;
start = mmap(addr,
size,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
-1, 0);

printf("%s\n", start); // != PPPP
}

Another use case, this time with huge pages in action.
The limit configured in proc's nr_hugepages is exceeded.
mmap unmaps the area and fails. No new mapping is created.
The program segfaults.

echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

void main(void)
{
void* addr = (void*)0x1000000;
size_t size = 0x600000;
void* start = 0;
start = mmap(addr,
size,
PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
-1, 0);

strcpy(start, "PPPP");
printf("%s\n", start); // == PPPP

addr = (void*)0x1000000;
size = 0x400000;
start = mmap(addr,
size,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_HUGETLB,
-1, 0); // mmap fails but unmaps the area

printf("%s\n", start); // segfault
}

Piotr

2016-04-07 16:20:54

by Piotr Kwapulinski

[permalink] [raw]
Subject: Re: [PATCH 0/3] mm/mmap.c: don't unmap the overlapping VMA(s)

On Mon, Apr 04, 2016 at 05:26:43PM +0200, Vlastimil Babka wrote:
> On 04/04/2016 09:31 AM, Michal Hocko wrote:
> >On Sat 02-04-16 21:17:31, Piotr Kwapulinski wrote:
> >>Currently the mmap(MAP_FIXED) discards the overlapping part of the
> >>existing VMA(s).
> >>Introduce the new MAP_DONTUNMAP flag which forces the mmap to fail
> >>with ENOMEM whenever the overlapping occurs and MAP_FIXED is set.
> >>No existing mapping(s) is discarded.
> >
> >You forgot to tell us what is the use case for this new flag.
>
> Exactly. Also, returning ENOMEM is strange, EINVAL might be a better match,
> otherwise how would you distinguish a "geunine" ENOMEM from passing a wrong
> address?
>
>

Thanks to all for suggestions. I'll fix them.

The example use case:
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

void main(void)
{
void* addr = (void*)0x1000000;
size_t size = 0x600000;
void* start = 0;
start = mmap(addr,
size,
PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
-1, 0);

strcpy(start, "PPPP");
printf("%s\n", start); // == PPPP

addr = (void*)0x1000000;
size = 0x9000;
start = mmap(addr,
size,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
-1, 0);

printf("%s\n", start); // != PPPP
}

Another use case, this time with huge pages in action.
The limit configured in proc's nr_hugepages is exceeded.
mmap unmaps the area and fails. No new mapping is created.
The program segfaults.

echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

void main(void)
{
void* addr = (void*)0x1000000;
size_t size = 0x600000;
void* start = 0;
start = mmap(addr,
size,
PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
-1, 0);

strcpy(start, "PPPP");
printf("%s\n", start); // == PPPP

addr = (void*)0x1000000;
size = 0x400000;
start = mmap(addr,
size,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_HUGETLB,
-1, 0); // mmap fails but unmaps the area

printf("%s\n", addr); // segfault
}

Piotr

2016-04-07 16:31:15

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 0/3] mm/mmap.c: don't unmap the overlapping VMA(s)

On Thu 07-04-16 18:11:29, Piotr Kwapulinski wrote:
> On Mon, Apr 04, 2016 at 05:26:43PM +0200, Vlastimil Babka wrote:
> > On 04/04/2016 09:31 AM, Michal Hocko wrote:
> > >On Sat 02-04-16 21:17:31, Piotr Kwapulinski wrote:
> > >>Currently the mmap(MAP_FIXED) discards the overlapping part of the
> > >>existing VMA(s).
> > >>Introduce the new MAP_DONTUNMAP flag which forces the mmap to fail
> > >>with ENOMEM whenever the overlapping occurs and MAP_FIXED is set.
> > >>No existing mapping(s) is discarded.
> > >
> > >You forgot to tell us what is the use case for this new flag.
> >
> > Exactly. Also, returning ENOMEM is strange, EINVAL might be a better match,
> > otherwise how would you distinguish a "geunine" ENOMEM from passing a wrong
> > address?
> >
> >
>
> Thanks to all for suggestions. I'll fix them.
>
> The example use case:
> #include <stdio.h>
> #include <string.h>
> #include <sys/mman.h>
>
> void main(void)
> {
> void* addr = (void*)0x1000000;
> size_t size = 0x600000;
> void* start = 0;
> start = mmap(addr,
> size,
> PROT_WRITE,
> MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> -1, 0);
>
> strcpy(start, "PPPP");
> printf("%s\n", start); // == PPPP
>
> addr = (void*)0x1000000;
> size = 0x9000;
> start = mmap(addr,
> size,
> PROT_READ | PROT_WRITE,
> MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> -1, 0);
>
> printf("%s\n", start); // != PPPP
> }
>
> Another use case, this time with huge pages in action.
> The limit configured in proc's nr_hugepages is exceeded.
> mmap unmaps the area and fails. No new mapping is created.
> The program segfaults.

Yes and this is the standard behavior for ages. So _why_ somebody wants
non-default behavior. When I've asked for the use case I meant a real
life code (not just an example snippet) which cannot cope with the
standard semantic. In other words why this cannot be handled in the
userspace and we have to add a new API which we have to maintain for
ever?
--
Michal Hocko
SUSE Labs

2016-04-08 15:32:43

by Piotr Kwapulinski

[permalink] [raw]
Subject: Re: [PATCH 0/3] mm/mmap.c: don't unmap the overlapping VMA(s)

On Thu, Apr 07, 2016 at 06:31:09PM +0200, Michal Hocko wrote:
> On Thu 07-04-16 18:11:29, Piotr Kwapulinski wrote:
> > On Mon, Apr 04, 2016 at 05:26:43PM +0200, Vlastimil Babka wrote:
> > > On 04/04/2016 09:31 AM, Michal Hocko wrote:
> > > >On Sat 02-04-16 21:17:31, Piotr Kwapulinski wrote:
> > > >>Currently the mmap(MAP_FIXED) discards the overlapping part of the
> > > >>existing VMA(s).
> > > >>Introduce the new MAP_DONTUNMAP flag which forces the mmap to fail
> > > >>with ENOMEM whenever the overlapping occurs and MAP_FIXED is set.
> > > >>No existing mapping(s) is discarded.
> > > >
> > > >You forgot to tell us what is the use case for this new flag.
> > >
> > > Exactly. Also, returning ENOMEM is strange, EINVAL might be a better match,
> > > otherwise how would you distinguish a "geunine" ENOMEM from passing a wrong
> > > address?
> > >
> > >
> >
> > Thanks to all for suggestions. I'll fix them.
> >
> > The example use case:
> > #include <stdio.h>
> > #include <string.h>
> > #include <sys/mman.h>
> >
> > void main(void)
> > {
> > void* addr = (void*)0x1000000;
> > size_t size = 0x600000;
> > void* start = 0;
> > start = mmap(addr,
> > size,
> > PROT_WRITE,
> > MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> > -1, 0);
> >
> > strcpy(start, "PPPP");
> > printf("%s\n", start); // == PPPP
> >
> > addr = (void*)0x1000000;
> > size = 0x9000;
> > start = mmap(addr,
> > size,
> > PROT_READ | PROT_WRITE,
> > MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> > -1, 0);
> >
> > printf("%s\n", start); // != PPPP
> > }
> >
> > Another use case, this time with huge pages in action.
> > The limit configured in proc's nr_hugepages is exceeded.
> > mmap unmaps the area and fails. No new mapping is created.
> > The program segfaults.
>
> Yes and this is the standard behavior for ages. So _why_ somebody wants
> non-default behavior. When I've asked for the use case I meant a real
> life code (not just an example snippet) which cannot cope with the
> standard semantic. In other words why this cannot be handled in the
> userspace and we have to add a new API which we have to maintain for
> ever?

Ok, I got it. Thanks for feedback.