2017-03-06 14:36:54

by Dmitry Safonov

Subject: [PATCHv6 0/5] Fix compatible mmap() return pointer over 4Gb

Note: this patch set has some minor conflicts with Kirill's set for
5-level paging (with the patch "mm, x86: introduce
PR_SET_MAX_VADDR and PR_GET_MAX_VADDR").
The conflicts are minor and I'm fine with rebasing this set on linux-next
on top of his set, or with helping him rebase (if he needs it).

There are several fixes related to x86 mmap():
o 1-2 are just preparation to introduce the new mmap bases
o 3 fixes the 32-bit syscall returning an address above 4Gb in
  applications launched from 64-bit binaries. This is done by introducing
  new bases: mmap_compat_base and mmap_compat_legacy_base.
  Those bases are separate from the 64-bit ones, which allows choosing
  the mmap base according to the bitness of the syscall.
  This makes the behavior of 32-bit syscalls the same regardless of the
  launched binary's bitness (and likewise for 64-bit syscalls).
  It also makes it possible for compat ELFs to allocate addresses above
  4Gb with 64-bit mmap() - useful when 4Gb is not enough, or with
  MAP_FIXED for hiding a mapping from the 32-bit address space.
o 4 fixes the behavior of MAP_32BIT - at the moment it depends on the
  bitness of the executed binary, not on the bitness of the syscall.
o 5 is a selftest that checks that 32-bit mmap() does return a 32-bit
  pointer.

Changes since v5:
- ifdef fixup (kbuild test robot)
- rebase on linux-next-20170306 (minor: sysret_rip test added)

Changes since v4 (Thomas's review):
- rewrote changelogs (so they are readable by humans, too)
- made the code simpler (fighting the ifdef horror, etc.)

Changes since v3:
- fixed usage of a 64-bit random mask for the 32-bit mm->mmap_compat_base
  when introducing mmap_compat{_legacy,}_base

Changes since v2:
- don't distinguish native and compat tasks by TIF_ADDR32;
  introduced mmap_compat{_legacy,}_base, which allows treating them
  the same
- fixed kbuild errors

Changes since v1:
- Recalculate mmap_base instead of using the maximum possible virtual
  address for compat/native syscalls. That makes the allocation policy
  the same for 32-bit binaries and for 32-bit syscalls in 64-bit
  binaries. I need this because sys_mmap() in a restored 32-bit process
  shouldn't hit the stack area.
- Fixed mmap() with the MAP_32BIT flag in the same use cases
- used the in_compat_syscall() helper rather than a TS_COMPAT check
  (Andy noticed)
- introduced the find_top() helper as suggested by Andy to simplify code
- fixed the test's error handling: it checked the result of sys_mmap()
  against MAP_FAILED, which is not correct as it calls the raw syscall -
  now it checks that the return value is page-aligned.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Cyrill Gorcunov <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>

Dmitry Safonov (5):
x86/mm: introduce arch_rnd() to compute 32/64 mmap rnd
x86/mm: add task_size parameter to mmap_base()
x86/mm: introduce mmap_compat_base for 32-bit mmap()
x86/mm: check in_compat_syscall() instead TIF_ADDR32 for
mmap(MAP_32BIT)
selftests/x86: add test for 32-bit mmap() return addr

arch/Kconfig | 7 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/elf.h | 27 ++--
arch/x86/include/asm/processor.h | 4 +-
arch/x86/kernel/sys_x86_64.c | 27 +++-
arch/x86/mm/mmap.c | 109 ++++++++-----
include/linux/mm_types.h | 5 +
tools/testing/selftests/x86/Makefile | 2 +-
tools/testing/selftests/x86/test_compat_mmap.c | 208 +++++++++++++++++++++++++
9 files changed, 332 insertions(+), 58 deletions(-)
create mode 100644 tools/testing/selftests/x86/test_compat_mmap.c

--
2.11.1


2017-03-06 14:37:38

by Dmitry Safonov

Subject: [PATCHv6 1/5] x86/mm: introduce arch_rnd() to compute 32/64 mmap rnd

To fix the 32-bit mmap() syscall returning a pointer above 4Gb in
64-bit binaries, two mmap bases will be used: one for mappings
allocated with 32-bit syscalls and another for 64-bit syscalls.
To place those two bases correctly, introduce the arch_rnd() function,
which returns the random factor independently of mmap_is_ia32().

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
---
arch/x86/mm/mmap.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 7940166c799b..f31ed7097d0b 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -55,6 +55,14 @@ static unsigned long stack_maxrandom_size(void)
#define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
#define MAX_GAP (TASK_SIZE/6*5)

+#ifdef CONFIG_COMPAT
+# define mmap32_rnd_bits mmap_rnd_compat_bits
+# define mmap64_rnd_bits mmap_rnd_bits
+#else
+# define mmap32_rnd_bits mmap_rnd_bits
+# define mmap64_rnd_bits mmap_rnd_bits
+#endif
+
static int mmap_is_legacy(void)
{
if (current->personality & ADDR_COMPAT_LAYOUT)
@@ -66,20 +74,14 @@ static int mmap_is_legacy(void)
return sysctl_legacy_va_layout;
}

-unsigned long arch_mmap_rnd(void)
+static unsigned long arch_rnd(unsigned int rndbits)
{
- unsigned long rnd;
-
- if (mmap_is_ia32())
-#ifdef CONFIG_COMPAT
- rnd = get_random_long() & ((1UL << mmap_rnd_compat_bits) - 1);
-#else
- rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
-#endif
- else
- rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
+ return (get_random_long() & ((1UL << rndbits) - 1)) << PAGE_SHIFT;
+}

- return rnd << PAGE_SHIFT;
+unsigned long arch_mmap_rnd(void)
+{
+ return arch_rnd(mmap_is_ia32() ? mmap32_rnd_bits : mmap64_rnd_bits);
}

static unsigned long mmap_base(unsigned long rnd)
--
2.11.1

2017-03-06 14:40:13

by Dmitry Safonov

Subject: [PATCHv6 4/5] x86/mm: check in_compat_syscall() instead TIF_ADDR32 for mmap(MAP_32BIT)

The result of mmap() calls with the MAP_32BIT flag currently depends on
the thread flag TIF_ADDR32, which is set during exec() for 32-bit apps.
This is broken, as the behavior of mmap() shouldn't depend on the
exec-ed application's bitness. Instead, it should check the bitness of
the mmap() syscall.
How it worked before:
o for 32-bit compat binaries the flag is completely ignored. That was
  fine while there was a single mmap_base, computed for 32-bit syscalls.
  After introducing mmap_compat_base, 64-bit syscalls use the mmap_base
  computed for 64-bit syscalls, which means that an application launched
  from a 32-bit compat binary can allocate a 64-bit address with a
  64-bit syscall. Ignoring the flag there is not the expected behavior.
o for 64-bit ELFs it forces legacy bottom-up allocations and restricts
  allocations to the 1Gb address space [0x40000000, 0x80000000) - see
  find_start_end(). That means it was wrongly handled for 32-bit
  syscalls - they need neither this restriction nor legacy mmap (as we
  try to keep 32-bit syscall behavior the same independently of the
  native/compat mode of the executed ELF).

Change the mmap() behavior for the MAP_32BIT flag so that for 32-bit
syscalls it is always ignored, while for 64-bit syscalls it always
returns a 32-bit pointer restricted to the 1Gb address space,
independently of the executed binary's bitness.

Signed-off-by: Dmitry Safonov <[email protected]>
---
arch/x86/kernel/sys_x86_64.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index c54817baabc7..63e89dfc808a 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -115,7 +115,7 @@ static unsigned long get_mmap_base(int is_legacy)
static void find_start_end(unsigned long flags, unsigned long *begin,
unsigned long *end)
{
- if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT)) {
+ if (!in_compat_syscall() && (flags & MAP_32BIT)) {
/* This is usually used needed to map code in small
model, so it needs to be in the first 31bit. Limit
it to that. This means we need to move the
@@ -191,7 +191,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
return addr;

/* for MAP_32BIT mappings we force the legacy mmap base */
- if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT))
+ if (!in_compat_syscall() && (flags & MAP_32BIT))
goto bottomup;

/* requesting a specific address */
--
2.11.1

2017-03-06 14:40:32

by Dmitry Safonov

Subject: [PATCHv6 3/5] x86/mm: introduce mmap_compat_base for 32-bit mmap()

mmap() uses a base address from which it starts to look for free space
for an allocation. At the moment there is one mm->mmap_base, which is
calculated during exec(). The address depends on the task size, the
stack rlimit and ASLR randomization. As the task size and the number of
random bits differ between 64-bit and 32-bit applications, the
calculated mmap_base is only valid for the same bitness.
That means, e.g., that the mmap_base calculated for an ELF64 binary
lies above 4Gb, which results in the bug that a 32-bit mmap() syscall
searches for a free address above the 32-bit address space and returns
only the lower 4 bytes of the allocated mapping.
As 64-bit applications can do 32-bit syscalls and vice versa, we need
to correctly choose the mmap_base address for syscalls of either
bitness. For this purpose introduce mmap_compat_base and
mmap_compat_legacy_base, and use them accordingly in top-down and
bottom-up allocations for 32-bit syscalls; use the existing bases
mmap_base and mmap_legacy_base for 64-bit syscalls.
That means that each application on x86_64 will now have two bases (or
four, counting the legacy bases) which are calculated at exec() time.
The calculation could be deferred until the first mmap() call, but that
doesn't seem worth the complexity.

Signed-off-by: Dmitry Safonov <[email protected]>
---
arch/Kconfig | 7 +++++++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/elf.h | 3 +++
arch/x86/kernel/sys_x86_64.c | 23 +++++++++++++++++++----
arch/x86/mm/mmap.c | 41 ++++++++++++++++++++++++++++-------------
include/linux/mm_types.h | 5 +++++
6 files changed, 63 insertions(+), 17 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index cd211a14a88f..c4d6833aacd9 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -700,6 +700,13 @@ config ARCH_MMAP_RND_COMPAT_BITS
This value can be changed after boot using the
/proc/sys/vm/mmap_rnd_compat_bits tunable

+config HAVE_ARCH_COMPAT_MMAP_BASES
+ bool
+ help
+ This allows 64bit applications to invoke 32-bit mmap() syscall
+ and vice-versa 32-bit applications to call 64-bit mmap().
+ Required for applications doing different bitness syscalls.
+
config HAVE_COPY_THREAD_TLS
bool
help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cc98d5a294ee..2bab9d093b51 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -106,6 +106,7 @@ config X86
select HAVE_ARCH_KMEMCHECK
select HAVE_ARCH_MMAP_RND_BITS if MMU
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT
+ select HAVE_ARCH_COMPAT_MMAP_BASES if MMU && COMPAT
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index b908141cf0c4..ac5be5ba8527 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -303,6 +303,9 @@ static inline int mmap_is_ia32(void)
test_thread_flag(TIF_ADDR32));
}

+extern unsigned long tasksize_32bit(void);
+extern unsigned long tasksize_64bit(void);
+
#ifdef CONFIG_X86_32

#define __STACK_RND_MASK(is32bit) (0x7ff)
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 50215a4b9347..c54817baabc7 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -17,6 +17,8 @@
#include <linux/uaccess.h>
#include <linux/elf.h>

+#include <asm/elf.h>
+#include <asm/compat.h>
#include <asm/ia32.h>
#include <asm/syscalls.h>

@@ -98,6 +100,18 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
return error;
}

+static unsigned long get_mmap_base(int is_legacy)
+{
+ struct mm_struct *mm = current->mm;
+
+#ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
+ if (in_compat_syscall())
+ return is_legacy ? mm->mmap_compat_legacy_base
+ : mm->mmap_compat_base;
+#endif
+ return is_legacy ? mm->mmap_legacy_base : mm->mmap_base;
+}
+
static void find_start_end(unsigned long flags, unsigned long *begin,
unsigned long *end)
{
@@ -114,10 +128,11 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
if (current->flags & PF_RANDOMIZE) {
*begin = randomize_page(*begin, 0x02000000);
}
- } else {
- *begin = current->mm->mmap_legacy_base;
- *end = TASK_SIZE;
+ return;
}
+
+ *begin = get_mmap_base(1);
+ *end = in_compat_syscall() ? tasksize_32bit() : tasksize_64bit();
}

unsigned long
@@ -191,7 +206,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
info.flags = VM_UNMAPPED_AREA_TOPDOWN;
info.length = len;
info.low_limit = PAGE_SIZE;
- info.high_limit = mm->mmap_base;
+ info.high_limit = get_mmap_base(0);
info.align_mask = 0;
info.align_offset = pgoff << PAGE_SHIFT;
if (filp) {
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 1e9cb945dca1..84a0ffc550fe 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -36,11 +36,16 @@ struct va_alignment __read_mostly va_align = {
.flags = -1,
};

-static inline unsigned long tasksize_32bit(void)
+unsigned long tasksize_32bit(void)
{
return IA32_PAGE_OFFSET;
}

+unsigned long tasksize_64bit(void)
+{
+ return TASK_SIZE_MAX;
+}
+
static unsigned long stack_maxrandom_size(unsigned long task_size)
{
unsigned long max = 0;
@@ -81,6 +86,8 @@ static unsigned long arch_rnd(unsigned int rndbits)

unsigned long arch_mmap_rnd(void)
{
+ if (!(current->flags & PF_RANDOMIZE))
+ return 0;
return arch_rnd(mmap_is_ia32() ? mmap32_rnd_bits : mmap64_rnd_bits);
}

@@ -114,22 +121,30 @@ static unsigned long mmap_legacy_base(unsigned long rnd,
* This function, called very early during the creation of a new
* process VM image, sets up which VM layout function to use:
*/
-void arch_pick_mmap_layout(struct mm_struct *mm)
+static void arch_pick_mmap_base(unsigned long *base, unsigned long *legacy_base,
+ unsigned long random_factor, unsigned long task_size)
{
- unsigned long random_factor = 0UL;
-
- if (current->flags & PF_RANDOMIZE)
- random_factor = arch_mmap_rnd();
-
- mm->mmap_legacy_base = mmap_legacy_base(random_factor, TASK_SIZE);
+ *legacy_base = mmap_legacy_base(random_factor, task_size);
+ if (mmap_is_legacy())
+ *base = *legacy_base;
+ else
+ *base = mmap_base(random_factor, task_size);
+}

- if (mmap_is_legacy()) {
- mm->mmap_base = mm->mmap_legacy_base;
+void arch_pick_mmap_layout(struct mm_struct *mm)
+{
+ if (mmap_is_legacy())
mm->get_unmapped_area = arch_get_unmapped_area;
- } else {
- mm->mmap_base = mmap_base(random_factor, TASK_SIZE);
+ else
mm->get_unmapped_area = arch_get_unmapped_area_topdown;
- }
+
+ arch_pick_mmap_base(&mm->mmap_base, &mm->mmap_legacy_base,
+ arch_rnd(mmap64_rnd_bits), tasksize_64bit());
+
+#ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
+ arch_pick_mmap_base(&mm->mmap_compat_base, &mm->mmap_compat_legacy_base,
+ arch_rnd(mmap32_rnd_bits), tasksize_32bit());
+#endif
}

const char *arch_vma_name(struct vm_area_struct *vma)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index f60f45fe226f..45cdb27791a3 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -367,6 +367,11 @@ struct mm_struct {
#endif
unsigned long mmap_base; /* base of mmap area */
unsigned long mmap_legacy_base; /* base of mmap area in bottom-up allocations */
+#ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
+ /* Base adresses for compatible mmap() */
+ unsigned long mmap_compat_base;
+ unsigned long mmap_compat_legacy_base;
+#endif
unsigned long task_size; /* size of task vm space */
unsigned long highest_vm_end; /* highest vma end address */
pgd_t * pgd;
--
2.11.1

2017-03-06 14:59:31

by Dmitry Safonov

Subject: [PATCHv6 5/5] selftests/x86: add test for 32-bit mmap() return addr

This test calls 32-bit mmap() through int 0x80 and checks the allocated
VMA's address in /proc/self/maps - it should be lower than 4Gb. Just
dereferencing the pointer returned by mmap() will not work, as some VMA
may already be placed at the address matching the lower 4 bytes of the
new mapping. As allocation is top-down by default (unless a legacy
personality was set), we can expect mmap() to allocate memory above 4Gb
if mmap_base has been computed incorrectly.

On failure it prints:
[NOTE] Allocated mmap 0x6f36a000, sized 0x400000
[NOTE] New mapping appeared: 0x7f936f36a000
[FAIL] Found VMA [0x7f936f36a000, 0x7f936f76a000] in maps file, that was allocated with compat syscall

Cc: Shuah Khan <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
tools/testing/selftests/x86/Makefile | 2 +-
tools/testing/selftests/x86/test_compat_mmap.c | 208 +++++++++++++++++++++++++
2 files changed, 209 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/x86/test_compat_mmap.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 38e0a9ca5d71..959224d6750d 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -10,7 +10,7 @@ TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_sysc
TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \
test_FCMOV test_FCOMI test_FISTTP \
vdso_restorer
-TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip
+TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip test_compat_mmap

TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
TARGETS_C_64BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_64BIT_ONLY)
diff --git a/tools/testing/selftests/x86/test_compat_mmap.c b/tools/testing/selftests/x86/test_compat_mmap.c
new file mode 100644
index 000000000000..245d9407653e
--- /dev/null
+++ b/tools/testing/selftests/x86/test_compat_mmap.c
@@ -0,0 +1,208 @@
+/*
+ * Check that compat 32-bit mmap() returns address < 4Gb on 64-bit.
+ *
+ * Copyright (c) 2017 Dmitry Safonov (Virtuozzo)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <sys/mman.h>
+#include <sys/types.h>
+
+#include <stdio.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <signal.h>
+#include <stdlib.h>
+
+#define PAGE_SIZE 4096
+#define MMAP_SIZE (PAGE_SIZE*1024)
+#define MAX_VMAS 50
+#define BUF_SIZE 1024
+
+#ifndef __NR32_mmap2
+#define __NR32_mmap2 192
+#endif
+
+struct syscall_args32 {
+ uint32_t nr, arg0, arg1, arg2, arg3, arg4, arg5;
+};
+
+static void do_full_int80(struct syscall_args32 *args)
+{
+ asm volatile ("int $0x80"
+ : "+a" (args->nr),
+ "+b" (args->arg0), "+c" (args->arg1), "+d" (args->arg2),
+ "+S" (args->arg3), "+D" (args->arg4),
+ "+rbp" (args->arg5)
+ : : "r8", "r9", "r10", "r11");
+}
+
+void *mmap2(void *addr, size_t len, int prot, int flags,
+ int fildes, off_t off)
+{
+ struct syscall_args32 s;
+
+ s.nr = __NR32_mmap2;
+ s.arg0 = (uint32_t)(uintptr_t)addr;
+ s.arg1 = (uint32_t)len;
+ s.arg2 = prot;
+ s.arg3 = flags;
+ s.arg4 = fildes;
+ s.arg5 = (uint32_t)off;
+
+ do_full_int80(&s);
+
+ return (void *)(uintptr_t)s.nr;
+}
+
+struct vm_area {
+ unsigned long start;
+ unsigned long end;
+};
+
+static struct vm_area vmas_before_mmap[MAX_VMAS];
+static struct vm_area vmas_after_mmap[MAX_VMAS];
+
+static char buf[BUF_SIZE];
+
+int parse_maps(struct vm_area *vmas)
+{
+ FILE *maps;
+ int i;
+
+ maps = fopen("/proc/self/maps", "r");
+ if (maps == NULL) {
+ printf("[ERROR]\tFailed to open maps file: %m\n");
+ return -1;
+ }
+
+ for (i = 0; i < MAX_VMAS; i++) {
+ struct vm_area *v = &vmas[i];
+ char *end;
+
+ if (fgets(buf, BUF_SIZE, maps) == NULL)
+ break;
+
+ v->start = strtoul(buf, &end, 16);
+ v->end = strtoul(end + 1, NULL, 16);
+ //printf("[NOTE]\tVMA: [%#lx, %#lx]\n", v->start, v->end);
+ }
+
+ if (i == MAX_VMAS) {
+ printf("[ERROR]\tNumber of VMAs is bigger than reserved array's size\n");
+ return -1;
+ }
+
+ if (fclose(maps)) {
+ printf("[ERROR]\tFailed to close maps file: %m\n");
+ return -1;
+ }
+ return 0;
+}
+
+int compare_vmas(struct vm_area *vmax, struct vm_area *vmay)
+{
+ if (vmax->start > vmay->start)
+ return 1;
+ if (vmax->start < vmay->start)
+ return -1;
+ if (vmax->end > vmay->end)
+ return 1;
+ if (vmax->end < vmay->end)
+ return -1;
+ return 0;
+}
+
+unsigned long vma_size(struct vm_area *v)
+{
+ return v->end - v->start;
+}
+
+int find_new_vma_like(struct vm_area *vma)
+{
+ int i, j = 0, found_alike = -1;
+
+ for (i = 0; i < MAX_VMAS && j < MAX_VMAS; i++, j++) {
+ int cmp = compare_vmas(&vmas_before_mmap[i],
+ &vmas_after_mmap[j]);
+
+ if (cmp == 0)
+ continue;
+ if (cmp < 0) {/* Lost mapping */
+ printf("[NOTE]\tLost mapping: %#lx\n",
+ vmas_before_mmap[i].start);
+ j--;
+ continue;
+ }
+
+ printf("[NOTE]\tNew mapping appeared: %#lx\n",
+ vmas_after_mmap[j].start);
+ i--;
+ if (!compare_vmas(&vmas_after_mmap[j], vma))
+ return 0;
+
+ if (((vmas_after_mmap[j].start & 0xffffffff) == vma->start) &&
+ (vma_size(&vmas_after_mmap[j]) == vma_size(vma)))
+ found_alike = j;
+ }
+
+ /* Left new vmas in tail */
+ for (; i < MAX_VMAS; i++)
+ if (!compare_vmas(&vmas_after_mmap[j], vma))
+ return 0;
+
+ if (found_alike != -1) {
+ printf("[FAIL]\tFound VMA [%#lx, %#lx] in maps file, that was allocated with compat syscall\n",
+ vmas_after_mmap[found_alike].start,
+ vmas_after_mmap[found_alike].end);
+ return -1;
+ }
+
+ printf("[ERROR]\tCan't find [%#lx, %#lx] in maps file\n",
+ vma->start, vma->end);
+ return -1;
+}
+
+int main(int argc, char **argv)
+{
+ void *map;
+ struct vm_area vma;
+
+ if (parse_maps(vmas_before_mmap)) {
+ printf("[ERROR]\tFailed to parse maps file\n");
+ return 1;
+ }
+
+ map = mmap2(0, MMAP_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC,
+ MAP_PRIVATE | MAP_ANON, -1, 0);
+ if (((uintptr_t)map) % PAGE_SIZE) {
+ printf("[ERROR]\tmmap2 failed: %d\n",
+ (~(uint32_t)(uintptr_t)map) + 1);
+ return 1;
+ } else {
+ printf("[NOTE]\tAllocated mmap %p, sized %#x\n", map, MMAP_SIZE);
+ }
+
+ if (parse_maps(vmas_after_mmap)) {
+ printf("[ERROR]\tFailed to parse maps file\n");
+ return 1;
+ }
+
+ munmap(map, MMAP_SIZE);
+
+ vma.start = (unsigned long)(uintptr_t)map;
+ vma.end = vma.start + MMAP_SIZE;
+ if (find_new_vma_like(&vma))
+ return 1;
+
+ printf("[OK]\n");
+
+ return 0;
+}
--
2.11.1

2017-03-06 17:04:30

by Dmitry Safonov

Subject: [PATCHv6 2/5] x86/mm: add task_size parameter to mmap_base()

To correctly handle 32-bit and 64-bit mmap() syscalls, we need
different mmap bases to start allocation from. So, introduce the
mmap_legacy_base() helper and change mmap_base() to return a base
address according to the specified task size.
This prepares the mmap base computation code for splitting mmap_base
into two bases: one for 64-bit syscalls and one for 32-bit syscalls.

Signed-off-by: Dmitry Safonov <[email protected]>
---
arch/x86/include/asm/elf.h | 24 ++++++++++---------
arch/x86/include/asm/processor.h | 4 +++-
arch/x86/mm/mmap.c | 50 +++++++++++++++++++++++++---------------
3 files changed, 48 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 9d49c18b5ea9..b908141cf0c4 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -293,8 +293,19 @@ do { \
} \
} while (0)

+/*
+ * True on X86_32 or when emulating IA32 on X86_64
+ */
+static inline int mmap_is_ia32(void)
+{
+ return IS_ENABLED(CONFIG_X86_32) ||
+ (IS_ENABLED(CONFIG_COMPAT) &&
+ test_thread_flag(TIF_ADDR32));
+}
+
#ifdef CONFIG_X86_32

+#define __STACK_RND_MASK(is32bit) (0x7ff)
#define STACK_RND_MASK (0x7ff)

#define ARCH_DLINFO ARCH_DLINFO_IA32
@@ -304,7 +315,8 @@ do { \
#else /* CONFIG_X86_32 */

/* 1GB for 64bit, 8MB for 32bit */
-#define STACK_RND_MASK (test_thread_flag(TIF_ADDR32) ? 0x7ff : 0x3fffff)
+#define __STACK_RND_MASK(is32bit) ((is32bit) ? 0x7ff : 0x3fffff)
+#define STACK_RND_MASK __STACK_RND_MASK(mmap_is_ia32())

#define ARCH_DLINFO \
do { \
@@ -348,16 +360,6 @@ extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
int uses_interp);
#define compat_arch_setup_additional_pages compat_arch_setup_additional_pages

-/*
- * True on X86_32 or when emulating IA32 on X86_64
- */
-static inline int mmap_is_ia32(void)
-{
- return IS_ENABLED(CONFIG_X86_32) ||
- (IS_ENABLED(CONFIG_COMPAT) &&
- test_thread_flag(TIF_ADDR32));
-}
-
/* Do not change the values. See get_align_mask() */
enum align_flags {
ALIGN_VA_32 = BIT(0),
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index f385eca5407a..7caa2ac50ea2 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -797,6 +797,7 @@ static inline void spin_lock_prefetch(const void *x)
/*
* User space process size: 3GB (default).
*/
+#define IA32_PAGE_OFFSET PAGE_OFFSET
#define TASK_SIZE PAGE_OFFSET
#define TASK_SIZE_MAX TASK_SIZE
#define STACK_TOP TASK_SIZE
@@ -873,7 +874,8 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
* This decides where the kernel will search for a free chunk of vm
* space during mmap's.
*/
-#define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3))
+#define __TASK_UNMAPPED_BASE(task_size) (PAGE_ALIGN(task_size / 3))
+#define TASK_UNMAPPED_BASE __TASK_UNMAPPED_BASE(TASK_SIZE)

#define KSTK_EIP(task) (task_pt_regs(task)->ip)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index f31ed7097d0b..1e9cb945dca1 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -36,25 +36,23 @@ struct va_alignment __read_mostly va_align = {
.flags = -1,
};

-static unsigned long stack_maxrandom_size(void)
+static inline unsigned long tasksize_32bit(void)
+{
+ return IA32_PAGE_OFFSET;
+}
+
+static unsigned long stack_maxrandom_size(unsigned long task_size)
{
unsigned long max = 0;
if ((current->flags & PF_RANDOMIZE) &&
!(current->personality & ADDR_NO_RANDOMIZE)) {
- max = ((-1UL) & STACK_RND_MASK) << PAGE_SHIFT;
+ max = (-1UL) & __STACK_RND_MASK(task_size == tasksize_32bit());
+ max <<= PAGE_SHIFT;
}

return max;
}

-/*
- * Top of mmap area (just below the process stack).
- *
- * Leave an at least ~128 MB hole with possible stack randomization.
- */
-#define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
-#define MAX_GAP (TASK_SIZE/6*5)
-
#ifdef CONFIG_COMPAT
# define mmap32_rnd_bits mmap_rnd_compat_bits
# define mmap64_rnd_bits mmap_rnd_bits
@@ -63,6 +61,8 @@ static unsigned long stack_maxrandom_size(void)
# define mmap64_rnd_bits mmap_rnd_bits
#endif

+#define SIZE_128M (128 * 1024 * 1024UL)
+
static int mmap_is_legacy(void)
{
if (current->personality & ADDR_COMPAT_LAYOUT)
@@ -84,16 +84,30 @@ unsigned long arch_mmap_rnd(void)
return arch_rnd(mmap_is_ia32() ? mmap32_rnd_bits : mmap64_rnd_bits);
}

-static unsigned long mmap_base(unsigned long rnd)
+static unsigned long mmap_base(unsigned long rnd, unsigned long task_size)
{
unsigned long gap = rlimit(RLIMIT_STACK);
+ unsigned long gap_min, gap_max;
+
+ /*
+ * Top of mmap area (just below the process stack).
+ * Leave an at least ~128 MB hole with possible stack randomization.
+ */
+ gap_min = SIZE_128M + stack_maxrandom_size(task_size);
+ gap_max = (task_size / 6) * 5;

- if (gap < MIN_GAP)
- gap = MIN_GAP;
- else if (gap > MAX_GAP)
- gap = MAX_GAP;
+ if (gap < gap_min)
+ gap = gap_min;
+ else if (gap > gap_max)
+ gap = gap_max;

- return PAGE_ALIGN(TASK_SIZE - gap - rnd);
+ return PAGE_ALIGN(task_size - gap - rnd);
+}
+
+static unsigned long mmap_legacy_base(unsigned long rnd,
+ unsigned long task_size)
+{
+ return __TASK_UNMAPPED_BASE(task_size) + rnd;
}

/*
@@ -107,13 +121,13 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
if (current->flags & PF_RANDOMIZE)
random_factor = arch_mmap_rnd();

- mm->mmap_legacy_base = TASK_UNMAPPED_BASE + random_factor;
+ mm->mmap_legacy_base = mmap_legacy_base(random_factor, TASK_SIZE);

if (mmap_is_legacy()) {
mm->mmap_base = mm->mmap_legacy_base;
mm->get_unmapped_area = arch_get_unmapped_area;
} else {
- mm->mmap_base = mmap_base(random_factor);
+ mm->mmap_base = mmap_base(random_factor, TASK_SIZE);
mm->get_unmapped_area = arch_get_unmapped_area_topdown;
}
}
--
2.11.1

2017-03-13 09:40:08

by Thomas Gleixner

Subject: Re: [PATCHv6 4/5] x86/mm: check in_compat_syscall() instead TIF_ADDR32 for mmap(MAP_32BIT)

On Mon, 6 Mar 2017, Dmitry Safonov wrote:

> Result of mmap() calls with MAP_32BIT flag at this moment depends
> on thread flag TIF_ADDR32, which is set during exec() for 32-bit apps.
> It's broken as the behavior of mmap() shouldn't depend on exec-ed
> application's bitness. Instead, it should check the bitness of mmap()
> syscall.
> How it worked before:
> o for 32-bit compatible binaries it is completely ignored. Which was
> fine when there were one mmap_base, computed for 32-bit syscalls.
> After introducing mmap_compat_base 64-bit syscalls do use computed
> for 64-bit syscalls mmap_base, which means that we can allocate 64-bit
> address with 64-bit syscall in application launched from 32-bit
> compatible binary. And ignoring this flag is not expected behavior.

Well, the real question here is, whether we should allow 32bit applications
to obtain 64bit mappings at all. We can very well force 32bit applications
into the 4GB address space as it was before your mmap base splitup and be
done with it.

Thanks,

tglx


2017-03-13 10:00:14

by Dmitry Safonov

Subject: Re: [PATCHv6 4/5] x86/mm: check in_compat_syscall() instead TIF_ADDR32 for mmap(MAP_32BIT)

On 03/13/2017 12:39 PM, Thomas Gleixner wrote:
> On Mon, 6 Mar 2017, Dmitry Safonov wrote:
>
>> Result of mmap() calls with MAP_32BIT flag at this moment depends
>> on thread flag TIF_ADDR32, which is set during exec() for 32-bit apps.
>> It's broken as the behavior of mmap() shouldn't depend on exec-ed
>> application's bitness. Instead, it should check the bitness of mmap()
>> syscall.
>> How it worked before:
>> o for 32-bit compatible binaries it is completely ignored. Which was
>> fine when there were one mmap_base, computed for 32-bit syscalls.
>> After introducing mmap_compat_base 64-bit syscalls do use computed
>> for 64-bit syscalls mmap_base, which means that we can allocate 64-bit
>> address with 64-bit syscall in application launched from 32-bit
>> compatible binary. And ignoring this flag is not expected behavior.
>
> Well, the real question here is, whether we should allow 32bit applications
> to obtain 64bit mappings at all. We can very well force 32bit applications
> into the 4GB address space as it was before your mmap base splitup and be
> done with it.

Hmm, yes, we could restrict 32-bit applications to 32-bit mappings only.
But the approach I tried to follow in this patch set is to base the
logic not on the bitness of the launched application (native/compat),
but only on the bitness of the syscall being performed.
The idea was suggested by Andy, and I made the mmap() logic here
independent of the original application's bitness.

It also seems simpler to me:
if a 32-bit application wants to allocate a 64-bit mapping, it should
long-jump with a 64-bit segment descriptor and use the `syscall`
instruction to enter the 64-bit syscall path. So, from my point of view,
after this dance the application does not differ much from a native
64-bit binary and can have a 64-bit address mapping.

>
> Thanks,
>
> tglx
>
>


--
Dmitry

2017-03-13 13:47:26

by Thomas Gleixner

Subject: Re: [PATCHv6 4/5] x86/mm: check in_compat_syscall() instead TIF_ADDR32 for mmap(MAP_32BIT)

On Mon, 13 Mar 2017, Dmitry Safonov wrote:
> On 03/13/2017 12:39 PM, Thomas Gleixner wrote:
> > On Mon, 6 Mar 2017, Dmitry Safonov wrote:
> >
> > > Result of mmap() calls with MAP_32BIT flag at this moment depends
> > > on thread flag TIF_ADDR32, which is set during exec() for 32-bit apps.
> > > It's broken as the behavior of mmap() shouldn't depend on exec-ed
> > > application's bitness. Instead, it should check the bitness of mmap()
> > > syscall.
> > > How it worked before:
> > > o for 32-bit compatible binaries it is completely ignored. Which was
> > > fine when there were one mmap_base, computed for 32-bit syscalls.
> > > After introducing mmap_compat_base 64-bit syscalls do use computed
> > > for 64-bit syscalls mmap_base, which means that we can allocate 64-bit
> > > address with 64-bit syscall in application launched from 32-bit
> > > compatible binary. And ignoring this flag is not expected behavior.
> >
> > Well, the real question here is, whether we should allow 32bit applications
> > to obtain 64bit mappings at all. We can very well force 32bit applications
> > into the 4GB address space as it was before your mmap base splitup and be
> > done with it.
>
> Hmm, yes, we could restrict 32bit applications to 32bit mappings only.
> But the approach which I tried to follow in the patches set, it was do
> not base the logic on the bitness of launched applications
> (native/compat) - only base on bitness of the performing syscall.
> The idea was suggested by Andy and I made mmap() logic here independent
> from original application's bitness.
>
> It also seems to me simpler:
> if 32-bit application wants to allocate 64-bit mapping, it should
> long-jump with 64-bit segment descriptor and do `syscall` instruction
> for 64-bit syscall entry path. So, in my point of view after this dance
> the application does not differ much from native 64-bit binary and can
> have 64-bit address mapping.

Works for me, but it lacks documentation .....

Thanks,

tglx

2017-03-13 14:00:26

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [PATCHv6 4/5] x86/mm: check in_compat_syscall() instead TIF_ADDR32 for mmap(MAP_32BIT)

On 03/13/2017 04:47 PM, Thomas Gleixner wrote:
> On Mon, 13 Mar 2017, Dmitry Safonov wrote:
>> On 03/13/2017 12:39 PM, Thomas Gleixner wrote:
>>> On Mon, 6 Mar 2017, Dmitry Safonov wrote:
>>>
>>>> Result of mmap() calls with MAP_32BIT flag at this moment depends
>>>> on thread flag TIF_ADDR32, which is set during exec() for 32-bit apps.
>>>> It's broken as the behavior of mmap() shouldn't depend on exec-ed
>>>> application's bitness. Instead, it should check the bitness of mmap()
>>>> syscall.
>>>> How it worked before:
>>>> o for 32-bit compatible binaries it is completely ignored. Which was
>>>> fine when there were one mmap_base, computed for 32-bit syscalls.
>>>> After introducing mmap_compat_base 64-bit syscalls do use computed
>>>> for 64-bit syscalls mmap_base, which means that we can allocate 64-bit
>>>> address with 64-bit syscall in application launched from 32-bit
>>>> compatible binary. And ignoring this flag is not expected behavior.
>>>
>>> Well, the real question here is, whether we should allow 32bit applications
>>> to obtain 64bit mappings at all. We can very well force 32bit applications
>>> into the 4GB address space as it was before your mmap base splitup and be
>>> done with it.
>>
>> Hmm, yes, we could restrict 32bit applications to 32bit mappings only.
>> But the approach which I tried to follow in the patches set, it was do
>> not base the logic on the bitness of launched applications
>> (native/compat) - only base on bitness of the performing syscall.
>> The idea was suggested by Andy and I made mmap() logic here independent
>> from original application's bitness.
>>
>> It also seems to me simpler:
>> if 32-bit application wants to allocate 64-bit mapping, it should
>> long-jump with 64-bit segment descriptor and do `syscall` instruction
>> for 64-bit syscall entry path. So, in my point of view after this dance
>> the application does not differ much from native 64-bit binary and can
>> have 64-bit address mapping.
>
> Works for me, but it lacks documentation .....

Sure, could you recommend a good place for it?
Should it be an in-code comment in the x86 mmap() code, a
Documentation/* change, or a patch to the man-pages?

CC'ing Michael.


--
Dmitry

2017-03-13 14:03:35

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCHv6 4/5] x86/mm: check in_compat_syscall() instead TIF_ADDR32 for mmap(MAP_32BIT)

On Mon, 13 Mar 2017, Dmitry Safonov wrote:
> On 03/13/2017 04:47 PM, Thomas Gleixner wrote:
> > On Mon, 13 Mar 2017, Dmitry Safonov wrote:
> > > On 03/13/2017 12:39 PM, Thomas Gleixner wrote:
> > > > On Mon, 6 Mar 2017, Dmitry Safonov wrote:
> > > >
> > > > > Result of mmap() calls with MAP_32BIT flag at this moment depends
> > > > > on thread flag TIF_ADDR32, which is set during exec() for 32-bit apps.
> > > > > It's broken as the behavior of mmap() shouldn't depend on exec-ed
> > > > > application's bitness. Instead, it should check the bitness of mmap()
> > > > > syscall.
> > > > > How it worked before:
> > > > > o for 32-bit compatible binaries it is completely ignored. Which was
> > > > > fine when there were one mmap_base, computed for 32-bit syscalls.
> > > > > After introducing mmap_compat_base 64-bit syscalls do use computed
> > > > > for 64-bit syscalls mmap_base, which means that we can allocate 64-bit
> > > > > address with 64-bit syscall in application launched from 32-bit
> > > > > compatible binary. And ignoring this flag is not expected behavior.
> > > >
> > > > Well, the real question here is, whether we should allow 32bit
> > > > applications
> > > > to obtain 64bit mappings at all. We can very well force 32bit
> > > > applications
> > > > into the 4GB address space as it was before your mmap base splitup and
> > > > be
> > > > done with it.
> > >
> > > Hmm, yes, we could restrict 32bit applications to 32bit mappings only.
> > > But the approach which I tried to follow in the patches set, it was do
> > > not base the logic on the bitness of launched applications
> > > (native/compat) - only base on bitness of the performing syscall.
> > > The idea was suggested by Andy and I made mmap() logic here independent
> > > from original application's bitness.
> > >
> > > It also seems to me simpler:
> > > if 32-bit application wants to allocate 64-bit mapping, it should
> > > long-jump with 64-bit segment descriptor and do `syscall` instruction
> > > for 64-bit syscall entry path. So, in my point of view after this dance
> > > the application does not differ much from native 64-bit binary and can
> > > have 64-bit address mapping.
> >
> > Works for me, but it lacks documentation .....
>
> Sure, could you recommend a better place for it?
> Should it be in-code comment in x86 mmap() code or Documentation/*
> change or a patch to man-pages?

I added a comment in the code and fixed up the changelogs. man-page needs
some care as well.

Thanks,

tglx

Subject: [tip:x86/mm] x86/mm: Add task_size parameter to mmap_base()

Commit-ID: 8f3e474f3cea7b2470218a6ed6da47ff02147dce
Gitweb: http://git.kernel.org/tip/8f3e474f3cea7b2470218a6ed6da47ff02147dce
Author: Dmitry Safonov <[email protected]>
AuthorDate: Mon, 6 Mar 2017 17:17:18 +0300
Committer: Thomas Gleixner <[email protected]>
CommitDate: Mon, 13 Mar 2017 14:59:22 +0100

x86/mm: Add task_size parameter to mmap_base()

To correctly handle 32-bit and 64-bit mmap() syscalls in 64bit applications
it's required to have separate address bases from which to place a mapping.

The task size can be used as an indicator to select the proper parameters
for mmap_base().

This requires the following changes:

- Add task_size argument to mmap_base() and make the calculation based on it.
- Provide mmap_legacy_base() as a separate function
- Use the new functions in arch_pick_mmap_layout()

[ tglx: Massaged changelog ]

Signed-off-by: Dmitry Safonov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Andy Lutomirski <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

---
arch/x86/include/asm/elf.h | 24 ++++++++++---------
arch/x86/include/asm/processor.h | 4 +++-
arch/x86/mm/mmap.c | 50 +++++++++++++++++++++++++---------------
3 files changed, 48 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 9d49c18..b908141 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -293,8 +293,19 @@ do { \
} \
} while (0)

+/*
+ * True on X86_32 or when emulating IA32 on X86_64
+ */
+static inline int mmap_is_ia32(void)
+{
+ return IS_ENABLED(CONFIG_X86_32) ||
+ (IS_ENABLED(CONFIG_COMPAT) &&
+ test_thread_flag(TIF_ADDR32));
+}
+
#ifdef CONFIG_X86_32

+#define __STACK_RND_MASK(is32bit) (0x7ff)
#define STACK_RND_MASK (0x7ff)

#define ARCH_DLINFO ARCH_DLINFO_IA32
@@ -304,7 +315,8 @@ do { \
#else /* CONFIG_X86_32 */

/* 1GB for 64bit, 8MB for 32bit */
-#define STACK_RND_MASK (test_thread_flag(TIF_ADDR32) ? 0x7ff : 0x3fffff)
+#define __STACK_RND_MASK(is32bit) ((is32bit) ? 0x7ff : 0x3fffff)
+#define STACK_RND_MASK __STACK_RND_MASK(mmap_is_ia32())

#define ARCH_DLINFO \
do { \
@@ -348,16 +360,6 @@ extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
int uses_interp);
#define compat_arch_setup_additional_pages compat_arch_setup_additional_pages

-/*
- * True on X86_32 or when emulating IA32 on X86_64
- */
-static inline int mmap_is_ia32(void)
-{
- return IS_ENABLED(CONFIG_X86_32) ||
- (IS_ENABLED(CONFIG_COMPAT) &&
- test_thread_flag(TIF_ADDR32));
-}
-
/* Do not change the values. See get_align_mask() */
enum align_flags {
ALIGN_VA_32 = BIT(0),
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index f385eca..7caa2ac 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -797,6 +797,7 @@ static inline void spin_lock_prefetch(const void *x)
/*
* User space process size: 3GB (default).
*/
+#define IA32_PAGE_OFFSET PAGE_OFFSET
#define TASK_SIZE PAGE_OFFSET
#define TASK_SIZE_MAX TASK_SIZE
#define STACK_TOP TASK_SIZE
@@ -873,7 +874,8 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
* This decides where the kernel will search for a free chunk of vm
* space during mmap's.
*/
-#define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3))
+#define __TASK_UNMAPPED_BASE(task_size) (PAGE_ALIGN(task_size / 3))
+#define TASK_UNMAPPED_BASE __TASK_UNMAPPED_BASE(TASK_SIZE)

#define KSTK_EIP(task) (task_pt_regs(task)->ip)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index f31ed70..1e9cb94 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -36,25 +36,23 @@ struct va_alignment __read_mostly va_align = {
.flags = -1,
};

-static unsigned long stack_maxrandom_size(void)
+static inline unsigned long tasksize_32bit(void)
+{
+ return IA32_PAGE_OFFSET;
+}
+
+static unsigned long stack_maxrandom_size(unsigned long task_size)
{
unsigned long max = 0;
if ((current->flags & PF_RANDOMIZE) &&
!(current->personality & ADDR_NO_RANDOMIZE)) {
- max = ((-1UL) & STACK_RND_MASK) << PAGE_SHIFT;
+ max = (-1UL) & __STACK_RND_MASK(task_size == tasksize_32bit());
+ max <<= PAGE_SHIFT;
}

return max;
}

-/*
- * Top of mmap area (just below the process stack).
- *
- * Leave an at least ~128 MB hole with possible stack randomization.
- */
-#define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
-#define MAX_GAP (TASK_SIZE/6*5)
-
#ifdef CONFIG_COMPAT
# define mmap32_rnd_bits mmap_rnd_compat_bits
# define mmap64_rnd_bits mmap_rnd_bits
@@ -63,6 +61,8 @@ static unsigned long stack_maxrandom_size(void)
# define mmap64_rnd_bits mmap_rnd_bits
#endif

+#define SIZE_128M (128 * 1024 * 1024UL)
+
static int mmap_is_legacy(void)
{
if (current->personality & ADDR_COMPAT_LAYOUT)
@@ -84,16 +84,30 @@ unsigned long arch_mmap_rnd(void)
return arch_rnd(mmap_is_ia32() ? mmap32_rnd_bits : mmap64_rnd_bits);
}

-static unsigned long mmap_base(unsigned long rnd)
+static unsigned long mmap_base(unsigned long rnd, unsigned long task_size)
{
unsigned long gap = rlimit(RLIMIT_STACK);
+ unsigned long gap_min, gap_max;
+
+ /*
+ * Top of mmap area (just below the process stack).
+ * Leave an at least ~128 MB hole with possible stack randomization.
+ */
+ gap_min = SIZE_128M + stack_maxrandom_size(task_size);
+ gap_max = (task_size / 6) * 5;

- if (gap < MIN_GAP)
- gap = MIN_GAP;
- else if (gap > MAX_GAP)
- gap = MAX_GAP;
+ if (gap < gap_min)
+ gap = gap_min;
+ else if (gap > gap_max)
+ gap = gap_max;

- return PAGE_ALIGN(TASK_SIZE - gap - rnd);
+ return PAGE_ALIGN(task_size - gap - rnd);
+}
+
+static unsigned long mmap_legacy_base(unsigned long rnd,
+ unsigned long task_size)
+{
+ return __TASK_UNMAPPED_BASE(task_size) + rnd;
}

/*
@@ -107,13 +121,13 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
if (current->flags & PF_RANDOMIZE)
random_factor = arch_mmap_rnd();

- mm->mmap_legacy_base = TASK_UNMAPPED_BASE + random_factor;
+ mm->mmap_legacy_base = mmap_legacy_base(random_factor, TASK_SIZE);

if (mmap_is_legacy()) {
mm->mmap_base = mm->mmap_legacy_base;
mm->get_unmapped_area = arch_get_unmapped_area;
} else {
- mm->mmap_base = mmap_base(random_factor);
+ mm->mmap_base = mmap_base(random_factor, TASK_SIZE);
mm->get_unmapped_area = arch_get_unmapped_area_topdown;
}
}

Subject: [tip:x86/mm] x86/mm: Introduce arch_rnd() to compute 32/64 mmap random base

Commit-ID: 6a0b41d1e23dd3318568461593ae5e36d966981e
Gitweb: http://git.kernel.org/tip/6a0b41d1e23dd3318568461593ae5e36d966981e
Author: Dmitry Safonov <[email protected]>
AuthorDate: Mon, 6 Mar 2017 17:17:17 +0300
Committer: Thomas Gleixner <[email protected]>
CommitDate: Mon, 13 Mar 2017 14:59:22 +0100

x86/mm: Introduce arch_rnd() to compute 32/64 mmap random base

The compat (32bit) mmap() syscall issued by a 64-bit task results in a
mapping above 4GB. That's outside the compat mode address space and
prevents CRIU from restoring 32bit processes from a 64bit application.

As a first step to address this, split out the address base randomizing
calculation from arch_mmap_rnd() into a helper function, which can be used
independent of mmap_is_ia32() based decisions.

[ tglx: Massaged changelog ]

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Andy Lutomirski <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

---
arch/x86/mm/mmap.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 7940166..f31ed70 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -55,6 +55,14 @@ static unsigned long stack_maxrandom_size(void)
#define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
#define MAX_GAP (TASK_SIZE/6*5)

+#ifdef CONFIG_COMPAT
+# define mmap32_rnd_bits mmap_rnd_compat_bits
+# define mmap64_rnd_bits mmap_rnd_bits
+#else
+# define mmap32_rnd_bits mmap_rnd_bits
+# define mmap64_rnd_bits mmap_rnd_bits
+#endif
+
static int mmap_is_legacy(void)
{
if (current->personality & ADDR_COMPAT_LAYOUT)
@@ -66,20 +74,14 @@ static int mmap_is_legacy(void)
return sysctl_legacy_va_layout;
}

-unsigned long arch_mmap_rnd(void)
+static unsigned long arch_rnd(unsigned int rndbits)
{
- unsigned long rnd;
-
- if (mmap_is_ia32())
-#ifdef CONFIG_COMPAT
- rnd = get_random_long() & ((1UL << mmap_rnd_compat_bits) - 1);
-#else
- rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
-#endif
- else
- rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
+ return (get_random_long() & ((1UL << rndbits) - 1)) << PAGE_SHIFT;
+}

- return rnd << PAGE_SHIFT;
+unsigned long arch_mmap_rnd(void)
+{
+ return arch_rnd(mmap_is_ia32() ? mmap32_rnd_bits : mmap64_rnd_bits);
}

static unsigned long mmap_base(unsigned long rnd)

Subject: [tip:x86/mm] x86/mm: Introduce mmap_compat_base() for 32-bit mmap()

Commit-ID: 1b028f784e8c341e762c264f70dc0ca1418c8b7a
Gitweb: http://git.kernel.org/tip/1b028f784e8c341e762c264f70dc0ca1418c8b7a
Author: Dmitry Safonov <[email protected]>
AuthorDate: Mon, 6 Mar 2017 17:17:19 +0300
Committer: Thomas Gleixner <[email protected]>
CommitDate: Mon, 13 Mar 2017 14:59:22 +0100

x86/mm: Introduce mmap_compat_base() for 32-bit mmap()

mmap() uses a base address, from which it starts to look for a free space
for allocation.

The base address is stored in mm->mmap_base, which is calculated during
exec(). It depends on the task size, the stack rlimit, and ASLR
randomization; both the task size and the number of random bits differ
between 64-bit and 32-bit applications.

Because the base address is fixed, a mmap() from a compat (32bit) syscall
issued by a 64bit task will return an address which is based on the 64bit
base address and does not fit into the 32bit address space (4GB). The
returned pointer is truncated to 32bit, which results in an invalid
address.

To solve this, store a separate compat address base plus a compat legacy
address base in mm_struct. These bases are calculated at exec() time and
can be used later to serve the 32bit compat mmap() issued by 64 bit
applications.

As a consequence of this change 32-bit applications issuing a 64-bit
syscall (after doing a long jump) will get a 64-bit mapping now. Before
this change 32-bit applications always got a 32bit mapping.

[ tglx: Massaged changelog and added a comment ]

Signed-off-by: Dmitry Safonov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Andy Lutomirski <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

---
arch/Kconfig | 7 +++++++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/elf.h | 3 +++
arch/x86/kernel/sys_x86_64.c | 23 ++++++++++++++++++----
arch/x86/mm/mmap.c | 47 ++++++++++++++++++++++++++++++++------------
include/linux/mm_types.h | 5 +++++
6 files changed, 69 insertions(+), 17 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index cd211a1..c4d6833 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -700,6 +700,13 @@ config ARCH_MMAP_RND_COMPAT_BITS
This value can be changed after boot using the
/proc/sys/vm/mmap_rnd_compat_bits tunable

+config HAVE_ARCH_COMPAT_MMAP_BASES
+ bool
+ help
+ This allows 64bit applications to invoke 32-bit mmap() syscall
+ and vice-versa 32-bit applications to call 64-bit mmap().
+ Required for applications doing different bitness syscalls.
+
config HAVE_COPY_THREAD_TLS
bool
help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cc98d5a..2bab9d0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -106,6 +106,7 @@ config X86
select HAVE_ARCH_KMEMCHECK
select HAVE_ARCH_MMAP_RND_BITS if MMU
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT
+ select HAVE_ARCH_COMPAT_MMAP_BASES if MMU && COMPAT
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index b908141..ac5be5b 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -303,6 +303,9 @@ static inline int mmap_is_ia32(void)
test_thread_flag(TIF_ADDR32));
}

+extern unsigned long tasksize_32bit(void);
+extern unsigned long tasksize_64bit(void);
+
#ifdef CONFIG_X86_32

#define __STACK_RND_MASK(is32bit) (0x7ff)
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 50215a4..c54817b 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -17,6 +17,8 @@
#include <linux/uaccess.h>
#include <linux/elf.h>

+#include <asm/elf.h>
+#include <asm/compat.h>
#include <asm/ia32.h>
#include <asm/syscalls.h>

@@ -98,6 +100,18 @@ out:
return error;
}

+static unsigned long get_mmap_base(int is_legacy)
+{
+ struct mm_struct *mm = current->mm;
+
+#ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
+ if (in_compat_syscall())
+ return is_legacy ? mm->mmap_compat_legacy_base
+ : mm->mmap_compat_base;
+#endif
+ return is_legacy ? mm->mmap_legacy_base : mm->mmap_base;
+}
+
static void find_start_end(unsigned long flags, unsigned long *begin,
unsigned long *end)
{
@@ -114,10 +128,11 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
if (current->flags & PF_RANDOMIZE) {
*begin = randomize_page(*begin, 0x02000000);
}
- } else {
- *begin = current->mm->mmap_legacy_base;
- *end = TASK_SIZE;
+ return;
}
+
+ *begin = get_mmap_base(1);
+ *end = in_compat_syscall() ? tasksize_32bit() : tasksize_64bit();
}

unsigned long
@@ -191,7 +206,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
info.flags = VM_UNMAPPED_AREA_TOPDOWN;
info.length = len;
info.low_limit = PAGE_SIZE;
- info.high_limit = mm->mmap_base;
+ info.high_limit = get_mmap_base(0);
info.align_mask = 0;
info.align_offset = pgoff << PAGE_SHIFT;
if (filp) {
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 1e9cb94..529ab79 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -36,11 +36,16 @@ struct va_alignment __read_mostly va_align = {
.flags = -1,
};

-static inline unsigned long tasksize_32bit(void)
+unsigned long tasksize_32bit(void)
{
return IA32_PAGE_OFFSET;
}

+unsigned long tasksize_64bit(void)
+{
+ return TASK_SIZE_MAX;
+}
+
static unsigned long stack_maxrandom_size(unsigned long task_size)
{
unsigned long max = 0;
@@ -81,6 +86,8 @@ static unsigned long arch_rnd(unsigned int rndbits)

unsigned long arch_mmap_rnd(void)
{
+ if (!(current->flags & PF_RANDOMIZE))
+ return 0;
return arch_rnd(mmap_is_ia32() ? mmap32_rnd_bits : mmap64_rnd_bits);
}

@@ -114,22 +121,36 @@ static unsigned long mmap_legacy_base(unsigned long rnd,
* This function, called very early during the creation of a new
* process VM image, sets up which VM layout function to use:
*/
-void arch_pick_mmap_layout(struct mm_struct *mm)
+static void arch_pick_mmap_base(unsigned long *base, unsigned long *legacy_base,
+ unsigned long random_factor, unsigned long task_size)
{
- unsigned long random_factor = 0UL;
-
- if (current->flags & PF_RANDOMIZE)
- random_factor = arch_mmap_rnd();
-
- mm->mmap_legacy_base = mmap_legacy_base(random_factor, TASK_SIZE);
+ *legacy_base = mmap_legacy_base(random_factor, task_size);
+ if (mmap_is_legacy())
+ *base = *legacy_base;
+ else
+ *base = mmap_base(random_factor, task_size);
+}

- if (mmap_is_legacy()) {
- mm->mmap_base = mm->mmap_legacy_base;
+void arch_pick_mmap_layout(struct mm_struct *mm)
+{
+ if (mmap_is_legacy())
mm->get_unmapped_area = arch_get_unmapped_area;
- } else {
- mm->mmap_base = mmap_base(random_factor, TASK_SIZE);
+ else
mm->get_unmapped_area = arch_get_unmapped_area_topdown;
- }
+
+ arch_pick_mmap_base(&mm->mmap_base, &mm->mmap_legacy_base,
+ arch_rnd(mmap64_rnd_bits), tasksize_64bit());
+
+#ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
+ /*
+ * The mmap syscall mapping base decision depends solely on the
+ * syscall type (64-bit or compat). This applies for 64bit
+ * applications and 32bit applications. The 64bit syscall uses
+ * mmap_base, the compat syscall uses mmap_compat_base.
+ */
+ arch_pick_mmap_base(&mm->mmap_compat_base, &mm->mmap_compat_legacy_base,
+ arch_rnd(mmap32_rnd_bits), tasksize_32bit());
+#endif
}

const char *arch_vma_name(struct vm_area_struct *vma)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index f60f45f..45cdb27 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -367,6 +367,11 @@ struct mm_struct {
#endif
unsigned long mmap_base; /* base of mmap area */
unsigned long mmap_legacy_base; /* base of mmap area in bottom-up allocations */
+#ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
+ /* Base addresses for compatible mmap() */
+ unsigned long mmap_compat_base;
+ unsigned long mmap_compat_legacy_base;
+#endif
unsigned long task_size; /* size of task vm space */
unsigned long highest_vm_end; /* highest vma end address */
pgd_t * pgd;

Subject: [tip:x86/mm] x86/mm: Make mmap(MAP_32BIT) work correctly

Commit-ID: 3e6ef9c80946f781fc25e8490c9875b1d2b61158
Gitweb: http://git.kernel.org/tip/3e6ef9c80946f781fc25e8490c9875b1d2b61158
Author: Dmitry Safonov <[email protected]>
AuthorDate: Mon, 6 Mar 2017 17:17:20 +0300
Committer: Thomas Gleixner <[email protected]>
CommitDate: Mon, 13 Mar 2017 14:59:23 +0100

x86/mm: Make mmap(MAP_32BIT) work correctly

mmap(MAP_32BIT) is broken due to the dependency on the TIF_ADDR32 thread
flag.

For 64bit applications MAP_32BIT will force legacy bottom-up allocations and
the 1GB address space restriction even if the application issued a compat
syscall, which should not be subject of these restrictions.

For 32bit applications which issue 64bit syscalls, the newly introduced
mmap base separation into 64-bit and compat bases changed the behaviour:
now a 64-bit mapping is returned, but due to the TIF_ADDR32 dependency
MAP_32BIT is ignored. Before the separation a 32-bit mapping was
returned, so the MAP_32BIT handling was irrelevant.

Replace the check for TIF_ADDR32 with a check for the compat syscall. That
solves both the 64-bit issuing a compat syscall and the 32-bit issuing a
64-bit syscall problems.

[ tglx: Massaged changelog ]

Signed-off-by: Dmitry Safonov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Andy Lutomirski <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

---
arch/x86/kernel/sys_x86_64.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index c54817b..63e89df 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -115,7 +115,7 @@ static unsigned long get_mmap_base(int is_legacy)
static void find_start_end(unsigned long flags, unsigned long *begin,
unsigned long *end)
{
- if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT)) {
+ if (!in_compat_syscall() && (flags & MAP_32BIT)) {
/* This is usually used needed to map code in small
model, so it needs to be in the first 31bit. Limit
it to that. This means we need to move the
@@ -191,7 +191,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
return addr;

/* for MAP_32BIT mappings we force the legacy mmap base */
- if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT))
+ if (!in_compat_syscall() && (flags & MAP_32BIT))
goto bottomup;

/* requesting a specific address */

2017-03-13 14:20:54

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [PATCHv6 4/5] x86/mm: check in_compat_syscall() instead TIF_ADDR32 for mmap(MAP_32BIT)

On 03/13/2017 05:03 PM, Thomas Gleixner wrote:
> On Mon, 13 Mar 2017, Dmitry Safonov wrote:
>> On 03/13/2017 04:47 PM, Thomas Gleixner wrote:
>>> On Mon, 13 Mar 2017, Dmitry Safonov wrote:
>>>> On 03/13/2017 12:39 PM, Thomas Gleixner wrote:
>>>>> On Mon, 6 Mar 2017, Dmitry Safonov wrote:
>>>>>
>>>>>> Result of mmap() calls with MAP_32BIT flag at this moment depends
>>>>>> on thread flag TIF_ADDR32, which is set during exec() for 32-bit apps.
>>>>>> It's broken as the behavior of mmap() shouldn't depend on exec-ed
>>>>>> application's bitness. Instead, it should check the bitness of mmap()
>>>>>> syscall.
>>>>>> How it worked before:
>>>>>> o for 32-bit compatible binaries it is completely ignored. Which was
>>>>>> fine when there were one mmap_base, computed for 32-bit syscalls.
>>>>>> After introducing mmap_compat_base 64-bit syscalls do use computed
>>>>>> for 64-bit syscalls mmap_base, which means that we can allocate 64-bit
>>>>>> address with 64-bit syscall in application launched from 32-bit
>>>>>> compatible binary. And ignoring this flag is not expected behavior.
>>>>>
>>>>> Well, the real question here is, whether we should allow 32bit
>>>>> applications
>>>>> to obtain 64bit mappings at all. We can very well force 32bit
>>>>> applications
>>>>> into the 4GB address space as it was before your mmap base splitup and
>>>>> be
>>>>> done with it.
>>>>
>>>> Hmm, yes, we could restrict 32bit applications to 32bit mappings only.
>>>> But the approach which I tried to follow in the patches set, it was do
>>>> not base the logic on the bitness of launched applications
>>>> (native/compat) - only base on bitness of the performing syscall.
>>>> The idea was suggested by Andy and I made mmap() logic here independent
>>>> from original application's bitness.
>>>>
>>>> It also seems to me simpler:
>>>> if 32-bit application wants to allocate 64-bit mapping, it should
>>>> long-jump with 64-bit segment descriptor and do `syscall` instruction
>>>> for 64-bit syscall entry path. So, in my point of view after this dance
>>>> the application does not differ much from native 64-bit binary and can
>>>> have 64-bit address mapping.
>>>
>>> Works for me, but it lacks documentation .....
>>
>> Sure, could you recommend a better place for it?
>> Should it be in-code comment in x86 mmap() code or Documentation/*
>> change or a patch to man-pages?
>
> I added a comment in the code and fixed up the changelogs. man-page needs
> some care as well.

Big thanks, Thomas!
I'll make a patch for the man-pages this week.

>
> Thanks,
>
> tglx
>


--
Dmitry

2017-03-13 15:30:24

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCHv6 4/5] x86/mm: check in_compat_syscall() instead TIF_ADDR32 for mmap(MAP_32BIT)

On Mon, Mar 13, 2017 at 6:47 AM, Thomas Gleixner <[email protected]> wrote:
> On Mon, 13 Mar 2017, Dmitry Safonov wrote:
>> On 03/13/2017 12:39 PM, Thomas Gleixner wrote:
>> > On Mon, 6 Mar 2017, Dmitry Safonov wrote:
>> >
>> > > Result of mmap() calls with MAP_32BIT flag at this moment depends
>> > > on thread flag TIF_ADDR32, which is set during exec() for 32-bit apps.
>> > > It's broken as the behavior of mmap() shouldn't depend on exec-ed
>> > > application's bitness. Instead, it should check the bitness of mmap()
>> > > syscall.
>> > > How it worked before:
>> > > o for 32-bit compatible binaries it is completely ignored. Which was
>> > > fine when there were one mmap_base, computed for 32-bit syscalls.
>> > > After introducing mmap_compat_base 64-bit syscalls do use computed
>> > > for 64-bit syscalls mmap_base, which means that we can allocate 64-bit
>> > > address with 64-bit syscall in application launched from 32-bit
>> > > compatible binary. And ignoring this flag is not expected behavior.
>> >
>> > Well, the real question here is, whether we should allow 32bit applications
>> > to obtain 64bit mappings at all. We can very well force 32bit applications
>> > into the 4GB address space as it was before your mmap base splitup and be
>> > done with it.
>>
>> Hmm, yes, we could restrict 32bit applications to 32bit mappings only.
>> But the approach which I tried to follow in the patches set, it was do
>> not base the logic on the bitness of launched applications
>> (native/compat) - only base on bitness of the performing syscall.
>> The idea was suggested by Andy and I made mmap() logic here independent
>> from original application's bitness.
>>
>> It also seems to me simpler:
>> if 32-bit application wants to allocate 64-bit mapping, it should
>> long-jump with 64-bit segment descriptor and do `syscall` instruction
>> for 64-bit syscall entry path. So, in my point of view after this dance
>> the application does not differ much from native 64-bit binary and can
>> have 64-bit address mapping.

I agree.

>
> Works for me, but it lacks documentation .....
>
> Thanks,
>
> tglx



--
Andy Lutomirski
AMA Capital Management, LLC

2017-03-14 01:31:43

by kernel test robot

[permalink] [raw]
Subject: [lkp-robot] [x86/mm] 0d708eaade: libhugetlbfs.32bit.unlinked_fd.fail


FYI, we noticed the following commit:

commit: 0d708eaade1be6e0e6da453f5ddde178143b8fa1 ("x86/mm: introduce mmap_compat_base for 32-bit mmap()")
url: https://github.com/0day-ci/linux/commits/Dmitry-Safonov/Fix-compatible-mmap-return-pointer-over-4Gb/20170307-183052


in testcase: libhugetlbfs
with following parameters:

pagesize: 2MB
pagenum: 25
cpufreq_governor: performance



on test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 64G memory

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):




2017-03-08 09:13:02 obj/hugeadm --add-temp-swap=25 --pool-pages-min 2MB:25 --hard
Setting up swapspace version 1, size = 50 MiB (52424704 bytes)
no label, UUID=295fdfb1-1054-4692-9512-3d2e4236627f
2017-03-08 09:13:02 make check
zero_filesize_segment (2M: 32): PASS
zero_filesize_segment (2M: 64): PASS
test_root (2M: 32): PASS
test_root (2M: 64): PASS
meminfo_nohuge (2M: 32): PASS
meminfo_nohuge (2M: 64): PASS
gethugepagesize (2M: 32): PASS
gethugepagesize (2M: 64): PASS
gethugepagesizes (2M: 32): PASS
gethugepagesizes (2M: 64): PASS
HUGETLB_VERBOSE=1 empty_mounts (2M: 32): PASS
HUGETLB_VERBOSE=1 empty_mounts (2M: 64): PASS
HUGETLB_VERBOSE=1 large_mounts (2M: 32): PASS
HUGETLB_VERBOSE=1 large_mounts (2M: 64): PASS
find_path (2M: 32): PASS
find_path (2M: 64): PASS
unlinked_fd (2M: 32): FAIL mmap(): Cannot allocate memory
unlinked_fd (2M: 64): PASS
readback (2M: 32): FAIL mmap(): Cannot allocate memory
readback (2M: 64): PASS
truncate (2M: 32): FAIL mmap(): Cannot allocate memory
truncate (2M: 64): PASS
shared (2M: 32): FAIL mmap() 1: Cannot allocate memory
shared (2M: 64): PASS
mprotect (2M: 32): FAIL mmap(): Cannot allocate memory
mprotect (2M: 64): PASS
mlock (2M: 32): FAIL mmap() failed (flags=2): Cannot allocate memory
mlock (2M: 64): PASS
misalign (2M: 32): FAIL mmap() without hint failed: Cannot allocate memory
misalign (2M: 64): PASS
fallocate_basic.sh (2M: 32): PASS
fallocate_basic.sh (2M: 64): PASS
fallocate_align.sh (2M: 32): PASS
fallocate_align.sh (2M: 64): PASS
ptrace-write-hugepage (2M: 32): FAIL mmap(): Cannot allocate memory
ptrace-write-hugepage (2M: 64): PASS
icache-hygiene (2M: 32): FAIL mmap() 1: Cannot allocate memory
icache-hygiene (2M: 64): PASS
slbpacaflush (2M: 32): FAIL mmap(): Cannot allocate memory
slbpacaflush (2M: 64): PASS (inconclusive)
straddle_4GB_static (2M: 64): PASS
huge_at_4GB_normal_below_static (2M: 64): PASS
huge_below_4GB_normal_above_static (2M: 64): PASS
map_high_truncate_2 (2M: 32): FAIL mmap() 1: Cannot allocate memory
map_high_truncate_2 (2M: 64): PASS
misaligned_offset (2M: 32): FAIL mmap() offset 4GB: Cannot allocate memory
misaligned_offset (2M: 64): PASS (inconclusive)
truncate_above_4GB (2M: 32): FAIL mmap() offset 4GB: Cannot allocate memory
truncate_above_4GB (2M: 64): PASS
brk_near_huge (2M: 32):
brk_near_huge (2M: 64):
task-size-overrun (2M: 32): PASS
task-size-overrun (2M: 64): PASS
stack_grow_into_huge (2M: 32): PASS
stack_grow_into_huge (2M: 64): PASS
corrupt-by-cow-opt (2M: 32): FAIL mmap() 1: Cannot allocate memory
corrupt-by-cow-opt (2M: 64): PASS
noresv-preserve-resv-page (2M: 32): FAIL mmap() 1: Cannot allocate memory
noresv-preserve-resv-page (2M: 64): PASS
noresv-regarded-as-resv (2M: 32): FAIL mmap() 1: Cannot allocate memory
noresv-regarded-as-resv (2M: 64): PASS
readahead_reserve.sh (2M: 32): FAIL mmap(): Cannot allocate memory
readahead_reserve.sh (2M: 64): PASS
madvise_reserve.sh (2M: 32): FAIL mmap(): Cannot allocate memory
madvise_reserve.sh (2M: 64): PASS
fadvise_reserve.sh (2M: 32): FAIL mmap(): Cannot allocate memory
fadvise_reserve.sh (2M: 64): PASS
mremap-expand-slice-collision.sh (2M: 32): PASS
mremap-expand-slice-collision.sh (2M: 64): PASS
mremap-fixed-normal-near-huge.sh (2M: 32): FAIL mmap(): Cannot allocate memory
mremap-fixed-normal-near-huge.sh (2M: 64): PASS
mremap-fixed-huge-near-normal.sh (2M: 32): FAIL mmap(huge page): Cannot allocate memory
mremap-fixed-huge-near-normal.sh (2M: 64): PASS
set shmmax limit to 67108864
shm-perms (2M: 32): Bad configuration: Must have at least 32 free hugepages
shm-perms (2M: 64): Bad configuration: Must have at least 32 free hugepages
private (2M: 32): FAIL mmap() SHARED: Cannot allocate memory
private (2M: 64): PASS
fork-cow (2M: 32): FAIL mmap(): Cannot allocate memory
fork-cow (2M: 64): PASS
direct (2M: 32): Bad configuration: Failed to open direct-IO file: Invalid argument
direct (2M: 64): Bad configuration: Failed to open direct-IO file: File exists
malloc (2M: 32): PASS
malloc (2M: 64): PASS
LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes malloc (2M: 32): PASS
LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes malloc (2M: 64): PASS
LD_PRELOAD=libhugetlbfs.so HUGETLB_RESTRICT_EXE=unknown:none HUGETLB_MORECORE=yes malloc (2M: 32): PASS
LD_PRELOAD=libhugetlbfs.so HUGETLB_RESTRICT_EXE=unknown:none HUGETLB_MORECORE=yes malloc (2M: 64): PASS
LD_PRELOAD=libhugetlbfs.so HUGETLB_RESTRICT_EXE=unknown:malloc HUGETLB_MORECORE=yes malloc (2M: 32): PASS
LD_PRELOAD=libhugetlbfs.so HUGETLB_RESTRICT_EXE=unknown:malloc HUGETLB_MORECORE=yes malloc (2M: 64): PASS
malloc_manysmall (2M: 32): PASS
malloc_manysmall (2M: 64): PASS
LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes malloc_manysmall (2M: 32): PASS
LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes malloc_manysmall (2M: 64): PASS
heapshrink (2M: 32): PASS
heapshrink (2M: 64): PASS
LD_PRELOAD=libheapshrink.so heapshrink (2M: 32): PASS
LD_PRELOAD=libheapshrink.so heapshrink (2M: 64): PASS
LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes heapshrink (2M: 32): PASS
LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes heapshrink (2M: 64): PASS
LD_PRELOAD=libhugetlbfs.so libheapshrink.so HUGETLB_MORECORE=yes heapshrink (2M: 32): PASS
LD_PRELOAD=libhugetlbfs.so libheapshrink.so HUGETLB_MORECORE=yes heapshrink (2M: 64): PASS
LD_PRELOAD=libheapshrink.so HUGETLB_MORECORE_SHRINK=yes HUGETLB_MORECORE=yes heapshrink (2M: 32): PASS (inconclusive)
LD_PRELOAD=libheapshrink.so HUGETLB_MORECORE_SHRINK=yes HUGETLB_MORECORE=yes heapshrink (2M: 64): PASS (inconclusive)
LD_PRELOAD=libhugetlbfs.so libheapshrink.so HUGETLB_MORECORE_SHRINK=yes HUGETLB_MORECORE=yes heapshrink (2M: 32): PASS
LD_PRELOAD=libhugetlbfs.so libheapshrink.so HUGETLB_MORECORE_SHRINK=yes HUGETLB_MORECORE=yes heapshrink (2M: 64): PASS
HUGETLB_VERBOSE=1 HUGETLB_MORECORE=yes heap-overflow (2M: 32): PASS
HUGETLB_VERBOSE=1 HUGETLB_MORECORE=yes heap-overflow (2M: 64): PASS
HUGETLB_VERBOSE=0 linkhuge_nofd (2M: 32): PASS
HUGETLB_VERBOSE=0 linkhuge_nofd (2M: 64): PASS
LD_PRELOAD=libhugetlbfs.so HUGETLB_VERBOSE=0 linkhuge_nofd (2M: 32): PASS
LD_PRELOAD=libhugetlbfs.so HUGETLB_VERBOSE=0 linkhuge_nofd (2M: 64): PASS
HUGETLB_VERBOSE=0 xB.linkhuge_nofd (2M: 32):
HUGETLB_VERBOSE=0 xB.linkhuge_nofd (2M: 64): PASS
HUGETLB_VERBOSE=0 xBDT.linkhuge_nofd (2M: 32): PASS
HUGETLB_VERBOSE=0 xBDT.linkhuge_nofd (2M: 64): PASS
HUGETLB_VERBOSE=0 HUGETLB_MINIMAL_COPY=no xB.linkhuge_nofd (2M: 32):
HUGETLB_VERBOSE=0 HUGETLB_MINIMAL_COPY=no xB.linkhuge_nofd (2M: 64): PASS
HUGETLB_VERBOSE=0 HUGETLB_MINIMAL_COPY=no xBDT.linkhuge_nofd (2M: 32): PASS
HUGETLB_VERBOSE=0 HUGETLB_MINIMAL_COPY=no xBDT.linkhuge_nofd (2M: 64): PASS
HUGETLB_VERBOSE=0 HUGETLB_ELFMAP=no xB.linkhuge_nofd (2M: 32):
HUGETLB_VERBOSE=0 HUGETLB_ELFMAP=no xB.linkhuge_nofd (2M: 64): PASS
HUGETLB_VERBOSE=0 HUGETLB_ELFMAP=no xBDT.linkhuge_nofd (2M: 32): PASS
HUGETLB_VERBOSE=0 HUGETLB_ELFMAP=no xBDT.linkhuge_nofd (2M: 64): PASS
linkhuge (2M: 32): PASS
linkhuge (2M: 64): PASS
LD_PRELOAD=libhugetlbfs.so linkhuge (2M: 32): PASS
LD_PRELOAD=libhugetlbfs.so linkhuge (2M: 64): PASS
xB.linkhuge (2M: 32): FAIL small_bss is not hugepage
xB.linkhuge (2M: 64): PASS
xBDT.linkhuge (2M: 32): FAIL small_data is not hugepage
xBDT.linkhuge (2M: 64): PASS
HUGETLB_MINIMAL_COPY=no xB.linkhuge (2M: 32): FAIL small_bss is not hugepage
HUGETLB_MINIMAL_COPY=no xB.linkhuge (2M: 64): PASS
HUGETLB_MINIMAL_COPY=no xBDT.linkhuge (2M: 32): FAIL small_data is not hugepage
HUGETLB_MINIMAL_COPY=no xBDT.linkhuge (2M: 64): PASS
HUGETLB_ELFMAP=no xB.linkhuge (2M: 32): PASS
HUGETLB_ELFMAP=no xB.linkhuge (2M: 64): PASS
HUGETLB_ELFMAP=no xBDT.linkhuge (2M: 32): PASS
HUGETLB_ELFMAP=no xBDT.linkhuge (2M: 64): PASS
HUGETLB_SHARE=1 xB.linkshare (2M: 32): PASS
HUGETLB_SHARE=1 xB.linkshare (2M: 64): PASS
HUGETLB_SHARE=1 xBDT.linkshare (2M: 32): PASS
HUGETLB_SHARE=1 xBDT.linkshare (2M: 64): PASS
HUGETLB_SHARE=1 xB.linkshare (2M: 32): PASS
HUGETLB_SHARE=1 xB.linkshare (2M: 64): PASS
HUGETLB_SHARE=1 xBDT.linkshare (2M: 32): PASS
HUGETLB_SHARE=1 xBDT.linkshare (2M: 64): PASS
HUGETLB_SHARE=0 xB.linkhuge (2M: 32): FAIL small_bss is not hugepage
HUGETLB_SHARE=0 xB.linkhuge (2M: 64): PASS
HUGETLB_SHARE=1 xB.linkhuge (2M: 32): FAIL small_bss is not hugepage
HUGETLB_SHARE=1 xB.linkhuge (2M: 64): PASS
HUGETLB_SHARE=0 xBDT.linkhuge (2M: 32): FAIL small_data is not hugepage
HUGETLB_SHARE=0 xBDT.linkhuge (2M: 64): PASS
HUGETLB_SHARE=1 xBDT.linkhuge (2M: 32): FAIL small_data is not hugepage
HUGETLB_SHARE=1 xBDT.linkhuge (2M: 64): PASS
linkhuge_rw (2M: 32): PASS
linkhuge_rw (2M: 64): PASS
HUGETLB_ELFMAP=R linkhuge_rw (2M: 32): FAIL small_const is not hugepage
HUGETLB_ELFMAP=R linkhuge_rw (2M: 64): FAIL small_const is not hugepage
HUGETLB_ELFMAP=W linkhuge_rw (2M: 32):
HUGETLB_ELFMAP=W linkhuge_rw (2M: 64):
HUGETLB_ELFMAP=RW linkhuge_rw (2M: 32):
HUGETLB_ELFMAP=RW linkhuge_rw (2M: 64):
HUGETLB_ELFMAP=no linkhuge_rw (2M: 32): PASS
HUGETLB_ELFMAP=no linkhuge_rw (2M: 64): PASS
HUGETLB_ELFMAP= HUGETLB_MINIMAL_COPY=no linkhuge_rw (2M: 32): PASS
HUGETLB_ELFMAP= HUGETLB_MINIMAL_COPY=no linkhuge_rw (2M: 64): PASS
HUGETLB_ELFMAP=W HUGETLB_MINIMAL_COPY=no linkhuge_rw (2M: 32): FAIL small_data is not hugepage
HUGETLB_ELFMAP=W HUGETLB_MINIMAL_COPY=no linkhuge_rw (2M: 64): FAIL small_data is not hugepage
HUGETLB_ELFMAP=RW HUGETLB_MINIMAL_COPY=no linkhuge_rw (2M: 32): FAIL small_data is not hugepage
HUGETLB_ELFMAP=RW HUGETLB_MINIMAL_COPY=no linkhuge_rw (2M: 64): FAIL small_data is not hugepage
HUGETLB_SHARE=0 HUGETLB_ELFMAP=R linkhuge_rw (2M: 32): FAIL small_const is not hugepage
HUGETLB_SHARE=0 HUGETLB_ELFMAP=R linkhuge_rw (2M: 64): FAIL small_const is not hugepage
HUGETLB_SHARE=1 HUGETLB_ELFMAP=R linkhuge_rw (2M: 32): FAIL small_const is not hugepage
HUGETLB_SHARE=1 HUGETLB_ELFMAP=R linkhuge_rw (2M: 64): FAIL small_const is not hugepage
HUGETLB_SHARE=0 HUGETLB_ELFMAP=W linkhuge_rw (2M: 32):
HUGETLB_SHARE=0 HUGETLB_ELFMAP=W linkhuge_rw (2M: 64):
HUGETLB_SHARE=1 HUGETLB_ELFMAP=W linkhuge_rw (2M: 32):
HUGETLB_SHARE=1 HUGETLB_ELFMAP=W linkhuge_rw (2M: 64):
HUGETLB_SHARE=0 HUGETLB_ELFMAP=RW linkhuge_rw (2M: 32):
HUGETLB_SHARE=0 HUGETLB_ELFMAP=RW linkhuge_rw (2M: 64):
HUGETLB_SHARE=1 HUGETLB_ELFMAP=RW linkhuge_rw (2M: 32):
HUGETLB_SHARE=1 HUGETLB_ELFMAP=RW linkhuge_rw (2M: 64):
chunk-overcommit (2M: 32): FAIL mmap() chunk1: Cannot allocate memory
chunk-overcommit (2M: 64): PASS
alloc-instantiate-race shared (2M: 32): FAIL mmap() 1: Cannot allocate memory
alloc-instantiate-race shared (2M: 64): PASS
alloc-instantiate-race private (2M: 32): FAIL mmap() 1: Cannot allocate memory
alloc-instantiate-race private (2M: 64): PASS
truncate_reserve_wraparound (2M: 32): FAIL mmap(): Cannot allocate memory
truncate_reserve_wraparound (2M: 64): PASS
truncate_sigbus_versus_oom (2M: 32): FAIL mmap(): Cannot allocate memory
truncate_sigbus_versus_oom (2M: 64): PASS
get_huge_pages (2M: 32): FAIL get_huge_pages() for 1 hugepages
get_huge_pages (2M: 64): PASS
shmoverride_linked (2M: 32): PASS
shmoverride_linked (2M: 64): PASS
HUGETLB_SHM=yes shmoverride_linked (2M: 32): FAIL shmmat failed from line 176: Cannot allocate memory
HUGETLB_SHM=yes shmoverride_linked (2M: 64): PASS
shmoverride_linked_static (2M: 32): PASS
shmoverride_linked_static (2M: 64): PASS
HUGETLB_SHM=yes shmoverride_linked_static (2M: 32): FAIL shmmat failed from line 176: Cannot allocate memory
HUGETLB_SHM=yes shmoverride_linked_static (2M: 64): PASS
LD_PRELOAD=libhugetlbfs.so shmoverride_unlinked (2M: 32): PASS
LD_PRELOAD=libhugetlbfs.so shmoverride_unlinked (2M: 64): PASS
LD_PRELOAD=libhugetlbfs.so HUGETLB_SHM=yes shmoverride_unlinked (2M: 32): FAIL shmmat failed from line 176: Cannot allocate memory
LD_PRELOAD=libhugetlbfs.so HUGETLB_SHM=yes shmoverride_unlinked (2M: 64): PASS
quota.sh (2M: 32): FAIL kernel_has_private_reservations() failed
quota.sh (2M: 64): PASS
counters.sh (2M: 32): FAIL kernel_has_private_reservations() failed
counters.sh (2M: 64): PASS
mmap-gettest 10 25 (2M: 32): FAIL Failed to mmap the hugetlb file: Cannot allocate memory
mmap-gettest 10 25 (2M: 64): PASS
mmap-cow 24 25 (2M: 32): FAIL Failed to create shared mapping: Cannot allocate memory
mmap-cow 24 25 (2M: 64): PASS
set shmmax limit to 52428800
shm-fork 10 12 (2M: 32): FAIL Thread 0 (pid=4492) failed
shm-fork 10 12 (2M: 64): PASS
set shmmax limit to 52428800
shm-fork 10 25 (2M: 32): FAIL Thread 0 (pid=4535) failed
shm-fork 10 25 (2M: 64): PASS
set shmmax limit to 52428800
shm-getraw 25 /dev/full (2M: 32): FAIL shmat() failed: Cannot allocate memory
shm-getraw 25 /dev/full (2M: 64): PASS
fallocate_stress.sh (2M: 32): FAIL mmap(): Cannot allocate memory FAIL mmap(): FAIL mmap(): Cannot allocate memory FAIL mmap(): Cannot allocate memory
fallocate_stress.sh (2M: 64): PASS
********** TEST SUMMARY
* 2M
* 32-bit 64-bit
* Total testcases: 110 113
* Skipped: 0 0
* PASS: 45 99
* FAIL: 53 5
* Killed by signal: 10 7
* Bad configuration: 2 2
* Expected FAIL: 0 0
* Unexpected PASS: 0 0
* Strange test result: 0 0
**********



To reproduce:

git clone https://github.com/01org/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml



Thanks,
Xiaolong


Attachments:
(No filename) (13.62 kB)
config-4.10.0-11076-g0d708ea (154.21 kB)
job-script (6.43 kB)
kmsg.xz (28.93 kB)
libhugetlbfs (25.20 kB)
job.yaml (4.13 kB)
reproduce (180.00 B)