This series introduces generic functions to make top-down mmap layout
easily accessible to architectures, in particular riscv which was
the initial goal of this series.
The generic implementation was taken from arm64 and used successively
by arm, mips and finally riscv.
Note that in addition the series fixes 2 issues:
- stack randomization was taken into account even if not necessary.
- [1] fixed an issue with mmap base which did not take into account
randomization but did not report it to arm and mips, so by moving
arm64 into a generic library, this problem is now fixed for both
architectures.
This work is an effort to factorize architecture functions to avoid
code duplication and oversights as in [1].
[1]: https://www.mail-archive.com/[email protected]/msg1429066.html
Changes in v6:
- Do not handle sv48 as it will be correctly implemented later: assume
64BIT <=> sv39.
- Add acked-by from Paul
Changes in v5:
- Fix [PATCH 11/14]
- Rebase on top of v5.3rc2 and commit
"riscv: kbuild: add virtual memory system selection"
- [PATCH 14/14] now takes into account the various virtual memory systems
Changes in v4:
- Make ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT select ARCH_HAS_ELF_RANDOMIZE
by default as suggested by Kees,
- ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT depends on MMU and defines the
functions needed by ARCH_HAS_ELF_RANDOMIZE => architectures that use
the generic mmap topdown functions cannot have ARCH_HAS_ELF_RANDOMIZE
selected without MMU, but I think it's ok since randomization without
MMU does not add much security anyway.
- There is no common API to determine if a process is 32b, so I came up with
!IS_ENABLED(CONFIG_64BIT) || is_compat_task() in [PATCH v4 12/14].
- Mention in the change log that x86 already takes care of not offseting mmap
base address if the task does not want randomization.
- Re-introduce a comment that should not have been removed.
- Add Reviewed/Acked-By from Paul, Christoph and Kees, thank you for that.
- I tried to minimize the changes from the commits in v3 in order to make
easier the review of the v4, the commits changed or added are:
- [PATCH v4 5/14]
- [PATCH v4 8/14]
- [PATCH v4 11/14]
- [PATCH v4 12/14]
- [PATCH v4 13/14]
Changes in v3:
- Split into small patches to ease review as suggested by Christoph
Hellwig and Kees Cook
- Move help text of new config as a comment, as suggested by Christoph
- Make new config depend on MMU, as suggested by Christoph
Changes in v2 as suggested by Christoph Hellwig:
- Preparatory patch that moves randomize_stack_top
- Fix duplicate config in riscv
- Align #if defined on next line => this gives rise to a checkpatch
warning. I found this pattern all around the tree, in the same proportion
as the previous pattern which was less pretty:
git grep -C 1 -n -P "^#if defined.+\|\|.*\\\\$"
Alexandre Ghiti (14):
mm, fs: Move randomize_stack_top from fs to mm
arm64: Make use of is_compat_task instead of hardcoding this test
arm64: Consider stack randomization for mmap base only when necessary
arm64, mm: Move generic mmap layout functions to mm
arm64, mm: Make randomization selected by generic topdown mmap layout
arm: Properly account for stack randomization and stack guard gap
arm: Use STACK_TOP when computing mmap base address
arm: Use generic mmap top-down layout and brk randomization
mips: Properly account for stack randomization and stack guard gap
mips: Use STACK_TOP when computing mmap base address
mips: Adjust brk randomization offset to fit generic version
mips: Replace arch specific way to determine 32bit task with generic
version
mips: Use generic mmap top-down layout and brk randomization
riscv: Make mmap allocation top-down by default
arch/Kconfig | 11 +++
arch/arm/Kconfig | 2 +-
arch/arm/include/asm/processor.h | 2 -
arch/arm/kernel/process.c | 5 --
arch/arm/mm/mmap.c | 52 --------------
arch/arm64/Kconfig | 2 +-
arch/arm64/include/asm/processor.h | 2 -
arch/arm64/kernel/process.c | 8 ---
arch/arm64/mm/mmap.c | 72 -------------------
arch/mips/Kconfig | 2 +-
arch/mips/include/asm/processor.h | 5 --
arch/mips/mm/mmap.c | 84 ----------------------
arch/riscv/Kconfig | 12 ++++
fs/binfmt_elf.c | 20 ------
include/linux/mm.h | 2 +
kernel/sysctl.c | 6 +-
mm/util.c | 107 ++++++++++++++++++++++++++++-
17 files changed, 138 insertions(+), 256 deletions(-)
--
2.20.1
This preparatory commit moves this function so that further introduction
of generic topdown mmap layout is contained only in mm/util.c.
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Kees Cook <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
---
fs/binfmt_elf.c | 20 --------------------
include/linux/mm.h | 2 ++
mm/util.c | 22 ++++++++++++++++++++++
3 files changed, 24 insertions(+), 20 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index d4e11b2e04f6..cec3b4146440 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -670,26 +670,6 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
* libraries. There is no binary dependent code anywhere else.
*/
-#ifndef STACK_RND_MASK
-#define STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12)) /* 8MB of VA */
-#endif
-
-static unsigned long randomize_stack_top(unsigned long stack_top)
-{
- unsigned long random_variable = 0;
-
- if (current->flags & PF_RANDOMIZE) {
- random_variable = get_random_long();
- random_variable &= STACK_RND_MASK;
- random_variable <<= PAGE_SHIFT;
- }
-#ifdef CONFIG_STACK_GROWSUP
- return PAGE_ALIGN(stack_top) + random_variable;
-#else
- return PAGE_ALIGN(stack_top) - random_variable;
-#endif
-}
-
static int load_elf_binary(struct linux_binprm *bprm)
{
struct file *interpreter = NULL; /* to shut gcc up */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0334ca97c584..ae0e5d241eb8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2351,6 +2351,8 @@ extern int install_special_mapping(struct mm_struct *mm,
unsigned long addr, unsigned long len,
unsigned long flags, struct page **pages);
+unsigned long randomize_stack_top(unsigned long stack_top);
+
extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
extern unsigned long mmap_region(struct file *file, unsigned long addr,
diff --git a/mm/util.c b/mm/util.c
index e6351a80f248..15a4fb0f5473 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -16,6 +16,8 @@
#include <linux/hugetlb.h>
#include <linux/vmalloc.h>
#include <linux/userfaultfd_k.h>
+#include <linux/elf.h>
+#include <linux/random.h>
#include <linux/uaccess.h>
@@ -293,6 +295,26 @@ int vma_is_stack_for_current(struct vm_area_struct *vma)
return (vma->vm_start <= KSTK_ESP(t) && vma->vm_end >= KSTK_ESP(t));
}
+#ifndef STACK_RND_MASK
+#define STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12)) /* 8MB of VA */
+#endif
+
+unsigned long randomize_stack_top(unsigned long stack_top)
+{
+ unsigned long random_variable = 0;
+
+ if (current->flags & PF_RANDOMIZE) {
+ random_variable = get_random_long();
+ random_variable &= STACK_RND_MASK;
+ random_variable <<= PAGE_SHIFT;
+ }
+#ifdef CONFIG_STACK_GROWSUP
+ return PAGE_ALIGN(stack_top) + random_variable;
+#else
+ return PAGE_ALIGN(stack_top) - random_variable;
+#endif
+}
+
#if defined(CONFIG_MMU) && !defined(HAVE_ARCH_PICK_MMAP_LAYOUT)
void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
{
--
2.20.1
Each architecture has its own way to determine if a task is a compat task,
by using is_compat_task in arch_mmap_rnd, it allows more genericity and
then it prepares its moving to mm/.
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: Kees Cook <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
---
arch/arm64/mm/mmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index b050641b5139..bb0140afed66 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -43,7 +43,7 @@ unsigned long arch_mmap_rnd(void)
unsigned long rnd;
#ifdef CONFIG_COMPAT
- if (test_thread_flag(TIF_32BIT))
+ if (is_compat_task())
rnd = get_random_long() & ((1UL << mmap_rnd_compat_bits) - 1);
else
#endif
--
2.20.1
arm64 handles top-down mmap layout in a way that can be easily reused
by other architectures, so make it available in mm.
It then introduces a new config ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
that can be set by other architectures to benefit from those functions.
Note that this new config depends on MMU being enabled, if selected
without MMU support, a warning will be thrown.
Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: Kees Cook <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
---
arch/Kconfig | 10 ++++
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/processor.h | 2 -
arch/arm64/mm/mmap.c | 76 -----------------------------
kernel/sysctl.c | 6 ++-
mm/util.c | 78 +++++++++++++++++++++++++++++-
6 files changed, 92 insertions(+), 81 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index a7b57dd42c26..a0bb6fa4d381 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -696,6 +696,16 @@ config HAVE_ARCH_COMPAT_MMAP_BASES
and vice-versa 32-bit applications to call 64-bit mmap().
Required for applications doing different bitness syscalls.
+# This allows to use a set of generic functions to determine mmap base
+# address by giving priority to top-down scheme only if the process
+# is not in legacy mode (compat task, unlimited stack size or
+# sysctl_legacy_va_layout).
+# Architecture that selects this option can provide its own version of:
+# - STACK_RND_MASK
+config ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
+ bool
+ depends on MMU
+
config HAVE_COPY_THREAD_TLS
bool
help
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3adcec05b1f6..14a194e63458 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -72,6 +72,7 @@ config ARM64
select ARCH_SUPPORTS_INT128 if GCC_VERSION >= 50000 || CC_IS_CLANG
select ARCH_SUPPORTS_NUMA_BALANCING
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
+ select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
select ARCH_WANT_FRAME_POINTERS
select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
select ARCH_HAS_UBSAN_SANITIZE_ALL
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 844e2964b0f5..65e2de00913f 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -281,8 +281,6 @@ static inline void spin_lock_prefetch(const void *ptr)
"nop") : : "p" (ptr));
}
-#define HAVE_ARCH_PICK_MMAP_LAYOUT
-
#endif
extern unsigned long __ro_after_init signal_minsigstksz; /* sigframe size */
diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index e4acaead67de..3028bacbc4e9 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -20,82 +20,6 @@
#include <asm/cputype.h>
-/*
- * Leave enough space between the mmap area and the stack to honour ulimit in
- * the face of randomisation.
- */
-#define MIN_GAP (SZ_128M)
-#define MAX_GAP (STACK_TOP/6*5)
-
-static int mmap_is_legacy(struct rlimit *rlim_stack)
-{
- if (current->personality & ADDR_COMPAT_LAYOUT)
- return 1;
-
- if (rlim_stack->rlim_cur == RLIM_INFINITY)
- return 1;
-
- return sysctl_legacy_va_layout;
-}
-
-unsigned long arch_mmap_rnd(void)
-{
- unsigned long rnd;
-
-#ifdef CONFIG_COMPAT
- if (is_compat_task())
- rnd = get_random_long() & ((1UL << mmap_rnd_compat_bits) - 1);
- else
-#endif
- rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
- return rnd << PAGE_SHIFT;
-}
-
-static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
-{
- unsigned long gap = rlim_stack->rlim_cur;
- unsigned long pad = stack_guard_gap;
-
- /* Account for stack randomization if necessary */
- if (current->flags & PF_RANDOMIZE)
- pad += (STACK_RND_MASK << PAGE_SHIFT);
-
- /* Values close to RLIM_INFINITY can overflow. */
- if (gap + pad > gap)
- gap += pad;
-
- if (gap < MIN_GAP)
- gap = MIN_GAP;
- else if (gap > MAX_GAP)
- gap = MAX_GAP;
-
- return PAGE_ALIGN(STACK_TOP - gap - rnd);
-}
-
-/*
- * This function, called very early during the creation of a new process VM
- * image, sets up which VM layout function to use:
- */
-void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
-{
- unsigned long random_factor = 0UL;
-
- if (current->flags & PF_RANDOMIZE)
- random_factor = arch_mmap_rnd();
-
- /*
- * Fall back to the standard layout if the personality bit is set, or
- * if the expected stack growth is unlimited:
- */
- if (mmap_is_legacy(rlim_stack)) {
- mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
- mm->get_unmapped_area = arch_get_unmapped_area;
- } else {
- mm->mmap_base = mmap_base(random_factor, rlim_stack);
- mm->get_unmapped_area = arch_get_unmapped_area_topdown;
- }
-}
-
/*
* You really shouldn't be using read() or write() on /dev/mem. This might go
* away in the future.
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 078950d9605b..00fcea236eba 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -264,7 +264,8 @@ extern struct ctl_table epoll_table[];
extern struct ctl_table firmware_config_table[];
#endif
-#ifdef HAVE_ARCH_PICK_MMAP_LAYOUT
+#if defined(HAVE_ARCH_PICK_MMAP_LAYOUT) || \
+ defined(CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT)
int sysctl_legacy_va_layout;
#endif
@@ -1573,7 +1574,8 @@ static struct ctl_table vm_table[] = {
.proc_handler = proc_dointvec,
.extra1 = SYSCTL_ZERO,
},
-#ifdef HAVE_ARCH_PICK_MMAP_LAYOUT
+#if defined(HAVE_ARCH_PICK_MMAP_LAYOUT) || \
+ defined(CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT)
{
.procname = "legacy_va_layout",
.data = &sysctl_legacy_va_layout,
diff --git a/mm/util.c b/mm/util.c
index 15a4fb0f5473..0781e5575cb3 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -17,7 +17,12 @@
#include <linux/vmalloc.h>
#include <linux/userfaultfd_k.h>
#include <linux/elf.h>
+#include <linux/elf-randomize.h>
+#include <linux/personality.h>
#include <linux/random.h>
+#include <linux/processor.h>
+#include <linux/sizes.h>
+#include <linux/compat.h>
#include <linux/uaccess.h>
@@ -315,7 +320,78 @@ unsigned long randomize_stack_top(unsigned long stack_top)
#endif
}
-#if defined(CONFIG_MMU) && !defined(HAVE_ARCH_PICK_MMAP_LAYOUT)
+#ifdef CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
+#ifdef CONFIG_ARCH_HAS_ELF_RANDOMIZE
+unsigned long arch_mmap_rnd(void)
+{
+ unsigned long rnd;
+
+#ifdef CONFIG_HAVE_ARCH_MMAP_RND_COMPAT_BITS
+ if (is_compat_task())
+ rnd = get_random_long() & ((1UL << mmap_rnd_compat_bits) - 1);
+ else
+#endif /* CONFIG_HAVE_ARCH_MMAP_RND_COMPAT_BITS */
+ rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
+
+ return rnd << PAGE_SHIFT;
+}
+#endif /* CONFIG_ARCH_HAS_ELF_RANDOMIZE */
+
+static int mmap_is_legacy(struct rlimit *rlim_stack)
+{
+ if (current->personality & ADDR_COMPAT_LAYOUT)
+ return 1;
+
+ if (rlim_stack->rlim_cur == RLIM_INFINITY)
+ return 1;
+
+ return sysctl_legacy_va_layout;
+}
+
+/*
+ * Leave enough space between the mmap area and the stack to honour ulimit in
+ * the face of randomisation.
+ */
+#define MIN_GAP (SZ_128M)
+#define MAX_GAP (STACK_TOP / 6 * 5)
+
+static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
+{
+ unsigned long gap = rlim_stack->rlim_cur;
+ unsigned long pad = stack_guard_gap;
+
+ /* Account for stack randomization if necessary */
+ if (current->flags & PF_RANDOMIZE)
+ pad += (STACK_RND_MASK << PAGE_SHIFT);
+
+ /* Values close to RLIM_INFINITY can overflow. */
+ if (gap + pad > gap)
+ gap += pad;
+
+ if (gap < MIN_GAP)
+ gap = MIN_GAP;
+ else if (gap > MAX_GAP)
+ gap = MAX_GAP;
+
+ return PAGE_ALIGN(STACK_TOP - gap - rnd);
+}
+
+void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
+{
+ unsigned long random_factor = 0UL;
+
+ if (current->flags & PF_RANDOMIZE)
+ random_factor = arch_mmap_rnd();
+
+ if (mmap_is_legacy(rlim_stack)) {
+ mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
+ mm->get_unmapped_area = arch_get_unmapped_area;
+ } else {
+ mm->mmap_base = mmap_base(random_factor, rlim_stack);
+ mm->get_unmapped_area = arch_get_unmapped_area_topdown;
+ }
+}
+#elif defined(CONFIG_MMU) && !defined(HAVE_ARCH_PICK_MMAP_LAYOUT)
void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
{
mm->mmap_base = TASK_UNMAPPED_BASE;
--
2.20.1
This commit takes care of stack randomization and stack guard gap when
computing mmap base address and checks if the task asked for randomization.
This fixes the problem uncovered and not fixed for arm here:
https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Kees Cook <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
---
arch/arm/mm/mmap.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
index f866870db749..bff3d00bda5b 100644
--- a/arch/arm/mm/mmap.c
+++ b/arch/arm/mm/mmap.c
@@ -18,8 +18,9 @@
(((pgoff)<<PAGE_SHIFT) & (SHMLBA-1)))
/* gap between mmap and stack */
-#define MIN_GAP (128*1024*1024UL)
-#define MAX_GAP ((TASK_SIZE)/6*5)
+#define MIN_GAP (128*1024*1024UL)
+#define MAX_GAP ((TASK_SIZE)/6*5)
+#define STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12))
static int mmap_is_legacy(struct rlimit *rlim_stack)
{
@@ -35,6 +36,15 @@ static int mmap_is_legacy(struct rlimit *rlim_stack)
static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
{
unsigned long gap = rlim_stack->rlim_cur;
+ unsigned long pad = stack_guard_gap;
+
+ /* Account for stack randomization if necessary */
+ if (current->flags & PF_RANDOMIZE)
+ pad += (STACK_RND_MASK << PAGE_SHIFT);
+
+ /* Values close to RLIM_INFINITY can overflow. */
+ if (gap + pad > gap)
+ gap += pad;
if (gap < MIN_GAP)
gap = MIN_GAP;
--
2.20.1
mmap base address must be computed wrt stack top address, using TASK_SIZE
is wrong since STACK_TOP and TASK_SIZE are not equivalent.
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Kees Cook <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
---
arch/arm/mm/mmap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
index bff3d00bda5b..0b94b674aa91 100644
--- a/arch/arm/mm/mmap.c
+++ b/arch/arm/mm/mmap.c
@@ -19,7 +19,7 @@
/* gap between mmap and stack */
#define MIN_GAP (128*1024*1024UL)
-#define MAX_GAP ((TASK_SIZE)/6*5)
+#define MAX_GAP ((STACK_TOP)/6*5)
#define STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12))
static int mmap_is_legacy(struct rlimit *rlim_stack)
@@ -51,7 +51,7 @@ static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
else if (gap > MAX_GAP)
gap = MAX_GAP;
- return PAGE_ALIGN(TASK_SIZE - gap - rnd);
+ return PAGE_ALIGN(STACK_TOP - gap - rnd);
}
/*
--
2.20.1
arm uses a top-down mmap layout by default that exactly fits the generic
functions, so get rid of arch specific code and use the generic version
by selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT.
As ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT selects ARCH_HAS_ELF_RANDOMIZE,
use the generic version of arch_randomize_brk since it also fits.
Note that this commit also removes the possibility for arm to have elf
randomization and no MMU: without MMU, the security added by randomization
is worth nothing.
Note that it is safe to remove STACK_RND_MASK since it matches the default
value.
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Kees Cook <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
---
arch/arm/Kconfig | 2 +-
arch/arm/include/asm/processor.h | 2 --
arch/arm/kernel/process.c | 5 ---
arch/arm/mm/mmap.c | 62 --------------------------------
4 files changed, 1 insertion(+), 70 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 33b00579beff..81b08b027e4e 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -7,7 +7,6 @@ config ARM
select ARCH_HAS_BINFMT_FLAT
select ARCH_HAS_DEBUG_VIRTUAL if MMU
select ARCH_HAS_DEVMEM_IS_ALLOWED
- select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_KEEPINITRD
select ARCH_HAS_KCOV
@@ -30,6 +29,7 @@ config ARM
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
+ select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
select ARCH_WANT_IPC_PARSE_VERSION
select BINFMT_FLAT_ARGVP_ENVP_ON_STACK
select BUILDTIME_EXTABLE_SORT if MMU
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 20c2f42454b8..614bf829e454 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -140,8 +140,6 @@ static inline void prefetchw(const void *ptr)
#endif
#endif
-#define HAVE_ARCH_PICK_MMAP_LAYOUT
-
#endif
#endif /* __ASM_ARM_PROCESSOR_H */
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index f934a6739fc0..9485acc520a4 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -319,11 +319,6 @@ unsigned long get_wchan(struct task_struct *p)
return 0;
}
-unsigned long arch_randomize_brk(struct mm_struct *mm)
-{
- return randomize_page(mm->brk, 0x02000000);
-}
-
#ifdef CONFIG_MMU
#ifdef CONFIG_KUSER_HELPERS
/*
diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
index 0b94b674aa91..b8d912ac9e61 100644
--- a/arch/arm/mm/mmap.c
+++ b/arch/arm/mm/mmap.c
@@ -17,43 +17,6 @@
((((addr)+SHMLBA-1)&~(SHMLBA-1)) + \
(((pgoff)<<PAGE_SHIFT) & (SHMLBA-1)))
-/* gap between mmap and stack */
-#define MIN_GAP (128*1024*1024UL)
-#define MAX_GAP ((STACK_TOP)/6*5)
-#define STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12))
-
-static int mmap_is_legacy(struct rlimit *rlim_stack)
-{
- if (current->personality & ADDR_COMPAT_LAYOUT)
- return 1;
-
- if (rlim_stack->rlim_cur == RLIM_INFINITY)
- return 1;
-
- return sysctl_legacy_va_layout;
-}
-
-static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
-{
- unsigned long gap = rlim_stack->rlim_cur;
- unsigned long pad = stack_guard_gap;
-
- /* Account for stack randomization if necessary */
- if (current->flags & PF_RANDOMIZE)
- pad += (STACK_RND_MASK << PAGE_SHIFT);
-
- /* Values close to RLIM_INFINITY can overflow. */
- if (gap + pad > gap)
- gap += pad;
-
- if (gap < MIN_GAP)
- gap = MIN_GAP;
- else if (gap > MAX_GAP)
- gap = MAX_GAP;
-
- return PAGE_ALIGN(STACK_TOP - gap - rnd);
-}
-
/*
* We need to ensure that shared mappings are correctly aligned to
* avoid aliasing issues with VIPT caches. We need to ensure that
@@ -181,31 +144,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
return addr;
}
-unsigned long arch_mmap_rnd(void)
-{
- unsigned long rnd;
-
- rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
-
- return rnd << PAGE_SHIFT;
-}
-
-void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
-{
- unsigned long random_factor = 0UL;
-
- if (current->flags & PF_RANDOMIZE)
- random_factor = arch_mmap_rnd();
-
- if (mmap_is_legacy(rlim_stack)) {
- mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
- mm->get_unmapped_area = arch_get_unmapped_area;
- } else {
- mm->mmap_base = mmap_base(random_factor, rlim_stack);
- mm->get_unmapped_area = arch_get_unmapped_area_topdown;
- }
-}
-
/*
* You really shouldn't be using read() or write() on /dev/mem. This
* might go away in the future.
--
2.20.1
This commit takes care of stack randomization and stack guard gap when
computing mmap base address and checks if the task asked for randomization.
This fixes the problem uncovered and not fixed for arm here:
https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Paul Burton <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
---
arch/mips/mm/mmap.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
index d79f2b432318..f5c778113384 100644
--- a/arch/mips/mm/mmap.c
+++ b/arch/mips/mm/mmap.c
@@ -21,8 +21,9 @@ unsigned long shm_align_mask = PAGE_SIZE - 1; /* Sane caches */
EXPORT_SYMBOL(shm_align_mask);
/* gap between mmap and stack */
-#define MIN_GAP (128*1024*1024UL)
-#define MAX_GAP ((TASK_SIZE)/6*5)
+#define MIN_GAP (128*1024*1024UL)
+#define MAX_GAP ((TASK_SIZE)/6*5)
+#define STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12))
static int mmap_is_legacy(struct rlimit *rlim_stack)
{
@@ -38,6 +39,15 @@ static int mmap_is_legacy(struct rlimit *rlim_stack)
static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
{
unsigned long gap = rlim_stack->rlim_cur;
+ unsigned long pad = stack_guard_gap;
+
+ /* Account for stack randomization if necessary */
+ if (current->flags & PF_RANDOMIZE)
+ pad += (STACK_RND_MASK << PAGE_SHIFT);
+
+ /* Values close to RLIM_INFINITY can overflow. */
+ if (gap + pad > gap)
+ gap += pad;
if (gap < MIN_GAP)
gap = MIN_GAP;
--
2.20.1
mips uses a top-down layout by default that exactly fits the generic
functions, so get rid of arch specific code and use the generic version
by selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT.
As ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT selects ARCH_HAS_ELF_RANDOMIZE,
use the generic version of arch_randomize_brk since it also fits.
Note that this commit also removes the possibility for mips to have elf
randomization and no MMU: without MMU, the security added by randomization
is worth nothing.
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Paul Burton <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
---
arch/mips/Kconfig | 2 +-
arch/mips/include/asm/processor.h | 5 --
arch/mips/mm/mmap.c | 96 -------------------------------
3 files changed, 1 insertion(+), 102 deletions(-)
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index d50fafd7bf3a..4e85d7d2cf1a 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -5,7 +5,6 @@ config MIPS
select ARCH_32BIT_OFF_T if !64BIT
select ARCH_BINFMT_ELF_STATE if MIPS_FP_SUPPORT
select ARCH_CLOCKSOURCE_DATA
- select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_SUPPORTS_UPROBES
@@ -13,6 +12,7 @@ config MIPS
select ARCH_USE_CMPXCHG_LOCKREF if 64BIT
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
+ select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
select ARCH_WANT_IPC_PARSE_VERSION
select BUILDTIME_EXTABLE_SORT
select CLONE_BACKWARDS
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index aca909bd7841..fba18d4a9190 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -29,11 +29,6 @@
extern unsigned int vced_count, vcei_count;
-/*
- * MIPS does have an arch_pick_mmap_layout()
- */
-#define HAVE_ARCH_PICK_MMAP_LAYOUT 1
-
#ifdef CONFIG_32BIT
#ifdef CONFIG_KVM_GUEST
/* User space process size is limited to 1GB in KVM Guest Mode */
diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
index d5106c26ac6a..00fe90c6db3e 100644
--- a/arch/mips/mm/mmap.c
+++ b/arch/mips/mm/mmap.c
@@ -16,49 +16,10 @@
#include <linux/random.h>
#include <linux/sched/signal.h>
#include <linux/sched/mm.h>
-#include <linux/sizes.h>
-#include <linux/compat.h>
unsigned long shm_align_mask = PAGE_SIZE - 1; /* Sane caches */
EXPORT_SYMBOL(shm_align_mask);
-/* gap between mmap and stack */
-#define MIN_GAP (128*1024*1024UL)
-#define MAX_GAP ((STACK_TOP)/6*5)
-#define STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12))
-
-static int mmap_is_legacy(struct rlimit *rlim_stack)
-{
- if (current->personality & ADDR_COMPAT_LAYOUT)
- return 1;
-
- if (rlim_stack->rlim_cur == RLIM_INFINITY)
- return 1;
-
- return sysctl_legacy_va_layout;
-}
-
-static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
-{
- unsigned long gap = rlim_stack->rlim_cur;
- unsigned long pad = stack_guard_gap;
-
- /* Account for stack randomization if necessary */
- if (current->flags & PF_RANDOMIZE)
- pad += (STACK_RND_MASK << PAGE_SHIFT);
-
- /* Values close to RLIM_INFINITY can overflow. */
- if (gap + pad > gap)
- gap += pad;
-
- if (gap < MIN_GAP)
- gap = MIN_GAP;
- else if (gap > MAX_GAP)
- gap = MAX_GAP;
-
- return PAGE_ALIGN(STACK_TOP - gap - rnd);
-}
-
#define COLOUR_ALIGN(addr, pgoff) \
((((addr) + shm_align_mask) & ~shm_align_mask) + \
(((pgoff) << PAGE_SHIFT) & shm_align_mask))
@@ -156,63 +117,6 @@ unsigned long arch_get_unmapped_area_topdown(struct file *filp,
addr0, len, pgoff, flags, DOWN);
}
-unsigned long arch_mmap_rnd(void)
-{
- unsigned long rnd;
-
-#ifdef CONFIG_COMPAT
- if (TASK_IS_32BIT_ADDR)
- rnd = get_random_long() & ((1UL << mmap_rnd_compat_bits) - 1);
- else
-#endif /* CONFIG_COMPAT */
- rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
-
- return rnd << PAGE_SHIFT;
-}
-
-void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
-{
- unsigned long random_factor = 0UL;
-
- if (current->flags & PF_RANDOMIZE)
- random_factor = arch_mmap_rnd();
-
- if (mmap_is_legacy(rlim_stack)) {
- mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
- mm->get_unmapped_area = arch_get_unmapped_area;
- } else {
- mm->mmap_base = mmap_base(random_factor, rlim_stack);
- mm->get_unmapped_area = arch_get_unmapped_area_topdown;
- }
-}
-
-static inline unsigned long brk_rnd(void)
-{
- unsigned long rnd = get_random_long();
-
- rnd = rnd << PAGE_SHIFT;
- /* 32MB for 32bit, 1GB for 64bit */
- if (!IS_ENABLED(CONFIG_64BIT) || is_compat_task())
- rnd = rnd & (SZ_32M - 1);
- else
- rnd = rnd & (SZ_1G - 1);
-
- return rnd;
-}
-
-unsigned long arch_randomize_brk(struct mm_struct *mm)
-{
- unsigned long base = mm->brk;
- unsigned long ret;
-
- ret = PAGE_ALIGN(base + brk_rnd());
-
- if (ret < mm->brk)
- return mm->brk;
-
- return ret;
-}
-
bool __virt_addr_valid(const volatile void *kaddr)
{
unsigned long vaddr = (unsigned long)kaddr;
--
2.20.1
In order to avoid wasting user address space by using bottom-up mmap
allocation scheme, prefer top-down scheme when possible.
Before:
root@qemuriscv64:~# cat /proc/self/maps
00010000-00016000 r-xp 00000000 fe:00 6389 /bin/cat.coreutils
00016000-00017000 r--p 00005000 fe:00 6389 /bin/cat.coreutils
00017000-00018000 rw-p 00006000 fe:00 6389 /bin/cat.coreutils
00018000-00039000 rw-p 00000000 00:00 0 [heap]
1555556000-155556d000 r-xp 00000000 fe:00 7193 /lib/ld-2.28.so
155556d000-155556e000 r--p 00016000 fe:00 7193 /lib/ld-2.28.so
155556e000-155556f000 rw-p 00017000 fe:00 7193 /lib/ld-2.28.so
155556f000-1555570000 rw-p 00000000 00:00 0
1555570000-1555572000 r-xp 00000000 00:00 0 [vdso]
1555574000-1555576000 rw-p 00000000 00:00 0
1555576000-1555674000 r-xp 00000000 fe:00 7187 /lib/libc-2.28.so
1555674000-1555678000 r--p 000fd000 fe:00 7187 /lib/libc-2.28.so
1555678000-155567a000 rw-p 00101000 fe:00 7187 /lib/libc-2.28.so
155567a000-15556a0000 rw-p 00000000 00:00 0
3fffb90000-3fffbb1000 rw-p 00000000 00:00 0 [stack]
After:
root@qemuriscv64:~# cat /proc/self/maps
00010000-00016000 r-xp 00000000 fe:00 6389 /bin/cat.coreutils
00016000-00017000 r--p 00005000 fe:00 6389 /bin/cat.coreutils
00017000-00018000 rw-p 00006000 fe:00 6389 /bin/cat.coreutils
2de81000-2dea2000 rw-p 00000000 00:00 0 [heap]
3ff7eb6000-3ff7ed8000 rw-p 00000000 00:00 0
3ff7ed8000-3ff7fd6000 r-xp 00000000 fe:00 7187 /lib/libc-2.28.so
3ff7fd6000-3ff7fda000 r--p 000fd000 fe:00 7187 /lib/libc-2.28.so
3ff7fda000-3ff7fdc000 rw-p 00101000 fe:00 7187 /lib/libc-2.28.so
3ff7fdc000-3ff7fe2000 rw-p 00000000 00:00 0
3ff7fe4000-3ff7fe6000 r-xp 00000000 00:00 0 [vdso]
3ff7fe6000-3ff7ffd000 r-xp 00000000 fe:00 7193 /lib/ld-2.28.so
3ff7ffd000-3ff7ffe000 r--p 00016000 fe:00 7193 /lib/ld-2.28.so
3ff7ffe000-3ff7fff000 rw-p 00017000 fe:00 7193 /lib/ld-2.28.so
3ff7fff000-3ff8000000 rw-p 00000000 00:00 0
3fff888000-3fff8a9000 rw-p 00000000 00:00 0 [stack]
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Paul Walmsley <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
---
arch/riscv/Kconfig | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 59a4727ecd6c..87dc5370becb 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -54,6 +54,18 @@ config RISCV
select EDAC_SUPPORT
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
+ select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
+ select HAVE_ARCH_MMAP_RND_BITS
+
+config ARCH_MMAP_RND_BITS_MIN
+ default 18 if 64BIT
+ default 8
+
+# max bits determined by the following formula:
+# VA_BITS - PAGE_SHIFT - 3
+config ARCH_MMAP_RND_BITS_MAX
+ default 24 if 64BIT # SV39 based
+ default 17
config MMU
def_bool y
--
2.20.1
Do not offset mmap base address because of stack randomization if
current task does not want randomization.
Note that x86 already implements this behaviour.
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: Kees Cook <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
---
arch/arm64/mm/mmap.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index bb0140afed66..e4acaead67de 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -54,7 +54,11 @@ unsigned long arch_mmap_rnd(void)
static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
{
unsigned long gap = rlim_stack->rlim_cur;
- unsigned long pad = (STACK_RND_MASK << PAGE_SHIFT) + stack_guard_gap;
+ unsigned long pad = stack_guard_gap;
+
+ /* Account for stack randomization if necessary */
+ if (current->flags & PF_RANDOMIZE)
+ pad += (STACK_RND_MASK << PAGE_SHIFT);
/* Values close to RLIM_INFINITY can overflow. */
if (gap + pad > gap)
--
2.20.1
This commits selects ARCH_HAS_ELF_RANDOMIZE when an arch uses the generic
topdown mmap layout functions so that this security feature is on by
default.
Note that this commit also removes the possibility for arm64 to have elf
randomization and no MMU: without MMU, the security added by randomization
is worth nothing.
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: Kees Cook <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
---
arch/Kconfig | 1 +
arch/arm64/Kconfig | 1 -
arch/arm64/kernel/process.c | 8 --------
mm/util.c | 11 +++++++++--
4 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index a0bb6fa4d381..d4c1f0551dfe 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -705,6 +705,7 @@ config HAVE_ARCH_COMPAT_MMAP_BASES
config ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
bool
depends on MMU
+ select ARCH_HAS_ELF_RANDOMIZE
config HAVE_COPY_THREAD_TLS
bool
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 14a194e63458..399f595ef852 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -16,7 +16,6 @@ config ARM64
select ARCH_HAS_DMA_MMAP_PGPROT
select ARCH_HAS_DMA_PREP_COHERENT
select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
- select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FAST_MULTIPLIER
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index f674f28df663..8ddc2471b054 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -548,14 +548,6 @@ unsigned long arch_align_stack(unsigned long sp)
return sp & ~0xf;
}
-unsigned long arch_randomize_brk(struct mm_struct *mm)
-{
- if (is_compat_task())
- return randomize_page(mm->brk, SZ_32M);
- else
- return randomize_page(mm->brk, SZ_1G);
-}
-
/*
* Called from setup_new_exec() after (COMPAT_)SET_PERSONALITY.
*/
diff --git a/mm/util.c b/mm/util.c
index 0781e5575cb3..16f1e56e2996 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -321,7 +321,15 @@ unsigned long randomize_stack_top(unsigned long stack_top)
}
#ifdef CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
-#ifdef CONFIG_ARCH_HAS_ELF_RANDOMIZE
+unsigned long arch_randomize_brk(struct mm_struct *mm)
+{
+ /* Is the current task 32bit ? */
+ if (!IS_ENABLED(CONFIG_64BIT) || is_compat_task())
+ return randomize_page(mm->brk, SZ_32M);
+
+ return randomize_page(mm->brk, SZ_1G);
+}
+
unsigned long arch_mmap_rnd(void)
{
unsigned long rnd;
@@ -335,7 +343,6 @@ unsigned long arch_mmap_rnd(void)
return rnd << PAGE_SHIFT;
}
-#endif /* CONFIG_ARCH_HAS_ELF_RANDOMIZE */
static int mmap_is_legacy(struct rlimit *rlim_stack)
{
--
2.20.1
mmap base address must be computed wrt stack top address, using TASK_SIZE
is wrong since STACK_TOP and TASK_SIZE are not equivalent.
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Paul Burton <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
---
arch/mips/mm/mmap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
index f5c778113384..a7e84b2e71d7 100644
--- a/arch/mips/mm/mmap.c
+++ b/arch/mips/mm/mmap.c
@@ -22,7 +22,7 @@ EXPORT_SYMBOL(shm_align_mask);
/* gap between mmap and stack */
#define MIN_GAP (128*1024*1024UL)
-#define MAX_GAP ((TASK_SIZE)/6*5)
+#define MAX_GAP ((STACK_TOP)/6*5)
#define STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12))
static int mmap_is_legacy(struct rlimit *rlim_stack)
@@ -54,7 +54,7 @@ static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
else if (gap > MAX_GAP)
gap = MAX_GAP;
- return PAGE_ALIGN(TASK_SIZE - gap - rnd);
+ return PAGE_ALIGN(STACK_TOP - gap - rnd);
}
#define COLOUR_ALIGN(addr, pgoff) \
--
2.20.1
Mips uses TASK_IS_32BIT_ADDR to determine if a task is 32bit, but
this define is mips specific and other arches do not have it: instead,
use !IS_ENABLED(CONFIG_64BIT) || is_compat_task() condition.
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Paul Burton <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
---
arch/mips/mm/mmap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
index ff6ab87e9c56..d5106c26ac6a 100644
--- a/arch/mips/mm/mmap.c
+++ b/arch/mips/mm/mmap.c
@@ -17,6 +17,7 @@
#include <linux/sched/signal.h>
#include <linux/sched/mm.h>
#include <linux/sizes.h>
+#include <linux/compat.h>
unsigned long shm_align_mask = PAGE_SIZE - 1; /* Sane caches */
EXPORT_SYMBOL(shm_align_mask);
@@ -191,7 +192,7 @@ static inline unsigned long brk_rnd(void)
rnd = rnd << PAGE_SHIFT;
/* 32MB for 32bit, 1GB for 64bit */
- if (TASK_IS_32BIT_ADDR)
+ if (!IS_ENABLED(CONFIG_64BIT) || is_compat_task())
rnd = rnd & (SZ_32M - 1);
else
rnd = rnd & (SZ_1G - 1);
--
2.20.1
This commit simply bumps up to 32MB and 1GB the random offset
of brk, compared to 8MB and 256MB, for 32bit and 64bit respectively.
Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Paul Burton <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
---
arch/mips/mm/mmap.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
index a7e84b2e71d7..ff6ab87e9c56 100644
--- a/arch/mips/mm/mmap.c
+++ b/arch/mips/mm/mmap.c
@@ -16,6 +16,7 @@
#include <linux/random.h>
#include <linux/sched/signal.h>
#include <linux/sched/mm.h>
+#include <linux/sizes.h>
unsigned long shm_align_mask = PAGE_SIZE - 1; /* Sane caches */
EXPORT_SYMBOL(shm_align_mask);
@@ -189,11 +190,11 @@ static inline unsigned long brk_rnd(void)
unsigned long rnd = get_random_long();
rnd = rnd << PAGE_SHIFT;
- /* 8MB for 32bit, 256MB for 64bit */
+ /* 32MB for 32bit, 1GB for 64bit */
if (TASK_IS_32BIT_ADDR)
- rnd = rnd & 0x7ffffful;
+ rnd = rnd & (SZ_32M - 1);
else
- rnd = rnd & 0xffffffful;
+ rnd = rnd & (SZ_1G - 1);
return rnd;
}
--
2.20.1
Hello!
On 08.08.2019 9:17, Alexandre Ghiti wrote:
> This commit takes care of stack randomization and stack guard gap when
> computing mmap base address and checks if the task asked for randomization.
>
> This fixes the problem uncovered and not fixed for arm here:
> https://lkml.kernel.org/r/[email protected]
>
> Signed-off-by: Alexandre Ghiti <[email protected]>
> Acked-by: Kees Cook <[email protected]>
> Acked-by: Paul Burton <[email protected]>
> Reviewed-by: Luis Chamberlain <[email protected]>
> ---
> arch/mips/mm/mmap.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
> index d79f2b432318..f5c778113384 100644
> --- a/arch/mips/mm/mmap.c
> +++ b/arch/mips/mm/mmap.c
> @@ -21,8 +21,9 @@ unsigned long shm_align_mask = PAGE_SIZE - 1; /* Sane caches */
> EXPORT_SYMBOL(shm_align_mask);
>
> /* gap between mmap and stack */
> -#define MIN_GAP (128*1024*1024UL)
> -#define MAX_GAP ((TASK_SIZE)/6*5)
> +#define MIN_GAP (128*1024*1024UL)
> +#define MAX_GAP ((TASK_SIZE)/6*5)
Could add spaces around *, while touching this anyway? And parens
around TASK_SIZE shouldn't be needed...
> +#define STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12))
>
> static int mmap_is_legacy(struct rlimit *rlim_stack)
> {
> @@ -38,6 +39,15 @@ static int mmap_is_legacy(struct rlimit *rlim_stack)
> static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
> {
> unsigned long gap = rlim_stack->rlim_cur;
> + unsigned long pad = stack_guard_gap;
> +
> + /* Account for stack randomization if necessary */
> + if (current->flags & PF_RANDOMIZE)
> + pad += (STACK_RND_MASK << PAGE_SHIFT);
Parens not needed here.
> +
> + /* Values close to RLIM_INFINITY can overflow. */
> + if (gap + pad > gap)
> + gap += pad;
>
> if (gap < MIN_GAP)
> gap = MIN_GAP;
>
Hello!
On 08.08.2019 9:17, Alexandre Ghiti wrote:
> This commit simply bumps up to 32MB and 1GB the random offset
> of brk, compared to 8MB and 256MB, for 32bit and 64bit respectively.
>
> Suggested-by: Kees Cook <[email protected]>
> Signed-off-by: Alexandre Ghiti <[email protected]>
> Acked-by: Paul Burton <[email protected]>
> Reviewed-by: Kees Cook <[email protected]>
> Reviewed-by: Luis Chamberlain <[email protected]>
> ---
> arch/mips/mm/mmap.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
> index a7e84b2e71d7..ff6ab87e9c56 100644
> --- a/arch/mips/mm/mmap.c
> +++ b/arch/mips/mm/mmap.c
[...]
> @@ -189,11 +190,11 @@ static inline unsigned long brk_rnd(void)
> unsigned long rnd = get_random_long();
>
> rnd = rnd << PAGE_SHIFT;
> - /* 8MB for 32bit, 256MB for 64bit */
> + /* 32MB for 32bit, 1GB for 64bit */
> if (TASK_IS_32BIT_ADDR)
> - rnd = rnd & 0x7ffffful;
> + rnd = rnd & (SZ_32M - 1);
> else
> - rnd = rnd & 0xffffffful;
> + rnd = rnd & (SZ_1G - 1);
Why not make these 'rnd &= SZ_* - 1', while at it anyways?
[...]
MBR, Sergei
On 8/8/19 11:16 AM, Sergei Shtylyov wrote:
> Hello!
>
> On 08.08.2019 9:17, Alexandre Ghiti wrote:
>
>> This commit takes care of stack randomization and stack guard gap when
>> computing mmap base address and checks if the task asked for
>> randomization.
>>
>> This fixes the problem uncovered and not fixed for arm here:
>> https://lkml.kernel.org/r/[email protected]
>>
>> Signed-off-by: Alexandre Ghiti <[email protected]>
>> Acked-by: Kees Cook <[email protected]>
>> Acked-by: Paul Burton <[email protected]>
>> Reviewed-by: Luis Chamberlain <[email protected]>
>> ---
>> Â arch/mips/mm/mmap.c | 14 ++++++++++++--
>> Â 1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
>> index d79f2b432318..f5c778113384 100644
>> --- a/arch/mips/mm/mmap.c
>> +++ b/arch/mips/mm/mmap.c
>> @@ -21,8 +21,9 @@ unsigned long shm_align_mask = PAGE_SIZE - 1;Â Â Â /*
>> Sane caches */
>> Â EXPORT_SYMBOL(shm_align_mask);
>> Â Â /* gap between mmap and stack */
>> -#define MIN_GAP (128*1024*1024UL)
>> -#define MAX_GAP ((TASK_SIZE)/6*5)
>> +#define MIN_GAPÂ Â Â Â Â Â Â (128*1024*1024UL)
>> +#define MAX_GAPÂ Â Â Â Â Â Â ((TASK_SIZE)/6*5)
>
> Â Â Could add spaces around *, while touching this anyway? And parens
> around TASK_SIZE shouldn't be needed...
>
I did not fix checkpatch warnings here since this code gets removed
afterwards.
>> +#define STACK_RND_MASKÂ Â Â (0x7ff >> (PAGE_SHIFT - 12))
>> Â Â static int mmap_is_legacy(struct rlimit *rlim_stack)
>> Â {
>> @@ -38,6 +39,15 @@ static int mmap_is_legacy(struct rlimit *rlim_stack)
>> Â static unsigned long mmap_base(unsigned long rnd, struct rlimit
>> *rlim_stack)
>> Â {
>> Â Â Â Â Â unsigned long gap = rlim_stack->rlim_cur;
>> +Â Â Â unsigned long pad = stack_guard_gap;
>> +
>> +Â Â Â /* Account for stack randomization if necessary */
>> +Â Â Â if (current->flags & PF_RANDOMIZE)
>> +Â Â Â Â Â Â Â pad += (STACK_RND_MASK << PAGE_SHIFT);
>
> Â Â Parens not needed here.
Belt and braces approach here as I'm never sure about priorities.
Thanks for your time,
Alex
>
>> +
>> +Â Â Â /* Values close to RLIM_INFINITY can overflow. */
>> +Â Â Â if (gap + pad > gap)
>> +Â Â Â Â Â Â Â gap += pad;
>> Â Â Â Â Â Â if (gap < MIN_GAP)
>> Â Â Â Â Â Â Â Â Â gap = MIN_GAP;
>>
>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv
On 8/8/19 11:19 AM, Sergei Shtylyov wrote:
> Hello!
>
> On 08.08.2019 9:17, Alexandre Ghiti wrote:
>
>> This commit simply bumps up to 32MB and 1GB the random offset
>> of brk, compared to 8MB and 256MB, for 32bit and 64bit respectively.
>>
>> Suggested-by: Kees Cook <[email protected]>
>> Signed-off-by: Alexandre Ghiti <[email protected]>
>> Acked-by: Paul Burton <[email protected]>
>> Reviewed-by: Kees Cook <[email protected]>
>> Reviewed-by: Luis Chamberlain <[email protected]>
>> ---
>> Â arch/mips/mm/mmap.c | 7 ++++---
>> Â 1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
>> index a7e84b2e71d7..ff6ab87e9c56 100644
>> --- a/arch/mips/mm/mmap.c
>> +++ b/arch/mips/mm/mmap.c
> [...]
>> @@ -189,11 +190,11 @@ static inline unsigned long brk_rnd(void)
>> Â Â Â Â Â unsigned long rnd = get_random_long();
>> Â Â Â Â Â Â rnd = rnd << PAGE_SHIFT;
>> -Â Â Â /* 8MB for 32bit, 256MB for 64bit */
>> +Â Â Â /* 32MB for 32bit, 1GB for 64bit */
>> Â Â Â Â Â if (TASK_IS_32BIT_ADDR)
>> -Â Â Â Â Â Â Â rnd = rnd & 0x7ffffful;
>> +Â Â Â Â Â Â Â rnd = rnd & (SZ_32M - 1);
>> Â Â Â Â Â else
>> -Â Â Â Â Â Â Â rnd = rnd & 0xffffffful;
>> +Â Â Â Â Â Â Â rnd = rnd & (SZ_1G - 1);
>
> Â Â Why not make these 'rnd &= SZ_* - 1', while at it anyways?
You're right, I could have. Again, this code gets removed afterwards, so
I think it's ok
to leave it as is.
Anyway, thanks for your remarks Sergei !
Alex
>
> [...]
>
> MBR, Sergei
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv
On Thu, 2019-08-08 at 02:17 -0400, Alexandre Ghiti wrote:
> In order to avoid wasting user address space by using bottom-up mmap
> allocation scheme, prefer top-down scheme when possible.
>
> Before:
> root@qemuriscv64:~# cat /proc/self/maps
> 00010000-00016000 r-xp 00000000 fe:00 6389 /bin/cat.coreutils
> 00016000-00017000 r--p 00005000 fe:00 6389 /bin/cat.coreutils
> 00017000-00018000 rw-p 00006000 fe:00 6389 /bin/cat.coreutils
> 00018000-00039000 rw-p 00000000 00:00 0 [heap]
> 1555556000-155556d000 r-xp 00000000 fe:00 7193 /lib/ld-2.28.so
> 155556d000-155556e000 r--p 00016000 fe:00 7193 /lib/ld-2.28.so
> 155556e000-155556f000 rw-p 00017000 fe:00 7193 /lib/ld-2.28.so
> 155556f000-1555570000 rw-p 00000000 00:00 0
> 1555570000-1555572000 r-xp 00000000 00:00 0 [vdso]
> 1555574000-1555576000 rw-p 00000000 00:00 0
> 1555576000-1555674000 r-xp 00000000 fe:00 7187 /lib/libc-2.28.so
> 1555674000-1555678000 r--p 000fd000 fe:00 7187 /lib/libc-2.28.so
> 1555678000-155567a000 rw-p 00101000 fe:00 7187 /lib/libc-2.28.so
> 155567a000-15556a0000 rw-p 00000000 00:00 0
> 3fffb90000-3fffbb1000 rw-p 00000000 00:00 0 [stack]
>
> After:
> root@qemuriscv64:~# cat /proc/self/maps
> 00010000-00016000 r-xp 00000000 fe:00 6389 /bin/cat.coreutils
> 00016000-00017000 r--p 00005000 fe:00 6389 /bin/cat.coreutils
> 00017000-00018000 rw-p 00006000 fe:00 6389 /bin/cat.coreutils
> 2de81000-2dea2000 rw-p 00000000 00:00 0 [heap]
> 3ff7eb6000-3ff7ed8000 rw-p 00000000 00:00 0
> 3ff7ed8000-3ff7fd6000 r-xp 00000000 fe:00 7187 /lib/libc-2.28.so
> 3ff7fd6000-3ff7fda000 r--p 000fd000 fe:00 7187 /lib/libc-2.28.so
> 3ff7fda000-3ff7fdc000 rw-p 00101000 fe:00 7187 /lib/libc-2.28.so
> 3ff7fdc000-3ff7fe2000 rw-p 00000000 00:00 0
> 3ff7fe4000-3ff7fe6000 r-xp 00000000 00:00 0 [vdso]
> 3ff7fe6000-3ff7ffd000 r-xp 00000000 fe:00 7193 /lib/ld-2.28.so
> 3ff7ffd000-3ff7ffe000 r--p 00016000 fe:00 7193 /lib/ld-2.28.so
> 3ff7ffe000-3ff7fff000 rw-p 00017000 fe:00 7193 /lib/ld-2.28.so
> 3ff7fff000-3ff8000000 rw-p 00000000 00:00 0
> 3fff888000-3fff8a9000 rw-p 00000000 00:00 0 [stack]
>
> Signed-off-by: Alexandre Ghiti <[email protected]>
> Acked-by: Paul Walmsley <[email protected]>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Reviewed-by: Kees Cook <[email protected]>
> Reviewed-by: Luis Chamberlain <[email protected]>
> ---
> arch/riscv/Kconfig | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 59a4727ecd6c..87dc5370becb 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -54,6 +54,18 @@ config RISCV
> select EDAC_SUPPORT
> select ARCH_HAS_GIGANTIC_PAGE
> select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
> + select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
> + select HAVE_ARCH_MMAP_RND_BITS
> +
> +config ARCH_MMAP_RND_BITS_MIN
> + default 18 if 64BIT
> + default 8
> +
> +# max bits determined by the following formula:
> +# VA_BITS - PAGE_SHIFT - 3
> +config ARCH_MMAP_RND_BITS_MAX
> + default 24 if 64BIT # SV39 based
> + default 17
>
> config MMU
> def_bool y
With this patch, I am not able to boot a Fedora Linux(a Gnome desktop
image) on RISC-V hardware (Unleashed + Microsemi Expansion board). The
booting gets stuck right after systemd starts.
https://paste.fedoraproject.org/paste/TOrUMqqKH-pGFX7CnfajDg
Reverting just this patch allow to boot Fedora successfully on specific
RISC-V hardware. I have not root caused the issue but it looks like it
might have messed userpsace mapping.
--
Regards,
Atish
On 10/4/19 10:12 PM, Atish Patra wrote:
> On Thu, 2019-08-08 at 02:17 -0400, Alexandre Ghiti wrote:
>> In order to avoid wasting user address space by using bottom-up mmap
>> allocation scheme, prefer top-down scheme when possible.
>>
>> Before:
>> root@qemuriscv64:~# cat /proc/self/maps
>> 00010000-00016000 r-xp 00000000 fe:00 6389 /bin/cat.coreutils
>> 00016000-00017000 r--p 00005000 fe:00 6389 /bin/cat.coreutils
>> 00017000-00018000 rw-p 00006000 fe:00 6389 /bin/cat.coreutils
>> 00018000-00039000 rw-p 00000000 00:00 0 [heap]
>> 1555556000-155556d000 r-xp 00000000 fe:00 7193 /lib/ld-2.28.so
>> 155556d000-155556e000 r--p 00016000 fe:00 7193 /lib/ld-2.28.so
>> 155556e000-155556f000 rw-p 00017000 fe:00 7193 /lib/ld-2.28.so
>> 155556f000-1555570000 rw-p 00000000 00:00 0
>> 1555570000-1555572000 r-xp 00000000 00:00 0 [vdso]
>> 1555574000-1555576000 rw-p 00000000 00:00 0
>> 1555576000-1555674000 r-xp 00000000 fe:00 7187 /lib/libc-2.28.so
>> 1555674000-1555678000 r--p 000fd000 fe:00 7187 /lib/libc-2.28.so
>> 1555678000-155567a000 rw-p 00101000 fe:00 7187 /lib/libc-2.28.so
>> 155567a000-15556a0000 rw-p 00000000 00:00 0
>> 3fffb90000-3fffbb1000 rw-p 00000000 00:00 0 [stack]
>>
>> After:
>> root@qemuriscv64:~# cat /proc/self/maps
>> 00010000-00016000 r-xp 00000000 fe:00 6389 /bin/cat.coreutils
>> 00016000-00017000 r--p 00005000 fe:00 6389 /bin/cat.coreutils
>> 00017000-00018000 rw-p 00006000 fe:00 6389 /bin/cat.coreutils
>> 2de81000-2dea2000 rw-p 00000000 00:00 0 [heap]
>> 3ff7eb6000-3ff7ed8000 rw-p 00000000 00:00 0
>> 3ff7ed8000-3ff7fd6000 r-xp 00000000 fe:00 7187 /lib/libc-2.28.so
>> 3ff7fd6000-3ff7fda000 r--p 000fd000 fe:00 7187 /lib/libc-2.28.so
>> 3ff7fda000-3ff7fdc000 rw-p 00101000 fe:00 7187 /lib/libc-2.28.so
>> 3ff7fdc000-3ff7fe2000 rw-p 00000000 00:00 0
>> 3ff7fe4000-3ff7fe6000 r-xp 00000000 00:00 0 [vdso]
>> 3ff7fe6000-3ff7ffd000 r-xp 00000000 fe:00 7193 /lib/ld-2.28.so
>> 3ff7ffd000-3ff7ffe000 r--p 00016000 fe:00 7193 /lib/ld-2.28.so
>> 3ff7ffe000-3ff7fff000 rw-p 00017000 fe:00 7193 /lib/ld-2.28.so
>> 3ff7fff000-3ff8000000 rw-p 00000000 00:00 0
>> 3fff888000-3fff8a9000 rw-p 00000000 00:00 0 [stack]
>>
>> Signed-off-by: Alexandre Ghiti <[email protected]>
>> Acked-by: Paul Walmsley <[email protected]>
>> Reviewed-by: Christoph Hellwig <[email protected]>
>> Reviewed-by: Kees Cook <[email protected]>
>> Reviewed-by: Luis Chamberlain <[email protected]>
>> ---
>> arch/riscv/Kconfig | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> index 59a4727ecd6c..87dc5370becb 100644
>> --- a/arch/riscv/Kconfig
>> +++ b/arch/riscv/Kconfig
>> @@ -54,6 +54,18 @@ config RISCV
>> select EDAC_SUPPORT
>> select ARCH_HAS_GIGANTIC_PAGE
>> select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
>> + select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
>> + select HAVE_ARCH_MMAP_RND_BITS
>> +
>> +config ARCH_MMAP_RND_BITS_MIN
>> + default 18 if 64BIT
>> + default 8
>> +
>> +# max bits determined by the following formula:
>> +# VA_BITS - PAGE_SHIFT - 3
>> +config ARCH_MMAP_RND_BITS_MAX
>> + default 24 if 64BIT # SV39 based
>> + default 17
>>
>> config MMU
>> def_bool y
> With this patch, I am not able to boot a Fedora Linux(a Gnome desktop
> image) on RISC-V hardware (Unleashed + Microsemi Expansion board). The
> booting gets stuck right after systemd starts.
>
> https://paste.fedoraproject.org/paste/TOrUMqqKH-pGFX7CnfajDg
>
> Reverting just this patch allow to boot Fedora successfully on specific
> RISC-V hardware. I have not root caused the issue but it looks like it
> might have messed userpsace mapping.
It might have messed userspace mapping but not enough to make userspace
completely broken
as systemd does some things. I would try to boot in legacy layout: if
you can try to set sysctl legacy_va_layout
at boottime, it will map userspace as it was before (bottom-up). If that
does not work, the problem could
be the randomization that is activated by default now.
Anyway, it's weird since userspace should not depend on how the mapping is.
If you can identify the program that stalls, that would be fantastic :)
As the code is common to mips and arm now and I did not hear from them,
I imagine the problem comes
from us.
Alex
>
On Mon, 2019-10-07 at 05:11 -0400, Alex Ghiti wrote:
> On 10/4/19 10:12 PM, Atish Patra wrote:
> > On Thu, 2019-08-08 at 02:17 -0400, Alexandre Ghiti wrote:
> > > In order to avoid wasting user address space by using bottom-up
> > > mmap
> > > allocation scheme, prefer top-down scheme when possible.
> > >
> > > Before:
> > > root@qemuriscv64:~# cat /proc/self/maps
> > > 00010000-00016000 r-xp 00000000 fe:00
> > > 6389 /bin/cat.coreutils
> > > 00016000-00017000 r--p 00005000 fe:00
> > > 6389 /bin/cat.coreutils
> > > 00017000-00018000 rw-p 00006000 fe:00
> > > 6389 /bin/cat.coreutils
> > > 00018000-00039000 rw-p 00000000 00:00 0 [heap]
> > > 1555556000-155556d000 r-xp 00000000 fe:00 7193 /lib/ld-2.28.so
> > > 155556d000-155556e000 r--p 00016000 fe:00 7193 /lib/ld-2.28.so
> > > 155556e000-155556f000 rw-p 00017000 fe:00 7193 /lib/ld-2.28.so
> > > 155556f000-1555570000 rw-p 00000000 00:00 0
> > > 1555570000-1555572000 r-xp 00000000 00:00 0 [vdso]
> > > 1555574000-1555576000 rw-p 00000000 00:00 0
> > > 1555576000-1555674000 r-xp 00000000 fe:00 7187 /lib/libc-
> > > 2.28.so
> > > 1555674000-1555678000 r--p 000fd000 fe:00 7187 /lib/libc-
> > > 2.28.so
> > > 1555678000-155567a000 rw-p 00101000 fe:00 7187 /lib/libc-
> > > 2.28.so
> > > 155567a000-15556a0000 rw-p 00000000 00:00 0
> > > 3fffb90000-3fffbb1000 rw-p 00000000 00:00 0 [stack]
> > >
> > > After:
> > > root@qemuriscv64:~# cat /proc/self/maps
> > > 00010000-00016000 r-xp 00000000 fe:00
> > > 6389 /bin/cat.coreutils
> > > 00016000-00017000 r--p 00005000 fe:00
> > > 6389 /bin/cat.coreutils
> > > 00017000-00018000 rw-p 00006000 fe:00
> > > 6389 /bin/cat.coreutils
> > > 2de81000-2dea2000 rw-p 00000000 00:00 0 [heap]
> > > 3ff7eb6000-3ff7ed8000 rw-p 00000000 00:00 0
> > > 3ff7ed8000-3ff7fd6000 r-xp 00000000 fe:00 7187 /lib/libc-
> > > 2.28.so
> > > 3ff7fd6000-3ff7fda000 r--p 000fd000 fe:00 7187 /lib/libc-
> > > 2.28.so
> > > 3ff7fda000-3ff7fdc000 rw-p 00101000 fe:00 7187 /lib/libc-
> > > 2.28.so
> > > 3ff7fdc000-3ff7fe2000 rw-p 00000000 00:00 0
> > > 3ff7fe4000-3ff7fe6000 r-xp 00000000 00:00 0 [vdso]
> > > 3ff7fe6000-3ff7ffd000 r-xp 00000000 fe:00 7193 /lib/ld-2.28.so
> > > 3ff7ffd000-3ff7ffe000 r--p 00016000 fe:00 7193 /lib/ld-2.28.so
> > > 3ff7ffe000-3ff7fff000 rw-p 00017000 fe:00 7193 /lib/ld-2.28.so
> > > 3ff7fff000-3ff8000000 rw-p 00000000 00:00 0
> > > 3fff888000-3fff8a9000 rw-p 00000000 00:00 0 [stack]
> > >
> > > Signed-off-by: Alexandre Ghiti <[email protected]>
> > > Acked-by: Paul Walmsley <[email protected]>
> > > Reviewed-by: Christoph Hellwig <[email protected]>
> > > Reviewed-by: Kees Cook <[email protected]>
> > > Reviewed-by: Luis Chamberlain <[email protected]>
> > > ---
> > > arch/riscv/Kconfig | 12 ++++++++++++
> > > 1 file changed, 12 insertions(+)
> > >
> > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > index 59a4727ecd6c..87dc5370becb 100644
> > > --- a/arch/riscv/Kconfig
> > > +++ b/arch/riscv/Kconfig
> > > @@ -54,6 +54,18 @@ config RISCV
> > > select EDAC_SUPPORT
> > > select ARCH_HAS_GIGANTIC_PAGE
> > > select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
> > > + select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
> > > + select HAVE_ARCH_MMAP_RND_BITS
> > > +
> > > +config ARCH_MMAP_RND_BITS_MIN
> > > + default 18 if 6legacy_va_layout4BIT
> > > + default 8
> > > +
> > > +# max bits determined by the following formula:
> > > +# VA_BITS - PAGE_SHIFT - 3
> > > +config ARCH_MMAP_RND_BITS_MAX
> > > + default 24 if 64BIT # SV39 based
> > > + default 17
> > >
> > > config MMU
> > > def_bool y
> > With this patch, I am not able to boot a Fedora Linux(a Gnome
> > desktop
> > image) on RISC-V hardware (Unleashed + Microsemi Expansion board).
> > The
> > booting gets stuck right after systemd starts.
> >
> > https://paste.fedoraproject.org/paste/TOrUMqqKH-pGFX7CnfajDg
> >
> > Reverting just this patch allow to boot Fedora successfully on
> > specific
> > RISC-V hardware. I have not root caused the issue but it looks like
> > it
> > might have messed userpsace mapping.
>
> It might have messed userspace mapping but not enough to make
> userspace
> completely broken
> as systemd does some things. I would try to boot in legacy layout:
> if
> you can try to set sysctl legacy_va_layout
> at boottime, it will map userspace as it was before (bottom-up). If
> that
> does not work, the problem could
> be the randomization that is activated by default now.
Randomization may not be the issue. I just removed
ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT from the config and that seems to
work. Here is the bottom-up layout with randomization on.
[root@fedora-riscv ~]# cat /proc/self/maps
1555556000-1555570000 r-xp 00000000 103:01
280098 /usr/lib64/ld-2.28.so
1555570000-1555571000 r--p 00019000 103:01
280098 /usr/lib64/ld-2.28.so
1555571000-1555572000 rw-p 0001a000 103:01
280098 /usr/lib64/ld-2.28.so
1555572000-1555573000 rw-p 00000000 00:00 0
1555573000-1555575000 r-xp 00000000 00:00
0 [vdso]
1555575000-1555576000 r--p 00000000 103:01
50936 /usr/lib/locale/en_US.utf8/LC_IDENTIFICAT
ION
1555576000-155557d000 r--s 00000000 103:01
280826 /usr/lib64/gconv/gconv-modules.cache
155557d000-155557e000 r--p 00000000 103:01
50937 /usr/lib/locale/en_US.utf8/LC_MEASUREMENT
155557e000-155557f000 r--p 00000000 103:01
50939 /usr/lib/locale/en_US.utf8/LC_TELEPHONE
155557f000-1555580000 r--p 00000000 103:01
3706 /usr/lib/locale/en_US.utf8/LC_ADDRESS
1555580000-1555581000 r--p 00000000 103:01
50944 /usr/lib/locale/en_US.utf8/LC_NAME
1555581000-1555582000 r--p 00000000 103:01
3775 /usr/lib/locale/en_US.utf8/LC_PAPER
1555582000-1555583000 r--p 00000000 103:01
3758 /usr/lib/locale/en_US.utf8/LC_MESSAGES/SY
S_LC_MESSAGES
1555583000-1555584000 r--p 00000000 103:01
50938 /usr/lib/locale/en_US.utf8/LC_MONETARY
1555584000-1555585000 r--p 00000000 103:01
50940 /usr/lib/locale/en_US.utf8/LC_TIME
1555585000-1555586000 r--p 00000000 103:01
50945 /usr/lib/locale/en_US.utf8/LC_NUMERIC
1555590000-1555592000 rw-p 00000000 00:00 0
1555592000-15556b1000 r-xp 00000000 103:01
280105 /usr/lib64/libc-2.28.so
15556b1000-15556b5000 r--p 0011e000 103:01
280105 /usr/lib64/libc-2.28.so
15556b5000-15556b7000 rw-p 00122000 103:01
280105 /usr/lib64/libc-2.28.so
15556b7000-15556bb000 rw-p 00000000 00:00 0
15556bb000-1555933000 r--p 00000000 103:01
3755 /usr/lib/locale/en_US.utf8/LC_COLLATE
1555933000-1555986000 r--p 00000000 103:01
50942 /usr/lib/locale/en_US.utf8/LC_CTYPE
1555986000-15559a8000 rw-p 00000000 00:00 0
2aaaaaa000-2aaaab1000 r-xp 00000000 103:01
283975 /usr/bin/cat
2aaaab1000-2aaaab2000 r--p 00006000 103:01
283975 /usr/bin/cat
2aaaab2000-2aaaab3000 rw-p 00007000 103:01
283975 /usr/bin/cat
2aaaab3000-2aaaad4000 rw-p 00000000 00:00
0 [heap]
3fffc97000-3fffcb8000 rw-p 00000000 00:00
0 [stack]
> Anyway, it's weird since userspace should not depend on how the
> mapping is.
>
> If you can identify the program that stalls, that would be fantastic
> :)
>
It stucks while booting. So I am not sure how to figure out which
program stalls. It is difficult to figure out from boot log as it
stucks at different places but soon after systemd starts.
> As the code is common to mips and arm now and I did not hear from
> them,
> I imagine the problem comes
> from us.
>
> Alex
--
Regards,
Atish
On 10/7/19 5:11 AM, Alex Ghiti wrote:
> On 10/4/19 10:12 PM, Atish Patra wrote:
>> On Thu, 2019-08-08 at 02:17 -0400, Alexandre Ghiti wrote:
>>> In order to avoid wasting user address space by using bottom-up mmap
>>> allocation scheme, prefer top-down scheme when possible.
>>>
>>> Before:
>>> root@qemuriscv64:~# cat /proc/self/maps
>>> 00010000-00016000 r-xp 00000000 fe:00 6389 /bin/cat.coreutils
>>> 00016000-00017000 r--p 00005000 fe:00 6389 /bin/cat.coreutils
>>> 00017000-00018000 rw-p 00006000 fe:00 6389 /bin/cat.coreutils
>>> 00018000-00039000 rw-p 00000000 00:00 0Â Â Â Â Â Â Â Â Â [heap]
>>> 1555556000-155556d000 r-xp 00000000 fe:00 7193 /lib/ld-2.28.so
>>> 155556d000-155556e000 r--p 00016000 fe:00 7193 /lib/ld-2.28.so
>>> 155556e000-155556f000 rw-p 00017000 fe:00 7193 /lib/ld-2.28.so
>>> 155556f000-1555570000 rw-p 00000000 00:00 0
>>> 1555570000-1555572000 r-xp 00000000 00:00 0Â Â Â Â Â [vdso]
>>> 1555574000-1555576000 rw-p 00000000 00:00 0
>>> 1555576000-1555674000 r-xp 00000000 fe:00 7187 /lib/libc-2.28.so
>>> 1555674000-1555678000 r--p 000fd000 fe:00 7187 /lib/libc-2.28.so
>>> 1555678000-155567a000 rw-p 00101000 fe:00 7187 /lib/libc-2.28.so
>>> 155567a000-15556a0000 rw-p 00000000 00:00 0
>>> 3fffb90000-3fffbb1000 rw-p 00000000 00:00 0Â Â Â Â Â [stack]
>>>
>>> After:
>>> root@qemuriscv64:~# cat /proc/self/maps
>>> 00010000-00016000 r-xp 00000000 fe:00 6389 /bin/cat.coreutils
>>> 00016000-00017000 r--p 00005000 fe:00 6389 /bin/cat.coreutils
>>> 00017000-00018000 rw-p 00006000 fe:00 6389 /bin/cat.coreutils
>>> 2de81000-2dea2000 rw-p 00000000 00:00 0Â Â Â Â Â Â Â Â Â [heap]
>>> 3ff7eb6000-3ff7ed8000 rw-p 00000000 00:00 0
>>> 3ff7ed8000-3ff7fd6000 r-xp 00000000 fe:00 7187 /lib/libc-2.28.so
>>> 3ff7fd6000-3ff7fda000 r--p 000fd000 fe:00 7187 /lib/libc-2.28.so
>>> 3ff7fda000-3ff7fdc000 rw-p 00101000 fe:00 7187 /lib/libc-2.28.so
>>> 3ff7fdc000-3ff7fe2000 rw-p 00000000 00:00 0
>>> 3ff7fe4000-3ff7fe6000 r-xp 00000000 00:00 0Â Â Â Â Â [vdso]
>>> 3ff7fe6000-3ff7ffd000 r-xp 00000000 fe:00 7193 /lib/ld-2.28.so
>>> 3ff7ffd000-3ff7ffe000 r--p 00016000 fe:00 7193 /lib/ld-2.28.so
>>> 3ff7ffe000-3ff7fff000 rw-p 00017000 fe:00 7193 /lib/ld-2.28.so
>>> 3ff7fff000-3ff8000000 rw-p 00000000 00:00 0
>>> 3fff888000-3fff8a9000 rw-p 00000000 00:00 0Â Â Â Â Â [stack]
>>>
>>> Signed-off-by: Alexandre Ghiti <[email protected]>
>>> Acked-by: Paul Walmsley <[email protected]>
>>> Reviewed-by: Christoph Hellwig <[email protected]>
>>> Reviewed-by: Kees Cook <[email protected]>
>>> Reviewed-by: Luis Chamberlain <[email protected]>
>>> ---
>>> Â arch/riscv/Kconfig | 12 ++++++++++++
>>> Â 1 file changed, 12 insertions(+)
>>>
>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>> index 59a4727ecd6c..87dc5370becb 100644
>>> --- a/arch/riscv/Kconfig
>>> +++ b/arch/riscv/Kconfig
>>> @@ -54,6 +54,18 @@ config RISCV
>>> Â Â Â Â Â select EDAC_SUPPORT
>>> Â Â Â Â Â select ARCH_HAS_GIGANTIC_PAGE
>>> Â Â Â Â Â select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
>>> +Â Â Â select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
>>> +Â Â Â select HAVE_ARCH_MMAP_RND_BITS
>>> +
>>> +config ARCH_MMAP_RND_BITS_MIN
>>> +Â Â Â default 18 if 64BIT
>>> +Â Â Â default 8
>>> +
>>> +# max bits determined by the following formula:
>>> +#Â VA_BITS - PAGE_SHIFT - 3
>>> +config ARCH_MMAP_RND_BITS_MAX
>>> +Â Â Â default 24 if 64BIT # SV39 based
>>> +Â Â Â default 17
>>> Â Â config MMU
>>> Â Â Â Â Â def_bool y
>> With this patch, I am not able to boot a Fedora Linux(a Gnome desktop
>> image) on RISC-V hardware (Unleashed + Microsemi Expansion board). The
>> booting gets stuck right after systemd starts.
>>
>> https://paste.fedoraproject.org/paste/TOrUMqqKH-pGFX7CnfajDg
>>
>> Reverting just this patch allow to boot Fedora successfully on specific
>> RISC-V hardware. I have not root caused the issue but it looks like it
>> might have messed userpsace mapping.
>
> It might have messed userspace mapping but not enough to make
> userspace completely broken
> as systemd does some things. I would try to boot in legacy layout: if
> you can try to set sysctl legacy_va_layout
> at boottime, it will map userspace as it was before (bottom-up). If
> that does not work, the problem could
> be the randomization that is activated by default now.
> Anyway, it's weird since userspace should not depend on how the
> mapping is.
>
> If you can identify the program that stalls, that would be fantastic :)
>
> As the code is common to mips and arm now and I did not hear from
> them, I imagine the problem comes
> from us.
>
> Alex
Atish, do you have any news regarding this problem ? If you have an
image I can execute on qemu that
reproduces the issue, I can take a look.
Alex
>>
>
On 10/7/19 8:46 PM, Atish Patra wrote:
> On Mon, 2019-10-07 at 05:11 -0400, Alex Ghiti wrote:
>> On 10/4/19 10:12 PM, Atish Patra wrote:
>>> On Thu, 2019-08-08 at 02:17 -0400, Alexandre Ghiti wrote:
>>>> In order to avoid wasting user address space by using bottom-up
>>>> mmap
>>>> allocation scheme, prefer top-down scheme when possible.
>>>>
>>>> Before:
>>>> root@qemuriscv64:~# cat /proc/self/maps
>>>> 00010000-00016000 r-xp 00000000 fe:00
>>>> 6389 /bin/cat.coreutils
>>>> 00016000-00017000 r--p 00005000 fe:00
>>>> 6389 /bin/cat.coreutils
>>>> 00017000-00018000 rw-p 00006000 fe:00
>>>> 6389 /bin/cat.coreutils
>>>> 00018000-00039000 rw-p 00000000 00:00 0 [heap]
>>>> 1555556000-155556d000 r-xp 00000000 fe:00 7193 /lib/ld-2.28.so
>>>> 155556d000-155556e000 r--p 00016000 fe:00 7193 /lib/ld-2.28.so
>>>> 155556e000-155556f000 rw-p 00017000 fe:00 7193 /lib/ld-2.28.so
>>>> 155556f000-1555570000 rw-p 00000000 00:00 0
>>>> 1555570000-1555572000 r-xp 00000000 00:00 0 [vdso]
>>>> 1555574000-1555576000 rw-p 00000000 00:00 0
>>>> 1555576000-1555674000 r-xp 00000000 fe:00 7187 /lib/libc-
>>>> 2.28.so
>>>> 1555674000-1555678000 r--p 000fd000 fe:00 7187 /lib/libc-
>>>> 2.28.so
>>>> 1555678000-155567a000 rw-p 00101000 fe:00 7187 /lib/libc-
>>>> 2.28.so
>>>> 155567a000-15556a0000 rw-p 00000000 00:00 0
>>>> 3fffb90000-3fffbb1000 rw-p 00000000 00:00 0 [stack]
>>>>
>>>> After:
>>>> root@qemuriscv64:~# cat /proc/self/maps
>>>> 00010000-00016000 r-xp 00000000 fe:00
>>>> 6389 /bin/cat.coreutils
>>>> 00016000-00017000 r--p 00005000 fe:00
>>>> 6389 /bin/cat.coreutils
>>>> 00017000-00018000 rw-p 00006000 fe:00
>>>> 6389 /bin/cat.coreutils
>>>> 2de81000-2dea2000 rw-p 00000000 00:00 0 [heap]
>>>> 3ff7eb6000-3ff7ed8000 rw-p 00000000 00:00 0
>>>> 3ff7ed8000-3ff7fd6000 r-xp 00000000 fe:00 7187 /lib/libc-
>>>> 2.28.so
>>>> 3ff7fd6000-3ff7fda000 r--p 000fd000 fe:00 7187 /lib/libc-
>>>> 2.28.so
>>>> 3ff7fda000-3ff7fdc000 rw-p 00101000 fe:00 7187 /lib/libc-
>>>> 2.28.so
>>>> 3ff7fdc000-3ff7fe2000 rw-p 00000000 00:00 0
>>>> 3ff7fe4000-3ff7fe6000 r-xp 00000000 00:00 0 [vdso]
>>>> 3ff7fe6000-3ff7ffd000 r-xp 00000000 fe:00 7193 /lib/ld-2.28.so
>>>> 3ff7ffd000-3ff7ffe000 r--p 00016000 fe:00 7193 /lib/ld-2.28.so
>>>> 3ff7ffe000-3ff7fff000 rw-p 00017000 fe:00 7193 /lib/ld-2.28.so
>>>> 3ff7fff000-3ff8000000 rw-p 00000000 00:00 0
>>>> 3fff888000-3fff8a9000 rw-p 00000000 00:00 0 [stack]
>>>>
>>>> Signed-off-by: Alexandre Ghiti <[email protected]>
>>>> Acked-by: Paul Walmsley <[email protected]>
>>>> Reviewed-by: Christoph Hellwig <[email protected]>
>>>> Reviewed-by: Kees Cook <[email protected]>
>>>> Reviewed-by: Luis Chamberlain <[email protected]>
>>>> ---
>>>> arch/riscv/Kconfig | 12 ++++++++++++
>>>> 1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>>> index 59a4727ecd6c..87dc5370becb 100644
>>>> --- a/arch/riscv/Kconfig
>>>> +++ b/arch/riscv/Kconfig
>>>> @@ -54,6 +54,18 @@ config RISCV
>>>> select EDAC_SUPPORT
>>>> select ARCH_HAS_GIGANTIC_PAGE
>>>> select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
>>>> + select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
>>>> + select HAVE_ARCH_MMAP_RND_BITS
>>>> +
>>>> +config ARCH_MMAP_RND_BITS_MIN
>>>> + default 18 if 6legacy_va_layout4BIT
>>>> + default 8
>>>> +
>>>> +# max bits determined by the following formula:
>>>> +# VA_BITS - PAGE_SHIFT - 3
>>>> +config ARCH_MMAP_RND_BITS_MAX
>>>> + default 24 if 64BIT # SV39 based
>>>> + default 17
>>>>
>>>> config MMU
>>>> def_bool y
>>> With this patch, I am not able to boot a Fedora Linux(a Gnome
>>> desktop
>>> image) on RISC-V hardware (Unleashed + Microsemi Expansion board).
>>> The
>>> booting gets stuck right after systemd starts.
>>>
>>> https://paste.fedoraproject.org/paste/TOrUMqqKH-pGFX7CnfajDg
>>>
>>> Reverting just this patch allow to boot Fedora successfully on
>>> specific
>>> RISC-V hardware. I have not root caused the issue but it looks like
>>> it
>>> might have messed userpsace mapping.
>> It might have messed userspace mapping but not enough to make
>> userspace
>> completely broken
>> as systemd does some things. I would try to boot in legacy layout:
>> if
>> you can try to set sysctl legacy_va_layout
>> at boottime, it will map userspace as it was before (bottom-up). If
>> that
>> does not work, the problem could
>> be the randomization that is activated by default now.
> Randomization may not be the issue. I just removed
> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT from the config and that seems to
> work. Here is the bottom-up layout with randomization on.
Oups, sorry for my previous answer, I missed yours that landed in
another folder.
Removing ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT also removes randomization
as this config selects ARCH_HAS_ELF_RANDOMIZE.
You could remove ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and selects by hand
ARCH_HAS_ELF_RANDOMIZE but you would have to implement arch_mmap_rnd and
arch_randomize_brk (elf-randomize.h).
The simplest would be to boot in legacy layout: I did not find a way to
set this in kernel
command line, but you can by modifying it directly in the code:
https://elixir.bootlin.com/linux/v5.4-rc2/source/kernel/sysctl.c#L269
> [root@fedora-riscv ~]# cat /proc/self/maps
> 1555556000-1555570000 r-xp 00000000 103:01
> 280098 /usr/lib64/ld-2.28.so
> 1555570000-1555571000 r--p 00019000 103:01
> 280098 /usr/lib64/ld-2.28.so
> 1555571000-1555572000 rw-p 0001a000 103:01
> 280098 /usr/lib64/ld-2.28.so
> 1555572000-1555573000 rw-p 00000000 00:00 0
> 1555573000-1555575000 r-xp 00000000 00:00
> 0 [vdso]
> 1555575000-1555576000 r--p 00000000 103:01
> 50936 /usr/lib/locale/en_US.utf8/LC_IDENTIFICAT
> ION
> 1555576000-155557d000 r--s 00000000 103:01
> 280826 /usr/lib64/gconv/gconv-modules.cache
> 155557d000-155557e000 r--p 00000000 103:01
> 50937 /usr/lib/locale/en_US.utf8/LC_MEASUREMENT
> 155557e000-155557f000 r--p 00000000 103:01
> 50939 /usr/lib/locale/en_US.utf8/LC_TELEPHONE
> 155557f000-1555580000 r--p 00000000 103:01
> 3706 /usr/lib/locale/en_US.utf8/LC_ADDRESS
> 1555580000-1555581000 r--p 00000000 103:01
> 50944 /usr/lib/locale/en_US.utf8/LC_NAME
> 1555581000-1555582000 r--p 00000000 103:01
> 3775 /usr/lib/locale/en_US.utf8/LC_PAPER
> 1555582000-1555583000 r--p 00000000 103:01
> 3758 /usr/lib/locale/en_US.utf8/LC_MESSAGES/SY
> S_LC_MESSAGES
> 1555583000-1555584000 r--p 00000000 103:01
> 50938 /usr/lib/locale/en_US.utf8/LC_MONETARY
> 1555584000-1555585000 r--p 00000000 103:01
> 50940 /usr/lib/locale/en_US.utf8/LC_TIME
> 1555585000-1555586000 r--p 00000000 103:01
> 50945 /usr/lib/locale/en_US.utf8/LC_NUMERIC
> 1555590000-1555592000 rw-p 00000000 00:00 0
> 1555592000-15556b1000 r-xp 00000000 103:01
> 280105 /usr/lib64/libc-2.28.so
> 15556b1000-15556b5000 r--p 0011e000 103:01
> 280105 /usr/lib64/libc-2.28.so
> 15556b5000-15556b7000 rw-p 00122000 103:01
> 280105 /usr/lib64/libc-2.28.so
> 15556b7000-15556bb000 rw-p 00000000 00:00 0
> 15556bb000-1555933000 r--p 00000000 103:01
> 3755 /usr/lib/locale/en_US.utf8/LC_COLLATE
> 1555933000-1555986000 r--p 00000000 103:01
> 50942 /usr/lib/locale/en_US.utf8/LC_CTYPE
> 1555986000-15559a8000 rw-p 00000000 00:00 0
> 2aaaaaa000-2aaaab1000 r-xp 00000000 103:01
> 283975 /usr/bin/cat
> 2aaaab1000-2aaaab2000 r--p 00006000 103:01
> 283975 /usr/bin/cat
> 2aaaab2000-2aaaab3000 rw-p 00007000 103:01
> 283975 /usr/bin/cat
> 2aaaab3000-2aaaad4000 rw-p 00000000 00:00
> 0 [heap]
> 3fffc97000-3fffcb8000 rw-p 00000000 00:00
> 0 [stack]
>
>
>> Anyway, it's weird since userspace should not depend on how the
>> mapping is.
>>
>> If you can identify the program that stalls, that would be fantastic
>> :)
>>
> It stucks while booting. So I am not sure how to figure out which
> program stalls. It is difficult to figure out from boot log as it
> stucks at different places but soon after systemd starts.
If you can attach the running kernel, I would use vmlinux-gdb.py commands
to figure out which processes are running (lx-ps command in particular could
give us a hint). You can also add traces directly in the kernel and
either use
lx-dmesg command to print them from gdb or use your standard serial output:
I would then print task_struct->comm at context switch to see which process
is stuck.
To use the python script, you need to recompile with DEBUG_INFO and
GDB_SCRIPTS enabled.
FYI, I have just booted a custom buildroot image based on kernel 5.4-rc2.
Let me know if I can do anything.
Alex
>> As the code is common to mips and arm now and I did not hear from
>> them,
>> I imagine the problem comes
>> from us.
>>
>> Alex
On Tue, 2019-10-08 at 07:58 -0400, Alex Ghiti wrote:
> On 10/7/19 8:46 PM, Atish Patra wrote:
> > On Mon, 2019-10-07 at 05:11 -0400, Alex Ghiti wrote:
> > > On 10/4/19 10:12 PM, Atish Patra wrote:
> > > > On Thu, 2019-08-08 at 02:17 -0400, Alexandre Ghiti wrote:
> > > > > In order to avoid wasting user address space by using bottom-
> > > > > up
> > > > > mmap
> > > > > allocation scheme, prefer top-down scheme when possible.
> > > > >
> > > > > Before:
> > > > > root@qemuriscv64:~# cat /proc/self/maps
> > > > > 00010000-00016000 r-xp 00000000 fe:00
> > > > > 6389 /bin/cat.coreutils
> > > > > 00016000-00017000 r--p 00005000 fe:00
> > > > > 6389 /bin/cat.coreutils
> > > > > 00017000-00018000 rw-p 00006000 fe:00
> > > > > 6389 /bin/cat.coreutils
> > > > > 00018000-00039000 rw-p 00000000 00:00 0 [heap]
> > > > > 1555556000-155556d000 r-xp 00000000 fe:00 7193 /lib/ld-
> > > > > 2.28.so
> > > > > 155556d000-155556e000 r--p 00016000 fe:00 7193 /lib/ld-
> > > > > 2.28.so
> > > > > 155556e000-155556f000 rw-p 00017000 fe:00 7193 /lib/ld-
> > > > > 2.28.so
> > > > > 155556f000-1555570000 rw-p 00000000 00:00 0
> > > > > 1555570000-1555572000 r-xp 00000000 00:00 0 [vdso]
> > > > > 1555574000-1555576000 rw-p 00000000 00:00 0
> > > > > 1555576000-1555674000 r-xp 00000000 fe:00 7187 /lib/libc-
> > > > > 2.28.so
> > > > > 1555674000-1555678000 r--p 000fd000 fe:00 7187 /lib/libc-
> > > > > 2.28.so
> > > > > 1555678000-155567a000 rw-p 00101000 fe:00 7187 /lib/libc-
> > > > > 2.28.so
> > > > > 155567a000-15556a0000 rw-p 00000000 00:00 0
> > > > > 3fffb90000-3fffbb1000 rw-p 00000000 00:00 0 [stack]
> > > > >
> > > > > After:
> > > > > root@qemuriscv64:~# cat /proc/self/maps
> > > > > 00010000-00016000 r-xp 00000000 fe:00
> > > > > 6389 /bin/cat.coreutils
> > > > > 00016000-00017000 r--p 00005000 fe:00
> > > > > 6389 /bin/cat.coreutils
> > > > > 00017000-00018000 rw-p 00006000 fe:00
> > > > > 6389 /bin/cat.coreutils
> > > > > 2de81000-2dea2000 rw-p 00000000 00:00 0 [heap]
> > > > > 3ff7eb6000-3ff7ed8000 rw-p 00000000 00:00 0
> > > > > 3ff7ed8000-3ff7fd6000 r-xp 00000000 fe:00 7187 /lib/libc-
> > > > > 2.28.so
> > > > > 3ff7fd6000-3ff7fda000 r--p 000fd000 fe:00 7187 /lib/libc-
> > > > > 2.28.so
> > > > > 3ff7fda000-3ff7fdc000 rw-p 00101000 fe:00 7187 /lib/libc-
> > > > > 2.28.so
> > > > > 3ff7fdc000-3ff7fe2000 rw-p 00000000 00:00 0
> > > > > 3ff7fe4000-3ff7fe6000 r-xp 00000000 00:00 0 [vdso]
> > > > > 3ff7fe6000-3ff7ffd000 r-xp 00000000 fe:00 7193 /lib/ld-
> > > > > 2.28.so
> > > > > 3ff7ffd000-3ff7ffe000 r--p 00016000 fe:00 7193 /lib/ld-
> > > > > 2.28.so
> > > > > 3ff7ffe000-3ff7fff000 rw-p 00017000 fe:00 7193 /lib/ld-
> > > > > 2.28.so
> > > > > 3ff7fff000-3ff8000000 rw-p 00000000 00:00 0
> > > > > 3fff888000-3fff8a9000 rw-p 00000000 00:00 0 [stack]
> > > > >
> > > > > Signed-off-by: Alexandre Ghiti <[email protected]>
> > > > > Acked-by: Paul Walmsley <[email protected]>
> > > > > Reviewed-by: Christoph Hellwig <[email protected]>
> > > > > Reviewed-by: Kees Cook <[email protected]>
> > > > > Reviewed-by: Luis Chamberlain <[email protected]>
> > > > > ---
> > > > > arch/riscv/Kconfig | 12 ++++++++++++
> > > > > 1 file changed, 12 insertions(+)
> > > > >
> > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > index 59a4727ecd6c..87dc5370becb 100644
> > > > > --- a/arch/riscv/Kconfig
> > > > > +++ b/arch/riscv/Kconfig
> > > > > @@ -54,6 +54,18 @@ config RISCV
> > > > > select EDAC_SUPPORT
> > > > > select ARCH_HAS_GIGANTIC_PAGE
> > > > > select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
> > > > > + select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
> > > > > + select HAVE_ARCH_MMAP_RND_BITS
> > > > > +
> > > > > +config ARCH_MMAP_RND_BITS_MIN
> > > > > + default 18 if 6legacy_va_layout4BIT
> > > > > + default 8
> > > > > +
> > > > > +# max bits determined by the following formula:
> > > > > +# VA_BITS - PAGE_SHIFT - 3
> > > > > +config ARCH_MMAP_RND_BITS_MAX
> > > > > + default 24 if 64BIT # SV39 based
> > > > > + default 17
> > > > >
> > > > > config MMU
> > > > > def_bool y
> > > > With this patch, I am not able to boot a Fedora Linux(a Gnome
> > > > desktop
> > > > image) on RISC-V hardware (Unleashed + Microsemi Expansion
> > > > board).
> > > > The
> > > > booting gets stuck right after systemd starts.
> > > >
> > > > https://paste.fedoraproject.org/paste/TOrUMqqKH-pGFX7CnfajDg
> > > >
> > > > Reverting just this patch allow to boot Fedora successfully on
> > > > specific
> > > > RISC-V hardware. I have not root caused the issue but it looks
> > > > like
> > > > it
> > > > might have messed userpsace mapping.
> > > It might have messed userspace mapping but not enough to make
> > > userspace
> > > completely broken
> > > as systemd does some things. I would try to boot in legacy
> > > layout:
> > > if
> > > you can try to set sysctl legacy_va_layout
> > > at boottime, it will map userspace as it was before (bottom-up).
> > > If
> > > that
> > > does not work, the problem could
> > > be the randomization that is activated by default now.
> > Randomization may not be the issue. I just removed
> > ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT from the config and that
> > seems to
> > work. Here is the bottom-up layout with randomization on.
>
> Oups, sorry for my previous answer, I missed yours that landed in
> another folder.
>
> Removing ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT also removes
> randomization
> as this config selects ARCH_HAS_ELF_RANDOMIZE.
> You could remove ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and selects by
> hand
> ARCH_HAS_ELF_RANDOMIZE but you would have to implement arch_mmap_rnd
> and
> arch_randomize_brk (elf-randomize.h).
>
Ahh okay.
> The simplest would be to boot in legacy layout: I did not find a way
> to
> set this in kernel
> command line, but you can by modifying it directly in the code:
>
> https://elixir.bootlin.com/linux/v5.4-rc2/source/kernel/sysctl.c#L269
>
Setting this to 1 works.
> > [root@fedora-riscv ~]# cat /proc/self/maps
> > 1555556000-1555570000 r-xp 00000000 103:01
> > 280098 /usr/lib64/ld-2.28.so
> > 1555570000-1555571000 r--p 00019000 103:01
> > 280098 /usr/lib64/ld-2.28.so
> > 1555571000-1555572000 rw-p 0001a000 103:01
> > 280098 /usr/lib64/ld-2.28.so
> > 1555572000-1555573000 rw-p 00000000 00:00 0
> > 1555573000-1555575000 r-xp 00000000 00:00
> > 0 [vdso]
> > 1555575000-1555576000 r--p 00000000 103:01
> > 50936 /usr/lib/locale/en_US.utf8/LC_IDENTIF
> > ICAT
> > ION
> > 1555576000-155557d000 r--s 00000000 103:01
> > 280826 /usr/lib64/gconv/gconv-modules.cache
> > 155557d000-155557e000 r--p 00000000 103:01
> > 50937 /usr/lib/locale/en_US.utf8/LC_MEASURE
> > MENT
> > 155557e000-155557f000 r--p 00000000 103:01
> > 50939 /usr/lib/locale/en_US.utf8/LC_TELEPHO
> > NE
> > 155557f000-1555580000 r--p 00000000 103:01
> > 3706 /usr/lib/locale/en_US.utf8/LC_ADDRESS
> > 1555580000-1555581000 r--p 00000000 103:01
> > 50944 /usr/lib/locale/en_US.utf8/LC_NAME
> > 1555581000-1555582000 r--p 00000000 103:01
> > 3775 /usr/lib/locale/en_US.utf8/LC_PAPER
> > 1555582000-1555583000 r--p 00000000 103:01
> > 3758 /usr/lib/locale/en_US.utf8/LC_MESSAGE
> > S/SY
> > S_LC_MESSAGES
> > 1555583000-1555584000 r--p 00000000 103:01
> > 50938 /usr/lib/locale/en_US.utf8/LC_MONETAR
> > Y
> > 1555584000-1555585000 r--p 00000000 103:01
> > 50940 /usr/lib/locale/en_US.utf8/LC_TIME
> > 1555585000-1555586000 r--p 00000000 103:01
> > 50945 /usr/lib/locale/en_US.utf8/LC_NUMERIC
> > 1555590000-1555592000 rw-p 00000000 00:00 0
> > 1555592000-15556b1000 r-xp 00000000 103:01
> > 280105 /usr/lib64/libc-2.28.so
> > 15556b1000-15556b5000 r--p 0011e000 103:01
> > 280105 /usr/lib64/libc-2.28.so
> > 15556b5000-15556b7000 rw-p 00122000 103:01
> > 280105 /usr/lib64/libc-2.28.so
> > 15556b7000-15556bb000 rw-p 00000000 00:00 0
> > 15556bb000-1555933000 r--p 00000000 103:01
> > 3755 /usr/lib/locale/en_US.utf8/LC_COLLATE
> > 1555933000-1555986000 r--p 00000000 103:01
> > 50942 /usr/lib/locale/en_US.utf8/LC_CTYPE
> > 1555986000-15559a8000 rw-p 00000000 00:00 0
> > 2aaaaaa000-2aaaab1000 r-xp 00000000 103:01
> > 283975 /usr/bin/cat
> > 2aaaab1000-2aaaab2000 r--p 00006000 103:01
> > 283975 /usr/bin/cat
> > 2aaaab2000-2aaaab3000 rw-p 00007000 103:01
> > 283975 /usr/bin/cat
> > 2aaaab3000-2aaaad4000 rw-p 00000000 00:00
> > 0 [heap]
> > 3fffc97000-3fffcb8000 rw-p 00000000 00:00
> > 0 [stack]
> >
> >
> > > Anyway, it's weird since userspace should not depend on how the
> > > mapping is.
> > >
> > > If you can identify the program that stalls, that would be
> > > fantastic
> > > :)
> > >
> > It stucks while booting. So I am not sure how to figure out which
> > program stalls. It is difficult to figure out from boot log as it
> > stucks at different places but soon after systemd starts.
>
> If you can attach the running kernel, I would use vmlinux-gdb.py
> commands
> to figure out which processes are running (lx-ps command in
> particular could
> give us a hint). You can also add traces directly in the kernel and
> either use
> lx-dmesg command to print them from gdb or use your standard serial
> output:
> I would then print task_struct->comm at context switch to see which
> process
> is stuck.
> To use the python script, you need to recompile with DEBUG_INFO and
> GDB_SCRIPTS enabled.
>
> FYI, I have just booted a custom buildroot image based on kernel 5.4-
> rc2.
>
vmlinux-gdb.py works only if you are running in Qemu or have a JTAG.
Right ?
I am seeing this issue only on HiFive Unleashed + Microsemi Expansion
board with Fedora Gnome desktop image. I can even boot a Fedora
developer image on same hardware and a busybox image in Unleashed
without any issues. But the issue is not specific to fedora as we see
the same issue in OpenEmbedded disk image as well (HiFive Unleashed +
Microsemi Expansion board).
May be it gets triggerd only if bigger userspace ?
> Let me know if I can do anything.
>
> Alex
>
> > > As the code is common to mips and arm now and I did not hear from
> > > them,
> > > I imagine the problem comes
> > > from us.
> > >
> > > Alex
--
Regards,
Atish
On 10/8/19 10:07 PM, Atish Patra wrote:
> On Tue, 2019-10-08 at 07:58 -0400, Alex Ghiti wrote:
>> On 10/7/19 8:46 PM, Atish Patra wrote:
>>> On Mon, 2019-10-07 at 05:11 -0400, Alex Ghiti wrote:
>>>> On 10/4/19 10:12 PM, Atish Patra wrote:
>>>>> On Thu, 2019-08-08 at 02:17 -0400, Alexandre Ghiti wrote:
>>>>>> In order to avoid wasting user address space by using bottom-
>>>>>> up
>>>>>> mmap
>>>>>> allocation scheme, prefer top-down scheme when possible.
>>>>>>
>>>>>> Before:
>>>>>> root@qemuriscv64:~# cat /proc/self/maps
>>>>>> 00010000-00016000 r-xp 00000000 fe:00
>>>>>> 6389 /bin/cat.coreutils
>>>>>> 00016000-00017000 r--p 00005000 fe:00
>>>>>> 6389 /bin/cat.coreutils
>>>>>> 00017000-00018000 rw-p 00006000 fe:00
>>>>>> 6389 /bin/cat.coreutils
>>>>>> 00018000-00039000 rw-p 00000000 00:00 0 [heap]
>>>>>> 1555556000-155556d000 r-xp 00000000 fe:00 7193 /lib/ld-
>>>>>> 2.28.so
>>>>>> 155556d000-155556e000 r--p 00016000 fe:00 7193 /lib/ld-
>>>>>> 2.28.so
>>>>>> 155556e000-155556f000 rw-p 00017000 fe:00 7193 /lib/ld-
>>>>>> 2.28.so
>>>>>> 155556f000-1555570000 rw-p 00000000 00:00 0
>>>>>> 1555570000-1555572000 r-xp 00000000 00:00 0 [vdso]
>>>>>> 1555574000-1555576000 rw-p 00000000 00:00 0
>>>>>> 1555576000-1555674000 r-xp 00000000 fe:00 7187 /lib/libc-
>>>>>> 2.28.so
>>>>>> 1555674000-1555678000 r--p 000fd000 fe:00 7187 /lib/libc-
>>>>>> 2.28.so
>>>>>> 1555678000-155567a000 rw-p 00101000 fe:00 7187 /lib/libc-
>>>>>> 2.28.so
>>>>>> 155567a000-15556a0000 rw-p 00000000 00:00 0
>>>>>> 3fffb90000-3fffbb1000 rw-p 00000000 00:00 0 [stack]
>>>>>>
>>>>>> After:
>>>>>> root@qemuriscv64:~# cat /proc/self/maps
>>>>>> 00010000-00016000 r-xp 00000000 fe:00
>>>>>> 6389 /bin/cat.coreutils
>>>>>> 00016000-00017000 r--p 00005000 fe:00
>>>>>> 6389 /bin/cat.coreutils
>>>>>> 00017000-00018000 rw-p 00006000 fe:00
>>>>>> 6389 /bin/cat.coreutils
>>>>>> 2de81000-2dea2000 rw-p 00000000 00:00 0 [heap]
>>>>>> 3ff7eb6000-3ff7ed8000 rw-p 00000000 00:00 0
>>>>>> 3ff7ed8000-3ff7fd6000 r-xp 00000000 fe:00 7187 /lib/libc-
>>>>>> 2.28.so
>>>>>> 3ff7fd6000-3ff7fda000 r--p 000fd000 fe:00 7187 /lib/libc-
>>>>>> 2.28.so
>>>>>> 3ff7fda000-3ff7fdc000 rw-p 00101000 fe:00 7187 /lib/libc-
>>>>>> 2.28.so
>>>>>> 3ff7fdc000-3ff7fe2000 rw-p 00000000 00:00 0
>>>>>> 3ff7fe4000-3ff7fe6000 r-xp 00000000 00:00 0 [vdso]
>>>>>> 3ff7fe6000-3ff7ffd000 r-xp 00000000 fe:00 7193 /lib/ld-
>>>>>> 2.28.so
>>>>>> 3ff7ffd000-3ff7ffe000 r--p 00016000 fe:00 7193 /lib/ld-
>>>>>> 2.28.so
>>>>>> 3ff7ffe000-3ff7fff000 rw-p 00017000 fe:00 7193 /lib/ld-
>>>>>> 2.28.so
>>>>>> 3ff7fff000-3ff8000000 rw-p 00000000 00:00 0
>>>>>> 3fff888000-3fff8a9000 rw-p 00000000 00:00 0 [stack]
>>>>>>
>>>>>> Signed-off-by: Alexandre Ghiti <[email protected]>
>>>>>> Acked-by: Paul Walmsley <[email protected]>
>>>>>> Reviewed-by: Christoph Hellwig <[email protected]>
>>>>>> Reviewed-by: Kees Cook <[email protected]>
>>>>>> Reviewed-by: Luis Chamberlain <[email protected]>
>>>>>> ---
>>>>>> arch/riscv/Kconfig | 12 ++++++++++++
>>>>>> 1 file changed, 12 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>>>>> index 59a4727ecd6c..87dc5370becb 100644
>>>>>> --- a/arch/riscv/Kconfig
>>>>>> +++ b/arch/riscv/Kconfig
>>>>>> @@ -54,6 +54,18 @@ config RISCV
>>>>>> select EDAC_SUPPORT
>>>>>> select ARCH_HAS_GIGANTIC_PAGE
>>>>>> select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
>>>>>> + select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
>>>>>> + select HAVE_ARCH_MMAP_RND_BITS
>>>>>> +
>>>>>> +config ARCH_MMAP_RND_BITS_MIN
>>>>>> + default 18 if 6legacy_va_layout4BIT
>>>>>> + default 8
>>>>>> +
>>>>>> +# max bits determined by the following formula:
>>>>>> +# VA_BITS - PAGE_SHIFT - 3
>>>>>> +config ARCH_MMAP_RND_BITS_MAX
>>>>>> + default 24 if 64BIT # SV39 based
>>>>>> + default 17
>>>>>>
>>>>>> config MMU
>>>>>> def_bool y
>>>>> With this patch, I am not able to boot a Fedora Linux(a Gnome
>>>>> desktop
>>>>> image) on RISC-V hardware (Unleashed + Microsemi Expansion
>>>>> board).
>>>>> The
>>>>> booting gets stuck right after systemd starts.
>>>>>
>>>>> https://paste.fedoraproject.org/paste/TOrUMqqKH-pGFX7CnfajDg
>>>>>
>>>>> Reverting just this patch allow to boot Fedora successfully on
>>>>> specific
>>>>> RISC-V hardware. I have not root caused the issue but it looks
>>>>> like
>>>>> it
>>>>> might have messed userpsace mapping.
>>>> It might have messed userspace mapping but not enough to make
>>>> userspace
>>>> completely broken
>>>> as systemd does some things. I would try to boot in legacy
>>>> layout:
>>>> if
>>>> you can try to set sysctl legacy_va_layout
>>>> at boottime, it will map userspace as it was before (bottom-up).
>>>> If
>>>> that
>>>> does not work, the problem could
>>>> be the randomization that is activated by default now.
>>> Randomization may not be the issue. I just removed
>>> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT from the config and that
>>> seems to
>>> work. Here is the bottom-up layout with randomization on.
>> Oups, sorry for my previous answer, I missed yours that landed in
>> another folder.
>>
>> Removing ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT also removes
>> randomization
>> as this config selects ARCH_HAS_ELF_RANDOMIZE.
>> You could remove ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and selects by
>> hand
>> ARCH_HAS_ELF_RANDOMIZE but you would have to implement arch_mmap_rnd
>> and
>> arch_randomize_brk (elf-randomize.h).
>>
> Ahh okay.
>
>> The simplest would be to boot in legacy layout: I did not find a way
>> to
>> set this in kernel
>> command line, but you can by modifying it directly in the code:
>>
>> https://elixir.bootlin.com/linux/v5.4-rc2/source/kernel/sysctl.c#L269
>>
> Setting this to 1 works.
>
>>> [root@fedora-riscv ~]# cat /proc/self/maps
>>> 1555556000-1555570000 r-xp 00000000 103:01
>>> 280098 /usr/lib64/ld-2.28.so
>>> 1555570000-1555571000 r--p 00019000 103:01
>>> 280098 /usr/lib64/ld-2.28.so
>>> 1555571000-1555572000 rw-p 0001a000 103:01
>>> 280098 /usr/lib64/ld-2.28.so
>>> 1555572000-1555573000 rw-p 00000000 00:00 0
>>> 1555573000-1555575000 r-xp 00000000 00:00
>>> 0 [vdso]
>>> 1555575000-1555576000 r--p 00000000 103:01
>>> 50936 /usr/lib/locale/en_US.utf8/LC_IDENTIF
>>> ICAT
>>> ION
>>> 1555576000-155557d000 r--s 00000000 103:01
>>> 280826 /usr/lib64/gconv/gconv-modules.cache
>>> 155557d000-155557e000 r--p 00000000 103:01
>>> 50937 /usr/lib/locale/en_US.utf8/LC_MEASURE
>>> MENT
>>> 155557e000-155557f000 r--p 00000000 103:01
>>> 50939 /usr/lib/locale/en_US.utf8/LC_TELEPHO
>>> NE
>>> 155557f000-1555580000 r--p 00000000 103:01
>>> 3706 /usr/lib/locale/en_US.utf8/LC_ADDRESS
>>> 1555580000-1555581000 r--p 00000000 103:01
>>> 50944 /usr/lib/locale/en_US.utf8/LC_NAME
>>> 1555581000-1555582000 r--p 00000000 103:01
>>> 3775 /usr/lib/locale/en_US.utf8/LC_PAPER
>>> 1555582000-1555583000 r--p 00000000 103:01
>>> 3758 /usr/lib/locale/en_US.utf8/LC_MESSAGE
>>> S/SY
>>> S_LC_MESSAGES
>>> 1555583000-1555584000 r--p 00000000 103:01
>>> 50938 /usr/lib/locale/en_US.utf8/LC_MONETAR
>>> Y
>>> 1555584000-1555585000 r--p 00000000 103:01
>>> 50940 /usr/lib/locale/en_US.utf8/LC_TIME
>>> 1555585000-1555586000 r--p 00000000 103:01
>>> 50945 /usr/lib/locale/en_US.utf8/LC_NUMERIC
>>> 1555590000-1555592000 rw-p 00000000 00:00 0
>>> 1555592000-15556b1000 r-xp 00000000 103:01
>>> 280105 /usr/lib64/libc-2.28.so
>>> 15556b1000-15556b5000 r--p 0011e000 103:01
>>> 280105 /usr/lib64/libc-2.28.so
>>> 15556b5000-15556b7000 rw-p 00122000 103:01
>>> 280105 /usr/lib64/libc-2.28.so
>>> 15556b7000-15556bb000 rw-p 00000000 00:00 0
>>> 15556bb000-1555933000 r--p 00000000 103:01
>>> 3755 /usr/lib/locale/en_US.utf8/LC_COLLATE
>>> 1555933000-1555986000 r--p 00000000 103:01
>>> 50942 /usr/lib/locale/en_US.utf8/LC_CTYPE
>>> 1555986000-15559a8000 rw-p 00000000 00:00 0
>>> 2aaaaaa000-2aaaab1000 r-xp 00000000 103:01
>>> 283975 /usr/bin/cat
>>> 2aaaab1000-2aaaab2000 r--p 00006000 103:01
>>> 283975 /usr/bin/cat
>>> 2aaaab2000-2aaaab3000 rw-p 00007000 103:01
>>> 283975 /usr/bin/cat
>>> 2aaaab3000-2aaaad4000 rw-p 00000000 00:00
>>> 0 [heap]
>>> 3fffc97000-3fffcb8000 rw-p 00000000 00:00
>>> 0 [stack]
>>>
>>>
>>>> Anyway, it's weird since userspace should not depend on how the
>>>> mapping is.
>>>>
>>>> If you can identify the program that stalls, that would be
>>>> fantastic
>>>> :)
>>>>
>>> It stucks while booting. So I am not sure how to figure out which
>>> program stalls. It is difficult to figure out from boot log as it
>>> stucks at different places but soon after systemd starts.
>> If you can attach the running kernel, I would use vmlinux-gdb.py
>> commands
>> to figure out which processes are running (lx-ps command in
>> particular could
>> give us a hint). You can also add traces directly in the kernel and
>> either use
>> lx-dmesg command to print them from gdb or use your standard serial
>> output:
>> I would then print task_struct->comm at context switch to see which
>> process
>> is stuck.
>> To use the python script, you need to recompile with DEBUG_INFO and
>> GDB_SCRIPTS enabled.
>>
>> FYI, I have just booted a custom buildroot image based on kernel 5.4-
>> rc2.
>>
> vmlinux-gdb.py works only if you are running in Qemu or have a JTAG.
> Right ?
Yes. Does the problem appear on qemu too ?
> I am seeing this issue only on HiFive Unleashed + Microsemi Expansion
> board with Fedora Gnome desktop image. I can even boot a Fedora
> developer image on same hardware and a busybox image in Unleashed
> without any issues. But the issue is not specific to fedora as we see
> the same issue in OpenEmbedded disk image as well (HiFive Unleashed +
> Microsemi Expansion board).
>
> May be it gets triggerd only if bigger userspace ?
The purpose of this patch is, on the contrary, to not lose address space by
setting an arbitrary starting point (TASK_UNMAPPED_BASE) for mmap.
Do you a link to an image that fails to boot and reproduce the problem
on Qemu ?
I don't have much idea here, the only thing I can think of is finding
the application
that makes the boot stall and understand what is wrong.
>
>> Let me know if I can do anything.
>>
>> Alex
>>
>>>> As the code is common to mips and arm now and I did not hear from
>>>> them,
>>>> I imagine the problem comes
>>>> from us.
>>>>
>>>> Alex
On Wed, 2019-10-09 at 14:39 -0400, Alex Ghiti wrote:
> On 10/8/19 10:07 PM, Atish Patra wrote:
> > On Tue, 2019-10-08 at 07:58 -0400, Alex Ghiti wrote:
> > > On 10/7/19 8:46 PM, Atish Patra wrote:
> > > > On Mon, 2019-10-07 at 05:11 -0400, Alex Ghiti wrote:
> > > > > On 10/4/19 10:12 PM, Atish Patra wrote:
> > > > > > On Thu, 2019-08-08 at 02:17 -0400, Alexandre Ghiti wrote:
> > > > > > > In order to avoid wasting user address space by using
> > > > > > > bottom-
> > > > > > > up
> > > > > > > mmap
> > > > > > > allocation scheme, prefer top-down scheme when possible.
> > > > > > >
> > > > > > > Before:
> > > > > > > root@qemuriscv64:~# cat /proc/self/maps
> > > > > > > 00010000-00016000 r-xp 00000000 fe:00
> > > > > > > 6389 /bin/cat.coreutils
> > > > > > > 00016000-00017000 r--p 00005000 fe:00
> > > > > > > 6389 /bin/cat.coreutils
> > > > > > > 00017000-00018000 rw-p 00006000 fe:00
> > > > > > > 6389 /bin/cat.coreutils
> > > > > > > 00018000-00039000 rw-p 00000000 00:00 0 [heap]
> > > > > > > 1555556000-155556d000 r-xp 00000000 fe:00 7193 /lib/ld-
> > > > > > > 2.28.so
> > > > > > > 155556d000-155556e000 r--p 00016000 fe:00 7193 /lib/ld-
> > > > > > > 2.28.so
> > > > > > > 155556e000-155556f000 rw-p 00017000 fe:00 7193 /lib/ld-
> > > > > > > 2.28.so
> > > > > > > 155556f000-1555570000 rw-p 00000000 00:00 0
> > > > > > > 1555570000-1555572000 r-xp 00000000 00:00 0 [vdso]
> > > > > > > 1555574000-1555576000 rw-p 00000000 00:00 0
> > > > > > > 1555576000-1555674000 r-xp 00000000 fe:00
> > > > > > > 7187 /lib/libc-
> > > > > > > 2.28.so
> > > > > > > 1555674000-1555678000 r--p 000fd000 fe:00
> > > > > > > 7187 /lib/libc-
> > > > > > > 2.28.so
> > > > > > > 1555678000-155567a000 rw-p 00101000 fe:00
> > > > > > > 7187 /lib/libc-
> > > > > > > 2.28.so
> > > > > > > 155567a000-15556a0000 rw-p 00000000 00:00 0
> > > > > > > 3fffb90000-3fffbb1000 rw-p 00000000 00:00 0 [stack]
> > > > > > >
> > > > > > > After:
> > > > > > > root@qemuriscv64:~# cat /proc/self/maps
> > > > > > > 00010000-00016000 r-xp 00000000 fe:00
> > > > > > > 6389 /bin/cat.coreutils
> > > > > > > 00016000-00017000 r--p 00005000 fe:00
> > > > > > > 6389 /bin/cat.coreutils
> > > > > > > 00017000-00018000 rw-p 00006000 fe:00
> > > > > > > 6389 /bin/cat.coreutils
> > > > > > > 2de81000-2dea2000 rw-p 00000000 00:00 0 [heap]
> > > > > > > 3ff7eb6000-3ff7ed8000 rw-p 00000000 00:00 0
> > > > > > > 3ff7ed8000-3ff7fd6000 r-xp 00000000 fe:00
> > > > > > > 7187 /lib/libc-
> > > > > > > 2.28.so
> > > > > > > 3ff7fd6000-3ff7fda000 r--p 000fd000 fe:00
> > > > > > > 7187 /lib/libc-
> > > > > > > 2.28.so
> > > > > > > 3ff7fda000-3ff7fdc000 rw-p 00101000 fe:00
> > > > > > > 7187 /lib/libc-
> > > > > > > 2.28.so
> > > > > > > 3ff7fdc000-3ff7fe2000 rw-p 00000000 00:00 0
> > > > > > > 3ff7fe4000-3ff7fe6000 r-xp 00000000 00:00 0 [vdso]
> > > > > > > 3ff7fe6000-3ff7ffd000 r-xp 00000000 fe:00 7193 /lib/ld-
> > > > > > > 2.28.so
> > > > > > > 3ff7ffd000-3ff7ffe000 r--p 00016000 fe:00 7193 /lib/ld-
> > > > > > > 2.28.so
> > > > > > > 3ff7ffe000-3ff7fff000 rw-p 00017000 fe:00 7193 /lib/ld-
> > > > > > > 2.28.so
> > > > > > > 3ff7fff000-3ff8000000 rw-p 00000000 00:00 0
> > > > > > > 3fff888000-3fff8a9000 rw-p 00000000 00:00 0 [stack]
> > > > > > >
> > > > > > > Signed-off-by: Alexandre Ghiti <[email protected]>
> > > > > > > Acked-by: Paul Walmsley <[email protected]>
> > > > > > > Reviewed-by: Christoph Hellwig <[email protected]>
> > > > > > > Reviewed-by: Kees Cook <[email protected]>
> > > > > > > Reviewed-by: Luis Chamberlain <[email protected]>
> > > > > > > ---
> > > > > > > arch/riscv/Kconfig | 12 ++++++++++++
> > > > > > > 1 file changed, 12 insertions(+)
> > > > > > >
> > > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > > index 59a4727ecd6c..87dc5370becb 100644
> > > > > > > --- a/arch/riscv/Kconfig
> > > > > > > +++ b/arch/riscv/Kconfig
> > > > > > > @@ -54,6 +54,18 @@ config RISCV
> > > > > > > select EDAC_SUPPORT
> > > > > > > select ARCH_HAS_GIGANTIC_PAGE
> > > > > > > select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
> > > > > > > + select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
> > > > > > > + select HAVE_ARCH_MMAP_RND_BITS
> > > > > > > +
> > > > > > > +config ARCH_MMAP_RND_BITS_MIN
> > > > > > > + default 18 if 6legacy_va_layout4BIT
> > > > > > > + default 8
> > > > > > > +
> > > > > > > +# max bits determined by the following formula:
> > > > > > > +# VA_BITS - PAGE_SHIFT - 3
> > > > > > > +config ARCH_MMAP_RND_BITS_MAX
> > > > > > > + default 24 if 64BIT # SV39 based
> > > > > > > + default 17
> > > > > > >
> > > > > > > config MMU
> > > > > > > def_bool y
> > > > > > With this patch, I am not able to boot a Fedora Linux(a
> > > > > > Gnome
> > > > > > desktop
> > > > > > image) on RISC-V hardware (Unleashed + Microsemi Expansion
> > > > > > board).
> > > > > > The
> > > > > > booting gets stuck right after systemd starts.
> > > > > >
> > > > > > https://paste.fedoraproject.org/paste/TOrUMqqKH-pGFX7CnfajDg
> > > > > >
> > > > > > Reverting just this patch allow to boot Fedora successfully
> > > > > > on
> > > > > > specific
> > > > > > RISC-V hardware. I have not root caused the issue but it
> > > > > > looks
> > > > > > like
> > > > > > it
> > > > > > might have messed userpsace mapping.
> > > > > It might have messed userspace mapping but not enough to make
> > > > > userspace
> > > > > completely broken
> > > > > as systemd does some things. I would try to boot in legacy
> > > > > layout:
> > > > > if
> > > > > you can try to set sysctl legacy_va_layout
> > > > > at boottime, it will map userspace as it was before (bottom-
> > > > > up).
> > > > > If
> > > > > that
> > > > > does not work, the problem could
> > > > > be the randomization that is activated by default now.
> > > > Randomization may not be the issue. I just removed
> > > > ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT from the config and that
> > > > seems to
> > > > work. Here is the bottom-up layout with randomization on.
> > > Oups, sorry for my previous answer, I missed yours that landed in
> > > another folder.
> > >
> > > Removing ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT also removes
> > > randomization
> > > as this config selects ARCH_HAS_ELF_RANDOMIZE.
> > > You could remove ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and
> > > selects by
> > > hand
> > > ARCH_HAS_ELF_RANDOMIZE but you would have to implement
> > > arch_mmap_rnd
> > > and
> > > arch_randomize_brk (elf-randomize.h).
> > >
> > Ahh okay.
> >
> > > The simplest would be to boot in legacy layout: I did not find a
> > > way
> > > to
> > > set this in kernel
> > > command line, but you can by modifying it directly in the code:
> > >
> > > https://elixir.bootlin.com/linux/v5.4-rc2/source/kernel/sysctl.c#L269
> > >
> > Setting this to 1 works.
> >
> > > > [root@fedora-riscv ~]# cat /proc/self/maps
> > > > 1555556000-1555570000 r-xp 00000000 103:01
> > > > 280098 /usr/lib64/ld-2.28.so
> > > > 1555570000-1555571000 r--p 00019000 103:01
> > > > 280098 /usr/lib64/ld-2.28.so
> > > > 1555571000-1555572000 rw-p 0001a000 103:01
> > > > 280098 /usr/lib64/ld-2.28.so
> > > > 1555572000-1555573000 rw-p 00000000 00:00 0
> > > > 1555573000-1555575000 r-xp 00000000 00:00
> > > > 0 [vdso]
> > > > 1555575000-1555576000 r--p 00000000 103:01
> > > > 50936 /usr/lib/locale/en_US.utf8/LC_IDE
> > > > NTIF
> > > > ICAT
> > > > ION
> > > > 1555576000-155557d000 r--s 00000000 103:01
> > > > 280826 /usr/lib64/gconv/gconv-
> > > > modules.cache
> > > > 155557d000-155557e000 r--p 00000000 103:01
> > > > 50937 /usr/lib/locale/en_US.utf8/LC_MEA
> > > > SURE
> > > > MENT
> > > > 155557e000-155557f000 r--p 00000000 103:01
> > > > 50939 /usr/lib/locale/en_US.utf8/LC_TEL
> > > > EPHO
> > > > NE
> > > > 155557f000-1555580000 r--p 00000000 103:01
> > > > 3706 /usr/lib/locale/en_US.utf8/LC_ADD
> > > > RESS
> > > > 1555580000-1555581000 r--p 00000000 103:01
> > > > 50944 /usr/lib/locale/en_US.utf8/LC_NAM
> > > > E
> > > > 1555581000-1555582000 r--p 00000000 103:01
> > > > 3775 /usr/lib/locale/en_US.utf8/LC_PAP
> > > > ER
> > > > 1555582000-1555583000 r--p 00000000 103:01
> > > > 3758 /usr/lib/locale/en_US.utf8/LC_MES
> > > > SAGE
> > > > S/SY
> > > > S_LC_MESSAGES
> > > > 1555583000-1555584000 r--p 00000000 103:01
> > > > 50938 /usr/lib/locale/en_US.utf8/LC_MON
> > > > ETAR
> > > > Y
> > > > 1555584000-1555585000 r--p 00000000 103:01
> > > > 50940 /usr/lib/locale/en_US.utf8/LC_TIM
> > > > E
> > > > 1555585000-1555586000 r--p 00000000 103:01
> > > > 50945 /usr/lib/locale/en_US.utf8/LC_NUM
> > > > ERIC
> > > > 1555590000-1555592000 rw-p 00000000 00:00 0
> > > > 1555592000-15556b1000 r-xp 00000000 103:01
> > > > 280105 /usr/lib64/libc-2.28.so
> > > > 15556b1000-15556b5000 r--p 0011e000 103:01
> > > > 280105 /usr/lib64/libc-2.28.so
> > > > 15556b5000-15556b7000 rw-p 00122000 103:01
> > > > 280105 /usr/lib64/libc-2.28.so
> > > > 15556b7000-15556bb000 rw-p 00000000 00:00 0
> > > > 15556bb000-1555933000 r--p 00000000 103:01
> > > > 3755 /usr/lib/locale/en_US.utf8/LC_COL
> > > > LATE
> > > > 1555933000-1555986000 r--p 00000000 103:01
> > > > 50942 /usr/lib/locale/en_US.utf8/LC_CTY
> > > > PE
> > > > 1555986000-15559a8000 rw-p 00000000 00:00 0
> > > > 2aaaaaa000-2aaaab1000 r-xp 00000000 103:01
> > > > 283975 /usr/bin/cat
> > > > 2aaaab1000-2aaaab2000 r--p 00006000 103:01
> > > > 283975 /usr/bin/cat
> > > > 2aaaab2000-2aaaab3000 rw-p 00007000 103:01
> > > > 283975 /usr/bin/cat
> > > > 2aaaab3000-2aaaad4000 rw-p 00000000 00:00
> > > > 0 [heap]
> > > > 3fffc97000-3fffcb8000 rw-p 00000000 00:00
> > > > 0 [stack]
> > > >
> > > >
> > > > > Anyway, it's weird since userspace should not depend on how
> > > > > the
> > > > > mapping is.
> > > > >
> > > > > If you can identify the program that stalls, that would be
> > > > > fantastic
> > > > > :)
> > > > >
> > > > It stucks while booting. So I am not sure how to figure out
> > > > which
> > > > program stalls. It is difficult to figure out from boot log as
> > > > it
> > > > stucks at different places but soon after systemd starts.
> > > If you can attach the running kernel, I would use vmlinux-gdb.py
> > > commands
> > > to figure out which processes are running (lx-ps command in
> > > particular could
> > > give us a hint). You can also add traces directly in the kernel
> > > and
> > > either use
> > > lx-dmesg command to print them from gdb or use your standard
> > > serial
> > > output:
> > > I would then print task_struct->comm at context switch to see
> > > which
> > > process
> > > is stuck.
> > > To use the python script, you need to recompile with DEBUG_INFO
> > > and
> > > GDB_SCRIPTS enabled.
> > >
> > > FYI, I have just booted a custom buildroot image based on kernel
> > > 5.4-
> > > rc2.
> > >
> > vmlinux-gdb.py works only if you are running in Qemu or have a
> > JTAG.
> > Right ?
>
> Yes. Does the problem appear on qemu too ?
>
Nope.
> > I am seeing this issue only on HiFive Unleashed + Microsemi
> > Expansion
> > board with Fedora Gnome desktop image. I can even boot a Fedora
> > developer image on same hardware and a busybox image in Unleashed
> > without any issues. But the issue is not specific to fedora as we
> > see
> > the same issue in OpenEmbedded disk image as well (HiFive Unleashed
> > +
> > Microsemi Expansion board).
> >
> > May be it gets triggerd only if bigger userspace ?
>
> The purpose of this patch is, on the contrary, to not lose address
> space by
> setting an arbitrary starting point (TASK_UNMAPPED_BASE) for mmap.
>
> Do you a link to an image that fails to boot and reproduce the
> problem
> on Qemu ?
>
Nope. This is only reproducible in RISC-V Fedora Gnome desktop image on
a HiFive Unleashed + Microsemi Expansion. Just to clarify, there is no
issue with OpenEmbedded disk image related to memory layout. It was a
userspace thing.
Unforunately, fedora builders do not build gnome images now and the one
I am using is very old (~ 1 year). I think I can just use the sysctl
parameter for now in my tree and live with it until we have Fedora 31
images availale. We can revisit the problem after that. Thanks for
suggestions.
> I don't have much idea here, the only thing I can think of is
> finding
> the application
> that makes the boot stall and understand what is wrong.
>
> > > Let me know if I can do anything.
> > >
> > > Alex
> > >
> > > > > As the code is common to mips and arm now and I did not hear
> > > > > from
> > > > > them,
> > > > > I imagine the problem comes
> > > > > from us.
> > > > >
> > > > > Alex
--
Regards,
Atish
On Okt 15 2019, Atish Patra <[email protected]> wrote:
> Nope. This is only reproducible in RISC-V Fedora Gnome desktop image on
> a HiFive Unleashed + Microsemi Expansion. Just to clarify, there is no
> issue with OpenEmbedded disk image related to memory layout. It was a
> userspace thing.
Does it also happen with any of the openSUSE images?
https://download.opensuse.org/ports/riscv/tumbleweed/images/
Andreas.
--
Andreas Schwab, SUSE Labs, [email protected]
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."