2023-07-27 21:49:58

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v8 0/4] RISC-V: mm: Make SV48 the default address space

Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.

-Charlie

---
v8:
- Fix RV32 and the RV32 compat mode of RV64
- Extract out addr and base from the mmap macros
v7:
- Changing RLIMIT_STACK inside of an executing program does not trigger
arch_pick_mmap_layout(), so rewrite tests to change RLIMIT_STACK from a
script before executing tests. RLIMIT_STACK of infinity forces bottomup
mmap allocation.
- Make arch_get_mmap_base macro more readible by extracting out the rnd
calculation.
- Use MMAP_MIN_VA_BITS in TASK_UNMAPPED_BASE to support case when mmap
attempts to allocate address smaller than DEFAULT_MAP_WINDOW.
- Fix incorrect wording in documentation.

v6:
- Rebase onto the correct base

v5:
- Minor wording change in documentation
- Change some parenthesis in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
crash if RLIMIT_STACK was modified before executing the program. This was
tested using the libhugetlbfs tests.

v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implmentation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---

Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes

Documentation/riscv/vm-layout.rst | 22 +++++++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 28 ++++++--
arch/riscv/include/asm/processor.h | 52 +++++++++++++--
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 2 +
tools/testing/selftests/riscv/mm/Makefile | 15 +++++
.../riscv/mm/testcases/mmap_bottomup.c | 35 ++++++++++
.../riscv/mm/testcases/mmap_default.c | 35 ++++++++++
.../selftests/riscv/mm/testcases/mmap_test.h | 64 +++++++++++++++++++
.../selftests/riscv/mm/testcases/run_mmap.sh | 12 ++++
11 files changed, 257 insertions(+), 12 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_default.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_test.h
create mode 100755 tools/testing/selftests/riscv/mm/testcases/run_mmap.sh

--
2.41.0



2023-07-27 21:49:58

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v8 1/4] RISC-V: mm: Restrict address space for sv39,sv48,sv57

Make sv48 the default address space for mmap as some applications
currently depend on this assumption. A hint address passed to mmap will
cause the largest address space that fits entirely into the hint to be
used. If the hint is less than or equal to 1<<38, an sv39 address will
be used. An exception is that if the hint address is 0, then a sv48
address will be used. After an address space is completely full, the next
smallest address space will be used.

Signed-off-by: Charlie Jenkins <[email protected]>
---
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 20 +++++++++++-
arch/riscv/include/asm/processor.h | 52 ++++++++++++++++++++++++++----
3 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
index c24280774caf..5d3368d5585c 100644
--- a/arch/riscv/include/asm/elf.h
+++ b/arch/riscv/include/asm/elf.h
@@ -49,7 +49,7 @@ extern bool compat_elf_check_arch(Elf32_Ehdr *hdr);
* the loader. We need to make sure that it is out of the way of the program
* that it will "exec", and that there is sufficient room for the brk.
*/
-#define ELF_ET_DYN_BASE ((TASK_SIZE / 3) * 2)
+#define ELF_ET_DYN_BASE ((DEFAULT_MAP_WINDOW / 3) * 2)

#ifdef CONFIG_64BIT
#ifdef CONFIG_COMPAT
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 75970ee2bda2..c76a1ef094a4 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -63,8 +63,26 @@
* position vmemmap directly below the VMALLOC region.
*/
#ifdef CONFIG_64BIT
+#define VA_BITS_SV39 39
+#define VA_BITS_SV48 48
+#define VA_BITS_SV57 57
+
+#define VA_USER_SV39 (UL(1) << (VA_BITS_SV39 - 1))
+#define VA_USER_SV48 (UL(1) << (VA_BITS_SV48 - 1))
+#define VA_USER_SV57 (UL(1) << (VA_BITS_SV57 - 1))
+
#define VA_BITS (pgtable_l5_enabled ? \
- 57 : (pgtable_l4_enabled ? 48 : 39))
+ VA_BITS_SV57 : (pgtable_l4_enabled ? VA_BITS_SV48 : VA_BITS_SV39))
+
+#ifdef CONFIG_COMPAT
+#define MMAP_VA_BITS_64 ((VA_BITS >= VA_BITS_SV48) ? VA_BITS_SV48 : VA_BITS)
+#define MMAP_MIN_VA_BITS_64 ((VA_BITS >= VA_BITS_SV39) ? VA_BITS_SV39 : VA_BITS)
+#define MMAP_VA_BITS (test_thread_flag(TIF_32BIT) ? 32 : MMAP_VA_BITS_64)
+#define MMAP_MIN_VA_BITS (test_thread_flag(TIF_32BIT) ? 32 : MMAP_MIN_VA_BITS_64)
+#else
+#define MMAP_VA_BITS ((VA_BITS >= VA_BITS_SV48) ? VA_BITS_SV48 : VA_BITS)
+#define MMAP_MIN_VA_BITS ((VA_BITS >= VA_BITS_SV39) ? VA_BITS_SV39 : VA_BITS)
+#endif
#else
#define VA_BITS 32
#endif
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index c950a8d9edef..e810244ea951 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -13,19 +13,59 @@

#include <asm/ptrace.h>

+#ifdef CONFIG_64BIT
+#define DEFAULT_MAP_WINDOW (UL(1) << (MMAP_VA_BITS - 1))
+#define STACK_TOP_MAX TASK_SIZE_64
+
+#define arch_get_mmap_end(addr, len, flags) \
+({ \
+ unsigned long mmap_end; \
+ typeof(addr) _addr = (addr); \
+ if ((_addr) == 0 || (IS_ENABLED(CONFIG_COMPAT) && test_thread_flag(TIF_32BIT))) \
+ mmap_end = DEFAULT_MAP_WINDOW; \
+ else if ((_addr) >= VA_USER_SV57) \
+ mmap_end = STACK_TOP_MAX; \
+ else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) \
+ mmap_end = VA_USER_SV48; \
+ else \
+ mmap_end = VA_USER_SV39; \
+ mmap_end; \
+})
+
+#define arch_get_mmap_base(addr, base) \
+({ \
+ unsigned long mmap_base; \
+ typeof(addr) _addr = (addr); \
+ typeof(base) _base = (base); \
+ unsigned long rnd_gap = (_base) - DEFAULT_MAP_WINDOW; \
+ if ((_addr) == 0 || (IS_ENABLED(CONFIG_COMPAT) && test_thread_flag(TIF_32BIT))) \
+ mmap_base = (_base); \
+ else if (((_addr) >= VA_USER_SV57) && (VA_BITS >= VA_BITS_SV57)) \
+ mmap_base = VA_USER_SV57 + rnd_gap; \
+ else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) \
+ mmap_base = VA_USER_SV48 + rnd_gap; \
+ else \
+ mmap_base = VA_USER_SV39 + rnd_gap; \
+ mmap_base; \
+})
+
+#else
+#define DEFAULT_MAP_WINDOW TASK_SIZE
+#define STACK_TOP_MAX TASK_SIZE
+#endif
+#define STACK_ALIGN 16
+
+#define STACK_TOP DEFAULT_MAP_WINDOW
+
/*
* This decides where the kernel will search for a free chunk of vm
* space during mmap's.
*/
-#define TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE / 3)
-
-#define STACK_TOP TASK_SIZE
#ifdef CONFIG_64BIT
-#define STACK_TOP_MAX TASK_SIZE_64
+#define TASK_UNMAPPED_BASE PAGE_ALIGN((UL(1) << MMAP_MIN_VA_BITS) / 3)
#else
-#define STACK_TOP_MAX TASK_SIZE
+#define TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE / 3)
#endif
-#define STACK_ALIGN 16

#ifndef __ASSEMBLY__

--
2.41.0


2023-07-27 21:51:15

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v8 2/4] RISC-V: mm: Add tests for RISC-V mm

Add tests that enforce mmap hint address behavior. mmap should default
to sv48. mmap will provide an address at the highest address space that
can fit into the hint address, unless the hint address is less than sv39
and not 0, then it will return a sv39 address.

These tests are split into two files: mmap_default.c and mmap_bottomup.c
because a new process must be exec'd in order to change the mmap layout.
The run_mmap.sh script sets the stack to be unlimited for the
mmap_bottomup.c test which triggers a bottomup layout.

Signed-off-by: Charlie Jenkins <[email protected]>
---
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 2 +
tools/testing/selftests/riscv/mm/Makefile | 15 +++++
.../riscv/mm/testcases/mmap_bottomup.c | 35 ++++++++++
.../riscv/mm/testcases/mmap_default.c | 35 ++++++++++
.../selftests/riscv/mm/testcases/mmap_test.h | 64 +++++++++++++++++++
.../selftests/riscv/mm/testcases/run_mmap.sh | 12 ++++
7 files changed, 164 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_default.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_test.h
create mode 100755 tools/testing/selftests/riscv/mm/testcases/run_mmap.sh

diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile
index f4b3d5c9af5b..4a9ff515a3a0 100644
--- a/tools/testing/selftests/riscv/Makefile
+++ b/tools/testing/selftests/riscv/Makefile
@@ -5,7 +5,7 @@
ARCH ?= $(shell uname -m 2>/dev/null || echo not)

ifneq (,$(filter $(ARCH),riscv))
-RISCV_SUBTARGETS ?= hwprobe vector
+RISCV_SUBTARGETS ?= hwprobe vector mm
else
RISCV_SUBTARGETS :=
endif
diff --git a/tools/testing/selftests/riscv/mm/.gitignore b/tools/testing/selftests/riscv/mm/.gitignore
new file mode 100644
index 000000000000..5c2c57cb950c
--- /dev/null
+++ b/tools/testing/selftests/riscv/mm/.gitignore
@@ -0,0 +1,2 @@
+mmap_bottomup
+mmap_default
diff --git a/tools/testing/selftests/riscv/mm/Makefile b/tools/testing/selftests/riscv/mm/Makefile
new file mode 100644
index 000000000000..11e0f0568923
--- /dev/null
+++ b/tools/testing/selftests/riscv/mm/Makefile
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2021 ARM Limited
+# Originally tools/testing/arm64/abi/Makefile
+
+# Additional include paths needed by kselftest.h and local headers
+CFLAGS += -D_GNU_SOURCE -std=gnu99 -I.
+
+TEST_GEN_FILES := testcases/mmap_default testcases/mmap_bottomup
+
+TEST_PROGS := testcases/run_mmap.sh
+
+include ../../lib.mk
+
+$(OUTPUT)/mm: testcases/mmap_default.c testcases/mmap_bottomup.c testcases/mmap_tests.h
+ $(CC) -o$@ $(CFLAGS) $(LDFLAGS) $^
diff --git a/tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c b/tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
new file mode 100644
index 000000000000..b29379f7e478
--- /dev/null
+++ b/tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <sys/mman.h>
+#include <testcases/mmap_test.h>
+
+#include "../../kselftest_harness.h"
+
+TEST(infinite_rlimit)
+{
+// Only works on 64 bit
+#if __riscv_xlen == 64
+ struct addresses mmap_addresses;
+
+ EXPECT_EQ(BOTTOM_UP, memory_layout());
+
+ do_mmaps(&mmap_addresses);
+
+ EXPECT_NE(MAP_FAILED, mmap_addresses.no_hint);
+ EXPECT_NE(MAP_FAILED, mmap_addresses.on_37_addr);
+ EXPECT_NE(MAP_FAILED, mmap_addresses.on_38_addr);
+ EXPECT_NE(MAP_FAILED, mmap_addresses.on_46_addr);
+ EXPECT_NE(MAP_FAILED, mmap_addresses.on_47_addr);
+ EXPECT_NE(MAP_FAILED, mmap_addresses.on_55_addr);
+ EXPECT_NE(MAP_FAILED, mmap_addresses.on_56_addr);
+
+ EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.no_hint);
+ EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_37_addr);
+ EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_38_addr);
+ EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_46_addr);
+ EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_47_addr);
+ EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_55_addr);
+ EXPECT_GT(1UL << 56, (unsigned long)mmap_addresses.on_56_addr);
+#endif
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/riscv/mm/testcases/mmap_default.c b/tools/testing/selftests/riscv/mm/testcases/mmap_default.c
new file mode 100644
index 000000000000..d1accb91b726
--- /dev/null
+++ b/tools/testing/selftests/riscv/mm/testcases/mmap_default.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <sys/mman.h>
+#include <testcases/mmap_test.h>
+
+#include "../../kselftest_harness.h"
+
+TEST(default_rlimit)
+{
+// Only works on 64 bit
+#if __riscv_xlen == 64
+ struct addresses mmap_addresses;
+
+ EXPECT_EQ(TOP_DOWN, memory_layout());
+
+ do_mmaps(&mmap_addresses);
+
+ EXPECT_NE(MAP_FAILED, mmap_addresses.no_hint);
+ EXPECT_NE(MAP_FAILED, mmap_addresses.on_37_addr);
+ EXPECT_NE(MAP_FAILED, mmap_addresses.on_38_addr);
+ EXPECT_NE(MAP_FAILED, mmap_addresses.on_46_addr);
+ EXPECT_NE(MAP_FAILED, mmap_addresses.on_47_addr);
+ EXPECT_NE(MAP_FAILED, mmap_addresses.on_55_addr);
+ EXPECT_NE(MAP_FAILED, mmap_addresses.on_56_addr);
+
+ EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.no_hint);
+ EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_37_addr);
+ EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_38_addr);
+ EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_46_addr);
+ EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_47_addr);
+ EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_55_addr);
+ EXPECT_GT(1UL << 56, (unsigned long)mmap_addresses.on_56_addr);
+#endif
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/riscv/mm/testcases/mmap_test.h b/tools/testing/selftests/riscv/mm/testcases/mmap_test.h
new file mode 100644
index 000000000000..98a892de5d19
--- /dev/null
+++ b/tools/testing/selftests/riscv/mm/testcases/mmap_test.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _TESTCASES_MMAP_TEST_H
+#define _TESTCASES_MMAP_TEST_H
+#include <sys/mman.h>
+#include <sys/resource.h>
+#include <stddef.h>
+
+#define TOP_DOWN 0
+#define BOTTOM_UP 1
+
+struct addresses {
+ int *no_hint;
+ int *on_37_addr;
+ int *on_38_addr;
+ int *on_46_addr;
+ int *on_47_addr;
+ int *on_55_addr;
+ int *on_56_addr;
+};
+
+void do_mmaps(struct addresses *mmap_addresses)
+{
+ /*
+ * Place all of the hint addresses on the boundaries of mmap
+ * sv39, sv48, sv57
+ * User addresses end at 1<<38, 1<<47, 1<<56 respectively
+ */
+ void *on_37_bits = (void *)(1UL << 37);
+ void *on_38_bits = (void *)(1UL << 38);
+ void *on_46_bits = (void *)(1UL << 46);
+ void *on_47_bits = (void *)(1UL << 47);
+ void *on_55_bits = (void *)(1UL << 55);
+ void *on_56_bits = (void *)(1UL << 56);
+
+ int prot = PROT_READ | PROT_WRITE;
+ int flags = MAP_PRIVATE | MAP_ANONYMOUS;
+
+ mmap_addresses->no_hint =
+ mmap(NULL, 5 * sizeof(int), prot, flags, 0, 0);
+ mmap_addresses->on_37_addr =
+ mmap(on_37_bits, 5 * sizeof(int), prot, flags, 0, 0);
+ mmap_addresses->on_38_addr =
+ mmap(on_38_bits, 5 * sizeof(int), prot, flags, 0, 0);
+ mmap_addresses->on_46_addr =
+ mmap(on_46_bits, 5 * sizeof(int), prot, flags, 0, 0);
+ mmap_addresses->on_47_addr =
+ mmap(on_47_bits, 5 * sizeof(int), prot, flags, 0, 0);
+ mmap_addresses->on_55_addr =
+ mmap(on_55_bits, 5 * sizeof(int), prot, flags, 0, 0);
+ mmap_addresses->on_56_addr =
+ mmap(on_56_bits, 5 * sizeof(int), prot, flags, 0, 0);
+}
+
+int memory_layout(void)
+{
+ int prot = PROT_READ | PROT_WRITE;
+ int flags = MAP_PRIVATE | MAP_ANONYMOUS;
+
+ void *value1 = mmap(NULL, sizeof(int), prot, flags, 0, 0);
+ void *value2 = mmap(NULL, sizeof(int), prot, flags, 0, 0);
+
+ return value2 > value1;
+}
+#endif /* _TESTCASES_MMAP_TEST_H */
diff --git a/tools/testing/selftests/riscv/mm/testcases/run_mmap.sh b/tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
new file mode 100755
index 000000000000..ca5ad7c48bad
--- /dev/null
+++ b/tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+original_stack_limit=$(ulimit -s)
+
+./mmap_default
+
+# Force mmap_bottomup to be ran with bottomup memory due to
+# the unlimited stack
+ulimit -s unlimited
+./mmap_bottomup
+ulimit -s $original_stack_limit
--
2.41.0


2023-07-27 21:51:44

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v8 4/4] RISC-V: mm: Document mmap changes

The behavior of mmap is modified with this patch series, so explain the
changes to the mmap hint address behavior.

Signed-off-by: Charlie Jenkins <[email protected]>
---
Documentation/riscv/vm-layout.rst | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/Documentation/riscv/vm-layout.rst b/Documentation/riscv/vm-layout.rst
index 5462c84f4723..69ff6da1dbf8 100644
--- a/Documentation/riscv/vm-layout.rst
+++ b/Documentation/riscv/vm-layout.rst
@@ -133,3 +133,25 @@ RISC-V Linux Kernel SV57
ffffffff00000000 | -4 GB | ffffffff7fffffff | 2 GB | modules, BPF
ffffffff80000000 | -2 GB | ffffffffffffffff | 2 GB | kernel
__________________|____________|__________________|_________|____________________________________________________________
+
+
+Userspace VAs
+--------------------
+To maintain compatibility with software that relies on the VA space with a
+maximum of 48 bits the kernel will, by default, return virtual addresses to
+userspace from a 48-bit range (sv48). This default behavior is achieved by
+passing 0 into the hint address parameter of mmap. On CPUs with an address space
+smaller than sv48, the CPU maximum supported address space will be the default.
+
+Software can "opt-in" to receiving VAs from another VA space by providing
+a hint address to mmap. A hint address passed to mmap will cause the largest
+address space that fits entirely into the hint to be used, unless there is no
+space left in the address space. If there is no space available in the requested
+address space, an address in the next smallest available address space will be
+returned.
+
+For example, in order to obtain 48-bit VA space, a hint address greater than
+:code:`1 << 47` must be provided. Note that this is 47 due to sv48 userspace
+ending at :code:`1 << 47` and the addresses beyond this are reserved for the
+kernel. Similarly, to obtain 57-bit VA space addresses, a hint address greater
+than or equal to :code:`1 << 56` must be provided.
--
2.41.0


2023-07-27 22:16:19

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v8 3/4] RISC-V: mm: Update pgtable comment documentation

sv57 is supported in the kernel so pgtable.h should reflect that.

Signed-off-by: Charlie Jenkins <[email protected]>
Reviewed-by: Alexandre Ghiti <[email protected]>
---
arch/riscv/include/asm/pgtable.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index c76a1ef094a4..b551467a1dd8 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -848,14 +848,16 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
* Task size is 0x4000000000 for RV64 or 0x9fc00000 for RV32.
* Note that PGDIR_SIZE must evenly divide TASK_SIZE.
* Task size is:
- * - 0x9fc00000 (~2.5GB) for RV32.
- * - 0x4000000000 ( 256GB) for RV64 using SV39 mmu
- * - 0x800000000000 ( 128TB) for RV64 using SV48 mmu
+ * - 0x9fc00000 (~2.5GB) for RV32.
+ * - 0x4000000000 ( 256GB) for RV64 using SV39 mmu
+ * - 0x800000000000 ( 128TB) for RV64 using SV48 mmu
+ * - 0x100000000000000 ( 64PB) for RV64 using SV57 mmu
*
* Note that PGDIR_SIZE must evenly divide TASK_SIZE since "RISC-V
* Instruction Set Manual Volume II: Privileged Architecture" states that
* "load and store effective addresses, which are 64bits, must have bits
* 63–48 all equal to bit 47, or else a page-fault exception will occur."
+ * Similarly for SV57, bits 63–57 must be equal to bit 56.
*/
#ifdef CONFIG_64BIT
#define TASK_SIZE_64 (PGDIR_SIZE * PTRS_PER_PGD / 2)
--
2.41.0


2023-08-06 10:09:47

by Alexandre Ghiti

[permalink] [raw]
Subject: Re: [PATCH v8 1/4] RISC-V: mm: Restrict address space for sv39,sv48,sv57


On 27/07/2023 23:26, Charlie Jenkins wrote:
> Make sv48 the default address space for mmap as some applications
> currently depend on this assumption. A hint address passed to mmap will
> cause the largest address space that fits entirely into the hint to be
> used. If the hint is less than or equal to 1<<38, an sv39 address will
> be used. An exception is that if the hint address is 0, then a sv48
> address will be used. After an address space is completely full, the next
> smallest address space will be used.
>
> Signed-off-by: Charlie Jenkins <[email protected]>
> ---
> arch/riscv/include/asm/elf.h | 2 +-
> arch/riscv/include/asm/pgtable.h | 20 +++++++++++-
> arch/riscv/include/asm/processor.h | 52 ++++++++++++++++++++++++++----
> 3 files changed, 66 insertions(+), 8 deletions(-)
>
> diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
> index c24280774caf..5d3368d5585c 100644
> --- a/arch/riscv/include/asm/elf.h
> +++ b/arch/riscv/include/asm/elf.h
> @@ -49,7 +49,7 @@ extern bool compat_elf_check_arch(Elf32_Ehdr *hdr);
> * the loader. We need to make sure that it is out of the way of the program
> * that it will "exec", and that there is sufficient room for the brk.
> */
> -#define ELF_ET_DYN_BASE ((TASK_SIZE / 3) * 2)
> +#define ELF_ET_DYN_BASE ((DEFAULT_MAP_WINDOW / 3) * 2)
>
> #ifdef CONFIG_64BIT
> #ifdef CONFIG_COMPAT
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 75970ee2bda2..c76a1ef094a4 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -63,8 +63,26 @@
> * position vmemmap directly below the VMALLOC region.
> */
> #ifdef CONFIG_64BIT
> +#define VA_BITS_SV39 39
> +#define VA_BITS_SV48 48
> +#define VA_BITS_SV57 57
> +
> +#define VA_USER_SV39 (UL(1) << (VA_BITS_SV39 - 1))
> +#define VA_USER_SV48 (UL(1) << (VA_BITS_SV48 - 1))
> +#define VA_USER_SV57 (UL(1) << (VA_BITS_SV57 - 1))
> +
> #define VA_BITS (pgtable_l5_enabled ? \
> - 57 : (pgtable_l4_enabled ? 48 : 39))
> + VA_BITS_SV57 : (pgtable_l4_enabled ? VA_BITS_SV48 : VA_BITS_SV39))
> +
> +#ifdef CONFIG_COMPAT
> +#define MMAP_VA_BITS_64 ((VA_BITS >= VA_BITS_SV48) ? VA_BITS_SV48 : VA_BITS)
> +#define MMAP_MIN_VA_BITS_64 ((VA_BITS >= VA_BITS_SV39) ? VA_BITS_SV39 : VA_BITS)


Here the condition is always true right?


> +#define MMAP_VA_BITS (test_thread_flag(TIF_32BIT) ? 32 : MMAP_VA_BITS_64)
> +#define MMAP_MIN_VA_BITS (test_thread_flag(TIF_32BIT) ? 32 : MMAP_MIN_VA_BITS_64)


I think you should use is_compat_task() here instead of
test_thread_flag(TIF_32BIT). And what about introducing VA_BITS_SV32
instead of hardcoding 32?


> +#else
> +#define MMAP_VA_BITS ((VA_BITS >= VA_BITS_SV48) ? VA_BITS_SV48 : VA_BITS)
> +#define MMAP_MIN_VA_BITS ((VA_BITS >= VA_BITS_SV39) ? VA_BITS_SV39 : VA_BITS)


Ditto here.


> +#endif
> #else
> #define VA_BITS 32
> #endif
> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> index c950a8d9edef..e810244ea951 100644
> --- a/arch/riscv/include/asm/processor.h
> +++ b/arch/riscv/include/asm/processor.h
> @@ -13,19 +13,59 @@
>
> #include <asm/ptrace.h>
>
> +#ifdef CONFIG_64BIT
> +#define DEFAULT_MAP_WINDOW (UL(1) << (MMAP_VA_BITS - 1))
> +#define STACK_TOP_MAX TASK_SIZE_64
> +
> +#define arch_get_mmap_end(addr, len, flags) \
> +({ \
> + unsigned long mmap_end; \
> + typeof(addr) _addr = (addr); \
> + if ((_addr) == 0 || (IS_ENABLED(CONFIG_COMPAT) && test_thread_flag(TIF_32BIT))) \
> + mmap_end = DEFAULT_MAP_WINDOW; \


Wouldn't that prevent a sv57 system to allocate sv57 addresses when sv48
is full unless explicitly asked?


> + else if ((_addr) >= VA_USER_SV57) \
> + mmap_end = STACK_TOP_MAX; \
> + else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) \
> + mmap_end = VA_USER_SV48; \
> + else \
> + mmap_end = VA_USER_SV39; \
> + mmap_end; \
> +})
> +
> +#define arch_get_mmap_base(addr, base) \
> +({ \
> + unsigned long mmap_base; \
> + typeof(addr) _addr = (addr); \
> + typeof(base) _base = (base); \
> + unsigned long rnd_gap = (_base) - DEFAULT_MAP_WINDOW; \
> + if ((_addr) == 0 || (IS_ENABLED(CONFIG_COMPAT) && test_thread_flag(TIF_32BIT))) \
> + mmap_base = (_base); \
> + else if (((_addr) >= VA_USER_SV57) && (VA_BITS >= VA_BITS_SV57)) \
> + mmap_base = VA_USER_SV57 + rnd_gap; \


Shouldn't it be mmap_base = VA_USER_SV57 - rnd_gap?


> + else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) \
> + mmap_base = VA_USER_SV48 + rnd_gap; \
> + else \
> + mmap_base = VA_USER_SV39 + rnd_gap; \
> + mmap_base; \
> +})
> +
> +#else
> +#define DEFAULT_MAP_WINDOW TASK_SIZE
> +#define STACK_TOP_MAX TASK_SIZE
> +#endif
> +#define STACK_ALIGN 16
> +
> +#define STACK_TOP DEFAULT_MAP_WINDOW
> +
> /*
> * This decides where the kernel will search for a free chunk of vm
> * space during mmap's.
> */
> -#define TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE / 3)
> -
> -#define STACK_TOP TASK_SIZE
> #ifdef CONFIG_64BIT
> -#define STACK_TOP_MAX TASK_SIZE_64
> +#define TASK_UNMAPPED_BASE PAGE_ALIGN((UL(1) << MMAP_MIN_VA_BITS) / 3)
> #else
> -#define STACK_TOP_MAX TASK_SIZE
> +#define TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE / 3)
> #endif
> -#define STACK_ALIGN 16
>
> #ifndef __ASSEMBLY__
>

2023-08-06 10:20:45

by Alexandre Ghiti

[permalink] [raw]
Subject: Re: [PATCH v8 2/4] RISC-V: mm: Add tests for RISC-V mm


On 27/07/2023 23:26, Charlie Jenkins wrote:
> Add tests that enforce mmap hint address behavior. mmap should default
> to sv48. mmap will provide an address at the highest address space that
> can fit into the hint address, unless the hint address is less than sv39
> and not 0, then it will return a sv39 address.
>
> These tests are split into two files: mmap_default.c and mmap_bottomup.c
> because a new process must be exec'd in order to change the mmap layout.
> The run_mmap.sh script sets the stack to be unlimited for the
> mmap_bottomup.c test which triggers a bottomup layout.
>
> Signed-off-by: Charlie Jenkins <[email protected]>
> ---
> tools/testing/selftests/riscv/Makefile | 2 +-
> tools/testing/selftests/riscv/mm/.gitignore | 2 +
> tools/testing/selftests/riscv/mm/Makefile | 15 +++++
> .../riscv/mm/testcases/mmap_bottomup.c | 35 ++++++++++
> .../riscv/mm/testcases/mmap_default.c | 35 ++++++++++
> .../selftests/riscv/mm/testcases/mmap_test.h | 64 +++++++++++++++++++
> .../selftests/riscv/mm/testcases/run_mmap.sh | 12 ++++
> 7 files changed, 164 insertions(+), 1 deletion(-)
> create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
> create mode 100644 tools/testing/selftests/riscv/mm/Makefile
> create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
> create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_default.c
> create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_test.h
> create mode 100755 tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
>
> diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile
> index f4b3d5c9af5b..4a9ff515a3a0 100644
> --- a/tools/testing/selftests/riscv/Makefile
> +++ b/tools/testing/selftests/riscv/Makefile
> @@ -5,7 +5,7 @@
> ARCH ?= $(shell uname -m 2>/dev/null || echo not)
>
> ifneq (,$(filter $(ARCH),riscv))
> -RISCV_SUBTARGETS ?= hwprobe vector
> +RISCV_SUBTARGETS ?= hwprobe vector mm
> else
> RISCV_SUBTARGETS :=
> endif
> diff --git a/tools/testing/selftests/riscv/mm/.gitignore b/tools/testing/selftests/riscv/mm/.gitignore
> new file mode 100644
> index 000000000000..5c2c57cb950c
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/mm/.gitignore
> @@ -0,0 +1,2 @@
> +mmap_bottomup
> +mmap_default
> diff --git a/tools/testing/selftests/riscv/mm/Makefile b/tools/testing/selftests/riscv/mm/Makefile
> new file mode 100644
> index 000000000000..11e0f0568923
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/mm/Makefile
> @@ -0,0 +1,15 @@
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2021 ARM Limited
> +# Originally tools/testing/arm64/abi/Makefile
> +
> +# Additional include paths needed by kselftest.h and local headers
> +CFLAGS += -D_GNU_SOURCE -std=gnu99 -I.
> +
> +TEST_GEN_FILES := testcases/mmap_default testcases/mmap_bottomup
> +
> +TEST_PROGS := testcases/run_mmap.sh
> +
> +include ../../lib.mk
> +
> +$(OUTPUT)/mm: testcases/mmap_default.c testcases/mmap_bottomup.c testcases/mmap_tests.h
> + $(CC) -o$@ $(CFLAGS) $(LDFLAGS) $^
> diff --git a/tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c b/tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
> new file mode 100644
> index 000000000000..b29379f7e478
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
> @@ -0,0 +1,35 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include <sys/mman.h>
> +#include <testcases/mmap_test.h>
> +
> +#include "../../kselftest_harness.h"
> +
> +TEST(infinite_rlimit)
> +{
> +// Only works on 64 bit
> +#if __riscv_xlen == 64
> + struct addresses mmap_addresses;
> +
> + EXPECT_EQ(BOTTOM_UP, memory_layout());
> +
> + do_mmaps(&mmap_addresses);
> +
> + EXPECT_NE(MAP_FAILED, mmap_addresses.no_hint);
> + EXPECT_NE(MAP_FAILED, mmap_addresses.on_37_addr);
> + EXPECT_NE(MAP_FAILED, mmap_addresses.on_38_addr);
> + EXPECT_NE(MAP_FAILED, mmap_addresses.on_46_addr);
> + EXPECT_NE(MAP_FAILED, mmap_addresses.on_47_addr);
> + EXPECT_NE(MAP_FAILED, mmap_addresses.on_55_addr);
> + EXPECT_NE(MAP_FAILED, mmap_addresses.on_56_addr);
> +
> + EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.no_hint);


This test ^ will only work on sv48+ systems, if launched on a sv39
system, it will fail


> + EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_37_addr);
> + EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_38_addr);
> + EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_46_addr);
> + EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_47_addr);
> + EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_55_addr);
> + EXPECT_GT(1UL << 56, (unsigned long)mmap_addresses.on_56_addr);
> +#endif
> +}
> +
> +TEST_HARNESS_MAIN
> diff --git a/tools/testing/selftests/riscv/mm/testcases/mmap_default.c b/tools/testing/selftests/riscv/mm/testcases/mmap_default.c
> new file mode 100644
> index 000000000000..d1accb91b726
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/mm/testcases/mmap_default.c
> @@ -0,0 +1,35 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include <sys/mman.h>
> +#include <testcases/mmap_test.h>
> +
> +#include "../../kselftest_harness.h"
> +
> +TEST(default_rlimit)
> +{
> +// Only works on 64 bit
> +#if __riscv_xlen == 64
> + struct addresses mmap_addresses;
> +
> + EXPECT_EQ(TOP_DOWN, memory_layout());
> +
> + do_mmaps(&mmap_addresses);
> +
> + EXPECT_NE(MAP_FAILED, mmap_addresses.no_hint);
> + EXPECT_NE(MAP_FAILED, mmap_addresses.on_37_addr);
> + EXPECT_NE(MAP_FAILED, mmap_addresses.on_38_addr);
> + EXPECT_NE(MAP_FAILED, mmap_addresses.on_46_addr);
> + EXPECT_NE(MAP_FAILED, mmap_addresses.on_47_addr);
> + EXPECT_NE(MAP_FAILED, mmap_addresses.on_55_addr);
> + EXPECT_NE(MAP_FAILED, mmap_addresses.on_56_addr);
> +
> + EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.no_hint);
> + EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_37_addr);
> + EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_38_addr);
> + EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_46_addr);
> + EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_47_addr);
> + EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_55_addr);
> + EXPECT_GT(1UL << 56, (unsigned long)mmap_addresses.on_56_addr);
> +#endif
> +}
> +
> +TEST_HARNESS_MAIN
> diff --git a/tools/testing/selftests/riscv/mm/testcases/mmap_test.h b/tools/testing/selftests/riscv/mm/testcases/mmap_test.h
> new file mode 100644
> index 000000000000..98a892de5d19
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/mm/testcases/mmap_test.h
> @@ -0,0 +1,64 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#ifndef _TESTCASES_MMAP_TEST_H
> +#define _TESTCASES_MMAP_TEST_H
> +#include <sys/mman.h>
> +#include <sys/resource.h>
> +#include <stddef.h>
> +
> +#define TOP_DOWN 0
> +#define BOTTOM_UP 1
> +
> +struct addresses {
> + int *no_hint;
> + int *on_37_addr;
> + int *on_38_addr;
> + int *on_46_addr;
> + int *on_47_addr;
> + int *on_55_addr;
> + int *on_56_addr;
> +};
> +
> +void do_mmaps(struct addresses *mmap_addresses)


I would static inline this function definition since you are in a header
file.


> +{
> + /*
> + * Place all of the hint addresses on the boundaries of mmap
> + * sv39, sv48, sv57
> + * User addresses end at 1<<38, 1<<47, 1<<56 respectively
> + */
> + void *on_37_bits = (void *)(1UL << 37);
> + void *on_38_bits = (void *)(1UL << 38);
> + void *on_46_bits = (void *)(1UL << 46);
> + void *on_47_bits = (void *)(1UL << 47);
> + void *on_55_bits = (void *)(1UL << 55);
> + void *on_56_bits = (void *)(1UL << 56);
> +
> + int prot = PROT_READ | PROT_WRITE;
> + int flags = MAP_PRIVATE | MAP_ANONYMOUS;
> +
> + mmap_addresses->no_hint =
> + mmap(NULL, 5 * sizeof(int), prot, flags, 0, 0);
> + mmap_addresses->on_37_addr =
> + mmap(on_37_bits, 5 * sizeof(int), prot, flags, 0, 0);
> + mmap_addresses->on_38_addr =
> + mmap(on_38_bits, 5 * sizeof(int), prot, flags, 0, 0);
> + mmap_addresses->on_46_addr =
> + mmap(on_46_bits, 5 * sizeof(int), prot, flags, 0, 0);
> + mmap_addresses->on_47_addr =
> + mmap(on_47_bits, 5 * sizeof(int), prot, flags, 0, 0);
> + mmap_addresses->on_55_addr =
> + mmap(on_55_bits, 5 * sizeof(int), prot, flags, 0, 0);
> + mmap_addresses->on_56_addr =
> + mmap(on_56_bits, 5 * sizeof(int), prot, flags, 0, 0);
> +}
> +
> +int memory_layout(void)
> +{
> + int prot = PROT_READ | PROT_WRITE;
> + int flags = MAP_PRIVATE | MAP_ANONYMOUS;
> +
> + void *value1 = mmap(NULL, sizeof(int), prot, flags, 0, 0);
> + void *value2 = mmap(NULL, sizeof(int), prot, flags, 0, 0);
> +
> + return value2 > value1;
> +}
> +#endif /* _TESTCASES_MMAP_TEST_H */
> diff --git a/tools/testing/selftests/riscv/mm/testcases/run_mmap.sh b/tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
> new file mode 100755
> index 000000000000..ca5ad7c48bad
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
> @@ -0,0 +1,12 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0
> +
> +original_stack_limit=$(ulimit -s)
> +
> +./mmap_default
> +
> +# Force mmap_bottomup to be ran with bottomup memory due to
> +# the unlimited stack
> +ulimit -s unlimited
> +./mmap_bottomup
> +ulimit -s $original_stack_limit

2023-08-06 10:31:45

by Alexandre Ghiti

[permalink] [raw]
Subject: Re: [PATCH v8 4/4] RISC-V: mm: Document mmap changes


On 27/07/2023 23:26, Charlie Jenkins wrote:
> The behavior of mmap is modified with this patch series, so explain the
> changes to the mmap hint address behavior.
>
> Signed-off-by: Charlie Jenkins <[email protected]>
> ---
> Documentation/riscv/vm-layout.rst | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/Documentation/riscv/vm-layout.rst b/Documentation/riscv/vm-layout.rst
> index 5462c84f4723..69ff6da1dbf8 100644
> --- a/Documentation/riscv/vm-layout.rst
> +++ b/Documentation/riscv/vm-layout.rst
> @@ -133,3 +133,25 @@ RISC-V Linux Kernel SV57
> ffffffff00000000 | -4 GB | ffffffff7fffffff | 2 GB | modules, BPF
> ffffffff80000000 | -2 GB | ffffffffffffffff | 2 GB | kernel
> __________________|____________|__________________|_________|____________________________________________________________
> +
> +
> +Userspace VAs
> +--------------------
> +To maintain compatibility with software that relies on the VA space with a
> +maximum of 48 bits the kernel will, by default, return virtual addresses to
> +userspace from a 48-bit range (sv48). This default behavior is achieved by
> +passing 0 into the hint address parameter of mmap. On CPUs with an address space
> +smaller than sv48, the CPU maximum supported address space will be the default.
> +
> +Software can "opt-in" to receiving VAs from another VA space by providing
> +a hint address to mmap. A hint address passed to mmap will cause the largest
> +address space that fits entirely into the hint to be used, unless there is no
> +space left in the address space. If there is no space available in the requested
> +address space, an address in the next smallest available address space will be
> +returned.
> +
> +For example, in order to obtain 48-bit VA space, a hint address greater than
> +:code:`1 << 47` must be provided. Note that this is 47 due to sv48 userspace
> +ending at :code:`1 << 47` and the addresses beyond this are reserved for the
> +kernel. Similarly, to obtain 57-bit VA space addresses, a hint address greater
> +than or equal to :code:`1 << 56` must be provided.


You can add:

Reviewed-by: Alexandre Ghiti <[email protected]>

Thanks,

Alex


2023-08-07 18:46:35

by Charlie Jenkins

[permalink] [raw]
Subject: Re: [PATCH v8 2/4] RISC-V: mm: Add tests for RISC-V mm

On Sun, Aug 06, 2023 at 12:06:49PM +0200, Alexandre Ghiti wrote:
>
> On 27/07/2023 23:26, Charlie Jenkins wrote:
> > Add tests that enforce mmap hint address behavior. mmap should default
> > to sv48. mmap will provide an address at the highest address space that
> > can fit into the hint address, unless the hint address is less than sv39
> > and not 0, then it will return a sv39 address.
> >
> > These tests are split into two files: mmap_default.c and mmap_bottomup.c
> > because a new process must be exec'd in order to change the mmap layout.
> > The run_mmap.sh script sets the stack to be unlimited for the
> > mmap_bottomup.c test which triggers a bottomup layout.
> >
> > Signed-off-by: Charlie Jenkins <[email protected]>
> > ---
> > tools/testing/selftests/riscv/Makefile | 2 +-
> > tools/testing/selftests/riscv/mm/.gitignore | 2 +
> > tools/testing/selftests/riscv/mm/Makefile | 15 +++++
> > .../riscv/mm/testcases/mmap_bottomup.c | 35 ++++++++++
> > .../riscv/mm/testcases/mmap_default.c | 35 ++++++++++
> > .../selftests/riscv/mm/testcases/mmap_test.h | 64 +++++++++++++++++++
> > .../selftests/riscv/mm/testcases/run_mmap.sh | 12 ++++
> > 7 files changed, 164 insertions(+), 1 deletion(-)
> > create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
> > create mode 100644 tools/testing/selftests/riscv/mm/Makefile
> > create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
> > create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_default.c
> > create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_test.h
> > create mode 100755 tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
> >
> > diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile
> > index f4b3d5c9af5b..4a9ff515a3a0 100644
> > --- a/tools/testing/selftests/riscv/Makefile
> > +++ b/tools/testing/selftests/riscv/Makefile
> > @@ -5,7 +5,7 @@
> > ARCH ?= $(shell uname -m 2>/dev/null || echo not)
> > ifneq (,$(filter $(ARCH),riscv))
> > -RISCV_SUBTARGETS ?= hwprobe vector
> > +RISCV_SUBTARGETS ?= hwprobe vector mm
> > else
> > RISCV_SUBTARGETS :=
> > endif
> > diff --git a/tools/testing/selftests/riscv/mm/.gitignore b/tools/testing/selftests/riscv/mm/.gitignore
> > new file mode 100644
> > index 000000000000..5c2c57cb950c
> > --- /dev/null
> > +++ b/tools/testing/selftests/riscv/mm/.gitignore
> > @@ -0,0 +1,2 @@
> > +mmap_bottomup
> > +mmap_default
> > diff --git a/tools/testing/selftests/riscv/mm/Makefile b/tools/testing/selftests/riscv/mm/Makefile
> > new file mode 100644
> > index 000000000000..11e0f0568923
> > --- /dev/null
> > +++ b/tools/testing/selftests/riscv/mm/Makefile
> > @@ -0,0 +1,15 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (C) 2021 ARM Limited
> > +# Originally tools/testing/arm64/abi/Makefile
> > +
> > +# Additional include paths needed by kselftest.h and local headers
> > +CFLAGS += -D_GNU_SOURCE -std=gnu99 -I.
> > +
> > +TEST_GEN_FILES := testcases/mmap_default testcases/mmap_bottomup
> > +
> > +TEST_PROGS := testcases/run_mmap.sh
> > +
> > +include ../../lib.mk
> > +
> > +$(OUTPUT)/mm: testcases/mmap_default.c testcases/mmap_bottomup.c testcases/mmap_tests.h
> > + $(CC) -o$@ $(CFLAGS) $(LDFLAGS) $^
> > diff --git a/tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c b/tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
> > new file mode 100644
> > index 000000000000..b29379f7e478
> > --- /dev/null
> > +++ b/tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
> > @@ -0,0 +1,35 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +#include <sys/mman.h>
> > +#include <testcases/mmap_test.h>
> > +
> > +#include "../../kselftest_harness.h"
> > +
> > +TEST(infinite_rlimit)
> > +{
> > +// Only works on 64 bit
> > +#if __riscv_xlen == 64
> > + struct addresses mmap_addresses;
> > +
> > + EXPECT_EQ(BOTTOM_UP, memory_layout());
> > +
> > + do_mmaps(&mmap_addresses);
> > +
> > + EXPECT_NE(MAP_FAILED, mmap_addresses.no_hint);
> > + EXPECT_NE(MAP_FAILED, mmap_addresses.on_37_addr);
> > + EXPECT_NE(MAP_FAILED, mmap_addresses.on_38_addr);
> > + EXPECT_NE(MAP_FAILED, mmap_addresses.on_46_addr);
> > + EXPECT_NE(MAP_FAILED, mmap_addresses.on_47_addr);
> > + EXPECT_NE(MAP_FAILED, mmap_addresses.on_55_addr);
> > + EXPECT_NE(MAP_FAILED, mmap_addresses.on_56_addr);
> > +
> > + EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.no_hint);
>
>
> This test ^ will only work on sv48+ systems, if launched on a sv39 system,
> it will fail
>
I may be missing something, but why would this test fail? It may not be
particularily useful since the addresses will always be in the sv39
address space, but it shouldn't fail. I have tested in QEMU and it
passes. This tests if 1UL << 47 is greater than the returned mmap address
which will always be true.
>
>
> > + EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_37_addr);
> > + EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_38_addr);
> > + EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_46_addr);
> > + EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_47_addr);
> > + EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_55_addr);
> > + EXPECT_GT(1UL << 56, (unsigned long)mmap_addresses.on_56_addr);
> > +#endif
> > +}
> > +
> > +TEST_HARNESS_MAIN
> > diff --git a/tools/testing/selftests/riscv/mm/testcases/mmap_default.c b/tools/testing/selftests/riscv/mm/testcases/mmap_default.c
> > new file mode 100644
> > index 000000000000..d1accb91b726
> > --- /dev/null
> > +++ b/tools/testing/selftests/riscv/mm/testcases/mmap_default.c
> > @@ -0,0 +1,35 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +#include <sys/mman.h>
> > +#include <testcases/mmap_test.h>
> > +
> > +#include "../../kselftest_harness.h"
> > +
> > +TEST(default_rlimit)
> > +{
> > +// Only works on 64 bit
> > +#if __riscv_xlen == 64
> > + struct addresses mmap_addresses;
> > +
> > + EXPECT_EQ(TOP_DOWN, memory_layout());
> > +
> > + do_mmaps(&mmap_addresses);
> > +
> > + EXPECT_NE(MAP_FAILED, mmap_addresses.no_hint);
> > + EXPECT_NE(MAP_FAILED, mmap_addresses.on_37_addr);
> > + EXPECT_NE(MAP_FAILED, mmap_addresses.on_38_addr);
> > + EXPECT_NE(MAP_FAILED, mmap_addresses.on_46_addr);
> > + EXPECT_NE(MAP_FAILED, mmap_addresses.on_47_addr);
> > + EXPECT_NE(MAP_FAILED, mmap_addresses.on_55_addr);
> > + EXPECT_NE(MAP_FAILED, mmap_addresses.on_56_addr);
> > +
> > + EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.no_hint);
> > + EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_37_addr);
> > + EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_38_addr);
> > + EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_46_addr);
> > + EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_47_addr);
> > + EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_55_addr);
> > + EXPECT_GT(1UL << 56, (unsigned long)mmap_addresses.on_56_addr);
> > +#endif
> > +}
> > +
> > +TEST_HARNESS_MAIN
> > diff --git a/tools/testing/selftests/riscv/mm/testcases/mmap_test.h b/tools/testing/selftests/riscv/mm/testcases/mmap_test.h
> > new file mode 100644
> > index 000000000000..98a892de5d19
> > --- /dev/null
> > +++ b/tools/testing/selftests/riscv/mm/testcases/mmap_test.h
> > @@ -0,0 +1,64 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +#ifndef _TESTCASES_MMAP_TEST_H
> > +#define _TESTCASES_MMAP_TEST_H
> > +#include <sys/mman.h>
> > +#include <sys/resource.h>
> > +#include <stddef.h>
> > +
> > +#define TOP_DOWN 0
> > +#define BOTTOM_UP 1
> > +
> > +struct addresses {
> > + int *no_hint;
> > + int *on_37_addr;
> > + int *on_38_addr;
> > + int *on_46_addr;
> > + int *on_47_addr;
> > + int *on_55_addr;
> > + int *on_56_addr;
> > +};
> > +
> > +void do_mmaps(struct addresses *mmap_addresses)
>
>
> I would static inline this function definition since you are in a header
> file.
>
>
> > +{
> > + /*
> > + * Place all of the hint addresses on the boundaries of mmap
> > + * sv39, sv48, sv57
> > + * User addresses end at 1<<38, 1<<47, 1<<56 respectively
> > + */
> > + void *on_37_bits = (void *)(1UL << 37);
> > + void *on_38_bits = (void *)(1UL << 38);
> > + void *on_46_bits = (void *)(1UL << 46);
> > + void *on_47_bits = (void *)(1UL << 47);
> > + void *on_55_bits = (void *)(1UL << 55);
> > + void *on_56_bits = (void *)(1UL << 56);
> > +
> > + int prot = PROT_READ | PROT_WRITE;
> > + int flags = MAP_PRIVATE | MAP_ANONYMOUS;
> > +
> > + mmap_addresses->no_hint =
> > + mmap(NULL, 5 * sizeof(int), prot, flags, 0, 0);
> > + mmap_addresses->on_37_addr =
> > + mmap(on_37_bits, 5 * sizeof(int), prot, flags, 0, 0);
> > + mmap_addresses->on_38_addr =
> > + mmap(on_38_bits, 5 * sizeof(int), prot, flags, 0, 0);
> > + mmap_addresses->on_46_addr =
> > + mmap(on_46_bits, 5 * sizeof(int), prot, flags, 0, 0);
> > + mmap_addresses->on_47_addr =
> > + mmap(on_47_bits, 5 * sizeof(int), prot, flags, 0, 0);
> > + mmap_addresses->on_55_addr =
> > + mmap(on_55_bits, 5 * sizeof(int), prot, flags, 0, 0);
> > + mmap_addresses->on_56_addr =
> > + mmap(on_56_bits, 5 * sizeof(int), prot, flags, 0, 0);
> > +}
> > +
> > +int memory_layout(void)
> > +{
> > + int prot = PROT_READ | PROT_WRITE;
> > + int flags = MAP_PRIVATE | MAP_ANONYMOUS;
> > +
> > + void *value1 = mmap(NULL, sizeof(int), prot, flags, 0, 0);
> > + void *value2 = mmap(NULL, sizeof(int), prot, flags, 0, 0);
> > +
> > + return value2 > value1;
> > +}
> > +#endif /* _TESTCASES_MMAP_TEST_H */
> > diff --git a/tools/testing/selftests/riscv/mm/testcases/run_mmap.sh b/tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
> > new file mode 100755
> > index 000000000000..ca5ad7c48bad
> > --- /dev/null
> > +++ b/tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
> > @@ -0,0 +1,12 @@
> > +#!/bin/sh
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +original_stack_limit=$(ulimit -s)
> > +
> > +./mmap_default
> > +
> > +# Force mmap_bottomup to be ran with bottomup memory due to
> > +# the unlimited stack
> > +ulimit -s unlimited
> > +./mmap_bottomup
> > +ulimit -s $original_stack_limit

2023-08-07 23:07:54

by Charlie Jenkins

[permalink] [raw]
Subject: Re: [PATCH v8 1/4] RISC-V: mm: Restrict address space for sv39,sv48,sv57

On Sun, Aug 06, 2023 at 11:53:51AM +0200, Alexandre Ghiti wrote:
>
> On 27/07/2023 23:26, Charlie Jenkins wrote:
> > Make sv48 the default address space for mmap as some applications
> > currently depend on this assumption. A hint address passed to mmap will
> > cause the largest address space that fits entirely into the hint to be
> > used. If the hint is less than or equal to 1<<38, an sv39 address will
> > be used. An exception is that if the hint address is 0, then a sv48
> > address will be used. After an address space is completely full, the next
> > smallest address space will be used.
> >
> > Signed-off-by: Charlie Jenkins <[email protected]>
> > ---
> > arch/riscv/include/asm/elf.h | 2 +-
> > arch/riscv/include/asm/pgtable.h | 20 +++++++++++-
> > arch/riscv/include/asm/processor.h | 52 ++++++++++++++++++++++++++----
> > 3 files changed, 66 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
> > index c24280774caf..5d3368d5585c 100644
> > --- a/arch/riscv/include/asm/elf.h
> > +++ b/arch/riscv/include/asm/elf.h
> > @@ -49,7 +49,7 @@ extern bool compat_elf_check_arch(Elf32_Ehdr *hdr);
> > * the loader. We need to make sure that it is out of the way of the program
> > * that it will "exec", and that there is sufficient room for the brk.
> > */
> > -#define ELF_ET_DYN_BASE ((TASK_SIZE / 3) * 2)
> > +#define ELF_ET_DYN_BASE ((DEFAULT_MAP_WINDOW / 3) * 2)
> > #ifdef CONFIG_64BIT
> > #ifdef CONFIG_COMPAT
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index 75970ee2bda2..c76a1ef094a4 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -63,8 +63,26 @@
> > * position vmemmap directly below the VMALLOC region.
> > */
> > #ifdef CONFIG_64BIT
> > +#define VA_BITS_SV39 39
> > +#define VA_BITS_SV48 48
> > +#define VA_BITS_SV57 57
> > +
> > +#define VA_USER_SV39 (UL(1) << (VA_BITS_SV39 - 1))
> > +#define VA_USER_SV48 (UL(1) << (VA_BITS_SV48 - 1))
> > +#define VA_USER_SV57 (UL(1) << (VA_BITS_SV57 - 1))
> > +
> > #define VA_BITS (pgtable_l5_enabled ? \
> > - 57 : (pgtable_l4_enabled ? 48 : 39))
> > + VA_BITS_SV57 : (pgtable_l4_enabled ? VA_BITS_SV48 : VA_BITS_SV39))
> > +
> > +#ifdef CONFIG_COMPAT
> > +#define MMAP_VA_BITS_64 ((VA_BITS >= VA_BITS_SV48) ? VA_BITS_SV48 : VA_BITS)
> > +#define MMAP_MIN_VA_BITS_64 ((VA_BITS >= VA_BITS_SV39) ? VA_BITS_SV39 : VA_BITS)
>
>
> Here the condition is always true right?
>
Yes, that condition is always true, I can eliminate the conditional.
>
> > +#define MMAP_VA_BITS (test_thread_flag(TIF_32BIT) ? 32 : MMAP_VA_BITS_64)
> > +#define MMAP_MIN_VA_BITS (test_thread_flag(TIF_32BIT) ? 32 : MMAP_MIN_VA_BITS_64)
>
>
> I think you should use is_compat_task() here instead of
> test_thread_flag(TIF_32BIT). And what about introducing VA_BITS_SV32 instead
> of hardcoding 32?
>
Sounds good.
>
> > +#else
> > +#define MMAP_VA_BITS ((VA_BITS >= VA_BITS_SV48) ? VA_BITS_SV48 : VA_BITS)
> > +#define MMAP_MIN_VA_BITS ((VA_BITS >= VA_BITS_SV39) ? VA_BITS_SV39 : VA_BITS)
>
>
> Ditto here.
>
>
> > +#endif
> > #else
> > #define VA_BITS 32
> > #endif
> > diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> > index c950a8d9edef..e810244ea951 100644
> > --- a/arch/riscv/include/asm/processor.h
> > +++ b/arch/riscv/include/asm/processor.h
> > @@ -13,19 +13,59 @@
> > #include <asm/ptrace.h>
> > +#ifdef CONFIG_64BIT
> > +#define DEFAULT_MAP_WINDOW (UL(1) << (MMAP_VA_BITS - 1))
> > +#define STACK_TOP_MAX TASK_SIZE_64
> > +
> > +#define arch_get_mmap_end(addr, len, flags) \
> > +({ \
> > + unsigned long mmap_end; \
> > + typeof(addr) _addr = (addr); \
> > + if ((_addr) == 0 || (IS_ENABLED(CONFIG_COMPAT) && test_thread_flag(TIF_32BIT))) \
> > + mmap_end = DEFAULT_MAP_WINDOW; \
>
>
> Wouldn't that prevent a sv57 system to allocate sv57 addresses when sv48 is
> full unless explicitly asked?
>
Yes that is a good point, that should be STACK_TOP_MAX as well.
>
> > + else if ((_addr) >= VA_USER_SV57) \
> > + mmap_end = STACK_TOP_MAX; \
> > + else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) \
> > + mmap_end = VA_USER_SV48; \
> > + else \
> > + mmap_end = VA_USER_SV39; \
> > + mmap_end; \
> > +})
> > +
> > +#define arch_get_mmap_base(addr, base) \
> > +({ \
> > + unsigned long mmap_base; \
> > + typeof(addr) _addr = (addr); \
> > + typeof(base) _base = (base); \
> > + unsigned long rnd_gap = (_base) - DEFAULT_MAP_WINDOW; \
> > + if ((_addr) == 0 || (IS_ENABLED(CONFIG_COMPAT) && test_thread_flag(TIF_32BIT))) \
> > + mmap_base = (_base); \
> > + else if (((_addr) >= VA_USER_SV57) && (VA_BITS >= VA_BITS_SV57)) \
> > + mmap_base = VA_USER_SV57 + rnd_gap; \
>
>
> Shouldn't it be mmap_base = VA_USER_SV57 - rnd_gap?
>
No, rnd_gap is a negative number here. 'base' is equal to
DEFAULT_MAP_WINDOW - gap - rnd. It does seem more clear if it is a
positive number so I will set rnd_gap to DEFAULT_MAP_WINDOW - (_base).
>
> > + else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) \
> > + mmap_base = VA_USER_SV48 + rnd_gap; \
> > + else \
> > + mmap_base = VA_USER_SV39 + rnd_gap; \
> > + mmap_base; \
> > +})
> > +
> > +#else
> > +#define DEFAULT_MAP_WINDOW TASK_SIZE
> > +#define STACK_TOP_MAX TASK_SIZE
> > +#endif
> > +#define STACK_ALIGN 16
> > +
> > +#define STACK_TOP DEFAULT_MAP_WINDOW
> > +
> > /*
> > * This decides where the kernel will search for a free chunk of vm
> > * space during mmap's.
> > */
> > -#define TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE / 3)
> > -
> > -#define STACK_TOP TASK_SIZE
> > #ifdef CONFIG_64BIT
> > -#define STACK_TOP_MAX TASK_SIZE_64
> > +#define TASK_UNMAPPED_BASE PAGE_ALIGN((UL(1) << MMAP_MIN_VA_BITS) / 3)
> > #else
> > -#define STACK_TOP_MAX TASK_SIZE
> > +#define TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE / 3)
> > #endif
> > -#define STACK_ALIGN 16
> > #ifndef __ASSEMBLY__