2024-01-14 19:56:50

by Yangyu Chen

[permalink] [raw]
Subject: [PATCH 0/3] RISC-V: mm: correct mmap behavior in sv48 address space

Previous patch series [1] violates the principle of mmap syscall as it uses
hint address as the largest address space to use rather than where to
create the mapping, thus broke the possibility to mmap in sv48, sv57
address space without a MAP_FIXED flag. This patchset corrects the behavior
of mmap syscall and use the behavior of x86 5-stage-paging as a reference.

I first noticed this issue when I was trying to run box64 on a sv48 system
with commit previous than [2]. Then I reported this through private
communication, then a box64 contributor did some investigation and found
that trying to mmap in sv48 address space without MAP_FIXED flag will
always return a random address in sv39. I review the changelog with some
tests on qemu and found this issue was introduced from [1]. After reviewing
the code, tests and docs, I think the original author might misunderstand
the meaning of hint address in mmap syscall. Then I did some investigation
on other ISAs like x86 which has 5-stage-paging and found that it has
addressed the same issue if some userspace software assumes the pointer
size should smaller than 47 bits and also solved in kernel by limiting the
mmap in maximum 47 bits address space by default.

Finally I correct the behavior of mmap syscall as x86 5-stage-paging does,
and migreate the documentation from x86-64 kernel to riscv kernel.


[1]. https://lore.kernel.org/linux-riscv/[email protected]/
[2]. https://github.com/ptitSeb/box64/commit/5b700cb6e6f397d2074c49659f7f9915f4a33c5f

Yangyu Chen (3):
RISC-V: mm: fix mmap behavior in sv48 address space
RISC-V: mm: only test mmap without hint
Documentation: riscv: correct sv57 kernel behavior

Documentation/arch/riscv/vm-layout.rst | 48 +++++++++++--------
arch/riscv/include/asm/processor.h | 39 ++++-----------
.../selftests/riscv/mm/mmap_bottomup.c | 12 -----
.../testing/selftests/riscv/mm/mmap_default.c | 12 -----
tools/testing/selftests/riscv/mm/mmap_test.h | 30 ------------
5 files changed, 36 insertions(+), 105 deletions(-)

--
2.43.0



2024-01-14 19:59:46

by Yangyu Chen

[permalink] [raw]
Subject: [PATCH 3/3] Documentation: riscv: correct sv57 kernel behavior

The original documentation from a patch violates the principle of mmap.
Since the kernel behavior has been corrected from the previous patch, this
documentation should also be updated. This patch migrated the
5-level-paging documentation from x86_64 with minor modifications to align
with the current kernel's behavior on RISC-V.

Signed-off-by: Yangyu Chen <[email protected]>
---
Documentation/arch/riscv/vm-layout.rst | 48 +++++++++++++++-----------
1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/Documentation/arch/riscv/vm-layout.rst b/Documentation/arch/riscv/vm-layout.rst
index 69ff6da1dbf8..30e879dad6a2 100644
--- a/Documentation/arch/riscv/vm-layout.rst
+++ b/Documentation/arch/riscv/vm-layout.rst
@@ -135,23 +135,31 @@ RISC-V Linux Kernel SV57
__________________|____________|__________________|_________|____________________________________________________________


-Userspace VAs
---------------------
-To maintain compatibility with software that relies on the VA space with a
-maximum of 48 bits the kernel will, by default, return virtual addresses to
-userspace from a 48-bit range (sv48). This default behavior is achieved by
-passing 0 into the hint address parameter of mmap. On CPUs with an address space
-smaller than sv48, the CPU maximum supported address space will be the default.
-
-Software can "opt-in" to receiving VAs from another VA space by providing
-a hint address to mmap. A hint address passed to mmap will cause the largest
-address space that fits entirely into the hint to be used, unless there is no
-space left in the address space. If there is no space available in the requested
-address space, an address in the next smallest available address space will be
-returned.
-
-For example, in order to obtain 48-bit VA space, a hint address greater than
-:code:`1 << 47` must be provided. Note that this is 47 due to sv48 userspace
-ending at :code:`1 << 47` and the addresses beyond this are reserved for the
-kernel. Similarly, to obtain 57-bit VA space addresses, a hint address greater
-than or equal to :code:`1 << 56` must be provided.
+User-space and large virtual address space
+==========================================
+On RISC-V, Sv57 paging enables 56-bit userspace virtual address space.
+Not all user space is ready to handle wide addresses. It's known that
+at least some JIT compilers use higher bits in pointers to encode their
+information. It collides with valid pointers with Sv57 paging and
+leads to crashes.
+
+To mitigate this, we are not going to allocate virtual address space
+above 47-bit by default.
+
+But userspace can ask for allocation from full address space by
+specifying hint address (with or without MAP_FIXED) above 47-bits.
+
+If hint address set above 47-bit, but MAP_FIXED is not specified, we try
+to look for unmapped area by specified address. If it's already occupied,
+this mmap will fail.
+
+A high hint address would only affect the allocation in question, but not
+any future mmap()s.
+
+Specifying high hint address without MAP_FIXED on older kernel or on
+machine without Sv57 paging support is safe. The hint will be ignored and
+kernel will fall back to allocation from the supported address space.
+
+This approach helps to easily make application's memory allocator aware
+about large address space without manually tracking allocated virtual
+address space.
--
2.43.0


2024-01-14 20:06:03

by Yangyu Chen

[permalink] [raw]
Subject: [PATCH 2/3] RISC-V: mm: only test mmap without hint

The original test from a patch violates the principle of mmap which uses
mmap hint address as the largest address space to use rather than where to
create the mapping. The address space in this context is either sv39, sv48,
sv57.

After fixing the correct behavior, only keeping the test mmap without a
hint is enough.

Signed-off-by: Yangyu Chen <[email protected]>
---
.../selftests/riscv/mm/mmap_bottomup.c | 12 --------
.../testing/selftests/riscv/mm/mmap_default.c | 12 --------
tools/testing/selftests/riscv/mm/mmap_test.h | 30 -------------------
3 files changed, 54 deletions(-)

diff --git a/tools/testing/selftests/riscv/mm/mmap_bottomup.c b/tools/testing/selftests/riscv/mm/mmap_bottomup.c
index 1757d19ca89b..1ba703d3f552 100644
--- a/tools/testing/selftests/riscv/mm/mmap_bottomup.c
+++ b/tools/testing/selftests/riscv/mm/mmap_bottomup.c
@@ -15,20 +15,8 @@ TEST(infinite_rlimit)
do_mmaps(&mmap_addresses);

EXPECT_NE(MAP_FAILED, mmap_addresses.no_hint);
- EXPECT_NE(MAP_FAILED, mmap_addresses.on_37_addr);
- EXPECT_NE(MAP_FAILED, mmap_addresses.on_38_addr);
- EXPECT_NE(MAP_FAILED, mmap_addresses.on_46_addr);
- EXPECT_NE(MAP_FAILED, mmap_addresses.on_47_addr);
- EXPECT_NE(MAP_FAILED, mmap_addresses.on_55_addr);
- EXPECT_NE(MAP_FAILED, mmap_addresses.on_56_addr);

EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.no_hint);
- EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_37_addr);
- EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_38_addr);
- EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_46_addr);
- EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_47_addr);
- EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_55_addr);
- EXPECT_GT(1UL << 56, (unsigned long)mmap_addresses.on_56_addr);
#endif
}

diff --git a/tools/testing/selftests/riscv/mm/mmap_default.c b/tools/testing/selftests/riscv/mm/mmap_default.c
index c63c60b9397e..f1ac860dcf04 100644
--- a/tools/testing/selftests/riscv/mm/mmap_default.c
+++ b/tools/testing/selftests/riscv/mm/mmap_default.c
@@ -15,20 +15,8 @@ TEST(default_rlimit)
do_mmaps(&mmap_addresses);

EXPECT_NE(MAP_FAILED, mmap_addresses.no_hint);
- EXPECT_NE(MAP_FAILED, mmap_addresses.on_37_addr);
- EXPECT_NE(MAP_FAILED, mmap_addresses.on_38_addr);
- EXPECT_NE(MAP_FAILED, mmap_addresses.on_46_addr);
- EXPECT_NE(MAP_FAILED, mmap_addresses.on_47_addr);
- EXPECT_NE(MAP_FAILED, mmap_addresses.on_55_addr);
- EXPECT_NE(MAP_FAILED, mmap_addresses.on_56_addr);

EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.no_hint);
- EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_37_addr);
- EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_38_addr);
- EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_46_addr);
- EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_47_addr);
- EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_55_addr);
- EXPECT_GT(1UL << 56, (unsigned long)mmap_addresses.on_56_addr);
#endif
}

diff --git a/tools/testing/selftests/riscv/mm/mmap_test.h b/tools/testing/selftests/riscv/mm/mmap_test.h
index 2e0db9c5be6c..d2271426288f 100644
--- a/tools/testing/selftests/riscv/mm/mmap_test.h
+++ b/tools/testing/selftests/riscv/mm/mmap_test.h
@@ -10,47 +10,17 @@

struct addresses {
int *no_hint;
- int *on_37_addr;
- int *on_38_addr;
- int *on_46_addr;
- int *on_47_addr;
- int *on_55_addr;
- int *on_56_addr;
};

// Only works on 64 bit
#if __riscv_xlen == 64
static inline void do_mmaps(struct addresses *mmap_addresses)
{
- /*
- * Place all of the hint addresses on the boundaries of mmap
- * sv39, sv48, sv57
- * User addresses end at 1<<38, 1<<47, 1<<56 respectively
- */
- void *on_37_bits = (void *)(1UL << 37);
- void *on_38_bits = (void *)(1UL << 38);
- void *on_46_bits = (void *)(1UL << 46);
- void *on_47_bits = (void *)(1UL << 47);
- void *on_55_bits = (void *)(1UL << 55);
- void *on_56_bits = (void *)(1UL << 56);
-
int prot = PROT_READ | PROT_WRITE;
int flags = MAP_PRIVATE | MAP_ANONYMOUS;

mmap_addresses->no_hint =
mmap(NULL, 5 * sizeof(int), prot, flags, 0, 0);
- mmap_addresses->on_37_addr =
- mmap(on_37_bits, 5 * sizeof(int), prot, flags, 0, 0);
- mmap_addresses->on_38_addr =
- mmap(on_38_bits, 5 * sizeof(int), prot, flags, 0, 0);
- mmap_addresses->on_46_addr =
- mmap(on_46_bits, 5 * sizeof(int), prot, flags, 0, 0);
- mmap_addresses->on_47_addr =
- mmap(on_47_bits, 5 * sizeof(int), prot, flags, 0, 0);
- mmap_addresses->on_55_addr =
- mmap(on_55_bits, 5 * sizeof(int), prot, flags, 0, 0);
- mmap_addresses->on_56_addr =
- mmap(on_56_bits, 5 * sizeof(int), prot, flags, 0, 0);
}
#endif /* __riscv_xlen == 64 */

--
2.43.0