2023-05-31 22:35:58

by Joel Fernandes

[permalink] [raw]
Subject: [PATCH v4 0/7] Optimize mremap during mutual alignment within PMD

Hello!

Here is v4 of the mremap start address optimization / fix for exec warning. It
took me a while to write a test that catches the issue me/Linus discussed in
the last version. And I verified kernel crashes without the check. See below.

The main changes in this series is:
Care to be taken to move purely within a VMA, in other words this check
in call_align_down():
if (vma->vm_start != addr_masked)
return false;

As an example of why this is needed:
Consider the following range which is 2MB aligned and is
a part of a larger 10MB range which is not shown. Each
character is 256KB below making the source and destination
2MB each. The lower case letters are moved (s to d) and the
upper case letters are not moved.

|DDDDddddSSSSssss|

If we align down 'ssss' to start from the 'SSSS', we will end up destroying
SSSS. The above if statement prevents that and I verified it.

I also added a test for this in the last patch.

History of patches
==================
v3->v4:
1. Make sure to check address to align is beginning of VMA
2. Add test to check this (test fails with a kernel crash if we don't do this).

v2->v3:
1. Masked address was stored in int, fixed it to unsigned long to avoid truncation.
2. We now handle moves happening purely within a VMA, a new test is added to handle this.
3. More code comments.

v1->v2:
1. Trigger the optimization for mremaps smaller than a PMD. I tested by tracing
that it works correctly.

2. Fix issue with bogus return value found by Linus if we broke out of the
above loop for the first PMD itself.

v1: Initial RFC.

Description of patches
======================
These patches optimizes the start addresses in move_page_tables() and tests the
changes. It addresses a warning [1] that occurs due to a downward, overlapping
move on a mutually-aligned offset within a PMD during exec. By initiating the
copy process at the PMD level when such alignment is present, we can prevent
this warning and speed up the copying process at the same time. Linus Torvalds
suggested this idea.

Please check the individual patches for more details.

thanks,

- Joel

[1] https://lore.kernel.org/all/ZB2GTBD%[email protected]/

Joel Fernandes (Google) (7):
mm/mremap: Optimize the start addresses in move_page_tables()
mm/mremap: Allow moves within the same VMA for stack
selftests: mm: Fix failure case when new remap region was not found
selftests: mm: Add a test for mutually aligned moves > PMD size
selftests: mm: Add a test for remapping to area immediately after
existing mapping
selftests: mm: Add a test for remapping within a range
selftests: mm: Add a test for moving from an offset from start of
mapping

fs/exec.c | 2 +-
include/linux/mm.h | 2 +-
mm/mremap.c | 63 ++++-
tools/testing/selftests/mm/mremap_test.c | 301 +++++++++++++++++++----
4 files changed, 319 insertions(+), 49 deletions(-)

--
2.41.0.rc2.161.g9c6817b8e7-goog



2023-05-31 22:36:07

by Joel Fernandes

[permalink] [raw]
Subject: [PATCH v4 7/7] selftests: mm: Add a test for moving from an offset from start of mapping

It is possible that the aligned address falls on no existing mapping,
however that does not mean that we can just align it down to that.
This test verifies that the "vma->vm_start != addr_to_align" check in
can_align_down() prevents disastrous results if aligning down when
source and dest are mutually aligned within a PMD but the source/dest
addresses requested are not at the beginning of the respective mapping
containing these addresses.

Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
tools/testing/selftests/mm/mremap_test.c | 189 ++++++++++++++++-------
1 file changed, 134 insertions(+), 55 deletions(-)

diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index f45d1abedc9c..c71ac8306c89 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++ b/tools/testing/selftests/mm/mremap_test.c
@@ -24,6 +24,7 @@

#define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
#define SIZE_MB(m) ((size_t)m * (1024 * 1024))
+#define SIZE_KB(k) ((size_t)k * 1024)

struct config {
unsigned long long src_alignment;
@@ -148,6 +149,60 @@ static bool is_range_mapped(FILE *maps_fp, void *start, void *end)
return success;
}

+/*
+ * Returns the start address of the mapping on success, else returns
+ * NULL on failure.
+ */
+static void *get_source_mapping(struct config c)
+{
+ unsigned long long addr = 0ULL;
+ void *src_addr = NULL;
+ unsigned long long mmap_min_addr;
+
+ mmap_min_addr = get_mmap_min_addr();
+ /*
+ * For some tests, we need to not have any mappings below the
+ * source mapping. Add some headroom to mmap_min_addr for this.
+ */
+ mmap_min_addr += 10 * _4MB;
+
+retry:
+ addr += c.src_alignment;
+ if (addr < mmap_min_addr)
+ goto retry;
+
+ src_addr = mmap((void *) addr, c.region_size, PROT_READ | PROT_WRITE,
+ MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED,
+ -1, 0);
+ if (src_addr == MAP_FAILED) {
+ if (errno == EPERM || errno == EEXIST)
+ goto retry;
+ goto error;
+ }
+ /*
+ * Check that the address is aligned to the specified alignment.
+ * Addresses which have alignments that are multiples of that
+ * specified are not considered valid. For instance, 1GB address is
+ * 2MB-aligned, however it will not be considered valid for a
+ * requested alignment of 2MB. This is done to reduce coincidental
+ * alignment in the tests.
+ */
+ if (((unsigned long long) src_addr & (c.src_alignment - 1)) ||
+ !((unsigned long long) src_addr & c.src_alignment)) {
+ munmap(src_addr, c.region_size);
+ goto retry;
+ }
+
+ if (!src_addr)
+ goto error;
+
+ return src_addr;
+error:
+ ksft_print_msg("Failed to map source region: %s\n",
+ strerror(errno));
+ return NULL;
+}
+
/*
* This test validates that merge is called when expanding a mapping.
* Mapping containing three pages is created, middle page is unmapped
@@ -300,60 +355,6 @@ static void mremap_move_within_range(char pattern_seed)
ksft_test_result_fail("%s\n", test_name);
}

-/*
- * Returns the start address of the mapping on success, else returns
- * NULL on failure.
- */
-static void *get_source_mapping(struct config c)
-{
- unsigned long long addr = 0ULL;
- void *src_addr = NULL;
- unsigned long long mmap_min_addr;
-
- mmap_min_addr = get_mmap_min_addr();
- /*
- * For some tests, we need to not have any mappings below the
- * source mapping. Add some headroom to mmap_min_addr for this.
- */
- mmap_min_addr += 10 * _4MB;
-
-retry:
- addr += c.src_alignment;
- if (addr < mmap_min_addr)
- goto retry;
-
- src_addr = mmap((void *) addr, c.region_size, PROT_READ | PROT_WRITE,
- MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED,
- -1, 0);
- if (src_addr == MAP_FAILED) {
- if (errno == EPERM || errno == EEXIST)
- goto retry;
- goto error;
- }
- /*
- * Check that the address is aligned to the specified alignment.
- * Addresses which have alignments that are multiples of that
- * specified are not considered valid. For instance, 1GB address is
- * 2MB-aligned, however it will not be considered valid for a
- * requested alignment of 2MB. This is done to reduce coincidental
- * alignment in the tests.
- */
- if (((unsigned long long) src_addr & (c.src_alignment - 1)) ||
- !((unsigned long long) src_addr & c.src_alignment)) {
- munmap(src_addr, c.region_size);
- goto retry;
- }
-
- if (!src_addr)
- goto error;
-
- return src_addr;
-error:
- ksft_print_msg("Failed to map source region: %s\n",
- strerror(errno));
- return NULL;
-}
-
/* Returns the time taken for the remap on success else returns -1. */
static long long remap_region(struct config c, unsigned int threshold_mb,
char pattern_seed)
@@ -487,6 +488,83 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
return ret;
}

+/*
+ * Verify that an mremap aligning down does not destroy
+ * the beginning of the mapping just because the aligned
+ * down address landed on a mapping that maybe does not exist.
+ */
+static void mremap_move_1mb_from_start(char pattern_seed)
+{
+ char *test_name = "mremap move 1mb from start at 2MB+256KB aligned src";
+ void *src = NULL, *dest = NULL;
+ int i, success = 1;
+
+ /* Config to reuse get_source_mapping() to do an aligned mmap. */
+ struct config c = {
+ .src_alignment = SIZE_MB(1) + SIZE_KB(256),
+ .region_size = SIZE_MB(6)
+ };
+
+ src = get_source_mapping(c);
+ if (!src) {
+ success = 0;
+ goto out;
+ }
+
+ c.src_alignment = SIZE_MB(1) + SIZE_KB(256);
+ dest = get_source_mapping(c);
+ if (!dest) {
+ success = 0;
+ goto out;
+ }
+
+ /* Set byte pattern for source block. */
+ srand(pattern_seed);
+ for (i = 0; i < SIZE_MB(2); i++) {
+ ((char *)src)[i] = (char) rand();
+ }
+
+ /*
+ * Unmap the beginning of dest so that the aligned address
+ * falls on no mapping.
+ */
+ munmap(dest, SIZE_MB(1));
+
+ void *new_ptr = mremap(src + SIZE_MB(1), SIZE_MB(1), SIZE_MB(1),
+ MREMAP_MAYMOVE | MREMAP_FIXED, dest + SIZE_MB(1));
+ if (new_ptr == MAP_FAILED) {
+ perror("mremap");
+ success = 0;
+ goto out;
+ }
+
+ /* Verify byte pattern after remapping */
+ srand(pattern_seed);
+ for (i = 0; i < SIZE_MB(1); i++) {
+ char c = (char) rand();
+
+ if (((char *)src)[i] != c) {
+ ksft_print_msg("Data at src at %d got corrupted due to unrelated mremap\n",
+ i);
+ ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
+ ((char *) src)[i] & 0xff);
+ success = 0;
+ }
+ }
+
+out:
+ if (src && munmap(src, c.region_size) == -1)
+ perror("munmap src");
+
+ if (dest && munmap(dest, c.region_size) == -1)
+ perror("munmap dest");
+
+ if (success)
+ ksft_test_result_pass("%s\n", test_name);
+ else
+ ksft_test_result_fail("%s\n", test_name);
+}
+
static void run_mremap_test_case(struct test test_case, int *failures,
unsigned int threshold_mb,
unsigned int pattern_seed)
@@ -565,7 +643,7 @@ int main(int argc, char **argv)
unsigned int threshold_mb = VALIDATION_DEFAULT_THRESHOLD;
unsigned int pattern_seed;
int num_expand_tests = 2;
- int num_misc_tests = 1;
+ int num_misc_tests = 2;
struct test test_cases[MAX_TEST] = {};
struct test perf_test_cases[MAX_PERF_TEST];
int page_size;
@@ -666,6 +744,7 @@ int main(int argc, char **argv)
fclose(maps_fp);

mremap_move_within_range(pattern_seed);
+ mremap_move_1mb_from_start(pattern_seed);

if (run_perf_tests) {
ksft_print_msg("\n%s\n",
--
2.41.0.rc2.161.g9c6817b8e7-goog


2023-06-01 00:00:35

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v4 0/7] Optimize mremap during mutual alignment within PMD

On Wed, May 31, 2023 at 6:08 PM Joel Fernandes (Google)
<[email protected]> wrote:
>
> Here is v4 of the mremap start address optimization / fix for exec warning.

I don't see anything suspicious here.

Not that that probably means much, but the test coverage looks reasonable too.

Linus