2023-03-21 20:46:31

by Lorenzo Stoakes

Subject: [PATCH v2 0/4] further cleanup of vma_merge()

Following on from Vlastimil Babka's patch series "cleanup vma_merge() and
improve mergeability tests" which was in turn based on Liam's prior
cleanups, this patch series introduces changes discussed in review of
Vlastimil's series and goes further in attempting to make the logic as
clear as possible.

Nearly all of this should have absolutely no functional impact; however, it
does add a single VM_WARN_ON() case.

With many thanks to Vernon for helping kick-start the discussion around
simplification - abstract use of vma did indeed turn out not to be
necessary - and to Liam for his excellent suggestions, which greatly
simplified things.

v2:
- Put the patch series on a serious diet, cut comments down to avoid
bloat.
- Added clever use of find_vma_intersection() and vma_lookup() as suggested
by Liam which improved clarity + brevity significantly.
- Eliminated the use of a temporary vma local as suggested by Vernon; it
does seem this was ultimately adding confusion, and Liam's suggestions
eliminated the need for it.
- Moved around initial variables to be more sensible and to initialise each
variable in one place where possible.

v1:
https://lore.kernel.org/all/[email protected]

Lorenzo Stoakes (4):
mm/mmap/vma_merge: further improve prev/next VMA naming
mm/mmap/vma_merge: set next to NULL if not applicable
mm/mmap/vma_merge: extend invariants, avoid invalid res, vma
mm/mmap/vma_merge: init cleanup, be explicit about the non-mergeable
case

mm/mmap.c | 155 ++++++++++++++++++++++++++++--------------------------
1 file changed, 80 insertions(+), 75 deletions(-)

--
2.39.2


2023-03-21 20:46:33

by Lorenzo Stoakes

Subject: [PATCH v2 1/4] mm/mmap/vma_merge: further improve prev/next VMA naming

Previously the ASCII diagram above vma_merge() and the accompanying
variable naming were rather confusing; however, recent efforts by Liam
Howlett and Vlastimil Babka have significantly improved matters.

This patch goes a little further - replacing 'X' with 'N', which feels a
lot more natural, and replacing what was 'N' with 'C', which stands for
'concurrent' VMA.

No single word quite describes a VMA whose start coincides with that of the
input span; 'concurrent', abbreviated to 'curr' (and which can also be
thought of as 'current'), nevertheless fits intuitions well alongside prev
and next.

This has no functional impact.
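To make the renaming concrete, the role each adjacent VMA plays can be
sketched outside the kernel as a classification over plain
[vm_start, vm_end) intervals (a hypothetical illustration only -
classify() and the ROLE_* names are not kernel identifiers):

```c
#include <assert.h>

/* Classify the VMA found after prev against the request span [addr, end),
 * mirroring the renamed diagram: CCCC ('curr') starts exactly at addr,
 * NNNN ('next') begins at or after end. A VMA starting strictly inside
 * the span cannot occur for mprotect (the span never extends beyond one
 * VMA), so it is treated as invalid here. */
enum role { ROLE_INVALID, ROLE_CURR, ROLE_NEXT };

static enum role classify(unsigned long vm_start, unsigned long addr,
			  unsigned long end)
{
	if (vm_start == addr)
		return ROLE_CURR;	/* CCCC: coincident start */
	if (vm_start >= end)
		return ROLE_NEXT;	/* NNNN: follows the span */
	return ROLE_INVALID;
}
```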

Signed-off-by: Lorenzo Stoakes <[email protected]>
---
mm/mmap.c | 86 +++++++++++++++++++++++++++----------------------------
1 file changed, 43 insertions(+), 43 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 042d22e63528..c9834364ac98 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -861,44 +861,44 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
* this area are about to be changed to vm_flags - and the no-change
* case has already been eliminated.
*
- * The following mprotect cases have to be considered, where AAAA is
+ * The following mprotect cases have to be considered, where **** is
* the area passed down from mprotect_fixup, never extending beyond one
- * vma, PPPP is the previous vma, NNNN is a vma that starts at the same
- * address as AAAA and is of the same or larger span, and XXXX the next
- * vma after AAAA:
+ * vma, PPPP is the previous vma, CCCC is a concurrent vma that starts
+ * at the same address as **** and is of the same or larger span, and
+ * NNNN the next vma after ****:
*
- * AAAA AAAA AAAA
- * PPPPPPXXXXXX PPPPPPXXXXXX PPPPPPNNNNNN
+ * **** **** ****
+ * PPPPPPNNNNNN PPPPPPNNNNNN PPPPPPCCCCCC
* cannot merge might become might become
- * PPXXXXXXXXXX PPPPPPPPPPNN
+ * PPNNNNNNNNNN PPPPPPPPPPCC
* mmap, brk or case 4 below case 5 below
* mremap move:
- * AAAA AAAA
- * PPPP XXXX PPPPNNNNXXXX
+ * **** ****
+ * PPPP NNNN PPPPCCCCNNNN
* might become might become
* PPPPPPPPPPPP 1 or PPPPPPPPPPPP 6 or
- * PPPPPPPPXXXX 2 or PPPPPPPPXXXX 7 or
- * PPPPXXXXXXXX 3 PPPPXXXXXXXX 8
+ * PPPPPPPPNNNN 2 or PPPPPPPPNNNN 7 or
+ * PPPPNNNNNNNN 3 PPPPNNNNNNNN 8
*
- * It is important for case 8 that the vma NNNN overlapping the
- * region AAAA is never going to extended over XXXX. Instead XXXX must
- * be extended in region AAAA and NNNN must be removed. This way in
+ * It is important for case 8 that the vma CCCC overlapping the
+ * region **** is never going to be extended over NNNN. Instead NNNN must
+ * be extended in region **** and CCCC must be removed. This way in
* all cases where vma_merge succeeds, the moment vma_merge drops the
* rmap_locks, the properties of the merged vma will be already
* correct for the whole merged range. Some of those properties like
* vm_page_prot/vm_flags may be accessed by rmap_walks and they must
* be correct for the whole merged range immediately after the
- * rmap_locks are released. Otherwise if XXXX would be removed and
- * NNNN would be extended over the XXXX range, remove_migration_ptes
+ * rmap_locks are released. Otherwise if NNNN would be removed and
+ * CCCC would be extended over the NNNN range, remove_migration_ptes
* or other rmap walkers (if working on addresses beyond the "end"
- * parameter) may establish ptes with the wrong permissions of NNNN
- * instead of the right permissions of XXXX.
+ * parameter) may establish ptes with the wrong permissions of CCCC
+ * instead of the right permissions of NNNN.
*
* In the code below:
* PPPP is represented by *prev
- * NNNN is represented by *mid or not represented at all (NULL)
- * XXXX is represented by *next or not represented at all (NULL)
- * AAAA is not represented - it will be merged and the vma containing the
+ * CCCC is represented by *curr or not represented at all (NULL)
+ * NNNN is represented by *next or not represented at all (NULL)
+ * **** is not represented - it will be merged and the vma containing the
* area is returned, or the function will return NULL
*/
struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
@@ -911,7 +911,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
{
pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
pgoff_t vma_pgoff;
- struct vm_area_struct *mid, *next, *res = NULL;
+ struct vm_area_struct *curr, *next, *res = NULL;
struct vm_area_struct *vma, *adjust, *remove, *remove2;
int err = -1;
bool merge_prev = false;
@@ -930,19 +930,19 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
if (vm_flags & VM_SPECIAL)
return NULL;

- mid = find_vma(mm, prev ? prev->vm_end : 0);
- if (mid && mid->vm_end == end) /* cases 6, 7, 8 */
- next = find_vma(mm, mid->vm_end);
+ curr = find_vma(mm, prev ? prev->vm_end : 0);
+ if (curr && curr->vm_end == end) /* cases 6, 7, 8 */
+ next = find_vma(mm, curr->vm_end);
else
- next = mid;
+ next = curr;

- /* In cases 1 - 4 there's no NNNN vma */
- if (mid && end <= mid->vm_start)
- mid = NULL;
+ /* In cases 1 - 4 there's no CCCC vma */
+ if (curr && end <= curr->vm_start)
+ curr = NULL;

/* verify some invariant that must be enforced by the caller */
VM_WARN_ON(prev && addr <= prev->vm_start);
- VM_WARN_ON(mid && end > mid->vm_end);
+ VM_WARN_ON(curr && end > curr->vm_end);
VM_WARN_ON(addr >= end);

if (prev) {
@@ -974,21 +974,21 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
remove = next; /* case 1 */
vma_end = next->vm_end;
err = dup_anon_vma(prev, next);
- if (mid) { /* case 6 */
- remove = mid;
+ if (curr) { /* case 6 */
+ remove = curr;
remove2 = next;
if (!next->anon_vma)
- err = dup_anon_vma(prev, mid);
+ err = dup_anon_vma(prev, curr);
}
} else if (merge_prev) {
err = 0; /* case 2 */
- if (mid) {
- err = dup_anon_vma(prev, mid);
- if (end == mid->vm_end) { /* case 7 */
- remove = mid;
+ if (curr) {
+ err = dup_anon_vma(prev, curr);
+ if (end == curr->vm_end) { /* case 7 */
+ remove = curr;
} else { /* case 5 */
- adjust = mid;
- adj_start = (end - mid->vm_start);
+ adjust = curr;
+ adj_start = (end - curr->vm_start);
}
}
} else if (merge_next) {
@@ -1004,10 +1004,10 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
vma_end = next->vm_end;
vma_pgoff = next->vm_pgoff;
err = 0;
- if (mid) { /* case 8 */
- vma_pgoff = mid->vm_pgoff;
- remove = mid;
- err = dup_anon_vma(next, mid);
+ if (curr) { /* case 8 */
+ vma_pgoff = curr->vm_pgoff;
+ remove = curr;
+ err = dup_anon_vma(next, curr);
}
}
}
--
2.39.2


2023-03-21 20:46:37

by Lorenzo Stoakes

Subject: [PATCH v2 2/4] mm/mmap/vma_merge: set next to NULL if not applicable

We are only interested in next if end == next->vm_start (in which case we
check to see if we can set merge_next), so perform this check alongside
checking whether curr should be set.

This groups all of the simple range checks together and establishes the
invariant that, if prev, curr or next are non-NULL then their positions are
as expected.

This has no functional impact.
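For reference, the lookup helpers this patch leans on can be modelled
outside the kernel roughly as follows (a simplified sketch over a flat
sorted array rather than the kernel's maple tree; only the return
semantics are intended to match, and the mm_struct parameter is dropped):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of a process's VMAs: sorted, non-overlapping
 * [vm_start, vm_end) intervals. */
struct vma { unsigned long vm_start, vm_end; };

static struct vma vmas[] = { { 10, 20 }, { 30, 40 }, { 40, 50 } };
static const size_t nr_vmas = sizeof(vmas) / sizeof(vmas[0]);

/* find_vma(): first VMA with vm_end > addr, or NULL - note this may be a
 * VMA lying wholly above addr. */
static struct vma *find_vma(unsigned long addr)
{
	for (size_t i = 0; i < nr_vmas; i++)
		if (vmas[i].vm_end > addr)
			return &vmas[i];
	return NULL;
}

/* vma_lookup(): the VMA actually containing addr, or NULL. */
static struct vma *vma_lookup(unsigned long addr)
{
	struct vma *v = find_vma(addr);

	return (v && v->vm_start <= addr) ? v : NULL;
}

/* find_vma_intersection(): first VMA overlapping [start, end), or NULL. */
static struct vma *find_vma_intersection(unsigned long start,
					 unsigned long end)
{
	struct vma *v = find_vma(start);

	return (v && v->vm_start < end) ? v : NULL;
}
```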

Signed-off-by: Lorenzo Stoakes <[email protected]>
---
mm/mmap.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index c9834364ac98..6361baf75601 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -930,15 +930,15 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
if (vm_flags & VM_SPECIAL)
return NULL;

- curr = find_vma(mm, prev ? prev->vm_end : 0);
- if (curr && curr->vm_end == end) /* cases 6, 7, 8 */
- next = find_vma(mm, curr->vm_end);
- else
- next = curr;
+ /* Does the input range span an existing VMA? (cases 5 - 8) */
+ curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);

- /* In cases 1 - 4 there's no CCCC vma */
- if (curr && end <= curr->vm_start)
- curr = NULL;
+ if (curr && end == curr->vm_end)
+ /* Is there a VMA immediately adjacent (cases 6 - 8)? */
+ next = vma_lookup(mm, curr->vm_end);
+ else if (!curr)
+ /* Is there a VMA next to a hole (case 1 - 3) or prev (4)? */
+ next = vma_lookup(mm, end);

/* verify some invariant that must be enforced by the caller */
VM_WARN_ON(prev && addr <= prev->vm_start);
@@ -959,11 +959,10 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
}
}
/* Can we merge the successor? */
- if (next && end == next->vm_start &&
- mpol_equal(policy, vma_policy(next)) &&
- can_vma_merge_before(next, vm_flags,
- anon_vma, file, pgoff+pglen,
- vm_userfaultfd_ctx, anon_name)) {
+ if (next && mpol_equal(policy, vma_policy(next)) &&
+ can_vma_merge_before(next, vm_flags,
+ anon_vma, file, pgoff+pglen,
+ vm_userfaultfd_ctx, anon_name)) {
merge_next = true;
}

--
2.39.2


2023-03-21 20:46:46

by Lorenzo Stoakes

Subject: [PATCH v2 3/4] mm/mmap/vma_merge: extend invariants, avoid invalid res, vma

Previously, vma was an uninitialised variable which was only definitely
assigned as a result of the logic covering all possible input cases - for
it to have remained uninitialised, prev would have to be NULL, and next
would _have_ to be mergeable.

We now reuse vma to assign curr and next, so to be absolutely explicit,
ensure this variable is _always_ assigned, and while we're at it remove the
redundant assignment of both res and vma (if prev is NULL then we simply
assign to NULL).

In addition, we absolutely do rely on addr == curr->vm_start should curr
exist, so assert as much.
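The extended invariants can be expressed as a standalone predicate (a
hypothetical sketch, modelling VMAs as bare [vm_start, vm_end) pairs and
NULL as 'not present'; invariants_hold() and the test_* fixtures are not
kernel names):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct span { unsigned long vm_start, vm_end; };

/* Fixtures for the assertions below. */
static const struct span test_prev = { 0, 10 };
static const struct span test_curr = { 10, 30 };

/* True iff prev/curr satisfy the conditions the VM_WARN_ON()s assert:
 * the span is non-empty, prev starts strictly below addr, and curr (if
 * present) starts exactly at addr and ends at or beyond end. */
static bool invariants_hold(const struct span *prev, const struct span *curr,
			    unsigned long addr, unsigned long end)
{
	if (addr >= end)
		return false;
	if (prev && addr <= prev->vm_start)
		return false;
	if (curr && (addr != curr->vm_start || end > curr->vm_end))
		return false;
	return true;
}
```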

Signed-off-by: Lorenzo Stoakes <[email protected]>
---
mm/mmap.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 6361baf75601..7aec49c3bc74 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -911,7 +911,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
{
pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
pgoff_t vma_pgoff;
- struct vm_area_struct *curr, *next, *res = NULL;
+ struct vm_area_struct *curr, *next, *res;
struct vm_area_struct *vma, *adjust, *remove, *remove2;
int err = -1;
bool merge_prev = false;
@@ -940,14 +940,18 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
/* Is there a VMA next to a hole (case 1 - 3) or prev (4)? */
next = vma_lookup(mm, end);

- /* verify some invariant that must be enforced by the caller */
+ /*
+ * By default, we return prev. Cases 3, 4, 8 will instead return next
+ * and cases 3, 8 will also update vma to point at next.
+ */
+ res = vma = prev;
+
+ /* Verify some invariants that must be enforced by the caller. */
VM_WARN_ON(prev && addr <= prev->vm_start);
- VM_WARN_ON(curr && end > curr->vm_end);
+ VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end));
VM_WARN_ON(addr >= end);

if (prev) {
- res = prev;
- vma = prev;
vma_start = prev->vm_start;
vma_pgoff = prev->vm_pgoff;
/* Can we merge the predecessor? */
@@ -958,6 +962,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
vma_prev(vmi);
}
}
+
/* Can we merge the successor? */
if (next && mpol_equal(policy, vma_policy(next)) &&
can_vma_merge_before(next, vm_flags,
@@ -998,6 +1003,10 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
adj_start = -(prev->vm_end - addr);
err = dup_anon_vma(next, prev);
} else {
+ /*
+ * Note that cases 3 and 8 are the ONLY ones where prev
+ * is permitted to be (but is not necessarily) NULL.
+ */
vma = next; /* case 3 */
vma_start = addr;
vma_end = next->vm_end;
--
2.39.2


2023-03-21 20:46:49

by Lorenzo Stoakes

Subject: [PATCH v2 4/4] mm/mmap/vma_merge: init cleanup, be explicit about the non-mergeable case

Reorder the initial variables sensibly and set vma_start and vm_pgoff there
rather than later so all initial values are set at the same time meaning we
don't have to set these later.

Rather than setting err = -1 and only resetting if we hit merge cases,
explicitly check the non-mergeable case to make it abundantly clear that we
only proceed with the rest if something is mergeable, default err to 0 and
only update if an error might occur.

Move the merge_prev, merge_next cases closer to the logic determining curr,
next.

This has no functional impact.
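The restructured control flow can be sketched in isolation (try_merge()
here is a hypothetical stand-in, not the kernel function; it only models
the early bail-out and the err default, not the merge cases themselves):

```c
#include <assert.h>
#include <stdbool.h>

/* With the non-mergeable case rejected up front, err can default to 0 and
 * need only be set where dup_anon_vma() might actually fail. */
static int try_merge(bool merge_prev, bool merge_next)
{
	int err = 0;

	if (!merge_prev && !merge_next)
		return -1;	/* Not mergeable. */

	/* ... cases 1 - 8 would be handled here, updating err only on
	 * anon_vma duplication failure ... */

	return err;
}
```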

Signed-off-by: Lorenzo Stoakes <[email protected]>
---
mm/mmap.c | 55 ++++++++++++++++++++++++++-----------------------------
1 file changed, 26 insertions(+), 29 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 7aec49c3bc74..d60cb0b7ae15 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -909,18 +909,18 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
struct anon_vma_name *anon_name)
{
- pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
- pgoff_t vma_pgoff;
struct vm_area_struct *curr, *next, *res;
struct vm_area_struct *vma, *adjust, *remove, *remove2;
- int err = -1;
+ struct vma_prepare vp;
+ int err = 0;
bool merge_prev = false;
bool merge_next = false;
bool vma_expanded = false;
- struct vma_prepare vp;
+ unsigned long vma_start = prev ? prev->vm_start : addr;
unsigned long vma_end = end;
+ pgoff_t vma_pgoff = prev ? prev->vm_pgoff : 0;
+ pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
long adj_start = 0;
- unsigned long vma_start = addr;

validate_mm(mm);
/*
@@ -940,6 +940,23 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
/* Is there a VMA next to a hole (case 1 - 3) or prev (4)? */
next = vma_lookup(mm, end);

+ /* Can we merge the predecessor? */
+ if (prev && addr == prev->vm_end && mpol_equal(vma_policy(prev), policy)
+ && can_vma_merge_after(prev, vm_flags, anon_vma, file,
+ pgoff, vm_userfaultfd_ctx, anon_name)) {
+ merge_prev = true;
+ vma_prev(vmi);
+ }
+
+ /* Can we merge the successor? */
+ merge_next = next && mpol_equal(policy, vma_policy(next)) &&
+ can_vma_merge_before(next, vm_flags,
+ anon_vma, file, pgoff+pglen,
+ vm_userfaultfd_ctx, anon_name);
+
+ if (!merge_prev && !merge_next)
+ return NULL; /* Not mergeable. */
+
/*
* By default, we return prev. Cases 3, 4, 8 will instead return next
* and cases 3, 8 will also update vma to point at next.
@@ -951,26 +968,6 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end));
VM_WARN_ON(addr >= end);

- if (prev) {
- vma_start = prev->vm_start;
- vma_pgoff = prev->vm_pgoff;
- /* Can we merge the predecessor? */
- if (prev->vm_end == addr && mpol_equal(vma_policy(prev), policy)
- && can_vma_merge_after(prev, vm_flags, anon_vma, file,
- pgoff, vm_userfaultfd_ctx, anon_name)) {
- merge_prev = true;
- vma_prev(vmi);
- }
- }
-
- /* Can we merge the successor? */
- if (next && mpol_equal(policy, vma_policy(next)) &&
- can_vma_merge_before(next, vm_flags,
- anon_vma, file, pgoff+pglen,
- vm_userfaultfd_ctx, anon_name)) {
- merge_next = true;
- }
-
remove = remove2 = adjust = NULL;
/* Can we merge both the predecessor and the successor? */
if (merge_prev && merge_next &&
@@ -985,7 +982,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
err = dup_anon_vma(prev, curr);
}
} else if (merge_prev) {
- err = 0; /* case 2 */
+ /* case 2 */
if (curr) {
err = dup_anon_vma(prev, curr);
if (end == curr->vm_end) { /* case 7 */
@@ -995,7 +992,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
adj_start = (end - curr->vm_start);
}
}
- } else if (merge_next) {
+ } else { /* merge_next */
res = next;
if (prev && addr < prev->vm_end) { /* case 4 */
vma_end = addr;
@@ -1011,7 +1008,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
vma_start = addr;
vma_end = next->vm_end;
vma_pgoff = next->vm_pgoff;
- err = 0;
+
if (curr) { /* case 8 */
vma_pgoff = curr->vm_pgoff;
remove = curr;
@@ -1020,7 +1017,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
}
}

- /* Cannot merge or error in anon_vma clone */
+ /* Error in anon_vma clone. */
if (err)
return NULL;

--
2.39.2


2023-03-22 01:49:31

by Liam R. Howlett

Subject: Re: [PATCH v2 2/4] mm/mmap/vma_merge: set next to NULL if not applicable

* Lorenzo Stoakes <[email protected]> [230321 16:51]:
> We are only interested in next if end == next->vm_start (in which case we
> check to see if we can set merge_next), so perform this check alongside
> checking whether curr should be set.
>
> This groups all of the simple range checks together and establishes the
> invariant that, if prev, curr or next are non-NULL then their positions are
> as expected.
>
> This has no functional impact.
>
> Signed-off-by: Lorenzo Stoakes <[email protected]>
> ---
> mm/mmap.c | 25 ++++++++++++-------------
> 1 file changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index c9834364ac98..6361baf75601 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -930,15 +930,15 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> if (vm_flags & VM_SPECIAL)
> return NULL;
>
> - curr = find_vma(mm, prev ? prev->vm_end : 0);
> - if (curr && curr->vm_end == end) /* cases 6, 7, 8 */
> - next = find_vma(mm, curr->vm_end);
> - else
> - next = curr;
> + /* Does the input range span an existing VMA? (cases 5 - 8) */
> + curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);
>
> - /* In cases 1 - 4 there's no CCCC vma */
> - if (curr && end <= curr->vm_start)
> - curr = NULL;
> + if (curr && end == curr->vm_end)
> + /* Is there a VMA immediately adjacent (cases 6 - 8)? */
> + next = vma_lookup(mm, curr->vm_end);

Since end == curr->vm_end, this lookup is the same as below so these two
statements can be combined.

I also think you may need to initialize next to null since it may not be
set for the 'cannot merge' case.

Something like:
if ((!curr) || /* case 1-4 */
(end == curr->vm_end)) /* Case 6-8, adjacent vma */
next = vma_lookup(mm, end);
else
next = NULL


> + else if (!curr)
> + /* Is there a VMA next to a hole (case 1 - 3) or prev (4)? */
> + next = vma_lookup(mm, end);

Nit, can we have braces for comments that make the if/else look like
it's unguarded?

>
> /* verify some invariant that must be enforced by the caller */
> VM_WARN_ON(prev && addr <= prev->vm_start);
> @@ -959,11 +959,10 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> }
> }
> /* Can we merge the successor? */
> - if (next && end == next->vm_start &&
> - mpol_equal(policy, vma_policy(next)) &&
> - can_vma_merge_before(next, vm_flags,
> - anon_vma, file, pgoff+pglen,
> - vm_userfaultfd_ctx, anon_name)) {
> + if (next && mpol_equal(policy, vma_policy(next)) &&
> + can_vma_merge_before(next, vm_flags,
> + anon_vma, file, pgoff+pglen,
> + vm_userfaultfd_ctx, anon_name)) {
> merge_next = true;
> }
>
> --
> 2.39.2
>

2023-03-22 02:14:02

by Liam R. Howlett

Subject: Re: [PATCH v2 4/4] mm/mmap/vma_merge: init cleanup, be explicit about the non-mergeable case

* Lorenzo Stoakes <[email protected]> [230321 16:46]:
> Reorder the initial variables sensibly and set vma_start and vm_pgoff there
^vma_pgoff
Indicating it is used for the vm_area_struct *vma

> rather than later so all initial values are set at the same time meaning we
> don't have to set these later.

I did these later to reduce the number of times we were checking prev.
With this patch, we check prev 3 times, but before we were checking it
once. The compiler might do something clever here to reduce the
checking?

I'm also not sure adding the conditional operator in the init helps your
goal of cleaning it up.

>
> Rather than setting err = -1 and only resetting if we hit merge cases,
> explicitly check the non-mergeable case to make it abundantly clear that we
> only proceed with the rest if something is mergeable, default err to 0 and
> only update if an error might occur.
>
> Move the merge_prev, merge_next cases closer to the logic determining curr,
> next.
>
> This has no functional impact.
>
> Signed-off-by: Lorenzo Stoakes <[email protected]>
> ---
> mm/mmap.c | 55 ++++++++++++++++++++++++++-----------------------------
> 1 file changed, 26 insertions(+), 29 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 7aec49c3bc74..d60cb0b7ae15 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -909,18 +909,18 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> struct anon_vma_name *anon_name)
> {
> - pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
> - pgoff_t vma_pgoff;
> struct vm_area_struct *curr, *next, *res;
> struct vm_area_struct *vma, *adjust, *remove, *remove2;
> - int err = -1;
> + struct vma_prepare vp;
> + int err = 0;
> bool merge_prev = false;
> bool merge_next = false;
> bool vma_expanded = false;
> - struct vma_prepare vp;
> + unsigned long vma_start = prev ? prev->vm_start : addr;
> unsigned long vma_end = end;
> + pgoff_t vma_pgoff = prev ? prev->vm_pgoff : 0;
> + pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
> long adj_start = 0;
> - unsigned long vma_start = addr;
>
> validate_mm(mm);
> /*
> @@ -940,6 +940,23 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> /* Is there a VMA next to a hole (case 1 - 3) or prev (4)? */
> next = vma_lookup(mm, end);
>
> + /* Can we merge the predecessor? */
> + if (prev && addr == prev->vm_end && mpol_equal(vma_policy(prev), policy)
> + && can_vma_merge_after(prev, vm_flags, anon_vma, file,
> + pgoff, vm_userfaultfd_ctx, anon_name)) {
> + merge_prev = true;
> + vma_prev(vmi);
> + }
> +
> + /* Can we merge the successor? */
> + merge_next = next && mpol_equal(policy, vma_policy(next)) &&
> + can_vma_merge_before(next, vm_flags,
> + anon_vma, file, pgoff+pglen,
> + vm_userfaultfd_ctx, anon_name);
> +
> + if (!merge_prev && !merge_next)
> + return NULL; /* Not mergeable. */
> +
> /*
> * By default, we return prev. Cases 3, 4, 8 will instead return next
> * and cases 3, 8 will also update vma to point at next.
> @@ -951,26 +968,6 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end));
> VM_WARN_ON(addr >= end);
>
> - if (prev) {
> - vma_start = prev->vm_start;
> - vma_pgoff = prev->vm_pgoff;
> - /* Can we merge the predecessor? */
> - if (prev->vm_end == addr && mpol_equal(vma_policy(prev), policy)
> - && can_vma_merge_after(prev, vm_flags, anon_vma, file,
> - pgoff, vm_userfaultfd_ctx, anon_name)) {
> - merge_prev = true;
> - vma_prev(vmi);
> - }
> - }
> -
> - /* Can we merge the successor? */
> - if (next && mpol_equal(policy, vma_policy(next)) &&
> - can_vma_merge_before(next, vm_flags,
> - anon_vma, file, pgoff+pglen,
> - vm_userfaultfd_ctx, anon_name)) {
> - merge_next = true;
> - }
> -
> remove = remove2 = adjust = NULL;
> /* Can we merge both the predecessor and the successor? */
> if (merge_prev && merge_next &&
> @@ -985,7 +982,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> err = dup_anon_vma(prev, curr);
> }
> } else if (merge_prev) {
> - err = 0; /* case 2 */
> + /* case 2 */
> if (curr) {
> err = dup_anon_vma(prev, curr);
> if (end == curr->vm_end) { /* case 7 */
> @@ -995,7 +992,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> adj_start = (end - curr->vm_start);
> }
> }
> - } else if (merge_next) {
> + } else { /* merge_next */
> res = next;
> if (prev && addr < prev->vm_end) { /* case 4 */
> vma_end = addr;
> @@ -1011,7 +1008,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> vma_start = addr;
> vma_end = next->vm_end;
> vma_pgoff = next->vm_pgoff;
> - err = 0;
> +

Was this blank line intentional? I assume so, to give a gap for the
comment below? It's probably worth having.

> if (curr) { /* case 8 */
> vma_pgoff = curr->vm_pgoff;
> remove = curr;
> @@ -1020,7 +1017,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> }
> }
>
> - /* Cannot merge or error in anon_vma clone */
> + /* Error in anon_vma clone. */
> if (err)
> return NULL;
>
> --
> 2.39.2
>
>

2023-03-22 06:18:44

by Lorenzo Stoakes

Subject: Re: [PATCH v2 4/4] mm/mmap/vma_merge: init cleanup, be explicit about the non-mergeable case

On Tue, Mar 21, 2023 at 10:08:58PM -0400, Liam R. Howlett wrote:
> * Lorenzo Stoakes <[email protected]> [230321 16:46]:
> > Reorder the initial variables sensibly and set vma_start and vm_pgoff there
> ^vma_pgoff
> Indicating it is used for the vm_area_struct *vma
>
> > rather than later so all initial values are set at the same time meaning we
> > don't have to set these later.
>
> I did these later to reduce the number of times we were checking prev.
> With this patch, we check prev 3 times, but before we were checking it
> once. The compiler might do something clever here to reduce the
> checking?
>

Apologies for undoing your work! Which obviously wasn't my intention :)

I suspect the compiler would, but we probably shouldn't rely on it.

> I'm also not sure adding the conditional operator in the init helps your
> goal of cleaning it up.

The purpose was to group everything together, reduce indentation in the
prev case, have it resemble the next case more closely, reduce LoC and to
have the prev if block be solely concerned with merging the predecessor
rather than both setting these values and then checking to see if we can
merge the predecessor.

However, on second thoughts I think avoiding repeatedly checking it trumps
that so I will revert to the previous approach.

>
> >
> > Rather than setting err = -1 and only resetting if we hit merge cases,
> > explicitly check the non-mergeable case to make it abundantly clear that we
> > only proceed with the rest if something is mergeable, default err to 0 and
> > only update if an error might occur.
> >
> > Move the merge_prev, merge_next cases closer to the logic determining curr,
> > next.
> >
> > This has no functional impact.
> >
> > Signed-off-by: Lorenzo Stoakes <[email protected]>
> > ---
> > mm/mmap.c | 55 ++++++++++++++++++++++++++-----------------------------
> > 1 file changed, 26 insertions(+), 29 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 7aec49c3bc74..d60cb0b7ae15 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -909,18 +909,18 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> > struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> > struct anon_vma_name *anon_name)
> > {
> > - pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
> > - pgoff_t vma_pgoff;
> > struct vm_area_struct *curr, *next, *res;
> > struct vm_area_struct *vma, *adjust, *remove, *remove2;
> > - int err = -1;
> > + struct vma_prepare vp;
> > + int err = 0;
> > bool merge_prev = false;
> > bool merge_next = false;
> > bool vma_expanded = false;
> > - struct vma_prepare vp;
> > + unsigned long vma_start = prev ? prev->vm_start : addr;
> > unsigned long vma_end = end;
> > + pgoff_t vma_pgoff = prev ? prev->vm_pgoff : 0;
> > + pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
> > long adj_start = 0;
> > - unsigned long vma_start = addr;
> >
> > validate_mm(mm);
> > /*
> > @@ -940,6 +940,23 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> > /* Is there a VMA next to a hole (case 1 - 3) or prev (4)? */
> > next = vma_lookup(mm, end);
> >
> > + /* Can we merge the predecessor? */
> > + if (prev && addr == prev->vm_end && mpol_equal(vma_policy(prev), policy)
> > + && can_vma_merge_after(prev, vm_flags, anon_vma, file,
> > + pgoff, vm_userfaultfd_ctx, anon_name)) {
> > + merge_prev = true;
> > + vma_prev(vmi);
> > + }
> > +
> > + /* Can we merge the successor? */
> > + merge_next = next && mpol_equal(policy, vma_policy(next)) &&
> > + can_vma_merge_before(next, vm_flags,
> > + anon_vma, file, pgoff+pglen,
> > + vm_userfaultfd_ctx, anon_name);
> > +
> > + if (!merge_prev && !merge_next)
> > + return NULL; /* Not mergeable. */
> > +
> > /*
> > * By default, we return prev. Cases 3, 4, 8 will instead return next
> > * and cases 3, 8 will also update vma to point at next.
> > @@ -951,26 +968,6 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> > VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end));
> > VM_WARN_ON(addr >= end);
> >
> > - if (prev) {
> > - vma_start = prev->vm_start;
> > - vma_pgoff = prev->vm_pgoff;
> > - /* Can we merge the predecessor? */
> > - if (prev->vm_end == addr && mpol_equal(vma_policy(prev), policy)
> > - && can_vma_merge_after(prev, vm_flags, anon_vma, file,
> > - pgoff, vm_userfaultfd_ctx, anon_name)) {
> > - merge_prev = true;
> > - vma_prev(vmi);
> > - }
> > - }
> > -
> > - /* Can we merge the successor? */
> > - if (next && mpol_equal(policy, vma_policy(next)) &&
> > - can_vma_merge_before(next, vm_flags,
> > - anon_vma, file, pgoff+pglen,
> > - vm_userfaultfd_ctx, anon_name)) {
> > - merge_next = true;
> > - }
> > -
> > remove = remove2 = adjust = NULL;
> > /* Can we merge both the predecessor and the successor? */
> > if (merge_prev && merge_next &&
> > @@ -985,7 +982,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> > err = dup_anon_vma(prev, curr);
> > }
> > } else if (merge_prev) {
> > - err = 0; /* case 2 */
> > + /* case 2 */
> > if (curr) {
> > err = dup_anon_vma(prev, curr);
> > if (end == curr->vm_end) { /* case 7 */
> > @@ -995,7 +992,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> > adj_start = (end - curr->vm_start);
> > }
> > }
> > - } else if (merge_next) {
> > + } else { /* merge_next */
> > res = next;
> > if (prev && addr < prev->vm_end) { /* case 4 */
> > vma_end = addr;
> > @@ -1011,7 +1008,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> > vma_start = addr;
> > vma_end = next->vm_end;
> > vma_pgoff = next->vm_pgoff;
> > - err = 0;
> > +
>
> Was this blank line intentional? I assume so, to give a gap for the
> comment below? It's probably worth having.
>
> > if (curr) { /* case 8 */
> > vma_pgoff = curr->vm_pgoff;
> > remove = curr;
> > @@ -1020,7 +1017,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> > }
> > }
> >
> > - /* Cannot merge or error in anon_vma clone */
> > + /* Error in anon_vma clone. */
> > if (err)
> > return NULL;
> >
> > --
> > 2.39.2
> >
> >

2023-03-22 06:42:20

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: [PATCH v2 2/4] mm/mmap/vma_merge: set next to NULL if not applicable

On Tue, Mar 21, 2023 at 09:42:32PM -0400, Liam R. Howlett wrote:
> * Lorenzo Stoakes <[email protected]> [230321 16:51]:
> > We are only interested in next if end == next->vm_start (in which case we
> > check to see if we can set merge_next), so perform this check alongside
> > checking whether curr should be set.
> >
> > This groups all of the simple range checks together and establishes the
> > invariant that, if prev, curr or next are non-NULL then their positions are
> > as expected.
> >
> > This has no functional impact.
> >
> > Signed-off-by: Lorenzo Stoakes <[email protected]>
> > ---
> > mm/mmap.c | 25 ++++++++++++-------------
> > 1 file changed, 12 insertions(+), 13 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index c9834364ac98..6361baf75601 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -930,15 +930,15 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> > if (vm_flags & VM_SPECIAL)
> > return NULL;
> >
> > - curr = find_vma(mm, prev ? prev->vm_end : 0);
> > - if (curr && curr->vm_end == end) /* cases 6, 7, 8 */
> > - next = find_vma(mm, curr->vm_end);
> > - else
> > - next = curr;
> > + /* Does the input range span an existing VMA? (cases 5 - 8) */
> > + curr = find_vma_intersection(mm, prev ? prev->vm_end : 0, end);
> >
> > - /* In cases 1 - 4 there's no CCCC vma */
> > - if (curr && end <= curr->vm_start)
> > - curr = NULL;
> > + if (curr && end == curr->vm_end)
> > +	/* Is there a VMA immediately adjacent (cases 6 - 8)? */
> > + next = vma_lookup(mm, curr->vm_end);
>
> Since end == curr->vm_end, this lookup is the same as below so these two
> statements can be combined.

I can't believe I didn't see this :) A case of not seeing the wood for the
trees, and a great example of the benefit of code review.

>
> I also think you may need to initialize next to null since it may not be
> set for the 'cannot merge' case.

You're right, will fix.

>
> Something like:
> if ((!curr) || /* case 1-4 */
> (end == curr->vm_end)) /* Case 6-8, adjacent vma */
> next = vma_lookup(mm, end);
> else
> 		next = NULL;
>

This is really turning out to be quite nicely succinct now I think! Will do.

>
> > + else if (!curr)
> > + /* Is there a VMA next to a hole (case 1 - 3) or prev (4)? */
> > + next = vma_lookup(mm, end);
>
> Nit, can we have braces for comments that make the if/else look like
> it's unguarded?
>

Ack will fix in respin.

> >
> > /* verify some invariant that must be enforced by the caller */
> > VM_WARN_ON(prev && addr <= prev->vm_start);
> > @@ -959,11 +959,10 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> > }
> > }
> > /* Can we merge the successor? */
> > - if (next && end == next->vm_start &&
> > - mpol_equal(policy, vma_policy(next)) &&
> > - can_vma_merge_before(next, vm_flags,
> > - anon_vma, file, pgoff+pglen,
> > - vm_userfaultfd_ctx, anon_name)) {
> > + if (next && mpol_equal(policy, vma_policy(next)) &&
> > + can_vma_merge_before(next, vm_flags,
> > + anon_vma, file, pgoff+pglen,
> > + vm_userfaultfd_ctx, anon_name)) {
> > merge_next = true;
> > }
> >
> > --
> > 2.39.2
> >

2023-03-22 09:18:36

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] mm/mmap/vma_merge: extend invariants, avoid invalid res, vma

On 3/21/23 21:45, Lorenzo Stoakes wrote:
> Previously, vma was an uninitialised variable which was only definitely
> assigned as a result of the logic covering all possible input cases - for
> it to have remained uninitialised, prev would have to be NULL, and next
> would _have_ to be mergeable.
>
> We now reuse vma to assign curr and next, so to be absolutely explicit,
> ensure this variable is _always_ assigned, and while we're at it remove the
> redundant assignment of both res and vma (if prev is NULL then we simply
> assign to NULL).
>
> In addition, we absolutely do rely on addr == curr->vm_start should curr
> exist, so assert as much.
>
> Signed-off-by: Lorenzo Stoakes <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

Nit suggestion below.

> ---
> mm/mmap.c | 19 ++++++++++++++-----
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 6361baf75601..7aec49c3bc74 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -911,7 +911,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> {
> pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
> pgoff_t vma_pgoff;
> - struct vm_area_struct *curr, *next, *res = NULL;
> + struct vm_area_struct *curr, *next, *res;
> struct vm_area_struct *vma, *adjust, *remove, *remove2;
> int err = -1;
> bool merge_prev = false;
> @@ -940,14 +940,18 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> /* Is there a VMA next to a hole (case 1 - 3) or prev (4)? */
> next = vma_lookup(mm, end);
>
> - /* verify some invariant that must be enforced by the caller */
> + /*
> + * By default, we return prev. Cases 3, 4, 8 will instead return next
> + * and cases 3, 8 will also update vma to point at next.
> + */
> + res = vma = prev;

Later in the function there's a line:

remove = remove2 = adjust = NULL;

Now it would make sense to move it up here?

> +
> + /* Verify some invariant that must be enforced by the caller. */
> VM_WARN_ON(prev && addr <= prev->vm_start);
> - VM_WARN_ON(curr && end > curr->vm_end);
> + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end));
> VM_WARN_ON(addr >= end);
>
> if (prev) {
> - res = prev;
> - vma = prev;
> vma_start = prev->vm_start;
> vma_pgoff = prev->vm_pgoff;
> /* Can we merge the predecessor? */
> @@ -958,6 +962,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> vma_prev(vmi);
> }
> }
> +
> /* Can we merge the successor? */
> if (next && mpol_equal(policy, vma_policy(next)) &&
> can_vma_merge_before(next, vm_flags,
> @@ -998,6 +1003,10 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> adj_start = -(prev->vm_end - addr);
> err = dup_anon_vma(next, prev);
> } else {
> + /*
> + * Note that cases 3 and 8 are the ONLY ones where prev
> + * is permitted to be (but is not necessarily) NULL.
> + */
> vma = next; /* case 3 */
> vma_start = addr;
> vma_end = next->vm_end;

2023-03-23 17:16:32

by Marek Szyprowski

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] mm/mmap/vma_merge: extend invariants, avoid invalid res, vma

On 21.03.2023 21:45, Lorenzo Stoakes wrote:
> Previously, vma was an uninitialised variable which was only definitely
> assigned as a result of the logic covering all possible input cases - for
> it to have remained uninitialised, prev would have to be NULL, and next
> would _have_ to be mergeable.
>
> We now reuse vma to assign curr and next, so to be absolutely explicit,
> ensure this variable is _always_ assigned, and while we're at it remove the
> redundant assignment of both res and vma (if prev is NULL then we simply
> assign to NULL).
>
> In addition, we absolutely do rely on addr == curr->vm_start should curr
> exist, so assert as much.
>
> Signed-off-by: Lorenzo Stoakes <[email protected]>

This patch has been merged into today's linux next-20230323 as commit
6426bbcc76be ("mm/mmap/vma_merge: extend invariants, avoid invalid res,
vma").

Unfortunately it breaks booting of some ARM 32bit machines, like Samsung
Exynos5422-based Odroid XU3 board. This shortened log shows the issue:

Run /sbin/init as init process
  with arguments:
    /sbin/init
  with environment:
    HOME=/
    TERM=linux
8<--- cut here ---
Unhandled fault: page domain fault (0x01b) at 0xb6f03010
[b6f03010] *pgd=b5e84835
Internal error: : 1b [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 2 PID: 1 Comm: init Not tainted 6.3.0-rc1-00296-g6426bbcc76be #6445
Hardware name: Samsung Exynos (Flattened Device Tree)
PC is at vma_merge+0xa0/0x728
LR is at vma_merge+0x294/0x728
pc : [<c02b08a8>]    lr : [<c02b0a9c>]    psr: a0000013
...
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 4f11406a  DAC: 00000051
...
Process init (pid: 1, stack limit = 0x5219a5c7)
Stack: (0xf0835e30 to 0xf0836000)
...
 vma_merge from mprotect_fixup+0xc8/0x290
 mprotect_fixup from do_mprotect_pkey.constprop.0+0x210/0x338
 do_mprotect_pkey.constprop.0 from ret_fast_syscall+0x0/0x1c
Exception stack(0xf0835fa8 to 0xf0835ff0)
...
---[ end trace 0000000000000000 ]---
note: init[1] exited with irqs disabled
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Reverting it on top of linux-next, together with 183b2bced4c9
("mm/mmap/vma_merge: init cleanup, be explicit about the non-mergeable
case"), which depends on this patch, fixes the boot issue.


> ---
> mm/mmap.c | 19 ++++++++++++++-----
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 6361baf75601..7aec49c3bc74 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -911,7 +911,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> {
> pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
> pgoff_t vma_pgoff;
> - struct vm_area_struct *curr, *next, *res = NULL;
> + struct vm_area_struct *curr, *next, *res;
> struct vm_area_struct *vma, *adjust, *remove, *remove2;
> int err = -1;
> bool merge_prev = false;
> @@ -940,14 +940,18 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> /* Is there a VMA next to a hole (case 1 - 3) or prev (4)? */
> next = vma_lookup(mm, end);
>
> - /* verify some invariant that must be enforced by the caller */
> + /*
> + * By default, we return prev. Cases 3, 4, 8 will instead return next
> + * and cases 3, 8 will also update vma to point at next.
> + */
> + res = vma = prev;
> +
> + /* Verify some invariant that must be enforced by the caller. */
> VM_WARN_ON(prev && addr <= prev->vm_start);
> - VM_WARN_ON(curr && end > curr->vm_end);
> + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end));
> VM_WARN_ON(addr >= end);
>
> if (prev) {
> - res = prev;
> - vma = prev;
> vma_start = prev->vm_start;
> vma_pgoff = prev->vm_pgoff;
> /* Can we merge the predecessor? */
> @@ -958,6 +962,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> vma_prev(vmi);
> }
> }
> +
> /* Can we merge the successor? */
> if (next && mpol_equal(policy, vma_policy(next)) &&
> can_vma_merge_before(next, vm_flags,
> @@ -998,6 +1003,10 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> adj_start = -(prev->vm_end - addr);
> err = dup_anon_vma(next, prev);
> } else {
> + /*
> + * Note that cases 3 and 8 are the ONLY ones where prev
> + * is permitted to be (but is not necessarily) NULL.
> + */
> vma = next; /* case 3 */
> vma_start = addr;
> vma_end = next->vm_end;

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2023-03-23 17:20:14

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] mm/mmap/vma_merge: extend invariants, avoid invalid res, vma

On Thu, Mar 23, 2023 at 06:00:45PM +0100, Marek Szyprowski wrote:
> On 21.03.2023 21:45, Lorenzo Stoakes wrote:
> > Previously, vma was an uninitialised variable which was only definitely
> > assigned as a result of the logic covering all possible input cases - for
> > it to have remained uninitialised, prev would have to be NULL, and next
> > would _have_ to be mergeable.
> >
> > We now reuse vma to assign curr and next, so to be absolutely explicit,
> > ensure this variable is _always_ assigned, and while we're at it remove the
> > redundant assignment of both res and vma (if prev is NULL then we simply
> > assign to NULL).
> >
> > In addition, we absolutely do rely on addr == curr->vm_start should curr
> > exist, so assert as much.
> >
> > Signed-off-by: Lorenzo Stoakes <[email protected]>
>
> This patch has been merged into today's linux next-20230323 as commit
> 6426bbcc76be ("mm/mmap/vma_merge: extend invariants, avoid invalid res,
> vma").
>
> Unfortunately it breaks booting of some ARM 32bit machines, like Samsung
> Exynos5422-based Odroid XU3 board. This shortened log shows the issue:

I'm very sorry about this. This was due to a bug with uninitialised
state. This was fixed in v3 of this series, which is now at v5 [1]. This
has already been taken to mm-unstable in Andrew's repo, hopefully it will
be taken to -next soon!

I have made sure to reply quickly whenever this is raised to ensure people
are aware, and I will also buy anybody affected a pint if I meet them in
person :)

[1]: https://lore.kernel.org/all/[email protected]/

>
> Run /sbin/init as init process
>   with arguments:
>     /sbin/init
>   with environment:
>     HOME=/
>     TERM=linux
> 8<--- cut here ---
> Unhandled fault: page domain fault (0x01b) at 0xb6f03010
> [b6f03010] *pgd=b5e84835
> Internal error: : 1b [#1] PREEMPT SMP ARM
> Modules linked in:
> CPU: 2 PID: 1 Comm: init Not tainted 6.3.0-rc1-00296-g6426bbcc76be #6445
> Hardware name: Samsung Exynos (Flattened Device Tree)
> PC is at vma_merge+0xa0/0x728
> LR is at vma_merge+0x294/0x728
> pc : [<c02b08a8>]    lr : [<c02b0a9c>]    psr: a0000013
> ...
> Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: 4f11406a  DAC: 00000051
> ...
> Process init (pid: 1, stack limit = 0x5219a5c7)
> Stack: (0xf0835e30 to 0xf0836000)
> ...
>  vma_merge from mprotect_fixup+0xc8/0x290
>  mprotect_fixup from do_mprotect_pkey.constprop.0+0x210/0x338
>  do_mprotect_pkey.constprop.0 from ret_fast_syscall+0x0/0x1c
> Exception stack(0xf0835fa8 to 0xf0835ff0)
> ...
> ---[ end trace 0000000000000000 ]---
> note: init[1] exited with irqs disabled
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>
> Reverting it on top of linux-next, together with 183b2bced4c9
> ("mm/mmap/vma_merge: init cleanup, be explicit about the non-mergeable
> case"), which depends on this patch, fixes the boot issue.
>
>
> > ---
> > mm/mmap.c | 19 ++++++++++++++-----
> > 1 file changed, 14 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 6361baf75601..7aec49c3bc74 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -911,7 +911,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> > {
> > pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
> > pgoff_t vma_pgoff;
> > - struct vm_area_struct *curr, *next, *res = NULL;
> > + struct vm_area_struct *curr, *next, *res;
> > struct vm_area_struct *vma, *adjust, *remove, *remove2;
> > int err = -1;
> > bool merge_prev = false;
> > @@ -940,14 +940,18 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> > /* Is there a VMA next to a hole (case 1 - 3) or prev (4)? */
> > next = vma_lookup(mm, end);
> >
> > - /* verify some invariant that must be enforced by the caller */
> > + /*
> > + * By default, we return prev. Cases 3, 4, 8 will instead return next
> > + * and cases 3, 8 will also update vma to point at next.
> > + */
> > + res = vma = prev;
> > +
> > + /* Verify some invariant that must be enforced by the caller. */
> > VM_WARN_ON(prev && addr <= prev->vm_start);
> > - VM_WARN_ON(curr && end > curr->vm_end);
> > + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end));
> > VM_WARN_ON(addr >= end);
> >
> > if (prev) {
> > - res = prev;
> > - vma = prev;
> > vma_start = prev->vm_start;
> > vma_pgoff = prev->vm_pgoff;
> > /* Can we merge the predecessor? */
> > @@ -958,6 +962,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> > vma_prev(vmi);
> > }
> > }
> > +
> > /* Can we merge the successor? */
> > if (next && mpol_equal(policy, vma_policy(next)) &&
> > can_vma_merge_before(next, vm_flags,
> > @@ -998,6 +1003,10 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> > adj_start = -(prev->vm_end - addr);
> > err = dup_anon_vma(next, prev);
> > } else {
> > + /*
> > + * Note that cases 3 and 8 are the ONLY ones where prev
> > + * is permitted to be (but is not necessarily) NULL.
> > + */
> > vma = next; /* case 3 */
> > vma_start = addr;
> > vma_end = next->vm_end;
>
> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>