Also available in git:
https://git.kernel.org/vbabka/h/vma_merge_cleanup-v1r2
Changes since RFC:
- rebased to 6.3-rc1, dropped first patch (urgent fix) that was merged there
- reindent parameters of mergeability checks (suggested by willy on IRC)
My initial goal here was to try making the check for vm_ops->close in
is_mergeable_vma() only be applied for vma's that would be truly removed
as part of the merge (see Patch 9). This would then allow reverting the
quick fix d014cd7c1c35 ("mm, mremap: fix mremap() expanding for vma's
with vm_ops->close()"). This was successful enough to allow the revert
(Patch 10). Checks using can_vma_merge_before() are still pessimistic
about possible vma removal, and making them precise would probably
complicate the vma_merge() code too much.
Liam's 6.3-rc1 simplification of vma_merge() and removal of
__vma_adjust() was very much helpful in understanding the vma_merge()
implementation and especially when vma removals can happen, which is now
very obvious. While studing the code, I've found ways to make it
hopefully even more easy to follow, so that's the patches 1-8. That made
me also notice a bug that's now already fixed in 6.3-rc1.
Vlastimil Babka (10):
mm/mmap/vma_merge: use only primary pointers for preparing merge
mm/mmap/vma_merge: use the proper vma pointer in case 3
mm/mmap/vma_merge: use the proper vma pointers in cases 1 and 6
mm/mmap/vma_merge: use the proper vma pointer in case 4
mm/mmap/vma_merge: initialize mid and next in natural order
mm/mmap/vma_merge: set mid to NULL if not applicable
mm/mmap/vma_merge: rename adj_next to adj_start
mm/mmap/vma_merge: convert mergeability checks to return bool
mm/mmap: start distinguishing if vma can be removed in mergeability
test
mm/mremap: simplify vma expansion again
mm/mmap.c | 142 ++++++++++++++++++++++++++++------------------------
mm/mremap.c | 20 ++------
2 files changed, 80 insertions(+), 82 deletions(-)
--
2.39.2
In case 3 we we use 'next' for everything but vma_pgoff. So use 'next'
for that as well, instead of 'mid', for consistency. Then in case 8 we
have to use 'mid' explicitly, which should also make the intent more
obvious.
Adjust the diagram for cases 1-3 in the comment to match the code - we
are using 'next' for case 3 so mark the range with XXXX instead of NNNN.
For case 2 that's a no-op as the code doesn't touch 'next' or 'mid'. For
case 1 it's now wrong but that will be fixed next.
No functional change.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/mmap.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 0a8b052e3022..1af4c9bc2c87 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -857,11 +857,11 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
* mmap, brk or case 4 below case 5 below
* mremap move:
* AAAA AAAA
- * PPPP NNNN PPPPNNNNXXXX
+ * PPPP XXXX PPPPNNNNXXXX
* might become might become
* PPPPPPPPPPPP 1 or PPPPPPPPPPPP 6 or
- * PPPPPPPPNNNN 2 or PPPPPPPPXXXX 7 or
- * PPPPNNNNNNNN 3 PPPPXXXXXXXX 8
+ * PPPPPPPPXXXX 2 or PPPPPPPPXXXX 7 or
+ * PPPPXXXXXXXX 3 PPPPXXXXXXXX 8
*
* It is important for case 8 that the vma NNNN overlapping the
* region AAAA is never going to extended over XXXX. Instead XXXX must
@@ -978,9 +978,10 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
vma = next; /* case 3 */
vma_start = addr;
vma_end = next->vm_end;
- vma_pgoff = mid->vm_pgoff;
+ vma_pgoff = next->vm_pgoff;
err = 0;
if (mid != next) { /* case 8 */
+ vma_pgoff = mid->vm_pgoff;
remove = mid;
err = dup_anon_vma(next, mid);
}
--
2.39.2
Since pre-git times, is_mergeable_vma() returns false for a vma with
vm_ops->close, so that no owner assumptions are violated in case the vma
is removed as part of the merge.
This check is currently very conservative and can prevent merging even
situations where vma can't be removed, such as simple expansion of
previous vma, as evidenced by commit d014cd7c1c35 ("mm, mremap: fix
mremap() expanding for vma's with vm_ops->close()")
In order to allow more merging when appropriate and simplify the code
that was made more complex by commit d014cd7c1c35, start distinguishing
cases where the vma can be really removed, and allow merging with
vm_ops->close otherwise.
As a first step, add a may_remove_vma parameter to is_mergeable_vma().
can_vma_merge_before() sets it to true, because when called from
vma_merge(), a removal of the vma is possible.
In can_vma_merge_after(), pass the parameter as false, because no
removal can occur in each of its callers:
- vma_merge() calls it on the 'prev' vma, which is never removed
- mmap_region() and do_brk_flags() call it to determine if it can expand
a vma, which is not removed
As a result, vma's with vm_ops->close may now merge with compatible
ranges in more situations than previously. We can also revert commit
d014cd7c1c35 as the next step to simplify mremap code again.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/mmap.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index d20bbe9ec613..65503ea07f32 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -742,12 +742,13 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
/*
* If the vma has a ->close operation then the driver probably needs to release
- * per-vma resources, so we don't attempt to merge those.
+ * per-vma resources, so we don't attempt to merge those in case the caller
+ * indicates the current vma may be removed as part of the merge.
*/
static inline bool is_mergeable_vma(struct vm_area_struct *vma,
struct file *file, unsigned long vm_flags,
struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name)
+ struct anon_vma_name *anon_name, bool may_remove_vma)
{
/*
* VM_SOFTDIRTY should not prevent from VMA merging, if we
@@ -761,7 +762,7 @@ static inline bool is_mergeable_vma(struct vm_area_struct *vma,
return false;
if (vma->vm_file != file)
return false;
- if (vma->vm_ops && vma->vm_ops->close)
+ if (may_remove_vma && vma->vm_ops && vma->vm_ops->close)
return false;
if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
return false;
@@ -793,6 +794,8 @@ static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
* We don't check here for the merged mmap wrapping around the end of pagecache
* indices (16TB on ia32) because do_mmap() does not permit mmap's which
* wrap, nor mmaps which cover the final page at index -1UL.
+ *
+ * We assume the vma may be removed as part of the merge.
*/
static bool
can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
@@ -800,7 +803,7 @@ can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
struct anon_vma_name *anon_name)
{
- if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
+ if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, true) &&
is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
if (vma->vm_pgoff == vm_pgoff)
return true;
@@ -814,6 +817,8 @@ can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
*
* We cannot merge two vmas if they have differently assigned (non-NULL)
* anon_vmas, nor if same anon_vma is assigned but offsets incompatible.
+ *
+ * We assume that vma is not removed as part of the merge.
*/
static bool
can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
@@ -821,7 +826,7 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
struct anon_vma_name *anon_name)
{
- if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
+ if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, false) &&
is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
pgoff_t vm_pglen;
vm_pglen = vma_pages(vma);
--
2.39.2
There are several places where we test if 'mid' is really the area NNNN
in the diagram and the tests have two variants and are non-obvious to
follow. Instead, set 'mid' to NULL up-front if it's not the NNNN area,
and simplify the tests.
Also update the description in comment accordingly.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/mmap.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index be60b344e4b1..3396c9b13f1c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -848,10 +848,11 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
*
* The following mprotect cases have to be considered, where AAAA is
* the area passed down from mprotect_fixup, never extending beyond one
- * vma, PPPPPP is the prev vma specified, and NNNNNN the next vma after:
+ * vma, PPPPPP is the prev vma specified, NNNN is a vma that overlaps
+ * the area AAAA and XXXXXX the next vma after AAAA:
*
* AAAA AAAA AAAA
- * PPPPPPNNNNNN PPPPPPXXXXXX PPPPPPNNNNNN
+ * PPPPPPXXXXXX PPPPPPXXXXXX PPPPPPNNNNNN
* cannot merge might become might become
* PPXXXXXXXXXX PPPPPPPPPPNN
* mmap, brk or case 4 below case 5 below
@@ -879,9 +880,10 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
*
* In the code below:
* PPPP is represented by *prev
- * NNNN is represented by *mid (and possibly equal to *next)
- * XXXX is represented by *next or not represented at all.
- * AAAA is not represented - it will be merged or the function will return NULL
+ * NNNN is represented by *mid or not represented at all (NULL)
+ * XXXX is represented by *next or not represented at all (NULL)
+ * AAAA is not represented - it will be merged and the vma containing the
+ * area is returned, or the function will return NULL
*/
struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
struct vm_area_struct *prev, unsigned long addr,
@@ -918,6 +920,9 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
else
next = mid;
+ if (mid && end <= mid->vm_start)
+ mid = NULL;
+
/* verify some invariant that must be enforced by the caller */
VM_WARN_ON(prev && addr <= prev->vm_start);
VM_WARN_ON(mid && end > mid->vm_end);
@@ -952,7 +957,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
remove = next; /* case 1 */
vma_end = next->vm_end;
err = dup_anon_vma(prev, next);
- if (mid != next) { /* case 6 */
+ if (mid) { /* case 6 */
remove = mid;
remove2 = next;
if (!next->anon_vma)
@@ -960,7 +965,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
}
} else if (merge_prev) {
err = 0; /* case 2 */
- if (mid && end > mid->vm_start) {
+ if (mid) {
err = dup_anon_vma(prev, mid);
if (end == mid->vm_end) { /* case 7 */
remove = mid;
@@ -982,7 +987,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
vma_end = next->vm_end;
vma_pgoff = next->vm_pgoff;
err = 0;
- if (mid != next) { /* case 8 */
+ if (mid) { /* case 8 */
vma_pgoff = mid->vm_pgoff;
remove = mid;
err = dup_anon_vma(next, mid);
--
2.39.2
Case 1 is now shown in the comment as next vma being merged with prev,
so use 'next' instead of 'mid'. In case 1 they both point to the same
vma.
As a consequence, in case 6, the dup_anon_vma() is now tried first on
'next' and then on 'mid', before it was the opposite order. This is not
a functional change, as those two vma's cannnot have a different
anon_vma, as that would have prevented the merging in the first place.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/mmap.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 1af4c9bc2c87..c33237b283c9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -605,7 +605,7 @@ static inline void vma_complete(struct vma_prepare *vp,
/*
* In mprotect's case 6 (see comments on vma_merge),
- * we must remove the one after next as well.
+ * we are removing both mid and next vmas
*/
if (vp->remove2) {
vp->remove = vp->remove2;
@@ -948,13 +948,14 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
/* Can we merge both the predecessor and the successor? */
if (merge_prev && merge_next &&
is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) {
- remove = mid; /* case 1 */
+ remove = next; /* case 1 */
vma_end = next->vm_end;
- err = dup_anon_vma(prev, mid);
+ err = dup_anon_vma(prev, next);
if (mid != next) { /* case 6 */
+ remove = mid;
remove2 = next;
- if (!mid->anon_vma)
- err = dup_anon_vma(prev, next);
+ if (!next->anon_vma)
+ err = dup_anon_vma(prev, mid);
}
} else if (merge_prev) {
err = 0; /* case 2 */
--
2.39.2
Almost all cases now use the 'next' pointer for the vma following
the merged area, and the cases diagram shows it as XXXX. Case 4 is
different as it uses 'mid' and NNNN, so change it for consistency. No
functional change.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/mmap.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index c33237b283c9..420d6847c94c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -851,9 +851,9 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
* vma, PPPPPP is the prev vma specified, and NNNNNN the next vma after:
*
* AAAA AAAA AAAA
- * PPPPPPNNNNNN PPPPPPNNNNNN PPPPPPNNNNNN
+ * PPPPPPNNNNNN PPPPPPXXXXXX PPPPPPNNNNNN
* cannot merge might become might become
- * PPNNNNNNNNNN PPPPPPPPPPNN
+ * PPXXXXXXXXXX PPPPPPPPPPNN
* mmap, brk or case 4 below case 5 below
* mremap move:
* AAAA AAAA
@@ -972,9 +972,9 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
res = next;
if (prev && addr < prev->vm_end) { /* case 4 */
vma_end = addr;
- adjust = mid;
+ adjust = next;
adj_next = -(prev->vm_end - addr);
- err = dup_anon_vma(mid, prev);
+ err = dup_anon_vma(next, prev);
} else {
vma = next; /* case 3 */
vma_start = addr;
--
2.39.2
In the merging preparation part of vma_merge(), some vma pointer
variables are assigned for later execution of the merge, but also read
from in the block itself. The code is easier follow and check against
the cases diagram in the comment if the code reads only from the
"primary" vma variables prev, mid, next instead. No functional change.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/mmap.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 740b54be3ed4..0a8b052e3022 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -950,16 +950,16 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) {
remove = mid; /* case 1 */
vma_end = next->vm_end;
- err = dup_anon_vma(res, remove);
+ err = dup_anon_vma(prev, mid);
if (mid != next) { /* case 6 */
remove2 = next;
- if (!remove->anon_vma)
- err = dup_anon_vma(res, remove2);
+ if (!mid->anon_vma)
+ err = dup_anon_vma(prev, next);
}
} else if (merge_prev) {
err = 0; /* case 2 */
if (mid && end > mid->vm_start) {
- err = dup_anon_vma(res, mid);
+ err = dup_anon_vma(prev, mid);
if (end == mid->vm_end) { /* case 7 */
remove = mid;
} else { /* case 5 */
@@ -972,8 +972,8 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
if (prev && addr < prev->vm_end) { /* case 4 */
vma_end = addr;
adjust = mid;
- adj_next = -(vma->vm_end - addr);
- err = dup_anon_vma(adjust, prev);
+ adj_next = -(prev->vm_end - addr);
+ err = dup_anon_vma(mid, prev);
} else {
vma = next; /* case 3 */
vma_start = addr;
@@ -982,7 +982,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
err = 0;
if (mid != next) { /* case 8 */
remove = mid;
- err = dup_anon_vma(res, remove);
+ err = dup_anon_vma(next, mid);
}
}
}
--
2.39.2
This effectively reverts d014cd7c1c35 ("mm, mremap: fix mremap()
expanding for vma's with vm_ops->close()"). After the recent changes,
vma_merge() is able to handle the expansion properly even when the vma
being expanded has a vm_ops->close operation, so we don't need to
special case it anymore.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/mremap.c | 20 ++++----------------
1 file changed, 4 insertions(+), 16 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index 411a85682b58..65f5b545601e 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1040,23 +1040,11 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
* vma (expand operation itself) and possibly also with
* the next vma if it becomes adjacent to the expanded
* vma and otherwise compatible.
- *
- * However, vma_merge() can currently fail due to
- * is_mergeable_vma() check for vm_ops->close (see the
- * comment there). Yet this should not prevent vma
- * expanding, so perform a simple expand for such vma.
- * Ideally the check for close op should be only done
- * when a vma would be actually removed due to a merge.
*/
- if (!vma->vm_ops || !vma->vm_ops->close) {
- vma = vma_merge(&vmi, mm, vma, extension_start,
- extension_end, vma->vm_flags, vma->anon_vma,
- vma->vm_file, extension_pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx, anon_vma_name(vma));
- } else if (vma_expand(&vmi, vma, vma->vm_start,
- addr + new_len, vma->vm_pgoff, NULL)) {
- vma = NULL;
- }
+ vma = vma_merge(&vmi, mm, vma, extension_start,
+ extension_end, vma->vm_flags, vma->anon_vma,
+ vma->vm_file, extension_pgoff, vma_policy(vma),
+ vma->vm_userfaultfd_ctx, anon_vma_name(vma));
if (!vma) {
vm_unacct_memory(pages);
ret = -ENOMEM;
--
2.39.2
The comments already mention returning 'true' so make the code match
them.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/mmap.c | 53 +++++++++++++++++++++++++----------------------------
1 file changed, 25 insertions(+), 28 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index c51d69592e4e..d20bbe9ec613 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -744,10 +744,10 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
* If the vma has a ->close operation then the driver probably needs to release
* per-vma resources, so we don't attempt to merge those.
*/
-static inline int is_mergeable_vma(struct vm_area_struct *vma,
- struct file *file, unsigned long vm_flags,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name)
+static inline bool is_mergeable_vma(struct vm_area_struct *vma,
+ struct file *file, unsigned long vm_flags,
+ struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+ struct anon_vma_name *anon_name)
{
/*
* VM_SOFTDIRTY should not prevent from VMA merging, if we
@@ -758,21 +758,20 @@ static inline int is_mergeable_vma(struct vm_area_struct *vma,
* extended instead.
*/
if ((vma->vm_flags ^ vm_flags) & ~VM_SOFTDIRTY)
- return 0;
+ return false;
if (vma->vm_file != file)
- return 0;
+ return false;
if (vma->vm_ops && vma->vm_ops->close)
- return 0;
+ return false;
if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
- return 0;
+ return false;
if (!anon_vma_name_eq(anon_vma_name(vma), anon_name))
- return 0;
- return 1;
+ return false;
+ return true;
}
-static inline int is_mergeable_anon_vma(struct anon_vma *anon_vma1,
- struct anon_vma *anon_vma2,
- struct vm_area_struct *vma)
+static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
+ struct anon_vma *anon_vma2, struct vm_area_struct *vma)
{
/*
* The list_is_singular() test is to avoid merging VMA cloned from
@@ -780,7 +779,7 @@ static inline int is_mergeable_anon_vma(struct anon_vma *anon_vma1,
*/
if ((!anon_vma1 || !anon_vma2) && (!vma ||
list_is_singular(&vma->anon_vma_chain)))
- return 1;
+ return true;
return anon_vma1 == anon_vma2;
}
@@ -795,19 +794,18 @@ static inline int is_mergeable_anon_vma(struct anon_vma *anon_vma1,
* indices (16TB on ia32) because do_mmap() does not permit mmap's which
* wrap, nor mmaps which cover the final page at index -1UL.
*/
-static int
+static bool
can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
- struct anon_vma *anon_vma, struct file *file,
- pgoff_t vm_pgoff,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name)
+ struct anon_vma *anon_vma, struct file *file,
+ pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+ struct anon_vma_name *anon_name)
{
if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
if (vma->vm_pgoff == vm_pgoff)
- return 1;
+ return true;
}
- return 0;
+ return false;
}
/*
@@ -817,21 +815,20 @@ can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
* We cannot merge two vmas if they have differently assigned (non-NULL)
* anon_vmas, nor if same anon_vma is assigned but offsets incompatible.
*/
-static int
+static bool
can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
- struct anon_vma *anon_vma, struct file *file,
- pgoff_t vm_pgoff,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name)
+ struct anon_vma *anon_vma, struct file *file,
+ pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+ struct anon_vma_name *anon_name)
{
if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
pgoff_t vm_pglen;
vm_pglen = vma_pages(vma);
if (vma->vm_pgoff + vm_pglen == vm_pgoff)
- return 1;
+ return true;
}
- return 0;
+ return false;
}
/*
--
2.39.2
The variable 'adj_next' holds the value by which we adjust vm_start of a
vma in variable 'adjust', that's either 'next' or 'mid', so the current
name is inaccurate. Rename it to 'adj_start'.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/mmap.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 3396c9b13f1c..c51d69592e4e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -903,7 +903,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
bool vma_expanded = false;
struct vma_prepare vp;
unsigned long vma_end = end;
- long adj_next = 0;
+ long adj_start = 0;
unsigned long vma_start = addr;
validate_mm(mm);
@@ -971,7 +971,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
remove = mid;
} else { /* case 5 */
adjust = mid;
- adj_next = (end - mid->vm_start);
+ adj_start = (end - mid->vm_start);
}
}
} else if (merge_next) {
@@ -979,7 +979,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
if (prev && addr < prev->vm_end) { /* case 4 */
vma_end = addr;
adjust = next;
- adj_next = -(prev->vm_end - addr);
+ adj_start = -(prev->vm_end - addr);
err = dup_anon_vma(next, prev);
} else {
vma = next; /* case 3 */
@@ -1002,7 +1002,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
if (vma_iter_prealloc(vmi))
return NULL;
- vma_adjust_trans_huge(vma, vma_start, vma_end, adj_next);
+ vma_adjust_trans_huge(vma, vma_start, vma_end, adj_start);
init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
vp.anon_vma != adjust->anon_vma);
@@ -1018,10 +1018,10 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
if (vma_expanded)
vma_iter_store(vmi, vma);
- if (adj_next) {
- adjust->vm_start += adj_next;
- adjust->vm_pgoff += adj_next >> PAGE_SHIFT;
- if (adj_next < 0) {
+ if (adj_start) {
+ adjust->vm_start += adj_start;
+ adjust->vm_pgoff += adj_start >> PAGE_SHIFT;
+ if (adj_start < 0) {
WARN_ON(vma_expanded);
vma_iter_store(vmi, next);
}
--
2.39.2
It is more intuitive to go from prev to mid and then next. No functional
change.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/mmap.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 420d6847c94c..be60b344e4b1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -912,10 +912,11 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
if (vm_flags & VM_SPECIAL)
return NULL;
- next = find_vma(mm, prev ? prev->vm_end : 0);
- mid = next;
- if (next && next->vm_end == end) /* cases 6, 7, 8 */
- next = find_vma(mm, next->vm_end);
+ mid = find_vma(mm, prev ? prev->vm_end : 0);
+ if (mid && mid->vm_end == end) /* cases 6, 7, 8 */
+ next = find_vma(mm, mid->vm_end);
+ else
+ next = mid;
/* verify some invariant that must be enforced by the caller */
VM_WARN_ON(prev && addr <= prev->vm_start);
--
2.39.2
On Thu, Mar 09, 2023 at 12:12:55PM +0100, Vlastimil Babka wrote:
> The variable 'adj_next' holds the value by which we adjust vm_start of a
> vma in variable 'adjust', that's either 'next' or 'mid', so the current
> name is inaccurate. Rename it to 'adj_start'.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/mmap.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 3396c9b13f1c..c51d69592e4e 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -903,7 +903,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> bool vma_expanded = false;
> struct vma_prepare vp;
> unsigned long vma_end = end;
> - long adj_next = 0;
> + long adj_start = 0;
> unsigned long vma_start = addr;
>
> validate_mm(mm);
> @@ -971,7 +971,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> remove = mid;
> } else { /* case 5 */
> adjust = mid;
> - adj_next = (end - mid->vm_start);
> + adj_start = (end - mid->vm_start);
> }
> }
> } else if (merge_next) {
> @@ -979,7 +979,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> if (prev && addr < prev->vm_end) { /* case 4 */
> vma_end = addr;
> adjust = next;
> - adj_next = -(prev->vm_end - addr);
> + adj_start = -(prev->vm_end - addr);
> err = dup_anon_vma(next, prev);
> } else {
> vma = next; /* case 3 */
> @@ -1002,7 +1002,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> if (vma_iter_prealloc(vmi))
> return NULL;
>
> - vma_adjust_trans_huge(vma, vma_start, vma_end, adj_next);
> + vma_adjust_trans_huge(vma, vma_start, vma_end, adj_start);
> init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
> VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
> vp.anon_vma != adjust->anon_vma);
> @@ -1018,10 +1018,10 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> if (vma_expanded)
> vma_iter_store(vmi, vma);
>
> - if (adj_next) {
> - adjust->vm_start += adj_next;
> - adjust->vm_pgoff += adj_next >> PAGE_SHIFT;
> - if (adj_next < 0) {
> + if (adj_start) {
> + adjust->vm_start += adj_start;
> + adjust->vm_pgoff += adj_start >> PAGE_SHIFT;
> + if (adj_start < 0) {
> WARN_ON(vma_expanded);
> vma_iter_store(vmi, next);
> }
> --
> 2.39.2
>
It looks like Suren's dc72d59c416d ("mm/mmap: move vma_prepare before
vma_adjust_trans_huge") trivially conflicts with this change.
On Thu, Mar 09, 2023 at 12:12:49PM +0100, Vlastimil Babka wrote:
> In the merging preparation part of vma_merge(), some vma pointer
> variables are assigned for later execution of the merge, but also read
> from in the block itself. The code is easier follow and check against
> the cases diagram in the comment if the code reads only from the
> "primary" vma variables prev, mid, next instead. No functional change.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/mmap.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 740b54be3ed4..0a8b052e3022 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -950,16 +950,16 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) {
> remove = mid; /* case 1 */
> vma_end = next->vm_end;
> - err = dup_anon_vma(res, remove);
> + err = dup_anon_vma(prev, mid);
> if (mid != next) { /* case 6 */
> remove2 = next;
> - if (!remove->anon_vma)
> - err = dup_anon_vma(res, remove2);
> + if (!mid->anon_vma)
> + err = dup_anon_vma(prev, next);
> }
> } else if (merge_prev) {
> err = 0; /* case 2 */
> if (mid && end > mid->vm_start) {
> - err = dup_anon_vma(res, mid);
> + err = dup_anon_vma(prev, mid);
> if (end == mid->vm_end) { /* case 7 */
> remove = mid;
> } else { /* case 5 */
> @@ -972,8 +972,8 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> if (prev && addr < prev->vm_end) { /* case 4 */
> vma_end = addr;
> adjust = mid;
> - adj_next = -(vma->vm_end - addr);
> - err = dup_anon_vma(adjust, prev);
> + adj_next = -(prev->vm_end - addr);
> + err = dup_anon_vma(mid, prev);
> } else {
> vma = next; /* case 3 */
> vma_start = addr;
> @@ -982,7 +982,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> err = 0;
> if (mid != next) { /* case 8 */
> remove = mid;
> - err = dup_anon_vma(res, remove);
> + err = dup_anon_vma(next, mid);
> }
> }
> }
> --
> 2.39.2
>
Big improvement in readability already here.
Reviewed-by: Lorenzo Stoakes <[email protected]>
On Thu, Mar 09, 2023 at 12:12:50PM +0100, Vlastimil Babka wrote:
> In case 3 we we use 'next' for everything but vma_pgoff. So use 'next'
> for that as well, instead of 'mid', for consistency. Then in case 8 we
> have to use 'mid' explicitly, which should also make the intent more
> obvious.
>
> Adjust the diagram for cases 1-3 in the comment to match the code - we
> are using 'next' for case 3 so mark the range with XXXX instead of NNNN.
> For case 2 that's a no-op as the code doesn't touch 'next' or 'mid'. For
> case 1 it's now wrong but that will be fixed next.
>
> No functional change.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/mmap.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 0a8b052e3022..1af4c9bc2c87 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -857,11 +857,11 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
> * mmap, brk or case 4 below case 5 below
> * mremap move:
> * AAAA AAAA
> - * PPPP NNNN PPPPNNNNXXXX
> + * PPPP XXXX PPPPNNNNXXXX
> * might become might become
> * PPPPPPPPPPPP 1 or PPPPPPPPPPPP 6 or
> - * PPPPPPPPNNNN 2 or PPPPPPPPXXXX 7 or
> - * PPPPNNNNNNNN 3 PPPPXXXXXXXX 8
> + * PPPPPPPPXXXX 2 or PPPPPPPPXXXX 7 or
> + * PPPPXXXXXXXX 3 PPPPXXXXXXXX 8
> *
I'm glad you're making things more consistent and what you're addressing here is
a real clanger, but these diagrams while great to have do definitely feel
quite confusing even now. But that's something for a future patch!
> * It is important for case 8 that the vma NNNN overlapping the
> * region AAAA is never going to extended over XXXX. Instead XXXX must
> @@ -978,9 +978,10 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> vma = next; /* case 3 */
> vma_start = addr;
> vma_end = next->vm_end;
> - vma_pgoff = mid->vm_pgoff;
> + vma_pgoff = next->vm_pgoff;
> err = 0;
> if (mid != next) { /* case 8 */
> + vma_pgoff = mid->vm_pgoff;
> remove = mid;
> err = dup_anon_vma(next, mid);
> }
> --
> 2.39.2
>
This does fix a big incongruity in that previously everything but vm_pgoff was
relative to next, while in the non-8 case mid is equal to next anyway.
Good, clarifying improvement!
Reviewed-by: Lorenzo Stoakes <[email protected]>
On Thu, Mar 09, 2023 at 12:12:51PM +0100, Vlastimil Babka wrote:
> Case 1 is now shown in the comment as next vma being merged with prev,
> so use 'next' instead of 'mid'. In case 1 they both point to the same
> vma.
>
> As a consequence, in case 6, the dup_anon_vma() is now tried first on
> 'next' and then on 'mid', before it was the opposite order. This is not
> a functional change, as those two vma's cannnot have a different
> anon_vma, as that would have prevented the merging in the first place.
>
This makes me wonder whether there might be further simplifications based upon
known conditions of mergeability to be had (as e.g. is_mergeable_anon_vma()
would have prevented otherwise). But perhaps I will discover these later in the
series :)
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/mmap.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 1af4c9bc2c87..c33237b283c9 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -605,7 +605,7 @@ static inline void vma_complete(struct vma_prepare *vp,
>
> /*
> * In mprotect's case 6 (see comments on vma_merge),
> - * we must remove the one after next as well.
> + * we are removing both mid and next vmas
> */
> if (vp->remove2) {
> vp->remove = vp->remove2;
> @@ -948,13 +948,14 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> /* Can we merge both the predecessor and the successor? */
> if (merge_prev && merge_next &&
> is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) {
> - remove = mid; /* case 1 */
> + remove = next; /* case 1 */
> vma_end = next->vm_end;
> - err = dup_anon_vma(prev, mid);
> + err = dup_anon_vma(prev, next);
> if (mid != next) { /* case 6 */
> + remove = mid;
> remove2 = next;
> - if (!mid->anon_vma)
> - err = dup_anon_vma(prev, next);
> + if (!next->anon_vma)
> + err = dup_anon_vma(prev, mid);
> }
> } else if (merge_prev) {
> err = 0; /* case 2 */
> --
> 2.39.2
>
Reviewed-by: Lorenzo Stoakes <[email protected]>
On Thu, Mar 09, 2023 at 12:12:52PM +0100, Vlastimil Babka wrote:
> Almost all cases now use the 'next' pointer for the vma following
> the merged area, and the cases diagram shows it as XXXX. Case 4 is
> different as it uses 'mid' and NNNN, so change it for consistency. No
> functional change.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/mmap.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index c33237b283c9..420d6847c94c 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -851,9 +851,9 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
> * vma, PPPPPP is the prev vma specified, and NNNNNN the next vma after:
> *
> * AAAA AAAA AAAA
> - * PPPPPPNNNNNN PPPPPPNNNNNN PPPPPPNNNNNN
> + * PPPPPPNNNNNN PPPPPPXXXXXX PPPPPPNNNNNN
> * cannot merge might become might become
> - * PPNNNNNNNNNN PPPPPPPPPPNN
> + * PPXXXXXXXXXX PPPPPPPPPPNN
> * mmap, brk or case 4 below case 5 below
> * mremap move:
> * AAAA AAAA
> @@ -972,9 +972,9 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> res = next;
> if (prev && addr < prev->vm_end) { /* case 4 */
> vma_end = addr;
> - adjust = mid;
> + adjust = next;
> adj_next = -(prev->vm_end - addr);
> - err = dup_anon_vma(mid, prev);
> + err = dup_anon_vma(next, prev);
> } else {
> vma = next; /* case 3 */
> vma_start = addr;
> --
> 2.39.2
>
Reviewed-By: Lorenzo Stoakes <[email protected]>
On Thu, Mar 09, 2023 at 12:12:53PM +0100, Vlastimil Babka wrote:
> It is more intuitive to go from prev to mid and then next. No functional
> change.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/mmap.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 420d6847c94c..be60b344e4b1 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -912,10 +912,11 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> if (vm_flags & VM_SPECIAL)
> return NULL;
>
> - next = find_vma(mm, prev ? prev->vm_end : 0);
> - mid = next;
> - if (next && next->vm_end == end) /* cases 6, 7, 8 */
> - next = find_vma(mm, next->vm_end);
> + mid = find_vma(mm, prev ? prev->vm_end : 0);
> + if (mid && mid->vm_end == end) /* cases 6, 7, 8 */
> + next = find_vma(mm, mid->vm_end);
> + else
> + next = mid;
It feels like the original implementation may have been backwards like this just
to avoid this else branch. Which is silly.
>
> /* verify some invariant that must be enforced by the caller */
> VM_WARN_ON(prev && addr <= prev->vm_start);
> --
> 2.39.2
>
Reviewed-by: Lorenzo Stoakes <[email protected]>
On Thu, Mar 09, 2023 at 12:12:54PM +0100, Vlastimil Babka wrote:
> There are several places where we test if 'mid' is really the area NNNN
> in the diagram and the tests have two variants and are non-obvious to
> follow. Instead, set 'mid' to NULL up-front if it's not the NNNN area,
> and simplify the tests.
>
> Also update the description in comment accordingly.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/mmap.c | 21 +++++++++++++--------
> 1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index be60b344e4b1..3396c9b13f1c 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -848,10 +848,11 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
> *
> * The following mprotect cases have to be considered, where AAAA is
> * the area passed down from mprotect_fixup, never extending beyond one
> - * vma, PPPPPP is the prev vma specified, and NNNNNN the next vma after:
> + * vma, PPPPPP is the prev vma specified, NNNN is a vma that overlaps
> + * the area AAAA and XXXXXX the next vma after AAAA:
I think this is worded in a bit of a confusing way + can be read as 'NNNN is a
vma that overlaps the area AAAA and XXXXXX' whereas you mean to say 'NNNN is a
VMA that overlaps the area AAAA, and XXXXXX is the next vma after AAAA'.
This therefore might be better worded as:-
'PPPP is the previous VMA, NNNN is a VMA which overlaps AAAA and XXXX is the
next VMA after AAAA.'
Also - nit, but there's also inconsistency here between the number of letters in
each block, e.g. 6 P's 4 N's 4 A's and 6 X's.
'N' and 'X' are starting to be horrifically misleading here imo, I feel as if
'N' moving to 'O' (for overlapping) and 'X' to 'N' would make a big difference
here.
> *
> * AAAA AAAA AAAA
> - * PPPPPPNNNNNN PPPPPPXXXXXX PPPPPPNNNNNN
> + * PPPPPPXXXXXX PPPPPPXXXXXX PPPPPPNNNNNN
> * cannot merge might become might become
> * PPXXXXXXXXXX PPPPPPPPPPNN
> * mmap, brk or case 4 below case 5 below
> @@ -879,9 +880,10 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
> *
> * In the code below:
> * PPPP is represented by *prev
> - * NNNN is represented by *mid (and possibly equal to *next)
> - * XXXX is represented by *next or not represented at all.
> - * AAAA is not represented - it will be merged or the function will return NULL
> + * NNNN is represented by *mid or not represented at all (NULL)
> + * XXXX is represented by *next or not represented at all (NULL)
> + * AAAA is not represented - it will be merged and the vma containing the
> + * area is returned, or the function will return NULL
> */
> struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> struct vm_area_struct *prev, unsigned long addr,
> @@ -918,6 +920,9 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> else
> next = mid;
>
> + if (mid && end <= mid->vm_start)
> + mid = NULL;
> +
Might be worth putting a comment with the cases where this will happen, 1 - 4
right? And also something like 'does AAAA overlap with mid?'
And I really think renaming this to 'overlapping' or 'overlaps' or similar would
make a big readability difference.
However we do have the thorny issue of case 4 where A overlaps P... But probably
the fact that we treat this as a separate VMA from prev is enough to make it
clear it being called 'overlaps' means 'separate from prev, also overlaps' so I
think that's fine.
Adding this actually makes me think twice about the previous 'natural order'
patch, because the intuition which that promotes is:-
mid = VMA after prev
next = VMA after mid
[ prev ] [ mid ] [ next ]
But in reality that else branch means that next could be be equal to mid and
now if there isn't overlap we rename mid to next effectively, e.g.:-
mid = VMA after prev
next = mid
delete mid
Which feels like the 'natural' intuition is suddenly broken. Maybe this needs
reworking to be super explicit about this? Such as:-
struct vm_area_struct tmp;
...
/* If there is a previous VMA, find the next, otherwise find the first. */
tmp = find_vma(mm, prev ? prev->vm_end : 0);
/*
* If the address range overlaps with the input range (which can cover only a
* single VMA at most), then we are only interested in next if we span right up
* to its end.
*
* Otherwise we are simply left with prev and next.
*/
overlaps = tmp && end > tmp->vm_start ? tmp : NULL;
if (overlaps)
next = overlaps->vm_end == end ? overlaps->vm_next : NULL;
else
next = tmp;
Of course I haven't read the rest of the patches in this series so you may
address aspects of this already :)
> /* verify some invariant that must be enforced by the caller */
> VM_WARN_ON(prev && addr <= prev->vm_start);
> VM_WARN_ON(mid && end > mid->vm_end);
> @@ -952,7 +957,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> remove = next; /* case 1 */
> vma_end = next->vm_end;
> err = dup_anon_vma(prev, next);
> - if (mid != next) { /* case 6 */
> + if (mid) { /* case 6 */
> remove = mid;
> remove2 = next;
> if (!next->anon_vma)
> @@ -960,7 +965,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> }
> } else if (merge_prev) {
> err = 0; /* case 2 */
> - if (mid && end > mid->vm_start) {
> + if (mid) {
> err = dup_anon_vma(prev, mid);
> if (end == mid->vm_end) { /* case 7 */
> remove = mid;
> @@ -982,7 +987,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> vma_end = next->vm_end;
> vma_pgoff = next->vm_pgoff;
> err = 0;
> - if (mid != next) { /* case 8 */
> + if (mid) { /* case 8 */
> vma_pgoff = mid->vm_pgoff;
> remove = mid;
> err = dup_anon_vma(next, mid);
> --
> 2.39.2
>
Other than the nitty comment notes and the conceptual discussion, this LGTM so:-
Reviewed-By: Lorenzo Stoakes <[email protected]>
On Thu, Mar 09, 2023 at 12:12:55PM +0100, Vlastimil Babka wrote:
> The variable 'adj_next' holds the value by which we adjust vm_start of a
> vma in variable 'adjust', that's either 'next' or 'mid', so the current
> name is inaccurate. Rename it to 'adj_start'.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/mmap.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 3396c9b13f1c..c51d69592e4e 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -903,7 +903,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> bool vma_expanded = false;
> struct vma_prepare vp;
> unsigned long vma_end = end;
> - long adj_next = 0;
> + long adj_start = 0;
> unsigned long vma_start = addr;
>
> validate_mm(mm);
> @@ -971,7 +971,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> remove = mid;
> } else { /* case 5 */
> adjust = mid;
> - adj_next = (end - mid->vm_start);
> + adj_start = (end - mid->vm_start);
> }
> }
> } else if (merge_next) {
> @@ -979,7 +979,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> if (prev && addr < prev->vm_end) { /* case 4 */
> vma_end = addr;
> adjust = next;
> - adj_next = -(prev->vm_end - addr);
> + adj_start = -(prev->vm_end - addr);
> err = dup_anon_vma(next, prev);
> } else {
> vma = next; /* case 3 */
> @@ -1002,7 +1002,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> if (vma_iter_prealloc(vmi))
> return NULL;
>
> - vma_adjust_trans_huge(vma, vma_start, vma_end, adj_next);
> + vma_adjust_trans_huge(vma, vma_start, vma_end, adj_start);
> init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
> VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
> vp.anon_vma != adjust->anon_vma);
> @@ -1018,10 +1018,10 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> if (vma_expanded)
> vma_iter_store(vmi, vma);
>
> - if (adj_next) {
> - adjust->vm_start += adj_next;
> - adjust->vm_pgoff += adj_next >> PAGE_SHIFT;
> - if (adj_next < 0) {
> + if (adj_start) {
> + adjust->vm_start += adj_start;
> + adjust->vm_pgoff += adj_start >> PAGE_SHIFT;
> + if (adj_start < 0) {
> WARN_ON(vma_expanded);
> vma_iter_store(vmi, next);
> }
> --
> 2.39.2
>
This feels like a very silly oversight in the first instance but I imagine one
that is a result of 'code evolution' :) excellent improvement in clarity, so:-
Reviewed-By: Lorenzo Stoakes <[email protected]>
On Thu, Mar 09, 2023 at 12:12:56PM +0100, Vlastimil Babka wrote:
> The comments already mention returning 'true' so make the code match
> them.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/mmap.c | 53 +++++++++++++++++++++++++----------------------------
> 1 file changed, 25 insertions(+), 28 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index c51d69592e4e..d20bbe9ec613 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -744,10 +744,10 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
> * If the vma has a ->close operation then the driver probably needs to release
> * per-vma resources, so we don't attempt to merge those.
> */
> -static inline int is_mergeable_vma(struct vm_area_struct *vma,
> - struct file *file, unsigned long vm_flags,
> - struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> - struct anon_vma_name *anon_name)
> +static inline bool is_mergeable_vma(struct vm_area_struct *vma,
> + struct file *file, unsigned long vm_flags,
> + struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> + struct anon_vma_name *anon_name)
> {
> /*
> * VM_SOFTDIRTY should not prevent from VMA merging, if we
> @@ -758,21 +758,20 @@ static inline int is_mergeable_vma(struct vm_area_struct *vma,
> * extended instead.
> */
> if ((vma->vm_flags ^ vm_flags) & ~VM_SOFTDIRTY)
> - return 0;
> + return false;
> if (vma->vm_file != file)
> - return 0;
> + return false;
> if (vma->vm_ops && vma->vm_ops->close)
> - return 0;
> + return false;
> if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
> - return 0;
> + return false;
> if (!anon_vma_name_eq(anon_vma_name(vma), anon_name))
> - return 0;
> - return 1;
> + return false;
> + return true;
> }
>
> -static inline int is_mergeable_anon_vma(struct anon_vma *anon_vma1,
> - struct anon_vma *anon_vma2,
> - struct vm_area_struct *vma)
> +static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
> + struct anon_vma *anon_vma2, struct vm_area_struct *vma)
> {
> /*
> * The list_is_singular() test is to avoid merging VMA cloned from
> @@ -780,7 +779,7 @@ static inline int is_mergeable_anon_vma(struct anon_vma *anon_vma1,
> */
> if ((!anon_vma1 || !anon_vma2) && (!vma ||
> list_is_singular(&vma->anon_vma_chain)))
> - return 1;
> + return true;
> return anon_vma1 == anon_vma2;
> }
>
> @@ -795,19 +794,18 @@ static inline int is_mergeable_anon_vma(struct anon_vma *anon_vma1,
> * indices (16TB on ia32) because do_mmap() does not permit mmap's which
> * wrap, nor mmaps which cover the final page at index -1UL.
> */
> -static int
> +static bool
> can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
> - struct anon_vma *anon_vma, struct file *file,
> - pgoff_t vm_pgoff,
> - struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> - struct anon_vma_name *anon_name)
> + struct anon_vma *anon_vma, struct file *file,
> + pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> + struct anon_vma_name *anon_name)
> {
> if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
> is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
> if (vma->vm_pgoff == vm_pgoff)
> - return 1;
> + return true;
> }
> - return 0;
> + return false;
> }
>
> /*
> @@ -817,21 +815,20 @@ can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
> * We cannot merge two vmas if they have differently assigned (non-NULL)
> * anon_vmas, nor if same anon_vma is assigned but offsets incompatible.
> */
> -static int
> +static bool
> can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
> - struct anon_vma *anon_vma, struct file *file,
> - pgoff_t vm_pgoff,
> - struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> - struct anon_vma_name *anon_name)
> + struct anon_vma *anon_vma, struct file *file,
> + pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> + struct anon_vma_name *anon_name)
> {
> if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
> is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
> pgoff_t vm_pglen;
> vm_pglen = vma_pages(vma);
> if (vma->vm_pgoff + vm_pglen == vm_pgoff)
> - return 1;
> + return true;
> }
> - return 0;
> + return false;
> }
>
> /*
> --
> 2.39.2
>
Partying like it's C99 :) Good improvement, feels sensible to move on from using
return types as if the 1st edition of The C Programming Language were still
current! Therefore,
Reviewed-By: Lorenzo Stoakes <[email protected]>
On Thu, Mar 09, 2023 at 12:12:57PM +0100, Vlastimil Babka wrote:
> Since pre-git times, is_mergeable_vma() returns false for a vma with
> vm_ops->close, so that no owner assumptions are violated in case the vma
> is removed as part of the merge.
>
> This check is currently very conservative and can prevent merging even
> situations where vma can't be removed, such as simple expansion of
> previous vma, as evidenced by commit d014cd7c1c35 ("mm, mremap: fix
> mremap() expanding for vma's with vm_ops->close()")
>
> In order to allow more merging when appropriate and simplify the code
> that was made more complex by commit d014cd7c1c35, start distinguishing
> cases where the vma can be really removed, and allow merging with
> vm_ops->close otherwise.
>
> As a first step, add a may_remove_vma parameter to is_mergeable_vma().
> can_vma_merge_before() sets it to true, because when called from
> vma_merge(), a removal of the vma is possible.
>
> In can_vma_merge_after(), pass the parameter as false, because no
> removal can occur in each of its callers:
> - vma_merge() calls it on the 'prev' vma, which is never removed
> - mmap_region() and do_brk_flags() call it to determine if it can expand
> a vma, which is not removed
>
> As a result, vma's with vm_ops->close may now merge with compatible
> ranges in more situations than previously. We can also revert commit
> d014cd7c1c35 as the next step to simplify mremap code again.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/mmap.c | 15 ++++++++++-----
> 1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index d20bbe9ec613..65503ea07f32 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -742,12 +742,13 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
>
> /*
> * If the vma has a ->close operation then the driver probably needs to release
> - * per-vma resources, so we don't attempt to merge those.
> + * per-vma resources, so we don't attempt to merge those in case the caller
> + * indicates the current vma may be removed as part of the merge.
Nit: 'in case the caller indicates' -> 'if the caller indicates'
> */
> static inline bool is_mergeable_vma(struct vm_area_struct *vma,
> struct file *file, unsigned long vm_flags,
> struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> - struct anon_vma_name *anon_name)
> + struct anon_vma_name *anon_name, bool may_remove_vma)
> {
> /*
> * VM_SOFTDIRTY should not prevent from VMA merging, if we
> @@ -761,7 +762,7 @@ static inline bool is_mergeable_vma(struct vm_area_struct *vma,
> return false;
> if (vma->vm_file != file)
> return false;
> - if (vma->vm_ops && vma->vm_ops->close)
> + if (may_remove_vma && vma->vm_ops && vma->vm_ops->close)
> return false;
> if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
> return false;
> @@ -793,6 +794,8 @@ static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
> * We don't check here for the merged mmap wrapping around the end of pagecache
> * indices (16TB on ia32) because do_mmap() does not permit mmap's which
> * wrap, nor mmaps which cover the final page at index -1UL.
> + *
> + * We assume the vma may be removed as part of the merge.
> */
> static bool
> can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
> @@ -800,7 +803,7 @@ can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
> pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> struct anon_vma_name *anon_name)
> {
> - if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
> + if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, true) &&
> is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
> if (vma->vm_pgoff == vm_pgoff)
> return true;
> @@ -814,6 +817,8 @@ can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
> *
> * We cannot merge two vmas if they have differently assigned (non-NULL)
> * anon_vmas, nor if same anon_vma is assigned but offsets incompatible.
> + *
> + * We assume that vma is not removed as part of the merge.
> */
> static bool
> can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
> @@ -821,7 +826,7 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
> pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> struct anon_vma_name *anon_name)
> {
> - if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
> + if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, false) &&
> is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
> pgoff_t vm_pglen;
> vm_pglen = vma_pages(vma);
> --
> 2.39.2
>
I wonder whether this might be moved later into the actual vma_merge() logic so
we only abort a merge at the point a VMA with ->close() would otherwise be
removed? But obviously that would probably need some further clean up to make it
not add yet more complexity to an already very complicated bit of logic.
Otherwise, very nice, clear + conservative so,
Reviewed-By: Lorenzo Stoakes <[email protected]>
On Thu, Mar 09, 2023 at 12:12:58PM +0100, Vlastimil Babka wrote:
> This effectively reverts d014cd7c1c35 ("mm, mremap: fix mremap()
> expanding for vma's with vm_ops->close()"). After the recent changes,
> vma_merge() is able to handle the expansion properly even when the vma
> being expanded has a vm_ops->close operation, so we don't need to
> special case it anymore.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/mremap.c | 20 ++++----------------
> 1 file changed, 4 insertions(+), 16 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 411a85682b58..65f5b545601e 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -1040,23 +1040,11 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> * vma (expand operation itself) and possibly also with
> * the next vma if it becomes adjacent to the expanded
> * vma and otherwise compatible.
> - *
> - * However, vma_merge() can currently fail due to
> - * is_mergeable_vma() check for vm_ops->close (see the
> - * comment there). Yet this should not prevent vma
> - * expanding, so perform a simple expand for such vma.
> - * Ideally the check for close op should be only done
> - * when a vma would be actually removed due to a merge.
> */
> - if (!vma->vm_ops || !vma->vm_ops->close) {
> - vma = vma_merge(&vmi, mm, vma, extension_start,
> - extension_end, vma->vm_flags, vma->anon_vma,
> - vma->vm_file, extension_pgoff, vma_policy(vma),
> - vma->vm_userfaultfd_ctx, anon_vma_name(vma));
> - } else if (vma_expand(&vmi, vma, vma->vm_start,
> - addr + new_len, vma->vm_pgoff, NULL)) {
> - vma = NULL;
> - }
> + vma = vma_merge(&vmi, mm, vma, extension_start,
> + extension_end, vma->vm_flags, vma->anon_vma,
> + vma->vm_file, extension_pgoff, vma_policy(vma),
> + vma->vm_userfaultfd_ctx, anon_vma_name(vma));
> if (!vma) {
> vm_unacct_memory(pages);
> ret = -ENOMEM;
> --
> 2.39.2
>
Good to eliminate this edge case! Do we have a self-test for this case to assert
that the issue is fixed by this? I guess a little tricky due to the need for the
the owning VMA to have ->close() specified.
In any case, the changes you have made in the previous patch should ensure the
edge case is no longer required, hence:-
Reviewed-by: Lorenzo Stoakes <[email protected]>
On 3/15/23 23:20, Lorenzo Stoakes wrote:
> On Thu, Mar 09, 2023 at 12:12:58PM +0100, Vlastimil Babka wrote:
>> This effectively reverts d014cd7c1c35 ("mm, mremap: fix mremap()
>> expanding for vma's with vm_ops->close()"). After the recent changes,
>> vma_merge() is able to handle the expansion properly even when the vma
>> being expanded has a vm_ops->close operation, so we don't need to
>> special case it anymore.
>>
>> Signed-off-by: Vlastimil Babka <[email protected]>
>> ---
>> mm/mremap.c | 20 ++++----------------
>> 1 file changed, 4 insertions(+), 16 deletions(-)
>>
>> diff --git a/mm/mremap.c b/mm/mremap.c
>> index 411a85682b58..65f5b545601e 100644
>> --- a/mm/mremap.c
>> +++ b/mm/mremap.c
>> @@ -1040,23 +1040,11 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
>> * vma (expand operation itself) and possibly also with
>> * the next vma if it becomes adjacent to the expanded
>> * vma and otherwise compatible.
>> - *
>> - * However, vma_merge() can currently fail due to
>> - * is_mergeable_vma() check for vm_ops->close (see the
>> - * comment there). Yet this should not prevent vma
>> - * expanding, so perform a simple expand for such vma.
>> - * Ideally the check for close op should be only done
>> - * when a vma would be actually removed due to a merge.
>> */
>> - if (!vma->vm_ops || !vma->vm_ops->close) {
>> - vma = vma_merge(&vmi, mm, vma, extension_start,
>> - extension_end, vma->vm_flags, vma->anon_vma,
>> - vma->vm_file, extension_pgoff, vma_policy(vma),
>> - vma->vm_userfaultfd_ctx, anon_vma_name(vma));
>> - } else if (vma_expand(&vmi, vma, vma->vm_start,
>> - addr + new_len, vma->vm_pgoff, NULL)) {
>> - vma = NULL;
>> - }
>> + vma = vma_merge(&vmi, mm, vma, extension_start,
>> + extension_end, vma->vm_flags, vma->anon_vma,
>> + vma->vm_file, extension_pgoff, vma_policy(vma),
>> + vma->vm_userfaultfd_ctx, anon_vma_name(vma));
>> if (!vma) {
>> vm_unacct_memory(pages);
>> ret = -ENOMEM;
>> --
>> 2.39.2
>>
>
> Good to eliminate this edge case! Do we have a self-test for this case to assert
> that the issue is fixed by this? I guess a little tricky due to the need for the
> the owning VMA to have ->close() specified.
Yeah that's the problem, it needs some specific setup, unlike the existing
tests.
> In any case, the changes you have made in the previous patch should ensure the
> edge case is no longer required, hence:-
>
> Reviewed-by: Lorenzo Stoakes <[email protected]>
Thanks!
On 3/15/23 22:34, Lorenzo Stoakes wrote:
> On Thu, Mar 09, 2023 at 12:12:54PM +0100, Vlastimil Babka wrote:
>> There are several places where we test if 'mid' is really the area NNNN
>> in the diagram and the tests have two variants and are non-obvious to
>> follow. Instead, set 'mid' to NULL up-front if it's not the NNNN area,
>> and simplify the tests.
>>
>> Also update the description in comment accordingly.
>>
>> Signed-off-by: Vlastimil Babka <[email protected]>
>> ---
>> mm/mmap.c | 21 +++++++++++++--------
>> 1 file changed, 13 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index be60b344e4b1..3396c9b13f1c 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -848,10 +848,11 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
>> *
>> * The following mprotect cases have to be considered, where AAAA is
>> * the area passed down from mprotect_fixup, never extending beyond one
>> - * vma, PPPPPP is the prev vma specified, and NNNNNN the next vma after:
>> + * vma, PPPPPP is the prev vma specified, NNNN is a vma that overlaps
>> + * the area AAAA and XXXXXX the next vma after AAAA:
>
> I think this is worded in a bit of a confusing way + can be read as 'NNNN is a
> vma that overlaps the area AAAA and XXXXXX' whereas you mean to say 'NNNN is a
> VMA that overlaps the area AAAA, and XXXXXX is the next vma after AAAA'.
>
> This therefore might be better worded as:-
>
> 'PPPP is the previous VMA, NNNN is a VMA which overlaps AAAA and XXXX is the
> next VMA after AAAA.'
>
> Also - nit, but there's also inconsistency here between the number of letters in
> each block, e.g. 6 P's 4 N's 4 A's and 6 X's.
OK, I fixed that up (-fix patch below), thanks. Note that it's not just
"overlaps" for NNNN, it also has to align at the start of AAAA, so I made
that explicit in the comment. It also means PPPP no longer "overlaps" by
this definition in case 4.
> 'N' and 'X' are starting to be horrifically misleading here imo, I feel as if
> 'N' moving to 'O' (for overlapping) and 'X' to 'N' would make a big difference
> here.
I'll leave that possibility for a future patch as that's easier to done at
once after all those incremental changes here. But again note how
"overlapping" is not completely accurate word due to the start alignemnt.
>> *
>> * AAAA AAAA AAAA
>> - * PPPPPPNNNNNN PPPPPPXXXXXX PPPPPPNNNNNN
>> + * PPPPPPXXXXXX PPPPPPXXXXXX PPPPPPNNNNNN
>> * cannot merge might become might become
>> * PPXXXXXXXXXX PPPPPPPPPPNN
>> * mmap, brk or case 4 below case 5 below
>> @@ -879,9 +880,10 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
>> *
>> * In the code below:
>> * PPPP is represented by *prev
>> - * NNNN is represented by *mid (and possibly equal to *next)
>> - * XXXX is represented by *next or not represented at all.
>> - * AAAA is not represented - it will be merged or the function will return NULL
>> + * NNNN is represented by *mid or not represented at all (NULL)
>> + * XXXX is represented by *next or not represented at all (NULL)
>> + * AAAA is not represented - it will be merged and the vma containing the
>> + * area is returned, or the function will return NULL
>> */
>> struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
>> struct vm_area_struct *prev, unsigned long addr,
>> @@ -918,6 +920,9 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
>> else
>> next = mid;
>>
>> + if (mid && end <= mid->vm_start)
>> + mid = NULL;
>> +
>
> Might be worth putting a comment with the cases where this will happen, 1 - 4
> right? And also something like 'does AAAA overlap with mid?'
Added to the -fix patch below, with slightly different comment.
> And I really think renaming this to 'overlapping' or 'overlaps' or similar would
> make a big readability difference.
Yeah but it's quite long word and again not completely self explanatory.
> However we do have the thorny issue of case 4 where A overlaps P... But probably
> the fact that we treat this as a separate VMA from prev is enough to make it
> clear it being called 'overlaps' means 'separate from prev, also overlaps' so I
> think that's fine.
>
> Adding this actually makes me think twice about the previous 'natural order'
> patch, because the intuition which that promotes is:-
>
> mid = VMA after prev
> next = VMA after mid
>
> [ prev ] [ mid ] [ next ]
>
> But in reality that else branch means that next could be be equal to mid and
> now if there isn't overlap we rename mid to next effectively, e.g.:-
>
> mid = VMA after prev
> next = mid
> delete mid
>
> Which feels like the 'natural' intuition is suddenly broken. Maybe this needs
> reworking to be super explicit about this? Such as:-
>
> struct vm_area_struct tmp;
>
> ...
>
> /* If there is a previous VMA, find the next, otherwise find the first. */
> tmp = find_vma(mm, prev ? prev->vm_end : 0);
>
> /*
> * If the address range overlaps with the input range (which can cover only a
> * single VMA at most), then we are only interested in next if we span right up
> * to its end.
> *
> * Otherwise we are simply left with prev and next.
> */
> overlaps = tmp && end > tmp->vm_start ? tmp : NULL;
> if (overlaps)
> next = overlaps->vm_end == end ? overlaps->vm_next : NULL;
> else
> next = tmp;
>
> Of course I haven't read the rest of the patches in this series so you may
> address aspects of this already :)
So as said above feel free to propose further followup in that direction.
You're right that in case 5 we should end up with next == NULL, in order to
be completely accurate. If we made sure next is only non-NULL if "end ==
next->vm_start" upfront, we could leave out that test later in "/* Can we
merge the successor? */".
>> /* verify some invariant that must be enforced by the caller */
>> VM_WARN_ON(prev && addr <= prev->vm_start);
>> VM_WARN_ON(mid && end > mid->vm_end);
>> @@ -952,7 +957,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
>> remove = next; /* case 1 */
>> vma_end = next->vm_end;
>> err = dup_anon_vma(prev, next);
>> - if (mid != next) { /* case 6 */
>> + if (mid) { /* case 6 */
>> remove = mid;
>> remove2 = next;
>> if (!next->anon_vma)
>> @@ -960,7 +965,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
>> }
>> } else if (merge_prev) {
>> err = 0; /* case 2 */
>> - if (mid && end > mid->vm_start) {
>> + if (mid) {
>> err = dup_anon_vma(prev, mid);
>> if (end == mid->vm_end) { /* case 7 */
>> remove = mid;
>> @@ -982,7 +987,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
>> vma_end = next->vm_end;
>> vma_pgoff = next->vm_pgoff;
>> err = 0;
>> - if (mid != next) { /* case 8 */
>> + if (mid) { /* case 8 */
>> vma_pgoff = mid->vm_pgoff;
>> remove = mid;
>> err = dup_anon_vma(next, mid);
>> --
>> 2.39.2
>>
>
> Other than the nitty comment notes and the conceptual discussion, this LGTM so:-
>
> Reviewed-By: Lorenzo Stoakes <[email protected]>
Thanks! Here's the -fix patch:
----8<----
From 1016590e31f0173070daffd905c3396607a68b4b Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <[email protected]>
Date: Thu, 16 Mar 2023 10:56:04 +0100
Subject: [PATCH] mm/mmap/vma_merge: set mid to NULL if not applicable-fix
Adjust/add comments as suggested by Lorenzo.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/mmap.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 3396c9b13f1c..cd0b0d1f4aeb 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -848,8 +848,9 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
*
* The following mprotect cases have to be considered, where AAAA is
* the area passed down from mprotect_fixup, never extending beyond one
- * vma, PPPPPP is the prev vma specified, NNNN is a vma that overlaps
- * the area AAAA and XXXXXX the next vma after AAAA:
+ * vma, PPPP is the previous vma, NNNN is a vma that starts at the same
+ * address as AAAA and is of the same or larger span, and XXXX the next
+ * vma after AAAA:
*
* AAAA AAAA AAAA
* PPPPPPXXXXXX PPPPPPXXXXXX PPPPPPNNNNNN
@@ -920,6 +921,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
else
next = mid;
+ /* In cases 1 - 4 there's no NNNN vma */
if (mid && end <= mid->vm_start)
mid = NULL;
--
2.39.2
On 3/15/23 23:05, Lorenzo Stoakes wrote:
> On Thu, Mar 09, 2023 at 12:12:57PM +0100, Vlastimil Babka wrote:
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -742,12 +742,13 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
>>
>> /*
>> * If the vma has a ->close operation then the driver probably needs to release
>> - * per-vma resources, so we don't attempt to merge those.
>> + * per-vma resources, so we don't attempt to merge those in case the caller
>> + * indicates the current vma may be removed as part of the merge.
>
> Nit: 'in case the caller indicates' -> 'if the caller indicates'
Ok, -fix patch below.
...
>> - if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
>> + if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, false) &&
>> is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
>> pgoff_t vm_pglen;
>> vm_pglen = vma_pages(vma);
>> --
>> 2.39.2
>>
>
> I wonder whether this might be moved later into the actual vma_merge() logic so
> we only abort a merge at the point a VMA with ->close() would otherwise be
> removed? But obviously that would probably need some further clean up to make it
> not add yet more complexity to an already very complicated bit of logic.
Yeah, can try that later to cover the remaining cases where
->close prevents merging unnecessarily.
> Otherwise, very nice, clear + conservative so,
>
> Reviewed-By: Lorenzo Stoakes <[email protected]>
Thanks!
----8<----
From b8072720288491bdc887ebebbb099babcfb108a5 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <[email protected]>
Date: Thu, 16 Mar 2023 11:54:27 +0100
Subject: [PATCH] mm/mmap: start distinguishing if vma can be removed in
mergeability test-fix
Adjust comment as suggested by Lorenzo.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/mmap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 4259095a2982..b255e6352710 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -742,8 +742,8 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
/*
* If the vma has a ->close operation then the driver probably needs to release
- * per-vma resources, so we don't attempt to merge those in case the caller
- * indicates the current vma may be removed as part of the merge.
+ * per-vma resources, so we don't attempt to merge those if the caller indicates
+ * the current vma may be removed as part of the merge.
*/
static inline bool is_mergeable_vma(struct vm_area_struct *vma,
struct file *file, unsigned long vm_flags,
--
2.39.2
Sorry for the late review, I've been away.
For the whole series,
Reviewed-by: Liam R. Howlett <[email protected]>
* Vlastimil Babka <[email protected]> [230309 06:18]:
> Also available in git:
> https://git.kernel.org/vbabka/h/vma_merge_cleanup-v1r2
>
> Changes since RFC:
> - rebased to 6.3-rc1, dropped first patch (urgent fix) that was merged there
> - reindent parameters of mergeability checks (suggested by willy on IRC)
>
> My initial goal here was to try making the check for vm_ops->close in
> is_mergeable_vma() only be applied for vma's that would be truly removed
> as part of the merge (see Patch 9). This would then allow reverting the
> quick fix d014cd7c1c35 ("mm, mremap: fix mremap() expanding for vma's
> with vm_ops->close()"). This was successful enough to allow the revert
> (Patch 10). Checks using can_vma_merge_before() are still pessimistic
> about possible vma removal, and making them precise would probably
> complicate the vma_merge() code too much.
>
> Liam's 6.3-rc1 simplification of vma_merge() and removal of
> __vma_adjust() was very much helpful in understanding the vma_merge()
> implementation and especially when vma removals can happen, which is now
> very obvious. While studing the code, I've found ways to make it
> hopefully even more easy to follow, so that's the patches 1-8. That made
> me also notice a bug that's now already fixed in 6.3-rc1.
>
> Vlastimil Babka (10):
> mm/mmap/vma_merge: use only primary pointers for preparing merge
> mm/mmap/vma_merge: use the proper vma pointer in case 3
> mm/mmap/vma_merge: use the proper vma pointers in cases 1 and 6
> mm/mmap/vma_merge: use the proper vma pointer in case 4
> mm/mmap/vma_merge: initialize mid and next in natural order
> mm/mmap/vma_merge: set mid to NULL if not applicable
> mm/mmap/vma_merge: rename adj_next to adj_start
> mm/mmap/vma_merge: convert mergeability checks to return bool
> mm/mmap: start distinguishing if vma can be removed in mergeability
> test
> mm/mremap: simplify vma expansion again
>
> mm/mmap.c | 142 ++++++++++++++++++++++++++++------------------------
> mm/mremap.c | 20 ++------
> 2 files changed, 80 insertions(+), 82 deletions(-)
>
> --
> 2.39.2
>