2023-10-08 20:23:40

by Lorenzo Stoakes

Subject: [PATCH 0/4] Abstract vma_merge() and split_vma()

The vma_merge() interface is very confusing and its implementation has led
to numerous bugs as a result of that confusion.

In addition there is duplication both in invocation of vma_merge(), but
also in the common mprotect()-style pattern of attempting a merge, then if
this fails, splitting the portion of a VMA about to have its attributes
changed.

This pattern has been copy/pasted around the kernel in each instance where
such an operation has been required, each very slightly modified from the
last to make it even harder to decipher what is going on.

Simplify the whole thing by dividing the actual uses of vma_merge() and
split_vma() into specific and abstracted functions and de-duplicate the
vma_merge()/split_vma() pattern altogether.

Doing so also opens the door to changing how vma_merge() is implemented -
by knowing precisely what cases a caller is invoking rather than having a
central interface where anything might happen, we can untangle the brittle
and confusing vma_merge() implementation into something more workable.

For mprotect()-like cases we introduce vma_modify() which performs the
vma_merge()/split_vma() pattern, returning a pointer or an ERR_PTR(err) if
the splits fail.

This is an internal interface, as it is confusing having a number of
different parameters available for fields that can be changed. Instead we
split the kernel interface into four functions:-

* vma_modify_flags() - Prepare to modify the VMA's flags.
* vma_modify_flags_name() - Prepare to modify the VMA's flags/anon_vma_name
* vma_modify_policy() - Prepare to modify the VMA's mempolicy.
* vma_modify_uffd() - Prepare to modify the VMA's flags/uffd context.

For cases where a new VMA is attempted to be merged with adjacent VMAs we
add:-

* vma_merge_new_vma() - Prepare to merge a new VMA.
* vma_merge_extend() - Prepare to extend the end of a new VMA.

Lorenzo Stoakes (4):
mm: abstract the vma_merge()/split_vma() pattern for mprotect() et al.
mm: make vma_merge() and split_vma() internal
mm: abstract merge for new VMAs into vma_merge_new_vma()
mm: abstract VMA extension and merge into vma_merge_extend() helper

fs/userfaultfd.c | 53 +++++----------
include/linux/mm.h | 32 ++++++---
mm/internal.h | 7 ++
mm/madvise.c | 25 ++-----
mm/mempolicy.c | 20 +-----
mm/mlock.c | 24 ++-----
mm/mmap.c | 160 ++++++++++++++++++++++++++++++++++++++++-----
mm/mprotect.c | 27 ++------
mm/mremap.c | 30 ++++-----
mm/nommu.c | 4 +-
10 files changed, 228 insertions(+), 154 deletions(-)

--
2.42.0


2023-10-08 20:23:42

by Lorenzo Stoakes

Subject: [PATCH 1/4] mm: abstract the vma_merge()/split_vma() pattern for mprotect() et al.

mprotect() and other functions which change VMA parameters over a range
each employ a pattern of:-

1. Attempt to merge the range with adjacent VMAs.
2. If this fails, and the range spans a subset of the VMA, split it
accordingly.

This is open-coded and duplicated in each case. Also in each case most of
the parameters passed to vma_merge() remain the same.

Create a new static function, vma_modify(), which abstracts this operation,
accepting only those parameters which can be changed.

To avoid the mess of invoking each function call with unnecessary
parameters, create wrapper functions for each of the modify operations,
parameterised only by what is required to perform the action.

Note that the userfaultfd_release() case works even though it does not
split VMAs - since start is set to vma->vm_start and end is set to
vma->vm_end, the split logic does not trigger.

In addition, since we calculate pgoff to be equal to vma->vm_pgoff +
((start - vma->vm_start) >> PAGE_SHIFT), and start - vma->vm_start will be
0 in this instance, this invocation will remain unchanged.
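
The two observations above can be checked with a small userspace sketch. It is not kernel code: struct mini_vma, range_pgoff() and needs_split() are hypothetical names, and PAGE_SHIFT is assumed to be 12 (4 KiB pages). Only the two split conditions and the pgoff expression are taken from vma_modify() in this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in holding only the VMA fields used below. */
struct mini_vma {
	unsigned long vm_start;	/* first byte of the mapping */
	unsigned long vm_end;	/* one past the last byte */
	unsigned long vm_pgoff;	/* offset of vm_start, in pages */
};

/* Page offset of 'start', computed as vma_modify() does in this patch. */
static unsigned long range_pgoff(const struct mini_vma *vma,
				 unsigned long start)
{
	assert(start >= vma->vm_start && start < vma->vm_end);
	return vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
}

/* True if either split condition in vma_modify() would trigger. */
static bool needs_split(const struct mini_vma *vma,
			unsigned long start, unsigned long end)
{
	return vma->vm_start < start || vma->vm_end > end;
}
```

For a range that covers the whole VMA, needs_split() is false and range_pgoff() returns vm_pgoff unchanged, which matches the userfaultfd_release() argument. For a range starting three pages into the VMA, the offset grows by three.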

Signed-off-by: Lorenzo Stoakes <[email protected]>
---
fs/userfaultfd.c | 53 +++++++++-----------------
include/linux/mm.h | 23 ++++++++++++
mm/madvise.c | 25 ++++---------
mm/mempolicy.c | 20 ++--------
mm/mlock.c | 24 ++++--------
mm/mmap.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++
mm/mprotect.c | 27 ++++----------
7 files changed, 157 insertions(+), 108 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index a7c6ef764e63..9e5232d23927 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -927,11 +927,10 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
continue;
}
new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
- prev = vma_merge(&vmi, mm, prev, vma->vm_start, vma->vm_end,
- new_flags, vma->anon_vma,
- vma->vm_file, vma->vm_pgoff,
- vma_policy(vma),
- NULL_VM_UFFD_CTX, anon_vma_name(vma));
+ prev = vma_modify_uffd(&vmi, prev, vma, vma->vm_start,
+ vma->vm_end, new_flags,
+ NULL_VM_UFFD_CTX);
+
if (prev) {
vma = prev;
} else {
@@ -1331,7 +1330,6 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
unsigned long start, end, vma_end;
struct vma_iterator vmi;
bool wp_async = userfaultfd_wp_async_ctx(ctx);
- pgoff_t pgoff;

user_uffdio_register = (struct uffdio_register __user *) arg;

@@ -1484,26 +1482,18 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
vma_end = min(end, vma->vm_end);

new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
- pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
- vma->anon_vma, vma->vm_file, pgoff,
- vma_policy(vma),
- ((struct vm_userfaultfd_ctx){ ctx }),
- anon_vma_name(vma));
+ prev = vma_modify_uffd(&vmi, prev, vma, start, vma_end,
+ new_flags,
+ ((struct vm_userfaultfd_ctx){ ctx }));
if (prev) {
/* vma_merge() invalidated the mas */
vma = prev;
goto next;
}
- if (vma->vm_start < start) {
- ret = split_vma(&vmi, vma, start, 1);
- if (ret)
- break;
- }
- if (vma->vm_end > end) {
- ret = split_vma(&vmi, vma, end, 0);
- if (ret)
- break;
+
+ if (IS_ERR(prev)) {
+ ret = PTR_ERR(prev);
+ break;
}
next:
/*
@@ -1568,7 +1558,6 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
const void __user *buf = (void __user *)arg;
struct vma_iterator vmi;
bool wp_async = userfaultfd_wp_async_ctx(ctx);
- pgoff_t pgoff;

ret = -EFAULT;
if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
@@ -1671,24 +1660,16 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
uffd_wp_range(vma, start, vma_end - start, false);

new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
- pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
- vma->anon_vma, vma->vm_file, pgoff,
- vma_policy(vma),
- NULL_VM_UFFD_CTX, anon_vma_name(vma));
+ prev = vma_modify_uffd(&vmi, prev, vma, start, vma_end,
+ new_flags, NULL_VM_UFFD_CTX);
if (prev) {
vma = prev;
goto next;
}
- if (vma->vm_start < start) {
- ret = split_vma(&vmi, vma, start, 1);
- if (ret)
- break;
- }
- if (vma->vm_end > end) {
- ret = split_vma(&vmi, vma, end, 0);
- if (ret)
- break;
+
+ if (IS_ERR(prev)) {
+ ret = PTR_ERR(prev);
+ break;
}
next:
/*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a7b667786cde..c069813f215f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3253,6 +3253,29 @@ extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
unsigned long addr, unsigned long len, pgoff_t pgoff,
bool *need_rmap_locks);
extern void exit_mmap(struct mm_struct *);
+struct vm_area_struct *vma_modify_flags(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ unsigned long new_flags);
+struct vm_area_struct *vma_modify_flags_name(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start,
+ unsigned long end,
+ unsigned long new_flags,
+ struct anon_vma_name *new_name);
+struct vm_area_struct *vma_modify_policy(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ struct mempolicy *new_pol);
+struct vm_area_struct *vma_modify_uffd(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ unsigned long new_flags,
+ struct vm_userfaultfd_ctx new_ctx);

static inline int check_data_rlimit(unsigned long rlim,
unsigned long new,
diff --git a/mm/madvise.c b/mm/madvise.c
index a4a20de50494..73024693d5c8 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -141,7 +141,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
{
struct mm_struct *mm = vma->vm_mm;
int error;
- pgoff_t pgoff;
+ struct vm_area_struct *merged;
VMA_ITERATOR(vmi, mm, start);

if (new_flags == vma->vm_flags && anon_vma_name_eq(anon_vma_name(vma), anon_name)) {
@@ -149,28 +149,17 @@ static int madvise_update_vma(struct vm_area_struct *vma,
return 0;
}

- pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- *prev = vma_merge(&vmi, mm, *prev, start, end, new_flags,
- vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx, anon_name);
- if (*prev) {
- vma = *prev;
+ merged = vma_modify_flags_name(&vmi, *prev, vma, start, end, new_flags,
+ anon_name);
+ if (merged) {
+ vma = *prev = merged;
goto success;
}

*prev = vma;

- if (start != vma->vm_start) {
- error = split_vma(&vmi, vma, start, 1);
- if (error)
- return error;
- }
-
- if (end != vma->vm_end) {
- error = split_vma(&vmi, vma, end, 0);
- if (error)
- return error;
- }
+ if (IS_ERR(merged))
+ return PTR_ERR(merged);

success:
/* vm_flags is protected by the mmap_lock held in write mode. */
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b01922e88548..b608b1744197 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -786,8 +786,6 @@ static int mbind_range(struct vma_iterator *vmi, struct vm_area_struct *vma,
{
struct vm_area_struct *merged;
unsigned long vmstart, vmend;
- pgoff_t pgoff;
- int err;

vmend = min(end, vma->vm_end);
if (start > vma->vm_start) {
@@ -802,26 +800,14 @@ static int mbind_range(struct vma_iterator *vmi, struct vm_area_struct *vma,
return 0;
}

- pgoff = vma->vm_pgoff + ((vmstart - vma->vm_start) >> PAGE_SHIFT);
- merged = vma_merge(vmi, vma->vm_mm, *prev, vmstart, vmend, vma->vm_flags,
- vma->anon_vma, vma->vm_file, pgoff, new_pol,
- vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ merged = vma_modify_policy(vmi, *prev, vma, vmstart, vmend, new_pol);
if (merged) {
*prev = merged;
return vma_replace_policy(merged, new_pol);
}

- if (vma->vm_start != vmstart) {
- err = split_vma(vmi, vma, vmstart, 1);
- if (err)
- return err;
- }
-
- if (vma->vm_end != vmend) {
- err = split_vma(vmi, vma, vmend, 0);
- if (err)
- return err;
- }
+ if (IS_ERR(merged))
+ return PTR_ERR(merged);

*prev = vma;
return vma_replace_policy(vma, new_pol);
diff --git a/mm/mlock.c b/mm/mlock.c
index 42b6865f8f82..50ebea3b7885 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -476,10 +476,10 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long end, vm_flags_t newflags)
{
struct mm_struct *mm = vma->vm_mm;
- pgoff_t pgoff;
int nr_pages;
int ret = 0;
vm_flags_t oldflags = vma->vm_flags;
+ struct vm_area_struct *merged;

if (newflags == oldflags || (oldflags & VM_SPECIAL) ||
is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm) ||
@@ -487,25 +487,15 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
/* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */
goto out;

- pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- *prev = vma_merge(vmi, mm, *prev, start, end, newflags,
- vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx, anon_vma_name(vma));
- if (*prev) {
- vma = *prev;
+ merged = vma_modify_flags(vmi, *prev, vma, start, end, newflags);
+ if (merged) {
+ vma = *prev = merged;
goto success;
}

- if (start != vma->vm_start) {
- ret = split_vma(vmi, vma, start, 1);
- if (ret)
- goto out;
- }
-
- if (end != vma->vm_end) {
- ret = split_vma(vmi, vma, end, 0);
- if (ret)
- goto out;
+ if (IS_ERR(merged)) {
+ ret = PTR_ERR(merged);
+ goto out;
}

success:
diff --git a/mm/mmap.c b/mm/mmap.c
index 673429ee8a9e..8c21171b431f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2437,6 +2437,99 @@ int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
return __split_vma(vmi, vma, addr, new_below);
}

+/*
+ * We are about to modify one or multiple of a VMA's flags, policy, userfaultfd
+ * context and anonymous VMA name within the range [start, end).
+ *
+ * As a result, we might be able to merge the newly modified VMA range with an
+ * adjacent VMA with identical properties.
+ *
+ * If no merge is possible and the range does not span the entirety of the VMA,
+ * we then need to split the VMA to accommodate the change.
+ */
+static struct vm_area_struct *vma_modify(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ unsigned long vm_flags,
+ struct mempolicy *policy,
+ struct vm_userfaultfd_ctx uffd_ctx,
+ struct anon_vma_name *anon_name)
+{
+ pgoff_t pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
+ struct vm_area_struct *merged;
+
+ merged = vma_merge(vmi, vma->vm_mm, prev, start, end, vm_flags,
+ vma->anon_vma, vma->vm_file, pgoff, policy,
+ uffd_ctx, anon_name);
+ if (merged)
+ return merged;
+
+ if (vma->vm_start < start) {
+ int err = split_vma(vmi, vma, start, 1);
+
+ if (err)
+ return ERR_PTR(err);
+ }
+
+ if (vma->vm_end > end) {
+ int err = split_vma(vmi, vma, end, 0);
+
+ if (err)
+ return ERR_PTR(err);
+ }
+
+ return NULL;
+}
+
+/* We are about to modify the VMA's flags. */
+struct vm_area_struct *vma_modify_flags(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ unsigned long new_flags)
+{
+ return vma_modify(vmi, prev, vma, start, end, new_flags,
+ vma_policy(vma), vma->vm_userfaultfd_ctx,
+ anon_vma_name(vma));
+}
+
+/* We are about to modify the VMA's flags and/or anon_name. */
+struct vm_area_struct *vma_modify_flags_name(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start,
+ unsigned long end,
+ unsigned long new_flags,
+ struct anon_vma_name *new_name)
+{
+ return vma_modify(vmi, prev, vma, start, end, new_flags,
+ vma_policy(vma), vma->vm_userfaultfd_ctx, new_name);
+}
+
+/* We are about to modify the VMA's memory policy. */
+struct vm_area_struct *vma_modify_policy(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ struct mempolicy *new_pol)
+{
+ return vma_modify(vmi, prev, vma, start, end, vma->vm_flags,
+ new_pol, vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+}
+
+/* We are about to modify the VMA's uffd context and/or flags. */
+struct vm_area_struct *vma_modify_uffd(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ unsigned long new_flags,
+ struct vm_userfaultfd_ctx new_ctx)
+{
+ return vma_modify(vmi, prev, vma, start, end, new_flags,
+ vma_policy(vma), new_ctx, anon_vma_name(vma));
+}
+
/*
* do_vmi_align_munmap() - munmap the aligned region from @start to @end.
* @vmi: The vma iterator
diff --git a/mm/mprotect.c b/mm/mprotect.c
index b94fbb45d5c7..fdc94453bced 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -581,7 +581,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
long nrpages = (end - start) >> PAGE_SHIFT;
unsigned int mm_cp_flags = 0;
unsigned long charged = 0;
- pgoff_t pgoff;
+ struct vm_area_struct *merged;
int error;

if (newflags == oldflags) {
@@ -625,31 +625,18 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
}
}

- /*
- * First try to merge with previous and/or next vma.
- */
- pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- *pprev = vma_merge(vmi, mm, *pprev, start, end, newflags,
- vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx, anon_vma_name(vma));
- if (*pprev) {
- vma = *pprev;
+ merged = vma_modify_flags(vmi, *pprev, vma, start, end, newflags);
+ if (merged) {
+ vma = *pprev = merged;
VM_WARN_ON((vma->vm_flags ^ newflags) & ~VM_SOFTDIRTY);
goto success;
}

*pprev = vma;

- if (start != vma->vm_start) {
- error = split_vma(vmi, vma, start, 1);
- if (error)
- goto fail;
- }
-
- if (end != vma->vm_end) {
- error = split_vma(vmi, vma, end, 0);
- if (error)
- goto fail;
+ if (IS_ERR(merged)) {
+ error = PTR_ERR(merged);
+ goto fail;
}

success:
--
2.42.0

2023-10-08 20:23:47

by Lorenzo Stoakes

Subject: [PATCH 2/4] mm: make vma_merge() and split_vma() internal

Now that the vma_merge()/split_vma() pattern has been abstracted,
split_vma() and __split_vma() are used only within mm/mmap.c, so make them
static. We also no longer need vma_merge() anywhere except mm/mmap.c and
mm/mremap.c, so move its declaration to mm/internal.h.

In addition, the split_vma() nommu variant also need not be exported.

Signed-off-by: Lorenzo Stoakes <[email protected]>
---
include/linux/mm.h | 9 ---------
mm/internal.h | 9 +++++++++
mm/mmap.c | 8 ++++----
mm/nommu.c | 4 ++--
4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c069813f215f..6aa532682094 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3237,16 +3237,7 @@ extern int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
struct vm_area_struct *next);
extern int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long start, unsigned long end, pgoff_t pgoff);
-extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
- struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
- unsigned long end, unsigned long vm_flags, struct anon_vma *,
- struct file *, pgoff_t, struct mempolicy *, struct vm_userfaultfd_ctx,
- struct anon_vma_name *);
extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
-extern int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
- unsigned long addr, int new_below);
-extern int split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
- unsigned long addr, int new_below);
extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
extern void unlink_file_vma(struct vm_area_struct *);
extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
diff --git a/mm/internal.h b/mm/internal.h
index 3a72975425bb..ddaeb9f2d9d7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1011,6 +1011,15 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmd,
unsigned int flags);

+/*
+ * mm/mmap.c
+ */
+struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
+ struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
+ unsigned long end, unsigned long vm_flags, struct anon_vma *,
+ struct file *, pgoff_t, struct mempolicy *, struct vm_userfaultfd_ctx,
+ struct anon_vma_name *);
+
enum {
/* mark page accessed */
FOLL_TOUCH = 1 << 16,
diff --git a/mm/mmap.c b/mm/mmap.c
index 8c21171b431f..58d71f84e917 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2346,8 +2346,8 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
* has already been checked or doesn't make sense to fail.
* VMA Iterator will point to the end VMA.
*/
-int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
- unsigned long addr, int new_below)
+static int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long addr, int new_below)
{
struct vma_prepare vp;
struct vm_area_struct *new;
@@ -2428,8 +2428,8 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
* Split a vma into two pieces at address 'addr', a new vma is allocated
* either for the first part or the tail.
*/
-int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
- unsigned long addr, int new_below)
+static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long addr, int new_below)
{
if (vma->vm_mm->map_count >= sysctl_max_map_count)
return -ENOMEM;
diff --git a/mm/nommu.c b/mm/nommu.c
index f9553579389b..fc4afe924ad5 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1305,8 +1305,8 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_arg_struct __user *, arg)
* split a vma into two pieces at address 'addr', a new vma is allocated either
* for the first part or the tail.
*/
-int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
- unsigned long addr, int new_below)
+static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long addr, int new_below)
{
struct vm_area_struct *new;
struct vm_region *region;
--
2.42.0

2023-10-08 20:23:59

by Lorenzo Stoakes

Subject: [PATCH 3/4] mm: abstract merge for new VMAs into vma_merge_new_vma()

Only in mmap_region() and copy_vma() do we add VMAs which occupy entirely
new regions of virtual memory.

We can share the logic between these invocations and make it absolutely
explicit to reduce confusion around the rather inscrutable parameters
possessed by vma_merge().

This also paves the way for a simplification of the core vma_merge()
implementation, as we seek to make the function entirely an implementation
detail.

Note that on mmap_region(), vma fields are initialised to zero, so we can
simply reference these rather than explicitly specifying NULL.

Signed-off-by: Lorenzo Stoakes <[email protected]>
---
mm/mmap.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 58d71f84e917..51be864b876b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2530,6 +2530,22 @@ struct vm_area_struct *vma_modify_uffd(struct vma_iterator *vmi,
vma_policy(vma), new_ctx, anon_vma_name(vma));
}

+/*
+ * Attempt to merge a newly mapped VMA with those adjacent to it. The caller
+ * must ensure that [start, end) does not overlap any existing VMA.
+ */
+static struct vm_area_struct *vma_merge_new_vma(struct vma_iterator *vmi,
+ struct vm_area_struct *prev,
+ struct vm_area_struct *vma,
+ unsigned long start,
+ unsigned long end,
+ pgoff_t pgoff)
+{
+ return vma_merge(vmi, vma->vm_mm, prev, start, end, vma->vm_flags,
+ vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
+ vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+}
+
/*
* do_vmi_align_munmap() - munmap the aligned region from @start to @end.
* @vmi: The vma iterator
@@ -2885,10 +2901,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
* vma again as we may succeed this time.
*/
if (unlikely(vm_flags != vma->vm_flags && prev)) {
- merge = vma_merge(&vmi, mm, prev, vma->vm_start,
- vma->vm_end, vma->vm_flags, NULL,
- vma->vm_file, vma->vm_pgoff, NULL,
- NULL_VM_UFFD_CTX, NULL);
+ merge = vma_merge_new_vma(&vmi, prev, vma,
+ vma->vm_start, vma->vm_end,
+ pgoff);
if (merge) {
/*
* ->mmap() can change vma->vm_file and fput
@@ -3430,9 +3445,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
if (new_vma && new_vma->vm_start < addr + len)
return NULL; /* should never get here */

- new_vma = vma_merge(&vmi, mm, prev, addr, addr + len, vma->vm_flags,
- vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ new_vma = vma_merge_new_vma(&vmi, prev, vma, addr, addr + len, pgoff);
if (new_vma) {
/*
* Source vma may have been merged into new_vma
--
2.42.0

2023-10-08 20:24:32

by Lorenzo Stoakes

Subject: [PATCH 4/4] mm: abstract VMA extension and merge into vma_merge_extend() helper

mremap uses vma_merge() in the case where a VMA needs to be extended. This
can be significantly simplified and abstracted.

This makes it far easier to understand what the actual function is doing,
avoids future mistakes in use of the confusing vma_merge() function and
importantly allows us to make future changes to how vma_merge() is
implemented by knowing explicitly which merge cases each invocation uses.

Note that in the mremap() extend case, we perform this merge only when
old_len == vma->vm_end - addr. The extension_start, i.e. the start of the
extended portion of the VMA is equal to addr + old_len, i.e. vma->vm_end.
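
As a sanity check of the arithmetic above, here is a userspace sketch (not kernel code) comparing the page offset the old mremap() code computed with the one vma_merge_extend() computes. struct mini_vma and both function names are hypothetical, and PAGE_SHIFT is assumed to be 12.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in holding only the VMA fields used below. */
struct mini_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;
};

/* Offset of the extension as the removed mremap() code derived it. */
static unsigned long old_extension_pgoff(const struct mini_vma *vma,
					 unsigned long addr,
					 unsigned long old_len)
{
	unsigned long extension_start = addr + old_len;

	/* The extend path is only taken under this condition. */
	assert(old_len == vma->vm_end - addr);
	return vma->vm_pgoff +
	       ((extension_start - vma->vm_start) >> PAGE_SHIFT);
}

/* Offset of the extension as vma_merge_extend() derives it. */
static unsigned long new_extension_pgoff(const struct mini_vma *vma)
{
	return vma->vm_pgoff +
	       ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT);
}
```

Because addr + old_len equals vma->vm_end whenever the extend path runs, the two functions agree, so the helper needs neither addr nor old_len as parameters.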

With this refactoring, vma_merge() is no longer required anywhere except
mm/mmap.c, so mark it static.

Signed-off-by: Lorenzo Stoakes <[email protected]>
---
mm/internal.h | 8 +++-----
mm/mmap.c | 32 +++++++++++++++++++++++++-------
mm/mremap.c | 30 +++++++++++++-----------------
3 files changed, 41 insertions(+), 29 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index ddaeb9f2d9d7..6fa722b07a94 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1014,11 +1014,9 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
/*
* mm/mmap.c
*/
-struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
- struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
- unsigned long end, unsigned long vm_flags, struct anon_vma *,
- struct file *, pgoff_t, struct mempolicy *, struct vm_userfaultfd_ctx,
- struct anon_vma_name *);
+struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi,
+ struct vm_area_struct *vma,
+ unsigned long delta);

enum {
/* mark page accessed */
diff --git a/mm/mmap.c b/mm/mmap.c
index 51be864b876b..5d2f2e8d7307 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -860,13 +860,13 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
* **** is not represented - it will be merged and the vma containing the
* area is returned, or the function will return NULL
*/
-struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
- struct vm_area_struct *prev, unsigned long addr,
- unsigned long end, unsigned long vm_flags,
- struct anon_vma *anon_vma, struct file *file,
- pgoff_t pgoff, struct mempolicy *policy,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name)
+static struct vm_area_struct
+*vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
+ struct vm_area_struct *prev, unsigned long addr, unsigned long end,
+ unsigned long vm_flags, struct anon_vma *anon_vma, struct file *file,
+ pgoff_t pgoff, struct mempolicy *policy,
+ struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+ struct anon_vma_name *anon_name)
{
struct vm_area_struct *curr, *next, *res;
struct vm_area_struct *vma, *adjust, *remove, *remove2;
@@ -2546,6 +2546,24 @@ static struct vm_area_struct *vma_merge_new_vma(struct vma_iterator *vmi,
vma->vm_userfaultfd_ctx, anon_vma_name(vma));
}

+/*
+ * Expand vma by delta bytes, potentially merging with an immediately adjacent
+ * VMA with identical properties.
+ */
+struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi,
+ struct vm_area_struct *vma,
+ unsigned long delta)
+{
+ pgoff_t pgoff = vma->vm_pgoff +
+ ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT);
+
+ /* vma is specified as prev, so case 1 or 2 will apply. */
+ return vma_merge(vmi, vma->vm_mm, vma, vma->vm_end, vma->vm_end + delta,
+ vma->vm_flags, vma->anon_vma, vma->vm_file, pgoff,
+ vma_policy(vma), vma->vm_userfaultfd_ctx,
+ anon_vma_name(vma));
+}
+
/*
* do_vmi_align_munmap() - munmap the aligned region from @start to @end.
* @vmi: The vma iterator
diff --git a/mm/mremap.c b/mm/mremap.c
index ce8a23ef325a..38d98465f3d8 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1096,14 +1096,12 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
/* old_len exactly to the end of the area..
*/
if (old_len == vma->vm_end - addr) {
+ unsigned long delta = new_len - old_len;
+
/* can we just expand the current mapping? */
- if (vma_expandable(vma, new_len - old_len)) {
- long pages = (new_len - old_len) >> PAGE_SHIFT;
- unsigned long extension_start = addr + old_len;
- unsigned long extension_end = addr + new_len;
- pgoff_t extension_pgoff = vma->vm_pgoff +
- ((extension_start - vma->vm_start) >> PAGE_SHIFT);
- VMA_ITERATOR(vmi, mm, extension_start);
+ if (vma_expandable(vma, delta)) {
+ long pages = delta >> PAGE_SHIFT;
+ VMA_ITERATOR(vmi, mm, vma->vm_end);
long charged = 0;

if (vma->vm_flags & VM_ACCOUNT) {
@@ -1115,17 +1113,15 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
}

/*
- * Function vma_merge() is called on the extension we
- * are adding to the already existing vma, vma_merge()
- * will merge this extension with the already existing
- * vma (expand operation itself) and possibly also with
- * the next vma if it becomes adjacent to the expanded
- * vma and otherwise compatible.
+ * Function vma_merge_extend() is called on the
+ * extension we are adding to the already existing vma,
+ * vma_merge_extend() will merge this extension with the
+ * already existing vma (expand operation itself) and
+ * possibly also with the next vma if it becomes
+ * adjacent to the expanded vma and otherwise
+ * compatible.
*/
- vma = vma_merge(&vmi, mm, vma, extension_start,
- extension_end, vma->vm_flags, vma->anon_vma,
- vma->vm_file, extension_pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ vma = vma_merge_extend(&vmi, vma, delta);
if (!vma) {
vm_unacct_memory(charged);
ret = -ENOMEM;
--
2.42.0

2023-10-09 15:22:57

by Vlastimil Babka

Subject: Re: [PATCH 1/4] mm: abstract the vma_merge()/split_vma() pattern for mprotect() et al.

On 10/8/23 22:23, Lorenzo Stoakes wrote:
> mprotect() and other functions which change VMA parameters over a range
> each employ a pattern of:-
>
> 1. Attempt to merge the range with adjacent VMAs.
> 2. If this fails, and the range spans a subset of the VMA, split it
> accordingly.
>
> This is open-coded and duplicated in each case. Also in each case most of
> the parameters passed to vma_merge() remain the same.
>
> Create a new static function, vma_modify(), which abstracts this operation,
> accepting only those parameters which can be changed.
>
> To avoid the mess of invoking each function call with unnecessary
> parameters, create wrapper functions for each of the modify operations,
> parameterised only by what is required to perform the action.

Nice!

> Note that the userfaultfd_release() case works even though it does not
> split VMAs - since start is set to vma->vm_start and end is set to
> vma->vm_end, the split logic does not trigger.
>
> In addition, since we calculate pgoff to be equal to vma->vm_pgoff + (start
> - vma->vm_start) >> PAGE_SHIFT, and start - vma->vm_start will be 0 in this
> instance, this invocation will remain unchanged.
>
> Signed-off-by: Lorenzo Stoakes <[email protected]>
> ---
> fs/userfaultfd.c | 53 +++++++++-----------------
> include/linux/mm.h | 23 ++++++++++++
> mm/madvise.c | 25 ++++---------
> mm/mempolicy.c | 20 ++--------
> mm/mlock.c | 24 ++++--------
> mm/mmap.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++
> mm/mprotect.c | 27 ++++----------
> 7 files changed, 157 insertions(+), 108 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index a7c6ef764e63..9e5232d23927 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -927,11 +927,10 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
> continue;
> }
> new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
> - prev = vma_merge(&vmi, mm, prev, vma->vm_start, vma->vm_end,
> - new_flags, vma->anon_vma,
> - vma->vm_file, vma->vm_pgoff,
> - vma_policy(vma),
> - NULL_VM_UFFD_CTX, anon_vma_name(vma));
> + prev = vma_modify_uffd(&vmi, prev, vma, vma->vm_start,
> + vma->vm_end, new_flags,
> + NULL_VM_UFFD_CTX);
> +
> if (prev) {
> vma = prev;
> } else {
> @@ -1331,7 +1330,6 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> unsigned long start, end, vma_end;
> struct vma_iterator vmi;
> bool wp_async = userfaultfd_wp_async_ctx(ctx);
> - pgoff_t pgoff;
>
> user_uffdio_register = (struct uffdio_register __user *) arg;
>
> @@ -1484,26 +1482,18 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> vma_end = min(end, vma->vm_end);
>
> new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
> - pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
> - prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
> - vma->anon_vma, vma->vm_file, pgoff,
> - vma_policy(vma),
> - ((struct vm_userfaultfd_ctx){ ctx }),
> - anon_vma_name(vma));
> + prev = vma_modify_uffd(&vmi, prev, vma, start, vma_end,
> + new_flags,
> + ((struct vm_userfaultfd_ctx){ ctx }));
> if (prev) {

This will hit also for IS_ERR(prev), no?

> /* vma_merge() invalidated the mas */
> vma = prev;
> goto next;
> }
> - if (vma->vm_start < start) {
> - ret = split_vma(&vmi, vma, start, 1);
> - if (ret)
> - break;
> - }
> - if (vma->vm_end > end) {
> - ret = split_vma(&vmi, vma, end, 0);
> - if (ret)
> - break;
> +
> + if (IS_ERR(prev)) {

So here's too late to test for it. AFAICS the other usages are like this as
well.
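
The ordering problem can be reproduced in userspace. The sketch below assumes the three-way return convention implied by the quoted hunk (a valid pointer on merge, NULL when no merge happened, an error pointer when a split fails). mock_err_ptr() and mock_is_err() are simplified, hypothetical re-implementations of the kernel's ERR_PTR()/IS_ERR() idea, and classify_buggy()/classify_fixed() are illustrative names only.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Error pointers occupy the top MOCK_MAX_ERRNO values of the address space. */
#define MOCK_MAX_ERRNO 4095UL

/* Encode a negative errno value as a pointer. */
static void *mock_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True if p encodes an error rather than a usable address. */
static int mock_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MOCK_MAX_ERRNO;
}

enum outcome { MERGED, NOT_MERGED, FAILED };

/* Mirrors the order of checks in the quoted hunk. */
static enum outcome classify_buggy(void *prev)
{
	if (prev)
		return MERGED;		/* also taken for error pointers */
	if (mock_is_err(prev))
		return FAILED;		/* unreachable: prev is NULL here */
	return NOT_MERGED;
}

/* Test for the error encoding first, then for a successful merge. */
static enum outcome classify_fixed(void *prev)
{
	if (mock_is_err(prev))
		return FAILED;
	if (prev)
		return MERGED;
	return NOT_MERGED;
}
```

Because an error pointer is non-NULL, classify_buggy() reports it as a successful merge, whereas classify_fixed() keeps the error path reachable.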

> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a7b667786cde..c069813f215f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3253,6 +3253,29 @@ extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
> unsigned long addr, unsigned long len, pgoff_t pgoff,
> bool *need_rmap_locks);
> extern void exit_mmap(struct mm_struct *);
> +struct vm_area_struct *vma_modify_flags(struct vma_iterator *vmi,
> + struct vm_area_struct *prev,
> + struct vm_area_struct *vma,
> + unsigned long start, unsigned long end,
> + unsigned long new_flags);
> +struct vm_area_struct *vma_modify_flags_name(struct vma_iterator *vmi,
> + struct vm_area_struct *prev,
> + struct vm_area_struct *vma,
> + unsigned long start,
> + unsigned long end,
> + unsigned long new_flags,
> + struct anon_vma_name *new_name);
> +struct vm_area_struct *vma_modify_policy(struct vma_iterator *vmi,
> + struct vm_area_struct *prev,
> + struct vm_area_struct *vma,
> + unsigned long start, unsigned long end,
> + struct mempolicy *new_pol);
> +struct vm_area_struct *vma_modify_uffd(struct vma_iterator *vmi,
> + struct vm_area_struct *prev,
> + struct vm_area_struct *vma,
> + unsigned long start, unsigned long end,
> + unsigned long new_flags,
> + struct vm_userfaultfd_ctx new_ctx);

Could these be instead static inline wrappers, and vma_modify exported
instead of static?
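
As a rough illustration of that arrangement, here is a userspace sketch in which one general helper takes every changeable field and thin static inline wrappers fill in the unchanged ones from the VMA itself. All names and signatures here (struct mock_vma, mock_vma_modify(), mock_modify_flags(), mock_modify_policy()) are hypothetical and heavily simplified; the real vma_modify() takes more parameters and performs the merge/split work.

```c
/* Hypothetical stand-ins for the kernel structures involved. */
struct mock_policy {
	int mode;
};

struct mock_vma {
	unsigned long vm_flags;
	struct mock_policy *policy;
	const char *anon_name;
};

/* What the general helper was asked to apply; returned so it can be inspected. */
struct modify_request {
	unsigned long flags;
	struct mock_policy *policy;
	const char *name;
};

/* General helper: accepts every field that a caller might want to change. */
static struct modify_request mock_vma_modify(struct mock_vma *vma,
					     unsigned long new_flags,
					     struct mock_policy *new_pol,
					     const char *new_name)
{
	struct modify_request req = { new_flags, new_pol, new_name };

	(void)vma;	/* a real implementation would merge or split here */
	return req;
}

/* Wrapper: only the flags change, everything else comes from the VMA. */
static inline struct modify_request mock_modify_flags(struct mock_vma *vma,
						      unsigned long new_flags)
{
	return mock_vma_modify(vma, new_flags, vma->policy, vma->anon_name);
}

/* Wrapper: only the policy changes, everything else comes from the VMA. */
static inline struct modify_request mock_modify_policy(struct mock_vma *vma,
						       struct mock_policy *new_pol)
{
	return mock_vma_modify(vma, vma->vm_flags, new_pol, vma->anon_name);
}
```

The wrappers need the definitions of every field accessor they use to be visible in the header that declares them, which is the practical constraint on making them static inline.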

Maybe we could also move this to mm/internal.h? Which would mean
fs/userfaultfd.c would have to start including it, but as it's already so
much rooted in mm, it shouldn't be wrong?

>
> static inline int check_data_rlimit(unsigned long rlim,
> unsigned long new,
> diff --git a/mm/madvise.c b/mm/madvise.c
> index a4a20de50494..73024693d5c8 100644

2023-10-09 15:45:45

by Vlastimil Babka

Subject: Re: [PATCH 2/4] mm: make vma_merge() and split_vma() internal

On 10/8/23 22:23, Lorenzo Stoakes wrote:
> Now the vma_merge()/split_vma() pattern has been abstracted, we use it

"it" refers to split_vma() only so "the latter" or "split_vma()"?

> entirely internally within mm/mmap.c, so make the function static. We also
> no longer need vma_merge() anywhere else except mm/mremap.c, so make it
> internal.
>
> In addition, the split_vma() nommu variant also need not be exported.
>
> Signed-off-by: Lorenzo Stoakes <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

2023-10-09 16:05:15

by Vlastimil Babka

Subject: Re: [PATCH 3/4] mm: abstract merge for new VMAs into vma_merge_new_vma()

On 10/8/23 22:23, Lorenzo Stoakes wrote:
> Only in mmap_region() and copy_vma() do we add VMAs which occupy entirely
> new regions of virtual memory.
>
> We can share the logic between these invocations and make it absolutely
> explici to reduce confusion around the rather inscrutible parameters

explicit ... inscrutable

> possessed by vma_merge().
>
> This also paves the way for a simplification of the core vma_merge()
> implementation, as we seek to make the function entirely an implementation
> detail.
>
> Note that on mmap_region(), vma fields are initialised to zero, so we can
> simply reference these rather than explicitly specifying NULL.

Right, if they were different from NULL, the code would be broken already.

> Signed-off-by: Lorenzo Stoakes <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

2023-10-09 16:30:25

by Vlastimil Babka

Subject: Re: [PATCH 4/4] mm: abstract VMA extension and merge into vma_merge_extend() helper

On 10/8/23 22:23, Lorenzo Stoakes wrote:
> mremap uses vma_merge() in the case where a VMA needs to be extended. This
> can be significantly simplified and abstracted.
>
> This makes it far easier to understand what the actual function is doing,
> avoids future mistakes in use of the confusing vma_merge() function and
> importantly allows us to make future changes to how vma_merge() is
> implemented by knowing explicitly which merge cases each invocation uses.
>
> Note that in the mremap() extend case, we perform this merge only when
> old_len == vma->vm_end - addr. The extension_start, i.e. the start of the
> extended portion of the VMA is equal to addr + old_len, i.e. vma->vm_end.
>
> With this refactoring, vma_merge() is no longer required anywhere except
> mm/mmap.c, so mark it static.
>
> Signed-off-by: Lorenzo Stoakes <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

Nit:
> @@ -2546,6 +2546,24 @@ static struct vm_area_struct *vma_merge_new_vma(struct vma_iterator *vmi,
> vma->vm_userfaultfd_ctx, anon_vma_name(vma));
> }
>
> +/*
> + * Expand vma by delta bytes, potentially merging with an immediately adjacent
> + * VMA with identical properties.
> + */
> +struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi,
> + struct vm_area_struct *vma,
> + unsigned long delta)
> +{
> + pgoff_t pgoff = vma->vm_pgoff +
> + ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT);

could use vma_pages() here

> +
> + /* vma is specified as prev, so case 1 or 2 will apply. */
> + return vma_merge(vmi, vma->vm_mm, vma, vma->vm_end, vma->vm_end + delta,
> + vma->vm_flags, vma->anon_vma, vma->vm_file, pgoff,
> + vma_policy(vma), vma->vm_userfaultfd_ctx,
> + anon_vma_name(vma));
> +}
> +
> /*
> * do_vmi_align_munmap() - munmap the aligned region from @start to @end.
> * @vmi: The vma iterator
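
For reference, the vma_pages() nit amounts to replacing the open-coded size calculation with a helper that returns the VMA's length in pages. The sketch below is a userspace approximation under stated assumptions: struct mock_vma, MOCK_PAGE_SHIFT, mock_vma_pages() and extension_pgoff() are hypothetical names, and a 4 KiB page size is assumed.

```c
/* Assumption: 4 KiB pages; the real PAGE_SHIFT is architecture-dependent. */
#define MOCK_PAGE_SHIFT 12

/* Hypothetical stand-in holding only the fields this calculation needs. */
struct mock_vma {
	unsigned long vm_start;	/* first byte of the mapping */
	unsigned long vm_end;	/* one past the last byte of the mapping */
	unsigned long vm_pgoff;	/* offset of vm_start in the backing object, in pages */
};

/* Length of the VMA in pages, mirroring what vma_pages() computes. */
static unsigned long mock_vma_pages(const struct mock_vma *vma)
{
	return (vma->vm_end - vma->vm_start) >> MOCK_PAGE_SHIFT;
}

/*
 * Page offset of the first page past the end of the VMA, i.e. of the
 * region that the extension would cover.
 */
static unsigned long extension_pgoff(const struct mock_vma *vma)
{
	return vma->vm_pgoff + mock_vma_pages(vma);
}
```

For a four-page VMA with vm_pgoff of 3, the extension starts at page offset 7, the same value the open-coded expression in the quoted hunk produces.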

2023-10-09 18:19:52

by Lorenzo Stoakes

Subject: Re: [PATCH 1/4] mm: abstract the vma_merge()/split_vma() pattern for mprotect() et al.

On Mon, Oct 09, 2023 at 05:22:33PM +0200, Vlastimil Babka wrote:
> On 10/8/23 22:23, Lorenzo Stoakes wrote:
> > mprotect() and other functions which change VMA parameters over a range
> > each employ a pattern of:-
> >
> > 1. Attempt to merge the range with adjacent VMAs.
> > 2. If this fails, and the range spans a subset of the VMA, split it
> > accordingly.
> >
> > This is open-coded and duplicated in each case. Also in each case most of
> > the parameters passed to vma_merge() remain the same.
> >
> > Create a new static function, vma_modify(), which abstracts this operation,
> > accepting only those parameters which can be changed.
> >
> > To avoid the mess of invoking each function call with unnecessary
> > parameters, create wrapper functions for each of the modify operations,
> > parameterised only by what is required to perform the action.
>
> Nice!

Thanks :)

>
> > Note that the userfaultfd_release() case works even though it does not
> > split VMAs - since start is set to vma->vm_start and end is set to
> > vma->vm_end, the split logic does not trigger.
> >
> > In addition, since we calculate pgoff to be equal to vma->vm_pgoff + (start
> > - vma->vm_start) >> PAGE_SHIFT, and start - vma->vm_start will be 0 in this
> > instance, this invocation will remain unchanged.
> >
> > Signed-off-by: Lorenzo Stoakes <[email protected]>
> > ---
> > fs/userfaultfd.c | 53 +++++++++-----------------
> > include/linux/mm.h | 23 ++++++++++++
> > mm/madvise.c | 25 ++++---------
> > mm/mempolicy.c | 20 ++--------
> > mm/mlock.c | 24 ++++--------
> > mm/mmap.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++
> > mm/mprotect.c | 27 ++++----------
> > 7 files changed, 157 insertions(+), 108 deletions(-)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index a7c6ef764e63..9e5232d23927 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -927,11 +927,10 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
> > continue;
> > }
> > new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
> > - prev = vma_merge(&vmi, mm, prev, vma->vm_start, vma->vm_end,
> > - new_flags, vma->anon_vma,
> > - vma->vm_file, vma->vm_pgoff,
> > - vma_policy(vma),
> > - NULL_VM_UFFD_CTX, anon_vma_name(vma));
> > + prev = vma_modify_uffd(&vmi, prev, vma, vma->vm_start,
> > + vma->vm_end, new_flags,
> > + NULL_VM_UFFD_CTX);
> > +
> > if (prev) {
> > vma = prev;
> > } else {
> > @@ -1331,7 +1330,6 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> > unsigned long start, end, vma_end;
> > struct vma_iterator vmi;
> > bool wp_async = userfaultfd_wp_async_ctx(ctx);
> > - pgoff_t pgoff;
> >
> > user_uffdio_register = (struct uffdio_register __user *) arg;
> >
> > @@ -1484,26 +1482,18 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> > vma_end = min(end, vma->vm_end);
> >
> > new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
> > - pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
> > - prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
> > - vma->anon_vma, vma->vm_file, pgoff,
> > - vma_policy(vma),
> > - ((struct vm_userfaultfd_ctx){ ctx }),
> > - anon_vma_name(vma));
> > + prev = vma_modify_uffd(&vmi, prev, vma, start, vma_end,
> > + new_flags,
> > + ((struct vm_userfaultfd_ctx){ ctx }));
> > if (prev) {
>
> This will hit also for IS_ERR(prev), no?
>
> > /* vma_merge() invalidated the mas */
> > vma = prev;
> > goto next;
> > }
> > - if (vma->vm_start < start) {
> > - ret = split_vma(&vmi, vma, start, 1);
> > - if (ret)
> > - break;
> > - }
> > - if (vma->vm_end > end) {
> > - ret = split_vma(&vmi, vma, end, 0);
> > - if (ret)
> > - break;
> > +
> > + if (IS_ERR(prev)) {
>
> So here's too late to test for it. AFAICS the other usages are like this as
> well.

Oh dear :) yes you're right, I will rework this in v2 for all cases.

>
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index a7b667786cde..c069813f215f 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -3253,6 +3253,29 @@ extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
> > unsigned long addr, unsigned long len, pgoff_t pgoff,
> > bool *need_rmap_locks);
> > extern void exit_mmap(struct mm_struct *);
> > +struct vm_area_struct *vma_modify_flags(struct vma_iterator *vmi,
> > + struct vm_area_struct *prev,
> > + struct vm_area_struct *vma,
> > + unsigned long start, unsigned long end,
> > + unsigned long new_flags);
> > +struct vm_area_struct *vma_modify_flags_name(struct vma_iterator *vmi,
> > + struct vm_area_struct *prev,
> > + struct vm_area_struct *vma,
> > + unsigned long start,
> > + unsigned long end,
> > + unsigned long new_flags,
> > + struct anon_vma_name *new_name);
> > +struct vm_area_struct *vma_modify_policy(struct vma_iterator *vmi,
> > + struct vm_area_struct *prev,
> > + struct vm_area_struct *vma,
> > + unsigned long start, unsigned long end,
> > + struct mempolicy *new_pol);
> > +struct vm_area_struct *vma_modify_uffd(struct vma_iterator *vmi,
> > + struct vm_area_struct *prev,
> > + struct vm_area_struct *vma,
> > + unsigned long start, unsigned long end,
> > + unsigned long new_flags,
> > + struct vm_userfaultfd_ctx new_ctx);
>
> Could these be instead static inline wrappers, and vma_modify exported
> instead of static?

I started by trying this, but sadly the vma_policy() helper needs the
mempolicy header, and trying to include that in mm.h produces a horror
show of things breaking.

As discussed via IRC, will look to see whether we can sensibly move this
define into mm_types.h and then we can shift these.

>
> Maybe we could also move this to mm/internal.h? Which would mean
> fs/userfaultfd.c would have to start including it, but as it's already so
> much rooted in mm, it shouldn't be wrong?

I'm not a fan of having fs/userfaultfd.c include mm/internal.h; that
seems like a bridge too far. I think it's a bit odd that the fs bit
invokes mm bits but the mm bit doesn't, but this might be an artifact of
how uffd is implemented.

I do in principle like the idea, as we can then seriously shift what I
consider to be impl details (mergey/splitty) to being as internal as we can
make it, but I think perhaps it's something we can address later if it
makes sense to move some uffd bits around.

>
> >
> > static inline int check_data_rlimit(unsigned long rlim,
> > unsigned long new,
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index a4a20de50494..73024693d5c8 100644
>

2023-10-09 18:21:54

by Lorenzo Stoakes

Subject: Re: [PATCH 2/4] mm: make vma_merge() and split_vma() internal

On Mon, Oct 09, 2023 at 05:45:26PM +0200, Vlastimil Babka wrote:
> On 10/8/23 22:23, Lorenzo Stoakes wrote:
> > Now the vma_merge()/split_vma() pattern has been abstracted, we use it
>
> "it" refers to split_vma() only so "the latter" or "split_vma()"?
>

I mean to say the pattern of attempting vma_merge(), then if that fails,
splitting as necessary. I will try to clarify the language in v2.

> > entirely internally within mm/mmap.c, so make the function static. We also
> > no longer need vma_merge() anywhere else except mm/mremap.c, so make it
> > internal.
> >
> > In addition, the split_vma() nommu variant also need not be exported.
> >
> > Signed-off-by: Lorenzo Stoakes <[email protected]>
>
> Reviewed-by: Vlastimil Babka <[email protected]>
>

Thanks!

2023-10-09 18:21:54

by Lorenzo Stoakes

Subject: Re: [PATCH 3/4] mm: abstract merge for new VMAs into vma_merge_new_vma()

On Mon, Oct 09, 2023 at 06:04:47PM +0200, Vlastimil Babka wrote:
> On 10/8/23 22:23, Lorenzo Stoakes wrote:
> > Only in mmap_region() and copy_vma() do we add VMAs which occupy entirely
> > new regions of virtual memory.
> >
> > We can share the logic between these invocations and make it absolutely
> > explici to reduce confusion around the rather inscrutible parameters
>
> explicit ... inscrutable
>

Ack will fix up in v2.

> > possessed by vma_merge().
> >
> > This also paves the way for a simplification of the core vma_merge()
> > implementation, as we seek to make the function entirely an implementation
> > detail.
> >
> > Note that on mmap_region(), vma fields are initialised to zero, so we can
> > simply reference these rather than explicitly specifying NULL.
>
> Right, if they were different from NULL, the code would be broken already.
>
> > Signed-off-by: Lorenzo Stoakes <[email protected]>
>
> Reviewed-by: Vlastimil Babka <[email protected]>
>

Thanks!

2023-10-09 18:22:19

by Lorenzo Stoakes

Subject: Re: [PATCH 4/4] mm: abstract VMA extension and merge into vma_merge_extend() helper

On Mon, Oct 09, 2023 at 06:30:02PM +0200, Vlastimil Babka wrote:
> On 10/8/23 22:23, Lorenzo Stoakes wrote:
> > mremap uses vma_merge() in the case where a VMA needs to be extended. This
> > can be significantly simplified and abstracted.
> >
> > This makes it far easier to understand what the actual function is doing,
> > avoids future mistakes in use of the confusing vma_merge() function and
> > importantly allows us to make future changes to how vma_merge() is
> > implemented by knowing explicitly which merge cases each invocation uses.
> >
> > Note that in the mremap() extend case, we perform this merge only when
> > old_len == vma->vm_end - addr. The extension_start, i.e. the start of the
> > extended portion of the VMA is equal to addr + old_len, i.e. vma->vm_end.
> >
> > With this refactoring, vma_merge() is no longer required anywhere except
> > mm/mmap.c, so mark it static.
> >
> > Signed-off-by: Lorenzo Stoakes <[email protected]>
>
> Reviewed-by: Vlastimil Babka <[email protected]>

Thanks!

>
> Nit:
> > @@ -2546,6 +2546,24 @@ static struct vm_area_struct *vma_merge_new_vma(struct vma_iterator *vmi,
> > vma->vm_userfaultfd_ctx, anon_vma_name(vma));
> > }
> >
> > +/*
> > + * Expand vma by delta bytes, potentially merging with an immediately adjacent
> > + * VMA with identical properties.
> > + */
> > +struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi,
> > + struct vm_area_struct *vma,
> > + unsigned long delta)
> > +{
> > + pgoff_t pgoff = vma->vm_pgoff +
> > + ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT);
>
> could use vma_pages() here

Will update in v2.

>
> > +
> > + /* vma is specified as prev, so case 1 or 2 will apply. */
> > + return vma_merge(vmi, vma->vm_mm, vma, vma->vm_end, vma->vm_end + delta,
> > + vma->vm_flags, vma->anon_vma, vma->vm_file, pgoff,
> > + vma_policy(vma), vma->vm_userfaultfd_ctx,
> > + anon_vma_name(vma));
> > +}
> > +
> > /*
> > * do_vmi_align_munmap() - munmap the aligned region from @start to @end.
> > * @vmi: The vma iterator