Andrew,
This patch set does two things: 1. cleans up the code, including the
removal of __vma_adjust(), and 2. extends the VMA iterator API to
provide type safety to the VMA operations using the maple tree, as
requested by Linus [1].
It also addresses another usability issue brought up by Linus: the need
to modify the maple state within the loops. The maple state has been
replaced by the VMA iterator, and the iterator is now modified within
the MM code, so callers no longer need to do that work themselves when
tree modifications occur.
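As a minimal sketch of the caller-side pattern the series converges on
(for_each_vma() is added later in the series; do_something() is a
hypothetical stand-in for the loop body):

	struct vm_area_struct *vma;
	VMA_ITERATOR(vmi, mm, 0);	/* replaces MA_STATE(mas, &mm->mm_mt, 0, 0) */

	for_each_vma(vmi, vma) {
		/*
		 * Helpers that modify the tree (split, merge, munmap)
		 * now update or invalidate the iterator themselves, so
		 * the caller does not re-position it by hand.
		 */
		do_something(vma);
	}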
This exposed a potential inconsistency between the iterator state and
what the user expects, so the inconsistency is resolved to keep the VMA
iterator safe for use after looping over a VMA range. This is handled
in patch 3 ("maple_tree: Reduce user error potential") and patch 4
("test_maple_tree: Test modifications while iterating").
While cleaning up the state, the duplicate locking code in mm/mmap.c
introduced by the maple tree has been addressed by abstracting it into
two functions: vma_prepare() and vma_complete(). These abstractions
allow for a much simpler __vma_adjust(), which eventually leads to the
removal of __vma_adjust() entirely by placing its logic into
vma_merge() itself.
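The do_brk_flags() conversion later in the series shows the resulting
shape; a minimal sketch of the pairing (the vm_end update stands in for
whatever modification the caller makes):

	struct vma_prepare vp;

	init_vma_prep(&vp, vma);	/* single-VMA setup */
	vma_prepare(&vp);		/* take anon_vma/i_mmap locks */
	vma->vm_end = addr + len;	/* the actual VMA modification */
	vma_iter_store(vmi, vma);	/* write the new range to the tree */
	vma_complete(&vp, vmi, mm);	/* post-update and drop the locks */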
1. https://lore.kernel.org/linux-mm/CAHk-=wg9WQXBGkNdKD2bqocnN73rDswuWsavBB7T-tekykEn_A@mail.gmail.com/
Changes since v3:
- Rebased patches on mm-unstable
- Updated damon patches as per SeongJae Park's comments
- Added extra patch to convert the new call of vma_adjust() in mremap()
v3: https://lore.kernel.org/linux-mm/[email protected]/
v2: https://lore.kernel.org/linux-mm/[email protected]/
v1: https://lore.kernel.org/linux-mm/[email protected]/
Liam R. Howlett (49):
maple_tree: Add mas_init() function
maple_tree: Fix potential rcu issue
maple_tree: Reduce user error potential
test_maple_tree: Test modifications while iterating
maple_tree: Fix handle of invalidated state in mas_wr_store_setup()
maple_tree: Fix mas_prev() and mas_find() state handling
mm: Expand vma iterator interface
mm/mmap: convert brk to use vma iterator
kernel/fork: Convert forking to using the vmi iterator
mmap: Convert vma_link() vma iterator
mm/mmap: Remove preallocation from do_mas_align_munmap()
mmap: Change do_mas_munmap and do_mas_aligned_munmap() to use vma
iterator
mmap: Convert vma_expand() to use vma iterator
mm: Add temporary vma iterator versions of vma_merge(), split_vma(),
and __split_vma()
ipc/shm: Use the vma iterator for munmap calls
userfaultfd: Use vma iterator
mm: Change mprotect_fixup to vma iterator
mlock: Convert mlock to vma iterator
coredump: Convert to vma iterator
mempolicy: Convert to vma iterator
task_mmu: Convert to vma iterator
sched: Convert to vma iterator
madvise: Use vmi iterator for __split_vma() and vma_merge()
mmap: Pass through vmi iterator to __split_vma()
mmap: Use vmi version of vma_merge()
mm/mremap: Use vmi version of vma_merge()
nommu: Convert nommu to using the vma iterator
nommu: Pass through vma iterator to shrink_vma()
mm: Switch vma_merge(), split_vma(), and __split_vma to vma iterator
mm/damon/vaddr-test.h: Stop using vma_mas_store() for maple tree store
mmap: Convert __vma_adjust() to use vma iterator
mm: Pass through vma iterator to __vma_adjust()
madvise: Use split_vma() instead of __split_vma()
mm: Remove unnecessary write to vma iterator in __vma_adjust()
mm: Pass vma iterator through to __vma_adjust()
mm: Add vma iterator to vma_adjust() arguments
mmap: Clean up mmap_region() unrolling
mm: Change munmap splitting order and move_vma()
mm/mmap: move anon_vma setting in __vma_adjust()
mm/mmap: Refactor locking out of __vma_adjust()
mm/mmap: Use vma_prepare() and vma_complete() in vma_expand()
mm/mmap: Introduce init_vma_prep() and init_multi_vma_prep()
mm: Don't use __vma_adjust() in __split_vma()
mm/mremap: Convert vma_adjust() to vma_expand()
mm/mmap: Don't use __vma_adjust() in shift_arg_pages()
mm/mmap: Introduce dup_vma_anon() helper
mm/mmap: Convert do_brk_flags() to use vma_prepare() and
vma_complete()
mm/mmap: Remove __vma_adjust()
vma_merge: Set vma iterator to correct position.
fs/coredump.c | 8 +-
fs/exec.c | 16 +-
fs/proc/task_mmu.c | 27 +-
fs/userfaultfd.c | 87 ++-
include/linux/maple_tree.h | 11 +
include/linux/mm.h | 87 ++-
include/linux/mm_types.h | 4 +-
ipc/shm.c | 11 +-
kernel/events/uprobes.c | 2 +-
kernel/fork.c | 19 +-
kernel/sched/fair.c | 14 +-
lib/maple_tree.c | 19 +-
lib/test_maple_tree.c | 72 +++
mm/damon/vaddr-test.h | 20 +-
mm/filemap.c | 2 +-
mm/internal.h | 78 +++
mm/madvise.c | 13 +-
mm/mempolicy.c | 25 +-
mm/mlock.c | 57 +-
mm/mmap.c | 1022 +++++++++++++++++-------------------
mm/mprotect.c | 47 +-
mm/mremap.c | 44 +-
mm/nommu.c | 124 ++---
mm/rmap.c | 15 +-
24 files changed, 949 insertions(+), 875 deletions(-)
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Ensure the node isn't dead after reading the node end. Under RCU, the
node can be freed and reused during the walk, so the read of the node
end must be validated by checking the dead flag after the read rather
than before it.
Signed-off-by: Liam R. Howlett <[email protected]>
---
lib/maple_tree.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index cb8c03c4dce7..cbb8bd9b9d25 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4654,13 +4654,13 @@ static inline void *mas_next_nentry(struct ma_state *mas,
pivots = ma_pivots(node, type);
slots = ma_slots(node, type);
mas->index = mas_safe_min(mas, pivots, mas->offset);
+ count = ma_data_end(node, type, pivots, mas->max);
if (ma_dead_node(node))
return NULL;
if (mas->index > max)
return NULL;
- count = ma_data_end(node, type, pivots, mas->max);
if (mas->offset > count)
return NULL;
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Drop the vmi_* functions and transition all users to use the vma
iterator directly.
Signed-off-by: Liam R. Howlett <[email protected]>
---
fs/userfaultfd.c | 14 ++++----
include/linux/mm.h | 18 +++--------
mm/madvise.c | 6 ++--
mm/mempolicy.c | 6 ++--
mm/mlock.c | 6 ++--
mm/mmap.c | 79 +++++++++++++---------------------------------
mm/mprotect.c | 6 ++--
mm/mremap.c | 10 +++---
mm/nommu.c | 8 +++--
9 files changed, 55 insertions(+), 98 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 4334bd35984d..f3c75c6222de 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -909,7 +909,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
continue;
}
new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
- prev = vmi_vma_merge(&vmi, mm, prev, vma->vm_start, vma->vm_end,
+ prev = vma_merge(&vmi, mm, prev, vma->vm_start, vma->vm_end,
new_flags, vma->anon_vma,
vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
@@ -1452,7 +1452,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
vma_end = min(end, vma->vm_end);
new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
- prev = vmi_vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
+ prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
((struct vm_userfaultfd_ctx){ ctx }),
@@ -1463,12 +1463,12 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
goto next;
}
if (vma->vm_start < start) {
- ret = vmi_split_vma(&vmi, mm, vma, start, 1);
+ ret = split_vma(&vmi, vma, start, 1);
if (ret)
break;
}
if (vma->vm_end > end) {
- ret = vmi_split_vma(&vmi, mm, vma, end, 0);
+ ret = split_vma(&vmi, vma, end, 0);
if (ret)
break;
}
@@ -1632,7 +1632,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
uffd_wp_range(mm, vma, start, vma_end - start, false);
new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
- prev = vmi_vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
+ prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
NULL_VM_UFFD_CTX, anon_vma_name(vma));
@@ -1641,12 +1641,12 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
goto next;
}
if (vma->vm_start < start) {
- ret = vmi_split_vma(&vmi, mm, vma, start, 1);
+ ret = split_vma(&vmi, vma, start, 1);
if (ret)
break;
}
if (vma->vm_end > end) {
- ret = vmi_split_vma(&vmi, mm, vma, end, 0);
+ ret = split_vma(&vmi, vma, end, 0);
if (ret)
break;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bd0017ab13f3..9f519c6ea006 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2839,24 +2839,16 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
{
return __vma_adjust(vma, start, end, pgoff, insert, NULL);
}
-extern struct vm_area_struct *vma_merge(struct mm_struct *,
- struct vm_area_struct *prev, unsigned long addr, unsigned long end,
- unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
- struct mempolicy *, struct vm_userfaultfd_ctx, struct anon_vma_name *);
-extern struct vm_area_struct *vmi_vma_merge(struct vma_iterator *vmi,
+extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
unsigned long end, unsigned long vm_flags, struct anon_vma *,
struct file *, pgoff_t, struct mempolicy *, struct vm_userfaultfd_ctx,
struct anon_vma_name *);
extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
-extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
- unsigned long addr, int new_below);
-extern int vmi__split_vma(struct vma_iterator *vmi, struct mm_struct *,
- struct vm_area_struct *, unsigned long addr, int new_below);
-extern int split_vma(struct mm_struct *, struct vm_area_struct *,
- unsigned long addr, int new_below);
-extern int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *,
- struct vm_area_struct *, unsigned long addr, int new_below);
+extern int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
+ unsigned long addr, int new_below);
+extern int split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
+ unsigned long addr, int new_below);
extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
extern void unlink_file_vma(struct vm_area_struct *);
extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
diff --git a/mm/madvise.c b/mm/madvise.c
index 4d4471916465..02b317726c9a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -150,7 +150,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
}
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- *prev = vmi_vma_merge(&vmi, mm, *prev, start, end, new_flags,
+ *prev = vma_merge(&vmi, mm, *prev, start, end, new_flags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_name);
if (*prev) {
@@ -163,7 +163,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
if (start != vma->vm_start) {
if (unlikely(mm->map_count >= sysctl_max_map_count))
return -ENOMEM;
- error = vmi__split_vma(&vmi, mm, vma, start, 1);
+ error = __split_vma(&vmi, vma, start, 1);
if (error)
return error;
}
@@ -171,7 +171,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
if (end != vma->vm_end) {
if (unlikely(mm->map_count >= sysctl_max_map_count))
return -ENOMEM;
- error = vmi__split_vma(&vmi, mm, vma, end, 0);
+ error = __split_vma(&vmi, vma, end, 0);
if (error)
return error;
}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f5201285c628..195dcf83dc41 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -809,7 +809,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
pgoff = vma->vm_pgoff +
((vmstart - vma->vm_start) >> PAGE_SHIFT);
- prev = vmi_vma_merge(&vmi, mm, prev, vmstart, vmend, vma->vm_flags,
+ prev = vma_merge(&vmi, mm, prev, vmstart, vmend, vma->vm_flags,
vma->anon_vma, vma->vm_file, pgoff,
new_pol, vma->vm_userfaultfd_ctx,
anon_vma_name(vma));
@@ -818,12 +818,12 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
goto replace;
}
if (vma->vm_start != vmstart) {
- err = vmi_split_vma(&vmi, vma->vm_mm, vma, vmstart, 1);
+ err = split_vma(&vmi, vma, vmstart, 1);
if (err)
goto out;
}
if (vma->vm_end != vmend) {
- err = vmi_split_vma(&vmi, vma->vm_mm, vma, vmend, 0);
+ err = split_vma(&vmi, vma, vmend, 0);
if (err)
goto out;
}
diff --git a/mm/mlock.c b/mm/mlock.c
index 0d09b9070071..0336f52e03d7 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -418,7 +418,7 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
goto out;
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- *prev = vmi_vma_merge(vmi, mm, *prev, start, end, newflags,
+ *prev = vma_merge(vmi, mm, *prev, start, end, newflags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_vma_name(vma));
if (*prev) {
@@ -427,13 +427,13 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
}
if (start != vma->vm_start) {
- ret = vmi_split_vma(vmi, mm, vma, start, 1);
+ ret = split_vma(vmi, vma, start, 1);
if (ret)
goto out;
}
if (end != vma->vm_end) {
- ret = vmi_split_vma(vmi, mm, vma, end, 0);
+ ret = split_vma(vmi, vma, end, 0);
if (ret)
goto out;
}
diff --git a/mm/mmap.c b/mm/mmap.c
index 0696bf9e1085..4a5d5c9a8dc6 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1013,7 +1013,7 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
* parameter) may establish ptes with the wrong permissions of NNNN
* instead of the right permissions of XXXX.
*/
-struct vm_area_struct *vma_merge(struct mm_struct *mm,
+struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
struct vm_area_struct *prev, unsigned long addr,
unsigned long end, unsigned long vm_flags,
struct anon_vma *anon_vma, struct file *file,
@@ -1022,7 +1022,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
struct anon_vma_name *anon_name)
{
pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
- struct vm_area_struct *mid, *next, *res;
+ struct vm_area_struct *mid, *next, *res = NULL;
int err = -1;
bool merge_prev = false;
bool merge_next = false;
@@ -1088,26 +1088,11 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
if (err)
return NULL;
khugepaged_enter_vma(res, vm_flags);
- return res;
-}
-struct vm_area_struct *vmi_vma_merge(struct vma_iterator *vmi,
- struct mm_struct *mm,
- struct vm_area_struct *prev, unsigned long addr,
- unsigned long end, unsigned long vm_flags,
- struct anon_vma *anon_vma, struct file *file,
- pgoff_t pgoff, struct mempolicy *policy,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name)
-{
- struct vm_area_struct *tmp;
-
- tmp = vma_merge(mm, prev, addr, end, vm_flags, anon_vma, file, pgoff,
- policy, vm_userfaultfd_ctx, anon_name);
- if (tmp)
+ if (res)
vma_iter_set(vmi, end);
- return tmp;
+ return res;
}
/*
@@ -2231,12 +2216,14 @@ static void unmap_region(struct mm_struct *mm, struct maple_tree *mt,
* __split_vma() bypasses sysctl_max_map_count checking. We use this where it
* has already been checked or doesn't make sense to fail.
*/
-int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
+int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long addr, int new_below)
{
struct vm_area_struct *new;
int err;
- validate_mm_mt(mm);
+ unsigned long end = vma->vm_end;
+
+ validate_mm_mt(vma->vm_mm);
if (vma->vm_ops && vma->vm_ops->may_split) {
err = vma->vm_ops->may_split(vma, addr);
@@ -2276,8 +2263,10 @@ int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
err = vma_adjust(vma, vma->vm_start, addr, vma->vm_pgoff, new);
/* Success. */
- if (!err)
+ if (!err) {
+ vma_iter_set(vmi, end);
return 0;
+ }
/* Avoid vm accounting in close() operation */
new->vm_start = new->vm_end;
@@ -2292,46 +2281,21 @@ int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
mpol_put(vma_policy(new));
out_free_vma:
vm_area_free(new);
- validate_mm_mt(mm);
+ validate_mm_mt(vma->vm_mm);
return err;
}
-int vmi__split_vma(struct vma_iterator *vmi, struct mm_struct *mm,
- struct vm_area_struct *vma, unsigned long addr, int new_below)
-{
- int ret;
- unsigned long end = vma->vm_end;
-
- ret = __split_vma(mm, vma, addr, new_below);
- if (!ret)
- vma_iter_set(vmi, end);
-
- return ret;
-}
/*
* Split a vma into two pieces at address 'addr', a new vma is allocated
* either for the first part or the tail.
*/
-int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
+int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long addr, int new_below)
{
- if (mm->map_count >= sysctl_max_map_count)
+ if (vma->vm_mm->map_count >= sysctl_max_map_count)
return -ENOMEM;
- return __split_vma(mm, vma, addr, new_below);
-}
-
-int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *mm,
- struct vm_area_struct *vma, unsigned long addr, int new_below)
-{
- int ret;
- unsigned long end = vma->vm_end;
-
- ret = split_vma(mm, vma, addr, new_below);
- if (!ret)
- vma_iter_set(vmi, end);
-
- return ret;
+ return __split_vma(vmi, vma, addr, new_below);
}
static inline int munmap_sidetree(struct vm_area_struct *vma,
@@ -2391,7 +2355,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
goto map_count_exceeded;
- error = vmi__split_vma(vmi, mm, vma, start, 0);
+ error = __split_vma(vmi, vma, start, 0);
if (error)
goto start_split_failed;
@@ -2412,7 +2376,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (next->vm_end > end) {
struct vm_area_struct *split;
- error = vmi__split_vma(vmi, mm, next, end, 1);
+ error = __split_vma(vmi, next, end, 1);
if (error)
goto end_split_failed;
@@ -2693,9 +2657,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
* vma again as we may succeed this time.
*/
if (unlikely(vm_flags != vma->vm_flags && prev)) {
- merge = vmi_vma_merge(&vmi, mm, prev, vma->vm_start,
- vma->vm_end, vma->vm_flags, NULL, vma->vm_file,
- vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
+ merge = vma_merge(&vmi, mm, prev, vma->vm_start,
+ vma->vm_end, vma->vm_flags, NULL,
+ vma->vm_file, vma->vm_pgoff, NULL,
+ NULL_VM_UFFD_CTX, NULL);
if (merge) {
/*
* ->mmap() can change vma->vm_file and fput
@@ -3241,7 +3206,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
if (new_vma && new_vma->vm_start < addr + len)
return NULL; /* should never get here */
- new_vma = vmi_vma_merge(&vmi, mm, prev, addr, addr + len, vma->vm_flags,
+ new_vma = vma_merge(&vmi, mm, prev, addr, addr + len, vma->vm_flags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_vma_name(vma));
if (new_vma) {
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 42ceb0548754..c417f7d5d0e3 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -642,7 +642,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
* First try to merge with previous and/or next vma.
*/
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- *pprev = vmi_vma_merge(vmi, mm, *pprev, start, end, newflags,
+ *pprev = vma_merge(vmi, mm, *pprev, start, end, newflags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_vma_name(vma));
if (*pprev) {
@@ -654,13 +654,13 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
*pprev = vma;
if (start != vma->vm_start) {
- error = vmi_split_vma(vmi, mm, vma, start, 1);
+ error = split_vma(vmi, vma, start, 1);
if (error)
goto fail;
}
if (end != vma->vm_end) {
- error = vmi_split_vma(vmi, mm, vma, end, 0);
+ error = split_vma(vmi, vma, end, 0);
if (error)
goto fail;
}
diff --git a/mm/mremap.c b/mm/mremap.c
index f161516ab3c1..71ba8eddd836 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1043,12 +1043,10 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
* when a vma would be actually removed due to a merge.
*/
if (!vma->vm_ops || !vma->vm_ops->close) {
- vma = vmi_vma_merge(&vmi, mm, vma,
- extension_start, extension_end,
- vma->vm_flags, vma->anon_vma,
- vma->vm_file, extension_pgoff,
- vma_policy(vma), vma->vm_userfaultfd_ctx,
- anon_vma_name(vma));
+ vma = vma_merge(&vmi, mm, vma, extension_start,
+ extension_end, vma->vm_flags, vma->anon_vma,
+ vma->vm_file, extension_pgoff, vma_policy(vma),
+ vma->vm_userfaultfd_ctx, anon_vma_name(vma));
} else if (vma_adjust(vma, vma->vm_start, addr + new_len,
vma->vm_pgoff, NULL)) {
vma = NULL;
diff --git a/mm/nommu.c b/mm/nommu.c
index 9ddeb92600d6..9a166738909e 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1297,18 +1297,20 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_arg_struct __user *, arg)
* split a vma into two pieces at address 'addr', a new vma is allocated either
* for the first part or the tail.
*/
-int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *mm,
- struct vm_area_struct *vma, unsigned long addr, int new_below)
+int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long addr, int new_below)
{
struct vm_area_struct *new;
struct vm_region *region;
unsigned long npages;
+ struct mm_struct *mm;
/* we're only permitted to split anonymous regions (these should have
* only a single usage on the region) */
if (vma->vm_file)
return -ENOMEM;
+ mm = vma->vm_mm;
if (mm->map_count >= sysctl_max_map_count)
return -ENOMEM;
@@ -1465,7 +1467,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, struct list
if (end != vma->vm_end && offset_in_page(end))
return -EINVAL;
if (start != vma->vm_start && end != vma->vm_end) {
- ret = vmi_split_vma(&vmi, mm, vma, start, 1);
+ ret = split_vma(&vmi, vma, start, 1);
if (ret < 0)
return ret;
}
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
When iterating, a user may operate on the tree and cause the maple
state to be altered and left in an unintuitive state. Detect this
scenario and correct it by setting the state to the limit and
invalidating it.
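A minimal sketch of the scenario (needs_update() and the range bounds
are placeholders; the store grows an entry past the walk limit):

	void *entry;
	MA_STATE(mas, &tree, 0, 0);

	mas_for_each(&mas, entry, limit) {
		if (needs_update(entry)) {
			mas.index = new_start;
			mas.last = new_end;	/* may exceed 'limit' */
			mas_store(&mas, entry);
		}
	}
	/*
	 * With this patch, the next mas_find()/mas_next() sees
	 * mas.index > limit, clamps the state to the limit, pauses it,
	 * and returns NULL instead of walking from a stale position.
	 */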
Signed-off-by: Liam R. Howlett <[email protected]>
---
lib/maple_tree.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index cbb8bd9b9d25..1af09c6f7810 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4731,6 +4731,11 @@ static inline void *mas_next_entry(struct ma_state *mas, unsigned long limit)
unsigned long last;
enum maple_type mt;
+ if (mas->index > limit) {
+ mas->index = mas->last = limit;
+ mas_pause(mas);
+ return NULL;
+ }
last = mas->last;
retry:
offset = mas->offset;
@@ -4837,6 +4842,11 @@ static inline void *mas_prev_entry(struct ma_state *mas, unsigned long min)
{
void *entry;
+ if (mas->index < min) {
+ mas->index = mas->last = min;
+ mas_pause(mas);
+ return NULL;
+ }
retry:
while (likely(!mas_is_none(mas))) {
entry = mas_prev_nentry(mas, min, mas->index);
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
These wrappers are short-lived in this patch set so that each user can
be converted on its own. In the end, these functions are renamed in one
commit.
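For instance, a split_vma() call site converts in two steps (sketch;
both signatures appear in this series):

	/* Mid-series: the temporary wrapper threads the iterator through. */
	error = vmi_split_vma(&vmi, mm, vma, addr, 1);

	/*
	 * End of series: the wrapper becomes the real function and the
	 * mm argument is dropped, since it is reachable via vma->vm_mm.
	 */
	error = split_vma(&vmi, vma, addr, 1);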
Signed-off-by: Liam R. Howlett <[email protected]>
---
include/linux/mm.h | 11 ++++++++++-
mm/mmap.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+), 1 deletion(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 152a1362b800..956025940053 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2843,11 +2843,20 @@ extern struct vm_area_struct *vma_merge(struct mm_struct *,
struct vm_area_struct *prev, unsigned long addr, unsigned long end,
unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
struct mempolicy *, struct vm_userfaultfd_ctx, struct anon_vma_name *);
+extern struct vm_area_struct *vmi_vma_merge(struct vma_iterator *vmi,
+ struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
+ unsigned long end, unsigned long vm_flags, struct anon_vma *,
+ struct file *, pgoff_t, struct mempolicy *, struct vm_userfaultfd_ctx,
+ struct anon_vma_name *);
extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
- unsigned long addr, int new_below);
+ unsigned long addr, int new_below);
+extern int vmi__split_vma(struct vma_iterator *vmi, struct mm_struct *,
+ struct vm_area_struct *, unsigned long addr, int new_below);
extern int split_vma(struct mm_struct *, struct vm_area_struct *,
unsigned long addr, int new_below);
+extern int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *,
+ struct vm_area_struct *, unsigned long addr, int new_below);
extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
extern void unlink_file_vma(struct vm_area_struct *);
extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
diff --git a/mm/mmap.c b/mm/mmap.c
index 2ec671a119c1..5092d0405883 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1091,6 +1091,25 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
return res;
}
+struct vm_area_struct *vmi_vma_merge(struct vma_iterator *vmi,
+ struct mm_struct *mm,
+ struct vm_area_struct *prev, unsigned long addr,
+ unsigned long end, unsigned long vm_flags,
+ struct anon_vma *anon_vma, struct file *file,
+ pgoff_t pgoff, struct mempolicy *policy,
+ struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+ struct anon_vma_name *anon_name)
+{
+ struct vm_area_struct *tmp;
+
+ tmp = vma_merge(mm, prev, addr, end, vm_flags, anon_vma, file, pgoff,
+ policy, vm_userfaultfd_ctx, anon_name);
+ if (tmp)
+ vma_iter_set(vmi, end);
+
+ return tmp;
+}
+
/*
* Rough compatibility check to quickly see if it's even worth looking
* at sharing an anon_vma.
@@ -2276,6 +2295,18 @@ int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
validate_mm_mt(mm);
return err;
}
+int vmi__split_vma(struct vma_iterator *vmi, struct mm_struct *mm,
+ struct vm_area_struct *vma, unsigned long addr, int new_below)
+{
+ int ret;
+ unsigned long end = vma->vm_end;
+
+ ret = __split_vma(mm, vma, addr, new_below);
+ if (!ret)
+ vma_iter_set(vmi, end);
+
+ return ret;
+}
/*
* Split a vma into two pieces at address 'addr', a new vma is allocated
@@ -2290,6 +2321,19 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
return __split_vma(mm, vma, addr, new_below);
}
+int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *mm,
+ struct vm_area_struct *vma, unsigned long addr, int new_below)
+{
+ int ret;
+ unsigned long end = vma->vm_end;
+
+ ret = split_vma(mm, vma, addr, new_below);
+ if (!ret)
+ vma_iter_set(vmi, end);
+
+ return ret;
+}
+
static inline int munmap_sidetree(struct vm_area_struct *vma,
struct ma_state *mas_detach)
{
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Use the abstracted vma locking for do_brk_flags().
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 0a2b19633174..5aa048e9ff30 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2928,6 +2928,7 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long addr, unsigned long len, unsigned long flags)
{
struct mm_struct *mm = current->mm;
+ struct vma_prepare vp;
validate_mm_mt(mm);
/*
@@ -2955,18 +2956,13 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
goto unacct_fail;
vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
- if (vma->anon_vma) {
- anon_vma_lock_write(vma->anon_vma);
- anon_vma_interval_tree_pre_update_vma(vma);
- }
+ init_vma_prep(&vp, vma);
+ vma_prepare(&vp);
vma->vm_end = addr + len;
vma->vm_flags |= VM_SOFTDIRTY;
vma_iter_store(vmi, vma);
- if (vma->anon_vma) {
- anon_vma_interval_tree_post_update_vma(vma);
- anon_vma_unlock_write(vma->anon_vma);
- }
+ vma_complete(&vp, vmi, mm);
khugepaged_enter_vma(vma, flags);
goto out;
}
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Use the vma iterator so that the iterator can be invalidated or updated
within the MM code, avoiding the need for each caller to do so.
Signed-off-by: Liam R. Howlett <[email protected]>
---
fs/exec.c | 5 ++++-
include/linux/mm.h | 6 +++---
mm/mprotect.c | 47 ++++++++++++++++++++++------------------------
3 files changed, 29 insertions(+), 29 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index ab913243a367..b98647eeae9f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -758,6 +758,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
unsigned long stack_expand;
unsigned long rlim_stack;
struct mmu_gather tlb;
+ struct vma_iterator vmi;
#ifdef CONFIG_STACK_GROWSUP
/* Limit stack size */
@@ -812,8 +813,10 @@ int setup_arg_pages(struct linux_binprm *bprm,
vm_flags |= mm->def_flags;
vm_flags |= VM_STACK_INCOMPLETE_SETUP;
+ vma_iter_init(&vmi, mm, vma->vm_start);
+
tlb_gather_mmu(&tlb, mm);
- ret = mprotect_fixup(&tlb, vma, &prev, vma->vm_start, vma->vm_end,
+ ret = mprotect_fixup(&vmi, &tlb, vma, &prev, vma->vm_start, vma->vm_end,
vm_flags);
tlb_finish_mmu(&tlb);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 956025940053..bd0017ab13f3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2197,9 +2197,9 @@ bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
extern long change_protection(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start,
unsigned long end, unsigned long cp_flags);
-extern int mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
- struct vm_area_struct **pprev, unsigned long start,
- unsigned long end, unsigned long newflags);
+extern int mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
+ struct vm_area_struct *vma, struct vm_area_struct **pprev,
+ unsigned long start, unsigned long end, unsigned long newflags);
/*
* doesn't attempt to fault and will return short.
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 6ecdf0671b81..42ceb0548754 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -585,9 +585,9 @@ static const struct mm_walk_ops prot_none_walk_ops = {
};
int
-mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
- struct vm_area_struct **pprev, unsigned long start,
- unsigned long end, unsigned long newflags)
+mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
+ struct vm_area_struct *vma, struct vm_area_struct **pprev,
+ unsigned long start, unsigned long end, unsigned long newflags)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long oldflags = vma->vm_flags;
@@ -642,7 +642,7 @@ mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
* First try to merge with previous and/or next vma.
*/
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- *pprev = vma_merge(mm, *pprev, start, end, newflags,
+ *pprev = vmi_vma_merge(vmi, mm, *pprev, start, end, newflags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_vma_name(vma));
if (*pprev) {
@@ -654,13 +654,13 @@ mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
*pprev = vma;
if (start != vma->vm_start) {
- error = split_vma(mm, vma, start, 1);
+ error = vmi_split_vma(vmi, mm, vma, start, 1);
if (error)
goto fail;
}
if (end != vma->vm_end) {
- error = split_vma(mm, vma, end, 0);
+ error = vmi_split_vma(vmi, mm, vma, end, 0);
if (error)
goto fail;
}
@@ -709,7 +709,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
const bool rier = (current->personality & READ_IMPLIES_EXEC) &&
(prot & PROT_READ);
struct mmu_gather tlb;
-	MA_STATE(mas, &current->mm->mm_mt, 0, 0);
+ struct vma_iterator vmi;
start = untagged_addr(start);
@@ -741,8 +741,8 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
if ((pkey != -1) && !mm_pkey_is_allocated(current->mm, pkey))
goto out;
- mas_set(&mas, start);
- vma = mas_find(&mas, ULONG_MAX);
+ vma_iter_init(&vmi, current->mm, start);
+ vma = vma_find(&vmi, end);
error = -ENOMEM;
if (!vma)
goto out;
@@ -765,18 +765,22 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
}
}
+ prev = vma_prev(&vmi);
if (start > vma->vm_start)
prev = vma;
- else
- prev = mas_prev(&mas, 0);
tlb_gather_mmu(&tlb, current->mm);
- for (nstart = start ; ; ) {
+ nstart = start;
+ tmp = vma->vm_start;
+ for_each_vma_range(vmi, vma, end) {
unsigned long mask_off_old_flags;
unsigned long newflags;
int new_vma_pkey;
- /* Here we know that vma->vm_start <= nstart < vma->vm_end. */
+ if (vma->vm_start != tmp) {
+ error = -ENOMEM;
+ break;
+ }
/* Does the application expect PROT_READ to imply PROT_EXEC */
if (rier && (vma->vm_flags & VM_MAYEXEC))
@@ -819,25 +823,18 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
break;
}
- error = mprotect_fixup(&tlb, vma, &prev, nstart, tmp, newflags);
+ error = mprotect_fixup(&vmi, &tlb, vma, &prev, nstart, tmp, newflags);
if (error)
break;
nstart = tmp;
-
- if (nstart < prev->vm_end)
- nstart = prev->vm_end;
- if (nstart >= end)
- break;
-
- vma = find_vma(current->mm, prev->vm_end);
- if (!vma || vma->vm_start != nstart) {
- error = -ENOMEM;
- break;
- }
prot = reqprot;
}
tlb_finish_mmu(&tlb);
+
+ if (vma_iter_end(&vmi) < end)
+ error = -ENOMEM;
+
out:
mmap_write_unlock(current->mm);
return error;
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Add a function that will zero out the maple state struct and set some
basic defaults.
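A hedged usage sketch, mirroring how vma_iter_init() uses it later in
the series:

	struct ma_state mas;

	/*
	 * Zeroes the struct, then sets the tree, index/last = addr,
	 * max = ULONG_MAX, and node = MAS_START, ready for a fresh walk.
	 */
	mas_init(&mas, &mm->mm_mt, addr);
	vma = mas_find(&mas, ULONG_MAX);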
Signed-off-by: Liam R. Howlett <[email protected]>
---
include/linux/maple_tree.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index a7bf58fd7cc6..1fadb5f5978b 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -432,6 +432,7 @@ struct ma_wr_state {
.min = 0, \
.max = ULONG_MAX, \
.alloc = NULL, \
+ .mas_flags = 0, \
}
#define MA_WR_STATE(name, ma_state, wr_entry) \
@@ -470,6 +471,16 @@ void *mas_next(struct ma_state *mas, unsigned long max);
int mas_empty_area(struct ma_state *mas, unsigned long min, unsigned long max,
unsigned long size);
+static inline void mas_init(struct ma_state *mas, struct maple_tree *tree,
+ unsigned long addr)
+{
+ memset(mas, 0, sizeof(struct ma_state));
+ mas->tree = tree;
+ mas->index = mas->last = addr;
+ mas->max = ULONG_MAX;
+ mas->node = MAS_START;
+}
+
/* Checks if a mas has not found anything */
static inline bool mas_is_none(struct ma_state *mas)
{
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Add wrappers for the maple tree to the vma iterator. This will provide
type safety at compile time.
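The safety comes from the typed parameters: mas_store() takes a
void *entry and accepts any pointer, while the wrappers only accept a
struct vm_area_struct *. A sketch (bad_ptr is a hypothetical pointer of
some unrelated type):

	mas_store(&vmi.mas, bad_ptr);	/* compiles: entry is void * */

	vma_iter_store(&vmi, vma);	/* OK: typed as vm_area_struct * */
	vma_iter_store(&vmi, bad_ptr);	/* flagged: incompatible pointer type */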
Signed-off-by: Liam R. Howlett <[email protected]>
---
include/linux/mm.h | 46 ++++++++++++++++++++++++++---
include/linux/mm_types.h | 4 +--
mm/internal.h | 64 ++++++++++++++++++++++++++++++++++++++++
mm/mmap.c | 18 +++++++++++
4 files changed, 125 insertions(+), 7 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c9db257f09b3..b977a90d9829 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -670,16 +670,16 @@ static inline bool vma_is_accessible(struct vm_area_struct *vma)
static inline
struct vm_area_struct *vma_find(struct vma_iterator *vmi, unsigned long max)
{
- return mas_find(&vmi->mas, max);
+ return mas_find(&vmi->mas, max - 1);
}
static inline struct vm_area_struct *vma_next(struct vma_iterator *vmi)
{
/*
- * Uses vma_find() to get the first VMA when the iterator starts.
+ * Uses mas_find() to get the first VMA when the iterator starts.
* Calling mas_next() could skip the first entry.
*/
- return vma_find(vmi, ULONG_MAX);
+ return mas_find(&vmi->mas, ULONG_MAX);
}
static inline struct vm_area_struct *vma_prev(struct vma_iterator *vmi)
@@ -692,12 +692,50 @@ static inline unsigned long vma_iter_addr(struct vma_iterator *vmi)
return vmi->mas.index;
}
+static inline unsigned long vma_iter_end(struct vma_iterator *vmi)
+{
+ return vmi->mas.last + 1;
+}
+static inline int vma_iter_bulk_alloc(struct vma_iterator *vmi,
+ unsigned long count)
+{
+ return mas_expected_entries(&vmi->mas, count);
+}
+
+/* Free any unused preallocations */
+static inline void vma_iter_free(struct vma_iterator *vmi)
+{
+ mas_destroy(&vmi->mas);
+}
+
+static inline int vma_iter_bulk_store(struct vma_iterator *vmi,
+ struct vm_area_struct *vma)
+{
+ vmi->mas.index = vma->vm_start;
+ vmi->mas.last = vma->vm_end - 1;
+ mas_store(&vmi->mas, vma);
+ if (unlikely(mas_is_err(&vmi->mas)))
+ return -ENOMEM;
+
+ return 0;
+}
+
+static inline void vma_iter_invalidate(struct vma_iterator *vmi)
+{
+ mas_pause(&vmi->mas);
+}
+
+static inline void vma_iter_set(struct vma_iterator *vmi, unsigned long addr)
+{
+ mas_set(&vmi->mas, addr);
+}
+
#define for_each_vma(__vmi, __vma) \
while (((__vma) = vma_next(&(__vmi))) != NULL)
/* The MM code likes to work with exclusive end addresses */
#define for_each_vma_range(__vmi, __vma, __end) \
- while (((__vma) = vma_find(&(__vmi), (__end) - 1)) != NULL)
+ while (((__vma) = vma_find(&(__vmi), (__end))) != NULL)
#ifdef CONFIG_SHMEM
/*
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index a689198caf74..2d6d790d9bed 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -854,9 +854,7 @@ struct vma_iterator {
static inline void vma_iter_init(struct vma_iterator *vmi,
struct mm_struct *mm, unsigned long addr)
{
- vmi->mas.tree = &mm->mm_mt;
- vmi->mas.index = addr;
- vmi->mas.node = MAS_START;
+ mas_init(&vmi->mas, &mm->mm_mt, addr);
}
struct mmu_gather;
diff --git a/mm/internal.h b/mm/internal.h
index ce462bf145b4..b4f66efc912d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -873,4 +873,68 @@ static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma)
return !(vma->vm_flags & VM_SOFTDIRTY);
}
+/*
+ * VMA Iterator functions shared between nommu and mmap
+ */
+static inline int vma_iter_prealloc(struct vma_iterator *vmi)
+{
+ return mas_preallocate(&vmi->mas, GFP_KERNEL);
+}
+
+static inline void vma_iter_clear(struct vma_iterator *vmi,
+ unsigned long start, unsigned long end)
+{
+ mas_set_range(&vmi->mas, start, end - 1);
+ mas_store_prealloc(&vmi->mas, NULL);
+}
+
+static inline struct vm_area_struct *vma_iter_load(struct vma_iterator *vmi)
+{
+ return mas_walk(&vmi->mas);
+}
+
+/* Store a VMA with preallocated memory */
+static inline void vma_iter_store(struct vma_iterator *vmi,
+ struct vm_area_struct *vma)
+{
+
+#if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
+ if (WARN_ON(vmi->mas.node != MAS_START && vmi->mas.index > vma->vm_start)) {
+ printk("%lu > %lu\n", vmi->mas.index, vma->vm_start);
+ printk("store of vma %lu-%lu", vma->vm_start, vma->vm_end);
+ printk("into slot %lu-%lu", vmi->mas.index, vmi->mas.last);
+ mt_dump(vmi->mas.tree);
+ }
+ if (WARN_ON(vmi->mas.node != MAS_START && vmi->mas.last < vma->vm_start)) {
+ printk("%lu < %lu\n", vmi->mas.last, vma->vm_start);
+ printk("store of vma %lu-%lu", vma->vm_start, vma->vm_end);
+ printk("into slot %lu-%lu", vmi->mas.index, vmi->mas.last);
+ mt_dump(vmi->mas.tree);
+ }
+#endif
+
+ if (vmi->mas.node != MAS_START &&
+ ((vmi->mas.index > vma->vm_start) || (vmi->mas.last < vma->vm_start)))
+ vma_iter_invalidate(vmi);
+
+ vmi->mas.index = vma->vm_start;
+ vmi->mas.last = vma->vm_end - 1;
+ mas_store_prealloc(&vmi->mas, vma);
+}
+
+static inline int vma_iter_store_gfp(struct vma_iterator *vmi,
+ struct vm_area_struct *vma, gfp_t gfp)
+{
+ if (vmi->mas.node != MAS_START &&
+ ((vmi->mas.index > vma->vm_start) || (vmi->mas.last < vma->vm_start)))
+ vma_iter_invalidate(vmi);
+
+ vmi->mas.index = vma->vm_start;
+ vmi->mas.last = vma->vm_end - 1;
+ mas_store_gfp(&vmi->mas, vma, gfp);
+ if (unlikely(mas_is_err(&vmi->mas)))
+ return -ENOMEM;
+
+ return 0;
+}
#endif /* __MM_INTERNAL_H */
diff --git a/mm/mmap.c b/mm/mmap.c
index 335ba3df9898..253a7490fae3 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -144,6 +144,24 @@ static void remove_vma(struct vm_area_struct *vma)
vm_area_free(vma);
}
+static inline struct vm_area_struct *vma_prev_limit(struct vma_iterator *vmi,
+ unsigned long min)
+{
+ return mas_prev(&vmi->mas, min);
+}
+
+static inline int vma_iter_clear_gfp(struct vma_iterator *vmi,
+ unsigned long start, unsigned long end, gfp_t gfp)
+{
+ vmi->mas.index = start;
+ vmi->mas.last = end - 1;
+ mas_store_gfp(&vmi->mas, NULL, gfp);
+ if (unlikely(mas_is_err(&vmi->mas)))
+ return -ENOMEM;
+
+ return 0;
+}
+
/*
* check_brk_limits() - Use platform specific check of range & verify mlock
* limits.
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Pass the iterator through to be used in __vma_adjust(). The state of
the iterator needs to be correct for the operation that will occur, so
make the necessary adjustments.
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 58b2187b447b..c7d72475ba6d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -528,6 +528,10 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
vma_interval_tree_remove(vma, root);
}
+ /* VMA iterator points to previous, so set to start if necessary */
+ if (vma_iter_addr(vmi) != start)
+ vma_iter_set(vmi, start);
+
vma->vm_start = start;
vma->vm_end = end;
vma->vm_pgoff = pgoff;
@@ -2167,13 +2171,13 @@ static void unmap_region(struct mm_struct *mm, struct maple_tree *mt,
/*
* __split_vma() bypasses sysctl_max_map_count checking. We use this where it
* has already been checked or doesn't make sense to fail.
+ * VMA Iterator will point to the end VMA.
*/
int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long addr, int new_below)
{
struct vm_area_struct *new;
int err;
- unsigned long end = vma->vm_end;
validate_mm_mt(vma->vm_mm);
@@ -2209,14 +2213,17 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
new->vm_ops->open(new);
if (new_below)
- err = vma_adjust(vma, addr, vma->vm_end, vma->vm_pgoff +
- ((addr - new->vm_start) >> PAGE_SHIFT), new);
+ err = __vma_adjust(vmi, vma, addr, vma->vm_end,
+ vma->vm_pgoff + ((addr - new->vm_start) >> PAGE_SHIFT),
+ new, NULL);
else
- err = vma_adjust(vma, vma->vm_start, addr, vma->vm_pgoff, new);
+ err = __vma_adjust(vmi, vma, vma->vm_start, addr, vma->vm_pgoff,
+ new, NULL);
/* Success. */
if (!err) {
- vma_iter_set(vmi, end);
+ if (new_below)
+ vma_next(vmi);
return 0;
}
@@ -2311,8 +2318,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (error)
goto start_split_failed;
- vma_iter_set(vmi, start);
- vma = vma_find(vmi, end);
+ vma = vma_iter_load(vmi);
}
prev = vma_prev(vmi);
@@ -2332,7 +2338,6 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (error)
goto end_split_failed;
- vma_iter_set(vmi, end);
split = vma_prev(vmi);
error = munmap_sidetree(split, &mas_detach);
if (error)
@@ -2576,6 +2581,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
goto unacct_error;
}
+ vma_iter_set(&vmi, addr);
vma->vm_start = addr;
vma->vm_end = end;
vma->vm_flags = vm_flags;
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Use the vma iterator internally for __vma_adjust(). Avoid using the
maple tree interface directly, to preserve type safety.
Signed-off-by: Liam R. Howlett <[email protected]>
---
include/linux/mm.h | 3 --
mm/mmap.c | 75 ++++++++--------------------------------------
2 files changed, 13 insertions(+), 65 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9f519c6ea006..170a06e46cc9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2856,9 +2856,6 @@ extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
bool *need_rmap_locks);
extern void exit_mmap(struct mm_struct *);
-void vma_mas_store(struct vm_area_struct *vma, struct ma_state *mas);
-void vma_mas_remove(struct vm_area_struct *vma, struct ma_state *mas);
-
static inline int check_data_rlimit(unsigned long rlim,
unsigned long new,
unsigned long start,
diff --git a/mm/mmap.c b/mm/mmap.c
index 4a5d5c9a8dc6..19e5a79d5ca7 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -435,56 +435,6 @@ static void __vma_link_file(struct vm_area_struct *vma,
flush_dcache_mmap_unlock(mapping);
}
-/*
- * vma_mas_store() - Store a VMA in the maple tree.
- * @vma: The vm_area_struct
- * @mas: The maple state
- *
- * Efficient way to store a VMA in the maple tree when the @mas has already
- * walked to the correct location.
- *
- * Note: the end address is inclusive in the maple tree.
- */
-void vma_mas_store(struct vm_area_struct *vma, struct ma_state *mas)
-{
- trace_vma_store(mas->tree, vma);
- mas_set_range(mas, vma->vm_start, vma->vm_end - 1);
- mas_store_prealloc(mas, vma);
-}
-
-/*
- * vma_mas_remove() - Remove a VMA from the maple tree.
- * @vma: The vm_area_struct
- * @mas: The maple state
- *
- * Efficient way to remove a VMA from the maple tree when the @mas has already
- * been established and points to the correct location.
- * Note: the end address is inclusive in the maple tree.
- */
-void vma_mas_remove(struct vm_area_struct *vma, struct ma_state *mas)
-{
- trace_vma_mas_szero(mas->tree, vma->vm_start, vma->vm_end - 1);
- mas->index = vma->vm_start;
- mas->last = vma->vm_end - 1;
- mas_store_prealloc(mas, NULL);
-}
-
-/*
- * vma_mas_szero() - Set a given range to zero. Used when modifying a
- * vm_area_struct start or end.
- *
- * @mas: The maple tree ma_state
- * @start: The start address to zero
- * @end: The end address to zero.
- */
-static inline void vma_mas_szero(struct ma_state *mas, unsigned long start,
- unsigned long end)
-{
- trace_vma_mas_szero(mas->tree, start, end - 1);
- mas_set_range(mas, start, end - 1);
- mas_store_prealloc(mas, NULL);
-}
-
static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
{
VMA_ITERATOR(vmi, mm, 0);
@@ -644,7 +594,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
bool vma_changed = false;
long adjust_next = 0;
int remove_next = 0;
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ VMA_ITERATOR(vmi, mm, 0);
struct vm_area_struct *exporter = NULL, *importer = NULL;
if (next && !insert) {
@@ -729,7 +679,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
}
}
- if (mas_preallocate(&mas, GFP_KERNEL))
+ if (vma_iter_prealloc(&vmi))
return -ENOMEM;
vma_adjust_trans_huge(orig_vma, start, end, adjust_next);
@@ -775,7 +725,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
if (start != vma->vm_start) {
if ((vma->vm_start < start) &&
(!insert || (insert->vm_end != start))) {
- vma_mas_szero(&mas, vma->vm_start, start);
+ vma_iter_clear(&vmi, vma->vm_start, start);
VM_WARN_ON(insert && insert->vm_start > vma->vm_start);
} else {
vma_changed = true;
@@ -785,8 +735,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
if (end != vma->vm_end) {
if (vma->vm_end > end) {
if (!insert || (insert->vm_start != end)) {
- vma_mas_szero(&mas, end, vma->vm_end);
- mas_reset(&mas);
+ vma_iter_clear(&vmi, end, vma->vm_end);
+ vma_iter_set(&vmi, vma->vm_end);
VM_WARN_ON(insert &&
insert->vm_end < vma->vm_end);
}
@@ -797,13 +747,13 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
}
if (vma_changed)
- vma_mas_store(vma, &mas);
+ vma_iter_store(&vmi, vma);
vma->vm_pgoff = pgoff;
if (adjust_next) {
next->vm_start += adjust_next;
next->vm_pgoff += adjust_next >> PAGE_SHIFT;
- vma_mas_store(next, &mas);
+ vma_iter_store(&vmi, next);
}
if (file) {
@@ -823,8 +773,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
* us to insert it before dropping the locks
* (it may either follow vma or precede it).
*/
- mas_reset(&mas);
- vma_mas_store(insert, &mas);
+ vma_iter_store(&vmi, insert);
mm->map_count++;
}
@@ -870,7 +819,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
if (insert && file)
uprobe_mmap(insert);
- mas_destroy(&mas);
+ vma_iter_free(&vmi);
validate_mm(mm);
return 0;
@@ -2002,7 +1951,8 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
anon_vma_interval_tree_pre_update_vma(vma);
vma->vm_end = address;
/* Overwrite old entry in mtree. */
- vma_mas_store(vma, &mas);
+ mas_set_range(&mas, vma->vm_start, address - 1);
+ mas_store_prealloc(&mas, vma);
anon_vma_interval_tree_post_update_vma(vma);
spin_unlock(&mm->page_table_lock);
@@ -2084,7 +2034,8 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
vma->vm_start = address;
vma->vm_pgoff -= grow;
/* Overwrite old entry in mtree. */
- vma_mas_store(vma, &mas);
+ mas_set_range(&mas, address, vma->vm_end - 1);
+ mas_store_prealloc(&mas, vma);
anon_vma_interval_tree_post_update_vma(vma);
spin_unlock(&mm->page_table_lock);
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Add a testcase to ensure the iterator detects bad states on
modifications and does what the user expects.
Signed-off-by: Liam R. Howlett <[email protected]>
---
lib/test_maple_tree.c | 72 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 72 insertions(+)
diff --git a/lib/test_maple_tree.c b/lib/test_maple_tree.c
index ec847bf4dcb4..3d19b1f78d71 100644
--- a/lib/test_maple_tree.c
+++ b/lib/test_maple_tree.c
@@ -1709,6 +1709,74 @@ static noinline void check_forking(struct maple_tree *mt)
mtree_destroy(&newmt);
}
+static noinline void check_iteration(struct maple_tree *mt)
+{
+ int i, nr_entries = 125;
+ void *val;
+ MA_STATE(mas, mt, 0, 0);
+
+ for (i = 0; i <= nr_entries; i++)
+ mtree_store_range(mt, i * 10, i * 10 + 9,
+ xa_mk_value(i), GFP_KERNEL);
+
+ mt_set_non_kernel(99999);
+
+ i = 0;
+ mas_lock(&mas);
+ mas_for_each(&mas, val, 925) {
+ MT_BUG_ON(mt, mas.index != i * 10);
+ MT_BUG_ON(mt, mas.last != i * 10 + 9);
+ /* Overwrite end of entry 92 */
+ if (i == 92) {
+ mas.index = 925;
+ mas.last = 929;
+ mas_store(&mas, val);
+ }
+ i++;
+ }
+ /* Ensure mas_find() gets the next value */
+ val = mas_find(&mas, ULONG_MAX);
+ MT_BUG_ON(mt, val != xa_mk_value(i));
+
+ mas_set(&mas, 0);
+ i = 0;
+ mas_for_each(&mas, val, 785) {
+ MT_BUG_ON(mt, mas.index != i * 10);
+ MT_BUG_ON(mt, mas.last != i * 10 + 9);
+ /* Overwrite start of entry 78 */
+ if (i == 78) {
+ mas.index = 780;
+ mas.last = 785;
+ mas_store(&mas, val);
+ } else {
+ i++;
+ }
+ }
+ val = mas_find(&mas, ULONG_MAX);
+ MT_BUG_ON(mt, val != xa_mk_value(i));
+
+ mas_set(&mas, 0);
+ i = 0;
+ mas_for_each(&mas, val, 765) {
+ MT_BUG_ON(mt, mas.index != i * 10);
+ MT_BUG_ON(mt, mas.last != i * 10 + 9);
+ /* Overwrite end of entry 76 and advance to the end */
+ if (i == 76) {
+ mas.index = 760;
+ mas.last = 765;
+ mas_store(&mas, val);
+ mas_next(&mas, ULONG_MAX);
+ }
+ i++;
+ }
+ /* Make sure the next find returns the one after 765, 766-769 */
+ val = mas_find(&mas, ULONG_MAX);
+ MT_BUG_ON(mt, val != xa_mk_value(76));
+ mas_unlock(&mas);
+ mas_destroy(&mas);
+ mt_set_non_kernel(0);
+}
+
static noinline void check_mas_store_gfp(struct maple_tree *mt)
{
@@ -2659,6 +2727,10 @@ static int maple_tree_seed(void)
goto skip;
#endif
+ mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
+ check_iteration(&tree);
+ mtree_destroy(&tree);
+
mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
check_forking(&tree);
mtree_destroy(&tree);
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Inline the work of __vma_adjust() into vma_merge(). This reduces the
code size and has the added benefit of locating the comments for the
merge cases alongside the code that handles them.
Change the comments referencing vma_adjust() accordingly.
Signed-off-by: Liam R. Howlett <[email protected]>
---
kernel/events/uprobes.c | 2 +-
mm/filemap.c | 2 +-
mm/mmap.c | 250 ++++++++++++++++------------------------
mm/rmap.c | 15 +--
4 files changed, 107 insertions(+), 162 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 1a3904e0179c..59887c69d54c 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1351,7 +1351,7 @@ static int delayed_ref_ctr_inc(struct vm_area_struct *vma)
}
/*
- * Called from mmap_region/vma_adjust with mm->mmap_lock acquired.
+ * Called from mmap_region/vma_merge with mm->mmap_lock acquired.
*
* Currently we ignore all errors and always return 0, the callers
* can't handle the failure anyway.
diff --git a/mm/filemap.c b/mm/filemap.c
index c915ded191f0..992554c18f1f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -97,7 +97,7 @@
* ->i_pages lock (__sync_single_inode)
*
* ->i_mmap_rwsem
- * ->anon_vma.lock (vma_adjust)
+ * ->anon_vma.lock (vma_merge)
*
* ->anon_vma.lock
* ->page_table_lock or pte_lock (anon_vma_prepare and various)
diff --git a/mm/mmap.c b/mm/mmap.c
index 5aa048e9ff30..e227b7cd71aa 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -743,133 +743,6 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
return 0;
}
-/*
- * We cannot adjust vm_start, vm_end, vm_pgoff fields of a vma that
- * is already present in an i_mmap tree without adjusting the tree.
- * The following helper function should be used when such adjustments
- * are necessary. The "insert" vma (if any) is to be inserted
- * before we drop the necessary locks.
- */
-int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
- unsigned long start, unsigned long end, pgoff_t pgoff,
- struct vm_area_struct *expand)
-{
- struct mm_struct *mm = vma->vm_mm;
- struct vm_area_struct *remove2 = NULL;
- struct vm_area_struct *remove = NULL;
- struct vm_area_struct *next = find_vma(mm, vma->vm_end);
- struct vm_area_struct *orig_vma = vma;
- struct file *file = vma->vm_file;
- bool vma_changed = false;
- long adjust_next = 0;
- struct vma_prepare vma_prep;
-
- if (next) {
- int error = 0;
-
- if (end >= next->vm_end) {
- /*
- * vma expands, overlapping all the next, and
- * perhaps the one after too (mprotect case 6).
- * The only other cases that gets here are
- * case 1, case 7 and case 8.
- */
- if (next == expand) {
- /*
- * The only case where we don't expand "vma"
- * and we expand "next" instead is case 8.
- */
- VM_WARN_ON(end != next->vm_end);
- /*
- * we're removing "vma" and that to do so we
- * swapped "vma" and "next".
- */
- VM_WARN_ON(file != next->vm_file);
- swap(vma, next);
- remove = next;
- } else {
- VM_WARN_ON(expand != vma);
- /*
- * case 1, 6, 7, remove next.
- * case 6 also removes the one beyond next
- */
- remove = next;
- if (end > next->vm_end)
- remove2 = find_vma(mm, next->vm_end);
-
- VM_WARN_ON(remove2 != NULL &&
- end != remove2->vm_end);
- }
-
- /*
- * If next doesn't have anon_vma, import from vma after
- * next, if the vma overlaps with it.
- */
- if (remove != NULL && !next->anon_vma)
- error = dup_anon_vma(vma, remove2);
- else
- error = dup_anon_vma(vma, remove);
-
- } else if (end > next->vm_start) {
- /*
- * vma expands, overlapping part of the next:
- * mprotect case 5 shifting the boundary up.
- */
- adjust_next = (end - next->vm_start);
- VM_WARN_ON(expand != vma);
- error = dup_anon_vma(vma, next);
- } else if (end < vma->vm_end) {
- /*
- * vma shrinks, and !insert tells it's not
- * split_vma inserting another: so it must be
- * mprotect case 4 shifting the boundary down.
- */
- adjust_next = -(vma->vm_end - end);
- VM_WARN_ON(expand != next);
- error = dup_anon_vma(next, vma);
- }
- if (error)
- return error;
- }
-
- if (vma_iter_prealloc(vmi))
- return -ENOMEM;
-
- vma_adjust_trans_huge(orig_vma, start, end, adjust_next);
-
- init_multi_vma_prep(&vma_prep, vma, adjust_next ? next : NULL, remove,
- remove2);
- VM_WARN_ON(vma_prep.anon_vma && adjust_next && next->anon_vma &&
- vma_prep.anon_vma != next->anon_vma);
-
- vma_prepare(&vma_prep);
-
- if (start < vma->vm_start || end > vma->vm_end)
- vma_changed = true;
-
- vma->vm_start = start;
- vma->vm_end = end;
- vma->vm_pgoff = pgoff;
-
- if (vma_changed)
- vma_iter_store(vmi, vma);
-
- if (adjust_next) {
- next->vm_start += adjust_next;
- next->vm_pgoff += adjust_next >> PAGE_SHIFT;
- if (adjust_next < 0) {
- WARN_ON_ONCE(vma_changed);
- vma_iter_store(vmi, next);
- }
- }
-
- vma_complete(&vma_prep, vmi, mm);
- vma_iter_free(vmi);
- validate_mm(mm);
-
- return 0;
-}
-
/*
* If the vma has a ->close operation then the driver probably needs to release
* per-vma resources, so we don't attempt to merge those.
@@ -996,7 +869,7 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
* It is important for case 8 that the vma NNNN overlapping the
* region AAAA is never going to extended over XXXX. Instead XXXX must
* be extended in region AAAA and NNNN must be removed. This way in
- * all cases where vma_merge succeeds, the moment vma_adjust drops the
+ * all cases where vma_merge succeeds, the moment vma_merge drops the
* rmap_locks, the properties of the merged vma will be already
* correct for the whole merged range. Some of those properties like
* vm_page_prot/vm_flags may be accessed by rmap_walks and they must
@@ -1006,6 +879,12 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
* or other rmap walkers (if working on addresses beyond the "end"
* parameter) may establish ptes with the wrong permissions of NNNN
* instead of the right permissions of XXXX.
+ *
+ * In the code below:
+ * PPPP is represented by *prev
+ * NNNN is represented by *mid (and possibly equal to *next)
+ * XXXX is represented by *next or not represented at all.
+ * AAAA is not represented - it will be merged or the function will return NULL
*/
struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
struct vm_area_struct *prev, unsigned long addr,
@@ -1016,11 +895,19 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
struct anon_vma_name *anon_name)
{
pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
+ pgoff_t vma_pgoff;
struct vm_area_struct *mid, *next, *res = NULL;
+ struct vm_area_struct *vma, *adjust, *remove, *remove2;
int err = -1;
bool merge_prev = false;
bool merge_next = false;
+ bool vma_expanded = false;
+ struct vma_prepare vp;
+ unsigned long vma_end = end;
+ long adj_next = 0;
+ unsigned long vma_start = addr;
+ validate_mm(mm);
/*
* We later require that vma->vm_flags == vm_flags,
* so this tests vma->vm_flags & VM_SPECIAL, too.
@@ -1038,13 +925,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
VM_WARN_ON(mid && end > mid->vm_end);
VM_WARN_ON(addr >= end);
- /* Can we merge the predecessor? */
- if (prev && prev->vm_end == addr &&
- mpol_equal(vma_policy(prev), policy) &&
- can_vma_merge_after(prev, vm_flags,
- anon_vma, file, pgoff,
- vm_userfaultfd_ctx, anon_name)) {
- merge_prev = true;
+ if (prev) {
+ res = prev;
+ vma = prev;
+ vma_start = prev->vm_start;
+ vma_pgoff = prev->vm_pgoff;
+ /* Can we merge the predecessor? */
+ if (prev->vm_end == addr && mpol_equal(vma_policy(prev), policy)
+ && can_vma_merge_after(prev, vm_flags, anon_vma, file,
+ pgoff, vm_userfaultfd_ctx, anon_name)) {
+ merge_prev = true;
+ }
}
/* Can we merge the successor? */
if (next && end == next->vm_start &&
@@ -1054,32 +945,85 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
vm_userfaultfd_ctx, anon_name)) {
merge_next = true;
}
+
+ remove = remove2 = adjust = NULL;
/* Can we merge both the predecessor and the successor? */
if (merge_prev && merge_next &&
- is_mergeable_anon_vma(prev->anon_vma,
- next->anon_vma, NULL)) { /* cases 1, 6 */
- err = __vma_adjust(vmi, prev, prev->vm_start,
- next->vm_end, prev->vm_pgoff, prev);
- res = prev;
- } else if (merge_prev) { /* cases 2, 5, 7 */
- err = __vma_adjust(vmi, prev, prev->vm_start,
- end, prev->vm_pgoff, prev);
- res = prev;
+ is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) {
+ remove = mid; /* case 1 */
+ vma_end = next->vm_end;
+ err = dup_anon_vma(res, remove);
+ if (mid != next) { /* case 6 */
+ remove2 = next;
+ if (!remove->anon_vma)
+ err = dup_anon_vma(res, remove2);
+ }
+ } else if (merge_prev) {
+ err = 0; /* case 2 */
+ if (mid && end > mid->vm_start) {
+ err = dup_anon_vma(res, mid);
+ if (end == mid->vm_end) { /* case 7 */
+ remove = mid;
+ } else { /* case 5 */
+ adjust = mid;
+ adj_next = (end - mid->vm_start);
+ }
+ }
} else if (merge_next) {
- if (prev && addr < prev->vm_end) /* case 4 */
- err = __vma_adjust(vmi, prev, prev->vm_start,
- addr, prev->vm_pgoff, next);
- else /* cases 3, 8 */
- err = __vma_adjust(vmi, mid, addr, next->vm_end,
- next->vm_pgoff - pglen, next);
res = next;
+ if (prev && addr < prev->vm_end) { /* case 4 */
+ vma_end = addr;
+ adjust = mid;
+ adj_next = -(vma->vm_end - addr);
+ err = dup_anon_vma(res, adjust);
+ } else {
+ vma = next; /* case 3 */
+ vma_start = addr;
+ vma_end = next->vm_end;
+ vma_pgoff = next->vm_pgoff;
+ err = 0;
+ if (mid != next) { /* case 8 */
+ remove = mid;
+ err = dup_anon_vma(res, remove);
+ }
+ }
}
- /*
- * Cannot merge with predecessor or successor or error in __vma_adjust?
- */
+ /* Cannot merge or error in anon_vma clone */
if (err)
return NULL;
+
+ if (vma_iter_prealloc(vmi))
+ return NULL;
+
+ vma_adjust_trans_huge(vma, vma_start, vma_end, adj_next);
+ init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
+ VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
+ vp.anon_vma != adjust->anon_vma);
+
+ vma_prepare(&vp);
+ if (vma_start < vma->vm_start || vma_end > vma->vm_end)
+ vma_expanded = true;
+
+ vma->vm_start = vma_start;
+ vma->vm_end = vma_end;
+ vma->vm_pgoff = vma_pgoff;
+
+ if (vma_expanded)
+ vma_iter_store(vmi, vma);
+
+ if (adj_next) {
+ adjust->vm_start += adj_next;
+ adjust->vm_pgoff += adj_next >> PAGE_SHIFT;
+ if (adj_next < 0) {
+ WARN_ON(vma_expanded);
+ vma_iter_store(vmi, next);
+ }
+ }
+
+ vma_complete(&vp, vmi, mm);
+ vma_iter_free(vmi);
+ validate_mm(mm);
khugepaged_enter_vma(res, vm_flags);
if (res)
diff --git a/mm/rmap.c b/mm/rmap.c
index 948ca17a96ad..f6176c09fcc7 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -262,11 +262,12 @@ static inline void unlock_anon_vma_root(struct anon_vma *root)
* Attach the anon_vmas from src to dst.
* Returns 0 on success, -ENOMEM on failure.
*
- * anon_vma_clone() is called by __vma_adjust(), __split_vma(), copy_vma() and
- * anon_vma_fork(). The first three want an exact copy of src, while the last
- * one, anon_vma_fork(), may try to reuse an existing anon_vma to prevent
- * endless growth of anon_vma. Since dst->anon_vma is set to NULL before call,
- * we can identify this case by checking (!dst->anon_vma && src->anon_vma).
+ * anon_vma_clone() is called by vma_expand(), vma_merge(), __split_vma(),
+ * copy_vma() and anon_vma_fork(). The first four want an exact copy of src,
+ * while the last one, anon_vma_fork(), may try to reuse an existing anon_vma to
+ * prevent endless growth of anon_vma. Since dst->anon_vma is set to NULL before
+ * call, we can identify this case by checking (!dst->anon_vma &&
+ * src->anon_vma).
*
* If (!dst->anon_vma && src->anon_vma) is true, this function tries to find
* and reuse existing anon_vma which has no vmas and only one child anon_vma.
@@ -1253,7 +1254,7 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
__lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
if (likely(!folio_test_ksm(folio))) {
- /* address might be in next vma when migration races vma_adjust */
+ /* address might be in next vma when migration races vma_merge */
if (first)
__page_set_anon_rmap(folio, page, vma, address,
!!(flags & RMAP_EXCLUSIVE));
@@ -2538,7 +2539,7 @@ void hugepage_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
BUG_ON(!folio_test_locked(folio));
BUG_ON(!anon_vma);
- /* address might be in next vma when migration races vma_adjust */
+ /* address might be in next vma when migration races vma_merge */
first = atomic_inc_and_test(&folio->_entire_mapcount);
VM_BUG_ON_PAGE(!first && (flags & RMAP_EXCLUSIVE), page);
VM_BUG_ON_PAGE(!first && PageAnonExclusive(page), page);
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Avoid using the maple tree interface directly.
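For illustration, the conversion follows the pattern used across this
series; a minimal sketch, assuming the iterator helpers introduced
earlier in the series (the function name here is hypothetical):

static int example_vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
{
        VMA_ITERATOR(vmi, mm, 0);       /* type-safe wrapper over the maple state */

        /* preallocate maple nodes so the store cannot fail later */
        if (vma_iter_prealloc(&vmi))
                return -ENOMEM;

        /* write the vma over its own range in the tree */
        vma_iter_store(&vmi, vma);
        return 0;
}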
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index e2ba9b094cad..09a5b6e00374 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -487,10 +487,10 @@ static inline void vma_mas_szero(struct ma_state *mas, unsigned long start,
static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
{
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ VMA_ITERATOR(vmi, mm, 0);
struct address_space *mapping = NULL;
- if (mas_preallocate(&mas, GFP_KERNEL))
+ if (vma_iter_prealloc(&vmi))
return -ENOMEM;
if (vma->vm_file) {
@@ -498,7 +498,7 @@ static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
i_mmap_lock_write(mapping);
}
- vma_mas_store(vma, &mas);
+ vma_iter_store(&vmi, vma);
if (mapping) {
__vma_link_file(vma, mapping);
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Move the locking into vma_prepare() and vma_complete() for use elsewhere
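A rough sketch of the intended calling convention, matching how
__vma_adjust() uses the helpers in this patch (the fields shown are
illustrative, not the full set):

        struct vma_prepare vp;

        memset(&vp, 0, sizeof(vp));
        vp.vma = vma;
        vp.anon_vma = vma->anon_vma;
        vp.file = vma->vm_file;
        if (vp.file)
                vp.mapping = vp.file->f_mapping;

        vma_prepare(&vp);       /* take file/anon_vma locks, pre-update trees */
        /* ... modify vm_start/vm_end/vm_pgoff, store via the vma iterator ... */
        vma_complete(&vp, vmi, mm);     /* reinsert, unlock, free removed VMAs */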
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/internal.h | 14 +++
mm/mmap.c | 231 +++++++++++++++++++++++++++++---------------------
2 files changed, 150 insertions(+), 95 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index b4f66efc912d..bcd01a6e8ed2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -937,4 +937,18 @@ static inline int vma_iter_store_gfp(struct vma_iterator *vmi,
return 0;
}
+
+/*
+ * VMA lock generalization
+ */
+struct vma_prepare {
+ struct vm_area_struct *vma;
+ struct vm_area_struct *adj_next;
+ struct file *file;
+ struct address_space *mapping;
+ struct anon_vma *anon_vma;
+ struct vm_area_struct *insert;
+ struct vm_area_struct *remove;
+ struct vm_area_struct *remove2;
+};
#endif /* __MM_INTERNAL_H */
diff --git a/mm/mmap.c b/mm/mmap.c
index b83c70c59a76..9afaf05eb96b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -576,6 +576,127 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
return -ENOMEM;
}
+/*
+ * vma_prepare() - Helper function for handling locking VMAs prior to altering
+ * @vp: The initialized vma_prepare struct
+ */
+static inline void vma_prepare(struct vma_prepare *vp)
+{
+ if (vp->file) {
+ uprobe_munmap(vp->vma, vp->vma->vm_start, vp->vma->vm_end);
+
+ if (vp->adj_next)
+ uprobe_munmap(vp->adj_next, vp->adj_next->vm_start,
+ vp->adj_next->vm_end);
+
+ i_mmap_lock_write(vp->mapping);
+ if (vp->insert && vp->insert->vm_file) {
+ /*
+ * Put into interval tree now, so instantiated pages
+ * are visible to arm/parisc __flush_dcache_page
+ * throughout; but we cannot insert into address
+ * space until vma start or end is updated.
+ */
+ __vma_link_file(vp->insert,
+ vp->insert->vm_file->f_mapping);
+ }
+ }
+
+ if (vp->anon_vma) {
+ anon_vma_lock_write(vp->anon_vma);
+ anon_vma_interval_tree_pre_update_vma(vp->vma);
+ if (vp->adj_next)
+ anon_vma_interval_tree_pre_update_vma(vp->adj_next);
+ }
+
+ if (vp->file) {
+ flush_dcache_mmap_lock(vp->mapping);
+ vma_interval_tree_remove(vp->vma, &vp->mapping->i_mmap);
+ if (vp->adj_next)
+ vma_interval_tree_remove(vp->adj_next,
+ &vp->mapping->i_mmap);
+ }
+
+}
+
+/*
+ * vma_complete- Helper function for handling the unlocking after altering VMAs,
+ * or for inserting a VMA.
+ *
+ * @vp: The vma_prepare struct
+ * @vmi: The vma iterator
+ * @mm: The mm_struct
+ */
+static inline void vma_complete(struct vma_prepare *vp,
+ struct vma_iterator *vmi, struct mm_struct *mm)
+{
+ if (vp->file) {
+ if (vp->adj_next)
+ vma_interval_tree_insert(vp->adj_next,
+ &vp->mapping->i_mmap);
+ vma_interval_tree_insert(vp->vma, &vp->mapping->i_mmap);
+ flush_dcache_mmap_unlock(vp->mapping);
+ }
+
+ if (vp->remove && vp->file) {
+ __remove_shared_vm_struct(vp->remove, vp->file, vp->mapping);
+ if (vp->remove2)
+ __remove_shared_vm_struct(vp->remove2, vp->file,
+ vp->mapping);
+ } else if (vp->insert) {
+ /*
+ * split_vma has split insert from vma, and needs
+ * us to insert it before dropping the locks
+ * (it may either follow vma or precede it).
+ */
+ vma_iter_store(vmi, vp->insert);
+ mm->map_count++;
+ }
+
+ if (vp->anon_vma) {
+ anon_vma_interval_tree_post_update_vma(vp->vma);
+ if (vp->adj_next)
+ anon_vma_interval_tree_post_update_vma(vp->adj_next);
+ anon_vma_unlock_write(vp->anon_vma);
+ }
+
+ if (vp->file) {
+ i_mmap_unlock_write(vp->mapping);
+ uprobe_mmap(vp->vma);
+
+ if (vp->adj_next)
+ uprobe_mmap(vp->adj_next);
+ }
+
+ if (vp->remove) {
+again:
+ if (vp->file) {
+ uprobe_munmap(vp->remove, vp->remove->vm_start,
+ vp->remove->vm_end);
+ fput(vp->file);
+ }
+ if (vp->remove->anon_vma)
+ anon_vma_merge(vp->vma, vp->remove);
+ mm->map_count--;
+ mpol_put(vma_policy(vp->remove));
+ if (!vp->remove2)
+ WARN_ON_ONCE(vp->vma->vm_end < vp->remove->vm_end);
+ vm_area_free(vp->remove);
+
+ /*
+ * In mprotect's case 6 (see comments on vma_merge),
+ * we must remove next_next too.
+ */
+ if (vp->remove2) {
+ vp->remove = vp->remove2;
+ vp->remove2 = NULL;
+ goto again;
+ }
+ }
+ if (vp->insert && vp->file)
+ uprobe_mmap(vp->insert);
+}
+
/*
* We cannot adjust vm_start, vm_end, vm_pgoff fields of a vma that
* is already present in an i_mmap tree without adjusting the tree.
@@ -591,14 +712,13 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
struct vm_area_struct *next_next = NULL; /* uninit var warning */
struct vm_area_struct *next = find_vma(mm, vma->vm_end);
struct vm_area_struct *orig_vma = vma;
- struct address_space *mapping = NULL;
- struct rb_root_cached *root = NULL;
struct anon_vma *anon_vma = NULL;
struct file *file = vma->vm_file;
bool vma_changed = false;
long adjust_next = 0;
int remove_next = 0;
struct vm_area_struct *exporter = NULL, *importer = NULL;
+ struct vma_prepare vma_prep;
if (next && !insert) {
if (end >= next->vm_end) {
@@ -694,39 +814,22 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
anon_vma != next->anon_vma);
vma_adjust_trans_huge(orig_vma, start, end, adjust_next);
- if (file) {
- mapping = file->f_mapping;
- root = &mapping->i_mmap;
- uprobe_munmap(vma, vma->vm_start, vma->vm_end);
-
- if (adjust_next)
- uprobe_munmap(next, next->vm_start, next->vm_end);
-
- i_mmap_lock_write(mapping);
- if (insert && insert->vm_file) {
- /*
- * Put into interval tree now, so instantiated pages
- * are visible to arm/parisc __flush_dcache_page
- * throughout; but we cannot insert into address
- * space until vma start or end is updated.
- */
- __vma_link_file(insert, insert->vm_file->f_mapping);
- }
- }
- if (anon_vma) {
- anon_vma_lock_write(anon_vma);
- anon_vma_interval_tree_pre_update_vma(vma);
- if (adjust_next)
- anon_vma_interval_tree_pre_update_vma(next);
+ memset(&vma_prep, 0, sizeof(vma_prep));
+ vma_prep.vma = vma;
+ vma_prep.anon_vma = anon_vma;
+ vma_prep.file = file;
+ if (adjust_next)
+ vma_prep.adj_next = next;
+ if (file)
+ vma_prep.mapping = file->f_mapping;
+ vma_prep.insert = insert;
+ if (remove_next) {
+ vma_prep.remove = next;
+ vma_prep.remove2 = next_next;
}
- if (file) {
- flush_dcache_mmap_lock(mapping);
- vma_interval_tree_remove(vma, root);
- if (adjust_next)
- vma_interval_tree_remove(next, root);
- }
+ vma_prepare(&vma_prep);
if (start != vma->vm_start) {
if (vma->vm_start < start) {
@@ -764,69 +867,7 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
vma_iter_store(vmi, next);
}
- if (file) {
- if (adjust_next)
- vma_interval_tree_insert(next, root);
- vma_interval_tree_insert(vma, root);
- flush_dcache_mmap_unlock(mapping);
- }
-
- if (remove_next && file) {
- __remove_shared_vm_struct(next, file, mapping);
- if (remove_next == 2)
- __remove_shared_vm_struct(next_next, file, mapping);
- } else if (insert) {
- /*
- * split_vma has split insert from vma, and needs
- * us to insert it before dropping the locks
- * (it may either follow vma or precede it).
- */
- vma_iter_store(vmi, insert);
- mm->map_count++;
- }
-
- if (anon_vma) {
- anon_vma_interval_tree_post_update_vma(vma);
- if (adjust_next)
- anon_vma_interval_tree_post_update_vma(next);
- anon_vma_unlock_write(anon_vma);
- }
-
- if (file) {
- i_mmap_unlock_write(mapping);
- uprobe_mmap(vma);
-
- if (adjust_next)
- uprobe_mmap(next);
- }
-
- if (remove_next) {
-again:
- if (file) {
- uprobe_munmap(next, next->vm_start, next->vm_end);
- fput(file);
- }
- if (next->anon_vma)
- anon_vma_merge(vma, next);
- mm->map_count--;
- mpol_put(vma_policy(next));
- if (remove_next != 2)
- BUG_ON(vma->vm_end < next->vm_end);
- vm_area_free(next);
-
- /*
- * In mprotect's case 6 (see comments on vma_merge),
- * we must remove next_next too.
- */
- if (remove_next == 2) {
- remove_next = 1;
- next = next_next;
- goto again;
- }
- }
- if (insert && file)
- uprobe_mmap(insert);
-
+ vma_complete(&vma_prep, vmi, mm);
vma_iter_free(vmi);
validate_mm(mm);
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Use the vma iterator so that it can be invalidated or updated within the
MM code, avoiding the need for each caller to do so.
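A condensed before/after sketch of a caller, assuming the
vmi__split_vma() helper added here:

        /* before: the caller had to fix up the maple state itself */
        error = __split_vma(mm, vma, start, 0);
        mas_pause(&mas);

        /* after: the helper keeps @vmi consistent internally */
        error = vmi__split_vma(vmi, mm, vma, start, 0);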
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 5092d0405883..0de180bb4df0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2391,7 +2391,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
goto map_count_exceeded;
- error = __split_vma(mm, vma, start, 0);
+ error = vmi__split_vma(vmi, mm, vma, start, 0);
if (error)
goto start_split_failed;
@@ -2412,7 +2412,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (next->vm_end > end) {
struct vm_area_struct *split;
- error = __split_vma(mm, next, end, 1);
+ error = vmi__split_vma(vmi, mm, next, end, 1);
if (error)
goto end_split_failed;
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Use the vma iterator so that it can be invalidated or updated within the
MM code, avoiding the need for each caller to do so.
Signed-off-by: Liam R. Howlett <[email protected]>
---
fs/userfaultfd.c | 87 ++++++++++++++++++------------------------------
1 file changed, 33 insertions(+), 54 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 15a5bf765d43..4334bd35984d 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -883,7 +883,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
/* len == 0 means wake all */
struct userfaultfd_wake_range range = { .len = 0, };
unsigned long new_flags;
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ VMA_ITERATOR(vmi, mm, 0);
WRITE_ONCE(ctx->released, true);
@@ -900,7 +900,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
*/
mmap_write_lock(mm);
prev = NULL;
- mas_for_each(&mas, vma, ULONG_MAX) {
+ for_each_vma(vmi, vma) {
cond_resched();
BUG_ON(!!vma->vm_userfaultfd_ctx.ctx ^
!!(vma->vm_flags & __VM_UFFD_FLAGS));
@@ -909,13 +909,12 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
continue;
}
new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
- prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end,
+ prev = vmi_vma_merge(&vmi, mm, prev, vma->vm_start, vma->vm_end,
new_flags, vma->anon_vma,
vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
NULL_VM_UFFD_CTX, anon_vma_name(vma));
if (prev) {
- mas_pause(&mas);
vma = prev;
} else {
prev = vma;
@@ -1302,7 +1301,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
bool found;
bool basic_ioctls;
unsigned long start, end, vma_end;
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ struct vma_iterator vmi;
user_uffdio_register = (struct uffdio_register __user *) arg;
@@ -1344,17 +1343,13 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
if (!mmget_not_zero(mm))
goto out;
+ ret = -EINVAL;
mmap_write_lock(mm);
- mas_set(&mas, start);
- vma = mas_find(&mas, ULONG_MAX);
+ vma_iter_init(&vmi, mm, start);
+ vma = vma_find(&vmi, end);
if (!vma)
goto out_unlock;
- /* check that there's at least one vma in the range */
- ret = -EINVAL;
- if (vma->vm_start >= end)
- goto out_unlock;
-
/*
* If the first vma contains huge pages, make sure start address
* is aligned to huge page size.
@@ -1371,7 +1366,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
*/
found = false;
basic_ioctls = false;
- for (cur = vma; cur; cur = mas_next(&mas, end - 1)) {
+ cur = vma;
+ do {
cond_resched();
BUG_ON(!!cur->vm_userfaultfd_ctx.ctx ^
@@ -1428,16 +1424,14 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
basic_ioctls = true;
found = true;
- }
+ } for_each_vma_range(vmi, cur, end);
BUG_ON(!found);
- mas_set(&mas, start);
- prev = mas_prev(&mas, 0);
- if (prev != vma)
- mas_next(&mas, ULONG_MAX);
+ vma_iter_set(&vmi, start);
+ prev = vma_prev(&vmi);
ret = 0;
- do {
+ for_each_vma_range(vmi, vma, end) {
cond_resched();
BUG_ON(!vma_can_userfault(vma, vm_flags));
@@ -1458,30 +1452,25 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
vma_end = min(end, vma->vm_end);
new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
- prev = vma_merge(mm, prev, start, vma_end, new_flags,
+ prev = vmi_vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
((struct vm_userfaultfd_ctx){ ctx }),
anon_vma_name(vma));
if (prev) {
/* vma_merge() invalidated the mas */
- mas_pause(&mas);
vma = prev;
goto next;
}
if (vma->vm_start < start) {
- ret = split_vma(mm, vma, start, 1);
+ ret = vmi_split_vma(&vmi, mm, vma, start, 1);
if (ret)
break;
- /* split_vma() invalidated the mas */
- mas_pause(&mas);
}
if (vma->vm_end > end) {
- ret = split_vma(mm, vma, end, 0);
+ ret = vmi_split_vma(&vmi, mm, vma, end, 0);
if (ret)
break;
- /* split_vma() invalidated the mas */
- mas_pause(&mas);
}
next:
/*
@@ -1498,8 +1487,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
skip:
prev = vma;
start = vma->vm_end;
- vma = mas_next(&mas, end - 1);
- } while (vma);
+ }
+
out_unlock:
mmap_write_unlock(mm);
mmput(mm);
@@ -1543,7 +1532,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
bool found;
unsigned long start, end, vma_end;
const void __user *buf = (void __user *)arg;
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ struct vma_iterator vmi;
ret = -EFAULT;
if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
@@ -1562,14 +1551,10 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
goto out;
mmap_write_lock(mm);
- mas_set(&mas, start);
- vma = mas_find(&mas, ULONG_MAX);
- if (!vma)
- goto out_unlock;
-
- /* check that there's at least one vma in the range */
ret = -EINVAL;
- if (vma->vm_start >= end)
+ vma_iter_init(&vmi, mm, start);
+ vma = vma_find(&vmi, end);
+ if (!vma)
goto out_unlock;
/*
@@ -1587,8 +1572,8 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
* Search for not compatible vmas.
*/
found = false;
- ret = -EINVAL;
- for (cur = vma; cur; cur = mas_next(&mas, end - 1)) {
+ cur = vma;
+ do {
cond_resched();
BUG_ON(!!cur->vm_userfaultfd_ctx.ctx ^
@@ -1605,16 +1590,13 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
goto out_unlock;
found = true;
- }
+ } for_each_vma_range(vmi, cur, end);
BUG_ON(!found);
- mas_set(&mas, start);
- prev = mas_prev(&mas, 0);
- if (prev != vma)
- mas_next(&mas, ULONG_MAX);
-
+ vma_iter_set(&vmi, start);
+ prev = vma_prev(&vmi);
ret = 0;
- do {
+ for_each_vma_range(vmi, vma, end) {
cond_resched();
BUG_ON(!vma_can_userfault(vma, vma->vm_flags));
@@ -1650,26 +1632,23 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
uffd_wp_range(mm, vma, start, vma_end - start, false);
new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
- prev = vma_merge(mm, prev, start, vma_end, new_flags,
+ prev = vmi_vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
NULL_VM_UFFD_CTX, anon_vma_name(vma));
if (prev) {
vma = prev;
- mas_pause(&mas);
goto next;
}
if (vma->vm_start < start) {
- ret = split_vma(mm, vma, start, 1);
+ ret = vmi_split_vma(&vmi, mm, vma, start, 1);
if (ret)
break;
- mas_pause(&mas);
}
if (vma->vm_end > end) {
- ret = split_vma(mm, vma, end, 0);
+ ret = vmi_split_vma(&vmi, mm, vma, end, 0);
if (ret)
break;
- mas_pause(&mas);
}
next:
/*
@@ -1683,8 +1662,8 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
skip:
prev = vma;
start = vma->vm_end;
- vma = mas_next(&mas, end - 1);
- } while (vma);
+ }
+
out_unlock:
mmap_write_unlock(mm);
mmput(mm);
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Move the anon_vma setting & VM_WARN_ON() up in the function. This is
done to simplify the locking changes made later.
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 56483b837ef9..b83c70c59a76 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -685,6 +685,14 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (vma_iter_prealloc(vmi))
return -ENOMEM;
+ anon_vma = vma->anon_vma;
+ if (!anon_vma && adjust_next)
+ anon_vma = next->anon_vma;
+
+ if (anon_vma)
+ VM_WARN_ON(adjust_next && next->anon_vma &&
+ anon_vma != next->anon_vma);
+
vma_adjust_trans_huge(orig_vma, start, end, adjust_next);
if (file) {
mapping = file->f_mapping;
@@ -706,12 +714,7 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
}
}
- anon_vma = vma->anon_vma;
- if (!anon_vma && adjust_next)
- anon_vma = next->anon_vma;
if (anon_vma) {
- VM_WARN_ON(adjust_next && next->anon_vma &&
- anon_vma != next->anon_vma);
anon_vma_lock_write(anon_vma);
anon_vma_interval_tree_pre_update_vma(vma);
if (adjust_next)
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Use the vma iterator so that it can be invalidated or updated within the
MM code, avoiding the need for each caller to do so.
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/madvise.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index 92a3c6bd84c1..4d4471916465 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -142,6 +142,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
struct mm_struct *mm = vma->vm_mm;
int error;
pgoff_t pgoff;
+ VMA_ITERATOR(vmi, mm, 0);
if (new_flags == vma->vm_flags && anon_vma_name_eq(anon_vma_name(vma), anon_name)) {
*prev = vma;
@@ -149,8 +150,8 @@ static int madvise_update_vma(struct vm_area_struct *vma,
}
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- *prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
- vma->vm_file, pgoff, vma_policy(vma),
+ *prev = vmi_vma_merge(&vmi, mm, *prev, start, end, new_flags,
+ vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_name);
if (*prev) {
vma = *prev;
@@ -162,7 +163,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
if (start != vma->vm_start) {
if (unlikely(mm->map_count >= sysctl_max_map_count))
return -ENOMEM;
- error = __split_vma(mm, vma, start, 1);
+ error = vmi__split_vma(&vmi, mm, vma, start, 1);
if (error)
return error;
}
@@ -170,7 +171,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
if (end != vma->vm_end) {
if (unlikely(mm->map_count >= sysctl_max_map_count))
return -ENOMEM;
- error = __split_vma(mm, vma, end, 0);
+ error = vmi__split_vma(&vmi, mm, vma, end, 0);
if (error)
return error;
}
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Start passing the vma iterator through the mm code. This will allow for
reuse of the state and cleaner invalidation if necessary.
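A minimal sketch of the intended reuse, assuming a caller that unmaps
and then inspects a neighbour:

        VMA_ITERATOR(vmi, mm, start);

        if (do_vmi_munmap(&vmi, mm, start, len, &uf, false))
                return -ENOMEM;

        /* @vmi still reflects the tree after the unmap; no re-walk needed */
        prev = vma_prev(&vmi);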
Signed-off-by: Liam R. Howlett <[email protected]>
---
include/linux/mm.h | 2 +-
mm/mmap.c | 77 +++++++++++++++++++++-------------------------
mm/mremap.c | 6 ++--
3 files changed, 39 insertions(+), 46 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b977a90d9829..152a1362b800 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2905,7 +2905,7 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr,
extern unsigned long do_mmap(struct file *file, unsigned long addr,
unsigned long len, unsigned long prot, unsigned long flags,
unsigned long pgoff, unsigned long *populate, struct list_head *uf);
-extern int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
+extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
unsigned long start, size_t len, struct list_head *uf,
bool downgrade);
extern int do_munmap(struct mm_struct *, unsigned long, size_t,
diff --git a/mm/mmap.c b/mm/mmap.c
index 83d25fcc2f6d..18f5f71a9202 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2305,8 +2305,8 @@ static inline int munmap_sidetree(struct vm_area_struct *vma,
}
/*
- * do_mas_align_munmap() - munmap the aligned region from @start to @end.
- * @mas: The maple_state, ideally set up to alter the correct tree location.
+ * do_vmi_align_munmap() - munmap the aligned region from @start to @end.
+ * @vmi: The vma iterator
* @vma: The starting vm_area_struct
* @mm: The mm_struct
* @start: The aligned start address to munmap.
@@ -2317,7 +2317,7 @@ static inline int munmap_sidetree(struct vm_area_struct *vma,
* If @downgrade is true, check return code for potential release of the lock.
*/
static int
-do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
+do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
struct mm_struct *mm, unsigned long start,
unsigned long end, struct list_head *uf, bool downgrade)
{
@@ -2329,7 +2329,6 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
mt_init_flags(&mt_detach, MT_FLAGS_LOCK_EXTERN);
mt_set_external_lock(&mt_detach, &mm->mmap_lock);
- mas->last = end - 1;
/*
* If we need to split any vma, do it now to save pain later.
*
@@ -2349,27 +2348,23 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
goto map_count_exceeded;
- /*
- * mas_pause() is not needed since mas->index needs to be set
- * differently than vma->vm_end anyways.
- */
error = __split_vma(mm, vma, start, 0);
if (error)
goto start_split_failed;
- mas_set(mas, start);
- vma = mas_walk(mas);
+ vma_iter_set(vmi, start);
+ vma = vma_find(vmi, end);
}
- prev = mas_prev(mas, 0);
+ prev = vma_prev(vmi);
if (unlikely((!prev)))
- mas_set(mas, start);
+ vma_iter_set(vmi, start);
/*
* Detach a range of VMAs from the mm. Using next as a temp variable as
* it is always overwritten.
*/
- mas_for_each(mas, next, end - 1) {
+ for_each_vma_range(*vmi, next, end) {
/* Does it split the end? */
if (next->vm_end > end) {
struct vm_area_struct *split;
@@ -2378,8 +2373,8 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
if (error)
goto end_split_failed;
- mas_set(mas, end);
- split = mas_prev(mas, 0);
+ vma_iter_set(vmi, end);
+ split = vma_prev(vmi);
error = munmap_sidetree(split, &mas_detach);
if (error)
goto munmap_sidetree_failed;
@@ -2401,7 +2396,7 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
}
if (!next)
- next = mas_next(mas, ULONG_MAX);
+ next = vma_next(vmi);
if (unlikely(uf)) {
/*
@@ -2426,10 +2421,10 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
struct vm_area_struct *vma_mas, *vma_test;
int test_count = 0;
- mas_set_range(mas, start, end - 1);
+ vma_iter_set(vmi, start);
rcu_read_lock();
vma_test = mas_find(&test, end - 1);
- mas_for_each(mas, vma_mas, end - 1) {
+ for_each_vma_range(*vmi, vma_mas, end) {
BUG_ON(vma_mas != vma_test);
test_count++;
vma_test = mas_next(&test, end - 1);
@@ -2439,8 +2434,8 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
}
#endif
/* Point of no return */
- mas_set_range(mas, start, end - 1);
- if (mas_store_gfp(mas, NULL, GFP_KERNEL))
+ vma_iter_set(vmi, start);
+ if (vma_iter_clear_gfp(vmi, start, end, GFP_KERNEL))
return -ENOMEM;
mm->map_count -= count;
@@ -2478,8 +2473,8 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
}
/*
- * do_mas_munmap() - munmap a given range.
- * @mas: The maple state
+ * do_vmi_munmap() - munmap a given range.
+ * @vmi: The vma iterator
* @mm: The mm_struct
* @start: The start address to munmap
* @len: The length of the range to munmap
@@ -2493,7 +2488,7 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
*
* Returns: -EINVAL on failure, 1 on success and unlock, 0 otherwise.
*/
-int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
+int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
unsigned long start, size_t len, struct list_head *uf,
bool downgrade)
{
@@ -2511,11 +2506,11 @@ int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
arch_unmap(mm, start, end);
/* Find the first overlapping VMA */
- vma = mas_find(mas, end - 1);
+ vma = vma_find(vmi, end);
if (!vma)
return 0;
- return do_mas_align_munmap(mas, vma, mm, start, end, uf, downgrade);
+ return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, downgrade);
}
/* do_munmap() - Wrapper function for non-maple tree aware do_munmap() calls.
@@ -2527,9 +2522,9 @@ int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
struct list_head *uf)
{
- MA_STATE(mas, &mm->mm_mt, start, start);
+ VMA_ITERATOR(vmi, mm, start);
- return do_mas_munmap(&mas, mm, start, len, uf, false);
+ return do_vmi_munmap(&vmi, mm, start, len, uf, false);
}
unsigned long mmap_region(struct file *file, unsigned long addr,
@@ -2545,7 +2540,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long merge_start = addr, merge_end = end;
pgoff_t vm_pgoff;
int error;
- MA_STATE(mas, &mm->mm_mt, addr, end - 1);
+ VMA_ITERATOR(vmi, mm, addr);
/* Check against address space limit. */
if (!may_expand_vm(mm, vm_flags, len >> PAGE_SHIFT)) {
@@ -2563,7 +2558,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
}
/* Unmap any existing mapping in the area */
- if (do_mas_munmap(&mas, mm, addr, len, uf, false))
+ if (do_vmi_munmap(&vmi, mm, addr, len, uf, false))
return -ENOMEM;
/*
@@ -2576,8 +2571,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vm_flags |= VM_ACCOUNT;
}
- next = mas_next(&mas, ULONG_MAX);
- prev = mas_prev(&mas, 0);
+ next = vma_next(&vmi);
+ prev = vma_prev(&vmi);
if (vm_flags & VM_SPECIAL)
goto cannot_expand;
@@ -2605,13 +2600,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/* Actually expand, if possible */
if (vma &&
- !vma_expand(&mas, vma, merge_start, merge_end, vm_pgoff, next)) {
+ !vma_expand(&vmi.mas, vma, merge_start, merge_end, vm_pgoff, next)) {
khugepaged_enter_vma(vma, vm_flags);
goto expanded;
}
- mas.index = addr;
- mas.last = end - 1;
cannot_expand:
/*
* Determine the object being mapped and call the appropriate
@@ -2650,7 +2643,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
error = -EINVAL;
goto close_and_free_vma;
}
- mas_reset(&mas);
+ vma_iter_set(&vmi, addr);
/*
* If vm_flags changed after call_mmap(), we should try merge
@@ -2696,7 +2689,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
goto free_vma;
}
- if (mas_preallocate(&mas, GFP_KERNEL)) {
+ if (vma_iter_prealloc(&vmi)) {
error = -ENOMEM;
if (file)
goto close_and_free_vma;
@@ -2709,7 +2702,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
if (vma->vm_file)
i_mmap_lock_write(vma->vm_file->f_mapping);
- vma_mas_store(vma, &mas);
+ vma_iter_store(&vmi, vma);
mm->map_count++;
if (vma->vm_file) {
if (vma->vm_flags & VM_SHARED)
@@ -2770,7 +2763,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma->vm_file = NULL;
/* Undo any partial mapping done by a device driver. */
- unmap_region(mm, mas.tree, vma, prev, next, vma->vm_start, vma->vm_end);
+ unmap_region(mm, &mm->mm_mt, vma, prev, next, vma->vm_start, vma->vm_end);
if (file && (vm_flags & VM_SHARED))
mapping_unmap_writable(file->f_mapping);
free_vma:
@@ -2787,12 +2780,12 @@ static int __vm_munmap(unsigned long start, size_t len, bool downgrade)
int ret;
struct mm_struct *mm = current->mm;
LIST_HEAD(uf);
- MA_STATE(mas, &mm->mm_mt, start, start);
+ VMA_ITERATOR(vmi, mm, start);
if (mmap_write_lock_killable(mm))
return -EINTR;
- ret = do_mas_munmap(&mas, mm, start, len, &uf, downgrade);
+ ret = do_vmi_munmap(&vmi, mm, start, len, &uf, downgrade);
/*
* Returning 1 indicates mmap_lock is downgraded.
* But 1 is not legal return value of vm_munmap() and munmap(), reset
@@ -2924,7 +2917,7 @@ static int do_brk_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
int ret;
arch_unmap(mm, newbrk, oldbrk);
- ret = do_mas_align_munmap(&vmi->mas, vma, mm, newbrk, oldbrk, uf, true);
+ ret = do_vmi_align_munmap(vmi, vma, mm, newbrk, oldbrk, uf, true);
validate_mm_mt(mm);
return ret;
}
@@ -3047,7 +3040,7 @@ int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
if (ret)
goto limits_failed;
- ret = do_mas_munmap(&vmi.mas, mm, addr, len, &uf, 0);
+ ret = do_vmi_munmap(&vmi, mm, addr, len, &uf, 0);
if (ret)
goto munmap_failed;
diff --git a/mm/mremap.c b/mm/mremap.c
index 05f90f47e149..3cc64c3f8bdb 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -978,14 +978,14 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
/*
* Always allow a shrinking remap: that just unmaps
* the unnecessary pages..
- * do_mas_munmap does all the needed commit accounting, and
+ * do_vmi_munmap does all the needed commit accounting, and
* downgrades mmap_lock to read if so directed.
*/
if (old_len >= new_len) {
int retval;
- MA_STATE(mas, &mm->mm_mt, addr + new_len, addr + new_len);
+ VMA_ITERATOR(vmi, mm, addr + new_len);
- retval = do_mas_munmap(&mas, mm, addr + new_len,
+ retval = do_vmi_munmap(&vmi, mm, addr + new_len,
old_len - new_len, &uf_unmap, true);
/* Returning 1 indicates mmap_lock is downgraded to read. */
if (retval == 1) {
--
2.35.1
When mas_prev() does not find anything, set the state to MAS_NONE.
Handle MAS_NONE in mas_find() like MAS_START.
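A sketch of the sequence this guards against (the tree contents are
hypothetical):

        MA_STATE(mas, &tree, index, index);

        entry = mas_prev(&mas, 0);      /* nothing before index: state is MAS_NONE */
        if (!entry)
                /* MAS_NONE is treated like MAS_START, so the search
                 * restarts cleanly instead of using a stale node */
                entry = mas_find(&mas, ULONG_MAX);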
Reported-by: [email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
---
lib/maple_tree.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 14ce355e4149..43bbf1906571 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4844,7 +4844,7 @@ static inline void *mas_prev_entry(struct ma_state *mas, unsigned long min)
if (mas->index < min) {
mas->index = mas->last = min;
- mas_pause(mas);
+ mas->node = MAS_NONE;
return NULL;
}
retry:
@@ -5918,6 +5918,7 @@ void *mas_prev(struct ma_state *mas, unsigned long min)
if (!mas->index) {
/* Nothing comes before 0 */
mas->last = 0;
+ mas->node = MAS_NONE;
return NULL;
}
@@ -6008,6 +6009,9 @@ void *mas_find(struct ma_state *mas, unsigned long max)
mas->index = ++mas->last;
}
+ if (unlikely(mas_is_none(mas)))
+ mas->node = MAS_START;
+
if (unlikely(mas_is_start(mas))) {
/* First run or continue */
void *entry;
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Move the unrolling logic to the error path as opposed to duplicating it
within the function body. This reduces the potential for missing an
update to one of the paths when making changes.
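The resulting shape of the error handling, sketched with the labels used
in mmap_region():

        error = -EINVAL;
        if (!arch_validate_flags(vma->vm_flags))
                goto close_and_free_vma;        /* single unwind path */

        error = -ENOMEM;
        if (vma_iter_prealloc(&vmi))
                goto close_and_free_vma;

        /* ... success path ... */

close_and_free_vma:
        if (file && vma->vm_ops && vma->vm_ops->close)
                vma->vm_ops->close(vma);
        /* fall through to unmap and free only when needed */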
Cc: Li Zetao <[email protected]>
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 45 ++++++++++++++++++---------------------------
1 file changed, 18 insertions(+), 27 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index b6bedc07ef11..d9e2666d8059 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2604,12 +2604,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
* Expansion is handled above, merging is handled below.
* Drivers should not alter the address of the VMA.
*/
- if (WARN_ON((addr != vma->vm_start))) {
- error = -EINVAL;
+ error = -EINVAL;
+ if (WARN_ON((addr != vma->vm_start)))
goto close_and_free_vma;
- }
- vma_iter_set(&vmi, addr);
+ vma_iter_set(&vmi, addr);
/*
* If vm_flags changed after call_mmap(), we should try merge
* vma again as we may succeed this time.
@@ -2646,25 +2645,13 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
}
/* Allow architectures to sanity-check the vm_flags */
- if (!arch_validate_flags(vma->vm_flags)) {
- error = -EINVAL;
- if (file)
- goto close_and_free_vma;
- else if (vma->vm_file)
- goto unmap_and_free_vma;
- else
- goto free_vma;
- }
+ error = -EINVAL;
+ if (!arch_validate_flags(vma->vm_flags))
+ goto close_and_free_vma;
- if (vma_iter_prealloc(&vmi)) {
- error = -ENOMEM;
- if (file)
- goto close_and_free_vma;
- else if (vma->vm_file)
- goto unmap_and_free_vma;
- else
- goto free_vma;
- }
+ error = -ENOMEM;
+ if (vma_iter_prealloc(&vmi))
+ goto close_and_free_vma;
if (vma->vm_file)
i_mmap_lock_write(vma->vm_file->f_mapping);
@@ -2723,14 +2710,18 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
return addr;
close_and_free_vma:
- if (vma->vm_ops && vma->vm_ops->close)
+ if (file && vma->vm_ops && vma->vm_ops->close)
vma->vm_ops->close(vma);
+
+ if (file || vma->vm_file) {
unmap_and_free_vma:
- fput(vma->vm_file);
- vma->vm_file = NULL;
+ fput(vma->vm_file);
+ vma->vm_file = NULL;
- /* Undo any partial mapping done by a device driver. */
- unmap_region(mm, &mm->mm_mt, vma, prev, next, vma->vm_start, vma->vm_end);
+ /* Undo any partial mapping done by a device driver. */
+ unmap_region(mm, &mm->mm_mt, vma, prev, next, vma->vm_start,
+ vma->vm_end);
+ }
if (file && (vm_flags & VM_SHARED))
mapping_unmap_writable(file->f_mapping);
free_vma:
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Splitting can be more efficient when the order of operations is not a
concern. Change do_vmi_align_munmap() to reduce walking of the tree
during split operations.
move_vma() must also be altered to remove the dependency on keeping the
original VMA as the active part of the split. Transition to using the
vma iterator to look up the prev and/or next vma after munmap.
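A condensed sketch of the end-split change in do_vmi_align_munmap():

        /* before: keep the lower piece active, then re-walk for it */
        error = __split_vma(vmi, next, end, 1);
        split = vma_prev(vmi);          /* extra tree walk */

        /* after: new_below == 0 leaves @next as the lower piece, which
         * is the part being unmapped, so no repositioning is needed */
        error = __split_vma(vmi, next, end, 0);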
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 18 ++----------------
mm/mremap.c | 27 ++++++++++++++++-----------
2 files changed, 18 insertions(+), 27 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index d9e2666d8059..56483b837ef9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2332,21 +2332,9 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
for_each_vma_range(*vmi, next, end) {
/* Does it split the end? */
if (next->vm_end > end) {
- struct vm_area_struct *split;
-
- error = __split_vma(vmi, next, end, 1);
+ error = __split_vma(vmi, next, end, 0);
if (error)
goto end_split_failed;
-
- split = vma_prev(vmi);
- error = munmap_sidetree(split, &mas_detach);
- if (error)
- goto munmap_sidetree_failed;
-
- count++;
- if (vma == next)
- vma = split;
- break;
}
error = munmap_sidetree(next, &mas_detach);
if (error)
@@ -2359,9 +2347,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
#endif
}
- if (!next)
- next = vma_next(vmi);
-
+ next = vma_next(vmi);
if (unlikely(uf)) {
/*
* If userfaultfd_unmap_prep returns an error the vmas
diff --git a/mm/mremap.c b/mm/mremap.c
index 2176f0cc7f9a..1bc81afd90de 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -580,11 +580,12 @@ static unsigned long move_vma(struct vm_area_struct *vma,
unsigned long vm_flags = vma->vm_flags;
unsigned long new_pgoff;
unsigned long moved_len;
- unsigned long excess = 0;
+ unsigned long account_start = 0;
+ unsigned long account_end = 0;
unsigned long hiwater_vm;
- int split = 0;
int err = 0;
bool need_rmap_locks;
+ VMA_ITERATOR(vmi, mm, old_addr);
/*
* We'd prefer to avoid failure later on in do_munmap:
@@ -662,10 +663,10 @@ static unsigned long move_vma(struct vm_area_struct *vma,
/* Conceal VM_ACCOUNT so old reservation is not undone */
if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) {
vma->vm_flags &= ~VM_ACCOUNT;
- excess = vma->vm_end - vma->vm_start - old_len;
- if (old_addr > vma->vm_start &&
- old_addr + old_len < vma->vm_end)
- split = 1;
+ if (vma->vm_start < old_addr)
+ account_start = vma->vm_start;
+ if (vma->vm_end > old_addr + old_len)
+ account_end = vma->vm_end;
}
/*
@@ -700,11 +701,11 @@ static unsigned long move_vma(struct vm_area_struct *vma,
return new_addr;
}
- if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
+ if (do_vmi_munmap(&vmi, mm, old_addr, old_len, uf_unmap, false) < 0) {
/* OOM: unable to split vma, just get accounts right */
if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP))
vm_acct_memory(old_len >> PAGE_SHIFT);
- excess = 0;
+ account_start = account_end = 0;
}
if (vm_flags & VM_LOCKED) {
@@ -715,10 +716,14 @@ static unsigned long move_vma(struct vm_area_struct *vma,
mm->hiwater_vm = hiwater_vm;
/* Restore VM_ACCOUNT if one or two pieces of vma left */
- if (excess) {
+ if (account_start) {
+ vma = vma_prev(&vmi);
+ vma->vm_flags |= VM_ACCOUNT;
+ }
+
+ if (account_end) {
+ vma = vma_next(&vmi);
vma->vm_flags |= VM_ACCOUNT;
- if (split)
- find_vma(mm, vma->vm_end)->vm_flags |= VM_ACCOUNT;
}
return new_addr;
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
The split_vma() wrapper is specifically for this use case, so use it.
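For reference, the wrapper folds in the map_count limit check that
madvise_update_vma() open-coded; a sketch consistent with the wrapper's
behaviour in this series:

int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
              unsigned long addr, int new_below)
{
        if (vma->vm_mm->map_count >= sysctl_max_map_count)
                return -ENOMEM;

        return __split_vma(vmi, vma, addr, new_below);
}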
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/madvise.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index 02b317726c9a..7db6622f8293 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -161,17 +161,13 @@ static int madvise_update_vma(struct vm_area_struct *vma,
*prev = vma;
if (start != vma->vm_start) {
- if (unlikely(mm->map_count >= sysctl_max_map_count))
- return -ENOMEM;
- error = __split_vma(&vmi, vma, start, 1);
+ error = split_vma(&vmi, vma, start, 1);
if (error)
return error;
}
if (end != vma->vm_end) {
- if (unlikely(mm->map_count >= sysctl_max_map_count))
- return -ENOMEM;
- error = __split_vma(&vmi, vma, end, 0);
+ error = split_vma(&vmi, vma, end, 0);
if (error)
return error;
}
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
In preparation for passing the vma state through split, the
pre-allocation that occurs before the split has to be moved to after it.
Since the preallocation would then live right next to the store, just
call store instead of preallocating. This effectively restores the
potential error path of splitting and then failing to munmap, which
pre-dates the maple tree.
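The resulting pattern at the point of no return, condensed from the diff
below:

        /* before: mas_preallocate() early, mas_store_prealloc() at the end
         * after: a single allocating store at the point of no return */
        mas_set_range(mas, start, end - 1);
        if (mas_store_gfp(mas, NULL, GFP_KERNEL))
                return -ENOMEM;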
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 09a5b6e00374..83d25fcc2f6d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2329,9 +2329,6 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
mt_init_flags(&mt_detach, MT_FLAGS_LOCK_EXTERN);
mt_set_external_lock(&mt_detach, &mm->mmap_lock);
- if (mas_preallocate(mas, GFP_KERNEL))
- return -ENOMEM;
-
mas->last = end - 1;
/*
* If we need to split any vma, do it now to save pain later.
@@ -2422,8 +2419,6 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
goto userfaultfd_error;
}
- /* Point of no return */
- mas_set_range(mas, start, end - 1);
#if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
/* Make sure no VMAs are about to be lost. */
{
@@ -2431,6 +2426,7 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
struct vm_area_struct *vma_mas, *vma_test;
int test_count = 0;
+ mas_set_range(mas, start, end - 1);
rcu_read_lock();
vma_test = mas_find(&test, end - 1);
mas_for_each(mas, vma_mas, end - 1) {
@@ -2440,10 +2436,13 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
}
rcu_read_unlock();
BUG_ON(count != test_count);
- mas_set_range(mas, start, end - 1);
}
#endif
- mas_store_prealloc(mas, NULL);
+ /* Point of no return */
+ mas_set_range(mas, start, end - 1);
+ if (mas_store_gfp(mas, NULL, GFP_KERNEL))
+ return -ENOMEM;
+
mm->map_count -= count;
/*
* Do not downgrade mmap_lock if we are next to VM_GROWSDOWN or
@@ -2475,7 +2474,6 @@ do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
__mt_destroy(&mt_detach);
start_split_failed:
map_count_exceeded:
- mas_destroy(mas);
return error;
}
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
When merging with the previous VMA, set the vma iterator to the previous
slot. Don't use the vma iterator to get the next/prev VMA, so that the
iterator remains in the correct position for a write.
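Condensed, the repositioning added in vma_merge():

        if (merge_prev) {
                /* position the iterator on prev so the coming write
                 * lands in the right slot */
                vma_prev(vmi);
        }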
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index e227b7cd71aa..920b0c56ab7c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -935,6 +935,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
&& can_vma_merge_after(prev, vm_flags, anon_vma, file,
pgoff, vm_userfaultfd_ctx, anon_name)) {
merge_prev = true;
+ vma_prev(vmi);
}
}
/* Can we merge the successor? */
@@ -1026,9 +1027,6 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
validate_mm(mm);
khugepaged_enter_vma(res, vm_flags);
- if (res)
- vma_iter_set(vmi, end);
-
return res;
}
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
If the vma start address is going to change due to an insert, then it is
safe to not write the vma to the tree. The write of the insert vma will
alter the tree as necessary.
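Condensed, the new condition in __vma_adjust():

        if (!insert || insert->vm_end != start) {
                /* clear the old span; otherwise the insert's own store
                 * will rewrite this part of the tree anyway */
                vma_iter_clear(vmi, vma->vm_start, start);
                vma_iter_set(vmi, start);
        }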
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 5f03c8f3f407..58b2187b447b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -722,10 +722,12 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
}
if (start != vma->vm_start) {
- if ((vma->vm_start < start) &&
- (!insert || (insert->vm_end != start))) {
- vma_iter_clear(vmi, vma->vm_start, start);
- VM_WARN_ON(insert && insert->vm_start > vma->vm_start);
+ if (vma->vm_start < start) {
+ if (!insert || (insert->vm_end != start)) {
+ vma_iter_clear(vmi, vma->vm_start, start);
+ vma_iter_set(vmi, start);
+ VM_WARN_ON(insert && insert->vm_start > vma->vm_start);
+ }
} else {
vma_changed = true;
}
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Use the vma iterator API for the brk() system call. This will provide
type safety at compile time.
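The type-safety gain, sketched: the iterator API traffics in
struct vm_area_struct pointers instead of void *:

        VMA_ITERATOR(vmi, mm, addr);
        struct vm_area_struct *next;

        next = vma_find(&vmi, limit);   /* typed, no cast needed */
        /* versus the raw maple API: void *entry = mas_find(&mas, limit); */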
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 48 +++++++++++++++++++++++-------------------------
1 file changed, 23 insertions(+), 25 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 253a7490fae3..e2ba9b094cad 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -180,10 +180,10 @@ static int check_brk_limits(unsigned long addr, unsigned long len)
return mlock_future_check(current->mm, current->mm->def_flags, len);
}
-static int do_brk_munmap(struct ma_state *mas, struct vm_area_struct *vma,
+static int do_brk_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long newbrk, unsigned long oldbrk,
struct list_head *uf);
-static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *brkvma,
+static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *brkvma,
unsigned long addr, unsigned long request, unsigned long flags);
SYSCALL_DEFINE1(brk, unsigned long, brk)
{
@@ -194,7 +194,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
bool populate;
bool downgraded = false;
LIST_HEAD(uf);
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ struct vma_iterator vmi;
if (mmap_write_lock_killable(mm))
return -EINTR;
@@ -242,8 +242,8 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
int ret;
/* Search one past newbrk */
- mas_set(&mas, newbrk);
- brkvma = mas_find(&mas, oldbrk);
+ vma_iter_init(&vmi, mm, newbrk);
+ brkvma = vma_find(&vmi, oldbrk);
if (!brkvma || brkvma->vm_start >= oldbrk)
goto out; /* mapping intersects with an existing non-brk vma. */
/*
@@ -252,7 +252,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
* before calling do_brk_munmap().
*/
mm->brk = brk;
- ret = do_brk_munmap(&mas, brkvma, newbrk, oldbrk, &uf);
+ ret = do_brk_munmap(&vmi, brkvma, newbrk, oldbrk, &uf);
if (ret == 1) {
downgraded = true;
goto success;
@@ -270,14 +270,14 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
* Only check if the next VMA is within the stack_guard_gap of the
* expansion area
*/
- mas_set(&mas, oldbrk);
- next = mas_find(&mas, newbrk - 1 + PAGE_SIZE + stack_guard_gap);
+ vma_iter_init(&vmi, mm, oldbrk);
+ next = vma_find(&vmi, newbrk + PAGE_SIZE + stack_guard_gap);
if (next && newbrk + PAGE_SIZE > vm_start_gap(next))
goto out;
- brkvma = mas_prev(&mas, mm->start_brk);
+ brkvma = vma_prev_limit(&vmi, mm->start_brk);
/* Ok, looks good - let it rip. */
- if (do_brk_flags(&mas, brkvma, oldbrk, newbrk - oldbrk, 0) < 0)
+ if (do_brk_flags(&vmi, brkvma, oldbrk, newbrk - oldbrk, 0) < 0)
goto out;
mm->brk = brk;
@@ -2907,8 +2907,8 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
}
/*
- * brk_munmap() - Unmap a partial vma.
- * @mas: The maple tree state.
+ * brk_munmap() - Unmap a full or partial vma.
+ * @vmi: The vma iterator
* @vma: The vma to be modified
* @newbrk: the start of the address to unmap
* @oldbrk: The end of the address to unmap
@@ -2918,7 +2918,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
* unmaps a partial VMA mapping. Does not handle alignment, downgrades lock if
* possible.
*/
-static int do_brk_munmap(struct ma_state *mas, struct vm_area_struct *vma,
+static int do_brk_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long newbrk, unsigned long oldbrk,
struct list_head *uf)
{
@@ -2926,14 +2926,14 @@ static int do_brk_munmap(struct ma_state *mas, struct vm_area_struct *vma,
int ret;
arch_unmap(mm, newbrk, oldbrk);
- ret = do_mas_align_munmap(mas, vma, mm, newbrk, oldbrk, uf, true);
+ ret = do_mas_align_munmap(&vmi->mas, vma, mm, newbrk, oldbrk, uf, true);
validate_mm_mt(mm);
return ret;
}
/*
* do_brk_flags() - Increase the brk vma if the flags match.
- * @mas: The maple tree state.
+ * @vmi: The vma iterator
* @addr: The start address
* @len: The length of the increase
* @vma: The vma,
@@ -2943,7 +2943,7 @@ static int do_brk_munmap(struct ma_state *mas, struct vm_area_struct *vma,
* do not match then create a new anonymous VMA. Eventually we may be able to
* do some brk-specific accounting here.
*/
-static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
+static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long addr, unsigned long len, unsigned long flags)
{
struct mm_struct *mm = current->mm;
@@ -2970,8 +2970,7 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
if (vma && vma->vm_end == addr && !vma_policy(vma) &&
can_vma_merge_after(vma, flags, NULL, NULL,
addr >> PAGE_SHIFT, NULL_VM_UFFD_CTX, NULL)) {
- mas_set_range(mas, vma->vm_start, addr + len - 1);
- if (mas_preallocate(mas, GFP_KERNEL))
+ if (vma_iter_prealloc(vmi))
goto unacct_fail;
vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
@@ -2981,7 +2980,7 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
}
vma->vm_end = addr + len;
vma->vm_flags |= VM_SOFTDIRTY;
- mas_store_prealloc(mas, vma);
+ vma_iter_store(vmi, vma);
if (vma->anon_vma) {
anon_vma_interval_tree_post_update_vma(vma);
@@ -3002,8 +3001,7 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
vma->vm_pgoff = addr >> PAGE_SHIFT;
vma->vm_flags = flags;
vma->vm_page_prot = vm_get_page_prot(flags);
- mas_set_range(mas, vma->vm_start, addr + len - 1);
- if (mas_store_gfp(mas, vma, GFP_KERNEL))
+ if (vma_iter_store_gfp(vmi, vma, GFP_KERNEL))
goto mas_store_fail;
mm->map_count++;
@@ -3032,7 +3030,7 @@ int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
int ret;
bool populate;
LIST_HEAD(uf);
- MA_STATE(mas, &mm->mm_mt, addr, addr);
+ VMA_ITERATOR(vmi, mm, addr);
len = PAGE_ALIGN(request);
if (len < request)
@@ -3051,12 +3049,12 @@ int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
if (ret)
goto limits_failed;
- ret = do_mas_munmap(&mas, mm, addr, len, &uf, 0);
+ ret = do_mas_munmap(&vmi.mas, mm, addr, len, &uf, 0);
if (ret)
goto munmap_failed;
- vma = mas_prev(&mas, 0);
- ret = do_brk_flags(&mas, vma, addr, len, flags);
+ vma = vma_prev(&vmi);
+ ret = do_brk_flags(&vmi, vma, addr, len, flags);
populate = ((mm->def_flags & VM_LOCKED) != 0);
mmap_write_unlock(mm);
userfaultfd_unmap_complete(mm, &uf);
--
2.35.1
Gain type safety in nommu by using the vma_iterator and not the maple
tree directly.
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/nommu.c | 79 +++++++++++++++++++++---------------------------------
1 file changed, 31 insertions(+), 48 deletions(-)
diff --git a/mm/nommu.c b/mm/nommu.c
index 0481922fe66e..7a52a7c37009 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -544,19 +544,6 @@ static void put_nommu_region(struct vm_region *region)
__put_nommu_region(region);
}
-void vma_mas_store(struct vm_area_struct *vma, struct ma_state *mas)
-{
- mas_set_range(mas, vma->vm_start, vma->vm_end - 1);
- mas_store_prealloc(mas, vma);
-}
-
-void vma_mas_remove(struct vm_area_struct *vma, struct ma_state *mas)
-{
- mas->index = vma->vm_start;
- mas->last = vma->vm_end - 1;
- mas_store_prealloc(mas, NULL);
-}
-
static void setup_vma_to_mm(struct vm_area_struct *vma, struct mm_struct *mm)
{
vma->vm_mm = mm;
@@ -574,13 +561,13 @@ static void setup_vma_to_mm(struct vm_area_struct *vma, struct mm_struct *mm)
}
/*
- * mas_add_vma_to_mm() - Maple state variant of add_mas_to_mm().
- * @mas: The maple state with preallocations.
+ * vmi_add_vma_to_mm() - VMA Iterator variant of add_vmi_to_mm().
+ * @vmi: The VMA iterator
* @mm: The mm_struct
* @vma: The vma to add
*
*/
-static void mas_add_vma_to_mm(struct ma_state *mas, struct mm_struct *mm,
+static void vmi_add_vma_to_mm(struct vma_iterator *vmi, struct mm_struct *mm,
struct vm_area_struct *vma)
{
BUG_ON(!vma->vm_region);
@@ -589,7 +576,7 @@ static void mas_add_vma_to_mm(struct ma_state *mas, struct mm_struct *mm,
mm->map_count++;
/* add the VMA to the tree */
- vma_mas_store(vma, mas);
+ vma_iter_store(vmi, vma);
}
/*
@@ -600,14 +587,14 @@ static void mas_add_vma_to_mm(struct ma_state *mas, struct mm_struct *mm,
*/
static int add_vma_to_mm(struct mm_struct *mm, struct vm_area_struct *vma)
{
- MA_STATE(mas, &mm->mm_mt, vma->vm_start, vma->vm_end);
+ VMA_ITERATOR(vmi, mm, vma->vm_start);
- if (mas_preallocate(&mas, GFP_KERNEL)) {
+ if (vma_iter_prealloc(&vmi)) {
pr_warn("Allocation of vma tree for process %d failed\n",
current->pid);
return -ENOMEM;
}
- mas_add_vma_to_mm(&mas, mm, vma);
+ vmi_add_vma_to_mm(&vmi, mm, vma);
return 0;
}
@@ -626,14 +613,15 @@ static void cleanup_vma_from_mm(struct vm_area_struct *vma)
i_mmap_unlock_write(mapping);
}
}
+
/*
* delete a VMA from its owning mm_struct and address space
*/
static int delete_vma_from_mm(struct vm_area_struct *vma)
{
- MA_STATE(mas, &vma->vm_mm->mm_mt, 0, 0);
+ VMA_ITERATOR(vmi, vma->vm_mm, vma->vm_start);
- if (mas_preallocate(&mas, GFP_KERNEL)) {
+ if (vma_iter_prealloc(&vmi)) {
pr_warn("Allocation of vma tree for process %d failed\n",
current->pid);
return -ENOMEM;
@@ -641,10 +629,9 @@ static int delete_vma_from_mm(struct vm_area_struct *vma)
cleanup_vma_from_mm(vma);
/* remove from the MM's tree and list */
- vma_mas_remove(vma, &mas);
+ vma_iter_clear(&vmi, vma->vm_start, vma->vm_end);
return 0;
}
-
/*
* destroy a VMA record
*/
@@ -675,9 +662,9 @@ EXPORT_SYMBOL(find_vma_intersection);
*/
struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
{
- MA_STATE(mas, &mm->mm_mt, addr, addr);
+ VMA_ITERATOR(vmi, mm, addr);
- return mas_walk(&mas);
+ return vma_iter_load(&vmi);
}
EXPORT_SYMBOL(find_vma);
@@ -709,9 +696,9 @@ static struct vm_area_struct *find_vma_exact(struct mm_struct *mm,
{
struct vm_area_struct *vma;
unsigned long end = addr + len;
- MA_STATE(mas, &mm->mm_mt, addr, addr);
+ VMA_ITERATOR(vmi, mm, addr);
- vma = mas_walk(&mas);
+ vma = vma_iter_load(&vmi);
if (!vma)
return NULL;
if (vma->vm_start != addr)
@@ -1062,7 +1049,7 @@ unsigned long do_mmap(struct file *file,
vm_flags_t vm_flags;
unsigned long capabilities, result;
int ret;
- MA_STATE(mas, &current->mm->mm_mt, 0, 0);
+ VMA_ITERATOR(vmi, current->mm, 0);
*populate = 0;
@@ -1091,8 +1078,8 @@ unsigned long do_mmap(struct file *file,
if (!vma)
goto error_getting_vma;
- if (mas_preallocate(&mas, GFP_KERNEL))
- goto error_maple_preallocate;
+ if (vma_iter_prealloc(&vmi))
+ goto error_vma_iter_prealloc;
region->vm_usage = 1;
region->vm_flags = vm_flags;
@@ -1234,7 +1221,7 @@ unsigned long do_mmap(struct file *file,
current->mm->total_vm += len >> PAGE_SHIFT;
share:
- mas_add_vma_to_mm(&mas, current->mm, vma);
+ vmi_add_vma_to_mm(&vmi, current->mm, vma);
/* we flush the region from the icache only when the first executable
* mapping of it is made */
@@ -1250,7 +1237,7 @@ unsigned long do_mmap(struct file *file,
error_just_free:
up_write(&nommu_region_sem);
error:
- mas_destroy(&mas);
+ vma_iter_free(&vmi);
if (region->vm_file)
fput(region->vm_file);
kmem_cache_free(vm_region_jar, region);
@@ -1278,7 +1265,7 @@ unsigned long do_mmap(struct file *file,
show_free_areas(0, NULL);
return -ENOMEM;
-error_maple_preallocate:
+error_vma_iter_prealloc:
kmem_cache_free(vm_region_jar, region);
vm_area_free(vma);
pr_warn("Allocation of vma tree for process %d failed\n", current->pid);
@@ -1344,20 +1331,18 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_arg_struct __user *, arg)
* split a vma into two pieces at address 'addr', a new vma is allocated either
* for the first part or the tail.
*/
-int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long addr, int new_below)
+int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *mm,
+ struct vm_area_struct *vma, unsigned long addr, int new_below)
{
struct vm_area_struct *new;
struct vm_region *region;
unsigned long npages;
- MA_STATE(mas, &mm->mm_mt, vma->vm_start, vma->vm_end);
/* we're only permitted to split anonymous regions (these should have
* only a single usage on the region) */
if (vma->vm_file)
return -ENOMEM;
- mm = vma->vm_mm;
if (mm->map_count >= sysctl_max_map_count)
return -ENOMEM;
@@ -1369,10 +1354,10 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
if (!new)
goto err_vma_dup;
- if (mas_preallocate(&mas, GFP_KERNEL)) {
+ if (vma_iter_prealloc(vmi)) {
pr_warn("Allocation of vma tree for process %d failed\n",
current->pid);
- goto err_mas_preallocate;
+ goto err_vmi_preallocate;
}
/* most fields are the same, copy all, and then fixup */
@@ -1406,13 +1391,11 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
setup_vma_to_mm(vma, mm);
setup_vma_to_mm(new, mm);
- mas_set_range(&mas, vma->vm_start, vma->vm_end - 1);
- mas_store(&mas, vma);
- vma_mas_store(new, &mas);
+ vma_iter_store(vmi, new);
mm->map_count++;
return 0;
-err_mas_preallocate:
+err_vmi_preallocate:
vm_area_free(new);
err_vma_dup:
kmem_cache_free(vm_region_jar, region);
@@ -1466,7 +1449,7 @@ static int shrink_vma(struct mm_struct *mm,
*/
int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, struct list_head *uf)
{
- MA_STATE(mas, &mm->mm_mt, start, start);
+ VMA_ITERATOR(vmi, mm, start);
struct vm_area_struct *vma;
unsigned long end;
int ret = 0;
@@ -1478,7 +1461,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, struct list
end = start + len;
/* find the first potentially overlapping VMA */
- vma = mas_find(&mas, end - 1);
+ vma = vma_find(&vmi, end);
if (!vma) {
static int limit;
if (limit < 5) {
@@ -1497,7 +1480,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, struct list
return -EINVAL;
if (end == vma->vm_end)
goto erase_whole_vma;
- vma = mas_next(&mas, end - 1);
+ vma = vma_find(&vmi, end);
} while (vma);
return -EINVAL;
} else {
@@ -1511,7 +1494,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, struct list
if (end != vma->vm_end && offset_in_page(end))
return -EINVAL;
if (start != vma->vm_start && end != vma->vm_end) {
- ret = split_vma(mm, vma, start, 1);
+ ret = vmi_split_vma(&vmi, mm, vma, start, 1);
if (ret < 0)
return ret;
}
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Add init_vma_prep() and init_multi_vma_prep() to set up the struct
vma_prepare. This is to abstract the locking when adjusting the VMAs.
Also replace __vma_adjust()'s remove_next int with a pointer to the VMA
to remove.  Rename next_next to remove2 since this better reflects its
use.
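A sketch of the caller pattern this enables (illustrative only; the exact hunks follow):
        struct vma_prepare vp;
        init_vma_prep(&vp, vma);        /* or init_multi_vma_prep() for next/remove/remove2 */
        vma_prepare(&vp);               /* take the file and anon_vma locks */
        /* ... adjust the vma(s) and write them into the maple tree ... */
        vma_complete(&vp, vmi, vma->vm_mm);     /* drop locks, free removed vmas */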
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 108 ++++++++++++++++++++++++++++++------------------------
1 file changed, 61 insertions(+), 47 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index a0883c23f948..7eb93c311d8d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -460,6 +460,45 @@ static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
return 0;
}
+/*
+ * init_multi_vma_prep() - Initializer for struct vma_prepare
+ * @vp: The vma_prepare struct
+ * @vma: The vma that will be altered once locked
+ * @next: The next vma if it is to be adjusted
+ * @remove: The first vma to be removed
+ * @remove2: The second vma to be removed
+ */
+static inline void init_multi_vma_prep(struct vma_prepare *vp,
+ struct vm_area_struct *vma, struct vm_area_struct *next,
+ struct vm_area_struct *remove, struct vm_area_struct *remove2)
+{
+ memset(vp, 0, sizeof(struct vma_prepare));
+ vp->vma = vma;
+ vp->anon_vma = vma->anon_vma;
+ vp->remove = remove;
+ vp->remove2 = remove2;
+ vp->adj_next = next;
+ if (!vp->anon_vma && next)
+ vp->anon_vma = next->anon_vma;
+
+ vp->file = vma->vm_file;
+ if (vp->file)
+ vp->mapping = vma->vm_file->f_mapping;
+
+}
+
+/*
+ * init_vma_prep() - Initializer wrapper for vma_prepare struct
+ * @vp: The vma_prepare struct
+ * @vma: The vma that will be altered once locked
+ */
+static inline void init_vma_prep(struct vma_prepare *vp,
+ struct vm_area_struct *vma)
+{
+ init_multi_vma_prep(vp, vma, NULL, NULL, NULL);
+}
+
+
/*
* vma_prepare() - Helper function for handling locking VMAs prior to altering
* @vp: The initialized vma_prepare struct
@@ -569,7 +608,7 @@ static inline void vma_complete(struct vma_prepare *vp,
/*
* In mprotect's case 6 (see comments on vma_merge),
- * we must remove next_next too.
+ * we must remove the one after next as well.
*/
if (vp->remove2) {
vp->remove = vp->remove2;
@@ -602,17 +641,14 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long start, unsigned long end, pgoff_t pgoff,
struct vm_area_struct *next)
{
+ bool remove_next = false;
struct vma_prepare vp;
- memset(&vp, 0, sizeof(vp));
- vp.vma = vma;
- vp.anon_vma = vma->anon_vma;
if (next && (vma != next) && (end == next->vm_end)) {
- vp.remove = next;
+ remove_next = true;
if (next->anon_vma && !vma->anon_vma) {
int error;
- vp.anon_vma = next->anon_vma;
vma->anon_vma = next->anon_vma;
error = anon_vma_clone(vma, next);
if (error)
@@ -620,6 +656,7 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
}
}
+ init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL);
/* Not merging but overwriting any part of next is not handled. */
VM_WARN_ON(next && !vp.remove &&
next != vma && end > next->vm_start);
@@ -630,11 +667,6 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
goto nomem;
vma_adjust_trans_huge(vma, start, end, 0);
-
- vp.file = vma->vm_file;
- if (vp.file)
- vp.mapping = vp.file->f_mapping;
-
/* VMA iterator points to previous, so set to start if necessary */
if (vma_iter_addr(vmi) != start)
vma_iter_set(vmi, start);
@@ -665,14 +697,13 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
struct vm_area_struct *insert, struct vm_area_struct *expand)
{
struct mm_struct *mm = vma->vm_mm;
- struct vm_area_struct *next_next = NULL; /* uninit var warning */
+ struct vm_area_struct *remove2 = NULL;
+ struct vm_area_struct *remove = NULL;
struct vm_area_struct *next = find_vma(mm, vma->vm_end);
struct vm_area_struct *orig_vma = vma;
- struct anon_vma *anon_vma = NULL;
struct file *file = vma->vm_file;
bool vma_changed = false;
long adjust_next = 0;
- int remove_next = 0;
struct vm_area_struct *exporter = NULL, *importer = NULL;
struct vma_prepare vma_prep;
@@ -691,25 +722,24 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
*/
VM_WARN_ON(end != next->vm_end);
/*
- * remove_next == 3 means we're
- * removing "vma" and that to do so we
+ * we're removing "vma" and that to do so we
* swapped "vma" and "next".
*/
- remove_next = 3;
VM_WARN_ON(file != next->vm_file);
swap(vma, next);
+ remove = next;
} else {
VM_WARN_ON(expand != vma);
/*
- * case 1, 6, 7, remove_next == 2 is case 6,
- * remove_next == 1 is case 1 or 7.
+ * case 1, 6, 7, remove next.
+ * case 6 also removes the one beyond next
*/
- remove_next = 1 + (end > next->vm_end);
- if (remove_next == 2)
- next_next = find_vma(mm, next->vm_end);
+ remove = next;
+ if (end > next->vm_end)
+ remove2 = find_vma(mm, next->vm_end);
- VM_WARN_ON(remove_next == 2 &&
- end != next_next->vm_end);
+ VM_WARN_ON(remove2 != NULL &&
+ end != remove2->vm_end);
}
exporter = next;
@@ -719,8 +749,8 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
* If next doesn't have anon_vma, import from vma after
* next, if the vma overlaps with it.
*/
- if (remove_next == 2 && !next->anon_vma)
- exporter = next_next;
+ if (remove2 != NULL && !next->anon_vma)
+ exporter = remove2;
} else if (end > next->vm_start) {
/*
@@ -761,30 +791,14 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (vma_iter_prealloc(vmi))
return -ENOMEM;
- anon_vma = vma->anon_vma;
- if (!anon_vma && adjust_next)
- anon_vma = next->anon_vma;
-
- if (anon_vma)
- VM_WARN_ON(adjust_next && next->anon_vma &&
- anon_vma != next->anon_vma);
-
vma_adjust_trans_huge(orig_vma, start, end, adjust_next);
- memset(&vma_prep, 0, sizeof(vma_prep));
- vma_prep.vma = vma;
- vma_prep.anon_vma = anon_vma;
- vma_prep.file = file;
- if (adjust_next)
- vma_prep.adj_next = next;
- if (file)
- vma_prep.mapping = file->f_mapping;
- vma_prep.insert = insert;
- if (remove_next) {
- vma_prep.remove = next;
- vma_prep.remove2 = next_next;
- }
+ init_multi_vma_prep(&vma_prep, vma, adjust_next ? next : NULL, remove,
+ remove2);
+ VM_WARN_ON(vma_prep.anon_vma && adjust_next && next->anon_vma &&
+ vma_prep.anon_vma != next->anon_vma);
+ vma_prep.insert = insert;
vma_prepare(&vma_prep);
if (start != vma->vm_start) {
--
2.35.1
Prepare for the removal of the vma_mas_store() function by open coding
the maple tree store in this test code. Set the range of the maple
state and call the store function directly.
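That is, each vma_mas_store(&vmas[i], &mas) call becomes the following (a sketch mirroring the hunk below; note the open-coded store can fail and must now be checked):
        mas_set_range(&mas, vmas[i].vm_start, vmas[i].vm_end - 1);
        if (mas_store_gfp(&mas, &vmas[i], GFP_KERNEL))
                goto failed;    /* allocation failure: skip the test */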
Cc: SeongJae Park <[email protected]>
Cc: [email protected]
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/damon/vaddr-test.h | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/mm/damon/vaddr-test.h b/mm/damon/vaddr-test.h
index bce37c487540..c4b455b5ee30 100644
--- a/mm/damon/vaddr-test.h
+++ b/mm/damon/vaddr-test.h
@@ -14,19 +14,26 @@
#include <kunit/test.h>
-static void __link_vmas(struct maple_tree *mt, struct vm_area_struct *vmas,
+static int __link_vmas(struct maple_tree *mt, struct vm_area_struct *vmas,
ssize_t nr_vmas)
{
- int i;
+ int i, ret = -ENOMEM;
MA_STATE(mas, mt, 0, 0);
if (!nr_vmas)
- return;
+ return 0;
mas_lock(&mas);
- for (i = 0; i < nr_vmas; i++)
- vma_mas_store(&vmas[i], &mas);
+ for (i = 0; i < nr_vmas; i++) {
+ mas_set_range(&mas, vmas[i].vm_start, vmas[i].vm_end - 1);
+ if (mas_store_gfp(&mas, &vmas[i], GFP_KERNEL))
+ goto failed;
+ }
+
+ ret = 0;
+failed:
mas_unlock(&mas);
+ return ret;
}
/*
@@ -71,7 +78,8 @@ static void damon_test_three_regions_in_vmas(struct kunit *test)
};
mt_init_flags(&mm.mm_mt, MM_MT_FLAGS);
- __link_vmas(&mm.mm_mt, vmas, ARRAY_SIZE(vmas));
+ if (__link_vmas(&mm.mm_mt, vmas, ARRAY_SIZE(vmas)))
+ kunit_skip(test, "Failed to create VMA tree");
__damon_va_three_regions(&mm, regions);
--
2.35.1
Hello Liam,
On Fri, 20 Jan 2023 11:26:31 -0500 "Liam R. Howlett" <[email protected]> wrote:
> Prepare for the removal of the vma_mas_store() function by open coding
> the maple tree store in this test code. Set the range of the maple
> state and call the store function directly.
>
> Cc: SeongJae Park <[email protected]>
> Cc: [email protected]
> Reported-by: kernel test robot <[email protected]>
> Signed-off-by: Liam R. Howlett <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Thanks,
SJ
From: "Liam R. Howlett" <[email protected]>
Introduce vma_shrink() which uses the vma_prepare() and vma_complete()
functions to reduce the vma coverage.
Convert shift_arg_pages() to use vma_expand() and the new vma_shrink()
function.  Remove support for shrinking a VMA from __vma_adjust(), since
shift_arg_pages() was the only user that shrank a VMA in this way.
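Usage sketch, matching the shift_arg_pages() conversion below (it assumes the iterator already points at @vma):
        /* Shrink the vma to just [new_start, new_end) */
        if (vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff))
                return -ENOMEM;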
Signed-off-by: Liam R. Howlett <[email protected]>
---
fs/exec.c | 4 ++--
include/linux/mm.h | 10 ++-------
mm/mmap.c | 52 ++++++++++++++++++++++++++++++++++++++--------
3 files changed, 47 insertions(+), 19 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index d52fca2dd30b..c0df813d2b45 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -699,7 +699,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
/*
* cover the whole range: [new_start, old_end)
*/
- if (vma_adjust(&vmi, vma, new_start, old_end, vma->vm_pgoff))
+ if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
return -ENOMEM;
/*
@@ -733,7 +733,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
vma_prev(&vmi);
/* Shrink the vma to just the new range */
- return vma_adjust(&vmi, vma, new_start, new_end, vma->vm_pgoff);
+ return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
}
/*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 287e340ced01..cd6947b1dc99 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2831,17 +2831,11 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
/* mmap.c */
extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
-extern int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long start,
- unsigned long end, pgoff_t pgoff, struct vm_area_struct *expand);
-static inline int vma_adjust(struct vma_iterator *vmi,
- struct vm_area_struct *vma, unsigned long start, unsigned long end,
- pgoff_t pgoff)
-{
- return __vma_adjust(vmi, vma, start, end, pgoff, NULL);
-}
extern int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long start, unsigned long end, pgoff_t pgoff,
struct vm_area_struct *next);
+extern int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end, pgoff_t pgoff);
extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
unsigned long end, unsigned long vm_flags, struct anon_vma *,
diff --git a/mm/mmap.c b/mm/mmap.c
index 4bb8d219b53f..da58f428c5c0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -685,6 +685,44 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
nomem:
return -ENOMEM;
}
+
+/*
+ * vma_shrink() - Reduce an existing VMA's memory area
+ * @vmi: The vma iterator
+ * @vma: The VMA to modify
+ * @start: The new start
+ * @end: The new end
+ *
+ * Returns: 0 on success, -ENOMEM otherwise
+ */
+int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end, pgoff_t pgoff)
+{
+ struct vma_prepare vp;
+
+ WARN_ON((vma->vm_start != start) && (vma->vm_end != end));
+
+ if (vma_iter_prealloc(vmi))
+ return -ENOMEM;
+
+ init_vma_prep(&vp, vma);
+ vma_adjust_trans_huge(vma, start, end, 0);
+ vma_prepare(&vp);
+
+ if (vma->vm_start < start)
+ vma_iter_clear(vmi, vma->vm_start, start);
+
+ if (vma->vm_end > end)
+ vma_iter_clear(vmi, end, vma->vm_end);
+
+ vma->vm_start = start;
+ vma->vm_end = end;
+ vma->vm_pgoff = pgoff;
+ vma_complete(&vp, vmi, vma->vm_mm);
+ validate_mm(vma->vm_mm);
+ return 0;
+}
+
/*
* We cannot adjust vm_start, vm_end, vm_pgoff fields of a vma that
* is already present in an i_mmap tree without adjusting the tree.
@@ -800,14 +838,7 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
vma_prepare(&vma_prep);
- if (vma->vm_start < start)
- vma_iter_clear(vmi, vma->vm_start, start);
- else if (start != vma->vm_start)
- vma_changed = true;
-
- if (vma->vm_end > end)
- vma_iter_clear(vmi, end, vma->vm_end);
- else if (end != vma->vm_end)
+ if (start < vma->vm_start || end > vma->vm_end)
vma_changed = true;
vma->vm_start = start;
@@ -820,7 +851,10 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (adjust_next) {
next->vm_start += adjust_next;
next->vm_pgoff += adjust_next >> PAGE_SHIFT;
- vma_iter_store(vmi, next);
+ if (adjust_next < 0) {
+ WARN_ON_ONCE(vma_changed);
+ vma_iter_store(vmi, next);
+ }
}
vma_complete(&vma_prep, vmi, mm);
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Use the vma iterator so that the iterator can be invalidated or updated
to avoid each caller doing so.
Update the comments to reflect how the vma iterator works.  The vma
iterator will keep track of the last vm_end and start the search from
vm_end + 1.
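A sketch of the resulting loop shape (illustrative, simplified from the show_smaps_rollup() conversion below):
        for_each_vma(vmi, vma) {
                if (mmap_lock_is_contended(mm)) {
                        vma_iter_invalidate(&vmi); /* resume at last vm_end + 1 */
                        mmap_read_unlock(mm);
                        if (mmap_read_lock_killable(mm))
                                break; /* the real code reports the error */
                }
                /* ... gather stats for vma ... */
        }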
Signed-off-by: Liam R. Howlett <[email protected]>
---
fs/proc/task_mmu.c | 27 +++++++++++++--------------
1 file changed, 13 insertions(+), 14 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e56dfa3d6165..f937c4cd0214 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -892,7 +892,7 @@ static int show_smaps_rollup(struct seq_file *m, void *v)
struct vm_area_struct *vma;
unsigned long vma_start = 0, last_vma_end = 0;
int ret = 0;
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ VMA_ITERATOR(vmi, mm, 0);
priv->task = get_proc_task(priv->inode);
if (!priv->task)
@@ -910,7 +910,7 @@ static int show_smaps_rollup(struct seq_file *m, void *v)
goto out_put_mm;
hold_task_mempolicy(priv);
- vma = mas_find(&mas, ULONG_MAX);
+ vma = vma_next(&vmi);
if (unlikely(!vma))
goto empty_set;
@@ -925,7 +925,7 @@ static int show_smaps_rollup(struct seq_file *m, void *v)
* access it for write request.
*/
if (mmap_lock_is_contended(mm)) {
- mas_pause(&mas);
+ vma_iter_invalidate(&vmi);
mmap_read_unlock(mm);
ret = mmap_read_lock_killable(mm);
if (ret) {
@@ -950,31 +950,31 @@ static int show_smaps_rollup(struct seq_file *m, void *v)
*
* 1) VMA2 is freed, but VMA3 exists:
*
- * find_vma(mm, 16k - 1) will return VMA3.
+ * vma_next(vmi) will return VMA3.
* In this case, just continue from VMA3.
*
* 2) VMA2 still exists:
*
- * find_vma(mm, 16k - 1) will return VMA2.
- * Iterate the loop like the original one.
+ * vma_next(vmi) will return VMA3.
+ * In this case, just continue from VMA3.
*
* 3) No more VMAs can be found:
*
- * find_vma(mm, 16k - 1) will return NULL.
+ * vma_next(vmi) will return NULL.
* No more things to do, just break.
*
* 4) (last_vma_end - 1) is the middle of a vma (VMA'):
*
- * find_vma(mm, 16k - 1) will return VMA' whose range
+ * vma_next(vmi) will return VMA' whose range
* contains last_vma_end.
* Iterate VMA' from last_vma_end.
*/
- vma = mas_find(&mas, ULONG_MAX);
+ vma = vma_next(&vmi);
/* Case 3 above */
if (!vma)
break;
- /* Case 1 above */
+ /* Case 1 and 2 above */
if (vma->vm_start >= last_vma_end)
continue;
@@ -982,8 +982,7 @@ static int show_smaps_rollup(struct seq_file *m, void *v)
if (vma->vm_end > last_vma_end)
smap_gather_stats(vma, &mss, last_vma_end);
}
- /* Case 2 above */
- } while ((vma = mas_find(&mas, ULONG_MAX)) != NULL);
+ } for_each_vma(vmi, vma);
empty_set:
show_vma_header_prefix(m, vma_start, last_vma_end, 0, 0, 0, 0);
@@ -1279,7 +1278,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
return -ESRCH;
mm = get_task_mm(task);
if (mm) {
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ VMA_ITERATOR(vmi, mm, 0);
struct mmu_notifier_range range;
struct clear_refs_private cp = {
.type = type,
@@ -1299,7 +1298,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
}
if (type == CLEAR_REFS_SOFT_DIRTY) {
- mas_for_each(&mas, vma, ULONG_MAX) {
+ for_each_vma(vmi, vma) {
if (!(vma->vm_flags & VM_SOFTDIRTY))
continue;
vma->vm_flags &= ~VM_SOFTDIRTY;
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Use the vma iterator so that the iterator can be invalidated or updated
to avoid each caller doing so.
Signed-off-by: Liam R. Howlett <[email protected]>
---
fs/coredump.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/coredump.c b/fs/coredump.c
index de78bde2991b..f27d734f3102 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -1111,14 +1111,14 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
* Helper function for iterating across a vma list. It ensures that the caller
* will visit `gate_vma' prior to terminating the search.
*/
-static struct vm_area_struct *coredump_next_vma(struct ma_state *mas,
+static struct vm_area_struct *coredump_next_vma(struct vma_iterator *vmi,
struct vm_area_struct *vma,
struct vm_area_struct *gate_vma)
{
if (gate_vma && (vma == gate_vma))
return NULL;
- vma = mas_next(mas, ULONG_MAX);
+ vma = vma_next(vmi);
if (vma)
return vma;
return gate_vma;
@@ -1146,7 +1146,7 @@ static bool dump_vma_snapshot(struct coredump_params *cprm)
{
struct vm_area_struct *gate_vma, *vma = NULL;
struct mm_struct *mm = current->mm;
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ VMA_ITERATOR(vmi, mm, 0);
int i = 0;
/*
@@ -1167,7 +1167,7 @@ static bool dump_vma_snapshot(struct coredump_params *cprm)
return false;
}
- while ((vma = coredump_next_vma(&mas, vma, gate_vma)) != NULL) {
+ while ((vma = coredump_next_vma(&vmi, vma, gate_vma)) != NULL) {
struct core_vma_metadata *m = cprm->vma_meta + i;
m->start = vma->vm_start;
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Use the vma iterator so that the iterator can be invalidated or updated
to avoid each caller doing so.
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mempolicy.c | 25 ++++++++-----------------
1 file changed, 8 insertions(+), 17 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index fd99d303e34f..f5201285c628 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -786,24 +786,21 @@ static int vma_replace_policy(struct vm_area_struct *vma,
static int mbind_range(struct mm_struct *mm, unsigned long start,
unsigned long end, struct mempolicy *new_pol)
{
- MA_STATE(mas, &mm->mm_mt, start, start);
+ VMA_ITERATOR(vmi, mm, start);
struct vm_area_struct *prev;
struct vm_area_struct *vma;
int err = 0;
pgoff_t pgoff;
- prev = mas_prev(&mas, 0);
- if (unlikely(!prev))
- mas_set(&mas, start);
-
- vma = mas_find(&mas, end - 1);
+ prev = vma_prev(&vmi);
+ vma = vma_find(&vmi, end);
if (WARN_ON(!vma))
return 0;
if (start > vma->vm_start)
prev = vma;
- for (; vma; vma = mas_next(&mas, end - 1)) {
+ do {
unsigned long vmstart = max(start, vma->vm_start);
unsigned long vmend = min(end, vma->vm_end);
@@ -812,29 +809,23 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
pgoff = vma->vm_pgoff +
((vmstart - vma->vm_start) >> PAGE_SHIFT);
- prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags,
+ prev = vmi_vma_merge(&vmi, mm, prev, vmstart, vmend, vma->vm_flags,
vma->anon_vma, vma->vm_file, pgoff,
new_pol, vma->vm_userfaultfd_ctx,
anon_vma_name(vma));
if (prev) {
- /* vma_merge() invalidated the mas */
- mas_pause(&mas);
vma = prev;
goto replace;
}
if (vma->vm_start != vmstart) {
- err = split_vma(vma->vm_mm, vma, vmstart, 1);
+ err = vmi_split_vma(&vmi, vma->vm_mm, vma, vmstart, 1);
if (err)
goto out;
- /* split_vma() invalidated the mas */
- mas_pause(&mas);
}
if (vma->vm_end != vmend) {
- err = split_vma(vma->vm_mm, vma, vmend, 0);
+ err = vmi_split_vma(&vmi, vma->vm_mm, vma, vmend, 0);
if (err)
goto out;
- /* split_vma() invalidated the mas */
- mas_pause(&mas);
}
replace:
err = vma_replace_policy(vma, new_pol);
@@ -842,7 +833,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
goto out;
next:
prev = vma;
- }
+ } for_each_vma_range(vmi, vma, end);
out:
return err;
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Use the new locking functions for vma_expand(). This reduces code
duplication.
At the same time, change VM_BUG_ON() to VM_WARN_ON().
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 188 +++++++++++++++++++++---------------------------------
1 file changed, 72 insertions(+), 116 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 9afaf05eb96b..a0883c23f948 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -460,122 +460,6 @@ static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
return 0;
}
-/*
- * vma_expand - Expand an existing VMA
- *
- * @mas: The maple state
- * @vma: The vma to expand
- * @start: The start of the vma
- * @end: The exclusive end of the vma
- * @pgoff: The page offset of vma
- * @next: The current of next vma.
- *
- * Expand @vma to @start and @end. Can expand off the start and end. Will
- * expand over @next if it's different from @vma and @end == @next->vm_end.
- * Checking if the @vma can expand and merge with @next needs to be handled by
- * the caller.
- *
- * Returns: 0 on success
- */
-inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
- unsigned long start, unsigned long end, pgoff_t pgoff,
- struct vm_area_struct *next)
-{
- struct mm_struct *mm = vma->vm_mm;
- struct address_space *mapping = NULL;
- struct rb_root_cached *root = NULL;
- struct anon_vma *anon_vma = vma->anon_vma;
- struct file *file = vma->vm_file;
- bool remove_next = false;
-
- if (next && (vma != next) && (end == next->vm_end)) {
- remove_next = true;
- if (next->anon_vma && !vma->anon_vma) {
- int error;
-
- anon_vma = next->anon_vma;
- vma->anon_vma = anon_vma;
- error = anon_vma_clone(vma, next);
- if (error)
- return error;
- }
- }
-
- /* Not merging but overwriting any part of next is not handled. */
- VM_BUG_ON(next && !remove_next && next != vma && end > next->vm_start);
- /* Only handles expanding */
- VM_BUG_ON(vma->vm_start < start || vma->vm_end > end);
-
- if (vma_iter_prealloc(vmi))
- goto nomem;
-
- vma_adjust_trans_huge(vma, start, end, 0);
-
- if (file) {
- mapping = file->f_mapping;
- root = &mapping->i_mmap;
- uprobe_munmap(vma, vma->vm_start, vma->vm_end);
- i_mmap_lock_write(mapping);
- }
-
- if (anon_vma) {
- anon_vma_lock_write(anon_vma);
- anon_vma_interval_tree_pre_update_vma(vma);
- }
-
- if (file) {
- flush_dcache_mmap_lock(mapping);
- vma_interval_tree_remove(vma, root);
- }
-
- /* VMA iterator points to previous, so set to start if necessary */
- if (vma_iter_addr(vmi) != start)
- vma_iter_set(vmi, start);
-
- vma->vm_start = start;
- vma->vm_end = end;
- vma->vm_pgoff = pgoff;
- vma_iter_store(vmi, vma);
-
- if (file) {
- vma_interval_tree_insert(vma, root);
- flush_dcache_mmap_unlock(mapping);
- }
-
- /* Expanding over the next vma */
- if (remove_next && file) {
- __remove_shared_vm_struct(next, file, mapping);
- }
-
- if (anon_vma) {
- anon_vma_interval_tree_post_update_vma(vma);
- anon_vma_unlock_write(anon_vma);
- }
-
- if (file) {
- i_mmap_unlock_write(mapping);
- uprobe_mmap(vma);
- }
-
- if (remove_next) {
- if (file) {
- uprobe_munmap(next, next->vm_start, next->vm_end);
- fput(file);
- }
- if (next->anon_vma)
- anon_vma_merge(vma, next);
- mm->map_count--;
- mpol_put(vma_policy(next));
- vm_area_free(next);
- }
-
- validate_mm(mm);
- return 0;
-
-nomem:
- return -ENOMEM;
-}
-
/*
* vma_prepare() - Helper function for handling locking VMAs prior to altering
* @vp: The initialized vma_prepare struct
@@ -697,6 +581,78 @@ static inline void vma_complete(struct vma_prepare *vp,
uprobe_mmap(vp->insert);
}
+/*
+ * vma_expand - Expand an existing VMA
+ *
+ * @vmi: The vma iterator
+ * @vma: The vma to expand
+ * @start: The start of the vma
+ * @end: The exclusive end of the vma
+ * @pgoff: The page offset of vma
+ * @next: The next vma to potentially expand over (may be NULL).
+ *
+ * Expand @vma to @start and @end. Can expand off the start and end. Will
+ * expand over @next if it's different from @vma and @end == @next->vm_end.
+ * Checking if the @vma can expand and merge with @next needs to be handled by
+ * the caller.
+ *
+ * Returns: 0 on success
+ */
+inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end, pgoff_t pgoff,
+ struct vm_area_struct *next)
+{
+ struct vma_prepare vp;
+
+ memset(&vp, 0, sizeof(vp));
+ vp.vma = vma;
+ vp.anon_vma = vma->anon_vma;
+ if (next && (vma != next) && (end == next->vm_end)) {
+ vp.remove = next;
+ if (next->anon_vma && !vma->anon_vma) {
+ int error;
+
+ vp.anon_vma = next->anon_vma;
+ vma->anon_vma = next->anon_vma;
+ error = anon_vma_clone(vma, next);
+ if (error)
+ return error;
+ }
+ }
+
+ /* Not merging but overwriting any part of next is not handled. */
+ VM_WARN_ON(next && !vp.remove &&
+ next != vma && end > next->vm_start);
+ /* Only handles expanding */
+ VM_WARN_ON(vma->vm_start < start || vma->vm_end > end);
+
+ if (vma_iter_prealloc(vmi))
+ goto nomem;
+
+ vma_adjust_trans_huge(vma, start, end, 0);
+
+ vp.file = vma->vm_file;
+ if (vp.file)
+ vp.mapping = vp.file->f_mapping;
+
+ /* VMA iterator points to previous, so set to start if necessary */
+ if (vma_iter_addr(vmi) != start)
+ vma_iter_set(vmi, start);
+
+ vma_prepare(&vp);
+ vma->vm_start = start;
+ vma->vm_end = end;
+ vma->vm_pgoff = pgoff;
+ /* Note: mas must be pointing to the expanding VMA */
+ vma_iter_store(vmi, vma);
+
+ vma_complete(&vp, vmi, vma->vm_mm);
+ validate_mm(vma->vm_mm);
+ return 0;
+
+nomem:
+ return -ENOMEM;
+}
/*
* We cannot adjust vm_start, vm_end, vm_pgoff fields of a vma that
* is already present in an i_mmap tree without adjusting the tree.
--
2.35.1
Rename the function to vmi_shrink_vma() to indicate it takes the vma
iterator.  Use the iterator to preallocate and drop the delete function.
The maple tree can make the modification more easily than the linked
list and rbtree could, so just clear the necessary area in the tree.
add_vma_to_mm() is no longer used, so drop this function.
vmi_add_vma_to_mm() is now only used once, so inline this function into
do_mmap().
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/nommu.c | 63 +++++++++++++++---------------------------------------
1 file changed, 17 insertions(+), 46 deletions(-)
diff --git a/mm/nommu.c b/mm/nommu.c
index 7a52a7c37009..9ddeb92600d6 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -560,44 +560,6 @@ static void setup_vma_to_mm(struct vm_area_struct *vma, struct mm_struct *mm)
}
}
-/*
- * vmi_add_vma_to_mm() - VMA Iterator variant of add_vma_to_mm().
- * @vmi: The VMA iterator
- * @mm: The mm_struct
- * @vma: The vma to add
- *
- */
-static void vmi_add_vma_to_mm(struct vma_iterator *vmi, struct mm_struct *mm,
- struct vm_area_struct *vma)
-{
- BUG_ON(!vma->vm_region);
-
- setup_vma_to_mm(vma, mm);
- mm->map_count++;
-
- /* add the VMA to the tree */
- vma_iter_store(vmi, vma);
-}
-
-/*
- * add a VMA into a process's mm_struct in the appropriate place in the list
- * and tree and add to the address space's page tree also if not an anonymous
- * page
- * - should be called with mm->mmap_lock held writelocked
- */
-static int add_vma_to_mm(struct mm_struct *mm, struct vm_area_struct *vma)
-{
- VMA_ITERATOR(vmi, mm, vma->vm_start);
-
- if (vma_iter_prealloc(&vmi)) {
- pr_warn("Allocation of vma tree for process %d failed\n",
- current->pid);
- return -ENOMEM;
- }
- vmi_add_vma_to_mm(&vmi, mm, vma);
- return 0;
-}
-
static void cleanup_vma_from_mm(struct vm_area_struct *vma)
{
vma->vm_mm->map_count--;
@@ -1221,7 +1183,11 @@ unsigned long do_mmap(struct file *file,
current->mm->total_vm += len >> PAGE_SHIFT;
share:
- vmi_add_vma_to_mm(&vmi, current->mm, vma);
+ BUG_ON(!vma->vm_region);
+ setup_vma_to_mm(vma, current->mm);
+ current->mm->map_count++;
+ /* add the VMA to the tree */
+ vma_iter_store(&vmi, vma);
/* we flush the region from the icache only when the first executable
* mapping of it is made */
@@ -1406,7 +1372,7 @@ int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *mm,
* shrink a VMA by removing the specified chunk from either the beginning or
* the end
*/
-static int shrink_vma(struct mm_struct *mm,
+static int vmi_shrink_vma(struct vma_iterator *vmi,
struct vm_area_struct *vma,
unsigned long from, unsigned long to)
{
@@ -1414,14 +1380,19 @@ static int shrink_vma(struct mm_struct *mm,
/* adjust the VMA's pointers, which may reposition it in the MM's tree
* and list */
- if (delete_vma_from_mm(vma))
+ if (vma_iter_prealloc(vmi)) {
+ pr_warn("Allocation of vma tree for process %d failed\n",
+ current->pid);
return -ENOMEM;
- if (from > vma->vm_start)
+ }
+
+ if (from > vma->vm_start) {
+ vma_iter_clear(vmi, from, vma->vm_end);
vma->vm_end = from;
- else
+ } else {
+ vma_iter_clear(vmi, vma->vm_start, to);
vma->vm_start = to;
- if (add_vma_to_mm(mm, vma))
- return -ENOMEM;
+ }
/* cut the backing region down to size */
region = vma->vm_region;
@@ -1498,7 +1469,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, struct list
if (ret < 0)
return ret;
}
- return shrink_vma(mm, vma, start, end);
+ return vmi_shrink_vma(&vmi, vma, start, end);
}
erase_whole_vma:
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Change the vma_adjust() function definition to accept the vma iterator
and pass it through to __vma_adjust().
Update fs/exec and mm/mremap to use the new vma_adjust() function
parameters.
Revert the __split_vma() calls back from __vma_adjust() to vma_adjust()
and pass through the vma iterator.
Signed-off-by: Liam R. Howlett <[email protected]>
---
fs/exec.c | 11 ++++-------
include/linux/mm.h | 9 ++++-----
mm/mmap.c | 10 +++++-----
mm/mremap.c | 4 ++--
4 files changed, 15 insertions(+), 19 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index b98647eeae9f..76ee62e1d3f1 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -699,7 +699,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
/*
* cover the whole range: [new_start, old_end)
*/
- if (vma_adjust(vma, new_start, old_end, vma->vm_pgoff, NULL))
+ if (vma_adjust(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
return -ENOMEM;
/*
@@ -731,12 +731,9 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
}
tlb_finish_mmu(&tlb);
- /*
- * Shrink the vma to just the new range. Always succeeds.
- */
- vma_adjust(vma, new_start, new_end, vma->vm_pgoff, NULL);
-
- return 0;
+ vma_prev(&vmi);
+ /* Shrink the vma to just the new range */
+ return vma_adjust(&vmi, vma, new_start, new_end, vma->vm_pgoff, NULL);
}
/*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 479c79204d96..75b6d06d69d5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2834,12 +2834,11 @@ extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admi
extern int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
struct vm_area_struct *expand);
-static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert)
+static inline int vma_adjust(struct vma_iterator *vmi,
+ struct vm_area_struct *vma, unsigned long start, unsigned long end,
+ pgoff_t pgoff, struct vm_area_struct *insert)
{
- VMA_ITERATOR(vmi, vma->vm_mm, start);
-
- return __vma_adjust(&vmi, vma, start, end, pgoff, insert, NULL);
+ return __vma_adjust(vmi, vma, start, end, pgoff, insert, NULL);
}
extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
diff --git a/mm/mmap.c b/mm/mmap.c
index c7d72475ba6d..b6bedc07ef11 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2213,12 +2213,12 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
new->vm_ops->open(new);
if (new_below)
- err = __vma_adjust(vmi, vma, addr, vma->vm_end,
- vma->vm_pgoff + ((addr - new->vm_start) >> PAGE_SHIFT),
- new, NULL);
+ err = vma_adjust(vmi, vma, addr, vma->vm_end,
+ vma->vm_pgoff + ((addr - new->vm_start) >> PAGE_SHIFT),
+ new);
else
- err = __vma_adjust(vmi, vma, vma->vm_start, addr, vma->vm_pgoff,
- new, NULL);
+ err = vma_adjust(vmi, vma, vma->vm_start, addr, vma->vm_pgoff,
+ new);
/* Success. */
if (!err) {
diff --git a/mm/mremap.c b/mm/mremap.c
index 71ba8eddd836..2176f0cc7f9a 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1047,8 +1047,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
extension_end, vma->vm_flags, vma->anon_vma,
vma->vm_file, extension_pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_vma_name(vma));
- } else if (vma_adjust(vma, vma->vm_start, addr + new_len,
- vma->vm_pgoff, NULL)) {
+ } else if (vma_adjust(&vmi, vma, vma->vm_start,
+ addr + new_len, vma->vm_pgoff, NULL)) {
vma = NULL;
}
if (!vma) {
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Use the vma iterator so that the iterator can be invalidated or updated
to avoid each caller doing so.
Signed-off-by: Liam R. Howlett <[email protected]>
---
kernel/sched/fair.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c36aa54ae071..9c9950249d7b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2938,11 +2938,11 @@ static void task_numa_work(struct callback_head *work)
struct task_struct *p = current;
struct mm_struct *mm = p->mm;
u64 runtime = p->se.sum_exec_runtime;
- MA_STATE(mas, &mm->mm_mt, 0, 0);
struct vm_area_struct *vma;
unsigned long start, end;
unsigned long nr_pte_updates = 0;
long pages, virtpages;
+ struct vma_iterator vmi;
SCHED_WARN_ON(p != container_of(work, struct task_struct, numa_work));
@@ -2995,16 +2995,16 @@ static void task_numa_work(struct callback_head *work)
if (!mmap_read_trylock(mm))
return;
- mas_set(&mas, start);
- vma = mas_find(&mas, ULONG_MAX);
+ vma_iter_init(&vmi, mm, start);
+ vma = vma_next(&vmi);
if (!vma) {
reset_ptenuma_scan(p);
start = 0;
- mas_set(&mas, start);
- vma = mas_find(&mas, ULONG_MAX);
+ vma_iter_set(&vmi, start);
+ vma = vma_next(&vmi);
}
- for (; vma; vma = mas_find(&mas, ULONG_MAX)) {
+ do {
if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
continue;
@@ -3051,7 +3051,7 @@ static void task_numa_work(struct callback_head *work)
cond_resched();
} while (end != vma->vm_end);
- }
+ } for_each_vma(vmi, vma);
out:
/*
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Create a helper for duplicating the anon vma when adjusting the vma.
This simplifies the logic of __vma_adjust().
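Each open-coded anon_vma copy collapses to a sketch like this (illustrative; see the vma_expand() hunk below):
        /* vma expands over next: import next's anon_vma if vma lacks one */
        ret = dup_anon_vma(vma, next);
        if (ret)
                return ret;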
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 74 ++++++++++++++++++++++++++++++-------------------------
1 file changed, 40 insertions(+), 34 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index da58f428c5c0..0a2b19633174 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -620,6 +620,29 @@ static inline void vma_complete(struct vma_prepare *vp,
uprobe_mmap(vp->insert);
}
+/*
+ * dup_anon_vma() - Helper function to duplicate anon_vma
+ * @dst: The destination VMA
+ * @src: The source VMA
+ *
+ * Returns: 0 on success.
+ */
+static inline int dup_anon_vma(struct vm_area_struct *dst,
+ struct vm_area_struct *src)
+{
+ /*
+ * Easily overlooked: when mprotect shifts the boundary, make sure the
+ * expanding vma has anon_vma set if the shrinking vma had, to cover any
+ * anon pages imported.
+ */
+ if (src->anon_vma && !dst->anon_vma) {
+ dst->anon_vma = src->anon_vma;
+ return anon_vma_clone(dst, src);
+ }
+
+ return 0;
+}
+
/*
* vma_expand - Expand an existing VMA
*
@@ -645,15 +668,12 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
struct vma_prepare vp;
if (next && (vma != next) && (end == next->vm_end)) {
- remove_next = true;
- if (next->anon_vma && !vma->anon_vma) {
- int error;
+ int ret;
- vma->anon_vma = next->anon_vma;
- error = anon_vma_clone(vma, next);
- if (error)
- return error;
- }
+ remove_next = true;
+ ret = dup_anon_vma(vma, next);
+ if (ret)
+ return ret;
}
init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL);
@@ -742,10 +762,11 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
struct file *file = vma->vm_file;
bool vma_changed = false;
long adjust_next = 0;
- struct vm_area_struct *exporter = NULL, *importer = NULL;
struct vma_prepare vma_prep;
if (next) {
+ int error = 0;
+
if (end >= next->vm_end) {
/*
* vma expands, overlapping all the next, and
@@ -780,15 +801,14 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
end != remove2->vm_end);
}
- exporter = next;
- importer = vma;
-
/*
* If next doesn't have anon_vma, import from vma after
* next, if the vma overlaps with it.
*/
- if (remove2 != NULL && !next->anon_vma)
- exporter = remove2;
+ if (remove != NULL && !next->anon_vma)
+ error = dup_anon_vma(vma, remove2);
+ else
+ error = dup_anon_vma(vma, remove);
} else if (end > next->vm_start) {
/*
@@ -796,9 +816,8 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
* mprotect case 5 shifting the boundary up.
*/
adjust_next = (end - next->vm_start);
- exporter = next;
- importer = vma;
- VM_WARN_ON(expand != importer);
+ VM_WARN_ON(expand != vma);
+ error = dup_anon_vma(vma, next);
} else if (end < vma->vm_end) {
/*
* vma shrinks, and !insert tells it's not
@@ -806,24 +825,11 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
* mprotect case 4 shifting the boundary down.
*/
adjust_next = -(vma->vm_end - end);
- exporter = vma;
- importer = next;
- VM_WARN_ON(expand != importer);
- }
-
- /*
- * Easily overlooked: when mprotect shifts the boundary,
- * make sure the expanding vma has anon_vma set if the
- * shrinking vma had, to cover any anon pages imported.
- */
- if (exporter && exporter->anon_vma && !importer->anon_vma) {
- int error;
-
- importer->anon_vma = exporter->anon_vma;
- error = anon_vma_clone(importer, exporter);
- if (error)
- return error;
+ VM_WARN_ON(expand != next);
+ error = dup_anon_vma(next, vma);
}
+ if (error)
+ return error;
}
if (vma_iter_prealloc(vmi))
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Pass through the vma iterator to do_vmi_munmap() to handle the iterator
state internally.
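Sketch of the converted call (illustrative; the iterator was added to ksys_shmdt() earlier in the series, and the final argument is taken verbatim from the hunk below):
        /* do_vmi_munmap() fixes up the iterator state, so no mas_pause() */
        do_vmi_munmap(&vmi, mm, vma->vm_start,
                      vma->vm_end - vma->vm_start, NULL, false);
        vma = vma_next(&vmi);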
Signed-off-by: Liam R. Howlett <[email protected]>
---
ipc/shm.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/ipc/shm.c b/ipc/shm.c
index bd2fcc4d454e..1c6a6b319a49 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1786,8 +1786,8 @@ long ksys_shmdt(char __user *shmaddr)
*/
file = vma->vm_file;
size = i_size_read(file_inode(vma->vm_file));
- do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start, NULL);
- mas_pause(&vmi.mas);
+ do_vmi_munmap(&vmi, mm, vma->vm_start,
+ vma->vm_end - vma->vm_start, NULL, false);
/*
* We discovered the size of the shm segment, so
* break out of here and fall through to the next
@@ -1810,10 +1810,9 @@ long ksys_shmdt(char __user *shmaddr)
/* finding a matching vma now does not alter retval */
if ((vma->vm_ops == &shm_vm_ops) &&
((vma->vm_start - addr)/PAGE_SIZE == vma->vm_pgoff) &&
- (vma->vm_file == file)) {
- do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start, NULL);
- mas_pause(&vmi.mas);
- }
+ (vma->vm_file == file))
+ do_vmi_munmap(&vmi, mm, vma->vm_start,
+ vma->vm_end - vma->vm_start, NULL, false);
vma = vma_next(&vmi);
}
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Pass the vma iterator through to __vma_adjust() so the state can be
updated.
Signed-off-by: Liam R. Howlett <[email protected]>
---
include/linux/mm.h | 6 ++++--
mm/mmap.c | 31 +++++++++++++++----------------
2 files changed, 19 insertions(+), 18 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 170a06e46cc9..479c79204d96 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2831,13 +2831,15 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
/* mmap.c */
extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
-extern int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
+extern int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
struct vm_area_struct *expand);
static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert)
{
- return __vma_adjust(vma, start, end, pgoff, insert, NULL);
+ VMA_ITERATOR(vmi, vma->vm_mm, start);
+
+ return __vma_adjust(&vmi, vma, start, end, pgoff, insert, NULL);
}
extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
diff --git a/mm/mmap.c b/mm/mmap.c
index 19e5a79d5ca7..5f03c8f3f407 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -579,9 +579,9 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
* are necessary. The "insert" vma (if any) is to be inserted
* before we drop the necessary locks.
*/
-int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
- struct vm_area_struct *expand)
+int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end, pgoff_t pgoff,
+ struct vm_area_struct *insert, struct vm_area_struct *expand)
{
struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct *next_next = NULL; /* uninit var warning */
@@ -594,7 +594,6 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
bool vma_changed = false;
long adjust_next = 0;
int remove_next = 0;
- VMA_ITERATOR(vmi, mm, 0);
struct vm_area_struct *exporter = NULL, *importer = NULL;
if (next && !insert) {
@@ -679,7 +678,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
}
}
- if (vma_iter_prealloc(&vmi))
+ if (vma_iter_prealloc(vmi))
return -ENOMEM;
vma_adjust_trans_huge(orig_vma, start, end, adjust_next);
@@ -725,7 +724,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
if (start != vma->vm_start) {
if ((vma->vm_start < start) &&
(!insert || (insert->vm_end != start))) {
- vma_iter_clear(&vmi, vma->vm_start, start);
+ vma_iter_clear(vmi, vma->vm_start, start);
VM_WARN_ON(insert && insert->vm_start > vma->vm_start);
} else {
vma_changed = true;
@@ -735,8 +734,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
if (end != vma->vm_end) {
if (vma->vm_end > end) {
if (!insert || (insert->vm_start != end)) {
- vma_iter_clear(&vmi, end, vma->vm_end);
- vma_iter_set(&vmi, vma->vm_end);
+ vma_iter_clear(vmi, end, vma->vm_end);
+ vma_iter_set(vmi, vma->vm_end);
VM_WARN_ON(insert &&
insert->vm_end < vma->vm_end);
}
@@ -747,13 +746,13 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
}
if (vma_changed)
- vma_iter_store(&vmi, vma);
+ vma_iter_store(vmi, vma);
vma->vm_pgoff = pgoff;
if (adjust_next) {
next->vm_start += adjust_next;
next->vm_pgoff += adjust_next >> PAGE_SHIFT;
- vma_iter_store(&vmi, next);
+ vma_iter_store(vmi, next);
}
if (file) {
@@ -773,7 +772,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
* us to insert it before dropping the locks
* (it may either follow vma or precede it).
*/
- vma_iter_store(&vmi, insert);
+ vma_iter_store(vmi, insert);
mm->map_count++;
}
@@ -819,7 +818,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
if (insert && file)
uprobe_mmap(insert);
- vma_iter_free(&vmi);
+ vma_iter_free(vmi);
validate_mm(mm);
return 0;
@@ -1013,20 +1012,20 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
if (merge_prev && merge_next &&
is_mergeable_anon_vma(prev->anon_vma,
next->anon_vma, NULL)) { /* cases 1, 6 */
- err = __vma_adjust(prev, prev->vm_start,
+ err = __vma_adjust(vmi, prev, prev->vm_start,
next->vm_end, prev->vm_pgoff, NULL,
prev);
res = prev;
} else if (merge_prev) { /* cases 2, 5, 7 */
- err = __vma_adjust(prev, prev->vm_start,
+ err = __vma_adjust(vmi, prev, prev->vm_start,
end, prev->vm_pgoff, NULL, prev);
res = prev;
} else if (merge_next) {
if (prev && addr < prev->vm_end) /* case 4 */
- err = __vma_adjust(prev, prev->vm_start,
+ err = __vma_adjust(vmi, prev, prev->vm_start,
addr, prev->vm_pgoff, NULL, next);
else /* cases 3, 8 */
- err = __vma_adjust(mid, addr, next->vm_end,
+ err = __vma_adjust(vmi, mid, addr, next->vm_end,
next->vm_pgoff - pglen, NULL, next);
res = next;
}
--
2.35.1
From: "Liam R. Howlett" <[email protected]>
Use the vma iterator so that the iterator can be invalidated or updated
to avoid each caller doing so.
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mlock.c | 57 +++++++++++++++++++++++++++---------------------------
1 file changed, 28 insertions(+), 29 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c
index b680f11879c3..0d09b9070071 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -401,8 +401,9 @@ static void mlock_vma_pages_range(struct vm_area_struct *vma,
*
* For vmas that pass the filters, merge/split as appropriate.
*/
-static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
- unsigned long start, unsigned long end, vm_flags_t newflags)
+static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ struct vm_area_struct **prev, unsigned long start,
+ unsigned long end, vm_flags_t newflags)
{
struct mm_struct *mm = vma->vm_mm;
pgoff_t pgoff;
@@ -417,22 +418,22 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
goto out;
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- *prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
- vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ *prev = vmi_vma_merge(vmi, mm, *prev, start, end, newflags,
+ vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
+ vma->vm_userfaultfd_ctx, anon_vma_name(vma));
if (*prev) {
vma = *prev;
goto success;
}
if (start != vma->vm_start) {
- ret = split_vma(mm, vma, start, 1);
+ ret = vmi_split_vma(vmi, mm, vma, start, 1);
if (ret)
goto out;
}
if (end != vma->vm_end) {
- ret = split_vma(mm, vma, end, 0);
+ ret = vmi_split_vma(vmi, mm, vma, end, 0);
if (ret)
goto out;
}
@@ -471,7 +472,7 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
unsigned long nstart, end, tmp;
struct vm_area_struct *vma, *prev;
int error;
- MA_STATE(mas, &current->mm->mm_mt, start, start);
+ VMA_ITERATOR(vmi, current->mm, start);
VM_BUG_ON(offset_in_page(start));
VM_BUG_ON(len != PAGE_ALIGN(len));
@@ -480,39 +481,37 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
return -EINVAL;
if (end == start)
return 0;
- vma = mas_walk(&mas);
+ vma = vma_iter_load(&vmi);
if (!vma)
return -ENOMEM;
+ prev = vma_prev(&vmi);
if (start > vma->vm_start)
prev = vma;
- else
- prev = mas_prev(&mas, 0);
- for (nstart = start ; ; ) {
- vm_flags_t newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
+ nstart = start;
+ tmp = vma->vm_start;
+ for_each_vma_range(vmi, vma, end) {
+ vm_flags_t newflags;
- newflags |= flags;
+ if (vma->vm_start != tmp)
+ return -ENOMEM;
+ newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
+ newflags |= flags;
/* Here we know that vma->vm_start <= nstart < vma->vm_end. */
tmp = vma->vm_end;
if (tmp > end)
tmp = end;
- error = mlock_fixup(vma, &prev, nstart, tmp, newflags);
+ error = mlock_fixup(&vmi, vma, &prev, nstart, tmp, newflags);
if (error)
break;
nstart = tmp;
- if (nstart < prev->vm_end)
- nstart = prev->vm_end;
- if (nstart >= end)
- break;
-
- vma = find_vma(prev->vm_mm, prev->vm_end);
- if (!vma || vma->vm_start != nstart) {
- error = -ENOMEM;
- break;
- }
}
+
+ if (vma_iter_end(&vmi) < end)
+ return -ENOMEM;
+
return error;
}
@@ -658,7 +657,7 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
*/
static int apply_mlockall_flags(int flags)
{
- MA_STATE(mas, &current->mm->mm_mt, 0, 0);
+ VMA_ITERATOR(vmi, current->mm, 0);
struct vm_area_struct *vma, *prev = NULL;
vm_flags_t to_add = 0;
@@ -679,15 +678,15 @@ static int apply_mlockall_flags(int flags)
to_add |= VM_LOCKONFAULT;
}
- mas_for_each(&mas, vma, ULONG_MAX) {
+ for_each_vma(vmi, vma) {
vm_flags_t newflags;
newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
newflags |= to_add;
/* Ignore errors */
- mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
- mas_pause(&mas);
+ mlock_fixup(&vmi, vma, &prev, vma->vm_start, vma->vm_end,
+ newflags);
cond_resched();
}
out:
--
2.35.1
Stop using vma_adjust() in preparation for removing the function.
Export vma_expand() to use instead.
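Usage sketch for the mremap() conversion below (illustrative only); growing a mapping in place becomes:
        if (vma_expand(&vmi, vma, vma->vm_start, addr + new_len,
                       vma->vm_pgoff, NULL))
                vma = NULL; /* expansion failed */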
Signed-off-by: Liam R. Howlett <[email protected]>
---
include/linux/mm.h | 3 +++
mm/mmap.c | 6 +++---
mm/mremap.c | 4 ++--
3 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c372c09e11b5..287e340ced01 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2839,6 +2839,9 @@ static inline int vma_adjust(struct vma_iterator *vmi,
{
return __vma_adjust(vmi, vma, start, end, pgoff, NULL);
}
+extern int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end, pgoff_t pgoff,
+ struct vm_area_struct *next);
extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
unsigned long end, unsigned long vm_flags, struct anon_vma *,
diff --git a/mm/mmap.c b/mm/mmap.c
index c1eb353c16f8..4bb8d219b53f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -637,9 +637,9 @@ static inline void vma_complete(struct vma_prepare *vp,
*
* Returns: 0 on success
*/
-inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
- unsigned long start, unsigned long end, pgoff_t pgoff,
- struct vm_area_struct *next)
+int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end, pgoff_t pgoff,
+ struct vm_area_struct *next)
{
bool remove_next = false;
struct vma_prepare vp;
diff --git a/mm/mremap.c b/mm/mremap.c
index 30eea37f9fc4..1b3ee02bead7 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1052,8 +1052,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
extension_end, vma->vm_flags, vma->anon_vma,
vma->vm_file, extension_pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_vma_name(vma));
- } else if (vma_adjust(&vmi, vma, vma->vm_start,
- addr + new_len, vma->vm_pgoff)) {
+ } else if (vma_expand(&vmi, vma, vma->vm_start,
+ addr + new_len, vma->vm_pgoff, NULL)) {
vma = NULL;
}
if (!vma) {
--
2.35.1
Hi Liam,
"Liam R. Howlett" <[email protected]> writes:
> From: "Liam R. Howlett" <[email protected]>
>
> Pass through the vma iterator to do_vmi_munmap() to handle the iterator
> state internally
>
> Signed-off-by: Liam R. Howlett <[email protected]>
> ---
> ipc/shm.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
git bisect says this breaks the shm* testcase in ltp on (at least) s390:
# ./test.sh
tst_test.c:1558: TINFO: Timeout per run is 0h 00m 30s
shmat01.c:124: TPASS: shmat() succeeded to attach NULL address
shmat01.c:92: TFAIL: shmat() failed: EINVAL (22)
shmat01.c:92: TFAIL: shmat() failed: EINVAL (22)
shmat01.c:92: TFAIL: shmat() failed: EINVAL (22)
Summary:
passed 1
failed 3
broken 0
skipped 0
warnings 0
#
Can you take a look? Thanks!
reverting the above commit fixes the issue.
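If it helps, here is a rough standalone reduction of the shape of the
failing calls (my own sketch, not the LTP source; the real checks live
in shmat01.c at the line numbers above):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
	/* Rough reduction only; see LTP's shmat01.c for the real test. */
	int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
	void *p, *hint;

	if (id < 0) {
		perror("shmget");
		return 1;
	}

	p = shmat(id, NULL, 0);			/* the NULL attach passes */
	if (p == (void *)-1) {
		perror("shmat NULL");
		return 1;
	}
	hint = (char *)p + (64 << 20);		/* an aligned, free address */
	shmdt(p);

	p = shmat(id, hint, SHM_RND);		/* fails EINVAL when broken */
	if (p == (void *)-1)
		printf("shmat() failed: %s\n", strerror(errno));
	else
		shmdt(p);

	shmctl(id, IPC_RMID, NULL);
	return 0;
}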
Thanks,
Sven
* Sven Schnelle <[email protected]> [230125 06:00]:
> Hi Liam,
>
> "Liam R. Howlett" <[email protected]> writes:
>
> > From: "Liam R. Howlett" <[email protected]>
> >
> > Pass through the vma iterator to do_vmi_munmap() to handle the iterator
> > state internally
> >
> > Signed-off-by: Liam R. Howlett <[email protected]>
> > ---
> > ipc/shm.c | 11 +++++------
> > 1 file changed, 5 insertions(+), 6 deletions(-)
>
> git bisect says this breaks the shm* testcase in ltp on (at least) s390:
>
> # ./test.sh
> tst_test.c:1558: TINFO: Timeout per run is 0h 00m 30s
> shmat01.c:124: TPASS: shmat() succeeded to attach NULL address
> shmat01.c:92: TFAIL: shmat() failed: EINVAL (22)
> shmat01.c:92: TFAIL: shmat() failed: EINVAL (22)
> shmat01.c:92: TFAIL: shmat() failed: EINVAL (22)
>
> Summary:
> passed 1
> failed 3
> broken 0
> skipped 0
> warnings 0
>
> #
>
> Can you take a look? Thanks!
>
> reverting the above commit fixes the issue.
Thanks for testing this and letting me know.
I'll have a look.
Regards,
Liam
On 1/20/23 17:26, Liam R. Howlett wrote:
> From: "Liam R. Howlett" <[email protected]>
>
> Inline the work of __vma_adjust() into vma_merge(). This reduces code
> size and has the added benefits of the comments for the cases being
> located with the code.
>
> Change the comments referencing vma_adjust() accordingly.
>
> Signed-off-by: Liam R. Howlett <[email protected]>
...
> @@ -1054,32 +945,85 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> vm_userfaultfd_ctx, anon_name)) {
> merge_next = true;
> }
> +
> + remove = remove2 = adjust = NULL;
> /* Can we merge both the predecessor and the successor? */
> if (merge_prev && merge_next &&
> - is_mergeable_anon_vma(prev->anon_vma,
> - next->anon_vma, NULL)) { /* cases 1, 6 */
> - err = __vma_adjust(vmi, prev, prev->vm_start,
> - next->vm_end, prev->vm_pgoff, prev);
> - res = prev;
> - } else if (merge_prev) { /* cases 2, 5, 7 */
> - err = __vma_adjust(vmi, prev, prev->vm_start,
> - end, prev->vm_pgoff, prev);
> - res = prev;
> + is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) {
> + remove = mid; /* case 1 */
> + vma_end = next->vm_end;
> + err = dup_anon_vma(res, remove);
> + if (mid != next) { /* case 6 */
> + remove2 = next;
> + if (!remove->anon_vma)
> + err = dup_anon_vma(res, remove2);
> + }
> + } else if (merge_prev) {
> + err = 0; /* case 2 */
> + if (mid && end > mid->vm_start) {
> + err = dup_anon_vma(res, mid);
> + if (end == mid->vm_end) { /* case 7 */
> + remove = mid;
> + } else { /* case 5 */
> + adjust = mid;
> + adj_next = (end - mid->vm_start);
> + }
> + }
> } else if (merge_next) {
> - if (prev && addr < prev->vm_end) /* case 4 */
> - err = __vma_adjust(vmi, prev, prev->vm_start,
> - addr, prev->vm_pgoff, next);
> - else /* cases 3, 8 */
> - err = __vma_adjust(vmi, mid, addr, next->vm_end,
> - next->vm_pgoff - pglen, next);
> res = next;
> + if (prev && addr < prev->vm_end) { /* case 4 */
> + vma_end = addr;
> + adjust = mid;
> + adj_next = -(vma->vm_end - addr);
> + err = dup_anon_vma(res, adjust);
I think this one is wrong, and should be fixed as below. I'm not
exactly sure about the user-visible effects, but it shouldn't matter if
we fix it before rc1? I guess what can happen is we end up with pages
becoming part of 'mid' that have an anon_vma originally from 'prev'
which is not connected to 'mid', so eventually some rmap operation
will fail to do the right thing, etc. Or 'prev' is unmapped, its
anon_vma freed, and we have a use-after-free. Probably rare to happen,
but nasty enough.
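To make the geometry concrete, here is my own sketch of case 4, in the
style of the diagram in the vma_merge() comments:

/*
 * Case 4 (new range AAAA straddles the prev/next boundary):
 *
 *          AAAA
 *      PPPPPPNNNNNN
 *  ->  PPPPNNNNNNNN	prev shrinks to addr, next expands down to addr
 *
 * The pages in [addr, prev->vm_end) used to belong to prev, so their
 * anon pages point at prev's anon_vma.  After the merge they live in
 * next (== mid == adjust here), which therefore needs prev's anon_vma
 * cloned if it has none of its own:
 *
 *	err = dup_anon_vma(adjust, prev);
 *
 * while dup_anon_vma(res, adjust) is a no-op, since res == next == mid.
 */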
----8<----
From 854f4cef0fecde9a0a89ff1a5beb0a1e2115363f Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <[email protected]>
Date: Wed, 22 Feb 2023 16:51:46 +0100
Subject: [PATCH urgent for 6.3-rc1] mm/mremap: fix dup_anon_vma() in vma_merge() case 4
In case 4, we are shrinking 'prev' (PPPP in the comment) and expanding
'mid' (NNNN). So we need to make sure 'mid' clones the anon_vma from
'prev', if it doesn't have any. After commit 0503ea8f5ba7 ("mm/mmap:
remove __vma_adjust()") we can fail to do that due to wrong parameters
for dup_anon_vma(). The call is a no-op because res == next, adjust ==
mid and mid == next. Fix it.
Fixes: 0503ea8f5ba7 ("mm/mmap: remove __vma_adjust()")
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/mmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 20f21f0949dd..740b54be3ed4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -973,7 +973,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
vma_end = addr;
adjust = mid;
adj_next = -(vma->vm_end - addr);
- err = dup_anon_vma(res, adjust);
+ err = dup_anon_vma(adjust, prev);
} else {
vma = next; /* case 3 */
vma_start = addr;
--
2.39.2
* Vlastimil Babka <[email protected]> [230222 11:17]:
> On 1/20/23 17:26, Liam R. Howlett wrote:
> > From: "Liam R. Howlett" <[email protected]>
> >
> > Inline the work of __vma_adjust() into vma_merge(). This reduces code
> > size and has the added benefits of the comments for the cases being
> > located with the code.
> >
> > Change the comments referencing vma_adjust() accordingly.
> >
> > Signed-off-by: Liam R. Howlett <[email protected]>
>
> ...
>
> > @@ -1054,32 +945,85 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> > vm_userfaultfd_ctx, anon_name)) {
> > merge_next = true;
> > }
> > +
> > + remove = remove2 = adjust = NULL;
> > /* Can we merge both the predecessor and the successor? */
> > if (merge_prev && merge_next &&
> > - is_mergeable_anon_vma(prev->anon_vma,
> > - next->anon_vma, NULL)) { /* cases 1, 6 */
> > - err = __vma_adjust(vmi, prev, prev->vm_start,
> > - next->vm_end, prev->vm_pgoff, prev);
> > - res = prev;
> > - } else if (merge_prev) { /* cases 2, 5, 7 */
> > - err = __vma_adjust(vmi, prev, prev->vm_start,
> > - end, prev->vm_pgoff, prev);
> > - res = prev;
> > + is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) {
> > + remove = mid; /* case 1 */
> > + vma_end = next->vm_end;
> > + err = dup_anon_vma(res, remove);
> > + if (mid != next) { /* case 6 */
> > + remove2 = next;
> > + if (!remove->anon_vma)
> > + err = dup_anon_vma(res, remove2);
> > + }
> > + } else if (merge_prev) {
> > + err = 0; /* case 2 */
> > + if (mid && end > mid->vm_start) {
> > + err = dup_anon_vma(res, mid);
> > + if (end == mid->vm_end) { /* case 7 */
> > + remove = mid;
> > + } else { /* case 5 */
> > + adjust = mid;
> > + adj_next = (end - mid->vm_start);
> > + }
> > + }
> > } else if (merge_next) {
> > - if (prev && addr < prev->vm_end) /* case 4 */
> > - err = __vma_adjust(vmi, prev, prev->vm_start,
> > - addr, prev->vm_pgoff, next);
> > - else /* cases 3, 8 */
> > - err = __vma_adjust(vmi, mid, addr, next->vm_end,
> > - next->vm_pgoff - pglen, next);
> > res = next;
> > + if (prev && addr < prev->vm_end) { /* case 4 */
> > + vma_end = addr;
> > + adjust = mid;
> > + adj_next = -(vma->vm_end - addr);
> > + err = dup_anon_vma(res, adjust);
>
> I think this one is wrong, and should be fixed as below. I'm not
> exactly sure about the user-visible effects, but it shouldn't matter if
> we fix it before rc1? I guess what can happen is we end up with pages
> becoming part of 'mid' that have an anon_vma originally from 'prev'
> which is not connected to 'mid', so eventually some rmap operation
> will fail to do the right thing, etc. Or 'prev' is unmapped, its
> anon_vma freed, and we have a use-after-free. Probably rare to happen,
> but nasty enough.
Yes, you are correct. Thanks for the closer look here.
>
> ----8<----
> From 854f4cef0fecde9a0a89ff1a5beb0a1e2115363f Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <[email protected]>
> Date: Wed, 22 Feb 2023 16:51:46 +0100
> Subject: [PATCH urgent for 6.3-rc1] mm/mremap: fix dup_anon_vma() in vma_merge() case 4
>
> In case 4, we are shrinking 'prev' (PPPP in the comment) and expanding
> 'mid' (NNNN). So we need to make sure 'mid' clones the anon_vma from
> 'prev', if it doesn't have any. After commit 0503ea8f5ba7 ("mm/mmap:
> remove __vma_adjust()") we can fail to do that due to wrong parameters
> for dup_anon_vma(). The call is a no-op because res == next, adjust ==
> mid and mid == next. Fix it.
>
> Fixes: 0503ea8f5ba7 ("mm/mmap: remove __vma_adjust()")
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> mm/mmap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 20f21f0949dd..740b54be3ed4 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -973,7 +973,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
> vma_end = addr;
> adjust = mid;
> adj_next = -(vma->vm_end - addr);
> - err = dup_anon_vma(res, adjust);
> + err = dup_anon_vma(adjust, prev);
Reviewed-by: Liam R. Howlett <[email protected]>
> } else {
> vma = next; /* case 3 */
> vma_start = addr;
> --
> 2.39.2
>
>
>
On 20/01/2023 16:26, Liam R. Howlett wrote:
> From: "Liam R. Howlett" <[email protected]>
>
> Use the vma iterator so that the iterator can be invalidated or updated
> to avoid each caller doing so.
Hi,
I've bisected 2 mm selftest regressions back to this patch, so I'm hoping someone can help debug and fix. The failures are reproducible on x86_64 and arm64.
mlock-random-test:
$ ./run_kselftest.sh -t mm:mlock-random-test
TAP version 13
1..1
# selftests: mm: mlock-random-test
mlock() failure at |0xaaaaaaab52d0(131072)| mlock:|0xaaaaaaacc65d(26551)|
not ok 1 selftests: mm: mlock-random-test # exit=255
This mallocs a buffer then loops 100 times, trying to mlock random parts of it. After this patch, the test fails after a variable number of iterations; mlock() returns ENOMEM. If I explicitly munlock at the end of each loop, it works.
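A minimal standalone version of that pattern looks roughly like this (my
own reduction, not the selftest source):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	/*
	 * Run with RLIMIT_MEMLOCK >= 128K (or as root) so that any
	 * ENOMEM comes from the kernel, not from the rlimit.
	 */
	size_t len = 128 * 1024;
	char *buf = malloc(len);
	int i;

	if (!buf)
		return 1;
	memset(buf, 0, len);	/* fault the pages in */

	srand(0);
	for (i = 0; i < 100; i++) {
		size_t start = (size_t)rand() % len;
		size_t n = (size_t)rand() % (len - start) + 1;

		if (mlock(buf + start, n)) {
			printf("mlock() failure at iteration %d: %s\n",
			       i, strerror(errno));
			return 1;
		}
		/* munlock(buf + start, n);  <- works when uncommented */
	}
	printf("100 iterations passed\n");
	return 0;
}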
mlock2-tests:
$ ./run_kselftest.sh -t mm:mlock2-tests
TAP version 13
1..1
# selftests: mm: mlock2-tests
munlock(): Cannot allocate memory
munlock(): Cannot allocate memory
not ok 1 selftests: mm: mlock2-tests # exit=2
Here, a 3-page buffer is mlock2()ed, then the middle page is munlocked. Finally, the whole 3-page range is munlocked, and after this patch it fails with ENOMEM. If I modify the test to split the final munlock into 2, one for the first page and one for the last, the test passes.
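Reduced to a standalone program, the failing sequence is roughly (again
my own sketch, using plain mlock() rather than mlock2()):

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *map = mmap(NULL, 3 * psz, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (map == MAP_FAILED)
		return 1;
	if (mlock(map, 3 * psz)) {		/* lock all three pages */
		perror("mlock");
		return 1;
	}
	if (munlock(map + psz, psz)) {		/* unlock the middle page */
		perror("munlock middle");
		return 1;
	}
	/* Unlock the whole range again; fails with ENOMEM when broken. */
	if (munlock(map, 3 * psz)) {
		perror("munlock");
		return 2;
	}
	printf("ok\n");
	return 0;
}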
Immediately prior to this patch (2286a6914c77 "mm: change mprotect_fixup to vma iterator"), both tests pass.
From a quick scan of the man page, I don't think it explicitly says that it's ok to call mlock/munlock on already locked/unlocked pages, but it's certainly a change of behavior and the tests notice, so I'm guessing this wasn't intentional?
I'm not familiar with this code so it's not obvious to me exactly what the problem is, but I'm hoping someone can help debug?
Thanks,
Ryan
>
> Signed-off-by: Liam R. Howlett <[email protected]>
> ---
> mm/mlock.c | 57 +++++++++++++++++++++++++++---------------------------
> 1 file changed, 28 insertions(+), 29 deletions(-)
>
> diff --git a/mm/mlock.c b/mm/mlock.c
> index b680f11879c3..0d09b9070071 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -401,8 +401,9 @@ static void mlock_vma_pages_range(struct vm_area_struct *vma,
> *
> * For vmas that pass the filters, merge/split as appropriate.
> */
> -static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
> - unsigned long start, unsigned long end, vm_flags_t newflags)
> +static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
> + struct vm_area_struct **prev, unsigned long start,
> + unsigned long end, vm_flags_t newflags)
> {
> struct mm_struct *mm = vma->vm_mm;
> pgoff_t pgoff;
> @@ -417,22 +418,22 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
> goto out;
>
> pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
> - *prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
> - vma->vm_file, pgoff, vma_policy(vma),
> - vma->vm_userfaultfd_ctx, anon_vma_name(vma));
> + *prev = vmi_vma_merge(vmi, mm, *prev, start, end, newflags,
> + vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
> + vma->vm_userfaultfd_ctx, anon_vma_name(vma));
> if (*prev) {
> vma = *prev;
> goto success;
> }
>
> if (start != vma->vm_start) {
> - ret = split_vma(mm, vma, start, 1);
> + ret = vmi_split_vma(vmi, mm, vma, start, 1);
> if (ret)
> goto out;
> }
>
> if (end != vma->vm_end) {
> - ret = split_vma(mm, vma, end, 0);
> + ret = vmi_split_vma(vmi, mm, vma, end, 0);
> if (ret)
> goto out;
> }
> @@ -471,7 +472,7 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
> unsigned long nstart, end, tmp;
> struct vm_area_struct *vma, *prev;
> int error;
> - MA_STATE(mas, &current->mm->mm_mt, start, start);
> + VMA_ITERATOR(vmi, current->mm, start);
>
> VM_BUG_ON(offset_in_page(start));
> VM_BUG_ON(len != PAGE_ALIGN(len));
> @@ -480,39 +481,37 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
> return -EINVAL;
> if (end == start)
> return 0;
> - vma = mas_walk(&mas);
> + vma = vma_iter_load(&vmi);
> if (!vma)
> return -ENOMEM;
>
> + prev = vma_prev(&vmi);
> if (start > vma->vm_start)
> prev = vma;
> - else
> - prev = mas_prev(&mas, 0);
>
> - for (nstart = start ; ; ) {
> - vm_flags_t newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
> + nstart = start;
> + tmp = vma->vm_start;
> + for_each_vma_range(vmi, vma, end) {
> + vm_flags_t newflags;
>
> - newflags |= flags;
> + if (vma->vm_start != tmp)
> + return -ENOMEM;
>
> + newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
> + newflags |= flags;
> /* Here we know that vma->vm_start <= nstart < vma->vm_end. */
> tmp = vma->vm_end;
> if (tmp > end)
> tmp = end;
> - error = mlock_fixup(vma, &prev, nstart, tmp, newflags);
> + error = mlock_fixup(&vmi, vma, &prev, nstart, tmp, newflags);
> if (error)
> break;
> nstart = tmp;
> - if (nstart < prev->vm_end)
> - nstart = prev->vm_end;
> - if (nstart >= end)
> - break;
> -
> - vma = find_vma(prev->vm_mm, prev->vm_end);
> - if (!vma || vma->vm_start != nstart) {
> - error = -ENOMEM;
> - break;
> - }
> }
> +
> + if (vma_iter_end(&vmi) < end)
> + return -ENOMEM;
> +
> return error;
> }
>
> @@ -658,7 +657,7 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
> */
> static int apply_mlockall_flags(int flags)
> {
> - MA_STATE(mas, &current->mm->mm_mt, 0, 0);
> + VMA_ITERATOR(vmi, current->mm, 0);
> struct vm_area_struct *vma, *prev = NULL;
> vm_flags_t to_add = 0;
>
> @@ -679,15 +678,15 @@ static int apply_mlockall_flags(int flags)
> to_add |= VM_LOCKONFAULT;
> }
>
> - mas_for_each(&mas, vma, ULONG_MAX) {
> + for_each_vma(vmi, vma) {
> vm_flags_t newflags;
>
> newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
> newflags |= to_add;
>
> /* Ignore errors */
> - mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
> - mas_pause(&mas);
> + mlock_fixup(&vmi, vma, &prev, vma->vm_start, vma->vm_end,
> + newflags);
> cond_resched();
> }
> out:
On 11/07/2023 16:27, Liam R. Howlett wrote:
> * Ryan Roberts <[email protected]> [230711 10:09]:
>> On 20/01/2023 16:26, Liam R. Howlett wrote:
>>> From: "Liam R. Howlett" <[email protected]>
>>>
>>> Use the vma iterator so that the iterator can be invalidated or updated
>>> to avoid each caller doing so.
>>
>> Hi,
>
>
> Hello!
>
>>
>> I've bisected 2 mm selftest regressions back to this patch, so I'm hoping someone can help debug and fix. The failures are reproducible on x86_64 and arm64.
>
> Thanks! That is a big help. Where did you start your bisection? I
> assume 6.4?
Yes, I'm working to get all the mm selftests running (and ideally passing!) on
arm64. I was working on v6.4 and it was broken there. I went arbitrarily back to
v5.10 and it was working there, so bisected between them.
>
>>
>>
>> mlock-random-test:
>>
>> $ ./run_kselftest.sh -t mm:mlock-random-test
>> TAP version 13
>> 1..1
>> # selftests: mm: mlock-random-test
>> mlock() failure at |0xaaaaaaab52d0(131072)| mlock:|0xaaaaaaacc65d(26551)|
>> not ok 1 selftests: mm: mlock-random-test # exit=255
>>
>> This mallocs a buffer then loops 100 times, trying to mlock random parts of it. After this patch, the test fails after a variable number of iterations; mlock() returns ENOMEM. If I explicitly munlock at the end of each loop, it works.
>>
>>
>> mlock2-tests:
>>
>> $ ./run_kselftest.sh -t mm:mlock2-tests
>> TAP version 13
>> 1..1
>> # selftests: mm: mlock2-tests
>> munlock(): Cannot allocate memory
>> munlock(): Cannot allocate memory
>> not ok 1 selftests: mm: mlock2-tests # exit=2
>>
>> Here, a 3-page buffer is mlock2()ed, then the middle page is munlocked. Finally, the whole 3-page range is munlocked, and after this patch it fails with ENOMEM. If I modify the test to split the final munlock into 2, one for the first page and one for the last, the test passes.
>>
>>
>> Immediately prior to this patch (2286a6914c77 "mm: change mprotect_fixup to vma iterator"), both tests pass.
>>
>> From a quick scan of the man page, I don't think it explicitly says that it's ok to call mlock/munlock on already locked/unlocked pages, but it's certainly a change of behavior and the tests notice, so I'm guessing this wasn't intentional?
>>
>> I'm not familiar with this code so it's not obvious to me exactly what the problem is, but I'm hoping someone can help debug?
>
> I think I see the issue and I'm working on a fix. I appreciate the
> analysis and report, it really helps narrow things down.
You're welcome!
>
> Regards,
> Liam
* Ryan Roberts <[email protected]> [230711 10:09]:
> On 20/01/2023 16:26, Liam R. Howlett wrote:
> > From: "Liam R. Howlett" <[email protected]>
> >
> > Use the vma iterator so that the iterator can be invalidated or updated
> > to avoid each caller doing so.
>
> Hi,
Hello!
>
> I've bisected 2 mm selftest regressions back to this patch, so I'm hoping someone can help debug and fix. The failures are reproducible on x86_64 and arm64.
Thanks! That is a big help. Where did you start your bisection? I
assume 6.4?
>
>
> mlock-random-test:
>
> $ ./run_kselftest.sh -t mm:mlock-random-test
> TAP version 13
> 1..1
> # selftests: mm: mlock-random-test
> mlock() failure at |0xaaaaaaab52d0(131072)| mlock:|0xaaaaaaacc65d(26551)|
> not ok 1 selftests: mm: mlock-random-test # exit=255
>
> This mallocs a buffer then loops 100 times, trying to mlock random parts of it. After this patch, the test fails after a variable number of iterations; mlock() returns ENOMEM. If I explicitly munlock at the end of each loop, it works.
>
>
> mlock2-tests:
>
> $ ./run_kselftest.sh -t mm:mlock2-tests
> TAP version 13
> 1..1
> # selftests: mm: mlock2-tests
> munlock(): Cannot allocate memory
> munlock(): Cannot allocate memory
> not ok 1 selftests: mm: mlock2-tests # exit=2
>
> Here, a 3-page buffer is mlock2()ed, then the middle page is munlocked. Finally, the whole 3-page range is munlocked, and after this patch it fails with ENOMEM. If I modify the test to split the final munlock into 2, one for the first page and one for the last, the test passes.
>
>
> Immediately prior to this patch (2286a6914c77 "mm: change mprotect_fixup to vma iterator"), both tests pass.
>
> From a quick scan of the man page, I don't think it explicitly says that it's ok to call mlock/munlock on already locked/unlocked pages, but it's certainly a change of behavior and the tests notice, so I'm guessing this wasn't intentional?
>
> I'm not familiar with this code so it's not obvious to me exactly what the problem is, but I'm hoping someone can help debug?
I think I see the issue and I'm working on a fix. I appreciate the
analysis and report, it really helps narrow things down.
Regards,
Liam
* Ryan Roberts <[email protected]> [230711 11:30]:
> On 11/07/2023 16:27, Liam R. Howlett wrote:
> > * Ryan Roberts <[email protected]> [230711 10:09]:
> >> On 20/01/2023 16:26, Liam R. Howlett wrote:
> >>> From: "Liam R. Howlett" <[email protected]>
> >>>
> >>> Use the vma iterator so that the iterator can be invalidated or updated
> >>> to avoid each caller doing so.
> >>
> >> Hi,
> >
> >
> > Hello!
> >
> >>
> >> I've bisected 2 mm selftest regressions back to this patch, so I'm hoping someone can help debug and fix. The failures are reproducible on x86_64 and arm64.
> >
> > Thanks! That is a big help. Where did you start your bisection? I
> > assume 6.4?
>
> Yes, I'm working to get all the mm selftests running (and ideally passing!) on
> arm64. I was working on v6.4 and it was broken there. I went arbitrarily back to
> v5.10 and it was working there, so bisected between them.
>
Annoyingly, this is similar to another bug I had fixed in another
iterator across VMAs. It's the same pattern and I did go back to see if
I had broken other places but, obviously, I missed this one.
It's annoying enough that I'm trying to figure out a better way to do
this in general: a contiguous iterator of sorts, roughly along the
lines sketched below. I will add this to my maple tree work list [1].
[1] http://lists.infradead.org/pipermail/maple-tree/2023-July/002683.html
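As a strawman (hypothetical only; this API does not exist, it just
mirrors the shape of the existing for_each_vma_range() macro):

/*
 * A for_each_vma_range() variant that stops at the first hole, so
 * callers like apply_vma_lock_flags() cannot forget the contiguity
 * check.  'expect' must be primed with the first VMA's vm_start;
 * after the loop, expect < end means a gap (or no VMA) was hit.
 */
#define for_each_vma_range_contig(vmi, vma, end, expect)		\
	while (((vma) = vma_find(&(vmi), (end))) != NULL &&		\
	       (vma)->vm_start == (expect) &&				\
	       ((expect) = (vma)->vm_end, 1))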
Thanks,
Liam