2023-01-05 19:35:06

by Liam R. Howlett

Subject: [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust()

From: "Liam R. Howlett" <[email protected]>

Andrew,

This patch set does two things: 1. Cleans up the code, including
removing __vma_adjust(), and 2. Extends the VMA iterator API to provide
type safety for the VMA operations using the maple tree, as requested
by Linus [1].

It also addresses another usability issue raised by Linus: having to
modify the maple state within the loops. The maple state has been
replaced by the VMA iterator, and the iterator is now updated within
the MM code itself, so the caller no longer needs to do that work when
tree modifications occur.
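
As a rough sketch of the conversion pattern (helper names as used in
the patches below; simplified, not a complete example):

    /* Before: raw maple state, callers must pause/fix it up */
    MA_STATE(mas, &mm->mm_mt, 0, 0);
    mas_for_each(&mas, vma, ULONG_MAX) {
            /* tree modifications require mas_pause(&mas) here */
    }

    /* After: VMA iterator; the MM code keeps it consistent */
    VMA_ITERATOR(vmi, mm, 0);
    for_each_vma(vmi, vma) {
            /* helpers update vmi when the tree is modified */
    }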

This exposed a potential inconsistency between the iterator state and
what the user expects, so that inconsistency is addressed to keep the
VMA iterator safe for use after looping over a VMA range. This is
handled in patch 3 ("maple_tree: Reduce user error potential") and
patch 4 ("test_maple_tree: Test modifications while iterating").

While cleaning up the state handling, the duplicated locking code in
mm/mmap.c introduced by the maple tree has been addressed by
abstracting it into two functions: vma_prepare() and vma_complete().
These abstractions allowed for a much simpler __vma_adjust(), which
eventually leads to the removal of __vma_adjust() entirely by placing
the logic into vma_merge() itself.
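
Roughly, the pattern those helpers factor out looks like the following
(a simplified sketch based on the vma_shrink()/vma_expand() code later
in the series):

    struct vma_prepare vp;

    if (vma_iter_prealloc(vmi, vma))
            return -ENOMEM;

    init_vma_prep(&vp, vma);
    vma_adjust_trans_huge(vma, start, end, 0);
    vma_prepare(&vp);
    /* ... update vm_start/vm_end/vm_pgoff and the maple tree ... */
    vma_complete(&vp, vmi, vma->vm_mm);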

1. https://lore.kernel.org/linux-mm/CAHk-=wg9WQXBGkNdKD2bqocnN73rDswuWsavBB7T-tekykEn_A@mail.gmail.com/

Changes since v1:
- Changed the subject to better highlight the removal of __vma_adjust()
- Converted damon test code to use the maple tree functions as opposed
to vma_mas_store(). This added an extra patch to the series
- Wrap debug output in vma_iter_store() with DEBUG_VM_MAPLE_TREE config
option
- Fix comment in mm/rmap.c referencing __vma_adjust()

v1: https://lore.kernel.org/linux-mm/[email protected]/

Liam R. Howlett (44):
maple_tree: Add mas_init() function
maple_tree: Fix potential rcu issue
maple_tree: Reduce user error potential
test_maple_tree: Test modifications while iterating
mm: Expand vma iterator interface.
mm/mmap: convert brk to use vma iterator
kernel/fork: Convert forking to using the vmi iterator
mmap: Convert vma_link() vma iterator
mm/mmap: Remove preallocation from do_mas_align_munmap()
mmap: Change do_mas_munmap and do_mas_aligned_munmap() to use vma
iterator
mmap: Convert vma_expand() to use vma iterator
mm: Add temporary vma iterator versions of vma_merge(), split_vma(),
and __split_vma()
ipc/shm: Use the vma iterator for munmap calls
userfaultfd: Use vma iterator
mm: Change mprotect_fixup to vma iterator
mlock: Convert mlock to vma iterator
coredump: Convert to vma iterator
mempolicy: Convert to vma iterator
task_mmu: Convert to vma iterator
sched: Convert to vma iterator
madvise: Use vmi iterator for __split_vma() and vma_merge()
mmap: Pass through vmi iterator to __split_vma()
mmap: Use vmi version of vma_merge()
mm/mremap: Use vmi version of vma_merge()
mm: Switch vma_merge(), split_vma(), and __split_vma to vma iterator
mm/damon: Stop using vma_mas_store() for maple tree store
mmap: Convert __vma_adjust() to use vma iterator
mm: Pass through vma iterator to __vma_adjust()
madvise: Use split_vma() instead of __split_vma()
mm: Remove unnecessary write to vma iterator in __vma_adjust()
mm: Pass vma iterator through to __vma_adjust()
mm: Add vma iterator to vma_adjust() arguments
mmap: Clean up mmap_region() unrolling
mm: Change munmap splitting order and move_vma()
mm/mmap: move anon_vma setting in __vma_adjust()
mm/mmap: Refactor locking out of __vma_adjust()
mm/mmap: Use vma_prepare() and vma_complete() in vma_expand()
mm/mmap: Introduce init_vma_prep() and init_multi_vma_prep()
mm: Don't use __vma_adjust() in __split_vma()
mm/mmap: Don't use __vma_adjust() in shift_arg_pages()
mm/mmap: Introduce dup_vma_anon() helper
mm/mmap: Convert do_brk_flags() to use vma_prepare() and
vma_complete()
mm/mmap: Remove __vma_adjust()
vma_merge: Set vma iterator to correct position.

fs/coredump.c              |    8 +-
fs/exec.c                  |   16 +-
fs/proc/task_mmu.c         |   14 +-
fs/userfaultfd.c           |   88 ++-
include/linux/maple_tree.h |   11 +
include/linux/mm.h         |   87 ++-
include/linux/mm_types.h   |    4 +-
ipc/shm.c                  |   11 +-
kernel/events/uprobes.c    |    2 +-
kernel/fork.c              |   19 +-
kernel/sched/fair.c        |   14 +-
lib/maple_tree.c           |   12 +-
lib/test_maple_tree.c      |   72 +++
mm/damon/vaddr-test.h      |    6 +-
mm/filemap.c               |    2 +-
mm/internal.h              |   13 +
mm/madvise.c               |   13 +-
mm/mempolicy.c             |   25 +-
mm/mlock.c                 |   57 +-
mm/mmap.c                  | 1076 ++++++++++++++++++------------------
mm/mprotect.c              |   47 +-
mm/mremap.c                |   42 +-
mm/rmap.c                  |   15 +-
23 files changed, 876 insertions(+), 778 deletions(-)

--
2.35.1


2023-01-05 19:35:24

by Liam R. Howlett

Subject: [PATCH v2 24/44] mm/mremap: Use vmi version of vma_merge()

From: "Liam R. Howlett" <[email protected]>

Use the vma iterator so that the iterator can be invalidated or
updated internally, avoiding the need for each caller to do so.

Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mremap.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 94d2590f0871..4364daaf0e83 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1018,6 +1018,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
unsigned long extension_end = addr + new_len;
pgoff_t extension_pgoff = vma->vm_pgoff +
((extension_start - vma->vm_start) >> PAGE_SHIFT);
+ VMA_ITERATOR(vmi, mm, extension_start);

if (vma->vm_flags & VM_ACCOUNT) {
if (security_vm_enough_memory_mm(mm, pages)) {
@@ -1033,10 +1034,10 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
* with the next vma if it becomes adjacent to the expanded vma and
* otherwise compatible.
*/
- vma = vma_merge(mm, vma, extension_start, extension_end,
- vma->vm_flags, vma->anon_vma, vma->vm_file,
- extension_pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ vma = vmi_vma_merge(&vmi, mm, vma, extension_start,
+ extension_end, vma->vm_flags, vma->anon_vma,
+ vma->vm_file, extension_pgoff, vma_policy(vma),
+ vma->vm_userfaultfd_ctx, anon_vma_name(vma));
if (!vma) {
vm_unacct_memory(pages);
ret = -ENOMEM;
--
2.35.1

2023-01-05 19:35:51

by Liam R. Howlett

Subject: [PATCH v2 28/44] mm: Pass through vma iterator to __vma_adjust()

From: "Liam R. Howlett" <[email protected]>

Pass the vma iterator through to __vma_adjust() so the state can be
updated.

Signed-off-by: Liam R. Howlett <[email protected]>
---
include/linux/mm.h |  6 ++++--
mm/mmap.c          | 31 +++++++++++++++----------------
2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 28973a3941a4..294894969cd9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2822,13 +2822,15 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);

/* mmap.c */
extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
-extern int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
+extern int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
struct vm_area_struct *expand);
static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert)
{
- return __vma_adjust(vma, start, end, pgoff, insert, NULL);
+ VMA_ITERATOR(vmi, vma->vm_mm, start);
+
+ return __vma_adjust(&vmi, vma, start, end, pgoff, insert, NULL);
}
extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
diff --git a/mm/mmap.c b/mm/mmap.c
index a898ae2a57d5..a4e564163334 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -638,9 +638,9 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
* are necessary. The "insert" vma (if any) is to be inserted
* before we drop the necessary locks.
*/
-int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
- struct vm_area_struct *expand)
+int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end, pgoff_t pgoff,
+ struct vm_area_struct *insert, struct vm_area_struct *expand)
{
struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct *next_next = NULL; /* uninit var warning */
@@ -653,7 +653,6 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
bool vma_changed = false;
long adjust_next = 0;
int remove_next = 0;
- VMA_ITERATOR(vmi, mm, 0);
struct vm_area_struct *exporter = NULL, *importer = NULL;

if (next && !insert) {
@@ -738,7 +737,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
}
}

- if (vma_iter_prealloc(&vmi, vma))
+ if (vma_iter_prealloc(vmi, vma))
return -ENOMEM;

vma_adjust_trans_huge(orig_vma, start, end, adjust_next);
@@ -784,7 +783,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
if (start != vma->vm_start) {
if ((vma->vm_start < start) &&
(!insert || (insert->vm_end != start))) {
- vma_iter_clear(&vmi, vma->vm_start, start);
+ vma_iter_clear(vmi, vma->vm_start, start);
VM_WARN_ON(insert && insert->vm_start > vma->vm_start);
} else {
vma_changed = true;
@@ -794,8 +793,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
if (end != vma->vm_end) {
if (vma->vm_end > end) {
if (!insert || (insert->vm_start != end)) {
- vma_iter_clear(&vmi, end, vma->vm_end);
- vma_iter_set(&vmi, vma->vm_end);
+ vma_iter_clear(vmi, end, vma->vm_end);
+ vma_iter_set(vmi, vma->vm_end);
VM_WARN_ON(insert &&
insert->vm_end < vma->vm_end);
}
@@ -806,13 +805,13 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
}

if (vma_changed)
- vma_iter_store(&vmi, vma);
+ vma_iter_store(vmi, vma);

vma->vm_pgoff = pgoff;
if (adjust_next) {
next->vm_start += adjust_next;
next->vm_pgoff += adjust_next >> PAGE_SHIFT;
- vma_iter_store(&vmi, next);
+ vma_iter_store(vmi, next);
}

if (file) {
@@ -832,7 +831,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
* us to insert it before dropping the locks
* (it may either follow vma or precede it).
*/
- vma_iter_store(&vmi, insert);
+ vma_iter_store(vmi, insert);
mm->map_count++;
}

@@ -878,7 +877,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
if (insert && file)
uprobe_mmap(insert);

- vma_iter_free(&vmi);
+ vma_iter_free(vmi);
validate_mm(mm);

return 0;
@@ -1072,20 +1071,20 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
if (merge_prev && merge_next &&
is_mergeable_anon_vma(prev->anon_vma,
next->anon_vma, NULL)) { /* cases 1, 6 */
- err = __vma_adjust(prev, prev->vm_start,
+ err = __vma_adjust(vmi, prev, prev->vm_start,
next->vm_end, prev->vm_pgoff, NULL,
prev);
res = prev;
} else if (merge_prev) { /* cases 2, 5, 7 */
- err = __vma_adjust(prev, prev->vm_start,
+ err = __vma_adjust(vmi, prev, prev->vm_start,
end, prev->vm_pgoff, NULL, prev);
res = prev;
} else if (merge_next) {
if (prev && addr < prev->vm_end) /* case 4 */
- err = __vma_adjust(prev, prev->vm_start,
+ err = __vma_adjust(vmi, prev, prev->vm_start,
addr, prev->vm_pgoff, NULL, next);
else /* cases 3, 8 */
- err = __vma_adjust(mid, addr, next->vm_end,
+ err = __vma_adjust(vmi, mid, addr, next->vm_end,
next->vm_pgoff - pglen, NULL, next);
res = next;
}
--
2.35.1

2023-01-05 19:35:58

by Liam R. Howlett

Subject: [PATCH v2 25/44] mm: Switch vma_merge(), split_vma(), and __split_vma to vma iterator

From: "Liam R. Howlett" <[email protected]>

Drop the vmi_* functions and transition all users to use the vma
iterator directly.
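
For example, call sites move from the temporary wrappers to the
iterator-taking functions directly:

    -   prev = vmi_vma_merge(&vmi, mm, prev, start, end, ...);
    +   prev = vma_merge(&vmi, mm, prev, start, end, ...);

    -   ret = vmi_split_vma(&vmi, mm, vma, start, 1);
    +   ret = split_vma(&vmi, vma, start, 1);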

Signed-off-by: Liam R. Howlett <[email protected]>
---
fs/userfaultfd.c   | 14 ++++----
include/linux/mm.h | 16 +++-------
mm/madvise.c       |  6 ++--
mm/mempolicy.c     |  6 ++--
mm/mlock.c         |  6 ++--
mm/mmap.c          | 79 +++++++++++++---------------------------
mm/mprotect.c      |  6 ++--
mm/mremap.c        |  2 +-
8 files changed, 47 insertions(+), 88 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index b3249388696a..e60f86d6b91c 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -883,7 +883,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
continue;
}
new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
- prev = vmi_vma_merge(&vmi, mm, prev, vma->vm_start, vma->vm_end,
+ prev = vma_merge(&vmi, mm, prev, vma->vm_start, vma->vm_end,
new_flags, vma->anon_vma,
vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
@@ -1426,7 +1426,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
vma_end = min(end, vma->vm_end);

new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
- prev = vmi_vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
+ prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
((struct vm_userfaultfd_ctx){ ctx }),
@@ -1437,12 +1437,12 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
goto next;
}
if (vma->vm_start < start) {
- ret = vmi_split_vma(&vmi, mm, vma, start, 1);
+ ret = split_vma(&vmi, vma, start, 1);
if (ret)
break;
}
if (vma->vm_end > end) {
- ret = vmi_split_vma(&vmi, mm, vma, end, 0);
+ ret = split_vma(&vmi, vma, end, 0);
if (ret)
break;
}
@@ -1606,7 +1606,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
uffd_wp_range(mm, vma, start, vma_end - start, false);

new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
- prev = vmi_vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
+ prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
NULL_VM_UFFD_CTX, anon_vma_name(vma));
@@ -1615,13 +1615,13 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
goto next;
}
if (vma->vm_start < start) {
- ret = vmi_split_vma(&vmi, mm, vma, start, 1);
+ ret = split_vma(&vmi, vma, start, 1);
if (ret)
break;
}
if (vma->vm_end > end) {
vma_iter_set(&vmi, vma->vm_end);
- ret = vmi_split_vma(&vmi, mm, vma, end, 0);
+ ret = split_vma(&vmi, vma, end, 0);
if (ret)
break;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 98c91a25d257..71474615b4ab 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2830,22 +2830,16 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
{
return __vma_adjust(vma, start, end, pgoff, insert, NULL);
}
-extern struct vm_area_struct *vma_merge(struct mm_struct *,
- struct vm_area_struct *prev, unsigned long addr, unsigned long end,
- unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
- struct mempolicy *, struct vm_userfaultfd_ctx, struct anon_vma_name *);
-extern struct vm_area_struct *vmi_vma_merge(struct vma_iterator *vmi,
+extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
unsigned long end, unsigned long vm_flags, struct anon_vma *,
struct file *, pgoff_t, struct mempolicy *, struct vm_userfaultfd_ctx,
struct anon_vma_name *);
extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
-extern int vmi__split_vma(struct vma_iterator *vmi, struct mm_struct *,
- struct vm_area_struct *, unsigned long addr, int new_below);
-extern int split_vma(struct mm_struct *, struct vm_area_struct *,
- unsigned long addr, int new_below);
-extern int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *,
- struct vm_area_struct *, unsigned long addr, int new_below);
+extern int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
+ unsigned long addr, int new_below);
+extern int split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
+ unsigned long addr, int new_below);
extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
extern void unlink_file_vma(struct vm_area_struct *);
extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
diff --git a/mm/madvise.c b/mm/madvise.c
index 4ee85b85806a..4115516f58dd 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -150,7 +150,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
}

pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- *prev = vmi_vma_merge(&vmi, mm, *prev, start, end, new_flags,
+ *prev = vma_merge(&vmi, mm, *prev, start, end, new_flags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_name);
if (*prev) {
@@ -163,7 +163,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
if (start != vma->vm_start) {
if (unlikely(mm->map_count >= sysctl_max_map_count))
return -ENOMEM;
- error = vmi__split_vma(&vmi, mm, vma, start, 1);
+ error = __split_vma(&vmi, vma, start, 1);
if (error)
return error;
}
@@ -171,7 +171,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
if (end != vma->vm_end) {
if (unlikely(mm->map_count >= sysctl_max_map_count))
return -ENOMEM;
- error = vmi__split_vma(&vmi, mm, vma, end, 0);
+ error = __split_vma(&vmi, vma, end, 0);
if (error)
return error;
}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 6f41a30c24d5..171525b0c7a8 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -810,7 +810,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,

pgoff = vma->vm_pgoff +
((vmstart - vma->vm_start) >> PAGE_SHIFT);
- prev = vmi_vma_merge(&vmi, mm, prev, vmstart, vmend, vma->vm_flags,
+ prev = vma_merge(&vmi, mm, prev, vmstart, vmend, vma->vm_flags,
vma->anon_vma, vma->vm_file, pgoff,
new_pol, vma->vm_userfaultfd_ctx,
anon_vma_name(vma));
@@ -819,12 +819,12 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
goto replace;
}
if (vma->vm_start != vmstart) {
- err = vmi_split_vma(&vmi, vma->vm_mm, vma, vmstart, 1);
+ err = split_vma(&vmi, vma, vmstart, 1);
if (err)
goto out;
}
if (vma->vm_end != vmend) {
- err = vmi_split_vma(&vmi, vma->vm_mm, vma, vmend, 0);
+ err = split_vma(&vmi, vma, vmend, 0);
if (err)
goto out;
}
diff --git a/mm/mlock.c b/mm/mlock.c
index f06b02b631b5..393cddee2f06 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -418,7 +418,7 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
goto out;

pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- *prev = vmi_vma_merge(vmi, mm, *prev, start, end, newflags,
+ *prev = vma_merge(vmi, mm, *prev, start, end, newflags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_vma_name(vma));
if (*prev) {
@@ -427,13 +427,13 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
}

if (start != vma->vm_start) {
- ret = vmi_split_vma(vmi, mm, vma, start, 1);
+ ret = split_vma(vmi, vma, start, 1);
if (ret)
goto out;
}

if (end != vma->vm_end) {
- ret = vmi_split_vma(vmi, mm, vma, end, 0);
+ ret = split_vma(vmi, vma, end, 0);
if (ret)
goto out;
}
diff --git a/mm/mmap.c b/mm/mmap.c
index 579d586e4e6a..8e7f4fc36960 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1072,7 +1072,7 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
* parameter) may establish ptes with the wrong permissions of NNNN
* instead of the right permissions of XXXX.
*/
-struct vm_area_struct *vma_merge(struct mm_struct *mm,
+struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
struct vm_area_struct *prev, unsigned long addr,
unsigned long end, unsigned long vm_flags,
struct anon_vma *anon_vma, struct file *file,
@@ -1081,7 +1081,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
struct anon_vma_name *anon_name)
{
pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
- struct vm_area_struct *mid, *next, *res;
+ struct vm_area_struct *mid, *next, *res = NULL;
int err = -1;
bool merge_prev = false;
bool merge_next = false;
@@ -1147,26 +1147,11 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
if (err)
return NULL;
khugepaged_enter_vma(res, vm_flags);
- return res;
-}

-struct vm_area_struct *vmi_vma_merge(struct vma_iterator *vmi,
- struct mm_struct *mm,
- struct vm_area_struct *prev, unsigned long addr,
- unsigned long end, unsigned long vm_flags,
- struct anon_vma *anon_vma, struct file *file,
- pgoff_t pgoff, struct mempolicy *policy,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name)
-{
- struct vm_area_struct *tmp;
-
- tmp = vma_merge(mm, prev, addr, end, vm_flags, anon_vma, file, pgoff,
- policy, vm_userfaultfd_ctx, anon_name);
- if (tmp)
+ if (res)
vma_iter_set(vmi, end);

- return tmp;
+ return res;
}

/*
@@ -2286,12 +2271,14 @@ static void unmap_region(struct mm_struct *mm, struct maple_tree *mt,
* __split_vma() bypasses sysctl_max_map_count checking. We use this where it
* has already been checked or doesn't make sense to fail.
*/
-int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
+int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long addr, int new_below)
{
struct vm_area_struct *new;
int err;
- validate_mm_mt(mm);
+ unsigned long end = vma->vm_end;
+
+ validate_mm_mt(vma->vm_mm);

if (vma->vm_ops && vma->vm_ops->may_split) {
err = vma->vm_ops->may_split(vma, addr);
@@ -2331,8 +2318,10 @@ int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
err = vma_adjust(vma, vma->vm_start, addr, vma->vm_pgoff, new);

/* Success. */
- if (!err)
+ if (!err) {
+ vma_iter_set(vmi, end);
return 0;
+ }

/* Avoid vm accounting in close() operation */
new->vm_start = new->vm_end;
@@ -2347,46 +2336,21 @@ int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
mpol_put(vma_policy(new));
out_free_vma:
vm_area_free(new);
- validate_mm_mt(mm);
+ validate_mm_mt(vma->vm_mm);
return err;
}
-int vmi__split_vma(struct vma_iterator *vmi, struct mm_struct *mm,
- struct vm_area_struct *vma, unsigned long addr, int new_below)
-{
- int ret;
- unsigned long end = vma->vm_end;
-
- ret = __split_vma(mm, vma, addr, new_below);
- if (!ret)
- vma_iter_set(vmi, end);
-
- return ret;
-}

/*
* Split a vma into two pieces at address 'addr', a new vma is allocated
* either for the first part or the tail.
*/
-int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
+int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long addr, int new_below)
{
- if (mm->map_count >= sysctl_max_map_count)
+ if (vma->vm_mm->map_count >= sysctl_max_map_count)
return -ENOMEM;

- return __split_vma(mm, vma, addr, new_below);
-}
-
-int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *mm,
- struct vm_area_struct *vma, unsigned long addr, int new_below)
-{
- int ret;
- unsigned long end = vma->vm_end;
-
- ret = split_vma(mm, vma, addr, new_below);
- if (!ret)
- vma_iter_set(vmi, end);
-
- return ret;
+ return __split_vma(vmi, vma, addr, new_below);
}

static inline int munmap_sidetree(struct vm_area_struct *vma,
@@ -2446,7 +2410,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
goto map_count_exceeded;

- error = vmi__split_vma(vmi, mm, vma, start, 0);
+ error = __split_vma(vmi, vma, start, 0);
if (error)
goto start_split_failed;

@@ -2467,7 +2431,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (next->vm_end > end) {
struct vm_area_struct *split;

- error = vmi__split_vma(vmi, mm, next, end, 1);
+ error = __split_vma(vmi, next, end, 1);
if (error)
goto end_split_failed;

@@ -2748,9 +2712,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
* vma again as we may succeed this time.
*/
if (unlikely(vm_flags != vma->vm_flags && prev)) {
- merge = vmi_vma_merge(&vmi, mm, prev, vma->vm_start,
- vma->vm_end, vma->vm_flags, NULL, vma->vm_file,
- vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
+ merge = vma_merge(&vmi, mm, prev, vma->vm_start,
+ vma->vm_end, vma->vm_flags, NULL,
+ vma->vm_file, vma->vm_pgoff, NULL,
+ NULL_VM_UFFD_CTX, NULL);
if (merge) {
/*
* ->mmap() can change vma->vm_file and fput
@@ -3297,7 +3262,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
if (new_vma && new_vma->vm_start < addr + len)
return NULL; /* should never get here */

- new_vma = vmi_vma_merge(&vmi, mm, prev, addr, addr + len, vma->vm_flags,
+ new_vma = vma_merge(&vmi, mm, prev, addr, addr + len, vma->vm_flags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_vma_name(vma));
if (new_vma) {
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 7e6cb2165000..057b7e3e93bb 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -605,7 +605,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
* First try to merge with previous and/or next vma.
*/
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- *pprev = vmi_vma_merge(vmi, mm, *pprev, start, end, newflags,
+ *pprev = vma_merge(vmi, mm, *pprev, start, end, newflags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_vma_name(vma));
if (*pprev) {
@@ -617,13 +617,13 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
*pprev = vma;

if (start != vma->vm_start) {
- error = vmi_split_vma(vmi, mm, vma, start, 1);
+ error = split_vma(vmi, vma, start, 1);
if (error)
goto fail;
}

if (end != vma->vm_end) {
- error = vmi_split_vma(vmi, mm, vma, end, 0);
+ error = split_vma(vmi, vma, end, 0);
if (error)
goto fail;
}
diff --git a/mm/mremap.c b/mm/mremap.c
index 4364daaf0e83..00845aec5441 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1034,7 +1034,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
* with the next vma if it becomes adjacent to the expanded vma and
* otherwise compatible.
*/
- vma = vmi_vma_merge(&vmi, mm, vma, extension_start,
+ vma = vma_merge(&vmi, mm, vma, extension_start,
extension_end, vma->vm_flags, vma->anon_vma,
vma->vm_file, extension_pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_vma_name(vma));
--
2.35.1

2023-01-05 19:36:26

by Liam R. Howlett

Subject: [PATCH v2 18/44] mempolicy: Convert to vma iterator

From: "Liam R. Howlett" <[email protected]>

Use the vma iterator so that the iterator can be invalidated or
updated internally, avoiding the need for each caller to do so.
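
The converted range walkers share a common shape (sketch):

    VMA_ITERATOR(vmi, mm, start);

    prev = vma_prev(&vmi);          /* vma before start, if any */
    vma = vma_find(&vmi, end);      /* first vma in [start, end) */
    ...
    do {
            ...
    } for_each_vma_range(vmi, vma, end);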

Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mempolicy.c | 25 ++++++++-----------------
1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 02c8a712282f..6f41a30c24d5 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -787,24 +787,21 @@ static int vma_replace_policy(struct vm_area_struct *vma,
static int mbind_range(struct mm_struct *mm, unsigned long start,
unsigned long end, struct mempolicy *new_pol)
{
- MA_STATE(mas, &mm->mm_mt, start, start);
+ VMA_ITERATOR(vmi, mm, start);
struct vm_area_struct *prev;
struct vm_area_struct *vma;
int err = 0;
pgoff_t pgoff;

- prev = mas_prev(&mas, 0);
- if (unlikely(!prev))
- mas_set(&mas, start);
-
- vma = mas_find(&mas, end - 1);
+ prev = vma_prev(&vmi);
+ vma = vma_find(&vmi, end);
if (WARN_ON(!vma))
return 0;

if (start > vma->vm_start)
prev = vma;

- for (; vma; vma = mas_next(&mas, end - 1)) {
+ do {
unsigned long vmstart = max(start, vma->vm_start);
unsigned long vmend = min(end, vma->vm_end);

@@ -813,29 +810,23 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,

pgoff = vma->vm_pgoff +
((vmstart - vma->vm_start) >> PAGE_SHIFT);
- prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags,
+ prev = vmi_vma_merge(&vmi, mm, prev, vmstart, vmend, vma->vm_flags,
vma->anon_vma, vma->vm_file, pgoff,
new_pol, vma->vm_userfaultfd_ctx,
anon_vma_name(vma));
if (prev) {
- /* vma_merge() invalidated the mas */
- mas_pause(&mas);
vma = prev;
goto replace;
}
if (vma->vm_start != vmstart) {
- err = split_vma(vma->vm_mm, vma, vmstart, 1);
+ err = vmi_split_vma(&vmi, vma->vm_mm, vma, vmstart, 1);
if (err)
goto out;
- /* split_vma() invalidated the mas */
- mas_pause(&mas);
}
if (vma->vm_end != vmend) {
- err = split_vma(vma->vm_mm, vma, vmend, 0);
+ err = vmi_split_vma(&vmi, vma->vm_mm, vma, vmend, 0);
if (err)
goto out;
- /* split_vma() invalidated the mas */
- mas_pause(&mas);
}
replace:
err = vma_replace_policy(vma, new_pol);
@@ -843,7 +834,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
goto out;
next:
prev = vma;
- }
+ } for_each_vma_range(vmi, vma, end);

out:
return err;
--
2.35.1

2023-01-05 19:37:27

by Liam R. Howlett

Subject: [PATCH v2 17/44] coredump: Convert to vma iterator

From: "Liam R. Howlett" <[email protected]>

Use the vma iterator so that the iterator can be invalidated or
updated internally, avoiding the need for each caller to do so.

Signed-off-by: Liam R. Howlett <[email protected]>
---
fs/coredump.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index de78bde2991b..f27d734f3102 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -1111,14 +1111,14 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
* Helper function for iterating across a vma list. It ensures that the caller
* will visit `gate_vma' prior to terminating the search.
*/
-static struct vm_area_struct *coredump_next_vma(struct ma_state *mas,
+static struct vm_area_struct *coredump_next_vma(struct vma_iterator *vmi,
struct vm_area_struct *vma,
struct vm_area_struct *gate_vma)
{
if (gate_vma && (vma == gate_vma))
return NULL;

- vma = mas_next(mas, ULONG_MAX);
+ vma = vma_next(vmi);
if (vma)
return vma;
return gate_vma;
@@ -1146,7 +1146,7 @@ static bool dump_vma_snapshot(struct coredump_params *cprm)
{
struct vm_area_struct *gate_vma, *vma = NULL;
struct mm_struct *mm = current->mm;
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ VMA_ITERATOR(vmi, mm, 0);
int i = 0;

/*
@@ -1167,7 +1167,7 @@ static bool dump_vma_snapshot(struct coredump_params *cprm)
return false;
}

- while ((vma = coredump_next_vma(&mas, vma, gate_vma)) != NULL) {
+ while ((vma = coredump_next_vma(&vmi, vma, gate_vma)) != NULL) {
struct core_vma_metadata *m = cprm->vma_meta + i;

m->start = vma->vm_start;
--
2.35.1

2023-01-05 19:37:30

by Liam R. Howlett

Subject: [PATCH v2 07/44] kernel/fork: Convert forking to using the vmi iterator

From: "Liam R. Howlett" <[email protected]>

Avoid using the maple tree interface directly. This gains type safety.
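
The flow of the conversion in dup_mmap() (sketch): preallocate maple
nodes in bulk for the expected number of stores, store each duplicated
vma, and free any unused preallocations on exit.

    retval = vma_iter_bulk_alloc(&vmi, oldmm->map_count);
    if (retval)
            goto out;

    for_each_vma(old_vmi, mpnt) {
            /* ... duplicate mpnt into tmp ... */
            if (vma_iter_bulk_store(&vmi, tmp))
                    goto fail_nomem_vmi_store;
    }
    ...
    vma_iter_free(&vmi);    /* also releases unused preallocations */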

Signed-off-by: Liam R. Howlett <[email protected]>
---
kernel/fork.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 9f7fe3541897..441dcec60aae 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -585,8 +585,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
int retval;
unsigned long charge = 0;
LIST_HEAD(uf);
- MA_STATE(old_mas, &oldmm->mm_mt, 0, 0);
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ VMA_ITERATOR(old_vmi, oldmm, 0);
+ VMA_ITERATOR(vmi, mm, 0);

uprobe_start_dup_mmap();
if (mmap_write_lock_killable(oldmm)) {
@@ -613,11 +613,11 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
goto out;
khugepaged_fork(mm, oldmm);

- retval = mas_expected_entries(&mas, oldmm->map_count);
+ retval = vma_iter_bulk_alloc(&vmi, oldmm->map_count);
if (retval)
goto out;

- mas_for_each(&old_mas, mpnt, ULONG_MAX) {
+ for_each_vma(old_vmi, mpnt) {
struct file *file;

if (mpnt->vm_flags & VM_DONTCOPY) {
@@ -683,11 +683,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
hugetlb_dup_vma_private(tmp);

/* Link the vma into the MT */
- mas.index = tmp->vm_start;
- mas.last = tmp->vm_end - 1;
- mas_store(&mas, tmp);
- if (mas_is_err(&mas))
- goto fail_nomem_mas_store;
+ if (vma_iter_bulk_store(&vmi, tmp))
+ goto fail_nomem_vmi_store;

mm->map_count++;
if (!(tmp->vm_flags & VM_WIPEONFORK))
@@ -702,7 +699,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
/* a new mm has just been created */
retval = arch_dup_mmap(oldmm, mm);
loop_out:
- mas_destroy(&mas);
+ vma_iter_free(&vmi);
out:
mmap_write_unlock(mm);
flush_tlb_mm(oldmm);
@@ -712,7 +709,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
uprobe_end_dup_mmap();
return retval;

-fail_nomem_mas_store:
+fail_nomem_vmi_store:
unlink_anon_vmas(tmp);
fail_nomem_anon_vma_fork:
mpol_put(vma_policy(tmp));
--
2.35.1

2023-01-05 19:37:31

by Liam R. Howlett

Subject: [PATCH v2 31/44] mm: Pass vma iterator through to __vma_adjust()

From: "Liam R. Howlett" <[email protected]>

Pass the iterator through for use in __vma_adjust(). The iterator
state needs to be correct for the operation that will occur, so make
the necessary adjustments.
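
For instance, __split_vma() now leaves the iterator pointing at the
end VMA, so do_vmi_align_munmap() can reload it instead of re-walking
(sketch):

    error = __split_vma(vmi, vma, start, 0);
    if (error)
            goto start_split_failed;

    vma = vma_iter_load(vmi);   /* no vma_iter_set() + vma_find() */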

Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 174cbf25251f..c10ab873b8e4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -587,6 +587,10 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
vma_interval_tree_remove(vma, root);
}

+ /* VMA iterator points to previous, so set to start if necessary */
+ if (vma_iter_addr(vmi) != start)
+ vma_iter_set(vmi, start);
+
vma->vm_start = start;
vma->vm_end = end;
vma->vm_pgoff = pgoff;
@@ -2222,13 +2226,13 @@ static void unmap_region(struct mm_struct *mm, struct maple_tree *mt,
/*
* __split_vma() bypasses sysctl_max_map_count checking. We use this where it
* has already been checked or doesn't make sense to fail.
+ * VMA Iterator will point to the end VMA.
*/
int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long addr, int new_below)
{
struct vm_area_struct *new;
int err;
- unsigned long end = vma->vm_end;

validate_mm_mt(vma->vm_mm);

@@ -2264,14 +2268,17 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
new->vm_ops->open(new);

if (new_below)
- err = vma_adjust(vma, addr, vma->vm_end, vma->vm_pgoff +
- ((addr - new->vm_start) >> PAGE_SHIFT), new);
+ err = __vma_adjust(vmi, vma, addr, vma->vm_end,
+ vma->vm_pgoff + ((addr - new->vm_start) >> PAGE_SHIFT),
+ new, NULL);
else
- err = vma_adjust(vma, vma->vm_start, addr, vma->vm_pgoff, new);
+ err = __vma_adjust(vmi, vma, vma->vm_start, addr, vma->vm_pgoff,
+ new, NULL);

/* Success. */
if (!err) {
- vma_iter_set(vmi, end);
+ if (new_below)
+ vma_next(vmi);
return 0;
}

@@ -2366,8 +2373,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (error)
goto start_split_failed;

- vma_iter_set(vmi, start);
- vma = vma_find(vmi, end);
+ vma = vma_iter_load(vmi);
}

prev = vma_prev(vmi);
@@ -2387,7 +2393,6 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (error)
goto end_split_failed;

- vma_iter_set(vmi, end);
split = vma_prev(vmi);
error = munmap_sidetree(split, &mas_detach);
if (error)
@@ -2631,6 +2636,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
goto unacct_error;
}

+ vma_iter_set(&vmi, addr);
vma->vm_start = addr;
vma->vm_end = end;
vma->vm_flags = vm_flags;
--
2.35.1

2023-01-05 19:37:45

by Liam R. Howlett

Subject: [PATCH v2 23/44] mmap: Use vmi version of vma_merge()

From: "Liam R. Howlett" <[email protected]>

Use the vma iterator so that the iterator can be invalidated or
updated internally, avoiding the need for each caller to do so.

Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 80f12fcf158c..579d586e4e6a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2748,8 +2748,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
* vma again as we may succeed this time.
*/
if (unlikely(vm_flags != vma->vm_flags && prev)) {
- merge = vma_merge(mm, prev, vma->vm_start, vma->vm_end, vma->vm_flags,
- NULL, vma->vm_file, vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
+ merge = vmi_vma_merge(&vmi, mm, prev, vma->vm_start,
+ vma->vm_end, vma->vm_flags, NULL, vma->vm_file,
+ vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
if (merge) {
/*
* ->mmap() can change vma->vm_file and fput
@@ -3280,6 +3281,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct *new_vma, *prev;
bool faulted_in_anon_vma = true;
+ VMA_ITERATOR(vmi, mm, addr);

validate_mm_mt(mm);
/*
@@ -3295,7 +3297,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
if (new_vma && new_vma->vm_start < addr + len)
return NULL; /* should never get here */

- new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
+ new_vma = vmi_vma_merge(&vmi, mm, prev, addr, addr + len, vma->vm_flags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_vma_name(vma));
if (new_vma) {
--
2.35.1

2023-01-05 19:37:56

by Liam R. Howlett

Subject: [PATCH v2 11/44] mmap: Convert vma_expand() to use vma iterator

From: "Liam R. Howlett" <[email protected]>

Use the vma iterator instead of the maple state for type safety and
for consistency throughout the mm code.

Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 41767c585120..8fd48686f708 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -586,7 +586,7 @@ static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
*
* Returns: 0 on success
*/
-inline int vma_expand(struct ma_state *mas, struct vm_area_struct *vma,
+inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long start, unsigned long end, pgoff_t pgoff,
struct vm_area_struct *next)
{
@@ -615,7 +615,7 @@ inline int vma_expand(struct ma_state *mas, struct vm_area_struct *vma,
/* Only handles expanding */
VM_BUG_ON(vma->vm_start < start || vma->vm_end > end);

- if (mas_preallocate(mas, vma, GFP_KERNEL))
+ if (vma_iter_prealloc(vmi, vma))
goto nomem;

vma_adjust_trans_huge(vma, start, end, 0);
@@ -640,8 +640,7 @@ inline int vma_expand(struct ma_state *mas, struct vm_area_struct *vma,
vma->vm_start = start;
vma->vm_end = end;
vma->vm_pgoff = pgoff;
- /* Note: mas must be pointing to the expanding VMA */
- vma_mas_store(vma, mas);
+ vma_iter_store(vmi, vma);

if (file) {
vma_interval_tree_insert(vma, root);
@@ -2655,7 +2654,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,

/* Actually expand, if possible */
if (vma &&
- !vma_expand(&vmi.mas, vma, merge_start, merge_end, vm_pgoff, next)) {
+ !vma_expand(&vmi, vma, merge_start, merge_end, vm_pgoff, next)) {
khugepaged_enter_vma(vma, vm_flags);
goto expanded;
}
--
2.35.1

2023-01-05 19:45:02

by Liam R. Howlett

Subject: [PATCH v2 41/44] mm/mmap: Introduce dup_vma_anon() helper

From: "Liam R. Howlett" <[email protected]>

Create a helper for duplicating the anon vma when adjusting the vma.
This simplifies the logic of __vma_adjust().

Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 74 ++++++++++++++++++++++++++++++-------------------------
1 file changed, 40 insertions(+), 34 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index dad5c0113380..1e9b8eb00d45 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -679,6 +679,29 @@ static inline void vma_complete(struct vma_prepare *vp,
uprobe_mmap(vp->insert);
}

+/*
+ * dup_anon_vma() - Helper function to duplicate anon_vma
+ * @dst: The destination VMA
+ * @src: The source VMA
+ *
+ * Returns: 0 on success.
+ */
+static inline int dup_anon_vma(struct vm_area_struct *dst,
+ struct vm_area_struct *src)
+{
+ /*
+ * Easily overlooked: when mprotect shifts the boundary, make sure the
+ * expanding vma has anon_vma set if the shrinking vma had, to cover any
+ * anon pages imported.
+ */
+ if (src->anon_vma && !dst->anon_vma) {
+ dst->anon_vma = src->anon_vma;
+ return anon_vma_clone(dst, src);
+ }
+
+ return 0;
+}
+
/*
* vma_expand - Expand an existing VMA
*
@@ -704,15 +727,12 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
struct vma_prepare vp;

if (next && (vma != next) && (end == next->vm_end)) {
- remove_next = true;
- if (next->anon_vma && !vma->anon_vma) {
- int error;
+ int ret;

- vma->anon_vma = next->anon_vma;
- error = anon_vma_clone(vma, next);
- if (error)
- return error;
- }
+ remove_next = true;
+ ret = dup_anon_vma(vma, next);
+ if (ret)
+ return ret;
}

init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL);
@@ -801,10 +821,11 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
struct file *file = vma->vm_file;
bool vma_changed = false;
long adjust_next = 0;
- struct vm_area_struct *exporter = NULL, *importer = NULL;
struct vma_prepare vma_prep;

if (next) {
+ int error = 0;
+
if (end >= next->vm_end) {
/*
* vma expands, overlapping all the next, and
@@ -839,15 +860,14 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
end != remove2->vm_end);
}

- exporter = next;
- importer = vma;
-
/*
* If next doesn't have anon_vma, import from vma after
* next, if the vma overlaps with it.
*/
- if (remove2 != NULL && !next->anon_vma)
- exporter = remove2;
+ if (remove != NULL && !next->anon_vma)
+ error = dup_anon_vma(vma, remove2);
+ else
+ error = dup_anon_vma(vma, remove);

} else if (end > next->vm_start) {
/*
@@ -855,9 +875,8 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
* mprotect case 5 shifting the boundary up.
*/
adjust_next = (end - next->vm_start);
- exporter = next;
- importer = vma;
- VM_WARN_ON(expand != importer);
+ VM_WARN_ON(expand != vma);
+ error = dup_anon_vma(vma, next);
} else if (end < vma->vm_end) {
/*
* vma shrinks, and !insert tells it's not
@@ -865,24 +884,11 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
* mprotect case 4 shifting the boundary down.
*/
adjust_next = -(vma->vm_end - end);
- exporter = vma;
- importer = next;
- VM_WARN_ON(expand != importer);
- }
-
- /*
- * Easily overlooked: when mprotect shifts the boundary,
- * make sure the expanding vma has anon_vma set if the
- * shrinking vma had, to cover any anon pages imported.
- */
- if (exporter && exporter->anon_vma && !importer->anon_vma) {
- int error;
-
- importer->anon_vma = exporter->anon_vma;
- error = anon_vma_clone(importer, exporter);
- if (error)
- return error;
+ VM_WARN_ON(expand != next);
+ error = dup_anon_vma(next, vma);
}
+ if (error)
+ return error;
}

if (vma_iter_prealloc(vmi, vma))
--
2.35.1

2023-01-05 19:46:08

by Liam R. Howlett

Subject: [PATCH v2 16/44] mlock: Convert mlock to vma iterator

From: "Liam R. Howlett" <[email protected]>

Use the vma iterator so that the iterator can be invalidated or
updated internally, avoiding the need for each caller to do so.
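
The locking loop now walks the range and detects holes inline instead
of re-finding the next vma on each iteration (sketch of the loop
shape used here):

    tmp = vma->vm_start;
    for_each_vma_range(vmi, vma, end) {
            if (vma->vm_start != tmp)   /* hole in the range */
                    return -ENOMEM;

            /* clamp tmp to end, then mlock_fixup(&vmi, vma, ...) */
            tmp = vma->vm_end;
    }

    if (vma_iter_end(&vmi) < end)
            return -ENOMEM;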

Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mlock.c | 57 +++++++++++++++++++++++++++---------------------------
1 file changed, 28 insertions(+), 29 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 7032f6dd0ce1..f06b02b631b5 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -401,8 +401,9 @@ static void mlock_vma_pages_range(struct vm_area_struct *vma,
*
* For vmas that pass the filters, merge/split as appropriate.
*/
-static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
- unsigned long start, unsigned long end, vm_flags_t newflags)
+static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ struct vm_area_struct **prev, unsigned long start,
+ unsigned long end, vm_flags_t newflags)
{
struct mm_struct *mm = vma->vm_mm;
pgoff_t pgoff;
@@ -417,22 +418,22 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
goto out;

pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- *prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
- vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ *prev = vmi_vma_merge(vmi, mm, *prev, start, end, newflags,
+ vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
+ vma->vm_userfaultfd_ctx, anon_vma_name(vma));
if (*prev) {
vma = *prev;
goto success;
}

if (start != vma->vm_start) {
- ret = split_vma(mm, vma, start, 1);
+ ret = vmi_split_vma(vmi, mm, vma, start, 1);
if (ret)
goto out;
}

if (end != vma->vm_end) {
- ret = split_vma(mm, vma, end, 0);
+ ret = vmi_split_vma(vmi, mm, vma, end, 0);
if (ret)
goto out;
}
@@ -471,7 +472,7 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
unsigned long nstart, end, tmp;
struct vm_area_struct *vma, *prev;
int error;
- MA_STATE(mas, &current->mm->mm_mt, start, start);
+ VMA_ITERATOR(vmi, current->mm, start);

VM_BUG_ON(offset_in_page(start));
VM_BUG_ON(len != PAGE_ALIGN(len));
@@ -480,39 +481,37 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
return -EINVAL;
if (end == start)
return 0;
- vma = mas_walk(&mas);
+ vma = vma_find(&vmi, end);
if (!vma)
return -ENOMEM;

+ prev = vma_prev(&vmi);
if (start > vma->vm_start)
prev = vma;
- else
- prev = mas_prev(&mas, 0);

- for (nstart = start ; ; ) {
- vm_flags_t newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
+ nstart = start;
+ tmp = vma->vm_start;
+ for_each_vma_range(vmi, vma, end) {
+ vm_flags_t newflags;

- newflags |= flags;
+ if (vma->vm_start != tmp)
+ return -ENOMEM;

+ newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
+ newflags |= flags;
/* Here we know that vma->vm_start <= nstart < vma->vm_end. */
tmp = vma->vm_end;
if (tmp > end)
tmp = end;
- error = mlock_fixup(vma, &prev, nstart, tmp, newflags);
+ error = mlock_fixup(&vmi, vma, &prev, nstart, tmp, newflags);
if (error)
break;
nstart = tmp;
- if (nstart < prev->vm_end)
- nstart = prev->vm_end;
- if (nstart >= end)
- break;
-
- vma = find_vma(prev->vm_mm, prev->vm_end);
- if (!vma || vma->vm_start != nstart) {
- error = -ENOMEM;
- break;
- }
}
+
+ if (vma_iter_end(&vmi) < end)
+ return -ENOMEM;
+
return error;
}

@@ -658,7 +657,7 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
*/
static int apply_mlockall_flags(int flags)
{
- MA_STATE(mas, &current->mm->mm_mt, 0, 0);
+ VMA_ITERATOR(vmi, current->mm, 0);
struct vm_area_struct *vma, *prev = NULL;
vm_flags_t to_add = 0;

@@ -679,15 +678,15 @@ static int apply_mlockall_flags(int flags)
to_add |= VM_LOCKONFAULT;
}

- mas_for_each(&mas, vma, ULONG_MAX) {
+ for_each_vma(vmi, vma) {
vm_flags_t newflags;

newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
newflags |= to_add;

/* Ignore errors */
- mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
- mas_pause(&mas);
+ mlock_fixup(&vmi, vma, &prev, vma->vm_start, vma->vm_end,
+ newflags);
cond_resched();
}
out:
--
2.35.1

2023-01-05 19:47:49

by Liam R. Howlett

Subject: [PATCH v2 15/44] mm: Change mprotect_fixup to vma iterator

From: "Liam R. Howlett" <[email protected]>

Use the vma iterator so that the iterator can be invalidated or
updated internally, avoiding the need for each caller to do so.

Signed-off-by: Liam R. Howlett <[email protected]>
---
fs/exec.c          |  5 ++++-
include/linux/mm.h |  6 +++---
mm/mprotect.c      | 47 ++++++++++++++++++++++------------------
3 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index ab913243a367..b98647eeae9f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -758,6 +758,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
unsigned long stack_expand;
unsigned long rlim_stack;
struct mmu_gather tlb;
+ struct vma_iterator vmi;

#ifdef CONFIG_STACK_GROWSUP
/* Limit stack size */
@@ -812,8 +813,10 @@ int setup_arg_pages(struct linux_binprm *bprm,
vm_flags |= mm->def_flags;
vm_flags |= VM_STACK_INCOMPLETE_SETUP;

+ vma_iter_init(&vmi, mm, vma->vm_start);
+
tlb_gather_mmu(&tlb, mm);
- ret = mprotect_fixup(&tlb, vma, &prev, vma->vm_start, vma->vm_end,
+ ret = mprotect_fixup(&vmi, &tlb, vma, &prev, vma->vm_start, vma->vm_end,
vm_flags);
tlb_finish_mmu(&tlb);

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9c790c88f691..98c91a25d257 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2188,9 +2188,9 @@ extern unsigned long change_protection(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgprot_t newprot,
unsigned long cp_flags);
-extern int mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
- struct vm_area_struct **pprev, unsigned long start,
- unsigned long end, unsigned long newflags);
+extern int mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
+ struct vm_area_struct *vma, struct vm_area_struct **pprev,
+ unsigned long start, unsigned long end, unsigned long newflags);

/*
* doesn't attempt to fault and will return short.
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 908df12caa26..7e6cb2165000 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -548,9 +548,9 @@ static const struct mm_walk_ops prot_none_walk_ops = {
};

int
-mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
- struct vm_area_struct **pprev, unsigned long start,
- unsigned long end, unsigned long newflags)
+mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
+ struct vm_area_struct *vma, struct vm_area_struct **pprev,
+ unsigned long start, unsigned long end, unsigned long newflags)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long oldflags = vma->vm_flags;
@@ -605,7 +605,7 @@ mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
* First try to merge with previous and/or next vma.
*/
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- *pprev = vma_merge(mm, *pprev, start, end, newflags,
+ *pprev = vmi_vma_merge(vmi, mm, *pprev, start, end, newflags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_vma_name(vma));
if (*pprev) {
@@ -617,13 +617,13 @@ mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
*pprev = vma;

if (start != vma->vm_start) {
- error = split_vma(mm, vma, start, 1);
+ error = vmi_split_vma(vmi, mm, vma, start, 1);
if (error)
goto fail;
}

if (end != vma->vm_end) {
- error = split_vma(mm, vma, end, 0);
+ error = vmi_split_vma(vmi, mm, vma, end, 0);
if (error)
goto fail;
}
@@ -672,7 +672,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
const bool rier = (current->personality & READ_IMPLIES_EXEC) &&
(prot & PROT_READ);
struct mmu_gather tlb;
- MA_STATE(mas, &current->mm->mm_mt, 0, 0);
+ struct vma_iterator vmi;

start = untagged_addr(start);

@@ -704,8 +704,8 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
if ((pkey != -1) && !mm_pkey_is_allocated(current->mm, pkey))
goto out;

- mas_set(&mas, start);
- vma = mas_find(&mas, ULONG_MAX);
+ vma_iter_init(&vmi, current->mm, start);
+ vma = vma_find(&vmi, end);
error = -ENOMEM;
if (!vma)
goto out;
@@ -728,18 +728,22 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
}
}

+ prev = vma_prev(&vmi);
if (start > vma->vm_start)
prev = vma;
- else
- prev = mas_prev(&mas, 0);

tlb_gather_mmu(&tlb, current->mm);
- for (nstart = start ; ; ) {
+ nstart = start;
+ tmp = vma->vm_start;
+ for_each_vma_range(vmi, vma, end) {
unsigned long mask_off_old_flags;
unsigned long newflags;
int new_vma_pkey;

- /* Here we know that vma->vm_start <= nstart < vma->vm_end. */
+ if (vma->vm_start != tmp) {
+ error = -ENOMEM;
+ break;
+ }

/* Does the application expect PROT_READ to imply PROT_EXEC */
if (rier && (vma->vm_flags & VM_MAYEXEC))
@@ -782,25 +786,18 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
break;
}

- error = mprotect_fixup(&tlb, vma, &prev, nstart, tmp, newflags);
+ error = mprotect_fixup(&vmi, &tlb, vma, &prev, nstart, tmp, newflags);
if (error)
break;

nstart = tmp;
-
- if (nstart < prev->vm_end)
- nstart = prev->vm_end;
- if (nstart >= end)
- break;
-
- vma = find_vma(current->mm, prev->vm_end);
- if (!vma || vma->vm_start != nstart) {
- error = -ENOMEM;
- break;
- }
prot = reqprot;
}
tlb_finish_mmu(&tlb);
+
+ if (vma_iter_end(&vmi) < end)
+ error = -ENOMEM;
+
out:
mmap_write_unlock(current->mm);
return error;
--
2.35.1

2023-01-05 19:48:48

by Liam R. Howlett

Subject: [PATCH v2 40/44] mm/mmap: Don't use __vma_adjust() in shift_arg_pages()

From: "Liam R. Howlett" <[email protected]>

Introduce vma_shrink(), which uses the vma_prepare() and
vma_complete() functions to reduce the vma coverage.

Convert shift_arg_pages() to use vma_expand() and the new vma_shrink()
function. Remove support for reducing a vma's size from
__vma_adjust(), since shift_arg_pages() is the only user that shrinks
a VMA in this way.

Signed-off-by: Liam R. Howlett <[email protected]>
---
fs/exec.c          |  4 ++--
include/linux/mm.h | 13 ++++------
mm/mmap.c          | 59 ++++++++++++++++++++++++++++++----------------
3 files changed, 53 insertions(+), 23 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index d52fca2dd30b..c0df813d2b45 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -699,7 +699,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
/*
* cover the whole range: [new_start, old_end)
*/
- if (vma_adjust(&vmi, vma, new_start, old_end, vma->vm_pgoff))
+ if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
return -ENOMEM;

/*
@@ -733,7 +733,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)

vma_prev(&vmi);
/* Shrink the vma to just the new range */
- return vma_adjust(&vmi, vma, new_start, new_end, vma->vm_pgoff);
+ return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
}

/*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a00871cc63cc..0b229ddf43a4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2822,14 +2822,11 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);

/* mmap.c */
extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
-extern int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long start,
- unsigned long end, pgoff_t pgoff, struct vm_area_struct *expand);
-static inline int vma_adjust(struct vma_iterator *vmi,
- struct vm_area_struct *vma, unsigned long start, unsigned long end,
- pgoff_t pgoff)
-{
- return __vma_adjust(vmi, vma, start, end, pgoff, NULL);
-}
+extern int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end, pgoff_t pgoff,
+ struct vm_area_struct *next);
+extern int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end, pgoff_t pgoff);
extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
unsigned long end, unsigned long vm_flags, struct anon_vma *,
diff --git a/mm/mmap.c b/mm/mmap.c
index 3bca62c11686..dad5c0113380 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -696,10 +696,9 @@ static inline void vma_complete(struct vma_prepare *vp,
*
* Returns: 0 on success
*/
-inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
- unsigned long start, unsigned long end, pgoff_t pgoff,
- struct vm_area_struct *next)
-
+int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end, pgoff_t pgoff,
+ struct vm_area_struct *next)
{
bool remove_next = false;
struct vma_prepare vp;
@@ -745,6 +744,44 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
nomem:
return -ENOMEM;
}
+
+/*
+ * vma_shrink() - Reduce an existing VMAs memory area
+ * @vmi: The vma iterator
+ * @vma: The VMA to modify
+ * @start: The new start
+ * @end: The new end
+ *
+ * Returns: 0 on success, -ENOMEM otherwise
+ */
+int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end, pgoff_t pgoff)
+{
+ struct vma_prepare vp;
+
+ WARN_ON((vma->vm_start != start) && (vma->vm_end != end));
+
+ if (vma_iter_prealloc(vmi, vma))
+ return -ENOMEM;
+
+ init_vma_prep(&vp, vma);
+ vma_adjust_trans_huge(vma, start, end, 0);
+ vma_prepare(&vp);
+
+ if (vma->vm_start < start)
+ vma_iter_clear(vmi, vma->vm_start, start);
+
+ if (vma->vm_end > end)
+ vma_iter_clear(vmi, end, vma->vm_end);
+
+ vma->vm_start = start;
+ vma->vm_end = end;
+ vma->vm_pgoff = pgoff;
+ vma_complete(&vp, vmi, vma->vm_mm);
+ validate_mm(vma->vm_mm);
+ return 0;
+}
+
/*
* We cannot adjust vm_start, vm_end, vm_pgoff fields of a vma that
* is already present in an i_mmap tree without adjusting the tree.
@@ -860,14 +897,7 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,

vma_prepare(&vma_prep);

- if (vma->vm_start < start)
- vma_iter_clear(vmi, vma->vm_start, start);
- else if (start != vma->vm_start)
- vma_changed = true;
-
- if (vma->vm_end > end)
- vma_iter_clear(vmi, end, vma->vm_end);
- else if (end != vma->vm_end)
+ if (start < vma->vm_start || end > vma->vm_end)
vma_changed = true;

vma->vm_start = start;
@@ -880,7 +910,10 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (adjust_next) {
next->vm_start += adjust_next;
next->vm_pgoff += adjust_next >> PAGE_SHIFT;
- vma_iter_store(vmi, next);
+ if (adjust_next < 0) {
+ WARN_ON_ONCE(vma_changed);
+ vma_iter_store(vmi, next);
+ }
}

vma_complete(&vma_prep, vmi, mm);
--
2.35.1

2023-01-05 19:51:48

by Liam R. Howlett

[permalink] [raw]
Subject: [PATCH v2 13/44] ipc/shm: Use the vma iterator for munmap calls

From: "Liam R. Howlett" <[email protected]>

Pass the vma iterator through to do_vmi_munmap() so that the iterator
state is handled internally.
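
A minimal caller sketch (illustrative only; unmap_file_vmas() and its
shape are hypothetical, not part of this patch) of why the mas_pause()
calls below can go away: do_vmi_munmap() re-validates the iterator
state itself, so the walk simply continues:

	static void unmap_file_vmas(struct mm_struct *mm, struct file *file)
	{
		struct vm_area_struct *vma;
		VMA_ITERATOR(vmi, mm, 0);

		vma = vma_next(&vmi);
		while (vma) {
			if (vma->vm_file == file)
				/* The iterator stays usable across the unmap. */
				do_vmi_munmap(&vmi, mm, vma->vm_start,
					      vma->vm_end - vma->vm_start,
					      NULL, false);
			vma = vma_next(&vmi);
		}
	}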

Signed-off-by: Liam R. Howlett <[email protected]>
---
ipc/shm.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/ipc/shm.c b/ipc/shm.c
index bd2fcc4d454e..1c6a6b319a49 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1786,8 +1786,8 @@ long ksys_shmdt(char __user *shmaddr)
*/
file = vma->vm_file;
size = i_size_read(file_inode(vma->vm_file));
- do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start, NULL);
- mas_pause(&vmi.mas);
+ do_vmi_munmap(&vmi, mm, vma->vm_start,
+ vma->vm_end - vma->vm_start, NULL, false);
/*
* We discovered the size of the shm segment, so
* break out of here and fall through to the next
@@ -1810,10 +1810,9 @@ long ksys_shmdt(char __user *shmaddr)
/* finding a matching vma now does not alter retval */
if ((vma->vm_ops == &shm_vm_ops) &&
((vma->vm_start - addr)/PAGE_SIZE == vma->vm_pgoff) &&
- (vma->vm_file == file)) {
- do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start, NULL);
- mas_pause(&vmi.mas);
- }
+ (vma->vm_file == file))
+ do_vmi_munmap(&vmi, mm, vma->vm_start,
+ vma->vm_end - vma->vm_start, NULL, false);

vma = vma_next(&vmi);
}
--
2.35.1

2023-01-05 19:52:26

by Liam R. Howlett

[permalink] [raw]
Subject: [PATCH v2 14/44] userfaultfd: Use vma iterator

From: "Liam R. Howlett" <[email protected]>

Use the vma iterator so that it can be invalidated or updated
internally, avoiding the need for each caller to do so.
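
For readers puzzled by the "do { ... } for_each_vma_range()" shape
below: the iterator macros in this series' include/linux/mm.h expand
to a bare while () condition, so pairing one with do { } forms a
do-while that runs the body for the VMA already located by vma_find()
before advancing. A rough sketch (check_range() is hypothetical; the
macro body is quoted from the series):

	#define for_each_vma_range(__vmi, __vma, __end)		\
		while (((__vma) = vma_find(&(__vmi), (__end))) != NULL)

	static int check_range(struct mm_struct *mm,
			       unsigned long start, unsigned long end)
	{
		struct vm_area_struct *vma, *cur;
		struct vma_iterator vmi;

		vma_iter_init(&vmi, mm, start);
		vma = vma_find(&vmi, end);  /* first VMA overlapping the range */
		if (!vma)
			return -EINVAL;

		cur = vma;
		do {
			/* body runs for @cur first, then each later VMA */
		} for_each_vma_range(vmi, cur, end);

		return 0;
	}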

Signed-off-by: Liam R. Howlett <[email protected]>
---
fs/userfaultfd.c | 88 +++++++++++++++++++-----------------------------
1 file changed, 34 insertions(+), 54 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 98ac37e34e3d..b3249388696a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -857,7 +857,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
/* len == 0 means wake all */
struct userfaultfd_wake_range range = { .len = 0, };
unsigned long new_flags;
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ VMA_ITERATOR(vmi, mm, 0);

WRITE_ONCE(ctx->released, true);

@@ -874,7 +874,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
*/
mmap_write_lock(mm);
prev = NULL;
- mas_for_each(&mas, vma, ULONG_MAX) {
+ for_each_vma(vmi, vma) {
cond_resched();
BUG_ON(!!vma->vm_userfaultfd_ctx.ctx ^
!!(vma->vm_flags & __VM_UFFD_FLAGS));
@@ -883,13 +883,12 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
continue;
}
new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
- prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end,
+ prev = vmi_vma_merge(&vmi, mm, prev, vma->vm_start, vma->vm_end,
new_flags, vma->anon_vma,
vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
NULL_VM_UFFD_CTX, anon_vma_name(vma));
if (prev) {
- mas_pause(&mas);
vma = prev;
} else {
prev = vma;
@@ -1276,7 +1275,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
bool found;
bool basic_ioctls;
unsigned long start, end, vma_end;
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ struct vma_iterator vmi;

user_uffdio_register = (struct uffdio_register __user *) arg;

@@ -1318,17 +1317,13 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
if (!mmget_not_zero(mm))
goto out;

+ ret = -EINVAL;
mmap_write_lock(mm);
- mas_set(&mas, start);
- vma = mas_find(&mas, ULONG_MAX);
+ vma_iter_init(&vmi, mm, start);
+ vma = vma_find(&vmi, end);
if (!vma)
goto out_unlock;

- /* check that there's at least one vma in the range */
- ret = -EINVAL;
- if (vma->vm_start >= end)
- goto out_unlock;
-
/*
* If the first vma contains huge pages, make sure start address
* is aligned to huge page size.
@@ -1345,7 +1340,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
*/
found = false;
basic_ioctls = false;
- for (cur = vma; cur; cur = mas_next(&mas, end - 1)) {
+ cur = vma;
+ do {
cond_resched();

BUG_ON(!!cur->vm_userfaultfd_ctx.ctx ^
@@ -1402,16 +1398,14 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
basic_ioctls = true;

found = true;
- }
+ } for_each_vma_range(vmi, cur, end);
BUG_ON(!found);

- mas_set(&mas, start);
- prev = mas_prev(&mas, 0);
- if (prev != vma)
- mas_next(&mas, ULONG_MAX);
+ vma_iter_set(&vmi, start);
+ prev = vma_prev(&vmi);

ret = 0;
- do {
+ for_each_vma_range(vmi, vma, end) {
cond_resched();

BUG_ON(!vma_can_userfault(vma, vm_flags));
@@ -1432,30 +1426,25 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
vma_end = min(end, vma->vm_end);

new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
- prev = vma_merge(mm, prev, start, vma_end, new_flags,
+ prev = vmi_vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
((struct vm_userfaultfd_ctx){ ctx }),
anon_vma_name(vma));
if (prev) {
/* vma_merge() invalidated the mas */
- mas_pause(&mas);
vma = prev;
goto next;
}
if (vma->vm_start < start) {
- ret = split_vma(mm, vma, start, 1);
+ ret = vmi_split_vma(&vmi, mm, vma, start, 1);
if (ret)
break;
- /* split_vma() invalidated the mas */
- mas_pause(&mas);
}
if (vma->vm_end > end) {
- ret = split_vma(mm, vma, end, 0);
+ ret = vmi_split_vma(&vmi, mm, vma, end, 0);
if (ret)
break;
- /* split_vma() invalidated the mas */
- mas_pause(&mas);
}
next:
/*
@@ -1472,8 +1461,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
skip:
prev = vma;
start = vma->vm_end;
- vma = mas_next(&mas, end - 1);
- } while (vma);
+ }
+
out_unlock:
mmap_write_unlock(mm);
mmput(mm);
@@ -1517,7 +1506,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
bool found;
unsigned long start, end, vma_end;
const void __user *buf = (void __user *)arg;
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ struct vma_iterator vmi;

ret = -EFAULT;
if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
@@ -1536,14 +1525,10 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
goto out;

mmap_write_lock(mm);
- mas_set(&mas, start);
- vma = mas_find(&mas, ULONG_MAX);
- if (!vma)
- goto out_unlock;
-
- /* check that there's at least one vma in the range */
ret = -EINVAL;
- if (vma->vm_start >= end)
+ vma_iter_init(&vmi, mm, start);
+ vma = vma_find(&vmi, end);
+ if (!vma)
goto out_unlock;

/*
@@ -1561,8 +1546,8 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
* Search for not compatible vmas.
*/
found = false;
- ret = -EINVAL;
- for (cur = vma; cur; cur = mas_next(&mas, end - 1)) {
+ cur = vma;
+ do {
cond_resched();

BUG_ON(!!cur->vm_userfaultfd_ctx.ctx ^
@@ -1579,16 +1564,13 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
goto out_unlock;

found = true;
- }
+ } for_each_vma_range(vmi, cur, end);
BUG_ON(!found);

- mas_set(&mas, start);
- prev = mas_prev(&mas, 0);
- if (prev != vma)
- mas_next(&mas, ULONG_MAX);
-
+ vma_iter_set(&vmi, start);
+ prev = vma_prev(&vmi);
ret = 0;
- do {
+ for_each_vma_range(vmi, vma, end) {
cond_resched();

BUG_ON(!vma_can_userfault(vma, vma->vm_flags));
@@ -1624,26 +1606,24 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
uffd_wp_range(mm, vma, start, vma_end - start, false);

new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
- prev = vma_merge(mm, prev, start, vma_end, new_flags,
+ prev = vmi_vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
NULL_VM_UFFD_CTX, anon_vma_name(vma));
if (prev) {
vma = prev;
- mas_pause(&mas);
goto next;
}
if (vma->vm_start < start) {
- ret = split_vma(mm, vma, start, 1);
+ ret = vmi_split_vma(&vmi, mm, vma, start, 1);
if (ret)
break;
- mas_pause(&mas);
}
if (vma->vm_end > end) {
- ret = split_vma(mm, vma, end, 0);
+ vma_iter_set(&vmi, vma->vm_end);
+ ret = vmi_split_vma(&vmi, mm, vma, end, 0);
if (ret)
break;
- mas_pause(&mas);
}
next:
/*
@@ -1657,8 +1637,8 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
skip:
prev = vma;
start = vma->vm_end;
- vma = mas_next(&mas, end - 1);
- } while (vma);
+ }
+
out_unlock:
mmap_write_unlock(mm);
mmput(mm);
--
2.35.1

2023-01-05 19:54:23

by Liam R. Howlett

[permalink] [raw]
Subject: [PATCH v2 27/44] mmap: Convert __vma_adjust() to use vma iterator

From: "Liam R. Howlett" <[email protected]>

Use the vma iterator internally for __vma_adjust(). Avoid using the
maple tree interface directly, to keep the operations type safe.
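
For reference, a sketch of the typed wrappers being substituted here
(illustrative; the series defines them in mm/internal.h, mirroring the
raw maple tree calls removed below, and the real definitions may
differ in detail):

	static inline int vma_iter_prealloc(struct vma_iterator *vmi,
					    struct vm_area_struct *vma)
	{
		return mas_preallocate(&vmi->mas, vma, GFP_KERNEL);
	}

	static inline void vma_iter_clear(struct vma_iterator *vmi,
					  unsigned long start, unsigned long end)
	{
		/* The maple tree end is inclusive; VMA ends are exclusive. */
		mas_set_range(&vmi->mas, start, end - 1);
		mas_store_prealloc(&vmi->mas, NULL);
	}

	static inline void vma_iter_store(struct vma_iterator *vmi,
					  struct vm_area_struct *vma)
	{
		mas_set_range(&vmi->mas, vma->vm_start, vma->vm_end - 1);
		mas_store_prealloc(&vmi->mas, vma);
	}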

Signed-off-by: Liam R. Howlett <[email protected]>
---
include/linux/mm.h | 3 --
mm/mmap.c | 75 ++++++++--------------------------------------
2 files changed, 13 insertions(+), 65 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 71474615b4ab..28973a3941a4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2847,9 +2847,6 @@ extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
bool *need_rmap_locks);
extern void exit_mmap(struct mm_struct *);

-void vma_mas_store(struct vm_area_struct *vma, struct ma_state *mas);
-void vma_mas_remove(struct vm_area_struct *vma, struct ma_state *mas);
-
static inline int check_data_rlimit(unsigned long rlim,
unsigned long new,
unsigned long start,
diff --git a/mm/mmap.c b/mm/mmap.c
index 8e7f4fc36960..a898ae2a57d5 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -494,56 +494,6 @@ static void __vma_link_file(struct vm_area_struct *vma,
flush_dcache_mmap_unlock(mapping);
}

-/*
- * vma_mas_store() - Store a VMA in the maple tree.
- * @vma: The vm_area_struct
- * @mas: The maple state
- *
- * Efficient way to store a VMA in the maple tree when the @mas has already
- * walked to the correct location.
- *
- * Note: the end address is inclusive in the maple tree.
- */
-void vma_mas_store(struct vm_area_struct *vma, struct ma_state *mas)
-{
- trace_vma_store(mas->tree, vma);
- mas_set_range(mas, vma->vm_start, vma->vm_end - 1);
- mas_store_prealloc(mas, vma);
-}
-
-/*
- * vma_mas_remove() - Remove a VMA from the maple tree.
- * @vma: The vm_area_struct
- * @mas: The maple state
- *
- * Efficient way to remove a VMA from the maple tree when the @mas has already
- * been established and points to the correct location.
- * Note: the end address is inclusive in the maple tree.
- */
-void vma_mas_remove(struct vm_area_struct *vma, struct ma_state *mas)
-{
- trace_vma_mas_szero(mas->tree, vma->vm_start, vma->vm_end - 1);
- mas->index = vma->vm_start;
- mas->last = vma->vm_end - 1;
- mas_store_prealloc(mas, NULL);
-}
-
-/*
- * vma_mas_szero() - Set a given range to zero. Used when modifying a
- * vm_area_struct start or end.
- *
- * @mas: The maple tree ma_state
- * @start: The start address to zero
- * @end: The end address to zero.
- */
-static inline void vma_mas_szero(struct ma_state *mas, unsigned long start,
- unsigned long end)
-{
- trace_vma_mas_szero(mas->tree, start, end - 1);
- mas_set_range(mas, start, end - 1);
- mas_store_prealloc(mas, NULL);
-}
-
static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
{
VMA_ITERATOR(vmi, mm, 0);
@@ -703,7 +653,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
bool vma_changed = false;
long adjust_next = 0;
int remove_next = 0;
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ VMA_ITERATOR(vmi, mm, 0);
struct vm_area_struct *exporter = NULL, *importer = NULL;

if (next && !insert) {
@@ -788,7 +738,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
}
}

- if (mas_preallocate(&mas, vma, GFP_KERNEL))
+ if (vma_iter_prealloc(&vmi, vma))
return -ENOMEM;

vma_adjust_trans_huge(orig_vma, start, end, adjust_next);
@@ -834,7 +784,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
if (start != vma->vm_start) {
if ((vma->vm_start < start) &&
(!insert || (insert->vm_end != start))) {
- vma_mas_szero(&mas, vma->vm_start, start);
+ vma_iter_clear(&vmi, vma->vm_start, start);
VM_WARN_ON(insert && insert->vm_start > vma->vm_start);
} else {
vma_changed = true;
@@ -844,8 +794,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
if (end != vma->vm_end) {
if (vma->vm_end > end) {
if (!insert || (insert->vm_start != end)) {
- vma_mas_szero(&mas, end, vma->vm_end);
- mas_reset(&mas);
+ vma_iter_clear(&vmi, end, vma->vm_end);
+ vma_iter_set(&vmi, vma->vm_end);
VM_WARN_ON(insert &&
insert->vm_end < vma->vm_end);
}
@@ -856,13 +806,13 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
}

if (vma_changed)
- vma_mas_store(vma, &mas);
+ vma_iter_store(&vmi, vma);

vma->vm_pgoff = pgoff;
if (adjust_next) {
next->vm_start += adjust_next;
next->vm_pgoff += adjust_next >> PAGE_SHIFT;
- vma_mas_store(next, &mas);
+ vma_iter_store(&vmi, next);
}

if (file) {
@@ -882,8 +832,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
* us to insert it before dropping the locks
* (it may either follow vma or precede it).
*/
- mas_reset(&mas);
- vma_mas_store(insert, &mas);
+ vma_iter_store(&vmi, insert);
mm->map_count++;
}

@@ -929,7 +878,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
if (insert && file)
uprobe_mmap(insert);

- mas_destroy(&mas);
+ vma_iter_free(&vmi);
validate_mm(mm);

return 0;
@@ -2057,7 +2006,8 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
anon_vma_interval_tree_pre_update_vma(vma);
vma->vm_end = address;
/* Overwrite old entry in mtree. */
- vma_mas_store(vma, &mas);
+ mas_set_range(&mas, vma->vm_start, address - 1);
+ mas_store_prealloc(&mas, vma);
anon_vma_interval_tree_post_update_vma(vma);
spin_unlock(&mm->page_table_lock);

@@ -2139,7 +2089,8 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
vma->vm_start = address;
vma->vm_pgoff -= grow;
/* Overwrite old entry in mtree. */
- vma_mas_store(vma, &mas);
+ mas_set_range(&mas, address, vma->vm_end - 1);
+ mas_store_prealloc(&mas, vma);
anon_vma_interval_tree_post_update_vma(vma);
spin_unlock(&mm->page_table_lock);

--
2.35.1

2023-01-05 19:55:00

by Liam R. Howlett

[permalink] [raw]
Subject: [PATCH v2 21/44] madvise: Use vmi iterator for __split_vma() and vma_merge()

From: "Liam R. Howlett" <[email protected]>

Use the vma iterator so that it can be invalidated or updated
internally, avoiding the need for each caller to do so.

Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/madvise.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index a56a6d17e201..4ee85b85806a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -142,6 +142,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
struct mm_struct *mm = vma->vm_mm;
int error;
pgoff_t pgoff;
+ VMA_ITERATOR(vmi, mm, 0);

if (new_flags == vma->vm_flags && anon_vma_name_eq(anon_vma_name(vma), anon_name)) {
*prev = vma;
@@ -149,8 +150,8 @@ static int madvise_update_vma(struct vm_area_struct *vma,
}

pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- *prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
- vma->vm_file, pgoff, vma_policy(vma),
+ *prev = vmi_vma_merge(&vmi, mm, *prev, start, end, new_flags,
+ vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_name);
if (*prev) {
vma = *prev;
@@ -162,7 +163,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
if (start != vma->vm_start) {
if (unlikely(mm->map_count >= sysctl_max_map_count))
return -ENOMEM;
- error = __split_vma(mm, vma, start, 1);
+ error = vmi__split_vma(&vmi, mm, vma, start, 1);
if (error)
return error;
}
@@ -170,7 +171,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
if (end != vma->vm_end) {
if (unlikely(mm->map_count >= sysctl_max_map_count))
return -ENOMEM;
- error = __split_vma(mm, vma, end, 0);
+ error = vmi__split_vma(&vmi, mm, vma, end, 0);
if (error)
return error;
}
--
2.35.1

2023-01-05 19:55:52

by Liam R. Howlett

[permalink] [raw]
Subject: [PATCH v2 34/44] mm: Change munmap splitting order and move_vma()

From: "Liam R. Howlett" <[email protected]>

Splitting can be more efficient when the order of the splits is not a
concern. Change do_vmi_align_munmap() to reduce the tree walking
needed during split operations.

move_vma() must also be altered to remove its dependency on keeping
the original VMA as the active part of the split. Transition to using
the vma iterator to look up the prev and/or next vma after the munmap.
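
An illustrative sketch of the accounting change (a fragment built on
the diff below, not patch text and not compilable on its own):
unmapping the middle of a VM_ACCOUNT vma can leave a piece on either
side of the hole, and once do_vmi_munmap() has cleared the hole from
the tree, the iterator sits between the survivors:

	/*
	 *  |- account_start piece -|xx unmapped xx|- account_end piece -|
	 *  vm_start             old_addr  old_addr+old_len        vm_end
	 */
	if (account_start)		/* a piece below the hole survived */
		vma_prev(&vmi)->vm_flags |= VM_ACCOUNT;
	if (account_end)		/* a piece above the hole survived */
		vma_next(&vmi)->vm_flags |= VM_ACCOUNT;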

Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 18 ++----------------
mm/mremap.c | 27 ++++++++++++++++-----------
2 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 99c94d49640b..c1796f9261e4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2387,21 +2387,9 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
for_each_vma_range(*vmi, next, end) {
/* Does it split the end? */
if (next->vm_end > end) {
- struct vm_area_struct *split;
-
- error = __split_vma(vmi, next, end, 1);
+ error = __split_vma(vmi, next, end, 0);
if (error)
goto end_split_failed;
-
- split = vma_prev(vmi);
- error = munmap_sidetree(split, &mas_detach);
- if (error)
- goto munmap_sidetree_failed;
-
- count++;
- if (vma == next)
- vma = split;
- break;
}
error = munmap_sidetree(next, &mas_detach);
if (error)
@@ -2414,9 +2402,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
#endif
}

- if (!next)
- next = vma_next(vmi);
-
+ next = vma_next(vmi);
if (unlikely(uf)) {
/*
* If userfaultfd_unmap_prep returns an error the vmas
diff --git a/mm/mremap.c b/mm/mremap.c
index 00845aec5441..98f27d466265 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -580,11 +580,12 @@ static unsigned long move_vma(struct vm_area_struct *vma,
unsigned long vm_flags = vma->vm_flags;
unsigned long new_pgoff;
unsigned long moved_len;
- unsigned long excess = 0;
+ unsigned long account_start = 0;
+ unsigned long account_end = 0;
unsigned long hiwater_vm;
- int split = 0;
int err = 0;
bool need_rmap_locks;
+ VMA_ITERATOR(vmi, mm, old_addr);

/*
* We'd prefer to avoid failure later on in do_munmap:
@@ -662,10 +663,10 @@ static unsigned long move_vma(struct vm_area_struct *vma,
/* Conceal VM_ACCOUNT so old reservation is not undone */
if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) {
vma->vm_flags &= ~VM_ACCOUNT;
- excess = vma->vm_end - vma->vm_start - old_len;
- if (old_addr > vma->vm_start &&
- old_addr + old_len < vma->vm_end)
- split = 1;
+ if (vma->vm_start < old_addr)
+ account_start = vma->vm_start;
+ if (vma->vm_end > old_addr + old_len)
+ account_end = vma->vm_end;
}

/*
@@ -700,11 +701,11 @@ static unsigned long move_vma(struct vm_area_struct *vma,
return new_addr;
}

- if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
+ if (do_vmi_munmap(&vmi, mm, old_addr, old_len, uf_unmap, false) < 0) {
/* OOM: unable to split vma, just get accounts right */
if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP))
vm_acct_memory(old_len >> PAGE_SHIFT);
- excess = 0;
+ account_start = account_end = 0;
}

if (vm_flags & VM_LOCKED) {
@@ -715,10 +716,14 @@ static unsigned long move_vma(struct vm_area_struct *vma,
mm->hiwater_vm = hiwater_vm;

/* Restore VM_ACCOUNT if one or two pieces of vma left */
- if (excess) {
+ if (account_start) {
+ vma = vma_prev(&vmi);
+ vma->vm_flags |= VM_ACCOUNT;
+ }
+
+ if (account_end) {
+ vma = vma_next(&vmi);
vma->vm_flags |= VM_ACCOUNT;
- if (split)
- find_vma(mm, vma->vm_end)->vm_flags |= VM_ACCOUNT;
}

return new_addr;
--
2.35.1

2023-01-05 19:56:05

by Liam R. Howlett

[permalink] [raw]
Subject: [PATCH v2 35/44] mm/mmap: move anon_vma setting in __vma_adjust()

From: "Liam R. Howlett" <[email protected]>

Move the anon_vma setting & VM_WARN_ON() up the function. This is done
to clear the way for the locking rework later in the series.

Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index c1796f9261e4..c15a04bf3518 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -744,6 +744,14 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (vma_iter_prealloc(vmi, vma))
return -ENOMEM;

+ anon_vma = vma->anon_vma;
+ if (!anon_vma && adjust_next)
+ anon_vma = next->anon_vma;
+
+ if (anon_vma)
+ VM_WARN_ON(adjust_next && next->anon_vma &&
+ anon_vma != next->anon_vma);
+
vma_adjust_trans_huge(orig_vma, start, end, adjust_next);
if (file) {
mapping = file->f_mapping;
@@ -765,12 +773,7 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
}
}

- anon_vma = vma->anon_vma;
- if (!anon_vma && adjust_next)
- anon_vma = next->anon_vma;
if (anon_vma) {
- VM_WARN_ON(adjust_next && next->anon_vma &&
- anon_vma != next->anon_vma);
anon_vma_lock_write(anon_vma);
anon_vma_interval_tree_pre_update_vma(vma);
if (adjust_next)
--
2.35.1

2023-01-05 19:56:20

by Liam R. Howlett

[permalink] [raw]
Subject: [PATCH v2 33/44] mmap: Clean up mmap_region() unrolling

From: "Liam R. Howlett" <[email protected]>

Move the unrolling logic to the error path as opposed to duplicating
it within the function body. This reduces the potential of missing an
update to one path when making changes.
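
The restructured ladder relies on C permitting a goto to target a
label nested inside an if () body. A standalone toy of the idiom
(illustrative only; not kernel code and not part of this patch):

	#include <stdio.h>

	static int unwind_demo(int have_file, int fail_point)
	{
		if (fail_point == 1)
			goto close_and_free;	/* walk the whole ladder */
		if (fail_point == 2)
			goto unmap_and_free;	/* skip the close step */
		return 0;

	close_and_free:
		puts("close");
		if (have_file) {
	unmap_and_free:
			/*
			 * Reached by fallthrough (guarded by have_file) or
			 * by a direct goto, which bypasses the condition;
			 * mmap_region() only takes the direct jump when a
			 * file is present, so the guard only matters for
			 * the fallthrough case.
			 */
			puts("unmap");
		}
		puts("free");
		return -1;
	}

	int main(void)
	{
		return unwind_demo(1, 2) == -1 ? 0 : 1;
	}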

Cc: Li Zetao <[email protected]>
Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 45 ++++++++++++++++++---------------------------
1 file changed, 18 insertions(+), 27 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index d7530abdd7c0..99c94d49640b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2659,12 +2659,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
* Expansion is handled above, merging is handled below.
* Drivers should not alter the address of the VMA.
*/
- if (WARN_ON((addr != vma->vm_start))) {
- error = -EINVAL;
+ error = -EINVAL;
+ if (WARN_ON((addr != vma->vm_start)))
goto close_and_free_vma;
- }
- vma_iter_set(&vmi, addr);

+ vma_iter_set(&vmi, addr);
/*
* If vm_flags changed after call_mmap(), we should try merge
* vma again as we may succeed this time.
@@ -2701,25 +2700,13 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
}

/* Allow architectures to sanity-check the vm_flags */
- if (!arch_validate_flags(vma->vm_flags)) {
- error = -EINVAL;
- if (file)
- goto close_and_free_vma;
- else if (vma->vm_file)
- goto unmap_and_free_vma;
- else
- goto free_vma;
- }
+ error = -EINVAL;
+ if (!arch_validate_flags(vma->vm_flags))
+ goto close_and_free_vma;

- if (vma_iter_prealloc(&vmi, vma)) {
- error = -ENOMEM;
- if (file)
- goto close_and_free_vma;
- else if (vma->vm_file)
- goto unmap_and_free_vma;
- else
- goto free_vma;
- }
+ error = -ENOMEM;
+ if (vma_iter_prealloc(&vmi, vma))
+ goto close_and_free_vma;

if (vma->vm_file)
i_mmap_lock_write(vma->vm_file->f_mapping);
@@ -2778,14 +2765,18 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
return addr;

close_and_free_vma:
- if (vma->vm_ops && vma->vm_ops->close)
+ if (file && vma->vm_ops && vma->vm_ops->close)
vma->vm_ops->close(vma);
+
+ if (file || vma->vm_file) {
unmap_and_free_vma:
- fput(vma->vm_file);
- vma->vm_file = NULL;
+ fput(vma->vm_file);
+ vma->vm_file = NULL;

- /* Undo any partial mapping done by a device driver. */
- unmap_region(mm, &mm->mm_mt, vma, prev, next, vma->vm_start, vma->vm_end);
+ /* Undo any partial mapping done by a device driver. */
+ unmap_region(mm, &mm->mm_mt, vma, prev, next, vma->vm_start,
+ vma->vm_end);
+ }
if (file && (vm_flags & VM_SHARED))
mapping_unmap_writable(file->f_mapping);
free_vma:
--
2.35.1

2023-01-05 19:56:41

by Liam R. Howlett

[permalink] [raw]
Subject: [PATCH v2 32/44] mm: Add vma iterator to vma_adjust() arguments

From: "Liam R. Howlett" <[email protected]>

Change the vma_adjust() function definition to accept the vma iterator
and pass it through to __vma_adjust().

Update fs/exec to use the new vma_adjust() function parameters.

Revert __split_vma()'s calls from __vma_adjust() back to vma_adjust()
and pass through the vma iterator.

Signed-off-by: Liam R. Howlett <[email protected]>
---
fs/exec.c | 11 ++++-------
include/linux/mm.h | 9 ++++-----
mm/mmap.c | 10 +++++-----
3 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index b98647eeae9f..76ee62e1d3f1 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -699,7 +699,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
/*
* cover the whole range: [new_start, old_end)
*/
- if (vma_adjust(vma, new_start, old_end, vma->vm_pgoff, NULL))
+ if (vma_adjust(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
return -ENOMEM;

/*
@@ -731,12 +731,9 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
}
tlb_finish_mmu(&tlb);

- /*
- * Shrink the vma to just the new range. Always succeeds.
- */
- vma_adjust(vma, new_start, new_end, vma->vm_pgoff, NULL);
-
- return 0;
+ vma_prev(&vmi);
+ /* Shrink the vma to just the new range */
+ return vma_adjust(&vmi, vma, new_start, new_end, vma->vm_pgoff, NULL);
}

/*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 294894969cd9..aabfd4183091 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2825,12 +2825,11 @@ extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admi
extern int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
struct vm_area_struct *expand);
-static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert)
+static inline int vma_adjust(struct vma_iterator *vmi,
+ struct vm_area_struct *vma, unsigned long start, unsigned long end,
+ pgoff_t pgoff, struct vm_area_struct *insert)
{
- VMA_ITERATOR(vmi, vma->vm_mm, start);
-
- return __vma_adjust(&vmi, vma, start, end, pgoff, insert, NULL);
+ return __vma_adjust(vmi, vma, start, end, pgoff, insert, NULL);
}
extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
diff --git a/mm/mmap.c b/mm/mmap.c
index c10ab873b8e4..d7530abdd7c0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2268,12 +2268,12 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
new->vm_ops->open(new);

if (new_below)
- err = __vma_adjust(vmi, vma, addr, vma->vm_end,
- vma->vm_pgoff + ((addr - new->vm_start) >> PAGE_SHIFT),
- new, NULL);
+ err = vma_adjust(vmi, vma, addr, vma->vm_end,
+ vma->vm_pgoff + ((addr - new->vm_start) >> PAGE_SHIFT),
+ new);
else
- err = __vma_adjust(vmi, vma, vma->vm_start, addr, vma->vm_pgoff,
- new, NULL);
+ err = vma_adjust(vmi, vma, vma->vm_start, addr, vma->vm_pgoff,
+ new);

/* Success. */
if (!err) {
--
2.35.1

2023-01-05 19:56:43

by Liam R. Howlett

[permalink] [raw]
Subject: [PATCH v2 37/44] mm/mmap: Use vma_prepare() and vma_complete() in vma_expand()

From: "Liam R. Howlett" <[email protected]>

Use the new locking functions for vma_expand(). This reduces code
duplication.

At the same time, change VM_BUG_ON() to VM_WARN_ON().
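
For context, the shared shape that vma_prepare()/vma_complete() give
each tree modifier (a rough sketch assembled from the diff below, not
a literal quote; a later patch in this series wraps the memset and
field setup into init_vma_prep()):

	struct vma_prepare vp;

	if (vma_iter_prealloc(vmi, vma))
		return -ENOMEM;

	memset(&vp, 0, sizeof(vp));
	vp.vma = vma;
	vp.anon_vma = vma->anon_vma;
	vp.file = vma->vm_file;
	if (vp.file)
		vp.mapping = vp.file->f_mapping;

	vma_adjust_trans_huge(vma, start, end, 0);
	vma_prepare(&vp);	/* take mapping/anon_vma locks, pre-update rmap */

	/* ... modify vm_start/vm_end/vm_pgoff and store in the tree ... */

	vma_complete(&vp, vmi, vma->vm_mm);	/* unlock, post-update, free removals */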

Signed-off-by: Liam R. Howlett <[email protected]>
---
mm/mmap.c | 189 +++++++++++++++++++++---------------------------------
1 file changed, 73 insertions(+), 116 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 3cf08aaee17d..9546d5811ca9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -519,122 +519,6 @@ static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
return 0;
}

-/*
- * vma_expand - Expand an existing VMA
- *
- * @mas: The maple state
- * @vma: The vma to expand
- * @start: The start of the vma
- * @end: The exclusive end of the vma
- * @pgoff: The page offset of vma
- * @next: The current of next vma.
- *
- * Expand @vma to @start and @end. Can expand off the start and end. Will
- * expand over @next if it's different from @vma and @end == @next->vm_end.
- * Checking if the @vma can expand and merge with @next needs to be handled by
- * the caller.
- *
- * Returns: 0 on success
- */
-inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
- unsigned long start, unsigned long end, pgoff_t pgoff,
- struct vm_area_struct *next)
-{
- struct mm_struct *mm = vma->vm_mm;
- struct address_space *mapping = NULL;
- struct rb_root_cached *root = NULL;
- struct anon_vma *anon_vma = vma->anon_vma;
- struct file *file = vma->vm_file;
- bool remove_next = false;
-
- if (next && (vma != next) && (end == next->vm_end)) {
- remove_next = true;
- if (next->anon_vma && !vma->anon_vma) {
- int error;
-
- anon_vma = next->anon_vma;
- vma->anon_vma = anon_vma;
- error = anon_vma_clone(vma, next);
- if (error)
- return error;
- }
- }
-
- /* Not merging but overwriting any part of next is not handled. */
- VM_BUG_ON(next && !remove_next && next != vma && end > next->vm_start);
- /* Only handles expanding */
- VM_BUG_ON(vma->vm_start < start || vma->vm_end > end);
-
- if (vma_iter_prealloc(vmi, vma))
- goto nomem;
-
- vma_adjust_trans_huge(vma, start, end, 0);
-
- if (file) {
- mapping = file->f_mapping;
- root = &mapping->i_mmap;
- uprobe_munmap(vma, vma->vm_start, vma->vm_end);
- i_mmap_lock_write(mapping);
- }
-
- if (anon_vma) {
- anon_vma_lock_write(anon_vma);
- anon_vma_interval_tree_pre_update_vma(vma);
- }
-
- if (file) {
- flush_dcache_mmap_lock(mapping);
- vma_interval_tree_remove(vma, root);
- }
-
- /* VMA iterator points to previous, so set to start if necessary */
- if (vma_iter_addr(vmi) != start)
- vma_iter_set(vmi, start);
-
- vma->vm_start = start;
- vma->vm_end = end;
- vma->vm_pgoff = pgoff;
- vma_iter_store(vmi, vma);
-
- if (file) {
- vma_interval_tree_insert(vma, root);
- flush_dcache_mmap_unlock(mapping);
- }
-
- /* Expanding over the next vma */
- if (remove_next && file) {
- __remove_shared_vm_struct(next, file, mapping);
- }
-
- if (anon_vma) {
- anon_vma_interval_tree_post_update_vma(vma);
- anon_vma_unlock_write(anon_vma);
- }
-
- if (file) {
- i_mmap_unlock_write(mapping);
- uprobe_mmap(vma);
- }
-
- if (remove_next) {
- if (file) {
- uprobe_munmap(next, next->vm_start, next->vm_end);
- fput(file);
- }
- if (next->anon_vma)
- anon_vma_merge(vma, next);
- mm->map_count--;
- mpol_put(vma_policy(next));
- vm_area_free(next);
- }
-
- validate_mm(mm);
- return 0;
-
-nomem:
- return -ENOMEM;
-}
-
/*
* vma_prepare() - Helper function for handling locking VMAs prior to altering
* @vp: The initialized vma_prepare struct
@@ -756,6 +640,79 @@ static inline void vma_complete(struct vma_prepare *vp,
uprobe_mmap(vp->insert);
}

+/*
+ * vma_expand - Expand an existing VMA
+ *
+ * @vmi: The vma iterator
+ * @vma: The vma to expand
+ * @start: The start of the vma
+ * @end: The exclusive end of the vma
+ * @pgoff: The page offset of vma
+ * @next: The current or next vma.
+ *
+ * Expand @vma to @start and @end. Can expand off the start and end. Will
+ * expand over @next if it's different from @vma and @end == @next->vm_end.
+ * Checking if the @vma can expand and merge with @next needs to be handled by
+ * the caller.
+ *
+ * Returns: 0 on success
+ */
+inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end, pgoff_t pgoff,
+ struct vm_area_struct *next)
+
+{
+ struct vma_prepare vp;
+
+ memset(&vp, 0, sizeof(vp));
+ vp.vma = vma;
+ vp.anon_vma = vma->anon_vma;
+ if (next && (vma != next) && (end == next->vm_end)) {
+ vp.remove = next;
+ if (next->anon_vma && !vma->anon_vma) {
+ int error;
+
+ vp.anon_vma = next->anon_vma;
+ vma->anon_vma = next->anon_vma;
+ error = anon_vma_clone(vma, next);
+ if (error)
+ return error;
+ }
+ }
+
+ /* Not merging but overwriting any part of next is not handled. */
+ VM_WARN_ON(next && !vp.remove &&
+ next != vma && end > next->vm_start);
+ /* Only handles expanding */
+ VM_WARN_ON(vma->vm_start < start || vma->vm_end > end);
+
+ if (vma_iter_prealloc(vmi, vma))
+ goto nomem;
+
+ vma_adjust_trans_huge(vma, start, end, 0);
+
+ vp.file = vma->vm_file;
+ if (vp.file)
+ vp.mapping = vp.file->f_mapping;
+
+ /* VMA iterator points to previous, so set to start if necessary */
+ if (vma_iter_addr(vmi) != start)
+ vma_iter_set(vmi, start);
+
+ vma_prepare(&vp);
+ vma->vm_start = start;
+ vma->vm_end = end;
+ vma->vm_pgoff = pgoff;
+ /* Note: mas must be pointing to the expanding VMA */
+ vma_iter_store(vmi, vma);
+
+ vma_complete(&vp, vmi, vma->vm_mm);
+ validate_mm(vma->vm_mm);
+ return 0;
+
+nomem:
+ return -ENOMEM;
+}
/*
* We cannot adjust vm_start, vm_end, vm_pgoff fields of a vma that
* is already present in an i_mmap tree without adjusting the tree.
--
2.35.1

2023-01-05 20:12:20

by Liam R. Howlett

[permalink] [raw]
Subject: [PATCH v2 39/44] mm: Don't use __vma_adjust() in __split_vma()

From: "Liam R. Howlett" <[email protected]>

Use the abstracted locking and maple tree operations. Since
__split_vma() is the only caller of __vma_adjust() that uses the
insert argument, drop that argument. Remove the NULL passed through
from fs/exec's shift_arg_pages() at the same time.
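
The mechanism that makes the insert argument unnecessary, sketched
from the diff below (a fragment, not a literal quote or compilable on
its own): __split_vma() now drives the vma_prepare()/vma_complete()
pair itself and hands the new half over via vp.insert, which
vma_complete() stores into the tree:

	struct vma_prepare vp;

	init_vma_prep(&vp, vma);
	vp.insert = new;		/* the half split off from @vma */
	vma_prepare(&vp);

	/* ... shrink @vma to one side of @addr ... */

	vma_complete(&vp, vmi, vma->vm_mm);	/* also links @new */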

Signed-off-by: Liam R. Howlett <[email protected]>
---
fs/exec.c | 4 +-
include/linux/mm.h | 7 ++-
mm/mmap.c | 114 ++++++++++++++++++++-------------------------
3 files changed, 56 insertions(+), 69 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 76ee62e1d3f1..d52fca2dd30b 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -699,7 +699,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
/*
* cover the whole range: [new_start, old_end)
*/
- if (vma_adjust(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
+ if (vma_adjust(&vmi, vma, new_start, old_end, vma->vm_pgoff))
return -ENOMEM;

/*
@@ -733,7 +733,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)

vma_prev(&vmi);
/* Shrink the vma to just the new range */
- return vma_adjust(&vmi, vma, new_start, new_end, vma->vm_pgoff, NULL);
+ return vma_adjust(&vmi, vma, new_start, new_end, vma->vm_pgoff);
}

/*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index aabfd4183091..a00871cc63cc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2823,13 +2823,12 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
/* mmap.c */
extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
extern int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long start,
- unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
- struct vm_area_struct *expand);
+ unsigned long end, pgoff_t pgoff, struct vm_area_struct *expand);
static inline int vma_adjust(struct vma_iterator *vmi,
struct vm_area_struct *vma, unsigned long start, unsigned long end,
- pgoff_t pgoff, struct vm_area_struct *insert)
+ pgoff_t pgoff)
{
- return __vma_adjust(vmi, vma, start, end, pgoff, insert, NULL);
+ return __vma_adjust(vmi, vma, start, end, pgoff, NULL);
}
extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
diff --git a/mm/mmap.c b/mm/mmap.c
index 431c5ee9ce00..3bca62c11686 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -754,7 +754,7 @@ inline int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
*/
int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long start, unsigned long end, pgoff_t pgoff,
- struct vm_area_struct *insert, struct vm_area_struct *expand)
+ struct vm_area_struct *expand)
{
struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct *remove2 = NULL;
@@ -767,7 +767,7 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
struct vm_area_struct *exporter = NULL, *importer = NULL;
struct vma_prepare vma_prep;

- if (next && !insert) {
+ if (next) {
if (end >= next->vm_end) {
/*
* vma expands, overlapping all the next, and
@@ -858,39 +858,25 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
VM_WARN_ON(vma_prep.anon_vma && adjust_next && next->anon_vma &&
vma_prep.anon_vma != next->anon_vma);

- vma_prep.insert = insert;
vma_prepare(&vma_prep);

- if (start != vma->vm_start) {
- if (vma->vm_start < start) {
- if (!insert || (insert->vm_end != start)) {
- vma_iter_clear(vmi, vma->vm_start, start);
- vma_iter_set(vmi, start);
- VM_WARN_ON(insert && insert->vm_start > vma->vm_start);
- }
- } else {
- vma_changed = true;
- }
- vma->vm_start = start;
- }
- if (end != vma->vm_end) {
- if (vma->vm_end > end) {
- if (!insert || (insert->vm_start != end)) {
- vma_iter_clear(vmi, end, vma->vm_end);
- vma_iter_set(vmi, vma->vm_end);
- VM_WARN_ON(insert &&
- insert->vm_end < vma->vm_end);
- }
- } else {
- vma_changed = true;
- }
- vma->vm_end = end;
- }
+ if (vma->vm_start < start)
+ vma_iter_clear(vmi, vma->vm_start, start);
+ else if (start != vma->vm_start)
+ vma_changed = true;
+
+ if (vma->vm_end > end)
+ vma_iter_clear(vmi, end, vma->vm_end);
+ else if (end != vma->vm_end)
+ vma_changed = true;
+
+ vma->vm_start = start;
+ vma->vm_end = end;
+ vma->vm_pgoff = pgoff;

if (vma_changed)
vma_iter_store(vmi, vma);

- vma->vm_pgoff = pgoff;
if (adjust_next) {
next->vm_start += adjust_next;
next->vm_pgoff += adjust_next >> PAGE_SHIFT;
@@ -909,9 +895,9 @@ int __vma_adjust(struct vma_iterator *vmi, struct vm_area_struct *vma,
* per-vma resources, so we don't attempt to merge those.
*/
static inline int is_mergeable_vma(struct vm_area_struct *vma,
- struct file *file, unsigned long vm_flags,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- struct anon_vma_name *anon_name)
+ struct file *file, unsigned long vm_flags,
+ struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+ struct anon_vma_name *anon_name)
{
/*
* VM_SOFTDIRTY should not prevent from VMA merging, if we
@@ -1093,20 +1079,19 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
is_mergeable_anon_vma(prev->anon_vma,
next->anon_vma, NULL)) { /* cases 1, 6 */
err = __vma_adjust(vmi, prev, prev->vm_start,
- next->vm_end, prev->vm_pgoff, NULL,
- prev);
+ next->vm_end, prev->vm_pgoff, prev);
res = prev;
} else if (merge_prev) { /* cases 2, 5, 7 */
err = __vma_adjust(vmi, prev, prev->vm_start,
- end, prev->vm_pgoff, NULL, prev);
+ end, prev->vm_pgoff, prev);
res = prev;
} else if (merge_next) {
if (prev && addr < prev->vm_end) /* case 4 */
err = __vma_adjust(vmi, prev, prev->vm_start,
- addr, prev->vm_pgoff, NULL, next);
+ addr, prev->vm_pgoff, next);
else /* cases 3, 8 */
err = __vma_adjust(vmi, mid, addr, next->vm_end,
- next->vm_pgoff - pglen, NULL, next);
+ next->vm_pgoff - pglen, next);
res = next;
}

@@ -2246,6 +2231,7 @@ static void unmap_region(struct mm_struct *mm, struct maple_tree *mt,
int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long addr, int new_below)
{
+ struct vma_prepare vp;
struct vm_area_struct *new;
int err;

@@ -2261,16 +2247,20 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (!new)
return -ENOMEM;

- if (new_below)
+ err = -ENOMEM;
+ if (vma_iter_prealloc(vmi, vma))
+ goto out_free_vma;
+
+ if (new_below) {
new->vm_end = addr;
- else {
+ } else {
new->vm_start = addr;
new->vm_pgoff += ((addr - vma->vm_start) >> PAGE_SHIFT);
}

err = vma_dup_policy(vma, new);
if (err)
- goto out_free_vma;
+ goto out_free_vmi;

err = anon_vma_clone(new, vma);
if (err)
@@ -2282,33 +2272,31 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (new->vm_ops && new->vm_ops->open)
new->vm_ops->open(new);

- if (new_below)
- err = vma_adjust(vmi, vma, addr, vma->vm_end,
- vma->vm_pgoff + ((addr - new->vm_start) >> PAGE_SHIFT),
- new);
- else
- err = vma_adjust(vmi, vma, vma->vm_start, addr, vma->vm_pgoff,
- new);
+ vma_adjust_trans_huge(vma, vma->vm_start, addr, 0);
+ init_vma_prep(&vp, vma);
+ vp.insert = new;
+ vma_prepare(&vp);

- /* Success. */
- if (!err) {
- if (new_below)
- vma_next(vmi);
- return 0;
+ if (new_below) {
+ vma->vm_start = addr;
+ vma->vm_pgoff += (addr - new->vm_start) >> PAGE_SHIFT;
+ } else {
+ vma->vm_end = addr;
}

- /* Avoid vm accounting in close() operation */
- new->vm_start = new->vm_end;
- new->vm_pgoff = 0;
- /* Clean everything up if vma_adjust failed. */
- if (new->vm_ops && new->vm_ops->close)
- new->vm_ops->close(new);
- if (new->vm_file)
- fput(new->vm_file);
- unlink_anon_vmas(new);
- out_free_mpol:
+ /* vma_complete stores the new vma */
+ vma_complete(&vp, vmi, vma->vm_mm);
+
+ /* Success. */
+ if (new_below)
+ vma_next(vmi);
+ return 0;
+
+out_free_mpol:
mpol_put(vma_policy(new));
- out_free_vma:
+out_free_vmi:
+ vma_iter_free(vmi);
+out_free_vma:
vm_area_free(new);
validate_mm_mt(vma->vm_mm);
return err;
--
2.35.1

2023-01-05 20:14:38

by Liam R. Howlett

[permalink] [raw]
Subject: [PATCH v2 20/44] sched: Convert to vma iterator

From: "Liam R. Howlett" <[email protected]>

Use the vma iterator so that it can be invalidated or updated
internally, avoiding the need for each caller to do so.

Signed-off-by: Liam R. Howlett <[email protected]>
---
kernel/sched/fair.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c36aa54ae071..9c9950249d7b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2938,11 +2938,11 @@ static void task_numa_work(struct callback_head *work)
struct task_struct *p = current;
struct mm_struct *mm = p->mm;
u64 runtime = p->se.sum_exec_runtime;
- MA_STATE(mas, &mm->mm_mt, 0, 0);
struct vm_area_struct *vma;
unsigned long start, end;
unsigned long nr_pte_updates = 0;
long pages, virtpages;
+ struct vma_iterator vmi;

SCHED_WARN_ON(p != container_of(work, struct task_struct, numa_work));

@@ -2995,16 +2995,16 @@ static void task_numa_work(struct callback_head *work)

if (!mmap_read_trylock(mm))
return;
- mas_set(&mas, start);
- vma = mas_find(&mas, ULONG_MAX);
+ vma_iter_init(&vmi, mm, start);
+ vma = vma_next(&vmi);
if (!vma) {
reset_ptenuma_scan(p);
start = 0;
- mas_set(&mas, start);
- vma = mas_find(&mas, ULONG_MAX);
+ vma_iter_set(&vmi, start);
+ vma = vma_next(&vmi);
}

- for (; vma; vma = mas_find(&mas, ULONG_MAX)) {
+ do {
if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
continue;
@@ -3051,7 +3051,7 @@ static void task_numa_work(struct callback_head *work)

cond_resched();
} while (end != vma->vm_end);
- }
+ } for_each_vma(vmi, vma);

out:
/*
--
2.35.1

2023-01-06 17:41:36

by SeongJae Park

[permalink] [raw]
Subject: Re: [PATCH v2 25/44] mm: Switch vma_merge(), split_vma(), and __split_vma to vma iterator

Hello Liam,

On Thu, 5 Jan 2023 19:15:59 +0000 Liam Howlett <[email protected]> wrote:

> From: "Liam R. Howlett" <[email protected]>
>
> Drop the vmi_* functions and transition all users to use the vma
> iterator directly.
>
> Signed-off-by: Liam R. Howlett <[email protected]>
> ---
> fs/userfaultfd.c | 14 ++++----
> include/linux/mm.h | 16 +++-------
> mm/madvise.c | 6 ++--
> mm/mempolicy.c | 6 ++--
> mm/mlock.c | 6 ++--
> mm/mmap.c | 79 +++++++++++++---------------------------------
> mm/mprotect.c | 6 ++--
> mm/mremap.c | 2 +-
> 8 files changed, 47 insertions(+), 88 deletions(-)
[...]
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2830,22 +2830,16 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
[...]
> -extern int vmi__split_vma(struct vma_iterator *vmi, struct mm_struct *,
> - struct vm_area_struct *, unsigned long addr, int new_below);
> -extern int split_vma(struct mm_struct *, struct vm_area_struct *,
> - unsigned long addr, int new_below);
> -extern int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *,
> - struct vm_area_struct *, unsigned long addr, int new_below);
> +extern int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
> + unsigned long addr, int new_below);
> +extern int split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
> + unsigned long addr, int new_below);

I just found that this change for split_vma() is also applied under
!CONFIG_MMU, where the definition of split_vma() is not changed, so it
causes a build error. I posted a simple fix for that:
https://lore.kernel.org/linux-mm/[email protected]/


Thanks,
SJ

[...]

2023-01-06 19:40:05

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [PATCH v2 25/44] mm: Switch vma_merge(), split_vma(), and __split_vma to vma iterator

* SeongJae Park <[email protected]> [230106 12:23]:
> Hello Liam,
>
> On Thu, 5 Jan 2023 19:15:59 +0000 Liam Howlett <[email protected]> wrote:
>
> > From: "Liam R. Howlett" <[email protected]>
> >
> > Drop the vmi_* functions and transition all users to use the vma
> > iterator directly.
> >
> > Signed-off-by: Liam R. Howlett <[email protected]>
> > ---
> > fs/userfaultfd.c | 14 ++++----
> > include/linux/mm.h | 16 +++-------
> > mm/madvise.c | 6 ++--
> > mm/mempolicy.c | 6 ++--
> > mm/mlock.c | 6 ++--
> > mm/mmap.c | 79 +++++++++++++---------------------------------
> > mm/mprotect.c | 6 ++--
> > mm/mremap.c | 2 +-
> > 8 files changed, 47 insertions(+), 88 deletions(-)
> [...]
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2830,22 +2830,16 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
> [...]
> > -extern int vmi__split_vma(struct vma_iterator *vmi, struct mm_struct *,
> > - struct vm_area_struct *, unsigned long addr, int new_below);
> > -extern int split_vma(struct mm_struct *, struct vm_area_struct *,
> > - unsigned long addr, int new_below);
> > -extern int vmi_split_vma(struct vma_iterator *vmi, struct mm_struct *,
> > - struct vm_area_struct *, unsigned long addr, int new_below);
> > +extern int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
> > + unsigned long addr, int new_below);
> > +extern int split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
> > + unsigned long addr, int new_below);
>
> I just found that this change for split_vma() is also applied under
> !CONFIG_MMU, where the definition of split_vma() is not changed, so it
> causes a build error. I posted a simple fix for that:
> https://lore.kernel.org/linux-mm/[email protected]/
>

Thanks. I think I need to revisit the nommu side of things with this
change as well. I was hoping to avoid that, but it seems to be more
necessary than I had thought.

2023-01-10 23:10:39

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust()

On Thu, Jan 05, 2023 at 07:15:44PM +0000, Liam Howlett wrote:

> This patch set does two things: 1. Clean up, including removal of
> __vma_adjust() and 2. Extends the VMA iterator API to provide type
> safety to the VMA operations using the maple tree, as requested by Linus
> [1].

This series *appears* to be causing some fun issues in -next for the
past couple of days or so. The initial failures were seen by KernelCI
on several platforms (I've mostly been trying various arm64 things, at
least 32 bit ARM is also affected). The initial symptom seen is that a
go binary called skipgen that gets invoked as part of the testing
silently faults; tweaking things so that we get as far as running the
arm64 selftests results in much more useful output, with various
things failing with actual error messages such as:

./fake_sigreturn_bad_magic: error while loading shared libraries: cannot make segment writable for relocation: Cannot allocate memory
./sve-test: error while loading shared libraries: cannot make segment writable for relocation: Cannot allocate memory

I'm fairly sure we're not actually running out of memory: there's no
OOM killer activity, the amount of memory the system has appears to
make no difference, and just replacing the kernel with a mainline
build runs as expected.

You can see the full run that produced the above errors at:

https://lava.sirena.org.uk/scheduler/job/88257

which also embeds links to all the binaries used, exact commands run and
so on. The failing binaries all appear to be execed from within a
testsuite, though it's not *all* binaries execed from within tests (eg,
vec-syscfg execs things and seems happy).

This has taken out a bunch of testsuites in KernelCI (and probably other
CI systems using test-definitions, though I didn't check).

I tried to bisect this but otherwise haven't made any effort to look at
the failure. The bisect sadly got lost in this series since a lot of
the series either fails to build with:

/home/broonie/git/bisect/mm/madvise.c: In function 'madvise_update_vma':
/home/broonie/git/bisect/mm/madvise.c:165:25: error: implicit declaration of function '__split_vma'; did you mean 'split_vma'? [-Werror=implicit-function-declaration]
165 | error = __split_vma(mm, vma, start, 1);
| ^~~~~~~~~~~
| split_vma

or fails to boot with something along the lines of:

<6>[ 6.054380] Freeing initrd memory: 86880K
<1>[ 6.087945] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000078
<1>[ 6.088231] Mem abort info:
<1>[ 6.088340] ESR = 0x0000000096000004
<1>[ 6.088504] EC = 0x25: DABT (current EL), IL = 32 bits
<1>[ 6.088671] SET = 0, FnV = 0
<1>[ 6.088802] EA = 0, S1PTW = 0
<1>[ 6.088929] FSC = 0x04: level 0 translation fault
<1>[ 6.089099] Data abort info:
<1>[ 6.089210] ISV = 0, ISS = 0x00000004
<1>[ 6.089347] CM = 0, WnR = 0
<1>[ 6.089486] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043e33000
<1>[ 6.089692] [0000000000000078] pgd=0000000000000000, p4d=0000000000000000
<0>[ 6.090566] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
<4>[ 6.090866] Modules linked in:
<4>[ 6.091167] CPU: 0 PID: 42 Comm: modprobe Not tainted 6.2.0-rc1-00190-g505c59767243 #13
<4>[ 6.091478] Hardware name: linux,dummy-virt (DT)
<4>[ 6.091784] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
<4>[ 6.092048] pc : mas_wr_walk+0x60/0x2d0
<4>[ 6.092622] lr : mas_wr_store_entry.isra.0+0x80/0x4a0
<4>[ 6.092798] sp : ffff80000821bb10
<4>[ 6.092926] x29: ffff80000821bb10 x28: ffff000003fa4480 x27: 0000000200100073
<4>[ 6.093206] x26: ffff000003fa41b0 x25: ffff000003fa43f0 x24: 0000000000000002
<4>[ 6.093445] x23: 0000000ffffae021 x22: 0000000000000000 x21: ffff000002a74440
<4>[ 6.093685] x20: ffff000003fa4480 x19: ffff80000821bc48 x18: 0000000000000000
<4>[ 6.093933] x17: 0000000000000000 x16: ffff000002b8da00 x15: ffff80000821bc48
<4>[ 6.094169] x14: 0000ffffae022fff x13: ffffffffffffffff x12: ffff000002b8da0c
<4>[ 6.094427] x11: ffff80000821bb68 x10: ffffd75265462458 x9 : ffff80000821bc48
<4>[ 6.094685] x8 : ffff80000821bbb8 x7 : ffff80000821bc48 x6 : ffffffffffffffff
<4>[ 6.094922] x5 : 000000000000000e x4 : 000000000000000e x3 : 0000000000000000
<4>[ 6.095167] x2 : 0000000000000008 x1 : 000000000000000f x0 : ffff80000821bb68
<4>[ 6.095499] Call trace:
<4>[ 6.095685] mas_wr_walk+0x60/0x2d0
<4>[ 6.095936] mas_store_prealloc+0x50/0xa0
<4>[ 6.096097] mmap_region+0x520/0x784
<4>[ 6.096232] do_mmap+0x3b0/0x52c
<4>[ 6.096347] vm_mmap_pgoff+0xe4/0x10c
<4>[ 6.096480] ksys_mmap_pgoff+0x4c/0x204
<4>[ 6.096621] __arm64_sys_mmap+0x30/0x44
<4>[ 6.096754] invoke_syscall+0x48/0x114
<4>[ 6.096900] el0_svc_common.constprop.0+0x44/0xec
<4>[ 6.097052] do_el0_svc+0x38/0xb0
<4>[ 6.097183] el0_svc+0x2c/0x84
<4>[ 6.097287] el0t_64_sync_handler+0xf4/0x120
<4>[ 6.097457] el0t_64_sync+0x190/0x194
<0>[ 6.097835] Code: 39402021 51000425 92401ca4 12001ca5 (f8647844)
<4>[ 6.098294] ---[ end trace 0000000000000000 ]---

(not always exactly the same backtrace, but the mas_wr_walk() was always
there.)

The specific set of commits in next-20230110 where bisect got lost was:

505c59767243 madvise: use vmi iterator for __split_vma() and vma_merge()
1cfdd2a44d6b mmap: pass through vmi iterator to __split_vma()
7d718fd9873c sched: convert to vma iterator
2f94851ec717 mmap: use vmi version of vma_merge()
7e2dd18353a3 task_mmu: convert to vma iterator
756841b468f5 mm/mremap: use vmi version of vma_merge()
aaba4ba837fa mempolicy: convert to vma iterator
8193673ee5d8 coredump: convert to vma iterator
d4f7ebf41a44 mm: switch vma_merge(), split_vma(), and __split_vma to vma iterator
4b02758dc3c5 mlock: convert mlock to vma iterator
fd367dac089e include/linux/mm: declare different type of split_vma() for !CONFIG_MMU
3a72a0174748 mm/damon: stop using vma_mas_store() for maple tree store
dd51a3ca1096 mm: change mprotect_fixup to vma iterator
b9e4eabb8f40 mmap: convert __vma_adjust() to use vma iterator
c6fc05242a09 userfaultfd: use vma iterator
b9000fd4c5a6 mmap-convert-__vma_adjust-to-use-vma-iterator-fix
bdfb333b0b2a ipc/shm: use the vma iterator for munmap calls
3128296746a1 mm: pass through vma iterator to __vma_adjust()
80c8eed1721e mm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()
311129a7971c mmap: convert vma_expand() to use vma iterator
69e9b6c8a525 madvise: use split_vma() instead of __split_vma()
751f0a6713a9 mm: remove unnecessary write to vma iterator in __vma_adjust()
a7f83eb601ef mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator
39fd6622223e mm: pass vma iterator through to __vma_adjust()

(that last one actually failed, the rest were skipped.) Full bisect
log:

git bisect start
# bad: [435bf71af3a0aa8067f3b87ff9febf68b564dbb6] Add linux-next specific files for 20230110
git bisect bad 435bf71af3a0aa8067f3b87ff9febf68b564dbb6
# good: [1fe4fd6f5cad346e598593af36caeadc4f5d4fa9] Merge tag 'xfs-6.2-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
git bisect good 1fe4fd6f5cad346e598593af36caeadc4f5d4fa9
# good: [57aac56e8af1628ef96055820f88ca547233b310] Merge branch 'drm-next' of git://git.freedesktop.org/git/drm/drm.git
git bisect good 57aac56e8af1628ef96055820f88ca547233b310
# good: [c9167d1c0ec75118a2859099255f68dc4d0779fd] Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
git bisect good c9167d1c0ec75118a2859099255f68dc4d0779fd
# good: [74f6598c9d8197774cfa9038c0cf0925cc5f178f] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git
git bisect good 74f6598c9d8197774cfa9038c0cf0925cc5f178f
# bad: [f434860645df3dc10aae20654f17eb30955196c6] drivers/misc/open-dice: don't touch VM_MAYSHARE
git bisect bad f434860645df3dc10aae20654f17eb30955196c6
# good: [f73d9ff6ef5a79d212319950dab7d6b1fdea9ee9] mm/page_reporting: replace rcu_access_pointer() with rcu_dereference_protected()
git bisect good f73d9ff6ef5a79d212319950dab7d6b1fdea9ee9
# skip: [311129a7971cb4b80038fca4b4ac0c6214dbc46f] mmap: convert vma_expand() to use vma iterator
git bisect skip 311129a7971cb4b80038fca4b4ac0c6214dbc46f
# bad: [85a9b62c63adb67becc48887b6e211a3760e1758] zram: correctly handle all next_arg() cases
git bisect bad 85a9b62c63adb67becc48887b6e211a3760e1758
# good: [f355b8d96876e06a6879e8936297474fdf8b5e82] mm/mmap: remove preallocation from do_mas_align_munmap()
git bisect good f355b8d96876e06a6879e8936297474fdf8b5e82
# skip: [061dc47414898c882c8ffb55c60434f41e844cb7] mm: add vma iterator to vma_adjust() arguments
git bisect skip 061dc47414898c882c8ffb55c60434f41e844cb7
# skip: [751f0a6713a94e739a924d8729fd58628e119ef6] mm: remove unnecessary write to vma iterator in __vma_adjust()
git bisect skip 751f0a6713a94e739a924d8729fd58628e119ef6
# skip: [505c597672439d99cb42b11b5ea56fbf00746e0a] madvise: use vmi iterator for __split_vma() and vma_merge()
git bisect skip 505c597672439d99cb42b11b5ea56fbf00746e0a
# skip: [b01b3b8a73656aa475df807c17e4a34254d3a4c1] mm: change munmap splitting order and move_vma()
git bisect skip b01b3b8a73656aa475df807c17e4a34254d3a4c1
# bad: [3eade064bd22a24bcde84bdf371fb746087f6c9b] mm: fix two spelling mistakes in highmem.h
git bisect bad 3eade064bd22a24bcde84bdf371fb746087f6c9b
# skip: [b9000fd4c5a64464e62e61da21f2101543b2e042] mmap-convert-__vma_adjust-to-use-vma-iterator-fix
git bisect skip b9000fd4c5a64464e62e61da21f2101543b2e042
# skip: [a7f83eb601efc719889279bf9981b4b3f23f0084] mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator
git bisect skip a7f83eb601efc719889279bf9981b4b3f23f0084
# skip: [1b55bb7e3b16724e91020c168eb50c40a1f5df88] mmap: clean up mmap_region() unrolling
git bisect skip 1b55bb7e3b16724e91020c168eb50c40a1f5df88
# bad: [4b9c180dfc284fbbecad8feaa4b5f86a12d04e49] mm/mmap: remove __vma_adjust()
git bisect bad 4b9c180dfc284fbbecad8feaa4b5f86a12d04e49
# skip: [3a72a017474833fca226699e3cc7a95cdf55d421] mm/damon: stop using vma_mas_store() for maple tree store
git bisect skip 3a72a017474833fca226699e3cc7a95cdf55d421
# skip: [bdfb333b0b2a025de350a01748be1406801f1f24] ipc/shm: use the vma iterator for munmap calls
git bisect skip bdfb333b0b2a025de350a01748be1406801f1f24
# skip: [2f94851ec717a9b318ac57c011af349a5ef20f5e] mmap: use vmi version of vma_merge()
git bisect skip 2f94851ec717a9b318ac57c011af349a5ef20f5e
# skip: [07364e5b9a1db3a939395c387e0222964b962561] mm: don't use __vma_adjust() in __split_vma()
git bisect skip 07364e5b9a1db3a939395c387e0222964b962561
# skip: [756841b468f59fd31c3dcd1ff574a2c582124a7e] mm/mremap: use vmi version of vma_merge()
git bisect skip 756841b468f59fd31c3dcd1ff574a2c582124a7e
# skip: [c6fc05242a095b7652e501ae73313730359a4bbb] userfaultfd: use vma iterator
git bisect skip c6fc05242a095b7652e501ae73313730359a4bbb
# skip: [3128296746a14cb620247ffd3f8ff38dd4c58102] mm: pass through vma iterator to __vma_adjust()
git bisect skip 3128296746a14cb620247ffd3f8ff38dd4c58102
# skip: [dd51a3ca1096d568a796b5b21851d9d07e5955eb] mm: change mprotect_fixup to vma iterator
git bisect skip dd51a3ca1096d568a796b5b21851d9d07e5955eb
# skip: [d4f7ebf41a4428a3ea6f202e297b7584f1109a78] mm: switch vma_merge(), split_vma(), and __split_vma to vma iterator
git bisect skip d4f7ebf41a4428a3ea6f202e297b7584f1109a78
# skip: [d2297db1d48afba5b74eb002c1cbf7beb8a5c241] mm/mmap: use vma_prepare() and vma_complete() in vma_expand()
git bisect skip d2297db1d48afba5b74eb002c1cbf7beb8a5c241
# skip: [fd367dac089e27a60bc0700dc272428cb9da8446] include/linux/mm: declare different type of split_vma() for !CONFIG_MMU
git bisect skip fd367dac089e27a60bc0700dc272428cb9da8446
# skip: [4b02758dc3c5f80582e4c822d28ef271828b8d68] mlock: convert mlock to vma iterator
git bisect skip 4b02758dc3c5f80582e4c822d28ef271828b8d68
# skip: [69e9b6c8a5256fdc6a5854375e6d231527f33247] madvise: use split_vma() instead of __split_vma()
git bisect skip 69e9b6c8a5256fdc6a5854375e6d231527f33247
# skip: [0471d6b0df5e8afe03cb7ff3cd507dd8d45dd0ac] mm/mmap: refactor locking out of __vma_adjust()
git bisect skip 0471d6b0df5e8afe03cb7ff3cd507dd8d45dd0ac
# skip: [b9e4eabb8f40e7dae4b0d5f33826b6d27c33a6e7] mmap: convert __vma_adjust() to use vma iterator
git bisect skip b9e4eabb8f40e7dae4b0d5f33826b6d27c33a6e7
# skip: [edd9f4829c57c856109764d6c1140428b9f275b5] mm/mmap: move anon_vma setting in __vma_adjust()
git bisect skip edd9f4829c57c856109764d6c1140428b9f275b5
# skip: [1cfdd2a44d6b142dc6c16108e1efc8404c21f3b6] mmap: pass through vmi iterator to __split_vma()
git bisect skip 1cfdd2a44d6b142dc6c16108e1efc8404c21f3b6
# skip: [fc63eb0e3016002ee0683829f0673463ee0d855e] mm/mmap: introduce init_vma_prep() and init_multi_vma_prep()
git bisect skip fc63eb0e3016002ee0683829f0673463ee0d855e
# bad: [39fd6622223e2f26f585c2c19cf69443ba5b3549] mm: pass vma iterator through to __vma_adjust()
git bisect bad 39fd6622223e2f26f585c2c19cf69443ba5b3549
# skip: [7d718fd9873c157fc791816829ece1a96e7ac4d3] sched: convert to vma iterator
git bisect skip 7d718fd9873c157fc791816829ece1a96e7ac4d3
# skip: [aaba4ba837fa08bb6e822d0726a6718f861661d7] mempolicy: convert to vma iterator
git bisect skip aaba4ba837fa08bb6e822d0726a6718f861661d7
# skip: [7e2dd18353a3f09d2ad16cd4977dd9d716104863] task_mmu: convert to vma iterator
git bisect skip 7e2dd18353a3f09d2ad16cd4977dd9d716104863
# skip: [8193673ee5d8a88563cfd5f5befe299c41d49e54] coredump: convert to vma iterator
git bisect skip 8193673ee5d8a88563cfd5f5befe299c41d49e54
# skip: [80c8eed1721ee630b2494f14f239d7b3389dac7e] mm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()
git bisect skip 80c8eed1721ee630b2494f14f239d7b3389dac7e
# only skipped commits left to test
# possible first bad commit: [39fd6622223e2f26f585c2c19cf69443ba5b3549] mm: pass vma iterator through to __vma_adjust()
# possible first bad commit: [751f0a6713a94e739a924d8729fd58628e119ef6] mm: remove unnecessary write to vma iterator in __vma_adjust()
# possible first bad commit: [69e9b6c8a5256fdc6a5854375e6d231527f33247] madvise: use split_vma() instead of __split_vma()
# possible first bad commit: [3128296746a14cb620247ffd3f8ff38dd4c58102] mm: pass through vma iterator to __vma_adjust()
# possible first bad commit: [b9000fd4c5a64464e62e61da21f2101543b2e042] mmap-convert-__vma_adjust-to-use-vma-iterator-fix
# possible first bad commit: [b9e4eabb8f40e7dae4b0d5f33826b6d27c33a6e7] mmap: convert __vma_adjust() to use vma iterator
# possible first bad commit: [3a72a017474833fca226699e3cc7a95cdf55d421] mm/damon: stop using vma_mas_store() for maple tree store
# possible first bad commit: [fd367dac089e27a60bc0700dc272428cb9da8446] include/linux/mm: declare different type of split_vma() for !CONFIG_MMU
# possible first bad commit: [d4f7ebf41a4428a3ea6f202e297b7584f1109a78] mm: switch vma_merge(), split_vma(), and __split_vma to vma iterator
# possible first bad commit: [756841b468f59fd31c3dcd1ff574a2c582124a7e] mm/mremap: use vmi version of vma_merge()
# possible first bad commit: [2f94851ec717a9b318ac57c011af349a5ef20f5e] mmap: use vmi version of vma_merge()
# possible first bad commit: [1cfdd2a44d6b142dc6c16108e1efc8404c21f3b6] mmap: pass through vmi iterator to __split_vma()
# possible first bad commit: [505c597672439d99cb42b11b5ea56fbf00746e0a] madvise: use vmi iterator for __split_vma() and vma_merge()
# possible first bad commit: [7d718fd9873c157fc791816829ece1a96e7ac4d3] sched: convert to vma iterator
# possible first bad commit: [7e2dd18353a3f09d2ad16cd4977dd9d716104863] task_mmu: convert to vma iterator
# possible first bad commit: [aaba4ba837fa08bb6e822d0726a6718f861661d7] mempolicy: convert to vma iterator
# possible first bad commit: [8193673ee5d8a88563cfd5f5befe299c41d49e54] coredump: convert to vma iterator
# possible first bad commit: [4b02758dc3c5f80582e4c822d28ef271828b8d68] mlock: convert mlock to vma iterator
# possible first bad commit: [dd51a3ca1096d568a796b5b21851d9d07e5955eb] mm: change mprotect_fixup to vma iterator
# possible first bad commit: [c6fc05242a095b7652e501ae73313730359a4bbb] userfaultfd: use vma iterator
# possible first bad commit: [bdfb333b0b2a025de350a01748be1406801f1f24] ipc/shm: use the vma iterator for munmap calls
# possible first bad commit: [80c8eed1721ee630b2494f14f239d7b3389dac7e] mm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()
# possible first bad commit: [311129a7971cb4b80038fca4b4ac0c6214dbc46f] mmap: convert vma_expand() to use vma iterator
# possible first bad commit: [a7f83eb601efc719889279bf9981b4b3f23f0084] mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator
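For anyone picking this up: the log above can be replayed directly
rather than redoing the good/bad/skip decisions by hand.  A minimal
sketch, assuming the log is saved locally as bisect.log (the file name
is arbitrary):

  $ git bisect replay bisect.log
  # build and test the checked-out commit, then mark it:
  $ git bisect good    # or: git bisect bad / git bisect skip

git bisect replay re-applies every decision recorded above and checks
out the next candidate, so none of the already-tested commits need to
be rebuilt.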



2023-01-11 02:51:43

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [PATCH v2 00/44] VMA tree type safety and remove __vma_adjust()

* Mark Brown <[email protected]> [230110 17:52]:
> On Thu, Jan 05, 2023 at 07:15:44PM +0000, Liam Howlett wrote:
>
> > This patch set does two things: 1. Clean up, including removal of
> > __vma_adjust() and 2. Extends the VMA iterator API to provide type
> > safety to the VMA operations using the maple tree, as requested by Linus
> > [1].
>
> This series *appears* to be causing some fun issues in -next for the
> past couple of days or so. The initial failures were seen by KernelCI
> on several platforms (I've mostly been trying various arm64 things, at
> least 32-bit ARM is also affected).  The initial symptom seen is that
> a Go binary called skipgen, which gets invoked as part of the testing,
> silently faults; tweaking things so that we get as far as running the
> arm64 selftests results in much more useful output, with various
> things failing with actual error messages such as:
>
> ./fake_sigreturn_bad_magic: error while loading shared libraries: cannot make segment writable for relocation: Cannot allocate memory
> ./sve-test: error while loading shared libraries: cannot make segment writable for relocation: Cannot allocate memory
>
> I'm fairly sure we're not actually running out of memory: there's no
> OOM killer activity, the amount of memory the system has appears to
> make no difference, and simply replacing the kernel with a mainline
> build runs as expected.


Thanks for the detailed analysis.  This series has been dropped from
mm-unstable and should be out of linux-next by tomorrow.

I will retest my series against a larger number of platforms before
sending out the next revision.
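
For what it's worth, the loader errors above are consistent with
mprotect(2) returning ENOMEM when the kernel cannot split or merge a
VMA, rather than with the system actually being out of memory: ld.so
has to make a segment writable to apply the relocation, and a failure
in the VMA bookkeeping surfaces through that mprotect() call.  A
minimal userspace sketch of the same split path (the layout here is
illustrative, not taken from the failing tests):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        long page = sysconf(_SC_PAGESIZE);

        /* Map three pages read-only, then change the middle page only:
         * the kernel must split one VMA into three, the same
         * split/merge path the vma iterator conversion touches. */
        char *p = mmap(NULL, 3 * page, PROT_READ,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
                return 1;

        if (mprotect(p + page, page, PROT_READ | PROT_WRITE))
                fprintf(stderr, "mprotect: %s\n", strerror(errno));

        munmap(p, 3 * page);
        return 0;
}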

>
> You can see the full run that produced the above errors at:
>
> https://lava.sirena.org.uk/scheduler/job/88257
>
> which also embeds links to all the binaries used, exact commands run and
> so on.  The failing binaries all appear to be exec'd from within a
> testsuite, though it's not *all* binaries exec'd from within tests
> (e.g., vec-syscfg execs things and seems happy).
>
> This has taken out a bunch of testsuites in KernelCI (and probably other
> CI systems using test-definitions, though I didn't check).
>
> I tried to bisect this but otherwise haven't made any effort to look at
> the failure. The bisect sadly got lost in this series since a lot of
> the series either fails to build with:
>
> /home/broonie/git/bisect/mm/madvise.c: In function 'madvise_update_vma':
> /home/broonie/git/bisect/mm/madvise.c:165:25: error: implicit declaration of function '__split_vma'; did you mean 'split_vma'? [-Werror=implicit-function-declaration]
> 165 | error = __split_vma(mm, vma, start, 1);
> | ^~~~~~~~~~~
> | split_vma

Thanks.  This was reported to me before, and a fix is already in
mm-unstable.  I'll squash it into the series for v3.
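
The rule this broke is that every commit in the series must build on
its own so bisection works; while a function is mid-conversion, a
temporary wrapper can keep unconverted callers compiling until they
are switched over.  A rough sketch of that pattern (the function names
below are illustrative, not the exact code in this series):

/* Illustrative only: keep old-style callers of __split_vma() building
 * while they are converted one by one; delete this wrapper once the
 * last caller passes a vma iterator directly. */
static inline int __split_vma(struct mm_struct *mm,
                              struct vm_area_struct *vma,
                              unsigned long addr, int new_below)
{
        VMA_ITERATOR(vmi, mm, addr);

        return vmi_split_vma(&vmi, vma, addr, new_below); /* hypothetical */
}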

>
> or fails to boot with something along the lines of:
>
> <6>[ 6.054380] Freeing initrd memory: 86880K
> <1>[ 6.087945] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000078
> <1>[ 6.088231] Mem abort info:
> <1>[ 6.088340] ESR = 0x0000000096000004
> <1>[ 6.088504] EC = 0x25: DABT (current EL), IL = 32 bits
> <1>[ 6.088671] SET = 0, FnV = 0
> <1>[ 6.088802] EA = 0, S1PTW = 0
> <1>[ 6.088929] FSC = 0x04: level 0 translation fault
> <1>[ 6.089099] Data abort info:
> <1>[ 6.089210] ISV = 0, ISS = 0x00000004
> <1>[ 6.089347] CM = 0, WnR = 0
> <1>[ 6.089486] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043e33000
> <1>[ 6.089692] [0000000000000078] pgd=0000000000000000, p4d=0000000000000000
> <0>[ 6.090566] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> <4>[ 6.090866] Modules linked in:
> <4>[ 6.091167] CPU: 0 PID: 42 Comm: modprobe Not tainted 6.2.0-rc1-00190-g505c59767243 #13
> <4>[ 6.091478] Hardware name: linux,dummy-virt (DT)
> <4>[ 6.091784] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
> <4>[ 6.092048] pc : mas_wr_walk+0x60/0x2d0
> <4>[ 6.092622] lr : mas_wr_store_entry.isra.0+0x80/0x4a0
> <4>[ 6.092798] sp : ffff80000821bb10
> <4>[ 6.092926] x29: ffff80000821bb10 x28: ffff000003fa4480 x27: 0000000200100073
> <4>[ 6.093206] x26: ffff000003fa41b0 x25: ffff000003fa43f0 x24: 0000000000000002
> <4>[ 6.093445] x23: 0000000ffffae021 x22: 0000000000000000 x21: ffff000002a74440
> <4>[ 6.093685] x20: ffff000003fa4480 x19: ffff80000821bc48 x18: 0000000000000000
> <4>[ 6.093933] x17: 0000000000000000 x16: ffff000002b8da00 x15: ffff80000821bc48
> <4>[ 6.094169] x14: 0000ffffae022fff x13: ffffffffffffffff x12: ffff000002b8da0c
> <4>[ 6.094427] x11: ffff80000821bb68 x10: ffffd75265462458 x9 : ffff80000821bc48
> <4>[ 6.094685] x8 : ffff80000821bbb8 x7 : ffff80000821bc48 x6 : ffffffffffffffff
> <4>[ 6.094922] x5 : 000000000000000e x4 : 000000000000000e x3 : 0000000000000000
> <4>[ 6.095167] x2 : 0000000000000008 x1 : 000000000000000f x0 : ffff80000821bb68
> <4>[ 6.095499] Call trace:
> <4>[ 6.095685] mas_wr_walk+0x60/0x2d0
> <4>[ 6.095936] mas_store_prealloc+0x50/0xa0
> <4>[ 6.096097] mmap_region+0x520/0x784
> <4>[ 6.096232] do_mmap+0x3b0/0x52c
> <4>[ 6.096347] vm_mmap_pgoff+0xe4/0x10c
> <4>[ 6.096480] ksys_mmap_pgoff+0x4c/0x204
> <4>[ 6.096621] __arm64_sys_mmap+0x30/0x44
> <4>[ 6.096754] invoke_syscall+0x48/0x114
> <4>[ 6.096900] el0_svc_common.constprop.0+0x44/0xec
> <4>[ 6.097052] do_el0_svc+0x38/0xb0
> <4>[ 6.097183] el0_svc+0x2c/0x84
> <4>[ 6.097287] el0t_64_sync_handler+0xf4/0x120
> <4>[ 6.097457] el0t_64_sync+0x190/0x194
> <0>[ 6.097835] Code: 39402021 51000425 92401ca4 12001ca5 (f8647844)
> <4>[ 6.098294] ---[ end trace 0000000000000000 ]---
>
> (not always exactly the same backtrace, but the mas_wr_walk() was always
> there.)

Thanks.  This was also reported, and a fix for it had already landed in
mm-unstable as well.
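
For reference, the sequence that oopses is the preallocate-then-store
pattern mmap_region() relies on so that the maple tree write cannot
fail past the point of no return; roughly (a simplified sketch, not
the exact mm-unstable code):

        /* Allocate the maple tree nodes up front, while the mapping
         * can still be backed out cleanly... */
        if (mas_preallocate(&mas, vma, GFP_KERNEL)) {
                error = -ENOMEM;
                goto unwind;            /* illustrative error path */
        }

        /* ... set up the vma, rmap, etc. ... */

        /* ...so the actual store cannot fail here; this is the
         * mas_store_prealloc() in the backtrace above. */
        mas_store_prealloc(&mas, vma);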

>
> The specific set of commits in next-20230110 where bisect got lost was:
>
> 505c59767243 madvise: use vmi iterator for __split_vma() and vma_merge()
> 1cfdd2a44d6b mmap: pass through vmi iterator to __split_vma()
> 7d718fd9873c sched: convert to vma iterator
> 2f94851ec717 mmap: use vmi version of vma_merge()
> 7e2dd18353a3 task_mmu: convert to vma iterator
> 756841b468f5 mm/mremap: use vmi version of vma_merge()
> aaba4ba837fa mempolicy: convert to vma iterator
> 8193673ee5d8 coredump: convert to vma iterator
> d4f7ebf41a44 mm: switch vma_merge(), split_vma(), and __split_vma to vma iterator
> 4b02758dc3c5 mlock: convert mlock to vma iterator
> fd367dac089e include/linux/mm: declare different type of split_vma() for !CONFIG_MMU
> 3a72a0174748 mm/damon: stop using vma_mas_store() for maple tree store
> dd51a3ca1096 mm: change mprotect_fixup to vma iterator
> b9e4eabb8f40 mmap: convert __vma_adjust() to use vma iterator
> c6fc05242a09 userfaultfd: use vma iterator
> b9000fd4c5a6 mmap-convert-__vma_adjust-to-use-vma-iterator-fix
> bdfb333b0b2a ipc/shm: use the vma iterator for munmap calls
> 3128296746a1 mm: pass through vma iterator to __vma_adjust()
> 80c8eed1721e mm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()
> 311129a7971c mmap: convert vma_expand() to use vma iterator
> 69e9b6c8a525 madvise: use split_vma() instead of __split_vma()
> 751f0a6713a9 mm: remove unnecessary write to vma iterator in __vma_adjust()
> a7f83eb601ef mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator
> 39fd6622223e mm: pass vma iterator through to __vma_adjust()
>

...

I appreciate you running through the bisect and bringing this to my
attention.

I will do a better job of sending the fixes to linux-next, which I
obviously overlooked this time.  A per-commit build test before posting
would have caught this, as sketched below.
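
One way to make that systematic is to build every patch in the series
before posting, for example with git rebase's exec hook.  A sketch,
assuming the series sits on a local branch based on mm-unstable:

  $ git rebase --exec 'make -j"$(nproc)"' mm-unstable

The rebase stops on the first commit that fails to build, which would
have caught the __split_vma() build break above before it hit
linux-next.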