2022-05-04 10:18:09

by Liam R. Howlett

Subject: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

From: "Liam R. Howlett" <[email protected]>

Remove __do_munmap() in favour of do_munmap(), do_mas_munmap(), and
do_mas_align_munmap().

do_munmap() is a wrapper to create a maple state for any callers that have
not been converted to the maple tree.
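It boils down to the hunk added further down in this patch: build a
maple state on the stack and hand it to do_mas_munmap().

	int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
		      struct list_head *uf)
	{
		/* Maple state rooted at this mm's tree, positioned at start */
		MA_STATE(mas, &mm->mm_mt, start, start);

		return do_mas_munmap(&mas, mm, start, len, uf, false);
	}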

do_mas_munmap() takes a maple state to munmap a range. This is just a
small function which checks for error conditions and aligns the end of the
range.
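Condensed (mirroring the function added in the mm/mmap.c hunk below),
the checks and alignment are:

	if (offset_in_page(start) || start > TASK_SIZE || len > TASK_SIZE - start)
		return -EINVAL;

	end = start + PAGE_ALIGN(len);	/* align the end of the range */
	if (end == start)
		return -EINVAL;

	arch_unmap(mm, start, end);	/* arch_unmap() might do unmaps itself */

	vma = mas_find(mas, end - 1);	/* first VMA overlapping the range */
	if (!vma)
		return 0;

	return do_mas_align_munmap(mas, vma, mm, start, end, uf, downgrade);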

do_mas_align_munmap() uses the aligned range to munmap a range.
do_mas_align_munmap() starts with the first VMA in the range, then finds
the last VMA in the range. Both start and end are split if necessary.
Then the VMAs are removed from the linked list and the mm mlock count is
updated at the same time, followed by a single tree operation that
overwrites the area with NULL. Finally, the detached list is unmapped
and freed.
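In rough outline (a simplified sketch of the function added below; error
handling, the userfaultfd preparation and the mmap_lock downgrade check
are omitted):

	/* Split the first and last VMAs if the range is not VMA-aligned */
	if (start > vma->vm_start)
		error = __split_vma(mm, vma, start, 0);
	if (last && end < last->vm_end)
		error = __split_vma(mm, last, end, 1);

	/* One pass over the doomed VMAs: adjust locked_vm and map_count */
	mm->map_count -= unlock_range(vma, &last, end);

	/* A single preallocated tree store overwrites the range with NULL */
	mas_store_prealloc(mas, NULL);

	/* Detach the VMAs from the mm linked list, then unmap and free */
	unmap_region(mm, vma, prev, start, end);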

By reorganizing the munmap calls as outlined, it is now possible to avoid
the extra work of aligning ranges for pre-aligned callers that are known
to be safe, and to avoid extra VMA lookups or tree walks for modifications.

detach_vmas_to_be_unmapped() is no longer used, so drop this code.

vm_brk_flags() can just call do_mas_munmap(), since do_mas_munmap() checks
for intersecting VMAs directly.
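As in the vm_brk_flags() hunk below, the call site goes from probing the
tree and then conditionally unmapping:

	if (find_vma_intersection(mm, addr, addr + len))
		ret = do_munmap(mm, addr, len, &uf);

to a single call, since do_mas_munmap() already returns 0 when mas_find()
sees no overlapping VMA:

	ret = do_mas_munmap(&mas, mm, addr, len, &uf, 0);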

Signed-off-by: Liam R. Howlett <[email protected]>
---
include/linux/mm.h | 5 +-
mm/mmap.c | 231 ++++++++++++++++++++++++++++-----------------
mm/mremap.c | 17 ++--
3 files changed, 158 insertions(+), 95 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f6d633f04a64..0cc2cb692a78 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2712,8 +2712,9 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr,
extern unsigned long do_mmap(struct file *file, unsigned long addr,
unsigned long len, unsigned long prot, unsigned long flags,
unsigned long pgoff, unsigned long *populate, struct list_head *uf);
-extern int __do_munmap(struct mm_struct *, unsigned long, size_t,
- struct list_head *uf, bool downgrade);
+extern int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
+ unsigned long start, size_t len, struct list_head *uf,
+ bool downgrade);
extern int do_munmap(struct mm_struct *, unsigned long, size_t,
struct list_head *uf);
extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior);
diff --git a/mm/mmap.c b/mm/mmap.c
index d49dca8fecd5..dd21f0a3f236 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2372,47 +2372,6 @@ static void unmap_region(struct mm_struct *mm,
tlb_finish_mmu(&tlb);
}

-/*
- * Create a list of vma's touched by the unmap, removing them from the mm's
- * vma list as we go..
- */
-static bool
-detach_vmas_to_be_unmapped(struct mm_struct *mm, struct ma_state *mas,
- struct vm_area_struct *vma, struct vm_area_struct *prev,
- unsigned long end)
-{
- struct vm_area_struct **insertion_point;
- struct vm_area_struct *tail_vma = NULL;
-
- insertion_point = (prev ? &prev->vm_next : &mm->mmap);
- vma->vm_prev = NULL;
- vma_mas_szero(mas, vma->vm_start, end);
- do {
- if (vma->vm_flags & VM_LOCKED)
- mm->locked_vm -= vma_pages(vma);
- mm->map_count--;
- tail_vma = vma;
- vma = vma->vm_next;
- } while (vma && vma->vm_start < end);
- *insertion_point = vma;
- if (vma)
- vma->vm_prev = prev;
- else
- mm->highest_vm_end = prev ? vm_end_gap(prev) : 0;
- tail_vma->vm_next = NULL;
-
- /*
- * Do not downgrade mmap_lock if we are next to VM_GROWSDOWN or
- * VM_GROWSUP VMA. Such VMAs can change their size under
- * down_read(mmap_lock) and collide with the VMA we are about to unmap.
- */
- if (vma && (vma->vm_flags & VM_GROWSDOWN))
- return false;
- if (prev && (prev->vm_flags & VM_GROWSUP))
- return false;
- return true;
-}
-
/*
* __split_vma() bypasses sysctl_max_map_count checking. We use this where it
* has already been checked or doesn't make sense to fail.
@@ -2492,40 +2451,51 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
return __split_vma(mm, vma, addr, new_below);
}

-/* Munmap is split into 2 main parts -- this part which finds
- * what needs doing, and the areas themselves, which do the
- * work. This now handles partial unmappings.
- * Jeremy Fitzhardinge <[email protected]>
- */
-int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
- struct list_head *uf, bool downgrade)
+static inline int
+unlock_range(struct vm_area_struct *start, struct vm_area_struct **tail,
+ unsigned long limit)
{
- unsigned long end;
- struct vm_area_struct *vma, *prev, *last;
- int error = -ENOMEM;
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ struct mm_struct *mm = start->vm_mm;
+ struct vm_area_struct *tmp = start;
+ int count = 0;

- if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start)
- return -EINVAL;
+ while (tmp && tmp->vm_start < limit) {
+ *tail = tmp;
+ count++;
+ if (tmp->vm_flags & VM_LOCKED)
+ mm->locked_vm -= vma_pages(tmp);

- len = PAGE_ALIGN(len);
- end = start + len;
- if (len == 0)
- return -EINVAL;
+ tmp = tmp->vm_next;
+ }

- /* arch_unmap() might do unmaps itself. */
- arch_unmap(mm, start, end);
+ return count;
+}

- /* Find the first overlapping VMA where start < vma->vm_end */
- vma = find_vma_intersection(mm, start, end);
- if (!vma)
- return 0;
+/*
+ * do_mas_align_munmap() - munmap the aligned region from @start to @end.
+ * @mas: The maple_state, ideally set up to alter the correct tree location.
+ * @vma: The starting vm_area_struct
+ * @mm: The mm_struct
+ * @start: The aligned start address to munmap.
+ * @end: The aligned end address to munmap.
+ * @uf: The userfaultfd list_head
+ * @downgrade: Set to true to attempt a write downgrade of the mmap_sem
+ *
+ * If @downgrade is true, check return code for potential release of the lock.
+ */
+static int
+do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
+ struct mm_struct *mm, unsigned long start,
+ unsigned long end, struct list_head *uf, bool downgrade)
+{
+ struct vm_area_struct *prev, *last;
+ int error = -ENOMEM;
+ /* we have start < vma->vm_end */

- if (mas_preallocate(&mas, vma, GFP_KERNEL))
+ if (mas_preallocate(mas, vma, GFP_KERNEL))
return -ENOMEM;
- prev = vma->vm_prev;
- /* we have start < vma->vm_end */

+ mas->last = end - 1;
/*
* If we need to split any vma, do it now to save pain later.
*
@@ -2546,17 +2516,31 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
error = __split_vma(mm, vma, start, 0);
if (error)
goto split_failed;
+
prev = vma;
+ vma = __vma_next(mm, prev);
+ mas->index = start;
+ mas_reset(mas);
+ } else {
+ prev = vma->vm_prev;
}

+ if (vma->vm_end >= end)
+ last = vma;
+ else
+ last = find_vma_intersection(mm, end - 1, end);
+
/* Does it split the last one? */
- last = find_vma(mm, end);
- if (last && end > last->vm_start) {
+ if (last && end < last->vm_end) {
error = __split_vma(mm, last, end, 1);
+
if (error)
goto split_failed;
+
+ if (vma == last)
+ vma = __vma_next(mm, prev);
+ mas_reset(mas);
}
- vma = __vma_next(mm, prev);

if (unlikely(uf)) {
/*
@@ -2569,16 +2553,46 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
* failure that it's not worth optimizing it for.
*/
error = userfaultfd_unmap_prep(vma, start, end, uf);
+
if (error)
goto userfaultfd_error;
}

- /* Detach vmas from rbtree */
- if (!detach_vmas_to_be_unmapped(mm, &mas, vma, prev, end))
- downgrade = false;
+ /*
+ * unlock any mlock()ed ranges before detaching vmas, count the number
+ * of VMAs to be dropped, and return the tail entry of the affected
+ * area.
+ */
+ mm->map_count -= unlock_range(vma, &last, end);
+ /* Drop removed area from the tree */
+ mas_store_prealloc(mas, NULL);

- if (downgrade)
- mmap_write_downgrade(mm);
+ /* Detach vmas from the MM linked list */
+ vma->vm_prev = NULL;
+ if (prev)
+ prev->vm_next = last->vm_next;
+ else
+ mm->mmap = last->vm_next;
+
+ if (last->vm_next) {
+ last->vm_next->vm_prev = prev;
+ last->vm_next = NULL;
+ } else
+ mm->highest_vm_end = prev ? vm_end_gap(prev) : 0;
+
+ /*
+ * Do not downgrade mmap_lock if we are next to VM_GROWSDOWN or
+ * VM_GROWSUP VMA. Such VMAs can change their size under
+ * down_read(mmap_lock) and collide with the VMA we are about to unmap.
+ */
+ if (downgrade) {
+ if (last && (last->vm_flags & VM_GROWSDOWN))
+ downgrade = false;
+ else if (prev && (prev->vm_flags & VM_GROWSUP))
+ downgrade = false;
+ else
+ mmap_write_downgrade(mm);
+ }

unmap_region(mm, vma, prev, start, end);

@@ -2592,14 +2606,63 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
map_count_exceeded:
split_failed:
userfaultfd_error:
- mas_destroy(&mas);
+ mas_destroy(mas);
return error;
}

+/*
+ * do_mas_munmap() - munmap a given range.
+ * @mas: The maple state
+ * @mm: The mm_struct
+ * @start: The start address to munmap
+ * @len: The length of the range to munmap
+ * @uf: The userfaultfd list_head
+ * @downgrade: set to true if the user wants to attempt to write_downgrade the
+ * mmap_sem
+ *
+ * This function takes a @mas that is either pointing to the previous VMA or set
+ * to MA_START and sets it up to remove the mapping(s). The @len will be
+ * aligned and any arch_unmap work will be performed.
+ *
+ * Returns: -EINVAL on failure, 1 on success and unlock, 0 otherwise.
+ */
+int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
+ unsigned long start, size_t len, struct list_head *uf,
+ bool downgrade)
+{
+ unsigned long end;
+ struct vm_area_struct *vma;
+
+ if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start)
+ return -EINVAL;
+
+ end = start + PAGE_ALIGN(len);
+ if (end == start)
+ return -EINVAL;
+
+ /* arch_unmap() might do unmaps itself. */
+ arch_unmap(mm, start, end);
+
+ /* Find the first overlapping VMA */
+ vma = mas_find(mas, end - 1);
+ if (!vma)
+ return 0;
+
+ return do_mas_align_munmap(mas, vma, mm, start, end, uf, downgrade);
+}
+
+/* do_munmap() - Wrapper function for non-maple tree aware do_munmap() calls.
+ * @mm: The mm_struct
+ * @start: The start address to munmap
+ * @len: The length to be munmapped.
+ * @uf: The userfaultfd list_head
+ */
int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
struct list_head *uf)
{
- return __do_munmap(mm, start, len, uf, false);
+ MA_STATE(mas, &mm->mm_mt, start, start);
+
+ return do_mas_munmap(&mas, mm, start, len, uf, false);
}

unsigned long mmap_region(struct file *file, unsigned long addr,
@@ -2633,7 +2696,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
}

/* Unmap any existing mapping in the area */
- if (do_munmap(mm, addr, len, uf))
+ if (do_mas_munmap(&mas, mm, addr, len, uf, false))
return -ENOMEM;

/*
@@ -2845,11 +2908,12 @@ static int __vm_munmap(unsigned long start, size_t len, bool downgrade)
int ret;
struct mm_struct *mm = current->mm;
LIST_HEAD(uf);
+ MA_STATE(mas, &mm->mm_mt, start, start);

if (mmap_write_lock_killable(mm))
return -EINTR;

- ret = __do_munmap(mm, start, len, &uf, downgrade);
+ ret = do_mas_munmap(&mas, mm, start, len, &uf, downgrade);
/*
* Returning 1 indicates mmap_lock is downgraded.
* But 1 is not legal return value of vm_munmap() and munmap(), reset
@@ -2984,10 +3048,7 @@ static int do_brk_munmap(struct ma_state *mas, struct vm_area_struct *vma,
if (likely((vma->vm_end < oldbrk) ||
((vma->vm_start == newbrk) && (vma->vm_end == oldbrk)))) {
/* remove entire mapping(s) */
- mas_set(mas, newbrk);
- if (vma->vm_start != newbrk)
- mas_reset(mas); /* cause a re-walk for the first overlap. */
- ret = __do_munmap(mm, newbrk, oldbrk - newbrk, uf, true);
+ ret = do_mas_munmap(mas, mm, newbrk, oldbrk-newbrk, uf, true);
goto munmap_full_vma;
}

@@ -3168,9 +3229,7 @@ int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
if (ret)
goto limits_failed;

- if (find_vma_intersection(mm, addr, addr + len))
- ret = do_munmap(mm, addr, len, &uf);
-
+ ret = do_mas_munmap(&mas, mm, addr, len, &uf, 0);
if (ret)
goto munmap_failed;

diff --git a/mm/mremap.c b/mm/mremap.c
index 98f50e633009..4495f69eccbe 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -975,20 +975,23 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
/*
* Always allow a shrinking remap: that just unmaps
* the unnecessary pages..
- * __do_munmap does all the needed commit accounting, and
+ * do_mas_munmap does all the needed commit accounting, and
* downgrades mmap_lock to read if so directed.
*/
if (old_len >= new_len) {
int retval;
+ MA_STATE(mas, &mm->mm_mt, addr + new_len, addr + new_len);

- retval = __do_munmap(mm, addr+new_len, old_len - new_len,
- &uf_unmap, true);
- if (retval < 0 && old_len != new_len) {
- ret = retval;
- goto out;
+ retval = do_mas_munmap(&mas, mm, addr + new_len,
+ old_len - new_len, &uf_unmap, true);
/* Returning 1 indicates mmap_lock is downgraded to read. */
- } else if (retval == 1)
+ if (retval == 1) {
downgraded = true;
+ } else if (retval < 0 && old_len != new_len) {
+ ret = retval;
+ goto out;
+ }
+
ret = addr;
goto out;
}
--
2.35.1


2022-06-06 12:12:57

by Qian Cai

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

On Wed, May 04, 2022 at 01:13:53AM +0000, Liam Howlett wrote:
> From: "Liam R. Howlett" <[email protected]>
>
> Remove __do_munmap() in favour of do_munmap(), do_mas_munmap(), and
> do_mas_align_munmap().
>
> do_munmap() is a wrapper to create a maple state for any callers that have
> not been converted to the maple tree.
>
> do_mas_munmap() takes a maple state to munmap a range. This is just a
> small function which checks for error conditions and aligns the end of the
> range.
>
> do_mas_align_munmap() uses the aligned range to munmap a range.
> do_mas_align_munmap() starts with the first VMA in the range, then finds
> the last VMA in the range. Both start and end are split if necessary.
> Then the VMAs are removed from the linked list and the mm mlock count is
> updated at the same time. Followed by a single tree operation of
> overwriting the area in with a NULL. Finally, the detached list is
> unmapped and freed.
>
> By reorganizing the munmap calls as outlined, it is now possible to avoid
> extra work of aligning pre-aligned callers which are known to be safe,
> avoid extra VMA lookups or tree walks for modifications.
>
> detach_vmas_to_be_unmapped() is no longer used, so drop this code.
>
> vm_brk_flags() can just call the do_mas_munmap() as it checks for
> intersecting VMAs directly.
>
> Signed-off-by: Liam R. Howlett <[email protected]>
...
> +/*
> + * do_mas_align_munmap() - munmap the aligned region from @start to @end.
> + * @mas: The maple_state, ideally set up to alter the correct tree location.
> + * @vma: The starting vm_area_struct
> + * @mm: The mm_struct
> + * @start: The aligned start address to munmap.
> + * @end: The aligned end address to munmap.
> + * @uf: The userfaultfd list_head
> + * @downgrade: Set to true to attempt a write downgrade of the mmap_sem
> + *
> + * If @downgrade is true, check return code for potential release of the lock.
> + */
> +static int
> +do_mas_align_munmap(struct ma_state *mas, struct vm_area_struct *vma,
> + struct mm_struct *mm, unsigned long start,
> + unsigned long end, struct list_head *uf, bool downgrade)
> +{
> + struct vm_area_struct *prev, *last;
> + int error = -ENOMEM;
> + /* we have start < vma->vm_end */
>
> - if (mas_preallocate(&mas, vma, GFP_KERNEL))
> + if (mas_preallocate(mas, vma, GFP_KERNEL))
> return -ENOMEM;
> - prev = vma->vm_prev;
> - /* we have start < vma->vm_end */
>
> + mas->last = end - 1;
> /*
> * If we need to split any vma, do it now to save pain later.
> *
...
> +/*
> + * do_mas_munmap() - munmap a given range.
> + * @mas: The maple state
> + * @mm: The mm_struct
> + * @start: The start address to munmap
> + * @len: The length of the range to munmap
> + * @uf: The userfaultfd list_head
> + * @downgrade: set to true if the user wants to attempt to write_downgrade the
> + * mmap_sem
> + *
> + * This function takes a @mas that is either pointing to the previous VMA or set
> + * to MA_START and sets it up to remove the mapping(s). The @len will be
> + * aligned and any arch_unmap work will be performed.
> + *
> + * Returns: -EINVAL on failure, 1 on success and unlock, 0 otherwise.
> + */
> +int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
> + unsigned long start, size_t len, struct list_head *uf,
> + bool downgrade)
> +{
> + unsigned long end;
> + struct vm_area_struct *vma;
> +
> + if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start)
> + return -EINVAL;
> +
> + end = start + PAGE_ALIGN(len);
> + if (end == start)
> + return -EINVAL;
> +
> + /* arch_unmap() might do unmaps itself. */
> + arch_unmap(mm, start, end);
> +
> + /* Find the first overlapping VMA */
> + vma = mas_find(mas, end - 1);
> + if (!vma)
> + return 0;
> +
> + return do_mas_align_munmap(mas, vma, mm, start, end, uf, downgrade);
> +}
> +
...
> @@ -2845,11 +2908,12 @@ static int __vm_munmap(unsigned long start, size_t len, bool downgrade)
> int ret;
> struct mm_struct *mm = current->mm;
> LIST_HEAD(uf);
> + MA_STATE(mas, &mm->mm_mt, start, start);
>
> if (mmap_write_lock_killable(mm))
> return -EINTR;
>
> - ret = __do_munmap(mm, start, len, &uf, downgrade);
> + ret = do_mas_munmap(&mas, mm, start, len, &uf, downgrade);
> /*
> * Returning 1 indicates mmap_lock is downgraded.
> * But 1 is not legal return value of vm_munmap() and munmap(), reset

Running a syscall fuzzer for a while could trigger those.

WARNING: CPU: 95 PID: 1329067 at mm/slub.c:3643 kmem_cache_free_bulk
CPU: 95 PID: 1329067 Comm: trinity-c32 Not tainted 5.18.0-next-20220603 #137
pstate: 10400009 (nzcV daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kmem_cache_free_bulk
lr : mt_destroy_walk
sp : ffff80005ed66bf0
x29: ffff80005ed66bf0 x28: ffff401d6c82f050 x27: 0000000000000000
x26: dfff800000000000 x25: 0000000000000003 x24: 1ffffa97cc5fb120
x23: ffffd4be62fd8760 x22: ffff401d6c82f050 x21: 0000000000000003
x20: 0000000000000000 x19: ffff401d6c82f000 x18: ffffd4be66407d1c
x17: ffff40297ac21f0c x16: 1fffe8016136146b x15: 1fffe806c7d1ad38
x14: 1fffe8016136145e x13: 0000000000000004 x12: ffff70000bdacd8d
x11: 1ffff0000bdacd8c x10: ffff70000bdacd8c x9 : ffffd4be60d633c4
x8 : ffff80005ed66c63 x7 : 0000000000000001 x6 : 0000000000000003
x5 : ffff80005ed66c60 x4 : 0000000000000000 x3 : ffff400b09b09a80
x2 : ffff401d6c82f050 x1 : 0000000000000000 x0 : ffff07ff80014a80
Call trace:
kmem_cache_free_bulk
mt_destroy_walk
mas_wmb_replace
mas_spanning_rebalance.isra.0
mas_wr_spanning_store.isra.0
mas_wr_store_entry.isra.0
mas_store_prealloc
do_mas_align_munmap.constprop.0
do_mas_munmap
__vm_munmap
__arm64_sys_munmap
invoke_syscall
el0_svc_common.constprop.0
do_el0_svc
el0_svc
el0t_64_sync_handler
el0t_64_sync
irq event stamp: 665580
hardirqs last enabled at (665579): kasan_quarantine_put
hardirqs last disabled at (665580): el1_dbg
softirqs last enabled at (664048): __do_softirq
softirqs last disabled at (663831): __irq_exit_rcu


BUG: KASAN: double-free or invalid-free in kmem_cache_free_bulk

CPU: 95 PID: 1329067 Comm: trinity-c32 Tainted: G W 5.18.0-next-20220603 #137
Call trace:
dump_backtrace
show_stack
dump_stack_lvl
print_address_description.constprop.0
print_report
kasan_report_invalid_free
____kasan_slab_free
__kasan_slab_free
slab_free_freelist_hook
kmem_cache_free_bulk
mas_destroy
mas_store_prealloc
do_mas_align_munmap.constprop.0
do_mas_munmap
__vm_munmap
__arm64_sys_munmap
invoke_syscall
el0_svc_common.constprop.0
do_el0_svc
el0_svc
el0t_64_sync_handler
el0t_64_sync

Allocated by task 1329067:
kasan_save_stack
__kasan_slab_alloc
slab_post_alloc_hook
kmem_cache_alloc_bulk
mas_alloc_nodes
mas_preallocate
__vma_adjust
vma_merge
mprotect_fixup
do_mprotect_pkey.constprop.0
__arm64_sys_mprotect
invoke_syscall
el0_svc_common.constprop.0
do_el0_svc
el0_svc
el0t_64_sync_handler
el0t_64_sync

Freed by task 1329067:
kasan_save_stack
kasan_set_track
kasan_set_free_info
____kasan_slab_free
__kasan_slab_free
slab_free_freelist_hook
kmem_cache_free
mt_destroy_walk
mas_wmb_replace
mas_spanning_rebalance.isra.0
mas_wr_spanning_store.isra.0
mas_wr_store_entry.isra.0
mas_store_prealloc
do_mas_align_munmap.constprop.0
do_mas_munmap
__vm_munmap
__arm64_sys_munmap
invoke_syscall
el0_svc_common.constprop.0
do_el0_svc
el0_svc
el0t_64_sync_handler
el0t_64_sync

The buggy address belongs to the object at ffff401d6c82f000
which belongs to the cache maple_node of size 256
The buggy address is located 0 bytes inside of
256-byte region [ffff401d6c82f000, ffff401d6c82f100)

The buggy address belongs to the physical page:
page:fffffd0075b20a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x401dec828
head:fffffd0075b20a00 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x1bfffc0000010200(slab|head|node=1|zone=2|lastcpupid=0xffff)
raw: 1bfffc0000010200 fffffd00065b2a08 fffffd0006474408 ffff07ff80014a80
raw: 0000000000000000 00000000002a002a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 185514, tgid 185514 (trinity-c15), ts 9791681605400, free_ts 9785882037080
post_alloc_hook
get_page_from_freelist
__alloc_pages
alloc_pages
allocate_slab
new_slab
___slab_alloc
__slab_alloc.constprop.0
kmem_cache_alloc
mas_alloc_nodes
mas_preallocate
__vma_adjust
vma_merge
mlock_fixup
apply_mlockall_flags
__arm64_sys_munlockall
page last free stack trace:
free_pcp_prepare
free_unref_page
__free_pages
__free_slab
discard_slab
__slab_free
___cache_free
qlist_free_all
kasan_quarantine_reduce
__kasan_slab_alloc
__kmalloc_node
kvmalloc_node
__slab_free
___cache_free
qlist_free_all
kasan_quarantine_reduce
__kasan_slab_alloc
__kmalloc_node
kvmalloc_node
proc_sys_call_handler
proc_sys_read
new_sync_read
vfs_read

Memory state around the buggy address:
ffff401d6c82ef00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff401d6c82ef80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff401d6c82f000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff401d6c82f080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff401d6c82f100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

2022-06-06 16:38:19

by Liam R. Howlett

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

* Qian Cai <[email protected]> [220606 08:09]:
> On Wed, May 04, 2022 at 01:13:53AM +0000, Liam Howlett wrote:
> > From: "Liam R. Howlett" <[email protected]>
> >
> > Remove __do_munmap() in favour of do_munmap(), do_mas_munmap(), and
> > do_mas_align_munmap().
> >
> > do_munmap() is a wrapper to create a maple state for any callers that have
> > not been converted to the maple tree.
> >
> > do_mas_munmap() takes a maple state to munmap a range. This is just a
> > small function which checks for error conditions and aligns the end of the
> > range.
> >
> > do_mas_align_munmap() uses the aligned range to munmap a range.
> > do_mas_align_munmap() starts with the first VMA in the range, then finds
> > the last VMA in the range. Both start and end are split if necessary.
> > Then the VMAs are removed from the linked list and the mm mlock count is
> > updated at the same time. Followed by a single tree operation of
> > overwriting the area in with a NULL. Finally, the detached list is
> > unmapped and freed.
> >
> > By reorganizing the munmap calls as outlined, it is now possible to avoid
> > extra work of aligning pre-aligned callers which are known to be safe,
> > avoid extra VMA lookups or tree walks for modifications.
> >
> > detach_vmas_to_be_unmapped() is no longer used, so drop this code.
> >
> > vm_brk_flags() can just call the do_mas_munmap() as it checks for
> > intersecting VMAs directly.
> >
> > Signed-off-by: Liam R. Howlett <[email protected]>
> ...

..
> Running a syscall fuzzer for a while could trigger those.

Thanks.

>
> WARNING: CPU: 95 PID: 1329067 at mm/slub.c:3643 kmem_cache_free_bulk
> CPU: 95 PID: 1329067 Comm: trinity-c32 Not tainted 5.18.0-next-20220603 #137
> pstate: 10400009 (nzcV daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : kmem_cache_free_bulk
> lr : mt_destroy_walk
> sp : ffff80005ed66bf0


Does your syscall fuzzer create a reproducer? This looks like arm64
and says 5.18.0-next-20220603 again. Was this bisected to the patch
above?

Regards,
Liam

2022-06-06 16:56:09

by Qian Cai

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

On Mon, Jun 06, 2022 at 04:19:52PM +0000, Liam Howlett wrote:
> Does your syscall fuzzer create a reproducer? This looks like arm64
> and says 5.18.0-next-20220603 again. Was this bisected to the patch
> above?

This was triggered by running the fuzzer over the weekend.

$ trinity -C 160

No bisection was done. It was only brought up here because the trace
pointed to do_mas_munmap() which was introduced here.

2022-06-11 21:46:16

by Yu Zhao

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

On Mon, Jun 6, 2022 at 10:40 AM Qian Cai <[email protected]> wrote:
>
> On Mon, Jun 06, 2022 at 04:19:52PM +0000, Liam Howlett wrote:
> > Does your syscall fuzzer create a reproducer? This looks like arm64
> > and says 5.18.0-next-20220603 again. Was this bisected to the patch
> > above?
>
> This was triggered by running the fuzzer over the weekend.
>
> $ trinity -C 160
>
> No bisection was done. It was only brought up here because the trace
> pointed to do_mas_munmap() which was introduced here.

Liam,

I'm getting a similar crash on arm64 -- the allocator is madvise(),
not mprotect(). Please take a look.

Thanks.

==================================================================
BUG: KASAN: double-free or invalid-free in kmem_cache_free_bulk+0x230/0x3b0
Pointer tag: [0c], memory tag: [fe]

CPU: 2 PID: 8320 Comm: stress-ng Tainted: G B W
5.19.0-rc1-lockdep+ #3
Call trace:
dump_backtrace+0x1a0/0x200
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_report+0x15c/0x524
kasan_report_invalid_free+0x64/0x84
____kasan_slab_free+0x150/0x184
__kasan_slab_free+0x14/0x24
slab_free_freelist_hook+0x100/0x1ac
kmem_cache_free_bulk+0x230/0x3b0
mas_destroy+0x10d8/0x1270
mas_store_prealloc+0xb8/0xec
do_mas_align_munmap+0x398/0x694
do_mas_munmap+0xf8/0x118
__vm_munmap+0x154/0x1e0
__arm64_sys_munmap+0x44/0x54
el0_svc_common+0xfc/0x1cc
do_el0_svc_compat+0x38/0x5c
el0_svc_compat+0x68/0xf4
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x190/0x194

Allocated by task 8437:
kasan_set_track+0x4c/0x7c
__kasan_slab_alloc+0x84/0xa8
kmem_cache_alloc_bulk+0x300/0x408
mas_alloc_nodes+0x198/0x294
mas_preallocate+0x8c/0x110
__vma_adjust+0x174/0xc88
vma_merge+0x2e4/0x300
do_madvise+0x504/0xd20
__arm64_sys_madvise+0x54/0x64
el0_svc_common+0xfc/0x1cc
do_el0_svc_compat+0x38/0x5c
el0_svc_compat+0x68/0xf4
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x190/0x194

Freed by task 8320:
kasan_set_track+0x4c/0x7c
kasan_set_free_info+0x2c/0x38
____kasan_slab_free+0x13c/0x184
__kasan_slab_free+0x14/0x24
slab_free_freelist_hook+0x100/0x1ac
kmem_cache_free+0x11c/0x264
mt_destroy_walk+0x6d8/0x714
mas_wmb_replace+0x9d4/0xa68
mas_spanning_rebalance+0x1af0/0x1d2c
mas_wr_spanning_store+0x908/0x964
mas_wr_store_entry+0x53c/0x5c0
mas_store_prealloc+0x88/0xec
do_mas_align_munmap+0x398/0x694
do_mas_munmap+0xf8/0x118
__vm_munmap+0x154/0x1e0
__arm64_sys_munmap+0x44/0x54
el0_svc_common+0xfc/0x1cc
do_el0_svc_compat+0x38/0x5c
el0_svc_compat+0x68/0xf4
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x190/0x194

The buggy address belongs to the object at ffffff808b5f0a00
which belongs to the cache maple_node of size 256
The buggy address is located 0 bytes inside of
256-byte region [ffffff808b5f0a00, ffffff808b5f0b00)

The buggy address belongs to the physical page:
page:fffffffe022d7c00 refcount:1 mapcount:0 mapping:0000000000000000
index:0xcffff808b5f0a00 pfn:0x10b5f0
head:fffffffe022d7c00 order:2 compound_mapcount:0 compound_pincount:0
flags: 0x8000000000010200(slab|head|zone=2|kasantag=0x0)
raw: 8000000000010200 fffffffe031a8608 fffffffe021a3608 caffff808002c800
raw: 0cffff808b5f0a00 0000000000150013 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffffff808b5f0800: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
ffffff808b5f0900: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
>ffffff808b5f0a00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
^
ffffff808b5f0b00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
ffffff808b5f0c00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
==================================================================

2022-06-12 16:14:32

by Yu Zhao

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

On Sat, Jun 11, 2022 at 2:11 PM Yu Zhao <[email protected]> wrote:
>
> On Mon, Jun 6, 2022 at 10:40 AM Qian Cai <[email protected]> wrote:
> >
> > On Mon, Jun 06, 2022 at 04:19:52PM +0000, Liam Howlett wrote:
> > > Does your syscall fuzzer create a reproducer? This looks like arm64
> > > and says 5.18.0-next-20220603 again. Was this bisected to the patch
> > > above?
> >
> > This was triggered by running the fuzzer over the weekend.
> >
> > $ trinity -C 160
> >
> > No bisection was done. It was only brought up here because the trace
> > pointed to do_mas_munmap() which was introduced here.
>
> Liam,
>
> I'm getting a similar crash on arm64 -- the allocator is madvise(),
> not mprotect(). Please take a look.

Another crash on x86_64, which seems different:

==================================================================
BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
Write of size 136 at addr ffff88c5a2319c80 by task stress-ng/18461

CPU: 66 PID: 18461 Comm: stress-ng Tainted: G S I 5.19.0-smp-DEV #1
Call Trace:
<TASK>
dump_stack_lvl+0xc5/0xf4
print_address_description+0x7f/0x460
print_report+0x10b/0x240
? mab_mas_cp+0x2d9/0x6c0
kasan_report+0xe6/0x110
? mab_mas_cp+0x2d9/0x6c0
kasan_check_range+0x2ef/0x310
? mab_mas_cp+0x2d9/0x6c0
memcpy+0x44/0x70
mab_mas_cp+0x2d9/0x6c0
mas_spanning_rebalance+0x1a45/0x4d70
? stack_trace_save+0xca/0x160
? stack_trace_save+0xca/0x160
mas_wr_spanning_store+0x16a4/0x1ad0
mas_wr_spanning_store+0x16a4/0x1ad0
mas_wr_store_entry+0xbf9/0x12e0
mas_store_prealloc+0x205/0x3c0
do_mas_align_munmap+0x6cf/0xd10
do_mas_munmap+0x1bb/0x210
? down_write_killable+0xa6/0x110
__vm_munmap+0x1c4/0x270
__x64_sys_munmap+0x60/0x70
do_syscall_64+0x44/0xa0
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x589827
Code: 00 00 00 48 c7 c2 98 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff
ff eb 85 66 2e 0f 1f 84 00 00 00 00 00 90 b8 0b 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 c7 c1 98 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff9276c518 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
RAX: ffffffffffffffda RBX: 0000400000000000 RCX: 0000000000589827
RDX: 0000000000000000 RSI: 00007ffffffff000 RDI: 0000000000000000
RBP: 00000000004cf000 R08: 00007fff9276c550 R09: 0000000000923bf0
R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000001000
R13: 00000000004cf040 R14: 0000000000000004 R15: 00007fff9276c668
</TASK>

Allocated by task 18461:
__kasan_slab_alloc+0xaf/0xe0
kmem_cache_alloc_bulk+0x261/0x360
mas_alloc_nodes+0x2d7/0x4d0
mas_preallocate+0xe0/0x220
do_mas_align_munmap+0x1ce/0xd10
do_mas_munmap+0x1bb/0x210
__vm_munmap+0x1c4/0x270
__x64_sys_munmap+0x60/0x70
do_syscall_64+0x44/0xa0
entry_SYSCALL_64_after_hwframe+0x46/0xb0

The buggy address belongs to the object at ffff88c5a2319c00
which belongs to the cache maple_node of size 256
The buggy address is located 128 bytes inside of
256-byte region [ffff88c5a2319c00, ffff88c5a2319d00)

The buggy address belongs to the physical page:
page:000000000a5cfe8b refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x45a2319
flags: 0x1400000000000200(slab|node=1|zone=1)
raw: 1400000000000200 ffffea01168dea88 ffffea0116951f48 ffff88810004ff00
raw: 0000000000000000 ffff88c5a2319000 0000000100000008 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88c5a2319c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88c5a2319c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88c5a2319d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff88c5a2319d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88c5a2319e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

2022-06-12 16:16:55

by Liam R. Howlett

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

* Yu Zhao <[email protected]> [220611 17:50]:
> On Sat, Jun 11, 2022 at 2:11 PM Yu Zhao <[email protected]> wrote:
> >
> > On Mon, Jun 6, 2022 at 10:40 AM Qian Cai <[email protected]> wrote:
> > >
> > > On Mon, Jun 06, 2022 at 04:19:52PM +0000, Liam Howlett wrote:
> > > > Does your syscall fuzzer create a reproducer? This looks like arm64
> > > > and says 5.18.0-next-20220603 again. Was this bisected to the patch
> > > > above?
> > >
> > > This was triggered by running the fuzzer over the weekend.
> > >
> > > $ trinity -C 160
> > >
> > > No bisection was done. It was only brought up here because the trace
> > > pointed to do_mas_munmap() which was introduced here.
> >
> > Liam,
> >
> > I'm getting a similar crash on arm64 -- the allocator is madvise(),
> > not mprotect(). Please take a look.
>
> Another crash on x86_64, which seems different:

Thanks, yes. This one may be different. The others have the same source
and I'm working on that.

>
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> Write of size 136 at addr ffff88c5a2319c80 by task stress-ng/18461
>
> CPU: 66 PID: 18461 Comm: stress-ng Tainted: G S I 5.19.0-smp-DEV #1
> Call Trace:
> <TASK>
> dump_stack_lvl+0xc5/0xf4
> print_address_description+0x7f/0x460
> print_report+0x10b/0x240
> ? mab_mas_cp+0x2d9/0x6c0
> kasan_report+0xe6/0x110
> ? mab_mas_cp+0x2d9/0x6c0
> kasan_check_range+0x2ef/0x310
> ? mab_mas_cp+0x2d9/0x6c0
> memcpy+0x44/0x70
> mab_mas_cp+0x2d9/0x6c0
> mas_spanning_rebalance+0x1a45/0x4d70
> ? stack_trace_save+0xca/0x160
> ? stack_trace_save+0xca/0x160
> mas_wr_spanning_store+0x16a4/0x1ad0
> mas_wr_spanning_store+0x16a4/0x1ad0
> mas_wr_store_entry+0xbf9/0x12e0
> mas_store_prealloc+0x205/0x3c0
> do_mas_align_munmap+0x6cf/0xd10
> do_mas_munmap+0x1bb/0x210
> ? down_write_killable+0xa6/0x110
> __vm_munmap+0x1c4/0x270
> __x64_sys_munmap+0x60/0x70
> do_syscall_64+0x44/0xa0
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
> RIP: 0033:0x589827
> Code: 00 00 00 48 c7 c2 98 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff
> ff eb 85 66 2e 0f 1f 84 00 00 00 00 00 90 b8 0b 00 00 00 0f 05 <48> 3d
> 01 f0 ff ff 73 01 c3 48 c7 c1 98 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007fff9276c518 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
> RAX: ffffffffffffffda RBX: 0000400000000000 RCX: 0000000000589827
> RDX: 0000000000000000 RSI: 00007ffffffff000 RDI: 0000000000000000
> RBP: 00000000004cf000 R08: 00007fff9276c550 R09: 0000000000923bf0
> R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000001000
> R13: 00000000004cf040 R14: 0000000000000004 R15: 00007fff9276c668
> </TASK>
>
> Allocated by task 18461:
> __kasan_slab_alloc+0xaf/0xe0
> kmem_cache_alloc_bulk+0x261/0x360
> mas_alloc_nodes+0x2d7/0x4d0
> mas_preallocate+0xe0/0x220
> do_mas_align_munmap+0x1ce/0xd10
> do_mas_munmap+0x1bb/0x210
> __vm_munmap+0x1c4/0x270
> __x64_sys_munmap+0x60/0x70
> do_syscall_64+0x44/0xa0
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> The buggy address belongs to the object at ffff88c5a2319c00
> which belongs to the cache maple_node of size 256
> The buggy address is located 128 bytes inside of
> 256-byte region [ffff88c5a2319c00, ffff88c5a2319d00)
>
> The buggy address belongs to the physical page:
> page:000000000a5cfe8b refcount:1 mapcount:0 mapping:0000000000000000
> index:0x0 pfn:0x45a2319
> flags: 0x1400000000000200(slab|node=1|zone=1)
> raw: 1400000000000200 ffffea01168dea88 ffffea0116951f48 ffff88810004ff00
> raw: 0000000000000000 ffff88c5a2319000 0000000100000008 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
> ffff88c5a2319c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ffff88c5a2319c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >ffff88c5a2319d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ^
> ffff88c5a2319d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff88c5a2319e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ==================================================================

2022-06-15 14:52:40

by Liam R. Howlett

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

* Yu Zhao <[email protected]> [220611 17:50]:
> On Sat, Jun 11, 2022 at 2:11 PM Yu Zhao <[email protected]> wrote:
> >
> > On Mon, Jun 6, 2022 at 10:40 AM Qian Cai <[email protected]> wrote:
> > >
> > > On Mon, Jun 06, 2022 at 04:19:52PM +0000, Liam Howlett wrote:
> > > > Does your syscall fuzzer create a reproducer? This looks like arm64
> > > > and says 5.18.0-next-20220603 again. Was this bisected to the patch
> > > > above?
> > >
> > > This was triggered by running the fuzzer over the weekend.
> > >
> > > $ trinity -C 160
> > >
> > > No bisection was done. It was only brought up here because the trace
> > > pointed to do_mas_munmap() which was introduced here.
> >
> > Liam,
> >
> > I'm getting a similar crash on arm64 -- the allocator is madvise(),
> > not mprotect(). Please take a look.
>
> Another crash on x86_64, which seems different:

Thanks for this. I was able to reproduce the other crashes that you and
Qian reported. I've sent out a patch set to Andrew to apply to the
branch; it includes the fix for them and a fix for an unrelated issue
discovered when I wrote the testcases to cover what was going on here.


>
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> Write of size 136 at addr ffff88c5a2319c80 by task stress-ng/18461
>
> CPU: 66 PID: 18461 Comm: stress-ng Tainted: G S I 5.19.0-smp-DEV #1
> Call Trace:
> <TASK>
> dump_stack_lvl+0xc5/0xf4
> print_address_description+0x7f/0x460
> print_report+0x10b/0x240
> ? mab_mas_cp+0x2d9/0x6c0
> kasan_report+0xe6/0x110
> ? mab_mas_cp+0x2d9/0x6c0
> kasan_check_range+0x2ef/0x310
> ? mab_mas_cp+0x2d9/0x6c0
> memcpy+0x44/0x70
> mab_mas_cp+0x2d9/0x6c0
> mas_spanning_rebalance+0x1a45/0x4d70
> ? stack_trace_save+0xca/0x160
> ? stack_trace_save+0xca/0x160
> mas_wr_spanning_store+0x16a4/0x1ad0
> mas_wr_spanning_store+0x16a4/0x1ad0
> mas_wr_store_entry+0xbf9/0x12e0
> mas_store_prealloc+0x205/0x3c0
> do_mas_align_munmap+0x6cf/0xd10
> do_mas_munmap+0x1bb/0x210
> ? down_write_killable+0xa6/0x110
> __vm_munmap+0x1c4/0x270
> __x64_sys_munmap+0x60/0x70
> do_syscall_64+0x44/0xa0
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
> RIP: 0033:0x589827
> Code: 00 00 00 48 c7 c2 98 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff
> ff eb 85 66 2e 0f 1f 84 00 00 00 00 00 90 b8 0b 00 00 00 0f 05 <48> 3d
> 01 f0 ff ff 73 01 c3 48 c7 c1 98 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007fff9276c518 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
> RAX: ffffffffffffffda RBX: 0000400000000000 RCX: 0000000000589827
> RDX: 0000000000000000 RSI: 00007ffffffff000 RDI: 0000000000000000
> RBP: 00000000004cf000 R08: 00007fff9276c550 R09: 0000000000923bf0
> R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000001000
> R13: 00000000004cf040 R14: 0000000000000004 R15: 00007fff9276c668
> </TASK>

...

As for this crash, I was unable to reproduce it, and the code I just sent
out changes this code a lot. Was this running with "trinity -c madvise"
or another use case/fuzzer?


Thanks,
Liam

2022-06-15 19:08:34

by Liam R. Howlett

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

* Yu Zhao <[email protected]> [220615 14:08]:
> On Wed, Jun 15, 2022 at 8:25 AM Liam Howlett <[email protected]> wrote:
> >
> > * Yu Zhao <[email protected]> [220611 17:50]:
> > > On Sat, Jun 11, 2022 at 2:11 PM Yu Zhao <[email protected]> wrote:
> > > >
> > > > On Mon, Jun 6, 2022 at 10:40 AM Qian Cai <[email protected]> wrote:
> > > > >
> > > > > On Mon, Jun 06, 2022 at 04:19:52PM +0000, Liam Howlett wrote:
> > > > > > Does your syscall fuzzer create a reproducer? This looks like arm64
> > > > > > and says 5.18.0-next-20220603 again. Was this bisected to the patch
> > > > > > above?
> > > > >
> > > > > This was triggered by running the fuzzer over the weekend.
> > > > >
> > > > > $ trinity -C 160
> > > > >
> > > > > No bisection was done. It was only brought up here because the trace
> > > > > pointed to do_mas_munmap() which was introduced here.
> > > >
> > > > Liam,
> > > >
> > > > I'm getting a similar crash on arm64 -- the allocator is madvise(),
> > > > not mprotect(). Please take a look.
> > >
> > > Another crash on x86_64, which seems different:
> >
> > Thanks for this. I was able to reproduce the other crashes that you and
> > Qian reported. I've sent out a patch set to Andrew to apply to the
> > branch which includes the fix for them and an unrelated issue discovered
> > when I wrote the testcases to cover what was going on here.
>
> Thanks. I'm restarting the test and will report the results in a few hours.
>
> > > BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> > > Write of size 136 at addr ffff88c5a2319c80 by task stress-ng/18461
> ^^^^^^^^^
>
> > As for this crash, I was unable to reproduce and the code I just sent
> > out changes this code a lot. Was this running with "trinity -c madvise"
> > or another use case/fuzzer?
>
> This is also stress-ng (same as the one on arm64). The test stopped
> before it could try syzkaller (fuzzer).

Thanks. What are the arguments to stress-ng you use? I've run
"stress-ng --class vm -a 20 -t 600s --temp-path /tmp" until it OOMs on
my vm, but it only has 8GB of ram.

Regards,
Liam

2022-06-15 19:17:43

by Yu Zhao

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

On Wed, Jun 15, 2022 at 12:55 PM Liam Howlett <[email protected]> wrote:
>
> * Yu Zhao <[email protected]> [220615 14:08]:
> > On Wed, Jun 15, 2022 at 8:25 AM Liam Howlett <[email protected]> wrote:
> > >
> > > * Yu Zhao <[email protected]> [220611 17:50]:
> > > > On Sat, Jun 11, 2022 at 2:11 PM Yu Zhao <[email protected]> wrote:
> > > > >
> > > > > On Mon, Jun 6, 2022 at 10:40 AM Qian Cai <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Jun 06, 2022 at 04:19:52PM +0000, Liam Howlett wrote:
> > > > > > > Does your syscall fuzzer create a reproducer? This looks like arm64
> > > > > > > and says 5.18.0-next-20220603 again. Was this bisected to the patch
> > > > > > > above?
> > > > > >
> > > > > > This was triggered by running the fuzzer over the weekend.
> > > > > >
> > > > > > $ trinity -C 160
> > > > > >
> > > > > > No bisection was done. It was only brought up here because the trace
> > > > > > pointed to do_mas_munmap() which was introduced here.
> > > > >
> > > > > Liam,
> > > > >
> > > > > I'm getting a similar crash on arm64 -- the allocator is madvise(),
> > > > > not mprotect(). Please take a look.
> > > >
> > > > Another crash on x86_64, which seems different:
> > >
> > > Thanks for this. I was able to reproduce the other crashes that you and
> > > Qian reported. I've sent out a patch set to Andrew to apply to the
> > > branch which includes the fix for them and an unrelated issue discovered
> > > when I wrote the testcases to cover what was going on here.
> >
> > Thanks. I'm restarting the test and will report the results in a few hours.
> >
> > > > BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> > > > Write of size 136 at addr ffff88c5a2319c80 by task stress-ng/18461
> > ^^^^^^^^^
> >
> > > As for this crash, I was unable to reproduce and the code I just sent
> > > out changes this code a lot. Was this running with "trinity -c madvise"
> > > or another use case/fuzzer?
> >
> > This is also stress-ng (same as the one on arm64). The test stopped
> > before it could try syzkaller (fuzzer).
>
> Thanks. What are the arguments to stress-ng you use? I've run
> "stress-ng --class vm -a 20 -t 600s --temp-path /tmp" until it OOMs on
> my vm, but it only has 8GB of ram.

Yes, I used the same parameters with 512GB of RAM, and the kernel with
KASAN and other debug options.

2022-06-15 19:56:37

by Yu Zhao

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

On Wed, Jun 15, 2022 at 8:25 AM Liam Howlett <[email protected]> wrote:
>
> * Yu Zhao <[email protected]> [220611 17:50]:
> > On Sat, Jun 11, 2022 at 2:11 PM Yu Zhao <[email protected]> wrote:
> > >
> > > On Mon, Jun 6, 2022 at 10:40 AM Qian Cai <[email protected]> wrote:
> > > >
> > > > On Mon, Jun 06, 2022 at 04:19:52PM +0000, Liam Howlett wrote:
> > > > > Does your syscall fuzzer create a reproducer? This looks like arm64
> > > > > and says 5.18.0-next-20220603 again. Was this bisected to the patch
> > > > > above?
> > > >
> > > > This was triggered by running the fuzzer over the weekend.
> > > >
> > > > $ trinity -C 160
> > > >
> > > > No bisection was done. It was only brought up here because the trace
> > > > pointed to do_mas_munmap() which was introduced here.
> > >
> > > Liam,
> > >
> > > I'm getting a similar crash on arm64 -- the allocator is madvise(),
> > > not mprotect(). Please take a look.
> >
> > Another crash on x86_64, which seems different:
>
> Thanks for this. I was able to reproduce the other crashes that you and
> Qian reported. I've sent out a patch set to Andrew to apply to the
> branch which includes the fix for them and an unrelated issue discovered
> when I wrote the testcases to cover what was going on here.

Thanks. I'm restarting the test and will report the results in a few hours.

> > BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> > Write of size 136 at addr ffff88c5a2319c80 by task stress-ng/18461
^^^^^^^^^

> As for this crash, I was unable to reproduce and the code I just sent
> out changes this code a lot. Was this running with "trinity -c madvise"
> or another use case/fuzzer?

This is also stress-ng (same as the one on arm64). The test stopped
before it could try syzkaller (fuzzer).

2022-06-15 21:29:55

by Yu Zhao

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

On Wed, Jun 15, 2022 at 1:05 PM Yu Zhao <[email protected]> wrote:
>
> On Wed, Jun 15, 2022 at 12:55 PM Liam Howlett <[email protected]> wrote:
> >
> > * Yu Zhao <[email protected]> [220615 14:08]:
> > > On Wed, Jun 15, 2022 at 8:25 AM Liam Howlett <[email protected]> wrote:
> > > >
> > > > * Yu Zhao <[email protected]> [220611 17:50]:
> > > > > On Sat, Jun 11, 2022 at 2:11 PM Yu Zhao <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Jun 6, 2022 at 10:40 AM Qian Cai <[email protected]> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 06, 2022 at 04:19:52PM +0000, Liam Howlett wrote:
> > > > > > > > Does your syscall fuzzer create a reproducer? This looks like arm64
> > > > > > > > and says 5.18.0-next-20220603 again. Was this bisected to the patch
> > > > > > > > above?
> > > > > > >
> > > > > > > This was triggered by running the fuzzer over the weekend.
> > > > > > >
> > > > > > > $ trinity -C 160
> > > > > > >
> > > > > > > No bisection was done. It was only brought up here because the trace
> > > > > > > pointed to do_mas_munmap() which was introduced here.
> > > > > >
> > > > > > Liam,
> > > > > >
> > > > > > I'm getting a similar crash on arm64 -- the allocator is madvise(),
> > > > > > not mprotect(). Please take a look.
> > > > >
> > > > > Another crash on x86_64, which seems different:
> > > >
> > > > Thanks for this. I was able to reproduce the other crashes that you and
> > > > Qian reported. I've sent out a patch set to Andrew to apply to the
> > > > branch which includes the fix for them and an unrelated issue discovered
> > > > when I wrote the testcases to cover what was going on here.
> > >
> > > Thanks. I'm restarting the test and will report the results in a few hours.
> > >
> > > > > BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> > > > > Write of size 136 at addr ffff88c5a2319c80 by task stress-ng/18461
> > > ^^^^^^^^^
> > >
> > > > As for this crash, I was unable to reproduce and the code I just sent
> > > > out changes this code a lot. Was this running with "trinity -c madvise"
> > > > or another use case/fuzzer?
> > >
> > > This is also stress-ng (same as the one on arm64). The test stopped
> > > before it could try syzkaller (fuzzer).
> >
> > Thanks. What are the arguments to stress-ng you use? I've run
> > "stress-ng --class vm -a 20 -t 600s --temp-path /tmp" until it OOMs on
> > my vm, but it only has 8GB of ram.
>
> Yes, I used the same parameters with 512GB of RAM, and the kernel with
> KASAN and other debug options.

Sorry, Liam. I got the same crash :(

9d27f2f1487a (tag: mm-everything-2022-06-14-19-05, akpm/mm-everything)
00d4d7b519d6 fs/userfaultfd: Fix vma iteration in mas_for_each() loop
55140693394d maple_tree: Make mas_prealloc() error checking more generic
2d7e7c2fcf16 maple_tree: Fix mt_destroy_walk() on full non-leaf non-alloc nodes
4d4472148ccd maple_tree: Change spanning store to work on larger trees
ea36bcc14c00 test_maple_tree: Add tests for preallocations and large
spanning writes
0d2aa86ead4f mm/mlock: Drop dead code in count_mm_mlocked_page_nr()

==================================================================
BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
Write of size 136 at addr ffff88c35a3b9e80 by task stress-ng/19303

CPU: 66 PID: 19303 Comm: stress-ng Tainted: G S I 5.19.0-smp-DEV #1
Call Trace:
<TASK>
dump_stack_lvl+0xc5/0xf4
print_address_description+0x7f/0x460
print_report+0x10b/0x240
? mab_mas_cp+0x2d9/0x6c0
kasan_report+0xe6/0x110
? mast_spanning_rebalance+0x2634/0x29b0
? mab_mas_cp+0x2d9/0x6c0
kasan_check_range+0x2ef/0x310
? mab_mas_cp+0x2d9/0x6c0
? mab_mas_cp+0x2d9/0x6c0
memcpy+0x44/0x70
mab_mas_cp+0x2d9/0x6c0
mas_spanning_rebalance+0x1a3e/0x4f90
? stack_trace_save+0xca/0x160
? stack_trace_save+0xca/0x160
mas_wr_spanning_store+0x16c5/0x1b80
mas_wr_store_entry+0xbf9/0x12e0
mas_store_prealloc+0x205/0x3c0
do_mas_align_munmap+0x6cf/0xd10
do_mas_munmap+0x1bb/0x210
? down_write_killable+0xa6/0x110
__vm_munmap+0x1c4/0x270
__x64_sys_munmap+0x60/0x70
do_syscall_64+0x44/0xa0
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x589827
Code: 00 00 00 48 c7 c2 98 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff
ff eb 85 66 2e 0f 1f 84 00 00 00 00 00 90 b8 0b 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 c7 c1 98 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffee601ec08 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
RAX: ffffffffffffffda RBX: 0000400000000000 RCX: 0000000000589827
RDX: 0000000000000000 RSI: 00007ffffffff000 RDI: 0000000000000000
RBP: 00000000004cf000 R08: 00007ffee601ec40 R09: 0000000000923bf0
R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000001000
R13: 00000000004cf040 R14: 0000000000000002 R15: 00007ffee601ed58
</TASK>

Allocated by task 19303:
__kasan_slab_alloc+0xaf/0xe0
kmem_cache_alloc_bulk+0x261/0x360
mas_alloc_nodes+0x2d7/0x4d0
mas_preallocate+0xe2/0x230
do_mas_align_munmap+0x1ce/0xd10
do_mas_munmap+0x1bb/0x210
__vm_munmap+0x1c4/0x270
__x64_sys_munmap+0x60/0x70
do_syscall_64+0x44/0xa0
entry_SYSCALL_64_after_hwframe+0x46/0xb0

The buggy address belongs to the object at ffff88c35a3b9e00
which belongs to the cache maple_node of size 256
The buggy address is located 128 bytes inside of
256-byte region [ffff88c35a3b9e00, ffff88c35a3b9f00)

The buggy address belongs to the physical page:
page:00000000325428b6 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x435a3b9
flags: 0x1400000000000200(slab|node=1|zone=1)
raw: 1400000000000200 ffffea010d71a5c8 ffffea010d71dec8 ffff88810004ff00
raw: 0000000000000000 ffff88c35a3b9000 0000000100000008 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88c35a3b9e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88c35a3b9e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88c35a3b9f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff88c35a3b9f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88c35a3ba000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

2022-06-16 02:08:10

by Liam R. Howlett

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

* Yu Zhao <[email protected]> [220615 17:17]:

...

> > Yes, I used the same parameters with 512GB of RAM, and the kernel with
> > KASAN and other debug options.
>
> Sorry, Liam. I got the same crash :(

Thanks for running this promptly. I am trying to get my own server
set up now.

>
> 9d27f2f1487a (tag: mm-everything-2022-06-14-19-05, akpm/mm-everything)
> 00d4d7b519d6 fs/userfaultfd: Fix vma iteration in mas_for_each() loop
> 55140693394d maple_tree: Make mas_prealloc() error checking more generic
> 2d7e7c2fcf16 maple_tree: Fix mt_destroy_walk() on full non-leaf non-alloc nodes
> 4d4472148ccd maple_tree: Change spanning store to work on larger trees
> ea36bcc14c00 test_maple_tree: Add tests for preallocations and large
> spanning writes
> 0d2aa86ead4f mm/mlock: Drop dead code in count_mm_mlocked_page_nr()
>
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> Write of size 136 at addr ffff88c35a3b9e80 by task stress-ng/19303
>
> CPU: 66 PID: 19303 Comm: stress-ng Tainted: G S I 5.19.0-smp-DEV #1
> Call Trace:
> <TASK>
> dump_stack_lvl+0xc5/0xf4
> print_address_description+0x7f/0x460
> print_report+0x10b/0x240
> ? mab_mas_cp+0x2d9/0x6c0
> kasan_report+0xe6/0x110
> ? mast_spanning_rebalance+0x2634/0x29b0
> ? mab_mas_cp+0x2d9/0x6c0
> kasan_check_range+0x2ef/0x310
> ? mab_mas_cp+0x2d9/0x6c0
> ? mab_mas_cp+0x2d9/0x6c0
> memcpy+0x44/0x70
> mab_mas_cp+0x2d9/0x6c0
> mas_spanning_rebalance+0x1a3e/0x4f90

Does this translate to an inline around line 2997?
And then probably around 2808?

> ? stack_trace_save+0xca/0x160
> ? stack_trace_save+0xca/0x160
> mas_wr_spanning_store+0x16c5/0x1b80
> mas_wr_store_entry+0xbf9/0x12e0
> mas_store_prealloc+0x205/0x3c0
> do_mas_align_munmap+0x6cf/0xd10
> do_mas_munmap+0x1bb/0x210
> ? down_write_killable+0xa6/0x110
> __vm_munmap+0x1c4/0x270

Looks like a NULL entry being written.

> __x64_sys_munmap+0x60/0x70
> do_syscall_64+0x44/0xa0
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
> RIP: 0033:0x589827


Thanks,
Liam

2022-06-16 02:24:14

by Yu Zhao

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

On Wed, Jun 15, 2022 at 7:50 PM Liam Howlett <[email protected]> wrote:
>
> * Yu Zhao <[email protected]> [220615 17:17]:
>
> ...
>
> > > Yes, I used the same parameters with 512GB of RAM, and the kernel with
> > > KASAN and other debug options.
> >
> > Sorry, Liam. I got the same crash :(
>
> Thanks for running this promptly. I am trying to get my own server
> setup now.
>
> >
> > 9d27f2f1487a (tag: mm-everything-2022-06-14-19-05, akpm/mm-everything)
> > 00d4d7b519d6 fs/userfaultfd: Fix vma iteration in mas_for_each() loop
> > 55140693394d maple_tree: Make mas_prealloc() error checking more generic
> > 2d7e7c2fcf16 maple_tree: Fix mt_destroy_walk() on full non-leaf non-alloc nodes
> > 4d4472148ccd maple_tree: Change spanning store to work on larger trees
> > ea36bcc14c00 test_maple_tree: Add tests for preallocations and large
> > spanning writes
> > 0d2aa86ead4f mm/mlock: Drop dead code in count_mm_mlocked_page_nr()
> >
> > ==================================================================
> > BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> > Write of size 136 at addr ffff88c35a3b9e80 by task stress-ng/19303
> >
> > CPU: 66 PID: 19303 Comm: stress-ng Tainted: G S I 5.19.0-smp-DEV #1
> > Call Trace:
> > <TASK>
> > dump_stack_lvl+0xc5/0xf4
> > print_address_description+0x7f/0x460
> > print_report+0x10b/0x240
> > ? mab_mas_cp+0x2d9/0x6c0
> > kasan_report+0xe6/0x110
> > ? mast_spanning_rebalance+0x2634/0x29b0
> > ? mab_mas_cp+0x2d9/0x6c0
> > kasan_check_range+0x2ef/0x310
> > ? mab_mas_cp+0x2d9/0x6c0
> > ? mab_mas_cp+0x2d9/0x6c0
> > memcpy+0x44/0x70
> > mab_mas_cp+0x2d9/0x6c0
> > mas_spanning_rebalance+0x1a3e/0x4f90
>
> Does this translate to an inline around line 2997?
> And then probably around 2808?

$ ./scripts/faddr2line vmlinux mab_mas_cp+0x2d9
mab_mas_cp+0x2d9/0x6c0:
mab_mas_cp at lib/maple_tree.c:1988
$ ./scripts/faddr2line vmlinux mas_spanning_rebalance+0x1a3e
mas_spanning_rebalance+0x1a3e/0x4f90:
mast_cp_to_nodes at lib/maple_tree.c:?
(inlined by) mas_spanning_rebalance at lib/maple_tree.c:2997
$ ./scripts/faddr2line vmlinux mas_wr_spanning_store+0x16c5
mas_wr_spanning_store+0x16c5/0x1b80:
mas_wr_spanning_store at lib/maple_tree.c:?

No idea why faddr2line didn't work for the last two addresses. GDB
seems more reliable.

(gdb) li *(mab_mas_cp+0x2d9)
0xffffffff8226b049 is in mab_mas_cp (lib/maple_tree.c:1988).
(gdb) li *(mas_spanning_rebalance+0x1a3e)
0xffffffff822633ce is in mas_spanning_rebalance (lib/maple_tree.c:2801).
quit)
(gdb) li *(mas_wr_spanning_store+0x16c5)
0xffffffff8225cfb5 is in mas_wr_spanning_store (lib/maple_tree.c:4030).

2022-06-16 03:07:34

by Liam R. Howlett

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

* Yu Zhao <[email protected]> [220615 21:59]:
> On Wed, Jun 15, 2022 at 7:50 PM Liam Howlett <[email protected]> wrote:
> >
> > * Yu Zhao <[email protected]> [220615 17:17]:
> >
> > ...
> >
> > > > Yes, I used the same parameters with 512GB of RAM, and the kernel with
> > > > KASAN and other debug options.
> > >
> > > Sorry, Liam. I got the same crash :(
> >
> > Thanks for running this promptly. I am trying to get my own server
> > setup now.
> >
> > >
> > > 9d27f2f1487a (tag: mm-everything-2022-06-14-19-05, akpm/mm-everything)
> > > 00d4d7b519d6 fs/userfaultfd: Fix vma iteration in mas_for_each() loop
> > > 55140693394d maple_tree: Make mas_prealloc() error checking more generic
> > > 2d7e7c2fcf16 maple_tree: Fix mt_destroy_walk() on full non-leaf non-alloc nodes
> > > 4d4472148ccd maple_tree: Change spanning store to work on larger trees
> > > ea36bcc14c00 test_maple_tree: Add tests for preallocations and large
> > > spanning writes
> > > 0d2aa86ead4f mm/mlock: Drop dead code in count_mm_mlocked_page_nr()
> > >
> > > ==================================================================
> > > BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> > > Write of size 136 at addr ffff88c35a3b9e80 by task stress-ng/19303
> > >
> > > CPU: 66 PID: 19303 Comm: stress-ng Tainted: G S I 5.19.0-smp-DEV #1
> > > Call Trace:
> > > <TASK>
> > > dump_stack_lvl+0xc5/0xf4
> > > print_address_description+0x7f/0x460
> > > print_report+0x10b/0x240
> > > ? mab_mas_cp+0x2d9/0x6c0
> > > kasan_report+0xe6/0x110
> > > ? mast_spanning_rebalance+0x2634/0x29b0
> > > ? mab_mas_cp+0x2d9/0x6c0
> > > kasan_check_range+0x2ef/0x310
> > > ? mab_mas_cp+0x2d9/0x6c0
> > > ? mab_mas_cp+0x2d9/0x6c0
> > > memcpy+0x44/0x70
> > > mab_mas_cp+0x2d9/0x6c0
> > > mas_spanning_rebalance+0x1a3e/0x4f90
> >
> > Does this translate to an inline around line 2997?
> > And then probably around 2808?
>
> $ ./scripts/faddr2line vmlinux mab_mas_cp+0x2d9
> mab_mas_cp+0x2d9/0x6c0:
> mab_mas_cp at lib/maple_tree.c:1988
> $ ./scripts/faddr2line vmlinux mas_spanning_rebalance+0x1a3e
> mas_spanning_rebalance+0x1a3e/0x4f90:
> mast_cp_to_nodes at lib/maple_tree.c:?
> (inlined by) mas_spanning_rebalance at lib/maple_tree.c:2997
> $ ./scripts/faddr2line vmlinux mas_wr_spanning_store+0x16c5
> mas_wr_spanning_store+0x16c5/0x1b80:
> mas_wr_spanning_store at lib/maple_tree.c:?
>
> No idea why faddr2line didn't work for the last two addresses. GDB
> seems more reliable.
>
> (gdb) li *(mab_mas_cp+0x2d9)
> 0xffffffff8226b049 is in mab_mas_cp (lib/maple_tree.c:1988).
> (gdb) li *(mas_spanning_rebalance+0x1a3e)
> 0xffffffff822633ce is in mas_spanning_rebalance (lib/maple_tree.c:2801).
> quit)
> (gdb) li *(mas_wr_spanning_store+0x16c5)
> 0xffffffff8225cfb5 is in mas_wr_spanning_store (lib/maple_tree.c:4030).


Thanks. I am not having luck recreating it. I am hitting what looks
like an unrelated issue in the unstable mm, "scheduling while atomic".
I will try the git commit you indicate above.

2022-06-16 03:27:38

by Yu Zhao

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

On Wed, Jun 15, 2022 at 8:56 PM Liam Howlett <[email protected]> wrote:
>
> * Yu Zhao <[email protected]> [220615 21:59]:
> > On Wed, Jun 15, 2022 at 7:50 PM Liam Howlett <[email protected]> wrote:
> > >
> > > * Yu Zhao <[email protected]> [220615 17:17]:
> > >
> > > ...
> > >
> > > > > Yes, I used the same parameters with 512GB of RAM, and the kernel with
> > > > > KASAN and other debug options.
> > > >
> > > > Sorry, Liam. I got the same crash :(
> > >
> > > Thanks for running this promptly. I am trying to get my own server
> > > setup now.
> > >
> > > >
> > > > 9d27f2f1487a (tag: mm-everything-2022-06-14-19-05, akpm/mm-everything)
> > > > 00d4d7b519d6 fs/userfaultfd: Fix vma iteration in mas_for_each() loop
> > > > 55140693394d maple_tree: Make mas_prealloc() error checking more generic
> > > > 2d7e7c2fcf16 maple_tree: Fix mt_destroy_walk() on full non-leaf non-alloc nodes
> > > > 4d4472148ccd maple_tree: Change spanning store to work on larger trees
> > > > ea36bcc14c00 test_maple_tree: Add tests for preallocations and large
> > > > spanning writes
> > > > 0d2aa86ead4f mm/mlock: Drop dead code in count_mm_mlocked_page_nr()
> > > >
> > > > ==================================================================
> > > > BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> > > > Write of size 136 at addr ffff88c35a3b9e80 by task stress-ng/19303
> > > >
> > > > CPU: 66 PID: 19303 Comm: stress-ng Tainted: G S I 5.19.0-smp-DEV #1
> > > > Call Trace:
> > > > <TASK>
> > > > dump_stack_lvl+0xc5/0xf4
> > > > print_address_description+0x7f/0x460
> > > > print_report+0x10b/0x240
> > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > kasan_report+0xe6/0x110
> > > > ? mast_spanning_rebalance+0x2634/0x29b0
> > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > kasan_check_range+0x2ef/0x310
> > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > memcpy+0x44/0x70
> > > > mab_mas_cp+0x2d9/0x6c0
> > > > mas_spanning_rebalance+0x1a3e/0x4f90
> > >
> > > Does this translate to an inline around line 2997?
> > > And then probably around 2808?
> >
> > $ ./scripts/faddr2line vmlinux mab_mas_cp+0x2d9
> > mab_mas_cp+0x2d9/0x6c0:
> > mab_mas_cp at lib/maple_tree.c:1988
> > $ ./scripts/faddr2line vmlinux mas_spanning_rebalance+0x1a3e
> > mas_spanning_rebalance+0x1a3e/0x4f90:
> > mast_cp_to_nodes at lib/maple_tree.c:?
> > (inlined by) mas_spanning_rebalance at lib/maple_tree.c:2997
> > $ ./scripts/faddr2line vmlinux mas_wr_spanning_store+0x16c5
> > mas_wr_spanning_store+0x16c5/0x1b80:
> > mas_wr_spanning_store at lib/maple_tree.c:?
> >
> > No idea why faddr2line didn't work for the last two addresses. GDB
> > seems more reliable.
> >
> > (gdb) li *(mab_mas_cp+0x2d9)
> > 0xffffffff8226b049 is in mab_mas_cp (lib/maple_tree.c:1988).
> > (gdb) li *(mas_spanning_rebalance+0x1a3e)
> > 0xffffffff822633ce is in mas_spanning_rebalance (lib/maple_tree.c:2801).
> > quit)
> > (gdb) li *(mas_wr_spanning_store+0x16c5)
> > 0xffffffff8225cfb5 is in mas_wr_spanning_store (lib/maple_tree.c:4030).
>
>
> Thanks. I am not having luck recreating it. I am hitting what looks
> like an unrelated issue in the unstable mm, "scheduling while atomic".
> I will try the git commit you indicate above.

Fix here:
https://lore.kernel.org/linux-mm/[email protected]/

2022-06-16 06:05:00

by Yu Zhao

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

On Wed, Jun 15, 2022 at 9:02 PM Yu Zhao <[email protected]> wrote:
>
> On Wed, Jun 15, 2022 at 8:56 PM Liam Howlett <[email protected]> wrote:
> >
> > * Yu Zhao <[email protected]> [220615 21:59]:
> > > On Wed, Jun 15, 2022 at 7:50 PM Liam Howlett <[email protected]> wrote:
> > > >
> > > > * Yu Zhao <[email protected]> [220615 17:17]:
> > > >
> > > > ...
> > > >
> > > > > > Yes, I used the same parameters with 512GB of RAM, and the kernel with
> > > > > > KASAN and other debug options.
> > > > >
> > > > > Sorry, Liam. I got the same crash :(
> > > >
> > > > Thanks for running this promptly. I am trying to get my own server
> > > > setup now.
> > > >
> > > > >
> > > > > 9d27f2f1487a (tag: mm-everything-2022-06-14-19-05, akpm/mm-everything)
> > > > > 00d4d7b519d6 fs/userfaultfd: Fix vma iteration in mas_for_each() loop
> > > > > 55140693394d maple_tree: Make mas_prealloc() error checking more generic
> > > > > 2d7e7c2fcf16 maple_tree: Fix mt_destroy_walk() on full non-leaf non-alloc nodes
> > > > > 4d4472148ccd maple_tree: Change spanning store to work on larger trees
> > > > > ea36bcc14c00 test_maple_tree: Add tests for preallocations and large
> > > > > spanning writes
> > > > > 0d2aa86ead4f mm/mlock: Drop dead code in count_mm_mlocked_page_nr()
> > > > >
> > > > > ==================================================================
> > > > > BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> > > > > Write of size 136 at addr ffff88c35a3b9e80 by task stress-ng/19303
> > > > >
> > > > > CPU: 66 PID: 19303 Comm: stress-ng Tainted: G S I 5.19.0-smp-DEV #1
> > > > > Call Trace:
> > > > > <TASK>
> > > > > dump_stack_lvl+0xc5/0xf4
> > > > > print_address_description+0x7f/0x460
> > > > > print_report+0x10b/0x240
> > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > kasan_report+0xe6/0x110
> > > > > ? mast_spanning_rebalance+0x2634/0x29b0
> > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > kasan_check_range+0x2ef/0x310
> > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > memcpy+0x44/0x70
> > > > > mab_mas_cp+0x2d9/0x6c0
> > > > > mas_spanning_rebalance+0x1a3e/0x4f90
> > > >
> > > > Does this translate to an inline around line 2997?
> > > > And then probably around 2808?
> > >
> > > $ ./scripts/faddr2line vmlinux mab_mas_cp+0x2d9
> > > mab_mas_cp+0x2d9/0x6c0:
> > > mab_mas_cp at lib/maple_tree.c:1988
> > > $ ./scripts/faddr2line vmlinux mas_spanning_rebalance+0x1a3e
> > > mas_spanning_rebalance+0x1a3e/0x4f90:
> > > mast_cp_to_nodes at lib/maple_tree.c:?
> > > (inlined by) mas_spanning_rebalance at lib/maple_tree.c:2997
> > > $ ./scripts/faddr2line vmlinux mas_wr_spanning_store+0x16c5
> > > mas_wr_spanning_store+0x16c5/0x1b80:
> > > mas_wr_spanning_store at lib/maple_tree.c:?
> > >
> > > No idea why faddr2line didn't work for the last two addresses. GDB
> > > seems more reliable.
> > >
> > > (gdb) li *(mab_mas_cp+0x2d9)
> > > 0xffffffff8226b049 is in mab_mas_cp (lib/maple_tree.c:1988).
> > > (gdb) li *(mas_spanning_rebalance+0x1a3e)
> > > 0xffffffff822633ce is in mas_spanning_rebalance (lib/maple_tree.c:2801).
> > > quit)
> > > (gdb) li *(mas_wr_spanning_store+0x16c5)
> > > 0xffffffff8225cfb5 is in mas_wr_spanning_store (lib/maple_tree.c:4030).
> >
> >
> > Thanks. I am not having luck recreating it. I am hitting what looks
> > like an unrelated issue in the unstable mm, "scheduling while atomic".
> > I will try the git commit you indicate above.
>
> Fix here:
> https://lore.kernel.org/linux-mm/[email protected]/

A seemingly new crash on arm64:

KASAN: null-ptr-deref in range [0x0000000000000000-0x000000000000000f]
pc : __hwasan_check_x2_67043363+0x4/0x34
lr : mas_wr_walk_descend+0xe0/0x2c0
sp : ffffffc0164378d0
x29: ffffffc0164378f0 x28: 13ffff8028ee7328 x27: ffffffc016437a68
x26: 0dffff807aa63710 x25: ffffffc016437a60 x24: 51ffff8028ee1928
x23: ffffffc016437a78 x22: ffffffc0164379e0 x21: ffffffc016437998
x20: efffffc000000000 x19: ffffffc016437998 x18: 07ffff8077718180
x17: 45ffff800b366010 x16: 0000000000000000 x15: 9cffff8092bfcdf0
x14: ffffffefef411b8c x13: 0000000000000001 x12: 0000000000000002
x11: ffffffffffffff00 x10: 0000000000000000 x9 : efffffc000000000
x8 : ffffffc016437a60 x7 : 0000000000000000 x6 : ffffffefef8246cc
x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffffffeff0bf48ee
x2 : 0000000000000008 x1 : ffffffc0164379b8 x0 : ffffffc016437998
Call trace:
__hwasan_check_x2_67043363+0x4/0x34
mas_wr_store_entry+0x178/0x5c0
mas_store+0x88/0xc8
dup_mmap+0x4bc/0x6d8
dup_mm+0x8c/0x17c
copy_mm+0xb0/0x12c
copy_process+0xa44/0x17d4
kernel_clone+0x100/0x2cc
__arm64_sys_clone+0xf4/0x120
el0_svc_common+0xfc/0x1cc
do_el0_svc_compat+0x38/0x5c
el0_svc_compat+0x68/0xf4
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x190/0x194
Code: aa0203e0 d2800441 141e931d 9344dc50 (38706930)

2022-06-16 06:30:06

by Yu Zhao

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

On Wed, Jun 15, 2022 at 11:45 PM Yu Zhao <[email protected]> wrote:
>
> On Wed, Jun 15, 2022 at 9:02 PM Yu Zhao <[email protected]> wrote:
> >
> > On Wed, Jun 15, 2022 at 8:56 PM Liam Howlett <[email protected]> wrote:
> > >
> > > * Yu Zhao <[email protected]> [220615 21:59]:
> > > > On Wed, Jun 15, 2022 at 7:50 PM Liam Howlett <[email protected]> wrote:
> > > > >
> > > > > * Yu Zhao <[email protected]> [220615 17:17]:
> > > > >
> > > > > ...
> > > > >
> > > > > > > Yes, I used the same parameters with 512GB of RAM, and the kernel with
> > > > > > > KASAN and other debug options.
> > > > > >
> > > > > > Sorry, Liam. I got the same crash :(
> > > > >
> > > > > Thanks for running this promptly. I am trying to get my own server
> > > > > setup now.
> > > > >
> > > > > >
> > > > > > 9d27f2f1487a (tag: mm-everything-2022-06-14-19-05, akpm/mm-everything)
> > > > > > 00d4d7b519d6 fs/userfaultfd: Fix vma iteration in mas_for_each() loop
> > > > > > 55140693394d maple_tree: Make mas_prealloc() error checking more generic
> > > > > > 2d7e7c2fcf16 maple_tree: Fix mt_destroy_walk() on full non-leaf non-alloc nodes
> > > > > > 4d4472148ccd maple_tree: Change spanning store to work on larger trees
> > > > > > ea36bcc14c00 test_maple_tree: Add tests for preallocations and large
> > > > > > spanning writes
> > > > > > 0d2aa86ead4f mm/mlock: Drop dead code in count_mm_mlocked_page_nr()
> > > > > >
> > > > > > ==================================================================
> > > > > > BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> > > > > > Write of size 136 at addr ffff88c35a3b9e80 by task stress-ng/19303
> > > > > >
> > > > > > CPU: 66 PID: 19303 Comm: stress-ng Tainted: G S I 5.19.0-smp-DEV #1
> > > > > > Call Trace:
> > > > > > <TASK>
> > > > > > dump_stack_lvl+0xc5/0xf4
> > > > > > print_address_description+0x7f/0x460
> > > > > > print_report+0x10b/0x240
> > > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > > kasan_report+0xe6/0x110
> > > > > > ? mast_spanning_rebalance+0x2634/0x29b0
> > > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > > kasan_check_range+0x2ef/0x310
> > > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > > memcpy+0x44/0x70
> > > > > > mab_mas_cp+0x2d9/0x6c0
> > > > > > mas_spanning_rebalance+0x1a3e/0x4f90
> > > > >
> > > > > Does this translate to an inline around line 2997?
> > > > > And then probably around 2808?
> > > >
> > > > $ ./scripts/faddr2line vmlinux mab_mas_cp+0x2d9
> > > > mab_mas_cp+0x2d9/0x6c0:
> > > > mab_mas_cp at lib/maple_tree.c:1988
> > > > $ ./scripts/faddr2line vmlinux mas_spanning_rebalance+0x1a3e
> > > > mas_spanning_rebalance+0x1a3e/0x4f90:
> > > > mast_cp_to_nodes at lib/maple_tree.c:?
> > > > (inlined by) mas_spanning_rebalance at lib/maple_tree.c:2997
> > > > $ ./scripts/faddr2line vmlinux mas_wr_spanning_store+0x16c5
> > > > mas_wr_spanning_store+0x16c5/0x1b80:
> > > > mas_wr_spanning_store at lib/maple_tree.c:?
> > > >
> > > > No idea why faddr2line didn't work for the last two addresses. GDB
> > > > seems more reliable.
> > > >
> > > > (gdb) li *(mab_mas_cp+0x2d9)
> > > > 0xffffffff8226b049 is in mab_mas_cp (lib/maple_tree.c:1988).
> > > > (gdb) li *(mas_spanning_rebalance+0x1a3e)
> > > > 0xffffffff822633ce is in mas_spanning_rebalance (lib/maple_tree.c:2801).
> > > > quit)
> > > > (gdb) li *(mas_wr_spanning_store+0x16c5)
> > > > 0xffffffff8225cfb5 is in mas_wr_spanning_store (lib/maple_tree.c:4030).
> > >
> > >
> > > Thanks. I am not having luck recreating it. I am hitting what looks
> > > like an unrelated issue in the unstable mm, "scheduling while atomic".
> > > I will try the git commit you indicate above.
> >
> > Fix here:
> > https://lore.kernel.org/linux-mm/[email protected]/
>
> A seemingly new crash on arm64:
>
> KASAN: null-ptr-deref in range [0x0000000000000000-0x000000000000000f]
> Call trace:
> __hwasan_check_x2_67043363+0x4/0x34
> mas_wr_store_entry+0x178/0x5c0
> mas_store+0x88/0xc8
> dup_mmap+0x4bc/0x6d8
> dup_mm+0x8c/0x17c
> copy_mm+0xb0/0x12c
> copy_process+0xa44/0x17d4
> kernel_clone+0x100/0x2cc
> __arm64_sys_clone+0xf4/0x120
> el0_svc_common+0xfc/0x1cc
> do_el0_svc_compat+0x38/0x5c
> el0_svc_compat+0x68/0xf4
> el0t_32_sync_handler+0xc0/0xf0
> el0t_32_sync+0x190/0x194
> Code: aa0203e0 d2800441 141e931d 9344dc50 (38706930)

And bad rss counters from another arm64 machine:

BUG: Bad rss-counter state mm:a6ffff80895ff840 type:MM_ANONPAGES val:4
Call trace:
__mmdrop+0x1f0/0x208
__mmput+0x194/0x198
mmput+0x5c/0x80
exit_mm+0x108/0x190
do_exit+0x244/0xc98
__arm64_sys_exit_group+0x0/0x30
__wake_up_parent+0x0/0x48
el0_svc_common+0xfc/0x1cc
do_el0_svc_compat+0x38/0x5c
el0_svc_compat+0x68/0xf4
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x190/0x194
Code: b000b520 91259c00 aa1303e1 94482015 (d4210000)

2022-06-16 18:29:27

by Liam R. Howlett

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

* Yu Zhao <[email protected]> [220616 01:56]:
> On Wed, Jun 15, 2022 at 11:45 PM Yu Zhao <[email protected]> wrote:
> >
> > On Wed, Jun 15, 2022 at 9:02 PM Yu Zhao <[email protected]> wrote:
> > >
> > > On Wed, Jun 15, 2022 at 8:56 PM Liam Howlett <[email protected]> wrote:
> > > >
> > > > * Yu Zhao <[email protected]> [220615 21:59]:
> > > > > On Wed, Jun 15, 2022 at 7:50 PM Liam Howlett <[email protected]> wrote:
> > > > > >
> > > > > > * Yu Zhao <[email protected]> [220615 17:17]:
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > > > Yes, I used the same parameters with 512GB of RAM, and the kernel with
> > > > > > > > KASAN and other debug options.
> > > > > > >
> > > > > > > Sorry, Liam. I got the same crash :(
> > > > > >
> > > > > > Thanks for running this promptly. I am trying to get my own server
> > > > > > setup now.
> > > > > >
> > > > > > >
> > > > > > > 9d27f2f1487a (tag: mm-everything-2022-06-14-19-05, akpm/mm-everything)
> > > > > > > 00d4d7b519d6 fs/userfaultfd: Fix vma iteration in mas_for_each() loop
> > > > > > > 55140693394d maple_tree: Make mas_prealloc() error checking more generic
> > > > > > > 2d7e7c2fcf16 maple_tree: Fix mt_destroy_walk() on full non-leaf non-alloc nodes
> > > > > > > 4d4472148ccd maple_tree: Change spanning store to work on larger trees
> > > > > > > ea36bcc14c00 test_maple_tree: Add tests for preallocations and large
> > > > > > > spanning writes
> > > > > > > 0d2aa86ead4f mm/mlock: Drop dead code in count_mm_mlocked_page_nr()
> > > > > > >
> > > > > > > ==================================================================
> > > > > > > BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> > > > > > > Write of size 136 at addr ffff88c35a3b9e80 by task stress-ng/19303
> > > > > > >
> > > > > > > CPU: 66 PID: 19303 Comm: stress-ng Tainted: G S I 5.19.0-smp-DEV #1
> > > > > > > Call Trace:
> > > > > > > <TASK>
> > > > > > > dump_stack_lvl+0xc5/0xf4
> > > > > > > print_address_description+0x7f/0x460
> > > > > > > print_report+0x10b/0x240
> > > > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > > > kasan_report+0xe6/0x110
> > > > > > > ? mast_spanning_rebalance+0x2634/0x29b0
> > > > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > > > kasan_check_range+0x2ef/0x310
> > > > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > > > memcpy+0x44/0x70
> > > > > > > mab_mas_cp+0x2d9/0x6c0
> > > > > > > mas_spanning_rebalance+0x1a3e/0x4f90
> > > > > >
> > > > > > Does this translate to an inline around line 2997?
> > > > > > And then probably around 2808?
> > > > >
> > > > > $ ./scripts/faddr2line vmlinux mab_mas_cp+0x2d9
> > > > > mab_mas_cp+0x2d9/0x6c0:
> > > > > mab_mas_cp at lib/maple_tree.c:1988
> > > > > $ ./scripts/faddr2line vmlinux mas_spanning_rebalance+0x1a3e
> > > > > mas_spanning_rebalance+0x1a3e/0x4f90:
> > > > > mast_cp_to_nodes at lib/maple_tree.c:?
> > > > > (inlined by) mas_spanning_rebalance at lib/maple_tree.c:2997
> > > > > $ ./scripts/faddr2line vmlinux mas_wr_spanning_store+0x16c5
> > > > > mas_wr_spanning_store+0x16c5/0x1b80:
> > > > > mas_wr_spanning_store at lib/maple_tree.c:?
> > > > >
> > > > > No idea why faddr2line didn't work for the last two addresses. GDB
> > > > > seems more reliable.
> > > > >
> > > > > (gdb) li *(mab_mas_cp+0x2d9)
> > > > > 0xffffffff8226b049 is in mab_mas_cp (lib/maple_tree.c:1988).
> > > > > (gdb) li *(mas_spanning_rebalance+0x1a3e)
> > > > > 0xffffffff822633ce is in mas_spanning_rebalance (lib/maple_tree.c:2801).
> > > > > quit)
> > > > > (gdb) li *(mas_wr_spanning_store+0x16c5)
> > > > > 0xffffffff8225cfb5 is in mas_wr_spanning_store (lib/maple_tree.c:4030).
> > > >
> > > >
> > > > Thanks. I am not having luck recreating it. I am hitting what looks
> > > > like an unrelated issue in the unstable mm, "scheduling while atomic".
> > > > I will try the git commit you indicate above.
> > >
> > > Fix here:
> > > https://lore.kernel.org/linux-mm/[email protected]/
> >
> > A seemingly new crash on arm64:
> >
> > KASAN: null-ptr-deref in range [0x0000000000000000-0x000000000000000f]
> > Call trace:
> > __hwasan_check_x2_67043363+0x4/0x34
> > mas_wr_store_entry+0x178/0x5c0
> > mas_store+0x88/0xc8
> > dup_mmap+0x4bc/0x6d8
> > dup_mm+0x8c/0x17c
> > copy_mm+0xb0/0x12c
> > copy_process+0xa44/0x17d4
> > kernel_clone+0x100/0x2cc
> > __arm64_sys_clone+0xf4/0x120
> > el0_svc_common+0xfc/0x1cc
> > do_el0_svc_compat+0x38/0x5c
> > el0_svc_compat+0x68/0xf4
> > el0t_32_sync_handler+0xc0/0xf0
> > el0t_32_sync+0x190/0x194
> > Code: aa0203e0 d2800441 141e931d 9344dc50 (38706930)
>
> And bad rss counters from another arm64 machine:
>
> BUG: Bad rss-counter state mm:a6ffff80895ff840 type:MM_ANONPAGES val:4
> Call trace:
> __mmdrop+0x1f0/0x208
> __mmput+0x194/0x198
> mmput+0x5c/0x80
> exit_mm+0x108/0x190
> do_exit+0x244/0xc98
> __arm64_sys_exit_group+0x0/0x30
> __wake_up_parent+0x0/0x48
> el0_svc_common+0xfc/0x1cc
> do_el0_svc_compat+0x38/0x5c
> el0_svc_compat+0x68/0xf4
> el0t_32_sync_handler+0xc0/0xf0
> el0t_32_sync+0x190/0x194
> Code: b000b520 91259c00 aa1303e1 94482015 (d4210000)
>


What was the setup for these two? I'm running trinity, but I suspect
you are using stress-ng? If so, what are the arguments? My arm64 vm is
even lower memory than my x86_64 vm so I will probably have to adjust
accordingly.


Thanks,
Liam

2022-06-16 19:10:29

by Yu Zhao

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

On Thu, Jun 16, 2022 at 12:27 PM Liam Howlett <[email protected]> wrote:
>
> * Yu Zhao <[email protected]> [220616 01:56]:
> > On Wed, Jun 15, 2022 at 11:45 PM Yu Zhao <[email protected]> wrote:
> > >
> > > On Wed, Jun 15, 2022 at 9:02 PM Yu Zhao <[email protected]> wrote:
> > > >
> > > > On Wed, Jun 15, 2022 at 8:56 PM Liam Howlett <[email protected]> wrote:
> > > > >
> > > > > * Yu Zhao <[email protected]> [220615 21:59]:
> > > > > > On Wed, Jun 15, 2022 at 7:50 PM Liam Howlett <[email protected]> wrote:
> > > > > > >
> > > > > > > * Yu Zhao <[email protected]> [220615 17:17]:
> > > > > > >
> > > > > > > ...
> > > > > > >
> > > > > > > > > Yes, I used the same parameters with 512GB of RAM, and the kernel with
> > > > > > > > > KASAN and other debug options.
> > > > > > > >
> > > > > > > > Sorry, Liam. I got the same crash :(
> > > > > > >
> > > > > > > Thanks for running this promptly. I am trying to get my own server
> > > > > > > setup now.
> > > > > > >
> > > > > > > >
> > > > > > > > 9d27f2f1487a (tag: mm-everything-2022-06-14-19-05, akpm/mm-everything)
> > > > > > > > 00d4d7b519d6 fs/userfaultfd: Fix vma iteration in mas_for_each() loop
> > > > > > > > 55140693394d maple_tree: Make mas_prealloc() error checking more generic
> > > > > > > > 2d7e7c2fcf16 maple_tree: Fix mt_destroy_walk() on full non-leaf non-alloc nodes
> > > > > > > > 4d4472148ccd maple_tree: Change spanning store to work on larger trees
> > > > > > > > ea36bcc14c00 test_maple_tree: Add tests for preallocations and large
> > > > > > > > spanning writes
> > > > > > > > 0d2aa86ead4f mm/mlock: Drop dead code in count_mm_mlocked_page_nr()
> > > > > > > >
> > > > > > > > ==================================================================
> > > > > > > > BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> > > > > > > > Write of size 136 at addr ffff88c35a3b9e80 by task stress-ng/19303
> > > > > > > >
> > > > > > > > CPU: 66 PID: 19303 Comm: stress-ng Tainted: G S I 5.19.0-smp-DEV #1
> > > > > > > > Call Trace:
> > > > > > > > <TASK>
> > > > > > > > dump_stack_lvl+0xc5/0xf4
> > > > > > > > print_address_description+0x7f/0x460
> > > > > > > > print_report+0x10b/0x240
> > > > > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > > > > kasan_report+0xe6/0x110
> > > > > > > > ? mast_spanning_rebalance+0x2634/0x29b0
> > > > > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > > > > kasan_check_range+0x2ef/0x310
> > > > > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > > > > memcpy+0x44/0x70
> > > > > > > > mab_mas_cp+0x2d9/0x6c0
> > > > > > > > mas_spanning_rebalance+0x1a3e/0x4f90
> > > > > > >
> > > > > > > Does this translate to an inline around line 2997?
> > > > > > > And then probably around 2808?
> > > > > >
> > > > > > $ ./scripts/faddr2line vmlinux mab_mas_cp+0x2d9
> > > > > > mab_mas_cp+0x2d9/0x6c0:
> > > > > > mab_mas_cp at lib/maple_tree.c:1988
> > > > > > $ ./scripts/faddr2line vmlinux mas_spanning_rebalance+0x1a3e
> > > > > > mas_spanning_rebalance+0x1a3e/0x4f90:
> > > > > > mast_cp_to_nodes at lib/maple_tree.c:?
> > > > > > (inlined by) mas_spanning_rebalance at lib/maple_tree.c:2997
> > > > > > $ ./scripts/faddr2line vmlinux mas_wr_spanning_store+0x16c5
> > > > > > mas_wr_spanning_store+0x16c5/0x1b80:
> > > > > > mas_wr_spanning_store at lib/maple_tree.c:?
> > > > > >
> > > > > > No idea why faddr2line didn't work for the last two addresses. GDB
> > > > > > seems more reliable.
> > > > > >
> > > > > > (gdb) li *(mab_mas_cp+0x2d9)
> > > > > > 0xffffffff8226b049 is in mab_mas_cp (lib/maple_tree.c:1988).
> > > > > > (gdb) li *(mas_spanning_rebalance+0x1a3e)
> > > > > > 0xffffffff822633ce is in mas_spanning_rebalance (lib/maple_tree.c:2801).
> > > > > > quit)
> > > > > > (gdb) li *(mas_wr_spanning_store+0x16c5)
> > > > > > 0xffffffff8225cfb5 is in mas_wr_spanning_store (lib/maple_tree.c:4030).
> > > > >
> > > > >
> > > > > Thanks. I am not having luck recreating it. I am hitting what looks
> > > > > like an unrelated issue in the unstable mm, "scheduling while atomic".
> > > > > I will try the git commit you indicate above.
> > > >
> > > > Fix here:
> > > > https://lore.kernel.org/linux-mm/[email protected]/
> > >
> > > A seemingly new crash on arm64:
> > >
> > > KASAN: null-ptr-deref in range [0x0000000000000000-0x000000000000000f]
> > > Call trace:
> > > __hwasan_check_x2_67043363+0x4/0x34
> > > mas_wr_store_entry+0x178/0x5c0
> > > mas_store+0x88/0xc8
> > > dup_mmap+0x4bc/0x6d8
> > > dup_mm+0x8c/0x17c
> > > copy_mm+0xb0/0x12c
> > > copy_process+0xa44/0x17d4
> > > kernel_clone+0x100/0x2cc
> > > __arm64_sys_clone+0xf4/0x120
> > > el0_svc_common+0xfc/0x1cc
> > > do_el0_svc_compat+0x38/0x5c
> > > el0_svc_compat+0x68/0xf4
> > > el0t_32_sync_handler+0xc0/0xf0
> > > el0t_32_sync+0x190/0x194
> > > Code: aa0203e0 d2800441 141e931d 9344dc50 (38706930)
> >
> > And bad rss counters from another arm64 machine:
> >
> > BUG: Bad rss-counter state mm:a6ffff80895ff840 type:MM_ANONPAGES val:4
> > Call trace:
> > __mmdrop+0x1f0/0x208
> > __mmput+0x194/0x198
> > mmput+0x5c/0x80
> > exit_mm+0x108/0x190
> > do_exit+0x244/0xc98
> > __arm64_sys_exit_group+0x0/0x30
> > __wake_up_parent+0x0/0x48
> > el0_svc_common+0xfc/0x1cc
> > do_el0_svc_compat+0x38/0x5c
> > el0_svc_compat+0x68/0xf4
> > el0t_32_sync_handler+0xc0/0xf0
> > el0t_32_sync+0x190/0x194
> > Code: b000b520 91259c00 aa1303e1 94482015 (d4210000)
> >
>
> What was the setup for these two? I'm running trinity, but I suspect
> you are using stress-ng?

That's correct.

> If so, what are the arguments? My arm64 vm is
> even lower memory than my x86_64 vm so I will probably have to adjust
> accordingly.

I usually lower the N for `-a N`.

2022-06-17 14:00:39

by Liam R. Howlett

Subject: Re: [PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

* Yu Zhao <[email protected]> [220616 14:35]:
> On Thu, Jun 16, 2022 at 12:27 PM Liam Howlett <[email protected]> wrote:
> >
> > * Yu Zhao <[email protected]> [220616 01:56]:
> > > On Wed, Jun 15, 2022 at 11:45 PM Yu Zhao <[email protected]> wrote:
> > > >
> > > > On Wed, Jun 15, 2022 at 9:02 PM Yu Zhao <[email protected]> wrote:
> > > > >
> > > > > On Wed, Jun 15, 2022 at 8:56 PM Liam Howlett <[email protected]> wrote:
> > > > > >
> > > > > > * Yu Zhao <[email protected]> [220615 21:59]:
> > > > > > > On Wed, Jun 15, 2022 at 7:50 PM Liam Howlett <[email protected]> wrote:
> > > > > > > >
> > > > > > > > * Yu Zhao <[email protected]> [220615 17:17]:
> > > > > > > >
> > > > > > > > ...
> > > > > > > >
> > > > > > > > > > Yes, I used the same parameters with 512GB of RAM, and the kernel with
> > > > > > > > > > KASAN and other debug options.
> > > > > > > > >
> > > > > > > > > Sorry, Liam. I got the same crash :(
> > > > > > > >
> > > > > > > > Thanks for running this promptly. I am trying to get my own server
> > > > > > > > setup now.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 9d27f2f1487a (tag: mm-everything-2022-06-14-19-05, akpm/mm-everything)
> > > > > > > > > 00d4d7b519d6 fs/userfaultfd: Fix vma iteration in mas_for_each() loop
> > > > > > > > > 55140693394d maple_tree: Make mas_prealloc() error checking more generic
> > > > > > > > > 2d7e7c2fcf16 maple_tree: Fix mt_destroy_walk() on full non-leaf non-alloc nodes
> > > > > > > > > 4d4472148ccd maple_tree: Change spanning store to work on larger trees
> > > > > > > > > ea36bcc14c00 test_maple_tree: Add tests for preallocations and large
> > > > > > > > > spanning writes
> > > > > > > > > 0d2aa86ead4f mm/mlock: Drop dead code in count_mm_mlocked_page_nr()
> > > > > > > > >
> > > > > > > > > ==================================================================
> > > > > > > > > BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> > > > > > > > > Write of size 136 at addr ffff88c35a3b9e80 by task stress-ng/19303
> > > > > > > > >
> > > > > > > > > CPU: 66 PID: 19303 Comm: stress-ng Tainted: G S I 5.19.0-smp-DEV #1
> > > > > > > > > Call Trace:
> > > > > > > > > <TASK>
> > > > > > > > > dump_stack_lvl+0xc5/0xf4
> > > > > > > > > print_address_description+0x7f/0x460
> > > > > > > > > print_report+0x10b/0x240
> > > > > > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > > > > > kasan_report+0xe6/0x110
> > > > > > > > > ? mast_spanning_rebalance+0x2634/0x29b0
> > > > > > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > > > > > kasan_check_range+0x2ef/0x310
> > > > > > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > > > > > ? mab_mas_cp+0x2d9/0x6c0
> > > > > > > > > memcpy+0x44/0x70
> > > > > > > > > mab_mas_cp+0x2d9/0x6c0
> > > > > > > > > mas_spanning_rebalance+0x1a3e/0x4f90
> > > > > > > >
> > > > > > > > Does this translate to an inline around line 2997?
> > > > > > > > And then probably around 2808?
> > > > > > >
> > > > > > > $ ./scripts/faddr2line vmlinux mab_mas_cp+0x2d9
> > > > > > > mab_mas_cp+0x2d9/0x6c0:
> > > > > > > mab_mas_cp at lib/maple_tree.c:1988
> > > > > > > $ ./scripts/faddr2line vmlinux mas_spanning_rebalance+0x1a3e
> > > > > > > mas_spanning_rebalance+0x1a3e/0x4f90:
> > > > > > > mast_cp_to_nodes at lib/maple_tree.c:?
> > > > > > > (inlined by) mas_spanning_rebalance at lib/maple_tree.c:2997
> > > > > > > $ ./scripts/faddr2line vmlinux mas_wr_spanning_store+0x16c5
> > > > > > > mas_wr_spanning_store+0x16c5/0x1b80:
> > > > > > > mas_wr_spanning_store at lib/maple_tree.c:?
> > > > > > >
> > > > > > > No idea why faddr2line didn't work for the last two addresses. GDB
> > > > > > > seems more reliable.
> > > > > > >
> > > > > > > (gdb) li *(mab_mas_cp+0x2d9)
> > > > > > > 0xffffffff8226b049 is in mab_mas_cp (lib/maple_tree.c:1988).
> > > > > > > (gdb) li *(mas_spanning_rebalance+0x1a3e)
> > > > > > > 0xffffffff822633ce is in mas_spanning_rebalance (lib/maple_tree.c:2801).
> > > > > > > quit)
> > > > > > > (gdb) li *(mas_wr_spanning_store+0x16c5)
> > > > > > > 0xffffffff8225cfb5 is in mas_wr_spanning_store (lib/maple_tree.c:4030).
> > > > > >
> > > > > >
> > > > > > Thanks. I am not having luck recreating it. I am hitting what looks
> > > > > > like an unrelated issue in the unstable mm, "scheduling while atomic".
> > > > > > I will try the git commit you indicate above.
> > > > >
> > > > > Fix here:
> > > > > https://lore.kernel.org/linux-mm/[email protected]/
> > > >
> > > > A seemingly new crash on arm64:
> > > >
> > > > KASAN: null-ptr-deref in range [0x0000000000000000-0x000000000000000f]
> > > > Call trace:
> > > > __hwasan_check_x2_67043363+0x4/0x34
> > > > mas_wr_store_entry+0x178/0x5c0
> > > > mas_store+0x88/0xc8
> > > > dup_mmap+0x4bc/0x6d8
> > > > dup_mm+0x8c/0x17c
> > > > copy_mm+0xb0/0x12c
> > > > copy_process+0xa44/0x17d4
> > > > kernel_clone+0x100/0x2cc
> > > > __arm64_sys_clone+0xf4/0x120
> > > > el0_svc_common+0xfc/0x1cc
> > > > do_el0_svc_compat+0x38/0x5c
> > > > el0_svc_compat+0x68/0xf4
> > > > el0t_32_sync_handler+0xc0/0xf0
> > > > el0t_32_sync+0x190/0x194
> > > > Code: aa0203e0 d2800441 141e931d 9344dc50 (38706930)
> > >
> > > And bad rss counters from another arm64 machine:
> > >
> > > BUG: Bad rss-counter state mm:a6ffff80895ff840 type:MM_ANONPAGES val:4
> > > Call trace:
> > > __mmdrop+0x1f0/0x208
> > > __mmput+0x194/0x198
> > > mmput+0x5c/0x80
> > > exit_mm+0x108/0x190
> > > do_exit+0x244/0xc98
> > > __arm64_sys_exit_group+0x0/0x30
> > > __wake_up_parent+0x0/0x48
> > > el0_svc_common+0xfc/0x1cc
> > > do_el0_svc_compat+0x38/0x5c
> > > el0_svc_compat+0x68/0xf4
> > > el0t_32_sync_handler+0xc0/0xf0
> > > el0t_32_sync+0x190/0x194
> > > Code: b000b520 91259c00 aa1303e1 94482015 (d4210000)
> > >
> >
> > What was the setup for these two? I'm running trinity, but I suspect
> > you are using stress-ng?
>
> That's correct.
>
> > If so, what are the arguments? My arm64 vm is
> > even lower memory than my x86_64 vm so I will probably have to adjust
> > accordingly.
>
> I usually lower the N for `-a N`.

I'm still trying to reproduce any of the bugs you are seeing. I sent
out two fixes, cc'ed to you, that may help with at least the last one
here. My thinking is that there isn't enough pre-allocation happening,
so I am missing some of the munmap events. I fixed this by not
pre-allocating the side tree and returning -ENOMEM instead. This is
safe since munmap can allocate anyway for splits.
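For readers following along, here is a minimal sketch of that pattern, not
the actual fix: the NULL store over the unmapped range allocates nodes on
demand and propagates -ENOMEM to the caller rather than depending on a
side-tree preallocation. The helper name below is made up, and the use of
mas_set_range()/mas_store_gfp() is an assumption about how this could look.

#include <linux/maple_tree.h>

/*
 * Illustrative sketch only.  Skip the side-tree preallocation and let the
 * NULL store over [start, end) allocate as needed; on failure the error is
 * returned to the caller instead of being papered over.
 */
static int munmap_store_null_range(struct ma_state *mas, unsigned long start,
				   unsigned long end)
{
	/* Overwrite all entries covering the unmapped range with NULL. */
	mas_set_range(mas, start, end - 1);
	return mas_store_gfp(mas, NULL, GFP_KERNEL);
}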