2023-03-27 19:05:20

by Liam R. Howlett

Subject: [PATCH 8/8] mm: enable maple tree RCU mode by default.

From: "Liam R. Howlett" <[email protected]>

Use the maple tree in RCU mode for VMA tracking.

The maple tree tracks the stack and is able to update the pivot
(lower/upper boundary) in-place to allow the page fault handler to write
to the tree while holding just the mmap read lock. This is safe as the
writes to the stack have a guard VMA which ensures there will always be
a NULL in the direction of the growth and thus will only update a pivot.

It is possible, but not recommended, to have VMAs that grow up/down
without guard VMAs. syzbot has constructed a testcase which sets up a
VMA to grow and consume the empty space. Overwriting the entire NULL
entry causes the tree to be altered in a way that is not safe for
concurrent readers; the readers may see a node being rewritten or one
that does not match the maple state they are using.

Enabling RCU mode allows the concurrent readers to see a stable node and
will return the expected result.
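[Editorial aside: for readers who do not live in lib/maple_tree.c, here is a minimal, hypothetical sketch of what RCU mode means at the API level. It is not part of this patch; the tree name, range, and stored value are invented. Once RCU mode is enabled, retired nodes are freed only after an RCU grace period, so a reader inside rcu_read_lock() always sees a coherent node.]

/* Illustrative only; demo_mt, demo_value and the range are made up. */
#include <linux/gfp.h>
#include <linux/maple_tree.h>
#include <linux/rcupdate.h>

static DEFINE_MTREE(demo_mt);		/* uses the tree's internal spinlock */
static int demo_value = 42;

static int *demo_lookup(void)
{
	int *p;

	mt_set_in_rcu(&demo_mt);	/* retire nodes via RCU from now on */

	/* Writers still serialize against each other on the tree lock. */
	mtree_store_range(&demo_mt, 0x1000, 0x1fff, &demo_value, GFP_KERNEL);

	/* A reader only needs rcu_read_lock() to see a stable node. */
	rcu_read_lock();
	p = mtree_load(&demo_mt, 0x1234);
	rcu_read_unlock();

	return p;
}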

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/linux-mm/[email protected]/
Cc: [email protected]
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Liam R. Howlett <[email protected]>
Reported-by: [email protected]
---
 include/linux/mm_types.h | 3 ++-
 kernel/fork.c            | 3 +++
 mm/mmap.c                | 3 ++-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0722859c3647..a57e6ae78e65 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -774,7 +774,8 @@ struct mm_struct {
 	unsigned long cpu_bitmap[];
 };
 
-#define MM_MT_FLAGS	(MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN)
+#define MM_MT_FLAGS	(MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN | \
+			 MT_FLAGS_USE_RCU)
 extern struct mm_struct init_mm;
 
 /* Pointer magic because the dynamic array size confuses some compilers. */
diff --git a/kernel/fork.c b/kernel/fork.c
index d8cda4c6de6c..1bf31ba07e85 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -617,6 +617,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	if (retval)
 		goto out;
 
+	mt_clear_in_rcu(vmi.mas.tree);
 	for_each_vma(old_vmi, mpnt) {
 		struct file *file;
 
@@ -700,6 +701,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	retval = arch_dup_mmap(oldmm, mm);
 loop_out:
 	vma_iter_free(&vmi);
+	if (!retval)
+		mt_set_in_rcu(vmi.mas.tree);
 out:
 	mmap_write_unlock(mm);
 	flush_tlb_mm(oldmm);
diff --git a/mm/mmap.c b/mm/mmap.c
index 740b54be3ed4..16cbb83b3ec6 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2277,7 +2277,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	int count = 0;
 	int error = -ENOMEM;
 	MA_STATE(mas_detach, &mt_detach, 0, 0);
-	mt_init_flags(&mt_detach, MT_FLAGS_LOCK_EXTERN);
+	mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
 	mt_set_external_lock(&mt_detach, &mm->mmap_lock);
 
 	/*
@@ -3042,6 +3042,7 @@ void exit_mmap(struct mm_struct *mm)
 	 */
 	set_bit(MMF_OOM_SKIP, &mm->flags);
 	mmap_write_lock(mm);
+	mt_clear_in_rcu(&mm->mm_mt);
 	free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
 		      USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb);
--
2.39.2
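[Editorial aside on the dup_mmap() and exit_mmap() hunks above: during fork() the new mm is not yet visible to any other thread, and by the time exit_mmap() tears the tree down all readers are gone, so in both cases there can be no concurrent lookups and RCU-deferred node freeing is pure overhead. A condensed, illustrative sketch of the dup_mmap() toggle follows; error handling and the actual copy loop are elided, and this is not the function body itself.]

mt_clear_in_rcu(vmi.mas.tree);		/* no readers yet: free nodes immediately */

/* ... copy every VMA from the old mm into the new tree ... */

if (!retval)
	mt_set_in_rcu(vmi.mas.tree);	/* re-enable RCU mode once populated */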


2023-03-27 19:48:56

by Andrew Morton

Subject: Re: [PATCH 8/8] mm: enable maple tree RCU mode by default.

On Mon, 27 Mar 2023 14:55:32 -0400 "Liam R. Howlett" <[email protected]> wrote:

> Use the maple tree in RCU mode for VMA tracking.
>
> The maple tree tracks the stack and is able to update the pivot
> (lower/upper boundary) in-place to allow the page fault handler to write
> to the tree while holding just the mmap read lock. This is safe as the
> writes to the stack have a guard VMA which ensures there will always be
> a NULL in the direction of the growth and thus will only update a pivot.
>
> It is possible, but not recommended, to have VMAs that grow up/down
> without guard VMAs. syzbot has constructed a testcase which sets up a
> VMA to grow and consume the empty space. Overwriting the entire NULL
> entry causes the tree to be altered in a way that is not safe for
> concurrent readers; the readers may see a node being rewritten or one
> that does not match the maple state they are using.
>
> Enabling RCU mode allows the concurrent readers to see a stable node and

This differs from what I had. Intended?

--- a/mm/mmap.c~mm-enable-maple-tree-rcu-mode-by-default-v8
+++ a/mm/mmap.c
@@ -2277,8 +2277,7 @@ do_vmi_align_munmap(struct vma_iterator
 	int count = 0;
 	int error = -ENOMEM;
 	MA_STATE(mas_detach, &mt_detach, 0, 0);
-	mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags &
-		      (MT_FLAGS_LOCK_MASK | MT_FLAGS_USE_RCU));
+	mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
 	mt_set_external_lock(&mt_detach, &mm->mmap_lock);
 
 	/*
_

2023-03-27 19:49:53

by Liam R. Howlett

Subject: Re: [PATCH 8/8] mm: enable maple tree RCU mode by default.

* Andrew Morton <[email protected]> [230327 15:38]:
> On Mon, 27 Mar 2023 14:55:32 -0400 "Liam R. Howlett" <[email protected]> wrote:
>
> > Use the maple tree in RCU mode for VMA tracking.
> >
> > The maple tree tracks the stack and is able to update the pivot
> > (lower/upper boundary) in-place to allow the page fault handler to write
> > to the tree while holding just the mmap read lock. This is safe as the
> > writes to the stack have a guard VMA which ensures there will always be
> > a NULL in the direction of the growth and thus will only update a pivot.
> >
> > It is possible, but not recommended, to have VMAs that grow up/down
> > without guard VMAs. syzbot has constructed a testcase which sets up a
> > VMA to grow and consume the empty space. Overwriting the entire NULL
> > entry causes the tree to be altered in a way that is not safe for
> > concurrent readers; the readers may see a node being rewritten or one
> > that does not match the maple state they are using.
> >
> > Enabling RCU mode allows the concurrent readers to see a stable node and
>
> This differs from what I had. Intended?

Yes, intended. The RCU flag is not necessary here: the scope of this
tree is limited to the function do_vmi_align_munmap(), so we don't
need to free the nodes with RCU.
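
[Editorial aside, to make that concrete: a condensed, illustrative view of mt_detach's lifetime. The names follow mm/mmap.c, vmi and mm come from the enclosing function, and the detach loop is elided; this is not the actual function body. The tree never outlives do_vmi_align_munmap() and is only touched under mmap_lock, so no reader can ever observe it.]

struct maple_tree mt_detach;
MA_STATE(mas_detach, &mt_detach, 0, 0);

mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
mt_set_external_lock(&mt_detach, &mm->mmap_lock);

/* ... split and detach the doomed VMAs into mt_detach ... */

__mt_destroy(&mt_detach);	/* torn down before the function returns */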

Thanks,
Liam

>
> --- a/mm/mmap.c~mm-enable-maple-tree-rcu-mode-by-default-v8
> +++ a/mm/mmap.c
> @@ -2277,8 +2277,7 @@ do_vmi_align_munmap(struct vma_iterator
>  	int count = 0;
>  	int error = -ENOMEM;
>  	MA_STATE(mas_detach, &mt_detach, 0, 0);
> -	mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags &
> -		      (MT_FLAGS_LOCK_MASK | MT_FLAGS_USE_RCU));
> +	mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
>  	mt_set_external_lock(&mt_detach, &mm->mmap_lock);
> 
>  	/*
> _
>

2023-04-11 01:29:56

by kernel test robot

Subject: Re: [PATCH 8/8] mm: enable maple tree RCU mode by default.


Hello,

kernel test robot noticed a -8.5% regression of stress-ng.mmapaddr.ops_per_sec on:


commit: b5768f8ae36fd0c218838c88b114a9978db05c91 ("[PATCH 8/8] mm: enable maple tree RCU mode by default.")
url: https://github.com/intel-lab-lkp/linux/commits/Liam-R-Howlett/maple_tree-be-more-cautious-about-dead-nodes/20230328-033353
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 3a93e40326c8f470e71d20b4c42d36767450f38f
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [PATCH 8/8] mm: enable maple tree RCU mode by default.

testcase: stress-ng
test machine: 96 threads 2 sockets (Ice Lake) with 256G memory
parameters:

nr_threads: 10%
disk: 1HDD
testtime: 60s
fs: ext4
class: vm
test: mmapaddr
cpufreq_governor: performance




If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.

=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
vm/gcc-11/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp1/mmapaddr/stress-ng/60s

commit:
5ae51d7b1f ("maple_tree: add RCU lock checking to rcu callback functions")
b5768f8ae3 ("mm: enable maple tree RCU mode by default.")

5ae51d7b1fb2fba9 b5768f8ae36fd0c218838c88b11
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
53853773 -8.4% 49313594 stress-ng.mmapaddr.ops
898492 -8.5% 821889 stress-ng.mmapaddr.ops_per_sec
1.077e+08 -8.4% 98633700 stress-ng.time.minor_page_faults
863.25 -2.5% 841.62 stress-ng.time.percent_of_cpu_this_job_got
491.00 -1.9% 481.58 stress-ng.time.system_time
221487 +12.8% 249928 meminfo.SUnreclaim
0.04 ? 2% +0.2 0.27 mpstat.cpu.all.soft%
93053 ? 6% +20.0% 111624 ? 7% numa-meminfo.node1.SUnreclaim
23263 ? 6% +20.0% 27913 ? 7% numa-vmstat.node1.nr_slab_unreclaimable
1947 +18.1% 2299 vmstat.system.cs
55371 +12.8% 62436 proc-vmstat.nr_slab_unreclaimable
4.898e+08 -6.6% 4.573e+08 proc-vmstat.numa_hit
4.89e+08 -6.7% 4.561e+08 proc-vmstat.numa_local
4.813e+08 -5.5% 4.548e+08 proc-vmstat.pgalloc_normal
1.08e+08 -8.4% 98948247 proc-vmstat.pgfault
4.812e+08 -5.6% 4.544e+08 proc-vmstat.pgfree
3.35 +39.9% 4.68 ? 18% perf-stat.i.MPKI
8.691e+09 -2.9% 8.441e+09 perf-stat.i.branch-instructions
0.57 +0.2 0.76 ? 7% perf-stat.i.branch-miss-rate%
43345846 +30.2% 56428345 perf-stat.i.branch-misses
20.37 ? 2% -2.7 17.66 perf-stat.i.cache-miss-rate%
26387649 ? 2% +18.0% 31141545 perf-stat.i.cache-misses
1.286e+08 +35.4% 1.741e+08 perf-stat.i.cache-references
1540 +24.1% 1912 perf-stat.i.context-switches
0.71 +4.5% 0.74 ? 4% perf-stat.i.cpi
1212 -12.7% 1058 ? 5% perf-stat.i.cycles-between-cache-misses
1.163e+10 -2.0% 1.139e+10 perf-stat.i.dTLB-loads
0.05 ? 2% +0.0 0.05 ? 12% perf-stat.i.dTLB-store-miss-rate%
7.19e+09 -2.3% 7.023e+09 perf-stat.i.dTLB-stores
4.587e+10 -2.6% 4.468e+10 perf-stat.i.instructions
1.42 -3.0% 1.38 perf-stat.i.ipc
287.84 -2.2% 281.56 perf-stat.i.metric.M/sec
89.27 -8.4 80.91 perf-stat.i.node-load-miss-rate%
417672 ? 5% +102.1% 844069 ? 3% perf-stat.i.node-loads
85.67 -18.1 67.58 perf-stat.i.node-store-miss-rate%
708025 ? 4% +187.6% 2036086 ? 3% perf-stat.i.node-stores
2.80 +39.0% 3.90 perf-stat.overall.MPKI
0.50 +0.2 0.67 perf-stat.overall.branch-miss-rate%
20.52 ? 2% -2.6 17.89 perf-stat.overall.cache-miss-rate%
0.70 +3.0% 0.72 perf-stat.overall.cpi
1209 ? 2% -15.0% 1028 perf-stat.overall.cycles-between-cache-misses
0.04 ? 3% +0.0 0.05 ? 13% perf-stat.overall.dTLB-store-miss-rate%
1.44 -2.9% 1.40 perf-stat.overall.ipc
90.25 -8.7 81.54 perf-stat.overall.node-load-miss-rate%
86.37 -18.3 68.08 perf-stat.overall.node-store-miss-rate%
8.553e+09 -2.9% 8.308e+09 perf-stat.ps.branch-instructions
42646399 +30.2% 55522635 perf-stat.ps.branch-misses
25968476 ? 2% +18.0% 30648177 perf-stat.ps.cache-misses
1.265e+08 +35.4% 1.714e+08 perf-stat.ps.cache-references
1515 +24.2% 1881 perf-stat.ps.context-switches
1.144e+10 -2.0% 1.121e+10 perf-stat.ps.dTLB-loads
7.076e+09 -2.3% 6.912e+09 perf-stat.ps.dTLB-stores
4.514e+10 -2.6% 4.398e+10 perf-stat.ps.instructions
411017 ? 5% +102.1% 830672 ? 3% perf-stat.ps.node-loads
696326 ? 4% +187.7% 2003485 ? 3% perf-stat.ps.node-stores
2.847e+12 -2.5% 2.777e+12 perf-stat.total.instructions
24.83 -4.0 20.87 ? 37% perf-profile.calltrace.cycles-pp.__munmap
23.56 -3.7 19.86 ? 37% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
23.37 -3.7 19.72 ? 37% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
23.05 -3.6 19.44 ? 37% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
22.88 -3.6 19.27 ? 37% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
17.14 -3.3 13.84 ? 37% perf-profile.calltrace.cycles-pp.__mm_populate.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
11.25 -2.2 9.04 ? 37% perf-profile.calltrace.cycles-pp.mincore
9.04 -1.8 7.23 ? 37% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.mincore
8.80 -1.8 7.05 ? 37% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.mincore
7.73 -1.5 6.19 ? 37% perf-profile.calltrace.cycles-pp.__do_sys_mincore.do_syscall_64.entry_SYSCALL_64_after_hwframe.mincore
12.75 -1.2 11.60 ? 2% perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
16.22 -1.1 15.10 ? 2% perf-profile.calltrace.cycles-pp.populate_vma_page_range.__mm_populate.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
16.01 -1.1 14.91 ? 2% perf-profile.calltrace.cycles-pp.__get_user_pages.populate_vma_page_range.__mm_populate.vm_mmap_pgoff.do_syscall_64
8.49 -0.7 7.74 ? 3% perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
8.06 -0.7 7.33 ? 3% perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
8.25 -0.7 7.57 ? 2% perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
6.26 ? 2% -0.6 5.62 ? 3% perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region
2.94 -0.6 2.33 ? 37% perf-profile.calltrace.cycles-pp.do_mincore.__do_sys_mincore.do_syscall_64.entry_SYSCALL_64_after_hwframe.mincore
5.44 ? 2% -0.6 4.84 ? 3% perf-profile.calltrace.cycles-pp.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
8.68 -0.6 8.11 perf-profile.calltrace.cycles-pp.handle_mm_fault.__get_user_pages.populate_vma_page_range.__mm_populate.vm_mmap_pgoff
7.99 -0.5 7.48 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__get_user_pages.populate_vma_page_range.__mm_populate
2.18 ? 2% -0.5 1.73 ? 37% perf-profile.calltrace.cycles-pp.__get_free_pages.__do_sys_mincore.do_syscall_64.entry_SYSCALL_64_after_hwframe.mincore
4.07 ? 3% -0.4 3.62 ? 4% perf-profile.calltrace.cycles-pp.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush
5.21 -0.4 4.78 ? 2% perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
1.30 ? 5% -0.3 1.02 ? 38% perf-profile.calltrace.cycles-pp.stress_mmapaddr_child
3.44 -0.3 3.16 ? 3% perf-profile.calltrace.cycles-pp.folio_mark_accessed.follow_page_pte.__get_user_pages.populate_vma_page_range.__mm_populate
3.68 -0.3 3.42 ? 2% perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma
3.51 -0.2 3.27 ? 2% perf-profile.calltrace.cycles-pp.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.30 -0.2 2.07 ? 3% perf-profile.calltrace.cycles-pp.walk_page_range.do_mincore.__do_sys_mincore.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.82 -0.2 2.60 perf-profile.calltrace.cycles-pp.free_pgd_range.free_pgtables.unmap_region.do_vmi_align_munmap.do_vmi_munmap
2.46 ? 2% -0.2 2.24 ? 2% perf-profile.calltrace.cycles-pp.free_pgtables.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
2.65 ? 2% -0.2 2.44 perf-profile.calltrace.cycles-pp.free_p4d_range.free_pgd_range.free_pgtables.unmap_region.do_vmi_align_munmap
2.46 -0.2 2.26 ? 2% perf-profile.calltrace.cycles-pp.flush_tlb_func.flush_tlb_mm_range.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
0.61 ? 3% -0.2 0.40 ? 57% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
2.67 -0.2 2.47 ? 2% perf-profile.calltrace.cycles-pp.__pmd_alloc.__handle_mm_fault.handle_mm_fault.__get_user_pages.populate_vma_page_range
2.48 ? 2% -0.2 2.29 perf-profile.calltrace.cycles-pp.free_pud_range.free_p4d_range.free_pgd_range.free_pgtables.unmap_region
1.64 ? 2% -0.2 1.47 ? 4% perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
2.92 -0.2 2.74 perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
0.86 -0.2 0.69 ? 38% perf-profile.calltrace.cycles-pp.__entry_text_start.mincore
1.68 ? 3% -0.2 1.52 ? 3% perf-profile.calltrace.cycles-pp.__alloc_pages.__get_free_pages.__do_sys_mincore.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.65 -0.2 2.49 ? 2% perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.__get_user_pages.populate_vma_page_range
1.31 ? 2% -0.1 1.17 ? 4% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
2.36 -0.1 2.22 ? 2% perf-profile.calltrace.cycles-pp.__pte_alloc.do_anonymous_page.__handle_mm_fault.handle_mm_fault.__get_user_pages
2.14 -0.1 2.00 ? 2% perf-profile.calltrace.cycles-pp.pte_alloc_one.__pte_alloc.do_anonymous_page.__handle_mm_fault.handle_mm_fault
1.64 ? 2% -0.1 1.51 ? 2% perf-profile.calltrace.cycles-pp.__alloc_pages.__pmd_alloc.__handle_mm_fault.handle_mm_fault.__get_user_pages
1.16 -0.1 1.04 ? 3% perf-profile.calltrace.cycles-pp.__walk_page_range.walk_page_range.do_mincore.__do_sys_mincore.do_syscall_64
0.58 ? 2% -0.1 0.46 ? 38% perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.pte_alloc_one.__pte_alloc.do_anonymous_page
0.87 ? 2% -0.1 0.77 ? 4% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
0.86 ? 3% -0.1 0.76 ? 3% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.___pte_free_tlb.free_pud_range.free_p4d_range.free_pgd_range
1.08 -0.1 0.99 ? 3% perf-profile.calltrace.cycles-pp.walk_pgd_range.__walk_page_range.walk_page_range.do_mincore.__do_sys_mincore
1.60 ? 2% -0.1 1.51 ? 2% perf-profile.calltrace.cycles-pp.__pud_alloc.__handle_mm_fault.handle_mm_fault.__get_user_pages.populate_vma_page_range
0.96 ? 3% -0.1 0.88 ? 2% perf-profile.calltrace.cycles-pp.___pte_free_tlb.free_pud_range.free_p4d_range.free_pgd_range.free_pgtables
1.57 ? 2% -0.1 1.49 ? 2% perf-profile.calltrace.cycles-pp.__alloc_pages.pte_alloc_one.__pte_alloc.do_anonymous_page.__handle_mm_fault
0.83 ? 3% -0.1 0.75 ? 2% perf-profile.calltrace.cycles-pp.native_flush_tlb_local.flush_tlb_func.flush_tlb_mm_range.tlb_finish_mmu.unmap_region
0.85 ? 4% -0.1 0.78 ? 3% perf-profile.calltrace.cycles-pp.kmem_cache_free.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
1.33 ? 2% -0.1 1.26 ? 2% perf-profile.calltrace.cycles-pp.__alloc_pages.get_zeroed_page.__pud_alloc.__handle_mm_fault.handle_mm_fault
0.91 ? 3% -0.1 0.84 ? 2% perf-profile.calltrace.cycles-pp.free_pgtables.unmap_region.do_vmi_align_munmap.do_vmi_munmap.move_vma
0.91 -0.1 0.84 ? 3% perf-profile.calltrace.cycles-pp.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_range.do_mincore
1.04 ? 2% -0.1 0.97 ? 3% perf-profile.calltrace.cycles-pp.__pmd_alloc.move_page_tables.move_vma.__do_sys_mremap.do_syscall_64
0.66 ? 3% -0.1 0.60 ? 4% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.__pmd_alloc.__handle_mm_fault.handle_mm_fault.__get_user_pages
0.68 ? 3% -0.1 0.62 ? 4% perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.__pmd_alloc.__handle_mm_fault.handle_mm_fault
0.77 -0.1 0.71 ? 3% perf-profile.calltrace.cycles-pp.walk_pud_range.walk_p4d_range.walk_pgd_range.__walk_page_range.walk_page_range
0.76 ? 2% -0.1 0.70 ? 3% perf-profile.calltrace.cycles-pp.__memcg_kmem_charge_page.__alloc_pages.__pmd_alloc.__handle_mm_fault.handle_mm_fault
0.64 ? 2% -0.0 0.59 ? 3% perf-profile.calltrace.cycles-pp.__memcg_kmem_charge_page.__alloc_pages.get_zeroed_page.__pud_alloc.__handle_mm_fault
0.61 ? 2% -0.0 0.57 ? 3% perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.move_ptes.move_page_tables.move_vma.__do_sys_mremap
0.62 ? 2% -0.0 0.59 ? 2% perf-profile.calltrace.cycles-pp.mas_next_entry.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
1.18 ? 7% +0.2 1.34 ? 6% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
1.06 ? 8% +0.2 1.22 ? 7% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt
0.57 +0.2 0.74 ? 2% perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_wr_modify.mas_store_prealloc.vma_link.copy_vma
0.60 +0.2 0.77 ? 2% perf-profile.calltrace.cycles-pp.mas_wr_modify.mas_store_prealloc.vma_link.copy_vma.move_vma
1.32 ? 7% +0.2 1.51 ? 7% perf-profile.calltrace.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
1.93 ? 5% +0.2 2.15 ? 6% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
1.26 +0.3 1.53 ? 2% perf-profile.calltrace.cycles-pp.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma.__do_sys_mremap
0.61 ? 2% +0.3 0.92 ? 2% perf-profile.calltrace.cycles-pp.mas_wr_modify.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.move_vma
1.61 +0.3 1.96 perf-profile.calltrace.cycles-pp.mas_wr_modify.mas_store_prealloc.mmap_region.do_mmap.vm_mmap_pgoff
1.53 +0.4 1.88 perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_wr_modify.mas_store_prealloc.mmap_region.do_mmap
0.76 ? 4% +0.4 1.13 ? 2% perf-profile.calltrace.cycles-pp.mas_wr_modify.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
1.15 ? 2% +0.4 1.60 ? 2% perf-profile.calltrace.cycles-pp.mas_wr_bnode.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
1.19 +0.5 1.69 ? 2% perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_link.copy_vma.move_vma.__do_sys_mremap
0.46 ? 37% +0.6 1.02 ? 2% perf-profile.calltrace.cycles-pp.mas_commit_b_node.mas_wr_bnode.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap
1.30 ? 2% +0.7 1.96 ? 2% perf-profile.calltrace.cycles-pp.mas_wr_node_store.mas_wr_modify.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap
0.68 ? 2% +0.7 1.34 ? 3% perf-profile.calltrace.cycles-pp.kmem_cache_free_bulk.mas_destroy.mas_store_prealloc.mmap_region.do_mmap
0.00 +0.7 0.69 ? 3% perf-profile.calltrace.cycles-pp.__kmem_cache_alloc_bulk.kmem_cache_alloc_bulk.mas_alloc_nodes.mas_preallocate.vma_link
3.94 ? 2% +0.7 4.63 perf-profile.calltrace.cycles-pp.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
0.00 +0.7 0.72 ? 3% perf-profile.calltrace.cycles-pp.mas_destroy.mas_store_prealloc.vma_link.copy_vma.move_vma
0.00 +0.8 0.79 ? 3% perf-profile.calltrace.cycles-pp.kmem_cache_alloc_bulk.mas_alloc_nodes.mas_preallocate.vma_link.copy_vma
0.00 +1.0 0.98 ? 3% perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.vma_link.copy_vma.move_vma
1.16 +1.0 2.15 ? 3% perf-profile.calltrace.cycles-pp.mas_destroy.mas_store_prealloc.mmap_region.do_mmap.vm_mmap_pgoff
0.00 +1.0 1.01 ? 3% perf-profile.calltrace.cycles-pp.mas_preallocate.vma_link.copy_vma.move_vma.__do_sys_mremap
2.99 +1.0 4.02 ? 2% perf-profile.calltrace.cycles-pp.copy_vma.move_vma.__do_sys_mremap.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.73 +1.0 2.77 ? 2% perf-profile.calltrace.cycles-pp.vma_link.copy_vma.move_vma.__do_sys_mremap.do_syscall_64
3.77 +1.5 5.25 ? 2% perf-profile.calltrace.cycles-pp.mas_store_prealloc.mmap_region.do_mmap.vm_mmap_pgoff.do_syscall_64
1.27 ? 2% +1.7 2.97 ? 3% perf-profile.calltrace.cycles-pp.mas_preallocate.mmap_region.do_mmap.vm_mmap_pgoff.do_syscall_64
1.20 +1.7 2.92 ? 3% perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.mmap_region.do_mmap.vm_mmap_pgoff
0.66 ? 2% +1.7 2.39 ? 3% perf-profile.calltrace.cycles-pp.kmem_cache_alloc_bulk.mas_alloc_nodes.mas_preallocate.mmap_region.do_mmap
0.00 +1.8 1.83 ? 15% perf-profile.calltrace.cycles-pp.___slab_alloc.__kmem_cache_alloc_bulk.kmem_cache_alloc_bulk.mas_alloc_nodes.mas_preallocate
0.00 +2.1 2.11 ? 3% perf-profile.calltrace.cycles-pp.__kmem_cache_alloc_bulk.kmem_cache_alloc_bulk.mas_alloc_nodes.mas_preallocate.mmap_region
9.81 +3.0 12.83 ? 2% perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
25.28 -4.0 21.25 ? 37% perf-profile.children.cycles-pp.__munmap
11.80 -2.3 9.46 ? 37% perf-profile.children.cycles-pp.mincore
18.01 -1.6 16.43 ? 2% perf-profile.children.cycles-pp.unmap_region
17.18 -1.2 16.00 ? 2% perf-profile.children.cycles-pp.__mm_populate
16.23 -1.1 15.12 ? 2% perf-profile.children.cycles-pp.populate_vma_page_range
16.08 -1.1 14.97 ? 2% perf-profile.children.cycles-pp.__get_user_pages
11.96 -1.0 11.00 ? 2% perf-profile.children.cycles-pp.tlb_finish_mmu
8.51 -0.7 7.77 ? 3% perf-profile.children.cycles-pp.tlb_batch_pages_flush
8.10 -0.7 7.37 ? 3% perf-profile.children.cycles-pp.release_pages
6.29 -0.6 5.65 ? 3% perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
7.84 -0.6 7.22 ? 2% perf-profile.children.cycles-pp.__do_sys_mincore
5.48 ? 2% -0.6 4.88 ? 3% perf-profile.children.cycles-pp.uncharge_batch
8.73 -0.6 8.16 perf-profile.children.cycles-pp.handle_mm_fault
8.06 -0.5 7.55 perf-profile.children.cycles-pp.__handle_mm_fault
8.18 -0.5 7.70 perf-profile.children.cycles-pp.__alloc_pages
4.10 ? 2% -0.5 3.64 ? 4% perf-profile.children.cycles-pp.page_counter_uncharge
2.22 ? 2% -0.4 1.84 ? 20% perf-profile.children.cycles-pp.__entry_text_start
3.13 ? 3% -0.3 2.79 ? 2% perf-profile.children.cycles-pp.__mod_lruvec_page_state
1.43 ? 4% -0.3 1.11 ? 38% perf-profile.children.cycles-pp.stress_mmapaddr_child
3.42 -0.3 3.11 ? 2% perf-profile.children.cycles-pp.free_pgtables
3.74 -0.3 3.47 ? 2% perf-profile.children.cycles-pp.__pmd_alloc
3.00 -0.3 2.75 ? 2% perf-profile.children.cycles-pp.flush_tlb_func
2.15 -0.2 1.90 ? 3% perf-profile.children.cycles-pp.unmap_vmas
2.96 -0.2 2.72 ? 2% perf-profile.children.cycles-pp.do_mincore
2.33 -0.2 2.09 ? 2% perf-profile.children.cycles-pp.walk_page_range
3.54 -0.2 3.30 ? 2% perf-profile.children.cycles-pp.move_page_tables
2.84 -0.2 2.61 perf-profile.children.cycles-pp.free_pgd_range
3.57 -0.2 3.35 perf-profile.children.cycles-pp.flush_tlb_mm_range
2.68 ? 2% -0.2 2.47 ? 2% perf-profile.children.cycles-pp.free_p4d_range
3.56 -0.2 3.36 perf-profile.children.cycles-pp.__pte_alloc
2.21 ? 2% -0.2 2.01 ? 3% perf-profile.children.cycles-pp.__get_free_pages
3.17 ? 2% -0.2 2.97 ? 2% perf-profile.children.cycles-pp.__memcg_kmem_charge_page
1.70 ? 2% -0.2 1.51 ? 3% perf-profile.children.cycles-pp.unmap_page_range
3.22 -0.2 3.04 perf-profile.children.cycles-pp.pte_alloc_one
2.50 ? 2% -0.2 2.32 perf-profile.children.cycles-pp.free_pud_range
2.69 -0.2 2.51 ? 3% perf-profile.children.cycles-pp.mas_find
0.75 -0.2 0.58 ? 3% perf-profile.children.cycles-pp.mas_update_gap
2.68 -0.2 2.52 ? 2% perf-profile.children.cycles-pp.do_anonymous_page
3.34 -0.1 3.20 ? 2% perf-profile.children.cycles-pp.get_page_from_freelist
1.11 ? 2% -0.1 0.99 ? 3% perf-profile.children.cycles-pp.zap_pmd_range
1.23 ? 2% -0.1 1.11 ? 2% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
1.18 -0.1 1.06 ? 3% perf-profile.children.cycles-pp.__walk_page_range
1.31 ? 2% -0.1 1.19 perf-profile.children.cycles-pp.native_flush_tlb_local
1.44 -0.1 1.33 ? 3% perf-profile.children.cycles-pp.native_flush_tlb_one_user
1.34 ? 2% -0.1 1.22 ? 3% perf-profile.children.cycles-pp.___pte_free_tlb
1.94 -0.1 1.82 perf-profile.children.cycles-pp.mas_next_entry
0.47 ? 2% -0.1 0.36 ? 4% perf-profile.children.cycles-pp.mas_leaf_max_gap
0.77 ? 2% -0.1 0.67 ? 4% perf-profile.children.cycles-pp.zap_pte_range
1.89 ? 2% -0.1 1.78 ? 2% perf-profile.children.cycles-pp.get_zeroed_page
1.21 ? 2% -0.1 1.10 ? 2% perf-profile.children.cycles-pp.find_vma
0.86 ? 4% -0.1 0.77 ? 2% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
1.24 -0.1 1.15 ? 2% perf-profile.children.cycles-pp.clear_page_erms
1.09 -0.1 1.00 ? 3% perf-profile.children.cycles-pp.walk_pgd_range
1.09 ? 2% -0.1 1.00 ? 2% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
1.61 ? 2% -0.1 1.53 ? 2% perf-profile.children.cycles-pp.__pud_alloc
0.95 ? 4% -0.1 0.87 ? 5% perf-profile.children.cycles-pp.propagate_protected_usage
0.92 -0.1 0.85 ? 3% perf-profile.children.cycles-pp.walk_p4d_range
1.09 -0.1 1.02 ? 2% perf-profile.children.cycles-pp.down_write_killable
0.62 ? 5% -0.1 0.56 ? 4% perf-profile.children.cycles-pp.__count_memcg_events
0.67 ? 4% -0.1 0.61 ? 4% perf-profile.children.cycles-pp.try_charge_memcg
0.78 -0.1 0.72 ? 4% perf-profile.children.cycles-pp.walk_pud_range
0.90 -0.1 0.84 ? 3% perf-profile.children.cycles-pp.move_ptes
0.65 -0.0 0.60 ? 4% perf-profile.children.cycles-pp.walk_pmd_range
0.71 ? 2% -0.0 0.66 ? 3% perf-profile.children.cycles-pp.__might_sleep
0.26 ? 4% -0.0 0.22 ? 7% perf-profile.children.cycles-pp.__check_object_size
0.34 ? 5% -0.0 0.29 ? 5% perf-profile.children.cycles-pp.follow_p4d_mask
0.28 ? 2% -0.0 0.24 ? 5% perf-profile.children.cycles-pp.lru_add_drain
0.38 ? 3% -0.0 0.35 ? 5% perf-profile.children.cycles-pp.__mod_node_page_state
0.22 ? 3% -0.0 0.18 ? 8% perf-profile.children.cycles-pp.tlb_gather_mmu
0.27 -0.0 0.24 ? 5% perf-profile.children.cycles-pp.do_munmap
0.33 ? 5% -0.0 0.30 ? 4% perf-profile.children.cycles-pp.__x64_sys_mincore
0.88 -0.0 0.84 ? 3% perf-profile.children.cycles-pp.alloc_pages
0.24 ? 2% -0.0 0.20 ? 8% perf-profile.children.cycles-pp.remove_vma
0.23 ? 2% -0.0 0.20 ? 9% perf-profile.children.cycles-pp.perf_event_mmap_output
0.21 ? 4% -0.0 0.18 ? 6% perf-profile.children.cycles-pp.lru_add_drain_cpu
0.21 ? 4% -0.0 0.18 ? 3% perf-profile.children.cycles-pp._find_next_bit
0.12 ? 8% -0.0 0.10 ? 7% perf-profile.children.cycles-pp.check_vma_flags
0.18 ? 3% -0.0 0.16 ? 6% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.09 ? 9% -0.0 0.07 ? 8% perf-profile.children.cycles-pp.mremap_userfaultfd_prep
0.08 ? 10% +0.0 0.10 ? 9% perf-profile.children.cycles-pp.local_clock
0.18 ? 5% +0.0 0.21 ? 4% perf-profile.children.cycles-pp.__list_add_valid
0.02 ?100% +0.0 0.07 ? 5% perf-profile.children.cycles-pp.memcg_slab_free_hook
0.00 +0.1 0.05 ? 6% perf-profile.children.cycles-pp.discard_slab
0.00 +0.1 0.05 ? 6% perf-profile.children.cycles-pp.__free_one_page
0.16 ? 5% +0.1 0.22 ? 5% perf-profile.children.cycles-pp.vma_complete
0.00 +0.1 0.07 ? 6% perf-profile.children.cycles-pp.setup_object
0.09 ? 7% +0.1 0.16 ? 12% perf-profile.children.cycles-pp.security_mmap_addr
0.00 +0.1 0.07 ? 13% perf-profile.children.cycles-pp.rcu_segcblist_pend_cbs
0.00 +0.1 0.08 ? 9% perf-profile.children.cycles-pp.mas_node_count_gfp
0.19 ? 6% +0.1 0.27 ? 5% perf-profile.children.cycles-pp.__list_del_entry_valid
0.43 ? 5% +0.1 0.51 ? 3% perf-profile.children.cycles-pp.__mt_destroy
0.29 ? 3% +0.1 0.37 ? 2% perf-profile.children.cycles-pp.__split_vma
0.00 +0.1 0.10 ? 22% perf-profile.children.cycles-pp.cap_mmap_addr
0.19 ? 7% +0.1 0.30 ? 3% perf-profile.children.cycles-pp.vma_expand
0.00 +0.1 0.12 ? 5% perf-profile.children.cycles-pp.free_pcppages_bulk
0.71 ? 3% +0.1 0.83 ? 2% perf-profile.children.cycles-pp.free_unref_page
1.15 ? 7% +0.2 1.31 ? 7% perf-profile.children.cycles-pp.update_process_times
1.26 ? 6% +0.2 1.42 ? 7% perf-profile.children.cycles-pp.tick_sched_handle
0.00 +0.2 0.17 ? 3% perf-profile.children.cycles-pp.inc_slabs_node
0.01 ?264% +0.2 0.18 ? 14% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.19 ? 12% +0.2 0.37 ? 10% perf-profile.children.cycles-pp.rcu_pending
1.41 ? 6% +0.2 1.59 ? 7% perf-profile.children.cycles-pp.tick_sched_timer
0.00 +0.2 0.19 ? 7% perf-profile.children.cycles-pp.put_cpu_partial
0.22 ? 11% +0.2 0.41 ? 10% perf-profile.children.cycles-pp.rcu_sched_clock_irq
0.12 ? 18% +0.2 0.32 ? 11% perf-profile.children.cycles-pp.check_cpu_stall
0.00 +0.2 0.22 ? 2% perf-profile.children.cycles-pp.rcu_nocb_try_bypass
0.00 +0.3 0.25 ? 7% perf-profile.children.cycles-pp.shuffle_freelist
0.00 +0.3 0.27 ? 11% perf-profile.children.cycles-pp.rcu_cblist_dequeue
2.03 ? 2% +0.3 2.34 ? 2% perf-profile.children.cycles-pp.kmem_cache_free
0.00 +0.3 0.31 ? 8% perf-profile.children.cycles-pp.get_any_partial
0.00 +0.4 0.37 ? 6% perf-profile.children.cycles-pp.allocate_slab
0.00 +0.4 0.38 ? 4% perf-profile.children.cycles-pp.rcu_segcblist_enqueue
0.00 +0.4 0.43 ? 4% perf-profile.children.cycles-pp.mas_replace
16.16 +0.4 16.59 perf-profile.children.cycles-pp.__do_sys_mremap
2.55 ? 2% +0.5 3.01 ? 4% perf-profile.children.cycles-pp.kmem_cache_alloc
15.01 +0.5 15.52 perf-profile.children.cycles-pp.move_vma
0.28 ? 11% +0.6 0.85 ? 3% perf-profile.children.cycles-pp.mas_pop_node
1.52 ? 2% +0.6 2.10 ? 2% perf-profile.children.cycles-pp.mas_wr_bnode
0.00 +0.6 0.61 ? 7% perf-profile.children.cycles-pp.get_partial_node
0.00 +0.6 0.62 ? 2% perf-profile.children.cycles-pp.__unfreeze_partials
0.00 +0.6 0.65 ? 4% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.63 ? 2% +0.7 1.31 ? 3% perf-profile.children.cycles-pp.mas_commit_b_node
0.05 ? 40% +0.8 0.88 ? 3% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
1.01 ? 3% +0.9 1.95 ? 2% perf-profile.children.cycles-pp.kmem_cache_free_bulk
5.28 +1.0 6.24 perf-profile.children.cycles-pp.mas_store_gfp
3.02 +1.0 4.04 ? 2% perf-profile.children.cycles-pp.copy_vma
1.74 +1.0 2.78 ? 2% perf-profile.children.cycles-pp.vma_link
0.00 +1.2 1.22 ? 2% perf-profile.children.cycles-pp.__call_rcu_common
3.73 +1.2 4.95 perf-profile.children.cycles-pp.mas_wr_modify
3.51 +1.2 4.76 perf-profile.children.cycles-pp.mas_wr_node_store
1.79 +1.4 3.18 ? 2% perf-profile.children.cycles-pp.mas_destroy
0.00 +1.6 1.62 ? 3% perf-profile.children.cycles-pp.__slab_free
31.27 +1.8 33.02 ? 2% perf-profile.children.cycles-pp.vm_mmap_pgoff
5.27 +2.1 7.35 ? 2% perf-profile.children.cycles-pp.mas_store_prealloc
0.00 +2.2 2.24 ? 3% perf-profile.children.cycles-pp.rcu_do_batch
0.00 +2.3 2.28 ? 3% perf-profile.children.cycles-pp.rcu_core
0.00 +2.3 2.30 ? 3% perf-profile.children.cycles-pp.___slab_alloc
0.44 ? 4% +2.3 2.74 ? 2% perf-profile.children.cycles-pp.__irq_exit_rcu
0.36 ? 5% +2.3 2.67 ? 2% perf-profile.children.cycles-pp.__do_softirq
1.86 +2.3 4.19 ? 3% perf-profile.children.cycles-pp.mas_preallocate
1.01 +2.4 3.38 ? 3% perf-profile.children.cycles-pp.kmem_cache_alloc_bulk
0.53 ? 2% +2.4 2.93 ? 3% perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk
3.76 ? 4% +2.5 6.26 ? 3% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
3.44 ? 5% +2.5 5.94 ? 3% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
2.33 +3.0 5.33 ? 3% perf-profile.children.cycles-pp.mas_alloc_nodes
9.91 +3.0 12.92 ? 2% perf-profile.children.cycles-pp.mmap_region
12.05 +3.0 15.08 ? 2% perf-profile.children.cycles-pp.do_mmap
2.99 -0.7 2.30 perf-profile.self.cycles-pp.mas_wr_node_store
3.18 ? 3% -0.4 2.74 ? 4% perf-profile.self.cycles-pp.page_counter_uncharge
3.32 -0.3 2.98 ? 3% perf-profile.self.cycles-pp.folio_mark_accessed
1.12 ? 5% -0.3 0.84 ? 40% perf-profile.self.cycles-pp.stress_mmapaddr_child
2.82 -0.2 2.60 ? 3% perf-profile.self.cycles-pp.mtree_range_walk
0.96 ? 2% -0.2 0.75 ? 37% perf-profile.self.cycles-pp.mincore
1.76 ? 4% -0.2 1.56 ? 4% perf-profile.self.cycles-pp.__mod_lruvec_page_state
0.80 -0.2 0.64 ? 38% perf-profile.self.cycles-pp.__mmap
1.66 ? 2% -0.1 1.52 ? 3% perf-profile.self.cycles-pp.__might_resched
1.02 ? 5% -0.1 0.90 ? 5% perf-profile.self.cycles-pp.__alloc_pages
0.46 ? 3% -0.1 0.34 ? 4% perf-profile.self.cycles-pp.mas_leaf_max_gap
1.19 ? 2% -0.1 1.07 ? 2% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
1.44 -0.1 1.32 ? 3% perf-profile.self.cycles-pp.native_flush_tlb_one_user
1.22 -0.1 1.11 ? 2% perf-profile.self.cycles-pp.clear_page_erms
1.30 ? 2% -0.1 1.18 perf-profile.self.cycles-pp.native_flush_tlb_local
0.91 ? 4% -0.1 0.81 ? 4% perf-profile.self.cycles-pp.__entry_text_start
0.76 ? 5% -0.1 0.66 ? 3% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.52 ? 4% -0.1 0.42 ? 4% perf-profile.self.cycles-pp.zap_pte_range
0.93 ? 4% -0.1 0.84 ? 6% perf-profile.self.cycles-pp.propagate_protected_usage
0.61 ? 4% -0.1 0.52 ? 4% perf-profile.self.cycles-pp.mmap_region
0.93 ? 2% -0.1 0.84 perf-profile.self.cycles-pp.mas_next_entry
0.77 ? 2% -0.1 0.68 ? 3% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.63 -0.1 0.55 ? 5% perf-profile.self.cycles-pp.mas_wr_store_entry
0.42 ? 4% -0.1 0.34 ? 38% perf-profile.self.cycles-pp.mremap
1.08 -0.1 1.00 ? 4% perf-profile.self.cycles-pp._raw_spin_lock
0.54 ? 4% -0.1 0.47 ? 6% perf-profile.self.cycles-pp.uncharge_batch
0.27 ? 4% -0.1 0.20 ? 5% perf-profile.self.cycles-pp.mas_update_gap
0.98 -0.1 0.92 ? 3% perf-profile.self.cycles-pp.mas_wr_walk
0.55 ? 2% -0.1 0.48 ? 5% perf-profile.self.cycles-pp.mas_find
0.81 ? 2% -0.1 0.75 ? 3% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.89 ? 2% -0.1 0.83 ? 4% perf-profile.self.cycles-pp.mas_next_nentry
0.42 ? 5% -0.1 0.36 ? 2% perf-profile.self.cycles-pp.down_read
0.54 ? 6% -0.1 0.49 ? 4% perf-profile.self.cycles-pp.__count_memcg_events
0.92 -0.1 0.86 ? 3% perf-profile.self.cycles-pp._raw_spin_trylock
0.59 ? 3% -0.1 0.53 ? 3% perf-profile.self.cycles-pp.mas_store_gfp
0.76 ? 2% -0.1 0.71 ? 4% perf-profile.self.cycles-pp.rmqueue
0.47 ? 2% -0.1 0.41 ? 3% perf-profile.self.cycles-pp.do_syscall_64
0.46 ? 2% -0.1 0.41 ? 4% perf-profile.self.cycles-pp.down_write_killable
0.32 ? 5% -0.1 0.27 ? 5% perf-profile.self.cycles-pp.follow_p4d_mask
0.45 ? 3% -0.0 0.40 ? 4% perf-profile.self.cycles-pp.try_charge_memcg
0.64 ? 2% -0.0 0.60 ? 4% perf-profile.self.cycles-pp.__cond_resched
0.52 ? 2% -0.0 0.48 ? 3% perf-profile.self.cycles-pp.alloc_pages
0.51 ? 3% -0.0 0.47 ? 3% perf-profile.self.cycles-pp.memcg_account_kmem
0.21 ? 5% -0.0 0.17 ? 9% perf-profile.self.cycles-pp.tlb_gather_mmu
0.36 ? 3% -0.0 0.33 ? 5% perf-profile.self.cycles-pp.__mod_node_page_state
0.32 ? 5% -0.0 0.28 ? 4% perf-profile.self.cycles-pp.__vm_munmap
0.30 ? 5% -0.0 0.27 ? 3% perf-profile.self.cycles-pp.__x64_sys_mincore
0.46 ? 3% -0.0 0.42 ? 2% perf-profile.self.cycles-pp.__do_sys_mincore
0.49 ? 3% -0.0 0.46 ? 3% perf-profile.self.cycles-pp.release_pages
0.17 ? 2% -0.0 0.14 ? 5% perf-profile.self.cycles-pp.unmap_vmas
0.28 ? 3% -0.0 0.25 ? 4% perf-profile.self.cycles-pp.mab_mas_cp
0.18 ? 6% -0.0 0.15 ? 8% perf-profile.self.cycles-pp.unmap_region
0.35 ? 2% -0.0 0.32 ? 4% perf-profile.self.cycles-pp.security_mmap_file
0.14 ? 8% -0.0 0.12 ? 6% perf-profile.self.cycles-pp.free_p4d_range
0.19 ? 5% -0.0 0.16 ? 3% perf-profile.self.cycles-pp._find_next_bit
0.11 ? 6% -0.0 0.08 ? 10% perf-profile.self.cycles-pp.check_vma_flags
0.17 ? 4% -0.0 0.15 ? 6% perf-profile.self.cycles-pp.vm_area_alloc
0.14 ? 10% -0.0 0.11 ? 8% perf-profile.self.cycles-pp.free_pgtables
0.21 ? 2% -0.0 0.18 ? 8% perf-profile.self.cycles-pp.perf_event_mmap_output
0.08 ? 5% -0.0 0.06 ? 10% perf-profile.self.cycles-pp.__walk_page_range
0.07 ? 10% -0.0 0.05 ? 38% perf-profile.self.cycles-pp.remove_vma
0.06 ? 7% -0.0 0.04 ? 38% perf-profile.self.cycles-pp.lru_add_drain
0.14 ? 6% -0.0 0.12 ? 6% perf-profile.self.cycles-pp.walk_p4d_range
0.09 ? 8% -0.0 0.07 ? 14% perf-profile.self.cycles-pp.__virt_addr_valid
0.11 ? 4% -0.0 0.09 ? 7% perf-profile.self.cycles-pp.ksys_mmap_pgoff
0.16 ? 4% -0.0 0.14 ? 5% perf-profile.self.cycles-pp.mas_prev
0.14 ? 5% +0.0 0.17 ? 4% perf-profile.self.cycles-pp.__list_add_valid
0.00 +0.1 0.05 ? 6% perf-profile.self.cycles-pp.discard_slab
0.01 ?264% +0.1 0.06 ? 5% perf-profile.self.cycles-pp.memcg_slab_free_hook
0.14 ? 4% +0.1 0.22 ? 3% perf-profile.self.cycles-pp.__list_del_entry_valid
0.02 ?129% +0.1 0.10 ? 9% perf-profile.self.cycles-pp.mas_commit_b_node
0.00 +0.1 0.09 ? 6% perf-profile.self.cycles-pp.rcu_do_batch
0.00 +0.1 0.09 ? 23% perf-profile.self.cycles-pp.cap_mmap_addr
0.00 +0.1 0.11 ? 9% perf-profile.self.cycles-pp.get_any_partial
0.44 ? 2% +0.1 0.58 ? 3% perf-profile.self.cycles-pp.mas_alloc_nodes
0.00 +0.1 0.15 ? 5% perf-profile.self.cycles-pp.rcu_nocb_try_bypass
0.00 +0.2 0.16 ? 7% perf-profile.self.cycles-pp.__unfreeze_partials
0.00 +0.2 0.16 ? 10% perf-profile.self.cycles-pp.shuffle_freelist
0.00 +0.2 0.16 ? 4% perf-profile.self.cycles-pp.inc_slabs_node
0.00 +0.2 0.18 ? 6% perf-profile.self.cycles-pp.put_cpu_partial
0.00 +0.2 0.18 ? 10% perf-profile.self.cycles-pp.get_partial_node
0.05 ? 40% +0.2 0.23 ? 6% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.12 ? 18% +0.2 0.32 ? 11% perf-profile.self.cycles-pp.check_cpu_stall
0.90 ? 2% +0.2 1.10 ? 3% perf-profile.self.cycles-pp.kmem_cache_alloc
0.00 +0.3 0.25 ? 10% perf-profile.self.cycles-pp.rcu_cblist_dequeue
0.52 ? 2% +0.3 0.80 ? 5% perf-profile.self.cycles-pp.__kmem_cache_alloc_bulk
1.53 ? 3% +0.3 1.81 ? 3% perf-profile.self.cycles-pp.kmem_cache_free
0.95 ? 3% +0.3 1.27 ? 3% perf-profile.self.cycles-pp.kmem_cache_free_bulk
0.00 +0.4 0.35 ? 4% perf-profile.self.cycles-pp.rcu_segcblist_enqueue
0.00 +0.4 0.40 ? 4% perf-profile.self.cycles-pp.mas_replace
0.27 ? 11% +0.5 0.80 ? 3% perf-profile.self.cycles-pp.mas_pop_node
0.00 +0.6 0.63 ? 2% perf-profile.self.cycles-pp.__call_rcu_common
0.00 +0.6 0.65 ? 4% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.00 +1.0 1.00 ? 2% perf-profile.self.cycles-pp.___slab_alloc
0.00 +1.4 1.44 ? 3% perf-profile.self.cycles-pp.__slab_free




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests




2023-04-11 02:37:16

by Matthew Wilcox

Subject: Re: [PATCH 8/8] mm: enable maple tree RCU mode by default.

On Tue, Apr 11, 2023 at 09:25:16AM +0800, kernel test robot wrote:
> kernel test robot noticed a -8.5% regression of stress-ng.mmapaddr.ops_per_sec on:

Assuming this is the test in question:

https://github.com/ColinIanKing/stress-ng/blob/master/stress-mmapaddr.c

then yes, this is expected. The test calls mmap() and munmap() a lot,
and we've made those slower in order to fix a bug.

While it does take pagefaults (which is a better test than some
microbenchmarks), it only takes one pagefault per call to mmap() and
munmap(), which is not representative of real workloads.

Thanks for running the test.
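
[Editorial aside: assuming that is indeed the test, here is a minimal userspace sketch of the measured pattern. It is hypothetical, not the stress-ng source. Each iteration does one mmap(), exactly one page fault, and one munmap(), so nearly all of the time lands in the VMA-tree write paths this series made more expensive.]

#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);

	for (int i = 0; i < 1000000; i++) {
		char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			return EXIT_FAILURE;
		p[0] = 1;	/* exactly one minor fault per iteration */
		munmap(p, page);
	}
	return 0;
}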