2022-12-19 10:35:51

by kernel test robot

Subject: [linus:master] will-it-scale.per_thread_ops -40.2% regression in mmap1 benchmark

Greetings,

FYI, we noticed a -40.2% regression of will-it-scale.per_thread_ops
between commits 524e00b36e8c and e15e06a83923 of mainline

524e00b36e8c5 mm: remove rb tree.
0c563f1480435 proc: remove VMA rbtree use from nommu
d0cf3dd47f0d5 damon: convert __damon_va_three_regions to use the VMA iterator
c9dbe82cb99db kernel/fork: use maple tree for dup_mmap() during forking
3499a13168da6 mm/mmap: use maple tree for unmapped_area{_topdown}
7fdbd37da5c6f mm/mmap: use the maple tree for find_vma_prev() instead of the rbtree
be8432e7166ef mm/mmap: use the maple tree in find_vma() instead of the rbtree.
2e3af1db17442 mmap: use the VMA iterator in count_vma_pages_range()
f39af05949a42 mm: add VMA iterator
d4af56c5c7c67 mm: start tracking VMAs with maple tree
e15e06a839232 lib/test_maple_tree: add testing for maple tree

in testcase: will-it-scale
on test machine: 104 threads 2 sockets (Skylake) with 192G memory
with following parameters:

nr_task: 50%
mode: thread
test: mmap1
cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
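
For context, each mmap1 worker runs a tight map/unmap loop. Below is a
minimal sketch of the testcase, based on mmap1.c in the will-it-scale
repository linked above; the 128MB size, the mmap flags, and the
testcase() signature are taken from that source and may differ between
versions.

    #include <assert.h>
    #include <sys/mman.h>

    #define MEMSIZE (128 * 1024 * 1024)

    /*
     * Each thread maps and then immediately unmaps 128MB of anonymous
     * memory; the harness samples *iterations to derive per_thread_ops.
     */
    void testcase(unsigned long long *iterations, unsigned long nr)
    {
            while (1) {
                    char *c = mmap(NULL, MEMSIZE, PROT_READ | PROT_WRITE,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                    assert(c != MAP_FAILED);
                    munmap(c, MEMSIZE);

                    (*iterations)++;
            }
    }

With nr_task at 50% of this 104-thread machine, 52 such threads share a
single mm, so every iteration takes mmap_lock for write twice (once for
mmap() and once for munmap()).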


We couldn't identify the commit that introduced this regression because
some of the above commits failed to boot during bisection, but it looks
related to the maple tree code. Please check the following details:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/thread/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/mmap1/will-it-scale

commit:
e15e06a839232 ("lib/test_maple_tree: add testing for maple tree")
524e00b36e8c5 ("mm: remove rb tree.")

e15e06a8392321a1 524e00b36e8c547f5582eef3fb6
---------------- ---------------------------
%stddev %change %stddev
\ | \
238680 -40.2% 142816 will-it-scale.52.threads
4589 -40.2% 2746 will-it-scale.per_thread_ops
238680 -40.2% 142816 will-it-scale.workload
0.28 -0.1 0.20 ± 3% mpstat.cpu.all.usr%
7758 -1.6% 7636 proc-vmstat.nr_mapped
0.03 ± 14% +40.0% 0.05 ± 10% time.system_time
35.87 ± 41% -17.2 18.71 ± 92% turbostat.C1E%
14.11 ±105% +15.5 29.62 ± 52% turbostat.C6%
466662 ± 3% +20.3% 561351 ± 3% turbostat.POLL
42.33 +3.9% 44.00 turbostat.PkgTmp
838.08 ± 49% -50.7% 412.94 ± 19% sched_debug.cfs_rq:/.load_avg.max
466231 ± 14% -53.4% 217040 ± 82% sched_debug.cfs_rq:/.min_vruntime.min
-335910 +146.5% -828023 sched_debug.cfs_rq:/.spread0.min
602391 ± 4% +6.5% 641749 ± 4% sched_debug.cpu.avg_idle.avg
26455 ± 7% +16.1% 30723 ± 6% sched_debug.cpu.nr_switches.max
230323 ± 6% +42.4% 327946 ± 3% numa-numastat.node0.local_node
257238 ± 2% +29.2% 332446 numa-numastat.node0.numa_hit
26826 ± 35% -83.1% 4532 ±138% numa-numastat.node0.other_node
344370 ± 3% -26.8% 251981 ± 2% numa-numastat.node1.local_node
351214 ± 2% -19.9% 281185 numa-numastat.node1.numa_hit
6779 ±139% +330.8% 29204 ± 21% numa-numastat.node1.other_node
111776 ± 8% +43.9% 160892 ± 17% numa-meminfo.node0.AnonHugePages
163879 ± 5% +34.9% 221083 ± 21% numa-meminfo.node0.AnonPages
182360 ± 2% +39.7% 254705 ± 15% numa-meminfo.node0.AnonPages.max
167687 ± 4% +33.0% 223029 ± 20% numa-meminfo.node0.Inactive
165329 ± 4% +34.9% 223029 ± 20% numa-meminfo.node0.Inactive(anon)
2357 ±131% -100.0% 0.00 numa-meminfo.node0.Inactive(file)
2087 ± 11% +22.1% 2548 ± 9% numa-meminfo.node0.PageTables
170594 ± 7% -27.5% 123611 ± 23% numa-meminfo.node1.AnonHugePages
238127 ± 3% -23.9% 181170 ± 25% numa-meminfo.node1.AnonPages
278201 ± 3% -26.8% 203778 ± 22% numa-meminfo.node1.AnonPages.max
244262 ± 2% -24.0% 185599 ± 25% numa-meminfo.node1.Inactive
244206 ± 2% -24.1% 185419 ± 25% numa-meminfo.node1.Inactive(anon)
20767 ± 64% -48.4% 10717 ±124% numa-meminfo.node1.Mapped
40936 ± 5% +34.9% 55213 ± 21% numa-vmstat.node0.nr_anon_pages
41317 ± 4% +34.8% 55700 ± 20% numa-vmstat.node0.nr_inactive_anon
41317 ± 4% +34.8% 55700 ± 20% numa-vmstat.node0.nr_zone_inactive_anon
257331 ± 2% +29.2% 332536 numa-vmstat.node0.numa_hit
230417 ± 5% +42.4% 328036 ± 3% numa-vmstat.node0.numa_local
26826 ± 35% -83.1% 4532 ±138% numa-vmstat.node0.numa_other
59518 ± 4% -24.0% 45237 ± 25% numa-vmstat.node1.nr_anon_pages
61041 ± 3% -24.2% 46287 ± 25% numa-vmstat.node1.nr_inactive_anon
5196 ± 64% -48.7% 2666 ±126% numa-vmstat.node1.nr_mapped
61041 ± 3% -24.2% 46287 ± 25% numa-vmstat.node1.nr_zone_inactive_anon
351314 ± 2% -20.0% 281191 numa-vmstat.node1.numa_hit
344470 ± 4% -26.8% 251987 ± 2% numa-vmstat.node1.numa_local
6779 ±139% +330.8% 29204 ± 21% numa-vmstat.node1.numa_other
3.12 ± 10% -25.7% 2.32 ± 2% perf-stat.i.MPKI
3.111e+09 +4.4% 3.247e+09 perf-stat.i.branch-instructions
0.43 -0.0 0.39 perf-stat.i.branch-miss-rate%
13577850 -5.5% 12837395 perf-stat.i.branch-misses
38.85 ± 3% +4.6 43.44 ± 3% perf-stat.i.cache-miss-rate%
47922345 ± 10% -21.9% 37423833 ± 2% perf-stat.i.cache-references
9.42 -5.1% 8.94 perf-stat.i.cpi
0.02 -0.0 0.01 perf-stat.i.dTLB-load-miss-rate%
632005 -28.8% 449814 perf-stat.i.dTLB-load-misses
4.127e+09 +3.8% 4.282e+09 perf-stat.i.dTLB-loads
0.00 ± 7% -0.0 0.00 ± 11% perf-stat.i.dTLB-store-miss-rate%
3.131e+08 +26.5% 3.962e+08 perf-stat.i.dTLB-stores
599587 ± 8% -20.0% 479492 ± 6% perf-stat.i.iTLB-load-misses
2324378 -12.7% 2028806 ± 7% perf-stat.i.iTLB-loads
1.54e+10 +5.4% 1.622e+10 perf-stat.i.instructions
25907 ± 7% +31.4% 34030 ± 6% perf-stat.i.instructions-per-iTLB-miss
0.11 +5.4% 0.11 perf-stat.i.ipc
570.88 ± 8% -22.1% 444.53 ± 2% perf-stat.i.metric.K/sec
72.60 +5.0% 76.20 perf-stat.i.metric.M/sec
90.37 +1.5 91.82 perf-stat.i.node-load-miss-rate%
7458505 ± 2% -27.2% 5431142 ± 3% perf-stat.i.node-load-misses
795163 -39.1% 484036 perf-stat.i.node-loads
3.11 ± 10% -25.9% 2.31 ± 2% perf-stat.overall.MPKI
0.44 -0.0 0.40 perf-stat.overall.branch-miss-rate%
38.72 ± 3% +4.5 43.24 ± 3% perf-stat.overall.cache-miss-rate%
9.40 -5.1% 8.93 perf-stat.overall.cpi
0.02 -0.0 0.01 perf-stat.overall.dTLB-load-miss-rate%
0.00 ± 6% -0.0 0.00 ± 11% perf-stat.overall.dTLB-store-miss-rate%
25842 ± 7% +31.5% 33976 ± 6% perf-stat.overall.instructions-per-iTLB-miss
0.11 +5.4% 0.11 perf-stat.overall.ipc
90.36 +1.4 91.81 perf-stat.overall.node-load-miss-rate%
19478525 +76.1% 34307144 perf-stat.overall.path-length
3.101e+09 +4.4% 3.236e+09 perf-stat.ps.branch-instructions
13536210 -5.5% 12794692 perf-stat.ps.branch-misses
47758992 ± 10% -21.9% 37302259 ± 2% perf-stat.ps.cache-references
629957 -28.8% 448327 perf-stat.ps.dTLB-load-misses
4.113e+09 +3.8% 4.268e+09 perf-stat.ps.dTLB-loads
3.121e+08 +26.5% 3.949e+08 perf-stat.ps.dTLB-stores
597514 ± 8% -20.0% 477834 ± 6% perf-stat.ps.iTLB-load-misses
2316405 -12.7% 2021878 ± 7% perf-stat.ps.iTLB-loads
1.535e+10 +5.4% 1.617e+10 perf-stat.ps.instructions
7434434 ± 2% -27.2% 5412315 ± 3% perf-stat.ps.node-load-misses
792675 -39.1% 482405 perf-stat.ps.node-loads
4.648e+12 +5.4% 4.9e+12 perf-stat.total.instructions
24.16 ± 66% -16.4 7.77 ±122% perf-profile.calltrace.cycles-pp.mwait_idle_with_hints.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
24.16 ± 66% -16.4 7.77 ±122% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
33.88 ± 20% -9.6 24.32 ± 7% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
33.89 ± 20% -9.3 24.61 ± 6% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
33.05 ± 20% -8.9 24.13 ± 6% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
33.04 ± 20% -8.9 24.13 ± 6% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
33.05 ± 20% -8.9 24.14 ± 6% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
33.05 ± 20% -8.9 24.14 ± 6% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
33.05 ± 20% -8.9 24.14 ± 6% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
0.38 ± 70% +0.2 0.61 ± 2% perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.6 0.56 ± 2% perf-profile.calltrace.cycles-pp.rwsem_spin_on_owner.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
0.00 +0.6 0.57 ± 2% perf-profile.calltrace.cycles-pp.rwsem_spin_on_owner.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.00 +0.6 0.60 ± 3% perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
31.73 ± 10% +4.5 36.24 ± 2% perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
31.50 ± 10% +4.6 36.05 perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
33.19 ± 10% +4.6 37.77 ± 2% perf-profile.calltrace.cycles-pp.__munmap
32.39 ± 10% +4.6 36.97 ± 2% perf-profile.calltrace.cycles-pp.down_write_killable.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
32.34 ± 10% +4.6 36.94 ± 2% perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write_killable.__vm_munmap.__x64_sys_munmap.do_syscall_64
33.08 ± 10% +4.6 37.69 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
32.15 ± 10% +4.6 36.76 perf-profile.calltrace.cycles-pp.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
33.05 ± 10% +4.6 37.66 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
32.31 ± 10% +4.6 36.92 ± 2% perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__vm_munmap.__x64_sys_munmap
32.10 ± 10% +4.6 36.73 perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
32.98 ± 10% +4.6 37.61 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
32.07 ± 10% +4.6 36.70 perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff.do_syscall_64
32.98 ± 10% +4.6 37.61 ± 2% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
32.86 ± 10% +4.7 37.56 perf-profile.calltrace.cycles-pp.__mmap
32.74 ± 10% +4.7 37.48 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
32.71 ± 10% +4.8 37.46 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
32.62 ± 10% +4.8 37.39 perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
24.31 ± 66% -16.4 7.88 ±122% perf-profile.children.cycles-pp.intel_idle
33.89 ± 20% -9.3 24.61 ± 6% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
33.89 ± 20% -9.3 24.61 ± 6% perf-profile.children.cycles-pp.cpu_startup_entry
33.89 ± 20% -9.3 24.61 ± 6% perf-profile.children.cycles-pp.do_idle
33.85 ± 20% -9.3 24.57 ± 6% perf-profile.children.cycles-pp.mwait_idle_with_hints
33.88 ± 20% -9.3 24.60 ± 6% perf-profile.children.cycles-pp.cpuidle_enter
33.88 ± 20% -9.3 24.60 ± 6% perf-profile.children.cycles-pp.cpuidle_enter_state
33.88 ± 20% -9.3 24.61 ± 6% perf-profile.children.cycles-pp.cpuidle_idle_call
33.05 ± 20% -8.9 24.14 ± 6% perf-profile.children.cycles-pp.start_secondary
0.84 ± 25% -0.4 0.48 ± 16% perf-profile.children.cycles-pp.start_kernel
0.84 ± 25% -0.4 0.48 ± 16% perf-profile.children.cycles-pp.arch_call_rest_init
0.84 ± 25% -0.4 0.48 ± 16% perf-profile.children.cycles-pp.rest_init
0.16 ± 12% -0.1 0.08 perf-profile.children.cycles-pp.unmap_region
0.14 ± 11% -0.0 0.10 ± 8% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.13 ± 12% -0.0 0.10 perf-profile.children.cycles-pp.syscall_return_via_sysret
0.00 +0.1 0.06 ± 13% perf-profile.children.cycles-pp.mas_wr_node_store
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.memset_erms
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.mas_wr_modify
0.00 +0.1 0.07 ± 6% perf-profile.children.cycles-pp.kmem_cache_free_bulk
0.53 ± 10% +0.1 0.61 ± 2% perf-profile.children.cycles-pp.__do_munmap
0.00 +0.1 0.08 ± 5% perf-profile.children.cycles-pp.mas_destroy
0.00 +0.1 0.09 ± 5% perf-profile.children.cycles-pp.mt_find
0.00 +0.1 0.10 perf-profile.children.cycles-pp.mas_spanning_rebalance
0.00 +0.1 0.10 ± 4% perf-profile.children.cycles-pp.mas_wr_spanning_store
0.00 +0.1 0.12 ± 4% perf-profile.children.cycles-pp.mas_rev_awalk
0.00 +0.1 0.13 perf-profile.children.cycles-pp.mas_empty_area_rev
0.00 +0.1 0.14 ± 5% perf-profile.children.cycles-pp.kmem_cache_alloc_bulk
0.00 +0.2 0.16 ± 5% perf-profile.children.cycles-pp.mas_alloc_nodes
0.00 +0.2 0.17 ± 4% perf-profile.children.cycles-pp.mas_preallocate
0.42 ± 15% +0.2 0.60 ± 3% perf-profile.children.cycles-pp.do_mmap
0.06 ± 7% +0.2 0.27 perf-profile.children.cycles-pp.vma_link
0.20 ± 14% +0.2 0.41 ± 4% perf-profile.children.cycles-pp.mmap_region
0.00 +0.3 0.35 ± 4% perf-profile.children.cycles-pp.mas_store_prealloc
0.78 ± 8% +0.4 1.13 ± 2% perf-profile.children.cycles-pp.rwsem_spin_on_owner
33.20 ± 10% +4.6 37.77 ± 2% perf-profile.children.cycles-pp.__munmap
32.98 ± 10% +4.6 37.61 ± 2% perf-profile.children.cycles-pp.__x64_sys_munmap
32.98 ± 10% +4.6 37.61 ± 2% perf-profile.children.cycles-pp.__vm_munmap
32.86 ± 10% +4.7 37.56 perf-profile.children.cycles-pp.__mmap
32.62 ± 10% +4.8 37.40 perf-profile.children.cycles-pp.vm_mmap_pgoff
63.26 ± 10% +9.1 72.32 ± 2% perf-profile.children.cycles-pp.osq_lock
64.54 ± 10% +9.2 73.72 ± 2% perf-profile.children.cycles-pp.down_write_killable
64.44 ± 10% +9.2 73.66 ± 2% perf-profile.children.cycles-pp.rwsem_down_write_slowpath
64.38 ± 10% +9.2 73.62 ± 2% perf-profile.children.cycles-pp.rwsem_optimistic_spin
65.87 ± 10% +9.3 75.21 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
65.79 ± 10% +9.4 75.15 ± 2% perf-profile.children.cycles-pp.do_syscall_64
33.85 ± 20% -9.3 24.57 ± 6% perf-profile.self.cycles-pp.mwait_idle_with_hints
0.29 ± 19% -0.1 0.14 ± 3% perf-profile.self.cycles-pp.rwsem_optimistic_spin
0.13 ± 10% -0.0 0.09 ± 9% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.09 ± 9% -0.0 0.05 ± 8% perf-profile.self.cycles-pp.down_write_killable
0.13 ± 12% -0.0 0.10 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.00 +0.1 0.06 perf-profile.self.cycles-pp.memset_erms
0.00 +0.1 0.06 ± 13% perf-profile.self.cycles-pp.kmem_cache_free_bulk
0.00 +0.1 0.06 ± 7% perf-profile.self.cycles-pp.kmem_cache_alloc_bulk
0.00 +0.1 0.08 perf-profile.self.cycles-pp.mt_find
0.00 +0.1 0.11 ± 4% perf-profile.self.cycles-pp.mas_rev_awalk
0.76 ± 8% +0.4 1.12 ± 2% perf-profile.self.cycles-pp.rwsem_spin_on_owner
62.94 ± 10% +9.0 71.91 ± 2% perf-profile.self.cycles-pp.osq_lock


If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://01.org/lkp


Attachments:
config-6.1.0-rc7-00211-g0ba09b173387 (168.61 kB)
job-script (7.67 kB)
job.yaml (5.22 kB)
reproduce (354.00 B)

2022-12-19 18:23:58

by Liam R. Howlett

Subject: Re: [linus:master] will-it-scale.per_thread_ops -40.2% regression in mmap1 benchmark

* kernel test robot <[email protected]> [221219 05:01]:
> Greetings,
>
> FYI, we noticed a -40.2% regression of will-it-scale.per_thread_ops
> between commits 524e00b36e8c and e15e06a83923 of mainline

Thank you for running this test.

We are aware of this regression. The regression was taken as an
acceptable trade-off for the gain in read speed. Applications
perform more reads than writes to the VMA tree, so the overall
performance of real applications is either even or faster with the
maple tree. This can be seen in kernel build times, where forked
processes are short lived and so would be close to the worst-case
scenario.
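
For reference, the read side referred to here includes lookups like
find_vma(), which after this series is a single maple tree walk.
Simplified from mm/mmap.c as of these commits (the exact form may vary
between kernel versions):

    struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
    {
            unsigned long index = addr;

            mmap_assert_locked(mm);
            /* First entry at or after addr, i.e. the first VMA with
             * vm_end > addr, found in one walk of the wide nodes. */
            return mt_find(&mm->mm_mt, &index, ULONG_MAX);
    }

The rbtree version had to chase per-VMA node pointers on every lookup;
the maple tree's wide nodes keep this walk short and cache friendly,
which is where the read-speed gain comes from.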

This isn't to say we can't do better, and we are constantly working
towards faster performance. Please continue to report on the
performance.

Looking specifically at mmap1, it is mapping and then unmapping in a
tight loop. Considering the internals of what is going on, the
regression is expected, but I don't believe this pattern would ever
occur in an application that is doing what it is supposed to be doing.

If you find a real application that shows a performance regression,
please let us know.

>
> 524e00b36e8c5 mm: remove rb tree.
> 0c563f1480435 proc: remove VMA rbtree use from nommu
> d0cf3dd47f0d5 damon: convert __damon_va_three_regions to use the VMA iterator
> c9dbe82cb99db kernel/fork: use maple tree for dup_mmap() during forking
> 3499a13168da6 mm/mmap: use maple tree for unmapped_area{_topdown}
> 7fdbd37da5c6f mm/mmap: use the maple tree for find_vma_prev() instead of the rbtree
> be8432e7166ef mm/mmap: use the maple tree in find_vma() instead of the rbtree.
> 2e3af1db17442 mmap: use the VMA iterator in count_vma_pages_range()
> f39af05949a42 mm: add VMA iterator
> d4af56c5c7c67 mm: start tracking VMAs with maple tree
> e15e06a839232 lib/test_maple_tree: add testing for maple tree
>
> in testcase: will-it-scale
> on test machine: 104 threads 2 sockets (Skylake) with 192G memory
> with following parameters:
>
> nr_task: 50%
> mode: thread
> test: mmap1
> cpufreq_governor: performance
>
> test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> test-url: https://github.com/antonblanchard/will-it-scale
>
>
> We couldn't identify the commit that introduced this regression because
> some of the above commits failed to boot during bisection, but it looks
> related to the maple tree code. Please check the following details:

It is interesting that these issues were not detected by myself or other
build bots. Perhaps there is a configuration option that wasn't tested.
In any case, all of the listed commits were in preparation for the last
commit, which removes the rb tree. Regardless of which commit introduced
the regression, what is being detected is the fact that the maple tree
is slower on writes.

Thanks,
Liam