When checking a performance change for the will-it-scale mmap
scalability test [1], we found very high lock contention on the
spinlock of the percpu counter 'vm_committed_as':
94.14% 0.35% [kernel.kallsyms] [k] _raw_spin_lock_irqsave
48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;
This heavy lock contention is not always necessary: 'vm_committed_as'
only needs to be very precise when the strict OVERCOMMIT_NEVER policy
is set, which requires a rather small batch number for the percpu
counter.
So keep the 'batch' number unchanged for the strict OVERCOMMIT_NEVER
policy, and enlarge it for the not-so-strict OVERCOMMIT_ALWAYS and
OVERCOMMIT_GUESS policies.
Benchmarking with the same testcase in [1] shows a 53% improvement on
an 8C/16T desktop, and 2097% (20X) on a 4S/72C/144T server. For a
given platform, whether it shows an improvement depends on whether the
test mmap size is bigger than the computed batch number.
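As a rough illustration of that dependency (back-of-the-envelope
numbers, using the 104-thread/192G Skylake machine from the 0day
report later in this thread): totalram_pages is about 192G/4K =
50331648 pages, so the enlarged batch is roughly 50331648/104/4 ~=
121k pages (~473MB) per CPU, while the OVERCOMMIT_NEVER batch stays at
50331648/104/256 ~= 1.9k pages (~7MB). An mmap well below the batch
size mostly stays on the percpu fast path and never touches the
spinlock.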
We tested 10+ platforms in 0day (server, desktop and laptop). With a
64X lift, 80%+ of the platforms show improvements, while with a 16X
lift only 1/3 of the platforms do.
And it should generally help mmap/munmap usage, as Michal Hocko
mentioned:
: I believe that there are non-synthetic workloads which would benefit
: from a larger batch. E.g. large in memory databases which do large
: mmaps during startups from multiple threads.
Note: there are some style complaints from checkpatch for patch 3, as
the sysctl handler declaration follows the same format as its sibling
functions.
[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/
patch1: a cleanup for /proc/meminfo
patch2: a preparation patch which also improves the accuracy of
    vm_memory_committed
patch3: main change
This is against today's linux-mm git tree on GitHub.
Please help to review, thanks!
- Feng
----------------------------------------------------------------
Changelog:
v5:
* rebase after 5.8-rc1
* remove the 3/4 patch in v4 which is merged in v5.7
* add code comments for vm_memory_committed()
v4:
* Remove the VM_WARN_ONCE check for vm_committed_as underflow,
thanks to Qian Cai for finding and testing the warning
v3:
* refine commit log and cleanup code, according to comments
from Michal Hocko and Matthew Wilcox
* change the lift from 16X to 64X after testing
v2:
* add the sysctl handler to cover runtime overcommit policy
change, as suggested by Andrew Morton
* address the accuracy concern of vm_memory_committed()
from Andi Kleen
Feng Tang (3):
proc/meminfo: avoid open coded reading of vm_committed_as
mm/util.c: make vm_memory_committed() more accurate
mm: adjust vm_committed_as_batch according to vm overcommit policy
fs/proc/meminfo.c | 2 +-
include/linux/mm.h | 2 ++
include/linux/mman.h | 4 ++++
kernel/sysctl.c | 2 +-
mm/mm_init.c | 18 ++++++++++++++----
mm/util.c | 19 ++++++++++++++++++-
6 files changed, 40 insertions(+), 7 deletions(-)
--
2.7.4
percpu_counter_sum_positive() will provide more accurate info.
With percpu_counter_read_positive(), in the worst case the deviation
could be 'batch * nr_cpus', which is totalram_pages/256 for now,
and will grow as the batch gets enlarged.
Its time cost is about 800 nanoseconds on a 2C/4T platform and 2~3
microseconds on a 2S/36C/72T Skylake server in the normal case. In the
worst case, where vm_committed_as's spinlock is under severe
contention, it costs 30~40 microseconds on the 2S/36C/72T Skylake
server, which should be fine for its only two users: /proc/meminfo and
the HyperV balloon driver's once-per-second status trace.
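For context, simplified sketches (not verbatim) of the two
percpu_counter readers this patch chooses between; see
include/linux/percpu_counter.h and lib/percpu_counter.c for the real
code:

/* fast but fuzzy: ignores the per-CPU deltas entirely */
static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
{
	s64 ret = fbc->count;	/* may lag by up to batch * nr_cpus */

	return ret >= 0 ? ret : 0;
}

/* slow but accurate: folds in every CPU's delta under the lock */
s64 __percpu_counter_sum(struct percpu_counter *fbc)
{
	s64 ret;
	int cpu;
	unsigned long flags;

	raw_spin_lock_irqsave(&fbc->lock, flags);
	ret = fbc->count;
	for_each_online_cpu(cpu)
		ret += *per_cpu_ptr(fbc->counters, cpu);
	raw_spin_unlock_irqrestore(&fbc->lock, flags);
	return ret;
}

percpu_counter_sum_positive() is just the latter clamped at zero.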
Signed-off-by: Feng Tang <[email protected]>
Acked-by: Michal Hocko <[email protected]> # for /proc/meminfo
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Huang Ying <[email protected]>
---
mm/util.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/util.c b/mm/util.c
index c63c8e4..1c9d097 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -787,10 +787,15 @@ struct percpu_counter vm_committed_as ____cacheline_aligned_in_smp;
* balancing memory across competing virtual machines that are hosted.
* Several metrics drive this policy engine including the guest reported
* memory commitment.
+ *
+ * The time cost of this is very low for small platforms, and for a
+ * big platform like a 2S/36C/72T Skylake server, in the worst case
+ * where vm_committed_as's spinlock is under severe contention, the
+ * time cost could be about 30~40 microseconds.
*/
unsigned long vm_memory_committed(void)
{
- return percpu_counter_read_positive(&vm_committed_as);
+ return percpu_counter_sum_positive(&vm_committed_as);
}
EXPORT_SYMBOL_GPL(vm_memory_committed);
--
2.7.4
When checking a performance change for the will-it-scale mmap
scalability test [1], we found very high lock contention on the
spinlock of the percpu counter 'vm_committed_as':
94.14% 0.35% [kernel.kallsyms] [k] _raw_spin_lock_irqsave
48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;
This heavy lock contention is not always necessary: 'vm_committed_as'
only needs to be very precise when the strict OVERCOMMIT_NEVER policy
is set, which requires a rather small batch number for the percpu
counter.
So keep the 'batch' number unchanged for the strict OVERCOMMIT_NEVER
policy, and lift it to 64X for the OVERCOMMIT_ALWAYS and
OVERCOMMIT_GUESS policies. Also add a sysctl handler to adjust it when
the policy is changed at runtime.
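With the handler wired up, a runtime policy switch such as

    echo 2 > /proc/sys/vm/overcommit_memory

recomputes vm_committed_as_batch on the spot, instead of leaving the
batch sized for the previous policy.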
Benchmarking with the same testcase in [1] shows a 53% improvement on
an 8C/16T desktop, and 2097% (20X) on a 4S/72C/144T server. We tested
on the 0day test platforms (server, desktop and laptop), and 80%+ of
them show improvements with that test. Whether a platform shows an
improvement depends on whether the test mmap size is bigger than the
computed batch number.
With a 16X lift, only 1/3 of the platforms show improvements, though
it should help general mmap/munmap usage, as Michal Hocko mentioned:
: I believe that there are non-synthetic workloads which would benefit from
: a larger batch. E.g. large in memory databases which do large mmaps
: during startups from multiple threads.
[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Feng Tang <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Huang Ying <[email protected]>
---
include/linux/mm.h | 2 ++
include/linux/mman.h | 4 ++++
kernel/sysctl.c | 2 +-
mm/mm_init.c | 18 ++++++++++++++----
mm/util.c | 12 ++++++++++++
5 files changed, 33 insertions(+), 5 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e6ff54a..d00facb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -206,6 +206,8 @@ int overcommit_ratio_handler(struct ctl_table *, int, void *, size_t *,
loff_t *);
int overcommit_kbytes_handler(struct ctl_table *, int, void *, size_t *,
loff_t *);
+int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *,
+ loff_t *);
#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 4b08e9c..91c93c1 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -57,8 +57,12 @@ extern struct percpu_counter vm_committed_as;
#ifdef CONFIG_SMP
extern s32 vm_committed_as_batch;
+extern void mm_compute_batch(void);
#else
#define vm_committed_as_batch 0
+static inline void mm_compute_batch(void)
+{
+}
#endif
unsigned long vm_memory_committed(void);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 40180cd..10dcc06 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2650,7 +2650,7 @@ static struct ctl_table vm_table[] = {
.data = &sysctl_overcommit_memory,
.maxlen = sizeof(sysctl_overcommit_memory),
.mode = 0644,
- .proc_handler = proc_dointvec_minmax,
+ .proc_handler = overcommit_policy_handler,
.extra1 = SYSCTL_ZERO,
.extra2 = &two,
},
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 435e5f7..c5a6fb1 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -13,6 +13,7 @@
#include <linux/memory.h>
#include <linux/notifier.h>
#include <linux/sched.h>
+#include <linux/mman.h>
#include "internal.h"
#ifdef CONFIG_DEBUG_MEMORY_INIT
@@ -144,14 +145,23 @@ EXPORT_SYMBOL_GPL(mm_kobj);
#ifdef CONFIG_SMP
s32 vm_committed_as_batch = 32;
-static void __meminit mm_compute_batch(void)
+void mm_compute_batch(void)
{
u64 memsized_batch;
s32 nr = num_present_cpus();
s32 batch = max_t(s32, nr*2, 32);
-
- /* batch size set to 0.4% of (total memory/#cpus), or max int32 */
- memsized_batch = min_t(u64, (totalram_pages()/nr)/256, 0x7fffffff);
+ unsigned long ram_pages = totalram_pages();
+
+ /*
+ * For policy of OVERCOMMIT_NEVER, set batch size to 0.4%
+ * of (total memory/#cpus), and lift it to 25% for other
+ * policies to ease the possible lock contention for percpu_counter
+ * vm_committed_as, while the max limit is INT_MAX
+ */
+ if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
+ memsized_batch = min_t(u64, ram_pages/nr/256, INT_MAX);
+ else
+ memsized_batch = min_t(u64, ram_pages/nr/4, INT_MAX);
vm_committed_as_batch = max_t(s32, memsized_batch, batch);
}
diff --git a/mm/util.c b/mm/util.c
index 1c9d097..52ed9c1 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -746,6 +746,18 @@ int overcommit_ratio_handler(struct ctl_table *table, int write, void *buffer,
return ret;
}
+int overcommit_policy_handler(struct ctl_table *table, int write, void *buffer,
+ size_t *lenp, loff_t *ppos)
+{
+ int ret;
+
+ ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+ if (ret == 0 && write)
+ mm_compute_batch();
+
+ return ret;
+}
+
int overcommit_kbytes_handler(struct ctl_table *table, int write, void *buffer,
size_t *lenp, loff_t *ppos)
{
--
2.7.4
Greetings,
FYI, we noticed a 1894.6% improvement of will-it-scale.per_process_ops due to commit:
commit: 4e2c82a40911c19419349918e675aa202b113b4d ("[PATCH v5 3/3] mm: adjust vm_committed_as_batch according to vm overcommit policy")
url: https://github.com/0day-ci/linux/commits/Feng-Tang/make-vm_committed_as_batch-aware-of-vm-overcommit-policy/20200621-153906
in testcase: will-it-scale
on test machine: 104 threads Skylake with 192G memory
with following parameters:
nr_task: 100%
mode: process
test: mmap1
cpufreq_governor: performance
ucode: 0x2000065
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
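For readers unfamiliar with the testcase: mmap1 is essentially a tight
mmap()/munmap() loop run in N parallel copies. A paraphrased sketch
(not the exact source -- see the test-url above; the 128MB chunk size
is an assumption for illustration):

#include <assert.h>
#include <sys/mman.h>

#define MEMSIZE	(128UL * 1024 * 1024)	/* assumed chunk size */

void testcase(unsigned long long *iterations)
{
	while (1) {
		/* every iteration charges/uncharges vm_committed_as */
		char *c = mmap(NULL, MEMSIZE, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		assert(c != MAP_FAILED);
		munmap(c, MEMSIZE);
		(*iterations)++;
	}
}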
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-20191114.cgz/lkp-skl-fpga01/mmap1/will-it-scale/0x2000065
commit:
e172eef574 ("mm/util.c: make vm_memory_committed() more accurate")
4e2c82a409 ("mm: adjust vm_committed_as_batch according to vm overcommit policy")
e172eef574090d38 4e2c82a40911c19419349918e67
---------------- ---------------------------
%stddev %change %stddev
\ | \
12134 +1894.6% 242032 will-it-scale.per_process_ops
1262048 +1894.5% 25171362 will-it-scale.workload
93285003 ± 35% -58.8% 38472987 ±109% cpuidle.C6.time
156141 ± 10% -51.8% 75193 ± 68% cpuidle.C6.usage
9175913 +37.5% 12615167 meminfo.Committed_AS
98360 -11.8% 86768 meminfo.Shmem
98.78 -17.1 81.70 mpstat.cpu.all.sys%
0.84 +17.1 17.90 mpstat.cpu.all.usr%
91727 ± 3% -10.2% 82361 ± 3% numa-meminfo.node0.SUnreclaim
78758 ± 4% +9.8% 86490 ± 3% numa-meminfo.node1.SUnreclaim
22931 ± 3% -10.2% 20589 ± 3% numa-vmstat.node0.nr_slab_unreclaimable
19689 ± 4% +9.8% 21622 ± 3% numa-vmstat.node1.nr_slab_unreclaimable
98.00 -17.3% 81.00 vmstat.cpu.sy
1.00 +1600.0% 17.00 vmstat.cpu.us
1374 +8.7% 1493 vmstat.system.cs
93504 -3.0% 90744 proc-vmstat.nr_active_anon
7150 -1.2% 7060 proc-vmstat.nr_inactive_anon
24614 -11.9% 21686 proc-vmstat.nr_shmem
93504 -3.0% 90744 proc-vmstat.nr_zone_active_anon
7150 -1.2% 7060 proc-vmstat.nr_zone_inactive_anon
737688 -1.3% 727889 proc-vmstat.numa_hit
703983 -1.4% 694240 proc-vmstat.numa_local
4068 ± 47% +510.6% 24844 ± 61% proc-vmstat.numa_pte_updates
26377 -16.3% 22072 ± 2% proc-vmstat.pgactivate
154.89 +12.8% 174.68 ± 2% sched_debug.cfs_rq:/.exec_clock.stddev
44.42 ± 14% +25.5% 55.75 ± 13% sched_debug.cfs_rq:/.nr_spread_over.max
412.89 ± 2% +9.9% 453.76 ± 4% sched_debug.cfs_rq:/.util_est_enqueued.avg
125.87 ± 9% +55.0% 195.07 ± 2% sched_debug.cfs_rq:/.util_est_enqueued.stddev
8.58 ± 17% +37.2% 11.78 ± 10% sched_debug.cpu.clock.stddev
8.58 ± 17% +37.2% 11.78 ± 10% sched_debug.cpu.clock_task.stddev
0.00 ± 2% +21.9% 0.00 ± 11% sched_debug.cpu.next_balance.stddev
2499 ± 8% +31.1% 3276 ± 5% sched_debug.cpu.nr_switches.stddev
1911 ± 5% +30.3% 2491 sched_debug.cpu.sched_count.stddev
4564 ± 8% +79.3% 8183 ± 14% sched_debug.cpu.ttwu_count.max
742.42 ± 4% +67.5% 1243 ± 8% sched_debug.cpu.ttwu_count.stddev
730.40 -9.8% 659.01 sched_debug.cpu.ttwu_local.avg
303.67 ± 4% +15.5% 350.83 ± 4% sched_debug.cpu.ttwu_local.min
62371 +41.5% 88277 ± 3% interrupts.CAL:Function_call_interrupts
956.75 ± 30% +34.3% 1284 ± 4% interrupts.CPU0.CAL:Function_call_interrupts
956.75 ± 28% +115.3% 2060 ± 42% interrupts.CPU1.CAL:Function_call_interrupts
476.00 ± 20% +42.8% 679.50 ± 31% interrupts.CPU1.RES:Rescheduling_interrupts
518.75 ± 10% -12.3% 455.00 interrupts.CPU101.CAL:Function_call_interrupts
554.50 ± 58% -43.5% 313.50 interrupts.CPU101.RES:Rescheduling_interrupts
378.75 ± 13% -16.3% 317.00 interrupts.CPU102.RES:Rescheduling_interrupts
844.75 ± 8% +64.0% 1385 ± 14% interrupts.CPU26.CAL:Function_call_interrupts
330.50 ± 2% -6.4% 309.25 interrupts.CPU39.RES:Rescheduling_interrupts
377.00 ± 8% -11.3% 334.50 interrupts.CPU5.RES:Rescheduling_interrupts
1167 ± 6% +32.9% 1552 ± 2% interrupts.CPU52.CAL:Function_call_interrupts
526.25 ± 5% +20.9% 636.00 ± 13% interrupts.CPU52.RES:Rescheduling_interrupts
1822 ± 2% +121.0% 4027 ± 36% interrupts.CPU53.CAL:Function_call_interrupts
587.75 ± 9% +28.4% 754.50 ± 11% interrupts.CPU53.RES:Rescheduling_interrupts
1979 ± 13% +240.0% 6729 ± 23% interrupts.CPU54.CAL:Function_call_interrupts
356.50 ± 5% -10.0% 321.00 ± 3% interrupts.CPU64.RES:Rescheduling_interrupts
360.50 ± 6% -10.9% 321.25 ± 3% interrupts.CPU67.RES:Rescheduling_interrupts
1204 ± 3% +30.2% 1567 ± 3% interrupts.CPU78.CAL:Function_call_interrupts
1877 ± 17% +189.9% 5443 ± 21% interrupts.CPU79.CAL:Function_call_interrupts
1875 ± 11% +183.2% 5312 ± 37% interrupts.CPU80.CAL:Function_call_interrupts
1462 ± 35% +119.8% 3214 ± 21% interrupts.CPU81.CAL:Function_call_interrupts
409.00 ± 18% -21.7% 320.25 ± 3% interrupts.CPU86.RES:Rescheduling_interrupts
546.50 ± 55% -41.7% 318.50 interrupts.CPU92.RES:Rescheduling_interrupts
369.25 ± 10% -12.1% 324.75 ± 7% interrupts.CPU95.RES:Rescheduling_interrupts
137.00 ± 21% -33.8% 90.75 ± 10% interrupts.IWI:IRQ_work_interrupts
41580 +10.7% 46012 ± 3% interrupts.RES:Rescheduling_interrupts
95.75 ± 27% +156.1% 245.25 ± 31% interrupts.TLB:TLB_shootdowns
3.23 -98.8% 0.04 ± 14% perf-stat.i.MPKI
4.957e+09 +1087.7% 5.887e+10 perf-stat.i.branch-instructions
0.41 +0.0 0.43 perf-stat.i.branch-miss-rate%
20020017 +1123.3% 2.449e+08 perf-stat.i.branch-misses
40.71 -32.6 8.14 ± 18% perf-stat.i.cache-miss-rate%
27302806 -98.2% 482743 ± 11% perf-stat.i.cache-misses
67071073 -91.2% 5902976 ± 11% perf-stat.i.cache-references
1339 +9.4% 1466 perf-stat.i.context-switches
13.85 -92.0% 1.11 perf-stat.i.cpi
2.882e+11 -3.5% 2.783e+11 perf-stat.i.cpu-cycles
143.91 +12.2% 161.48 perf-stat.i.cpu-migrations
10534 +6412.5% 686069 ± 6% perf-stat.i.cycles-between-cache-misses
0.05 +0.0 0.07 perf-stat.i.dTLB-load-miss-rate%
2601973 +1828.7% 50183924 perf-stat.i.dTLB-load-misses
5.777e+09 +1183.8% 7.416e+10 perf-stat.i.dTLB-loads
0.00 ± 4% -0.0 0.00 ± 14% perf-stat.i.dTLB-store-miss-rate%
11333 +308.3% 46270 perf-stat.i.dTLB-store-misses
1.712e+09 +1723.7% 3.123e+10 perf-stat.i.dTLB-stores
16.99 +16.4 33.43 perf-stat.i.iTLB-load-miss-rate%
1019286 +5015.7% 52143535 perf-stat.i.iTLB-load-misses
4980632 +1977.5% 1.035e+08 perf-stat.i.iTLB-loads
2.079e+10 +1107.7% 2.511e+11 perf-stat.i.instructions
20597 -76.2% 4909 perf-stat.i.instructions-per-iTLB-miss
0.07 +1112.2% 0.90 perf-stat.i.ipc
2.77 -3.5% 2.68 perf-stat.i.metric.GHz
0.29 +386.2% 1.40 ± 4% perf-stat.i.metric.K/sec
120.45 +1211.8% 1580 perf-stat.i.metric.M/sec
99.58 -9.9 89.65 perf-stat.i.node-load-miss-rate%
5372890 -98.1% 102469 ± 16% perf-stat.i.node-load-misses
17999 ± 4% +7.1% 19279 ± 3% perf-stat.i.node-loads
99.70 -15.3 84.37 perf-stat.i.node-store-miss-rate%
4224840 -99.6% 16674 perf-stat.i.node-store-misses
3957 ± 5% +20.7% 4778 ± 4% perf-stat.i.node-stores
3.23 -99.3% 0.02 ± 11% perf-stat.overall.MPKI
0.40 +0.0 0.42 perf-stat.overall.branch-miss-rate%
40.69 -32.4 8.33 ± 18% perf-stat.overall.cache-miss-rate%
13.86 -92.0% 1.11 perf-stat.overall.cpi
10555 +5417.7% 582409 ± 10% perf-stat.overall.cycles-between-cache-misses
0.05 +0.0 0.07 perf-stat.overall.dTLB-load-miss-rate%
0.00 ± 2% -0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
16.98 +16.5 33.50 perf-stat.overall.iTLB-load-miss-rate%
20411 -76.4% 4816 perf-stat.overall.instructions-per-iTLB-miss
0.07 +1150.8% 0.90 perf-stat.overall.ipc
99.64 -15.9 83.75 perf-stat.overall.node-load-miss-rate%
99.90 -22.3 77.60 perf-stat.overall.node-store-miss-rate%
4964886 -39.5% 3004976 perf-stat.overall.path-length
4.94e+09 +1087.6% 5.867e+10 perf-stat.ps.branch-instructions
19955233 +1123.3% 2.441e+08 perf-stat.ps.branch-misses
27214496 -98.2% 482131 ± 11% perf-stat.ps.cache-misses
66882519 -91.2% 5886018 ± 11% perf-stat.ps.cache-references
1335 +9.3% 1459 perf-stat.ps.context-switches
2.873e+11 -3.5% 2.773e+11 perf-stat.ps.cpu-cycles
143.62 +11.9% 160.74 perf-stat.ps.cpu-migrations
2604905 +1819.8% 50009913 perf-stat.ps.dTLB-load-misses
5.758e+09 +1183.7% 7.391e+10 perf-stat.ps.dTLB-loads
11577 ± 2% +298.4% 46128 perf-stat.ps.dTLB-store-misses
1.707e+09 +1723.5% 3.112e+10 perf-stat.ps.dTLB-stores
1015525 +5016.3% 51957351 perf-stat.ps.iTLB-load-misses
4964472 +1977.1% 1.031e+08 perf-stat.ps.iTLB-loads
2.072e+10 +1107.6% 2.503e+11 perf-stat.ps.instructions
5354985 -98.1% 102049 ± 16% perf-stat.ps.node-load-misses
4210659 -99.6% 16606 perf-stat.ps.node-store-misses
4036 ± 6% +18.8% 4795 ± 4% perf-stat.ps.node-stores
6.266e+12 +1107.2% 7.564e+13 perf-stat.total.instructions
6110 ± 4% +170.0% 16499 ± 2% softirqs.CPU0.RCU
131029 -22.3% 101841 ± 2% softirqs.CPU0.TIMER
5426 ± 6% +192.8% 15889 ± 8% softirqs.CPU1.RCU
4265 ± 2% +267.8% 15691 ± 3% softirqs.CPU10.RCU
126375 -23.9% 96143 softirqs.CPU10.TIMER
3733 +298.3% 14869 ± 3% softirqs.CPU100.RCU
125994 -26.9% 92120 ± 2% softirqs.CPU100.TIMER
3815 ± 6% +284.1% 14652 ± 2% softirqs.CPU101.RCU
126160 -27.5% 91409 ± 2% softirqs.CPU101.TIMER
3654 ± 3% +303.2% 14732 ± 2% softirqs.CPU102.RCU
126138 -27.5% 91507 ± 2% softirqs.CPU102.TIMER
4662 ± 14% +225.6% 15182 ± 5% softirqs.CPU103.RCU
127953 -28.0% 92148 ± 2% softirqs.CPU103.TIMER
4203 +271.5% 15613 ± 3% softirqs.CPU11.RCU
126126 -24.1% 95681 softirqs.CPU11.TIMER
4341 ± 6% +256.5% 15474 ± 3% softirqs.CPU12.RCU
126312 -24.7% 95061 softirqs.CPU12.TIMER
4450 ± 5% +261.6% 16090 ± 2% softirqs.CPU13.RCU
126426 -23.9% 96172 softirqs.CPU13.TIMER
4197 +280.3% 15964 ± 4% softirqs.CPU14.RCU
126366 -23.7% 96372 softirqs.CPU14.TIMER
4320 ± 4% +314.7% 17915 ± 3% softirqs.CPU15.RCU
126212 -23.3% 96764 softirqs.CPU15.TIMER
4204 ± 2% +323.1% 17786 ± 9% softirqs.CPU16.RCU
4440 ± 11% +303.4% 17912 softirqs.CPU17.RCU
126505 -24.5% 95546 softirqs.CPU17.TIMER
4061 +334.4% 17640 ± 2% softirqs.CPU18.RCU
125981 -24.0% 95706 softirqs.CPU18.TIMER
4114 +301.3% 16513 ± 11% softirqs.CPU19.RCU
126207 -24.6% 95201 ± 2% softirqs.CPU19.TIMER
4663 ± 2% +249.1% 16277 ± 3% softirqs.CPU2.RCU
126641 -24.5% 95677 ± 2% softirqs.CPU2.TIMER
4135 ± 2% +327.9% 17694 ± 2% softirqs.CPU20.RCU
126190 -23.8% 96097 softirqs.CPU20.TIMER
4117 +329.4% 17678 ± 2% softirqs.CPU21.RCU
126078 -24.1% 95698 softirqs.CPU21.TIMER
4108 +335.3% 17883 ± 3% softirqs.CPU22.RCU
125926 -23.3% 96581 softirqs.CPU22.TIMER
4346 ± 10% +304.4% 17574 ± 3% softirqs.CPU23.RCU
126388 -24.6% 95334 softirqs.CPU23.TIMER
4289 ± 5% +314.4% 17774 ± 3% softirqs.CPU24.RCU
126365 -24.7% 95125 softirqs.CPU24.TIMER
4275 ± 3% +313.4% 17678 ± 2% softirqs.CPU25.RCU
126236 -24.6% 95217 softirqs.CPU25.TIMER
4692 ± 2% +243.6% 16123 ± 2% softirqs.CPU26.RCU
126801 -27.2% 92342 ± 2% softirqs.CPU26.TIMER
4945 ± 8% +233.8% 16506 ± 4% softirqs.CPU27.RCU
127385 -27.2% 92710 softirqs.CPU27.TIMER
4423 ± 7% +253.8% 15650 ± 3% softirqs.CPU28.RCU
134939 ± 10% -31.7% 92217 ± 2% softirqs.CPU28.TIMER
6106 ± 48% +182.5% 17251 ± 17% softirqs.CPU29.RCU
130358 ± 2% -28.3% 93520 ± 4% softirqs.CPU29.TIMER
4362 ± 2% +270.0% 16139 ± 2% softirqs.CPU3.RCU
126413 -23.6% 96611 softirqs.CPU3.TIMER
4321 ± 6% +290.0% 16854 ± 3% softirqs.CPU30.RCU
127297 -27.3% 92574 ± 2% softirqs.CPU30.TIMER
4246 ± 7% +320.7% 17866 ± 11% softirqs.CPU31.RCU
126876 -25.5% 94552 ± 4% softirqs.CPU31.TIMER
4270 ± 5% +291.0% 16695 ± 3% softirqs.CPU32.RCU
127209 -27.1% 92679 ± 2% softirqs.CPU32.TIMER
4295 ± 7% +287.9% 16658 ± 2% softirqs.CPU33.RCU
127371 -27.6% 92158 ± 2% softirqs.CPU33.TIMER
4402 ± 4% +286.8% 17026 ± 3% softirqs.CPU34.RCU
127223 -27.3% 92495 ± 2% softirqs.CPU34.TIMER
4271 ± 9% +288.0% 16572 ± 2% softirqs.CPU35.RCU
127855 -28.3% 91697 ± 2% softirqs.CPU35.TIMER
4134 ± 3% +297.2% 16425 ± 3% softirqs.CPU36.RCU
126837 -27.5% 91968 ± 2% softirqs.CPU36.TIMER
3998 +306.0% 16230 ± 3% softirqs.CPU37.RCU
126729 -27.3% 92174 softirqs.CPU37.TIMER
4309 ± 11% +280.5% 16398 softirqs.CPU38.RCU
127222 -27.1% 92707 ± 2% softirqs.CPU38.TIMER
4060 ± 2% +303.7% 16393 ± 2% softirqs.CPU39.RCU
126816 -27.6% 91843 ± 2% softirqs.CPU39.TIMER
4407 ± 2% +260.9% 15907 ± 2% softirqs.CPU4.RCU
126543 -24.2% 95938 softirqs.CPU4.TIMER
4005 +316.9% 16701 softirqs.CPU40.RCU
126854 -26.6% 93051 ± 2% softirqs.CPU40.TIMER
3956 +316.3% 16469 ± 3% softirqs.CPU41.RCU
126714 -27.2% 92226 softirqs.CPU41.TIMER
4061 ± 2% +298.6% 16186 ± 2% softirqs.CPU42.RCU
126738 -27.4% 91971 ± 2% softirqs.CPU42.TIMER
4094 ± 6% +290.4% 15982 ± 2% softirqs.CPU43.RCU
126917 -27.4% 92181 ± 2% softirqs.CPU43.TIMER
3896 +323.7% 16508 ± 4% softirqs.CPU44.RCU
126547 -26.3% 93260 ± 4% softirqs.CPU44.TIMER
3967 +308.3% 16197 ± 3% softirqs.CPU45.RCU
126577 -26.9% 92512 ± 2% softirqs.CPU45.TIMER
4531 ± 11% +253.0% 15997 ± 5% softirqs.CPU46.RCU
3957 +318.1% 16545 ± 2% softirqs.CPU47.RCU
126651 -27.2% 92139 ± 2% softirqs.CPU47.TIMER
4306 ± 8% +287.6% 16688 ± 5% softirqs.CPU48.RCU
136498 ± 11% -32.2% 92515 ± 3% softirqs.CPU48.TIMER
3943 +312.6% 16271 ± 2% softirqs.CPU49.RCU
126688 -27.6% 91778 ± 2% softirqs.CPU49.TIMER
4604 ± 8% +243.1% 15796 ± 3% softirqs.CPU5.RCU
126598 -24.2% 95934 softirqs.CPU5.TIMER
4034 ± 3% +303.4% 16272 softirqs.CPU50.RCU
126758 -27.7% 91677 softirqs.CPU50.TIMER
3957 ± 3% +313.5% 16365 ± 2% softirqs.CPU51.RCU
126637 -27.2% 92186 ± 2% softirqs.CPU51.TIMER
5784 +245.4% 19976 ± 2% softirqs.CPU52.RCU
125703 -22.7% 97173 softirqs.CPU52.TIMER
5831 ± 3% +210.7% 18115 ± 13% softirqs.CPU53.RCU
125659 -24.6% 94807 ± 2% softirqs.CPU53.TIMER
5513 ± 7% +249.5% 19267 ± 2% softirqs.CPU54.RCU
126963 -24.4% 96024 ± 2% softirqs.CPU54.TIMER
4426 +359.7% 20350 ± 13% softirqs.CPU55.RCU
125809 -22.7% 97211 ± 3% softirqs.CPU55.TIMER
4262 ± 2% +330.2% 18339 softirqs.CPU56.RCU
125733 -24.2% 95356 softirqs.CPU56.TIMER
4143 ± 5% +332.7% 17926 softirqs.CPU57.RCU
125930 -24.5% 95106 softirqs.CPU57.TIMER
4052 ± 3% +337.3% 17719 softirqs.CPU58.RCU
125971 -24.7% 94890 softirqs.CPU58.TIMER
4047 +350.8% 18243 softirqs.CPU59.RCU
125799 -23.7% 95943 softirqs.CPU59.TIMER
4524 ± 4% +243.5% 15541 ± 2% softirqs.CPU6.RCU
126547 -24.6% 95402 softirqs.CPU6.TIMER
3937 +292.2% 15441 ± 2% softirqs.CPU60.RCU
125746 -24.1% 95502 softirqs.CPU60.TIMER
3995 ± 4% +286.7% 15450 ± 2% softirqs.CPU61.RCU
125797 -24.4% 95077 softirqs.CPU61.TIMER
3897 +297.4% 15486 ± 2% softirqs.CPU62.RCU
125352 -24.0% 95299 softirqs.CPU62.TIMER
3901 +299.1% 15570 ± 4% softirqs.CPU63.RCU
125467 -24.0% 95340 softirqs.CPU63.TIMER
3901 +299.6% 15587 ± 3% softirqs.CPU64.RCU
125635 -24.7% 94635 softirqs.CPU64.TIMER
3931 ± 2% +287.0% 15215 ± 3% softirqs.CPU65.RCU
125605 -24.2% 95256 softirqs.CPU65.TIMER
3888 +301.0% 15590 ± 2% softirqs.CPU66.RCU
125564 -23.7% 95791 softirqs.CPU66.TIMER
3880 +303.9% 15673 ± 4% softirqs.CPU67.RCU
125496 -23.3% 96252 softirqs.CPU67.TIMER
3823 +289.1% 14876 ± 11% softirqs.CPU68.RCU
125395 -24.7% 94405 ± 2% softirqs.CPU68.TIMER
3872 +304.9% 15679 ± 3% softirqs.CPU69.RCU
125436 -24.4% 94854 softirqs.CPU69.TIMER
4291 +270.3% 15888 ± 3% softirqs.CPU7.RCU
126326 -23.9% 96082 softirqs.CPU7.TIMER
3866 +303.8% 15615 ± 3% softirqs.CPU70.RCU
125491 -23.9% 95510 softirqs.CPU70.TIMER
3853 +305.2% 15612 ± 6% softirqs.CPU71.RCU
3901 ± 4% +293.4% 15350 ± 3% softirqs.CPU72.RCU
125753 -23.8% 95803 softirqs.CPU72.TIMER
3803 ± 2% +302.4% 15302 ± 3% softirqs.CPU73.RCU
125519 -23.9% 95474 softirqs.CPU73.TIMER
3938 ± 6% +292.3% 15451 ± 4% softirqs.CPU74.RCU
125510 -24.0% 95438 softirqs.CPU74.TIMER
4074 ± 11% +270.9% 15113 ± 3% softirqs.CPU75.RCU
126182 -24.6% 95178 softirqs.CPU75.TIMER
3793 ± 2% +309.7% 15540 ± 2% softirqs.CPU76.RCU
125474 -24.1% 95180 softirqs.CPU76.TIMER
4075 ± 5% +270.2% 15085 ± 3% softirqs.CPU77.RCU
125744 -24.5% 94928 softirqs.CPU77.TIMER
6033 ± 4% +208.0% 18580 softirqs.CPU78.RCU
126368 -27.3% 91863 ± 2% softirqs.CPU78.TIMER
5481 ± 2% +223.7% 17740 ± 2% softirqs.CPU79.RCU
126427 -27.3% 91869 ± 2% softirqs.CPU79.TIMER
4293 +268.2% 15810 ± 2% softirqs.CPU8.RCU
126323 -24.3% 95674 softirqs.CPU8.TIMER
4934 ± 6% +256.0% 17569 ± 5% softirqs.CPU80.RCU
126434 -26.9% 92478 softirqs.CPU80.TIMER
4737 ± 9% +258.1% 16965 ± 3% softirqs.CPU81.RCU
125204 -26.5% 91964 ± 2% softirqs.CPU81.TIMER
7821 ± 39% +129.5% 17954 ± 17% softirqs.CPU82.RCU
130022 ± 2% -28.3% 93274 ± 4% softirqs.CPU82.TIMER
5849 ± 49% +211.2% 18204 ± 15% softirqs.CPU83.RCU
128023 -27.4% 92901 ± 2% softirqs.CPU83.TIMER
3962 ± 2% +324.3% 16810 softirqs.CPU84.RCU
126550 -27.3% 91964 ± 2% softirqs.CPU84.TIMER
3886 ± 2% +328.7% 16661 ± 2% softirqs.CPU85.RCU
126621 -27.4% 91951 ± 2% softirqs.CPU85.TIMER
3906 ± 4% +318.9% 16363 ± 2% softirqs.CPU86.RCU
126692 -27.5% 91851 ± 2% softirqs.CPU86.TIMER
4133 ± 3% +286.7% 15983 ± 2% softirqs.CPU87.RCU
126919 -28.2% 91166 ± 2% softirqs.CPU87.TIMER
3845 +322.6% 16250 ± 2% softirqs.CPU88.RCU
126361 -27.4% 91676 softirqs.CPU88.TIMER
3950 ± 4% +318.1% 16516 softirqs.CPU89.RCU
134193 ± 10% -31.6% 91850 ± 2% softirqs.CPU89.TIMER
4248 +269.1% 15678 ± 2% softirqs.CPU9.RCU
126127 -24.4% 95382 softirqs.CPU9.TIMER
3749 +292.1% 14702 ± 2% softirqs.CPU90.RCU
126128 -27.5% 91394 ± 2% softirqs.CPU90.TIMER
3968 ± 8% +272.1% 14765 ± 3% softirqs.CPU91.RCU
126535 -27.5% 91676 ± 2% softirqs.CPU91.TIMER
4027 ± 10% +265.5% 14722 ± 2% softirqs.CPU92.RCU
126767 -27.7% 91678 ± 2% softirqs.CPU92.TIMER
3768 ± 3% +293.2% 14816 softirqs.CPU93.RCU
126161 -27.2% 91824 ± 2% softirqs.CPU93.TIMER
3771 ± 2% +286.3% 14569 ± 2% softirqs.CPU94.RCU
126058 -27.4% 91551 ± 2% softirqs.CPU94.TIMER
3785 ± 3% +288.8% 14714 softirqs.CPU95.RCU
126186 -27.2% 91831 ± 2% softirqs.CPU95.TIMER
3754 +290.8% 14670 ± 2% softirqs.CPU96.RCU
126106 -27.3% 91676 ± 2% softirqs.CPU96.TIMER
3763 +301.0% 15092 ± 2% softirqs.CPU97.RCU
126045 -26.9% 92192 ± 3% softirqs.CPU97.TIMER
3792 ± 3% +282.6% 14508 ± 10% softirqs.CPU98.RCU
126131 -27.5% 91451 softirqs.CPU98.TIMER
3819 ± 3% +286.4% 14757 ± 2% softirqs.CPU99.RCU
126210 -27.7% 91293 ± 2% softirqs.CPU99.TIMER
447015 +280.5% 1701111 ± 2% softirqs.RCU
13186374 -25.4% 9837196 softirqs.TIMER
48.24 -48.2 0.00 perf-profile.calltrace.cycles-pp.__vm_enough_memory.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
48.24 -48.2 0.00 perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__vm_enough_memory.mmap_region.do_mmap.vm_mmap_pgoff
47.88 -47.9 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.percpu_counter_add_batch.__vm_enough_memory.mmap_region.do_mmap
47.76 -47.8 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.percpu_counter_add_batch.__vm_enough_memory.mmap_region
47.73 -47.7 0.00 perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
47.36 -47.4 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.percpu_counter_add_batch.__do_munmap.__vm_munmap.__x64_sys_munmap
47.23 -47.2 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.percpu_counter_add_batch.__do_munmap.__vm_munmap
48.79 -36.9 11.88 perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
48.94 -33.0 15.91 perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
49.05 -30.1 18.91 perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
49.06 -29.9 19.19 perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
49.33 -24.8 24.49 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
49.34 -24.6 24.77 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.mmap64
49.63 -17.6 32.02 perf-profile.calltrace.cycles-pp.mmap64
0.00 +0.6 0.58 perf-profile.calltrace.cycles-pp.security_mmap_addr.get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
0.00 +0.6 0.58 perf-profile.calltrace.cycles-pp.down_write_killable.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.6 0.58 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
0.00 +0.6 0.62 perf-profile.calltrace.cycles-pp.cap_vm_enough_memory.security_vm_enough_memory_mm.mmap_region.do_mmap.vm_mmap_pgoff
0.00 +0.7 0.66 perf-profile.calltrace.cycles-pp.strlcpy.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
0.00 +0.7 0.70 perf-profile.calltrace.cycles-pp.__vma_rb_erase.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
0.00 +0.8 0.75 perf-profile.calltrace.cycles-pp.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.8 0.83 perf-profile.calltrace.cycles-pp.security_vm_enough_memory_mm.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
0.00 +1.0 1.01 perf-profile.calltrace.cycles-pp.vmacache_find.find_vma.__do_munmap.__vm_munmap.__x64_sys_munmap
0.00 +1.1 1.08 perf-profile.calltrace.cycles-pp.kmem_cache_alloc.vm_area_alloc.mmap_region.do_mmap.vm_mmap_pgoff
0.00 +1.1 1.12 perf-profile.calltrace.cycles-pp.vm_unmapped_area.arch_get_unmapped_area_topdown.get_unmapped_area.do_mmap.vm_mmap_pgoff
0.00 +1.3 1.25 perf-profile.calltrace.cycles-pp.remove_vma.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
0.00 +1.3 1.26 perf-profile.calltrace.cycles-pp.perf_event_mmap_output.perf_iterate_sb.perf_event_mmap.mmap_region.do_mmap
0.00 +1.3 1.30 perf-profile.calltrace.cycles-pp.security_mmap_file.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +1.6 1.63 perf-profile.calltrace.cycles-pp.__vma_link_rb.vma_link.mmap_region.do_mmap.vm_mmap_pgoff
0.00 +1.9 1.89 perf-profile.calltrace.cycles-pp.vm_area_alloc.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
0.00 +2.0 2.01 perf-profile.calltrace.cycles-pp.arch_get_unmapped_area_topdown.get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
0.00 +2.1 2.14 perf-profile.calltrace.cycles-pp.vma_link.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
0.00 +2.5 2.47 perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
0.00 +2.6 2.57 perf-profile.calltrace.cycles-pp.find_vma.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
0.00 +2.6 2.60 perf-profile.calltrace.cycles-pp.rcu_all_qs._cond_resched.unmap_page_range.unmap_vmas.unmap_region
0.00 +3.0 3.00 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.munmap
0.00 +3.0 3.04 perf-profile.calltrace.cycles-pp.get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
0.00 +3.2 3.15 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.mmap64
0.00 +3.4 3.37 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.mmap64
0.00 +3.5 3.46 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.munmap
49.52 +3.8 53.33 perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +4.7 4.67 perf-profile.calltrace.cycles-pp.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
49.77 +4.9 54.71 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
0.00 +5.4 5.40 perf-profile.calltrace.cycles-pp._cond_resched.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
49.80 +5.5 55.27 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
0.00 +7.5 7.49 perf-profile.calltrace.cycles-pp.free_p4d_range.free_pgd_range.unmap_region.__do_munmap.__vm_munmap
0.00 +7.9 7.95 perf-profile.calltrace.cycles-pp.free_pgd_range.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
50.05 +10.5 60.53 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
50.06 +10.7 60.79 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.munmap
0.38 ± 57% +14.9 15.24 perf-profile.calltrace.cycles-pp.___might_sleep.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
50.33 +17.5 67.84 perf-profile.calltrace.cycles-pp.munmap
1.12 +33.4 34.49 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
1.21 +35.2 36.40 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
1.54 +44.8 46.37 perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
95.97 -95.4 0.56 perf-profile.children.cycles-pp.percpu_counter_add_batch
95.25 -95.2 0.00 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
95.00 -95.0 0.00 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
48.24 -47.9 0.36 perf-profile.children.cycles-pp.__vm_enough_memory
48.79 -36.8 12.00 perf-profile.children.cycles-pp.mmap_region
48.94 -33.0 15.96 perf-profile.children.cycles-pp.do_mmap
49.05 -30.1 18.97 perf-profile.children.cycles-pp.vm_mmap_pgoff
49.06 -29.8 19.22 perf-profile.children.cycles-pp.ksys_mmap_pgoff
49.65 -17.2 32.44 perf-profile.children.cycles-pp.mmap64
99.40 -14.3 85.11 perf-profile.children.cycles-pp.do_syscall_64
99.42 -13.8 85.62 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.21 ± 2% +0.0 0.24 perf-profile.children.cycles-pp.up_read
0.00 +0.1 0.06 ± 9% perf-profile.children.cycles-pp.__x86_indirect_thunk_r9
0.00 +0.1 0.07 ± 6% perf-profile.children.cycles-pp.__vma_link_file
0.00 +0.1 0.07 ± 6% perf-profile.children.cycles-pp.tlb_table_flush
0.06 ± 6% +0.1 0.13 ± 6% perf-profile.children.cycles-pp.scheduler_tick
0.00 +0.1 0.07 perf-profile.children.cycles-pp.kfree
0.00 +0.1 0.08 ± 5% perf-profile.children.cycles-pp.unlink_file_vma
0.03 ±100% +0.1 0.11 ± 10% perf-profile.children.cycles-pp.task_tick_fair
0.00 +0.1 0.08 ± 5% perf-profile.children.cycles-pp.should_failslab
0.00 +0.1 0.09 perf-profile.children.cycles-pp.vma_wants_writenotify
0.00 +0.1 0.09 perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
0.00 +0.1 0.11 ± 4% perf-profile.children.cycles-pp.get_mmap_base
0.08 ± 8% +0.1 0.19 ± 4% perf-profile.children.cycles-pp.tick_sched_handle
0.08 ± 6% +0.1 0.19 ± 6% perf-profile.children.cycles-pp.update_process_times
0.00 +0.1 0.12 perf-profile.children.cycles-pp.__rb_insert_augmented
0.00 +0.1 0.12 ± 6% perf-profile.children.cycles-pp.vm_pgprot_modify
0.00 +0.1 0.12 ± 4% perf-profile.children.cycles-pp.vm_area_free
0.00 +0.1 0.13 ± 3% perf-profile.children.cycles-pp.__vma_link_list
0.00 +0.1 0.13 ± 7% perf-profile.children.cycles-pp.__x86_retpoline_rbp
0.08 ± 5% +0.1 0.22 ± 5% perf-profile.children.cycles-pp.tick_sched_timer
0.00 +0.1 0.13 ± 8% perf-profile.children.cycles-pp.ktime_get
0.13 ± 3% +0.1 0.27 ± 3% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.00 +0.1 0.14 ± 7% perf-profile.children.cycles-pp.clockevents_program_event
0.00 +0.2 0.15 ± 3% perf-profile.children.cycles-pp.strlen
0.00 +0.2 0.18 ± 2% perf-profile.children.cycles-pp.ima_file_mmap
0.00 +0.2 0.19 perf-profile.children.cycles-pp.unmap_single_vma
0.00 +0.2 0.20 ± 2% perf-profile.children.cycles-pp.blocking_notifier_call_chain
0.00 +0.2 0.20 ± 2% perf-profile.children.cycles-pp.__x86_retpoline_r9
0.00 +0.2 0.20 ± 2% perf-profile.children.cycles-pp.userfaultfd_unmap_complete
0.00 +0.2 0.21 ± 2% perf-profile.children.cycles-pp.cap_capable
0.00 +0.2 0.21 ± 2% perf-profile.children.cycles-pp.unlink_anon_vmas
0.00 +0.2 0.21 perf-profile.children.cycles-pp.tlb_flush_mmu
0.00 +0.2 0.21 ± 3% perf-profile.children.cycles-pp.userfaultfd_unmap_prep
0.00 +0.2 0.21 ± 3% perf-profile.children.cycles-pp.may_expand_vm
0.21 ± 3% +0.2 0.43 ± 2% perf-profile.children.cycles-pp.asm_call_on_stack
0.00 +0.2 0.23 ± 3% perf-profile.children.cycles-pp.vma_merge
0.00 +0.2 0.24 perf-profile.children.cycles-pp.fpregs_assert_state_consistent
0.00 +0.2 0.24 ± 3% perf-profile.children.cycles-pp.lru_add_drain_cpu
0.00 +0.3 0.27 perf-profile.children.cycles-pp.up_write
0.00 +0.3 0.27 perf-profile.children.cycles-pp.__syscall_return_slowpath
0.00 +0.3 0.28 ± 5% perf-profile.children.cycles-pp.vmacache_update
0.29 +0.3 0.57 ± 2% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.18 ± 2% +0.3 0.47 ± 2% perf-profile.children.cycles-pp.hrtimer_interrupt
0.19 ± 3% +0.3 0.48 ± 2% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.21 ± 2% +0.3 0.51 ± 3% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.00 +0.3 0.31 perf-profile.children.cycles-pp.cap_mmap_file
0.00 +0.3 0.32 ± 2% perf-profile.children.cycles-pp.__x64_sys_mmap
0.00 +0.3 0.33 ± 2% perf-profile.children.cycles-pp.lru_add_drain
0.00 +0.3 0.34 perf-profile.children.cycles-pp.vma_set_page_prot
0.00 +0.4 0.36 perf-profile.children.cycles-pp.downgrade_write
0.00 +0.4 0.36 ± 3% perf-profile.children.cycles-pp.cap_mmap_addr
0.00 +0.4 0.37 ± 3% perf-profile.children.cycles-pp.memcpy_erms
0.00 +0.4 0.40 ± 2% perf-profile.children.cycles-pp.tlb_gather_mmu
0.00 +0.4 0.41 perf-profile.children.cycles-pp.apparmor_mmap_file
0.00 +0.5 0.46 ± 2% perf-profile.children.cycles-pp.free_pgtables
0.00 +0.5 0.51 perf-profile.children.cycles-pp.__x86_retpoline_rax
0.00 +0.5 0.51 perf-profile.children.cycles-pp.kmem_cache_free
0.00 +0.5 0.53 perf-profile.children.cycles-pp.__prepare_exit_to_usermode
0.00 +0.6 0.59 perf-profile.children.cycles-pp.security_mmap_addr
0.00 +0.6 0.60 perf-profile.children.cycles-pp.tlb_finish_mmu
0.00 +0.6 0.64 perf-profile.children.cycles-pp.cap_vm_enough_memory
0.00 +0.7 0.68 perf-profile.children.cycles-pp.strlcpy
0.00 +0.7 0.70 perf-profile.children.cycles-pp.__vma_rb_erase
0.00 +0.7 0.74 perf-profile.children.cycles-pp.__might_sleep
0.00 +0.8 0.84 perf-profile.children.cycles-pp.security_vm_enough_memory_mm
0.23 +0.9 1.15 perf-profile.children.cycles-pp.kmem_cache_alloc
0.00 +1.1 1.05 perf-profile.children.cycles-pp.vmacache_find
0.00 +1.1 1.13 perf-profile.children.cycles-pp.vm_unmapped_area
0.00 +1.3 1.28 perf-profile.children.cycles-pp.perf_event_mmap_output
0.04 ± 57% +1.3 1.34 perf-profile.children.cycles-pp.security_mmap_file
0.00 +1.3 1.31 perf-profile.children.cycles-pp.remove_vma
0.05 ± 8% +1.3 1.39 perf-profile.children.cycles-pp.down_write_killable
0.00 +1.6 1.63 perf-profile.children.cycles-pp.__vma_link_rb
0.25 +1.6 1.89 perf-profile.children.cycles-pp.vm_area_alloc
0.07 ± 7% +2.0 2.05 perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
0.05 ± 9% +2.1 2.19 perf-profile.children.cycles-pp.vma_link
0.07 ± 5% +2.4 2.50 perf-profile.children.cycles-pp.perf_iterate_sb
0.11 ± 4% +2.6 2.69 perf-profile.children.cycles-pp.find_vma
0.11 ± 3% +3.0 3.09 perf-profile.children.cycles-pp.get_unmapped_area
0.10 ± 4% +3.0 3.11 perf-profile.children.cycles-pp.rcu_all_qs
49.52 +3.9 53.47 perf-profile.children.cycles-pp.__do_munmap
0.15 ± 3% +4.6 4.75 perf-profile.children.cycles-pp.perf_event_mmap
49.77 +5.0 54.75 perf-profile.children.cycles-pp.__vm_munmap
49.80 +5.5 55.33 perf-profile.children.cycles-pp.__x64_sys_munmap
0.20 ± 3% +5.5 5.74 perf-profile.children.cycles-pp._cond_resched
0.27 +6.6 6.84 perf-profile.children.cycles-pp.entry_SYSCALL_64
0.27 +6.7 6.92 perf-profile.children.cycles-pp.syscall_return_via_sysret
0.26 +7.3 7.51 perf-profile.children.cycles-pp.free_p4d_range
0.27 +7.7 7.98 perf-profile.children.cycles-pp.free_pgd_range
0.55 +14.9 15.41 perf-profile.children.cycles-pp.___might_sleep
50.35 +18.0 68.32 perf-profile.children.cycles-pp.munmap
1.19 +34.3 35.50 perf-profile.children.cycles-pp.unmap_page_range
1.21 +35.2 36.44 perf-profile.children.cycles-pp.unmap_vmas
1.54 +44.9 46.43 perf-profile.children.cycles-pp.unmap_region
95.00 -95.0 0.00 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.21 ± 2% +0.0 0.23 perf-profile.self.cycles-pp.up_read
0.00 +0.1 0.05 perf-profile.self.cycles-pp.should_failslab
0.00 +0.1 0.05 perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
0.00 +0.1 0.05 perf-profile.self.cycles-pp.tlb_table_flush
0.00 +0.1 0.06 perf-profile.self.cycles-pp.kfree
0.00 +0.1 0.07 ± 6% perf-profile.self.cycles-pp.unlink_file_vma
0.00 +0.1 0.08 ± 6% perf-profile.self.cycles-pp.vma_wants_writenotify
0.00 +0.1 0.08 perf-profile.self.cycles-pp.lru_add_drain
0.00 +0.1 0.10 perf-profile.self.cycles-pp.get_mmap_base
0.00 +0.1 0.10 ± 4% perf-profile.self.cycles-pp.__vm_enough_memory
0.00 +0.1 0.10 ± 4% perf-profile.self.cycles-pp.__vma_link_list
0.00 +0.1 0.11 ± 4% perf-profile.self.cycles-pp.__x86_retpoline_rbp
0.00 +0.1 0.11 ± 4% perf-profile.self.cycles-pp.vm_area_free
0.00 +0.1 0.11 ± 6% perf-profile.self.cycles-pp.vm_pgprot_modify
0.00 +0.1 0.12 perf-profile.self.cycles-pp.__rb_insert_augmented
0.00 +0.1 0.12 ± 8% perf-profile.self.cycles-pp.ktime_get
0.00 +0.1 0.13 perf-profile.self.cycles-pp.security_mmap_addr
0.00 +0.1 0.13 ± 3% perf-profile.self.cycles-pp.security_vm_enough_memory_mm
0.00 +0.1 0.14 ± 3% perf-profile.self.cycles-pp.vma_set_page_prot
0.40 +0.1 0.54 ± 2% perf-profile.self.cycles-pp.percpu_counter_add_batch
0.00 +0.1 0.15 ± 3% perf-profile.self.cycles-pp.strlen
0.00 +0.1 0.15 ± 2% perf-profile.self.cycles-pp.tlb_flush_mmu
0.00 +0.2 0.17 ± 2% perf-profile.self.cycles-pp.ima_file_mmap
0.00 +0.2 0.18 ± 2% perf-profile.self.cycles-pp.unmap_single_vma
0.00 +0.2 0.18 ± 4% perf-profile.self.cycles-pp.__x86_retpoline_r9
0.00 +0.2 0.18 ± 2% perf-profile.self.cycles-pp.userfaultfd_unmap_complete
0.00 +0.2 0.19 ± 4% perf-profile.self.cycles-pp.free_pgtables
0.00 +0.2 0.19 ± 3% perf-profile.self.cycles-pp.unlink_anon_vmas
0.00 +0.2 0.19 ± 2% perf-profile.self.cycles-pp.userfaultfd_unmap_prep
0.00 +0.2 0.19 ± 2% perf-profile.self.cycles-pp.blocking_notifier_call_chain
0.00 +0.2 0.20 ± 4% perf-profile.self.cycles-pp.remove_vma
0.00 +0.2 0.20 ± 2% perf-profile.self.cycles-pp.cap_capable
0.00 +0.2 0.20 ± 4% perf-profile.self.cycles-pp.may_expand_vm
0.00 +0.2 0.21 ± 2% perf-profile.self.cycles-pp.vma_merge
0.00 +0.2 0.23 perf-profile.self.cycles-pp.fpregs_assert_state_consistent
0.00 +0.2 0.24 ± 3% perf-profile.self.cycles-pp.lru_add_drain_cpu
0.00 +0.3 0.26 perf-profile.self.cycles-pp.__syscall_return_slowpath
0.00 +0.3 0.26 ± 4% perf-profile.self.cycles-pp.vmacache_update
0.00 +0.3 0.26 perf-profile.self.cycles-pp.cap_mmap_file
0.00 +0.3 0.26 ± 3% perf-profile.self.cycles-pp.up_write
0.00 +0.3 0.27 perf-profile.self.cycles-pp.ksys_mmap_pgoff
0.00 +0.3 0.27 perf-profile.self.cycles-pp.unmap_region
0.00 +0.3 0.28 perf-profile.self.cycles-pp.strlcpy
0.00 +0.3 0.28 ± 3% perf-profile.self.cycles-pp.__x64_sys_mmap
0.00 +0.3 0.29 ± 3% perf-profile.self.cycles-pp.vma_link
0.00 +0.3 0.29 perf-profile.self.cycles-pp.__prepare_exit_to_usermode
0.00 +0.3 0.31 ± 2% perf-profile.self.cycles-pp.cap_mmap_addr
0.00 +0.3 0.31 perf-profile.self.cycles-pp.__vm_munmap
0.00 +0.3 0.35 perf-profile.self.cycles-pp.downgrade_write
0.00 +0.4 0.36 ± 2% perf-profile.self.cycles-pp.memcpy_erms
0.00 +0.4 0.36 ± 2% perf-profile.self.cycles-pp.apparmor_mmap_file
0.00 +0.4 0.37 perf-profile.self.cycles-pp.__x64_sys_munmap
0.00 +0.4 0.38 perf-profile.self.cycles-pp.unmap_vmas
0.21 +0.4 0.59 perf-profile.self.cycles-pp.kmem_cache_alloc
0.00 +0.4 0.38 ± 2% perf-profile.self.cycles-pp.cap_vm_enough_memory
0.00 +0.4 0.39 perf-profile.self.cycles-pp.tlb_finish_mmu
0.00 +0.4 0.39 ± 3% perf-profile.self.cycles-pp.get_unmapped_area
0.00 +0.4 0.39 perf-profile.self.cycles-pp.tlb_gather_mmu
0.00 +0.4 0.40 perf-profile.self.cycles-pp.vm_mmap_pgoff
0.00 +0.5 0.46 perf-profile.self.cycles-pp.__x86_retpoline_rax
0.00 +0.5 0.48 perf-profile.self.cycles-pp.free_pgd_range
0.00 +0.5 0.49 perf-profile.self.cycles-pp.kmem_cache_free
0.00 +0.5 0.50 ± 2% perf-profile.self.cycles-pp.security_mmap_file
0.00 +0.5 0.52 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.00 +0.5 0.54 perf-profile.self.cycles-pp.down_write_killable
0.00 +0.6 0.58 perf-profile.self.cycles-pp.munmap
0.00 +0.7 0.68 perf-profile.self.cycles-pp.mmap64
0.00 +0.7 0.68 perf-profile.self.cycles-pp.__might_sleep
0.00 +0.7 0.69 perf-profile.self.cycles-pp.__vma_rb_erase
0.00 +0.7 0.71 perf-profile.self.cycles-pp.vm_area_alloc
0.00 +0.8 0.78 perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown
0.00 +0.9 0.88 perf-profile.self.cycles-pp.do_mmap
0.00 +1.0 0.97 perf-profile.self.cycles-pp.vmacache_find
0.00 +1.1 1.10 perf-profile.self.cycles-pp.vm_unmapped_area
0.01 ±173% +1.1 1.12 ± 2% perf-profile.self.cycles-pp.perf_iterate_sb
0.00 +1.2 1.22 perf-profile.self.cycles-pp.perf_event_mmap_output
0.00 +1.3 1.26 perf-profile.self.cycles-pp.mmap_region
0.06 +1.3 1.39 perf-profile.self.cycles-pp.find_vma
0.03 ±100% +1.3 1.36 perf-profile.self.cycles-pp.perf_event_mmap
0.05 ± 8% +1.6 1.60 perf-profile.self.cycles-pp.__do_munmap
0.00 +1.6 1.61 perf-profile.self.cycles-pp.__vma_link_rb
0.07 +2.3 2.39 perf-profile.self.cycles-pp.rcu_all_qs
0.10 ± 4% +2.6 2.69 perf-profile.self.cycles-pp._cond_resched
0.24 +5.8 6.04 perf-profile.self.cycles-pp.entry_SYSCALL_64
0.27 +6.6 6.91 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.25 +7.2 7.46 perf-profile.self.cycles-pp.free_p4d_range
0.47 +8.8 9.28 perf-profile.self.cycles-pp.do_syscall_64
0.51 +13.7 14.24 perf-profile.self.cycles-pp.___might_sleep
0.56 ± 2% +16.9 17.48 perf-profile.self.cycles-pp.unmap_page_range
will-it-scale.per_process_ops
250000 +------------------------------------------------------------------+
| |
| |
200000 |-+ |
| |
| |
150000 |-+ |
| |
100000 |-+ |
| |
| |
50000 |-+ |
| |
|+.++.++.+++.++.++.+++.++.++.+++.++.+++.++.++.+++.++.++.+++.++.++.+|
0 +------------------------------------------------------------------+
will-it-scale.workload
3e+07 +-----------------------------------------------------------------+
| |
2.5e+07 |O+OO OOO OO OOO OO O |
| |
| |
2e+07 |-+ |
| |
1.5e+07 |-+ |
| |
1e+07 |-+ |
| |
| |
5e+06 |-+ |
|+.++.+++.++.+++.++.+++.++.+++.++.++.+++.++.+++.++.+++.++.+++.++.+|
0 +-----------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
Andrew, I do not see these patches in the mmotm tree. Is there
anything blocking them? There used to be v3 in the tree
(http://lkml.kernel.org/r/[email protected])
but that one got dropped due to some failures. I haven't seen any
failures for this one.
On Sun 21-06-20 15:36:37, Feng Tang wrote:
> [ v5 cover letter quoted in full; trimmed -- see the top of this thread ]
--
Michal Hocko
SUSE Labs