2023-02-09 19:31:23

by Roman Kagan

Subject: [PATCH v3] sched/fair: sanitize vruntime of entity being placed

From: Zhang Qiao <[email protected]>

When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
to the base level (around cfs_rq->min_vruntime), so that the entity
doesn't gain extra boost when placed backwards.

However, if the entity being placed hasn't executed for a long time, its
vruntime may fall too far behind (e.g. while cfs_rq was executing a
low-weight hog), which can invert the vruntime comparison due to s64
overflow. The entity then keeps its original vruntime, which now compares
as being far ahead, so it effectively never gets onto the CPU.

To prevent that, ignore the vruntime of the entity being placed if it
didn't execute for longer than the time that can lead to an overflow.
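
For illustration only (not part of the patch), here is a minimal
userspace sketch of the failure mode, assuming the usual two's-complement
wrap on the u64 -> s64 conversion that the kernel relies on. The helper
mirrors the comparison logic of max_vruntime() in kernel/sched/fair.c;
the kernel types are stubbed with stdint.

#include <stdint.h>
#include <stdio.h>

typedef uint64_t u64;
typedef int64_t s64;

/* Same comparison logic as max_vruntime() in kernel/sched/fair.c. */
static u64 max_vruntime(u64 max_vruntime, u64 vruntime)
{
        s64 delta = (s64)(vruntime - max_vruntime);

        if (delta > 0)
                max_vruntime = vruntime;
        return max_vruntime;
}

int main(void)
{
        u64 se_vruntime = 1000;                          /* stale, far-behind entity */
        u64 base_near = se_vruntime + (1ULL << 40);      /* ordinary gap */
        u64 base_huge = se_vruntime + (1ULL << 63) + 1;  /* gap > 2^63 */

        /* Normal case: the entity is pulled up to the cfs_rq base. */
        printf("near gap: %llu\n",
               (unsigned long long)max_vruntime(se_vruntime, base_near));

        /* Overflowed case: (s64)(base - se->vruntime) turns negative, the
         * comparison inverts, and the stale vruntime is kept instead. */
        printf("huge gap: %llu\n",
               (unsigned long long)max_vruntime(se_vruntime, base_huge));

        return 0;
}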

Signed-off-by: Zhang Qiao <[email protected]>
[rkagan: formatted, adjusted commit log, comments, cutoff value]
Co-developed-by: Roman Kagan <[email protected]>
Signed-off-by: Roman Kagan <[email protected]>
---
v2 -> v3:
- make cutoff less arbitrary and update comments [Vincent]

v1 -> v2:
- add Zhang Qiao's s-o-b
- fix constant promotion on 32bit

kernel/sched/fair.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0f8736991427..3baa6b7ea860 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4656,6 +4656,7 @@ static void
place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
{
u64 vruntime = cfs_rq->min_vruntime;
+ u64 sleep_time;

/*
* The 'current' period is already promised to the current tasks,
@@ -4685,8 +4686,24 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
vruntime -= thresh;
}

- /* ensure we never gain time by being placed backwards. */
- se->vruntime = max_vruntime(se->vruntime, vruntime);
+ /*
+ * Pull vruntime of the entity being placed to the base level of
+ * cfs_rq, to prevent boosting it if placed backwards.
+ * However, min_vruntime can advance much faster than real time, with
+ * the extreme being when an entity with the minimal weight always runs
+ * on the cfs_rq. If the new entity slept for long, its vruntime
+ * difference from min_vruntime may overflow s64 and their comparison
+ * may get inverted, so ignore the entity's original vruntime in that
+ * case.
+ * The maximal vruntime speedup is given by the ratio of normal to
+ * minimal weight: NICE_0_LOAD / MIN_SHARES, so cutting off on the
+ * sleep time of 2^63 / NICE_0_LOAD should be safe.
+ */
+ sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
+ if ((s64)sleep_time > (1ULL << 63) / NICE_0_LOAD)
+ se->vruntime = vruntime;
+ else
+ se->vruntime = max_vruntime(se->vruntime, vruntime);
}

static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
--
2.34.1
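
(As a rough sense of scale for the cutoff, assuming NICE_0_LOAD is
1024 << SCHED_FIXEDPOINT_SHIFT = 1 << 20 on 64-bit kernels: 2^63 /
NICE_0_LOAD is 2^43 ns, i.e. about 8800 seconds or roughly 2.4 hours
without executing. On 32-bit, where the load is not shifted and
NICE_0_LOAD = 1 << 10, the cutoff is 2^53 ns, around 104 days.)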









2023-02-21 09:39:13

by Vincent Guittot

Subject: Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed

On Thu, 9 Feb 2023 at 20:31, Roman Kagan <[email protected]> wrote:
>
> From: Zhang Qiao <[email protected]>
>
> When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
> to the base level (around cfs_rq->min_vruntime), so that the entity
> doesn't gain extra boost when placed backwards.
>
> However, if the entity being placed hasn't executed for a long time, its
> vruntime may fall too far behind (e.g. while cfs_rq was executing a
> low-weight hog), which can invert the vruntime comparison due to s64
> overflow. The entity then keeps its original vruntime, which now compares
> as being far ahead, so it effectively never gets onto the CPU.
>
> To prevent that, ignore the vruntime of the entity being placed if it
> didn't execute for longer than the time that can lead to an overflow.
>
> Signed-off-by: Zhang Qiao <[email protected]>
> [rkagan: formatted, adjusted commit log, comments, cutoff value]
> Co-developed-by: Roman Kagan <[email protected]>
> Signed-off-by: Roman Kagan <[email protected]>

Reviewed-by: Vincent Guittot <[email protected]>

> ---
> v2 -> v3:
> - make cutoff less arbitrary and update comments [Vincent]
>
> v1 -> v2:
> - add Zhang Qiao's s-o-b
> - fix constant promotion on 32bit
>
> kernel/sched/fair.c | 21 +++++++++++++++++++--
> 1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0f8736991427..3baa6b7ea860 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4656,6 +4656,7 @@ static void
> place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> {
> u64 vruntime = cfs_rq->min_vruntime;
> + u64 sleep_time;
>
> /*
> * The 'current' period is already promised to the current tasks,
> @@ -4685,8 +4686,24 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> vruntime -= thresh;
> }
>
> - /* ensure we never gain time by being placed backwards. */
> - se->vruntime = max_vruntime(se->vruntime, vruntime);
> + /*
> + * Pull vruntime of the entity being placed to the base level of
> + * cfs_rq, to prevent boosting it if placed backwards.
> + * However, min_vruntime can advance much faster than real time, with
> + * the extreme being when an entity with the minimal weight always runs
> + * on the cfs_rq. If the new entity slept for long, its vruntime
> + * difference from min_vruntime may overflow s64 and their comparison
> + * may get inverted, so ignore the entity's original vruntime in that
> + * case.
> + * The maximal vruntime speedup is given by the ratio of normal to
> + * minimal weight: NICE_0_LOAD / MIN_SHARES, so cutting off on the
> + * sleep time of 2^63 / NICE_0_LOAD should be safe.
> + */
> + sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> + if ((s64)sleep_time > (1ULL << 63) / NICE_0_LOAD)
> + se->vruntime = vruntime;
> + else
> + se->vruntime = max_vruntime(se->vruntime, vruntime);
> }
>
> static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
> --
> 2.34.1
>

2023-02-21 16:57:49

by Roman Kagan

Subject: Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed

On Tue, Feb 21, 2023 at 10:38:44AM +0100, Vincent Guittot wrote:
> On Thu, 9 Feb 2023 at 20:31, Roman Kagan <[email protected]> wrote:
> >
> > From: Zhang Qiao <[email protected]>
> >
> > When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
> > to the base level (around cfs_rq->min_vruntime), so that the entity
> > doesn't gain extra boost when placed backwards.
> >
> > However, if the entity being placed hasn't executed for a long time, its
> > vruntime may fall too far behind (e.g. while cfs_rq was executing a
> > low-weight hog), which can invert the vruntime comparison due to s64
> > overflow. The entity then keeps its original vruntime, which now compares
> > as being far ahead, so it effectively never gets onto the CPU.
> >
> > To prevent that, ignore the vruntime of the entity being placed if it
> > didn't execute for longer than the time that can lead to an overflow.
> >
> > Signed-off-by: Zhang Qiao <[email protected]>
> > [rkagan: formatted, adjusted commit log, comments, cutoff value]
> > Co-developed-by: Roman Kagan <[email protected]>
> > Signed-off-by: Roman Kagan <[email protected]>
>
> Reviewed-by: Vincent Guittot <[email protected]>
>
> > ---
> > v2 -> v3:
> > - make cutoff less arbitrary and update comments [Vincent]
> >
> > v1 -> v2:
> > - add Zhang Qiao's s-o-b
> > - fix constant promotion on 32bit
> >
> > kernel/sched/fair.c | 21 +++++++++++++++++++--
> > 1 file changed, 19 insertions(+), 2 deletions(-)

Turns out Peter took v2 through his tree, and it has already landed in
Linus' master.

What scares me, though, is that I've got a message from the test robot
that this commit dramatically affected hackbench results; see the quote
below. I expected the commit not to affect any benchmarks.

Any idea what could have caused this change?

Thanks,
Roman.


On Tue, Feb 21, 2023 at 03:34:16PM +0800, kernel test robot wrote:
> FYI, we noticed a 125.5% improvement of hackbench.throughput due to commit:
>
> commit: 829c1651e9c4a6f78398d3e67651cef9bb6b42cc ("sched/fair: sanitize vruntime of entity being placed")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: hackbench
> on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
> with following parameters:
>
> nr_threads: 50%
> iterations: 8
> mode: process
> ipc: pipe
> cpufreq_governor: performance
>
> test-description: Hackbench is both a benchmark and a stress test for the Linux kernel scheduler.
> test-url: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sched/cfs-scheduler/hackbench.c
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+--------------------------------------------------+
> | testcase: change | hackbench: hackbench.throughput -8.1% regression |
> | test machine | 104 threads 2 sockets (Skylake) with 192G memory |
> | test parameters | cpufreq_governor=performance |
> | | ipc=socket |
> | | iterations=4 |
> | | mode=process |
> | | nr_threads=100% |
> +------------------+--------------------------------------------------+
>
> Details are as below:
>
> =========================================================================================
> compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> gcc-11/performance/pipe/8/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/hackbench
>
> commit:
> a2e90611b9 ("sched/fair: Remove capacity inversion detection")
> 829c1651e9 ("sched/fair: sanitize vruntime of entity being placed")
>
> a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 308887 ? 5% +125.5% 696539 hackbench.throughput
> 259291 ? 2% +127.3% 589293 hackbench.throughput_avg
> 308887 ? 5% +125.5% 696539 hackbench.throughput_best
> 198770 ? 2% +105.5% 408552 ? 4% hackbench.throughput_worst
> 319.60 ? 2% -55.8% 141.24 hackbench.time.elapsed_time
> 319.60 ? 2% -55.8% 141.24 hackbench.time.elapsed_time.max
> 1.298e+09 ? 8% -87.6% 1.613e+08 ? 7% hackbench.time.involuntary_context_switches
> 477107 -12.5% 417660 hackbench.time.minor_page_faults
> 24683 ? 2% -57.2% 10562 hackbench.time.system_time
> 2136 ? 3% -45.0% 1174 hackbench.time.user_time
> 3.21e+09 ? 4% -83.0% 5.442e+08 ? 3% hackbench.time.voluntary_context_switches
> 5.28e+08 ? 4% +8.4% 5.723e+08 ? 3% cpuidle..time
> 365.97 ? 2% -48.9% 187.12 uptime.boot
> 3322559 ? 3% +34.3% 4463206 ? 15% vmstat.memory.cache
> 14194257 ? 2% -62.8% 5279904 ? 3% vmstat.system.cs
> 2120781 ? 3% -72.8% 576421 ? 4% vmstat.system.in
> 1.84 ? 12% +2.6 4.48 ? 5% mpstat.cpu.all.idle%
> 2.49 ? 3% -1.1 1.39 ? 4% mpstat.cpu.all.irq%
> 0.04 ? 12% +0.0 0.05 mpstat.cpu.all.soft%
> 7.36 +2.2 9.56 mpstat.cpu.all.usr%
> 61555 ? 6% -72.8% 16751 ? 16% numa-meminfo.node1.Active
> 61515 ? 6% -72.8% 16717 ? 16% numa-meminfo.node1.Active(anon)
> 960182 ?102% +225.6% 3125990 ? 42% numa-meminfo.node1.FilePages
> 1754002 ? 53% +137.9% 4173379 ? 34% numa-meminfo.node1.MemUsed
> 35296824 ? 6% +157.8% 91005048 numa-numastat.node0.local_node
> 35310119 ? 6% +157.9% 91058472 numa-numastat.node0.numa_hit
> 35512423 ? 5% +159.7% 92232951 numa-numastat.node1.local_node
> 35577275 ? 4% +159.4% 92273266 numa-numastat.node1.numa_hit
> 35310253 ? 6% +157.9% 91058211 numa-vmstat.node0.numa_hit
> 35296958 ? 6% +157.8% 91004787 numa-vmstat.node0.numa_local
> 15337 ? 6% -72.5% 4216 ? 17% numa-vmstat.node1.nr_active_anon
> 239988 ?102% +225.7% 781607 ? 42% numa-vmstat.node1.nr_file_pages
> 15337 ? 6% -72.5% 4216 ? 17% numa-vmstat.node1.nr_zone_active_anon
> 35577325 ? 4% +159.4% 92273215 numa-vmstat.node1.numa_hit
> 35512473 ? 5% +159.7% 92232900 numa-vmstat.node1.numa_local
> 64500 ? 8% -61.8% 24643 ? 32% meminfo.Active
> 64422 ? 8% -61.9% 24568 ? 32% meminfo.Active(anon)
> 140271 ? 14% -38.0% 86979 ? 24% meminfo.AnonHugePages
> 372672 ? 2% +13.3% 422069 meminfo.AnonPages
> 3205235 ? 3% +35.1% 4329061 ? 15% meminfo.Cached
> 1548601 ? 7% +77.4% 2747319 ? 24% meminfo.Committed_AS
> 783193 ? 14% +154.9% 1996137 ? 33% meminfo.Inactive
> 783010 ? 14% +154.9% 1995951 ? 33% meminfo.Inactive(anon)
> 4986534 ? 2% +28.2% 6394741 ? 10% meminfo.Memused
> 475092 ? 22% +236.5% 1598918 ? 41% meminfo.Shmem
> 2777 -2.1% 2719 turbostat.Bzy_MHz
> 11143123 ? 6% +72.0% 19162667 turbostat.C1
> 0.24 ? 7% +0.7 0.94 ? 3% turbostat.C1%
> 100440 ? 18% +203.8% 305136 ? 15% turbostat.C1E
> 0.06 ? 9% +0.1 0.18 ? 11% turbostat.C1E%
> 1.24 ? 3% +1.6 2.81 ? 4% turbostat.C6%
> 1.38 ? 3% +156.1% 3.55 ? 3% turbostat.CPU%c1
> 0.33 ? 5% +76.5% 0.58 ? 7% turbostat.CPU%c6
> 0.16 +31.2% 0.21 turbostat.IPC
> 6.866e+08 ? 5% -87.8% 83575393 ? 5% turbostat.IRQ
> 0.33 ? 27% +0.2 0.57 turbostat.POLL%
> 0.12 ? 10% +176.4% 0.33 ? 12% turbostat.Pkg%pc2
> 0.09 ? 7% -100.0% 0.00 turbostat.Pkg%pc6
> 61.33 +5.2% 64.50 ? 2% turbostat.PkgTmp
> 14.81 +2.0% 15.11 turbostat.RAMWatt
> 16242 ? 8% -62.0% 6179 ? 32% proc-vmstat.nr_active_anon
> 93150 ? 2% +13.2% 105429 proc-vmstat.nr_anon_pages
> 801219 ? 3% +35.1% 1082320 ? 15% proc-vmstat.nr_file_pages
> 195506 ? 14% +155.2% 498919 ? 33% proc-vmstat.nr_inactive_anon
> 118682 ? 22% +236.9% 399783 ? 41% proc-vmstat.nr_shmem
> 16242 ? 8% -62.0% 6179 ? 32% proc-vmstat.nr_zone_active_anon
> 195506 ? 14% +155.2% 498919 ? 33% proc-vmstat.nr_zone_inactive_anon
> 70889233 ? 5% +158.6% 1.833e+08 proc-vmstat.numa_hit
> 70811086 ? 5% +158.8% 1.832e+08 proc-vmstat.numa_local
> 55885 ? 22% -67.2% 18327 ? 38% proc-vmstat.numa_pages_migrated
> 422312 ? 10% -95.4% 19371 ? 7% proc-vmstat.pgactivate
> 71068460 ? 5% +158.1% 1.834e+08 proc-vmstat.pgalloc_normal
> 1554994 -19.6% 1250346 ? 4% proc-vmstat.pgfault
> 71011267 ? 5% +155.9% 1.817e+08 proc-vmstat.pgfree
> 55885 ? 22% -67.2% 18327 ? 38% proc-vmstat.pgmigrate_success
> 111247 ? 2% -35.0% 72355 ? 2% proc-vmstat.pgreuse
> 2506368 ? 2% -53.1% 1176320 proc-vmstat.unevictable_pgs_scanned
> 20.06 ? 10% -22.4% 15.56 ? 8% sched_debug.cfs_rq:/.h_nr_running.max
> 0.81 ? 32% -93.1% 0.06 ?223% sched_debug.cfs_rq:/.h_nr_running.min
> 1917 ? 34% -100.0% 0.00 sched_debug.cfs_rq:/.load.min
> 24.18 ? 10% +39.0% 33.62 ? 11% sched_debug.cfs_rq:/.load_avg.avg
> 245.61 ? 25% +66.3% 408.33 ? 22% sched_debug.cfs_rq:/.load_avg.max
> 47.52 ? 13% +72.6% 82.03 ? 8% sched_debug.cfs_rq:/.load_avg.stddev
> 13431147 -64.9% 4717147 sched_debug.cfs_rq:/.min_vruntime.avg
> 18161799 ? 7% -67.4% 5925316 ? 6% sched_debug.cfs_rq:/.min_vruntime.max
> 12413026 -65.0% 4340952 sched_debug.cfs_rq:/.min_vruntime.min
> 739748 ? 16% -66.6% 247410 ? 17% sched_debug.cfs_rq:/.min_vruntime.stddev
> 0.85 -16.4% 0.71 sched_debug.cfs_rq:/.nr_running.avg
> 0.61 ? 25% -90.9% 0.06 ?223% sched_debug.cfs_rq:/.nr_running.min
> 0.10 ? 25% +109.3% 0.22 ? 7% sched_debug.cfs_rq:/.nr_running.stddev
> 169.22 +101.7% 341.33 sched_debug.cfs_rq:/.removed.load_avg.max
> 32.41 ? 24% +100.2% 64.90 ? 16% sched_debug.cfs_rq:/.removed.load_avg.stddev
> 82.92 ? 10% +108.1% 172.56 sched_debug.cfs_rq:/.removed.runnable_avg.max
> 13.60 ? 28% +114.0% 29.10 ? 20% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
> 82.92 ? 10% +108.1% 172.56 sched_debug.cfs_rq:/.removed.util_avg.max
> 13.60 ? 28% +114.0% 29.10 ? 20% sched_debug.cfs_rq:/.removed.util_avg.stddev
> 2156 ? 12% -36.6% 1368 ? 27% sched_debug.cfs_rq:/.runnable_avg.min
> 2285 ? 7% -19.8% 1833 ? 6% sched_debug.cfs_rq:/.runnable_avg.stddev
> -2389921 -64.8% -840940 sched_debug.cfs_rq:/.spread0.min
> 739781 ? 16% -66.5% 247837 ? 17% sched_debug.cfs_rq:/.spread0.stddev
> 843.88 ? 2% -20.5% 670.53 sched_debug.cfs_rq:/.util_avg.avg
> 433.64 ? 7% -43.5% 244.83 ? 17% sched_debug.cfs_rq:/.util_avg.min
> 187.00 ? 6% +40.6% 263.02 ? 4% sched_debug.cfs_rq:/.util_avg.stddev
> 394.15 ? 14% -29.5% 278.06 ? 3% sched_debug.cfs_rq:/.util_est_enqueued.avg
> 1128 ? 12% -17.6% 930.39 ? 5% sched_debug.cfs_rq:/.util_est_enqueued.max
> 38.36 ? 29% -100.0% 0.00 sched_debug.cfs_rq:/.util_est_enqueued.min
> 3596 ? 15% -39.5% 2175 ? 7% sched_debug.cpu.avg_idle.min
> 160647 ? 9% -25.9% 118978 ? 9% sched_debug.cpu.avg_idle.stddev
> 197365 -46.2% 106170 sched_debug.cpu.clock.avg
> 197450 -46.2% 106208 sched_debug.cpu.clock.max
> 197281 -46.2% 106128 sched_debug.cpu.clock.min
> 49.96 ? 22% -53.1% 23.44 ? 19% sched_debug.cpu.clock.stddev
> 193146 -45.7% 104898 sched_debug.cpu.clock_task.avg
> 194592 -45.8% 105455 sched_debug.cpu.clock_task.max
> 177878 -49.3% 90211 sched_debug.cpu.clock_task.min
> 1794 ? 5% -10.7% 1602 ? 2% sched_debug.cpu.clock_task.stddev
> 13154 ? 2% -20.3% 10479 sched_debug.cpu.curr->pid.avg
> 15059 -17.2% 12468 sched_debug.cpu.curr->pid.max
> 7263 ? 33% -100.0% 0.00 sched_debug.cpu.curr->pid.min
> 9321 ? 36% +98.2% 18478 ? 44% sched_debug.cpu.max_idle_balance_cost.stddev
> 0.00 ? 17% -41.6% 0.00 ? 13% sched_debug.cpu.next_balance.stddev
> 20.00 ? 11% -21.4% 15.72 ? 7% sched_debug.cpu.nr_running.max
> 0.86 ? 17% -87.1% 0.11 ?141% sched_debug.cpu.nr_running.min
> 25069883 -83.7% 4084117 ? 4% sched_debug.cpu.nr_switches.avg
> 26486718 -82.8% 4544009 ? 4% sched_debug.cpu.nr_switches.max
> 23680077 -84.5% 3663816 ? 4% sched_debug.cpu.nr_switches.min
> 589836 ? 3% -68.7% 184621 ? 16% sched_debug.cpu.nr_switches.stddev
> 197278 -46.2% 106128 sched_debug.cpu_clk
> 194327 -46.9% 103176 sched_debug.ktime
> 197967 -46.0% 106821 sched_debug.sched_clk
> 14.91 -37.6% 9.31 perf-stat.i.MPKI
> 2.657e+10 +25.0% 3.32e+10 perf-stat.i.branch-instructions
> 1.17 -0.4 0.78 perf-stat.i.branch-miss-rate%
> 3.069e+08 -20.1% 2.454e+08 perf-stat.i.branch-misses
> 6.43 ? 8% +2.2 8.59 ? 4% perf-stat.i.cache-miss-rate%
> 1.952e+09 -24.3% 1.478e+09 perf-stat.i.cache-references
> 14344055 ? 2% -58.6% 5932018 ? 3% perf-stat.i.context-switches
> 1.83 -21.8% 1.43 perf-stat.i.cpi
> 2.403e+11 -3.4% 2.322e+11 perf-stat.i.cpu-cycles
> 1420139 ? 2% -38.8% 869692 ? 5% perf-stat.i.cpu-migrations
> 2619 ? 7% -15.5% 2212 ? 8% perf-stat.i.cycles-between-cache-misses
> 0.24 ? 19% -0.1 0.10 ? 17% perf-stat.i.dTLB-load-miss-rate%
> 90403286 ? 19% -55.8% 39926283 ? 16% perf-stat.i.dTLB-load-misses
> 3.823e+10 +28.6% 4.918e+10 perf-stat.i.dTLB-loads
> 0.01 ? 34% -0.0 0.01 ? 33% perf-stat.i.dTLB-store-miss-rate%
> 2779663 ? 34% -52.7% 1315899 ? 31% perf-stat.i.dTLB-store-misses
> 2.19e+10 +24.2% 2.72e+10 perf-stat.i.dTLB-stores
> 47.99 ? 2% +28.0 75.94 perf-stat.i.iTLB-load-miss-rate%
> 89417955 ? 2% +38.7% 1.24e+08 ? 4% perf-stat.i.iTLB-load-misses
> 97721514 ? 2% -58.2% 40865783 ? 3% perf-stat.i.iTLB-loads
> 1.329e+11 +26.3% 1.678e+11 perf-stat.i.instructions
> 1503 -7.7% 1388 ? 3% perf-stat.i.instructions-per-iTLB-miss
> 0.55 +30.2% 0.72 perf-stat.i.ipc
> 1.64 ? 18% +217.4% 5.20 ? 11% perf-stat.i.major-faults
> 2.73 -3.7% 2.63 perf-stat.i.metric.GHz
> 1098 ? 2% -7.1% 1020 ? 3% perf-stat.i.metric.K/sec
> 1008 +24.4% 1254 perf-stat.i.metric.M/sec
> 4334 ? 2% +90.5% 8257 ? 7% perf-stat.i.minor-faults
> 90.94 -14.9 75.99 perf-stat.i.node-load-miss-rate%
> 41932510 ? 8% -43.0% 23899176 ? 10% perf-stat.i.node-load-misses
> 3366677 ? 5% +86.2% 6267816 perf-stat.i.node-loads
> 81.77 ? 3% -36.3 45.52 ? 3% perf-stat.i.node-store-miss-rate%
> 18498318 ? 7% -31.8% 12613933 ? 7% perf-stat.i.node-store-misses
> 3023556 ? 10% +508.7% 18405880 ? 2% perf-stat.i.node-stores
> 4336 ? 2% +90.5% 8262 ? 7% perf-stat.i.page-faults
> 14.70 -41.2% 8.65 perf-stat.overall.MPKI
> 1.16 -0.4 0.72 perf-stat.overall.branch-miss-rate%
> 6.22 ? 7% +2.4 8.59 ? 4% perf-stat.overall.cache-miss-rate%
> 1.81 -24.3% 1.37 perf-stat.overall.cpi
> 0.24 ? 19% -0.2 0.07 ? 15% perf-stat.overall.dTLB-load-miss-rate%
> 0.01 ? 34% -0.0 0.00 ? 29% perf-stat.overall.dTLB-store-miss-rate%
> 47.78 ? 2% +29.3 77.12 perf-stat.overall.iTLB-load-miss-rate%
> 1486 -9.1% 1351 ? 4% perf-stat.overall.instructions-per-iTLB-miss
> 0.55 +32.0% 0.73 perf-stat.overall.ipc
> 92.54 -15.4 77.16 ? 2% perf-stat.overall.node-load-miss-rate%
> 85.82 ? 2% -48.1 37.76 ? 5% perf-stat.overall.node-store-miss-rate%
> 2.648e+10 +25.2% 3.314e+10 perf-stat.ps.branch-instructions
> 3.06e+08 -22.1% 2.383e+08 perf-stat.ps.branch-misses
> 1.947e+09 -25.5% 1.451e+09 perf-stat.ps.cache-references
> 14298713 ? 2% -62.5% 5359285 ? 3% perf-stat.ps.context-switches
> 2.396e+11 -4.0% 2.299e+11 perf-stat.ps.cpu-cycles
> 1415512 ? 2% -42.2% 817981 ? 4% perf-stat.ps.cpu-migrations
> 90073948 ? 19% -60.4% 35711862 ? 15% perf-stat.ps.dTLB-load-misses
> 3.811e+10 +29.7% 4.944e+10 perf-stat.ps.dTLB-loads
> 2767291 ? 34% -56.3% 1210210 ? 29% perf-stat.ps.dTLB-store-misses
> 2.183e+10 +25.0% 2.729e+10 perf-stat.ps.dTLB-stores
> 89118809 ? 2% +39.6% 1.244e+08 ? 4% perf-stat.ps.iTLB-load-misses
> 97404381 ? 2% -62.2% 36860047 ? 3% perf-stat.ps.iTLB-loads
> 1.324e+11 +26.7% 1.678e+11 perf-stat.ps.instructions
> 1.62 ? 18% +164.7% 4.29 ? 8% perf-stat.ps.major-faults
> 4310 ? 2% +75.1% 7549 ? 5% perf-stat.ps.minor-faults
> 41743097 ? 8% -47.3% 21984450 ? 9% perf-stat.ps.node-load-misses
> 3356259 ? 5% +92.6% 6462631 perf-stat.ps.node-loads
> 18414647 ? 7% -35.7% 11833799 ? 6% perf-stat.ps.node-store-misses
> 3019790 ? 10% +545.0% 19478071 perf-stat.ps.node-stores
> 4312 ? 2% +75.2% 7553 ? 5% perf-stat.ps.page-faults
> 4.252e+13 -43.7% 2.395e+13 perf-stat.total.instructions
> 29.92 ? 4% -22.8 7.09 ? 29% perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_write.vfs_write.ksys_write.do_syscall_64
> 28.53 ? 5% -21.6 6.92 ? 29% perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_write.vfs_write.ksys_write
> 27.86 ? 5% -21.1 6.77 ? 29% perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write.vfs_write
> 27.55 ? 5% -20.9 6.68 ? 29% perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write
> 22.28 ? 4% -17.0 5.31 ? 30% perf-profile.calltrace.cycles-pp.schedule.pipe_read.vfs_read.ksys_read.do_syscall_64
> 21.98 ? 4% -16.7 5.24 ? 30% perf-profile.calltrace.cycles-pp.__schedule.schedule.pipe_read.vfs_read.ksys_read
> 12.62 ? 4% -9.6 3.00 ? 33% perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> 34.09 -9.2 24.92 ? 3% perf-profile.calltrace.cycles-pp.pipe_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 11.48 ? 5% -8.8 2.69 ? 38% perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
> 9.60 ? 7% -7.2 2.40 ? 35% perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.pipe_read.vfs_read
> 36.39 -6.2 30.20 perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> 40.40 -6.1 34.28 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> 40.95 -5.7 35.26 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
> 37.43 -5.4 32.07 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> 6.30 ? 11% -5.2 1.09 ? 36% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> 5.66 ? 12% -5.1 0.58 ? 75% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 6.46 ? 10% -5.1 1.40 ? 28% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> 5.53 ? 13% -5.0 0.56 ? 75% perf-profile.calltrace.cycles-pp.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
> 5.42 ? 13% -4.9 0.56 ? 75% perf-profile.calltrace.cycles-pp.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
> 5.82 ? 9% -4.7 1.10 ? 37% perf-profile.calltrace.cycles-pp._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> 5.86 ? 16% -4.6 1.31 ? 37% perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> 5.26 ? 9% -4.4 0.89 ? 57% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common
> 45.18 -3.5 41.68 perf-profile.calltrace.cycles-pp.__libc_read
> 50.31 -3.2 47.12 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> 4.00 ? 27% -2.9 1.09 ? 40% perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.pipe_read
> 50.75 -2.7 48.06 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_write
> 40.80 -2.6 38.20 perf-profile.calltrace.cycles-pp.pipe_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 3.10 ? 15% -2.5 0.62 ?103% perf-profile.calltrace.cycles-pp.update_cfs_group.dequeue_task_fair.__schedule.schedule.pipe_read
> 2.94 ? 12% -2.3 0.62 ?102% perf-profile.calltrace.cycles-pp.update_cfs_group.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> 2.38 ? 9% -2.0 0.38 ?102% perf-profile.calltrace.cycles-pp._raw_spin_lock.__schedule.schedule.pipe_read.vfs_read
> 2.24 ? 7% -1.8 0.40 ? 71% perf-profile.calltrace.cycles-pp.prepare_to_wait_event.pipe_read.vfs_read.ksys_read.do_syscall_64
> 2.08 ? 6% -1.8 0.29 ?100% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.pipe_read.vfs_read
> 2.10 ? 10% -1.8 0.32 ?104% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__schedule.schedule.pipe_read
> 2.76 ? 7% -1.5 1.24 ? 17% perf-profile.calltrace.cycles-pp.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> 2.27 ? 5% -1.4 0.88 ? 11% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> 2.43 ? 7% -1.3 1.16 ? 17% perf-profile.calltrace.cycles-pp.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common
> 2.46 ? 5% -1.3 1.20 ? 7% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> 1.54 ? 5% -1.2 0.32 ?101% perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
> 0.97 ? 9% -0.3 0.66 ? 19% perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function
> 0.86 ? 6% +0.2 1.02 perf-profile.calltrace.cycles-pp.__might_fault._copy_from_iter.copy_page_from_iter.pipe_write.vfs_write
> 0.64 ? 9% +0.5 1.16 ? 5% perf-profile.calltrace.cycles-pp.mutex_unlock.pipe_read.vfs_read.ksys_read.do_syscall_64
> 0.47 ? 45% +0.5 0.99 ? 5% perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.60 ? 8% +0.5 1.13 ? 5% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> 0.00 +0.5 0.54 ? 5% perf-profile.calltrace.cycles-pp.current_time.file_update_time.pipe_write.vfs_write.ksys_write
> 0.00 +0.6 0.56 ? 4% perf-profile.calltrace.cycles-pp.__might_resched.__might_fault._copy_from_iter.copy_page_from_iter.pipe_write
> 0.00 +0.6 0.56 ? 7% perf-profile.calltrace.cycles-pp.__might_resched.__might_fault._copy_to_iter.copy_page_to_iter.pipe_read
> 0.00 +0.6 0.58 ? 5% perf-profile.calltrace.cycles-pp.__might_resched.mutex_lock.pipe_write.vfs_write.ksys_write
> 0.00 +0.6 0.62 ? 3% perf-profile.calltrace.cycles-pp.__might_resched.mutex_lock.pipe_read.vfs_read.ksys_read
> 0.00 +0.7 0.65 ? 6% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.prepare_to_wait_event.pipe_write.vfs_write
> 0.00 +0.7 0.65 ? 7% perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> 0.57 ? 5% +0.7 1.24 ? 6% perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.00 +0.7 0.72 ? 6% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.prepare_to_wait_event.pipe_write.vfs_write.ksys_write
> 0.00 +0.8 0.75 ? 6% perf-profile.calltrace.cycles-pp.mutex_spin_on_owner.__mutex_lock.pipe_write.vfs_write.ksys_write
> 0.74 ? 9% +0.8 1.48 ? 5% perf-profile.calltrace.cycles-pp.file_update_time.pipe_write.vfs_write.ksys_write.do_syscall_64
> 0.63 ? 5% +0.8 1.40 ? 5% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> 0.00 +0.8 0.78 ? 19% perf-profile.calltrace.cycles-pp.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record
> 0.00 +0.8 0.78 ? 19% perf-profile.calltrace.cycles-pp.record__finish_output.__cmd_record
> 0.00 +0.8 0.78 ? 19% perf-profile.calltrace.cycles-pp.perf_session__process_events.record__finish_output.__cmd_record
> 0.00 +0.8 0.80 ? 15% perf-profile.calltrace.cycles-pp.__cmd_record
> 0.00 +0.8 0.82 ? 11% perf-profile.calltrace.cycles-pp.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> 0.00 +0.9 0.85 ? 6% perf-profile.calltrace.cycles-pp.prepare_to_wait_event.pipe_write.vfs_write.ksys_write.do_syscall_64
> 0.00 +0.9 0.86 ? 4% perf-profile.calltrace.cycles-pp.current_time.atime_needs_update.touch_atime.pipe_read.vfs_read
> 0.00 +0.9 0.87 ? 5% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_write
> 0.00 +0.9 0.88 ? 5% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_read
> 0.26 ?100% +1.0 1.22 ? 10% perf-profile.calltrace.cycles-pp.__schedule.schedule.pipe_write.vfs_write.ksys_write
> 0.00 +1.0 0.96 ? 6% perf-profile.calltrace.cycles-pp.__might_fault._copy_to_iter.copy_page_to_iter.pipe_read.vfs_read
> 0.27 ?100% +1.0 1.23 ? 10% perf-profile.calltrace.cycles-pp.schedule.pipe_write.vfs_write.ksys_write.do_syscall_64
> 0.00 +1.0 0.97 ? 7% perf-profile.calltrace.cycles-pp.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge.__folio_put.pipe_read
> 0.87 ? 8% +1.1 1.98 ? 5% perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_read.ksys_read.do_syscall_64
> 0.73 ? 6% +1.1 1.85 ? 5% perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_write.ksys_write.do_syscall_64
> 0.00 +1.2 1.15 ? 7% perf-profile.calltrace.cycles-pp.uncharge_batch.__mem_cgroup_uncharge.__folio_put.pipe_read.vfs_read
> 0.00 +1.2 1.23 ? 6% perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge.__folio_put.pipe_read.vfs_read.ksys_read
> 0.00 +1.2 1.24 ? 7% perf-profile.calltrace.cycles-pp.__folio_put.pipe_read.vfs_read.ksys_read.do_syscall_64
> 0.48 ? 45% +1.3 1.74 ? 6% perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.pipe_read.vfs_read.ksys_read
> 0.60 ? 7% +1.3 1.87 ? 8% perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_read.vfs_read.ksys_read.do_syscall_64
> 1.23 ? 7% +1.3 2.51 ? 4% perf-profile.calltrace.cycles-pp.mutex_lock.pipe_read.vfs_read.ksys_read.do_syscall_64
> 43.42 +1.3 44.75 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> 0.83 ? 7% +1.3 2.17 ? 5% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.98 ? 7% +1.4 2.36 ? 6% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.27 ?100% +1.4 1.70 ? 9% perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_read.vfs_read.ksys_read
> 0.79 ? 8% +1.4 2.23 ? 6% perf-profile.calltrace.cycles-pp.touch_atime.pipe_read.vfs_read.ksys_read.do_syscall_64
> 0.18 ?141% +1.5 1.63 ? 9% perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_read
> 0.18 ?141% +1.5 1.67 ? 9% perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_read.vfs_read
> 0.00 +1.6 1.57 ? 10% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
> 0.00 +1.6 1.57 ? 10% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
> 1.05 ? 8% +1.7 2.73 ? 6% perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyin._copy_from_iter.copy_page_from_iter.pipe_write
> 1.84 ? 9% +1.7 3.56 ? 5% perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout._copy_to_iter.copy_page_to_iter.pipe_read
> 1.41 ? 9% +1.8 3.17 ? 6% perf-profile.calltrace.cycles-pp.copyin._copy_from_iter.copy_page_from_iter.pipe_write.vfs_write
> 0.00 +1.8 1.79 ? 9% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> 1.99 ? 9% +2.0 3.95 ? 5% perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.copy_page_to_iter.pipe_read.vfs_read
> 2.40 ? 7% +2.4 4.82 ? 5% perf-profile.calltrace.cycles-pp._copy_from_iter.copy_page_from_iter.pipe_write.vfs_write.ksys_write
> 0.00 +2.5 2.50 ? 7% perf-profile.calltrace.cycles-pp.__mutex_lock.pipe_write.vfs_write.ksys_write.do_syscall_64
> 2.89 ? 8% +2.6 5.47 ? 5% perf-profile.calltrace.cycles-pp.copy_page_from_iter.pipe_write.vfs_write.ksys_write.do_syscall_64
> 1.04 ? 30% +2.8 3.86 ? 5% perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_write
> 0.00 +2.9 2.90 ? 11% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> 0.00 +2.9 2.91 ? 11% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> 0.00 +2.9 2.91 ? 11% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
> 0.85 ? 27% +2.9 3.80 ? 5% perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_read
> 0.00 +3.0 2.96 ? 11% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
> 2.60 ? 9% +3.1 5.74 ? 6% perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.pipe_read.vfs_read.ksys_read
> 2.93 ? 9% +3.7 6.66 ? 5% perf-profile.calltrace.cycles-pp.copy_page_to_iter.pipe_read.vfs_read.ksys_read.do_syscall_64
> 1.60 ? 12% +4.6 6.18 ? 7% perf-profile.calltrace.cycles-pp.mutex_unlock.pipe_write.vfs_write.ksys_write.do_syscall_64
> 2.60 ? 10% +4.6 7.24 ? 5% perf-profile.calltrace.cycles-pp.mutex_lock.pipe_write.vfs_write.ksys_write.do_syscall_64
> 28.75 ? 5% -21.6 7.19 ? 28% perf-profile.children.cycles-pp.schedule
> 30.52 ? 4% -21.6 8.97 ? 22% perf-profile.children.cycles-pp.__wake_up_common_lock
> 28.53 ? 6% -21.0 7.56 ? 26% perf-profile.children.cycles-pp.__schedule
> 29.04 ? 5% -20.4 8.63 ? 23% perf-profile.children.cycles-pp.__wake_up_common
> 28.37 ? 5% -19.9 8.44 ? 23% perf-profile.children.cycles-pp.autoremove_wake_function
> 28.08 ? 5% -19.7 8.33 ? 23% perf-profile.children.cycles-pp.try_to_wake_up
> 13.90 ? 2% -10.2 3.75 ? 28% perf-profile.children.cycles-pp.ttwu_do_activate
> 12.66 ? 3% -9.2 3.47 ? 29% perf-profile.children.cycles-pp.enqueue_task_fair
> 34.20 -9.2 25.05 ? 3% perf-profile.children.cycles-pp.pipe_read
> 90.86 -9.1 81.73 perf-profile.children.cycles-pp.do_syscall_64
> 91.80 -8.3 83.49 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 10.28 ? 7% -7.8 2.53 ? 27% perf-profile.children.cycles-pp._raw_spin_lock
> 9.85 ? 7% -6.9 2.92 ? 29% perf-profile.children.cycles-pp.dequeue_task_fair
> 8.69 ? 7% -6.6 2.05 ? 24% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
> 8.99 ? 6% -6.2 2.81 ? 16% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> 36.46 -6.1 30.34 perf-profile.children.cycles-pp.vfs_read
> 8.38 ? 8% -5.8 2.60 ? 23% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> 6.10 ? 11% -5.4 0.66 ? 61% perf-profile.children.cycles-pp.exit_to_user_mode_loop
> 37.45 -5.3 32.13 perf-profile.children.cycles-pp.ksys_read
> 6.50 ? 35% -4.9 1.62 ? 61% perf-profile.children.cycles-pp.update_curr
> 6.56 ? 15% -4.6 1.95 ? 57% perf-profile.children.cycles-pp.update_cfs_group
> 6.38 ? 14% -4.5 1.91 ? 28% perf-profile.children.cycles-pp.enqueue_entity
> 5.74 ? 5% -3.8 1.92 ? 25% perf-profile.children.cycles-pp.update_load_avg
> 45.56 -3.8 41.75 perf-profile.children.cycles-pp.__libc_read
> 3.99 ? 4% -3.1 0.92 ? 24% perf-profile.children.cycles-pp.pick_next_task_fair
> 4.12 ? 27% -2.7 1.39 ? 34% perf-profile.children.cycles-pp.dequeue_entity
> 40.88 -2.5 38.37 perf-profile.children.cycles-pp.pipe_write
> 3.11 ? 4% -2.4 0.75 ? 22% perf-profile.children.cycles-pp.switch_mm_irqs_off
> 2.06 ? 33% -1.8 0.27 ? 27% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
> 2.38 ? 41% -1.8 0.60 ? 72% perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
> 2.29 ? 5% -1.7 0.60 ? 25% perf-profile.children.cycles-pp.switch_fpu_return
> 2.30 ? 6% -1.6 0.68 ? 18% perf-profile.children.cycles-pp.prepare_task_switch
> 1.82 ? 33% -1.6 0.22 ? 31% perf-profile.children.cycles-pp.sysvec_call_function_single
> 1.77 ? 33% -1.6 0.20 ? 32% perf-profile.children.cycles-pp.__sysvec_call_function_single
> 1.96 ? 5% -1.5 0.50 ? 20% perf-profile.children.cycles-pp.reweight_entity
> 2.80 ? 7% -1.2 1.60 ? 12% perf-profile.children.cycles-pp.select_task_rq
> 1.61 ? 6% -1.2 0.42 ? 25% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
> 1.34 ? 9% -1.2 0.16 ? 28% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
> 1.62 ? 4% -1.2 0.45 ? 22% perf-profile.children.cycles-pp.set_next_entity
> 1.55 ? 8% -1.1 0.43 ? 12% perf-profile.children.cycles-pp.update_rq_clock
> 1.49 ? 8% -1.1 0.41 ? 14% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
> 1.30 ? 20% -1.0 0.26 ? 18% perf-profile.children.cycles-pp.finish_task_switch
> 1.44 ? 5% -1.0 0.42 ? 19% perf-profile.children.cycles-pp.__switch_to_asm
> 2.47 ? 7% -1.0 1.50 ? 12% perf-profile.children.cycles-pp.select_task_rq_fair
> 2.33 ? 7% -0.9 1.40 ? 3% perf-profile.children.cycles-pp.prepare_to_wait_event
> 1.24 ? 7% -0.9 0.35 ? 14% perf-profile.children.cycles-pp.__update_load_avg_se
> 1.41 ? 32% -0.9 0.56 ? 24% perf-profile.children.cycles-pp.sched_ttwu_pending
> 2.29 ? 8% -0.8 1.45 ? 3% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> 1.04 ? 7% -0.8 0.24 ? 22% perf-profile.children.cycles-pp.check_preempt_curr
> 1.01 ? 3% -0.7 0.30 ? 20% perf-profile.children.cycles-pp.__switch_to
> 0.92 ? 7% -0.7 0.26 ? 12% perf-profile.children.cycles-pp.update_min_vruntime
> 0.71 ? 2% -0.6 0.08 ? 75% perf-profile.children.cycles-pp.put_prev_entity
> 0.76 ? 6% -0.6 0.14 ? 32% perf-profile.children.cycles-pp.check_preempt_wakeup
> 0.81 ? 66% -0.6 0.22 ? 34% perf-profile.children.cycles-pp.set_task_cpu
> 0.82 ? 17% -0.6 0.23 ? 10% perf-profile.children.cycles-pp.cpuacct_charge
> 1.08 ? 15% -0.6 0.51 ? 10% perf-profile.children.cycles-pp.wake_affine
> 0.56 ? 15% -0.5 0.03 ?100% perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
> 0.66 ? 3% -0.5 0.15 ? 28% perf-profile.children.cycles-pp.os_xsave
> 0.52 ? 44% -0.5 0.06 ?151% perf-profile.children.cycles-pp.native_irq_return_iret
> 0.55 ? 5% -0.4 0.15 ? 21% perf-profile.children.cycles-pp.__calc_delta
> 0.56 ? 10% -0.4 0.17 ? 26% perf-profile.children.cycles-pp.___perf_sw_event
> 0.70 ? 15% -0.4 0.32 ? 11% perf-profile.children.cycles-pp.task_h_load
> 0.40 ? 4% -0.3 0.06 ? 49% perf-profile.children.cycles-pp.pick_next_entity
> 0.57 ? 6% -0.3 0.26 ? 7% perf-profile.children.cycles-pp.__list_del_entry_valid
> 0.39 ? 8% -0.3 0.08 ? 24% perf-profile.children.cycles-pp.set_next_buddy
> 0.64 ? 6% -0.3 0.36 ? 6% perf-profile.children.cycles-pp._raw_spin_lock_irq
> 0.53 ? 20% -0.3 0.25 ? 8% perf-profile.children.cycles-pp.ttwu_queue_wakelist
> 0.36 ? 8% -0.3 0.08 ? 11% perf-profile.children.cycles-pp.rb_insert_color
> 0.41 ? 6% -0.3 0.14 ? 17% perf-profile.children.cycles-pp.sched_clock_cpu
> 0.36 ? 33% -0.3 0.10 ? 17% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
> 0.37 ? 4% -0.2 0.13 ? 16% perf-profile.children.cycles-pp.native_sched_clock
> 0.28 ? 5% -0.2 0.07 ? 18% perf-profile.children.cycles-pp.rb_erase
> 0.32 ? 7% -0.2 0.12 ? 10% perf-profile.children.cycles-pp.__list_add_valid
> 0.23 ? 6% -0.2 0.03 ?103% perf-profile.children.cycles-pp.resched_curr
> 0.27 ? 5% -0.2 0.08 ? 20% perf-profile.children.cycles-pp.__wrgsbase_inactive
> 0.26 ? 6% -0.2 0.08 ? 17% perf-profile.children.cycles-pp.finish_wait
> 0.26 ? 4% -0.2 0.08 ? 11% perf-profile.children.cycles-pp.rcu_note_context_switch
> 0.33 ? 21% -0.2 0.15 ? 32% perf-profile.children.cycles-pp.migrate_task_rq_fair
> 0.22 ? 9% -0.2 0.07 ? 22% perf-profile.children.cycles-pp.perf_trace_buf_update
> 0.17 ? 8% -0.1 0.03 ?100% perf-profile.children.cycles-pp.rb_next
> 0.15 ? 32% -0.1 0.03 ?100% perf-profile.children.cycles-pp.llist_reverse_order
> 0.34 ? 7% -0.1 0.26 ? 3% perf-profile.children.cycles-pp.anon_pipe_buf_release
> 0.14 ? 6% -0.1 0.07 ? 17% perf-profile.children.cycles-pp.read@plt
> 0.10 ? 17% -0.1 0.04 ? 75% perf-profile.children.cycles-pp.remove_entity_load_avg
> 0.07 ? 10% -0.0 0.02 ? 99% perf-profile.children.cycles-pp.generic_update_time
> 0.11 ? 6% -0.0 0.07 ? 8% perf-profile.children.cycles-pp.__mark_inode_dirty
> 0.00 +0.1 0.06 ? 9% perf-profile.children.cycles-pp.load_balance
> 0.00 +0.1 0.06 ? 11% perf-profile.children.cycles-pp._raw_spin_trylock
> 0.00 +0.1 0.06 ? 7% perf-profile.children.cycles-pp.uncharge_folio
> 0.00 +0.1 0.06 ? 7% perf-profile.children.cycles-pp.__do_softirq
> 0.00 +0.1 0.07 ? 10% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
> 0.00 +0.1 0.08 ? 14% perf-profile.children.cycles-pp.__get_obj_cgroup_from_memcg
> 0.15 ? 23% +0.1 0.23 ? 7% perf-profile.children.cycles-pp.task_tick_fair
> 0.19 ? 17% +0.1 0.28 ? 7% perf-profile.children.cycles-pp.scheduler_tick
> 0.00 +0.1 0.10 ? 21% perf-profile.children.cycles-pp.select_idle_core
> 0.00 +0.1 0.10 ? 9% perf-profile.children.cycles-pp.osq_unlock
> 0.23 ? 12% +0.1 0.34 ? 6% perf-profile.children.cycles-pp.update_process_times
> 0.37 ? 13% +0.1 0.48 ? 5% perf-profile.children.cycles-pp.hrtimer_interrupt
> 0.24 ? 12% +0.1 0.35 ? 6% perf-profile.children.cycles-pp.tick_sched_handle
> 0.31 ? 14% +0.1 0.43 ? 4% perf-profile.children.cycles-pp.__hrtimer_run_queues
> 0.37 ? 12% +0.1 0.49 ? 5% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
> 0.00 +0.1 0.12 ? 10% perf-profile.children.cycles-pp.__mod_memcg_state
> 0.26 ? 10% +0.1 0.38 ? 6% perf-profile.children.cycles-pp.tick_sched_timer
> 0.00 +0.1 0.13 ? 7% perf-profile.children.cycles-pp.free_unref_page
> 0.00 +0.1 0.14 ? 8% perf-profile.children.cycles-pp.rmqueue
> 0.15 ? 8% +0.2 0.30 ? 5% perf-profile.children.cycles-pp.rcu_all_qs
> 0.16 ? 6% +0.2 0.31 ? 5% perf-profile.children.cycles-pp.__x64_sys_write
> 0.00 +0.2 0.16 ? 10% perf-profile.children.cycles-pp.propagate_protected_usage
> 0.00 +0.2 0.16 ? 10% perf-profile.children.cycles-pp.menu_select
> 0.00 +0.2 0.16 ? 9% perf-profile.children.cycles-pp.memcg_account_kmem
> 0.42 ? 12% +0.2 0.57 ? 4% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
> 0.15 ? 11% +0.2 0.31 ? 8% perf-profile.children.cycles-pp.__x64_sys_read
> 0.00 +0.2 0.17 ? 8% perf-profile.children.cycles-pp.get_page_from_freelist
> 0.44 ? 11% +0.2 0.62 ? 4% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> 0.10 ? 31% +0.2 0.28 ? 24% perf-profile.children.cycles-pp.mnt_user_ns
> 0.16 ? 4% +0.2 0.35 ? 5% perf-profile.children.cycles-pp.kill_fasync
> 0.20 ? 10% +0.2 0.40 ? 3% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
> 0.09 ? 7% +0.2 0.29 ? 4% perf-profile.children.cycles-pp.page_copy_sane
> 0.08 ? 8% +0.2 0.31 ? 6% perf-profile.children.cycles-pp.rw_verify_area
> 0.12 ? 11% +0.2 0.36 ? 8% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
> 0.28 ? 12% +0.2 0.52 ? 5% perf-profile.children.cycles-pp.inode_needs_update_time
> 0.00 +0.3 0.27 ? 7% perf-profile.children.cycles-pp.__memcg_kmem_charge_page
> 0.43 ? 6% +0.3 0.73 ? 5% perf-profile.children.cycles-pp.__cond_resched
> 0.21 ? 29% +0.3 0.54 ? 15% perf-profile.children.cycles-pp.select_idle_cpu
> 0.10 ? 10% +0.3 0.43 ? 17% perf-profile.children.cycles-pp.fsnotify_perm
> 0.23 ? 11% +0.3 0.56 ? 6% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
> 0.06 ? 75% +0.4 0.47 ? 27% perf-profile.children.cycles-pp.queue_event
> 0.21 ? 9% +0.4 0.62 ? 5% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> 0.06 ? 75% +0.4 0.48 ? 26% perf-profile.children.cycles-pp.ordered_events__queue
> 0.06 ? 73% +0.4 0.50 ? 24% perf-profile.children.cycles-pp.process_simple
> 0.01 ?223% +0.4 0.44 ? 9% perf-profile.children.cycles-pp.schedule_idle
> 0.05 ? 8% +0.5 0.52 ? 7% perf-profile.children.cycles-pp.__alloc_pages
> 0.45 ? 7% +0.5 0.94 ? 5% perf-profile.children.cycles-pp.__get_task_ioprio
> 0.89 ? 8% +0.5 1.41 ? 4% perf-profile.children.cycles-pp.__might_sleep
> 0.01 ?223% +0.5 0.54 ? 21% perf-profile.children.cycles-pp.flush_smp_call_function_queue
> 0.05 ? 46% +0.5 0.60 ? 7% perf-profile.children.cycles-pp.osq_lock
> 0.34 ? 8% +0.6 0.90 ? 5% perf-profile.children.cycles-pp.aa_file_perm
> 0.01 ?223% +0.7 0.67 ? 7% perf-profile.children.cycles-pp.poll_idle
> 0.14 ? 17% +0.7 0.82 ? 6% perf-profile.children.cycles-pp.mutex_spin_on_owner
> 0.12 ? 12% +0.7 0.82 ? 15% perf-profile.children.cycles-pp.__cmd_record
> 0.07 ? 72% +0.7 0.78 ? 19% perf-profile.children.cycles-pp.reader__read_event
> 0.07 ? 72% +0.7 0.78 ? 19% perf-profile.children.cycles-pp.record__finish_output
> 0.07 ? 72% +0.7 0.78 ? 19% perf-profile.children.cycles-pp.perf_session__process_events
> 0.76 ? 8% +0.8 1.52 ? 5% perf-profile.children.cycles-pp.file_update_time
> 0.08 ? 61% +0.8 0.85 ? 11% perf-profile.children.cycles-pp.intel_idle_irq
> 1.23 ? 8% +0.9 2.11 ? 4% perf-profile.children.cycles-pp.__might_fault
> 0.02 ?141% +1.0 0.97 ? 7% perf-profile.children.cycles-pp.page_counter_uncharge
> 0.51 ? 9% +1.0 1.48 ? 4% perf-profile.children.cycles-pp.current_time
> 0.05 ? 46% +1.1 1.15 ? 7% perf-profile.children.cycles-pp.uncharge_batch
> 1.12 ? 6% +1.1 2.23 ? 5% perf-profile.children.cycles-pp.__fget_light
> 0.06 ? 14% +1.2 1.23 ? 6% perf-profile.children.cycles-pp.__mem_cgroup_uncharge
> 0.06 ? 14% +1.2 1.24 ? 7% perf-profile.children.cycles-pp.__folio_put
> 0.64 ? 7% +1.2 1.83 ? 5% perf-profile.children.cycles-pp.syscall_return_via_sysret
> 1.19 ? 8% +1.2 2.42 ? 4% perf-profile.children.cycles-pp.__might_resched
> 0.59 ? 9% +1.3 1.84 ? 6% perf-profile.children.cycles-pp.atime_needs_update
> 43.47 +1.4 44.83 perf-profile.children.cycles-pp.ksys_write
> 1.28 ? 6% +1.4 2.68 ? 5% perf-profile.children.cycles-pp.__fdget_pos
> 0.80 ? 8% +1.5 2.28 ? 6% perf-profile.children.cycles-pp.touch_atime
> 0.11 ? 49% +1.5 1.59 ? 9% perf-profile.children.cycles-pp.cpuidle_enter_state
> 0.11 ? 49% +1.5 1.60 ? 9% perf-profile.children.cycles-pp.cpuidle_enter
> 0.12 ? 51% +1.7 1.81 ? 9% perf-profile.children.cycles-pp.cpuidle_idle_call
> 1.44 ? 8% +1.8 3.22 ? 6% perf-profile.children.cycles-pp.copyin
> 2.00 ? 9% +2.0 4.03 ? 5% perf-profile.children.cycles-pp.copyout
> 1.02 ? 8% +2.0 3.07 ? 5% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> 1.63 ? 7% +2.3 3.90 ? 5% perf-profile.children.cycles-pp.apparmor_file_permission
> 2.64 ? 8% +2.3 4.98 ? 5% perf-profile.children.cycles-pp._copy_from_iter
> 0.40 ? 14% +2.5 2.92 ? 7% perf-profile.children.cycles-pp.__mutex_lock
> 2.91 ? 8% +2.6 5.54 ? 5% perf-profile.children.cycles-pp.copy_page_from_iter
> 0.17 ? 62% +2.7 2.91 ? 11% perf-profile.children.cycles-pp.start_secondary
> 1.83 ? 7% +2.8 4.59 ? 5% perf-profile.children.cycles-pp.security_file_permission
> 0.17 ? 60% +2.8 2.94 ? 11% perf-profile.children.cycles-pp.do_idle
> 0.17 ? 60% +2.8 2.96 ? 11% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
> 0.17 ? 60% +2.8 2.96 ? 11% perf-profile.children.cycles-pp.cpu_startup_entry
> 2.62 ? 9% +3.2 5.84 ? 6% perf-profile.children.cycles-pp._copy_to_iter
> 1.55 ? 8% +3.2 4.79 ? 5% perf-profile.children.cycles-pp.__entry_text_start
> 3.09 ? 8% +3.7 6.77 ? 5% perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
> 2.95 ? 9% +3.8 6.73 ? 5% perf-profile.children.cycles-pp.copy_page_to_iter
> 2.28 ? 11% +5.1 7.40 ? 6% perf-profile.children.cycles-pp.mutex_unlock
> 3.92 ? 9% +6.0 9.94 ? 5% perf-profile.children.cycles-pp.mutex_lock
> 8.37 ? 9% -5.8 2.60 ? 23% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> 6.54 ? 15% -4.6 1.95 ? 57% perf-profile.self.cycles-pp.update_cfs_group
> 3.08 ? 4% -2.3 0.74 ? 22% perf-profile.self.cycles-pp.switch_mm_irqs_off
> 2.96 ? 4% -1.8 1.13 ? 33% perf-profile.self.cycles-pp.update_load_avg
> 2.22 ? 8% -1.5 0.74 ? 12% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> 1.96 ? 9% -1.5 0.48 ? 15% perf-profile.self.cycles-pp.update_curr
> 1.94 ? 5% -1.3 0.64 ? 16% perf-profile.self.cycles-pp._raw_spin_lock
> 1.78 ? 5% -1.3 0.50 ? 18% perf-profile.self.cycles-pp.__schedule
> 1.59 ? 7% -1.2 0.40 ? 12% perf-profile.self.cycles-pp.enqueue_entity
> 1.61 ? 6% -1.2 0.42 ? 25% perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
> 1.44 ? 8% -1.0 0.39 ? 14% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
> 1.42 ? 5% -1.0 0.41 ? 19% perf-profile.self.cycles-pp.__switch_to_asm
> 1.18 ? 7% -0.9 0.33 ? 14% perf-profile.self.cycles-pp.__update_load_avg_se
> 1.14 ? 10% -0.8 0.31 ? 9% perf-profile.self.cycles-pp.update_rq_clock
> 0.90 ? 7% -0.7 0.19 ? 21% perf-profile.self.cycles-pp.pick_next_task_fair
> 1.04 ? 7% -0.7 0.33 ? 13% perf-profile.self.cycles-pp.prepare_task_switch
> 0.98 ? 4% -0.7 0.29 ? 20% perf-profile.self.cycles-pp.__switch_to
> 0.88 ? 6% -0.7 0.20 ? 17% perf-profile.self.cycles-pp.enqueue_task_fair
> 1.01 ? 6% -0.7 0.35 ? 10% perf-profile.self.cycles-pp.prepare_to_wait_event
> 0.90 ? 8% -0.6 0.25 ? 12% perf-profile.self.cycles-pp.update_min_vruntime
> 0.79 ? 17% -0.6 0.22 ? 9% perf-profile.self.cycles-pp.cpuacct_charge
> 1.10 ? 5% -0.6 0.54 ? 9% perf-profile.self.cycles-pp.try_to_wake_up
> 0.66 ? 3% -0.5 0.15 ? 27% perf-profile.self.cycles-pp.os_xsave
> 0.71 ? 6% -0.5 0.22 ? 18% perf-profile.self.cycles-pp.reweight_entity
> 0.68 ? 9% -0.5 0.19 ? 10% perf-profile.self.cycles-pp.perf_trace_sched_wakeup_template
> 0.67 ? 9% -0.5 0.18 ? 11% perf-profile.self.cycles-pp.__wake_up_common
> 0.65 ? 6% -0.5 0.17 ? 23% perf-profile.self.cycles-pp.switch_fpu_return
> 0.60 ? 11% -0.5 0.14 ? 28% perf-profile.self.cycles-pp.perf_tp_event
> 0.52 ? 44% -0.5 0.06 ?151% perf-profile.self.cycles-pp.native_irq_return_iret
> 0.52 ? 7% -0.4 0.08 ? 25% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
> 0.55 ? 4% -0.4 0.15 ? 22% perf-profile.self.cycles-pp.__calc_delta
> 0.61 ? 5% -0.4 0.21 ? 12% perf-profile.self.cycles-pp.dequeue_task_fair
> 0.69 ? 14% -0.4 0.32 ? 11% perf-profile.self.cycles-pp.task_h_load
> 0.49 ? 11% -0.3 0.15 ? 29% perf-profile.self.cycles-pp.___perf_sw_event
> 0.37 ? 4% -0.3 0.05 ? 73% perf-profile.self.cycles-pp.pick_next_entity
> 0.50 ? 3% -0.3 0.19 ? 15% perf-profile.self.cycles-pp.select_idle_sibling
> 0.38 ? 9% -0.3 0.08 ? 24% perf-profile.self.cycles-pp.set_next_buddy
> 0.32 ? 4% -0.3 0.03 ?100% perf-profile.self.cycles-pp.put_prev_entity
> 0.64 ? 6% -0.3 0.35 ? 7% perf-profile.self.cycles-pp._raw_spin_lock_irq
> 0.52 ? 5% -0.3 0.25 ? 6% perf-profile.self.cycles-pp.__list_del_entry_valid
> 0.34 ? 5% -0.3 0.07 ? 29% perf-profile.self.cycles-pp.schedule
> 0.35 ? 9% -0.3 0.08 ? 10% perf-profile.self.cycles-pp.rb_insert_color
> 0.40 ? 5% -0.3 0.14 ? 16% perf-profile.self.cycles-pp.select_task_rq_fair
> 0.33 ? 6% -0.3 0.08 ? 16% perf-profile.self.cycles-pp.check_preempt_wakeup
> 0.33 ? 8% -0.2 0.10 ? 16% perf-profile.self.cycles-pp.select_task_rq
> 0.36 ? 3% -0.2 0.13 ? 16% perf-profile.self.cycles-pp.native_sched_clock
> 0.32 ? 7% -0.2 0.10 ? 14% perf-profile.self.cycles-pp.finish_task_switch
> 0.32 ? 4% -0.2 0.11 ? 13% perf-profile.self.cycles-pp.dequeue_entity
> 0.32 ? 8% -0.2 0.12 ? 10% perf-profile.self.cycles-pp.__list_add_valid
> 0.23 ? 5% -0.2 0.03 ?103% perf-profile.self.cycles-pp.resched_curr
> 0.27 ? 6% -0.2 0.07 ? 21% perf-profile.self.cycles-pp.rb_erase
> 0.27 ? 5% -0.2 0.08 ? 20% perf-profile.self.cycles-pp.__wrgsbase_inactive
> 0.28 ? 13% -0.2 0.09 ? 12% perf-profile.self.cycles-pp.check_preempt_curr
> 0.30 ? 13% -0.2 0.12 ? 7% perf-profile.self.cycles-pp.ttwu_queue_wakelist
> 0.24 ? 5% -0.2 0.06 ? 19% perf-profile.self.cycles-pp.set_next_entity
> 0.21 ? 34% -0.2 0.04 ? 71% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
> 0.25 ? 5% -0.2 0.08 ? 16% perf-profile.self.cycles-pp.rcu_note_context_switch
> 0.19 ? 26% -0.1 0.04 ? 73% perf-profile.self.cycles-pp.perf_trace_sched_stat_runtime
> 0.20 ? 8% -0.1 0.06 ? 13% perf-profile.self.cycles-pp.ttwu_do_activate
> 0.17 ? 8% -0.1 0.03 ?100% perf-profile.self.cycles-pp.rb_next
> 0.22 ? 23% -0.1 0.09 ? 31% perf-profile.self.cycles-pp.migrate_task_rq_fair
> 0.15 ? 32% -0.1 0.03 ?100% perf-profile.self.cycles-pp.llist_reverse_order
> 0.16 ? 8% -0.1 0.06 ? 14% perf-profile.self.cycles-pp.wake_affine
> 0.10 ? 31% -0.1 0.03 ?100% perf-profile.self.cycles-pp.sched_ttwu_pending
> 0.14 ? 5% -0.1 0.07 ? 20% perf-profile.self.cycles-pp.read@plt
> 0.32 ? 8% -0.1 0.26 ? 3% perf-profile.self.cycles-pp.anon_pipe_buf_release
> 0.10 ? 6% -0.1 0.04 ? 45% perf-profile.self.cycles-pp.__wake_up_common_lock
> 0.10 ? 9% -0.0 0.07 ? 8% perf-profile.self.cycles-pp.__mark_inode_dirty
> 0.00 +0.1 0.06 ? 11% perf-profile.self.cycles-pp.free_unref_page
> 0.00 +0.1 0.06 ? 6% perf-profile.self.cycles-pp.__alloc_pages
> 0.00 +0.1 0.06 ? 11% perf-profile.self.cycles-pp._raw_spin_trylock
> 0.00 +0.1 0.06 ? 7% perf-profile.self.cycles-pp.uncharge_folio
> 0.00 +0.1 0.06 ? 11% perf-profile.self.cycles-pp.uncharge_batch
> 0.00 +0.1 0.07 ? 10% perf-profile.self.cycles-pp.menu_select
> 0.00 +0.1 0.08 ? 14% perf-profile.self.cycles-pp.__get_obj_cgroup_from_memcg
> 0.00 +0.1 0.08 ? 7% perf-profile.self.cycles-pp.__memcg_kmem_charge_page
> 0.00 +0.1 0.10 ? 10% perf-profile.self.cycles-pp.osq_unlock
> 0.07 ? 5% +0.1 0.17 ? 8% perf-profile.self.cycles-pp.copyin
> 0.00 +0.1 0.11 ? 11% perf-profile.self.cycles-pp.__mod_memcg_state
> 0.13 ? 8% +0.1 0.24 ? 6% perf-profile.self.cycles-pp.rcu_all_qs
> 0.14 ? 5% +0.1 0.28 ? 5% perf-profile.self.cycles-pp.__x64_sys_write
> 0.07 ? 10% +0.1 0.21 ? 5% perf-profile.self.cycles-pp.page_copy_sane
> 0.13 ? 12% +0.1 0.28 ? 9% perf-profile.self.cycles-pp.__x64_sys_read
> 0.00 +0.2 0.15 ? 10% perf-profile.self.cycles-pp.propagate_protected_usage
> 0.18 ? 9% +0.2 0.33 ? 4% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
> 0.07 ? 8% +0.2 0.23 ? 5% perf-profile.self.cycles-pp.rw_verify_area
> 0.08 ? 34% +0.2 0.24 ? 27% perf-profile.self.cycles-pp.mnt_user_ns
> 0.13 ? 5% +0.2 0.31 ? 7% perf-profile.self.cycles-pp.kill_fasync
> 0.21 ? 8% +0.2 0.39 ? 5% perf-profile.self.cycles-pp.__might_fault
> 0.06 ? 13% +0.2 0.26 ? 9% perf-profile.self.cycles-pp.copyout
> 0.10 ? 11% +0.2 0.31 ? 8% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
> 0.26 ? 13% +0.2 0.49 ? 6% perf-profile.self.cycles-pp.inode_needs_update_time
> 0.23 ? 8% +0.2 0.47 ? 5% perf-profile.self.cycles-pp.copy_page_from_iter
> 0.14 ? 7% +0.2 0.38 ? 6% perf-profile.self.cycles-pp.file_update_time
> 0.36 ? 7% +0.3 0.62 ? 4% perf-profile.self.cycles-pp.ksys_read
> 0.54 ? 13% +0.3 0.80 ? 4% perf-profile.self.cycles-pp._copy_from_iter
> 0.15 ? 5% +0.3 0.41 ? 8% perf-profile.self.cycles-pp.touch_atime
> 0.14 ? 5% +0.3 0.40 ? 6% perf-profile.self.cycles-pp.__cond_resched
> 0.18 ? 5% +0.3 0.47 ? 4% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> 0.16 ? 8% +0.3 0.46 ? 6% perf-profile.self.cycles-pp.syscall_enter_from_user_mode
> 0.16 ? 9% +0.3 0.47 ? 6% perf-profile.self.cycles-pp.__fdget_pos
> 1.79 ? 8% +0.3 2.12 ? 3% perf-profile.self.cycles-pp.pipe_read
> 0.10 ? 8% +0.3 0.43 ? 17% perf-profile.self.cycles-pp.fsnotify_perm
> 0.20 ? 4% +0.4 0.55 ? 5% perf-profile.self.cycles-pp.ksys_write
> 0.05 ? 76% +0.4 0.46 ? 27% perf-profile.self.cycles-pp.queue_event
> 0.32 ? 6% +0.4 0.73 ? 6% perf-profile.self.cycles-pp.exit_to_user_mode_prepare
> 0.21 ? 9% +0.4 0.62 ? 6% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
> 0.79 ? 8% +0.4 1.22 ? 4% perf-profile.self.cycles-pp.__might_sleep
> 0.44 ? 5% +0.4 0.88 ? 7% perf-profile.self.cycles-pp.do_syscall_64
> 0.26 ? 8% +0.4 0.70 ? 4% perf-profile.self.cycles-pp.atime_needs_update
> 0.42 ? 7% +0.5 0.88 ? 5% perf-profile.self.cycles-pp.__get_task_ioprio
> 0.28 ? 12% +0.5 0.75 ? 5% perf-profile.self.cycles-pp.copy_page_to_iter
> 0.19 ? 6% +0.5 0.68 ? 10% perf-profile.self.cycles-pp.security_file_permission
> 0.31 ? 8% +0.5 0.83 ? 5% perf-profile.self.cycles-pp.aa_file_perm
> 0.05 ? 46% +0.5 0.59 ? 8% perf-profile.self.cycles-pp.osq_lock
> 0.30 ? 7% +0.5 0.85 ? 6% perf-profile.self.cycles-pp._copy_to_iter
> 0.00 +0.6 0.59 ? 6% perf-profile.self.cycles-pp.poll_idle
> 0.13 ? 20% +0.7 0.81 ? 6% perf-profile.self.cycles-pp.mutex_spin_on_owner
> 0.38 ? 9% +0.7 1.12 ? 5% perf-profile.self.cycles-pp.current_time
> 0.08 ? 59% +0.8 0.82 ? 11% perf-profile.self.cycles-pp.intel_idle_irq
> 0.92 ? 6% +0.8 1.72 ? 4% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> 0.01 ?223% +0.8 0.82 ? 6% perf-profile.self.cycles-pp.page_counter_uncharge
> 0.86 ? 7% +1.1 1.91 ? 4% perf-profile.self.cycles-pp.vfs_read
> 1.07 ? 6% +1.1 2.14 ? 5% perf-profile.self.cycles-pp.__fget_light
> 0.67 ? 7% +1.1 1.74 ? 6% perf-profile.self.cycles-pp.vfs_write
> 0.15 ? 12% +1.1 1.28 ? 7% perf-profile.self.cycles-pp.__mutex_lock
> 1.09 ? 6% +1.1 2.22 ? 5% perf-profile.self.cycles-pp.__libc_read
> 0.62 ? 6% +1.2 1.79 ? 5% perf-profile.self.cycles-pp.syscall_return_via_sysret
> 1.16 ? 8% +1.2 2.38 ? 4% perf-profile.self.cycles-pp.__might_resched
> 0.91 ? 7% +1.3 2.20 ? 5% perf-profile.self.cycles-pp.__libc_write
> 0.59 ? 8% +1.3 1.93 ? 6% perf-profile.self.cycles-pp.__entry_text_start
> 1.27 ? 7% +1.7 3.00 ? 6% perf-profile.self.cycles-pp.apparmor_file_permission
> 0.99 ? 8% +2.0 2.98 ? 5% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> 1.74 ? 8% +3.4 5.15 ? 6% perf-profile.self.cycles-pp.pipe_write
> 2.98 ? 8% +3.7 6.64 ? 5% perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
> 2.62 ? 10% +4.8 7.38 ? 5% perf-profile.self.cycles-pp.mutex_lock
> 2.20 ? 10% +5.1 7.30 ? 6% perf-profile.self.cycles-pp.mutex_unlock
>
>
> ***************************************************************************************************
> lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
> =========================================================================================
> compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> gcc-11/performance/socket/4/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/hackbench
>
> commit:
> a2e90611b9 ("sched/fair: Remove capacity inversion detection")
> 829c1651e9 ("sched/fair: sanitize vruntime of entity being placed")
>
> a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 177139 -8.1% 162815 hackbench.throughput
> 174484 -18.8% 141618 ± 2% hackbench.throughput_avg
> 177139 -8.1% 162815 hackbench.throughput_best
> 168530 -37.3% 105615 ± 3% hackbench.throughput_worst
> 281.38 +23.1% 346.39 ± 2% hackbench.time.elapsed_time
> 281.38 +23.1% 346.39 ± 2% hackbench.time.elapsed_time.max
> 1.053e+08 ± 2% +688.4% 8.302e+08 ± 9% hackbench.time.involuntary_context_switches
> 21992 +27.8% 28116 ± 2% hackbench.time.system_time
> 6652 +8.2% 7196 hackbench.time.user_time
> 3.482e+08 +289.2% 1.355e+09 ± 9% hackbench.time.voluntary_context_switches
> 2110813 ? 5% +21.6% 2565791 ? 3% cpuidle..usage
> 333.95 +19.5% 399.05 uptime.boot
> 0.03 -0.0 0.03 mpstat.cpu.all.soft%
> 22.68 -2.9 19.77 mpstat.cpu.all.usr%
> 561083 ? 10% +45.5% 816171 ? 12% numa-numastat.node0.local_node
> 614314 ? 9% +36.9% 841173 ? 12% numa-numastat.node0.numa_hit
> 1393279 ? 7% -16.8% 1158997 ? 2% numa-numastat.node1.local_node
> 1443679 ? 5% -14.9% 1229074 ? 3% numa-numastat.node1.numa_hit
> 4129900 ? 8% -23.0% 3181115 vmstat.memory.cache
> 1731 +30.8% 2265 vmstat.procs.r
> 1598044 +290.3% 6237840 ? 7% vmstat.system.cs
> 320762 +60.5% 514672 ? 8% vmstat.system.in
> 962111 ? 6% +46.0% 1404646 ? 7% turbostat.C1
> 233987 ? 5% +51.2% 353892 turbostat.C1E
> 91515563 +97.3% 1.806e+08 ? 10% turbostat.IRQ
> 448466 ? 14% -34.2% 294934 ? 5% turbostat.POLL
> 34.60 -7.3% 32.07 turbostat.RAMWatt
> 514028 ? 2% -14.0% 442125 ? 2% meminfo.AnonPages
> 4006312 ? 8% -23.9% 3047078 meminfo.Cached
> 3321064 ? 10% -32.7% 2236362 ? 2% meminfo.Committed_AS
> 1714752 ? 21% -60.3% 680479 ? 8% meminfo.Inactive
> 1714585 ? 21% -60.3% 680305 ? 8% meminfo.Inactive(anon)
> 757124 ? 18% -67.2% 248485 ? 27% meminfo.Mapped
> 6476123 ? 6% -19.4% 5220738 meminfo.Memused
> 1275724 ? 26% -75.2% 316896 ? 15% meminfo.Shmem
> 6806047 ? 3% -13.3% 5901974 meminfo.max_used_kB
> 161311 ? 23% +31.7% 212494 ? 5% numa-meminfo.node0.AnonPages
> 165693 ? 22% +30.5% 216264 ? 5% numa-meminfo.node0.Inactive
> 165563 ? 22% +30.6% 216232 ? 5% numa-meminfo.node0.Inactive(anon)
> 140638 ? 19% -36.7% 89034 ? 11% numa-meminfo.node0.Mapped
> 352173 ? 14% -35.3% 227805 ? 8% numa-meminfo.node1.AnonPages
> 501396 ? 11% -22.6% 388042 ? 5% numa-meminfo.node1.AnonPages.max
> 1702242 ? 43% -77.8% 378325 ? 22% numa-meminfo.node1.FilePages
> 1540803 ? 25% -70.4% 455592 ? 13% numa-meminfo.node1.Inactive
> 1540767 ? 25% -70.4% 455451 ? 13% numa-meminfo.node1.Inactive(anon)
> 612123 ? 18% -74.9% 153752 ? 37% numa-meminfo.node1.Mapped
> 3085231 ? 24% -53.9% 1420940 ? 14% numa-meminfo.node1.MemUsed
> 254052 ? 4% -19.1% 205632 ? 21% numa-meminfo.node1.SUnreclaim
> 1259640 ? 27% -75.9% 303123 ? 15% numa-meminfo.node1.Shmem
> 304597 ? 7% -20.2% 242920 ? 17% numa-meminfo.node1.Slab
> 40345 ? 23% +31.5% 53054 ? 5% numa-vmstat.node0.nr_anon_pages
> 41412 ? 22% +30.4% 53988 ? 5% numa-vmstat.node0.nr_inactive_anon
> 35261 ? 19% -36.9% 22256 ? 12% numa-vmstat.node0.nr_mapped
> 41412 ? 22% +30.4% 53988 ? 5% numa-vmstat.node0.nr_zone_inactive_anon
> 614185 ? 9% +36.9% 841065 ? 12% numa-vmstat.node0.numa_hit
> 560955 ? 11% +45.5% 816063 ? 12% numa-vmstat.node0.numa_local
> 88129 ? 14% -35.2% 57097 ? 8% numa-vmstat.node1.nr_anon_pages
> 426425 ? 43% -77.9% 94199 ? 22% numa-vmstat.node1.nr_file_pages
> 386166 ? 25% -70.5% 113880 ? 13% numa-vmstat.node1.nr_inactive_anon
> 153658 ? 18% -75.3% 38021 ? 37% numa-vmstat.node1.nr_mapped
> 315775 ? 27% -76.1% 75399 ? 16% numa-vmstat.node1.nr_shmem
> 63411 ? 4% -18.6% 51593 ? 21% numa-vmstat.node1.nr_slab_unreclaimable
> 386166 ? 25% -70.5% 113880 ? 13% numa-vmstat.node1.nr_zone_inactive_anon
> 1443470 ? 5% -14.9% 1228740 ? 3% numa-vmstat.node1.numa_hit
> 1393069 ? 7% -16.8% 1158664 ? 2% numa-vmstat.node1.numa_local
> 128457 ? 2% -14.0% 110530 ? 3% proc-vmstat.nr_anon_pages
> 999461 ? 8% -23.8% 761774 proc-vmstat.nr_file_pages
> 426485 ? 21% -60.1% 170237 ? 9% proc-vmstat.nr_inactive_anon
> 82464 -2.6% 80281 proc-vmstat.nr_kernel_stack
> 187777 ? 18% -66.9% 62076 ? 28% proc-vmstat.nr_mapped
> 316813 ? 27% -75.0% 79228 ? 16% proc-vmstat.nr_shmem
> 31469 -2.0% 30840 proc-vmstat.nr_slab_reclaimable
> 117889 -8.4% 108036 proc-vmstat.nr_slab_unreclaimable
> 426485 ? 21% -60.1% 170237 ? 9% proc-vmstat.nr_zone_inactive_anon
> 187187 ? 12% -43.5% 105680 ? 9% proc-vmstat.numa_hint_faults
> 128363 ? 15% -61.5% 49371 ? 19% proc-vmstat.numa_hint_faults_local
> 47314 ? 22% +39.2% 65863 ? 13% proc-vmstat.numa_pages_migrated
> 457026 ? 9% -18.1% 374188 ? 13% proc-vmstat.numa_pte_updates
> 2586600 ? 3% +27.7% 3302787 ? 8% proc-vmstat.pgalloc_normal
> 1589970 -6.2% 1491838 proc-vmstat.pgfault
> 2347186 ? 10% +37.7% 3232369 ? 8% proc-vmstat.pgfree
> 47314 ? 22% +39.2% 65863 ? 13% proc-vmstat.pgmigrate_success
> 112713 +7.0% 120630 ? 3% proc-vmstat.pgreuse
> 2189056 +22.2% 2674944 ? 2% proc-vmstat.unevictable_pgs_scanned
> 14.08 ? 2% +29.3% 18.20 ? 5% sched_debug.cfs_rq:/.h_nr_running.avg
> 0.80 ? 14% +179.2% 2.23 ? 24% sched_debug.cfs_rq:/.h_nr_running.min
> 245.23 ? 12% -19.7% 196.97 ? 6% sched_debug.cfs_rq:/.load_avg.max
> 2.27 ? 16% +75.0% 3.97 ? 4% sched_debug.cfs_rq:/.load_avg.min
> 45.77 ? 16% -17.8% 37.60 ? 6% sched_debug.cfs_rq:/.load_avg.stddev
> 11842707 +39.9% 16567992 sched_debug.cfs_rq:/.min_vruntime.avg
> 13773080 ? 3% +113.9% 29460281 ? 7% sched_debug.cfs_rq:/.min_vruntime.max
> 11423218 +30.3% 14885830 sched_debug.cfs_rq:/.min_vruntime.min
> 301190 ? 12% +439.9% 1626088 ? 10% sched_debug.cfs_rq:/.min_vruntime.stddev
> 203.83 -16.3% 170.67 sched_debug.cfs_rq:/.removed.load_avg.max
> 14330 ? 3% +30.9% 18756 ? 5% sched_debug.cfs_rq:/.runnable_avg.avg
> 25115 ? 4% +15.5% 28999 ? 6% sched_debug.cfs_rq:/.runnable_avg.max
> 3811 ? 11% +68.0% 6404 ? 21% sched_debug.cfs_rq:/.runnable_avg.min
> 3818 ? 6% +15.3% 4404 ? 7% sched_debug.cfs_rq:/.runnable_avg.stddev
> -849635 +410.6% -4338612 sched_debug.cfs_rq:/.spread0.avg
> 1092373 ? 54% +691.1% 8641673 ? 21% sched_debug.cfs_rq:/.spread0.max
> -1263082 +378.1% -6038905 sched_debug.cfs_rq:/.spread0.min
> 300764 ? 12% +441.8% 1629507 ? 9% sched_debug.cfs_rq:/.spread0.stddev
> 1591 ? 4% -11.1% 1413 ? 3% sched_debug.cfs_rq:/.util_avg.max
> 288.90 ? 11% +64.5% 475.23 ? 13% sched_debug.cfs_rq:/.util_avg.min
> 240.33 ? 2% -32.1% 163.09 ? 3% sched_debug.cfs_rq:/.util_avg.stddev
> 494.27 ? 3% +41.6% 699.85 ? 3% sched_debug.cfs_rq:/.util_est_enqueued.avg
> 11.23 ? 54% +634.1% 82.47 ? 22% sched_debug.cfs_rq:/.util_est_enqueued.min
> 174576 +20.7% 210681 sched_debug.cpu.clock.avg
> 174926 +21.2% 211944 sched_debug.cpu.clock.max
> 174164 +20.3% 209436 sched_debug.cpu.clock.min
> 230.84 ? 33% +226.1% 752.67 ? 20% sched_debug.cpu.clock.stddev
> 172836 +20.6% 208504 sched_debug.cpu.clock_task.avg
> 173552 +21.0% 210079 sched_debug.cpu.clock_task.max
> 156807 +22.3% 191789 sched_debug.cpu.clock_task.min
> 1634 +17.1% 1914 ? 5% sched_debug.cpu.clock_task.stddev
> 0.00 ? 32% +220.1% 0.00 ? 20% sched_debug.cpu.next_balance.stddev
> 14.12 ? 2% +28.7% 18.18 ? 5% sched_debug.cpu.nr_running.avg
> 0.73 ? 25% +213.6% 2.30 ? 24% sched_debug.cpu.nr_running.min
> 1810086 +461.3% 10159215 ? 10% sched_debug.cpu.nr_switches.avg
> 2315994 ? 3% +515.6% 14258195 ? 9% sched_debug.cpu.nr_switches.max
> 1529863 +380.3% 7348324 ? 9% sched_debug.cpu.nr_switches.min
> 167487 ? 18% +770.8% 1458519 ? 21% sched_debug.cpu.nr_switches.stddev
> 174149 +20.2% 209410 sched_debug.cpu_clk
> 170980 +20.6% 206240 sched_debug.ktime
> 174896 +20.2% 210153 sched_debug.sched_clk
> 7.35 +24.9% 9.18 ? 4% perf-stat.i.MPKI
> 1.918e+10 +14.4% 2.194e+10 perf-stat.i.branch-instructions
> 2.16 -0.1 2.09 perf-stat.i.branch-miss-rate%
> 4.133e+08 +6.6% 4.405e+08 perf-stat.i.branch-misses
> 23.08 -9.2 13.86 ? 7% perf-stat.i.cache-miss-rate%
> 1.714e+08 -37.2% 1.076e+08 ? 3% perf-stat.i.cache-misses
> 7.497e+08 +33.7% 1.002e+09 ? 5% perf-stat.i.cache-references
> 1636365 +382.4% 7893858 ? 5% perf-stat.i.context-switches
> 2.74 -6.8% 2.56 perf-stat.i.cpi
> 131725 +288.0% 511159 ? 10% perf-stat.i.cpu-migrations
> 1672 +160.8% 4361 ? 4% perf-stat.i.cycles-between-cache-misses
> 0.49 +0.6 1.11 ? 5% perf-stat.i.dTLB-load-miss-rate%
> 1.417e+08 +158.7% 3.665e+08 ? 5% perf-stat.i.dTLB-load-misses
> 2.908e+10 +9.1% 3.172e+10 perf-stat.i.dTLB-loads
> 0.12 ? 4% +0.1 0.20 ? 4% perf-stat.i.dTLB-store-miss-rate%
> 20805655 ? 4% +90.9% 39716345 ? 4% perf-stat.i.dTLB-store-misses
> 1.755e+10 +8.6% 1.907e+10 perf-stat.i.dTLB-stores
> 29.04 +3.6 32.62 ? 2% perf-stat.i.iTLB-load-miss-rate%
> 56676082 +60.4% 90917582 ? 3% perf-stat.i.iTLB-load-misses
> 1.381e+08 +30.6% 1.804e+08 perf-stat.i.iTLB-loads
> 1.03e+11 +10.5% 1.139e+11 perf-stat.i.instructions
> 1840 -21.1% 1451 ? 4% perf-stat.i.instructions-per-iTLB-miss
> 0.37 +10.9% 0.41 perf-stat.i.ipc
> 1084 -4.5% 1035 ? 2% perf-stat.i.metric.K/sec
> 640.69 +10.3% 706.44 perf-stat.i.metric.M/sec
> 5249 -9.3% 4762 ? 3% perf-stat.i.minor-faults
> 23.57 +18.7 42.30 ? 8% perf-stat.i.node-load-miss-rate%
> 40174555 -45.0% 22109431 ? 10% perf-stat.i.node-loads
> 8.84 ? 2% +24.5 33.30 ? 10% perf-stat.i.node-store-miss-rate%
> 2912322 +60.3% 4667137 ? 16% perf-stat.i.node-store-misses
> 34046752 -50.6% 16826621 ? 9% perf-stat.i.node-stores
> 5278 -9.2% 4791 ? 3% perf-stat.i.page-faults
> 7.24 +12.1% 8.12 ? 4% perf-stat.overall.MPKI
> 2.15 -0.1 2.05 perf-stat.overall.branch-miss-rate%
> 22.92 -9.5 13.41 ? 7% perf-stat.overall.cache-miss-rate%
> 2.73 -6.3% 2.56 perf-stat.overall.cpi
> 1644 +43.4% 2358 ? 3% perf-stat.overall.cycles-between-cache-misses
> 0.48 +0.5 0.99 ? 4% perf-stat.overall.dTLB-load-miss-rate%
> 0.12 ? 4% +0.1 0.19 ? 4% perf-stat.overall.dTLB-store-miss-rate%
> 29.06 +2.9 32.01 ? 2% perf-stat.overall.iTLB-load-miss-rate%
> 1826 -26.6% 1340 ? 4% perf-stat.overall.instructions-per-iTLB-miss
> 0.37 +6.8% 0.39 perf-stat.overall.ipc
> 22.74 +6.8 29.53 ? 13% perf-stat.overall.node-load-miss-rate%
> 7.63 +8.4 16.02 ? 20% perf-stat.overall.node-store-miss-rate%
> 1.915e+10 +9.0% 2.088e+10 perf-stat.ps.branch-instructions
> 4.119e+08 +3.9% 4.282e+08 perf-stat.ps.branch-misses
> 1.707e+08 -30.5% 1.186e+08 ? 3% perf-stat.ps.cache-misses
> 7.446e+08 +19.2% 8.874e+08 ? 4% perf-stat.ps.cache-references
> 1611874 +289.1% 6271376 ? 7% perf-stat.ps.context-switches
> 127362 +189.0% 368041 ? 11% perf-stat.ps.cpu-migrations
> 1.407e+08 +116.2% 3.042e+08 ? 5% perf-stat.ps.dTLB-load-misses
> 2.901e+10 +5.4% 3.057e+10 perf-stat.ps.dTLB-loads
> 20667480 ? 4% +66.8% 34473793 ? 4% perf-stat.ps.dTLB-store-misses
> 1.751e+10 +5.1% 1.84e+10 perf-stat.ps.dTLB-stores
> 56310692 +45.0% 81644183 ? 4% perf-stat.ps.iTLB-load-misses
> 1.375e+08 +26.1% 1.733e+08 perf-stat.ps.iTLB-loads
> 1.028e+11 +6.3% 1.093e+11 perf-stat.ps.instructions
> 4929 -24.5% 3723 ? 2% perf-stat.ps.minor-faults
> 40134633 -32.9% 26946247 ? 9% perf-stat.ps.node-loads
> 2805073 +39.5% 3914304 ? 16% perf-stat.ps.node-store-misses
> 33938259 -38.9% 20726382 ? 8% perf-stat.ps.node-stores
> 4952 -24.5% 3741 ? 2% perf-stat.ps.page-faults
> 2.911e+13 +30.9% 3.809e+13 ? 2% perf-stat.total.instructions
> 15.30 ? 4% -8.6 6.66 ? 5% perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.vfs_write
> 13.84 ? 6% -7.9 5.98 ? 6% perf-profile.calltrace.cycles-pp.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
> 13.61 ? 6% -7.8 5.84 ? 6% perf-profile.calltrace.cycles-pp.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg
> 9.00 ? 2% -5.5 3.48 ? 4% perf-profile.calltrace.cycles-pp.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
> 6.44 ? 4% -4.3 2.14 ? 6% perf-profile.calltrace.cycles-pp.skb_release_data.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
> 5.83 ? 8% -3.4 2.44 ? 5% perf-profile.calltrace.cycles-pp.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
> 5.81 ? 6% -3.3 2.48 ? 6% perf-profile.calltrace.cycles-pp.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
> 5.50 ? 7% -3.2 2.32 ? 6% perf-profile.calltrace.cycles-pp.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> 5.07 ? 8% -3.0 2.04 ? 6% perf-profile.calltrace.cycles-pp.__kmem_cache_alloc_node.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
> 6.22 ? 2% -2.9 3.33 ? 3% perf-profile.calltrace.cycles-pp.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
> 6.17 ? 2% -2.9 3.30 ? 3% perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
> 6.11 ? 2% -2.9 3.24 ? 3% perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg
> 50.99 -2.6 48.39 perf-profile.calltrace.cycles-pp.__libc_read
> 5.66 ? 3% -2.3 3.35 ? 3% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_read
> 5.52 ? 3% -2.3 3.27 ? 3% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_write
> 3.14 ? 2% -1.7 1.42 ? 4% perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
> 2.73 ? 2% -1.6 1.15 ? 4% perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor
> 2.59 ? 2% -1.5 1.07 ? 4% perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter
> 2.72 ? 3% -1.4 1.34 ? 6% perf-profile.calltrace.cycles-pp.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
> 41.50 -1.2 40.27 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
> 2.26 ? 4% -1.1 1.12 perf-profile.calltrace.cycles-pp.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
> 2.76 ? 3% -1.1 1.63 ? 3% perf-profile.calltrace.cycles-pp.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor
> 2.84 ? 3% -1.1 1.71 ? 2% perf-profile.calltrace.cycles-pp.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
> 2.20 ? 4% -1.1 1.08 perf-profile.calltrace.cycles-pp.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
> 2.98 ? 2% -1.1 1.90 ? 6% perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.vfs_write
> 1.99 ? 4% -1.1 0.92 ? 2% perf-profile.calltrace.cycles-pp.sock_wfree.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic
> 2.10 ? 3% -1.0 1.08 ? 4% perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter
> 2.08 ? 4% -0.8 1.24 ? 3% perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_write
> 2.16 ? 3% -0.7 1.47 perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_read
> 2.20 ? 2% -0.7 1.52 ? 3% perf-profile.calltrace.cycles-pp.__kmem_cache_free.skb_release_data.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
> 1.46 ? 3% -0.6 0.87 ? 8% perf-profile.calltrace.cycles-pp._copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
> 4.82 ? 2% -0.6 4.24 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> 1.31 ? 2% -0.4 0.90 ? 4% perf-profile.calltrace.cycles-pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
> 0.96 ? 3% -0.4 0.57 ? 10% perf-profile.calltrace.cycles-pp.copyin._copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg
> 1.14 ? 3% -0.4 0.76 ? 5% perf-profile.calltrace.cycles-pp.memcg_slab_post_alloc_hook.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> 0.99 ? 3% -0.3 0.65 ? 8% perf-profile.calltrace.cycles-pp.memcg_slab_post_alloc_hook.__kmem_cache_alloc_node.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb
> 1.30 ? 4% -0.3 0.99 ? 3% perf-profile.calltrace.cycles-pp.sock_recvmsg.sock_read_iter.vfs_read.ksys_read.do_syscall_64
> 0.98 ? 2% -0.3 0.69 ? 3% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.67 -0.2 0.42 ? 50% perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg
> 0.56 ? 4% -0.2 0.32 ? 81% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> 0.86 ? 2% -0.2 0.63 ? 3% perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_write.ksys_write.do_syscall_64
> 1.15 ? 4% -0.2 0.93 ? 4% perf-profile.calltrace.cycles-pp.security_socket_recvmsg.sock_recvmsg.sock_read_iter.vfs_read.ksys_read
> 0.90 -0.2 0.69 ? 3% perf-profile.calltrace.cycles-pp.get_obj_cgroup_from_current.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> 1.23 ? 3% -0.2 1.07 ? 3% perf-profile.calltrace.cycles-pp.security_socket_sendmsg.sock_sendmsg.sock_write_iter.vfs_write.ksys_write
> 1.05 ? 2% -0.2 0.88 ? 2% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.84 ? 4% -0.2 0.68 ? 4% perf-profile.calltrace.cycles-pp.aa_sk_perm.security_socket_recvmsg.sock_recvmsg.sock_read_iter.vfs_read
> 0.88 -0.1 0.78 ? 5% perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_read.ksys_read.do_syscall_64
> 0.94 ? 3% -0.1 0.88 ? 4% perf-profile.calltrace.cycles-pp.aa_sk_perm.security_socket_sendmsg.sock_sendmsg.sock_write_iter.vfs_write
> 0.62 ? 2% +0.3 0.90 ? 2% perf-profile.calltrace.cycles-pp.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
> 0.00 +0.6 0.58 ? 3% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.prepare_to_wait.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
> 0.00 +0.6 0.61 ? 6% perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> 0.00 +0.6 0.62 ? 4% perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__schedule.schedule.exit_to_user_mode_loop
> 0.00 +0.7 0.67 ? 11% perf-profile.calltrace.cycles-pp.update_load_avg.dequeue_entity.dequeue_task_fair.__schedule.schedule
> 0.00 +0.7 0.67 ? 7% perf-profile.calltrace.cycles-pp.__switch_to_asm.__libc_write
> 0.00 +0.8 0.76 ? 4% perf-profile.calltrace.cycles-pp.reweight_entity.dequeue_task_fair.__schedule.schedule.schedule_timeout
> 0.00 +0.8 0.77 ? 4% perf-profile.calltrace.cycles-pp.___perf_sw_event.prepare_task_switch.__schedule.schedule.schedule_timeout
> 0.00 +0.8 0.77 ? 8% perf-profile.calltrace.cycles-pp.put_prev_entity.pick_next_task_fair.__schedule.schedule.exit_to_user_mode_loop
> 0.00 +0.8 0.81 ? 5% perf-profile.calltrace.cycles-pp.prepare_task_switch.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare
> 0.00 +0.8 0.81 ? 5% perf-profile.calltrace.cycles-pp.check_preempt_wakeup.check_preempt_curr.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> 0.00 +0.8 0.82 ? 2% perf-profile.calltrace.cycles-pp.__switch_to_asm.__libc_read
> 0.00 +0.8 0.82 ? 3% perf-profile.calltrace.cycles-pp.reweight_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> 0.00 +0.9 0.86 ? 5% perf-profile.calltrace.cycles-pp.perf_trace_sched_wakeup_template.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> 0.00 +0.9 0.87 ? 8% perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
> 29.66 +0.9 30.58 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> 0.00 +1.0 0.95 ? 3% perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__schedule.schedule.schedule_timeout
> 0.00 +1.0 0.98 ? 4% perf-profile.calltrace.cycles-pp.check_preempt_curr.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
> 0.00 +1.0 0.99 ? 3% perf-profile.calltrace.cycles-pp.update_curr.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
> 0.00 +1.0 1.05 ? 4% perf-profile.calltrace.cycles-pp.prepare_to_wait.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
> 0.00 +1.1 1.07 ? 12% perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function
> 27.81 ? 2% +1.2 28.98 perf-profile.calltrace.cycles-pp.unix_stream_recvmsg.sock_read_iter.vfs_read.ksys_read.do_syscall_64
> 27.36 ? 2% +1.2 28.59 perf-profile.calltrace.cycles-pp.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read.ksys_read
> 0.00 +1.5 1.46 ? 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common
> 0.00 +1.6 1.55 ? 4% perf-profile.calltrace.cycles-pp.prepare_task_switch.__schedule.schedule.schedule_timeout.unix_stream_data_wait
> 0.00 +1.6 1.60 ? 4% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> 27.58 +1.6 29.19 perf-profile.calltrace.cycles-pp.sock_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.00 +1.6 1.63 ? 5% perf-profile.calltrace.cycles-pp.update_curr.dequeue_entity.dequeue_task_fair.__schedule.schedule
> 0.00 +1.6 1.65 ? 5% perf-profile.calltrace.cycles-pp.restore_fpregs_from_fpstate.switch_fpu_return.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
> 0.00 +1.7 1.66 ? 6% perf-profile.calltrace.cycles-pp.switch_mm_irqs_off.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare
> 0.00 +1.8 1.80 perf-profile.calltrace.cycles-pp._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> 0.00 +1.8 1.84 ? 2% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.schedule_timeout.unix_stream_data_wait
> 0.00 +2.0 1.97 ? 2% perf-profile.calltrace.cycles-pp.switch_mm_irqs_off.__schedule.schedule.schedule_timeout.unix_stream_data_wait
> 26.63 ? 2% +2.0 28.61 perf-profile.calltrace.cycles-pp.sock_sendmsg.sock_write_iter.vfs_write.ksys_write.do_syscall_64
> 0.00 +2.0 2.01 ? 6% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare
> 0.00 +2.1 2.09 ? 6% perf-profile.calltrace.cycles-pp.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common
> 0.00 +2.1 2.11 ? 5% perf-profile.calltrace.cycles-pp.switch_fpu_return.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 25.21 ? 2% +2.2 27.43 perf-profile.calltrace.cycles-pp.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.vfs_write.ksys_write
> 0.00 +2.4 2.43 ? 5% perf-profile.calltrace.cycles-pp.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> 48.00 +2.7 50.69 perf-profile.calltrace.cycles-pp.__libc_write
> 0.00 +2.9 2.87 ? 5% perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.schedule_timeout
> 0.09 ?223% +3.4 3.47 ? 3% perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> 39.07 +4.8 43.84 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_write
> 0.66 ? 18% +5.0 5.62 ? 4% perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.schedule_timeout.unix_stream_data_wait
> 4.73 +5.1 9.88 ? 3% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> 0.66 ? 20% +5.3 5.98 ? 3% perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
> 35.96 +5.7 41.68 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> 0.00 +6.0 6.02 ? 6% perf-profile.calltrace.cycles-pp.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
> 0.00 +6.2 6.18 ? 6% perf-profile.calltrace.cycles-pp.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
> 0.00 +6.4 6.36 ? 6% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.78 ? 19% +6.4 7.15 ? 3% perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> 0.18 ?141% +7.0 7.18 ? 6% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> 1.89 ? 15% +12.1 13.96 ? 3% perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic
> 1.92 ? 15% +12.3 14.23 ? 3% perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
> 1.66 ? 19% +12.4 14.06 ? 2% perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.sock_def_readable
> 1.96 ? 15% +12.5 14.48 ? 3% perf-profile.calltrace.cycles-pp.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
> 1.69 ? 19% +12.7 14.38 ? 2% perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.sock_def_readable.unix_stream_sendmsg
> 1.75 ? 19% +13.0 14.75 ? 2% perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.sock_def_readable.unix_stream_sendmsg.sock_sendmsg
> 2.53 ? 10% +13.4 15.90 ? 2% perf-profile.calltrace.cycles-pp.sock_def_readable.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.vfs_write
> 1.96 ? 16% +13.5 15.42 ? 2% perf-profile.calltrace.cycles-pp.__wake_up_common_lock.sock_def_readable.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
> 2.28 ? 15% +14.6 16.86 ? 3% perf-profile.calltrace.cycles-pp.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
> 15.31 ? 4% -8.6 6.67 ? 5% perf-profile.children.cycles-pp.sock_alloc_send_pskb
> 13.85 ? 6% -7.9 5.98 ? 5% perf-profile.children.cycles-pp.alloc_skb_with_frags
> 13.70 ? 6% -7.8 5.89 ? 6% perf-profile.children.cycles-pp.__alloc_skb
> 9.01 ? 2% -5.5 3.48 ? 4% perf-profile.children.cycles-pp.consume_skb
> 6.86 ? 26% -4.7 2.15 ? 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> 11.27 ? 3% -4.6 6.67 ? 3% perf-profile.children.cycles-pp.syscall_return_via_sysret
> 6.46 ? 4% -4.3 2.15 ? 6% perf-profile.children.cycles-pp.skb_release_data
> 4.18 ? 25% -4.0 0.15 ? 69% perf-profile.children.cycles-pp.___slab_alloc
> 5.76 ? 32% -3.9 1.91 ? 3% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> 5.98 ? 8% -3.5 2.52 ? 5% perf-profile.children.cycles-pp.kmem_cache_alloc_node
> 5.84 ? 6% -3.3 2.50 ? 6% perf-profile.children.cycles-pp.kmalloc_reserve
> 3.33 ? 30% -3.3 0.05 ? 88% perf-profile.children.cycles-pp.get_partial_node
> 5.63 ? 7% -3.3 2.37 ? 6% perf-profile.children.cycles-pp.__kmalloc_node_track_caller
> 5.20 ? 7% -3.1 2.12 ? 6% perf-profile.children.cycles-pp.__kmem_cache_alloc_node
> 6.23 ? 2% -2.9 3.33 ? 3% perf-profile.children.cycles-pp.unix_stream_read_actor
> 6.18 ? 2% -2.9 3.31 ? 3% perf-profile.children.cycles-pp.skb_copy_datagram_iter
> 6.11 ? 2% -2.9 3.25 ? 3% perf-profile.children.cycles-pp.__skb_datagram_iter
> 51.39 -2.5 48.85 perf-profile.children.cycles-pp.__libc_read
> 3.14 ? 3% -2.5 0.61 ? 13% perf-profile.children.cycles-pp.__slab_free
> 5.34 ? 3% -2.1 3.23 ? 3% perf-profile.children.cycles-pp.__entry_text_start
> 3.57 ? 2% -1.9 1.66 ? 6% perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
> 3.16 ? 2% -1.7 1.43 ? 4% perf-profile.children.cycles-pp._copy_to_iter
> 2.74 ? 2% -1.6 1.16 ? 4% perf-profile.children.cycles-pp.copyout
> 4.16 ? 2% -1.5 2.62 ? 3% perf-profile.children.cycles-pp.__check_object_size
> 2.73 ? 3% -1.4 1.35 ? 6% perf-profile.children.cycles-pp.kmem_cache_free
> 2.82 ? 2% -1.2 1.63 ? 3% perf-profile.children.cycles-pp.check_heap_object
> 2.27 ? 4% -1.1 1.13 ? 2% perf-profile.children.cycles-pp.skb_release_head_state
> 2.85 ? 3% -1.1 1.72 ? 2% perf-profile.children.cycles-pp.simple_copy_to_iter
> 2.22 ? 4% -1.1 1.10 perf-profile.children.cycles-pp.unix_destruct_scm
> 3.00 ? 2% -1.1 1.91 ? 5% perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
> 2.00 ? 4% -1.1 0.92 ? 2% perf-profile.children.cycles-pp.sock_wfree
> 2.16 ? 3% -0.7 1.43 ? 7% perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook
> 1.45 ? 3% -0.7 0.73 ? 7% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> 2.21 ? 2% -0.7 1.52 ? 3% perf-profile.children.cycles-pp.__kmem_cache_free
> 1.49 ? 3% -0.6 0.89 ? 8% perf-profile.children.cycles-pp._copy_from_iter
> 1.40 ? 3% -0.6 0.85 ? 13% perf-profile.children.cycles-pp.mod_objcg_state
> 0.74 -0.5 0.24 ? 16% perf-profile.children.cycles-pp.__build_skb_around
> 1.48 -0.5 1.01 ? 2% perf-profile.children.cycles-pp.get_obj_cgroup_from_current
> 2.05 ? 2% -0.5 1.59 ? 2% perf-profile.children.cycles-pp.security_file_permission
> 0.98 ? 2% -0.4 0.59 ? 10% perf-profile.children.cycles-pp.copyin
> 1.08 ? 3% -0.4 0.72 ? 3% perf-profile.children.cycles-pp.__might_resched
> 1.75 -0.3 1.42 ? 4% perf-profile.children.cycles-pp.apparmor_file_permission
> 1.32 ? 4% -0.3 1.00 ? 3% perf-profile.children.cycles-pp.sock_recvmsg
> 0.54 ? 4% -0.3 0.25 ? 6% perf-profile.children.cycles-pp.skb_unlink
> 0.54 ? 6% -0.3 0.26 ? 3% perf-profile.children.cycles-pp.unix_write_space
> 0.66 ? 3% -0.3 0.39 ? 4% perf-profile.children.cycles-pp.obj_cgroup_charge
> 0.68 ? 2% -0.3 0.41 ? 4% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> 0.86 ? 4% -0.3 0.59 ? 3% perf-profile.children.cycles-pp.__check_heap_object
> 0.75 ? 9% -0.3 0.48 ? 2% perf-profile.children.cycles-pp.skb_set_owner_w
> 1.84 ? 3% -0.3 1.58 ? 4% perf-profile.children.cycles-pp.aa_sk_perm
> 0.68 ? 11% -0.2 0.44 ? 3% perf-profile.children.cycles-pp.skb_queue_tail
> 1.22 ? 4% -0.2 0.99 ? 5% perf-profile.children.cycles-pp.__fdget_pos
> 0.70 ? 2% -0.2 0.48 ? 5% perf-profile.children.cycles-pp.__get_obj_cgroup_from_memcg
> 1.16 ? 4% -0.2 0.93 ? 3% perf-profile.children.cycles-pp.security_socket_recvmsg
> 0.48 ? 3% -0.2 0.29 ? 4% perf-profile.children.cycles-pp.__might_fault
> 0.24 ? 7% -0.2 0.05 ? 56% perf-profile.children.cycles-pp.fsnotify_perm
> 1.12 ? 4% -0.2 0.93 ? 6% perf-profile.children.cycles-pp.__fget_light
> 1.24 ? 3% -0.2 1.07 ? 3% perf-profile.children.cycles-pp.security_socket_sendmsg
> 0.61 ? 3% -0.2 0.45 ? 2% perf-profile.children.cycles-pp.__might_sleep
> 0.33 ? 5% -0.2 0.17 ? 6% perf-profile.children.cycles-pp.refill_obj_stock
> 0.40 ? 2% -0.1 0.25 ? 4% perf-profile.children.cycles-pp.kmalloc_slab
> 0.57 ? 2% -0.1 0.45 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> 0.54 ? 3% -0.1 0.42 ? 2% perf-profile.children.cycles-pp.wait_for_unix_gc
> 0.42 ? 2% -0.1 0.30 ? 3% perf-profile.children.cycles-pp.is_vmalloc_addr
> 1.00 ? 2% -0.1 0.87 ? 5% perf-profile.children.cycles-pp.__virt_addr_valid
> 0.52 ? 2% -0.1 0.41 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
> 0.33 ? 3% -0.1 0.21 ? 3% perf-profile.children.cycles-pp.tick_sched_handle
> 0.36 ? 2% -0.1 0.25 ? 4% perf-profile.children.cycles-pp.tick_sched_timer
> 0.47 ? 2% -0.1 0.36 ? 2% perf-profile.children.cycles-pp.hrtimer_interrupt
> 0.48 ? 2% -0.1 0.36 ? 2% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
> 0.32 ? 3% -0.1 0.21 ? 5% perf-profile.children.cycles-pp.update_process_times
> 0.42 ? 3% -0.1 0.31 ? 2% perf-profile.children.cycles-pp.__hrtimer_run_queues
> 0.26 ? 6% -0.1 0.16 ? 4% perf-profile.children.cycles-pp.kmalloc_size_roundup
> 0.20 ? 4% -0.1 0.10 ? 9% perf-profile.children.cycles-pp.task_tick_fair
> 0.24 ? 3% -0.1 0.15 ? 4% perf-profile.children.cycles-pp.scheduler_tick
> 0.30 ? 5% -0.1 0.21 ? 8% perf-profile.children.cycles-pp.obj_cgroup_uncharge_pages
> 0.20 ? 2% -0.1 0.11 ? 6% perf-profile.children.cycles-pp.should_failslab
> 0.51 ? 2% -0.1 0.43 ? 6% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
> 0.15 ? 8% -0.1 0.07 ? 13% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
> 0.19 ? 4% -0.1 0.12 ? 5% perf-profile.children.cycles-pp.apparmor_socket_sendmsg
> 0.20 ? 4% -0.1 0.13 ? 5% perf-profile.children.cycles-pp.aa_file_perm
> 0.18 ? 5% -0.1 0.12 ? 5% perf-profile.children.cycles-pp.apparmor_socket_recvmsg
> 0.14 ? 13% -0.1 0.08 ? 55% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
> 0.24 ? 4% -0.1 0.18 ? 2% perf-profile.children.cycles-pp.rcu_all_qs
> 0.18 ? 10% -0.1 0.12 ? 11% perf-profile.children.cycles-pp.memcg_account_kmem
> 0.37 ? 3% -0.1 0.31 ? 3% perf-profile.children.cycles-pp.security_socket_getpeersec_dgram
> 0.08 -0.0 0.06 ? 8% perf-profile.children.cycles-pp.put_pid
> 0.18 ? 3% -0.0 0.16 ? 4% perf-profile.children.cycles-pp.apparmor_socket_getpeersec_dgram
> 0.21 ? 3% +0.0 0.23 ? 2% perf-profile.children.cycles-pp.__get_task_ioprio
> 0.00 +0.1 0.05 perf-profile.children.cycles-pp.perf_exclude_event
> 0.00 +0.1 0.06 ? 7% perf-profile.children.cycles-pp.invalidate_user_asid
> 0.00 +0.1 0.07 ? 6% perf-profile.children.cycles-pp.__bitmap_and
> 0.05 +0.1 0.13 ? 8% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
> 0.00 +0.1 0.08 ? 7% perf-profile.children.cycles-pp.schedule_debug
> 0.00 +0.1 0.08 ? 13% perf-profile.children.cycles-pp.read@plt
> 0.00 +0.1 0.08 ? 5% perf-profile.children.cycles-pp.sysvec_reschedule_ipi
> 0.00 +0.1 0.10 ? 4% perf-profile.children.cycles-pp.tracing_gen_ctx_irq_test
> 0.00 +0.1 0.10 ? 4% perf-profile.children.cycles-pp.place_entity
> 0.00 +0.1 0.12 ? 10% perf-profile.children.cycles-pp.native_irq_return_iret
> 0.07 ? 14% +0.1 0.19 ? 3% perf-profile.children.cycles-pp.__list_add_valid
> 0.00 +0.1 0.13 ? 6% perf-profile.children.cycles-pp.perf_trace_buf_alloc
> 0.00 +0.1 0.13 ? 34% perf-profile.children.cycles-pp._find_next_and_bit
> 0.00 +0.1 0.14 ? 5% perf-profile.children.cycles-pp.switch_ldt
> 0.00 +0.1 0.15 ? 5% perf-profile.children.cycles-pp.check_cfs_rq_runtime
> 0.00 +0.1 0.15 ? 30% perf-profile.children.cycles-pp.migrate_task_rq_fair
> 0.00 +0.2 0.15 ? 5% perf-profile.children.cycles-pp.__rdgsbase_inactive
> 0.00 +0.2 0.16 ? 3% perf-profile.children.cycles-pp.save_fpregs_to_fpstate
> 0.00 +0.2 0.16 ? 6% perf-profile.children.cycles-pp.ttwu_queue_wakelist
> 0.00 +0.2 0.17 perf-profile.children.cycles-pp.perf_trace_buf_update
> 0.00 +0.2 0.18 ? 2% perf-profile.children.cycles-pp.rb_insert_color
> 0.00 +0.2 0.18 ? 4% perf-profile.children.cycles-pp.rb_next
> 0.00 +0.2 0.18 ? 21% perf-profile.children.cycles-pp.__cgroup_account_cputime
> 0.01 ?223% +0.2 0.21 ? 28% perf-profile.children.cycles-pp.perf_trace_sched_switch
> 0.00 +0.2 0.20 ? 3% perf-profile.children.cycles-pp.select_idle_cpu
> 0.00 +0.2 0.20 ? 3% perf-profile.children.cycles-pp.rcu_note_context_switch
> 0.00 +0.2 0.21 ? 26% perf-profile.children.cycles-pp.set_task_cpu
> 0.00 +0.2 0.22 ? 8% perf-profile.children.cycles-pp.resched_curr
> 0.08 ? 5% +0.2 0.31 ? 11% perf-profile.children.cycles-pp.task_h_load
> 0.00 +0.2 0.24 ? 3% perf-profile.children.cycles-pp.finish_wait
> 0.04 ? 44% +0.3 0.29 ? 5% perf-profile.children.cycles-pp.rb_erase
> 0.19 ? 6% +0.3 0.46 perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
> 0.20 ? 6% +0.3 0.47 ? 3% perf-profile.children.cycles-pp.__list_del_entry_valid
> 0.00 +0.3 0.28 ? 3% perf-profile.children.cycles-pp.__wrgsbase_inactive
> 0.02 ?141% +0.3 0.30 ? 2% perf-profile.children.cycles-pp.native_sched_clock
> 0.06 ? 13% +0.3 0.34 ? 2% perf-profile.children.cycles-pp.sched_clock_cpu
> 0.64 ? 2% +0.3 0.93 perf-profile.children.cycles-pp.mutex_lock
> 0.00 +0.3 0.30 ? 5% perf-profile.children.cycles-pp.cr4_update_irqsoff
> 0.00 +0.3 0.30 ? 4% perf-profile.children.cycles-pp.clear_buddies
> 0.07 ? 55% +0.3 0.37 ? 5% perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
> 0.10 ? 66% +0.3 0.42 ? 5% perf-profile.children.cycles-pp.perf_tp_event
> 0.02 ?142% +0.3 0.36 ? 6% perf-profile.children.cycles-pp.cpuacct_charge
> 0.12 ? 9% +0.4 0.47 ? 11% perf-profile.children.cycles-pp.wake_affine
> 0.00 +0.4 0.36 ? 13% perf-profile.children.cycles-pp.available_idle_cpu
> 0.05 ? 48% +0.4 0.42 ? 6% perf-profile.children.cycles-pp.finish_task_switch
> 0.12 ? 4% +0.4 0.49 ? 4% perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
> 0.07 ? 17% +0.4 0.48 perf-profile.children.cycles-pp.__calc_delta
> 0.03 ?100% +0.5 0.49 ? 4% perf-profile.children.cycles-pp.pick_next_entity
> 0.00 +0.5 0.48 ? 8% perf-profile.children.cycles-pp.set_next_buddy
> 0.08 ? 14% +0.6 0.66 ? 4% perf-profile.children.cycles-pp.update_min_vruntime
> 0.07 ? 17% +0.6 0.68 ? 2% perf-profile.children.cycles-pp.os_xsave
> 0.29 ? 7% +0.7 0.99 ? 3% perf-profile.children.cycles-pp.update_cfs_group
> 0.17 ? 17% +0.7 0.87 ? 4% perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
> 0.14 ? 7% +0.7 0.87 ? 3% perf-profile.children.cycles-pp.__update_load_avg_se
> 0.14 ? 16% +0.8 0.90 ? 2% perf-profile.children.cycles-pp.update_rq_clock
> 0.08 ? 17% +0.8 0.84 ? 5% perf-profile.children.cycles-pp.check_preempt_wakeup
> 0.12 ? 14% +0.8 0.95 ? 3% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
> 0.22 ? 5% +0.8 1.07 ? 3% perf-profile.children.cycles-pp.prepare_to_wait
> 0.10 ? 18% +0.9 0.98 ? 3% perf-profile.children.cycles-pp.check_preempt_curr
> 29.72 +0.9 30.61 perf-profile.children.cycles-pp.vfs_write
> 0.14 ? 11% +0.9 1.03 ? 4% perf-profile.children.cycles-pp.__switch_to
> 0.07 ? 20% +0.9 0.99 ? 6% perf-profile.children.cycles-pp.put_prev_entity
> 0.12 ? 16% +1.0 1.13 ? 5% perf-profile.children.cycles-pp.___perf_sw_event
> 0.07 ? 17% +1.0 1.10 ? 13% perf-profile.children.cycles-pp.select_idle_sibling
> 27.82 ? 2% +1.2 28.99 perf-profile.children.cycles-pp.unix_stream_recvmsg
> 27.41 ? 2% +1.2 28.63 perf-profile.children.cycles-pp.unix_stream_read_generic
> 0.20 ? 15% +1.4 1.59 ? 3% perf-profile.children.cycles-pp.reweight_entity
> 0.21 ? 13% +1.4 1.60 ? 4% perf-profile.children.cycles-pp.__switch_to_asm
> 0.23 ? 10% +1.4 1.65 ? 5% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
> 0.20 ? 13% +1.5 1.69 ? 3% perf-profile.children.cycles-pp.set_next_entity
> 27.59 +1.6 29.19 perf-profile.children.cycles-pp.sock_write_iter
> 0.28 ? 10% +1.8 2.12 ? 5% perf-profile.children.cycles-pp.switch_fpu_return
> 0.26 ? 11% +1.8 2.10 ? 6% perf-profile.children.cycles-pp.select_task_rq_fair
> 26.66 ? 2% +2.0 28.63 perf-profile.children.cycles-pp.sock_sendmsg
> 0.31 ? 12% +2.1 2.44 ? 5% perf-profile.children.cycles-pp.select_task_rq
> 0.30 ? 14% +2.2 2.46 ? 4% perf-profile.children.cycles-pp.prepare_task_switch
> 25.27 ? 2% +2.2 27.47 perf-profile.children.cycles-pp.unix_stream_sendmsg
> 2.10 +2.3 4.38 ? 2% perf-profile.children.cycles-pp._raw_spin_lock
> 0.40 ? 14% +2.5 2.92 ? 5% perf-profile.children.cycles-pp.dequeue_entity
> 48.40 +2.6 51.02 perf-profile.children.cycles-pp.__libc_write
> 0.46 ? 15% +3.1 3.51 ? 3% perf-profile.children.cycles-pp.enqueue_entity
> 0.49 ? 10% +3.2 3.64 ? 7% perf-profile.children.cycles-pp.update_load_avg
> 0.53 ? 20% +3.4 3.91 ? 3% perf-profile.children.cycles-pp.update_curr
> 80.81 +3.4 84.24 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 0.50 ? 12% +3.5 4.00 ? 4% perf-profile.children.cycles-pp.switch_mm_irqs_off
> 0.55 ? 9% +3.8 4.38 ? 4% perf-profile.children.cycles-pp.pick_next_task_fair
> 9.60 +4.6 14.15 ? 2% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> 0.78 ? 13% +4.9 5.65 ? 4% perf-profile.children.cycles-pp.dequeue_task_fair
> 0.78 ? 15% +5.2 5.99 ? 3% perf-profile.children.cycles-pp.enqueue_task_fair
> 74.30 +5.6 79.86 perf-profile.children.cycles-pp.do_syscall_64
> 0.90 ? 15% +6.3 7.16 ? 3% perf-profile.children.cycles-pp.ttwu_do_activate
> 0.33 ? 31% +6.3 6.61 ? 6% perf-profile.children.cycles-pp.exit_to_user_mode_loop
> 0.82 ? 15% +8.1 8.92 ? 5% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
> 1.90 ? 16% +12.2 14.10 ? 2% perf-profile.children.cycles-pp.try_to_wake_up
> 2.36 ? 11% +12.2 14.60 ? 3% perf-profile.children.cycles-pp.schedule_timeout
> 1.95 ? 15% +12.5 14.41 ? 2% perf-profile.children.cycles-pp.autoremove_wake_function
> 2.01 ? 15% +12.8 14.76 ? 2% perf-profile.children.cycles-pp.__wake_up_common
> 2.23 ? 13% +13.2 15.45 ? 2% perf-profile.children.cycles-pp.__wake_up_common_lock
> 2.53 ? 10% +13.4 15.90 ? 2% perf-profile.children.cycles-pp.sock_def_readable
> 2.29 ? 15% +14.6 16.93 ? 3% perf-profile.children.cycles-pp.unix_stream_data_wait
> 2.61 ? 13% +18.0 20.65 ? 4% perf-profile.children.cycles-pp.schedule
> 2.66 ? 13% +18.1 20.77 ? 4% perf-profile.children.cycles-pp.__schedule
> 11.25 ? 3% -4.6 6.67 ? 3% perf-profile.self.cycles-pp.syscall_return_via_sysret
> 5.76 ? 32% -3.9 1.90 ? 3% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> 8.69 ? 3% -3.4 5.27 ? 3% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> 3.11 ? 3% -2.5 0.60 ? 13% perf-profile.self.cycles-pp.__slab_free
> 6.65 ? 2% -2.2 4.47 ? 2% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> 4.78 ? 3% -1.9 2.88 ? 3% perf-profile.self.cycles-pp.__entry_text_start
> 3.52 ? 2% -1.9 1.64 ? 6% perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
> 2.06 ? 3% -1.1 0.96 ? 5% perf-profile.self.cycles-pp.kmem_cache_free
> 1.42 ? 3% -1.0 0.46 ? 10% perf-profile.self.cycles-pp.check_heap_object
> 1.43 ? 4% -0.8 0.64 perf-profile.self.cycles-pp.sock_wfree
> 0.99 ? 3% -0.8 0.21 ? 12% perf-profile.self.cycles-pp.skb_release_data
> 0.84 ? 8% -0.7 0.10 ? 64% perf-profile.self.cycles-pp.___slab_alloc
> 1.97 ? 2% -0.6 1.32 perf-profile.self.cycles-pp.unix_stream_read_generic
> 1.60 ? 3% -0.5 1.11 ? 4% perf-profile.self.cycles-pp.memcg_slab_post_alloc_hook
> 1.24 ? 2% -0.5 0.75 ? 11% perf-profile.self.cycles-pp.mod_objcg_state
> 0.71 -0.5 0.23 ? 15% perf-profile.self.cycles-pp.__build_skb_around
> 0.95 ? 3% -0.5 0.50 ? 6% perf-profile.self.cycles-pp.__alloc_skb
> 0.97 ? 4% -0.4 0.55 ? 5% perf-profile.self.cycles-pp.kmem_cache_alloc_node
> 0.99 ? 3% -0.4 0.59 ? 4% perf-profile.self.cycles-pp.vfs_write
> 1.38 ? 2% -0.4 0.99 perf-profile.self.cycles-pp.__kmem_cache_free
> 0.86 ? 2% -0.4 0.50 ? 3% perf-profile.self.cycles-pp.__kmem_cache_alloc_node
> 0.92 ? 4% -0.4 0.56 ? 4% perf-profile.self.cycles-pp.sock_write_iter
> 1.06 ? 3% -0.4 0.70 ? 3% perf-profile.self.cycles-pp.__might_resched
> 0.73 ? 4% -0.3 0.44 ? 4% perf-profile.self.cycles-pp.__cond_resched
> 0.85 ? 3% -0.3 0.59 ? 4% perf-profile.self.cycles-pp.__check_heap_object
> 1.46 ? 7% -0.3 1.20 ? 2% perf-profile.self.cycles-pp.unix_stream_sendmsg
> 0.73 ? 9% -0.3 0.47 ? 2% perf-profile.self.cycles-pp.skb_set_owner_w
> 1.54 -0.3 1.28 ? 4% perf-profile.self.cycles-pp.apparmor_file_permission
> 0.74 ? 3% -0.2 0.50 ? 2% perf-profile.self.cycles-pp.get_obj_cgroup_from_current
> 1.15 ? 3% -0.2 0.91 ? 8% perf-profile.self.cycles-pp.aa_sk_perm
> 0.60 -0.2 0.36 ? 4% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> 0.65 ? 4% -0.2 0.45 ? 6% perf-profile.self.cycles-pp.__get_obj_cgroup_from_memcg
> 0.24 ? 6% -0.2 0.05 ? 56% perf-profile.self.cycles-pp.fsnotify_perm
> 0.76 ? 3% -0.2 0.58 ? 2% perf-profile.self.cycles-pp.sock_read_iter
> 1.10 ? 4% -0.2 0.92 ? 6% perf-profile.self.cycles-pp.__fget_light
> 0.42 ? 3% -0.2 0.25 ? 4% perf-profile.self.cycles-pp.obj_cgroup_charge
> 0.32 ? 4% -0.2 0.17 ? 6% perf-profile.self.cycles-pp.refill_obj_stock
> 0.29 -0.2 0.14 ? 8% perf-profile.self.cycles-pp.__kmalloc_node_track_caller
> 0.54 ? 3% -0.1 0.40 ? 2% perf-profile.self.cycles-pp.__might_sleep
> 0.30 ? 7% -0.1 0.16 ? 22% perf-profile.self.cycles-pp.security_file_permission
> 0.34 ? 3% -0.1 0.21 ? 6% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
> 0.41 ? 3% -0.1 0.29 ? 3% perf-profile.self.cycles-pp.is_vmalloc_addr
> 0.27 ? 3% -0.1 0.16 ? 6% perf-profile.self.cycles-pp._copy_from_iter
> 0.24 ? 3% -0.1 0.12 ? 3% perf-profile.self.cycles-pp.ksys_write
> 0.95 ? 2% -0.1 0.84 ? 5% perf-profile.self.cycles-pp.__virt_addr_valid
> 0.56 ? 11% -0.1 0.46 ? 4% perf-profile.self.cycles-pp.sock_def_readable
> 0.16 ? 7% -0.1 0.06 ? 18% perf-profile.self.cycles-pp.sock_recvmsg
> 0.22 ? 5% -0.1 0.14 ? 2% perf-profile.self.cycles-pp.ksys_read
> 0.27 ? 4% -0.1 0.19 ? 5% perf-profile.self.cycles-pp.kmalloc_slab
> 0.28 ? 2% -0.1 0.20 ? 2% perf-profile.self.cycles-pp.consume_skb
> 0.35 ? 2% -0.1 0.28 ? 3% perf-profile.self.cycles-pp.__check_object_size
> 0.13 ? 8% -0.1 0.06 ? 18% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
> 0.20 ? 5% -0.1 0.12 ? 6% perf-profile.self.cycles-pp.kmalloc_reserve
> 0.26 ? 5% -0.1 0.19 ? 4% perf-profile.self.cycles-pp.sock_alloc_send_pskb
> 0.42 ? 2% -0.1 0.35 ? 7% perf-profile.self.cycles-pp.syscall_enter_from_user_mode
> 0.19 ? 5% -0.1 0.12 ? 6% perf-profile.self.cycles-pp.aa_file_perm
> 0.16 ? 4% -0.1 0.10 ? 4% perf-profile.self.cycles-pp.skb_copy_datagram_from_iter
> 0.18 ? 4% -0.1 0.12 ? 6% perf-profile.self.cycles-pp.apparmor_socket_sendmsg
> 0.18 ? 5% -0.1 0.12 ? 4% perf-profile.self.cycles-pp.apparmor_socket_recvmsg
> 0.15 ? 5% -0.1 0.10 ? 5% perf-profile.self.cycles-pp.alloc_skb_with_frags
> 0.64 ? 3% -0.1 0.59 perf-profile.self.cycles-pp.__libc_write
> 0.20 ? 4% -0.1 0.15 ? 3% perf-profile.self.cycles-pp._copy_to_iter
> 0.15 ? 5% -0.1 0.10 ? 11% perf-profile.self.cycles-pp.sock_sendmsg
> 0.08 ? 4% -0.1 0.03 ? 81% perf-profile.self.cycles-pp.copyout
> 0.11 ? 6% -0.0 0.06 ? 7% perf-profile.self.cycles-pp.__fdget_pos
> 0.12 ? 5% -0.0 0.07 ? 10% perf-profile.self.cycles-pp.kmalloc_size_roundup
> 0.34 ? 3% -0.0 0.29 perf-profile.self.cycles-pp.do_syscall_64
> 0.20 ? 4% -0.0 0.15 ? 4% perf-profile.self.cycles-pp.rcu_all_qs
> 0.41 ? 3% -0.0 0.37 ? 8% perf-profile.self.cycles-pp.unix_stream_recvmsg
> 0.22 ? 2% -0.0 0.17 ? 4% perf-profile.self.cycles-pp.unix_destruct_scm
> 0.09 ? 4% -0.0 0.05 perf-profile.self.cycles-pp.should_failslab
> 0.10 ? 15% -0.0 0.06 ? 50% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
> 0.11 ? 4% -0.0 0.07 perf-profile.self.cycles-pp.__might_fault
> 0.16 ? 2% -0.0 0.13 ? 6% perf-profile.self.cycles-pp.obj_cgroup_uncharge_pages
> 0.18 ? 4% -0.0 0.16 ? 3% perf-profile.self.cycles-pp.security_socket_getpeersec_dgram
> 0.28 ? 2% -0.0 0.25 ? 2% perf-profile.self.cycles-pp.unix_write_space
> 0.17 ? 2% -0.0 0.15 ? 5% perf-profile.self.cycles-pp.apparmor_socket_getpeersec_dgram
> 0.08 ? 6% -0.0 0.05 ? 7% perf-profile.self.cycles-pp.security_socket_sendmsg
> 0.12 ? 4% -0.0 0.10 ? 3% perf-profile.self.cycles-pp.__skb_datagram_iter
> 0.24 ? 2% -0.0 0.22 perf-profile.self.cycles-pp.mutex_unlock
> 0.08 ? 5% +0.0 0.10 ? 6% perf-profile.self.cycles-pp.scm_recv
> 0.17 ? 2% +0.0 0.19 ? 3% perf-profile.self.cycles-pp.__x64_sys_read
> 0.19 ? 3% +0.0 0.22 ? 2% perf-profile.self.cycles-pp.__get_task_ioprio
> 0.00 +0.1 0.06 perf-profile.self.cycles-pp.finish_wait
> 0.00 +0.1 0.06 ? 7% perf-profile.self.cycles-pp.cr4_update_irqsoff
> 0.00 +0.1 0.06 ? 7% perf-profile.self.cycles-pp.invalidate_user_asid
> 0.00 +0.1 0.07 ? 12% perf-profile.self.cycles-pp.wake_affine
> 0.00 +0.1 0.07 ? 7% perf-profile.self.cycles-pp.check_cfs_rq_runtime
> 0.00 +0.1 0.07 ? 5% perf-profile.self.cycles-pp.perf_trace_buf_update
> 0.00 +0.1 0.07 ? 9% perf-profile.self.cycles-pp.asm_sysvec_reschedule_ipi
> 0.00 +0.1 0.07 ? 10% perf-profile.self.cycles-pp.__bitmap_and
> 0.00 +0.1 0.08 ? 10% perf-profile.self.cycles-pp.schedule_debug
> 0.00 +0.1 0.08 ? 13% perf-profile.self.cycles-pp.read@plt
> 0.00 +0.1 0.08 ? 12% perf-profile.self.cycles-pp.perf_trace_buf_alloc
> 0.00 +0.1 0.09 ? 35% perf-profile.self.cycles-pp.migrate_task_rq_fair
> 0.00 +0.1 0.09 ? 5% perf-profile.self.cycles-pp.place_entity
> 0.00 +0.1 0.10 ? 4% perf-profile.self.cycles-pp.tracing_gen_ctx_irq_test
> 0.00 +0.1 0.10 perf-profile.self.cycles-pp.__wake_up_common_lock
> 0.07 ? 17% +0.1 0.18 ? 3% perf-profile.self.cycles-pp.__list_add_valid
> 0.00 +0.1 0.11 ? 8% perf-profile.self.cycles-pp.native_irq_return_iret
> 0.00 +0.1 0.12 ? 6% perf-profile.self.cycles-pp.select_idle_cpu
> 0.00 +0.1 0.12 ? 34% perf-profile.self.cycles-pp._find_next_and_bit
> 0.00 +0.1 0.13 ? 25% perf-profile.self.cycles-pp.__cgroup_account_cputime
> 0.00 +0.1 0.13 ? 7% perf-profile.self.cycles-pp.switch_ldt
> 0.00 +0.1 0.14 ? 5% perf-profile.self.cycles-pp.check_preempt_curr
> 0.00 +0.1 0.15 ? 2% perf-profile.self.cycles-pp.save_fpregs_to_fpstate
> 0.00 +0.1 0.15 ? 5% perf-profile.self.cycles-pp.__rdgsbase_inactive
> 0.14 ? 3% +0.2 0.29 perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
> 0.00 +0.2 0.15 ? 7% perf-profile.self.cycles-pp.ttwu_queue_wakelist
> 0.00 +0.2 0.17 ? 4% perf-profile.self.cycles-pp.rb_insert_color
> 0.00 +0.2 0.17 ? 5% perf-profile.self.cycles-pp.rb_next
> 0.00 +0.2 0.18 ? 2% perf-profile.self.cycles-pp.autoremove_wake_function
> 0.01 ?223% +0.2 0.19 ? 6% perf-profile.self.cycles-pp.ttwu_do_activate
> 0.00 +0.2 0.20 ? 2% perf-profile.self.cycles-pp.rcu_note_context_switch
> 0.00 +0.2 0.20 ? 7% perf-profile.self.cycles-pp.exit_to_user_mode_loop
> 0.27 +0.2 0.47 ? 3% perf-profile.self.cycles-pp.mutex_lock
> 0.00 +0.2 0.20 ? 28% perf-profile.self.cycles-pp.perf_trace_sched_switch
> 0.00 +0.2 0.21 ? 9% perf-profile.self.cycles-pp.resched_curr
> 0.04 ? 45% +0.2 0.26 ? 7% perf-profile.self.cycles-pp.perf_tp_event
> 0.06 ? 7% +0.2 0.28 ? 8% perf-profile.self.cycles-pp.perf_trace_sched_wakeup_template
> 0.19 ? 7% +0.2 0.41 ? 5% perf-profile.self.cycles-pp.__list_del_entry_valid
> 0.08 ? 5% +0.2 0.31 ? 11% perf-profile.self.cycles-pp.task_h_load
> 0.00 +0.2 0.23 ? 5% perf-profile.self.cycles-pp.finish_task_switch
> 0.03 ? 70% +0.2 0.27 ? 5% perf-profile.self.cycles-pp.rb_erase
> 0.02 ?142% +0.3 0.29 ? 2% perf-profile.self.cycles-pp.native_sched_clock
> 0.00 +0.3 0.28 ? 3% perf-profile.self.cycles-pp.__wrgsbase_inactive
> 0.00 +0.3 0.28 ? 6% perf-profile.self.cycles-pp.clear_buddies
> 0.07 ? 10% +0.3 0.35 ? 3% perf-profile.self.cycles-pp.schedule_timeout
> 0.03 ? 70% +0.3 0.33 ? 3% perf-profile.self.cycles-pp.select_task_rq
> 0.06 ? 13% +0.3 0.36 ? 4% perf-profile.self.cycles-pp.__wake_up_common
> 0.06 ? 13% +0.3 0.36 ? 3% perf-profile.self.cycles-pp.dequeue_entity
> 0.06 ? 18% +0.3 0.37 ? 7% perf-profile.self.cycles-pp.perf_trace_sched_stat_runtime
> 0.01 ?223% +0.3 0.33 ? 4% perf-profile.self.cycles-pp.schedule
> 0.02 ?142% +0.3 0.35 ? 7% perf-profile.self.cycles-pp.cpuacct_charge
> 0.01 ?223% +0.3 0.35 perf-profile.self.cycles-pp.set_next_entity
> 0.00 +0.4 0.35 ? 13% perf-profile.self.cycles-pp.available_idle_cpu
> 0.08 ? 10% +0.4 0.44 ? 5% perf-profile.self.cycles-pp.prepare_to_wait
> 0.63 ? 3% +0.4 1.00 ? 4% perf-profile.self.cycles-pp.vfs_read
> 0.02 ?142% +0.4 0.40 ? 4% perf-profile.self.cycles-pp.check_preempt_wakeup
> 0.02 ?141% +0.4 0.42 ? 4% perf-profile.self.cycles-pp.pick_next_entity
> 0.07 ? 17% +0.4 0.48 perf-profile.self.cycles-pp.__calc_delta
> 0.06 ? 14% +0.4 0.47 ? 3% perf-profile.self.cycles-pp.unix_stream_data_wait
> 0.04 ? 45% +0.4 0.45 ? 4% perf-profile.self.cycles-pp.switch_fpu_return
> 0.00 +0.5 0.46 ? 7% perf-profile.self.cycles-pp.set_next_buddy
> 0.07 ? 17% +0.5 0.53 ? 3% perf-profile.self.cycles-pp.select_task_rq_fair
> 0.08 ? 16% +0.5 0.55 ? 4% perf-profile.self.cycles-pp.try_to_wake_up
> 0.08 ? 19% +0.5 0.56 ? 3% perf-profile.self.cycles-pp.update_rq_clock
> 0.02 ?141% +0.5 0.50 ? 10% perf-profile.self.cycles-pp.select_idle_sibling
> 0.77 ? 2% +0.5 1.25 ? 2% perf-profile.self.cycles-pp.__libc_read
> 0.09 ? 19% +0.5 0.59 ? 3% perf-profile.self.cycles-pp.reweight_entity
> 0.08 ? 14% +0.5 0.59 ? 2% perf-profile.self.cycles-pp.dequeue_task_fair
> 0.08 ? 13% +0.6 0.64 ? 5% perf-profile.self.cycles-pp.update_min_vruntime
> 0.02 ?141% +0.6 0.58 ? 7% perf-profile.self.cycles-pp.put_prev_entity
> 0.06 ? 11% +0.6 0.64 ? 4% perf-profile.self.cycles-pp.enqueue_task_fair
> 0.07 ? 18% +0.6 0.68 ? 3% perf-profile.self.cycles-pp.os_xsave
> 1.39 ? 2% +0.7 2.06 ? 3% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> 0.28 ? 8% +0.7 0.97 ? 4% perf-profile.self.cycles-pp.update_cfs_group
> 0.14 ? 8% +0.7 0.83 ? 3% perf-profile.self.cycles-pp.__update_load_avg_se
> 1.76 ? 3% +0.7 2.47 ? 3% perf-profile.self.cycles-pp._raw_spin_lock
> 0.12 ? 12% +0.7 0.85 ? 5% perf-profile.self.cycles-pp.prepare_task_switch
> 0.12 ? 12% +0.8 0.91 ? 3% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
> 0.13 ? 12% +0.8 0.93 ? 5% perf-profile.self.cycles-pp.pick_next_task_fair
> 0.13 ? 12% +0.9 0.98 ? 4% perf-profile.self.cycles-pp.__switch_to
> 0.11 ? 18% +0.9 1.06 ? 5% perf-profile.self.cycles-pp.___perf_sw_event
> 0.16 ? 11% +1.2 1.34 ? 4% perf-profile.self.cycles-pp.enqueue_entity
> 0.20 ? 12% +1.4 1.58 ? 4% perf-profile.self.cycles-pp.__switch_to_asm
> 0.23 ? 10% +1.4 1.65 ? 5% perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
> 0.25 ? 12% +1.5 1.77 ? 4% perf-profile.self.cycles-pp.__schedule
> 0.22 ? 10% +1.6 1.78 ? 10% perf-profile.self.cycles-pp.update_load_avg
> 0.23 ? 16% +1.7 1.91 ? 7% perf-profile.self.cycles-pp.update_curr
> 0.48 ? 11% +3.4 3.86 ? 4% perf-profile.self.cycles-pp.switch_mm_irqs_off
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> sudo bin/lkp install job.yaml # job file is attached in this email
> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> sudo bin/lkp run generated-yaml-file
>
> # if you come across any failure that blocks the test,
> # please remove ~/.lkp and /lkp dir to run from a clean state.




2023-02-21 17:27:13

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed

On Tue, 21 Feb 2023 at 17:57, Roman Kagan <[email protected]> wrote:
>
> On Tue, Feb 21, 2023 at 10:38:44AM +0100, Vincent Guittot wrote:
> > On Thu, 9 Feb 2023 at 20:31, Roman Kagan <[email protected]> wrote:
> > >
> > > From: Zhang Qiao <[email protected]>
> > >
> > > When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
> > > to the base level (around cfs_rq->min_vruntime), so that the entity
> > > doesn't gain extra boost when placed backwards.
> > >
> > > However, if the entity being placed wasn't executed for a long time, its
> > > vruntime may get too far behind (e.g. while cfs_rq was executing a
> > > low-weight hog), which can inverse the vruntime comparison due to s64
> > > overflow. This results in the entity being placed with its original
> > > vruntime way forwards, so that it will effectively never get to the cpu.
> > >
> > > To prevent that, ignore the vruntime of the entity being placed if it
> > > didn't execute for longer than the time that can lead to an overflow.
> > >
> > > Signed-off-by: Zhang Qiao <[email protected]>
> > > [rkagan: formatted, adjusted commit log, comments, cutoff value]
> > > Co-developed-by: Roman Kagan <[email protected]>
> > > Signed-off-by: Roman Kagan <[email protected]>
> >
> > Reviewed-by: Vincent Guittot <[email protected]>
> >
> > > ---
> > > v2 -> v3:
> > > - make cutoff less arbitrary and update comments [Vincent]
> > >
> > > v1 -> v2:
> > > - add Zhang Qiao's s-o-b
> > > - fix constant promotion on 32bit
> > >
> > > kernel/sched/fair.c | 21 +++++++++++++++++++--
> > > 1 file changed, 19 insertions(+), 2 deletions(-)
>
> Turns out Peter took v2 through his tree, and it has already landed in
> Linus' master.
>
> What scares me, though, is that I've got a message from the test robot
> that this commit dramatically affected hackbench results, see the quote
> below. I expected the commit not to affect any benchmarks.
>
> Any idea what could have caused this change?

Hmm, it's most probably because se->exec_start is reset after a
migration, so the condition becomes true for a newly migrated task even
though its vruntime is legitimately after min_vruntime.

We have missed this condition.
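
To make that concrete, here is a small userspace model of the cutoff (a
sketch only, not kernel code; NICE_0_LOAD, the clock values and the
vruntimes below are assumptions picked purely for illustration). Case (a)
shows why the cutoff exists: a vruntime that fell more than 2^63 behind
min_vruntime wraps the signed comparison, so without the cutoff the stale
value would be kept and sort as if it were far ahead. Case (b) shows the
effect described above: with exec_start zeroed on migration, sleep_time
looks enormous even for a task that just ran, so its valid vruntime is
discarded and pulled back to min_vruntime.

/*
 * Hypothetical userspace model of the cutoff discussed above -- not kernel
 * code.  NICE_0_LOAD, the clock values and the vruntimes are made-up
 * numbers chosen only to illustrate the two effects.
 */
#include <stdio.h>
#include <stdint.h>

#define NICE_0_LOAD	(1024ULL * 1024)	/* assumed 64-bit scaled load */

/* same signed-delta trick the kernel uses to pick the later vruntime */
static uint64_t max_vruntime(uint64_t max_vr, uint64_t vr)
{
	if ((int64_t)(vr - max_vr) > 0)
		max_vr = vr;
	return max_vr;
}

/* model of the cutoff: ignore the entity's vruntime after a "long sleep" */
static uint64_t place(uint64_t min_vruntime, uint64_t se_vruntime,
		      uint64_t now, uint64_t exec_start)
{
	uint64_t sleep_time = now - exec_start;

	if ((int64_t)sleep_time > (int64_t)((1ULL << 63) / NICE_0_LOAD))
		return min_vruntime;			/* drop stale vruntime */
	return max_vruntime(se_vruntime, min_vruntime);	/* normal path */
}

int main(void)
{
	uint64_t min_vr = 1ULL << 62;		/* cfs_rq->min_vruntime, far ahead */
	uint64_t now = 20000000000000ULL;	/* ~5.5h of task clock, > 2^63/NICE_0_LOAD ns */

	/* (a) entity that really slept for hours: its vruntime is so far
	 * behind that the signed comparison wraps; without the cutoff the
	 * stale value is kept and sorts as if it were far ahead */
	uint64_t stale_vr = min_vr - (1ULL << 63) - 1;
	printf("no cutoff, long sleeper: %llu (stale value kept)\n",
	       (unsigned long long)max_vruntime(stale_vr, min_vr));
	printf("cutoff,    long sleeper: %llu (pulled to min_vruntime)\n",
	       (unsigned long long)place(min_vr, stale_vr, now, 1000));

	/* (b) entity that ran a moment ago but was migrated, with exec_start
	 * zeroed as described above: sleep_time looks huge, so its perfectly
	 * valid vruntime is thrown away and the task gets an unfair boost */
	uint64_t ahead_vr = min_vr + 5000000;
	printf("cutoff,   migrated task: %llu (should have kept %llu)\n",
	       (unsigned long long)place(min_vr, ahead_vr, now, 0),
	       (unsigned long long)ahead_vr);

	return 0;
}

Assuming exec_start really is only zeroed on migration, one direction (an
untested idea, not the actual fix) would be to skip the cutoff when
exec_start == 0 and handle freshly migrated entities separately.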

>
> Thanks,
> Roman.
>
>
> On Tue, Feb 21, 2023 at 03:34:16PM +0800, kernel test robot wrote:
> > FYI, we noticed a 125.5% improvement of hackbench.throughput due to commit:
> >
> > commit: 829c1651e9c4a6f78398d3e67651cef9bb6b42cc ("sched/fair: sanitize vruntime of entity being placed")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: hackbench
> > on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
> > with following parameters:
> >
> > nr_threads: 50%
> > iterations: 8
> > mode: process
> > ipc: pipe
> > cpufreq_governor: performance
> >
> > test-description: Hackbench is both a benchmark and a stress test for the Linux kernel scheduler.
> > test-url: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sched/cfs-scheduler/hackbench.c
> >
> > In addition to that, the commit also has significant impact on the following tests:
> >
> > +------------------+--------------------------------------------------+
> > | testcase: change | hackbench: hackbench.throughput -8.1% regression |
> > | test machine | 104 threads 2 sockets (Skylake) with 192G memory |
> > | test parameters | cpufreq_governor=performance |
> > | | ipc=socket |
> > | | iterations=4 |
> > | | mode=process |
> > | | nr_threads=100% |
> > +------------------+--------------------------------------------------+
> >
> > Details are as below:
> >
> > =========================================================================================
> > compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> > gcc-11/performance/pipe/8/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/hackbench
> >
> > commit:
> > a2e90611b9 ("sched/fair: Remove capacity inversion detection")
> > 829c1651e9 ("sched/fair: sanitize vruntime of entity being placed")
> >
> > a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 308887 ± 5% +125.5% 696539 hackbench.throughput
> > 259291 ± 2% +127.3% 589293 hackbench.throughput_avg
> > 308887 ± 5% +125.5% 696539 hackbench.throughput_best
> > 198770 ± 2% +105.5% 408552 ± 4% hackbench.throughput_worst
> > 319.60 ± 2% -55.8% 141.24 hackbench.time.elapsed_time
> > 319.60 ± 2% -55.8% 141.24 hackbench.time.elapsed_time.max
> > 1.298e+09 ± 8% -87.6% 1.613e+08 ± 7% hackbench.time.involuntary_context_switches
> > 477107 -12.5% 417660 hackbench.time.minor_page_faults
> > 24683 ± 2% -57.2% 10562 hackbench.time.system_time
> > 2136 ± 3% -45.0% 1174 hackbench.time.user_time
> > 3.21e+09 ± 4% -83.0% 5.442e+08 ± 3% hackbench.time.voluntary_context_switches
> > 5.28e+08 ± 4% +8.4% 5.723e+08 ± 3% cpuidle..time
> > 365.97 ± 2% -48.9% 187.12 uptime.boot
> > 3322559 ± 3% +34.3% 4463206 ± 15% vmstat.memory.cache
> > 14194257 ± 2% -62.8% 5279904 ± 3% vmstat.system.cs
> > 2120781 ± 3% -72.8% 576421 ± 4% vmstat.system.in
> > 1.84 ± 12% +2.6 4.48 ± 5% mpstat.cpu.all.idle%
> > 2.49 ± 3% -1.1 1.39 ± 4% mpstat.cpu.all.irq%
> > 0.04 ± 12% +0.0 0.05 mpstat.cpu.all.soft%
> > 7.36 +2.2 9.56 mpstat.cpu.all.usr%
> > 61555 ± 6% -72.8% 16751 ± 16% numa-meminfo.node1.Active
> > 61515 ± 6% -72.8% 16717 ± 16% numa-meminfo.node1.Active(anon)
> > 960182 ±102% +225.6% 3125990 ± 42% numa-meminfo.node1.FilePages
> > 1754002 ± 53% +137.9% 4173379 ± 34% numa-meminfo.node1.MemUsed
> > 35296824 ± 6% +157.8% 91005048 numa-numastat.node0.local_node
> > 35310119 ± 6% +157.9% 91058472 numa-numastat.node0.numa_hit
> > 35512423 ± 5% +159.7% 92232951 numa-numastat.node1.local_node
> > 35577275 ± 4% +159.4% 92273266 numa-numastat.node1.numa_hit
> > 35310253 ± 6% +157.9% 91058211 numa-vmstat.node0.numa_hit
> > 35296958 ± 6% +157.8% 91004787 numa-vmstat.node0.numa_local
> > 15337 ± 6% -72.5% 4216 ± 17% numa-vmstat.node1.nr_active_anon
> > 239988 ±102% +225.7% 781607 ± 42% numa-vmstat.node1.nr_file_pages
> > 15337 ± 6% -72.5% 4216 ± 17% numa-vmstat.node1.nr_zone_active_anon
> > 35577325 ± 4% +159.4% 92273215 numa-vmstat.node1.numa_hit
> > 35512473 ± 5% +159.7% 92232900 numa-vmstat.node1.numa_local
> > 64500 ± 8% -61.8% 24643 ± 32% meminfo.Active
> > 64422 ± 8% -61.9% 24568 ± 32% meminfo.Active(anon)
> > 140271 ± 14% -38.0% 86979 ± 24% meminfo.AnonHugePages
> > 372672 ± 2% +13.3% 422069 meminfo.AnonPages
> > 3205235 ± 3% +35.1% 4329061 ± 15% meminfo.Cached
> > 1548601 ± 7% +77.4% 2747319 ± 24% meminfo.Committed_AS
> > 783193 ± 14% +154.9% 1996137 ± 33% meminfo.Inactive
> > 783010 ± 14% +154.9% 1995951 ± 33% meminfo.Inactive(anon)
> > 4986534 ± 2% +28.2% 6394741 ± 10% meminfo.Memused
> > 475092 ± 22% +236.5% 1598918 ± 41% meminfo.Shmem
> > 2777 -2.1% 2719 turbostat.Bzy_MHz
> > 11143123 ± 6% +72.0% 19162667 turbostat.C1
> > 0.24 ± 7% +0.7 0.94 ± 3% turbostat.C1%
> > 100440 ± 18% +203.8% 305136 ± 15% turbostat.C1E
> > 0.06 ± 9% +0.1 0.18 ± 11% turbostat.C1E%
> > 1.24 ± 3% +1.6 2.81 ± 4% turbostat.C6%
> > 1.38 ± 3% +156.1% 3.55 ± 3% turbostat.CPU%c1
> > 0.33 ± 5% +76.5% 0.58 ± 7% turbostat.CPU%c6
> > 0.16 +31.2% 0.21 turbostat.IPC
> > 6.866e+08 ± 5% -87.8% 83575393 ± 5% turbostat.IRQ
> > 0.33 ± 27% +0.2 0.57 turbostat.POLL%
> > 0.12 ± 10% +176.4% 0.33 ± 12% turbostat.Pkg%pc2
> > 0.09 ± 7% -100.0% 0.00 turbostat.Pkg%pc6
> > 61.33 +5.2% 64.50 ± 2% turbostat.PkgTmp
> > 14.81 +2.0% 15.11 turbostat.RAMWatt
> > 16242 ± 8% -62.0% 6179 ± 32% proc-vmstat.nr_active_anon
> > 93150 ± 2% +13.2% 105429 proc-vmstat.nr_anon_pages
> > 801219 ± 3% +35.1% 1082320 ± 15% proc-vmstat.nr_file_pages
> > 195506 ± 14% +155.2% 498919 ± 33% proc-vmstat.nr_inactive_anon
> > 118682 ± 22% +236.9% 399783 ± 41% proc-vmstat.nr_shmem
> > 16242 ± 8% -62.0% 6179 ± 32% proc-vmstat.nr_zone_active_anon
> > 195506 ± 14% +155.2% 498919 ± 33% proc-vmstat.nr_zone_inactive_anon
> > 70889233 ± 5% +158.6% 1.833e+08 proc-vmstat.numa_hit
> > 70811086 ± 5% +158.8% 1.832e+08 proc-vmstat.numa_local
> > 55885 ± 22% -67.2% 18327 ± 38% proc-vmstat.numa_pages_migrated
> > 422312 ± 10% -95.4% 19371 ± 7% proc-vmstat.pgactivate
> > 71068460 ± 5% +158.1% 1.834e+08 proc-vmstat.pgalloc_normal
> > 1554994 -19.6% 1250346 ± 4% proc-vmstat.pgfault
> > 71011267 ± 5% +155.9% 1.817e+08 proc-vmstat.pgfree
> > 55885 ± 22% -67.2% 18327 ± 38% proc-vmstat.pgmigrate_success
> > 111247 ± 2% -35.0% 72355 ± 2% proc-vmstat.pgreuse
> > 2506368 ± 2% -53.1% 1176320 proc-vmstat.unevictable_pgs_scanned
> > 20.06 ± 10% -22.4% 15.56 ± 8% sched_debug.cfs_rq:/.h_nr_running.max
> > 0.81 ± 32% -93.1% 0.06 ±223% sched_debug.cfs_rq:/.h_nr_running.min
> > 1917 ± 34% -100.0% 0.00 sched_debug.cfs_rq:/.load.min
> > 24.18 ± 10% +39.0% 33.62 ± 11% sched_debug.cfs_rq:/.load_avg.avg
> > 245.61 ± 25% +66.3% 408.33 ± 22% sched_debug.cfs_rq:/.load_avg.max
> > 47.52 ± 13% +72.6% 82.03 ± 8% sched_debug.cfs_rq:/.load_avg.stddev
> > 13431147 -64.9% 4717147 sched_debug.cfs_rq:/.min_vruntime.avg
> > 18161799 ± 7% -67.4% 5925316 ± 6% sched_debug.cfs_rq:/.min_vruntime.max
> > 12413026 -65.0% 4340952 sched_debug.cfs_rq:/.min_vruntime.min
> > 739748 ± 16% -66.6% 247410 ± 17% sched_debug.cfs_rq:/.min_vruntime.stddev
> > 0.85 -16.4% 0.71 sched_debug.cfs_rq:/.nr_running.avg
> > 0.61 ± 25% -90.9% 0.06 ±223% sched_debug.cfs_rq:/.nr_running.min
> > 0.10 ± 25% +109.3% 0.22 ± 7% sched_debug.cfs_rq:/.nr_running.stddev
> > 169.22 +101.7% 341.33 sched_debug.cfs_rq:/.removed.load_avg.max
> > 32.41 ± 24% +100.2% 64.90 ± 16% sched_debug.cfs_rq:/.removed.load_avg.stddev
> > 82.92 ± 10% +108.1% 172.56 sched_debug.cfs_rq:/.removed.runnable_avg.max
> > 13.60 ± 28% +114.0% 29.10 ± 20% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
> > 82.92 ± 10% +108.1% 172.56 sched_debug.cfs_rq:/.removed.util_avg.max
> > 13.60 ± 28% +114.0% 29.10 ± 20% sched_debug.cfs_rq:/.removed.util_avg.stddev
> > 2156 ± 12% -36.6% 1368 ± 27% sched_debug.cfs_rq:/.runnable_avg.min
> > 2285 ± 7% -19.8% 1833 ± 6% sched_debug.cfs_rq:/.runnable_avg.stddev
> > -2389921 -64.8% -840940 sched_debug.cfs_rq:/.spread0.min
> > 739781 ± 16% -66.5% 247837 ± 17% sched_debug.cfs_rq:/.spread0.stddev
> > 843.88 ± 2% -20.5% 670.53 sched_debug.cfs_rq:/.util_avg.avg
> > 433.64 ± 7% -43.5% 244.83 ± 17% sched_debug.cfs_rq:/.util_avg.min
> > 187.00 ± 6% +40.6% 263.02 ± 4% sched_debug.cfs_rq:/.util_avg.stddev
> > 394.15 ± 14% -29.5% 278.06 ± 3% sched_debug.cfs_rq:/.util_est_enqueued.avg
> > 1128 ± 12% -17.6% 930.39 ± 5% sched_debug.cfs_rq:/.util_est_enqueued.max
> > 38.36 ± 29% -100.0% 0.00 sched_debug.cfs_rq:/.util_est_enqueued.min
> > 3596 ± 15% -39.5% 2175 ± 7% sched_debug.cpu.avg_idle.min
> > 160647 ± 9% -25.9% 118978 ± 9% sched_debug.cpu.avg_idle.stddev
> > 197365 -46.2% 106170 sched_debug.cpu.clock.avg
> > 197450 -46.2% 106208 sched_debug.cpu.clock.max
> > 197281 -46.2% 106128 sched_debug.cpu.clock.min
> > 49.96 ± 22% -53.1% 23.44 ± 19% sched_debug.cpu.clock.stddev
> > 193146 -45.7% 104898 sched_debug.cpu.clock_task.avg
> > 194592 -45.8% 105455 sched_debug.cpu.clock_task.max
> > 177878 -49.3% 90211 sched_debug.cpu.clock_task.min
> > 1794 ± 5% -10.7% 1602 ± 2% sched_debug.cpu.clock_task.stddev
> > 13154 ± 2% -20.3% 10479 sched_debug.cpu.curr->pid.avg
> > 15059 -17.2% 12468 sched_debug.cpu.curr->pid.max
> > 7263 ± 33% -100.0% 0.00 sched_debug.cpu.curr->pid.min
> > 9321 ± 36% +98.2% 18478 ± 44% sched_debug.cpu.max_idle_balance_cost.stddev
> > 0.00 ± 17% -41.6% 0.00 ± 13% sched_debug.cpu.next_balance.stddev
> > 20.00 ± 11% -21.4% 15.72 ± 7% sched_debug.cpu.nr_running.max
> > 0.86 ± 17% -87.1% 0.11 ±141% sched_debug.cpu.nr_running.min
> > 25069883 -83.7% 4084117 ± 4% sched_debug.cpu.nr_switches.avg
> > 26486718 -82.8% 4544009 ± 4% sched_debug.cpu.nr_switches.max
> > 23680077 -84.5% 3663816 ± 4% sched_debug.cpu.nr_switches.min
> > 589836 ± 3% -68.7% 184621 ± 16% sched_debug.cpu.nr_switches.stddev
> > 197278 -46.2% 106128 sched_debug.cpu_clk
> > 194327 -46.9% 103176 sched_debug.ktime
> > 197967 -46.0% 106821 sched_debug.sched_clk
> > 14.91 -37.6% 9.31 perf-stat.i.MPKI
> > 2.657e+10 +25.0% 3.32e+10 perf-stat.i.branch-instructions
> > 1.17 -0.4 0.78 perf-stat.i.branch-miss-rate%
> > 3.069e+08 -20.1% 2.454e+08 perf-stat.i.branch-misses
> > 6.43 ± 8% +2.2 8.59 ± 4% perf-stat.i.cache-miss-rate%
> > 1.952e+09 -24.3% 1.478e+09 perf-stat.i.cache-references
> > 14344055 ± 2% -58.6% 5932018 ± 3% perf-stat.i.context-switches
> > 1.83 -21.8% 1.43 perf-stat.i.cpi
> > 2.403e+11 -3.4% 2.322e+11 perf-stat.i.cpu-cycles
> > 1420139 ± 2% -38.8% 869692 ± 5% perf-stat.i.cpu-migrations
> > 2619 ± 7% -15.5% 2212 ± 8% perf-stat.i.cycles-between-cache-misses
> > 0.24 ± 19% -0.1 0.10 ± 17% perf-stat.i.dTLB-load-miss-rate%
> > 90403286 ± 19% -55.8% 39926283 ± 16% perf-stat.i.dTLB-load-misses
> > 3.823e+10 +28.6% 4.918e+10 perf-stat.i.dTLB-loads
> > 0.01 ± 34% -0.0 0.01 ± 33% perf-stat.i.dTLB-store-miss-rate%
> > 2779663 ± 34% -52.7% 1315899 ± 31% perf-stat.i.dTLB-store-misses
> > 2.19e+10 +24.2% 2.72e+10 perf-stat.i.dTLB-stores
> > 47.99 ± 2% +28.0 75.94 perf-stat.i.iTLB-load-miss-rate%
> > 89417955 ± 2% +38.7% 1.24e+08 ± 4% perf-stat.i.iTLB-load-misses
> > 97721514 ± 2% -58.2% 40865783 ± 3% perf-stat.i.iTLB-loads
> > 1.329e+11 +26.3% 1.678e+11 perf-stat.i.instructions
> > 1503 -7.7% 1388 ± 3% perf-stat.i.instructions-per-iTLB-miss
> > 0.55 +30.2% 0.72 perf-stat.i.ipc
> > 1.64 ± 18% +217.4% 5.20 ± 11% perf-stat.i.major-faults
> > 2.73 -3.7% 2.63 perf-stat.i.metric.GHz
> > 1098 ± 2% -7.1% 1020 ± 3% perf-stat.i.metric.K/sec
> > 1008 +24.4% 1254 perf-stat.i.metric.M/sec
> > 4334 ± 2% +90.5% 8257 ± 7% perf-stat.i.minor-faults
> > 90.94 -14.9 75.99 perf-stat.i.node-load-miss-rate%
> > 41932510 ± 8% -43.0% 23899176 ± 10% perf-stat.i.node-load-misses
> > 3366677 ± 5% +86.2% 6267816 perf-stat.i.node-loads
> > 81.77 ± 3% -36.3 45.52 ± 3% perf-stat.i.node-store-miss-rate%
> > 18498318 ± 7% -31.8% 12613933 ± 7% perf-stat.i.node-store-misses
> > 3023556 ± 10% +508.7% 18405880 ± 2% perf-stat.i.node-stores
> > 4336 ± 2% +90.5% 8262 ± 7% perf-stat.i.page-faults
> > 14.70 -41.2% 8.65 perf-stat.overall.MPKI
> > 1.16 -0.4 0.72 perf-stat.overall.branch-miss-rate%
> > 6.22 ± 7% +2.4 8.59 ± 4% perf-stat.overall.cache-miss-rate%
> > 1.81 -24.3% 1.37 perf-stat.overall.cpi
> > 0.24 ± 19% -0.2 0.07 ± 15% perf-stat.overall.dTLB-load-miss-rate%
> > 0.01 ± 34% -0.0 0.00 ± 29% perf-stat.overall.dTLB-store-miss-rate%
> > 47.78 ± 2% +29.3 77.12 perf-stat.overall.iTLB-load-miss-rate%
> > 1486 -9.1% 1351 ± 4% perf-stat.overall.instructions-per-iTLB-miss
> > 0.55 +32.0% 0.73 perf-stat.overall.ipc
> > 92.54 -15.4 77.16 ± 2% perf-stat.overall.node-load-miss-rate%
> > 85.82 ± 2% -48.1 37.76 ± 5% perf-stat.overall.node-store-miss-rate%
> > 2.648e+10 +25.2% 3.314e+10 perf-stat.ps.branch-instructions
> > 3.06e+08 -22.1% 2.383e+08 perf-stat.ps.branch-misses
> > 1.947e+09 -25.5% 1.451e+09 perf-stat.ps.cache-references
> > 14298713 ± 2% -62.5% 5359285 ± 3% perf-stat.ps.context-switches
> > 2.396e+11 -4.0% 2.299e+11 perf-stat.ps.cpu-cycles
> > 1415512 ± 2% -42.2% 817981 ± 4% perf-stat.ps.cpu-migrations
> > 90073948 ± 19% -60.4% 35711862 ± 15% perf-stat.ps.dTLB-load-misses
> > 3.811e+10 +29.7% 4.944e+10 perf-stat.ps.dTLB-loads
> > 2767291 ± 34% -56.3% 1210210 ± 29% perf-stat.ps.dTLB-store-misses
> > 2.183e+10 +25.0% 2.729e+10 perf-stat.ps.dTLB-stores
> > 89118809 ± 2% +39.6% 1.244e+08 ± 4% perf-stat.ps.iTLB-load-misses
> > 97404381 ± 2% -62.2% 36860047 ± 3% perf-stat.ps.iTLB-loads
> > 1.324e+11 +26.7% 1.678e+11 perf-stat.ps.instructions
> > 1.62 ± 18% +164.7% 4.29 ± 8% perf-stat.ps.major-faults
> > 4310 ± 2% +75.1% 7549 ± 5% perf-stat.ps.minor-faults
> > 41743097 ± 8% -47.3% 21984450 ± 9% perf-stat.ps.node-load-misses
> > 3356259 ± 5% +92.6% 6462631 perf-stat.ps.node-loads
> > 18414647 ± 7% -35.7% 11833799 ± 6% perf-stat.ps.node-store-misses
> > 3019790 ± 10% +545.0% 19478071 perf-stat.ps.node-stores
> > 4312 ± 2% +75.2% 7553 ± 5% perf-stat.ps.page-faults
> > 4.252e+13 -43.7% 2.395e+13 perf-stat.total.instructions
> > 29.92 ± 4% -22.8 7.09 ± 29% perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_write.vfs_write.ksys_write.do_syscall_64
> > 28.53 ± 5% -21.6 6.92 ± 29% perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_write.vfs_write.ksys_write
> > 27.86 ± 5% -21.1 6.77 ± 29% perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write.vfs_write
> > 27.55 ± 5% -20.9 6.68 ± 29% perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write
> > 22.28 ± 4% -17.0 5.31 ± 30% perf-profile.calltrace.cycles-pp.schedule.pipe_read.vfs_read.ksys_read.do_syscall_64
> > 21.98 ± 4% -16.7 5.24 ± 30% perf-profile.calltrace.cycles-pp.__schedule.schedule.pipe_read.vfs_read.ksys_read
> > 12.62 ± 4% -9.6 3.00 ± 33% perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> > 34.09 -9.2 24.92 ± 3% perf-profile.calltrace.cycles-pp.pipe_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 11.48 ± 5% -8.8 2.69 ± 38% perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
> > 9.60 ± 7% -7.2 2.40 ± 35% perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.pipe_read.vfs_read
> > 36.39 -6.2 30.20 perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> > 40.40 -6.1 34.28 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> > 40.95 -5.7 35.26 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
> > 37.43 -5.4 32.07 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> > 6.30 ± 11% -5.2 1.09 ± 36% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> > 5.66 ± 12% -5.1 0.58 ± 75% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 6.46 ± 10% -5.1 1.40 ± 28% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> > 5.53 ± 13% -5.0 0.56 ± 75% perf-profile.calltrace.cycles-pp.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
> > 5.42 ± 13% -4.9 0.56 ± 75% perf-profile.calltrace.cycles-pp.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
> > 5.82 ± 9% -4.7 1.10 ± 37% perf-profile.calltrace.cycles-pp._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> > 5.86 ± 16% -4.6 1.31 ± 37% perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> > 5.26 ± 9% -4.4 0.89 ± 57% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common
> > 45.18 -3.5 41.68 perf-profile.calltrace.cycles-pp.__libc_read
> > 50.31 -3.2 47.12 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> > 4.00 ± 27% -2.9 1.09 ± 40% perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.pipe_read
> > 50.75 -2.7 48.06 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_write
> > 40.80 -2.6 38.20 perf-profile.calltrace.cycles-pp.pipe_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 3.10 ± 15% -2.5 0.62 ±103% perf-profile.calltrace.cycles-pp.update_cfs_group.dequeue_task_fair.__schedule.schedule.pipe_read
> > 2.94 ± 12% -2.3 0.62 ±102% perf-profile.calltrace.cycles-pp.update_cfs_group.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> > 2.38 ± 9% -2.0 0.38 ±102% perf-profile.calltrace.cycles-pp._raw_spin_lock.__schedule.schedule.pipe_read.vfs_read
> > 2.24 ± 7% -1.8 0.40 ± 71% perf-profile.calltrace.cycles-pp.prepare_to_wait_event.pipe_read.vfs_read.ksys_read.do_syscall_64
> > 2.08 ± 6% -1.8 0.29 ±100% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.pipe_read.vfs_read
> > 2.10 ± 10% -1.8 0.32 ±104% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__schedule.schedule.pipe_read
> > 2.76 ± 7% -1.5 1.24 ± 17% perf-profile.calltrace.cycles-pp.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> > 2.27 ± 5% -1.4 0.88 ± 11% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> > 2.43 ± 7% -1.3 1.16 ± 17% perf-profile.calltrace.cycles-pp.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common
> > 2.46 ± 5% -1.3 1.20 ± 7% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> > 1.54 ± 5% -1.2 0.32 ±101% perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
> > 0.97 ± 9% -0.3 0.66 ± 19% perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function
> > 0.86 ± 6% +0.2 1.02 perf-profile.calltrace.cycles-pp.__might_fault._copy_from_iter.copy_page_from_iter.pipe_write.vfs_write
> > 0.64 ± 9% +0.5 1.16 ± 5% perf-profile.calltrace.cycles-pp.mutex_unlock.pipe_read.vfs_read.ksys_read.do_syscall_64
> > 0.47 ± 45% +0.5 0.99 ± 5% perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.60 ± 8% +0.5 1.13 ± 5% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> > 0.00 +0.5 0.54 ± 5% perf-profile.calltrace.cycles-pp.current_time.file_update_time.pipe_write.vfs_write.ksys_write
> > 0.00 +0.6 0.56 ± 4% perf-profile.calltrace.cycles-pp.__might_resched.__might_fault._copy_from_iter.copy_page_from_iter.pipe_write
> > 0.00 +0.6 0.56 ± 7% perf-profile.calltrace.cycles-pp.__might_resched.__might_fault._copy_to_iter.copy_page_to_iter.pipe_read
> > 0.00 +0.6 0.58 ± 5% perf-profile.calltrace.cycles-pp.__might_resched.mutex_lock.pipe_write.vfs_write.ksys_write
> > 0.00 +0.6 0.62 ± 3% perf-profile.calltrace.cycles-pp.__might_resched.mutex_lock.pipe_read.vfs_read.ksys_read
> > 0.00 +0.7 0.65 ± 6% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.prepare_to_wait_event.pipe_write.vfs_write
> > 0.00 +0.7 0.65 ± 7% perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> > 0.57 ± 5% +0.7 1.24 ± 6% perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.00 +0.7 0.72 ± 6% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.prepare_to_wait_event.pipe_write.vfs_write.ksys_write
> > 0.00 +0.8 0.75 ± 6% perf-profile.calltrace.cycles-pp.mutex_spin_on_owner.__mutex_lock.pipe_write.vfs_write.ksys_write
> > 0.74 ± 9% +0.8 1.48 ± 5% perf-profile.calltrace.cycles-pp.file_update_time.pipe_write.vfs_write.ksys_write.do_syscall_64
> > 0.63 ± 5% +0.8 1.40 ± 5% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> > 0.00 +0.8 0.78 ± 19% perf-profile.calltrace.cycles-pp.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record
> > 0.00 +0.8 0.78 ± 19% perf-profile.calltrace.cycles-pp.record__finish_output.__cmd_record
> > 0.00 +0.8 0.78 ± 19% perf-profile.calltrace.cycles-pp.perf_session__process_events.record__finish_output.__cmd_record
> > 0.00 +0.8 0.80 ± 15% perf-profile.calltrace.cycles-pp.__cmd_record
> > 0.00 +0.8 0.82 ± 11% perf-profile.calltrace.cycles-pp.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> > 0.00 +0.9 0.85 ± 6% perf-profile.calltrace.cycles-pp.prepare_to_wait_event.pipe_write.vfs_write.ksys_write.do_syscall_64
> > 0.00 +0.9 0.86 ± 4% perf-profile.calltrace.cycles-pp.current_time.atime_needs_update.touch_atime.pipe_read.vfs_read
> > 0.00 +0.9 0.87 ± 5% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_write
> > 0.00 +0.9 0.88 ± 5% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_read
> > 0.26 ±100% +1.0 1.22 ± 10% perf-profile.calltrace.cycles-pp.__schedule.schedule.pipe_write.vfs_write.ksys_write
> > 0.00 +1.0 0.96 ± 6% perf-profile.calltrace.cycles-pp.__might_fault._copy_to_iter.copy_page_to_iter.pipe_read.vfs_read
> > 0.27 ±100% +1.0 1.23 ± 10% perf-profile.calltrace.cycles-pp.schedule.pipe_write.vfs_write.ksys_write.do_syscall_64
> > 0.00 +1.0 0.97 ± 7% perf-profile.calltrace.cycles-pp.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge.__folio_put.pipe_read
> > 0.87 ± 8% +1.1 1.98 ± 5% perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_read.ksys_read.do_syscall_64
> > 0.73 ± 6% +1.1 1.85 ± 5% perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_write.ksys_write.do_syscall_64
> > 0.00 +1.2 1.15 ± 7% perf-profile.calltrace.cycles-pp.uncharge_batch.__mem_cgroup_uncharge.__folio_put.pipe_read.vfs_read
> > 0.00 +1.2 1.23 ± 6% perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge.__folio_put.pipe_read.vfs_read.ksys_read
> > 0.00 +1.2 1.24 ± 7% perf-profile.calltrace.cycles-pp.__folio_put.pipe_read.vfs_read.ksys_read.do_syscall_64
> > 0.48 ± 45% +1.3 1.74 ± 6% perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.pipe_read.vfs_read.ksys_read
> > 0.60 ± 7% +1.3 1.87 ± 8% perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_read.vfs_read.ksys_read.do_syscall_64
> > 1.23 ± 7% +1.3 2.51 ± 4% perf-profile.calltrace.cycles-pp.mutex_lock.pipe_read.vfs_read.ksys_read.do_syscall_64
> > 43.42 +1.3 44.75 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> > 0.83 ± 7% +1.3 2.17 ± 5% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.98 ± 7% +1.4 2.36 ± 6% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.27 ±100% +1.4 1.70 ± 9% perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_read.vfs_read.ksys_read
> > 0.79 ± 8% +1.4 2.23 ± 6% perf-profile.calltrace.cycles-pp.touch_atime.pipe_read.vfs_read.ksys_read.do_syscall_64
> > 0.18 ±141% +1.5 1.63 ± 9% perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_read
> > 0.18 ±141% +1.5 1.67 ± 9% perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_read.vfs_read
> > 0.00 +1.6 1.57 ± 10% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
> > 0.00 +1.6 1.57 ± 10% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
> > 1.05 ± 8% +1.7 2.73 ± 6% perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyin._copy_from_iter.copy_page_from_iter.pipe_write
> > 1.84 ± 9% +1.7 3.56 ± 5% perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout._copy_to_iter.copy_page_to_iter.pipe_read
> > 1.41 ± 9% +1.8 3.17 ± 6% perf-profile.calltrace.cycles-pp.copyin._copy_from_iter.copy_page_from_iter.pipe_write.vfs_write
> > 0.00 +1.8 1.79 ± 9% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> > 1.99 ± 9% +2.0 3.95 ± 5% perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.copy_page_to_iter.pipe_read.vfs_read
> > 2.40 ± 7% +2.4 4.82 ± 5% perf-profile.calltrace.cycles-pp._copy_from_iter.copy_page_from_iter.pipe_write.vfs_write.ksys_write
> > 0.00 +2.5 2.50 ± 7% perf-profile.calltrace.cycles-pp.__mutex_lock.pipe_write.vfs_write.ksys_write.do_syscall_64
> > 2.89 ± 8% +2.6 5.47 ± 5% perf-profile.calltrace.cycles-pp.copy_page_from_iter.pipe_write.vfs_write.ksys_write.do_syscall_64
> > 1.04 ± 30% +2.8 3.86 ± 5% perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_write
> > 0.00 +2.9 2.90 ± 11% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> > 0.00 +2.9 2.91 ± 11% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> > 0.00 +2.9 2.91 ± 11% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
> > 0.85 ± 27% +2.9 3.80 ± 5% perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_read
> > 0.00 +3.0 2.96 ± 11% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
> > 2.60 ± 9% +3.1 5.74 ± 6% perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.pipe_read.vfs_read.ksys_read
> > 2.93 ± 9% +3.7 6.66 ± 5% perf-profile.calltrace.cycles-pp.copy_page_to_iter.pipe_read.vfs_read.ksys_read.do_syscall_64
> > 1.60 ± 12% +4.6 6.18 ± 7% perf-profile.calltrace.cycles-pp.mutex_unlock.pipe_write.vfs_write.ksys_write.do_syscall_64
> > 2.60 ± 10% +4.6 7.24 ± 5% perf-profile.calltrace.cycles-pp.mutex_lock.pipe_write.vfs_write.ksys_write.do_syscall_64
> > 28.75 ± 5% -21.6 7.19 ± 28% perf-profile.children.cycles-pp.schedule
> > 30.52 ± 4% -21.6 8.97 ± 22% perf-profile.children.cycles-pp.__wake_up_common_lock
> > 28.53 ± 6% -21.0 7.56 ± 26% perf-profile.children.cycles-pp.__schedule
> > 29.04 ± 5% -20.4 8.63 ± 23% perf-profile.children.cycles-pp.__wake_up_common
> > 28.37 ± 5% -19.9 8.44 ± 23% perf-profile.children.cycles-pp.autoremove_wake_function
> > 28.08 ± 5% -19.7 8.33 ± 23% perf-profile.children.cycles-pp.try_to_wake_up
> > 13.90 ± 2% -10.2 3.75 ± 28% perf-profile.children.cycles-pp.ttwu_do_activate
> > 12.66 ± 3% -9.2 3.47 ± 29% perf-profile.children.cycles-pp.enqueue_task_fair
> > 34.20 -9.2 25.05 ± 3% perf-profile.children.cycles-pp.pipe_read
> > 90.86 -9.1 81.73 perf-profile.children.cycles-pp.do_syscall_64
> > 91.80 -8.3 83.49 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 10.28 ± 7% -7.8 2.53 ± 27% perf-profile.children.cycles-pp._raw_spin_lock
> > 9.85 ± 7% -6.9 2.92 ± 29% perf-profile.children.cycles-pp.dequeue_task_fair
> > 8.69 ± 7% -6.6 2.05 ± 24% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
> > 8.99 ± 6% -6.2 2.81 ± 16% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> > 36.46 -6.1 30.34 perf-profile.children.cycles-pp.vfs_read
> > 8.38 ± 8% -5.8 2.60 ± 23% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> > 6.10 ± 11% -5.4 0.66 ± 61% perf-profile.children.cycles-pp.exit_to_user_mode_loop
> > 37.45 -5.3 32.13 perf-profile.children.cycles-pp.ksys_read
> > 6.50 ± 35% -4.9 1.62 ± 61% perf-profile.children.cycles-pp.update_curr
> > 6.56 ± 15% -4.6 1.95 ± 57% perf-profile.children.cycles-pp.update_cfs_group
> > 6.38 ± 14% -4.5 1.91 ± 28% perf-profile.children.cycles-pp.enqueue_entity
> > 5.74 ± 5% -3.8 1.92 ± 25% perf-profile.children.cycles-pp.update_load_avg
> > 45.56 -3.8 41.75 perf-profile.children.cycles-pp.__libc_read
> > 3.99 ± 4% -3.1 0.92 ± 24% perf-profile.children.cycles-pp.pick_next_task_fair
> > 4.12 ± 27% -2.7 1.39 ± 34% perf-profile.children.cycles-pp.dequeue_entity
> > 40.88 -2.5 38.37 perf-profile.children.cycles-pp.pipe_write
> > 3.11 ± 4% -2.4 0.75 ± 22% perf-profile.children.cycles-pp.switch_mm_irqs_off
> > 2.06 ± 33% -1.8 0.27 ± 27% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
> > 2.38 ± 41% -1.8 0.60 ± 72% perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
> > 2.29 ± 5% -1.7 0.60 ± 25% perf-profile.children.cycles-pp.switch_fpu_return
> > 2.30 ± 6% -1.6 0.68 ± 18% perf-profile.children.cycles-pp.prepare_task_switch
> > 1.82 ± 33% -1.6 0.22 ± 31% perf-profile.children.cycles-pp.sysvec_call_function_single
> > 1.77 ± 33% -1.6 0.20 ± 32% perf-profile.children.cycles-pp.__sysvec_call_function_single
> > 1.96 ± 5% -1.5 0.50 ± 20% perf-profile.children.cycles-pp.reweight_entity
> > 2.80 ± 7% -1.2 1.60 ± 12% perf-profile.children.cycles-pp.select_task_rq
> > 1.61 ± 6% -1.2 0.42 ± 25% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
> > 1.34 ± 9% -1.2 0.16 ± 28% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
> > 1.62 ± 4% -1.2 0.45 ± 22% perf-profile.children.cycles-pp.set_next_entity
> > 1.55 ± 8% -1.1 0.43 ± 12% perf-profile.children.cycles-pp.update_rq_clock
> > 1.49 ± 8% -1.1 0.41 ± 14% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
> > 1.30 ± 20% -1.0 0.26 ± 18% perf-profile.children.cycles-pp.finish_task_switch
> > 1.44 ± 5% -1.0 0.42 ± 19% perf-profile.children.cycles-pp.__switch_to_asm
> > 2.47 ± 7% -1.0 1.50 ± 12% perf-profile.children.cycles-pp.select_task_rq_fair
> > 2.33 ± 7% -0.9 1.40 ± 3% perf-profile.children.cycles-pp.prepare_to_wait_event
> > 1.24 ± 7% -0.9 0.35 ± 14% perf-profile.children.cycles-pp.__update_load_avg_se
> > 1.41 ± 32% -0.9 0.56 ± 24% perf-profile.children.cycles-pp.sched_ttwu_pending
> > 2.29 ± 8% -0.8 1.45 ± 3% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> > 1.04 ± 7% -0.8 0.24 ± 22% perf-profile.children.cycles-pp.check_preempt_curr
> > 1.01 ± 3% -0.7 0.30 ± 20% perf-profile.children.cycles-pp.__switch_to
> > 0.92 ± 7% -0.7 0.26 ± 12% perf-profile.children.cycles-pp.update_min_vruntime
> > 0.71 ± 2% -0.6 0.08 ± 75% perf-profile.children.cycles-pp.put_prev_entity
> > 0.76 ± 6% -0.6 0.14 ± 32% perf-profile.children.cycles-pp.check_preempt_wakeup
> > 0.81 ± 66% -0.6 0.22 ± 34% perf-profile.children.cycles-pp.set_task_cpu
> > 0.82 ± 17% -0.6 0.23 ± 10% perf-profile.children.cycles-pp.cpuacct_charge
> > 1.08 ± 15% -0.6 0.51 ± 10% perf-profile.children.cycles-pp.wake_affine
> > 0.56 ± 15% -0.5 0.03 ±100% perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
> > 0.66 ± 3% -0.5 0.15 ± 28% perf-profile.children.cycles-pp.os_xsave
> > 0.52 ± 44% -0.5 0.06 ±151% perf-profile.children.cycles-pp.native_irq_return_iret
> > 0.55 ± 5% -0.4 0.15 ± 21% perf-profile.children.cycles-pp.__calc_delta
> > 0.56 ± 10% -0.4 0.17 ± 26% perf-profile.children.cycles-pp.___perf_sw_event
> > 0.70 ± 15% -0.4 0.32 ± 11% perf-profile.children.cycles-pp.task_h_load
> > 0.40 ± 4% -0.3 0.06 ± 49% perf-profile.children.cycles-pp.pick_next_entity
> > 0.57 ± 6% -0.3 0.26 ± 7% perf-profile.children.cycles-pp.__list_del_entry_valid
> > 0.39 ± 8% -0.3 0.08 ± 24% perf-profile.children.cycles-pp.set_next_buddy
> > 0.64 ± 6% -0.3 0.36 ± 6% perf-profile.children.cycles-pp._raw_spin_lock_irq
> > 0.53 ± 20% -0.3 0.25 ± 8% perf-profile.children.cycles-pp.ttwu_queue_wakelist
> > 0.36 ± 8% -0.3 0.08 ± 11% perf-profile.children.cycles-pp.rb_insert_color
> > 0.41 ± 6% -0.3 0.14 ± 17% perf-profile.children.cycles-pp.sched_clock_cpu
> > 0.36 ± 33% -0.3 0.10 ± 17% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
> > 0.37 ± 4% -0.2 0.13 ± 16% perf-profile.children.cycles-pp.native_sched_clock
> > 0.28 ± 5% -0.2 0.07 ± 18% perf-profile.children.cycles-pp.rb_erase
> > 0.32 ± 7% -0.2 0.12 ± 10% perf-profile.children.cycles-pp.__list_add_valid
> > 0.23 ± 6% -0.2 0.03 ±103% perf-profile.children.cycles-pp.resched_curr
> > 0.27 ± 5% -0.2 0.08 ± 20% perf-profile.children.cycles-pp.__wrgsbase_inactive
> > 0.26 ± 6% -0.2 0.08 ± 17% perf-profile.children.cycles-pp.finish_wait
> > 0.26 ± 4% -0.2 0.08 ± 11% perf-profile.children.cycles-pp.rcu_note_context_switch
> > 0.33 ± 21% -0.2 0.15 ± 32% perf-profile.children.cycles-pp.migrate_task_rq_fair
> > 0.22 ± 9% -0.2 0.07 ± 22% perf-profile.children.cycles-pp.perf_trace_buf_update
> > 0.17 ± 8% -0.1 0.03 ±100% perf-profile.children.cycles-pp.rb_next
> > 0.15 ± 32% -0.1 0.03 ±100% perf-profile.children.cycles-pp.llist_reverse_order
> > 0.34 ± 7% -0.1 0.26 ± 3% perf-profile.children.cycles-pp.anon_pipe_buf_release
> > 0.14 ± 6% -0.1 0.07 ± 17% perf-profile.children.cycles-pp.read@plt
> > 0.10 ± 17% -0.1 0.04 ± 75% perf-profile.children.cycles-pp.remove_entity_load_avg
> > 0.07 ± 10% -0.0 0.02 ± 99% perf-profile.children.cycles-pp.generic_update_time
> > 0.11 ± 6% -0.0 0.07 ± 8% perf-profile.children.cycles-pp.__mark_inode_dirty
> > 0.00 +0.1 0.06 ± 9% perf-profile.children.cycles-pp.load_balance
> > 0.00 +0.1 0.06 ± 11% perf-profile.children.cycles-pp._raw_spin_trylock
> > 0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.uncharge_folio
> > 0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.__do_softirq
> > 0.00 +0.1 0.07 ± 10% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
> > 0.00 +0.1 0.08 ± 14% perf-profile.children.cycles-pp.__get_obj_cgroup_from_memcg
> > 0.15 ± 23% +0.1 0.23 ± 7% perf-profile.children.cycles-pp.task_tick_fair
> > 0.19 ± 17% +0.1 0.28 ± 7% perf-profile.children.cycles-pp.scheduler_tick
> > 0.00 +0.1 0.10 ± 21% perf-profile.children.cycles-pp.select_idle_core
> > 0.00 +0.1 0.10 ± 9% perf-profile.children.cycles-pp.osq_unlock
> > 0.23 ± 12% +0.1 0.34 ± 6% perf-profile.children.cycles-pp.update_process_times
> > 0.37 ± 13% +0.1 0.48 ± 5% perf-profile.children.cycles-pp.hrtimer_interrupt
> > 0.24 ± 12% +0.1 0.35 ± 6% perf-profile.children.cycles-pp.tick_sched_handle
> > 0.31 ± 14% +0.1 0.43 ± 4% perf-profile.children.cycles-pp.__hrtimer_run_queues
> > 0.37 ± 12% +0.1 0.49 ± 5% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
> > 0.00 +0.1 0.12 ± 10% perf-profile.children.cycles-pp.__mod_memcg_state
> > 0.26 ± 10% +0.1 0.38 ± 6% perf-profile.children.cycles-pp.tick_sched_timer
> > 0.00 +0.1 0.13 ± 7% perf-profile.children.cycles-pp.free_unref_page
> > 0.00 +0.1 0.14 ± 8% perf-profile.children.cycles-pp.rmqueue
> > 0.15 ± 8% +0.2 0.30 ± 5% perf-profile.children.cycles-pp.rcu_all_qs
> > 0.16 ± 6% +0.2 0.31 ± 5% perf-profile.children.cycles-pp.__x64_sys_write
> > 0.00 +0.2 0.16 ± 10% perf-profile.children.cycles-pp.propagate_protected_usage
> > 0.00 +0.2 0.16 ± 10% perf-profile.children.cycles-pp.menu_select
> > 0.00 +0.2 0.16 ± 9% perf-profile.children.cycles-pp.memcg_account_kmem
> > 0.42 ± 12% +0.2 0.57 ± 4% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
> > 0.15 ± 11% +0.2 0.31 ± 8% perf-profile.children.cycles-pp.__x64_sys_read
> > 0.00 +0.2 0.17 ± 8% perf-profile.children.cycles-pp.get_page_from_freelist
> > 0.44 ± 11% +0.2 0.62 ± 4% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> > 0.10 ± 31% +0.2 0.28 ± 24% perf-profile.children.cycles-pp.mnt_user_ns
> > 0.16 ± 4% +0.2 0.35 ± 5% perf-profile.children.cycles-pp.kill_fasync
> > 0.20 ± 10% +0.2 0.40 ± 3% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
> > 0.09 ± 7% +0.2 0.29 ± 4% perf-profile.children.cycles-pp.page_copy_sane
> > 0.08 ± 8% +0.2 0.31 ± 6% perf-profile.children.cycles-pp.rw_verify_area
> > 0.12 ± 11% +0.2 0.36 ± 8% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
> > 0.28 ± 12% +0.2 0.52 ± 5% perf-profile.children.cycles-pp.inode_needs_update_time
> > 0.00 +0.3 0.27 ± 7% perf-profile.children.cycles-pp.__memcg_kmem_charge_page
> > 0.43 ± 6% +0.3 0.73 ± 5% perf-profile.children.cycles-pp.__cond_resched
> > 0.21 ± 29% +0.3 0.54 ± 15% perf-profile.children.cycles-pp.select_idle_cpu
> > 0.10 ± 10% +0.3 0.43 ± 17% perf-profile.children.cycles-pp.fsnotify_perm
> > 0.23 ± 11% +0.3 0.56 ± 6% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
> > 0.06 ± 75% +0.4 0.47 ± 27% perf-profile.children.cycles-pp.queue_event
> > 0.21 ± 9% +0.4 0.62 ± 5% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> > 0.06 ± 75% +0.4 0.48 ± 26% perf-profile.children.cycles-pp.ordered_events__queue
> > 0.06 ± 73% +0.4 0.50 ± 24% perf-profile.children.cycles-pp.process_simple
> > 0.01 ±223% +0.4 0.44 ± 9% perf-profile.children.cycles-pp.schedule_idle
> > 0.05 ± 8% +0.5 0.52 ± 7% perf-profile.children.cycles-pp.__alloc_pages
> > 0.45 ± 7% +0.5 0.94 ± 5% perf-profile.children.cycles-pp.__get_task_ioprio
> > 0.89 ± 8% +0.5 1.41 ± 4% perf-profile.children.cycles-pp.__might_sleep
> > 0.01 ±223% +0.5 0.54 ± 21% perf-profile.children.cycles-pp.flush_smp_call_function_queue
> > 0.05 ± 46% +0.5 0.60 ± 7% perf-profile.children.cycles-pp.osq_lock
> > 0.34 ± 8% +0.6 0.90 ± 5% perf-profile.children.cycles-pp.aa_file_perm
> > 0.01 ±223% +0.7 0.67 ± 7% perf-profile.children.cycles-pp.poll_idle
> > 0.14 ± 17% +0.7 0.82 ± 6% perf-profile.children.cycles-pp.mutex_spin_on_owner
> > 0.12 ± 12% +0.7 0.82 ± 15% perf-profile.children.cycles-pp.__cmd_record
> > 0.07 ± 72% +0.7 0.78 ± 19% perf-profile.children.cycles-pp.reader__read_event
> > 0.07 ± 72% +0.7 0.78 ± 19% perf-profile.children.cycles-pp.record__finish_output
> > 0.07 ± 72% +0.7 0.78 ± 19% perf-profile.children.cycles-pp.perf_session__process_events
> > 0.76 ± 8% +0.8 1.52 ± 5% perf-profile.children.cycles-pp.file_update_time
> > 0.08 ± 61% +0.8 0.85 ± 11% perf-profile.children.cycles-pp.intel_idle_irq
> > 1.23 ± 8% +0.9 2.11 ± 4% perf-profile.children.cycles-pp.__might_fault
> > 0.02 ±141% +1.0 0.97 ± 7% perf-profile.children.cycles-pp.page_counter_uncharge
> > 0.51 ± 9% +1.0 1.48 ± 4% perf-profile.children.cycles-pp.current_time
> > 0.05 ± 46% +1.1 1.15 ± 7% perf-profile.children.cycles-pp.uncharge_batch
> > 1.12 ± 6% +1.1 2.23 ± 5% perf-profile.children.cycles-pp.__fget_light
> > 0.06 ± 14% +1.2 1.23 ± 6% perf-profile.children.cycles-pp.__mem_cgroup_uncharge
> > 0.06 ± 14% +1.2 1.24 ± 7% perf-profile.children.cycles-pp.__folio_put
> > 0.64 ± 7% +1.2 1.83 ± 5% perf-profile.children.cycles-pp.syscall_return_via_sysret
> > 1.19 ± 8% +1.2 2.42 ± 4% perf-profile.children.cycles-pp.__might_resched
> > 0.59 ± 9% +1.3 1.84 ± 6% perf-profile.children.cycles-pp.atime_needs_update
> > 43.47 +1.4 44.83 perf-profile.children.cycles-pp.ksys_write
> > 1.28 ± 6% +1.4 2.68 ± 5% perf-profile.children.cycles-pp.__fdget_pos
> > 0.80 ± 8% +1.5 2.28 ± 6% perf-profile.children.cycles-pp.touch_atime
> > 0.11 ± 49% +1.5 1.59 ± 9% perf-profile.children.cycles-pp.cpuidle_enter_state
> > 0.11 ± 49% +1.5 1.60 ± 9% perf-profile.children.cycles-pp.cpuidle_enter
> > 0.12 ± 51% +1.7 1.81 ± 9% perf-profile.children.cycles-pp.cpuidle_idle_call
> > 1.44 ± 8% +1.8 3.22 ± 6% perf-profile.children.cycles-pp.copyin
> > 2.00 ± 9% +2.0 4.03 ± 5% perf-profile.children.cycles-pp.copyout
> > 1.02 ± 8% +2.0 3.07 ± 5% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> > 1.63 ± 7% +2.3 3.90 ± 5% perf-profile.children.cycles-pp.apparmor_file_permission
> > 2.64 ± 8% +2.3 4.98 ± 5% perf-profile.children.cycles-pp._copy_from_iter
> > 0.40 ± 14% +2.5 2.92 ± 7% perf-profile.children.cycles-pp.__mutex_lock
> > 2.91 ± 8% +2.6 5.54 ± 5% perf-profile.children.cycles-pp.copy_page_from_iter
> > 0.17 ± 62% +2.7 2.91 ± 11% perf-profile.children.cycles-pp.start_secondary
> > 1.83 ± 7% +2.8 4.59 ± 5% perf-profile.children.cycles-pp.security_file_permission
> > 0.17 ± 60% +2.8 2.94 ± 11% perf-profile.children.cycles-pp.do_idle
> > 0.17 ± 60% +2.8 2.96 ± 11% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
> > 0.17 ± 60% +2.8 2.96 ± 11% perf-profile.children.cycles-pp.cpu_startup_entry
> > 2.62 ± 9% +3.2 5.84 ± 6% perf-profile.children.cycles-pp._copy_to_iter
> > 1.55 ± 8% +3.2 4.79 ± 5% perf-profile.children.cycles-pp.__entry_text_start
> > 3.09 ± 8% +3.7 6.77 ± 5% perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
> > 2.95 ± 9% +3.8 6.73 ± 5% perf-profile.children.cycles-pp.copy_page_to_iter
> > 2.28 ± 11% +5.1 7.40 ± 6% perf-profile.children.cycles-pp.mutex_unlock
> > 3.92 ± 9% +6.0 9.94 ± 5% perf-profile.children.cycles-pp.mutex_lock
> > 8.37 ± 9% -5.8 2.60 ± 23% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> > 6.54 ± 15% -4.6 1.95 ± 57% perf-profile.self.cycles-pp.update_cfs_group
> > 3.08 ± 4% -2.3 0.74 ± 22% perf-profile.self.cycles-pp.switch_mm_irqs_off
> > 2.96 ± 4% -1.8 1.13 ± 33% perf-profile.self.cycles-pp.update_load_avg
> > 2.22 ± 8% -1.5 0.74 ± 12% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> > 1.96 ± 9% -1.5 0.48 ± 15% perf-profile.self.cycles-pp.update_curr
> > 1.94 ± 5% -1.3 0.64 ± 16% perf-profile.self.cycles-pp._raw_spin_lock
> > 1.78 ± 5% -1.3 0.50 ± 18% perf-profile.self.cycles-pp.__schedule
> > 1.59 ± 7% -1.2 0.40 ± 12% perf-profile.self.cycles-pp.enqueue_entity
> > 1.61 ± 6% -1.2 0.42 ± 25% perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
> > 1.44 ± 8% -1.0 0.39 ± 14% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
> > 1.42 ± 5% -1.0 0.41 ± 19% perf-profile.self.cycles-pp.__switch_to_asm
> > 1.18 ± 7% -0.9 0.33 ± 14% perf-profile.self.cycles-pp.__update_load_avg_se
> > 1.14 ± 10% -0.8 0.31 ± 9% perf-profile.self.cycles-pp.update_rq_clock
> > 0.90 ± 7% -0.7 0.19 ± 21% perf-profile.self.cycles-pp.pick_next_task_fair
> > 1.04 ± 7% -0.7 0.33 ± 13% perf-profile.self.cycles-pp.prepare_task_switch
> > 0.98 ± 4% -0.7 0.29 ± 20% perf-profile.self.cycles-pp.__switch_to
> > 0.88 ± 6% -0.7 0.20 ± 17% perf-profile.self.cycles-pp.enqueue_task_fair
> > 1.01 ± 6% -0.7 0.35 ± 10% perf-profile.self.cycles-pp.prepare_to_wait_event
> > 0.90 ± 8% -0.6 0.25 ± 12% perf-profile.self.cycles-pp.update_min_vruntime
> > 0.79 ± 17% -0.6 0.22 ± 9% perf-profile.self.cycles-pp.cpuacct_charge
> > 1.10 ± 5% -0.6 0.54 ± 9% perf-profile.self.cycles-pp.try_to_wake_up
> > 0.66 ± 3% -0.5 0.15 ± 27% perf-profile.self.cycles-pp.os_xsave
> > 0.71 ± 6% -0.5 0.22 ± 18% perf-profile.self.cycles-pp.reweight_entity
> > 0.68 ± 9% -0.5 0.19 ± 10% perf-profile.self.cycles-pp.perf_trace_sched_wakeup_template
> > 0.67 ± 9% -0.5 0.18 ± 11% perf-profile.self.cycles-pp.__wake_up_common
> > 0.65 ± 6% -0.5 0.17 ± 23% perf-profile.self.cycles-pp.switch_fpu_return
> > 0.60 ± 11% -0.5 0.14 ± 28% perf-profile.self.cycles-pp.perf_tp_event
> > 0.52 ± 44% -0.5 0.06 ±151% perf-profile.self.cycles-pp.native_irq_return_iret
> > 0.52 ± 7% -0.4 0.08 ± 25% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
> > 0.55 ± 4% -0.4 0.15 ± 22% perf-profile.self.cycles-pp.__calc_delta
> > 0.61 ± 5% -0.4 0.21 ± 12% perf-profile.self.cycles-pp.dequeue_task_fair
> > 0.69 ± 14% -0.4 0.32 ± 11% perf-profile.self.cycles-pp.task_h_load
> > 0.49 ± 11% -0.3 0.15 ± 29% perf-profile.self.cycles-pp.___perf_sw_event
> > 0.37 ± 4% -0.3 0.05 ± 73% perf-profile.self.cycles-pp.pick_next_entity
> > 0.50 ± 3% -0.3 0.19 ± 15% perf-profile.self.cycles-pp.select_idle_sibling
> > 0.38 ± 9% -0.3 0.08 ± 24% perf-profile.self.cycles-pp.set_next_buddy
> > 0.32 ± 4% -0.3 0.03 ±100% perf-profile.self.cycles-pp.put_prev_entity
> > 0.64 ± 6% -0.3 0.35 ± 7% perf-profile.self.cycles-pp._raw_spin_lock_irq
> > 0.52 ± 5% -0.3 0.25 ± 6% perf-profile.self.cycles-pp.__list_del_entry_valid
> > 0.34 ± 5% -0.3 0.07 ± 29% perf-profile.self.cycles-pp.schedule
> > 0.35 ± 9% -0.3 0.08 ± 10% perf-profile.self.cycles-pp.rb_insert_color
> > 0.40 ± 5% -0.3 0.14 ± 16% perf-profile.self.cycles-pp.select_task_rq_fair
> > 0.33 ± 6% -0.3 0.08 ± 16% perf-profile.self.cycles-pp.check_preempt_wakeup
> > 0.33 ± 8% -0.2 0.10 ± 16% perf-profile.self.cycles-pp.select_task_rq
> > 0.36 ± 3% -0.2 0.13 ± 16% perf-profile.self.cycles-pp.native_sched_clock
> > 0.32 ± 7% -0.2 0.10 ± 14% perf-profile.self.cycles-pp.finish_task_switch
> > 0.32 ± 4% -0.2 0.11 ± 13% perf-profile.self.cycles-pp.dequeue_entity
> > 0.32 ± 8% -0.2 0.12 ± 10% perf-profile.self.cycles-pp.__list_add_valid
> > 0.23 ± 5% -0.2 0.03 ±103% perf-profile.self.cycles-pp.resched_curr
> > 0.27 ± 6% -0.2 0.07 ± 21% perf-profile.self.cycles-pp.rb_erase
> > 0.27 ± 5% -0.2 0.08 ± 20% perf-profile.self.cycles-pp.__wrgsbase_inactive
> > 0.28 ± 13% -0.2 0.09 ± 12% perf-profile.self.cycles-pp.check_preempt_curr
> > 0.30 ± 13% -0.2 0.12 ± 7% perf-profile.self.cycles-pp.ttwu_queue_wakelist
> > 0.24 ± 5% -0.2 0.06 ± 19% perf-profile.self.cycles-pp.set_next_entity
> > 0.21 ± 34% -0.2 0.04 ± 71% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
> > 0.25 ± 5% -0.2 0.08 ± 16% perf-profile.self.cycles-pp.rcu_note_context_switch
> > 0.19 ± 26% -0.1 0.04 ± 73% perf-profile.self.cycles-pp.perf_trace_sched_stat_runtime
> > 0.20 ± 8% -0.1 0.06 ± 13% perf-profile.self.cycles-pp.ttwu_do_activate
> > 0.17 ± 8% -0.1 0.03 ±100% perf-profile.self.cycles-pp.rb_next
> > 0.22 ± 23% -0.1 0.09 ± 31% perf-profile.self.cycles-pp.migrate_task_rq_fair
> > 0.15 ± 32% -0.1 0.03 ±100% perf-profile.self.cycles-pp.llist_reverse_order
> > 0.16 ± 8% -0.1 0.06 ± 14% perf-profile.self.cycles-pp.wake_affine
> > 0.10 ± 31% -0.1 0.03 ±100% perf-profile.self.cycles-pp.sched_ttwu_pending
> > 0.14 ± 5% -0.1 0.07 ± 20% perf-profile.self.cycles-pp.read@plt
> > 0.32 ± 8% -0.1 0.26 ± 3% perf-profile.self.cycles-pp.anon_pipe_buf_release
> > 0.10 ± 6% -0.1 0.04 ± 45% perf-profile.self.cycles-pp.__wake_up_common_lock
> > 0.10 ± 9% -0.0 0.07 ± 8% perf-profile.self.cycles-pp.__mark_inode_dirty
> > 0.00 +0.1 0.06 ± 11% perf-profile.self.cycles-pp.free_unref_page
> > 0.00 +0.1 0.06 ± 6% perf-profile.self.cycles-pp.__alloc_pages
> > 0.00 +0.1 0.06 ± 11% perf-profile.self.cycles-pp._raw_spin_trylock
> > 0.00 +0.1 0.06 ± 7% perf-profile.self.cycles-pp.uncharge_folio
> > 0.00 +0.1 0.06 ± 11% perf-profile.self.cycles-pp.uncharge_batch
> > 0.00 +0.1 0.07 ± 10% perf-profile.self.cycles-pp.menu_select
> > 0.00 +0.1 0.08 ± 14% perf-profile.self.cycles-pp.__get_obj_cgroup_from_memcg
> > 0.00 +0.1 0.08 ± 7% perf-profile.self.cycles-pp.__memcg_kmem_charge_page
> > 0.00 +0.1 0.10 ± 10% perf-profile.self.cycles-pp.osq_unlock
> > 0.07 ± 5% +0.1 0.17 ± 8% perf-profile.self.cycles-pp.copyin
> > 0.00 +0.1 0.11 ± 11% perf-profile.self.cycles-pp.__mod_memcg_state
> > 0.13 ± 8% +0.1 0.24 ± 6% perf-profile.self.cycles-pp.rcu_all_qs
> > 0.14 ± 5% +0.1 0.28 ± 5% perf-profile.self.cycles-pp.__x64_sys_write
> > 0.07 ± 10% +0.1 0.21 ± 5% perf-profile.self.cycles-pp.page_copy_sane
> > 0.13 ± 12% +0.1 0.28 ± 9% perf-profile.self.cycles-pp.__x64_sys_read
> > 0.00 +0.2 0.15 ± 10% perf-profile.self.cycles-pp.propagate_protected_usage
> > 0.18 ± 9% +0.2 0.33 ± 4% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
> > 0.07 ± 8% +0.2 0.23 ± 5% perf-profile.self.cycles-pp.rw_verify_area
> > 0.08 ± 34% +0.2 0.24 ± 27% perf-profile.self.cycles-pp.mnt_user_ns
> > 0.13 ± 5% +0.2 0.31 ± 7% perf-profile.self.cycles-pp.kill_fasync
> > 0.21 ± 8% +0.2 0.39 ± 5% perf-profile.self.cycles-pp.__might_fault
> > 0.06 ± 13% +0.2 0.26 ± 9% perf-profile.self.cycles-pp.copyout
> > 0.10 ± 11% +0.2 0.31 ± 8% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
> > 0.26 ± 13% +0.2 0.49 ± 6% perf-profile.self.cycles-pp.inode_needs_update_time
> > 0.23 ± 8% +0.2 0.47 ± 5% perf-profile.self.cycles-pp.copy_page_from_iter
> > 0.14 ± 7% +0.2 0.38 ± 6% perf-profile.self.cycles-pp.file_update_time
> > 0.36 ± 7% +0.3 0.62 ± 4% perf-profile.self.cycles-pp.ksys_read
> > 0.54 ± 13% +0.3 0.80 ± 4% perf-profile.self.cycles-pp._copy_from_iter
> > 0.15 ± 5% +0.3 0.41 ± 8% perf-profile.self.cycles-pp.touch_atime
> > 0.14 ± 5% +0.3 0.40 ± 6% perf-profile.self.cycles-pp.__cond_resched
> > 0.18 ± 5% +0.3 0.47 ± 4% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> > 0.16 ± 8% +0.3 0.46 ± 6% perf-profile.self.cycles-pp.syscall_enter_from_user_mode
> > 0.16 ± 9% +0.3 0.47 ± 6% perf-profile.self.cycles-pp.__fdget_pos
> > 1.79 ± 8% +0.3 2.12 ± 3% perf-profile.self.cycles-pp.pipe_read
> > 0.10 ± 8% +0.3 0.43 ± 17% perf-profile.self.cycles-pp.fsnotify_perm
> > 0.20 ± 4% +0.4 0.55 ± 5% perf-profile.self.cycles-pp.ksys_write
> > 0.05 ± 76% +0.4 0.46 ± 27% perf-profile.self.cycles-pp.queue_event
> > 0.32 ± 6% +0.4 0.73 ± 6% perf-profile.self.cycles-pp.exit_to_user_mode_prepare
> > 0.21 ± 9% +0.4 0.62 ± 6% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
> > 0.79 ± 8% +0.4 1.22 ± 4% perf-profile.self.cycles-pp.__might_sleep
> > 0.44 ± 5% +0.4 0.88 ± 7% perf-profile.self.cycles-pp.do_syscall_64
> > 0.26 ± 8% +0.4 0.70 ± 4% perf-profile.self.cycles-pp.atime_needs_update
> > 0.42 ± 7% +0.5 0.88 ± 5% perf-profile.self.cycles-pp.__get_task_ioprio
> > 0.28 ± 12% +0.5 0.75 ± 5% perf-profile.self.cycles-pp.copy_page_to_iter
> > 0.19 ± 6% +0.5 0.68 ± 10% perf-profile.self.cycles-pp.security_file_permission
> > 0.31 ± 8% +0.5 0.83 ± 5% perf-profile.self.cycles-pp.aa_file_perm
> > 0.05 ± 46% +0.5 0.59 ± 8% perf-profile.self.cycles-pp.osq_lock
> > 0.30 ± 7% +0.5 0.85 ± 6% perf-profile.self.cycles-pp._copy_to_iter
> > 0.00 +0.6 0.59 ± 6% perf-profile.self.cycles-pp.poll_idle
> > 0.13 ± 20% +0.7 0.81 ± 6% perf-profile.self.cycles-pp.mutex_spin_on_owner
> > 0.38 ± 9% +0.7 1.12 ± 5% perf-profile.self.cycles-pp.current_time
> > 0.08 ± 59% +0.8 0.82 ± 11% perf-profile.self.cycles-pp.intel_idle_irq
> > 0.92 ± 6% +0.8 1.72 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 0.01 ±223% +0.8 0.82 ± 6% perf-profile.self.cycles-pp.page_counter_uncharge
> > 0.86 ± 7% +1.1 1.91 ± 4% perf-profile.self.cycles-pp.vfs_read
> > 1.07 ± 6% +1.1 2.14 ± 5% perf-profile.self.cycles-pp.__fget_light
> > 0.67 ± 7% +1.1 1.74 ± 6% perf-profile.self.cycles-pp.vfs_write
> > 0.15 ± 12% +1.1 1.28 ± 7% perf-profile.self.cycles-pp.__mutex_lock
> > 1.09 ± 6% +1.1 2.22 ± 5% perf-profile.self.cycles-pp.__libc_read
> > 0.62 ± 6% +1.2 1.79 ± 5% perf-profile.self.cycles-pp.syscall_return_via_sysret
> > 1.16 ± 8% +1.2 2.38 ± 4% perf-profile.self.cycles-pp.__might_resched
> > 0.91 ± 7% +1.3 2.20 ± 5% perf-profile.self.cycles-pp.__libc_write
> > 0.59 ± 8% +1.3 1.93 ± 6% perf-profile.self.cycles-pp.__entry_text_start
> > 1.27 ± 7% +1.7 3.00 ± 6% perf-profile.self.cycles-pp.apparmor_file_permission
> > 0.99 ± 8% +2.0 2.98 ± 5% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> > 1.74 ± 8% +3.4 5.15 ± 6% perf-profile.self.cycles-pp.pipe_write
> > 2.98 ± 8% +3.7 6.64 ± 5% perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
> > 2.62 ± 10% +4.8 7.38 ± 5% perf-profile.self.cycles-pp.mutex_lock
> > 2.20 ± 10% +5.1 7.30 ± 6% perf-profile.self.cycles-pp.mutex_unlock
> >
> >
> > ***************************************************************************************************
> > lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
> > =========================================================================================
> > compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> > gcc-11/performance/socket/4/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/hackbench
> >
> > commit:
> > a2e90611b9 ("sched/fair: Remove capacity inversion detection")
> > 829c1651e9 ("sched/fair: sanitize vruntime of entity being placed")
> >
> > a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 177139 -8.1% 162815 hackbench.throughput
> > 174484 -18.8% 141618 ± 2% hackbench.throughput_avg
> > 177139 -8.1% 162815 hackbench.throughput_best
> > 168530 -37.3% 105615 ± 3% hackbench.throughput_worst
> > 281.38 +23.1% 346.39 ± 2% hackbench.time.elapsed_time
> > 281.38 +23.1% 346.39 ± 2% hackbench.time.elapsed_time.max
> > 1.053e+08 ± 2% +688.4% 8.302e+08 ± 9% hackbench.time.involuntary_context_switches
> > 21992 +27.8% 28116 ± 2% hackbench.time.system_time
> > 6652 +8.2% 7196 hackbench.time.user_time
> > 3.482e+08 +289.2% 1.355e+09 ± 9% hackbench.time.voluntary_context_switches
> > 2110813 ± 5% +21.6% 2565791 ± 3% cpuidle..usage
> > 333.95 +19.5% 399.05 uptime.boot
> > 0.03 -0.0 0.03 mpstat.cpu.all.soft%
> > 22.68 -2.9 19.77 mpstat.cpu.all.usr%
> > 561083 ± 10% +45.5% 816171 ± 12% numa-numastat.node0.local_node
> > 614314 ± 9% +36.9% 841173 ± 12% numa-numastat.node0.numa_hit
> > 1393279 ± 7% -16.8% 1158997 ± 2% numa-numastat.node1.local_node
> > 1443679 ± 5% -14.9% 1229074 ± 3% numa-numastat.node1.numa_hit
> > 4129900 ± 8% -23.0% 3181115 vmstat.memory.cache
> > 1731 +30.8% 2265 vmstat.procs.r
> > 1598044 +290.3% 6237840 ± 7% vmstat.system.cs
> > 320762 +60.5% 514672 ± 8% vmstat.system.in
> > 962111 ± 6% +46.0% 1404646 ± 7% turbostat.C1
> > 233987 ± 5% +51.2% 353892 turbostat.C1E
> > 91515563 +97.3% 1.806e+08 ± 10% turbostat.IRQ
> > 448466 ± 14% -34.2% 294934 ± 5% turbostat.POLL
> > 34.60 -7.3% 32.07 turbostat.RAMWatt
> > 514028 ± 2% -14.0% 442125 ± 2% meminfo.AnonPages
> > 4006312 ± 8% -23.9% 3047078 meminfo.Cached
> > 3321064 ± 10% -32.7% 2236362 ± 2% meminfo.Committed_AS
> > 1714752 ± 21% -60.3% 680479 ± 8% meminfo.Inactive
> > 1714585 ± 21% -60.3% 680305 ± 8% meminfo.Inactive(anon)
> > 757124 ± 18% -67.2% 248485 ± 27% meminfo.Mapped
> > 6476123 ± 6% -19.4% 5220738 meminfo.Memused
> > 1275724 ± 26% -75.2% 316896 ± 15% meminfo.Shmem
> > 6806047 ± 3% -13.3% 5901974 meminfo.max_used_kB
> > 161311 ± 23% +31.7% 212494 ± 5% numa-meminfo.node0.AnonPages
> > 165693 ± 22% +30.5% 216264 ± 5% numa-meminfo.node0.Inactive
> > 165563 ± 22% +30.6% 216232 ± 5% numa-meminfo.node0.Inactive(anon)
> > 140638 ± 19% -36.7% 89034 ± 11% numa-meminfo.node0.Mapped
> > 352173 ± 14% -35.3% 227805 ± 8% numa-meminfo.node1.AnonPages
> > 501396 ± 11% -22.6% 388042 ± 5% numa-meminfo.node1.AnonPages.max
> > 1702242 ± 43% -77.8% 378325 ± 22% numa-meminfo.node1.FilePages
> > 1540803 ± 25% -70.4% 455592 ± 13% numa-meminfo.node1.Inactive
> > 1540767 ± 25% -70.4% 455451 ± 13% numa-meminfo.node1.Inactive(anon)
> > 612123 ± 18% -74.9% 153752 ± 37% numa-meminfo.node1.Mapped
> > 3085231 ± 24% -53.9% 1420940 ± 14% numa-meminfo.node1.MemUsed
> > 254052 ± 4% -19.1% 205632 ± 21% numa-meminfo.node1.SUnreclaim
> > 1259640 ± 27% -75.9% 303123 ± 15% numa-meminfo.node1.Shmem
> > 304597 ± 7% -20.2% 242920 ± 17% numa-meminfo.node1.Slab
> > 40345 ± 23% +31.5% 53054 ± 5% numa-vmstat.node0.nr_anon_pages
> > 41412 ± 22% +30.4% 53988 ± 5% numa-vmstat.node0.nr_inactive_anon
> > 35261 ± 19% -36.9% 22256 ± 12% numa-vmstat.node0.nr_mapped
> > 41412 ± 22% +30.4% 53988 ± 5% numa-vmstat.node0.nr_zone_inactive_anon
> > 614185 ± 9% +36.9% 841065 ± 12% numa-vmstat.node0.numa_hit
> > 560955 ± 11% +45.5% 816063 ± 12% numa-vmstat.node0.numa_local
> > 88129 ± 14% -35.2% 57097 ± 8% numa-vmstat.node1.nr_anon_pages
> > 426425 ± 43% -77.9% 94199 ± 22% numa-vmstat.node1.nr_file_pages
> > 386166 ± 25% -70.5% 113880 ± 13% numa-vmstat.node1.nr_inactive_anon
> > 153658 ± 18% -75.3% 38021 ± 37% numa-vmstat.node1.nr_mapped
> > 315775 ± 27% -76.1% 75399 ± 16% numa-vmstat.node1.nr_shmem
> > 63411 ± 4% -18.6% 51593 ± 21% numa-vmstat.node1.nr_slab_unreclaimable
> > 386166 ± 25% -70.5% 113880 ± 13% numa-vmstat.node1.nr_zone_inactive_anon
> > 1443470 ± 5% -14.9% 1228740 ± 3% numa-vmstat.node1.numa_hit
> > 1393069 ± 7% -16.8% 1158664 ± 2% numa-vmstat.node1.numa_local
> > 128457 ± 2% -14.0% 110530 ± 3% proc-vmstat.nr_anon_pages
> > 999461 ± 8% -23.8% 761774 proc-vmstat.nr_file_pages
> > 426485 ± 21% -60.1% 170237 ± 9% proc-vmstat.nr_inactive_anon
> > 82464 -2.6% 80281 proc-vmstat.nr_kernel_stack
> > 187777 ± 18% -66.9% 62076 ± 28% proc-vmstat.nr_mapped
> > 316813 ± 27% -75.0% 79228 ± 16% proc-vmstat.nr_shmem
> > 31469 -2.0% 30840 proc-vmstat.nr_slab_reclaimable
> > 117889 -8.4% 108036 proc-vmstat.nr_slab_unreclaimable
> > 426485 ± 21% -60.1% 170237 ± 9% proc-vmstat.nr_zone_inactive_anon
> > 187187 ± 12% -43.5% 105680 ± 9% proc-vmstat.numa_hint_faults
> > 128363 ± 15% -61.5% 49371 ± 19% proc-vmstat.numa_hint_faults_local
> > 47314 ± 22% +39.2% 65863 ± 13% proc-vmstat.numa_pages_migrated
> > 457026 ± 9% -18.1% 374188 ± 13% proc-vmstat.numa_pte_updates
> > 2586600 ± 3% +27.7% 3302787 ± 8% proc-vmstat.pgalloc_normal
> > 1589970 -6.2% 1491838 proc-vmstat.pgfault
> > 2347186 ± 10% +37.7% 3232369 ± 8% proc-vmstat.pgfree
> > 47314 ± 22% +39.2% 65863 ± 13% proc-vmstat.pgmigrate_success
> > 112713 +7.0% 120630 ± 3% proc-vmstat.pgreuse
> > 2189056 +22.2% 2674944 ± 2% proc-vmstat.unevictable_pgs_scanned
> > 14.08 ± 2% +29.3% 18.20 ± 5% sched_debug.cfs_rq:/.h_nr_running.avg
> > 0.80 ± 14% +179.2% 2.23 ± 24% sched_debug.cfs_rq:/.h_nr_running.min
> > 245.23 ± 12% -19.7% 196.97 ± 6% sched_debug.cfs_rq:/.load_avg.max
> > 2.27 ± 16% +75.0% 3.97 ± 4% sched_debug.cfs_rq:/.load_avg.min
> > 45.77 ± 16% -17.8% 37.60 ± 6% sched_debug.cfs_rq:/.load_avg.stddev
> > 11842707 +39.9% 16567992 sched_debug.cfs_rq:/.min_vruntime.avg
> > 13773080 ± 3% +113.9% 29460281 ± 7% sched_debug.cfs_rq:/.min_vruntime.max
> > 11423218 +30.3% 14885830 sched_debug.cfs_rq:/.min_vruntime.min
> > 301190 ± 12% +439.9% 1626088 ± 10% sched_debug.cfs_rq:/.min_vruntime.stddev
> > 203.83 -16.3% 170.67 sched_debug.cfs_rq:/.removed.load_avg.max
> > 14330 ± 3% +30.9% 18756 ± 5% sched_debug.cfs_rq:/.runnable_avg.avg
> > 25115 ± 4% +15.5% 28999 ± 6% sched_debug.cfs_rq:/.runnable_avg.max
> > 3811 ± 11% +68.0% 6404 ± 21% sched_debug.cfs_rq:/.runnable_avg.min
> > 3818 ± 6% +15.3% 4404 ± 7% sched_debug.cfs_rq:/.runnable_avg.stddev
> > -849635 +410.6% -4338612 sched_debug.cfs_rq:/.spread0.avg
> > 1092373 ± 54% +691.1% 8641673 ± 21% sched_debug.cfs_rq:/.spread0.max
> > -1263082 +378.1% -6038905 sched_debug.cfs_rq:/.spread0.min
> > 300764 ± 12% +441.8% 1629507 ± 9% sched_debug.cfs_rq:/.spread0.stddev
> > 1591 ± 4% -11.1% 1413 ± 3% sched_debug.cfs_rq:/.util_avg.max
> > 288.90 ± 11% +64.5% 475.23 ± 13% sched_debug.cfs_rq:/.util_avg.min
> > 240.33 ± 2% -32.1% 163.09 ± 3% sched_debug.cfs_rq:/.util_avg.stddev
> > 494.27 ± 3% +41.6% 699.85 ± 3% sched_debug.cfs_rq:/.util_est_enqueued.avg
> > 11.23 ± 54% +634.1% 82.47 ± 22% sched_debug.cfs_rq:/.util_est_enqueued.min
> > 174576 +20.7% 210681 sched_debug.cpu.clock.avg
> > 174926 +21.2% 211944 sched_debug.cpu.clock.max
> > 174164 +20.3% 209436 sched_debug.cpu.clock.min
> > 230.84 ± 33% +226.1% 752.67 ± 20% sched_debug.cpu.clock.stddev
> > 172836 +20.6% 208504 sched_debug.cpu.clock_task.avg
> > 173552 +21.0% 210079 sched_debug.cpu.clock_task.max
> > 156807 +22.3% 191789 sched_debug.cpu.clock_task.min
> > 1634 +17.1% 1914 ± 5% sched_debug.cpu.clock_task.stddev
> > 0.00 ± 32% +220.1% 0.00 ± 20% sched_debug.cpu.next_balance.stddev
> > 14.12 ± 2% +28.7% 18.18 ± 5% sched_debug.cpu.nr_running.avg
> > 0.73 ± 25% +213.6% 2.30 ± 24% sched_debug.cpu.nr_running.min
> > 1810086 +461.3% 10159215 ± 10% sched_debug.cpu.nr_switches.avg
> > 2315994 ± 3% +515.6% 14258195 ± 9% sched_debug.cpu.nr_switches.max
> > 1529863 +380.3% 7348324 ± 9% sched_debug.cpu.nr_switches.min
> > 167487 ± 18% +770.8% 1458519 ± 21% sched_debug.cpu.nr_switches.stddev
> > 174149 +20.2% 209410 sched_debug.cpu_clk
> > 170980 +20.6% 206240 sched_debug.ktime
> > 174896 +20.2% 210153 sched_debug.sched_clk
> > 7.35 +24.9% 9.18 ± 4% perf-stat.i.MPKI
> > 1.918e+10 +14.4% 2.194e+10 perf-stat.i.branch-instructions
> > 2.16 -0.1 2.09 perf-stat.i.branch-miss-rate%
> > 4.133e+08 +6.6% 4.405e+08 perf-stat.i.branch-misses
> > 23.08 -9.2 13.86 ± 7% perf-stat.i.cache-miss-rate%
> > 1.714e+08 -37.2% 1.076e+08 ± 3% perf-stat.i.cache-misses
> > 7.497e+08 +33.7% 1.002e+09 ± 5% perf-stat.i.cache-references
> > 1636365 +382.4% 7893858 ± 5% perf-stat.i.context-switches
> > 2.74 -6.8% 2.56 perf-stat.i.cpi
> > 131725 +288.0% 511159 ± 10% perf-stat.i.cpu-migrations
> > 1672 +160.8% 4361 ± 4% perf-stat.i.cycles-between-cache-misses
> > 0.49 +0.6 1.11 ± 5% perf-stat.i.dTLB-load-miss-rate%
> > 1.417e+08 +158.7% 3.665e+08 ± 5% perf-stat.i.dTLB-load-misses
> > 2.908e+10 +9.1% 3.172e+10 perf-stat.i.dTLB-loads
> > 0.12 ± 4% +0.1 0.20 ± 4% perf-stat.i.dTLB-store-miss-rate%
> > 20805655 ± 4% +90.9% 39716345 ± 4% perf-stat.i.dTLB-store-misses
> > 1.755e+10 +8.6% 1.907e+10 perf-stat.i.dTLB-stores
> > 29.04 +3.6 32.62 ± 2% perf-stat.i.iTLB-load-miss-rate%
> > 56676082 +60.4% 90917582 ± 3% perf-stat.i.iTLB-load-misses
> > 1.381e+08 +30.6% 1.804e+08 perf-stat.i.iTLB-loads
> > 1.03e+11 +10.5% 1.139e+11 perf-stat.i.instructions
> > 1840 -21.1% 1451 ± 4% perf-stat.i.instructions-per-iTLB-miss
> > 0.37 +10.9% 0.41 perf-stat.i.ipc
> > 1084 -4.5% 1035 ± 2% perf-stat.i.metric.K/sec
> > 640.69 +10.3% 706.44 perf-stat.i.metric.M/sec
> > 5249 -9.3% 4762 ± 3% perf-stat.i.minor-faults
> > 23.57 +18.7 42.30 ± 8% perf-stat.i.node-load-miss-rate%
> > 40174555 -45.0% 22109431 ± 10% perf-stat.i.node-loads
> > 8.84 ± 2% +24.5 33.30 ± 10% perf-stat.i.node-store-miss-rate%
> > 2912322 +60.3% 4667137 ± 16% perf-stat.i.node-store-misses
> > 34046752 -50.6% 16826621 ± 9% perf-stat.i.node-stores
> > 5278 -9.2% 4791 ± 3% perf-stat.i.page-faults
> > 7.24 +12.1% 8.12 ± 4% perf-stat.overall.MPKI
> > 2.15 -0.1 2.05 perf-stat.overall.branch-miss-rate%
> > 22.92 -9.5 13.41 ± 7% perf-stat.overall.cache-miss-rate%
> > 2.73 -6.3% 2.56 perf-stat.overall.cpi
> > 1644 +43.4% 2358 ± 3% perf-stat.overall.cycles-between-cache-misses
> > 0.48 +0.5 0.99 ± 4% perf-stat.overall.dTLB-load-miss-rate%
> > 0.12 ± 4% +0.1 0.19 ± 4% perf-stat.overall.dTLB-store-miss-rate%
> > 29.06 +2.9 32.01 ± 2% perf-stat.overall.iTLB-load-miss-rate%
> > 1826 -26.6% 1340 ± 4% perf-stat.overall.instructions-per-iTLB-miss
> > 0.37 +6.8% 0.39 perf-stat.overall.ipc
> > 22.74 +6.8 29.53 ± 13% perf-stat.overall.node-load-miss-rate%
> > 7.63 +8.4 16.02 ± 20% perf-stat.overall.node-store-miss-rate%
> > 1.915e+10 +9.0% 2.088e+10 perf-stat.ps.branch-instructions
> > 4.119e+08 +3.9% 4.282e+08 perf-stat.ps.branch-misses
> > 1.707e+08 -30.5% 1.186e+08 ± 3% perf-stat.ps.cache-misses
> > 7.446e+08 +19.2% 8.874e+08 ± 4% perf-stat.ps.cache-references
> > 1611874 +289.1% 6271376 ± 7% perf-stat.ps.context-switches
> > 127362 +189.0% 368041 ± 11% perf-stat.ps.cpu-migrations
> > 1.407e+08 +116.2% 3.042e+08 ± 5% perf-stat.ps.dTLB-load-misses
> > 2.901e+10 +5.4% 3.057e+10 perf-stat.ps.dTLB-loads
> > 20667480 ± 4% +66.8% 34473793 ± 4% perf-stat.ps.dTLB-store-misses
> > 1.751e+10 +5.1% 1.84e+10 perf-stat.ps.dTLB-stores
> > 56310692 +45.0% 81644183 ± 4% perf-stat.ps.iTLB-load-misses
> > 1.375e+08 +26.1% 1.733e+08 perf-stat.ps.iTLB-loads
> > 1.028e+11 +6.3% 1.093e+11 perf-stat.ps.instructions
> > 4929 -24.5% 3723 ± 2% perf-stat.ps.minor-faults
> > 40134633 -32.9% 26946247 ± 9% perf-stat.ps.node-loads
> > 2805073 +39.5% 3914304 ± 16% perf-stat.ps.node-store-misses
> > 33938259 -38.9% 20726382 ± 8% perf-stat.ps.node-stores
> > 4952 -24.5% 3741 ± 2% perf-stat.ps.page-faults
> > 2.911e+13 +30.9% 3.809e+13 ± 2% perf-stat.total.instructions
> > 15.30 ± 4% -8.6 6.66 ± 5% perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.vfs_write
> > 13.84 ± 6% -7.9 5.98 ± 6% perf-profile.calltrace.cycles-pp.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
> > 13.61 ± 6% -7.8 5.84 ± 6% perf-profile.calltrace.cycles-pp.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg
> > 9.00 ± 2% -5.5 3.48 ± 4% perf-profile.calltrace.cycles-pp.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
> > 6.44 ± 4% -4.3 2.14 ± 6% perf-profile.calltrace.cycles-pp.skb_release_data.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
> > 5.83 ± 8% -3.4 2.44 ± 5% perf-profile.calltrace.cycles-pp.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
> > 5.81 ± 6% -3.3 2.48 ± 6% perf-profile.calltrace.cycles-pp.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
> > 5.50 ± 7% -3.2 2.32 ± 6% perf-profile.calltrace.cycles-pp.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> > 5.07 ± 8% -3.0 2.04 ± 6% perf-profile.calltrace.cycles-pp.__kmem_cache_alloc_node.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
> > 6.22 ± 2% -2.9 3.33 ± 3% perf-profile.calltrace.cycles-pp.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
> > 6.17 ± 2% -2.9 3.30 ± 3% perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
> > 6.11 ± 2% -2.9 3.24 ± 3% perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg
> > 50.99 -2.6 48.39 perf-profile.calltrace.cycles-pp.__libc_read
> > 5.66 ± 3% -2.3 3.35 ± 3% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_read
> > 5.52 ± 3% -2.3 3.27 ± 3% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_write
> > 3.14 ± 2% -1.7 1.42 ± 4% perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
> > 2.73 ± 2% -1.6 1.15 ± 4% perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor
> > 2.59 ± 2% -1.5 1.07 ± 4% perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter
> > 2.72 ± 3% -1.4 1.34 ± 6% perf-profile.calltrace.cycles-pp.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
> > 41.50 -1.2 40.27 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
> > 2.26 ± 4% -1.1 1.12 perf-profile.calltrace.cycles-pp.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
> > 2.76 ± 3% -1.1 1.63 ± 3% perf-profile.calltrace.cycles-pp.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor
> > 2.84 ± 3% -1.1 1.71 ± 2% perf-profile.calltrace.cycles-pp.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
> > 2.20 ± 4% -1.1 1.08 perf-profile.calltrace.cycles-pp.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
> > 2.98 ± 2% -1.1 1.90 ± 6% perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.vfs_write
> > 1.99 ± 4% -1.1 0.92 ± 2% perf-profile.calltrace.cycles-pp.sock_wfree.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic
> > 2.10 ± 3% -1.0 1.08 ± 4% perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter
> > 2.08 ± 4% -0.8 1.24 ± 3% perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_write
> > 2.16 ± 3% -0.7 1.47 perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_read
> > 2.20 ± 2% -0.7 1.52 ± 3% perf-profile.calltrace.cycles-pp.__kmem_cache_free.skb_release_data.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
> > 1.46 ± 3% -0.6 0.87 ± 8% perf-profile.calltrace.cycles-pp._copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
> > 4.82 ± 2% -0.6 4.24 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> > 1.31 ± 2% -0.4 0.90 ± 4% perf-profile.calltrace.cycles-pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
> > 0.96 ± 3% -0.4 0.57 ± 10% perf-profile.calltrace.cycles-pp.copyin._copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg
> > 1.14 ± 3% -0.4 0.76 ± 5% perf-profile.calltrace.cycles-pp.memcg_slab_post_alloc_hook.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> > 0.99 ± 3% -0.3 0.65 ± 8% perf-profile.calltrace.cycles-pp.memcg_slab_post_alloc_hook.__kmem_cache_alloc_node.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb
> > 1.30 ± 4% -0.3 0.99 ± 3% perf-profile.calltrace.cycles-pp.sock_recvmsg.sock_read_iter.vfs_read.ksys_read.do_syscall_64
> > 0.98 ± 2% -0.3 0.69 ± 3% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.67 -0.2 0.42 ± 50% perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg
> > 0.56 ± 4% -0.2 0.32 ± 81% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> > 0.86 ± 2% -0.2 0.63 ± 3% perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_write.ksys_write.do_syscall_64
> > 1.15 ± 4% -0.2 0.93 ± 4% perf-profile.calltrace.cycles-pp.security_socket_recvmsg.sock_recvmsg.sock_read_iter.vfs_read.ksys_read
> > 0.90 -0.2 0.69 ± 3% perf-profile.calltrace.cycles-pp.get_obj_cgroup_from_current.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> > 1.23 ± 3% -0.2 1.07 ± 3% perf-profile.calltrace.cycles-pp.security_socket_sendmsg.sock_sendmsg.sock_write_iter.vfs_write.ksys_write
> > 1.05 ± 2% -0.2 0.88 ± 2% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.84 ± 4% -0.2 0.68 ± 4% perf-profile.calltrace.cycles-pp.aa_sk_perm.security_socket_recvmsg.sock_recvmsg.sock_read_iter.vfs_read
> > 0.88 -0.1 0.78 ± 5% perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_read.ksys_read.do_syscall_64
> > 0.94 ± 3% -0.1 0.88 ± 4% perf-profile.calltrace.cycles-pp.aa_sk_perm.security_socket_sendmsg.sock_sendmsg.sock_write_iter.vfs_write
> > 0.62 ± 2% +0.3 0.90 ± 2% perf-profile.calltrace.cycles-pp.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
> > 0.00 +0.6 0.58 ± 3% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.prepare_to_wait.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
> > 0.00 +0.6 0.61 ± 6% perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> > 0.00 +0.6 0.62 ± 4% perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__schedule.schedule.exit_to_user_mode_loop
> > 0.00 +0.7 0.67 ± 11% perf-profile.calltrace.cycles-pp.update_load_avg.dequeue_entity.dequeue_task_fair.__schedule.schedule
> > 0.00 +0.7 0.67 ± 7% perf-profile.calltrace.cycles-pp.__switch_to_asm.__libc_write
> > 0.00 +0.8 0.76 ± 4% perf-profile.calltrace.cycles-pp.reweight_entity.dequeue_task_fair.__schedule.schedule.schedule_timeout
> > 0.00 +0.8 0.77 ± 4% perf-profile.calltrace.cycles-pp.___perf_sw_event.prepare_task_switch.__schedule.schedule.schedule_timeout
> > 0.00 +0.8 0.77 ± 8% perf-profile.calltrace.cycles-pp.put_prev_entity.pick_next_task_fair.__schedule.schedule.exit_to_user_mode_loop
> > 0.00 +0.8 0.81 ± 5% perf-profile.calltrace.cycles-pp.prepare_task_switch.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare
> > 0.00 +0.8 0.81 ± 5% perf-profile.calltrace.cycles-pp.check_preempt_wakeup.check_preempt_curr.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> > 0.00 +0.8 0.82 ± 2% perf-profile.calltrace.cycles-pp.__switch_to_asm.__libc_read
> > 0.00 +0.8 0.82 ± 3% perf-profile.calltrace.cycles-pp.reweight_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> > 0.00 +0.9 0.86 ± 5% perf-profile.calltrace.cycles-pp.perf_trace_sched_wakeup_template.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> > 0.00 +0.9 0.87 ± 8% perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
> > 29.66 +0.9 30.58 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> > 0.00 +1.0 0.95 ± 3% perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__schedule.schedule.schedule_timeout
> > 0.00 +1.0 0.98 ± 4% perf-profile.calltrace.cycles-pp.check_preempt_curr.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
> > 0.00 +1.0 0.99 ± 3% perf-profile.calltrace.cycles-pp.update_curr.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
> > 0.00 +1.0 1.05 ± 4% perf-profile.calltrace.cycles-pp.prepare_to_wait.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
> > 0.00 +1.1 1.07 ± 12% perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function
> > 27.81 ± 2% +1.2 28.98 perf-profile.calltrace.cycles-pp.unix_stream_recvmsg.sock_read_iter.vfs_read.ksys_read.do_syscall_64
> > 27.36 ± 2% +1.2 28.59 perf-profile.calltrace.cycles-pp.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read.ksys_read
> > 0.00 +1.5 1.46 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common
> > 0.00 +1.6 1.55 ± 4% perf-profile.calltrace.cycles-pp.prepare_task_switch.__schedule.schedule.schedule_timeout.unix_stream_data_wait
> > 0.00 +1.6 1.60 ± 4% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> > 27.58 +1.6 29.19 perf-profile.calltrace.cycles-pp.sock_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.00 +1.6 1.63 ± 5% perf-profile.calltrace.cycles-pp.update_curr.dequeue_entity.dequeue_task_fair.__schedule.schedule
> > 0.00 +1.6 1.65 ± 5% perf-profile.calltrace.cycles-pp.restore_fpregs_from_fpstate.switch_fpu_return.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
> > 0.00 +1.7 1.66 ± 6% perf-profile.calltrace.cycles-pp.switch_mm_irqs_off.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare
> > 0.00 +1.8 1.80 perf-profile.calltrace.cycles-pp._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> > 0.00 +1.8 1.84 ± 2% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.schedule_timeout.unix_stream_data_wait
> > 0.00 +2.0 1.97 ± 2% perf-profile.calltrace.cycles-pp.switch_mm_irqs_off.__schedule.schedule.schedule_timeout.unix_stream_data_wait
> > 26.63 ± 2% +2.0 28.61 perf-profile.calltrace.cycles-pp.sock_sendmsg.sock_write_iter.vfs_write.ksys_write.do_syscall_64
> > 0.00 +2.0 2.01 ± 6% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare
> > 0.00 +2.1 2.09 ± 6% perf-profile.calltrace.cycles-pp.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common
> > 0.00 +2.1 2.11 ± 5% perf-profile.calltrace.cycles-pp.switch_fpu_return.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 25.21 ± 2% +2.2 27.43 perf-profile.calltrace.cycles-pp.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.vfs_write.ksys_write
> > 0.00 +2.4 2.43 ± 5% perf-profile.calltrace.cycles-pp.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> > 48.00 +2.7 50.69 perf-profile.calltrace.cycles-pp.__libc_write
> > 0.00 +2.9 2.87 ± 5% perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.schedule_timeout
> > 0.09 ±223% +3.4 3.47 ± 3% perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> > 39.07 +4.8 43.84 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_write
> > 0.66 ± 18% +5.0 5.62 ± 4% perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.schedule_timeout.unix_stream_data_wait
> > 4.73 +5.1 9.88 ± 3% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> > 0.66 ± 20% +5.3 5.98 ± 3% perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
> > 35.96 +5.7 41.68 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> > 0.00 +6.0 6.02 ± 6% perf-profile.calltrace.cycles-pp.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
> > 0.00 +6.2 6.18 ± 6% perf-profile.calltrace.cycles-pp.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
> > 0.00 +6.4 6.36 ± 6% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.78 ± 19% +6.4 7.15 ± 3% perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> > 0.18 ±141% +7.0 7.18 ± 6% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> > 1.89 ± 15% +12.1 13.96 ± 3% perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic
> > 1.92 ± 15% +12.3 14.23 ± 3% perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
> > 1.66 ± 19% +12.4 14.06 ± 2% perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.sock_def_readable
> > 1.96 ± 15% +12.5 14.48 ± 3% perf-profile.calltrace.cycles-pp.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
> > 1.69 ± 19% +12.7 14.38 ± 2% perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.sock_def_readable.unix_stream_sendmsg
> > 1.75 ± 19% +13.0 14.75 ± 2% perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.sock_def_readable.unix_stream_sendmsg.sock_sendmsg
> > 2.53 ± 10% +13.4 15.90 ± 2% perf-profile.calltrace.cycles-pp.sock_def_readable.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.vfs_write
> > 1.96 ± 16% +13.5 15.42 ± 2% perf-profile.calltrace.cycles-pp.__wake_up_common_lock.sock_def_readable.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
> > 2.28 ± 15% +14.6 16.86 ± 3% perf-profile.calltrace.cycles-pp.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
> > 15.31 ± 4% -8.6 6.67 ± 5% perf-profile.children.cycles-pp.sock_alloc_send_pskb
> > 13.85 ± 6% -7.9 5.98 ± 5% perf-profile.children.cycles-pp.alloc_skb_with_frags
> > 13.70 ± 6% -7.8 5.89 ± 6% perf-profile.children.cycles-pp.__alloc_skb
> > 9.01 ± 2% -5.5 3.48 ± 4% perf-profile.children.cycles-pp.consume_skb
> > 6.86 ± 26% -4.7 2.15 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> > 11.27 ± 3% -4.6 6.67 ± 3% perf-profile.children.cycles-pp.syscall_return_via_sysret
> > 6.46 ± 4% -4.3 2.15 ± 6% perf-profile.children.cycles-pp.skb_release_data
> > 4.18 ± 25% -4.0 0.15 ± 69% perf-profile.children.cycles-pp.___slab_alloc
> > 5.76 ± 32% -3.9 1.91 ± 3% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> > 5.98 ± 8% -3.5 2.52 ± 5% perf-profile.children.cycles-pp.kmem_cache_alloc_node
> > 5.84 ± 6% -3.3 2.50 ± 6% perf-profile.children.cycles-pp.kmalloc_reserve
> > 3.33 ± 30% -3.3 0.05 ± 88% perf-profile.children.cycles-pp.get_partial_node
> > 5.63 ± 7% -3.3 2.37 ± 6% perf-profile.children.cycles-pp.__kmalloc_node_track_caller
> > 5.20 ± 7% -3.1 2.12 ± 6% perf-profile.children.cycles-pp.__kmem_cache_alloc_node
> > 6.23 ± 2% -2.9 3.33 ± 3% perf-profile.children.cycles-pp.unix_stream_read_actor
> > 6.18 ± 2% -2.9 3.31 ± 3% perf-profile.children.cycles-pp.skb_copy_datagram_iter
> > 6.11 ± 2% -2.9 3.25 ± 3% perf-profile.children.cycles-pp.__skb_datagram_iter
> > 51.39 -2.5 48.85 perf-profile.children.cycles-pp.__libc_read
> > 3.14 ± 3% -2.5 0.61 ± 13% perf-profile.children.cycles-pp.__slab_free
> > 5.34 ± 3% -2.1 3.23 ± 3% perf-profile.children.cycles-pp.__entry_text_start
> > 3.57 ± 2% -1.9 1.66 ± 6% perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
> > 3.16 ± 2% -1.7 1.43 ± 4% perf-profile.children.cycles-pp._copy_to_iter
> > 2.74 ± 2% -1.6 1.16 ± 4% perf-profile.children.cycles-pp.copyout
> > 4.16 ± 2% -1.5 2.62 ± 3% perf-profile.children.cycles-pp.__check_object_size
> > 2.73 ± 3% -1.4 1.35 ± 6% perf-profile.children.cycles-pp.kmem_cache_free
> > 2.82 ± 2% -1.2 1.63 ± 3% perf-profile.children.cycles-pp.check_heap_object
> > 2.27 ± 4% -1.1 1.13 ± 2% perf-profile.children.cycles-pp.skb_release_head_state
> > 2.85 ± 3% -1.1 1.72 ± 2% perf-profile.children.cycles-pp.simple_copy_to_iter
> > 2.22 ± 4% -1.1 1.10 perf-profile.children.cycles-pp.unix_destruct_scm
> > 3.00 ± 2% -1.1 1.91 ± 5% perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
> > 2.00 ± 4% -1.1 0.92 ± 2% perf-profile.children.cycles-pp.sock_wfree
> > 2.16 ± 3% -0.7 1.43 ± 7% perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook
> > 1.45 ± 3% -0.7 0.73 ± 7% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> > 2.21 ± 2% -0.7 1.52 ± 3% perf-profile.children.cycles-pp.__kmem_cache_free
> > 1.49 ± 3% -0.6 0.89 ± 8% perf-profile.children.cycles-pp._copy_from_iter
> > 1.40 ± 3% -0.6 0.85 ± 13% perf-profile.children.cycles-pp.mod_objcg_state
> > 0.74 -0.5 0.24 ± 16% perf-profile.children.cycles-pp.__build_skb_around
> > 1.48 -0.5 1.01 ± 2% perf-profile.children.cycles-pp.get_obj_cgroup_from_current
> > 2.05 ± 2% -0.5 1.59 ± 2% perf-profile.children.cycles-pp.security_file_permission
> > 0.98 ± 2% -0.4 0.59 ± 10% perf-profile.children.cycles-pp.copyin
> > 1.08 ± 3% -0.4 0.72 ± 3% perf-profile.children.cycles-pp.__might_resched
> > 1.75 -0.3 1.42 ± 4% perf-profile.children.cycles-pp.apparmor_file_permission
> > 1.32 ± 4% -0.3 1.00 ± 3% perf-profile.children.cycles-pp.sock_recvmsg
> > 0.54 ± 4% -0.3 0.25 ± 6% perf-profile.children.cycles-pp.skb_unlink
> > 0.54 ± 6% -0.3 0.26 ± 3% perf-profile.children.cycles-pp.unix_write_space
> > 0.66 ± 3% -0.3 0.39 ± 4% perf-profile.children.cycles-pp.obj_cgroup_charge
> > 0.68 ± 2% -0.3 0.41 ± 4% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> > 0.86 ± 4% -0.3 0.59 ± 3% perf-profile.children.cycles-pp.__check_heap_object
> > 0.75 ± 9% -0.3 0.48 ± 2% perf-profile.children.cycles-pp.skb_set_owner_w
> > 1.84 ± 3% -0.3 1.58 ± 4% perf-profile.children.cycles-pp.aa_sk_perm
> > 0.68 ± 11% -0.2 0.44 ± 3% perf-profile.children.cycles-pp.skb_queue_tail
> > 1.22 ± 4% -0.2 0.99 ± 5% perf-profile.children.cycles-pp.__fdget_pos
> > 0.70 ± 2% -0.2 0.48 ± 5% perf-profile.children.cycles-pp.__get_obj_cgroup_from_memcg
> > 1.16 ± 4% -0.2 0.93 ± 3% perf-profile.children.cycles-pp.security_socket_recvmsg
> > 0.48 ± 3% -0.2 0.29 ± 4% perf-profile.children.cycles-pp.__might_fault
> > 0.24 ± 7% -0.2 0.05 ± 56% perf-profile.children.cycles-pp.fsnotify_perm
> > 1.12 ± 4% -0.2 0.93 ± 6% perf-profile.children.cycles-pp.__fget_light
> > 1.24 ± 3% -0.2 1.07 ± 3% perf-profile.children.cycles-pp.security_socket_sendmsg
> > 0.61 ± 3% -0.2 0.45 ± 2% perf-profile.children.cycles-pp.__might_sleep
> > 0.33 ± 5% -0.2 0.17 ± 6% perf-profile.children.cycles-pp.refill_obj_stock
> > 0.40 ± 2% -0.1 0.25 ± 4% perf-profile.children.cycles-pp.kmalloc_slab
> > 0.57 ± 2% -0.1 0.45 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> > 0.54 ± 3% -0.1 0.42 ± 2% perf-profile.children.cycles-pp.wait_for_unix_gc
> > 0.42 ± 2% -0.1 0.30 ± 3% perf-profile.children.cycles-pp.is_vmalloc_addr
> > 1.00 ± 2% -0.1 0.87 ± 5% perf-profile.children.cycles-pp.__virt_addr_valid
> > 0.52 ± 2% -0.1 0.41 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
> > 0.33 ± 3% -0.1 0.21 ± 3% perf-profile.children.cycles-pp.tick_sched_handle
> > 0.36 ± 2% -0.1 0.25 ± 4% perf-profile.children.cycles-pp.tick_sched_timer
> > 0.47 ± 2% -0.1 0.36 ± 2% perf-profile.children.cycles-pp.hrtimer_interrupt
> > 0.48 ± 2% -0.1 0.36 ± 2% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
> > 0.32 ± 3% -0.1 0.21 ± 5% perf-profile.children.cycles-pp.update_process_times
> > 0.42 ± 3% -0.1 0.31 ± 2% perf-profile.children.cycles-pp.__hrtimer_run_queues
> > 0.26 ± 6% -0.1 0.16 ± 4% perf-profile.children.cycles-pp.kmalloc_size_roundup
> > 0.20 ± 4% -0.1 0.10 ± 9% perf-profile.children.cycles-pp.task_tick_fair
> > 0.24 ± 3% -0.1 0.15 ± 4% perf-profile.children.cycles-pp.scheduler_tick
> > 0.30 ± 5% -0.1 0.21 ± 8% perf-profile.children.cycles-pp.obj_cgroup_uncharge_pages
> > 0.20 ± 2% -0.1 0.11 ± 6% perf-profile.children.cycles-pp.should_failslab
> > 0.51 ± 2% -0.1 0.43 ± 6% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
> > 0.15 ± 8% -0.1 0.07 ± 13% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
> > 0.19 ± 4% -0.1 0.12 ± 5% perf-profile.children.cycles-pp.apparmor_socket_sendmsg
> > 0.20 ± 4% -0.1 0.13 ± 5% perf-profile.children.cycles-pp.aa_file_perm
> > 0.18 ± 5% -0.1 0.12 ± 5% perf-profile.children.cycles-pp.apparmor_socket_recvmsg
> > 0.14 ± 13% -0.1 0.08 ± 55% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
> > 0.24 ± 4% -0.1 0.18 ± 2% perf-profile.children.cycles-pp.rcu_all_qs
> > 0.18 ± 10% -0.1 0.12 ± 11% perf-profile.children.cycles-pp.memcg_account_kmem
> > 0.37 ± 3% -0.1 0.31 ± 3% perf-profile.children.cycles-pp.security_socket_getpeersec_dgram
> > 0.08 -0.0 0.06 ± 8% perf-profile.children.cycles-pp.put_pid
> > 0.18 ± 3% -0.0 0.16 ± 4% perf-profile.children.cycles-pp.apparmor_socket_getpeersec_dgram
> > 0.21 ± 3% +0.0 0.23 ± 2% perf-profile.children.cycles-pp.__get_task_ioprio
> > 0.00 +0.1 0.05 perf-profile.children.cycles-pp.perf_exclude_event
> > 0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.invalidate_user_asid
> > 0.00 +0.1 0.07 ± 6% perf-profile.children.cycles-pp.__bitmap_and
> > 0.05 +0.1 0.13 ± 8% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
> > 0.00 +0.1 0.08 ± 7% perf-profile.children.cycles-pp.schedule_debug
> > 0.00 +0.1 0.08 ± 13% perf-profile.children.cycles-pp.read@plt
> > 0.00 +0.1 0.08 ± 5% perf-profile.children.cycles-pp.sysvec_reschedule_ipi
> > 0.00 +0.1 0.10 ± 4% perf-profile.children.cycles-pp.tracing_gen_ctx_irq_test
> > 0.00 +0.1 0.10 ± 4% perf-profile.children.cycles-pp.place_entity
> > 0.00 +0.1 0.12 ± 10% perf-profile.children.cycles-pp.native_irq_return_iret
> > 0.07 ± 14% +0.1 0.19 ± 3% perf-profile.children.cycles-pp.__list_add_valid
> > 0.00 +0.1 0.13 ± 6% perf-profile.children.cycles-pp.perf_trace_buf_alloc
> > 0.00 +0.1 0.13 ± 34% perf-profile.children.cycles-pp._find_next_and_bit
> > 0.00 +0.1 0.14 ± 5% perf-profile.children.cycles-pp.switch_ldt
> > 0.00 +0.1 0.15 ± 5% perf-profile.children.cycles-pp.check_cfs_rq_runtime
> > 0.00 +0.1 0.15 ± 30% perf-profile.children.cycles-pp.migrate_task_rq_fair
> > 0.00 +0.2 0.15 ± 5% perf-profile.children.cycles-pp.__rdgsbase_inactive
> > 0.00 +0.2 0.16 ± 3% perf-profile.children.cycles-pp.save_fpregs_to_fpstate
> > 0.00 +0.2 0.16 ± 6% perf-profile.children.cycles-pp.ttwu_queue_wakelist
> > 0.00 +0.2 0.17 perf-profile.children.cycles-pp.perf_trace_buf_update
> > 0.00 +0.2 0.18 ± 2% perf-profile.children.cycles-pp.rb_insert_color
> > 0.00 +0.2 0.18 ± 4% perf-profile.children.cycles-pp.rb_next
> > 0.00 +0.2 0.18 ± 21% perf-profile.children.cycles-pp.__cgroup_account_cputime
> > 0.01 ±223% +0.2 0.21 ± 28% perf-profile.children.cycles-pp.perf_trace_sched_switch
> > 0.00 +0.2 0.20 ± 3% perf-profile.children.cycles-pp.select_idle_cpu
> > 0.00 +0.2 0.20 ± 3% perf-profile.children.cycles-pp.rcu_note_context_switch
> > 0.00 +0.2 0.21 ± 26% perf-profile.children.cycles-pp.set_task_cpu
> > 0.00 +0.2 0.22 ± 8% perf-profile.children.cycles-pp.resched_curr
> > 0.08 ± 5% +0.2 0.31 ± 11% perf-profile.children.cycles-pp.task_h_load
> > 0.00 +0.2 0.24 ± 3% perf-profile.children.cycles-pp.finish_wait
> > 0.04 ± 44% +0.3 0.29 ± 5% perf-profile.children.cycles-pp.rb_erase
> > 0.19 ± 6% +0.3 0.46 perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
> > 0.20 ± 6% +0.3 0.47 ± 3% perf-profile.children.cycles-pp.__list_del_entry_valid
> > 0.00 +0.3 0.28 ± 3% perf-profile.children.cycles-pp.__wrgsbase_inactive
> > 0.02 ±141% +0.3 0.30 ± 2% perf-profile.children.cycles-pp.native_sched_clock
> > 0.06 ± 13% +0.3 0.34 ± 2% perf-profile.children.cycles-pp.sched_clock_cpu
> > 0.64 ± 2% +0.3 0.93 perf-profile.children.cycles-pp.mutex_lock
> > 0.00 +0.3 0.30 ± 5% perf-profile.children.cycles-pp.cr4_update_irqsoff
> > 0.00 +0.3 0.30 ± 4% perf-profile.children.cycles-pp.clear_buddies
> > 0.07 ± 55% +0.3 0.37 ± 5% perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
> > 0.10 ± 66% +0.3 0.42 ± 5% perf-profile.children.cycles-pp.perf_tp_event
> > 0.02 ±142% +0.3 0.36 ± 6% perf-profile.children.cycles-pp.cpuacct_charge
> > 0.12 ± 9% +0.4 0.47 ± 11% perf-profile.children.cycles-pp.wake_affine
> > 0.00 +0.4 0.36 ± 13% perf-profile.children.cycles-pp.available_idle_cpu
> > 0.05 ± 48% +0.4 0.42 ± 6% perf-profile.children.cycles-pp.finish_task_switch
> > 0.12 ± 4% +0.4 0.49 ± 4% perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
> > 0.07 ± 17% +0.4 0.48 perf-profile.children.cycles-pp.__calc_delta
> > 0.03 ±100% +0.5 0.49 ± 4% perf-profile.children.cycles-pp.pick_next_entity
> > 0.00 +0.5 0.48 ± 8% perf-profile.children.cycles-pp.set_next_buddy
> > 0.08 ± 14% +0.6 0.66 ± 4% perf-profile.children.cycles-pp.update_min_vruntime
> > 0.07 ± 17% +0.6 0.68 ± 2% perf-profile.children.cycles-pp.os_xsave
> > 0.29 ± 7% +0.7 0.99 ± 3% perf-profile.children.cycles-pp.update_cfs_group
> > 0.17 ± 17% +0.7 0.87 ± 4% perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
> > 0.14 ± 7% +0.7 0.87 ± 3% perf-profile.children.cycles-pp.__update_load_avg_se
> > 0.14 ± 16% +0.8 0.90 ± 2% perf-profile.children.cycles-pp.update_rq_clock
> > 0.08 ± 17% +0.8 0.84 ± 5% perf-profile.children.cycles-pp.check_preempt_wakeup
> > 0.12 ± 14% +0.8 0.95 ± 3% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
> > 0.22 ± 5% +0.8 1.07 ± 3% perf-profile.children.cycles-pp.prepare_to_wait
> > 0.10 ± 18% +0.9 0.98 ± 3% perf-profile.children.cycles-pp.check_preempt_curr
> > 29.72 +0.9 30.61 perf-profile.children.cycles-pp.vfs_write
> > 0.14 ± 11% +0.9 1.03 ± 4% perf-profile.children.cycles-pp.__switch_to
> > 0.07 ± 20% +0.9 0.99 ± 6% perf-profile.children.cycles-pp.put_prev_entity
> > 0.12 ± 16% +1.0 1.13 ± 5% perf-profile.children.cycles-pp.___perf_sw_event
> > 0.07 ± 17% +1.0 1.10 ± 13% perf-profile.children.cycles-pp.select_idle_sibling
> > 27.82 ± 2% +1.2 28.99 perf-profile.children.cycles-pp.unix_stream_recvmsg
> > 27.41 ± 2% +1.2 28.63 perf-profile.children.cycles-pp.unix_stream_read_generic
> > 0.20 ± 15% +1.4 1.59 ± 3% perf-profile.children.cycles-pp.reweight_entity
> > 0.21 ± 13% +1.4 1.60 ± 4% perf-profile.children.cycles-pp.__switch_to_asm
> > 0.23 ± 10% +1.4 1.65 ± 5% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
> > 0.20 ± 13% +1.5 1.69 ± 3% perf-profile.children.cycles-pp.set_next_entity
> > 27.59 +1.6 29.19 perf-profile.children.cycles-pp.sock_write_iter
> > 0.28 ± 10% +1.8 2.12 ± 5% perf-profile.children.cycles-pp.switch_fpu_return
> > 0.26 ± 11% +1.8 2.10 ± 6% perf-profile.children.cycles-pp.select_task_rq_fair
> > 26.66 ± 2% +2.0 28.63 perf-profile.children.cycles-pp.sock_sendmsg
> > 0.31 ± 12% +2.1 2.44 ± 5% perf-profile.children.cycles-pp.select_task_rq
> > 0.30 ± 14% +2.2 2.46 ± 4% perf-profile.children.cycles-pp.prepare_task_switch
> > 25.27 ± 2% +2.2 27.47 perf-profile.children.cycles-pp.unix_stream_sendmsg
> > 2.10 +2.3 4.38 ± 2% perf-profile.children.cycles-pp._raw_spin_lock
> > 0.40 ± 14% +2.5 2.92 ± 5% perf-profile.children.cycles-pp.dequeue_entity
> > 48.40 +2.6 51.02 perf-profile.children.cycles-pp.__libc_write
> > 0.46 ± 15% +3.1 3.51 ± 3% perf-profile.children.cycles-pp.enqueue_entity
> > 0.49 ± 10% +3.2 3.64 ± 7% perf-profile.children.cycles-pp.update_load_avg
> > 0.53 ± 20% +3.4 3.91 ± 3% perf-profile.children.cycles-pp.update_curr
> > 80.81 +3.4 84.24 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 0.50 ± 12% +3.5 4.00 ± 4% perf-profile.children.cycles-pp.switch_mm_irqs_off
> > 0.55 ± 9% +3.8 4.38 ± 4% perf-profile.children.cycles-pp.pick_next_task_fair
> > 9.60 +4.6 14.15 ± 2% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> > 0.78 ± 13% +4.9 5.65 ± 4% perf-profile.children.cycles-pp.dequeue_task_fair
> > 0.78 ± 15% +5.2 5.99 ± 3% perf-profile.children.cycles-pp.enqueue_task_fair
> > 74.30 +5.6 79.86 perf-profile.children.cycles-pp.do_syscall_64
> > 0.90 ± 15% +6.3 7.16 ± 3% perf-profile.children.cycles-pp.ttwu_do_activate
> > 0.33 ± 31% +6.3 6.61 ± 6% perf-profile.children.cycles-pp.exit_to_user_mode_loop
> > 0.82 ± 15% +8.1 8.92 ± 5% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
> > 1.90 ± 16% +12.2 14.10 ± 2% perf-profile.children.cycles-pp.try_to_wake_up
> > 2.36 ± 11% +12.2 14.60 ± 3% perf-profile.children.cycles-pp.schedule_timeout
> > 1.95 ± 15% +12.5 14.41 ± 2% perf-profile.children.cycles-pp.autoremove_wake_function
> > 2.01 ± 15% +12.8 14.76 ± 2% perf-profile.children.cycles-pp.__wake_up_common
> > 2.23 ± 13% +13.2 15.45 ± 2% perf-profile.children.cycles-pp.__wake_up_common_lock
> > 2.53 ± 10% +13.4 15.90 ± 2% perf-profile.children.cycles-pp.sock_def_readable
> > 2.29 ± 15% +14.6 16.93 ± 3% perf-profile.children.cycles-pp.unix_stream_data_wait
> > 2.61 ± 13% +18.0 20.65 ± 4% perf-profile.children.cycles-pp.schedule
> > 2.66 ± 13% +18.1 20.77 ± 4% perf-profile.children.cycles-pp.__schedule
> > 11.25 ± 3% -4.6 6.67 ± 3% perf-profile.self.cycles-pp.syscall_return_via_sysret
> > 5.76 ± 32% -3.9 1.90 ± 3% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> > 8.69 ± 3% -3.4 5.27 ± 3% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> > 3.11 ± 3% -2.5 0.60 ± 13% perf-profile.self.cycles-pp.__slab_free
> > 6.65 ± 2% -2.2 4.47 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 4.78 ± 3% -1.9 2.88 ± 3% perf-profile.self.cycles-pp.__entry_text_start
> > 3.52 ± 2% -1.9 1.64 ± 6% perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
> > 2.06 ± 3% -1.1 0.96 ± 5% perf-profile.self.cycles-pp.kmem_cache_free
> > 1.42 ± 3% -1.0 0.46 ± 10% perf-profile.self.cycles-pp.check_heap_object
> > 1.43 ± 4% -0.8 0.64 perf-profile.self.cycles-pp.sock_wfree
> > 0.99 ± 3% -0.8 0.21 ± 12% perf-profile.self.cycles-pp.skb_release_data
> > 0.84 ± 8% -0.7 0.10 ± 64% perf-profile.self.cycles-pp.___slab_alloc
> > 1.97 ± 2% -0.6 1.32 perf-profile.self.cycles-pp.unix_stream_read_generic
> > 1.60 ± 3% -0.5 1.11 ± 4% perf-profile.self.cycles-pp.memcg_slab_post_alloc_hook
> > 1.24 ± 2% -0.5 0.75 ± 11% perf-profile.self.cycles-pp.mod_objcg_state
> > 0.71 -0.5 0.23 ± 15% perf-profile.self.cycles-pp.__build_skb_around
> > 0.95 ± 3% -0.5 0.50 ± 6% perf-profile.self.cycles-pp.__alloc_skb
> > 0.97 ± 4% -0.4 0.55 ± 5% perf-profile.self.cycles-pp.kmem_cache_alloc_node
> > 0.99 ± 3% -0.4 0.59 ± 4% perf-profile.self.cycles-pp.vfs_write
> > 1.38 ± 2% -0.4 0.99 perf-profile.self.cycles-pp.__kmem_cache_free
> > 0.86 ± 2% -0.4 0.50 ± 3% perf-profile.self.cycles-pp.__kmem_cache_alloc_node
> > 0.92 ± 4% -0.4 0.56 ± 4% perf-profile.self.cycles-pp.sock_write_iter
> > 1.06 ± 3% -0.4 0.70 ± 3% perf-profile.self.cycles-pp.__might_resched
> > 0.73 ± 4% -0.3 0.44 ± 4% perf-profile.self.cycles-pp.__cond_resched
> > 0.85 ± 3% -0.3 0.59 ± 4% perf-profile.self.cycles-pp.__check_heap_object
> > 1.46 ± 7% -0.3 1.20 ± 2% perf-profile.self.cycles-pp.unix_stream_sendmsg
> > 0.73 ± 9% -0.3 0.47 ± 2% perf-profile.self.cycles-pp.skb_set_owner_w
> > 1.54 -0.3 1.28 ± 4% perf-profile.self.cycles-pp.apparmor_file_permission
> > 0.74 ± 3% -0.2 0.50 ± 2% perf-profile.self.cycles-pp.get_obj_cgroup_from_current
> > 1.15 ± 3% -0.2 0.91 ± 8% perf-profile.self.cycles-pp.aa_sk_perm
> > 0.60 -0.2 0.36 ± 4% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> > 0.65 ± 4% -0.2 0.45 ± 6% perf-profile.self.cycles-pp.__get_obj_cgroup_from_memcg
> > 0.24 ± 6% -0.2 0.05 ± 56% perf-profile.self.cycles-pp.fsnotify_perm
> > 0.76 ± 3% -0.2 0.58 ± 2% perf-profile.self.cycles-pp.sock_read_iter
> > 1.10 ± 4% -0.2 0.92 ± 6% perf-profile.self.cycles-pp.__fget_light
> > 0.42 ± 3% -0.2 0.25 ± 4% perf-profile.self.cycles-pp.obj_cgroup_charge
> > 0.32 ± 4% -0.2 0.17 ± 6% perf-profile.self.cycles-pp.refill_obj_stock
> > 0.29 -0.2 0.14 ± 8% perf-profile.self.cycles-pp.__kmalloc_node_track_caller
> > 0.54 ± 3% -0.1 0.40 ± 2% perf-profile.self.cycles-pp.__might_sleep
> > 0.30 ± 7% -0.1 0.16 ± 22% perf-profile.self.cycles-pp.security_file_permission
> > 0.34 ± 3% -0.1 0.21 ± 6% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
> > 0.41 ± 3% -0.1 0.29 ± 3% perf-profile.self.cycles-pp.is_vmalloc_addr
> > 0.27 ± 3% -0.1 0.16 ± 6% perf-profile.self.cycles-pp._copy_from_iter
> > 0.24 ± 3% -0.1 0.12 ± 3% perf-profile.self.cycles-pp.ksys_write
> > 0.95 ± 2% -0.1 0.84 ± 5% perf-profile.self.cycles-pp.__virt_addr_valid
> > 0.56 ± 11% -0.1 0.46 ± 4% perf-profile.self.cycles-pp.sock_def_readable
> > 0.16 ± 7% -0.1 0.06 ± 18% perf-profile.self.cycles-pp.sock_recvmsg
> > 0.22 ± 5% -0.1 0.14 ± 2% perf-profile.self.cycles-pp.ksys_read
> > 0.27 ± 4% -0.1 0.19 ± 5% perf-profile.self.cycles-pp.kmalloc_slab
> > 0.28 ± 2% -0.1 0.20 ± 2% perf-profile.self.cycles-pp.consume_skb
> > 0.35 ± 2% -0.1 0.28 ± 3% perf-profile.self.cycles-pp.__check_object_size
> > 0.13 ± 8% -0.1 0.06 ± 18% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
> > 0.20 ± 5% -0.1 0.12 ± 6% perf-profile.self.cycles-pp.kmalloc_reserve
> > 0.26 ± 5% -0.1 0.19 ± 4% perf-profile.self.cycles-pp.sock_alloc_send_pskb
> > 0.42 ± 2% -0.1 0.35 ± 7% perf-profile.self.cycles-pp.syscall_enter_from_user_mode
> > 0.19 ± 5% -0.1 0.12 ± 6% perf-profile.self.cycles-pp.aa_file_perm
> > 0.16 ± 4% -0.1 0.10 ± 4% perf-profile.self.cycles-pp.skb_copy_datagram_from_iter
> > 0.18 ± 4% -0.1 0.12 ± 6% perf-profile.self.cycles-pp.apparmor_socket_sendmsg
> > 0.18 ± 5% -0.1 0.12 ± 4% perf-profile.self.cycles-pp.apparmor_socket_recvmsg
> > 0.15 ± 5% -0.1 0.10 ± 5% perf-profile.self.cycles-pp.alloc_skb_with_frags
> > 0.64 ± 3% -0.1 0.59 perf-profile.self.cycles-pp.__libc_write
> > 0.20 ± 4% -0.1 0.15 ± 3% perf-profile.self.cycles-pp._copy_to_iter
> > 0.15 ± 5% -0.1 0.10 ± 11% perf-profile.self.cycles-pp.sock_sendmsg
> > 0.08 ± 4% -0.1 0.03 ± 81% perf-profile.self.cycles-pp.copyout
> > 0.11 ± 6% -0.0 0.06 ± 7% perf-profile.self.cycles-pp.__fdget_pos
> > 0.12 ± 5% -0.0 0.07 ± 10% perf-profile.self.cycles-pp.kmalloc_size_roundup
> > 0.34 ± 3% -0.0 0.29 perf-profile.self.cycles-pp.do_syscall_64
> > 0.20 ± 4% -0.0 0.15 ± 4% perf-profile.self.cycles-pp.rcu_all_qs
> > 0.41 ± 3% -0.0 0.37 ± 8% perf-profile.self.cycles-pp.unix_stream_recvmsg
> > 0.22 ± 2% -0.0 0.17 ± 4% perf-profile.self.cycles-pp.unix_destruct_scm
> > 0.09 ± 4% -0.0 0.05 perf-profile.self.cycles-pp.should_failslab
> > 0.10 ± 15% -0.0 0.06 ± 50% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
> > 0.11 ± 4% -0.0 0.07 perf-profile.self.cycles-pp.__might_fault
> > 0.16 ± 2% -0.0 0.13 ± 6% perf-profile.self.cycles-pp.obj_cgroup_uncharge_pages
> > 0.18 ± 4% -0.0 0.16 ± 3% perf-profile.self.cycles-pp.security_socket_getpeersec_dgram
> > 0.28 ± 2% -0.0 0.25 ± 2% perf-profile.self.cycles-pp.unix_write_space
> > 0.17 ± 2% -0.0 0.15 ± 5% perf-profile.self.cycles-pp.apparmor_socket_getpeersec_dgram
> > 0.08 ± 6% -0.0 0.05 ± 7% perf-profile.self.cycles-pp.security_socket_sendmsg
> > 0.12 ± 4% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.__skb_datagram_iter
> > 0.24 ± 2% -0.0 0.22 perf-profile.self.cycles-pp.mutex_unlock
> > 0.08 ± 5% +0.0 0.10 ± 6% perf-profile.self.cycles-pp.scm_recv
> > 0.17 ± 2% +0.0 0.19 ± 3% perf-profile.self.cycles-pp.__x64_sys_read
> > 0.19 ± 3% +0.0 0.22 ± 2% perf-profile.self.cycles-pp.__get_task_ioprio
> > 0.00 +0.1 0.06 perf-profile.self.cycles-pp.finish_wait
> > 0.00 +0.1 0.06 ± 7% perf-profile.self.cycles-pp.cr4_update_irqsoff
> > 0.00 +0.1 0.06 ± 7% perf-profile.self.cycles-pp.invalidate_user_asid
> > 0.00 +0.1 0.07 ± 12% perf-profile.self.cycles-pp.wake_affine
> > 0.00 +0.1 0.07 ± 7% perf-profile.self.cycles-pp.check_cfs_rq_runtime
> > 0.00 +0.1 0.07 ± 5% perf-profile.self.cycles-pp.perf_trace_buf_update
> > 0.00 +0.1 0.07 ± 9% perf-profile.self.cycles-pp.asm_sysvec_reschedule_ipi
> > 0.00 +0.1 0.07 ± 10% perf-profile.self.cycles-pp.__bitmap_and
> > 0.00 +0.1 0.08 ± 10% perf-profile.self.cycles-pp.schedule_debug
> > 0.00 +0.1 0.08 ± 13% perf-profile.self.cycles-pp.read@plt
> > 0.00 +0.1 0.08 ± 12% perf-profile.self.cycles-pp.perf_trace_buf_alloc
> > 0.00 +0.1 0.09 ± 35% perf-profile.self.cycles-pp.migrate_task_rq_fair
> > 0.00 +0.1 0.09 ± 5% perf-profile.self.cycles-pp.place_entity
> > 0.00 +0.1 0.10 ± 4% perf-profile.self.cycles-pp.tracing_gen_ctx_irq_test
> > 0.00 +0.1 0.10 perf-profile.self.cycles-pp.__wake_up_common_lock
> > 0.07 ± 17% +0.1 0.18 ± 3% perf-profile.self.cycles-pp.__list_add_valid
> > 0.00 +0.1 0.11 ± 8% perf-profile.self.cycles-pp.native_irq_return_iret
> > 0.00 +0.1 0.12 ± 6% perf-profile.self.cycles-pp.select_idle_cpu
> > 0.00 +0.1 0.12 ± 34% perf-profile.self.cycles-pp._find_next_and_bit
> > 0.00 +0.1 0.13 ± 25% perf-profile.self.cycles-pp.__cgroup_account_cputime
> > 0.00 +0.1 0.13 ± 7% perf-profile.self.cycles-pp.switch_ldt
> > 0.00 +0.1 0.14 ± 5% perf-profile.self.cycles-pp.check_preempt_curr
> > 0.00 +0.1 0.15 ± 2% perf-profile.self.cycles-pp.save_fpregs_to_fpstate
> > 0.00 +0.1 0.15 ± 5% perf-profile.self.cycles-pp.__rdgsbase_inactive
> > 0.14 ± 3% +0.2 0.29 perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
> > 0.00 +0.2 0.15 ± 7% perf-profile.self.cycles-pp.ttwu_queue_wakelist
> > 0.00 +0.2 0.17 ± 4% perf-profile.self.cycles-pp.rb_insert_color
> > 0.00 +0.2 0.17 ± 5% perf-profile.self.cycles-pp.rb_next
> > 0.00 +0.2 0.18 ± 2% perf-profile.self.cycles-pp.autoremove_wake_function
> > 0.01 ±223% +0.2 0.19 ± 6% perf-profile.self.cycles-pp.ttwu_do_activate
> > 0.00 +0.2 0.20 ± 2% perf-profile.self.cycles-pp.rcu_note_context_switch
> > 0.00 +0.2 0.20 ± 7% perf-profile.self.cycles-pp.exit_to_user_mode_loop
> > 0.27 +0.2 0.47 ± 3% perf-profile.self.cycles-pp.mutex_lock
> > 0.00 +0.2 0.20 ± 28% perf-profile.self.cycles-pp.perf_trace_sched_switch
> > 0.00 +0.2 0.21 ± 9% perf-profile.self.cycles-pp.resched_curr
> > 0.04 ± 45% +0.2 0.26 ± 7% perf-profile.self.cycles-pp.perf_tp_event
> > 0.06 ± 7% +0.2 0.28 ± 8% perf-profile.self.cycles-pp.perf_trace_sched_wakeup_template
> > 0.19 ± 7% +0.2 0.41 ± 5% perf-profile.self.cycles-pp.__list_del_entry_valid
> > 0.08 ± 5% +0.2 0.31 ± 11% perf-profile.self.cycles-pp.task_h_load
> > 0.00 +0.2 0.23 ± 5% perf-profile.self.cycles-pp.finish_task_switch
> > 0.03 ± 70% +0.2 0.27 ± 5% perf-profile.self.cycles-pp.rb_erase
> > 0.02 ±142% +0.3 0.29 ± 2% perf-profile.self.cycles-pp.native_sched_clock
> > 0.00 +0.3 0.28 ± 3% perf-profile.self.cycles-pp.__wrgsbase_inactive
> > 0.00 +0.3 0.28 ± 6% perf-profile.self.cycles-pp.clear_buddies
> > 0.07 ± 10% +0.3 0.35 ± 3% perf-profile.self.cycles-pp.schedule_timeout
> > 0.03 ± 70% +0.3 0.33 ± 3% perf-profile.self.cycles-pp.select_task_rq
> > 0.06 ± 13% +0.3 0.36 ± 4% perf-profile.self.cycles-pp.__wake_up_common
> > 0.06 ± 13% +0.3 0.36 ± 3% perf-profile.self.cycles-pp.dequeue_entity
> > 0.06 ± 18% +0.3 0.37 ± 7% perf-profile.self.cycles-pp.perf_trace_sched_stat_runtime
> > 0.01 ±223% +0.3 0.33 ± 4% perf-profile.self.cycles-pp.schedule
> > 0.02 ±142% +0.3 0.35 ± 7% perf-profile.self.cycles-pp.cpuacct_charge
> > 0.01 ±223% +0.3 0.35 perf-profile.self.cycles-pp.set_next_entity
> > 0.00 +0.4 0.35 ± 13% perf-profile.self.cycles-pp.available_idle_cpu
> > 0.08 ± 10% +0.4 0.44 ± 5% perf-profile.self.cycles-pp.prepare_to_wait
> > 0.63 ± 3% +0.4 1.00 ± 4% perf-profile.self.cycles-pp.vfs_read
> > 0.02 ±142% +0.4 0.40 ± 4% perf-profile.self.cycles-pp.check_preempt_wakeup
> > 0.02 ±141% +0.4 0.42 ± 4% perf-profile.self.cycles-pp.pick_next_entity
> > 0.07 ± 17% +0.4 0.48 perf-profile.self.cycles-pp.__calc_delta
> > 0.06 ± 14% +0.4 0.47 ± 3% perf-profile.self.cycles-pp.unix_stream_data_wait
> > 0.04 ± 45% +0.4 0.45 ± 4% perf-profile.self.cycles-pp.switch_fpu_return
> > 0.00 +0.5 0.46 ± 7% perf-profile.self.cycles-pp.set_next_buddy
> > 0.07 ± 17% +0.5 0.53 ± 3% perf-profile.self.cycles-pp.select_task_rq_fair
> > 0.08 ± 16% +0.5 0.55 ± 4% perf-profile.self.cycles-pp.try_to_wake_up
> > 0.08 ± 19% +0.5 0.56 ± 3% perf-profile.self.cycles-pp.update_rq_clock
> > 0.02 ±141% +0.5 0.50 ± 10% perf-profile.self.cycles-pp.select_idle_sibling
> > 0.77 ± 2% +0.5 1.25 ± 2% perf-profile.self.cycles-pp.__libc_read
> > 0.09 ± 19% +0.5 0.59 ± 3% perf-profile.self.cycles-pp.reweight_entity
> > 0.08 ± 14% +0.5 0.59 ± 2% perf-profile.self.cycles-pp.dequeue_task_fair
> > 0.08 ± 13% +0.6 0.64 ± 5% perf-profile.self.cycles-pp.update_min_vruntime
> > 0.02 ±141% +0.6 0.58 ± 7% perf-profile.self.cycles-pp.put_prev_entity
> > 0.06 ± 11% +0.6 0.64 ± 4% perf-profile.self.cycles-pp.enqueue_task_fair
> > 0.07 ± 18% +0.6 0.68 ± 3% perf-profile.self.cycles-pp.os_xsave
> > 1.39 ± 2% +0.7 2.06 ± 3% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> > 0.28 ± 8% +0.7 0.97 ± 4% perf-profile.self.cycles-pp.update_cfs_group
> > 0.14 ± 8% +0.7 0.83 ± 3% perf-profile.self.cycles-pp.__update_load_avg_se
> > 1.76 ± 3% +0.7 2.47 ± 3% perf-profile.self.cycles-pp._raw_spin_lock
> > 0.12 ± 12% +0.7 0.85 ± 5% perf-profile.self.cycles-pp.prepare_task_switch
> > 0.12 ± 12% +0.8 0.91 ± 3% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
> > 0.13 ± 12% +0.8 0.93 ± 5% perf-profile.self.cycles-pp.pick_next_task_fair
> > 0.13 ± 12% +0.9 0.98 ± 4% perf-profile.self.cycles-pp.__switch_to
> > 0.11 ± 18% +0.9 1.06 ± 5% perf-profile.self.cycles-pp.___perf_sw_event
> > 0.16 ± 11% +1.2 1.34 ± 4% perf-profile.self.cycles-pp.enqueue_entity
> > 0.20 ± 12% +1.4 1.58 ± 4% perf-profile.self.cycles-pp.__switch_to_asm
> > 0.23 ± 10% +1.4 1.65 ± 5% perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
> > 0.25 ± 12% +1.5 1.77 ± 4% perf-profile.self.cycles-pp.__schedule
> > 0.22 ± 10% +1.6 1.78 ± 10% perf-profile.self.cycles-pp.update_load_avg
> > 0.23 ± 16% +1.7 1.91 ± 7% perf-profile.self.cycles-pp.update_curr
> > 0.48 ± 11% +3.4 3.86 ± 4% perf-profile.self.cycles-pp.switch_mm_irqs_off
> >
> >
> > To reproduce:
> >
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > sudo bin/lkp install job.yaml # job file is attached in this email
> > bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> > sudo bin/lkp run generated-yaml-file
> >
> > # if come across any failure that blocks the test,
> > # please remove ~/.lkp and /lkp dir to run from a clean state.
>

2023-02-27 08:52:53

by Roman Kagan

[permalink] [raw]
Subject: Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed

On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <[email protected]> wrote:
> > What scares me, though, is that I've got a message from the test robot
> > that this commit dramatically affected hackbench results, see the quote
> > below. I expected the commit not to affect any benchmarks.
> >
> > Any idea what could have caused this change?
>
> Hmm, it's most probably because se->exec_start is reset after a
> migration and the condition becomes true for a newly migrated task
> whereas its vruntime should be after min_vruntime.
>
> We have missed this condition

Makes sense to me.

But what would then be the reliable way to detect a sched_entity which
has slept for long and risks overflowing in .vruntime comparison?

Thanks,
Roman.




2023-02-27 14:37:55

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed

On Mon, 27 Feb 2023 at 09:43, Roman Kagan <[email protected]> wrote:
>
> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
> > On Tue, 21 Feb 2023 at 17:57, Roman Kagan <[email protected]> wrote:
> > > What scares me, though, is that I've got a message from the test robot
> > > that this commit dramatically affected hackbench results, see the quote
> > > below. I expected the commit not to affect any benchmarks.
> > >
> > > Any idea what could have caused this change?
> >
> > Hmm, it's most probably because se->exec_start is reset after a
> > migration and the condition becomes true for a newly migrated task
> > whereas its vruntime should be after min_vruntime.
> >
> > We have missed this condition
>
> Makes sense to me.
>
> But what would then be the reliable way to detect a sched_entity which
> has slept for long and risks overflowing in .vruntime comparison?

For now I don't have a better idea than adding the same check in
migrate_task_rq_fair(), along the lines of the sketch below.
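
Something like this, as a rough and untested sketch of the idea only
(not a tested patch; the exact placement and locking would still need
to be checked):

	/* In migrate_task_rq_fair(), before se->exec_start is reset: */
	if (!task_on_rq_migrating(p)) {
		struct sched_entity *se = &p->se;
		u64 sleep_time = rq_clock_task(task_rq(p)) - se->exec_start;

		/*
		 * Same cutoff as in place_entity(): if the task slept
		 * long enough to risk an s64 overflow in the vruntime
		 * comparison, pull its vruntime to the old cfs_rq's
		 * min_vruntime while exec_start still records the last
		 * time it actually ran.
		 */
		if ((s64)sleep_time > (1ULL << 63) / NICE_0_LOAD)
			se->vruntime = cfs_rq_of(se)->min_vruntime;
	}

That way the cutoff is evaluated against the clock of the rq the task
last ran on, instead of becoming trivially true just because exec_start
was zeroed by the migration.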

>
> Thanks,
> Roman.
>

2023-02-27 17:00:29

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed

On 27/02/2023 15:37, Vincent Guittot wrote:
> On Mon, 27 Feb 2023 at 09:43, Roman Kagan <[email protected]> wrote:
>>
>> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
>>> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <[email protected]> wrote:
>>>> What scares me, though, is that I've got a message from the test robot
>>>> that this commit dramatically affected hackbench results, see the quote
>>>> below. I expected the commit not to affect any benchmarks.
>>>>
>>>> Any idea what could have caused this change?
>>>
>>> Hmm, it's most probably because se->exec_start is reset after a
>>> migration and the condition becomes true for a newly migrated task
>>> whereas its vruntime should be after min_vruntime.
>>>
>>> We have missed this condition
>>
>> Makes sense to me.
>>
>> But what would then be the reliable way to detect a sched_entity which
>> has slept for long and risks overflowing in .vruntime comparison?
>
> For now I don't have a better idea than adding the same check in
> migrate_task_rq_fair()

Don't we have the issue that the rq clock may not be up to date in
migrate? No rq lock is held in the `!task_on_rq_migrating(p)` case.

Also, deferring `se->exec_start = 0` from `migrate` into `enqueue ->
place_entity()` doesn't seem to work, since the rq clocks of different
CPUs are not in sync.


2023-02-27 17:15:39

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed

On Mon, 27 Feb 2023 at 18:00, Dietmar Eggemann <[email protected]> wrote:
>
> On 27/02/2023 15:37, Vincent Guittot wrote:
> > On Mon, 27 Feb 2023 at 09:43, Roman Kagan <[email protected]> wrote:
> >>
> >> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
> >>> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <[email protected]> wrote:
> >>>> What scares me, though, is that I've got a message from the test robot
> >>>> that this commit drammatically affected hackbench results, see the quote
> >>>> below. I expected the commit not to affect any benchmarks.
> >>>>
> >>>> Any idea what could have caused this change?
> >>>
> >>> Hmm, It's most probably because se->exec_start is reset after a
> >>> migration and the condition becomes true for newly migrated task
> >>> whereas its vruntime should be after min_vruntime.
> >>>
> >>> We have missed this condition
> >>
> >> Makes sense to me.
> >>
> >> But what would then be the reliable way to detect a sched_entity which
> >> has slept for long and risks overflowing in .vruntime comparison?
> >
> > For now I don't have a better idea than adding the same check in
> > migrate_task_rq_fair()
>
> Don't we have the issue that we could have a non-up-to-date rq clock in
> migrate? No rq lock held in `!task_on_rq_migrating(p)`.

Yes, the rq clock may not be up to date, but that would also mean the
cfs_rq was idle, so its min_vruntime has not moved forward and there is
no risk of overflow.

>
> Also deferring `se->exec_start = 0` from `migrate` into `enqueue ->
> place entity` doesn't seem to work since the rq clocks of different CPUs
> are not in sync.

yes

>

2023-03-02 09:36:25

by Zhang Qiao

[permalink] [raw]
Subject: Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed



On 2023/2/27 22:37, Vincent Guittot wrote:
> On Mon, 27 Feb 2023 at 09:43, Roman Kagan <[email protected]> wrote:
>>
>> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
>>> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <[email protected]> wrote:
>>>> What scares me, though, is that I've got a message from the test robot
>>>> that this commit drammatically affected hackbench results, see the quote
>>>> below. I expected the commit not to affect any benchmarks.
>>>>
>>>> Any idea what could have caused this change?
>>>
>>> Hmm, It's most probably because se->exec_start is reset after a
>>> migration and the condition becomes true for newly migrated task
>>> whereas its vruntime should be after min_vruntime.
>>>
>>> We have missed this condition
>>
>> Makes sense to me.
>>
>> But what would then be the reliable way to detect a sched_entity which
>> has slept for long and risks overflowing in .vruntime comparison?
>
> For now I don't have a better idea than adding the same check in
> migrate_task_rq_fair()

Hi, Vincent,
I fixed this condition as you said, and the test results are as follows.

testcase: hackbench -g 44 -f 20 --process --pipe -l 60000 -s 100
version1: v6.2
version2: v6.2 + commit 829c1651e9c4
version3: v6.2 + commit 829c1651e9c4 + this patch

-------------------------------------------------
          version1    version2    version3
test1         81.0       118.1        82.1
test2         82.1       116.9        80.3
test3         83.2       103.9        83.3
avg(s)        82.1       113.0        81.9
-------------------------------------------------
After handling the task migration case, the hackbench results are restored.

The patch is as follows; how does this look?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ff4dbbae3b10..3a88d20fd29e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4648,6 +4648,26 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
#endif
}

+static inline u64 sched_sleeper_credit(struct sched_entity *se)
+{
+
+ unsigned long thresh;
+
+ if (se_is_idle(se))
+ thresh = sysctl_sched_min_granularity;
+ else
+ thresh = sysctl_sched_latency;
+
+ /*
+ * Halve their sleep time's effect, to allow
+ * for a gentler effect of sleepers:
+ */
+ if (sched_feat(GENTLE_FAIR_SLEEPERS))
+ thresh >>= 1;
+
+ return thresh;
+}
+
static void
place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
{
@@ -4664,23 +4684,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
vruntime += sched_vslice(cfs_rq, se);

/* sleeps up to a single latency don't count. */
- if (!initial) {
- unsigned long thresh;
-
- if (se_is_idle(se))
- thresh = sysctl_sched_min_granularity;
- else
- thresh = sysctl_sched_latency;
-
- /*
- * Halve their sleep time's effect, to allow
- * for a gentler effect of sleepers:
- */
- if (sched_feat(GENTLE_FAIR_SLEEPERS))
- thresh >>= 1;
-
- vruntime -= thresh;
- }
+ if (!initial)
+ vruntime -= sched_sleeper_credit(se);

/*
* Pull vruntime of the entity being placed to the base level of
@@ -4690,7 +4695,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
* inversed due to s64 overflow.
*/
sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
- if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
+ if (se->exec_start != 0 && (s64)sleep_time > 60LL * NSEC_PER_SEC)
se->vruntime = vruntime;
else
se->vruntime = max_vruntime(se->vruntime, vruntime);
@@ -7634,8 +7639,12 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
*/
if (READ_ONCE(p->__state) == TASK_WAKING) {
struct cfs_rq *cfs_rq = cfs_rq_of(se);
+ u64 sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;

- se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
+ if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
+ se->vruntime = -sched_sleeper_credit(se);
+ else
+ se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
}

if (!task_on_rq_migrating(p)) {



Thanks.
Zhang Qiao.

>
>>
>> Thanks,
>> Roman.
>>
>>
>>
>> Amazon Development Center Germany GmbH
>> Krausenstr. 38
>> 10117 Berlin
>> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
>> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
>> Sitz: Berlin
>> Ust-ID: DE 289 237 879
>>
>>
>>
> .
>

2023-03-02 13:34:35

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed

On Thu, 2 Mar 2023 at 10:36, Zhang Qiao <[email protected]> wrote:
>
>
>
> 在 2023/2/27 22:37, Vincent Guittot 写道:
> > On Mon, 27 Feb 2023 at 09:43, Roman Kagan <[email protected]> wrote:
> >>
> >> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
> >>> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <[email protected]> wrote:
> >>>> What scares me, though, is that I've got a message from the test robot
> >>>> that this commit drammatically affected hackbench results, see the quote
> >>>> below. I expected the commit not to affect any benchmarks.
> >>>>
> >>>> Any idea what could have caused this change?
> >>>
> >>> Hmm, It's most probably because se->exec_start is reset after a
> >>> migration and the condition becomes true for newly migrated task
> >>> whereas its vruntime should be after min_vruntime.
> >>>
> >>> We have missed this condition
> >>
> >> Makes sense to me.
> >>
> >> But what would then be the reliable way to detect a sched_entity which
> >> has slept for long and risks overflowing in .vruntime comparison?
> >
> > For now I don't have a better idea than adding the same check in
> > migrate_task_rq_fair()
>
> Hi, Vincent,
> I fixed this condition as you said, and the test results are as follows.
>
> testcase: hackbench -g 44 -f 20 --process --pipe -l 60000 -s 100
> version1: v6.2
> version2: v6.2 + commit 829c1651e9c4
> version3: v6.2 + commit 829c1651e9c4 + this patch
>
> -------------------------------------------------
> version1 version2 version3
> test1 81.0 118.1 82.1
> test2 82.1 116.9 80.3
> test3 83.2 103.9 83.3
> avg(s) 82.1 113.0 81.9
>
> -------------------------------------------------
> After deal with the task migration case, the hackbench result has restored.
>
> The patch as follow, how does this look?
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ff4dbbae3b10..3a88d20fd29e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4648,6 +4648,26 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
> #endif
> }
>
> +static inline u64 sched_sleeper_credit(struct sched_entity *se)
> +{
> +
> + unsigned long thresh;
> +
> + if (se_is_idle(se))
> + thresh = sysctl_sched_min_granularity;
> + else
> + thresh = sysctl_sched_latency;
> +
> + /*
> + * Halve their sleep time's effect, to allow
> + * for a gentler effect of sleepers:
> + */
> + if (sched_feat(GENTLE_FAIR_SLEEPERS))
> + thresh >>= 1;
> +
> + return thresh;
> +}
> +
> static void
> place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> {
> @@ -4664,23 +4684,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> vruntime += sched_vslice(cfs_rq, se);
>
> /* sleeps up to a single latency don't count. */
> - if (!initial) {
> - unsigned long thresh;
> -
> - if (se_is_idle(se))
> - thresh = sysctl_sched_min_granularity;
> - else
> - thresh = sysctl_sched_latency;
> -
> - /*
> - * Halve their sleep time's effect, to allow
> - * for a gentler effect of sleepers:
> - */
> - if (sched_feat(GENTLE_FAIR_SLEEPERS))
> - thresh >>= 1;
> -
> - vruntime -= thresh;
> - }
> + if (!initial)
> + vruntime -= sched_sleeper_credit(se);
>
> /*
> * Pull vruntime of the entity being placed to the base level of
> @@ -4690,7 +4695,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> * inversed due to s64 overflow.
> */
> sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> - if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> + if (se->exec_start != 0 && (s64)sleep_time > 60LL * NSEC_PER_SEC)
> se->vruntime = vruntime;
> else
> se->vruntime = max_vruntime(se->vruntime, vruntime);
> @@ -7634,8 +7639,12 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
> */
> if (READ_ONCE(p->__state) == TASK_WAKING) {
> struct cfs_rq *cfs_rq = cfs_rq_of(se);
> + u64 sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
>
> - se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
> + if ((s64)sleep_time > 60LL * NSEC_PER_SEC)

You also need to test (se->exec_start != 0) here because the task might
migrate another time before being scheduled. You should create a helper
function like the one below and use it in both places:

static inline bool entity_long_sleep(struct sched_entity *se)
{
	struct cfs_rq *cfs_rq;
	u64 sleep_time;

	/* A freshly migrated entity has exec_start reset to 0. */
	if (se->exec_start == 0)
		return false;

	cfs_rq = cfs_rq_of(se);
	sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
	if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
		return true;

	return false;
}
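
A rough sketch of how the helper could then be used at both call sites,
reusing the sched_sleeper_credit() helper introduced in Zhang Qiao's
patch above (illustrative only, not the final patch):

	/* in place_entity() */
	if (entity_long_sleep(se))
		se->vruntime = vruntime;
	else
		se->vruntime = max_vruntime(se->vruntime, vruntime);

	/* in migrate_task_rq_fair(), for a task migrated while TASK_WAKING */
	if (entity_long_sleep(se))
		/* mirror Zhang Qiao's patch: drop the stale vruntime, keep only the sleeper credit */
		se->vruntime = -sched_sleeper_credit(se);
	else
		se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);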


> + se->vruntime = -sched_sleeper_credit(se);
> + else
> + se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
> }
>
> if (!task_on_rq_migrating(p)) {
>
>
>
> Thanks.
> Zhang Qiao.
>
> >
> >>
> >> Thanks,
> >> Roman.
> >>
> >>
> >>
> >> Amazon Development Center Germany GmbH
> >> Krausenstr. 38
> >> 10117 Berlin
> >> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> >> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> >> Sitz: Berlin
> >> Ust-ID: DE 289 237 879
> >>
> >>
> >>
> > .
> >

2023-03-02 14:29:40

by Zhang Qiao

[permalink] [raw]
Subject: Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed



On 2023/3/2 21:34, Vincent Guittot wrote:
> On Thu, 2 Mar 2023 at 10:36, Zhang Qiao <[email protected]> wrote:
>>
>>
>>
>> 在 2023/2/27 22:37, Vincent Guittot 写道:
>>> On Mon, 27 Feb 2023 at 09:43, Roman Kagan <[email protected]> wrote:
>>>>
>>>> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
>>>>> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <[email protected]> wrote:
>>>>>> What scares me, though, is that I've got a message from the test robot
>>>>>> that this commit drammatically affected hackbench results, see the quote
>>>>>> below. I expected the commit not to affect any benchmarks.
>>>>>>
>>>>>> Any idea what could have caused this change?
>>>>>
>>>>> Hmm, It's most probably because se->exec_start is reset after a
>>>>> migration and the condition becomes true for newly migrated task
>>>>> whereas its vruntime should be after min_vruntime.
>>>>>
>>>>> We have missed this condition
>>>>
>>>> Makes sense to me.
>>>>
>>>> But what would then be the reliable way to detect a sched_entity which
>>>> has slept for long and risks overflowing in .vruntime comparison?
>>>
>>> For now I don't have a better idea than adding the same check in
>>> migrate_task_rq_fair()
>>
>> Hi, Vincent,
>> I fixed this condition as you said, and the test results are as follows.
>>
>> testcase: hackbench -g 44 -f 20 --process --pipe -l 60000 -s 100
>> version1: v6.2
>> version2: v6.2 + commit 829c1651e9c4
>> version3: v6.2 + commit 829c1651e9c4 + this patch
>>
>> -------------------------------------------------
>> version1 version2 version3
>> test1 81.0 118.1 82.1
>> test2 82.1 116.9 80.3
>> test3 83.2 103.9 83.3
>> avg(s) 82.1 113.0 81.9
>>
>> -------------------------------------------------
>> After deal with the task migration case, the hackbench result has restored.
>>
>> The patch as follow, how does this look?
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index ff4dbbae3b10..3a88d20fd29e 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4648,6 +4648,26 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
>> #endif
>> }
>>
>> +static inline u64 sched_sleeper_credit(struct sched_entity *se)
>> +{
>> +
>> + unsigned long thresh;
>> +
>> + if (se_is_idle(se))
>> + thresh = sysctl_sched_min_granularity;
>> + else
>> + thresh = sysctl_sched_latency;
>> +
>> + /*
>> + * Halve their sleep time's effect, to allow
>> + * for a gentler effect of sleepers:
>> + */
>> + if (sched_feat(GENTLE_FAIR_SLEEPERS))
>> + thresh >>= 1;
>> +
>> + return thresh;
>> +}
>> +
>> static void
>> place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>> {
>> @@ -4664,23 +4684,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>> vruntime += sched_vslice(cfs_rq, se);
>>
>> /* sleeps up to a single latency don't count. */
>> - if (!initial) {
>> - unsigned long thresh;
>> -
>> - if (se_is_idle(se))
>> - thresh = sysctl_sched_min_granularity;
>> - else
>> - thresh = sysctl_sched_latency;
>> -
>> - /*
>> - * Halve their sleep time's effect, to allow
>> - * for a gentler effect of sleepers:
>> - */
>> - if (sched_feat(GENTLE_FAIR_SLEEPERS))
>> - thresh >>= 1;
>> -
>> - vruntime -= thresh;
>> - }
>> + if (!initial)
>> + vruntime -= sched_sleeper_credit(se);
>>
>> /*
>> * Pull vruntime of the entity being placed to the base level of
>> @@ -4690,7 +4695,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>> * inversed due to s64 overflow.
>> */
>> sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
>> - if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
>> + if (se->exec_start != 0 && (s64)sleep_time > 60LL * NSEC_PER_SEC)
>> se->vruntime = vruntime;
>> else
>> se->vruntime = max_vruntime(se->vruntime, vruntime);
>> @@ -7634,8 +7639,12 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
>> */
>> if (READ_ONCE(p->__state) == TASK_WAKING) {
>> struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> + u64 sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
>>
>> - se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
>> + if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
>
> You also need to test (se->exec_start !=0) here because the task might

Hi,

I don't understand when the other migration would happen. Could you explain in more detail?

I think the next migration happens after the wakee task has been enqueued, but at that point
p->__state is no longer TASK_WAKING; it has already been changed to TASK_RUNNING in ttwu_do_wakeup().

If such a migration exists, the previous code "se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);" might
be performed multiple times; wouldn't that go wrong?

> migrate another time before being scheduled. You should create a
> helper function like below and use it in both place

Ok, I will update at next version.


Thanks,
ZhangQiao.

>
> static inline bool entity_long_sleep(se)
> {
> struct cfs_rq *cfs_rq;
> u64 sleep_time;
>
> if (se->exec_start == 0)
> return false;
>
> cfs_rq = cfs_rq_of(se);
> sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> return true;
>
> return false;
> }
>
>
>> + se->vruntime = -sched_sleeper_credit(se);
>> + else
>> + se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
>> }
>>
>> if (!task_on_rq_migrating(p)) {
>>
>>
>>
>> Thanks.
>> Zhang Qiao.
>>
>>>
>>>>
>>>> Thanks,
>>>> Roman.
>>>>
>>>>
>>>>
>>>> Amazon Development Center Germany GmbH
>>>> Krausenstr. 38
>>>> 10117 Berlin
>>>> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
>>>> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
>>>> Sitz: Berlin
>>>> Ust-ID: DE 289 237 879
>>>>
>>>>
>>>>
>>> .
>>>
> .
>

2023-03-02 14:56:33

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed

On Thu, 2 Mar 2023 at 15:29, Zhang Qiao <[email protected]> wrote:
>
>
>
> 在 2023/3/2 21:34, Vincent Guittot 写道:
> > On Thu, 2 Mar 2023 at 10:36, Zhang Qiao <[email protected]> wrote:
> >>
> >>
> >>
> >> 在 2023/2/27 22:37, Vincent Guittot 写道:
> >>> On Mon, 27 Feb 2023 at 09:43, Roman Kagan <[email protected]> wrote:
> >>>>
> >>>> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
> >>>>> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <[email protected]> wrote:
> >>>>>> What scares me, though, is that I've got a message from the test robot
> >>>>>> that this commit drammatically affected hackbench results, see the quote
> >>>>>> below. I expected the commit not to affect any benchmarks.
> >>>>>>
> >>>>>> Any idea what could have caused this change?
> >>>>>
> >>>>> Hmm, It's most probably because se->exec_start is reset after a
> >>>>> migration and the condition becomes true for newly migrated task
> >>>>> whereas its vruntime should be after min_vruntime.
> >>>>>
> >>>>> We have missed this condition
> >>>>
> >>>> Makes sense to me.
> >>>>
> >>>> But what would then be the reliable way to detect a sched_entity which
> >>>> has slept for long and risks overflowing in .vruntime comparison?
> >>>
> >>> For now I don't have a better idea than adding the same check in
> >>> migrate_task_rq_fair()
> >>
> >> Hi, Vincent,
> >> I fixed this condition as you said, and the test results are as follows.
> >>
> >> testcase: hackbench -g 44 -f 20 --process --pipe -l 60000 -s 100
> >> version1: v6.2
> >> version2: v6.2 + commit 829c1651e9c4
> >> version3: v6.2 + commit 829c1651e9c4 + this patch
> >>
> >> -------------------------------------------------
> >> version1 version2 version3
> >> test1 81.0 118.1 82.1
> >> test2 82.1 116.9 80.3
> >> test3 83.2 103.9 83.3
> >> avg(s) 82.1 113.0 81.9
> >>
> >> -------------------------------------------------
> >> After deal with the task migration case, the hackbench result has restored.
> >>
> >> The patch as follow, how does this look?
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index ff4dbbae3b10..3a88d20fd29e 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -4648,6 +4648,26 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >> #endif
> >> }
> >>
> >> +static inline u64 sched_sleeper_credit(struct sched_entity *se)
> >> +{
> >> +
> >> + unsigned long thresh;
> >> +
> >> + if (se_is_idle(se))
> >> + thresh = sysctl_sched_min_granularity;
> >> + else
> >> + thresh = sysctl_sched_latency;
> >> +
> >> + /*
> >> + * Halve their sleep time's effect, to allow
> >> + * for a gentler effect of sleepers:
> >> + */
> >> + if (sched_feat(GENTLE_FAIR_SLEEPERS))
> >> + thresh >>= 1;
> >> +
> >> + return thresh;
> >> +}
> >> +
> >> static void
> >> place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> >> {
> >> @@ -4664,23 +4684,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> >> vruntime += sched_vslice(cfs_rq, se);
> >>
> >> /* sleeps up to a single latency don't count. */
> >> - if (!initial) {
> >> - unsigned long thresh;
> >> -
> >> - if (se_is_idle(se))
> >> - thresh = sysctl_sched_min_granularity;
> >> - else
> >> - thresh = sysctl_sched_latency;
> >> -
> >> - /*
> >> - * Halve their sleep time's effect, to allow
> >> - * for a gentler effect of sleepers:
> >> - */
> >> - if (sched_feat(GENTLE_FAIR_SLEEPERS))
> >> - thresh >>= 1;
> >> -
> >> - vruntime -= thresh;
> >> - }
> >> + if (!initial)
> >> + vruntime -= sched_sleeper_credit(se);
> >>
> >> /*
> >> * Pull vruntime of the entity being placed to the base level of
> >> @@ -4690,7 +4695,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> >> * inversed due to s64 overflow.
> >> */
> >> sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> >> - if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> >> + if (se->exec_start != 0 && (s64)sleep_time > 60LL * NSEC_PER_SEC)
> >> se->vruntime = vruntime;
> >> else
> >> se->vruntime = max_vruntime(se->vruntime, vruntime);
> >> @@ -7634,8 +7639,12 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
> >> */
> >> if (READ_ONCE(p->__state) == TASK_WAKING) {
> >> struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >> + u64 sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> >>
> >> - se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
> >> + if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> >
> > You also need to test (se->exec_start !=0) here because the task might
>
> Hi,
>
> I don't understand when the another migration happend. Could you tell me in more detail?

se->exec_start is updated when the task becomes current.

You can have the sequence:

task TA runs on CPU0
TA's se->exec_start = xxxx
TA is put back into the rb tree waiting for next slice while another
task is running
CPU1 pulls TA which migrates on CPU1
migrate_task_rq_fair() w/ TA's se->exec_start == xxxx
TA's se->exec_start = 0
TA is put into the rb tree of CPU1 waiting to run on CPU1
CPU2 pulls TA which migrates on CPU2
migrate_task_rq_fair() w/ TA's se->exec_start == 0
TA's se->exec_start = 0
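
For reference, the reset being described is the one at the tail of
migrate_task_rq_fair() (as in v6.2; shown here slightly abridged):

	/* Tell new CPU we are migrated */
	se->avg.last_update_time = 0;

	/* We have migrated, no longer consider this task hot */
	se->exec_start = 0;

	update_scan_period(p, new_cpu);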

>
> I think the next migration will happend after the wakee task enqueued, but at this time
> the p->__state isn't TASK_WAKING, p->__state already be changed to TASK_RUNNING at ttwu_do_wakeup().
>
> If such a migration exists, Previous code "se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);" maybe
> perform multiple times,wouldn't it go wrong in this way?

the vruntime has been updated when the task was enqueued, but exec_start has not

>
> > migrate another time before being scheduled. You should create a
> > helper function like below and use it in both place
>
> Ok, I will update at next version.
>
>
> Thanks,
> ZhangQiao.
>
> >
> > static inline bool entity_long_sleep(se)
> > {
> > struct cfs_rq *cfs_rq;
> > u64 sleep_time;
> >
> > if (se->exec_start == 0)
> > return false;
> >
> > cfs_rq = cfs_rq_of(se);
> > sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> > if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> > return true;
> >
> > return false;
> > }
> >
> >
> >> + se->vruntime = -sched_sleeper_credit(se);
> >> + else
> >> + se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
> >> }
> >>
> >> if (!task_on_rq_migrating(p)) {
> >>
> >>
> >>
> >> Thanks.
> >> Zhang Qiao.
> >>
> >>>
> >>>>
> >>>> Thanks,
> >>>> Roman.
> >>>>
> >>>>
> >>>>
> >>>> Amazon Development Center Germany GmbH
> >>>> Krausenstr. 38
> >>>> 10117 Berlin
> >>>> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> >>>> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> >>>> Sitz: Berlin
> >>>> Ust-ID: DE 289 237 879
> >>>>
> >>>>
> >>>>
> >>> .
> >>>
> > .
> >

2023-03-03 06:51:56

by Zhang Qiao

[permalink] [raw]
Subject: Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed



On 2023/3/2 22:55, Vincent Guittot wrote:
> On Thu, 2 Mar 2023 at 15:29, Zhang Qiao <[email protected]> wrote:
>>
>>
>>
>> 在 2023/3/2 21:34, Vincent Guittot 写道:
>>> On Thu, 2 Mar 2023 at 10:36, Zhang Qiao <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> 在 2023/2/27 22:37, Vincent Guittot 写道:
>>>>> On Mon, 27 Feb 2023 at 09:43, Roman Kagan <[email protected]> wrote:
>>>>>>
>>>>>> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
>>>>>>> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <[email protected]> wrote:
>>>>>>>> What scares me, though, is that I've got a message from the test robot
>>>>>>>> that this commit drammatically affected hackbench results, see the quote
>>>>>>>> below. I expected the commit not to affect any benchmarks.
>>>>>>>>
>>>>>>>> Any idea what could have caused this change?
>>>>>>>
>>>>>>> Hmm, It's most probably because se->exec_start is reset after a
>>>>>>> migration and the condition becomes true for newly migrated task
>>>>>>> whereas its vruntime should be after min_vruntime.
>>>>>>>
>>>>>>> We have missed this condition
>>>>>>
>>>>>> Makes sense to me.
>>>>>>
>>>>>> But what would then be the reliable way to detect a sched_entity which
>>>>>> has slept for long and risks overflowing in .vruntime comparison?
>>>>>
>>>>> For now I don't have a better idea than adding the same check in
>>>>> migrate_task_rq_fair()
>>>>
>>>> Hi, Vincent,
>>>> I fixed this condition as you said, and the test results are as follows.
>>>>
>>>> testcase: hackbench -g 44 -f 20 --process --pipe -l 60000 -s 100
>>>> version1: v6.2
>>>> version2: v6.2 + commit 829c1651e9c4
>>>> version3: v6.2 + commit 829c1651e9c4 + this patch
>>>>
>>>> -------------------------------------------------
>>>> version1 version2 version3
>>>> test1 81.0 118.1 82.1
>>>> test2 82.1 116.9 80.3
>>>> test3 83.2 103.9 83.3
>>>> avg(s) 82.1 113.0 81.9
>>>>
>>>> -------------------------------------------------
>>>> After deal with the task migration case, the hackbench result has restored.
>>>>
>>>> The patch as follow, how does this look?
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index ff4dbbae3b10..3a88d20fd29e 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -4648,6 +4648,26 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>>> #endif
>>>> }
>>>>
>>>> +static inline u64 sched_sleeper_credit(struct sched_entity *se)
>>>> +{
>>>> +
>>>> + unsigned long thresh;
>>>> +
>>>> + if (se_is_idle(se))
>>>> + thresh = sysctl_sched_min_granularity;
>>>> + else
>>>> + thresh = sysctl_sched_latency;
>>>> +
>>>> + /*
>>>> + * Halve their sleep time's effect, to allow
>>>> + * for a gentler effect of sleepers:
>>>> + */
>>>> + if (sched_feat(GENTLE_FAIR_SLEEPERS))
>>>> + thresh >>= 1;
>>>> +
>>>> + return thresh;
>>>> +}
>>>> +
>>>> static void
>>>> place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>>>> {
>>>> @@ -4664,23 +4684,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>>>> vruntime += sched_vslice(cfs_rq, se);
>>>>
>>>> /* sleeps up to a single latency don't count. */
>>>> - if (!initial) {
>>>> - unsigned long thresh;
>>>> -
>>>> - if (se_is_idle(se))
>>>> - thresh = sysctl_sched_min_granularity;
>>>> - else
>>>> - thresh = sysctl_sched_latency;
>>>> -
>>>> - /*
>>>> - * Halve their sleep time's effect, to allow
>>>> - * for a gentler effect of sleepers:
>>>> - */
>>>> - if (sched_feat(GENTLE_FAIR_SLEEPERS))
>>>> - thresh >>= 1;
>>>> -
>>>> - vruntime -= thresh;
>>>> - }
>>>> + if (!initial)
>>>> + vruntime -= sched_sleeper_credit(se);
>>>>
>>>> /*
>>>> * Pull vruntime of the entity being placed to the base level of
>>>> @@ -4690,7 +4695,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>>>> * inversed due to s64 overflow.
>>>> */
>>>> sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
>>>> - if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
>>>> + if (se->exec_start != 0 && (s64)sleep_time > 60LL * NSEC_PER_SEC)
>>>> se->vruntime = vruntime;
>>>> else
>>>> se->vruntime = max_vruntime(se->vruntime, vruntime);
>>>> @@ -7634,8 +7639,12 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
>>>> */
>>>> if (READ_ONCE(p->__state) == TASK_WAKING) {
>>>> struct cfs_rq *cfs_rq = cfs_rq_of(se);
>>>> + u64 sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
>>>>
>>>> - se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
>>>> + if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
>>>
>>> You also need to test (se->exec_start !=0) here because the task might
>>
>> Hi,
>>
>> I don't understand when the another migration happend. Could you tell me in more detail?
>
> se->exec_start is update when the task becomes current.
>
> You can have the sequence:
>
> task TA runs on CPU0
> TA's se->exec_start = xxxx
> TA is put back into the rb tree waiting for next slice while another
> task is running
> CPU1 pulls TA which migrates on CPU1
> migrate_task_rq_fair() w/ TA's se->exec_start == xxxx
> TA's se->exec_start = 0
> TA is put into the rb tree of CPU1 waiting to run on CPU1
> CPU2 pulls TA which migrates on CPU2
> migrate_task_rq_fair() w/ TA's se->exec_start == 0
> TA's se->exec_start = 0
Hi, Vincent,

Yes, you're right, such a sequence does exist. But at that point, p->__state != TASK_WAKING.

I have a question: is there any case where "p->se.exec_start == 0 && p->__state == TASK_WAKING" holds?
I analyzed the code and concluded that this case doesn't exist; is that right?

Thanks.
ZhangQiao.

>
>>
>> I think the next migration will happend after the wakee task enqueued, but at this time
>> the p->__state isn't TASK_WAKING, p->__state already be changed to TASK_RUNNING at ttwu_do_wakeup().
>>
>> If such a migration exists, Previous code "se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);" maybe
>> perform multiple times,wouldn't it go wrong in this way?
>
> the vruntime have been updated when enqueued but not exec_start
>
>>
>>> migrate another time before being scheduled. You should create a
>>> helper function like below and use it in both place
>>
>> Ok, I will update at next version.
>>
>>
>> Thanks,
>> ZhangQiao.
>>
>>>
>>> static inline bool entity_long_sleep(se)
>>> {
>>> struct cfs_rq *cfs_rq;
>>> u64 sleep_time;
>>>
>>> if (se->exec_start == 0)
>>> return false;
>>>
>>> cfs_rq = cfs_rq_of(se);
>>> sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
>>> if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
>>> return true;
>>>
>>> return false;
>>> }
>>>
>>>
>>>> + se->vruntime = -sched_sleeper_credit(se);
>>>> + else
>>>> + se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
>>>> }
>>>>
>>>> if (!task_on_rq_migrating(p)) {
>>>>
>>>>
>>>>
>>>> Thanks.
>>>> Zhang Qiao.
>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Roman.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Amazon Development Center Germany GmbH
>>>>>> Krausenstr. 38
>>>>>> 10117 Berlin
>>>>>> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
>>>>>> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
>>>>>> Sitz: Berlin
>>>>>> Ust-ID: DE 289 237 879
>>>>>>
>>>>>>
>>>>>>
>>>>> .
>>>>>
>>> .
>>>
> .
>

2023-03-03 08:33:10

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed

On Fri, 3 Mar 2023 at 07:51, Zhang Qiao <[email protected]> wrote:
>
>
>
> 在 2023/3/2 22:55, Vincent Guittot 写道:
> > On Thu, 2 Mar 2023 at 15:29, Zhang Qiao <[email protected]> wrote:
> >>
> >>
> >>
> >> 在 2023/3/2 21:34, Vincent Guittot 写道:
> >>> On Thu, 2 Mar 2023 at 10:36, Zhang Qiao <[email protected]> wrote:
> >>>>
> >>>>
> >>>>
> >>>> 在 2023/2/27 22:37, Vincent Guittot 写道:
> >>>>> On Mon, 27 Feb 2023 at 09:43, Roman Kagan <[email protected]> wrote:
> >>>>>>
> >>>>>> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
> >>>>>>> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <[email protected]> wrote:
> >>>>>>>> What scares me, though, is that I've got a message from the test robot
> >>>>>>>> that this commit drammatically affected hackbench results, see the quote
> >>>>>>>> below. I expected the commit not to affect any benchmarks.
> >>>>>>>>
> >>>>>>>> Any idea what could have caused this change?
> >>>>>>>
> >>>>>>> Hmm, It's most probably because se->exec_start is reset after a
> >>>>>>> migration and the condition becomes true for newly migrated task
> >>>>>>> whereas its vruntime should be after min_vruntime.
> >>>>>>>
> >>>>>>> We have missed this condition
> >>>>>>
> >>>>>> Makes sense to me.
> >>>>>>
> >>>>>> But what would then be the reliable way to detect a sched_entity which
> >>>>>> has slept for long and risks overflowing in .vruntime comparison?
> >>>>>
> >>>>> For now I don't have a better idea than adding the same check in
> >>>>> migrate_task_rq_fair()
> >>>>
> >>>> Hi, Vincent,
> >>>> I fixed this condition as you said, and the test results are as follows.
> >>>>
> >>>> testcase: hackbench -g 44 -f 20 --process --pipe -l 60000 -s 100
> >>>> version1: v6.2
> >>>> version2: v6.2 + commit 829c1651e9c4
> >>>> version3: v6.2 + commit 829c1651e9c4 + this patch
> >>>>
> >>>> -------------------------------------------------
> >>>> version1 version2 version3
> >>>> test1 81.0 118.1 82.1
> >>>> test2 82.1 116.9 80.3
> >>>> test3 83.2 103.9 83.3
> >>>> avg(s) 82.1 113.0 81.9
> >>>>
> >>>> -------------------------------------------------
> >>>> After deal with the task migration case, the hackbench result has restored.
> >>>>
> >>>> The patch as follow, how does this look?
> >>>>
> >>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>>> index ff4dbbae3b10..3a88d20fd29e 100644
> >>>> --- a/kernel/sched/fair.c
> >>>> +++ b/kernel/sched/fair.c
> >>>> @@ -4648,6 +4648,26 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >>>> #endif
> >>>> }
> >>>>
> >>>> +static inline u64 sched_sleeper_credit(struct sched_entity *se)
> >>>> +{
> >>>> +
> >>>> + unsigned long thresh;
> >>>> +
> >>>> + if (se_is_idle(se))
> >>>> + thresh = sysctl_sched_min_granularity;
> >>>> + else
> >>>> + thresh = sysctl_sched_latency;
> >>>> +
> >>>> + /*
> >>>> + * Halve their sleep time's effect, to allow
> >>>> + * for a gentler effect of sleepers:
> >>>> + */
> >>>> + if (sched_feat(GENTLE_FAIR_SLEEPERS))
> >>>> + thresh >>= 1;
> >>>> +
> >>>> + return thresh;
> >>>> +}
> >>>> +
> >>>> static void
> >>>> place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> >>>> {
> >>>> @@ -4664,23 +4684,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> >>>> vruntime += sched_vslice(cfs_rq, se);
> >>>>
> >>>> /* sleeps up to a single latency don't count. */
> >>>> - if (!initial) {
> >>>> - unsigned long thresh;
> >>>> -
> >>>> - if (se_is_idle(se))
> >>>> - thresh = sysctl_sched_min_granularity;
> >>>> - else
> >>>> - thresh = sysctl_sched_latency;
> >>>> -
> >>>> - /*
> >>>> - * Halve their sleep time's effect, to allow
> >>>> - * for a gentler effect of sleepers:
> >>>> - */
> >>>> - if (sched_feat(GENTLE_FAIR_SLEEPERS))
> >>>> - thresh >>= 1;
> >>>> -
> >>>> - vruntime -= thresh;
> >>>> - }
> >>>> + if (!initial)
> >>>> + vruntime -= sched_sleeper_credit(se);
> >>>>
> >>>> /*
> >>>> * Pull vruntime of the entity being placed to the base level of
> >>>> @@ -4690,7 +4695,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> >>>> * inversed due to s64 overflow.
> >>>> */
> >>>> sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> >>>> - if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> >>>> + if (se->exec_start != 0 && (s64)sleep_time > 60LL * NSEC_PER_SEC)
> >>>> se->vruntime = vruntime;
> >>>> else
> >>>> se->vruntime = max_vruntime(se->vruntime, vruntime);
> >>>> @@ -7634,8 +7639,12 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
> >>>> */
> >>>> if (READ_ONCE(p->__state) == TASK_WAKING) {
> >>>> struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >>>> + u64 sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> >>>>
> >>>> - se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
> >>>> + if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> >>>
> >>> You also need to test (se->exec_start !=0) here because the task might
> >>
> >> Hi,
> >>
> >> I don't understand when the another migration happend. Could you tell me in more detail?
> >
> > se->exec_start is update when the task becomes current.
> >
> > You can have the sequence:
> >
> > task TA runs on CPU0
> > TA's se->exec_start = xxxx
> > TA is put back into the rb tree waiting for next slice while another
> > task is running
> > CPU1 pulls TA which migrates on CPU1
> > migrate_task_rq_fair() w/ TA's se->exec_start == xxxx
> > TA's se->exec_start = 0
> > TA is put into the rb tree of CPU1 waiting to run on CPU1
> > CPU2 pulls TA which migrates on CPU2
> > migrate_task_rq_fair() w/ TA's se->exec_start == 0
> > TA's se->exec_start = 0
> Hi, Vincent,
>
> yes, you're right, such sequence does exist. But at this point, p->__state != TASK_WAKING.
>
> I have a question, Whether there is case that is "p->se.exec_start == 0 && p->__state == TASK_WAKING" ?
> I analyzed the code and concluded that this case isn't existed, is it right?

Yes, you're right. Your proposal is enough.

Thanks

>
> Thanks.
> ZhangQiao.
>
> >
> >>
> >> I think the next migration will happend after the wakee task enqueued, but at this time
> >> the p->__state isn't TASK_WAKING, p->__state already be changed to TASK_RUNNING at ttwu_do_wakeup().
> >>
> >> If such a migration exists, Previous code "se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);" maybe
> >> perform multiple times,wouldn't it go wrong in this way?
> >
> > the vruntime have been updated when enqueued but not exec_start
> >
> >>
> >>> migrate another time before being scheduled. You should create a
> >>> helper function like below and use it in both place
> >>
> >> Ok, I will update at next version.
> >>
> >>
> >> Thanks,
> >> ZhangQiao.
> >>
> >>>
> >>> static inline bool entity_long_sleep(se)
> >>> {
> >>> struct cfs_rq *cfs_rq;
> >>> u64 sleep_time;
> >>>
> >>> if (se->exec_start == 0)
> >>> return false;
> >>>
> >>> cfs_rq = cfs_rq_of(se);
> >>> sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> >>> if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> >>> return true;
> >>>
> >>> return false;
> >>> }
> >>>
> >>>
> >>>> + se->vruntime = -sched_sleeper_credit(se);
> >>>> + else
> >>>> + se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
> >>>> }
> >>>>
> >>>> if (!task_on_rq_migrating(p)) {
> >>>>
> >>>>
> >>>>
> >>>> Thanks.
> >>>> Zhang Qiao.
> >>>>
> >>>>>
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Roman.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Amazon Development Center Germany GmbH
> >>>>>> Krausenstr. 38
> >>>>>> 10117 Berlin
> >>>>>> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> >>>>>> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> >>>>>> Sitz: Berlin
> >>>>>> Ust-ID: DE 289 237 879
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>> .
> >>>>>
> >>> .
> >>>
> > .
> >