2020-03-20 15:13:43

by Mel Gorman

Subject: [PATCH 0/4] Throttle select_idle_sibling when a target domain is overloaded

This is a follow-on from the CPU/NUMA load balancer reconciliation
after I noticed that select_idle_sibling() was doing excessive work. It
was originally part of a larger series that merged select_idle_core,
select_idle_sibling and select_idle_cpu into a single pass. Unfortunately,
fixes have invalidated the test results multiple times, so this series
covers only one part for now as the tests are extremely time-consuming.

tip/sched/core as of March 13th was used as the baseline, with "sched/fair:
fix condition of avg_load calculation" applied; that fix had just been
picked up by tip at the time of writing.

Patches 1-2 add schedstats to track the efficiency of
select_idle_sibling(). Ordinarily these stats are disabled and are only
really of use to a kernel developer, but I find them more practical to
work with than perf.

Patch 3 is a trivial micro-optimisation that avoids clearing part of
a cpumask if a core has been found.

Patch 4 tracks whether a domain appeared to be overloaded during
select_idle_cpu() so that future scans can abort early if necessary.
This reduces the number of runqueues that are scanned uselessly when
a domain is overloaded.

include/linux/sched/topology.h | 1 +
kernel/sched/debug.c | 6 +++
kernel/sched/fair.c | 103 +++++++++++++++++++++++++++++++++++------
kernel/sched/features.h | 3 ++
kernel/sched/sched.h | 8 ++++
kernel/sched/stats.c | 9 ++--
6 files changed, 113 insertions(+), 17 deletions(-)

--
2.16.4


2020-03-20 15:14:18

by Mel Gorman

Subject: [PATCH 1/4] sched/fair: Track efficiency of select_idle_sibling

select_idle_sibling is an important path that finds a nearby idle CPU on
wakeup. As it examines the state of other CPUs, it can be expensive in
terms of cache usage. This patch tracks the search efficiency if
schedstats are enabled. In general, this is only useful to a kernel
developer, but schedstats are typically disabled by default so it is
convenient for development and mostly free otherwise.

The series can be done without this patch but the stats were used to
generate a number of useful metrics in MMTests to analyse what was
going on.

SIS Search: Number of calls to select_idle_sibling

SIS Domain Search: Number of times the domain was searched because the
fast path failed.

SIS Scanned: Generally the number of runqueues scanned but the fast
path counts as 1 regardless of the values for target, prev
and recent.

SIS Domain Scanned: Number of runqueues scanned during a search of the
LLC domain.

SIS Failures: Number of SIS calls that failed to find an idle CPU

SIS Search Efficiency: A ratio expressed as a percentage of runqueues
scanned versus idle CPUs found. A 100% efficiency indicates that
the target, prev or recent CPU of a task was idle at wakeup. The
lower the efficiency, the more runqueues were scanned before an
idle CPU was found.

SIS Domain Search Efficiency: Similar, except only for the slower SIS
path.

SIS Fast Success Rate: Percentage of SIS that used target, prev or
recent CPUs.

SIS Success rate: Percentage of scans that found an idle CPU.
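
As an illustration, the derived metrics can be computed from deltas of
the raw counters roughly as in the sketch below. This is hypothetical
post-processing inferred from the descriptions above; sis_report() and
the sample numbers are made up and not lifted from MMTests.

#include <stdio.h>

/* Hypothetical sketch: derive the ratios from /proc/schedstat deltas */
static void sis_report(unsigned long long search,
		       unsigned long long domain_search,
		       unsigned long long scanned,
		       unsigned long long failed)
{
	/* Calls that found an idle CPU somewhere */
	unsigned long long found = search - failed;

	printf("SIS Search Efficiency  %6.2f%%\n", 100.0 * found / scanned);
	printf("SIS Fast Success Rate  %6.2f%%\n",
	       100.0 * (search - domain_search) / search);
	printf("SIS Success Rate       %6.2f%%\n", 100.0 * found / search);
}

int main(void)
{
	/* Example deltas between two samples of /proc/schedstat */
	sis_report(1000000, 400000, 5000000, 250000);
	return 0;
}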

Signed-off-by: Mel Gorman <[email protected]>
---
kernel/sched/debug.c | 4 ++++
kernel/sched/fair.c | 14 ++++++++++++++
kernel/sched/sched.h | 6 ++++++
kernel/sched/stats.c | 8 +++++---
4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 8331bc04aea2..7af6e8a12f40 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -661,6 +661,10 @@ do { \
P(sched_goidle);
P(ttwu_count);
P(ttwu_local);
+ P(sis_search);
+ P(sis_domain_search);
+ P(sis_scanned);
+ P(sis_failed);
}
#undef P

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1dea8554ead0..9d32a81ece08 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5999,6 +5999,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
bool idle = true;

for_each_cpu(cpu, cpu_smt_mask(core)) {
+ schedstat_inc(this_rq()->sis_scanned);
if (!available_idle_cpu(cpu)) {
idle = false;
break;
@@ -6029,6 +6030,7 @@ static int select_idle_smt(struct task_struct *p, int target)
return -1;

for_each_cpu(cpu, cpu_smt_mask(target)) {
+ schedstat_inc(this_rq()->sis_scanned);
if (!cpumask_test_cpu(cpu, p->cpus_ptr))
continue;
if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
@@ -6094,6 +6096,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);

for_each_cpu_wrap(cpu, cpus, target) {
+ schedstat_inc(this_rq()->sis_scanned);
if (!--nr)
return -1;
if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
@@ -6150,6 +6153,15 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
struct sched_domain *sd;
int i, recent_used_cpu;

+ schedstat_inc(this_rq()->sis_search);
+
+ /*
+ * Checking prev, target and recent is treated as one scan. A
+ * perfect hit on one of those is considered 100% efficiency.
+ * Further scanning impairs efficiency.
+ */
+ schedstat_inc(this_rq()->sis_scanned);
+
/*
* For asymmetric CPU capacity systems, our domain of interest is
* sd_asym_cpucapacity rather than sd_llc.
@@ -6215,6 +6227,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
if (!sd)
return target;

+ schedstat_inc(this_rq()->sis_domain_search);
i = select_idle_core(p, sd, target);
if ((unsigned)i < nr_cpumask_bits)
return i;
@@ -6227,6 +6240,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
if ((unsigned)i < nr_cpumask_bits)
return i;

+ schedstat_inc(this_rq()->sis_failed);
return target;
}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9e173fad0425..ff23cd92d25f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1008,6 +1008,12 @@ struct rq {
/* try_to_wake_up() stats */
unsigned int ttwu_count;
unsigned int ttwu_local;
+
+ /* select_idle_sibling stats */
+ unsigned int sis_search;
+ unsigned int sis_domain_search;
+ unsigned int sis_scanned;
+ unsigned int sis_failed;
#endif

#ifdef CONFIG_SMP
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 750fb3c67eed..390bfcc3842c 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -10,7 +10,7 @@
* Bump this up when changing the output format or the meaning of an existing
* format, so that tools can adapt (or abort)
*/
-#define SCHEDSTAT_VERSION 15
+#define SCHEDSTAT_VERSION 16

static int show_schedstat(struct seq_file *seq, void *v)
{
@@ -30,12 +30,14 @@ static int show_schedstat(struct seq_file *seq, void *v)

/* runqueue-specific stats */
seq_printf(seq,
- "cpu%d %u 0 %u %u %u %u %llu %llu %lu",
+ "cpu%d %u 0 %u %u %u %u %llu %llu %lu %u %u %u %u",
cpu, rq->yld_count,
rq->sched_count, rq->sched_goidle,
rq->ttwu_count, rq->ttwu_local,
rq->rq_cpu_time,
- rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount);
+ rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount,
+ rq->sis_search, rq->sis_domain_search,
+ rq->sis_scanned, rq->sis_failed);

seq_printf(seq, "\n");

--
2.16.4

2020-03-20 15:14:21

by Mel Gorman

Subject: [PATCH 2/4] sched/fair: Track efficiency of task recent_used_cpu

This simply tracks the efficiency of p->recent_used_cpu. The hit rate
matters because each hit avoids a domain search. Similarly, the miss
rate matters because each miss is a penalty to the fast path. MMTests
uses this to generate additional metrics:

SIS Recent Used Hit: A recent CPU was eligible and used. Each hit is
a domain search avoided.

SIS Recent Used Miss: A recent CPU was eligible but unavailable. Each
time this is hit, there was a penalty to the fast path before
a domain search happened.

SIS Recent Success Rate: A percentage of the number of hits versus
the total attempts to use the recent CPU.

SIS Recent Attempts: The total number of times the recent CPU was examined.
A high number of Recent Attempts with a low Success Rate implies
the fast path is being punished severely. This could have been
presented as a weighting of hits and misses but calculating an
appropriate weight for misses is problematic.
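
For completeness, the derived values relate to the raw counters as in
this hypothetical sketch, again inferred from the descriptions above
rather than taken from MMTests:

/* Hypothetical post-processing; attempts = hits + misses */
static double sis_recent_success_rate(unsigned long long hit,
				      unsigned long long miss)
{
	unsigned long long attempts = hit + miss;	/* SIS Recent Attempts */

	return attempts ? 100.0 * hit / attempts : 0.0;
}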

Signed-off-by: Mel Gorman <[email protected]>
---
kernel/sched/debug.c | 2 ++
kernel/sched/fair.c | 21 ++++++++++++---------
kernel/sched/sched.h | 2 ++
kernel/sched/stats.c | 7 ++++---
4 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 7af6e8a12f40..d58f333bb739 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -665,6 +665,8 @@ do { \
P(sis_domain_search);
P(sis_scanned);
P(sis_failed);
+ P(sis_recent_hit);
+ P(sis_recent_miss);
}
#undef P

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9d32a81ece08..7f4356c520be 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6212,15 +6212,18 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
recent_used_cpu = p->recent_used_cpu;
if (recent_used_cpu != prev &&
recent_used_cpu != target &&
- cpus_share_cache(recent_used_cpu, target) &&
- (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
- cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr)) {
- /*
- * Replace recent_used_cpu with prev as it is a potential
- * candidate for the next wake:
- */
- p->recent_used_cpu = prev;
- return recent_used_cpu;
+ cpus_share_cache(recent_used_cpu, target)) {
+ if ((available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
+ cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr)) {
+ /*
+ * Replace recent_used_cpu with prev as it is a potential
+ * candidate for the next wake:
+ */
+ p->recent_used_cpu = prev;
+ schedstat_inc(this_rq()->sis_recent_hit);
+ return recent_used_cpu;
+ }
+ schedstat_inc(this_rq()->sis_recent_miss);
}

sd = rcu_dereference(per_cpu(sd_llc, target));
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ff23cd92d25f..3cb7de0b11d7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1014,6 +1014,8 @@ struct rq {
unsigned int sis_domain_search;
unsigned int sis_scanned;
unsigned int sis_failed;
+ unsigned int sis_recent_hit;
+ unsigned int sis_recent_miss;
#endif

#ifdef CONFIG_SMP
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 390bfcc3842c..402fab75aa14 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -10,7 +10,7 @@
* Bump this up when changing the output format or the meaning of an existing
* format, so that tools can adapt (or abort)
*/
-#define SCHEDSTAT_VERSION 16
+#define SCHEDSTAT_VERSION 17

static int show_schedstat(struct seq_file *seq, void *v)
{
@@ -30,14 +30,15 @@ static int show_schedstat(struct seq_file *seq, void *v)

/* runqueue-specific stats */
seq_printf(seq,
- "cpu%d %u 0 %u %u %u %u %llu %llu %lu %u %u %u %u",
+ "cpu%d %u 0 %u %u %u %u %llu %llu %lu %u %u %u %u %u %u",
cpu, rq->yld_count,
rq->sched_count, rq->sched_goidle,
rq->ttwu_count, rq->ttwu_local,
rq->rq_cpu_time,
rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount,
rq->sis_search, rq->sis_domain_search,
- rq->sis_scanned, rq->sis_failed);
+ rq->sis_scanned, rq->sis_failed,
+ rq->sis_recent_hit, rq->sis_recent_miss);

seq_printf(seq, "\n");

--
2.16.4

2020-03-20 15:14:53

by Mel Gorman

Subject: [PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary

Once a domain is overloaded, it is very unlikely that a free CPU will
be found in the short term but there is still potentially a lot of
scanning. This patch tracks if a domain may be overloaded due to an
excessive number of running tasks relative to available CPUs. In the
event a domain is overloaded, a search is aborted.

This has a variable impact on performance for hackbench, which often
overloads the test machines used. There was a mix of performance gains
and losses, but there is a substantial impact on search efficiency.

On a 2-socket broadwell machine with 80 cores in total, tbench showed
small gains and some losses

Hmean 1 431.51 ( 0.00%) 426.53 * -1.15%*
Hmean 2 842.69 ( 0.00%) 839.00 * -0.44%*
Hmean 4 1631.09 ( 0.00%) 1634.81 * 0.23%*
Hmean 8 3001.08 ( 0.00%) 3020.85 * 0.66%*
Hmean 16 5631.75 ( 0.00%) 5655.04 * 0.41%*
Hmean 32 9736.22 ( 0.00%) 9645.68 * -0.93%*
Hmean 64 13978.54 ( 0.00%) 15215.65 * 8.85%*
Hmean 128 20093.06 ( 0.00%) 19389.45 * -3.50%*
Hmean 256 17491.34 ( 0.00%) 18616.32 * 6.43%*
Hmean 320 17423.67 ( 0.00%) 17793.38 * 2.12%*

However, the "SIS Domain Search Efficiency" went from 6.03% to 19.61%
indicating that far fewer CPUs were scanned. The impact of the patch
is more noticable when sockets have multiple L3 caches. While true for
EPYC 2nd generation, it's particularly noticable on EPYC 1st generation

Hmean 1 325.30 ( 0.00%) 324.92 * -0.12%*
Hmean 2 630.77 ( 0.00%) 621.35 * -1.49%*
Hmean 4 1211.41 ( 0.00%) 1148.51 * -5.19%*
Hmean 8 2017.29 ( 0.00%) 1953.57 * -3.16%*
Hmean 16 4068.81 ( 0.00%) 3514.06 * -13.63%*
Hmean 32 5588.20 ( 0.00%) 6583.58 * 17.81%*
Hmean 64 8470.14 ( 0.00%) 10117.26 * 19.45%*
Hmean 128 11462.06 ( 0.00%) 17207.68 * 50.13%*
Hmean 256 11433.74 ( 0.00%) 13446.93 * 17.61%*
Hmean 512 12576.88 ( 0.00%) 13630.08 * 8.37%*

On this machine, search efficiency goes from 21.04% to 32.66%. There
is a noticeable problem at 16 clients, when there are enough of them
for an LLC domain to spill over.

With hackbench, the overload problem is a bit more obvious. On the
2-socket broadwell machine using processes and pipes we see

Amean 1 0.3023 ( 0.00%) 0.2893 ( 4.30%)
Amean 4 0.6823 ( 0.00%) 0.6930 ( -1.56%)
Amean 7 1.0293 ( 0.00%) 1.0380 ( -0.84%)
Amean 12 1.6913 ( 0.00%) 1.7027 ( -0.67%)
Amean 21 2.9307 ( 0.00%) 2.9297 ( 0.03%)
Amean 30 4.0040 ( 0.00%) 4.0270 ( -0.57%)
Amean 48 6.0703 ( 0.00%) 6.1067 ( -0.60%)
Amean 79 9.0630 ( 0.00%) 9.1223 * -0.65%*
Amean 110 12.1917 ( 0.00%) 12.1693 ( 0.18%)
Amean 141 15.7150 ( 0.00%) 15.4187 ( 1.89%)
Amean 172 19.5327 ( 0.00%) 18.9937 ( 2.76%)
Amean 203 23.3093 ( 0.00%) 22.2497 * 4.55%*
Amean 234 27.8657 ( 0.00%) 25.9627 * 6.83%*
Amean 265 32.9783 ( 0.00%) 29.5240 * 10.47%*
Amean 296 35.6727 ( 0.00%) 32.8260 * 7.98%*

More of the SIS stats are worth looking at in this case

Ops SIS Domain Search 10390526707.00 9822163508.00
Ops SIS Scanned 223173467577.00 48330226094.00
Ops SIS Domain Scanned 222820381314.00 47964114165.00
Ops SIS Failures 10183794873.00 9639912418.00
Ops SIS Recent Used Hit 22194515.00 22517194.00
Ops SIS Recent Used Miss 5733847634.00 5500415074.00
Ops SIS Recent Attempts 5756042149.00 5522932268.00
Ops SIS Search Efficiency 4.81 21.08

Search efficiency goes from 4.81% to 21.08%, but SIS Domain Scanned
shows the sheer volume of searching SIS does when the prev, target and
recent CPUs are unavailable.

This could be much more aggressive by also cutting off a search for idle
cores. However, to make that work properly requires a much more intrusive
series that is likely to be controversial. This seemed like a reasonable
tradeoff to tackle the most obvious problem with select_idle_cpu.

Signed-off-by: Mel Gorman <[email protected]>
---
include/linux/sched/topology.h | 1 +
kernel/sched/fair.c | 65 +++++++++++++++++++++++++++++++++++++++---
kernel/sched/features.h | 3 ++
3 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index af9319e4cfb9..76ec7a54f57b 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -66,6 +66,7 @@ struct sched_domain_shared {
atomic_t ref;
atomic_t nr_busy_cpus;
int has_idle_cores;
+ int is_overloaded;
};

struct sched_domain {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 41913fac68de..31e011e627db 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5924,6 +5924,38 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
return new_cpu;
}

+static inline void
+set_sd_overloaded(struct sched_domain_shared *sds, int val)
+{
+ if (!sds)
+ return;
+
+ WRITE_ONCE(sds->is_overloaded, val);
+}
+
+static inline bool test_sd_overloaded(struct sched_domain_shared *sds)
+{
+ return READ_ONCE(sds->is_overloaded);
+}
+
+/* Returns true if a previously overloaded domain is likely still overloaded. */
+static inline bool
+abort_sd_overloaded(struct sched_domain_shared *sds, int prev, int target)
+{
+ if (!sds || !test_sd_overloaded(sds))
+ return false;
+
+ /* Does either target or a suitable prev have 1 or 0 running tasks? */
+ if (cpu_rq(target)->nr_running <= 1 ||
+ (prev != target && cpus_share_cache(prev, target) &&
+ cpu_rq(prev)->nr_running <= 1)) {
+ set_sd_overloaded(sds, 0);
+ return false;
+ }
+
+ return true;
+}
+
#ifdef CONFIG_SCHED_SMT
DEFINE_STATIC_KEY_FALSE(sched_smt_present);
EXPORT_SYMBOL_GPL(sched_smt_present);
@@ -6060,15 +6092,18 @@ static inline int select_idle_smt(struct task_struct *p, int target)
* comparing the average scan cost (tracked in sd->avg_scan_cost) against the
* average idle time for this rq (as found in rq->avg_idle).
*/
-static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd,
+ int prev, int target)
{
struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
struct sched_domain *this_sd;
+ struct sched_domain_shared *sds;
u64 avg_cost, avg_idle;
u64 time, cost;
s64 delta;
int this = smp_processor_id();
int cpu, nr = INT_MAX;
+ int nr_scanned = 0, nr_running = 0;

this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
if (!this_sd)
@@ -6092,18 +6127,40 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
nr = 4;
}

+ sds = rcu_dereference(per_cpu(sd_llc_shared, target));
+ if (sched_feat(SIS_OVERLOAD)) {
+ if (abort_sd_overloaded(sds, prev, target))
+ return -1;
+ }
+
time = cpu_clock(this);

cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);

for_each_cpu_wrap(cpu, cpus, target) {
schedstat_inc(this_rq()->sis_scanned);
- if (!--nr)
- return -1;
+ if (!--nr) {
+ cpu = -1;
+ break;
+ }
if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
break;
+ if (sched_feat(SIS_OVERLOAD)) {
+ nr_scanned++;
+ nr_running += cpu_rq(cpu)->nr_running;
+ }
}

+ /* If no idle CPU was found, check whether to mark the domain overloaded. */
+ if (sched_feat(SIS_OVERLOAD) && (signed)cpu >= nr_cpumask_bits &&
+ nr_scanned && nr_running > (nr_scanned << 1)) {
+ set_sd_overloaded(sds, 1);
+ }
+
+ /* Scan cost not accounted for if scan is throttled */
+ if (!nr)
+ return -1;
+
time = cpu_clock(this) - time;
cost = this_sd->avg_scan_cost;
delta = (s64)(time - cost) / 8;
@@ -6236,7 +6293,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
if ((unsigned)i < nr_cpumask_bits)
return i;

- i = select_idle_cpu(p, sd, target);
+ i = select_idle_cpu(p, sd, prev, target);
if ((unsigned)i < nr_cpumask_bits)
return i;

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 7481cd96f391..c36ae01910e2 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -57,6 +57,9 @@ SCHED_FEAT(TTWU_QUEUE, true)
SCHED_FEAT(SIS_AVG_CPU, false)
SCHED_FEAT(SIS_PROP, true)

+/* Limit scans if the domain is likely overloaded */
+SCHED_FEAT(SIS_OVERLOAD, true)
+
/*
* Issue a WARN when we do multiple update_rq_clock() calls
* in a single rq->lock section. Default disabled because the
--
2.16.4

2020-03-20 15:15:44

by Mel Gorman

Subject: [PATCH 3/4] sched/fair: Clear SMT siblings after determining the core is not idle

The clearing of SMT siblings from the SIS mask before checking for an idle
core is a small but unnecessary cost. Defer the clearing of the siblings
until the scan moves to the next potential target. The cost of this was
not measured as it is borderline noise but it should be self-evident.

Signed-off-by: Mel Gorman <[email protected]>
---
kernel/sched/fair.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7f4356c520be..41913fac68de 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6005,10 +6005,11 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
break;
}
}
- cpumask_andnot(cpus, cpus, cpu_smt_mask(core));

if (idle)
return core;
+
+ cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
}

/*
--
2.16.4

2020-03-20 15:50:29

by Vincent Guittot

Subject: Re: [PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary

On Fri, 20 Mar 2020 at 16:13, Mel Gorman <[email protected]> wrote:
>
> Once a domain is overloaded, it is very unlikely that a free CPU will
> be found in the short term but there is still potentially a lot of
> scanning. This patch tracks if a domain may be overloaded due to an
> excessive number of running tasks relative to available CPUs. In the
> event a domain is overloaded, a search is aborted.
>
> [...]
>
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index af9319e4cfb9..76ec7a54f57b 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -66,6 +66,7 @@ struct sched_domain_shared {
> atomic_t ref;
> atomic_t nr_busy_cpus;
> int has_idle_cores;
> + int is_overloaded;

Can't nr_busy_cpus compared to sd->span_weight give you similar status ?

> };
>
> [...]

2020-03-20 16:45:48

by Mel Gorman

Subject: Re: [PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary

On Fri, Mar 20, 2020 at 04:48:39PM +0100, Vincent Guittot wrote:
> > ---
> > include/linux/sched/topology.h | 1 +
> > kernel/sched/fair.c | 65 +++++++++++++++++++++++++++++++++++++++---
> > kernel/sched/features.h | 3 ++
> > 3 files changed, 65 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > index af9319e4cfb9..76ec7a54f57b 100644
> > --- a/include/linux/sched/topology.h
> > +++ b/include/linux/sched/topology.h
> > @@ -66,6 +66,7 @@ struct sched_domain_shared {
> > atomic_t ref;
> > atomic_t nr_busy_cpus;
> > int has_idle_cores;
> > + int is_overloaded;
>
> Can't nr_busy_cpus compared to sd->span_weight give you similar status ?
>

It's connected to nohz balancing and I didn't see how I could use that
for detecting overload. Also, I don't think it ever can be larger than
the sd weight and overload is based on the number of running tasks being
greater than the number of available CPUs. Did I miss something obvious?

--
Mel Gorman
SUSE Labs

2020-03-20 16:56:21

by Vincent Guittot

Subject: Re: [PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary

On Fri, 20 Mar 2020 at 17:44, Mel Gorman <[email protected]> wrote:
>
> On Fri, Mar 20, 2020 at 04:48:39PM +0100, Vincent Guittot wrote:
> > > ---
> > > include/linux/sched/topology.h | 1 +
> > > kernel/sched/fair.c | 65 +++++++++++++++++++++++++++++++++++++++---
> > > kernel/sched/features.h | 3 ++
> > > 3 files changed, 65 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > > index af9319e4cfb9..76ec7a54f57b 100644
> > > --- a/include/linux/sched/topology.h
> > > +++ b/include/linux/sched/topology.h
> > > @@ -66,6 +66,7 @@ struct sched_domain_shared {
> > > atomic_t ref;
> > > atomic_t nr_busy_cpus;
> > > int has_idle_cores;
> > > + int is_overloaded;
> >
> > Can't nr_busy_cpus compared to sd->span_weight give you similar status ?
> >
>
> It's connected to nohz balancing and I didn't see how I could use that
> for detecting overload. Also, I don't think it ever can be larger than
> the sd weight and overload is based on the number of running tasks being
> greater than the number of available CPUs. Did I miss something obvious?

IIUC you try to estimate if there is a chance to find an idle cpu
before starting the loop and scanning the domain and abort early if
the possibility is low.

if nr_busy_cpus equals sd->span_weight it means that there is no
free cpu so there is no need to scan
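
Something along the lines of this minimal sketch, where llc_fully_busy()
is a made-up name and the assumption is that nr_busy_cpus is kept
accurate enough:

static inline bool llc_fully_busy(struct sched_domain *sd,
				  struct sched_domain_shared *sds)
{
	/* Every CPU in the LLC is busy so a scan cannot find an idle one */
	return sds && atomic_read(&sds->nr_busy_cpus) >= sd->span_weight;
}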

>
> --
> Mel Gorman
> SUSE Labs

2020-03-20 17:44:55

by Mel Gorman

Subject: Re: [PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary

On Fri, Mar 20, 2020 at 05:54:57PM +0100, Vincent Guittot wrote:
> On Fri, 20 Mar 2020 at 17:44, Mel Gorman <[email protected]> wrote:
> >
> > On Fri, Mar 20, 2020 at 04:48:39PM +0100, Vincent Guittot wrote:
> > > > ---
> > > > include/linux/sched/topology.h | 1 +
> > > > kernel/sched/fair.c | 65 +++++++++++++++++++++++++++++++++++++++---
> > > > kernel/sched/features.h | 3 ++
> > > > 3 files changed, 65 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > > > index af9319e4cfb9..76ec7a54f57b 100644
> > > > --- a/include/linux/sched/topology.h
> > > > +++ b/include/linux/sched/topology.h
> > > > @@ -66,6 +66,7 @@ struct sched_domain_shared {
> > > > atomic_t ref;
> > > > atomic_t nr_busy_cpus;
> > > > int has_idle_cores;
> > > > + int is_overloaded;
> > >
> > > Can't nr_busy_cpus compared to sd->span_weight give you similar status ?
> > >
> >
> > It's connected to nohz balancing and I didn't see how I could use that
> > for detecting overload. Also, I don't think it ever can be larger than
> > the sd weight and overload is based on the number of running tasks being
> > greater than the number of available CPUs. Did I miss something obvious?
>
> IIUC you try to estimate if there is a chance to find an idle cpu
> before starting the loop and scanning the domain and abort early if
> the possibility is low.
>
> if nr_busy_cpus equals sd->span_weight it means that there is no
> free cpu so there is no need to scan
>

Ok, I see what you are getting at but I worry there are multiple
problems there. First, the nr_busy_cpus is decremented only when a CPU
is entering idle with the tick stopped. If nohz is disabled then this
breaks, no? Secondly, a CPU can be idle but the tick not stopped if
__tick_nohz_idle_stop_tick knows there is an event in the near future,
so using nr_busy_cpus we potentially miss a sibling that was adequate
for running a task. Finally, the threshold for cutting off the search
entirely seems low. The patch marks a domain as overloaded if there are
twice as many running tasks as runqueues scanned. In that scenario, even
if tasks are rapidly switching between busy/idle, it's still unlikely
the task will go idle. When cutting off at just the fully-busy mark, we
could miss a CPU that is going idle, almost idle or is running SCHED_IDLE
tasks, which are acceptable target candidates for select_idle_sibling. I
think there are too many cases where nr_busy_cpus is problematic to
make it a good alternative.

--
Mel Gorman
SUSE Labs

2020-03-23 13:31:24

by Valentin Schneider

Subject: Re: [PATCH 1/4] sched/fair: Track efficiency of select_idle_sibling


Hi Mel,

On Fri, Mar 20 2020, Mel Gorman wrote:
> SIS Search: Number of calls to select_idle_sibling
>
> SIS Domain Search: Number of times the domain was searched because the
> fast path failed.
>
> SIS Scanned: Generally the number of runqueues scanned but the fast
> path counts as 1 regardless of the values for target, prev
> and recent.
>
> SIS Domain Scanned: Number of runqueues scanned during a search of the
> LLC domain.
>
> SIS Failures: Number of SIS calls that failed to find an idle CPU
>

Let me put my changelog pedant hat on; it would be nice to explicitly
separate the 'raw' stats (i.e. those that you are adding to sis()) from
the downstream ones.

AIUI the ones above here are the 'raw' stats (except "SIS Domain
Scanned", I'm not sure I get where this one comes from?), and the ones
below are the downstream, post-processed ones.

> SIS Search Efficiency: A ratio expressed as a percentage of runqueues
> scanned versus idle CPUs found. A 100% efficiency indicates that
> the target, prev or recent CPU of a task was idle at wakeup. The
> lower the efficiency, the more runqueues were scanned before an
> idle CPU was found.
>
> SIS Domain Search Efficiency: Similar, except only for the slower SIS
> path.
>
> SIS Fast Success Rate: Percentage of SIS that used target, prev or
> recent CPUs.
>
> SIS Success rate: Percentage of scans that found an idle CPU.
>
> Signed-off-by: Mel Gorman <[email protected]>

With the nits taken into account:

Reviewed-by: Valentin Schneider <[email protected]>

> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1dea8554ead0..9d32a81ece08 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6150,6 +6153,15 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> struct sched_domain *sd;
> int i, recent_used_cpu;
>
> + schedstat_inc(this_rq()->sis_search);
> +
> + /*
> > + * Checking prev, target and recent is treated as one scan. A
> + * perfect hit on one of those is considered 100% efficiency.
> + * Further scanning impairs efficiency.
> + */
> + schedstat_inc(this_rq()->sis_scanned);
> +

You may want to move that sis_scanned increment to below the 'symmetric'
label. Also, you should instrument select_idle_capacity() with
sis_scanned increments, if only for the sake of completeness.

One last thing: each of the new schedstat_inc() callsites uses this_rq();
IIRC because of the RELOC_HIDE() hiding underneath there's very little
chance of the compiler caching this. However, this depends on schedstat,
so I suppose that is fine.

> /*
> * For asymmetric CPU capacity systems, our domain of interest is
> * sd_asym_cpucapacity rather than sd_llc.

2020-03-23 13:31:40

by Valentin Schneider

Subject: Re: [PATCH 2/4] sched/fair: Track efficiency of task recent_used_cpu


On Fri, Mar 20 2020, Mel Gorman wrote:
> SIS Recent Used Hit: A recent CPU was eligible and used. Each hit is
> a domain search avoided.
>
> SIS Recent Used Miss: A recent CPU was eligible but unavailable. Each
> time this is hit, there was a penalty to the fast path before
> a domain search happened.
>
> SIS Recent Success Rate: A percentage of the number of hits versus
> the total attempts to use the recent CPU.
>
> SIS Recent Attempts: The total number of times the recent CPU was examined.
> A high number of Recent Attempts with a low Success Rate implies
> the fast path is being punished severely. This could have been
> presented as a weighting of hits and misses but calculating an
> appropriate weight for misses is problematic.
>

Ditto on the raw vs post-processed detail in the changelog, otherwise:

Reviewed-by: Valentin Schneider <[email protected]>

> Signed-off-by: Mel Gorman <[email protected]>

2020-03-23 13:32:08

by Valentin Schneider

Subject: Re: [PATCH 3/4] sched/fair: Clear SMT siblings after determining the core is not idle


On Fri, Mar 20 2020, Mel Gorman wrote:
> The clearing of SMT siblings from the SIS mask before checking for an idle
> core is a small but unnecessary cost. Defer the clearing of the siblings
> until the scan moves to the next potential target. The cost of this was
> not measured as it is borderline noise but it should be self-evident.
>
> Signed-off-by: Mel Gorman <[email protected]>

Reviewed-by: Valentin Schneider <[email protected]>

2020-03-23 13:58:15

by Mel Gorman

Subject: Re: [PATCH 1/4] sched/fair: Track efficiency of select_idle_sibling

On Mon, Mar 23, 2020 at 01:30:10PM +0000, Valentin Schneider wrote:
>
> Hi Mel,
>
> On Fri, Mar 20 2020, Mel Gorman wrote:
> > SIS Search: Number of calls to select_idle_sibling
> >
> > SIS Domain Search: Number of times the domain was searched because the
> > fast path failed.
> >
> > SIS Scanned: Generally the number of runqueues scanned but the fast
> > path counts as 1 regardless of the values for target, prev
> > and recent.
> >
> > SIS Domain Scanned: Number of runqueues scanned during a search of the
> > LLC domain.
> >
> > SIS Failures: Number of SIS calls that failed to find an idle CPU
> >
>
> Let me put my changelog pedant hat on; it would be nice to explicitly
> separate the 'raw' stats (i.e. those that you are adding to sis()) from
> the downstream ones.
>
> AIUI the ones above here are the 'raw' stats (except "SIS Domain
> Scanned", I'm not sure I get where this one comes from?), and the ones
> below are the downstream, post-processed ones.
>

I can fix that up.

> > SIS Search Efficiency: A ratio expressed as a percentage of runqueues
> > scanned versus idle CPUs found. A 100% efficiency indicates that
> > the target, prev or recent CPU of a task was idle at wakeup. The
> > lower the efficiency, the more runqueues were scanned before an
> > idle CPU was found.
> >
> > SIS Domain Search Efficiency: Similar, except only for the slower SIS
> > path.
> >
> > SIS Fast Success Rate: Percentage of SIS that used target, prev or
> > recent CPUs.
> >
> > SIS Success rate: Percentage of scans that found an idle CPU.
> >
> > Signed-off-by: Mel Gorman <[email protected]>
>
> With the nits taken into account:
>
> Reviewed-by: Valentin Schneider <[email protected]>
>
> > ---
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 1dea8554ead0..9d32a81ece08 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6150,6 +6153,15 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> > struct sched_domain *sd;
> > int i, recent_used_cpu;
> >
> > + schedstat_inc(this_rq()->sis_search);
> > +
> > + /*
> > > + * Checking prev, target and recent is treated as one scan. A
> > + * perfect hit on one of those is considered 100% efficiency.
> > + * Further scanning impairs efficiency.
> > + */
> > + schedstat_inc(this_rq()->sis_scanned);
> > +
>
> You may want to move that sis_scanned increment to below the 'symmetric'
> label. Also, you should instrument select_idle_capacity() with
> sis_scanned increments, if only for the sake of completeness.
>

Yes, that would make more sense. Instrumenting select_idle_capacity is
trivial so I'll fix that up too.

> One last thing: each of the new schedstat_inc() callsites uses this_rq();
> IIRC because of the RELOC_HIDE() hiding underneath there's very little
> chance of the compiler caching this. However, this depends on schedstat,
> so I suppose that is fine.
>

It's a deliberate choice so that when schedstat is disabled there is no
cost. While some schedstat sites look up the current runqueue, not all of
them do. This might be a little wasteful when schedstats are enabled but
at least it's consistent.
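
For reference, schedstat_inc() is guarded by a static branch in
kernel/sched/stats.h, roughly:

#ifdef CONFIG_SCHEDSTATS
#define schedstat_enabled()	static_branch_unlikely(&sched_schedstats)
#define schedstat_inc(var)	do { if (schedstat_enabled()) { var++; } } while (0)
#else
#define schedstat_inc(var)	do { } while (0)
#endif

so the this_rq() lookup only happens when the static branch is enabled.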

Thanks

--
Mel Gorman
SUSE Labs

2020-03-24 10:37:06

by Vincent Guittot

Subject: Re: [PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary

On Fri, 20 Mar 2020 at 18:43, Mel Gorman <[email protected]> wrote:
>
> On Fri, Mar 20, 2020 at 05:54:57PM +0100, Vincent Guittot wrote:
> > On Fri, 20 Mar 2020 at 17:44, Mel Gorman <[email protected]> wrote:
> > >
> > > On Fri, Mar 20, 2020 at 04:48:39PM +0100, Vincent Guittot wrote:
> > > > > ---
> > > > > include/linux/sched/topology.h | 1 +
> > > > > kernel/sched/fair.c | 65 +++++++++++++++++++++++++++++++++++++++---
> > > > > kernel/sched/features.h | 3 ++
> > > > > 3 files changed, 65 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > > > > index af9319e4cfb9..76ec7a54f57b 100644
> > > > > --- a/include/linux/sched/topology.h
> > > > > +++ b/include/linux/sched/topology.h
> > > > > @@ -66,6 +66,7 @@ struct sched_domain_shared {
> > > > > atomic_t ref;
> > > > > atomic_t nr_busy_cpus;
> > > > > int has_idle_cores;
> > > > > + int is_overloaded;
> > > >
> > > > Can't nr_busy_cpus compared to sd->span_weight give you similar status ?
> > > >
> > >
> > > It's connected to nohz balancing and I didn't see how I could use that
> > > for detecting overload. Also, I don't think it ever can be larger than
> > > the sd weight and overload is based on the number of running tasks being
> > > greater than the number of available CPUs. Did I miss something obvious?
> >
> > IIUC you try to estimate if there is a chance to find an idle cpu
> > before starting the loop and scanning the domain and abort early if
> > the possibility is low.
> >
> > if nr_busy_cpus equals sd->span_weight it means that there is no
> > free cpu so there is no need to scan
> >
>
> Ok, I see what you are getting at but I worry there are multiple
> problems there. First, the nr_busy_cpus is decremented only when a CPU
> is entering idle with the tick stopped. If nohz is disabled then this
> breaks, no? Secondly, a CPU can be idle but the tick not stopped if

But this can be changed if that makes the statistic useful

> __tick_nohz_idle_stop_tick knows there is an event in the near future,
> so using nr_busy_cpus we potentially miss a sibling that was adequate
> for running a task. Finally, the threshold for cutting off the search
> entirely seems low. The patch marks a domain as overloaded if there are
> twice as many running tasks as runqueues scanned. In that scenario, even
> if tasks are rapidly switching between busy/idle, it's still unlikely
> the task will go idle. When cutting off at just the fully-busy mark, we
> could miss a CPU that is going idle, almost idle or is running SCHED_IDLE
> tasks, which are acceptable target candidates for select_idle_sibling. I
> think there are too many cases where nr_busy_cpus is problematic to
> make it a good alternative.

I don't really like this patch because it adds yet another metric and
yet another feature which is set true by default. Also the current
proposal seems a bit fragile because it uses an arbitrary ratio of 2
on an arbitrary number of CPUs. This threshold probably works in your
case and your system but probably not for others and the threshold
really looks like a heuristic that works for you but without any real
meaning.

Then, the update is done at each and every task wake up and by all
CPUs in the LLC. It means that the same variable is updated
simultaneously by all CPUs: one CPU can set it and the next one might
clear it immediately because they haven't scanned the same CPUs. At
the end, 2 threads waking up simultaneously on different CPUs might
end up using 2 different policies without any other reason than a random
ordering.

I agree that the concept of detecting that an LLC domain is overloaded
can be useful to decide to skip searching for an idle cpu, but this
proposal does not seem really generic

Vincent

>
> --
> Mel Gorman
> SUSE Labs

2020-03-24 11:25:25

by Mel Gorman

Subject: Re: [PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary

On Tue, Mar 24, 2020 at 11:35:08AM +0100, Vincent Guittot wrote:
> > > > It's connected to nohz balancing and I didn't see how I could use that
> > > > for detecting overload. Also, I don't think it ever can be larger than
> > > > the sd weight and overload is based on the number of running tasks being
> > > > greater than the number of available CPUs. Did I miss something obvious?
> > >
> > > IIUC you try to estimate if there is a chance to find an idle cpu
> > > before starting the loop and scanning the domain and abort early if
> > > the possibility is low.
> > >
> > > if nr_busy_cpus equals sd->span_weight it means that there is no
> > > free cpu so there is no need to scan
> > >
> >
> > Ok, I see what you are getting at but I worry there are multiple
> > problems there. First, the nr_busy_cpus is decremented only when a CPU
> > is entering idle with the tick stopped. If nohz is disabled then this
> > breaks, no? Secondly, a CPU can be idle but the tick not stopped if
>
> But this can be changed if that makes the statistic useful
>

Hmm, tracking the number of running tasks in all cases would, I think,
end up being too costly because of the shared cache line.

> > __tick_nohz_idle_stop_tick knows there is an event in the near future,
> > so using nr_busy_cpus we potentially miss a sibling that was adequate
> > for running a task. Finally, the threshold for cutting off the search
> > entirely seems low. The patch marks a domain as overloaded if there are
> > twice as many running tasks as runqueues scanned. In that scenario, even
> > if tasks are rapidly switching between busy/idle, it's still unlikely
> > the task will go idle. When cutting off at just the fully-busy mark, we
> > could miss a CPU that is going idle, almost idle or is running SCHED_IDLE
> > tasks, which are acceptable target candidates for select_idle_sibling. I
> > think there are too many cases where nr_busy_cpus is problematic to
> > make it a good alternative.
>
> I don't really like this patch because it adds yet another metric and
> yet another feature which is set true by default. Also the current
> proposal seems a bit fragile because it uses an arbitrary ratio of 2
> on an arbitrary number of CPUs. This threshold probably works in your
> case and your system but probably not for others and the threshold
> really looks like a heuristic that works for you but without any real
> meaning.
>

I have to admit that is a possibility. The really interesting case for
other people is the transition from almost-fully-busy -> fully-busy ->
overloaded because the optimal amount to search changes at those points.
It's a minefield of hitting a regression somewhere whether you search
too much or too little.

It can also be somewhat problematic when there are multiple small LLC
caches per NUMA node. In that case a "full" search still covers a small
number of CPUs and a premature cutoff can be hurtful. This eventually
showed up when enough tests ran for long enough.

> Then, the update is done at each and every task wake up and by all
> CPUs in the LLC.

I avoided that by only doing the write when there is a state transition.
set_sd_overloaded() does not write if it's already marked overloaded.
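
Roughly this shape, as a sketch of the intent rather than the helper
exactly as posted:

static inline void
set_sd_overloaded(struct sched_domain_shared *sds, int val)
{
	/* Only dirty the shared cache line on a state transition */
	if (sds && READ_ONCE(sds->is_overloaded) != val)
		WRITE_ONCE(sds->is_overloaded, val);
}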

> It means that the same variable is updated
> simultaneously by all CPUs: one CPU can set it and the next one might
> clear it immediately because they haven't scanned the same CPUs.

That is possible.

> At
> the end, 2 threads waking up simultaneously on different CPUs might
> end up using 2 different policies without any other reason than a random
> ordering.
>

True, but I did not feel the race was extremely damaging because it
happens once per transition for up to sd_weight-1 unlucky tasks. The same
is true for the test for idle cores, where multiple tasks can search for
an idle core unnecessarily. Now, the search for an idle core could
actually cache one idle candidate it finds. It's a straightforward patch
but it hurts the fast path when the domain has spare capacity and,
similar to the overloaded state, we cannot detect spare capacity in
advance.
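
For illustration only, a hypothetical sketch of what that candidate
caching could look like inside the select_idle_core() scan loop; this
is not a posted patch:

	int idle_candidate = -1;

	for_each_cpu_wrap(core, cpus, target) {
		bool idle = true;

		for_each_cpu(cpu, cpu_smt_mask(core)) {
			if (!available_idle_cpu(cpu))
				idle = false;
			else if (idle_candidate == -1 &&
				 cpumask_test_cpu(cpu, p->cpus_ptr))
				idle_candidate = cpu;	/* remember a fallback */
		}

		if (idle)
			return core;

		cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
	}

	/* No idle core found; an idle sibling is better than nothing */
	return idle_candidate;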

> I agree that the concept of detecting that an LLC domain is overloaded
> can be useful to decide to skip searching for an idle cpu, but this
> proposal does not seem really generic
>

Ok, I can dump the patch and think of something else.

Thanks for the review!

--
Mel Gorman
SUSE Labs

2020-04-02 08:01:30

by Chen, Rong A

Subject: [sched/fair] 15e7470dfc: hackbench.throughput 11.2% improvement

Greeting,

FYI, we noticed an 11.2% improvement of hackbench.throughput due to commit:


commit: 15e7470dfc3d05f9339696ff21be8e49abea864c ("[PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary")
url: https://github.com/0day-ci/linux/commits/Mel-Gorman/Throttle-select_idle_sibling-when-a-target-domain-is-overloaded/20200321-034938


in testcase: hackbench
on test machine: 104 threads Skylake with 192G memory
with following parameters:

nr_threads: 100%
mode: threads
ipc: pipe
cpufreq_governor: performance
ucode: 0x2000065

test-description: Hackbench is both a benchmark and a stress test for the Linux kernel scheduler.
test-url: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sched/cfs-scheduler/hackbench.c





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/ipc/kconfig/mode/nr_threads/rootfs/tbox_group/testcase/ucode:
gcc-7/performance/pipe/x86_64-rhel-7.6/threads/100%/debian-x86_64-20191114.cgz/lkp-skl-fpga01/hackbench/0x2000065

commit:
38321bdba2 ("sched/fair: Clear SMT siblings after determining the core is not idle")
15e7470dfc ("sched/fair: Track possibly overloaded domains and abort a scan if necessary")

38321bdba2e094b2 15e7470dfc3d05f9339696ff21b
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
1:4 -25% :4 dmesg.WARNING:at#for_ip_interrupt_entry/0x
:4 25% 1:4 kmsg.ipmi_si_dmi-ipmi-si.#:IRQ_index#not_found
%stddev %change %stddev
\ | \
273676 +11.2% 304249 hackbench.throughput
627.01 -3.6% 604.63 hackbench.time.elapsed_time
627.01 -3.6% 604.63 hackbench.time.elapsed_time.max
2.823e+09 -15.4% 2.389e+09 hackbench.time.involuntary_context_switches
862689 ± 2% -24.8% 648824 ± 3% hackbench.time.minor_page_faults
49062 -6.9% 45673 hackbench.time.system_time
15113 +5.1% 15887 hackbench.time.user_time
5.224e+09 -11.1% 4.645e+09 hackbench.time.voluntary_context_switches
1.747e+09 +7.1% 1.872e+09 hackbench.workload
1500 ± 28% -16.5% 1252 ± 30% meminfo.Mlocked
1.56 +0.5 2.07 ± 2% mpstat.cpu.all.idle%
2.077e+08 ± 4% +97.3% 4.098e+08 ± 11% cpuidle.C1.time
34591570 +71.4% 59276157 ± 3% cpuidle.C1.usage
1.417e+08 +28.3% 1.818e+08 cpuidle.POLL.time
60876038 +15.6% 70382270 cpuidle.POLL.usage
84269509 +18.4% 99747407 numa-numastat.node0.local_node
84269605 +18.4% 99760865 numa-numastat.node0.numa_hit
89134191 +14.6% 1.021e+08 numa-numastat.node1.local_node
89167819 +14.6% 1.022e+08 numa-numastat.node1.numa_hit
918.75 ± 5% +14.2% 1049 ± 9% slabinfo.pool_workqueue.active_objs
918.75 ± 5% +14.8% 1054 ± 9% slabinfo.pool_workqueue.num_objs
4528 -10.8% 4040 ± 5% slabinfo.skbuff_head_cache.active_objs
4528 -10.8% 4040 ± 5% slabinfo.skbuff_head_cache.num_objs
74.75 -3.7% 72.00 vmstat.cpu.sy
23.00 +8.7% 25.00 vmstat.cpu.us
12946159 -8.7% 11823064 vmstat.system.cs
996012 -11.2% 884657 vmstat.system.in
373.75 ± 28% -16.1% 313.75 ± 30% proc-vmstat.nr_mlock
666544 ± 2% -32.9% 447123 ± 4% proc-vmstat.numa_hint_faults
575714 ± 3% -39.9% 345813 ± 4% proc-vmstat.numa_hint_faults_local
1.735e+08 +16.4% 2.019e+08 proc-vmstat.numa_hit
1.734e+08 +16.4% 2.019e+08 proc-vmstat.numa_local
76981 ± 7% +20.0% 92403 ± 7% proc-vmstat.numa_pages_migrated
733440 ± 2% -25.0% 550124 ± 6% proc-vmstat.numa_pte_updates
1.737e+08 +16.4% 2.022e+08 proc-vmstat.pgalloc_normal
2337874 -9.2% 2122624 proc-vmstat.pgfault
1.737e+08 +16.4% 2.022e+08 proc-vmstat.pgfree
76981 ± 7% +20.0% 92403 ± 7% proc-vmstat.pgmigrate_success
58825 ± 13% -45.5% 32052 ± 40% numa-vmstat.node0.nr_active_anon
58369 ± 12% -45.5% 31829 ± 40% numa-vmstat.node0.nr_anon_pages
14230 ± 7% -21.1% 11233 ± 14% numa-vmstat.node0.nr_slab_reclaimable
58825 ± 13% -45.5% 32052 ± 40% numa-vmstat.node0.nr_zone_active_anon
42715469 +18.3% 50549181 numa-vmstat.node0.numa_hit
42612066 +18.3% 50398765 numa-vmstat.node0.numa_local
34729 ± 21% +76.8% 61416 ± 19% numa-vmstat.node1.nr_active_anon
14918 ± 51% +179.9% 41748 ± 30% numa-vmstat.node1.nr_anon_pages
8181 ± 13% +37.1% 11217 ± 15% numa-vmstat.node1.nr_slab_reclaimable
34729 ± 21% +76.8% 61416 ± 19% numa-vmstat.node1.nr_zone_active_anon
45590871 +13.9% 51910403 numa-vmstat.node1.numa_hit
45494598 +14.0% 51861137 numa-vmstat.node1.numa_local
235466 ± 13% -45.5% 128292 ± 40% numa-meminfo.node0.Active
235339 ± 13% -45.6% 128136 ± 40% numa-meminfo.node0.Active(anon)
173424 ± 14% -51.7% 83692 ± 45% numa-meminfo.node0.AnonHugePages
233511 ± 12% -45.5% 127246 ± 40% numa-meminfo.node0.AnonPages
56927 ± 7% -21.1% 44927 ± 13% numa-meminfo.node0.KReclaimable
1298360 ± 4% -14.2% 1113777 ± 8% numa-meminfo.node0.MemUsed
56927 ± 7% -21.1% 44927 ± 13% numa-meminfo.node0.SReclaimable
210815 ± 6% -19.9% 168941 ± 18% numa-meminfo.node0.Slab
139001 ± 21% +76.7% 245567 ± 20% numa-meminfo.node1.Active
138924 ± 21% +76.7% 245519 ± 20% numa-meminfo.node1.Active(anon)
31549 ± 80% +278.9% 119529 ± 31% numa-meminfo.node1.AnonHugePages
59680 ± 51% +179.9% 167018 ± 30% numa-meminfo.node1.AnonPages
32728 ± 13% +37.1% 44871 ± 15% numa-meminfo.node1.KReclaimable
1069909 ± 4% +17.6% 1257928 ± 7% numa-meminfo.node1.MemUsed
32728 ± 13% +37.1% 44871 ± 15% numa-meminfo.node1.SReclaimable
79815 ± 43% +111.6% 168859 ± 14% sched_debug.cfs_rq:/.MIN_vruntime.avg
746616 ± 44% +71.8% 1282474 ± 11% sched_debug.cfs_rq:/.MIN_vruntime.stddev
79815 ± 43% +111.6% 168859 ± 14% sched_debug.cfs_rq:/.max_vruntime.avg
746616 ± 44% +71.8% 1282474 ± 11% sched_debug.cfs_rq:/.max_vruntime.stddev
0.36 ± 30% -68.1% 0.12 ± 64% sched_debug.cfs_rq:/.nr_running.min
0.10 ± 18% +45.4% 0.14 ± 16% sched_debug.cfs_rq:/.nr_running.stddev
9360 ± 3% -14.8% 7970 ± 5% sched_debug.cfs_rq:/.runnable_avg.avg
17159 ± 2% -11.1% 15252 ± 4% sched_debug.cfs_rq:/.runnable_avg.max
10.95 ± 29% -40.7% 6.49 ± 38% sched_debug.cfs_rq:/.util_est_enqueued.min
15687 ± 45% -63.7% 5690 ± 32% sched_debug.cpu.curr->pid.min
3102 ± 27% +40.8% 4368 ± 12% sched_debug.cpu.curr->pid.stddev
10.01 ± 4% -12.0% 8.80 ± 4% sched_debug.cpu.nr_running.avg
0.59 ± 17% -41.2% 0.35 ± 37% sched_debug.cpu.nr_running.min
37876581 -11.4% 33570613 ± 3% sched_debug.cpu.nr_switches.avg
35602169 -13.2% 30897790 ± 4% sched_debug.cpu.nr_switches.min
4156219 +100.0% 8313659 sched_debug.sysctl_sched.sysctl_sched_features
10.52 -20.3% 8.39 perf-stat.i.MPKI
2.32 ± 3% +0.9 3.17 perf-stat.i.cache-miss-rate%
31427239 ± 3% +8.6% 34117986 perf-stat.i.cache-misses
1.402e+09 -20.6% 1.113e+09 perf-stat.i.cache-references
12967384 -8.7% 11840723 perf-stat.i.context-switches
813380 -29.9% 570390 perf-stat.i.cpu-migrations
9187 ± 3% -9.2% 8338 perf-stat.i.cycles-between-cache-misses
0.15 +0.0 0.16 perf-stat.i.dTLB-load-miss-rate%
59231906 +10.5% 65470826 perf-stat.i.dTLB-load-misses
4.024e+10 +1.4% 4.078e+10 perf-stat.i.dTLB-loads
2.43e+10 +1.3% 2.461e+10 perf-stat.i.dTLB-stores
96.03 -1.4 94.65 perf-stat.i.iTLB-load-miss-rate%
58608410 +9.2% 64015009 ± 2% perf-stat.i.iTLB-load-misses
2265582 ± 3% +46.3% 3313848 ± 4% perf-stat.i.iTLB-loads
2366 ± 2% -7.0% 2201 perf-stat.i.instructions-per-iTLB-miss
3691 -5.7% 3480 perf-stat.i.minor-faults
73.23 -2.5 70.70 perf-stat.i.node-load-miss-rate%
5437363 ± 3% -7.3% 5042872 ± 2% perf-stat.i.node-load-misses
1661376 +5.9% 1760010 ± 2% perf-stat.i.node-loads
3842299 +10.0% 4227782 perf-stat.i.node-stores
3691 -5.7% 3480 perf-stat.i.page-faults
10.40 -20.9% 8.22 perf-stat.overall.MPKI
2.24 ± 3% +0.8 3.06 perf-stat.overall.cache-miss-rate%
8874 ± 3% -8.3% 8137 perf-stat.overall.cycles-between-cache-misses
0.15 +0.0 0.16 perf-stat.overall.dTLB-load-miss-rate%
96.34 -1.2 95.14 perf-stat.overall.iTLB-load-miss-rate%
2295 -8.2% 2107 perf-stat.overall.instructions-per-iTLB-miss
76.36 -2.5 73.89 perf-stat.overall.node-load-miss-rate%
48328 -9.7% 43622 perf-stat.overall.path-length
31303400 ± 3% +8.6% 33992823 perf-stat.ps.cache-misses
1.398e+09 -20.7% 1.109e+09 perf-stat.ps.cache-references
12959378 -8.7% 11834104 perf-stat.ps.context-switches
805274 -30.1% 563237 perf-stat.ps.cpu-migrations
59231763 +10.5% 65460523 perf-stat.ps.dTLB-load-misses
4.023e+10 +1.3% 4.077e+10 perf-stat.ps.dTLB-loads
2.43e+10 +1.3% 2.461e+10 perf-stat.ps.dTLB-stores
58626745 +9.2% 64026396 ± 2% perf-stat.ps.iTLB-load-misses
2227130 ± 4% +47.0% 3273576 ± 4% perf-stat.ps.iTLB-loads
3651 -6.0% 3432 perf-stat.ps.minor-faults
5369350 ± 3% -7.2% 4980780 ± 2% perf-stat.ps.node-load-misses
1661038 +6.0% 1760000 ± 2% perf-stat.ps.node-loads
3850090 +9.9% 4232834 perf-stat.ps.node-stores
3651 -6.0% 3432 perf-stat.ps.page-faults
8.444e+13 -3.3% 8.166e+13 perf-stat.total.instructions
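
The perf-stat.* rows above map onto ordinary perf events: the .i rows
appear to be interval samples taken during the run, .overall rows are
whole-run aggregates, and .ps rows are per-second rates. A minimal
sketch of gathering a comparable subset by hand, assuming a stock perf
binary (the event names below are generic perf events, not
LKP-specific):

  perf stat -a \
    -e cycles,instructions,cache-references,cache-misses \
    -e context-switches,cpu-migrations,page-faults \
    -e dTLB-load-misses,iTLB-load-misses \
    -- sleep 10
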
29801 ± 2% +13.3% 33780 ± 3% softirqs.CPU0.SCHED
24949 ± 5% +15.2% 28741 ± 2% softirqs.CPU1.SCHED
22867 ± 2% +21.7% 27840 ± 2% softirqs.CPU10.SCHED
27954 +10.9% 30992 ± 2% softirqs.CPU100.SCHED
22972 +20.0% 27565 ± 2% softirqs.CPU11.SCHED
22523 ± 2% +21.8% 27433 softirqs.CPU12.SCHED
23390 ± 2% +14.9% 26880 softirqs.CPU13.SCHED
23224 +20.2% 27915 softirqs.CPU14.SCHED
23261 ± 3% +20.2% 27951 ± 2% softirqs.CPU15.SCHED
22713 +20.9% 27466 softirqs.CPU16.SCHED
22639 +20.1% 27189 softirqs.CPU17.SCHED
22386 +20.1% 26882 ± 2% softirqs.CPU18.SCHED
22737 +20.0% 27276 softirqs.CPU19.SCHED
23520 ± 3% +20.2% 28278 softirqs.CPU2.SCHED
22922 ± 2% +18.2% 27104 softirqs.CPU20.SCHED
23363 +19.5% 27929 softirqs.CPU21.SCHED
23496 +18.5% 27836 softirqs.CPU22.SCHED
23528 +18.2% 27800 softirqs.CPU23.SCHED
23138 +19.2% 27582 softirqs.CPU24.SCHED
23168 ± 3% +20.4% 27898 softirqs.CPU25.SCHED
27789 ± 5% +14.1% 31717 ± 6% softirqs.CPU26.SCHED
28112 ± 3% +11.3% 31295 ± 4% softirqs.CPU29.SCHED
23800 ± 2% +18.1% 28114 softirqs.CPU3.SCHED
28084 ± 3% +11.7% 31359 ± 2% softirqs.CPU30.SCHED
28181 +9.7% 30928 ± 2% softirqs.CPU33.SCHED
28344 ± 2% +10.1% 31205 ± 2% softirqs.CPU34.SCHED
28080 ± 2% +10.2% 30943 ± 3% softirqs.CPU35.SCHED
27932 +9.8% 30682 ± 2% softirqs.CPU36.SCHED
27619 ± 3% +8.9% 30080 ± 3% softirqs.CPU37.SCHED
23274 +21.6% 28303 softirqs.CPU4.SCHED
23106 ± 2% +19.6% 27626 softirqs.CPU5.SCHED
23087 +20.4% 27800 softirqs.CPU52.SCHED
23440 ± 2% +19.5% 28017 softirqs.CPU53.SCHED
23342 +20.0% 28016 softirqs.CPU54.SCHED
23873 +17.2% 27986 softirqs.CPU55.SCHED
23333 +20.0% 27988 ± 2% softirqs.CPU56.SCHED
22864 ± 2% +22.1% 27921 ± 2% softirqs.CPU57.SCHED
22271 +22.4% 27250 softirqs.CPU58.SCHED
22998 +20.6% 27738 ± 2% softirqs.CPU59.SCHED
22402 +21.3% 27175 softirqs.CPU6.SCHED
23772 +17.6% 27948 ± 2% softirqs.CPU60.SCHED
23622 ± 2% +19.3% 28192 ± 2% softirqs.CPU61.SCHED
22977 +20.7% 27732 softirqs.CPU62.SCHED
22918 ± 2% +20.2% 27556 ± 2% softirqs.CPU63.SCHED
23058 +20.0% 27666 softirqs.CPU64.SCHED
23381 +17.3% 27433 ± 2% softirqs.CPU65.SCHED
23368 +19.5% 27917 softirqs.CPU66.SCHED
23433 +19.7% 28043 ± 2% softirqs.CPU67.SCHED
23179 +18.5% 27456 softirqs.CPU68.SCHED
23121 +17.9% 27263 softirqs.CPU69.SCHED
23342 +16.9% 27277 softirqs.CPU7.SCHED
22942 +18.4% 27173 softirqs.CPU70.SCHED
22866 +20.5% 27545 softirqs.CPU71.SCHED
23203 ± 2% +17.7% 27321 softirqs.CPU72.SCHED
23678 +16.4% 27556 softirqs.CPU73.SCHED
23867 +16.2% 27738 ± 2% softirqs.CPU74.SCHED
23760 +18.1% 28050 softirqs.CPU75.SCHED
23426 +18.0% 27642 ± 2% softirqs.CPU76.SCHED
23248 ± 3% +20.6% 28044 softirqs.CPU77.SCHED
23421 +19.7% 28032 softirqs.CPU8.SCHED
23822 +16.8% 27813 ± 2% softirqs.CPU9.SCHED
27090 ± 2% +11.7% 30247 ± 3% softirqs.CPU98.SCHED
2666040 +13.7% 3030860 softirqs.SCHED
15.91 ± 2% -4.3 11.64 ± 4% perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_write.new_sync_write.vfs_write.ksys_write
15.39 ± 2% -4.1 11.25 ± 4% perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write.vfs_write
15.13 ± 2% -4.1 11.03 ± 4% perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write
14.73 ± 2% -4.0 10.71 ± 4% perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write
25.73 -3.2 22.48 perf-profile.calltrace.cycles-pp.pipe_write.new_sync_write.vfs_write.ksys_write.do_syscall_64
26.64 -3.1 23.55 perf-profile.calltrace.cycles-pp.new_sync_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
78.57 -2.7 75.92 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
29.46 -2.6 26.83 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
79.47 -2.6 76.91 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
31.96 -2.4 29.54 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
10.31 ± 2% -2.2 8.10 ± 4% perf-profile.calltrace.cycles-pp.schedule.pipe_read.new_sync_read.vfs_read.ksys_read
10.11 ± 2% -2.2 7.92 ± 4% perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.pipe_read.new_sync_read.vfs_read
21.79 -1.7 20.09 perf-profile.calltrace.cycles-pp.pipe_read.new_sync_read.vfs_read.ksys_read.do_syscall_64
22.74 -1.6 21.12 perf-profile.calltrace.cycles-pp.new_sync_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.77 ± 2% -1.5 2.28 ± 4% perf-profile.calltrace.cycles-pp.select_task_rq_fair.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
2.63 ± 2% -1.3 1.38 ± 6% perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.try_to_wake_up.autoremove_wake_function.__wake_up_common
5.01 ± 3% -1.3 3.76 ± 5% perf-profile.calltrace.cycles-pp.dequeue_task_fair.__sched_text_start.schedule.pipe_read.new_sync_read
26.69 -1.1 25.57 perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.92 ± 3% -1.1 4.83 ± 6% perf-profile.calltrace.cycles-pp.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
6.07 ± 3% -1.0 5.08 ± 4% perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
6.03 ± 3% -1.0 5.05 ± 4% perf-profile.calltrace.cycles-pp.activate_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
28.78 -1.0 27.81 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.61 -0.5 2.06 ± 3% perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up
2.39 ± 4% -0.5 1.87 ± 3% perf-profile.calltrace.cycles-pp.schedule.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.50 ± 3% -0.5 1.98 ± 4% perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.28 ± 4% -0.5 1.78 ± 4% perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.17 ± 7% -0.5 0.70 ± 12% perf-profile.calltrace.cycles-pp.update_cfs_group.dequeue_task_fair.__sched_text_start.schedule.pipe_read
1.22 ± 8% -0.5 0.75 ± 12% perf-profile.calltrace.cycles-pp.update_cfs_group.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up
2.27 ± 2% -0.4 1.82 ± 3% perf-profile.calltrace.cycles-pp._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
1.74 ± 2% -0.3 1.40 ± 4% perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__sched_text_start.schedule.pipe_read
1.90 ± 2% -0.3 1.56 ± 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common
1.02 ± 2% -0.3 0.74 ± 4% perf-profile.calltrace.cycles-pp._raw_spin_lock.__sched_text_start.schedule.pipe_read.new_sync_read
1.45 ± 2% -0.3 1.19 ± 2% perf-profile.calltrace.cycles-pp.switch_fpu_return.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.50 ± 2% -0.2 1.25 ± 3% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__sched_text_start.schedule.pipe_read.new_sync_read
1.33 -0.2 1.09 ± 3% perf-profile.calltrace.cycles-pp.prepare_to_wait_event.pipe_read.new_sync_read.vfs_read.ksys_read
1.18 ± 3% -0.2 0.94 ± 4% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__sched_text_start.schedule.exit_to_usermode_loop.do_syscall_64
0.84 ± 3% -0.2 0.63 ± 4% perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate
0.86 ± 2% -0.2 0.67 ± 5% perf-profile.calltrace.cycles-pp.reweight_entity.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up
0.76 ± 2% -0.2 0.59 ± 5% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__sched_text_start.schedule.pipe_read
0.80 ± 2% -0.2 0.62 ± 2% perf-profile.calltrace.cycles-pp.reweight_entity.dequeue_task_fair.__sched_text_start.schedule.pipe_read
0.92 -0.2 0.75 ± 2% perf-profile.calltrace.cycles-pp.ttwu_do_wakeup.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
0.73 ± 4% -0.2 0.57 ± 4% perf-profile.calltrace.cycles-pp.update_load_avg.dequeue_entity.dequeue_task_fair.__sched_text_start.schedule
0.83 -0.1 0.68 ± 2% perf-profile.calltrace.cycles-pp.check_preempt_curr.ttwu_do_wakeup.try_to_wake_up.autoremove_wake_function.__wake_up_common
0.88 ± 2% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.__switch_to_asm
0.91 -0.1 0.79 ± 2% perf-profile.calltrace.cycles-pp.__switch_to
0.72 ± 2% -0.1 0.61 ± 2% perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__sched_text_start.schedule.pipe_read
0.79 ± 2% -0.1 0.68 ± 2% perf-profile.calltrace.cycles-pp.native_write_msr
0.52 +0.0 0.56 ± 3% perf-profile.calltrace.cycles-pp.current_time.file_update_time.pipe_write.new_sync_write.vfs_write
1.63 ± 2% +0.1 1.69 perf-profile.calltrace.cycles-pp.mutex_lock.pipe_write.new_sync_write.vfs_write.ksys_write
1.35 +0.1 1.42 perf-profile.calltrace.cycles-pp.mutex_lock.pipe_read.new_sync_read.vfs_read.ksys_read
1.17 ± 3% +0.1 1.26 perf-profile.calltrace.cycles-pp.mutex_unlock.pipe_write.new_sync_write.vfs_write.ksys_write
0.97 +0.1 1.07 ± 2% perf-profile.calltrace.cycles-pp.file_update_time.pipe_write.new_sync_write.vfs_write.ksys_write
1.51 +0.1 1.61 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout.copy_page_to_iter.pipe_read.new_sync_read
1.49 ± 2% +0.1 1.59 ± 2% perf-profile.calltrace.cycles-pp.__fget_files.__fget_light.__fdget_pos.ksys_write.do_syscall_64
1.22 +0.1 1.32 ± 2% perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.92 +0.1 1.02 ± 2% perf-profile.calltrace.cycles-pp.__fget_files.__fget_light.__fdget_pos.ksys_read.do_syscall_64
1.24 +0.1 1.34 ± 2% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.71 ± 2% +0.1 1.82 ± 2% perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.74 ± 2% +0.1 1.85 ± 3% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.03 +0.1 1.14 perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_read.ksys_read.do_syscall_64
0.71 +0.1 0.84 ± 3% perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.pipe_read.new_sync_read.vfs_read
1.68 +0.1 1.82 perf-profile.calltrace.cycles-pp.copyout.copy_page_to_iter.pipe_read.new_sync_read.vfs_read
0.98 +0.1 1.12 ± 2% perf-profile.calltrace.cycles-pp.touch_atime.pipe_read.new_sync_read.vfs_read.ksys_read
1.11 ± 2% +0.1 1.25 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyin.copy_page_from_iter.pipe_write.new_sync_write
1.04 +0.1 1.19 perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_write.ksys_write.do_syscall_64
1.27 ± 2% +0.2 1.44 perf-profile.calltrace.cycles-pp.copyin.copy_page_from_iter.pipe_write.new_sync_write.vfs_write
0.96 +0.2 1.14 ± 2% perf-profile.calltrace.cycles-pp.fsnotify.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.73 ± 2% +0.2 0.92 ± 2% perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.pipe_write.new_sync_write.vfs_write
0.98 ± 3% +0.2 1.18 ± 2% perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_read
0.76 ± 2% +0.2 0.96 ± 2% perf-profile.calltrace.cycles-pp.schedule.pipe_write.new_sync_write.vfs_write.ksys_write
1.01 ± 3% +0.2 1.21 perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_read.new_sync_read
1.35 +0.2 1.56 perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.02 ± 2% +0.2 1.24 ± 2% perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_read.new_sync_read.vfs_read
1.10 ± 2% +0.2 1.33 ± 2% perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_read.new_sync_read.vfs_read.ksys_read
1.79 +0.3 2.06 perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.65 +0.3 2.96 perf-profile.calltrace.cycles-pp.copy_page_from_iter.pipe_write.new_sync_write.vfs_write.ksys_write
3.18 +0.4 3.54 perf-profile.calltrace.cycles-pp.copy_page_to_iter.pipe_read.new_sync_read.vfs_read.ksys_read
0.87 ± 5% +0.5 1.40 ± 2% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
0.87 ± 5% +0.5 1.39 ± 2% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
0.87 ± 5% +0.5 1.40 ± 2% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
0.00 +0.5 0.53 ± 2% perf-profile.calltrace.cycles-pp.__might_fault.copy_page_to_iter.pipe_read.new_sync_read.vfs_read
0.00 +0.5 0.53 ± 2% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
0.88 ± 5% +0.5 1.41 ± 2% perf-profile.calltrace.cycles-pp.secondary_startup_64
0.00 +0.9 0.87 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
0.00 +0.9 0.87 perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
7.36 +1.0 8.32 ± 2% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
8.02 +1.3 9.29 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
17.02 ± 2% -4.0 12.98 ± 3% perf-profile.children.cycles-pp.__wake_up_common_lock
16.42 ± 2% -3.9 12.49 ± 3% perf-profile.children.cycles-pp.__wake_up_common
16.14 ± 2% -3.9 12.26 ± 3% perf-profile.children.cycles-pp.autoremove_wake_function
15.85 ± 2% -3.9 11.99 ± 3% perf-profile.children.cycles-pp.try_to_wake_up
25.80 -3.2 22.56 perf-profile.children.cycles-pp.pipe_write
26.69 -3.1 23.59 perf-profile.children.cycles-pp.new_sync_write
78.69 -2.6 76.04 perf-profile.children.cycles-pp.do_syscall_64
29.50 -2.6 26.88 perf-profile.children.cycles-pp.vfs_write
79.52 -2.6 76.96 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
13.54 ± 2% -2.5 11.00 ± 3% perf-profile.children.cycles-pp.schedule
13.76 ± 2% -2.5 11.29 ± 3% perf-profile.children.cycles-pp.__sched_text_start
31.98 -2.4 29.58 perf-profile.children.cycles-pp.ksys_write
21.89 -1.7 20.20 perf-profile.children.cycles-pp.pipe_read
22.80 -1.6 21.18 perf-profile.children.cycles-pp.new_sync_read
6.55 ± 2% -1.4 5.11 ± 4% perf-profile.children.cycles-pp.ttwu_do_activate
6.52 ± 2% -1.4 5.08 ± 4% perf-profile.children.cycles-pp.activate_task
4.06 -1.4 2.62 ± 4% perf-profile.children.cycles-pp.select_task_rq_fair
6.39 ± 2% -1.4 4.99 ± 4% perf-profile.children.cycles-pp.enqueue_task_fair
2.90 -1.2 1.66 ± 4% perf-profile.children.cycles-pp.select_idle_sibling
5.41 ± 3% -1.2 4.23 ± 4% perf-profile.children.cycles-pp.dequeue_task_fair
26.74 -1.1 25.62 perf-profile.children.cycles-pp.vfs_read
28.82 -1.0 27.85 perf-profile.children.cycles-pp.ksys_read
2.79 ± 7% -0.9 1.85 ± 10% perf-profile.children.cycles-pp.update_cfs_group
3.88 -0.8 3.06 ± 4% perf-profile.children.cycles-pp._raw_spin_lock
3.88 ± 2% -0.7 3.17 ± 3% perf-profile.children.cycles-pp.update_load_avg
1.05 ± 2% -0.6 0.47 ± 6% perf-profile.children.cycles-pp.available_idle_cpu
3.21 -0.6 2.66 ± 4% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
2.56 ± 3% -0.5 2.04 ± 4% perf-profile.children.cycles-pp.exit_to_usermode_loop
2.83 -0.5 2.32 ± 2% perf-profile.children.cycles-pp.enqueue_entity
2.71 -0.5 2.24 ± 3% perf-profile.children.cycles-pp.update_curr
3.10 ± 2% -0.4 2.67 ± 3% perf-profile.children.cycles-pp.pick_next_task_fair
1.84 ± 2% -0.3 1.50 ± 3% perf-profile.children.cycles-pp.reweight_entity
1.89 ± 2% -0.3 1.58 ± 4% perf-profile.children.cycles-pp.dequeue_entity
0.35 ± 2% -0.3 0.09 ± 9% perf-profile.children.cycles-pp.cpumask_next_wrap
1.46 ± 2% -0.3 1.20 ± 2% perf-profile.children.cycles-pp.switch_fpu_return
1.29 ± 2% -0.2 1.08 ± 3% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
1.48 -0.2 1.27 ± 3% perf-profile.children.cycles-pp.prepare_to_wait_event
1.49 -0.2 1.29 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.68 ± 2% -0.2 0.47 ± 2% perf-profile.children.cycles-pp.account_entity_dequeue
1.33 -0.2 1.16 ± 3% perf-profile.children.cycles-pp.__switch_to
0.92 -0.2 0.75 ± 4% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.96 ± 3% -0.2 0.79 ± 3% perf-profile.children.cycles-pp.update_rq_clock
0.33 -0.2 0.18 ± 4% perf-profile.children.cycles-pp._find_next_bit
0.64 ± 3% -0.2 0.48 ± 2% perf-profile.children.cycles-pp.___perf_sw_event
0.98 -0.1 0.84 ± 2% perf-profile.children.cycles-pp.ttwu_do_wakeup
1.15 -0.1 1.02 ± 2% perf-profile.children.cycles-pp.set_next_entity
0.90 -0.1 0.76 perf-profile.children.cycles-pp.check_preempt_curr
0.97 ± 2% -0.1 0.85 perf-profile.children.cycles-pp.__update_load_avg_se
0.26 ± 8% -0.1 0.13 ± 17% perf-profile.children.cycles-pp.set_task_cpu
0.46 ± 3% -0.1 0.34 ± 3% perf-profile.children.cycles-pp.sched_clock_cpu
0.90 ± 3% -0.1 0.78 ± 2% perf-profile.children.cycles-pp.__switch_to_asm
0.60 ± 3% -0.1 0.48 ± 2% perf-profile.children.cycles-pp.pick_next_entity
0.29 ± 2% -0.1 0.18 ± 6% perf-profile.children.cycles-pp.__mutex_unlock_slowpath
0.42 ± 3% -0.1 0.31 ± 3% perf-profile.children.cycles-pp.sched_clock
0.86 ± 2% -0.1 0.76 ± 2% perf-profile.children.cycles-pp.native_write_msr
0.67 -0.1 0.57 ± 2% perf-profile.children.cycles-pp.check_preempt_wakeup
0.40 ± 2% -0.1 0.30 ± 4% perf-profile.children.cycles-pp.native_sched_clock
0.53 -0.1 0.42 ± 3% perf-profile.children.cycles-pp.__enqueue_entity
0.49 ± 2% -0.1 0.41 ± 3% perf-profile.children.cycles-pp.update_min_vruntime
0.49 ± 3% -0.1 0.41 ± 4% perf-profile.children.cycles-pp.__list_del_entry_valid
0.55 ± 2% -0.1 0.48 ± 4% perf-profile.children.cycles-pp.put_prev_entity
0.58 ± 2% -0.1 0.51 ± 2% perf-profile.children.cycles-pp.__calc_delta
0.15 ± 4% -0.1 0.08 ± 13% perf-profile.children.cycles-pp.migrate_task_rq_fair
0.33 ± 5% -0.1 0.27 ± 8% perf-profile.children.cycles-pp.cpuacct_charge
0.16 ± 5% -0.0 0.11 ± 9% perf-profile.children.cycles-pp.cpus_share_cache
0.38 ± 3% -0.0 0.33 ± 3% perf-profile.children.cycles-pp.account_entity_enqueue
0.14 ± 5% -0.0 0.09 ± 11% perf-profile.children.cycles-pp.wake_up_q
0.24 ± 3% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.clear_buddies
0.30 ± 3% -0.0 0.25 ± 4% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.16 ± 2% -0.0 0.12 ± 7% perf-profile.children.cycles-pp.resched_curr
0.18 ± 3% -0.0 0.14 ± 5% perf-profile.children.cycles-pp.update_cfs_rq_h_load
0.22 ± 4% -0.0 0.19 ± 6% perf-profile.children.cycles-pp.set_next_buddy
0.21 -0.0 0.18 ± 4% perf-profile.children.cycles-pp.rb_insert_color
0.30 ± 2% -0.0 0.28 ± 3% perf-profile.children.cycles-pp.anon_pipe_buf_release
0.20 ± 2% -0.0 0.17 ± 4% perf-profile.children.cycles-pp.cpumask_next
0.13 ± 3% -0.0 0.10 ± 4% perf-profile.children.cycles-pp.wakeup_preempt_entity
0.08 -0.0 0.06 ± 7% perf-profile.children.cycles-pp.get_xsave_addr
0.17 ± 4% -0.0 0.15 ± 3% perf-profile.children.cycles-pp.finish_wait
0.16 ± 2% -0.0 0.14 ± 3% perf-profile.children.cycles-pp.__list_add_valid
0.12 ± 4% -0.0 0.11 ± 3% perf-profile.children.cycles-pp.rb_next
0.07 -0.0 0.06 perf-profile.children.cycles-pp.perf_swevent_event
0.06 -0.0 0.05 perf-profile.children.cycles-pp.kill_fasync
0.37 +0.0 0.40 ± 2% perf-profile.children.cycles-pp.__x64_sys_read
0.15 ± 5% +0.0 0.18 ± 3% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
0.18 ± 2% +0.0 0.21 ± 2% perf-profile.children.cycles-pp.iov_iter_init
0.06 ± 7% +0.0 0.09 ± 4% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.30 ± 2% +0.0 0.33 ± 5% perf-profile.children.cycles-pp.update_process_times
0.24 +0.0 0.27 ± 6% perf-profile.children.cycles-pp.schedule_idle
0.20 ± 2% +0.0 0.24 ± 3% perf-profile.children.cycles-pp.fpregs_assert_state_consistent
0.09 ± 7% +0.0 0.13 ± 5% perf-profile.children.cycles-pp.menu_select
0.54 +0.0 0.58 ± 2% perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
0.03 ±100% +0.0 0.07 ± 6% perf-profile.children.cycles-pp.tick_nohz_next_event
0.18 ± 2% +0.0 0.23 ± 6% perf-profile.children.cycles-pp.__sb_start_write
0.34 ± 3% +0.0 0.39 perf-profile.children.cycles-pp.generic_pipe_buf_confirm
0.01 ±173% +0.0 0.06 ± 7% perf-profile.children.cycles-pp.get_next_timer_interrupt
0.24 +0.0 0.28 ± 3% perf-profile.children.cycles-pp.__sb_end_write
0.39 ± 2% +0.0 0.43 perf-profile.children.cycles-pp.aa_file_perm
0.26 +0.0 0.31 ± 2% perf-profile.children.cycles-pp.rcu_all_qs
1.01 +0.1 1.06 ± 2% perf-profile.children.cycles-pp.__might_sleep
0.35 ± 3% +0.1 0.41 perf-profile.children.cycles-pp.__x64_sys_write
0.78 +0.1 0.84 perf-profile.children.cycles-pp.fput_many
0.75 +0.1 0.82 perf-profile.children.cycles-pp._cond_resched
0.24 ± 7% +0.1 0.32 ± 5% perf-profile.children.cycles-pp.poll_idle
0.44 +0.1 0.53 ± 2% perf-profile.children.cycles-pp.apparmor_file_permission
0.99 +0.1 1.09 ± 2% perf-profile.children.cycles-pp.file_update_time
1.64 +0.1 1.75 perf-profile.children.cycles-pp.___might_sleep
1.01 +0.1 1.12 ± 2% perf-profile.children.cycles-pp.current_time
1.64 ± 2% +0.1 1.75 perf-profile.children.cycles-pp.mutex_unlock
0.72 +0.1 0.85 ± 3% perf-profile.children.cycles-pp.atime_needs_update
3.06 +0.1 3.19 perf-profile.children.cycles-pp.mutex_lock
1.69 +0.1 1.83 perf-profile.children.cycles-pp.copyout
0.83 +0.1 0.97 perf-profile.children.cycles-pp.__might_fault
0.99 +0.1 1.13 ± 2% perf-profile.children.cycles-pp.touch_atime
1.29 ± 2% +0.2 1.46 perf-profile.children.cycles-pp.copyin
1.02 ± 3% +0.2 1.20 ± 4% perf-profile.children.cycles-pp.__fsnotify_parent
2.44 +0.2 2.64 ± 2% perf-profile.children.cycles-pp.__fget_files
2.96 +0.2 3.16 ± 2% perf-profile.children.cycles-pp.__fget_light
1.41 ± 2% +0.2 1.62 ± 3% perf-profile.children.cycles-pp.fsnotify
3.03 +0.2 3.25 ± 2% perf-profile.children.cycles-pp.__fdget_pos
2.08 +0.3 2.34 perf-profile.children.cycles-pp.common_file_perm
2.82 +0.3 3.10 perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
2.75 +0.3 3.05 perf-profile.children.cycles-pp.copy_page_from_iter
0.20 ± 7% +0.3 0.54 ± 2% perf-profile.children.cycles-pp.intel_idle
3.23 +0.4 3.60 perf-profile.children.cycles-pp.copy_page_to_iter
0.45 ± 7% +0.4 0.88 perf-profile.children.cycles-pp.cpuidle_enter_state
0.45 ± 7% +0.4 0.88 perf-profile.children.cycles-pp.cpuidle_enter
3.17 +0.5 3.67 perf-profile.children.cycles-pp.security_file_permission
0.87 ± 5% +0.5 1.40 ± 2% perf-profile.children.cycles-pp.start_secondary
0.88 ± 5% +0.5 1.41 ± 2% perf-profile.children.cycles-pp.do_idle
0.88 ± 5% +0.5 1.41 ± 2% perf-profile.children.cycles-pp.secondary_startup_64
0.88 ± 5% +0.5 1.41 ± 2% perf-profile.children.cycles-pp.cpu_startup_entry
8.30 +1.1 9.35 ± 2% perf-profile.children.cycles-pp.syscall_return_via_sysret
8.04 +1.3 9.30 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64
2.79 ± 7% -0.9 1.84 ± 10% perf-profile.self.cycles-pp.update_cfs_group
1.03 ± 2% -0.6 0.47 ± 6% perf-profile.self.cycles-pp.available_idle_cpu
3.21 -0.5 2.66 ± 4% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
1.59 ± 2% -0.3 1.25 ± 5% perf-profile.self.cycles-pp.update_load_avg
0.85 ± 2% -0.3 0.58 ± 3% perf-profile.self.cycles-pp._raw_spin_lock
1.45 ± 2% -0.3 1.19 ± 2% perf-profile.self.cycles-pp.switch_fpu_return
1.34 ± 2% -0.3 1.08 ± 2% perf-profile.self.cycles-pp.update_curr
1.66 ± 2% -0.2 1.43 ± 2% perf-profile.self.cycles-pp.__sched_text_start
1.01 -0.2 0.80 ± 4% perf-profile.self.cycles-pp.select_idle_sibling
1.26 ± 3% -0.2 1.05 ± 3% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
1.29 -0.2 1.09 ± 2% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.55 -0.2 0.37 ± 2% perf-profile.self.cycles-pp.account_entity_dequeue
0.97 -0.2 0.82 ± 3% perf-profile.self.cycles-pp.select_task_rq_fair
1.21 -0.2 1.06 ± 3% perf-profile.self.cycles-pp.__switch_to
0.33 -0.1 0.18 ± 2% perf-profile.self.cycles-pp._find_next_bit
0.63 ± 2% -0.1 0.49 ± 3% perf-profile.self.cycles-pp.update_rq_clock
0.56 ± 3% -0.1 0.42 ± 2% perf-profile.self.cycles-pp.___perf_sw_event
0.95 ± 2% -0.1 0.83 perf-profile.self.cycles-pp.__update_load_avg_se
0.89 ± 2% -0.1 0.77 ± 2% perf-profile.self.cycles-pp.__switch_to_asm
0.17 -0.1 0.05 ± 8% perf-profile.self.cycles-pp.cpumask_next_wrap
0.70 -0.1 0.59 ± 4% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.60 ± 2% -0.1 0.50 ± 2% perf-profile.self.cycles-pp.enqueue_task_fair
0.40 ± 2% -0.1 0.29 ± 4% perf-profile.self.cycles-pp.native_sched_clock
0.86 ± 2% -0.1 0.75 ± 2% perf-profile.self.cycles-pp.native_write_msr
0.52 -0.1 0.41 ± 4% perf-profile.self.cycles-pp.__enqueue_entity
0.63 -0.1 0.53 ± 2% perf-profile.self.cycles-pp.prepare_to_wait_event
0.49 ± 2% -0.1 0.40 ± 4% perf-profile.self.cycles-pp.__list_del_entry_valid
0.48 ± 2% -0.1 0.40 ± 3% perf-profile.self.cycles-pp.update_min_vruntime
0.10 ± 14% -0.1 0.03 ±100% perf-profile.self.cycles-pp.set_task_cpu
0.57 ± 2% -0.1 0.50 ± 2% perf-profile.self.cycles-pp.__calc_delta
0.52 ± 3% -0.1 0.45 ± 5% perf-profile.self.cycles-pp.pick_next_task_fair
0.49 ± 2% -0.1 0.42 ± 2% perf-profile.self.cycles-pp.dequeue_task_fair
0.42 ± 2% -0.1 0.35 ± 4% perf-profile.self.cycles-pp.try_to_wake_up
0.09 ± 7% -0.1 0.03 ±100% perf-profile.self.cycles-pp.migrate_task_rq_fair
0.36 ± 4% -0.1 0.31 ± 2% perf-profile.self.cycles-pp.enqueue_entity
0.32 ± 3% -0.1 0.26 ± 3% perf-profile.self.cycles-pp.pick_next_entity
0.32 ± 5% -0.1 0.27 ± 7% perf-profile.self.cycles-pp.cpuacct_charge
0.28 -0.0 0.23 ± 6% perf-profile.self.cycles-pp.__wake_up_common
0.21 ± 2% -0.0 0.17 ± 3% perf-profile.self.cycles-pp.clear_buddies
0.16 ± 7% -0.0 0.11 ± 9% perf-profile.self.cycles-pp.cpus_share_cache
0.33 ± 3% -0.0 0.29 ± 2% perf-profile.self.cycles-pp.check_preempt_wakeup
0.29 ± 2% -0.0 0.25 ± 4% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.18 ± 3% -0.0 0.14 ± 5% perf-profile.self.cycles-pp.update_cfs_rq_h_load
0.21 ± 2% -0.0 0.18 ± 2% perf-profile.self.cycles-pp.check_preempt_curr
0.32 ± 3% -0.0 0.28 ± 2% perf-profile.self.cycles-pp.account_entity_enqueue
0.21 ± 5% -0.0 0.18 ± 6% perf-profile.self.cycles-pp.set_next_buddy
0.15 ± 2% -0.0 0.12 ± 7% perf-profile.self.cycles-pp.resched_curr
0.06 -0.0 0.03 ±100% perf-profile.self.cycles-pp.kill_fasync
0.21 ± 2% -0.0 0.17 ± 2% perf-profile.self.cycles-pp.rb_insert_color
0.29 ± 2% -0.0 0.26 ± 3% perf-profile.self.cycles-pp.anon_pipe_buf_release
0.12 -0.0 0.10 ± 5% perf-profile.self.cycles-pp.activate_task
0.24 -0.0 0.21 ± 5% perf-profile.self.cycles-pp.schedule
0.11 ± 4% -0.0 0.09 ± 4% perf-profile.self.cycles-pp.wakeup_preempt_entity
0.09 ± 4% -0.0 0.07 ± 6% perf-profile.self.cycles-pp.__wake_up_common_lock
0.10 ± 4% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.exit_to_usermode_loop
0.07 ± 5% -0.0 0.06 ± 9% perf-profile.self.cycles-pp.get_xsave_addr
0.15 ± 3% -0.0 0.13 perf-profile.self.cycles-pp.__list_add_valid
0.12 ± 3% -0.0 0.11 ± 4% perf-profile.self.cycles-pp.rb_next
0.12 ± 4% -0.0 0.11 ± 3% perf-profile.self.cycles-pp.autoremove_wake_function
0.14 ± 3% +0.0 0.16 ± 2% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
0.17 ± 2% +0.0 0.19 ± 3% perf-profile.self.cycles-pp.touch_atime
0.09 ± 5% +0.0 0.11 ± 4% perf-profile.self.cycles-pp.copyout
0.16 ± 2% +0.0 0.18 ± 2% perf-profile.self.cycles-pp.iov_iter_init
0.34 +0.0 0.36 ± 2% perf-profile.self.cycles-pp.__x64_sys_read
0.22 +0.0 0.24 ± 3% perf-profile.self.cycles-pp._cond_resched
0.20 ± 2% +0.0 0.23 perf-profile.self.cycles-pp.__might_fault
0.39 ± 2% +0.0 0.42 perf-profile.self.cycles-pp.ksys_read
0.20 ± 2% +0.0 0.24 ± 4% perf-profile.self.cycles-pp.rcu_all_qs
0.32 +0.0 0.35 perf-profile.self.cycles-pp.generic_pipe_buf_confirm
0.19 ± 2% +0.0 0.22 ± 3% perf-profile.self.cycles-pp.fpregs_assert_state_consistent
0.18 ± 2% +0.0 0.22 ± 6% perf-profile.self.cycles-pp.__sb_start_write
0.23 +0.0 0.27 ± 5% perf-profile.self.cycles-pp.__sb_end_write
0.91 +0.0 0.95 ± 2% perf-profile.self.cycles-pp.__might_sleep
0.38 ± 2% +0.0 0.42 perf-profile.self.cycles-pp.aa_file_perm
0.46 +0.0 0.51 ± 3% perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
0.34 ± 2% +0.0 0.39 perf-profile.self.cycles-pp.ksys_write
0.79 +0.1 0.84 ± 4% perf-profile.self.cycles-pp.new_sync_read
0.33 ± 2% +0.1 0.38 ± 2% perf-profile.self.cycles-pp.__x64_sys_write
0.36 ± 3% +0.1 0.42 ± 2% perf-profile.self.cycles-pp.file_update_time
0.28 ± 2% +0.1 0.34 ± 6% perf-profile.self.cycles-pp.atime_needs_update
0.76 +0.1 0.82 perf-profile.self.cycles-pp.fput_many
0.56 +0.1 0.62 ± 3% perf-profile.self.cycles-pp.copy_page_from_iter
0.64 +0.1 0.71 perf-profile.self.cycles-pp.copy_page_to_iter
1.55 +0.1 1.62 perf-profile.self.cycles-pp.mutex_lock
0.23 ± 7% +0.1 0.30 ± 5% perf-profile.self.cycles-pp.poll_idle
0.38 +0.1 0.45 ± 2% perf-profile.self.cycles-pp.apparmor_file_permission
1.64 +0.1 1.73 perf-profile.self.cycles-pp.pipe_read
0.38 ± 2% +0.1 0.47 ± 4% perf-profile.self.cycles-pp.current_time
0.37 +0.1 0.46 ± 2% perf-profile.self.cycles-pp.security_file_permission
0.82 +0.1 0.92 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
1.59 +0.1 1.70 perf-profile.self.cycles-pp.___might_sleep
0.55 ± 5% +0.1 0.66 ± 4% perf-profile.self.cycles-pp.vfs_write
1.60 ± 2% +0.1 1.71 perf-profile.self.cycles-pp.mutex_unlock
0.81 +0.1 0.94 ± 3% perf-profile.self.cycles-pp.new_sync_write
0.96 ± 4% +0.2 1.14 ± 4% perf-profile.self.cycles-pp.__fsnotify_parent
2.41 +0.2 2.61 ± 2% perf-profile.self.cycles-pp.__fget_files
1.36 ± 2% +0.2 1.57 ± 3% perf-profile.self.cycles-pp.fsnotify
1.66 +0.2 1.88 perf-profile.self.cycles-pp.common_file_perm
1.46 ± 2% +0.2 1.69 ± 3% perf-profile.self.cycles-pp.pipe_write
2.77 +0.3 3.03 perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
0.20 ± 7% +0.3 0.54 ± 2% perf-profile.self.cycles-pp.intel_idle
8.29 +1.0 9.33 ± 2% perf-profile.self.cycles-pp.syscall_return_via_sysret
8.04 +1.3 9.30 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64
12.90 +1.4 14.26 perf-profile.self.cycles-pp.do_syscall_64
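
The calltrace/children/self breakdown above follows perf's two
accounting modes: "children" attributes a function's cycles to every
caller up the stack, while "self" counts only cycles spent in the
function body itself. A sketch of producing a similar system-wide
profile, assuming a stock perf install (the sampling frequency and
duration below are illustrative, not the values the robot used):

  # system-wide call-graph sampling
  perf record -a -g -F 997 -- sleep 30
  # inclusive view, comparable to the children.cycles-pp lines
  perf report --children
  # exclusive view, comparable to the self.cycles-pp lines
  perf report --no-children
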
2647 ± 65% -71.3% 760.75 ± 48% interrupts.40:PCI-MSI.67633155-edge.eth0-TxRx-2
37863804 ± 2% -28.7% 26986660 ± 6% interrupts.CAL:Function_call_interrupts
362158 ± 2% -28.2% 260143 ± 6% interrupts.CPU0.CAL:Function_call_interrupts
4172754 -19.2% 3373668 interrupts.CPU0.RES:Rescheduling_interrupts
401592 ± 2% -24.0% 305064 ± 5% interrupts.CPU0.TLB:TLB_shootdowns
363963 ± 2% -28.3% 260808 ± 6% interrupts.CPU1.CAL:Function_call_interrupts
4140181 ± 2% -18.6% 3369331 interrupts.CPU1.RES:Rescheduling_interrupts
401513 ± 2% -24.0% 304981 ± 5% interrupts.CPU1.TLB:TLB_shootdowns
362307 ± 3% -28.3% 259881 ± 6% interrupts.CPU10.CAL:Function_call_interrupts
4062554 -15.5% 3431657 interrupts.CPU10.RES:Rescheduling_interrupts
402121 ± 2% -24.3% 304425 ± 5% interrupts.CPU10.TLB:TLB_shootdowns
368817 ± 2% -29.3% 260608 ± 6% interrupts.CPU100.CAL:Function_call_interrupts
4800401 -16.2% 4023024 ± 2% interrupts.CPU100.RES:Rescheduling_interrupts
401718 ± 2% -24.7% 302610 ± 5% interrupts.CPU100.TLB:TLB_shootdowns
366551 ± 2% -29.3% 259014 ± 6% interrupts.CPU101.CAL:Function_call_interrupts
4613609 ± 2% -13.6% 3984672 ± 2% interrupts.CPU101.RES:Rescheduling_interrupts
401293 ± 2% -24.6% 302610 ± 5% interrupts.CPU101.TLB:TLB_shootdowns
366857 ± 2% -30.3% 255792 ± 6% interrupts.CPU102.CAL:Function_call_interrupts
4757720 -18.0% 3900237 interrupts.CPU102.RES:Rescheduling_interrupts
400915 ± 2% -24.5% 302647 ± 5% interrupts.CPU102.TLB:TLB_shootdowns
360310 ± 3% -28.4% 258147 ± 6% interrupts.CPU103.CAL:Function_call_interrupts
4724316 ± 3% -15.9% 3972301 interrupts.CPU103.RES:Rescheduling_interrupts
396576 ± 2% -24.3% 300113 ± 5% interrupts.CPU103.TLB:TLB_shootdowns
362968 ± 2% -28.1% 260975 ± 6% interrupts.CPU11.CAL:Function_call_interrupts
3981921 -15.3% 3372172 ± 3% interrupts.CPU11.RES:Rescheduling_interrupts
402048 ± 2% -24.2% 304812 ± 5% interrupts.CPU11.TLB:TLB_shootdowns
363055 ± 2% -28.1% 261210 ± 6% interrupts.CPU12.CAL:Function_call_interrupts
3984389 ± 2% -15.2% 3379172 ± 2% interrupts.CPU12.RES:Rescheduling_interrupts
402291 ± 2% -24.1% 305453 ± 5% interrupts.CPU12.TLB:TLB_shootdowns
360471 ± 2% -27.6% 260902 ± 6% interrupts.CPU13.CAL:Function_call_interrupts
4116935 ± 2% -17.9% 3378167 interrupts.CPU13.RES:Rescheduling_interrupts
401996 ± 2% -23.9% 305788 ± 5% interrupts.CPU13.TLB:TLB_shootdowns
364891 ± 2% -28.4% 261201 ± 6% interrupts.CPU14.CAL:Function_call_interrupts
4102020 -14.0% 3526194 ± 5% interrupts.CPU14.RES:Rescheduling_interrupts
402116 ± 2% -24.1% 305374 ± 5% interrupts.CPU14.TLB:TLB_shootdowns
359603 ± 2% -27.4% 261087 ± 6% interrupts.CPU15.CAL:Function_call_interrupts
4095312 ± 3% -16.8% 3407557 ± 3% interrupts.CPU15.RES:Rescheduling_interrupts
402187 ± 2% -24.2% 305046 ± 5% interrupts.CPU15.TLB:TLB_shootdowns
364327 ± 2% -28.6% 260269 ± 6% interrupts.CPU16.CAL:Function_call_interrupts
4029416 -16.0% 3385620 ± 2% interrupts.CPU16.RES:Rescheduling_interrupts
402498 ± 2% -24.3% 304787 ± 5% interrupts.CPU16.TLB:TLB_shootdowns
363982 ± 2% -28.5% 260183 ± 6% interrupts.CPU17.CAL:Function_call_interrupts
4084473 -17.3% 3379738 ± 2% interrupts.CPU17.RES:Rescheduling_interrupts
402531 ± 2% -24.2% 305136 ± 5% interrupts.CPU17.TLB:TLB_shootdowns
360338 -27.8% 260100 ± 6% interrupts.CPU18.CAL:Function_call_interrupts
3930863 ± 2% -13.7% 3390527 interrupts.CPU18.RES:Rescheduling_interrupts
401062 ± 2% -23.9% 305293 ± 5% interrupts.CPU18.TLB:TLB_shootdowns
364435 ± 2% -28.7% 259901 ± 7% interrupts.CPU19.CAL:Function_call_interrupts
3928379 -15.6% 3315295 interrupts.CPU19.RES:Rescheduling_interrupts
402604 ± 2% -24.3% 304913 ± 5% interrupts.CPU19.TLB:TLB_shootdowns
362066 ± 2% -28.0% 260638 ± 7% interrupts.CPU2.CAL:Function_call_interrupts
4049789 ± 4% -13.5% 3501632 ± 3% interrupts.CPU2.RES:Rescheduling_interrupts
401996 ± 2% -24.2% 304544 ± 5% interrupts.CPU2.TLB:TLB_shootdowns
363960 ± 2% -28.5% 260076 ± 6% interrupts.CPU20.CAL:Function_call_interrupts
4079535 ± 3% -18.2% 3335972 interrupts.CPU20.RES:Rescheduling_interrupts
401941 ± 2% -24.0% 305365 ± 5% interrupts.CPU20.TLB:TLB_shootdowns
363498 -28.3% 260522 ± 6% interrupts.CPU21.CAL:Function_call_interrupts
4057077 -14.8% 3455466 ± 3% interrupts.CPU21.RES:Rescheduling_interrupts
402264 ± 2% -24.1% 305299 ± 5% interrupts.CPU21.TLB:TLB_shootdowns
363798 ± 2% -28.6% 259759 ± 6% interrupts.CPU22.CAL:Function_call_interrupts
4074066 -16.2% 3415930 interrupts.CPU22.RES:Rescheduling_interrupts
402227 ± 2% -24.2% 304981 ± 5% interrupts.CPU22.TLB:TLB_shootdowns
365438 ± 2% -29.0% 259418 ± 6% interrupts.CPU23.CAL:Function_call_interrupts
4118858 ± 2% -15.9% 3463648 interrupts.CPU23.RES:Rescheduling_interrupts
402592 ± 2% -24.1% 305374 ± 5% interrupts.CPU23.TLB:TLB_shootdowns
364005 ± 2% -29.2% 257845 ± 7% interrupts.CPU24.CAL:Function_call_interrupts
4000362 ± 2% -14.8% 3408667 interrupts.CPU24.RES:Rescheduling_interrupts
402632 ± 2% -24.2% 305195 ± 5% interrupts.CPU24.TLB:TLB_shootdowns
364387 ± 2% -28.2% 261770 ± 6% interrupts.CPU25.CAL:Function_call_interrupts
3994036 ± 2% -15.0% 3396788 ± 4% interrupts.CPU25.RES:Rescheduling_interrupts
402505 ± 2% -23.9% 306417 ± 5% interrupts.CPU25.TLB:TLB_shootdowns
361758 ± 2% -28.7% 257829 ± 6% interrupts.CPU26.CAL:Function_call_interrupts
4630011 ± 2% -15.6% 3908644 ± 2% interrupts.CPU26.RES:Rescheduling_interrupts
401986 ± 2% -24.4% 303767 ± 5% interrupts.CPU26.TLB:TLB_shootdowns
363712 ± 2% -29.2% 257597 ± 6% interrupts.CPU27.CAL:Function_call_interrupts
4650224 ± 4% -15.0% 3951161 ± 4% interrupts.CPU27.RES:Rescheduling_interrupts
401070 ± 2% -24.4% 303358 ± 5% interrupts.CPU27.TLB:TLB_shootdowns
363333 ± 2% -28.8% 258804 ± 5% interrupts.CPU28.CAL:Function_call_interrupts
4740992 ± 3% -16.9% 3941433 ± 3% interrupts.CPU28.RES:Rescheduling_interrupts
400972 ± 2% -24.3% 303626 ± 5% interrupts.CPU28.TLB:TLB_shootdowns
358463 ± 2% -28.6% 255899 ± 6% interrupts.CPU29.CAL:Function_call_interrupts
4796173 -17.5% 3956275 interrupts.CPU29.RES:Rescheduling_interrupts
400637 ± 2% -24.3% 303302 ± 5% interrupts.CPU29.TLB:TLB_shootdowns
362764 ± 2% -28.3% 260048 ± 7% interrupts.CPU3.CAL:Function_call_interrupts
4124342 ± 2% -15.9% 3469625 interrupts.CPU3.RES:Rescheduling_interrupts
402080 ± 2% -24.1% 305153 ± 5% interrupts.CPU3.TLB:TLB_shootdowns
362499 ± 2% -28.9% 257858 ± 7% interrupts.CPU30.CAL:Function_call_interrupts
4722536 ± 2% -16.3% 3953569 interrupts.CPU30.RES:Rescheduling_interrupts
400635 ± 2% -24.3% 303089 ± 5% interrupts.CPU30.TLB:TLB_shootdowns
361719 ± 2% -29.3% 255783 ± 7% interrupts.CPU31.CAL:Function_call_interrupts
4617452 -15.1% 3918955 ± 2% interrupts.CPU31.RES:Rescheduling_interrupts
400625 ± 2% -24.2% 303491 ± 5% interrupts.CPU31.TLB:TLB_shootdowns
2647 ± 65% -71.3% 760.75 ± 48% interrupts.CPU32.40:PCI-MSI.67633155-edge.eth0-TxRx-2
364248 ± 2% -29.1% 258148 ± 6% interrupts.CPU32.CAL:Function_call_interrupts
4645516 -16.8% 3864633 ± 2% interrupts.CPU32.RES:Rescheduling_interrupts
400580 ± 2% -24.4% 302773 ± 5% interrupts.CPU32.TLB:TLB_shootdowns
360856 ± 2% -29.0% 256142 ± 6% interrupts.CPU33.CAL:Function_call_interrupts
4688960 -15.5% 3964179 interrupts.CPU33.RES:Rescheduling_interrupts
401136 ± 2% -24.7% 302028 ± 5% interrupts.CPU33.TLB:TLB_shootdowns
362749 -29.3% 256630 ± 7% interrupts.CPU34.CAL:Function_call_interrupts
4700963 -14.8% 4003039 ± 2% interrupts.CPU34.RES:Rescheduling_interrupts
400231 ± 2% -24.4% 302608 ± 5% interrupts.CPU34.TLB:TLB_shootdowns
362555 ± 2% -29.3% 256380 ± 6% interrupts.CPU35.CAL:Function_call_interrupts
4720301 -15.9% 3968223 interrupts.CPU35.RES:Rescheduling_interrupts
401338 ± 2% -24.4% 303502 ± 5% interrupts.CPU35.TLB:TLB_shootdowns
363104 ± 2% -29.0% 257882 ± 6% interrupts.CPU36.CAL:Function_call_interrupts
4755161 -15.8% 4001989 ± 2% interrupts.CPU36.RES:Rescheduling_interrupts
401644 ± 2% -24.7% 302312 ± 5% interrupts.CPU36.TLB:TLB_shootdowns
361008 ± 3% -28.9% 256628 ± 6% interrupts.CPU37.CAL:Function_call_interrupts
4669550 ± 2% -16.8% 3884392 ± 3% interrupts.CPU37.RES:Rescheduling_interrupts
401123 ± 2% -24.6% 302492 ± 5% interrupts.CPU37.TLB:TLB_shootdowns
364373 ± 2% -29.4% 257238 ± 6% interrupts.CPU38.CAL:Function_call_interrupts
4671819 ± 2% -15.7% 3939020 interrupts.CPU38.RES:Rescheduling_interrupts
402122 ± 2% -24.8% 302424 ± 5% interrupts.CPU38.TLB:TLB_shootdowns
362451 ± 2% -29.6% 255300 ± 6% interrupts.CPU39.CAL:Function_call_interrupts
4627452 -15.9% 3890773 interrupts.CPU39.RES:Rescheduling_interrupts
400852 ± 2% -24.5% 302499 ± 5% interrupts.CPU39.TLB:TLB_shootdowns
364114 ± 2% -28.8% 259317 ± 6% interrupts.CPU4.CAL:Function_call_interrupts
4066359 ± 2% -17.6% 3350696 ± 2% interrupts.CPU4.RES:Rescheduling_interrupts
402091 ± 2% -24.1% 305297 ± 5% interrupts.CPU4.TLB:TLB_shootdowns
362154 ± 2% -29.4% 255616 ± 5% interrupts.CPU40.CAL:Function_call_interrupts
4835690 ± 4% -18.1% 3962808 interrupts.CPU40.RES:Rescheduling_interrupts
400687 ± 2% -24.4% 302784 ± 5% interrupts.CPU40.TLB:TLB_shootdowns
364832 ± 2% -30.4% 254102 ± 5% interrupts.CPU41.CAL:Function_call_interrupts
4802950 ± 2% -16.5% 4010224 interrupts.CPU41.RES:Rescheduling_interrupts
401241 ± 2% -24.7% 302125 ± 5% interrupts.CPU41.TLB:TLB_shootdowns
364876 ± 2% -29.4% 257505 ± 6% interrupts.CPU42.CAL:Function_call_interrupts
4709549 ± 3% -15.6% 3973577 interrupts.CPU42.RES:Rescheduling_interrupts
401497 ± 2% -24.6% 302726 ± 5% interrupts.CPU42.TLB:TLB_shootdowns
364435 ± 2% -29.6% 256622 ± 5% interrupts.CPU43.CAL:Function_call_interrupts
4650903 -15.5% 3931929 interrupts.CPU43.RES:Rescheduling_interrupts
401222 ± 2% -24.4% 303154 ± 5% interrupts.CPU43.TLB:TLB_shootdowns
363392 ± 2% -29.0% 257903 ± 6% interrupts.CPU44.CAL:Function_call_interrupts
4642435 ± 3% -16.4% 3881937 interrupts.CPU44.RES:Rescheduling_interrupts
401265 ± 2% -24.2% 304146 ± 5% interrupts.CPU44.TLB:TLB_shootdowns
361396 ± 2% -28.8% 257434 ± 5% interrupts.CPU45.CAL:Function_call_interrupts
4716195 -15.5% 3984695 interrupts.CPU45.RES:Rescheduling_interrupts
401579 ± 2% -24.5% 303329 ± 5% interrupts.CPU45.TLB:TLB_shootdowns
363272 ± 2% -29.5% 256164 ± 6% interrupts.CPU46.CAL:Function_call_interrupts
4699566 ± 3% -17.0% 3901234 interrupts.CPU46.RES:Rescheduling_interrupts
401156 ± 2% -24.4% 303164 ± 5% interrupts.CPU46.TLB:TLB_shootdowns
361972 ± 3% -29.1% 256802 ± 6% interrupts.CPU47.CAL:Function_call_interrupts
4766764 -18.6% 3877850 ± 2% interrupts.CPU47.RES:Rescheduling_interrupts
400761 ± 2% -24.4% 302846 ± 5% interrupts.CPU47.TLB:TLB_shootdowns
364506 ± 2% -29.2% 258114 ± 6% interrupts.CPU48.CAL:Function_call_interrupts
4798800 -16.5% 4005147 ± 3% interrupts.CPU48.RES:Rescheduling_interrupts
401365 ± 2% -24.6% 302688 ± 5% interrupts.CPU48.TLB:TLB_shootdowns
361947 ± 2% -29.4% 255420 ± 7% interrupts.CPU49.CAL:Function_call_interrupts
4624642 -14.5% 3953814 interrupts.CPU49.RES:Rescheduling_interrupts
400829 ± 2% -24.6% 302279 ± 5% interrupts.CPU49.TLB:TLB_shootdowns
364071 ± 2% -28.5% 260141 ± 6% interrupts.CPU5.CAL:Function_call_interrupts
4007922 -15.6% 3382942 ± 2% interrupts.CPU5.RES:Rescheduling_interrupts
402119 ± 2% -24.0% 305434 ± 5% interrupts.CPU5.TLB:TLB_shootdowns
362288 ± 2% -29.4% 255899 ± 6% interrupts.CPU50.CAL:Function_call_interrupts
4738102 -18.4% 3868165 ± 2% interrupts.CPU50.RES:Rescheduling_interrupts
401224 ± 2% -24.7% 302304 ± 5% interrupts.CPU50.TLB:TLB_shootdowns
363859 ± 2% -29.2% 257565 ± 6% interrupts.CPU51.CAL:Function_call_interrupts
4612685 ± 2% -14.0% 3966588 interrupts.CPU51.RES:Rescheduling_interrupts
400993 ± 2% -24.4% 303235 ± 5% interrupts.CPU51.TLB:TLB_shootdowns
366026 ± 2% -28.5% 261570 ± 6% interrupts.CPU52.CAL:Function_call_interrupts
4044273 -15.4% 3420867 interrupts.CPU52.RES:Rescheduling_interrupts
402921 ± 2% -24.0% 306236 ± 5% interrupts.CPU52.TLB:TLB_shootdowns
365484 ± 2% -28.5% 261242 ± 6% interrupts.CPU53.CAL:Function_call_interrupts
5794 ± 20% +24.2% 7198 ± 5% interrupts.CPU53.NMI:Non-maskable_interrupts
5794 ± 20% +24.2% 7198 ± 5% interrupts.CPU53.PMI:Performance_monitoring_interrupts
4105600 ± 2% -17.6% 3382750 interrupts.CPU53.RES:Rescheduling_interrupts
402361 ± 2% -23.9% 306068 ± 5% interrupts.CPU53.TLB:TLB_shootdowns
367183 ± 2% -28.7% 261847 ± 7% interrupts.CPU54.CAL:Function_call_interrupts
5794 ± 20% +24.2% 7195 ± 5% interrupts.CPU54.NMI:Non-maskable_interrupts
5794 ± 20% +24.2% 7195 ± 5% interrupts.CPU54.PMI:Performance_monitoring_interrupts
402891 ± 2% -24.1% 305604 ± 5% interrupts.CPU54.TLB:TLB_shootdowns
365130 ± 2% -28.1% 262436 ± 6% interrupts.CPU55.CAL:Function_call_interrupts
4118197 -15.5% 3478406 ± 2% interrupts.CPU55.RES:Rescheduling_interrupts
401954 ± 2% -23.9% 305817 ± 5% interrupts.CPU55.TLB:TLB_shootdowns
363456 ± 2% -28.3% 260512 ± 6% interrupts.CPU56.CAL:Function_call_interrupts
4077237 -16.9% 3386640 ± 2% interrupts.CPU56.RES:Rescheduling_interrupts
402098 ± 2% -24.1% 305288 ± 5% interrupts.CPU56.TLB:TLB_shootdowns
364085 ± 2% -29.3% 257580 ± 7% interrupts.CPU57.CAL:Function_call_interrupts
3972566 -14.2% 3410238 ± 2% interrupts.CPU57.RES:Rescheduling_interrupts
402468 ± 2% -24.1% 305580 ± 5% interrupts.CPU57.TLB:TLB_shootdowns
364890 -28.4% 261302 ± 6% interrupts.CPU58.CAL:Function_call_interrupts
3995325 ± 3% -15.6% 3371100 ± 3% interrupts.CPU58.RES:Rescheduling_interrupts
402219 ± 2% -23.9% 305912 ± 5% interrupts.CPU58.TLB:TLB_shootdowns
366052 ± 2% -28.9% 260298 ± 6% interrupts.CPU59.CAL:Function_call_interrupts
4163187 -18.3% 3401390 ± 3% interrupts.CPU59.RES:Rescheduling_interrupts
402434 ± 2% -24.1% 305280 ± 5% interrupts.CPU59.TLB:TLB_shootdowns
362471 ± 2% -28.4% 259559 ± 6% interrupts.CPU6.CAL:Function_call_interrupts
3991865 ± 3% -16.3% 3339970 ± 4% interrupts.CPU6.RES:Rescheduling_interrupts
401545 ± 2% -23.9% 305440 ± 5% interrupts.CPU6.TLB:TLB_shootdowns
363207 ± 2% -28.2% 260693 ± 6% interrupts.CPU60.CAL:Function_call_interrupts
6717 ± 6% +7.2% 7197 ± 5% interrupts.CPU60.NMI:Non-maskable_interrupts
6717 ± 6% +7.2% 7197 ± 5% interrupts.CPU60.PMI:Performance_monitoring_interrupts
4177724 -19.6% 3360115 interrupts.CPU60.RES:Rescheduling_interrupts
401859 ± 2% -24.0% 305412 ± 5% interrupts.CPU60.TLB:TLB_shootdowns
365406 ± 2% -28.7% 260377 ± 6% interrupts.CPU61.CAL:Function_call_interrupts
4189363 -18.7% 3406194 interrupts.CPU61.RES:Rescheduling_interrupts
401933 ± 2% -24.1% 304927 ± 5% interrupts.CPU61.TLB:TLB_shootdowns
362105 ± 2% -27.9% 261095 ± 6% interrupts.CPU62.CAL:Function_call_interrupts
4058876 -15.8% 3416999 interrupts.CPU62.RES:Rescheduling_interrupts
402332 ± 2% -24.2% 304907 ± 5% interrupts.CPU62.TLB:TLB_shootdowns
362843 ± 3% -28.0% 261235 ± 7% interrupts.CPU63.CAL:Function_call_interrupts
4041319 -15.6% 3410234 interrupts.CPU63.RES:Rescheduling_interrupts
402782 ± 2% -24.0% 306038 ± 5% interrupts.CPU63.TLB:TLB_shootdowns
365454 ± 2% -28.2% 262553 ± 6% interrupts.CPU64.CAL:Function_call_interrupts
4029525 -15.4% 3408027 ± 2% interrupts.CPU64.RES:Rescheduling_interrupts
402503 ± 2% -24.0% 305886 ± 5% interrupts.CPU64.TLB:TLB_shootdowns
366139 ± 2% -28.3% 262579 ± 6% interrupts.CPU65.CAL:Function_call_interrupts
4142385 ± 2% -17.1% 3432752 interrupts.CPU65.RES:Rescheduling_interrupts
402261 ± 2% -23.9% 305949 ± 5% interrupts.CPU65.TLB:TLB_shootdowns
362165 ± 2% -27.6% 262282 ± 6% interrupts.CPU66.CAL:Function_call_interrupts
4065269 -13.4% 3521633 ± 4% interrupts.CPU66.RES:Rescheduling_interrupts
402652 ± 2% -23.9% 306554 ± 5% interrupts.CPU66.TLB:TLB_shootdowns
365763 ± 2% -28.6% 261064 ± 6% interrupts.CPU67.CAL:Function_call_interrupts
6720 ± 6% +7.1% 7198 ± 5% interrupts.CPU67.NMI:Non-maskable_interrupts
6720 ± 6% +7.1% 7198 ± 5% interrupts.CPU67.PMI:Performance_monitoring_interrupts
4164589 ± 2% -17.4% 3438395 ± 2% interrupts.CPU67.RES:Rescheduling_interrupts
402517 ± 2% -24.1% 305630 ± 5% interrupts.CPU67.TLB:TLB_shootdowns
365397 ± 2% -28.3% 262144 ± 6% interrupts.CPU68.CAL:Function_call_interrupts
4114611 -16.8% 3425141 ± 2% interrupts.CPU68.RES:Rescheduling_interrupts
401936 ± 2% -24.1% 305021 ± 5% interrupts.CPU68.TLB:TLB_shootdowns
362434 ± 2% -27.6% 262387 ± 6% interrupts.CPU69.CAL:Function_call_interrupts
4074671 -16.5% 3402472 ± 3% interrupts.CPU69.RES:Rescheduling_interrupts
402591 ± 2% -24.1% 305514 ± 5% interrupts.CPU69.TLB:TLB_shootdowns
364644 ± 2% -28.4% 261143 ± 6% interrupts.CPU7.CAL:Function_call_interrupts
4116996 -17.9% 3381707 ± 2% interrupts.CPU7.RES:Rescheduling_interrupts
402193 ± 2% -24.2% 304933 ± 5% interrupts.CPU7.TLB:TLB_shootdowns
365547 -28.4% 261733 ± 6% interrupts.CPU70.CAL:Function_call_interrupts
6717 ± 6% +7.2% 7198 ± 5% interrupts.CPU70.NMI:Non-maskable_interrupts
6717 ± 6% +7.2% 7198 ± 5% interrupts.CPU70.PMI:Performance_monitoring_interrupts
4024869 -16.4% 3364032 ± 2% interrupts.CPU70.RES:Rescheduling_interrupts
401922 ± 2% -23.9% 305821 ± 5% interrupts.CPU70.TLB:TLB_shootdowns
367754 ± 2% -28.7% 262144 ± 6% interrupts.CPU71.CAL:Function_call_interrupts
3987620 ± 2% -16.0% 3350193 ± 2% interrupts.CPU71.RES:Rescheduling_interrupts
402283 ± 2% -23.8% 306426 ± 5% interrupts.CPU71.TLB:TLB_shootdowns
364008 -28.1% 261558 ± 6% interrupts.CPU72.CAL:Function_call_interrupts
4151426 ± 3% -19.5% 3342555 ± 2% interrupts.CPU72.RES:Rescheduling_interrupts
401616 ± 2% -23.8% 306167 ± 5% interrupts.CPU72.TLB:TLB_shootdowns
366580 ± 2% -28.5% 262000 ± 6% interrupts.CPU73.CAL:Function_call_interrupts
4105731 -17.7% 3378751 interrupts.CPU73.RES:Rescheduling_interrupts
402076 ± 2% -23.9% 305974 ± 5% interrupts.CPU73.TLB:TLB_shootdowns
362358 ± 2% -27.8% 261492 ± 6% interrupts.CPU74.CAL:Function_call_interrupts
4112156 -17.0% 3412501 interrupts.CPU74.RES:Rescheduling_interrupts
402087 ± 2% -24.0% 305510 ± 5% interrupts.CPU74.TLB:TLB_shootdowns
365257 ± 3% -28.4% 261566 ± 6% interrupts.CPU75.CAL:Function_call_interrupts
4156897 -16.4% 3473919 interrupts.CPU75.RES:Rescheduling_interrupts
402476 ± 2% -24.1% 305396 ± 5% interrupts.CPU75.TLB:TLB_shootdowns
366429 ± 2% -28.4% 262310 ± 6% interrupts.CPU76.CAL:Function_call_interrupts
4076843 -16.4% 3408530 interrupts.CPU76.RES:Rescheduling_interrupts
401974 ± 2% -24.1% 305081 ± 5% interrupts.CPU76.TLB:TLB_shootdowns
366844 ± 2% -28.3% 262932 ± 6% interrupts.CPU77.CAL:Function_call_interrupts
4010462 ± 3% -15.5% 3388336 ± 3% interrupts.CPU77.RES:Rescheduling_interrupts
402135 ± 2% -24.1% 305322 ± 5% interrupts.CPU77.TLB:TLB_shootdowns
364781 ± 2% -29.0% 259058 ± 5% interrupts.CPU78.CAL:Function_call_interrupts
4633481 ± 2% -14.8% 3946940 interrupts.CPU78.RES:Rescheduling_interrupts
401686 ± 2% -24.4% 303531 ± 5% interrupts.CPU78.TLB:TLB_shootdowns
364064 ± 2% -29.3% 257370 ± 7% interrupts.CPU79.CAL:Function_call_interrupts
4690111 ± 2% -15.3% 3971419 ± 4% interrupts.CPU79.RES:Rescheduling_interrupts
401747 ± 2% -24.6% 303072 ± 5% interrupts.CPU79.TLB:TLB_shootdowns
358147 -27.3% 260514 ± 6% interrupts.CPU8.CAL:Function_call_interrupts
4137074 -17.9% 3397095 interrupts.CPU8.RES:Rescheduling_interrupts
402009 ± 2% -24.2% 304726 ± 5% interrupts.CPU8.TLB:TLB_shootdowns
363027 ± 2% -28.7% 258810 ± 5% interrupts.CPU80.CAL:Function_call_interrupts
4761080 ± 5% -16.4% 3982078 ± 3% interrupts.CPU80.RES:Rescheduling_interrupts
400095 ± 2% -24.3% 302916 ± 5% interrupts.CPU80.TLB:TLB_shootdowns
364929 ± 2% -29.0% 259175 ± 6% interrupts.CPU81.CAL:Function_call_interrupts
4819494 -16.9% 4005722 interrupts.CPU81.RES:Rescheduling_interrupts
400609 ± 2% -24.2% 303544 ± 5% interrupts.CPU81.TLB:TLB_shootdowns
365233 ± 2% -28.8% 260116 ± 6% interrupts.CPU82.CAL:Function_call_interrupts
4701930 ± 2% -15.1% 3990022 interrupts.CPU82.RES:Rescheduling_interrupts
401341 ± 2% -24.4% 303514 ± 5% interrupts.CPU82.TLB:TLB_shootdowns
363566 ± 2% -28.3% 260684 ± 6% interrupts.CPU83.CAL:Function_call_interrupts
4658814 -15.5% 3934999 interrupts.CPU83.RES:Rescheduling_interrupts
400595 ± 2% -24.2% 303545 ± 5% interrupts.CPU83.TLB:TLB_shootdowns
363499 ± 2% -28.8% 258882 ± 6% interrupts.CPU84.CAL:Function_call_interrupts
4774074 -18.8% 3878251 interrupts.CPU84.RES:Rescheduling_interrupts
401133 ± 2% -24.4% 303077 ± 5% interrupts.CPU84.TLB:TLB_shootdowns
365822 ± 2% -28.9% 259979 ± 6% interrupts.CPU85.CAL:Function_call_interrupts
4694622 -16.3% 3929614 interrupts.CPU85.RES:Rescheduling_interrupts
400559 ± 2% -24.2% 303600 ± 5% interrupts.CPU85.TLB:TLB_shootdowns
364803 ± 2% -28.5% 260828 ± 6% interrupts.CPU86.CAL:Function_call_interrupts
4699440 ± 2% -16.0% 3948788 ± 2% interrupts.CPU86.RES:Rescheduling_interrupts
401231 ± 2% -24.6% 302715 ± 5% interrupts.CPU86.TLB:TLB_shootdowns
365408 ± 2% -29.0% 259486 ± 6% interrupts.CPU87.CAL:Function_call_interrupts
4719360 ± 2% -14.6% 4031045 ± 2% interrupts.CPU87.RES:Rescheduling_interrupts
400941 ± 2% -24.4% 303180 ± 5% interrupts.CPU87.TLB:TLB_shootdowns
366100 -29.0% 259945 ± 6% interrupts.CPU88.CAL:Function_call_interrupts
4783458 -16.7% 3983894 ± 2% interrupts.CPU88.RES:Rescheduling_interrupts
400598 ± 2% -24.4% 302993 ± 5% interrupts.CPU88.TLB:TLB_shootdowns
365997 ± 2% -28.9% 260407 ± 6% interrupts.CPU89.CAL:Function_call_interrupts
4673936 ± 2% -17.1% 3874648 ± 2% interrupts.CPU89.RES:Rescheduling_interrupts
401813 ± 2% -24.5% 303212 ± 5% interrupts.CPU89.TLB:TLB_shootdowns
361760 ± 2% -28.4% 258910 ± 6% interrupts.CPU9.CAL:Function_call_interrupts
4193690 -20.1% 3349024 ± 2% interrupts.CPU9.RES:Rescheduling_interrupts
401659 ± 2% -24.0% 305176 ± 5% interrupts.CPU9.TLB:TLB_shootdowns
365405 ± 2% -28.9% 259926 ± 6% interrupts.CPU90.CAL:Function_call_interrupts
4632041 ± 3% -15.8% 3899831 interrupts.CPU90.RES:Rescheduling_interrupts
401536 ± 2% -24.7% 302268 ± 5% interrupts.CPU90.TLB:TLB_shootdowns
368348 ± 2% -30.0% 257696 ± 7% interrupts.CPU91.CAL:Function_call_interrupts
4699415 -16.8% 3910344 ± 2% interrupts.CPU91.RES:Rescheduling_interrupts
401058 ± 2% -24.6% 302375 ± 5% interrupts.CPU91.TLB:TLB_shootdowns
365593 ± 2% -29.5% 257642 ± 6% interrupts.CPU92.CAL:Function_call_interrupts
4874524 ± 3% -18.7% 3962744 interrupts.CPU92.RES:Rescheduling_interrupts
400768 ± 2% -24.4% 303132 ± 5% interrupts.CPU92.TLB:TLB_shootdowns
369748 ± 2% -29.9% 259219 ± 6% interrupts.CPU93.CAL:Function_call_interrupts
4811191 ± 3% -16.5% 4018539 interrupts.CPU93.RES:Rescheduling_interrupts
401210 ± 2% -24.4% 303151 ± 5% interrupts.CPU93.TLB:TLB_shootdowns
365854 ± 2% -28.7% 260957 ± 6% interrupts.CPU94.CAL:Function_call_interrupts
4725535 ± 3% -15.0% 4014538 interrupts.CPU94.RES:Rescheduling_interrupts
401389 ± 2% -24.5% 303183 ± 5% interrupts.CPU94.TLB:TLB_shootdowns
365697 ± 3% -29.0% 259601 ± 6% interrupts.CPU95.CAL:Function_call_interrupts
4701659 -15.3% 3983254 interrupts.CPU95.RES:Rescheduling_interrupts
401162 ± 2% -24.4% 303321 ± 5% interrupts.CPU95.TLB:TLB_shootdowns
367333 ± 2% -29.0% 260898 ± 6% interrupts.CPU96.CAL:Function_call_interrupts
4699350 ± 2% -16.3% 3931906 interrupts.CPU96.RES:Rescheduling_interrupts
401090 ± 2% -24.3% 303573 ± 5% interrupts.CPU96.TLB:TLB_shootdowns
367997 ± 2% -29.0% 261366 ± 6% interrupts.CPU97.CAL:Function_call_interrupts
4742816 -16.2% 3974366 ± 2% interrupts.CPU97.RES:Rescheduling_interrupts
401878 ± 2% -24.4% 303776 ± 5% interrupts.CPU97.TLB:TLB_shootdowns
365973 ± 3% -29.6% 257706 ± 6% interrupts.CPU98.CAL:Function_call_interrupts
4704134 ± 3% -16.1% 3946205 ± 2% interrupts.CPU98.RES:Rescheduling_interrupts
401550 ± 2% -24.7% 302447 ± 5% interrupts.CPU98.TLB:TLB_shootdowns
364681 ± 2% -28.5% 260792 ± 6% interrupts.CPU99.CAL:Function_call_interrupts
4789912 -18.9% 3883947 ± 3% interrupts.CPU99.RES:Rescheduling_interrupts
400741 ± 2% -24.3% 303226 ± 5% interrupts.CPU99.TLB:TLB_shootdowns
208.00 -50.0% 104.00 interrupts.MCP:Machine_check_polls
4.569e+08 -16.3% 3.824e+08 interrupts.RES:Rescheduling_interrupts
41767734 ± 2% -24.3% 31635153 ± 5% interrupts.TLB:TLB_shootdowns
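
The interrupt deltas above are derived from /proc/interrupts, which has
one row per interrupt source and one column per CPU. As a sketch, the
RES (rescheduling IPI) total can be cross-checked by sampling the file
before and after a run and subtracting; this assumes the usual
/proc/interrupts layout:

  awk '$1 == "RES:" { for (i = 2; i <= NF; i++)
                          if ($i ~ /^[0-9]+$/) s += $i }
       END { print s }' /proc/interrupts
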



hackbench.throughput

310000 +------------------------------------------------------------------+
| O OO O OO O O O OO O O |
300000 |-OO OO OO O O O |
| |
| |
290000 |-+ |
| .+ |
280000 |-+ +.++. + +.++.++.+ +. |
| : ++.+.+: : ++. .+ +.|
270000 |-+ : : : + +.+ |
| : : : |
| + : : |
260000 |.++.+ :+.+.++.++. .++.++.++ |
| + + + |
250000 +------------------------------------------------------------------+


hackbench.workload

1.9e+09 +----------------------------------------------------------------+
| OO OO OO OO OO O OO OO OO OO O OO |
1.85e+09 |-+ |
| |
| |
1.8e+09 |-+ |
| |
1.75e+09 |-+ ++.++.++.+.+ ++.++.+.++.++.++.++.++.|
| : : : |
1.7e+09 |-+ : : : |
| : : : |
| : : : |
1.65e+09 |-+ : : : |
|.++.++ +.++.++.++.+.++.++.++ |
1.6e+09 +----------------------------------------------------------------+


hackbench.time.user_time

16000 +-------------------------------------------------------------------+
| OO OO O OO O OO OO O OO O OO OO O O |
15800 |-+ |
15600 |-+ |
| |
15400 |-+ |
15200 |-+ |
| +.+.++.+ ++. .++.+ .+.++.+.++.++.|
15000 |-+ +.+ + : + + |
14800 |-+ : : : |
| : : : |
14600 |-+ : : : |
14400 |.+ .+: : : |
| + + +.++.+.++.++.+.++.++.+ |
14200 +-------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample
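
For context, the pipe_read/pipe_write-dominated profile earlier in this
report is characteristic of hackbench run in pipe mode. A hedged
reproduction sketch; the group and loop counts below are illustrative
placeholders only, the real parameters are in the attached job.yaml and
reproduce script:

  # -p: communicate over pipes, -g: number of task groups,
  # -l: messages sent per sender (placeholder values)
  hackbench -p -g 50 -l 10000
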



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


Attachments:
config-5.6.0-rc4-00108-g15e7470dfc3d0 (206.92 kB)
job-script (7.33 kB)
job.yaml (5.06 kB)
reproduce (1.41 kB)