2020-03-20 15:13:43

by Mel Gorman

Subject: [PATCH 0/4] Throttle select_idle_sibling when a target domain is overloaded

This is a follow-on from the CPU/NUMA load balancer reconciliation
after I noticed that select_idle_sibling() was doing excessive work. It
was originally part of a larger series that merged select_idle_core,
select_idle_sibling and select_idle_cpu into a single pass. Unfortunately,
fixes have invalidated the test results multiple times, so this series
covers only one part for now as the tests are extremely time-consuming.

tip/sched/core as of March 13th was used as the baseline, with "sched/fair:
fix condition of avg_load calculation" applied; that fix had just been
picked up by tip at the time of writing.

Patches 1-2 add schedstats to track the efficiency of
select_idle_sibling(). Ordinarily these stats are disabled and are only
really of use to a kernel developer, but I find them more practical to
work with than perf.

Patch 3 is a trivial micro-optimisation that avoids clearing part of
a cpumask if a core has been found.

Patch 4 tracks whether a domain appeared to be overloaded during
select_idle_cpu() so that future scans can abort early if necessary.
This reduces the number of runqueues that are scanned uselessly when
a domain is overloaded.

include/linux/sched/topology.h | 1 +
kernel/sched/debug.c | 6 +++
kernel/sched/fair.c | 103 +++++++++++++++++++++++++++++++++++------
kernel/sched/features.h | 3 ++
kernel/sched/sched.h | 8 ++++
kernel/sched/stats.c | 9 ++--
6 files changed, 113 insertions(+), 17 deletions(-)

--
2.16.4


2020-03-20 15:14:18

by Mel Gorman

Subject: [PATCH 1/4] sched/fair: Track efficiency of select_idle_sibling

select_idle_sibling is an important path that finds a nearby idle CPU on
wakeup. As it examines the state of other CPUs, it can be expensive in
terms of cache usage. This patch tracks the search efficiency if
schedstats are enabled. In general, this is only useful to a kernel
developer, but schedstats are typically disabled by default so it is
convenient for development and mostly free otherwise.

The series can be done without this patch but the stats were used to
generate a number of useful metrics in MMTests to analyse what was
going on.

SIS Search: Number of calls to select_idle_sibling

SIS Domain Search: Number of times the domain was searched because the
fast path failed.

SIS Scanned: Generally the number of runqueues scanned but the fast
path counts as 1 regardless of the values for target, prev
and recent.

SIS Domain Scanned: Number of runqueues scanned during a search of the
LLC domain.

SIS Failures: Number of SIS calls that failed to find an idle CPU

SIS Search Efficiency: A ratio expressed as a percentage of runqueues
scanned versus idle CPUs found. A 100% efficiency indicates that
the target, prev or recent CPU of a task was idle at wakeup. The
lower the efficiency, the more runqueues were scanned before an
idle CPU was found.

SIS Domain Search Efficiency: Similar, except only for the slower SIS
path.

SIS Fast Success Rate: Percentage of SIS that used target, prev or
recent CPUs.

SIS Success rate: Percentage of scans that found an idle CPU.
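
As an illustration, the derived metrics can be computed from deltas of
the raw counters roughly as in the sketch below. This is hypothetical
post-processing inferred from the descriptions above; sis_report() and
the sample numbers are made up and not lifted from MMTests.

#include <stdio.h>

/* Hypothetical sketch: derive the ratios from /proc/schedstat deltas */
static void sis_report(unsigned long long search,
		       unsigned long long domain_search,
		       unsigned long long scanned,
		       unsigned long long failed)
{
	/* Calls that found an idle CPU somewhere */
	unsigned long long found = search - failed;

	printf("SIS Search Efficiency  %6.2f%%\n", 100.0 * found / scanned);
	printf("SIS Fast Success Rate  %6.2f%%\n",
	       100.0 * (search - domain_search) / search);
	printf("SIS Success Rate       %6.2f%%\n", 100.0 * found / search);
}

int main(void)
{
	/* Example deltas between two samples of /proc/schedstat */
	sis_report(1000000, 400000, 5000000, 250000);
	return 0;
}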

Signed-off-by: Mel Gorman <[email protected]>
---
kernel/sched/debug.c | 4 ++++
kernel/sched/fair.c | 14 ++++++++++++++
kernel/sched/sched.h | 6 ++++++
kernel/sched/stats.c | 8 +++++---
4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 8331bc04aea2..7af6e8a12f40 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -661,6 +661,10 @@ do { \
P(sched_goidle);
P(ttwu_count);
P(ttwu_local);
+ P(sis_search);
+ P(sis_domain_search);
+ P(sis_scanned);
+ P(sis_failed);
}
#undef P

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1dea8554ead0..9d32a81ece08 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5999,6 +5999,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
bool idle = true;

for_each_cpu(cpu, cpu_smt_mask(core)) {
+ schedstat_inc(this_rq()->sis_scanned);
if (!available_idle_cpu(cpu)) {
idle = false;
break;
@@ -6029,6 +6030,7 @@ static int select_idle_smt(struct task_struct *p, int target)
return -1;

for_each_cpu(cpu, cpu_smt_mask(target)) {
+ schedstat_inc(this_rq()->sis_scanned);
if (!cpumask_test_cpu(cpu, p->cpus_ptr))
continue;
if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
@@ -6094,6 +6096,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);

for_each_cpu_wrap(cpu, cpus, target) {
+ schedstat_inc(this_rq()->sis_scanned);
if (!--nr)
return -1;
if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
@@ -6150,6 +6153,15 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
struct sched_domain *sd;
int i, recent_used_cpu;

+ schedstat_inc(this_rq()->sis_search);
+
+ /*
+ * Checking prev, target and recent is treated as one scan. A
+ * perfect hit on one of those is considered 100% efficiency.
+ * Further scanning impairs efficiency.
+ */
+ schedstat_inc(this_rq()->sis_scanned);
+
/*
* For asymmetric CPU capacity systems, our domain of interest is
* sd_asym_cpucapacity rather than sd_llc.
@@ -6215,6 +6227,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
if (!sd)
return target;

+ schedstat_inc(this_rq()->sis_domain_search);
i = select_idle_core(p, sd, target);
if ((unsigned)i < nr_cpumask_bits)
return i;
@@ -6227,6 +6240,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
if ((unsigned)i < nr_cpumask_bits)
return i;

+ schedstat_inc(this_rq()->sis_failed);
return target;
}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9e173fad0425..ff23cd92d25f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1008,6 +1008,12 @@ struct rq {
/* try_to_wake_up() stats */
unsigned int ttwu_count;
unsigned int ttwu_local;
+
+ /* select_idle_sibling stats */
+ unsigned int sis_search;
+ unsigned int sis_domain_search;
+ unsigned int sis_scanned;
+ unsigned int sis_failed;
#endif

#ifdef CONFIG_SMP
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 750fb3c67eed..390bfcc3842c 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -10,7 +10,7 @@
* Bump this up when changing the output format or the meaning of an existing
* format, so that tools can adapt (or abort)
*/
-#define SCHEDSTAT_VERSION 15
+#define SCHEDSTAT_VERSION 16

static int show_schedstat(struct seq_file *seq, void *v)
{
@@ -30,12 +30,14 @@ static int show_schedstat(struct seq_file *seq, void *v)

/* runqueue-specific stats */
seq_printf(seq,
- "cpu%d %u 0 %u %u %u %u %llu %llu %lu",
+ "cpu%d %u 0 %u %u %u %u %llu %llu %lu %u %u %u %u",
cpu, rq->yld_count,
rq->sched_count, rq->sched_goidle,
rq->ttwu_count, rq->ttwu_local,
rq->rq_cpu_time,
- rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount);
+ rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount,
+ rq->sis_search, rq->sis_domain_search,
+ rq->sis_scanned, rq->sis_failed);

seq_printf(seq, "\n");

--
2.16.4

2020-03-20 15:14:21

by Mel Gorman

Subject: [PATCH 2/4] sched/fair: Track efficiency of task recent_used_cpu

This simply tracks the efficiency of p->recent_used_cpu. The hit rate
matters because each hit avoids a domain search. Similarly, the miss
rate matters because each miss is a penalty to the fast path. MMTests
uses this to generate additional metrics:

SIS Recent Used Hit: A recent CPU was eligible and used. Each hit is
a domain search avoided.

SIS Recent Used Miss: A recent CPU was eligible but unavailable. Each
time this is hit, there was a penalty to the fast path before
a domain search happened.

SIS Recent Success Rate: A percentage of the number of hits versus
the total attempts to use the recent CPU.

SIS Recent Attempts: The total number of times the recent CPU was examined.
A high number of Recent Attempts with a low Success Rate implies
the fast path is being punished severely. This could have been
presented as a weighting of hits and misses but calculating an
appropriate weight for misses is problematic.
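
For completeness, the derived values relate to the raw counters as in
this hypothetical sketch, again inferred from the descriptions above
rather than taken from MMTests:

/* Hypothetical post-processing; attempts = hits + misses */
static double sis_recent_success_rate(unsigned long long hit,
				      unsigned long long miss)
{
	unsigned long long attempts = hit + miss;	/* SIS Recent Attempts */

	return attempts ? 100.0 * hit / attempts : 0.0;
}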

Signed-off-by: Mel Gorman <[email protected]>
---
kernel/sched/debug.c | 2 ++
kernel/sched/fair.c | 21 ++++++++++++---------
kernel/sched/sched.h | 2 ++
kernel/sched/stats.c | 7 ++++---
4 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 7af6e8a12f40..d58f333bb739 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -665,6 +665,8 @@ do { \
P(sis_domain_search);
P(sis_scanned);
P(sis_failed);
+ P(sis_recent_hit);
+ P(sis_recent_miss);
}
#undef P

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9d32a81ece08..7f4356c520be 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6212,15 +6212,18 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
recent_used_cpu = p->recent_used_cpu;
if (recent_used_cpu != prev &&
recent_used_cpu != target &&
- cpus_share_cache(recent_used_cpu, target) &&
- (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
- cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr)) {
- /*
- * Replace recent_used_cpu with prev as it is a potential
- * candidate for the next wake:
- */
- p->recent_used_cpu = prev;
- return recent_used_cpu;
+ cpus_share_cache(recent_used_cpu, target)) {
+ if ((available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
+ cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr)) {
+ /*
+ * Replace recent_used_cpu with prev as it is a potential
+ * candidate for the next wake:
+ */
+ p->recent_used_cpu = prev;
+ schedstat_inc(this_rq()->sis_recent_hit);
+ return recent_used_cpu;
+ }
+ schedstat_inc(this_rq()->sis_recent_miss);
}

sd = rcu_dereference(per_cpu(sd_llc, target));
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ff23cd92d25f..3cb7de0b11d7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1014,6 +1014,8 @@ struct rq {
unsigned int sis_domain_search;
unsigned int sis_scanned;
unsigned int sis_failed;
+ unsigned int sis_recent_hit;
+ unsigned int sis_recent_miss;
#endif

#ifdef CONFIG_SMP
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 390bfcc3842c..402fab75aa14 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -10,7 +10,7 @@
* Bump this up when changing the output format or the meaning of an existing
* format, so that tools can adapt (or abort)
*/
-#define SCHEDSTAT_VERSION 16
+#define SCHEDSTAT_VERSION 17

static int show_schedstat(struct seq_file *seq, void *v)
{
@@ -30,14 +30,15 @@ static int show_schedstat(struct seq_file *seq, void *v)

/* runqueue-specific stats */
seq_printf(seq,
- "cpu%d %u 0 %u %u %u %u %llu %llu %lu %u %u %u %u",
+ "cpu%d %u 0 %u %u %u %u %llu %llu %lu %u %u %u %u %u %u",
cpu, rq->yld_count,
rq->sched_count, rq->sched_goidle,
rq->ttwu_count, rq->ttwu_local,
rq->rq_cpu_time,
rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount,
rq->sis_search, rq->sis_domain_search,
- rq->sis_scanned, rq->sis_failed);
+ rq->sis_scanned, rq->sis_failed,
+ rq->sis_recent_hit, rq->sis_recent_miss);

seq_printf(seq, "\n");

--
2.16.4

2020-03-20 15:14:53

by Mel Gorman

Subject: [PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary

Once a domain is overloaded, it is very unlikely that a free CPU will
be found in the short term but there is still potentially a lot of
scanning. This patch tracks if a domain may be overloaded due to an
excessive number of running tasks relative to available CPUs. In the
event a domain is overloaded, a search is aborted.

This has a variable impact on performance for hackbench, which often
overloads the test machines used. There was a mix of performance gains
and losses, but there is a substantial impact on search efficiency.

On a 2-socket broadwell machine with 80 cores in total, tbench showed
small gains and some losses

Hmean 1 431.51 ( 0.00%) 426.53 * -1.15%*
Hmean 2 842.69 ( 0.00%) 839.00 * -0.44%*
Hmean 4 1631.09 ( 0.00%) 1634.81 * 0.23%*
Hmean 8 3001.08 ( 0.00%) 3020.85 * 0.66%*
Hmean 16 5631.75 ( 0.00%) 5655.04 * 0.41%*
Hmean 32 9736.22 ( 0.00%) 9645.68 * -0.93%*
Hmean 64 13978.54 ( 0.00%) 15215.65 * 8.85%*
Hmean 128 20093.06 ( 0.00%) 19389.45 * -3.50%*
Hmean 256 17491.34 ( 0.00%) 18616.32 * 6.43%*
Hmean 320 17423.67 ( 0.00%) 17793.38 * 2.12%*

However, the "SIS Domain Search Efficiency" went from 6.03% to 19.61%
indicating that far fewer CPUs were scanned. The impact of the patch
is more noticable when sockets have multiple L3 caches. While true for
EPYC 2nd generation, it's particularly noticable on EPYC 1st generation

Hmean 1 325.30 ( 0.00%) 324.92 * -0.12%*
Hmean 2 630.77 ( 0.00%) 621.35 * -1.49%*
Hmean 4 1211.41 ( 0.00%) 1148.51 * -5.19%*
Hmean 8 2017.29 ( 0.00%) 1953.57 * -3.16%*
Hmean 16 4068.81 ( 0.00%) 3514.06 * -13.63%*
Hmean 32 5588.20 ( 0.00%) 6583.58 * 17.81%*
Hmean 64 8470.14 ( 0.00%) 10117.26 * 19.45%*
Hmean 128 11462.06 ( 0.00%) 17207.68 * 50.13%*
Hmean 256 11433.74 ( 0.00%) 13446.93 * 17.61%*
Hmean 512 12576.88 ( 0.00%) 13630.08 * 8.37%*

On this machine, search efficiency goes from 21.04% to 32.66%. There
is a noticeable problem at 16 clients, when there are enough of them
for an LLC domain to spill over.

With hackbench, the overload problem is a bit more obvious. On the
2-socket broadwell machine using processes and pipes we see

Amean 1 0.3023 ( 0.00%) 0.2893 ( 4.30%)
Amean 4 0.6823 ( 0.00%) 0.6930 ( -1.56%)
Amean 7 1.0293 ( 0.00%) 1.0380 ( -0.84%)
Amean 12 1.6913 ( 0.00%) 1.7027 ( -0.67%)
Amean 21 2.9307 ( 0.00%) 2.9297 ( 0.03%)
Amean 30 4.0040 ( 0.00%) 4.0270 ( -0.57%)
Amean 48 6.0703 ( 0.00%) 6.1067 ( -0.60%)
Amean 79 9.0630 ( 0.00%) 9.1223 * -0.65%*
Amean 110 12.1917 ( 0.00%) 12.1693 ( 0.18%)
Amean 141 15.7150 ( 0.00%) 15.4187 ( 1.89%)
Amean 172 19.5327 ( 0.00%) 18.9937 ( 2.76%)
Amean 203 23.3093 ( 0.00%) 22.2497 * 4.55%*
Amean 234 27.8657 ( 0.00%) 25.9627 * 6.83%*
Amean 265 32.9783 ( 0.00%) 29.5240 * 10.47%*
Amean 296 35.6727 ( 0.00%) 32.8260 * 7.98%*

More of the SIS stats are worth looking at in this case

Ops SIS Domain Search 10390526707.00 9822163508.00
Ops SIS Scanned 223173467577.00 48330226094.00
Ops SIS Domain Scanned 222820381314.00 47964114165.00
Ops SIS Failures 10183794873.00 9639912418.00
Ops SIS Recent Used Hit 22194515.00 22517194.00
Ops SIS Recent Used Miss 5733847634.00 5500415074.00
Ops SIS Recent Attempts 5756042149.00 5522932268.00
Ops SIS Search Efficiency 4.81 21.08

Search efficiency goes from 4.81% to 21.08%, but SIS Domain Scanned
shows the sheer volume of searching SIS does when the prev, target and
recent CPUs are unavailable.

This could be much more aggressive by also cutting off a search for idle
cores. However, to make that work properly requires a much more intrusive
series that is likely to be controversial. This seemed like a reasonable
tradeoff to tackle the most obvious problem with select_idle_cpu.

Signed-off-by: Mel Gorman <[email protected]>
---
include/linux/sched/topology.h | 1 +
kernel/sched/fair.c | 65 +++++++++++++++++++++++++++++++++++++++---
kernel/sched/features.h | 3 ++
3 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index af9319e4cfb9..76ec7a54f57b 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -66,6 +66,7 @@ struct sched_domain_shared {
atomic_t ref;
atomic_t nr_busy_cpus;
int has_idle_cores;
+ int is_overloaded;
};

struct sched_domain {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 41913fac68de..31e011e627db 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5924,6 +5924,38 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
return new_cpu;
}

+static inline void
+set_sd_overloaded(struct sched_domain_shared *sds, int val)
+{
+ if (!sds)
+ return;
+
+ WRITE_ONCE(sds->is_overloaded, val);
+}
+
+static inline bool test_sd_overloaded(struct sched_domain_shared *sds)
+{
+ return READ_ONCE(sds->is_overloaded);
+}
+
+/* Returns true if a previously overloaded domain is likely still overloaded. */
+static inline bool
+abort_sd_overloaded(struct sched_domain_shared *sds, int prev, int target)
+{
+ if (!sds || !test_sd_overloaded(sds))
+ return false;
+
+ /* Does either target or a suitable prev have 1 or 0 running tasks? */
+ if (cpu_rq(target)->nr_running <= 1 ||
+ (prev != target && cpus_share_cache(prev, target) &&
+ cpu_rq(prev)->nr_running <= 1)) {
+ set_sd_overloaded(sds, 0);
+ return false;
+ }
+
+ return true;
+}
+
#ifdef CONFIG_SCHED_SMT
DEFINE_STATIC_KEY_FALSE(sched_smt_present);
EXPORT_SYMBOL_GPL(sched_smt_present);
@@ -6060,15 +6092,18 @@ static inline int select_idle_smt(struct task_struct *p, int target)
* comparing the average scan cost (tracked in sd->avg_scan_cost) against the
* average idle time for this rq (as found in rq->avg_idle).
*/
-static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd,
+ int prev, int target)
{
struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
struct sched_domain *this_sd;
+ struct sched_domain_shared *sds;
u64 avg_cost, avg_idle;
u64 time, cost;
s64 delta;
int this = smp_processor_id();
int cpu, nr = INT_MAX;
+ int nr_scanned = 0, nr_running = 0;

this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
if (!this_sd)
@@ -6092,18 +6127,40 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
nr = 4;
}

+ sds = rcu_dereference(per_cpu(sd_llc_shared, target));
+ if (sched_feat(SIS_OVERLOAD)) {
+ if (abort_sd_overloaded(sds, prev, target))
+ return -1;
+ }
+
time = cpu_clock(this);

cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);

for_each_cpu_wrap(cpu, cpus, target) {
schedstat_inc(this_rq()->sis_scanned);
- if (!--nr)
- return -1;
+ if (!--nr) {
+ cpu = -1;
+ break;
+ }
if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
break;
+ if (sched_feat(SIS_OVERLOAD)) {
+ nr_scanned++;
+ nr_running += cpu_rq(cpu)->nr_running;
+ }
}

+ /* If no idle CPU was found, check whether to mark the domain overloaded. */
+ if (sched_feat(SIS_OVERLOAD) && (signed)cpu >= nr_cpumask_bits &&
+ nr_scanned && nr_running > (nr_scanned << 1)) {
+ set_sd_overloaded(sds, 1);
+ }
+
+ /* Scan cost not accounted for if scan is throttled */
+ if (!nr)
+ return -1;
+
time = cpu_clock(this) - time;
cost = this_sd->avg_scan_cost;
delta = (s64)(time - cost) / 8;
@@ -6236,7 +6293,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
if ((unsigned)i < nr_cpumask_bits)
return i;

- i = select_idle_cpu(p, sd, target);
+ i = select_idle_cpu(p, sd, prev, target);
if ((unsigned)i < nr_cpumask_bits)
return i;

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 7481cd96f391..c36ae01910e2 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -57,6 +57,9 @@ SCHED_FEAT(TTWU_QUEUE, true)
SCHED_FEAT(SIS_AVG_CPU, false)
SCHED_FEAT(SIS_PROP, true)

+/* Limit scans if the domain is likely overloaded */
+SCHED_FEAT(SIS_OVERLOAD, true)
+
/*
* Issue a WARN when we do multiple update_rq_clock() calls
* in a single rq->lock section. Default disabled because the
--
2.16.4

2020-03-20 15:15:44

by Mel Gorman

Subject: [PATCH 3/4] sched/fair: Clear SMT siblings after determining the core is not idle

The clearing of SMT siblings from the SIS mask before checking for an idle
core is a small but unnecessary cost. Defer the clearing of the siblings
until the scan moves to the next potential target. The cost of this was
not measured as it is borderline noise but it should be self-evident.

Signed-off-by: Mel Gorman <[email protected]>
---
kernel/sched/fair.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7f4356c520be..41913fac68de 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6005,10 +6005,11 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
break;
}
}
- cpumask_andnot(cpus, cpus, cpu_smt_mask(core));

if (idle)
return core;
+
+ cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
}

/*
--
2.16.4

2020-03-20 15:50:29

by Vincent Guittot

Subject: Re: [PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary

On Fri, 20 Mar 2020 at 16:13, Mel Gorman <[email protected]> wrote:
>
> Once a domain is overloaded, it is very unlikely that a free CPU will
> be found in the short term but there is still potentially a lot of
> scanning. This patch tracks if a domain may be overloaded due to an
> excessive number of running tasks relative to available CPUs. In the
> event a domain is overloaded, a search is aborted.
>
> [...]
>
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index af9319e4cfb9..76ec7a54f57b 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -66,6 +66,7 @@ struct sched_domain_shared {
> atomic_t ref;
> atomic_t nr_busy_cpus;
> int has_idle_cores;
> + int is_overloaded;

Can't nr_busy_cpus compared to sd->span_weight give you similar status ?

> };
>
> [...]

2020-03-20 16:45:48

by Mel Gorman

Subject: Re: [PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary

On Fri, Mar 20, 2020 at 04:48:39PM +0100, Vincent Guittot wrote:
> > ---
> > include/linux/sched/topology.h | 1 +
> > kernel/sched/fair.c | 65 +++++++++++++++++++++++++++++++++++++++---
> > kernel/sched/features.h | 3 ++
> > 3 files changed, 65 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > index af9319e4cfb9..76ec7a54f57b 100644
> > --- a/include/linux/sched/topology.h
> > +++ b/include/linux/sched/topology.h
> > @@ -66,6 +66,7 @@ struct sched_domain_shared {
> > atomic_t ref;
> > atomic_t nr_busy_cpus;
> > int has_idle_cores;
> > + int is_overloaded;
>
> Can't nr_busy_cpus compared to sd->span_weight give you similar status ?
>

It's connected to nohz balancing and I didn't see how I could use that
for detecting overload. Also, I don't think it ever can be larger than
the sd weight and overload is based on the number of running tasks being
greater than the number of available CPUs. Did I miss something obvious?

--
Mel Gorman
SUSE Labs

2020-03-20 16:56:21

by Vincent Guittot

Subject: Re: [PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary

On Fri, 20 Mar 2020 at 17:44, Mel Gorman <[email protected]> wrote:
>
> On Fri, Mar 20, 2020 at 04:48:39PM +0100, Vincent Guittot wrote:
> > > ---
> > > include/linux/sched/topology.h | 1 +
> > > kernel/sched/fair.c | 65 +++++++++++++++++++++++++++++++++++++++---
> > > kernel/sched/features.h | 3 ++
> > > 3 files changed, 65 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > > index af9319e4cfb9..76ec7a54f57b 100644
> > > --- a/include/linux/sched/topology.h
> > > +++ b/include/linux/sched/topology.h
> > > @@ -66,6 +66,7 @@ struct sched_domain_shared {
> > > atomic_t ref;
> > > atomic_t nr_busy_cpus;
> > > int has_idle_cores;
> > > + int is_overloaded;
> >
> > Can't nr_busy_cpus compared to sd->span_weight give you similar status ?
> >
>
> It's connected to nohz balancing and I didn't see how I could use that
> for detecting overload. Also, I don't think it ever can be larger than
> the sd weight and overload is based on the number of running tasks being
> greater than the number of available CPUs. Did I miss something obvious?

IIUC you try to estimate if there is a chance to find an idle cpu
before starting the loop and scanning the domain and abort early if
the possibility is low.

if nr_busy_cpus equals sd->span_weight it means that there is no
free cpu so there is no need to scan
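
Something along the lines of this minimal sketch, where llc_fully_busy()
is a made-up name and the assumption is that nr_busy_cpus is kept
accurate enough:

static inline bool llc_fully_busy(struct sched_domain *sd,
				  struct sched_domain_shared *sds)
{
	/* Every CPU in the LLC is busy so a scan cannot find an idle one */
	return sds && atomic_read(&sds->nr_busy_cpus) >= sd->span_weight;
}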

>
> --
> Mel Gorman
> SUSE Labs

2020-03-20 17:44:55

by Mel Gorman

Subject: Re: [PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary

On Fri, Mar 20, 2020 at 05:54:57PM +0100, Vincent Guittot wrote:
> On Fri, 20 Mar 2020 at 17:44, Mel Gorman <[email protected]> wrote:
> >
> > On Fri, Mar 20, 2020 at 04:48:39PM +0100, Vincent Guittot wrote:
> > > > ---
> > > > include/linux/sched/topology.h | 1 +
> > > > kernel/sched/fair.c | 65 +++++++++++++++++++++++++++++++++++++++---
> > > > kernel/sched/features.h | 3 ++
> > > > 3 files changed, 65 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > > > index af9319e4cfb9..76ec7a54f57b 100644
> > > > --- a/include/linux/sched/topology.h
> > > > +++ b/include/linux/sched/topology.h
> > > > @@ -66,6 +66,7 @@ struct sched_domain_shared {
> > > > atomic_t ref;
> > > > atomic_t nr_busy_cpus;
> > > > int has_idle_cores;
> > > > + int is_overloaded;
> > >
> > > Can't nr_busy_cpus compared to sd->span_weight give you similar status ?
> > >
> >
> > It's connected to nohz balancing and I didn't see how I could use that
> > for detecting overload. Also, I don't think it ever can be larger than
> > the sd weight and overload is based on the number of running tasks being
> > greater than the number of available CPUs. Did I miss something obvious?
>
> IIUC you try to estimate if there is a chance to find an idle cpu
> before starting the loop and scanning the domain and abort early if
> the possibility is low.
>
> if nr_busy_cpus equals sd->span_weight it means that there is no
> free cpu so there is no need to scan
>

Ok, I see what you are getting at but I worry there are multiple
problems there. First, the nr_busy_cpus is decremented only when a CPU
is entering idle with the tick stopped. If nohz is disabled then this
breaks, no? Secondly, a CPU can be idle but the tick not stopped if
__tick_nohz_idle_stop_tick knows there is an event in the near future,
so using nr_busy_cpus we potentially miss a sibling that was adequate
for running a task. Finally, the threshold for cutting off the search
entirely seems low. The patch marks a domain as overloaded if there are
twice as many running tasks as runqueues scanned. In that scenario, even
if tasks are rapidly switching between busy/idle, it's still unlikely
the task will go idle. When cutting off at just the fully-busy mark, we
could miss a CPU that is going idle, almost idle or is running SCHED_IDLE
tasks, which are acceptable target candidates for select_idle_sibling. I
think there are too many cases where nr_busy_cpus is problematic to
make it a good alternative.

--
Mel Gorman
SUSE Labs

2020-03-23 13:31:24

by Valentin Schneider

Subject: Re: [PATCH 1/4] sched/fair: Track efficiency of select_idle_sibling


Hi Mel,

On Fri, Mar 20 2020, Mel Gorman wrote:
> SIS Search: Number of calls to select_idle_sibling
>
> SIS Domain Search: Number of times the domain was searched because the
> fast path failed.
>
> SIS Scanned: Generally the number of runqueues scanned but the fast
> path counts as 1 regardless of the values for target, prev
> and recent.
>
> SIS Domain Scanned: Number of runqueues scanned during a search of the
> LLC domain.
>
> SIS Failures: Number of SIS calls that failed to find an idle CPU
>

Let me put my changelog pedant hat on; it would be nice to explicitly
separate the 'raw' stats (i.e. those that you are adding to sis()) from
the downstream ones.

AIUI the ones above here are the 'raw' stats (except "SIS Domain
Scanned", I'm not sure I get where this one comes from?), and the ones
below are the downstream, post-processed ones.

> SIS Search Efficiency: A ratio expressed as a percentage of runqueues
> scanned versus idle CPUs found. A 100% efficiency indicates that
> the target, prev or recent CPU of a task was idle at wakeup. The
> lower the efficiency, the more runqueues were scanned before an
> idle CPU was found.
>
> SIS Domain Search Efficiency: Similar, except only for the slower SIS
> path.
>
> SIS Fast Success Rate: Percentage of SIS that used target, prev or
> recent CPUs.
>
> SIS Success rate: Percentage of scans that found an idle CPU.
>
> Signed-off-by: Mel Gorman <[email protected]>

With the nits taken into account:

Reviewed-by: Valentin Schneider <[email protected]>

> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1dea8554ead0..9d32a81ece08 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6150,6 +6153,15 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> struct sched_domain *sd;
> int i, recent_used_cpu;
>
> + schedstat_inc(this_rq()->sis_search);
> +
> + /*
> > + * Checking prev, target and recent is treated as one scan. A
> + * perfect hit on one of those is considered 100% efficiency.
> + * Further scanning impairs efficiency.
> + */
> + schedstat_inc(this_rq()->sis_scanned);
> +

You may want to move that sis_scanned increment to below the 'symmetric'
label. Also, you should instrument select_idle_capacity() with
sis_scanned increments, if only for the sake of completeness.

One last thing: each of the new schedstat_inc() callsites uses this_rq();
IIRC because of the RELOC_HIDE() hiding underneath there's very little
chance of the compiler caching this. However, this depends on schedstat,
so I suppose that is fine.

> /*
> * For asymmetric CPU capacity systems, our domain of interest is
> * sd_asym_cpucapacity rather than sd_llc.

2020-03-23 13:31:40

by Valentin Schneider

Subject: Re: [PATCH 2/4] sched/fair: Track efficiency of task recent_used_cpu


On Fri, Mar 20 2020, Mel Gorman wrote:
> SIS Recent Used Hit: A recent CPU was eligible and used. Each hit is
> a domain search avoided.
>
> SIS Recent Used Miss: A recent CPU was eligible but unavailable. Each
> time this is hit, there was a penalty to the fast path before
> a domain search happened.
>
> SIS Recent Success Rate: A percentage of the number of hits versus
> the total attempts to use the recent CPU.
>
> SIS Recent Attempts: The total number of times the recent CPU was examined.
> A high number of Recent Attempts with a low Success Rate implies
> the fast path is being punished severely. This could have been
> presented as a weighting of hits and misses but calculating an
> appropriate weight for misses is problematic.
>

Ditto on the raw vs post-processed detail in the changelog, otherwise:

Reviewed-by: Valentin Schneider <[email protected]>

> Signed-off-by: Mel Gorman <[email protected]>

2020-03-23 13:32:08

by Valentin Schneider

Subject: Re: [PATCH 3/4] sched/fair: Clear SMT siblings after determining the core is not idle


On Fri, Mar 20 2020, Mel Gorman wrote:
> The clearing of SMT siblings from the SIS mask before checking for an idle
> core is a small but unnecessary cost. Defer the clearing of the siblings
> until the scan moves to the next potential target. The cost of this was
> not measured as it is borderline noise but it should be self-evident.
>
> Signed-off-by: Mel Gorman <[email protected]>

Reviewed-by: Valentin Schneider <[email protected]>

2020-03-23 13:58:15

by Mel Gorman

Subject: Re: [PATCH 1/4] sched/fair: Track efficiency of select_idle_sibling

On Mon, Mar 23, 2020 at 01:30:10PM +0000, Valentin Schneider wrote:
>
> Hi Mel,
>
> On Fri, Mar 20 2020, Mel Gorman wrote:
> > SIS Search: Number of calls to select_idle_sibling
> >
> > SIS Domain Search: Number of times the domain was searched because the
> > fast path failed.
> >
> > SIS Scanned: Generally the number of runqueues scanned but the fast
> > path counts as 1 regardless of the values for target, prev
> > and recent.
> >
> > SIS Domain Scanned: Number of runqueues scanned during a search of the
> > LLC domain.
> >
> > SIS Failures: Number of SIS calls that failed to find an idle CPU
> >
>
> Let me put my changelog pedant hat on; it would be nice to explicitly
> separate the 'raw' stats (i.e. those that you are adding to sis()) from
> the downstream ones.
>
> AIUI the ones above here are the 'raw' stats (except "SIS Domain
> Scanned", I'm not sure I get where this one comes from?), and the ones
> below are the downstream, post-processed ones.
>

I can fix that up.

> > SIS Search Efficiency: A ratio expressed as a percentage of runqueues
> > scanned versus idle CPUs found. A 100% efficiency indicates that
> > the target, prev or recent CPU of a task was idle at wakeup. The
> > lower the efficiency, the more runqueues were scanned before an
> > idle CPU was found.
> >
> > SIS Domain Search Efficiency: Similar, except only for the slower SIS
> > path.
> >
> > SIS Fast Success Rate: Percentage of SIS that used target, prev or
> > recent CPUs.
> >
> > SIS Success rate: Percentage of scans that found an idle CPU.
> >
> > Signed-off-by: Mel Gorman <[email protected]>
>
> With the nits taken into account:
>
> Reviewed-by: Valentin Schneider <[email protected]>
>
> > ---
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 1dea8554ead0..9d32a81ece08 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6150,6 +6153,15 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> > struct sched_domain *sd;
> > int i, recent_used_cpu;
> >
> > + schedstat_inc(this_rq()->sis_search);
> > +
> > + /*
> > > + * Checking prev, target and recent is treated as one scan. A
> > + * perfect hit on one of those is considered 100% efficiency.
> > + * Further scanning impairs efficiency.
> > + */
> > + schedstat_inc(this_rq()->sis_scanned);
> > +
>
> You may want to move that sis_scanned increment to below the 'symmetric'
> label. Also, you should instrument select_idle_capacity() with
> sis_scanned increments, if only for the sake of completeness.
>

Yes, that would make more sense. Instrumenting select_idle_capacity is
trivial so I'll fix that up too.

> One last thing: each of the new schedstat_inc() callsites uses this_rq();
> IIRC because of the RELOC_HIDE() hiding underneath there's very little
> chance of the compiler caching this. However, this depends on schedstat,
> so I suppose that is fine.
>

It's a deliberate choice so that when schedstat is disabled there is no
cost. While some schedstat sites look up the current runqueue, not all of
them do. This might be a little wasteful when schedstats are enabled but
at least it's consistent.
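
For reference, schedstat_inc() is guarded by a static branch in
kernel/sched/stats.h, roughly:

#ifdef CONFIG_SCHEDSTATS
#define schedstat_enabled()	static_branch_unlikely(&sched_schedstats)
#define schedstat_inc(var)	do { if (schedstat_enabled()) { var++; } } while (0)
#else
#define schedstat_inc(var)	do { } while (0)
#endif

so the this_rq() lookup only happens when the static branch is enabled.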

Thanks

--
Mel Gorman
SUSE Labs

2020-03-24 10:37:06

by Vincent Guittot

Subject: Re: [PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary

On Fri, 20 Mar 2020 at 18:43, Mel Gorman <[email protected]> wrote:
>
> On Fri, Mar 20, 2020 at 05:54:57PM +0100, Vincent Guittot wrote:
> > On Fri, 20 Mar 2020 at 17:44, Mel Gorman <[email protected]> wrote:
> > >
> > > On Fri, Mar 20, 2020 at 04:48:39PM +0100, Vincent Guittot wrote:
> > > > > ---
> > > > > include/linux/sched/topology.h | 1 +
> > > > > kernel/sched/fair.c | 65 +++++++++++++++++++++++++++++++++++++++---
> > > > > kernel/sched/features.h | 3 ++
> > > > > 3 files changed, 65 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > > > > index af9319e4cfb9..76ec7a54f57b 100644
> > > > > --- a/include/linux/sched/topology.h
> > > > > +++ b/include/linux/sched/topology.h
> > > > > @@ -66,6 +66,7 @@ struct sched_domain_shared {
> > > > > atomic_t ref;
> > > > > atomic_t nr_busy_cpus;
> > > > > int has_idle_cores;
> > > > > + int is_overloaded;
> > > >
> > > > Can't nr_busy_cpus compared to sd->span_weight give you similar status ?
> > > >
> > >
> > > It's connected to nohz balancing and I didn't see how I could use that
> > > for detecting overload. Also, I don't think it ever can be larger than
> > > the sd weight and overload is based on the number of running tasks being
> > > greater than the number of available CPUs. Did I miss something obvious?
> >
> > IIUC you try to estimate if there is a chance to find an idle cpu
> > before starting the loop and scanning the domain and abort early if
> > the possibility is low.
> >
> > if nr_busy_cpus equals sd->span_weight it means that there is no
> > free cpu so there is no need to scan
> >
>
> Ok, I see what you are getting at but I worry there are multiple
> problems there. First, the nr_busy_cpus is decremented only when a CPU
> is entering idle with the tick stopped. If nohz is disabled then this
> breaks, no? Secondly, a CPU can be idle but the tick not stopped if

But this can be changed if that makes the statistic useful

> __tick_nohz_idle_stop_tick knows there is an event in the near future,
> so using nr_busy_cpus we potentially miss a sibling that was adequate
> for running a task. Finally, the threshold for cutting off the search
> entirely seems low. The patch marks a domain as overloaded if there are
> twice as many running tasks as runqueues scanned. In that scenario, even
> if tasks are rapidly switching between busy/idle, it's still unlikely
> the task will go idle. When cutting off at just the fully-busy mark, we
> could miss a CPU that is going idle, almost idle or is running SCHED_IDLE
> tasks, which are acceptable target candidates for select_idle_sibling. I
> think there are too many cases where nr_busy_cpus is problematic to
> make it a good alternative.

I don't really like this patch because it adds yet another metric and
yet another feature which is set true by default. Also the current
proposal seems a bit fragile because it uses an arbitrary ratio of 2
on an arbitrary number of CPUs. This threshold probably works in your
case and your system but probably not for others and the threshold
really looks like a heuristic that works for you but without any real
meaning.

Then, the update is done at each and every task wake up and by all
CPUs in the LLC. It means that the same variable is updated
simultaneously by all CPUs: one CPU can set it and the next one might
clear it immediately because they haven't scanned the same CPUs. At
the end, 2 threads waking up simultaneously on different CPUs might
end up using 2 different policies without any other reason than a random
ordering.

I agree that the concept of detecting that an LLC domain is overloaded
can be useful to decide to skip searching for an idle cpu, but this
proposal does not seem really generic

Vincent

>
> --
> Mel Gorman
> SUSE Labs

2020-03-24 11:25:25

by Mel Gorman

Subject: Re: [PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary

On Tue, Mar 24, 2020 at 11:35:08AM +0100, Vincent Guittot wrote:
> > > > It's connected to nohz balancing and I didn't see how I could use that
> > > > for detecting overload. Also, I don't think it ever can be larger than
> > > > the sd weight and overload is based on the number of running tasks being
> > > > greater than the number of available CPUs. Did I miss something obvious?
> > >
> > > IIUC you try to estimate if there is a chance to find an idle cpu
> > > before starting the loop and scanning the domain and abort early if
> > > the possibility is low.
> > >
> > > if nr_busy_cpus equals sd->span_weight it means that there is no
> > > free cpu so there is no need to scan
> > >
> >
> > Ok, I see what you are getting at but I worry there are multiple
> > problems there. First, the nr_busy_cpus is decremented only when a CPU
> > is entering idle with the tick stopped. If nohz is disabled then this
> > breaks, no? Secondly, a CPU can be idle but the tick not stopped if
>
> But this can be changed if that makes the statistic useful
>

Hmm, tracking the number of running tasks in all cases would, I think,
end up being too costly because of the shared cache line.

> > __tick_nohz_idle_stop_tick knows there is an event in the near future,
> > so using nr_busy_cpus we potentially miss a sibling that was adequate
> > for running a task. Finally, the threshold for cutting off the search
> > entirely seems low. The patch marks a domain as overloaded if there are
> > twice as many running tasks as runqueues scanned. In that scenario, even
> > if tasks are rapidly switching between busy/idle, it's still unlikely
> > the task will go idle. When cutting off at just the fully-busy mark, we
> > could miss a CPU that is going idle, almost idle or is running SCHED_IDLE
> > tasks, which are acceptable target candidates for select_idle_sibling. I
> > think there are too many cases where nr_busy_cpus is problematic to
> > make it a good alternative.
>
> I don't really like this patch because it adds yet another metric and
> yet another feature which is set true by default. Also the current
> proposal seems a bit fragile because it uses an arbitrary ratio of 2
> on an arbitrary number of CPUs. This threshold probably works in your
> case and your system but probably not for others and the threshold
> really looks like a heuristic that works for you but without any real
> meaning.
>

I have to admit that is a possibility. The really interesting case for
other people is the transition from almost-fully-busy -> fully-busy ->
overloaded because the optimal amount to search changes at those points.
It's a minefield of hitting a regression somewhere whether you search
too much or too little.

It can also be somewhat problematic when there are multiple small LLC
caches per NUMA node. In that case a "full" search still covers a small
number of CPUs and a premature cutoff can be hurtful. This eventually
showed up when enough tests ran for long enough.

> Then, the update is done at each and every task wake up and by all
> CPUs in the LLC.

I avoided that by only doing the write when there is a state transition.
set_sd_overloaded() does not write if it's already marked overloaded.
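
Roughly this shape, as a sketch of the intent rather than the helper
exactly as posted:

static inline void
set_sd_overloaded(struct sched_domain_shared *sds, int val)
{
	/* Only dirty the shared cache line on a state transition */
	if (sds && READ_ONCE(sds->is_overloaded) != val)
		WRITE_ONCE(sds->is_overloaded, val);
}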

> It means that the same variable is updated
> simultaneously by all CPUs: one CPU can set it and the next one might
> clear it immediately because they haven't scanned the same CPUs.

That is possible.

> At
> the end, 2 threads waking up simultaneously on different CPUs might
> end up using 2 different policies without any other reason than a random
> ordering.
>

True, but I did not feel the race was extremely damaging because it
happens once per transition for up to sd_weight-1 unlucky tasks. The same
is true for the test for idle cores, where multiple tasks can search for
an idle core unnecessarily. Now, the search for an idle core could
actually cache one idle candidate it finds. It's a straightforward patch
but it hurts the fast path when the domain has spare capacity and,
similar to the overloaded state, we cannot detect spare capacity in
advance.
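
For illustration only, a hypothetical sketch of what that candidate
caching could look like inside the select_idle_core() scan loop; this
is not a posted patch:

	int idle_candidate = -1;

	for_each_cpu_wrap(core, cpus, target) {
		bool idle = true;

		for_each_cpu(cpu, cpu_smt_mask(core)) {
			if (!available_idle_cpu(cpu))
				idle = false;
			else if (idle_candidate == -1 &&
				 cpumask_test_cpu(cpu, p->cpus_ptr))
				idle_candidate = cpu;	/* remember a fallback */
		}

		if (idle)
			return core;

		cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
	}

	/* No idle core found; an idle sibling is better than nothing */
	return idle_candidate;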

> I agree that the concept of detecting that an LLC domain is overloaded
> can be useful to decide to skip searching for an idle cpu, but this
> proposal does not seem really generic
>

Ok, I can dump the patch and think of something else.

Thanks for the review!

--
Mel Gorman
SUSE Labs

2020-04-02 08:01:30

by Chen, Rong A

Subject: [sched/fair] 15e7470dfc: hackbench.throughput 11.2% improvement

Greeting,

FYI, we noticed an 11.2% improvement of hackbench.throughput due to commit:


commit: 15e7470dfc3d05f9339696ff21be8e49abea864c ("[PATCH 4/4] sched/fair: Track possibly overloaded domains and abort a scan if necessary")
url: https://github.com/0day-ci/linux/commits/Mel-Gorman/Throttle-select_idle_sibling-when-a-target-domain-is-overloaded/20200321-034938


in testcase: hackbench
on test machine: 104 threads Skylake with 192G memory
with following parameters:

nr_threads: 100%
mode: threads
ipc: pipe
cpufreq_governor: performance
ucode: 0x2000065

test-description: Hackbench is both a benchmark and a stress test for the Linux kernel scheduler.
test-url: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sched/cfs-scheduler/hackbench.c





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/ipc/kconfig/mode/nr_threads/rootfs/tbox_group/testcase/ucode:
gcc-7/performance/pipe/x86_64-rhel-7.6/threads/100%/debian-x86_64-20191114.cgz/lkp-skl-fpga01/hackbench/0x2000065

commit:
38321bdba2 ("sched/fair: Clear SMT siblings after determining the core is not idle")
15e7470dfc ("sched/fair: Track possibly overloaded domains and abort a scan if necessary")

38321bdba2e094b2 15e7470dfc3d05f9339696ff21b
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
1:4 -25% :4 dmesg.WARNING:at#for_ip_interrupt_entry/0x
:4 25% 1:4 kmsg.ipmi_si_dmi-ipmi-si.#:IRQ_index#not_found
%stddev %change %stddev
\ | \
273676 +11.2% 304249 hackbench.throughput
627.01 -3.6% 604.63 hackbench.time.elapsed_time
627.01 -3.6% 604.63 hackbench.time.elapsed_time.max
2.823e+09 -15.4% 2.389e+09 hackbench.time.involuntary_context_switches
862689 ± 2% -24.8% 648824 ± 3% hackbench.time.minor_page_faults
49062 -6.9% 45673 hackbench.time.system_time
15113 +5.1% 15887 hackbench.time.user_time
5.224e+09 -11.1% 4.645e+09 hackbench.time.voluntary_context_switches
1.747e+09 +7.1% 1.872e+09 hackbench.workload
1500 ± 28% -16.5% 1252 ± 30% meminfo.Mlocked
1.56 +0.5 2.07 ± 2% mpstat.cpu.all.idle%
2.077e+08 ± 4% +97.3% 4.098e+08 ± 11% cpuidle.C1.time
34591570 +71.4% 59276157 ± 3% cpuidle.C1.usage
1.417e+08 +28.3% 1.818e+08 cpuidle.POLL.time
60876038 +15.6% 70382270 cpuidle.POLL.usage
84269509 +18.4% 99747407 numa-numastat.node0.local_node
84269605 +18.4% 99760865 numa-numastat.node0.numa_hit
89134191 +14.6% 1.021e+08 numa-numastat.node1.local_node
89167819 +14.6% 1.022e+08 numa-numastat.node1.numa_hit
918.75 ± 5% +14.2% 1049 ± 9% slabinfo.pool_workqueue.active_objs
918.75 ± 5% +14.8% 1054 ± 9% slabinfo.pool_workqueue.num_objs
4528 -10.8% 4040 ± 5% slabinfo.skbuff_head_cache.active_objs
4528 -10.8% 4040 ± 5% slabinfo.skbuff_head_cache.num_objs
74.75 -3.7% 72.00 vmstat.cpu.sy
23.00 +8.7% 25.00 vmstat.cpu.us
12946159 -8.7% 11823064 vmstat.system.cs
996012 -11.2% 884657 vmstat.system.in
373.75 ± 28% -16.1% 313.75 ± 30% proc-vmstat.nr_mlock
666544 ± 2% -32.9% 447123 ± 4% proc-vmstat.numa_hint_faults
575714 ± 3% -39.9% 345813 ± 4% proc-vmstat.numa_hint_faults_local
1.735e+08 +16.4% 2.019e+08 proc-vmstat.numa_hit
1.734e+08 +16.4% 2.019e+08 proc-vmstat.numa_local
76981 ± 7% +20.0% 92403 ± 7% proc-vmstat.numa_pages_migrated
733440 ± 2% -25.0% 550124 ± 6% proc-vmstat.numa_pte_updates
1.737e+08 +16.4% 2.022e+08 proc-vmstat.pgalloc_normal
2337874 -9.2% 2122624 proc-vmstat.pgfault
1.737e+08 +16.4% 2.022e+08 proc-vmstat.pgfree
76981 ± 7% +20.0% 92403 ± 7% proc-vmstat.pgmigrate_success
58825 ± 13% -45.5% 32052 ± 40% numa-vmstat.node0.nr_active_anon
58369 ± 12% -45.5% 31829 ± 40% numa-vmstat.node0.nr_anon_pages
14230 ± 7% -21.1% 11233 ± 14% numa-vmstat.node0.nr_slab_reclaimable
58825 ± 13% -45.5% 32052 ± 40% numa-vmstat.node0.nr_zone_active_anon
42715469 +18.3% 50549181 numa-vmstat.node0.numa_hit
42612066 +18.3% 50398765 numa-vmstat.node0.numa_local
34729 ± 21% +76.8% 61416 ± 19% numa-vmstat.node1.nr_active_anon
14918 ± 51% +179.9% 41748 ± 30% numa-vmstat.node1.nr_anon_pages
8181 ± 13% +37.1% 11217 ± 15% numa-vmstat.node1.nr_slab_reclaimable
34729 ± 21% +76.8% 61416 ± 19% numa-vmstat.node1.nr_zone_active_anon
45590871 +13.9% 51910403 numa-vmstat.node1.numa_hit
45494598 +14.0% 51861137 numa-vmstat.node1.numa_local
235466 ± 13% -45.5% 128292 ± 40% numa-meminfo.node0.Active
235339 ± 13% -45.6% 128136 ± 40% numa-meminfo.node0.Active(anon)
173424 ± 14% -51.7% 83692 ± 45% numa-meminfo.node0.AnonHugePages
233511 ± 12% -45.5% 127246 ± 40% numa-meminfo.node0.AnonPages
56927 ± 7% -21.1% 44927 ± 13% numa-meminfo.node0.KReclaimable
1298360 ± 4% -14.2% 1113777 ± 8% numa-meminfo.node0.MemUsed
56927 ± 7% -21.1% 44927 ± 13% numa-meminfo.node0.SReclaimable
210815 ± 6% -19.9% 168941 ± 18% numa-meminfo.node0.Slab
139001 ± 21% +76.7% 245567 ± 20% numa-meminfo.node1.Active
138924 ± 21% +76.7% 245519 ± 20% numa-meminfo.node1.Active(anon)
31549 ± 80% +278.9% 119529 ± 31% numa-meminfo.node1.AnonHugePages
59680 ± 51% +179.9% 167018 ± 30% numa-meminfo.node1.AnonPages
32728 ± 13% +37.1% 44871 ± 15% numa-meminfo.node1.KReclaimable
1069909 ± 4% +17.6% 1257928 ± 7% numa-meminfo.node1.MemUsed
32728 ± 13% +37.1% 44871 ± 15% numa-meminfo.node1.SReclaimable
79815 ± 43% +111.6% 168859 ± 14% sched_debug.cfs_rq:/.MIN_vruntime.avg
746616 ± 44% +71.8% 1282474 ± 11% sched_debug.cfs_rq:/.MIN_vruntime.stddev
79815 ± 43% +111.6% 168859 ± 14% sched_debug.cfs_rq:/.max_vruntime.avg
746616 ± 44% +71.8% 1282474 ± 11% sched_debug.cfs_rq:/.max_vruntime.stddev
0.36 ± 30% -68.1% 0.12 ± 64% sched_debug.cfs_rq:/.nr_running.min
0.10 ± 18% +45.4% 0.14 ± 16% sched_debug.cfs_rq:/.nr_running.stddev
9360 ± 3% -14.8% 7970 ± 5% sched_debug.cfs_rq:/.runnable_avg.avg
17159 ± 2% -11.1% 15252 ± 4% sched_debug.cfs_rq:/.runnable_avg.max
10.95 ± 29% -40.7% 6.49 ± 38% sched_debug.cfs_rq:/.util_est_enqueued.min
15687 ± 45% -63.7% 5690 ± 32% sched_debug.cpu.curr->pid.min
3102 ± 27% +40.8% 4368 ± 12% sched_debug.cpu.curr->pid.stddev
10.01 ± 4% -12.0% 8.80 ± 4% sched_debug.cpu.nr_running.avg
0.59 ± 17% -41.2% 0.35 ± 37% sched_debug.cpu.nr_running.min
37876581 -11.4% 33570613 ± 3% sched_debug.cpu.nr_switches.avg
35602169 -13.2% 30897790 ± 4% sched_debug.cpu.nr_switches.min
4156219 +100.0% 8313659 sched_debug.sysctl_sched.sysctl_sched_features
10.52 -20.3% 8.39 perf-stat.i.MPKI
2.32 ± 3% +0.9 3.17 perf-stat.i.cache-miss-rate%
31427239 ± 3% +8.6% 34117986 perf-stat.i.cache-misses
1.402e+09 -20.6% 1.113e+09 perf-stat.i.cache-references
12967384 -8.7% 11840723 perf-stat.i.context-switches
813380 -29.9% 570390 perf-stat.i.cpu-migrations
9187 ± 3% -9.2% 8338 perf-stat.i.cycles-between-cache-misses
0.15 +0.0 0.16 perf-stat.i.dTLB-load-miss-rate%
59231906 +10.5% 65470826 perf-stat.i.dTLB-load-misses
4.024e+10 +1.4% 4.078e+10 perf-stat.i.dTLB-loads
2.43e+10 +1.3% 2.461e+10 perf-stat.i.dTLB-stores
96.03 -1.4 94.65 perf-stat.i.iTLB-load-miss-rate%
58608410 +9.2% 64015009 ± 2% perf-stat.i.iTLB-load-misses
2265582 ± 3% +46.3% 3313848 ± 4% perf-stat.i.iTLB-loads
2366 ± 2% -7.0% 2201 perf-stat.i.instructions-per-iTLB-miss
3691 -5.7% 3480 perf-stat.i.minor-faults
73.23 -2.5 70.70 perf-stat.i.node-load-miss-rate%
5437363 ± 3% -7.3% 5042872 ± 2% perf-stat.i.node-load-misses
1661376 +5.9% 1760010 ± 2% perf-stat.i.node-loads
3842299 +10.0% 4227782 perf-stat.i.node-stores
3691 -5.7% 3480 perf-stat.i.page-faults
10.40 -20.9% 8.22 perf-stat.overall.MPKI
2.24 ± 3% +0.8 3.06 perf-stat.overall.cache-miss-rate%
8874 ± 3% -8.3% 8137 perf-stat.overall.cycles-between-cache-misses
0.15 +0.0 0.16 perf-stat.overall.dTLB-load-miss-rate%
96.34 -1.2 95.14 perf-stat.overall.iTLB-load-miss-rate%
2295 -8.2% 2107 perf-stat.overall.instructions-per-iTLB-miss
76.36 -2.5 73.89 perf-stat.overall.node-load-miss-rate%
48328 -9.7% 43622 perf-stat.overall.path-length
31303400 ± 3% +8.6% 33992823 perf-stat.ps.cache-misses
1.398e+09 -20.7% 1.109e+09 perf-stat.ps.cache-references
12959378 -8.7% 11834104 perf-stat.ps.context-switches
805274 -30.1% 563237 perf-stat.ps.cpu-migrations
59231763 +10.5% 65460523 perf-stat.ps.dTLB-load-misses
4.023e+10 +1.3% 4.077e+10 perf-stat.ps.dTLB-loads
2.43e+10 +1.3% 2.461e+10 perf-stat.ps.dTLB-stores
58626745 +9.2% 64026396 ± 2% perf-stat.ps.iTLB-load-misses
2227130 ± 4% +47.0% 3273576 ± 4% perf-stat.ps.iTLB-loads
3651 -6.0% 3432 perf-stat.ps.minor-faults
5369350 ± 3% -7.2% 4980780 ± 2% perf-stat.ps.node-load-misses
1661038 +6.0% 1760000 ± 2% perf-stat.ps.node-loads
3850090 +9.9% 4232834 perf-stat.ps.node-stores
3651 -6.0% 3432 perf-stat.ps.page-faults
8.444e+13 -3.3% 8.166e+13 perf-stat.total.instructions
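
The perf-stat.* rows above map onto ordinary perf events: the .i rows
appear to be interval samples taken during the run, .overall rows are
whole-run aggregates, and .ps rows are per-second rates. A minimal
sketch of gathering a comparable subset by hand, assuming a stock perf
binary (the event names below are generic perf events, not
LKP-specific):

  perf stat -a \
    -e cycles,instructions,cache-references,cache-misses \
    -e context-switches,cpu-migrations,page-faults \
    -e dTLB-load-misses,iTLB-load-misses \
    -- sleep 10
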
29801 ± 2% +13.3% 33780 ± 3% softirqs.CPU0.SCHED
24949 ± 5% +15.2% 28741 ± 2% softirqs.CPU1.SCHED
22867 ± 2% +21.7% 27840 ± 2% softirqs.CPU10.SCHED
27954 +10.9% 30992 ± 2% softirqs.CPU100.SCHED
22972 +20.0% 27565 ± 2% softirqs.CPU11.SCHED
22523 ± 2% +21.8% 27433 softirqs.CPU12.SCHED
23390 ± 2% +14.9% 26880 softirqs.CPU13.SCHED
23224 +20.2% 27915 softirqs.CPU14.SCHED
23261 ± 3% +20.2% 27951 ± 2% softirqs.CPU15.SCHED
22713 +20.9% 27466 softirqs.CPU16.SCHED
22639 +20.1% 27189 softirqs.CPU17.SCHED
22386 +20.1% 26882 ± 2% softirqs.CPU18.SCHED
22737 +20.0% 27276 softirqs.CPU19.SCHED
23520 ± 3% +20.2% 28278 softirqs.CPU2.SCHED
22922 ± 2% +18.2% 27104 softirqs.CPU20.SCHED
23363 +19.5% 27929 softirqs.CPU21.SCHED
23496 +18.5% 27836 softirqs.CPU22.SCHED
23528 +18.2% 27800 softirqs.CPU23.SCHED
23138 +19.2% 27582 softirqs.CPU24.SCHED
23168 ± 3% +20.4% 27898 softirqs.CPU25.SCHED
27789 ± 5% +14.1% 31717 ± 6% softirqs.CPU26.SCHED
28112 ± 3% +11.3% 31295 ± 4% softirqs.CPU29.SCHED
23800 ± 2% +18.1% 28114 softirqs.CPU3.SCHED
28084 ± 3% +11.7% 31359 ± 2% softirqs.CPU30.SCHED
28181 +9.7% 30928 ± 2% softirqs.CPU33.SCHED
28344 ± 2% +10.1% 31205 ± 2% softirqs.CPU34.SCHED
28080 ± 2% +10.2% 30943 ± 3% softirqs.CPU35.SCHED
27932 +9.8% 30682 ± 2% softirqs.CPU36.SCHED
27619 ± 3% +8.9% 30080 ± 3% softirqs.CPU37.SCHED
23274 +21.6% 28303 softirqs.CPU4.SCHED
23106 ± 2% +19.6% 27626 softirqs.CPU5.SCHED
23087 +20.4% 27800 softirqs.CPU52.SCHED
23440 ± 2% +19.5% 28017 softirqs.CPU53.SCHED
23342 +20.0% 28016 softirqs.CPU54.SCHED
23873 +17.2% 27986 softirqs.CPU55.SCHED
23333 +20.0% 27988 ± 2% softirqs.CPU56.SCHED
22864 ± 2% +22.1% 27921 ± 2% softirqs.CPU57.SCHED
22271 +22.4% 27250 softirqs.CPU58.SCHED
22998 +20.6% 27738 ± 2% softirqs.CPU59.SCHED
22402 +21.3% 27175 softirqs.CPU6.SCHED
23772 +17.6% 27948 ± 2% softirqs.CPU60.SCHED
23622 ± 2% +19.3% 28192 ± 2% softirqs.CPU61.SCHED
22977 +20.7% 27732 softirqs.CPU62.SCHED
22918 ± 2% +20.2% 27556 ± 2% softirqs.CPU63.SCHED
23058 +20.0% 27666 softirqs.CPU64.SCHED
23381 +17.3% 27433 ± 2% softirqs.CPU65.SCHED
23368 +19.5% 27917 softirqs.CPU66.SCHED
23433 +19.7% 28043 ± 2% softirqs.CPU67.SCHED
23179 +18.5% 27456 softirqs.CPU68.SCHED
23121 +17.9% 27263 softirqs.CPU69.SCHED
23342 +16.9% 27277 softirqs.CPU7.SCHED
22942 +18.4% 27173 softirqs.CPU70.SCHED
22866 +20.5% 27545 softirqs.CPU71.SCHED
23203 ± 2% +17.7% 27321 softirqs.CPU72.SCHED
23678 +16.4% 27556 softirqs.CPU73.SCHED
23867 +16.2% 27738 ± 2% softirqs.CPU74.SCHED
23760 +18.1% 28050 softirqs.CPU75.SCHED
23426 +18.0% 27642 ± 2% softirqs.CPU76.SCHED
23248 ± 3% +20.6% 28044 softirqs.CPU77.SCHED
23421 +19.7% 28032 softirqs.CPU8.SCHED
23822 +16.8% 27813 ± 2% softirqs.CPU9.SCHED
27090 ± 2% +11.7% 30247 ± 3% softirqs.CPU98.SCHED
2666040 +13.7% 3030860 softirqs.SCHED
15.91 ± 2% -4.3 11.64 ± 4% perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_write.new_sync_write.vfs_write.ksys_write
15.39 ± 2% -4.1 11.25 ± 4% perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write.vfs_write
15.13 ± 2% -4.1 11.03 ± 4% perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write
14.73 ± 2% -4.0 10.71 ± 4% perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write
25.73 -3.2 22.48 perf-profile.calltrace.cycles-pp.pipe_write.new_sync_write.vfs_write.ksys_write.do_syscall_64
26.64 -3.1 23.55 perf-profile.calltrace.cycles-pp.new_sync_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
78.57 -2.7 75.92 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
29.46 -2.6 26.83 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
79.47 -2.6 76.91 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
31.96 -2.4 29.54 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
10.31 ± 2% -2.2 8.10 ± 4% perf-profile.calltrace.cycles-pp.schedule.pipe_read.new_sync_read.vfs_read.ksys_read
10.11 ± 2% -2.2 7.92 ± 4% perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.pipe_read.new_sync_read.vfs_read
21.79 -1.7 20.09 perf-profile.calltrace.cycles-pp.pipe_read.new_sync_read.vfs_read.ksys_read.do_syscall_64
22.74 -1.6 21.12 perf-profile.calltrace.cycles-pp.new_sync_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.77 ± 2% -1.5 2.28 ± 4% perf-profile.calltrace.cycles-pp.select_task_rq_fair.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
2.63 ± 2% -1.3 1.38 ± 6% perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.try_to_wake_up.autoremove_wake_function.__wake_up_common
5.01 ± 3% -1.3 3.76 ± 5% perf-profile.calltrace.cycles-pp.dequeue_task_fair.__sched_text_start.schedule.pipe_read.new_sync_read
26.69 -1.1 25.57 perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.92 ± 3% -1.1 4.83 ± 6% perf-profile.calltrace.cycles-pp.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
6.07 ± 3% -1.0 5.08 ± 4% perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
6.03 ± 3% -1.0 5.05 ± 4% perf-profile.calltrace.cycles-pp.activate_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
28.78 -1.0 27.81 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.61 -0.5 2.06 ± 3% perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up
2.39 ± 4% -0.5 1.87 ± 3% perf-profile.calltrace.cycles-pp.schedule.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.50 ± 3% -0.5 1.98 ± 4% perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.28 ± 4% -0.5 1.78 ± 4% perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.17 ± 7% -0.5 0.70 ± 12% perf-profile.calltrace.cycles-pp.update_cfs_group.dequeue_task_fair.__sched_text_start.schedule.pipe_read
1.22 ± 8% -0.5 0.75 ± 12% perf-profile.calltrace.cycles-pp.update_cfs_group.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up
2.27 ± 2% -0.4 1.82 ± 3% perf-profile.calltrace.cycles-pp._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
1.74 ± 2% -0.3 1.40 ± 4% perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__sched_text_start.schedule.pipe_read
1.90 ± 2% -0.3 1.56 ± 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common
1.02 ± 2% -0.3 0.74 ± 4% perf-profile.calltrace.cycles-pp._raw_spin_lock.__sched_text_start.schedule.pipe_read.new_sync_read
1.45 ± 2% -0.3 1.19 ± 2% perf-profile.calltrace.cycles-pp.switch_fpu_return.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.50 ± 2% -0.2 1.25 ± 3% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__sched_text_start.schedule.pipe_read.new_sync_read
1.33 -0.2 1.09 ± 3% perf-profile.calltrace.cycles-pp.prepare_to_wait_event.pipe_read.new_sync_read.vfs_read.ksys_read
1.18 ± 3% -0.2 0.94 ± 4% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__sched_text_start.schedule.exit_to_usermode_loop.do_syscall_64
0.84 ± 3% -0.2 0.63 ± 4% perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate
0.86 ± 2% -0.2 0.67 ± 5% perf-profile.calltrace.cycles-pp.reweight_entity.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up
0.76 ± 2% -0.2 0.59 ± 5% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__sched_text_start.schedule.pipe_read
0.80 ± 2% -0.2 0.62 ± 2% perf-profile.calltrace.cycles-pp.reweight_entity.dequeue_task_fair.__sched_text_start.schedule.pipe_read
0.92 -0.2 0.75 ± 2% perf-profile.calltrace.cycles-pp.ttwu_do_wakeup.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
0.73 ± 4% -0.2 0.57 ± 4% perf-profile.calltrace.cycles-pp.update_load_avg.dequeue_entity.dequeue_task_fair.__sched_text_start.schedule
0.83 -0.1 0.68 ± 2% perf-profile.calltrace.cycles-pp.check_preempt_curr.ttwu_do_wakeup.try_to_wake_up.autoremove_wake_function.__wake_up_common
0.88 ± 2% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.__switch_to_asm
0.91 -0.1 0.79 ± 2% perf-profile.calltrace.cycles-pp.__switch_to
0.72 ± 2% -0.1 0.61 ± 2% perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__sched_text_start.schedule.pipe_read
0.79 ± 2% -0.1 0.68 ± 2% perf-profile.calltrace.cycles-pp.native_write_msr
0.52 +0.0 0.56 ± 3% perf-profile.calltrace.cycles-pp.current_time.file_update_time.pipe_write.new_sync_write.vfs_write
1.63 ± 2% +0.1 1.69 perf-profile.calltrace.cycles-pp.mutex_lock.pipe_write.new_sync_write.vfs_write.ksys_write
1.35 +0.1 1.42 perf-profile.calltrace.cycles-pp.mutex_lock.pipe_read.new_sync_read.vfs_read.ksys_read
1.17 ± 3% +0.1 1.26 perf-profile.calltrace.cycles-pp.mutex_unlock.pipe_write.new_sync_write.vfs_write.ksys_write
0.97 +0.1 1.07 ± 2% perf-profile.calltrace.cycles-pp.file_update_time.pipe_write.new_sync_write.vfs_write.ksys_write
1.51 +0.1 1.61 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout.copy_page_to_iter.pipe_read.new_sync_read
1.49 ± 2% +0.1 1.59 ± 2% perf-profile.calltrace.cycles-pp.__fget_files.__fget_light.__fdget_pos.ksys_write.do_syscall_64
1.22 +0.1 1.32 ± 2% perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.92 +0.1 1.02 ± 2% perf-profile.calltrace.cycles-pp.__fget_files.__fget_light.__fdget_pos.ksys_read.do_syscall_64
1.24 +0.1 1.34 ± 2% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.71 ± 2% +0.1 1.82 ± 2% perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.74 ± 2% +0.1 1.85 ± 3% perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.03 +0.1 1.14 perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_read.ksys_read.do_syscall_64
0.71 +0.1 0.84 ± 3% perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.pipe_read.new_sync_read.vfs_read
1.68 +0.1 1.82 perf-profile.calltrace.cycles-pp.copyout.copy_page_to_iter.pipe_read.new_sync_read.vfs_read
0.98 +0.1 1.12 ± 2% perf-profile.calltrace.cycles-pp.touch_atime.pipe_read.new_sync_read.vfs_read.ksys_read
1.11 ± 2% +0.1 1.25 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyin.copy_page_from_iter.pipe_write.new_sync_write
1.04 +0.1 1.19 perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_write.ksys_write.do_syscall_64
1.27 ± 2% +0.2 1.44 perf-profile.calltrace.cycles-pp.copyin.copy_page_from_iter.pipe_write.new_sync_write.vfs_write
0.96 +0.2 1.14 ± 2% perf-profile.calltrace.cycles-pp.fsnotify.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.73 ± 2% +0.2 0.92 ± 2% perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.pipe_write.new_sync_write.vfs_write
0.98 ± 3% +0.2 1.18 ± 2% perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_read
0.76 ± 2% +0.2 0.96 ± 2% perf-profile.calltrace.cycles-pp.schedule.pipe_write.new_sync_write.vfs_write.ksys_write
1.01 ± 3% +0.2 1.21 perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_read.new_sync_read
1.35 +0.2 1.56 perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.02 ± 2% +0.2 1.24 ± 2% perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_read.new_sync_read.vfs_read
1.10 ± 2% +0.2 1.33 ± 2% perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_read.new_sync_read.vfs_read.ksys_read
1.79 +0.3 2.06 perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.65 +0.3 2.96 perf-profile.calltrace.cycles-pp.copy_page_from_iter.pipe_write.new_sync_write.vfs_write.ksys_write
3.18 +0.4 3.54 perf-profile.calltrace.cycles-pp.copy_page_to_iter.pipe_read.new_sync_read.vfs_read.ksys_read
0.87 ± 5% +0.5 1.40 ± 2% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
0.87 ± 5% +0.5 1.39 ± 2% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
0.87 ± 5% +0.5 1.40 ± 2% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
0.00 +0.5 0.53 ± 2% perf-profile.calltrace.cycles-pp.__might_fault.copy_page_to_iter.pipe_read.new_sync_read.vfs_read
0.00 +0.5 0.53 ± 2% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
0.88 ± 5% +0.5 1.41 ± 2% perf-profile.calltrace.cycles-pp.secondary_startup_64
0.00 +0.9 0.87 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
0.00 +0.9 0.87 perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
7.36 +1.0 8.32 ± 2% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
8.02 +1.3 9.29 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
17.02 ± 2% -4.0 12.98 ± 3% perf-profile.children.cycles-pp.__wake_up_common_lock
16.42 ± 2% -3.9 12.49 ± 3% perf-profile.children.cycles-pp.__wake_up_common
16.14 ± 2% -3.9 12.26 ± 3% perf-profile.children.cycles-pp.autoremove_wake_function
15.85 ± 2% -3.9 11.99 ± 3% perf-profile.children.cycles-pp.try_to_wake_up
25.80 -3.2 22.56 perf-profile.children.cycles-pp.pipe_write
26.69 -3.1 23.59 perf-profile.children.cycles-pp.new_sync_write
78.69 -2.6 76.04 perf-profile.children.cycles-pp.do_syscall_64
29.50 -2.6 26.88 perf-profile.children.cycles-pp.vfs_write
79.52 -2.6 76.96 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
13.54 ± 2% -2.5 11.00 ± 3% perf-profile.children.cycles-pp.schedule
13.76 ± 2% -2.5 11.29 ± 3% perf-profile.children.cycles-pp.__sched_text_start
31.98 -2.4 29.58 perf-profile.children.cycles-pp.ksys_write
21.89 -1.7 20.20 perf-profile.children.cycles-pp.pipe_read
22.80 -1.6 21.18 perf-profile.children.cycles-pp.new_sync_read
6.55 ± 2% -1.4 5.11 ± 4% perf-profile.children.cycles-pp.ttwu_do_activate
6.52 ± 2% -1.4 5.08 ± 4% perf-profile.children.cycles-pp.activate_task
4.06 -1.4 2.62 ± 4% perf-profile.children.cycles-pp.select_task_rq_fair
6.39 ± 2% -1.4 4.99 ± 4% perf-profile.children.cycles-pp.enqueue_task_fair
2.90 -1.2 1.66 ± 4% perf-profile.children.cycles-pp.select_idle_sibling
5.41 ± 3% -1.2 4.23 ± 4% perf-profile.children.cycles-pp.dequeue_task_fair
26.74 -1.1 25.62 perf-profile.children.cycles-pp.vfs_read
28.82 -1.0 27.85 perf-profile.children.cycles-pp.ksys_read
2.79 ± 7% -0.9 1.85 ± 10% perf-profile.children.cycles-pp.update_cfs_group
3.88 -0.8 3.06 ± 4% perf-profile.children.cycles-pp._raw_spin_lock
3.88 ± 2% -0.7 3.17 ± 3% perf-profile.children.cycles-pp.update_load_avg
1.05 ± 2% -0.6 0.47 ± 6% perf-profile.children.cycles-pp.available_idle_cpu
3.21 -0.6 2.66 ± 4% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
2.56 ± 3% -0.5 2.04 ± 4% perf-profile.children.cycles-pp.exit_to_usermode_loop
2.83 -0.5 2.32 ± 2% perf-profile.children.cycles-pp.enqueue_entity
2.71 -0.5 2.24 ± 3% perf-profile.children.cycles-pp.update_curr
3.10 ± 2% -0.4 2.67 ± 3% perf-profile.children.cycles-pp.pick_next_task_fair
1.84 ± 2% -0.3 1.50 ± 3% perf-profile.children.cycles-pp.reweight_entity
1.89 ± 2% -0.3 1.58 ± 4% perf-profile.children.cycles-pp.dequeue_entity
0.35 ± 2% -0.3 0.09 ± 9% perf-profile.children.cycles-pp.cpumask_next_wrap
1.46 ± 2% -0.3 1.20 ± 2% perf-profile.children.cycles-pp.switch_fpu_return
1.29 ± 2% -0.2 1.08 ± 3% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
1.48 -0.2 1.27 ± 3% perf-profile.children.cycles-pp.prepare_to_wait_event
1.49 -0.2 1.29 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.68 ± 2% -0.2 0.47 ± 2% perf-profile.children.cycles-pp.account_entity_dequeue
1.33 -0.2 1.16 ± 3% perf-profile.children.cycles-pp.__switch_to
0.92 -0.2 0.75 ± 4% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.96 ± 3% -0.2 0.79 ± 3% perf-profile.children.cycles-pp.update_rq_clock
0.33 -0.2 0.18 ± 4% perf-profile.children.cycles-pp._find_next_bit
0.64 ± 3% -0.2 0.48 ± 2% perf-profile.children.cycles-pp.___perf_sw_event
0.98 -0.1 0.84 ± 2% perf-profile.children.cycles-pp.ttwu_do_wakeup
1.15 -0.1 1.02 ± 2% perf-profile.children.cycles-pp.set_next_entity
0.90 -0.1 0.76 perf-profile.children.cycles-pp.check_preempt_curr
0.97 ± 2% -0.1 0.85 perf-profile.children.cycles-pp.__update_load_avg_se
0.26 ± 8% -0.1 0.13 ± 17% perf-profile.children.cycles-pp.set_task_cpu
0.46 ± 3% -0.1 0.34 ± 3% perf-profile.children.cycles-pp.sched_clock_cpu
0.90 ± 3% -0.1 0.78 ± 2% perf-profile.children.cycles-pp.__switch_to_asm
0.60 ± 3% -0.1 0.48 ± 2% perf-profile.children.cycles-pp.pick_next_entity
0.29 ± 2% -0.1 0.18 ± 6% perf-profile.children.cycles-pp.__mutex_unlock_slowpath
0.42 ± 3% -0.1 0.31 ± 3% perf-profile.children.cycles-pp.sched_clock
0.86 ± 2% -0.1 0.76 ± 2% perf-profile.children.cycles-pp.native_write_msr
0.67 -0.1 0.57 ± 2% perf-profile.children.cycles-pp.check_preempt_wakeup
0.40 ± 2% -0.1 0.30 ± 4% perf-profile.children.cycles-pp.native_sched_clock
0.53 -0.1 0.42 ± 3% perf-profile.children.cycles-pp.__enqueue_entity
0.49 ± 2% -0.1 0.41 ± 3% perf-profile.children.cycles-pp.update_min_vruntime
0.49 ± 3% -0.1 0.41 ± 4% perf-profile.children.cycles-pp.__list_del_entry_valid
0.55 ± 2% -0.1 0.48 ± 4% perf-profile.children.cycles-pp.put_prev_entity
0.58 ± 2% -0.1 0.51 ± 2% perf-profile.children.cycles-pp.__calc_delta
0.15 ± 4% -0.1 0.08 ± 13% perf-profile.children.cycles-pp.migrate_task_rq_fair
0.33 ± 5% -0.1 0.27 ± 8% perf-profile.children.cycles-pp.cpuacct_charge
0.16 ± 5% -0.0 0.11 ± 9% perf-profile.children.cycles-pp.cpus_share_cache
0.38 ± 3% -0.0 0.33 ± 3% perf-profile.children.cycles-pp.account_entity_enqueue
0.14 ± 5% -0.0 0.09 ± 11% perf-profile.children.cycles-pp.wake_up_q
0.24 ± 3% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.clear_buddies
0.30 ± 3% -0.0 0.25 ± 4% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.16 ± 2% -0.0 0.12 ± 7% perf-profile.children.cycles-pp.resched_curr
0.18 ± 3% -0.0 0.14 ± 5% perf-profile.children.cycles-pp.update_cfs_rq_h_load
0.22 ± 4% -0.0 0.19 ± 6% perf-profile.children.cycles-pp.set_next_buddy
0.21 -0.0 0.18 ± 4% perf-profile.children.cycles-pp.rb_insert_color
0.30 ± 2% -0.0 0.28 ± 3% perf-profile.children.cycles-pp.anon_pipe_buf_release
0.20 ± 2% -0.0 0.17 ± 4% perf-profile.children.cycles-pp.cpumask_next
0.13 ± 3% -0.0 0.10 ± 4% perf-profile.children.cycles-pp.wakeup_preempt_entity
0.08 -0.0 0.06 ± 7% perf-profile.children.cycles-pp.get_xsave_addr
0.17 ± 4% -0.0 0.15 ± 3% perf-profile.children.cycles-pp.finish_wait
0.16 ± 2% -0.0 0.14 ± 3% perf-profile.children.cycles-pp.__list_add_valid
0.12 ± 4% -0.0 0.11 ± 3% perf-profile.children.cycles-pp.rb_next
0.07 -0.0 0.06 perf-profile.children.cycles-pp.perf_swevent_event
0.06 -0.0 0.05 perf-profile.children.cycles-pp.kill_fasync
0.37 +0.0 0.40 ± 2% perf-profile.children.cycles-pp.__x64_sys_read
0.15 ± 5% +0.0 0.18 ± 3% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
0.18 ± 2% +0.0 0.21 ± 2% perf-profile.children.cycles-pp.iov_iter_init
0.06 ± 7% +0.0 0.09 ± 4% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.30 ± 2% +0.0 0.33 ± 5% perf-profile.children.cycles-pp.update_process_times
0.24 +0.0 0.27 ± 6% perf-profile.children.cycles-pp.schedule_idle
0.20 ± 2% +0.0 0.24 ± 3% perf-profile.children.cycles-pp.fpregs_assert_state_consistent
0.09 ± 7% +0.0 0.13 ± 5% perf-profile.children.cycles-pp.menu_select
0.54 +0.0 0.58 ± 2% perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
0.03 ±100% +0.0 0.07 ± 6% perf-profile.children.cycles-pp.tick_nohz_next_event
0.18 ± 2% +0.0 0.23 ± 6% perf-profile.children.cycles-pp.__sb_start_write
0.34 ± 3% +0.0 0.39 perf-profile.children.cycles-pp.generic_pipe_buf_confirm
0.01 ±173% +0.0 0.06 ± 7% perf-profile.children.cycles-pp.get_next_timer_interrupt
0.24 +0.0 0.28 ± 3% perf-profile.children.cycles-pp.__sb_end_write
0.39 ± 2% +0.0 0.43 perf-profile.children.cycles-pp.aa_file_perm
0.26 +0.0 0.31 ± 2% perf-profile.children.cycles-pp.rcu_all_qs
1.01 +0.1 1.06 ± 2% perf-profile.children.cycles-pp.__might_sleep
0.35 ± 3% +0.1 0.41 perf-profile.children.cycles-pp.__x64_sys_write
0.78 +0.1 0.84 perf-profile.children.cycles-pp.fput_many
0.75 +0.1 0.82 perf-profile.children.cycles-pp._cond_resched
0.24 ± 7% +0.1 0.32 ± 5% perf-profile.children.cycles-pp.poll_idle
0.44 +0.1 0.53 ± 2% perf-profile.children.cycles-pp.apparmor_file_permission
0.99 +0.1 1.09 ± 2% perf-profile.children.cycles-pp.file_update_time
1.64 +0.1 1.75 perf-profile.children.cycles-pp.___might_sleep
1.01 +0.1 1.12 ± 2% perf-profile.children.cycles-pp.current_time
1.64 ± 2% +0.1 1.75 perf-profile.children.cycles-pp.mutex_unlock
0.72 +0.1 0.85 ± 3% perf-profile.children.cycles-pp.atime_needs_update
3.06 +0.1 3.19 perf-profile.children.cycles-pp.mutex_lock
1.69 +0.1 1.83 perf-profile.children.cycles-pp.copyout
0.83 +0.1 0.97 perf-profile.children.cycles-pp.__might_fault
0.99 +0.1 1.13 ± 2% perf-profile.children.cycles-pp.touch_atime
1.29 ± 2% +0.2 1.46 perf-profile.children.cycles-pp.copyin
1.02 ± 3% +0.2 1.20 ± 4% perf-profile.children.cycles-pp.__fsnotify_parent
2.44 +0.2 2.64 ± 2% perf-profile.children.cycles-pp.__fget_files
2.96 +0.2 3.16 ± 2% perf-profile.children.cycles-pp.__fget_light
1.41 ± 2% +0.2 1.62 ± 3% perf-profile.children.cycles-pp.fsnotify
3.03 +0.2 3.25 ± 2% perf-profile.children.cycles-pp.__fdget_pos
2.08 +0.3 2.34 perf-profile.children.cycles-pp.common_file_perm
2.82 +0.3 3.10 perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
2.75 +0.3 3.05 perf-profile.children.cycles-pp.copy_page_from_iter
0.20 ± 7% +0.3 0.54 ± 2% perf-profile.children.cycles-pp.intel_idle
3.23 +0.4 3.60 perf-profile.children.cycles-pp.copy_page_to_iter
0.45 ± 7% +0.4 0.88 perf-profile.children.cycles-pp.cpuidle_enter_state
0.45 ± 7% +0.4 0.88 perf-profile.children.cycles-pp.cpuidle_enter
3.17 +0.5 3.67 perf-profile.children.cycles-pp.security_file_permission
0.87 ± 5% +0.5 1.40 ± 2% perf-profile.children.cycles-pp.start_secondary
0.88 ± 5% +0.5 1.41 ± 2% perf-profile.children.cycles-pp.do_idle
0.88 ± 5% +0.5 1.41 ± 2% perf-profile.children.cycles-pp.secondary_startup_64
0.88 ± 5% +0.5 1.41 ± 2% perf-profile.children.cycles-pp.cpu_startup_entry
8.30 +1.1 9.35 ± 2% perf-profile.children.cycles-pp.syscall_return_via_sysret
8.04 +1.3 9.30 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64
2.79 ± 7% -0.9 1.84 ± 10% perf-profile.self.cycles-pp.update_cfs_group
1.03 ± 2% -0.6 0.47 ± 6% perf-profile.self.cycles-pp.available_idle_cpu
3.21 -0.5 2.66 ± 4% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
1.59 ± 2% -0.3 1.25 ± 5% perf-profile.self.cycles-pp.update_load_avg
0.85 ± 2% -0.3 0.58 ± 3% perf-profile.self.cycles-pp._raw_spin_lock
1.45 ± 2% -0.3 1.19 ± 2% perf-profile.self.cycles-pp.switch_fpu_return
1.34 ± 2% -0.3 1.08 ± 2% perf-profile.self.cycles-pp.update_curr
1.66 ± 2% -0.2 1.43 ± 2% perf-profile.self.cycles-pp.__sched_text_start
1.01 -0.2 0.80 ± 4% perf-profile.self.cycles-pp.select_idle_sibling
1.26 ± 3% -0.2 1.05 ± 3% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
1.29 -0.2 1.09 ± 2% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.55 -0.2 0.37 ± 2% perf-profile.self.cycles-pp.account_entity_dequeue
0.97 -0.2 0.82 ± 3% perf-profile.self.cycles-pp.select_task_rq_fair
1.21 -0.2 1.06 ± 3% perf-profile.self.cycles-pp.__switch_to
0.33 -0.1 0.18 ± 2% perf-profile.self.cycles-pp._find_next_bit
0.63 ± 2% -0.1 0.49 ± 3% perf-profile.self.cycles-pp.update_rq_clock
0.56 ± 3% -0.1 0.42 ± 2% perf-profile.self.cycles-pp.___perf_sw_event
0.95 ± 2% -0.1 0.83 perf-profile.self.cycles-pp.__update_load_avg_se
0.89 ± 2% -0.1 0.77 ± 2% perf-profile.self.cycles-pp.__switch_to_asm
0.17 -0.1 0.05 ± 8% perf-profile.self.cycles-pp.cpumask_next_wrap
0.70 -0.1 0.59 ± 4% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.60 ± 2% -0.1 0.50 ± 2% perf-profile.self.cycles-pp.enqueue_task_fair
0.40 ± 2% -0.1 0.29 ± 4% perf-profile.self.cycles-pp.native_sched_clock
0.86 ± 2% -0.1 0.75 ± 2% perf-profile.self.cycles-pp.native_write_msr
0.52 -0.1 0.41 ± 4% perf-profile.self.cycles-pp.__enqueue_entity
0.63 -0.1 0.53 ± 2% perf-profile.self.cycles-pp.prepare_to_wait_event
0.49 ± 2% -0.1 0.40 ± 4% perf-profile.self.cycles-pp.__list_del_entry_valid
0.48 ± 2% -0.1 0.40 ± 3% perf-profile.self.cycles-pp.update_min_vruntime
0.10 ± 14% -0.1 0.03 ±100% perf-profile.self.cycles-pp.set_task_cpu
0.57 ± 2% -0.1 0.50 ± 2% perf-profile.self.cycles-pp.__calc_delta
0.52 ± 3% -0.1 0.45 ± 5% perf-profile.self.cycles-pp.pick_next_task_fair
0.49 ± 2% -0.1 0.42 ± 2% perf-profile.self.cycles-pp.dequeue_task_fair
0.42 ± 2% -0.1 0.35 ± 4% perf-profile.self.cycles-pp.try_to_wake_up
0.09 ± 7% -0.1 0.03 ±100% perf-profile.self.cycles-pp.migrate_task_rq_fair
0.36 ± 4% -0.1 0.31 ± 2% perf-profile.self.cycles-pp.enqueue_entity
0.32 ± 3% -0.1 0.26 ± 3% perf-profile.self.cycles-pp.pick_next_entity
0.32 ± 5% -0.1 0.27 ± 7% perf-profile.self.cycles-pp.cpuacct_charge
0.28 -0.0 0.23 ± 6% perf-profile.self.cycles-pp.__wake_up_common
0.21 ± 2% -0.0 0.17 ± 3% perf-profile.self.cycles-pp.clear_buddies
0.16 ± 7% -0.0 0.11 ± 9% perf-profile.self.cycles-pp.cpus_share_cache
0.33 ± 3% -0.0 0.29 ± 2% perf-profile.self.cycles-pp.check_preempt_wakeup
0.29 ± 2% -0.0 0.25 ± 4% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.18 ± 3% -0.0 0.14 ± 5% perf-profile.self.cycles-pp.update_cfs_rq_h_load
0.21 ± 2% -0.0 0.18 ± 2% perf-profile.self.cycles-pp.check_preempt_curr
0.32 ± 3% -0.0 0.28 ± 2% perf-profile.self.cycles-pp.account_entity_enqueue
0.21 ± 5% -0.0 0.18 ± 6% perf-profile.self.cycles-pp.set_next_buddy
0.15 ± 2% -0.0 0.12 ± 7% perf-profile.self.cycles-pp.resched_curr
0.06 -0.0 0.03 ±100% perf-profile.self.cycles-pp.kill_fasync
0.21 ± 2% -0.0 0.17 ± 2% perf-profile.self.cycles-pp.rb_insert_color
0.29 ± 2% -0.0 0.26 ± 3% perf-profile.self.cycles-pp.anon_pipe_buf_release
0.12 -0.0 0.10 ± 5% perf-profile.self.cycles-pp.activate_task
0.24 -0.0 0.21 ± 5% perf-profile.self.cycles-pp.schedule
0.11 ± 4% -0.0 0.09 ± 4% perf-profile.self.cycles-pp.wakeup_preempt_entity
0.09 ± 4% -0.0 0.07 ± 6% perf-profile.self.cycles-pp.__wake_up_common_lock
0.10 ± 4% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.exit_to_usermode_loop
0.07 ± 5% -0.0 0.06 ± 9% perf-profile.self.cycles-pp.get_xsave_addr
0.15 ± 3% -0.0 0.13 perf-profile.self.cycles-pp.__list_add_valid
0.12 ± 3% -0.0 0.11 ± 4% perf-profile.self.cycles-pp.rb_next
0.12 ± 4% -0.0 0.11 ± 3% perf-profile.self.cycles-pp.autoremove_wake_function
0.14 ± 3% +0.0 0.16 ± 2% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
0.17 ± 2% +0.0 0.19 ± 3% perf-profile.self.cycles-pp.touch_atime
0.09 ± 5% +0.0 0.11 ± 4% perf-profile.self.cycles-pp.copyout
0.16 ± 2% +0.0 0.18 ± 2% perf-profile.self.cycles-pp.iov_iter_init
0.34 +0.0 0.36 ± 2% perf-profile.self.cycles-pp.__x64_sys_read
0.22 +0.0 0.24 ± 3% perf-profile.self.cycles-pp._cond_resched
0.20 ± 2% +0.0 0.23 perf-profile.self.cycles-pp.__might_fault
0.39 ± 2% +0.0 0.42 perf-profile.self.cycles-pp.ksys_read
0.20 ± 2% +0.0 0.24 ± 4% perf-profile.self.cycles-pp.rcu_all_qs
0.32 +0.0 0.35 perf-profile.self.cycles-pp.generic_pipe_buf_confirm
0.19 ± 2% +0.0 0.22 ± 3% perf-profile.self.cycles-pp.fpregs_assert_state_consistent
0.18 ± 2% +0.0 0.22 ± 6% perf-profile.self.cycles-pp.__sb_start_write
0.23 +0.0 0.27 ± 5% perf-profile.self.cycles-pp.__sb_end_write
0.91 +0.0 0.95 ± 2% perf-profile.self.cycles-pp.__might_sleep
0.38 ± 2% +0.0 0.42 perf-profile.self.cycles-pp.aa_file_perm
0.46 +0.0 0.51 ± 3% perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
0.34 ± 2% +0.0 0.39 perf-profile.self.cycles-pp.ksys_write
0.79 +0.1 0.84 ± 4% perf-profile.self.cycles-pp.new_sync_read
0.33 ± 2% +0.1 0.38 ± 2% perf-profile.self.cycles-pp.__x64_sys_write
0.36 ± 3% +0.1 0.42 ± 2% perf-profile.self.cycles-pp.file_update_time
0.28 ± 2% +0.1 0.34 ± 6% perf-profile.self.cycles-pp.atime_needs_update
0.76 +0.1 0.82 perf-profile.self.cycles-pp.fput_many
0.56 +0.1 0.62 ± 3% perf-profile.self.cycles-pp.copy_page_from_iter
0.64 +0.1 0.71 perf-profile.self.cycles-pp.copy_page_to_iter
1.55 +0.1 1.62 perf-profile.self.cycles-pp.mutex_lock
0.23 ± 7% +0.1 0.30 ± 5% perf-profile.self.cycles-pp.poll_idle
0.38 +0.1 0.45 ± 2% perf-profile.self.cycles-pp.apparmor_file_permission
1.64 +0.1 1.73 perf-profile.self.cycles-pp.pipe_read
0.38 ± 2% +0.1 0.47 ± 4% perf-profile.self.cycles-pp.current_time
0.37 +0.1 0.46 ± 2% perf-profile.self.cycles-pp.security_file_permission
0.82 +0.1 0.92 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
1.59 +0.1 1.70 perf-profile.self.cycles-pp.___might_sleep
0.55 ± 5% +0.1 0.66 ± 4% perf-profile.self.cycles-pp.vfs_write
1.60 ± 2% +0.1 1.71 perf-profile.self.cycles-pp.mutex_unlock
0.81 +0.1 0.94 ± 3% perf-profile.self.cycles-pp.new_sync_write
0.96 ± 4% +0.2 1.14 ± 4% perf-profile.self.cycles-pp.__fsnotify_parent
2.41 +0.2 2.61 ± 2% perf-profile.self.cycles-pp.__fget_files
1.36 ± 2% +0.2 1.57 ± 3% perf-profile.self.cycles-pp.fsnotify
1.66 +0.2 1.88 perf-profile.self.cycles-pp.common_file_perm
1.46 ± 2% +0.2 1.69 ± 3% perf-profile.self.cycles-pp.pipe_write
2.77 +0.3 3.03 perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
0.20 ± 7% +0.3 0.54 ± 2% perf-profile.self.cycles-pp.intel_idle
8.29 +1.0 9.33 ± 2% perf-profile.self.cycles-pp.syscall_return_via_sysret
8.04 +1.3 9.30 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64
12.90 +1.4 14.26 perf-profile.self.cycles-pp.do_syscall_64
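
The calltrace/children/self breakdown above follows perf's two
accounting modes: "children" attributes a function's cycles to every
caller up the stack, while "self" counts only cycles spent in the
function body itself. A sketch of producing a similar system-wide
profile, assuming a stock perf install (the sampling frequency and
duration below are illustrative, not the values the robot used):

  # system-wide call-graph sampling
  perf record -a -g -F 997 -- sleep 30
  # inclusive view, comparable to the children.cycles-pp lines
  perf report --children
  # exclusive view, comparable to the self.cycles-pp lines
  perf report --no-children
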
2647 ± 65% -71.3% 760.75 ± 48% interrupts.40:PCI-MSI.67633155-edge.eth0-TxRx-2
37863804 ± 2% -28.7% 26986660 ± 6% interrupts.CAL:Function_call_interrupts
362158 ± 2% -28.2% 260143 ± 6% interrupts.CPU0.CAL:Function_call_interrupts
4172754 -19.2% 3373668 interrupts.CPU0.RES:Rescheduling_interrupts
401592 ± 2% -24.0% 305064 ± 5% interrupts.CPU0.TLB:TLB_shootdowns
363963 ± 2% -28.3% 260808 ± 6% interrupts.CPU1.CAL:Function_call_interrupts
4140181 ± 2% -18.6% 3369331 interrupts.CPU1.RES:Rescheduling_interrupts
401513 ± 2% -24.0% 304981 ± 5% interrupts.CPU1.TLB:TLB_shootdowns
362307 ± 3% -28.3% 259881 ± 6% interrupts.CPU10.CAL:Function_call_interrupts
4062554 -15.5% 3431657 interrupts.CPU10.RES:Rescheduling_interrupts
402121 ± 2% -24.3% 304425 ± 5% interrupts.CPU10.TLB:TLB_shootdowns
368817 ± 2% -29.3% 260608 ± 6% interrupts.CPU100.CAL:Function_call_interrupts
4800401 -16.2% 4023024 ± 2% interrupts.CPU100.RES:Rescheduling_interrupts
401718 ± 2% -24.7% 302610 ± 5% interrupts.CPU100.TLB:TLB_shootdowns
366551 ± 2% -29.3% 259014 ± 6% interrupts.CPU101.CAL:Function_call_interrupts
4613609 ± 2% -13.6% 3984672 ± 2% interrupts.CPU101.RES:Rescheduling_interrupts
401293 ± 2% -24.6% 302610 ± 5% interrupts.CPU101.TLB:TLB_shootdowns
366857 ± 2% -30.3% 255792 ± 6% interrupts.CPU102.CAL:Function_call_interrupts
4757720 -18.0% 3900237 interrupts.CPU102.RES:Rescheduling_interrupts
400915 ± 2% -24.5% 302647 ± 5% interrupts.CPU102.TLB:TLB_shootdowns
360310 ± 3% -28.4% 258147 ± 6% interrupts.CPU103.CAL:Function_call_interrupts
4724316 ± 3% -15.9% 3972301 interrupts.CPU103.RES:Rescheduling_interrupts
396576 ± 2% -24.3% 300113 ± 5% interrupts.CPU103.TLB:TLB_shootdowns
362968 ± 2% -28.1% 260975 ± 6% interrupts.CPU11.CAL:Function_call_interrupts
3981921 -15.3% 3372172 ± 3% interrupts.CPU11.RES:Rescheduling_interrupts
402048 ± 2% -24.2% 304812 ± 5% interrupts.CPU11.TLB:TLB_shootdowns
363055 ± 2% -28.1% 261210 ± 6% interrupts.CPU12.CAL:Function_call_interrupts
3984389 ± 2% -15.2% 3379172 ± 2% interrupts.CPU12.RES:Rescheduling_interrupts
402291 ± 2% -24.1% 305453 ± 5% interrupts.CPU12.TLB:TLB_shootdowns
360471 ± 2% -27.6% 260902 ± 6% interrupts.CPU13.CAL:Function_call_interrupts
4116935 ± 2% -17.9% 3378167 interrupts.CPU13.RES:Rescheduling_interrupts
401996 ± 2% -23.9% 305788 ± 5% interrupts.CPU13.TLB:TLB_shootdowns
364891 ± 2% -28.4% 261201 ± 6% interrupts.CPU14.CAL:Function_call_interrupts
4102020 -14.0% 3526194 ± 5% interrupts.CPU14.RES:Rescheduling_interrupts
402116 ± 2% -24.1% 305374 ± 5% interrupts.CPU14.TLB:TLB_shootdowns
359603 ± 2% -27.4% 261087 ± 6% interrupts.CPU15.CAL:Function_call_interrupts
4095312 ± 3% -16.8% 3407557 ± 3% interrupts.CPU15.RES:Rescheduling_interrupts
402187 ± 2% -24.2% 305046 ± 5% interrupts.CPU15.TLB:TLB_shootdowns
364327 ± 2% -28.6% 260269 ± 6% interrupts.CPU16.CAL:Function_call_interrupts
4029416 -16.0% 3385620 ± 2% interrupts.CPU16.RES:Rescheduling_interrupts
402498 ± 2% -24.3% 304787 ± 5% interrupts.CPU16.TLB:TLB_shootdowns
363982 ± 2% -28.5% 260183 ± 6% interrupts.CPU17.CAL:Function_call_interrupts
4084473 -17.3% 3379738 ± 2% interrupts.CPU17.RES:Rescheduling_interrupts
402531 ± 2% -24.2% 305136 ± 5% interrupts.CPU17.TLB:TLB_shootdowns
360338 -27.8% 260100 ± 6% interrupts.CPU18.CAL:Function_call_interrupts
3930863 ± 2% -13.7% 3390527 interrupts.CPU18.RES:Rescheduling_interrupts
401062 ± 2% -23.9% 305293 ± 5% interrupts.CPU18.TLB:TLB_shootdowns
364435 ± 2% -28.7% 259901 ± 7% interrupts.CPU19.CAL:Function_call_interrupts
3928379 -15.6% 3315295 interrupts.CPU19.RES:Rescheduling_interrupts
402604 ± 2% -24.3% 304913 ± 5% interrupts.CPU19.TLB:TLB_shootdowns
362066 ± 2% -28.0% 260638 ± 7% interrupts.CPU2.CAL:Function_call_interrupts
4049789 ± 4% -13.5% 3501632 ± 3% interrupts.CPU2.RES:Rescheduling_interrupts
401996 ± 2% -24.2% 304544 ± 5% interrupts.CPU2.TLB:TLB_shootdowns
363960 ± 2% -28.5% 260076 ± 6% interrupts.CPU20.CAL:Function_call_interrupts
4079535 ± 3% -18.2% 3335972 interrupts.CPU20.RES:Rescheduling_interrupts
401941 ± 2% -24.0% 305365 ± 5% interrupts.CPU20.TLB:TLB_shootdowns
363498 -28.3% 260522 ± 6% interrupts.CPU21.CAL:Function_call_interrupts
4057077 -14.8% 3455466 ± 3% interrupts.CPU21.RES:Rescheduling_interrupts
402264 ± 2% -24.1% 305299 ± 5% interrupts.CPU21.TLB:TLB_shootdowns
363798 ± 2% -28.6% 259759 ± 6% interrupts.CPU22.CAL:Function_call_interrupts
4074066 -16.2% 3415930 interrupts.CPU22.RES:Rescheduling_interrupts
402227 ± 2% -24.2% 304981 ± 5% interrupts.CPU22.TLB:TLB_shootdowns
365438 ± 2% -29.0% 259418 ± 6% interrupts.CPU23.CAL:Function_call_interrupts
4118858 ± 2% -15.9% 3463648 interrupts.CPU23.RES:Rescheduling_interrupts
402592 ± 2% -24.1% 305374 ± 5% interrupts.CPU23.TLB:TLB_shootdowns
364005 ± 2% -29.2% 257845 ± 7% interrupts.CPU24.CAL:Function_call_interrupts
4000362 ± 2% -14.8% 3408667 interrupts.CPU24.RES:Rescheduling_interrupts
402632 ± 2% -24.2% 305195 ± 5% interrupts.CPU24.TLB:TLB_shootdowns
364387 ± 2% -28.2% 261770 ± 6% interrupts.CPU25.CAL:Function_call_interrupts
3994036 ± 2% -15.0% 3396788 ± 4% interrupts.CPU25.RES:Rescheduling_interrupts
402505 ± 2% -23.9% 306417 ± 5% interrupts.CPU25.TLB:TLB_shootdowns
361758 ± 2% -28.7% 257829 ± 6% interrupts.CPU26.CAL:Function_call_interrupts
4630011 ± 2% -15.6% 3908644 ± 2% interrupts.CPU26.RES:Rescheduling_interrupts
401986 ± 2% -24.4% 303767 ± 5% interrupts.CPU26.TLB:TLB_shootdowns
363712 ± 2% -29.2% 257597 ± 6% interrupts.CPU27.CAL:Function_call_interrupts
4650224 ± 4% -15.0% 3951161 ± 4% interrupts.CPU27.RES:Rescheduling_interrupts
401070 ± 2% -24.4% 303358 ± 5% interrupts.CPU27.TLB:TLB_shootdowns
363333 ± 2% -28.8% 258804 ± 5% interrupts.CPU28.CAL:Function_call_interrupts
4740992 ± 3% -16.9% 3941433 ± 3% interrupts.CPU28.RES:Rescheduling_interrupts
400972 ± 2% -24.3% 303626 ± 5% interrupts.CPU28.TLB:TLB_shootdowns
358463 ± 2% -28.6% 255899 ± 6% interrupts.CPU29.CAL:Function_call_interrupts
4796173 -17.5% 3956275 interrupts.CPU29.RES:Rescheduling_interrupts
400637 ± 2% -24.3% 303302 ± 5% interrupts.CPU29.TLB:TLB_shootdowns
362764 ± 2% -28.3% 260048 ± 7% interrupts.CPU3.CAL:Function_call_interrupts
4124342 ± 2% -15.9% 3469625 interrupts.CPU3.RES:Rescheduling_interrupts
402080 ± 2% -24.1% 305153 ± 5% interrupts.CPU3.TLB:TLB_shootdowns
362499 ± 2% -28.9% 257858 ± 7% interrupts.CPU30.CAL:Function_call_interrupts
4722536 ± 2% -16.3% 3953569 interrupts.CPU30.RES:Rescheduling_interrupts
400635 ± 2% -24.3% 303089 ± 5% interrupts.CPU30.TLB:TLB_shootdowns
361719 ± 2% -29.3% 255783 ± 7% interrupts.CPU31.CAL:Function_call_interrupts
4617452 -15.1% 3918955 ± 2% interrupts.CPU31.RES:Rescheduling_interrupts
400625 ± 2% -24.2% 303491 ± 5% interrupts.CPU31.TLB:TLB_shootdowns
2647 ± 65% -71.3% 760.75 ± 48% interrupts.CPU32.40:PCI-MSI.67633155-edge.eth0-TxRx-2
364248 ± 2% -29.1% 258148 ± 6% interrupts.CPU32.CAL:Function_call_interrupts
4645516 -16.8% 3864633 ± 2% interrupts.CPU32.RES:Rescheduling_interrupts
400580 ± 2% -24.4% 302773 ± 5% interrupts.CPU32.TLB:TLB_shootdowns
360856 ± 2% -29.0% 256142 ± 6% interrupts.CPU33.CAL:Function_call_interrupts
4688960 -15.5% 3964179 interrupts.CPU33.RES:Rescheduling_interrupts
401136 ± 2% -24.7% 302028 ± 5% interrupts.CPU33.TLB:TLB_shootdowns
362749 -29.3% 256630 ± 7% interrupts.CPU34.CAL:Function_call_interrupts
4700963 -14.8% 4003039 ± 2% interrupts.CPU34.RES:Rescheduling_interrupts
400231 ± 2% -24.4% 302608 ± 5% interrupts.CPU34.TLB:TLB_shootdowns
362555 ± 2% -29.3% 256380 ± 6% interrupts.CPU35.CAL:Function_call_interrupts
4720301 -15.9% 3968223 interrupts.CPU35.RES:Rescheduling_interrupts
401338 ± 2% -24.4% 303502 ± 5% interrupts.CPU35.TLB:TLB_shootdowns
363104 ± 2% -29.0% 257882 ± 6% interrupts.CPU36.CAL:Function_call_interrupts
4755161 -15.8% 4001989 ± 2% interrupts.CPU36.RES:Rescheduling_interrupts
401644 ± 2% -24.7% 302312 ± 5% interrupts.CPU36.TLB:TLB_shootdowns
361008 ± 3% -28.9% 256628 ± 6% interrupts.CPU37.CAL:Function_call_interrupts
4669550 ± 2% -16.8% 3884392 ± 3% interrupts.CPU37.RES:Rescheduling_interrupts
401123 ± 2% -24.6% 302492 ± 5% interrupts.CPU37.TLB:TLB_shootdowns
364373 ± 2% -29.4% 257238 ± 6% interrupts.CPU38.CAL:Function_call_interrupts
4671819 ± 2% -15.7% 3939020 interrupts.CPU38.RES:Rescheduling_interrupts
402122 ± 2% -24.8% 302424 ± 5% interrupts.CPU38.TLB:TLB_shootdowns
362451 ± 2% -29.6% 255300 ± 6% interrupts.CPU39.CAL:Function_call_interrupts
4627452 -15.9% 3890773 interrupts.CPU39.RES:Rescheduling_interrupts
400852 ± 2% -24.5% 302499 ± 5% interrupts.CPU39.TLB:TLB_shootdowns
364114 ± 2% -28.8% 259317 ± 6% interrupts.CPU4.CAL:Function_call_interrupts
4066359 ± 2% -17.6% 3350696 ± 2% interrupts.CPU4.RES:Rescheduling_interrupts
402091 ± 2% -24.1% 305297 ± 5% interrupts.CPU4.TLB:TLB_shootdowns
362154 ± 2% -29.4% 255616 ± 5% interrupts.CPU40.CAL:Function_call_interrupts
4835690 ± 4% -18.1% 3962808 interrupts.CPU40.RES:Rescheduling_interrupts
400687 ± 2% -24.4% 302784 ± 5% interrupts.CPU40.TLB:TLB_shootdowns
364832 ± 2% -30.4% 254102 ± 5% interrupts.CPU41.CAL:Function_call_interrupts
4802950 ± 2% -16.5% 4010224 interrupts.CPU41.RES:Rescheduling_interrupts
401241 ± 2% -24.7% 302125 ± 5% interrupts.CPU41.TLB:TLB_shootdowns
364876 ± 2% -29.4% 257505 ± 6% interrupts.CPU42.CAL:Function_call_interrupts
4709549 ± 3% -15.6% 3973577 interrupts.CPU42.RES:Rescheduling_interrupts
401497 ± 2% -24.6% 302726 ± 5% interrupts.CPU42.TLB:TLB_shootdowns
364435 ± 2% -29.6% 256622 ± 5% interrupts.CPU43.CAL:Function_call_interrupts
4650903 -15.5% 3931929 interrupts.CPU43.RES:Rescheduling_interrupts
401222 ± 2% -24.4% 303154 ± 5% interrupts.CPU43.TLB:TLB_shootdowns
363392 ± 2% -29.0% 257903 ± 6% interrupts.CPU44.CAL:Function_call_interrupts
4642435 ± 3% -16.4% 3881937 interrupts.CPU44.RES:Rescheduling_interrupts
401265 ± 2% -24.2% 304146 ± 5% interrupts.CPU44.TLB:TLB_shootdowns
361396 ± 2% -28.8% 257434 ± 5% interrupts.CPU45.CAL:Function_call_interrupts
4716195 -15.5% 3984695 interrupts.CPU45.RES:Rescheduling_interrupts
401579 ± 2% -24.5% 303329 ± 5% interrupts.CPU45.TLB:TLB_shootdowns
363272 ± 2% -29.5% 256164 ± 6% interrupts.CPU46.CAL:Function_call_interrupts
4699566 ± 3% -17.0% 3901234 interrupts.CPU46.RES:Rescheduling_interrupts
401156 ± 2% -24.4% 303164 ± 5% interrupts.CPU46.TLB:TLB_shootdowns
361972 ± 3% -29.1% 256802 ± 6% interrupts.CPU47.CAL:Function_call_interrupts
4766764 -18.6% 3877850 ± 2% interrupts.CPU47.RES:Rescheduling_interrupts
400761 ± 2% -24.4% 302846 ± 5% interrupts.CPU47.TLB:TLB_shootdowns
364506 ± 2% -29.2% 258114 ± 6% interrupts.CPU48.CAL:Function_call_interrupts
4798800 -16.5% 4005147 ± 3% interrupts.CPU48.RES:Rescheduling_interrupts
401365 ± 2% -24.6% 302688 ± 5% interrupts.CPU48.TLB:TLB_shootdowns
361947 ± 2% -29.4% 255420 ± 7% interrupts.CPU49.CAL:Function_call_interrupts
4624642 -14.5% 3953814 interrupts.CPU49.RES:Rescheduling_interrupts
400829 ± 2% -24.6% 302279 ± 5% interrupts.CPU49.TLB:TLB_shootdowns
364071 ± 2% -28.5% 260141 ± 6% interrupts.CPU5.CAL:Function_call_interrupts
4007922 -15.6% 3382942 ± 2% interrupts.CPU5.RES:Rescheduling_interrupts
402119 ± 2% -24.0% 305434 ± 5% interrupts.CPU5.TLB:TLB_shootdowns
362288 ± 2% -29.4% 255899 ± 6% interrupts.CPU50.CAL:Function_call_interrupts
4738102 -18.4% 3868165 ± 2% interrupts.CPU50.RES:Rescheduling_interrupts
401224 ± 2% -24.7% 302304 ± 5% interrupts.CPU50.TLB:TLB_shootdowns
363859 ± 2% -29.2% 257565 ± 6% interrupts.CPU51.CAL:Function_call_interrupts
4612685 ± 2% -14.0% 3966588 interrupts.CPU51.RES:Rescheduling_interrupts
400993 ± 2% -24.4% 303235 ± 5% interrupts.CPU51.TLB:TLB_shootdowns
366026 ± 2% -28.5% 261570 ± 6% interrupts.CPU52.CAL:Function_call_interrupts
4044273 -15.4% 3420867 interrupts.CPU52.RES:Rescheduling_interrupts
402921 ± 2% -24.0% 306236 ± 5% interrupts.CPU52.TLB:TLB_shootdowns
365484 ± 2% -28.5% 261242 ± 6% interrupts.CPU53.CAL:Function_call_interrupts
5794 ± 20% +24.2% 7198 ± 5% interrupts.CPU53.NMI:Non-maskable_interrupts
5794 ± 20% +24.2% 7198 ± 5% interrupts.CPU53.PMI:Performance_monitoring_interrupts
4105600 ± 2% -17.6% 3382750 interrupts.CPU53.RES:Rescheduling_interrupts
402361 ± 2% -23.9% 306068 ± 5% interrupts.CPU53.TLB:TLB_shootdowns
367183 ± 2% -28.7% 261847 ± 7% interrupts.CPU54.CAL:Function_call_interrupts
5794 ± 20% +24.2% 7195 ± 5% interrupts.CPU54.NMI:Non-maskable_interrupts
5794 ± 20% +24.2% 7195 ± 5% interrupts.CPU54.PMI:Performance_monitoring_interrupts
402891 ± 2% -24.1% 305604 ± 5% interrupts.CPU54.TLB:TLB_shootdowns
365130 ± 2% -28.1% 262436 ± 6% interrupts.CPU55.CAL:Function_call_interrupts
4118197 -15.5% 3478406 ± 2% interrupts.CPU55.RES:Rescheduling_interrupts
401954 ± 2% -23.9% 305817 ± 5% interrupts.CPU55.TLB:TLB_shootdowns
363456 ± 2% -28.3% 260512 ± 6% interrupts.CPU56.CAL:Function_call_interrupts
4077237 -16.9% 3386640 ± 2% interrupts.CPU56.RES:Rescheduling_interrupts
402098 ± 2% -24.1% 305288 ± 5% interrupts.CPU56.TLB:TLB_shootdowns
364085 ± 2% -29.3% 257580 ± 7% interrupts.CPU57.CAL:Function_call_interrupts
3972566 -14.2% 3410238 ± 2% interrupts.CPU57.RES:Rescheduling_interrupts
402468 ± 2% -24.1% 305580 ± 5% interrupts.CPU57.TLB:TLB_shootdowns
364890 -28.4% 261302 ± 6% interrupts.CPU58.CAL:Function_call_interrupts
3995325 ± 3% -15.6% 3371100 ± 3% interrupts.CPU58.RES:Rescheduling_interrupts
402219 ± 2% -23.9% 305912 ± 5% interrupts.CPU58.TLB:TLB_shootdowns
366052 ± 2% -28.9% 260298 ± 6% interrupts.CPU59.CAL:Function_call_interrupts
4163187 -18.3% 3401390 ± 3% interrupts.CPU59.RES:Rescheduling_interrupts
402434 ± 2% -24.1% 305280 ± 5% interrupts.CPU59.TLB:TLB_shootdowns
362471 ± 2% -28.4% 259559 ± 6% interrupts.CPU6.CAL:Function_call_interrupts
3991865 ± 3% -16.3% 3339970 ± 4% interrupts.CPU6.RES:Rescheduling_interrupts
401545 ± 2% -23.9% 305440 ± 5% interrupts.CPU6.TLB:TLB_shootdowns
363207 ± 2% -28.2% 260693 ± 6% interrupts.CPU60.CAL:Function_call_interrupts
6717 ± 6% +7.2% 7197 ± 5% interrupts.CPU60.NMI:Non-maskable_interrupts
6717 ± 6% +7.2% 7197 ± 5% interrupts.CPU60.PMI:Performance_monitoring_interrupts
4177724 -19.6% 3360115 interrupts.CPU60.RES:Rescheduling_interrupts
401859 ± 2% -24.0% 305412 ± 5% interrupts.CPU60.TLB:TLB_shootdowns
365406 ± 2% -28.7% 260377 ± 6% interrupts.CPU61.CAL:Function_call_interrupts
4189363 -18.7% 3406194 interrupts.CPU61.RES:Rescheduling_interrupts
401933 ± 2% -24.1% 304927 ± 5% interrupts.CPU61.TLB:TLB_shootdowns
362105 ± 2% -27.9% 261095 ± 6% interrupts.CPU62.CAL:Function_call_interrupts
4058876 -15.8% 3416999 interrupts.CPU62.RES:Rescheduling_interrupts
402332 ± 2% -24.2% 304907 ± 5% interrupts.CPU62.TLB:TLB_shootdowns
362843 ± 3% -28.0% 261235 ± 7% interrupts.CPU63.CAL:Function_call_interrupts
4041319 -15.6% 3410234 interrupts.CPU63.RES:Rescheduling_interrupts
402782 ± 2% -24.0% 306038 ± 5% interrupts.CPU63.TLB:TLB_shootdowns
365454 ± 2% -28.2% 262553 ± 6% interrupts.CPU64.CAL:Function_call_interrupts
4029525 -15.4% 3408027 ± 2% interrupts.CPU64.RES:Rescheduling_interrupts
402503 ± 2% -24.0% 305886 ± 5% interrupts.CPU64.TLB:TLB_shootdowns
366139 ± 2% -28.3% 262579 ± 6% interrupts.CPU65.CAL:Function_call_interrupts
4142385 ± 2% -17.1% 3432752 interrupts.CPU65.RES:Rescheduling_interrupts
402261 ± 2% -23.9% 305949 ± 5% interrupts.CPU65.TLB:TLB_shootdowns
362165 ± 2% -27.6% 262282 ± 6% interrupts.CPU66.CAL:Function_call_interrupts
4065269 -13.4% 3521633 ± 4% interrupts.CPU66.RES:Rescheduling_interrupts
402652 ± 2% -23.9% 306554 ± 5% interrupts.CPU66.TLB:TLB_shootdowns
365763 ± 2% -28.6% 261064 ± 6% interrupts.CPU67.CAL:Function_call_interrupts
6720 ± 6% +7.1% 7198 ± 5% interrupts.CPU67.NMI:Non-maskable_interrupts
6720 ± 6% +7.1% 7198 ± 5% interrupts.CPU67.PMI:Performance_monitoring_interrupts
4164589 ± 2% -17.4% 3438395 ± 2% interrupts.CPU67.RES:Rescheduling_interrupts
402517 ± 2% -24.1% 305630 ± 5% interrupts.CPU67.TLB:TLB_shootdowns
365397 ± 2% -28.3% 262144 ± 6% interrupts.CPU68.CAL:Function_call_interrupts
4114611 -16.8% 3425141 ± 2% interrupts.CPU68.RES:Rescheduling_interrupts
401936 ± 2% -24.1% 305021 ± 5% interrupts.CPU68.TLB:TLB_shootdowns
362434 ± 2% -27.6% 262387 ± 6% interrupts.CPU69.CAL:Function_call_interrupts
4074671 -16.5% 3402472 ± 3% interrupts.CPU69.RES:Rescheduling_interrupts
402591 ± 2% -24.1% 305514 ± 5% interrupts.CPU69.TLB:TLB_shootdowns
364644 ± 2% -28.4% 261143 ± 6% interrupts.CPU7.CAL:Function_call_interrupts
4116996 -17.9% 3381707 ± 2% interrupts.CPU7.RES:Rescheduling_interrupts
402193 ± 2% -24.2% 304933 ± 5% interrupts.CPU7.TLB:TLB_shootdowns
365547 -28.4% 261733 ± 6% interrupts.CPU70.CAL:Function_call_interrupts
6717 ± 6% +7.2% 7198 ± 5% interrupts.CPU70.NMI:Non-maskable_interrupts
6717 ± 6% +7.2% 7198 ± 5% interrupts.CPU70.PMI:Performance_monitoring_interrupts
4024869 -16.4% 3364032 ± 2% interrupts.CPU70.RES:Rescheduling_interrupts
401922 ± 2% -23.9% 305821 ± 5% interrupts.CPU70.TLB:TLB_shootdowns
367754 ± 2% -28.7% 262144 ± 6% interrupts.CPU71.CAL:Function_call_interrupts
3987620 ± 2% -16.0% 3350193 ± 2% interrupts.CPU71.RES:Rescheduling_interrupts
402283 ± 2% -23.8% 306426 ± 5% interrupts.CPU71.TLB:TLB_shootdowns
364008 -28.1% 261558 ± 6% interrupts.CPU72.CAL:Function_call_interrupts
4151426 ± 3% -19.5% 3342555 ± 2% interrupts.CPU72.RES:Rescheduling_interrupts
401616 ± 2% -23.8% 306167 ± 5% interrupts.CPU72.TLB:TLB_shootdowns
366580 ± 2% -28.5% 262000 ± 6% interrupts.CPU73.CAL:Function_call_interrupts
4105731 -17.7% 3378751 interrupts.CPU73.RES:Rescheduling_interrupts
402076 ± 2% -23.9% 305974 ± 5% interrupts.CPU73.TLB:TLB_shootdowns
362358 ± 2% -27.8% 261492 ± 6% interrupts.CPU74.CAL:Function_call_interrupts
4112156 -17.0% 3412501 interrupts.CPU74.RES:Rescheduling_interrupts
402087 ± 2% -24.0% 305510 ± 5% interrupts.CPU74.TLB:TLB_shootdowns
365257 ± 3% -28.4% 261566 ± 6% interrupts.CPU75.CAL:Function_call_interrupts
4156897 -16.4% 3473919 interrupts.CPU75.RES:Rescheduling_interrupts
402476 ± 2% -24.1% 305396 ± 5% interrupts.CPU75.TLB:TLB_shootdowns
366429 ± 2% -28.4% 262310 ± 6% interrupts.CPU76.CAL:Function_call_interrupts
4076843 -16.4% 3408530 interrupts.CPU76.RES:Rescheduling_interrupts
401974 ± 2% -24.1% 305081 ± 5% interrupts.CPU76.TLB:TLB_shootdowns
366844 ± 2% -28.3% 262932 ± 6% interrupts.CPU77.CAL:Function_call_interrupts
4010462 ± 3% -15.5% 3388336 ± 3% interrupts.CPU77.RES:Rescheduling_interrupts
402135 ± 2% -24.1% 305322 ± 5% interrupts.CPU77.TLB:TLB_shootdowns
364781 ± 2% -29.0% 259058 ± 5% interrupts.CPU78.CAL:Function_call_interrupts
4633481 ± 2% -14.8% 3946940 interrupts.CPU78.RES:Rescheduling_interrupts
401686 ± 2% -24.4% 303531 ± 5% interrupts.CPU78.TLB:TLB_shootdowns
364064 ± 2% -29.3% 257370 ± 7% interrupts.CPU79.CAL:Function_call_interrupts
4690111 ± 2% -15.3% 3971419 ± 4% interrupts.CPU79.RES:Rescheduling_interrupts
401747 ± 2% -24.6% 303072 ± 5% interrupts.CPU79.TLB:TLB_shootdowns
358147 -27.3% 260514 ± 6% interrupts.CPU8.CAL:Function_call_interrupts
4137074 -17.9% 3397095 interrupts.CPU8.RES:Rescheduling_interrupts
402009 ± 2% -24.2% 304726 ± 5% interrupts.CPU8.TLB:TLB_shootdowns
363027 ± 2% -28.7% 258810 ± 5% interrupts.CPU80.CAL:Function_call_interrupts
4761080 ± 5% -16.4% 3982078 ± 3% interrupts.CPU80.RES:Rescheduling_interrupts
400095 ± 2% -24.3% 302916 ± 5% interrupts.CPU80.TLB:TLB_shootdowns
364929 ± 2% -29.0% 259175 ± 6% interrupts.CPU81.CAL:Function_call_interrupts
4819494 -16.9% 4005722 interrupts.CPU81.RES:Rescheduling_interrupts
400609 ± 2% -24.2% 303544 ± 5% interrupts.CPU81.TLB:TLB_shootdowns
365233 ± 2% -28.8% 260116 ± 6% interrupts.CPU82.CAL:Function_call_interrupts
4701930 ± 2% -15.1% 3990022 interrupts.CPU82.RES:Rescheduling_interrupts
401341 ± 2% -24.4% 303514 ± 5% interrupts.CPU82.TLB:TLB_shootdowns
363566 ± 2% -28.3% 260684 ± 6% interrupts.CPU83.CAL:Function_call_interrupts
4658814 -15.5% 3934999 interrupts.CPU83.RES:Rescheduling_interrupts
400595 ± 2% -24.2% 303545 ± 5% interrupts.CPU83.TLB:TLB_shootdowns
363499 ± 2% -28.8% 258882 ± 6% interrupts.CPU84.CAL:Function_call_interrupts
4774074 -18.8% 3878251 interrupts.CPU84.RES:Rescheduling_interrupts
401133 ± 2% -24.4% 303077 ± 5% interrupts.CPU84.TLB:TLB_shootdowns
365822 ± 2% -28.9% 259979 ± 6% interrupts.CPU85.CAL:Function_call_interrupts
4694622 -16.3% 3929614 interrupts.CPU85.RES:Rescheduling_interrupts
400559 ± 2% -24.2% 303600 ± 5% interrupts.CPU85.TLB:TLB_shootdowns
364803 ± 2% -28.5% 260828 ± 6% interrupts.CPU86.CAL:Function_call_interrupts
4699440 ± 2% -16.0% 3948788 ± 2% interrupts.CPU86.RES:Rescheduling_interrupts
401231 ± 2% -24.6% 302715 ± 5% interrupts.CPU86.TLB:TLB_shootdowns
365408 ± 2% -29.0% 259486 ± 6% interrupts.CPU87.CAL:Function_call_interrupts
4719360 ± 2% -14.6% 4031045 ± 2% interrupts.CPU87.RES:Rescheduling_interrupts
400941 ± 2% -24.4% 303180 ± 5% interrupts.CPU87.TLB:TLB_shootdowns
366100 -29.0% 259945 ± 6% interrupts.CPU88.CAL:Function_call_interrupts
4783458 -16.7% 3983894 ± 2% interrupts.CPU88.RES:Rescheduling_interrupts
400598 ± 2% -24.4% 302993 ± 5% interrupts.CPU88.TLB:TLB_shootdowns
365997 ± 2% -28.9% 260407 ± 6% interrupts.CPU89.CAL:Function_call_interrupts
4673936 ± 2% -17.1% 3874648 ± 2% interrupts.CPU89.RES:Rescheduling_interrupts
401813 ± 2% -24.5% 303212 ± 5% interrupts.CPU89.TLB:TLB_shootdowns
361760 ± 2% -28.4% 258910 ± 6% interrupts.CPU9.CAL:Function_call_interrupts
4193690 -20.1% 3349024 ± 2% interrupts.CPU9.RES:Rescheduling_interrupts
401659 ± 2% -24.0% 305176 ± 5% interrupts.CPU9.TLB:TLB_shootdowns
365405 ± 2% -28.9% 259926 ± 6% interrupts.CPU90.CAL:Function_call_interrupts
4632041 ± 3% -15.8% 3899831 interrupts.CPU90.RES:Rescheduling_interrupts
401536 ± 2% -24.7% 302268 ± 5% interrupts.CPU90.TLB:TLB_shootdowns
368348 ± 2% -30.0% 257696 ± 7% interrupts.CPU91.CAL:Function_call_interrupts
4699415 -16.8% 3910344 ± 2% interrupts.CPU91.RES:Rescheduling_interrupts
401058 ± 2% -24.6% 302375 ± 5% interrupts.CPU91.TLB:TLB_shootdowns
365593 ± 2% -29.5% 257642 ± 6% interrupts.CPU92.CAL:Function_call_interrupts
4874524 ± 3% -18.7% 3962744 interrupts.CPU92.RES:Rescheduling_interrupts
400768 ± 2% -24.4% 303132 ± 5% interrupts.CPU92.TLB:TLB_shootdowns
369748 ± 2% -29.9% 259219 ± 6% interrupts.CPU93.CAL:Function_call_interrupts
4811191 ± 3% -16.5% 4018539 interrupts.CPU93.RES:Rescheduling_interrupts
401210 ± 2% -24.4% 303151 ± 5% interrupts.CPU93.TLB:TLB_shootdowns
365854 ± 2% -28.7% 260957 ± 6% interrupts.CPU94.CAL:Function_call_interrupts
4725535 ± 3% -15.0% 4014538 interrupts.CPU94.RES:Rescheduling_interrupts
401389 ± 2% -24.5% 303183 ± 5% interrupts.CPU94.TLB:TLB_shootdowns
365697 ± 3% -29.0% 259601 ± 6% interrupts.CPU95.CAL:Function_call_interrupts
4701659 -15.3% 3983254 interrupts.CPU95.RES:Rescheduling_interrupts
401162 ± 2% -24.4% 303321 ± 5% interrupts.CPU95.TLB:TLB_shootdowns
367333 ± 2% -29.0% 260898 ± 6% interrupts.CPU96.CAL:Function_call_interrupts
4699350 ± 2% -16.3% 3931906 interrupts.CPU96.RES:Rescheduling_interrupts
401090 ± 2% -24.3% 303573 ± 5% interrupts.CPU96.TLB:TLB_shootdowns
367997 ± 2% -29.0% 261366 ± 6% interrupts.CPU97.CAL:Function_call_interrupts
4742816 -16.2% 3974366 ± 2% interrupts.CPU97.RES:Rescheduling_interrupts
401878 ± 2% -24.4% 303776 ± 5% interrupts.CPU97.TLB:TLB_shootdowns
365973 ± 3% -29.6% 257706 ± 6% interrupts.CPU98.CAL:Function_call_interrupts
4704134 ± 3% -16.1% 3946205 ± 2% interrupts.CPU98.RES:Rescheduling_interrupts
401550 ± 2% -24.7% 302447 ± 5% interrupts.CPU98.TLB:TLB_shootdowns
364681 ± 2% -28.5% 260792 ± 6% interrupts.CPU99.CAL:Function_call_interrupts
4789912 -18.9% 3883947 ± 3% interrupts.CPU99.RES:Rescheduling_interrupts
400741 ± 2% -24.3% 303226 ± 5% interrupts.CPU99.TLB:TLB_shootdowns
208.00 -50.0% 104.00 interrupts.MCP:Machine_check_polls
4.569e+08 -16.3% 3.824e+08 interrupts.RES:Rescheduling_interrupts
41767734 ± 2% -24.3% 31635153 ± 5% interrupts.TLB:TLB_shootdowns
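
The interrupt deltas above are derived from /proc/interrupts, which has
one row per interrupt source and one column per CPU. As a sketch, the
RES (rescheduling IPI) total can be cross-checked by sampling the file
before and after a run and subtracting; this assumes the usual
/proc/interrupts layout:

  awk '$1 == "RES:" { for (i = 2; i <= NF; i++)
                          if ($i ~ /^[0-9]+$/) s += $i }
       END { print s }' /proc/interrupts
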



hackbench.throughput

310000 +------------------------------------------------------------------+
| O OO O OO O O O OO O O |
300000 |-OO OO OO O O O |
| |
| |
290000 |-+ |
| .+ |
280000 |-+ +.++. + +.++.++.+ +. |
| : ++.+.+: : ++. .+ +.|
270000 |-+ : : : + +.+ |
| : : : |
| + : : |
260000 |.++.+ :+.+.++.++. .++.++.++ |
| + + + |
250000 +------------------------------------------------------------------+


hackbench.workload

1.9e+09 +----------------------------------------------------------------+
| OO OO OO OO OO O OO OO OO OO O OO |
1.85e+09 |-+ |
| |
| |
1.8e+09 |-+ |
| |
1.75e+09 |-+ ++.++.++.+.+ ++.++.+.++.++.++.++.++.|
| : : : |
1.7e+09 |-+ : : : |
| : : : |
| : : : |
1.65e+09 |-+ : : : |
|.++.++ +.++.++.++.+.++.++.++ |
1.6e+09 +----------------------------------------------------------------+


hackbench.time.user_time

16000 +-------------------------------------------------------------------+
| OO OO O OO O OO OO O OO O OO OO O O |
15800 |-+ |
15600 |-+ |
| |
15400 |-+ |
15200 |-+ |
| +.+.++.+ ++. .++.+ .+.++.+.++.++.|
15000 |-+ +.+ + : + + |
14800 |-+ : : : |
| : : : |
14600 |-+ : : : |
14400 |.+ .+: : : |
| + + +.++.+.++.++.+.++.++.+ |
14200 +-------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample
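
For context, the pipe_read/pipe_write-dominated profile earlier in this
report is characteristic of hackbench run in pipe mode. A hedged
reproduction sketch; the group and loop counts below are illustrative
placeholders only, the real parameters are in the attached job.yaml and
reproduce script:

  # -p: communicate over pipes, -g: number of task groups,
  # -l: messages sent per sender (placeholder values)
  hackbench -p -g 50 -l 10000
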



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


Attachments:
config-5.6.0-rc4-00108-g15e7470dfc3d0 (206.92 kB)
job-script (7.33 kB)
job.yaml (5.06 kB)
reproduce (1.41 kB)