2021-07-26 10:25:37

by Mel Gorman

[permalink] [raw]
Subject: [PATCH 7/9] sched/fair: Enforce proportional scan limits when scanning for an idle core

When scanning for a single CPU, the scan is limited based on the estimated
average idle time for a domain to reduce the risk that more time is spent
scanning for idle CPUs than we are idle for.

With SMT, if an idle core is expected to exist there is no scan depth
limits so the scan depth may or may not be related to average idle time.
Unfortunately has_idle_cores can be very inaccurate when workloads are
rapidly entering/exiting idle (e.g. hackbench).

As the scan depth is now proportional to cores and not CPUs, enforce
SIS_PROP for idle core scans.

The performance impact of this is variable and is neither a universal
gain nor loss. In some cases, has_idle_cores will be cleared prematurely
because the whole domain was not scanned but has_idle_cores is already
known to be an inaccurate heuristic. There is also additional cost because
time calculations are made even for an idle core scan and the delta is
calculated for both scan successes and failures. Finally, SMT siblings
may be used prematurely due to scan depth limitations.

On the flip side, scan depth is now consistent for both core and smt
scans. The reduction in scan depth improves performance in some cases
and wakeup latency is reduced in some cases.

There were few changes identified in the SIS statistics but notably,
"SIS Core Hit" was slightly reduced in tbench as thread counts increased,
presumably due to the core search depth being throttled.

Signed-off-by: Mel Gorman <[email protected]>
---
kernel/sched/fair.c | 33 +++++++++++++++++++--------------
1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 20b9255ebf97..b180205e6b25 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6232,7 +6232,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool

cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);

- if (sched_feat(SIS_PROP) && !has_idle_core) {
+ if (sched_feat(SIS_PROP)) {
u64 avg_cost, avg_idle, span_avg;
unsigned long now = jiffies;

@@ -6265,30 +6265,35 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
if (has_idle_core) {
i = select_idle_core(p, cpu, cpus, &idle_cpu);
if ((unsigned int)i < nr_cpumask_bits)
- return i;
+ break;

+ nr -= sched_smt_weight;
} else {
- if (!--nr)
- return -1;
idle_cpu = __select_idle_cpu(cpu, p);
if ((unsigned int)idle_cpu < nr_cpumask_bits)
break;
+ nr--;
}
+
+ if (nr < 0)
+ break;
}

- if (has_idle_core)
- set_idle_cores(target, false);
+ if ((unsigned int)idle_cpu < nr_cpumask_bits) {
+ if (has_idle_core)
+ set_idle_cores(target, false);

- if (sched_feat(SIS_PROP) && !has_idle_core) {
- time = cpu_clock(this) - time;
+ if (sched_feat(SIS_PROP)) {
+ time = cpu_clock(this) - time;

- /*
- * Account for the scan cost of wakeups against the average
- * idle time.
- */
- this_rq->wake_avg_idle -= min(this_rq->wake_avg_idle, time);
+ /*
+ * Account for the scan cost of wakeups against the average
+ * idle time.
+ */
+ this_rq->wake_avg_idle -= min(this_rq->wake_avg_idle, time);

- update_avg(&this_sd->avg_scan_cost, time);
+ update_avg(&this_sd->avg_scan_cost, time);
+ }
}

return idle_cpu;
--
2.26.2


Subject: RE: [PATCH 7/9] sched/fair: Enforce proportional scan limits when scanning for an idle core



> -----Original Message-----
> From: Mel Gorman [mailto:[email protected]]
> Sent: Monday, July 26, 2021 10:23 PM
> To: LKML <[email protected]>
> Cc: Ingo Molnar <[email protected]>; Peter Zijlstra <[email protected]>;
> Vincent Guittot <[email protected]>; Valentin Schneider
> <[email protected]>; Aubrey Li <[email protected]>; Mel
> Gorman <[email protected]>
> Subject: [PATCH 7/9] sched/fair: Enforce proportional scan limits when scanning
> for an idle core
>
> When scanning for a single CPU, the scan is limited based on the estimated
> average idle time for a domain to reduce the risk that more time is spent
> scanning for idle CPUs than we are idle for.
>
> With SMT, if an idle core is expected to exist there is no scan depth
> limits so the scan depth may or may not be related to average idle time.
> Unfortunately has_idle_cores can be very inaccurate when workloads are
> rapidly entering/exiting idle (e.g. hackbench).
>
> As the scan depth is now proportional to cores and not CPUs, enforce
> SIS_PROP for idle core scans.
>
> The performance impact of this is variable and is neither a universal
> gain nor loss. In some cases, has_idle_cores will be cleared prematurely
> because the whole domain was not scanned but has_idle_cores is already
> known to be an inaccurate heuristic. There is also additional cost because
> time calculations are made even for an idle core scan and the delta is
> calculated for both scan successes and failures. Finally, SMT siblings
> may be used prematurely due to scan depth limitations.
>
> On the flip side, scan depth is now consistent for both core and smt
> scans. The reduction in scan depth improves performance in some cases
> and wakeup latency is reduced in some cases.
>
> There were few changes identified in the SIS statistics but notably,
> "SIS Core Hit" was slightly reduced in tbench as thread counts increased,
> presumably due to the core search depth being throttled.
>
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> kernel/sched/fair.c | 33 +++++++++++++++++++--------------
> 1 file changed, 19 insertions(+), 14 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 20b9255ebf97..b180205e6b25 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6232,7 +6232,7 @@ static int select_idle_cpu(struct task_struct *p, struct
> sched_domain *sd, bool
>
> cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>
> - if (sched_feat(SIS_PROP) && !has_idle_core) {
> + if (sched_feat(SIS_PROP)) {
> u64 avg_cost, avg_idle, span_avg;
> unsigned long now = jiffies;
>
> @@ -6265,30 +6265,35 @@ static int select_idle_cpu(struct task_struct *p, struct
> sched_domain *sd, bool
> if (has_idle_core) {
> i = select_idle_core(p, cpu, cpus, &idle_cpu);
> if ((unsigned int)i < nr_cpumask_bits)
> - return i;
> + break;
>
> + nr -= sched_smt_weight;
> } else {
> - if (!--nr)
> - return -1;
> idle_cpu = __select_idle_cpu(cpu, p);
> if ((unsigned int)idle_cpu < nr_cpumask_bits)
> break;
> + nr--;
> }
> +
> + if (nr < 0)
> + break;
> }
>
> - if (has_idle_core)
> - set_idle_cores(target, false);
> + if ((unsigned int)idle_cpu < nr_cpumask_bits) {
> + if (has_idle_core)
> + set_idle_cores(target, false);
>

For example, if we have 16 cpus(8 SMT2 cores). In case core7 is idle,
we only have scanned core0+core1(cpu0-cpu3) and if these two cores
are not idle, but here we set has_idle_cores to false while core7 is
idle. It seems incorrect.

> - if (sched_feat(SIS_PROP) && !has_idle_core) {
> - time = cpu_clock(this) - time;
> + if (sched_feat(SIS_PROP)) {
> + time = cpu_clock(this) - time;
>
> - /*
> - * Account for the scan cost of wakeups against the average
> - * idle time.
> - */
> - this_rq->wake_avg_idle -= min(this_rq->wake_avg_idle, time);
> + /*
> + * Account for the scan cost of wakeups against the average
> + * idle time.
> + */
> + this_rq->wake_avg_idle -= min(this_rq->wake_avg_idle, time);
>
> - update_avg(&this_sd->avg_scan_cost, time);
> + update_avg(&this_sd->avg_scan_cost, time);
> + }
> }
>
> return idle_cpu;
> --
> 2.26.2


Thanks
Barry


2021-08-04 11:06:34

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 7/9] sched/fair: Enforce proportional scan limits when scanning for an idle core

On Mon, Aug 02, 2021 at 10:52:01AM +0000, Song Bao Hua (Barry Song) wrote:
> > @@ -6265,30 +6265,35 @@ static int select_idle_cpu(struct task_struct *p, struct
> > sched_domain *sd, bool
> > if (has_idle_core) {
> > i = select_idle_core(p, cpu, cpus, &idle_cpu);
> > if ((unsigned int)i < nr_cpumask_bits)
> > - return i;
> > + break;
> >
> > + nr -= sched_smt_weight;
> > } else {
> > - if (!--nr)
> > - return -1;
> > idle_cpu = __select_idle_cpu(cpu, p);
> > if ((unsigned int)idle_cpu < nr_cpumask_bits)
> > break;
> > + nr--;
> > }
> > +
> > + if (nr < 0)
> > + break;
> > }
> >
> > - if (has_idle_core)
> > - set_idle_cores(target, false);
> > + if ((unsigned int)idle_cpu < nr_cpumask_bits) {
> > + if (has_idle_core)
> > + set_idle_cores(target, false);
> >
>
> For example, if we have 16 cpus(8 SMT2 cores). In case core7 is idle,
> we only have scanned core0+core1(cpu0-cpu3) and if these two cores
> are not idle, but here we set has_idle_cores to false while core7 is
> idle. It seems incorrect.
>

Yep, that block needs to be revisited.

--
Mel Gorman
SUSE Labs