2020-12-03 14:16:55

by Mel Gorman

Subject: [RFC PATCH 00/10] Reduce time complexity of select_idle_sibling

This is an early prototype that has not been tested heavily. While parts
of it may stand on its own, the motivation to release early is Aubrey
Li's series on using an idle cpumask to optimise the search and Barry
Song's series on representing clusters on die. The series is based on
tip/sched/core rebased to 5.10-rc6.

Patches 1-2 add schedstats to track the search efficiency of
select_idle_sibling. They can be dropped from the final version but
are useful when looking at select_idle_sibling in general. MMTests
can already parse the stats and generate useful data including
graphs over time.

Patch 3 kills SIS_AVG_CPU, but similar logic is partially reintroduced
later in the context of SIS_PROP.

Patch 4 notes that select_idle_core() can find an idle CPU that is
not a free core yet it is ignored and a second search is conducted
in select_idle_cpu() which is wasteful. Note that this patch
will definitely change in the final version.

Patch 5 adjusts p->recent_used_cpu so that it has a higher success rate
and avoids searching the domain in some cases.

Patch 6 notes that select_idle_* always starts with a CPU that is
definitely not idle and fixes that.

Patch 7 notes that SIS_PROP is only partially accounting for search
costs. While this might be accidentally beneficial, it makes it
much harder to reason about the effectiveness of SIS_PROP.

Patch 8 uses similar logic to SIS_AVG_CPU but in the context of
SIS_PROP to throttle the search depth.

Patches 9 and 10 are stupid in the context of this series. They
are included even though it makes no sense to use SIS_PROP logic in
select_idle_core() as it already has throttling logic. The point
is to illustrate that the select_idle_mask can be initialised
at the start of a domain search and then used to mask out CPUs that
have already been visited.

In the context of Aubrey's and Barry's work, select_idle_mask would
be initialised *after* select_idle_core as select_idle_core uses
select_idle_mask for its own purposes. In Aubrey's case, the next
step would be to scan idle_cpus_span as those CPUs may still be idle
and bias the search towards likely idle candidates. If that fails,
select_idle_mask clears all the bits set in idle_cpus_span and then
scans the remainder. Similar observations apply to Barry's work: scan the
local domain first, mask out those bits, then scan the remaining CPUs in
the cluster.
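
The two-stage search described above can be modelled in userspace. The
following is only a minimal sketch, assuming at most 64 CPUs held in a
plain 64-bit mask; scan_wrap(), two_phase_search() and the "hint"
parameter (standing in for idle_cpus_span) are hypothetical names for
illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t mask_t;

/*
 * Walk the CPUs set in 'mask', starting at 'start' and wrapping around.
 * Return the first CPU that is also set in 'idle', or -1 if none is.
 * '*scanned' is incremented for every CPU that is examined.
 */
static int scan_wrap(mask_t mask, mask_t idle, int start, int ncpus,
                     int *scanned)
{
    for (int i = 0; i < ncpus; i++) {
        int cpu = (start + i) % ncpus;

        if (!(mask & (1ULL << cpu)))
            continue;
        (*scanned)++;
        if (idle & (1ULL << cpu))
            return cpu;
    }
    return -1;
}

/*
 * Stage 1 scans only the CPUs in 'hint' (possibly stale "likely idle"
 * information). Stage 2 clears those bits and scans what is left, so
 * no CPU is examined twice.
 */
static int two_phase_search(mask_t domain, mask_t hint, mask_t idle,
                            int target, int ncpus, int *scanned)
{
    mask_t remaining;
    int cpu;

    assert(ncpus > 0 && ncpus <= 64);
    assert(target >= 0 && target < ncpus);

    /* The target is known to be busy, so never scan it. */
    remaining = domain & ~(1ULL << target);
    *scanned = 0;

    cpu = scan_wrap(remaining & hint, idle, target, ncpus, scanned);
    if (cpu >= 0)
        return cpu;

    remaining &= ~hint;
    return scan_wrap(remaining, idle, target, ncpus, scanned);
}
```

With 8 CPUs, target 0 and a hint naming CPUs 2 and 5, a stale hint costs
two extra checks before the remainder is scanned, while an accurate hint
ends the search after at most two checks.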

The final version of this series will drop patches 1-2 unless there is
demand and definitely drop patches 9-10. However, all 4 patches may be
useful in the context of Aubrey's and Barry's work. Patches 1-2 would
give more precise results on exactly how much they are improving "SIS
Domain Search Efficiency" which may be more illustrative than just the
headline performance figures of a given workload. The final version of
this series will also adjust patch 4. If select_idle_core() runs at all
then it definitely should return a CPU -- either an idle CPU or the target
as it has already searched the entire domain and no further searching
should be conducted. Barry might change that back so that a cluster can
be scanned but it would be done in the context of the cluster series.

--
2.26.2


2020-12-03 14:17:28

by Mel Gorman

Subject: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

The target CPU is definitely not idle in both select_idle_core and
select_idle_cpu. For select_idle_core(), the SMT is potentially
checked unnecessarily as the core is definitely not idle if the
target is busy. For select_idle_cpu(), the first CPU checked is
simply a waste.

Signed-off-by: Mel Gorman <[email protected]>
---
kernel/sched/fair.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 68dd9cd62fbd..1d8f5c4b4936 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6077,6 +6077,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 		return -1;

 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+	__cpumask_clear_cpu(target, cpus);

 	for_each_cpu_wrap(core, cpus, target) {
 		bool idle = true;
@@ -6181,6 +6182,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	time = cpu_clock(this);

 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+	__cpumask_clear_cpu(target, cpus);

 	for_each_cpu_wrap(cpu, cpus, target) {
 		schedstat_inc(this_rq()->sis_scanned);
--
2.26.2

2020-12-03 14:23:12

by Mel Gorman

Subject: [PATCH 10/10] sched/fair: Avoid revisiting CPUs multiple times during select_idle_sibling

Note: While this is done in the context of select_idle_core(), I would not
expect it to be done like this. The intent is to illustrate how
idle_cpu_mask could be filtered before select_idle_cpu() scans
the rest of a domain or before a wider scan is done across a cluster.

select_idle_core() potentially searches a number of CPUs for idle candidates
before select_idle_cpu() clears the mask and revisits the same CPUs. This
patch moves the initialisation of select_idle_mask to the top-level and
reuses the same mask across both select_idle_core and select_idle_cpu.
select_idle_smt() is left alone as the cost of checking one SMT sibling
is marginal relative to calling __cpumask_clear_cpu() for every CPU
visited by select_idle_core().

Signed-off-by: Mel Gorman <[email protected]>
---
kernel/sched/fair.c | 29 ++++++++++++++++-------------
1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cd95daf9f53e..af2e108c20c0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6096,10 +6096,9 @@ void __update_idle_core(struct rq *rq)
* sd_llc->shared->has_idle_cores and enabled through update_idle_core() above.
*/
static int select_idle_core(struct task_struct *p, struct sched_domain *sd,
- int target, int nr)
+ int target, int nr, struct cpumask *cpus)
{
int idle_candidate = -1;
- struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
int core, cpu;

if (!static_branch_likely(&sched_smt_present))
@@ -6108,9 +6107,6 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd,
if (!test_idle_cores(target, false))
return -1;

- cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
- __cpumask_clear_cpu(target, cpus);
-
for_each_cpu_wrap(core, cpus, target) {
bool idle = true;

@@ -6175,7 +6171,7 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
#else /* CONFIG_SCHED_SMT */

static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd,
- int target, int nr)
+ int target, int nr, struct cpumask *cpus)
{
return -1;
}
@@ -6193,14 +6189,10 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
* average idle time for this rq (as found in rq->avg_idle).
*/
static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd,
- int target, int nr)
+ int target, int nr, struct cpumask *cpus)
{
- struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
int cpu;

- cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
- __cpumask_clear_cpu(target, cpus);
-
for_each_cpu_wrap(cpu, cpus, target) {
schedstat_inc(this_rq()->sis_scanned);
if (!--nr)
@@ -6260,6 +6252,7 @@ static inline bool asym_fits_capacity(int task_util, int cpu)
static int select_idle_sibling(struct task_struct *p, int prev, int target)
{
struct sched_domain *sd, *this_sd;
+ struct cpumask *cpus_visited;
unsigned long task_util;
int i, recent_used_cpu, depth;
u64 time;
@@ -6358,13 +6351,23 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)

depth = sis_search_depth(sd, this_sd);

+ /*
+ * Init the select_idle_mask. select_idle_core() will mask
+ * out the CPUs that have already been visited to limit the
+ * search in select_idle_cpu(). Further clearing is not
+ * done as select_idle_smt checks only one CPU.
+ */
+ cpus_visited = this_cpu_cpumask_var_ptr(select_idle_mask);
+ cpumask_and(cpus_visited, sched_domain_span(sd), p->cpus_ptr);
+ __cpumask_clear_cpu(target, cpus_visited);
+
schedstat_inc(this_rq()->sis_domain_search);
- i = select_idle_core(p, sd, target, depth);
+ i = select_idle_core(p, sd, target, depth, cpus_visited);
if ((unsigned)i < nr_cpumask_bits)
return i;

time = cpu_clock(smp_processor_id());
- i = select_idle_cpu(p, sd, target, depth);
+ i = select_idle_cpu(p, sd, target, depth, cpus_visited);
if ((unsigned)i < nr_cpumask_bits)
goto acct_cost;

--
2.26.2

2020-12-03 14:23:15

by Mel Gorman

Subject: [PATCH 09/10] sched/fair: Limit the search for an idle core

Note: This is a bad idea, it's for illustration only to show how the
search space can be filtered at each stage. Searching an
idle_cpu_mask would be a potential option. select_idle_core()
would be left alone as it has its own throttling mechanism.

select_idle_core() may search a full domain for an idle core even if idle
CPUs exist, resulting in an excessive search. This patch partially limits
the search for an idle core, similar to select_idle_cpu(), once an idle
candidate is found.

Note that this patch can *increase* the number of runqueues considered.
Any searching done by select_idle_core() is duplicated by select_idle_cpu()
if an idle candidate is not found. If there is an idle CPU then aborting
select_idle_core() can have a negative impact. This is addressed in the
next patch.

Signed-off-by: Mel Gorman <[email protected]>
---
kernel/sched/fair.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 33ce65b67381..cd95daf9f53e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6095,7 +6095,8 @@ void __update_idle_core(struct rq *rq)
* there are no idle cores left in the system; tracked through
* sd_llc->shared->has_idle_cores and enabled through update_idle_core() above.
*/
-static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_core(struct task_struct *p, struct sched_domain *sd,
+ int target, int nr)
{
int idle_candidate = -1;
struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
@@ -6115,6 +6116,11 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int

for_each_cpu(cpu, cpu_smt_mask(core)) {
schedstat_inc(this_rq()->sis_scanned);
+
+ /* Apply limits if there is an idle candidate */
+ if (idle_candidate != -1)
+ nr--;
+
if (!available_idle_cpu(cpu)) {
idle = false;
if (idle_candidate != -1)
@@ -6130,6 +6136,9 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
if (idle)
return core;

+ if (!nr)
+ break;
+
cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
}

@@ -6165,7 +6174,8 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t

#else /* CONFIG_SCHED_SMT */

-static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
+static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd,
+ int target, int nr)
{
return -1;
}
@@ -6349,7 +6359,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
depth = sis_search_depth(sd, this_sd);

schedstat_inc(this_rq()->sis_domain_search);
- i = select_idle_core(p, sd, target);
+ i = select_idle_core(p, sd, target, depth);
if ((unsigned)i < nr_cpumask_bits)
return i;

--
2.26.2
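
The throttling in patch 09 can be modelled in userspace. What follows is
a minimal sketch, not the kernel code: it assumes a hypothetical topology
of 4 cores with 2 SMT siblings each (CPU c and CPU c + 4), and
find_idle_core() and struct core_result are names made up for the
illustration. One deliberate difference from the diff above is that the
sketch tests "nr <= 0" rather than "!nr", so that the budget cannot be
stepped over when more than one sibling is charged per core.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical topology: core c owns CPUs c and c + NR_CORES. */
#define NR_CORES 4
#define NR_CPUS  (2 * NR_CORES)

struct core_result {
    int cpu;      /* idle core, fallback idle CPU, or -1 */
    int scanned;  /* number of CPUs examined */
};

/*
 * Look for a fully idle core. The first idle CPU seen is remembered as
 * a fallback. Once a fallback exists, every further CPU examined is
 * charged against 'nr' and the search stops when the budget runs out.
 */
static struct core_result find_idle_core(const bool idle[NR_CPUS],
                                         int target, int nr)
{
    struct core_result res = { -1, 0 };
    int idle_candidate = -1;

    assert(nr > 0);
    assert(target >= 0 && target < NR_CPUS);

    for (int i = 0; i < NR_CORES; i++) {
        int core = (target % NR_CORES + i) % NR_CORES;
        bool core_idle = true;

        for (int s = 0; s < 2; s++) {
            int cpu = core + s * NR_CORES;

            res.scanned++;

            /* Apply limits only if there is an idle candidate. */
            if (idle_candidate != -1)
                nr--;

            if (!idle[cpu]) {
                core_idle = false;
                /* Busy core and a fallback exists: skip the rest. */
                if (idle_candidate != -1)
                    break;
            } else if (idle_candidate == -1) {
                idle_candidate = cpu;
            }
        }

        if (core_idle) {
            res.cpu = core;
            return res;
        }

        if (nr <= 0)
            break;
    }

    res.cpu = idle_candidate;
    return res;
}
```

With CPUs 3, 5 and 7 idle and target 0, a budget of 1 settles for CPU 5
after five checks, whereas a large budget goes on to find the idle core
made up of CPUs 3 and 7 after seven checks.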

2020-12-03 17:56:54

by Mel Gorman

Subject: Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

On Thu, Dec 03, 2020 at 05:38:03PM +0100, Vincent Guittot wrote:
> On Thu, 3 Dec 2020 at 15:11, Mel Gorman <[email protected]> wrote:
> >
> > The target CPU is definitely not idle in both select_idle_core and
> > select_idle_cpu. For select_idle_core(), the SMT is potentially
> > checked unnecessarily as the core is definitely not idle if the
> > target is busy. For select_idle_cpu(), the first CPU checked is
> > simply a waste.
>
> >
> > Signed-off-by: Mel Gorman <[email protected]>
> > ---
> > kernel/sched/fair.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 68dd9cd62fbd..1d8f5c4b4936 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6077,6 +6077,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
> > return -1;
> >
> > cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > + __cpumask_clear_cpu(target, cpus);
>
> should clear cpu_smt_mask(target) as we are sure that the core will not be idle
>

The intent was that the sibling might still be an idle candidate. In
the current draft of the series, I do not even clear this so that the
SMT sibling is considered as an idle candidate. The reasoning is that if
there are no idle cores then an SMT sibling of the target is as good an
idle CPU to select as any.

--
Mel Gorman
SUSE Labs

2020-12-03 19:18:40

by Vincent Guittot

Subject: Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

On Thu, 3 Dec 2020 at 15:11, Mel Gorman <[email protected]> wrote:
>
> The target CPU is definitely not idle in both select_idle_core and
> select_idle_cpu. For select_idle_core(), the SMT is potentially
> checked unnecessarily as the core is definitely not idle if the
> target is busy. For select_idle_cpu(), the first CPU checked is
> simply a waste.

>
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> kernel/sched/fair.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 68dd9cd62fbd..1d8f5c4b4936 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6077,6 +6077,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
> return -1;
>
> cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> + __cpumask_clear_cpu(target, cpus);

should clear cpu_smt_mask(target) as we are sure that the core will not be idle

>
> for_each_cpu_wrap(core, cpus, target) {
> bool idle = true;
> @@ -6181,6 +6182,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> time = cpu_clock(this);
>
> cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> + __cpumask_clear_cpu(target, cpus);
>
> for_each_cpu_wrap(cpu, cpus, target) {
> schedstat_inc(this_rq()->sis_scanned);
> --
> 2.26.2
>

2020-12-04 11:00:38

by Vincent Guittot

Subject: Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

On Thu, 3 Dec 2020 at 18:52, Mel Gorman <[email protected]> wrote:
>
> On Thu, Dec 03, 2020 at 05:38:03PM +0100, Vincent Guittot wrote:
> > On Thu, 3 Dec 2020 at 15:11, Mel Gorman <[email protected]> wrote:
> > >
> > > The target CPU is definitely not idle in both select_idle_core and
> > > select_idle_cpu. For select_idle_core(), the SMT is potentially
> > > checked unnecessarily as the core is definitely not idle if the
> > > target is busy. For select_idle_cpu(), the first CPU checked is
> > > simply a waste.
> >
> > >
> > > Signed-off-by: Mel Gorman <[email protected]>
> > > ---
> > > kernel/sched/fair.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 68dd9cd62fbd..1d8f5c4b4936 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -6077,6 +6077,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
> > > return -1;
> > >
> > > cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > > + __cpumask_clear_cpu(target, cpus);
> >
> > should clear cpu_smt_mask(target) as we are sure that the core will not be idle
> >
>
> The intent was that the sibling might still be an idle candidate. In
> the current draft of the series, I do not even clear this so that the
> SMT sibling is considered as an idle candidate. The reasoning is that if
> there are no idle cores then an SMT sibling of the target is as good an
> idle CPU to select as any.

Isn't that the purpose of select_idle_smt?

select_idle_core() looks for an idle core and opportunistically saves
an idle CPU candidate to skip select_idle_cpu. In this case these are
useless loops for select_idle_core() because we are sure that the core
is not idle.


>
> --
> Mel Gorman
> SUSE Labs

2020-12-04 11:34:22

by Mel Gorman

Subject: Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

On Fri, Dec 04, 2020 at 11:56:36AM +0100, Vincent Guittot wrote:
> > The intent was that the sibling might still be an idle candidate. In
> > the current draft of the series, I do not even clear this so that the
> > SMT sibling is considered as an idle candidate. The reasoning is that if
> > there are no idle cores then an SMT sibling of the target is as good an
> > idle CPU to select as any.
>
> Isn't the purpose of select_idle_smt ?
>

Only in part.

> select_idle_core() looks for an idle core and opportunistically saves
> an idle CPU candidate to skip select_idle_cpu. In this case this is
> useless loops for select_idle_core() because we are sure that the core
> is not idle
>

If select_idle_core() finds an idle candidate other than the sibling,
it'll use it if there is no idle core -- it picks a busy sibling based
on a linear walk of the cpumask. Similarly, select_idle_cpu() is not
guaranteed to scan the sibling first (ordering) or even reach the sibling
(throttling). select_idle_smt() is a last-ditch effort.
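
The ordering and throttling point can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: 8 CPUs
where the sibling of CPU c is (c + 4) % 8, and throttled_scan() and
check_sibling() are hypothetical stand-ins for the roles played by
select_idle_cpu() and select_idle_smt(), not their implementations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical topology: 8 CPUs, sibling of CPU c is (c + 4) % 8. */
#define NR_CPUS  8
#define NR_CORES 4

static int sibling_of(int cpu)
{
    return (cpu + NR_CORES) % NR_CPUS;
}

/*
 * Wrapped scan of 'mask' starting at 'target'. The budget is consumed
 * with the same "if (!--nr)" pattern as a depth-limited scan, so a
 * budget of nr examines at most nr - 1 CPUs.
 */
static int throttled_scan(uint64_t mask, uint64_t idle, int target, int nr)
{
    assert(target >= 0 && target < NR_CPUS);

    for (int i = 0; i < NR_CPUS; i++) {
        int cpu = (target + i) % NR_CPUS;

        if (!(mask & (1ULL << cpu)))
            continue;
        if (!--nr)
            return -1;
        if (idle & (1ULL << cpu))
            return cpu;
    }
    return -1;
}

/* Last-ditch check of the target's sibling only. */
static int check_sibling(uint64_t idle, int target)
{
    int sib = sibling_of(target);

    return (idle & (1ULL << sib)) ? sib : -1;
}
```

With target 1 and only its sibling CPU 5 idle, a scan with a budget of 4
gives up at CPU 5 without testing it, a budget of 5 reaches it, and the
sibling-only check finds it in one step.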

--
Mel Gorman
SUSE Labs

2020-12-04 13:17:18

by Vincent Guittot

Subject: Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

On Fri, 4 Dec 2020 at 12:30, Mel Gorman <[email protected]> wrote:
>
> On Fri, Dec 04, 2020 at 11:56:36AM +0100, Vincent Guittot wrote:
> > > The intent was that the sibling might still be an idle candidate. In
> > > the current draft of the series, I do not even clear this so that the
> > > SMT sibling is considered as an idle candidate. The reasoning is that if
> > > there are no idle cores then an SMT sibling of the target is as good an
> > > idle CPU to select as any.
> >
> > Isn't the purpose of select_idle_smt ?
> >
>
> Only in part.
>
> > select_idle_core() looks for an idle core and opportunistically saves
> > an idle CPU candidate to skip select_idle_cpu. In this case this is
> > useless loops for select_idle_core() because we are sure that the core
> > is not idle
> >
>
> If select_idle_core() finds an idle candidate other than the sibling,
> it'll use it if there is no idle core -- it picks a busy sibling based
> on a linear walk of the cpumask. Similarly, select_idle_cpu() is not

My point is that it's a waste of time to loop over the sibling cpus of
target in select_idle_core because it will not help to find an idle
core. The sibling cpus will then be checked either by select_idle_cpu
or select_idle_smt.

> guaranteed to scan the sibling first (ordering) or even reach the sibling
> (throttling). select_idle_smt() is a last-ditch effort.
>
> --
> Mel Gorman
> SUSE Labs

2020-12-04 13:22:15

by Vincent Guittot

Subject: Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

On Fri, 4 Dec 2020 at 14:13, Vincent Guittot <[email protected]> wrote:
>
> On Fri, 4 Dec 2020 at 12:30, Mel Gorman <[email protected]> wrote:
> >
> > On Fri, Dec 04, 2020 at 11:56:36AM +0100, Vincent Guittot wrote:
> > > > The intent was that the sibling might still be an idle candidate. In
> > > > the current draft of the series, I do not even clear this so that the
> > > > SMT sibling is considered as an idle candidate. The reasoning is that if
> > > > there are no idle cores then an SMT sibling of the target is as good an
> > > > idle CPU to select as any.
> > >
> > > Isn't the purpose of select_idle_smt ?
> > >
> >
> > Only in part.
> >
> > > select_idle_core() looks for an idle core and opportunistically saves
> > > an idle CPU candidate to skip select_idle_cpu. In this case this is
> > > useless loops for select_idle_core() because we are sure that the core
> > > is not idle
> > >
> >
> > If select_idle_core() finds an idle candidate other than the sibling,
> > it'll use it if there is no idle core -- it picks a busy sibling based
> > on a linear walk of the cpumask. Similarly, select_idle_cpu() is not
>
> My point is that it's a waste of time to loop the sibling cpus of
> target in select_idle_core because it will not help to find an idle
> core. The sibling cpus will then be check either by select_idle_cpu
> of select_idle_smt

Also, while looping over the cpumask, the sibling cpus of a non-idle cpu
are removed and will not be checked.

>
> > guaranteed to scan the sibling first (ordering) or even reach the sibling
> > (throttling). select_idle_smt() is a last-ditch effort.
> >
> > --
> > Mel Gorman
> > SUSE Labs

2020-12-04 13:47:45

by Li, Aubrey

Subject: Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

On 2020/12/4 21:17, Vincent Guittot wrote:
> On Fri, 4 Dec 2020 at 14:13, Vincent Guittot <[email protected]> wrote:
>>
>> On Fri, 4 Dec 2020 at 12:30, Mel Gorman <[email protected]> wrote:
>>>
>>> On Fri, Dec 04, 2020 at 11:56:36AM +0100, Vincent Guittot wrote:
>>>>> The intent was that the sibling might still be an idle candidate. In
>>>>> the current draft of the series, I do not even clear this so that the
>>>>> SMT sibling is considered as an idle candidate. The reasoning is that if
>>>>> there are no idle cores then an SMT sibling of the target is as good an
>>>>> idle CPU to select as any.
>>>>
>>>> Isn't the purpose of select_idle_smt ?
>>>>
>>>
>>> Only in part.
>>>
>>>> select_idle_core() looks for an idle core and opportunistically saves
>>>> an idle CPU candidate to skip select_idle_cpu. In this case this is
>>>> useless loops for select_idle_core() because we are sure that the core
>>>> is not idle
>>>>
>>>
>>> If select_idle_core() finds an idle candidate other than the sibling,
>>> it'll use it if there is no idle core -- it picks a busy sibling based
>>> on a linear walk of the cpumask. Similarly, select_idle_cpu() is not
>>
>> My point is that it's a waste of time to loop the sibling cpus of
>> target in select_idle_core because it will not help to find an idle
>> core. The sibling cpus will then be check either by select_idle_cpu
>> of select_idle_smt
>
> also, while looping the cpumask, the sibling cpus of not idle cpu are
> removed and will not be check
>

IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
If the target's sibling is removed from select_idle_mask from select_idle_core(),
select_idle_cpu() will lose the chance to pick it up?

Thanks,
-Aubrey

2020-12-04 13:50:57

by Vincent Guittot

Subject: Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

On Fri, 4 Dec 2020 at 14:40, Li, Aubrey <[email protected]> wrote:
>
> On 2020/12/4 21:17, Vincent Guittot wrote:
> > On Fri, 4 Dec 2020 at 14:13, Vincent Guittot <[email protected]> wrote:
> >>
> >> On Fri, 4 Dec 2020 at 12:30, Mel Gorman <[email protected]> wrote:
> >>>
> >>> On Fri, Dec 04, 2020 at 11:56:36AM +0100, Vincent Guittot wrote:
> >>>>> The intent was that the sibling might still be an idle candidate. In
> >>>>> the current draft of the series, I do not even clear this so that the
> >>>>> SMT sibling is considered as an idle candidate. The reasoning is that if
> >>>>> there are no idle cores then an SMT sibling of the target is as good an
> >>>>> idle CPU to select as any.
> >>>>
> >>>> Isn't the purpose of select_idle_smt ?
> >>>>
> >>>
> >>> Only in part.
> >>>
> >>>> select_idle_core() looks for an idle core and opportunistically saves
> >>>> an idle CPU candidate to skip select_idle_cpu. In this case this is
> >>>> useless loops for select_idle_core() because we are sure that the core
> >>>> is not idle
> >>>>
> >>>
> >>> If select_idle_core() finds an idle candidate other than the sibling,
> >>> it'll use it if there is no idle core -- it picks a busy sibling based
> >>> on a linear walk of the cpumask. Similarly, select_idle_cpu() is not
> >>
> >> My point is that it's a waste of time to loop the sibling cpus of
> >> target in select_idle_core because it will not help to find an idle
> >> core. The sibling cpus will then be check either by select_idle_cpu
> >> of select_idle_smt
> >
> > also, while looping the cpumask, the sibling cpus of not idle cpu are
> > removed and will not be check
> >
>
> IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
> If the target's sibling is removed from select_idle_mask from select_idle_core(),
> select_idle_cpu() will lose the chance to pick it up?

This is only relevant for patch 10, which is not to be included, IIUC
what Mel said in the cover letter: "Patches 9 and 10 are stupid in the
context of this series."

>
> Thanks,
> -Aubrey

2020-12-04 13:53:40

by Li, Aubrey

Subject: Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

On 2020/12/4 21:40, Li, Aubrey wrote:
> On 2020/12/4 21:17, Vincent Guittot wrote:
>> On Fri, 4 Dec 2020 at 14:13, Vincent Guittot <[email protected]> wrote:
>>>
>>> On Fri, 4 Dec 2020 at 12:30, Mel Gorman <[email protected]> wrote:
>>>>
>>>> On Fri, Dec 04, 2020 at 11:56:36AM +0100, Vincent Guittot wrote:
>>>>>> The intent was that the sibling might still be an idle candidate. In
>>>>>> the current draft of the series, I do not even clear this so that the
>>>>>> SMT sibling is considered as an idle candidate. The reasoning is that if
>>>>>> there are no idle cores then an SMT sibling of the target is as good an
>>>>>> idle CPU to select as any.
>>>>>
>>>>> Isn't the purpose of select_idle_smt ?
>>>>>
>>>>
>>>> Only in part.
>>>>
>>>>> select_idle_core() looks for an idle core and opportunistically saves
>>>>> an idle CPU candidate to skip select_idle_cpu. In this case this is
>>>>> useless loops for select_idle_core() because we are sure that the core
>>>>> is not idle
>>>>>
>>>>
>>>> If select_idle_core() finds an idle candidate other than the sibling,
>>>> it'll use it if there is no idle core -- it picks a busy sibling based
>>>> on a linear walk of the cpumask. Similarly, select_idle_cpu() is not
>>>
>>> My point is that it's a waste of time to loop the sibling cpus of
>>> target in select_idle_core because it will not help to find an idle
>>> core. The sibling cpus will then be check either by select_idle_cpu
>>> of select_idle_smt
>>
>> also, while looping the cpumask, the sibling cpus of not idle cpu are
>> removed and will not be check
>>
>
> IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
> If the target's sibling is removed from select_idle_mask from select_idle_core(),
> select_idle_cpu() will lose the chance to pick it up?

aha, no, select_idle_mask will be re-assigned in select_idle_cpu() by:

cpumask_and(cpus, sds_idle_cpus(sd->shared), p->cpus_ptr);

So, yes, I guess we can remove the cpu_smt_mask(target) from select_idle_core() safely.

>
> Thanks,
> -Aubrey
>

2020-12-04 14:13:57

by Li, Aubrey

Subject: Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

On 2020/12/4 21:47, Vincent Guittot wrote:
> On Fri, 4 Dec 2020 at 14:40, Li, Aubrey <[email protected]> wrote:
>>
>> On 2020/12/4 21:17, Vincent Guittot wrote:
>>> On Fri, 4 Dec 2020 at 14:13, Vincent Guittot <[email protected]> wrote:
>>>>
>>>> On Fri, 4 Dec 2020 at 12:30, Mel Gorman <[email protected]> wrote:
>>>>>
>>>>> On Fri, Dec 04, 2020 at 11:56:36AM +0100, Vincent Guittot wrote:
>>>>>>> The intent was that the sibling might still be an idle candidate. In
>>>>>>> the current draft of the series, I do not even clear this so that the
>>>>>>> SMT sibling is considered as an idle candidate. The reasoning is that if
>>>>>>> there are no idle cores then an SMT sibling of the target is as good an
>>>>>>> idle CPU to select as any.
>>>>>>
>>>>>> Isn't the purpose of select_idle_smt ?
>>>>>>
>>>>>
>>>>> Only in part.
>>>>>
>>>>>> select_idle_core() looks for an idle core and opportunistically saves
>>>>>> an idle CPU candidate to skip select_idle_cpu. In this case this is
>>>>>> useless loops for select_idle_core() because we are sure that the core
>>>>>> is not idle
>>>>>>
>>>>>
>>>>> If select_idle_core() finds an idle candidate other than the sibling,
>>>>> it'll use it if there is no idle core -- it picks a busy sibling based
>>>>> on a linear walk of the cpumask. Similarly, select_idle_cpu() is not
>>>>
>>>> My point is that it's a waste of time to loop the sibling cpus of
>>>> target in select_idle_core because it will not help to find an idle
>>>> core. The sibling cpus will then be check either by select_idle_cpu
>>>> of select_idle_smt
>>>
>>> also, while looping the cpumask, the sibling cpus of not idle cpu are
>>> removed and will not be check
>>>
>>
>> IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
>> If the target's sibling is removed from select_idle_mask from select_idle_core(),
>> select_idle_cpu() will lose the chance to pick it up?
>
> This is only relevant for patch 10 which is not to be included IIUC
> what mel said in cover letter : "Patches 9 and 10 are stupid in the
> context of this series."

So the target's sibling can be removed from the cpumask in select_idle_core
in patch 6, and needs to be added back in select_idle_core in patch 10 :)

2020-12-04 14:31:18

by Mel Gorman

Subject: Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

On Fri, Dec 04, 2020 at 02:17:20PM +0100, Vincent Guittot wrote:
> On Fri, 4 Dec 2020 at 14:13, Vincent Guittot <[email protected]> wrote:
> >
> > On Fri, 4 Dec 2020 at 12:30, Mel Gorman <[email protected]> wrote:
> > >
> > > On Fri, Dec 04, 2020 at 11:56:36AM +0100, Vincent Guittot wrote:
> > > > > The intent was that the sibling might still be an idle candidate. In
> > > > > the current draft of the series, I do not even clear this so that the
> > > > > SMT sibling is considered as an idle candidate. The reasoning is that if
> > > > > there are no idle cores then an SMT sibling of the target is as good an
> > > > > idle CPU to select as any.
> > > >
> > > > Isn't the purpose of select_idle_smt ?
> > > >
> > >
> > > Only in part.
> > >
> > > > select_idle_core() looks for an idle core and opportunistically saves
> > > > an idle CPU candidate to skip select_idle_cpu. In this case this is
> > > > useless loops for select_idle_core() because we are sure that the core
> > > > is not idle
> > > >
> > >
> > > If select_idle_core() finds an idle candidate other than the sibling,
> > > it'll use it if there is no idle core -- it picks a busy sibling based
> > > on a linear walk of the cpumask. Similarly, select_idle_cpu() is not
> >
> > My point is that it's a waste of time to loop the sibling cpus of
> > target in select_idle_core because it will not help to find an idle
> > core. The sibling cpus will then be check either by select_idle_cpu
> > of select_idle_smt
>

I understand and you're right, the full loop was in the context of a series
that unified select_idle_* where it made sense. The version I'm currently
testing aborts the SMT search if a !idle sibling is encountered. That
means that select_idle_core() will no longer scan the entire domain if
there are no idle cores.

https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git/commit/?h=sched-sissearch-v2r6&id=eb04a344cf7d7ca64c0c8fc0bcade261fa08c19e

With the patch on its own, it does mean that select_idle_sibling
starts over because SMT siblings might have been cleared. As an aside,
select_idle_core() has its own problems even then. It can start a scan
for an idle sibling when cpu_rq(target)->nr_running is very large --
100+ running tasks, which is almost certainly a useless scan for
cores. However, I haven't done anything with that in this series as it
seemed like it would be follow-up work.

> also, while looping over the cpumask, the sibling cpus of a non-idle cpu
> are removed and will not be checked
>

True and I spotted this. I think the load_balance_mask can be abused to
clear siblings during select_idle_core() while using select_idle_mask to
track CPUs that have not been scanned yet so select_idle_cpu only scans
CPUs that have not already been visited.

https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git/commit/?h=sched-sissearch-v2r6&id=a6e986dae38855e3be26dfde86bbef1617431dd1
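
One possible reading of the two-mask idea, sketched in userspace C. This
is an assumption-laden model and not the linked commit: "cores" plays the
role of the scratch mask from which whole cores are cleared during the
core search, "unvisited" plays the role of select_idle_mask, and the
helper names and the fixed SMT_WIDTH of 2 are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SMT_WIDTH 2	/* assumed toy topology: siblings are adjacent CPUs */

/* Mask covering all SMT siblings of the core that owns @cpu. */
static uint64_t core_siblings(int cpu)
{
	int first = cpu - (cpu % SMT_WIDTH);

	return ((UINT64_C(1) << SMT_WIDTH) - 1) << first;
}

/*
 * Core search. Returns the first CPU of a fully idle core or -1.
 * Every CPU whose idle state was actually checked is cleared from
 * *unvisited; whole cores are cleared from the local scratch mask.
 */
static int search_idle_core(uint64_t idle, uint64_t span, uint64_t *unvisited)
{
	uint64_t cores = span;

	assert(unvisited && (*unvisited & ~span) == 0);

	for (int cpu = 0; cpu < 64; cpu++) {
		uint64_t sibs;
		int core_is_idle = 1;

		if (!(cores & (UINT64_C(1) << cpu)))
			continue;

		sibs = core_siblings(cpu) & span;
		for (int s = 0; s < 64; s++) {
			uint64_t bit = UINT64_C(1) << s;

			if (!(sibs & bit))
				continue;
			*unvisited &= ~bit;	/* this CPU has been checked */
			if (!(idle & bit)) {
				core_is_idle = 0;
				break;		/* busy sibling, give up on core */
			}
		}
		cores &= ~sibs;			/* never revisit this core */

		if (core_is_idle)
			return cpu;
	}

	return -1;
}

/* CPU search restricted to CPUs the core search did not check. */
static int search_idle_cpu(uint64_t idle, uint64_t unvisited)
{
	for (int cpu = 0; cpu < 64; cpu++) {
		uint64_t bit = UINT64_C(1) << cpu;

		if ((unvisited & bit) && (idle & bit))
			return cpu;
	}

	return -1;
}
```

With 8 CPUs and only CPU 3 idle, the core search in this model checks
CPUs 0, 2, 4 and 6 and fails, and the CPU search then only considers
CPUs 1, 3, 5 and 7.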

As both the idle candidate and the load_balance_mask abuse are likely to
be controversial, I shuffled the series so that it's ordered from least
controversial to most controversial.

This
https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git/log/?h=sched-sissearch-v2r6
is what is currently being tested. It'll take most of the weekend and I'll
post them properly if they pass tests and do not throw up nasty surprises.

--
Mel Gorman
SUSE Labs

2020-12-04 14:34:50

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

On Fri, Dec 04, 2020 at 02:47:48PM +0100, Vincent Guittot wrote:
> > IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
> > If the target's sibling is removed from select_idle_mask from select_idle_core(),
> > select_idle_cpu() will lose the chance to pick it up?
>
> This is only relevant for patch 10 which is not to be included IIUC
> what mel said in cover letter : "Patches 9 and 10 are stupid in the
> context of this series."
>

Patch 10 was stupid in the context of the prototype because
select_idle_core always returned a CPU. A variation ended up being
reintroduced at the end of the Series Yet To Be Posted so that SMT siblings
are cleared during select_idle_core() but select_idle_cpu() still has a
mask with unvisited CPUs to consider if no idle cores are found.

As far as I know, this would still be compatible with Aubrey's idle
cpu mask as long as it's visited and cleared between select_idle_core
and select_idle_cpu. It relaxes the constraints on Aubrey to some extent
because the idle cpu mask would be a hint so if the information is out
of date, an idle cpu may still be found the normal way.

--
Mel Gorman
SUSE Labs

2020-12-04 15:28:49

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

On Fri, 4 Dec 2020 at 15:31, Mel Gorman <[email protected]> wrote:
>
> On Fri, Dec 04, 2020 at 02:47:48PM +0100, Vincent Guittot wrote:
> > > IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
> > > If the target's sibling is removed from select_idle_mask from select_idle_core(),
> > > select_idle_cpu() will lose the chance to pick it up?
> >
> > This is only relevant for patch 10 which is not to be included IIUC
> > what mel said in cover letter : "Patches 9 and 10 are stupid in the
> > context of this series."
> >
>
> Patch 10 was stupid in the context of the prototype because
> select_idle_core always returned a CPU. A variation ended up being
> reintroduced at the end of the Series Yet To Be Posted so that SMT siblings
> are cleared during select_idle_core() but select_idle_cpu() still has a
> mask with unvisited CPUs to consider if no idle cores are found.
>
> As far as I know, this would still be compatible with Aubrey's idle
> cpu mask as long as it's visited and cleared between select_idle_core
> and select_idle_cpu. It relaxes the constraints on Aubrey to some extent
> because the idle cpu mask would be a hint so if the information is out
> of date, an idle cpu may still be found the normal way.

But even without patch 10, just replacing sched_domain_span(sd) by
sds_idle_cpus(sd->shared) will ensure that sis loops only on cpus that
get a chance to be idle so select_idle_core is likely to return an
idle_candidate

>
> --
> Mel Gorman
> SUSE Labs

2020-12-04 15:45:28

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

On Fri, Dec 04, 2020 at 04:23:48PM +0100, Vincent Guittot wrote:
> On Fri, 4 Dec 2020 at 15:31, Mel Gorman <[email protected]> wrote:
> >
> > On Fri, Dec 04, 2020 at 02:47:48PM +0100, Vincent Guittot wrote:
> > > > IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
> > > > If the target's sibling is removed from select_idle_mask from select_idle_core(),
> > > > select_idle_cpu() will lose the chance to pick it up?
> > >
> > > This is only relevant for patch 10 which is not to be included IIUC
> > > what mel said in cover letter : "Patches 9 and 10 are stupid in the
> > > context of this series."
> > >
> >
> > Patch 10 was stupid in the context of the prototype because
> > select_idle_core always returned a CPU. A variation ended up being
> > reintroduced at the end of the Series Yet To Be Posted so that SMT siblings
> > are cleared during select_idle_core() but select_idle_cpu() still has a
> > mask with unvisited CPUs to consider if no idle cores are found.
> >
> > As far as I know, this would still be compatible with Aubrey's idle
> > cpu mask as long as it's visited and cleared between select_idle_core
> > and select_idle_cpu. It relaxes the constraints on Aubrey to some extent
> > because the idle cpu mask would be a hint so if the information is out
> > of date, an idle cpu may still be found the normal way.
>
> But even without patch 10, just replacing sched_domain_span(sd) by
> sds_idle_cpus(sd->shared) will ensure that sis loops only on cpus that
> get a chance to be idle so select_idle_core is likely to return an
> idle_candidate
>

Yes but if the idle mask is out of date for any reason then idle CPUs might
be missed -- hence the intent to maintain a mask of CPUs visited and use
the idle cpu mask as a hint to prioritise CPUs that are likely idle but
fall back to a normal scan if none of the "idle cpu mask" CPUs are
actually idle.
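
A minimal userspace sketch of that "hint first, normal scan as fallback"
approach follows. It only illustrates the reasoning and is not code from
either series: "hint" stands in for a possibly stale idle cpu mask,
"idle" is the ground truth that a real per-CPU check would provide, and
the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Check every CPU in @mask in ascending order, recording each one in
 * *visited and counting it in *checks. Returns the first idle CPU or -1.
 */
static int scan_mask(uint64_t idle, uint64_t mask, uint64_t *visited,
		     int *checks)
{
	for (int cpu = 0; cpu < 64; cpu++) {
		uint64_t bit = UINT64_C(1) << cpu;

		if (!(mask & bit))
			continue;
		*visited |= bit;
		(*checks)++;
		if (idle & bit)
			return cpu;
	}

	return -1;
}

static int find_idle_cpu(uint64_t idle, uint64_t hint, uint64_t span,
			 int *checks)
{
	uint64_t visited = 0;
	int cpu;

	assert(checks);
	*checks = 0;

	/* Pass 1: CPUs the (possibly stale) hint claims are idle. */
	cpu = scan_mask(idle, hint & span, &visited, checks);
	if (cpu >= 0)
		return cpu;

	/* Pass 2: normal scan, skipping CPUs already visited in pass 1. */
	return scan_mask(idle, span & ~visited, &visited, checks);
}
```

In this model an accurate hint finds the idle CPU in a single check,
while a stale hint costs a few wasted checks but no CPU is checked twice.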

--
Mel Gorman
SUSE Labs

2020-12-04 15:48:19

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

On Fri, 4 Dec 2020 at 16:40, Mel Gorman <[email protected]> wrote:
>
> On Fri, Dec 04, 2020 at 04:23:48PM +0100, Vincent Guittot wrote:
> > On Fri, 4 Dec 2020 at 15:31, Mel Gorman <[email protected]> wrote:
> > >
> > > On Fri, Dec 04, 2020 at 02:47:48PM +0100, Vincent Guittot wrote:
> > > > > IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
> > > > > If the target's sibling is removed from select_idle_mask from select_idle_core(),
> > > > > select_idle_cpu() will lose the chance to pick it up?
> > > >
> > > > This is only relevant for patch 10 which is not to be included IIUC
> > > > what mel said in cover letter : "Patches 9 and 10 are stupid in the
> > > > context of this series."
> > > >
> > >
> > > Patch 10 was stupid in the context of the prototype because
> > > select_idle_core always returned a CPU. A variation ended up being
> > > reintroduced at the end of the Series Yet To Be Posted so that SMT siblings
> > > are cleared during select_idle_core() but select_idle_cpu() still has a
> > > mask with unvisited CPUs to consider if no idle cores are found.
> > >
> > > As far as I know, this would still be compatible with Aubrey's idle
> > > cpu mask as long as it's visited and cleared between select_idle_core
> > > and select_idle_cpu. It relaxes the constraints on Aubrey to some extent
> > > because the idle cpu mask would be a hint so if the information is out
> > > of date, an idle cpu may still be found the normal way.
> >
> > But even without patch 10, just replacing sched_domain_span(sd) by
> > sds_idle_cpus(sd->shared) will ensure that sis loops only on cpus that
> > get a chance to be idle so select_idle_core is likely to return an
> > idle_candidate
> >
>
> Yes but if the idle mask is out of date for any reason then idle CPUs might

In fact it's the opposite, a cpu in idle mask might not be idle but
all cpus that enter idle will be set

> be missed -- hence the intent to maintain a mask of CPUs visited and use
> the idle cpu mask as a hint to prioritise CPUs that are likely idle but
> fall back to a normal scan if none of the "idle cpu mask" CPUs are
> actually idle.
>
> --
> Mel Gorman
> SUSE Labs

2020-12-04 18:45:54

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 06/10] sched/fair: Clear the target CPU from the cpumask of CPUs searched

On Fri, Dec 04, 2020 at 04:43:05PM +0100, Vincent Guittot wrote:
> On Fri, 4 Dec 2020 at 16:40, Mel Gorman <[email protected]> wrote:
> >
> > On Fri, Dec 04, 2020 at 04:23:48PM +0100, Vincent Guittot wrote:
> > > On Fri, 4 Dec 2020 at 15:31, Mel Gorman <[email protected]> wrote:
> > > >
> > > > On Fri, Dec 04, 2020 at 02:47:48PM +0100, Vincent Guittot wrote:
> > > > > > IIUC, select_idle_core and select_idle_cpu share the same cpumask(select_idle_mask)?
> > > > > > If the target's sibling is removed from select_idle_mask from select_idle_core(),
> > > > > > select_idle_cpu() will lose the chance to pick it up?
> > > > >
> > > > > This is only relevant for patch 10 which is not to be included IIUC
> > > > > what mel said in cover letter : "Patches 9 and 10 are stupid in the
> > > > > context of this series."
> > > > >
> > > >
> > > > Patch 10 was stupid in the context of the prototype because
> > > > select_idle_core always returned a CPU. A variation ended up being
> > > > reintroduced at the end of the Series Yet To Be Posted so that SMT siblings
> > > > are cleared during select_idle_core() but select_idle_cpu() still has a
> > > > mask with unvisited CPUs to consider if no idle cores are found.
> > > >
> > > > As far as I know, this would still be compatible with Aubrey's idle
> > > > cpu mask as long as it's visited and cleared between select_idle_core
> > > > and select_idle_cpu. It relaxes the constraints on Aubrey to some extent
> > > > because the idle cpu mask would be a hint so if the information is out
> > > > of date, an idle cpu may still be found the normal way.
> > >
> > > But even without patch 10, just replacing sched_domain_span(sd) by
> > > sds_idle_cpus(sd->shared) will ensure that sis loops only on cpus that
> > > get a chance to be idle so select_idle_core is likely to return an
> > > idle_candidate
> > >
> >
> > Yes but if the idle mask is out of date for any reason then idle CPUs might
>
> In fact it's the opposite, a cpu in idle mask might not be idle but
> all cpus that enter idle will be set
>

When I first checked, the information was based on the tick or a CPU
stopping the tick. That was not guaranteed to be up to date so I considered
the best option would be to treat idle cpu mask as advisory. It would
not necessarily cover, for example, a CPU that was entering idle and
polling before entering an idle state, or a rq that would pass
sched_idle_cpu(), depending on the timing of the update_idle_cpumask call.

I know you reviewed that patch and v6 may be very different but the more
up to date that information is, the greater the cache conflicts will be
on sched_domain_shared so maintaining the up-to-date information may cost
enough to offset any benefit from reduced searching at wakeup.

If this turns out to be wrong, then great, the idle cpu mask can be used
as both the basis for an idle core search and a fast find of an individual
CPU. If the cost of keeping up to date information is too high then the
idle_cpu_mask can be treated as advisory to start the search and track
CPUs visited.

The series are not either/or. Chunks of the series I posted are orthogonal
(e.g. changes to p->recent_used_cpu) and the latter parts could either work
with idle cpu mask or be replaced by idle cpu mask depending on which
performs better.

--
Mel Gorman
SUSE Labs