From: Morten Rasmussen <[email protected]>
Issue
=====
On asymmetric CPU capacity topologies, we currently rely on wake_cap() to
drive select_task_rq_fair() towards either
- its slow-path (find_idlest_cpu()) if either the previous or
  current (waking) CPU has too little capacity for the waking task
- its fast-path (select_idle_sibling()) otherwise
Commit 3273163c6775 ("sched/fair: Let asymmetric CPU configurations balance
at wake-up") points out that this relies on the assumption that "[...]the
CPU capacities within an SD_SHARE_PKG_RESOURCES domain (sd_llc) are
homogeneous".
This assumption no longer holds on newer generations of big.LITTLE
systems (DynamIQ), which can accommodate CPUs of different compute capacity
within a single LLC domain. To hopefully paint a better picture, a regular
big.LITTLE topology would look like this:
+---------+    +---------+
|   L2    |    |   L2    |
+----+----+    +----+----+
|CPU0|CPU1|    |CPU2|CPU3|
+----+----+    +----+----+
   ^^^            ^^^
 LITTLEs          bigs
which would result in the following scheduler topology:
DIE [                     ] <- sd_asym_cpucapacity
MC  [         ] [         ] <- sd_llc
      0    1      2    3
Conversely, a DynamIQ topology could look like:
+-------------------+
|         L3        |
+----+----+----+----+
| L2 | L2 | L2 | L2 |
+----+----+----+----+
|CPU0|CPU1|CPU2|CPU3|
+----+----+----+----+
  ^^^^^     ^^^^^
 LITTLEs     bigs
which would result in the following scheduler topology:
MC [                   ] <- sd_llc, sd_asym_cpucapacity
     0    1    2    3
What this means is that, on DynamIQ systems, we could pass the wake_cap()
test (IOW presume that the waking task fits on the CPU capacities of some
LLC domain) and thus go through select_idle_sibling(). That function
operates on an LLC domain, which here spans both bigs and LITTLEs, so it
could very well pick a CPU of too small a capacity for the task despite
there being fitting idle CPUs - it very much depends on the CPU iteration
order, for which we have absolutely no capacity-wise guarantees.
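For reference, wake_cap() currently reads roughly as follows (reproduced
from memory, included here only to frame the problem; the exact margin is
not important):

static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
{
	long min_cap, max_cap;

	if (!static_branch_unlikely(&sched_asym_cpucapacity))
		return 0;

	min_cap = min(capacity_orig_of(prev_cpu), capacity_orig_of(cpu));
	max_cap = cpu_rq(cpu)->rd->max_cpu_capacity;

	/* Minimum capacity is close to max, no need to abort wake_affine */
	if (max_cap - min_cap < max_cap >> 3)
		return 0;

	/* Bring task utilization in sync with prev_cpu */
	sync_entity_load_avg(&p->se);

	return !task_fits_capacity(p, min_cap);
}

Note that it only looks at the capacities of the previous and waking CPUs;
it knows nothing about what the rest of the LLC domain looks like.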
Implementation
==============
Introduce yet another select_idle_sibling() helper function that takes CPU
capacity into account. The policy is to pick the first idle CPU which is
big enough for the task (task_util * margin < cpu_capacity). If no
idle CPU is big enough, we pick the idle one with the highest capacity.
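For reference, the "task_util * margin < cpu_capacity" check above is
task_fits_capacity(), which boils down to fits_capacity(), roughly:

/* ~20% capacity margin */
#define fits_capacity(cap, max)	((cap) * 1280 < (max) * 1024)

i.e. an idle CPU is picked straight away if the task's utilization (clamped
by uclamp, where in use) still leaves about 20% headroom on that CPU.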
Unlike other select_idle_sibling() helpers, this one operates on the
sd_asym_cpucapacity sched_domain pointer, which is guaranteed to span all
known CPU capacities in the system. As such, this will work for both
"legacy" big.LITTLE (LITTLEs & bigs split at MC, joined at DIE) and for
newer DynamIQ systems (e.g. LITTLEs and bigs in the same MC domain).
Co-authored-by: Valentin Schneider <[email protected]>
Signed-off-by: Morten Rasmussen <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
---
kernel/sched/fair.c | 59 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 59 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fe4e0d7753756..aebc2e0e6c8a1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5894,6 +5894,60 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	return cpu;
 }
 
+static inline int check_cpu_capacity(struct rq *rq, struct sched_domain *sd);
+
+/*
+ * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
+ * the task fits. If no CPU is big enough, but there are idle ones, try to
+ * maximize capacity.
+ */
+static int select_idle_capacity(struct task_struct *p, int target)
+{
+	unsigned long best_cap = 0;
+	struct sched_domain *sd;
+	struct cpumask *cpus;
+	int best_cpu = -1;
+	struct rq *rq;
+	int cpu;
+
+	if (!static_branch_unlikely(&sched_asym_cpucapacity))
+		return -1;
+
+	sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
+	if (!sd)
+		return -1;
+
+	sync_entity_load_avg(&p->se);
+
+	cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
+	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+
+	for_each_cpu_wrap(cpu, cpus, target) {
+		rq = cpu_rq(cpu);
+
+		if (!available_idle_cpu(cpu))
+			continue;
+		if (task_fits_capacity(p, rq->cpu_capacity))
+			return cpu;
+
+		/*
+		 * It would be silly to keep looping when we've found a CPU
+		 * of highest available capacity. Just check that it's not been
+		 * too pressured lately.
+		 */
+		if (rq->cpu_capacity_orig == READ_ONCE(rq->rd->max_cpu_capacity) &&
+		    !check_cpu_capacity(rq, sd))
+			return cpu;
+
+		if (rq->cpu_capacity > best_cap) {
+			best_cap = rq->cpu_capacity;
+			best_cpu = cpu;
+		}
+	}
+
+	return best_cpu;
+}
+
 /*
  * Try and locate an idle core/thread in the LLC cache domain.
  */
@@ -5902,6 +5956,11 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	struct sched_domain *sd;
 	int i, recent_used_cpu;
 
+	/* For asymmetric capacities, try to be smart about the placement */
+	i = select_idle_capacity(p, target);
+	if ((unsigned)i < nr_cpumask_bits)
+		return i;
+
 	if (available_idle_cpu(target) || sched_idle_cpu(target))
 		return target;
 
--
2.24.0
Hi Valentin,
On Sun, Jan 26, 2020 at 08:09:32PM +0000, Valentin Schneider wrote:
>
> +static inline int check_cpu_capacity(struct rq *rq, struct sched_domain *sd);
> +
> +/*
> + * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
> + * the task fits. If no CPU is big enough, but there are idle ones, try to
> + * maximize capacity.
> + */
> +static int select_idle_capacity(struct task_struct *p, int target)
> +{
> + unsigned long best_cap = 0;
> + struct sched_domain *sd;
> + struct cpumask *cpus;
> + int best_cpu = -1;
> + struct rq *rq;
> + int cpu;
> +
> + if (!static_branch_unlikely(&sched_asym_cpucapacity))
> + return -1;
> +
> + sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
> + if (!sd)
> + return -1;
> +
> + sync_entity_load_avg(&p->se);
> +
> + cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> + cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> +
> + for_each_cpu_wrap(cpu, cpus, target) {
> + rq = cpu_rq(cpu);
> +
> + if (!available_idle_cpu(cpu))
> + continue;
> + if (task_fits_capacity(p, rq->cpu_capacity))
> + return cpu;
I have a couple of questions.
(1) Any particular reason for not checking sched_idle_cpu() as a backup
for the case where all eligible CPUs are busy? select_idle_cpu() does
that.
(2) Assuming all CPUs are busy, we return -1 from here and end up
calling select_idle_cpu(). The traversal in select_idle_cpu() may be a
waste in cases where sd_llc == sd_asym_cpucapacity, for example on SDM845.
Should we worry about this?
> +
> + /*
> + * It would be silly to keep looping when we've found a CPU
> + * of highest available capacity. Just check that it's not been
> + * too pressured lately.
> + */
> + if (rq->cpu_capacity_orig == READ_ONCE(rq->rd->max_cpu_capacity) &&
> + !check_cpu_capacity(rq, sd))
> + return cpu;
> +
> + if (rq->cpu_capacity > best_cap) {
> + best_cap = rq->cpu_capacity;
> + best_cpu = cpu;
> + }
> + }
> +
> + return best_cpu;
> +}
> +
> /*
> * Try and locate an idle core/thread in the LLC cache domain.
> */
> @@ -5902,6 +5956,11 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> struct sched_domain *sd;
> int i, recent_used_cpu;
>
> + /* For asymmetric capacities, try to be smart about the placement */
> + i = select_idle_capacity(p, target);
> + if ((unsigned)i < nr_cpumask_bits)
> + return i;
> +
> if (available_idle_cpu(target) || sched_idle_cpu(target))
> return target;
>
> --
> 2.24.0
>
Thanks,
Pavan
Hi Pavan,
On 28/01/2020 06:22, Pavan Kondeti wrote:
> Hi Valentin,
>
> On Sun, Jan 26, 2020 at 08:09:32PM +0000, Valentin Schneider wrote:
>>
>> +static inline int check_cpu_capacity(struct rq *rq, struct sched_domain *sd);
>> +
>> +/*
>> + * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
>> + * the task fits. If no CPU is big enough, but there are idle ones, try to
>> + * maximize capacity.
>> + */
>> +static int select_idle_capacity(struct task_struct *p, int target)
>> +{
>> + unsigned long best_cap = 0;
>> + struct sched_domain *sd;
>> + struct cpumask *cpus;
>> + int best_cpu = -1;
>> + struct rq *rq;
>> + int cpu;
>> +
>> + if (!static_branch_unlikely(&sched_asym_cpucapacity))
>> + return -1;
>> +
>> + sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
>> + if (!sd)
>> + return -1;
>> +
>> + sync_entity_load_avg(&p->se);
>> +
>> + cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
>> + cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>> +
>> + for_each_cpu_wrap(cpu, cpus, target) {
>> + rq = cpu_rq(cpu);
>> +
>> + if (!available_idle_cpu(cpu))
>> + continue;
>> + if (task_fits_capacity(p, rq->cpu_capacity))
>> + return cpu;
>
> I have couple of questions.
>
> (1) Any particular reason for not checking sched_idle_cpu() as a backup
> for the case where all eligible CPUs are busy? select_idle_cpu() does
> that.
>
No particular reason other than we didn't consider it, I think. I don't see
any harm in folding it in, I'll do that for v4. I am curious however; are
you folks making use of SCHED_IDLE? AFAIA Android isn't making use of it
yet, though Viresh paved the way for that to happen.
> (2) Assuming all CPUs are busy, we return -1 from here and end up
> calling select_idle_cpu(). The traversal in select_idle_cpu() may be
> waste in cases where sd_llc == sd_asym_cpucapacity . For example SDM845.
> Should we worry about this?
>
Before v3, since we didn't have the fallback CPU selection within
select_idle_capacity(), we would need the fall-through to select_idle_cpu()
(we could've skipped an idle CPU just because its capacity wasn't high
enough).
That's not the case anymore, so indeed we may be able to bail out of
select_idle_sibling() right after select_idle_capacity() (or after the
prev / recent_used_cpu checks). Our only requirement here is that sd_llc
remains a subset of sd_asym_cpucapacity.
So far for Arm topologies we can have:
- sd_llc < sd_asym_cpucapacity (e.g. legacy big.LITTLE like Juno)
- sd_llc == sd_asym_cpucapacity (e.g. DynamIQ like SDM845)
I'm slightly worried about sd_llc > sd_asym_cpucapacity ever being an
actual thing - I don't believe it makes much sense, but that's not stopping
anyone.
AFAIA we (Arm) *currently* don't allow that with big.LITTLE or DynamIQ, nor
do I think it can happen with the default scheduler topology where MC is
the last possible level we can have as sd_llc.
So it *might* be a safe assumption - and I can still add a SCHED_WARN_ON().
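Something along these lines, perhaps (untested sketch; exact placement TBD,
e.g. wherever the per-CPU domain pointers get rebuilt):

	struct sched_domain *llc = rcu_dereference(per_cpu(sd_llc, cpu));
	struct sched_domain *asym = rcu_dereference(per_cpu(sd_asym_cpucapacity, cpu));

	/* The early bail-out relies on sd_llc being a subset of sd_asym_cpucapacity */
	if (llc && asym)
		SCHED_WARN_ON(!cpumask_subset(sched_domain_span(llc),
					      sched_domain_span(asym)));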
On Tue, Jan 28, 2020 at 11:30:26AM +0000, Valentin Schneider wrote:
> Hi Pavan,
>
> On 28/01/2020 06:22, Pavan Kondeti wrote:
> > Hi Valentin,
> >
> > On Sun, Jan 26, 2020 at 08:09:32PM +0000, Valentin Schneider wrote:
> >>
> >> +static inline int check_cpu_capacity(struct rq *rq, struct sched_domain *sd);
> >> +
> >> +/*
> >> + * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
> >> + * the task fits. If no CPU is big enough, but there are idle ones, try to
> >> + * maximize capacity.
> >> + */
> >> +static int select_idle_capacity(struct task_struct *p, int target)
> >> +{
> >> + unsigned long best_cap = 0;
> >> + struct sched_domain *sd;
> >> + struct cpumask *cpus;
> >> + int best_cpu = -1;
> >> + struct rq *rq;
> >> + int cpu;
> >> +
> >> + if (!static_branch_unlikely(&sched_asym_cpucapacity))
> >> + return -1;
> >> +
> >> + sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
> >> + if (!sd)
> >> + return -1;
> >> +
> >> + sync_entity_load_avg(&p->se);
> >> +
> >> + cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> >> + cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> >> +
> >> + for_each_cpu_wrap(cpu, cpus, target) {
> >> + rq = cpu_rq(cpu);
> >> +
> >> + if (!available_idle_cpu(cpu))
> >> + continue;
> >> + if (task_fits_capacity(p, rq->cpu_capacity))
> >> + return cpu;
> >
> > I have couple of questions.
> >
> > (1) Any particular reason for not checking sched_idle_cpu() as a backup
> > for the case where all eligible CPUs are busy? select_idle_cpu() does
> > that.
> >
>
> No particular reason other than we didn't consider it, I think. I don't see
> any harm in folding it in, I'll do that for v4. I am curious however; are
> you folks making use of SCHED_IDLE? AFAIA Android isn't making use of it
> yet, though Viresh paved the way for that to happen.
>
We are not using SCHED_IDLE in product setups. I am told Android may use it
for background tasks in future. I am not completely sure though. I asked it
because select_idle_cpu() is using it.
> > (2) Assuming all CPUs are busy, we return -1 from here and end up
> > calling select_idle_cpu(). The traversal in select_idle_cpu() may be
> > waste in cases where sd_llc == sd_asym_cpucapacity . For example SDM845.
> > Should we worry about this?
> >
>
> Before v3, since we didn't have the fallback CPU selection within
> select_idle_capacity(), we would need the fall-through to select_idle_cpu()
> (we could've skipped an idle CPU just because its capacity wasn't high
> enough).
>
> That's not the case anymore, so indeed we may be able to bail out of
> select_idle_sibling() right after select_idle_capacity() (or after the
> prev / recent_used_cpu checks). Our only requirement here is that sd_llc
> remains a subset of sd_asym_cpucapacity.
>
> So far for Arm topologies we can have:
> - sd_llc < sd_asym_cpucapacity (e.g. legacy big.LITTLE like Juno)
> - sd_llc == sd_asym_cpucapacity (e.g. DynamIQ like SDM845)
>
> I'm slightly worried about sd_llc > sd_asym_cpucapacity ever being an
> actual thing - I don't believe it makes much sense, but that's not stopping
> anyone.
>
> AFAIA we (Arm) *currently* don't allow that with big.LITTLE or DynamIQ, nor
> do I think it can happen with the default scheduler topology where MC is
> the last possible level we can have as sd_llc.
>
> So it *might* be a safe assumption - and I can still add a SCHED_WARN_ON().
Agreed.
On 01/29/20 09:22, Pavan Kondeti wrote:
> On Tue, Jan 28, 2020 at 11:30:26AM +0000, Valentin Schneider wrote:
> > Hi Pavan,
> >
> > On 28/01/2020 06:22, Pavan Kondeti wrote:
> > > Hi Valentin,
> > >
> > > On Sun, Jan 26, 2020 at 08:09:32PM +0000, Valentin Schneider wrote:
> > >>
> > >> +static inline int check_cpu_capacity(struct rq *rq, struct sched_domain *sd);
> > >> +
> > >> +/*
> > >> + * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
> > >> + * the task fits. If no CPU is big enough, but there are idle ones, try to
> > >> + * maximize capacity.
> > >> + */
> > >> +static int select_idle_capacity(struct task_struct *p, int target)
> > >> +{
> > >> + unsigned long best_cap = 0;
> > >> + struct sched_domain *sd;
> > >> + struct cpumask *cpus;
> > >> + int best_cpu = -1;
> > >> + struct rq *rq;
> > >> + int cpu;
> > >> +
> > >> + if (!static_branch_unlikely(&sched_asym_cpucapacity))
> > >> + return -1;
> > >> +
> > >> + sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
> > >> + if (!sd)
> > >> + return -1;
> > >> +
> > >> + sync_entity_load_avg(&p->se);
> > >> +
> > >> + cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> > >> + cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > >> +
> > >> + for_each_cpu_wrap(cpu, cpus, target) {
> > >> + rq = cpu_rq(cpu);
> > >> +
> > >> + if (!available_idle_cpu(cpu))
> > >> + continue;
> > >> + if (task_fits_capacity(p, rq->cpu_capacity))
> > >> + return cpu;
> > >
> > > I have couple of questions.
> > >
> > > (1) Any particular reason for not checking sched_idle_cpu() as a backup
> > > for the case where all eligible CPUs are busy? select_idle_cpu() does
> > > that.
> > >
> >
> > No particular reason other than we didn't consider it, I think. I don't see
> > any harm in folding it in, I'll do that for v4. I am curious however; are
> > you folks making use of SCHED_IDLE? AFAIA Android isn't making use of it
> > yet, though Viresh paved the way for that to happen.
> >
>
> We are not using SCHED_IDLE in product setups. I am told Android may use it
> for background tasks in future. I am not completely sure though. I asked it
> because select_idle_cpu() is using it.
I believe Viresh's intention when he pushed for the support was to allow this
use case.
FWIW, I had a patch locally to implement this but I'm waiting on the
latency_nice attribute to go in [1] before I attempt to upstream it.
The design goals I had in mind:
1. If there are multiple energy efficient CPUs we can select, select
the idle one.
2. If latency nice is set and the most energy efficient CPU is not
idle, then fall back to the most energy efficient idle CPU.
The Android use case needs EAS path support before this can be useful to it.
[1] https://lore.kernel.org/lkml/[email protected]/
--
Qais Yousef
On 29/01/2020 03:52, Pavan Kondeti wrote:
>> No particular reason other than we didn't consider it, I think. I don't see
>> any harm in folding it in, I'll do that for v4. I am curious however; are
>> you folks making use of SCHED_IDLE? AFAIA Android isn't making use of it
>> yet, though Viresh paved the way for that to happen.
>>
>
> We are not using SCHED_IDLE in product setups. I am told Android may use it
> for background tasks in future. I am not completely sure though. I asked it
> because select_idle_cpu() is using it.
>
Fair enough! Thanks.
On 28/01/2020 12:30, Valentin Schneider wrote:
> Hi Pavan,
>
> On 28/01/2020 06:22, Pavan Kondeti wrote:
>> Hi Valentin,
>>
>> On Sun, Jan 26, 2020 at 08:09:32PM +0000, Valentin Schneider wrote:
[...]
>>> +
>>> + if (!static_branch_unlikely(&sched_asym_cpucapacity))
>>> + return -1;
We do need this one to bail out quickly on non CPU asym systems. (1)
>>> + sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
>>> + if (!sd)
>>> + return -1;
And I assume we can't return target here because of exclusive cpusets
which can form symmetric CPU capacities islands on a CPU asymmetric
system? (2)
>>> + sync_entity_load_avg(&p->se);
>>> +
>>> + cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
>>> + cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>>> +
>>> + for_each_cpu_wrap(cpu, cpus, target) {
>>> + rq = cpu_rq(cpu);
>>> +
>>> + if (!available_idle_cpu(cpu))
>>> + continue;
>>> + if (task_fits_capacity(p, rq->cpu_capacity))
>>> + return cpu;
>>
>> I have couple of questions.
[...]
>> (2) Assuming all CPUs are busy, we return -1 from here and end up
>> calling select_idle_cpu(). The traversal in select_idle_cpu() may be
>> waste in cases where sd_llc == sd_asym_cpucapacity . For example SDM845.
>> Should we worry about this?
>>
>
> Before v3, since we didn't have the fallback CPU selection within
> select_idle_capacity(), we would need the fall-through to select_idle_cpu()
> (we could've skipped an idle CPU just because its capacity wasn't high
> enough).
>
> That's not the case anymore, so indeed we may be able to bail out of
> select_idle_sibling() right after select_idle_capacity() (or after the
> prev / recent_used_cpu checks). Our only requirement here is that sd_llc
> remains a subset of sd_asym_cpucapacity.
How do you distinguish '-1' in (1), (2) and 'best_cpu = -1' (3)?
In (1) and (2) you want to check if target is idle (or sched_idle) but
in (3) you probably only want to check 'recent_used_cpu'?
On 26/01/2020 21:09, Valentin Schneider wrote:
[...]
> +static int select_idle_capacity(struct task_struct *p, int target)
> +{
> + unsigned long best_cap = 0;
> + struct sched_domain *sd;
> + struct cpumask *cpus;
> + int best_cpu = -1;
> + struct rq *rq;
> + int cpu;
> +
> + if (!static_branch_unlikely(&sched_asym_cpucapacity))
> + return -1;
> +
> + sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
> + if (!sd)
> + return -1;
> +
> + sync_entity_load_avg(&p->se);
> +
> + cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> + cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> +
> + for_each_cpu_wrap(cpu, cpus, target) {
> + rq = cpu_rq(cpu);
> +
> + if (!available_idle_cpu(cpu))
> + continue;
> + if (task_fits_capacity(p, rq->cpu_capacity))
> + return cpu;
> +
> + /*
> + * It would be silly to keep looping when we've found a CPU
> + * of highest available capacity. Just check that it's not been
> + * too pressured lately.
> + */
> + if (rq->cpu_capacity_orig == READ_ONCE(rq->rd->max_cpu_capacity) &&
There is a similar check in check_misfit_status(). Common helper function?
> + !check_cpu_capacity(rq, sd))
> + return cpu;
I wonder how this special treatment of a big CPU behaves in (LITTLE,
medium, big) system like Pixel4 (Snapdragon 855):
flame:/ $ cat /sys/devices/system/cpu/cpu*/cpu_capacity
261
261
261
261
871
871
871
1024
Or on legacy systems where the sd->imbalance_pct is 25% instead of 17%?
On 29/01/2020 10:38, Dietmar Eggemann wrote:
> On 28/01/2020 12:30, Valentin Schneider wrote:
>> Hi Pavan,
>>
>> On 28/01/2020 06:22, Pavan Kondeti wrote:
>>> Hi Valentin,
>>>
>>> On Sun, Jan 26, 2020 at 08:09:32PM +0000, Valentin Schneider wrote:
>
> [...]
>
>>>> +
>>>> + if (!static_branch_unlikely(&sched_asym_cpucapacity))
>>>> + return -1;
>
> We do need this one to bail out quickly on non CPU asym systems. (1)
>
>>>> + sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
>>>> + if (!sd)
>>>> + return -1;
>
> And I assume we can't return target here because of exclusive cpusets
> which can form symmetric CPU capacities islands on a CPU asymmetric
> system? (2)
>
Precisely, the "canonical" check for asymmetry is static key + SD pointer.
In terms of functionality we could "just" check sd_asym_cpucapacity (it can't
be set without having the static key set, though the reverse isn't true),
but we *want* to use the static key here to make SMP people happy.
>> That's not the case anymore, so indeed we may be able to bail out of
>> select_idle_sibling() right after select_idle_capacity() (or after the
>> prev / recent_used_cpu checks). Our only requirement here is that sd_llc
>> remains a subset of sd_asym_cpucapacity.
>
> How do you distinguish '-1' in (1), (2) and 'best_cpu = -1' (3)?
>
> In (1) and (2) you want to check if target is idle (or sched_idle) but
> in (3) you probably only want to check 'recent_used_cpu'?
>
Right, when we come back from select_idle_capacity(), and we did go through
the CPU loop, but we still returned -1, it means all CPUs in sd_asym_cpucapacity
were not idle. This includes 'target' of course, so we shouldn't need to
check it again.
In those cases we might still not have evaluated 'prev' or 'recent_used_cpu'.
It's somewhat of a last ditch attempt to find an idle CPU, and they'll only
help when they aren't in sd_asym_cpucapacity. I'm actually curious as to how
much the 'recent_used_cpu' thing helps, I've never paid it much attention.
So yeah my options are (for asym CPU capacity topologies):
a) just bail out after select_idle_capacity()
b) try to squeeze out a bit more out of select_idle_sibling() by also doing
the 'prev' & 'recent_used_cpu' checks.
a) is quite easy to implement; I can just inline the static key and sd checks
in select_idle_sibling() and return unconditionally once I'm past those
checks.
b) is more intrusive and I don't have a clear picture yet as to how much it
will really bring to the table.
I'll think about it and try to play around with both of these.
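FWIW, a) would look something along these lines (rough sketch, not even
compile-tested):

static int select_idle_sibling(struct task_struct *p, int prev, int target)
{
	struct sched_domain *sd;
	int i, recent_used_cpu;

	/*
	 * On asymmetric CPU capacity systems sd_llc is a subset of
	 * sd_asym_cpucapacity, so if select_idle_capacity() didn't find
	 * anything idle, the LLC scans below won't either.
	 */
	if (static_branch_unlikely(&sched_asym_cpucapacity)) {
		sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));

		if (sd) {
			i = select_idle_capacity(p, target);
			return ((unsigned)i < nr_cpumask_bits) ? i : target;
		}
	}

	if (available_idle_cpu(target) || sched_idle_cpu(target))
		return target;

	[...]
}

(the static key / sd checks inside select_idle_capacity() then become
redundant - the sd could simply be passed in instead.)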
On 29/01/2020 11:04, Dietmar Eggemann wrote:
>> + /*
>> + * It would be silly to keep looping when we've found a CPU
>> + * of highest available capacity. Just check that it's not been
>> + * too pressured lately.
>> + */
>> + if (rq->cpu_capacity_orig == READ_ONCE(rq->rd->max_cpu_capacity) &&
>
> There is a similar check in check_misfit_status(). Common helper function?
Mright, and check_misfit_status() is missing the READ_ONCE(). That said...
>
>> + !check_cpu_capacity(rq, sd))
>> + return cpu;
>
> I wonder how this special treatment of a big CPU behaves in (LITTLE,
> medium, big) system like Pixel4 (Snapdragon 855):
>
> flame:/ $ cat /sys/devices/system/cpu/cpu*/cpu_capacity
>
> 261
> 261
> 261
> 261
> 871
> 871
> 871
> 1024
>
> Or on legacy systems where the sd->imbalance_pct is 25% instead of 17%?
>
... This is a very valid point. When I wrote this bit I had the good old
big.LITTLE split in mind where there are big differences between the capacity
values. As you point out, that's not so true with DynamIQ systems sporting
more than 2 capacity values. The issue here is that we could bail out early,
picking a (slightly) pressured big (1024 capacity_orig) when there was a
non-pressured idle medium (871 capacity_orig).
It's borderline in this example - the threshold for a big to be seen as
pressured by check_cpu_capacity(), assuming a flat topology with just an MC
domain, is ~ 875. If we have e.g. mediums at 900 and bigs at 1024, this
logic is broken.
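(For reference, check_cpu_capacity() is essentially

	rq->cpu_capacity * sd->imbalance_pct < rq->cpu_capacity_orig * 100

so with MC's default imbalance_pct of 117 a big is flagged as pressured once
its capacity drops below 1024 * 100 / 117 ~= 875 - barely above the mediums'
871 in the Pixel4 example above.)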
So this is pretty much a case of my trying to be too clever for my own good,
I'll remove that "fastpath" in v4. Thanks for pointing it out!