2021-04-29 10:21:49

by Pierre Gondois

[permalink] [raw]
Subject: [PATCH v2 2/2] sched/fair: Fix negative energy delta in find_energy_efficient_cpu()

From: Pierre Gondois <[email protected]>

find_energy_efficient_cpu() (feec()) searches the best energy CPU
to place a task on. To do so, compute_energy() estimates the energy
impact of placing the task on a CPU, based on CPU and task utilization
signals.

Utilization signals can be concurrently updated while evaluating a
performance domain (pd). In some cases, this leads to having a
'negative delta', i.e. placing the task in the pd is seen as an
energy gain. Thus, any further energy comparison is biased.

In case of a 'negative delta', return prev_cpu since:
1. a 'negative delta' happens in less than 0.5% of feec() calls,
on a Juno with 6 CPUs (4 little, 2 big)
2. it is unlikely to have two consecutive 'negative delta' for
a task, so if the first call fails, feec() will correctly
place the task in the next feec() call
3. EAS current behavior tends to select prev_cpu if the task
doesn't raise the OPP of its current pd. prev_cpu is EAS's
generic decision
4. prev_cpu should be preferred to returning an error code.
In the latter case, select_idle_sibling() would do the placement,
selecting a big (and not energy efficient) CPU. As 3., the task
would potentially reside on the big CPU for a long time

Reported-by: Xuewen Yan <[email protected]>
Suggested-by: Xuewen Yan <[email protected]>
Signed-off-by: Pierre Gondois <[email protected]>
---
kernel/sched/fair.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c5351366fb93..935f1ea267a9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6594,15 +6594,15 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
{
unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;
struct root_domain *rd = cpu_rq(smp_processor_id())->rd;
+ int cpu, best_energy_cpu = prev_cpu, target = -1;
unsigned long cpu_cap, util, base_energy = 0;
- int cpu, best_energy_cpu = prev_cpu;
struct sched_domain *sd;
struct perf_domain *pd;

rcu_read_lock();
pd = rcu_dereference(rd->pd);
if (!pd || READ_ONCE(rd->overutilized))
- goto fail;
+ goto unlock;

/*
* Energy-aware wake-up happens on the lowest sched_domain starting
@@ -6612,7 +6612,9 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
while (sd && !cpumask_test_cpu(prev_cpu, sched_domain_span(sd)))
sd = sd->parent;
if (!sd)
- goto fail;
+ goto unlock;
+
+ target = prev_cpu;

sync_entity_load_avg(&p->se);
if (!task_util_est(p))
@@ -6666,6 +6668,8 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)

if (compute_prev_delta) {
prev_delta = compute_energy(p, prev_cpu, pd);
+ if (prev_delta < base_energy_pd)
+ goto unlock;
prev_delta -= base_energy_pd;
best_delta = min(best_delta, prev_delta);
}
@@ -6673,6 +6677,8 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
/* Evaluate the energy impact of using this CPU. */
if (max_spare_cap_cpu >= 0) {
cur_delta = compute_energy(p, max_spare_cap_cpu, pd);
+ if (cur_delta < base_energy_pd)
+ goto unlock;
cur_delta -= base_energy_pd;
if (cur_delta < best_delta) {
best_delta = cur_delta;
@@ -6680,25 +6686,23 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
}
}
}
-unlock:
+
rcu_read_unlock();

/*
* Pick the best CPU if prev_cpu cannot be used, or if it saves at
* least 6% of the energy used by prev_cpu.
*/
- if (prev_delta == ULONG_MAX)
- return best_energy_cpu;
-
- if ((prev_delta - best_delta) > ((prev_delta + base_energy) >> 4))
- return best_energy_cpu;
+ if ((prev_delta == ULONG_MAX) ||
+ (prev_delta - best_delta) > ((prev_delta + base_energy) >> 4))
+ target = best_energy_cpu;

- return prev_cpu;
+ return target;

-fail:
+unlock:
rcu_read_unlock();

- return -1;
+ return target;
}

/*
--
2.17.1


2021-04-30 11:14:32

by Lukasz Luba

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] sched/fair: Fix negative energy delta in find_energy_efficient_cpu()



On 4/29/21 11:19 AM, [email protected] wrote:
> From: Pierre Gondois <[email protected]>
>
> find_energy_efficient_cpu() (feec()) searches the best energy CPU
> to place a task on. To do so, compute_energy() estimates the energy
> impact of placing the task on a CPU, based on CPU and task utilization
> signals.
>
> Utilization signals can be concurrently updated while evaluating a
> performance domain (pd). In some cases, this leads to having a
> 'negative delta', i.e. placing the task in the pd is seen as an
> energy gain. Thus, any further energy comparison is biased.
>
> In case of a 'negative delta', return prev_cpu since:
> 1. a 'negative delta' happens in less than 0.5% of feec() calls,
> on a Juno with 6 CPUs (4 little, 2 big)
> 2. it is unlikely to have two consecutive 'negative delta' for
> a task, so if the first call fails, feec() will correctly
> place the task in the next feec() call
> 3. EAS current behavior tends to select prev_cpu if the task
> doesn't raise the OPP of its current pd. prev_cpu is EAS's
> generic decision
> 4. prev_cpu should be preferred to returning an error code.
> In the latter case, select_idle_sibling() would do the placement,
> selecting a big (and not energy efficient) CPU. As 3., the task
> would potentially reside on the big CPU for a long time
>
> Reported-by: Xuewen Yan <[email protected]>
> Suggested-by: Xuewen Yan <[email protected]>
> Signed-off-by: Pierre Gondois <[email protected]>
> ---
> kernel/sched/fair.c | 28 ++++++++++++++++------------
> 1 file changed, 16 insertions(+), 12 deletions(-)
>


Reviewed-by: Lukasz Luba <[email protected]>

Regards,
Lukasz

2021-05-03 18:04:51

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] sched/fair: Fix negative energy delta in find_energy_efficient_cpu()

On 29/04/2021 12:19, [email protected] wrote:
> From: Pierre Gondois <[email protected]>

[...]

> @@ -6680,25 +6686,23 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> }
> }
> }
> -unlock:
> +

No need for empty line.

> rcu_read_unlock();
>
> /*
> * Pick the best CPU if prev_cpu cannot be used, or if it saves at
> * least 6% of the energy used by prev_cpu.
> */
> - if (prev_delta == ULONG_MAX)
> - return best_energy_cpu;
> -
> - if ((prev_delta - best_delta) > ((prev_delta + base_energy) >> 4))
> - return best_energy_cpu;
> + if ((prev_delta == ULONG_MAX) ||
> + (prev_delta - best_delta) > ((prev_delta + base_energy) >> 4))
> + target = best_energy_cpu;

if ((prev_delta == ULONG_MAX) ||
- (prev_delta - best_delta) > ((prev_delta + base_energy) >> 4))
+ (prev_delta - best_delta) > ((prev_delta + base_energy) >> 4))
target = best_energy_cpu;

IMHO, using whitespaces to align both sub-conditions here makes it more
readable. Especially since braces aren't required around single
statements with a condition spanning over multiple lines.

[...]

With these minor things sorted:

Reviewed-by: Dietmar Eggemann <[email protected]>

2021-05-04 09:29:29

by Vincent Donnefort

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] sched/fair: Fix negative energy delta in find_energy_efficient_cpu()

On Thu, Apr 29, 2021 at 11:19:48AM +0100, [email protected] wrote:
> From: Pierre Gondois <[email protected]>
>
> find_energy_efficient_cpu() (feec()) searches the best energy CPU
> to place a task on. To do so, compute_energy() estimates the energy
> impact of placing the task on a CPU, based on CPU and task utilization
> signals.
>
> Utilization signals can be concurrently updated while evaluating a
> performance domain (pd). In some cases, this leads to having a
> 'negative delta', i.e. placing the task in the pd is seen as an
> energy gain. Thus, any further energy comparison is biased.
>
> In case of a 'negative delta', return prev_cpu since:
> 1. a 'negative delta' happens in less than 0.5% of feec() calls,
> on a Juno with 6 CPUs (4 little, 2 big)
> 2. it is unlikely to have two consecutive 'negative delta' for
> a task, so if the first call fails, feec() will correctly
> place the task in the next feec() call
> 3. EAS current behavior tends to select prev_cpu if the task
> doesn't raise the OPP of its current pd. prev_cpu is EAS's
> generic decision
> 4. prev_cpu should be preferred to returning an error code.
> In the latter case, select_idle_sibling() would do the placement,
> selecting a big (and not energy efficient) CPU. As 3., the task
> would potentially reside on the big CPU for a long time
>
> Reported-by: Xuewen Yan <[email protected]>
> Suggested-by: Xuewen Yan <[email protected]>
> Signed-off-by: Pierre Gondois <[email protected]>
> ---

I've been testing this patch on the Google's Pixel4, with a modified kernel that
we are using to evalute mailine performance and energy consumption for a
"real-life" mobile usage.

As always, I ran the Work2.0 workload from PCMark on Android. With that setup I
haven't observed any statistically significant performance change neither any CPU
Idle residency modification. Nevertheless, this code protected against ~600 bad
computations (and by extent bad placements) during a single PCMark iteration
and by looking at the traces, this is saving from spurious wake-ups that would
otherwise happen on the biggest CPUs of the system.

+ Reviewed-by: Vincent Donnefort <[email protected]>