2023-05-15 12:29:06

by Dietmar Eggemann

Subject: [PATCH v3 0/2] sched: Consider CPU contention in frequency, EAS max util & load-balance busiest CPU selection

This is the implementation of the idea to factor CPU runnable_avg
into the CPU utilization getter functions (so-called 'runnable
boosting') as a way to consider CPU contention for:

(a) CPU frequency,
(b) EAS' max util and
(c) 'migrate_util' type load-balance busiest CPU selection.
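
Since runnable_avg also accounts for the time tasks spend waiting on
the runqueue, it exceeds util_avg as soon as tasks contend for a CPU.
As a minimal sketch of the boost (the helper name is illustrative and
only fields that already exist in struct sched_avg are used; the
actual wiring is done in patch 2/2):

/*
 * Illustrative only: raise CPU utilization to runnable_avg so that
 * CPU contention (runnable but not running tasks) becomes visible
 * to the three consumers listed above.
 */
static unsigned long boosted_util_sketch(struct cfs_rq *cfs_rq)
{
        unsigned long util = READ_ONCE(cfs_rq->avg.util_avg);
        unsigned long runnable = READ_ONCE(cfs_rq->avg.runnable_avg);

        return max(util, runnable);
}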

Tests:

for (a) and (b):

Testcase is Jankbench (all subtests, 10 iterations) on Pixel6 (Android
12) with mainline v5.18 kernel and forward ported task scheduler
patches.

Uclamp has been deactivated so that the Android Dynamic Performance
Framework (ADPF) 'CPU performance hints' feature (userspace task
boosting via uclamp_min) does not interfere.

Max_frame_duration:
+-----------------+------------+
| kernel          | value [ms] |
+-----------------+------------+
| base            | 163.061513 |
| runnable        | 161.991705 |
+-----------------+------------+

Mean_frame_duration:
+-----------------+------------+----------+
| kernel          | value [ms] | diff [%] |
+-----------------+------------+----------+
| base            |       18.0 |      0.0 |
| runnable        |       12.7 |   -29.43 |
+-----------------+------------+----------+

Jank percentage (Jank deadline 16ms):
+-----------------+------------+----------+
| kernel          | value [%]  | diff [%] |
+-----------------+------------+----------+
| base            |        3.6 |      0.0 |
| runnable        |        1.0 |   -68.86 |
+-----------------+------------+----------+

Power usage [mW] (total - all CPUs):
+-----------------+------------+----------+
| kernel          | value [mW] | diff [%] |
+-----------------+------------+----------+
| base            |      129.5 |      0.0 |
| runnable        |      134.3 |    3.71* |
+-----------------+------------+----------+

* Power usage went up from 129.3 mW (-0.15%) in v1 to 134.3 mW (3.71%)
whereas all the other benchmark numbers stayed roughly the same. This
is probably because 'runnable boosting' is now used for EAS' max util
as well, so tasks end up running on non-little CPUs more often.

for (c):

Testcase is 'perf bench sched messaging' on an Arm64 Ampere Altra with
160 CPUs (sched domains = {MC, DIE, NUMA}), which shows a small
improvement:

perf stat --null --repeat 10 -- perf bench sched messaging -t -g 1 -l 2000

0.4869 +- 0.0173 seconds time elapsed (+- 3.55%) ->
0.4377 +- 0.0147 seconds time elapsed (+- 3.36%)

Chen Yu tested v1** with schbench, hackbench, netperf and tbench on an
Intel Sapphire Rapids with 2x56C/112T = 224 CPUs, which showed no
obvious difference and some small improvements on tbench:

https://lkml.kernel.org/r/ZFSr4Adtx1ZI8hoc@chenyu5-mobl1

** The implementation for (c) hasn't changed in v2.
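
As a reference for where (c) plugs in: the 'migrate_util' busiest CPU
selection lives in find_busiest_queue(). Below is a hedged sketch of
that selection using a boosted utilization getter; the helper is
illustrative and not the literal kernel code, and cpu_util_cfs_boost()
is only introduced by patch 2/2:

/*
 * Illustrative sketch: pick the busiest CPU by (boosted) CFS
 * utilization, in the spirit of the 'migrate_util' load-balance type.
 */
static int sketch_busiest_cpu(const struct cpumask *cpus)
{
        unsigned long busiest_util = 0;
        int busiest = -1, cpu;

        for_each_cpu(cpu, cpus) {
                unsigned long util = cpu_util_cfs_boost(cpu);

                if (util > busiest_util) {
                        busiest_util = util;
                        busiest = cpu;
                }
        }

        return busiest;
}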

v1 -> v2:

(1) Refactor CPU utilization getter functions, let cpu_util_cfs() call
cpu_util_next() (now cpu_util()).

(2) Consider CPU contention in EAS (find_energy_efficient_cpu() ->
eenv_pd_max_util()) in addition to schedutil (sugov_get_util()) so
that EAS' and schedutil's views on CPU frequency selection stay in
sync.

(3) Move 'util_avg = max(util_avg, runnable_avg)' from
cpu_boosted_util_cfs() to cpu_util_next() (now cpu_util()) so that
EAS can use it too.

(4) Rework patch header.

(5) Add test results: Jankbench on Pixel6 to test changes in schedutil
and EAS, and 'perf bench sched messaging' on Arm64 Ampere Altra for
CFS load-balance (find_busiest_queue()).

v2 -> v3:

(1) Move function header from cpu_util_cfs() to cpu_util() and add a
paragraph about 'runnable boosting'.

(2) Create cpu_util_cfs_boost() and call it for sites which want to use
'runnable boosting'.

(3) Use regular 'if (boost)' in cpu_util().
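
Taken together, items (2) and (3) suggest the following shape for the
v3 interface. This is a sketch reconstructed from the notes above, not
a quote from patch 2/2; the exact 'boost' parameter wiring is an
assumption:

static unsigned long cpu_util(int cpu, struct task_struct *p,
                              int dst_cpu, int boost)
{
        struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
        unsigned long util = READ_ONCE(cfs_rq->avg.util_avg);
        unsigned long runnable;

        if (boost) {
                runnable = READ_ONCE(cfs_rq->avg.runnable_avg);
                util = max(util, runnable);
        }

        /* ... @p adjustment and UTIL_EST handling as in patch 1/2 ... */

        return min(util, capacity_orig_of(cpu));
}

/* Sites that don't want 'runnable boosting' keep the old behavior: */
unsigned long cpu_util_cfs(int cpu)
{
        return cpu_util(cpu, NULL, -1, 0);
}

/* Sites that do, e.g. schedutil and EAS' max util: */
unsigned long cpu_util_cfs_boost(int cpu)
{
        return cpu_util(cpu, NULL, -1, 1);
}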

Dietmar Eggemann (2):
sched/fair: Refactor CPU utilization functions
sched/fair, cpufreq: Introduce 'runnable boosting'

kernel/sched/cpufreq_schedutil.c | 3 +-
kernel/sched/fair.c | 87 ++++++++++++++++++++++++++------
kernel/sched/sched.h | 48 +-----------------
3 files changed, 76 insertions(+), 62 deletions(-)

--
2.25.1



2023-05-15 12:50:45

by Dietmar Eggemann

Subject: [PATCH v3 1/2] sched/fair: Refactor CPU utilization functions

There is a lot of code duplication in cpu_util_next() & cpu_util_cfs().

Remove this by allowing cpu_util_next() to be called with p = NULL.
Rename cpu_util_next() to cpu_util() since the '_next' suffix is no
longer necessary to distinguish between cpu utilization related
functions.
Implement cpu_util_cfs(cpu) as cpu_util(cpu, p = NULL, -1).

This allows future cpu util changes to be coded in one place, namely
in cpu_util().

Signed-off-by: Dietmar Eggemann <[email protected]>
---
kernel/sched/fair.c | 63 ++++++++++++++++++++++++++++++++++----------
kernel/sched/sched.h | 47 +--------------------------------
2 files changed, 50 insertions(+), 60 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3f8135d7c89d..9874e28d5e38 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7145,11 +7145,41 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
return target;
}

-/*
- * Predicts what cpu_util(@cpu) would return if @p was removed from @cpu
- * (@dst_cpu = -1) or migrated to @dst_cpu.
- */
-static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
+/**
+ * cpu_util() - Estimates the amount of CPU capacity used by CFS tasks.
+ * @cpu: the CPU to get the utilization for
+ * @p: task for which the CPU utilization should be predicted or NULL
+ * @dst_cpu: CPU @p migrates to, -1 if @p moves from @cpu or @p == NULL
+ *
+ * The unit of the return value must be the same as the one of CPU capacity
+ * so that CPU utilization can be compared with CPU capacity.
+ *
+ * CPU utilization is the sum of running time of runnable tasks plus the
+ * recent utilization of currently non-runnable tasks on that CPU.
+ * It represents the amount of CPU capacity currently used by CFS tasks in
+ * the range [0..max CPU capacity] with max CPU capacity being the CPU
+ * capacity at f_max.
+ *
+ * The estimated CPU utilization is defined as the maximum between CPU
+ * utilization and sum of the estimated utilization of the currently
+ * runnable tasks on that CPU. It preserves a utilization "snapshot" of
+ * previously-executed tasks, which helps better deduce how busy a CPU will
+ * be when a long-sleeping task wakes up. The contribution to CPU utilization
+ * of such a task would be significantly decayed at this point of time.
+ *
+ * CPU utilization can be higher than the current CPU capacity
+ * (f_curr/f_max * max CPU capacity) or even the max CPU capacity because
+ * of rounding errors as well as task migrations or wakeups of new tasks.
+ * CPU utilization has to be capped to fit into the [0..max CPU capacity]
+ * range. Otherwise a group of CPUs (CPU0 util = 121% + CPU1 util = 80%)
+ * could be seen as over-utilized even though CPU1 has 20% of spare CPU
+ * capacity. CPU utilization is allowed to overshoot current CPU capacity
+ * though since this is useful for predicting the CPU capacity required
+ * after task migrations (scheduler-driven DVFS).
+ *
+ * Return: (Estimated) utilization for the specified CPU.
+ */
+static unsigned long cpu_util(int cpu, struct task_struct *p, int dst_cpu)
{
struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
unsigned long util = READ_ONCE(cfs_rq->avg.util_avg);
@@ -7160,9 +7190,9 @@ static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
* contribution. In all the other cases @cpu is not impacted by the
* migration so its util_avg is already correct.
*/
- if (task_cpu(p) == cpu && dst_cpu != cpu)
+ if (p && task_cpu(p) == cpu && dst_cpu != cpu)
lsub_positive(&util, task_util(p));
- else if (task_cpu(p) != cpu && dst_cpu == cpu)
+ else if (p && task_cpu(p) != cpu && dst_cpu == cpu)
util += task_util(p);

if (sched_feat(UTIL_EST)) {
@@ -7198,7 +7228,7 @@ static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
*/
if (dst_cpu == cpu)
util_est += _task_util_est(p);
- else if (unlikely(task_on_rq_queued(p) || current == p))
+ else if (p && unlikely(task_on_rq_queued(p) || current == p))
lsub_positive(&util_est, _task_util_est(p));

util = max(util, util_est);
@@ -7207,6 +7237,11 @@ static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
return min(util, capacity_orig_of(cpu));
}

+unsigned long cpu_util_cfs(int cpu)
+{
+ return cpu_util(cpu, NULL, -1);
+}
+
/*
* cpu_util_without: compute cpu utilization without any contributions from *p
* @cpu: the CPU which utilization is requested
@@ -7224,9 +7259,9 @@ static unsigned long cpu_util_without(int cpu, struct task_struct *p)
{
/* Task has no contribution or is new */
if (cpu != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
- return cpu_util_cfs(cpu);
+ p = NULL;

- return cpu_util_next(cpu, p, -1);
+ return cpu_util(cpu, p, -1);
}

/*
@@ -7273,7 +7308,7 @@ static inline void eenv_task_busy_time(struct energy_env *eenv,
* cpu_capacity.
*
* The contribution of the task @p for which we want to estimate the
- * energy cost is removed (by cpu_util_next()) and must be calculated
+ * energy cost is removed (by cpu_util()) and must be calculated
* separately (see eenv_task_busy_time). This ensures:
*
* - A stable PD utilization, no matter which CPU of that PD we want to place
@@ -7294,7 +7329,7 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
int cpu;

for_each_cpu(cpu, pd_cpus) {
- unsigned long util = cpu_util_next(cpu, p, -1);
+ unsigned long util = cpu_util(cpu, p, -1);

busy_time += effective_cpu_util(cpu, util, ENERGY_UTIL, NULL);
}
@@ -7318,7 +7353,7 @@ eenv_pd_max_util(struct energy_env *eenv, struct cpumask *pd_cpus,

for_each_cpu(cpu, pd_cpus) {
struct task_struct *tsk = (cpu == dst_cpu) ? p : NULL;
- unsigned long util = cpu_util_next(cpu, p, dst_cpu);
+ unsigned long util = cpu_util(cpu, p, dst_cpu);
unsigned long cpu_util;

/*
@@ -7464,7 +7499,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
if (!cpumask_test_cpu(cpu, p->cpus_ptr))
continue;

- util = cpu_util_next(cpu, p, cpu);
+ util = cpu_util(cpu, p, cpu);
cpu_cap = capacity_of(cpu);

/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec7b3e0a2b20..f78c0f85cc76 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2946,53 +2946,8 @@ static inline unsigned long cpu_util_dl(struct rq *rq)
return READ_ONCE(rq->avg_dl.util_avg);
}

-/**
- * cpu_util_cfs() - Estimates the amount of CPU capacity used by CFS tasks.
- * @cpu: the CPU to get the utilization for.
- *
- * The unit of the return value must be the same as the one of CPU capacity
- * so that CPU utilization can be compared with CPU capacity.
- *
- * CPU utilization is the sum of running time of runnable tasks plus the
- * recent utilization of currently non-runnable tasks on that CPU.
- * It represents the amount of CPU capacity currently used by CFS tasks in
- * the range [0..max CPU capacity] with max CPU capacity being the CPU
- * capacity at f_max.
- *
- * The estimated CPU utilization is defined as the maximum between CPU
- * utilization and sum of the estimated utilization of the currently
- * runnable tasks on that CPU. It preserves a utilization "snapshot" of
- * previously-executed tasks, which helps better deduce how busy a CPU will
- * be when a long-sleeping task wakes up. The contribution to CPU utilization
- * of such a task would be significantly decayed at this point of time.
- *
- * CPU utilization can be higher than the current CPU capacity
- * (f_curr/f_max * max CPU capacity) or even the max CPU capacity because
- * of rounding errors as well as task migrations or wakeups of new tasks.
- * CPU utilization has to be capped to fit into the [0..max CPU capacity]
- * range. Otherwise a group of CPUs (CPU0 util = 121% + CPU1 util = 80%)
- * could be seen as over-utilized even though CPU1 has 20% of spare CPU
- * capacity. CPU utilization is allowed to overshoot current CPU capacity
- * though since this is useful for predicting the CPU capacity required
- * after task migrations (scheduler-driven DVFS).
- *
- * Return: (Estimated) utilization for the specified CPU.
- */
-static inline unsigned long cpu_util_cfs(int cpu)
-{
- struct cfs_rq *cfs_rq;
- unsigned long util;
-
- cfs_rq = &cpu_rq(cpu)->cfs;
- util = READ_ONCE(cfs_rq->avg.util_avg);

- if (sched_feat(UTIL_EST)) {
- util = max_t(unsigned long, util,
- READ_ONCE(cfs_rq->avg.util_est.enqueued));
- }
-
- return min(util, capacity_orig_of(cpu));
-}
+extern unsigned long cpu_util_cfs(int cpu);

static inline unsigned long cpu_util_rt(struct rq *rq)
{
--
2.25.1


2023-06-05 12:34:05

by Vincent Guittot

Subject: Re: [PATCH v3 1/2] sched/fair: Refactor CPU utilization functions

On Mon, 15 May 2023 at 13:57, Dietmar Eggemann <[email protected]> wrote:
>
> There is a lot of code duplication in cpu_util_next() & cpu_util_cfs().
>
> Remove this by allowing cpu_util_next() to be called with p = NULL.
> Rename cpu_util_next() to cpu_util() since the '_next' suffix is no
> longer necessary to distinguish between cpu utilization related
> functions.
> Implement cpu_util_cfs(cpu) as cpu_util(cpu, p = NULL, -1).
>
> This allows future cpu util changes to be coded in one place, namely
> in cpu_util().
>
> Signed-off-by: Dietmar Eggemann <[email protected]>

Reviewed-by: Vincent Guittot <[email protected]>

> [...]

Subject: [tip: sched/core] sched/fair: Refactor CPU utilization functions

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 3eb6d6ececca2fd566d717b37ab467c246f66be7
Gitweb: https://git.kernel.org/tip/3eb6d6ececca2fd566d717b37ab467c246f66be7
Author: Dietmar Eggemann <[email protected]>
AuthorDate: Mon, 15 May 2023 13:57:34 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Mon, 05 Jun 2023 21:13:43 +02:00

sched/fair: Refactor CPU utilization functions

There is a lot of code duplication in cpu_util_next() & cpu_util_cfs().

Remove this by allowing cpu_util_next() to be called with p = NULL.
Rename cpu_util_next() to cpu_util() since the '_next' suffix is no
longer necessary to distinguish between cpu utilization related
functions.
Implement cpu_util_cfs(cpu) as cpu_util(cpu, p = NULL, -1).

This allows future cpu util changes to be coded in one place, namely
in cpu_util().

Signed-off-by: Dietmar Eggemann <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 63 +++++++++++++++++++++++++++++++++----------
kernel/sched/sched.h | 47 +--------------------------------
2 files changed, 50 insertions(+), 60 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df0ff90..09e3be2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7202,11 +7202,41 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
return target;
}

-/*
- * Predicts what cpu_util(@cpu) would return if @p was removed from @cpu
- * (@dst_cpu = -1) or migrated to @dst_cpu.
- */
-static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
+/**
+ * cpu_util() - Estimates the amount of CPU capacity used by CFS tasks.
+ * @cpu: the CPU to get the utilization for
+ * @p: task for which the CPU utilization should be predicted or NULL
+ * @dst_cpu: CPU @p migrates to, -1 if @p moves from @cpu or @p == NULL
+ *
+ * The unit of the return value must be the same as the one of CPU capacity
+ * so that CPU utilization can be compared with CPU capacity.
+ *
+ * CPU utilization is the sum of running time of runnable tasks plus the
+ * recent utilization of currently non-runnable tasks on that CPU.
+ * It represents the amount of CPU capacity currently used by CFS tasks in
+ * the range [0..max CPU capacity] with max CPU capacity being the CPU
+ * capacity at f_max.
+ *
+ * The estimated CPU utilization is defined as the maximum between CPU
+ * utilization and sum of the estimated utilization of the currently
+ * runnable tasks on that CPU. It preserves a utilization "snapshot" of
+ * previously-executed tasks, which helps better deduce how busy a CPU will
+ * be when a long-sleeping task wakes up. The contribution to CPU utilization
+ * of such a task would be significantly decayed at this point of time.
+ *
+ * CPU utilization can be higher than the current CPU capacity
+ * (f_curr/f_max * max CPU capacity) or even the max CPU capacity because
+ * of rounding errors as well as task migrations or wakeups of new tasks.
+ * CPU utilization has to be capped to fit into the [0..max CPU capacity]
+ * range. Otherwise a group of CPUs (CPU0 util = 121% + CPU1 util = 80%)
+ * could be seen as over-utilized even though CPU1 has 20% of spare CPU
+ * capacity. CPU utilization is allowed to overshoot current CPU capacity
+ * though since this is useful for predicting the CPU capacity required
+ * after task migrations (scheduler-driven DVFS).
+ *
+ * Return: (Estimated) utilization for the specified CPU.
+ */
+static unsigned long cpu_util(int cpu, struct task_struct *p, int dst_cpu)
{
struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
unsigned long util = READ_ONCE(cfs_rq->avg.util_avg);
@@ -7217,9 +7247,9 @@ static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
* contribution. In all the other cases @cpu is not impacted by the
* migration so its util_avg is already correct.
*/
- if (task_cpu(p) == cpu && dst_cpu != cpu)
+ if (p && task_cpu(p) == cpu && dst_cpu != cpu)
lsub_positive(&util, task_util(p));
- else if (task_cpu(p) != cpu && dst_cpu == cpu)
+ else if (p && task_cpu(p) != cpu && dst_cpu == cpu)
util += task_util(p);

if (sched_feat(UTIL_EST)) {
@@ -7255,7 +7285,7 @@ static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
*/
if (dst_cpu == cpu)
util_est += _task_util_est(p);
- else if (unlikely(task_on_rq_queued(p) || current == p))
+ else if (p && unlikely(task_on_rq_queued(p) || current == p))
lsub_positive(&util_est, _task_util_est(p));

util = max(util, util_est);
@@ -7264,6 +7294,11 @@ static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
return min(util, capacity_orig_of(cpu));
}

+unsigned long cpu_util_cfs(int cpu)
+{
+ return cpu_util(cpu, NULL, -1);
+}
+
/*
* cpu_util_without: compute cpu utilization without any contributions from *p
* @cpu: the CPU which utilization is requested
@@ -7281,9 +7316,9 @@ static unsigned long cpu_util_without(int cpu, struct task_struct *p)
{
/* Task has no contribution or is new */
if (cpu != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
- return cpu_util_cfs(cpu);
+ p = NULL;

- return cpu_util_next(cpu, p, -1);
+ return cpu_util(cpu, p, -1);
}

/*
@@ -7330,7 +7365,7 @@ static inline void eenv_task_busy_time(struct energy_env *eenv,
* cpu_capacity.
*
* The contribution of the task @p for which we want to estimate the
- * energy cost is removed (by cpu_util_next()) and must be calculated
+ * energy cost is removed (by cpu_util()) and must be calculated
* separately (see eenv_task_busy_time). This ensures:
*
* - A stable PD utilization, no matter which CPU of that PD we want to place
@@ -7351,7 +7386,7 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
int cpu;

for_each_cpu(cpu, pd_cpus) {
- unsigned long util = cpu_util_next(cpu, p, -1);
+ unsigned long util = cpu_util(cpu, p, -1);

busy_time += effective_cpu_util(cpu, util, ENERGY_UTIL, NULL);
}
@@ -7375,7 +7410,7 @@ eenv_pd_max_util(struct energy_env *eenv, struct cpumask *pd_cpus,

for_each_cpu(cpu, pd_cpus) {
struct task_struct *tsk = (cpu == dst_cpu) ? p : NULL;
- unsigned long util = cpu_util_next(cpu, p, dst_cpu);
+ unsigned long util = cpu_util(cpu, p, dst_cpu);
unsigned long cpu_util;

/*
@@ -7521,7 +7556,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
if (!cpumask_test_cpu(cpu, p->cpus_ptr))
continue;

- util = cpu_util_next(cpu, p, cpu);
+ util = cpu_util(cpu, p, cpu);
cpu_cap = capacity_of(cpu);

/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d8ba81c..aaf6fc2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2955,53 +2955,8 @@ static inline unsigned long cpu_util_dl(struct rq *rq)
return READ_ONCE(rq->avg_dl.util_avg);
}

-/**
- * cpu_util_cfs() - Estimates the amount of CPU capacity used by CFS tasks.
- * @cpu: the CPU to get the utilization for.
- *
- * The unit of the return value must be the same as the one of CPU capacity
- * so that CPU utilization can be compared with CPU capacity.
- *
- * CPU utilization is the sum of running time of runnable tasks plus the
- * recent utilization of currently non-runnable tasks on that CPU.
- * It represents the amount of CPU capacity currently used by CFS tasks in
- * the range [0..max CPU capacity] with max CPU capacity being the CPU
- * capacity at f_max.
- *
- * The estimated CPU utilization is defined as the maximum between CPU
- * utilization and sum of the estimated utilization of the currently
- * runnable tasks on that CPU. It preserves a utilization "snapshot" of
- * previously-executed tasks, which helps better deduce how busy a CPU will
- * be when a long-sleeping task wakes up. The contribution to CPU utilization
- * of such a task would be significantly decayed at this point of time.
- *
- * CPU utilization can be higher than the current CPU capacity
- * (f_curr/f_max * max CPU capacity) or even the max CPU capacity because
- * of rounding errors as well as task migrations or wakeups of new tasks.
- * CPU utilization has to be capped to fit into the [0..max CPU capacity]
- * range. Otherwise a group of CPUs (CPU0 util = 121% + CPU1 util = 80%)
- * could be seen as over-utilized even though CPU1 has 20% of spare CPU
- * capacity. CPU utilization is allowed to overshoot current CPU capacity
- * though since this is useful for predicting the CPU capacity required
- * after task migrations (scheduler-driven DVFS).
- *
- * Return: (Estimated) utilization for the specified CPU.
- */
-static inline unsigned long cpu_util_cfs(int cpu)
-{
- struct cfs_rq *cfs_rq;
- unsigned long util;
-
- cfs_rq = &cpu_rq(cpu)->cfs;
- util = READ_ONCE(cfs_rq->avg.util_avg);

- if (sched_feat(UTIL_EST)) {
- util = max_t(unsigned long, util,
- READ_ONCE(cfs_rq->avg.util_est.enqueued));
- }
-
- return min(util, capacity_orig_of(cpu));
-}
+extern unsigned long cpu_util_cfs(int cpu);

static inline unsigned long cpu_util_rt(struct rq *rq)
{