Hi,
While uclamp restrictions currently only impact schedutil's frequency
selection, it would make sense to also let them impact CPU selection in
asymmetric topologies. This would let us steer specific tasks towards
certain CPU capacities regardless of their actual utilization - I give a
few examples in patch 3.
The first two patches are just cleanups/renames, the meat of the thing is
in patches 3 and 4.
Note that this is in the same spirit as what Patrick had proposed for EAS
on Android [1].
[1]: https://android.googlesource.com/kernel/common/+/b61876ed122f816660fe49e0de1b7ee4891deaa2%5E%21
Revisions
=========
Changed in v2:
- Collect Reviewed-by
- Make uclamp_task_util() unconditionally use util_est (Quentin)
- Because of the above, move uclamp_task_util() to fair.c
- Split v1's 3/3 into
- task_fits_capacity() tweak (v2's 3/4)
- find_energy_efficient_cpu() tweak (v2's 4/4).
Cheers,
Valentin
Valentin Schneider (4):
sched/uclamp: Make uclamp_util_*() helpers use and return UL values
sched/uclamp: Rename uclamp_util_*() into uclamp_rq_util_*()
sched/fair: Make task_fits_capacity() consider uclamp restrictions
sched/fair: Make feec() consider uclamp restrictions
kernel/sched/cpufreq_schedutil.c | 2 +-
kernel/sched/fair.c | 30 +++++++++++++++++++++++++++---
kernel/sched/sched.h | 22 +++++++++++-----------
3 files changed, 39 insertions(+), 15 deletions(-)
--
2.24.0
Vincent pointed out recently that the canonical type for utilization
values is 'unsigned long'. Internally uclamp uses 'unsigned int' values for
cache optimization, but this doesn't have to be exported to its users.
Make the uclamp helpers that deal with utilization use and return unsigned
long values.
Reviewed-by: Quentin Perret <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
---
kernel/sched/sched.h | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 280a3c735935..f1d035e5df7e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2303,15 +2303,15 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
unsigned int uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
static __always_inline
-unsigned int uclamp_util_with(struct rq *rq, unsigned int util,
- struct task_struct *p)
+unsigned long uclamp_util_with(struct rq *rq, unsigned long util,
+ struct task_struct *p)
{
- unsigned int min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
- unsigned int max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
+ unsigned long min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
+ unsigned long max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
if (p) {
- min_util = max(min_util, uclamp_eff_value(p, UCLAMP_MIN));
- max_util = max(max_util, uclamp_eff_value(p, UCLAMP_MAX));
+ min_util = max_t(unsigned long, min_util, uclamp_eff_value(p, UCLAMP_MIN));
+ max_util = max_t(unsigned long, max_util, uclamp_eff_value(p, UCLAMP_MAX));
}
/*
@@ -2325,17 +2325,17 @@ unsigned int uclamp_util_with(struct rq *rq, unsigned int util,
return clamp(util, min_util, max_util);
}
-static inline unsigned int uclamp_util(struct rq *rq, unsigned int util)
+static inline unsigned long uclamp_util(struct rq *rq, unsigned long util)
{
return uclamp_util_with(rq, util, NULL);
}
#else /* CONFIG_UCLAMP_TASK */
-static inline unsigned int uclamp_util_with(struct rq *rq, unsigned int util,
- struct task_struct *p)
+static inline unsigned long uclamp_util_with(struct rq *rq, unsigned long util,
+ struct task_struct *p)
{
return util;
}
-static inline unsigned int uclamp_util(struct rq *rq, unsigned int util)
+static inline unsigned long uclamp_util(struct rq *rq, unsigned long util)
{
return util;
}
--
2.24.0
We've just made task_fits_capacity() uclamp-aware, and
find_energy_efficient_cpu() needs to go through the same treatment.
Things are somewhat different here, however: we can't directly use
the now uclamp-aware task_fits_capacity(). Consider the following setup:
rq.cpu_capacity_orig = 512
rq.util_avg = 200
rq.uclamp.max = 768
p.util_est = 600
p.uclamp.max = 256
(p not yet enqueued on rq)
Using task_fits_capacity() here would tell us that p fits on the above CPU.
However, enqueuing p on that CPU *will* cause it to become overutilized
since rq clamp values are max-aggregated, so we'd remain with
rq.uclamp.max = 768
Thus we could reach a high enough frequency to go beyond 0.8 * 512
utilization (== overutilized). What feec() needs here is
uclamp_rq_util_with() which lets us peek at the future utilization
landscape, including rq-wide uclamp values.
Make find_energy_efficient_cpu() use uclamp_rq_util_with() for its
fits_capacity() check. This is in line with what compute_energy() ends up
using for estimating utilization.
Suggested-by: Quentin Perret <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
---
kernel/sched/fair.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dc3e86cb2b2e..cc659a3944f1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6284,9 +6284,18 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
if (!cpumask_test_cpu(cpu, p->cpus_ptr))
continue;
- /* Skip CPUs that will be overutilized. */
util = cpu_util_next(cpu, p, cpu);
cpu_cap = capacity_of(cpu);
+ spare_cap = cpu_cap - util;
+
+ /*
+ * Skip CPUs that cannot satisfy the capacity request.
+ * IOW, placing the task there would make the CPU
+ * overutilized. Take uclamp into account to see how
+ * much capacity we can get out of the CPU; this is
+ * aligned with schedutil_cpu_util().
+ */
+ util = uclamp_rq_util_with(cpu_rq(cpu), util, p);
if (!fits_capacity(util, cpu_cap))
continue;
@@ -6301,7 +6310,6 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
* Find the CPU with the maximum spare capacity in
* the performance domain
*/
- spare_cap = cpu_cap - util;
if (spare_cap > max_spare_cap) {
max_spare_cap = spare_cap;
max_spare_cap_cpu = cpu;
--
2.24.0
task_fits_capacity() drives CPU selection at wakeup time, and is also used
to detect misfit tasks. Right now it does so by comparing task_util_est()
with a CPU's capacity, but doesn't take into account uclamp restrictions.
There are a few interesting uses that can come out of doing this. For
instance, a low uclamp.max value could prevent certain tasks from being
flagged as misfit tasks, so they could merrily remain on low-capacity CPUs.
Similarly, a high uclamp.min value would steer tasks towards high-capacity
CPUs at wakeup (and, should that fail, steer them there later via misfit
balancing), so such "boosted" tasks would favor CPUs of higher capacity.
Introduce uclamp_task_util() and make task_fits_capacity() use it.
Reviewed-by: Quentin Perret <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
---
kernel/sched/fair.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 08a233e97a01..dc3e86cb2b2e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3711,6 +3711,22 @@ static inline unsigned long task_util_est(struct task_struct *p)
return max(task_util(p), _task_util_est(p));
}
+#ifdef CONFIG_UCLAMP_TASK
+static inline
+unsigned long uclamp_task_util(struct task_struct *p)
+{
+ return clamp(task_util_est(p),
+ (unsigned long)uclamp_eff_value(p, UCLAMP_MIN),
+ (unsigned long)uclamp_eff_value(p, UCLAMP_MAX));
+}
+#else
+static inline
+unsigned long uclamp_task_util(struct task_struct *p)
+{
+ return task_util_est(p);
+}
+#endif
+
static inline void util_est_enqueue(struct cfs_rq *cfs_rq,
struct task_struct *p)
{
@@ -3822,7 +3838,7 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
static inline int task_fits_capacity(struct task_struct *p, long capacity)
{
- return fits_capacity(task_util_est(p), capacity);
+ return fits_capacity(uclamp_task_util(p), capacity);
}
static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
--
2.24.0
The current helpers return (CPU) rq utilization with uclamp restrictions
taken into account. As I'm about to introduce a task-clamped utilization
variant, make it explicit that those helpers deal with runqueues.
Reviewed-by: Quentin Perret <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
---
kernel/sched/cpufreq_schedutil.c | 2 +-
kernel/sched/sched.h | 14 +++++++-------
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 86800b4d5453..9deb3a13416d 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -240,7 +240,7 @@ unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs,
*/
util = util_cfs + cpu_util_rt(rq);
if (type == FREQUENCY_UTIL)
- util = uclamp_util_with(rq, util, p);
+ util = uclamp_rq_util_with(rq, util, p);
dl_util = cpu_util_dl(rq);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f1d035e5df7e..900328c4eeef 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2303,8 +2303,8 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
unsigned int uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
static __always_inline
-unsigned long uclamp_util_with(struct rq *rq, unsigned long util,
- struct task_struct *p)
+unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
+ struct task_struct *p)
{
unsigned long min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
unsigned long max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
@@ -2325,17 +2325,17 @@ unsigned long uclamp_util_with(struct rq *rq, unsigned long util,
return clamp(util, min_util, max_util);
}
-static inline unsigned long uclamp_util(struct rq *rq, unsigned long util)
+static inline unsigned long uclamp_rq_util(struct rq *rq, unsigned long util)
{
- return uclamp_util_with(rq, util, NULL);
+ return uclamp_rq_util_with(rq, util, NULL);
}
#else /* CONFIG_UCLAMP_TASK */
-static inline unsigned long uclamp_util_with(struct rq *rq, unsigned long util,
- struct task_struct *p)
+static inline unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
+ struct task_struct *p)
{
return util;
}
-static inline unsigned long uclamp_util(struct rq *rq, unsigned long util)
+static inline unsigned long uclamp_rq_util(struct rq *rq, unsigned long util)
{
return util;
}
--
2.24.0
On Tue, 3 Dec 2019 at 17:02, Valentin Schneider
<[email protected]> wrote:
>
> Vincent pointed out recently that the canonical type for utilization
> values is 'unsigned long'. Internally uclamp uses 'unsigned int' values for
> cache optimization, but this doesn't have to be exported to its users.
>
> Make the uclamp helpers that deal with utilization use and return unsigned
> long values.
>
> Reviewed-by: Quentin Perret <[email protected]>
> Signed-off-by: Valentin Schneider <[email protected]>
> ---
> kernel/sched/sched.h | 20 ++++++++++----------
> 1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 280a3c735935..f1d035e5df7e 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2303,15 +2303,15 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
> unsigned int uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
Why not changing uclamp_eff_value to return unsigned long too ? The
returned value represents a utilization to be compared with other
utilization value
>
> static __always_inline
> -unsigned int uclamp_util_with(struct rq *rq, unsigned int util,
> - struct task_struct *p)
> +unsigned long uclamp_util_with(struct rq *rq, unsigned long util,
> + struct task_struct *p)
> {
> - unsigned int min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
> - unsigned int max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
> + unsigned long min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
> + unsigned long max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
>
> if (p) {
> - min_util = max(min_util, uclamp_eff_value(p, UCLAMP_MIN));
> - max_util = max(max_util, uclamp_eff_value(p, UCLAMP_MAX));
> + min_util = max_t(unsigned long, min_util, uclamp_eff_value(p, UCLAMP_MIN));
As mentioned above, uclamp_eff_value should return an unsigned long
> + max_util = max_t(unsigned long, max_util, uclamp_eff_value(p, UCLAMP_MAX));
> }
>
> /*
> @@ -2325,17 +2325,17 @@ unsigned int uclamp_util_with(struct rq *rq, unsigned int util,
> return clamp(util, min_util, max_util);
> }
>
> -static inline unsigned int uclamp_util(struct rq *rq, unsigned int util)
> +static inline unsigned long uclamp_util(struct rq *rq, unsigned long util)
> {
> return uclamp_util_with(rq, util, NULL);
> }
> #else /* CONFIG_UCLAMP_TASK */
> -static inline unsigned int uclamp_util_with(struct rq *rq, unsigned int util,
> - struct task_struct *p)
> +static inline unsigned long uclamp_util_with(struct rq *rq, unsigned long util,
> + struct task_struct *p)
> {
> return util;
> }
> -static inline unsigned int uclamp_util(struct rq *rq, unsigned int util)
> +static inline unsigned long uclamp_util(struct rq *rq, unsigned long util)
> {
> return util;
> }
> --
> 2.24.0
>
On 04/12/2019 15:22, Vincent Guittot wrote:
>> @@ -2303,15 +2303,15 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
>> unsigned int uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
>
> Why not changing uclamp_eff_value to return unsigned long too ? The
> returned value represents a utilization to be compared with other
> utilization value
>
IMO uclamp_eff_value() is a simple accessor to uclamp_se.value
(unsigned int), which is why I didn't want to change its return type.
I see it as being the task equivalent of rq->uclamp[clamp_id].value, IOW
"give me the uclamp value for that clamp index". It just happens to be a
bit more intricate for tasks than for rqs.
uclamp_util() & uclamp_util_with() do explicitly return a utilization,
so here it makes sense (in my mind, that is) to return UL.
On 04/12/2019 16:12, Vincent Guittot wrote:
> On Wed, 4 Dec 2019 at 17:03, Valentin Schneider
> <[email protected]> wrote:
>>
>> On 04/12/2019 15:22, Vincent Guittot wrote:
>>>> @@ -2303,15 +2303,15 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
>>>> unsigned int uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
>>>
>>> Why not changing uclamp_eff_value to return unsigned long too ? The
>>> returned value represents a utilization to be compared with other
>>> utilization value
>>>
>>
>> IMO uclamp_eff_value() is a simple accessor to uclamp_se.value
>> (unsigned int), which is why I didn't want to change its return type.
>> I see it as being the task equivalent of rq->uclamp[clamp_id].value, IOW
>> "give me the uclamp value for that clamp index". It just happens to be a
>> bit more intricate for tasks than for rqs.
>
> But then you have to take care of casting the returned value in
> several places here and in patch 3
>
True. I'm not too hot on having to handle rq clamp values
(rq->uclamp[x].value) and task clamp values (uclamp_eff_value())
differently (cast one but not the other), but I don't feel *too* strongly
about this, so if you want I can do that change for v3.
>>
>> uclamp_util() & uclamp_util_with() do explicitly return a utilization,
>> so here it makes sense (in my mind, that is) to return UL.
On Wed, 4 Dec 2019 at 17:03, Valentin Schneider
<[email protected]> wrote:
>
> On 04/12/2019 15:22, Vincent Guittot wrote:
> >> @@ -2303,15 +2303,15 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
> >> unsigned int uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
> >
> > Why not changing uclamp_eff_value to return unsigned long too ? The
> > returned value represents a utilization to be compared with other
> > utilization value
> >
>
> IMO uclamp_eff_value() is a simple accessor to uclamp_se.value
> (unsigned int), which is why I didn't want to change its return type.
> I see it as being the task equivalent of rq->uclamp[clamp_id].value, IOW
> "give me the uclamp value for that clamp index". It just happens to be a
> bit more intricate for tasks than for rqs.
But then you have to take care of casting the returned value in
several places here and in patch 3
>
> uclamp_util() & uclamp_util_with() do explicitly return a utilization,
> so here it makes sense (in my mind, that is) to return UL.
Am Mi., 4. Dez. 2019 um 17:13 Uhr schrieb Vincent Guittot
<[email protected]>:
>
> On Wed, 4 Dec 2019 at 17:03, Valentin Schneider
> <[email protected]> wrote:
> >
> > On 04/12/2019 15:22, Vincent Guittot wrote:
> > >> @@ -2303,15 +2303,15 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
> > >> unsigned int uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
> > >
> > > Why not changing uclamp_eff_value to return unsigned long too ? The
> > > returned value represents a utilization to be compared with other
> > > utilization value
> > >
> >
> > IMO uclamp_eff_value() is a simple accessor to uclamp_se.value
> > (unsigned int), which is why I didn't want to change its return type.
> > I see it as being the task equivalent of rq->uclamp[clamp_id].value, IOW
> > "give me the uclamp value for that clamp index". It just happens to be a
> > bit more intricate for tasks than for rqs.
>
> But then you have to take care of casting the returned value in
> several places here and in patch 3
>
> >
> > uclamp_util() & uclamp_util_with() do explicitly return a utilization,
> > so here it makes sense (in my mind, that is) to return UL.
On Wed, 4 Dec 2019 at 18:15, Valentin Schneider
<[email protected]> wrote:
>
> On 04/12/2019 16:12, Vincent Guittot wrote:
> > On Wed, 4 Dec 2019 at 17:03, Valentin Schneider
> > <[email protected]> wrote:
> >>
> >> On 04/12/2019 15:22, Vincent Guittot wrote:
> >>>> @@ -2303,15 +2303,15 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
> >>>> unsigned int uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
> >>>
> >>> Why not changing uclamp_eff_value to return unsigned long too ? The
> >>> returned value represents a utilization to be compared with other
> >>> utilization value
> >>>
> >>
> >> IMO uclamp_eff_value() is a simple accessor to uclamp_se.value
> >> (unsigned int), which is why I didn't want to change its return type.
> >> I see it as being the task equivalent of rq->uclamp[clamp_id].value, IOW
> >> "give me the uclamp value for that clamp index". It just happens to be a
> >> bit more intricate for tasks than for rqs.
> >
> > But then you have to take care of casting the returned value in
> > several places here and in patch 3
> >
>
> True. I'm not too hot on having to handle rq clamp values
> (rq->uclamp[x].value) and task clamp values (uclamp_eff_value())
> differently (cast one but not the other), but I don't feel *too* strongly
> about this, so if you want I can do that change for v3.
Yes please. This makes the code easier to read and understand.
The rest of the patch series looks good to me, so feel free to add my
Reviewed-by on the next version.
Reviewed-by: Vincent Guittot <[email protected]>
>
> >>
> >> uclamp_util() & uclamp_util_with() do explicitly return a utilization,
> >> so here it makes sense (in my mind, that is) to return UL.
On 04/12/2019 17:29, Vincent Guittot wrote:
>> True. I'm not too hot on having to handle rq clamp values
>> (rq->uclamp[x].value) and task clamp values (uclamp_eff_value())
>> differently (cast one but not the other), but I don't feel *too* strongly
>> about this, so if you want I can do that change for v3.
>
> Yes please. This makes the code easier to read and understand.
>
Will do.
> The rest of the patch series looks good to me. So feel free to add my
> reviewed by on the next version
> Reviewed-by: Vincent Guittot <[email protected]>
Thanks!
On 03/12/2019 16:59, Valentin Schneider wrote:
[...]
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 08a233e97a01..dc3e86cb2b2e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3711,6 +3711,22 @@ static inline unsigned long task_util_est(struct task_struct *p)
> return max(task_util(p), _task_util_est(p));
> }
>
> +#ifdef CONFIG_UCLAMP_TASK
> +static inline
> +unsigned long uclamp_task_util(struct task_struct *p)
> +{
> + return clamp(task_util_est(p),
> + (unsigned long)uclamp_eff_value(p, UCLAMP_MIN),
> + (unsigned long)uclamp_eff_value(p, UCLAMP_MAX));
> +}
> +#else
> +static inline
> +unsigned long uclamp_task_util(struct task_struct *p)
[...]
Save some lines?
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cc659a3944f1..ab921ee356a9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3712,16 +3712,14 @@ static inline unsigned long task_util_est(struct
task_struct *p)
}
#ifdef CONFIG_UCLAMP_TASK
-static inline
-unsigned long uclamp_task_util(struct task_struct *p)
+static inline unsigned long uclamp_task_util(struct task_struct *p)
{
return clamp(task_util_est(p),
(unsigned long)uclamp_eff_value(p, UCLAMP_MIN),
(unsigned long)uclamp_eff_value(p, UCLAMP_MAX));
}
#else
-static inline
-unsigned long uclamp_task_util(struct task_struct *p)
+static inline unsigned long uclamp_task_util(struct task_struct *p)
{
return task_util_est(p);
}
[...]
On 10/12/2019 17:07, Dietmar Eggemann wrote:
> On 03/12/2019 16:59, Valentin Schneider wrote:
>
> [...]
>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 08a233e97a01..dc3e86cb2b2e 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3711,6 +3711,22 @@ static inline unsigned long task_util_est(struct task_struct *p)
>> return max(task_util(p), _task_util_est(p));
>> }
>>
>> +#ifdef CONFIG_UCLAMP_TASK
>> +static inline
>> +unsigned long uclamp_task_util(struct task_struct *p)
>> +{
>> + return clamp(task_util_est(p),
>> + (unsigned long)uclamp_eff_value(p, UCLAMP_MIN),
>> + (unsigned long)uclamp_eff_value(p, UCLAMP_MAX));
>> +}
>> +#else
>> +static inline
>> +unsigned long uclamp_task_util(struct task_struct *p)
>
> [...]
>
> Save some lines?
>
Right! I went a bit overboard there.
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
<snip>
On 03/12/2019 16:59, Valentin Schneider wrote:
[...]
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 280a3c735935..f1d035e5df7e 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2303,15 +2303,15 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
> unsigned int uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
>
> static __always_inline
> -unsigned int uclamp_util_with(struct rq *rq, unsigned int util,
> - struct task_struct *p)
> +unsigned long uclamp_util_with(struct rq *rq, unsigned long util,
> + struct task_struct *p)
> {
> - unsigned int min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
> - unsigned int max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
> + unsigned long min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
> + unsigned long max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
>
> if (p) {
> - min_util = max(min_util, uclamp_eff_value(p, UCLAMP_MIN));
> - max_util = max(max_util, uclamp_eff_value(p, UCLAMP_MAX));
> + min_util = max_t(unsigned long, min_util, uclamp_eff_value(p, UCLAMP_MIN));
> + max_util = max_t(unsigned long, max_util, uclamp_eff_value(p, UCLAMP_MAX));
> }
>
> /*
> @@ -2325,17 +2325,17 @@ unsigned int uclamp_util_with(struct rq *rq, unsigned int util,
> return clamp(util, min_util, max_util);
> }
>
> -static inline unsigned int uclamp_util(struct rq *rq, unsigned int util)
> +static inline unsigned long uclamp_util(struct rq *rq, unsigned long util)
> {
> return uclamp_util_with(rq, util, NULL);
> }
> #else /* CONFIG_UCLAMP_TASK */
> -static inline unsigned int uclamp_util_with(struct rq *rq, unsigned int util,
> - struct task_struct *p)
> +static inline unsigned long uclamp_util_with(struct rq *rq, unsigned long util,
> + struct task_struct *p)
> {
> return util;
> }
> -static inline unsigned int uclamp_util(struct rq *rq, unsigned int util)
> +static inline unsigned long uclamp_util(struct rq *rq, unsigned long util)
> {
> return util;
> }
There doesn't seem to be any user of uclamp_util(), only uclamp_util_with()?
On 03/12/2019 16:59, Valentin Schneider wrote:
Could you replace feec (find_energy_efficient_cpu) with EAS wakeup path
in the subject line? The term EAS is described in
Documentation/scheduler/sched-energy.rst so it's probably easier to match
the patch to functionality.
> We've just made task_fits_capacity() uclamp-aware, and
> find_energy_efficient_cpu() needs to go through the same treatment.
> Things are somewhat different here however - we can't directly use
> the now uclamp-aware task_fits_capacity(). Consider the following setup:
>
> rq.cpu_capacity_orig = 512
> rq.util_avg = 200
> rq.uclamp.max = 768
>
> p.util_est = 600
> p.uclamp.max = 256
>
> (p not yet enqueued on rq)
>
> Using task_fits_capacity() here would tell us that p fits on the above CPU.
> However, enqueuing p on that CPU *will* cause it to become overutilized
> since rq clamp values are max-aggregated, so we'd remain with
I assume it doesn't harm to explicitly mention that this rq.uclamp.max =
768 value comes from another task already enqueued on a cfs_rq of this
rq. This gives some substance to the term max-aggregated here.
> rq.uclamp.max = 768
>
> Thus we could reach a high enough frequency to reach beyond 0.8 * 512
> utilization (== overutilized). What feec() needs here is
s/feec()/find_energy_efficient_cpu() ?
> uclamp_rq_util_with() which lets us peek at the future utilization
> landscape, including rq-wide uclamp values.
>
> Make find_energy_efficient_cpu() use uclamp_rq_util_with() for its
> fits_capacity() check. This is in line with what compute_energy() ends up
> using for estimating utilization.
This is also aligned with schedutil_cpu_util() (you do mention this in
the code later in this patch).
[...]
On 10/12/2019 18:09, Dietmar Eggemann wrote:
> On 03/12/2019 16:59, Valentin Schneider wrote:
>
> [...]
>
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 280a3c735935..f1d035e5df7e 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -2303,15 +2303,15 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
>> unsigned int uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
>>
>> static __always_inline
>> -unsigned int uclamp_util_with(struct rq *rq, unsigned int util,
>> - struct task_struct *p)
>> +unsigned long uclamp_util_with(struct rq *rq, unsigned long util,
>> + struct task_struct *p)
>> {
>> - unsigned int min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
>> - unsigned int max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
>> + unsigned long min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
>> + unsigned long max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
>>
>> if (p) {
>> - min_util = max(min_util, uclamp_eff_value(p, UCLAMP_MIN));
>> - max_util = max(max_util, uclamp_eff_value(p, UCLAMP_MAX));
>> + min_util = max_t(unsigned long, min_util, uclamp_eff_value(p, UCLAMP_MIN));
>> + max_util = max_t(unsigned long, max_util, uclamp_eff_value(p, UCLAMP_MAX));
>> }
>>
>> /*
>> @@ -2325,17 +2325,17 @@ unsigned int uclamp_util_with(struct rq *rq, unsigned int util,
>> return clamp(util, min_util, max_util);
>> }
>>
>> -static inline unsigned int uclamp_util(struct rq *rq, unsigned int util)
>> +static inline unsigned long uclamp_util(struct rq *rq, unsigned long util)
>> {
>> return uclamp_util_with(rq, util, NULL);
>> }
>> #else /* CONFIG_UCLAMP_TASK */
>> -static inline unsigned int uclamp_util_with(struct rq *rq, unsigned int util,
>> - struct task_struct *p)
>> +static inline unsigned long uclamp_util_with(struct rq *rq, unsigned long util,
>> + struct task_struct *p)
>> {
>> return util;
>> }
>> -static inline unsigned int uclamp_util(struct rq *rq, unsigned int util)
>> +static inline unsigned long uclamp_util(struct rq *rq, unsigned long util)
>> {
>> return util;
>> }
>
> There doesn't seem to be any user of uclamp_util(), only uclamp_util_with()?
>
It was added in
982d9cdc22c9 ("sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks")
uclamp_util_with() followed closely in
9d20ad7dfc9a ("sched/uclamp: Add uclamp_util_with()")
and the sole uclamp_util() user (schedutil_cpu_util()) moved to the latter
in
af24bde8df20 ("sched/uclamp: Add uclamp support to energy_compute()")
So it does seem like we could get rid of it - I could see it being useful
later on, but it's useless right now.
On 10/12/2019 18:23, Dietmar Eggemann wrote:
> On 03/12/2019 16:59, Valentin Schneider wrote:
>
> Could you replace feec (find_energy_efficient_cpu) with EAS wakeup path
> in the subject line? The term EAS is described in
> Documentation/scheduler/sched-energy.rst so it's probably easier to match
> the patch to functionality.
>
Will do.
>> We've just made task_fits_capacity() uclamp-aware, and
>> find_energy_efficient_cpu() needs to go through the same treatment.
>> Things are somewhat different here however - we can't directly use
>> the now uclamp-aware task_fits_capacity(). Consider the following setup:
>>
>> rq.cpu_capacity_orig = 512
>> rq.util_avg = 200
>> rq.uclamp.max = 768
>>
>> p.util_est = 600
>> p.uclamp.max = 256
>>
>> (p not yet enqueued on rq)
>>
>> Using task_fits_capacity() here would tell us that p fits on the above CPU.
>> However, enqueuing p on that CPU *will* cause it to become overutilized
>> since rq clamp values are max-aggregated, so we'd remain with
>
> I assume it doesn't harm to explicitly mention that this rq.uclamp.max =
> 768 value comes from another task already enqueued on a cfs_rq of this
> rq. This gives some substance to the term max-aggregated here.
>
I've changed the setup example to:
The target runqueue, rq:
rq.cpu_capacity_orig = 512
rq.cfs.avg.util_avg = 200
rq.uclamp.max = 768 // the max p.uclamp.max of all enqueued p's is 768
The waking task, p (not yet enqueued on rq):
p.util_est = 600
p.uclamp.max = 100
I'll also flesh out the rest.
>> rq.uclamp.max = 768
>>
>> Thus we could reach a high enough frequency to reach beyond 0.8 * 512
>> utilization (== overutilized). What feec() needs here is
>
> s/feec()/find_energy_efficient_cpu() ?
>
Will do.
>> uclamp_rq_util_with() which lets us peek at the future utilization
>> landscape, including rq-wide uclamp values.
>>
>> Make find_energy_efficient_cpu() use uclamp_rq_util_with() for its
>> fits_capacity() check. This is in line with what compute_energy() ends up
>> using for estimating utilization.
>
> This is also aligned with schedutil_cpu_util() (you do mention this in
> the code later in this patch.
>
That's what I imply with compute_energy() (which ends up calling
schedutil_cpu_util()).
> [...]
>