2021-06-30 14:13:51

by Xuewen Yan

Subject: [PATCH v2] sched/uclamp: Avoid getting unreasonable ucalmp value when rq is idle

From: Xuewen Yan <[email protected]>

Currently, in uclamp_rq_util_with(), when p != NULL, uclamp_max is computed as follows:
uc_rq_max = rq->uclamp[UCLAMP_MAX].value;
uc_eff_max = uclamp_eff_value(p, UCLAMP_MAX);
uclamp_max = max{uc_rq_max, uc_eff_max};

Consider the following scenario:
(1) the rq is idle, so uc_rq_max is the last runnable task's UCLAMP_MAX;
(2) p's uc_eff_max < uc_rq_max.

As a result, uclamp_max = uc_rq_max instead of uc_eff_max, which is unreasonable.

This scenario often happens in find_energy_efficient_cpu() when the task has a smaller UCLAMP_MAX than the stale rq value.

When the rq has the UCLAMP_FLAG_IDLE flag set, enqueuing the task will clear UCLAMP_FLAG_IDLE
and set the rq clamps to the task's values via uclamp_idle_reset(). So there is no need
to read the rq clamps here, which also avoids the problem described above.

Fixes: 9d20ad7dfc9a ("sched/uclamp: Add uclamp_util_with()")

Signed-off-by: Xuewen Yan <[email protected]>

---
Changes in v2:
* add Fixes tag (Valentin Schneider);
* ignore all rq clamps when idle (Valentin Schneider)
---
kernel/sched/sched.h | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c80d42e9589b..14a41a243f7b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2818,20 +2818,27 @@ static __always_inline
 unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
 				  struct task_struct *p)
 {
-	unsigned long min_util;
-	unsigned long max_util;
+	unsigned long min_util = 0;
+	unsigned long max_util = 0;
 
 	if (!static_branch_likely(&sched_uclamp_used))
 		return util;
 
-	min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
-	max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
-
 	if (p) {
-		min_util = max(min_util, uclamp_eff_value(p, UCLAMP_MIN));
-		max_util = max(max_util, uclamp_eff_value(p, UCLAMP_MAX));
+		min_util = uclamp_eff_value(p, UCLAMP_MIN);
+		max_util = uclamp_eff_value(p, UCLAMP_MAX);
+
+		/*
+		 * Ignore last runnable task's max clamp, as this task will
+		 * reset it. Similarly, no need to read the rq's min clamp.
+		 */
+		if (rq->uclamp_flags & UCLAMP_FLAG_IDLE)
+			goto out;
 	}
 
+	min_util = max_t(unsigned long, min_util, READ_ONCE(rq->uclamp[UCLAMP_MIN].value));
+	max_util = max_t(unsigned long, max_util, READ_ONCE(rq->uclamp[UCLAMP_MAX].value));
+out:
 	/*
 	 * Since CPU's {min,max}_util clamps are MAX aggregated considering
 	 * RUNNABLE tasks with _different_ clamps, we can end up with an
--
2.25.1


2021-06-30 14:25:32

by Valentin Schneider

Subject: Re: [PATCH v2] sched/uclamp: Avoid getting unreasonable ucalmp value when rq is idle


On the subject: s/ucalmp/uclamp/

On 30/06/21 22:12, Xuewen Yan wrote:
> From: Xuewen Yan <[email protected]>
>
> Currently, in uclamp_rq_util_with(), when p != NULL, uclamp_max is computed as follows:
> uc_rq_max = rq->uclamp[UCLAMP_MAX].value;
> uc_eff_max = uclamp_eff_value(p, UCLAMP_MAX);
> uclamp_max = max{uc_rq_max, uc_eff_max};
>
> Consider the following scenario:
> (1) the rq is idle, so uc_rq_max is the last runnable task's UCLAMP_MAX;
> (2) p's uc_eff_max < uc_rq_max.
>
> As a result, uclamp_max = uc_rq_max instead of uc_eff_max, which is unreasonable.
>
> This scenario often happens in find_energy_efficient_cpu() when the task has a smaller UCLAMP_MAX than the stale rq value.
>
> When the rq has the UCLAMP_FLAG_IDLE flag set, enqueuing the task will clear UCLAMP_FLAG_IDLE
> and set the rq clamps to the task's values via uclamp_idle_reset(). So there is no need
> to read the rq clamps here, which also avoids the problem described above.
>
> Fixes: 9d20ad7dfc9a ("sched/uclamp: Add uclamp_util_with()")
>
> Signed-off-by: Xuewen Yan <[email protected]>
>

Thanks!

Reviewed-by: Valentin Schneider <[email protected]>

2021-07-01 11:37:01

by Qais Yousef

Subject: Re: [PATCH v2] sched/uclamp: Avoid getting unreasonable ucalmp value when rq is idle

On 06/30/21 22:12, Xuewen Yan wrote:
> From: Xuewen Yan <[email protected]>
>
> Currently, in uclamp_rq_util_with(), when p != NULL, uclamp_max is computed as follows:
> uc_rq_max = rq->uclamp[UCLAMP_MAX].value;
> uc_eff_max = uclamp_eff_value(p, UCLAMP_MAX);
> uclamp_max = max{uc_rq_max, uc_eff_max};
>
> Consider the following scenario:
> (1) the rq is idle, so uc_rq_max is the last runnable task's UCLAMP_MAX;
> (2) p's uc_eff_max < uc_rq_max.
>
> As a result, uclamp_max = uc_rq_max instead of uc_eff_max, which is unreasonable.
>
> This scenario often happens in find_energy_efficient_cpu() when the task has a smaller UCLAMP_MAX than the stale rq value.
>
> When the rq has the UCLAMP_FLAG_IDLE flag set, enqueuing the task will clear UCLAMP_FLAG_IDLE
> and set the rq clamps to the task's values via uclamp_idle_reset(). So there is no need
> to read the rq clamps here, which also avoids the problem described above.
>
> Fixes: 9d20ad7dfc9a ("sched/uclamp: Add uclamp_util_with()")
>
> Signed-off-by: Xuewen Yan <[email protected]>
>
> ---
> Changes in v2:
> * add Fixes tag (Valentin Schneider);
> * ignore all rq clamps when idle (Valentin Schneider)
> ---
> kernel/sched/sched.h | 21 ++++++++++++++-------
> 1 file changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index c80d42e9589b..14a41a243f7b 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2818,20 +2818,27 @@ static __always_inline
>  unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
>  				  struct task_struct *p)
>  {
> -	unsigned long min_util;
> -	unsigned long max_util;
> +	unsigned long min_util = 0;
> +	unsigned long max_util = 0;
>
>  	if (!static_branch_likely(&sched_uclamp_used))
>  		return util;
>
> -	min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
> -	max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
> -
>  	if (p) {
> -		min_util = max(min_util, uclamp_eff_value(p, UCLAMP_MIN));
> -		max_util = max(max_util, uclamp_eff_value(p, UCLAMP_MAX));
> +		min_util = uclamp_eff_value(p, UCLAMP_MIN);
> +		max_util = uclamp_eff_value(p, UCLAMP_MAX);
> +
> +		/*
> +		 * Ignore last runnable task's max clamp, as this task will
> +		 * reset it. Similarly, no need to read the rq's min clamp.
> +		 */
> +		if (rq->uclamp_flags & UCLAMP_FLAG_IDLE)
> +			goto out;

We read rq->uclamp_flags without locks here. Methinks this needs READ_ONCE().
But since we only care about a single bit, I can't see any risk of
inconsistency, so we're fine.

Reviewed-by: Qais Yousef <[email protected]>

Thanks!

--
Qais Yousef

>  	}
>
> +	min_util = max_t(unsigned long, min_util, READ_ONCE(rq->uclamp[UCLAMP_MIN].value));
> +	max_util = max_t(unsigned long, max_util, READ_ONCE(rq->uclamp[UCLAMP_MAX].value));
> +out:
>  	/*
>  	 * Since CPU's {min,max}_util clamps are MAX aggregated considering
>  	 * RUNNABLE tasks with _different_ clamps, we can end up with an
> --
> 2.25.1
>

2021-07-02 11:57:09

by Qais Yousef

Subject: Re: [PATCH v2] sched/uclamp: Avoid getting unreasonable ucalmp value when rq is idle

On 07/02/21 13:12, Peter Zijlstra wrote:
> On Wed, Jun 30, 2021 at 10:12:04PM +0800, Xuewen Yan wrote:
> > From: Xuewen Yan <[email protected]>
> >
> > Currently, in uclamp_rq_util_with(), when p != NULL, uclamp_max is computed as follows:
> > uc_rq_max = rq->uclamp[UCLAMP_MAX].value;
> > uc_eff_max = uclamp_eff_value(p, UCLAMP_MAX);
> > uclamp_max = max{uc_rq_max, uc_eff_max};
> >
> > Consider the following scenario:
> > (1) the rq is idle, so uc_rq_max is the last runnable task's UCLAMP_MAX;
> > (2) p's uc_eff_max < uc_rq_max.
> >
> > As a result, uclamp_max = uc_rq_max instead of uc_eff_max, which is unreasonable.
> >
> > This scenario often happens in find_energy_efficient_cpu() when the task has a smaller UCLAMP_MAX than the stale rq value.
> >
> > When the rq has the UCLAMP_FLAG_IDLE flag set, enqueuing the task will clear UCLAMP_FLAG_IDLE
> > and set the rq clamps to the task's values via uclamp_idle_reset(). So there is no need
> > to read the rq clamps here, which also avoids the problem described above.
> >
> > Fixes: 9d20ad7dfc9a ("sched/uclamp: Add uclamp_util_with()")
> >
> > Signed-off-by: Xuewen Yan <[email protected]>
>
> Valentin, Qais, can either of you write a Changelog/comment for this, I
> can't seem to make any sense of it.

Err, yeah I think I've been staring at uclamp for too long. It could be
clearer.

>
> Is this about wake-from-idle, where the first task's uclamp goes amiss
> because the rq->uclamp values haven't been updated yet?

Yep. How about the below?

--->8---

sched/uclamp: Ignore max aggregation if rq is idle

When a task wakes up on an idle rq, uclamp_rq_util_with() would max
aggregate with the rq value. But since there is no task enqueued yet, the
values are stale, based on the last task that was running. When the new
task actually wakes up and is enqueued, the rq uclamp values should
reflect the newly woken-up task's effective uclamp values.

This is a problem particularly for uclamp_max because it defaults to
1024. If a task p with uclamp_max = 512 wakes up, then max aggregation
would ignore the capping that should apply when this task is enqueued,
which is wrong.

Fix that by ignoring max aggregation if the rq is idle, since in that
case the effective uclamp values of the rq will be those of the task
that will wake up.

--->8---

Thanks

--
Qais Yousef

2021-07-02 12:04:54

by Peter Zijlstra

Subject: Re: [PATCH v2] sched/uclamp: Avoid getting unreasonable ucalmp value when rq is idle

On Fri, Jul 02, 2021 at 12:54:21PM +0100, Qais Yousef wrote:
> Yep. How about the below?
>
> --->8---
>
> sched/uclamp: Ignore max aggregation if rq is idle
>
> When a task wakes up on an idle rq, uclamp_rq_util_with() would max
> aggregate with the rq value. But since there is no task enqueued yet, the
> values are stale, based on the last task that was running. When the new
> task actually wakes up and is enqueued, the rq uclamp values should
> reflect the newly woken-up task's effective uclamp values.
>
> This is a problem particularly for uclamp_max because it defaults to
> 1024. If a task p with uclamp_max = 512 wakes up, then max aggregation
> would ignore the capping that should apply when this task is enqueued,
> which is wrong.
>
> Fix that by ignoring max aggregation if the rq is idle, since in that
> case the effective uclamp values of the rq will be those of the task
> that will wake up.
>
> --->8---

Much better, I've updated it. Thanks!

2021-07-02 12:22:41

by Valentin Schneider

Subject: Re: [PATCH v2] sched/uclamp: Avoid getting unreasonable ucalmp value when rq is idle

On 02/07/21 12:54, Qais Yousef wrote:
> sched/uclamp: Ignore max aggregation if rq is idle
>
> When a task wakes up on an idle rq, uclamp_rq_util_with() would max
> aggregate with the rq value. But since there is no task enqueued yet, the
> values are stale, based on the last task that was running. When the new

Nit: those values are "intentionally stale" for UCLAMP_MAX, per

e496187da710 ("sched/uclamp: Enforce last task's UCLAMP_MAX")

For UCLAMP_MIN, we'll set uclamp_none(UCLAMP_MIN) == 0 upon dequeueing the
last runnable task, which DTRT.

> task actually wakes up and is enqueued, the rq uclamp values should
> reflect the newly woken-up task's effective uclamp values.
>
> This is a problem particularly for uclamp_max because it defaults to
^^^^^^^^^^^^
Per the above, it's "only" a problem for UCLAMP_MAX.

> 1024. If a task p with uclamp_max = 512 wakes up, then max aggregation
> would ignore the capping that should apply when this task is enqueued,
> which is wrong.
>
> Fix that by ignoring max aggregation if the rq is idle, since in that
> case the effective uclamp values of the rq will be those of the task
> that will wake up.
>

2021-07-02 13:03:54

by Peter Zijlstra

Subject: Re: [PATCH v2] sched/uclamp: Avoid getting unreasonable ucalmp value when rq is idle

On Wed, Jun 30, 2021 at 10:12:04PM +0800, Xuewen Yan wrote:
> From: Xuewen Yan <[email protected]>
>
> Currently, in uclamp_rq_util_with(), when p != NULL, uclamp_max is computed as follows:
> uc_rq_max = rq->uclamp[UCLAMP_MAX].value;
> uc_eff_max = uclamp_eff_value(p, UCLAMP_MAX);
> uclamp_max = max{uc_rq_max, uc_eff_max};
>
> Consider the following scenario:
> (1) the rq is idle, so uc_rq_max is the last runnable task's UCLAMP_MAX;
> (2) p's uc_eff_max < uc_rq_max.
>
> As a result, uclamp_max = uc_rq_max instead of uc_eff_max, which is unreasonable.
>
> This scenario often happens in find_energy_efficient_cpu() when the task has a smaller UCLAMP_MAX than the stale rq value.
>
> When the rq has the UCLAMP_FLAG_IDLE flag set, enqueuing the task will clear UCLAMP_FLAG_IDLE
> and set the rq clamps to the task's values via uclamp_idle_reset(). So there is no need
> to read the rq clamps here, which also avoids the problem described above.
>
> Fixes: 9d20ad7dfc9a ("sched/uclamp: Add uclamp_util_with()")
>
> Signed-off-by: Xuewen Yan <[email protected]>

Valentin, Qais, can either of you write a Changelog/comment for this, I
can't seem to make any sense of it.

Is this about wake-from-idle, where the first task's uclamp goes amiss
because the rq->uclamp values haven't been updated yet?



2021-07-02 15:01:56

by Xuewen Yan

Subject: Re: [PATCH v2] sched/uclamp: Avoid getting unreasonable ucalmp value when rq is idle

On Fri, Jul 2, 2021 at 8:12 PM Valentin Schneider
<[email protected]> wrote:
>
> On 02/07/21 12:54, Qais Yousef wrote:
> > sched/uclamp: Ignore max aggregation if rq is idle
> >
> > When a task wakes up on an idle rq, uclamp_rq_util_with() would max
> > aggregate with the rq value. But since there is no task enqueued yet, the
> > values are stale, based on the last task that was running. When the new
>
> Nit: those values are "intentionally stale" for UCLAMP_MAX, per
>
> e496187da710 ("sched/uclamp: Enforce last task's UCLAMP_MAX")
>
> For UCLAMP_MIN, we'll set uclamp_none(UCLAMP_MIN) == 0 upon dequeueing the
> last runnable task, which DTRT.
>
> > task actually wakes up and is enqueued, the rq uclamp values should
> > reflect the newly woken-up task's effective uclamp values.
> >
> > This is a problem particularly for uclamp_max because it defaults to
> ^^^^^^^^^^^^
> Per the above, it's "only" a problem for UCLAMP_MAX.
>
> > 1024. If a task p with uclamp_max = 512 wakes up, then max aggregation
> > would ignore the capping that should apply when this task is enqueued,
> > which is wrong.
> >
> > Fix that by ignoring max aggregation if the rq is idle, since in that
> > case the effective uclamp values of the rq will be those of the task
> > that will wake up.
> >

Thanks!
xuewen

2021-07-05 07:56:57

by tip-bot2 for Xuewen Yan

Subject: [tip: sched/urgent] sched/uclamp: Ignore max aggregation if rq is idle

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: 3e1493f46390618ea78607cb30c58fc19e2a5035
Gitweb: https://git.kernel.org/tip/3e1493f46390618ea78607cb30c58fc19e2a5035
Author: Xuewen Yan <[email protected]>
AuthorDate: Wed, 30 Jun 2021 22:12:04 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 02 Jul 2021 15:58:24 +02:00

sched/uclamp: Ignore max aggregation if rq is idle

When a task wakes up on an idle rq, uclamp_rq_util_with() would max
aggregate with the rq value. But since there is no task enqueued yet, the
values are stale, based on the last task that was running. When the new
task actually wakes up and is enqueued, the rq uclamp values should
reflect the newly woken-up task's effective uclamp values.

This is a problem particularly for uclamp_max because it defaults to
1024. If a task p with uclamp_max = 512 wakes up, then max aggregation
would ignore the capping that should apply when this task is enqueued,
which is wrong.

Fix that by ignoring max aggregation if the rq is idle, since in that
case the effective uclamp values of the rq will be those of the task
that will wake up.

Fixes: 9d20ad7dfc9a ("sched/uclamp: Add uclamp_util_with()")
Signed-off-by: Xuewen Yan <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
[qais: Changelog]
Reviewed-by: Qais Yousef <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/sched.h | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c80d42e..14a41a2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2818,20 +2818,27 @@ static __always_inline
 unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
 				  struct task_struct *p)
 {
-	unsigned long min_util;
-	unsigned long max_util;
+	unsigned long min_util = 0;
+	unsigned long max_util = 0;
 
 	if (!static_branch_likely(&sched_uclamp_used))
 		return util;
 
-	min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
-	max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
-
 	if (p) {
-		min_util = max(min_util, uclamp_eff_value(p, UCLAMP_MIN));
-		max_util = max(max_util, uclamp_eff_value(p, UCLAMP_MAX));
+		min_util = uclamp_eff_value(p, UCLAMP_MIN);
+		max_util = uclamp_eff_value(p, UCLAMP_MAX);
+
+		/*
+		 * Ignore last runnable task's max clamp, as this task will
+		 * reset it. Similarly, no need to read the rq's min clamp.
+		 */
+		if (rq->uclamp_flags & UCLAMP_FLAG_IDLE)
+			goto out;
 	}
 
+	min_util = max_t(unsigned long, min_util, READ_ONCE(rq->uclamp[UCLAMP_MIN].value));
+	max_util = max_t(unsigned long, max_util, READ_ONCE(rq->uclamp[UCLAMP_MAX].value));
+out:
 	/*
 	 * Since CPU's {min,max}_util clamps are MAX aggregated considering
 	 * RUNNABLE tasks with _different_ clamps, we can end up with an