2020-10-27 14:43:10

by zhuguangqing83

Subject: [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq

From: zhuguangqing <[email protected]>

In the following code path, next_freq is clamped between policy->min
and policy->max twice, once in cpufreq_driver_resolve_freq() and again
in cpufreq_driver_fast_switch(). Because no update_lock is held across
this path, policy->min and policy->max may be modified (one or more
times) in between, so the value stored in sg_policy->next_freq by
sugov_update_next_freq() may not be the frequency that is finally
requested. The next time we use "if (sg_policy->next_freq == next_freq)"
to decide whether to update the frequency, we may therefore get a wrong
result.

-> sugov_update_single()
-> get_next_freq()
-> cpufreq_driver_resolve_freq()
-> sugov_fast_switch()
-> sugov_update_next_freq()
-> cpufreq_driver_fast_switch()

For example, suppose sg_policy->next_freq is initially set to 1.0 GHz,
but the frequency actually requested is 1.2 GHz because policy->min was
changed to 1.2 GHz by the time cpufreq_driver_fast_switch() ran. On the
next update, policy->min is changed back before we reach
cpufreq_driver_resolve_freq() and (assume) next_freq evaluates to
1.0 GHz again. We then find that "if (sg_policy->next_freq == next_freq)"
is satisfied, so we don't change the frequency, although we actually
should change it to 1.0 GHz this time.
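
For reference, the two clamp sites look roughly like this in current
mainline (paraphrased, unrelated details elided):

unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
					 unsigned int target_freq)
{
	target_freq = clamp_val(target_freq, policy->min, policy->max);
	policy->cached_target_freq = target_freq;
	......
}

unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
					unsigned int target_freq)
{
	unsigned int freq;
	......
	target_freq = clamp_val(target_freq, policy->min, policy->max);
	freq = cpufreq_driver->fast_switch(policy, target_freq);
	......
}

If policy->min or policy->max changes between the two calls, the two
clamps can produce different frequencies.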

Signed-off-by: zhuguangqing <[email protected]>
---
drivers/cpufreq/cpufreq.c | 6 +++---
include/linux/cpufreq.h | 2 +-
kernel/sched/cpufreq_schedutil.c | 21 ++++++++++-----------
3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index f4b60663efe6..7e8e03c7506b 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2057,13 +2057,13 @@ EXPORT_SYMBOL(cpufreq_unregister_notifier);
* error condition, the hardware configuration must be preserved.
*/
unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
- unsigned int target_freq)
+ unsigned int *target_freq)
{
unsigned int freq;
int cpu;

- target_freq = clamp_val(target_freq, policy->min, policy->max);
- freq = cpufreq_driver->fast_switch(policy, target_freq);
+ *target_freq = clamp_val(*target_freq, policy->min, policy->max);
+ freq = cpufreq_driver->fast_switch(policy, *target_freq);

if (!freq)
return 0;
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index fa37b1c66443..790df38d48de 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -569,7 +569,7 @@ struct cpufreq_governor {

/* Pass a target to the cpufreq driver */
unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
- unsigned int target_freq);
+ unsigned int *target_freq);
int cpufreq_driver_target(struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation);
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index e254745a82cb..38d2dc55dd95 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -99,31 +99,30 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
return delta_ns >= sg_policy->freq_update_delay_ns;
}

-static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
+static inline bool sugov_update_next_freq(struct sugov_policy *sg_policy,
unsigned int next_freq)
{
- if (sg_policy->next_freq == next_freq)
- return false;
-
- sg_policy->next_freq = next_freq;
- sg_policy->last_freq_update_time = time;
-
- return true;
+ return sg_policy->next_freq == next_freq ? false : true;
}

static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
unsigned int next_freq)
{
- if (sugov_update_next_freq(sg_policy, time, next_freq))
- cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
+ if (sugov_update_next_freq(sg_policy, next_freq)) {
+ cpufreq_driver_fast_switch(sg_policy->policy, &next_freq);
+ sg_policy->next_freq = next_freq;
+ sg_policy->last_freq_update_time = time;
+ }
}

static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
unsigned int next_freq)
{
- if (!sugov_update_next_freq(sg_policy, time, next_freq))
+ if (!sugov_update_next_freq(sg_policy, next_freq))
return;

+ sg_policy->next_freq = next_freq;
+ sg_policy->last_freq_update_time = time;
if (!sg_policy->work_in_progress) {
sg_policy->work_in_progress = true;
irq_work_queue(&sg_policy->irq_work);
--
2.17.1


2020-10-28 21:43:47

by Viresh Kumar

Subject: Re: [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq

On 27-10-20, 19:54, [email protected] wrote:
> From: zhuguangqing <[email protected]>
>
> In the following code path, next_freq is clamped between policy->min
> and policy->max twice in functions cpufreq_driver_resolve_freq() and
> cpufreq_driver_fast_switch(). For there is no update_lock in the code
> path, policy->min and policy->max may be modified (one or more times),
> so sg_policy->next_freq updated in function sugov_update_next_freq()
> may be not the final cpufreq.

Understood until here, but not sure I followed everything after that.

> Next time when we use
> "if (sg_policy->next_freq == next_freq)" to judge whether to update
> next_freq, we may get a wrong result.
>
> -> sugov_update_single()
> -> get_next_freq()
> -> cpufreq_driver_resolve_freq()
> -> sugov_fast_switch()
> -> sugov_update_next_freq()
> -> cpufreq_driver_fast_switch()
>
> For example, at first sg_policy->next_freq is 1 GHz, but the final
> cpufreq is 1.2 GHz because policy->min is modified to 1.2 GHz when
> we reached cpufreq_driver_fast_switch(). Then next time, policy->min
> is changed before we reached cpufreq_driver_resolve_freq() and (assume)
> next_freq is 1 GHz, we find "if (sg_policy->next_freq == next_freq)" is
> satisfied so we don't change the cpufreq. Actually we should change
> the cpufreq to 1.0 GHz this time.

FWIW, whenever policy->min/max gets changed, sg_policy->limits_changed
gets set to true by sugov_limits(), and the next time the schedutil
callback gets called from the scheduler, we will fix the frequency.
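
For reference, sugov_limits() looks roughly like this at the time
(paraphrased):

static void sugov_limits(struct cpufreq_policy *policy)
{
	struct sugov_policy *sg_policy = policy->governor_data;

	if (!policy->fast_switch_enabled) {
		mutex_lock(&sg_policy->work_lock);
		cpufreq_policy_apply_limits(policy);
		mutex_unlock(&sg_policy->work_lock);
	}

	sg_policy->limits_changed = true;
}

and sugov_should_update_freq() turns limits_changed into
need_freq_update on the next callback.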

And so there shouldn't be any issue here, unless I am missing
something.

--
viresh

2020-10-29 01:02:59

by Viresh Kumar

Subject: Re: [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq

On 28-10-20, 19:03, zhuguangqing83 wrote:
> Thanks for your comments. Maybe my description was not clear before.
>
> If I understand correctly, when policy->min/max get changed in the time
> Window between get_next_freq() and sugov_fast_switch(), to be more
> precise between cpufreq_driver_resolve_freq() and
> cpufreq_driver_fast_switch(), the issue may happen.
>
> For example, the first time schedutil callback gets called from the
> scheduler, we reached get_next_freq() and calculate the next_freq,
> suppose next_freq is 1.0 GHz, then sg_policy->next_freq is updated
> to 1.0 GHz in sugov_update_next_freq(). If policy->min/max get
> change right now, suppose policy->min is changed to 1.2 GHz,
> then the final next_freq is 1.2 GHz for there is another clamp
> between policy->min and policy->max in cpufreq_driver_fast_switch().
> Then sg_policy->next_freq(1.0 GHz) is not the final next_freq(1.2 GHz).
>
> The second time schedutil callback gets called from the scheduler, there
> are two issues:
> (1) Suppose policy->min is still 1.2 GHz, we reached get_next_freq() and
> calculate the next_freq, because sg_policy->limits_changed gets set to
> true by sugov_limits() and there is a clamp between policy->min and
> policy->max, so this time next_freq will be greater than or equal to 1.2
> GHz, suppose next_freq is also 1.2 GHz. Now next_freq is 1.2 GHz and
> sg_policy->next_freq is 1.0 GHz, then we find
> "if (sg_policy->next_freq == next_freq)" is not satisfied and we call
> cpufreq driver to change the cpufreq to 1.2 GHz. Actually it's already
> 1.2 GHz, it's not necessary to change this time.

This isn't that bad, but ...

> (2) Suppose policy->min was changed again to 1.0 GHz before, we reached
> get_next_freq() and calculate the next_freq, suppose next_freq is also
> 1.0 GHz. Now next_freq is 1.0 GHz and sg_policy->next_freq is also 1.0 GHz,
> then we find "if (sg_policy->next_freq == next_freq)" is satisfied and we
> don't change the cpufreq. Actually we should change the cpufreq to 1.0 GHz
> this time.

This is a real problem we can get into. What about this diff instead?

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 0c5c61a095f6..bf7800e853d3 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -105,7 +105,6 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
if (sg_policy->next_freq == next_freq)
return false;

- sg_policy->next_freq = next_freq;
sg_policy->last_freq_update_time = time;

return true;
@@ -115,7 +114,7 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
unsigned int next_freq)
{
if (sugov_update_next_freq(sg_policy, time, next_freq))
- cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
+ sg_policy->next_freq = cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
}

static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
@@ -124,6 +123,7 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
if (!sugov_update_next_freq(sg_policy, time, next_freq))
return;

+ sg_policy->next_freq = next_freq;
if (!sg_policy->work_in_progress) {
sg_policy->work_in_progress = true;
irq_work_queue(&sg_policy->irq_work);

--
viresh

2020-10-29 07:36:57

by zhuguangqing83

Subject: Re: [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq


> On 27-10-20, 19:54, [email protected] wrote:
> > From: zhuguangqing <[email protected]>
> >
> > In the following code path, next_freq is clamped between policy->min
> > and policy->max twice in functions cpufreq_driver_resolve_freq() and
> > cpufreq_driver_fast_switch(). For there is no update_lock in the code
> > path, policy->min and policy->max may be modified (one or more times),
> > so sg_policy->next_freq updated in function sugov_update_next_freq()
> > may be not the final cpufreq.
>
> Understood until here, but not sure I followed everything after that.
>
> > Next time when we use
> > "if (sg_policy->next_freq == next_freq)" to judge whether to update
> > next_freq, we may get a wrong result.
> >
> > -> sugov_update_single()
> > -> get_next_freq()
> > -> cpufreq_driver_resolve_freq()
> > -> sugov_fast_switch()
> > -> sugov_update_next_freq()
> > -> cpufreq_driver_fast_switch()
> >
> > For example, at first sg_policy->next_freq is 1 GHz, but the final
> > cpufreq is 1.2 GHz because policy->min is modified to 1.2 GHz when
> > we reached cpufreq_driver_fast_switch(). Then next time, policy->min
> > is changed before we reached cpufreq_driver_resolve_freq() and (assume)
> > next_freq is 1 GHz, we find "if (sg_policy->next_freq == next_freq)" is
> > satisfied so we don't change the cpufreq. Actually we should change
> > the cpufreq to 1.0 GHz this time.
>
> FWIW, whenever policy->min/max gets changed, sg_policy->limits_changed
> gets set to true by sugov_limits() and the next time schedutil
> callback gets called from the scheduler, we will fix the frequency.
>
> And so there shouldn't be any issue here, unless I am missing
> something.

Thanks for your comments. Maybe my description was not clear before.

If I understand correctly, the issue can happen when policy->min/max
get changed in the time window between get_next_freq() and
sugov_fast_switch(), or more precisely between
cpufreq_driver_resolve_freq() and cpufreq_driver_fast_switch().

For example, the first time the schedutil callback gets called from the
scheduler, we reach get_next_freq() and calculate next_freq; suppose it
is 1.0 GHz, so sg_policy->next_freq is updated to 1.0 GHz in
sugov_update_next_freq(). If policy->min/max get changed right now, say
policy->min is changed to 1.2 GHz, then the frequency actually requested
is 1.2 GHz, because there is another clamp between policy->min and
policy->max in cpufreq_driver_fast_switch(). So sg_policy->next_freq
(1.0 GHz) is not the frequency that was finally set (1.2 GHz).
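
To make the window concrete, a rough timeline (illustrative values only):

  scheduler CPU                            limits update (sysfs/thermal)
  -------------                            -----------------------------
  get_next_freq()
    cpufreq_driver_resolve_freq()
      clamp -> next_freq = 1.0 GHz
                                           policy->min = 1.2 GHz
  sugov_update_next_freq()
    sg_policy->next_freq = 1.0 GHz
  cpufreq_driver_fast_switch()
    clamp -> hardware set to 1.2 GHz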

The second time the schedutil callback gets called from the scheduler,
there are two possible issues:
(1) Suppose policy->min is still 1.2 GHz. We reach get_next_freq() and
calculate next_freq; because sg_policy->limits_changed was set to true
by sugov_limits() and the result is clamped between policy->min and
policy->max, next_freq will now be greater than or equal to 1.2 GHz,
suppose it is exactly 1.2 GHz. Now next_freq is 1.2 GHz and
sg_policy->next_freq is 1.0 GHz, so we find that
"if (sg_policy->next_freq == next_freq)" is not satisfied and we ask the
cpufreq driver to switch to 1.2 GHz. But the CPU is already running at
1.2 GHz, so the change is unnecessary this time.

(2) Suppose policy->min was meanwhile changed back to 1.0 GHz. We reach
get_next_freq() and calculate next_freq; suppose it is also 1.0 GHz. Now
next_freq is 1.0 GHz and sg_policy->next_freq is also 1.0 GHz, so we
find that "if (sg_policy->next_freq == next_freq)" is satisfied and we
don't change the frequency. But the CPU is still running at 1.2 GHz, so
we actually should change it to 1.0 GHz this time.

>
> --
> viresh

2020-10-29 08:49:46

by zhuguangqing83

Subject: Re: [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq


> On 28-10-20, 19:03, zhuguangqing83 wrote:
> > Thanks for your comments. Maybe my description was not clear before.
> >
> > If I understand correctly, when policy->min/max get changed in the time
> > Window between get_next_freq() and sugov_fast_switch(), to be more
> > precise between cpufreq_driver_resolve_freq() and
> > cpufreq_driver_fast_switch(), the issue may happen.
> >
> > For example, the first time schedutil callback gets called from the
> > scheduler, we reached get_next_freq() and calculate the next_freq,
> > suppose next_freq is 1.0 GHz, then sg_policy->next_freq is updated
> > to 1.0 GHz in sugov_update_next_freq(). If policy->min/max get
> > change right now, suppose policy->min is changed to 1.2 GHz,
> > then the final next_freq is 1.2 GHz for there is another clamp
> > between policy->min and policy->max in cpufreq_driver_fast_switch().
> > Then sg_policy->next_freq(1.0 GHz) is not the final next_freq(1.2 GHz).
> >
> > The second time schedutil callback gets called from the scheduler, there
> > are two issues:
> > (1) Suppose policy->min is still 1.2 GHz, we reached get_next_freq() and
> > calculate the next_freq, because sg_policy->limits_changed gets set to
> > true by sugov_limits() and there is a clamp between policy->min and
> > policy->max, so this time next_freq will be greater than or equal to 1.2
> > GHz, suppose next_freq is also 1.2 GHz. Now next_freq is 1.2 GHz and
> > sg_policy->next_freq is 1.0 GHz, then we find
> > "if (sg_policy->next_freq == next_freq)" is not satisfied and we call
> > cpufreq driver to change the cpufreq to 1.2 GHz. Actually it's already
> > 1.2 GHz, it's not necessary to change this time.
>
> This isn't that bad, but ...
>
> > (2) Suppose policy->min was changed again to 1.0 GHz before, we reached
> > get_next_freq() and calculate the next_freq, suppose next_freq is also
> > 1.0 GHz. Now next_freq is 1.0 GHz and sg_policy->next_freq is also 1.0 GHz,
> > then we find "if (sg_policy->next_freq == next_freq)" is satisfied and we
> > don't change the cpufreq. Actually we should change the cpufreq to 1.0 GHz
> > this time.
>
> This is a real problem we can get into. What about this diff instead ?
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 0c5c61a095f6..bf7800e853d3 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -105,7 +105,6 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> if (sg_policy->next_freq == next_freq)
> return false;
>
> - sg_policy->next_freq = next_freq;
> sg_policy->last_freq_update_time = time;
>
> return true;

It's a little strange that sg_policy->next_freq and
sg_policy->last_freq_update_time are not updated at the same time.

> @@ -115,7 +114,7 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
> unsigned int next_freq)
> {
> if (sugov_update_next_freq(sg_policy, time, next_freq))
> - cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> + sg_policy->next_freq = cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> }
>

Great, it also handles the case where the driver's ->fast_switch()
callback returns 0 to indicate an error condition.

Since policy->min/max may not be real CPU frequencies from the OPP
table, next_freq returned by get_next_freq(), which is clamped between
policy->min and policy->max, may not be a real OPP frequency either. In
that case, if we compared a real OPP frequency returned by
cpufreq_driver_fast_switch() with next_freq,
"if (sg_policy->next_freq == next_freq)" would never be satisfied, and
we would change the CPU frequency every time the schedutil callback gets
called from the scheduler. However, looking at the current code in
get_next_freq() below, the issue mentioned above should not happen.
Maybe it's just an unnecessary worry on my part.

static unsigned int get_next_freq(struct sugov_policy *sg_policy,
				  unsigned long util, unsigned long max)
{
	......
	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
		return sg_policy->next_freq;
	......
}

> static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
> @@ -124,6 +123,7 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
> if (!sugov_update_next_freq(sg_policy, time, next_freq))
> return;
>
> + sg_policy->next_freq = next_freq;
> if (!sg_policy->work_in_progress) {
> sg_policy->work_in_progress = true;
> irq_work_queue(&sg_policy->irq_work);
>
> --
> viresh

2020-10-29 09:00:26

by Viresh Kumar

Subject: Re: [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq

Your mail client is screwing up the "In-reply-to" field of the message,
which prevents it from appearing properly in the thread in other
people's mailboxes; please fix that.

On 29-10-20, 09:43, zhuguangqing83 wrote:
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 0c5c61a095f6..bf7800e853d3 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -105,7 +105,6 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > if (sg_policy->next_freq == next_freq)
> > return false;
> >
> > - sg_policy->next_freq = next_freq;
> > sg_policy->last_freq_update_time = time;
> >
> > return true;
>
> It's a little strange that sg_policy->next_freq and
> sg_policy->last_freq_update_time are not updated at the same time.
>
> > @@ -115,7 +114,7 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
> > unsigned int next_freq)
> > {
> > if (sugov_update_next_freq(sg_policy, time, next_freq))
> > - cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> > + sg_policy->next_freq = cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> > }
> >
>
> Great, it also takes into account the issue that 0 is returned by the
> driver's ->fast_switch() callback to indicate an error condition.

Yes but even my change wasn't good enough, more on it later.

> For policy->min/max may be not the real CPU frequency in OPPs, so

No, that can't happen. If userspace tries to set a value too large or
too small, we clamp that to policy->max/min as well, and so the problem
below shall never occur.

> next_freq got from get_next_freq() which is after clamping between
> policy->min and policy->max may be not the real CPU frequency in OPPs.
> In that case, if we use a real CPU frequency in OPPs returned from
> cpufreq_driver_fast_switch() to compare with next_freq,
> "if (sg_policy->next_freq == next_freq)" will never be satisfied, so we
> change the CPU frequency every time schedutil callback gets called from
> the scheduler. I see the current code in get_next_freq() as following,
> the issue mentioned above should not happen. Maybe it's just one of my
> unnecessary worries.

Coming back to my patch (and yours too), it only fixes the fast-switch
case and not the slow path, which can also end up clamping the
frequency. And to be honest, even the drivers can have their own
clamping code in place, no one is stopping them either.
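
For reference, the slow path clamps on its own as well; roughly
(paraphrased from __cpufreq_driver_target() at the time):

int __cpufreq_driver_target(struct cpufreq_policy *policy,
			    unsigned int target_freq,
			    unsigned int relation)
{
	......
	/* Make sure that target_freq is within supported range */
	target_freq = clamp_val(target_freq, policy->min, policy->max);
	......
}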

And we also need to do something about the cached_raw_freq as well, as
it will not be in sync with next_freq anymore.

Here is another attempt from me at tackling this problem. The idea is
to check whether the previous freq update went as expected or not, and
if not, we can't rely on next_freq or cached_raw_freq anymore. For this
to work properly, we need to make sure policy->cur isn't getting updated
at the same time as get_next_freq() is running. For that I have given a
different meaning to the work_in_progress flag, which now creates a
lockless (kind of) critical section where we won't play with next_freq
while the cpufreq core is updating the frequency.

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 0c5c61a095f6..8991cc31b011 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -121,13 +121,8 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
unsigned int next_freq)
{
- if (!sugov_update_next_freq(sg_policy, time, next_freq))
- return;
-
- if (!sg_policy->work_in_progress) {
- sg_policy->work_in_progress = true;
+ if (sugov_update_next_freq(sg_policy, time, next_freq))
irq_work_queue(&sg_policy->irq_work);
- }
}

/**
@@ -159,6 +154,15 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
unsigned int freq = arch_scale_freq_invariant() ?
policy->cpuinfo.max_freq : policy->cur;

+ /*
+ * The previous frequency update didn't go as we expected it to be. Lets
+ * start again to make sure we don't miss any updates.
+ */
+ if (unlikely(policy->cur != sg_policy->next_freq)) {
+ sg_policy->next_freq = 0;
+ sg_policy->cached_raw_freq = 0;
+ }
+
freq = map_util_freq(util, freq, max);

if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
@@ -337,8 +341,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,

ignore_dl_rate_limit(sg_cpu, sg_policy);

+ if (!sg_policy->policy->fast_switch_enabled) {
+ raw_spin_lock(&sg_policy->update_lock);
+ if (sg_policy->work_in_progress)
+ goto unlock;
+ }
+
if (!sugov_should_update_freq(sg_policy, time))
- return;
+ goto unlock;

/* Limits may have changed, don't skip frequency update */
busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);
@@ -363,13 +373,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
* concurrently on two different CPUs for the same target and it is not
* necessary to acquire the lock in the fast switch case.
*/
- if (sg_policy->policy->fast_switch_enabled) {
+ if (sg_policy->policy->fast_switch_enabled)
sugov_fast_switch(sg_policy, time, next_f);
- } else {
- raw_spin_lock(&sg_policy->update_lock);
+ else
sugov_deferred_update(sg_policy, time, next_f);
+
+unlock:
+ if (!sg_policy->policy->fast_switch_enabled)
raw_spin_unlock(&sg_policy->update_lock);
- }
}

static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
@@ -405,6 +416,9 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)

raw_spin_lock(&sg_policy->update_lock);

+ if (sg_policy->work_in_progress)
+ goto unlock;
+
sugov_iowait_boost(sg_cpu, time, flags);
sg_cpu->last_update = time;

@@ -419,33 +433,30 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
sugov_deferred_update(sg_policy, time, next_f);
}

+unlock:
raw_spin_unlock(&sg_policy->update_lock);
}

static void sugov_work(struct kthread_work *work)
{
struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
- unsigned int freq;
unsigned long flags;

/*
- * Hold sg_policy->update_lock shortly to handle the case where:
- * incase sg_policy->next_freq is read here, and then updated by
- * sugov_deferred_update() just before work_in_progress is set to false
- * here, we may miss queueing the new update.
- *
- * Note: If a work was queued after the update_lock is released,
- * sugov_work() will just be called again by kthread_work code; and the
- * request will be proceed before the sugov thread sleeps.
+ * Prevent the schedutil hook to run in parallel while we are updating
+ * the frequency here and accessing next_freq.
*/
raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
- freq = sg_policy->next_freq;
- sg_policy->work_in_progress = false;
+ sg_policy->work_in_progress = true;
raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);

mutex_lock(&sg_policy->work_lock);
- __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
+ __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq, CPUFREQ_RELATION_L);
mutex_unlock(&sg_policy->work_lock);
+
+ raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
+ sg_policy->work_in_progress = false;
+ raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
}

static void sugov_irq_work(struct irq_work *irq_work)

--
viresh

2020-10-29 11:21:36

by zhuguangqing83

Subject: Re: [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq

On 10/29/2020 15:19, Viresh Kumar <[email protected]> wrote:
> Your mail client is screwing the "In-reply-to" field of the message
> and that prevents it to appear properly in the thread in mailboxes of
> other people, please fix that.
>

I will try to fix that.

> On 29-10-20, 09:43, zhuguangqing83 wrote:
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 0c5c61a095f6..bf7800e853d3 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -105,7 +105,6 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > > if (sg_policy->next_freq == next_freq)
> > > return false;
> > >
> > > - sg_policy->next_freq = next_freq;
> > > sg_policy->last_freq_update_time = time;
> > >
> > > return true;
> >
> > It's a little strange that sg_policy->next_freq and
> > sg_policy->last_freq_update_time are not updated at the same time.
> >
> > > @@ -115,7 +114,7 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
> > > unsigned int next_freq)
> > > {
> > > if (sugov_update_next_freq(sg_policy, time, next_freq))
> > > - cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> > > + sg_policy->next_freq = cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> > > }
> > >
> >
> > Great, it also takes into account the issue that 0 is returned by the
> > driver's ->fast_switch() callback to indicate an error condition.
>
> Yes but even my change wasn't good enough, more on it later.
>
> > For policy->min/max may be not the real CPU frequency in OPPs, so
>
> No, that can't happen. If userspace tries to set a value too large or
> too small, we clamp that too to policy->max/min and so the below
> problem shall never occur.
>
> > next_freq got from get_next_freq() which is after clamping between
> > policy->min and policy->max may be not the real CPU frequency in OPPs.
> > In that case, if we use a real CPU frequency in OPPs returned from
> > cpufreq_driver_fast_switch() to compare with next_freq,
> > "if (sg_policy->next_freq == next_freq)" will never be satisfied, so we
> > change the CPU frequency every time schedutil callback gets called from
> > the scheduler. I see the current code in get_next_freq() as following,
> > the issue mentioned above should not happen. Maybe it's just one of my
> > unnecessary worries.
>
> Coming back to my patch (and yours too), it only fixes the fast-switch
> case and not the slow path which can also end up clamping the
> frequency. And to be honest, even the drivers can have their own
> clamping code in place, no one is stopping them too.
>
> And we also need to do something about the cached_raw_freq as well, as
> it will not be in sync with next_freq anymore.
>
> Here is another attempt from me tackling this problem. The idea is to
> check if the previous freq update went as expected or not. And if not,
> we can't rely on next_freq or cached_raw_freq anymore. For this to
> work properly, we need to make sure policy->cur isn't getting updated
> at the same time when get_next_freq() is running. For that I have
> given a different meaning to work_in_progress flag, which now creates
> a lockless (kind of) critical section where we won't play with
> next_freq while the cpufreq core is updating the frequency.
>

I think your patch is ok for tackling this problem.

> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 0c5c61a095f6..8991cc31b011 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -121,13 +121,8 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
> static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
> unsigned int next_freq)
> {
> - if (!sugov_update_next_freq(sg_policy, time, next_freq))
> - return;
> -
> - if (!sg_policy->work_in_progress) {
> - sg_policy->work_in_progress = true;
> + if (sugov_update_next_freq(sg_policy, time, next_freq))
> irq_work_queue(&sg_policy->irq_work);
> - }
> }
>
> /**
> @@ -159,6 +154,15 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> unsigned int freq = arch_scale_freq_invariant() ?
> policy->cpuinfo.max_freq : policy->cur;
>
> + /*
> + * The previous frequency update didn't go as we expected it to be. Lets
> + * start again to make sure we don't miss any updates.
> + */
> + if (unlikely(policy->cur != sg_policy->next_freq)) {
> + sg_policy->next_freq = 0;
> + sg_policy->cached_raw_freq = 0;
> + }
> +
> freq = map_util_freq(util, freq, max);
>
> if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> @@ -337,8 +341,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>
> ignore_dl_rate_limit(sg_cpu, sg_policy);
>
> + if (!sg_policy->policy->fast_switch_enabled) {
> + raw_spin_lock(&sg_policy->update_lock);
> + if (sg_policy->work_in_progress)
> + goto unlock;
> + }
> +

Maybe it's better to move the following check before the code above:

	if (!sugov_should_update_freq(sg_policy, time))
		return;

> if (!sugov_should_update_freq(sg_policy, time))
> - return;
> + goto unlock;
>
> /* Limits may have changed, don't skip frequency update */
> busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);
> @@ -363,13 +373,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> * concurrently on two different CPUs for the same target and it is not
> * necessary to acquire the lock in the fast switch case.
> */
> - if (sg_policy->policy->fast_switch_enabled) {
> + if (sg_policy->policy->fast_switch_enabled)
> sugov_fast_switch(sg_policy, time, next_f);
> - } else {
> - raw_spin_lock(&sg_policy->update_lock);
> + else
> sugov_deferred_update(sg_policy, time, next_f);
> +
> +unlock:
> + if (!sg_policy->policy->fast_switch_enabled)
> raw_spin_unlock(&sg_policy->update_lock);
> - }
> }
>
> static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> @@ -405,6 +416,9 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
>
> raw_spin_lock(&sg_policy->update_lock);
>
> + if (sg_policy->work_in_progress)
> + goto unlock;
> +
> sugov_iowait_boost(sg_cpu, time, flags);
> sg_cpu->last_update = time;
>
> @@ -419,33 +433,30 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
> sugov_deferred_update(sg_policy, time, next_f);
> }
>
> +unlock:
> raw_spin_unlock(&sg_policy->update_lock);
> }
>
> static void sugov_work(struct kthread_work *work)
> {
> struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> - unsigned int freq;
> unsigned long flags;
>
> /*
> - * Hold sg_policy->update_lock shortly to handle the case where:
> - * incase sg_policy->next_freq is read here, and then updated by
> - * sugov_deferred_update() just before work_in_progress is set to false
> - * here, we may miss queueing the new update.
> - *
> - * Note: If a work was queued after the update_lock is released,
> - * sugov_work() will just be called again by kthread_work code; and the
> - * request will be proceed before the sugov thread sleeps.
> + * Prevent the schedutil hook to run in parallel while we are updating
> + * the frequency here and accessing next_freq.
> */
> raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> - freq = sg_policy->next_freq;
> - sg_policy->work_in_progress = false;
> + sg_policy->work_in_progress = true;
> raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>
> mutex_lock(&sg_policy->work_lock);
> - __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> + __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq, CPUFREQ_RELATION_L);
> mutex_unlock(&sg_policy->work_lock);
> +
> + raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> + sg_policy->work_in_progress = false;
> + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
> }
>
> static void sugov_irq_work(struct irq_work *irq_work)
>
> --
> viresh

2020-10-29 11:29:24

by Viresh Kumar

Subject: Re: [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq

On 29-10-20, 19:17, zhuguangqing83 wrote:
> I think your patch is ok for tackling this problem.

Great.

> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 0c5c61a095f6..8991cc31b011 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -121,13 +121,8 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
> > static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
> > unsigned int next_freq)
> > {
> > - if (!sugov_update_next_freq(sg_policy, time, next_freq))
> > - return;
> > -
> > - if (!sg_policy->work_in_progress) {
> > - sg_policy->work_in_progress = true;
> > + if (sugov_update_next_freq(sg_policy, time, next_freq))
> > irq_work_queue(&sg_policy->irq_work);
> > - }
> > }
> >
> > /**
> > @@ -159,6 +154,15 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> > unsigned int freq = arch_scale_freq_invariant() ?
> > policy->cpuinfo.max_freq : policy->cur;
> >
> > + /*
> > + * The previous frequency update didn't go as we expected it to be. Lets
> > + * start again to make sure we don't miss any updates.
> > + */
> > + if (unlikely(policy->cur != sg_policy->next_freq)) {
> > + sg_policy->next_freq = 0;
> > + sg_policy->cached_raw_freq = 0;
> > + }
> > +
> > freq = map_util_freq(util, freq, max);
> >
> > if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> > @@ -337,8 +341,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >
> > ignore_dl_rate_limit(sg_cpu, sg_policy);
> >
> > + if (!sg_policy->policy->fast_switch_enabled) {
> > + raw_spin_lock(&sg_policy->update_lock);
> > + if (sg_policy->work_in_progress)
> > + goto unlock;
> > + }
> > +
>
> Maybe it's better to bring the following code before the code above.
> if (!sugov_should_update_freq(sg_policy, time))
> return;

Maybe not. We want to avoid all of that in case a freq update is
already on its way elsewhere, as there are other flags we are touching
in sugov_should_update_freq().

I will send a proper patch for this shortly. It would be helpful if
you could give it a go and provide your Tested-by.

--
viresh

2020-10-29 12:55:29

by Rafael J. Wysocki

Subject: Re: [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq

On Thursday, October 29, 2020 8:19:52 AM CET Viresh Kumar wrote:
> Your mail client is screwing the "In-reply-to" field of the message
> and that prevents it to appear properly in the thread in mailboxes of
> other people, please fix that.
>
> On 29-10-20, 09:43, zhuguangqing83 wrote:
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 0c5c61a095f6..bf7800e853d3 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -105,7 +105,6 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > > if (sg_policy->next_freq == next_freq)
> > > return false;
> > >
> > > - sg_policy->next_freq = next_freq;
> > > sg_policy->last_freq_update_time = time;
> > >
> > > return true;
> >
> > It's a little strange that sg_policy->next_freq and
> > sg_policy->last_freq_update_time are not updated at the same time.
> >
> > > @@ -115,7 +114,7 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
> > > unsigned int next_freq)
> > > {
> > > if (sugov_update_next_freq(sg_policy, time, next_freq))
> > > - cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> > > + sg_policy->next_freq = cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> > > }
> > >
> >
> > Great, it also takes into account the issue that 0 is returned by the
> > driver's ->fast_switch() callback to indicate an error condition.
>
> Yes but even my change wasn't good enough, more on it later.
>
> > For policy->min/max may be not the real CPU frequency in OPPs, so
>
> No, that can't happen. If userspace tries to set a value too large or
> too small, we clamp that too to policy->max/min and so the below
> problem shall never occur.
>
> > next_freq got from get_next_freq() which is after clamping between
> > policy->min and policy->max may be not the real CPU frequency in OPPs.
> > In that case, if we use a real CPU frequency in OPPs returned from
> > cpufreq_driver_fast_switch() to compare with next_freq,
> > "if (sg_policy->next_freq == next_freq)" will never be satisfied, so we
> > change the CPU frequency every time schedutil callback gets called from
> > the scheduler. I see the current code in get_next_freq() as following,
> > the issue mentioned above should not happen. Maybe it's just one of my
> > unnecessary worries.
>
> Coming back to my patch (and yours too), it only fixes the fast-switch
> case and not the slow path which can also end up clamping the
> frequency. And to be honest, even the drivers can have their own
> clamping code in place, no one is stopping them too.
>
> And we also need to do something about the cached_raw_freq as well, as
> it will not be in sync with next_freq anymore.

I'm not actually sure why freq is not clamped between the policy min
and max before comparing it with the cached "raw" value.

IIRC the reason for having cached_raw_freq is that freq may not
correspond to any of the available values in the driver's table, but
why does it not take the policy min and max into account?
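
For context, get_next_freq() at the time is roughly (paraphrased,
ignoring the CPUFREQ_NEED_UPDATE_LIMITS handling):

static unsigned int get_next_freq(struct sugov_policy *sg_policy,
				  unsigned long util, unsigned long max)
{
	struct cpufreq_policy *policy = sg_policy->policy;
	unsigned int freq = arch_scale_freq_invariant() ?
				policy->cpuinfo.max_freq : policy->cur;

	freq = map_util_freq(util, freq, max);

	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
		return sg_policy->next_freq;

	sg_policy->need_freq_update = false;
	sg_policy->cached_raw_freq = freq;
	return cpufreq_driver_resolve_freq(policy, freq);
}

so the clamping to policy->min/max only happens inside
cpufreq_driver_resolve_freq(), after the cached_raw_freq comparison.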

> Here is another attempt from me tackling this problem. The idea is to
> check if the previous freq update went as expected or not. And if not,
> we can't rely on next_freq or cached_raw_freq anymore.

Well, need_freq_update was kind of supposed to be the flag for that.

IOW, if need_freq_update is set, all of the cached frequency values
should be discarded and the update should go to the driver, or at least
to __cpufreq_driver_target(). [Note that if user space flips one of the
limits to a new value and back to the previous one between two
consecutive utilization updates from the scheduler, we'll end up
invoking the driver even though the frequency hasn't really changed,
but that's part of the design.]

IOW, when need_freq_update is set in sugov_should_update_freq(), it should
guarantee that the update will reach either cpufreq_driver_fast_switch()
or __cpufreq_driver_target(). IMO this needs to be fixed first.

Now, say this is the case.

Can we miss a limits update?

If limits_changed is set in sugov_limits() before sugov_should_update_freq()
checks it, then all should be fine if I'm not missing anything, so say it is
set after that check.
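
For reference, the check in question in sugov_should_update_freq() is
roughly (paraphrased):

	if (unlikely(sg_policy->limits_changed)) {
		sg_policy->limits_changed = false;
		sg_policy->need_freq_update = true;
		return true;
	}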

First, suppose that it wasn't set before, so the check fails and the
update goes with need_freq_update unset. In that case the next run will
have need_freq_update set and it will reach the "lower level".

In turn, if it was set before, the check succeeds and now limits_changed
is cleared and need_freq_update is set. Obviously, there is a race
between sugov_limits() and sugov_should_update_freq(), and say the
former wins, so limits_changed is set again and the current update as
well as the next one will both have need_freq_update set, and both of
them will reach the "lower level". This is fine.

OTOH, if sugov_should_update_freq() wins the race, limits_changed will remain
unset, so the next update will not have need_freq_update set, but it will be
set for the current one which will reach the "lower level".

So this should work in either case if I'm not mistaken.

Hence, the problem appears to be that setting need_freq_update does not guarantee
that the update will reach the "lower level" and it looks like need_freq_update
is cleared too early: it should be cleared in sugov_update_next_freq() which
should let the update reach the "lower level" if need_freq_update is set.



2020-10-30 07:33:03

by Viresh Kumar

Subject: [PATCH] cpufreq: schedutil: Don't skip freq update if need_freq_update is set

The cpufreq policy's frequency limits (min/max) can get changed at any
point of time, while schedutil is trying to update the next frequency.
Though the schedutil governor has necessary locking and support in place
to make sure we don't miss any of those updates, there is a corner case
where the governor will find that the CPU is already running at the
desired frequency and so may skip an update.

For example, consider that the CPU can run at 1 GHz, 1.2 GHz and 1.4 GHz
and is running at 1 GHz currently. Schedutil tries to update the
frequency to 1.2 GHz, and during this time the policy limits get changed
to policy->min = 1.4 GHz. As schedutil (and the cpufreq core) clamps the
frequency at various points, we will eventually set the frequency to
1.4 GHz, while we will save 1.2 GHz in sg_policy->next_freq.

Now let's say the policy limits get changed back at this time with
policy->min as 1 GHz. The next time schedutil is invoked by the
scheduler, we will reevaluate the next frequency (because
need_freq_update will get set due to the limits change event) and let's
say we want to set the frequency to 1.2 GHz again. At this point
sugov_update_next_freq() will find next_freq == current_freq and
will abort the update, while the CPU actually runs at 1.4 GHz.

Until now need_freq_update was used as a flag to indicate that the
policy's frequency limits have changed, and that we should consider the
new limits while reevaluating the next frequency.

This patch fixes the above mentioned issue by extending the purpose of
the need_freq_update flag. If this flag is set now, the schedutil
governor will not try to abort a frequency change even if next_freq ==
current_freq.

As similar behavior is required in the case of the
CPUFREQ_NEED_UPDATE_LIMITS flag as well, need_freq_update will never be
set to false if that flag is set for the driver.

We also don't need to consider the need_freq_update flag in
sugov_update_single() anymore to handle the special case of a busy CPU,
as we won't abort a frequency update anymore.

Reported-by: zhuguangqing <[email protected]>
Suggested-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
---
kernel/sched/cpufreq_schedutil.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index c03a5775d019..c6861be02c86 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -102,9 +102,12 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
unsigned int next_freq)
{
- if (sg_policy->next_freq == next_freq &&
- !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
- return false;
+ if (!sg_policy->need_freq_update) {
+ if (sg_policy->next_freq == next_freq)
+ return false;
+ } else if (!cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS)) {
+ sg_policy->need_freq_update = false;
+ }

sg_policy->next_freq = next_freq;
sg_policy->last_freq_update_time = time;
@@ -162,11 +165,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,

freq = map_util_freq(util, freq, max);

- if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update &&
- !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
+ if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
return sg_policy->next_freq;

- sg_policy->need_freq_update = false;
sg_policy->cached_raw_freq = freq;
return cpufreq_driver_resolve_freq(policy, freq);
}
@@ -442,7 +443,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
struct sugov_policy *sg_policy = sg_cpu->sg_policy;
unsigned long util, max;
unsigned int next_f;
- bool busy;
unsigned int cached_freq = sg_policy->cached_raw_freq;

sugov_iowait_boost(sg_cpu, time, flags);
@@ -453,9 +453,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
if (!sugov_should_update_freq(sg_policy, time))
return;

- /* Limits may have changed, don't skip frequency update */
- busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);
-
util = sugov_get_util(sg_cpu);
max = sg_cpu->max;
util = sugov_iowait_apply(sg_cpu, time, util, max);
@@ -464,7 +461,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
* Do not reduce the frequency if the CPU has not been idle
* recently, as the reduction is likely to be premature then.
*/
- if (busy && next_f < sg_policy->next_freq) {
+ if (sugov_cpu_is_busy(sg_cpu) && next_f < sg_policy->next_freq) {
next_f = sg_policy->next_freq;

/* Restore cached freq as next_freq has changed */
@@ -829,9 +826,10 @@ static int sugov_start(struct cpufreq_policy *policy)
sg_policy->next_freq = 0;
sg_policy->work_in_progress = false;
sg_policy->limits_changed = false;
- sg_policy->need_freq_update = false;
sg_policy->cached_raw_freq = 0;

+ sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
+
for_each_cpu(cpu, policy->cpus) {
struct sugov_cpu *sg_cpu = &per_cpu(sugov_cpu, cpu);

--
2.25.0.rc1.19.g042ed3e048af

2020-10-30 15:10:18

by Rafael J. Wysocki

Subject: Re: [PATCH] cpufreq: schedutil: Don't skip freq update if need_freq_update is set

On Fri, Oct 30, 2020 at 8:31 AM Viresh Kumar <[email protected]> wrote:
>
> The cpufreq policy's frequency limits (min/max) can get changed at any
> point of time, while schedutil is trying to update the next frequency.
> Though the schedutil governor has necessary locking and support in place
> to make sure we don't miss any of those updates, there is a corner case
> where the governor will find that the CPU is already running at the
> desired frequency and so may skip an update.
>
> For example, consider that the CPU can run at 1 GHz, 1.2 GHz and 1.4 GHz
> and is running at 1 GHz currently. Schedutil tries to update the
> frequency to 1.2 GHz, during this time the policy limits get changed as
> policy->min = 1.4 GHz. As schedutil (and cpufreq core) does clamp the
> frequency at various instances, we will eventually set the frequency to
> 1.4 GHz, while we will save 1.2 GHz in sg_policy->next_freq.
>
> Now lets say the policy limits get changed back at this time with
> policy->min as 1 GHz. The next time schedutil is invoked by the
> scheduler, we will reevaluate the next frequency (because
> need_freq_update will get set due to limits change event) and lets say
> we want to set the frequency to 1.2 GHz again. At this point
> sugov_update_next_freq() will find the next_freq == current_freq and
> will abort the update, while the CPU actually runs at 1.4 GHz.
>
> Until now need_freq_update was used as a flag to indicate that the
> policy's frequency limits have changed, and that we should consider the
> new limits while reevaluating the next frequency.
>
> This patch fixes the above mentioned issue by extending the purpose of
> the need_freq_update flag. If this flag is set now, the schedutil
> governor will not try to abort a frequency change even if next_freq ==
> current_freq.
>
> As similar behavior is required in the case of
> CPUFREQ_NEED_UPDATE_LIMITS flag as well, need_freq_update will never be
> set to false if that flag is set for the driver.
>
> We also don't need to consider the need_freq_update flag in
> sugov_update_single() anymore to handle the special case of busy CPU, as
> we won't abort a frequency update anymore.
>
> Reported-by: zhuguangqing <[email protected]>
> Suggested-by: Rafael J. Wysocki <[email protected]>

Thanks for following my suggestion!

> Signed-off-by: Viresh Kumar <[email protected]>
> ---
> kernel/sched/cpufreq_schedutil.c | 22 ++++++++++------------
> 1 file changed, 10 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index c03a5775d019..c6861be02c86 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -102,9 +102,12 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> unsigned int next_freq)
> {
> - if (sg_policy->next_freq == next_freq &&
> - !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> - return false;
> + if (!sg_policy->need_freq_update) {
> + if (sg_policy->next_freq == next_freq)
> + return false;
> + } else if (!cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS)) {
> + sg_policy->need_freq_update = false;
> + }
>
> sg_policy->next_freq = next_freq;
> sg_policy->last_freq_update_time = time;
> @@ -162,11 +165,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>
> freq = map_util_freq(util, freq, max);
>
> - if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update &&
> - !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> + if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> return sg_policy->next_freq;
>
> - sg_policy->need_freq_update = false;
> sg_policy->cached_raw_freq = freq;
> return cpufreq_driver_resolve_freq(policy, freq);
> }
> @@ -442,7 +443,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> unsigned long util, max;
> unsigned int next_f;
> - bool busy;
> unsigned int cached_freq = sg_policy->cached_raw_freq;
>
> sugov_iowait_boost(sg_cpu, time, flags);
> @@ -453,9 +453,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> if (!sugov_should_update_freq(sg_policy, time))
> return;
>
> - /* Limits may have changed, don't skip frequency update */
> - busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);
> -
> util = sugov_get_util(sg_cpu);
> max = sg_cpu->max;
> util = sugov_iowait_apply(sg_cpu, time, util, max);
> @@ -464,7 +461,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> * Do not reduce the frequency if the CPU has not been idle
> * recently, as the reduction is likely to be premature then.
> */
> - if (busy && next_f < sg_policy->next_freq) {
> + if (sugov_cpu_is_busy(sg_cpu) && next_f < sg_policy->next_freq) {
> next_f = sg_policy->next_freq;
>
> /* Restore cached freq as next_freq has changed */
> @@ -829,9 +826,10 @@ static int sugov_start(struct cpufreq_policy *policy)
> sg_policy->next_freq = 0;
> sg_policy->work_in_progress = false;
> sg_policy->limits_changed = false;
> - sg_policy->need_freq_update = false;
> sg_policy->cached_raw_freq = 0;
>
> + sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> +
> for_each_cpu(cpu, policy->cpus) {
> struct sugov_cpu *sg_cpu = &per_cpu(sugov_cpu, cpu);
>
> --

I'll queue it up for -rc3 next week, thanks!

2020-10-30 15:25:21

by Rafael J. Wysocki

Subject: Re: [PATCH] cpufreq: schedutil: Don't skip freq update if need_freq_update is set

On Fri, Oct 30, 2020 at 4:07 PM Rafael J. Wysocki <[email protected]> wrote:
>
> On Fri, Oct 30, 2020 at 8:31 AM Viresh Kumar <[email protected]> wrote:
> >
> > The cpufreq policy's frequency limits (min/max) can get changed at any
> > point of time, while schedutil is trying to update the next frequency.
> > Though the schedutil governor has necessary locking and support in place
> > to make sure we don't miss any of those updates, there is a corner case
> > where the governor will find that the CPU is already running at the
> > desired frequency and so may skip an update.
> >
> > For example, consider that the CPU can run at 1 GHz, 1.2 GHz and 1.4 GHz
> > and is running at 1 GHz currently. Schedutil tries to update the
> > frequency to 1.2 GHz, during this time the policy limits get changed as
> > policy->min = 1.4 GHz. As schedutil (and cpufreq core) does clamp the
> > frequency at various instances, we will eventually set the frequency to
> > 1.4 GHz, while we will save 1.2 GHz in sg_policy->next_freq.
> >
> > Now lets say the policy limits get changed back at this time with
> > policy->min as 1 GHz. The next time schedutil is invoked by the
> > scheduler, we will reevaluate the next frequency (because
> > need_freq_update will get set due to limits change event) and lets say
> > we want to set the frequency to 1.2 GHz again. At this point
> > sugov_update_next_freq() will find the next_freq == current_freq and
> > will abort the update, while the CPU actually runs at 1.4 GHz.
> >
> > Until now need_freq_update was used as a flag to indicate that the
> > policy's frequency limits have changed, and that we should consider the
> > new limits while reevaluating the next frequency.
> >
> > This patch fixes the above mentioned issue by extending the purpose of
> > the need_freq_update flag. If this flag is set now, the schedutil
> > governor will not try to abort a frequency change even if next_freq ==
> > current_freq.
> >
> > As similar behavior is required in the case of
> > CPUFREQ_NEED_UPDATE_LIMITS flag as well, need_freq_update will never be
> > set to false if that flag is set for the driver.
> >
> > We also don't need to consider the need_freq_update flag in
> > sugov_update_single() anymore to handle the special case of busy CPU, as
> > we won't abort a frequency update anymore.
> >
> > Reported-by: zhuguangqing <[email protected]>
> > Suggested-by: Rafael J. Wysocki <[email protected]>
>
> Thanks for following my suggestion!
>
> > Signed-off-by: Viresh Kumar <[email protected]>
> > ---
> > kernel/sched/cpufreq_schedutil.c | 22 ++++++++++------------
> > 1 file changed, 10 insertions(+), 12 deletions(-)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index c03a5775d019..c6861be02c86 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -102,9 +102,12 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> > static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > unsigned int next_freq)
> > {
> > - if (sg_policy->next_freq == next_freq &&
> > - !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> > - return false;
> > + if (!sg_policy->need_freq_update) {
> > + if (sg_policy->next_freq == next_freq)
> > + return false;
> > + } else if (!cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS)) {
> > + sg_policy->need_freq_update = false;

One nit, though.

This can be changed into

	} else {
		sg_policy->need_freq_update =
			cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
	}

to save a branch and because need_freq_update is there in the cache
already, this should be a fast update.

So I'm going to make this change while applying the patch.

> > + }
> >
> > sg_policy->next_freq = next_freq;
> > sg_policy->last_freq_update_time = time;
> > @@ -162,11 +165,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> >
> > freq = map_util_freq(util, freq, max);
> >
> > - if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update &&
> > - !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> > + if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> > return sg_policy->next_freq;
> >
> > - sg_policy->need_freq_update = false;
> > sg_policy->cached_raw_freq = freq;
> > return cpufreq_driver_resolve_freq(policy, freq);
> > }
> > @@ -442,7 +443,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> > struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> > unsigned long util, max;
> > unsigned int next_f;
> > - bool busy;
> > unsigned int cached_freq = sg_policy->cached_raw_freq;
> >
> > sugov_iowait_boost(sg_cpu, time, flags);
> > @@ -453,9 +453,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> > if (!sugov_should_update_freq(sg_policy, time))
> > return;
> >
> > - /* Limits may have changed, don't skip frequency update */
> > - busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);
> > -
> > util = sugov_get_util(sg_cpu);
> > max = sg_cpu->max;
> > util = sugov_iowait_apply(sg_cpu, time, util, max);
> > @@ -464,7 +461,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> > * Do not reduce the frequency if the CPU has not been idle
> > * recently, as the reduction is likely to be premature then.
> > */
> > - if (busy && next_f < sg_policy->next_freq) {
> > + if (sugov_cpu_is_busy(sg_cpu) && next_f < sg_policy->next_freq) {
> > next_f = sg_policy->next_freq;
> >
> > /* Restore cached freq as next_freq has changed */
> > @@ -829,9 +826,10 @@ static int sugov_start(struct cpufreq_policy *policy)
> > sg_policy->next_freq = 0;
> > sg_policy->work_in_progress = false;
> > sg_policy->limits_changed = false;
> > - sg_policy->need_freq_update = false;
> > sg_policy->cached_raw_freq = 0;
> >
> > + sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > +
> > for_each_cpu(cpu, policy->cpus) {
> > struct sugov_cpu *sg_cpu = &per_cpu(sugov_cpu, cpu);
> >
> > --
>
> I'll queue it up for -rc3 next week, thanks!

2020-11-02 04:46:53

by Viresh Kumar

Subject: Re: [PATCH] cpufreq: schedutil: Don't skip freq update if need_freq_update is set

On 30-10-20, 16:23, Rafael J. Wysocki wrote:
> On Fri, Oct 30, 2020 at 4:07 PM Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Fri, Oct 30, 2020 at 8:31 AM Viresh Kumar <[email protected]> wrote:
> > >
> > > The cpufreq policy's frequency limits (min/max) can get changed at any
> > > point of time, while schedutil is trying to update the next frequency.
> > > Though the schedutil governor has necessary locking and support in place
> > > to make sure we don't miss any of those updates, there is a corner case
> > > where the governor will find that the CPU is already running at the
> > > desired frequency and so may skip an update.
> > >
> > > For example, consider that the CPU can run at 1 GHz, 1.2 GHz and 1.4 GHz
> > > and is running at 1 GHz currently. Schedutil tries to update the
> > > frequency to 1.2 GHz, during this time the policy limits get changed as
> > > policy->min = 1.4 GHz. As schedutil (and cpufreq core) does clamp the
> > > frequency at various instances, we will eventually set the frequency to
> > > 1.4 GHz, while we will save 1.2 GHz in sg_policy->next_freq.
> > >
> > > Now lets say the policy limits get changed back at this time with
> > > policy->min as 1 GHz. The next time schedutil is invoked by the
> > > scheduler, we will reevaluate the next frequency (because
> > > need_freq_update will get set due to limits change event) and lets say
> > > we want to set the frequency to 1.2 GHz again. At this point
> > > sugov_update_next_freq() will find the next_freq == current_freq and
> > > will abort the update, while the CPU actually runs at 1.4 GHz.
> > >
> > > Until now need_freq_update was used as a flag to indicate that the
> > > policy's frequency limits have changed, and that we should consider the
> > > new limits while reevaluating the next frequency.
> > >
> > > This patch fixes the above mentioned issue by extending the purpose of
> > > the need_freq_update flag. If this flag is set now, the schedutil
> > > governor will not try to abort a frequency change even if next_freq ==
> > > current_freq.
> > >
> > > As similar behavior is required in the case of
> > > CPUFREQ_NEED_UPDATE_LIMITS flag as well, need_freq_update will never be
> > > set to false if that flag is set for the driver.
> > >
> > > We also don't need to consider the need_freq_update flag in
> > > sugov_update_single() anymore to handle the special case of busy CPU, as
> > > we won't abort a frequency update anymore.
> > >
> > > Reported-by: zhuguangqing <[email protected]>
> > > Suggested-by: Rafael J. Wysocki <[email protected]>
> >
> > Thanks for following my suggestion!
> >
> > > Signed-off-by: Viresh Kumar <[email protected]>
> > > ---
> > > kernel/sched/cpufreq_schedutil.c | 22 ++++++++++------------
> > > 1 file changed, 10 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index c03a5775d019..c6861be02c86 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -102,9 +102,12 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> > > static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > > unsigned int next_freq)
> > > {
> > > - if (sg_policy->next_freq == next_freq &&
> > > - !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> > > - return false;
> > > + if (!sg_policy->need_freq_update) {
> > > + if (sg_policy->next_freq == next_freq)
> > > + return false;
> > > + } else if (!cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS)) {
> > > + sg_policy->need_freq_update = false;
>
> One nit, though.
>
> This can be changed into
>
> } else {
> sg_policy->need_freq_update =
> cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> }
>
> to save a branch and because need_freq_update is there in the cache
> already, this should be a fast update.

Nice.

--
viresh