2018-04-10 16:04:32

by Patrick Bellasi

Subject: [PATCH v2] cpufreq/schedutil: Cleanup, document and fix iowait boost

The iowait boosting code has recently been updated to add a progressive
boosting behavior, which makes it less aggressive in boosting tasks
doing only sporadic IO operations and thus more energy efficient, for
example on mobile platforms.

The current code is, however, now a bit convoluted. Some functionality
(e.g. the iowait boost reset) is replicated in different paths, and its
documentation is slightly misaligned.

Moreover, from a functional standpoint, the iowait boost is also not
always reset on systems where cpufreq policies are not shared, i.e. each
CPU has its own policy. Indeed, when a CPU has been idle for a long time
and a task wakes up from IO, we keep doubling the boost instead of
resetting it to the minimum frequency, as the TICK_NSEC logic is
expected to do.

Let's clean up the code by consolidating all the IO wait boosting
related functionality inside the already existing functions and better
define their roles:

- sugov_set_iowait_boost: is now in charge only of setting/increasing
the IO wait boost, every time a task wakes up from an IO wait.

- sugov_iowait_boost: is now in charge of resetting/reducing the IO
wait boost every time a sugov update is triggered, as well as
(eventually) enforcing the currently required IO boost value.

This is possible since these two functions are already used one after
the other, in both single and shared frequency domains, following the
same template:

/* Configure IO boost, if required */
sugov_set_iowait_boost()

/* Return here if freq change is in progress or throttled */

/* Collect and aggregate utilization information */
sugov_get_util()
sugov_aggregate_util()

/* Add IO boost if currently enabled */
sugov_iowait_boost()

As an extra bonus, let's also add documentation for these two
functions and better align the in-code documentation.

Signed-off-by: Patrick Bellasi <[email protected]>
Reported-by: Viresh Kumar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Steve Muckle <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: [email protected]
Cc: [email protected]

---
Changes in v2:
- Fix return in sugov_iowait_boost()'s reset code (Viresh)
- Add iowait boost reset for sugov_update_single() (Viresh)
- Title changed to reflect the fix from the previous point

Based on today's tip/sched/core:
b720342 sched/core: Update preempt_notifier_key to modern API
---
kernel/sched/cpufreq_schedutil.c | 120 ++++++++++++++++++++++++++-------------
1 file changed, 81 insertions(+), 39 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 2b124811947d..2a2ae3a0e41f 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -51,7 +51,7 @@ struct sugov_cpu {
bool iowait_boost_pending;
unsigned int iowait_boost;
unsigned int iowait_boost_max;
- u64 last_update;
+ u64 last_update;

/* The fields below are only needed when sharing a policy: */
unsigned long util_cfs;
@@ -201,43 +201,97 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
return min(util, sg_cpu->max);
}

-static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time, unsigned int flags)
+/**
+ * sugov_set_iowait_boost updates the IO boost at each wakeup from IO.
+ * @sg_cpu: the sugov data for the CPU to boost
+ * @time: the update time from the caller
+ * @flags: SCHED_CPUFREQ_IOWAIT if the task is waking up after an IO wait
+ *
+ * Each time a task wakes up after an IO operation, the CPU utilization can be
+ * boosted to a certain utilization which is doubled at each wakeup
+ * from IO, starting from the utilization of the minimum OPP to that of the
+ * maximum one.
+ */
+static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
+ unsigned int flags)
{
- if (flags & SCHED_CPUFREQ_IOWAIT) {
- if (sg_cpu->iowait_boost_pending)
- return;
-
- sg_cpu->iowait_boost_pending = true;
+ bool iowait = flags & SCHED_CPUFREQ_IOWAIT;

- if (sg_cpu->iowait_boost) {
- sg_cpu->iowait_boost <<= 1;
- if (sg_cpu->iowait_boost > sg_cpu->iowait_boost_max)
- sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
- } else {
- sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->min;
- }
- } else if (sg_cpu->iowait_boost) {
+ /* Reset boost if the CPU appears to have been idle enough */
+ if (sg_cpu->iowait_boost) {
s64 delta_ns = time - sg_cpu->last_update;

- /* Clear iowait_boost if the CPU apprears to have been idle. */
if (delta_ns > TICK_NSEC) {
- sg_cpu->iowait_boost = 0;
- sg_cpu->iowait_boost_pending = false;
+ sg_cpu->iowait_boost = iowait
+ ? sg_cpu->sg_policy->policy->min : 0;
+ sg_cpu->iowait_boost_pending = iowait;
+ return;
}
}
+
+ /* Boost only tasks waking up after IO */
+ if (!iowait)
+ return;
+
+ /* Ensure IO boost doubles only one time at each frequency increase */
+ if (sg_cpu->iowait_boost_pending)
+ return;
+ sg_cpu->iowait_boost_pending = true;
+
+ /* Double the IO boost at each frequency increase */
+ if (sg_cpu->iowait_boost) {
+ sg_cpu->iowait_boost <<= 1;
+ if (sg_cpu->iowait_boost > sg_cpu->iowait_boost_max)
+ sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
+ return;
+ }
+
+ /* At first wakeup after IO, start with minimum boost */
+ sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->min;
}

-static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
- unsigned long *max)
+/**
+ * sugov_iowait_boost boosts a CPU after a wakeup from IO.
+ * @sg_cpu: the sugov data for the cpu to boost
+ * @time: the update time from the caller
+ * @util: the utilization to (eventually) boost
+ * @max: the maximum value the utilization can be boosted to
+ *
+ * A CPU running a task which has woken up after an IO operation can have its
+ * utilization boosted to speed up the completion of those IO operations.
+ * The IO boost value is increased each time a task wakes up from IO, in
+ * sugov_set_iowait_boost(), and it's instead decreased by this function,
+ * each time an increase has not been requested (!iowait_boost_pending).
+ *
+ * A CPU which appears to have been idle for at least one tick also has
+ * its IO boost utilization reset.
+ *
+ * This mechanism is designed to boost tasks which frequently wait on IO,
+ * while being more conservative on tasks doing only sporadic IO operations.
+ */
+static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
+ unsigned long *util, unsigned long *max)
{
unsigned int boost_util, boost_max;
+ s64 delta_ns;

+ /* No IOWait boost active */
if (!sg_cpu->iowait_boost)
return;

+ /* Clear boost if the CPU appears to have been idle enough */
+ delta_ns = time - sg_cpu->last_update;
+ if (delta_ns > TICK_NSEC) {
+ sg_cpu->iowait_boost = 0;
+ sg_cpu->iowait_boost_pending = false;
+ return;
+ }
+
+ /* An IO waiting task has just woken up, use the boost value */
if (sg_cpu->iowait_boost_pending) {
sg_cpu->iowait_boost_pending = false;
} else {
+ /* Reduce the boost value otherwise */
sg_cpu->iowait_boost >>= 1;
if (sg_cpu->iowait_boost < sg_cpu->sg_policy->policy->min) {
sg_cpu->iowait_boost = 0;
@@ -248,6 +302,10 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
boost_util = sg_cpu->iowait_boost;
boost_max = sg_cpu->iowait_boost_max;

+ /*
+ * A CPU is boosted only if its current utilization is smaller than
+ * the current IO boost level.
+ */
if (*util * boost_max < *max * boost_util) {
*util = boost_util;
*max = boost_max;
@@ -299,7 +357,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
sugov_get_util(sg_cpu);
max = sg_cpu->max;
util = sugov_aggregate_util(sg_cpu);
- sugov_iowait_boost(sg_cpu, &util, &max);
+ sugov_iowait_boost(sg_cpu, time, &util, &max);
next_f = get_next_freq(sg_policy, util, max);
/*
* Do not reduce the frequency if the CPU has not been idle
@@ -325,28 +383,12 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
for_each_cpu(j, policy->cpus) {
struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
unsigned long j_util, j_max;
- s64 delta_ns;

sugov_get_util(j_sg_cpu);
-
- /*
- * If the CFS CPU utilization was last updated before the
- * previous frequency update and the time elapsed between the
- * last update of the CPU utilization and the last frequency
- * update is long enough, reset iowait_boost and util_cfs, as
- * they are now probably stale. However, still consider the
- * CPU contribution if it has some DEADLINE utilization
- * (util_dl).
- */
- delta_ns = time - j_sg_cpu->last_update;
- if (delta_ns > TICK_NSEC) {
- j_sg_cpu->iowait_boost = 0;
- j_sg_cpu->iowait_boost_pending = false;
- }
-
j_max = j_sg_cpu->max;
j_util = sugov_aggregate_util(j_sg_cpu);
- sugov_iowait_boost(j_sg_cpu, &j_util, &j_max);
+ sugov_iowait_boost(j_sg_cpu, time, &j_util, &j_max);
+
if (j_util * max > j_max * util) {
util = j_util;
max = j_max;
--
2.15.1



2018-04-10 19:44:38

by Peter Zijlstra

Subject: Re: [PATCH v2] cpufreq/schedutil: Cleanup, document and fix iowait boost

On Tue, Apr 10, 2018 at 04:59:31PM +0100, Patrick Bellasi wrote:
> The iowait boosting code has been recently updated to add a progressive
> boosting behavior which allows to be less aggressive in boosting tasks
> doing only sporadic IO operations, thus being more energy efficient for
> example on mobile platforms.
>
> The current code is now however a bit convoluted. Some functionalities
> (e.g. iowait boost reset) are replicated in different paths and their
> documentation is slightly misaligned.

While your patch does seem to improve things, it still has duplicated
bits in it. E.g. the TICK_NSEC clearing exists in both functions.

> - sugov_set_iowait_boost: is now in charge only to set/increase the IO
> wait boost, every time a task wakes up from an IO wait.
>
> - sugov_iowait_boost: is now in charge to reset/reduce the IO wait
> boost, every time a sugov update is triggered, as well as
> to (eventually) enforce the currently required IO boost value.

I'm not sold on those function names; feels like we can do better,
although I'm struggling to come up with anything sensible just now.


>
> if (delta_ns > TICK_NSEC) {
> + sg_cpu->iowait_boost = iowait
> + ? sg_cpu->sg_policy->policy->min : 0;
> + sg_cpu->iowait_boost_pending = iowait;
> + return;
> }

> + if (delta_ns > TICK_NSEC) {
> + sg_cpu->iowait_boost = 0;
> + sg_cpu->iowait_boost_pending = false;
> + return;
> + }

Looks like something we can maybe put in a helper or something.

2018-04-11 04:40:52

by Viresh Kumar

Subject: Re: [PATCH v2] cpufreq/schedutil: Cleanup, document and fix iowait boost

On 10-04-18, 16:59, Patrick Bellasi wrote:
> The iowait boosting code has been recently updated to add a progressive
> boosting behavior which allows to be less aggressive in boosting tasks
> doing only sporadic IO operations, thus being more energy efficient for
> example on mobile platforms.
>
> The current code is now however a bit convoluted. Some functionalities
> (e.g. iowait boost reset) are replicated in different paths and their
> documentation is slightly misaligned.
>
> Moreover, from a functional stadpoint, the iowait boosting is also not
> always reset in systems where cpufreq policies are not shared, each CPU
> has his own policy. Indeed, when a CPU is idle for a long time we keep
> doubling the boosting instead of resetting it to the minimum frequency,
> as expected by the TICK_NSEC logic, whenever a task wakes up from IO.
>
> Let's cleanup the code by consolidating all the IO wait boosting related
> functionality inside the already existing functions and better define
> their role:
>
> - sugov_set_iowait_boost: is now in charge only to set/increase the IO
> wait boost, every time a task wakes up from an IO wait.
>
> - sugov_iowait_boost: is now in charge to reset/reduce the IO wait
> boost, every time a sugov update is triggered, as well as
> to (eventually) enforce the currently required IO boost value.
>
> This is possible since these two functions are already used one after
> the other, both in single and shared frequency domains, following the
> same template:
>
> /* Configure IO boost, if required */
> sugov_set_iowait_boost()
>
> /* Return here if freq change is in progress or throttled */
>
> /* Collect and aggregate utilization information */
> sugov_get_util()
> sugov_aggregate_util()
>
> /* Add IO boost if currently enabled */
> sugov_iowait_boost()
>
> As a extra bonus, let's also add the documentation for these two
> functions and better align the in-code documentation.
>
> Signed-off-by: Patrick Bellasi <[email protected]>
> Reported-by: Viresh Kumar <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> Cc: Steve Muckle <[email protected]>
> Cc: Juri Lelli <[email protected]>
> Cc: Dietmar Eggemann <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
>
> ---
> Changes in v2:
> - Fix return in sugov_iowait_boost()'s reset code (Viresh)
> - Add iowait boost reset for sugov_update_single() (Viresh)
> - Title changed to reflact the fix from previous point
>
> Based on today's tip/sched/core:
> b720342 sched/core: Update preempt_notifier_key to modern API
> ---
> kernel/sched/cpufreq_schedutil.c | 120 ++++++++++++++++++++++++++-------------
> 1 file changed, 81 insertions(+), 39 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 2b124811947d..2a2ae3a0e41f 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -51,7 +51,7 @@ struct sugov_cpu {
> bool iowait_boost_pending;
> unsigned int iowait_boost;
> unsigned int iowait_boost_max;
> - u64 last_update;
> + u64 last_update;
>
> /* The fields below are only needed when sharing a policy: */
> unsigned long util_cfs;
> @@ -201,43 +201,97 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> return min(util, sg_cpu->max);
> }
>
> -static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time, unsigned int flags)
> +/**
> + * sugov_set_iowait_boost updates the IO boost at each wakeup from IO.
> + * @sg_cpu: the sugov data for the CPU to boost
> + * @time: the update time from the caller
> + * @flags: SCHED_CPUFREQ_IOWAIT if the task is waking up after an IO wait
> + *
> + * Each time a task wakes up after an IO operation, the CPU utilization can be
> + * boosted to a certain utilization which is doubled at each wakeup
> + * from IO, starting from the utilization of the minimum OPP to that of the
> + * maximum one.
> + */
> +static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> + unsigned int flags)
> {
> - if (flags & SCHED_CPUFREQ_IOWAIT) {
> - if (sg_cpu->iowait_boost_pending)
> - return;
> -
> - sg_cpu->iowait_boost_pending = true;
> + bool iowait = flags & SCHED_CPUFREQ_IOWAIT;
>
> - if (sg_cpu->iowait_boost) {
> - sg_cpu->iowait_boost <<= 1;
> - if (sg_cpu->iowait_boost > sg_cpu->iowait_boost_max)
> - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> - } else {
> - sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->min;
> - }
> - } else if (sg_cpu->iowait_boost) {
> + /* Reset boost if the CPU appears to have been idle enough */
> + if (sg_cpu->iowait_boost) {
> s64 delta_ns = time - sg_cpu->last_update;
>
> - /* Clear iowait_boost if the CPU apprears to have been idle. */
> if (delta_ns > TICK_NSEC) {
> - sg_cpu->iowait_boost = 0;
> - sg_cpu->iowait_boost_pending = false;
> + sg_cpu->iowait_boost = iowait
> + ? sg_cpu->sg_policy->policy->min : 0;
> + sg_cpu->iowait_boost_pending = iowait;
> + return;
> }
> }
> +
> + /* Boost only tasks waking up after IO */
> + if (!iowait)
> + return;
> +
> + /* Ensure IO boost doubles only one time at each frequency increase */
> + if (sg_cpu->iowait_boost_pending)
> + return;
> + sg_cpu->iowait_boost_pending = true;
> +
> + /* Double the IO boost at each frequency increase */
> + if (sg_cpu->iowait_boost) {
> + sg_cpu->iowait_boost <<= 1;
> + if (sg_cpu->iowait_boost > sg_cpu->iowait_boost_max)
> + sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> + return;
> + }
> +
> + /* At first wakeup after IO, start with minimum boost */
> + sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->min;
> }

The above part should be a different patch with this:

Fixes: a5a0809bc58e ("cpufreq: schedutil: Make iowait boost more energy efficient")

> -static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
> - unsigned long *max)
> +/**
> + * sugov_iowait_boost boosts a CPU after a wakeup from IO.
> + * @sg_cpu: the sugov data for the cpu to boost
> + * @time: the update time from the caller
> + * @util: the utilization to (eventually) boost
> + * @max: the maximum value the utilization can be boosted to
> + *
> + * A CPU running a task which woken up after an IO operation can have its
> + * utilization boosted to speed up the completion of those IO operations.
> + * The IO boost value is increased each time a task wakes up from IO, in
> + * sugov_set_iowait_boost(), and it's instead decreased by this function,
> + * each time an increase has not been requested (!iowait_boost_pending).
> + *
> + * A CPU which also appears to have been idle for at least one tick has also
> + * its IO boost utilization reset.
> + *
> + * This mechanism is designed to boost high frequently IO waiting tasks, while
> + * being more conservative on tasks which does sporadic IO operations.
> + */
> +static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> + unsigned long *util, unsigned long *max)
> {
> unsigned int boost_util, boost_max;
> + s64 delta_ns;
>
> + /* No IOWait boost active */
> if (!sg_cpu->iowait_boost)
> return;
>
> + /* Clear boost if the CPU appears to have been idle enough */
> + delta_ns = time - sg_cpu->last_update;
> + if (delta_ns > TICK_NSEC) {
> + sg_cpu->iowait_boost = 0;
> + sg_cpu->iowait_boost_pending = false;
> + return;
> + }
> +
> + /* An IO waiting task has just woken up, use the boost value */
> if (sg_cpu->iowait_boost_pending) {
> sg_cpu->iowait_boost_pending = false;
> } else {
> + /* Reduce the boost value otherwise */
> sg_cpu->iowait_boost >>= 1;
> if (sg_cpu->iowait_boost < sg_cpu->sg_policy->policy->min) {
> sg_cpu->iowait_boost = 0;
> @@ -248,6 +302,10 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
> boost_util = sg_cpu->iowait_boost;
> boost_max = sg_cpu->iowait_boost_max;
>
> + /*
> + * A CPU is boosted only if its current utilization is smaller then
> + * the current IO boost level.
> + */
> if (*util * boost_max < *max * boost_util) {
> *util = boost_util;
> *max = boost_max;
> @@ -299,7 +357,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> sugov_get_util(sg_cpu);
> max = sg_cpu->max;
> util = sugov_aggregate_util(sg_cpu);
> - sugov_iowait_boost(sg_cpu, &util, &max);
> + sugov_iowait_boost(sg_cpu, time, &util, &max);
> next_f = get_next_freq(sg_policy, util, max);
> /*
> * Do not reduce the frequency if the CPU has not been idle
> @@ -325,28 +383,12 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> for_each_cpu(j, policy->cpus) {
> struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
> unsigned long j_util, j_max;
> - s64 delta_ns;
>
> sugov_get_util(j_sg_cpu);
> -
> - /*
> - * If the CFS CPU utilization was last updated before the
> - * previous frequency update and the time elapsed between the
> - * last update of the CPU utilization and the last frequency
> - * update is long enough, reset iowait_boost and util_cfs, as
> - * they are now probably stale. However, still consider the
> - * CPU contribution if it has some DEADLINE utilization
> - * (util_dl).
> - */
> - delta_ns = time - j_sg_cpu->last_update;
> - if (delta_ns > TICK_NSEC) {
> - j_sg_cpu->iowait_boost = 0;
> - j_sg_cpu->iowait_boost_pending = false;
> - }
> -
> j_max = j_sg_cpu->max;
> j_util = sugov_aggregate_util(j_sg_cpu);
> - sugov_iowait_boost(j_sg_cpu, &j_util, &j_max);
> + sugov_iowait_boost(j_sg_cpu, time, &j_util, &j_max);
> +
> if (j_util * max > j_max * util) {
> util = j_util;
> max = j_max;

And the rest is just code rearrangement. And as Peter said, we'd better
have a routine to clear the boost values when delta > TICK_NSEC.

Diff LGTM otherwise. Thanks.

--
viresh

2018-04-11 10:48:40

by Patrick Bellasi

Subject: Re: [PATCH v2] cpufreq/schedutil: Cleanup, document and fix iowait boost

On 10-Apr 21:37, Peter Zijlstra wrote:
> On Tue, Apr 10, 2018 at 04:59:31PM +0100, Patrick Bellasi wrote:
> > The iowait boosting code has been recently updated to add a progressive
> > boosting behavior which allows to be less aggressive in boosting tasks
> > doing only sporadic IO operations, thus being more energy efficient for
> > example on mobile platforms.
> >
> > The current code is now however a bit convoluted. Some functionalities
> > (e.g. iowait boost reset) are replicated in different paths and their
> > documentation is slightly misaligned.
>
> While your patch does seem to improve things, it still has duplicated
> bits in. Eg. the TICK_NSEC clearing exists in both functions.

Yes, that duplication was added in v2 since, as Viresh pointed out,
the iowait boost reset was still broken for IO wait tasks waking up on
a CPU that has been idle for more than TICK_NSEC... we should probably
move it to a separate patch.

> > - sugov_set_iowait_boost: is now in charge only to set/increase the IO
> > wait boost, every time a task wakes up from an IO wait.
> >
> > - sugov_iowait_boost: is now in charge to reset/reduce the IO wait
> > boost, every time a sugov update is triggered, as well as
> > to (eventually) enforce the currently required IO boost value.
>
> I'm not sold on those function names; feels like we can do better,
> although I'm struggling to come up with anything sensible just now.

What about something like:

sugov_iowait_init()
since here we are mainly initializing the iowait boost

sugov_iowait_boost()
since here we are mainly applying the proper boost to each cpu

Although they are not really so different...

> >
> > if (delta_ns > TICK_NSEC) {
> > + sg_cpu->iowait_boost = iowait
> > + ? sg_cpu->sg_policy->policy->min : 0;
> > + sg_cpu->iowait_boost_pending = iowait;
> > + return;
> > }
>
> > + if (delta_ns > TICK_NSEC) {
> > + sg_cpu->iowait_boost = 0;
> > + sg_cpu->iowait_boost_pending = false;
> > + return;
> > + }
>
> Looks like something we can maybe put in a helper or something.

... but we can also probably fold the two chunks above into something
like:

---8<---
static bool sugov_iowait_reset(struct sugov_cpu *sg_cpu, u64 time,
			       bool iowait_boost)
{
	s64 delta_ns = time - sg_cpu->last_update;

	if (delta_ns <= TICK_NSEC)
		return false;

	sg_cpu->iowait_boost = iowait_boost
			       ? sg_cpu->sg_policy->policy->min : 0;
	sg_cpu->iowait_boost_pending = iowait_boost;

	return true;
}
---8<---
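
Both call sites would then reduce to something like the following
sketch (reusing the helper above and the "iowait" flag from v2):

---8<---
	/* In sugov_set_iowait_boost(): */
	if (sg_cpu->iowait_boost &&
	    sugov_iowait_reset(sg_cpu, time, iowait))
		return;

	/* In sugov_iowait_boost(): */
	if (!sg_cpu->iowait_boost)
		return;
	if (sugov_iowait_reset(sg_cpu, time, false))
		return;
---8<---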

--
#include <best/regards.h>

Patrick Bellasi

2018-04-11 10:52:28

by Patrick Bellasi

Subject: Re: [PATCH v2] cpufreq/schedutil: Cleanup, document and fix iowait boost

On 11-Apr 10:07, Viresh Kumar wrote:
> On 10-04-18, 16:59, Patrick Bellasi wrote:
> > The iowait boosting code has been recently updated to add a progressive
> > boosting behavior which allows to be less aggressive in boosting tasks
> > doing only sporadic IO operations, thus being more energy efficient for
> > example on mobile platforms.
> >
> > The current code is now however a bit convoluted. Some functionalities
> > (e.g. iowait boost reset) are replicated in different paths and their
> > documentation is slightly misaligned.
> >
> > Moreover, from a functional stadpoint, the iowait boosting is also not
> > always reset in systems where cpufreq policies are not shared, each CPU
> > has his own policy. Indeed, when a CPU is idle for a long time we keep
> > doubling the boosting instead of resetting it to the minimum frequency,
> > as expected by the TICK_NSEC logic, whenever a task wakes up from IO.
> >
> > Let's cleanup the code by consolidating all the IO wait boosting related
> > functionality inside the already existing functions and better define
> > their role:
> >
> > - sugov_set_iowait_boost: is now in charge only to set/increase the IO
> > wait boost, every time a task wakes up from an IO wait.
> >
> > - sugov_iowait_boost: is now in charge to reset/reduce the IO wait
> > boost, every time a sugov update is triggered, as well as
> > to (eventually) enforce the currently required IO boost value.
> >
> > This is possible since these two functions are already used one after
> > the other, both in single and shared frequency domains, following the
> > same template:
> >
> > /* Configure IO boost, if required */
> > sugov_set_iowait_boost()
> >
> > /* Return here if freq change is in progress or throttled */
> >
> > /* Collect and aggregate utilization information */
> > sugov_get_util()
> > sugov_aggregate_util()
> >
> > /* Add IO boost if currently enabled */
> > sugov_iowait_boost()
> >
> > As a extra bonus, let's also add the documentation for these two
> > functions and better align the in-code documentation.
> >
> > Signed-off-by: Patrick Bellasi <[email protected]>
> > Reported-by: Viresh Kumar <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Rafael J. Wysocki <[email protected]>
> > Cc: Viresh Kumar <[email protected]>
> > Cc: Joel Fernandes <[email protected]>
> > Cc: Steve Muckle <[email protected]>
> > Cc: Juri Lelli <[email protected]>
> > Cc: Dietmar Eggemann <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> >
> > ---
> > Changes in v2:
> > - Fix return in sugov_iowait_boost()'s reset code (Viresh)
> > - Add iowait boost reset for sugov_update_single() (Viresh)
> > - Title changed to reflact the fix from previous point
> >
> > Based on today's tip/sched/core:
> > b720342 sched/core: Update preempt_notifier_key to modern API
> > ---
> > kernel/sched/cpufreq_schedutil.c | 120 ++++++++++++++++++++++++++-------------
> > 1 file changed, 81 insertions(+), 39 deletions(-)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 2b124811947d..2a2ae3a0e41f 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -51,7 +51,7 @@ struct sugov_cpu {
> > bool iowait_boost_pending;
> > unsigned int iowait_boost;
> > unsigned int iowait_boost_max;
> > - u64 last_update;
> > + u64 last_update;
> >
> > /* The fields below are only needed when sharing a policy: */
> > unsigned long util_cfs;
> > @@ -201,43 +201,97 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> > return min(util, sg_cpu->max);
> > }
> >
> > -static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time, unsigned int flags)
> > +/**
> > + * sugov_set_iowait_boost updates the IO boost at each wakeup from IO.
> > + * @sg_cpu: the sugov data for the CPU to boost
> > + * @time: the update time from the caller
> > + * @flags: SCHED_CPUFREQ_IOWAIT if the task is waking up after an IO wait
> > + *
> > + * Each time a task wakes up after an IO operation, the CPU utilization can be
> > + * boosted to a certain utilization which is doubled at each wakeup
> > + * from IO, starting from the utilization of the minimum OPP to that of the
> > + * maximum one.
> > + */
> > +static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> > + unsigned int flags)
> > {
> > - if (flags & SCHED_CPUFREQ_IOWAIT) {
> > - if (sg_cpu->iowait_boost_pending)
> > - return;
> > -
> > - sg_cpu->iowait_boost_pending = true;
> > + bool iowait = flags & SCHED_CPUFREQ_IOWAIT;
> >
> > - if (sg_cpu->iowait_boost) {
> > - sg_cpu->iowait_boost <<= 1;
> > - if (sg_cpu->iowait_boost > sg_cpu->iowait_boost_max)
> > - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> > - } else {
> > - sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->min;
> > - }
> > - } else if (sg_cpu->iowait_boost) {
> > + /* Reset boost if the CPU appears to have been idle enough */
> > + if (sg_cpu->iowait_boost) {
> > s64 delta_ns = time - sg_cpu->last_update;
> >
> > - /* Clear iowait_boost if the CPU apprears to have been idle. */
> > if (delta_ns > TICK_NSEC) {
> > - sg_cpu->iowait_boost = 0;
> > - sg_cpu->iowait_boost_pending = false;
> > + sg_cpu->iowait_boost = iowait
> > + ? sg_cpu->sg_policy->policy->min : 0;
> > + sg_cpu->iowait_boost_pending = iowait;
> > + return;
> > }
> > }
> > +
> > + /* Boost only tasks waking up after IO */
> > + if (!iowait)
> > + return;
> > +
> > + /* Ensure IO boost doubles only one time at each frequency increase */
> > + if (sg_cpu->iowait_boost_pending)
> > + return;
> > + sg_cpu->iowait_boost_pending = true;
> > +
> > + /* Double the IO boost at each frequency increase */
> > + if (sg_cpu->iowait_boost) {
> > + sg_cpu->iowait_boost <<= 1;
> > + if (sg_cpu->iowait_boost > sg_cpu->iowait_boost_max)
> > + sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> > + return;
> > + }
> > +
> > + /* At first wakeup after IO, start with minimum boost */
> > + sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->min;
> > }
>
> The above part should be a different patch with this:
>
> Fixes: a5a0809bc58e ("cpufreq: schedutil: Make iowait boost more energy efficient")

You mean to split out the fix for the iowait boost on per-CPU policies
into a separate patch?

> > -static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
> > - unsigned long *max)
> > +/**
> > + * sugov_iowait_boost boosts a CPU after a wakeup from IO.
> > + * @sg_cpu: the sugov data for the cpu to boost
> > + * @time: the update time from the caller
> > + * @util: the utilization to (eventually) boost
> > + * @max: the maximum value the utilization can be boosted to
> > + *
> > + * A CPU running a task which woken up after an IO operation can have its
> > + * utilization boosted to speed up the completion of those IO operations.
> > + * The IO boost value is increased each time a task wakes up from IO, in
> > + * sugov_set_iowait_boost(), and it's instead decreased by this function,
> > + * each time an increase has not been requested (!iowait_boost_pending).
> > + *
> > + * A CPU which also appears to have been idle for at least one tick has also
> > + * its IO boost utilization reset.
> > + *
> > + * This mechanism is designed to boost high frequently IO waiting tasks, while
> > + * being more conservative on tasks which does sporadic IO operations.
> > + */
> > +static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> > + unsigned long *util, unsigned long *max)
> > {
> > unsigned int boost_util, boost_max;
> > + s64 delta_ns;
> >
> > + /* No IOWait boost active */
> > if (!sg_cpu->iowait_boost)
> > return;
> >
> > + /* Clear boost if the CPU appears to have been idle enough */
> > + delta_ns = time - sg_cpu->last_update;
> > + if (delta_ns > TICK_NSEC) {
> > + sg_cpu->iowait_boost = 0;
> > + sg_cpu->iowait_boost_pending = false;
> > + return;
> > + }
> > +
> > + /* An IO waiting task has just woken up, use the boost value */
> > if (sg_cpu->iowait_boost_pending) {
> > sg_cpu->iowait_boost_pending = false;
> > } else {
> > + /* Reduce the boost value otherwise */
> > sg_cpu->iowait_boost >>= 1;
> > if (sg_cpu->iowait_boost < sg_cpu->sg_policy->policy->min) {
> > sg_cpu->iowait_boost = 0;
> > @@ -248,6 +302,10 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
> > boost_util = sg_cpu->iowait_boost;
> > boost_max = sg_cpu->iowait_boost_max;
> >
> > + /*
> > + * A CPU is boosted only if its current utilization is smaller then
> > + * the current IO boost level.
> > + */
> > if (*util * boost_max < *max * boost_util) {
> > *util = boost_util;
> > *max = boost_max;
> > @@ -299,7 +357,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> > sugov_get_util(sg_cpu);
> > max = sg_cpu->max;
> > util = sugov_aggregate_util(sg_cpu);
> > - sugov_iowait_boost(sg_cpu, &util, &max);
> > + sugov_iowait_boost(sg_cpu, time, &util, &max);
> > next_f = get_next_freq(sg_policy, util, max);
> > /*
> > * Do not reduce the frequency if the CPU has not been idle
> > @@ -325,28 +383,12 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> > for_each_cpu(j, policy->cpus) {
> > struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
> > unsigned long j_util, j_max;
> > - s64 delta_ns;
> >
> > sugov_get_util(j_sg_cpu);
> > -
> > - /*
> > - * If the CFS CPU utilization was last updated before the
> > - * previous frequency update and the time elapsed between the
> > - * last update of the CPU utilization and the last frequency
> > - * update is long enough, reset iowait_boost and util_cfs, as
> > - * they are now probably stale. However, still consider the
> > - * CPU contribution if it has some DEADLINE utilization
> > - * (util_dl).
> > - */
> > - delta_ns = time - j_sg_cpu->last_update;
> > - if (delta_ns > TICK_NSEC) {
> > - j_sg_cpu->iowait_boost = 0;
> > - j_sg_cpu->iowait_boost_pending = false;
> > - }
> > -
> > j_max = j_sg_cpu->max;
> > j_util = sugov_aggregate_util(j_sg_cpu);
> > - sugov_iowait_boost(j_sg_cpu, &j_util, &j_max);
> > + sugov_iowait_boost(j_sg_cpu, time, &j_util, &j_max);
> > +
> > if (j_util * max > j_max * util) {
> > util = j_util;
> > max = j_max;
>
> And the rest is just code rearrangement. And as Peter said, we better
> have a routine to clear boost values on delta > TICK_NSEC.

Right, I already commented on that in my reply to Peter. I'll split this
into two patches: one for the documentation and code reorganization, and
a second one for the fix to the issue you pointed out.

> Diff LGTM otherwise. Thanks.

Thanks

--
#include <best/regards.h>

Patrick Bellasi

2018-04-11 11:02:35

by Peter Zijlstra

Subject: Re: [PATCH v2] cpufreq/schedutil: Cleanup, document and fix iowait boost

On Wed, Apr 11, 2018 at 11:44:45AM +0100, Patrick Bellasi wrote:
> > > - sugov_set_iowait_boost: is now in charge only to set/increase the IO
> > > wait boost, every time a task wakes up from an IO wait.
> > >
> > > - sugov_iowait_boost: is now in charge to reset/reduce the IO wait
> > > boost, every time a sugov update is triggered, as well as
> > > to (eventually) enforce the currently required IO boost value.
> >
> > I'm not sold on those function names; feels like we can do better,
> > although I'm struggling to come up with anything sensible just now.
>
> What about something like:
>
> sugov_iowait_init()
> since here we are mainly initializing the iowait boost
>
> sugov_iowait_boost()
> since here we are mainly applying the proper boost to each cpu
>
> Although they are not really so different...

How about:

sugov_iowait_boost() -- does the actual impulse/boost
sugov_iowait_apply() -- applies the boost state

?

2018-04-11 14:44:35

by Patrick Bellasi

Subject: Re: [PATCH v2] cpufreq/schedutil: Cleanup, document and fix iowait boost

On 11-Apr 12:58, Peter Zijlstra wrote:
> On Wed, Apr 11, 2018 at 11:44:45AM +0100, Patrick Bellasi wrote:
> > > > - sugov_set_iowait_boost: is now in charge only to set/increase the IO
> > > > wait boost, every time a task wakes up from an IO wait.
> > > >
> > > > - sugov_iowait_boost: is now in charge to reset/reduce the IO wait
> > > > boost, every time a sugov update is triggered, as well as
> > > > to (eventually) enforce the currently required IO boost value.
> > >
> > > I'm not sold on those function names; feels like we can do better,
> > > although I'm struggling to come up with anything sensible just now.
> >
> > What about something like:
> >
> > sugov_iowait_init()
> > since here we are mainly initializing the iowait boost
> >
> > sugov_iowait_boost()
> > since here we are mainly applying the proper boost to each cpu
> >
> > Although they are not really so different...
>
> How about:
>
> sugov_iowait_boost() -- does the actual impulse/boost
> sugov_iowait_apply() -- applies the boost state
>
> ?

Would say it can work too, and it also allows adding a:

sugov_iowait_reset() -- resets the boost state after
                        TICK_NSEC of CPU idle time
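
Just to make it more concrete, the update paths would then keep the
same template as today, something like (sketch only, using the names
above):

	/* Boost (or reset) the IO wait status at each wakeup from IO */
	sugov_iowait_boost(sg_cpu, time, flags);
	sg_cpu->last_update = time;

	/* ... */

	/* Apply (or reset/decay) the boost before computing the next freq */
	sugov_iowait_apply(sg_cpu, time, &util, &max);

with sugov_iowait_reset() called internally by both to handle the
TICK_NSEC case.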

Viresh, Rafael, Joel: any preferences or other suggestions?

--
#include <best/regards.h>

Patrick Bellasi

2018-04-11 14:51:43

by Viresh Kumar

Subject: Re: [PATCH v2] cpufreq/schedutil: Cleanup, document and fix iowait boost

On 11-04-18, 11:48, Patrick Bellasi wrote:
> On 11-Apr 10:07, Viresh Kumar wrote:
> > On 10-04-18, 16:59, Patrick Bellasi wrote:

> > The above part should be a different patch with this:
> >
> > Fixes: a5a0809bc58e ("cpufreq: schedutil: Make iowait boost more energy efficient")
>
> You mean to split out on a separate patch the fix for the iowait boost
> on per-cpu policies?

Right.

> > And the rest is just code rearrangement. And as Peter said, we better
> > have a routine to clear boost values on delta > TICK_NSEC.
>
> Right, already commented in reply to Peter. I'll split in two patches:
> one for documentation and code re-organization and a second one for
> the fix to the issue you pointed out.

Thanks.

--
viresh

2018-04-11 14:59:21

by Viresh Kumar

Subject: Re: [PATCH v2] cpufreq/schedutil: Cleanup, document and fix iowait boost

On 11-04-18, 15:39, Patrick Bellasi wrote:
> On 11-Apr 12:58, Peter Zijlstra wrote:
> > On Wed, Apr 11, 2018 at 11:44:45AM +0100, Patrick Bellasi wrote:
> > > > > - sugov_set_iowait_boost: is now in charge only to set/increase the IO
> > > > > wait boost, every time a task wakes up from an IO wait.
> > > > >
> > > > > - sugov_iowait_boost: is now in charge to reset/reduce the IO wait
> > > > > boost, every time a sugov update is triggered, as well as
> > > > > to (eventually) enforce the currently required IO boost value.
> > > >
> > > > I'm not sold on those function names; feels like we can do better,
> > > > although I'm struggling to come up with anything sensible just now.
> > >
> > > What about something like:
> > >
> > > sugov_iowait_init()
> > > since here we are mainly initializing the iowait boost
> > >
> > > sugov_iowait_boost()
> > > since here we are mainly applying the proper boost to each cpu
> > >
> > > Although they are not really so different...
> >
> > How about:
> >
> > sugov_iowait_boost() -- does the actual impulse/boost
> > sugov_iowait_apply() -- applies the boost state
> >
> > ?
>
> Whould say it can work too, and it also allows to add a:
>
> sugov_iowait_reset() -- resets boots state after
> TICK_NSEC CPU idle time
>
> Viresh, Rafael, Joel: any preferences or other suggestions?

Looks like no matter how we rename it, someone will find it confusing
:)

--
viresh