2017-07-09 17:08:35

by Joel Fernandes

[permalink] [raw]
Subject: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

Currently the iowait_boost feature in schedutil makes the frequency go to max.
This feature was added to handle a case that Peter described where the
throughput of operations involving continuous I/O requests [1] is reduced due
to running at a lower frequency, however the lower throughput itself causes
utilization to be low and hence causing frequency to be low hence its "stuck".

Instead of going to max, its also possible to achieve the same effect by
ramping up to max if there are repeated in_iowait wakeups happening. This patch
is an attempt to do that. We start from a lower frequency (iowait_boost_min)
and double the boost for every consecutive iowait update until we reach the
maximum iowait boost frequency (iowait_boost_max).

I ran a synthetic test on an x86 machine with intel_pstate in passive mode
using schedutil. The patch achieves the desired effect as the existing
behavior. Also tested on ARM64 platform and see that there the transient iowait
requests aren't causing frequency spikes.

[1] https://patchwork.kernel.org/patch/9735885/

Cc: Srinivas Pandruvada <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Joel Fernandes <[email protected]>
---
Changes since v1:
- not using tunables to plainly turn off iowait boost anymore

kernel/sched/cpufreq_schedutil.c | 52 ++++++++++++++++++++++++++++++++++------
1 file changed, 45 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 622eed1b7658..4d9e8b96bed1 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -53,7 +53,9 @@ struct sugov_cpu {
struct update_util_data update_util;
struct sugov_policy *sg_policy;

+ bool prev_iowait_boost;
unsigned long iowait_boost;
+ unsigned long iowait_boost_min;
unsigned long iowait_boost_max;
u64 last_update;

@@ -168,22 +170,47 @@ static void sugov_get_util(unsigned long *util, unsigned long *max)
*max = cfs_max;
}

+static void sugov_decay_iowait_boost(struct sugov_cpu *sg_cpu)
+{
+ sg_cpu->iowait_boost >>= 1;
+
+ if (sg_cpu->iowait_boost < sg_cpu->iowait_boost_min)
+ sg_cpu->iowait_boost = 0;
+}
+
static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
unsigned int flags)
{
if (flags & SCHED_CPUFREQ_IOWAIT) {
- sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
+ /* Remember for next time that we did an iowait boost */
+ sg_cpu->prev_iowait_boost = true;
+ if (sg_cpu->iowait_boost) {
+ sg_cpu->iowait_boost <<= 1;
+ sg_cpu->iowait_boost = min(sg_cpu->iowait_boost,
+ sg_cpu->iowait_boost_max);
+ } else {
+ sg_cpu->iowait_boost = sg_cpu->iowait_boost_min;
+ }
} else if (sg_cpu->iowait_boost) {
s64 delta_ns = time - sg_cpu->last_update;

/* Clear iowait_boost if the CPU apprears to have been idle. */
if (delta_ns > TICK_NSEC)
sg_cpu->iowait_boost = 0;
+
+ /*
+ * Since we don't decay iowait_boost when its consumed during
+ * the previous SCHED_CPUFREQ_IOWAIT update, decay it now.
+ */
+ if (sg_cpu->prev_iowait_boost) {
+ sugov_decay_iowait_boost(sg_cpu);
+ sg_cpu->prev_iowait_boost = false;
+ }
}
}

static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
- unsigned long *max)
+ unsigned long *max, unsigned int flags)
{
unsigned long boost_util = sg_cpu->iowait_boost;
unsigned long boost_max = sg_cpu->iowait_boost_max;
@@ -195,7 +222,16 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
*util = boost_util;
*max = boost_max;
}
- sg_cpu->iowait_boost >>= 1;
+
+ /*
+ * Incase iowait boost just happened on this CPU, don't reduce it right
+ * away since then the iowait boost will never increase on subsequent
+ * in_iowait wakeups.
+ */
+ if (flags & SCHED_CPUFREQ_IOWAIT && this_cpu_ptr(&sugov_cpu) == sg_cpu)
+ return;
+
+ sugov_decay_iowait_boost(sg_cpu);
}

#ifdef CONFIG_NO_HZ_COMMON
@@ -233,7 +269,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
next_f = policy->cpuinfo.max_freq;
} else {
sugov_get_util(&util, &max);
- sugov_iowait_boost(sg_cpu, &util, &max);
+ sugov_iowait_boost(sg_cpu, &util, &max, flags);
next_f = get_next_freq(sg_policy, util, max);
/*
* Do not reduce the frequency if the CPU has not been idle
@@ -245,7 +281,8 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
sugov_update_commit(sg_policy, time, next_f);
}

-static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
+static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time,
+ unsigned int flags)
{
struct sugov_policy *sg_policy = sg_cpu->sg_policy;
struct cpufreq_policy *policy = sg_policy->policy;
@@ -279,7 +316,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
max = j_max;
}

- sugov_iowait_boost(j_sg_cpu, &util, &max);
+ sugov_iowait_boost(j_sg_cpu, &util, &max, flags);
}

return get_next_freq(sg_policy, util, max);
@@ -308,7 +345,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
if (flags & SCHED_CPUFREQ_RT_DL)
next_f = sg_policy->policy->cpuinfo.max_freq;
else
- next_f = sugov_next_freq_shared(sg_cpu, time);
+ next_f = sugov_next_freq_shared(sg_cpu, time, flags);

sugov_update_commit(sg_policy, time, next_f);
}
@@ -612,6 +649,7 @@ static int sugov_start(struct cpufreq_policy *policy)
memset(sg_cpu, 0, sizeof(*sg_cpu));
sg_cpu->sg_policy = sg_policy;
sg_cpu->flags = SCHED_CPUFREQ_RT;
+ sg_cpu->iowait_boost_min = policy->cpuinfo.min_freq;
sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
cpufreq_add_update_util_hook(cpu, &sg_cpu->update_util,
policy_is_shared(policy) ?
--
2.13.2.725.g09c95d1e9-goog


2017-07-10 10:55:06

by Juri Lelli

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

Hi Joel,

On 09/07/17 10:08, Joel Fernandes wrote:
> Currently the iowait_boost feature in schedutil makes the frequency go to max.
> This feature was added to handle a case that Peter described where the
> throughput of operations involving continuous I/O requests [1] is reduced due
> to running at a lower frequency, however the lower throughput itself causes
> utilization to be low and hence causing frequency to be low hence its "stuck".
>
> Instead of going to max, its also possible to achieve the same effect by
> ramping up to max if there are repeated in_iowait wakeups happening. This patch
> is an attempt to do that. We start from a lower frequency (iowait_boost_min)
> and double the boost for every consecutive iowait update until we reach the
> maximum iowait boost frequency (iowait_boost_max).
>
> I ran a synthetic test on an x86 machine with intel_pstate in passive mode
> using schedutil. The patch achieves the desired effect as the existing
> behavior. Also tested on ARM64 platform and see that there the transient iowait
> requests aren't causing frequency spikes.
>
> [1] https://patchwork.kernel.org/patch/9735885/
>
> Cc: Srinivas Pandruvada <[email protected]>
> Cc: Len Brown <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Suggested-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Joel Fernandes <[email protected]>
> ---
> Changes since v1:
> - not using tunables to plainly turn off iowait boost anymore
>
> kernel/sched/cpufreq_schedutil.c | 52 ++++++++++++++++++++++++++++++++++------
> 1 file changed, 45 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 622eed1b7658..4d9e8b96bed1 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -53,7 +53,9 @@ struct sugov_cpu {
> struct update_util_data update_util;
> struct sugov_policy *sg_policy;
>
> + bool prev_iowait_boost;
> unsigned long iowait_boost;
> + unsigned long iowait_boost_min;
> unsigned long iowait_boost_max;
> u64 last_update;
>
> @@ -168,22 +170,47 @@ static void sugov_get_util(unsigned long *util, unsigned long *max)
> *max = cfs_max;
> }
>
> +static void sugov_decay_iowait_boost(struct sugov_cpu *sg_cpu)
> +{
> + sg_cpu->iowait_boost >>= 1;
> +
> + if (sg_cpu->iowait_boost < sg_cpu->iowait_boost_min)
> + sg_cpu->iowait_boost = 0;
> +}
> +
> static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> unsigned int flags)
> {
> if (flags & SCHED_CPUFREQ_IOWAIT) {
> - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> + /* Remember for next time that we did an iowait boost */
> + sg_cpu->prev_iowait_boost = true;
> + if (sg_cpu->iowait_boost) {
> + sg_cpu->iowait_boost <<= 1;
> + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost,
> + sg_cpu->iowait_boost_max);
> + } else {
> + sg_cpu->iowait_boost = sg_cpu->iowait_boost_min;
> + }
> } else if (sg_cpu->iowait_boost) {
> s64 delta_ns = time - sg_cpu->last_update;
>
> /* Clear iowait_boost if the CPU apprears to have been idle. */
> if (delta_ns > TICK_NSEC)
> sg_cpu->iowait_boost = 0;

In this case we might just also want to reset prev_iowait_boost
unconditionally and return.

> +
> + /*
> + * Since we don't decay iowait_boost when its consumed during
> + * the previous SCHED_CPUFREQ_IOWAIT update, decay it now.
> + */

Code below seems pretty self-explaning to me. :)

> + if (sg_cpu->prev_iowait_boost) {
> + sugov_decay_iowait_boost(sg_cpu);
> + sg_cpu->prev_iowait_boost = false;
> + }
> }
> }
>
> static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
> - unsigned long *max)
> + unsigned long *max, unsigned int flags)
> {
> unsigned long boost_util = sg_cpu->iowait_boost;
> unsigned long boost_max = sg_cpu->iowait_boost_max;
> @@ -195,7 +222,16 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
> *util = boost_util;
> *max = boost_max;
> }
> - sg_cpu->iowait_boost >>= 1;
> +
> + /*
> + * Incase iowait boost just happened on this CPU, don't reduce it right
> + * away since then the iowait boost will never increase on subsequent
> + * in_iowait wakeups.
> + */
> + if (flags & SCHED_CPUFREQ_IOWAIT && this_cpu_ptr(&sugov_cpu) == sg_cpu)
> + return;
> +
> + sugov_decay_iowait_boost(sg_cpu);

Mmm, do we need to decay even when we are computing frequency requests
for a domain?

Considering it a per-cpu thing, isn't enough that it gets bumped up or
decayed only when a CPU does an update (by using the above from
sugov_update_shared)?

If we go this way I think we will only need to reset prev_iowait_boost
if delta_ns > TICK_NSEC during the for_each_cpu() loop of sugov_next_
freq_shared().

Thanks,

- Juri

2017-07-11 05:12:18

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

Hi Juri,

On Mon, Jul 10, 2017 at 3:55 AM, Juri Lelli <[email protected]> wrote:
> Hi Joel,
>
> On 09/07/17 10:08, Joel Fernandes wrote:
>> Currently the iowait_boost feature in schedutil makes the frequency go to max.
>> This feature was added to handle a case that Peter described where the
>> throughput of operations involving continuous I/O requests [1] is reduced due
>> to running at a lower frequency, however the lower throughput itself causes
>> utilization to be low and hence causing frequency to be low hence its "stuck".
>>
>> Instead of going to max, its also possible to achieve the same effect by
>> ramping up to max if there are repeated in_iowait wakeups happening. This patch
>> is an attempt to do that. We start from a lower frequency (iowait_boost_min)
>> and double the boost for every consecutive iowait update until we reach the
>> maximum iowait boost frequency (iowait_boost_max).
>>
>> I ran a synthetic test on an x86 machine with intel_pstate in passive mode
>> using schedutil. The patch achieves the desired effect as the existing
>> behavior. Also tested on ARM64 platform and see that there the transient iowait
>> requests aren't causing frequency spikes.
>>
>> [1] https://patchwork.kernel.org/patch/9735885/
>>
>> Cc: Srinivas Pandruvada <[email protected]>
>> Cc: Len Brown <[email protected]>
>> Cc: Rafael J. Wysocki <[email protected]>
>> Cc: Viresh Kumar <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Suggested-by: Peter Zijlstra <[email protected]>
>> Signed-off-by: Joel Fernandes <[email protected]>
[..]
>>
>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> index 622eed1b7658..4d9e8b96bed1 100644
>> --- a/kernel/sched/cpufreq_schedutil.c
>> +++ b/kernel/sched/cpufreq_schedutil.c
>> @@ -53,7 +53,9 @@ struct sugov_cpu {
>> struct update_util_data update_util;
>> struct sugov_policy *sg_policy;
>>
>> + bool prev_iowait_boost;
>> unsigned long iowait_boost;
>> + unsigned long iowait_boost_min;
>> unsigned long iowait_boost_max;
>> u64 last_update;
>>
>> @@ -168,22 +170,47 @@ static void sugov_get_util(unsigned long *util, unsigned long *max)
>> *max = cfs_max;
>> }
>>
>> +static void sugov_decay_iowait_boost(struct sugov_cpu *sg_cpu)
>> +{
>> + sg_cpu->iowait_boost >>= 1;
>> +
>> + if (sg_cpu->iowait_boost < sg_cpu->iowait_boost_min)
>> + sg_cpu->iowait_boost = 0;
>> +}
>> +
>> static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
>> unsigned int flags)
>> {
>> if (flags & SCHED_CPUFREQ_IOWAIT) {
>> - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
>> + /* Remember for next time that we did an iowait boost */
>> + sg_cpu->prev_iowait_boost = true;
>> + if (sg_cpu->iowait_boost) {
>> + sg_cpu->iowait_boost <<= 1;
>> + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost,
>> + sg_cpu->iowait_boost_max);
>> + } else {
>> + sg_cpu->iowait_boost = sg_cpu->iowait_boost_min;
>> + }
>> } else if (sg_cpu->iowait_boost) {
>> s64 delta_ns = time - sg_cpu->last_update;
>>
>> /* Clear iowait_boost if the CPU apprears to have been idle. */
>> if (delta_ns > TICK_NSEC)
>> sg_cpu->iowait_boost = 0;
>
> In this case we might just also want to reset prev_iowait_boost
> unconditionally and return.

Ok, will do.

>> +
>> + /*
>> + * Since we don't decay iowait_boost when its consumed during
>> + * the previous SCHED_CPUFREQ_IOWAIT update, decay it now.
>> + */
>
> Code below seems pretty self-explaning to me. :)
>

Ok :)

>> + if (sg_cpu->prev_iowait_boost) {
>> + sugov_decay_iowait_boost(sg_cpu);
>> + sg_cpu->prev_iowait_boost = false;
>> + }
>> }
>> }
>>
>> static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
>> - unsigned long *max)
>> + unsigned long *max, unsigned int flags)
>> {
>> unsigned long boost_util = sg_cpu->iowait_boost;
>> unsigned long boost_max = sg_cpu->iowait_boost_max;
>> @@ -195,7 +222,16 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
>> *util = boost_util;
>> *max = boost_max;
>> }
>> - sg_cpu->iowait_boost >>= 1;
>> +
>> + /*
>> + * Incase iowait boost just happened on this CPU, don't reduce it right
>> + * away since then the iowait boost will never increase on subsequent
>> + * in_iowait wakeups.
>> + */
>> + if (flags & SCHED_CPUFREQ_IOWAIT && this_cpu_ptr(&sugov_cpu) == sg_cpu)
>> + return;
>> +
>> + sugov_decay_iowait_boost(sg_cpu);
>
> Mmm, do we need to decay even when we are computing frequency requests
> for a domain?
>
> Considering it a per-cpu thing, isn't enough that it gets bumped up or
> decayed only when a CPU does an update (by using the above from
> sugov_update_shared)?
>
> If we go this way I think we will only need to reset prev_iowait_boost
> if delta_ns > TICK_NSEC during the for_each_cpu() loop of sugov_next_
> freq_shared().
>

Actually the "decay" was already being done before (without this
patch), I am just preserving the same old behavior where we do decay.
Perhaps your proposal can be a separate match? Or did I miss something
else subtle here?

Thanks,

-Joel

2017-07-11 07:14:07

by Juri Lelli

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

On 10/07/17 22:12, Joel Fernandes wrote:
> Hi Juri,
>
> On Mon, Jul 10, 2017 at 3:55 AM, Juri Lelli <[email protected]> wrote:
> > Hi Joel,
> >
> > On 09/07/17 10:08, Joel Fernandes wrote:

[...]

> >> static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
> >> - unsigned long *max)
> >> + unsigned long *max, unsigned int flags)
> >> {
> >> unsigned long boost_util = sg_cpu->iowait_boost;
> >> unsigned long boost_max = sg_cpu->iowait_boost_max;
> >> @@ -195,7 +222,16 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
> >> *util = boost_util;
> >> *max = boost_max;
> >> }
> >> - sg_cpu->iowait_boost >>= 1;
> >> +
> >> + /*
> >> + * Incase iowait boost just happened on this CPU, don't reduce it right
> >> + * away since then the iowait boost will never increase on subsequent
> >> + * in_iowait wakeups.
> >> + */
> >> + if (flags & SCHED_CPUFREQ_IOWAIT && this_cpu_ptr(&sugov_cpu) == sg_cpu)
> >> + return;
> >> +
> >> + sugov_decay_iowait_boost(sg_cpu);
> >
> > Mmm, do we need to decay even when we are computing frequency requests
> > for a domain?
> >
> > Considering it a per-cpu thing, isn't enough that it gets bumped up or
> > decayed only when a CPU does an update (by using the above from
> > sugov_update_shared)?
> >
> > If we go this way I think we will only need to reset prev_iowait_boost
> > if delta_ns > TICK_NSEC during the for_each_cpu() loop of sugov_next_
> > freq_shared().
> >
>
> Actually the "decay" was already being done before (without this
> patch), I am just preserving the same old behavior where we do decay.
> Perhaps your proposal can be a separate match? Or did I miss something
> else subtle here?
>

True, we are currently decaying anyway.

Looking again at this path made me however think if we really need to. I
guess we need currently, as we bump frenquency to max and then decay it.
But, with your changes, I was wondering if we can simplify the thing and
decay only on the per-CPU update path.

The other reason for trying to simplify this is that I don't
particularly like adding and consuming flags argument at this point, but
I guess we could refactor the code if this is really a problem.

Thanks,

- Juri

2017-07-11 10:14:38

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

On 09-07-17, 10:08, Joel Fernandes wrote:
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 622eed1b7658..4d9e8b96bed1 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -53,7 +53,9 @@ struct sugov_cpu {
> struct update_util_data update_util;
> struct sugov_policy *sg_policy;
>
> + bool prev_iowait_boost;
> unsigned long iowait_boost;
> + unsigned long iowait_boost_min;
> unsigned long iowait_boost_max;
> u64 last_update;
>
> @@ -168,22 +170,47 @@ static void sugov_get_util(unsigned long *util, unsigned long *max)
> *max = cfs_max;
> }
>
> +static void sugov_decay_iowait_boost(struct sugov_cpu *sg_cpu)
> +{
> + sg_cpu->iowait_boost >>= 1;
> +
> + if (sg_cpu->iowait_boost < sg_cpu->iowait_boost_min)
> + sg_cpu->iowait_boost = 0;
> +}
> +
> static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> unsigned int flags)
> {
> if (flags & SCHED_CPUFREQ_IOWAIT) {
> - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> + /* Remember for next time that we did an iowait boost */
> + sg_cpu->prev_iowait_boost = true;
> + if (sg_cpu->iowait_boost) {
> + sg_cpu->iowait_boost <<= 1;
> + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost,
> + sg_cpu->iowait_boost_max);
> + } else {
> + sg_cpu->iowait_boost = sg_cpu->iowait_boost_min;

I am not sure if boost should start from the min frequency, as the current
frequency will at least be equal to that. Which means that with no boost
initially, we will never increase the frequency for the first IOWAIT flag event.

> + }
> } else if (sg_cpu->iowait_boost) {
> s64 delta_ns = time - sg_cpu->last_update;
>
> /* Clear iowait_boost if the CPU apprears to have been idle. */
> if (delta_ns > TICK_NSEC)
> sg_cpu->iowait_boost = 0;
> +
> + /*
> + * Since we don't decay iowait_boost when its consumed during
> + * the previous SCHED_CPUFREQ_IOWAIT update, decay it now.
> + */
> + if (sg_cpu->prev_iowait_boost) {

SCHED_CPUFREQ_IOWAIT flag is set only by CFS from the enqueue_task() and in many
cases we call the util hook twice from the same enqueue_task() instance before
returning (2nd one after updating util). And in such cases we will set
iowait_boost as 0 on the second call.

Have you ever seen two consecutive calls to sugov_set_iowait_boost() with IOWAIT
flag set ? Can we get the ratio of that against the other case where we have
IOWAIT flag set in first call, followed by one or more non-IOWAIT calls and then
IOWAIT again ?

I am asking because if the calls with IOWAIT flag aren't consecutive then we
will make iowait_boost as 0 in the next non-IOWAIT call.

--
viresh

2017-07-11 11:39:56

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

On Tue, Jul 11, 2017 at 03:44:32PM +0530, Viresh Kumar wrote:
> On 09-07-17, 10:08, Joel Fernandes wrote:
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 622eed1b7658..4d9e8b96bed1 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -53,7 +53,9 @@ struct sugov_cpu {
> > struct update_util_data update_util;
> > struct sugov_policy *sg_policy;
> >
> > + bool prev_iowait_boost;
> > unsigned long iowait_boost;
> > + unsigned long iowait_boost_min;
> > unsigned long iowait_boost_max;
> > u64 last_update;
> >
> > @@ -168,22 +170,47 @@ static void sugov_get_util(unsigned long *util, unsigned long *max)
> > *max = cfs_max;
> > }
> >
> > +static void sugov_decay_iowait_boost(struct sugov_cpu *sg_cpu)
> > +{
> > + sg_cpu->iowait_boost >>= 1;
> > +
> > + if (sg_cpu->iowait_boost < sg_cpu->iowait_boost_min)
> > + sg_cpu->iowait_boost = 0;
> > +}
> > +
> > static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> > unsigned int flags)
> > {
> > if (flags & SCHED_CPUFREQ_IOWAIT) {
> > - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> > + /* Remember for next time that we did an iowait boost */
> > + sg_cpu->prev_iowait_boost = true;
> > + if (sg_cpu->iowait_boost) {
> > + sg_cpu->iowait_boost <<= 1;
> > + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost,
> > + sg_cpu->iowait_boost_max);
> > + } else {
> > + sg_cpu->iowait_boost = sg_cpu->iowait_boost_min;
>
> I am not sure if boost should start from the min frequency, as the current
> frequency will at least be equal to that. Which means that with no boost
> initially, we will never increase the frequency for the first IOWAIT flag event.

I suspect this actually works for Joel to get rid of the transient
spikes he was seeing. Starting at the current freq, as you suggest,
appears to make sense, but would add immediate transients back.

2017-07-11 11:40:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

On Sun, Jul 09, 2017 at 10:08:26AM -0700, Joel Fernandes wrote:
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -53,7 +53,9 @@ struct sugov_cpu {
> struct update_util_data update_util;
> struct sugov_policy *sg_policy;
>
> + bool prev_iowait_boost;
> unsigned long iowait_boost;


So I absolutely detest 'bool' in composite types, but I see there's
already some of that in this file :/

2017-07-11 14:14:16

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

Hi Viresh,

On Tue, Jul 11, 2017 at 3:14 AM, Viresh Kumar <[email protected]> wrote:
> On 09-07-17, 10:08, Joel Fernandes wrote:
>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> index 622eed1b7658..4d9e8b96bed1 100644
>> --- a/kernel/sched/cpufreq_schedutil.c
>> +++ b/kernel/sched/cpufreq_schedutil.c
>> @@ -53,7 +53,9 @@ struct sugov_cpu {
>> struct update_util_data update_util;
>> struct sugov_policy *sg_policy;
>>
>> + bool prev_iowait_boost;
>> unsigned long iowait_boost;
>> + unsigned long iowait_boost_min;
>> unsigned long iowait_boost_max;
>> u64 last_update;
>>
>> @@ -168,22 +170,47 @@ static void sugov_get_util(unsigned long *util, unsigned long *max)
>> *max = cfs_max;
>> }
>>
>> +static void sugov_decay_iowait_boost(struct sugov_cpu *sg_cpu)
>> +{
>> + sg_cpu->iowait_boost >>= 1;
>> +
>> + if (sg_cpu->iowait_boost < sg_cpu->iowait_boost_min)
>> + sg_cpu->iowait_boost = 0;
>> +}
>> +
>> static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
>> unsigned int flags)
>> {
>> if (flags & SCHED_CPUFREQ_IOWAIT) {
>> - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
>> + /* Remember for next time that we did an iowait boost */
>> + sg_cpu->prev_iowait_boost = true;
>> + if (sg_cpu->iowait_boost) {
>> + sg_cpu->iowait_boost <<= 1;
>> + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost,
>> + sg_cpu->iowait_boost_max);
>> + } else {
>> + sg_cpu->iowait_boost = sg_cpu->iowait_boost_min;
>
> I am not sure if boost should start from the min frequency, as the current
> frequency will at least be equal to that. Which means that with no boost
> initially, we will never increase the frequency for the first IOWAIT flag event.

I think the whole point of IOWAIT boost was to solve the issue with a
long sequence of repeated I/O requests as described in the commit
message. So IIUC there isn't a usecase for that (increase freq. on
first request). Also its just for the first couple of requests in my
testing and doesn't hurt the performance at all for the intended
usecase while still not causing transient spikes.

Another approach than setting min in sugov_set_iowait_boost, is, since
we have already retrieved the current util, we can check if flags ==
SCHED_CPUFREQ_IOWAIT, then set initial the iowait_boost such that
(iowait_boost / iowait_boost_max) is aleast equal to (util / max) or
iowait_boost_min, which ever is lower. This still will not increase
frequency on the first request, but will ensure the next one will
benefit. Then again I fear running into slightly longer-term
transients where 2 iowait requests are enough to boost the frequency
to high values. I can try to measure how often this can hurt power for
common usecases if you agree its worth exploring.

>
>> + }
>> } else if (sg_cpu->iowait_boost) {
>> s64 delta_ns = time - sg_cpu->last_update;
>>
>> /* Clear iowait_boost if the CPU apprears to have been idle. */
>> if (delta_ns > TICK_NSEC)
>> sg_cpu->iowait_boost = 0;
>> +
>> + /*
>> + * Since we don't decay iowait_boost when its consumed during
>> + * the previous SCHED_CPUFREQ_IOWAIT update, decay it now.
>> + */
>> + if (sg_cpu->prev_iowait_boost) {
>
> SCHED_CPUFREQ_IOWAIT flag is set only by CFS from the enqueue_task() and in many
> cases we call the util hook twice from the same enqueue_task() instance before
> returning (2nd one after updating util). And in such cases we will set
> iowait_boost as 0 on the second call.
>
> Have you ever seen two consecutive calls to sugov_set_iowait_boost() with IOWAIT
> flag set ? Can we get the ratio of that against the other case where we have
> IOWAIT flag set in first call, followed by one or more non-IOWAIT calls and then
> IOWAIT again ?
>
> I am asking because if the calls with IOWAIT flag aren't consecutive then we
> will make iowait_boost as 0 in the next non-IOWAIT call.

Yes, I've seen that happen in my testing (consecutive iowait). I
haven't seen the other case where you have IOWAIT followed by
non-IOWAIT for a repeated set of IOWAIT requests. Would you more
comfortable if we moved sugov_set_iowait_boost() after the
sugov_should_update_freq() ? That way if there are consecutive
requests in the same path, then it most likely rate-limiting will
prevent such updates. I will also try to collect some stats as you
suggested to see if how often if at all this can happen.

thanks,

-Joel

2017-07-11 14:18:49

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

On Tue, Jul 11, 2017 at 7:14 AM, Joel Fernandes <[email protected]> wrote:
[..]
>>> + }
>>> } else if (sg_cpu->iowait_boost) {
>>> s64 delta_ns = time - sg_cpu->last_update;
>>>
>>> /* Clear iowait_boost if the CPU apprears to have been idle. */
>>> if (delta_ns > TICK_NSEC)
>>> sg_cpu->iowait_boost = 0;
>>> +
>>> + /*
>>> + * Since we don't decay iowait_boost when its consumed during
>>> + * the previous SCHED_CPUFREQ_IOWAIT update, decay it now.
>>> + */
>>> + if (sg_cpu->prev_iowait_boost) {
>>
>> SCHED_CPUFREQ_IOWAIT flag is set only by CFS from the enqueue_task() and in many
>> cases we call the util hook twice from the same enqueue_task() instance before
>> returning (2nd one after updating util). And in such cases we will set
>> iowait_boost as 0 on the second call.
>>
>> Have you ever seen two consecutive calls to sugov_set_iowait_boost() with IOWAIT
>> flag set ? Can we get the ratio of that against the other case where we have
>> IOWAIT flag set in first call, followed by one or more non-IOWAIT calls and then
>> IOWAIT again ?
>>
>> I am asking because if the calls with IOWAIT flag aren't consecutive then we
>> will make iowait_boost as 0 in the next non-IOWAIT call.
>
> Yes, I've seen that happen in my testing (consecutive iowait). I
> haven't seen the other case where you have IOWAIT followed by
> non-IOWAIT for a repeated set of IOWAIT requests. Would you more
> comfortable if we moved sugov_set_iowait_boost() after the
> sugov_should_update_freq() ? That way if there are consecutive
> requests in the same path, then it most likely rate-limiting will
> prevent such updates. I will also try to collect some stats as you
> suggested to see if how often if at all this can happen.

Just to be more clear, I was saying that I've only seen repeated
IOWAIT requests in the update path, not IOWAIT followed by non-IOWAIT
cpufreq updates for a repeated sequence of IOWAIT wakeups.

thanks,

-Joel

2017-07-11 14:33:33

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

Hi Juri,

On Tue, Jul 11, 2017 at 12:14 AM, Juri Lelli <[email protected]> wrote:
[..]
>> > Considering it a per-cpu thing, isn't enough that it gets bumped up or
>> > decayed only when a CPU does an update (by using the above from
>> > sugov_update_shared)?
>> >
>> > If we go this way I think we will only need to reset prev_iowait_boost
>> > if delta_ns > TICK_NSEC during the for_each_cpu() loop of sugov_next_
>> > freq_shared().
>> >
>>
>> Actually the "decay" was already being done before (without this
>> patch), I am just preserving the same old behavior where we do decay.
>> Perhaps your proposal can be a separate match? Or did I miss something
>> else subtle here?
>>
>
> True, we are currently decaying anyway.
>
> Looking again at this path made me however think if we really need to. I
> guess we need currently, as we bump frenquency to max and then decay it.
> But, with your changes, I was wondering if we can simplify the thing and
> decay only on the per-CPU update path.

Yes that makes sense to me, but even if we're at max, we should still
benefit from your idea right? (I didn't follow why being at min makes
the idea better, I think its a good idea in both cases (whether we are
boosting to max and ramping down or starting from the min) but let me
know if I missed something)

If I understand correctly what you're proposing:
1. Remove the decay from sugov_iowait_boost
2. Add the iowait_boost decay unconditionally to
sugov_set_iowait_boost for the !SCHED_CPUFREQ_IOWAIT case.

That would also get rid of the prev_iowait_boost flag and simplify
things, and also address Peter's concern of adding 'bool' in composite
types :)

thanks,

-Joel

2017-07-11 14:38:37

by Juri Lelli

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

On 11/07/17 07:33, Joel Fernandes wrote:
> Hi Juri,
>
> On Tue, Jul 11, 2017 at 12:14 AM, Juri Lelli <[email protected]> wrote:
> [..]
> >> > Considering it a per-cpu thing, isn't enough that it gets bumped up or
> >> > decayed only when a CPU does an update (by using the above from
> >> > sugov_update_shared)?
> >> >
> >> > If we go this way I think we will only need to reset prev_iowait_boost
> >> > if delta_ns > TICK_NSEC during the for_each_cpu() loop of sugov_next_
> >> > freq_shared().
> >> >
> >>
> >> Actually the "decay" was already being done before (without this
> >> patch), I am just preserving the same old behavior where we do decay.
> >> Perhaps your proposal can be a separate match? Or did I miss something
> >> else subtle here?
> >>
> >
> > True, we are currently decaying anyway.
> >
> > Looking again at this path made me however think if we really need to. I
> > guess we need currently, as we bump frenquency to max and then decay it.
> > But, with your changes, I was wondering if we can simplify the thing and
> > decay only on the per-CPU update path.
>
> Yes that makes sense to me, but even if we're at max, we should still
> benefit from your idea right? (I didn't follow why being at min makes
> the idea better, I think its a good idea in both cases (whether we are
> boosting to max and ramping down or starting from the min) but let me
> know if I missed something)
>

Not better, but I thought that if we start from max and decay we
probably need to have more chances to decay faster (so decaying also
when aggregating request for a domain is probably needed)?

> If I understand correctly what you're proposing:
> 1. Remove the decay from sugov_iowait_boost
> 2. Add the iowait_boost decay unconditionally to
> sugov_set_iowait_boost for the !SCHED_CPUFREQ_IOWAIT case.
>
> That would also get rid of the prev_iowait_boost flag and simplify
> things, and also address Peter's concern of adding 'bool' in composite
> types :)
>

OK, just a suggestion however, didn't play with the idea myself, it
might not work. :)

Thanks,

- Juri

2017-07-12 05:00:41

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

On 11-07-17, 07:14, Joel Fernandes wrote:
> I think the whole point of IOWAIT boost was to solve the issue with a
> long sequence of repeated I/O requests as described in the commit
> message. So IIUC there isn't a usecase for that (increase freq. on
> first request).

Right. So we can take example that Peter gave earlier. Task runs .1 ms and waits
for IO for 1 ms (at max speed). But there is high possibility that the util
update handler gets called within that 1 ms (from non-enqueue paths) and because
you chose to reduce iowait boost from sugov_set_iowait_boost() in your commit,
we can easily end up ignoring iowait boosting.

> Also its just for the first couple of requests in my
> testing and doesn't hurt the performance at all for the intended
> usecase while still not causing transient spikes.

We can have bad enough timing where the util handler gets called right in that 1
ms of IOWAIT period and we will never boost.

> Another approach than setting min in sugov_set_iowait_boost, is, since
> we have already retrieved the current util, we can check if flags ==
> SCHED_CPUFREQ_IOWAIT, then set initial the iowait_boost such that
> (iowait_boost / iowait_boost_max) is aleast equal to (util / max) or
> iowait_boost_min, which ever is lower.

So my concerns weren't only about the initial min value, but also that you
reduce the freq from sugov_set_iowait_boost(). We can discuss what the ideal
value to start with can be.

> This still will not increase
> frequency on the first request, but will ensure the next one will
> benefit.

If there is no non-enqueue path request lands.

> Yes, I've seen that happen in my testing (consecutive iowait).

The CFS scheduler can send a util update request every 1 ms for util updates and
I am not sure why isn't that happening in your case.

How much is the time between two consecutive IOWAIT requests in your case ?
Maybe it is too less (Ofcourse it isn't in your control :). But if we take
Peter's example, then it will surely have a non-enqueue path request between two
IOWAIT requests.

> I
> haven't seen the other case where you have IOWAIT followed by
> non-IOWAIT for a repeated set of IOWAIT requests. Would you more
> comfortable if we moved sugov_set_iowait_boost() after the
> sugov_should_update_freq() ?

That may make us ignore all IOWAIT requests that happen between rate_limit_us
time. And that would be bad IMHO.

> That way if there are consecutive
> requests in the same path, then it most likely rate-limiting will
> prevent such updates. I will also try to collect some stats as you
> suggested to see if how often if at all this can happen.

Completely untested, but what about something like this ? This should get rid of
the spikes you were getting.

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 076a2e31951c..3459f327c94e 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -53,6 +53,7 @@ struct sugov_cpu {
struct update_util_data update_util;
struct sugov_policy *sg_policy;

+ bool iowait_boost_pending;
unsigned long iowait_boost;
unsigned long iowait_boost_max;
u64 last_update;
@@ -169,7 +170,12 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
unsigned int flags)
{
if (flags & SCHED_CPUFREQ_IOWAIT) {
- sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
+ sg_cpu->iowait_boost_pending = true;
+
+ if (!sg_cpu->iowait_boost) {
+ sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->cur;
+ sg_cpu->iowait_boost >>= 1;
+ }
} else if (sg_cpu->iowait_boost) {
s64 delta_ns = time - sg_cpu->last_update;

@@ -182,17 +188,26 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
unsigned long *max)
{
- unsigned long boost_util = sg_cpu->iowait_boost;
- unsigned long boost_max = sg_cpu->iowait_boost_max;
+ unsigned long boost_util, boost_max;

- if (!boost_util)
+ if (!sg_cpu->iowait_boost)
return;

+ if (sg_cpu->iowait_boost_pending) {
+ sg_cpu->iowait_boost_pending = false;
+ sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
+ sg_cpu->iowait_boost_max);
+ } else {
+ sg_cpu->iowait_boost >>= 1;
+ }
+
+ boost_util = sg_cpu->iowait_boost;
+ boost_max = sg_cpu->iowait_boost_max;
+
if (*util * boost_max < *max * boost_util) {
*util = boost_util;
*max = boost_max;
}
- sg_cpu->iowait_boost >>= 1;
}

#ifdef CONFIG_NO_HZ_COMMON

--
viresh

2017-07-12 09:36:49

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

On Wed, Jul 12, 2017 at 10:30:35AM +0530, Viresh Kumar wrote:
> On 11-07-17, 07:14, Joel Fernandes wrote:

> > Another approach than setting min in sugov_set_iowait_boost, is, since
> > we have already retrieved the current util, we can check if flags ==
> > SCHED_CPUFREQ_IOWAIT, then set initial the iowait_boost such that
> > (iowait_boost / iowait_boost_max) is aleast equal to (util / max) or
> > iowait_boost_min, which ever is lower.
>
> So my concerns weren't only about the initial min value, but also that you
> reduce the freq from sugov_set_iowait_boost(). We can discuss what the ideal
> value to start with can be.

I'm not sure I see that. He only mucks with iowait_boost, not the actual
frequency afaict.

And sugov_iowait_boost() picks the highest of util vs iowait_boost,
which wasn't changed.

Or am I completely missing something? (that code is a bit hard to
follow)

2017-07-12 09:46:38

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

On 12-07-17, 11:36, Peter Zijlstra wrote:
> On Wed, Jul 12, 2017 at 10:30:35AM +0530, Viresh Kumar wrote:
> > On 11-07-17, 07:14, Joel Fernandes wrote:
>
> > > Another approach than setting min in sugov_set_iowait_boost, is, since
> > > we have already retrieved the current util, we can check if flags ==
> > > SCHED_CPUFREQ_IOWAIT, then set initial the iowait_boost such that
> > > (iowait_boost / iowait_boost_max) is aleast equal to (util / max) or
> > > iowait_boost_min, which ever is lower.
> >
> > So my concerns weren't only about the initial min value, but also that you
> > reduce the freq from sugov_set_iowait_boost(). We can discuss what the ideal
> > value to start with can be.
>
> I'm not sure I see that. He only mucks with iowait_boost, not the actual
> frequency afaict.
>
> And sugov_iowait_boost() picks the highest of util vs iowait_boost,
> which wasn't changed.
>
> Or am I completely missing something? (that code is a bit hard to
> follow)

No, I wasn't clear enough. Sorry about that. Lemme try again:

Suppose min freq is 500 MHz and Max is 2 GHz. The iowait-boost is
set to 1 GHz right now (because of previous events with IOWAIT flag
set), and sugov_set_iowait_boost() gets called again with IOWAIT flag,
we boost the iowait-boost value to 2 GHz. We are in the rate_limit_us
window right now, we return without changing the frequency.

If the next call into the schedutil governor happens due to normal
util-update, flags will be passed as 0. With the current patch, we
will bring iowait-boost back to 1 GHz (before updating the real
frequency to 2 GHz) as the prev-iowait-boost boolean would be set.

And even if the task is periodically getting queued after IOWAIT,
actual boosting may not happen at all in some cases.

--
viresh

2017-07-12 13:28:33

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

On Wed, Jul 12, 2017 at 03:16:23PM +0530, Viresh Kumar wrote:

> No, I wasn't clear enough. Sorry about that. Lemme try again:
>
> Suppose min freq is 500 MHz and Max is 2 GHz. The iowait-boost is
> set to 1 GHz right now (because of previous events with IOWAIT flag
> set), and sugov_set_iowait_boost() gets called again with IOWAIT flag,
> we boost the iowait-boost value to 2 GHz. We are in the rate_limit_us
> window right now, we return without changing the frequency.
>
> If the next call into the schedutil governor happens due to normal
> util-update, flags will be passed as 0. With the current patch, we
> will bring iowait-boost back to 1 GHz (before updating the real
> frequency to 2 GHz) as the prev-iowait-boost boolean would be set.
>
> And even if the task is periodically getting queued after IOWAIT,
> actual boosting may not happen at all in some cases.

Hmm, so you're worried about that ratelimit stuff? Shouldn't we fix that
independently -- IIRC Rafael proposed a max-filter over the window.

2017-07-13 02:02:56

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

Hi Viresh,

On Tue, Jul 11, 2017 at 10:00 PM, Viresh Kumar <[email protected]> wrote:
[..]
>> Another approach than setting min in sugov_set_iowait_boost, is, since
>> we have already retrieved the current util, we can check if flags ==
>> SCHED_CPUFREQ_IOWAIT, then set initial the iowait_boost such that
>> (iowait_boost / iowait_boost_max) is aleast equal to (util / max) or
>> iowait_boost_min, which ever is lower.
>
> So my concerns weren't only about the initial min value, but also that you
> reduce the freq from sugov_set_iowait_boost(). We can discuss what the ideal
> value to start with can be.
>
>> This still will not increase
>> frequency on the first request, but will ensure the next one will
>> benefit.
>
> If there is no non-enqueue path request lands.
>
>> Yes, I've seen that happen in my testing (consecutive iowait).
>
> The CFS scheduler can send a util update request every 1 ms for util updates and
> I am not sure why isn't that happening in your case.
>
> How much is the time between two consecutive IOWAIT requests in your case ?
> Maybe it is too less (Ofcourse it isn't in your control :). But if we take
> Peter's example, then it will surely have a non-enqueue path request between two
> IOWAIT requests.

Hmm, Ok. I can try to do some measurements about consecutive calls
soon and let you know how often it happens. I also noticed its
possible to call twice in the enqueue path itself as well. It probably
affect my patch more because of starting from min than max.

>> I
>> haven't seen the other case where you have IOWAIT followed by
>> non-IOWAIT for a repeated set of IOWAIT requests. Would you more
>> comfortable if we moved sugov_set_iowait_boost() after the
>> sugov_should_update_freq() ?
>
> That may make us ignore all IOWAIT requests that happen between rate_limit_us
> time. And that would be bad IMHO.

True..

>> That way if there are consecutive
>> requests in the same path, then it most likely rate-limiting will
>> prevent such updates. I will also try to collect some stats as you
>> suggested to see if how often if at all this can happen.
>
> Completely untested, but what about something like this ? This should get rid of
> the spikes you were getting.

I actually like your approach better of consuming the flag first, but
computing the boost later when its used. Its also more immune to the
problem you described. Some comments below:

>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 076a2e31951c..3459f327c94e 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -53,6 +53,7 @@ struct sugov_cpu {
> struct update_util_data update_util;
> struct sugov_policy *sg_policy;
>
> + bool iowait_boost_pending;
> unsigned long iowait_boost;
> unsigned long iowait_boost_max;
> u64 last_update;
> @@ -169,7 +170,12 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> unsigned int flags)
> {
> if (flags & SCHED_CPUFREQ_IOWAIT) {
> - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> + sg_cpu->iowait_boost_pending = true;
> +
> + if (!sg_cpu->iowait_boost) {
> + sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->cur;
> + sg_cpu->iowait_boost >>= 1;
> + }

Hmm, this doesn't look right to me.. why are we decaying in this path?
I think nothing else other than setting the pending flag and the
initial iowait_boost value is needed here.

Also I feel this is more "spike" prone than setting it initially to
min. As Peter was saying, since we apply the boost only if it
increases the frequency and not decreases, starting from min should at
worst just result in ignoring of the boost the first time.

> } else if (sg_cpu->iowait_boost) {
> s64 delta_ns = time - sg_cpu->last_update;
>
> @@ -182,17 +188,26 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
> unsigned long *max)
> {
> - unsigned long boost_util = sg_cpu->iowait_boost;
> - unsigned long boost_max = sg_cpu->iowait_boost_max;
> + unsigned long boost_util, boost_max;
>
> - if (!boost_util)
> + if (!sg_cpu->iowait_boost)
> return;
>
> + if (sg_cpu->iowait_boost_pending) {
> + sg_cpu->iowait_boost_pending = false;
> + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
> + sg_cpu->iowait_boost_max);
> + } else {
> + sg_cpu->iowait_boost >>= 1;
> + }

And then this path will do the decay correctly when the boost is applied.

thanks,

-Joel

2017-07-13 03:35:36

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

On 12-07-17, 19:02, Joel Fernandes wrote:
> On Tue, Jul 11, 2017 at 10:00 PM, Viresh Kumar <[email protected]> wrote:

> Hmm, Ok. I can try to do some measurements about consecutive calls
> soon and let you know how often it happens. I also noticed its
> possible to call twice in the enqueue path itself as well.

Yeah, I think I told you that in previous replies.

> It probably
> affect my patch more because of starting from min than max.

Yeah, it will.

> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 076a2e31951c..3459f327c94e 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -53,6 +53,7 @@ struct sugov_cpu {
> > struct update_util_data update_util;
> > struct sugov_policy *sg_policy;
> >
> > + bool iowait_boost_pending;
> > unsigned long iowait_boost;
> > unsigned long iowait_boost_max;
> > u64 last_update;
> > @@ -169,7 +170,12 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> > unsigned int flags)
> > {
> > if (flags & SCHED_CPUFREQ_IOWAIT) {
> > - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> > + sg_cpu->iowait_boost_pending = true;
> > +
> > + if (!sg_cpu->iowait_boost) {
> > + sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->cur;
> > + sg_cpu->iowait_boost >>= 1;
> > + }
>
> Hmm, this doesn't look right to me.. why are we decaying in this path?

I am convinced that we need a comment here as what I did wasn't
straight forward :)

The idea: We wouldn't increase the frequency for the first event with
IOWAIT flag set, but on every subsequent event (captured over
rate-limit-us window). You may have noticed that I am updating the
boost values in sugov_iowait_boost() a bit earlier now and they will
affect the current frequency update as well.

Because I wanted to do a 2X there unconditionally if
iowait_boost_pending is set, I had to make it half for the very first
event with IOWAIT flag set.

End result:
- First event, we stay at current freq.
- Second and all consecutive events every rate_limit_us time, 2X
- If there is no IOWAIT event in last rate_limit_us, X/2

Makes sense ?

> I think nothing else other than setting the pending flag and the
> initial iowait_boost value is needed here.
>
> Also I feel this is more "spike" prone than setting it initially to
> min. As Peter was saying, since we apply the boost only if it
> increases the frequency and not decreases, starting from min should at
> worst just result in ignoring of the boost the first time.

Yeah, we can discuss on what should be the default value to start with
and I would be fine with "min" if Rafael is, as he proposed the iowait
thing to begin with after seeing some real issues on hardware.

> > } else if (sg_cpu->iowait_boost) {
> > s64 delta_ns = time - sg_cpu->last_update;
> >
> > @@ -182,17 +188,26 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> > static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
> > unsigned long *max)
> > {
> > - unsigned long boost_util = sg_cpu->iowait_boost;
> > - unsigned long boost_max = sg_cpu->iowait_boost_max;
> > + unsigned long boost_util, boost_max;
> >
> > - if (!boost_util)
> > + if (!sg_cpu->iowait_boost)
> > return;
> >
> > + if (sg_cpu->iowait_boost_pending) {
> > + sg_cpu->iowait_boost_pending = false;
> > + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,

I was talking about this unconditional 2X earlier.

> > + sg_cpu->iowait_boost_max);
> > + } else {
> > + sg_cpu->iowait_boost >>= 1;
> > + }
>
> And then this path will do the decay correctly when the boost is applied.

Yeah, the else part should do good.

--
viresh

2017-07-13 03:42:23

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

On 12-07-17, 15:28, Peter Zijlstra wrote:
> Hmm, so you're worried about that ratelimit stuff?

Yeah, somewhat.

> Shouldn't we fix that
> independently -- IIRC Rafael proposed a max-filter over the window.

I had some doubts about that idea in general and shared them earlier
with you:

https://marc.info/?l=linux-kernel&m=149683722822537&w=2

But anyway, I think Joel is somewhat convinced to adapt to the
solution I gave and that wouldn't have any of the problems I shared.
All good now :)

--
viresh

2017-07-13 03:53:02

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

On Wed, Jul 12, 2017 at 8:35 PM, Viresh Kumar <[email protected]> wrote:
[..]
>
>> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> > index 076a2e31951c..3459f327c94e 100644
>> > --- a/kernel/sched/cpufreq_schedutil.c
>> > +++ b/kernel/sched/cpufreq_schedutil.c
>> > @@ -53,6 +53,7 @@ struct sugov_cpu {
>> > struct update_util_data update_util;
>> > struct sugov_policy *sg_policy;
>> >
>> > + bool iowait_boost_pending;
>> > unsigned long iowait_boost;
>> > unsigned long iowait_boost_max;
>> > u64 last_update;
>> > @@ -169,7 +170,12 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
>> > unsigned int flags)
>> > {
>> > if (flags & SCHED_CPUFREQ_IOWAIT) {
>> > - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
>> > + sg_cpu->iowait_boost_pending = true;
>> > +
>> > + if (!sg_cpu->iowait_boost) {
>> > + sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->cur;
>> > + sg_cpu->iowait_boost >>= 1;
>> > + }
>>
>> Hmm, this doesn't look right to me.. why are we decaying in this path?
>
> I am convinced that we need a comment here as what I did wasn't
> straight forward :)
>
> The idea: We wouldn't increase the frequency for the first event with
> IOWAIT flag set, but on every subsequent event (captured over
> rate-limit-us window). You may have noticed that I am updating the
> boost values in sugov_iowait_boost() a bit earlier now and they will
> affect the current frequency update as well.
>
> Because I wanted to do a 2X there unconditionally if
> iowait_boost_pending is set, I had to make it half for the very first
> event with IOWAIT flag set.
>
> End result:
> - First event, we stay at current freq.
> - Second and all consecutive events every rate_limit_us time, 2X
> - If there is no IOWAIT event in last rate_limit_us, X/2
>
> Makes sense ?
>

Yes, that makes sense, its a bit subtle but I get what you're doing
now and I agree with it. Its also cleaner than my original patch :-)
and yeah definitely needs a comment ;-)

thanks,

-Joel

2017-07-13 04:14:47

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

On 12-07-17, 20:52, Joel Fernandes wrote:
> Yes, that makes sense, its a bit subtle but I get what you're doing
> now and I agree with it. Its also cleaner than my original patch :-)
> and yeah definitely needs a comment ;-)

And I have full trust on you to include that comment in you V5 :)

--
viresh

2017-07-13 04:16:34

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

On Wed, Jul 12, 2017 at 9:14 PM, Viresh Kumar <[email protected]> wrote:
> On 12-07-17, 20:52, Joel Fernandes wrote:
>> Yes, that makes sense, its a bit subtle but I get what you're doing
>> now and I agree with it. Its also cleaner than my original patch :-)
>> and yeah definitely needs a comment ;-)
>
> And I have full trust on you to include that comment in you V5 :)

You bet :)


thanks,

-Joel