Currently the iowait_boost feature in schedutil makes the frequency go to max
on iowait wakeups. This feature was added to handle a case that Peter
described where the throughput of operations involving continuous I/O requests
[1] is reduced due to running at a lower frequency, however the lower
throughput itself causes utilization to be low and hence causing frequency to
be low hence its "stuck".
Instead of going to max, its also possible to achieve the same effect by
ramping up to max if there are repeated in_iowait wakeups happening. This patch
is an attempt to do that. We start from a lower frequency (policy->mind)
and double the boost for every consecutive iowait update until we reach the
maximum iowait boost frequency (iowait_boost_max).
I ran a synthetic test (continuous O_DIRECT writes in a loop) on an x86 machine
with intel_pstate in passive mode using schedutil. In this test the iowait_boost
value ramped from 800MHz to 4GHz in 60ms. The patch achieves the desired improved
throughput as the existing behavior.
Also while at it, make iowait_boost and iowait_boost_max as unsigned int since
its unit is kHz and this is consistent with struct cpufreq_policy.
[1] https://patchwork.kernel.org/patch/9735885/
Cc: Srinivas Pandruvada <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Joel Fernandes <[email protected]>
---
This version is based on some ideas from Viresh and Juri in v4. Viresh, one
difference between the idea we just discussed is, I am scaling up/down the
boost only after consuming it. This has the effect of slightly delaying the
"deboost" but achieves the same boost ramp time. Its more cleaner in the code
IMO to avoid the scaling up and then down on the initial boost. Note that I
also dropped iowait_boost_min and now I'm just starting the initial boost from
policy->min since as I mentioned in the commit above, the ramp of the
iowait_boost value is very quick and for the usecase its intended for, it works
fine. Hope this is acceptable. Thanks.
kernel/sched/cpufreq_schedutil.c | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 622eed1b7658..4225bbada88d 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -53,8 +53,9 @@ struct sugov_cpu {
struct update_util_data update_util;
struct sugov_policy *sg_policy;
- unsigned long iowait_boost;
- unsigned long iowait_boost_max;
+ bool iowait_boost_pending;
+ unsigned int iowait_boost;
+ unsigned int iowait_boost_max;
u64 last_update;
/* The fields below are only needed when sharing a policy. */
@@ -172,30 +173,43 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
unsigned int flags)
{
if (flags & SCHED_CPUFREQ_IOWAIT) {
- sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
+ sg_cpu->iowait_boost_pending = true;
+ sg_cpu->iowait_boost = max(sg_cpu->iowait_boost,
+ sg_cpu->sg_policy->policy->min);
} else if (sg_cpu->iowait_boost) {
s64 delta_ns = time - sg_cpu->last_update;
/* Clear iowait_boost if the CPU apprears to have been idle. */
- if (delta_ns > TICK_NSEC)
+ if (delta_ns > TICK_NSEC) {
sg_cpu->iowait_boost = 0;
+ sg_cpu->iowait_boost_pending = false;
+ }
}
}
static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
unsigned long *max)
{
- unsigned long boost_util = sg_cpu->iowait_boost;
- unsigned long boost_max = sg_cpu->iowait_boost_max;
+ unsigned long boost_util, boost_max;
- if (!boost_util)
+ if (!sg_cpu->iowait_boost)
return;
+ boost_util = sg_cpu->iowait_boost;
+ boost_max = sg_cpu->iowait_boost_max;
+
if (*util * boost_max < *max * boost_util) {
*util = boost_util;
*max = boost_max;
}
- sg_cpu->iowait_boost >>= 1;
+
+ if (sg_cpu->iowait_boost_pending) {
+ sg_cpu->iowait_boost_pending = false;
+ sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
+ sg_cpu->iowait_boost_max);
+ } else {
+ sg_cpu->iowait_boost >>= 1;
+ }
}
#ifdef CONFIG_NO_HZ_COMMON
@@ -267,6 +281,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
delta_ns = time - j_sg_cpu->last_update;
if (delta_ns > TICK_NSEC) {
j_sg_cpu->iowait_boost = 0;
+ j_sg_cpu->iowait_boost_pending = false;
continue;
}
if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
--
2.13.2.932.g7449e964c-goog
On 16-07-17, 01:04, Joel Fernandes wrote:
> Currently the iowait_boost feature in schedutil makes the frequency go to max
> on iowait wakeups. This feature was added to handle a case that Peter
> described where the throughput of operations involving continuous I/O requests
> [1] is reduced due to running at a lower frequency, however the lower
> throughput itself causes utilization to be low and hence causing frequency to
> be low hence its "stuck".
>
> Instead of going to max, its also possible to achieve the same effect by
> ramping up to max if there are repeated in_iowait wakeups happening. This patch
> is an attempt to do that. We start from a lower frequency (policy->mind)
s/mind/min/
> and double the boost for every consecutive iowait update until we reach the
> maximum iowait boost frequency (iowait_boost_max).
>
> I ran a synthetic test (continuous O_DIRECT writes in a loop) on an x86 machine
> with intel_pstate in passive mode using schedutil. In this test the iowait_boost
> value ramped from 800MHz to 4GHz in 60ms. The patch achieves the desired improved
> throughput as the existing behavior.
>
> Also while at it, make iowait_boost and iowait_boost_max as unsigned int since
> its unit is kHz and this is consistent with struct cpufreq_policy.
>
> [1] https://patchwork.kernel.org/patch/9735885/
>
> Cc: Srinivas Pandruvada <[email protected]>
> Cc: Len Brown <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Suggested-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Joel Fernandes <[email protected]>
> ---
> This version is based on some ideas from Viresh and Juri in v4. Viresh, one
> difference between the idea we just discussed is, I am scaling up/down the
> boost only after consuming it. This has the effect of slightly delaying the
> "deboost" but achieves the same boost ramp time. Its more cleaner in the code
> IMO to avoid the scaling up and then down on the initial boost. Note that I
> also dropped iowait_boost_min and now I'm just starting the initial boost from
> policy->min since as I mentioned in the commit above, the ramp of the
> iowait_boost value is very quick and for the usecase its intended for, it works
> fine. Hope this is acceptable. Thanks.
>
> kernel/sched/cpufreq_schedutil.c | 31 +++++++++++++++++++++++--------
> 1 file changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 622eed1b7658..4225bbada88d 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -53,8 +53,9 @@ struct sugov_cpu {
> struct update_util_data update_util;
> struct sugov_policy *sg_policy;
>
> - unsigned long iowait_boost;
> - unsigned long iowait_boost_max;
> + bool iowait_boost_pending;
> + unsigned int iowait_boost;
> + unsigned int iowait_boost_max;
> u64 last_update;
>
> /* The fields below are only needed when sharing a policy. */
> @@ -172,30 +173,43 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> unsigned int flags)
> {
> if (flags & SCHED_CPUFREQ_IOWAIT) {
> - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> + sg_cpu->iowait_boost_pending = true;
> + sg_cpu->iowait_boost = max(sg_cpu->iowait_boost,
> + sg_cpu->sg_policy->policy->min);
> } else if (sg_cpu->iowait_boost) {
> s64 delta_ns = time - sg_cpu->last_update;
>
> /* Clear iowait_boost if the CPU apprears to have been idle. */
> - if (delta_ns > TICK_NSEC)
> + if (delta_ns > TICK_NSEC) {
> sg_cpu->iowait_boost = 0;
> + sg_cpu->iowait_boost_pending = false;
> + }
We don't really need to clear this flag here as we are already making
iowait_boost as 0 and that's what we check while using boost.
> }
> }
>
> static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
> unsigned long *max)
> {
> - unsigned long boost_util = sg_cpu->iowait_boost;
> - unsigned long boost_max = sg_cpu->iowait_boost_max;
> + unsigned long boost_util, boost_max;
>
> - if (!boost_util)
> + if (!sg_cpu->iowait_boost)
> return;
>
> + boost_util = sg_cpu->iowait_boost;
> + boost_max = sg_cpu->iowait_boost_max;
> +
The above changes are not required anymore (and were required only
with my patch).
> if (*util * boost_max < *max * boost_util) {
> *util = boost_util;
> *max = boost_max;
> }
> - sg_cpu->iowait_boost >>= 1;
> +
> + if (sg_cpu->iowait_boost_pending) {
> + sg_cpu->iowait_boost_pending = false;
> + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
> + sg_cpu->iowait_boost_max);
Now this has a problem. We will also boost after waiting for
rate_limit_us. And that's why I had proposed the tricky solution in
the first place. I thought we wanted to avoid instant boost only for
the first iteration, but after that we wanted to do it ASAP. Isn't it?
Now that you are using policy->min instead of policy->cur, we can
simplify the solution I proposed and always do 2 * iowait_boost before
getting current util/max in above if loop. i.e. we will start iowait
boost with min * 2 instead of min and that should be fine.
> + } else {
> + sg_cpu->iowait_boost >>= 1;
> + }
> }
>
> #ifdef CONFIG_NO_HZ_COMMON
> @@ -267,6 +281,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> delta_ns = time - j_sg_cpu->last_update;
> if (delta_ns > TICK_NSEC) {
> j_sg_cpu->iowait_boost = 0;
> + j_sg_cpu->iowait_boost_pending = false;
Not required here as well.
> continue;
> }
> if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
> --
> 2.13.2.932.g7449e964c-goog
--
viresh
Hi Viresh,
On Mon, Jul 17, 2017 at 1:04 AM, Viresh Kumar <[email protected]> wrote:
> On 16-07-17, 01:04, Joel Fernandes wrote:
>> Currently the iowait_boost feature in schedutil makes the frequency go to max
>> on iowait wakeups. This feature was added to handle a case that Peter
>> described where the throughput of operations involving continuous I/O requests
>> [1] is reduced due to running at a lower frequency, however the lower
>> throughput itself causes utilization to be low and hence causing frequency to
>> be low hence its "stuck".
>>
>> Instead of going to max, its also possible to achieve the same effect by
>> ramping up to max if there are repeated in_iowait wakeups happening. This patch
>> is an attempt to do that. We start from a lower frequency (policy->mind)
>
> s/mind/min/
>
>> and double the boost for every consecutive iowait update until we reach the
>> maximum iowait boost frequency (iowait_boost_max).
>>
>> I ran a synthetic test (continuous O_DIRECT writes in a loop) on an x86 machine
>> with intel_pstate in passive mode using schedutil. In this test the iowait_boost
>> value ramped from 800MHz to 4GHz in 60ms. The patch achieves the desired improved
>> throughput as the existing behavior.
>>
>> Also while at it, make iowait_boost and iowait_boost_max as unsigned int since
>> its unit is kHz and this is consistent with struct cpufreq_policy.
>>
>> [1] https://patchwork.kernel.org/patch/9735885/
>>
>> Cc: Srinivas Pandruvada <[email protected]>
>> Cc: Len Brown <[email protected]>
>> Cc: Rafael J. Wysocki <[email protected]>
>> Cc: Viresh Kumar <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Suggested-by: Peter Zijlstra <[email protected]>
>> Signed-off-by: Joel Fernandes <[email protected]>
>> ---
>> This version is based on some ideas from Viresh and Juri in v4. Viresh, one
>> difference between the idea we just discussed is, I am scaling up/down the
>> boost only after consuming it. This has the effect of slightly delaying the
>> "deboost" but achieves the same boost ramp time. Its more cleaner in the code
>> IMO to avoid the scaling up and then down on the initial boost. Note that I
>> also dropped iowait_boost_min and now I'm just starting the initial boost from
>> policy->min since as I mentioned in the commit above, the ramp of the
>> iowait_boost value is very quick and for the usecase its intended for, it works
>> fine. Hope this is acceptable. Thanks.
>>
>> kernel/sched/cpufreq_schedutil.c | 31 +++++++++++++++++++++++--------
>> 1 file changed, 23 insertions(+), 8 deletions(-)
>>
>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> index 622eed1b7658..4225bbada88d 100644
>> --- a/kernel/sched/cpufreq_schedutil.c
>> +++ b/kernel/sched/cpufreq_schedutil.c
>> @@ -53,8 +53,9 @@ struct sugov_cpu {
>> struct update_util_data update_util;
>> struct sugov_policy *sg_policy;
>>
>> - unsigned long iowait_boost;
>> - unsigned long iowait_boost_max;
>> + bool iowait_boost_pending;
>> + unsigned int iowait_boost;
>> + unsigned int iowait_boost_max;
>> u64 last_update;
>>
>> /* The fields below are only needed when sharing a policy. */
>> @@ -172,30 +173,43 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
>> unsigned int flags)
>> {
>> if (flags & SCHED_CPUFREQ_IOWAIT) {
>> - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
>> + sg_cpu->iowait_boost_pending = true;
>> + sg_cpu->iowait_boost = max(sg_cpu->iowait_boost,
>> + sg_cpu->sg_policy->policy->min);
>> } else if (sg_cpu->iowait_boost) {
>> s64 delta_ns = time - sg_cpu->last_update;
>>
>> /* Clear iowait_boost if the CPU apprears to have been idle. */
>> - if (delta_ns > TICK_NSEC)
>> + if (delta_ns > TICK_NSEC) {
>> sg_cpu->iowait_boost = 0;
>> + sg_cpu->iowait_boost_pending = false;
>> + }
>
> We don't really need to clear this flag here as we are already making
> iowait_boost as 0 and that's what we check while using boost.
Hmm, I would rather clear this flag here and in the loop in
sugov_next_freq_shared since it keeps the flag in sync with what's
happening and is less confusing IMHO.
>
>> }
>> }
>>
>> static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
>> unsigned long *max)
>> {
>> - unsigned long boost_util = sg_cpu->iowait_boost;
>> - unsigned long boost_max = sg_cpu->iowait_boost_max;
>> + unsigned long boost_util, boost_max;
>>
>> - if (!boost_util)
>> + if (!sg_cpu->iowait_boost)
>> return;
>>
>> + boost_util = sg_cpu->iowait_boost;
>> + boost_max = sg_cpu->iowait_boost_max;
>> +
>
> The above changes are not required anymore (and were required only
> with my patch).
Yep, I'll drop it.
>> if (*util * boost_max < *max * boost_util) {
>> *util = boost_util;
>> *max = boost_max;
>> }
>> - sg_cpu->iowait_boost >>= 1;
>> +
>> + if (sg_cpu->iowait_boost_pending) {
>> + sg_cpu->iowait_boost_pending = false;
>> + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
>> + sg_cpu->iowait_boost_max);
>
> Now this has a problem. We will also boost after waiting for
> rate_limit_us. And that's why I had proposed the tricky solution in
Not really unless rate_limit_us is < TICK_NSEC? Once TICK_NSEC
elapses, we would clear the boost in sugov_set_iowait_boost and in
sugov_next_freq_shared.
> the first place. I thought we wanted to avoid instant boost only for
> the first iteration, but after that we wanted to do it ASAP. Isn't it?
>
> Now that you are using policy->min instead of policy->cur, we can
> simplify the solution I proposed and always do 2 * iowait_boost before
No, doubling on the first boost was never discussed or intended in my
earlier patches. I thought even your patch never did, you were
dividing by 2, and then scaling it back up by 2 before consuming it to
preserve the initial boost.
> getting current util/max in above if loop. i.e. we will start iowait
> boost with min * 2 instead of min and that should be fine.
Hmm, but why start from double of min? Why not just min? It doesn't
make any difference to the intended behavior itself and is also
consistent with my proposal in RFC v4. Also I feel what you're
suggesting is more spike prone as well, the idea was to start from the
minimum and double it as we go, not to double the min the first go.
That was never intended.
Also I would rather keep the "set and use and set and use" pattern to
keep the logic less confusing and clean IMO.
So we set initial boost in sugov_set_iowait_boost, and then in
sugov_iowait_boost we use it, and then set the boost for the next time
around at the end of sugov_iowait_boost (that is we double it). Next
time sugov_set_iowait_boost wouldn't touch the boost whether iowait
flag is set or not and we would continue into sugov_iowait_boost to
consume the boost. This would have a small delay in reducing the
boost, but that's Ok since its only one cycle of delay, and keeps the
code clean. I assume the last part is not an issue considering you're
proposing double of the initial boost anyway ;-)
thanks,
-Joel
On 17-07-17, 10:35, Joel Fernandes wrote:
> On Mon, Jul 17, 2017 at 1:04 AM, Viresh Kumar <[email protected]> wrote:
> > On 16-07-17, 01:04, Joel Fernandes wrote:
> >> + if (sg_cpu->iowait_boost_pending) {
> >> + sg_cpu->iowait_boost_pending = false;
> >> + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
> >> + sg_cpu->iowait_boost_max);
> >
> > Now this has a problem. We will also boost after waiting for
s/also/always/
> > rate_limit_us. And that's why I had proposed the tricky solution in
>
> Not really unless rate_limit_us is < TICK_NSEC? Once TICK_NSEC
> elapses, we would clear the boost in sugov_set_iowait_boost and in
> sugov_next_freq_shared.
You misread it and I know why it happened. And so I have sent a small
patch to make it a bit more readable.
rate_limit_us is associated with "last_freq_update_time", while
iowait-boost is associated with "last_update".
And last_update gets updated way too often.
> > the first place. I thought we wanted to avoid instant boost only for
> > the first iteration, but after that we wanted to do it ASAP. Isn't it?
> >
> > Now that you are using policy->min instead of policy->cur, we can
> > simplify the solution I proposed and always do 2 * iowait_boost before
>
> No, doubling on the first boost was never discussed or intended in my
> earlier patches. I thought even your patch never did, you were
> dividing by 2, and then scaling it back up by 2 before consuming it to
> preserve the initial boost.
>
> > getting current util/max in above if loop. i.e. we will start iowait
> > boost with min * 2 instead of min and that should be fine.
>
> Hmm, but why start from double of min? Why not just min? It doesn't
> make any difference to the intended behavior itself and is also
> consistent with my proposal in RFC v4. Also I feel what you're
> suggesting is more spike prone as well, the idea was to start from the
> minimum and double it as we go, not to double the min the first go.
> That was never intended.
>
> Also I would rather keep the "set and use and set and use" pattern to
> keep the logic less confusing and clean IMO.
> So we set initial boost in sugov_set_iowait_boost, and then in
> sugov_iowait_boost we use it, and then set the boost for the next time
> around at the end of sugov_iowait_boost (that is we double it). Next
> time sugov_set_iowait_boost wouldn't touch the boost whether iowait
> flag is set or not and we would continue into sugov_iowait_boost to
> consume the boost. This would have a small delay in reducing the
> boost, but that's Ok since its only one cycle of delay, and keeps the
> code clean. I assume the last part is not an issue considering you're
> proposing double of the initial boost anyway ;-)
Okay, let me try to explain the problem first and then you can propose
a solution if required.
Expected Behavior:
(Window refers to a time window of rate_limit_us here)
A. The first window where IOWAIT flag is set, we set boost to min-freq
and that shall be used for next freq update in
sugov_iowait_boost(). Any more calls to sugov_set_iowait_boost()
within this window shouldn't change the behavior.
B. If the next window also has IOWAIT flag set, then
sugov_iowait_boost() should use iowait*2 for freq update.
C. If a window doesn't have IOWAIT flag set, then sugov_iowait_boost()
should use iowait/2 in it.
Do they look fine to you?
Now coming to how will system behave with your patch:
A. would be fine. We will follow things properly.
But B. and C. aren't true anymore.
This happened because after the first window we updated iowait_boost
as 2*min unconditionally and the next window will *always* use that,
even if the flag isn't set. And we may end up increasing the frequency
unnecessarily, i.e. the spike where this discussion started.
And so in my initial solution I reversed the order in
sugov_iowait_boost().
--
viresh
Hi,
On 18/07/17 11:15, Viresh Kumar wrote:
> On 17-07-17, 10:35, Joel Fernandes wrote:
> > On Mon, Jul 17, 2017 at 1:04 AM, Viresh Kumar <[email protected]> wrote:
> > > On 16-07-17, 01:04, Joel Fernandes wrote:
>
> > >> + if (sg_cpu->iowait_boost_pending) {
> > >> + sg_cpu->iowait_boost_pending = false;
> > >> + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
> > >> + sg_cpu->iowait_boost_max);
> > >
> > > Now this has a problem. We will also boost after waiting for
>
> s/also/always/
>
> > > rate_limit_us. And that's why I had proposed the tricky solution in
> >
> > Not really unless rate_limit_us is < TICK_NSEC? Once TICK_NSEC
> > elapses, we would clear the boost in sugov_set_iowait_boost and in
> > sugov_next_freq_shared.
>
> You misread it and I know why it happened. And so I have sent a small
> patch to make it a bit more readable.
>
> rate_limit_us is associated with "last_freq_update_time", while
> iowait-boost is associated with "last_update".
>
> And last_update gets updated way too often.
>
> > > the first place. I thought we wanted to avoid instant boost only for
> > > the first iteration, but after that we wanted to do it ASAP. Isn't it?
> > >
> > > Now that you are using policy->min instead of policy->cur, we can
> > > simplify the solution I proposed and always do 2 * iowait_boost before
> >
> > No, doubling on the first boost was never discussed or intended in my
> > earlier patches. I thought even your patch never did, you were
> > dividing by 2, and then scaling it back up by 2 before consuming it to
> > preserve the initial boost.
> >
> > > getting current util/max in above if loop. i.e. we will start iowait
> > > boost with min * 2 instead of min and that should be fine.
> >
> > Hmm, but why start from double of min? Why not just min? It doesn't
> > make any difference to the intended behavior itself and is also
> > consistent with my proposal in RFC v4. Also I feel what you're
> > suggesting is more spike prone as well, the idea was to start from the
> > minimum and double it as we go, not to double the min the first go.
> > That was never intended.
> >
> > Also I would rather keep the "set and use and set and use" pattern to
> > keep the logic less confusing and clean IMO.
> > So we set initial boost in sugov_set_iowait_boost, and then in
> > sugov_iowait_boost we use it, and then set the boost for the next time
> > around at the end of sugov_iowait_boost (that is we double it). Next
> > time sugov_set_iowait_boost wouldn't touch the boost whether iowait
> > flag is set or not and we would continue into sugov_iowait_boost to
> > consume the boost. This would have a small delay in reducing the
> > boost, but that's Ok since its only one cycle of delay, and keeps the
> > code clean. I assume the last part is not an issue considering you're
> > proposing double of the initial boost anyway ;-)
>
> Okay, let me try to explain the problem first and then you can propose
> a solution if required.
>
> Expected Behavior:
>
> (Window refers to a time window of rate_limit_us here)
>
> A. The first window where IOWAIT flag is set, we set boost to min-freq
> and that shall be used for next freq update in
> sugov_iowait_boost(). Any more calls to sugov_set_iowait_boost()
> within this window shouldn't change the behavior.
>
> B. If the next window also has IOWAIT flag set, then
> sugov_iowait_boost() should use iowait*2 for freq update.
>
> C. If a window doesn't have IOWAIT flag set, then sugov_iowait_boost()
> should use iowait/2 in it.
>
>
> Do they look fine to you?
>
> Now coming to how will system behave with your patch:
>
> A. would be fine. We will follow things properly.
>
> But B. and C. aren't true anymore.
>
> This happened because after the first window we updated iowait_boost
> as 2*min unconditionally and the next window will *always* use that,
> even if the flag isn't set. And we may end up increasing the frequency
> unnecessarily, i.e. the spike where this discussion started.
>
Mmm, seems to make sense to me. :/
Would the following work (on top of Joel's v5)? Rationale being that
only in sugov_set_iowait_boost we might bump freq up (if no iowait_boost
was set) or start from policy->min. In sugov_iowait_boost (consumer)
instead we do the decay (if no boosting was pending).
---
kernel/sched/cpufreq_schedutil.c | 29 +++++++++++++++++++----------
1 file changed, 19 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 46b2479641cc..b270563c15a5 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -171,8 +171,14 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
{
if (flags & SCHED_CPUFREQ_IOWAIT) {
sg_cpu->iowait_boost_pending = true;
- sg_cpu->iowait_boost = max(sg_cpu->iowait_boost,
- sg_cpu->sg_policy->policy->min);
+ if (sg_cpu->iowait_boost) {
+ /* Bump up 2*current_boost until hitting max */
+ sg_cpu->iowait_boost = max(sg_cpu->iowait_boost << 1,
+ sg_cpu->iowait_boost_max);
+ } else {
+ /* Start from policy->min */
+ sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->min;
+ }
} else if (sg_cpu->iowait_boost) {
s64 delta_ns = time - sg_cpu->last_update;
@@ -192,6 +198,17 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
if (!sg_cpu->iowait_boost)
return;
+ if (sg_cpu->iowait_boost_pending) {
+ /*
+ * Record consumption of current boost value
+ * (set by sugov_set_iowait_boost).
+ */
+ sg_cpu->iowait_boost_pending = false;
+ } else {
+ /* Decay boost */
+ sg_cpu->iowait_boost >>= 1;
+ }
+
boost_util = sg_cpu->iowait_boost;
boost_max = sg_cpu->iowait_boost_max;
@@ -199,14 +216,6 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
*util = boost_util;
*max = boost_max;
}
-
- if (sg_cpu->iowait_boost_pending) {
- sg_cpu->iowait_boost_pending = false;
- sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
- sg_cpu->iowait_boost_max);
- } else {
- sg_cpu->iowait_boost >>= 1;
- }
}
#ifdef CONFIG_NO_HZ_COMMON
--
2.11.0
Hi Viresh,
I appreciate the discussion.
On Mon, Jul 17, 2017 at 10:45 PM, Viresh Kumar <[email protected]> wrote:
> On 17-07-17, 10:35, Joel Fernandes wrote:
>> On Mon, Jul 17, 2017 at 1:04 AM, Viresh Kumar <[email protected]> wrote:
>> > On 16-07-17, 01:04, Joel Fernandes wrote:
>
>> >> + if (sg_cpu->iowait_boost_pending) {
>> >> + sg_cpu->iowait_boost_pending = false;
>> >> + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
>> >> + sg_cpu->iowait_boost_max);
>> >
>> > Now this has a problem. We will also boost after waiting for
>
> s/also/always/
>
>> > rate_limit_us. And that's why I had proposed the tricky solution in
>>
>> Not really unless rate_limit_us is < TICK_NSEC? Once TICK_NSEC
>> elapses, we would clear the boost in sugov_set_iowait_boost and in
>> sugov_next_freq_shared.
>
> You misread it and I know why it happened. And so I have sent a small
> patch to make it a bit more readable.
>
> rate_limit_us is associated with "last_freq_update_time", while
> iowait-boost is associated with "last_update".
>
> And last_update gets updated way too often.
>
>> > the first place. I thought we wanted to avoid instant boost only for
>> > the first iteration, but after that we wanted to do it ASAP. Isn't it?
>> >
>> > Now that you are using policy->min instead of policy->cur, we can
>> > simplify the solution I proposed and always do 2 * iowait_boost before
>>
>> No, doubling on the first boost was never discussed or intended in my
>> earlier patches. I thought even your patch never did, you were
>> dividing by 2, and then scaling it back up by 2 before consuming it to
>> preserve the initial boost.
>>
>> > getting current util/max in above if loop. i.e. we will start iowait
>> > boost with min * 2 instead of min and that should be fine.
>>
>> Hmm, but why start from double of min? Why not just min? It doesn't
>> make any difference to the intended behavior itself and is also
>> consistent with my proposal in RFC v4. Also I feel what you're
>> suggesting is more spike prone as well, the idea was to start from the
>> minimum and double it as we go, not to double the min the first go.
>> That was never intended.
>>
>> Also I would rather keep the "set and use and set and use" pattern to
>> keep the logic less confusing and clean IMO.
>> So we set initial boost in sugov_set_iowait_boost, and then in
>> sugov_iowait_boost we use it, and then set the boost for the next time
>> around at the end of sugov_iowait_boost (that is we double it). Next
>> time sugov_set_iowait_boost wouldn't touch the boost whether iowait
>> flag is set or not and we would continue into sugov_iowait_boost to
>> consume the boost. This would have a small delay in reducing the
>> boost, but that's Ok since its only one cycle of delay, and keeps the
>> code clean. I assume the last part is not an issue considering you're
>> proposing double of the initial boost anyway ;-)
>
> Okay, let me try to explain the problem first and then you can propose
> a solution if required.
>
> Expected Behavior:
>
> (Window refers to a time window of rate_limit_us here)
>
> A. The first window where IOWAIT flag is set, we set boost to min-freq
> and that shall be used for next freq update in
> sugov_iowait_boost(). Any more calls to sugov_set_iowait_boost()
> within this window shouldn't change the behavior.
>
> B. If the next window also has IOWAIT flag set, then
> sugov_iowait_boost() should use iowait*2 for freq update.
>
> C. If a window doesn't have IOWAIT flag set, then sugov_iowait_boost()
> should use iowait/2 in it.
>
>
> Do they look fine to you?
>
> Now coming to how will system behave with your patch:
>
> A. would be fine. We will follow things properly.
>
> But B. and C. aren't true anymore.
>
> This happened because after the first window we updated iowait_boost
> as 2*min unconditionally and the next window will *always* use that,
> even if the flag isn't set. And we may end up increasing the frequency
> unnecessarily, i.e. the spike where this discussion started.
Not really, to me B will still work because in the case the flag is
set, we are correctly double boosting in the next cycle.
Taking an example, with B = flag is set and D = flag is not set
F = Fmin (minimum)
iowait flag B B B D D D
resulting boost F 2*F 4*F 4*F 2*F F
What will not work is C but as I mentioned in my last email, that
would cause us to delay the iowait boost halving by at most 1 cycle,
is that really an issue considering we are starting from min compared
to max? Note that cases A. and B. are still working.
Considering the following cases:
(1) min freq is 800MHz, it takes upto 4 cycles to reach 4GHz where the
flag is set. At this point I think its likely we will run for many
more cycles which means keeping the boost active for 1 extra cycle
isn't that big a deal. Even if run for just 5 cycles with boost, that
means only the last cycle will suffer from C not decaying as soon as
possible. Comparing that to the case where in current code we
currently run at max from the first cycle, its not that bad.
(2) we have a transient type of thing, in this case we're probably not
reaching the full max immediately so even if we delay the decaying,
its still not as bad as what it is currently.
I think considering that the code is way cleaner than any other
approach - its a small price to pay. Also keeping in mind that this
patch is still an improvement over the current spike, even though as
you said its still a bit spikey, but its still better right?
Functionally the code is working and I think is also clean, but if you
feel that its still confusing, then I'm open to rewriting it.
>
> And so in my initial solution I reversed the order in
> sugov_iowait_boost().
Yes, but to fix A. you had to divide by 2 in sugov_set_iowait_boost,
and then multiply by 2 later in sugov_iowait_boost to keep the first
boost at min. That IMO was confusing so my modified patch did it
differently.
thanks,
-Joel
On 18-07-17, 15:02, Juri Lelli wrote:
> Mmm, seems to make sense to me. :/
>
> Would the following work (on top of Joel's v5)? Rationale being that
> only in sugov_set_iowait_boost we might bump freq up (if no iowait_boost
> was set) or start from policy->min. In sugov_iowait_boost (consumer)
> instead we do the decay (if no boosting was pending).
>
> ---
> kernel/sched/cpufreq_schedutil.c | 29 +++++++++++++++++++----------
> 1 file changed, 19 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 46b2479641cc..b270563c15a5 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -171,8 +171,14 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> {
> if (flags & SCHED_CPUFREQ_IOWAIT) {
> sg_cpu->iowait_boost_pending = true;
> - sg_cpu->iowait_boost = max(sg_cpu->iowait_boost,
> - sg_cpu->sg_policy->policy->min);
> + if (sg_cpu->iowait_boost) {
> + /* Bump up 2*current_boost until hitting max */
> + sg_cpu->iowait_boost = max(sg_cpu->iowait_boost << 1,
> + sg_cpu->iowait_boost_max);
And we are back at where we started :)
This wouldn't work because sugov_set_iowait_boost() gets called a lot.
Maybe 10 times within a rate_limit_us period.
The thumb rule is, never double/half the boost from
sugov_set_iowait_boost() :)
--
viresh
On Tue, Jul 18, 2017 at 10:37 PM, Viresh Kumar <[email protected]> wrote:
> On 18-07-17, 15:02, Juri Lelli wrote:
>> Mmm, seems to make sense to me. :/
>>
>> Would the following work (on top of Joel's v5)? Rationale being that
>> only in sugov_set_iowait_boost we might bump freq up (if no iowait_boost
>> was set) or start from policy->min. In sugov_iowait_boost (consumer)
>> instead we do the decay (if no boosting was pending).
>>
>> ---
>> kernel/sched/cpufreq_schedutil.c | 29 +++++++++++++++++++----------
>> 1 file changed, 19 insertions(+), 10 deletions(-)
>>
>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> index 46b2479641cc..b270563c15a5 100644
>> --- a/kernel/sched/cpufreq_schedutil.c
>> +++ b/kernel/sched/cpufreq_schedutil.c
>> @@ -171,8 +171,14 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
>> {
>> if (flags & SCHED_CPUFREQ_IOWAIT) {
>> sg_cpu->iowait_boost_pending = true;
>> - sg_cpu->iowait_boost = max(sg_cpu->iowait_boost,
>> - sg_cpu->sg_policy->policy->min);
>> + if (sg_cpu->iowait_boost) {
>> + /* Bump up 2*current_boost until hitting max */
>> + sg_cpu->iowait_boost = max(sg_cpu->iowait_boost << 1,
>> + sg_cpu->iowait_boost_max);
>
> And we are back at where we started :)
>
> This wouldn't work because sugov_set_iowait_boost() gets called a lot.
> Maybe 10 times within a rate_limit_us period.
>
> The thumb rule is, never double/half the boost from
> sugov_set_iowait_boost() :)
Ok so tomorrow I'll post something slightly different. In
sugov_iowait_boost, if the flag is set and current iowait_boost < min,
then set it to min, otherwise double it. If the flag is not set, halve
it. And both these would be done, *before* consuming the boost (as in
Viresh's last patch). I think that's reasonable and handles all cases.
Hopefully that will put it to rest, let me know if any objections.
thanks,
-Joel
On 18-07-17, 21:39, Joel Fernandes wrote:
> Not really, to me B will still work because in the case the flag is
> set, we are correctly double boosting in the next cycle.
>
> Taking an example, with B = flag is set and D = flag is not set
>
> F = Fmin (minimum)
>
> iowait flag B B B D D D
> resulting boost F 2*F 4*F 4*F 2*F F
What about this ?
iowait flag B D B D B D
resulting boost F 2*F F 2*F F 2*F
Isn't this the worst behavior we may wish for ?
> What will not work is C but as I mentioned in my last email, that
> would cause us to delay the iowait boost halving by at most 1 cycle,
> is that really an issue considering we are starting from min compared
> to max? Note that cases A. and B. are still working.
>
> Considering the following cases:
> (1) min freq is 800MHz, it takes upto 4 cycles to reach 4GHz where the
> flag is set. At this point I think its likely we will run for many
> more cycles which means keeping the boost active for 1 extra cycle
> isn't that big a deal. Even if run for just 5 cycles with boost, that
> means only the last cycle will suffer from C not decaying as soon as
> possible. Comparing that to the case where in current code we
> currently run at max from the first cycle, its not that bad.
>
> (2) we have a transient type of thing, in this case we're probably not
> reaching the full max immediately so even if we delay the decaying,
> its still not as bad as what it is currently.
>
> I think considering that the code is way cleaner than any other
> approach - its a small price to pay. Also keeping in mind that this
> patch is still an improvement over the current spike, even though as
> you said its still a bit spikey, but its still better right?
>
> Functionally the code is working and I think is also clean, but if you
> feel that its still confusing, then I'm open to rewriting it.
I am not worried for being boosted for a bit more time, but with the
spikes even when we do not want a freq change.
> > And so in my initial solution I reversed the order in
> > sugov_iowait_boost().
>
> Yes, but to fix A. you had to divide by 2 in sugov_set_iowait_boost,
> and then multiply by 2 later in sugov_iowait_boost to keep the first
> boost at min. That IMO was confusing so my modified patch did it
> differently.
Yeah, it wasn't great for sure.
I hope the following one will work for everyone ?
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 45fcf21ad685..ceac5f72d8da 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -53,6 +53,7 @@ struct sugov_cpu {
struct update_util_data update_util;
struct sugov_policy *sg_policy;
+ bool iowait_boost_pending;
unsigned long iowait_boost;
unsigned long iowait_boost_max;
u64 last_update;
@@ -169,7 +170,17 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
unsigned int flags)
{
if (flags & SCHED_CPUFREQ_IOWAIT) {
- sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
+ if (sg_cpu->iowait_boost_pending)
+ return;
+
+ sg_cpu->iowait_boost_pending = true;
+
+ if (sg_cpu->iowait_boost) {
+ sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
+ sg_cpu->iowait_boost_max);
+ } else {
+ sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->min;
+ }
} else if (sg_cpu->iowait_boost) {
s64 delta_ns = time - sg_cpu->last_update;
@@ -182,17 +193,23 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
unsigned long *max)
{
- unsigned long boost_util = sg_cpu->iowait_boost;
- unsigned long boost_max = sg_cpu->iowait_boost_max;
+ unsigned long boost_util, boost_max;
- if (!boost_util)
+ if (!sg_cpu->iowait_boost)
return;
+ if (sg_cpu->iowait_boost_pending)
+ sg_cpu->iowait_boost_pending = false;
+ else
+ sg_cpu->iowait_boost >>= 1;
+
+ boost_util = sg_cpu->iowait_boost;
+ boost_max = sg_cpu->iowait_boost_max;
+
if (*util * boost_max < *max * boost_util) {
*util = boost_util;
*max = boost_max;
}
- sg_cpu->iowait_boost >>= 1;
}
#ifdef CONFIG_NO_HZ_COMMON
--
viresh
Hi Viresh,
On Tue, Jul 18, 2017 at 11:19 PM, Viresh Kumar <[email protected]> wrote:
> On 18-07-17, 21:39, Joel Fernandes wrote:
>> Not really, to me B will still work because in the case the flag is
>> set, we are correctly double boosting in the next cycle.
>>
>> Taking an example, with B = flag is set and D = flag is not set
>>
>> F = Fmin (minimum)
>>
>> iowait flag B B B D D D
>> resulting boost F 2*F 4*F 4*F 2*F F
>
> What about this ?
>
> iowait flag B D B D B D
> resulting boost F 2*F F 2*F F 2*F
Yes I guess so but this oscillation can still happen in current code I think.
>
> Isn't this the worst behavior we may wish for ?
>
>> What will not work is C but as I mentioned in my last email, that
>> would cause us to delay the iowait boost halving by at most 1 cycle,
>> is that really an issue considering we are starting from min compared
>> to max? Note that cases A. and B. are still working.
>>
>> Considering the following cases:
>> (1) min freq is 800MHz, it takes upto 4 cycles to reach 4GHz where the
>> flag is set. At this point I think its likely we will run for many
>> more cycles which means keeping the boost active for 1 extra cycle
>> isn't that big a deal. Even if run for just 5 cycles with boost, that
>> means only the last cycle will suffer from C not decaying as soon as
>> possible. Comparing that to the case where in current code we
>> currently run at max from the first cycle, its not that bad.
>>
>> (2) we have a transient type of thing, in this case we're probably not
>> reaching the full max immediately so even if we delay the decaying,
>> its still not as bad as what it is currently.
>>
>> I think considering that the code is way cleaner than any other
>> approach - its a small price to pay. Also keeping in mind that this
>> patch is still an improvement over the current spike, even though as
>> you said its still a bit spikey, but its still better right?
>>
>> Functionally the code is working and I think is also clean, but if you
>> feel that its still confusing, then I'm open to rewriting it.
>
> I am not worried for being boosted for a bit more time, but with the
> spikes even when we do not want a freq change.
>
>> > And so in my initial solution I reversed the order in
>> > sugov_iowait_boost().
>>
>> Yes, but to fix A. you had to divide by 2 in sugov_set_iowait_boost,
>> and then multiply by 2 later in sugov_iowait_boost to keep the first
>> boost at min. That IMO was confusing so my modified patch did it
>> differently.
>
> Yeah, it wasn't great for sure.
>
> I hope the following one will work for everyone ?
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 45fcf21ad685..ceac5f72d8da 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -53,6 +53,7 @@ struct sugov_cpu {
> struct update_util_data update_util;
> struct sugov_policy *sg_policy;
>
> + bool iowait_boost_pending;
> unsigned long iowait_boost;
> unsigned long iowait_boost_max;
> u64 last_update;
> @@ -169,7 +170,17 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> unsigned int flags)
> {
> if (flags & SCHED_CPUFREQ_IOWAIT) {
> - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> + if (sg_cpu->iowait_boost_pending)
> + return;
> +
> + sg_cpu->iowait_boost_pending = true;
> +
> + if (sg_cpu->iowait_boost) {
> + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
> + sg_cpu->iowait_boost_max);
> + } else {
> + sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->min;
> + }
I would prefer this to be:
if (sg_cpu->iowait_boost >= policy->min) {
// double it
} else {
// set it to min
}
This is for the case when boost happens all the way, then its capped
at max, but when its decayed back, its not exactly decayed to Fmin but
lower than it, so in that case when boost next time we start from
Fmin.
> } else if (sg_cpu->iowait_boost) {
> s64 delta_ns = time - sg_cpu->last_update;
>
> @@ -182,17 +193,23 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
> unsigned long *max)
> {
> - unsigned long boost_util = sg_cpu->iowait_boost;
> - unsigned long boost_max = sg_cpu->iowait_boost_max;
> + unsigned long boost_util, boost_max;
>
> - if (!boost_util)
> + if (!sg_cpu->iowait_boost)
> return;
>
> + if (sg_cpu->iowait_boost_pending)
> + sg_cpu->iowait_boost_pending = false;
> + else
> + sg_cpu->iowait_boost >>= 1;
> +
> + boost_util = sg_cpu->iowait_boost;
> + boost_max = sg_cpu->iowait_boost_max;
> +
> if (*util * boost_max < *max * boost_util) {
> *util = boost_util;
> *max = boost_max;
This looks good to me and is kind of what I had in mind. I can spend
some time testing it soon. Just to be clear if I were to repost this
patch after testing, should I have your authorship and my tested-by or
do you prefer something else?
thanks,
-Joel
On 19-07-17, 19:38, Joel Fernandes wrote:
> On Tue, Jul 18, 2017 at 11:19 PM, Viresh Kumar <[email protected]> wrote:
> > On 18-07-17, 21:39, Joel Fernandes wrote:
> >> Not really, to me B will still work because in the case the flag is
> >> set, we are correctly double boosting in the next cycle.
> >>
> >> Taking an example, with B = flag is set and D = flag is not set
> >>
> >> F = Fmin (minimum)
> >>
> >> iowait flag B B B D D D
> >> resulting boost F 2*F 4*F 4*F 2*F F
> >
> > What about this ?
> >
> > iowait flag B D B D B D
> > resulting boost F 2*F F 2*F F 2*F
>
> Yes I guess so but this oscillation can still happen in current code I think.
How ?
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 45fcf21ad685..ceac5f72d8da 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -53,6 +53,7 @@ struct sugov_cpu {
> > struct update_util_data update_util;
> > struct sugov_policy *sg_policy;
> >
> > + bool iowait_boost_pending;
> > unsigned long iowait_boost;
> > unsigned long iowait_boost_max;
> > u64 last_update;
> > @@ -169,7 +170,17 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> > unsigned int flags)
> > {
> > if (flags & SCHED_CPUFREQ_IOWAIT) {
> > - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> > + if (sg_cpu->iowait_boost_pending)
> > + return;
> > +
> > + sg_cpu->iowait_boost_pending = true;
> > +
> > + if (sg_cpu->iowait_boost) {
> > + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
> > + sg_cpu->iowait_boost_max);
> > + } else {
> > + sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->min;
> > + }
>
> I would prefer this to be:
>
> if (sg_cpu->iowait_boost >= policy->min) {
> // double it
> } else {
> // set it to min
> }
>
> This is for the case when boost happens all the way, then its capped
> at max, but when its decayed back, its not exactly decayed to Fmin but
> lower than it, so in that case when boost next time we start from
> Fmin.
Actually you can add another patch first which makes iowait_boost as 0
when it goes below min as that problem exists today as well.
And this patch would be fine then as is ?
> > } else if (sg_cpu->iowait_boost) {
> > s64 delta_ns = time - sg_cpu->last_update;
> >
> > @@ -182,17 +193,23 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> > static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
> > unsigned long *max)
> > {
> > - unsigned long boost_util = sg_cpu->iowait_boost;
> > - unsigned long boost_max = sg_cpu->iowait_boost_max;
> > + unsigned long boost_util, boost_max;
> >
> > - if (!boost_util)
> > + if (!sg_cpu->iowait_boost)
> > return;
> >
> > + if (sg_cpu->iowait_boost_pending)
> > + sg_cpu->iowait_boost_pending = false;
> > + else
> > + sg_cpu->iowait_boost >>= 1;
> > +
> > + boost_util = sg_cpu->iowait_boost;
> > + boost_max = sg_cpu->iowait_boost_max;
> > +
> > if (*util * boost_max < *max * boost_util) {
> > *util = boost_util;
> > *max = boost_max;
>
> This looks good to me and is kind of what I had in mind. I can spend
> some time testing it soon. Just to be clear if I were to repost this
> patch after testing, should I have your authorship and my tested-by or
> do you prefer something else?
You can keep your authorship I wouldn't mind. Maybe a suggested-by at
max would be fine.
--
viresh
Hi Viresh,
On Wed, Jul 19, 2017 at 8:41 PM, Viresh Kumar <[email protected]> wrote:
> On 19-07-17, 19:38, Joel Fernandes wrote:
>> On Tue, Jul 18, 2017 at 11:19 PM, Viresh Kumar <[email protected]> wrote:
>> > On 18-07-17, 21:39, Joel Fernandes wrote:
>> >> Not really, to me B will still work because in the case the flag is
>> >> set, we are correctly double boosting in the next cycle.
>> >>
>> >> Taking an example, with B = flag is set and D = flag is not set
>> >>
>> >> F = Fmin (minimum)
>> >>
>> >> iowait flag B B B D D D
>> >> resulting boost F 2*F 4*F 4*F 2*F F
>> >
>> > What about this ?
>> >
>> > iowait flag B D B D B D
>> > resulting boost F 2*F F 2*F F 2*F
>>
>> Yes I guess so but this oscillation can still happen in current code I think.
>
> How ?
Yes you're right, its not an issue with current code.
>> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> > index 45fcf21ad685..ceac5f72d8da 100644
>> > --- a/kernel/sched/cpufreq_schedutil.c
>> > +++ b/kernel/sched/cpufreq_schedutil.c
>> > @@ -53,6 +53,7 @@ struct sugov_cpu {
>> > struct update_util_data update_util;
>> > struct sugov_policy *sg_policy;
>> >
>> > + bool iowait_boost_pending;
>> > unsigned long iowait_boost;
>> > unsigned long iowait_boost_max;
>> > u64 last_update;
>> > @@ -169,7 +170,17 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
>> > unsigned int flags)
>> > {
>> > if (flags & SCHED_CPUFREQ_IOWAIT) {
>> > - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
>> > + if (sg_cpu->iowait_boost_pending)
>> > + return;
>> > +
>> > + sg_cpu->iowait_boost_pending = true;
>> > +
>> > + if (sg_cpu->iowait_boost) {
>> > + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
>> > + sg_cpu->iowait_boost_max);
>> > + } else {
>> > + sg_cpu->iowait_boost = sg_cpu->sg_policy->policy->min;
>> > + }
>>
>> I would prefer this to be:
>>
>> if (sg_cpu->iowait_boost >= policy->min) {
>> // double it
>> } else {
>> // set it to min
>> }
>>
>> This is for the case when boost happens all the way, then its capped
>> at max, but when its decayed back, its not exactly decayed to Fmin but
>> lower than it, so in that case when boost next time we start from
>> Fmin.
>
> Actually you can add another patch first which makes iowait_boost as 0
> when it goes below min as that problem exists today as well.
>
> And this patch would be fine then as is ?
Yes I think that's fine, I thought about it some more and I think this
can be an issue in a scenario where
iowait_boost_max < policy->min but:
(iowait_boost / iowait_boost_max) > (rq->cfs.avg.util_avg /
arch_scale_cpu_capacity)
This is probably not a common case in current real world cases but if
iowait_boost_max is say way less than arch_scale_cpu_capacity for some
reason in the future, then it can be an issue I think. I'll post a
patch for it.
>
>> > } else if (sg_cpu->iowait_boost) {
>> > s64 delta_ns = time - sg_cpu->last_update;
>> >
>> > @@ -182,17 +193,23 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
>> > static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
>> > unsigned long *max)
>> > {
>> > - unsigned long boost_util = sg_cpu->iowait_boost;
>> > - unsigned long boost_max = sg_cpu->iowait_boost_max;
>> > + unsigned long boost_util, boost_max;
>> >
>> > - if (!boost_util)
>> > + if (!sg_cpu->iowait_boost)
>> > return;
>> >
>> > + if (sg_cpu->iowait_boost_pending)
>> > + sg_cpu->iowait_boost_pending = false;
>> > + else
>> > + sg_cpu->iowait_boost >>= 1;
>> > +
>> > + boost_util = sg_cpu->iowait_boost;
>> > + boost_max = sg_cpu->iowait_boost_max;
>> > +
>> > if (*util * boost_max < *max * boost_util) {
>> > *util = boost_util;
>> > *max = boost_max;
>>
>> This looks good to me and is kind of what I had in mind. I can spend
>> some time testing it soon. Just to be clear if I were to repost this
>> patch after testing, should I have your authorship and my tested-by or
>> do you prefer something else?
>
> You can keep your authorship I wouldn't mind. Maybe a suggested-by at
> max would be fine.
>
Cool, will do. Thanks a lot Viresh.
thanks,
-Joel
On 20-07-17, 12:49, Joel Fernandes wrote:
> Yes I think that's fine, I thought about it some more and I think this
> can be an issue in a scenario where
>
> iowait_boost_max < policy->min but:
We will never have this case as boost-max is set to cpuinfo.max_freq.
--
viresh
On Thu, Jul 20, 2017 at 9:09 PM, Viresh Kumar <[email protected]> wrote:
> On 20-07-17, 12:49, Joel Fernandes wrote:
>> Yes I think that's fine, I thought about it some more and I think this
>> can be an issue in a scenario where
>>
>> iowait_boost_max < policy->min but:
Uhh I meant to say here iowait_boost < policy->min. Sorry.
> We will never have this case as boost-max is set to cpuinfo.max_freq.
But you're right it can't be an issue in current code. I was just
thinking of future proofing it incase someone decided to lower the
boost-max in the code for whatever reason and forgets to handle this.
thanks,
-Joel