Currently iowait doesn't distinguish background/foreground tasks.
This may not be a problem in servers but could impact devices that
are more sensitive on energy consumption. [1]
In Android world, we have seen many cases where device run to high and
inefficient frequency unnecessarily when running some background task
which are interacting I/O.
The ultimate goal is to only apply iowait boost on UX related tasks and
leave the rest for EAS scheduler for better efficiency.
This patch limits iowait boost for tasks which are associated with boost
signal from user land. On Android specifically, those are foreground or
top app tasks.
[1] ANDROID discussion:
https://android-review.googlesource.com/c/kernel/common/+/1215695
Signed-off-by: Wei Wang <[email protected]>
---
kernel/sched/fair.c | 4 +++-
kernel/sched/sched.h | 12 ++++++++++++
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ba749f579714..221c0fbcea0e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5220,8 +5220,10 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
* If in_iowait is set, the code below may not trigger any cpufreq
* utilization updates, so do it here explicitly with the IOWAIT flag
* passed.
+ * If CONFIG_ENERGY_MODEL and CONFIG_UCLAMP_TASK are both configured,
+ * only boost for tasks set with non-null min-clamp.
*/
- if (p->in_iowait)
+ if (iowait_boosted(p))
cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT);
for_each_sched_entity(se) {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 280a3c735935..a13d49d835ed 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2341,6 +2341,18 @@ static inline unsigned int uclamp_util(struct rq *rq, unsigned int util)
}
#endif /* CONFIG_UCLAMP_TASK */
+#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_UCLAMP_TASK)
+static inline bool iowait_boosted(struct task_struct *p)
+{
+ return p->in_iowait && uclamp_eff_value(p, UCLAMP_MIN) > 0;
+}
+#else /* defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_UCLAMP_TASK) */
+static inline bool iowait_boosted(struct task_struct *p)
+{
+ return p->in_iowait;
+}
+#endif /* defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_UCLAMP_TASK) */
+
#ifdef arch_scale_freq_capacity
# ifndef arch_scale_freq_invariant
# define arch_scale_freq_invariant() true
--
2.25.0.341.g760bfbb309-goog
On 01/23/20 16:28, Wei Wang wrote:
> Currently iowait doesn't distinguish background/foreground tasks.
> This may not be a problem in servers but could impact devices that
> are more sensitive on energy consumption. [1]
> In Android world, we have seen many cases where device run to high and
> inefficient frequency unnecessarily when running some background task
> which are interacting I/O.
>
> The ultimate goal is to only apply iowait boost on UX related tasks and
> leave the rest for EAS scheduler for better efficiency.
>
> This patch limits iowait boost for tasks which are associated with boost
> signal from user land. On Android specifically, those are foreground or
> top app tasks.
>
> [1] ANDROID discussion:
> https://android-review.googlesource.com/c/kernel/common/+/1215695
>
> Signed-off-by: Wei Wang <[email protected]>
> ---
> kernel/sched/fair.c | 4 +++-
> kernel/sched/sched.h | 12 ++++++++++++
> 2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ba749f579714..221c0fbcea0e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5220,8 +5220,10 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> * If in_iowait is set, the code below may not trigger any cpufreq
> * utilization updates, so do it here explicitly with the IOWAIT flag
> * passed.
> + * If CONFIG_ENERGY_MODEL and CONFIG_UCLAMP_TASK are both configured,
> + * only boost for tasks set with non-null min-clamp.
> */
> - if (p->in_iowait)
> + if (iowait_boosted(p))
> cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT);
>
> for_each_sched_entity(se) {
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 280a3c735935..a13d49d835ed 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2341,6 +2341,18 @@ static inline unsigned int uclamp_util(struct rq *rq, unsigned int util)
> }
> #endif /* CONFIG_UCLAMP_TASK */
>
> +#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_UCLAMP_TASK)
Why depend on CONFIG_ENERGY_MODEL? Laptops are battery powered and might not
have this option enabled. Do we really need energy model here?
> +static inline bool iowait_boosted(struct task_struct *p)
> +{
> + return p->in_iowait && uclamp_eff_value(p, UCLAMP_MIN) > 0;
I think this is overloading the usage of util clamp. You're basically using
cpu.uclamp.min to temporarily switch iowait boost on/off.
Isn't it better to add a new cgroup attribute to toggle this feature?
The problem does seem generic enough and could benefit other battery-powered
devices outside of the Android world. I don't think the dependency on uclamp &&
energy model are necessary to solve this.
Thanks
--
Qais Yousef
> +}
> +#else /* defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_UCLAMP_TASK) */
> +static inline bool iowait_boosted(struct task_struct *p)
> +{
> + return p->in_iowait;
> +}
> +#endif /* defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_UCLAMP_TASK) */
> +
> #ifdef arch_scale_freq_capacity
> # ifndef arch_scale_freq_invariant
> # define arch_scale_freq_invariant() true
> --
> 2.25.0.341.g760bfbb309-goog
>
On Friday 24 Jan 2020 at 02:52:39 (+0000), Qais Yousef wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index ba749f579714..221c0fbcea0e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5220,8 +5220,10 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > * If in_iowait is set, the code below may not trigger any cpufreq
> > * utilization updates, so do it here explicitly with the IOWAIT flag
> > * passed.
> > + * If CONFIG_ENERGY_MODEL and CONFIG_UCLAMP_TASK are both configured,
> > + * only boost for tasks set with non-null min-clamp.
> > */
> > - if (p->in_iowait)
> > + if (iowait_boosted(p))
> > cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT);
> >
> > for_each_sched_entity(se) {
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 280a3c735935..a13d49d835ed 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -2341,6 +2341,18 @@ static inline unsigned int uclamp_util(struct rq *rq, unsigned int util)
> > }
> > #endif /* CONFIG_UCLAMP_TASK */
> >
> > +#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_UCLAMP_TASK)
>
> Why depend on CONFIG_ENERGY_MODEL? Laptops are battery powered and might not
> have this option enabled. Do we really need energy model here?
+1
> > +static inline bool iowait_boosted(struct task_struct *p)
> > +{
> > + return p->in_iowait && uclamp_eff_value(p, UCLAMP_MIN) > 0;
>
> I think this is overloading the usage of util clamp. You're basically using
> cpu.uclamp.min to temporarily switch iowait boost on/off.
>
> Isn't it better to add a new cgroup attribute to toggle this feature?
>
> The problem does seem generic enough and could benefit other battery-powered
> devices outside of the Android world. I don't think the dependency on uclamp &&
> energy model are necessary to solve this.
I think using uclamp is not a bad idea here, but perhaps we could do
things differently. As of today the iowait boost escapes the clamping
mechanism, so one option would be to change that. That would let us set
a low max clamp in the 'background' cgroup, which in turns would limit
the frequency request for those tasks even if they're IO-intensive.
That'll have to be done at the RQ level, but figuring out what's the
current max clamp on the rq should be doable from sugov I think.
Wei: would that work for your use case ?
Also, the iowait boost really should be per-task and not per-cpu, so it
can be taken into account during wake-up balance on big.LITTLE. That
might also help on SMP if a task doing a lot of IO migrates, as the
boost would migrate with it. But that's perhaps for later ...
Thanks,
Quentin
On 24/01/2020 09:51, Quentin Perret wrote:
>>> +static inline bool iowait_boosted(struct task_struct *p)
>>> +{
>>> + return p->in_iowait && uclamp_eff_value(p, UCLAMP_MIN) > 0;
>>
>> I think this is overloading the usage of util clamp. You're basically using
>> cpu.uclamp.min to temporarily switch iowait boost on/off.
>>
>> Isn't it better to add a new cgroup attribute to toggle this feature?
>>
>> The problem does seem generic enough and could benefit other battery-powered
>> devices outside of the Android world. I don't think the dependency on uclamp &&
>> energy model are necessary to solve this.
>
> I think using uclamp is not a bad idea here, but perhaps we could do
> things differently. As of today the iowait boost escapes the clamping
> mechanism, so one option would be to change that. That would let us set
> a low max clamp in the 'background' cgroup, which in turns would limit
> the frequency request for those tasks even if they're IO-intensive.
>
So I'm pretty sure we *do* want tasks with the default clamps to get iowait
boost'd. What we don't want are background tasks driving up the frequency,
and that should be via uclamp.max (as Quentin is suggesting) rather than
uclamp.min (as is suggested in the patch).
Now, whether that is overloading the usage of uclamp... I'm not sure.
One of the argument for uclamp was actually frequency selection, so if
we just make iowait boost respect that, IOW not boost further than
uclamp.max (which is a bit better than a simple on/off switch), that
wouldn't be too crazy I think.
> That'll have to be done at the RQ level, but figuring out what's the
> current max clamp on the rq should be doable from sugov I think.
>
> Wei: would that work for your use case ?
>
> Also, the iowait boost really should be per-task and not per-cpu, so it
> can be taken into account during wake-up balance on big.LITTLE. That
> might also help on SMP if a task doing a lot of IO migrates, as the
> boost would migrate with it. But that's perhaps for later ...
>
> Thanks,
> Quentin
>
On 01/24/20 11:01, Valentin Schneider wrote:
> On 24/01/2020 09:51, Quentin Perret wrote:
> >>> +static inline bool iowait_boosted(struct task_struct *p)
> >>> +{
> >>> + return p->in_iowait && uclamp_eff_value(p, UCLAMP_MIN) > 0;
> >>
> >> I think this is overloading the usage of util clamp. You're basically using
> >> cpu.uclamp.min to temporarily switch iowait boost on/off.
> >>
> >> Isn't it better to add a new cgroup attribute to toggle this feature?
> >>
> >> The problem does seem generic enough and could benefit other battery-powered
> >> devices outside of the Android world. I don't think the dependency on uclamp &&
> >> energy model are necessary to solve this.
> >
> > I think using uclamp is not a bad idea here, but perhaps we could do
> > things differently. As of today the iowait boost escapes the clamping
> > mechanism, so one option would be to change that. That would let us set
> > a low max clamp in the 'background' cgroup, which in turns would limit
> > the frequency request for those tasks even if they're IO-intensive.
> >
>
> So I'm pretty sure we *do* want tasks with the default clamps to get iowait
> boost'd. What we don't want are background tasks driving up the frequency,
> and that should be via uclamp.max (as Quentin is suggesting) rather than
> uclamp.min (as is suggested in the patch).
>
> Now, whether that is overloading the usage of uclamp... I'm not sure.
> One of the argument for uclamp was actually frequency selection, so if
> we just make iowait boost respect that, IOW not boost further than
> uclamp.max (which is a bit better than a simple on/off switch), that
> wouldn't be too crazy I think.
Capping iowait boost value in schedutil based on uclamp makes sense indeed.
What didn't make sense to me is the use of uclamp as a switch to toggle iowait
boost on/off.
Cheers
--
Qais Yousef
On Fri, Jan 24, 2020 at 3:01 AM Valentin Schneider
<[email protected]> wrote:
>
> On 24/01/2020 09:51, Quentin Perret wrote:
> >>> +static inline bool iowait_boosted(struct task_struct *p)
> >>> +{
> >>> + return p->in_iowait && uclamp_eff_value(p, UCLAMP_MIN) > 0;
> >>
> >> I think this is overloading the usage of util clamp. You're basically using
> >> cpu.uclamp.min to temporarily switch iowait boost on/off.
> >>
> >> Isn't it better to add a new cgroup attribute to toggle this feature?
> >>
> >> The problem does seem generic enough and could benefit other battery-powered
> >> devices outside of the Android world. I don't think the dependency on uclamp &&
> >> energy model are necessary to solve this.
> >
> > I think using uclamp is not a bad idea here, but perhaps we could do
> > things differently. As of today the iowait boost escapes the clamping
> > mechanism, so one option would be to change that. That would let us set
> > a low max clamp in the 'background' cgroup, which in turns would limit
> > the frequency request for those tasks even if they're IO-intensive.
> >
Something we see e.g. is f2fs's gc thread, and my thought on this is
instead of chasing everything down, it is a lot easier for us to only
boost what we know in foreground (and on Android we sort of have that
information on hand from framework).
I was hesitant to introduce a new switch ( e.g. Android's older EAS
kernel prefer_idle toggle ) but would be happy to do that if someone
supports that idea.
>
>
>
> So I'm pretty sure we *do* want tasks with the default clamps to get iowait
> boost'd. What we don't want are background tasks driving up the frequency,
> and that should be via uclamp.max (as Quentin is suggesting) rather than
> uclamp.min (as is suggested in the patch).
>
> Now, whether that is overloading the usage of uclamp... I'm not sure.
> One of the argument for uclamp was actually frequency selection, so if
> we just make iowait boost respect that, IOW not boost further than
> uclamp.max (which is a bit better than a simple on/off switch), that
> wouldn't be too crazy I think.
>
>
> > That'll have to be done at the RQ level, but figuring out what's the
> > current max clamp on the rq should be doable from sugov I think.
> >
> > Wei: would that work for your use case ?
> >
> > Also, the iowait boost really should be per-task and not per-cpu, so it
> > can be taken into account during wake-up balance on big.LITTLE. That
> > might also help on SMP if a task doing a lot of IO migrates, as the
> > boost would migrate with it. But that's perhaps for later ...
> >
> > Thanks,
> > Quentin
> >
On Fri, Jan 24, 2020 at 3:30 AM Qais Yousef <[email protected]> wrote:
>
> On 01/24/20 11:01, Valentin Schneider wrote:
> > On 24/01/2020 09:51, Quentin Perret wrote:
> > >>> +static inline bool iowait_boosted(struct task_struct *p)
> > >>> +{
> > >>> + return p->in_iowait && uclamp_eff_value(p, UCLAMP_MIN) > 0;
> > >>
> > >> I think this is overloading the usage of util clamp. You're basically using
> > >> cpu.uclamp.min to temporarily switch iowait boost on/off.
> > >>
> > >> Isn't it better to add a new cgroup attribute to toggle this feature?
> > >>
> > >> The problem does seem generic enough and could benefit other battery-powered
> > >> devices outside of the Android world. I don't think the dependency on uclamp &&
> > >> energy model are necessary to solve this.
> > >
> > > I think using uclamp is not a bad idea here, but perhaps we could do
> > > things differently. As of today the iowait boost escapes the clamping
> > > mechanism, so one option would be to change that. That would let us set
> > > a low max clamp in the 'background' cgroup, which in turns would limit
> > > the frequency request for those tasks even if they're IO-intensive.
> > >
> >
> > So I'm pretty sure we *do* want tasks with the default clamps to get iowait
> > boost'd. What we don't want are background tasks driving up the frequency,
> > and that should be via uclamp.max (as Quentin is suggesting) rather than
> > uclamp.min (as is suggested in the patch).
> >
> > Now, whether that is overloading the usage of uclamp... I'm not sure.
> > One of the argument for uclamp was actually frequency selection, so if
> > we just make iowait boost respect that, IOW not boost further than
> > uclamp.max (which is a bit better than a simple on/off switch), that
> > wouldn't be too crazy I think.
>
> Capping iowait boost value in schedutil based on uclamp makes sense indeed.
>
> What didn't make sense to me is the use of uclamp as a switch to toggle iowait
> boost on/off.
>
> Cheers
>
> --
> Qais Yousef
Sounds like we all agree on adding a new toggle, so will move forward
with that then.
For capping iowait boost, it should be a seperate patch. I am not sure
if we want to apply what's the current max clamp on the rq but I do
see the per-task iowait boost makes sense.
On 01/24/20 12:55, Wei Wang wrote:
> > > So I'm pretty sure we *do* want tasks with the default clamps to get iowait
> > > boost'd. What we don't want are background tasks driving up the frequency,
> > > and that should be via uclamp.max (as Quentin is suggesting) rather than
> > > uclamp.min (as is suggested in the patch).
> > >
> > > Now, whether that is overloading the usage of uclamp... I'm not sure.
> > > One of the argument for uclamp was actually frequency selection, so if
> > > we just make iowait boost respect that, IOW not boost further than
> > > uclamp.max (which is a bit better than a simple on/off switch), that
> > > wouldn't be too crazy I think.
> >
> > Capping iowait boost value in schedutil based on uclamp makes sense indeed.
> >
> > What didn't make sense to me is the use of uclamp as a switch to toggle iowait
> > boost on/off.
>
> Sounds like we all agree on adding a new toggle, so will move forward
> with that then.
Looking more closely at iowait boost, it's not actually a generic cpufreq
attribute. Only schedutil and intel_pstate have it. Other governors might
implement something similar but under a different name.
So I'm not sure how easy it'd be to implement a generic toggle for something
that probably should be considered an implementation detail of a governor and
userspace shouldn't care much about.
Of course, the maintainers might have a different opinion. So don't let mine
discourage you from pursuing this further! :-)
> For capping iowait boost, it should be a seperate patch. I am not sure
> if we want to apply what's the current max clamp on the rq but I do
> see the per-task iowait boost makes sense.
It is true the 2 patches are orthogonal, but if you already cap the max
frequencies the background task can use, by ensuring the iowait_boost in
schedutil respects the uclamp restrictions then this should solve your problem
too, no?
The patch below only compile tested.
background/cpu.uclamp.max = 200 # Cap background tasks max frequencies
---
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 9b8916fd00a2..a76c02eecdaf 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -421,7 +421,8 @@ static unsigned long sugov_iowait_apply(struct sugov_cpu *sg_cpu, u64 time,
* into the same scale so we can compare.
*/
boost = (sg_cpu->iowait_boost * max) >> SCHED_CAPACITY_SHIFT;
- return max(boost, util);
+ boost = max(boost, util);
+ return uclamp_util_with(cpu_rq(sg_cpu->cpu), boost, NULL);
}
#ifdef CONFIG_NO_HZ_COMMON
--
Qais Yousef
On 01/25/20 23:59, Qais Yousef wrote:
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 9b8916fd00a2..a76c02eecdaf 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -421,7 +421,8 @@ static unsigned long sugov_iowait_apply(struct sugov_cpu *sg_cpu, u64 time,
> * into the same scale so we can compare.
> */
> boost = (sg_cpu->iowait_boost * max) >> SCHED_CAPACITY_SHIFT;
> - return max(boost, util);
> + boost = max(boost, util);
> + return uclamp_util_with(cpu_rq(sg_cpu->cpu), boost, NULL);
> }
>
> #ifdef CONFIG_NO_HZ_COMMON
Forgot to mention this will work if the background task is the only task
running on this CPU. Like Quentin already pointed out, the iowait_boost might
need more massaging to allow finer per task control.
--
Qais Yousef
On Sat, Jan 25, 2020 at 3:59 PM Qais Yousef <[email protected]> wrote:
>
> On 01/24/20 12:55, Wei Wang wrote:
> > > > So I'm pretty sure we *do* want tasks with the default clamps to get iowait
> > > > boost'd. What we don't want are background tasks driving up the frequency,
> > > > and that should be via uclamp.max (as Quentin is suggesting) rather than
> > > > uclamp.min (as is suggested in the patch).
> > > >
> > > > Now, whether that is overloading the usage of uclamp... I'm not sure.
> > > > One of the argument for uclamp was actually frequency selection, so if
> > > > we just make iowait boost respect that, IOW not boost further than
> > > > uclamp.max (which is a bit better than a simple on/off switch), that
> > > > wouldn't be too crazy I think.
> > >
> > > Capping iowait boost value in schedutil based on uclamp makes sense indeed.
> > >
> > > What didn't make sense to me is the use of uclamp as a switch to toggle iowait
> > > boost on/off.
> >
> > Sounds like we all agree on adding a new toggle, so will move forward
> > with that then.
>
> Looking more closely at iowait boost, it's not actually a generic cpufreq
> attribute. Only schedutil and intel_pstate have it. Other governors might
> implement something similar but under a different name.
>
> So I'm not sure how easy it'd be to implement a generic toggle for something
> that probably should be considered an implementation detail of a governor and
> userspace shouldn't care much about.
>
> Of course, the maintainers might have a different opinion. So don't let mine
> discourage you from pursuing this further! :-)
>
Indeed, that's why I was hesitate to add the toggle and wanted to
bring this up for Android common kernel first. :-)
> > For capping iowait boost, it should be a separate patch. I am not sure
> > if we want to apply what's the current max clamp on the rq but I do
> > see the per-task iowait boost makes sense.
>
> It is true the 2 patches are orthogonal, but if you already cap the max
> frequencies the background task can use, by ensuring the iowait_boost in
> schedutil respects the uclamp restrictions then this should solve your problem
> too, no?
>
We haven't tried on 5.4, and there is no uclamp policy placed for
background yet. Also I am not sure it is beneficial to do uclamp
restriction on those kernel threads (e.g. is f2fs's gc), but that is
an interesting experiment on power balance. Also for applying at rq
lever, what if there are foreground tasks (and it could be the case
sometimes).
> The patch below only compile tested.
>
>
> background/cpu.uclamp.max = 200 # Cap background tasks max frequencies
>
> ---
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 9b8916fd00a2..a76c02eecdaf 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -421,7 +421,8 @@ static unsigned long sugov_iowait_apply(struct sugov_cpu *sg_cpu, u64 time,
> * into the same scale so we can compare.
> */
> boost = (sg_cpu->iowait_boost * max) >> SCHED_CAPACITY_SHIFT;
> - return max(boost, util);
> + boost = max(boost, util);
> + return uclamp_util_with(cpu_rq(sg_cpu->cpu), boost, NULL);
> }
>
> #ifdef CONFIG_NO_HZ_COMMON
>
> --
> Qais Yousef