On 30-11-17, 11:47, Patrick Bellasi wrote:
> diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
> index d1ad3d825561..bb5f778db023 100644
> --- a/include/linux/sched/cpufreq.h
> +++ b/include/linux/sched/cpufreq.h
> @@ -11,6 +11,7 @@
> #define SCHED_CPUFREQ_RT (1U << 0)
> #define SCHED_CPUFREQ_DL (1U << 1)
> #define SCHED_CPUFREQ_IOWAIT (1U << 2)
> +#define SCHED_CPUFREQ_IDLE (1U << 3)
>
> #define SCHED_CPUFREQ_RT_DL (SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 2f52ec0f1539..67339ccb5595 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>
> sg_cpu->util = util;
> sg_cpu->max = max;
> +
> + /* CPU is entering IDLE, reset flags without triggering an update */
> + if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> + sg_cpu->flags = 0;
> + goto done;
> + }
> sg_cpu->flags = flags;
>
> sugov_set_iowait_boost(sg_cpu, time, flags);
> @@ -361,6 +367,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> sugov_update_commit(sg_policy, time, next_f);
> }
>
> +done:
> raw_spin_unlock(&sg_policy->update_lock);
> }
>
> diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> index d518664cce4f..6e8ae2aa7a13 100644
> --- a/kernel/sched/idle_task.c
> +++ b/kernel/sched/idle_task.c
> @@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
> put_prev_task(rq, prev);
> update_idle_core(rq);
> schedstat_inc(rq->sched_goidle);
> +
> + /* kick cpufreq (see the comment in kernel/sched/sched.h). */
> + cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);
We posted some comments on V2 for this particular patch suggesting
some improvements. The patch hasn't changed at all and you haven't
replied to few of those suggestions as well. Any particular reason for
that?
For example:
- I suggested to get rid of the conditional expression in
cpufreq_schedutil.c file that you have added.
- And Joel suggested to clear the RT/DL flags from dequeue path to
avoid adding SCHED_CPUFREQ_IDLE flag.
--
viresh
Hi Viresh,
On 07-Dec 10:31, Viresh Kumar wrote:
> On 30-11-17, 11:47, Patrick Bellasi wrote:
> > diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
> > index d1ad3d825561..bb5f778db023 100644
> > --- a/include/linux/sched/cpufreq.h
> > +++ b/include/linux/sched/cpufreq.h
> > @@ -11,6 +11,7 @@
> > #define SCHED_CPUFREQ_RT (1U << 0)
> > #define SCHED_CPUFREQ_DL (1U << 1)
> > #define SCHED_CPUFREQ_IOWAIT (1U << 2)
> > +#define SCHED_CPUFREQ_IDLE (1U << 3)
> >
> > #define SCHED_CPUFREQ_RT_DL (SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 2f52ec0f1539..67339ccb5595 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> >
> > sg_cpu->util = util;
> > sg_cpu->max = max;
> > +
> > + /* CPU is entering IDLE, reset flags without triggering an update */
> > + if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> > + sg_cpu->flags = 0;
> > + goto done;
> > + }
> > sg_cpu->flags = flags;
> >
> > sugov_set_iowait_boost(sg_cpu, time, flags);
> > @@ -361,6 +367,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> > sugov_update_commit(sg_policy, time, next_f);
> > }
> >
> > +done:
> > raw_spin_unlock(&sg_policy->update_lock);
> > }
> >
> > diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> > index d518664cce4f..6e8ae2aa7a13 100644
> > --- a/kernel/sched/idle_task.c
> > +++ b/kernel/sched/idle_task.c
> > @@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
> > put_prev_task(rq, prev);
> > update_idle_core(rq);
> > schedstat_inc(rq->sched_goidle);
> > +
> > + /* kick cpufreq (see the comment in kernel/sched/sched.h). */
> > + cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);
>
> We posted some comments on V2 for this particular patch suggesting
> some improvements. The patch hasn't changed at all and you haven't
> replied to few of those suggestions as well. Any particular reason for
> that?
You right, since the previous posting has been a long time ago, with
this one I mainly wanted to refresh the discussion. Thanks for
highlighting hereafter which one was the main discussion points.
> For example:
> - I suggested to get rid of the conditional expression in
> cpufreq_schedutil.c file that you have added.
We can probably set flags to SCHED_CPUFREQ_IDLE (instead of resetting
them), however I think we still need an if condition somewhere.
Indeed, when SCHED_CPUFREQ_IDLE is asserted we don't want to trigger
an OPP change (reasons described in the changelog).
If that's still a goal, then we will need to check this flag and bail
out from sugov_update_shared straight away. That's why I've added a
check at the beginning and also defined it as unlikely to have not
impact on all cases where we call a schedutil update with runnable
tasks.
Does this makes sense?
> - And Joel suggested to clear the RT/DL flags from dequeue path to
> avoid adding SCHED_CPUFREQ_IDLE flag.
I had a thought about Joel's proposal:
>> wouldn't another way be to just clear the flag from the RT scheduling
>> class with an extra call to cpufreq_update_util with flags = 0 during
>> dequeue_rt_entity?
The main concern for me was that the current API is completely
transparent about which scheduling class is calling schedutil for
updates.
Thus, at dequeue time of an RT task we cannot really clear
all the flags (e.g. IOWAIT of a fair task), we should clear only
the RT related flags.
This means that we likely need to implement Joel's idea by:
1. adding a new set of flags like:
SCHED_CPUFREQ_RT_IDLE, SCHED_CPUFREQ_DL_IDLE, etc...
3. add an operation flag, e.g.
SCHED_CPUFERQ_SET, SCHED_CPUFREQ_RESET to be ORed with the class
flag, e.g.
cpufreq_update_util(rq, SCHED_CPUFREQ_SET|SCHED_CPUFREQ_RT);
3. change the API to carry the operation required for a flag, e.g.:
cpufreq_update_util(rq, flag, set={true, false});
To be honest I don't like any of those, especially compared to the
simplicity of the one proposed by this patch. :)
IMO, the only pitfall of this patch is that (as Juri pointed out in
v2) for DL it can happen that we do not want to reset the flag right
when a CPU enters IDLE. We need instead a specific call to reset the
DL flag at the 0-lag time.
However, AFAIU, this special case for DL will disappear as long as we
have last Juri's set [1]in. Indeed, at this point, schedutil will
always and only need to know the utilization required by DL.
[1] https://lkml.org/lkml/2017/12/4/173
Cheers Patrick
--
#include <best/regards.h>
Patrick Bellasi
On 12/07/2017 01:45 PM, Patrick Bellasi wrote:
> Hi Viresh,
>
> On 07-Dec 10:31, Viresh Kumar wrote:
>> On 30-11-17, 11:47, Patrick Bellasi wrote:
[...]
>> We posted some comments on V2 for this particular patch suggesting
>> some improvements. The patch hasn't changed at all and you haven't
>> replied to few of those suggestions as well. Any particular reason for
>> that?
>
> You right, since the previous posting has been a long time ago, with
> this one I mainly wanted to refresh the discussion. Thanks for
> highlighting hereafter which one was the main discussion points.
>
>
>> For example:
>> - I suggested to get rid of the conditional expression in
>> cpufreq_schedutil.c file that you have added.
>
> We can probably set flags to SCHED_CPUFREQ_IDLE (instead of resetting
> them), however I think we still need an if condition somewhere.
>
> Indeed, when SCHED_CPUFREQ_IDLE is asserted we don't want to trigger
> an OPP change (reasons described in the changelog).
>
> If that's still a goal, then we will need to check this flag and bail
> out from sugov_update_shared straight away. That's why I've added a
> check at the beginning and also defined it as unlikely to have not
> impact on all cases where we call a schedutil update with runnable
> tasks.
>
> Does this makes sense?
IIRC, there was also this question of doing this not only in the shared
but also in the single case ...
[...]
On 07-12-17, 12:45, Patrick Bellasi wrote:
> On 07-Dec 10:31, Viresh Kumar wrote:
> > We posted some comments on V2 for this particular patch suggesting
> > some improvements. The patch hasn't changed at all and you haven't
> > replied to few of those suggestions as well. Any particular reason for
> > that?
>
> You right, since the previous posting has been a long time ago, with
> this one I mainly wanted to refresh the discussion.
But posting again without making proposed changes or finishing earlier
discussion is normally not appreciated, specially by the very busy maintainers
(I am not one of those though :)). For example, I had to go look at previous
discussions now to see what all comments I gave and what has changed in respect
to them. That wasted lots of time as nothing really changed.
If you want to refresh the discussion, then its probably better to finish
discussions on the earlier threads only and then only send a new version.
> > For example:
> > - I suggested to get rid of the conditional expression in
> > cpufreq_schedutil.c file that you have added.
>
> We can probably set flags to SCHED_CPUFREQ_IDLE (instead of resetting
> them), however I think we still need an if condition somewhere.
Yeah, my point was to have similar behavior in single and shared policy cases.
> > - And Joel suggested to clear the RT/DL flags from dequeue path to
> > avoid adding SCHED_CPUFREQ_IDLE flag.
>
> I had a thought about Joel's proposal:
>
> >> wouldn't another way be to just clear the flag from the RT scheduling
> >> class with an extra call to cpufreq_update_util with flags = 0 during
> >> dequeue_rt_entity?
>
> The main concern for me was that the current API is completely
> transparent about which scheduling class is calling schedutil for
> updates.
I think its important to fix the basic mechanism of util update than fixing
corner cases with workarounds. I attempted a simpler approach (at least
according to me :)). Please share your feedback on it. You can include that as
part of your series, or I can send it separately if everyone finds it okay.
--
viresh
-------------------------8<-------------------------
From: Viresh Kumar <[email protected]>
Date: Tue, 12 Dec 2017 15:43:26 +0530
Subject: [PATCH] sched: Keep track of cpufreq utilization update flags
Currently the schedutil governor overwrites the sg_cpu->flags field on
every call to the utilization handler. It was pretty good as the initial
implementation of utilization handlers, there are several drawbacks
though.
The biggest drawback is that the sg_cpu->flags field doesn't always
represent the correct type of tasks that are enqueued on a CPU's rq. For
example, if a fair task is enqueued while a RT or DL task is running, we
will overwrite the flags with value 0 and that may take the CPU to lower
OPPs unintentionally. There can be other corner cases as well which we
aren't aware of currently.
This patch changes the current implementation to keep track of all the
task types that are currently enqueued to the CPUs rq. There are two
flags for every scheduling class now, one to set the flag and other one
to clear it. The flag is set by the scheduling classes from the existing
set of calls to cpufreq_update_util(), and the flag is cleared when the
last task of the scheduling class is dequeued. For now, the util update
handlers return immediately if they were called to clear the flag.
We can add more optimizations over this patch separately.
The last parameter of sugov_set_iowait_boost() is also dropped as the
function can get it from sg_cpu anyway.
Signed-off-by: Viresh Kumar <[email protected]>
---
include/linux/sched/cpufreq.h | 7 ++++++-
kernel/sched/cpufreq_schedutil.c | 32 +++++++++++++++++++++++---------
kernel/sched/deadline.c | 4 ++++
kernel/sched/fair.c | 8 ++++++--
kernel/sched/rt.c | 4 ++++
5 files changed, 43 insertions(+), 12 deletions(-)
diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
index d1ad3d825561..dc2470affea4 100644
--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -8,9 +8,14 @@
* Interface between cpufreq drivers and the scheduler:
*/
+#define SCHED_CPUFREQ_CLEAR (1U << 31)
#define SCHED_CPUFREQ_RT (1U << 0)
+#define SCHED_CPUFREQ_RT_CLEAR (SCHED_CPUFREQ_RT | SCHED_CPUFREQ_CLEAR)
#define SCHED_CPUFREQ_DL (1U << 1)
-#define SCHED_CPUFREQ_IOWAIT (1U << 2)
+#define SCHED_CPUFREQ_DL_CLEAR (SCHED_CPUFREQ_DL | SCHED_CPUFREQ_CLEAR)
+#define SCHED_CPUFREQ_CFS (1U << 2)
+#define SCHED_CPUFREQ_CFS_CLEAR (SCHED_CPUFREQ_CFS | SCHED_CPUFREQ_CLEAR)
+#define SCHED_CPUFREQ_IOWAIT (1U << 3)
#define SCHED_CPUFREQ_RT_DL (SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 2f52ec0f1539..7edfdc59ee8f 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -187,10 +187,11 @@ static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
*max = cfs_max;
}
-static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
- unsigned int flags)
+static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time)
{
- if (flags & SCHED_CPUFREQ_IOWAIT) {
+ if (sg_cpu->flags & SCHED_CPUFREQ_IOWAIT) {
+ sg_cpu->flags &= ~SCHED_CPUFREQ_IOWAIT;
+
if (sg_cpu->iowait_boost_pending)
return;
@@ -264,7 +265,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
unsigned int next_f;
bool busy;
- sugov_set_iowait_boost(sg_cpu, time, flags);
+ if (unlikely(flags & SCHED_CPUFREQ_CLEAR)) {
+ sg_cpu->flags &= ~flags;
+ return;
+ }
+
+ sg_cpu->flags |= flags;
+
+ sugov_set_iowait_boost(sg_cpu, time);
sg_cpu->last_update = time;
if (!sugov_should_update_freq(sg_policy, time))
@@ -272,7 +280,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
busy = sugov_cpu_is_busy(sg_cpu);
- if (flags & SCHED_CPUFREQ_RT_DL) {
+ if (sg_cpu->flags & SCHED_CPUFREQ_RT_DL) {
next_f = policy->cpuinfo.max_freq;
} else {
sugov_get_util(&util, &max, sg_cpu->cpu);
@@ -345,15 +353,20 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
raw_spin_lock(&sg_policy->update_lock);
+ if (unlikely(flags & SCHED_CPUFREQ_CLEAR)) {
+ sg_cpu->flags &= ~flags;
+ goto unlock;
+ }
+
sg_cpu->util = util;
sg_cpu->max = max;
- sg_cpu->flags = flags;
+ sg_cpu->flags |= flags;
- sugov_set_iowait_boost(sg_cpu, time, flags);
+ sugov_set_iowait_boost(sg_cpu, time);
sg_cpu->last_update = time;
if (sugov_should_update_freq(sg_policy, time)) {
- if (flags & SCHED_CPUFREQ_RT_DL)
+ if (sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
next_f = sg_policy->policy->cpuinfo.max_freq;
else
next_f = sugov_next_freq_shared(sg_cpu, time);
@@ -361,6 +374,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
sugov_update_commit(sg_policy, time, next_f);
}
+unlock:
raw_spin_unlock(&sg_policy->update_lock);
}
@@ -655,7 +669,7 @@ static int sugov_start(struct cpufreq_policy *policy)
memset(sg_cpu, 0, sizeof(*sg_cpu));
sg_cpu->cpu = cpu;
sg_cpu->sg_policy = sg_policy;
- sg_cpu->flags = SCHED_CPUFREQ_RT;
+ sg_cpu->flags = 0;
sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
}
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 2473736c7616..d9c7c6887493 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1472,6 +1472,10 @@ static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
*/
if (flags & DEQUEUE_SLEEP)
task_non_contending(p);
+
+ /* Clear cpufreq flags after last deadline task is dequeued */
+ if (!rq->dl.dl_nr_running)
+ cpufreq_update_util(rq, SCHED_CPUFREQ_DL_CLEAR);
}
/*
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2915c0d95107..492188c3ee2d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3033,7 +3033,7 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
*
* See cpu_util().
*/
- cpufreq_update_util(rq, 0);
+ cpufreq_update_util(rq, SCHED_CPUFREQ_CFS);
}
}
@@ -5214,7 +5214,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
* passed.
*/
if (p->in_iowait)
- cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT);
+ cpufreq_update_util(rq, SCHED_CPUFREQ_CFS | SCHED_CPUFREQ_IOWAIT);
for_each_sched_entity(se) {
if (se->on_rq)
@@ -5309,6 +5309,10 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
sub_nr_running(rq, 1);
hrtick_update(rq);
+
+ /* Clear cpufreq flags after last CFS task is dequeued */
+ if (!rq->cfs.nr_running)
+ cpufreq_update_util(rq, SCHED_CPUFREQ_CFS_CLEAR);
}
#ifdef CONFIG_SMP
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4056c19ca3f0..73131abd9df6 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1337,6 +1337,10 @@ static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
dequeue_rt_entity(rt_se, flags);
dequeue_pushable_task(rq, p);
+
+ /* Clear cpufreq flags after last rt task is dequeued */
+ if (!rq->rt.rt_nr_running)
+ cpufreq_update_util(rq, SCHED_CPUFREQ_RT_CLEAR);
}
/*
Hi Viresh,
On 12/12/17 17:07, Viresh Kumar wrote:
[...]
> From: Viresh Kumar <[email protected]>
> Date: Tue, 12 Dec 2017 15:43:26 +0530
> Subject: [PATCH] sched: Keep track of cpufreq utilization update flags
>
> Currently the schedutil governor overwrites the sg_cpu->flags field on
> every call to the utilization handler. It was pretty good as the initial
> implementation of utilization handlers, there are several drawbacks
> though.
>
> The biggest drawback is that the sg_cpu->flags field doesn't always
> represent the correct type of tasks that are enqueued on a CPU's rq. For
> example, if a fair task is enqueued while a RT or DL task is running, we
> will overwrite the flags with value 0 and that may take the CPU to lower
> OPPs unintentionally. There can be other corner cases as well which we
> aren't aware of currently.
>
> This patch changes the current implementation to keep track of all the
> task types that are currently enqueued to the CPUs rq. There are two
> flags for every scheduling class now, one to set the flag and other one
> to clear it. The flag is set by the scheduling classes from the existing
> set of calls to cpufreq_update_util(), and the flag is cleared when the
> last task of the scheduling class is dequeued. For now, the util update
> handlers return immediately if they were called to clear the flag.
>
> We can add more optimizations over this patch separately.
>
> The last parameter of sugov_set_iowait_boost() is also dropped as the
> function can get it from sg_cpu anyway.
>
> Signed-off-by: Viresh Kumar <[email protected]>
[...]
> @@ -655,7 +669,7 @@ static int sugov_start(struct cpufreq_policy *policy)
> memset(sg_cpu, 0, sizeof(*sg_cpu));
> sg_cpu->cpu = cpu;
> sg_cpu->sg_policy = sg_policy;
> - sg_cpu->flags = SCHED_CPUFREQ_RT;
> + sg_cpu->flags = 0;
> sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
> }
Why this change during initialization?
Thanks,
- Juri
On 12-12-17, 14:38, Juri Lelli wrote:
> Hi Viresh,
>
> On 12/12/17 17:07, Viresh Kumar wrote:
>
> [...]
>
> > From: Viresh Kumar <[email protected]>
> > Date: Tue, 12 Dec 2017 15:43:26 +0530
> > Subject: [PATCH] sched: Keep track of cpufreq utilization update flags
> >
> > Currently the schedutil governor overwrites the sg_cpu->flags field on
> > every call to the utilization handler. It was pretty good as the initial
> > implementation of utilization handlers, there are several drawbacks
> > though.
> >
> > The biggest drawback is that the sg_cpu->flags field doesn't always
> > represent the correct type of tasks that are enqueued on a CPU's rq. For
> > example, if a fair task is enqueued while a RT or DL task is running, we
> > will overwrite the flags with value 0 and that may take the CPU to lower
> > OPPs unintentionally. There can be other corner cases as well which we
> > aren't aware of currently.
> >
> > This patch changes the current implementation to keep track of all the
> > task types that are currently enqueued to the CPUs rq. There are two
> > flags for every scheduling class now, one to set the flag and other one
> > to clear it. The flag is set by the scheduling classes from the existing
> > set of calls to cpufreq_update_util(), and the flag is cleared when the
> > last task of the scheduling class is dequeued. For now, the util update
> > handlers return immediately if they were called to clear the flag.
> >
> > We can add more optimizations over this patch separately.
> >
> > The last parameter of sugov_set_iowait_boost() is also dropped as the
> > function can get it from sg_cpu anyway.
> >
> > Signed-off-by: Viresh Kumar <[email protected]>
>
> [...]
>
> > @@ -655,7 +669,7 @@ static int sugov_start(struct cpufreq_policy *policy)
> > memset(sg_cpu, 0, sizeof(*sg_cpu));
> > sg_cpu->cpu = cpu;
> > sg_cpu->sg_policy = sg_policy;
> > - sg_cpu->flags = SCHED_CPUFREQ_RT;
> > + sg_cpu->flags = 0;
> > sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
> > }
>
> Why this change during initialization?
Firstly I am not sure why it was set to SCHED_CPUFREQ_RT, as schedutil wouldn't
change the frequency until the first time the util handler is called. And once
that is called we were updating the flag anyway. So, unless I misunderstood its
purpose, it was doing anything helpful.
I need to remove it otherwise the RT flag may remain set for a very long time
unnecessarily. That would be until the time the last RT task is not dequeued.
Consider this for example: we are at max freq when sugov_start() is called and
it sets the RT flag, but there is no RT task to run. Now, we have tons of CFS
tasks but we always keep running at max because of the flag. Even the schedutil
RT thread doesn't get a chance to run/deququed, because we never want a freq
change with the RT flag and stay at max.
Makes sense ?
--
viresh
On 12/12/17 20:10, Viresh Kumar wrote:
> On 12-12-17, 14:38, Juri Lelli wrote:
> > Hi Viresh,
> >
> > On 12/12/17 17:07, Viresh Kumar wrote:
> >
> > [...]
> >
> > > From: Viresh Kumar <[email protected]>
> > > Date: Tue, 12 Dec 2017 15:43:26 +0530
> > > Subject: [PATCH] sched: Keep track of cpufreq utilization update flags
> > >
> > > Currently the schedutil governor overwrites the sg_cpu->flags field on
> > > every call to the utilization handler. It was pretty good as the initial
> > > implementation of utilization handlers, there are several drawbacks
> > > though.
> > >
> > > The biggest drawback is that the sg_cpu->flags field doesn't always
> > > represent the correct type of tasks that are enqueued on a CPU's rq. For
> > > example, if a fair task is enqueued while a RT or DL task is running, we
> > > will overwrite the flags with value 0 and that may take the CPU to lower
> > > OPPs unintentionally. There can be other corner cases as well which we
> > > aren't aware of currently.
> > >
> > > This patch changes the current implementation to keep track of all the
> > > task types that are currently enqueued to the CPUs rq. There are two
> > > flags for every scheduling class now, one to set the flag and other one
> > > to clear it. The flag is set by the scheduling classes from the existing
> > > set of calls to cpufreq_update_util(), and the flag is cleared when the
> > > last task of the scheduling class is dequeued. For now, the util update
> > > handlers return immediately if they were called to clear the flag.
> > >
> > > We can add more optimizations over this patch separately.
> > >
> > > The last parameter of sugov_set_iowait_boost() is also dropped as the
> > > function can get it from sg_cpu anyway.
> > >
> > > Signed-off-by: Viresh Kumar <[email protected]>
> >
> > [...]
> >
> > > @@ -655,7 +669,7 @@ static int sugov_start(struct cpufreq_policy *policy)
> > > memset(sg_cpu, 0, sizeof(*sg_cpu));
> > > sg_cpu->cpu = cpu;
> > > sg_cpu->sg_policy = sg_policy;
> > > - sg_cpu->flags = SCHED_CPUFREQ_RT;
> > > + sg_cpu->flags = 0;
> > > sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
> > > }
> >
> > Why this change during initialization?
>
> Firstly I am not sure why it was set to SCHED_CPUFREQ_RT, as schedutil wouldn't
> change the frequency until the first time the util handler is called. And once
> that is called we were updating the flag anyway. So, unless I misunderstood its
> purpose, it was doing anything helpful.
That was actually my understanding as well. Your patch made me notice
it.
>
> I need to remove it otherwise the RT flag may remain set for a very long time
> unnecessarily. That would be until the time the last RT task is not dequeued.
> Consider this for example: we are at max freq when sugov_start() is called and
> it sets the RT flag, but there is no RT task to run. Now, we have tons of CFS
> tasks but we always keep running at max because of the flag. Even the schedutil
> RT thread doesn't get a chance to run/deququed, because we never want a freq
> change with the RT flag and stay at max.
>
> Makes sense ?
Yes. I guess it's working ok for now because of the problem this patch,
and Patrick's, address (always overwriting).
Thanks,
- Juri
Hi Viresh,
On 12-Dec 17:07, Viresh Kumar wrote:
> On 07-12-17, 12:45, Patrick Bellasi wrote:
> > On 07-Dec 10:31, Viresh Kumar wrote:
[...]
> I think its important to fix the basic mechanism of util update than fixing
> corner cases with workarounds. I attempted a simpler approach (at least
> according to me :)). Please share your feedback on it. You can include that as
> part of your series, or I can send it separately if everyone finds it okay.
please go on and post this patch on the list, all other patches from
my series can follow on top, later.
Hereafter inline are some comments on your patch...
>
> --
> viresh
>
> -------------------------8<-------------------------
> From: Viresh Kumar <[email protected]>
> Date: Tue, 12 Dec 2017 15:43:26 +0530
> Subject: [PATCH] sched: Keep track of cpufreq utilization update flags
>
> Currently the schedutil governor overwrites the sg_cpu->flags field on
> every call to the utilization handler. It was pretty good as the initial
> implementation of utilization handlers, there are several drawbacks
> though.
>
> The biggest drawback is that the sg_cpu->flags field doesn't always
> represent the correct type of tasks that are enqueued on a CPU's rq. For
> example, if a fair task is enqueued while a RT or DL task is running, we
> will overwrite the flags with value 0 and that may take the CPU to lower
> OPPs unintentionally. There can be other corner cases as well which we
> aren't aware of currently.
>
> This patch changes the current implementation to keep track of all the
> task types that are currently enqueued to the CPUs rq. There are two
> flags for every scheduling class now, one to set the flag and other one
> to clear it.
nit-pick: that's not completely correct, there is only one CLEAR flag
which is used to clear whatever other flags are passed in.
> The flag is set by the scheduling classes from the existing
> set of calls to cpufreq_update_util(), and the flag is cleared when the
> last task of the scheduling class is dequeued. For now, the util update
> handlers return immediately if they were called to clear the flag.
>
> We can add more optimizations over this patch separately.
>
> The last parameter of sugov_set_iowait_boost() is also dropped as the
> function can get it from sg_cpu anyway.
As I comment below, this should be on a different patch IMO.
> Signed-off-by: Viresh Kumar <[email protected]>
[...]
> @@ -8,9 +8,14 @@
> * Interface between cpufreq drivers and the scheduler:
> */
>
> +#define SCHED_CPUFREQ_CLEAR (1U << 31)
> #define SCHED_CPUFREQ_RT (1U << 0)
> +#define SCHED_CPUFREQ_RT_CLEAR (SCHED_CPUFREQ_RT | SCHED_CPUFREQ_CLEAR)
> #define SCHED_CPUFREQ_DL (1U << 1)
> -#define SCHED_CPUFREQ_IOWAIT (1U << 2)
> +#define SCHED_CPUFREQ_DL_CLEAR (SCHED_CPUFREQ_DL | SCHED_CPUFREQ_CLEAR)
> +#define SCHED_CPUFREQ_CFS (1U << 2)
> +#define SCHED_CPUFREQ_CFS_CLEAR (SCHED_CPUFREQ_CFS | SCHED_CPUFREQ_CLEAR)
> +#define SCHED_CPUFREQ_IOWAIT (1U << 3)
>
> #define SCHED_CPUFREQ_RT_DL (SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
Since you are already changing some flags position, maybe we can have
a better organization by using lower flags for "general bits" and
higher ones for class specific, i.e.
#define SCHED_CPUFREQ_CLEAR (1U << 0)
#define SCHED_CPUFREQ_IOWAIT (1U << 1)
#define SCHED_CPUFREQ_CFS (1U << 8)
#define SCHED_CPUFREQ_RT (1U << 9)
#define SCHED_CPUFREQ_DL (1U << 10)
#define SCHED_CPUFREQ_RT_DL (SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
#define SCHED_CPUFREQ_CFS_CLEAR (SCHED_CPUFREQ_CFS | SCHED_CPUFREQ_CLEAR)
#define SCHED_CPUFREQ_RT_CLEAR (SCHED_CPUFREQ_RT | SCHED_CPUFREQ_CLEAR)
#define SCHED_CPUFREQ_DL_CLEAR (SCHED_CPUFREQ_DL | SCHED_CPUFREQ_CLEAR)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 2f52ec0f1539..7edfdc59ee8f 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -187,10 +187,11 @@ static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
> *max = cfs_max;
> }
>
> -static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> - unsigned int flags)
> +static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time)
> {
> - if (flags & SCHED_CPUFREQ_IOWAIT) {
> + if (sg_cpu->flags & SCHED_CPUFREQ_IOWAIT) {
> + sg_cpu->flags &= ~SCHED_CPUFREQ_IOWAIT;
> +
This function should still work if we pass in flags as a parameter.
Thus, this looks like an change/optimization of the
sugov_set_iowait_boost API, which maybe should be better moved into a
separate patch on top of this one.
> if (sg_cpu->iowait_boost_pending)
> return;
>
[...]
> @@ -655,7 +669,7 @@ static int sugov_start(struct cpufreq_policy *policy)
> memset(sg_cpu, 0, sizeof(*sg_cpu));
> sg_cpu->cpu = cpu;
> sg_cpu->sg_policy = sg_policy;
> - sg_cpu->flags = SCHED_CPUFREQ_RT;
> + sg_cpu->flags = 0;
Juri already pointed out this change, why it's needed?
Perhaps a note in the changelog can be useful.
> sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
> }
>
[...]
--
#include <best/regards.h>
Patrick Bellasi
On 12-Dec 15:56, Juri Lelli wrote:
> On 12/12/17 20:10, Viresh Kumar wrote:
> > On 12-12-17, 14:38, Juri Lelli wrote:
> > > Hi Viresh,
> > >
> > > On 12/12/17 17:07, Viresh Kumar wrote:
> > >
> > > [...]
> > >
> > > > From: Viresh Kumar <[email protected]>
> > > > Date: Tue, 12 Dec 2017 15:43:26 +0530
> > > > Subject: [PATCH] sched: Keep track of cpufreq utilization update flags
> > > >
> > > > Currently the schedutil governor overwrites the sg_cpu->flags field on
> > > > every call to the utilization handler. It was pretty good as the initial
> > > > implementation of utilization handlers, there are several drawbacks
> > > > though.
> > > >
> > > > The biggest drawback is that the sg_cpu->flags field doesn't always
> > > > represent the correct type of tasks that are enqueued on a CPU's rq. For
> > > > example, if a fair task is enqueued while a RT or DL task is running, we
> > > > will overwrite the flags with value 0 and that may take the CPU to lower
> > > > OPPs unintentionally. There can be other corner cases as well which we
> > > > aren't aware of currently.
> > > >
> > > > This patch changes the current implementation to keep track of all the
> > > > task types that are currently enqueued to the CPUs rq. There are two
> > > > flags for every scheduling class now, one to set the flag and other one
> > > > to clear it. The flag is set by the scheduling classes from the existing
> > > > set of calls to cpufreq_update_util(), and the flag is cleared when the
> > > > last task of the scheduling class is dequeued. For now, the util update
> > > > handlers return immediately if they were called to clear the flag.
> > > >
> > > > We can add more optimizations over this patch separately.
> > > >
> > > > The last parameter of sugov_set_iowait_boost() is also dropped as the
> > > > function can get it from sg_cpu anyway.
> > > >
> > > > Signed-off-by: Viresh Kumar <[email protected]>
> > >
> > > [...]
> > >
> > > > @@ -655,7 +669,7 @@ static int sugov_start(struct cpufreq_policy *policy)
> > > > memset(sg_cpu, 0, sizeof(*sg_cpu));
> > > > sg_cpu->cpu = cpu;
> > > > sg_cpu->sg_policy = sg_policy;
> > > > - sg_cpu->flags = SCHED_CPUFREQ_RT;
> > > > + sg_cpu->flags = 0;
> > > > sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
> > > > }
> > >
> > > Why this change during initialization?
> >
> > Firstly I am not sure why it was set to SCHED_CPUFREQ_RT, as schedutil wouldn't
> > change the frequency until the first time the util handler is called. And once
> > that is called we were updating the flag anyway. So, unless I misunderstood its
> > purpose, it was doing anything helpful.
>
> That was actually my understanding as well. Your patch made me notice
> it.
>
> >
> > I need to remove it otherwise the RT flag may remain set for a very long time
> > unnecessarily. That would be until the time the last RT task is not dequeued.
> > Consider this for example: we are at max freq when sugov_start() is called and
> > it sets the RT flag, but there is no RT task to run. Now, we have tons of CFS
> > tasks but we always keep running at max because of the flag. Even the schedutil
> > RT thread doesn't get a chance to run/deququed, because we never want a freq
> > change with the RT flag and stay at max.
> >
> > Makes sense ?
>
> Yes. I guess it's working ok for now because of the problem this patch,
> and Patrick's, address (always overwriting).
Yes, makes sens to me too ;-)
--
#include <best/regards.h>
Patrick Bellasi
On 12-12-17, 15:16, Patrick Bellasi wrote:
> Since you are already changing some flags position, maybe we can have
> a better organization by using lower flags for "general bits" and
> higher ones for class specific, i.e.
>
> #define SCHED_CPUFREQ_CLEAR (1U << 0)
> #define SCHED_CPUFREQ_IOWAIT (1U << 1)
>
> #define SCHED_CPUFREQ_CFS (1U << 8)
> #define SCHED_CPUFREQ_RT (1U << 9)
> #define SCHED_CPUFREQ_DL (1U << 10)
Since we aren't normally going to add any more of class specific ones, I would
like to keep them as 0, 1 and 2. And using the top most bit for CLEAR seems
better to me somehow, just like the signed bit for 32 bit integers.
--
viresh