Here is an update of:
https://lkml.org/lkml/2018/5/10/346
Where, apart from some minor clean-ups, I've added a cpufreq_enqueue() wrapper
function for cpufreq_update_util() to be used in enqueue_task_fair().
That's because:
- it helps to keep more clean the enqueue_task_fair()
by collecting all cpufreq related code into a single function
- it fixes the flags masking issue reported by Peter
- it allows to add a compilation guard for !SMP systems
where we never migrate tasks and thus we should never set the
SCHED_CPUFREQ_MIGRATION flag.
The SCHED_CPUFREQ_MIGRATION flag was previously set from update_load_avg()
which is empty defined for !SMP. Now that we update schedutil at the end of
enqueue_task_fair, we have to specifically address this case.
Still, the resulting code, it's more clean in defining when schedutil is going
to be updated and under which conditions/flags.
Again, I want to call out that with the last patch there is one function change
wrt mainline. When the CPU controller is in use, and a task is queued into a
throttled RQ, the behavior of the IOWAIT flag is different.
Currently we set the flag (and thus boost the frequency to max) even tough we
know the task is not going to run. With the patch the flag is not set, and thus
the frequency not bumped, which seems to most correct behavior to me.
Moreover, the IOWAIT flag for this task is lost forever, since when the RQ in
unthrottled we do not walk the tasks to collect their flags. However, given the
task has been already delayed because of the throttled RQ, perhaps a delayed
IOWAIT boost also does not make a lot of sense. What do you think?
Finally, Viresh suggested some patches could/should have "stable" in
Cc but I was not entirely sure. Thus, I've left it as a maintainers call.
Cheers Patrick
Changes in v2:
- improved comment in enqueue_task_fair (Peter)
- fixed flags masking (Peter)
- fixed !CONFIG_SMP build (0-DAY)
by using a cpufreq_enqueue wrapper to avoid setting the
SCHED_CPUFREQ_MIGRATION flag on !SMP systems
- removed blank lines (Viresh)
- add "Acked-by" Viresh tag to the first two patches
Patrick Bellasi (3):
sched/cpufreq: always consider blocked FAIR utilization
sched/fair: util_est: update before schedutil
sched/fair: schedutil: explicit update only when required
kernel/sched/cpufreq_schedutil.c | 17 ++++----
kernel/sched/fair.c | 93 +++++++++++++++++++++-------------------
2 files changed, 56 insertions(+), 54 deletions(-)
--
2.15.1
When a task is enqueue the estimated utilization of a CPU is updated
to better support the selection of the required frequency.
However, schedutil is (implicitly) updated by update_load_avg() which
always happens before util_est_{en,de}queue(), thus potentially
introducing a latency between estimated utilization updates and
frequency selections.
Let's update util_est at the beginning of enqueue_task_fair(),
which will ensure that all schedutil updates will see the most
updated estimated utilization value for a CPU.
Reported-by: Vincent Guittot <[email protected]>
Signed-off-by: Patrick Bellasi <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT")
---
Changes in v2:
- improve comment in enqueue_task_fair() (Peter)
- add "Fixes" tag
- add "Acked-by" Viresh tag
---
kernel/sched/fair.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1f6a23a5b451..f01f0f395f9a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5356,6 +5356,14 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
struct cfs_rq *cfs_rq;
struct sched_entity *se = &p->se;
+ /*
+ * The code below (indirectly) updates schedutil which looks at
+ * the cfs_rq utilization to select a frequency.
+ * Let's add the task's estimated utilization to the cfs_rq's
+ * estimated utilization, before we update schedutil.
+ */
+ util_est_enqueue(&rq->cfs, p);
+
/*
* If in_iowait is set, the code below may not trigger any cpufreq
* utilization updates, so do it here explicitly with the IOWAIT flag
@@ -5397,7 +5405,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
if (!se)
add_nr_running(rq, 1);
- util_est_enqueue(&rq->cfs, p);
hrtick_update(rq);
}
--
2.15.1
Schedutil updates for FAIR tasks are triggered implicitly each time a
cfs_rq's utilization is updated via cfs_rq_util_change(), currently
called by update_cfs_rq_load_avg(), when the utilization of a cfs_rq has
changed, and {attach,detach}_entity_load_avg().
This design is based on the idea that "we should callback schedutil
frequently enough" to properly update the CPU frequency at every
utilization change. However, such an integration strategy has also
some downsides:
- schedutil updates are triggered by RQ's load updates, which makes
sense in general but it does not allow to know exactly which other RQ
related information have been updated.
Recently, for example, we had issues due to schedutil dependencies on
cfs_rq->h_nr_running and estimated utilization updates.
- cfs_rq_util_change() is mainly a wrapper function for an already
existing "public API", cpufreq_update_util(), which is required
just to ensure we actually update schedutil only when we are updating
a root cfs_rq.
Thus, especially when task groups are in use, most of the calls to
this wrapper function are not required.
- the usage of a wrapper function is not completely consistent across
fair.c, since we could still need additional explicit calls to
cpufreq_update_util().
For example this already happens to report the IOWAIT boot flag in
the wakeup path.
- it makes it hard to integrate new features since it could require to
change other function prototypes just to pass in an additional flag,
as it happened for example in commit:
commit ea14b57e8a18 ("sched/cpufreq: Provide migration hint")
All the above considered, let's make schedutil updates more explicit in
fair.c by removing the cfs_rq_util_change() wrapper function in favour
of the existing cpufreq_update_util() public API.
This can be done by calling cpufreq_update_util() explicitly in the few
call sites where it really makes sense and when all the (potentially)
required cfs_rq's information have been updated.
This patch mainly removes code and adds explicit schedutil updates
only when we:
- {enqueue,dequeue}_task_fair() a task to/from the root cfs_rq
- (un)throttle_cfs_rq() a set of tasks up to the root cfs_rq
- task_tick_fair() to update the utilization of the root cfs_rq
All the other code paths, currently _indirectly_ covered by a call to
update_load_avg(), are still covered. Indeed, some paths already imply
enqueue/dequeue calls:
- switch_{to,from}_fair()
- sched_move_task()
while others are followed by enqueue/dequeue calls:
- cpu_cgroup_fork() and
post_init_entity_util_avg():
are used at wakeup_new_task() time and thus already followed by an
enqueue_task_fair()
- migrate_task_rq_fair():
updates the removed utilization but not the actual cfs_rq
utilization, which is updated by a following sched event
This new proposal allows also to better aggregate schedutil related
flags, which are required only at enqueue_task_fair() time.
IOWAIT and MIGRATION flags are now requested only when a task is
actually visible at the root cfs_rq level.
Signed-off-by: Patrick Bellasi <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes in v2:
- fixed flags masking (Peter)
- fixed !CONFIG_SMP build (0-DAY)
by using a cpufreq_enqueue wrapper to avoid setting the
SCHED_CPUFREQ_MIGRATION flag on !SMP systems
- removed blank lines (Viresh)
NOTE: this patch changes the behavior of the IOWAIT flag: in case of a
task waking up on a throttled RQ we do not assert the flag to schedutil
anymore. However, this seems to make sense since the task will not be
running anyway.
---
kernel/sched/fair.c | 90 +++++++++++++++++++++++++----------------------------
1 file changed, 43 insertions(+), 47 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f01f0f395f9a..dc368c73cf9e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -772,7 +772,7 @@ void post_init_entity_util_avg(struct sched_entity *se)
* For !fair tasks do:
*
update_cfs_rq_load_avg(now, cfs_rq);
- attach_entity_load_avg(cfs_rq, se, 0);
+ attach_entity_load_avg(cfs_rq, se);
switched_from_fair(rq, p);
*
* such that the next switched_to_fair() has the
@@ -3009,29 +3009,6 @@ static inline void update_cfs_group(struct sched_entity *se)
}
#endif /* CONFIG_FAIR_GROUP_SCHED */
-static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq, int flags)
-{
- struct rq *rq = rq_of(cfs_rq);
-
- if (&rq->cfs == cfs_rq || (flags & SCHED_CPUFREQ_MIGRATION)) {
- /*
- * There are a few boundary cases this might miss but it should
- * get called often enough that that should (hopefully) not be
- * a real problem.
- *
- * It will not get called when we go idle, because the idle
- * thread is a different class (!fair), nor will the utilization
- * number include things like RT tasks.
- *
- * As is, the util number is not freq-invariant (we'd have to
- * implement arch_scale_freq_capacity() for that).
- *
- * See cpu_util().
- */
- cpufreq_update_util(rq, flags);
- }
-}
-
#ifdef CONFIG_SMP
/*
* Approximate:
@@ -3712,9 +3689,6 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
cfs_rq->load_last_update_time_copy = sa->last_update_time;
#endif
- if (decayed)
- cfs_rq_util_change(cfs_rq, 0);
-
return decayed;
}
@@ -3726,7 +3700,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
* Must call update_cfs_rq_load_avg() before this, since we rely on
* cfs_rq->avg.last_update_time being current.
*/
-static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
+static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
u32 divider = LOAD_AVG_MAX - 1024 + cfs_rq->avg.period_contrib;
@@ -3761,8 +3735,6 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
cfs_rq->avg.util_sum += se->avg.util_sum;
add_tg_cfs_propagate(cfs_rq, se->avg.load_sum);
-
- cfs_rq_util_change(cfs_rq, flags);
}
/**
@@ -3780,8 +3752,6 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
add_tg_cfs_propagate(cfs_rq, -se->avg.load_sum);
-
- cfs_rq_util_change(cfs_rq, 0);
}
/*
@@ -3818,7 +3788,7 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
*
* IOW we're enqueueing a task on a new CPU.
*/
- attach_entity_load_avg(cfs_rq, se, SCHED_CPUFREQ_MIGRATION);
+ attach_entity_load_avg(cfs_rq, se);
update_tg_load_avg(cfs_rq, 0);
} else if (decayed && (flags & UPDATE_TG))
@@ -4028,13 +3998,12 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int not_used1)
{
- cfs_rq_util_change(cfs_rq, 0);
}
static inline void remove_entity_load_avg(struct sched_entity *se) {}
static inline void
-attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags) {}
+attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
static inline void
detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
@@ -4762,8 +4731,11 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
dequeue = 0;
}
- if (!se)
+ /* The tasks are no more visible from the root cfs_rq */
+ if (!se) {
sub_nr_running(rq, task_delta);
+ cpufreq_update_util(rq, 0);
+ }
cfs_rq->throttled = 1;
cfs_rq->throttled_clock = rq_clock(rq);
@@ -4825,8 +4797,11 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
break;
}
- if (!se)
+ /* The tasks are now visible from the root cfs_rq */
+ if (!se) {
add_nr_running(rq, task_delta);
+ cpufreq_update_util(rq, 0);
+ }
/* Determine whether we need to wake up potentially idle CPU: */
if (rq->curr == rq->idle && rq->cfs.nr_running)
@@ -5345,6 +5320,28 @@ static inline void hrtick_update(struct rq *rq)
}
#endif
+static void
+cpufreq_enqueue(struct rq *rq, struct task_struct *p)
+{
+ unsigned int flags = 0;
+
+ if (p->in_iowait)
+ flags |= SCHED_CPUFREQ_IOWAIT;
+
+#ifdef CONFIG_SMP
+ /*
+ * !last_update_time means we've passed through
+ * migrate_task_rq_fair() indicating we migrated.
+ *
+ * IOW we're enqueueing a task on a new CPU.
+ */
+ if (!p->se.avg.last_update_time)
+ flags |= SCHED_CPUFREQ_MIGRATION;
+#endif
+
+ cpufreq_update_util(rq, flags);
+}
+
/*
* The enqueue_task method is called before nr_running is
* increased. Here we update the fair scheduling stats and
@@ -5364,14 +5361,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
*/
util_est_enqueue(&rq->cfs, p);
- /*
- * If in_iowait is set, the code below may not trigger any cpufreq
- * utilization updates, so do it here explicitly with the IOWAIT flag
- * passed.
- */
- if (p->in_iowait)
- cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT);
-
for_each_sched_entity(se) {
if (se->on_rq)
break;
@@ -5402,8 +5391,11 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
update_cfs_group(se);
}
- if (!se)
+ /* The task is visible from the root cfs_rq */
+ if (!se) {
add_nr_running(rq, 1);
+ cpufreq_enqueue(rq, p);
+ }
hrtick_update(rq);
}
@@ -5461,10 +5453,12 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
update_cfs_group(se);
}
+ /* The task is no more visible from the root cfs_rq */
if (!se)
sub_nr_running(rq, 1);
util_est_dequeue(&rq->cfs, p, task_sleep);
+ cpufreq_update_util(rq, 0);
hrtick_update(rq);
}
@@ -9950,6 +9944,8 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
if (static_branch_unlikely(&sched_numa_balancing))
task_tick_numa(rq, curr);
+
+ cpufreq_update_util(rq, 0);
}
/*
@@ -10087,7 +10083,7 @@ static void attach_entity_cfs_rq(struct sched_entity *se)
/* Synchronize entity with its cfs_rq */
update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
- attach_entity_load_avg(cfs_rq, se, 0);
+ attach_entity_load_avg(cfs_rq, se);
update_tg_load_avg(cfs_rq, false);
propagate_entity_cfs_rq(se);
}
--
2.15.1
Since the refactoring introduced by:
commit 8f111bc357aa ("cpufreq/schedutil: Rewrite CPUFREQ_RT support")
we aggregate FAIR utilization only if this class has runnable tasks.
This was mainly due to avoid the risk to stay on an high frequency just
because of the blocked utilization of a CPU not being properly decayed
while the CPU was idle.
However, since:
commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
the FAIR blocked utilization is properly decayed also for IDLE CPUs.
This allows us to use the FAIR blocked utilization as a safe mechanism
to gracefully reduce the frequency only if no FAIR tasks show up on a
CPU for a reasonable period of time.
Moreover, we also reduce the frequency drops of CPUs running periodic
tasks which, depending on the task periodicity and the time required
for a frequency switch, was increasing the chances to introduce some
undesirable performance variations.
Reported-by: Vincent Guittot <[email protected]>
Signed-off-by: Patrick Bellasi <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes in v2:
- add "Acked-by" Viresh tag
---
kernel/sched/cpufreq_schedutil.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index d2c6083304b4..a74d05160e66 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -183,22 +183,21 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
{
struct rq *rq = cpu_rq(sg_cpu->cpu);
- unsigned long util;
- if (rq->rt.rt_nr_running) {
- util = sg_cpu->max;
- } else {
- util = sg_cpu->util_dl;
- if (rq->cfs.h_nr_running)
- util += sg_cpu->util_cfs;
- }
+ if (rq->rt.rt_nr_running)
+ return sg_cpu->max;
/*
+ * Utilization required by DEADLINE must always be granted while, for
+ * FAIR, we use blocked utilization of IDLE CPUs as a mechanism to
+ * gracefully reduce the frequency when no tasks show up for longer
+ * periods of time.
+ *
* Ideally we would like to set util_dl as min/guaranteed freq and
* util_cfs + util_dl as requested freq. However, cpufreq is not yet
* ready for such an interface. So, we only do the latter for now.
*/
- return min(util, sg_cpu->max);
+ return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
}
static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time, unsigned int flags)
--
2.15.1
Hi Patrick,
On 11 May 2018 at 15:15, Patrick Bellasi <[email protected]> wrote:
> Since the refactoring introduced by:
>
> commit 8f111bc357aa ("cpufreq/schedutil: Rewrite CPUFREQ_RT support")
>
> we aggregate FAIR utilization only if this class has runnable tasks.
> This was mainly due to avoid the risk to stay on an high frequency just
> because of the blocked utilization of a CPU not being properly decayed
> while the CPU was idle.
>
> However, since:
>
> commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
>
> the FAIR blocked utilization is properly decayed also for IDLE CPUs.
>
> This allows us to use the FAIR blocked utilization as a safe mechanism
> to gracefully reduce the frequency only if no FAIR tasks show up on a
> CPU for a reasonable period of time.
>
> Moreover, we also reduce the frequency drops of CPUs running periodic
> tasks which, depending on the task periodicity and the time required
> for a frequency switch, was increasing the chances to introduce some
> undesirable performance variations.
>
> Reported-by: Vincent Guittot <[email protected]>
> Signed-off-by: Patrick Bellasi <[email protected]>
> Acked-by: Viresh Kumar <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Vincent Guittot <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
With this patch, I can't see the spurious OPP changes that I was seeing before
FWIW
Acked-by: Vincent Guittot <[email protected]>
Tested-by: Vincent Guittot <[email protected]>
Regards,
Vincent
>
> ---
>
> Changes in v2:
> - add "Acked-by" Viresh tag
> ---
> kernel/sched/cpufreq_schedutil.c | 17 ++++++++---------
> 1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index d2c6083304b4..a74d05160e66 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -183,22 +183,21 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
> static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> {
> struct rq *rq = cpu_rq(sg_cpu->cpu);
> - unsigned long util;
>
> - if (rq->rt.rt_nr_running) {
> - util = sg_cpu->max;
> - } else {
> - util = sg_cpu->util_dl;
> - if (rq->cfs.h_nr_running)
> - util += sg_cpu->util_cfs;
> - }
> + if (rq->rt.rt_nr_running)
> + return sg_cpu->max;
>
> /*
> + * Utilization required by DEADLINE must always be granted while, for
> + * FAIR, we use blocked utilization of IDLE CPUs as a mechanism to
> + * gracefully reduce the frequency when no tasks show up for longer
> + * periods of time.
> + *
> * Ideally we would like to set util_dl as min/guaranteed freq and
> * util_cfs + util_dl as requested freq. However, cpufreq is not yet
> * ready for such an interface. So, we only do the latter for now.
> */
> - return min(util, sg_cpu->max);
> + return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
> }
>
> static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time, unsigned int flags)
> --
> 2.15.1
>
Hi Patrick,
On 11 May 2018 at 15:15, Patrick Bellasi <[email protected]> wrote:
> When a task is enqueue the estimated utilization of a CPU is updated
> to better support the selection of the required frequency.
> However, schedutil is (implicitly) updated by update_load_avg() which
> always happens before util_est_{en,de}queue(), thus potentially
> introducing a latency between estimated utilization updates and
> frequency selections.
>
> Let's update util_est at the beginning of enqueue_task_fair(),
> which will ensure that all schedutil updates will see the most
> updated estimated utilization value for a CPU.
>
> Reported-by: Vincent Guittot <[email protected]>
> Signed-off-by: Patrick Bellasi <[email protected]>
> Acked-by: Viresh Kumar <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Vincent Guittot <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Fixes: 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT")
Acked-by: Vincent Guittot <[email protected]>
>
> ---
>
> Changes in v2:
> - improve comment in enqueue_task_fair() (Peter)
> - add "Fixes" tag
> - add "Acked-by" Viresh tag
> ---
On 14-May 11:16, Vincent Guittot wrote:
> Hi Patrick,
Hi Vincent,
> On 11 May 2018 at 15:15, Patrick Bellasi <[email protected]> wrote:
> > Since the refactoring introduced by:
> >
> > commit 8f111bc357aa ("cpufreq/schedutil: Rewrite CPUFREQ_RT support")
> >
> > we aggregate FAIR utilization only if this class has runnable tasks.
> > This was mainly due to avoid the risk to stay on an high frequency just
> > because of the blocked utilization of a CPU not being properly decayed
> > while the CPU was idle.
> >
> > However, since:
> >
> > commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
> >
> > the FAIR blocked utilization is properly decayed also for IDLE CPUs.
> >
> > This allows us to use the FAIR blocked utilization as a safe mechanism
> > to gracefully reduce the frequency only if no FAIR tasks show up on a
> > CPU for a reasonable period of time.
> >
> > Moreover, we also reduce the frequency drops of CPUs running periodic
> > tasks which, depending on the task periodicity and the time required
> > for a frequency switch, was increasing the chances to introduce some
> > undesirable performance variations.
> >
> > Reported-by: Vincent Guittot <[email protected]>
> > Signed-off-by: Patrick Bellasi <[email protected]>
> > Acked-by: Viresh Kumar <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Rafael J. Wysocki <[email protected]>
> > Cc: Vincent Guittot <[email protected]>
> > Cc: Viresh Kumar <[email protected]>
> > Cc: Joel Fernandes <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
>
> With this patch, I can't see the spurious OPP changes that I was seeing before
Cool thanks... regarding OPP updates I've added some more comments in
my reply to Joel's comments to my last patch of this series.
Would be nice if you can have a look... toward the end there are some
considerations about schedutil updates (indirectly) triggered by your
patches for blocked load updates on IDLE CPUs.
> FWIW
> Acked-by: Vincent Guittot <[email protected]>
> Tested-by: Vincent Guittot <[email protected]>
Thanks for testing, will add these to the next respin.
--
#include <best/regards.h>
Patrick Bellasi
On 14 May 2018 at 18:48, Patrick Bellasi <[email protected]> wrote:
> On 14-May 11:16, Vincent Guittot wrote:
>> Hi Patrick,
>
> Hi Vincent,
>
>> On 11 May 2018 at 15:15, Patrick Bellasi <[email protected]> wrote:
>> > Since the refactoring introduced by:
>> >
>> > commit 8f111bc357aa ("cpufreq/schedutil: Rewrite CPUFREQ_RT support")
>> >
>> > we aggregate FAIR utilization only if this class has runnable tasks.
>> > This was mainly due to avoid the risk to stay on an high frequency just
>> > because of the blocked utilization of a CPU not being properly decayed
>> > while the CPU was idle.
>> >
>> > However, since:
>> >
>> > commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
>> >
>> > the FAIR blocked utilization is properly decayed also for IDLE CPUs.
>> >
>> > This allows us to use the FAIR blocked utilization as a safe mechanism
>> > to gracefully reduce the frequency only if no FAIR tasks show up on a
>> > CPU for a reasonable period of time.
>> >
>> > Moreover, we also reduce the frequency drops of CPUs running periodic
>> > tasks which, depending on the task periodicity and the time required
>> > for a frequency switch, was increasing the chances to introduce some
>> > undesirable performance variations.
>> >
>> > Reported-by: Vincent Guittot <[email protected]>
>> > Signed-off-by: Patrick Bellasi <[email protected]>
>> > Acked-by: Viresh Kumar <[email protected]>
>> > Cc: Ingo Molnar <[email protected]>
>> > Cc: Peter Zijlstra <[email protected]>
>> > Cc: Rafael J. Wysocki <[email protected]>
>> > Cc: Vincent Guittot <[email protected]>
>> > Cc: Viresh Kumar <[email protected]>
>> > Cc: Joel Fernandes <[email protected]>
>> > Cc: [email protected]
>> > Cc: [email protected]
>>
>> With this patch, I can't see the spurious OPP changes that I was seeing before
>
> Cool thanks... regarding OPP updates I've added some more comments in
> my reply to Joel's comments to my last patch of this series.
>
> Would be nice if you can have a look... toward the end there are some
> considerations about schedutil updates (indirectly) triggered by your
> patches for blocked load updates on IDLE CPUs.
I have started to have a look at the 3rd patch and was checking if
there were some hole and your proposal regarding the update of blocked
load and the removed utilization
I will read your latest comment.
>
>> FWIW
>> Acked-by: Vincent Guittot <[email protected]>
>> Tested-by: Vincent Guittot <[email protected]>
>
> Thanks for testing, will add these to the next respin.
>
> --
> #include <best/regards.h>
>
> Patrick Bellasi
Hi Patrick,
On 11 May 2018 at 15:15, Patrick Bellasi <[email protected]> wrote:
> Schedutil updates for FAIR tasks are triggered implicitly each time a
> cfs_rq's utilization is updated via cfs_rq_util_change(), currently
> called by update_cfs_rq_load_avg(), when the utilization of a cfs_rq has
> changed, and {attach,detach}_entity_load_avg().
>
> This design is based on the idea that "we should callback schedutil
> frequently enough" to properly update the CPU frequency at every
> utilization change. However, such an integration strategy has also
> some downsides:
>
> - schedutil updates are triggered by RQ's load updates, which makes
> sense in general but it does not allow to know exactly which other RQ
> related information have been updated.
> Recently, for example, we had issues due to schedutil dependencies on
> cfs_rq->h_nr_running and estimated utilization updates.
>
> - cfs_rq_util_change() is mainly a wrapper function for an already
> existing "public API", cpufreq_update_util(), which is required
> just to ensure we actually update schedutil only when we are updating
> a root cfs_rq.
> Thus, especially when task groups are in use, most of the calls to
> this wrapper function are not required.
>
> - the usage of a wrapper function is not completely consistent across
> fair.c, since we could still need additional explicit calls to
> cpufreq_update_util().
> For example this already happens to report the IOWAIT boot flag in
> the wakeup path.
>
> - it makes it hard to integrate new features since it could require to
> change other function prototypes just to pass in an additional flag,
> as it happened for example in commit:
>
> commit ea14b57e8a18 ("sched/cpufreq: Provide migration hint")
>
> All the above considered, let's make schedutil updates more explicit in
> fair.c by removing the cfs_rq_util_change() wrapper function in favour
> of the existing cpufreq_update_util() public API.
removing the wrapper makes sense
> This can be done by calling cpufreq_update_util() explicitly in the few
> call sites where it really makes sense and when all the (potentially)
> required cfs_rq's information have been updated.
>
> This patch mainly removes code and adds explicit schedutil updates
> only when we:
> - {enqueue,dequeue}_task_fair() a task to/from the root cfs_rq
The task's utilization is added during attach/detach and not during
the enqueue/dequeue.
> - (un)throttle_cfs_rq() a set of tasks up to the root cfs_rq
> - task_tick_fair() to update the utilization of the root cfs_rq
>
> All the other code paths, currently _indirectly_ covered by a call to
> update_load_avg(), are still covered. Indeed, some paths already imply
> enqueue/dequeue calls:
> - switch_{to,from}_fair()
> - sched_move_task()
> while others are followed by enqueue/dequeue calls:
> - cpu_cgroup_fork() and
> post_init_entity_util_avg():
> are used at wakeup_new_task() time and thus already followed by an
> enqueue_task_fair()
> - migrate_task_rq_fair():
> updates the removed utilization but not the actual cfs_rq
> utilization, which is updated by a following sched event
The removed utilization will be applied during next call to
update_cfs_rq_load_avg() which means at next call to update_load_avg()
or update_blocked_averages()
I'm going to continue on the longer reply that you made to Joel
>
> This new proposal allows also to better aggregate schedutil related
> flags, which are required only at enqueue_task_fair() time.
> IOWAIT and MIGRATION flags are now requested only when a task is
> actually visible at the root cfs_rq level.
>
> Signed-off-by: Patrick Bellasi <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> Cc: Juri Lelli <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
>
> ---