2017-09-03 20:15:58

by Joel Fernandes

Subject: [PATCH RFC RESEND v2 0/2] Prevent cpufreq update for only task on rq that sleeps

These patches are just a repost of [1] and [2] with a cover letter for more
history and background. On the Pixel product we carry a similar patch, which
was also posted some time ago to LKML [3] [4]; however, that patch was for the
schedfreq governor (which isn't upstream). For schedutil, which is upstream and
currently used on our future products, we go through the cpufreq update hooks,
and this patch is adapted for that use case.

[1] https://patchwork.kernel.org/patch/9910019/
[2] https://patchwork.kernel.org/patch/9910017/
[3] https://patchwork.kernel.org/patch/8385861/
[4] https://lwn.net/Articles/676886/

Joel Fernandes (2):
Revert "sched/fair: Drop always true parameter of
update_cfs_rq_load_avg()"
sched/fair: Skip frequency update if CPU about to idle

kernel/sched/fair.c | 38 +++++++++++++++++++++++++++++---------
kernel/sched/sched.h | 1 +
2 files changed, 30 insertions(+), 9 deletions(-)

Cc: Srinivas Pandruvada <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Patrick Bellasi <[email protected]>
Cc: Steve Muckle <[email protected]>
Cc: [email protected]
Signed-off-by: Joel Fernandes <[email protected]>
--
2.14.1.581.gf28d330327-goog


2017-09-03 20:16:03

by Joel Fernandes

Subject: [PATCH RFC v2 1/2] Revert "sched/fair: Drop always true parameter of update_cfs_rq_load_avg()"

This reverts commit 3a123bbbb10d54dbdde6ccbbd519c74c91ba2f52.

It is needed by this series to control whether cpufreq is notified about a
frequency update during an update to the utilization.

Cc: Srinivas Pandruvada <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Patrick Bellasi <[email protected]>
Cc: Steve Muckle <[email protected]>
Cc: [email protected]
Signed-off-by: Joel Fernandes <[email protected]>
---
kernel/sched/fair.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index eca6a57527f9..bf3595c0badf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -797,7 +797,7 @@ void post_init_entity_util_avg(struct sched_entity *se)
/*
* For !fair tasks do:
*
- update_cfs_rq_load_avg(now, cfs_rq);
+ update_cfs_rq_load_avg(now, cfs_rq, false);
attach_entity_load_avg(cfs_rq, se);
switched_from_fair(rq, p);
*
@@ -3607,6 +3607,7 @@ static inline void add_tg_cfs_propagate(struct cfs_rq *cfs_rq, long runnable_sum
* update_cfs_rq_load_avg - update the cfs_rq's load/util averages
* @now: current time, as per cfs_rq_clock_task()
* @cfs_rq: cfs_rq to update
+ * @update_freq: should we call cfs_rq_util_change() or will the call do so
*
* The cfs_rq avg is the direct sum of all its entities (blocked and runnable)
* avg. The immediate corollary is that all (fair) tasks must be attached, see
@@ -3620,7 +3621,7 @@ static inline void add_tg_cfs_propagate(struct cfs_rq *cfs_rq, long runnable_sum
* call update_tg_load_avg() when this function returns true.
*/
static inline int
-update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
+update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
{
unsigned long removed_load = 0, removed_util = 0, removed_runnable_sum = 0;
struct sched_avg *sa = &cfs_rq->avg;
@@ -3657,7 +3658,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
cfs_rq->load_last_update_time_copy = sa->last_update_time;
#endif

- if (decayed)
+ if (update_freq && decayed)
cfs_rq_util_change(cfs_rq);

return decayed;
@@ -3751,7 +3752,7 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD))
__update_load_avg_se(now, cpu, cfs_rq, se);

- decayed = update_cfs_rq_load_avg(now, cfs_rq);
+ decayed = update_cfs_rq_load_avg(now, cfs_rq, true);
decayed |= propagate_entity_load_avg(se);

if (!se->avg.last_update_time && (flags & DO_ATTACH)) {
@@ -3841,7 +3842,7 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf);
#else /* CONFIG_SMP */

static inline int
-update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
+update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
{
return 0;
}
@@ -7331,7 +7332,7 @@ static void update_blocked_averages(int cpu)
if (throttled_hierarchy(cfs_rq))
continue;

- if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq))
+ if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
update_tg_load_avg(cfs_rq, 0);

/* Propagate pending load changes to the parent, if any: */
@@ -7404,7 +7405,7 @@ static inline void update_blocked_averages(int cpu)

rq_lock_irqsave(rq, &rf);
update_rq_clock(rq);
- update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
+ update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true);
rq_unlock_irqrestore(rq, &rf);
}

--
2.14.1.581.gf28d330327-goog

2017-09-03 20:16:16

by Joel Fernandes

Subject: [PATCH RFC v2 2/2] sched/fair: Skip frequency update if CPU about to idle

In an application playing music, where the music app's thread wakes up and
sleeps periodically on an Android device, it is seen that the frequency
increases slightly on the dequeue and is reduced again after the wake up. This
oscillation continues between 300 MHz and 350 MHz, while the task runs at
300 MHz whenever it is active. This is pointless and causes unnecessary wake
ups of the governor thread on slow-switch systems.

This patch prevents a frequency update on the last dequeue. With this, the
number of schedutil governor thread wake ups is reduced by more than 2x
(1389 -> 527).
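
As a rough illustration of the intended control flow, here is a standalone
user-space sketch (not the kernel code itself): the flag names come from the
two patches, while struct rq, cpufreq_hook() and main() are simplified
stand-ins added only for this example.

#include <stdbool.h>
#include <stdio.h>

#define DEQUEUE_SLEEP   0x01
#define DEQUEUE_IDLE    0x80    /* added by patch 2/2 */
#define UPDATE_TG       0x1
#define SKIP_CPUFREQ    0x8     /* added by patch 2/2 */

struct rq { int nr_running; };

/* Stand-in for cfs_rq_util_change() kicking the schedutil governor. */
static void cpufreq_hook(void)
{
        printf("cpufreq update requested\n");
}

/* Patch 1/2: update_freq gates the cpufreq notification. */
static void update_cfs_rq_load_avg(bool update_freq)
{
        bool decayed = true;    /* pretend the averages changed */

        if (update_freq && decayed)
                cpufreq_hook();
}

static void update_load_avg(int flags)
{
        update_cfs_rq_load_avg(!(flags & SKIP_CPUFREQ));
}

/* Patch 2/2: the dequeue path converts DEQUEUE_IDLE into SKIP_CPUFREQ. */
static void dequeue_entity(int flags)
{
        int update_flags = UPDATE_TG;

        if (flags & DEQUEUE_IDLE)
                update_flags |= SKIP_CPUFREQ;

        update_load_avg(update_flags);
}

static void dequeue_task_fair(struct rq *rq, int flags)
{
        bool task_sleep = flags & DEQUEUE_SLEEP;

        if (task_sleep && rq->nr_running == 1)
                flags |= DEQUEUE_IDLE;  /* CPU is about to go idle */

        dequeue_entity(flags);
        rq->nr_running--;
}

int main(void)
{
        struct rq rq = { .nr_running = 2 };

        dequeue_task_fair(&rq, DEQUEUE_SLEEP);  /* another task remains */
        dequeue_task_fair(&rq, DEQUEUE_SLEEP);  /* last task sleeps */
        return 0;
}

Running this, only the first dequeue (with another task still runnable)
requests a cpufreq update; the second dequeue, which leaves the CPU idle,
skips it.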

Cc: Srinivas Pandruvada <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Patrick Bellasi <[email protected]>
Cc: Steve Muckle <[email protected]>
Cc: [email protected]
Signed-off-by: Joel Fernandes <[email protected]>
---
kernel/sched/fair.c | 25 ++++++++++++++++++++++---
kernel/sched/sched.h | 1 +
2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf3595c0badf..82496bb2dad3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3736,6 +3736,7 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
#define UPDATE_TG 0x1
#define SKIP_AGE_LOAD 0x2
#define DO_ATTACH 0x4
+#define SKIP_CPUFREQ 0x8

/* Update task and its cfs_rq load average */
static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
@@ -3752,7 +3753,7 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD))
__update_load_avg_se(now, cpu, cfs_rq, se);

- decayed = update_cfs_rq_load_avg(now, cfs_rq, true);
+ decayed = update_cfs_rq_load_avg(now, cfs_rq, !(flags & SKIP_CPUFREQ));
decayed |= propagate_entity_load_avg(se);

if (!se->avg.last_update_time && (flags & DO_ATTACH)) {
@@ -3850,6 +3851,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
#define UPDATE_TG 0x0
#define SKIP_AGE_LOAD 0x0
#define DO_ATTACH 0x0
+#define SKIP_CPUFREQ 0x0

static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int not_used1)
{
@@ -4071,6 +4073,8 @@ static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq);
static void
dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
+ int update_flags;
+
/*
* Update run-time statistics of the 'current'.
*/
@@ -4084,7 +4088,12 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
* - For group entity, update its weight to reflect the new share
* of its group cfs_rq.
*/
- update_load_avg(cfs_rq, se, UPDATE_TG);
+ update_flags = UPDATE_TG;
+
+ if (flags & DEQUEUE_IDLE)
+ update_flags |= SKIP_CPUFREQ;
+
+ update_load_avg(cfs_rq, se, update_flags);
dequeue_runnable_load_avg(cfs_rq, se);

update_stats_dequeue(cfs_rq, se, flags);
@@ -5231,6 +5240,9 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
struct sched_entity *se = &p->se;
int task_sleep = flags & DEQUEUE_SLEEP;

+ if (task_sleep && rq->nr_running == 1)
+ flags |= DEQUEUE_IDLE;
+
for_each_sched_entity(se) {
cfs_rq = cfs_rq_of(se);
dequeue_entity(cfs_rq, se, flags);
@@ -5261,13 +5273,20 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
}

for_each_sched_entity(se) {
+ int update_flags;
+
cfs_rq = cfs_rq_of(se);
cfs_rq->h_nr_running--;

if (cfs_rq_throttled(cfs_rq))
break;

- update_load_avg(cfs_rq, se, UPDATE_TG);
+ update_flags = UPDATE_TG;
+
+ if (flags & DEQUEUE_IDLE)
+ update_flags |= SKIP_CPUFREQ;
+
+ update_load_avg(cfs_rq, se, update_flags);
update_cfs_group(se);
}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ef16fd0a74fd..44a9048e54dc 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1388,6 +1388,7 @@ extern const u32 sched_prio_to_wmult[40];
#define DEQUEUE_SAVE 0x02 /* matches ENQUEUE_RESTORE */
#define DEQUEUE_MOVE 0x04 /* matches ENQUEUE_MOVE */
#define DEQUEUE_NOCLOCK 0x08 /* matches ENQUEUE_NOCLOCK */
+#define DEQUEUE_IDLE 0x80

#define ENQUEUE_WAKEUP 0x01
#define ENQUEUE_RESTORE 0x02
--
2.14.1.581.gf28d330327-goog

2017-09-07 16:14:03

by Joel Fernandes

Subject: Re: [PATCH RFC RESEND v2 0/2] Prevent cpufreq update for only task on rq that sleeps

On Sun, Sep 3, 2017 at 1:15 PM, Joel Fernandes <[email protected]> wrote:
> These patches are just a repost of [1] and [2] with a cover letter for more
> history and background. On the Pixel product we carry a similar patch, which
> was also posted some time ago to LKML [3] [4]; however, that patch was for
> the schedfreq governor (which isn't upstream). For schedutil, which is
> upstream and currently used on our future products, we go through the cpufreq
> update hooks, and this patch is adapted for that use case.
>
> [1] https://patchwork.kernel.org/patch/9910019/
> [2] https://patchwork.kernel.org/patch/9910017/
> [3] https://patchwork.kernel.org/patch/8385861/
> [4] https://lwn.net/Articles/676886/
>

Hi,

I'm planning to rebase this series on Linus's master and post it
again, but I wanted to check whether there are any thoughts about it.

Just to add more context, the reason for not updating the frequency:

- When a last dequeue of a sleeping task happens, it is sufficient to
update utilization without updating the frequency because if other
CPUs are busy then their updates will consider the utilization of the
idle CPU in the shared policy unless sufficient time has passed.

- If the last dequeue of a sleeping task happens while all other CPUs
in the cluster are idle, then the cluster will likely enter
cluster-idle soon.
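
To illustrate the first point, here is a minimal user-space sketch of the
shared-policy behavior assumed above: a busy CPU's governor update takes the
maximum utilization across the policy but drops a CPU's contribution once its
data is stale. policy_next_util(), struct cpu_util and STALE_NS are made-up
names that approximate, rather than reproduce, schedutil's shared-policy path.

#include <stdint.h>
#include <stdio.h>

#define STALE_NS (4 * 1000000ULL)       /* arbitrary staleness cutoff */

struct cpu_util {
        uint64_t util;                  /* last reported utilization */
        uint64_t last_update_ns;        /* when it was reported */
};

/* Max utilization over the policy, ignoring CPUs whose data is stale. */
static uint64_t policy_next_util(const struct cpu_util *cpus, int n,
                                 uint64_t now_ns)
{
        uint64_t max = 0;
        int i;

        for (i = 0; i < n; i++) {
                if (now_ns - cpus[i].last_update_ns > STALE_NS)
                        continue;       /* idle too long, contribution dropped */
                if (cpus[i].util > max)
                        max = cpus[i].util;
        }
        return max;
}

int main(void)
{
        /* CPU0 just went idle without requesting a frequency update;
         * CPU1 is busy and re-evaluates the shared policy. */
        struct cpu_util cpus[2] = {
                { .util = 400, .last_update_ns = 1000000 },     /* CPU0, idle */
                { .util = 120, .last_update_ns = 5000000 },     /* CPU1, busy */
        };

        /* Shortly after: CPU0's utilization still drives the policy (400). */
        printf("%llu\n", (unsigned long long)policy_next_util(cpus, 2, 5000000));

        /* Much later: CPU0's stale data is ignored, so frequency can drop (120). */
        printf("%llu\n", (unsigned long long)policy_next_util(cpus, 2, 9000000));
        return 0;
}

So shortly after CPU0 goes idle its utilization still drives the policy, and
once enough time passes its stale contribution is dropped and the frequency
can fall, without CPU0 ever having requested an update on its last dequeue.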

Could you let me know your opinions on this series?

thanks,

-Joel


[..]

2017-09-07 18:11:02

by Steve Muckle

Subject: Re: [PATCH RFC RESEND v2 0/2] Prevent cpufreq update for only task on rq that sleeps

On 09/07/2017 09:14 AM, Joel Fernandes wrote:
> I'm planning to rebase this series on Linus's master and post it
> again, but I wanted to check whether there are any thoughts about it.
>
> Just to add more context, the reason for not updating the frequency:
>
> - When a last dequeue of a sleeping task happens, it is sufficient to
> update utilization without updating the frequency because if other
> CPUs are busy then their updates will consider the utilization of the
> idle CPU in the shared policy unless sufficient time has passed.
>
> - If the last dequeue of a sleeping task happens while all other CPUs
> in the cluster are idle, then the cluster will likely enter
> cluster-idle soon.

To clarify - when you say "last dequeue of a sleeping task happens"
above, you're referring to the dequeue of the last task running on the
CPU, correct? I.e. the CPU is about to go idle?

It's been a while since I've looked at this area so would like to hold
off for a rebased version to review in further detail. But I think the
concept is valid.

thanks,
steve

2017-09-07 18:58:32

by Joel Fernandes

Subject: Re: [PATCH RFC RESEND v2 0/2] Prevent cpufreq update for only task on rq that sleeps

Hi Steve,

On Thu, Sep 7, 2017 at 11:10 AM, Steve Muckle <[email protected]> wrote:
> On 09/07/2017 09:14 AM, Joel Fernandes wrote:
>>
>> I'm planning to rebase this series on Linus's master and post it
>> again, but I wanted to check whether there are any thoughts about it.
>>
>> Just to add more context, the reason for not updating the frequency:
>>
>> - When a last dequeue of a sleeping task happens, it is sufficient to
>> update utilization without updating the frequency because if other
>> CPUs are busy then their updates will consider the utilization of the
>> idle CPU in the shared policy unless sufficient time has passed.
>>
>> - If the last dequeue of a sleeping task happens while all other CPUs
>> in the cluster are idle, then the cluster will likely enter
>> cluster-idle soon.
>
>
> To clarify - when you say "last dequeue of a sleeping task happens" above,
> you're referring to the dequeue of the last task running on the CPU,
> correct? I.e. the CPU is about to go idle?

Yes, that's right; sorry for my poor choice of words. I am referring to
the dequeue of a task that is DEQUEUE_SLEEP and is the only task on the
RQ.

> It's been a while since I've looked at this area so would like to hold off
> for a rebased version to review in further detail. But I think the concept
> is valid.

Sure and thanks for making time for the review!

-Joel