2016-04-18 05:51:33

by Wanpeng Li

Subject: [PATCH] sched/cpufreq: don't trigger cpufreq update w/o real rt/deadline tasks running

Sometimes update_curr() is called w/o tasks actually running; this is
captured by:
u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
We should not trigger a cpufreq update in this case for the rt/deadline
classes, and this patch fixes it.

Signed-off-by: Wanpeng Li <[email protected]>
---
kernel/sched/deadline.c | 8 ++++----
kernel/sched/rt.c | 8 ++++----
2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index affd97e..8f9b5af 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -717,10 +717,6 @@ static void update_curr_dl(struct rq *rq)
 	if (!dl_task(curr) || !on_dl_rq(dl_se))
 		return;
 
-	/* Kick cpufreq (see the comment in linux/cpufreq.h). */
-	if (cpu_of(rq) == smp_processor_id())
-		cpufreq_trigger_update(rq_clock(rq));
-
 	/*
 	 * Consumed budget is computed considering the time as
 	 * observed by schedulable tasks (excluding time spent
@@ -736,6 +732,10 @@ static void update_curr_dl(struct rq *rq)
 		return;
 	}
 
+	/* kick cpufreq (see the comment in linux/cpufreq.h). */
+	if (cpu_of(rq) == smp_processor_id())
+		cpufreq_trigger_update(rq_clock(rq));
+
 	schedstat_set(curr->se.statistics.exec_max,
 		      max(curr->se.statistics.exec_max, delta_exec));

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index c41ea7a..19e1306 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -953,14 +953,14 @@ static void update_curr_rt(struct rq *rq)
 	if (curr->sched_class != &rt_sched_class)
 		return;
 
-	/* Kick cpufreq (see the comment in linux/cpufreq.h). */
-	if (cpu_of(rq) == smp_processor_id())
-		cpufreq_trigger_update(rq_clock(rq));
-
 	delta_exec = rq_clock_task(rq) - curr->se.exec_start;
 	if (unlikely((s64)delta_exec <= 0))
 		return;
 
+	/* Kick cpufreq (see the comment in linux/cpufreq.h). */
+	if (cpu_of(rq) == smp_processor_id())
+		cpufreq_trigger_update(rq_clock(rq));
+
 	schedstat_set(curr->se.statistics.exec_max,
 		      max(curr->se.statistics.exec_max, delta_exec));

--
1.9.1


2016-04-20 00:29:47

by Rafael J. Wysocki

Subject: Re: [PATCH] sched/cpufreq: don't trigger cpufreq update w/o real rt/deadline tasks running

On Monday, April 18, 2016 01:51:24 PM Wanpeng Li wrote:
> Sometimes update_curr() is called w/o tasks actually running, it is
> captured by:
> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> We should not trigger cpufreq update in this case for rt/deadline
> classes, and this patch fix it.
>
> Signed-off-by: Wanpeng Li <[email protected]>

The signed-off-by tag should agree with the From: header. One way to achieve
that is to add an extra From: line at the start of the changelog.

That said, this looks like a good catch that should go into 4.6 to me.

Peter, what do you think?

> ---
> kernel/sched/deadline.c | 8 ++++----
> kernel/sched/rt.c | 8 ++++----
> 2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index affd97e..8f9b5af 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -717,10 +717,6 @@ static void update_curr_dl(struct rq *rq)
> if (!dl_task(curr) || !on_dl_rq(dl_se))
> return;
>
> - /* Kick cpufreq (see the comment in linux/cpufreq.h). */
> - if (cpu_of(rq) == smp_processor_id())
> - cpufreq_trigger_update(rq_clock(rq));
> -
> /*
> * Consumed budget is computed considering the time as
> * observed by schedulable tasks (excluding time spent
> @@ -736,6 +732,10 @@ static void update_curr_dl(struct rq *rq)
> return;
> }
>
> + /* kick cpufreq (see the comment in linux/cpufreq.h). */
> + if (cpu_of(rq) == smp_processor_id())
> + cpufreq_trigger_update(rq_clock(rq));
> +
> schedstat_set(curr->se.statistics.exec_max,
> max(curr->se.statistics.exec_max, delta_exec));
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index c41ea7a..19e1306 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -953,14 +953,14 @@ static void update_curr_rt(struct rq *rq)
> if (curr->sched_class != &rt_sched_class)
> return;
>
> - /* Kick cpufreq (see the comment in linux/cpufreq.h). */
> - if (cpu_of(rq) == smp_processor_id())
> - cpufreq_trigger_update(rq_clock(rq));
> -
> delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> if (unlikely((s64)delta_exec <= 0))
> return;
>
> + /* Kick cpufreq (see the comment in linux/cpufreq.h). */
> + if (cpu_of(rq) == smp_processor_id())
> + cpufreq_trigger_update(rq_clock(rq));
> +
> schedstat_set(curr->se.statistics.exec_max,
> max(curr->se.statistics.exec_max, delta_exec));
>
>

2016-04-20 00:48:44

by Wanpeng Li

Subject: Re: [PATCH] sched/cpufreq: don't trigger cpufreq update w/o real rt/deadline tasks running

2016-04-20 8:32 GMT+08:00 Rafael J. Wysocki <[email protected]>:
> On Monday, April 18, 2016 01:51:24 PM Wanpeng Li wrote:
>> Sometimes update_curr() is called w/o tasks actually running, it is
>> captured by:
>> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>> We should not trigger cpufreq update in this case for rt/deadline
>> classes, and this patch fix it.
>>
>> Signed-off-by: Wanpeng Li <[email protected]>
>
> The signed-off-by tag should agree with the From: header. One way to achieve
> that is to add an extra From: line at the start of the changelog.

Thanks for the tip, Rafael. I just sent out v2 to fix it.

Regards,
Wanpeng Li

2016-04-20 14:01:24

by Peter Zijlstra

Subject: Re: [PATCH] sched/cpufreq: don't trigger cpufreq update w/o real rt/deadline tasks running

On Wed, Apr 20, 2016 at 02:32:35AM +0200, Rafael J. Wysocki wrote:
> On Monday, April 18, 2016 01:51:24 PM Wanpeng Li wrote:
> > Sometimes update_curr() is called w/o tasks actually running, it is
> > captured by:
> > u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> > We should not trigger cpufreq update in this case for rt/deadline
> > classes, and this patch fix it.
> >
> > Signed-off-by: Wanpeng Li <[email protected]>
>
> The signed-off-by tag should agree with the From: header. One way to achieve
> that is to add an extra From: line at the start of the changelog.
>
> That said, this looks like a good catch that should go into 4.6 to me.
>
> Peter, what do you think?

I'm confused by the Changelog. *what* ?

2016-04-20 22:24:21

by Wanpeng Li

Subject: Re: [PATCH] sched/cpufreq: don't trigger cpufreq update w/o real rt/deadline tasks running

2016-04-20 22:01 GMT+08:00 Peter Zijlstra <[email protected]>:
> On Wed, Apr 20, 2016 at 02:32:35AM +0200, Rafael J. Wysocki wrote:
>> On Monday, April 18, 2016 01:51:24 PM Wanpeng Li wrote:
>> > Sometimes update_curr() is called w/o tasks actually running, it is
>> > captured by:
>> > u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>> > We should not trigger cpufreq update in this case for rt/deadline
>> > classes, and this patch fix it.
>> >
>> > Signed-off-by: Wanpeng Li <[email protected]>
>>
>> The signed-off-by tag should agree with the From: header. One way to achieve
>> that is to add an extra From: line at the start of the changelog.
>>
>> That said, this looks like a good catch that should go into 4.6 to me.
>>
>> Peter, what do you think?
>
> I'm confused by the Changelog. *what* ?

Sometimes the .update_curr hook is called w/o tasks actually running;
this is captured by:

u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;

We should not trigger a cpufreq update in this case for the rt/deadline
classes, and this patch fixes it.

Regards,
Wanpeng Li

2016-04-20 22:28:42

by Wysocki, Rafael J

Subject: Re: [PATCH] sched/cpufreq: don't trigger cpufreq update w/o real rt/deadline tasks running

On 4/21/2016 12:24 AM, Wanpeng Li wrote:
> 2016-04-20 22:01 GMT+08:00 Peter Zijlstra <[email protected]>:
>> On Wed, Apr 20, 2016 at 02:32:35AM +0200, Rafael J. Wysocki wrote:
>>> On Monday, April 18, 2016 01:51:24 PM Wanpeng Li wrote:
>>>> Sometimes update_curr() is called w/o tasks actually running, it is
>>>> captured by:
>>>> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>>>> We should not trigger cpufreq update in this case for rt/deadline
>>>> classes, and this patch fix it.
>>>>
>>>> Signed-off-by: Wanpeng Li <[email protected]>
>>> The signed-off-by tag should agree with the From: header. One way to achieve
>>> that is to add an extra From: line at the start of the changelog.
>>>
>>> That said, this looks like a good catch that should go into 4.6 to me.
>>>
>>> Peter, what do you think?
>> I'm confused by the Changelog. *what* ?
> Sometimes .update_curr hook is called w/o tasks actually running, it is
> captured by:
>
> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>
> We should not trigger cpufreq update in this case for rt/deadline
> classes, and this patch fix it.

That's what you wrote in the changelog, no need to repeat that.

I guess Peter is asking for more details, though. I actually would like
to get some more details here too. Like an example of when the
situation in question actually happens.

Thanks,
Rafael


2016-04-21 01:09:46

by Wanpeng Li

Subject: Re: [PATCH] sched/cpufreq: don't trigger cpufreq update w/o real rt/deadline tasks running

2016-04-21 6:28 GMT+08:00 Rafael J. Wysocki <[email protected]>:
> On 4/21/2016 12:24 AM, Wanpeng Li wrote:
>>
>> 2016-04-20 22:01 GMT+08:00 Peter Zijlstra <[email protected]>:
>>>
>>> On Wed, Apr 20, 2016 at 02:32:35AM +0200, Rafael J. Wysocki wrote:
>>>>
>>>> On Monday, April 18, 2016 01:51:24 PM Wanpeng Li wrote:
>>>>>
>>>>> Sometimes update_curr() is called w/o tasks actually running, it is
>>>>> captured by:
>>>>> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>>>>> We should not trigger cpufreq update in this case for rt/deadline
>>>>> classes, and this patch fix it.
>>>>>
>>>>> Signed-off-by: Wanpeng Li <[email protected]>
>>>>
>>>> The signed-off-by tag should agree with the From: header. One way to
>>>> achieve
>>>> that is to add an extra From: line at the start of the changelog.
>>>>
>>>> That said, this looks like a good catch that should go into 4.6 to me.
>>>>
>>>> Peter, what do you think?
>>>
>>> I'm confused by the Changelog. *what* ?
>>
>> Sometimes .update_curr hook is called w/o tasks actually running, it is
>> captured by:
>>
>> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>>
>> We should not trigger cpufreq update in this case for rt/deadline
>> classes, and this patch fix it.
>
>
> That's what you wrote in the changelog, no need to repeat that.
>
> I guess Peter is asking for more details, though. I actually would like to
> get some more details here too. Like an example of when the situation in
> question actually happens.

I added a trace print that fires when delta_exec is zero for the rt
class. The output looks like this:

watchdog/5-48 [005] d... 568.449095: update_curr_rt: rt delta_exec is zero
watchdog/5-48 [005] d... 568.449104: <stack trace>
 => pick_next_task_rt
 => __schedule
 => schedule
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
watchdog/5-48 [005] d... 568.449105: update_curr_rt: rt delta_exec is zero
watchdog/5-48 [005] d... 568.449111: <stack trace>
 => put_prev_task_rt
 => pick_next_task_idle
 => __schedule
 => schedule
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
watchdog/6-56 [006] d... 568.510094: update_curr_rt: rt delta_exec is zero
watchdog/6-56 [006] d... 568.510103: <stack trace>
 => pick_next_task_rt
 => __schedule
 => schedule
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
watchdog/6-56 [006] d... 568.510105: update_curr_rt: rt delta_exec is zero
watchdog/6-56 [006] d... 568.510111: <stack trace>
 => put_prev_task_rt
 => pick_next_task_idle
 => __schedule
 => schedule
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
[...]

Regards,
Wanpeng Li

2016-04-21 11:12:23

by Wysocki, Rafael J

Subject: Re: [PATCH] sched/cpufreq: don't trigger cpufreq update w/o real rt/deadline tasks running

On 4/21/2016 3:09 AM, Wanpeng Li wrote:
> 2016-04-21 6:28 GMT+08:00 Rafael J. Wysocki <[email protected]>:
>> On 4/21/2016 12:24 AM, Wanpeng Li wrote:
>>> 2016-04-20 22:01 GMT+08:00 Peter Zijlstra <[email protected]>:
>>>> On Wed, Apr 20, 2016 at 02:32:35AM +0200, Rafael J. Wysocki wrote:
>>>>> On Monday, April 18, 2016 01:51:24 PM Wanpeng Li wrote:
>>>>>> Sometimes update_curr() is called w/o tasks actually running, it is
>>>>>> captured by:
>>>>>> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>>>>>> We should not trigger cpufreq update in this case for rt/deadline
>>>>>> classes, and this patch fix it.
>>>>>>
>>>>>> Signed-off-by: Wanpeng Li <[email protected]>
>>>>> The signed-off-by tag should agree with the From: header. One way to
>>>>> achieve
>>>>> that is to add an extra From: line at the start of the changelog.
>>>>>
>>>>> That said, this looks like a good catch that should go into 4.6 to me.
>>>>>
>>>>> Peter, what do you think?
>>>> I'm confused by the Changelog. *what* ?
>>> Sometimes .update_curr hook is called w/o tasks actually running, it is
>>> captured by:
>>>
>>> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>>>
>>> We should not trigger cpufreq update in this case for rt/deadline
>>> classes, and this patch fix it.
>>
>> That's what you wrote in the changelog, no need to repeat that.
>>
>> I guess Peter is asking for more details, though. I actually would like to
>> get some more details here too. Like an example of when the situation in
>> question actually happens.
> I add a print to print when delta_exec is zero for rt class, something
> like below:
>
> watchdog/5-48 [005] d... 568.449095: update_curr_rt: rt
> delta_exec is zero
> watchdog/5-48 [005] d... 568.449104: <stack trace>
> => pick_next_task_rt
> => __schedule
> => schedule
> => smpboot_thread_fn
> => kthread
> => ret_from_fork
> watchdog/5-48 [005] d... 568.449105: update_curr_rt: rt
> delta_exec is zero
> watchdog/5-48 [005] d... 568.449111: <stack trace>
> => put_prev_task_rt
> => pick_next_task_idle
> => __schedule
> => schedule
> => smpboot_thread_fn
> => kthread
> => ret_from_fork
> watchdog/6-56 [006] d... 568.510094: update_curr_rt: rt
> delta_exec is zero
> watchdog/6-56 [006] d... 568.510103: <stack trace>
> => pick_next_task_rt
> => __schedule
> => schedule
> => smpboot_thread_fn
> => kthread
> => ret_from_fork
> watchdog/6-56 [006] d... 568.510105: update_curr_rt: rt
> delta_exec is zero
> watchdog/6-56 [006] d... 568.510111: <stack trace>
> => put_prev_task_rt
> => pick_next_task_idle
> => __schedule
> => schedule
> => smpboot_thread_fn
> => kthread
> => ret_from_fork
> [...]

And the statement in your changelog follows from this I suppose. How
does it follow, exactly?

Thanks,
Rafael

2016-04-21 12:13:02

by Wanpeng Li

Subject: Re: [PATCH] sched/cpufreq: don't trigger cpufreq update w/o real rt/deadline tasks running

2016-04-21 19:11 GMT+08:00 Rafael J. Wysocki <[email protected]>:
> On 4/21/2016 3:09 AM, Wanpeng Li wrote:
>>
>> 2016-04-21 6:28 GMT+08:00 Rafael J. Wysocki <[email protected]>:
>>>
>>> On 4/21/2016 12:24 AM, Wanpeng Li wrote:
>>>>
>>>> 2016-04-20 22:01 GMT+08:00 Peter Zijlstra <[email protected]>:
>>>>>
>>>>> On Wed, Apr 20, 2016 at 02:32:35AM +0200, Rafael J. Wysocki wrote:
>>>>>>
>>>>>> On Monday, April 18, 2016 01:51:24 PM Wanpeng Li wrote:
>>>>>>>
>>>>>>> Sometimes update_curr() is called w/o tasks actually running, it is
>>>>>>> captured by:
>>>>>>> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>>>>>>> We should not trigger cpufreq update in this case for rt/deadline
>>>>>>> classes, and this patch fix it.
>>>>>>>
>>>>>>> Signed-off-by: Wanpeng Li <[email protected]>
>>>>>>
>>>>>> The signed-off-by tag should agree with the From: header. One way to
>>>>>> achieve
>>>>>> that is to add an extra From: line at the start of the changelog.
>>>>>>
>>>>>> That said, this looks like a good catch that should go into 4.6 to me.
>>>>>>
>>>>>> Peter, what do you think?
>>>>>
>>>>> I'm confused by the Changelog. *what* ?
>>>>
>>>> Sometimes .update_curr hook is called w/o tasks actually running, it is
>>>> captured by:
>>>>
>>>> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>>>>
>>>> We should not trigger cpufreq update in this case for rt/deadline
>>>> classes, and this patch fix it.
>>>
>>>
>>> That's what you wrote in the changelog, no need to repeat that.
>>>
>>> I guess Peter is asking for more details, though. I actually would like
>>> to
>>> get some more details here too. Like an example of when the situation in
>>> question actually happens.
>>
>> I add a print to print when delta_exec is zero for rt class, something
>> like below:
>>
>> watchdog/5-48 [005] d... 568.449095: update_curr_rt: rt
>> delta_exec is zero
>> watchdog/5-48 [005] d... 568.449104: <stack trace>
>> => pick_next_task_rt
>> => __schedule
>> => schedule
>> => smpboot_thread_fn
>> => kthread
>> => ret_from_fork
>> watchdog/5-48 [005] d... 568.449105: update_curr_rt: rt
>> delta_exec is zero
>> watchdog/5-48 [005] d... 568.449111: <stack trace>
>> => put_prev_task_rt
>> => pick_next_task_idle
>> => __schedule
>> => schedule
>> => smpboot_thread_fn
>> => kthread
>> => ret_from_fork
>> watchdog/6-56 [006] d... 568.510094: update_curr_rt: rt
>> delta_exec is zero
>> watchdog/6-56 [006] d... 568.510103: <stack trace>
>> => pick_next_task_rt
>> => __schedule
>> => schedule
>> => smpboot_thread_fn
>> => kthread
>> => ret_from_fork
>> watchdog/6-56 [006] d... 568.510105: update_curr_rt: rt
>> delta_exec is zero
>> watchdog/6-56 [006] d... 568.510111: <stack trace>
>> => put_prev_task_rt
>> => pick_next_task_idle
>> => __schedule
>> => schedule
>> => smpboot_thread_fn
>> => kthread
>> => ret_from_fork
>> [...]
>
>
> And the statement in your changelog follows from this I suppose. How does it
> follow, exactly?

For example, rt task A is going to sleep and rt task B is the next
candidate to run.

__schedule()
  -> deactivate_task(A, DEQUEUE_SLEEP)
    -> dequeue_task_rt()
      -> update_curr_rt()
        -> cpufreq_trigger_update()
        -> delta_exec = rq_clock_task(rq) - curr->se.exec_start;
  [...]
  -> pick_next_task_rt()
    -> update_curr_rt()    => rq->curr is still A at this point
      -> cpufreq_trigger_update()
      -> delta_exec = rq_clock_task(rq) - curr->se.exec_start;
         => delta_exec == 0; A did not actually run between these two updates
  if (likely(prev != next)) {
    rq->curr = B;
    [...]
  }
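
To make the zero delta concrete, here is a minimal userspace sketch of
the accounting step. It is not kernel code: struct model_rq, struct
model_task and both functions are hypothetical stand-ins, and it assumes
that the task clock does not advance between the dequeue path and the
pick path within one pass through __schedule().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's task and runqueue fields. */
struct model_task {
	uint64_t exec_start;		/* clock value at the last accounting update */
};

struct model_rq {
	uint64_t clock_task;		/* models rq_clock_task(rq) */
	struct model_task *curr;	/* models rq->curr */
};

/*
 * Models the accounting part of update_curr_rt(): compute the time
 * elapsed since the last update and, if any time passed, restart the
 * measurement window. Returns the delta that was observed.
 */
static int64_t model_update_curr_rt(struct model_rq *rq)
{
	int64_t delta_exec;

	assert(rq && rq->curr);

	delta_exec = (int64_t)(rq->clock_task - rq->curr->exec_start);
	if (delta_exec <= 0)
		return delta_exec;	/* nothing new to account */

	rq->curr->exec_start = rq->clock_task;
	return delta_exec;
}

/*
 * Models one pass through the scheduler for a task that goes to sleep:
 * the dequeue path and the pick path both update the current task, and
 * (by assumption) the clock is the same for both calls.
 */
static void model_schedule_sleep(struct model_rq *rq,
				 int64_t *first, int64_t *second)
{
	*first = model_update_curr_rt(rq);	/* via dequeue_task_rt() */
	*second = model_update_curr_rt(rq);	/* via pick_next_task_rt() */
}
```

With exec_start at 100 and the clock at 150, the first call accounts 50
units and moves exec_start to 150, so the second call sees a delta of 0.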

Regards,
Wanpeng Li

2016-04-21 12:25:18

by Wanpeng Li

Subject: Re: [PATCH] sched/cpufreq: don't trigger cpufreq update w/o real rt/deadline tasks running

2016-04-21 20:12 GMT+08:00 Wanpeng Li <[email protected]>:
> 2016-04-21 19:11 GMT+08:00 Rafael J. Wysocki <[email protected]>:
>> On 4/21/2016 3:09 AM, Wanpeng Li wrote:
>>>
>>> 2016-04-21 6:28 GMT+08:00 Rafael J. Wysocki <[email protected]>:
>>>>
>>>> On 4/21/2016 12:24 AM, Wanpeng Li wrote:
>>>>>
>>>>> 2016-04-20 22:01 GMT+08:00 Peter Zijlstra <[email protected]>:
>>>>>>
>>>>>> On Wed, Apr 20, 2016 at 02:32:35AM +0200, Rafael J. Wysocki wrote:
>>>>>>>
>>>>>>> On Monday, April 18, 2016 01:51:24 PM Wanpeng Li wrote:
>>>>>>>>
>>>>>>>> Sometimes update_curr() is called w/o tasks actually running, it is
>>>>>>>> captured by:
>>>>>>>> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>>>>>>>> We should not trigger cpufreq update in this case for rt/deadline
>>>>>>>> classes, and this patch fix it.
>>>>>>>>
>>>>>>>> Signed-off-by: Wanpeng Li <[email protected]>
>>>>>>>
>>>>>>> The signed-off-by tag should agree with the From: header. One way to
>>>>>>> achieve
>>>>>>> that is to add an extra From: line at the start of the changelog.
>>>>>>>
>>>>>>> That said, this looks like a good catch that should go into 4.6 to me.
>>>>>>>
>>>>>>> Peter, what do you think?
>>>>>>
>>>>>> I'm confused by the Changelog. *what* ?
>>>>>
>>>>> Sometimes .update_curr hook is called w/o tasks actually running, it is
>>>>> captured by:
>>>>>
>>>>> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>>>>>
>>>>> We should not trigger cpufreq update in this case for rt/deadline
>>>>> classes, and this patch fix it.
>>>>
>>>>
>>>> That's what you wrote in the changelog, no need to repeat that.
>>>>
>>>> I guess Peter is asking for more details, though. I actually would like
>>>> to
>>>> get some more details here too. Like an example of when the situation in
>>>> question actually happens.
>>>
>>> I add a print to print when delta_exec is zero for rt class, something
>>> like below:
>>>
>>> watchdog/5-48 [005] d... 568.449095: update_curr_rt: rt
>>> delta_exec is zero
>>> watchdog/5-48 [005] d... 568.449104: <stack trace>
>>> => pick_next_task_rt
>>> => __schedule
>>> => schedule
>>> => smpboot_thread_fn
>>> => kthread
>>> => ret_from_fork
>>> watchdog/5-48 [005] d... 568.449105: update_curr_rt: rt
>>> delta_exec is zero
>>> watchdog/5-48 [005] d... 568.449111: <stack trace>
>>> => put_prev_task_rt
>>> => pick_next_task_idle
>>> => __schedule
>>> => schedule
>>> => smpboot_thread_fn
>>> => kthread
>>> => ret_from_fork
>>> watchdog/6-56 [006] d... 568.510094: update_curr_rt: rt
>>> delta_exec is zero
>>> watchdog/6-56 [006] d... 568.510103: <stack trace>
>>> => pick_next_task_rt
>>> => __schedule
>>> => schedule
>>> => smpboot_thread_fn
>>> => kthread
>>> => ret_from_fork
>>> watchdog/6-56 [006] d... 568.510105: update_curr_rt: rt
>>> delta_exec is zero
>>> watchdog/6-56 [006] d... 568.510111: <stack trace>
>>> => put_prev_task_rt
>>> => pick_next_task_idle
>>> => __schedule
>>> => schedule
>>> => smpboot_thread_fn
>>> => kthread
>>> => ret_from_fork
>>> [...]
>>
>>
>> And the statement in your changelog follows from this I suppose. How does it
>> follow, exactly?
>
> For example, rt task A will go to sleep, an rt task B is the next
> candidate to run.
>
> __schedule()
> -> deactivate_task(A, DEQUEUE_SLEEP)
> -> dequeue_task_rt()
> -> update_curr_rt()
> -> cpufreq_trigger_update()
> -> delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> [...]
> -> pick_next_task_rt()
> -> update_curr_rt() => rq->curr is still A currently
> -> cpufreq_trigger_update()
> -> delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> => delta == 0, actually A is not running between these two updates
> if (likely(prev != next)) {
> rq->curr = B;
> [...]
> }

Actually, I suspect that there is another cpufreq update w/ delta == 0
due to the current implementation of pick_next_task_rt():

    if (prev->sched_class == &rt_sched_class)
        update_curr(rq);      => rq->curr is still A at this point
    [...]
    put_prev_task(rq, prev);
      -> update_curr(rq);     => rq->curr is still A at this point

Regards,
Wanpeng Li

2016-04-21 12:33:57

by Peter Zijlstra

Subject: Re: [PATCH] sched/cpufreq: don't trigger cpufreq update w/o real rt/deadline tasks running

On Thu, Apr 21, 2016 at 09:09:43AM +0800, Wanpeng Li wrote:
> >> Sometimes .update_curr hook is called w/o tasks actually running, it is
> >> captured by:
> >>
> >> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> >>
> >> We should not trigger cpufreq update in this case for rt/deadline
> >> classes, and this patch fix it.

> I add a print to print when delta_exec is zero for rt class, something

So it's zero, so what?

> like below:

> watchdog/5-48 [005] d... 568.449105: update_curr_rt: rt
> delta_exec is zero
> watchdog/5-48 [005] d... 568.449111: <stack trace>
> => put_prev_task_rt
> => pick_next_task_idle

So we'll go idle, but as of this point we're still running the rt task.

So your Changelog is actively wrong, the tasks _are_ still running,
albeit not for very much longer.

2016-04-21 13:33:50

by Wanpeng Li

Subject: Re: [PATCH] sched/cpufreq: don't trigger cpufreq update w/o real rt/deadline tasks running

Hi Peterz,
2016-04-21 20:33 GMT+08:00 Peter Zijlstra <[email protected]>:
> On Thu, Apr 21, 2016 at 09:09:43AM +0800, Wanpeng Li wrote:
>> >> Sometimes .update_curr hook is called w/o tasks actually running, it is
>> >> captured by:
>> >>
>> >> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>> >>
>> >> We should not trigger cpufreq update in this case for rt/deadline
>> >> classes, and this patch fix it.
>
>> I add a print to print when delta_exec is zero for rt class, something
>
> So its zero, so what?
>
>> like below:
>
>> watchdog/5-48 [005] d... 568.449105: update_curr_rt: rt
>> delta_exec is zero
>> watchdog/5-48 [005] d... 568.449111: <stack trace>
>> => put_prev_task_rt
>> => pick_next_task_idle
>
> So we'll go idle, but as of this point we're still running the rt task.
>
> So your Changelog is actively wrong, the tasks _are_ still running,
> albeit not for very much longer.

Thanks for pointing that out. I will update the changelog as we
discussed on IRC. :-)

Regards,
Wanpeng Li

2016-04-21 17:07:55

by Rafael J. Wysocki

Subject: Re: [PATCH] sched/cpufreq: don't trigger cpufreq update w/o real rt/deadline tasks running

On Thu, Apr 21, 2016 at 2:33 PM, Peter Zijlstra <[email protected]> wrote:
> On Thu, Apr 21, 2016 at 09:09:43AM +0800, Wanpeng Li wrote:
>> >> Sometimes .update_curr hook is called w/o tasks actually running, it is
>> >> captured by:
>> >>
>> >> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>> >>
>> >> We should not trigger cpufreq update in this case for rt/deadline
>> >> classes, and this patch fix it.
>
>> I add a print to print when delta_exec is zero for rt class, something
>
> So its zero, so what?
>
>> like below:
>
>> watchdog/5-48 [005] d... 568.449105: update_curr_rt: rt
>> delta_exec is zero
>> watchdog/5-48 [005] d... 568.449111: <stack trace>
>> => put_prev_task_rt
>> => pick_next_task_idle
>
> So we'll go idle, but as of this point we're still running the rt task.

Skipping the update in that case might be the right thing to do, though.

It doesn't matter in 4.6-rc, because the current governors don't use
util/max anyway, so they just get an extra call they can use to
evaluate things.

However, it matters for schedutil, because it will (over)react to the
special util/max combination then. So this looks like a change to
make in 4.7.

2016-04-21 17:17:25

by Peter Zijlstra

Subject: Re: [PATCH] sched/cpufreq: don't trigger cpufreq update w/o real rt/deadline tasks running

On Thu, Apr 21, 2016 at 07:07:51PM +0200, Rafael J. Wysocki wrote:
> On Thu, Apr 21, 2016 at 2:33 PM, Peter Zijlstra <[email protected]> wrote:
> > On Thu, Apr 21, 2016 at 09:09:43AM +0800, Wanpeng Li wrote:
> >> >> Sometimes .update_curr hook is called w/o tasks actually running, it is
> >> >> captured by:
> >> >>
> >> >> u64 delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> >> >>
> >> >> We should not trigger cpufreq update in this case for rt/deadline
> >> >> classes, and this patch fix it.
> >
> >> I add a print to print when delta_exec is zero for rt class, something
> >
> > So its zero, so what?
> >
> >> like below:
> >
> >> watchdog/5-48 [005] d... 568.449105: update_curr_rt: rt
> >> delta_exec is zero
> >> watchdog/5-48 [005] d... 568.449111: <stack trace>
> >> => put_prev_task_rt
> >> => pick_next_task_idle
> >
> > So we'll go idle, but as of this point we're still running the rt task.
>
> Skipping the update in that case might be the right thing to do, though.

It is; the patch looks fine, but the Changelog is entirely
misleading/wrong.

It's not because the task isn't running; it is. It's because we end up
calling update_curr() multiple times, and bailing when nothing changed is
indeed the right thing.