2014-10-31 23:21:46

by Wanpeng Li

Subject: [PATCH RFC] sched/deadline: support dl task migrate during cpu hotplug

Hi all,

I observe that dl tasks can't be migrated to other cpus during cpu hotplug; in
addition, the task may or may not run again when the cpu is added back. The root
cause I found is that a dl task is throttled and removed from the dl rq after
consuming all of its budget, so the stop task can't pick it up from the dl rq
and migrate it to other cpus during hotplug.

So I tried two methods.

- Add the throttled dl sched_entity to a throttled_list; the list is traversed
during cpu hotplug, the dl sched_entity is picked and enqueued, and the stop
task then picks it up and migrates it. However, the dl sched_entity is throttled
again before the stop task runs, because of the path below. This path sets
rq->online to 0, so set_rq_offline() is not called again from migration_call().

Call Trace:
[...] rq_offline_dl+0x44/0x66
[...] set_rq_offline+0x29/0x54
[...] rq_attach_root+0x3f/0xb7
[...] cpu_attach_domain+0x1c7/0x354
[...] build_sched_domains+0x295/0x304
[...] partition_sched_domains+0x26a/0x2e6
[...] ? emulator_write_gpr+0x27/0x27 [kvm]
[...] cpuset_update_active_cpus+0x12/0x2c
[...] cpuset_cpu_inactive+0x1b/0x38
[...] notifier_call_chain+0x32/0x5e
[...] __raw_notifier_call_chain+0x9/0xb
[...] __cpu_notify+0x1b/0x2d
[...] _cpu_down+0x81/0x22a
[...] cpu_down+0x28/0x35
[...] cpu_subsys_offline+0xf/0x11
[...] device_offline+0x78/0xa8
[...] online_store+0x48/0x69
[...] ? kernfs_fop_write+0x61/0x129
[...] dev_attr_store+0x1b/0x1d
[...] sysfs_kf_write+0x37/0x39
[...] kernfs_fop_write+0xe9/0x129
[...] vfs_write+0xc6/0x19e
[...] SyS_write+0x4b/0x8f
[...] system_call_fastpath+0x16/0x1b


- Method two differs in that the dl sched_entity is not throttled if the rq is
offline; instead it is replenished in update_curr_dl(). However,
echo 0 > /sys/devices/system/cpu/cpuN/online then hangs.

Juri, your suggestions are very welcome. ;-)

Note: This patch is just a proposal and still can't successfully migrate
dl tasks during cpu hotplug.

Signed-off-by: Wanpeng Li <[email protected]>
---
include/linux/sched.h | 2 ++
kernel/sched/deadline.c | 22 +++++++++++++++++++++-
kernel/sched/sched.h | 3 +++
3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4400ddc..bd71f19 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1253,6 +1253,8 @@ struct sched_dl_entity {
* own bandwidth to be enforced, thus we need one timer per task.
*/
struct hrtimer dl_timer;
+ struct list_head throttled_node;
+ int on_list;
};

union rcu_special {
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 2e31a30..d6d6b71 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -80,6 +80,7 @@ void init_dl_rq(struct dl_rq *dl_rq, struct rq *rq)
dl_rq->dl_nr_migratory = 0;
dl_rq->overloaded = 0;
dl_rq->pushable_dl_tasks_root = RB_ROOT;
+ INIT_LIST_HEAD(&dl_rq->throttled_list);
#else
init_dl_bw(&dl_rq->dl_bw);
#endif
@@ -538,6 +539,10 @@ again:
update_rq_clock(rq);
dl_se->dl_throttled = 0;
dl_se->dl_yielded = 0;
+ if (dl_se->on_list) {
+ list_del(&dl_se->throttled_node);
+ dl_se->on_list = 0;
+ }
if (task_on_rq_queued(p)) {
enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
if (dl_task(rq->curr))
@@ -636,8 +641,12 @@ static void update_curr_dl(struct rq *rq)
dl_se->runtime -= delta_exec;
if (dl_runtime_exceeded(rq, dl_se)) {
__dequeue_task_dl(rq, curr, 0);
- if (likely(start_dl_timer(dl_se, curr->dl.dl_boosted)))
+ if (rq->online && likely(start_dl_timer(dl_se, curr->dl.dl_boosted))) {
dl_se->dl_throttled = 1;
+ dl_se->on_list = 1;
+ list_add(&dl_se->throttled_node,
+ &rq->dl.throttled_list);
+ }
else
enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);

@@ -1593,9 +1602,20 @@ static void rq_online_dl(struct rq *rq)
/* Assumes rq->lock is held */
static void rq_offline_dl(struct rq *rq)
{
+ struct task_struct *p, *n;
+
if (rq->dl.overloaded)
dl_clear_overload(rq);

+ /* Make sched_dl_entity available for pick_next_task() */
+ list_for_each_entry_safe(p, n, &rq->dl.throttled_list, dl.throttled_node) {
+ p->dl.dl_throttled = 0;
+ hrtimer_cancel(&p->dl.dl_timer);
+ p->dl.runtime = p->dl.dl_runtime;
+ if (task_on_rq_queued(p))
+ enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
+ }
+
cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec3917c..8f95036 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -482,6 +482,9 @@ struct dl_rq {
*/
struct rb_root pushable_dl_tasks_root;
struct rb_node *pushable_dl_tasks_leftmost;
+
+ struct list_head throttled_list;
+
#else
struct dl_bw dl_bw;
#endif
--
1.9.1


2014-11-03 02:24:32

by Wanpeng Li

Subject: Re: [PATCH RFC] sched/deadline: support dl task migrate during cpu hotplug

Hi Kirill,
On Fri, Oct 31, 2014 at 12:20:29PM +0300, Kirill Tkhai wrote:
>On Fri, 31/10/2014 at 15:28 +0800, Wanpeng Li wrote:
>> Hi all,
>>
>> I observe that dl tasks can't be migrated to other cpus during cpu hotplug; in
>> addition, the task may or may not run again when the cpu is added back. The root
>> cause I found is that a dl task is throttled and removed from the dl rq after
>> consuming all of its budget, so the stop task can't pick it up from the dl rq
>> and migrate it to other cpus during hotplug.
>>
>> So I tried two methods.
>>
>> - Add the throttled dl sched_entity to a throttled_list; the list is traversed
>> during cpu hotplug, the dl sched_entity is picked and enqueued, and the stop
>> task then picks it up and migrates it. However, the dl sched_entity is throttled
>> again before the stop task runs, because of the path below. This path sets
>> rq->online to 0, so set_rq_offline() is not called again from migration_call().
>>
>> Call Trace:
>> [...] rq_offline_dl+0x44/0x66
>> [...] set_rq_offline+0x29/0x54
>> [...] rq_attach_root+0x3f/0xb7
>> [...] cpu_attach_domain+0x1c7/0x354
>> [...] build_sched_domains+0x295/0x304
>> [...] partition_sched_domains+0x26a/0x2e6
>> [...] ? emulator_write_gpr+0x27/0x27 [kvm]
>> [...] cpuset_update_active_cpus+0x12/0x2c
>> [...] cpuset_cpu_inactive+0x1b/0x38
>> [...] notifier_call_chain+0x32/0x5e
>> [...] __raw_notifier_call_chain+0x9/0xb
>> [...] __cpu_notify+0x1b/0x2d
>> [...] _cpu_down+0x81/0x22a
>> [...] cpu_down+0x28/0x35
>> [...] cpu_subsys_offline+0xf/0x11
>> [...] device_offline+0x78/0xa8
>> [...] online_store+0x48/0x69
>> [...] ? kernfs_fop_write+0x61/0x129
>> [...] dev_attr_store+0x1b/0x1d
>> [...] sysfs_kf_write+0x37/0x39
>> [...] kernfs_fop_write+0xe9/0x129
>> [...] vfs_write+0xc6/0x19e
>> [...] SyS_write+0x4b/0x8f
>> [...] system_call_fastpath+0x16/0x1b
>>
>>
>> - Method two differs in that the dl sched_entity is not throttled if the rq is
>> offline; instead it is replenished in update_curr_dl(). However,
>> echo 0 > /sys/devices/system/cpu/cpuN/online then hangs.
>>
>> Juri, your suggestions are very welcome. ;-)
>>
>> Note: This patch is just a proposal and still can't successfully migrate
>> dl tasks during cpu hotplug.
>>
>> Signed-off-by: Wanpeng Li <[email protected]>
>> ---
>> include/linux/sched.h | 2 ++
>> kernel/sched/deadline.c | 22 +++++++++++++++++++++-
>> kernel/sched/sched.h | 3 +++
>> 3 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 4400ddc..bd71f19 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1253,6 +1253,8 @@ struct sched_dl_entity {
>> * own bandwidth to be enforced, thus we need one timer per task.
>> */
>> struct hrtimer dl_timer;
>> + struct list_head throttled_node;
>> + int on_list;
>
>Get rid of on_list. It's better to check list_empty(&dl->throttled_node)
>instead. Of course, you should change list_del() to list_del_init() for this.

Agreed.

>
>> };
>>
>> union rcu_special {
>> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>> index 2e31a30..d6d6b71 100644
>> --- a/kernel/sched/deadline.c
>> +++ b/kernel/sched/deadline.c
>> @@ -80,6 +80,7 @@ void init_dl_rq(struct dl_rq *dl_rq, struct rq *rq)
>> dl_rq->dl_nr_migratory = 0;
>> dl_rq->overloaded = 0;
>> dl_rq->pushable_dl_tasks_root = RB_ROOT;
>> + INIT_LIST_HEAD(&dl_rq->throttled_list);
>> #else
>> init_dl_bw(&dl_rq->dl_bw);
>> #endif
>> @@ -538,6 +539,10 @@ again:
>> update_rq_clock(rq);
>> dl_se->dl_throttled = 0;
>> dl_se->dl_yielded = 0;
>> + if (dl_se->on_list) {
>> + list_del(&dl_se->throttled_node);
>> + dl_se->on_list = 0;
>> + }
>> if (task_on_rq_queued(p)) {
>> enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
>> if (dl_task(rq->curr))
>> @@ -636,8 +641,12 @@ static void update_curr_dl(struct rq *rq)
>> dl_se->runtime -= delta_exec;
>> if (dl_runtime_exceeded(rq, dl_se)) {
>> __dequeue_task_dl(rq, curr, 0);
>> - if (likely(start_dl_timer(dl_se, curr->dl.dl_boosted)))
>> + if (rq->online && likely(start_dl_timer(dl_se, curr->dl.dl_boosted))) {
>
>Why is this check for rq->online necessary?

I will remove it.

>
>> dl_se->dl_throttled = 1;
>> + dl_se->on_list = 1;
>> + list_add(&dl_se->throttled_node,
>> + &rq->dl.throttled_list);
>
>Alignment is wrong.

Agreed.

>
>> + }
>> else
>> enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
>>
>> @@ -1593,9 +1602,20 @@ static void rq_online_dl(struct rq *rq)
>> /* Assumes rq->lock is held */
>> static void rq_offline_dl(struct rq *rq)
>> {
>> + struct task_struct *p, *n;
>> +
>> if (rq->dl.overloaded)
>> dl_clear_overload(rq);
>>
>> + /* Make sched_dl_entity available for pick_next_task() */
>> + list_for_each_entry_safe(p, n, &rq->dl.throttled_list, dl.throttled_node) {
>> + p->dl.dl_throttled = 0;
>> + hrtimer_cancel(&p->dl.dl_timer);
>
>Deadlock is possible here. You're holding rq->lock and want to cancel timer handler,
>which is waiting for your rq->lock.

So what's your idea to handle this?

>
>> + p->dl.runtime = p->dl.dl_runtime;
>> + if (task_on_rq_queued(p))
>> + enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
>> + }
>> +
>> cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
>> }
>>
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index ec3917c..8f95036 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -482,6 +482,9 @@ struct dl_rq {
>> */
>> struct rb_root pushable_dl_tasks_root;
>> struct rb_node *pushable_dl_tasks_leftmost;
>> +
>> + struct list_head throttled_list;
>> +
>> #else
>> struct dl_bw dl_bw;
>> #endif
>
>What about situations where the task changes its sched_class?

I haven't considered this yet; your suggestions are very welcome. ;-)

Regards,
Wanpeng Li

2014-11-03 02:37:31

by Wanpeng Li

Subject: Re: [PATCH RFC] sched/deadline: support dl task migrate during cpu hotplug

Hi Juri,
On Fri, Oct 31, 2014 at 11:42:23AM +0000, Juri Lelli wrote:
>Hi,
>
>On 31/10/14 07:28, Wanpeng Li wrote:
>> Hi all,
>>
>> I observe that dl task can't be migrated to other cpus during cpu hotplug, in
>> addition, task may/may not be running again if cpu is added back. The root cause
>
>Can you share more information about this? Do you have a test I can run
>on my box?

schedtool -E -t 50000:100000 -e ./test
Actually, test is just a simple for loop. Then observe which cpu the test
task is on.
echo 0 > /sys/devices/system/cpu/cpuN/online
In my observation the task may or may not run again when the cpu is added back;
in addition, the task is not migrated to other cpus after the cpu is logically
removed.

Regards,
Wanpeng Li


2014-11-04 00:18:36

by Wanpeng Li

Subject: Re: [PATCH RFC] sched/deadline: support dl task migrate during cpu hotplug

Hi Peter,
On Mon, Nov 03, 2014 at 11:41:11AM +0100, Peter Zijlstra wrote:
>On Fri, Oct 31, 2014 at 03:28:17PM +0800, Wanpeng Li wrote:
>> Hi all,
>>
>> I observe that dl tasks can't be migrated to other cpus during cpu hotplug; in
>> addition, the task may or may not run again when the cpu is added back. The root
>> cause I found is that a dl task is throttled and removed from the dl rq after
>> consuming all of its budget, so the stop task can't pick it up from the dl rq
>> and migrate it to other cpus during hotplug.
>>
>> So I tried two methods.
>>
>> - Add the throttled dl sched_entity to a throttled_list; the list is traversed
>> during cpu hotplug, the dl sched_entity is picked and enqueued, and the stop
>> task then picks it up and migrates it. However, the dl sched_entity is throttled
>> again before the stop task runs, because of the path below. This path sets
>> rq->online to 0, so set_rq_offline() is not called again from migration_call().
>>
>
>This seems wrong to me; this screws around with the CBS by replenishing
>too soon.

Agreed.

>
>> @@ -1593,9 +1602,20 @@ static void rq_online_dl(struct rq *rq)
>> /* Assumes rq->lock is held */
>> static void rq_offline_dl(struct rq *rq)
>> {
>> + struct task_struct *p, *n;
>> +
>> if (rq->dl.overloaded)
>> dl_clear_overload(rq);
>>
>> + /* Make sched_dl_entity available for pick_next_task() */
>> + list_for_each_entry_safe(p, n, &rq->dl.throttled_list, dl.throttled_node) {
>> + p->dl.dl_throttled = 0;
>> + hrtimer_cancel(&p->dl.dl_timer);
>> + p->dl.runtime = p->dl.dl_runtime;
>> + if (task_on_rq_queued(p))
>> + enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
>> + }
>> +
>> cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
>> }
>
>
>So what is wrong with making dl_task_timer() deal with it? The timer
>will still fire at the correct time; canceling it and/or otherwise
>messing with the CBS is wrong. Once it fires, all we need to do is
>migrate the task to another cpu (preferably one that is still online, of course
>:-).

Do you mean that what I need to do is push the task to another cpu in dl_task_timer()
if the rq is offline? In addition, what will happen if the dl task can't preempt on
another cpu?

Regards,
Wanpeng Li

2014-11-04 08:32:33

by Peter Zijlstra

Subject: Re: [PATCH RFC] sched/deadline: support dl task migrate during cpu hotplug

On Tue, Nov 04, 2014 at 07:57:48AM +0800, Wanpeng Li wrote:
> On Mon, Nov 03, 2014 at 11:41:11AM +0100, Peter Zijlstra wrote:

> >On Fri, Oct 31, 2014 at 03:28:17PM +0800, Wanpeng Li wrote:
> >So what is wrong with making dl_task_timer() deal with it? The timer
> >will still fire on the correct time, canceling it and or otherwise
> >messing with the CBS is wrong. Once it fires, all we need to do is
> >migrate it to another cpu (preferably one that is still online of course
> >:-).

> Do you mean what I need to do is push the task to another cpu in dl_task_timer()
> if rq is offline?

That does indeed appear to be the sensible fix to me.

> In addition, what will happen if dl task can't preempt on
> another cpu?

So if we find that the rq the task was on is no longer available, we
need to select a new rq, the 'right' rq would be the one running the
latest deadline.

If it cannot preempt the latest (running) deadline, it was not eligible
for running in the first place so no worries, right?

2014-11-04 08:44:33

by Wanpeng Li

Subject: Re: [PATCH RFC] sched/deadline: support dl task migrate during cpu hotplug

Hi Peter,
On Tue, Nov 04, 2014 at 09:32:25AM +0100, Peter Zijlstra wrote:
>On Tue, Nov 04, 2014 at 07:57:48AM +0800, Wanpeng Li wrote:
>> On Mon, Nov 03, 2014 at 11:41:11AM +0100, Peter Zijlstra wrote:
>
>> >On Fri, Oct 31, 2014 at 03:28:17PM +0800, Wanpeng Li wrote:
>> >So what is wrong with making dl_task_timer() deal with it? The timer
>> >will still fire on the correct time, canceling it and or otherwise
>> >messing with the CBS is wrong. Once it fires, all we need to do is
>> >migrate it to another cpu (preferably one that is still online of course
>> >:-).
>
>> Do you mean what I need to do is push the task to another cpu in dl_task_timer()
>> if rq is offline?
>
>That does indeed appear to be the sensible fix to me.
>
>> In addition, what will happen if dl task can't preempt on
>> another cpu?
>
>So if we find that the rq the task was on is no longer available, we
>need to select a new rq, the 'right' rq would be the one running the
>latest deadline.
>
>If it cannot preempt the latest (running) deadline, it was not eligible
>for running in the first place so no worries, right?

I think this will lead to the deadline task not running on any rq any more.
If my understanding is wrong, when will it be picked?

Regards,
Wanpeng Li

2014-11-04 09:20:53

by Juri Lelli

Subject: Re: [PATCH RFC] sched/deadline: support dl task migrate during cpu hotplug

Hi,

On 04/11/14 08:23, Wanpeng Li wrote:
> Hi Peter,
> On Tue, Nov 04, 2014 at 09:32:25AM +0100, Peter Zijlstra wrote:
>> On Tue, Nov 04, 2014 at 07:57:48AM +0800, Wanpeng Li wrote:
>>> On Mon, Nov 03, 2014 at 11:41:11AM +0100, Peter Zijlstra wrote:
>>
>>>> On Fri, Oct 31, 2014 at 03:28:17PM +0800, Wanpeng Li wrote:
>>>> So what is wrong with making dl_task_timer() deal with it? The timer
>>>> will still fire on the correct time, canceling it and or otherwise
>>>> messing with the CBS is wrong. Once it fires, all we need to do is
>>>> migrate it to another cpu (preferably one that is still online of course
>>>> :-).
>>
>>> Do you mean what I need to do is push the task to another cpu in dl_task_timer()
>>> if rq is offline?
>>
>> That does indeed appear to be the sensible fix to me.
>>
>>> In addition, what will happen if dl task can't preempt on
>>> another cpu?
>>
>> So if we find that the rq the task was on is no longer available, we
>> need to select a new rq, the 'right' rq would be the one running the
>> latest deadline.
>>
>> If it cannot preempt the latest (running) deadline, it was not eligible
>> for running in the first place so no worries, right?
>
> I think this will lead to this deadline task cannot running on any rqs any more.
> If my understanding is not right, when it will be picked?
>

I think you can just pick one random rq and move the task there. The
push/pull mechanism should then move it around properly.

Thanks,

- Juri

2014-11-04 10:10:29

by Peter Zijlstra

Subject: Re: [PATCH RFC] sched/deadline: support dl task migrate during cpu hotplug

On Tue, Nov 04, 2014 at 04:23:45PM +0800, Wanpeng Li wrote:
> On Tue, Nov 04, 2014 at 09:32:25AM +0100, Peter Zijlstra wrote:
> >On Tue, Nov 04, 2014 at 07:57:48AM +0800, Wanpeng Li wrote:
> >> On Mon, Nov 03, 2014 at 11:41:11AM +0100, Peter Zijlstra wrote:
> >
> >> >On Fri, Oct 31, 2014 at 03:28:17PM +0800, Wanpeng Li wrote:
> >> >So what is wrong with making dl_task_timer() deal with it? The timer
> >> >will still fire on the correct time, canceling it and or otherwise
> >> >messing with the CBS is wrong. Once it fires, all we need to do is
> >> >migrate it to another cpu (preferably one that is still online of course
> >> >:-).
> >
> >> Do you mean what I need to do is push the task to another cpu in dl_task_timer()
> >> if rq is offline?
> >
> >That does indeed appear to be the sensible fix to me.
> >
> >> In addition, what will happen if dl task can't preempt on
> >> another cpu?
> >
> >So if we find that the rq the task was on is no longer available, we
> >need to select a new rq, the 'right' rq would be the one running the
> >latest deadline.
> >
> >If it cannot preempt the latest (running) deadline, it was not eligible
> >for running in the first place so no worries, right?
>
> I think this will lead to this deadline task cannot running on any rqs any more.
> If my understanding is not right, when it will be picked?

So you unconditionally place it on the rq with the latest deadline. If
it cannot preempt, at least it's on an online cpu. It will get scheduled
whenever its deadline is one of the N earliest, with N being the number of
online CPUs.

2014-11-04 10:51:27

by Wanpeng Li

Subject: Re: [PATCH RFC] sched/deadline: support dl task migrate during cpu hotplug


On 14/11/4 at 6:10 PM, Peter Zijlstra wrote:
> On Tue, Nov 04, 2014 at 04:23:45PM +0800, Wanpeng Li wrote:
>> On Tue, Nov 04, 2014 at 09:32:25AM +0100, Peter Zijlstra wrote:
>>> On Tue, Nov 04, 2014 at 07:57:48AM +0800, Wanpeng Li wrote:
>>>> On Mon, Nov 03, 2014 at 11:41:11AM +0100, Peter Zijlstra wrote:
>>>>> On Fri, Oct 31, 2014 at 03:28:17PM +0800, Wanpeng Li wrote:
>>>>> So what is wrong with making dl_task_timer() deal with it? The timer
>>>>> will still fire on the correct time, canceling it and or otherwise
>>>>> messing with the CBS is wrong. Once it fires, all we need to do is
>>>>> migrate it to another cpu (preferably one that is still online of course
>>>>> :-).
>>>> Do you mean what I need to do is push the task to another cpu in dl_task_timer()
>>>> if rq is offline?
>>> That does indeed appear to be the sensible fix to me.
>>>
>>>> In addition, what will happen if dl task can't preempt on
>>>> another cpu?
>>> So if we find that the rq the task was on is no longer available, we
>>> need to select a new rq, the 'right' rq would be the one running the
>>> latest deadline.
>>>
>>> If it cannot preempt the latest (running) deadline, it was not eligible
>>> for running in the first place so no worries, right?
>> I think this will lead to this deadline task cannot running on any rqs any more.
>> If my understanding is not right, when it will be picked?
> So you unconditionally place it on the rq with the latest deadline. If
> it cannot preempt, at least its on an online cpu. It will get scheduled
> whenever its deadline is one of the N earliest, with N the number of
> online CPUs.

Got it, thanks for everyone's replies. ;-) I will make a patch tomorrow.

Regards,
Wanpeng Li

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2014-11-04 13:31:06

by Wanpeng Li

Subject: Re: [PATCH RFC] sched/deadline: support dl task migrate during cpu hotplug

Hi Peter,
On 14/11/4 at 6:10 PM, Peter Zijlstra wrote:
> On Tue, Nov 04, 2014 at 04:23:45PM +0800, Wanpeng Li wrote:
>> On Tue, Nov 04, 2014 at 09:32:25AM +0100, Peter Zijlstra wrote:
>>> On Tue, Nov 04, 2014 at 07:57:48AM +0800, Wanpeng Li wrote:
>>>> On Mon, Nov 03, 2014 at 11:41:11AM +0100, Peter Zijlstra wrote:
>>>>> On Fri, Oct 31, 2014 at 03:28:17PM +0800, Wanpeng Li wrote:
>>>>> So what is wrong with making dl_task_timer() deal with it? The timer
>>>>> will still fire on the correct time, canceling it and or otherwise
>>>>> messing with the CBS is wrong. Once it fires, all we need to do is
>>>>> migrate it to another cpu (preferably one that is still online of course
>>>>> :-).
>>>> Do you mean what I need to do is push the task to another cpu in dl_task_timer()
>>>> if rq is offline?
>>> That does indeed appear to be the sensible fix to me.
>>>
>>>> In addition, what will happen if dl task can't preempt on
>>>> another cpu?
>>> So if we find that the rq the task was on is no longer available, we
>>> need to select a new rq, the 'right' rq would be the one running the
>>> latest deadline.
>>>
>>> If it cannot preempt the latest (running) deadline, it was not eligible
>>> for running in the first place so no worries, right?
>> I think this will lead to this deadline task cannot running on any rqs any more.
>> If my understanding is not right, when it will be picked?
> So you unconditionally place it on the rq with the latest deadline. If
> it cannot preempt, at least its on an online cpu. It will get scheduled
> whenever its deadline is one of the N earliest, with N the number of
> online CPUs.

Does something like this make sense?

(Will test it tomorrow and send out a formal one)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f3d7776..dac33d1 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -553,6 +553,41 @@ again:
push_dl_task(rq);
#endif
}
+
+ /*
+ * So if we find that the rq the task was on is no longer
+ * available, we need to select a new rq, the 'right' rq
+ * would be the one running the latest deadline.
+ *
+ * If it cannot preempt, at least it's on an online cpu. It
+ * will get scheduled whenever its deadline is one of the N
+ * earliest, with N the number of online CPUs.
+ */
+
+ if (!rq->online) {
+ struct rq *latest_rq = NULL;
+ int cpu;
+ u64 dmin = LONG_MAX;
+
+ for_each_cpu(cpu, &p->cpus_allowed)
+ if (cpu_online(cpu) &&
+ cpu_rq(cpu)->dl.earliest_dl.curr < dmin) {
+ latest_rq = cpu_rq(cpu);
+ dmin = latest_rq->dl.earliest_dl.curr;
+ }
+
+ if (!latest_rq)
+ goto unlock;
+
+ raw_spin_lock(&latest_rq->lock);
+
+ deactivate_task(rq, p, 0);
+ set_task_cpu(p, latest_rq->cpu);
+ activate_task(latest_rq, p, 0);
+
+ raw_spin_unlock(&latest_rq->lock);
+ }
+
unlock:
raw_spin_unlock(&rq->lock);

Regards,
Wanpeng Li


2014-11-04 13:33:52

by Wanpeng Li

Subject: Re: [PATCH RFC] sched/deadline: support dl task migrate during cpu hotplug


On 14/11/4 at 9:30 PM, Wanpeng Li wrote:
> Hi Peter,
>
> If something like this make sense?
>
> (Will test it tomorrow and send out a formal one)
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index f3d7776..dac33d1 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -553,6 +553,41 @@ again:
>  			push_dl_task(rq);
>  #endif
>  	}
> +
> +	/*
> +	 * So if we find that the rq the task was on is no longer
> +	 * available, we need to select a new rq, the 'right' rq
> +	 * would be the one running the latest deadline.
> +	 *
> +	 * If it cannot preempt, at least it's on an online cpu. It
> +	 * will get scheduled whenever its deadline is one of the N
> +	 * earliest, with N the number of online CPUs.
> +	 */
> +
> +	if (!rq->online) {
> +		struct rq *latest_rq = NULL;
> +		int cpu;
> +		u64 dmin = LONG_MAX;
> +
> +		for_each_cpu(cpu, &p->cpus_allowed)
> +			if (cpu_online(cpu) &&
> +			    cpu_rq(cpu)->dl.earliest_dl.curr < dmin) {
> +				latest_rq = cpu_rq(cpu);
> +				dmin = latest_rq->dl.earliest_dl.curr;
> +			}
> +
> +		if (!latest_rq)
> +			goto unlock;
> +
> +		raw_spin_lock(&latest_rq->lock);
> +
> +		deactivate_task(rq, p, 0);
> +		set_task_cpu(p, latest_rq->cpu);
> +		activate_task(latest_rq, p, 0);
> +
> +		raw_spin_unlock(&latest_rq->lock);
> +	}
> +
>  unlock:
>  	raw_spin_unlock(&rq->lock);

Wow, sorry for the messed-up format.

>
> Regards,
> Wanpeng Li
>
>

2014-11-04 15:46:37

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC] sched/deadline: support dl task migrate during cpu hotplug

On Tue, Nov 04, 2014 at 09:30:46PM +0800, Wanpeng Li wrote:


> +	if (!rq->online) {
> +		struct rq *latest_rq = NULL;
> +		int cpu;
> +		u64 dmin = LONG_MAX;
> +
> +		for_each_cpu(cpu, &p->cpus_allowed)
> +			if (cpu_online(cpu) &&
> +			    cpu_rq(cpu)->dl.earliest_dl.curr < dmin) {
> +				latest_rq = cpu_rq(cpu);
> +				dmin = latest_rq->dl.earliest_dl.curr;
> +			}

I would have expected something using find_later_rq(), but I might be
mistaken, I'll let Juri suggest something.

> +
> +		if (!latest_rq)
> +			goto unlock;
> +
> +		raw_spin_lock(&latest_rq->lock);
> +
> +		deactivate_task(rq, p, 0);
> +		set_task_cpu(p, latest_rq->cpu);
> +		activate_task(latest_rq, p, 0);
> +
> +		raw_spin_unlock(&latest_rq->lock);
> +	}
> +
>  unlock:
>  	raw_spin_unlock(&rq->lock);

2014-11-04 15:50:29

by Juri Lelli

[permalink] [raw]
Subject: Re: [PATCH RFC] sched/deadline: support dl task migrate during cpu hotplug

Hi,

On 04/11/14 15:46, Peter Zijlstra wrote:
> On Tue, Nov 04, 2014 at 09:30:46PM +0800, Wanpeng Li wrote:
>
>
>> +	if (!rq->online) {
>> +		struct rq *latest_rq = NULL;
>> +		int cpu;
>> +		u64 dmin = LONG_MAX;
>> +
>> +		for_each_cpu(cpu, &p->cpus_allowed)
>> +			if (cpu_online(cpu) &&
>> +			    cpu_rq(cpu)->dl.earliest_dl.curr < dmin) {
>> +				latest_rq = cpu_rq(cpu);
>> +				dmin = latest_rq->dl.earliest_dl.curr;
>> +			}
>
> I would have expected something using find_later_rq(), but I might be
> mistaken, I'll let Juri suggest something.
>

Yeah, we should be able to reuse something that we already have. I'm
actually sorry that I'm not responsive on this, but I'm really busy on
other things this week. I hope to be able to find some time soon to test
this all.

Thanks,

- Juri

>> +
>> +		if (!latest_rq)
>> +			goto unlock;
>> +
>> +		raw_spin_lock(&latest_rq->lock);
>> +
>> +		deactivate_task(rq, p, 0);
>> +		set_task_cpu(p, latest_rq->cpu);
>> +		activate_task(latest_rq, p, 0);
>> +
>> +		raw_spin_unlock(&latest_rq->lock);
>> +	}
>> +
>>  unlock:
>>  	raw_spin_unlock(&rq->lock);
>

2014-11-05 06:24:18

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH RFC] sched/deadline: support dl task migrate during cpu hotplug

Hi Juri,
On 14/11/4 11:50 PM, Juri Lelli wrote:
> Hi,
>
> On 04/11/14 15:46, Peter Zijlstra wrote:
>> On Tue, Nov 04, 2014 at 09:30:46PM +0800, Wanpeng Li wrote:
>>
>>
>>> +	if (!rq->online) {
>>> +		struct rq *latest_rq = NULL;
>>> +		int cpu;
>>> +		u64 dmin = LONG_MAX;
>>> +
>>> +		for_each_cpu(cpu, &p->cpus_allowed)
>>> +			if (cpu_online(cpu) &&
>>> +			    cpu_rq(cpu)->dl.earliest_dl.curr < dmin) {
>>> +				latest_rq = cpu_rq(cpu);
>>> +				dmin = latest_rq->dl.earliest_dl.curr;
>>> +			}
>> I would have expected something using find_later_rq(), but I might be
>> mistaken, I'll let Juri suggest something.
>>
> Yeah, we should be able to reuse something that we already have. I'm
> actually sorry that I'm not responsive on this, but I'm really busy on
> other things this week. I hope to be able to find some time soon to test
> this all.

I have time to do it; your proposal would be very welcome. ;-)

Regards,
Wanpeng Li

>
> Thanks,
>
> - Juri
>
>>> +
>>> +		if (!latest_rq)
>>> +			goto unlock;
>>> +
>>> +		raw_spin_lock(&latest_rq->lock);
>>> +
>>> +		deactivate_task(rq, p, 0);
>>> +		set_task_cpu(p, latest_rq->cpu);
>>> +		activate_task(latest_rq, p, 0);
>>> +
>>> +		raw_spin_unlock(&latest_rq->lock);
>>> +	}
>>> +
>>>  unlock:
>>>  	raw_spin_unlock(&rq->lock);