2015-05-26 13:32:09

by Kirill Tkhai

Subject: [PATCH 2/2] sched: Update runtime of prev task before doing pick_next_task()

pick_next_task() puts the rq's prev task. This may lead to runtime
expiration and, because of throttling, to dequeueing of all of the
scheduling class's tasks. The current logic requires put_prev_task()
to be called in the pick method of the next task's class.

This was fixed for the RT and DL classes, while the fair class still
has this problem. So, instead of applying partial solutions, let's
update the prev task's runtime for all classes in __schedule() and fix
the problem completely.

Also, let's freeze the clock during pick_next_task() to be sure
new expirations of runtime won't happen.
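
To make the ordering problem concrete, here is a toy userspace model.
It is not kernel code, and every toy_* name is made up for illustration.
The pick method checks nr_running and only then accounts prev's runtime,
so throttling can empty the queue after the check has already passed.
Accounting the runtime first lets the check see the throttled state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not kernel APIs. */
struct toy_rq {
	long clock;      /* current time */
	long exec_start; /* when prev's runtime was last accounted */
	long budget;     /* runtime left before the class is throttled */
	int nr_running;  /* runnable tasks visible to the pick method */
};

/* Charge elapsed time to the budget; throttling dequeues every task. */
static void toy_update_curr(struct toy_rq *rq)
{
	long delta = rq->clock - rq->exec_start;

	rq->exec_start = rq->clock;
	rq->budget -= delta;
	if (rq->budget <= 0)
		rq->nr_running = 0;
}

/*
 * Pick method: checks nr_running first, then puts prev, which accounts
 * runtime. *stale reports whether throttling emptied the queue after
 * the nr_running check had already passed.
 */
static bool toy_pick_next(struct toy_rq *rq, bool *stale)
{
	if (rq->nr_running == 0) {
		*stale = false;
		return false;
	}
	toy_update_curr(rq); /* stands in for put_prev_task() */
	*stale = (rq->nr_running == 0);
	return true;
}

/* Works on a copy of rq; returns true if the pick acted on stale state. */
static bool toy_schedule(struct toy_rq rq, bool update_first)
{
	bool stale;

	if (update_first)
		toy_update_curr(&rq); /* the early update proposed here */
	toy_pick_next(&rq, &stale);
	return stale;
}

/* Same starting state, with and without the early update. */
void toy_demo(void)
{
	struct toy_rq rq = { .clock = 100, .exec_start = 0,
			     .budget = 50, .nr_running = 2 };

	assert(toy_schedule(rq, false)); /* check passed, then throttled */
	assert(!toy_schedule(rq, true)); /* throttling seen before the check */
}
```

The toy clock never advances inside toy_pick_next(), which is roughly
the property that freezing the clock is meant to provide.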

Reported-by: Konstantin Khlebnikov <[email protected]>
Reported-by: Mohammed Naser <[email protected]>
Signed-off-by: Kirill Tkhai <[email protected]>
---
 kernel/sched/core.c     | 3 +++
 kernel/sched/deadline.c | 7 -------
 kernel/sched/rt.c       | 7 -------
 3 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4eec607..0872280 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2806,7 +2806,10 @@ static void __sched __schedule(void)

 	if (task_on_rq_queued(prev))
 		update_rq_clock(rq);
+	prev->sched_class->update_curr(rq);
 
+	/* freeze clock to avoid new run time expirations in pick_next_task() */
+	rq_clock_skip_update(rq, true);
 	next = pick_next_task(rq, prev);
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 7a08d59..570eadd 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1102,13 +1102,6 @@ struct task_struct *pick_next_task_dl(struct rq *rq, struct task_struct *prev)
 			return RETRY_TASK;
 	}
 
-	/*
-	 * When prev is DL, we may throttle it in put_prev_task().
-	 * So, we update time before we check for dl_nr_running.
-	 */
-	if (prev->sched_class == &dl_sched_class)
-		update_curr_dl(rq);
-
 	if (unlikely(!dl_rq->dl_nr_running))
 		return NULL;

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7d7093c5..3437e7e 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1480,13 +1480,6 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev)
 			return RETRY_TASK;
 	}
 
-	/*
-	 * We may dequeue prev's rt_rq in put_prev_task().
-	 * So, we update time before rt_nr_running check.
-	 */
-	if (prev->sched_class == &rt_sched_class)
-		update_curr_rt(rq);
-
 	if (!rt_rq->rt_queued)
 		return NULL;




2015-05-26 17:48:42

by Benjamin Segall

Subject: Re: [PATCH 2/2] sched: Update runtime of prev task before doing pick_next_task()

Kirill Tkhai <[email protected]> writes:

> pick_next_task() puts the rq's prev task. This may lead to runtime
> expiration and, because of throttling, to dequeueing of all of the
> scheduling class's tasks. The current logic requires put_prev_task()
> to be called in the pick method of the next task's class.
>
> This was fixed for the RT and DL classes, while the fair class still
> has this problem. So, instead of applying partial solutions, let's
> update the prev task's runtime for all classes in __schedule() and fix
> the problem completely.
>
> Also, let's freeze the clock during pick_next_task() to be sure
> new expirations of runtime won't happen.
>
> Reported-by: Konstantin Khlebnikov <[email protected]>
> Reported-by: Mohammed Naser <[email protected]>
> Signed-off-by: Kirill Tkhai <[email protected]>


If this is actually the bug I sent a patch for (and assuming I was
correct in guessing what the issue /was/, which is not at all certain),
this won't actually eliminate the issue. I couldn't find a race
involving actual runtime updates, but I did find one with disable/enable.