Hi Peter,
When sysctl_sched_rt_runtime is set to something other than -1 and the
CONFIG_RT_GROUP_SCHED kernel config option is NOT enabled, we get into a state
where one or more CPUs idle forever even though there are real-time
tasks in their rt runqueue that are able to run (no longer throttled).
The sequence is:
- A real-time task is running when the timer marks the rt runqueue
  as throttled; the rt task is resched_task()ed and switched out,
  and idle is switched in since there are no non-rt tasks to run
  on that CPU.
- Eventually do_sched_rt_period_timer() runs and un-throttles
  the rt runqueue, but we just exit the timer interrupt and go back
  to executing the idle task in the idle loop forever.
If we change the !CONFIG_RT_GROUP_SCHED version of sched_rt_rq_enqueue()
to borrow some of the code from the CONFIG_RT_GROUP_SCHED version of the
same routine, and resched_task() the currently executing task (idle in
our case) when it has lower priority than the highest-priority rt task
in the now un-throttled runqueue, the problem is no longer observed.
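The throttle/unthrottle sequence above can be reproduced with a small user-space model. This is a sketch under stated assumptions, not kernel code. struct rq, struct task, schedule(), simulate() and the patched flag are hypothetical names chosen to mirror the kernel's. need_resched stands in for TIF_NEED_RESCHED, and the CPU has exactly one RT task plus idle. With the old empty sched_rt_rq_enqueue(), nothing requests a reschedule when the period timer un-throttles the rt runqueue, so idle keeps running. With the proposed priority check, idle is resched_task()ed and the RT task runs again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one CPU; NOT kernel code.
 * As in the kernel, a lower prio value means a higher priority. */
#define MAX_PRIO 140 /* prio of the idle task */

struct task {
	int prio;
};

static struct task rt_task = { .prio = 50 };
static struct task idle_task = { .prio = MAX_PRIO };

struct rq {
	struct task *curr;
	bool need_resched;   /* stands in for TIF_NEED_RESCHED */
	int rt_nr_running;   /* runnable RT tasks on this CPU */
	int rt_highest_prio; /* prio of the best runnable RT task */
	bool rt_throttled;
};

static void resched_task(struct rq *rq, struct task *p)
{
	if (rq->curr == p)
		rq->need_resched = true;
}

/* Run the RT task unless throttled; there are no SCHED_OTHER tasks. */
static void schedule(struct rq *rq)
{
	rq->need_resched = false;
	if (rq->rt_nr_running && !rq->rt_throttled)
		rq->curr = &rt_task;
	else
		rq->curr = &idle_task;
}

/* Models the !CONFIG_RT_GROUP_SCHED sched_rt_rq_enqueue():
 * empty before the patch, priority check after it. */
static void sched_rt_rq_enqueue(struct rq *rq, bool patched)
{
	if (patched && rq->rt_nr_running &&
	    rq->rt_highest_prio < rq->curr->prio)
		resched_task(rq, rq->curr);
}

/* Returns true if the RT task runs again after the unthrottle. */
static bool simulate(bool patched)
{
	struct rq rq = {
		.curr = &rt_task,
		.rt_nr_running = 1,
		.rt_highest_prio = rt_task.prio,
	};

	/* Runtime exhausted: throttle and resched the RT task; idle runs. */
	rq.rt_throttled = true;
	resched_task(&rq, rq.curr);
	if (rq.need_resched)
		schedule(&rq);
	assert(rq.curr == &idle_task);

	/* do_sched_rt_period_timer() un-throttles and re-enqueues. */
	rq.rt_throttled = false;
	sched_rt_rq_enqueue(&rq, patched);

	/* Interrupt return: schedule() runs only if need_resched is set. */
	if (rq.need_resched)
		schedule(&rq);
	return rq.curr == &rt_task;
}
```

In this model simulate(false) returns false (idle keeps the CPU) and simulate(true) returns true. The model assumes the idle loop only calls schedule() once need_resched is set, which is why the missing resched_task() leaves the CPU idle.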
Thank you for your time and consideration.
Signed-off-by: John Blackwood <[email protected]>
Index: a/kernel/sched_rt.c
===================================================================
--- a.orig/kernel/sched_rt.c 2008-08-22 15:11:31.000000000 -0400
+++ a/kernel/sched_rt.c 2008-08-22 15:12:36.000000000 -0400
@@ -193,6 +193,12 @@

 static inline void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
 {
+	if (rt_rq->rt_nr_running) {
+		struct task_struct *curr = rq_of_rt_rq(rt_rq)->curr;
+
+		if (rt_rq->highest_prio < curr->prio)
+			resched_task(curr);
+	}
 }

 static inline void sched_rt_rq_dequeue(struct rt_rq *rt_rq)
On Tue, 2008-08-26 at 15:09 -0400, John Blackwood wrote:
> Hi Peter,
>
> When sysctl_sched_rt_runtime is set to something other than -1 and the
> CONFIG_RT_GROUP_SCHED kernel config option is NOT enabled, we get into a state
> where one or more CPUs idle forever even though there are real-time
> tasks in their rt runqueue that are able to run (no longer throttled).
>
> The sequence is:
>
> - A real-time task is running when the timer marks the rt runqueue
>   as throttled; the rt task is resched_task()ed and switched out,
>   and idle is switched in since there are no non-rt tasks to run
>   on that CPU.
>
> - Eventually do_sched_rt_period_timer() runs and un-throttles
>   the rt runqueue, but we just exit the timer interrupt and go back
>   to executing the idle task in the idle loop forever.
>
> If we change the !CONFIG_RT_GROUP_SCHED version of sched_rt_rq_enqueue()
> to borrow some of the code from the CONFIG_RT_GROUP_SCHED version of the
> same routine, and resched_task() the currently executing task (idle in
> our case) when it has lower priority than the highest-priority rt task
> in the now un-throttled runqueue, the problem is no longer observed.
Very good spotting, thanks!
However, I think the patch isn't quite right, as highest_prio is only
available on SMP || RT_GROUP_SCHED.
Furthermore, on !RT_GROUP_SCHED any RT task will have a higher priority
than current, so we can do the below. Do you agree?
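Peter's argument can be checked exhaustively over the priority ranges. The sketch below is a hypothetical model, not kernel code. It compares John's condition (rt_nr_running and highest_prio < curr->prio) with the unconditional one (rt_nr_running alone), using the kernel's convention that lower values mean higher priority. RT tasks occupy 0..MAX_RT_PRIO-1 (0..99), SCHED_OTHER tasks 100..139, and idle MAX_PRIO (140). It assumes, as the argument does, that a CPU whose only rt_rq was throttled is not running an RT task when the period timer un-throttles it.

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel priority ranges (lower value = higher priority):
 * RT tasks 0..MAX_RT_PRIO-1, SCHED_OTHER 100..139, idle MAX_PRIO. */
#define MAX_RT_PRIO 100
#define MAX_PRIO    140

/* John's condition: resched only if the best RT task beats curr. */
static bool resched_with_compare(int rt_nr_running, int highest_prio,
				 int curr_prio)
{
	/* highest_prio must be an RT priority in this model. */
	assert(highest_prio >= 0 && highest_prio < MAX_RT_PRIO);
	return rt_nr_running && highest_prio < curr_prio;
}

/* Peter's condition: resched whenever RT tasks are queued. */
static bool resched_unconditional(int rt_nr_running)
{
	return rt_nr_running != 0;
}

/* True if both conditions agree for every queued RT priority and every
 * non-RT priority of curr (assumption: curr is not an RT task). */
static bool checks_agree(void)
{
	for (int nr = 0; nr <= 1; nr++)
		for (int hp = 0; hp < MAX_RT_PRIO; hp++)
			for (int cp = MAX_RT_PRIO; cp <= MAX_PRIO; cp++)
				if (resched_with_compare(nr, hp, cp) !=
				    resched_unconditional(nr))
					return false;
	return true;
}
```

checks_agree() returns true in this model. The two conditions differ only if curr were itself a higher-priority RT task. In that case the unconditional version issues an extra resched_task(), which costs a pass through schedule() but should not change which task runs.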
---
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 94daace..f672aee 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -199,6 +199,8 @@ static inline struct rt_rq *group_rt_rq(struct sched_rt_entity *rt_se)

 static inline void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
 {
+	if (rt_rq->rt_nr_running)
+		resched_task(rq_of_rt_rq(rt_rq)->curr);
 }

 static inline void sched_rt_rq_dequeue(struct rt_rq *rt_rq)
> On Tue, 2008-08-26 at 15:09 -0400, John Blackwood wrote:
> > Hi Peter,
> >
> > When sysctl_sched_rt_runtime is set to something other than -1 and the
> > CONFIG_RT_GROUP_SCHED kernel config option is NOT enabled, we get into
> > a state where one or more CPUs idle forever even though there are
> > real-time tasks in their rt runqueue that are able to run (no longer
> > throttled).
> >
> > The sequence is:
> >
> > - A real-time task is running when the timer marks the rt runqueue
> >   as throttled; the rt task is resched_task()ed and switched out,
> >   and idle is switched in since there are no non-rt tasks to run
> >   on that CPU.
> >
> > - Eventually do_sched_rt_period_timer() runs and un-throttles
> >   the rt runqueue, but we just exit the timer interrupt and go back
> >   to executing the idle task in the idle loop forever.
> >
> > If we change the !CONFIG_RT_GROUP_SCHED version of sched_rt_rq_enqueue()
> > to borrow some of the code from the CONFIG_RT_GROUP_SCHED version of the
> > same routine, and resched_task() the currently executing task (idle in
> > our case) when it has lower priority than the highest-priority rt task
> > in the now un-throttled runqueue, the problem is no longer observed.
>
> Very good spotting, thanks!
You're welcome.
> However, I think the patch isn't quite right, as highest_prio is only
> available on SMP || RT_GROUP_SCHED.
>
> Furthermore, on !RT_GROUP_SCHED any RT task will have a higher priority
> than current, so we can do the below. Do you agree?
Yes, I see what you are saying.
The patch version below looks good.
I re-tested with it and it works fine.
Thanks!
> diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> index 94daace..f672aee 100644
> --- a/kernel/sched_rt.c
> +++ b/kernel/sched_rt.c
> @@ -199,6 +199,8 @@ static inline struct rt_rq *group_rt_rq(struct sched_rt_entity *rt_se)
>
>  static inline void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
>  {
> +	if (rt_rq->rt_nr_running)
> +		resched_task(rq_of_rt_rq(rt_rq)->curr);
>  }
>
>  static inline void sched_rt_rq_dequeue(struct rt_rq *rt_rq)
>
>
Ingo, please apply.
---
Subject: sched: sched_rt_rq_enqueue() resched idle
From: John Blackwood <[email protected]>
Date: Tue, 26 Aug 2008 15:09:43 -0400
When sysctl_sched_rt_runtime is set to something other than -1 and the
CONFIG_RT_GROUP_SCHED kernel config option is NOT enabled, we get into a state
where one or more CPUs idle forever even though there are real-time
tasks in their rt runqueue that are able to run (no longer throttled).
The sequence is:
- A real-time task is running when the timer marks the rt runqueue
  as throttled; the rt task is resched_task()ed and switched out,
  and idle is switched in since there are no non-rt tasks to run
  on that CPU.
- Eventually do_sched_rt_period_timer() runs and un-throttles
  the rt runqueue, but we just exit the timer interrupt and go back
  to executing the idle task in the idle loop forever.
If we change the !CONFIG_RT_GROUP_SCHED version of sched_rt_rq_enqueue()
to borrow some of the code from the CONFIG_RT_GROUP_SCHED version of the
same routine, and resched_task() the currently executing task (idle in
our case) when it has lower priority than the highest-priority rt task
in the now un-throttled runqueue, the problem is no longer observed.
Signed-off-by: John Blackwood <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
---
kernel/sched_rt.c | 2 ++
1 file changed, 2 insertions(+)
Index: linux-2.6/kernel/sched_rt.c
===================================================================
--- linux-2.6.orig/kernel/sched_rt.c
+++ linux-2.6/kernel/sched_rt.c
@@ -199,6 +199,8 @@ static inline struct rt_rq *group_rt_rq(

 static inline void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
 {
+	if (rt_rq->rt_nr_running)
+		resched_task(rq_of_rt_rq(rt_rq)->curr);
 }

 static inline void sched_rt_rq_dequeue(struct rt_rq *rt_rq)