LinuxLists.cc - [BUGFIX][PATCH] Fix sched rt group scheduling when hierachy is enabled

2011-03-03 11:34:43

Subject: [BUGFIX][PATCH] Fix sched rt group scheduling when hierachy is enabled

Fix hierarchical scheduling in sched rt group

From: Balbir Singh <[email protected]>

The current sched rt code is broken when it comes to hierarchical
scheduling, this patch fixes two problems

1. It adds redundant enqueuing (harmless) when it finds a queue
has tasks enqueued, but it has no run time and it is not
throttled.
2. The most important change is in sched_rt_rq_enqueue/dequeue.
The code just picks the rt_rq belonging to the current cpu
on which the period timer runs, the patch fixes it, so that
the correct rt_se is enqueued/dequeued.

Tested with a simple hierarchy

/c/d, c and d assigned similar runtimes of 50,000 and a while
1 loop runs within "d". Both c and d get throttled, without
the patch, the task just stops running and never runs (depends
on where the sched_rt b/w timer runs). With the patch, the
task is throttled and runs as expected.

[bharata, suggestions on how to pick the rt_se belong to the rt_rq
and correct cpu]

Signed-off-by: Balbir Singh <[email protected]>
Signed-off-by: Bharata B Rao <[email protected]>
---
kernel/sched_rt.c | 14 +++++++++-----
1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index ad62677..01f75a5 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -210,11 +210,12 @@ static void dequeue_rt_entity(struct sched_rt_entity *rt_se);

static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
{
- int this_cpu = smp_processor_id();
struct task_struct *curr = rq_of_rt_rq(rt_rq)->curr;
struct sched_rt_entity *rt_se;

- rt_se = rt_rq->tg->rt_se[this_cpu];
+ int cpu = cpu_of(rq_of_rt_rq(rt_rq));
+
+ rt_se = rt_rq->tg->rt_se[cpu];

if (rt_rq->rt_nr_running) {
if (rt_se && !on_rt_rq(rt_se))
@@ -226,10 +227,10 @@ static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)

static void sched_rt_rq_dequeue(struct rt_rq *rt_rq)
{
- int this_cpu = smp_processor_id();
struct sched_rt_entity *rt_se;
+ int cpu = cpu_of(rq_of_rt_rq(rt_rq));

- rt_se = rt_rq->tg->rt_se[this_cpu];
+ rt_se = rt_rq->tg->rt_se[cpu];

if (rt_se && on_rt_rq(rt_se))
dequeue_rt_entity(rt_se);
@@ -565,8 +566,11 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
if (rt_rq->rt_time || rt_rq->rt_nr_running)
idle = 0;
raw_spin_unlock(&rt_rq->rt_runtime_lock);
- } else if (rt_rq->rt_nr_running)
+ } else if (rt_rq->rt_nr_running) {
idle = 0;
+ if (!rt_rq_throttled(rt_rq))
+ enqueue = 1;
+ }

if (enqueue)
sched_rt_rq_enqueue(rt_rq);

--
Three Cheers,
Balbir

2011-03-03 14:06:05

by Yong Zhang

[permalink] [raw]

Subject: Re: [BUGFIX][PATCH] Fix sched rt group scheduling when hierachy is enabled

On Thu, Mar 03, 2011 at 05:04:35PM +0530, Balbir Singh wrote:
> Fix hierarchical scheduling in sched rt group
>
> From: Balbir Singh <[email protected]>
>
> The current sched rt code is broken when it comes to hierarchical
> scheduling, this patch fixes two problems
>
> 1. It adds redundant enqueuing (harmless) when it finds a queue
> has tasks enqueued, but it has no run time and it is not
> throttled.

You say redundant here, so in fact we don't need it, right?

> 2. The most important change is in sched_rt_rq_enqueue/dequeue.
> The code just picks the rt_rq belonging to the current cpu
> on which the period timer runs, the patch fixes it, so that
> the correct rt_se is enqueued/dequeued.

Ah, this is true. It is also needed for stable-2.6.33+

Thanks,
Yong

>
> Tested with a simple hierarchy
>
> /c/d, c and d assigned similar runtimes of 50,000 and a while
> 1 loop runs within "d". Both c and d get throttled, without
> the patch, the task just stops running and never runs (depends
> on where the sched_rt b/w timer runs). With the patch, the
> task is throttled and runs as expected.
>
> [bharata, suggestions on how to pick the rt_se belong to the rt_rq
> and correct cpu]
>
> Signed-off-by: Balbir Singh <[email protected]>
> Signed-off-by: Bharata B Rao <[email protected]>
> ---
> kernel/sched_rt.c | 14 +++++++++-----
> 1 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> index ad62677..01f75a5 100644
> --- a/kernel/sched_rt.c
> +++ b/kernel/sched_rt.c
> @@ -210,11 +210,12 @@ static void dequeue_rt_entity(struct sched_rt_entity *rt_se);
>
> static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
> {
> - int this_cpu = smp_processor_id();
> struct task_struct *curr = rq_of_rt_rq(rt_rq)->curr;
> struct sched_rt_entity *rt_se;
>
> - rt_se = rt_rq->tg->rt_se[this_cpu];
> + int cpu = cpu_of(rq_of_rt_rq(rt_rq));
> +
> + rt_se = rt_rq->tg->rt_se[cpu];
>
> if (rt_rq->rt_nr_running) {
> if (rt_se && !on_rt_rq(rt_se))
> @@ -226,10 +227,10 @@ static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
>
> static void sched_rt_rq_dequeue(struct rt_rq *rt_rq)
> {
> - int this_cpu = smp_processor_id();
> struct sched_rt_entity *rt_se;
> + int cpu = cpu_of(rq_of_rt_rq(rt_rq));
>
> - rt_se = rt_rq->tg->rt_se[this_cpu];
> + rt_se = rt_rq->tg->rt_se[cpu];
>
> if (rt_se && on_rt_rq(rt_se))
> dequeue_rt_entity(rt_se);
> @@ -565,8 +566,11 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
> if (rt_rq->rt_time || rt_rq->rt_nr_running)
> idle = 0;
> raw_spin_unlock(&rt_rq->rt_runtime_lock);
> - } else if (rt_rq->rt_nr_running)
> + } else if (rt_rq->rt_nr_running) {
> idle = 0;
> + if (!rt_rq_throttled(rt_rq))
> + enqueue = 1;
> + }
>
> if (enqueue)
> sched_rt_rq_enqueue(rt_rq);
>
> --
> Three Cheers,
> Balbir
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2011-03-03 15:29:31

by Balbir Singh

[permalink] [raw]

Subject: Re: [BUGFIX][PATCH] Fix sched rt group scheduling when hierachy is enabled

On Thu, Mar 3, 2011 at 7:35 PM, Yong Zhang <[email protected]> wrote:
> On Thu, Mar 03, 2011 at 05:04:35PM +0530, Balbir Singh wrote:
>> Fix hierarchical scheduling in sched rt group
>>
>> From: Balbir Singh <[email protected]>
>>
>> The current sched rt code is broken when it comes to hierarchical
>> scheduling, this patch fixes two problems
>>
>> 1. It adds redundant enqueuing (harmless) when it finds a queue
>> ? ?has tasks enqueued, but it has no run time and it is not
>> ? ?throttled.
>
> You say redundant here, so in fact we don't need it, right?
>

No, not really :) It is required, it is a backup check to see if we
have queued tasks, rt_time of 0 and the runqueue is not throttled, why
should it be dequeued?

>> 2. The most important change is in sched_rt_rq_enqueue/dequeue.
>> ? ?The code just picks the rt_rq belonging to the current cpu
>> ? ?on which the period timer runs, the patch fixes it, so that
>> ? ?the correct rt_se is enqueued/dequeued.
>
> Ah, this is true. It is also needed for stable-2.6.33+
>

Yep, we'll need backports to stable versions

Balbir

2011-03-04 03:43:18

by Yong Zhang

[permalink] [raw]

Subject: Re: [BUGFIX][PATCH] Fix sched rt group scheduling when hierachy is enabled

On Thu, Mar 3, 2011 at 11:29 PM, Balbir Singh <[email protected]> wrote:
> No, not really :) It is required, it is a backup check to see if we
> have queued tasks, rt_time of 0 and the runqueue is not throttled, why
> should it be dequeued?

But I can't see where that kind of rt_rq is dequeued, mind pointing it out?

Thanks,
Yong

--
Only stand for myself

2011-03-04 07:40:29

by Balbir Singh

[permalink] [raw]

Subject: Re: [BUGFIX][PATCH] Fix sched rt group scheduling when hierachy is enabled

* Yong Zhang <[email protected]> [2011-03-04 11:43:16]:

> On Thu, Mar 3, 2011 at 11:29 PM, Balbir Singh <[email protected]> wrote:
> > No, not really :) It is required, it is a backup check to see if we
> > have queued tasks, rt_time of 0 and the runqueue is not throttled, why
> > should it be dequeued?
>
> But I can't see where that kind of rt_rq is dequeued, mind pointing it out?
>

So here is what I saw

1. sched_dequeue_stack called from the dequeue path dequeues the
queues and sets rt_nr_running to 0
2. Enqueuing fails because rt_throttled is set for the group_rq
(parent who is throttled)
3. This causes further enqueue to fail, since rt_nr_running did
not increment in step 2, eventually the timer decrements rt_time
to 0 and the task is never picked up.

--
Three Cheers,
Balbir

2011-03-04 08:32:51

by Yong Zhang

[permalink] [raw]

Subject: Re: [BUGFIX][PATCH] Fix sched rt group scheduling when hierachy is enabled

On Fri, Mar 4, 2011 at 3:25 PM, Balbir Singh <[email protected]> wrote:
> * Yong Zhang <[email protected]> [2011-03-04 11:43:16]:
>
>> On Thu, Mar 3, 2011 at 11:29 PM, Balbir Singh <[email protected]> wrote:
>> > No, not really :) It is required, it is a backup check to see if we
>> > have queued tasks, rt_time of 0 and the runqueue is not throttled, why
>> > should it be dequeued?
>>
>> But I can't see where that kind of rt_rq is dequeued, mind pointing it out?
>>
>
> So here is what I saw
>
> 1. sched_dequeue_stack called from the dequeue path dequeues the
> queues and sets rt_nr_running to 0
> 2. Enqueuing fails because rt_throttled is set for the group_rq
> (parent who is throttled)
> 3. This causes further enqueue to fail, since rt_nr_running did
> not increment in step 2

Whose rt_nr_running? group_rq's or parent of group_rq?
For parent of group_rq, yes.
For group_rq's, no, because we don't touch its rt_nr_running.

> , eventually the timer decrements rt_time
> to 0 and the task is never picked up.

If I understand correctly, you mean this:

For a group(A) which has one task(b) attached but A is throttled,so
A is unqueued now
A.throttled == 1 && A.rt_nr_running == 1
deactivate_task(b); /* A.throttled == 1 && A.rt_nr_running == 0 */
do_sched_rt_period_timer(); /* A.run_time == 0 && A.throttled == 0*/
sched_rt_rq_enqueue(A); /* fails due to A.rt_nr_running == 0 */

But this doesn't prevent task readded to group A and eventually
group A is added back.

Thanks,
Yong

--
Only stand for myself

2011-03-04 08:35:54

by Balbir Singh

[permalink] [raw]

Subject: Re: [BUGFIX][PATCH] Fix sched rt group scheduling when hierachy is enabled

On Fri, Mar 4, 2011 at 2:02 PM, Yong Zhang <[email protected]> wrote:
> On Fri, Mar 4, 2011 at 3:25 PM, Balbir Singh <[email protected]> wrote:
>> * Yong Zhang <[email protected]> [2011-03-04 11:43:16]:
>>
>>> On Thu, Mar 3, 2011 at 11:29 PM, Balbir Singh <[email protected]> wrote:
>>> > No, not really :) It is required, it is a backup check to see if we
>>> > have queued tasks, rt_time of 0 and the runqueue is not throttled, why
>>> > should it be dequeued?
>>>
>>> But I can't see where that kind of rt_rq is dequeued, mind pointing it out?
>>>
>>
>> So here is what I saw
>>
>> 1. sched_dequeue_stack called from the dequeue path dequeues the
>> queues and sets rt_nr_running to 0
>> 2. Enqueuing fails because rt_throttled is set for the group_rq
>> (parent who is throttled)
>> 3. This causes further enqueue to fail, since rt_nr_running did
>> not increment in step 2
>
> Whose rt_nr_running? group_rq's or parent of group_rq?
> For parent of group_rq, yes.
> For group_rq's, no, because we don't touch its rt_nr_running.
>

For parent of group_rq

>
>> , eventually the timer decrements rt_time
>> to 0 and the task is never picked up.
>
> If I understand correctly, you mean this:
>
> For a group(A) which has one task(b) attached but A is throttled,so
> A is unqueued now
> A.throttled == 1 && A.rt_nr_running == 1
> ?deactivate_task(b); /* A.throttled == 1 && A.rt_nr_running == 0 */
> ? ?do_sched_rt_period_timer(); /* A.run_time == 0 && A.throttled == 0*/

Note at some point rt_time becomes 0 and if enqueue is not set, the
next do_sched_rt_period_timer() is a NOP and does not enqueue back the
group

Balbir

2011-03-04 08:52:29

by Yong Zhang

[permalink] [raw]

Subject: Re: [BUGFIX][PATCH] Fix sched rt group scheduling when hierachy is enabled

On Fri, Mar 4, 2011 at 4:35 PM, Balbir Singh <[email protected]> wrote:
> On Fri, Mar 4, 2011 at 2:02 PM, Yong Zhang <[email protected]> wrote:
>> On Fri, Mar 4, 2011 at 3:25 PM, Balbir Singh <[email protected]> wrote:
>>> * Yong Zhang <[email protected]> [2011-03-04 11:43:16]:
>>>
>>>> On Thu, Mar 3, 2011 at 11:29 PM, Balbir Singh <[email protected]> wrote:
>>>> > No, not really :) It is required, it is a backup check to see if we
>>>> > have queued tasks, rt_time of 0 and the runqueue is not throttled, why
>>>> > should it be dequeued?
>>>>
>>>> But I can't see where that kind of rt_rq is dequeued, mind pointing it out?
>>>>
>>>
>>> So here is what I saw
>>>
>>> 1. sched_dequeue_stack called from the dequeue path dequeues the
>>> queues and sets rt_nr_running to 0
>>> 2. Enqueuing fails because rt_throttled is set for the group_rq
>>> (parent who is throttled)
>>> 3. This causes further enqueue to fail, since rt_nr_running did
>>> not increment in step 2
>>
>> Whose rt_nr_running? group_rq's or parent of group_rq?
>> For parent of group_rq, yes.
>> For group_rq's, no, because we don't touch its rt_nr_running.
>>
>
> For parent of group_rq
>
>>
>>> , eventually the timer decrements rt_time
>>> to 0 and the task is never picked up.
>>
>> If I understand correctly, you mean this:
>>
>> For a group(A) which has one task(b) attached but A is throttled,so
>> A is unqueued now
>> A.throttled == 1 && A.rt_nr_running == 1
>> deactivate_task(b); /* A.throttled == 1 && A.rt_nr_running == 0 */
>> do_sched_rt_period_timer(); /* A.run_time == 0 && A.throttled == 0*/
>
> Note at some point rt_time becomes 0 and if enqueue is not set, the

If the group should be add back, it will be at the first
do_sched_rt_period_timer() which decreases run_time;

> next do_sched_rt_period_timer() is a NOP and does not enqueue back the
> group

Otherwise it will be added back when a task is attching to it.

I still can't see how a unthrottled group which has task attched stay
unqueued.

Thanks,
Yong

--
Only stand for myself

2011-03-04 08:59:11

by Balbir Singh

[permalink] [raw]

Subject: Re: [BUGFIX][PATCH] Fix sched rt group scheduling when hierachy is enabled

>>> ?deactivate_task(b); /* A.throttled == 1 && A.rt_nr_running == 0 */
>>> ? ?do_sched_rt_period_timer(); /* A.run_time == 0 && A.throttled == 0*/
>>
>> Note at some point rt_time becomes 0 and if enqueue is not set, the
>
> If the group should be add back, it will be at the first
> do_sched_rt_period_timer() which decreases run_time;
>

As long as idle is 0, the period will continue to run, if it has
rt_nr_running or rt_time, the timer will run.

>> next do_sched_rt_period_timer() is a NOP and does not enqueue back the
>> group
>
> Otherwise it will be added back when a task is attching to it.
>
> I still can't see how a unthrottled group which has task attched stay
> unqueued.

The other way of looking at the first change is

Can we have rt_time as 0, rt_nr_running >=1, rt_throttled !=0 and
still not have the rt_rq enqueued?

If this is not the case, we don't lose much, a quick check for
rt_nr_running and on_rt_q

Balbir

2011-03-04 09:31:02

by Yong Zhang

[permalink] [raw]

Subject: Re: [BUGFIX][PATCH] Fix sched rt group scheduling when hierachy is enabled

On Fri, Mar 4, 2011 at 4:59 PM, Balbir Singh <[email protected]> wrote:
>>>> deactivate_task(b); /* A.throttled == 1 && A.rt_nr_running == 0 */
>>>> do_sched_rt_period_timer(); /* A.run_time == 0 && A.throttled == 0*/
>>>
>>> Note at some point rt_time becomes 0 and if enqueue is not set, the
>>
>> If the group should be add back, it will be at the first
>> do_sched_rt_period_timer() which decreases run_time;
>>
>
> As long as idle is 0, the period will continue to run, if it has
> rt_nr_running or rt_time, the timer will run.

Yep.
Should be the first do_sched_rt_period_timer() which meets
rt_rq->rt_time < runtime :)

>
>>> next do_sched_rt_period_timer() is a NOP and does not enqueue back the
>>> group
>>
>> Otherwise it will be added back when a task is attching to it.
>>
>> I still can't see how a unthrottled group which has task attched stay
>> unqueued.
>
> The other way of looking at the first change is
>
> Can we have rt_time as 0, rt_nr_running >=1, rt_throttled !=0 and
> still not have the rt_rq enqueued?

Yeah, the same question. How could we reach that?

IMHO, rt_time == 0 and rt_throttled !=0 can't coexist.

>
> If this is not the case, we don't lose much, a quick check for
> rt_nr_running and on_rt_q

I don't get what you mean here.

Thanks,
Yong

--
Only stand for myself

2011-03-04 11:49:36

by Balbir Singh

[permalink] [raw]

Subject: [tip:sched/core] sched: Fix sched rt group scheduling when hierachy is enabled

Commit-ID: 0c3b9168017cbad2c4af3dd65ec93fe646eeaa62
Gitweb: http://git.kernel.org/tip/0c3b9168017cbad2c4af3dd65ec93fe646eeaa62
Author: Balbir Singh <[email protected]>
AuthorDate: Thu, 3 Mar 2011 17:04:35 +0530
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 4 Mar 2011 11:03:18 +0100

sched: Fix sched rt group scheduling when hierachy is enabled

The current sched rt code is broken when it comes to hierarchical
scheduling, this patch fixes two problems

1. It adds redundant enqueuing (harmless) when it finds a queue
has tasks enqueued, but it has no run time and it is not
throttled.

2. The most important change is in sched_rt_rq_enqueue/dequeue.
The code just picks the rt_rq belonging to the current cpu
on which the period timer runs, the patch fixes it, so that
the correct rt_se is enqueued/dequeued.

Tested with a simple hierarchy

/c/d, c and d assigned similar runtimes of 50,000 and a while
1 loop runs within "d". Both c and d get throttled, without
the patch, the task just stops running and never runs (depends
on where the sched_rt b/w timer runs). With the patch, the
task is throttled and runs as expected.

[ bharata, suggestions on how to pick the rt_se belong to the
rt_rq and correct cpu ]

Signed-off-by: Balbir Singh <[email protected]>
Acked-by: Bharata B Rao <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched_rt.c | 14 +++++++++-----
1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index ad62677..01f75a5 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -210,11 +210,12 @@ static void dequeue_rt_entity(struct sched_rt_entity *rt_se);

static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
{
- int this_cpu = smp_processor_id();
struct task_struct *curr = rq_of_rt_rq(rt_rq)->curr;
struct sched_rt_entity *rt_se;

- rt_se = rt_rq->tg->rt_se[this_cpu];
+ int cpu = cpu_of(rq_of_rt_rq(rt_rq));
+
+ rt_se = rt_rq->tg->rt_se[cpu];

if (rt_rq->rt_nr_running) {
if (rt_se && !on_rt_rq(rt_se))
@@ -226,10 +227,10 @@ static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)

static void sched_rt_rq_dequeue(struct rt_rq *rt_rq)
{
- int this_cpu = smp_processor_id();
struct sched_rt_entity *rt_se;
+ int cpu = cpu_of(rq_of_rt_rq(rt_rq));

- rt_se = rt_rq->tg->rt_se[this_cpu];
+ rt_se = rt_rq->tg->rt_se[cpu];

if (rt_se && on_rt_rq(rt_se))
dequeue_rt_entity(rt_se);
@@ -565,8 +566,11 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
if (rt_rq->rt_time || rt_rq->rt_nr_running)
idle = 0;
raw_spin_unlock(&rt_rq->rt_runtime_lock);
- } else if (rt_rq->rt_nr_running)
+ } else if (rt_rq->rt_nr_running) {
idle = 0;
+ if (!rt_rq_throttled(rt_rq))
+ enqueue = 1;
+ }

if (enqueue)
sched_rt_rq_enqueue(rt_rq);

2011-03-04 12:11:55

by Balbir Singh

[permalink] [raw]

Subject: Re: [BUGFIX][PATCH] Fix sched rt group scheduling when hierachy is enabled

* Yong Zhang <[email protected]> [2011-03-04 17:30:58]:

> On Fri, Mar 4, 2011 at 4:59 PM, Balbir Singh <[email protected]> wrote:
> >>>> ?deactivate_task(b); /* A.throttled == 1 && A.rt_nr_running == 0 */
> >>>> ? ?do_sched_rt_period_timer(); /* A.run_time == 0 && A.throttled == 0*/
> >>>
> >>> Note at some point rt_time becomes 0 and if enqueue is not set, the
> >>
> >> If the group should be add back, it will be at the first
> >> do_sched_rt_period_timer() which decreases run_time;
> >>
> >
> > As long as idle is 0, the period will continue to run, if it has
> > rt_nr_running or rt_time, the timer will run.
>
> Yep.
> Should be the first do_sched_rt_period_timer() which meets
> rt_rq->rt_time < runtime :)
>
> >
> >>> next do_sched_rt_period_timer() is a NOP and does not enqueue back the
> >>> group
> >>
> >> Otherwise it will be added back when a task is attching to it.
> >>
> >> I still can't see how a unthrottled group which has task attched stay
> >> unqueued.
> >
> > The other way of looking at the first change is
> >
> > Can we have rt_time as 0, rt_nr_running >=1, rt_throttled !=0 and
> > still not have the rt_rq enqueued?
>
> Yeah, the same question. How could we reach that?
>

I based the changes on what I saw during my debugging/test. I
explained it earlier,

Everyone is dequeued

1. child runs first, finds parent throttled, so it does not queue
anything on parent group. child is unthrottled and rt_time now becomes
0, parent's rt_nr_running is not incremented.
2. Parent timer runs, it is unthrottled, its group->rt_nr_running is 0
hence enqueue is not called

> IMHO, rt_time == 0 and rt_throttled !=0 can't coexist.
>
> >
> > If this is not the case, we don't lose much, a quick check for
> > rt_nr_running and on_rt_q
>
> I don't get what you mean here.
>

I was talking of the overhead of the check I added.

--
Three Cheers,
Balbir

2011-03-07 07:00:27

by Yong Zhang

[permalink] [raw]

Subject: Re: [BUGFIX][PATCH] Fix sched rt group scheduling when hierachy is enabled

On Fri, Mar 4, 2011 at 8:11 PM, Balbir Singh <[email protected]> wrote:
> I based the changes on what I saw during my debugging/test. I
> explained it earlier,
>
> Everyone is dequeued
>
> 1. child runs first, finds parent throttled, so it does not queue
> anything on parent group. child is unthrottled and rt_time now becomes
> 0, parent's rt_nr_running is not incremented.
> 2. Parent timer runs, it is unthrottled, its group->rt_nr_running is 0
> hence enqueue is not called

I have tested with the attached(web mail will mangle it) patch with
yours applied. But I failed to trigger that WARNING.

Below is my steps:
1)mount -t cgroup -ocpu cpu /mnt
2)mkdir /mnt/test-1
3)mkdir /mnt/test-1-1
4)set rt_runtime to 100000 for test-1 and test-1-1
5)run a loop task and attach it to test-1-1

So I thought out a scenario to satisfy your description,
but it's based on the unpatched(without your patch) kernel:
Let's assume a dual-core system with test-1/test-1-1
for rt group, a loop task is running on CPU 1 and test-1
and test-1-1 are both throttled.

CPU-0 CPU-1
do_sched_rt_period_timer(test-1-1)
{
for CPU-1
unthrottled test-1-1.rt_rq[1];
but fail to enqueue it because
we alway get test-1-1.rt_se[0]
due to smp_processor_id();
thus test-1.rt_rq[1].nr_running == 0;
and it returned with run_time == 0;
}
do_sched_rt_period_timer(test-1)
unthrottle test-1.rt_rt[1] but
fail to enqueue test-1.rt_rt[1];
because nr_running == 0;

So if we have your patch for issue-1, when
the hrtimer is running on CPU-1, test-1-1
and test-1 will be queued because that
additional check in run_timer == 0 case.

But once we have your patch for issue-2, the above
problem will be killed by it. right?

Correct me if I'm wrong :)

Thanks,
Yong

--
Only stand for myself

Attachments:

0001.patch (532.00 B)

2011-03-08 08:42:06

by Yong Zhang

[permalink] [raw]

Subject: Re: [BUGFIX][PATCH] Fix sched rt group scheduling when hierachy is enabled

On Mon, Mar 7, 2011 at 3:00 PM, Yong Zhang <[email protected]> wrote:
>
> I have tested with the attached(web mail will mangle it) patch with
> yours applied. But I failed to trigger that WARNING.
>
> Below is my steps:
> 1)mount -t cgroup -ocpu cpu /mnt
> 2)mkdir /mnt/test-1
> 3)mkdir /mnt/test-1-1
> 4)set rt_runtime to 100000 for test-1 and test-1-1
> 5)run a loop task and attach it to test-1-1
>
> So I thought out a scenario to satisfy your description,
> but it's based on the unpatched(without your patch) kernel:
> Let's assume a dual-core system with test-1/test-1-1
> for rt group, a loop task is running on CPU 1 and test-1
> and test-1-1 are both throttled.
>
> CPU-0 CPU-1
> do_sched_rt_period_timer(test-1-1)
> {
> for CPU-1
> unthrottled test-1-1.rt_rq[1];
> but fail to enqueue it because
> we alway get test-1-1.rt_se[0]
> due to smp_processor_id();
> thus test-1.rt_rq[1].nr_running == 0;
> and it returned with run_time == 0;
> }
> do_sched_rt_period_timer(test-1)
> unthrottle test-1.rt_rt[1] but
> fail to enqueue test-1.rt_rt[1];
> because nr_running == 0;
>
> So if we have your patch for issue-1, when
> the hrtimer is running on CPU-1, test-1-1
> and test-1 will be queued because that
> additional check in run_timer == 0 case.
>
> But once we have your patch for issue-2, the above
> problem will be killed by it. right?

And another finding is that the top rt_rq could trigger your
additional code, but we don't need to enqueue
root_task_group.rt_se[].

BTW, I update my patch(attached) to void testing on top rt_rq.

Thanks,
Yong

--
Only stand for myself

Attachments:

0001-updated.patch (597.00 B)

2011-03-08 18:34:59

by Balbir Singh

[permalink] [raw]

Subject: Re: [BUGFIX][PATCH] Fix sched rt group scheduling when hierachy is enabled

* Yong Zhang <[email protected]> [2011-03-08 16:42:00]:

> On Mon, Mar 7, 2011 at 3:00 PM, Yong Zhang <[email protected]> wrote:
> >
> > I have tested with the attached(web mail will mangle it) patch with
> > yours applied. But I failed to trigger that WARNING.
> >
> > Below is my steps:
> > 1)mount -t cgroup -ocpu cpu /mnt
> > 2)mkdir /mnt/test-1
> > 3)mkdir /mnt/test-1-1
> > 4)set rt_runtime to 100000 for test-1 and test-1-1
> > 5)run a loop task and attach it to test-1-1
> >
> > So I thought out a scenario to satisfy your description,
> > but it's based on the unpatched(without your patch) kernel:
> > Let's assume a dual-core system with test-1/test-1-1
> > for rt group, a loop task is running on CPU 1 and test-1
> > and test-1-1 are both throttled.
> >
> > ? ? ? ? ? ? ?CPU-0 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?CPU-1
> > do_sched_rt_period_timer(test-1-1)
> > {
> > ?for CPU-1
> > ? ?unthrottled test-1-1.rt_rq[1];
> > ? ? ?but fail to enqueue it because
> > ? ? ?we alway get test-1-1.rt_se[0]
> > ? ? ?due to smp_processor_id();
> > ? ? ?thus test-1.rt_rq[1].nr_running == 0;
> > ? ? ?and it returned with run_time == 0;
> > }
> > do_sched_rt_period_timer(test-1)
> > ?unthrottle test-1.rt_rt[1] but
> > ?fail to enqueue test-1.rt_rt[1];
> > ?because nr_running == 0;
> >
> > ? ? ? ? ? ? ? ? ? ? ? ? ? So if we have your patch for issue-1, when
> > ? ? ? ? ? ? ? ? ? ? ? ? ? the hrtimer is running on CPU-1, test-1-1
> > ? ? ? ? ? ? ? ? ? ? ? ? ? and test-1 will be queued because that
> > ? ? ? ? ? ? ? ? ? ? ? ? ? additional check in run_timer == 0 case.
> >
> > But once we have your patch for issue-2, the above
> > problem will be killed by it. right?
>
> And another finding is that the top rt_rq could trigger your
> additional code, but we don't need to enqueue
> root_task_group.rt_se[].
>
> BTW, I update my patch(attached) to void testing on top rt_rq.
>
> Thanks,
> Yong
>
>
> --
> Only stand for myself

> diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> index 01f75a5..b02b516 100644
> --- a/kernel/sched_rt.c
> +++ b/kernel/sched_rt.c
> @@ -568,8 +568,14 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
> raw_spin_unlock(&rt_rq->rt_runtime_lock);
> } else if (rt_rq->rt_nr_running) {
> idle = 0;
> - if (!rt_rq_throttled(rt_rq))
> + if (!rt_rq_throttled(rt_rq)) {
> + struct sched_rt_entity *rt_se;
> + int cpu = cpu_of(rq_of_rt_rq(rt_rq));
> +
> + rt_se = rt_rq->tg->rt_se[cpu];
> + WARN_ON(rt_se && !on_rt_rq(rt_se));
> enqueue = 1;

Fair enough, I think it is good to have the warning in there.

> + }
> }
>
> if (enqueue)

--
Three Cheers,
Balbir