2002-01-10 05:10:26

by Kevin O'Connor

[permalink] [raw]
Subject: lock order in O(1) scheduler


Hi Ingo,

I was looking through the new O(1) scheduler (found in linux-2.5.2-pre11),
when I came upon the following code in try_to_wake_up():

lock_task_rq(rq, p, flags);
p->state = TASK_RUNNING;
if (!p->array) {
if (!rt_task(p) && synchronous && (smp_processor_id() < p->cpu)) {
spin_lock(&this_rq()->lock);
p->cpu = smp_processor_id();
activate_task(p, this_rq());
spin_unlock(&this_rq()->lock);
} else {

I was unable to figure out what the logic of the '(smp_processor_id() <
p->cpu)' test is.. (Why should the CPU number of the process being awoken
matter?) My best guess is that this is to enforce a locking invariant -
but if so, isn't this test backwards? If p->cpu > current->cpu then
p->cpu's runqueue is locked first followed by this_rq - locking greatest to
least, where the rest of the code does least to greatest..

Also, this code in set_cpus_allowed() looks bogus:

if (target_cpu < smp_processor_id()) {
spin_lock_irq(&target_rq->lock);
spin_lock(&this_rq->lock);
} else {
spin_lock_irq(&target_rq->lock);
spin_lock(&this_rq->lock);
}

The lock order is the same regardless of the if statement..


-Kevin

--
------------------------------------------------------------------------
| Kevin O'Connor "BTW, IMHO we need a FAQ for |
| [email protected] 'IMHO', 'FAQ', 'BTW', etc. !" |
------------------------------------------------------------------------


2002-01-10 05:24:07

by Robert Love

[permalink] [raw]
Subject: Re: lock order in O(1) scheduler

On Thu, 2002-01-10 at 00:10, [email protected] wrote:

> I was unable to figure out what the logic of the '(smp_processor_id() <
> p->cpu)' test is.. (Why should the CPU number of the process being awoken
> matter?) My best guess is that this is to enforce a locking invariant -
> but if so, isn't this test backwards? If p->cpu > current->cpu then
> p->cpu's runqueue is locked first followed by this_rq - locking greatest to
> least, where the rest of the code does least to greatest..

Not so sure of the validity, but it is to respect lock order. Locking
order is to obtain the locks lowest CPU id first to prevent AB/BA
deadlock. See the comment above the runqueue data structure for
explanation.

> Also, this code in set_cpus_allowed() looks bogus:
>
> if (target_cpu < smp_processor_id()) {
> spin_lock_irq(&target_rq->lock);
> spin_lock(&this_rq->lock);
> } else {
> spin_lock_irq(&target_rq->lock);
> spin_lock(&this_rq->lock);
> }

This is certainly wrong, I noticed this earlier today. The unlocking
order is not respected either, I suspect.

I believe the code should be:

if (target_cpu < smp_processor_id()) {
spin_lock_irq(&target_rq->lock);
spin_lock(&this_rq->lock);
} else {
spin_lock_irq(&this_rq->lock);
spin_lock(&target_rq->lock);
}

Not so sure about unlocking. Ingo?

Robert Love

2002-01-10 05:31:07

by David Miller

[permalink] [raw]
Subject: Re: lock order in O(1) scheduler

From: Robert Love <[email protected]>
Date: 10 Jan 2002 00:26:08 -0500

Not so sure about unlocking. Ingo?

Unlocking order doesn't matter wrt. ABBA deadlock.

2002-01-10 05:33:38

by Davide Libenzi

[permalink] [raw]
Subject: Re: lock order in O(1) scheduler

On 10 Jan 2002, Robert Love wrote:

> On Thu, 2002-01-10 at 00:10, [email protected] wrote:
>
> > I was unable to figure out what the logic of the '(smp_processor_id() <
> > p->cpu)' test is.. (Why should the CPU number of the process being awoken
> > matter?) My best guess is that this is to enforce a locking invariant -
> > but if so, isn't this test backwards? If p->cpu > current->cpu then
> > p->cpu's runqueue is locked first followed by this_rq - locking greatest to
> > least, where the rest of the code does least to greatest..
>
> Not so sure of the validity, but it is to respect lock order. Locking
> order is to obtain the locks lowest CPU id first to prevent AB/BA
> deadlock. See the comment above the runqueue data structure for
> explanation.
>
> > Also, this code in set_cpus_allowed() looks bogus:
> >
> > if (target_cpu < smp_processor_id()) {
> > spin_lock_irq(&target_rq->lock);
> > spin_lock(&this_rq->lock);
> > } else {
> > spin_lock_irq(&target_rq->lock);
> > spin_lock(&this_rq->lock);
> > }
>
> This is certainly wrong, I noticed this earlier today. The unlocking
> order is not respected either, I suspect.
>
> I believe the code should be:
>
> if (target_cpu < smp_processor_id()) {
> spin_lock_irq(&target_rq->lock);
> spin_lock(&this_rq->lock);
> } else {
> spin_lock_irq(&this_rq->lock);
> spin_lock(&target_rq->lock);
> }

Yes, this is a classical example of the famous cut-and-paste bug :)


>
> Not so sure about unlocking. Ingo?

Unlocking doesn't matter.




- Davide


2002-01-10 05:47:27

by Robert Love

[permalink] [raw]
Subject: Re: lock order in O(1) scheduler

On Thu, 2002-01-10 at 00:29, David S. Miller wrote:

> Unlocking order doesn't matter wrt. ABBA deadlock.

Indeed. Thank you.

Anyhow, Ingo, here is a patch for the typo in set_cpus_allowed:

diff -urN linux-2.5.2-pre10/ linux/
--- linux-2.5.2-pre10/kernel/sched.c Tue Jan 8 00:26:17 2002
+++ linux/kernel/sched.c Thu Jan 10 00:41:38 2002
@@ -813,8 +813,8 @@
spin_lock_irq(&target_rq->lock);
spin_lock(&this_rq->lock);
} else {
- spin_lock_irq(&target_rq->lock);
- spin_lock(&this_rq->lock);
+ spin_lock_irq(&this_rq->lock);
+ spin_lock(&target_rq->lock);
}
dequeue_task(p, p->array);
this_rq->nr_running--;

Robert Love

2002-01-10 05:48:58

by Robert Love

[permalink] [raw]
Subject: Re: lock order in O(1) scheduler

On Thu, 2002-01-10 at 00:10, [email protected] wrote:

> I was unable to figure out what the logic of the '(smp_processor_id() <
> p->cpu)' test is.. (Why should the CPU number of the process being awoken
> matter?) My best guess is that this is to enforce a locking invariant -
> but if so, isn't this test backwards? If p->cpu > current->cpu then
> p->cpu's runqueue is locked first followed by this_rq - locking greatest to
> least, where the rest of the code does least to greatest..

OK, I replied I was unsure of the validity, but looking this over, I now
suspect it is wrong.

The test should be (smp_processor_id() > p->cpu). Thus it would be safe
to lock this_rq since it is of a lower cpu id than p's rq.

Robert Love

2002-01-10 06:30:24

by Kevin O'Connor

[permalink] [raw]
Subject: Re: lock order in O(1) scheduler

On Thu, Jan 10, 2002 at 12:26:08AM -0500, Robert Love wrote:
> On Thu, 2002-01-10 at 00:10, [email protected] wrote:
>
> > I was unable to figure out what the logic of the '(smp_processor_id() <
> > p->cpu)' test is.. (Why should the CPU number of the process being awoken
> > matter?) My best guess is that this is to enforce a locking invariant -
> > but if so, isn't this test backwards? If p->cpu > current->cpu then
> > p->cpu's runqueue is locked first followed by this_rq - locking greatest to
> > least, where the rest of the code does least to greatest..
>
> Not so sure of the validity, but it is to respect lock order.
[...]

Right. It is a confusing though - depending on the value of p->cpu
relative to current->cpu, the synchronous flag may be ignored. Since the
relationship between p->cpu and current->cpu is essentially random, this
makes the behavior of the synchronous flag random - why bother?

I can see that re-grabbing the locks in the proper order would be a pain.
Also, after a quick grep, it appears only fs/pipe.c is interested in the
wake_up_sync() variant. Maybe this feature should just be culled?

-Kevin


The try_to_wake_up function in pre11 (just for reference):


static int try_to_wake_up(task_t * p, int synchronous)
{
unsigned long flags;
int success = 0;
runqueue_t *rq;

lock_task_rq(rq, p, flags);
p->state = TASK_RUNNING;
if (!p->array) {
if (!rt_task(p) && synchronous && (smp_processor_id() < p->cpu)) {
spin_lock(&this_rq()->lock);
p->cpu = smp_processor_id();
activate_task(p, this_rq());
spin_unlock(&this_rq()->lock);
} else {
activate_task(p, rq);
if ((rq->curr == rq->idle) ||
(p->prio < rq->curr->prio))
resched_task(rq->curr);
}
success = 1;
}
unlock_task_rq(rq, p, flags);
return success;
}

--
------------------------------------------------------------------------
| Kevin O'Connor "BTW, IMHO we need a FAQ for |
| [email protected] 'IMHO', 'FAQ', 'BTW', etc. !" |
------------------------------------------------------------------------

2002-01-10 11:14:23

by Ingo Molnar

[permalink] [raw]
Subject: Re: lock order in O(1) scheduler


On 10 Jan 2002, Robert Love wrote:

> I believe the code should be:
>
> if (target_cpu < smp_processor_id()) {
> spin_lock_irq(&target_rq->lock);
> spin_lock(&this_rq->lock);
> } else {
> spin_lock_irq(&this_rq->lock);
> spin_lock(&target_rq->lock);
> }
>
> Not so sure about unlocking. Ingo?

yep, correct, good catch!

the unlocking order does not matter much.

Ingo