2008-12-09 14:48:05

by Brian King

[permalink] [raw]
Subject: [PATCH 1/1] sched: CPU remove deadlock fix


This patch fixes a possible deadlock scenario in the CPU remove path.
migration_call grabs rq->lock, then wakes up everything on rq->migration_queue
with the lock held. Then one of the tasks on the migration queue ends up
calling tg_shares_up which then also tries to acquire the same rq->lock.

[c000000058eab2e0] c000000000502078 ._spin_lock_irqsave+0x98/0xf0
[c000000058eab370] c00000000008011c .tg_shares_up+0x10c/0x20c
[c000000058eab430] c00000000007867c .walk_tg_tree+0xc4/0xfc
[c000000058eab4d0] c0000000000840c8 .try_to_wake_up+0xb0/0x3c4
[c000000058eab590] c0000000000799a0 .__wake_up_common+0x6c/0xe0
[c000000058eab640] c00000000007ada4 .complete+0x54/0x80
[c000000058eab6e0] c000000000509fa8 .migration_call+0x5fc/0x6f8
[c000000058eab7c0] c000000000504074 .notifier_call_chain+0x68/0xe0
[c000000058eab860] c000000000506568 ._cpu_down+0x2b0/0x3f4
[c000000058eaba60] c000000000506750 .cpu_down+0xa4/0x108
[c000000058eabb10] c000000000507e54 .store_online+0x44/0xa8
[c000000058eabba0] c000000000396260 .sysdev_store+0x3c/0x50
[c000000058eabc10] c0000000001a39b8 .sysfs_write_file+0x124/0x18c
[c000000058eabcd0] c00000000013061c .vfs_write+0xd0/0x1bc
[c000000058eabd70] c0000000001308a4 .sys_write+0x68/0x114
[c000000058eabe30] c0000000000086b4 syscall_exit+0x0/0x40

Signed-off-by: Brian King <[email protected]>
---

kernel/sched.c | 2 ++
1 file changed, 2 insertions(+)

diff -puN kernel/sched.c~sched_cpu_down_deadlock_fix kernel/sched.c
--- linux-2.6/kernel/sched.c~sched_cpu_down_deadlock_fix 2008-12-09 08:42:09.000000000 -0600
+++ linux-2.6-bjking1/kernel/sched.c 2008-12-09 08:42:09.000000000 -0600
@@ -6587,7 +6587,9 @@ migration_call(struct notifier_block *nf
req = list_entry(rq->migration_queue.next,
struct migration_req, list);
list_del_init(&req->list);
+ spin_unlock_irq(&rq->lock);
complete(&req->done);
+ spin_lock_irq(&rq->lock);
}
spin_unlock_irq(&rq->lock);
break;
_


2008-12-09 14:52:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/1] sched: CPU remove deadlock fix

On Tue, 2008-12-09 at 08:47 -0600, Brian King wrote:
> This patch fixes a possible deadlock scenario in the CPU remove path.
> migration_call grabs rq->lock, then wakes up everything on rq->migration_queue
> with the lock held. Then one of the tasks on the migration queue ends up
> calling tg_shares_up which then also tries to acquire the same rq->lock.

Looks ok, does lockdep agree?

> Signed-off-by: Brian King <[email protected]>
> ---
>
> kernel/sched.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff -puN kernel/sched.c~sched_cpu_down_deadlock_fix kernel/sched.c
> --- linux-2.6/kernel/sched.c~sched_cpu_down_deadlock_fix 2008-12-09 08:42:09.000000000 -0600
> +++ linux-2.6-bjking1/kernel/sched.c 2008-12-09 08:42:09.000000000 -0600
> @@ -6587,7 +6587,9 @@ migration_call(struct notifier_block *nf
> req = list_entry(rq->migration_queue.next,
> struct migration_req, list);
> list_del_init(&req->list);
> + spin_unlock_irq(&rq->lock);
> complete(&req->done);
> + spin_lock_irq(&rq->lock);
> }
> spin_unlock_irq(&rq->lock);
> break;
> _

2008-12-09 15:00:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/1] sched: CPU remove deadlock fix

On Tue, 2008-12-09 at 15:52 +0100, Peter Zijlstra wrote:
> On Tue, 2008-12-09 at 08:47 -0600, Brian King wrote:
> > This patch fixes a possible deadlock scenario in the CPU remove path.
> > migration_call grabs rq->lock, then wakes up everything on rq->migration_queue
> > with the lock held. Then one of the tasks on the migration queue ends up
> > calling tg_shares_up which then also tries to acquire the same rq->lock.
>
> Looks ok, does lockdep agree?

On second thought, I'm not seeing it at all..

why doesn't every wakeup deadlock?

> > Signed-off-by: Brian King <[email protected]>
> > ---
> >
> > kernel/sched.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff -puN kernel/sched.c~sched_cpu_down_deadlock_fix kernel/sched.c
> > --- linux-2.6/kernel/sched.c~sched_cpu_down_deadlock_fix 2008-12-09 08:42:09.000000000 -0600
> > +++ linux-2.6-bjking1/kernel/sched.c 2008-12-09 08:42:09.000000000 -0600
> > @@ -6587,7 +6587,9 @@ migration_call(struct notifier_block *nf
> > req = list_entry(rq->migration_queue.next,
> > struct migration_req, list);
> > list_del_init(&req->list);
> > + spin_unlock_irq(&rq->lock);
> > complete(&req->done);
> > + spin_lock_irq(&rq->lock);
> > }
> > spin_unlock_irq(&rq->lock);
> > break;
> > _

2008-12-09 15:08:27

by Brian King

[permalink] [raw]
Subject: Re: [PATCH 1/1] sched: CPU remove deadlock fix

Peter Zijlstra wrote:
> On Tue, 2008-12-09 at 08:47 -0600, Brian King wrote:
>> This patch fixes a possible deadlock scenario in the CPU remove path.
>> migration_call grabs rq->lock, then wakes up everything on rq->migration_queue
>> with the lock held. Then one of the tasks on the migration queue ends up
>> calling tg_shares_up which then also tries to acquire the same rq->lock.
>
> Looks ok, does lockdep agree?

I didn't have lockdep enabled when I verified it, but the section of code now
looks like:

spin_lock_irq(&rq->lock);
while (!list_empty(&rq->migration_queue)) {
struct migration_req *req;

req = list_entry(rq->migration_queue.next,
struct migration_req, list);
list_del_init(&req->list);
spin_unlock_irq(&rq->lock);
complete(&req->done);
spin_lock_irq(&rq->lock);
}
spin_unlock_irq(&rq->lock);

So I'm pretty sure lockdep will agree.

-Brian

>
>> Signed-off-by: Brian King <[email protected]>
>> ---
>>
>> kernel/sched.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff -puN kernel/sched.c~sched_cpu_down_deadlock_fix kernel/sched.c
>> --- linux-2.6/kernel/sched.c~sched_cpu_down_deadlock_fix 2008-12-09 08:42:09.000000000 -0600
>> +++ linux-2.6-bjking1/kernel/sched.c 2008-12-09 08:42:09.000000000 -0600
>> @@ -6587,7 +6587,9 @@ migration_call(struct notifier_block *nf
>> req = list_entry(rq->migration_queue.next,
>> struct migration_req, list);
>> list_del_init(&req->list);
>> + spin_unlock_irq(&rq->lock);
>> complete(&req->done);
>> + spin_lock_irq(&rq->lock);
>> }
>> spin_unlock_irq(&rq->lock);
>> break;
>> _
>


--
Brian King
Linux on Power Virtualization
IBM Linux Technology Center

2008-12-09 15:12:18

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/1] sched: CPU remove deadlock fix

On Tue, 2008-12-09 at 15:59 +0100, Peter Zijlstra wrote:
> On Tue, 2008-12-09 at 15:52 +0100, Peter Zijlstra wrote:
> > On Tue, 2008-12-09 at 08:47 -0600, Brian King wrote:
> > > This patch fixes a possible deadlock scenario in the CPU remove path.
> > > migration_call grabs rq->lock, then wakes up everything on rq->migration_queue
> > > with the lock held. Then one of the tasks on the migration queue ends up
> > > calling tg_shares_up which then also tries to acquire the same rq->lock.
> >
> > Looks ok, does lockdep agree?
>
> On second thought, I'm not seeing it at all..
>
> why doesn't every wakeup deadlock?

because I'm blind...

void __wake_up(wait_queue_head_t *q, unsigned int mode,
int nr_exclusive, void *key)
{
unsigned long flags;

spin_lock_irqsave(&q->lock, flags);
__wake_up_common(q, mode, nr_exclusive, 0, key);
spin_unlock_irqrestore(&q->lock, flags);
}

that's q->lock, not rq->lock...

> > > Signed-off-by: Brian King <[email protected]>
> > > ---
> > >
> > > kernel/sched.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff -puN kernel/sched.c~sched_cpu_down_deadlock_fix kernel/sched.c
> > > --- linux-2.6/kernel/sched.c~sched_cpu_down_deadlock_fix 2008-12-09 08:42:09.000000000 -0600
> > > +++ linux-2.6-bjking1/kernel/sched.c 2008-12-09 08:42:09.000000000 -0600
> > > @@ -6587,7 +6587,9 @@ migration_call(struct notifier_block *nf
> > > req = list_entry(rq->migration_queue.next,
> > > struct migration_req, list);
> > > list_del_init(&req->list);
> > > + spin_unlock_irq(&rq->lock);
> > > complete(&req->done);
> > > + spin_lock_irq(&rq->lock);
> > > }
> > > spin_unlock_irq(&rq->lock);
> > > break;
> > > _

2008-12-09 15:15:59

by Brian King

[permalink] [raw]
Subject: Re: [PATCH 1/1] sched: CPU remove deadlock fix

Peter Zijlstra wrote:
> On Tue, 2008-12-09 at 15:52 +0100, Peter Zijlstra wrote:
>> On Tue, 2008-12-09 at 08:47 -0600, Brian King wrote:
>>> This patch fixes a possible deadlock scenario in the CPU remove path.
>>> migration_call grabs rq->lock, then wakes up everything on rq->migration_queue
>>> with the lock held. Then one of the tasks on the migration queue ends up
>>> calling tg_shares_up which then also tries to acquire the same rq->lock.
>> Looks ok, does lockdep agree?
>
> On second thought, I'm not seeing it at all..
>
> why doesn't every wakeup deadlock?

>From what I can tell, the only other place that does a complete(&req->done)
is in migration_thread, and there the lock is released before doing the wakeup.

-Brian

>
>>> Signed-off-by: Brian King <[email protected]>
>>> ---
>>>
>>> kernel/sched.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff -puN kernel/sched.c~sched_cpu_down_deadlock_fix kernel/sched.c
>>> --- linux-2.6/kernel/sched.c~sched_cpu_down_deadlock_fix 2008-12-09 08:42:09.000000000 -0600
>>> +++ linux-2.6-bjking1/kernel/sched.c 2008-12-09 08:42:09.000000000 -0600
>>> @@ -6587,7 +6587,9 @@ migration_call(struct notifier_block *nf
>>> req = list_entry(rq->migration_queue.next,
>>> struct migration_req, list);
>>> list_del_init(&req->list);
>>> + spin_unlock_irq(&rq->lock);
>>> complete(&req->done);
>>> + spin_lock_irq(&rq->lock);
>>> }
>>> spin_unlock_irq(&rq->lock);
>>> break;
>>> _


--
Brian King
Linux on Power Virtualization
IBM Linux Technology Center

2008-12-09 18:14:19

by Heiko Carstens

[permalink] [raw]
Subject: Re: [PATCH 1/1] sched: CPU remove deadlock fix

On Tue, Dec 09, 2008 at 08:47:00AM -0600, Brian King wrote:
>
> This patch fixes a possible deadlock scenario in the CPU remove path.
> migration_call grabs rq->lock, then wakes up everything on rq->migration_queue
> with the lock held. Then one of the tasks on the migration queue ends up
> calling tg_shares_up which then also tries to acquire the same rq->lock.
>
> [c000000058eab2e0] c000000000502078 ._spin_lock_irqsave+0x98/0xf0
> [c000000058eab370] c00000000008011c .tg_shares_up+0x10c/0x20c
> [c000000058eab430] c00000000007867c .walk_tg_tree+0xc4/0xfc
> [c000000058eab4d0] c0000000000840c8 .try_to_wake_up+0xb0/0x3c4
> [c000000058eab590] c0000000000799a0 .__wake_up_common+0x6c/0xe0
> [c000000058eab640] c00000000007ada4 .complete+0x54/0x80
> [c000000058eab6e0] c000000000509fa8 .migration_call+0x5fc/0x6f8
> [c000000058eab7c0] c000000000504074 .notifier_call_chain+0x68/0xe0
> [c000000058eab860] c000000000506568 ._cpu_down+0x2b0/0x3f4
> [c000000058eaba60] c000000000506750 .cpu_down+0xa4/0x108
> [c000000058eabb10] c000000000507e54 .store_online+0x44/0xa8
> [c000000058eabba0] c000000000396260 .sysdev_store+0x3c/0x50
> [c000000058eabc10] c0000000001a39b8 .sysfs_write_file+0x124/0x18c
> [c000000058eabcd0] c00000000013061c .vfs_write+0xd0/0x1bc
> [c000000058eabd70] c0000000001308a4 .sys_write+0x68/0x114
> [c000000058eabe30] c0000000000086b4 syscall_exit+0x0/0x40
>
> Signed-off-by: Brian King <[email protected]>
> ---
>
> kernel/sched.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff -puN kernel/sched.c~sched_cpu_down_deadlock_fix kernel/sched.c
> --- linux-2.6/kernel/sched.c~sched_cpu_down_deadlock_fix 2008-12-09 08:42:09.000000000 -0600
> +++ linux-2.6-bjking1/kernel/sched.c 2008-12-09 08:42:09.000000000 -0600
> @@ -6587,7 +6587,9 @@ migration_call(struct notifier_block *nf
> req = list_entry(rq->migration_queue.next,
> struct migration_req, list);
> list_del_init(&req->list);
> + spin_unlock_irq(&rq->lock);
> complete(&req->done);
> + spin_lock_irq(&rq->lock);
> }
> spin_unlock_irq(&rq->lock);
> break;

I've seen the same deadlock on s390 and had a similar fix, but
no testing yet. Thanks for the fix! :)

Will this be in 2.6.28?

2008-12-09 18:27:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/1] sched: CPU remove deadlock fix


* Brian King <[email protected]> wrote:

> This patch fixes a possible deadlock scenario in the CPU remove path.
> migration_call grabs rq->lock, then wakes up everything on
> rq->migration_queue with the lock held. Then one of the tasks on the
> migration queue ends up calling tg_shares_up which then also tries to
> acquire the same rq->lock.
>
> [c000000058eab2e0] c000000000502078 ._spin_lock_irqsave+0x98/0xf0
> [c000000058eab370] c00000000008011c .tg_shares_up+0x10c/0x20c
> [c000000058eab430] c00000000007867c .walk_tg_tree+0xc4/0xfc
> [c000000058eab4d0] c0000000000840c8 .try_to_wake_up+0xb0/0x3c4
> [c000000058eab590] c0000000000799a0 .__wake_up_common+0x6c/0xe0
> [c000000058eab640] c00000000007ada4 .complete+0x54/0x80
> [c000000058eab6e0] c000000000509fa8 .migration_call+0x5fc/0x6f8
> [c000000058eab7c0] c000000000504074 .notifier_call_chain+0x68/0xe0
> [c000000058eab860] c000000000506568 ._cpu_down+0x2b0/0x3f4
> [c000000058eaba60] c000000000506750 .cpu_down+0xa4/0x108
> [c000000058eabb10] c000000000507e54 .store_online+0x44/0xa8
> [c000000058eabba0] c000000000396260 .sysdev_store+0x3c/0x50
> [c000000058eabc10] c0000000001a39b8 .sysfs_write_file+0x124/0x18c
> [c000000058eabcd0] c00000000013061c .vfs_write+0xd0/0x1bc
> [c000000058eabd70] c0000000001308a4 .sys_write+0x68/0x114
> [c000000058eabe30] c0000000000086b4 syscall_exit+0x0/0x40
>
> Signed-off-by: Brian King <[email protected]>
> ---
>
> kernel/sched.c | 2 ++
> 1 file changed, 2 insertions(+)

good catch - applied to tip/sched/urgent, thanks Brian!

Ingo

2008-12-09 18:28:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/1] sched: CPU remove deadlock fix


* Heiko Carstens <[email protected]> wrote:

> On Tue, Dec 09, 2008 at 08:47:00AM -0600, Brian King wrote:
> >
> > This patch fixes a possible deadlock scenario in the CPU remove path.
> > migration_call grabs rq->lock, then wakes up everything on rq->migration_queue
> > with the lock held. Then one of the tasks on the migration queue ends up
> > calling tg_shares_up which then also tries to acquire the same rq->lock.
> >
> > [c000000058eab2e0] c000000000502078 ._spin_lock_irqsave+0x98/0xf0
> > [c000000058eab370] c00000000008011c .tg_shares_up+0x10c/0x20c
> > [c000000058eab430] c00000000007867c .walk_tg_tree+0xc4/0xfc
> > [c000000058eab4d0] c0000000000840c8 .try_to_wake_up+0xb0/0x3c4
> > [c000000058eab590] c0000000000799a0 .__wake_up_common+0x6c/0xe0
> > [c000000058eab640] c00000000007ada4 .complete+0x54/0x80
> > [c000000058eab6e0] c000000000509fa8 .migration_call+0x5fc/0x6f8
> > [c000000058eab7c0] c000000000504074 .notifier_call_chain+0x68/0xe0
> > [c000000058eab860] c000000000506568 ._cpu_down+0x2b0/0x3f4
> > [c000000058eaba60] c000000000506750 .cpu_down+0xa4/0x108
> > [c000000058eabb10] c000000000507e54 .store_online+0x44/0xa8
> > [c000000058eabba0] c000000000396260 .sysdev_store+0x3c/0x50
> > [c000000058eabc10] c0000000001a39b8 .sysfs_write_file+0x124/0x18c
> > [c000000058eabcd0] c00000000013061c .vfs_write+0xd0/0x1bc
> > [c000000058eabd70] c0000000001308a4 .sys_write+0x68/0x114
> > [c000000058eabe30] c0000000000086b4 syscall_exit+0x0/0x40
> >
> > Signed-off-by: Brian King <[email protected]>
> > ---
> >
> > kernel/sched.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff -puN kernel/sched.c~sched_cpu_down_deadlock_fix kernel/sched.c
> > --- linux-2.6/kernel/sched.c~sched_cpu_down_deadlock_fix 2008-12-09 08:42:09.000000000 -0600
> > +++ linux-2.6-bjking1/kernel/sched.c 2008-12-09 08:42:09.000000000 -0600
> > @@ -6587,7 +6587,9 @@ migration_call(struct notifier_block *nf
> > req = list_entry(rq->migration_queue.next,
> > struct migration_req, list);
> > list_del_init(&req->list);
> > + spin_unlock_irq(&rq->lock);
> > complete(&req->done);
> > + spin_lock_irq(&rq->lock);
> > }
> > spin_unlock_irq(&rq->lock);
> > break;
>
> I've seen the same deadlock on s390 and had a similar fix, but
> no testing yet. Thanks for the fix! :)
>
> Will this be in 2.6.28?

yeah, most definitely.

Ingo