2015-02-12 00:34:39

by Paul E. McKenney

Subject: BUG: spinlock bad magic on CPU#0, migration/0/9

Hello!

Did an earlier-than-usual port of v3.21 patches to post-v3.19, and
hit the following on x86_64. This happened after about 15 minutes of
rcutorture. In contrast, I have been doing successful 15-hour runs
on v3.19. I will check reproducibility and try to narrow it down.
Might this be a duplicate of the bug that Raghavendra posted a fix for?

Anyway, this was on 3e8c04eb1174 (Merge branch 'for-3.20' of
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata).

[ 837.287011] BUG: spinlock bad magic on CPU#0, migration/0/9
[ 837.287013] lock: 0xffff88001ea0fe80, .magic: ffffffff, .owner: ?g?<81>????/0, .owner_cpu: -42
[ 837.287013] CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0+ #1
[ 837.287013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 837.287013] ffff88001ea0fe80 ffff88001ea0bc78 ffffffff818f6f4b ffffffff810a5a51
[ 837.287013] ffffffff81e500e0 ffff88001ea0bc98 ffffffff818f3755 ffff88001ea0fe80
[ 837.287013] ffffffff81ca4396 ffff88001ea0bcb8 ffffffff818f377b ffff88001ea0fe80
[ 837.287013] Call Trace:
[ 837.287013] [<ffffffff818f6f4b>] dump_stack+0x45/0x57
[ 837.287013] [<ffffffff810a5a51>] ? console_unlock+0x1f1/0x4c0
[ 837.287013] [<ffffffff818f3755>] spin_dump+0x8b/0x90
[ 837.287013] [<ffffffff818f377b>] spin_bug+0x21/0x26
[ 837.287013] [<ffffffff8109923c>] do_raw_spin_unlock+0x5c/0xa0
[ 837.287013] [<ffffffff81902587>] _raw_spin_unlock_irqrestore+0x27/0x50
[ 837.287013] [<ffffffff8108f0a1>] complete+0x41/0x50
[ 837.287013] [<ffffffff810ebb15>] cpu_stop_signal_done+0x25/0x30
[ 837.287013] [<ffffffff810ebf66>] cpu_stopper_thread+0x86/0x100
[ 837.287013] [<ffffffff810ebc73>] ? cpu_stop_should_run+0x23/0x50
[ 837.287013] [<ffffffff81902591>] ? _raw_spin_unlock_irqrestore+0x31/0x50
[ 837.287013] [<ffffffff8109420d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 837.287013] [<ffffffff810942dd>] ? trace_hardirqs_on+0xd/0x10
[ 837.287013] [<ffffffff810732a3>] smpboot_thread_fn+0x143/0x1b0
[ 837.287013] [<ffffffff81073160>] ? SyS_setgroups+0x150/0x150
[ 837.287013] [<ffffffff8106f438>] kthread+0xe8/0x100
[ 837.287013] [<ffffffff8106f350>] ? kthread_create_on_node+0x220/0x220
[ 837.287013] [<ffffffff81902dac>] ret_from_fork+0x7c/0xb0
[ 837.287013] [<ffffffff8106f350>] ? kthread_create_on_node+0x220/0x220

Things went downhill from there. ;-)

This affected only one of my 16 test scenarios.

Thanx, Paul


Attachments:
(No filename) (2.38 kB)
.config (95.24 kB)

2015-02-12 03:15:24

by Davidlohr Bueso

Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9

On Wed, 2015-02-11 at 16:34 -0800, Paul E. McKenney wrote:
> Hello!
>
> Did an earlier-than-usual port of v3.21 patches to post-v3.19, and
> hit the following on x86_64. This happened after about 15 minutes of
> rcutorture. In contrast, I have been doing successful 15-hour runs
> on v3.19. I will check reproducibility and try to narrow it down.
> Might this be a duplicate of the bug that Raghavendra posted a fix for?
>
> Anyway, this was on 3e8c04eb1174 (Merge branch 'for-3.20' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata).
>
> [ 837.287011] BUG: spinlock bad magic on CPU#0, migration/0/9
> [ 837.287013] lock: 0xffff88001ea0fe80, .magic: ffffffff, .owner: gî<81>ÿÿÿÿ/0, .owner_cpu: -42
> [ 837.287013] CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0+ #1
> [ 837.287013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [ 837.287013] ffff88001ea0fe80 ffff88001ea0bc78 ffffffff818f6f4b ffffffff810a5a51
> [ 837.287013] ffffffff81e500e0 ffff88001ea0bc98 ffffffff818f3755 ffff88001ea0fe80
> [ 837.287013] ffffffff81ca4396 ffff88001ea0bcb8 ffffffff818f377b ffff88001ea0fe80
> [ 837.287013] Call Trace:
> [ 837.287013] [<ffffffff818f6f4b>] dump_stack+0x45/0x57
> [ 837.287013] [<ffffffff810a5a51>] ? console_unlock+0x1f1/0x4c0
> [ 837.287013] [<ffffffff818f3755>] spin_dump+0x8b/0x90
> [ 837.287013] [<ffffffff818f377b>] spin_bug+0x21/0x26
> [ 837.287013] [<ffffffff8109923c>] do_raw_spin_unlock+0x5c/0xa0
> [ 837.287013] [<ffffffff81902587>] _raw_spin_unlock_irqrestore+0x27/0x50
> [ 837.287013] [<ffffffff8108f0a1>] complete+0x41/0x50

We did have some recent changes in completions:

7c34e318 (sched/completion: Add lock-free checking of the blocking case)
de30ec47 (sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state)

The second one being more related (although both appear to make sense).
Perhaps some subtle implication in the completion_done side that
disappeared with the spinlock?
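
For reference, what de30ec47 did to completion_done() was roughly this (a
simplified sketch from memory, not the exact diff): the lock/unlock bracketing
the ->done check was replaced by a plain lockless read.

/* before de30ec47 (sketch) */
bool completion_done(struct completion *x)
{
        unsigned long flags;
        int ret = 1;

        spin_lock_irqsave(&x->wait.lock, flags);
        if (!x->done)
                ret = 0;
        spin_unlock_irqrestore(&x->wait.lock, flags);
        return ret;
}

/* after de30ec47 */
bool completion_done(struct completion *x)
{
        return !!ACCESS_ONCE(x->done);
}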

Thanks,
Davidlohr

2015-02-12 03:43:45

by Paul E. McKenney

Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9

On Wed, Feb 11, 2015 at 07:15:11PM -0800, Davidlohr Bueso wrote:
> On Wed, 2015-02-11 at 16:34 -0800, Paul E. McKenney wrote:
> > Hello!
> >
> > Did an earlier-than-usual port of v3.21 patches to post-v3.19, and
> > hit the following on x86_64. This happened after about 15 minutes of
> > rcutorture. In contrast, I have been doing successful 15-hour runs
> > on v3.19. I will check reproducibility and try to narrow it down.
> > Might this be a duplicate of the bug that Raghavendra posted a fix for?
> >
> > Anyway, this was on 3e8c04eb1174 (Merge branch 'for-3.20' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata).
> >
> > [ 837.287011] BUG: spinlock bad magic on CPU#0, migration/0/9
> > [ 837.287013] lock: 0xffff88001ea0fe80, .magic: ffffffff, .owner: g?<81>????/0, .owner_cpu: -42
> > [ 837.287013] CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0+ #1
> > [ 837.287013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [ 837.287013] ffff88001ea0fe80 ffff88001ea0bc78 ffffffff818f6f4b ffffffff810a5a51
> > [ 837.287013] ffffffff81e500e0 ffff88001ea0bc98 ffffffff818f3755 ffff88001ea0fe80
> > [ 837.287013] ffffffff81ca4396 ffff88001ea0bcb8 ffffffff818f377b ffff88001ea0fe80
> > [ 837.287013] Call Trace:
> > [ 837.287013] [<ffffffff818f6f4b>] dump_stack+0x45/0x57
> > [ 837.287013] [<ffffffff810a5a51>] ? console_unlock+0x1f1/0x4c0
> > [ 837.287013] [<ffffffff818f3755>] spin_dump+0x8b/0x90
> > [ 837.287013] [<ffffffff818f377b>] spin_bug+0x21/0x26
> > [ 837.287013] [<ffffffff8109923c>] do_raw_spin_unlock+0x5c/0xa0
> > [ 837.287013] [<ffffffff81902587>] _raw_spin_unlock_irqrestore+0x27/0x50
> > [ 837.287013] [<ffffffff8108f0a1>] complete+0x41/0x50
>
> We did have some recent changes in completions:
>
> 7c34e318 (sched/completion: Add lock-free checking of the blocking case)
> de30ec47 (sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state)
>
> The second one being more related (although both appear to make sense).
> Perhaps some subtle implication in the completion_done side that
> disappeared with the spinlock?

I will try reverting those, thank you!

Thanx, Paul

2015-02-12 17:30:25

by Oleg Nesterov

Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9

On 02/11, Davidlohr Bueso wrote:
>
> On Wed, 2015-02-11 at 16:34 -0800, Paul E. McKenney wrote:
> > Hello!
> >
> > Did an earlier-than-usual port of v3.21 patches to post-v3.19, and
> > hit the following on x86_64. This happened after about 15 minutes of
> > rcutorture. In contrast, I have been doing successful 15-hour runs
> > on v3.19. I will check reproducibility and try to narrow it down.
> > Might this be a duplicate of the bug that Raghavendra posted a fix for?
> >
> > Anyway, this was on 3e8c04eb1174 (Merge branch 'for-3.20' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata).
> >
> > [ 837.287011] BUG: spinlock bad magic on CPU#0, migration/0/9
> > [ 837.287013] lock: 0xffff88001ea0fe80, .magic: ffffffff, .owner: g?<81>????/0, .owner_cpu: -42
> > [ 837.287013] CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0+ #1
> > [ 837.287013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [ 837.287013] ffff88001ea0fe80 ffff88001ea0bc78 ffffffff818f6f4b ffffffff810a5a51
> > [ 837.287013] ffffffff81e500e0 ffff88001ea0bc98 ffffffff818f3755 ffff88001ea0fe80
> > [ 837.287013] ffffffff81ca4396 ffff88001ea0bcb8 ffffffff818f377b ffff88001ea0fe80
> > [ 837.287013] Call Trace:
> > [ 837.287013] [<ffffffff818f6f4b>] dump_stack+0x45/0x57
> > [ 837.287013] [<ffffffff810a5a51>] ? console_unlock+0x1f1/0x4c0
> > [ 837.287013] [<ffffffff818f3755>] spin_dump+0x8b/0x90
> > [ 837.287013] [<ffffffff818f377b>] spin_bug+0x21/0x26
> > [ 837.287013] [<ffffffff8109923c>] do_raw_spin_unlock+0x5c/0xa0
> > [ 837.287013] [<ffffffff81902587>] _raw_spin_unlock_irqrestore+0x27/0x50
> > [ 837.287013] [<ffffffff8108f0a1>] complete+0x41/0x50
>
> We did have some recent changes in completions:
>
> 7c34e318 (sched/completion: Add lock-free checking of the blocking case)
> de30ec47 (sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state)
>
> The second one being more related (although both appear to make sense).
> Perhaps some subtle implication in the completion_done side that
> disappeared with the spinlock?

At first glance both changes look suspicious. Or at least we should document
how these helpers can be used.

Consider this code:

void xxx(void)
{
        struct completion c;

        init_completion(&c);

        expose_this_completion(&c);

        while (!completion_done(&c))
                schedule_timeout_uninterruptible(1);
}

Before that change this code was correct, now it is not. Hmm and note that
this is what stop_machine_from_inactive_cpu() does although I do not know
if this is related or not.

Because completion_done() can now race with complete(), the final
spin_unlock() can write to the memory after it was freed/reused. In this
case it can write to the stack after return.
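
To make the race concrete, here is a minimal userspace model of it (a pthread
mutex stands in for ->wait.lock and an atomic flag for ->done; this is only an
illustration, not the kernel code):

#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct fake_completion {
        pthread_mutex_t lock;           /* models x->wait.lock */
        atomic_int done;                /* models x->done */
};

static void *completer(void *arg)       /* models complete() */
{
        struct fake_completion *c = arg;

        pthread_mutex_lock(&c->lock);
        atomic_store(&c->done, 1);      /* a lockless poller may return here... */
        pthread_mutex_unlock(&c->lock); /* ...while this unlock still writes to *c */
        return NULL;
}

int main(void)
{
        struct fake_completion *c = malloc(sizeof(*c));
        pthread_t t;

        pthread_mutex_init(&c->lock, NULL);
        atomic_init(&c->done, 0);
        pthread_create(&t, NULL, completer, c);

        while (!atomic_load(&c->done))  /* lockless completion_done() analogue */
                ;
        free(c);                        /* freed/reused while the unlock above
                                           may still be in flight */
        pthread_join(&t, NULL);
        return 0;
}

The on-stack case in xxx() above is the same thing, with free() replaced by
the function returning.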

Add CC's.

Oleg.

2015-02-12 17:45:05

by Oleg Nesterov

Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9

On 02/12, Oleg Nesterov wrote:
> On 02/11, Davidlohr Bueso wrote:
> >
> > On Wed, 2015-02-11 at 16:34 -0800, Paul E. McKenney wrote:
> > > Hello!
> > >
> > > Did an earlier-than-usual port of v3.21 patches to post-v3.19, and
> > > hit the following on x86_64. This happened after about 15 minutes of
> > > rcutorture. In contrast, I have been doing successful 15-hour runs
> > > on v3.19. I will check reproducibility and try to narrow it down.
> > > Might this be a duplicate of the bug that Raghavendra posted a fix for?
> > >
> > > Anyway, this was on 3e8c04eb1174 (Merge branch 'for-3.20' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata).
> > >
> > > [ 837.287011] BUG: spinlock bad magic on CPU#0, migration/0/9
> > > [ 837.287013] lock: 0xffff88001ea0fe80, .magic: ffffffff, .owner: g?<81>????/0, .owner_cpu: -42
> > > [ 837.287013] CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0+ #1
> > > [ 837.287013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > [ 837.287013] ffff88001ea0fe80 ffff88001ea0bc78 ffffffff818f6f4b ffffffff810a5a51
> > > [ 837.287013] ffffffff81e500e0 ffff88001ea0bc98 ffffffff818f3755 ffff88001ea0fe80
> > > [ 837.287013] ffffffff81ca4396 ffff88001ea0bcb8 ffffffff818f377b ffff88001ea0fe80
> > > [ 837.287013] Call Trace:
> > > [ 837.287013] [<ffffffff818f6f4b>] dump_stack+0x45/0x57
> > > [ 837.287013] [<ffffffff810a5a51>] ? console_unlock+0x1f1/0x4c0
> > > [ 837.287013] [<ffffffff818f3755>] spin_dump+0x8b/0x90
> > > [ 837.287013] [<ffffffff818f377b>] spin_bug+0x21/0x26
> > > [ 837.287013] [<ffffffff8109923c>] do_raw_spin_unlock+0x5c/0xa0
> > > [ 837.287013] [<ffffffff81902587>] _raw_spin_unlock_irqrestore+0x27/0x50
> > > [ 837.287013] [<ffffffff8108f0a1>] complete+0x41/0x50
> >
> > We did have some recent changes in completions:
> >
> > 7c34e318 (sched/completion: Add lock-free checking of the blocking case)
> > de30ec47 (sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state)
> >
> > The second one being more related (although both appear to make sense).
> > Perhaps some subtle implication in the completion_done side that
> > disappeared with the spinlock?
>
> At first glance both changes look suspicious.

No, sorry, only the 2nd one.

> Unless at least document how
> you can use these helpers.
>
> Consider this code:
>
> void xxx(void)
> {
> struct completion c;
>
> init_completion(&c);
>
> expose_this_completion(&c);
>
> while (!completion_done(&c)
> schedule_timeout_uninterruptible(1);
> }
>
> Before that change this code was correct, now it is not. Hmm and note that
> this is what stop_machine_from_inactive_cpu() does although I do not know
> if this is related or not.
>
> Because completion_done() can now race with complete(), the final
> spin_unlock() can write to the memory after it was freed/reused. In this
> case it can write to the stack after return.
>
> Add CC's.

Nicholas, don't we need something like below?

Oleg.


--- x/kernel/sched/completion.c
+++ x/kernel/sched/completion.c
@@ -274,7 +274,7 @@ bool try_wait_for_completion(struct comp
* first without taking the lock so we can
* return early in the blocking case.
*/
- if (!ACCESS_ONCE(x->done))
+ if (!READ_ONCE(x->done))
return 0;

spin_lock_irqsave(&x->wait.lock, flags);
@@ -297,6 +297,11 @@ EXPORT_SYMBOL(try_wait_for_completion);
*/
bool completion_done(struct completion *x)
{
- return !!ACCESS_ONCE(x->done);
+ if (!READ_ONCE(x->done))
+ return false;
+
+ smp_rmb();
+ spin_unlock_wait(&x->wait.lock);
+ return true;
}
EXPORT_SYMBOL(completion_done);

2015-02-12 17:58:33

by Davidlohr Bueso

Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9

On Thu, 2015-02-12 at 18:41 +0100, Oleg Nesterov wrote:
> On 02/12, Oleg Nesterov wrote:
> > On 02/11, Davidlohr Bueso wrote:
> > >
> > > On Wed, 2015-02-11 at 16:34 -0800, Paul E. McKenney wrote:
> > > > Hello!
> > > >
> > > > Did an earlier-than-usual port of v3.21 patches to post-v3.19, and
> > > > hit the following on x86_64. This happened after about 15 minutes of
> > > > rcutorture. In contrast, I have been doing successful 15-hour runs
> > > > on v3.19. I will check reproducibility and try to narrow it down.
> > > > Might this be a duplicate of the bug that Raghavendra posted a fix for?
> > > >
> > > > Anyway, this was on 3e8c04eb1174 (Merge branch 'for-3.20' of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata).
> > > >
> > > > [ 837.287011] BUG: spinlock bad magic on CPU#0, migration/0/9
> > > > [ 837.287013] lock: 0xffff88001ea0fe80, .magic: ffffffff, .owner: gî<81>ÿÿÿÿ/0, .owner_cpu: -42
> > > > [ 837.287013] CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0+ #1
> > > > [ 837.287013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > > [ 837.287013] ffff88001ea0fe80 ffff88001ea0bc78 ffffffff818f6f4b ffffffff810a5a51
> > > > [ 837.287013] ffffffff81e500e0 ffff88001ea0bc98 ffffffff818f3755 ffff88001ea0fe80
> > > > [ 837.287013] ffffffff81ca4396 ffff88001ea0bcb8 ffffffff818f377b ffff88001ea0fe80
> > > > [ 837.287013] Call Trace:
> > > > [ 837.287013] [<ffffffff818f6f4b>] dump_stack+0x45/0x57
> > > > [ 837.287013] [<ffffffff810a5a51>] ? console_unlock+0x1f1/0x4c0
> > > > [ 837.287013] [<ffffffff818f3755>] spin_dump+0x8b/0x90
> > > > [ 837.287013] [<ffffffff818f377b>] spin_bug+0x21/0x26
> > > > [ 837.287013] [<ffffffff8109923c>] do_raw_spin_unlock+0x5c/0xa0
> > > > [ 837.287013] [<ffffffff81902587>] _raw_spin_unlock_irqrestore+0x27/0x50
> > > > [ 837.287013] [<ffffffff8108f0a1>] complete+0x41/0x50
> > >
> > > We did have some recent changes in completions:
> > >
> > > 7c34e318 (sched/completion: Add lock-free checking of the blocking case)
> > > de30ec47 (sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state)
> > >
> > > The second one being more related (although both appear to make sense).
> > > Perhaps some subtle implication in the completion_done side that
> > > disappeared with the spinlock?
> >
> > At first glance both changes look suspicious.
>
> No, sorry, only the 2nd one.

Yes, we can get away with it for the "try" case, and we recheck the done
condition again after grabbing the lock anyway.
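
That is, try_wait_for_completion() is roughly (a sketch from memory, details
may differ from the actual code):

bool try_wait_for_completion(struct completion *x)
{
        unsigned long flags;
        int ret = 1;

        /* lockless early return for the blocking case */
        if (!ACCESS_ONCE(x->done))
                return 0;

        spin_lock_irqsave(&x->wait.lock, flags);
        if (!x->done)
                ret = 0;                /* rechecked under the lock */
        else
                x->done--;              /* ->done is only consumed here */
        spin_unlock_irqrestore(&x->wait.lock, flags);
        return ret;
}

so a stale read can only cause an early "not done" return, and anything that
actually consumes ->done still happens under the lock.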

2015-02-12 19:10:14

by Nicholas Mc Guire

Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9

On Thu, 12 Feb 2015, Oleg Nesterov wrote:

> On 02/12, Oleg Nesterov wrote:
> > On 02/11, Davidlohr Bueso wrote:
> > >
> > > On Wed, 2015-02-11 at 16:34 -0800, Paul E. McKenney wrote:
> > > > Hello!
> > > >
> > > > Did an earlier-than-usual port of v3.21 patches to post-v3.19, and
> > > > hit the following on x86_64. This happened after about 15 minutes of
> > > > rcutorture. In contrast, I have been doing successful 15-hour runs
> > > > on v3.19. I will check reproducibility and try to narrow it down.
> > > > Might this be a duplicate of the bug that Raghavendra posted a fix for?
> > > >
> > > > Anyway, this was on 3e8c04eb1174 (Merge branch 'for-3.20' of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata).
> > > >
> > > > [ 837.287011] BUG: spinlock bad magic on CPU#0, migration/0/9
> > > > [ 837.287013] lock: 0xffff88001ea0fe80, .magic: ffffffff, .owner: g?<81>????/0, .owner_cpu: -42
> > > > [ 837.287013] CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0+ #1
> > > > [ 837.287013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > > [ 837.287013] ffff88001ea0fe80 ffff88001ea0bc78 ffffffff818f6f4b ffffffff810a5a51
> > > > [ 837.287013] ffffffff81e500e0 ffff88001ea0bc98 ffffffff818f3755 ffff88001ea0fe80
> > > > [ 837.287013] ffffffff81ca4396 ffff88001ea0bcb8 ffffffff818f377b ffff88001ea0fe80
> > > > [ 837.287013] Call Trace:
> > > > [ 837.287013] [<ffffffff818f6f4b>] dump_stack+0x45/0x57
> > > > [ 837.287013] [<ffffffff810a5a51>] ? console_unlock+0x1f1/0x4c0
> > > > [ 837.287013] [<ffffffff818f3755>] spin_dump+0x8b/0x90
> > > > [ 837.287013] [<ffffffff818f377b>] spin_bug+0x21/0x26
> > > > [ 837.287013] [<ffffffff8109923c>] do_raw_spin_unlock+0x5c/0xa0
> > > > [ 837.287013] [<ffffffff81902587>] _raw_spin_unlock_irqrestore+0x27/0x50
> > > > [ 837.287013] [<ffffffff8108f0a1>] complete+0x41/0x50
> > >
> > > We did have some recent changes in completions:
> > >
> > > 7c34e318 (sched/completion: Add lock-free checking of the blocking case)
> > > de30ec47 (sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state)
> > >
> > > The second one being more related (although both appear to make sense).
> > > Perhaps some subtle implication in the completion_done side that
> > > disappeared with the spinlock?
> >
> > At first glance both changes look suspicious.
>
> No, sorry, only the 2nd one.
>
> > Unless at least document how
> > you can use these helpers.
> >
> > Consider this code:
> >
> > void xxx(void)
> > {
> > struct completion c;
> >
> > init_completion(&c);
> >
> > expose_this_completion(&c);
> >
> > while (!completion_done(&c)
> > schedule_timeout_uninterruptible(1);

But that would not break due to the change - even if completion_done() had a
problem - completion_done() is not consuming x->done, it is only checking it?

> > }
> >
> > Before that change this code was correct, now it is not. Hmm and note that
> > this is what stop_machine_from_inactive_cpu() does although I do not know
> > if this is related or not.
> >
> > Because completion_done() can now race with complete(), the final
> > spin_unlock() can write to the memory after it was freed/reused. In this
> > case it can write to the stack after return.
> >
> > Add CC's.
>
> Nicholas, don't we need something like below?
>
> Oleg.
>
>
> --- x/kernel/sched/completion.c
> +++ x/kernel/sched/completion.c
> @@ -274,7 +274,7 @@ bool try_wait_for_completion(struct comp
> * first without taking the lock so we can
> * return early in the blocking case.
> */
> - if (!ACCESS_ONCE(x->done))
> + if (!READ_ONCE(x->done))
> return 0;
>
> spin_lock_irqsave(&x->wait.lock, flags);
> @@ -297,6 +297,11 @@ EXPORT_SYMBOL(try_wait_for_completion);
> */
> bool completion_done(struct completion *x)
> {
> - return !!ACCESS_ONCE(x->done);
> + if (!READ_ONCE(x->done))
> + return false;
> +
> + smp_rmb();
> + spin_unlock_wait(&x->wait.lock);
> + return true;

what would be the sense of the spin_unlock_wait here ?
all you are interested in here is the state of x->done

regarding the smp_rmb() where would the counterpart to that be ?

looking at it and trying to see how the changes could cause your oops - I don't yet see it.

thx!
hofrat

2015-02-12 19:32:16

by Nicholas Mc Guire

Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9

On Thu, 12 Feb 2015, Oleg Nesterov wrote:

> On 02/12, Oleg Nesterov wrote:
> > On 02/11, Davidlohr Bueso wrote:
> > >
> > > On Wed, 2015-02-11 at 16:34 -0800, Paul E. McKenney wrote:
> > > > Hello!
> > > >
> > > > Did an earlier-than-usual port of v3.21 patches to post-v3.19, and
> > > > hit the following on x86_64. This happened after about 15 minutes of
> > > > rcutorture. In contrast, I have been doing successful 15-hour runs
> > > > on v3.19. I will check reproducibility and try to narrow it down.
> > > > Might this be a duplicate of the bug that Raghavendra posted a fix for?
> > > >
> > > > Anyway, this was on 3e8c04eb1174 (Merge branch 'for-3.20' of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata).
> > > >
> > > > [ 837.287011] BUG: spinlock bad magic on CPU#0, migration/0/9
> > > > [ 837.287013] lock: 0xffff88001ea0fe80, .magic: ffffffff, .owner: g?<81>????/0, .owner_cpu: -42
> > > > [ 837.287013] CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0+ #1
> > > > [ 837.287013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > > [ 837.287013] ffff88001ea0fe80 ffff88001ea0bc78 ffffffff818f6f4b ffffffff810a5a51
> > > > [ 837.287013] ffffffff81e500e0 ffff88001ea0bc98 ffffffff818f3755 ffff88001ea0fe80
> > > > [ 837.287013] ffffffff81ca4396 ffff88001ea0bcb8 ffffffff818f377b ffff88001ea0fe80
> > > > [ 837.287013] Call Trace:
> > > > [ 837.287013] [<ffffffff818f6f4b>] dump_stack+0x45/0x57
> > > > [ 837.287013] [<ffffffff810a5a51>] ? console_unlock+0x1f1/0x4c0
> > > > [ 837.287013] [<ffffffff818f3755>] spin_dump+0x8b/0x90
> > > > [ 837.287013] [<ffffffff818f377b>] spin_bug+0x21/0x26
> > > > [ 837.287013] [<ffffffff8109923c>] do_raw_spin_unlock+0x5c/0xa0
> > > > [ 837.287013] [<ffffffff81902587>] _raw_spin_unlock_irqrestore+0x27/0x50
> > > > [ 837.287013] [<ffffffff8108f0a1>] complete+0x41/0x50
> > >
> > > We did have some recent changes in completions:
> > >
> > > 7c34e318 (sched/completion: Add lock-free checking of the blocking case)
> > > de30ec47 (sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state)
> > >
> > > The second one being more related (although both appear to make sense).
> > > Perhaps some subtle implication in the completion_done side that
> > > disappeared with the spinlock?
> >
> > At first glance both changes look suspicious.
>
> No, sorry, only the 2nd one.
>
> > Unless at least document how
> > you can use these helpers.
> >
> > Consider this code:
> >
> > void xxx(void)
> > {
> > struct completion c;
> >
> > init_completion(&c);
> >
> > expose_this_completion(&c);
> >
> > while (!completion_done(&c)
> > schedule_timeout_uninterruptible(1);
> > }
> >
> > Before that change this code was correct, now it is not. Hmm and note that
> > this is what stop_machine_from_inactive_cpu() does although I do not know
> > if this is related or not.
> >
> > Because completion_done() can now race with complete(), the final
> > spin_unlock() can write to the memory after it was freed/reused. In this
> > case it can write to the stack after return.
> >
> > Add CC's.
>
> Nicholas, don't we need something like below?
>
> Oleg.
>
>
> --- x/kernel/sched/completion.c
> +++ x/kernel/sched/completion.c
> @@ -274,7 +274,7 @@ bool try_wait_for_completion(struct comp
> * first without taking the lock so we can
> * return early in the blocking case.
> */
> - if (!ACCESS_ONCE(x->done))
> + if (!READ_ONCE(x->done))
> return 0;
>
from looking at compiler.h I don't think that there would be a difference
between ACCESS_ONCE() and READ_ONCE() in this case - done is an unsigned int
here, so it would be a single read instruction on the PPC440 as well and
would not resort to the barrier-protected memcpy.

Is the oops reproducible? If it is, could you drop me a few lines on how to
trigger it?

thx!
hofrat

2015-02-12 19:39:27

by Oleg Nesterov

Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9

On 02/12, Nicholas Mc Guire wrote:
>
> On Thu, 12 Feb 2015, Oleg Nesterov wrote:
>
> > No, sorry, only the 2nd one.
> >
> > > Unless at least document how
> > > you can use these helpers.
> > >
> > > Consider this code:
> > >
> > > void xxx(void)
> > > {
> > > struct completion c;
> > >
> > > init_completion(&c);
> > >
> > > expose_this_completion(&c);
> > >
> > > while (!completion_done(&c)
> > > schedule_timeout_uninterruptible(1);
>
> But that would not break due to the change - even if completion_done() had a
> problem - complete_done() is not consuming x->done it is only checking it?

Nicholas, looks like you didn't read the text below:

> > > Before that change this code was correct, now it is not. Hmm and note that
> > > this is what stop_machine_from_inactive_cpu() does although I do not know
> > > if this is related or not.
> > >
> > > Because completion_done() can now race with complete(), the final
> > > spin_unlock() can write to the memory after it was freed/reused. In this
> > > case it can write to the stack after return.

Or I misunderstood you.

> > bool completion_done(struct completion *x)
> > {
> > - return !!ACCESS_ONCE(x->done);
> > + if (!READ_ONCE(x->done))
> > + return false;
> > +
> > + smp_rmb();
> > + spin_unlock_wait(&x->wait.lock);
> > + return true;
>
> what would be the sense of the spin_unlock_wait here ?
> all you are interested in here is the state of x->done

No. Please see above.

> regarding the smp_rmb() where would the counterpart to that be ?

to avoid the reordering, we should not read ->wait.lock before ->done.

Oleg.

2015-02-12 19:41:24

by Oleg Nesterov

Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9

On 02/12, Nicholas Mc Guire wrote:
>
> On Thu, 12 Feb 2015, Oleg Nesterov wrote:
>
> > --- x/kernel/sched/completion.c
> > +++ x/kernel/sched/completion.c
> > @@ -274,7 +274,7 @@ bool try_wait_for_completion(struct comp
> > * first without taking the lock so we can
> > * return early in the blocking case.
> > */
> > - if (!ACCESS_ONCE(x->done))
> > + if (!READ_ONCE(x->done))
> > return 0;
> >
> from looking at compiler.h I don't think that there would be a difference
> between ACCESS_ONCE() and READ_ONCE() in this case

Yes, this is an unrelated "while at it" cosmetic change; now that we have
READ_ONCE() it makes more sense in this case.

Oleg.

2015-02-12 20:01:22

by Oleg Nesterov

Subject: [PATCH] sched/completion: completion_done() should serialize with complete()

Commit de30ec47302c "Remove unnecessary ->wait.lock serialization when
reading completion state" was not correct, without lock/unlock the code
like stop_machine_from_inactive_cpu()

while (!completion_done())
cpu_relax();

can return before complete() finishes its spin_unlock() which writes to
this memory. And spin_unlock_wait().

While at it, change try_wait_for_completion() to use READ_ONCE().

Reported-by: "Paul E. McKenney" <[email protected]>
Reported-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>

--- x/kernel/sched/completion.c
+++ x/kernel/sched/completion.c
@@ -274,7 +274,7 @@ bool try_wait_for_completion(struct comp
* first without taking the lock so we can
* return early in the blocking case.
*/
- if (!ACCESS_ONCE(x->done))
+ if (!READ_ONCE(x->done))
return 0;

spin_lock_irqsave(&x->wait.lock, flags);
@@ -297,6 +297,11 @@ EXPORT_SYMBOL(try_wait_for_completion);
*/
bool completion_done(struct completion *x)
{
- return !!ACCESS_ONCE(x->done);
+ if (!READ_ONCE(x->done))
+ return false;
+
+ smp_rmb();
+ spin_unlock_wait(&x->wait.lock);
+ return true;
}
EXPORT_SYMBOL(completion_done);

2015-02-12 19:59:35

by Davidlohr Bueso

Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9

On Thu, 2015-02-12 at 20:10 +0100, Nicholas Mc Guire wrote:
> On Thu, 12 Feb 2015, Oleg Nesterov wrote:
>
> > On 02/12, Oleg Nesterov wrote:
> > > On 02/11, Davidlohr Bueso wrote:
> > > >
> > > > On Wed, 2015-02-11 at 16:34 -0800, Paul E. McKenney wrote:
> > > > > Hello!
> > > > >
> > > > > Did an earlier-than-usual port of v3.21 patches to post-v3.19, and
> > > > > hit the following on x86_64. This happened after about 15 minutes of
> > > > > rcutorture. In contrast, I have been doing successful 15-hour runs
> > > > > on v3.19. I will check reproducibility and try to narrow it down.
> > > > > Might this be a duplicate of the bug that Raghavendra posted a fix for?
> > > > >
> > > > > Anyway, this was on 3e8c04eb1174 (Merge branch 'for-3.20' of
> > > > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata).
> > > > >
> > > > > [ 837.287011] BUG: spinlock bad magic on CPU#0, migration/0/9
> > > > > [ 837.287013] lock: 0xffff88001ea0fe80, .magic: ffffffff, .owner: g?<81>????/0, .owner_cpu: -42
> > > > > [ 837.287013] CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0+ #1
> > > > > [ 837.287013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > > > [ 837.287013] ffff88001ea0fe80 ffff88001ea0bc78 ffffffff818f6f4b ffffffff810a5a51
> > > > > [ 837.287013] ffffffff81e500e0 ffff88001ea0bc98 ffffffff818f3755 ffff88001ea0fe80
> > > > > [ 837.287013] ffffffff81ca4396 ffff88001ea0bcb8 ffffffff818f377b ffff88001ea0fe80
> > > > > [ 837.287013] Call Trace:
> > > > > [ 837.287013] [<ffffffff818f6f4b>] dump_stack+0x45/0x57
> > > > > [ 837.287013] [<ffffffff810a5a51>] ? console_unlock+0x1f1/0x4c0
> > > > > [ 837.287013] [<ffffffff818f3755>] spin_dump+0x8b/0x90
> > > > > [ 837.287013] [<ffffffff818f377b>] spin_bug+0x21/0x26
> > > > > [ 837.287013] [<ffffffff8109923c>] do_raw_spin_unlock+0x5c/0xa0
> > > > > [ 837.287013] [<ffffffff81902587>] _raw_spin_unlock_irqrestore+0x27/0x50
> > > > > [ 837.287013] [<ffffffff8108f0a1>] complete+0x41/0x50
> > > >
> > > > We did have some recent changes in completions:
> > > >
> > > > 7c34e318 (sched/completion: Add lock-free checking of the blocking case)
> > > > de30ec47 (sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state)
> > > >
> > > > The second one being more related (although both appear to make sense).
> > > > Perhaps some subtle implication in the completion_done side that
> > > > disappeared with the spinlock?
> > >
> > > At first glance both changes look suspicious.
> >
> > No, sorry, only the 2nd one.
> >
> > > Unless at least document how
> > > you can use these helpers.
> > >
> > > Consider this code:
> > >
> > > void xxx(void)
> > > {
> > > struct completion c;
> > >
> > > init_completion(&c);
> > >
> > > expose_this_completion(&c);
> > >
> > > while (!completion_done(&c)
> > > schedule_timeout_uninterruptible(1);
>
> But that would not break due to the change - even if completion_done() had a
> problem - complete_done() is not consuming x->done it is only checking it?

There is for instance contention to consider. When this happens:

CPU0                                    CPU1

complete(x)
  spin_lock_irqsave()
                                        completion_done(x)
                                          load x->done;
  x->done++;
  wakeup

Before the change in completion_done() we would respect the order of the
completion checks with the spinlock ticket, so you wouldn't run into
this issue. The impact would obviously depend on how the waiting for
completion is done, for instance issuing more sleeps when the lock
becomes very contended.

2015-02-12 21:29:44

by Oleg Nesterov

Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9

Nicholas, sorry, I sent the patch but forgot to CC you.
See https://lkml.org/lkml/2015/2/12/587

And please note that "completion" was specially designed to guarantee
that complete() can't play with this memory after wait_for_completion/etc
returns.

On 02/12, Oleg Nesterov wrote:
>
> On 02/12, Nicholas Mc Guire wrote:
> >
> > On Thu, 12 Feb 2015, Oleg Nesterov wrote:
> >
> > > No, sorry, only the 2nd one.
> > >
> > > > Unless at least document how
> > > > you can use these helpers.
> > > >
> > > > Consider this code:
> > > >
> > > > void xxx(void)
> > > > {
> > > > struct completion c;
> > > >
> > > > init_completion(&c);
> > > >
> > > > expose_this_completion(&c);
> > > >
> > > > while (!completion_done(&c)
> > > > schedule_timeout_uninterruptible(1);
> >
> > But that would not break due to the change - even if completion_done() had a
> > problem - complete_done() is not consuming x->done it is only checking it?
>
> Nicholas, looks like you didn't read the text below:
>
> > > > Before that change this code was correct, now it is not. Hmm and note that
> > > > this is what stop_machine_from_inactive_cpu() does although I do not know
> > > > if this is related or not.
> > > >
> > > > Because completion_done() can now race with complete(), the final
> > > > spin_unlock() can write to the memory after it was freed/reused. In this
> > > > case it can write to the stack after return.
>
> Or I misunderstood you.
>
> > > bool completion_done(struct completion *x)
> > > {
> > > - return !!ACCESS_ONCE(x->done);
> > > + if (!READ_ONCE(x->done))
> > > + return false;
> > > +
> > > + smp_rmb();
> > > + spin_unlock_wait(&x->wait.lock);
> > > + return true;
> >
> > what would be the sense of the spin_unlock_wait here ?
> > all you are interested in here is the state of x->done
>
> No. Please see above.
>
> > regarding the smp_rmb() where would the counterpart to that be ?
>
> to avoid the reordering, we should not read ->wait.lock before ->done.
>
> Oleg.

2015-02-13 18:18:00

by Nicholas Mc Guire

Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9

On Thu, 12 Feb 2015, Oleg Nesterov wrote:

> Nicholas, sorry, I sent the patch but forgot to CC you.
> See https://lkml.org/lkml/2015/2/12/587
>
> And please note that "completion" was specially designed to guarantee
> that complete() can't play with this memory after wait_for_completion/etc
> returns.
>

hmmm.... I guess that "falling out of context" can happen in a number of cases
with completion - any of the timeout/interruptible variants e.g:

void xxx(void)
{
        struct completion c;

        init_completion(&c);

        expose_this_completion(&c);

        wait_for_completion_timeout(&c, A_FEW_JIFFIES);
}

and if the other side did not call complete() within A_FEW_JIFFIES then
it would result in the same failure - I don't think the API can prevent
this type of bug. It has to be ensured by additional locking, e.g.
drivers/misc/tifm_7xx1.c:tifm_7xx1_resume() resolves this issue by resetting
the completion pointer to NULL and testing for !NULL before calling complete(),
with appropriate locking protecting the access.
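
Schematically, that pattern looks something like this (the names - xxx_dev,
->finish_me, xxx_wait_for_event(), xxx_event() - are made up for illustration;
this is not the actual driver code):

struct xxx_dev {
        spinlock_t lock;
        struct completion *finish_me;   /* NULL: nobody is waiting */
};

static void xxx_wait_for_event(struct xxx_dev *dev)
{
        struct completion done;

        init_completion(&done);

        spin_lock_irq(&dev->lock);
        dev->finish_me = &done;         /* expose the on-stack completion */
        spin_unlock_irq(&dev->lock);

        wait_for_completion_timeout(&done, A_FEW_JIFFIES);

        spin_lock_irq(&dev->lock);
        dev->finish_me = NULL;          /* after this, nobody calls complete(&done) */
        spin_unlock_irq(&dev->lock);
}                                       /* &done can now safely go out of scope */

static void xxx_event(struct xxx_dev *dev)      /* e.g. from the interrupt handler */
{
        unsigned long flags;

        spin_lock_irqsave(&dev->lock, flags);
        if (dev->finish_me)             /* only while the waiter is still around */
                complete(dev->finish_me);
        spin_unlock_irqrestore(&dev->lock, flags);
}

Either complete() runs while the waiter still holds the stack frame (the waiter
cannot return before it re-acquires dev->lock), or the event side sees NULL and
does nothing.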

Nevertheless, of course the proposed change in completion_done() was a bug -
many thanks for catching that so quickly!

> On 02/12, Oleg Nesterov wrote:
> >
> > On 02/12, Nicholas Mc Guire wrote:
> > >
> > > On Thu, 12 Feb 2015, Oleg Nesterov wrote:
> > >
> > > > No, sorry, only the 2nd one.
> > > >
> > > > > Unless at least document how
> > > > > you can use these helpers.
> > > > >
> > > > > Consider this code:
> > > > >
> > > > > void xxx(void)
> > > > > {
> > > > > struct completion c;
> > > > >
> > > > > init_completion(&c);
> > > > >
> > > > > expose_this_completion(&c);
> > > > >
> > > > > while (!completion_done(&c)
> > > > > schedule_timeout_uninterruptible(1);
> > >
> > > But that would not break due to the change - even if completion_done() had a
> > > problem - complete_done() is not consuming x->done it is only checking it?
> >
> > Nicholas, looks like you didn't read the text below:
> >
> > > > > Before that change this code was correct, now it is not. Hmm and note that
> > > > > this is what stop_machine_from_inactive_cpu() does although I do not know
> > > > > if this is related or not.
> > > > >
> > > > > Because completion_done() can now race with complete(), the final
> > > > > spin_unlock() can write to the memory after it was freed/reused. In this
> > > > > case it can write to the stack after return.
> >
> > Or I misunderstood you.
> >
> > > > bool completion_done(struct completion *x)
> > > > {
> > > > - return !!ACCESS_ONCE(x->done);
> > > > + if (!READ_ONCE(x->done))
> > > > + return false;
> > > > +
> > > > + smp_rmb();
> > > > + spin_unlock_wait(&x->wait.lock);
> > > > + return true;
> > >
> > > what would be the sense of the spin_unlock_wait here ?
> > > all you are interested in here is the state of x->done
> >
> > No. Please see above.
> >
> > > regarding the smp_rmb() where would the counterpart to that be ?
> >
> > to avoid the reordering, we should not read ->wait.lock before ->done.
> >
> > Oleg.
>

thx!
hofrat

2015-02-13 18:55:28

by Oleg Nesterov

Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9

On 02/13, Nicholas Mc Guire wrote:
>
> On Thu, 12 Feb 2015, Oleg Nesterov wrote:
>
> > Nicholas, sorry, I sent the patch but forgot to CC you.
> > See https://lkml.org/lkml/2015/2/12/587
> >
> > And please note that "completion" was specially designed to guarantee
> > that complete() can't play with this memory after wait_for_completion/etc
> > returns.
> >
>
> hmmm.... I guess that "falling out of context" can happen in a number of cases
> with completion - any of the timeout/interruptible variants e.g:
>
> void xxx(void)
> {
> struct completion c;
>
> init_completion(&c);
>
> expose_this_completion(&c);
>
> wait_for_completion_timeout(&c,A_FEW_JIFFIES);
> }
>
> and if the other side did not call complete() within A_FEW_JIFFIES then
> it would result in the same failure - I don't think the API can prevent
> this type of bug.

Yes sure, but in this case the user of wait_for_completion_timeout() should
blame itself, it is simply buggy.

> Tt has to be ensured by additional locking

Yes, but

> drivers/misc/tifm_7xx1.c:tifm_7xx1_resume() resolve this issue by resetting
> the completion to NULL and testing for !NULL before calling complete()
> with appropriate locking protection access.

I don't understand this code and I can easily be wrong, but at first glance it
doesn't need a completion at all, exactly because it relies on the additional
fm->lock. ->finish_me could be a "task_struct *", tifm_7xx1_resume() could
simply do schedule_timeout(), and tifm_7xx1_isr() could do wake_up_process().
Never mind, this is off-topic and most probably I misread this code.

> Never the less of course the proposed change in completion_done() was a bug -
> many thanks for catching that so quickly !

OK, perhaps you can ack the fix I sent?

Oleg.

2015-02-13 21:09:25

by Paul E. McKenney

Subject: Re: [PATCH] sched/completion: completion_done() should serialize with complete()

On Thu, Feb 12, 2015 at 08:59:13PM +0100, Oleg Nesterov wrote:
> Commit de30ec47302c "Remove unnecessary ->wait.lock serialization when
> reading completion state" was not correct, without lock/unlock the code
> like stop_machine_from_inactive_cpu()
>
> while (!completion_done())
> cpu_relax();
>
> can return before complete() finishes its spin_unlock() which writes to
> this memory. And spin_unlock_wait().
>
> While at it, change try_wait_for_completion() to use READ_ONCE().
>
> Reported-by: "Paul E. McKenney" <[email protected]>
> Reported-by: Davidlohr Bueso <[email protected]>
> Signed-off-by: Oleg Nesterov <[email protected]>

So I am having some difficulty reproducing the original problem, but
the patch passes rcutorture testing. So...

Tested-by: Paul E. McKenney <[email protected]>

> --- x/kernel/sched/completion.c
> +++ x/kernel/sched/completion.c
> @@ -274,7 +274,7 @@ bool try_wait_for_completion(struct comp
> * first without taking the lock so we can
> * return early in the blocking case.
> */
> - if (!ACCESS_ONCE(x->done))
> + if (!READ_ONCE(x->done))
> return 0;
>
> spin_lock_irqsave(&x->wait.lock, flags);
> @@ -297,6 +297,11 @@ EXPORT_SYMBOL(try_wait_for_completion);
> */
> bool completion_done(struct completion *x)
> {
> - return !!ACCESS_ONCE(x->done);
> + if (!READ_ONCE(x->done))
> + return false;
> +
> + smp_rmb();
> + spin_unlock_wait(&x->wait.lock);
> + return true;
> }
> EXPORT_SYMBOL(completion_done);
>

2015-02-13 21:56:36

by Davidlohr Bueso

Subject: Re: [PATCH] sched/completion: completion_done() should serialize with complete()

On Thu, 2015-02-12 at 20:59 +0100, Oleg Nesterov wrote:
> Commit de30ec47302c "Remove unnecessary ->wait.lock serialization when
> reading completion state" was not correct, without lock/unlock the code
> like stop_machine_from_inactive_cpu()
>
> while (!completion_done())
> cpu_relax();
>
> can return before complete() finishes its spin_unlock() which writes to
> this memory. And spin_unlock_wait().

How about reverting the patch altogether?

This was never a problem, nor have I ever seen a performance issue in
completions that would merit these lockless checks. The commit changelog
has *zero* information, so I don't know if this was ever a real issue.

Thanks,
Davidlohr

2015-02-13 22:03:06

by Davidlohr Bueso

Subject: Re: [PATCH] sched/completion: completion_done() should serialize with complete()

On Fri, 2015-02-13 at 13:56 -0800, Davidlohr Bueso wrote:
> On Thu, 2015-02-12 at 20:59 +0100, Oleg Nesterov wrote:
> > Commit de30ec47302c "Remove unnecessary ->wait.lock serialization when
> > reading completion state" was not correct, without lock/unlock the code
> > like stop_machine_from_inactive_cpu()
> >
> > while (!completion_done())
> > cpu_relax();
> >
> > can return before complete() finishes its spin_unlock() which writes to
> > this memory. And spin_unlock_wait().
>
> How about reverting the patch altogether?
>
> This was never a problem nor have I ever seen a performance issues in
> completions that would merit these lockless checks. The commit changelog
> has *zero* information, so I don't know if this was ever a real issue.
>

hmm I guess your patch is more optimal tho, because we don't update
the lock - less cacheline bouncing etc.

2015-02-14 08:35:59

by Nicholas Mc Guire

Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9

On Fri, 13 Feb 2015, Oleg Nesterov wrote:

> On 02/13, Nicholas Mc Guire wrote:
> >
> > On Thu, 12 Feb 2015, Oleg Nesterov wrote:
> >
> > > Nicholas, sorry, I sent the patch but forgot to CC you.
> > > See https://lkml.org/lkml/2015/2/12/587
> > >
> > > And please note that "completion" was specially designed to guarantee
> > > that complete() can't play with this memory after wait_for_completion/etc
> > > returns.
> > >
> >
> > hmmm.... I guess that "falling out of context" can happen in a number of cases
> > with completion - any of the timeout/interruptible variants e.g:
> >
> > void xxx(void)
> > {
> > struct completion c;
> >
> > init_completion(&c);
> >
> > expose_this_completion(&c);
> >
> > wait_for_completion_timeout(&c,A_FEW_JIFFIES);
> > }
> >
> > and if the other side did not call complete() within A_FEW_JIFFIES then
> > it would result in the same failure - I don't think the API can prevent
> > this type of bug.
>
> Yes sure, but in this case the user of wait_for_completion_timeout() should
> blame itself, it is simply buggy.
>
> > Tt has to be ensured by additional locking
>
> Yes, but
>
> > drivers/misc/tifm_7xx1.c:tifm_7xx1_resume() resolve this issue by resetting
> > the completion to NULL and testing for !NULL before calling complete()
> > with appropriate locking protection access.
>
> I don't understand this code, I can be easily wrong. but at first glance it
> doesn't need completion at all. Exactly because it relies on the additional
> fm->lock. ->finish_me could be "task_struct *", the tifm_7xx1_resume() could
> simply do schedule_timeout(), tifm_7xx1_isr() could do wake_up_process().
> Nevermind, this is off-topic and most probably I misread this code.
>

this is unfortunately true for a few other places as well - so the problem of
going out of scope with the _timeout/interruptible variants is quite general
and there is no clean solution. You are right that it's the user's code that is
buggy if the struct completion drops out of context - I was just wondering
whether a reasonable extension of the completion API could eliminate that need
to mess with locks, by adding a cancellation mechanism that resolves this at
the API level.

Basically, if you call wait_for_completion_timeout() and the timeout condition
occurs, you always need some way of notifying the completing end that it
should no longer call complete()/complete_all().

> > Never the less of course the proposed change in completion_done() was a bug -
> > many thanks for catching that so quickly !
>
> OK, perhaps you can ack the fix I sent?
>

the only question I still have is that there would be no matching
smp_wmb() for the smp_rmb() you are using (at least I did not figure out where).


thx!
hofrat

2015-02-14 14:02:56

by Oleg Nesterov

Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9

On 02/14, Nicholas Mc Guire wrote:
>
> Basically if you call wait_for_completion_timeout and the timeout condition
> occures you always need some way of notifying the completing end that it
> should no longer call complete()/complete_all().

Sure. "struct completion" doesn't differ from any other object when it comes
to use-after-free.

> > OK, perhaps you can ack the fix I sent?
>
> the only question I still have is that there would be no matching
> smp_wmb() to the smp_rmb() you are using (atleast I did not figure out where).

You seem to assume that every rmb() must be paired with wmb(). This is not
always true.

But as for completion_done(), its rmb() pairs with "release" semantics of
complete()->spin_unlock(), which is a "one way" barrier.
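
Roughly, what the rmb orders on the reader side (an illustration only, not a
patch):

completion_done(x):
        LOAD x->done                    (1)
        smp_rmb();
        LOAD/spin on x->wait.lock       (2)

If (1) observes the x->done++ done by complete() inside the critical section,
then (2), ordered after it, cannot observe ->wait.lock as it was before
complete() took it: either the lock is still held and spin_unlock_wait() waits
for the RELEASE, or it has already been released. Without the smp_rmb() the two
loads could be reordered, and completion_done() could return true while
complete()'s final spin_unlock() is still pending.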

Oleg.

2015-02-16 08:21:25

by Peter Zijlstra

Subject: Re: [PATCH] sched/completion: completion_done() should serialize with complete()

On Thu, Feb 12, 2015 at 08:59:13PM +0100, Oleg Nesterov wrote:
> Commit de30ec47302c "Remove unnecessary ->wait.lock serialization when
> reading completion state" was not correct, without lock/unlock the code
> like stop_machine_from_inactive_cpu()
>
> while (!completion_done())
> cpu_relax();
>
> can return before complete() finishes its spin_unlock() which writes to
> this memory. And spin_unlock_wait().
>
> While at it, change try_wait_for_completion() to use READ_ONCE().

So I share Davidlohr's concern if we should not simply revert that
change; but given we've now gone over it in detail I suppose we should just
keep the optimized version.

I did add a comment to your patch; and queued the below for
sched/urgent.

---
Subject: sched/completion: completion_done() should serialize with complete()
From: Oleg Nesterov <[email protected]>
Date: Thu, 12 Feb 2015 20:59:13 +0100

Commit de30ec47302c "Remove unnecessary ->wait.lock serialization when
reading completion state" was not correct, without lock/unlock the code
like stop_machine_from_inactive_cpu()

while (!completion_done())
cpu_relax();

can return before complete() finishes its spin_unlock() which writes to
this memory. And spin_unlock_wait().

While at it, change try_wait_for_completion() to use READ_ONCE().

Fixes: de30ec47302c ("sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state")
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Nicholas Mc Guire <[email protected]>
Cc: Linus Torvalds <[email protected]>
Reported-by: "Paul E. McKenney" <[email protected]>
Tested-by: "Paul E. McKenney" <[email protected]>
Reported-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
[peterz: Add a comment with the barrier]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
---
kernel/sched/completion.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -274,7 +274,7 @@ bool try_wait_for_completion(struct comp
* first without taking the lock so we can
* return early in the blocking case.
*/
- if (!ACCESS_ONCE(x->done))
+ if (!READ_ONCE(x->done))
return 0;

spin_lock_irqsave(&x->wait.lock, flags);
@@ -297,6 +297,21 @@ EXPORT_SYMBOL(try_wait_for_completion);
*/
bool completion_done(struct completion *x)
{
- return !!ACCESS_ONCE(x->done);
+ if (!READ_ONCE(x->done))
+ return false;
+
+ /*
+ * If ->done, we need to wait for complete() to release ->wait.lock
+ * otherwise we can end up freeing the completion before complete()
+ * is done referencing it.
+ *
+ * The RMB pairs with complete()'s RELEASE of ->wait.lock and orders
+ * the loads of ->done and ->wait.lock such that we cannot observe
+ * the lock before complete() acquires it while observing the ->done
+ * after it's acquired the lock.
+ */
+ smp_rmb();
+ spin_unlock_wait(&x->wait.lock);
+ return true;
}
EXPORT_SYMBOL(completion_done);

2015-02-16 16:54:28

by Oleg Nesterov

Subject: Re: [PATCH] sched/completion: completion_done() should serialize with complete()

On 02/16, Peter Zijlstra wrote:
>
> On Thu, Feb 12, 2015 at 08:59:13PM +0100, Oleg Nesterov wrote:
> > Commit de30ec47302c "Remove unnecessary ->wait.lock serialization when
> > reading completion state" was not correct, without lock/unlock the code
> > like stop_machine_from_inactive_cpu()
> >
> > while (!completion_done())
> > cpu_relax();
> >
> > can return before complete() finishes its spin_unlock() which writes to
> > this memory. And spin_unlock_wait().
> >
> > While at it, change try_wait_for_completion() to use READ_ONCE().
>
> So I share Davidlohrs concern

Ah. I forgot to reply to Davidlohr's email. Sorry.

> if we should not simply revert that
> change; but given we've now gone over it detail I suppose we should just
> keep the optimized version.

Yes, I was going to say that of course I won't argue if we simply revert
that commit. As he rightly pointed out, the lockless check doesn't make sense
performance-wise.

However, this code needs a comment to explain why we can't simply check
->done and return; unlock_wait() is more documentation than optimization.

But,

> I did add a comment to your patch; and queued the below for
> sched/urgent.

Thanks!

Now this logic is actually documented ;) unlock_wait() alone could confuse
the reader too.

Oleg.

Subject: [tip:sched/core] sched/completion: Serialize completion_done() with complete()

Commit-ID: bc9560155f4063bbc9be71bd69d6726d41b47653
Gitweb: http://git.kernel.org/tip/bc9560155f4063bbc9be71bd69d6726d41b47653
Author: Oleg Nesterov <[email protected]>
AuthorDate: Thu, 12 Feb 2015 20:59:13 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 18 Feb 2015 14:27:40 +0100

sched/completion: Serialize completion_done() with complete()

Commit de30ec47302c "Remove unnecessary ->wait.lock serialization when
reading completion state" was not correct, without lock/unlock the code
like stop_machine_from_inactive_cpu()

while (!completion_done())
cpu_relax();

can return before complete() finishes its spin_unlock() which writes to
this memory. And spin_unlock_wait().

While at it, change try_wait_for_completion() to use READ_ONCE().

Reported-by: Paul E. McKenney <[email protected]>
Reported-by: Davidlohr Bueso <[email protected]>
Tested-by: Paul E. McKenney <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
[ Added a comment with the barrier. ]
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Mc Guire <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: de30ec47302c ("sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/completion.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index 7052d3f..8d0f35d 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -274,7 +274,7 @@ bool try_wait_for_completion(struct completion *x)
* first without taking the lock so we can
* return early in the blocking case.
*/
- if (!ACCESS_ONCE(x->done))
+ if (!READ_ONCE(x->done))
return 0;

spin_lock_irqsave(&x->wait.lock, flags);
@@ -297,6 +297,21 @@ EXPORT_SYMBOL(try_wait_for_completion);
*/
bool completion_done(struct completion *x)
{
- return !!ACCESS_ONCE(x->done);
+ if (!READ_ONCE(x->done))
+ return false;
+
+ /*
+ * If ->done, we need to wait for complete() to release ->wait.lock
+ * otherwise we can end up freeing the completion before complete()
+ * is done referencing it.
+ *
+ * The RMB pairs with complete()'s RELEASE of ->wait.lock and orders
+ * the loads of ->done and ->wait.lock such that we cannot observe
+ * the lock before complete() acquires it while observing the ->done
+ * after it's acquired the lock.
+ */
+ smp_rmb();
+ spin_unlock_wait(&x->wait.lock);
+ return true;
}
EXPORT_SYMBOL(completion_done);