2014-12-09 12:23:39

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v3] CPU hotplug: active_writer not woken up in some cases - deadlock

Commit b2c4623dcd07 ("rcu: More on deadlock between CPU hotplug and expedited
grace periods") introduced another problem that can easily be reproduced by
starting/stopping cpus in a loop.

E.g.:
for i in `seq 5000`; do
echo 1 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu1/online
done

Will result in:
INFO: task /cpu_start_stop:1 blocked for more than 120 seconds.
Call Trace:
([<00000000006a028e>] __schedule+0x406/0x91c)
[<0000000000130f60>] cpu_hotplug_begin+0xd0/0xd4
[<0000000000130ff6>] _cpu_up+0x3e/0x1c4
[<0000000000131232>] cpu_up+0xb6/0xd4
[<00000000004a5720>] device_online+0x80/0xc0
[<00000000004a57f0>] online_store+0x90/0xb0
...

And a deadlock.

Problem is that if the last ref in put_online_cpus() can't get the
cpu_hotplug.lock the puts_pending count is incremented, but a sleeping
active_writer might never be woken up, therefore never exiting the loop in
cpu_hotplug_begin().

This fix wakes up the active_writer proactively. The writer already goes back to
sleep if the ref count isn't already down to 0, so this should be fine.

In order to avoid many potential races, we have to:
- Protect current_writer by a spin lock. When holding this lock we can be sure
that the writer won't vainsh or change. (use-after-free)
- Increment the cpu_hotplug.puts_pending count before we test for an
active_writer. (otherwise a wakeup might get lost)
- Move setting of TASK_UNINTERRUPTIBLE in cpu_hotplug_begin() above the
condition check. (otherwise a wakeup might get lost)

Can't reproduce it with this fix.

Signed-off-by: David Hildenbrand <[email protected]>
---
kernel/cpu.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 90a3d01..7489b7a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -58,6 +58,7 @@ static int cpu_hotplug_disabled;

static struct {
struct task_struct *active_writer;
+ spinlock_t awr_lock; /* protects active_writer from being changed */
struct mutex lock; /* Synchronizes accesses to refcount, */
/*
* Also blocks the new readers during
@@ -72,6 +73,7 @@ static struct {
#endif
} cpu_hotplug = {
.active_writer = NULL,
+ .awr_lock = __SPIN_LOCK_UNLOCKED(cpu_hotplug.awr_lock),
.lock = __MUTEX_INITIALIZER(cpu_hotplug.lock),
.refcount = 0,
#ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -116,7 +118,13 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
if (!mutex_trylock(&cpu_hotplug.lock)) {
+ /* inc before testing for active_writer to not lose wake ups */
atomic_inc(&cpu_hotplug.puts_pending);
+ spin_lock(&cpu_hotplug.awr_lock);
+ /* we might be the last one */
+ if (unlikely(cpu_hotplug.active_writer))
+ wake_up_process(cpu_hotplug.active_writer);
+ spin_unlock(&cpu_hotplug.awr_lock);
cpuhp_lock_release();
return;
}
@@ -156,20 +164,24 @@ EXPORT_SYMBOL_GPL(put_online_cpus);
*/
void cpu_hotplug_begin(void)
{
+ spin_lock(&cpu_hotplug.awr_lock);
cpu_hotplug.active_writer = current;
+ spin_unlock(&cpu_hotplug.awr_lock);

cpuhp_lock_acquire();
for (;;) {
mutex_lock(&cpu_hotplug.lock);
+ __set_current_state(TASK_UNINTERRUPTIBLE);
if (atomic_read(&cpu_hotplug.puts_pending)) {
int delta;

delta = atomic_xchg(&cpu_hotplug.puts_pending, 0);
cpu_hotplug.refcount -= delta;
}
- if (likely(!cpu_hotplug.refcount))
+ if (likely(!cpu_hotplug.refcount)) {
+ __set_current_state(TASK_RUNNING);
break;
- __set_current_state(TASK_UNINTERRUPTIBLE);
+ }
mutex_unlock(&cpu_hotplug.lock);
schedule();
}
@@ -177,7 +189,9 @@ void cpu_hotplug_begin(void)

void cpu_hotplug_done(void)
{
+ spin_lock(&cpu_hotplug.awr_lock);
cpu_hotplug.active_writer = NULL;
+ spin_unlock(&cpu_hotplug.awr_lock);
mutex_unlock(&cpu_hotplug.lock);
cpuhp_lock_release();
}
--
1.8.5.5


2014-12-09 14:19:15

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v3] CPU hotplug: active_writer not woken up in some cases - deadlock

On Tue, Dec 09, 2014 at 01:23:31PM +0100, David Hildenbrand wrote:
> Commit b2c4623dcd07 ("rcu: More on deadlock between CPU hotplug and expedited
> grace periods") introduced another problem that can easily be reproduced by
> starting/stopping cpus in a loop.
>
> E.g.:
> for i in `seq 5000`; do
> echo 1 > /sys/devices/system/cpu/cpu1/online
> echo 0 > /sys/devices/system/cpu/cpu1/online
> done
>
> Will result in:
> INFO: task /cpu_start_stop:1 blocked for more than 120 seconds.
> Call Trace:
> ([<00000000006a028e>] __schedule+0x406/0x91c)
> [<0000000000130f60>] cpu_hotplug_begin+0xd0/0xd4
> [<0000000000130ff6>] _cpu_up+0x3e/0x1c4
> [<0000000000131232>] cpu_up+0xb6/0xd4
> [<00000000004a5720>] device_online+0x80/0xc0
> [<00000000004a57f0>] online_store+0x90/0xb0
> ...
>
> And a deadlock.
>
> Problem is that if the last ref in put_online_cpus() can't get the
> cpu_hotplug.lock the puts_pending count is incremented, but a sleeping
> active_writer might never be woken up, therefore never exiting the loop in
> cpu_hotplug_begin().
>
> This fix wakes up the active_writer proactively. The writer already goes back to
> sleep if the ref count isn't already down to 0, so this should be fine.
>
> In order to avoid many potential races, we have to:
> - Protect current_writer by a spin lock. When holding this lock we can be sure
> that the writer won't vainsh or change. (use-after-free)
> - Increment the cpu_hotplug.puts_pending count before we test for an
> active_writer. (otherwise a wakeup might get lost)
> - Move setting of TASK_UNINTERRUPTIBLE in cpu_hotplug_begin() above the
> condition check. (otherwise a wakeup might get lost)
>
> Can't reproduce it with this fix.

Would wait_event()/wake_up() work for the wakeup-writer case?

Thanx, Paul

> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> kernel/cpu.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 90a3d01..7489b7a 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -58,6 +58,7 @@ static int cpu_hotplug_disabled;
>
> static struct {
> struct task_struct *active_writer;
> + spinlock_t awr_lock; /* protects active_writer from being changed */
> struct mutex lock; /* Synchronizes accesses to refcount, */
> /*
> * Also blocks the new readers during
> @@ -72,6 +73,7 @@ static struct {
> #endif
> } cpu_hotplug = {
> .active_writer = NULL,
> + .awr_lock = __SPIN_LOCK_UNLOCKED(cpu_hotplug.awr_lock),
> .lock = __MUTEX_INITIALIZER(cpu_hotplug.lock),
> .refcount = 0,
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
> @@ -116,7 +118,13 @@ void put_online_cpus(void)
> if (cpu_hotplug.active_writer == current)
> return;
> if (!mutex_trylock(&cpu_hotplug.lock)) {
> + /* inc before testing for active_writer to not lose wake ups */
> atomic_inc(&cpu_hotplug.puts_pending);
> + spin_lock(&cpu_hotplug.awr_lock);
> + /* we might be the last one */
> + if (unlikely(cpu_hotplug.active_writer))
> + wake_up_process(cpu_hotplug.active_writer);
> + spin_unlock(&cpu_hotplug.awr_lock);
> cpuhp_lock_release();
> return;
> }
> @@ -156,20 +164,24 @@ EXPORT_SYMBOL_GPL(put_online_cpus);
> */
> void cpu_hotplug_begin(void)
> {
> + spin_lock(&cpu_hotplug.awr_lock);
> cpu_hotplug.active_writer = current;
> + spin_unlock(&cpu_hotplug.awr_lock);
>
> cpuhp_lock_acquire();
> for (;;) {
> mutex_lock(&cpu_hotplug.lock);
> + __set_current_state(TASK_UNINTERRUPTIBLE);
> if (atomic_read(&cpu_hotplug.puts_pending)) {
> int delta;
>
> delta = atomic_xchg(&cpu_hotplug.puts_pending, 0);
> cpu_hotplug.refcount -= delta;
> }
> - if (likely(!cpu_hotplug.refcount))
> + if (likely(!cpu_hotplug.refcount)) {
> + __set_current_state(TASK_RUNNING);
> break;
> - __set_current_state(TASK_UNINTERRUPTIBLE);
> + }
> mutex_unlock(&cpu_hotplug.lock);
> schedule();
> }
> @@ -177,7 +189,9 @@ void cpu_hotplug_begin(void)
>
> void cpu_hotplug_done(void)
> {
> + spin_lock(&cpu_hotplug.awr_lock);
> cpu_hotplug.active_writer = NULL;
> + spin_unlock(&cpu_hotplug.awr_lock);
> mutex_unlock(&cpu_hotplug.lock);
> cpuhp_lock_release();
> }
> --
> 1.8.5.5
>

2014-12-09 16:24:39

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3] CPU hotplug: active_writer not woken up in some cases - deadlock

> On Tue, Dec 09, 2014 at 01:23:31PM +0100, David Hildenbrand wrote:
> > Commit b2c4623dcd07 ("rcu: More on deadlock between CPU hotplug and expedited
> > grace periods") introduced another problem that can easily be reproduced by
> > starting/stopping cpus in a loop.
> >
> > E.g.:
> > for i in `seq 5000`; do
> > echo 1 > /sys/devices/system/cpu/cpu1/online
> > echo 0 > /sys/devices/system/cpu/cpu1/online
> > done
> >
> > Will result in:
> > INFO: task /cpu_start_stop:1 blocked for more than 120 seconds.
> > Call Trace:
> > ([<00000000006a028e>] __schedule+0x406/0x91c)
> > [<0000000000130f60>] cpu_hotplug_begin+0xd0/0xd4
> > [<0000000000130ff6>] _cpu_up+0x3e/0x1c4
> > [<0000000000131232>] cpu_up+0xb6/0xd4
> > [<00000000004a5720>] device_online+0x80/0xc0
> > [<00000000004a57f0>] online_store+0x90/0xb0
> > ...
> >
> > And a deadlock.
> >
> > Problem is that if the last ref in put_online_cpus() can't get the
> > cpu_hotplug.lock the puts_pending count is incremented, but a sleeping
> > active_writer might never be woken up, therefore never exiting the loop in
> > cpu_hotplug_begin().
> >
> > This fix wakes up the active_writer proactively. The writer already goes back to
> > sleep if the ref count isn't already down to 0, so this should be fine.
> >
> > In order to avoid many potential races, we have to:
> > - Protect current_writer by a spin lock. When holding this lock we can be sure
> > that the writer won't vainsh or change. (use-after-free)
> > - Increment the cpu_hotplug.puts_pending count before we test for an
> > active_writer. (otherwise a wakeup might get lost)
> > - Move setting of TASK_UNINTERRUPTIBLE in cpu_hotplug_begin() above the
> > condition check. (otherwise a wakeup might get lost)
> >
> > Can't reproduce it with this fix.
>
> Would wait_event()/wake_up() work for the wakeup-writer case?
>
> Thanx, Paul
>

Thanks! Was also thinking about wait queues. Will investigate if that can help
to beautify this :)

David

2014-12-09 17:12:21

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v3] CPU hotplug: active_writer not woken up in some cases - deadlock

On Tue, Dec 09, 2014 at 05:24:27PM +0100, David Hildenbrand wrote:
> > On Tue, Dec 09, 2014 at 01:23:31PM +0100, David Hildenbrand wrote:
> > > Commit b2c4623dcd07 ("rcu: More on deadlock between CPU hotplug and expedited
> > > grace periods") introduced another problem that can easily be reproduced by
> > > starting/stopping cpus in a loop.
> > >
> > > E.g.:
> > > for i in `seq 5000`; do
> > > echo 1 > /sys/devices/system/cpu/cpu1/online
> > > echo 0 > /sys/devices/system/cpu/cpu1/online
> > > done
> > >
> > > Will result in:
> > > INFO: task /cpu_start_stop:1 blocked for more than 120 seconds.
> > > Call Trace:
> > > ([<00000000006a028e>] __schedule+0x406/0x91c)
> > > [<0000000000130f60>] cpu_hotplug_begin+0xd0/0xd4
> > > [<0000000000130ff6>] _cpu_up+0x3e/0x1c4
> > > [<0000000000131232>] cpu_up+0xb6/0xd4
> > > [<00000000004a5720>] device_online+0x80/0xc0
> > > [<00000000004a57f0>] online_store+0x90/0xb0
> > > ...
> > >
> > > And a deadlock.
> > >
> > > Problem is that if the last ref in put_online_cpus() can't get the
> > > cpu_hotplug.lock the puts_pending count is incremented, but a sleeping
> > > active_writer might never be woken up, therefore never exiting the loop in
> > > cpu_hotplug_begin().
> > >
> > > This fix wakes up the active_writer proactively. The writer already goes back to
> > > sleep if the ref count isn't already down to 0, so this should be fine.
> > >
> > > In order to avoid many potential races, we have to:
> > > - Protect current_writer by a spin lock. When holding this lock we can be sure
> > > that the writer won't vainsh or change. (use-after-free)
> > > - Increment the cpu_hotplug.puts_pending count before we test for an
> > > active_writer. (otherwise a wakeup might get lost)
> > > - Move setting of TASK_UNINTERRUPTIBLE in cpu_hotplug_begin() above the
> > > condition check. (otherwise a wakeup might get lost)
> > >
> > > Can't reproduce it with this fix.
> >
> > Would wait_event()/wake_up() work for the wakeup-writer case?
>
> Thanks! Was also thinking about wait queues. Will investigate if that can help
> to beautify this :)

Looking forward to seeing it. ;-)

Thanx, Paul

2014-12-10 00:13:32

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v3] CPU hotplug: active_writer not woken up in some cases - deadlock

(sorry if this was already discussed, I ignored most of my emails
I got this week)

On 12/09, David Hildenbrand wrote:
>
> @@ -116,7 +118,13 @@ void put_online_cpus(void)
> if (cpu_hotplug.active_writer == current)
> return;
> if (!mutex_trylock(&cpu_hotplug.lock)) {
> + /* inc before testing for active_writer to not lose wake ups */
> atomic_inc(&cpu_hotplug.puts_pending);
> + spin_lock(&cpu_hotplug.awr_lock);
> + /* we might be the last one */
> + if (unlikely(cpu_hotplug.active_writer))
> + wake_up_process(cpu_hotplug.active_writer);
> + spin_unlock(&cpu_hotplug.awr_lock);

Not sure I understand. awr_lock can only ensure that active_writer
can't go away.

Why active_writer should see .puts_pending != 0 if this is called
right after cpu_hotplug_begin() takes cpu_hotplug.lock but before
it sets TASK_UNINTERRUPTIBLE?

IOW,

> void cpu_hotplug_begin(void)
> {
> + spin_lock(&cpu_hotplug.awr_lock);
> cpu_hotplug.active_writer = current;
> + spin_unlock(&cpu_hotplug.awr_lock);
>
> cpuhp_lock_acquire();
> for (;;) {
> mutex_lock(&cpu_hotplug.lock);
> + __set_current_state(TASK_UNINTERRUPTIBLE);

don't we need set_current_state() here ?

Oleg.

2014-12-10 01:20:23

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v3] CPU hotplug: active_writer not woken up in some cases - deadlock

On 12/09, Paul E. McKenney wrote:
>
> Would wait_event()/wake_up() work for the wakeup-writer case?

Yes, and in this case we could probably kill this puts_pending logic
and avoid cpu_hotplug.lock in put_online_cpus() altogether? Can't we
just make cpu_hotplug.refcount atomic_t?

Anyway, this makes me think again that this code should use percpu_rwsem.
Perhaps I'll try to make a patch next week...

(we need down_write_recursive_readers(), and probably rcusync patches).

Oleg.

2014-12-10 03:18:19

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v3] CPU hotplug: active_writer not woken up in some cases - deadlock

On Wed, Dec 10, 2014 at 01:26:19AM +0100, Oleg Nesterov wrote:
> On 12/09, Paul E. McKenney wrote:
> >
> > Would wait_event()/wake_up() work for the wakeup-writer case?
>
> Yes, and in this case we could probably kill this puts_pending logic
> and avoid cpu_hotplug.lock in put_online_cpus() altogether? Can't we
> just make cpu_hotplug.refcount atomic_t?

Seems like that should be possible. That would certainly simplify the
wakeup logic from put_online_cpus(). It might even be possible to
avoid acquiring cpu_hotplug.lock in put_online_cpus(), though that
would of course require more luck than anyone deserves.

> Anyway, this makes me think again that this code should use percpu_rwsem.
> Perhaps I'll try to make a patch next week...
>
> (we need down_write_recursive_readers(), and probably rcusync patches).

Careful! You might end up re-introducing the deadlock that I used
puts_pending to get rid of. ;-)

Thanx, Paul

2014-12-10 07:56:36

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3] CPU hotplug: active_writer not woken up in some cases - deadlock

> (sorry if this was already discussed, I ignored most of my emails
> I got this week)
>
> On 12/09, David Hildenbrand wrote:
> >
> > @@ -116,7 +118,13 @@ void put_online_cpus(void)
> > if (cpu_hotplug.active_writer == current)
> > return;
> > if (!mutex_trylock(&cpu_hotplug.lock)) {
> > + /* inc before testing for active_writer to not lose wake ups */
> > atomic_inc(&cpu_hotplug.puts_pending);
> > + spin_lock(&cpu_hotplug.awr_lock);
> > + /* we might be the last one */
> > + if (unlikely(cpu_hotplug.active_writer))
> > + wake_up_process(cpu_hotplug.active_writer);
> > + spin_unlock(&cpu_hotplug.awr_lock);
>
> Not sure I understand. awr_lock can only ensure that active_writer
> can't go away.

This solution is not optimal but works without races ... I'll try to get
something with wait queues running and/or even change the way refcount is
accessed as suggested by you.

And yes, awr_lock will only ensure that active_writer won't go away.

>
> Why active_writer should see .puts_pending != 0 if this is called
> right after cpu_hotplug_begin() takes cpu_hotplug.lock but before
> it sets TASK_UNINTERRUPTIBLE?

get_online_cpus() increased the refcount.
put_online_cpus() will increment puts_pending and trigger a wake up (if the
lock is alread taken - might be by cpu_hotplug_begin() or by some other
get_online_cpus()).

So refcount == 1, puts_pending == 1

cpu_hotplug_begin() gets the lock and sees refcount == 1 and puts_pending == 0
or puts_pending == 1 (race with put_online_cpus()).

If that answers your question :)

>
> IOW,
>
> > void cpu_hotplug_begin(void)
> > {
> > + spin_lock(&cpu_hotplug.awr_lock);
> > cpu_hotplug.active_writer = current;
> > + spin_unlock(&cpu_hotplug.awr_lock);
> >
> > cpuhp_lock_acquire();
> > for (;;) {
> > mutex_lock(&cpu_hotplug.lock);
> > + __set_current_state(TASK_UNINTERRUPTIBLE);
>
> don't we need set_current_state() here ?

Hm, good question, this was only a move of existing code. But I thing the
checked variant would be better.

>
> Oleg.
>

Thanks!

David

2014-12-10 19:00:05

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v3] CPU hotplug: active_writer not woken up in some cases - deadlock

On 12/10, David Hildenbrand wrote:
>
> > Why active_writer should see .puts_pending != 0 if this is called
> > right after cpu_hotplug_begin() takes cpu_hotplug.lock but before
> > it sets TASK_UNINTERRUPTIBLE?
>
> get_online_cpus() increased the refcount.
> put_online_cpus() will increment puts_pending and trigger a wake up (if the
> lock is alread taken - might be by cpu_hotplug_begin() or by some other
> get_online_cpus()).
>
> So refcount == 1, puts_pending == 1
>
> cpu_hotplug_begin() gets the lock and sees refcount == 1 and puts_pending == 0
> or puts_pending == 1 (race with put_online_cpus()).
>
> If that answers your question :)

Sorry for confusion ;)

I meant that without mb() cpu_hotplug_begin() can miss puts_pending != 0,
so it needs set_current_state() before atomic_read().

But this doesn't matter, your v4 uses wake_up/prepare_to_wait.

> > IOW,
> >
> > > void cpu_hotplug_begin(void)
> > > {
> > > + spin_lock(&cpu_hotplug.awr_lock);
> > > cpu_hotplug.active_writer = current;
> > > + spin_unlock(&cpu_hotplug.awr_lock);
> > >
> > > cpuhp_lock_acquire();
> > > for (;;) {
> > > mutex_lock(&cpu_hotplug.lock);
> > > + __set_current_state(TASK_UNINTERRUPTIBLE);
> >
> > don't we need set_current_state() here ?
>
> Hm, good question, this was only a move of existing code. But I thing the
> checked variant would be better.
>
> >
> > Oleg.
> >
>
> Thanks!
>
> David
>