Commit b2c4623dcd07 ("rcu: More on deadlock between CPU hotplug and expedited
grace periods") introduced another problem that can easily be reproduced by
starting/stopping cpus in a loop.

E.g.:
  for i in `seq 5000`; do
      echo 1 > /sys/devices/system/cpu/cpu1/online
      echo 0 > /sys/devices/system/cpu/cpu1/online
  done

Will result in:
  INFO: task /cpu_start_stop:1 blocked for more than 120 seconds.
  Call Trace:
  ([<00000000006a028e>] __schedule+0x406/0x91c)
   [<0000000000130f60>] cpu_hotplug_begin+0xd0/0xd4
   [<0000000000130ff6>] _cpu_up+0x3e/0x1c4
   [<0000000000131232>] cpu_up+0xb6/0xd4
   [<00000000004a5720>] device_online+0x80/0xc0
   [<00000000004a57f0>] online_store+0x90/0xb0
   ...

And a deadlock.

The problem is that if the last reference is dropped in put_online_cpus()
while cpu_hotplug.lock cannot be taken, puts_pending is incremented, but a
sleeping active_writer might never be woken up and therefore never exits
the loop in cpu_hotplug_begin().

This quick fix wakes up the active_writer proactively. The writer goes
back to sleep anyway if the ref count isn't down to 0 yet, so this should
be fine.

Can't reproduce the error with this fix.

Signed-off-by: David Hildenbrand <[email protected]>
---
kernel/cpu.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 90a3d01..e77740583 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -117,6 +117,9 @@ void put_online_cpus(void)
 		return;
 	if (!mutex_trylock(&cpu_hotplug.lock)) {
 		atomic_inc(&cpu_hotplug.puts_pending);
+		/* we might be the last one */
+		if (unlikely(cpu_hotplug.active_writer))
+			wake_up_process(cpu_hotplug.active_writer);
 		cpuhp_lock_release();
 		return;
 	}
--
1.8.5.5
The title should of course say active_writer ... grml
David
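
For anyone following along, the lost wakeup from the patch description can be replayed in a small single-threaded userspace model. This is only a sketch under assumptions: struct hotplug_model and the model_* functions are hypothetical stand-ins (plain flags instead of a mutex and the scheduler), not kernel code, and only one fixed interleaving is replayed (writer checks the count, the last reader fails the trylock, writer goes to sleep). It says nothing about other interleavings.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code. Field names loosely
 * mirror cpu_hotplug; the "lock" and the "sleep" are plain flags.
 */
struct hotplug_model {
	int refcount;            /* references the writer still waits for */
	int puts_pending;        /* puts deferred because trylock failed */
	bool lock_held;          /* stands in for cpu_hotplug.lock */
	bool writer_wants_sleep; /* writer marked itself as sleeping */
};

/* Writer: take the lock, fold in deferred puts, decide whether to sleep. */
static bool model_writer_check(struct hotplug_model *h)
{
	h->lock_held = true;
	h->refcount -= h->puts_pending;
	h->puts_pending = 0;
	if (h->refcount == 0)
		return true;              /* done, lock stays held */
	h->writer_wants_sleep = true;     /* about to unlock and sleep */
	return false;
}

/* Writer: drop the lock; returns true if it really blocks. */
static bool model_writer_schedule(struct hotplug_model *h)
{
	h->lock_held = false;
	return h->writer_wants_sleep;
}

/* Reader: drop a reference; wake_fix selects the patched behaviour. */
static void model_put(struct hotplug_model *h, bool wake_fix)
{
	if (h->lock_held) {
		/* trylock failed: defer the decrement */
		h->puts_pending++;
		if (wake_fix)
			h->writer_wants_sleep = false; /* models the wakeup */
		return;
	}
	assert(h->refcount > 0);
	if (--h->refcount == 0)
		h->writer_wants_sleep = false;
}

/* Replays one interleaving; returns true if the writer gets through. */
bool model_run(bool wake_fix)
{
	struct hotplug_model h = { .refcount = 1 };

	if (model_writer_check(&h))      /* sees refcount 1, prepares to sleep */
		return true;
	model_put(&h, wake_fix);         /* last reader, lock is still held */
	if (model_writer_schedule(&h))   /* nobody is left to wake the writer */
		return false;
	return model_writer_check(&h);   /* woken: folds in the pending put */
}
```

In this model, model_run(false) leaves the writer blocked with a pending put it never sees, while model_run(true) lets the second check fold puts_pending into the count and finish.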
On Mon, Dec 08, 2014 at 07:13:03PM +0100, David Hildenbrand wrote:
Good catch!
But don't we need to use exactly the same value for the NULL check
and for the wakeup? Otherwise, wouldn't it be possible for
cpu_hotplug.active_writer to be non-NULL for the check but NULL
for the wake_up_process()?
Thanx, Paul
active_writer is cleared while holding cpuhp_lock, so this should be safe,
right?
Thanks!
On Mon, Dec 08, 2014 at 07:58:14PM +0100, David Hildenbrand wrote:
> active_writer is cleared while holding cpuhp_lock, so this should be safe,
> right?
You lost me on that one. Don't we get to that piece of code precisely
because we don't hold any of the CPU-hotplug locks? If so, the
writer might well hold all the locks it needs, and might well change
cpu_hotplug.active_writer out from under us.
What am I missing here?
Thanx, Paul
I was missing that cpuhp_lock_* are simply lockdep annotations ... it's
getting late :)
So you're right, we need to verify that we don't get a 0 on the second access.
Will send an updated version soon.
Thanks!
On Mon, Dec 08, 2014 at 08:30:18PM +0100, David Hildenbrand wrote:
> I was missing that cpuhp_lock_* are simply lockdep annotations ... it's
> getting late :)
>
> So you're right, we need to verify that we don't get a 0 on the second access.
All you should need to do is something like this:

	awp = ACCESS_ONCE(cpu_hotplug.active_writer);
	if (awp)
		wake_up_process(awp);

That way you only have one access, and the check and wake_up_process()
are guaranteed to be consistent.
> Will send an updated version soon.
Sounds good!
Thanx, Paul
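
The single-access pattern suggested above can be tried outside the kernel with a small userspace sketch. Everything here is an assumption made for illustration: struct model_task, model_wake() and the READ_PTR_ONCE() macro are hypothetical stand-ins for task_struct, wake_up_process() and ACCESS_ONCE(), and the volatile cast only forces a single load of the pointer. It does not model the concurrent writer itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; counts how often it was woken. */
struct model_task {
	int wakeups;
};

/* Hypothetical stand-in for ACCESS_ONCE(): exactly one load of p. */
#define READ_PTR_ONCE(p) (*(struct model_task * volatile *)&(p))

/* Stand-in for cpu_hotplug.active_writer; may become NULL at any time. */
struct model_task *active_writer;

/* Stand-in for wake_up_process(); must never be handed NULL. */
static int model_wake(struct model_task *t)
{
	assert(t != NULL);
	t->wakeups++;
	return 1;
}

/* Returns 1 if a writer was woken, 0 if there was none. */
int wake_active_writer(void)
{
	/* Snapshot the pointer once, then test and use only the snapshot. */
	struct model_task *awp = READ_PTR_ONCE(active_writer);

	if (awp)
		return model_wake(awp);
	return 0;
}
```

Because both the check and the wakeup use awp, a later store of NULL to active_writer cannot slip in between them. Whether the task behind the snapshot is still the right one to wake is a separate question that this sketch does not address.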