2014-02-11 06:51:27

by Jane Li

[permalink] [raw]
Subject: [PATCH] printk: fix one circular lockdep warning about console_lock

From: Jane Li <[email protected]>

This patch tries to fix a warning about possible circular locking
dependency.

If do in following sequence:
enter suspend -> resume -> plug-out CPUx (echo 0 > cpux/online)
lockdep will show warning as following:

======================================================
[ INFO: possible circular locking dependency detected ]
3.10.0 #2 Tainted: G O
-------------------------------------------------------
sh/1271 is trying to acquire lock:
(console_lock){+.+.+.}, at: [<c06ebf7c>] console_cpu_notify+0x20/0x2c
but task is already holding lock:
(cpu_hotplug.lock){+.+.+.}, at: [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:
-> #2 (cpu_hotplug.lock){+.+.+.}:
[<c017bb7c>] lock_acquire+0x98/0x12c
[<c06f5014>] mutex_lock_nested+0x50/0x3d8
[<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
[<c06ebfac>] _cpu_up+0x24/0x154
[<c06ec140>] cpu_up+0x64/0x84
[<c0981834>] smp_init+0x9c/0xd4
[<c0973880>] kernel_init_freeable+0x78/0x1c8
[<c06e7f40>] kernel_init+0x8/0xe4
[<c010eec8>] ret_from_fork+0x14/0x2c

-> #1 (cpu_add_remove_lock){+.+.+.}:
[<c017bb7c>] lock_acquire+0x98/0x12c
[<c06f5014>] mutex_lock_nested+0x50/0x3d8
[<c012b758>] disable_nonboot_cpus+0x8/0xe8
[<c016b83c>] suspend_devices_and_enter+0x214/0x448
[<c016bc54>] pm_suspend+0x1e4/0x284
[<c016bdcc>] try_to_suspend+0xa4/0xbc
[<c0143848>] process_one_work+0x1c4/0x4fc
[<c0143f80>] worker_thread+0x138/0x37c
[<c014aaf8>] kthread+0xa4/0xb0
[<c010eec8>] ret_from_fork+0x14/0x2c

-> #0 (console_lock){+.+.+.}:
[<c017b5d0>] __lock_acquire+0x1b38/0x1b80
[<c017bb7c>] lock_acquire+0x98/0x12c
[<c01288c4>] console_lock+0x54/0x68
[<c06ebf7c>] console_cpu_notify+0x20/0x2c
[<c01501d4>] notifier_call_chain+0x44/0x84
[<c012b448>] __cpu_notify+0x2c/0x48
[<c012b5b0>] cpu_notify_nofail+0x8/0x14
[<c06e81bc>] _cpu_down+0xf4/0x258
[<c06e8344>] cpu_down+0x24/0x40
[<c06e921c>] store_online+0x30/0x74
[<c03b7298>] dev_attr_store+0x18/0x24
[<c025fc5c>] sysfs_write_file+0x16c/0x19c
[<c0207a98>] vfs_write+0xb4/0x190
[<c0207e58>] SyS_write+0x3c/0x70
[<c010ee00>] ret_fast_syscall+0x0/0x48

Chain exists of:
console_lock --> cpu_add_remove_lock --> cpu_hotplug.lock

Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(cpu_hotplug.lock);
lock(cpu_add_remove_lock);
lock(cpu_hotplug.lock);
lock(console_lock);
*** DEADLOCK ***

There are three locks involved in two sequence:
a) pm suspend:
console_lock (@suspend_console())
cpu_add_remove_lock (@disable_nonboot_cpus())
cpu_hotplug.lock (@_cpu_down())
b) Plug-out CPUx:
cpu_add_remove_lock (@(cpu_down())
cpu_hotplug.lock (@_cpu_down())
console_lock (@console_cpu_notify()) => Lockdeps prints warning log.

There should be not real deadlock, as flag of console_suspended can
protect this.

Printk registers cpu hotplug notify function. When CPUx is plug-out/in,
always execute console_lock() and console_unlock(). This patch
modifies that with console_trylock() and console_unlock(). Then use
that instead of the unconditional console_lock/unlock pair to avoid the
warning.

Signed-off-by: Jane Li <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
---
kernel/printk/printk.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b1d255f..e7a0525 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1893,6 +1893,20 @@ void resume_console(void)
}

/**
+ * console_flush - flush dmesg if console isn't suspended
+ *
+ * console_unlock always flushes the dmesg buffer, so just try to
+ * grab&drop the console lock. If that fails we know that the current
+ * holder will eventually drop the console lock and so flush the dmesg
+ * buffers at the earliest possible time.
+ */
+void console_flush(void)
+{
+ if (console_trylock())
+ console_unlock();
+}
+
+/**
* console_cpu_notify - print deferred console messages after CPU hotplug
* @self: notifier struct
* @action: CPU hotplug event
@@ -1911,8 +1925,7 @@ static int console_cpu_notify(struct notifier_block *self,
case CPU_DEAD:
case CPU_DOWN_FAILED:
case CPU_UP_CANCELED:
- console_lock();
- console_unlock();
+ console_flush();
}
return NOTIFY_OK;
}
--
1.7.9.5


2014-02-11 18:29:33

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH] printk: fix one circular lockdep warning about console_lock

On Mon, Feb 10, 2014 at 10:50 PM, <[email protected]> wrote:
> From: Jane Li <[email protected]>
>
> This patch tries to fix a warning about possible circular locking
> dependency.
>
> If do in following sequence:
> enter suspend -> resume -> plug-out CPUx (echo 0 > cpux/online)
> lockdep will show warning as following:
>
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.10.0 #2 Tainted: G O
> -------------------------------------------------------
> sh/1271 is trying to acquire lock:
> (console_lock){+.+.+.}, at: [<c06ebf7c>] console_cpu_notify+0x20/0x2c
> but task is already holding lock:
> (cpu_hotplug.lock){+.+.+.}, at: [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
> -> #2 (cpu_hotplug.lock){+.+.+.}:
> [<c017bb7c>] lock_acquire+0x98/0x12c
> [<c06f5014>] mutex_lock_nested+0x50/0x3d8
> [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
> [<c06ebfac>] _cpu_up+0x24/0x154
> [<c06ec140>] cpu_up+0x64/0x84
> [<c0981834>] smp_init+0x9c/0xd4
> [<c0973880>] kernel_init_freeable+0x78/0x1c8
> [<c06e7f40>] kernel_init+0x8/0xe4
> [<c010eec8>] ret_from_fork+0x14/0x2c
>
> -> #1 (cpu_add_remove_lock){+.+.+.}:
> [<c017bb7c>] lock_acquire+0x98/0x12c
> [<c06f5014>] mutex_lock_nested+0x50/0x3d8
> [<c012b758>] disable_nonboot_cpus+0x8/0xe8
> [<c016b83c>] suspend_devices_and_enter+0x214/0x448
> [<c016bc54>] pm_suspend+0x1e4/0x284
> [<c016bdcc>] try_to_suspend+0xa4/0xbc
> [<c0143848>] process_one_work+0x1c4/0x4fc
> [<c0143f80>] worker_thread+0x138/0x37c
> [<c014aaf8>] kthread+0xa4/0xb0
> [<c010eec8>] ret_from_fork+0x14/0x2c
>
> -> #0 (console_lock){+.+.+.}:
> [<c017b5d0>] __lock_acquire+0x1b38/0x1b80
> [<c017bb7c>] lock_acquire+0x98/0x12c
> [<c01288c4>] console_lock+0x54/0x68
> [<c06ebf7c>] console_cpu_notify+0x20/0x2c
> [<c01501d4>] notifier_call_chain+0x44/0x84
> [<c012b448>] __cpu_notify+0x2c/0x48
> [<c012b5b0>] cpu_notify_nofail+0x8/0x14
> [<c06e81bc>] _cpu_down+0xf4/0x258
> [<c06e8344>] cpu_down+0x24/0x40
> [<c06e921c>] store_online+0x30/0x74
> [<c03b7298>] dev_attr_store+0x18/0x24
> [<c025fc5c>] sysfs_write_file+0x16c/0x19c
> [<c0207a98>] vfs_write+0xb4/0x190
> [<c0207e58>] SyS_write+0x3c/0x70
> [<c010ee00>] ret_fast_syscall+0x0/0x48
>
> Chain exists of:
> console_lock --> cpu_add_remove_lock --> cpu_hotplug.lock
>
> Possible unsafe locking scenario:
> CPU0 CPU1
> ---- ----
> lock(cpu_hotplug.lock);
> lock(cpu_add_remove_lock);
> lock(cpu_hotplug.lock);
> lock(console_lock);
> *** DEADLOCK ***
>
> There are three locks involved in two sequence:
> a) pm suspend:
> console_lock (@suspend_console())
> cpu_add_remove_lock (@disable_nonboot_cpus())
> cpu_hotplug.lock (@_cpu_down())
> b) Plug-out CPUx:
> cpu_add_remove_lock (@(cpu_down())
> cpu_hotplug.lock (@_cpu_down())
> console_lock (@console_cpu_notify()) => Lockdeps prints warning log.
>
> There should be not real deadlock, as flag of console_suspended can
> protect this.
>
> Printk registers cpu hotplug notify function. When CPUx is plug-out/in,
> always execute console_lock() and console_unlock(). This patch
> modifies that with console_trylock() and console_unlock(). Then use
> that instead of the unconditional console_lock/unlock pair to avoid the
> warning.
>
> Signed-off-by: Jane Li <[email protected]>
> Signed-off-by: Daniel Vetter <[email protected]>
> ---
> kernel/printk/printk.c | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index b1d255f..e7a0525 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1893,6 +1893,20 @@ void resume_console(void)
> }
>
> /**
> + * console_flush - flush dmesg if console isn't suspended
> + *
> + * console_unlock always flushes the dmesg buffer, so just try to
> + * grab&drop the console lock. If that fails we know that the current
> + * holder will eventually drop the console lock and so flush the dmesg
> + * buffers at the earliest possible time.
> + */
> +void console_flush(void)
> +{
> + if (console_trylock())
> + console_unlock();
> +}
> +
> +/**
> * console_cpu_notify - print deferred console messages after CPU hotplug
> * @self: notifier struct
> * @action: CPU hotplug event
> @@ -1911,8 +1925,7 @@ static int console_cpu_notify(struct notifier_block *self,
> case CPU_DEAD:
> case CPU_DOWN_FAILED:
> case CPU_UP_CANCELED:
> - console_lock();
> - console_unlock();
> + console_flush();
> }
> return NOTIFY_OK;
> }
> --
> 1.7.9.5
>

Seems reasonable to me.

Reviewed-by: Kees Cook <[email protected]>

-Kees

--
Kees Cook
Chrome OS Security

2014-02-11 21:19:33

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] printk: fix one circular lockdep warning about console_lock

On Tue, 11 Feb 2014 14:50:00 +0800 <[email protected]> wrote:

> From: Jane Li <[email protected]>
>
> This patch tries to fix a warning about possible circular locking
> dependency.
>
> If do in following sequence:
> enter suspend -> resume -> plug-out CPUx (echo 0 > cpux/online)
> lockdep will show warning as following:
>
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.10.0 #2 Tainted: G O
> -------------------------------------------------------
> sh/1271 is trying to acquire lock:
> (console_lock){+.+.+.}, at: [<c06ebf7c>] console_cpu_notify+0x20/0x2c
> but task is already holding lock:
> (cpu_hotplug.lock){+.+.+.}, at: [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
> -> #2 (cpu_hotplug.lock){+.+.+.}:
> [<c017bb7c>] lock_acquire+0x98/0x12c
> [<c06f5014>] mutex_lock_nested+0x50/0x3d8
> [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
> [<c06ebfac>] _cpu_up+0x24/0x154
> [<c06ec140>] cpu_up+0x64/0x84
> [<c0981834>] smp_init+0x9c/0xd4
> [<c0973880>] kernel_init_freeable+0x78/0x1c8
> [<c06e7f40>] kernel_init+0x8/0xe4
> [<c010eec8>] ret_from_fork+0x14/0x2c
>
> -> #1 (cpu_add_remove_lock){+.+.+.}:
> [<c017bb7c>] lock_acquire+0x98/0x12c
> [<c06f5014>] mutex_lock_nested+0x50/0x3d8
> [<c012b758>] disable_nonboot_cpus+0x8/0xe8
> [<c016b83c>] suspend_devices_and_enter+0x214/0x448
> [<c016bc54>] pm_suspend+0x1e4/0x284
> [<c016bdcc>] try_to_suspend+0xa4/0xbc
> [<c0143848>] process_one_work+0x1c4/0x4fc
> [<c0143f80>] worker_thread+0x138/0x37c
> [<c014aaf8>] kthread+0xa4/0xb0
> [<c010eec8>] ret_from_fork+0x14/0x2c
>
> -> #0 (console_lock){+.+.+.}:
> [<c017b5d0>] __lock_acquire+0x1b38/0x1b80
> [<c017bb7c>] lock_acquire+0x98/0x12c
> [<c01288c4>] console_lock+0x54/0x68
> [<c06ebf7c>] console_cpu_notify+0x20/0x2c
> [<c01501d4>] notifier_call_chain+0x44/0x84
> [<c012b448>] __cpu_notify+0x2c/0x48
> [<c012b5b0>] cpu_notify_nofail+0x8/0x14
> [<c06e81bc>] _cpu_down+0xf4/0x258
> [<c06e8344>] cpu_down+0x24/0x40
> [<c06e921c>] store_online+0x30/0x74
> [<c03b7298>] dev_attr_store+0x18/0x24
> [<c025fc5c>] sysfs_write_file+0x16c/0x19c
> [<c0207a98>] vfs_write+0xb4/0x190
> [<c0207e58>] SyS_write+0x3c/0x70
> [<c010ee00>] ret_fast_syscall+0x0/0x48
>
> Chain exists of:
> console_lock --> cpu_add_remove_lock --> cpu_hotplug.lock
>
> Possible unsafe locking scenario:
> CPU0 CPU1
> ---- ----
> lock(cpu_hotplug.lock);
> lock(cpu_add_remove_lock);
> lock(cpu_hotplug.lock);
> lock(console_lock);
> *** DEADLOCK ***

These traces hurt my brain.

> There are three locks involved in two sequence:
> a) pm suspend:
> console_lock (@suspend_console())
> cpu_add_remove_lock (@disable_nonboot_cpus())
> cpu_hotplug.lock (@_cpu_down())

But but but. suspend_console() releases console_sem again. So the
sequence is actually

down(&console_sem) (@suspend_console())
up(&console_sem) (@suspend_console())
cpu_add_remove_lock (@disable_nonboot_cpus())
cpu_hotplug.lock (@_cpu_down())

So console_sem *doesn't* nest outside cpu_add_remove_lock and
cpu_hotplug.lock.

> b) Plug-out CPUx:
> cpu_add_remove_lock (@(cpu_down())
> cpu_hotplug.lock (@_cpu_down())
> console_lock (@console_cpu_notify()) => Lockdeps prints warning log.
>
> There should be not real deadlock, as flag of console_suspended can
> protect this.

console_lock() does down(&console_sem) *before* testing
console_suspended, so I don't understand this sentence - a more
detailed description would help.

> Printk registers cpu hotplug notify function. When CPUx is plug-out/in,
> always execute console_lock() and console_unlock(). This patch
> modifies that with console_trylock() and console_unlock(). Then use
> that instead of the unconditional console_lock/unlock pair to avoid the
> warning.
>
> ...
>
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1893,6 +1893,20 @@ void resume_console(void)
> }
>
> /**
> + * console_flush - flush dmesg if console isn't suspended
> + *
> + * console_unlock always flushes the dmesg buffer, so just try to
> + * grab&drop the console lock. If that fails we know that the current
> + * holder will eventually drop the console lock and so flush the dmesg
> + * buffers at the earliest possible time.
> + */

The comment should describe why we added this code, please: talk about
cpu_hotplug.lock and console_lock.

> +void console_flush(void)
> +{
> + if (console_trylock())
> + console_unlock();
> +}
> +
> +/**
> * console_cpu_notify - print deferred console messages after CPU hotplug
> * @self: notifier struct
> * @action: CPU hotplug event
> @@ -1911,8 +1925,7 @@ static int console_cpu_notify(struct notifier_block *self,
> case CPU_DEAD:
> case CPU_DOWN_FAILED:
> case CPU_UP_CANCELED:
> - console_lock();
> - console_unlock();
> + console_flush();
> }
> return NOTIFY_OK;

Well, this is a bit hacky and makes the already-far-too-complex code
even more complex. If it is indeed the case that the deadlock cannot
really occur then let's try to find a way of suppressing the lockdep
warning without making runtime changes.

What I'm struggling with is what *should* the ranking of these locks be?
>From a conceptual high-level design standpoint, which is the
"innermost" lock? I tend to think that it is console_lock, because
blocking CPU hotplug is a quite high-level operation.

But console_lock is such a kooky special-case in the way it is used to
control the printk corking that it is hard to take general rules and
apply them here.

2014-02-11 21:35:31

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH] printk: fix one circular lockdep warning about console_lock

Aside: I've suggested this fix (but was too lazy to write the actual
patch), hence also my sob on it.

On Tue, Feb 11, 2014 at 10:19 PM, Andrew Morton
<[email protected]> wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -1893,6 +1893,20 @@ void resume_console(void)
>> }
>>
>> /**
>> + * console_flush - flush dmesg if console isn't suspended
>> + *
>> + * console_unlock always flushes the dmesg buffer, so just try to
>> + * grab&drop the console lock. If that fails we know that the current
>> + * holder will eventually drop the console lock and so flush the dmesg
>> + * buffers at the earliest possible time.
>> + */
>
> The comment should describe why we added this code, please: talk about
> cpu_hotplug.lock and console_lock.

I've suggested this since it's the generic way to flush out dmesg. The
printk functions also use this trick, but with the complication that
console_trylock_for_printk also drops the logbuf log. So imo this is a
general helper to flush the console buffer in odd circumstances and
not specific to the cpu lockdep issue at hand. At least my idea was
that this could be useful to untangle other issues around
console_lock.

>> +void console_flush(void)
>> +{
>> + if (console_trylock())
>> + console_unlock();
>> +}
>> +
>> +/**
>> * console_cpu_notify - print deferred console messages after CPU hotplug
>> * @self: notifier struct
>> * @action: CPU hotplug event
>> @@ -1911,8 +1925,7 @@ static int console_cpu_notify(struct notifier_block *self,
>> case CPU_DEAD:
>> case CPU_DOWN_FAILED:
>> case CPU_UP_CANCELED:
>> - console_lock();
>> - console_unlock();
>> + console_flush();
>> }
>> return NOTIFY_OK;
>
> Well, this is a bit hacky and makes the already-far-too-complex code
> even more complex. If it is indeed the case that the deadlock cannot
> really occur then let's try to find a way of suppressing the lockdep
> warning without making runtime changes.
>
> What I'm struggling with is what *should* the ranking of these locks be?
> From a conceptual high-level design standpoint, which is the
> "innermost" lock? I tend to think that it is console_lock, because
> blocking CPU hotplug is a quite high-level operation.
>
> But console_lock is such a kooky special-case in the way it is used to
> control the printk corking that it is hard to take general rules and
> apply them here.

console_lock is one giant disaster imo, at least from my experience
with it's interactions in fbdev/drm/gpu code - due to bonghits around
how the fbcon code is structured we need to run the entire initial
modeset under the console_lock, which means if anything goes wrong we
can't even dump to net/serial console.

Hence why I've proposed the trylock "fix", prettied up in the
console_flush helper to duct-tape over the lockdep splat at hand. It
actually achieves exactly what the code sets out to do (flush dmesg as
soon as possible) without extending the reach of insane console_lock
interactions into even more code. Because that way lies madness, at
least when I look at our vain attempts to make fbcon and drm work
together properly ...

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-02-13 10:48:28

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] printk: fix one circular lockdep warning about console_lock

On Tue 11-02-14 13:19:27, Andrew Morton wrote:
> On Tue, 11 Feb 2014 14:50:00 +0800 <[email protected]> wrote:
>
> > From: Jane Li <[email protected]>
> >
> > This patch tries to fix a warning about possible circular locking
> > dependency.
> >
> > If do in following sequence:
> > enter suspend -> resume -> plug-out CPUx (echo 0 > cpux/online)
> > lockdep will show warning as following:
> >
> > ======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 3.10.0 #2 Tainted: G O
> > -------------------------------------------------------
> > sh/1271 is trying to acquire lock:
> > (console_lock){+.+.+.}, at: [<c06ebf7c>] console_cpu_notify+0x20/0x2c
> > but task is already holding lock:
> > (cpu_hotplug.lock){+.+.+.}, at: [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
> > which lock already depends on the new lock.
> >
> > the existing dependency chain (in reverse order) is:
> > -> #2 (cpu_hotplug.lock){+.+.+.}:
> > [<c017bb7c>] lock_acquire+0x98/0x12c
> > [<c06f5014>] mutex_lock_nested+0x50/0x3d8
> > [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
> > [<c06ebfac>] _cpu_up+0x24/0x154
> > [<c06ec140>] cpu_up+0x64/0x84
> > [<c0981834>] smp_init+0x9c/0xd4
> > [<c0973880>] kernel_init_freeable+0x78/0x1c8
> > [<c06e7f40>] kernel_init+0x8/0xe4
> > [<c010eec8>] ret_from_fork+0x14/0x2c
> >
> > -> #1 (cpu_add_remove_lock){+.+.+.}:
> > [<c017bb7c>] lock_acquire+0x98/0x12c
> > [<c06f5014>] mutex_lock_nested+0x50/0x3d8
> > [<c012b758>] disable_nonboot_cpus+0x8/0xe8
> > [<c016b83c>] suspend_devices_and_enter+0x214/0x448
> > [<c016bc54>] pm_suspend+0x1e4/0x284
> > [<c016bdcc>] try_to_suspend+0xa4/0xbc
> > [<c0143848>] process_one_work+0x1c4/0x4fc
> > [<c0143f80>] worker_thread+0x138/0x37c
> > [<c014aaf8>] kthread+0xa4/0xb0
> > [<c010eec8>] ret_from_fork+0x14/0x2c
> >
> > -> #0 (console_lock){+.+.+.}:
> > [<c017b5d0>] __lock_acquire+0x1b38/0x1b80
> > [<c017bb7c>] lock_acquire+0x98/0x12c
> > [<c01288c4>] console_lock+0x54/0x68
> > [<c06ebf7c>] console_cpu_notify+0x20/0x2c
> > [<c01501d4>] notifier_call_chain+0x44/0x84
> > [<c012b448>] __cpu_notify+0x2c/0x48
> > [<c012b5b0>] cpu_notify_nofail+0x8/0x14
> > [<c06e81bc>] _cpu_down+0xf4/0x258
> > [<c06e8344>] cpu_down+0x24/0x40
> > [<c06e921c>] store_online+0x30/0x74
> > [<c03b7298>] dev_attr_store+0x18/0x24
> > [<c025fc5c>] sysfs_write_file+0x16c/0x19c
> > [<c0207a98>] vfs_write+0xb4/0x190
> > [<c0207e58>] SyS_write+0x3c/0x70
> > [<c010ee00>] ret_fast_syscall+0x0/0x48
> >
> > Chain exists of:
> > console_lock --> cpu_add_remove_lock --> cpu_hotplug.lock
> >
> > Possible unsafe locking scenario:
> > CPU0 CPU1
> > ---- ----
> > lock(cpu_hotplug.lock);
> > lock(cpu_add_remove_lock);
> > lock(cpu_hotplug.lock);
> > lock(console_lock);
> > *** DEADLOCK ***
>
> These traces hurt my brain.
>
> > There are three locks involved in two sequence:
> > a) pm suspend:
> > console_lock (@suspend_console())
> > cpu_add_remove_lock (@disable_nonboot_cpus())
> > cpu_hotplug.lock (@_cpu_down())
>
> But but but. suspend_console() releases console_sem again. So the
> sequence is actually
>
> down(&console_sem) (@suspend_console())
> up(&console_sem) (@suspend_console())
> cpu_add_remove_lock (@disable_nonboot_cpus())
> cpu_hotplug.lock (@_cpu_down())
>
> So console_sem *doesn't* nest outside cpu_add_remove_lock and
> cpu_hotplug.lock.
Exactly. My take would be that the lockdep annotation of console_sem is
just missing
mutex_release(&console_lock_dep_map, 1, _RET_IP_);
in suspend_console() and similar counterpart in resume_console(). We are
doing the annotation by hand and apparently this got missed.

...
> > +void console_flush(void)
> > +{
> > + if (console_trylock())
> > + console_unlock();
> > +}
> > +
> > +/**
> > * console_cpu_notify - print deferred console messages after CPU hotplug
> > * @self: notifier struct
> > * @action: CPU hotplug event
> > @@ -1911,8 +1925,7 @@ static int console_cpu_notify(struct notifier_block *self,
> > case CPU_DEAD:
> > case CPU_DOWN_FAILED:
> > case CPU_UP_CANCELED:
> > - console_lock();
> > - console_unlock();
> > + console_flush();
> > }
> > return NOTIFY_OK;
>
> Well, this is a bit hacky and makes the already-far-too-complex code
> even more complex. If it is indeed the case that the deadlock cannot
> really occur then let's try to find a way of suppressing the lockdep
> warning without making runtime changes.
>
> What I'm struggling with is what *should* the ranking of these locks be?
> From a conceptual high-level design standpoint, which is the
> "innermost" lock? I tend to think that it is console_lock, because
> blocking CPU hotplug is a quite high-level operation.
>
> But console_lock is such a kooky special-case in the way it is used to
> control the printk corking that it is hard to take general rules and
> apply them here.
Currently I think it should be pretty much the innermost lock for
anything except for console driver special locks.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2014-02-14 07:43:06

by Jane Li

[permalink] [raw]
Subject: Re: [PATCH] printk: fix one circular lockdep warning about console_lock

On 02/12/2014 05:19 AM, Andrew Morton wrote:

>> There are three locks involved in two sequence:
>> a) pm suspend:
>> console_lock (@suspend_console())
>> cpu_add_remove_lock (@disable_nonboot_cpus())
>> cpu_hotplug.lock (@_cpu_down())
> But but but. suspend_console() releases console_sem again.

Console_lock does not refer to console_sem but console_lock_dep_map. Its name is console_lock. Suspend_console() does not release console_lock_dep_map.

> So the
> sequence is actually
>
> down(&console_sem) (@suspend_console())

acquire(&console_lock_dep_map) (&suspend_console())

> up(&console_sem) (@suspend_console())
> cpu_add_remove_lock (@disable_nonboot_cpus())
> cpu_hotplug.lock (@_cpu_down())
>
> So console_sem *doesn't* nest outside cpu_add_remove_lock and
> cpu_hotplug.lock.

Add console_lock in the sequence.

>
>> b) Plug-out CPUx:
>> cpu_add_remove_lock (@(cpu_down())
>> cpu_hotplug.lock (@_cpu_down())
>> console_lock (@console_cpu_notify()) => Lockdeps prints warning log.
>>
>> There should be not real deadlock, as flag of console_suspended can
>> protect this.
> console_lock() does down(&console_sem) *before* testing
> console_suspended, so I don't understand this sentence - a more
> detailed description would help.

After suspend_console(), console_sem is unlocked, but console_lock_dep_map has been acquired.


Best Regards,
Jane