Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754019AbaBKVTd (ORCPT ); Tue, 11 Feb 2014 16:19:33 -0500 Received: from mail.linuxfoundation.org ([140.211.169.12]:41460 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752666AbaBKVTa (ORCPT ); Tue, 11 Feb 2014 16:19:30 -0500 Date: Tue, 11 Feb 2014 13:19:27 -0800 From: Andrew Morton To: Cc: , , , , , , Daniel Vetter , Jan Kara Subject: Re: [PATCH] printk: fix one circular lockdep warning about console_lock Message-Id: <20140211131927.f180ac29dede51b2733b92a6@linux-foundation.org> In-Reply-To: <1392101400-14550-1-git-send-email-jiel@marvell.com> References: <1392101400-14550-1-git-send-email-jiel@marvell.com> X-Mailer: Sylpheed 3.2.0beta5 (GTK+ 2.24.10; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 11 Feb 2014 14:50:00 +0800 wrote: > From: Jane Li > > This patch tries to fix a warning about possible circular locking > dependency. > > If do in following sequence: > enter suspend -> resume -> plug-out CPUx (echo 0 > cpux/online) > lockdep will show warning as following: > > ====================================================== > [ INFO: possible circular locking dependency detected ] > 3.10.0 #2 Tainted: G O > ------------------------------------------------------- > sh/1271 is trying to acquire lock: > (console_lock){+.+.+.}, at: [] console_cpu_notify+0x20/0x2c > but task is already holding lock: > (cpu_hotplug.lock){+.+.+.}, at: [] cpu_hotplug_begin+0x2c/0x58 > which lock already depends on the new lock. > > the existing dependency chain (in reverse order) is: > -> #2 (cpu_hotplug.lock){+.+.+.}: > [] lock_acquire+0x98/0x12c > [] mutex_lock_nested+0x50/0x3d8 > [] cpu_hotplug_begin+0x2c/0x58 > [] _cpu_up+0x24/0x154 > [] cpu_up+0x64/0x84 > [] smp_init+0x9c/0xd4 > [] kernel_init_freeable+0x78/0x1c8 > [] kernel_init+0x8/0xe4 > [] ret_from_fork+0x14/0x2c > > -> #1 (cpu_add_remove_lock){+.+.+.}: > [] lock_acquire+0x98/0x12c > [] mutex_lock_nested+0x50/0x3d8 > [] disable_nonboot_cpus+0x8/0xe8 > [] suspend_devices_and_enter+0x214/0x448 > [] pm_suspend+0x1e4/0x284 > [] try_to_suspend+0xa4/0xbc > [] process_one_work+0x1c4/0x4fc > [] worker_thread+0x138/0x37c > [] kthread+0xa4/0xb0 > [] ret_from_fork+0x14/0x2c > > -> #0 (console_lock){+.+.+.}: > [] __lock_acquire+0x1b38/0x1b80 > [] lock_acquire+0x98/0x12c > [] console_lock+0x54/0x68 > [] console_cpu_notify+0x20/0x2c > [] notifier_call_chain+0x44/0x84 > [] __cpu_notify+0x2c/0x48 > [] cpu_notify_nofail+0x8/0x14 > [] _cpu_down+0xf4/0x258 > [] cpu_down+0x24/0x40 > [] store_online+0x30/0x74 > [] dev_attr_store+0x18/0x24 > [] sysfs_write_file+0x16c/0x19c > [] vfs_write+0xb4/0x190 > [] SyS_write+0x3c/0x70 > [] ret_fast_syscall+0x0/0x48 > > Chain exists of: > console_lock --> cpu_add_remove_lock --> cpu_hotplug.lock > > Possible unsafe locking scenario: > CPU0 CPU1 > ---- ---- > lock(cpu_hotplug.lock); > lock(cpu_add_remove_lock); > lock(cpu_hotplug.lock); > lock(console_lock); > *** DEADLOCK *** These traces hurt my brain. > There are three locks involved in two sequence: > a) pm suspend: > console_lock (@suspend_console()) > cpu_add_remove_lock (@disable_nonboot_cpus()) > cpu_hotplug.lock (@_cpu_down()) But but but. suspend_console() releases console_sem again. So the sequence is actually down(&console_sem) (@suspend_console()) up(&console_sem) (@suspend_console()) cpu_add_remove_lock (@disable_nonboot_cpus()) cpu_hotplug.lock (@_cpu_down()) So console_sem *doesn't* nest outside cpu_add_remove_lock and cpu_hotplug.lock. > b) Plug-out CPUx: > cpu_add_remove_lock (@(cpu_down()) > cpu_hotplug.lock (@_cpu_down()) > console_lock (@console_cpu_notify()) => Lockdeps prints warning log. > > There should be not real deadlock, as flag of console_suspended can > protect this. console_lock() does down(&console_sem) *before* testing console_suspended, so I don't understand this sentence - a more detailed description would help. > Printk registers cpu hotplug notify function. When CPUx is plug-out/in, > always execute console_lock() and console_unlock(). This patch > modifies that with console_trylock() and console_unlock(). Then use > that instead of the unconditional console_lock/unlock pair to avoid the > warning. > > ... > > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -1893,6 +1893,20 @@ void resume_console(void) > } > > /** > + * console_flush - flush dmesg if console isn't suspended > + * > + * console_unlock always flushes the dmesg buffer, so just try to > + * grab&drop the console lock. If that fails we know that the current > + * holder will eventually drop the console lock and so flush the dmesg > + * buffers at the earliest possible time. > + */ The comment should describe why we added this code, please: talk about cpu_hotplug.lock and console_lock. > +void console_flush(void) > +{ > + if (console_trylock()) > + console_unlock(); > +} > + > +/** > * console_cpu_notify - print deferred console messages after CPU hotplug > * @self: notifier struct > * @action: CPU hotplug event > @@ -1911,8 +1925,7 @@ static int console_cpu_notify(struct notifier_block *self, > case CPU_DEAD: > case CPU_DOWN_FAILED: > case CPU_UP_CANCELED: > - console_lock(); > - console_unlock(); > + console_flush(); > } > return NOTIFY_OK; Well, this is a bit hacky and makes the already-far-too-complex code even more complex. If it is indeed the case that the deadlock cannot really occur then let's try to find a way of suppressing the lockdep warning without making runtime changes. What I'm struggling with is what *should* the ranking of these locks be? >From a conceptual high-level design standpoint, which is the "innermost" lock? I tend to think that it is console_lock, because blocking CPU hotplug is a quite high-level operation. But console_lock is such a kooky special-case in the way it is used to control the printk corking that it is hard to take general rules and apply them here. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/