Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754470AbaBMKs2 (ORCPT ); Thu, 13 Feb 2014 05:48:28 -0500 Received: from cantor2.suse.de ([195.135.220.15]:33726 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754428AbaBMKsZ (ORCPT ); Thu, 13 Feb 2014 05:48:25 -0500 Date: Thu, 13 Feb 2014 11:48:23 +0100 From: Jan Kara To: Andrew Morton Cc: jiel@marvell.com, joe@perches.com, tj@kernel.org, fweisbec@gmail.com, davem@davemloft.net, keescook@chromium.org, linux-kernel@vger.kernel.org, Daniel Vetter , Jan Kara Subject: Re: [PATCH] printk: fix one circular lockdep warning about console_lock Message-ID: <20140213104823.GC5160@quack.suse.cz> References: <1392101400-14550-1-git-send-email-jiel@marvell.com> <20140211131927.f180ac29dede51b2733b92a6@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140211131927.f180ac29dede51b2733b92a6@linux-foundation.org> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue 11-02-14 13:19:27, Andrew Morton wrote: > On Tue, 11 Feb 2014 14:50:00 +0800 wrote: > > > From: Jane Li > > > > This patch tries to fix a warning about possible circular locking > > dependency. > > > > If do in following sequence: > > enter suspend -> resume -> plug-out CPUx (echo 0 > cpux/online) > > lockdep will show warning as following: > > > > ====================================================== > > [ INFO: possible circular locking dependency detected ] > > 3.10.0 #2 Tainted: G O > > ------------------------------------------------------- > > sh/1271 is trying to acquire lock: > > (console_lock){+.+.+.}, at: [] console_cpu_notify+0x20/0x2c > > but task is already holding lock: > > (cpu_hotplug.lock){+.+.+.}, at: [] cpu_hotplug_begin+0x2c/0x58 > > which lock already depends on the new lock. > > > > the existing dependency chain (in reverse order) is: > > -> #2 (cpu_hotplug.lock){+.+.+.}: > > [] lock_acquire+0x98/0x12c > > [] mutex_lock_nested+0x50/0x3d8 > > [] cpu_hotplug_begin+0x2c/0x58 > > [] _cpu_up+0x24/0x154 > > [] cpu_up+0x64/0x84 > > [] smp_init+0x9c/0xd4 > > [] kernel_init_freeable+0x78/0x1c8 > > [] kernel_init+0x8/0xe4 > > [] ret_from_fork+0x14/0x2c > > > > -> #1 (cpu_add_remove_lock){+.+.+.}: > > [] lock_acquire+0x98/0x12c > > [] mutex_lock_nested+0x50/0x3d8 > > [] disable_nonboot_cpus+0x8/0xe8 > > [] suspend_devices_and_enter+0x214/0x448 > > [] pm_suspend+0x1e4/0x284 > > [] try_to_suspend+0xa4/0xbc > > [] process_one_work+0x1c4/0x4fc > > [] worker_thread+0x138/0x37c > > [] kthread+0xa4/0xb0 > > [] ret_from_fork+0x14/0x2c > > > > -> #0 (console_lock){+.+.+.}: > > [] __lock_acquire+0x1b38/0x1b80 > > [] lock_acquire+0x98/0x12c > > [] console_lock+0x54/0x68 > > [] console_cpu_notify+0x20/0x2c > > [] notifier_call_chain+0x44/0x84 > > [] __cpu_notify+0x2c/0x48 > > [] cpu_notify_nofail+0x8/0x14 > > [] _cpu_down+0xf4/0x258 > > [] cpu_down+0x24/0x40 > > [] store_online+0x30/0x74 > > [] dev_attr_store+0x18/0x24 > > [] sysfs_write_file+0x16c/0x19c > > [] vfs_write+0xb4/0x190 > > [] SyS_write+0x3c/0x70 > > [] ret_fast_syscall+0x0/0x48 > > > > Chain exists of: > > console_lock --> cpu_add_remove_lock --> cpu_hotplug.lock > > > > Possible unsafe locking scenario: > > CPU0 CPU1 > > ---- ---- > > lock(cpu_hotplug.lock); > > lock(cpu_add_remove_lock); > > lock(cpu_hotplug.lock); > > lock(console_lock); > > *** DEADLOCK *** > > These traces hurt my brain. > > > There are three locks involved in two sequence: > > a) pm suspend: > > console_lock (@suspend_console()) > > cpu_add_remove_lock (@disable_nonboot_cpus()) > > cpu_hotplug.lock (@_cpu_down()) > > But but but. suspend_console() releases console_sem again. So the > sequence is actually > > down(&console_sem) (@suspend_console()) > up(&console_sem) (@suspend_console()) > cpu_add_remove_lock (@disable_nonboot_cpus()) > cpu_hotplug.lock (@_cpu_down()) > > So console_sem *doesn't* nest outside cpu_add_remove_lock and > cpu_hotplug.lock. Exactly. My take would be that the lockdep annotation of console_sem is just missing mutex_release(&console_lock_dep_map, 1, _RET_IP_); in suspend_console() and similar counterpart in resume_console(). We are doing the annotation by hand and apparently this got missed. ... > > +void console_flush(void) > > +{ > > + if (console_trylock()) > > + console_unlock(); > > +} > > + > > +/** > > * console_cpu_notify - print deferred console messages after CPU hotplug > > * @self: notifier struct > > * @action: CPU hotplug event > > @@ -1911,8 +1925,7 @@ static int console_cpu_notify(struct notifier_block *self, > > case CPU_DEAD: > > case CPU_DOWN_FAILED: > > case CPU_UP_CANCELED: > > - console_lock(); > > - console_unlock(); > > + console_flush(); > > } > > return NOTIFY_OK; > > Well, this is a bit hacky and makes the already-far-too-complex code > even more complex. If it is indeed the case that the deadlock cannot > really occur then let's try to find a way of suppressing the lockdep > warning without making runtime changes. > > What I'm struggling with is what *should* the ranking of these locks be? > From a conceptual high-level design standpoint, which is the > "innermost" lock? I tend to think that it is console_lock, because > blocking CPU hotplug is a quite high-level operation. > > But console_lock is such a kooky special-case in the way it is used to > control the printk corking that it is hard to take general rules and > apply them here. Currently I think it should be pretty much the innermost lock for anything except for console driver special locks. Honza -- Jan Kara SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/