Date: Mon, 5 Nov 2012 11:17:58 -0800 (PST)
From: Hugh Dickins
To: Sasha Levin
cc: Daniel Vetter, Alan Cox, Sasha Levin, Greg Kroah-Hartman, Jiri Slaby,
    linux-kernel@vger.kernel.org, Dave Jones, linux-fbdev@vger.kernel.org,
    florianSchandinat@gmx.de
Subject: Re: tty, vt: lockdep warnings
In-Reply-To: <5097FEA9.2090603@oracle.com>
References: <50899507.1040900@oracle.com> <20121026143754.50277bd8@pyramind.ukuu.org.uk>
    <20121105175937.26f31d2a@pyramind.ukuu.org.uk> <5097FEA9.2090603@oracle.com>

On Mon, 5 Nov 2012, Sasha Levin wrote:
> On 11/05/2012 12:59 PM, Alan Cox wrote:
> > On Mon, 5 Nov 2012 12:26:43 -0500
> > Sasha Levin wrote:
> >
> >> Ping? Should I bisect it?
> >>
> >> On Fri, Oct 26, 2012 at 9:37 AM, Alan Cox wrote:
> >>> On Thu, 25 Oct 2012 15:37:43 -0400
> >>> Sasha Levin wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> While fuzzing with trinity inside a KVM tools (lkvm) guest running latest -next kernel,
> >>>> I've stumbled on the following spew:
> >>>
> >>> Looks real enough but it's not a tty/vt layer spew. This is all coming out
> >>> of the core framebuffer code which doesn't seem to be able to decide what
> >>> the locking rules at the invocation of fb_notifier_call_chain are.
> >>>
> >>> It might need some console layer tweaking to provide 'register console
> >>> and I already hold the locks' or similar but that notifier needs some
> >>> kind of sanity applying as well.
> >>>
> >>> Cc'ing the fbdev folks
> >
> > I've cc'd the framebuffer folks. I can see why it's occurring but I have
> > no idea how they intend to fix it and I've not seen any replies.
> >
> > Sorry but I've got enough other things on my plate right now without
> > trying to deal with the locking brain damage that the fbdev layer is.
> >
> > As far as I can tell the actual bug proper is years old.
> >
> > Alan
> >
>
> Ow, I figured it's something new since I've only now started seeing it in fuzz
> tests, and it reproduces pretty much every time.

The fbdev potential for deadlock may be years old, but the warning
(and consequent disabling of lockdep from that point on - making it
useless to everybody else in need of it) is new, and comes from the
commit below in linux-next.

I revert it in my own testing: if there is no quick fix to the
fbdev issue on the way, Daniel, please revert it from your tree.

Thanks,
Hugh

commit daee779718a319ff9f83e1ba3339334ac650bb22
Author: Daniel Vetter
Date:   Sat Sep 22 19:52:11 2012 +0200

    console: implement lockdep support for console_lock

    Dave Airlie recently discovered a locking bug in the fbcon layer,
    where a timer_del_sync (for the blinking cursor) deadlocks with the
    timer itself, since both (want to) hold the console_lock:

    https://lkml.org/lkml/2012/8/21/36

    Unfortunately the console_lock isn't a plain mutex and hence has no
    lockdep support. Which resulted in a few days wasted of tracking down
    this bug (complicated by the fact that printk doesn't show anything
    when the console is locked) instead of noticing the bug much earlier
    with the lockdep splat.

    Hence I've figured I need to fix that for the next deadlock involving
    console_lock - and with kms/drm growing ever more complex locking
    that'll eventually happen.

    Now the console_lock has rather funky semantics, so after a quick irc
    discussion with Thomas Gleixner and Dave Airlie I've quickly ditched
    the original idea of switching to a real mutex (since it won't work)
    and instead opted to annotate the console_lock with lockdep
    information manually.

    There are a few special cases:
    - The console_lock state is protected by the console_sem, and usually
      grabbed/dropped at _lock/_unlock time. But the suspend/resume code
      drops the semaphore without dropping the console_lock (see
      suspend_console/resume_console). But since the same thread that did
      the suspend will do the resume, we don't need to fix up anything.
    - In the printk code there's a special trylock, only used to kick off
      the logbuffer printk'ing in console_unlock. But all that happens
      while lockdep is disabled (since printk does a few other evil
      tricks). So no issue there, either.
    - The console_lock can also be acquired from irq context (but only
      with a trylock). lockdep already handles that.

    This all leaves us with annotating the normal console_lock, _unlock
    and _trylock functions.

    And yes, it works - simply unloading a drm kms driver resulted in
    lockdep complaining about the deadlock in fbcon_deinit:

    ======================================================
    [ INFO: possible circular locking dependency detected ]
    3.6.0-rc2+ #552 Not tainted
    -------------------------------------------------------
    kms-reload/3577 is trying to acquire lock:
     ((&info->queue)){+.+...}, at: [] wait_on_work+0x0/0xa7

    but task is already holding lock:
     (console_lock){+.+.+.}, at: [] bind_con_driver+0x38/0x263

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (console_lock){+.+.+.}:
           [] lock_acquire+0x95/0x105
           [] console_lock+0x59/0x5b
           [] fb_flashcursor+0x2e/0x12c
           [] process_one_work+0x1d9/0x3b4
           [] worker_thread+0x1a7/0x24b
           [] kthread+0x7f/0x87
           [] kernel_thread_helper+0x4/0x10

    -> #0 ((&info->queue)){+.+...}:
           [] __lock_acquire+0x999/0xcf6
           [] lock_acquire+0x95/0x105
           [] wait_on_work+0x3b/0xa7
           [] __cancel_work_timer+0xbf/0x102
           [] cancel_work_sync+0xb/0xd
           [] fbcon_deinit+0x11c/0x1dc
           [] bind_con_driver+0x145/0x263
           [] unbind_con_driver+0x14f/0x195
           [] store_bind+0x1ad/0x1c1
           [] dev_attr_store+0x13/0x1f
           [] sysfs_write_file+0xe9/0x121
           [] vfs_write+0x9b/0xfd
           [] sys_write+0x3e/0x6b
           [] system_call_fastpath+0x16/0x1b

    other info that might help us debug this:

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(console_lock);
                                   lock((&info->queue));
                                   lock(console_lock);
      lock((&info->queue));

     *** DEADLOCK ***

    v2: Mark the lockdep_map static, noticed by Jani Nikula.

    Cc: Dave Airlie
    Cc: Thomas Gleixner
    Cc: Alan Cox
    Cc: Peter Zijlstra
    Signed-off-by: Daniel Vetter
    Signed-off-by: Greg Kroah-Hartman

diff --git a/kernel/printk.c b/kernel/printk.c
index 2d607f4..ee79f14 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -87,6 +87,12 @@ static DEFINE_SEMAPHORE(console_sem);
 struct console *console_drivers;
 EXPORT_SYMBOL_GPL(console_drivers);
 
+#ifdef CONFIG_LOCKDEP
+static struct lockdep_map console_lock_dep_map = {
+	.name = "console_lock"
+};
+#endif
+
 /*
  * This is used for debugging the mess that is the VT code by
  * keeping track if we have the console semaphore held. It's
@@ -1914,6 +1920,7 @@ void console_lock(void)
 		return;
 	console_locked = 1;
 	console_may_schedule = 1;
+	mutex_acquire(&console_lock_dep_map, 0, 0, _RET_IP_);
 }
 EXPORT_SYMBOL(console_lock);
 
@@ -1935,6 +1942,7 @@ int console_trylock(void)
 	}
 	console_locked = 1;
 	console_may_schedule = 0;
+	mutex_acquire(&console_lock_dep_map, 0, 1, _RET_IP_);
 	return 1;
 }
 EXPORT_SYMBOL(console_trylock);
@@ -2095,6 +2103,7 @@ skip:
 		local_irq_restore(flags);
 	}
 	console_locked = 0;
+	mutex_release(&console_lock_dep_map, 1, _RET_IP_);
 
 	/* Release the exclusive_console once it is used */
 	if (unlikely(exclusive_console))
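
For anyone reading the splat above without lockdep background, here is a
toy user-space sketch of the kind of check that produces a "possible
circular locking dependency" report. It is not lockdep's implementation,
only an illustration under simplifying assumptions: two lock classes, no
contexts or irq states, and every name in it (lock_class, dep,
record_dependency, reachable) is made up for this example. It records
"B was acquired while A was held" edges and refuses an edge that would
close a cycle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes, named after the two locks in the splat. */
enum lock_class { CONSOLE_LOCK, INFO_QUEUE_WORK, NUM_CLASSES };

/* dep[a][b] is true once "b was acquired while a was held" was recorded. */
bool dep[NUM_CLASSES][NUM_CLASSES];

/* Depth-first search: is `to` reachable from `from` along recorded edges? */
bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NUM_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record that `acquiring` is taken while `held` is held.
 * Returns false, and records nothing, if `held` is already reachable
 * from `acquiring`, because the new edge would then close a cycle.
 */
bool record_dependency(int held, int acquiring)
{
	bool seen[NUM_CLASSES] = { false };

	assert(held >= 0 && held < NUM_CLASSES);
	assert(acquiring >= 0 && acquiring < NUM_CLASSES);
	if (reachable(acquiring, held, seen))
		return false;
	dep[held][acquiring] = true;
	return true;
}
```

In terms of the report: chain entry #1 (fb_flashcursor running as a work
item and taking console_lock) corresponds to
record_dependency(INFO_QUEUE_WORK, CONSOLE_LOCK), which is accepted.
Chain entry #0 (fbcon_deinit waiting on that work while holding
console_lock) corresponds to
record_dependency(CONSOLE_LOCK, INFO_QUEUE_WORK), which is rejected
because the reverse edge already exists.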