Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933874AbZARPjx (ORCPT ); Sun, 18 Jan 2009 10:39:53 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1765577AbZARPjm (ORCPT ); Sun, 18 Jan 2009 10:39:42 -0500 Received: from bombadil.infradead.org ([18.85.46.34]:44513 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1765506AbZARPjl (ORCPT ); Sun, 18 Jan 2009 10:39:41 -0500 Subject: Re: [2.6.29-rc2] Inconsistent lock state on resume in hres_timers_resume From: Peter Zijlstra To: Andrey Borzenkov Cc: linux-kernel@vger.kernel.org, Thomas Gleixner , Ingo Molnar , "Rafael J. Wysocki" In-Reply-To: <200901181642.00886.arvidjaar@mail.ru> References: <200901181642.00886.arvidjaar@mail.ru> Content-Type: text/plain Date: Sun, 18 Jan 2009 16:39:29 +0100 Message-Id: <1232293169.5204.14.camel@laptop> Mime-Version: 1.0 X-Mailer: Evolution 2.24.2 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5439 Lines: 114 On Sun, 2009-01-18 at 16:41 +0300, Andrey Borzenkov wrote: > [17854.688347] ================================= > [17854.688347] [ INFO: inconsistent lock state ] > [17854.688347] 2.6.29-rc2-1avb #1 > [17854.688347] --------------------------------- > [17854.688347] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage. > [17854.688347] pm-suspend/18240 [HC0[0]:SC0[0]:HE1:SE1] takes: > [17854.688347] (&cpu_base->lock){++..}, at: [] retrigger_next_event+0x5c/0xa0 > [17854.688347] {in-hardirq-W} state was registered at: > [17854.688347] [] __lock_acquire+0x79d/0x1930 > [17854.688347] [] lock_acquire+0x5c/0x80 > [17854.688347] [] _spin_lock+0x35/0x70 > [17854.688347] [] hrtimer_run_queues+0x31/0x140 > [17854.688347] [] run_local_timers+0x8/0x20 > [17854.688347] [] update_process_times+0x23/0x60 > [17854.688347] [] tick_periodic+0x24/0x80 > [17854.688347] [] tick_handle_periodic+0x12/0x70 > [17854.688347] [] timer_interrupt+0x14/0x20 > [17854.688347] [] handle_IRQ_event+0x29/0x60 > [17854.688347] [] handle_level_irq+0x69/0xe0 > [17854.688347] [] 0xffffffff > [17854.688347] irq event stamp: 55771 > [17854.688347] hardirqs last enabled at (55771): [] _spin_unlock_irqrestore+0x35/0x60 > [17854.688347] hardirqs last disabled at (55770): [] _spin_lock_irqsave+0x19/0x80 > [17854.688347] softirqs last enabled at (54836): [] __do_softirq+0xc4/0x110 > [17854.688347] softirqs last disabled at (54831): [] do_softirq+0x8e/0xe0 > [17854.688347] > [17854.688347] other info that might help us debug this: > [17854.688347] 3 locks held by pm-suspend/18240: > [17854.688347] #0: (&buffer->mutex){--..}, at: [] sysfs_write_file+0x25/0x100 > [17854.688347] #1: (pm_mutex){--..}, at: [] enter_state+0x4f/0x140 > [17854.688347] #2: (dpm_list_mtx){--..}, at: [] device_pm_lock+0xf/0x20 > [17854.688347] > [17854.688347] stack backtrace: > [17854.688347] Pid: 18240, comm: pm-suspend Not tainted 2.6.29-rc2-1avb #1 > [17854.688347] Call Trace: > [17854.688347] [] ? printk+0x18/0x20 > [17854.688347] [] print_usage_bug+0x16c/0x1d0 > [17854.688347] [] mark_lock+0x8bf/0xc90 > [17854.688347] [] ? pit_next_event+0x2f/0x40 > [17854.688347] [] __lock_acquire+0x580/0x1930 > [17854.688347] [] ? _spin_unlock+0x1d/0x20 > [17854.688347] [] ? pit_next_event+0x2f/0x40 > [17854.688347] [] ? clockevents_program_event+0x98/0x160 > [17854.688347] [] ? mark_held_locks+0x48/0x90 > [17854.688347] [] ? _spin_unlock_irqrestore+0x35/0x60 > [17854.688347] [] ? trace_hardirqs_on_caller+0x139/0x190 > [17854.688347] [] ? trace_hardirqs_on+0xb/0x10 > [17854.688347] [] lock_acquire+0x5c/0x80 > [17854.688347] [] ? retrigger_next_event+0x5c/0xa0 > [17854.688347] [] _spin_lock+0x35/0x70 > [17854.688347] [] ? retrigger_next_event+0x5c/0xa0 > [17854.688347] [] retrigger_next_event+0x5c/0xa0 > [17854.688347] [] hres_timers_resume+0xa/0x10 > [17854.688347] [] timekeeping_resume+0xee/0x150 > [17854.688347] [] __sysdev_resume+0x14/0x50 > [17854.688347] [] sysdev_resume+0x47/0x80 > [17854.688347] [] device_power_up+0xb/0x20 > [17854.688347] [] suspend_devices_and_enter+0xcf/0x150 > [17854.688347] [] ? freeze_processes+0x3f/0x90 > [17854.688347] [] enter_state+0xf4/0x140 > [17854.688347] [] state_store+0x7d/0xc0 > [17854.688347] [] ? state_store+0x0/0xc0 > [17854.688347] [] kobj_attr_store+0x24/0x30 > [17854.688347] [] sysfs_write_file+0x9c/0x100 > [17854.688347] [] vfs_write+0x9c/0x160 > [17854.688347] [] ? restore_nocheck_notrace+0x0/0xe > [17854.688347] [] ? sysfs_write_file+0x0/0x100 > [17854.688347] [] sys_write+0x3d/0x70 > [17854.688347] [] sysenter_do_call+0x12/0x31 Not sure what caused this to trigger, but it looks like timekeeping_resume() isn't called with IRQs disabled (and the code doesn't seem to expect that since it uses write_seqlock_irqsave). hres_timers_resume() however calls retrigger_next_event() which does require IRQs disabled and doesn't do that. Like said, I'm not sure what caused this since the code in question doesn't seem to have changed since April 2007. Anyway, does the below patch cure trouble? --- Subject: hrtimer: fix irq recursion deadlock retrigger_next_event() requires IRQs disabled, provide that for the resume callback. Signed-off-by: Peter Zijlstra --- diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c index 2c40ee8..430103d 100644 --- a/kernel/hrtimer.c +++ b/kernel/hrtimer.c @@ -614,8 +614,12 @@ void clock_was_set(void) */ void hres_timers_resume(void) { + unsigned long flags; + /* Retrigger the CPU local events: */ + local_irq_save(flags); retrigger_next_event(NULL); + local_irq_restore(flags); } /* -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/