Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755050Ab3H2Xp2 (ORCPT );
        Thu, 29 Aug 2013 19:45:28 -0400
Received: from mail-pa0-f45.google.com ([209.85.220.45]:35591 "EHLO
        mail-pa0-f45.google.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1752076Ab3H2Xp0 (ORCPT );
        Thu, 29 Aug 2013 19:45:26 -0400
Message-ID: <521FDD12.7050000@linaro.org>
Date: Thu, 29 Aug 2013 16:45:22 -0700
From: John Stultz
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130803 Thunderbird/17.0.8
MIME-Version: 1.0
To: "Falauto, Gerlando"
CC: "linux-kernel@vger.kernel.org", Thomas Gleixner, Richard Cochran,
        Prarit Bhargava, "Brunck, Holger", "Longchamp, Valentin",
        "Bigler, Stefan"
Subject: Re: kernel deadlock
References: <521F6D06.1040107@keymile.com>
In-Reply-To:
X-Enigmail-Version: 1.5.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4174
Lines: 110

On 08/29/2013 01:56 PM, Falauto, Gerlando wrote:
> Hi everyone,
>
> I ran into the deadlock situation reported at the bottom.
> Actually, on my latest 3.10 kernel for some reason I don't get the
> report (the kernel just hangs for some reason), so it took me quite some
> time to track it down.
>
> Once I figured the trigger to the machine hanging was adjtimex(), I
> reverted everything (between 3.9 to 3.10) that was touching
> kernel/time/timekeeping/timekeeping.c and kernel/time/ntp.c, I double
> checked that indeed the problem was not happening anymore, and finally
> started bisecting, landing on the following offending commit.
> THEN, and ONLY THEN, did I get the &%""?+"% deadlock report.
>
> Do you guys have any ideas what could be wrong and how to fix it?

Thanks for the report! What exactly is your process for reproducing the
issue?

> [ INFO: inconsistent lock state ]
> 3.10.0-04864-g346ecc9-dirty #16 Not tainted
> ---------------------------------
> inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> SAKEY/738 [HC0[0]:SC0[0]:HE1:SE1] takes:
>   (timekeeper_lock){?.-...}, at: [] do_adjtimex+0x64/0xbc
> {IN-HARDIRQ-W} state was registered at:
>   [] __lock_acquire+0xabc/0x1bb8
>   [] lock_acquire+0xa8/0x15c
>   [] _raw_spin_lock_irqsave+0x50/0x64
>   [] do_timer+0x2c/0xa54
>   [] tick_periodic+0x74/0x9c
>   [] tick_handle_periodic+0x18/0x7c
>   [] orion_timer_interrupt+0x24/0x34
>   [] handle_irq_event_percpu+0x5c/0x300
>   [] handle_irq_event+0x3c/0x5c
>   [] handle_level_irq+0x8c/0xe8
>   [] generic_handle_irq+0x30/0x4c
>   [] handle_IRQ+0x30/0x84
>   [] __irq_svc+0x38/0xa0
>   [] calibrate_delay+0x350/0x4e4
>   [] start_kernel+0x23c/0x2c4
>   [<0000803c>] 0x803c
> irq event stamp: 32358
> hardirqs last enabled at (32357): [] ret_fast_syscall+0x24/0x44
> hardirqs last disabled at (32358): [] _raw_spin_lock_irqsave+0x20/0x64
> softirqs last enabled at (32160): [] __do_softirq+0x1b8/0x308
> softirqs last disabled at (32137): [] irq_exit+0xa0/0xd8
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(timekeeper_lock);
>   <Interrupt>
>     lock(timekeeper_lock);
>
>  *** DEADLOCK ***
>
> 1 lock held by SAKEY/738:
>  #0: (timekeeper_lock){?.-...}, at: [] do_adjtimex+0x64/0xbc
>
> stack backtrace:
> CPU: 0 PID: 738 Comm: SAKEY Not tainted 3.10.0-04864-g346ecc9-dirty #16
> [] (unwind_backtrace+0x0/0xf0) from [] (show_stack+0x10/0x14)
> [] (show_stack+0x10/0x14) from [] (print_usage_bug.part.27+0x218/0x280)
> [] (print_usage_bug.part.27+0x218/0x280) from [] (mark_lock+0x538/0x6bc)
> [] (mark_lock+0x538/0x6bc) from [] (mark_held_locks+0x90/0x124)
> [] (mark_held_locks+0x90/0x124) from [] (trace_hardirqs_on_caller+0xa8/0x23c)
> [] (trace_hardirqs_on_caller+0xa8/0x23c) from [] (_raw_spin_unlock_irq+0x24/0x5c)
> [] (_raw_spin_unlock_irq+0x24/0x5c) from [] (__do_adjtimex+0x17c/0x65c)
> [] (__do_adjtimex+0x17c/0x65c) from [] (do_adjtimex+0x84/0xbc)
> [] (do_adjtimex+0x84/0xbc) from [] (SyS_adjtimex+0x50/0xa8)
> [] (SyS_adjtimex+0x50/0xa8) from [] (ret_fast_syscall+0x0/0x44)

Hrmm. So I'm a little confused by the report, as we hold the write lock
on the timekeeper_lock disabling irqs, so I'm not sure I see how the irq
could trigger to cause the deadlock. In fact, all the timekeeper_lock
users save irqs.

Hrmm. I dunno. :( Thomas, you have a guess?

Let me know how you trigger it and I'll see if I can't reproduce it myself.

thanks
-john
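
One possible reading of the backtrace above, offered as an assumption and not a confirmed diagnosis: lockdep complains from trace_hardirqs_on_caller, reached through _raw_spin_unlock_irq inside __do_adjtimex, while timekeeper_lock (taken in do_adjtimex) is still held. If some inner lock is released there with an unconditional irq-enabling unlock, interrupts would come back on under the outer lock even though every timekeeper_lock user saves irqs. The userspace toy model below sketches only that pattern. `toy_lock`, `irqs_enabled` and all helper names are hypothetical stand-ins, not kernel APIs, and the report does not say which lock the inner one is.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a single CPU: one "interrupts enabled" bit. */
static bool irqs_enabled = true;

/* Hypothetical stand-in for a spinlock; not the kernel's spinlock_t. */
struct toy_lock { bool held; };

/* Models spin_lock_irqsave(): remember the old irq state, then disable. */
static bool toy_lock_irqsave(struct toy_lock *l)
{
    bool flags = irqs_enabled;
    irqs_enabled = false;
    assert(!l->held);   /* re-taking a held lock on one CPU would never return */
    l->held = true;
    return flags;
}

/* Models spin_unlock_irqrestore(): put back whatever state was saved. */
static void toy_unlock_irqrestore(struct toy_lock *l, bool flags)
{
    l->held = false;
    irqs_enabled = flags;
}

/* Models spin_lock_irq(): disable irqs without saving the old state. */
static void toy_lock_irq(struct toy_lock *l)
{
    irqs_enabled = false;
    assert(!l->held);
    l->held = true;
}

/* Models spin_unlock_irq(): enable irqs unconditionally. */
static void toy_unlock_irq(struct toy_lock *l)
{
    l->held = false;
    irqs_enabled = true;
}

/* Returns true if irqs end up enabled while the outer lock is still held. */
bool irq_window_with_nested_unlock_irq(void)
{
    struct toy_lock outer = { false };  /* plays the role of timekeeper_lock */
    struct toy_lock inner = { false };  /* hypothetical inner lock */
    irqs_enabled = true;

    bool flags = toy_lock_irqsave(&outer);
    toy_lock_irq(&inner);
    toy_unlock_irq(&inner);             /* irqs are back on here */
    bool window = outer.held && irqs_enabled;
    toy_unlock_irqrestore(&outer, flags);
    return window;
}

/* Same nesting, but the inner lock also saves and restores the irq state. */
bool irq_window_with_nested_irqsave(void)
{
    struct toy_lock outer = { false };
    struct toy_lock inner = { false };
    irqs_enabled = true;

    bool flags = toy_lock_irqsave(&outer);
    bool inner_flags = toy_lock_irqsave(&inner);
    toy_unlock_irqrestore(&inner, inner_flags);  /* restores "disabled" */
    bool window = outer.held && irqs_enabled;
    toy_unlock_irqrestore(&outer, flags);
    return window;
}
```

In this model the first function reports a window in which a timer interrupt could run and try to take the outer lock again, which is the scenario lockdep prints. The second function reports no such window. Whether that is what actually happens on the reporter's tree still needs the reproduction steps asked for above.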