Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757772AbYFCMWi (ORCPT ); Tue, 3 Jun 2008 08:22:38 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1755282AbYFCMWb (ORCPT ); Tue, 3 Jun 2008 08:22:31 -0400 Received: from an-out-0708.google.com ([209.85.132.241]:34299 "EHLO an-out-0708.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754201AbYFCMWa (ORCPT ); Tue, 3 Jun 2008 08:22:30 -0400 DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=XBWPc/bM6J3hvNySmhIMjDhQJ5RAluyXZAlf1Tspj6UL+8TKq91VqcU4yRGBCABrngPg1+F5E97IFhNZfOo3sZduy6DCJAkZLrHuuTjNO1Pof3kEH6AGJK733nu+h/FWrOuh5srNPUFzGoiLvjItnhQ97EeyCnrUiaiClvFGYC8= Message-ID: Date: Tue, 3 Jun 2008 09:22:28 -0300 From: "Kevin Winchester" To: "Peter Zijlstra" Subject: Re: linux-next: WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151() Cc: "Ingo Molnar" , "Linux Kernel Mailing List" In-Reply-To: <1212483192.6304.8.camel@twins> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <4844868F.20104@gmail.com> <1212483192.6304.8.camel@twins> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4616 Lines: 101 On Tue, Jun 3, 2008 at 5:53 AM, Peter Zijlstra wrote: > On Mon, 2008-06-02 at 20:47 -0300, Kevin Winchester wrote: >> In next-20080530 and next-20080602 (and possibly earlier - I can't >> remember the linux-next tree before that I tried) I get the following: >> >> [ 12.885153] ------------[ cut here ]------------ >> [ 12.885203] WARNING: at kernel/lockdep.c:2680 check_flags+0x98/0x151() >> [ 12.885248] Pid: 4, comm: watchdog/0 Not tainted >> 2.6.26-rc4-next-20080602 #13 >> [ 12.885292] >> [ 12.885293] Call Trace: >> [ 12.885364] [] warn_on_slowpath+0x58/0x8a >> [ 12.885410] [] ? _spin_unlock_irqrestore+0x51/0x6d >> [ 12.885455] [] ? debug_locks_off+0x9/0x3c >> [ 12.885498] [] ? ftrace_record_ip+0x1fa/0x272 >> [ 12.885542] [] ? watchdog+0xc5/0x1ff >> [ 12.885584] [] ? mcount_call+0x5/0x35 >> [ 12.885627] [] ? watchdog+0xc5/0x1ff >> [ 12.885668] [] check_flags+0x98/0x151 >> [ 12.885710] [] lock_acquire+0x4a/0xa9 >> [ 12.885753] [] ? watchdog+0xc5/0x1ff >> [ 12.885795] [] ? watchdog+0x0/0x1ff >> [ 12.885837] [] _read_lock+0x37/0x43 >> [ 12.885879] [] watchdog+0xc5/0x1ff >> [ 12.885921] [] ? watchdog+0x0/0x1ff >> [ 12.885963] [] kthread+0x4e/0x7b >> [ 12.886005] [] child_rip+0xa/0x12 >> [ 12.886046] [] ? finish_task_switch+0x57/0x92 >> [ 12.886090] [] ? _spin_unlock_irq+0x3b/0x57 >> [ 12.886133] [] ? restore_args+0x0/0x30 >> [ 12.886137] [] ? kthread+0x0/0x7b >> [ 12.886137] [] ? child_rip+0x0/0x12 >> [ 12.886137] >> [ 12.886137] ---[ end trace 60e7f098a6913839 ]--- >> [ 12.886137] possible reason: unannotated irqs-on. >> [ 12.886137] irq event stamp: 20 >> [ 12.886137] hardirqs last enabled at (19): [] >> trace_hardirqs_on+0xd/0xf >> [ 12.886137] hardirqs last disabled at (20): [] >> trace_hardirqs_off+0xd/0xf >> [ 12.886137] softirqs last enabled at (0): [] >> copy_process+0x2da/0x109e >> [ 12.886137] softirqs last disabled at (0): [<0000000000000000>] 0x0 >> >> Do I understand this correctly that something enabled irqs in a way that >> got around lockdep? I assume the problem is not in watchdog, just that >> the watchdog was the next thing to run that interacted with irqs and >> thus lockdep picked up the situation then? >> >> Is there additional debugging I can do, given some instructions? If the >> cause is readily apparent to anyone, could they let me know (for my own >> interest) why it is apparent so that I can investigate things like this >> further next time? > > > You are correct - someone did sti but failed to call > trace_hardirqs_on(). Frequently its possible to isolate the code from > knowing the last recorded event: > > [ 12.886137] irq event stamp: 20 > [ 12.886137] hardirqs last disabled at (20): [] trace_hardirqs_off+0xd/0xf > > However your compilation seems to have lost the caller IP: > > void trace_hardirqs_off(void) > { > trace_hardirqs_off_caller(CALLER_ADDR0); > } > EXPORT_SYMBOL(trace_hardirqs_off); > > So that is of little help here. (Not sure how that happened, nor how you > could fix that - perhaps turn on some debugging knobs like build with > debug info etc..) > Could this be related to CONFIG_CC_OPTIMIZE_INLINING (or whatever it is called now)? I am pretty sure I have that enabled. I will check the rest of my config this evening to see if there is anything else I can turn on/off to help. > >> This is completely reproducible on every boot - should I try to bisect it? > > That is a possibility yes - if you can find the offending patch it > should be relatively straight forward to find the offending sti. > Sure - I can try it this evening. Also, is calling sti the only way this could have happened? And is linux-next broken out into a single patch? I wouldn't expect there to be too many calls to sti, so a grep in the patch file might be quicker than bisection. Thanks, -- Kevin Winchester -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/