Date: Tue, 2 Nov 2004 20:49:15 +0100
From: Ingo Molnar
To: Mark_H_Johnson@raytheon.com
Cc: Thomas Gleixner, Florian Schmidt, Lee Revell, Paul Davis, LKML,
    Bill Huey, Adam Heath, Michal Schmidt,
    Fernando Pablo Lopez-Lezcano, Karsten Wiese, jackit-devel,
    Rui Nuno Capela, "K.R. Foley"
Subject: Re: [patch] Real-Time Preemption, -RT-2.6.9-mm1-V0.6.8
Message-ID: <20041102194915.GC3053@elte.hu>

* Mark_H_Johnson@raytheon.com wrote:

> NMI Watchdog detected LOCKUP
> Pid: 3933, comm: cpu_burn
> EIP: 0073:[<08048340>] CPU: 0
> EIP is at 0x8048340
> ESP: 007b:bffffa40 EFLAGS: 00200282 Not tainted
> (2.6.9-mm1-RT-V0.6.7)
> EAX: 00000000 EBX: 00711ffc ECX: bffffadc EDX: bffffad4
> ESI: 00000001 EDI: 007140fc EBP: bffffa48 DS: 007b ES: 007b
> CR0: 8005003b CR2: 00681400 CR3: 015735e0 CR4: 000006f0
> [] show_regs+0x14c/0x174 (36)
> [] nmi_watchdog_tick+0x12f/0x140 (28)
> [] default_do_nmi+0x6c/0x110 (96)
> [] do_nmi+0x6d/0x70 (24)
> [] nmi_stack_correct+0x1e/0x2e (-196314476)

hm, this one is an extremely weird deadlock - the NMI watchdog detected a
_user-space_ deadlock - i.e. the "cpu_burn" user-space code disabled
interrupts for more than ~5 seconds? Sounds quite unlikely and the
EFLAGS register also directly contradicts it, it has 0x200 set so
interrupts are enabled! The only other way for the NMI watchdog to
trigger is if for whatever reason the local APIC timer interrupts are
not getting through and the NMI ticks (which come via a different
interrupt pin) get through.

this is what's happening on the other CPU:

> [] die_nmi+0x5f/0xa0 (24)
> [] nmi_watchdog_tick+0xef/0x140 (28)
> [] default_do_nmi+0x6c/0x110 (96)
> [] do_nmi+0x6d/0x70 (24)
> [] nmi_stack_correct+0x1e/0x2e (116)
> [] __mcount+0x1d/0x30 (16)
> [] mcount+0x14/0x18 (20)
> [] _spin_lock+0x11/0x70 (20)
> [] _down_write_trylock+0x58/0x290 (52)
> [] down_trylock+0x45/0x180 (52)
> [] vprintk+0xf5/0x170 (36)
> [] printk+0x1d/0x30 (16)
> [] show_trace+0x95/0xe0 (32)
> [] dump_stack+0x23/0x30 (20)
> [] check_preempt_timing+0x16e/0x300 (76)
> [] sub_preempt_count+0x7f/0xf0 (32)
> [] flush_tlb_mm+0x5a/0x110 (36)

flush_tlb_mm sends an IPI to the other CPU - maybe there's a connection.

	Ingo
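
The EFLAGS and user-space observations in the reply above can be checked by hand from the quoted register dump. The following is only an illustrative sketch, not part of the original message: the helper names (decode_eflags, interrupts_enabled, is_user_mode) are made up for this example, the bit positions are the architectural x86 ones, and the two input values (EFLAGS 00200282, code segment selector 0073) are copied from the quoted dump.

```python
# Illustrative helpers (hypothetical names, not from the original email).
# Bit positions are the architectural x86 EFLAGS status/control flags.
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
    8: "TF", 9: "IF", 10: "DF", 11: "OF", 21: "ID",
}

IF_MASK = 0x200  # bit 9: interrupt enable flag


def decode_eflags(value):
    """Return the names of the known flags set in an EFLAGS value,
    ordered from the lowest bit to the highest."""
    return [name for bit, name in sorted(EFLAGS_BITS.items())
            if value & (1 << bit)]


def interrupts_enabled(eflags):
    """True if the IF bit (0x200) is set, i.e. maskable interrupts are on."""
    return bool(eflags & IF_MASK)


def is_user_mode(cs_selector):
    """True if the low two bits of the CS selector (the privilege level)
    equal 3, i.e. the CPU was running ring-3 code."""
    return (cs_selector & 3) == 3


if __name__ == "__main__":
    eflags = 0x00200282  # "EFLAGS: 00200282" from the quoted dump
    cs = 0x0073          # selector part of "EIP: 0073:[<08048340>]"
    print("flags set:", decode_eflags(eflags))
    print("interrupts enabled:", interrupts_enabled(eflags))
    print("user mode:", is_user_mode(cs))
```

With the values from the dump, the IF bit is set and the selector's privilege level is 3, which matches the reading that the watchdog fired while user-space code was running with interrupts enabled.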