Date: Tue, 14 Apr 2009 15:23:19 +0200
From: Ingo Molnar
To: Ed Tomlinson
Cc: Thomas Gleixner, Peter Zijlstra, linux-kernel@vger.kernel.org
Subject: Re: 2.6.30-rc1 - nmi_watchdog broken?
Message-ID: <20090414132319.GB4403@elte.hu>
References: <200904140000.30155.edt@aei.ca> <20090414084232.GG27003@elte.hu> <200904140829.21493.edt@aei.ca>
In-Reply-To: <200904140829.21493.edt@aei.ca>

* Ed Tomlinson wrote:

> On Tuesday 14 April 2009 04:42:32 Ingo Molnar wrote:
> >
> > * Ed Tomlinson wrote:
> >
> > > Hi,
> > >
> > > I've been having fun finding bugs in 30-rc1.  One of them is a
> > > hard freeze.  I've not seen this type of problem on this hardware
> > > before 30-rc1 - so I doubt it's hardware.  The best way I know
> > > to debug a hard hang is with the nmi_watchdog.  I just cannot get
> > > it to work.
> >
> > [ Btw., have you tried CONFIG_PROVE_LOCKING=y - does it produce
> >   anything before or at the hard lockup point? ]
> >
> > > The system is a 3 core AMD cpu on a 790gx chipset.
> > >
> > > If I boot with nmi_watchdog=1 it complains that lapic is not
> > > available and the boot stops.  Same problem if I change the
> > > clocksource to tsc.  If I disable highres timers it panics.  If I
> > > use nmi_watchdog=2 it panics.  Am I doing something wrong or have I
> > > hit a bug?
> > >
> > > Logs of boots with and without highres timers inlined below.
> >
> > hm, nmi_watchdog=1 acting funny is not unheard of. But
> > nmi_watchdog=2 should really work. How does it panic, do
> > you have a capture of that?
>
> I had not tried nmi_watchdog=2 highres=off.  This works.  Looks
> like there is a conflict between highres timers and nmi_watchdog
> here.

yes. Both use a limited resource of the lapic so we get one or the
other. ( Might be fixable once we migrate the NMI watchdog code over
to perfcounters. )

	Ingo
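
For anyone reproducing the working combination reported in this thread (nmi_watchdog=2 highres=off), a rough way to confirm that the watchdog is actually firing is to check the boot parameters and then watch the NMI row of /proc/interrupts grow. The sketch below is only an illustration and not something posted in the thread: the helper names (parse_cmdline, nmi_counts, watchdog_ticking) are made up for this example, and it assumes the usual layout of the NMI row (the label "NMI:" followed by one counter per CPU).

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as "ro" map to None.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters from /proc/interrupts-style text.

    Assumes the row starts with "NMI:" and may end with a text
    description, which is skipped. Returns [] if there is no NMI row.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []


def watchdog_ticking(before, after):
    """True if every CPU's NMI counter advanced between two samples."""
    return (
        bool(before)
        and len(before) == len(after)
        and all(b < a for b, a in zip(before, after))
    )
```

To use it on a live system, read /proc/cmdline once, then read /proc/interrupts twice some seconds apart and pass the two nmi_counts() results to watchdog_ticking(). Treat a False result with care: a mostly idle CPU may advance its counter slowly, so a longer sampling interval or some load may be needed before concluding that the watchdog is dead.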