Date: Sat, 9 Jun 2007 04:27:10 +0200
From: Björn Steinbrink
To: Ingo Molnar
Cc: Andrew Morton, Andi Kleen, "Udo A. Steinberg", Michal Piotrowski,
    Linus Torvalds, LKML, ak@suse.de, dzickus@redhat.com
Subject: [PATCH] i386: Fix the K7 NMI watchdog checkbit
Message-ID: <20070609022710.GA2399@atjola.homenet>
In-Reply-To: <20070608204324.GA25392@elte.hu>
References: <465C2225.2000100@googlemail.com>
    <20070603150246.5151dda6@laptop.hypervisor.org>
    <20070608060244.GA2369@atjola.homenet>
    <20070607234153.09c32b49.akpm@linux-foundation.org>
    <20070608105808.GA10190@elte.hu>
    <20070608184422.GA2204@atjola.homenet>
    <20070608204324.GA25392@elte.hu>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2007.06.08 22:43:25 +0200, Ingo Molnar wrote:
>
> * Björn Steinbrink wrote:
>
> > Anyway, both are bugs and should be fixed. Maybe we're even lucky and
> > it fixes your hang. *fingers crossed*
>
> just to make it clear: the NMI watchdog was working perfectly fine on
> that box (in v2.6.21 and in dozens of kernel releases before that, for
> multiple years) before Andi's cleanup patch. So lets find that bug first
> or revert the cleanups.

Might have been pure luck. ;-)

The culprit seems to be commit b7471c6da94d30d3deadc55986cc38d1ff57f9ca
(from Sep 2006), which introduced the check bit to figure out whether an
NMI was generated by the watchdog timer. While the performance counter
register on K7 is 64 bits wide, the upper 16 bits are reserved, so using
bit 63 as the check bit is wrong. A quick check using /dev/cpu/0/msr
shows that here the upper 16 bits are zero all the time. Chances are
that this is not deterministic and you got a 1 in bit 63 purely by
chance.

Björn


The performance counters on K7 are only 48 bits wide, so using bit 63 to
check if the counter overflowed is wrong. Let's use bit 47 instead.

Signed-off-by: Björn Steinbrink
Cc: Don Zickus
Cc: Andi Kleen
---
diff --git a/arch/i386/kernel/cpu/perfctr-watchdog.c b/arch/i386/kernel/cpu/perfctr-watchdog.c
index 2b04c8f..82c6967 100644
--- a/arch/i386/kernel/cpu/perfctr-watchdog.c
+++ b/arch/i386/kernel/cpu/perfctr-watchdog.c
@@ -294,7 +294,7 @@ static struct wd_ops k7_wd_ops = {
 	.stop = single_msr_stop_watchdog,
 	.perfctr = MSR_K7_PERFCTR0,
 	.evntsel = MSR_K7_EVNTSEL0,
-	.checkbit = 1ULL<<63,
+	.checkbit = 1ULL<<47,
 };
 
 /* Intel Model 6 (PPro+,P2,P3,P-M,Core1) */
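
For illustration only (not part of the patch): a small user-space model
of why bit 47 works as the check bit and bit 63 does not. It assumes the
watchdog arms the counter with a negative tick count and treats an NMI
as its own once the check bit has cleared, and that the reserved upper
16 bits read back as zero, as observed above. All names in it
(K7_COUNTER_BITS, k7_counter_arm, nmi_from_watchdog, ...) are made up
for this sketch and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for this sketch, not taken from the kernel. */
#define K7_COUNTER_BITS 48
#define K7_COUNTER_MASK ((1ULL << K7_COUNTER_BITS) - 1)
#define CHECKBIT_OLD    (1ULL << 63)  /* reserved bit on K7 */
#define CHECKBIT_NEW    (1ULL << 47)  /* top implemented counter bit */

/* Model an MSR read: only the low 48 bits are implemented. The upper
 * 16 bits are ASSUMED to read as zero, matching the observation above. */
static uint64_t k7_counter_read(uint64_t raw)
{
	return raw & K7_COUNTER_MASK;
}

/* Arm the counter with -ticks so it wraps to zero after `ticks` events. */
static uint64_t k7_counter_arm(uint64_t ticks)
{
	/* Keep the start value in the upper half of the 48-bit range. */
	assert(ticks > 0 && ticks < (1ULL << (K7_COUNTER_BITS - 1)));
	return k7_counter_read((uint64_t)0 - ticks);
}

/* Advance the modelled counter by `events`, wrapping at 48 bits. */
static uint64_t k7_counter_advance(uint64_t value, uint64_t events)
{
	return k7_counter_read(value + events);
}

/* The NMI is attributed to the watchdog once the check bit is clear,
 * i.e. the counter has wrapped past zero. */
static bool nmi_from_watchdog(uint64_t value, uint64_t checkbit)
{
	return (value & checkbit) == 0;
}
```

In this model a freshly armed counter has bit 47 set, so
nmi_from_watchdog(k7_counter_arm(1000), CHECKBIT_NEW) is false until the
counter has seen 1000 events. With CHECKBIT_OLD the same call is true
right away, because bit 63 never reflects the counter state.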