2008-08-14 20:32:33

by Aristeu Rozanski

Subject: [PATCH] perfctr: don't use CCCR_OVF_PMI1 on Pentium 4Ds

Currently, setup_p4_watchdog() uses CCCR_OVF_PMI1 to enable the counter
overflow interrupts to the second logical core. But this bit doesn't work
on Pentium 4 Ds (model 4, stepping 4) and this patch avoids its use on
these processors. Tested on 4 different machines that have this
specific model with success.
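
For readers following along, the selection described above can be sketched as a standalone helper. This is only an illustration under assumptions: p4_ht_cccr_val() is a hypothetical name that does not appear in the patch, and the bit positions are the values I recall from perfctr-watchdog.c, so check them against the tree before relying on them.

```c
#include <stdint.h>

/* Assumed bit layout, as recalled from perfctr-watchdog.c; verify in-tree. */
#define P4_CCCR_OVF_PMI0        (1u << 26)
#define P4_CCCR_OVF_PMI1        (1u << 27)
#define P4_CCCR_ESCR_SELECT(n)  ((uint32_t)(n) << 13)

/*
 * Hypothetical helper (not part of the patch): pick the CCCR value for
 * the second logical CPU, given the CPU model and stepping.
 */
static uint32_t p4_ht_cccr_val(unsigned int model, unsigned int stepping)
{
	uint32_t cccr_val;

	/* Model 4, stepping 4 is the case the patch special-cases. */
	if (model == 4 && stepping == 4)
		cccr_val = P4_CCCR_OVF_PMI0;
	else
		cccr_val = P4_CCCR_OVF_PMI1;

	return cccr_val | P4_CCCR_ESCR_SELECT(4);
}
```

In the kernel the two inputs would come from boot_cpu_data.x86_model and boot_cpu_data.x86_mask, as in the patch below.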

Signed-off-by: Aristeu Rozanski <[email protected]>

---
arch/x86/kernel/cpu/perfctr-watchdog.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

--- linus-2.6.orig/arch/x86/kernel/cpu/perfctr-watchdog.c 2008-08-12 11:13:35.000000000 -0400
+++ linus-2.6/arch/x86/kernel/cpu/perfctr-watchdog.c 2008-08-12 11:21:09.000000000 -0400
@@ -478,7 +478,13 @@ static int setup_p4_watchdog(unsigned nm
 		perfctr_msr = MSR_P4_IQ_PERFCTR1;
 		evntsel_msr = MSR_P4_CRU_ESCR0;
 		cccr_msr = MSR_P4_IQ_CCCR1;
-		cccr_val = P4_CCCR_OVF_PMI1 | P4_CCCR_ESCR_SELECT(4);
+
+		/* Pentium 4 D processors don't support P4_CCCR_OVF_PMI1 */
+		if (boot_cpu_data.x86_model == 4 && boot_cpu_data.x86_mask == 4)
+			cccr_val = P4_CCCR_OVF_PMI0;
+		else
+			cccr_val = P4_CCCR_OVF_PMI1;
+		cccr_val |= P4_CCCR_ESCR_SELECT(4);
 	}

 	evntsel = P4_ESCR_EVENT_SELECT(0x3F)


2008-08-15 11:59:46

by Ingo Molnar

Subject: Re: [PATCH] perfctr: don't use CCCR_OVF_PMI1 on Pentium 4Ds


* Aristeu Rozanski <[email protected]> wrote:

> Currently, setup_p4_watchdog() uses CCCR_OVF_PMI1 to enable the counter
> overflow interrupts to the second logical core. But this bit doesn't
> work on Pentium 4 Ds (model 4, stepping 4) and this patch avoids its
> use on these processors. [...]

btw., what was the effect - an oops on bootup, or a non-working
watchdog?

> [...] Tested on 4 different machines that have this specific model
> with success.
>
> Signed-off-by: Aristeu Rozanski <[email protected]>

applied to tip/x86/urgent - thanks Aristeu.

Ingo

2008-08-15 12:16:40

by Aristeu Rozanski

Subject: Re: [PATCH] perfctr: don't use CCCR_OVF_PMI1 on Pentium 4Ds

Hi Ingo,
> > Currently, setup_p4_watchdog() uses CCCR_OVF_PMI1 to enable the counter
> > overflow interrupts to the second logical core. But this bit doesn't
> > work on Pentium 4 Ds (model 4, stepping 4) and this patch avoids its
> > use on these processors. [...]
>
> btw., what was the effect - an oops on bootup, or a non-working
> watchdog?
it just won't work at boot time - the second logical unit will be stuck:

Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 5586.12 BogoMIPS (lpj=2793063)
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU1: Thermal monitoring enabled (TM1)
Intel(R) Pentium(R) D CPU 2.80GHz stepping 04
Brought up 2 CPUs
testing NMI watchdog ... <4>WARNING: CPU#1: NMI appears to be stuck (0->0)!
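
The "(0->0)" in that warning is the NMI count before and after the test window. A minimal sketch of the decision, assuming the "<= 5" threshold used by check_nmi_watchdog(); nmi_looks_stuck() and NMI_MIN_TICKS are hypothetical names for illustration, not kernel symbols:

```c
#include <stdbool.h>

/* Assumed threshold: the check treats 5 or fewer new NMIs as "stuck". */
#define NMI_MIN_TICKS 5u

/*
 * Hypothetical helper: report whether a CPU's NMI counter failed to
 * advance enough during the test window. Unsigned subtraction keeps
 * the comparison well defined even if the counter wraps.
 */
static bool nmi_looks_stuck(unsigned int prev_count, unsigned int cur_count)
{
	return cur_count - prev_count <= NMI_MIN_TICKS;
}
```

With the counts from the log above (0 before, 0 after), the helper returns true, which corresponds to the watchdog being disabled for CPU#1.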

--
Aristeu

2008-08-15 12:36:35

by Aristeu Rozanski

Subject: [PATCH] NMI: fix watchdog failure message

> it just won't work at boot time - the second logical unit will be stuck:
>
> Booting processor 1/2 APIC 0x1
> Initializing CPU#1
> Calibrating delay using timer specific routine.. 5586.12 BogoMIPS (lpj=2793063)
> CPU: Trace cache: 12K uops, L1 D cache: 16K
> CPU: L2 cache: 1024K
> CPU: Physical Processor ID: 0
> CPU: Processor Core ID: 1
> CPU1: Thermal monitoring enabled (TM1)
> Intel(R) Pentium(R) D CPU 2.80GHz stepping 04
> Brought up 2 CPUs
> testing NMI watchdog ... <4>WARNING: CPU#1: NMI appears to be stuck (0->0)!
while at it...

Signed-off-by: Aristeu Rozanski <[email protected]>

---
arch/x86/kernel/nmi.c | 4 ++++
1 file changed, 4 insertions(+)

--- linus-2.6.orig/arch/x86/kernel/nmi.c 2008-08-12 11:13:35.000000000 -0400
+++ linus-2.6/arch/x86/kernel/nmi.c 2008-08-15 08:33:57.000000000 -0400
@@ -142,11 +142,15 @@ int __init check_nmi_watchdog(void)
 		if (!per_cpu(wd_enabled, cpu))
 			continue;
 		if (get_nmi_count(cpu) - prev_nmi_count[cpu] <= 5) {
+			printk("\n");
 			printk(KERN_WARNING "WARNING: CPU#%d: NMI "
 				"appears to be stuck (%d->%d)!\n",
 				cpu,
 				prev_nmi_count[cpu],
 				get_nmi_count(cpu));
+			printk(KERN_WARNING "Please report this to "
+			       "[email protected] and attach "
+			       "the output of 'dmesg' command.\n");
 			per_cpu(wd_enabled, cpu) = 0;
 			atomic_dec(&nmi_active);
 		}

2008-08-15 13:36:50

by Ingo Molnar

Subject: Re: [PATCH] NMI: fix watchdog failure message


* Aristeu Rozanski <[email protected]> wrote:

> > it just won't work at boot time - the second logical unit will be stuck:
> >
> > Booting processor 1/2 APIC 0x1
> > Initializing CPU#1
> > Calibrating delay using timer specific routine.. 5586.12 BogoMIPS (lpj=2793063)
> > CPU: Trace cache: 12K uops, L1 D cache: 16K
> > CPU: L2 cache: 1024K
> > CPU: Physical Processor ID: 0
> > CPU: Processor Core ID: 1
> > CPU1: Thermal monitoring enabled (TM1)
> > Intel(R) Pentium(R) D CPU 2.80GHz stepping 04
> > Brought up 2 CPUs
> > testing NMI watchdog ... <4>WARNING: CPU#1: NMI appears to be stuck (0->0)!
> while at it...
>
> Signed-off-by: Aristeu Rozanski <[email protected]>

applied to tip/x86/urgent, thanks Aristeu.

I've also done the cleanup below - those ugly linebreaks are gone this
way as well.

Ingo

---------------------->
From 8bb851900f5d0a79d3fddac808cc670d9894ef67 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Fri, 15 Aug 2008 15:34:32 +0200
Subject: [PATCH] x86, nmi: clean UP NMI watchdog failure message

clean up the failure message - and redirect people to bugzilla
instead of lkml.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/nmi.c | 32 +++++++++++++++++++-------------
1 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 919473a..abb78a2 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -114,6 +114,23 @@ static __init void nmi_cpu_busy(void *data)
 }
 #endif

+static void report_broken_nmi(int cpu, int *prev_nmi_count)
+{
+	printk(KERN_CONT "\n");
+
+	printk(KERN_WARNING
+		"WARNING: CPU#%d: NMI appears to be stuck (%d->%d)!\n",
+			cpu, prev_nmi_count[cpu], get_nmi_count(cpu));
+
+	printk(KERN_WARNING
+		"Please report this to bugzilla.kernel.org,\n");
+	printk(KERN_WARNING
+		"and attach the output of the 'dmesg' command.\n");
+
+	per_cpu(wd_enabled, cpu) = 0;
+	atomic_dec(&nmi_active);
+}
+
 int __init check_nmi_watchdog(void)
 {
 	unsigned int *prev_nmi_count;
@@ -141,19 +158,8 @@ int __init check_nmi_watchdog(void)
 	for_each_online_cpu(cpu) {
 		if (!per_cpu(wd_enabled, cpu))
 			continue;
-		if (get_nmi_count(cpu) - prev_nmi_count[cpu] <= 5) {
-			printk("\n");
-			printk(KERN_WARNING "WARNING: CPU#%d: NMI "
-				"appears to be stuck (%d->%d)!\n",
-				cpu,
-				prev_nmi_count[cpu],
-				get_nmi_count(cpu));
-			printk(KERN_WARNING "Please report this to "
-			       "[email protected] and attach "
-			       "the output of 'dmesg' command.\n");
-			per_cpu(wd_enabled, cpu) = 0;
-			atomic_dec(&nmi_active);
-		}
+		if (get_nmi_count(cpu) - prev_nmi_count[cpu] <= 5)
+			report_broken_nmi(cpu, prev_nmi_count);
 	}
 	endflag = 1;
 	if (!atomic_read(&nmi_active)) {

2008-08-15 14:03:17

by Aristeu Rozanski

Subject: Re: [PATCH] NMI: fix watchdog failure message

> applied to tip/x86/urgent, thanks Aristeu.
>
> I've also done the cleanup below - those ugly linebreaks are gone this
> way as well.
a lot better. thanks Ingo

--
Aristeu