2007-05-04 21:29:10

by john stultz

[permalink] [raw]
Subject: [PATCH] Blacklist Dell Optiplex 320 from using the HPET

One of the 2.6.21 regressions was Guilherme's problem seeing his box
lock up when the system detected an unstable TSC and dropped back to
using the HPET.

In digging deeper, we found the HPET is not actually incrementing on
this system. And in fact, the reason why this issue just cropped up was
because of Thomas's clocksource watchdog code was comparing the TSC to
the HPET (which wasn't moving) and thought the TSC was broken.

Anyway, Guliherme checked for a BIOS update and did not find one, so
I've added a DMI blacklist against his system so the HPET is not used.

Many thanks to Guilherme for the slow and laborious testing that finally
narrowed down this issue.

thanks
-john

Signed-off-by: John Stultz <[email protected]>

diff --git a/arch/i386/kernel/hpet.c b/arch/i386/kernel/hpet.c
index 17d7345..1ae27f3 100644
--- a/arch/i386/kernel/hpet.c
+++ b/arch/i386/kernel/hpet.c
@@ -5,6 +5,7 @@
#include <linux/init.h>
#include <linux/sysdev.h>
#include <linux/pm.h>
+#include <linux/dmi.h>

#include <asm/hpet.h>
#include <asm/io.h>
@@ -48,6 +49,31 @@ static int __init hpet_setup(char* str)
}
__setup("hpet=", hpet_setup);

+
+/* DMI Blacklist for bad HPETs */
+static int __init dmi_mark_hpet_broken(struct dmi_system_id *d)
+{
+ printk(KERN_NOTICE "%s detected: HPET does not function.\n",
+ d->ident);
+ boot_hpet_disable = 1;
+ return 0;
+}
+
+/* List of systems that have known HPETproblems */
+static struct dmi_system_id bad_hpet_dmi_table[] = {
+ {
+ .callback = dmi_mark_hpet_broken,
+ .ident = "Dell OptiPlex 320",
+ .matches = {
+ DMI_MATCH(DMI_PRODUCT_NAME, "OptiPlex 320"),
+ DMI_MATCH(DMI_BOARD_VENDOR, "Dell Inc."),
+ DMI_MATCH(DMI_BOARD_NAME, "0UT237"),
+ },
+ },
+ {}
+};
+
+
static inline int is_hpet_capable(void)
{
return (!boot_hpet_disable && hpet_address);
@@ -228,6 +254,8 @@ int __init hpet_enable(void)
uint64_t hpet_freq;
u64 tmp;

+ dmi_check_system(bad_hpet_dmi_table);
+
if (!is_hpet_capable())
return 0;






2007-05-04 21:44:24

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Blacklist Dell Optiplex 320 from using the HPET

On Fri, 04 May 2007 14:29:04 -0700
john stultz <[email protected]> wrote:

> One of the 2.6.21 regressions was Guilherme's problem seeing his box
> lock up when the system detected an unstable TSC and dropped back to
> using the HPET.
>
> In digging deeper, we found the HPET is not actually incrementing on
> this system. And in fact, the reason why this issue just cropped up was
> because of Thomas's clocksource watchdog code was comparing the TSC to
> the HPET (which wasn't moving) and thought the TSC was broken.
>
> Anyway, Guliherme checked for a BIOS update and did not find one, so
> I've added a DMI blacklist against his system so the HPET is not used.
>
> Many thanks to Guilherme for the slow and laborious testing that finally
> narrowed down this issue.
>

OK, I tagged that for -stable too.

>
> diff --git a/arch/i386/kernel/hpet.c b/arch/i386/kernel/hpet.c
> index 17d7345..1ae27f3 100644
> --- a/arch/i386/kernel/hpet.c
> +++ b/arch/i386/kernel/hpet.c
> @@ -5,6 +5,7 @@
> #include <linux/init.h>
> #include <linux/sysdev.h>
> #include <linux/pm.h>
> +#include <linux/dmi.h>
>
> #include <asm/hpet.h>
> #include <asm/io.h>
> @@ -48,6 +49,31 @@ static int __init hpet_setup(char* str)
> }
> __setup("hpet=", hpet_setup);
>
> +
> +/* DMI Blacklist for bad HPETs */
> +static int __init dmi_mark_hpet_broken(struct dmi_system_id *d)
> +{
> + printk(KERN_NOTICE "%s detected: HPET does not function.\n",
> + d->ident);
> + boot_hpet_disable = 1;
> + return 0;
> +}
> +
> +/* List of systems that have known HPETproblems */
> +static struct dmi_system_id bad_hpet_dmi_table[] = {
> + {
> + .callback = dmi_mark_hpet_broken,
> + .ident = "Dell OptiPlex 320",
> + .matches = {
> + DMI_MATCH(DMI_PRODUCT_NAME, "OptiPlex 320"),
> + DMI_MATCH(DMI_BOARD_VENDOR, "Dell Inc."),
> + DMI_MATCH(DMI_BOARD_NAME, "0UT237"),
> + },
> + },
> + {}
> +};
> +
> +
> static inline int is_hpet_capable(void)
> {
> return (!boot_hpet_disable && hpet_address);
> @@ -228,6 +254,8 @@ int __init hpet_enable(void)
> uint64_t hpet_freq;
> u64 tmp;
>
> + dmi_check_system(bad_hpet_dmi_table);
> +
> if (!is_hpet_capable())
> return 0;

The table can be __initdata, can't it?


--- a/arch/i386/kernel/hpet.c~blacklist-dell-optiplex-320-from-using-the-hpet-fix
+++ a/arch/i386/kernel/hpet.c
@@ -60,7 +60,7 @@ static int __init dmi_mark_hpet_broken(s
}

/* List of systems that have known HPETproblems */
-static struct dmi_system_id bad_hpet_dmi_table[] = {
+static struct dmi_system_id __initdata bad_hpet_dmi_table[] = {
{
.callback = dmi_mark_hpet_broken,
.ident = "Dell OptiPlex 320",
@@ -73,7 +73,6 @@ static struct dmi_system_id bad_hpet_dmi
{}
};

-
static inline int is_hpet_capable(void)
{
return (!boot_hpet_disable && hpet_address);
_

2007-05-04 21:47:31

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] Blacklist Dell Optiplex 320 from using the HPET

On Fri, 2007-05-04 at 14:44 -0700, Andrew Morton wrote:
> On Fri, 04 May 2007 14:29:04 -0700
> john stultz <[email protected]> wrote:
>
> > One of the 2.6.21 regressions was Guilherme's problem seeing his box
> > lock up when the system detected an unstable TSC and dropped back to
> > using the HPET.
> >
> > In digging deeper, we found the HPET is not actually incrementing on
> > this system. And in fact, the reason why this issue just cropped up was
> > because of Thomas's clocksource watchdog code was comparing the TSC to
> > the HPET (which wasn't moving) and thought the TSC was broken.
> >
> > Anyway, Guliherme checked for a BIOS update and did not find one, so
> > I've added a DMI blacklist against his system so the HPET is not used.
> >
> > Many thanks to Guilherme for the slow and laborious testing that finally
> > narrowed down this issue.
> >
>
> OK, I tagged that for -stable too.

Thanks.


> >
> > diff --git a/arch/i386/kernel/hpet.c b/arch/i386/kernel/hpet.c
> > index 17d7345..1ae27f3 100644
> > --- a/arch/i386/kernel/hpet.c
> > +++ b/arch/i386/kernel/hpet.c
> > @@ -5,6 +5,7 @@
> > #include <linux/init.h>
> > #include <linux/sysdev.h>
> > #include <linux/pm.h>
> > +#include <linux/dmi.h>
> >
> > #include <asm/hpet.h>
> > #include <asm/io.h>
> > @@ -48,6 +49,31 @@ static int __init hpet_setup(char* str)
> > }
> > __setup("hpet=", hpet_setup);
> >
> > +
> > +/* DMI Blacklist for bad HPETs */
> > +static int __init dmi_mark_hpet_broken(struct dmi_system_id *d)
> > +{
> > + printk(KERN_NOTICE "%s detected: HPET does not function.\n",
> > + d->ident);
> > + boot_hpet_disable = 1;
> > + return 0;
> > +}
> > +
> > +/* List of systems that have known HPETproblems */
> > +static struct dmi_system_id bad_hpet_dmi_table[] = {
> > + {
> > + .callback = dmi_mark_hpet_broken,
> > + .ident = "Dell OptiPlex 320",
> > + .matches = {
> > + DMI_MATCH(DMI_PRODUCT_NAME, "OptiPlex 320"),
> > + DMI_MATCH(DMI_BOARD_VENDOR, "Dell Inc."),
> > + DMI_MATCH(DMI_BOARD_NAME, "0UT237"),
> > + },
> > + },
> > + {}
> > +};
> > +
> > +
> > static inline int is_hpet_capable(void)
> > {
> > return (!boot_hpet_disable && hpet_address);
> > @@ -228,6 +254,8 @@ int __init hpet_enable(void)
> > uint64_t hpet_freq;
> > u64 tmp;
> >
> > + dmi_check_system(bad_hpet_dmi_table);
> > +
> > if (!is_hpet_capable())
> > return 0;
>
> The table can be __initdata, can't it?

Yep. Thanks for catching that.

-john



2007-05-04 23:18:58

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Blacklist Dell Optiplex 320 from using the HPET

On Friday 04 May 2007 23:29:04 john stultz wrote:
> One of the 2.6.21 regressions was Guilherme's problem seeing his box
> lock up when the system detected an unstable TSC and dropped back to
> using the HPET.
>
> In digging deeper, we found the HPET is not actually incrementing on
> this system. And in fact, the reason why this issue just cropped up was
> because of Thomas's clocksource watchdog code was comparing the TSC to
> the HPET (which wasn't moving) and thought the TSC was broken.
>
> Anyway, Guliherme checked for a BIOS update and did not find one, so
> I've added a DMI blacklist against his system so the HPET is not used.
>
> Many thanks to Guilherme for the slow and laborious testing that finally
> narrowed down this issue.

Before going to hard to maintain DMI black lists we should first check
if it's a more general problem and can't it be solved better? Most likely
that system isn't the one with this issue and I don't want to apply
DMI patches forever.

In particular: what lspci chipset does it have? If it's Intel it might be
worth checking the datasheet if there is some "HPET stop" bit -- perhaps it
could be fixed up.

We seem to have a couple of Intel systems recently with HPET trouble.

-Andi

2007-05-04 23:22:12

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Blacklist Dell Optiplex 320 from using the HPET

On Friday 04 May 2007 23:44:08 Andrew Morton wrote:
> On Fri, 04 May 2007 14:29:04 -0700
> john stultz <[email protected]> wrote:
>
> > One of the 2.6.21 regressions was Guilherme's problem seeing his box
> > lock up when the system detected an unstable TSC and dropped back to
> > using the HPET.
> >
> > In digging deeper, we found the HPET is not actually incrementing on
> > this system. And in fact, the reason why this issue just cropped up was
> > because of Thomas's clocksource watchdog code was comparing the TSC to
> > the HPET (which wasn't moving) and thought the TSC was broken.
> >
> > Anyway, Guliherme checked for a BIOS update and did not find one, so
> > I've added a DMI blacklist against his system so the HPET is not used.
> >
> > Many thanks to Guilherme for the slow and laborious testing that finally
> > narrowed down this issue.
> >
>
> OK, I tagged that for -stable too.

Don't please. It is completely the wrong approach. DMI should be only last resort,
not first.

-Andi

2007-05-04 23:27:14

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] Blacklist Dell Optiplex 320 from using the HPET

On Sat, 2007-05-05 at 01:18 +0200, Andi Kleen wrote:
> On Friday 04 May 2007 23:29:04 john stultz wrote:
> > One of the 2.6.21 regressions was Guilherme's problem seeing his box
> > lock up when the system detected an unstable TSC and dropped back to
> > using the HPET.
> >
> > In digging deeper, we found the HPET is not actually incrementing on
> > this system. And in fact, the reason why this issue just cropped up was
> > because of Thomas's clocksource watchdog code was comparing the TSC to
> > the HPET (which wasn't moving) and thought the TSC was broken.
> >
> > Anyway, Guliherme checked for a BIOS update and did not find one, so
> > I've added a DMI blacklist against his system so the HPET is not used.
> >
> > Many thanks to Guilherme for the slow and laborious testing that finally
> > narrowed down this issue.
>
> Before going to hard to maintain DMI black lists we should first check
> if it's a more general problem and can't it be solved better? Most likely
> that system isn't the one with this issue and I don't want to apply
> DMI patches forever.

We can give it a whirl, I just didn't want to add yet another "compare
with some other counter that may or may not work" check. In this case,
probably reading three times in a row and getting the same result would
be a clearly broken box.


> In particular: what lspci chipset does it have? If it's Intel it might be
> worth checking the datasheet if there is some "HPET stop" bit -- perhaps it
> could be fixed up.

Guilherme: Could you provide lspci output?


> We seem to have a couple of Intel systems recently with HPET trouble.

Ok, I wasn't aware it was a common issue.

thanks
-john

2007-05-05 15:24:47

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] Blacklist Dell Optiplex 320 from using the HPET

Guilherme,

On Fri, 2007-05-04 at 16:27 -0700, john stultz wrote:
> > Before going to hard to maintain DMI black lists we should first check
> > if it's a more general problem and can't it be solved better? Most likely
> > that system isn't the one with this issue and I don't want to apply
> > DMI patches forever.
>
> We can give it a whirl, I just didn't want to add yet another "compare
> with some other counter that may or may not work" check. In this case,
> probably reading three times in a row and getting the same result would
> be a clearly broken box.

can you please undo John's patch and check whether the patch below works
for you.

Thanks,

tglx

Index: linux-2.6.21/arch/i386/kernel/hpet.c
===================================================================
--- linux-2.6.21.orig/arch/i386/kernel/hpet.c
+++ linux-2.6.21/arch/i386/kernel/hpet.c
@@ -231,6 +231,7 @@ int __init hpet_enable(void)
{
unsigned long id;
uint64_t hpet_freq;
+ cycle_t t1;
u64 tmp;

if (!is_hpet_capable())
@@ -278,6 +279,14 @@ int __init hpet_enable(void)
/* Start the counter */
hpet_start_counter();

+ /* Verify whether hpet counter works */
+ t1 = hpet_read();
+ udelay(50);
+ if (t1 == hpet_read()) {
+ printk(KERN_WARNING "HPET counter is defect\n");
+ goto out_nohpet;
+ }
+
/* Initialize and register HPET clocksource
*
* hpet period is in femto seconds per cycle


2007-05-05 15:24:48

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] Blacklist Dell Optiplex 320 from using the HPET

Guilherme,

please discard the previous patch. I missed to refresh it before sending
it out. Correct version below.

tglx

On Sat, 2007-05-05 at 16:26 +0200, Thomas Gleixner wrote:
> can you please undo John's patch and check whether the patch below works
> for you.

Index: linux-2.6.21/arch/i386/kernel/hpet.c
===================================================================
--- linux-2.6.21.orig/arch/i386/kernel/hpet.c
+++ linux-2.6.21/arch/i386/kernel/hpet.c
@@ -231,6 +231,7 @@ int __init hpet_enable(void)
{
unsigned long id;
uint64_t hpet_freq;
+ cycle_t t1;
u64 tmp;

if (!is_hpet_capable())
@@ -278,6 +279,14 @@ int __init hpet_enable(void)
/* Start the counter */
hpet_start_counter();

+ /* Verify whether hpet counter works */
+ t1 = read_hpet();
+ udelay(50);
+ if (t1 == read_hpet()) {
+ printk(KERN_WARNING "HPET counter is defect\n");
+ goto out_nohpet;
+ }
+
/* Initialize and register HPET clocksource
*
* hpet period is in femto seconds per cycle


Subject: Re: [PATCH] Blacklist Dell Optiplex 320 from using the HPET

On Sat, 05 May 2007, Thomas Gleixner wrote:
> + printk(KERN_WARNING "HPET counter is defect\n");

What about
printk(KERN_WARNING "HPET counter is stopped\n");

which is far more descriptive?

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2007-05-05 17:24:21

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Blacklist Dell Optiplex 320 from using the HPET


> if (!is_hpet_capable())
> @@ -278,6 +279,14 @@ int __init hpet_enable(void)
> /* Start the counter */
> hpet_start_counter();
>
> + /* Verify whether hpet counter works */
> + t1 = hpet_read();
> + udelay(50);

Are you sure udelay is calibrated at this point? I didn't think so.
In fact it needs the external clocks and it's a chicken and egg problem.

It might be safer to use a long loop with io port accesses or similar.

-Andi


> + if (t1 == hpet_read()) {
> + printk(KERN_WARNING "HPET counter is defect\n");
> + goto out_nohpet;
> + }
> +
> /* Initialize and register HPET clocksource
> *
> * hpet period is in femto seconds per cycle
>
>
>


2007-05-05 19:12:27

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] Blacklist Dell Optiplex 320 from using the HPET

On Sat, 2007-05-05 at 19:24 +0200, Andi Kleen wrote:
> > if (!is_hpet_capable())
> > @@ -278,6 +279,14 @@ int __init hpet_enable(void)
> > /* Start the counter */
> > hpet_start_counter();
> >
> > + /* Verify whether hpet counter works */
> > + t1 = hpet_read();
> > + udelay(50);
>
> Are you sure udelay is calibrated at this point? I didn't think so.
> In fact it needs the external clocks and it's a chicken and egg problem.

Oops. You are right. OTOH it does not matter whether it is accurate or
not.

> It might be safer to use a long loop with io port accesses or similar.

Should work as well.

tglx


2007-05-05 22:05:50

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Blacklist Dell Optiplex 320 from using the HPET

On Saturday 05 May 2007 21:15:01 Thomas Gleixner wrote:
> On Sat, 2007-05-05 at 19:24 +0200, Andi Kleen wrote:
> > > if (!is_hpet_capable())
> > > @@ -278,6 +279,14 @@ int __init hpet_enable(void)
> > > /* Start the counter */
> > > hpet_start_counter();
> > >
> > > + /* Verify whether hpet counter works */
> > > + t1 = hpet_read();
> > > + udelay(50);
> >
> > Are you sure udelay is calibrated at this point? I didn't think so.
> > In fact it needs the external clocks and it's a chicken and egg problem.
>
> Oops. You are right. OTOH it does not matter whether it is accurate or
> not.

It might be very short if you're very unlucky and the CPU is fast
and then trigger the check incorrectly.

-Andi

2007-05-05 22:41:48

by Guilherme M. Schroeder

[permalink] [raw]
Subject: Re: [PATCH] Blacklist Dell Optiplex 320 from using the HPET

The lspci output:

00:00.0 Host bridge: ATI Technologies Inc Radeon Xpress 200 Host Bridge
(rev 01)
00:01.0 PCI bridge: ATI Technologies Inc RS480 PCI Bridge
00:12.0 SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA
00:13.0 USB Controller: ATI Technologies Inc SB600 USB (OHCI0)
00:13.1 USB Controller: ATI Technologies Inc SB600 USB (OHCI1)
00:13.2 USB Controller: ATI Technologies Inc SB600 USB (OHCI2)
00:13.3 USB Controller: ATI Technologies Inc SB600 USB (OHCI3)
00:13.4 USB Controller: ATI Technologies Inc SB600 USB (OHCI4)
00:13.5 USB Controller: ATI Technologies Inc SB600 USB Controller (EHCI)
00:14.0 SMBus: ATI Technologies Inc SB600 SMBus (rev 13)
00:14.1 IDE interface: ATI Technologies Inc SB600 IDE
00:14.2 Audio device: ATI Technologies Inc SB600 Azalia
00:14.3 ISA bridge: ATI Technologies Inc SB600 PCI to LPC Bridge
00:14.4 PCI bridge: ATI Technologies Inc SB600 PCI to PCI Bridge
01:05.0 VGA compatible controller: ATI Technologies Inc RC410 [Radeon
Xpress 200]
02:09.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX
(rev 02)

And what about the patch? there's a final version to try?

Thanks.

john stultz wrote:
> On Sat, 2007-05-05 at 01:18 +0200, Andi Kleen wrote:
>> On Friday 04 May 2007 23:29:04 john stultz wrote:
>>> One of the 2.6.21 regressions was Guilherme's problem seeing his box
>>> lock up when the system detected an unstable TSC and dropped back to
>>> using the HPET.
>>>
>>> In digging deeper, we found the HPET is not actually incrementing on
>>> this system. And in fact, the reason why this issue just cropped up was
>>> because of Thomas's clocksource watchdog code was comparing the TSC to
>>> the HPET (which wasn't moving) and thought the TSC was broken.
>>>
>>> Anyway, Guliherme checked for a BIOS update and did not find one, so
>>> I've added a DMI blacklist against his system so the HPET is not used.
>>>
>>> Many thanks to Guilherme for the slow and laborious testing that finally
>>> narrowed down this issue.
>> Before going to hard to maintain DMI black lists we should first check
>> if it's a more general problem and can't it be solved better? Most likely
>> that system isn't the one with this issue and I don't want to apply
>> DMI patches forever.
>
> We can give it a whirl, I just didn't want to add yet another "compare
> with some other counter that may or may not work" check. In this case,
> probably reading three times in a row and getting the same result would
> be a clearly broken box.
>
>
>> In particular: what lspci chipset does it have? If it's Intel it might be
>> worth checking the datasheet if there is some "HPET stop" bit -- perhaps it
>> could be fixed up.
>
> Guilherme: Could you provide lspci output?
>
>
>> We seem to have a couple of Intel systems recently with HPET trouble.
>
> Ok, I wasn't aware it was a common issue.
>
> thanks
> -john
>
>
>

2007-05-07 18:59:09

by Andreas Mohr

[permalink] [raw]
Subject: Re: [PATCH] Blacklist Dell Optiplex 320 from using the HPET

Hi,

On Sat, May 05, 2007 at 12:58:15PM -0300, Henrique de Moraes Holschuh wrote:
> On Sat, 05 May 2007, Thomas Gleixner wrote:
> > + printk(KERN_WARNING "HPET counter is defect\n");
>
> What about
> printk(KERN_WARNING "HPET counter is stopped\n");
>
> which is far more descriptive?

Far more misguiding, rather?

People may mistake this (when not realizing the KERN_WARNING output)
as a status message indicating that we actively stopped the counter.
Maybe something like
"HPET counter inoperable or defective (no increment found during probing)"?
(with an optional "please report" part)

Andreas Mohr

Subject: Re: [PATCH] Blacklist Dell Optiplex 320 from using the HPET

On Mon, 07 May 2007, Andreas Mohr wrote:
> On Sat, May 05, 2007 at 12:58:15PM -0300, Henrique de Moraes Holschuh wrote:
> > On Sat, 05 May 2007, Thomas Gleixner wrote:
> > > + printk(KERN_WARNING "HPET counter is defect\n");
> >
> > What about
> > printk(KERN_WARNING "HPET counter is stopped\n");
> >
> > which is far more descriptive?
>
> Far more misguiding, rather?

True, one can miss the "is".

> "HPET counter inoperable or defective (no increment found during probing)"?
> (with an optional "please report" part)

Yes, this looks better.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh