2006-12-03 13:50:46

by Ian Campbell

[permalink] [raw]
Subject: PMTMR running too fast

Hi,

In older kernels arch/i386/kernel/timers/timer_pm.c:verify_pmtmr_rate
contained a check for sensible PMTMR rate and disabled that clocksource
if it was found to be out of spec[0]. This check seems to have been lost
in the transition to drivers/clocksource/acpi_pm.c, the removal is in
61743fe445213b87fb55a389c8d073785323ca3e "Time: i386 Conversion - part
4: Remove Old timer_opts Code"[1] and the check is not present in the
replacement 5d0cf410e94b1f1ff852c3f210d22cc6c5a27ffa "Time: i386
Clocksource Drivers"[2].

Is there a specific reason the check was removed (I couldn't see on in
the archives) or was it simply overlooked? Without it I need to pass
clocksource=tsc to have 2.6.18 work correctly on an older K6 system with
an Aladdin chipset (will dig out the precise details if required). Would
a patch to reintroduce the check be acceptable or would some sort of
blacklist based solution be more acceptable?

Cheers,
Ian.

[0] "PM-Timer running at invalid rate: 200% of normal - aborting." from
http://www.kernel.org/git/?p=linux/kernel/git/tglx/history.git;a=commit;h=6d58b1286c7ac88741374c158867f564e602b288
see also http://bugme.osdl.org/show_bug.cgi?id=2375
[1]
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=61743fe445213b87fb55a389c8d073785323ca3e
[2]
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5d0cf410e94b1f1ff852c3f210d22cc6c5a27ffa

--
Ian Campbell

Any time things appear to be going better, you have overlooked something.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2006-12-04 19:19:30

by john stultz

[permalink] [raw]
Subject: Re: PMTMR running too fast

On Sun, 2006-12-03 at 13:50 +0000, Ian Campbell wrote:
> In older kernels arch/i386/kernel/timers/timer_pm.c:verify_pmtmr_rate
> contained a check for sensible PMTMR rate and disabled that clocksource
> if it was found to be out of spec[0]. This check seems to have been lost
> in the transition to drivers/clocksource/acpi_pm.c, the removal is in
> 61743fe445213b87fb55a389c8d073785323ca3e "Time: i386 Conversion - part
> 4: Remove Old timer_opts Code"[1] and the check is not present in the
> replacement 5d0cf410e94b1f1ff852c3f210d22cc6c5a27ffa "Time: i386
> Clocksource Drivers"[2].

Fedora has a bug covering this:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=211902


> Is there a specific reason the check was removed (I couldn't see on in
> the archives) or was it simply overlooked? Without it I need to pass
> clocksource=tsc to have 2.6.18 work correctly on an older K6 system with
> an Aladdin chipset (will dig out the precise details if required). Would
> a patch to reintroduce the check be acceptable or would some sort of
> blacklist based solution be more acceptable?

If I recall correctly, it was pulled because there was some question as
to if it was actually needed (x86_64 didn't need it) and it slows down
the boot time (although not by much).

I'm fine just re-adding it. Although if the number of affected systems
are small we could just blacklist it (Ian, mind sending dmidecode
output?).

Andi, your thoughts?

thanks
-john


2006-12-04 19:40:53

by Ian Campbell

[permalink] [raw]
Subject: Re: PMTMR running too fast

On Mon, 2006-12-04 at 11:19 -0800, john stultz wrote:
> On Sun, 2006-12-03 at 13:50 +0000, Ian Campbell wrote:
> > In older kernels arch/i386/kernel/timers/timer_pm.c:verify_pmtmr_rate
> > contained a check for sensible PMTMR rate and disabled that clocksource
> > if it was found to be out of spec[0]. This check seems to have been lost
> > in the transition to drivers/clocksource/acpi_pm.c, the removal is in
> > 61743fe445213b87fb55a389c8d073785323ca3e "Time: i386 Conversion - part
> > 4: Remove Old timer_opts Code"[1] and the check is not present in the
> > replacement 5d0cf410e94b1f1ff852c3f210d22cc6c5a27ffa "Time: i386
> > Clocksource Drivers"[2].
>
> Fedora has a bug covering this:
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=211902

> > Is there a specific reason the check was removed (I couldn't see on in
> > the archives) or was it simply overlooked? Without it I need to pass
> > clocksource=tsc to have 2.6.18 work correctly on an older K6 system with
> > an Aladdin chipset (will dig out the precise details if required). Would
> > a patch to reintroduce the check be acceptable or would some sort of
> > blacklist based solution be more acceptable?
>
> If I recall correctly, it was pulled because there was some question as
> to if it was actually needed (x86_64 didn't need it) and it slows down
> the boot time (although not by much).
>
> I'm fine just re-adding it. Although if the number of affected systems
> are small we could just blacklist it (Ian, mind sending dmidecode
> output?).

dmidecode.txt is attached.

Cheers,
Ian.

--
Ian Campbell

Felson's Law:
To steal ideas from one person is plagiarism; to steal from
many is research.


Attachments:
dmidecode.txt (9.55 kB)
signature.asc (189.00 B)
This is a digitally signed message part
Download all attachments

2006-12-04 20:15:23

by john stultz

[permalink] [raw]
Subject: Re: PMTMR running too fast

On Mon, 2006-12-04 at 19:40 +0000, Ian Campbell wrote:
> On Mon, 2006-12-04 at 11:19 -0800, john stultz wrote:
> > On Sun, 2006-12-03 at 13:50 +0000, Ian Campbell wrote:
> > > In older kernels arch/i386/kernel/timers/timer_pm.c:verify_pmtmr_rate
> > > contained a check for sensible PMTMR rate and disabled that clocksource
> > > if it was found to be out of spec[0]. This check seems to have been lost
> > > in the transition to drivers/clocksource/acpi_pm.c, the removal is in
> > > 61743fe445213b87fb55a389c8d073785323ca3e "Time: i386 Conversion - part
> > > 4: Remove Old timer_opts Code"[1] and the check is not present in the
> > > replacement 5d0cf410e94b1f1ff852c3f210d22cc6c5a27ffa "Time: i386
> > > Clocksource Drivers"[2].
> >
> > Fedora has a bug covering this:
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=211902
>
> > > Is there a specific reason the check was removed (I couldn't see on in
> > > the archives) or was it simply overlooked? Without it I need to pass
> > > clocksource=tsc to have 2.6.18 work correctly on an older K6 system with
> > > an Aladdin chipset (will dig out the precise details if required). Would
> > > a patch to reintroduce the check be acceptable or would some sort of
> > > blacklist based solution be more acceptable?
> >
> > If I recall correctly, it was pulled because there was some question as
> > to if it was actually needed (x86_64 didn't need it) and it slows down
> > the boot time (although not by much).
> >
> > I'm fine just re-adding it. Although if the number of affected systems
> > are small we could just blacklist it (Ian, mind sending dmidecode
> > output?).

I don't have a dev box to test on at the moment, but here's a quick hack
attempt at re-adding the code. Does the following work for you?

thanks
-john


diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c
index 7fcb77a..3379b5f 100644
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -21,9 +21,12 @@ #include <linux/errno.h>
#include <linux/init.h>
#include <linux/pci.h>
#include <asm/io.h>
+#include "mach_timer.h"

/* Number of PMTMR ticks expected during calibration run */
#define PMTMR_TICKS_PER_SEC 3579545
+#define PMTMR_EXPECTED_RATE \
+ ((CALIBRATE_LATCH * (PMTMR_TICKS_PER_SEC >> 10)) / (CLOCK_TICK_RATE>>10))

/*
* The I/O port the PMTMR resides at.
@@ -142,6 +145,36 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SE
acpi_pm_check_graylist);
#endif

+#ifndef CONFIG_X86_64
+/*
+ * Some boards have the PMTMR running way too fast. We check
+ * the PMTMR rate against PIT channel 2 to catch these cases.
+ */
+static int verify_pmtmr_rate(void)
+{
+ u32 value1, value2;
+ unsigned long count, delta;
+
+ mach_prepare_counter();
+ value1 = read_pmtmr();
+ mach_countup(&count);
+ value2 = read_pmtmr();
+ delta = (value2 - value1) & ACPI_PM_MASK;
+
+ /* Check that the PMTMR delta is within 5% of what we expect */
+ if (delta < (PMTMR_EXPECTED_RATE * 19) / 20 ||
+ delta > (PMTMR_EXPECTED_RATE * 21) / 20) {
+ printk(KERN_INFO "PM-Timer running at invalid rate: %lu%% "
+ "of normal - aborting.\n",
+ 100UL * delta / PMTMR_EXPECTED_RATE);
+ return -1;
+ }
+
+ return 0;
+}
+#else
+#define verify_pmtmr_rate() (0)
+#endif

static int __init init_acpi_pm_clocksource(void)
{
@@ -173,6 +206,9 @@ static int __init init_acpi_pm_clocksour
return -ENODEV;

pm_good:
+ if (verify_pmtmr_rate() != 0)
+ return -ENODEV;
+
return clocksource_register(&clocksource_acpi_pm);
}



2006-12-05 07:41:53

by Ian Campbell

[permalink] [raw]
Subject: Re: PMTMR running too fast

On Mon, 2006-12-04 at 12:14 -0800, john stultz wrote:
> On Mon, 2006-12-04 at 19:40 +0000, Ian Campbell wrote:
> > On Mon, 2006-12-04 at 11:19 -0800, john stultz wrote:
> > > On Sun, 2006-12-03 at 13:50 +0000, Ian Campbell wrote:
> > > > In older kernels arch/i386/kernel/timers/timer_pm.c:verify_pmtmr_rate
> > > > contained a check for sensible PMTMR rate and disabled that clocksource
> > > > if it was found to be out of spec[0]. This check seems to have been lost
> > > > in the transition to drivers/clocksource/acpi_pm.c, the removal is in
> > > > 61743fe445213b87fb55a389c8d073785323ca3e "Time: i386 Conversion - part
> > > > 4: Remove Old timer_opts Code"[1] and the check is not present in the
> > > > replacement 5d0cf410e94b1f1ff852c3f210d22cc6c5a27ffa "Time: i386
> > > > Clocksource Drivers"[2].
> > >
> > > Fedora has a bug covering this:
> > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=211902
> >
> > > > Is there a specific reason the check was removed (I couldn't see on in
> > > > the archives) or was it simply overlooked? Without it I need to pass
> > > > clocksource=tsc to have 2.6.18 work correctly on an older K6 system with
> > > > an Aladdin chipset (will dig out the precise details if required). Would
> > > > a patch to reintroduce the check be acceptable or would some sort of
> > > > blacklist based solution be more acceptable?
> > >
> > > If I recall correctly, it was pulled because there was some question as
> > > to if it was actually needed (x86_64 didn't need it) and it slows down
> > > the boot time (although not by much).
> > >
> > > I'm fine just re-adding it. Although if the number of affected systems
> > > are small we could just blacklist it (Ian, mind sending dmidecode
> > > output?).
>
> I don't have a dev box to test on at the moment, but here's a quick hack
> attempt at re-adding the code. Does the following work for you?

I get:
PM-Timer running at invalid rate: 200% of normal - aborting.
...
Time: pit clocksource has been installed.

Without the clocksource parameter.

Should tsc be preferred to pit though?

/sys/devices/system/clocksource/clocksource0/available_clocksource now
shows "jiffies tsc pit" and current_clocksource is "pit".

Thanks,
Ian.

--
Ian Campbell

love, n.:
When it's growing, you don't mind watering it with a few tears.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2006-12-05 19:37:32

by john stultz

[permalink] [raw]
Subject: Re: PMTMR running too fast

On Tue, 2006-12-05 at 07:41 +0000, Ian Campbell wrote:
> On Mon, 2006-12-04 at 12:14 -0800, john stultz wrote:
> > I don't have a dev box to test on at the moment, but here's a quick hack
> > attempt at re-adding the code. Does the following work for you?
>
> I get:
> PM-Timer running at invalid rate: 200% of normal - aborting.
> ...
> Time: pit clocksource has been installed.
>
> Without the clocksource parameter.

Great!

> Should tsc be preferred to pit though?

Depends on your system. If C2/C3 or cpufreq state changes are detected,
we mark the tsc as unstable.

It sounds as if from your earlier email the TSC works fine, so we might
want to look at what's making the system think its not ok. I probably
need to add a message as to why it was disqualified. However, that's a
separate issue from the last patch.

Thanks for the testing!

-john




2006-12-06 08:14:52

by Ian Campbell

[permalink] [raw]
Subject: Re: PMTMR running too fast

On Tue, 2006-12-05 at 11:34 -0800, john stultz wrote:
> On Tue, 2006-12-05 at 07:41 +0000, Ian Campbell wrote:
> > Should tsc be preferred to pit though?
>
> Depends on your system. If C2/C3 or cpufreq state changes are detected,
> we mark the tsc as unstable.

I'm not using them on purpose but I'll check it out.

> It sounds as if from your earlier email the TSC works fine, so we might
> want to look at what's making the system think its not ok. I probably
> need to add a message as to why it was disqualified. However, that's a
> separate issue from the last patch.

I'll have a play, see if I can figure it out.

> Thanks for the testing!

Thanks for the fix!

Ian.

--
Ian Campbell

* Turken thinks little kids are absolutely adorable... especialyy when
they're someone elses.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2006-12-06 17:44:23

by Andi Kleen

[permalink] [raw]
Subject: Re: PMTMR running too fast


>
> > Is there a specific reason the check was removed (I couldn't see on in
> > the archives) or was it simply overlooked? Without it I need to pass
> > clocksource=tsc to have 2.6.18 work correctly on an older K6 system with
> > an Aladdin chipset (will dig out the precise details if required). Would
> > a patch to reintroduce the check be acceptable or would some sort of
> > blacklist based solution be more acceptable?
>
> If I recall correctly, it was pulled because there was some question as
> to if it was actually needed (x86_64 didn't need it) and it slows down
> the boot time (although not by much).
>
> I'm fine just re-adding it. Although if the number of affected systems
> are small we could just blacklist it (Ian, mind sending dmidecode
> output?).
>
> Andi, your thoughts?

Doing a check at boot time is fine for me. Just I don't want the
"read pmtmr three times at runtime" code anywhere near x86-64

I don't think the boot time check needs DMI guarding

But BTW the check is not necessarily enough -- there is at least one
NF3 machine around where the PIT timer ticks at a wrong frequency.
Safer would be probably to calibrate against RTC which is afaik used
by Windows too (so it's likely to be ok)

-Andi

2006-12-06 19:29:00

by john stultz

[permalink] [raw]
Subject: Re: PMTMR running too fast

On Wed, 2006-12-06 at 17:44 +0100, Andi Kleen wrote:
> >
> > > Is there a specific reason the check was removed (I couldn't see on in
> > > the archives) or was it simply overlooked? Without it I need to pass
> > > clocksource=tsc to have 2.6.18 work correctly on an older K6 system with
> > > an Aladdin chipset (will dig out the precise details if required). Would
> > > a patch to reintroduce the check be acceptable or would some sort of
> > > blacklist based solution be more acceptable?
> >
> > If I recall correctly, it was pulled because there was some question as
> > to if it was actually needed (x86_64 didn't need it) and it slows down
> > the boot time (although not by much).
> >
> > I'm fine just re-adding it. Although if the number of affected systems
> > are small we could just blacklist it (Ian, mind sending dmidecode
> > output?).
> >
> > Andi, your thoughts?
>
> Doing a check at boot time is fine for me. Just I don't want the
> "read pmtmr three times at runtime" code anywhere near x86-64

:) This change fully disqualifies the ACPI PM if its running at the
wrong frequency, so no worries there.

> I don't think the boot time check needs DMI guarding

DMI guarding? I'm not following...

> But BTW the check is not necessarily enough -- there is at least one
> NF3 machine around where the PIT timer ticks at a wrong frequency.
> Safer would be probably to calibrate against RTC which is afaik used
> by Windows too (so it's likely to be ok)

Hmm.. I might lean towards pushing the patch closer to as it is, as
we're just restoring functionality that was there before in 2.6.17. The
NF3 system already needs bits to correct for the PIT frequency, so it
seems that code could also correct the mach_countup() routines.

Even so, I do agree with you that moving to utilize more "widely tested"
hardware for calibration would be a good thing.

thanks
-john


2006-12-06 19:46:32

by Ian Campbell

[permalink] [raw]
Subject: Re: PMTMR running too fast

On Tue, 2006-12-05 at 11:34 -0800, john stultz wrote:
>
> > Should tsc be preferred to pit though?
>
> Depends on your system. If C2/C3 or cpufreq state changes are
> detected, we mark the tsc as unstable.

Turns out I was seeing C2 states. I'll stick with PIT for now.

Thanks,
Ian.

--
Ian Campbell

If ignorance is bliss, why aren't there more happy people?


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2006-12-06 20:14:23

by Andi Kleen

[permalink] [raw]
Subject: Re: PMTMR running too fast


> > I don't think the boot time check needs DMI guarding
>
> DMI guarding? I'm not following...

DMI guarding = Making it conditional on a DMI check

-Andi