2003-06-19 15:56:20

by Andy Pfiffer

[permalink] [raw]
Subject: 2.5.72: wall-clock time advancing too rapidly?

I have a uniproc P3-800 system running 2.5.72, and time (from that
system's point of view) is racing ahead rapidly.

By "racing ahead rapidly", I mean this:

% date ; sleep 60 ; date
Thu Jun 19 09:04:29 PDT 2003
Thu Jun 19 09:05:29 PDT 2003
%

returns in 35 seconds (measured with my eyeballs and cheap wristwatch).

Has anyone else seen this?

Regards,
Andy



2003-06-19 21:22:51

by john stultz

[permalink] [raw]
Subject: Re: 2.5.72: wall-clock time advancing too rapidly?

On Thu, 2003-06-19 at 09:10, Andy Pfiffer wrote:
> I have a uniproc P3-800 system running 2.5.72, and time (from that
> system's point of view) is racing ahead rapidly.
>
> By "racing ahead rapidly", I mean this:
>
> % date ; sleep 60 ; date
> Thu Jun 19 09:04:29 PDT 2003
> Thu Jun 19 09:05:29 PDT 2003
> %
>
> returns in 35 seconds (measured with my eyeballs and cheap wristwatch).
>
> Has anyone else seen this?

Well, variants on a theme. Can I get more hardware info? Is this a
laptop? Are we running w/ Speed Step?

thanks
-john


2003-06-20 13:45:44

by Wiktor Wodecki

[permalink] [raw]
Subject: Re: 2.5.72: wall-clock time advancing too rapidly?

same here, time is running way too fast. Kernel 2.5.72-mm1, see attached
config for config. The cpu is a Pentium III (Coppermine) wit speedstep
enabled.
Might the time problem relate to the keyboard problem reported earlier?
(press a key and watch it a zilliopn times on display. not all the time,
tho)

00:00.0 Host bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX Host bridge
(rev 03)
00:01.0 PCI bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge
(rev 03)
00:02.0 CardBus bridge: Texas Instruments PCI1450 (rev 03)
00:02.1 CardBus bridge: Texas Instruments PCI1450 (rev 03)
00:03.0 Communication controller: Lucent Microelectronics WinModem 56k
(rev 01)
00:05.0 Multimedia audio controller: Cirrus Logic CS 4614/22/24
[CrystalClear SoundFusion Audio Accelerator] (rev 01)
00:07.0 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corp. 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corp. 82371AB/EB/MB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ACPI (rev 03)
01:00.0 VGA compatible controller: S3 Inc. 86C270-294 Savage/IX-MV (rev
11)

If you need more hardware info, tell me

--
Regards,

Wiktor Wodecki


Attachments:
(No filename) (0.00 B)
(No filename) (189.00 B)
Download all attachments

2003-06-20 14:49:12

by Andreas Haumer

[permalink] [raw]
Subject: Re: 2.5.72: wall-clock time advancing too rapidly?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

john stultz wrote:
> On Thu, 2003-06-19 at 09:10, Andy Pfiffer wrote:
>
>>I have a uniproc P3-800 system running 2.5.72, and time (from that
>>system's point of view) is racing ahead rapidly.
>>
>>By "racing ahead rapidly", I mean this:
>>
>> % date ; sleep 60 ; date
>> Thu Jun 19 09:04:29 PDT 2003
>> Thu Jun 19 09:05:29 PDT 2003
>> %
>>
>>returns in 35 seconds (measured with my eyeballs and cheap wristwatch).
>>
>>Has anyone else seen this?
>
>
> Well, variants on a theme. Can I get more hardware info? Is this a
> laptop? Are we running w/ Speed Step?
>
I had this symptom recently on an Asus PR-DLS533 mainboard
(ServerWorks GCLE chipset) with linux-2.4.21 and found out
it happens only if I had BIOS "USB legacy support" enabled.
As soon as I disabled this BIOS option, the phenomenon
disappeard.
For more info look for lkml thread with subject "system clock
speed too high?", Message-ID <[email protected]>

HTH

- - andreas

- --
Andreas Haumer | mailto:[email protected]
*x Software + Systeme | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+8yIhxJmyeGcXPhERAoyRAKDD+bmvWsdoHXtsAUnmQpOivdYlRwCfZZox
gofL6W64SQ+Hy8xQMehLeS8=
=GEhC
-----END PGP SIGNATURE-----

2003-06-20 18:02:13

by john stultz

[permalink] [raw]
Subject: Re: 2.5.72: wall-clock time advancing too rapidly?

On Fri, 2003-06-20 at 07:01, Wiktor Wodecki wrote:
> same here, time is running way too fast. Kernel 2.5.72-mm1, see attached
> config for config. The cpu is a Pentium III (Coppermine) wit speedstep
> enabled.

Speedstep and the lost-ticks compensation code in the TSC time source
are fighting. Booting w/ "clock=pit" will let you work around it, but
I'm working on trying to make it automatically fall back.

This is being tracked in Bug #827.
http://bugme.osdl.org/show_bug.cgi?id=827

thanks
-john


2003-06-20 23:15:27

by Andy Pfiffer

[permalink] [raw]
Subject: Re: 2.5.72: wall-clock time advancing too rapidly?

On Fri, 2003-06-20 at 08:02, Andreas Haumer wrote:
> john stultz wrote:
> > On Thu, 2003-06-19 at 09:10, Andy Pfiffer wrote:

> >>By "racing ahead rapidly", I mean this:
> >>
> >> % date ; sleep 60 ; date
> >> Thu Jun 19 09:04:29 PDT 2003
> >> Thu Jun 19 09:05:29 PDT 2003
> >> %
> >>
> >>returns in 35 seconds (measured with my eyeballs and cheap wristwatch).
> >>

> > Well, variants on a theme. Can I get more hardware info? Is this a
> > laptop? Are we running w/ Speed Step?
> >
> I had this symptom recently on an Asus PR-DLS533 mainboard
> (ServerWorks GCLE chipset) with linux-2.4.21 and found out
> it happens only if I had BIOS "USB legacy support" enabled.
> As soon as I disabled this BIOS option, the phenomenon
> disappeard.

Update:

1) USB legacy support setting in the BIOS had no effect on the problem.
2) adding "clock=pit" (as suggested elsewhere) worked around the
problem.

So, does this fit the symptoms of:
http://bugme.osdl.org/show_bug.cgi?id=827

The system is not a laptop, and SpeedStep related issues should not be
messing things up here.

% cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 10
cpu MHz : 0.000
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov
pat pse36 mmx fxsr sse
bogomips : 794.62

% lspci
00:00.0 Host bridge: ServerWorks CNB20LE Host Bridge (rev 06)
00:00.1 Host bridge: ServerWorks CNB20LE Host Bridge (rev 06)
00:01.0 VGA compatible controller: S3 Inc. Savage 4 (rev 04)
00:02.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100]
(rev 08)
00:09.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100]
(rev 08)
00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 50)
00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev
04)
01:03.0 SCSI storage controller: Adaptec AIC-7892P U160/m (rev 02)



2003-06-20 23:24:53

by john stultz

[permalink] [raw]
Subject: Re: 2.5.72: wall-clock time advancing too rapidly?

On Fri, 2003-06-20 at 16:28, Andy Pfiffer wrote:
> Update:
>
> 1) USB legacy support setting in the BIOS had no effect on the problem.
> 2) adding "clock=pit" (as suggested elsewhere) worked around the
> problem.
>
> So, does this fit the symptoms of:
> http://bugme.osdl.org/show_bug.cgi?id=827

Ok, good to know. I've been working on a fix and hopefully I'll have
something available soon.

> The system is not a laptop, and SpeedStep related issues should not be
> messing things up here.


Hmm. Interesting. Could you send me your dmesg?

thanks
-john


2003-06-21 00:00:45

by john stultz

[permalink] [raw]
Subject: Re: 2.5.72: wall-clock time advancing too rapidly?

Alright, I've got a first pass at a patch that might solve this.

This patch detects if we have 100 consecutive interrupts that find lost
ticks. If that occurs, we assume that the TSC is changing frequency and
we fall back to the PIT for a time source (equiv to booting w/
clock=pit).

I'd like to see this well tested as I don't want to prematurely disable
the TSC if lost ticks is just doing its job, but I also want to catch
speedstep cpus before folks notice time going out of control. So the #
of consecutive interrupts may need adjusting.

thanks
-john

This patch can also be found under bugme bug #827
http://bugme.osdl.org/show_bug.cgi?id=827

diff -Nru a/arch/i386/kernel/timers/timer.c b/arch/i386/kernel/timers/timer.c
--- a/arch/i386/kernel/timers/timer.c Fri Jun 20 16:56:05 2003
+++ b/arch/i386/kernel/timers/timer.c Fri Jun 20 16:56:05 2003
@@ -29,6 +29,16 @@
}
__setup("clock=", clock_setup);

+
+/* The chosen timesource has been found to be bad.
+ * Fall back to a known good timesource (the PIT)
+ */
+extern struct timer_opts *timer;
+void clock_fallback(void)
+{
+ timer = &timer_pit;
+}
+
/* iterates through the list of timers, returning the first
* one that initializes successfully.
*/
diff -Nru a/arch/i386/kernel/timers/timer_tsc.c b/arch/i386/kernel/timers/timer_tsc.c
--- a/arch/i386/kernel/timers/timer_tsc.c Fri Jun 20 16:56:05 2003
+++ b/arch/i386/kernel/timers/timer_tsc.c Fri Jun 20 16:56:05 2003
@@ -124,6 +124,7 @@
int countmp;
static int count1 = 0;
unsigned long long this_offset, last_offset;
+ static int lost_count = 0;

write_lock(&monotonic_lock);
last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
@@ -178,9 +179,19 @@
delta += delay_at_last_interrupt;
lost = delta/(1000000/HZ);
delay = delta%(1000000/HZ);
- if (lost >= 2)
+ if (lost >= 2) {
jiffies += lost-1;

+ /* sanity check to ensure we're not always loosing ticks */
+ if (lost_count++ > 100) {
+ printk("Loosing too many ticks!\n");
+ printk("TSC cannot be used as a timesource."
+ " (Are you running with SpeedStep?)\n");
+ printk("Falling back to a sane timesource.\n");
+ clock_fallback();
+ }
+ } else
+ lost_count = 0;
/* update the monotonic base value */
this_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
monotonic_base += cycles_2_ns(this_offset - last_offset);





2003-06-24 00:03:08

by Andy Pfiffer

[permalink] [raw]
Subject: Re: 2.5.7[23]: wall-clock time advancing too rapidly?

On Fri, 2003-06-20 at 17:07, john stultz wrote:
> Alright, I've got a first pass at a patch that might solve this.
>
> This patch detects if we have 100 consecutive interrupts that find lost
> ticks. If that occurs, we assume that the TSC is changing frequency and
> we fall back to the PIT for a time source (equiv to booting w/
> clock=pit).
>
> I'd like to see this well tested as I don't want to prematurely disable
> the TSC if lost ticks is just doing its job, but I also want to catch
> speedstep cpus before folks notice time going out of control. So the #
> of consecutive interrupts may need adjusting.
>
> thanks
> -john
>
> This patch can also be found under bugme bug #827
> http://bugme.osdl.org/show_bug.cgi?id=827

Hmmm... I tried the patch in 2.5.73 and it appeared to have no effect.
The system continues to advance what it thinks is wall-clock time by
about 7.25 seconds for every 15 seconds actual wall-clock time:

# foreach i ( 1 2 3 4 5 6 7 8 9 10 )
foreach? ntpdate ntp
foreach? sleep 15
foreach? end
23 Jun 17:08:42 ntpdate[1336]: <snip> offset -48.975254 sec
23 Jun 17:08:49 ntpdate[1338]: <snip> offset -7.286794 sec
23 Jun 17:08:57 ntpdate[1340]: <snip> offset -7.291861 sec
23 Jun 17:09:05 ntpdate[1342]: <snip> offset -7.288794 sec
23 Jun 17:09:13 ntpdate[1344]: <snip> offset -7.289806 sec
23 Jun 17:09:21 ntpdate[1346]: <snip> offset -7.293829 sec
23 Jun 17:09:29 ntpdate[1348]: <snip> offset -7.286798 sec
23 Jun 17:09:36 ntpdate[1350]: <snip> offset -7.289818 sec
23 Jun 17:09:44 ntpdate[1352]: <snip> offset -7.283786 sec
23 Jun 17:09:52 ntpdate[1354]: <snip> offset -7.285859 sec
#


None of the printk's in the patch have been printed on the console.

Adding "clock=pit" continues to work as a workaround.

I have attached the output of dmesg from the boot of the kernel with the
patch present.


Andy

ps: cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 10
cpu MHz : 800.056
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse
bogomips : 1576.96





Attachments:
dmesg.patched (11.74 kB)

2003-06-24 00:30:52

by john stultz

[permalink] [raw]
Subject: Re: 2.5.7[23]: wall-clock time advancing too rapidly?

On Mon, 2003-06-23 at 17:16, Andy Pfiffer wrote:
> Hmmm... I tried the patch in 2.5.73 and it appeared to have no effect.

Ah, just when I sent it off to Andrew. Well, I've been getting a number
of successful reports, so its still good to give it further testing.


> The system continues to advance what it thinks is wall-clock time by
> about 7.25 seconds for every 15 seconds actual wall-clock time:
[snip]
>
> None of the printk's in the patch have been printed on the console.

Yea, it clearly isn't triggering the code.

> Adding "clock=pit" continues to work as a workaround.
>
> I have attached the output of dmesg from the boot of the kernel with the
> patch present.

Looking over it again you're still not showing any of the signs of
changing cpu-frequency. But the symptoms are very similar. I'm curious,
this is the x220? Do you have a service processor installed in that box?
Maybe we're running into some sort of SMI trouble?

thanks
-john

2003-06-24 15:55:41

by Andy Pfiffer

[permalink] [raw]
Subject: Re: 2.5.7[23]: wall-clock time advancing too rapidly?

On Mon, 2003-06-23 at 17:37, john stultz wrote:
> On Mon, 2003-06-23 at 17:16, Andy Pfiffer wrote:
> > Hmmm... I tried the patch in 2.5.73 and it appeared to have no effect.
>
> Ah, just when I sent it off to Andrew. Well, I've been getting a number
> of successful reports, so its still good to give it further testing.

Yeah, I saw that. Same time in my inbox, too. ;^)

> Looking over it again you're still not showing any of the signs of
> changing cpu-frequency. But the symptoms are very similar. I'm curious,
> this is the x220? Do you have a service processor installed in that box?
> Maybe we're running into some sort of SMI trouble?

It says on the front of the case "IBM eserver", but the "e" sort of
looks like an "@". Also printed on the front of the case is "x220."

I have no idea if there is a service processor installed in the box.

Here is the output of lspci:

00:00.0 Host bridge: ServerWorks CNB20LE Host Bridge (rev 06)
00:00.1 Host bridge: ServerWorks CNB20LE Host Bridge (rev 06)
00:01.0 VGA compatible controller: S3 Inc. Savage 4 (rev 04)
00:09.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100]
(rev 08)
00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 50)
00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev
04)
01:03.0 SCSI storage controller: Adaptec AIC-7892P U160/m (rev 02)

There is a 2nd Intel e100 interface on the motherboard ("planar" in
IBM-speak), but that has been set to "disabled" in the BIOS for some
reason. Re-enabling it has no effect on the time skew, nor does the USB
legacy support that worked for someone else on a different system.

Strange, huh?