2002-04-03 14:10:42

by Chris Wilson

Subject: P4/i845 Strange clock drifting


Hi,

I've got a 1U 2.0 GHz P4 rackmount server with an i845 chipset and have
noticed some strange issues with the timer. For the most part it keeps
time perfectly... but pretty often (tens of times each day) it'll have
drifted anything from a few seconds to a few minutes - during a 10 minute
period. It's always behind-time - so perhaps this is something to do with
the P4's throttling stuff? Has anyone else seen similar?

I tried to use 2.5.7-dj2 with Zwane Mwaikambo's thermal LVT support in
there but it didn't detect a local APIC on bootup (!) - I'm guessing there
needs to be an APIC for Zwane's stuff? When I tried to switch back to
2.4.18 the machine never came back - as soon as someone power cycles it
then I can do some more tests!

Regards,

Chris

--
Chris Wilson
[email protected]


2002-04-03 14:30:50

by Zwane Mwaikambo

Subject: Re: P4/i845 Strange clock drifting

On Wed, 3 Apr 2002, Chris Wilson wrote:

> I've got a 1U 2.0 GHz P4 rackmount server with an i845 chipset and have
> noticed some strange issues with the timer. For the most part it keeps
> time perfectly... but pretty often (tens of times each day) it'll have
> drifted anything from a few seconds to a few minutes - during a 10 minute
> period. It's always behind-time - so perhaps this is something to do with
> the P4's throttling stuff? Has anyone else seen similar?

The throttle is not supposed to affect the TSC, and only takes effect
when overheating.

> I tried to use 2.5.7-dj2 with Zwane Mwaikambo's thermal LVT support in
> there but it didn't detect a local APIC on bootup (!) - I'm guessing there
> needs to be an APIC for Zwane's stuff? When I tried to switch back to

-dj2 P4 thermal patch is a bit broken (my bad), but the fact that it
doesn't detect an APIC means that code would, erm do interesting things...

Zwane

--
http://function.linuxpower.ca


Subject: RE: P4/i845 Strange clock drifting


Hi.

Maybe you could read Bernd Schubert's message to this list from
Wed Mar 27 2002 - 10:28:35 EST:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0203.3/0557.html

HTH

Gonzalo

2002-04-05 12:28:22

by Chris Wilson

Subject: Re: P4/i845 Strange clock drifting



> > I tried to use 2.5.7-dj2 with Zwane Mwaikambo's thermal LVT support in
> > there but it didn't detect a local APIC on bootup (!) - I'm guessing there
> > needs to be an APIC for Zwane's stuff? When I tried to switch back to
>
> -dj2 P4 thermal patch is a bit broken (my bad), but the fact that it
> doesn't detect an APIC means that code would, erm do interesting things...

<grin>

I've now tried a couple more kernels to no avail - nothing can find APICs.
Is it even possible for a P4 to not have a local APIC? System is a
supermicro 5012B*.

/proc/cpuinfo shows:

flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm

(notice no "apic"). Is this normal/correct? If I just removed the check
from apic.c and tried to enable the APIC anyway, are bad things going
to happen?

I've also noticed [probably unrelated but...] that I can't reboot the box
without use of the reset button - it doesn't come up after /sbin/reboot -f
either. It's at a colo facility so I can't see what's being displayed
until I dig out a null modem cable and go for a drive... :)

Any suggestions??

Chris

* http://www.supermicro.com/PRODUCT/SUPERServer/SuperServer5012B-E.htm

--
Chris Wilson
[email protected]

2002-04-05 14:44:49

by Zwane Mwaikambo

Subject: Re: P4/i845 Strange clock drifting

On Fri, 5 Apr 2002, Chris Wilson wrote:

> > -dj2 P4 thermal patch is a bit broken (my bad), but the fact that it
> > doesn't detect an APIC means that code would, erm do interesting things...
>
> <grin>
>
> I've now tried a couple more kernels to no avail - nothing can find APICs.
> Is it even possible for a P4 to not have a local APIC? System is a
> supermicro 5012B*.
>
> /proc/cpuinfo shows:
>
> flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
>
> (notice no "apic"). Is this normal/correct? If I just removed the check
> from apic.c and tried to enable the APIC anyway, are bad things going
> to happen?

All P4s have a local APIC, however your bios can play a part in making it
unavailable (global enable flag in apic base MSR). Please send me your
dmesg.
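
For reference, the "global enable flag" is bit 11 of the IA32_APIC_BASE
MSR (address 0x1b). Below is a minimal Python sketch of decoding the low
32 bits of that register and of setting the flag the way a kernel-side
fix would. The function names are hypothetical, the bit layout is the
documented one, and actually reading or writing the MSR (which needs
ring 0 or the msr driver) is not shown.

```python
# Bit layout of the low 32 bits of IA32_APIC_BASE (MSR 0x1b).
APICBASE_BSP = 1 << 8            # set on the bootstrap processor
APICBASE_ENABLE = 1 << 11        # global enable flag a BIOS may clear
APICBASE_BASE_MASK = 0xFFFFF << 12
APIC_DEFAULT_PHYS_BASE = 0xFEE00000


def decode_apicbase(low):
    """Split the low word of the MSR into its fields (hypothetical helper)."""
    return {
        "bsp": bool(low & APICBASE_BSP),
        "enabled": bool(low & APICBASE_ENABLE),
        "base": low & APICBASE_BASE_MASK,
    }


def force_enable(low, base=APIC_DEFAULT_PHYS_BASE):
    """Return the value to write back: enable flag set, base address reset."""
    low &= ~APICBASE_BASE_MASK
    return low | APICBASE_ENABLE | base
```

If the "enabled" field comes back False, the CPUID "apic" feature bit is
hidden as well, which would explain why it is missing from /proc/cpuinfo.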

> I've also noticed [probably unrelated but...] that I can't reboot the box
> without use of the reset button - it doesn't come up after /sbin/reboot -f
> either. It's at a colo facility so I can't see what's being displayed
> until I dig out a null modem cable and go for a drive... :)

Have you tried the various reboot kernel parameters? You can try the
following.

reboot=w - Sets warm reboot flag
reboot=c - Sets cold reboot flag
reboot=b - Reboot via jump to BIOS

and finally if you're really desperate ;)

reboot=h - do a hard reboot; I think this does a triple fault

Cheers,
Zwane

--
http://function.linuxpower.ca



2002-04-05 15:01:52

by Chris Wilson

Subject: Re: P4/i845 Strange clock drifting


Hi Zwane,

Thanks for your message.

> All P4s have a local APIC, however your bios can play a part in making it
> unavailable (global enable flag in apic base MSR). Please send me your
> dmesg.

I've looked through the motherboard manual and it doesn't look like there
are any settings that should affect this. dmesg is at the bottom of this
message.

> Have you tried the various reboot kernel parameters? You can try the
> following.
>
> reboot=w - Sets warm reboot flag
> reboot=c - Sets cold reboot flag
> reboot=b - Reboot via jump to BIOS
>
> and finally if you're really desperate ;)
>
> reboot=h - do a hard reboot; I think this does a triple fault

Thanks - I wasn't aware of them! I'll see whether that fixes the reboot problem!

Cheers,

Chris

dmesg:

Linux version 2.4.18 (root@lightning) (gcc version 2.95.3 20010315 (release)) #5 Wed Apr 3 19:21:02 BST 2002
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
127MB HIGHMEM available.
On node 0 totalpages: 262128
zone(0): 4096 pages.
zone(1): 225280 pages.
zone(2): 32752 pages.
No local APIC present or hardware disabled
Kernel command line: auto BOOT_IMAGE=Linux ro root=900
Initializing CPU#0
Detected 1999.834 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 3984.58 BogoMIPS
Memory: 1029912k/1048512k available (1019k kernel code, 18216k reserved, 300k data, 212k init, 131008k highmem)
Dentry-cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
Mount-cache hash table entries: 16384 (order: 5, 131072 bytes)
Buffer-cache hash table entries: 65536 (order: 6, 262144 bytes)
Page-cache hash table entries: 262144 (order: 8, 1048576 bytes)
CPU: Before vendor init, caps: 3febf9ff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: After vendor init, caps: 3febf9ff 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: 3febf9ff 00000000 00000000 00000000
CPU: Common caps: 3febf9ff 00000000 00000000 00000000
CPU: Intel(R) Pentium(R) 4 CPU 2.00GHz stepping 04
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch ([email protected])
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xfb1a0, last bus=2
PCI: Using configuration type 1
PCI: Probing PCI hardware
Unknown bridge resource 0: assuming transparent
Unknown bridge resource 1: assuming transparent
Unknown bridge resource 2: assuming transparent
Unknown bridge resource 2: assuming transparent
PCI: Using IRQ router PIIX [8086/2440] at 00:1f.0
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
allocated 32 pages and 32 bhs reserved for the highmem bounces
Detected PS/2 Mouse Port.
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
block: 128 slots per queue, batch=32
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX4: IDE controller on PCI bus 00 dev f9
PIIX4: chipset revision 18
PIIX4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:pio
keyboard: Timeout - AT keyboard not present?(ed)
keyboard: Timeout - AT keyboard not present?(f4)
hda: IC35L040AVVA07-0, ATA DISK drive
hdb: IC35L040AVVA07-0, ATA DISK drive
hdc: MATSHITA CR-177, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: 80418240 sectors (41174 MB) w/1863KiB Cache, CHS=5005/255/63, UDMA(100)
hdb: 80418240 sectors (41174 MB) w/1863KiB Cache, CHS=5005/255/63, UDMA(100)
hdc: ATAPI 24X CD-ROM drive, 128kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
Partition check:
hda: hda1 hda2
hdb: hdb1 hdb2
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <[email protected]> and others
PCI: Found IRQ 12 for device 02:06.0
eth0: OEM i82557/i82558 10/100 Ethernet, 00:30:48:51:01:AF, IRQ 12.
Board assembly 000000-000, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).
PCI: Found IRQ 11 for device 02:07.0
PCI: Sharing IRQ 11 with 02:08.0
eth1: OEM i82557/i82558 10/100 Ethernet, 00:30:48:51:01:AE, IRQ 11.
Board assembly 000000-000, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).
md: linear personality registered as nr 1
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
[events: 00000024]
[events: 00000024]
md: autorun ...
md: considering hdb1 ...
md: adding hdb1 ...
md: adding hda1 ...
md: created md0
md: bind<hda1,1>
md: bind<hdb1,2>
md: running: <hdb1><hda1>
md: hdb1's event counter: 00000024
md: hda1's event counter: 00000024
md: RAID level 1 does not need chunksize! Continuing anyway.
md0: max total readahead window set to 124k
md0: 1 data-disks, max readahead per data-disk: 124k
raid1: device hdb1 operational as mirror 0
raid1: device hda1 operational as mirror 1
raid1: raid set md0 active with 2 out of 2 mirrors
md: updating md0 RAID superblock on device
md: hdb1 [events: 00000025]<6>(write) hdb1's sb offset: 39150272
md: hda1 [events: 00000025]<6>(write) hda1's sb offset: 39150272
md: ... autorun DONE.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 8192 buckets, 64Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 212k freed
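
The "caps" word in the boot log above (3febf9ff) is the CPUID feature
word that /proc/cpuinfo prints as flag names, so the hidden APIC can be
read straight from it. A small Python sketch, assuming the standard bit
positions of CPUID leaf 1 EDX and the flag names Linux prints;
decode_caps is a hypothetical helper, not a kernel interface.

```python
# CPUID leaf 1, EDX: bit position -> flag name as shown in /proc/cpuinfo.
# Bits 10 and 20 are reserved and bits 30/31 are left out of this table.
FEATURE_BITS = {
    0: "fpu", 1: "vme", 2: "de", 3: "pse", 4: "tsc", 5: "msr",
    6: "pae", 7: "mce", 8: "cx8", 9: "apic", 11: "sep", 12: "mtrr",
    13: "pge", 14: "mca", 15: "cmov", 16: "pat", 17: "pse36", 18: "pn",
    19: "clflush", 21: "dts", 22: "acpi", 23: "mmx", 24: "fxsr",
    25: "sse", 26: "sse2", 27: "ss", 28: "ht", 29: "tm",
}


def decode_caps(word):
    """Return the flag names whose bits are set in the feature word."""
    return [name for bit, name in sorted(FEATURE_BITS.items())
            if (word >> bit) & 1]


caps = 0x3febf9ff                  # first "caps" word from the dmesg above
flags = decode_caps(caps)
print(" ".join(flags))
print("apic present:", "apic" in flags)   # bit 9 is clear in this word
```

Bit 9 is clear in 3febf9ff, which matches both the flags line from
/proc/cpuinfo and the "No local APIC present or hardware disabled"
message.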

2002-04-05 16:41:46

by Maciej W. Rozycki

Subject: Re: P4/i845 Strange clock drifting

On Fri, 5 Apr 2002, Chris Wilson wrote:

> No local APIC present or hardware disabled

Thanks for the report. According to docs the following patch should
help. Not tested at all but trivial enough to work out of the box.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

patch-2.4.18-p4-upapic-0
diff -up --recursive --new-file linux-2.4.18.macro/arch/i386/kernel/apic.c linux-2.4.18/arch/i386/kernel/apic.c
--- linux-2.4.18.macro/arch/i386/kernel/apic.c 2002-03-01 14:48:39.000000000 +0000
+++ linux-2.4.18/arch/i386/kernel/apic.c 2002-04-05 16:38:11.000000000 +0000
@@ -598,7 +598,7 @@ static int __init detect_init_APIC (void
 		goto no_apic;
 	case X86_VENDOR_INTEL:
 		if (boot_cpu_data.x86 == 6 ||
-		    (boot_cpu_data.x86 == 15 && cpu_has_apic) ||
+		    boot_cpu_data.x86 == 15 ||
 		    (boot_cpu_data.x86 == 5 && cpu_has_apic))
 			break;
 		goto no_apic;
@@ -610,7 +610,8 @@ static int __init detect_init_APIC (void
 		/*
 		 * Some BIOSes disable the local APIC in the
 		 * APIC_BASE MSR. This can only be done in
-		 * software for Intel P6 and AMD K7 (Model > 1).
+		 * software for Intel P6, Intel P4 and AMD K7
+		 * (Model > 1).
 		 */
 		rdmsr(MSR_IA32_APICBASE, l, h);
 		if (!(l & MSR_IA32_APICBASE_ENABLE)) {

2002-04-05 17:52:20

by Mikael Pettersson

Subject: Re: P4/i845 Strange clock drifting

On Fri, 5 Apr 2002 13:28:10 +0100, Chris Wilson wrote:
>I've now tried a couple more kernels to no avail - nothing can find APICs.
>Is it even possible for a P4 to not have a local APIC? System is a
>supermicro 5012B*.
>
>/proc/cpuinfo shows:
>
>flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
>
>(notice no "apic"). Is this normal/correct? If I just removed the check
>from apic.c and tried to enable the APIC anyway, are bad things going
>to happen?

Your P4 does contain a local APIC, but your BIOS chose to disable it.
The following patch should (re)enable it:

--- linux-2.5.7/arch/i386/kernel/apic.c.~1~ Sat Mar 9 12:53:12 2002
+++ linux-2.5.7/arch/i386/kernel/apic.c Fri Apr 5 19:35:14 2002
@@ -603,7 +603,7 @@
 		goto no_apic;
 	case X86_VENDOR_INTEL:
 		if (boot_cpu_data.x86 == 6 ||
-		    (boot_cpu_data.x86 == 15 && cpu_has_apic) ||
+		    boot_cpu_data.x86 == 15 ||
 		    (boot_cpu_data.x86 == 5 && cpu_has_apic))
 			break;
 		goto no_apic;
@@ -615,7 +615,7 @@
 		/*
 		 * Some BIOSes disable the local APIC in the
 		 * APIC_BASE MSR. This can only be done in
-		 * software for Intel P6 and AMD K7 (Model > 1).
+		 * software for Intel P6/P4 and AMD K7 (Model > 1).
 		 */
 		rdmsr(MSR_IA32_APICBASE, l, h);
 		if (!(l & MSR_IA32_APICBASE_ENABLE)) {

This should work (and is known to work on many P6 and K7 boards),
but your BIOS may have problems with the local APIC.
- does apm --suspend work? does the resume afterwards work?
- if you run something compute-intensive for a while, does it
continue working ok or does it hang suddenly?
If your box remained stable, great!

If it experienced problems like unexpected hangs, then we'll need to
prevent the local APIC from being enabled on this mainboard.
In this case, please apply the patch below, reconfigure without
local APIC support, rebuild and send me (not the list) the DMI strings
printed during boot -- or the entire boot log if you're not certain
which parts are the DMI strings.

/Mikael

--- linux-2.5.7/arch/i386/kernel/dmi_scan.c.~1~ Tue Mar 19 01:10:03 2002
+++ linux-2.5.7/arch/i386/kernel/dmi_scan.c Fri Apr 5 19:35:33 2002
@@ -21,8 +21,8 @@
 	u16	handle;
 };
 
-#define dmi_printk(x)
-//#define dmi_printk(x) printk x
+//#define dmi_printk(x)
+#define dmi_printk(x) printk x
 
 static char * __init dmi_string(struct dmi_header *dm, u8 s)
 {

2002-04-08 13:49:08

by Chris Wilson

Subject: Re: P4/i845 Strange clock drifting


Hi Mikael,

I've now applied your patch so that the local APIC gets enabled even
though the BIOS has disabled it.

I really should have saved the dmi messages to another machine when the
machine first came up as it appears to have crashed now :(

The machine was completely idle other than for an ssh session running
ntpdate every ten minutes (my original problem is that the clock appears
to stop/slow down periodically).

From memory, the DMI messages just had a BIOS date and the mobo model
(Supermicro P4SBE). Once I've got the NOC to reboot the system I'll switch
it back to a kernel that doesn't attempt to enable the APIC and send you
the full dmi strings.

> This should work (and is known to work on many P6 and K7 boards),
> but your BIOS may have problems with the local APIC.
> - does apm --suspend work? does the resume afterwards work?

I don't have APM support compiled in, and I believe I disabled everything
power-management related in the BIOS setup.

I guess that I'm not going to get APIC going on this system then - does
anyone have any other suggestions on what to try next to debug the clock
drift problem? Short of dropping the server into the nearest skip....

Are the APIC thing and the clock thing likely to be related??


Cheers,

Chris

2002-04-09 09:33:19

by Chris Wilson

Subject: Re: P4/i845 Strange clock drifting



An update...

> I've now applied your patch so that the local APIC gets enabled even
> though the BIOS has disabled it.
>
> I really should have saved the dmi messages to another machine when the
> machine first came up as it appears to have crashed now :(
>
> The machine was completely idle other than for an ssh session running
> ntpdate every ten minutes (my original problem is that the clock appears
> to stop/slow down periodically).

Back without APIC now - and the machine is stable again. Clock is still
screwed however.

> > This should work (and is known to work on many P6 and K7 boards),
> > but your BIOS may have problems with the local APIC.
> > - does apm --suspend work? does the resume afterwards work?
>
> I don't have APM support compiled in and will have disabled anything power
> management related in the BIOS setup.

Last night I set another machine doing:

(while sleep 600; do ssh screwedbox ntpdate goodbox; done) | tee ntp.log

The results (attached) are interesting. It appears that my clock drift
problem occurs every hour. Perhaps it is APM related?? Load doesn't seem
to make any difference. And there are never two consecutive ntpdate runs
that are a long way out - suggesting that whenever the problem occurs it's
then fixed temporarily when ntpdate sets the clock.

Anyone have any suggestions as to what to try next? Shall I try compiling
a kernel with APM support and see if I can turn off anything that might be
causing trouble? I'm *sure* I turned off any mention of power management in
the BIOS....
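
A log like the one from the loop above can be checked for the hourly
pattern mechanically. The Python sketch below assumes each ntpdate line
ends in "offset <seconds> sec" (its usual output) and that one line is
logged per 600-second run. The helper names and the 1-second threshold
are hypothetical, and a failed ssh run would leave a gap that throws the
spacing off.

```python
import re

# Matches the trailing "offset 12.345678 sec" part of an ntpdate line.
OFFSET_RE = re.compile(r"offset\s+(-?\d+(?:\.\d+)?)\s+sec")


def offsets(lines):
    """Return the offset in seconds from every line that reports one."""
    found = []
    for line in lines:
        match = OFFSET_RE.search(line)
        if match:
            found.append(float(match.group(1)))
    return found


def big_jumps(values, threshold=1.0):
    """Indexes of the runs whose absolute offset exceeds the threshold."""
    return [i for i, v in enumerate(values) if abs(v) > threshold]


def spacing(indexes, interval=600):
    """Seconds between consecutive flagged runs, given the loop interval."""
    return [(b - a) * interval for a, b in zip(indexes, indexes[1:])]
```

Feeding the lines of ntp.log through offsets(), big_jumps() and then
spacing() should give values close to 3600 if the clock really does slip
once an hour.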

Regards,

Chris


Attachments:
ntp.log.gz (1.11 kB)