2007-11-29 16:20:04

by Paul Rolland

[permalink] [raw]
Subject: constant_tsc and TSC unstable

Hello,

I've a machine with a Core2Duo CPU. /proc/cpuinfo reports the flag
constant_tsc, but at boot time, I have the log :

...
Total of 2 processors activated (6919.15 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
checking TSC synchronization [CPU#0 -> CPU#1]:
Measured 3978592228 cycles TSC warp between CPUs, turning off TSC clock.
Marking TSC unstable due to: check_tsc_sync_source failed.
Brought up 2 CPUs
...

This machine is running 2.6.23.1-21.fc7. I know I should report to Fedora,
but I was wondering if this is a bug or a feature ;)

Regards,
Paul


2007-11-29 19:04:53

by H. Peter Anvin

[permalink] [raw]
Subject: Re: constant_tsc and TSC unstable

Paul Rolland (ポール・ロラン) wrote:
> Hello,
>
> I've a machine with a Core2Duo CPU. /proc/cpuinfo reports the flag
> constant_tsc, but at boot time, I have the log :
>
> ...
> Total of 2 processors activated (6919.15 BogoMIPS).
> ENABLING IO-APIC IRQs
> ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
> checking TSC synchronization [CPU#0 -> CPU#1]:
> Measured 3978592228 cycles TSC warp between CPUs, turning off TSC clock.
> Marking TSC unstable due to: check_tsc_sync_source failed.
> Brought up 2 CPUs
> ...
>
> This machine is running 2.6.23.1-21.fc7. I know I should report to Fedora,
> but I was wondering if this is a bug or a feature ;)
>

constant_tsc means the TSC doesn't change with CPU core frequency.

The problem you're having is that the TSCs of your two cores are
completely different, over a second apart. This is a bug, unrelated to
constant_tsc.

-hpa



2007-11-29 21:27:06

by Michael Tokarev

[permalink] [raw]
Subject: Re: constant_tsc and TSC unstable

H. Peter Anvin wrote:
> Paul Rolland (ポール・ロラン) wrote:
[]
>> Measured 3978592228 cycles TSC warp between CPUs, turning off TSC clock.
>> Marking TSC unstable due to: check_tsc_sync_source failed.
[]
>> but I was wondering if this is a bug or a feature ;)

> The problem you're having is that the TSCs of your two cores are
> completely different, over a second apart. This is a bug, unrelated to
> constant_tsc.

A bug in where - in the CPU or in kernel?

The thing is that all our dual-core machines shows something like
that.

For example, dualcore PentiumD machine:
Nov 7 20:23:56 paltus kernel: Linux version 2.6.22-i686smp ([email protected]) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #2.6.22.12 SMP Wed Nov 7 20:02:14 MSK 2007
...
Nov 7 20:23:56 paltus kernel: checking TSC synchronization [CPU#0 -> CPU#1]:
Nov 7 20:23:56 paltus kernel: Measured 112 cycles TSC warp between CPUs, turning off TSC clock.
Nov 7 20:23:56 paltus kernel: Marking TSC unstable due to: check_tsc_sync_source failed.
Nov 7 20:23:56 paltus kernel: Brought up 2 CPUs

(not that huge difference as Paul reported, but still "unstable".
The same happens with 2.6.23)

Note that once TSC is disabled (it's using "jiffies" as far
as I can see), ntpd constantly speeds up and slows down the
clock, it jumps +/- 0.5sec every several minutes or hours -
I guess that's when ntpd process gets moved from one core
to another for whatever reason. And an interesting thing
is that with 64bits kernel this TSC problem does not occur
on this very machine.

Something similar is reported on AMD X2 64 machines as well --
can't check right now.

Thanks.

/mjt

2007-11-29 21:37:01

by Patrick McHardy

[permalink] [raw]
Subject: Re: constant_tsc and TSC unstable

H. Peter Anvin wrote:
> Paul Rolland (ポール・ロラン) wrote:
>> Hello,
>>
>> I've a machine with a Core2Duo CPU. /proc/cpuinfo reports the flag
>> constant_tsc, but at boot time, I have the log :
>>
>> ...
>> Total of 2 processors activated (6919.15 BogoMIPS).
>> ENABLING IO-APIC IRQs
>> ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
>> checking TSC synchronization [CPU#0 -> CPU#1]:
>> Measured 3978592228 cycles TSC warp between CPUs, turning off TSC clock.
>> Marking TSC unstable due to: check_tsc_sync_source failed.
>> Brought up 2 CPUs
>> ...
>>
>> This machine is running 2.6.23.1-21.fc7. I know I should report to
>> Fedora,
>> but I was wondering if this is a bug or a feature ;)
>>
>
> constant_tsc means the TSC doesn't change with CPU core frequency.
>
> The problem you're having is that the TSCs of your two cores are
> completely different, over a second apart. This is a bug, unrelated
> to constant_tsc.

On my core2duo I noticed that the TSC of the second core doesn't
start counting until the CPU is activated. Since they're not
synchronized manually anymore this makes the TSC check fail.


2007-11-29 23:29:45

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: RE: constant_tsc and TSC unstable



>-----Original Message-----
>From: [email protected]
>[mailto:[email protected]] On Behalf Of Paul
>Rolland (???$B!&(J???)
>Sent: Thursday, November 29, 2007 8:12 AM
>To: Linux Kernel
>Cc: [email protected]
>Subject: constant_tsc and TSC unstable
>
>Hello,
>
>I've a machine with a Core2Duo CPU. /proc/cpuinfo reports the flag
>constant_tsc, but at boot time, I have the log :
>
>...
>Total of 2 processors activated (6919.15 BogoMIPS).
>ENABLING IO-APIC IRQs
>..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
>checking TSC synchronization [CPU#0 -> CPU#1]:
>Measured 3978592228 cycles TSC warp between CPUs, turning off
>TSC clock.
>Marking TSC unstable due to: check_tsc_sync_source failed.
>Brought up 2 CPUs
>...
>
>This machine is running 2.6.23.1-21.fc7. I know I should
>report to Fedora,
>but I was wondering if this is a bug or a feature ;)
>

TSCs on Core 2 Duo are supposed to be in sync unless CPU supports deep idle states like C2, C3. Can you send the full /proc/cpuinfo and full dmesg.

Thanks,
Venki

2007-11-30 00:30:35

by Frans Pop

[permalink] [raw]
Subject: Re: constant_tsc and TSC unstable

Paul Rolland wrote:
> Total of 2 processors activated (6919.15 BogoMIPS).
> ENABLING IO-APIC IRQs
> ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
> checking TSC synchronization [CPU#0 -> CPU#1]:
> Measured 3978592228 cycles TSC warp between CPUs, turning off TSC clock.
> Marking TSC unstable due to: check_tsc_sync_source failed.
> Brought up 2 CPUs
> ...

Not sure if this is related, but thought I'd contribute it anyway...

I've got a Pentium D system (dual core, single processor) and I on some
boots I get "Marking TSC unstable due to check_tsc_sync_source failed" with
some cycles warp between CPUs, while most boots are OK. This kind of
inconsistency seems more due to a failure in the kernel to deal with
differences between boots than with something inherent to the hardware.

I conclude that because basically I never have any problems with the system
once it has booted and the TSC has passed.

>From my kern.logs since Okt 26, I get the following data:
2.6.23+cfs: 2 passes
2.6.23.1: 1 pass; 1 failure (48 cycles warp)
2.6.24-rc1: 15 passes
2.6.24-rc2: 13 passes; 1 failure (8 cycles warp)
2.6.24-rc3: 5 passes; 3 failures (8, 8 and 16 cycles warp)

Note that this is not a new issue. For 2.6.21/2.6.23-RCx kernels I reported
similar data in http://lkml.org/lkml/2007/9/16/45.

Cheers,
FJP

2007-11-30 06:53:26

by Paul Rolland

[permalink] [raw]
Subject: Re: constant_tsc and TSC unstable

Hello,

On Thu, 29 Nov 2007 15:29:49 -0800
"Pallipadi, Venkatesh" <[email protected]> wrote:



> TSCs on Core 2 Duo are supposed to be in sync unless CPU supports deep idle
> states like C2, C3. Can you send the full /proc/cpuinfo and full dmesg.
>
Sure I can...
[root@tux log]# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU T5300 @ 1.73GHz
stepping : 2
cpu MHz : 800.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat ps
e36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmo
n pebs bts pni monitor ds_cpl est tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 3461.13
clflush size : 64

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU T5300 @ 1.73GHz
stepping : 2
cpu MHz : 800.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat ps
e36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmo
n pebs bts pni monitor ds_cpl est tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 3458.02
clflush size : 64

Regards,
Paul


Attachments:
(No filename) (1.80 kB)
dmesg (29.98 kB)
Download all attachments

2007-11-30 06:59:56

by Paul Rolland

[permalink] [raw]
Subject: Re: constant_tsc and TSC unstable

Hello,

On Fri, 30 Nov 2007 00:26:47 +0300
Michael Tokarev <[email protected]> wrote:

> H. Peter Anvin wrote:
> > Paul Rolland (ポール・ロラン) wrote:
> []
> >> Measured 3978592228 cycles TSC warp between CPUs, turning off TSC clock.
> >> Marking TSC unstable due to: check_tsc_sync_source failed.
> []
> >> but I was wondering if this is a bug or a feature ;)
>
> > The problem you're having is that the TSCs of your two cores are
> > completely different, over a second apart. This is a bug, unrelated to
> > constant_tsc.
>
> A bug in where - in the CPU or in kernel?
Good question !

> The thing is that all our dual-core machines shows something like
> that.
>
> (not that huge difference as Paul reported, but still "unstable".
> The same happens with 2.6.23)
I've been checking my logs, and the difference is quite constant and
huge :
[root@tux log]# grep 'cycles TSC warp' messages*
messages:Nov 26 08:27:56 tux kernel: Measured 4078687691 cycles TSC warp between C
PUs, turning off TSC clock.
messages:Nov 26 17:21:21 tux kernel: Measured 3978592228 cycles TSC warp between C
PUs, turning off TSC clock.
messages.1:Nov 18 22:52:23 tux kernel: Measured 4063102940 cycles TSC warp between
CPUs, turning off TSC clock.
messages.1:Nov 19 07:19:02 tux kernel: Measured 4057192061 cycles TSC warp between
CPUs, turning off TSC clock.
messages.1:Nov 23 20:50:12 tux kernel: Measured 4064589321 cycles TSC warp between
CPUs, turning off TSC clock.
messages.2:Nov 12 08:06:44 tux kernel: Measured 4072130361 cycles TSC warp between
CPUs, turning off TSC clock.
messages.2:Nov 13 19:42:47 tux kernel: Measured 4049899451 cycles TSC warp between
CPUs, turning off TSC clock.
messages.2:Nov 17 09:27:22 tux kernel: Measured 4066629060 cycles TSC warp between
CPUs, turning off TSC clock.
messages.3:Nov 5 08:25:08 tux kernel: Measured 4086386109 cycles TSC warp between
CPUs, turning off TSC clock.
messages.3:Nov 8 13:07:08 tux kernel: Measured 4041945934 cycles TSC warp between
CPUs, turning off TSC clock.
messages.3:Nov 9 23:31:24 tux kernel: Measured 4092303059 cycles TSC warp between
CPUs, turning off TSC clock.
messages.4:Oct 29 07:28:23 tux kernel: Measured 4096946373 cycles TSC warp between
CPUs, turning off TSC clock.
messages.4:Oct 31 17:07:21 tux kernel: Measured 4046765372 cycles TSC warp between
CPUs, turning off TSC clock.
messages.4:Oct 31 17:15:09 tux kernel: Measured 4039328228 cycles TSC warp between
CPUs, turning off TSC clock.
messages.4:Oct 31 23:19:00 tux kernel: Measured 4069714246 cycles TSC warp between
CPUs, turning off TSC clock.
messages.4:Nov 1 20:33:02 tux kernel: Measured 4088199726 cycles TSC warp between
CPUs, turning off TSC clock.
messages.4:Nov 2 11:53:17 tux kernel: Measured 4079927527 cycles TSC warp between
CPUs, turning off TSC clock.
messages.4:Nov 3 09:37:16 tux kernel: Measured 4071112656 cycles TSC warp between
CPUs, turning off TSC clock.
messages.4:Nov 3 10:51:29 tux kernel: Measured 3986266219 cycles TSC warp between
CPUs, turning off TSC clock.
messages.4:Nov 4 18:14:56 tux kernel: Measured 4074214144 cycles TSC warp between
CPUs, turning off TSC clock.

> Note that once TSC is disabled (it's using "jiffies" as far
> as I can see), ntpd constantly speeds up and slows down the
> clock, it jumps +/- 0.5sec every several minutes or hours -
> I guess that's when ntpd process gets moved from one core
> to another for whatever reason. And an interesting thing
> is that with 64bits kernel this TSC problem does not occur
> on this very machine.
Hmmmm.... That could make it a problem related to kernel rather than CPU.

> Something similar is reported on AMD X2 64 machines as well --
> can't check right now.
If I recall correctly, issues with AMD X2 where related to TSC being
independant for each core and not constant (speed depending of C state).
But the reason I raise the issue is that the Core2 reports constant TSC,
so there is (IMHO) no reason for that.

Paul

2007-11-30 07:07:04

by H. Peter Anvin

[permalink] [raw]
Subject: Re: constant_tsc and TSC unstable

Paul Rolland (ポール・ロラン) wrote:
>
>> Note that once TSC is disabled (it's using "jiffies" as far
>> as I can see), ntpd constantly speeds up and slows down the
>> clock, it jumps +/- 0.5sec every several minutes or hours -
>> I guess that's when ntpd process gets moved from one core
>> to another for whatever reason. And an interesting thing
>> is that with 64bits kernel this TSC problem does not occur
>> on this very machine.
> Hmmmm.... That could make it a problem related to kernel rather than CPU.
>
>> Something similar is reported on AMD X2 64 machines as well --
>> can't check right now.
> If I recall correctly, issues with AMD X2 where related to TSC being
> independant for each core and not constant (speed depending of C state).
> But the reason I raise the issue is that the Core2 reports constant TSC,
> so there is (IMHO) no reason for that.
>

Well, "constant" doesn't mean "synchronized", but it might very well be
that the Core2 could really benefit from synchronizing the TSCs manually
like we used to.

On the other hand, I notice that most of the TSC warp values are
relatively close to 2^32, so this could be a specific bug.

-hpa

2007-12-06 00:36:29

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: RE: constant_tsc and TSC unstable



>-----Original Message-----
>From: [email protected]
>[mailto:[email protected]] On Behalf Of Paul
>Rolland (???$B!&(J???)
>Sent: Thursday, November 29, 2007 10:53 PM
>To: Pallipadi, Venkatesh
>Cc: Linux Kernel; Siddha, Suresh B; [email protected]
>Subject: Re: constant_tsc and TSC unstable
>
>Hello,
>
>On Thu, 29 Nov 2007 15:29:49 -0800
>"Pallipadi, Venkatesh" <[email protected]> wrote:
>
>
>
>> TSCs on Core 2 Duo are supposed to be in sync unless CPU
>supports deep idle
>> states like C2, C3. Can you send the full /proc/cpuinfo and
>full dmesg.
>>
>Sure I can...
>[root@tux log]# cat /proc/cpuinfo
>processor : 0
>vendor_id : GenuineIntel
>cpu family : 6
>model : 15
>model name : Intel(R) Core(TM)2 CPU T5300 @ 1.73GHz
>stepping : 2
>cpu MHz : 800.000
>cache size : 2048 KB
>physical id : 0
>siblings : 2
>core id : 0
>cpu cores : 2
>fdiv_bug : no
>hlt_bug : no
>f00f_bug : no
>coma_bug : no
>fpu : yes
>fpu_exception : yes
>cpuid level : 10
>wp : yes
>flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr
>pge mca cmov pat ps
>e36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
>constant_tsc arch_perfmo
>n pebs bts pni monitor ds_cpl est tm2 ssse3 cx16 xtpr lahf_lm
>bogomips : 3461.13
>clflush size : 64
>

Tried reproducing this here, but on a similar system (slightly newer CPU stepping) I don$B!G(Jt see this happening. This error does not matter on this particular system as even with TSC synchronization passes, TSC is going to be disabled later due to C2, C3 states, with a message like this
Marking TSC unstable due to: TSC halts in idle.

Will try to reproduce it on other systems to see whether there are any bugs in sync routines.

Thanks,
Venki