2006-10-27 17:15:10

by Lee Revell

Subject: AMD X2 unsynced TSC fix?

Someone recently pointed out to me that a Windows "CPU driver update"
supplied by AMD fixes the unsynced TSC problem on dual core AMD64
systems.

http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_871_13118,00.html

"The AMD Dual-Core Optimizer can help improve some PC gaming video
performance by compensating for those applications that bypass the
Windows API for timing by directly using the RDTSC (Read Time Stamp
Counter) instruction. Applications that rely on RDTSC do not benefit
from the logic in the operating system to properly account for the
effect of power management mechanisms on the rate at which a processor
core's Time Stamp Counter (TSC) is incremented. The AMD Dual-Core
Optimizer helps to correct the resulting video performance effects or
other incorrect timing effects that these applications may experience on
dual-core processor systems, by periodically adjusting the core
time-stamp-counters, so that they are synchronized."

What are the chances of Linux getting a similar fix?

Lee


2006-10-27 20:18:09

by Luca Tettamanti

Subject: Re: AMD X2 unsynced TSC fix?

Lee Revell <[email protected]> wrote:
> Someone recently pointed out to me that a Windows "CPU driver update"
> supplied by AMD fixes the unsynced TSC problem on dual core AMD64
> systems.
[...]
> other incorrect timing effects that these applications may experience on
> dual-core processor systems, by periodically adjusting the core
> time-stamp-counters, so that they are synchronized."
>
> What are the chances of Linux getting a similar fix?

Zero? ;)
There's always a window where the TSCs are not in sync (and userspace may
see a non-monotonic counter); furthermore, when C'n'Q is active the TSCs
aren't updated at a fixed frequency, so userspace cannot use the TSC for
timing anyway.
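A toy model of that non-monotonic window (an illustration, not kernel code): assume each CPU's TSC is the true cycle count plus a fixed per-CPU offset, with made-up offset values. A thread that reads the counter on one CPU and then migrates to another can see it step backwards:

```python
# Toy model: each CPU's TSC = true cycle count + a fixed per-CPU offset.
# The offsets are hypothetical values chosen for illustration.
CPU_OFFSETS = {0: 0, 1: -5_000}  # CPU 1 lags CPU 0 by 5000 cycles


def read_tsc(cpu: int, true_cycles: int) -> int:
    """What RDTSC would report on `cpu` when `true_cycles` have elapsed."""
    return true_cycles + CPU_OFFSETS[cpu]


def first_backward_step(samples):
    """Index of the first sample smaller than its predecessor, else None."""
    for i in range(1, len(samples)):
        if samples[i] < samples[i - 1]:
            return i
    return None


# Read on CPU 0, migrate to CPU 1, read again 1000 cycles later.
reads = [read_tsc(0, 100_000), read_tsc(1, 101_000)]
print(reads, first_backward_step(reads))  # [100000, 96000] 1
```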


Luca
--
> While we're on all of this, are we going to change "tained" to some
> other less alarmist word?
"screwed" -- Alexander Viro

2006-10-27 20:35:54

by Andi Kleen

Subject: Re: AMD X2 unsynced TSC fix?


> What are the chances of Linux getting a similar fix?

Fix isn't the right word I would use for this particular implementation.

-Andi

2006-10-27 20:41:23

by Lee Revell

Subject: Re: AMD X2 unsynced TSC fix?

On Fri, 2006-10-27 at 13:35 -0700, Andi Kleen wrote:
> > What are the chances of Linux getting a similar fix?
>
> Fix isn't the right word I would use for this particular implementation.

What exactly does that AMD patch do? Other OS users report that it
makes TSC usable for timing again. Does it do something really heavy
handed like disable power management features?

Lee

2006-10-27 21:48:17

by Chris Friesen

Subject: Re: AMD X2 unsynced TSC fix?

Lee Revell wrote:

> What exactly does that AMD patch do?

"...by periodically adjusting the core time-stamp-counters, so that they
are synchronized."

It sounds like they just periodically write a new value to the TSC.
Presumably they set the "slower" one equal to the "faster" one.

You'd likely still have windows where time might run backwards, but it
would be better than nothing.
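Chris's guess can be modelled in a few lines. This sketch assumes a hypothetical driver that copies the faster core's counter into the slower one every `resync_period` cycles, with the slower core running at `slow_rate_ppm` millionths of the faster one's rate; all numbers are illustrative. Between resyncs the lag grows again, so a migrating reader can still see time run backwards, but the worst case shrinks with the resync period:

```python
PPM = 1_000_000


def tsc_pair(t, slow_rate_ppm, resync_period):
    """(cpu0, cpu1) TSC readings at true time t, measured in cpu0 cycles.

    cpu0 runs at full rate; cpu1 runs at slow_rate_ppm / 1e6 of that rate.
    Every resync_period cycles a hypothetical driver copies cpu0's value
    into cpu1, after which cpu1 starts falling behind again.
    """
    last_resync = (t // resync_period) * resync_period
    cpu0 = t
    cpu1 = last_resync + (t - last_resync) * slow_rate_ppm // PPM
    return cpu0, cpu1


def worst_case_lag(slow_rate_ppm, resync_period):
    """Approximate cpu0 - cpu1 gap just before a resync."""
    return resync_period * (PPM - slow_rate_ppm) // PPM


# Read on cpu0, migrate, read on cpu1 500 cycles later, just before a resync.
before = tsc_pair(999_000, 999_000, 1_000_000)[0]
after = tsc_pair(999_500, 999_000, 1_000_000)[1]
print(before, after, after < before)        # 999000 998500 True
print(worst_case_lag(999_000, 1_000_000))   # 1000
print(worst_case_lag(999_000, 100_000))     # 100
```

Resyncing ten times as often cuts the worst-case gap tenfold, but it never reaches zero.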

Chris

2006-10-27 21:58:47

by Friedrich Göpel

Subject: Re: AMD X2 unsynced TSC fix?

On 13:15 Fri 27 Oct , Lee Revell wrote:
> Someone recently pointed out to me that a Windows "CPU driver update"
> supplied by AMD fixes the unsynced TSC problem on dual core AMD64
> systems.
>
...
> What are the chances of Linux getting a similar fix?
>
> Lee
>

Hi,

This post earlier seems to suggest someone is indeed working on
something similar, if I'm understanding this correctly:
http://lkml.org/lkml/2006/10/27/27

quote:
> Jiri Bohac ([email protected]) is currently working on a new timekeeping code for
> x86-64 that takes a significantly different approach that allows for
> precise and fast gettimeofday even on CPUs with unsynchronized TSCs.

Cheers,

Friedrich Göpel

2006-10-27 22:08:22

by Lee Revell

Subject: Re: AMD X2 unsynced TSC fix?

On Fri, 2006-10-27 at 15:48 -0600, Chris Friesen wrote:
> Lee Revell wrote:
>
> > What exactly does that AMD patch do?
>
> "...by periodically adjusting the core time-stamp-counters, so that they
> are synchronized."
>
> It sounds like they just periodically write a new value to the TSC.
> Presumably they set the "slower" one equal to the "faster" one.
>
> You'd likely still have windows where time might run backwards, but it
> would be better than nothing.

The patch also apparently changes boot params to make the OS use the
ACPI PM timer, so it must not be a complete solution.

Lee

2006-10-27 23:05:05

by Tim Hockin

Subject: Re: AMD X2 unsynced TSC fix?

On Fri, Oct 27, 2006 at 10:18:20PM +0200, Luca Tettamanti wrote:
> Lee Revell <[email protected]> wrote:
> > Someone recently pointed out to me that a Windows "CPU driver update"
> > supplied by AMD fixes the unsynced TSC problem on dual core AMD64
> > systems.
> [...]
> > other incorrect timing effects that these applications may experience on
> > dual-core processor systems, by periodically adjusting the core
> > time-stamp-counters, so that they are synchronized."
> >
> > What are the chances of Linux getting a similar fix?
>
> Zero? ;)

Wrong. We have a fix that has been under serious testing for a long time.

> There's always a window where the TSCs are not in sync (and userspace may
> see a non-monotonic counter); furthermore, when C'n'Q is active the TSCs
> aren't updated at a fixed frequency, so userspace cannot use the TSC for
> timing anyway.

Wrong, too. We have a patch that will be coming SOON (trust me, I am
pushing hard for the author to publish it). With this patch applied you
should never see the TSC go backwards. Period. It should be monotonic
(to userspace, kernel rdtsc calls can still be wrong). CPUs should stay
very nearly in sync (again, to userspace). The overhead of this patch is
pretty minimal and costs nothing unless you actually read the TSC.

The catch is that, while it is monotonic, it is not guaranteed to be
perfectly linear. For many applications, this will be good enough. Time
will always move forward, and you won't be subject to the weird HZ
granularity gettimeofday that unsynced TSCs can show.
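One simple way to get "monotonic but not perfectly linear" is to never hand out a value smaller than the last one. This is only a single-threaded sketch of the idea; the patch itself is not shown in this thread, so its actual method is unknown, and a real implementation shared between CPUs would need an atomic compare-and-swap on the stored value:

```python
class MonotonicCounter:
    """Never hand out a value smaller than the previous one.

    Single-threaded sketch only; a real implementation shared between CPUs
    would need an atomic compare-and-swap on `last`.
    """

    def __init__(self) -> None:
        self.last = 0

    def read(self, raw: int) -> int:
        value = max(raw, self.last)  # clamp backward steps to a flat spot
        self.last = value
        return value


clock = MonotonicCounter()
# Raw readings taken on two CPUs whose counters disagree by ~5000 cycles.
print([clock.read(r) for r in (100_000, 96_000, 97_000, 101_500)])
# [100000, 100000, 100000, 101500]: monotonic, but with a flat stretch
```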

I'm BCCing the author to poke him more publicly.

Tim

2006-10-28 00:00:14

by Luca Tettamanti

Subject: Re: AMD X2 unsynced TSC fix?

On 10/28/06, [email protected] <[email protected]> wrote:
> On Fri, Oct 27, 2006 at 10:18:20PM +0200, Luca Tettamanti wrote:
> > There's always a window where the TSCs are not in sync (and userspace may
> > see a non-monotonic counter); furthermore, when C'n'Q is active the TSCs
> > aren't updated at a fixed frequency, so userspace cannot use the TSC for
> > timing anyway.
>
> Wrong, too. We have a patch that will be coming SOON (trust me, I am
> pushing hard for the author to publish it). With this patch applied you
> should never see the TSC go backwards. Period. It should be monotonic
> (to userspace, kernel rdtsc calls can still be wrong). CPUs should stay
> very nearly in sync (again, to userspace). The overhead of this patch is
> pretty minimal and costs nothing unless you actually read the TSC.

I know that it's possible to resync the TSCs, but:

> The catch is that, while it is monotonic, it is not guaranteed to be
> perfectly linear. For many applications, this will be good enough. Time
> will always move forward, and you won't be subject to the weird HZ
> granularity gettimeofday that unsynced TSCs can show.

As you say you cannot use it to do timing unless you disable any power
management on the CPU. Otherwise you can count the elapsed ticks but
you cannot convert the number to anything meaningful.
You may be able to emulate rdtsc for userspace but then again the
whole point of using rdtsc is that it should be uber-fast... if rdtsc
is emulated then you can just use gettimeofday (which is also
optimized to be *very* fast). No?
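Luca's conversion problem in numbers, assuming (as on these pre-invariant-TSC parts) that the counter advances at the core's current clock rate; the frequencies are hypothetical:

```python
def elapsed_ticks(segments):
    """TSC ticks over (seconds, core_hz) segments, assuming the TSC advances
    at the core's current clock rate."""
    return sum(seconds * hz for seconds, hz in segments)


def naive_seconds(ticks, nominal_hz):
    """Convert ticks to seconds as if the core always ran at nominal_hz."""
    return ticks / nominal_hz


# Hypothetical: 1 s at 2.0 GHz, then 1 s after Cool'n'Quiet drops to 1.0 GHz.
segments = [(1, 2_000_000_000), (1, 1_000_000_000)]
ticks = elapsed_ticks(segments)
print(ticks, naive_seconds(ticks, 2_000_000_000))  # 3000000000 1.5
```

Two seconds of wall time come out as 1.5 s, and nothing in the tick count alone says which part of the interval ran slowly.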

Luca

2006-10-28 00:17:36

by Lee Revell

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, 2006-10-28 at 02:00 +0200, Luca Tettamanti wrote:
>
> As you say you cannot use it to do timing unless you disable any power
> management on the CPU. Otherwise you can count the elapsed ticks but
> you cannot convert the number to anything meaningful.
> You may be able to emulate rdtsc for userspace but then again the
> whole point of using rdtsc is that it should be uber-fast... if rdtsc
> is emulated then you can just use gettimeofday (which is also
> optimized to be *very* fast). No?

gettimeofday() cannot be fast if it has to use the ACPI PM timer. It's
50% slower on my shiny new "AMD Athlon(tm)64 X2 Dual Core Processor
3800+" than on my 600MHz VIA C3, which in general is about a 10x slower
machine. That's a massive regression.
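A rough way to compare gettimeofday() cost on one machine under different clocksources (for example, booting with clocksource=acpi_pm versus clocksource=tsc where the kernel supports that parameter) is a tight timing loop. This sketch uses Python's time.time(), which on Linux ends up in clock_gettime() or gettimeofday() depending on the build; interpreter overhead is included, so only compare numbers taken on the same machine:

```python
import time


def ns_per_call(clock=time.time, calls=200_000):
    """Average cost of one `clock()` call in nanoseconds, including Python's
    own call overhead."""
    start = time.perf_counter_ns()
    for _ in range(calls):
        clock()
    return (time.perf_counter_ns() - start) / calls


print(f"{ns_per_call():.0f} ns per time.time() call")
```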

Lee

2006-10-28 01:05:23

by Andi Kleen

Subject: Re: AMD X2 unsynced TSC fix?


> Wrong, too. We have a patch that will be coming SOON (trust me, I am
> pushing hard for the author to publish it). With this patch applied you
> should never see the TSC go backwards. Period. It should be monotonic
> (to userspace, kernel rdtsc calls can still be wrong). CPUs should stay
> very nearly in sync (again, to userspace). The overhead of this patch is
> pretty minimal and costs nothing unless you actually read the TSC.

There is another patch in the pipeline to make gettimeofday use
RDTSC in more cases by keeping the offsets per CPU.

(this has nothing to do with syncing TSCs, which is not possible
in the general case on several platforms)

I don't think it makes too much sense to hack on pure RDTSC when
gtod is fast enough -- RDTSC will be always icky and hard to use.

-Andi

2006-10-28 02:46:45

by Tim Hockin

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, Oct 28, 2006 at 02:00:11AM +0200, Luca Tettamanti wrote:

> I know that it's possible to resync the TSCs, but:
>
> >The catch is that, while it is monotonic, it is not guaranteed to be
> >perfectly linear. For many applications, this will be good enough. Time
> >will always move forward, and you won't be subject to the weird HZ
> >granularity gettimeofday that unsynced TSCs can show.
>
> As you say you cannot use it to do timing unless you disable any power
> management on the CPU. Otherwise you can count the elapsed ticks but
> you cannot convert the number to anything meaningful.

If you have a third-party clock you can get pretty darn close.
Fortunately, we usually have an HPET, these days. You can definitely
resync and get near-linear values of RDTSC.

> You may be able to emulate rdtsc for userspace but then again the
> whole point of using rdtsc is that it should be uber-fast... if rdtsc
> is emulated then you can just use gettimeofday (which is also
> optimized to be *very* fast). No?

We're not emulating it at all. The vast vast vast majority of rdtsc calls
are nothing more than the RDTSC instruction. RDTSC is faster than
gettimeofday(), necessarily. If gettimeofday() uses RDTSC, then the
gettimeofday() vsyscall will be pretty good.

But, if I recall, i386 does not support vsyscall? 32-bit binaries on
x86_64 do not support vsyscall. There is still a need for very fast
pure RDTSC.

There are a few problems at hand. I'm not familiar with the patch Andi's
talking about, but it has to solve all these problems to be really useful:

* TSC skew across CPUs at bootup (Linux handles this already)
* TSC drift across CPUs at the "same" frequency (pretty constant, minimal)
* TSC drift because of PM states, such as C1 (hlt) (semi-random, severe)
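The three error sources grow very differently. A back-of-the-envelope sketch, assuming a hypothetical 2 GHz core and modelling time spent in the power-managed state as a fully stopped TSC (a simplification; on real parts the counter may slow rather than stop):

```python
HZ = 2_000_000_000  # hypothetical 2 GHz core


def tsc_error_cycles(uptime_s, boot_skew, drift_ppm, halted_s, hz=HZ):
    """(skew, drift, pm) error in cycles for one CPU relative to a reference.

    boot_skew: cycles left over after boot-time synchronization (constant)
    drift_ppm: steady rate difference; the error grows with uptime
    halted_s:  seconds spent in a power state where this TSC is assumed
               to stop completely (a simplification)
    """
    drift = uptime_s * hz * drift_ppm // 1_000_000
    pm = halted_s * hz
    return boot_skew, drift, pm


# One day of uptime, 1000 cycles of boot skew, 1 ppm drift, 1 s halted in total.
print(tsc_error_cycles(86_400, 1_000, 1, 1))  # (1000, 172800000, 2000000000)
```

Even 1 ppm of drift adds up to about 86 ms per day at this clock rate, and the power-management term dwarfs both of the others.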

Anyway, I hope that all solutions will be considered. And I hope this
patch comes soon.

Tim

2006-10-28 03:28:00

by Lee Revell

Subject: Re: AMD X2 unsynced TSC fix?

On Fri, 2006-10-27 at 18:04 -0700, Andi Kleen wrote:
> I don't think it makes too much sense to hack on pure RDTSC when
> gtod is fast enough -- RDTSC will be always icky and hard to use.

I agree FWIW, our application would be happy to just use gtod if it
wasn't so slow on these machines.

Lee

2006-10-28 03:59:44

by Andi Kleen

Subject: Re: AMD X2 unsynced TSC fix?

On Friday 27 October 2006 19:46, [email protected] wrote:
> On Sat, Oct 28, 2006 at 02:00:11AM +0200, Luca Tettamanti wrote:
> > I know that it's possible to resync the TSCs, but:
> > >The catch is that, while it is monotonic, it is not guaranteed to be
> > >perfectly linear. For many applications, this will be good enough.
> > > Time will always move forward, and you won't be subject to the weird HZ
> > > granularity gettimeofday that unsynced TSCs can show.
> >
> > As you say you cannot use it to do timing unless you disable any power
> > management on the CPU. Otherwise you can count the elapsed ticks but
> > you cannot convert the number to anything meaningful.
>
> If you have a third-party clock you can get pretty darn close.

Not when powernow is involved on a multi socket system.

This means it could probably be made to work on a variety of systems,
but it wouldn't work on other systems because of that, and I don't
think it makes sense to try to fix an interface that will never
work everywhere.

> Fortunately, we usually have an HPET, these days. You can definitely
> resync and get near-linear values of RDTSC.

No, we don't -- most BIOSes still don't give us the HPET table
even when it is there in hardware. In the future this will change, sure,
but people will still run a lot of older motherboards.

> > You may be able to emulate rdtsc for userspace but then again the
> > whole point of using rdtsc is that it should be uber-fast... if rdtsc
> > is emulated then you can just use gettimeofday (which is also
> > optimized to be *very* fast). No?
>
> We're not emulating it at all. The vast vast vast majority of rdtsc calls
> are nothing more than the RDTSC instruction. RDTSC is faster than
> gettimeofday(), necessarily. If gettimeofday() uses RDTSC, then the
> gettimeofday() vsyscall will be pretty good.

Yes.

> But, if I recall, i386 does not support vsyscall?

There are ways to make it work there.

> 32-bit binaries on
> x86_64 do not support vsyscall.

And here too.

Basically you have to test for the calls in the system call vDSO
and jump off. It's a little ugly but possible. I think John had experimental
patches for this once.

> There are a few problems at hand. I'm not familiar with the patch Andi's
> talking about, but it has to solve all these problems to be really useful:

It's from Jiri and Vojtech. Basically it will allow RDTSC to be used
in gettimeofday even with unsynchronized TSCs by keeping
the necessary offsets CPU-local.

Drawback: for vsyscall you need RDTSCP, which means AMD F stepping
at least. But even as a syscall it will still be faster than before.
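A sketch of the per-CPU offset idea, not the actual Jiri/Vojtech patch: each CPU keeps its own (TSC, time) snapshot and tick-to-nanosecond multiplier, in the mult/shift fixed-point style of the kernel's clocksource code, and a reader converts a (cpu, tsc) pair using that CPU's data. RDTSCP matters for the vsyscall because it returns the TSC together with the TSC_AUX MSR, which the OS can load with the CPU number, so userspace cannot be migrated between learning its CPU and reading the counter. The class, field names and SHIFT value are hypothetical:

```python
from dataclasses import dataclass

SHIFT = 22  # hypothetical fixed-point shift for the ns-per-tick multiplier


@dataclass
class CpuTimeBase:
    """Hypothetical per-CPU snapshot, refreshed by the kernel at each update."""
    base_tsc: int  # this CPU's TSC value at the last update
    base_ns: int   # reference time in ns at the same instant
    mult: int      # ns per tick, scaled by 2**SHIFT


def mult_for(hz: int) -> int:
    """Fixed-point multiplier converting ticks of an `hz` counter to ns."""
    return (1_000_000_000 << SHIFT) // hz


def now_ns(bases, cpu: int, tsc: int) -> int:
    """Convert a (cpu, tsc) pair, as RDTSCP would return, to ns."""
    b = bases[cpu]
    return b.base_ns + (((tsc - b.base_tsc) * b.mult) >> SHIFT)


# Two CPUs whose TSCs disagree and tick at different rates, both snapshotted
# at the same reference instant (1 s).
bases = {
    0: CpuTimeBase(2_000_000_000, 1_000_000_000, mult_for(2_000_000_000)),
    1: CpuTimeBase(1_500_000_000, 1_000_000_000, mult_for(1_000_000_000)),
}
# 1 ms later CPU 0 has counted 2,000,000 ticks and CPU 1 only 1,000,000.
print(now_ns(bases, 0, 2_002_000_000), now_ns(bases, 1, 1_501_000_000))
# 1001000000 1001000000
```

Both CPUs report the same time even though their raw counters differ, which is the point of keeping the offsets per CPU instead of trying to sync the TSCs themselves.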

> * TSC skew across CPUs at bootup (Linux handles this already)

Just not very good. There is still a significant error when it's done.

> * TSC drift across CPUs at the "same" frequency (pretty constant, minimal)

It just adds up over time.

> * TSC drift because of PM states, such as C1 (hlt) (semi-random, severe)

TSC drift with powernow -- CPUs run at different frequencies

-Andi

2006-10-28 03:58:31

by Sergio Monteiro Basto

Subject: Re: AMD X2 unsynced TSC fix?

On Fri, 2006-10-27 at 18:08 -0400, Lee Revell wrote:
> On Fri, 2006-10-27 at 15:48 -0600, Chris Friesen wrote:
> > Lee Revell wrote:
> >
> > > What exactly does that AMD patch do?
> >
> > "...by periodically adjusting the core time-stamp-counters, so that they
> > are synchronized."
> >
> > It sounds like they just periodically write a new value to the TSC.
> > Presumably they set the "slower" one equal to the "faster" one.
> >
> > You'd likely still have windows where time might run backwards, but it
> > would be better than nothing.
>
> The patch also apparently changes boot params to make the OS use the
> ACPI PM timer, so it must not be a complete solution.

Hi,
As far as I can understand, it seems that my computer, which has a
Pentium D (dual core) on a VIA chipset, also has unsynchronized TSCs. With
the hrtimers patch
( http://www.tglx.de/projects/hrtimers/2.6.18/ )
the kernel found and uses a new clocksource, acpi_pm. It works stably,
though I don't deny it could be a little slower.
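On kernels with the generic clocksource code, the clocksource in use can be checked from sysfs. A small helper, assuming the usual /sys/devices/system/clocksource/clocksource0 layout (absent on older kernels and non-Linux systems, hence the None fallback):

```python
from pathlib import Path

CS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def clocksources(cs_dir: Path = CS_DIR):
    """Return (current, available) clocksource names, or None if the
    directory is missing (older kernels, non-Linux systems)."""
    try:
        current = (cs_dir / "current_clocksource").read_text().strip()
        available = (cs_dir / "available_clocksource").read_text().split()
    except OSError:
        return None
    return current, available


print(clocksources())
```

On such kernels, writing one of the available names into current_clocksource as root switches the clocksource at runtime.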

Just to point out: this could be more a problem of chipsets than of CPUs
(AMD or Intel). AMD just started using x86_64 first :)

Last note:
I still have another minor problem, which seems (to me) related to SATA
drives. Kernel 2.6.19-rc3 has big changes to SATA and I would like to test
it, but I can't apply the hrtimers patch (it seems half of it is already
in the kernel and the other half is not).
In rc3 with the jiffies clocksource, even with the boot parameter "notsc",
I have unsynchronized issues and many "lost timer ticks", but I can't say
that is a regression, because the computer never worked well.

Thanks,
--
Sérgio M. B.

2006-10-28 04:06:43

by Andi Kleen

Subject: Re: AMD X2 unsynced TSC fix?


> As far as I can understand, it seems that my computer, which has a
> Pentium D (dual core) on a VIA chipset, also has unsynchronized TSCs. With
> the hrtimers patch

Intel systems (except for some large highend systems) have synchronized TSCs.
Only exception so far seems to be a few systems that are
overclocked/overvolted and running outside their specification.
When you do that you're on your own and we're not interested in a bug
report.

There was also one BIOS found that had this problem, but it was old and rare
and got fixed with an upgrade.

> Just to point out: this could be more a problem of chipsets than of CPUs
> (AMD or Intel). AMD just started using x86_64 first :)

No.

-Andi

2006-10-28 04:22:57

by Sergio Monteiro Basto

Subject: Re: AMD X2 unsynced TSC fix?

On Fri, 2006-10-27 at 21:06 -0700, Andi Kleen wrote:
> > As far as I can understand, it seems that my computer, which has a
> > Pentium D (dual core) on a VIA chipset, also has unsynchronized TSCs. With
> > the hrtimers patch
>
> Intel systems (except for some large highend systems) have synchronized TSCs.
> Only exception so far seems to be a few systems that are
> overclocked/overvolted and running outside their specification.
> When you do that you're on your own and we're not interested in a bug
> report.

And my computer :)
http://www.asrock.com/product/775Dual-880Pro.htm
http://www.asrock.com/support/CPU_Support/show.asp?Model=775Dual-880Pro
On Monday I will check whether my computer is running within spec.
It seems I like buying computers with lots of Linux problems and fixing them :)

> There was also one BIOS found that had this problem, but it was old and rare
> and got fixed with an upgrade.
>
> > Just to point out: this could be more a problem of chipsets than of CPUs
> > (AMD or Intel). AMD just started using x86_64 first :)
>
> No.
>
> -Andi

2006-10-28 05:34:34

by Willy Tarreau

Subject: Re: AMD X2 unsynced TSC fix?

On Fri, Oct 27, 2006 at 11:28:00PM -0400, Lee Revell wrote:
> On Fri, 2006-10-27 at 18:04 -0700, Andi Kleen wrote:
> > I don't think it makes too much sense to hack on pure RDTSC when
> > gtod is fast enough -- RDTSC will be always icky and hard to use.
>
> I agree FWIW, our application would be happy to just use gtod if it
> wasn't so slow on these machines.

Agreed, I had to turn about 20 dual-core servers to single core because
the only way to get a monotonic gtod made it so slow that it was not
worth using a dual-core. I initially considered buying one dual-core
AMD for my own use, but after seeing this, I'm definitely sure I won't
ever buy one as long as this problem is not fixed, as it causes too
many problems.

Willy

2006-10-28 06:32:22

by Tim Hockin

Subject: Re: AMD X2 unsynced TSC fix?

On Fri, Oct 27, 2006 at 08:59:13PM -0700, Andi Kleen wrote:
> > If you have a third-party clock you can get pretty darn close.
>
> Not when powernow is involved on a multi socket system.

When CPUs are in different P-States, any resync effort will become
unsynced immediately. I agree with that. This is a further complication
that I think our code does not handle perfectly, yet.

> > Fortunately, we usually have an HPET, these days. You can definitely
> > resync and get near-linear values of RDTSC.
>
> No, we don't -- most BIOSes still don't give us the HPET table
> even when it is there in hardware. In the future this will change, sure,
> but people will still run a lot of older motherboards.

If you know where the HPET base-address-register is, can't we program it
ourselves? Even without HPET, we have PM Timer. As long as you don't
need to resync the TSCs on most gtod(), you can still do better than not
trying.
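If the PM timer serves as the resync reference, its small counter width matters. The ACPI PM timer runs at 3.579545 MHz and is 24 bits wide on most chipsets (32 bits when the FADT advertises the extended timer), so it wraps roughly every 4.7 seconds and deltas must be taken modulo the counter width. A minimal sketch:

```python
PMTMR_HZ = 3_579_545         # ACPI PM timer frequency in Hz
PMTMR_MASK = (1 << 24) - 1   # 24-bit counter; use (1 << 32) - 1 for 32-bit ones


def pmtmr_delta(earlier: int, later: int, mask: int = PMTMR_MASK) -> int:
    """Ticks between two reads, correct across at most one wraparound."""
    return (later - earlier) & mask


def ticks_to_us(ticks: int) -> float:
    """Convert PM timer ticks to microseconds."""
    return ticks * 1_000_000 / PMTMR_HZ


print(pmtmr_delta(0xFFFFF0, 0x000010))          # 32
print(round((PMTMR_MASK + 1) / PMTMR_HZ, 2))    # 4.69 (seconds per wrap)
```

Any resync scheme built on it therefore has to sample the PM timer at least once per wrap period, or the modular delta silently loses whole wraps.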

> > There are a few problems at hand. I'm not familiar with the patch Andi's
> > talking about, but it has to solve all these problems to be really useful:
>
> It's from Jiri and Vojtech. Basically it will allow RDTSC to be used
> in gettimeofday even with unsynchronized TSCs by keeping
> the necessary offsets CPU-local.

Offset from what? With automatic clock ramping in C1, the rate is
cycling up and down a lot.

> > * TSC drift because of PM states, such as C1 (hlt) (semi-random, severe)
>
> TSC drift with powernow -- CPUs run at different frequencies

Yeah, C1 is workaround-able, because the clock returns to full frequency,
and we never execute code in the reduced clock state. Powernow makes it
more fun. Not only do you need some offset, but you need some scalar.

Assume you resync TSCs to a clock (PM, HPET, whatever) any time any CPU
changes p-state. Then you can calculate the approximate TSC for now by:

tsc_now = tsc_at_last_resync + ((rdtsc - tsc_at_last_resync) * pstate_scalar)

Something like that. Not pretty, but still possible to get close. And
close might be good enough. As long as you can guarantee monotonicity and
approximate linearity, you can make most apps happy ENOUGH.
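Tim's formula written out, with pstate_scalar expressed as nominal_hz / current_hz (an assumption about what the scalar means) and integer arithmetic to avoid float rounding. It only holds if a resync happens at every P-state change, since ticks counted before the change were counted at the old rate:

```python
def tsc_now(tsc_at_last_resync: int, rdtsc: int,
            nominal_hz: int, current_hz: int) -> int:
    """Tim's estimate with pstate_scalar = nominal_hz / current_hz.

    Ticks counted since the last resync are stretched back to the nominal
    rate; a core running at half speed counts half as many ticks, so its
    ticks are doubled.
    """
    return tsc_at_last_resync + (rdtsc - tsc_at_last_resync) * nominal_hz // current_hz


# Resynced at 1,000,000; now reads 1,500,000 while running at 1 GHz of a 2 GHz part.
print(tsc_now(1_000_000, 1_500_000, 2_000_000_000, 1_000_000_000))  # 2000000
```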

Tim

2006-10-28 06:35:33

by Tim Hockin

Subject: Re: AMD X2 unsynced TSC fix?

On Fri, Oct 27, 2006 at 09:06:12PM -0700, Andi Kleen wrote:
>
> > As far as I can understand, it seems that my computer, which has a
> > Pentium D (dual core) on a VIA chipset, also has unsynchronized TSCs. With
> > the hrtimers patch
>
> Intel systems (except for some large highend systems) have synchronized TSCs.

Does Intel guarantee that, or is that just what we happen to see so far?

2006-10-28 06:47:46

by Andrew Morton

Subject: Re: AMD X2 unsynced TSC fix?

On Fri, 27 Oct 2006 23:35:24 -0700
[email protected] wrote:

> On Fri, Oct 27, 2006 at 09:06:12PM -0700, Andi Kleen wrote:
> >
> > > As far as I can understand, it seems that my computer, which has a
> > > Pentium D (dual core) on a VIA chipset, also has unsynchronized TSCs. With
> > > the hrtimers patch
> >
> > Intel systems (except for some large highend systems) have synchronized TSCs.
>
> Does Intel guarantee that, or is that just what we happen to see so far?

Matthias has a Xeon machine on which the TSCs are unsynced, and which are
unsyncable - write_tsc() just doesn't do anything. See thread at
http://lkml.org/lkml/2006/7/22/104

2006-10-28 06:49:34

by Tim Hockin

Subject: Re: AMD X2 unsynced TSC fix?

On Fri, Oct 27, 2006 at 11:46:15PM -0700, Andrew Morton wrote:
> On Fri, 27 Oct 2006 23:35:24 -0700
> [email protected] wrote:
>
> > On Fri, Oct 27, 2006 at 09:06:12PM -0700, Andi Kleen wrote:
> > >
> > > > As far as I can understand, it seems that my computer, which has a
> > > > Pentium D (dual core) on a VIA chipset, also has unsynchronized TSCs. With
> > > > the hrtimers patch
> > >
> > > Intel systems (except for some large highend systems) have synchronized TSCs.
> >
> > Does Intel guarantee that, or is that just what we happen to see so far?
>
> Matthias has a Xeon machine on which the TSCs are unsynced, and which are
> unsyncable - write_tsc() just doesn't do anything. See thread at
> http://lkml.org/lkml/2006/7/22/104

Nothing at all, or just the low few bits are writable? I had heard of,
but never seen, Intel CPUs that only allowed 16 writable bits
in the TSC MSR. I also heard of, but never saw, CPUs that cleared the TSC
to 0 on a write!

2006-10-28 07:14:09

by Andrew Morton

Subject: Re: AMD X2 unsynced TSC fix?

On Fri, 27 Oct 2006 23:49:24 -0700
[email protected] wrote:

> On Fri, Oct 27, 2006 at 11:46:15PM -0700, Andrew Morton wrote:
> > On Fri, 27 Oct 2006 23:35:24 -0700
> > [email protected] wrote:
> >
> > > On Fri, Oct 27, 2006 at 09:06:12PM -0700, Andi Kleen wrote:
> > > >
> > > > > As far as I can understand, it seems that my computer, which has a
> > > > > Pentium D (dual core) on a VIA chipset, also has unsynchronized TSCs. With
> > > > > the hrtimers patch
> > > >
> > > > Intel systems (except for some large highend systems) have synchronized TSCs.
> > >
> > > Does Intel guarantee that, or is that just what we happen to see so far?
> >
> > Matthias has a Xeon machine on which the TSCs are unsynced, and which are
> > unsyncable - write_tsc() just doesn't do anything. See thread at
> > http://lkml.org/lkml/2006/7/22/104
>
> Nothing at all, or just the low few bits are writable?

We don't know - the tsc sync code doesn't remeasure the errors after "correcting"
them.
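The missing remeasurement step could look like this. It is a simulation only: FakeTsc and sync_and_verify are hypothetical stand-ins for RDTSC and a WRMSR to the TSC MSR, and the three write behaviours (a full write, a write that is silently ignored, and a write where only the low 32 bits land, modelled here as clearing the upper half) are assumptions for illustration:

```python
class FakeTsc:
    """Simulated TSC whose write behaviour is selected by `write_mode`."""

    def __init__(self, value: int, write_mode: str = "full") -> None:
        self.value = value
        self.write_mode = write_mode

    def read(self) -> int:
        return self.value

    def write(self, new: int) -> None:
        if self.write_mode == "full":
            self.value = new
        elif self.write_mode == "low32":
            # Assumed behaviour: low 32 bits written, upper 32 bits cleared.
            self.value = new & 0xFFFF_FFFF
        # "ignore": the write is silently dropped.


def sync_and_verify(read_tsc, write_tsc, target: int, tolerance: int):
    """Write `target`, read it back, and report whether the write stuck."""
    write_tsc(target)
    error = read_tsc() - target
    return abs(error) <= tolerance, error


target = (1 << 32) + 100
for mode in ("full", "ignore", "low32"):
    tsc = FakeTsc((1 << 32) + 500, mode)
    print(mode, *sync_and_verify(tsc.read, tsc.write, target, tolerance=10))
# full True 0
# ignore False 400
# low32 False -4294967296
```

On real hardware the counter keeps running between the write and the read-back, which is what the tolerance is for.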

2006-10-28 07:25:30

by Tim Hockin

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, Oct 28, 2006 at 12:13:16AM -0700, Andrew Morton wrote:
> > > http://lkml.org/lkml/2006/7/22/104
> >
> > Nothing at all, or just the low few bits are writable?
>
> We don't know - the tsc sync code doesn't remeasure the errors after "correcting"
> them.

I read the thread. Just as a challenge, I'd love to poke at such a
system, but I doubt very much I'll get the chance :)

Tim

2006-10-28 09:16:26

by Vojtech Pavlik

Subject: Re: AMD X2 unsynced TSC fix?

On Fri, Oct 27, 2006 at 08:59:13PM -0700, Andi Kleen wrote:

> > There are a few problems at hand. I'm not familiar with the patch Andi's
> > talking about, but it has to solve all these problems to be really useful:
>
> It's from Jiri and Vojtech. Basically it will allow RDTSC to be used
> in gettimeofday even with unsynchronized TSCs by keeping
> the necessary offsets CPU-local.
>
> Drawback: for vsyscall you need RDTSCP, which means AMD F stepping
> at least. But even as a syscall it will still be faster than before.
>
> > * TSC skew across CPUs at bootup (Linux handles this already)
>
> Just not very good. There is still a significant error when it's done.
>
> > * TSC drift across CPUs at the "same" frequency (pretty constant, minimal)
>
> It just adds up over time.
>
> > * TSC drift because of PM states, such as C1 (hlt) (semi-random, severe)
>
> TSC drift with powernow -- CPUs run at different frequencies

And the patch does exactly that.

It doesn't assume much about TSCs, except that they're individually
monotonic and that without a warning (cpufreq notifier, c1 state
enter/leave) the frequency doesn't change quickly. Slow frequency drift
(spread spectrum modulation, thermal effects on Xtal) is compensated for.
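Compensating for slow drift amounts to re-estimating each TSC's frequency against a reference clock. A sketch using an exponential moving average; the method and the 1/8 weight are illustrative choices, not taken from the patch:

```python
def update_freq_estimate(est_hz: float, tsc_delta: int, ref_ns_delta: int,
                         weight: float = 0.125) -> float:
    """Blend one new frequency measurement into a running estimate.

    tsc_delta:    TSC ticks counted over the measurement interval
    ref_ns_delta: the same interval according to a reference clock, in ns
    weight:       smoothing factor; small values follow slow drift while
                  damping jitter in individual measurements
    """
    measured_hz = tsc_delta * 1_000_000_000 / ref_ns_delta
    return est_hz + weight * (measured_hz - est_hz)


# The crystal drifts from 2.0 GHz to 2.0001 GHz (+50 ppm); measure once per second.
est = 2_000_000_000.0
for _ in range(50):
    est = update_freq_estimate(est, 2_000_100_000, 1_000_000_000)
print(f"{est:.0f}")  # within a few hundred Hz of 2000100000
```

A sudden change such as a cpufreq transition is a different case: there the estimate should be reset from the notifier rather than smoothed toward.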

We are still testing the patch and fixing the issues we find, currently
with our cpufreq handling, but I believe we're well on the way to having it
working well.

--
Vojtech Pavlik
Director SuSE Labs

2006-10-28 09:52:07

by Andi Kleen

Subject: Re: AMD X2 unsynced TSC fix?


> Nothing at all, or just the low few bits are writable? I had heard of,
> but never seen, Intel CPUs that only allowed 16 writable bits
> in the TSC MSR. I also heard of, but never saw, CPUs that cleared the TSC
> to 0 on a write!

Normally on Intel you can only write the low 32 bits.

-Andi

2006-10-28 09:52:00

by Andi Kleen

Subject: Re: AMD X2 unsynced TSC fix?

On Friday 27 October 2006 23:46, Andrew Morton wrote:
> On Fri, 27 Oct 2006 23:35:24 -0700
>
> [email protected] wrote:
> > On Fri, Oct 27, 2006 at 09:06:12PM -0700, Andi Kleen wrote:
> > > > As far as I can understand, it seems that my computer, which has
> > > > a Pentium D (dual core) on a VIA chipset, also has unsynchronized
> > > > TSCs. With the hrtimers patch
> > >
> > > Intel systems (except for some large highend systems) have synchronized
> > > TSCs.
> >
> > Does Intel guarantee that, or is that just what we happen to see so far?
>
> Matthias has a Xeon machine on which the TSCs are unsynced, and which are
> unsyncable - write_tsc() just doesn't do anything. See thread at
> http://lkml.org/lkml/2006/7/22/104

That is a clear BIOS bug (FSBs are programmed incorrectly) and doesn't seem to
be common. In fact the BIOS bug is so bad that it's surprising the system
works at all.

-Andi

2006-10-28 09:52:24

by Andi Kleen

Subject: Re: AMD X2 unsynced TSC fix?


> Does Intel guarantee that, or is that just what we happen to see so far?

I don't think it's architecturally guaranteed, no.

-Andi

2006-10-28 18:08:31

by Lee Revell

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, 2006-10-28 at 07:28 +0200, Willy Tarreau wrote:
> On Fri, Oct 27, 2006 at 11:28:00PM -0400, Lee Revell wrote:
> > On Fri, 2006-10-27 at 18:04 -0700, Andi Kleen wrote:
> > > I don't think it makes too much sense to hack on pure RDTSC when
> > > gtod is fast enough -- RDTSC will be always icky and hard to use.
> >
> > I agree FWIW, our application would be happy to just use gtod if it
> > wasn't so slow on these machines.
>
> Agreed, I had to turn about 20 dual-core servers to single core because
> the only way to get a monotonic gtod made it so slow that it was not
> worth using a dual-core. I initially considered buying one dual-core
> AMD for my own use, but after seeing this, I'm definitely sure I won't
> ever buy one as long as this problem is not fixed, as it causes too
> many problems.

Does anyone know if the problem will really be fixed in new CPUs, as AMD
promised a year or so ago?

http://lkml.org/lkml/2005/11/4/173

Since that post, there has been Socket F and AM2 which apparently have
the same issue.

Were the AMD guys just blowing smoke?

Lee


2006-10-28 18:22:09

by Lee Revell

Subject: Re: AMD X2 unsynced TSC fix?

On Fri, 2006-10-27 at 20:59 -0700, Andi Kleen wrote:
> > Fortunately, we usually have an HPET, these days. You can definitely
> > resync and get near-linear values of RDTSC.
>
> No, we don't -- most BIOSes still don't give us the HPET table
> even when it is there in hardware. In the future this will change, sure,
> but people will still run a lot of older motherboards.

I have exactly such a system (see thread "x86-64 with nvidia MCP51
chipset: kernel does not find HPET"). Is there anything at all I can do
to make the kernel see the HPET? Can I try to guess the address? BIOS
upgrade?

Lee

2006-10-28 18:37:58

by Andi Kleen

Subject: Re: AMD X2 unsynced TSC fix?

On Friday 27 October 2006 22:28, Willy Tarreau wrote:
> On Fri, Oct 27, 2006 at 11:28:00PM -0400, Lee Revell wrote:
> > On Fri, 2006-10-27 at 18:04 -0700, Andi Kleen wrote:
> > > I don't think it makes too much sense to hack on pure RDTSC when
> > > gtod is fast enough -- RDTSC will be always icky and hard to use.
> >
> > I agree FWIW, our application would be happy to just use gtod if it
> > wasn't so slow on these machines.
>
> Agreed, I had to turn about 20 dual-core servers to single core because
> the only way to get a monotonic gtod made it so slow that it was not
> worth using a dual-core.

Curious - what workload was that?

While gtod is time critical and often appears high on profile lists it is
normally not as time critical as you're claiming it is; especially not
time critical enough to warrant such radical action.

> I initially considered buying one dual-core
> AMD for my own use, but after seeing this, I'm definitely sure I won't
> ever buy one as long as this problem is not fixed, as it causes too
> many problems.

It's somewhat slower, but I'm not sure what "too many problems" you're
referring to.

-Andi

2006-10-28 19:15:00

by Tim Hockin

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, Oct 28, 2006 at 02:08:34PM -0400, Lee Revell wrote:
> Does anyone know if the problem will really be fixed in new CPUs, as AMD
> promised a year or so ago?
>
> http://lkml.org/lkml/2005/11/4/173
>
> Since that post, there has been Socket F and AM2 which apparently have
> the same issue.
>
> Were the AMD guys just blowing smoke?

I think it is coming, but still not here yet.

2006-10-28 19:15:34

by Willy Tarreau

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, Oct 28, 2006 at 11:37:22AM -0700, Andi Kleen wrote:
> On Friday 27 October 2006 22:28, Willy Tarreau wrote:
> > On Fri, Oct 27, 2006 at 11:28:00PM -0400, Lee Revell wrote:
> > > On Fri, 2006-10-27 at 18:04 -0700, Andi Kleen wrote:
> > > > I don't think it makes too much sense to hack on pure RDTSC when
> > > > gtod is fast enough -- RDTSC will be always icky and hard to use.
> > >
> > > I agree FWIW, our application would be happy to just use gtod if it
> > > wasn't so slow on these machines.
> >
> > Agreed, I had to turn about 20 dual-core servers to single core because
> > the only way to get a monotonic gtod made it so slow that it was not
> > worth using a dual-core.
>
> Curious - what workload was that?

Two different but related workloads:
- load balancer doing between 10 and 100k gtod per second on a Sun
x2100 under RHEL 3. HPET was not available and the only way I found
to get a monotonic clock was to use the APIC timer IIRC (it was more
than 6 months ago, so sorry if I don't remember all the details).

- network sniffer that I tried to tune to get the highest possible packet
rates on gigabit ethernet.

> While gtod is time critical and often appears high on profile lists it is
> normally not as time critical as you're claiming it is; especially not
> time critical enough to warrant such radical action.

Yes it was, because the small gain of using a dual core with such
a workload was clearly lost by that change. IIRC, I reached 25000
sessions/s on dual core with TSC if I didn't care about the clock,
20000 without TSC, and 18000 on single core+TSC. But with the sniffer,
it was even worse : I had 500 kpps in dual-core+TSC, 70kpps without
TSC and 300 kpps with single-core+TSC. Since I had to buy the same
machines for both uses, this last argument was enough for me to stick
to a single core.

> > I initially considered buying one dual-core
> > AMD for my own use, but after seeing this, I'm definitely sure I won't
> > ever buy one as long as this problem is not fixed, as it causes too
> > many problems.
>
> It's somewhat slower, but I'm not sure what "too many problems" you're
> referring to.

Premature or delayed timeouts on the proxy, time measurement errors
(when the logs show that a session finishes before it begins, there's
a real problem, particularly because we use those logs for troubleshooting).
And for the sniffer, getting wrong times by about 2s was a real problem too.
I would have preferred something monotonic with low accuracy to
out-of-order packets!
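What Willy asks for here, something monotonic even if less accurate, can be approximated inside the application by never handing out a timestamp older than the last one returned. A minimal single-threaded sketch; `struct mono_state` and `monotonic_clamp` are hypothetical names, and a multi-threaded program would need an atomic compare-and-swap or a lock around `last`:

```c
#include <stdint.h>

/* Last value handed out to callers; zero-initialize before use. Not thread-safe. */
struct mono_state {
    uint64_t last;
};

/*
 * Return raw if the clock moved forward, otherwise repeat the last value,
 * so callers never see time go backwards. During a backward jump time
 * appears to stand still (accuracy suffers), but the ordering of events
 * such as log lines or captured packets is preserved.
 */
uint64_t monotonic_clamp(struct mono_state *st, uint64_t raw)
{
    if (raw > st->last)
        st->last = raw;
    return st->last;
}
```

Feeding it successive gettimeofday() readings converted to microseconds would keep logs and packet timestamps ordered, at the cost of time freezing for the duration of a backward jump (up to the ~2 s reported in this thread).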

This is definitely a design problem on those chips, probably because
marketing targets gamers only. And that's very sad, because they are
excellent processors !

> -Andi

Regards,
Willy

2006-10-28 19:18:06

by Tim Hockin

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, Oct 28, 2006 at 09:15:15PM +0200, Willy Tarreau wrote:
> > While gtod is time critical and often appears high on profile lists it is
> > normally not as time critical as you're claiming it is; especially not
> > time critical enough to warrant such radical action.
>
> Yes it was, because the small gain of using a dual core with such
> a workload was clearly lost by that change. IIRC, I reached 25000
> sessions/s on dual core with TSC if I didn't care about the clock,
> 20000 without TSC, and 18000 on single core+TSC. But with the sniffer,
> it was even worse : I had 500 kpps in dual-core+TSC, 70kpps without
> TSC and 300 kpps with single-core+TSC. Since I had to buy the same
> machines for both uses, this last argument was enough for me to stick
> to a single core.

Was the problem that they were not synced at poweron or that they would
drift due to power-states?

Did you try running with idle=poll, to avoid ever entering C1 state (hlt)?

2006-10-28 19:34:00

by Andi Kleen

Subject: Re: AMD X2 unsynced TSC fix?

On Saturday 28 October 2006 12:15, Willy Tarreau wrote:

> Yes it was, because the small gain of using a dual core with such
> a workload was clearly lost by that change. IIRC, I reached 25000
> sessions/s on dual core with TSC if I didn't care about the clock,
> 20000 without TSC, and 18000 on single core+TSC. But with the sniffer,
> it was even worse : I had 500 kpps in dual-core+TSC, 70kpps without
> TSC and 300 kpps with single-core+TSC. Since I had to buy the same
> machines for both uses, this last argument was enough for me to stick
> to a single core.

OK, but it is a very specialized situation that doesn't apply to most
others. I'm just saying this for everyone else following the thread.
Again, most workloads are not that gtod-intensive.

BTW if you don't use powernow and don't use blades with thermal clock ramping
and use idle=poll then the TSCs should be synchronized on AMD dual core
and TSC gtod can be used. But it will burn a lot of power and make the system
run very hot.

>
> > > I initially considered buying one dual-core
> > > AMD for my own use, but after seeing this, I'm definitely sure I won't
> > > ever buy one as long as this problem is not fixed, as it causes too
> > > many problems.
> >
> > It's somewhat slower, but I'm not sure what "too many problems" you're
> > referring to.
>
> Premature or delayed timeouts on the proxy, time measurement errors
> (when the logs show that a session finishes before it begins, there's
> a real problem, particularly because we use those logs for
> troubleshooting). And for the sniffer, getting wrong times by about 2s was
> a real problem too. I would have preferred something monotonic with low
> accuracy to out-of-order packets!

Ah, you mean you forced the kernel to use an unsynchronized TSC
for gtod during your tuning attempts and then discovered that it didn't work?
Call me surprised.

In the default configuration there shouldn't be any problems
like this, it will just run slower because the kernel falls back to a slower
time source.

> This is definitely a design problem on those chips, probably because
> marketing targets gamers only.

Last time I checked, dual-core Opterons weren't marketed to gamers.

> And that's very sad, because they are
> excellent processors !

Several parties are to blame here, not just AMD.

The BIOS vendors for not exposing HPET even when it is available in the
hardware. While HPET is slower than TSC too it definitely isn't nearly as
slow as pmtimer.

Possibly the Linux people, for not getting per-CPU TSC going sooner.

The writers of software that uses gtod too often or forces the kernel
to call it for each packet by carelessly using the timestamp ioctl.

-Andi

2006-10-28 19:37:31

by Willy Tarreau

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, Oct 28, 2006 at 12:18:00PM -0700, [email protected] wrote:
> On Sat, Oct 28, 2006 at 09:15:15PM +0200, Willy Tarreau wrote:
> > > While gtod is time critical and often appears high on profile lists it is
> > > normally not as time critical as you're claiming it is; especially not
> > > time critical enough to warrant such radical action.
> >
> > Yes it was, because the small gain of using a dual core with such
> > a workload was clearly lost by that change. IIRC, I reached 25000
> > sessions/s on dual core with TSC if I didn't care about the clock,
> > 20000 without TSC, and 18000 on single core+TSC. But with the sniffer,
> > it was even worse : I had 500 kpps in dual-core+TSC, 70kpps without
> > TSC and 300 kpps with single-core+TSC. Since I had to buy the same
> > machines for both uses, this last argument was enough for me to stick
> > to a single core.
>
> Was the problem that they were not synced at poweron or that they would
> drift due to power-states?

They resynced at power up, but would constantly drift. I don't even know
if it was caused by power states. When the machine was loaded, a single
task moving across the cores could see its time jump back and forth
several times a second by an offset sometimes close to +2/-2s.
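To quantify jumps like these, one option is to record consecutive timestamps read by a single task (for example while it is being migrated between cores) and count how often, and by how much, time went backwards. A sketch assuming the samples have already been collected into an array; `count_backward_steps` is a hypothetical helper, not an existing tool:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many times a sequence of timestamps, read one after another
 * by the same task, goes backwards. With synchronized clocks this should
 * always be 0; a non-zero result means the task saw a non-monotonic
 * clock. The largest backward jump is stored in *max_back (if non-NULL).
 */
size_t count_backward_steps(const uint64_t *ts, size_t n, uint64_t *max_back)
{
    size_t steps = 0;
    uint64_t worst = 0;

    for (size_t i = 1; i < n; i++) {
        if (ts[i] < ts[i - 1]) {
            uint64_t back = ts[i - 1] - ts[i];
            steps++;
            if (back > worst)
                worst = back;
        }
    }
    if (max_back)
        *max_back = worst;
    return steps;
}
```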

> Did you try running with idle=poll, to avoid ever entering C1 state (hlt)?

Yes, I remember trying such things. I also tried 'nohlt', completely
disabling power management, including ACPI, etc... I also tried vanilla
kernels as well as severely patched ones, but the problem remained the
same in all circumstances, that only 'notsc' could solve.

BTW, I've just found a leftover dmesg capture from boot, in case you'd
like to look for anything in it.

Regards,
Willy

2006-10-28 19:42:49

by Tim Hockin

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, Oct 28, 2006 at 09:32:18PM +0200, Willy Tarreau wrote:
> > Was the problem that they were not synced at poweron or that they would
> > drift due to power-states?
>
> They resynced at power up, but would constantly drift. I don't even know
> if it was caused by power states. When the machine was loaded, a single
> task moving across the cores could see its time jump back and forth
> several times a second by an offset sometimes close to +2/-2s.

That sounds like C1, to me.

> > Did you try running with idle=poll, to avoid ever entering C1 state (hlt)?
>
> Yes, I remember trying such things. I also tried 'nohlt', completely
> disabling power management, including ACPI, etc... I also tried vanilla
> kernels as well as severely patched ones, but the problem remained the
> same in all circumstances, that only 'notsc' could solve.

That's exceedingly strange. On my dual-socket dual-core, I can get
roughly synced TSCs (no appreciable drift) by just using idle=poll. If
that did not work for you, I'd really want to poke at the system more.

> BTW, I've just found a leftover dmesg capture from boot, in case you'd
> like to look for anything in it.

A dmesg won't be that useful, I'd actually have to poke at the system.

2006-10-28 19:59:12

by Vojtech Pavlik

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, Oct 28, 2006 at 02:22:11PM -0400, Lee Revell wrote:

> On Fri, 2006-10-27 at 20:59 -0700, Andi Kleen wrote:
> > > Fortunately, we usually have an HPET, these days. You can definitely
> > > resync and get near-linear values of RDTSC.
> >
> > No we don't -- most BIOS still don't give us the HPET table
> > even when it is there in hardware. In the future this will change sure
> > but people will still run a lot of older motherboards.
>
> I have exactly such a system (see thread "x86-64 with nvidia MCP51
> chipset: kernel does not find HPET"). Is there anything at all I can do
> to make the kernel see the HPET? Can I try to guess the address? BIOS
> upgrade?

In most cases where the HPET is present but not reported, it's not
configured. Usually, you need to write a chipset-specific register to
configure the address.

Finding the register, finding some free MMIO space, writing the address
to the register and telling the address to the kernel is enough.

--
Vojtech Pavlik
Director SuSE Labs

2006-10-28 20:04:33

by Willy Tarreau

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, Oct 28, 2006 at 12:33:27PM -0700, Andi Kleen wrote:
> On Saturday 28 October 2006 12:15, Willy Tarreau wrote:
>
> > Yes it was, because the small gain of using a dual core with such
> > a workload was clearly lost by that change. IIRC, I reached 25000
> > sessions/s on dual core with TSC if I didn't care about the clock,
> > 20000 without TSC, and 18000 on single core+TSC. But with the sniffer,
> > it was even worse : I had 500 kpps in dual-core+TSC, 70kpps without
> > TSC and 300 kpps with single-core+TSC. Since I had to buy the same
> > machines for both uses, this last argument was enough for me to stick
> > to a single core.
>
> OK, but it is a very specialized situation that doesn't apply to most
> others. I'm just saying this for everyone else following the thread.
> Again, most workloads are not that gtod-intensive.

100% agreed (fortunately !).

> BTW if you don't use powernow and don't use blades with thermal clock ramping
> and use idle=poll then the TSCs should be synchronized on AMD dual core
> and TSC gtod can be used. But it will burn a lot of power and make the system
> run very hot.

I tried to make it run like this. I was once told that by racking pizza boxes,
you get a pizza oven. I was prepared to accept it :-)

But I could not manage to keep them in sync. I even remember running
background loops to ensure that there was no idle at all, and the clocks
still managed to get out of sync! I tried to disable a lot of devices,
starting with everything likely to send interrupts with long processing
times (e.g. USB, SATA, ...), but with no success. I once thought that I had
succeeded by sticking all interrupts to one core and the tasks to the
other one, but was proved wrong after several minutes.

I really think that the hardware was doing tricks far beyond my knowledge,
because on another Sun (a V40Z) with 4 dual-core CPUs, I never saw the TSCs
out of sync even after hours of testing. But the HPET was available on it,
and I don't remember whether it's used by default when detected.

> > > > I initially considered buying one dual-core
> > > > AMD for my own use, but after seeing this, I'm definitely sure I won't
> > > > ever buy one as long as this problem is not fixed, as it causes too
> > > > many problems.
> > >
> > > It's somewhat slower, but I'm not sure what "too many problems" you're
> > > referring to.
> >
> > Premature or delayed timeouts on the proxy, time measurement errors
> > (when the logs show that a session finishes before it begins, there's
> > a real problem, particularly because we use those logs for
> > troubleshooting). And for the sniffer, getting wrong times by about 2s was
> > a real problem too. I would have preferred something monotonic with low
> > accuracy to out-of-order packets!
>
> Ah, you mean you forced the kernel to use an unsynchronized TSC
> for gtod during your tuning attempts and then discovered that it didn't work?
> Call me surprised.

No, I did not "force" anything at first. You take the RHEL3 CD, you install
it, reboot and watch your logs report negative times, then scratch your
head, first call Red Hat dumb asses, and after a few tests, apologize to
poor innocent Red Hat and call the box total crap. To put it briefly
(this might be useful for people who Google for it): the dual-core Sun x2100
is unreliable out of the box under Linux.

> In the default configuration there shouldn't be any problems
> like this, it will just run slower because the kernel falls back to a slower
> time source.

You have to specify "notsc" for this. As an alternative, a NUMA kernel
worked fine too (because TSC is disabled), but it's not obvious to
anyone why a dual-core, single-processor system should be considered "NUMA"!

> > This is definitely a design problem on those chips, probably because
> > marketing targets gamers only.
>
> Last time I checked, dual-core Opterons weren't marketed to gamers.

Not under the "Opteron" name, but the AMD X2, yes. Ask Google for "AMD X2"
and click on the first non-AMD site (3rd link), then check how it's
benchmarked... On the other hand, if you look for "Opteron", you
immediately find more serious usages.

> > And that's very sad, because they are
> > excellent processors !
>
> Several parties are to blame here, not just AMD.
>
> The BIOS vendors for not exposing HPET even when it is available in the
> hardware. While HPET is slower than TSC too it definitely isn't nearly as
> slow as pmtimer.

I'm sure that the BIOS is buggy there, because I too found it strange
that there was no HPET reported in such a system. But I found no way
to force-enable it either, as I did not know where to start looking.

> Possibly the Linux people, for not getting per-CPU TSC going sooner.
>
> The writers of software that uses gtod too often or forces the kernel
> to call it for each packet by carelessly using the timestamp ioctl.

Unfortunately, you can't call gtod less than once per poll() loop iteration.
And believe me, I do count my syscalls because each one hits performance
by a few percent. When it comes to getting the time for each packet, the
problem is the same: you depend on the frequency of external
events. You need to get the time once for each event. But I agree
that a per-CPU TSC could help a lot in getting monotonic clocks. I
think that using the local TSC to measure approximate time and
decide when to call an external source would be a great improvement.
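The idea above, a cheap local counter deciding when to ask a slower but globally consistent source for the time, could look roughly like the following. This is a hypothetical sketch: the structure and function names are invented, the "cheap" value would come from something like the local TSC and the "precise" value from gettimeofday() or the HPET, and the caller supplies both so the policy can be shown in isolation:

```c
#include <stdint.h>

/*
 * Hypothetical coarse clock: a cheap, per-CPU counter (e.g. the local TSC)
 * only decides *when* to consult an expensive but globally consistent
 * source. Between refreshes the cached value is reused.
 */
struct coarse_clock {
    uint64_t refresh_ticks; /* cheap-counter delta between refreshes */
    uint64_t last_cheap;    /* cheap counter value at the last refresh */
    uint64_t cached;        /* last value taken from the precise source */
    int valid;              /* 0 until the first refresh */
};

/* Nonzero if the expensive source should be read now. */
int coarse_should_refresh(const struct coarse_clock *c, uint64_t cheap_now)
{
    if (!c->valid)
        return 1;
    /* Counter went backwards: e.g. we migrated to a core whose TSC lags. */
    if (cheap_now < c->last_cheap)
        return 1;
    return cheap_now - c->last_cheap >= c->refresh_ticks;
}

/* Store a fresh reading; never let the cached time move backwards. */
void coarse_refresh(struct coarse_clock *c, uint64_t cheap_now,
                    uint64_t precise_now)
{
    if (!c->valid || precise_now > c->cached)
        c->cached = precise_now;
    c->last_cheap = cheap_now;
    c->valid = 1;
}
```

In a poll() loop one would read the cheap counter once per iteration, call `coarse_refresh()` with a gettimeofday() result only when `coarse_should_refresh()` says so, and otherwise use the cached value. The resolution is bounded by `refresh_ticks`, which is the accuracy-for-syscalls trade-off being discussed.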

Regards,
Willy

2006-10-28 20:11:41

by Andi Kleen

Subject: Re: AMD X2 unsynced TSC fix?

On Saturday 28 October 2006 13:04, Willy Tarreau wrote:

> I really think that the hardware was doing tricks far beyond my knowledge,
> because on another Sun (a V40Z) with 4 dual-core CPUs, I never saw the TSCs
> out of sync even after hours of testing. But the HPET was available on it,
> and I don't remember whether it's used by default when detected.

I think some systems occasionally ramp the clock for thermal management,
but that should be rare.

> No, I did not "force" anything at first. You take the RHEL3 CD, you install
> it, reboot and watch your logs report negative times, then scratch your
> head, first call Red Hat dumb asses, and after a few tests, apologize to
> poor innocent Red Hat

Well they should have fixed the kernel to fall back to another clock
by backporting the appropriate fixes from mainline. I assume they
did actually.

> and call the box total crap. To put it briefly
> (this might be useful for people who Google for it): the dual-core Sun x2100
> is unreliable out of the box under Linux.

No, that shouldn't be true with any modern kernel. It will just fall back
to HPET or more likely PMtimer.

>
> > In the default configuration there shouldn't be any problems
> > like this, it will just run slower because the kernel falls back to a
> > slower time source.
>
> You have to specify "notsc" for this.

No, the kernel should work out of the box. Some older kernels didn't
at various points in time, though.

-Andi

2006-10-28 20:16:51

by Willy Tarreau

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, Oct 28, 2006 at 12:42:45PM -0700, [email protected] wrote:
> On Sat, Oct 28, 2006 at 09:32:18PM +0200, Willy Tarreau wrote:
> > > Was the problem that they were not synced at poweron or that they would
> > > drift due to power-states?
> >
> > They resynced at power up, but would constantly drift. I don't even know
> > if it was caused by power states. When the machine was loaded, a single
> > task moving across the cores could see its time jump back and forth
> > several times a second by an offset sometimes close to +2/-2s.
>
> That sounds like C1, to me.

OK.

> > > Did you try running with idle=poll, to avoid ever entering C1 state (hlt)?
> >
> > Yes, I remember trying such things. I also tried 'nohlt', completely
> > disabling power management, including ACPI, etc... I also tried vanilla
> > kernels as well as severely patched ones, but the problem remained the
> > same in all circumstances, that only 'notsc' could solve.
>
> That's exceedingly strange. On my dual-socket dual-core, I can get
> roughly synced TSCs (no appreciable drift) by just using idle=poll.

As I said in another mail, I thought I had won by running several busy loops
in parallel with the load, which prevented the system from either halting
or slowing down. But it only stayed OK for a few minutes and then started
going mad again.

> If that did not work for you, I'd really want to poke at the system more.

The machine was returned to the supplier, and for other reasons we switched
to a different maker for the roughly 20 machines (all single-core). I've
read somewhere that there's already a second version of the Sun x2100; I
don't know if it still exhibits the problem. Maybe they've at least fixed
the BIOS to report the HPET.

> > BTW, I've just found a leftover dmesg capture from boot, in case you'd
> > like to look for anything in it.
>
> A dmesg won't be that useful, I'd actually have to poke at the system.

OK. I don't know if anyone there has one at hand, as I don't have it
anymore.

Regards,
Willy

2006-10-28 20:36:50

by Willy Tarreau

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, Oct 28, 2006 at 01:11:14PM -0700, Andi Kleen wrote:
> On Saturday 28 October 2006 13:04, Willy Tarreau wrote:
>
> > I really think that the hardware was doing tricks far beyond my knowledge,
> > because on another Sun (a V40Z) with 4 dual-core CPUs, I never saw the TSCs
> > out of sync even after hours of testing. But the HPET was available on it,
> > and I don't remember whether it's used by default when detected.
>
> I think some systems occasionally ramp the clock for thermal management,
> but that should be rare.

I should say that at one point I wondered whether they were performing
some sort of automatic overclocking under load, because those machines
were really faster, even in single-core configuration, than other Opterons
I had tested. Since such boxes are often compared on workloads such as
SSL, doing so might have favored them in comparative benchmarks.

> > No, I did not "force" anything at first. You take the RHEL3 CD, you install
> > it, reboot and watch your logs report negative times, then scratch your
> > head, first call Red Hat dumb asses, and after a few tests, apologize to
> > poor innocent Red Hat
>
> Well they should have fixed the kernel to fall back to another clock
> by backporting the appropriate fixes from mainline. I assume they
> did actually.

But what should trigger the fallback? I don't see what can be detected.
I see no such thing in 2.4 mainline (except TSC resync at boot), and
cannot find any such fallback in 2.6 either (though I might not have
looked deep enough, as the code is more complex there).

> > and call the box total crap. To put it briefly
> > (this might be useful for people who Google for it): the dual-core Sun x2100
> > is unreliable out of the box under Linux.
>
> No, that shouldn't be true with any modern kernel. It will just fall back
> to HPET or more likely PMtimer.

same comment as above :-)

> >
> > > In the default configuration there shouldn't be any problems
> > > like this, it will just run slower because the kernel falls back to a
> > > slower time source.
> >
> > You have to specify "notsc" for this.
>
> No, the kernel should work out of the box. Some older kernels didn't
> at various points in time, though.

Anyway, if they started out shipping kernels which used TSC by default,
I don't think they would change this afterwards, for fear of
causing regressions.

Could you please check whether the fallbacks you're talking about are
hard to backport to 2.4? Depending on their complexity and risk,
I would not be against a small backport. I think for instance that
automatically disabling TSC on SMP when HPET is present would not
be a terrible regression and might help on a number of occasions.
The user would then have to force the use of TSC if needed.
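The rule proposed above can be written as a small decision function. This is a hypothetical sketch of the policy only, not the actual 2.4 or 2.6 time-source selection code; `pick_clock`, `struct clock_caps` and the flag names are invented for illustration:

```c
/* Illustrative clock choices; not the kernel's internal names. */
enum clock_src { CLK_TSC, CLK_HPET };

struct clock_caps {
    int nr_cpus;      /* number of online CPUs/cores */
    int hpet_present; /* HPET advertised by the BIOS */
    int force_tsc;    /* user explicitly asked for the TSC */
};

/*
 * Proposed policy: on SMP, prefer the HPET when it is available, because
 * the per-core TSCs may drift apart; otherwise keep using the TSC.
 * An explicit user override always wins.
 */
enum clock_src pick_clock(const struct clock_caps *caps)
{
    if (caps->force_tsc)
        return CLK_TSC;
    if (caps->nr_cpus > 1 && caps->hpet_present)
        return CLK_HPET;
    return CLK_TSC;
}
```

As written, the rule changes nothing on a box whose BIOS does not advertise the HPET, such as the x2100 discussed in this thread, where booting with "notsc" would still be needed.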

Regards,
Willy

2006-10-28 21:00:21

by Lee Revell

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, 2006-10-28 at 21:15 +0200, Willy Tarreau wrote:
> This is definitely a design problem on those chips, probably because
> marketing targets gamers only. And that's very sad, because they are
> excellent processors !

Hmm, gamers seem to be the worst affected by this problem on other OSes...

Lee

2006-10-28 22:54:21

by Tim Hockin

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, Oct 28, 2006 at 09:57:39PM +0200, Vojtech Pavlik wrote:
> > > No we don't -- most BIOS still don't give us the HPET table
> > > even when it is there in hardware. In the future this will change sure
> > > but people will still run a lot of older motherboards.
> >
> > I have exactly such a system (see thread "x86-64 with nvidia MCP51
> > chipset: kernel does not find HPET"). Is there anything at all I can do
> > to make the kernel see the HPET? Can I try to guess the address? BIOS
> > upgrade?
>
> In most cases where the HPET is present but not reported, it's not
> configured. Usually, you need to write a chipset-specific register to
> configure the address.
>
> Finding the register, finding some free MMIO space, writing the address
> to the register and telling the address to the kernel is enough.

Do we want to establish a precedent of finding and configuring the HPET
ourselves on such chipsets? Register them all as PCI quirks...

2006-10-29 01:28:51

by Lee Revell

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, 2006-10-28 at 12:33 -0700, Andi Kleen wrote:
> On Saturday 28 October 2006 12:15, Willy Tarreau wrote:
>
> > Yes it was, because the small gain of using a dual core with such
> > a workload was clearly lost by that change. IIRC, I reached 25000
> > sessions/s on dual core with TSC if I didn't care about the clock,
> > 20000 without TSC, and 18000 on single core+TSC. But with the sniffer,
> > it was even worse : I had 500 kpps in dual-core+TSC, 70kpps without
> > TSC and 300 kpps with single-core+TSC. Since I had to buy the same
> > machines for both uses, this last argument was enough for me to stick
> > to a single core.
>
> OK, but it is a very specialized situation that doesn't apply to most
> others. I'm just saying this for everyone else following the thread.
> Again, most workloads are not that gtod-intensive.

I haven't benchmarked or anything, but isn't X11 also a very gtod-intensive
workload?

Lee

2006-10-30 03:11:08

by Sergio Monteiro Basto

Subject: Re: AMD X2 unsynced TSC fix?

On Sat, 2006-10-28 at 05:22 +0100, Sergio Monteiro Basto wrote:
> On Fri, 2006-10-27 at 21:06 -0700, Andi Kleen wrote:
> > > As far as I can understand, it seems to me that my computer, which has a
> > > Pentium D (dual core) on a VIA chipset, also has unsynchronized TSC and
> > > with the patch of hrtimers on
> >
> > Intel systems (except for some large highend systems) have synchronized TSCs.
> > Only exception so far seems to be a few systems that are
> > overclocked/overvolted and running outside their specification.
> > When you do that you're on your own and we're not interested in a bug
> > report.
>
> and my computer :)
> http://www.asrock.com/product/775Dual-880Pro.htm
> http://www.asrock.com/support/CPU_Support/show.asp?Model=775Dual-880Pro
> On Monday I will check whether my computer is within spec.
> It seems I like buying computers with many problems on Linux and fixing them :)

I bought this computer from the computer shop with the best reputation in
Portugal, and I haven't changed anything.

cat /proc/cpuinfo
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) D CPU 2.80GHz
stepping : 4
cpu MHz : 2793.050
cache size : 1024 KB

With a 2 x 1024 KB cache, I found the Pentium D 820 in
http://www.intel.com/products/processor_number/chart/pentium_d.htm

which is supported on
http://www.asrock.com/support/CPU_Support/show.asp?Model=775Dual-880Pro

775 Pentium D 820 2.80GHz 800MHz 2MB Smithfield All

I just noticed that it doesn't have Enhanced Intel SpeedStep® Technology.

I've attached the x86info output, which matches
http://processorfinder.intel.com/details.aspx?sSpec=SL88T

Another curiosity with kernel 2.6.18.1 and the hrtimers patch: the kernel
oopses and hangs at boot if I don't pass the "notsc" option.




>
> > There was also one BIOS found that had this problem, but it was old and rare
> > and got fixed with an upgrade.

I have the latest BIOS release.

> >
> > > Just to point out: this could be more a problem of chipsets than of CPUs
> > > (AMD or Intel). AMD just happened to start using x86_64 first :)
> >
> > No.
> >
> > -Andi

--
Sérgio M.B.


Attachments:
x86info.txt (1.81 kB)
smime.p7s (2.12 kB)

2006-10-30 15:23:33

by Andi Kleen

Subject: Re: AMD X2 unsynced TSC fix?

On Monday 30 October 2006 04:10, Sergio Monteiro Basto wrote:
> On Sat, 2006-10-28 at 05:22 +0100, Sergio Monteiro Basto wrote:
> > On Fri, 2006-10-27 at 21:06 -0700, Andi Kleen wrote:
> > > > As far as I can understand, it seems to me that my computer, which has a
> > > > Pentium D (dual core) on a VIA chipset, also has unsynchronized TSC and
> > > > with the patch of hrtimers on
> > >
> > > Intel systems (except for some large highend systems) have synchronized TSCs.
> > > Only exception so far seems to be a few systems that are
> > > overclocked/overvolted and running outside their specification.
> > > When you do that you're on your own and we're not interested in a bug
> > > report.
> >
> > and my computer :)
> > http://www.asrock.com/product/775Dual-880Pro.htm
> > http://www.asrock.com/support/CPU_Support/show.asp?Model=775Dual-880Pro
> > On Monday I will check whether my computer is within spec.
> > It seems I like buying computers with many problems on Linux and fixing them :)
>
> I bought this computer from the computer shop with the best reputation in
> Portugal, and I haven't changed anything.


Can you give us a full dmesg without noapic or notsc please?

Adding Suresh to cc too because he spotted a similar problem last time.

-Andi

2006-10-30 17:33:59

by Langsdorf, Mark

Subject: RE: AMD X2 unsynced TSC fix?

> > Agreed, I had to turn about 20 dual-core servers to single
> > core because the only way to get a monotonic gtod made it
> > so slow that it was not worth using a dual-core. I initially
> > considered buying one dual-core AMD for my own use, but after
> > seeing this, I'm definitely sure I won't ever buy one as
> > long as this problem is not fixed, as it causes too
> > many problems.
>
> Does anyone know if the problem will really be fixed in new
> CPUs, as AMD promised a year or so ago?
>
> http://lkml.org/lkml/2005/11/4/173
>
> Since that post, there has been Socket F and AM2 which apparently have
> the same issue.
> Were the AMD guys just blowing smoke?

AMD was not blowing smoke. Future AMD processors will have
pstate/cstate invariant TSCs detectable by a CPUID bit.

Unfortunately, those processors have not been released yet, and
I can't comment on their release timeframe, other than to say
they are on our roadmap.
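Mark does not name the CPUID bit. The invariant-TSC flag later documented in both AMD's and Intel's manuals is bit 8 of EDX in extended leaf 0x80000007; the sketch below assumes that is the bit meant here. It uses the `__get_cpuid()` helper from GCC/Clang's `<cpuid.h>` and only queries hardware on x86; the function names are hypothetical:

```c
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* EDX bit 8 of CPUID leaf 0x80000007 reports an invariant TSC. */
#define INVARIANT_TSC_BIT 8

/* Test the invariant-TSC bit in a raw EDX value from leaf 0x80000007. */
int edx_has_invariant_tsc(uint32_t edx)
{
    return (int)((edx >> INVARIANT_TSC_BIT) & 1u);
}

/* 1 if the running CPU advertises an invariant TSC, 0 otherwise. */
int cpu_has_invariant_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if the extended leaf is not supported. */
    if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
        return 0;
    return edx_has_invariant_tsc(edx);
#else
    return 0; /* not x86: no TSC to query */
#endif
}
```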

-Mark Langsdorf
AMD, Inc.


2006-10-30 20:31:28

by Christoph Lameter

Subject: Re: AMD X2 unsynced TSC fix?

On Fri, 27 Oct 2006, [email protected] wrote:

> Wrong, too. We have a patch that will be coming SOON (trust me, I am
> pushing hard for the author to publish it). With this patch applied you
> should never see the TSC go backwards. Period. It should be monotonic
> (to userspace, kernel rdtsc calls can still be wrong). CPUs should stay
> very nearly in sync (again, to userspace). The overhead of this patch is
> pretty minimal and costs nothing unless you actually read the TSC.

Well, why not use the regular clock_gettime() instead? If you add code for
TSC processing (intercepting RDTSC from user space???), then it may end up
comparable in performance to time retrieval via POSIX calls using
vsyscalls. Looks like you may be starting to duplicate the time subsystem?
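For reference, the POSIX route suggested here looks like this from user space. A minimal sketch; whether the call is served by a vsyscall and which hardware counter backs it depend on the kernel and architecture, so it is only as trustworthy as the time source the kernel selected. On older glibc, `clock_gettime()` lives in librt and needs `-lrt`; `monotonic_ns` is a hypothetical helper name:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stdint.h>
#include <time.h>

/* CLOCK_MONOTONIC in nanoseconds, or 0 if the clock cannot be read. */
uint64_t monotonic_ns(void)
{
    struct timespec ts;

    /* Unlike gettimeofday(), CLOCK_MONOTONIC is not stepped by settimeofday(). */
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```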


2006-10-31 00:14:12

by Lee Revell

Subject: Re: AMD X2 unsynced TSC fix?

On Tue, 2006-10-31 at 00:03 +0000, Sergio Monteiro Basto wrote:
> On Mon, 2006-10-30 at 16:23 +0100, Andi Kleen wrote:
> > Can you give us a full dmesg without noapic or notsc please?
> >
>
> Yes, I'm sending a dmesg of 2.6.18-git20 (dmesg27)
> and another dmesg of kernel 2.6.18.1 (dmesg30).
> To the vanilla kernel I only added this patch:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc1/2.6.19-rc1-mm1/broken-out/gregkh-pci-pci-via-irq-quirk-behaviour-change.patch
>
> > Adding Suresh to cc too because he spotted a similar problem last
> > time.
>
> Feel free to ask for any test, test patches or even access to this machine.

Maybe I've been running -rt for too long but I don't see clocksource
selection - does 2.6.18 not have John Stultz's GTOD rework?

How can it know not to use TSC on machines where it's unstable?

Lee

2006-10-31 00:26:05

by john stultz

Subject: Re: AMD X2 unsynced TSC fix?

On Mon, 2006-10-30 at 19:14 -0500, Lee Revell wrote:
> On Tue, 2006-10-31 at 00:03 +0000, Sergio Monteiro Basto wrote:
> > On Mon, 2006-10-30 at 16:23 +0100, Andi Kleen wrote:
> > > Can you give us a full dmesg without noapic or notsc please?
> > >
> >
> > Yes, I am sending a dmesg of 2.6.18-git20 (dmesg27)
> > and another dmesg of kernel 2.6.18.1 (dmesg30).
> > To the vanilla kernel I just added this patch:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc1/2.6.19-rc1-mm1/broken-out/gregkh-pci-pci-via-irq-quirk-behaviour-change.patch
> >
> > > Adding Suresh to cc too because he spotted a similar problem last
> > > time.
> >
> > Feel free to ask for any test, test patches or even access to this machine.
>
> Maybe I've been running -rt for too long but I don't see clocksource
> selection - does 2.6.18 not have John Stultz's GTOD rework?

He's booting x86_64. I've not had the time yet to clean up and push my
x86_64 conversion to CONFIG_GENERIC_TIME. Soon hopefully.

thanks
-john


2006-10-31 03:03:33

by Suresh Siddha

Subject: Re: AMD X2 unsynced TSC fix?

On Tue, Oct 31, 2006 at 12:03:28AM +0000, Sergio Monteiro Basto wrote:
> time.c: Lost 300 timer tick(s)! rip mwait_idle+0x33/0x4f)
> time.c: Lost 300 timer tick(s)! rip mwait_idle+0x33/0x4f)
> time.c: Lost 300 timer tick(s)! rip mwait_idle+0x33/0x4f)
> time.c: Lost 300 timer tick(s)! rip mwait_idle+0x33/0x4f)

Is this the reason why you are saying your system has unsynchronized TSC?
Somewhere in this thread, you mentioned that lost ticks happen even
when you use "notsc".

This sounds to me like a different problem. Can you send us the output
of /proc/interrupts?

thanks,
suresh

2006-10-31 11:13:20

by Pádraig Brady

Subject: Re: AMD X2 unsynced TSC fix?

Willy Tarreau wrote:
> On Fri, Oct 27, 2006 at 11:28:00PM -0400, Lee Revell wrote:
>
>>On Fri, 2006-10-27 at 18:04 -0700, Andi Kleen wrote:
>>
>>>I don't think it makes too much sense to hack on pure RDTSC when
>>>gtod is fast enough -- RDTSC will be always icky and hard to use.
>>
>>I agree FWIW, our application would be happy to just use gtod if it
>>wasn't so slow on these machines.
>
>
> Agreed, I had to turn about 20 dual-core servers to single core because
> the only way to get a monotonic gtod made it so slow that it was not
> worth using a dual-core. I initially considered buying one dual-core
> AMD for my own use, but after seeing this, I'm definitely sure I won't
> ever buy one as long as this problem is not fixed, as it causes too
> many problems.

For the record, in my previous job we were implementing
a very fast packet sniffer/timestamper using 2x3.2GHz P4 Xeons + linux 2.4.20 (with gtod)
Very rarely we would see inter-packet times jump by (2^32)/CPU_Hz seconds,
when sniffing about 1.2 million packets per second on 2 e1000 links,
which suggested a wrap-around of a 32-bit comparison somewhere.
This led to the fix below, which was never picked up
(I guess because it was addressed elsewhere?).
Note we were only interested in millisecond resolution for the timestamps,
but the approximation is very good in general as you know the TSCs are very
close to each other when this condition happens.
Note power management was not used on our systems.

Pádraig.

diff -Naru linux-2.4.20/arch/i386/kernel/time.c linux-2.4.20-corvil/arch/i386/kernel/time.c
--- linux-2.4.20/arch/i386/kernel/time.c 2002-11-28 23:53:09.000000000 +0000
+++ linux-2.4.20-pb/arch/i386/kernel/time.c 2005-07-07 10:32:34.000000000 +0100
@@ -94,6 +94,9 @@

/* .. relative to previous jiffy (32 bits is enough) */
eax -= last_tsc_low; /* tsc_low delta */
+ if ((signed)eax < 0) { /* workaround for drifting TSCs */
+ eax = 0;
+ printk(KERN_INFO "tsc wrap around applied\n"); /* rare */
+ }

/*
* Time offset = (tsc_low delta) * fast_gettimeoffset_quotient

2006-10-31 15:05:10

by Sergio Monteiro Basto

Subject: Re: AMD X2 unsynced TSC fix?

On Mon, 2006-10-30 at 18:41 -0800, Siddha, Suresh B wrote:
> On Tue, Oct 31, 2006 at 12:03:28AM +0000, Sergio Monteiro Basto wrote:
> > time.c: Lost 300 timer tick(s)! rip mwait_idle+0x33/0x4f)
> > time.c: Lost 300 timer tick(s)! rip mwait_idle+0x33/0x4f)
> > time.c: Lost 300 timer tick(s)! rip mwait_idle+0x33/0x4f)
> > time.c: Lost 300 timer tick(s)! rip mwait_idle+0x33/0x4f)
>
> Is this the reason why you are saying your system has unsynchronized TSC?

yes

> Somewhere in this thread, you mentioned that lost ticks happen even
> when you use "notsc"

Yes, with newer kernels (2.6.19-rcX).

>
> This sounds to me like a different problem. Can you send us the output
> of /proc/interrupts?

From which kernel?
I am not at home,
but I have /proc/interrupts from a 2.6.16 kernel here:
http://bugzilla.kernel.org/attachment.cgi?id=7927&action=view
from my bug
http://bugzilla.kernel.org/show_bug.cgi?id=6419

Tonight I can attach /proc/interrupts from a 2.6.18 kernel and from a
2.6.19-rc4 kernel to bugzilla bug #6419.

BTW: those kernels are for the x86_64 arch. I haven't tried i386 yet, but
it may be my next test.

Thanks,
--
Sérgio M. B.
>
> thanks,
> suresh

2006-10-31 15:35:13

by Willy Tarreau

Subject: Re: AMD X2 unsynced TSC fix?

On Tue, Oct 31, 2006 at 11:12:47AM +0000, Pádraig Brady wrote:
> Willy Tarreau wrote:
> > On Fri, Oct 27, 2006 at 11:28:00PM -0400, Lee Revell wrote:
> >
> >>On Fri, 2006-10-27 at 18:04 -0700, Andi Kleen wrote:
> >>
> >>>I don't think it makes too much sense to hack on pure RDTSC when
> >>>gtod is fast enough -- RDTSC will be always icky and hard to use.
> >>
> >>I agree FWIW, our application would be happy to just use gtod if it
> >>wasn't so slow on these machines.
> >
> >
> > Agreed, I had to turn about 20 dual-core servers to single core because
> > the only way to get a monotonic gtod made it so slow that it was not
> > worth using a dual-core. I initially considered buying one dual-core
> > AMD for my own use, but after seeing this, I'm definitely sure I won't
> > ever buy one as long as this problem is not fixed, as it causes too
> > many problems.
>
> For the record, in my previous job we were implementing
> a very fast packet sniffer/timestamper using 2x3.2GHz P4 Xeons + linux 2.4.20 (with gtod)
> Very rarely we would see inter-packet times jump by (2^32)/CPU_Hz seconds,
> when sniffing about 1.2 million packets per second on 2 e1000 links,
> which suggested a wrap-around of a 32-bit comparison somewhere.

Interesting, as in my case I saw jumps of about +/- 2 s on a 2.2 GHz box, which
also suggests a wrap-around.

> This led to the fix below, which was never picked up
> (I guess because it was addressed elsewhere?).
> Note we were only interested in millisecond resolution for the timestamps,
> but the approximation is very good in general as you know the TSCs are very
> close to each other when this condition happens.

100% agreed.

> Note power management was not used on our systems.
>
> Pádraig.
>
> diff -Naru linux-2.4.20/arch/i386/kernel/time.c linux-2.4.20-corvil/arch/i386/kernel/time.c
> --- linux-2.4.20/arch/i386/kernel/time.c 2002-11-28 23:53:09.000000000 +0000
> +++ linux-2.4.20-pb/arch/i386/kernel/time.c 2005-07-07 10:32:34.000000000 +0100
> @@ -94,6 +94,9 @@
>
> /* .. relative to previous jiffy (32 bits is enough) */
> eax -= last_tsc_low; /* tsc_low delta */
> + if ((signed)eax < 0) { /* workaround for drifting TSCs */
> + eax = 0;
> + printk(KERN_INFO "tsc wrap around applied\n"); /* rare */
> + }
>
> /*
> * Time offset = (tsc_low delta) * fast_gettimeoffset_quotient

Cheers,
Willy

2006-11-01 03:05:49

by Suresh Siddha

Subject: Re: AMD X2 unsynced TSC fix?

On Wed, Nov 01, 2006 at 01:46:48AM +0000, Sergio Monteiro Basto wrote:
> On Mon, 2006-10-30 at 18:41 -0800, Siddha, Suresh B wrote:
> > On Tue, Oct 31, 2006 at 12:03:28AM +0000, Sergio Monteiro Basto wrote:
> > > time.c: Lost 300 timer tick(s)! rip mwait_idle+0x33/0x4f)
> > > time.c: Lost 300 timer tick(s)! rip mwait_idle+0x33/0x4f)
> > > time.c: Lost 300 timer tick(s)! rip mwait_idle+0x33/0x4f)
> > > time.c: Lost 300 timer tick(s)! rip mwait_idle+0x33/0x4f)
> >
> > Is this the reason why you are saying your system has unsynchronized TSC?
> > Somewhere in this thread, you mentioned that lost ticks happen even
> > when you use "notsc"
> >
> > This sounds to me like a different problem. Can you send us the output
> > of /proc/interrupts?
>
> /proc/interrupts on kernel 2.6.18
> http://bugzilla.kernel.org/attachment.cgi?id=9384&action=view
> dmesg w/o notsc kernel 2.6.19-rc4
> http://bugzilla.kernel.org/attachment.cgi?id=9385&action=view
> /proc/interrupts kernel 2.6.19-rc4
> http://bugzilla.kernel.org/attachment.cgi?id=9386&action=view
> dmesg w/ notsc kernel 2.6.19-rc4
> http://bugzilla.kernel.org/attachment.cgi?id=9387&action=view
> /proc/interrupts kernel 2.6.19-rc4
> http://bugzilla.kernel.org/attachment.cgi?id=9388&action=view
> list of interrupts given by Windows XP
> http://bugzilla.kernel.org/attachment.cgi?id=9389&action=view

First of all, the "lost timer ticks" messages and the fact that "notsc"
decreases the number of lost ticks do not by themselves point to a TSC sync issue.

Some device is hogging interrupts, which results in lost timer ticks, and from
your 2.6.18 interrupts info, USB seems to be the culprit. It is probably
a side effect that "notsc" decreases the lost timer ticks.

Copied Len, who seems to be the owner of the bug, for his thoughts.
(http://bugzilla.kernel.org/show_bug.cgi?id=6419)

thanks,
suresh

2006-11-01 01:46:58

by Sergio Monteiro Basto

Subject: Re: AMD X2 unsynced TSC fix?

On Mon, 2006-10-30 at 18:41 -0800, Siddha, Suresh B wrote:
> On Tue, Oct 31, 2006 at 12:03:28AM +0000, Sergio Monteiro Basto wrote:
> > time.c: Lost 300 timer tick(s)! rip mwait_idle+0x33/0x4f)
> > time.c: Lost 300 timer tick(s)! rip mwait_idle+0x33/0x4f)
> > time.c: Lost 300 timer tick(s)! rip mwait_idle+0x33/0x4f)
> > time.c: Lost 300 timer tick(s)! rip mwait_idle+0x33/0x4f)
>
> Is this the reason why you are saying your system has unsynchronized TSC?
> Somewhere in this thread, you mentioned that lost ticks happen even
> when you use "notsc"
>
> This sounds to me like a different problem. Can you send us the output
> of /proc/interrupts?

/proc/interrupts on kernel 2.6.18
http://bugzilla.kernel.org/attachment.cgi?id=9384&action=view
dmesg w/o notsc kernel 2.6.19-rc4
http://bugzilla.kernel.org/attachment.cgi?id=9385&action=view
/proc/interrupts kernel 2.6.19-rc4
http://bugzilla.kernel.org/attachment.cgi?id=9386&action=view
dmesg w/ notsc kernel 2.6.19-rc4
http://bugzilla.kernel.org/attachment.cgi?id=9387&action=view
/proc/interrupts kernel 2.6.19-rc4
http://bugzilla.kernel.org/attachment.cgi?id=9388&action=view
list of interrupts given by Windows XP
http://bugzilla.kernel.org/attachment.cgi?id=9389&action=view

Let me know, if I can help on something.
Thanks,
--
Sérgio M.B.



2006-11-08 00:22:34

by Sergio Monteiro Basto

Subject: Re: AMD X2 unsynced TSC fix?

On Tue, 2006-10-31 at 18:44 -0800, Siddha, Suresh B wrote:

> First of all, the "lost timer ticks" messages and the fact that "notsc"
> decreases the number of lost ticks do not by themselves point to a TSC sync issue.

OK, but without notsc it is a nightmare.

>
> Some device is hogging interrupts, which results in lost timer ticks, and from
> your 2.6.18 interrupts info, USB seems to be the culprit. It is probably
> a side effect that "notsc" decreases the lost timer ticks.

I began using the network over Ethernet instead of usbnet, and that reduced
the problems a little (I can have nvidia DRI working without problems or
oopses), but the same lost ticks still appear.

> Copied Len, who seems to be the owner of the bug, for his thoughts.
> (http://bugzilla.kernel.org/show_bug.cgi?id=6419)

I have updated bugzilla with the dmesg from 2.6.19-RC4-mm2, which already
comes with the latest release of hrtimers, because for the first time I could
boot without hanging, with hrtimers and without the notsc boot option.
But it has a very long oops that may give you some clues.

http://bugzilla.kernel.org/show_bug.cgi?id=6419#c55



Thanks,

--
Sérgio M.B.



2006-11-08 19:51:38

by Thomas Gleixner

Subject: Re: AMD X2 unsynced TSC fix?

On Wed, 2006-11-08 at 00:22 +0000, Sergio Monteiro Basto wrote:
> I have updated bugzilla with the dmesg from 2.6.19-RC4-mm2, which already
> comes with the latest release of hrtimers, because for the first time I could
> boot without hanging, with hrtimers and without the notsc boot option.
> But it has a very long oops that may give you some clues.
>
> http://bugzilla.kernel.org/show_bug.cgi?id=6419#c55

This one is a lock dependency problem, which is fixed in -rc5-mm1

tglx


2006-11-09 00:46:34

by Sergio Monteiro Basto

Subject: Re: AMD X2 unsynced TSC fix?

On Wed, 2006-11-08 at 20:53 +0100, Thomas Gleixner wrote:
> This one is a lock dependency problem, which is fixed in -rc5-mm1

Yes, the oops is fixed with and without the notsc option.
Another question: hrtimer on 2.6.18 found the acpi_pm clocksource and used it.
With 2.6.19-rcX I can't get the acpi_pm clocksource, even when trying to force
it at boot with clocksource=acpi_pm. Any idea?
With this clocksource my lost ticks disappear.

Thanks,
--
Sérgio M.B.



2006-11-09 01:13:35

by john stultz

Subject: Re: AMD X2 unsynced TSC fix?

On Thu, 2006-11-09 at 00:39 +0000, Sergio Monteiro Basto wrote:
> On Wed, 2006-11-08 at 20:53 +0100, Thomas Gleixner wrote:
> > This one is a lock dependency problem, which is fixed in -rc5-mm1
>
> Yes, the oops is fixed with and without the notsc option.
> Another question: hrtimer on 2.6.18 found the acpi_pm clocksource and used it.
> With 2.6.19-rcX I can't get the acpi_pm clocksource, even when trying to force
> it at boot with clocksource=acpi_pm. Any idea?
> With this clocksource my lost ticks disappear.

Looking at the dmesg in the bugzilla:
http://bugzilla.kernel.org/show_bug.cgi?id=6419

I noticed you're using x86_64. x86_64 doesn't yet support clocksource
overrides in mainline, as it is not converted to GENERIC_TIME. (Probably
printing out such a warning if an override is used would be nice. I'll
try to get to that soon.)

Now, the code to convert x86_64 is in tglx's hrtimer patch set, so I'm
glad to hear it's working for you, however I'm not sure if it really is
solving the issue or just hiding it (as lost ticks won't affect
timekeeping when you use continuous clocksources and GENERIC_TIME).

To use the ACPI PM w/ a 2.6.19-rcX kernel, use "notsc", and you'll see
the line:
time.c: Using 3.579545 MHz WALL PM GTOD PM timer.

Using the "notsc" option, do you continue to see lost tick messages
after bootup?

thanks
-john


2006-11-09 01:34:23

by Sergio Monteiro Basto

Subject: Re: AMD X2 unsynced TSC fix?

On Wed, 2006-11-08 at 17:13 -0800, john stultz wrote:
> Using the "notsc" option, do you continue to see lost tick messages
> after bootup?

With notsc, the lost ticks stop after bootup. The biggest exception
was the last kernel I tested (2.6.19-RC5-mm1), where a few lost ticks
appeared, but they seem to have stopped. I am waiting to see if a new
one appears, but none has so far.

Thanks,


--
Sérgio M.B.



2006-11-15 01:58:50

by Sergio Monteiro Basto

Subject: Re: AMD X2 unsynced TSC fix?

On Wed, 2006-11-08 at 17:13 -0800, john stultz wrote:
> On Thu, 2006-11-09 at 00:39 +0000, Sergio Monteiro Basto wrote:
> > On Wed, 2006-11-08 at 20:53 +0100, Thomas Gleixner wrote:
> > > This one is a lock dependency problem, which is fixed in -rc5-mm1
> >
> > Yes, the oops is fixed with and without the notsc option.
> > Another question: hrtimer on 2.6.18 found the acpi_pm clocksource and used it.
> > With 2.6.19-rcX I can't get the acpi_pm clocksource, even when trying to force
> > it at boot with clocksource=acpi_pm. Any idea?
> > With this clocksource my lost ticks disappear.
>
> Looking at the dmesg in the bugzilla:
> http://bugzilla.kernel.org/show_bug.cgi?id=6419
>
> I noticed you're using x86_64.

Yes, I _only_ use x86_64; I have never tested it on i386.

> x86_64 doesn't yet support clocksource
> overrides in mainline,

Pity. Can I have an experimental patch to test it?

> as it is not converted to GENERIC_TIME. (Probably
> printing out such a warning if an override is used would be nice. I'll
> try to get to that soon.)
>
> Now, the code to convert x86_64 is in tglx's hrtimer patch set, so I'm
> glad to hear it's working for you, however I'm not sure if it really is
> solving the issue or just hiding it (as lost ticks won't affect
> timekeeping when you use continuous clocksources and GENERIC_TIME).

Well, the only kernel I can work with (yes, I use my computer for work) is
2.6.18 + dyntick. I think it neither hides nor solves the issue; it just uses
another resource (clocksource) that works better!

>
> To use the ACPI PM w/ a 2.6.19-rcX kernel, use "notsc", and you'll see
> the line:
> time.c: Using 3.579545 MHz WALL PM GTOD PM timer.
>
> Using the "notsc" option, do you continue to see lost tick messages
> after bootup?

I just tested 2.6.19-RC5-mm2 and it is still very unstable, even with notsc.
And yes, some lost tick messages appear after bootup.
Just trying to rebuild another kernel while using yum to update other
things at the same time locked up my computer.
So I went back to kernel 2.6.18 + dyntick.

Thanks,

>
> thanks
> -john
>
>
--
Sérgio M.B.



2006-11-16 01:38:46

by Sergio Monteiro Basto

Subject: Re: AMD X2 unsynced TSC fix?

On Wed, 2006-11-15 at 19:40 +0100, Andreas Arens wrote:
> as I see from the dmesg on the Fedora bugzilla, your acpi tables
> don't provide an entry to the HPET timer.

> As the VIA8237 happens to have a built-in HPET, I was able to force it
> on using the
> attached patch (against 2.6.18) on an X2 system with the same
> problem, which greatly improved the system stability for me.

But I have an Intel(R) Pentium(R) D CPU 2.8 on a VIA8237.
My latest suspicion is that the root of the problem on my computer is not
in the processor but in those VIA chips. Your finding that the ACPI tables
"don't provide an entry to the HPET timer" matches that, but how do you
know it? I didn't send the DSDT to bugzilla.


> The patch is hand-crafted from some older clock-tick kernel tree
> sources I found by googling.
>
> The thing is hackish and not suitable for mainline inclusion,
> but may be useful nonetheless.
> If you find it useful, and it helps you please let me know.

I tried your patch and it gives me these differences in dmesg (file attached):
it detects a different timer.c, but there is no improvement without the notsc
boot option, and with notsc the computer got worse.


>
--
Sérgio M.B.


Attachments:
dmesg30-38.diff (14.48 kB)

2006-11-16 01:45:38

by Sergio Monteiro Basto

Subject: Re: AMD X2 unsynced TSC fix?

Yep, Andreas Arens sent the patch just to me, so I am sending it to the
mailing lists.


On Thu, 2006-11-16 at 01:38 +0000, Sergio Monteiro Basto wrote:
> On Wed, 2006-11-15 at 19:40 +0100, Andreas Arens wrote:
> > as I see from the dmesg on the Fedora bugzilla, your acpi tables
> > don't provide an entry to the HPET timer.
>
> > As the VIA8237 happens to have a built-in HPET, I was able to force it
> > on using the
> > attached patch (against 2.6.18) on an X2 system with the same
> > problem, which greatly improved the system stability for me.
>
> But I have an Intel(R) Pentium(R) D CPU 2.8 on a VIA8237.
> My latest suspicion is that the root of the problem on my computer is not
> in the processor but in those VIA chips. Your finding that the ACPI tables
> "don't provide an entry to the HPET timer" matches that, but how do you
> know it? I didn't send the DSDT to bugzilla.
>
>
> > The patch is hand-crafted from some older clock-tick kernel tree
> > sources I found by googling.
> >
> > The thing is hackish and not suitable for mainline inclusion,
> > but may be useful nonetheless.
> > If you find it useful, and it helps you please let me know.
>
> I tried your patch and it gives me these differences in dmesg (file attached):
> it detects a different timer.c, but there is no improvement without the notsc
> boot option, and with notsc the computer got worse.
>
>
> >
--
Sérgio M.B.


Attachments:
2_6_18_via_8237_force_hpet_enable.diff (1.40 kB)