2005-11-28 12:06:12

by Steven Rostedt

Subject: [RT] read_tsc: ACK! TSC went backward! Unsynced TSCs?

Hi Ingo,

With -rt20 on the AMD64 x2, I'm getting a crap load of these:

read_tsc: ACK! TSC went backward! Unsynced TSCs?

So bad that the system won't even boot (at least I won't wait long enough
to let it finish).

config at: http://www.kihontech.com/tests/rt/config

-- Steve



2005-11-28 12:27:57

by Jonas Oreland

Subject: Re: [RT] read_tsc: ACK! TSC went backward! Unsynced TSCs?

Steven Rostedt wrote:
> Hi Ingo,
>
> With -rt20 on the AMD64 x2, I'm getting a crap load of these:
>
> read_tsc: ACK! TSC went backward! Unsynced TSCs?
>
> So bad that the system won't even boot (at least I won't wait long enough
> to let it finish).
>
> config at: http://www.kihontech.com/tests/rt/config

Check this: http://bugzilla.kernel.org/show_bug.cgi?id=5105

Booting with idle=poll fixes that.

Hope it helps

/Jonas

2005-11-28 12:59:54

by Steven Rostedt

Subject: Re: [RT] read_tsc: ACK! TSC went backward! Unsynced TSCs?

On Mon, 2005-11-28 at 13:30 +0100, Jonas Oreland wrote:
> Steven Rostedt wrote:
> > Hi Ingo,
> >
> > With -rt20 on the AMD64 x2, I'm getting a crap load of these:
> >
> > read_tsc: ACK! TSC went backward! Unsynced TSCs?
> >
> > So bad that the system won't even boot (at least I won't wait long enough
> > to let it finish).
> >
> > config at: http://www.kihontech.com/tests/rt/config
>
> Check this: http://bugzilla.kernel.org/show_bug.cgi?id=5105
>
> Booting with idle=poll fixes that.
>
> Hope it helps

I forgot to mention that I tried that too. But thank you for sending
this, because, just to make sure, I tried it again, and now it booted.
I think I might have had a typo when adding idle=poll the first time. I
think of this as a temporary solution, and I won't be adding that to
grub anytime soon. Manually typing it in at boot time will keep me
remembering that it is there. As long as I make sure that I type it
right ;-)

OK, this means that I don't want to stay in the -RT kernel too long
(electric prices are up you know).

-- Steve


2005-11-28 14:20:58

by Ingo Molnar

Subject: Re: [RT] read_tsc: ACK! TSC went backward! Unsynced TSCs?


* Steven Rostedt wrote:

> > Booting with idle=poll fixes that.
> >
> > Hope it helps
>
> I forgot to mention that I tried that too. But thank you for sending
> this, because, just to make sure, I tried it again, and now it booted.
> I think I might have had a typo when adding idle=poll the first time.
> I think of this as a temporary solution, and I won't be adding that to
> grub anytime soon. Manually typing it in at boot time will keep me
> remembering that it is there. As long as I make sure that I type it
> right ;-)
>
> OK, this means that I don't want to stay in the -RT kernel too long
> (electric prices are up you know).

you'll get problems on stock SMP kernels too, unless you use idle=poll
(or notsc). You can get rid of those warnings in the -rt kernel by
disabling the PARANOID_GENERIC_TIME .config option. In any case, i think
that warning should be printed only once per bootup.

Ingo
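
The once-per-bootup behaviour suggested above can be sketched roughly as follows. This is a toy model in Python, not the -rt kernel's code: the class name, method name and counter field are hypothetical, and only the warning string is taken from the report at the top of the thread.

```python
class BackwardTscDetector:
    """Toy detector: counts every backward step but warns only once."""

    def __init__(self):
        self.prev = None          # previous counter value seen, if any
        self.warned = False       # set after the first warning is issued
        self.backward_count = 0   # total backward steps observed

    def check(self, tsc):
        """Return the warning text on the first backward step, else None."""
        message = None
        if self.prev is not None and tsc < self.prev:
            self.backward_count += 1
            if not self.warned:
                self.warned = True
                message = "read_tsc: ACK! TSC went backward! Unsynced TSCs?"
        self.prev = tsc
        return message


if __name__ == "__main__":
    detector = BackwardTscDetector()
    for value in [100, 90, 95, 80, 120]:
        warning = detector.check(value)
        if warning:
            print(warning)
    print("backward steps seen:", detector.backward_count)
```

Keeping a counter alongside the one-shot flag means the problem stays visible without flooding the log.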

2005-11-28 15:38:15

by Lee Revell

Subject: Re: [RT] read_tsc: ACK! TSC went backward! Unsynced TSCs?

On Mon, 2005-11-28 at 13:30 +0100, Jonas Oreland wrote:
> Steven Rostedt wrote:
> > Hi Ingo,
> >
> > With -rt20 on the AMD64 x2, I'm getting a crap load of these:
> >
> > read_tsc: ACK! TSC went backward! Unsynced TSCs?
> >
> > So bad that the system won't even boot (at least I won't wait long enough
> > to let it finish).
> >
> > config at: http://www.kihontech.com/tests/rt/config
>
> Check this: http://bugzilla.kernel.org/show_bug.cgi?id=5105
>
> Booting with idle=poll fixes that.
>

But that bug was marked fixed over a month ago. Isn't the fix in 2.6.14?

Lee

2005-11-28 17:26:17

by Tim Hockin

Subject: Re: [RT] read_tsc: ACK! TSC went backward! Unsynced TSCs?

On Mon, Nov 28, 2005 at 07:05:54AM -0500, Steven Rostedt wrote:
> With -rt20 on the AMD64 x2, I'm getting a crap load of these:
>
> read_tsc: ACK! TSC went backward! Unsynced TSCs?
>
> So bad that the system won't even boot (at least I won't wait long enough
> to let it finish).

The kernel's use of TSC is wholly incorrect. TSCs can ramp up and down
and *do* vary between nodes as well as between cores within a node. You
really can not compare TSCs between cpu cores at all, as is (and the
kernel assumes 1 global TSC in at least a few places).

If you have any sort of power-management enabled on a k8 (including 'hlt'
C1 state), you *will* get hosed.

We got into a situation where 1 CPU had somehow lagged behind the other
because it was idle for a while. Suddenly gettimeofday() was only giving
me HZ granularity. Successive reads would get the exact same timeval, as
much as 1 ms later.

What happened was the last_tsc was set to the higher-TSC CPU. The
gettimeofday code for TSC was running on the lower-TSC CPU. The code
recognized that current tsc < last tsc and set current = last. As long as
I was running on the laggy CPU, time stood still for bursts. Then if I
bounced CPUs it would shoot forward.

Switching to HPET for timing made it all go away, but the TSC-based path
(at least as of 2.6.11) was horribly broken.

Tim
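
The clamping behaviour described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the kernel's gettimeofday code: it is written in Python, time is measured directly in cycles, the timer tick always runs on the faster CPU, and `TICK`, `read_clamped` and `simulate` are hypothetical names chosen for illustration.

```python
TICK = 1000  # cycles between timer interrupts (hypothetical value)


def read_clamped(cpu_tsc, last_tsc):
    """Return this CPU's TSC, or last_tsc if this CPU's counter is behind it."""
    return last_tsc if cpu_tsc < last_tsc else cpu_tsc


def simulate(lag, duration, step=250):
    """Sample the clock every `step` cycles on a CPU whose TSC lags by `lag`.

    The tick handler runs on the other CPU, whose TSC equals true time `t`,
    and records that value as last_tsc.
    """
    last_tsc = 0
    readings = []
    for t in range(0, duration, step):
        if t % TICK == 0:
            last_tsc = t  # tick handled on the higher-TSC CPU
        readings.append(read_clamped(t - lag, last_tsc))
    return readings


if __name__ == "__main__":
    print("synced CPUs:", simulate(lag=0, duration=2000))
    print("lagging CPU:", simulate(lag=1500, duration=3000))
```

With no lag the readings advance on every sample. With a lag larger than one tick, every reading on the slow CPU is clamped to the last tick value, so the clock only moves at tick granularity, which matches the HZ-granularity symptom.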

2005-11-28 17:52:00

by Lee Revell

Subject: Re: [RT] read_tsc: ACK! TSC went backward! Unsynced TSCs?

On Mon, 2005-11-28 at 09:30 -0800, Tim Hockin wrote:
> The kernel's use of TSC is wholly incorrect. TSCs can ramp up and
> down and *do* vary between nodes as well as between cores within a
> node. You really can not compare TSCs between cpu cores at all, as is
> (and the kernel assumes 1 global TSC in at least a few places).

That's one way to look at it; another is that the AMD dual cores have a
broken TSC implementation. The kernel's use of the TSC was never a
problem in the past...

Lee

2005-11-28 18:31:19

by Dave Jones

Subject: Re: [RT] read_tsc: ACK! TSC went backward! Unsynced TSCs?

On Mon, Nov 28, 2005 at 12:39:28PM -0500, Lee Revell wrote:
> On Mon, 2005-11-28 at 09:30 -0800, Tim Hockin wrote:
> > The kernel's use of TSC is wholly incorrect. TSCs can ramp up and
> > down and *do* vary between nodes as well as between cores within a
> > node. You really can not compare TSCs between cpu cores at all, as is
> > (and the kernel assumes 1 global TSC in at least a few places).
>
> That's one way to look at it; another is that the AMD dual cores have a
> broken TSC implementation. The kernel's use of the TSC was never a
> problem in the past...

Not true. Speedstep, or anything else that uses SMI to disappear
into magick bios code for long periods of time, also has exactly
the same issue.

This is not a new problem.

Dave

2005-11-28 18:30:57

by Tim Hockin

Subject: Re: [RT] read_tsc: ACK! TSC went backward! Unsynced TSCs?

On Mon, Nov 28, 2005 at 12:39:28PM -0500, Lee Revell wrote:
> On Mon, 2005-11-28 at 09:30 -0800, Tim Hockin wrote:
> > The kernel's use of TSC is wholly incorrect. TSCs can ramp up and
> > down and *do* vary between nodes as well as between cores within a
> > node. You really can not compare TSCs between cpu cores at all, as is
> > (and the kernel assumes 1 global TSC in at least a few places).
>
> That's one way to look at it; another is that the AMD dual cores have a
> broken TSC implementation. The kernel's use of the TSC was never a
> problem in the past...

Sure. But the OS can be fixed, the chips can not. That said, I'd like to
see a spec that says TSCs are a) synced, b) linear. If such a beast
exists, then we can all mock AMD publicly. If not, we should hush up and
fix the parts that can be fixed.

2005-11-29 09:38:21

by Andi Kleen

Subject: Re: [RT] read_tsc: ACK! TSC went backward! Unsynced TSCs?

Tim Hockin writes:

> On Mon, Nov 28, 2005 at 12:39:28PM -0500, Lee Revell wrote:
> > On Mon, 2005-11-28 at 09:30 -0800, Tim Hockin wrote:
> > > The kernel's use of TSC is wholly incorrect. TSCs can ramp up and
> > > down and *do* vary between nodes as well as between cores within a
> > > node. You really can not compare TSCs between cpu cores at all, as is
> > > (and the kernel assumes 1 global TSC in at least a few places).
> >
> > That's one way to look at it; another is that the AMD dual cores have a
> > broken TSC implementation. The kernel's use of the TSC was never a
> > problem in the past...
>
> Sure. But the OS can be fixed, the chips can not. That said, I'd like to
> see a spec that says TSCs are a) synced, b) linear.

They have specs that say they're not. Intel has specs that say that they
are on newer systems, but at least one chipset breaks it.

They are never linear, because there are MSRs that can change them.

But I'm surprised you're saying 2.6.11 broke. At least 64-bit 2.6.11 should
always have used HPET in this case. I only broke it around 2.6.13,
when I added an overeager optimization for single-socket dual-core systems,
based on a misunderstanding on my side. Earlier and later kernels should
have been OK.

32bit might have been different.

-Andi

2005-11-29 16:47:32

by Tim Hockin

Subject: Re: [RT] read_tsc: ACK! TSC went backward! Unsynced TSCs?

On Tue, Nov 29, 2005 at 07:06:24AM -0700, Andi Kleen wrote:
> But I'm surprised you're saying 2.6.11 broke. At least 2.6.11 64bit should
> have always used HPET in this case. I only broke it around 2.6.13
> where I added an overeager optimization for single socket DC on my side based
> on a misunderstanding. Earlier and later kernels should have been ok.

We didn't have HPET enabled in BIOS until recently. Turning that on made
all the TSC gettimeofday() crap disappear. Now to find and kill any
straggling users of rdtsc.

2005-11-29 16:49:01

by Andi Kleen

Subject: Re: [RT] read_tsc: ACK! TSC went backward! Unsynced TSCs?

On Tue, Nov 29, 2005 at 08:52:19AM -0800, Tim Hockin wrote:
> On Tue, Nov 29, 2005 at 07:06:24AM -0700, Andi Kleen wrote:
> > But I'm surprised you're saying 2.6.11 broke. At least 2.6.11 64bit should
> > have always used HPET in this case. I only broke it around 2.6.13
> > where I added an overeager optimization for single socket DC on my side based
> > on a misunderstanding. Earlier and later kernels should have been ok.
>
> we didn't have HPET enabled in BIOS until recently. Turning that on made
> all the TSC gettimeofday() crap disappear. Now to find and kill any
> straggling users of rdtsc

The newer kernels work around this too by using pmtimer when needed.
Of course it's slow.

Regarding straggling users of rdtsc - one way would be to optionally
trap them and log them in the kernel. That would work in ring 3 at least.

-Andi