2006-11-09 23:43:36

by Thomas Gleixner

Subject: [patch 13/19] GTOD: Mark TSC unusable for highres timers

From: Thomas Gleixner <[email protected]>

The TSC is too unstable and unreliable to be used with high resolution timers.
The automatic detection of TSC instability fails once we have switched to high
resolution mode, because the tick emulation would use the TSC as reference.
This results in a circular dependency. Mark it unusable for high res upfront.

[[email protected]: updated for i386-time-avoid-pit-smp-lockups.patch]
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

diff -puN arch/i386/kernel/tsc.c~gtod-mark-tsc-unusable-for-highres-timers arch/i386/kernel/tsc.c
--- a/arch/i386/kernel/tsc.c~gtod-mark-tsc-unusable-for-highres-timers
+++ a/arch/i386/kernel/tsc.c
@@ -459,10 +459,23 @@ static int __init init_tsc_clocksource(v
 		current_tsc_khz = tsc_khz;
 		clocksource_tsc.mult = clocksource_khz2mult(current_tsc_khz,
 							clocksource_tsc.shift);
+#ifndef CONFIG_HIGH_RES_TIMERS
 		/* lower the rating if we already know its unstable: */
 		if (check_tsc_unstable())
 			clocksource_tsc.rating = 0;
-
+#else
+		/*
+		 * Mark TSC unsuitable for high resolution timers. TSC has so
+		 * many pitfalls: frequency changes, stop in idle ... When we
+		 * switch to high resolution mode we can not longer detect a
+		 * firmware caused frequency change, as the emulated tick uses
+		 * TSC as reference. This results in a circular dependency.
+		 * Switch only to high resolution mode, if pm_timer or such
+		 * is available.
+		 */
+		clocksource_tsc.rating = 50;
+		clocksource_tsc.is_continuous = 0;
+#endif
 		init_timer(&verify_tsc_freq_timer);
 		verify_tsc_freq_timer.function = verify_tsc_freq;
 		verify_tsc_freq_timer.expires =
_

--
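
[Editor's illustration, not part of the patch.] The effect of lowering the rating can be sketched as follows: the timekeeping core prefers the clocksource with the highest rating, so a TSC rated 50 loses to pm_timer or similar whenever one is registered. This is a minimal sketch under stated assumptions; `struct clock_src`, `pick_clocksource()` and the rating of 200 used for the pm_timer entry in the usage note are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

/* Illustrative stand-in for a clocksource entry; NOT the kernel's struct. */
struct clock_src {
	const char *name;
	int rating;        /* higher is preferred */
	int is_continuous; /* 0: not usable for highres/dyntick */
};

/* Return the highest-rated entry, or NULL if the list is empty. */
static const struct clock_src *pick_clocksource(const struct clock_src *cs,
						size_t n)
{
	const struct clock_src *best = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (!best || cs[i].rating > best->rating)
			best = &cs[i];
	return best;
}
```

With a list holding `{"tsc", 50, 0}` and an assumed `{"acpi_pm", 200, 1}`, the function returns the pm_timer entry; with only the TSC entry present it still returns the TSC, which is the case the patch guards against by also clearing `is_continuous`.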


2006-11-10 01:13:18

by john stultz

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

On Thu, 2006-11-09 at 23:38 +0000, Thomas Gleixner wrote:
> plain text document attachment
> (gtod-mark-tsc-unusable-for-highres-timers.patch)
> From: Thomas Gleixner <[email protected]>
>
> The TSC is too unstable and unreliable to be used with high resolution timers.
> The automatic detection of TSC unstability fails once we switched to high
> resolution mode, because the tick emulation would use the TSC as reference.
> This results in a circular dependency. Mark it unusable for high res upfront.
>
> [[email protected]: updated for i386-time-avoid-pit-smp-lockups.patch]
> Signed-off-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
>
> diff -puN arch/i386/kernel/tsc.c~gtod-mark-tsc-unusable-for-highres-timers arch/i386/kernel/tsc.c
> --- a/arch/i386/kernel/tsc.c~gtod-mark-tsc-unusable-for-highres-timers
> +++ a/arch/i386/kernel/tsc.c
> @@ -459,10 +459,23 @@ static int __init init_tsc_clocksource(v
> current_tsc_khz = tsc_khz;
> clocksource_tsc.mult = clocksource_khz2mult(current_tsc_khz,
> clocksource_tsc.shift);
> +#ifndef CONFIG_HIGH_RES_TIMERS
> /* lower the rating if we already know its unstable: */
> if (check_tsc_unstable())
> clocksource_tsc.rating = 0;
> -
> +#else
> + /*
> + * Mark TSC unsuitable for high resolution timers. TSC has so
> + * many pitfalls: frequency changes, stop in idle ... When we
> + * switch to high resolution mode we can not longer detect a
> + * firmware caused frequency change, as the emulated tick uses
> + * TSC as reference. This results in a circular dependency.
> + * Switch only to high resolution mode, if pm_timer or such
> + * is available.
> + */
> + clocksource_tsc.rating = 50;
> + clocksource_tsc.is_continuous = 0;
> +#endif
> init_timer(&verify_tsc_freq_timer);
> verify_tsc_freq_timer.function = verify_tsc_freq;
> verify_tsc_freq_timer.expires =


Hmmm. I wish this patch was unnecessary, but I don't see an easy
solution.

Mind adding a warning so users know why a system that might use the TSC
normally does not use the TSC w/ highres timers?

Otherwise looks ok.

Acked-by: John Stultz <[email protected]>

thanks
-john

2006-11-10 05:10:32

by Andi Kleen

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

> > current_tsc_khz = tsc_khz;
> > clocksource_tsc.mult = clocksource_khz2mult(current_tsc_khz,
> > clocksource_tsc.shift);
> > +#ifndef CONFIG_HIGH_RES_TIMERS
> > /* lower the rating if we already know its unstable: */
> > if (check_tsc_unstable())
> > clocksource_tsc.rating = 0;
> > -
> > +#else
> > + /*
> > + * Mark TSC unsuitable for high resolution timers. TSC has so
> > + * many pitfalls: frequency changes, stop in idle ... When we
> > + * switch to high resolution mode we can not longer detect a
> > + * firmware caused frequency change, as the emulated tick uses
> > + * TSC as reference. This results in a circular dependency.
> > + * Switch only to high resolution mode, if pm_timer or such
> > + * is available.
> > + */
> > + clocksource_tsc.rating = 50;
> > + clocksource_tsc.is_continuous = 0;
> > +#endif
> > init_timer(&verify_tsc_freq_timer);
> > verify_tsc_freq_timer.function = verify_tsc_freq;
> > verify_tsc_freq_timer.expires =
>
>
> Hmmm. I wish this patch was unnecessary, but I don't see an easy
> solution.

Very sad. This will make a lot of people unhappy, even to the point
where they might prefer disabling noidlehz over super slow gettimeofday.
I assume you at least have a suitable command line option for that, right?

Can we get a summary on which systems the TSC is considered unstable?
Normally we assume if it's stable enough for gettimeofday it should
be stable enough for longer delays too.

-Andi

2006-11-10 08:07:49

by Thomas Gleixner

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

On Fri, 2006-11-10 at 06:10 +0100, Andi Kleen wrote:
> > > verify_tsc_freq_timer.function = verify_tsc_freq;
> > > verify_tsc_freq_timer.expires =
> >
> >
> > Hmmm. I wish this patch was unnecessary, but I don't see an easy
> > solution.
>
> Very sad. This will make a lot of people unhappy, even to the point
> where they might prefer disabling noidlehz over super slow gettimeofday.
> I assume you at least have a suitable command line option for that, right?

Yes it is sad. And the saddest part is that AMD and Intel were asked
to fix that more than 5 years ago. They did not get their brain straight
and now we are the dimwits.

> Can we get a summary on which systems the TSC is considered unstable?
> Normally we assume if it's stable enough for gettimeofday it should
> be stable enough for longer delays too.

TSC is simply a nightmare:

- Frequency changes with CPU clock
- Unsynced across CPUs
- Stops in C3, which makes it completely unusable

Once you take away periodic interrupts it is simply broken. AMD and
Intel can run in circles, it does not get better.

tglx
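
[Editor's illustration.] The first pitfall in the list can be made concrete with the mult/shift conversion a clocksource uses to turn cycles into nanoseconds. The sketch below follows the scheme of `clocksource_khz2mult()` as I understand it; the helper names and the shift value of 22 in the usage note are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Scaled ns-per-cycle factor for a counter running at khz (rounded). */
static uint32_t khz2mult(uint32_t khz, uint32_t shift)
{
	uint64_t tmp = (uint64_t)1000000 << shift;

	tmp += khz / 2;
	return (uint32_t)(tmp / khz);
}

/* Convert a cycle delta to nanoseconds with a precomputed mult/shift. */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}
```

Calibrated at 2 GHz (`khz2mult(2000000, 22)`), one second of cycles converts to 1000000000 ns. If the CPU then drops to 1 GHz without the kernel noticing, the same second of wall time yields only 1000000000 cycles and converts to 500000000 ns, so time appears to run at half speed.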




2006-11-10 08:51:18

by Andrew Morton

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

On Fri, 10 Nov 2006 09:10:06 +0100
Thomas Gleixner <[email protected]> wrote:

> On Fri, 2006-11-10 at 06:10 +0100, Andi Kleen wrote:
> > > > verify_tsc_freq_timer.function = verify_tsc_freq;
> > > > verify_tsc_freq_timer.expires =
> > >
> > >
> > > Hmmm. I wish this patch was unnecessary, but I don't see an easy
> > > solution.
> >
> > Very sad. This will make a lot of people unhappy, even to the point
> > where they might prefer disabling noidlehz over super slow gettimeofday.
> > I assume you at least have a suitable command line option for that, right?
>
> Yes it is sad. And the sadest part is that AMD and Intel have been asked
> to fix that more than 5 years ago. They did not get their brain straight
> and now we are the dimwits.
>
> > Can we get a summary on which systems the TSC is considered unstable?
> > Normally we assume if it's stable enough for gettimeofday it should
> > be stable enough for longer delays too.
>
> TSC is simply a nightmare:
>
> - Frequency changes with CPU clock
> - Unsynced across CPUs
> - Stops in C3, which makes it completely unusable
>
> Once you take away periodic interrupts it is simply broken. AMD and
> Intel can run in circels, it does not get better.
>

What is the actual problem? verify_tsc_freq()?

If so, could that function use the PIT/pmtimer/etc for working out if
the TSC is bust, rather than directly using jiffies?

2006-11-10 08:58:54

by Ingo Molnar

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers


* Andrew Morton <[email protected]> wrote:

> If so, could that function use the PIT/pmtimer/etc for working out if
> the TSC is bust, rather than directly using jiffies?

there's no reliable way to figure out the TSC is bust: some CPUs have a
slight 'skew' between cores for example. On some systems the TSC might
skew between sockets. A CPU might break its TSC only once some
powersaving mode has been activated - which might be long after bootup.
The whole TSC business is a nightmare and cannot be supported reliably.
AFAIK Windows doesn't use it, so it's a continuous minefield for new
hardware to break.

We should wait until CPU makers get their act together and implement a
TSC variant that is /architecturally promised/ to have constant
frequency (system bus frequency or whatever) and which never stops.

Ingo

2006-11-10 09:14:41

by Andrew Morton

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

On Fri, 10 Nov 2006 09:57:28 +0100
Ingo Molnar <[email protected]> wrote:

>
> * Andrew Morton <[email protected]> wrote:
>
> > If so, could that function use the PIT/pmtimer/etc for working out if
> > the TSC is bust, rather than directly using jiffies?
>
> there's no realiable way to figure out the TSC is bust: some CPUs have a
> slight 'skew' between cores for example. On some systems the TSC might
> skew between sockets. A CPU might break its TSC only once some
> powersaving mode has been activated - which might be long after bootup.
> The whole TSC business is a nightmare and cannot be supported reliably.
> AFAIK Windows doesnt use it, so it's a continuous minefield for new
> hardware to break.

But that's different.

We're limping along in a semi-OK fashion with the TSC. But now Thomas is
proposing that we effectively kill it off for all x86 because of hrtimers.

And afaict the reason for that is that we're using jiffies to determine if
the TSC has gone bad, and that test is getting false positives.

> We should wait until CPU makers get their act together and implement a
> TSC variant that is /architecturally promised/ to have constant
> frequency (system bus frequency or whatever) and which never stops.
>

That'll hurt the big machines rather a lot, won't it?

2006-11-10 09:28:04

by Andi Kleen

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

On Friday 10 November 2006 09:57, Ingo Molnar wrote:
>
> * Andrew Morton <[email protected]> wrote:
>
> > If so, could that function use the PIT/pmtimer/etc for working out if
> > the TSC is bust, rather than directly using jiffies?
>
> there's no realiable way to figure out the TSC is bust: some CPUs have a
> slight 'skew' between cores for example.

We find this out by blacklisting them. I got that working reliably as far
as I know.

The main cases I know of where we can't use it right now are:
- AMD >1 core
  * when clock ramping is disabled it gets a little better, but on multi
    socket it is still broken
  * also it varies in frequency here, which has to be handled
    + There is a little issue here that the frequency takes some
      unpredictable time to stabilize after the frequency change. AFAIK
      the error is too small to cause problems though.
- Some Intel NUMA systems (IBM x4xx, Unisys ES7000, ScaleMP)
  * handled by detecting multiple APIC clusters
- Intel systems with C3
  * stops in C3. disable here
- a few P4 dual cores seem to lose TSC synchronization when overclocked
  (or most likely overvolted) and running out of spec
  * I chose to ignore this case. User fault. They can set command line
    options.
- We had one Intel BIOS which misprogrammed the FSB dividers
  * Got fixed by BIOS update. Also it was an obscure case that can be
    handled with command line options.

I don't see how this is changing much with dyntimers. The only
difference that should be there is that you require TSC stability for
a longer time (instead of only HZ), but normally when the TSC is unstable
it already causes trouble in the current setup.

You're probably overreacting to something. Maybe one of the old bugs?
(I had a typo in the Intel C3 detection for a long time that broke
a lot of Intel laptops)
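
[Editor's illustration.] The blacklist cases above can be condensed into a single predicate. This is only a sketch of the rules as listed in this mail, not the kernel's detection code; `struct sys_info`, its fields and `tsc_trusted()` are hypothetical names, and the overclocking and broken-BIOS cases are left to command line options as described.

```c
#include <stdbool.h>

enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD };

/* Hypothetical summary of what boot-time detection would have found. */
struct sys_info {
	enum cpu_vendor vendor;
	int cores;         /* total cores in the system */
	int apic_clusters; /* >1 on the Intel NUMA boxes named above */
	bool has_c3;       /* deep idle state in which the TSC stops */
};

/* Return true if none of the listed blacklist rules applies. */
static bool tsc_trusted(const struct sys_info *s)
{
	if (s->vendor == VENDOR_AMD && s->cores > 1)
		return false;	/* TSCs drift between cores/sockets */
	if (s->vendor == VENDOR_INTEL && s->apic_clusters > 1)
		return false;	/* multi-cluster NUMA system */
	if (s->vendor == VENDOR_INTEL && s->has_c3)
		return false;	/* TSC stops in C3 */
	return true;
}
```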

> On some systems the TSC might
> skew between sockets. A CPU might break its TSC only once some
> powersaving mode has been activated - which might be long after bootup.
> The whole TSC business is a nightmare and cannot be supported reliably.

I disagree.

> AFAIK Windows doesnt use it, so it's a continuous minefield for new
> hardware to break.

Not true.

> We should wait until CPU makers get their act together and implement a
> TSC variant that is /architecturally promised/ to have constant
> frequency (system bus frequency or whatever)

Intel already has that (modulo totally broken BIOS and overclocking).
TSC is running always at highest P-state and usually synchronized too.

AMD is getting there.

> and which never stops.

Unfortunately that's unrealistic any time soon. All the CPU vendors
are pushing for much more aggressive power saving and this basically
means turning off the CPU completely in the deeper sleep states.

-Andi


2006-11-10 09:30:06

by Andi Kleen

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers


> But that's different.
>
> We're limping along in a semi-OK fashion with the TSC. But now Thomas is
> proposing that we effectively kill it off for all x86 because of hrtimers.

I'm totally against that.

> And afaict the reason for that is that we're using jiffies to determine if
> the TSC has gone bad, and that test is getting false positives.


The i386 clocksource has always had trouble with that. E.g. I have a box
where the TSC works perfectly fine on a 64bit kernel, but since the new i386
clocksource code went in, it always insists on disabling it shortly after boot.
My guess is that some of the checks in there are just broken and need
to be fixed.



>
> > We should wait until CPU makers get their act together and implement a
> > TSC variant that is /architecturally promised/ to have constant
> > frequency (system bus frequency or whatever) and which never stops.
> >
>
> That'll hurt the big machines rather a lot, won't it?

It's unrealistic, and in the short term it will cause extreme pain in many
workloads which are gettimeofday intensive (networking, databases etc.)

-Andi

2006-11-10 10:11:16

by Alan

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

On Fri, 2006-11-10 at 09:57 +0100, Ingo Molnar wrote:
> AFAIK Windows doesnt use it, so it's a continuous minefield for new
> hardware to break.

Windows uses it extensively, especially games. The AMD desync upset a lot
of Windows gamers.

> We should wait until CPU makers get their act together and implement a
> TSC variant that is /architecturally promised/ to have constant
> frequency (system bus frequency or whatever) and which never stops.

This will never happen for the really big boxes, light is just too
slow... Our current TSC handling is not perfect but the TSC is often
quite usable.

If hrtimer needs and requires we stop TSC support then we should delay
the merge of HRTIMERS until these new processors are out and common ;)

Alan

2006-11-10 10:28:46

by Arjan van de Ven

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

On Fri, 2006-11-10 at 06:10 +0100, Andi Kleen wrote:
> Very sad. This will make a lot of people unhappy, even to the point
> where they might prefer disabling noidlehz over super slow gettimeofday.
> I assume you at least have a suitable command line option for that, right?
>
> Can we get a summary on which systems the TSC is considered unstable?

the part where it stops in idle...
(the rest is fixed in recent enough hw)

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-11-10 10:31:16

by Andi Kleen

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

On Friday 10 November 2006 11:28, Arjan van de Ven wrote:
> On Fri, 2006-11-10 at 06:10 +0100, Andi Kleen wrote:
> > Very sad. This will make a lot of people unhappy, even to the point
> > where they might prefer disabling noidlehz over super slow gettimeofday.
> > I assume you at least have a suitable command line option for that, right?
> >
> > Can we get a summary on which systems the TSC is considered unstable?
>
> the part where it stops in idle...

That is handled by if (intel && C3 available) disable

-Andi

2006-11-10 10:36:11

by Arjan van de Ven

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers


> We're limping along in a semi-OK fashion with the TSC.

that's because we fake it a heck of a lot; like after C3 we just make
the kernel guesstimate how much to progress it so that it has just enough
duct tape on it to not totally fall apart ;(

There's no easy answer. We can keep trying to duct-tape the TSC everywhere
it sort of breaks (cpu frequency changes on older chips, C3 idle (which
old kernels hit less often just because of the constant timer ticks),
cross cpu drifts and offsets etc etc).
What that would need at minimum is
1) a per cpu "offset" that gets added to whatever we read from rdtsc
instruction
2) a per cpu "multiplier" or something that gets applied to tsc deltas
3) all code that gets to mop up where TSC breaks (cpuspeed and C3 power
states) use "other timers" to adjust the offset/multiplier values on a
per cpu basis, rather than "hardware TSC".

I suspect that is enough to mostly keep it limping along. It's not
cheap, but it moves the costs mostly to the places where the hardware
can't do it, so if you want to call gettimeofday() in a tight loop at
least you don't pay the hpet tax. (only an add and maybe a mul but those
are cheap and effectively unavoidable if we want to keep the illusion
alive)
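
[Editor's illustration.] The three points above could look roughly like the sketch below. It keeps a per-CPU base pair (raw TSC value plus trusted time) instead of a bare offset added to the rdtsc value, which is an equivalent formulation chosen here for clarity. All names are hypothetical, and how the reference time is obtained (HPET, pm_timer, ...) is left out.

```c
#include <stdint.h>

/* Hypothetical per-CPU correction state; one instance per CPU. */
struct cpu_tsc_state {
	uint64_t base_raw; /* raw TSC value at the last resync */
	uint64_t base_ns;  /* trusted time from another timer at that point */
	uint32_t mult;     /* ns per cycle, scaled by 2^shift */
	uint32_t shift;
};

/* Fast path: an add and a multiply on top of the raw TSC read. */
static uint64_t tsc_to_ns(const struct cpu_tsc_state *s, uint64_t raw)
{
	return s->base_ns + (((raw - s->base_raw) * s->mult) >> s->shift);
}

/*
 * Slow path: called where the TSC is known to break (frequency change,
 * C3 exit), with ref_ns taken from a timer that kept running.
 */
static void tsc_resync(struct cpu_tsc_state *s, uint64_t raw,
		       uint64_t ref_ns, uint32_t new_mult)
{
	s->base_raw = raw;
	s->base_ns = ref_ns;
	s->mult = new_mult;
}
```

The cost lands where Arjan puts it: the resync runs only at the breakage points, while gettimeofday() in a tight loop pays just the multiply and the add.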

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-11-10 10:37:12

by Arjan van de Ven

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

On Fri, 2006-11-10 at 11:30 +0100, Andi Kleen wrote:
> On Friday 10 November 2006 11:28, Arjan van de Ven wrote:
> > On Fri, 2006-11-10 at 06:10 +0100, Andi Kleen wrote:
> > > Very sad. This will make a lot of people unhappy, even to the point
> > > where they might prefer disabling noidlehz over super slow gettimeofday.
> > > I assume you at least have a suitable command line option for that, right?
> > >
> > > Can we get a summary on which systems the TSC is considered unstable?
> >
> > the part where it stops in idle...
>
> That is handled by if (intel && C3 available) disable

I'm not so sure it doesn't stop on AMD; the ACPI spec at least allows
it; just that I've seen few AMD CPUs that actually have C3, but that
could be a matter of time.


--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-11-10 10:47:36

by Andi Kleen

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

On Friday 10 November 2006 11:35, Arjan van de Ven wrote:
>
> > We're limping along in a semi-OK fashion with the TSC.
>
> that's because we fake it a heck of a lot; like after C3 we just make
> the kernel guestimate how much to progress it so that it has just enough
> ductape on it to not totally fall apart ;(

Do we? Where? AFAIK we just do some resetting after cpu frequency
changes, but on C3 TSC is just disabled globally.

That is better than it sounds.

Most systems don't have C3 right now. And on those that have
(laptops) it tends to be not that critical because they normally
don't run workload where gettimeofday() is really time critical
(and nobody expects them to be particularly fast anyways)

[... proposal for per CPU TSC state snipped ...]

All is being worked on.

-Andi

2006-11-10 10:56:14

by Arjan van de Ven

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers


>
> Do we? Where? AFAIK we just do some resetting after cpu frequency
> changes, but on C3 TSC is just disabled globally.
>
> That is better than it sounds.

is it?
>
> Most systems don't have C3 right now. And on those that have
> (laptops) it tends to be not that critical because they normally
> don't run workload where gettimeofday() is really time critical
> (and nobody expects them to be particularly fast anyways)

and that got changed when the blade people decided to start using laptop
processors ......

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-11-10 11:12:46

by Pavel Machek

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

Hi!

> > If so, could that function use the PIT/pmtimer/etc for working out if
> > the TSC is bust, rather than directly using jiffies?
>
> there's no realiable way to figure out the TSC is bust: some CPUs have a
> slight 'skew' between cores for example. On some systems the TSC might
> skew between sockets. A CPU might break its TSC only once some

But we could still do a whitelist?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-11-10 11:11:52

by Pavel Machek

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

Hi!

This debate (lkml) might be useful to you...
Pavel

On Fri 2006-11-10 09:10:06, Thomas Gleixner wrote:
> On Fri, 2006-11-10 at 06:10 +0100, Andi Kleen wrote:
> > > > verify_tsc_freq_timer.function = verify_tsc_freq;
> > > > verify_tsc_freq_timer.expires =
> > >
> > >
> > > Hmmm. I wish this patch was unnecessary, but I don't see an easy
> > > solution.
> >
> > Very sad. This will make a lot of people unhappy, even to the point
> > where they might prefer disabling noidlehz over super slow gettimeofday.
> > I assume you at least have a suitable command line option for that, right?
>
> Yes it is sad. And the sadest part is that AMD and Intel have been asked
> to fix that more than 5 years ago. They did not get their brain straight
> and now we are the dimwits.
>
> > Can we get a summary on which systems the TSC is considered unstable?
> > Normally we assume if it's stable enough for gettimeofday it should
> > be stable enough for longer delays too.
>
> TSC is simply a nightmare:
>
> - Frequency changes with CPU clock
> - Unsynced across CPUs
> - Stops in C3, which makes it completely unusable
>
> Once you take away periodic interrupts it is simply broken. AMD and
> Intel can run in circels, it does not get better.
>
> tglx

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-11-10 11:14:21

by Ingo Molnar

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers


* Arjan van de Ven <[email protected]> wrote:

> > Most systems don't have C3 right now. And on those that have
> > (laptops) it tends to be not that critical because they normally
> > don't run workload where gettimeofday() is really time critical (and
> > nobody expects them to be particularly fast anyways)
>
> and that got changed when the blade people decided to start using
> laptop processors ......

and some systems disable the lapic in C2 already: BIOSs started doing
lowlevel-C3 in their C2 functionality and lie to the OS about it.

Ingo

2006-11-10 11:20:11

by Ingo Molnar

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers


* Alan Cox <[email protected]> wrote:

> On Fri, 2006-11-10 at 09:57 +0100, Ingo Molnar wrote:
> > AFAIK Windows doesnt use it, so it's a continuous minefield for new
> > hardware to break.
>
> Windows uses it extensively especially games. The AMD desync upset a
> lot of Windows gamers.

well, i meant the Windows kernel itself, not applications. (maybe the
Windows kernel uses it on SMP systems where the TSC /used to be/ pretty
stable, i dont know)

> > We should wait until CPU makers get their act together and implement
> > a TSC variant that is /architecturally promised/ to have constant
> > frequency (system bus frequency or whatever) and which never stops.
>
> This will never happen for the really big boxes, light is just too
> slow... [...]

that's not a problem - time goes as fast as light [by definition] :-)

> If hrtimer needs and requires we stop TSC support [...]

no, it doesnt, so there's no real friction here. We just observed that
in the past 10 years no generally working TSC-based gettimeofday was
written (and i wrote the first version of it for the Pentium, so the
blame is on me too), and that we might be better off without it. If
someone can pull off a working TSC-based gettimeofday() implementation
then there's no objection from us.

Ingo

2006-11-10 11:28:33

by Andi Kleen

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

On Friday 10 November 2006 11:55, Arjan van de Ven wrote:

> >
> > Most systems don't have C3 right now. And on those that have
> > (laptops) it tends to be not that critical because they normally
> > don't run workload where gettimeofday() is really time critical
> > (and nobody expects them to be particularly fast anyways)
>
> and that got changed when the blade people decided to start using laptop
> processors ......

Well those will be handled eventually. Currently they just have
a slower gettimeofday.

But the majority of systems is not impacted.

BTW if someone really wants to have fast gettimeofday on a blade
they can just disable C3 and force TSC.

-Andi

2006-11-10 11:49:49

by Ingo Molnar

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers


* Pavel Machek <[email protected]> wrote:

> > > If so, could that function use the PIT/pmtimer/etc for working out
> > > if the TSC is bust, rather than directly using jiffies?
> >
> > there's no realiable way to figure out the TSC is bust: some CPUs
> > have a slight 'skew' between cores for example. On some systems the
> > TSC might skew between sockets. A CPU might break its TSC only once
> > some
>
> But we could still do a whitelist?

we could, but it would have to be almost empty right now :-) Reason:
even on systems that have (hardware-initialized) 'perfect' TSCs and
which do not support any frequency scaling or power-saving mode, our
current TSC initialization on SMP systems introduces a small (1-2 usecs)
skew.

but even that limited set of systems is now mostly obsolete: no
multi-core CPU based system i'm aware of would qualify. I have written
user-space testcode for TSC and gettimeofday warps, see:

http://redhat.com/~mingo/time-warp-test/time-warp-test.c

no SMP system i have passes at the moment, running 2.6.17/18:

--------------------------------------
jupiter:~> ./time-warp-test
4 CPUs, running 4 parallel test-tasks.
checking for time-warps via:
- read time stamp counter (RDTSC) instruction (cycle resolution)
- gettimeofday (TOD) syscall (usec resolution)

[...]
new TSC-warp maximum: -6392 cycles, 0000294e1f3b6100 -> 0000294e1f3b4808
| # of TSC-warps:183606 |

--------------------------------------
venus:~> ./time-warp-test
4 CPUs, running 4 parallel test-tasks.
[...]
new TSC-warp maximum: -1328 cycles, 00001d9549c6c738 -> 00001d9549c6c208
| # of TSC-warps:332510 |

--------------------------------------
neptune:~> ./time-warp-test
2 CPUs, running 2 parallel test-tasks.
[...]
new TSC-warp maximum: -332 cycles, 0000005e00b1b89e -> 0000005e00b1b752
| # of TSC-warps:340 |

[and i'm too lazy to turn on the 8-way now, but that has TSC warps too.]

so i'd love to see non-warping time, but after 10 years of trying i'm
not holding my breath.

Ingo
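
[Editor's illustration.] The core of such a warp check can be sketched as below. This is not the code of time-warp-test.c, which I have not reproduced here; it only shows the bookkeeping, and it assumes the samples have already been collected in the order in which they were globally observed (the part that needs cross-CPU synchronization in a real test). `struct warp_stats` and `count_warps()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct warp_stats {
	uint64_t count; /* number of backward steps seen */
	int64_t worst;  /* most negative step in cycles, 0 if none */
};

/* Scan consecutive samples and record every step that goes backwards. */
static struct warp_stats count_warps(const uint64_t *samples, size_t n)
{
	struct warp_stats st = { 0, 0 };
	size_t i;

	for (i = 1; i < n; i++) {
		int64_t delta = (int64_t)(samples[i] - samples[i - 1]);

		if (delta < 0) {
			st.count++;
			if (delta < st.worst)
				st.worst = delta;
		}
	}
	return st;
}
```

Fed the pair from the first run above, 0x0000294e1f3b6100 followed by 0x0000294e1f3b4808, it reports one warp of -6392 cycles, matching the quoted output.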

2006-11-10 11:56:52

by Andi Kleen

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers


> we could, but it would have to be almost empty right now :-) Reason:
> even on systems that have (hardware-initialized) 'perfect' TSCs and
> which do not support any frequency scaling or power-saving mode, our
> current TSC initialization on SMP systems introduces a small (1-2 usecs)
> skew.

On Intel we don't sync the TSC anymore and on most systems users seem
to be happy at least. And on multicore AMD it is drifting anyways and
usually turned off.

-Andi

2006-11-10 12:00:52

by Pavel Machek

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

Hi!

> > > > If so, could that function use the PIT/pmtimer/etc for working out
> > > > if the TSC is bust, rather than directly using jiffies?
> > >
> > > there's no realiable way to figure out the TSC is bust: some CPUs
> > > have a slight 'skew' between cores for example. On some systems the
> > > TSC might skew between sockets. A CPU might break its TSC only once
> > > some
> >
> > But we could still do a whitelist?
>
> we could, but it would have to be almost empty right now :-) Reason:

Well, if it would contain at least 50% of the UP machines... that
would be reasonably long list for a start.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-11-10 13:15:35

by Ingo Molnar

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers


* Pavel Machek <[email protected]> wrote:

> > we could, but it would have to be almost empty right now :-) Reason:
>
> Well, if it would contain at least 50% of the UP machines... that
> would be reasonably long list for a start.

which 50%? Does it include those where the TSC slows down due to a
thermal event SMM?

Ingo

2006-11-10 13:14:07

by Ingo Molnar

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers


* Andi Kleen <[email protected]> wrote:

> > we could, but it would have to be almost empty right now :-) Reason:
> > even on systems that have (hardware-initialized) 'perfect' TSCs and
> > which do not support any frequency scaling or power-saving mode, our
> > current TSC initialization on SMP systems introduces a small (1-2 usecs)
> > skew.
>
> On Intel we don't sync the TSC anymore [...]

yeah, after i reported this a few months ago ;-)

Ingo

2006-11-10 15:45:13

by Chris Friesen

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

Alan Cox wrote:
> On Fri, 2006-11-10 at 09:57 +0100, Ingo Molnar wrote:

>>We should wait until CPU makers get their act together and implement a
>>TSC variant that is /architecturally promised/ to have constant
>>frequency (system bus frequency or whatever) and which never stops.
>
> This will never happen for the really big boxes, light is just too
> slow... Our current TSC handling is not perfect but the TSC is often
> quite usable.

This hypothetical clock wouldn't have to run full speed, would it? You
could have a 1MHz clock distributed across even a large system fairly
easily.

Wouldn't that be good enough?

Chris

2006-11-11 11:12:17

by Thomas Gleixner

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

On Fri, 2006-11-10 at 10:29 +0100, Andi Kleen wrote:
> > But that's different.
> >
> > We're limping along in a semi-OK fashion with the TSC. But now Thomas is
> > proposing that we effectively kill it off for all x86 because of hrtimers.
>
> I'm totally against that.

I'm working on that. The general disable is indeed overkill. All I need
to prevent is the switch to highres/dyntick when no fallback (e.g.
pm_timer) is available. Otherwise I end up in a circular dependency, as
the emulated tick depends on the monotonic clock.
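
A minimal sketch of the rule described above, under stated assumptions: struct clock_src and highres_switch_allowed() are hypothetical names invented for illustration, not the kernel's clocksource API and not code from this patch series. The idea shown is only that the switch to highres/dyntick is permitted when some continuous clocksource other than the TSC is registered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified clocksource descriptor (not the kernel struct). */
struct clock_src {
	const char *name;
	bool is_continuous;	/* usable as a free-running time reference */
};

/*
 * Allow the switch to highres/dyntick mode only if a continuous
 * clocksource other than the TSC is present, so the emulated tick
 * never has to use the TSC as its reference.
 */
bool highres_switch_allowed(const struct clock_src *cs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(cs[i].name, "tsc") != 0 && cs[i].is_continuous)
			return true;
	}
	return false;
}
```

With only a "tsc" entry the function returns false. Adding a continuous fallback entry (named "acpi_pm" here purely as an example) makes it return true.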

> > And afaict the reason for that is that we're using jiffies to determine if
> > the TSC has gone bad, and that test is getting false positives.
>
> The i386 clocksource has always had trouble with that. E.g. I have a box
> where the TSC works perfectly fine on a 64bit kernel, but since the new i386
> clocksource code went in, it always insists on disabling it shortly after boot.
> My guess is that some of the checks in there are just broken and need
> to be fixed.

It's the unconditional mark_unstable call in ACPI C2 state. /me looks.

tglx


2006-11-11 13:59:11

by Andi Kleen

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

On Saturday 11 November 2006 14:58, Thomas Gleixner wrote:

> >
> > > > My guess is that some of the checks in there are just broken and need
> > > > to be fixed.
> > >
> > > It's the unconditional mark_unstable call in ACPI C2 state. /me looks.
> >
> > The system doesn't support C2 states. It's an older single socket Athlon 64
> > with VIA chipset. I haven't looked in detail on why it fails.
>
> Does it have CPU frequency changing?

Yep. But only the OS-controlled kind (powernow).

Most likely it happens when ondemand starts doing its thing.

-Andi

2006-11-11 13:55:45

by Thomas Gleixner

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

On Sat, 2006-11-11 at 14:51 +0100, Andi Kleen wrote:
> > > > And afaict the reason for that is that we're using jiffies to determine if
> > > > the TSC has gone bad, and that test is getting false positives.
> > >
> > > The i386 clocksource has always had trouble with that. E.g. I have a box
> > > where the TSC works perfectly fine on a 64bit kernel, but since the new i386
> > > clocksource code went in, it always insists on disabling it shortly after boot.
>
> shortly after boot means in user space here, not during the first idling.
>
> > > My guess is that some of the checks in there are just broken and need
> > > to be fixed.
> >
> > It's the unconditional mark_unstable call in ACPI C2 state. /me looks.
>
> The system doesn't support C2 states. It's an older single socket Athlon 64
> with VIA chipset. I haven't looked in detail on why it fails.

Does it have CPU frequency changing?

tglx


2006-11-11 13:51:55

by Andi Kleen

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers


> > > And afaict the reason for that is that we're using jiffies to determine if
> > > the TSC has gone bad, and that test is getting false positives.
> >
> > The i386 clocksource has always had trouble with that. E.g. I have a box
> > where the TSC works perfectly fine on a 64bit kernel, but since the new i386
> > clocksource code went in, it always insists on disabling it shortly after boot.

shortly after boot means in user space here, not during the first idling.

> > My guess is that some of the checks in there are just broken and need
> > to be fixed.
>
> It's the unconditional mark_unstable call in ACPI C2 state. /me looks.

The system doesn't support C2 states. It's an older single socket Athlon 64
with VIA chipset. I haven't looked in detail on why it fails.

-Andi

2006-11-11 14:06:28

by Thomas Gleixner

Subject: Re: [patch 13/19] GTOD: Mark TSC unusable for highres timers

On Sat, 2006-11-11 at 14:59 +0100, Andi Kleen wrote:
> On Saturday 11 November 2006 14:58, Thomas Gleixner wrote:
>
> > >
> > > > > My guess is that some of the checks in there are just broken and need
> > > > > to be fixed.
> > > >
> > > > It's the unconditional mark_unstable call in ACPI C2 state. /me looks.
> > >
> > > The system doesn't support C2 states. It's an older single socket Athlon 64
> > > with VIA chipset. I haven't looked in detail on why it fails.
> >
> > Does it have CPU frequency changing?
>
> Yep. But only the OS-controlled kind (powernow).
>
> Most likely it happens when ondemand starts doing its thing.

Yes, that's one of the criteria the tsc clocksource is using. I'm
looking into that right now.
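
Why a frequency change matters can be sketched with a little arithmetic. This is an illustration under assumptions, not the tsc.c code: the helper names cycles_to_ns() and cycles_elapsed() are hypothetical, and the sketch assumes a TSC that counts at the current core frequency while the conversion keeps using the frequency calibrated at boot.

```c
#include <stdint.h>

/* Convert a TSC cycle count to nanoseconds using the calibrated kHz value. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t calibrated_khz)
{
	return cycles * 1000000ULL / calibrated_khz;
}

/* Cycles counted over real_ns nanoseconds at the frequency actually running. */
uint64_t cycles_elapsed(uint64_t real_ns, uint64_t actual_khz)
{
	return real_ns * actual_khz / 1000000ULL;
}
```

For example, with a calibration of 2000000 kHz and the core scaled down to 1000000 kHz, one real second produces 1000000000 cycles, which cycles_to_ns() reports as only 500000000 ns. Unless the conversion is updated on every frequency change, the clock runs slow by the scaling factor.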

tglx