2004-11-16 00:23:05

by dean gaudet

Subject: [patch] prefer TSC over PM Timer

i've heard other folks have independently run into this problem -- in fact
i see the most recent fc2 kernels already do this. i'd like this to be
accepted into the main kernel though.

the x86 PM Timer is an order of magnitude slower than the TSC for
gettimeofday calls. i'm seeing 8%+ of the time spent doing gettimeofday
in some workloads... and apparently kernel.org was seeing 80% of its time
go to gettimeofday during the fc3-release overload. PM timer is also less
accurate than TSC.

i can see a vague argument around cpufreq / tsc troubles, but i'm having a
hell of a time getting a centrino box to show any TSC troubles even while
i induce workloads that cause cpufreq to bounce the frequency around.
maybe someone could give an example of it failing...

note: when timer_tsc discovers inaccuracy after boot it falls back to
timer_pit ... timer_pit is twice as expensive as timer_pm, and it'd be
cool if timer_tsc could fall back to timer_pm... but by that point in time
all the __init stuff is gone, so i can't see how to init timer_pm. this
would be a more ideal solution.

thanks
-dean

Signed-off-by: dean gaudet <[email protected]>

--- linux-2.6.10-rc2/arch/i386/kernel/timers/timer.c.orig 2004-11-15 23:28:30.000000000 -0800
+++ linux-2.6.10-rc2/arch/i386/kernel/timers/timer.c 2004-11-15 23:29:07.000000000 -0800
@@ -19,10 +19,10 @@
 #ifdef CONFIG_HPET_TIMER
 	&timer_hpet_init,
 #endif
+	&timer_tsc_init,
 #ifdef CONFIG_X86_PM_TIMER
 	&timer_pmtmr_init,
 #endif
-	&timer_tsc_init,
 	&timer_pit_init,
 	NULL,
 };
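
To make the effect of the reordering concrete, here is a minimal sketch. It assumes, as the patch implies, that the list is walked in order and the first timer whose init succeeds is used. Every name in it (struct timer_probe, pick_timer, the probe_* stubs) is a hypothetical stand-in, not the kernel's actual code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a timer init entry: a name plus an init
 * function that returns 0 on success and nonzero if unusable. */
struct timer_probe {
	const char *name;
	int (*init)(void);
};

static int probe_ok(void)      { return 0; }
static int probe_missing(void) { return -1; }

/* Walk a NULL-terminated list in priority order and return the first
 * entry whose init succeeds, or NULL if none does. */
static const struct timer_probe *pick_timer(const struct timer_probe *const *list)
{
	for (size_t i = 0; list[i] != NULL; i++) {
		if (list[i]->init() == 0)
			return list[i];
	}
	return NULL;
}

/* Assumed example machine: no HPET, everything else present. */
static const struct timer_probe hpet  = { "hpet",  probe_missing };
static const struct timer_probe tsc   = { "tsc",   probe_ok };
static const struct timer_probe pmtmr = { "pmtmr", probe_ok };
static const struct timer_probe pit   = { "pit",   probe_ok };

/* Order before the patch: pmtmr is tried ahead of tsc. */
static const struct timer_probe *const order_before[] =
	{ &hpet, &pmtmr, &tsc, &pit, NULL };

/* Order after the patch: tsc is tried ahead of pmtmr. */
static const struct timer_probe *const order_after[] =
	{ &hpet, &tsc, &pmtmr, &pit, NULL };
```

On this assumed machine, pick_timer(order_before) selects "pmtmr" and pick_timer(order_after) selects "tsc", which is why the position of a single line in the array decides the default.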


2004-11-16 01:38:19

by john stultz

Subject: Re: [patch] prefer TSC over PM Timer

On Mon, 2004-11-15 at 16:23, dean gaudet wrote:
> i've heard other folks have independently run into this problem -- in fact
> i see the most recent fc2 kernels already do this. i'd like this to be
> accepted into the main kernel though.
>
> the x86 PM Timer is an order of magnitude slower than the TSC for
> gettimeofday calls. i'm seeing 8%+ of the time spent doing gettimeofday
> in some workloads... and apparently kernel.org was seeing 80% of its time
> go to gettimeofday during the fc3-release overload. PM timer is also less
> accurate than TSC.
>
> i can see a vague argument around cpufreq / tsc troubles, but i'm having a
> hell of a time getting a centrino box to show any TSC troubles even while
> i induce workloads that cause cpufreq to bounce the frequency around.
> maybe someone could give an example of it failing...

I understand your frustration.

While there are a great number of systems that can use the TSC, cpufreq
scaling laptops, and a number of SMP and NUMA systems cannot use it as a
time source. Additionally it's difficult to detect when it's wrong as
there are a reasonable number of systems that frequently miss timer
ticks. Although it is much slower, ACPI PM is just more reliable across
the broad spectrum of systems.

With your patch, ACPI PM would never be selected (as TSC always wins
when available, and it will be available on all ACPI enabled i386
systems). So it's just the same as disabling CONFIG_X86_PM_TIMER, so why
not just do that?

Do note, using the "clock=tsc" boot option, you can easily force the
system to use the TSC.

> note: when timer_tsc discovers inaccuracy after boot it falls back to
> timer_pit ... timer_pit is twice as expensive as timer_pm, and it'd be
> cool if timer_tsc could fall back to timer_pm... but by that point in time
> all the __init stuff is gone, so i can't see how to init timer_pm. this
> would be a more ideal solution.

Well, the lost-ticks/pit fallback code isn't all that robust. We have
two unreliable time sources where we try to sort out which one is wrong
by using the other. I worry we'd have to implement something like NTP in
the kernel in order to correctly choose the best working time source.

I would however, support a patch that selected the TSC over the ACPI PM
time source when CONFIG_CPUFREQ and CONFIG_SMP were N. That's fairly
safe.

thanks
-john

2004-11-16 03:21:34

by dean gaudet

Subject: Re: [patch] prefer TSC over PM Timer

On Mon, 15 Nov 2004, john stultz wrote:

> I understand your frustration.
>
> While there are a great number of systems that can use the TSC, cpufreq
> scaling laptops, and a number of SMP and NUMA systems cannot use it as a
> time source. Additionally it's difficult to detect when it's wrong as
> there are a reasonable number of systems that frequently miss timer
> ticks. Although it is much slower, ACPI PM is just more reliable across
> the broad spectrum of systems.

i'm having a difficult time getting a centrino w/cpufreq to do anything
bad with tsc while i'm imposing loads which cause the frequency to flutter
around (i've got ondemand governor going). maybe i need to do something
more detailed like have ntp running against a solid time source while i do
all these and let it run for longer to look for drift. suggestions
welcome.

> With your patch, ACPI PM would never be selected (as TSC always wins
> when available, and it will be available on all ACPI enabled i386
> systems). So its just the same as disabling CONFIG_X86_PM_TIMER, so why
> not just do that?

my patch lets you use "clock=pmtmr" if you want it.

> Do note, using the "clock=tsc" boot option, you can easily force the
> system to use the TSC.

right -- except i think the default is the opposite of what it should be
for a generic kernel. i think more systems are served better by using tsc
than those that need clock=pm... NUMA systems are rare (with custom
kernels/etc), and if my experience with the centrino is valid then newer
laptops aren't having this tsc/cpufreq problem.


> > note: when timer_tsc discovers inaccuracy after boot it falls back to
> > timer_pit ... timer_pit is twice as expensive as timer_pm, and it'd be
> > cool if timer_tsc could fall back to timer_pm... but by that point in time
> > all the __init stuff is gone, so i can't see how to init timer_pm. this
> > would be a more ideal solution.
>
> Well, the lost-ticks/pit fallback code isn't all that robust. We have
> two unreliable time sources where we try to sort out which one is wrong
> by using the other. I worry we'd have to implement something like NTP in
> the kernel in order to correctly choose the best working time source.

yeah that does sound unfortunate... it's almost like we should initialize
timer_pm whenever it is there so it can be used for these calibration
purposes.


> I would however, support a patch that selected the TSC over the ACPI PM
> time source when CONFIG_CPUFREQ and CONFIG_SMP were N. That's fairly
> safe.

i'm looking for a solution that generic distribution kernels can use...

honestly my selfish motivation is to get efficeon/crusoe treated properly
-- they support a fixed TSC rate which does not vary with frequency (which
many people fault us for, but the reality is that fixed TSC is the only
viable solution for a processor which can vary power consumption without
the involvement of the kernel). i'd advocate a patch like the one
below... but it feels wrong.

i suppose one way to solve all this is to punt the whole thing to userland
and let someone write a tool which either uses a database or runs code
to figure out which timer is best and sticks that into grub/lilo/whatever.

-dean


Signed-off-by: dean gaudet <[email protected]>

--- linux-2.6.10-rc2/arch/i386/kernel/timers/timer_pm.c.orig 2004-11-15 23:28:30.000000000 -0800
+++ linux-2.6.10-rc2/arch/i386/kernel/timers/timer_pm.c 2004-11-16 03:05:52.000000000 -0800
@@ -107,6 +107,13 @@
 	if (!cpu_has_tsc)
 		return -ENODEV;
 
+	/*
+	 * Transmeta CPUs have a fixed rate TSC, so prefer tsc
+	 * unless the user specifically requests pmtmr.
+	 */
+	if (!override[0] && boot_cpu_data.x86_vendor == X86_VENDOR_TRANSMETA)
+		return -ENODEV;
+
 	/* "verify" this timing source */
 	value1 = read_pmtmr();
 	for (i = 0; i < 10000; i++) {

2004-11-16 08:12:15

by Arjan van de Ven

Subject: Re: [patch] prefer TSC over PM Timer


> While there are a great number of systems that can use the TSC, cpufreq
> scaling laptops, and a number of SMP and NUMA systems cannot use it as a
> time source.

please don't drag cpufreq into this; cpufreq adjusts this timer on
frequency changes just fine.



2004-11-16 09:36:46

by john stultz

Subject: Re: [patch] prefer TSC over PM Timer

On Tue, 2004-11-16 at 09:11 +0100, Arjan van de Ven wrote:
> > While there are a great number of systems that can use the TSC, cpufreq
> > scaling laptops, and a number of SMP and NUMA systems cannot use it as a
> > time source.
>
> please don't drag cpufreq into this; cpufreq adjusts this timer on
> frequency changes just fine.

Fair enough. Dominik and others have done some great work there and I
shouldn't cast doubt on it. I just haven't played with it enough to
really get confident that there really aren't any holes with it.

That said, not all laptops properly notify the kernel when they change
frequency. The BIOS just changes it on its own. My old one had this
problem and the pmtmr helped quite a bit there. But maybe these cases
are just rare enough that it's not an issue.

thanks
-john

2004-11-16 09:53:18

by john stultz

Subject: Re: [patch] prefer TSC over PM Timer

On Mon, 2004-11-15 at 19:21 -0800, dean gaudet wrote:
> On Mon, 15 Nov 2004, john stultz wrote:
> > With your patch, ACPI PM would never be selected (as TSC always wins
> > when available, and it will be available on all ACPI enabled i386
> > systems). So its just the same as disabling CONFIG_X86_PM_TIMER, so why
> > not just do that?
>
> my patch lets you use "clock=pmtmr" if you want it.

Yea, but at that point you have to enable it in the config and then pass
a boot parameter to use it. I dunno. If you want to go with that you
should def include a comment in the pmtmr code as well as in the config
help.

> > Do note, using the "clock=tsc" boot option, you can easily force the
> > system to use the TSC.
>
> right -- except i think the default is the opposite of what it should be
> for a generic kernel. i think more systems are served better by using tsc
> than those that need clock=pm... NUMA systems are rare (with custom
> kernels/etc), and if my experience with the centrino is valid then newer
> laptops aren't having this tsc/cpufreq problem.
>
> > I would however, support a patch that selected the TSC over the ACPI PM
> > time source when CONFIG_CPUFREQ and CONFIG_SMP were N. That's fairly
> > safe.
>
> i'm looking for a solution that generic distribution kernels can use...
>
> honestly my selfish motivation is to get efficeon/crusoe treated properly
> -- they support a fixed TSC rate which does not vary with frequency (which
> many people fault us for, but the reality is that fixed TSC is the only
> viable solution for a processor which can vary power consumption without
> the involvement of the kernel).

Yea, I just wish we could get away from the TSC and have a well defined
and hardware guaranteed timebase register like PPC.

> i'd advocate a patch like the one
> below... but it feels wrong.

Yea, no, I definitely don't like that. I know how these tricks work,
send out a worse patch to make the first look better ;) But alas, you've
worn me down! Add the comments I mentioned above and I'd go along with
it.

Dominik: are you cool with this?

thanks
-john

2004-11-16 18:28:30

by Pallipadi, Venkatesh

Subject: RE: [patch] prefer TSC over PM Timer

>-----Original Message-----
>From: [email protected]
>[mailto:[email protected]] On Behalf Of john stultz
>Sent: Monday, November 15, 2004 5:38 PM
>To: dean gaudet
>Cc: lkml
>Subject: Re: [patch] prefer TSC over PM Timer
>
>On Mon, 2004-11-15 at 16:23, dean gaudet wrote:
>> i've heard other folks have independently run into this
>problem -- in fact
>> i see the most recent fc2 kernels already do this. i'd like
>this to be
>> accepted into the main kernel though.
>>
>> the x86 PM Timer is an order of magnitude slower than the TSC for
>> gettimeofday calls. i'm seeing 8%+ of the time spent doing
>gettimeofday
>> in some workloads... and apparently kernel.org was seeing 80%
>of its time
>> go to gettimeofday during the fc3-release overload. PM
>timer is also less
>> accurate than TSC.
>>

I think trying to remove repeated inl()'s in read_pmtmr is a better
fix for this issue. As John mentioned in other thread, we should do
repeated reads only when something looks broken. Not always.
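
A minimal sketch of the two read strategies being compared. It assumes the workaround is a triple read that retries while the three samples look out of order, and that the counter is masked to 24 bits. The port access is simulated: fake_inl(), fake_counter and fake_reads are hypothetical stand-ins for inl(pmtmr_ioport), and the function names are illustrative only.

```c
#include <stdint.h>

#define PM_MASK 0xFFFFFFu  /* assumed: counter treated as 24 bits wide */

/* Hypothetical stand-in for the I/O port: a counter that advances on
 * every read, plus a tally of how many port reads were issued. */
static uint32_t fake_counter;
static unsigned fake_reads;

static uint32_t fake_inl(void)
{
	fake_reads++;
	return fake_counter++;
}

/* Cheap path: trust a single port read. */
static uint32_t read_pm_once(void)
{
	return fake_inl() & PM_MASK;
}

/* Careful path: take three samples and retry while one of them falls
 * out of order relative to the other two, then return the middle one.
 * Each loop iteration costs three slow port reads. */
static uint32_t read_pm_checked(void)
{
	uint32_t v1, v2, v3;

	do {
		v1 = fake_inl();
		v2 = fake_inl();
		v3 = fake_inl();
	} while ((v1 > v2 && v1 < v3) || (v2 > v3 && v2 < v1) ||
		 (v3 > v1 && v3 < v2));

	return v2 & PM_MASK;
}
```

The cost difference is visible in fake_reads: one port access for the cheap path against at least three for the careful path. The suggestion above amounts to taking the careful path only on hardware known or observed to misbehave.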

The TSC counter stops counting when the CPU is in a deep sleep state. It
should be OK to use tsc with Centrinos which support Enhanced Speedstep
Technology. But it will have issues with older systems that support the
older Speedstep. So, I would say using pm_timer as default is better
as that works correctly on most of the systems.

Thanks,
Venki

2004-11-16 20:33:06

by Dominik Brodowski

Subject: Re: [patch] prefer TSC over PM Timer

On Tue, Nov 16, 2004 at 01:50:44AM -0800, john stultz wrote:
> >
> > right -- except i think the default is the opposite of what it should be
> > for a generic kernel. i think more systems are served better by using tsc
> > than those that need clock=pm... NUMA systems are rare (with custom
> > kernels/etc), and if my experience with the centrino is valid then newer
> > laptops aren't having this tsc/cpufreq problem.

Oh yes, they do -- as Venkatesh pointed out, the TSC stops if the CPU is in
the "deep sleep" power state. And better support for deeper sleep states is
in the works...

Also, the cpufreq code currently can only update the timing code with an
inaccuracy of up to one jiffy. If transitions happen in between two timer
ticks, timing becomes inaccurate by -0.5<x<0.5 jiffy. So, if you're
transitioning back and forth a lot, it becomes quite inaccurate over time.
It's the best we can do, and with john's new timer core, we'll be able to
reduce this issue to zero.
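
As a rough illustration of how that adds up, the following sketch computes a worst-case bound, assuming every transition contributes the full half-jiffy error in the same direction. The function name and the tick rate passed in are assumptions for the example, not values from the cpufreq code.

```c
/* Worst-case accumulated timing error in microseconds.
 * transitions: number of frequency transitions
 * hz:          timer tick rate in ticks per second (assumed parameter)
 * Each transition is charged the full 0.5 jiffy bound. */
static long worst_case_error_us(long transitions, long hz)
{
	long half_jiffy_us = 1000000L / hz / 2;

	return transitions * half_jiffy_us;
}
```

At an assumed tick rate of 1000 Hz, 1000 transitions give a bound of 500000 microseconds, i.e. half a second. Because the per-transition error is signed, the typical accumulated error would be smaller than this bound.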

In addition, notebooks won't be changing their CPU's frequency behind their
kernel's back in future as often -- a call to disable this BIOS interference
was added into 2.6.10-rc2.

> Yea, no, I definitely don't like that. I know how these tricks work,
> send out a worse patch to make the first look better ;) But alas, you've
> worn me down! Add the comments I mentioned above and I'd go along with
> it.
>
> Dominik: are you cool with this?

I agree with handling TMTA specially, as it uses such a different approach
to CPU frequency scaling _and_ gets TSC right. Therefore, ACK.

Thanks,
Dominik

2004-11-16 21:06:43

by john stultz

Subject: Re: [patch] prefer TSC over PM Timer

On Tue, 2004-11-16 at 12:29, Dominik Brodowski wrote:
> On Tue, Nov 16, 2004 at 01:50:44AM -0800, john stultz wrote:
> > Dominik: are you cool with this?
>
> I agree with handling TMTA specially, as it uses such a different approach
> to CPU frequency scaling _and_ gets TSC right. Therefore, ACK.

Dean: Ok, I'll defer to Dominik then. He's the expert on this.

thanks
-john

2004-11-17 02:12:08

by dean gaudet

Subject: RE: [patch] prefer TSC over PM Timer

On Tue, 16 Nov 2004, Pallipadi, Venkatesh wrote:

> I think trying to remove repeated inl()'s in read_pmtmr is a better
> fix for this issue. As John mentioned in other thread, we should do
> repeated reads only when something looks broken. Not always.

that would be a nice improvement... then timer_pm will only be 3x as slow
as timer_tsc instead of 10x slower :) it's still a lot of unnecessary
overhead for many systems, and unfortunately this is a real performance
problem (albeit exaggerated by code which is overzealous in its use of
gettimeofday()).

on a tangent... has the local apic timer ever been considered? it's fixed
rate, and my measurements show it in the same performance ballpark as TSC.

i know that all p3, p-m, p4, k8 and efficeon have local APIC, but i'm not
sure if k7 (other than k7 smp parts of course) have local apics... so i'm
not sure how widespread it is compared to pm-timer.

wouldn't local apic timer be a lot better for NUMA too?

hey wait, what exactly is the problem with TSC on NUMA? don't you just
need some per-cpu data (epoch and calibration) to make it work?

-dean

2004-11-17 10:44:40

by Mikael Pettersson

Subject: RE: [patch] prefer TSC over PM Timer

dean gaudet writes:
> On Tue, 16 Nov 2004, Pallipadi, Venkatesh wrote:
>
> > I think trying to remove repeated inl()'s in read_pmtmr is a better
> > fix for this issue. As John mentioned in other thread, we should do
> > repeated reads only when something looks broken. Not always.
>
> that would be a nice improvement... then timer_pm will only be 3x as slow
> as timer_tsc instead of 10x slower :) it's still a lot of unnecessary
> overhead for many systems, and unfortunately this is a real performance
> problem (albeit exaggerated by code which is overzealous in its use of
> gettimeofday()).
>
> on a tangent... has the local apic timer ever been considered? it's fixed
> rate, and my measurements show it in the same performance ballpark as TSC.
>
> i know that all p3, p-m, p4, k8 and efficeon have local APIC, but i'm not
> sure if k7 (other than k7 smp parts of course) have local apics... so i'm
> not sure how widespread it is compared to pm-timer.

All K7/K8s except the very first K7 Model 1 have local APICs.
There is no difference between UP and MP parts in this respect.

/Mikael

2004-11-17 14:19:56

by Dmitry Torokhov

Subject: Re: [patch] prefer TSC over PM Timer

On Tue, 16 Nov 2004 17:50:42 -0800 (PST), dean gaudet
<[email protected]> wrote:
> on a tangent... has the local apic timer ever been considered? it's fixed
> rate, and my measurements show it in the same performance ballpark as TSC.
>

At least Dell laptops will die a horrible death if you enable lapic,
probably others.

--
Dmitry

2004-11-17 15:08:45

by Pallipadi, Venkatesh

Subject: RE: [patch] prefer TSC over PM Timer

>-----Original Message-----
>From: Dmitry Torokhov [mailto:[email protected]]
>Sent: Wednesday, November 17, 2004 6:20 AM
>To: dean gaudet
>Cc: Pallipadi, Venkatesh; john stultz; lkml
>Subject: Re: [patch] prefer TSC over PM Timer
>
>On Tue, 16 Nov 2004 17:50:42 -0800 (PST), dean gaudet
><[email protected]> wrote:
>> on a tangent... has the local apic timer ever been
>considered? it's fixed
>> rate, and my measurements show it in the same performance
>ballpark as TSC.
>>
>
>At least Dell laptops will die a horrible death if you enable lapic,
>probably others.
>

Hmm... And local APIC timer comes with its own set of problems
http://bugme.osdl.org/show_bug.cgi?id=2560

While in C3, we don't get local APIC timer interrupts.

Thanks,
Venki

2004-11-17 16:28:14

by Alan

Subject: Re: [patch] prefer TSC over PM Timer

On Tue, 2004-11-16 at 00:23, dean gaudet wrote:
> i've heard other folks have independently run into this problem -- in fact
> i see the most recent fc2 kernels already do this. i'd like this to be
> accepted into the main kernel though.

IMHO it was a mistake to make this change in FC2.

> the x86 PM Timer is an order of magnitude slower than the TSC for
> gettimeofday calls. i'm seeing 8%+ of the time spent doing gettimeofday
> in some workloads... and apparently kernel.org was seeing 80% of its time
> go to gettimeofday during the fc3-release overload. PM timer is also less
> accurate than TSC.

Nobody guarantees that the TSC is clocked at the same rate per CPU and
several power management schemes break it. I see it break on my Thinkpad
600 and it's one reason I have to replace the FC kernel with a 2.6-ac
kernel on that system.

Is gettimeofday supposed to return the right value or be fast ?

2004-11-17 16:32:30

by Alan

Subject: Re: [patch] prefer TSC over PM Timer

On Tue, 2004-11-16 at 08:11, Arjan van de Ven wrote:
> > While there are a great number of systems that can use the TSC, cpufreq
> > scaling laptops, and a number of SMP and NUMA systems cannot use it as a
> > time source.
>
> please don't drag cpufreq into this; cpufreq adjusts this timer on
> frequency changes just fine.

Not on multiprocessor systems

2004-11-17 17:29:18

by Chris Friesen

Subject: Re: [patch] prefer TSC over PM Timer

Alan Cox wrote:

> Is gettimeofday supposed to return the right value or be fast ?

C. All of the above. :)

Chris

2004-11-17 16:40:16

by Alan

Subject: RE: [patch] prefer TSC over PM Timer

On Wed, 2004-11-17 at 01:50, dean gaudet wrote:
> on a tangent... has the local apic timer ever been considered? it's fixed
> rate, and my measurements show it in the same performance ballpark as TSC.
>

It would certainly work for the SMP cases which are most of the "hard"
cases where TSC breaks. This seems to be a good path to me and although
C3 would fail as has been pointed out a C3 resume is sufficiently
expensive that fixing up the tsc offset on the resume from PMTMR isn't
going to kill anyone.

> hey wait, what exactly is the problem with TSC on NUMA? don't you just
> need some per-cpu data (epoch and calibration) to make it work?

You have unrelated clocks that drift over time. You can't just calibrate
them.
It's different to the BP6 for example where you at least know the CPU
clocks are fixed ratio.
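
A small arithmetic sketch of why a one-time calibration does not help with independent oscillators. The ppm offsets used here are invented for illustration; the point is only that any residual rate difference between two nodes grows linearly with elapsed time, and with unrelated clocks that difference itself changes over time.

```c
#include <stdint.h>

/* Divergence, in microseconds, between two clocks after elapsed_s
 * seconds, where each clock runs fast or slow by the given number of
 * parts per million. One ppm over one second is one microsecond. */
static int64_t divergence_us(int64_t elapsed_s, int64_t ppm_a, int64_t ppm_b)
{
	return elapsed_s * (ppm_a - ppm_b);
}
```

With one node assumed 15 ppm fast and another 5 ppm slow, the two counters are 72000 microseconds (72 ms) apart after one hour. Per-CPU epoch and calibration data would capture the offsets at calibration time but not how they move afterwards.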

2004-11-17 18:04:58

by dean gaudet

Subject: summary (Re: [patch] prefer TSC over PM Timer)

ok thanks everyone... i've been educated, and attempted to summarize the
situation.

if timer_pm is fixed to read the PM timer only once on non-broken systems
then it is generally the best choice. it is only at a ~3x disadvantage
compared to tsc/lapic in that case.

until/unless C3 and deeper resync tsc then it's best not to default to tsc
even on transmeta. it would require some co-ordination between timer_tsc
and ACPI code to know if C3/etc. are enabled, i don't see that
co-ordination there now. so it really does seem like adding "clock=tsc"
to boot is best left to installers/users/not-the-kernel for now.

here's my device summary:

PIT:
- many slow i/o accesses to read
- works everywhere

PM:
- minimum one slow i/o access to read
- measurements on a handful of systems show one PM timer read
costs ~3x a TSC read.
- kernel presently uses 3 reads as a bug workaround, but can be
reduced to one read.
- works on ~all hardware less than a few years old

TSC:
- fast read
- on most systems this varies with power mgmt -- and some power mgmt
occurs "behind-the-scenes" without kernel awareness
- cpufreq is better and better at tracking the changes (but not on SMP?)
- 2.6.10-rc2 disables even more behind-the-scenes power mgmt
- stops counting in C3 (solved? with PIT/PM/RTC read coming out of C3)
- drift possible across nodes in NUMA

local APIC:
- fast read (approx same as TSC)
- enabling lapic causes some dell laptops to crash
- stops counting in C3 (solvable with PIT/PM/RTC read coming out of C3)
- shared with scheduler -- easy to manage today
- can't be shared with scheduler if we add variable scheduler ticks
(can't read CCR and write ICR atomically -- potential to drift)
- local apic timer ticks are the best choice for scheduling on SMP
because it allows all the CPU schedulers to be skewed and avoid
lock conflicts.
- drift possible across nodes in NUMA?

HPET:
- at the moment i know nothing about it (none of my systems have it)

let me know if i've missed anything.

-dean

2004-11-17 22:38:22

by George Anzinger

Subject: Re: summary (Re: [patch] prefer TSC over PM Timer)

dean gaudet wrote:
> ok thanks everyone... i've been educated, and attempted to summarize the
> situation.
>
> if timer_pm is fixed to read the PM timer only once on non-broken systems
> then it is generally the best choice. it is only at a ~3x disadvantage
> compared to tsc/lapic in that case.
>
> until/unless C3 and deeper resync tsc then it's best not to default to tsc
> even on transmeta. it would require some co-ordination between timer_tsc
> and ACPI code to know if C3/etc. are enabled, i don't see that
> co-ordination there now. so it really does seem like adding "clock=tsc"
> to boot is best left to installers/users/not-the-kernel for now.
>
> here's my device summary:
>
> PIT:
> - many slow i/o accesses to read
> - works everywhere
>
> PM:
> - minimum one slow i/o access to read
> - measurements on a handful of systems show one PM timer read
> costs ~3x a TSC read.
> - kernel presently uses 3 reads as a bug workaround, but can be
> reduced to one read.
> - works on ~all hardware less than a few years old

Both the PIT and PM use the same 14.3181818MHz "rock" which is chosen for time
keeping. As such the PIT & PM should be considered the "GOLD" standard for time
keeping.
>
> TSC:
> - fast read
> - on most systems this varies with power mgmt -- and some power mgmt
> occurs "behind-the-scenes" without kernel awareness
> - cpufreq is better and better at tracking the changes (but not on SMP?)
> - 2.6.10-rc2 disables even more behind-the-scenes power mgmt
> - stops counting in C3 (solved? with PIT/PM/RTC read coming out of C3)
> - drift possible across nodes in NUMA

The TSC frequency is unknown. During boot an attempt is made to calibrate it by
comparing it with the PIT. This attempt is flawed by the I/O delays in
accessing the PIT and so will be off by 5 or more counts per tick (measured on
an 800 MHz box, and this was done after changing the calibration time to the max
PIT count, ~50ms, and attempting to pair the beginning and ending I/O
instructions so as to, as much as possible, negate the I/O delays). It is also
not driven by a time keeping "rock" and may also be varied to lower EMI
radiation (isn't time keeping interesting).
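
To put the "5 counts per tick" figure in perspective, here is a small arithmetic sketch. The tick rate of 1000 Hz used in the note below is an assumption (the message does not state HZ), and the function names are illustrative.

```c
/* Relative calibration error, in parts per million, when the measured
 * cycles-per-tick value is off by counts_off cycles.
 * cpu_hz:  CPU clock rate in Hz
 * tick_hz: timer tick rate in Hz (assumed parameter) */
static double calib_error_ppm(double counts_off, double cpu_hz, double tick_hz)
{
	double cycles_per_tick = cpu_hz / tick_hz;

	return counts_off / cycles_per_tick * 1e6;
}

/* Clock error accumulated over one day at a given ppm error. */
static double error_seconds_per_day(double ppm)
{
	return ppm * 1e-6 * 86400.0;
}
```

For an 800 MHz CPU and an assumed 1000 Hz tick, one tick is 800000 cycles, so 5 counts per tick is 6.25 ppm, or roughly 0.54 seconds per day if left uncorrected.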
>
> local APIC:
> - fast read (approx same as TSC)
> - enabling lapic causes some dell laptops to crash
> - stops counting in C3 (solvable with PIT/PM/RTC read coming out of C3)
> - shared with scheduler -- easy to manage today
> - can't be shared with scheduler if we add variable scheduler ticks
> (can't read CCR and write ICR atomically -- potential to drift)
> - local apic timer ticks are the best choice for scheduling on SMP
> because it allows all the CPU schedulers to be skewed and avoid
> lock conflicts.
Actually doing this is problematic as it skews the timer expire time. With the
per cpu timer lists in 2.6 there is very little lock contention. I think we can
safely dismiss the lock issue.
> - drift possible across nodes in NUMA?

The APIC timer is again on a different "rock" which is not designed for time
keeping and, again, is calibrated at boot up against the "GOLD" standard PIT.

IMHO, the best time keeping we can get in an x86 box is to:

a) set up the PIT up to do the 1/HZ ticks (once set up we do not need to touch
it again so the I/O access issues become moot),

b) select either the TSC (if we think it is stable) or the pm_timer to do the
short term between tick interpolation and also to detect and correct for PIT
interrupt overrun (like we missed a tick or two). We should prefer the TSC here
because of speed and that it is read every gettimeofday() access.

c) Use the PIT interrupt (followed by an IPI from the PIT interrupt handler for
SMP systems) to do the scheduler and timer list servicing. (We really do want
the timer list to be serviced as close to the jiffies++ as possible.)

d) Use the APIC timer for both finer (as in High Resolution Timers, HRT) and
coarser timing (as in variable scheduler ticks, VST).

The current HRT patch (see signature) does a, b, and c. I am currently working
on d.
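
Step b) can be sketched as follows. This is a minimal illustration of tick interpolation and lost-tick detection under stated assumptions, not code from the HRT patch; struct tick_state and both function names are hypothetical.

```c
#include <stdint.h>

/* State captured at the most recent timer tick (hypothetical layout). */
struct tick_state {
	uint64_t ns_at_tick;      /* time recorded at the last tick, in ns */
	uint64_t tsc_at_tick;     /* cycle counter sampled at that tick */
	uint64_t cycles_per_tick; /* calibrated cycles in one 1/HZ period */
	uint64_t ns_per_tick;     /* length of one tick, in ns */
};

/* Interpolate the current time between ticks from the cycle counter. */
static uint64_t interpolate_ns(const struct tick_state *s, uint64_t tsc_now)
{
	uint64_t delta = tsc_now - s->tsc_at_tick;

	return s->ns_at_tick + delta * s->ns_per_tick / s->cycles_per_tick;
}

/* Called from the tick handler: how many whole ticks went by without
 * being accounted for since the last tick that was handled. */
static uint64_t lost_ticks(const struct tick_state *s, uint64_t tsc_now)
{
	uint64_t elapsed = (tsc_now - s->tsc_at_tick) / s->cycles_per_tick;

	return elapsed > 1 ? elapsed - 1 : 0;
}
```

With 800000 cycles per 1 ms tick, a counter reading 400000 cycles past the last tick interpolates to 0.5 ms after it, and a handler that runs three periods after the previous one reports two lost ticks. The accuracy of both results depends entirely on cycles_per_tick being right, which is the calibration concern raised above.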
>
> HPET:
> - at the moment i know nothing about it (none of my systems have it)

Well, we do know that it is in I/O space and all that that implies...
>
> let me know if i've missed anything.
>

--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/

2004-11-17 23:14:41

by john stultz

Subject: Re: summary (Re: [patch] prefer TSC over PM Timer)

On Wed, 2004-11-17 at 14:30 -0800, George Anzinger wrote:
> The APIC timer is again on a different "rock" which is not designed for time
> keeping and, again, is calibrated at boot up against the "GOLD" standard PIT.
>
> IMHO, the best time keeping we can get in an x86 box is to:
>
> a) set up the PIT up to do the 1/HZ ticks (once set up we do not need to touch
> it again so the I/O access issues become moot),
>
> b) select either the TSC (if we think it is stable) or the pm_timer to do the
> short term between tick interpolation and also to detect and correct for PIT
> interrupt overrun (like we missed a tick or two). We should prefer the TSC here
> because of speed and that it is read every gettimeofday() access.

My only qualm here is that using the TSC to interpolate between timer
ticks allows for time inconsistencies. If the TSC isn't cumulatively
accurate, then when used in between ticks it will cause minor
inaccuracies and possible inconsistencies. I'd instead prefer picking a
single time source, and using NTP to correct for drift or inaccurate
calibration.

Also, freeing the time subsystem from requiring regular periodic ticks
allows for tickless systems and additional power management savings. But
this should be saved for another thread.

thanks
-john

2004-11-17 23:28:13

by George Anzinger

Subject: Re: summary (Re: [patch] prefer TSC over PM Timer)

john stultz wrote:
> On Wed, 2004-11-17 at 14:30 -0800, George Anzinger wrote:
>
>>The APIC timer is again on a different "rock" which is not designed for time
>>keeping and, again, is calibrated at boot up against the "GOLD" standard PIT.
>>
>>IMHO, the best time keeping we can get in an x86 box is to:
>>
>>a) set up the PIT up to do the 1/HZ ticks (once set up we do not need to touch
>>it again so the I/O access issues become moot),
>>
>>b) select either the TSC (if we think it is stable) or the pm_timer to do the
>>short term between tick interpolation and also to detect and correct for PIT
>>interrupt overrun (like we missed a tick or two). We should prefer the TSC here
>>because of speed and that it is read every gettimeofday() access.
>
>
> My only qualm here is that using the TSC to interpolate between timer
> ticks allows for time inconsistencies. If the TSC isn't cumulatively
> accurate, then when used in between ticks it will cause minor
> inaccuracies and possible inconsistencies. I'd instead prefer picking a
> single time source, and using NTP to correct for drift or inaccurate
> calibration.

I think the inconsistencies are of the order of microseconds and so will not
really show. Not all systems are connected to an NTP server. One possibility
is to build in an ntp like thing that averages out the PIT ticks and refines the
TSC count per tick thing over a much longer period. This would drive the errors
way down into the noise and it still honors the notion of the PIT being the
STANDARD for time.
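
One possible shape for such an averaging step is an exponentially weighted moving average of the measured TSC cycles per PIT tick. This is only a sketch of the general idea; the message does not specify an algorithm, and the function name and the shift parameter are assumptions.

```c
#include <stdint.h>

/* Move the running estimate of cycles-per-tick a fraction 1/2^shift of
 * the way toward a newly measured sample. A larger shift averages over
 * a longer period and reacts more slowly. */
static uint64_t refine_cycles_per_tick(uint64_t estimate, uint64_t sample,
				       unsigned shift)
{
	int64_t diff = (int64_t)sample - (int64_t)estimate;

	return (uint64_t)((int64_t)estimate + diff / ((int64_t)1 << shift));
}
```

Starting from an estimate of 800000 with shift 4, a sample of 800016 nudges the estimate to 800001, so individual noisy measurements have little effect while a persistent offset is gradually absorbed.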
>
> Also breaking time subsystem from requiring regular periodic ticks
> allows for tickless systems and additional power management savings. But
> this should be saved for another thread.

Amen!
>

--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/

2004-11-18 02:10:22

by Krzysztof Halasa

Subject: Re: [patch] prefer TSC over PM Timer

dean gaudet <[email protected]> writes:

> i know that all p3, p-m, p4, k8 and efficeon have local APIC,

Some Celeron P3s (the one in my notebook for example) have no L-APIC:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Celeron (Coppermine)
stepping : 6
cpu MHz : 597.367
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge
mca cmov pat pse36 mmx fxsr sse
bogomips : 1179.64
--
Krzysztof Halasa