2002-09-09 22:29:17

by anton wilson

Subject: do_gettimeofday vs. rdtsc in the scheduler



I'm writing a patch for the scheduler that allows normal processes to run
occasionally even though real-time processes completely dominate the CPU. In
order to do this the way I want to for a specific real-time application, I
need to keep track of the times that the schedule(void) function gets called.
This time is then used to calculate the time difference between when a normal
process was run last and the current time. I was trying to avoid
do_gettimeofday because of the overhead, but now I'm wondering if rdtsc on an
SMP machine may mess up my readings because the TSC from two different
processors may be read. Am I right in assuming this? Secondly, any good
suggestions on how to proceed with my patch?


Thanks,

Anton
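
The SMP concern can be made concrete with a small sketch. It assumes each timestamp is stored together with the CPU it was read on; `struct tsc_sample` and `tsc_delta_same_cpu` are hypothetical names for illustration, not anything from the scheduler. A delta is only reported when both samples come from the same CPU and the counter has not gone backwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a TSC value plus the CPU it was read on. */
struct tsc_sample {
    int cpu;
    uint64_t tsc;
};

/*
 * Store the cycle delta in *delta and return true only when both samples
 * were taken on the same CPU and the later one is not smaller. In every
 * other case the caller should fall back to a slower shared clock.
 */
bool tsc_delta_same_cpu(struct tsc_sample earlier, struct tsc_sample later,
                        uint64_t *delta)
{
    assert(delta != NULL);
    if (earlier.cpu != later.cpu)
        return false;   /* counters on different CPUs are not comparable */
    if (later.tsc < earlier.tsc)
        return false;   /* counter went backwards: do not trust it */
    *delta = later.tsc - earlier.tsc;
    return true;
}
```

When the function returns false, a caller would have to use something like do_gettimeofday for that interval instead.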


2002-09-17 20:55:40

by Andi Kleen

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

"David S. Miller" <[email protected]> writes:

> From: john stultz <[email protected]>
> Date: 17 Sep 2002 13:29:18 -0700
>
> Some NUMA boxes do not have synced TSC, so on those systems your
> code won't work.
>
> It would have been really nice if x86 had specified a "system tick"
> register that incremented based upon the system bus cycles and thus
> was immune to the processor rates.

It has - the local APIC timer. It has a tick register too that you can
read. Unfortunately it's buggy/unreliable on many systems. Linux uses
it for task scheduling and the local timer interrupt when it works,
but it's not really good enough for gettimeofday.

Microsoft/Intel have specified the HPET timer as replacement, but
it is still missing in many chipsets and buggy in others.

Also reading HPET is somewhat more costly than reading TSCs because it
goes to the southbridge, so there are cases where using TSC is
probably better (e.g. I think for networking packet time stamping the
TSC is just fine with all its limitations)

> I foresee lots of patches coming which basically are "how does this
> x86 system provide a stable synchronized tick source".

From those who didn't implement HPET but their own spec instead, like IBM.

-Andi

2002-09-17 20:59:08

by David Miller

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

From: Andi Kleen <[email protected]>
Date: 17 Sep 2002 23:00:38 +0200

Also reading HPET is somewhat more costly than reading TSCs because it
goes to the southbridge, so there are cases where using TSC is
probably better (e.g. I think for networking packet time stamping the
TSC is just fine with all its limitations)

The cpu gets a bus clock input, so the system tick should be processor
local as much as TSC is.

It's boggling that this is being messed up so much. I can't believe
Sun got something incredibly right (Ultra-III has a system tick) :-)

2002-09-17 21:19:59

by Alan

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

On Tue, 2002-09-17 at 21:54, David S. Miller wrote:
> The cpu gets a bus clock input, so the system tick should be processor
> local as much as TSC is.
>
> It's boggling that this is being messed up so much. I can't believe
> Sun got something incredibly right (Ultra-III has a system tick) :-)

A bus clock - but things like the x440 have more than one bus clock. It's
NUMA. Also the bus clock and rdtsc clock are different - rdtsc is
dependent on the multiplier. Shove a celeron 300 and a celeron 450 in a
BP6 board with tsc on and enjoy
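
A minimal sketch of why mixed multipliers hurt: the same cycle count means a different amount of time on each CPU. `cycles_to_ns` is a hypothetical helper, and the clock rates are taken from the nominal Celeron speeds mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a TSC cycle count to nanoseconds for a CPU clocked at cpu_khz.
 * cycles / (cpu_khz * 1000) seconds equals cycles * 1e6 / cpu_khz ns.
 * The multiplication overflows for very large cycle counts; this is a
 * sketch for short intervals only.
 */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t cpu_khz)
{
    assert(cpu_khz > 0);
    return cycles * 1000000ULL / cpu_khz;
}
```

With 300,000,000 cycles this gives one full second at 300 MHz but only about two thirds of a second at 450 MHz, so a delta that mixes readings from both CPUs has no meaningful unit.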

2002-09-17 21:22:16

by David Miller

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

From: Alan Cox <[email protected]>
Date: 17 Sep 2002 22:28:12 +0100

A bus clock - but things like the x440 have more than one bus clock. It's
NUMA. Also the bus clock and rdtsc clock are different - rdtsc is
dependent on the multiplier. Shove a celeron 300 and a celeron 450 in a
BP6 board with tsc on and enjoy

That's mostly my point.

If the bus clocks differ, then great, create some system-wide crystal
oscillator. That's a detail, the important bit is that you don't need
to go out to the system bus to read the tick value, it must be cpu
local to be effective and without serious performance impact.

2002-09-17 21:57:22

by James Cleverdon

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

On Tuesday 17 September 2002 02:18 pm, David S. Miller wrote:
> From: Alan Cox <[email protected]>
> Date: 17 Sep 2002 22:28:12 +0100
>
> A bus clock - but things like the x440 have more than one bus clock. It's
> NUMA. Also the bus clock and rdtsc clock are different - rdtsc is
> dependent on the multiplier. Shove a celeron 300 and a celeron 450 in a
> BP6 board with tsc on and enjoy
>
> That's mostly my point.
>
> If the bus clocks differ, then great, create some system-wide crystal
> oscillator. That's a detail, the important bit is that you don't need
> to go out to the system bus to read the tick value, it must be cpu
> local to be effective and without serious performance impact.
> -

It's more than just a detail. Sequent's last NUMA system (_not_ the NUMA-Q;
never released) did exactly what you suggest. The midplane card generated
the bus clock for all quad modules. We had requested this feature because it
was such a pain dealing with clock drift between nodes in the OS.

The HW guys were able to give us synchronized bus clocks on a 16-way box, but
warned us that it would not be practical on the 256-way. Too much clock skew
at those speeds, or something like that. I suppose you could trade off
interconnect rate for clock sync, but then performance would suffer.

I don't know how Sun and SGI manage with their larger systems. Either they
don't do clock sync, or they may have to make expensive tradeoffs.

Interestingly, Intel's IA64 manual does not guarantee that the CPU clock (and
thus its TSC register) has anything to do with the bus clock rate. Maybe
they want to dabble with asynchronous logic or multiple clock domains in
future CPUs.

Trivia: NUMA-Q systems running Dynix/PTX can contain quads running at very
different CPU speeds. This made locating some race conditions quite easy.

--
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com

2002-09-17 22:42:38

by David Miller

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

From: Andi Kleen <[email protected]>
Date: Wed, 18 Sep 2002 00:44:42 +0200

> I don't know how Sun and SGI manage with their larger systems. Either they
> don't do clock sync, or they may have to make expensive tradeoffs.

I guess you could always run NTP between the different CPUs ;) ;)

:-)

More seriously, you don't need to have the cpu tick registers sync'd,
it is the rate that matters.

Once booted, you can sync these system tick registers with a pretty
straightforward algorithm in the kernel. Bonus points if you can
figure out how to cancel out the cost of moving the system tick sample
cachelines between master and slave in your algorithm :-)

2002-09-17 22:39:43

by Andi Kleen

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

> I don't know how Sun and SGI manage with their larger systems. Either they
> don't do clock sync, or they may have to make expensive tradeoffs.

I guess you could always run NTP between the different CPUs ;) ;)

-Andi

2002-09-17 22:55:12

by James Cleverdon

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

On Tuesday 17 September 2002 03:38 pm, David S. Miller wrote:
> From: Andi Kleen <[email protected]>
> Date: Wed, 18 Sep 2002 00:44:42 +0200
>
> > I don't know how Sun and SGI manage with their larger systems. Either
> > they don't do clock sync, or they may have to make expensive
> > tradeoffs.
>
> I guess you could always run NTP between the different CPUs ;) ;)
>
> :-)
>
> More seriously, you don't need to have the cpu tick registers sync'd,
> it is the rate that matters.
>
> Once booted, you can sync these system tick registers with a pretty
> straightforward algorithm in the kernel. Bonus points if you can
> figure out how to cancel out the cost of moving the system tick sample
> cachelines between master and slave in your algorithm :-)

Been there. Done that. Had the product canceled. ;^)

The initial sync was easy, even with variable latencies on cache lines. A
much simplified NTP-ish algorithm works fine. The painful thing was bus
clock drift and programs that foolishly relied on the TSC being the same
between CPUs and between nodes.
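
As a rough illustration of what an NTP-ish sync can look like (not the Sequent code, which is not shown in this thread), the sketch below uses the standard NTP offset formula on four counter readings. The names `struct sync_estimate` and `estimate_offset` are hypothetical, and it assumes the cache-line transfer costs roughly the same in both directions, which is exactly the part that varies in practice.

```c
#include <assert.h>
#include <stdint.h>

struct sync_estimate {
    int64_t offset;      /* slave counter minus master counter */
    int64_t round_trip;  /* total transfer cost, both directions */
};

/*
 * t0: master counter when the request is written
 * t1: slave counter when the request is seen
 * t2: slave counter when the reply is written
 * t3: master counter when the reply is seen
 * With symmetric transfer latency the latency terms cancel out of offset.
 */
struct sync_estimate estimate_offset(int64_t t0, int64_t t1,
                                     int64_t t2, int64_t t3)
{
    struct sync_estimate e;

    assert(t3 >= t0 && t2 >= t1);  /* each counter runs forward */
    e.offset = ((t1 - t0) + (t2 - t3)) / 2;
    e.round_trip = (t3 - t0) - (t2 - t1);
    return e;
}
```

Repeating the exchange and keeping the sample with the smallest round trip is one common way to reduce the effect of variable latency. It does nothing about drift after the initial sync.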

--
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com

2002-09-17 23:16:26

by David Miller

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

From: James Cleverdon <[email protected]>
Date: Tue, 17 Sep 2002 15:55:52 -0700

The initial sync was easy, even with variable latencies on cache lines. A
much simplified NTP-ish algorithm works fine. The painful thing was bus
clock drift and programs that foolishly relied on the TSC being the same
between CPUs and between nodes.

This is why the gettimeofday implementation should use the system tick
thing and also any profiling support in the C library should avoid
TSC as well.

For small stretches of code TSC can be used for very precise profiling
but otherwise it is pretty useless by and large.

2002-09-17 23:33:58

by john stultz

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

On Tue, 2002-09-17 at 16:12, David S. Miller wrote:
> From: James Cleverdon <[email protected]>
> Date: Tue, 17 Sep 2002 15:55:52 -0700
>
> The initial sync was easy, even with variable latencies on cache lines. A
> much simplified NTP-ish algorithm works fine. The painful thing was bus
> clock drift and programs that foolishly relied on the TSC being the same
> between CPUs and between nodes.
>
> This is why the gettimeofday implementation should use the system tick
> thing and also any profiling support in the C library should avoid
> TSC as well.

I think the point James is making is that on very large systems, you
will get system tick skew as well. On one system I know of, the bus
frequency is intentionally skewed slightly between nodes. This is what
causes the TSCs to skew, and I believe would also cause this "system
tick" to skew as well.

Additionally, where is this system tick thing? You make it sound like
it's a register in the cpu, and while the Ultra-III may have one, I'm
unaware of a system/bus tick register on intel chips. Is it in some
semi-documented MSR?

I apologize for being confused, I'm just not sure if you're criticizing
the code or the hardware.

thanks
-john

2002-09-17 23:36:56

by David Miller

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

From: john stultz <[email protected]>
Date: 17 Sep 2002 16:32:15 -0700

Additionally, where is this system tick thing? You make it sound like
it's a register in the cpu, and while the Ultra-III may have one, I'm
unaware of a system/bus tick register on intel chips. Is it in some
semi-documented MSR?

It's in a register on Ultra-III. The whole point of this
conversation, if you read my initial postings, is that
"this should have been specified in the x86 architecture"

I know full well it isn't currently :-)

2002-09-17 23:47:12

by Andi Kleen

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

On Tue, Sep 17, 2002 at 04:32:46PM -0700, David S. Miller wrote:
> From: john stultz <[email protected]>
> Date: 17 Sep 2002 16:32:15 -0700
>
> Additionally, where is this system tick thing? You make it sound like
> it's a register in the cpu, and while the Ultra-III may have one, I'm
> unaware of a system/bus tick register on intel chips. Is it in some
> semi-documented MSR?
>
> It's in a register on Ultra-III. The whole point of this
> conversation, if you read my initial postings, is that
> "this should have been specified in the x86 architecture"
>
> I know full well it isn't currently :-)

Sorry, it's wrong. The x86 architecture has several such registers
(apic timers, 8253 timer, HPET [Microsoft requires this for new
hardware that will be w*s certified])
They just all suck on various systems or in general. HPET is ok,
but still not widespread enough.

-Andi

2002-09-17 23:50:59

by David Miller

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

From: Andi Kleen <[email protected]>
Date: Wed, 18 Sep 2002 01:52:09 +0200

On Tue, Sep 17, 2002 at 04:32:46PM -0700, David S. Miller wrote:
> I know full well it isn't currently :-)

Sorry, it's wrong. The x86 architecture has several such registers

Not in the processor, and not architecturally specified.

All of the things you list are in the scope of things outside
the cpu.

2002-09-17 23:53:43

by Andi Kleen

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

On Tue, Sep 17, 2002 at 04:46:49PM -0700, David S. Miller wrote:
> From: Andi Kleen <[email protected]>
> Date: Wed, 18 Sep 2002 01:52:09 +0200
>
> On Tue, Sep 17, 2002 at 04:32:46PM -0700, David S. Miller wrote:
> > I know full well it isn't currently :-)
>
> Sorry, it's wrong. The x86 architecture has several such registers
>
> Not in the processor, and not architecturally specified.
>
> All of the things you list are in the scope of things outside
> the cpu.

The local APIC timer is specified in the Intel Manual volume 3 for example.
It's an optional feature (CPUID), but pretty much everyone has it.

-Andi



2002-09-17 23:55:40

by David Miller

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

From: Andi Kleen <[email protected]>
Date: Wed, 18 Sep 2002 01:58:38 +0200

The local APIC timer is specified in the Intel Manual volume 3 for example.
It's an optional feature (CPUID), but pretty much everyone has it.

Is it internal or external to the processor? I.e., can it be in the
southbridge or something? If yes, then I still hold my point.

You shouldn't have to PIO to get a reliable timer value.

2002-09-18 00:00:38

by Andi Kleen

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

On Tue, Sep 17, 2002 at 04:51:31PM -0700, David S. Miller wrote:
> From: Andi Kleen <[email protected]>
> Date: Wed, 18 Sep 2002 01:58:38 +0200
>
> The local APIC timer is specified in the Intel Manual volume 3 for example.
> It's an optional feature (CPUID), but pretty much everyone has it.
>
> Is it internal or external to the processor? I.e., can it be in the
> southbridge or something? If yes, then I still hold my point.

Local Apic is in the cpu.

-Andi

2002-09-18 01:00:37

by James Cleverdon

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

On Tuesday 17 September 2002 05:05 pm, Andi Kleen wrote:
> On Tue, Sep 17, 2002 at 04:51:31PM -0700, David S. Miller wrote:
> > From: Andi Kleen <[email protected]>
> > Date: Wed, 18 Sep 2002 01:58:38 +0200
> >
> > The local APIC timer is specified in the Intel Manual volume 3 for
> > example. It's an optional feature (CPUID), but pretty much everyone has
> > it.
> >
> > Is it internal or external to the processor? I.e., can it be in the
> > southbridge or something? If yes, then I still hold my point.
>
> Local Apic is in the cpu.
>
> -Andi

I believe you gents are going off at a tangent. Intel's current P4 manual
says the local APIC timer is driven by the "bus clock". For serial APICs
that was doubtless the APIC serial bus clock, which almost always was derived
from the system clock. For P4 systems with the xAPIC in parallel mode, the
only one available is the system bus.

If a multi-node system doesn't have synchronized bus clocks, it doesn't matter
which one you use. The time bases will drift relative to each other.

It's even worse when the "Frequency Spreading" BIOS option is turned on.
Then, the bus clocks are deliberately offset by as much as half a megahertz
(doubtless to pass FCC or equivalent emission certifications).
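
To put a number on that offset, here is a small arithmetic sketch. `drift_ns_per_second` is a hypothetical helper, and the 100 MHz nominal bus clock used in the note below is an assumption for illustration only; the half-megahertz offset is the figure quoted above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Nanoseconds of divergence accumulated per second between two counters
 * whose clocks differ by offset_hz around a nominal rate of nominal_hz.
 */
uint64_t drift_ns_per_second(uint64_t nominal_hz, uint64_t offset_hz)
{
    assert(nominal_hz > 0);
    return offset_hz * 1000000000ULL / nominal_hz;
}
```

Under that assumption, `drift_ns_per_second(100000000, 500000)` comes to 5,000,000 ns, so two nodes would disagree by about 5 ms after a single second.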

I don't know what Sun does with the Ultra SPARC 3's time counter. Maybe they
have a separate clock input for it that runs at 1 MHz so skew and
distribution is no problem. That's fine for Sun; they build their own CPUs
and can put in whatever they want. The rest of us have to work with what we
get from the different manufacturers. And, just about all of them use a
value derived from the bus clock -- which might have drift in a multi-node
system.

That's where a better abstraction of the timer hardware would come in handy.
It would use the PIT or TSC for 99% of boxes, and switch to special code for
the weird ones.
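
One possible shape for such an abstraction, sketched under loud assumptions: `struct time_source`, its fields, and `pick_time_source` are hypothetical names invented here, not an existing kernel interface. Each source is probed once, marked usable or not, and the highest-rated usable one is selected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one timer source. */
struct time_source {
    const char *name;
    int rating;               /* higher is preferred */
    bool usable;              /* set by a probe routine on this box */
    uint64_t (*read)(void);   /* returns the raw counter value */
};

/* Return the usable source with the highest rating, or NULL if none. */
const struct time_source *pick_time_source(const struct time_source *srcs,
                                           size_t n)
{
    const struct time_source *best = NULL;
    size_t i;

    assert(srcs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!srcs[i].usable)
            continue;
        if (best == NULL || srcs[i].rating > best->rating)
            best = &srcs[i];
    }
    return best;
}
```

On an ordinary box the probe would leave the TSC marked usable and it would win; on a multi-node box with drifting bus clocks the probe would clear that flag and the special-case source would be picked instead.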

--
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com

2002-09-18 06:36:24

by Vojtech Pavlik

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

On Tue, Sep 17, 2002 at 03:02:04PM -0700, James Cleverdon wrote:
> On Tuesday 17 September 2002 02:18 pm, David S. Miller wrote:
> > From: Alan Cox <[email protected]>
> > Date: 17 Sep 2002 22:28:12 +0100
> >
> > A bus clock - but things like the x440 have more than one bus clock. It's
> > NUMA. Also the bus clock and rdtsc clock are different - rdtsc is
> > dependent on the multiplier. Shove a celeron 300 and a celeron 450 in a
> > BP6 board with tsc on and enjoy
> >
> > That's mostly my point.
> >
> > If the bus clocks differ, then great, create some system-wide crystal
> > oscillator. That's a detail, the important bit is that you don't need
> > to go out to the system bus to read the tick value, it must be cpu
> > local to be effective and without serious performance impact.
> > -
>
> It's more than just a detail. Sequent's last NUMA system (_not_ the NUMA-Q;
> never released) did exactly what you suggest. The midplane card generated
> the bus clock for all quad modules. We had requested this feature because it
> was such a pain dealing with clock drift between nodes in the OS.
>
> The HW guys were able to give us synchronized bus clocks on a 16-way box, but
> warned us that it would not be practical on the 256-way. Too much clock skew
> at those speeds, or something like that. I suppose you could trade off
> interconnect rate for clock sync, but then performance would suffer.
>
> I don't know how Sun and SGI manage with their larger systems. Either they
> don't do clock sync, or they may have to make expensive tradeoffs.
>
> Interestingly, Intel's IA64 manual does not guarantee that the CPU clock (and
> thus its TSC register) has anything to do with the bus clock rate. Maybe
> they want to dabble with asynchronous logic or multiple clock domains in
> future CPUs.

The point here is: You don't need a synchronized bus clock. You don't
need synchronized CPU clocks. You need a synchronized system-wide clock
that doesn't drive any bus or CPU, just a simple counter in every CPU
that you can read from inside the CPU. You can pull that pretty far and
to many CPUs. That's what I understand Sun does.

--
Vojtech Pavlik
SuSE Labs

2002-09-19 11:15:54

by Mikael Pettersson

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

Andi Kleen writes:
> On Tue, Sep 17, 2002 at 04:46:49PM -0700, David S. Miller wrote:
> > From: Andi Kleen <[email protected]>
> > Date: Wed, 18 Sep 2002 01:52:09 +0200
> >
> > On Tue, Sep 17, 2002 at 04:32:46PM -0700, David S. Miller wrote:
> > > I know full well it isn't currently :-)
> >
> > Sorry, it's wrong. The x86 architecture has several such registers
> >
> > Not in the processor, and not architecturally specified.
> >
> > All of the things you list are in the scope of things outside
> > the cpu.
>
> The local APIC timer is specified in the Intel Manual volume 3 for example.
> It's an optional feature (CPUID), but pretty much everyone has it.

Except that like everything else related to the local APIC, you're at
the mercy of the competence (or lack thereof) of the BIOS implementors.
- There are plenty of laptops whose CPUs have local APICs but whose
BIOSen go berserk if you enable it. There are also plenty of laptops
that don't have one, since Intel removed it from many Mobile P6 CPUs.
- There are even some desktop boards with BIOS problems, including Intel's
AL440LX on which Linux must stay away from the local APIC timer.

To assume the local APIC works on 686-class UP boxes is not realistic, alas.

/Mikael

2002-09-19 13:22:22

by Alan

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

On Thu, 2002-09-19 at 12:20, Mikael Pettersson wrote:
> > The local APIC timer is specified in the Intel Manual volume 3 for example.
> > It's an optional feature (CPUID), but pretty much everyone has it.
>
> Except that like everything else related to the local APIC, you're at
> the mercy of the competence (or lack thereof) of the BIOS implementors.
> - There are plenty of laptops whose CPUs have local APICs but whose
> BIOSen go berserk if you enable it. There are also plenty of laptops

Frequently because we don't disable it again before any APM calls I
suspect. When a CPU goes into sleep mode you must disable PMC and local
apic timer interrupts.

> that don't have one, since Intel removed it from many Mobile P6 CPUs.
> - There are even some desktop boards with BIOS problems, including Intel's
> AL440LX on which Linux must stay away from the local APIC timer.
>
> To assume the local APIC works on 686-class UP boxes is not realistic, alas.

Yep

2002-09-19 13:34:43

by Mikael Pettersson

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

Alan Cox writes:
> On Thu, 2002-09-19 at 12:20, Mikael Pettersson wrote:
> > > The local APIC timer is specified in the Intel Manual volume 3 for example.
> > > It's an optional feature (CPUID), but pretty much everyone has it.
> >
> > Except that like everything else related to the local APIC, you're at
> > the mercy of the competence (or lack thereof) of the BIOS implementors.
> > - There are plenty of laptops whose CPUs have local APICs but whose
> > BIOSen go berserk if you enable it. There are also plenty of laptops
>
> Frequently because we don't disable it again before any APM calls I
> suspect. When a CPU goes into sleep mode you must disable PMC and local
> apic timer interrupts.

We do on sane boxes where the APM BIOS informs us before suspending.
E.g., on my ASUS P3B-F & P4T-E suspend works with local APIC enabled
because I hooked both the NMI watchdog and local APIC to the
PM system, so we disable before suspending and restore afterwards.

The problem is that some BIOSen don't post the suspend event to
our APM driver, so we fail to disable before suspend, and some BIOSen
(like the utter crap Dell put in the Inspiron) die on all entries to
the BIOS: pull the power cord -> #SMM event -> box crashes.

/Mikael

2002-09-19 17:57:20

by Andrea Arcangeli

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

On Tue, Sep 17, 2002 at 06:04:33PM -0700, James Cleverdon wrote:
> have a separate clock input for it that runs at 1 MHz so skew and

The clock input should be the same, or the counters can always drift out
of sync if you leave them running long enough. The timer generation is an
analog thing, the reception is digital, so having a single timer
guarantees no counter skew.

If the precision we needed from the timer driving gettimeofday were
1 Hz, so 1 tick per second, you could make it scale perfectly without
oscillations on a 256G box.

You simply can't do that with a < 1 nanosecond tick period on more than a
few cpus, because of physics, or you get what's been mentioned a
number of times on this thread (oscillations generated by the latency of
the signal delivery, or further slowdown in accessing the information
with overhead in the interconnects).

The best hardware solution to this problem is to have two cpu registers
increased by two timers, one is the regular cpu tick (TSC) that we have
today, that could even go away with asynchronous cpus, and the other
timer would be the new "real time timer", a 10/100khz clock delivered to
all the cpus that goes to increase such in-cpu-core counter (so that it
can be read from userspace too inside vgettimeofday and with extremely
low latency, exactly like the current tsc, but driven by such a
secondary low frequency timer that will tell us about the time changes).
10/100usec should be much more than enough margin to deliver this timer
to all the hundred cpus with a very small oscillation. And no software
that I'm aware of needs a time-of-day precision finer than 10/100usec. An
interrupt itself is going to take some usec. A context switch as well is
going to take more than 10usec, and that's the important bit to guarantee
that gettimeofday is monotonic. Different threads can have a minor
difference in their perception of the time, dominated by the speed-of-light
delivery of the timer signal, but that's not a problem as long as it's
monotonic.
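
For illustration, turning such a low-frequency counter into a time of day is plain integer arithmetic. This is only a sketch: `struct simple_timeval` and `ticks_to_timeval` are hypothetical names, and the counter is assumed to start at zero at the epoch of interest and to tick at a rate that divides one million evenly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct timeval. */
struct simple_timeval {
    uint64_t sec;
    uint64_t usec;
};

/*
 * Convert a raw tick count from a counter running at tick_hz into
 * seconds and microseconds. At 100 kHz each tick is worth 10 usec.
 */
struct simple_timeval ticks_to_timeval(uint64_t ticks, uint32_t tick_hz)
{
    struct simple_timeval tv;

    assert(tick_hz > 0 && 1000000 % tick_hz == 0);
    tv.sec = ticks / tick_hz;
    tv.usec = (ticks % tick_hz) * (1000000 / tick_hz);
    return tv;
}
```

Because the conversion never decreases as the tick count grows, the result is monotonic whenever the counter itself is.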

The TSC and also the system clock mentioned by Dave are way too fast to
be kept synchronized in a numa without introducing significant drifts
and oscillations.

If somebody really needs 1usec resolution, he will first need vsyscalls
to avoid enter/exit kernel latencies, likely he will need to run iopl
with irqs disabled, and so it should be ok to use the TSC in such a case
with a specialized hacked kernel config option (with all the disclaimers
that it would break if the cpu clock changes under you etc...). All mere
mortals will be perfectly fine with a 100 kHz clock for gettimeofday. If
Sun did a 1 MHz clock to achieve the above suggested design solution,
then they did the optimal thing IMHO.

Another approach would be to use separate timer sources per cpu and to
resynchronize every once in a while, at regular intervals that
guarantee the drift never grows beyond half the time of the
shortest context switch, but it would need tedious software support with
knowledge of very low-level hardware information, so I'd definitely
prefer the previously mentioned solution that will require all hardware
vendors to get it right or it won't work. Like it's happening now with
the TSC, with the difference that the 100k timer would be doable, while
the TSC at 2 GHz isn't doable.
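
The resynchronization bound in the per-cpu approach can be worked out with a short sketch. `max_resync_interval_ns` is a hypothetical helper, and `drift_ppm` is assumed to be the worst-case relative drift between any two per-cpu timers in parts per million, a figure that would have to come from the hardware.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Longest interval between resyncs such that the accumulated drift stays
 * within half of the shortest context switch time.
 */
uint64_t max_resync_interval_ns(uint64_t min_ctx_switch_ns, uint64_t drift_ppm)
{
    uint64_t budget_ns;

    assert(drift_ppm > 0);
    budget_ns = min_ctx_switch_ns / 2;
    /* drift per ns of runtime is drift_ppm / 1e6 */
    return budget_ns * 1000000ULL / drift_ppm;
}
```

With a 10 usec context switch and an assumed 50 ppm drift, the budget is 5 usec and the timers would need resyncing at least every 100 ms.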

Of course the cyclone timer and the HPET are the very next best thing
the hardware vendors could provide us on x86, and of course you cannot
do better than the cyclone and HPET without upgrading the cpu too,
because the cpu is simply missing a register to avoid hitting the
southbridge at every vgettimeofday. At least the good thing is that HPET
is mapped in a mmio region so we don't need to enter kernel but only to
access the southbridge from userspace and that saves a number of usec at
every gettimeofday.

All of this assumes gettimeofday is an important operation and that an
additional cpu sequence counter and an additional numa-shared timer
would pay off by making gettimeofday most efficient and most accurate on
all classes of machines. Another option would be to replace
the TSC with such a new "real time counter" if adding a new counter is too
expensive. The TSC is almost unusable in its current too-high-frequency
form; it is useful only for microbenchmarking, so it's more a debugging
facility than a production feature, while the other would be a really
useful feature, not only for debugging/benchmarking purposes.

Andrea

2002-09-19 17:58:51

by Andrea Arcangeli

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

On Wed, Sep 18, 2002 at 08:40:22AM +0200, Vojtech Pavlik wrote:
> The point here is: You don't need a synchronized bus clock. You don't
> need synchronized CPU clocks. You need a synchronized system-wide clock
> that doesn't drive any bus or CPU, just a simple counter in every CPU
> that you can read from inside the CPU. You can pull that pretty far and

Exactly.

Andrea

2002-09-20 10:58:46

by Maciej W. Rozycki

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

On Wed, 18 Sep 2002, Andi Kleen wrote:

> > Is it internal or external to the processor? I.e., can it be in the
> > southbridge or something? If yes, then I still hold my point.
>
> Local Apic is in the cpu.

Except when it's an i82489DX... Rare, but still.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2002-09-20 15:21:20

by John Levon

Subject: Re: do_gettimeofday vs. rdtsc in the scheduler

On Thu, Sep 19, 2002 at 02:27:19PM +0100, Alan Cox wrote:

> > - There are plenty of laptops whose CPUs have local APICs but whose
> > BIOSen go berserk if you enable it. There are also plenty of laptops
>
> Frequently because we don't disable it again before any APM calls I
> suspect. When a CPU goes into sleep mode you must disable PMC and local
> apic timer interrupts.

Isn't this exactly what apic_pm_suspend() does ? Or is that in 2.5 only ?

regards
john
--
Support the project - http://www.gtonline.net/private/mapp/project/