LinuxLists.cc - Why HZ on i386 is 100 ?

Subject: Re: Why HZ on i386 is 100 ?

>>>>> On Tue, 16 Apr 2002 16:27:12 +0000 (UTC), [email protected] (Linus Torvalds) said:

Linus> And I've had some Intel people grumble about it, because it
Linus> apparently means that the timer tick takes anything from 2%
Linus> to an extreme of 10% (!!) of the CPU time under certain
Linus> loads.

I'm not sure I believe this. I have had occasional cases where I
wondered whether the timer tick caused significant overhead, but it
always turned out to be something else. In my measurements,
*user-level* profiling has the 2-10% overhead you're mentioning, but
that's with a signal delivered to user level on each tick.

--david

2002-04-16 17:11:01

Subject: Re: Why HZ on i386 is 100 ?

On Tue, 16 Apr 2002, David Mosberger wrote:

> >>>>> On Tue, 16 Apr 2002 16:27:12 +0000 (UTC), [email protected] (Linus Torvalds) said:
>
> Linus> And I've had some Intel people grumble about it, because it
> Linus> apparently means that the timer tick takes anything from 2%
> Linus> to an extreme of 10% (!!) of the CPU time under certain
> Linus> loads.
>
> I'm not sure I believe this. I have had occasional cases where I
> wondered whether the timer tick caused significant overhead, but it
> always turned out to be something else. In my measurements,
> *user-level* profiling has the 2-10% overhead you're mentioning, but
> that's with a signal delivered to user level on each tick.

i still have pieces of paper on my desk about tests done on my dual piii
where by hacking HZ to 1000 the kernel build time went from an average of
2min:30sec to an average 2min:43sec. that is pretty close to 10%

- Davide
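
For reference, the two build times quoted above work out to a slowdown a little under 9%, so "pretty close to 10%" is a generous rounding. A minimal sketch of the arithmetic (the function name is illustrative, not from the thread):

```python
def slowdown_percent(baseline_s: float, measured_s: float) -> float:
    """Relative increase of measured_s over baseline_s, in percent."""
    return (measured_s - baseline_s) / baseline_s * 100.0


hz100_build_s = 2 * 60 + 30   # 2min:30sec with HZ=100
hz1000_build_s = 2 * 60 + 43  # 2min:43sec with HZ=1000

# prints "8.7% slower"
print(f"{slowdown_percent(hz100_build_s, hz1000_build_s):.1f}% slower")
```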

2002-04-16 17:17:21

by Mark Mielke

Subject: Re: Why HZ on i386 is 100 ?

On Tue, Apr 16, 2002 at 12:32:25PM -0300, Rik van Riel wrote:
> On Tue, 16 Apr 2002, Mark Mielke wrote:
> > Increasing the HZ can only improve responsiveness, however, there is a
> > cost (mentioned by others). The cost is that the scheduler is executed
> > more often per second. If the scheduler does the same amount of work
> > per tick, but there are more ticks per second, the scheduler does more
> > work overall, and the CPU is free for use by the processes less.
> Why are you discussing Linux 1.2 ?
> Linux is not running the scheduler each cpu tick and hasn't
> done this for years.

Hmm... sorry... :-) Too early in the morning...

mark

--
[email protected]/[email protected]/[email protected] __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

2002-04-16 21:35:00

Subject: Re: Why HZ on i386 is 100 ?

On Tue, Apr 16, 2002 at 08:12:22AM +0000, Olaf Fraczyk wrote:
> Hi,
> I would like to know why exactly this value was chosen.
> Is it safe to change it to eg. 1024? Will it break anything?
> What else should I change to get it working:
> CLOCKS_PER_SEC?
> Please CC me.

Your uptime wraps to zero after 49 days. I think 'top' gets confused.

Regards,

bert

--
http://www.PowerDNS.com Versatile DNS Software & Services
http://www.tk the dot in .tk
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO

2002-04-16 22:23:39

by Andreas Dilger

Subject: Re: Why HZ on i386 is 100 ?

On Apr 16, 2002 23:34 +0200, bert hubert wrote:
> On Tue, Apr 16, 2002 at 08:12:22AM +0000, Olaf Fraczyk wrote:
> > Hi,
> > I would like to know why exactly this value was chosen.
> > Is it safe to change it to eg. 1024? Will it break anything?
> > What else should I change to get it working:
> > CLOCKS_PER_SEC?
> > Please CC me.
>
> Your uptime wraps to zero after 49 days. I think 'top' gets confused.

Trivially fixed with the existing 64-bit jiffies patches. As it is,
your uptime wraps to zero after 472 days or something like that if you
don't have the 64-bit jiffies patch, which is totally in the realm of
possibility for Linux servers.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
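
A rough check of the wrap figures mentioned in this thread, assuming jiffies is an unsigned 32-bit counter that starts at zero and advances HZ times per second (the helper name is made up for illustration):

```python
SECONDS_PER_DAY = 86_400


def wrap_days(hz: int, counter_bits: int = 32) -> float:
    """Days until a counter of counter_bits bits, ticking at hz, wraps to zero."""
    return 2 ** counter_bits / hz / SECONDS_PER_DAY


for hz in (100, 1000, 1024):
    print(f"HZ={hz}: wraps after {wrap_days(hz):.1f} days")
# HZ=100: wraps after 497.1 days
# HZ=1000: wraps after 49.7 days
# HZ=1024: wraps after 48.5 days
```

Under those assumptions HZ=100 gives about 497 days, somewhat more than the "472 days or something like that" recalled above, while HZ=1000 gives roughly the 49 days quoted earlier.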

2002-04-16 22:37:54

by Herbert Xu

Subject: Re: Why HZ on i386 is 100 ?

Andreas Dilger <[email protected]> wrote:
> On Apr 16, 2002 23:34 +0200, bert hubert wrote:
>>
>> Your uptime wraps to zero after 49 days. I think 'top' gets confused.

> Trivially fixed with the existing 64-bit jiffies patches. As it is,
> your uptime wraps to zero after 472 days or something like that if you
> don't have the 64-bit jiffies patch, which is totally in the realm of
> possibility for Linux servers.

Why are we still measuring uptime using the tick variable? Ticks != time.
Surely we should be recording the boot time somewhere (probably on a
file system), and then comparing that with the current time?
--
Debian GNU/Linux 2.2 is out! ( http://www.debian.org/ )
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2002-04-16 22:58:49

by Andreas Dilger

Subject: Re: Why HZ on i386 is 100 ?

On Apr 17, 2002 08:37 +1000, Herbert Xu wrote:
> Why are we still measuring uptime using the tick variable? Ticks != time.
> Surely we should be recording the boot time somewhere (probably on a
> file system), and then comparing that with the current time?

Er, because the 'tick' is a valid count of the actual time that the
system has been running, while the "boot time" is totally meaningless.
What if the system has no RTC, or the RTC is wrong until later in the
boot sequence when it can be set by the user/ntpd? What if you pass
daylight savings time? Does your uptime increase/decrease by an hour?

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/

2002-04-17 00:22:54

by H. Peter Anvin

Subject: Re: Why HZ on i386 is 100 ?

Followup to: <[email protected]>
By author: Alan Cox <[email protected]>
In newsgroup: linux.dev.kernel
>
> > I seem to recall from theory that the 100HZ is human dependent. Any
> > higher and you would begin to notice delays from your input until
> > whatever program you're talking to responds.
>
> Ultimately it's because Linus pulled that number out of a hat about ten years
> ago. For some workloads 1KHz is much better, for others like giant number
> crunching people actually drop it down to about 5..
>

Hardly so. 100 Hz was standard on most commercial Unices around the
time the first Linux was done...

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2002-04-17 00:33:25

by Chen, Kenneth W

Subject: RE: Why HZ on i386 is 100 ?

If you change HZ to 1000, you need to change PROC_CHANGE_PENALTY
accordingly. Otherwise, a process would get preempted before its time slice
expires. The net effect is more context switches than necessary, which
could explain the 10% difference.

-----Original Message-----
From: Davide Libenzi [mailto:[email protected]]
Sent: Tuesday, April 16, 2002 11:10 AM
To: [email protected]
Cc: Linus Torvalds; Linux Kernel Mailing List
Subject: Re: Why HZ on i386 is 100 ?

On Tue, 16 Apr 2002, David Mosberger wrote:

> >>>>> On Tue, 16 Apr 2002 10:18:18 -0700 (PDT), Davide Libenzi
<[email protected]> said:
>
> Davide> i still have pieces of paper on my desk about tests done on
> Davide> my dual piii where by hacking HZ to 1000 the kernel build
> Davide> time went from an average of 2min:30sec to an average
> Davide> 2min:43sec. that is pretty close to 10%
>
> Did you keep the timeslice roughly constant?

it was 2.5.1 time and it was still ruled by TICK_SCALE that made the
timeslice drop from 60ms ( 100HZ ) to 21ms ( 1000HZ ).

- Davide

2002-04-17 00:35:13

by jdow

Subject: Re: Why HZ on i386 is 100 ?

From: "Andreas Dilger" <[email protected]>

> On Apr 17, 2002 08:37 +1000, Herbert Xu wrote:
> > Why are we still measuring uptime using the tick variable? Ticks != time.
> > Surely we should be recording the boot time somewhere (probably on a
> > file system), and then comparing that with the current time?
>
> Er, because the 'tick' is a valid count of the actual time that the
> system has been running, while the "boot time" is totally meaningless.
> What if the system has no RTC, or the RTC is wrong until later in the
> boot sequence when it can be set by the user/ntpd? What if you pass
> daylight savings time? Does your uptime increase/decrease by an hour?

Well, Andreas, it seems like a very simple thing to define the time
quantum, "tick", differently from the resolution of the count reported
by a call to get the tick counter value. If the latter maintains a
constant resolution even if the tick time changes then all utilities
should continue to work. Of course, with a tick time resolution of 10 ms
it gets ugly when setting up a tick time of 1 ms. Ideally reporting would
have an LSB of a microsecond or even a tenth microsecond while the
increment might still be a hundredth or thousandth of a second. Of course,
that blows anything that relies on the tick counter to smithereens, I fear.

{^_^} Joanne "I STILL want a Linux suitable for multimedia applications" Dow.
[email protected] (1 ms ticks is a GREAT help for multimedia apps.)

2002-04-17 00:49:51

Subject: Re: Why HZ on i386 is 100 ?

>>>>> On Tue, 16 Apr 2002 10:18:18 -0700 (PDT), Davide Libenzi <[email protected]> said:

Davide> i still have pieces of paper on my desk about tests done on
Davide> my dual piii where by hacking HZ to 1000 the kernel build
Davide> time went from an average of 2min:30sec to an average
Davide> 2min:43sec. that is pretty close to 10%

The last time I measured timer tick overhead on ia64 it was well below
1% of overhead. I don't really like using kernel builds as a
benchmark, because there are far too many variables for the results to
have any long-term or cross-platform value. But since it's popular, I
did measure it quickly on a relatively slow (old) Itanium box: with
100Hz, the kernel compile was about 0.6% faster than with 1024Hz
(2.4.18 UP kernel).

--david

2002-04-17 00:54:54

Subject: RE: Why HZ on i386 is 100 ?

On Tue, 16 Apr 2002, Chen, Kenneth W wrote:

> If you change HZ to 1000, you need to change PROC_CHANGE_PENALTY
> accordingly. Otherwise, a process would get preempted before its time slice
> expires. The net effect is more context switches than necessary, which
> could explain the 10% difference.

that might be the case. i was not running the latsched sampler during that
test, that would have helped me in detecting extra task bounces/cs

- Davide

2002-04-17 00:57:45

by Robert Love

Subject: Re: Why HZ on i386 is 100 ?

On Tue, 2002-04-16 at 20:49, David Mosberger wrote:

> But since it's popular, I did measure it quickly on a relatively
> slow (old) Itanium box: with 100Hz, the kernel compile was about
> 0.6% faster than with 1024Hz (2.4.18 UP kernel).

One question I have always had is why 1024 and not 1000 ?

Because that is what Alpha does? It seems to me there is no reason for
a power-of-two timer value, and using 1024 vs 1000 just makes the math
and rounding more difficult.

Robert Love

2002-04-17 01:02:37

Subject: Re: Why HZ on i386 is 100 ?

On 16 Apr 2002, Robert Love wrote:

> On Tue, 2002-04-16 at 20:49, David Mosberger wrote:
>
> > But since it's popular, I did measure it quickly on a relatively
> > slow (old) Itanium box: with 100Hz, the kernel compile was about
> > 0.6% faster than with 1024Hz (2.4.18 UP kernel).
>
> One question I have always had is why 1024 and not 1000 ?
>
> Because that is what Alpha does? It seems to me there is no reason for
> a power-of-two timer value, and using 1024 vs 1000 just makes the math
> and rounding more difficult.

maybe because of the old TICK_SCALE macro ...

- Davide

2002-04-17 01:15:14

Subject: Re: Why HZ on i386 is 100 ?

On Tue, 16 Apr 2002, David Mosberger wrote:

> >>>>> On Tue, 16 Apr 2002 10:18:18 -0700 (PDT), Davide Libenzi <[email protected]> said:
>
> Davide> i still have pieces of paper on my desk about tests done on
> Davide> my dual piii where by hacking HZ to 1000 the kernel build
> Davide> time went from an average of 2min:30sec to an average
> Davide> 2min:43sec. that is pretty close to 10%
>
> The last time I measured timer tick overhead on ia64 it was well below
> 1% of overhead. I don't really like using kernel builds as a
> benchmark, because there are far too many variables for the results to
> have any long-term or cross-platform value. But since it's popular, I
> did measure it quickly on a relatively slow (old) Itanium box: with
> 100Hz, the kernel compile was about 0.6% faster than with 1024Hz
> (2.4.18 UP kernel).

uhm, this is quite interesting. it's quite possible at this point that
PROC_CHANGE_PENALTY put a high context-switch pressure in place, with
terrible cache effects. sadly i was not running the sampler that would
have helped me to detect such behaviour.

- Davide

2002-04-17 01:53:21

by Dan Mann

Subject: Re: Why HZ on i386 is 100 ?

Why not just try and modify the time slice scale? wouldn't this then
help with what you are trying to gain, while leaving other values alone
that rely on 100HZ?

Doesn't an unmodified TICK_SCALE negate most of the lowered time slice
effect you'd gain from raising HZ anyway?

Seems like Ingo did a bunch of work on trying to get the "sweet spot"
time slice quanta value already, and I don't think he did it with the HZ
value, but I could be wrong.

And if it is X Window System performance you are trying for, it's not the
kernel (flying by the seat of my pants here). I can have crappy X
performance and yet my audio never skips a beat, running at exactly the
same priority as X (though my X perf problems seem to stem from multiple
clients rendering on screen at the same time ;-). No disrespect to the X
developers, since it does a lot of things very nicely :-)

Dan

On Tue, 2002-04-16 at 20:57, Robert Love wrote:
> On Tue, 2002-04-16 at 20:49, David Mosberger wrote:
>
> > But since it's popular, I did measure it quickly on a relatively
> > slow (old) Itanium box: with 100Hz, the kernel compile was about
> > 0.6% faster than with 1024Hz (2.4.18 UP kernel).
>
> One question I have always had is why 1024 and not 1000 ?

> Because that is what Alpha does? It seems to me there is no reason for
> a power-of-two timer value, and using 1024 vs 1000 just makes the math
> and rounding more difficult.
>
> Robert Love
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2002-04-17 02:41:11

by Herbert Xu

Subject: Re: Why HZ on i386 is 100 ?

On Tue, Apr 16, 2002 at 04:56:31PM -0600, Andreas Dilger wrote:
>
> Er, because the 'tick' is a valid count of the actual time that the
> system has been running, while the "boot time" is totally meaningless.
> What if the system has no RTC, or the RTC is wrong until later in the
> boot sequence when it can be set by the user/ntpd? What if you pass
> daylight savings time? Does your uptime increase/decrease by an hour?

Tick is the number of timer interrupts that you've collected, which
may or may not be exactly 100Hz. In fact, after 400 days of operation,
the deviation from true time is likely to be above 1 hour.

Anyway, you don't need the RTC since you can always fall back to the
system clock which is no worse than before. However, if you do have
an accurate clock source then this is much better than using the tick.

If you use ntpd, then you can simply record the time on a server that
you trust, and use the tick reading at that point in time to deduce
the boot time.
--
Debian GNU/Linux 2.2 is out! ( http://www.debian.org/ )
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
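
To put a number on the drift claim: one hour of error over 400 days corresponds to a tick source that is off by roughly 100 parts per million. The sketch below shows that arithmetic together with the boot-time deduction described above; the function names and sample values are assumptions for illustration only.

```python
SECONDS_PER_DAY = 86_400


def drift_ppm(error_s: float, elapsed_s: float) -> float:
    """Clock error expressed in parts per million of the elapsed time."""
    return error_s / elapsed_s * 1_000_000


def boot_time_from_ticks(trusted_now_s: float, ticks: int, hz: int) -> float:
    """Deduce the boot timestamp from a trusted 'now' and the tick count at that moment."""
    return trusted_now_s - ticks / hz


# One hour of deviation accumulated over 400 days of uptime.
print(f"{drift_ppm(3600, 400 * SECONDS_PER_DAY):.0f} ppm")  # prints "104 ppm"

# Hypothetical reading: 360000 ticks at HZ=100 means the box booted an hour before 'now'.
print(boot_time_from_ticks(1_000_000.0, 360_000, 100))  # prints 996400.0
```

Once the boot time has been fixed this way, later uptime readings are only as good as the clock used for "now", which is the trade-off the two sides of this sub-thread disagree about.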

2002-04-17 03:19:20

by Ben Greear

Subject: Re: Why HZ on i386 is 100 ?

David Mosberger wrote:

> The last time I measured timer tick overhead on ia64 it was well below
> 1% of overhead. I don't really like using kernel builds as a
> benchmark, because there are far too many variables for the results to
> have any long-term or cross-platform value. But since it's popular, I
> did measure it quickly on a relatively slow (old) Itanium box: with
> 100Hz, the kernel compile was about 0.6% faster than with 1024Hz
> (2.4.18 UP kernel).

How hard would it be to tune HZ dynamically at run time, either through
kernel smarts, or driven from user space by some sort of daemon or other
(manual) control?

Ben

--
Ben Greear <[email protected]> <Ben_Greear AT excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear

2002-04-17 05:24:21

by Mark Mielke

Subject: Re: Why HZ on i386 is 100 ?

On Tue, Apr 16, 2002 at 08:57:09PM -0400, Robert Love wrote:
> On Tue, 2002-04-16 at 20:49, David Mosberger wrote:
> > But since it's popular, I did measure it quickly on a relatively
> > slow (old) Itanium box: with 100Hz, the kernel compile was about
> > 0.6% faster than with 1024Hz (2.4.18 UP kernel).
> One question I have always had is why 1024 and not 1000 ?
>
> Because that is what Alpha does? It seems to me there is no reason for
> a power-of-two timer value, and using 1024 vs 1000 just makes the math
> and rounding more difficult.

Only from the perspective of time displayed to a user... :-)

Of course, that may be one of the only factors...

mark

--
[email protected]/[email protected]/[email protected] __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

2002-04-17 05:35:18

by Linus Torvalds

Subject: Re: Why HZ on i386 is 100 ?

On Wed, 17 Apr 2002, Mark Mielke wrote:
> On Tue, Apr 16, 2002 at 08:57:09PM -0400, Robert Love wrote:
> >
> > Because that is what Alpha does? It seems to me there is no reason for
> > a power-of-two timer value, and using 1024 vs 1000 just makes the math
> > and rounding more difficult.
>
> Only from the perspective of time displayed to a user... :-)

No, it also makes it much easier to convert to/from the standard UNIX time
formats (ie "struct timeval" and "struct timespec") without any surprises,
because a jiffy is exactly representable in both if you have a HZ value
of 100 or 1000, but not if your HZ is 1024.

Linus
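
The point about exact representation can be checked directly: struct timeval counts microseconds and struct timespec counts nanoseconds, so a jiffy is exact only if HZ divides 10^6 and 10^9 respectively. A small sketch for HZ values of 100, 1000 and 1024 (the helper name is illustrative):

```python
from fractions import Fraction

USEC_PER_SEC = 1_000_000
NSEC_PER_SEC = 1_000_000_000


def jiffy_exactness(hz: int) -> tuple[bool, bool]:
    """Return (exact_in_timeval, exact_in_timespec) for a tick rate of hz."""
    return (USEC_PER_SEC % hz == 0, NSEC_PER_SEC % hz == 0)


for hz in (100, 1000, 1024):
    jiffy_usec = Fraction(USEC_PER_SEC, hz)  # length of one jiffy in microseconds
    print(hz, jiffy_exactness(hz), f"jiffy = {jiffy_usec} us")
# 100 (True, True) jiffy = 10000 us
# 1000 (True, True) jiffy = 1000 us
# 1024 (False, False) jiffy = 15625/16 us
```

With HZ=1024 a jiffy is 976.5625 microseconds, so every conversion has to round somewhere.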

2002-04-17 06:01:43

by Robert Love

Subject: Re: Why HZ on i386 is 100 ?

On Wed, 2002-04-17 at 01:34, Linus Torvalds wrote:

> No, it also makes it much easier to convert to/from the standard UNIX time
> formats (ie "struct timeval" and "struct timespec") without any surprises,
> because a jiffy is exactly representable in both if you have a HZ value
> of 100 or 1000, but not if your HZ is 1024.

Exactly - this was my issue. So what _was_ the rationale behind Alpha
picking 1024 (and others following)? More importantly, can we change to
1000?

Robert Love

2002-04-17 06:17:46

Subject: Re: Why HZ on i386 is 100 ?

>>>>> On 17 Apr 2002 02:01:42 -0400, Robert Love <[email protected]> said:

Robert> Exactly - this was my issue. So what _was_ the rationale
Robert> behind Alpha picking 1024 (and others following)?

Picking a timer tick is a bit like picking the color of a
window. Everybody has an opinion and there is no truly "right" choice.
I guarantee you whatever you pick, someone will come along and say:
why not X instead?

A power-of-2 value obviously makes it easy to divide by HZ.

Robert> More importantly, can we change to 1000?

On ia64, you can make it anything you want. User-level will pick up
the current value from sysconf(_SC_CLK_TCK).

--david
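
From user space the value can be queried at run time rather than hard-coded. A minimal sketch in Python, whose os.sysconf wraps the same sysconf() call on Unix-like systems; the ticks_to_seconds helper is hypothetical:

```python
import os


def ticks_to_seconds(ticks, clk_tck=None):
    """Convert a count of clock ticks to seconds.

    If clk_tck is not given, ask the running system via sysconf(_SC_CLK_TCK).
    """
    if clk_tck is None:
        clk_tck = os.sysconf("SC_CLK_TCK")
    return ticks / clk_tck


print("clock ticks per second:", os.sysconf("SC_CLK_TCK"))
print(ticks_to_seconds(250, clk_tck=100))  # prints 2.5
```

Passing clk_tck explicitly keeps the helper testable on machines that report a different tick rate.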

2002-04-17 07:55:18

by Helge Hafting

Subject: Re: Why HZ on i386 is 100 ?

David Mosberger wrote:

> The last time I measured timer tick overhead on ia64 it was well below
> 1% of overhead. I don't really like using kernel builds as a
> benchmark, because there are far too many variables for the results to
> have any long-term or cross-platform value. But since it's popular, I
> did measure it quickly on a relatively slow (old) Itanium box: with
> 100Hz, the kernel compile was about 0.6% faster than with 1024Hz
> (2.4.18 UP kernel).

Did you try a parallel build, with the number of processes at least
2-3 times the number of processors? Then you get more
of the cache-miss effects from switching processes, not
merely the overhead of the fairly fast scheduler.

Helge Hafting

2002-04-17 08:02:36

by Arjan van de Ven

Subject: Re: Why HZ on i386 is 100 ?

In article <1019023303.1670.37.camel@phantasy> you wrote:
> On Wed, 2002-04-17 at 01:34, Linus Torvalds wrote:
>
>> No, it also makes it much easier to convert to/from the standard UNIX time
>> formats (ie "struct timeval" and "struct timespec") without any surprises,
>> because a jiffy is exactly representable in both if you have a HZ value
>> of 100 or 1000, but not if your HZ is 1024.
>
> Exactly - this was my issue. So what _was_ the rationale behind Alpha
> picking 1024

I seem to remember that this was for allowing Tru64 UNIX binaries to run
as well, those expect HZ to be 1024.....

--
But when you distribute the same sections as part of a whole which is a work
based on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the entire whole,
and thus to each and every part regardless of who wrote it. [sect.2 GPL]

2002-04-17 08:04:47

by Matti Aarnio

Subject: Re: Why HZ on i386 is 100 ?

On Wed, Apr 17, 2002 at 02:01:42AM -0400, Robert Love wrote:
> On Wed, 2002-04-17 at 01:34, Linus Torvalds wrote:
> > No, it also makes it much easier to convert to/from the standard UNIX time
> > formats (ie "struct timeval" and "struct timespec") without any surprises,
> > because a jiffy is exactly representable in both if you have a HZ value
> > of 100 or 1000, but not if your HZ is 1024.
>
> Exactly - this was my issue. So what _was_ the rationale behind Alpha
> picking 1024 (and others following)? More importantly, can we change to
> 1000?

Alpha processors don't have full division hardware, they have to
iterate it one bit at a time. They do have a flash multiplier,
and a barrel-shifter. Shifts take one pipeline cycle, like
addition and subtraction. Multiply takes 6-12 depending on model,
but division takes 64...

Converting the tick to gettimeofday() seconds is faster when
the tick is power of two.

> Robert Love

/Matti Aarnio
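
The speed argument is that with HZ = 1024 the division by HZ becomes a shift and the remainder a mask. A sketch comparing the two forms; this is not the Alpha kernel code, and both function names are invented for illustration:

```python
HZ_SHIFT = 10                  # 1024 == 1 << 10
HZ_MASK = (1 << HZ_SHIFT) - 1  # 1023
USEC_PER_SEC = 1_000_000


def jiffies_to_timeval_pow2(jiffies):
    """(seconds, microseconds) for HZ=1024 using only shift, mask and multiply."""
    sec = jiffies >> HZ_SHIFT
    rem = jiffies & HZ_MASK
    usec = (rem * USEC_PER_SEC) >> HZ_SHIFT
    return sec, usec


def jiffies_to_timeval_generic(jiffies, hz):
    """Same conversion for an arbitrary HZ, which needs real divisions."""
    sec, rem = divmod(jiffies, hz)
    return sec, rem * USEC_PER_SEC // hz


print(jiffies_to_timeval_pow2(1536))           # prints (1, 500000)
print(jiffies_to_timeval_generic(1536, 1024))  # prints (1, 500000)
```

Both functions truncate the microsecond part, so they agree for every non-negative tick count; the difference is only in which hardware operations they need.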

2002-04-17 08:28:31

Subject: please merge 64-bit jiffy patches. Was Re: Why HZ on i386 is 100 ?

On Tue, Apr 16, 2002 at 10:24:26PM +0000, Andreas Dilger wrote:

> Trivially fixed with the existing 64-bit jiffies patches. As it is,
> your uptime wraps to zero after 472 days or something like that if you
> don't have the 64-bit jiffies patch, which is totally in the realm of
> possibility for Linux servers.

I feel your pain

4:26am up 482 days, 10:33, 2 users, load average: 0.04, 0.02, 0.00

On a very remote server.

So can we please merge the 64-bit jiffies patches? I sometimes think that
that is the main reason why alpha DOES have HZ=1024 - the jiffies there
don't wrap in an embarrassing way within two months :-)

Regards,

bert

--
http://www.PowerDNS.com Versatile DNS Software & Services
http://www.tk the dot in .tk
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO

2002-04-17 11:06:26

by Tim Schmielau

Subject: Re: please merge 64-bit jiffy patches.

On Wed, 17 Apr 2002, bert hubert wrote:

> I feel your pain
>
> 4:26am up 482 days, 10:33, 2 users, load average: 0.04, 0.02, 0.00
>
> On a very remote server.
>
> So can we please merge the 64-bit jiffies patches? I sometimes think that
> that is the main reason why alpha DOES have HZ=1024 - the jiffies there
> don't wrap in an embarrassing way within two months :-)
>

Rik van Riel correctly suggested merging it in 2.5 first. I have a
forward-ported version, but it has a minor locking issue on UP.

Albert Cahalan suggested getting rid of locking altogether by only updating the
high word from the timer interrupt. I will try to code this on the weekend.

I'm sorry I had no time for lobbying the merge in the last month.

Tim
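
A loose, single-threaded model of the high-word idea: the low 32 bits wrap on their own, and only the tick handler bumps a separate high word when it sees the low word return to zero. This is an illustration under those assumptions, not the patch being discussed, and it ignores how readers stay consistent with a concurrent interrupt; the class and method names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


class Jiffies64:
    """Toy model: 32-bit low word plus a high word maintained by the tick handler."""

    def __init__(self, start_low=0):
        self.low = start_low & MASK32
        self.high = 0

    def tick(self):
        """Stand-in for the timer interrupt: advance low, carry into high on wrap."""
        self.low = (self.low + 1) & MASK32
        if self.low == 0:
            self.high += 1

    def read(self):
        """Combine both words into the full 64-bit tick count."""
        return (self.high << 32) | self.low


j = Jiffies64(start_low=MASK32 - 1)  # two ticks away from wrapping
for _ in range(3):
    j.tick()
print(j.low, j.high, j.read())  # prints 1 1 4294967297
```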

2002-04-17 11:06:25

by Wakko Warner

Subject: Re: please merge 64-bit jiffy patches. Was Re: Why HZ on i386 is 100 ?

> > Trivially fixed with the existing 64-bit jiffies patches. As it is,
> > your uptime wraps to zero after 472 days or something like that if you
> > don't have the 64-bit jiffies patch, which is totally in the realm of
> > possibility for Linux servers.
>
> I feel your pain
>
> 4:26am up 482 days, 10:33, 2 users, load average: 0.04, 0.02, 0.00
>
> On a very remote server.
>
> So can we please merge the 64-bit jiffies patches? I sometimes think that
> that is the main reason why alpha DOES have HZ=1024 - the jiffies there
> don't wrap in an embarrassing way within two months :-)

Yes, but being that the alpha is 64-bit, it doesn't wrap at 49.7 days. I've
seen mine at 60 days or so.

--
Lab tests show that use of micro$oft causes cancer in lab animals

2002-04-17 11:12:31

Subject: Re: please merge 64-bit jiffy patches.

On Wed, Apr 17, 2002 at 11:07:49AM +0000, Tim Schmielau wrote:

> Rik van Riel correctly suggested to merge it in 2.5 first. I have a
> forward-ported version, but it has a minor locking issue on UP.

I think that would be right. It touches a lot of code.

> Albert Cahalan suggested getting rid of locking altogether by only updating the
> high word from the timer interrupt. I will try to code this on the weekend.

Smart.

> I'm sorry I had no time for lobbying the merge in the last month.

Anything I can do to help, just let me know. Right now I am actually facing
costs because of this issue, so I am very much in favour of saving those
costs 500 days from now :-)

Regards,

bert

--
http://www.PowerDNS.com Versatile DNS Software & Services
http://www.tk the dot in .tk
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO

2002-04-17 11:14:49

by Martin Dalecki

Subject: Re: Why HZ on i386 is 100 ?

Linus Torvalds wrote:
>
> On Wed, 17 Apr 2002, Mark Mielke wrote:
>
>>On Tue, Apr 16, 2002 at 08:57:09PM -0400, Robert Love wrote:
>>
>>>Because that is what Alpha does? It seems to me there is no reason for
>>>a power-of-two timer value, and using 1024 vs 1000 just makes the math
>>>and rounding more difficult.
>>
>>Only from the perspective of time displayed to a user... :-)
>
>
> No, it also makes it much easier to convert to/from the standard UNIX time
> formats (ie "struct timeval" and "struct timespec") without any surprises,
> because a jiffy is exactly representable in both if you have a HZ value
> of 100 or 1000, but not if your HZ is 1024.

And finally, 100HZ is (by accident) quite right on the perceptive
threshold of a human, which is about 0.15 of a second :).

2002-04-17 12:36:47

by Bill Davidsen

Subject: Re: please merge 64-bit jiffy patches.

On Wed, 17 Apr 2002, bert hubert wrote:

> Anything I can do to help, just let me know. Right now I am actually facing
> costs because of this issue, so I am very much in favour of saving those
> costs 500 days from now :-)

Other than a few things reporting wrong numbers, what costs do you
anticipate? I have servers in six USA states (four timezones) and I
haven't seen any real ill-effect on this.

Back in the Xenix days we had servers on three continents and they were
doing critical applications. There were serious costs there, people did
have to be on site for a reboot.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-04-17 12:44:10

by Kent Borg

Subject: Re: Why HZ on i386 is 100 ?

On Wed, Apr 17, 2002 at 08:37:43AM +1000, Herbert Xu wrote:
> Why are we still measuring uptime using the tick variable? Ticks != time.
> Surely we should be recording the boot time somewhere (probably on a
> file system), and then comparing that with the current time?

It depends on the meaning of "is", er, oops, I mean: it depends on the
meaning of "uptime".

The notebook I am typing on at this moment was last booted just about
exactly 8 days ago (judging from the timestamp on /var/log/dmesg) but
in a cat-like way it spends a lot of its time asleep and so top
reports an uptime of only "4 days, 2:42".

Which is correct? I suggest that the smaller number is closer to
correct because that is roughly the amount of time the system has
actually spent running.

-kb, the Kent who expects this question to get more complicated as the
new suspend gets more and more clever and if the kernel ever starts
seriously catnapping on its own.

2002-04-17 12:42:34

Subject: Re: please merge 64-bit jiffy patches.

On Wed, Apr 17, 2002 at 08:33:34AM -0400, Bill Davidsen wrote:

> Other than a few things reporting wrong numbers, what costs do you
> anticipate? I have servers in six USA states (four timezones) and I
> haven't seen any real ill-effect on this.

I have been advised by Alan to treat the jiffy wraparound as a scheduled
maintenance event. I tend to trust bearded kernel hackers from Wales.

Regards,

bert

--
http://www.PowerDNS.com Versatile DNS Software & Services
http://www.tk the dot in .tk
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO

2002-04-17 15:00:42

by Bill Davidsen

Subject: Re: please merge 64-bit jiffy patches.

On Wed, 17 Apr 2002, bert hubert wrote:

> On Wed, Apr 17, 2002 at 08:33:34AM -0400, Bill Davidsen wrote:
>
> > Other than a few things reporting wrong numbers, what costs do you
> > anticipate? I have servers in six USA states (four timezones) and I
> > haven't seen any real ill-effect on this.
>
> I have been advised by Alan to treat the jiffy wraparound as a scheduled
> maintenance event. I tend to trust bearded kernel hackers from Wales.

Alan has to be conservative, since he wants to avoid giving potentially
damaging advice to someone. However, since you can take a reboot at your
convenience and schedule a year in advance, I still don't see the great
cost. If you have an app which must be up 7x24 and don't have seamless
backup you have other problems more serious than timer wrap.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-04-22 16:09:43

by Pavel Machek

Subject: Re: Why HZ on i386 is 100 ?

Hi!

> Davide> i still have pieces of paper on my desk about tests done on
> Davide> my dual piii where by hacking HZ to 1000 the kernel build
> Davide> time went from an average of 2min:30sec to an average
> Davide> 2min:43sec. that is pretty close to 10%
>
> The last time I measured timer tick overhead on ia64 it was well below
> 1% of overhead. I don't really like using kernel builds as a
> benchmark, because there are far too many variables for the results to
> have any long-term or cross-platform value. But since it's popular, I
> did measure it quickly on a relatively slow (old) Itanium box: with
> 100Hz, the kernel compile was about 0.6% faster than with 1024Hz
> (2.4.18 UP kernel).

.5% still looks like a lot to me. Good compiler optimization is .5% on
average...

And think what it does with old 386sx.. Maybe time for those "tick on demand"
patches?
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

2002-04-22 17:20:51

by John Alvord

Subject: Re: Why HZ on i386 is 100 ?

On Sun, 21 Apr 2002, Pavel Machek wrote:

> Hi!
>
> > Davide> i still have pieces of paper on my desk about tests done on
> > Davide> my dual piii where by hacking HZ to 1000 the kernel build
> > Davide> time went from an average of 2min:30sec to an average
> > Davide> 2min:43sec. that is pretty close to 10%
> >
> > The last time I measured timer tick overhead on ia64 it was well below
> > 1% of overhead. I don't really like using kernel builds as a
> > benchmark, because there are far too many variables for the results to
> > have any long-term or cross-platform value. But since it's popular, I
> > did measure it quickly on a relatively slow (old) Itanium box: with
> > 100Hz, the kernel compile was about 0.6% faster than with 1024Hz
> > (2.4.18 UP kernel).
>
> .5% still looks like a lot to me. Good compiler optimization is .5% on
> average...
>
> And think what it does with old 386sx.. Maybe time for those "tick on demand"
> patches?

Doesn't IBM have a tickless patch? It is useful when demonstrating 10,000
virtual Linux machines on a single system.

john alvord

2002-04-22 17:25:00

Subject: Re: Why HZ on i386 is 100 ?

>>>>> On Sun, 21 Apr 2002 18:00:22 +0000, Pavel Machek <[email protected]> said:

Pavel> .5% still looks like a lot to me. Good compiler optimization
Pavel> is .5% on average...

Umh, but those optimizations are interesting only if they're
cumulative, i.e., once you've got 10 of them and they make a *total*
difference of 5% (actually, I'm doubtful anyone really notices
differences of 20-30% other than for benchmarking purposes... ;-).

For me, 1% is the magic threshold. If we find real apps that get a
higher penalty than that, I'd either lower the HZ or see if we can
tune the timer tick to be within a safe margin.

No matter what, though, higher tick rate clearly incurs somewhat
higher overhead. The benefit is lower application-level response time
and finer-granularity timeouts. I assume Robert has all the
benchmarks to show that. ;-)
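The granularity point can be made concrete with a little arithmetic. This is a simplified sketch, assuming a timeout is simply rounded up to a whole number of ticks; ms_to_ticks and ticks_to_ms are hypothetical helpers for this example, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Round a timeout in milliseconds up to whole ticks at a given HZ. */
static uint32_t ms_to_ticks(uint32_t ms, uint32_t hz)
{
    assert(hz > 0);
    return (uint32_t)(((uint64_t)ms * hz + 999) / 1000);
}

/* Convert a tick count back to milliseconds (rounded down). */
static uint32_t ticks_to_ms(uint32_t ticks, uint32_t hz)
{
    assert(hz > 0);
    return (uint32_t)((uint64_t)ticks * 1000 / hz);
}
```

Under this model a 15 ms timeout becomes 2 ticks (20 ms) at HZ=100 but 15 ticks (15 ms) at HZ=1000.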

--david

2002-04-22 21:53:09

Subject: Re: Why HZ on i386 is 100 ?

John Alvord wrote:
>
> On Sun, 21 Apr 2002, Pavel Machek wrote:
>
> > Hi!
> >
> > > Davide> i still have pieces of paper on my desk about tests done on
> > > Davide> my dual piii where by hacking HZ to 1000 the kernel build
> > > Davide> time went from an average of 2min:30sec to an average
> > > Davide> 2min:43sec. that is pretty close to 10%
> > >
> > > The last time I measured timer tick overhead on ia64 it was well below
> > > 1% of overhead. I don't really like using kernel builds as a
> > > benchmark, because there are far too many variables for the results to
> > > have any long-term or cross-platform value. But since it's popular, I
> > > did measure it quickly on a relatively slow (old) Itanium box: with
> > > 100Hz, the kernel compile was about 0.6% faster than with 1024Hz
> > > (2.4.18 UP kernel).
> >
> > .5% still looks like a lot to me. Good compiler optimization is .5% on
> > average...
> >
> > And think what it does with old 386sx.. Maybe time for those "tick on demand"
> > patches?
>
> Doesn't IBM have a tickless patch.. useful when demonstrating 10,000
> virtual linux machines on a single system.

Please folks. When can we put the "tick on demand" thing to bed? If in
doubt, get the patch from the high-res-timers sourceforge site (see
signature for the URL) and try it. Overhead rises with system load and
passes that of the ticked system at relatively light loads. Just what we
want: very low overhead idle systems!

The problem is in accounting (or time slicing if you prefer) where we
need to start a timer each time a task is context switched to, and stop
it when the task is switched away. The overhead is purely in the set up
and tear down. MOST of these never expire.

-g
>
> john alvord
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml

2002-04-22 23:06:23

by J.D. Bakker

Subject: Re: Why HZ on i386 is 100 ?

At 14:52 -0700 22-04-2002, george anzinger wrote:
>John Alvord wrote:
>> On Sun, 21 Apr 2002, Pavel Machek wrote:
>>> And think what it does with old 386sx.. Maybe time for those
>>> "tick on demand" patches?
>>
>> Doesn't IBM have a tickless patch.. useful when demonstrating 10,000
>> virtual linux machines on a single system.
>
>Please folks. When can we put the "tick on demand" thing to bed? If in
>doubt, get the patch from the high-res-timers sourceforge site (see
>signature for the URL) and try it. Overhead becomes higher with system
>load passing the ticked system at relatively light loads. Just what we
>want, very low overhead idle systems!

During idle, the current monitors on our StrongARM-based low power
testbed show a distinct 100Hz beat. A significant portion of idle
power consumption can be attributed to the timer interrupt. IIRC the
IBM LinuxWatch people came to a similar conclusion.

In some cases we definitely do want very low overhead idle systems.
And of course on ARM systems context switches are relatively
expensive anyway, due to the need to flush the (virtually
indexed/tagged) caches.

JDB
[not that I'm proposing to inflict this on the mainline kernel]
--
LART. 250 MIPS under one Watt. Free hardware design files.
http://www.lart.tudelft.nl/

2002-04-22 23:27:41

by Anton Blanchard

Subject: Re: Why HZ on i386 is 100 ?

> Please folks. When can we put the "tick on demand" thing to bed? If in
> doubt, get the patch from the high-res-timers sourceforge site (see
> signature for the URL) and try it. Overhead becomes higher with system
> load passing the ticked system at relatively light loads. Just what we
> want, very low overhead idle systems!
>
> The problem is in accounting (or time slicing if you prefer) where we
> need to start a timer each time a task is context switched to, and stop
> it when the task is switched away. The overhead is purely in the set up
> and tear down. MOST of these never expire.

Did you work out where exactly the overhead was and if it was hardware
specific? On ppc for example updating the timer is just a write to a cpu
register.

Anton

2002-04-23 06:50:53

by Alan

Subject: Re: Why HZ on i386 is 100 ?

> The problem is in accounting (or time slicing if you prefer) where we
> need to start a timer each time a task is context switched to, and stop
> it when the task is switched away. The overhead is purely in the set up
> and tear down. MOST of these never expire.

Done properly on many platforms a variable tick is very very easy and also
very efficient to handle. X86 is a particular problem case because the timer
is so expensive to fiddle with.

2002-04-23 07:17:28

by Andi Kleen

Subject: Re: Why HZ on i386 is 100 ?

Alan Cox <[email protected]> writes:

> > The problem is in accounting (or time slicing if you prefer) where we
> > need to start a timer each time a task is context switched to, and stop
> > it when the task is switched away. The overhead is purely in the set up
> > and tear down. MOST of these never expire.
>
> Done properly on many platforms a variable tick is very very easy and also
> very efficient to handle. X86 is a particular problem case because the timer
> is so expensive to fiddle with

Depends. On modern x86 you can either use the local APIC timer or
the mmtimers (ftp://download.intel.com/ial/home/sp/mmts097.pdf -
should be in newer x86 chipsets). Both should be better than the
8254 timer and are also not expensive to work with.

-Andi

2002-04-23 19:04:58

Subject: Re: Why HZ on i386 is 100 ?

Anton Blanchard wrote:
>
>
> > Please folks. When can we put the "tick on demand" thing to bed? If in
> > doubt, get the patch from the high-res-timers sourceforge site (see
> > signature for the URL) and try it. Overhead becomes higher with system
> > load passing the ticked system at relatively light loads. Just what we
> > want, very low overhead idle systems!
> >
> > The problem is in accounting (or time slicing if you prefer) where we
> > need to start a timer each time a task is context switched to, and stop
> > it when the task is switched away. The overhead is purely in the set up
> > and tear down. MOST of these never expire.
>
> Did you work out where exactly the overhead was and if it was hardware
> specific? On ppc for example updating the timer is just a write to a cpu
> register.

It has nothing to do with hardware. The overhead is in putting a timer
entry in the list and then removing it. Almost all timers are canceled
before they expire. Even with the O(1) timer list, this takes time, and
when done at the context switch rate the time mounts rapidly. And we
need at least one timer when we switch to a task. In the test code I
only start a "slice" timer. This means that a task that wants an
execution time signal may find the signal delayed by as much as a slice,
but it does keep the overhead lower.
>
> Anton

--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml

2002-04-23 19:11:47

Subject: Re: Why HZ on i386 is 100 ?

Andi Kleen wrote:
>
> Alan Cox <[email protected]> writes:
>
> > > The problem is in accounting (or time slicing if you prefer) where we
> > > need to start a timer each time a task is context switched to, and stop
> > > it when the task is switched away. The overhead is purely in the set up
> > > and tear down. MOST of these never expire.
> >
> > Done properly on many platforms a variable tick is very very easy and also
> > very efficient to handle. X86 is a particular problem case because the timer
> > is so expensive to fiddle with
>
> Depends. On modern x86 you can either use the local APIC timer or
> the mmtimers (ftp://download.intel.com/ial/home/sp/mmts097.pdf -
> should be in newer x86 chipsets). Both should be better than the
> 8254 timer and are also not expensive to work with.

I must not be making myself clear :) The overhead has nothing to do
with hardware. It is all timer list insertion and deletion. The
problem is that we need to do this at context switch rates, which are
MUCH higher than tick rates and, even with the O(1) insertion code,
cause the overhead to increase above the ticked overhead.

-g
>
> -Andi

--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml

2002-04-23 19:27:03

Subject: Re: Why HZ on i386 is 100 ?

Andi Kleen wrote:
>
> Alan Cox <[email protected]> writes:
>
> > > The problem is in accounting (or time slicing if you prefer) where we
> > > need to start a timer each time a task is context switched to, and stop
> > > it when the task is switched away. The overhead is purely in the set up
> > > and tear down. MOST of these never expire.
> >
> > Done properly on many platforms a variable tick is very very easy and also
> > very efficient to handle. X86 is a particular problem case because the timer
> > is so expensive to fiddle with
>
> Depends. On modern x86 you can either use the local APIC timer or
> the mmtimers (ftp://download.intel.com/ial/home/sp/mmts097.pdf -
> should be in newer x86 chipsets). Both should be better than the
> 8254 timer and are also not expensive to work with.

I just looked at the mmtimers. Looks like the right idea but a bit
overblown. I would prefer an interrupt generated by a compare to the
TSC all on board the cpu chip. This would eliminate the I/O overhead.
Still the 8-bit PIT is the pits.

When can we expect to see this in a real cpu?

-g
>
> -Andi

--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml

2002-04-23 19:36:01

by Andi Kleen

Subject: Re: Why HZ on i386 is 100 ?

On Tue, Apr 23, 2002 at 12:24:06PM -0700, george anzinger wrote:
> Andi Kleen wrote:
> >
> > Alan Cox <[email protected]> writes:
> >
> > > > The problem is in accounting (or time slicing if you prefer) where we
> > > > need to start a timer each time a task is context switched to, and stop
> > > > it when the task is switched away. The overhead is purely in the set up
> > > > and tear down. MOST of these never expire.
> > >
> > > Done properly on many platforms a variable tick is very very easy and also
> > > very efficient to handle. X86 is a particular problem case because the timer
> > > is so expensive to fiddle with
> >
> > Depends. On modern x86 you can either use the local APIC timer or
> > the mmtimers (ftp://download.intel.com/ial/home/sp/mmts097.pdf -
> > should be in newer x86 chipsets). Both should be better than the
> > 8254 timer and are also not expensive to work with.
>
> I just looked at the mmtimers. Looks like the right idea but a bit
> overblown. I would prefer an interrupt generated by a compare to the
> TSC all on board the cpu chip. This would eliminate the I/O overhead.

That's the local APIC timer. Pretty much all modern x86 have it.
But at least Microsoft warns against using them for high-precision
timekeeping on their mmtimer page "due to inaccuracy and
frequent silicon bugs" (and I guess they have the data for that).

For scheduling and time accounting it seems to work reasonably though,
even though I've had some problems with inaccuracies (e.g. when you
instrument both 8254 and apic timer and log the TSCs there are sometimes
drifts)

The Linux local APIC timer setup could probably also be improved, for
example the 16 multiplier is a bit dubious and the calibration does not
look very robust.

> When can we expect to see this in a real cpu?

mmtimers? They are in the chipset, not in the CPU.
They are in some modern Intel and AMD chipsets already for example and
Microsoft is pushing them too so I guess they will be soon in all new
chipsets.

-Andi

2002-04-23 22:43:58

by Albert D. Cahalan

Subject: Re: Why HZ on i386 is 100 ?

Matti Aarnio writes:
> On Wed, Apr 17, 2002 at 02:01:42AM -0400, Robert Love wrote:
>> On Wed, 2002-04-17 at 01:34, Linus Torvalds wrote:

>>> No, it also makes it much easier to convert to/from the standard UNIX time
>>> formats (ie "struct timeval" and "struct timespec") without any surprises,
>>> because a jiffy is exactly representable in both if you have a HZ value
>>> of 100 or 1000, but not if your HZ is 1024.
>>
>> Exactly - this was my issue. So what _was_ the rationale behind Alpha
>> picking 1024 (and others following)? More importantly, can we change to
>> 1000?
>
> Alpha processors don't have full division hardware, they have to
> iterate it one bit at the time. They do have a flash multiplier,
> and a barrel-shifter. Shifts take one pipeline cycle, like
> addition and subtraction. Multiply takes 6-12 depending on model,
> but division takes 64...

Division by 1000 is a UMULH followed by a right shift.
So maybe it costs you one cycle more than division by 1024 would.
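The multiply-high trick can be sketched in portable C. This is a 32-bit analogue for illustration only (Alpha's UMULH returns the high 64 bits of a 64x64 multiply); the helper names are made up, and the constant 0x10624DD3 is ceil(2^38 / 1000), which gives exact results for every 32-bit input.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of a 32x32 -> 64 bit multiply (stand-in for UMULH). */
static uint32_t mulhi32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* n / 1000 without a divide: multiply by ceil(2^38 / 1000), keep the
 * high word (a shift by 32), then shift right by the remaining 6 bits. */
static uint32_t div1000(uint32_t n)
{
    return mulhi32(n, 0x10624DD3u) >> 6;
}

/* Spot-check the trick against ordinary division. */
static void div1000_selfcheck(void)
{
    for (uint32_t n = 0; n < 100000u; n += 7u)
        assert(div1000(n) == n / 1000u);
    assert(div1000(UINT32_MAX) == UINT32_MAX / 1000u);
}
```

Division by 1024 would be the single shift `n >> 10`, so the difference is the one multiply, which matches the "one cycle more" estimate only on hardware where the multiplier is fast.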

2002-04-24 01:23:44

by Alan

Subject: Re: Why HZ on i386 is 100 ?

> I must not be making myself clear :) The overhead has nothing to do
> with hardware. It is all timer list insertion and deletion. The
> problem is that we need to do this at context switch rates, which are
> MUCH higher than tick rates and, even with the O(1) insertion code,
> cause the overhead to increase above the ticked overhead.

I remain unconvinced. Firstly the timer changes do not have to
occur at schedule rate unless your implementation is incredibly naive.
Secondly for the specific schedule case done that way, it would be even more
naive to use the standard timer api over a single compare to get the
timer list versus schedule clock.

2002-04-24 17:25:38

by Maciej W. Rozycki

Subject: Re: Why HZ on i386 is 100 ?

On Tue, 23 Apr 2002, Andi Kleen wrote:

> That's the local APIC timer. Pretty much all modern x86 have it.
> But at least Microsoft warns against using them for high-precision
> timekeeping on their mmtimer page "due to inaccuracy and
> frequent silicon bugs" (and I guess they have the data for that)

That's nothing new -- I recall a problem of missing half a tick each
time when hardware reloads the timer after reaching zero with certain
revisions of Pentium CPUs. It is documented in the specification update.

> The Linux local APIC timer setup could probably also be improved, for
> example the 16 multiplier is a bit dubious and the calibration does not
> look very robust.

When fiddling with the predivider, please keep in mind the i82489DX only
supports 2, 4, 8 and 16 as dividers and you may set up 1 (i.e. no
division) but in a different way -- by setting LVTT appropriately (use
SET_APIC_TIMER_BASE(APIC_TIMER_BASE_CLKIN)).

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2002-04-24 20:21:27

Subject: Re: Why HZ on i386 is 100 ?

Alan Cox wrote:
>
> > I must not be making myself clear :) The overhead has nothing to do
> > with hardware. It is all timer list insertion and deletion. The
> > problem is that we need to do this at context switch rates, which are
> > MUCH higher than tick rates and, even with the O(1) insertion code,
> > cause the overhead to increase above the ticked overhead.
>
> I remain unconvinced. Firstly the timer changes do not have to
> occur at schedule rate unless your implementation is incredibly naive.

OK, I'll bite, how do you stop a task at the end of its slice if you
don't set up a timer event for that time?

> Secondly for the specific schedule case done that way, it would be even more
> naive to use the standard timer api over a single compare to get the
> timer list versus schedule clock.

I guess it is my day to be naive :) What are you suggesting here?

--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml

2002-04-27 20:09:33

by Alan

Subject: Re: Why HZ on i386 is 100 ?

> > I remain unconvinced. Firstly the timer changes do not have to
> > occur at schedule rate unless your implementation is incredibly naive.
>
> OK, I'll bite, how do you stop a task at the end of its slice if you
> don't set up a timer event for that time?

At high scheduling rate you task switch more often than you hit the timer,
so you want to handle it in a lazy manner most of the time. I.e., so long as
the timer goes off before the time slice expires, why frob it?

> > Secondly for the specific schedule case done that way, it would be even more
> > naive to use the standard timer api over a single compare to get the
> > timer list versus schedule clock.
>
> I guess it is my day to be naive :) What are you suggesting here?

At the point you think about setting the timer register you do

	next_clock = first_of(timers->head, next_timeslice);
	if (before(next_clock, current_clock)) {
		/* next_clock is earlier than current_clock: move the timer to it */
		current_clock = next_clock;
		set_timeout(next_clock);
	}
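Alan's fragment can be fleshed out into a self-contained sketch. Everything here is a guess at the intent rather than real kernel code: the struct timer_hw type, the reprograms counter, and the maybe_reprogram name are hypothetical, and set_timeout only records the value instead of touching hardware.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tick_t;

/* Wrap-safe "a is earlier than b". */
static int before(tick_t a, tick_t b)
{
    return (int32_t)(a - b) < 0;
}

/* The earlier of two expiry times. */
static tick_t first_of(tick_t a, tick_t b)
{
    return before(a, b) ? a : b;
}

/* Stand-in for the timer hardware (hypothetical). */
struct timer_hw {
    tick_t programmed;    /* expiry the timer is currently set to */
    unsigned reprograms;  /* counts the expensive hardware writes */
};

static void set_timeout(struct timer_hw *hw, tick_t when)
{
    hw->programmed = when;
    hw->reprograms++;
}

/* Called where the timer register would be set: only touch the hardware
 * when the new deadline is earlier than the one already programmed.
 * Returns 1 if the timer was reprogrammed, 0 if it was left alone. */
static int maybe_reprogram(struct timer_hw *hw, tick_t first_timer,
                           tick_t next_timeslice)
{
    assert(hw);
    tick_t next_clock = first_of(first_timer, next_timeslice);

    if (before(next_clock, hw->programmed)) {
        set_timeout(hw, next_clock);
        return 1;
    }
    return 0;
}
```

In this model, a context switch whose slice ends after the already-programmed expiry costs one compare and no hardware access, which is the lazy behaviour being argued for.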

2002-04-28 06:03:57

Subject: Re: Why HZ on i386 is 100 ?

Alan Cox wrote:
>
> > > I remain unconvinced. Firstly the timer changes do not have to
> > > occur at schedule rate unless your implementation is incredibly naive.
> >
> > OK, I'll bite, how do you stop a task at the end of its slice if you
> > don't set up a timer event for that time?
>
> At high scheduling rate you task switch more often than you hit the timer,
> so you want to handle it in a lazy manner most of the time. I.e., so long as
> the timer goes off before the time slice expires, why frob it?

So then we test for this condition (avoiding races, of course) and if
so, what? We will have a timer interrupt prior to the slice end, and
will have to make this decision all over again. However, the real rub
is that we have to keep track of elapsed time and account for that (i.e.
shorten the remaining slice) not only in the timer interrupt, but each
context switch. We are still doing more work each schedule and making
it "smaller" just puts off the inevitable, i.e. at some level of
scheduling activity we will accumulate more time in this accounting code
than in the current "flat" or constant overhead way of doing things.
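The crossover argument can be written down as a toy cost model. This is only a sketch of the reasoning: it assumes a fixed cost per tick and a fixed extra accounting cost per context switch, and the function names and any numbers fed into them are invented for illustration, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Ticked kernel: overhead per second is constant, HZ ticks times the
 * cost of one tick. */
static uint64_t ticked_overhead_ns(uint64_t hz, uint64_t tick_cost_ns)
{
    return hz * tick_cost_ns;
}

/* Tick-on-demand kernel: overhead per second grows with the context
 * switch rate. */
static uint64_t per_switch_overhead_ns(uint64_t switches_per_sec,
                                       uint64_t switch_cost_ns)
{
    return switches_per_sec * switch_cost_ns;
}

/* Switch rate at which the per-switch scheme costs as much as ticking. */
static uint64_t break_even_switch_rate(uint64_t hz, uint64_t tick_cost_ns,
                                       uint64_t switch_cost_ns)
{
    assert(switch_cost_ns > 0);
    return hz * tick_cost_ns / switch_cost_ns;
}
```

With made-up costs of 5000 ns per tick at HZ=100 and 500 ns of extra work per switch, the two schemes break even at 1000 switches per second. Making the per-switch work cheaper raises that threshold but, in this model, never removes it.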
>

--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml

2002-04-28 09:25:44

by Alan

Subject: Re: Why HZ on i386 is 100 ?

> so, what? We will have a timer interrupt prior to the slice end, and
> will have to make this decision all over again. However, the real rub

Only on unusual occasions.

> is that we have to keep track of elapsed time and account for that (i.e.
> shorten the remaining slice) not only in the timer interrupt, but each

We do anyway

Alan

2002-04-28 17:35:12

Subject: Re: Why HZ on i386 is 100 ?

Alan Cox wrote:
>
> > so, what? We will have a timer interrupt prior to the slice end, and
> > will have to make this decision all over again. However, the real rub
>
> Only on unusual occasions.
>
> > is that we have to keep track of elapsed time and account for that (i.e.
> > shorten the remaining slice) not only in the timer interrupt, but each
>
> We do anyway

Yes, but now we do all this in the timer tick, not in schedule(). This
occurs much less often.
>
> Alan

--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml

2002-04-28 18:40:46

by Alan

Subject: Re: Why HZ on i386 is 100 ?

> > We do anyway
>
> Yes, but now we do all this in the timer tick, not in schedule(). This
> occurs much less often.

Well in the timer tick code we already hold the locks needed to check
the front of the timer queue safely, we already have current and the top
timer needing to touch cache (current for accounting stats at the least).
So that's what, an extra compare and cmov: 1 clock, maybe 2?

2002-04-28 21:51:05