Hi,
I would like to know why exactly this value was chosen.
Is it safe to change it to, e.g., 1024? Will it break anything?
What else should I change to get it working: CLOCKS_PER_SEC?
Please CC me.
Regards,
Olaf Fraczyk
On Tue, Apr 16, 2002 at 09:47:48AM +0200, Olaf Fraczyk wrote:
> Hi,
> I would like to know why exactly this value was choosen.
> Is it safe to change it to eg. 1024? Will it break anything?
> What else should I change to get it working:
> CLOCKS_PER_SEC?
> Please CC me.
> Regards,
> Olaf Fraczyk
I tried a few times running with HZ == 1024 for some testing (or I guess
just to see what happened). I didn't see any problems, even without the
obscure CLOCKS_PER_SEC ELF business.
Cheers,
Bill
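As a quick check of the user-space side (a minimal sketch, assuming a
glibc-based system): CLOCKS_PER_SEC is fixed at 1000000 by POSIX (XSI) and
does not follow the kernel's HZ, while the tick rate that times() and
friends use is what sysconf(_SC_CLK_TCK) reports.

#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	/* Fixed at 1000000 by POSIX (XSI); independent of the kernel's HZ. */
	printf("CLOCKS_PER_SEC = %ld\n", (long)CLOCKS_PER_SEC);

	/* The clock-tick rate exported to user space for times() et al. */
	printf("_SC_CLK_TCK    = %ld\n", sysconf(_SC_CLK_TCK));
	return 0;
}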
I remember seeing somewhere that Unix System VII used to have HZ set to 60
on the machines built in the '70s. I wonder if today's Pentium IIIs and IVs
should still use an HZ of 100, though their internal clocks are in GHz.
I think some things in the kernel may be tuned for the value of HZ; these
things would be arch-specific.
Increasing the HZ on your system should change the scheduling behaviour:
it could lead to more aggressive scheduling and could affect the
behaviour of the VM subsystem if scheduling happens more frequently. I am
just guessing, I do not know.
Changing it, though trivial, would require a good look at all the code
that uses HZ.
Comments,
Balbir
|-----Original Message-----
|From: [email protected]
|[mailto:[email protected]]On Behalf Of William Lee
|Irwin III
|Sent: Tuesday, April 16, 2002 1:45 PM
|To: Olaf Fraczyk
|Cc: [email protected]
|Subject: Re: Why HZ on i386 is 100 ?
|
|
|On Tue, Apr 16, 2002 at 09:47:48AM +0200, Olaf Fraczyk wrote:
|> Hi,
|> I would like to know why exactly this value was choosen.
|> Is it safe to change it to eg. 1024? Will it break anything?
|> What else should I change to get it working:
|> CLOCKS_PER_SEC?
|> Please CC me.
|> Regards,
|> Olaf Fraczyk
|
|I tried a few times running with HZ == 1024 for some testing (or I guess
|just to see what happened). I didn't see any problems, even without the
|obscure CLOCKS_PER_SEC ELF business.
|
|
|Cheers,
|Bill
On Tue, 2002-04-16 at 09:18, BALBIR SINGH wrote:
> I remember seeing somewhere unix system VII used to have HZ set to 60
> for the machines built in the 70's. I wonder if todays pentium iiis and ivs
> should still use HZ of 100, though their internal clock is in GHz.
>
> I think somethings in the kernel may be tuned for the value of HZ, these
> things would be arch specific.
>
> Increasing the HZ on your system should change the scheduling behaviour,
> it could lead to more aggresive scheduling and could affect the
> behaviour of the VM subsystem if scheduling happens more frequently. I am
> just guessing, I do not know.
>
I remember reading that a higher HZ value will make your machine more
responsive, but will also mean that each running process will have a
smaller CPU time slice and that the kernel will spend more CPU time
scheduling at the expense of processes.
HTH
Liam Girdwood
> Changing though trivial would require a good look at all the code that
> uses HZ.
>
> Comments,
> Balbir
>
> |-----Original Message-----
> |From: [email protected]
> |[mailto:[email protected]]On Behalf Of William Lee
> |Irwin III
> |Sent: Tuesday, April 16, 2002 1:45 PM
> |To: Olaf Fraczyk
> |Cc: [email protected]
> |Subject: Re: Why HZ on i386 is 100 ?
> |
> |
> |On Tue, Apr 16, 2002 at 09:47:48AM +0200, Olaf Fraczyk wrote:
> |> Hi,
> |> I would like to know why exactly this value was choosen.
> |> Is it safe to change it to eg. 1024? Will it break anything?
> |> What else should I change to get it working:
> |> CLOCKS_PER_SEC?
> |> Please CC me.
> |> Regards,
> |> Olaf Fraczyk
> |
> |I tried a few times running with HZ == 1024 for some testing (or I guess
> |just to see what happened). I didn't see any problems, even without the
> |obscure CLOCKS_PER_SEC ELF business.
> |
> |
> |Cheers,
> |Bill
--
Liam Girdwood
[email protected] (Work)
[email protected] (Home)
On 2002.04.16 12:29 Liam Girdwood wrote:
> On Tue, 2002-04-16 at 09:18, BALBIR SINGH wrote:
> > I remember seeing somewhere unix system VII used to have HZ set to
> 60
> > for the machines built in the 70's. I wonder if todays pentium iiis
> and ivs
> > should still use HZ of 100, though their internal clock is in GHz.
> >
> > I think somethings in the kernel may be tuned for the value of HZ,
> these
> > things would be arch specific.
> >
> > Increasing the HZ on your system should change the scheduling
> behaviour,
> > it could lead to more aggresive scheduling and could affect the
> > behaviour of the VM subsystem if scheduling happens more frequently.
> I am
> > just guessing, I do not know.
> >
>
> I remember reading that a higher HZ value will make your machine more
> responsive, but will also mean that each running process will have a
> smaller CPU time slice and that the kernel will spend more CPU time
> scheduling at the expense of processes.
>
Has anyone measured this?
This shouldn't be a big problem, because some architectures use the value
1024, e.g. Alpha and IA-64.
And today's Intel/AMD 32-bit processors are as fast as Alphas were 1-2
years ago.
Regards,
Olaf
>> -----Original Message-----
>> From: Olaf Fraczyk [mailto:[email protected]]
>> Sent: mardi 16 avril 2002 12:02
>> To: Liam Girdwood
>> Cc: BALBIR SINGH; William Lee Irwin III; [email protected]
>> Subject: Re: Why HZ on i386 is 100 ?
>>
>>
>> On 2002.04.16 12:29 Liam Girdwood wrote:
>> > On Tue, 2002-04-16 at 09:18, BALBIR SINGH wrote:
>> > > I remember seeing somewhere unix system VII used to have
>> HZ set to
>> > 60
>> > > for the machines built in the 70's. I wonder if todays
>> pentium iiis
>> > and ivs
>> > > should still use HZ of 100, though their internal clock
>> is in GHz.
>> > >
>> > > I think somethings in the kernel may be tuned for the
>> value of HZ,
>> > these
>> > > things would be arch specific.
>> > >
>> > > Increasing the HZ on your system should change the scheduling
>> > behaviour,
>> > > it could lead to more aggresive scheduling and could affect the
>> > > behaviour of the VM subsystem if scheduling happens more
>> frequently.
>> > I am
>> > > just guessing, I do not know.
>> > >
>> >
>> > I remember reading that a higher HZ value will make your
>> machine more
>> > responsive, but will also mean that each running process
>> will have a
>> > smaller CPU time slice and that the kernel will spend more CPU time
>> > scheduling at the expense of processes.
>> >
>> Has anyone measured this?
>> This shouldn't be a big problem, because some architectures
>> use value
>> 1024, eg. Alpha, ia-64.
>> And todays Intel/AMD 32-bit processors are as fast as Alpha was 1-2
>> years ago.
Does anyone know whether it would be interesting to decrease this value
for computational farms and CPU/memory-bound tasks?
On Tue, 16 Apr 2002, William Lee Irwin III wrote:
> On Tue, Apr 16, 2002 at 09:47:48AM +0200, Olaf Fraczyk wrote:
> > Hi,
> > I would like to know why exactly this value was choosen.
> > Is it safe to change it to eg. 1024? Will it break anything?
> > What else should I change to get it working:
> > CLOCKS_PER_SEC?
> > Please CC me.
> > Regards,
> > Olaf Fraczyk
>
> I tried a few times running with HZ == 1024 for some testing (or I guess
> just to see what happened). I didn't see any problems, even without the
> obscure CLOCKS_PER_SEC ELF business.
>
>
> Cheers,
> Bill
> -
On Version 2.3.17, with a 600 MHz SMP Pentium, I set HZ to 1024 and
recompiled everything. There was no apparent difference in performance
or "feel".
Note that HZ represents the rate at which a CPU-bound process may
get the CPU taken away. Real-world tasks are more likely to be
doing I/O, thus surrendering the CPU, before this relatively long
time-slice expires. I don't think you will find any difference in
performance with real-world tasks. FYI, the Alpha uses 1024 simply
because the timer-chip can't divide down to 100 Hz.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Windows-2000/Professional isn't.
On Tue, 16 Apr 2002, BALBIR SINGH wrote:
> I remember seeing somewhere unix system VII used to have HZ set to 60
> for the machines built in the 70's. I wonder if todays pentium iiis and ivs
> should still use HZ of 100, though their internal clock is in GHz.
>
A different clock goes to the timer chip. It is always:
CLOCK_TICK_RATE 1193180 Hz
unless the system is an Elan SC-520, in which case the frequency is:
CLOCK_TICK_RATE 1189200 Hz
(from ../include/asm/timex.h)
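For illustration, this is roughly how the tick rate and the timer-chip
clock meet in the 2.4-era headers (a sketch following include/linux/timex.h;
the numbers assume a standard i386 PIT, not the Elan case):

#define CLOCK_TICK_RATE 1193180			/* i8254 input clock, in Hz */
#define HZ 100					/* desired tick rate */

/* Divisor programmed into the PIT, rounded to the nearest integer. */
#define LATCH ((CLOCK_TICK_RATE + HZ/2) / HZ)

/* HZ == 100 gives LATCH == 11932 (about 100.007 Hz);
 * HZ == 1024 gives LATCH == 1165 (about 1024.2 Hz). */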
> I think somethings in the kernel may be tuned for the value of HZ, these
> things would be arch specific.
>
> Increasing the HZ on your system should change the scheduling behaviour,
> it could lead to more aggresive scheduling and could affect the
> behaviour of the VM subsystem if scheduling happens more frequently. I am
> just guessing, I do not know.
>
It doesn't/can't change scheduling behavior. It changes only the rate
at which a CPU-bound task will get the CPU taken away. It also changes
the rate at which it gets it back, in a 1:1 ratio, with a net effect
of nothing-gained/nothing-lost except for preemption overhead.
> Changing though trivial would require a good look at all the code that
> uses HZ.
>
The references to HZ seem to be correct in all the headers, so changing
it is trivial.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Windows-2000/Professional isn't.
I seem to recall from theory that the 100 Hz value is human-dependent. Any
higher and you would begin to notice delays from your input until
whatever program you're talking to responds.
However, in order to actually notice it you must have other programs
running that use close to 100% CPU *AT LEAST AT THE SAME OR HIGHER
PRIORITY*. To test this: just running a couple of shells/scripts with a
busy "while true" loop won't slow you down until you aggressively renice
the shells/scripts.
Thus: setting it higher *may* improve your latency if you have other
CPU-intensive tasks going. Setting it lower will only be a boon if you
have so many active processes that the kernel spends more than negligible
time scheduling, so that you spend fewer cycles scheduling per second. I
don't know what *so many* is with a 1 GHz CPU, but it's very likely to
be > 10. The O(1) scheduler in progress will push that even higher.
TJ
On Tue, 2002-04-16 at 12:01, Olaf Fraczyk wrote:
> On 2002.04.16 12:29 Liam Girdwood wrote:
> > On Tue, 2002-04-16 at 09:18, BALBIR SINGH wrote:
> > > I remember seeing somewhere unix system VII used to have HZ set to
> > 60
> > > for the machines built in the 70's. I wonder if todays pentium iiis
> > and ivs
> > > should still use HZ of 100, though their internal clock is in GHz.
> > >
> > > I think somethings in the kernel may be tuned for the value of HZ,
> > these
> > > things would be arch specific.
> > >
> > > Increasing the HZ on your system should change the scheduling
> > behaviour,
> > > it could lead to more aggresive scheduling and could affect the
> > > behaviour of the VM subsystem if scheduling happens more frequently.
> > I am
> > > just guessing, I do not know.
> > >
> >
> > I remember reading that a higher HZ value will make your machine more
> > responsive, but will also mean that each running process will have a
> > smaller CPU time slice and that the kernel will spend more CPU time
> > scheduling at the expense of processes.
> >
> Has anyone measured this?
> This shouldn't be a big problem, because some architectures use value
> 1024, eg. Alpha, ia-64.
> And todays Intel/AMD 32-bit processors are as fast as Alpha was 1-2
> years ago.
>
> Regards,
>
> Olaf
>
>
--
_________________________________________________________________________
Terje Eggestad mailto:[email protected]
Scali Scalable Linux Systems http://www.scali.com
Olaf Helsets Vei 6 tel: +47 22 62 89 61 (OFFICE)
P.O.Box 150, Oppsal +47 975 31 574 (MOBILE)
N-0619 Oslo fax: +47 22 62 89 51
NORWAY
_________________________________________________________________________
> I seem to recall from theory that the 100HZ is human dependent. Any
> higher and you would begin to notice delays from you input until
> whatever program you're talking to responds.
Ultimately it's because Linus pulled that number out of a hat about ten
years ago. For some workloads 1 kHz is much better; for others, like giant
number crunching, people actually drop it down to about 5.
On Tue, Apr 16, 2002 at 03:35:19PM +0200, Terje Eggestad wrote:
> I seem to recall from theory that the 100HZ is human dependent. Any
> higher and you would begin to notice delays from you input until
> whatever program you're talking to responds.
I suspect by "higher" you mean "each tick takes up more of a second".
As in, if the HZ is *less* than 100HZ, you would notice delays when
typing, or similar.
Increasing the HZ can only improve responsiveness; however, there is a
cost (mentioned by others). The cost is that the scheduler is executed
more often per second. If the scheduler does the same amount of work
per tick, but there are more ticks per second, the scheduler does more
work overall, and the CPU is less free for use by the processes.
mark
--
[email protected]/[email protected]/[email protected] __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
http://mark.mielke.cc/
On Tue, 2002-04-16 at 15:38, Mark Mielke wrote:
> On Tue, Apr 16, 2002 at 03:35:19PM +0200, Terje Eggestad wrote:
> > I seem to recall from theory that the 100HZ is human dependent. Any
> > higher and you would begin to notice delays from you input until
> > whatever program you're talking to responds.
>
> I suspect by "higher" you mean "each tick takes up more of a second".
>
> As in, if the HZ is *less* than 100HZ, you would notice delays when
> typing, or similar.
>
Quite right, my typo.
>
> mark
>
> --
> [email protected]/[email protected]/[email protected] __________________________
> . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
> |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
> | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
>
> One ring to rule them all, one ring to find them, one ring to bring them all
> and in the darkness bind them...
>
> http://mark.mielke.cc/
--
_________________________________________________________________________
Terje Eggestad mailto:[email protected]
Scali Scalable Linux Systems http://www.scali.com
Olaf Helsets Vei 6 tel: +47 22 62 89 61 (OFFICE)
P.O.Box 150, Oppsal +47 975 31 574 (MOBILE)
N-0619 Oslo fax: +47 22 62 89 51
NORWAY
_________________________________________________________________________
On Tue, 16 Apr 2002, Olaf Fraczyk wrote:
> Hi,
> I would like to know why exactly this value was choosen.
> Is it safe to change it to eg. 1024? Will it break anything?
> What else should I change to get it working:
> CLOCKS_PER_SEC?
> Please CC me.
I think you just want to change HZ, and can do that safely. Do note that
some software may be using 100 instead of HZ, so you might get some
problems there.
Think of HZ as "how often do we want to thrash the cache of CPU-bound
processes." More is not necessarily better.
If you want low latency there are low-latency and preempt patches. They
will do more to make the system responsive than increasing HZ.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
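The hardcoded-100 pitfall mentioned above is easy to reproduce from user
space with times(2); a minimal sketch, assuming a Linux/glibc system
(times() counts from an arbitrary point, so only the comparison matters):

#include <stdio.h>
#include <sys/times.h>
#include <unistd.h>

int main(void)
{
	struct tms t;
	clock_t ticks = times(&t);		/* ticks since an arbitrary point */
	long clk_tck = sysconf(_SC_CLK_TCK);

	printf("asking the system : %.2f s\n", (double)ticks / clk_tck);
	printf("assuming 100 ticks: %.2f s\n", (double)ticks / 100.0);
	/* The two disagree as soon as the tick rate is not 100. */
	return 0;
}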
On Tue, 16 Apr 2002, Mark Mielke wrote:
> Increasing the HZ can only improve responsiveness, however, there is a
> cost (mentioned by others). The cost is that the scheduler is executed
> more often per second. If the scheduler does the same amount of work
> per tick, but there are more ticks per second, the scheduler does more
> work overall, and the CPU is free for use by the processes less.
Why are you discussing Linux 1.2?
Linux does not run the scheduler on each CPU tick and hasn't
done so for years.
regards,
Rik
--
http://www.linuxsymposium.org/2002/
"You're one of those condescending OLS attendants"
"Here's a nickle kid. Go buy yourself a real t-shirt"
http://www.surriel.com/ http://distro.conectiva.com/
Rik van Riel wrote:
>
> On Tue, 16 Apr 2002, Mark Mielke wrote:
>
> > Increasing the HZ can only improve responsiveness, however, there is a
> > cost (mentioned by others). The cost is that the scheduler is executed
> > more often per second. If the scheduler does the same amount of work
> > per tick, but there are more ticks per second, the scheduler does more
> > work overall, and the CPU is free for use by the processes less.
>
> Why are you discussing Linux 1.2 ?
>
> Linux is not running the scheduler each cpu tick and hasn't
> done this for years.
Very true. However, it does run the timer/clock code every tick, which is
still additional overhead when the tick time is reduced.
The basic idea (increased overhead at higher HZ) is sound; the details are not.
Chris
--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]
In article <[email protected]>,
Olaf Fraczyk <[email protected]> wrote:
>On 2002.04.16 12:29 Liam Girdwood wrote:
>>
>> I remember reading that a higher HZ value will make your machine more
>> responsive, but will also mean that each running process will have a
>> smaller CPU time slice and that the kernel will spend more CPU time
>> scheduling at the expense of processes.
>>
>Has anyone measured this?
>This shouldn't be a big problem, because some architectures use value
>1024, eg. Alpha, ia-64.
On the ia-64, they do indeed use a HZ value of 1000 by default.
And I've had some Intel people grumble about it, because it apparently
means that the timer tick takes anything from 2% to an extreme of 10%
(!!) of the CPU time under certain loads.
Apparently the 10% is due to cache/TLB-intensive loads, with the interrupt
handler as a result just missing in the caches a lot, but still:
that's exactly the kind of load that you want to buy an ia64 for.
There's no point in saying that "the timer interrupt takes only 0.5% of
an idle CPU", if it takes a much larger chunk out of a busy one.
So the argument that a kHz timer takes a noticeable amount of CPU power
seems to be still true today - even with the "architecture of tomorrow".
Yeah, I wouldn't have believed it myself, but there it is.. You only
get the gigaHz speeds if you hit in the cache - when you miss, you start
crawling (everything is relative, of course: the crawl of today is a
rather rapid one by 6502 standards ;)
Linus
>>>>> On Tue, 16 Apr 2002 16:27:12 +0000 (UTC), [email protected] (Linus Torvalds) said:
Linus> And I've had some Intel people grumble about it, because it
Linus> apparently means that the timer tick takes anything from 2%
Linus> to an extreme of 10% (!!) of the CPU time under certain
Linus> loads.
I'm not sure I believe this. I have had occasional cases where I
wondered whether the timer tick caused significant overhead, but it
always turned out to be something else. In my measurements,
*user-level* profiling has the 2-10% overhead you're mentioning, but
that's with a signal delivered to user level on each tick.
--david
On Tue, 16 Apr 2002, David Mosberger wrote:
> >>>>> On Tue, 16 Apr 2002 16:27:12 +0000 (UTC), [email protected] (Linus Torvalds) said:
>
> Linus> And I've had some Intel people grumble about it, because it
> Linus> apparently means that the timer tick takes anything from 2%
> Linus> to an extreme of 10% (!!) of the CPU time under certain
> Linus> loads.
>
> I'm not sure I believe this. I have had occasional cases where I
> wondered whether the timer tick caused significant overhead, but it
> always turned out to be something else. In my measurements,
> *user-level* profiling has the 2-10% overhead you're mentioning, but
> that's with a signal delivered to user level on each tick.
i still have pieces of paper on my desk about tests done on my dual piii
where by hacking HZ to 1000 the kernel build time went from an average of
2min:30sec to an average 2min:43sec. that is pretty close to 10%
- Davide
On Tue, Apr 16, 2002 at 12:32:25PM -0300, Rik van Riel wrote:
> On Tue, 16 Apr 2002, Mark Mielke wrote:
> > Increasing the HZ can only improve responsiveness, however, there is a
> > cost (mentioned by others). The cost is that the scheduler is executed
> > more often per second. If the scheduler does the same amount of work
> > per tick, but there are more ticks per second, the scheduler does more
> > work overall, and the CPU is free for use by the processes less.
> Why are you discussing Linux 1.2 ?
> Linux is not running the scheduler each cpu tick and hasn't
> done this for years.
Hmm... sorry... :-) Too early in the morning...
mark
--
[email protected]/[email protected]/[email protected] __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
http://mark.mielke.cc/
>>>>> On Tue, 16 Apr 2002 10:18:18 -0700 (PDT), Davide Libenzi <[email protected]> said:
Davide> i still have pieces of paper on my desk about tests done on
Davide> my dual piii where by hacking HZ to 1000 the kernel build
Davide> time went from an average of 2min:30sec to an average
Davide> 2min:43sec. that is pretty close to 10%
Did you keep the timeslice roughly constant?
--david
On Tue, 16 Apr 2002, David Mosberger wrote:
> >>>>> On Tue, 16 Apr 2002 10:18:18 -0700 (PDT), Davide Libenzi <[email protected]> said:
>
> Davide> i still have pieces of paper on my desk about tests done on
> Davide> my dual piii where by hacking HZ to 1000 the kernel build
> Davide> time went from an average of 2min:30sec to an average
> Davide> 2min:43sec. that is pretty close to 10%
>
> Did you keep the timeslice roughly constant?
it was 2.5.1 time and it was still ruled by TICK_SCALE, which made the
timeslice drop from 60 ms (HZ == 100) to 21 ms (HZ == 1000).
- Davide
On Tue, Apr 16, 2002 at 08:12:22AM +0000, Olaf Fraczyk wrote:
> Hi,
> I would like to know why exactly this value was choosen.
> Is it safe to change it to eg. 1024? Will it break anything?
> What else should I change to get it working:
> CLOCKS_PER_SEC?
> Please CC me.
Your uptime wraps to zero after 49 days. I think 'top' gets confused.
Regards,
bert
--
http://www.PowerDNS.com Versatile DNS Software & Services
http://www.tk the dot in .tk
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
On Apr 16, 2002 23:34 +0200, bert hubert wrote:
> On Tue, Apr 16, 2002 at 08:12:22AM +0000, Olaf Fraczyk wrote:
> > Hi,
> > I would like to know why exactly this value was choosen.
> > Is it safe to change it to eg. 1024? Will it break anything?
> > What else should I change to get it working:
> > CLOCKS_PER_SEC?
> > Please CC me.
>
> Your uptime wraps to zero after 49 days. I think 'top' gets confused.
Trivially fixed with the existing 64-bit jiffies patches. As it is,
your uptime wraps to zero after 472 days or something like that if you
don't have the 64-bit jiffies patch, which is totally in the realm of
possibility for Linux servers.
Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
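The wrap times being quoted follow directly from a 32-bit jiffies counter;
a back-of-the-envelope sketch (it ignores any initial offset the kernel may
apply to jiffies):

#include <stdio.h>

int main(void)
{
	const double jiffies_wrap = 4294967296.0;	/* 2^32 */
	const int hz[] = { 100, 1000, 1024 };
	int i;

	for (i = 0; i < 3; i++)
		printf("HZ = %4d: wraps after ~%.1f days\n",
		       hz[i], jiffies_wrap / hz[i] / 86400.0);
	/* Roughly 497, 50 and 48.5 days respectively. */
	return 0;
}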
Andreas Dilger <[email protected]> wrote:
> On Apr 16, 2002 23:34 +0200, bert hubert wrote:
>>
>> Your uptime wraps to zero after 49 days. I think 'top' gets confused.
> Trivially fixed with the existing 64-bit jiffies patches. As it is,
> your uptime wraps to zero after 472 days or something like that if you
> don't have the 64-bit jiffies patch, which is totally in the realm of
> possibility for Linux servers.
Why are we still measuring uptime using the tick variable? Ticks != time.
Surely we should be recording the boot time somewhere (probably on a
file system), and then comparing that with the current time?
--
Debian GNU/Linux 2.2 is out! ( http://www.debian.org/ )
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Apr 17, 2002 08:37 +1000, Herbert Xu wrote:
> Why are we still measuring uptime using the tick variable? Ticks != time.
> Surely we should be recording the boot time somewhere (probably on a
> file system), and then comparing that with the current time?
Er, because the 'tick' is a valid count of the actual time that the
system has been running, while the "boot time" is totally meaningless.
What if the system has no RTC, or the RTC is wrong until later in the
boot sequence when it can be set by the user/ntpd? What if you pass
daylight savings time? Does your uptime increase/decrease by an hour?
Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
Followup to: <[email protected]>
By author: Alan Cox <[email protected]>
In newsgroup: linux.dev.kernel
>
> > I seem to recall from theory that the 100HZ is human dependent. Any
> > higher and you would begin to notice delays from you input until
> > whatever program you're talking to responds.
>
> Ultimately its because Linus pulled that number out of a hat about ten years
> ago. For some workloads 1KHz is much better, for others like giant number
> crunching people actually drop it down to about 5..
>
Hardly so. 100 Hz was standard on most commercial Unices around the
time the first Linux was done...
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>
If you change HZ to 1000, you need to change PROC_CHANGE_PENALTY
accordingly. Otherwise, a process would get preempted before its time slice
expires. The net effect is more context switches than necessary, which
could explain the 10% difference.
-----Original Message-----
From: Davide Libenzi [mailto:[email protected]]
Sent: Tuesday, April 16, 2002 11:10 AM
To: [email protected]
Cc: Linus Torvalds; Linux Kernel Mailing List
Subject: Re: Why HZ on i386 is 100 ?
On Tue, 16 Apr 2002, David Mosberger wrote:
> >>>>> On Tue, 16 Apr 2002 10:18:18 -0700 (PDT), Davide Libenzi
<[email protected]> said:
>
> Davide> i still have pieces of paper on my desk about tests done on
> Davide> my dual piii where by hacking HZ to 1000 the kernel build
> Davide> time went from an average of 2min:30sec to an average
> Davide> 2min:43sec. that is pretty close to 10%
>
> Did you keep the timeslice roughly constant?
it was 2.5.1 time and it was still ruled by TICK_SCALE that made the
timeslice to drop from 60ms ( 100HZ ) to 21ms ( 1000HZ ).
- Davide
From: "Andreas Dilger" <[email protected]>
> On Apr 17, 2002 08:37 +1000, Herbert Xu wrote:
> > Why are we still measuring uptime using the tick variable? Ticks != time.
> > Surely we should be recording the boot time somewhere (probably on a
> > file system), and then comparing that with the current time?
>
> Er, because the 'tick' is a valid count of the actual time that the
> system has been running, while the "boot time" is totally meaningless.
> What if the system has no RTC, or the RTC is wrong until later in the
> boot sequence when it can be set by the user/ntpd? What if you pass
> daylight savings time? Does your uptime increase/decrease by an hour?
Well, Andreas, it seems like a very simple thing to define the time
quantum, "tick", differently from the resolution of the count reported
by a call to get the tick counter value. If the latter maintains a
constant resolution even if the tick time changes then all utilities
should continue to work. Of course, with a tick-time resolution of 10 ms
it gets ugly when setting up a tick time of 1 ms. Ideally, reporting would
have an LSB of a microsecond or even a tenth of a microsecond, while the
increment might still be a hundredth or a thousandth of a second. Of course,
that blows anything that relies on the tick counter to smithereens, I fear.
{^_^} Joanne "I STILL want a Linux suitable for multimedia applications" Dow.
[email protected] (1mS ticks is a GREAT help for multimedia apps.)
>>>>> On Tue, 16 Apr 2002 10:18:18 -0700 (PDT), Davide Libenzi <[email protected]> said:
Davide> i still have pieces of paper on my desk about tests done on
Davide> my dual piii where by hacking HZ to 1000 the kernel build
Davide> time went from an average of 2min:30sec to an average
Davide> 2min:43sec. that is pretty close to 10%
The last time I measured timer tick overhead on ia64 it was well below
1% of overhead. I don't really like using kernel builds as a
benchmark, because there are far too many variables for the results to
have any long-term or cross-platform value. But since it's popular, I
did measure it quickly on a relatively slow (old) Itanium box: with
100Hz, the kernel compile was about 0.6% faster than with 1024Hz
(2.4.18 UP kernel).
--david
On Tue, 16 Apr 2002, Chen, Kenneth W wrote:
> If you change HZ to 1000, you need to change PROC_CHANGE_PENALTY
> accordingly. Otherwise, process would get preempted before its time slice
> gets expired. The net effect is more context switch than necessary, which
> could explain the 10% difference.
that might be the case. i was not running the latsched sampler during that
test, that would have helped me in detecting extra task bounces/cs
- Davide
On Tue, 2002-04-16 at 20:49, David Mosberger wrote:
> But since it's popular, I did measure it quickly on a relatively
> slow (old) Itanium box: with 100Hz, the kernel compile was about
> 0.6% faster than with 1024Hz (2.4.18 UP kernel).
One question I have always had is why 1024 and not 1000 ?
Because that is what Alpha does? It seems to me there is no reason for
a power-of-two timer value, and using 1024 vs 1000 just makes the math
and rounding more difficult.
Robert Love
On 16 Apr 2002, Robert Love wrote:
> On Tue, 2002-04-16 at 20:49, David Mosberger wrote:
>
> > But since it's popular, I did measure it quickly on a relatively
> > slow (old) Itanium box: with 100Hz, the kernel compile was about
> > 0.6% faster than with 1024Hz (2.4.18 UP kernel).
>
> One question I have always had is why 1024 and not 1000 ?
>
> Because that is what Alpha does? It seems to me there is no reason for
> a power-of-two timer value, and using 1024 vs 1000 just makes the math
> and rounding more difficult.
maybe because of the old TICK_SCALE macro ...
- Davide
On Tue, 16 Apr 2002, David Mosberger wrote:
> >>>>> On Tue, 16 Apr 2002 10:18:18 -0700 (PDT), Davide Libenzi <[email protected]> said:
>
> Davide> i still have pieces of paper on my desk about tests done on
> Davide> my dual piii where by hacking HZ to 1000 the kernel build
> Davide> time went from an average of 2min:30sec to an average
> Davide> 2min:43sec. that is pretty close to 10%
>
> The last time I measured timer tick overhead on ia64 it was well below
> 1% of overhead. I don't really like using kernel builds as a
> benchmark, because there are far too many variables for the results to
> have any long-term or cross-platform value. But since it's popular, I
> did measure it quickly on a relatively slow (old) Itanium box: with
> 100Hz, the kernel compile was about 0.6% faster than with 1024Hz
> (2.4.18 UP kernel).
uhm, this is quite interesting. it's quite possible at this point that
PROC_CHANGE_PENALTY put a high context-switch pressure in place, with
terrible cache effects. sadly i was not running the sampler that would
have helped me to detect such behaviour.
- Davide
Why not just try to modify the time-slice scale? Wouldn't this then
help with what you are trying to gain, while leaving alone other values
that rely on 100 Hz?
Doesn't an unmodified TICK_SCALE negate most of the lowered time-slice
effect you'd gain from raising HZ anyway?
It seems like Ingo already did a bunch of work on finding the "sweet spot"
time-slice quantum, and I don't think he did it with the HZ value, but I
could be wrong.
And if it is X Window System performance you are trying for, it's not the
kernel (flying by the seat of my pants here). I can have crappy X
performance and yet my audio never skips a beat, running at exactly the
same priority as X (though my X performance problems seem to stem from
multiple clients rendering on screen at the same time ;-). (No disrespect
to the X developers, since it does a lot of things very nicely. :-)
Dan
On Tue, 2002-04-16 at 20:57, Robert Love wrote:
> On Tue, 2002-04-16 at 20:49, David Mosberger wrote:
>
> > But since it's popular, I did measure it quickly on a relatively
> > slow (old) Itanium box: with 100Hz, the kernel compile was about
> > 0.6% faster than with 1024Hz (2.4.18 UP kernel).
>
> One question I have always had is why 1024 and not 1000 ?
> Because that is what Alpha does? It seems to me there is no reason for
> a power-of-two timer value, and using 1024 vs 1000 just makes the math
> and rounding more difficult.
>
> Robert Love
>
>
On Tue, Apr 16, 2002 at 04:56:31PM -0600, Andreas Dilger wrote:
>
> Er, because the 'tick' is a valid count of the actual time that the
> system has been running, while the "boot time" is totally meaningless.
> What if the system has no RTC, or the RTC is wrong until later in the
> boot sequence when it can be set by the user/ntpd? What if you pass
> daylight savings time? Does your uptime increase/decrease by an hour?
Tick is the number of timer interrupts that you've collected, which
may or may not be exactly 100Hz. In fact, after 400 days of operation,
the deviation from true time is likely to be above 1 hour.
Anyway, you don't need the RTC since you can always fall back to the
system clock which is no worse than before. However, if you do have
an accurate clock source then this is much better than using the tick.
If you use ntpd, then you can simply record the time on a server that
you trust, and use the tick reading at that point in time to deduce
the boot time.
--
Debian GNU/Linux 2.2 is out! ( http://www.debian.org/ )
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
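A rough user-space sketch of that idea: take one trusted wall-clock sample
and the uptime at the same moment, and the boot time falls out by
subtraction (here /proc/uptime stands in for "the tick reading at that
point"; a real implementation would record both on the trusted server, as
described above):

#include <stdio.h>
#include <time.h>

int main(void)
{
	double uptime_s;
	FILE *f = fopen("/proc/uptime", "r");
	time_t now, boot;

	if (!f || fscanf(f, "%lf", &uptime_s) != 1)
		return 1;
	fclose(f);

	now = time(NULL);		/* assume this clock is trusted (e.g. NTP) */
	boot = now - (time_t)uptime_s;
	printf("estimated boot time: %s", ctime(&boot));
	return 0;
}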
David Mosberger wrote:
> The last time I measured timer tick overhead on ia64 it was well below
> 1% of overhead. I don't really like using kernel builds as a
> benchmark, because there are far too many variables for the results to
> have any long-term or cross-platform value. But since it's popular, I
> did measure it quickly on a relatively slow (old) Itanium box: with
> 100Hz, the kernel compile was about 0.6% faster than with 1024Hz
> (2.4.18 UP kernel).
How hard would it be to tune HZ dynamically at run time, either through
kernel smarts, or driven from user space by some sort of daemon or other
(manual) control?
Ben
--
Ben Greear <[email protected]> <Ben_Greear AT excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear
On Tue, Apr 16, 2002 at 08:57:09PM -0400, Robert Love wrote:
> On Tue, 2002-04-16 at 20:49, David Mosberger wrote:
> > But since it's popular, I did measure it quickly on a relatively
> > slow (old) Itanium box: with 100Hz, the kernel compile was about
> > 0.6% faster than with 1024Hz (2.4.18 UP kernel).
> One question I have always had is why 1024 and not 1000 ?
>
> Because that is what Alpha does? It seems to me there is no reason for
> a power-of-two timer value, and using 1024 vs 1000 just makes the math
> and rounding more difficult.
Only from the perspective of time displayed to a user... :-)
Of course, that may be one of the only factors...
mark
--
[email protected]/[email protected]/[email protected] __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
http://mark.mielke.cc/
On Wed, 17 Apr 2002, Mark Mielke wrote:
> On Tue, Apr 16, 2002 at 08:57:09PM -0400, Robert Love wrote:
> >
> > Because that is what Alpha does? It seems to me there is no reason for
> > a power-of-two timer value, and using 1024 vs 1000 just makes the math
> > and rounding more difficult.
>
> Only from the perspective of time displayed to a user... :-)
No, it also makes it much easier to convert to/from the standard UNIX time
formats (i.e. "struct timeval" and "struct timespec") without any surprises,
because a jiffy is exactly representable in both if you have an HZ value
of 100 or 1000, but not if your HZ is 1024.
Linus
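The divisibility point is easy to check: a jiffy fits evenly into both
microseconds and nanoseconds when HZ is 100 or 1000, but not when it is
1024 (a stand-alone check, not kernel code):

#include <stdio.h>

int main(void)
{
	const long hz[] = { 100, 1000, 1024 };
	int i;

	for (i = 0; i < 3; i++)
		printf("HZ = %4ld: 1e6 %% HZ = %3ld, 1e9 %% HZ = %3ld\n",
		       hz[i], 1000000L % hz[i], 1000000000L % hz[i]);
	/* The remainders are 0 for 100 and 1000, but 576 and 512 for 1024,
	 * so a 1/1024 s jiffy is not exact in a timeval or timespec. */
	return 0;
}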
On Wed, 2002-04-17 at 01:34, Linus Torvalds wrote:
> No, it also makes it much easier to convert to/from the standard UNIX time
> formats (ie "struct timeval" and "struct timespec") without any surprises,
> because a jiffy is exactly representable in both if you have a HZ value
> of 100 or 1000, but not if your HZ is 1024.
Exactly - this was my issue. So what _was_ the rationale behind Alpha
picking 1024 (and others following)? More importantly, can we change to
1000?
Robert Love
>>>>> On 17 Apr 2002 02:01:42 -0400, Robert Love <[email protected]> said:
Robert> Exactly - this was my issue. So what _was_ the rationale
Robert> behind Alpha picking 1024 (and others following)?
Picking a timer tick is a bit like picking the color of a
window. Everybody has an opinion and there is no truly "right" choice.
I guarantee you whatever you pick, someone will come along and say:
why not X instead?
A power-of-2 value obviously makes it easy to divide by HZ.
Robert> More importantly, can we change to 1000?
On ia64, you can make it anything you want. User-level will pick up
the current value from sysconf(_SC_CLK_TCK).
--david
David Mosberger wrote:
> The last time I measured timer tick overhead on ia64 it was well below
> 1% of overhead. I don't really like using kernel builds as a
> benchmark, because there are far too many variables for the results to
> have any long-term or cross-platform value. But since it's popular, I
> did measure it quickly on a relatively slow (old) Itanium box: with
> 100Hz, the kernel compile was about 0.6% faster than with 1024Hz
> (2.4.18 UP kernel).
Did you try a parallel build, with the number of processes at least
2-3 times the number of processors? Then you get more
of the cache-miss effects from switching processes, not
merely the overhead of the fairly fast scheduler.
Helge Hafting
In article <1019023303.1670.37.camel@phantasy> you wrote:
> On Wed, 2002-04-17 at 01:34, Linus Torvalds wrote:
>
>> No, it also makes it much easier to convert to/from the standard UNIX time
>> formats (ie "struct timeval" and "struct timespec") without any surprises,
>> because a jiffy is exactly representable in both if you have a HZ value
>> of 100 or 1000, but not if your HZ is 1024.
>
> Exactly - this was my issue. So what _was_ the rationale behind Alpha
> picking 1024
I seem to remember that this was for allowing Tru64 UNIX binaries to run
as well; those expect HZ to be 1024.
--
But when you distribute the same sections as part of a whole which is a work
based on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the entire whole,
and thus to each and every part regardless of who wrote it. [sect.2 GPL]
On Wed, Apr 17, 2002 at 02:01:42AM -0400, Robert Love wrote:
> On Wed, 2002-04-17 at 01:34, Linus Torvalds wrote:
> > No, it also makes it much easier to convert to/from the standard UNIX time
> > formats (ie "struct timeval" and "struct timespec") without any surprises,
> > because a jiffy is exactly representable in both if you have a HZ value
> > of 100 or 1000, but not if your HZ is 1024.
>
> Exactly - this was my issue. So what _was_ the rationale behind Alpha
> picking 1024 (and others following)? More importantly, can we change to
> 1000?
Alpha processors don't have full division hardware; they have to
iterate it one bit at a time. They do have a flash multiplier
and a barrel shifter. Shifts take one pipeline cycle, like
addition and subtraction. A multiply takes 6-12 cycles depending on the
model, but a division takes 64...
Converting the tick count to gettimeofday() seconds is faster when
the tick rate is a power of two.
> Robert Love
/Matti Aarnio
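A tiny illustration of that point (the function names are made up, and a
compiler may well turn the constant divide into a multiply, but the shift
is the trivially cheap case):

/* HZ == 1024: converting ticks to seconds is a single shift. */
unsigned long ticks_to_secs_pow2(unsigned long ticks)
{
	return ticks >> 10;
}

/* HZ == 1000: the straightforward form needs an integer divide. */
unsigned long ticks_to_secs_1000(unsigned long ticks)
{
	return ticks / 1000;
}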
On Tue, Apr 16, 2002 at 10:24:26PM +0000, Andreas Dilger wrote:
> Trivially fixed with the existing 64-bit jiffies patches. As it is,
> your uptime wraps to zero after 472 days or something like that if you
> don't have the 64-bit jiffies patch, which is totally in the realm of
> possibility for Linux servers.
I feel your pain
4:26am up 482 days, 10:33, 2 users, load average: 0.04, 0.02, 0.00
On a very remote server.
So can we please merge the 64-bit jiffies patches? I sometimes think that
that is the main reason why alpha DOES have HZ=1024 - the jiffies there
don't wrap in an embarrassing way within two months :-)
Regards,
bert
--
http://www.PowerDNS.com Versatile DNS Software & Services
http://www.tk the dot in .tk
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
On Wed, 17 Apr 2002, bert hubert wrote:
> I feel your pain
>
> 4:26am up 482 days, 10:33, 2 users, load average: 0.04, 0.02, 0.00
>
> On a very remote server.
>
> So can we please merge the 64-bit jiffies patches? I sometimes think that
> that is the main reason why alpha DOES have HZ=1024 - the jiffies there
> don't wrap in an embarrassing way within two months :-)
>
Rik van Riel correctly suggested merging it in 2.5 first. I have a
forward-ported version, but it has a minor locking issue on UP.
Albert Cahalan suggested getting rid of locking altogether by only updating
the high word from the timer interrupt. I will try to code this over the
weekend.
I'm sorry I had no time for lobbying for the merge in the last month.
Tim
> > Trivially fixed with the existing 64-bit jiffies patches. As it is,
> > your uptime wraps to zero after 472 days or something like that if you
> > don't have the 64-bit jiffies patch, which is totally in the realm of
> > possibility for Linux servers.
>
> I feel your pain
>
> 4:26am up 482 days, 10:33, 2 users, load average: 0.04, 0.02, 0.00
>
> On a very remote server.
>
> So can we please merge the 64-bit jiffies patches? I sometimes think that
> that is the main reason why alpha DOES have HZ=1024 - the jiffies there
> don't wrap in an embarrassing way within two months :-)
Yes, but since the Alpha is 64-bit, it doesn't wrap at 49.7 days. I've
seen mine at 60 days or so.
--
Lab tests show that use of micro$oft causes cancer in lab animals
On Wed, Apr 17, 2002 at 11:07:49AM +0000, Tim Schmielau wrote:
> Rik van Riel correctly suggested to merge it in 2.5 first. I have a
> forward-ported version, but it has a minor locking issue on UP.
I think that would be right. It touches a lot of code.
> Albert Cahalan suggested to get rid of locking at all by only updating the
> high word from the timer interupt. I will try to code this on the weekend.
Smart.
> I'm sorry I had no time for lobbying the merge in the last month.
Anything I can do to help, just let me know. Right now I am actually facing
costs because of this issue, so I am very much in favour of saving those
costs 500 days from now :-)
Regards,
bert
--
http://www.PowerDNS.com Versatile DNS Software & Services
http://www.tk the dot in .tk
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
Linus Torvalds wrote:
>
> On Wed, 17 Apr 2002, Mark Mielke wrote:
>
>>On Tue, Apr 16, 2002 at 08:57:09PM -0400, Robert Love wrote:
>>
>>>Because that is what Alpha does? It seems to me there is no reason for
>>>a power-of-two timer value, and using 1024 vs 1000 just makes the math
>>>and rounding more difficult.
>>
>>Only from the perspective of time displayed to a user... :-)
>
>
> No, it also makes it much easier to convert to/from the standard UNIX time
> formats (ie "struct timeval" and "struct timespec") without any surprises,
> because a jiffy is exactly representable in both if you have a HZ value
> of 100 or 1000, but not if your HZ is 1024.
And finally, 100 Hz is (by accident) quite close to the perceptive
threshold of a human, which is about 0.15 of a second :).
On Wed, 17 Apr 2002, bert hubert wrote:
> Anything I can do to help, just let me know. Right now I am actually facing
> costs because of this issue, so I am very much in favour of saving those
> costs 500 days from now :-)
Other than a few things reporting wrong numbers, what costs do you
anticipate? I have servers in six USA states (four time zones) and I
haven't seen any real ill effect from this.
Back in the Xenix days we had servers on three continents and they were
doing critical applications. There were serious costs there; people did
have to be on site for a reboot.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
On Wed, Apr 17, 2002 at 08:37:43AM +1000, Herbert Xu wrote:
> Why are we still measuring uptime using the tick variable? Ticks != time.
> Surely we should be recording the boot time somewhere (probably on a
> file system), and then comparing that with the current time?
It depends on the meaning of "is", er, oops, I mean: it depends on the
meaning of "uptime".
The notebook I am typing on at this moment was last booted just about
exactly 8 days ago (judging from the timestamp on /var/log/dmesg) but
in a cat-like way it spends a lot of its time asleep and so top
reports an uptime of only "4 days, 2:42".
Which is correct? I suggest that the smaller number is closer to
correct because that is roughly the amount of time the system has
actually spent running.
-kb, the Kent who expects this question to get more complicated as the
new suspend gets more and more clever and if the kernel ever starts
seriously catnapping on its own.
On Wed, Apr 17, 2002 at 08:33:34AM -0400, Bill Davidsen wrote:
> Other than a few things reporting wrong numbers, what costs do you
> anticipate? I have servers in six USA states (four timezones) and I
> haven't seen any real ill-effect on this.
I have been advised by Alan to treat the jiffy wraparound as a scheduled
maintenance event. I tend to trust bearded kernel hackers from Wales.
Regards,
bert
--
http://www.PowerDNS.com Versatile DNS Software & Services
http://www.tk the dot in .tk
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
On Wed, 17 Apr 2002, bert hubert wrote:
> On Wed, Apr 17, 2002 at 08:33:34AM -0400, Bill Davidsen wrote:
>
> > Other than a few things reporting wrong numbers, what costs do you
> > anticipate? I have servers in six USA states (four timezones) and I
> > haven't seen any real ill-effect on this.
>
> I have been advised by Alan to treat the jiffy wraparound as a scheduled
> maintenance event. I tend to trust bearded kernel hackers from Wales.
Alan has to be conservative, since he wants to avoid giving potentially
damaging advice to someone. However, since you can take a reboot at your
convenience and schedule it a year in advance, I still don't see the great
cost. If you have an app which must be up 7x24 and don't have seamless
backup, you have other problems more serious than timer wrap.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
Hi!
> Davide> i still have pieces of paper on my desk about tests done on
> Davide> my dual piii where by hacking HZ to 1000 the kernel build
> Davide> time went from an average of 2min:30sec to an average
> Davide> 2min:43sec. that is pretty close to 10%
>
> The last time I measured timer tick overhead on ia64 it was well below
> 1% of overhead. I don't really like using kernel builds as a
> benchmark, because there are far too many variables for the results to
> have any long-term or cross-platform value. But since it's popular, I
> did measure it quickly on a relatively slow (old) Itanium box: with
> 100Hz, the kernel compile was about 0.6% faster than with 1024Hz
> (2.4.18 UP kernel).
0.5% still looks like a lot to me. A good compiler optimization gains 0.5%
on average...
And think what it does to an old 386SX. Maybe it's time for those "tick on
demand" patches?
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
On Sun, 21 Apr 2002, Pavel Machek wrote:
> Hi!
>
> > Davide> i still have pieces of paper on my desk about tests done on
> > Davide> my dual piii where by hacking HZ to 1000 the kernel build
> > Davide> time went from an average of 2min:30sec to an average
> > Davide> 2min:43sec. that is pretty close to 10%
> >
> > The last time I measured timer tick overhead on ia64 it was well below
> > 1% of overhead. I don't really like using kernel builds as a
> > benchmark, because there are far too many variables for the results to
> > have any long-term or cross-platform value. But since it's popular, I
> > did measure it quickly on a relatively slow (old) Itanium box: with
> > 100Hz, the kernel compile was about 0.6% faster than with 1024Hz
> > (2.4.18 UP kernel).
>
> .5% still looks like a lot to me. Good compiler optimization is .5% on
> average...
>
> And think what it does with old 386sx.. Maybe time for those "tick on demand"
> patches?
Doesn't IBM have a tickless patch? It's useful when demonstrating 10,000
virtual Linux machines on a single system.
john alvord
>>>>> On Sun, 21 Apr 2002 18:00:22 +0000, Pavel Machek <[email protected]> said:
Pavel> .5% still looks like a lot to me. Good compiler optimization
Pavel> is .5% on average...
Umh, but those optimizations are interesting only if they're
cumulative, i.e., once you've got 10 of them and they make a *total*
difference of 5% (actually, I'm doubtful anyone really notices
differences of 20-30% other than for benchmarking purposes... ;-).
For me, 1% is the magic threshold. If we find real apps that get a
higher penalty than that, I'd either lower the HZ or see if we can
tune the timer tick to be within a safe margin.
No matter what, though, higher tick rate clearly incurs somewhat
higher overhead. The benefit is lower application-level response time
and finer-granularity timeouts. I assume Robert has all the
benchmarks to show that. ;-)
--david
John Alvord wrote:
>
> On Sun, 21 Apr 2002, Pavel Machek wrote:
>
> > Hi!
> >
> > > Davide> i still have pieces of paper on my desk about tests done on
> > > Davide> my dual piii where by hacking HZ to 1000 the kernel build
> > > Davide> time went from an average of 2min:30sec to an average
> > > Davide> 2min:43sec. that is pretty close to 10%
> > >
> > > The last time I measured timer tick overhead on ia64 it was well below
> > > 1% of overhead. I don't really like using kernel builds as a
> > > benchmark, because there are far too many variables for the results to
> > > have any long-term or cross-platform value. But since it's popular, I
> > > did measure it quickly on a relatively slow (old) Itanium box: with
> > > 100Hz, the kernel compile was about 0.6% faster than with 1024Hz
> > > (2.4.18 UP kernel).
> >
> > .5% still looks like a lot to me. Good compiler optimization is .5% on
> > average...
> >
> > And think what it does with old 386sx.. Maybe time for those "tick on demand"
> > patches?
>
> Doesn't IBM have a tickless patch.. useful when demonstrating 10,000
> virtual linux machines on a single system.
Please, folks. When can we put the "tick on demand" thing to bed? If in
doubt, get the patch from the high-res-timers SourceForge site (see my
signature for the URL) and try it. Overhead grows with system load,
passing the ticked system at relatively light loads. Just what we
want: very low-overhead idle systems!
The problem is in accounting (or time slicing, if you prefer), where we
need to start a timer each time a task is context-switched to, and stop
it when the task is switched away. The overhead is purely in the setup
and teardown. MOST of these timers never expire.
-g
>
> john alvord
>
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
At 14:52 -0700 22-04-2002, george anzinger wrote:
>John Alvord wrote:
> > On Sun, 21 Apr 2002, Pavel Machek wrote:
> > > And think what it does with old 386sx.. Maybe time for those
> > > "tick on demand" patches?
> >
> > Doesn't IBM have a tickless patch.. useful when demonstrating 10,000
> > virtual linux machines on a single system.
>
>Please folks. When can we put the "tick on demand" thing to bed? If in
>doubt, get the patch from the high-res-timers sourceforge site (see
>signature for the URL) and try it. Overhead becomes higher with system
>load passing the ticked system at relatively light loads. Just what we
>want, very low overhead idle systems!
During idle, the current monitors on our StrongARM-based low power
testbed show a distinct 100Hz beat. A significant portion of idle
power consumption can be attributed to the timer interrupt. IIRC the
IBM LinuxWatch people came to a similar conclusion.
In some cases we definitely do want very low overhead idle systems.
And of course on ARM systems context switches are relatively
expensive anyway, due to the need to flush the (virtually
indexed/tagged) caches.
JDB
[not that I'm proposing to inflict this on the mainline kernel]
--
LART. 250 MIPS under one Watt. Free hardware design files.
http://www.lart.tudelft.nl/
> Please folks. When can we put the "tick on demand" thing to bed? If in
> doubt, get the patch from the high-res-timers sourceforge site (see
> signature for the URL) and try it. Overhead becomes higher with system
> load passing the ticked system at relatively light loads. Just what we
> want, very low overhead idle systems!
>
> The problem is in accounting (or time slicing if you prefer) where we
> need to start a timer each time a task is context switched to, and stop
> it when the task is switched away. The overhead is purely in the set up
> and tear down. MOST of these never expire.
Did you work out where exactly the overhead was and whether it was
hardware-specific? On PPC, for example, updating the timer is just a write
to a CPU register.
Anton
> The problem is in accounting (or time slicing if you prefer) where we
> need to start a timer each time a task is context switched to, and stop
> it when the task is switched away. The overhead is purely in the set up
> and tear down. MOST of these never expire.
Done properly, a variable tick is very, very easy and also very efficient
to handle on many platforms. x86 is a particular problem case because the
timer is so expensive to fiddle with.
Alan Cox <[email protected]> writes:
> > The problem is in accounting (or time slicing if you prefer) where we
> > need to start a timer each time a task is context switched to, and stop
> > it when the task is switched away. The overhead is purely in the set up
> > and tear down. MOST of these never expire.
>
> Done properly on many platforms a variable tick is very very easy and also
> very efficient to handle. X86 is a particular problem case because the timer
> is so expensive to fiddle with
Depends. On modern x86 you can either use the local APIC timer or
the mmtimers (ftp://download.intel.com/ial/home/sp/mmts097.pdf -
should be in newer x86 chipsets). Both should be better than the
8254 timer and are also not expensive to work with.
-Andi
Anton Blanchard wrote:
>
>
> > Please folks. When can we put the "tick on demand" thing to bed? If in
> > doubt, get the patch from the high-res-timers sourceforge site (see
> > signature for the URL) and try it. Overhead becomes higher with system
> > load passing the ticked system at relatively light loads. Just what we
> > want, very low overhead idle systems!
> >
> > The problem is in accounting (or time slicing if you prefer) where we
> > need to start a timer each time a task is context switched to, and stop
> > it when the task is switched away. The overhead is purely in the set up
> > and tear down. MOST of these never expire.
>
> Did you work out where exactly the overhead was and if it was hardware
> specific? On ppc, for example, updating the timer is just a write to a CPU
> register.
It has nothing to do with hardware. The overhead is putting a timer
entry in the list and then removing it. Almost all timers are canceled
before they expire. Even with the O(1) timer list, this takes time and
when done at the context switch rate the time mounts rapidly. And we
need at least one timer when we switch to a task. In the test code I
only start a "slice" timer. This means that a task that wants a
execution time signal may find the signal delayed by as much as a slice,
but it does keep the overhead lower.
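In rough outline the per-switch cost looks like this (just a sketch against
the 2.4-era timer API, NOT the actual high-res-timers code; the
switch_in()/switch_out() hook points and where the timer_list lives are
made up for illustration):

	/* needs <linux/timer.h> and <linux/sched.h> */
	static void slice_expired(unsigned long data)
	{
		struct task_struct *p = (struct task_struct *)data;
		p->need_resched = 1;	/* slice used up, ask for a reschedule */
	}

	static void switch_in(struct task_struct *p, struct timer_list *t)
	{
		init_timer(t);
		t->expires  = jiffies + p->counter;	/* remaining slice, in ticks */
		t->data     = (unsigned long)p;
		t->function = slice_expired;
		add_timer(t);	/* list insertion: the setup cost */
	}

	static void switch_out(struct timer_list *t)
	{
		del_timer(t);	/* almost always cancelled: the teardown cost */
	}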
>
> Anton
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
Andi Kleen wrote:
>
> Alan Cox <[email protected]> writes:
>
> > > The problem is in accounting (or time slicing if you prefer) where we
> > > need to start a timer each time a task is context switched to, and stop
> > > it when the task is switched away. The overhead is purely in the set up
> > > and tear down. MOST of these never expire.
> >
> > Done properly on many platforms a variable tick is very very easy and also
> > very efficient to handle. X86 is a particular problem case because the timer
> > is so expensive to fiddle with
>
> Depends. On modern x86 you can either use the local APIC timer or
> the mmtimers (ftp://download.intel.com/ial/home/sp/mmts097.pdf -
> should be in newer x86 chipsets). Both should be better than the
> 8254 timer and are also not expensive to work with.
I must not be making myself clear :) The overhead has nothing to do
with hardware. It is all timer list insertion and deletion. The
problem is that we need to do this at context switch rates, which are
MUCH higher than tick rates and, even with the O(1) insertion code,
cause the overhead to increase above the ticked overhead.
-g
>
> -Andi
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
Andi Kleen wrote:
>
> Alan Cox <[email protected]> writes:
>
> > > The problem is in accounting (or time slicing if you prefer) where we
> > > need to start a timer each time a task is context switched to, and stop
> > > it when the task is switched away. The overhead is purely in the set up
> > > and tear down. MOST of these never expire.
> >
> > Done properly on many platforms a variable tick is very very easy and also
> > very efficient to handle. X86 is a particular problem case because the timer
> > is so expensive to fiddle with
>
> Depends. On modern x86 you can either use the local APIC timer or
> the mmtimers (ftp://download.intel.com/ial/home/sp/mmts097.pdf -
> should be in newer x86 chipsets). Both should be better than the
> 8254 timer and are also not expensive to work with.
I just looked at the mmtimers. Looks like the right idea but a bit
overblown. I would prefer an interrupt generated by a compare to the
TSC all on board the cpu chip. This would eliminate the I/O overhead.
Still, the 8254 PIT is the pits.
When can we expect to see this in a real cpu?
-g
>
> -Andi
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
On Tue, Apr 23, 2002 at 12:24:06PM -0700, george anzinger wrote:
> Andi Kleen wrote:
> >
> > Alan Cox <[email protected]> writes:
> >
> > > > The problem is in accounting (or time slicing if you prefer) where we
> > > > need to start a timer each time a task is context switched to, and stop
> > > > it when the task is switched away. The overhead is purely in the set up
> > > > and tear down. MOST of these never expire.
> > >
> > > Done properly on many platforms a variable tick is very very easy and also
> > > very efficient to handle. X86 is a particular problem case because the timer
> > > is so expensive to fiddle with
> >
> > Depends. On modern x86 you can either use the local APIC timer or
> > the mmtimers (ftp://download.intel.com/ial/home/sp/mmts097.pdf -
> > should be in newer x86 chipsets). Both should be better than the
> > 8254 timer and are also not expensive to work with.
>
> I just looked at the mmtimers. Looks like the right idea but a bit
> overblown. I would prefer an interrupt generated by a compare to the
> TSC all on board the cpu chip. This would eliminate the I/O overhead.
That's the local APIC timer. Pretty much all modern x86 have it.
But at least Microsoft warns against using them for high precision
timekeeping on their mmtimer page "due to inaccuracy and
frequent silicon bugs" (and I guess they have the data for that).
For scheduling and time accounting it seems to work reasonably though,
even though I've had some problems with inaccuracies (e.g. when you
instrument both the 8254 and the APIC timer and log the TSCs there are
sometimes drifts).
The Linux local APIC timer setup could probably also be improved; for
example the 16 multiplier is a bit dubious and the calibration does not
look very robust.
> When can we expect to see this in a real cpu?
mmtimers? They are in the chipset, not in the CPU.
They are already in some modern Intel and AMD chipsets, for example, and
Microsoft is pushing them too, so I guess they will soon be in all new
chipsets.
-Andi
Matti Aarnio writes:
> On Wed, Apr 17, 2002 at 02:01:42AM -0400, Robert Love wrote:
>> On Wed, 2002-04-17 at 01:34, Linus Torvalds wrote:
>>> No, it also makes it much easier to convert to/from the standard UNIX time
>>> formats (ie "struct timeval" and "struct timespec") without any surprises,
>>> because a jiffy is exactly representable in both if you have a HZ value
>>> of 100 or 1000, but not if your HZ is 1024.
>>
>> Exactly - this was my issue. So what _was_ the rationale behind Alpha
>> picking 1024 (and others following)? More importantly, can we change to
>> 1000?
>
> Alpha processors don't have full division hardware; they have to
> iterate it one bit at a time. They do have a flash multiplier,
> and a barrel-shifter. Shifts take one pipeline cycle, like
> addition and subtraction. Multiply takes 6-12 cycles depending on
> the model, but division takes 64...
Division by 1000 is a UMULH followed by a right shift.
So maybe it costs you one cycle more than division by 1024 would.
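Concretely, in C rather than Alpha assembly (a sketch; GCC's unsigned
__int128 multiply-high is exactly that UMULH, and the constant is
ceil(2^73 / 1000)):

	#include <stdint.h>

	/* x / 1000 as one multiply-high plus one shift.  Exact for x up to
	 * about 1.5e19, i.e. comfortably past 2^63; shifting x right by 3
	 * first and using >> 70 instead covers the whole 64-bit range. */
	static inline uint64_t div1000(uint64_t x)
	{
		const uint64_t m = 9444732965739290428ULL;	/* ceil(2^73 / 1000) */
		uint64_t hi = (uint64_t)(((unsigned __int128)x * m) >> 64);	/* UMULH */
		return hi >> 9;
	}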
> I must not be making myself clear :) The overhead has nothing to do
> with hardware. It is all timer list insertion and deletion. The
> problem is that we need to do this at context switch rates, which are
> MUCH higher than tick rates and, even with the O(1) insertion code,
> cause the overhead to increase above the ticked overhead.
I remain unconvinced. Firstly, the timer changes do not have to
occur at schedule rate unless your implementation is incredibly naive.
Secondly, for the specific schedule case done that way, it would be even more
naive to use the standard timer API over a single compare of the
timer list versus the schedule clock.
On Tue, 23 Apr 2002, Andi Kleen wrote:
> That's the local APIC timer. Pretty much all modern x86 have it.
> But at least Microsoft warns against using them for high precision
> timekeeping on their mmtimer page "due to inaccuracy and
> frequent silicon bugs" (and I guess they have the data for that).
That's nothing new -- I recall a problem of missing half a tick each
time the hardware reloads the timer after reaching zero with certain
revisions of Pentium CPUs. It is documented in the specification update.
> The Linux local APIC timer setup could probably also be improved; for
> example the 16 multiplier is a bit dubious and the calibration does not
> look very robust.
When fiddling with the predivider, please keep in mind the i82489DX only
supports 2, 4, 8 and 16 as dividers and you may set up 1 (i.e. no
division) but in a different way -- by setting LVTT appropriately (use
SET_APIC_TIMER_BASE(APIC_TIMER_BASE_CLKIN)).
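For illustration, the divide-by-1 setup would look something like this (a
sketch from memory of the 2.4-era <asm/apic.h> and <asm/apicdef.h> names;
treat the exact macros and the periodic-mode flags as assumptions and check
your tree before using it):

	/* program LVTT with the undivided CLKIN base instead of the usual
	   SET_APIC_TIMER_BASE(APIC_TIMER_BASE_DIV) plus an APIC_TDCR divisor */
	unsigned int lvtt = SET_APIC_TIMER_BASE(APIC_TIMER_BASE_CLKIN)
			  | APIC_LVT_TIMER_PERIODIC | LOCAL_TIMER_VECTOR;
	apic_write_around(APIC_LVTT, lvtt);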
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +
Alan Cox wrote:
>
> > I must not be making myself clear :) The overhead has nothing to do
> > with hardware. It is all timer list insertion and deletion. The
> > problem is that we need to do this at context switch rates, which are
> > MUCH higher than tick rates and, even with the O(1) insertion code,
> > cause the overhead to increase above the ticked overhead.
>
> I remain unconvinced. Firstly, the timer changes do not have to
> occur at schedule rate unless your implementation is incredibly naive.
OK, I'll bite, how do you stop a task at the end of its slice if you
don't set up a timer event for that time?
> Secondly, for the specific schedule case done that way, it would be even more
> naive to use the standard timer API over a single compare of the
> timer list versus the schedule clock.
I guess it is my day to be naive :) What are you suggesting here?
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
> > I remain unconvinced. Firstly, the timer changes do not have to
> > occur at schedule rate unless your implementation is incredibly naive.
>
> OK, I'll bite, how do you stop a task at the end of its slice if you
> don't set up a timer event for that time?
At a high scheduling rate you task switch more often than you hit the timer,
so you want to handle it in a lazy manner most of the time. I.e. so long as
the timer goes off before the time slice expires, why frob it?
> > Secondly, for the specific schedule case done that way, it would be even more
> > naive to use the standard timer API over a single compare of the
> > timer list versus the schedule clock.
>
> I guess it is my day to be naive :) What are you suggesting here?
At the point you think about setting the timer register you do:
	/* next_clock: the earlier of the first pending timer and the end
	   of the current slice; only reprogram if it moved earlier */
	next_clock = first_of(timers->head, next_timeslice);
	if (before(next_clock, current_clock))
	{
		current_clock = next_clock;
		set_timeout(next_clock);
	}
Alan Cox wrote:
>
> > > I remain unconvinced. Firstly, the timer changes do not have to
> > > occur at schedule rate unless your implementation is incredibly naive.
> >
> > OK, I'll bite, how do you stop a task at the end of its slice if you
> > don't set up a timer event for that time?
>
> At a high scheduling rate you task switch more often than you hit the timer,
> so you want to handle it in a lazy manner most of the time. I.e. so long as
> the timer goes off before the time slice expires, why frob it?
So then we test for this condition (avoiding races, of course) and if
so, what? We will have a timer interrupt prior to the slice end, and
will have to make this decision all over again. However, the real rub
is that we have to keep track of elapsed time and account for that (i.e.
shorten the remaining slice) not only in the timer interrupt, but each
context switch. We are still doing more work each schedule and making
it "smaller" just puts off the inevitable, i.e. at some level of
scheduling activity we will accumulate more time in this accounting code
than in the current "flat" or constant overhead way of doing things.
>
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
> so, what? We will have a timer interrupt prior to the slice end, and
> will have to make this decision all over again. However, the real rub
Only on unusual occasions.
> is that we have to keep track of elapsed time and account for that (i.e.
> shorten the remaining slice) not only in the timer interrupt, but each
We do anyway
Alan
Alan Cox wrote:
>
> > so, what? We will have a timer interrupt prior to the slice end, and
> > will have to make this decision all over again. However, the real rub
>
> Only on unusual occasions.
>
> > is that we have to keep track of elapsed time and account for that (i.e.
> > shorten the remaining slice) not only in the timer interrupt, but each
>
> We do anyway
Yes, but now we do all this in the timer tick, not in schedule(). This
occurs much less often.
>
> Alan
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
> > We do anyway
>
> Yes, but now we do all this in the timer tick, not in schedule(). This
> occurs much less often.
Well, in the timer tick code we already hold the locks needed to check
the front of the timer queue safely, and we already have current and the top
timer needing to touch the cache (current for accounting stats at the least).
So that's what, an extra compare and cmov - 1 clock, maybe 2?
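In code terms it is no more than something like this in the tick handler
(pseudo-code: next_timer_expiry(), reprogram_timer() and programmed_event
are made-up names for "peek at the queue head", "one register write" and
"whatever the hardware was last set to"):

	/* the timer queue locks are already held here */
	slice_end  = jiffies + current->counter;	/* end of the current slice */
	next_event = min(next_timer_expiry(), slice_end);
	if (next_event != programmed_event) {		/* the extra compare */
		programmed_event = next_event;
		reprogram_timer(next_event);
	}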
Alan Cox wrote:
>
> > > We do anyway
> >
> > Yes, but now we do all this in the timer tick, not in schedule(). This
> > occurs much less often.
>
> Well, in the timer tick code we already hold the locks needed to check
> the front of the timer queue safely, and we already have current and the top
> timer needing to touch the cache (current for accounting stats at the least).
> So that's what, an extra compare and cmov - 1 clock, maybe 2?
The problem is the extra code in the schedule() path, not in the timer
tick path. It is traversed FAR more often.
The current tick at 1/HZ is really quite relaxed. Given the PIT (ugh!)
the longest we can put off a tick is about 50 ms. This means that any
time greater than this will require more than one interrupt, i.e. the
best-case improvement by going tickless (again given the PIT) is about
5 times. Other platforms/hardware, of course, change this.
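(For the record: the PIT counts a 1.193182 MHz clock down through a 16-bit
register, so the longest period you can program is 65536 / 1193182 s, about
54.9 ms; 54.9 ms against the 10 ms tick of HZ=100 is where that roughly 5x
comes from.)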
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
> The problem is the extra code in the schedule() path, not in the timer
> tick path. It is traversed FAR more often.
That's still in most cases a single compare. The tick timer will mostly be
going off before our time slice completes. Also, importantly, the more we
context switch the fewer timers go off - so it scales correctly.
> The current tick at 1/HZ is really quite relaxed. Given the PIT (ugh!)
> the longest we can put off a tick is about 50 ms. This means that any
> time greater than this will require more than one interrupt, i.e. the
> best-case improvement by going tickless (again given the PIT) is about
> 5 times. Other platforms/hardware, of course, change this.
If you are arguing that the PIT makes it impractical on basic x86 then
we are in violent agreement. I don't propose this kind of stuff for the
PIT but for real computers where a timer reload is a couple of clocks