1999-01-06 00:45:49

by Kurt Garloff

Subject: [PATCH] HZ change for ix86

Hey guys,

lately, I've seen a couple of questions about changing HZ in the kernel for
ix86. Your scheduler will run more often and your system might feel snappier
when increasing HZ, that's why we want it. Overhead for doing so got
relatively low with recent CPUs, so we might really want it.

Basically, you can change it without causing any problems within the kernel.
However the HZ based value is reported by a system call sys_times() to the
userspace and as well from the proc fs.

So, this has to be fixed to have a seamless HZ change.

I created a patch which changes the values of HZ to 400 and fixed all places
I could spot which report the jiffies value to userspace. I think I caught
all of them. Note that 400 is a nice value, because we have to divide the
values by 4 then, which the gcc optimizes to shift operations, which can be
done in one or two cycles each and even parallelized on modern CPUs. Integer
divisions are slow on the ix86 (~20 cycles) and the sys_times() needs four of
them. You can easily change HZ to 500 and HZ_TO_STD to 5 in asm-i386/param.h,
but then you would lose 80 CPU cycles per sys_times() syscall, which you
might want to avoid.
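
A minimal sketch of the conversion described above, for illustration only. It is not taken from 21125-hz-change.diff; the helper name jiffies_to_std() is hypothetical, and only HZ and HZ_TO_STD appear in the mail. With HZ_TO_STD a power of two, the division on an unsigned value can be compiled to a shift, which is where the saving over four ~20-cycle divisions (about 80 cycles per sys_times() call) would come from.

```c
/* Sketch only: scale internal tick counts back to the 100 Hz units that
 * userspace expects. jiffies_to_std() is a hypothetical helper name. */
#define HZ        400   /* internal timer frequency, as in the patch */
#define HZ_TO_STD 4     /* HZ / 100; a power of two */

static unsigned long jiffies_to_std(unsigned long ticks)
{
    /* For unsigned operands, dividing by 4 is equivalent to ticks >> 2,
     * so the compiler does not need a real division instruction. */
    return ticks / HZ_TO_STD;
}
```

With HZ set to 500 and HZ_TO_STD set to 5, the same function would still be correct, but the divisor is no longer a power of two.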

This patch works here since 2.1.125, so you can consider it well tested on
one system.

Have fun !
--
Kurt Garloff <[email protected]> [Dortmund, FRG]
Plasma physics, high perf. computing [Linux-ix86,-axp, DUX]
PGP key on http://www.garloff.de/kurt/ [Linux SCSI driver: DC390]


Attachments:
21125-hz-change.diff (4.01 kB)

1999-01-06 12:34:45

by Benjamin Scherrey

Subject: Re: [PATCH] HZ change for ix86

Kurt -

Thanx for the insightful information about the impact of changing the HZ
values. Questions: a) how platform specific is this setting (i86, ALPHA, et al),
and b) does increasing the HZ value increase the number of context switches or
the duration of each context?

This sounds like an excellent developer's config option to me.... Any chance
of this happening soon?

regards,

Ben Scherrey

Kurt Garloff wrote:
<snip>

> lately, I've seen a couple of questions about changing HZ in the kernel for
> ix86. Your scheduler will run more often and your system might feel snappier
> when increasing HZ, that's why we want it. Overhead for doing so got
> relatively low with recent CPUs, so we might really want it.
>
> Basically, you can change it without causing any problems within the kernel.
> However the HZ based value is reported by a system call sys_times() to the
> userspace and as well from the proc fs.
>
> So, this has to be fixed to have a seamless HZ change.
>
> I created a patch which changes the values of HZ to 400 and fixed all places
> I could spot which report the jiffies value to userspace. I think I caught
> all of them. Note that 400 is a nice value, because we have to divide the
> values by 4 then, which the gcc optimizes to shift operations, which can be
> done in one or two cycles each and even parallelized on modern CPUs. Integer
> divisions are slow on the ix86 (~20 cycles) and the sys_times() needs four of
> them. You can easily change HZ to 500 and HZ_TO_STD to 5 in asm-i386/param.h,
> but then you would lose 80 CPU cycles per sys_times() syscall, which you
> might want to avoid.

<snip>


1999-01-07 01:44:46

by B. James Phillippe

Subject: Re: [PATCH] HZ change for ix86

On Tue, 5 Jan 1999, Kurt Garloff wrote:

> Hey guys,
>
> lately, I've seen a couple of questions about changing HZ in the kernel for
> ix86. Your scheduler will run more often and your system might feel snappier
> when increasing HZ, that's why we want it. Overhead for doing so got
> relatively low with recent CPUs, so we might really want it.
...
> I created a patch which changes the values of HZ to 400 and fixed all places
> I could spot which report the jiffies value to userspace. I think I caught
> all of them. Note that 400 is a nice value, because we have to divide the
> values by 4 then, which the gcc optimizes to shift operations, which can be
> done in one or two cycles each and even parallelized on modern CPUs. Integer
> divisions are slow on the ix86 (~20 cycles) and the sys_times() needs four of

I don't know anything about it (and my box is an Alpha for which HZ is
1024), but, one ignorant proposal: would it perhaps be worthwhile to have
the HZ value higher for faster (x86) systems based on the target picked in
make config? Say, your 400 for Pentium+ and 100 for 486 or lower..?

cheers,
-bp
--
B. James Phillippe . [email protected]
Linux Engineer/Admin . http://www.terran.org/~bryan
Member since 1.1.59 . finger:[email protected]


1999-01-07 01:49:31

by Egil Kvaleberg

Subject: Re: [PATCH] HZ change for ix86

On 5 Jan 1999, Kurt Garloff wrote:

> your system might feel snappier
> when increasing HZ

Not just "might feel" snappier. Somewhat to my surprise, it can also
significantly increase overall performance. On my twin hacked-Celeron
machine, which is temporarily set up with a slow 2Gb IDE disk for test
purposes, a simple HZ change from 100 to 1000 decreased the time for a full
kernel compile from almost 4 minutes to just above 3 minutes. Presumably due
to better CPU utilization.

> I created a patch which changes the values of HZ to 400 and fixed all places
> I could spot which report the jiffies value to userspace.

At the very least, I think you should say something along these lines:

#define HZ 400

#define HZ_STD 100
#define HZ_TO_STD (HZ / HZ_STD)

But doesn't this entire approach seem a bit hackish? As you note, there is
an overhead involved. And unless you want to introduce floating point
(ouch!), it fails miserably for many useful values of HZ. Finally, it also
does not provide user space with the benefit of increased time resolution.
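
A small sketch of the failure mode mentioned above, under the assumption that the scaling factor is always computed by integer division. The function reported_ticks_per_second() is a hypothetical name used only for this illustration; the #error guard is one possible way to reject unsuitable HZ values at compile time, not something proposed in the thread.

```c
#define HZ        400
#define HZ_STD    100
#define HZ_TO_STD (HZ / HZ_STD)

/* Refuse to build if HZ is not an integer multiple of HZ_STD. */
#if (HZ % HZ_STD) != 0
# error "HZ must be an integer multiple of HZ_STD for this scheme"
#endif

/* How many ticks userspace would see for one real second if an arbitrary
 * hz were scaled with a truncated integer factor. Assumes hz >= HZ_STD,
 * otherwise the factor would be zero. */
static unsigned long reported_ticks_per_second(unsigned long hz)
{
    unsigned long factor = hz / HZ_STD;   /* truncates, e.g. 1024 / 100 = 10 */
    return hz / factor;                   /* e.g. 1024 / 10 = 102, not 100 */
}
```

For hz = 400 the result is exactly 100, while for hz = 1024 it is 102 and for hz = 250 it is 125, so reported times would run fast by 2% and 25% respectively.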

IMHO, the right thing would be to implement CLK_TCK properly as a true
reflection of HZ. Now, it seems to be fixed: e.g. 100 for i386, and 1024
for alpha.

The easiest approach would be to make "timebits.h" pick up HZ from the
kernel, thus:

#include <asm/param.h>
#define CLK_TCK HZ

The downside is of course that programs would need to be recompiled for any
change in HZ. The best thing would be to fix CLK_TCK at runtime. But could
this possibly break anything?

Re. HZ, there are probably a couple of other places that also need to be
cleaned up. In timex.h, I came across this:

#ifdef __alpha__
# define SHIFT_HZ 10 /* log2(HZ) */
#else
# define SHIFT_HZ 7 /* log2(HZ) */
#endif

Trivial to fix, though, with something akin to "#if HZ >= 1000".
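
One way such a fix could look, as a sketch: pick SHIFT_HZ from the range HZ falls into, so that 2^SHIFT_HZ is the power of two closest to HZ. The range boundaries below are an assumption made for this illustration and cover only a few values; they reproduce the two existing cases (7 for HZ = 100, 10 for HZ = 1024).

```c
#define HZ 400   /* example value; normally comes from <asm/param.h> */

/* Choose SHIFT_HZ so that (1 << SHIFT_HZ) approximates HZ. */
#if HZ >= 96 && HZ < 192
# define SHIFT_HZ 7    /* covers HZ = 100 */
#elif HZ >= 192 && HZ < 384
# define SHIFT_HZ 8
#elif HZ >= 384 && HZ < 768
# define SHIFT_HZ 9    /* covers HZ = 400 and 500 */
#elif HZ >= 768 && HZ < 1536
# define SHIFT_HZ 10   /* covers HZ = 1000 and 1024 */
#else
# error "No SHIFT_HZ defined for this HZ in the sketch"
#endif
```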

Egil
--
Email: [email protected] Voice: +47 22523641, 92022870 Fax: +47 22525899
Snail: Egil Kvaleberg, Husebybakken 14A, 0379 Oslo, Norway
URL: http://www.kvaleberg.no/


1999-01-07 02:11:45

by Kurt Garloff

Subject: Re: [PATCH] HZ change for ix86

On Tue, Jan 05, 1999 at 10:57:16PM -0500, Benjamin Scherrey wrote:
> Kurt -
>
> Thanx for the insightful information about the impact of changing the HZ
> values. Questions: a) how platform specific is this setting (i86, ALPHA, et al),
> and b) does increasing the HZ value increase the number of context switches or
> the duration of each context?

a)
The HZ value differs between architectures; the alpha, for example, has HZ
set to 1024. That's why the kernel core has to be independent of it.
The way I coded it, it will break compilation on other archs, as I was too
lazy to put the constant HZ_TO_STD into the header files of other archs.
Of course, we could use something like

#ifndef HZ_TO_STD
#define HZ_TO_STD 1
#endif

in kernel/sys.c

b)
The timer interrupt and therefore the scheduler will be called more often.
If more than one process competes for the CPU (R state), then switches
between these processes will occur more often, about 4 times as often.
If I understood correctly, the bottom half processing of the kernel is also
tied to the timer interrupt and will thus happen more often.

It sped up some of my numerical computations on my SMP machine, BTW. I
have rc5des running (idle priority, Rik's patch), and some threads sleeping
and waiting for some job to be submitted to them. However, after they have
been signalled, they only start at the next scheduler tick. So the HZ
value influences scheduling latency. Unfortunately my program is not very
well parallelized, so the jobs to be done by the threads are very short and
take about the same time as the scheduler latency. Now, with 400 Hz it was
much better ...
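
To put numbers on the latency argument: if a woken thread has to wait for the next timer tick, the worst-case delay is about one tick period, 1/HZ. A minimal sketch of that arithmetic, where tick_period_us() is a hypothetical helper name:

```c
/* Length of one timer tick in microseconds for a given HZ (truncated). */
static unsigned long tick_period_us(unsigned long hz)
{
    return 1000000UL / hz;
}
```

This gives 10000 us for HZ = 100, 2500 us for HZ = 400 and 976 us for HZ = 1024, so jobs that take only a few milliseconds are dominated by the wait at 100 Hz.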

> This sounds like an excellent developer's config option to me.... Any
> chance of this happening soon?

This is not up to me.
I can however create a cleaned up patch and put it on my website, if enough
people want it. It will take some days, though, as I'm very busy.

Regards,
--
Kurt Garloff <[email protected]> [Dortmund, FRG]
Plasma physics, high perf. computing [Linux-ix86,-axp, DUX]
PGP key on http://www.garloff.de/kurt/ [Linux SCSI driver: DC390]

1999-01-07 02:11:46

by Kurt Garloff

Subject: Re: [PATCH] HZ change for ix86

On Tue, Jan 05, 1999 at 09:25:25PM -0800, B. James Phillippe wrote:
> I don't know anything about it (and my box is an Alpha for which HZ is
> 1024), but, one ignorant proposal: would it perhaps be worthwhile to have
> the HZ value higher for faster (x86) systems based on the target picked in
> make config? Say, your 400 for Pentium+ and 100 for 486 or lower..?

Yes, I think this would be a good idea.
No time to code it into the CONFIG files, right now, though ...
If Linus tells me: "Hey, do it, it will be integrated then!" I will have
time, of course.

--
Kurt Garloff <[email protected]> [Dortmund, FRG]
Plasma physics, high perf. computing [Linux-ix86,-axp, DUX]
PGP key on http://www.garloff.de/kurt/ [Linux SCSI driver: DC390]

1999-01-07 19:54:50

by Chris Wedgwood

Subject: Re: [PATCH] HZ change for ix86

On Tue, Jan 05, 1999 at 10:57:16PM -0500, Benjamin Scherrey wrote:

> Thanx for the insightful information about the impact of
> changing the HZ values. Questions: a) how platform specific is this
> setting (i86, ALPHA, et al)

Each platform is different; for example, x86 is 100, alpha is 1024.
Some RT people use 10,000 or so on x86, I believe (you can also do
this to get more fine-grained shaper control).

> and b) does increasing the HZ value increase the number of context
> switches or the duration of each context?

no

> This sounds like an excellent developer's config option to me....

Why? What are you trying to achieve?



-cw

1999-01-07 19:55:12

by Chris Wedgwood

Subject: Re: [PATCH] HZ change for ix86

On Tue, Jan 05, 1999 at 09:25:25PM -0800, B. James Phillippe wrote:

> I don't know anything about it (and my box is an Alpha for which HZ
> is 1024), but, one ignorant proposal: would it perhaps be
> worthwhile to have the HZ value higher for faster (x86) systems
> based on the target picked in make config? Say, your 400 for
> Pentium+ and 100 for 486 or lower..?

I must have missed the rest of this thread -- what exactly are
people wanting to achieve with a higher timer frequency?



-cw

1999-01-07 20:20:10

by Pavel Machek

Subject: Re: [PATCH] HZ change for ix86

Hi!

> > I don't know anything about it (and my box is an Alpha for which HZ is
> > 1024), but, one ignorant proposal: would it perhaps be worthwhile to have
> > the HZ value higher for faster (x86) systems based on the target picked in
> > make config? Say, your 400 for Pentium+ and 100 for 486 or lower..?
>
> Yes, I think this would be a good idea.
> No time to code it into the CONFIG files, right now, though ...
> If Linus tells me: "Hey, do it, it will be integrated then!" I will have
> time, of course.

You should _not_ need to increase HZ. But there have always been obscure
"features" in the scheduler, and an increased HZ works around them.

Pavel
--
The best software in life is free (not shareware)! Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+

1999-01-08 07:56:27

by Riley Williams

Subject: Re: [PATCH] HZ change for ix86

Hi James.

>> I created a patch which changes the values of HZ to 400 and fixed
>> all places I could spot which report the jiffies value to
>> userspace. I think I caught all of them. Note that 400 is a nice
>> value, because we have to divide the values by 4 then, which the
>> gcc optimizes to shift operations, which can be done in one or two
>> cycles each and even parallelized on modern CPUs. Integer
>> divisions are slow on the ix86 (~20 cycles) and the sys_times()
>> needs four of them.

> I don't know anything about it (and my box is an Alpha for which HZ
> is 1024), but, one ignorant proposal: would it perhaps be
> worthwhile to have the HZ value higher for faster (x86) systems
> based on the target picked in make config?

> Say, your 400 for Pentium+ and 100 for 486 or lower..?

If we were going to do this, I'd suggest 400 for Pentium+, 200 for 486
and 100 for 386 class systems as being more reasonable, and still
maintaining the shift-optimisation mentioned above...
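
A sketch of how such a per-CPU-class choice might look in asm-i386/param.h. The processor-family symbols CONFIG_M386 and CONFIG_M486 are assumed here to be what make config defines for those targets; treat the names as illustrative. Every HZ_TO_STD value stays a power of two, so the shift optimisation is kept.

```c
/* Pick HZ by the CPU class selected at configuration time (sketch). */
#if defined(CONFIG_M386)
# define HZ        100
# define HZ_TO_STD 1
#elif defined(CONFIG_M486)
# define HZ        200
# define HZ_TO_STD 2
#else   /* Pentium and later */
# define HZ        400
# define HZ_TO_STD 4
#endif
```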

Best wishes from Riley.

---
* ftp://ps.cus.umist.ac.uk/pub/rhw/Linux
* http://ps.cus.umist.ac.uk/~rhw/kernel.versions.html


1999-01-08 08:00:10

by Dan Kegel

Subject: Re: [PATCH] HZ change for ix86

Egil Kvaleberg wrote:
> IMHO, the right thing would be to implement CLK_TCK properly as a true
> reflection of HZ. Now, it seems to be fixed: e.g. 100 for i386, and 1024
> for alpha.
>
> The easiest approach would be to make "timebits.h" pick up HZ from the
> kernel, thus:
> #include <asm/param.h>
> #define CLK_TCK HZ
> The downside is of course that programs would need to be recompiled for any
> change in HZ.
> The best thing would be to fix CLK_TCK at runtime.
e.g.
#define CLK_TCK new_function_to_get_HZ()
> But could this possibly break anything?

Yes, it would break user programs that were compiled before your change.
I know of two ways for user code to access system time right now:
clock() and times(). Both of these have constants (CLOCKS_PER_SEC and
CLK_TCK) that can't be changed without breaking user applications.
IMHO we need to leave those alone. I don't want to have to recompile my apps
to move them from 2.0.36 to 2.2.0. (I do like the idea of changing
HZ to a power-of-two multiple of CLK_TCK.)

Maybe we should create a new interface for user applications to get the true
system time in its native units, with the value of the native tick available
at runtime only. e.g. long sys_ticks(), long sys_ticks_per_second().
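
For comparison, POSIX already specifies a runtime query for the unit used by times(), namely sysconf(_SC_CLK_TCK). The sketch below shows a program that never bakes a tick rate in at compile time. It is not the proposed sys_ticks() interface; cpu_seconds_used() is a hypothetical helper name, and whether a given libc reports the kernel's real HZ through this call is a separate question.

```c
#include <sys/times.h>
#include <unistd.h>

/* CPU time (user + system) consumed by this process so far, in seconds.
 * Returns a negative value if the tick rate or the times cannot be read. */
static double cpu_seconds_used(void)
{
    struct tms t;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);   /* queried at runtime */

    if (ticks_per_sec <= 0 || times(&t) == (clock_t)-1)
        return -1.0;

    return (double)(t.tms_utime + t.tms_stime) / (double)ticks_per_sec;
}
```

A binary written this way keeps working if the reported tick rate changes between kernels, at the cost of one extra library call.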
- Dan
--
Speaking only for myself, not for my employer

1999-01-09 12:22:31

by Richard B. Johnson

Subject: Re: [PATCH] HZ change for ix86

On Thu, 7 Jan 1999, Pavel Machek wrote:

> Hi!
>
> > > I don't know anything about it (and my box is an Alpha for which HZ is
> > > 1024), but, one ignorant proposal: would it perhaps be worthwhile to have
> > > the HZ value higher for faster (x86) systems based on the target picked in
> > > make config? Say, your 400 for Pentium+ and 100 for 486 or lower..?
> >
> > Yes, I think this would be a good idea.
> > No time to code it into the CONFIG files, right now, though ...
> > If Linus tells me: "Hey, do it, it will be integrated then!" I will have
> > time, of course.
>
> You should _not_ need to increase HZ. But there have always been obscure
> "features" in the scheduler, and an increased HZ works around them.
>
> Pavel

There seems to be some general misinformation about what the HZ value is.
I will "simplicate and add lightness".

If your code does:
while (1)
;

The CPU gets taken away from you HZ times per second so that other
tasks can use the CPU cycles you are wasting. Under these conditions
it seems like a good idea to make the HZ value as high as possible.

However, if your code is doing:

UncompressFonts(...........);
ReadAudioFromDsp(...........);
ConvolveImageData(..........);

you don't want the CPU taken away until you are done.

For most interactive applications, it has been experimentally determined
that 100 Hz is (about) right because human beings can't detect flicker
above 80 Hz. In other words, getting the CPU stolen 100 times per second
doesn't produce visual effects. The higher the HZ value, the more
often the CPU gets stolen from the interactive user. There are trade-
offs as with most everything.

More is not better. More is just different.

Cheers,
Dick Johnson
***** FILE SYSTEM WAS MODIFIED *****
Penguin : Linux version 2.1.131 on an i686 machine (400.59 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.
Wisdom : It's not a Y2K problem. It's a Y2Day problem.



1999-01-09 14:13:11

by Kurt Garloff

Subject: Re: [PATCH] HZ change for ix86

On Fri, Jan 08, 1999 at 09:14:14AM -0500, Richard B. Johnson wrote:
> On Thu, 7 Jan 1999, Pavel Machek wrote:
>
> > You should _not_ need to increase HZ. But there have always been obscure
> > "features" in the scheduler, and an increased HZ works around them.
> >
> > Pavel
>
> There seems to be some general misinformation about what the HZ value is.
> I will "simplicate and add lightness".
>
> If your code does:
> while (1)
> ;
>
> The CPU gets taken away from you HZ times per second so that other
> tasks can use the CPU cycles you are wasting. Under these conditions
> it seems like a good idea to make the HZ value as high as possible.
>
> However, if your code is doing:
>
> UncompressFonts(...........);
> ReadAudioFromDsp(...........);
> ConvolveImageData(..........);
>
> you don't want the CPU taken away until you are done.
>
> For most interactive applications, it has been experimentally determined
> that 100 Hz is (about) right because human beings can't detect flicker
> above 80 Hz. In other words, getting the CPU stolen 100 times per second
> doesn't produce visual effects. The higher the HZ value, the more
> often the CPU gets stolen from the interactive user. There are trade-
> offs as with most everything.
>
> More is not better. More is just different.

I think you didn't describe the tradeoff correctly. The amount of time a
process gets is independent of HZ. HZ just defines its granularity.

Let me describe this with a picture, where we have two runnable processes A
and B:

          . . . . . . . .
HZ = 100: AAAAAAAABBBBBBBBAAAAAAAABBBBBBBB
HZ = 400: AABBAABBAABBAABBAABBAABBAABBAABB

Both processes have the same priority and get the same amount of CPU. This
is independent of the HZ value.
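
The picture can be reproduced with a few lines of code. This is a toy model, not the scheduler: it assumes two equal-priority processes that simply alternate, and the function names timeline() and share() are made up for the illustration.

```c
#include <stddef.h>

/* Write an alternating A/B timeline of len cells into out and terminate
 * it with NUL (out must hold len + 1 bytes). slice is the number of
 * cells a process runs before the other one takes over; it must be > 0. */
static void timeline(char *out, size_t len, size_t slice)
{
    for (size_t i = 0; i < len; i++)
        out[i] = ((i / slice) % 2 == 0) ? 'A' : 'B';
    out[len] = '\0';
}

/* Count how many cells of the timeline went to process who. */
static size_t share(const char *line, char who)
{
    size_t n = 0;
    for (; *line != '\0'; line++)
        if (*line == who)
            n++;
    return n;
}
```

Calling timeline(buf, 32, 8) yields the HZ = 100 row and timeline(buf, 32, 2) the HZ = 400 row; share(buf, 'A') is 16 in both cases, which is the point: the CPU share is the same, only the granularity differs.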

Basically, there are timeslices which are 1/HZ long (1/100s on ix86). Every
1/100s the timer interrupt interrupts the CPU, and the scheduler is run and
decides which process should get the CPU. Most processes won't even be
considered, as they are waiting for something and thus sleeping (S in ps).
The ones that are runnable (R in ps) will be considered, and their priority
and the amount of time the current process has already had the CPU will be
used to calculate (goodness) which process is next. Note that the current
process is not changed on every timer tick, so you don't have a 100 Hz
switching frequency.
There are some exceptions: Real time processes (SCHED_FIFO or SCHED_RR) will
be treated differently. If a sleeping process becomes runnable (which happens
by an asynchronous event which is processed by the kernel), it will
immediately get the CPU => Good interactive performance. Signals, I
believe, can also result in the scheduler being run. The scheduler is also
run on some other occasions, e.g. whenever the current process goes to sleep
or performs certain system calls.
[This is my understanding of the scheduler from reading the code and reading
postings in the kernel-list, and some details might be slightly wrong.
Please correct me, if you know better.]


Now, if the scheduler itself and the task switching (and the cache updating)
did not need any time, it would be better to have very high HZ values,
because the multitasking would be more fine grained and the picture of
parallel running processes would be more perfect.

However, there is some overhead: Every time the timer generates an
interrupt, the CPU has to go to kernel mode (CPL 0), which takes some time,
has to run the scheduler and calculate priorities, which takes some time,
and has to choose a process and go back to user mode (CPL 3), which again
takes some time. If we changed the process, the contents of the caches will
be mostly useless, which again costs some performance.
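
A back-of-the-envelope sketch of the direct part of this overhead. The cost per tick used in the example (2000 cycles) and the 400 MHz clock are assumed round numbers, not measurements, and cache effects after a process switch are not modelled. The helper name tick_overhead_ppm() is hypothetical.

```c
/* Share of CPU time spent handling timer ticks, in parts per million:
 * (ticks per second * cycles per tick) / (CPU cycles per second). */
static unsigned long tick_overhead_ppm(unsigned long hz,
                                       unsigned long cycles_per_tick,
                                       unsigned long cpu_hz)
{
    unsigned long long cycles_per_sec =
        (unsigned long long)hz * cycles_per_tick;

    return (unsigned long)(cycles_per_sec * 1000000ULL / cpu_hz);
}
```

Under those assumptions, HZ = 100 costs 500 ppm (0.05%) and HZ = 400 costs 2000 ppm (0.2%) of a 400 MHz CPU: four times as much, but still small in absolute terms.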

Now, with the 386, Linus decided that 100HZ is a good compromise between
good parallelism and scheduling overhead. When he started working on the
alpha, much later, these machines were much faster and 1024HZ seemed the way
to go. So, no wonder people with today's iP-II machines, which are as fast
as the alphas some years ago, think that 100HZ is too low.

In some situations they may be right. If you have a couple of
processes which need the CPU, a higher HZ value would give a better
multitasking feeling. Note that it's only a better impression, not that
your system is really faster. (It might be a little bit, if your processes
are implemented in a way that they wait for something, but cannot sleep for
some reason. Most of the time this is poor design in your software, however.)

However, in normal situations (and there are exceptions, I know), if you
have many runnable processes, you are doing something wrong, as normally
processes should sleep, because they have to wait for something.

Also the above stated exceptions on the scheduler running out of order are
important and make the HZ value less important.

Ingo Molnar told me, that there was sort of a bug in the SMP scheduler code
which caused a woken up thread to only get the CPU after the next HZ tick,
if it was to run on the other CPU. This resulted in poor performance in some
of my multithreaded numerical apps, and on researching why this happens, I
saw it took some time till the subthread was woken up. It got remarkably
better after I set my scheduler to 400HZ, that's why I created the patch.
Note that I had rc5des running (SCHED_IDLE with Rik's patch) in the
background.

Now he told me that this was fixed in 2.2.0-preX, but I couldn't test it yet.
I'm really curious to see how it performs when rc5des is running reniced with
the new kernels, as Ingo also told me it will be almost like idle.


Maybe there were or even are other scheduler strangenesses which cause a
slight performance increase or better interactive feeling with higher HZ
values. Also there might be (probably poorly designed) processes which
might profit from higher HZ values.


So the question is: Can we increase the HZ value on Pentium or PII class
machines without wasting too much time? I think we can. It won't hurt anybody
and will probably even result in slightly better performance in some cases.
And it will allow more fine grained time accounting, BTW. (It's no accident
that the resolution of the time command is only 1/100s for user and system
time on the ix86.)

However, as I know Linus, he will ask another question: Do we need it?
Isn't it better to fix the issues with the scheduler (if there are any left)
and tell the people to fix their apps instead of changing the kernel?
Linus never likes anything which just hides problems, and a higher HZ value
might hide problems with the scheduler or other parts of the kernel or
applications.


I think there is some advantage in having higher HZ values: More precise
time accounting and the possibility of more fine grained priorities.
However, neither of these is really important, and I doubt that Linus will
be convinced by this. As said, the real reason for increasing HZ has probably
been resolved by MIngo's 2.2.0 scheduler changes.

So, please give better reasons.
Maybe some pseudo-RT apps?
Maybe userspace apps which do things the kernel doesn't properly support, so
they cannot sleep?


Sorry, this got too long! Have a nice evening ...
--
Kurt Garloff <[email protected]> [Dortmund, FRG]
Plasma physics, high perf. computing [Linux-ix86,-axp, DUX]
PGP key on http://www.garloff.de/kurt/ [Linux SCSI driver: DC390]