2002-10-04 22:48:37

by Dave Hansen

Subject: [RFC][PATCH] HZ as a config option

diff -ur linux-2.5.40/arch/i386/Config.help linux-2.5.40-hz_config/arch/i386/Config.help
--- linux-2.5.40/arch/i386/Config.help 2002-10-01 00:06:17.000000000 -0700
+++ linux-2.5.40-hz_config/arch/i386/Config.help 2002-10-04 15:25:52.000000000 -0700
@@ -850,6 +850,17 @@

If in doubt, say N.

+CONFIG_HZ
+ This is unrelated to your processor's speed. This variable alters
+ how often the system is asked to generate timer interrupts. A larger
+ value can lead to a more responsive system, but also causes extra
+ overhead from the increased number of context switches.
+
+ In older kernels, this was set to 100. In 2.5, it was set to 1000.
+ HZ must be greater than 11 and less than 1536.
+
+ If in doubt, leave it at the default of 1000.
+
CONFIG_CPU_FREQ_24_API
This enables the /proc/sys/cpu/ sysctl interface for controlling
CPUFreq, as known from the 2.4.-kernel patches for CPUFreq. Note
diff -ur linux-2.5.40/arch/i386/config.in linux-2.5.40-hz_config/arch/i386/config.in
--- linux-2.5.40/arch/i386/config.in 2002-10-04 14:25:29.000000000 -0700
+++ linux-2.5.40-hz_config/arch/i386/config.in 2002-10-04 15:21:21.000000000 -0700
@@ -208,6 +208,10 @@
fi
fi

+if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+ int 'Kernel Timer Frequency (HZ)' CONFIG_HZ 1000
+fi
+
tristate 'Toshiba Laptop support' CONFIG_TOSHIBA
tristate 'Dell laptop support' CONFIG_I8K

diff -ur linux-2.5.40/include/asm-i386/param.h linux-2.5.40-hz_config/include/asm-i386/param.h
--- linux-2.5.40/include/asm-i386/param.h 2002-10-01 00:06:20.000000000 -0700
+++ linux-2.5.40-hz_config/include/asm-i386/param.h 2002-10-04 14:54:04.000000000 -0700
@@ -2,7 +2,7 @@
#define _ASMi386_PARAM_H

#ifdef __KERNEL__
-# define HZ 1000 /* Internal kernel timer frequency */
+# define HZ CONFIG_HZ /* Internal kernel timer frequency */
# define USER_HZ 100 /* .. some user interfaces are in "ticks" */
# define CLOCKS_PER_SEC (USER_HZ) /* like times() */
#endif
diff -ur linux-2.5.40/include/linux/timex.h linux-2.5.40-hz_config/include/linux/timex.h
--- linux-2.5.40/include/linux/timex.h 2002-10-01 00:06:15.000000000 -0700
+++ linux-2.5.40-hz_config/include/linux/timex.h 2002-10-04 15:24:04.000000000 -0700
@@ -76,7 +76,7 @@
#elif HZ >= 768 && HZ < 1536
# define SHIFT_HZ 10
#else
-# error You lose.
+# error Please use a HZ value which is between 12 and 1535
#endif

/*


Attachments:
config_hz-2.5.40-1.patch (2.33 kB)

2002-10-05 00:39:17

by Alan Cox

Subject: Re: [RFC][PATCH] HZ as a config option

On Fri, 2002-10-04 at 23:53, Dave Hansen wrote:
> On large systems (like NUMA-Q, Intel Profusion, etc...), latency and
> user responsiveness become much less important. The extra scheduling
> overhead caused by higher HZ is bad.
>
> This is x86-only right now. Is there any wider desire to tune this at
> config time? Do any architectures have strict rules as to what this
> can be set to?

You can't set this arbitrarily, the NTP PLLs will only lock for certain
value ranges.

2002-10-07 16:53:26

by Dave Hansen

Subject: Re: [RFC][PATCH] HZ as a config option

Alan Cox wrote:
> On Fri, 2002-10-04 at 23:53, Dave Hansen wrote:
>
>>On large systems (like NUMA-Q, Intel Profusion, etc...), latency and
>>user responsiveness become much less important. The extra scheduling
>>overhead caused by higher HZ is bad.
>>
>>This is x86-only right now. Is there any wider desire to tune this at
>>config time? Do any architectures have strict rules as to what this
>>can be set to?
>
> You can't set this arbitrarily, the NTP PLLs will only lock for certain
> value ranges.

Where can I find these ranges? include/linux/timex.h only errors if
the number is out of the 12-1535 range.
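
The range check mentioned here can be sketched as a small function. This is only an illustration, not kernel code: `shift_hz_for` is a hypothetical name, and the lower bands (everything below the 768-1535 band visible in the patch) are assumed to follow the same doubling pattern starting at 12, which should be checked against the actual include/linux/timex.h.

```c
/*
 * Illustrative helper (hypothetical, not part of the kernel): returns
 * the SHIFT_HZ value the #if ladder in timex.h would pick for a given
 * HZ, or -1 where the ladder would fall through to #error.
 *
 * Assumption: each band is [lo, 2*lo), starting at lo = 12 with
 * shift 4 and ending with [768, 1536) at shift 10.
 */
int shift_hz_for(int hz)
{
    int lo = 12;
    int shift;

    for (shift = 4; shift <= 10; shift++, lo *= 2) {
        if (hz >= lo && hz < 2 * lo)
            return shift;
    }
    return -1; /* outside 12..1535: the header would refuse to build */
}
```

Under those assumptions, HZ=100 lands in the 96-191 band and HZ=1000 in the 768-1535 band. Passing this check says nothing about whether NTP can actually lock at that value.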

--
Dave Hansen
[email protected]

2002-10-07 17:05:07

by Alan Cox

Subject: Re: [RFC][PATCH] HZ as a config option

On Mon, 2002-10-07 at 17:58, Dave Hansen wrote:
> > You can't set this arbitrarily, the NTP PLLs will only lock for certain
> > value ranges.
>
> Where can I find these ranges? include/linux/timex.h only errors if
> the number is out of the 12-1535 range.

See Rolf Fokkens' message of 21st Sept for one bit about it; there was
another thread about error ranges as well.

2002-10-07 17:22:20

by George Anzinger

Subject: Re: [RFC][PATCH] HZ as a config option

Dave Hansen wrote:
>
> Alan Cox wrote:
> > On Fri, 2002-10-04 at 23:53, Dave Hansen wrote:
> >
> >>On large systems (like NUMA-Q, Intel Profusion, etc...), latency and
> >>user responsiveness become much less important. The extra scheduling
> >>overhead caused by higher HZ is bad.
> >>
> >>This is x86-only right now. Is there any wider desire to tune this at
> >>config time? Do any architectures have strict rules as to what this
> >>can be set to?
> >
> > > You can't set this arbitrarily, the NTP PLLs will only lock for certain
> > value ranges.
>
> Where can I find these ranges? include/linux/timex.h only errors if
> the number is out of the 12-1535 range.

The issue is not the value itself, but whether, using that
value and the PIT with its clock of 14.3181818/12 MHz, you
can come up with a count that is within ?? parts per million
of being right. I am not sure what ?? should be, but 20 comes
to mind.
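
To make the parts-per-million question concrete, here is a minimal sketch of the arithmetic. It assumes the PIT is programmed with the nearest integer divisor of its 14.3181818/12 MHz input clock; `pit_divisor` and `pit_error_ppm` are hypothetical helper names, and the kernel's own divisor macro is not reproduced here.

```c
/* PIT input clock in Hz: 14.3181818 MHz divided by 12. */
#define PIT_HZ (14318181.8 / 12.0)

/* Nearest integer divisor for the requested tick rate (hypothetical helper). */
long pit_divisor(int hz)
{
    return (long)(PIT_HZ / hz + 0.5);
}

/*
 * Error of the tick rate the PIT actually delivers, relative to the
 * requested HZ, in parts per million. Positive means ticks arrive
 * slightly faster than requested.
 */
double pit_error_ppm(int hz)
{
    double actual_hz = PIT_HZ / (double)pit_divisor(hz);

    return (actual_hz - hz) / hz * 1e6;
}
```

With these assumptions, HZ=100 gives a divisor of 11932 and an error of roughly -15 ppm, while HZ=1000 gives a divisor of 1193 and an error of roughly +152 ppm.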

All this is (incorrectly) covered over in 2.5 by using a
more "correct" value for tick_nsec. This keeps the wall
clock correct for most any value of HZ, BUT breaks the POSIX
standard that says NO TIMER SHALL EXPIRE BEFORE ITS TIME.
To prove this breakage, try this on a 2.5 system:

time sleep 60

Any answer less than 1 minute is BROKEN.

I think the correct way to do all this is to use something
other than the PIT as the clock reference AND adjust the
jiffies time, not the wall clock.

The High-res-timers patch I posted last Friday does the
first part, i.e. uses a different clock reference. The NTP
changes will come later.

-g

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-10-07 19:23:29

by Vojtech Pavlik

Subject: Re: [RFC][PATCH] HZ as a config option

On Mon, Oct 07, 2002 at 10:27:13AM -0700, george anzinger wrote:
> Dave Hansen wrote:
> >
> > Alan Cox wrote:
> > > On Fri, 2002-10-04 at 23:53, Dave Hansen wrote:
> > >
> > >>On large systems (like NUMA-Q, Intel Profusion, etc...), latency and
> > >>user responsiveness become much less important. The extra scheduling
> > >>overhead caused by higher HZ is bad.
> > >>
> > >>This is x86-only right now. Is there any wider desire to tune this at
> > >>config time? Do any architectures have strict rules as to what this
> > >>can be set to?
> > >
> > > > You can't set this arbitrarily, the NTP PLLs will only lock for certain
> > > value ranges.
> >
> > Where can I find these ranges? include/linux/timex.h only errors if
> > the number is out of the 12-1535 range.
>
> The issue is not the value itself, but if, by using the
> value and the PIT with its clock of 14.3181818/12 MHZ you
> can come up with a count that is ?? parts per million of
> being right. I am not sure what ?? should be but 20 comes
> to mind.
>
> All this is (incorrectly) covered over in 2.5 by using a
> more "correct" value for tick_nsec. This keeps the wall
> clock correct for most any value of HZ, BUT breaks the POSIX
> standard that says NO TIMER SHALL EXPIRE BEFORE ITS TIME.
> To prove this breakage, try this on a 2.5 system:
>
> time sleep 60
>
> Any answer less than 1 minute is BROKEN.

1 minute universal time or 1 minute gettimeofday time or 1 minute
internal kernel time?

The current kernels do the last ...
I guess correct would probably be the second ...
While the first is what the user would expect. ;)

Anyway, if you do a usleep(20000) on 2.4 (with a 100 Hz timer), you
probably expect it to sleep two (and not three) ticks, while with NTP
running you can easily find that it took 19998 usec. Or do we really want
it to go off at 39997 us?
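
The two-versus-three-ticks trade-off can be written down as a small sketch. It assumes timers expire only on tick boundaries, that the call lands at an arbitrary point inside the current tick, and that every tick has its nominal length of 1000000/HZ microseconds; all three helper names are hypothetical.

```c
/* Ticks needed to cover 'us' microseconds at the given HZ, rounded up. */
long ticks_ceil(long us, int hz)
{
    long tick_us = 1000000L / hz;

    return (us + tick_us - 1) / tick_us;
}

/*
 * Earliest expiry, in microseconds after the call, of a timer armed
 * for n ticks: if the call lands just before a tick boundary, the
 * first of the n ticks is almost entirely "free".
 */
long earliest_us(long n, int hz)
{
    return (n - 1) * (1000000L / hz);
}

/* Latest expiry: the call landed just after a tick boundary. */
long latest_us(long n, int hz)
{
    return n * (1000000L / hz);
}
```

For usleep(20000) at 100 Hz, two ticks can expire anywhere from 10000 to 20000 us after the call, so only three ticks (20000 to 30000 us) never expire early under these assumptions.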

> I think the correct way to do all this is to use something
> other than the PIT as the clock reference AND adjust the
> jiffies time, not the wall clock.
>
> The High-res-timers patch I posted last Friday does the
> first part, i.e. uses a different clock reference. The NTP
> changes will come later.

Btw, current kernel NTP implementation (at least 2.4, didn't check 2.5
yet, but it looks very similar) is broken enough, that with NTP running,
the time can jump -1us or -2us backward if the 14.318... MHz clock in
the computer goes slightly faster than it should per spec. Errors of
+200 ppm are quite common.

This is quite a problem ...

--
Vojtech Pavlik
SuSE Labs

2002-10-07 20:25:04

by George Anzinger

Subject: Re: [RFC][PATCH] HZ as a config option

Vojtech Pavlik wrote:
>
> On Mon, Oct 07, 2002 at 10:27:13AM -0700, george anzinger wrote:
> > Dave Hansen wrote:
> > >
> > > Alan Cox wrote:
> > > > On Fri, 2002-10-04 at 23:53, Dave Hansen wrote:
> > > >
> > > >>On large systems (like NUMA-Q, Intel Profusion, etc...), latency and
> > > >>user responsiveness become much less important. The extra scheduling
> > > >>overhead caused by higher HZ is bad.
> > > >>
> > > >>This is x86-only right now. Is there any wider desire to tune this at
> > > > >>config time? Do any architectures have strict rules as to what this
> > > >>can be set to?
> > > >
> > > > > You can't set this arbitrarily, the NTP PLLs will only lock for certain
> > > > value ranges.
> > >
> > > Where can I find these ranges? include/linux/timex.h only errors if
> > > the number is out of the 12-1535 range.
> >
> > The issue is not the value itself, but if, by using the
> > value and the PIT with its clock of 14.3181818/12 MHZ you
> > can come up with a count that is ?? parts per million of
> > being right. I am not sure what ?? should be but 20 comes
> > to mind.
> >
> > All this is (incorrectly) covered over in 2.5 by using a
> > more "correct" value for tick_nsec. This keeps the wall
> > clock correct for most any value of HZ, BUT breaks the POSIX
> > standard that says NO TIMER SHALL EXPIRE BEFORE ITS TIME.
> > To prove this breakage, try this on a 2.5 system:
> >
> > time sleep 60
> >
> > Any answer less than 1 minute is BROKEN.
>
> 1 minute universal time or 1 minute gettimeofday time or 1 minute
> internal kernel time?

I think you miss my point. The current 2.5 kernel does not
sleep long enough on ANY request. By sleeping 60 seconds,
the error becomes so big that it washes out the "time"
program latency giving a value less than 60 seconds in the
above answer. The problem is that the system sleeps on 1/HZ
time but does NOT update the wall clock by 1/HZ each tick.
Instead it uses a value that more correctly represents the
actual time between the 1/HZ ticks. Problem is this
violates the POSIX standard in that a call to gettimeofday
before and after a sleep should ALWAYS say the sleep took at
least the requested amount of time (assuming it was not
interrupted by any signals).
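
The before-and-after check described here could be sketched roughly as follows. This is a userspace illustration using the standard gettimeofday() and nanosleep() calls; `measured_sleep_us` is a hypothetical name, and a real test would want to repeat the measurement and handle signals rather than give up.

```c
#include <sys/time.h>
#include <time.h>

/*
 * Sleep for request_us microseconds and return the elapsed wall-clock
 * time in microseconds as reported by gettimeofday(), or -1 if any
 * call fails or the sleep is interrupted.
 */
long long measured_sleep_us(long long request_us)
{
    struct timeval before, after;
    struct timespec req;

    req.tv_sec = request_us / 1000000;
    req.tv_nsec = (request_us % 1000000) * 1000;

    if (gettimeofday(&before, NULL) != 0)
        return -1;
    if (nanosleep(&req, NULL) != 0)
        return -1;
    if (gettimeofday(&after, NULL) != 0)
        return -1;

    return (after.tv_sec - before.tv_sec) * 1000000LL
         + (after.tv_usec - before.tv_usec);
}
```

If the sleep is not interrupted and nobody steps the clock in between, the returned value should never be smaller than request_us. A smaller value is the breakage being described.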
>
> The current kernels do the last ...
> I guess correct would probably be the second ...

The "correct" answer IMHO is, barring overt time setting, that
internal kernel time (what POSIX calls CLOCK_MONOTONIC time)
should track down to the nanosecond. I.e. NTP should change
the 1/HZ tick size, which in turn will change the wall
clock.

> While the first is what the user would expect. ;_
>
> Anyway, if you do an usleep(20000) on a 2.4 (with 100 Hz timer), you
> probably expect it to sleep two (and not three) ticks, while with NTP
> running you can easily get that it took 19998 usec. Or do we really want
> it to go off at 39997 us?

The above problem is NOT related to NTP at all. But what
you are saying is true, i.e. NTP could make the timers
complete early. Another reason to keep the clocks in sync.
>
> > I think the correct way to do all this is to use something
> > other than the PIT as the clock reference AND adjust the
> > jiffies time, not the wall clock.
> >
> > The High-res-timers patch I posted last Friday does the
> > first part, i.e. uses a different clock reference. The NTP
> > changes will come later.
>
> Btw, current kernel NTP implementation (at least 2.4, didn't check 2.5
> yet, but it looks very similar) is broken enough, that with NTP running,
> the time can jump -1us or -2us backward if the 14.318... MHz clock in
> the computer goes slightly faster than it should per spec. Errors of
> +200 ppm are quite common.

This is caused by using the TSC to fill in the under 1/HZ
info AND not adjusting the conversion constant used to
convert TSC to micro seconds. A more correct implementation
would change this conversion number as well as the wall
clock tick size.
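
As a rough illustration of the conversion constant in question, here is a sketch of a fixed-point cycles-to-microseconds conversion. It is not the kernel's code: the 2^32 scaling, the helper names, and the 1 GHz figure used below are all assumptions chosen for the example, and the multiplication only stays within 64 bits for short intervals such as a single tick.

```c
#include <stdint.h>

/*
 * Fixed-point conversion factor: microseconds per counter cycle,
 * scaled by 2^32 (hypothetical helper, truncating division).
 */
uint64_t tsc_quotient(uint64_t cycles_per_sec)
{
    return ((UINT64_C(1) << 32) * 1000000) / cycles_per_sec;
}

/* Convert a short cycle count to microseconds using that factor. */
uint64_t tsc_to_usec(uint64_t cycles, uint64_t quotient)
{
    return (cycles * quotient) >> 32;
}
```

Take a nominal 1 GHz counter that really runs 200 ppm fast. Over one 10 ms tick it counts 10002000 cycles. With the nominal factor that converts to a little over 10001 us, which is past the tick boundary, so the next tick appears to move time backward by a microsecond or two. With a factor computed for the real rate of 1000200000 Hz the same count converts to about 10000 us.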

I have in mind a solution to all of this that involves
adjusting the conversion values that convert the reference
clock to jiffies. The reference clock could be the TSC or
the ACPI pm timer (code done) or some other timer (to be
coded).
>
> This is quite a problem ...

Yes, our astronomy friends agree.
>
> --
> Vojtech Pavlik
> SuSE Labs

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml