Is it possible to find out what the kernel's notion of HZ is from user
space?
It seems to change from system to system and between 2.4 (100 on i386)
and 2.6 (1000 on i386).
On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> Is it possible to find out what the kernel's notion of HZ is from user
> space?
> It seem to change from system to system and between 2.4 (100 on i386)
> to 2.6 (1000 on i386).
If you can see 1000 from userspace, that is a bad kernel bug; can you say
where you find something in units of 1000?
Arjan van de Ven wrote:
> On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
>
>>Is it possible to find out what the kernel's notion of HZ is from user
>>space?
>>It seem to change from system to system and between 2.4 (100 on i386)
>>to 2.6 (1000 on i386).
>
>
> if you can see 1000 from userspace that is a bad kernel bug; can you say
> where you find something in units of 1000 ?
create_elf_tables() in fs/binfmt_elf.c tells every ELF execve():
NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
which can be found by crawling through the stack above the pointer
to the last environment variable.
--
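The stack crawl described above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: it assumes a Linux ELF process whose `environ` still points at the original envp array on the stack (that is, nothing has called setenv() or putenv() yet), and that auxiliary-vector entries are pairs of native words, as on i386 and x86-64. The helper names `auxv_lookup` and `clktck_from_stack` are made up for this example.

```c
#include <elf.h>      /* AT_NULL, AT_CLKTCK */
#include <stddef.h>

extern char **environ;  /* assumed to still point at the envp array on the stack */

/*
 * Walk past the NULL that terminates envp, then scan the auxiliary
 * vector that follows it for an entry of the given type.
 * Returns 0 if no such entry exists.
 */
unsigned long auxv_lookup(char **envp, unsigned long type)
{
    while (*envp != NULL)
        envp++;

    /* Each entry is a (type, value) pair; AT_NULL ends the vector. */
    const unsigned long *aux = (const unsigned long *)(envp + 1);
    for (; aux[0] != AT_NULL; aux += 2)
        if (aux[0] == type)
            return aux[1];
    return 0;
}

/* Ticks per second that the kernel advertised to this process at execve(). */
unsigned long clktck_from_stack(void)
{
    return auxv_lookup(environ, AT_CLKTCK);
}
```

Calling `clktck_from_stack()` early in `main()` would return the value the kernel stored with NEW_AUX_ENT(AT_CLKTCK, ...), which per the headers quoted in this thread is USER_HZ rather than the internal HZ.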
On Sat, Mar 13, 2004 at 11:34:37AM -0800, John Reiser wrote:
> Arjan van de Ven wrote:
> >On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> >
> >>Is it possible to find out what the kernel's notion of HZ is from user
> >>space?
> >>It seem to change from system to system and between 2.4 (100 on i386)
> >>to 2.6 (1000 on i386).
> >
> >
> >if you can see 1000 from userspace that is a bad kernel bug; can you say
> >where you find something in units of 1000 ?
>
> create_elf_tables() in fs/binfmt_elf.c tells every ELF execve():
> NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
> which can be found by crawling through the stack above the pointer
> to the last environment variable.
Ugh that should say 100 on x86....
but..
param.h:# define USER_HZ 100 /* .. some user interfaces are in "ticks" */
param.h:# define CLOCKS_PER_SEC (USER_HZ) /* like times() */
.....
that looks like 100 to me.
On Saturday 13 March 2004 12:24 pm, Arjan van de Ven wrote:
> On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> > Is it possible to find out what the kernel's notion of HZ is from user
> > space?
> > It seem to change from system to system and between 2.4 (100 on i386)
> > to 2.6 (1000 on i386).
>
> if you can see 1000 from userspace that is a bad kernel bug; can you say
> where you find something in units of 1000 ?
On 2.6.3-rc1-mm1, procinfo shows the timer interrupt counting
1000 ints/sec, though procinfo is broken for other things that it showed
on 2.4, such as pages swapped and pages read in and out.
--
tabris
-
"We never make assertions, Miss Taggart," said Hugh Akston. "That is
the moral crime peculiar to our enemies. We do not tell -- we *show*.
We do not claim -- we *prove*."
-- Ayn Rand, _Atlas Shrugged_
On Sat, Mar 13, 2004 at 06:24:31PM +0100, Arjan van de Ven wrote:
> On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> > Is it possible to find out what the kernel's notion of HZ is from user
> > space?
> > It seem to change from system to system and between 2.4 (100 on i386)
> > to 2.6 (1000 on i386).
>
> if you can see 1000 from userspace that is a bad kernel bug; can you say
> where you find something in units of 1000 ?
I can't see it from user space. It's in the kernel headers. The thing is
I am working on fixes to laptop mode, which requires changing the flush
times of bdflush and of the journals of journaled file systems. The
problem is that some of these (bdflush, xfs) expect the value in jiffies
and not seconds or milliseconds, so making the init script portable
requires knowing the value of HZ.
On Sat, Mar 13, 2004 at 08:38:52PM +0100, Arjan van de Ven wrote:
> On Sat, Mar 13, 2004 at 11:34:37AM -0800, John Reiser wrote:
> > Arjan van de Ven wrote:
> > >On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> > >
> > >>Is it possible to find out what the kernel's notion of HZ is from user
> > >>space?
> > >>It seem to change from system to system and between 2.4 (100 on i386)
> > >>to 2.6 (1000 on i386).
> > >
> > >
> > >if you can see 1000 from userspace that is a bad kernel bug; can you say
> > >where you find something in units of 1000 ?
> >
> > create_elf_tables() in fs/binfmt_elf.c tells every ELF execve():
> > NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
> > which can be found by crawling through the stack above the pointer
> > to the last environment variable.
>
> Ugh that should say 100 on x86....
> but..
> param.h:# define USER_HZ 100 /* .. some user interfaces are in "ticks" */
> param.h:# define CLOCKS_PER_SEC (USER_HZ) /* like times() */
> .....
> that looks like 100 to me.
>
When dealing with bdflush and a few other interfaces the values need to
be in jiffies, which requires knowledge of the kernel's notion of HZ,
not userspace's.
The other option is to try to push a change that makes the interface
take centisecs instead of jiffies; the question is whether it will
catch on.
On Sat, 2004-03-13 at 23:14, Micha Feigin wrote:
> On Sat, Mar 13, 2004 at 08:38:52PM +0100, Arjan van de Ven wrote:
> > On Sat, Mar 13, 2004 at 11:34:37AM -0800, John Reiser wrote:
> > > Arjan van de Ven wrote:
> > > >On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> > > >
> > > >>Is it possible to find out what the kernel's notion of HZ is from user
> > > >>space?
> > > >>It seem to change from system to system and between 2.4 (100 on i386)
> > > >>to 2.6 (1000 on i386).
> > > >
> > > >
> > > >if you can see 1000 from userspace that is a bad kernel bug; can you say
> > > >where you find something in units of 1000 ?
> > >
> > > create_elf_tables() in fs/binfmt_elf.c tells every ELF execve():
> > > NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
> > > which can be found by crawling through the stack above the pointer
> > > to the last environment variable.
> >
> > Ugh that should say 100 on x86....
> > but..
> > param.h:# define USER_HZ 100 /* .. some user interfaces are in "ticks" */
> > param.h:# define CLOCKS_PER_SEC (USER_HZ) /* like times() */
> > .....
> > that looks like 100 to me.
> >
>
> When dealing with bdflush and a few other interfaces the values need to
> be in jiffies which requires knowledge of the kernels notion of HZ not
> userspace.
Wrong. Any such interface is supposed to convert automatically. Any
interface you can find that doesn't should be reported as a serious bug!
On Sat, 2004-03-13 at 23:10, Micha Feigin wrote:
> On Sat, Mar 13, 2004 at 06:24:31PM +0100, Arjan van de Ven wrote:
> > On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> > > Is it possible to find out what the kernel's notion of HZ is from user
> > > space?
> > > It seem to change from system to system and between 2.4 (100 on i386)
> > > to 2.6 (1000 on i386).
> >
> > if you can see 1000 from userspace that is a bad kernel bug; can you say
> > where you find something in units of 1000 ?
>
> I can't see it from user space. Its in the kernel headers. The thing is
> I am working on fixes to laptop mode. The problem is it requires
> changing bdflush and journaled file systems journal flush times. The
> problem is that some of these (bdflush, xfs) expect the value in jiffies
> and not seconds or milliseconds so making the initiation script portable
> requires knowing the value of HZ.
The kernel side is supposed to use clock_t_to_jiffies() and co. for this,
to present a unified HZ to userspace. The internal kernel HZ should
*NOT* leak out to userspace. Heck, it's quite thinkable that in the
future there will be no such HZ.
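To make the conversion idea concrete, here is a small user-space model of what helpers in the clock_t_to_jiffies() family do. It is only a sketch under stated assumptions: the real helpers use the compile-time constants HZ and USER_HZ, whereas here `hz` and `user_hz` are passed as parameters, `hz` is assumed to be an exact multiple of `user_hz`, and the `model_` names are invented for this illustration.

```c
/*
 * User-space model of the kernel's tick conversions.
 * Assumes hz is an exact multiple of user_hz (e.g. 1000 and 100).
 */

/* Value written by userspace in USER_HZ ticks -> internal jiffies. */
unsigned long model_clock_t_to_jiffies(unsigned long ticks,
                                       unsigned long hz,
                                       unsigned long user_hz)
{
    return ticks * (hz / user_hz);
}

/* Internal jiffies -> value shown to userspace in USER_HZ ticks.
 * Integer division: anything finer than one user tick is dropped. */
unsigned long model_jiffies_to_clock_t(unsigned long jiffies,
                                       unsigned long hz,
                                       unsigned long user_hz)
{
    return jiffies / (hz / user_hz);
}
```

With hz = 1000 and user_hz = 100, a value of 500 user ticks becomes 5000 jiffies on the way in, and 1009 jiffies comes back out as 100 user ticks, so userspace sees the same units regardless of the kernel's HZ.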
On Sat, Mar 13, 2004 at 11:32:39PM +0100, Arjan van de Ven wrote:
> On Sat, 2004-03-13 at 23:14, Micha Feigin wrote:
> > On Sat, Mar 13, 2004 at 08:38:52PM +0100, Arjan van de Ven wrote:
> > > On Sat, Mar 13, 2004 at 11:34:37AM -0800, John Reiser wrote:
> > > > Arjan van de Ven wrote:
> > > > >On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> > > > >
> > > > >>Is it possible to find out what the kernel's notion of HZ is from user
> > > > >>space?
> > > > >>It seem to change from system to system and between 2.4 (100 on i386)
> > > > >>to 2.6 (1000 on i386).
> > > > >
> > > > >
> > > > >if you can see 1000 from userspace that is a bad kernel bug; can you say
> > > > >where you find something in units of 1000 ?
> > > >
> > > > create_elf_tables() in fs/binfmt_elf.c tells every ELF execve():
> > > > NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
> > > > which can be found by crawling through the stack above the pointer
> > > > to the last environment variable.
> > >
> > > Ugh that should say 100 on x86....
> > > but..
> > > param.h:# define USER_HZ 100 /* .. some user interfaces are in "ticks" */
> > > param.h:# define CLOCKS_PER_SEC (USER_HZ) /* like times() */
> > > .....
> > > that looks like 100 to me.
> > >
> >
> > When dealing with bdflush and a few other interfaces the values need to
> > be in jiffies which requires knowledge of the kernels notion of HZ not
> > userspace.
>
> Wrong. Any such interface is supposed to convert automatically. Any
> interface you can find that doesn't should be reported as a serious bug!
>
Like I said, look at bdflush in 2.4 (this was fixed with the changed 2.6
interface) and xfs proc interface in both 2.4 and 2.6.
In light of your post then there is a serious bug.
For example, for the bdflush age_buffer field (true for the other used
fields also), there is no conversion:

    bh->b_flushtime = jiffies + bdf_prm.b_un.age_buffer;

For the xfs flush interval:

    if (pbd_active == 1) {
            mod_timer(&pb_daemon_timer,
                      jiffies + pb_params.flush_interval.val);
            interruptible_sleep_on(&pbd_waitq);
    }
xfs should be converted to centisecs, bdflush should also be converted
to centisecs, or the interface from 2.6 should somehow be ported to
exist in parallel to the 2.4 one.
I don't mind making a patch, which approach should be used?
On Sat, Mar 13, 2004 at 11:41:25PM +0100, Arjan van de Ven wrote:
> On Sat, 2004-03-13 at 23:10, Micha Feigin wrote:
> > On Sat, Mar 13, 2004 at 06:24:31PM +0100, Arjan van de Ven wrote:
> > > On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> > > > Is it possible to find out what the kernel's notion of HZ is from user
> > > > space?
> > > > It seem to change from system to system and between 2.4 (100 on i386)
> > > > to 2.6 (1000 on i386).
> > >
> > > if you can see 1000 from userspace that is a bad kernel bug; can you say
> > > where you find something in units of 1000 ?
> >
> > I can't see it from user space. Its in the kernel headers. The thing is
> > I am working on fixes to laptop mode. The problem is it requires
> > changing bdflush and journaled file systems journal flush times. The
> > problem is that some of these (bdflush, xfs) expect the value in jiffies
> > and not seconds or milliseconds so making the initiation script portable
> > requires knowing the value of HZ.
>
> the kernel side is supposed to use clock_t_to_jiffies() and co for this
> to present a unified HZ to userspace. The internal kernel HZ should
> *NOT* leak out to usespace. Heck it's quite thinkable that in the future
> there will be no such HZ.
>
>
The kernel side doesn't do that at the moment. Even the fixed bdflush
interface in 2.6, which has dirty_writeback_centisecs as an example,
converts it as

    (dirty_writeback_centisecs * HZ) / 100
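The expression above can be tried out in user space to see why a centisecond interface is portable while a raw jiffies interface is not. The following is a sketch only: `centisecs_to_jiffies` is a hypothetical helper name, and `hz` is a parameter here because the real HZ is a kernel compile-time constant.

```c
/*
 * Same arithmetic as (dirty_writeback_centisecs * HZ) / 100,
 * with HZ passed in as a parameter for illustration.
 */
unsigned long centisecs_to_jiffies(unsigned long centisecs, unsigned long hz)
{
    return (centisecs * hz) / 100;
}
```

A setting of 500 centisecs (5 seconds) maps to 500 jiffies at HZ=100, 5000 jiffies at HZ=1000 and 5120 jiffies at HZ=1024, so a script that writes centisecs does not need to know HZ, while one that writes jiffies directly does.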
Micha Feigin <[email protected]> wrote:
>
> > Wrong. Any such interface is supposed to convert automatically. Any
> > interface you can find that doesn't should be reported as a serious bug!
> >
>
> Like I said, look at bdflush in 2.4 (this was fixed with the changed 2.6
> interface) and xfs proc interface in both 2.4 and 2.6.
> In light of your post then there is a serious bug.
>
> For example for bdflush age_buffer field (true for the other used fields
> also), no conversion:
> bh->b_flushtime = jiffies + bdf_prm.b_un.age_buffer;
I doubt if there's any motivation to fix these things in 2.4. If you change
HZ in 2.4 you own both pieces. (alpha has HZ=1024 in 2.4, so presumably
bdflush tuning doesn't work right).
In 2.6, the bdflush parameters do not exist. They were replaced by
/proc/sys/vm/*_centisecs, which are HZ-independent.
There are, I think, still some /proc tunables in 2.6 which do depend upon
HZ and they should be found and fixed. If the same tunables are present in
2.4 kernels then they should be converted to take centiseconds in 2.6, so
2.4-based tools continue to work correctly.
We have similar problems where /proc tunables are expressed in terms of
"number of pages". As PAGE_SIZE varies from 4096 to 65536 this is
sometimes wrong. Fixing this is more subtle.
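For the page-count tunables mentioned above, userspace can at least ask for the page size at run time instead of assuming 4096. A minimal sketch, assuming a POSIX system where sysconf(_SC_PAGESIZE) is available; `bytes_to_pages` is a name made up for this example:

```c
#include <unistd.h>

/*
 * Convert a size in bytes to a page count, rounding up,
 * using the page size reported at run time.
 */
unsigned long bytes_to_pages(unsigned long bytes)
{
    unsigned long page = (unsigned long)sysconf(_SC_PAGESIZE);
    return (bytes + page - 1) / page;
}
```

A tool that wants to set a tunable to, say, 1 MB would write bytes_to_pages(1024 * 1024) rather than a hard-coded 256, which is only right when PAGE_SIZE is 4096.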
Micha Feigin <[email protected]> said:
> Is it possible to find out what the kernel's notion of HZ is from user
> space?
What for? It should be invisible to userspace...
> It seem to change from system to system and between 2.4 (100 on i386)
> to 2.6 (1000 on i386).
And can also be tweaked when compiling, and depends on architecture, and...
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
On Sat, Mar 13, 2004 at 05:49:29PM -0800, Andrew Morton wrote:
> Micha Feigin <[email protected]> wrote:
> >
> > > Wrong. Any such interface is supposed to convert automatically. Any
> > > interface you can find that doesn't should be reported as a serious bug!
> > >
> >
> > Like I said, look at bdflush in 2.4 (this was fixed with the changed 2.6
> > interface) and xfs proc interface in both 2.4 and 2.6.
> > In light of your post then there is a serious bug.
> >
> > For example for bdflush age_buffer field (true for the other used fields
> > also), no conversion:
> > bh->b_flushtime = jiffies + bdf_prm.b_un.age_buffer;
>
> I doubt if there's any motivation to fix these things in 2.4. If you change
> HZ in 2.4 you own both pieces. (alpha has HZ=1024 in 2.4, so presumably
> bdflush tuning doesn't work right).
>
There is for laptop mode, which is now in the kernel, so that a generic
startup script can be written.
I will write a patch and post it in a new thread and see how it is
received.
> In 2.6, the bdflush parameters do not exist. They were replaced by
> /proc/sys/vm/*_centisecs, which are HZ-independent.
>
> There are, I think, still some /proc tunables in 2.6 which do depend upon
> HZ and they should be found and fixed. If the same tunables are present in
> 2.4 kernels then they should be converted to take centiseconds in 2.6, so
> 2.4-based tools continue to work correctly.
>
> We have similar problems where /proc tunables are expressed in terms of
> "number of pages". As PAGE_SIZE varies from 4096 to 65536 this is
> sometimes wrong. Fixing this is more subtle.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
> +++++++++++++++++++++++++++++++++++++++++++
> This Mail Was Scanned By Mail-seCure System
> at the Tel-Aviv University CC.
>
On Sat, Mar 13, 2004 at 11:45:17PM -0300, Horst von Brand wrote:
> Micha Feigin <[email protected]> said:
> > Is it possible to find out what the kernel's notion of HZ is from user
> > space?
>
> What for? It should be invisible to userspace...
>
It's not. Some proc interfaces expect time in jiffies, which means
knowing HZ (bdflush in 2.4 or xfs, for example).
> > It seem to change from system to system and between 2.4 (100 on i386)
> > to 2.6 (1000 on i386).
>
> And can also be tweaked when compiling, and depends on architecture, and...
> --
> Dr. Horst H. von Brand User #22616 counter.li.org
> Departamento de Informatica Fono: +56 32 654431
> Universidad Tecnica Federico Santa Maria +56 32 654239
> Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
> The internal kernel HZ should *NOT* leak out to usespace.
/proc/interrupts "leaks" the value of HZ. On x86, for instance:

    ( cat /proc/interrupts; sleep 5; cat /proc/interrupts ) | grep timer
--
Horst von Brand wrote:
> What for? It should be invisible to userspace...
It's not invisible. select/poll/epoll/setitimer round their time
argument according to HZ, and programs which do smooth (i.e. low
_jitter_) animation of the kind where the eye is sensitive to the
jitter need to track it and correct for it.
-- Jamie
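The rounding effect described above can be illustrated with a simplified model. This is an assumption-laden sketch, not the kernel's actual code: it only rounds a millisecond timeout up to a whole number of ticks, whereas real kernels may add a further tick on top. `rounded_timeout_ms` is a hypothetical name and `hz` is passed in as a parameter.

```c
/*
 * Simplified model of tick-granular timeouts: the requested timeout
 * is rounded up to a whole number of ticks of length 1000/hz ms.
 */
unsigned long rounded_timeout_ms(unsigned long timeout_ms, unsigned long hz)
{
    unsigned long ticks = (timeout_ms * hz + 999) / 1000;  /* round up */
    return ticks * 1000 / hz;
}
```

Under this model a 15 ms request becomes 20 ms at HZ=100 but stays 15 ms at HZ=1000, which is the kind of difference an animation loop would have to measure and correct for.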
* Horst von Brand <[email protected]> [2004-03-14]:
> Micha Feigin <[email protected]> said:
> > Is it possible to find out what the kernel's notion of HZ is from user
> > space?
>
> What for? It should be invisible to userspace...
>
A related issue that's bugged me for a long time is lack of userspace
access to the quantity that's called 'freq_scale' in 2.4, where it's
(1<<SHIFT_HZ)/HZ for HZ!=100 and 128/128.125 for HZ==100. (I haven't
started to reverse-engineer the equivalent value in 2.6, I took a quick
look once and concluded things had got a little more hairy.)
My interest is that I maintain (in spare-time) an NTP application called
chrony (http://chrony.sunsite.dk/), originally written to be good for
dial-up, i.e. NTP servers accessible for a short window once or twice a
day. This app wants to tune the parameters it passes to adjtimex() to
take a best shot at keeping the system clock correct over the
potentially 'long' offline period. To do this well, it has to
reverse-compensate for the freq_scale multiplier that the kernel will
apply to the frequency value passed to adjtimex(). Getting the right
value for this across different kernels has always been a fragile
exercise.
--
Richard \\\ SH-4/SH-5 Core & Debug Architect
Curnow \\\ SuperH (UK) Ltd, Bristol
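The freq_scale formula given above can be written out as follows. This is a sketch based only on the description in this message; the way SHIFT_HZ is chosen here (the power of two whose bracket [0.75 * 2^s, 1.5 * 2^s) contains HZ) is an assumption modelled on the 2.4 headers and should be checked against the kernel actually in use. The function names are invented for the example.

```c
/*
 * Assumed selection of SHIFT_HZ: the exponent s for which
 * 0.75 * 2^s <= hz < 1.5 * 2^s.  Returns -1 outside 12..1535.
 */
int shift_hz(int hz)
{
    for (int s = 4; s <= 10; s++) {
        int p = 1 << s;
        if (hz >= 3 * p / 4 && hz < 3 * p / 2)
            return s;
    }
    return -1;
}

/*
 * freq_scale as described in the thread:
 * (1 << SHIFT_HZ) / HZ for HZ != 100, and 128/128.125 for HZ == 100.
 * Returns 0.0 if hz is outside the range shift_hz() handles.
 */
double freq_scale(int hz)
{
    if (hz == 100)
        return 128.0 / 128.125;
    int s = shift_hz(hz);
    if (s < 0)
        return 0.0;
    return (double)(1 << s) / (double)hz;
}
```

An application in chrony's position would divide the frequency it wants by this factor before passing it to adjtimex(), which is exactly why it needs to know HZ (or freq_scale itself) reliably.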
Arjan van de Ven wrote:
> On Sat, Mar 13, 2004 at 11:34:37AM -0800, John Reiser wrote:
>
>>Arjan van de Ven wrote:
>>
>>>On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
>>>
>>>
>>>>Is it possible to find out what the kernel's notion of HZ is from user
>>>>space?
>>>>It seem to change from system to system and between 2.4 (100 on i386)
>>>>to 2.6 (1000 on i386).
>>>
>>>
>>>if you can see 1000 from userspace that is a bad kernel bug; can you say
>>>where you find something in units of 1000 ?
>>
>>create_elf_tables() in fs/binfmt_elf.c tells every ELF execve():
>> NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
>>which can be found by crawling through the stack above the pointer
>>to the last environment variable.
>
>
> Ugh that should say 100 on x86....
> but..
> param.h:# define USER_HZ 100 /* .. some user interfaces are in "ticks" */
> param.h:# define CLOCKS_PER_SEC (USER_HZ) /* like times() */
> .....
> that looks like 100 to me.
>
This horrible hack of converting all tick values to 100 (from 1000) for
export to user space because a large number of user space programs
assume that HZ is 100 would NOT be necessary if there was a mechanism
whereby user space programs could find out how many ticks there are in a
second instead of having to make assumptions.
I think that providing such a mechanism should be a priority and when
it's been available for a reasonable amount of time (so that the user
space programs can be converted to using it) USER_HZ should become equal
to HZ.
Another alternative would be to stop exporting time as ticks and use
some standard unit for all systems. The chosen unit should be small
enough (e.g. microseconds or maybe even nanoseconds) so that no
information is lost (which it is in the current implementation) on
conversion from ticks to these units. Of course 64 bit integers would
be needed.
Peter
--
Dr Peter Williams, Chief Scientist [email protected]
Aurema Pty Limited Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com
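The nanosecond alternative proposed above could look something like the sketch below. It is an illustration only: `jiffies_to_ns` here is a user-space helper written for this example, with `hz` passed as a parameter. Splitting the value into whole seconds and a remainder keeps the intermediate products inside 64 bits.

```c
#include <stdint.h>

#define NSEC_PER_SECOND UINT64_C(1000000000)

/*
 * Convert a tick count to nanoseconds using 64-bit arithmetic.
 * Whole seconds and the sub-second remainder are converted separately
 * so that the multiplication cannot overflow for realistic hz values.
 */
uint64_t jiffies_to_ns(uint64_t jiffies, uint32_t hz)
{
    uint64_t whole_seconds = jiffies / hz;
    uint64_t remainder     = jiffies % hz;
    return whole_seconds * NSEC_PER_SECOND
         + remainder * NSEC_PER_SECOND / hz;
}
```

At HZ=1000, 1009 jiffies becomes 1,009,000,000 ns, so the 9 ms that a conversion to 100 ticks per second would drop is preserved.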
Peter Williams <[email protected]> writes:
> This horrible hack of converting all tick values to 100 (from 1000)
> for export to user space because a large number of user space programs
> assume that HZ is 100 would NOT be necessary if there was a mechanism
> whereby user space programs could find out how many ticks there are in
> a second instead of having to make assumptions.
Already exists for a long time - AT_CLKTCK. glibc has a nice wrapper
for it too (sysconf)
-Andi
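A short sketch of the sysconf() route mentioned above, assuming a POSIX system: it converts the CPU time reported by times() into seconds using the run-time tick rate instead of a hard-coded HZ. The function name `cpu_seconds_used` is made up for this example.

```c
#include <sys/times.h>
#include <unistd.h>

/*
 * CPU time (user + system) consumed by this process, in seconds.
 * The tick rate comes from sysconf(_SC_CLK_TCK) rather than an
 * assumed constant.  Returns -1.0 on failure.
 */
double cpu_seconds_used(void)
{
    struct tms t;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);

    if (ticks_per_sec <= 0 || times(&t) == (clock_t)-1)
        return -1.0;
    return (double)(t.tms_utime + t.tms_stime) / (double)ticks_per_sec;
}
```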
Andi Kleen wrote:
> Peter Williams <[email protected]> writes:
>
>
>>This horrible hack of converting all tick values to 100 (from 1000)
>>for export to user space because a large number of user space programs
>>assume that HZ is 100 would NOT be necessary if there was a mechanism
>>whereby user space programs could find out how many ticks there are in
>>a second instead of having to make assumptions.
>
>
> Already exists for a long time - AT_CLKTCK. glibc has a nice wrapper
> for it too (sysconf)
So it does, and it is POSIX.1 (_SC_CLK_TCK) compliant as well.
Unfortunately, the presence of this functionality makes it VERY
difficult to understand why ticks are being converted from HZ==1000
values to HZ==100 values when they are being exported to user space,
especially as this conversion throws away precision. Can anyone
enlighten me?
Peter
--
Dr Peter Williams, Chief Scientist [email protected]
Aurema Pty Limited Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com
On Tue, Mar 16, 2004 at 11:28:18AM +1100, Peter Williams wrote:
> >Ugh that should say 100 on x86....
> >but..
> >param.h:# define USER_HZ 100 /* .. some user interfaces
> >are in "ticks" */
> >param.h:# define CLOCKS_PER_SEC (USER_HZ) /* like times() */
> >.....
> >that looks like 100 to me.
> >
>
> This horrible hack of converting all tick values to 100 (from 1000) for
> export to user space because a large number of user space programs
> assume that HZ is 100 would NOT be necessary if there was a mechanism
> whereby user space programs could find out how many ticks there are in a
> second instead of having to make assumptions.
there is one. Nothing uses it
(sysconf() provides this info)
> So it does and POSIX.1 (_SC_CLK_TCK) compliant as well. Unfortunately,
> the presence of this functionality makes it VERY difficult to understand
> why ticks are being converted from HZ==1000 values to HZ=100 values when
> they are being exported to user space especially as this conversion
> throws away precision. Can anyone enlighten me?
There are two different cases here:
Timer tick as visible to user space in the minimum delay of select()
and other kernel functions with timeout. That is what AT_CLKTCK aims at.
And exports of values with jiffie units in sysctls in /proc. This was in fact
always a bug because they should have used ms or s as unit
(there are readily usable utility functions to do this for sysctl). Otherwise
writing documentation becomes quite difficult. But there are already
configurations that set or read these values and it was not a good idea to
subtly and silently break them. Especially since they predate any exporting
of HZ to user space. So the conversion factor was added.
This is not only obscure sysctls; ps and top are also consumers of such
jiffies values in /proc
-Andi
On Tue, 2004-03-16 at 06:53, Peter Williams wrote:
> Andi Kleen wrote:
> > Peter Williams <[email protected]> writes:
[...]
> > Already exists for a long time - AT_CLKTCK. glibc has a nice wrapper
> > for it too (sysconf)
>
> So it does and POSIX.1 (_SC_CLK_TCK) compliant as well. Unfortunately,
> the presence of this functionality makes it VERY difficult to understand
> why ticks are being converted from HZ==1000 values to HZ=100 values when
> they are being exported to user space especially as this conversion
> throws away precision. Can anyone enlighten me?
1) Because Linux had HZ=100 hardcoded for a long time (except on Alphas)
   and lots of applications probably use that value today (as HZ in
   their source and not sysconf(...)), especially since 2.4 (at least
   most of them) has HZ=100 except for 64-bit CPUs.
2) There are patches which dynamically change the CPU speed. And it
   probably (IMHO) makes sense to change HZ dynamically too in those
   situations. And an HZ value that changes over time is useless in
   user-space.
Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services
[various people]
>> This horrible hack of converting all tick values to 100
>> (from 1000) for export to user space because a large number
>> of user space programs assume that HZ is 100 would NOT be
>> necessary if there was a mechanism whereby user space
>> programs could find out how many ticks there are in a
>> second instead of having to make assumptions.
>
> there is one. Nothing uses it
> (sysconf() provides this info)
If you have a recent glibc on a recent kernel, it might.
You could also get a -1 or a supposed ABI value that
has nothing to do with the kernel currently running.
The most reliable way is to first look around on the
stack in search of ELF notes, and then fall back to
some horribly gross hacks as needed.
> /proc/interrupts "leaks" the value of HZ. On x86, for instance:
> ( cat /proc/interrupts; sleep 5; cat /proc/interrupts ) | grep timer
That doesn't really count. The code could be set to do a
dozen interrupts per jiffie tick. Jiffies are what matter.
>> It seem to change from system to system and between 2.4
>> (100 on i386) to 2.6 (1000 on i386).
>
> And can also be tweaked when compiling, and depends on architecture, and...
Yep. For Linux 2.4.xx and up, ELF notes provide the data.
For older systems, you need to compute the ratio of uptime
to total jiffies.
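The uptime-ratio fallback mentioned above boils down to one division. The sketch below shows only that arithmetic; where the two inputs come from (for instance the tick counters in /proc/stat and the first field of /proc/uptime) is an assumption left to the caller, and `estimate_hz` is a name invented for this example.

```c
/*
 * Estimate ticks per second as total ticks divided by uptime,
 * rounded to the nearest integer.  Returns 0 if uptime is not positive.
 */
unsigned long estimate_hz(unsigned long long total_jiffies,
                          double uptime_seconds)
{
    if (uptime_seconds <= 0.0)
        return 0;
    return (unsigned long)((double)total_jiffies / uptime_seconds + 0.5);
}
```

Because the two readings are never taken at exactly the same instant, a real tool would probably snap the result to the nearest plausible value rather than trust the raw quotient. On kernels that already convert exported counters to USER_HZ, this estimates USER_HZ, not the internal HZ.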
>> This horrible hack of converting all tick values to 100
>> (from 1000) for export to user space because a large number
>> of user space programs assume that HZ is 100 would NOT be
>> necessary if there was a mechanism whereby user space
>> programs could find out how many ticks there are in a
>> second instead of having to make assumptions.
>
> Already exists for a long time - AT_CLKTCK. glibc has a
> nice wrapper for it too (sysconf)
AT_CLKTCK is new with the 2.4 kernel. When it is missing or
unsupported by an old glibc, the sysconf() call returns
a guess instead of an error code. So sysconf() is worthless
if you want to support old kernels (Debian!) or old glibc.
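On a current system, the "ask the kernel first, then fall back" order described above can be sketched with getauxval(). Note the assumption: getauxval() was added to glibc well after this thread (version 2.16), so this does not help with the old kernels and libraries under discussion; it only shows the lookup order. `clock_ticks_per_second` is a hypothetical helper name.

```c
#include <elf.h>        /* AT_CLKTCK */
#include <sys/auxv.h>   /* getauxval(), glibc >= 2.16 (assumed available) */
#include <unistd.h>

/*
 * Prefer the value the running kernel put in the auxiliary vector;
 * fall back to sysconf() if the entry is missing (getauxval returns 0).
 */
long clock_ticks_per_second(void)
{
    unsigned long from_kernel = getauxval(AT_CLKTCK);

    if (from_kernel != 0)
        return (long)from_kernel;
    return sysconf(_SC_CLK_TCK);
}
```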
> This is not only obscure sysctls, ps and top are
> also consumers of such jiffies values in /proc
They follow AT_CLKTCK when it is available, not a HZ
value from some header file. So you can change HZ quite
a bit and these tools won't mind.
> 1) Because Linux had long time HZ=100 hardcoded
> (except on Alphas) and lots of applications
> probably use that value today (as HZ in their
> source and not sysconf(...)) - especially
> since 2.4 (at least most of them) has HZ=100
> except for 64bit CPUs).
That is severely broken anyway.
At least with Linux 2.4 kernels, many ports have used
a hardware-specific HZ value. All did, really, if you
consider user-mode Linux. My table:
      HZ  port
      10  S/390 (sometimes)
      20  user-mode Linux
      32  ia64 emulator
      64  StrongARM /Shark
     100  normal Linux
     128  MIPS, ARM
    1000  ARM
    1024  Alpha, ia64
    1200  Alpha
Any app supporting Linux 2.4 with an old glibc or
supporting Linux 2.2 will need to do something evil.
> A related issue that's bugged me for a long time is lack
> of userspace access to the quantity that's called
> 'freq_scale' in 2.4, where it's (1<<SHIFT_HZ)/HZ for
> HZ!=100 and 128/128.125 for HZ==100. (I haven't started
> to reverse-engineer the equivalent value in 2.6, I took
> a quick look once and concluded things had got a little
> more hairy.)
>
> My interest is that I maintain (in spare-time) an NTP
> application called chrony (http://chrony.sunsite.dk/),
> originally written to be good for dial-up, i.e. NTP
> servers accessible for a short window once or twice a day.
> This app wants to tune the parameters it passes to
> adjtimex() to take a best shot at keeping the system
> clock correct over the potentially 'long' offline period.
> To do this well, it has to reverse-compensate for the
> freq_scale multiplier that the kernel will apply to the
> frequency value passed to adjtimex(). Getting the right
> value for this across different kernels has always been
> a fragile exercise.
Arrrrgh!!!! I thought I had it bad.
Fortunately this is a fresh new reason to beg Linus for
some data. (all previous arguments have been rejected)
What would be useful for you?
    HZ (-1 for tickless?)
    USER_HZ
    freq_scale
    some boolean to indicate ppc-like (pure cycle counter) time
    ???
* Albert Cahalan <[email protected]> [2004-03-16]:
>
> Fortunately this is a fresh new reason to beg Linus for
> some data. (all previous arguments have been rejected)
> What would be useful for you?
>
> HZ (-1 for tickless?)
> USER_HZ
> freq_scale
> some boolean to indicate ppc-like (pure cycle counter) time
> ???
freq_scale would be a good starting point, I think.
However, there is worse. There is bounds checking on the txc.freq
argument to adjtimex(). IIRC the bounds have changed at various points
in the kernel history, but at one time the limit was +/- 100ppm. At the
time, I had a mobo with a -300ppm clock error. To cope with this,
chrony modifies txc.tick to take out the gross error as well as txc.freq
to adjust the fine error. Therefore, it needs some idea of how tick and
freq inter-relate, and what the valid range of values for tick is. This
is another mess. I need to go away and think some more to know what info
from the kernel side would make the problem easier to code for, though.
--
Richard \\\ SH-4/SH-5 Core & Debug Architect
Curnow \\\ SuperH (UK) Ltd, Bristol
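The tick/freq split described above can be sketched as plain arithmetic. This is a simplified illustration, not chrony's code: it assumes txc.tick is in microseconds per tick with a nominal value such as 10000 (so one unit is 100 ppm), that txc.freq is in ppm with a 16-bit fractional part, and it deliberately ignores both freq_scale and the kernel's bounds on either field. The struct and function names are invented for the example.

```c
struct clock_adjustment {
    long tick_delta_usec;  /* change to apply to the nominal txc.tick */
    long freq_scaled_ppm;  /* residual for txc.freq, in ppm << 16 */
};

/*
 * Split a desired correction (in ppm) into a coarse part carried by
 * tick and a fine residual carried by freq.
 */
struct clock_adjustment split_correction(double correction_ppm,
                                         long nominal_tick_usec)
{
    struct clock_adjustment adj;
    /* One microsecond of tick is this many ppm (100 for a 10000 us tick). */
    double ppm_per_usec = 1e6 / (double)nominal_tick_usec;

    /* Coarse part: whole tick units, truncated toward zero. */
    adj.tick_delta_usec = (long)(correction_ppm / ppm_per_usec);

    /* Fine part: whatever the tick change did not cover. */
    double residual_ppm =
        correction_ppm - (double)adj.tick_delta_usec * ppm_per_usec;
    adj.freq_scaled_ppm = (long)(residual_ppm * 65536.0);
    return adj;
}
```

For a -300 ppm correction with a 10000 us nominal tick, the whole amount is absorbed by a tick change of -3, leaving freq at 0; for -342.5 ppm the tick change is still -3 and the remaining -42.5 ppm goes into freq.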
On Monday 15 March 2004 00:17, Jamie Lokier wrote:
> Horst von Brand wrote:
> > What for? It should be invisible to userspace...
>
> It's not invisible. select/poll/epoll/setitimer round their time
> argument according to HZ, and programs which do smooth (i.e. low
> _jitter_) animation of the kind where the eye is sensitive to the
> jitter need to track it and correct for it.
>
Wouldn't it be better to just use high res timers and the associated
POSIX interfaces for low jitter applications?
--mgross
Andi Kleen wrote:
>>So it does and POSIX.1 (_SC_CLK_TCK) compliant as well. Unfortunately,
>>the presence of this functionality makes it VERY difficult to understand
>>why ticks are being converted from HZ==1000 values to HZ=100 values when
>>they are being exported to user space especially as this conversion
>>throws away precision. Can anyone enlighten me?
>
>
> There are two different cases here:
>
> Timer tick as visible to user space in the minimum delay of select()
> and other kernel functions with timeout. That is what AT_CLKTCK aims at.
Which is a good reason for USER_HZ to be the same as HZ.
>
> And exports of values with jiffie units in sysctls in /proc. This was in fact
> always a bug because they should have used ms or s as unit
> (there are readily usable utility functions to do this for sysctl). Otherwise
> writing documentation becomes quite difficult. But there are already
> configurations that set or read these values and was not a good idea to
> subtly and silently break them. Especially since they predate any exporting
> of HZ to user space. So the conversion factor was added.
>
> This is not only obscure sysctls, ps and top are also consumers of such
> jiffies values in /proc
>
These programs could (and should) use sysconf(_SC_CLK_TCK) to find out
how many ticks there are in a second, so this does not constitute a good
reason for USER_HZ not being equal to HZ.
BTW, in ignorance of sysconf(_SC_CLK_TCK) and because of statements to
the same effect in Robert Love's book, I had been assuming that this was
the reason for USER_HZ and HZ not being equal. But now that I've been
told about sysconf(_SC_CLK_TCK) I can see no valid reason. That
doesn't mean that there aren't any, but the reasons you've advanced
certainly aren't them.
Peter
--
Dr Peter Williams, Chief Scientist [email protected]
Aurema Pty Limited Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com
Arjan van de Ven wrote:
> On Tue, Mar 16, 2004 at 11:28:18AM +1100, Peter Williams wrote:
>
>>>Ugh that should say 100 on x86....
>>>but..
>>>param.h:# define USER_HZ 100 /* .. some user interfaces
>>>are in "ticks" */
>>>param.h:# define CLOCKS_PER_SEC (USER_HZ) /* like times() */
>>>.....
>>>that looks like 100 to me.
>>>
>>
>>This horrible hack of converting all tick values to 100 (from 1000) for
>>export to user space because a large number of user space programs
>>assume that HZ is 100 would NOT be necessary if there was a mechanism
>>whereby user space programs could find out how many ticks there are in a
>>second instead of having to make assumptions.
>
>
> there is one. Nothing uses it
> (sysconf() provides this info)
Seems to me that it would be fairly trivial to modify those programs
(that should use this mechanism but don't) to use it? So why should
they be allowed to dictate kernel behaviour?
Peter
Bernd Petrovitsch wrote:
> On Tue, 2004-03-16 at 06:53, Peter Williams wrote:
>
>>Andi Kleen wrote:
>>
>>>Peter Williams <[email protected]> writes:
>
> [...]
>
>>>Already exists for a long time - AT_CLKTCK. glibc has a nice wrapper
>>>for it too (sysconf)
>>
>>So it does and POSIX.1 (_SC_CLK_TCK) compliant as well. Unfortunately,
>>the presence of this functionality makes it VERY difficult to understand
>>why ticks are being converted from HZ==1000 values to HZ=100 values when
>>they are being exported to user space especially as this conversion
>>throws away precision. Can anyone enlighten me?
>
>
> 1) Because Linux had HZ=100 hardcoded for a long time (except on Alphas)
> and lots of applications probably use that value today (as HZ in their
> source and not sysconf(...)), especially since 2.4 (at least most
> of them) has HZ=100 except for 64-bit CPUs.
That is not a valid reason. The programs should be fixed.
> 2) There are patches which dynamically change the CPU speed. And it
> probably (IMHO) makes sense to change HZ dynamically too in those
> situations. And an HZ value that changes over time is useless in
> user-space.
I can't see why. Ticks are used internally for process accounting (e.g.
utime, stime, cutime and cstime) and if HZ was changing dynamically
you'd have to visit every task and modify these values to be consistent
with the changed value of HZ. Even if HZ was allowed to change
dynamically the values reported to user space should be in units
appropriate to the MAXIMUM possible value of HZ so that precision is not
lost.
Peter
>
> These programs could (and should) use sysconf(_SC_CLK_TCK) to find out
> how many ticks there are in a second so this does not constitute a good
> reason for USER_HZ not being equal to HZ.
These programs are usually shell scripts that initialise some sysctls.
It's not easy to call sysconf from there. Also we tend to avoid breaking
things that would fail silently instead of failing with an obvious error
message. This would be such a case. Silent breakage is an extremely bad
thing.
-Andi
Andi Kleen wrote:
>>These programs could (and should) use sysconf(_SC_CLK_TCK) to find out
>>how many ticks there are in a second so this does not constitute a good
>>reason for USER_HZ not being equal to HZ.
>
>
> These programs are usually shell scripts that initialise some sysctls.
Which ones? Top and ps don't appear to be scripts on my system (Red Hat
9.0).
> It's not easy to call sysconf from there.
A small utility program would suffice.
> Also we tend to avoid breaking
> things that would fail silently instead of failing with an obvious error
> message. This would be such a case. Silent breakage is an extremely bad
> thing.
This is the responsibility of the authors of the programs in question
not the kernel.
Peter
On Tue, Mar 16, 2004 at 11:14:59AM -0500, Albert Cahalan wrote:
> > there is one. Nothing uses it
> > (sysconf() provides this info)
>
> If you have a recent glibc on a recent kernel, it might.
> You could also get a -1 or a supposed ABI value that
> has nothing to do with the kernel currently running.
> The most reliable way is to first look around on the
> stack in search of ELF notes, and then fall back to
> some horribly gross hacks as needed.
eh sysconf() is the nice way to get to the ELF notes instead of having to
grovel yourself.
On Wed, Mar 17, 2004 at 10:38:03AM +1100, Peter Williams wrote:
> >there is one. Nothing uses it
> >(sysconf() provides this info)
>
> Seems to me that it would be fairly trivial to modify those programs
> (that should use this mechanism but don't) to use it? So why should
> they be allowed to dictate kernel behaviour?
quality of implementation; for example shell scripts that want to do
echo 500 > /proc/sys/foo/bar/something_in_HZ
...
or /etc/sysctl.conf or ...
>>>there is one. Nothing uses it
>>>(sysconf() provides this info)
>>
>>Seems to me that it would be fairly trivial to modify those programs
>>(that should use this mechanism but don't) to use it? So why should
>>they be allowed to dictate kernel behaviour?
>
>
> quality of implementation; for example shell scripts that want to do
> echo 500 > /proc/sys/foo/bar/something_in_HZ
> ...
> or /etc/sysctl.conf or ...
>
Then write a simple program already. How hard is it to write a program
that does a sysconf() and returns (as ascii of course) just the
value of HZ? Then do some trivial calculation off of that.
HZ=$(gethz)
If your 500 was 5 seconds, do
TIME=$[HZ*5]
echo $TIME > /proc/sys/foo/bar/something_in_HZ
I mean, come on.
Then you include it in the default distro of choice so that
everybody can use it and there you are.
If someone doesn't have "gethz" then they can download it.
// Stefan
On Sat, Mar 20, 2004 at 12:28:00PM +0100, Stefan Smietanowski wrote:
>
> Then you include it in the default distro of choice so that
> everybody can use it and there you are.
but what is the POINT of all this changing/breaking ?
Can someone at least tell me that ?
On Sat, 2004-03-20 at 04:56, Arjan van de Ven wrote:
> On Tue, Mar 16, 2004 at 11:14:59AM -0500, Albert Cahalan wrote:
> > > there is one. Nothing uses it
> > > (sysconf() provides this info)
> >
> > If you have a recent glibc on a recent kernel, it might.
> > You could also get a -1 or a supposed ABI value that
> > has nothing to do with the kernel currently running.
> > The most reliable way is to first look around on the
> > stack in search of ELF notes, and then fall back to
> > some horribly gross hacks as needed.
>
> eh sysconf() is the nice way to get to the ELF notes
> instead of having to grovel yourself.
Unless there is some hidden feature that lets
me specify the ELF note number directly, no way.
The sysconf(_SC_CLK_TCK) call does not return an
error code when used on a 2.2.xx i386 kernel.
You get an arbitrary value that fails for ARM,
Alpha, and any system with modified HZ.
You can't rely on sysconf(_SC_NPROCESSORS_CONF)
or sysconf(_SC_NPROCESSORS_ONLN) either. You'll
get back a 0 from the SPARC glibc, which really
means 0 processors since -1 is the error code.
Whatever the question, "use sysconf" is most
likely not the answer.
The man page ought to mention this.
Arjan van de Ven wrote:
> On Wed, Mar 17, 2004 at 10:38:03AM +1100, Peter Williams wrote:
>
>>>there is one. Nothing uses it
>>>(sysconf() provides this info)
>>
>>Seems to me that it would be fairly trivial to modify those programs
>>(that should use this mechanism but don't) to use it? So why should
>>they be allowed to dictate kernel behaviour?
>
>
> quality of implementation; for example shell scripts that want to do
> echo 500 > /proc/sys/foo/bar/something_in_HZ
> ...
> or /etc/sysctl.conf or ...
>
A small utility program secs_to_ticks would solve this problem e.g.:
secs_to_ticks 0.5 > /proc/sys/foo/bar/something_in_HZ
Peter
Albert Cahalan wrote:
> On Sat, 2004-03-20 at 04:56, Arjan van de Ven wrote:
>
>>On Tue, Mar 16, 2004 at 11:14:59AM -0500, Albert Cahalan wrote:
>>
>>>>there is one. Nothing uses it
>>>>(sysconf() provides this info)
>>>
>>>If you have a recent glibc on a recent kernel, it might.
>>>You could also get a -1 or a supposed ABI value that
>>>has nothing to do with the kernel currently running.
>>>The most reliable way is to first look around on the
>>>stack in search of ELF notes, and then fall back to
>>>some horribly gross hacks as needed.
>>
>>eh sysconf() is the nice way to get to the ELF notes
>>instead of having to grovel yourself.
>
>
> Unless there is some hidden feature that lets
> me specify the ELF note number directly, no way.
>
> The sysconf(_SC_CLK_TCK) call does not return an
> error code when used on a 2.2.xx i386 kernel.
> You get an arbitrary value that fails for ARM,
> Alpha, and any system with modified HZ.
As Linux is supposed to be POSIX compliant this is a bug and should be
fixed.
Peter
Arjan van de Ven wrote:
> On Sat, Mar 20, 2004 at 12:28:00PM +0100, Stefan Smietanowski wrote:
>
>>Then you include it in the default distro of choice so that
>>everybody can use it and there you are.
>
>
> but what is the POINT of all this changing/breaking ?
> Can someone at least tell me that ?
In the 2.6 kernels internal timing and task statistics (for i386
systems) are now kept in milliseconds where they were previously in
1/100ths of a second. By converting these statistics to 1/100ths of a
second for export to user space an order of magnitude (i.e. a factor of
10) loss of precision occurs.
Peter
PS I'd like to point out that there are changes in 2.6 kernels that have
more serious consequences than this that have to be coped with when
using 2.6 kernels on distributions such as RedHat 9 that were built
around older kernels.
On Sun, 21 Mar 2004, Peter Williams wrote:
> In the 2.6 kernels internal timing and task statistics (for i386
> systems) are now kept in milliseconds where they were previously in
> 1/100ths of a second. By converting these statistics to 1/100ths of a
> second for export to user space an order of magnitude (i.e. a factor of
> 10) loss of precision occurs.
No. The statistics are not a result of full bookkeeping, but simply
gained by periodically sampling the processor state. So they don't
have a precision of 1/1000th of a second anyways.
Tim
Tim Schmielau wrote:
> On Sun, 21 Mar 2004, Peter Williams wrote:
>
>
>>In the 2.6 kernels internal timing and task statistics (for i386
>>systems) are now kept in milliseconds where they were previously in
>>1/100ths of a second. By converting these statistics to 1/100ths of a
>>second for export to user space an order of magnitude (i.e. a factor of
>>10) loss of precision occurs.
>
>
> No. The statistics are not a result of full bookkeeping, but simply
> gained by periodically sampling the processor state. So they don't
> have a precision of 1/1000th of a second anyways.
1/1000th of a second IS the internal timing precision. The issue of how
tasks' CPU usage is allocated for reporting is a different matter but
from a statistical viewpoint this will just affect the variance (or
standard deviation) of the estimates and NOT their precision. As the
number of samples increases, the variance (or standard deviation)
decreases rapidly, so to all intents and purposes the statistics are
accurate to the nearest 1/1000th of a second.
Peter
[email protected] (Stefan Smietanowski) wrote on 20.03.04 in <[email protected]>:
> >>>there is one. Nothing uses it
> >>>(sysconf() provides this info)
> >>
> >>Seems to me that it would be fairly trivial to modify those programs
> >>(that should use this mechanism but don't) to use it? So why should
> >>they be allowed to dictate kernel behaviour?
> >
> >
> > quality of implementation; for example shell scripts that want to do
> > echo 500 > /proc/sys/foo/bar/something_in_HZ
> > ...
> > or /etc/sysctl.conf or ...
> >
>
> Then write a simple program already. How hard is it to write a program
> that does a sysconf() and returns (as ascii of course) just the
> value of HZ? Then do some trivial calculation off of that.
How about a slightly more useful utility, like this:
$ getconf CLK_TCK
100
$ getconf OPEN_MAX
1024
$ getconf PATH_MAX /proc/
4096
$
Regards, Kai
Hi.
>>>>>there is one. Nothing uses it
>>>>>(sysconf() provides this info)
>>>>
>>>>Seems to me that it would be fairly trivial to modify those programs
>>>>(that should use this mechanism but don't) to use it? So why should
>>>>they be allowed to dictate kernel behaviour?
>>>
>>>
>>>quality of implementation; for example shell scripts that want to do
>>>echo 500 > /proc/sys/foo/bar/something_in_HZ
>>>...
>>>or /etc/sysctl.conf or ...
>>>
>>
>>Then write a simple program already. How hard is it to write a program
>>that does a sysconf() and returns (as ascii of course) just the
>>value of HZ? Then do some trivial calculation off of that.
>
>
> How about a slightly more useful utility, like this:
>
> $ getconf CLK_TCK
> 100
> $ getconf OPEN_MAX
> 1024
> $ getconf PATH_MAX /proc/
> 4096
> $
Yes, yes, yes, I like that one actually.
It does solve the shell script issues and we've never said that things
don't need to adapt to changes before so I don't see why not now.
And that one would be good to have regardless of the HZ issue.
// Stefan
On Sat, Mar 20, 2004 at 12:28:00PM +0100, Stefan Smietanowski wrote:
> >>>there is one. Nothing uses it
> >>>(sysconf() provides this info)
> >>
> >>Seems to me that it would be fairly trivial to modify those programs
> >>(that should use this mechanism but don't) to use it? So why should
> >>they be allowed to dictate kernel behaviour?
> >
> >
> >quality of implementation; for example shell scripts that want to do
> >echo 500 > /proc/sys/foo/bar/something_in_HZ
> >...
> >or /etc/sysctl.conf or ...
> >
>
> Then write a simple program already. How hard is it to write a program
> that does a sysconf() and returns (as ascii of course) just the
> value of HZ? Then do some trivial calculation off of that.
>
> HZ=$(gethz)
>
> If your 500 was 5 seconds, do
>
> TIME=$[HZ*5]
> echo $TIME > /proc/sys/foo/bar/something_in_HZ
>
Will this be USER_HZ or kernel HZ?
Someone earlier suggested it would be USER_HZ which would make it
pointless.
> I mean, come on.
>
> Then you include it in the default distro of choice so that
> everybody can use it and there you are.
>
> If someone doesn't have "gethz" then they can download it.
>
> // Stefan
>
Micha Feigin wrote:
> On Sat, Mar 20, 2004 at 12:28:00PM +0100, Stefan Smietanowski wrote:
>
>>>>>there is one. Nothing uses it
>>>>>(sysconf() provides this info)
>>>>
>>>>Seems to me that it would be fairly trivial to modify those programs
>>>>(that should use this mechanism but don't) to use it? So why should
>>>>they be allowed to dictate kernel behaviour?
>>>
>>>
>>>quality of implementation; for example shell scripts that want to do
>>>echo 500 > /proc/sys/foo/bar/something_in_HZ
>>>...
>>>or /etc/sysctl.conf or ...
>>>
>>
>>Then write a simple program already. How hard is it to write a program
>>that does a sysconf() and returns (as ascii of course) just the
>>value of HZ? Then do some trivial calculation off of that.
>>
>>HZ=$(gethz)
>>
>>If your 500 was 5 seconds, do
>>
>>TIME=$[HZ*5]
>>echo $TIME > /proc/sys/foo/bar/something_in_HZ
>>
>
>
> Will this be USER_HZ or kernel HZ?
> Someone earlier suggested it would be USER_HZ which would make it
> pointless.
It has to be whatever enables user space to correctly interpret values
sent to user space as "ticks". That means USER_HZ and it's not useless
as it enables USER_HZ to be different and/or change without breaking
programs that use values expressed in "ticks".
>
>
>>I mean, come on.
>>
>>Then you include it in the default distro of choice so that
>>everybody can use it and there you are.
>>
>>If someone doesn't have "gethz" then they can download it.
>>
>>// Stefan
>>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
Peter Williams wrote:
> >Will this be USER_HZ or kernel HZ?
> >Someone earlier suggested it would be USER_HZ which would make it
> >pointless.
>
> It has to be whatever enables user space to correctly interpret values
> sent to user space as "ticks". That means USER_HZ and it's not useless
> as it enables USER_HZ to be different and/or change without breaking
> programs that use values expressed in "ticks".
It is, however, useless for the _other_ reasons userspace needs to
know kernel HZ, including as I mentioned userspace timer granularity.
(Btw, that usage would be better as a period rather than a frequency,
so that a "tickless" kernel can report zero).
The fundamental problem is that there are two values, and both values
have programs which can usefully use them.
How hard can it be to export both?
-- Jamie
Jamie Lokier wrote:
> Peter Williams wrote:
>
>>>Will this be USER_HZ or kernel HZ?
>>>Someone earlier suggested it would be USER_HZ which would make it
>>>pointless.
>>
>>It has to be whatever enables user space to correctly interpret values
>>sent to user space as "ticks". That means USER_HZ and it's not useless
>>as it enables USER_HZ to be different and/or change without breaking
>>programs that use values expressed in "ticks".
>
>
> It is, however, useless for the _other_ reasons userspace needs to
> know kernel HZ, including as I mentioned userspace timer granularity.
Theoretically, which I know can be a pain, user space timer granularity
should be in USER_HZ as, theoretically, this is the only one user space
is supposed to know about. Because of this, in my view, HZ and USER_HZ
should be the same or USER_HZ should be greater than HZ.
>
> (Btw, that usage would be better as a period rather than a frequency,
> so that a "tickless" kernel can report zero).
_SC_CLK_TCK is a POSIX.1 definition and can't be changed. But I don't
think that there's any impediment to adding new parameters that can be
reported by sysconf().
>
> The fundamental problem is that there are two values, and both values
> have programs which can usefully use them.
>
> How hard can it be to export both?
>
Making HZ == USER_HZ would also solve the problem.
Peter
Peter Williams wrote:
> Making HZ == USER_HZ would also solve the problem.
They were equal once.
Making them equal now would reintroduce the problem that USER_HZ was
created to resolve: some userspace programs hard-code the value, so it
cannot be changed in interfaces used by those programs.
-- Jamie
On Tue, Mar 23, 2004 at 10:04:22AM +1100, Peter Williams wrote:
> Micha Feigin wrote:
> >On Sat, Mar 20, 2004 at 12:28:00PM +0100, Stefan Smietanowski wrote:
> >
> >>>>>there is one. Nothing uses it
> >>>>>(sysconf() provides this info)
> >>>>
> >>>>Seems to me that it would be fairly trivial to modify those programs
> >>>>(that should use this mechanism but don't) to use it? So why should
> >>>>they be allowed to dictate kernel behaviour?
> >>>
> >>>
> >>>quality of implementation; for example shell scripts that want to do
> >>>echo 500 > /proc/sys/foo/bar/something_in_HZ
> >>>...
> >>>or /etc/sysctl.conf or ...
> >>>
> >>
> >>Then write a simple program already. How hard is it to write a program
> >>that does a sysconf() and returns (as ascii of course) just the
> >>value of HZ? Then do some trivial calculation off of that.
> >>
> >>HZ=$(gethz)
> >>
> >>If your 500 was 5 seconds, do
> >>
> >>TIME=$[HZ*5]
> >>echo $TIME > /proc/sys/foo/bar/something_in_HZ
> >>
> >
> >
> >Will this be USER_HZ or kernel HZ?
> >Someone earlier suggested it would be USER_HZ which would make it
> >pointless.
>
> It has to be whatever enables user space to correctly interpret values
> sent to user space as "ticks". That means USER_HZ and it's not useless
> as it enables USER_HZ to be different and/or change without breaking
> programs that use values expressed in "ticks".
>
Unless the kernel is converted to make that conversion possible then it
is useless at the moment since userspace gets USER_HZ and the kernel
proc interface speaks (KERNEL) HZ so userspace really has no idea how
to speak to kernel space with 2.6.
> >
> >
> >>I mean, come on.
> >>
> >>Then you include it in the default distro of choice so that
> >>everybody can use it and there you are.
> >>
> >>If someone doesn't have "gethz" then they can download it.
> >>
> >>// Stefan
> >>
> >
Jamie Lokier wrote:
> Peter Williams wrote:
>
>>Making HZ == USER_HZ would also solve the problem.
>
>
> They were equal once.
>
> Making them equal now would reintroduce the problem that USER_HZ was
> created to resolve: some userspace programs hard-code the value, so it
> cannot be changed in interfaces used by those programs.
That was the wrong solution to that particular problem. The programs
should have been fixed rather than the kernel being maimed to
accommodate their shortcomings.
Peter
Peter Williams wrote:
> >>Making HZ == USER_HZ would also solve the problem.
> >
> >Making them equal now would reintroduce the problem that USER_HZ was
> >created to resolve: some userspace programs hard-code the value, so it
> >cannot be changed in interfaces used by those programs.
>
> That was the wrong solution to that particular problem. The programs
> should have been fixed rather than the kernel being maimed to
> accommodate their shortcomings.
I agree, and perhaps that should still be done so we can eliminate USER_HZ.
-- Jamie
On Sun, 21 Mar 2004 10:58:20 +1100 Peter Williams wrote:
| Albert Cahalan wrote:
| > On Sat, 2004-03-20 at 04:56, Arjan van de Ven wrote:
| >
| >>On Tue, Mar 16, 2004 at 11:14:59AM -0500, Albert Cahalan wrote:
| >>
| >>>>there is one. Nothing uses it
| >>>>(sysconf() provides this info)
| >>>
| >>>If you have a recent glibc on a recent kernel, it might.
| >>>You could also get a -1 or a supposed ABI value that
| >>>has nothing to do with the kernel currently running.
| >>>The most reliable way is to first look around on the
| >>>stack in search of ELF notes, and then fall back to
| >>>some horribly gross hacks as needed.
| >>
| >>eh sysconf() is the nice way to get to the ELF notes
| >>instead of having to grovel yourself.
| >
| >
| > Unless there is some hidden feature that lets
| > me specify the ELF note number directly, no way.
| >
| > The sysconf(_SC_CLK_TCK) call does not return an
| > error code when used on a 2.2.xx i386 kernel.
| > You get an arbitrary value that fails for ARM,
| > Alpha, and any system with modified HZ.
|
| As Linux is supposed to be POSIX compliant this is a bug and should be
| fixed.
My understanding (from a few years back) is that Linux is POSIX
if/when/where it makes sense, but not necessarily POSIX-just-to-be-POSIX.
--
~Randy
"You can't do anything without having to do something else first."
-- Belefant's Law
> | >>>>there is one. Nothing uses it
> | >>>>(sysconf() provides this info)
> | >>>
> | >>>If you have a recent glibc on a recent kernel, it might.
> | >>>You could also get a -1 or a supposed ABI value that
> | >>>has nothing to do with the kernel currently running.
> | >>>The most reliable way is to first look around on the
> | >>>stack in search of ELF notes, and then fall back to
> | >>>some horribly gross hacks as needed.
> | >>
> | >>eh sysconf() is the nice way to get to the ELF notes
> | >>instead of having to grovel yourself.
> | >
> | >
> | > Unless there is some hidden feature that lets
> | > me specify the ELF note number directly, no way.
> | >
> | > The sysconf(_SC_CLK_TCK) call does not return an
> | > error code when used on a 2.2.xx i386 kernel.
> | > You get an arbitrary value that fails for ARM,
> | > Alpha, and any system with modified HZ.
> |
> | As Linux is supposed to be POSIX compliant this is a bug and should be
> | fixed.
>
>
> My understanding (from a few years back) is that Linux is POSIX
> if/when/where it makes sense, but not necessarily POSIX-just-to-be-POSIX.
The fixing has been done.
This is not yet helpful for app developers, because
old kernels and old libraries are still in use.
If you rely on sysconf(_SC_CLK_TCK) to work, then
your software will support:
* all systems with a 2.6.xx kernel
* all systems with a 2.4.xx kernel and recent glibc
* all i386 systems running with the default HZ
That's quite a bit I suppose. Maybe you have no
interest in supporting a 1200 HZ Alpha with an old
kernel or glibc. Maybe you don't care about somebody
running a 2.2.xx kernel with modified HZ.
For the moment, I still care. I won't for long.
Albert Cahalan wrote:
> If you rely on sysconf(_SC_CLK_TCK) to work, then
> your software will support:
>
> * all systems with a 2.6.xx kernel
> * all systems with a 2.4.xx kernel and recent glibc
> * all i386 systems running with the default HZ
>
> That's quite a bit I suppose. Maybe you have no
> interest in supporting a 1200 HZ Alpha with an old
> kernel or glibc. Maybe you don't care about somebody
> running a 2.2.xx kernel with modified HZ.
I'm still unclear. Does sysconf(_SC_CLK_TCK), when it is reliable,
return HZ or USER_HZ?
-- Jamie
On Thu, Apr 01, 2004 at 04:54:20PM +0100, Jamie Lokier wrote:
> Albert Cahalan wrote:
> > If you rely on sysconf(_SC_CLK_TCK) to work, then
> > your software will support:
> >
> > * all systems with a 2.6.xx kernel
> > * all systems with a 2.4.xx kernel and recent glibc
> > * all i386 systems running with the default HZ
> >
> > That's quite a bit I suppose. Maybe you have no
> > interest in supporting a 1200 HZ Alpha with an old
> > kernel or glibc. Maybe you don't care about somebody
> > running a 2.2.xx kernel with modified HZ.
>
> I'm still unclear. Does sysconf(_SC_CLK_TCK), when it is reliable,
> return HZ or USER_HZ?
USER_HZ; the value all the userspace interfaces are in.
HZ doesn't mean anything, esp. when we go to a tickless kernel...
On Thu, 2004-04-01 at 10:54, Jamie Lokier wrote:
> Albert Cahalan wrote:
> > If you rely on sysconf(_SC_CLK_TCK) to work, then
> > your software will support:
> >
> > * all systems with a 2.6.xx kernel
> > * all systems with a 2.4.xx kernel and recent glibc
> > * all i386 systems running with the default HZ
> >
> > That's quite a bit I suppose. Maybe you have no
> > interest in supporting a 1200 HZ Alpha with an old
> > kernel or glibc. Maybe you don't care about somebody
> > running a 2.2.xx kernel with modified HZ.
>
> I'm still unclear. Does sysconf(_SC_CLK_TCK), when it is reliable,
> return HZ or USER_HZ?
I consider "reliable" to mean it returns whatever is
used by /proc and other kernel interfaces. Prior to the
2.6.xx (and late 2.5.xx) kernels USER_HZ did not exist.
On a 2.6.xx kernel, you get back USER_HZ.
On a 2.4.xx kernel with recent glibc, you get
back HZ, which works OK since there isn't any
HZ to USER_HZ conversion.
On any i386 system with the default HZ, you
will get back 100. On older systems, glibc is
just giving you a constant value -- so it is
correct if your system is an i386 without any
non-Linus modifications. An old glibc can only
do sysconf(_SC_CLK_TCK) this way.
Arjan van de Ven wrote:
> HZ doesn't mean anything, esp. when we go to a tickless kernel...
As explained several times in this thread, HZ is meaningful because it
affects the rounding in select/poll/epoll/setitimer. A few userspace
programs with low jitter soft-RT timing requirements need to
compensate for that rounding and/or deliberately synchronise
themselves with the tick.
Such programs can determine HZ experimentally and lock onto the tick
in the manner of a PLL, but it would be nice to simply be able to
have the value, to reduce the number of control variables.
When we go to a tickless kernel and offer high-resolution timers to
userspace, then it will be irrelevant. Until then, or if the kernel
goes tickless but limits the resolution of timers for efficiency, the
value of HZ is still relevant.
Not to get irritatingly back to the subject of this thread or
anything, but... is the value of HZ reported to userspace anywhere?
Thanks :)
-- Jamie
On Thu, 1 Apr 2004, Jamie Lokier wrote:
> Arjan van de Ven wrote:
> > HZ doesn't mean anything, esp. when we go to a tickless kernel...
>
> As explained several times in this thread, HZ is meaningful because it
> affects the rounding in select/poll/epoll/setitimer. A few userspace
> programs with low jitter soft-RT timing requirements need to
> compensate for that rounding and/or deliberately synchronise
> themselves with the tick.
>
> Such programs can determine HZ experimentally and lock onto the tick
> in the manner of a PLL, but it would be nice to simply be able to
> have the value, to reduce the number of control variables.
>
> When we go to a tickless kernel and offer high-resolution timers to
> userspace, then it will be irrelevant. Until then, or if the kernel
> goes tickless but limits the resolution of timers for efficiency, the
> value of HZ is still relevant.
>
> Not to get irritatingly back to the subject of this thread or
> anything, but... is the value of HZ reported to userspace anywhere?
>
> Thanks :)
> -- Jamie
I may be naive, but what's the matter with:
#include <stdio.h>
#include <sys/param.h> // Required to be here!
int main()
{
    printf("HZ=%d\n", HZ);
    return 0;
}
It works for me.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.
Richard B. Johnson wrote:
> > Not to get irritatingly back to the subject of this thread or
> > anything, but... is the value of HZ reported to userspace anywhere?
>
> I may be naive, but what's the matter with:
>
> #include <sys/param.h> // Required to be here!
> int main()
> {
> printf("HZ=%d\n", HZ);
> return 0;
> }
> It works for me.
It gives the wrong answer for HZ on 2.6 kernels. Try it.
The value called "HZ" we are talking about in this thread is the timer
interrupt frequency. On 2.6 kernels, on x86, that is 1000. Your
program prints 100.
The reason that you are able to use "HZ" from userspace and get the
wrong answer is that the macros have different names when used from
userspace than from kernelspace.
The value your program reports is what we mean by USER_HZ in this
thread. That macro is renamed to HZ when the kernel header
<linux/param.h> is included from userspace, for backward
source compatibility with some programs.
Your method also perpetuates the problem that USER_HZ is hard-coded as
a constant into programs, so cannot ever be changed. Perhaps the
header files should redefine "HZ" to call sysconf(_SC_CLK_TCK)
nowadays, but presently they don't.
-- Jamie
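A minimal sketch of the sysconf(_SC_CLK_TCK) approach, which asks for the tick rate at run time instead of compiling a constant into the binary. Note that this returns the userspace tick unit (what this thread calls USER_HZ) used by interfaces such as times(), not the kernel's timer interrupt frequency. The helper names are illustrative only.

```c
#include <sys/times.h>
#include <unistd.h>

/* Ticks per second for times()-style values, queried at run time. */
static long user_ticks_per_second(void)
{
    return sysconf(_SC_CLK_TCK);    /* -1 on error */
}

/* Convert a clock_t tick count to seconds using a run-time tick rate. */
static double ticks_to_seconds(clock_t ticks, long ticks_per_sec)
{
    if (ticks_per_sec <= 0)
        return -1.0;
    return (double)ticks / (double)ticks_per_sec;
}

/* CPU time (user + system) used so far by the calling process, in seconds. */
static double process_cpu_seconds(void)
{
    struct tms t;

    if (times(&t) == (clock_t)-1)
        return -1.0;
    return ticks_to_seconds(t.tms_utime + t.tms_stime,
                            user_ticks_per_second());
}
```

A binary built this way keeps giving correct times() conversions even if the reported tick unit differs between the build machine and the machine it runs on.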
On Thursday 01 April 2004 18:50, Richard B. Johnson wrote:
> I may be naive, but what's the matter with:
>
> #include <stdio.h>
> #include <sys/param.h> // Required to be here!
> int main()
> {
> printf("HZ=%d\n", HZ);
> return 0;
> }
> It works for me.
What happens when you compile this tool on a system with,
for example, 2.4 kernel headers, and then switch to a system
with a 2.6 kernel and kernel headers? It still reports
HZ=100, and that's not true anymore.
--
Regards Michael Buesch [ http://www.tuxsoft.de.vu ]
Jamie Lokier wrote:
> Arjan van de Ven wrote:
>
>>HZ doesn't mean nothing, esp when we go to a tickless kernel...
>
>
> As explained several times in this thread, HZ is meaningful because it
> affects the rounding in select/poll/epoll/setitimer. A few userspace
> programs with low jitter soft-RT timing requirements need to
> compensate for that rounding and/or deliberately synchronise
> themselves with the tick.
>
> Such programs can determine HZ experimentally and lock onto the tick
> in the manner of a PLL, but it would be nice to simply be able to
> have the value, to reduce the number of control variables.
>
> When we go to a tickless kernel and offer high-resolution timers to
> userspace, then it will be irrelevant. Until then, or if the kernel
> goes tickless but limits the resolution of timers for efficiency, the
> value of HZ is still relevant.
The resolution will always be limited. That's the nature of digital
systems. Unlimited resolution would require real "real" numbers and
that's not possible. The nearest you get on a digital system is the
floating point APPROXIMATION to real numbers.
>
> Not to get irritatingly back to the subject of this thread or
> anything, but... is the value of HZ reported to userspace anywhere?
I don't think so. There are those (I'm not one) who insist that to do
so would be a bug.
IMHO, as I've said several times, USER_HZ should be changed to be equal
to or greater than HZ. In fact, if having USER_HZ greater than HZ would
still make it unusable for your purposes, I'd change that opinion to say
USER_HZ should be equal to HZ (or, in other words, cease to exist).
Peter
--
Dr Peter Williams, Chief Scientist [email protected]
Aurema Pty Limited Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com
Richard B. Johnson wrote:
> On Thu, 1 Apr 2004, Jamie Lokier wrote:
>
>
>>Arjan van de Ven wrote:
>>
>>>HZ doesn't mean nothing, esp when we go to a tickless kernel...
>>
>>As explained several times in this thread, HZ is meaningful because it
>>affects the rounding in select/poll/epoll/setitimer. A few userspace
>>programs with low jitter soft-RT timing requirements need to
>>compensate for that rounding and/or deliberately synchronise
>>themselves with the tick.
>>
>>Such programs can determine HZ experimentally and lock onto the tick
>>in the manner of a PLL, but it would be nice to simply be able to
>>have the value, to reduce the number of control variables.
>>
>>When we go to a tickless kernel and offer high-resolution timers to
>>userspace, then it will be irrelevant. Until then, or if the kernel
>>goes tickless but limits the resolution of timers for efficiency, the
>>value of HZ is still relevant.
>>
>>Not to get irritatingly back to the subject of this thread or
>>anything, but... is the value of HZ reported to userspace anywhere?
>>
>>Thanks :)
>>-- Jamie
>
>
> I may be naive, but what's the matter with:
>
> #include <stdio.h>
> #include <sys/param.h> // Required to be here!
> int main()
> {
> printf("HZ=%d\n", HZ);
> return 0;
> }
> It works for me.
There's no guarantee that the kernel that's running was compiled using
that header file, which (on my system, i.e. Red Hat 9) comes as part of
the glibc package. It also gets the value indirectly via linux/param.h,
which in turn gets it via asm/param.h, which makes any such program
highly non-portable. Not to mention that the HZ obtained this way is
100, which is not the same as HZ in the 2.6.5-rc3 kernel that
I'm running.
Peter
Peter Williams wrote:
> >When we go to a tickless kernel and offer high-resolution timers to
> >userspace, then it will be irrelevant. Until then, or if the kernel
> >goes tickless but limits the resolution of timers for efficiency, the
> >value of HZ is still relevant.
>
> The resolution will always be limited. That's the nature of digital
> systems. Unlimited resolution would require real "real" numbers and
> that's not possible. The nearest you get on a digital system is the
> floating point APPROXIMATION to real numbers.
Sure, but HZ will still be irrelevant. There won't be a HZ to report.
> IMHO, as I've said several times, USER_HZ should be changed to be equal
> to or greater than HZ. In fact, if having USER_HZ greater than HZ would
> still make it unusable for your purposes, I'd change that opinion to say
> USER_HZ should be equal to HZ (or, in other words, cease to exist).
It's not possible to change USER_HZ. There are too many programs with
the number hard-coded into the binary. The best we could do is make
the HZ userspace macro non-constant, so it calls sysconf(_SC_CLK_TCK),
and wait a few years until practically all programs being used no
longer contain a hard-coded constant. Then we could get rid of USER_HZ again.
-- Jamie
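One way the non-constant HZ macro suggested above might look, as a hypothetical sketch; no libc header is claimed to define it this way. legacy_ticks_to_ms is an invented stand-in for old source code that uses HZ.

```c
#include <sys/param.h>   /* may define HZ as a compile-time constant */
#include <unistd.h>

/* Hypothetical override: make HZ a run-time query instead of a constant. */
#undef HZ
#define HZ ((long)sysconf(_SC_CLK_TCK))

/* Stand-in for legacy code: it still compiles unchanged, but now picks up
 * the tick rate reported by the running system. */
static long legacy_ticks_to_ms(long ticks)
{
    return ticks * 1000 / HZ;
}
```

The trade-off is that HZ stops being a constant expression, so source that uses it in #if, array sizes or static initializers would no longer compile and would need fixing by hand.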
Jamie Lokier wrote:
> Peter Williams wrote:
>
>>>When we go to a tickless kernel and offer high-resolution timers to
>>>userspace, then it will be irrelevant. Until then, or if the kernel
>>>goes tickless but limits the resolution of timers for efficiency, the
>>>value of HZ is still relevant.
>>
>>The resolution will always be limited. That's the nature of digital
>>systems. Unlimited resolution would require real "real" numbers and
>>that's not possible. The nearest you get on a digital system is the
>>floating point APPROXIMATION to real numbers.
>
>
> Sure, but HZ will still be irrelevant. There won't be a HZ to report.
>
>
>>IMHO, as I've said several times, USER_HZ should be changed to be equal
>>to or greater than HZ. In fact, if having USER_HZ greater than HZ would
>>still make it unusable for your purposes, I'd change that opinion to say
>>USER_HZ should be equal to HZ (or, in other words, cease to exist).
>
>
> It's not possible to change USER_HZ. There are too many programs with
> the number hard-coded into the binary.
This is an argument that the tail should be allowed to wag the dog and
is not really valid :-)
> The best we could do is make
> the HZ userspace macro non-constant, so it calls sysconf(_SC_CLK_TCK),
> and wait a few years until practically all programs being used no
> longer contain a hard-coded constant. Then we could get rid of USER_HZ again.
If USER_HZ is dispensed with the programs will get fixed pretty quick
but as long as this concession to buggy programs is made they won't get
fixed (because they don't have to be).
Peter
Peter Williams wrote:
>> It's not possible to change USER_HZ. There are too many programs with
>> the number hard-coded into the binary.
>
> This is an argument that the tail should be allowed to wag the dog and
> is not really valid :-)
It is an interesting, but untenable, position that the applications
are the tail and the OS is the dog. The OS exists to serve the applications.
The applications are, after all, what a user actually DOES with their computer.
It is possible that the current applications which use hardcoded USER_HZ are
not important enough, or are easy enough to fix, that the cost in incompatibility
is offset by the benefit of providing different behaviour for future applications.
But breaking them for no good reason, and particularly while there is a
migration path possible over time which does not break compatibility, seems like
a bad idea.
=============================
Tim Bird
Architecture Group Co-Chair
CE Linux Forum
Senior Staff Engineer
Sony Electronics
E-mail: [email protected]
=============================
Tim Bird wrote:
> Peter Williams wrote:
>
>>> It's not possible to change USER_HZ. There are too many programs with
>>> the number hard-coded into the binary.
>>
>>
>> This is an argument that the tail should be allowed to wag the dog and
>> is not really valid :-)
>
>
> It is an interesting, but untenable, position that the applications
> are the tail and the OS is the dog. The OS exists to serve the
> applications. The applications are, after all, what a user actually
> DOES with their computer.
I guess wagging was a bad analogy. I was thinking in terms of the
kernel being the main entity and the programs being peripheral in the
sense that the kernel can exist without the programs but the programs
can't exist without the kernel.
>
> It is possible that the current applications which use hardcoded
> USER_HZ are not important enough, or are easy enough to fix, that the
> cost in incompatibility is offset by the benefit of providing
> different behaviour for future applications.
Yes, the real point is that the facilities provided by the
kernel shouldn't be tailored/compromised to cope with the problems of a
couple of buggy programs, especially when fixing the programs would be
trivial. I don't think the importance of the program is an issue, as I
doubt that there is any program that is so important that its
requirements dictate kernel design.
>
> But breaking them for no good reason, and particularly while there is
> a migration path possible over time which does not break
> compatibility, seems like a bad idea.
Far more important things than these programs have been "broken" by
changes in the kernel (I know, I've had to cope with them getting 2.6.X
kernels to work with Red Hat 9) but no one complains or suggests that
the kernel should revert to its original behaviour. Change is part of
progress and has to be coped with, not resisted for no good reason.
Peter