2015-08-22 14:01:12

by William Breathitt Gray

Subject: Regarding USER_HZ and the exposure of kernel jiffies in userspace

Hello,

I submitted a bug report a couple months ago regarding the exposure of
unscaled kernel jiffies in the /proc/timer_list file (see
http://bugzilla.kernel.org/show_bug.cgi?id=99401):

> I noticed that the “jiffies” line from the /proc/timer_list file has a
> value that is not scaled via the USER_HZ constant. Looking into the
> source code of the kernel/time/timer_list.c file, I found lines
> 189-190 to be the cause:
>
> SEQ_printf(m, "jiffies: %Lu\n",
> (unsigned long long)jiffies);
>
> The actual kernel jiffies are printed out directly without scaling. I
> was under the impression that all kernel jiffies should be scaled via
> USER_HZ -- e.g. through the jiffies_to_clock_t function provided by
> include/linux/jiffies.h -- before exposure in userspace.

There has been no response since, and the behavior is still present in
Linux version 4.1.6, so I suspect my understanding is faulty and the
exposure of unscaled kernel jiffies is in fact intentional behavior.

I would like to understand why this behavior is intentional, and correct
my faulty impression of the design. Here's my understanding so far,
please let me know where I go wrong:

The Linux kernel used to have HZ set at a constant 100 for all
architectures. As additional architecture support was added, the HZ
value became variable: e.g. Linux on one machine could have a HZ
value of 1000 while Linux on another machine could have a HZ value
of 100.

This possibility of a variable HZ value caused existing user code,
which had hardcoded an expectation of HZ set to 100, to break due to
the exposure in userspace of kernel jiffies which may have been based
on a HZ value that was not equal to 100.

To prevent the chaos that would occur from years of existing user
code hardcoding a constant HZ value of 100, a compromise was made:
any exposure of kernel jiffies to userspace should be scaled via a
new USER_HZ value -- thus preventing existing user code from
breaking on machines with a different HZ value, while still allowing
the kernel on those machines to have a HZ value different from the
historic 100 value.
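
As a rough sketch of the scaling described above, assuming the kernel
rate and the userspace rate are passed in as plain numbers: the
function name scale_jiffies_to_user_hz and its parameters are
hypothetical, and this is not the kernel's actual jiffies_to_clock_t
code, only the arithmetic it is understood to perform.

```c
#include <stdint.h>

/*
 * Illustration only, not the kernel's implementation: convert a tick
 * count kept at the kernel's internal rate (hz) into ticks at the rate
 * userspace expects (user_hz). Both rates are ordinary parameters here;
 * in the kernel they are the compile-time constants HZ and USER_HZ.
 */
static uint64_t scale_jiffies_to_user_hz(uint64_t jiffies, uint32_t hz,
                                         uint32_t user_hz)
{
    if (hz == user_hz)
        return jiffies;                   /* nothing to scale */
    if (hz > user_hz && hz % user_hz == 0)
        return jiffies / (hz / user_hz);  /* e.g. hz=1000: divide by 10 */
    if (hz < user_hz && user_hz % hz == 0)
        return jiffies * (user_hz / hz);  /* e.g. hz=50: multiply by 2 */
    /* General case; can overflow for very large counts in this sketch. */
    return jiffies * user_hz / hz;
}
```

With hz set to 1000 and user_hz set to 100, one second's worth of
kernel ticks (1000) comes out as 100, which is what code written
against the historic value of 100 expects to see.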

I believe the error in my understanding is the assumption that _all_
instances of kernel jiffies exposure in userspace should be scaled; but
it appears that not all instances are. When are kernel jiffies meant to
be scaled via USER_HZ, and when are they not?

Thanks,

William Breathitt Gray


2015-08-22 14:29:56

by Thomas Gleixner

Subject: Re: Regarding USER_HZ and the exposure of kernel jiffies in userspace

On Sat, 22 Aug 2015, William Breathitt Gray wrote:
> I believe the error in my understanding is the assumption that _all_
> instances of kernel jiffies exposure in userspace should be scaled; but
> it appears that not all instances are. When are kernel jiffies meant to
> be scaled via USER_HZ, and when are they not?

All instances which are de facto APIs, syscalls and also various files
in /proc must be in USER_HZ because userspace applications depend on
the USER_HZ value.
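
For illustration, a userspace program does not have to hardcode 100:
it can ask for the tick rate at run time with sysconf(_SC_CLK_TCK) and
convert tick counts from such interfaces to seconds. The helper names
below are hypothetical, and this is only a minimal sketch.

```c
#include <unistd.h>

/* Ticks per second as seen by userspace; sysconf returns -1 on error. */
static long user_ticks_per_second(void)
{
    return sysconf(_SC_CLK_TCK);
}

/* Convert a tick count reported in userspace ticks to seconds. */
static double ticks_to_seconds(unsigned long long ticks, long ticks_per_sec)
{
    return (double)ticks / (double)ticks_per_sec;
}
```

A caller would read a tick count (for example from times()) and pass
it to ticks_to_seconds() together with user_ticks_per_second().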

/proc/timer_list is exempt from that because it's more of a debugging
interface which is not part of the strict kernel API. And we really
want to see the real values and not the scaled USER_HZ ones for that
purpose. I hope that answers your question.

Thanks,

tglx