2008-06-25 14:49:44

by Michael Kerrisk

[permalink] [raw]
Subject: [patch] make siginfo_t si_utime + si_sstime report times in USER_HZ, not HZ

Oleg, Thomas,

In the switch to configurable HZ in 2.6, the treatment of the si_utime
and si_stime fields that are exposed to userland via the siginfo
structure looks to have been botched. As things stand, these fields
report times in units of HZ, so that userland gets information that
varies depending on the HZ that the kernel was configured with. This
(trivial, untested) patch changes the reported values to use USER_HZ
units. What do you think of making this change?

Signed-off-by: Michael Kerrisk <[email protected]>

--- /home/mtk/ARCHIVE/KERNEL/linux-2.6.26-rc7/kernel/signal.c.orig 2008-06-24
16:20:39.000000000 +0200
+++ /home/mtk/ARCHIVE/KERNEL/linux-2.6.26-rc7/kernel/signal.c 2008-06-24
16:22:17.000000000 +0200
@@ -1379,10 +1379,9 @@

info.si_uid = tsk->uid;

- /* FIXME: find out whether or not this is supposed to be c*time. */
- info.si_utime = cputime_to_jiffies(cputime_add(tsk->utime,
+ info.si_utime = cputime_to_clock_t(cputime_add(tsk->utime,
tsk->signal->utime));
- info.si_stime = cputime_to_jiffies(cputime_add(tsk->stime,
+ info.si_stime = cputime_to_clock_t(cputime_add(tsk->stime,
tsk->signal->stime));

info.si_status = tsk->exit_code & 0x7f;
@@ -1450,9 +1449,8 @@

info.si_uid = tsk->uid;

- /* FIXME: find out whether or not this is supposed to be c*time. */
- info.si_utime = cputime_to_jiffies(tsk->utime);
- info.si_stime = cputime_to_jiffies(tsk->stime);
+ info.si_utime = cputime_to_clock_t(tsk->utime);
+ info.si_stime = cputime_to_clock_t(tsk->stime);

info.si_code = why;
switch (why) {


2008-06-25 15:35:46

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] make siginfo_t si_utime + si_sstime report times in USER_HZ, not HZ

On 06/25, Michael Kerrisk wrote:
>
> --- /home/mtk/ARCHIVE/KERNEL/linux-2.6.26-rc7/kernel/signal.c.orig 2008-06-24
> 16:20:39.000000000 +0200
> +++ /home/mtk/ARCHIVE/KERNEL/linux-2.6.26-rc7/kernel/signal.c 2008-06-24
> 16:22:17.000000000 +0200
> @@ -1379,10 +1379,9 @@
>
> info.si_uid = tsk->uid;
>
> - /* FIXME: find out whether or not this is supposed to be c*time. */
> - info.si_utime = cputime_to_jiffies(cputime_add(tsk->utime,
> + info.si_utime = cputime_to_clock_t(cputime_add(tsk->utime,
> tsk->signal->utime));
> - info.si_stime = cputime_to_jiffies(cputime_add(tsk->stime,
> + info.si_stime = cputime_to_clock_t(cputime_add(tsk->stime,
> tsk->signal->stime));
>
> info.si_status = tsk->exit_code & 0x7f;
> @@ -1450,9 +1449,8 @@
>
> info.si_uid = tsk->uid;
>
> - /* FIXME: find out whether or not this is supposed to be c*time. */
> - info.si_utime = cputime_to_jiffies(tsk->utime);
> - info.si_stime = cputime_to_jiffies(tsk->stime);
> + info.si_utime = cputime_to_clock_t(tsk->utime);
> + info.si_stime = cputime_to_clock_t(tsk->stime);
>
> info.si_code = why;
> switch (why) {

This looks like the obviously good fix to me.

The patch also deletes the comment about signal_struct->cXtime,
this also looks right: why should we use cutime/cstime ?

Oleg.

2008-06-26 15:10:01

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [patch] make siginfo_t si_utime + si_sstime report times in USER_HZ, not HZ

On Wed, Jun 25, 2008 at 5:38 PM, Oleg Nesterov <[email protected]> wrote:
> On 06/25, Michael Kerrisk wrote:
>>
>> --- /home/mtk/ARCHIVE/KERNEL/linux-2.6.26-rc7/kernel/signal.c.orig 2008-06-24
>> 16:20:39.000000000 +0200
>> +++ /home/mtk/ARCHIVE/KERNEL/linux-2.6.26-rc7/kernel/signal.c 2008-06-24
>> 16:22:17.000000000 +0200
>> @@ -1379,10 +1379,9 @@
>>
>> info.si_uid = tsk->uid;
>>
>> - /* FIXME: find out whether or not this is supposed to be c*time. */
>> - info.si_utime = cputime_to_jiffies(cputime_add(tsk->utime,
>> + info.si_utime = cputime_to_clock_t(cputime_add(tsk->utime,
>> tsk->signal->utime));
>> - info.si_stime = cputime_to_jiffies(cputime_add(tsk->stime,
>> + info.si_stime = cputime_to_clock_t(cputime_add(tsk->stime,
>> tsk->signal->stime));
>>
>> info.si_status = tsk->exit_code & 0x7f;
>> @@ -1450,9 +1449,8 @@
>>
>> info.si_uid = tsk->uid;
>>
>> - /* FIXME: find out whether or not this is supposed to be c*time. */
>> - info.si_utime = cputime_to_jiffies(tsk->utime);
>> - info.si_stime = cputime_to_jiffies(tsk->stime);
>> + info.si_utime = cputime_to_clock_t(tsk->utime);
>> + info.si_stime = cputime_to_clock_t(tsk->stime);
>>
>> info.si_code = why;
>> switch (why) {
>
> This looks like the obviously good fix to me.

Tested now, and it does what I expect.

> The patch also deletes the comment about signal_struct->cXtime,
> this also looks right: why should we use cutime/cstime ?

Hmmm -- maybe I was wrong to delete that comment. I think the point
of the comment was: should the time returned vie these fields of the
signinfo structure also include the times for (grand)children of the
process that had terminated and been wait()ed for. My first take on
that was "no". But now I'm not 100% sure. A quick test on Solaris 8
suggests that these fields *do* include the times of waited for
children. (None of this is specified in POSIX.1, which doesn't
specify si_utime and si_stime.) I've not yet tested FreeBSD (not sure
if it supports these fields or not).

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

2008-06-27 08:07:53

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [patch] make siginfo_t si_utime + si_sstime report times in USER_HZ, not HZ

On Thu, Jun 26, 2008 at 5:09 PM, Michael Kerrisk
<[email protected]> wrote:
> On Wed, Jun 25, 2008 at 5:38 PM, Oleg Nesterov <[email protected]> wrote:
>> On 06/25, Michael Kerrisk wrote:
>>>
>>> --- /home/mtk/ARCHIVE/KERNEL/linux-2.6.26-rc7/kernel/signal.c.orig 2008-06-24
>>> 16:20:39.000000000 +0200
>>> +++ /home/mtk/ARCHIVE/KERNEL/linux-2.6.26-rc7/kernel/signal.c 2008-06-24
>>> 16:22:17.000000000 +0200
>>> @@ -1379,10 +1379,9 @@
>>>
>>> info.si_uid = tsk->uid;
>>>
>>> - /* FIXME: find out whether or not this is supposed to be c*time. */
>>> - info.si_utime = cputime_to_jiffies(cputime_add(tsk->utime,
>>> + info.si_utime = cputime_to_clock_t(cputime_add(tsk->utime,
>>> tsk->signal->utime));
>>> - info.si_stime = cputime_to_jiffies(cputime_add(tsk->stime,
>>> + info.si_stime = cputime_to_clock_t(cputime_add(tsk->stime,
>>> tsk->signal->stime));
>>>
>>> info.si_status = tsk->exit_code & 0x7f;
>>> @@ -1450,9 +1449,8 @@
>>>
>>> info.si_uid = tsk->uid;
>>>
>>> - /* FIXME: find out whether or not this is supposed to be c*time. */
>>> - info.si_utime = cputime_to_jiffies(tsk->utime);
>>> - info.si_stime = cputime_to_jiffies(tsk->stime);
>>> + info.si_utime = cputime_to_clock_t(tsk->utime);
>>> + info.si_stime = cputime_to_clock_t(tsk->stime);
>>>
>>> info.si_code = why;
>>> switch (why) {
>>
>> This looks like the obviously good fix to me.
>
> Tested now, and it does what I expect.
>
>> The patch also deletes the comment about signal_struct->cXtime,
>> this also looks right: why should we use cutime/cstime ?
>
> Hmmm -- maybe I was wrong to delete that comment. I think the point
> of the comment was: should the time returned vie these fields of the
> signinfo structure also include the times for (grand)children of the
> process that had terminated and been wait()ed for. My first take on
> that was "no". But now I'm not 100% sure. A quick test on Solaris 8
> suggests that these fields *do* include the times of waited for
> children. (None of this is specified in POSIX.1, which doesn't
> specify si_utime and si_stime.) I've not yet tested FreeBSD (not sure
> if it supports these fields or not).

So, FreeBSD doesn't implement these fields. A more extensive test
(see below) on Solaris (8) shows that it does indeed include the times
of waited-for children in the si_utime and si_stime fields.

I'm not sure if we should emulate that behavior or not - it'd be easy
enough to do so of course. Thoughts, anyone?

Cheers,

Michael

Test
====

(If anyone has access to Tru64 or Irix, I'd be interested to see
results from the test program below.)

Using the test program below allows us to see the difference if the
child doesn't wait on it's child versus if it does wait.

# Child creates (grand)child that burns 1 second of CPU
# Child burns 2 seconds of CPU, and waits for 4 seconds
# Child does not wait on (grand)child

$ ./a.out 2 4 1
Child 17838 about to burn 1 secs
Child 17838 done
Finished burn
Finished sleep
waited for 0 children
SIGCHLD si_status=0x0; si_utime=37 si_stime=164

# Child creates (grand)child that burns 1 second of CPU
# Child burns 2 seconds of CPU, and waits for 4 seconds
# Child waits on (grand)child

$ ./a.out 2 4 1w
Child 17841 about to burn 1 secs
Child 17841 done
Finished burn
Finished sleep
About to wait for child 3 (1w), PID 17841
waited for 1 children
SIGCHLD si_status=0x0; si_utime=59 si_stime=243

==================

/* sigchld_siginfo_time_test.c

Copyright 2008, Linux Foundation
Author: Michael Kerrisk <[email protected]>
Licensed under GPLv2 or later
*/
#include <signal.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>


#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
} while (0)

#define usageErr(msg, progName) \
do { fprintf(stderr, "Usage: "); \
fprintf(stderr, msg, progName); \
exit(EXIT_FAILURE); } while (0)

static void
handler(int sig, siginfo_t *siginfo, void *arg)
{
printf("SIGCHLD si_status=0x%x; ",
(unsigned int) siginfo->si_status);
printf("si_utime=%ld si_stime=%ld\n",
siginfo->si_utime, siginfo->si_stime);
} /* handler */


static void
burn(int nsecs)
{
struct rusage start, curr;
float t;

if (getrusage(RUSAGE_SELF, &start) == -1) errExit("getrusage");

for (;;) {
if (getrusage(RUSAGE_SELF, &curr) == -1)
errExit("getrusage");

t = (curr.ru_utime.tv_sec - start.ru_utime.tv_sec) * 1000000 +
(curr.ru_utime.tv_usec - start.ru_utime.tv_usec) +
(curr.ru_stime.tv_sec - start.ru_stime.tv_sec) * 1000000 +
(curr.ru_stime.tv_usec - start.ru_stime.tv_usec);

if (t > nsecs * 1000000)
break;
}
} /* burn */


static void
doChild(int argc, char *argv[])
{
int cn, waited, rtime;
pid_t *childList;
char *ep;

if (signal(SIGCHLD, SIG_DFL) == SIG_ERR) errExit("signal");

childList = calloc(argc, sizeof(pid_t));
if (childList == NULL)
errExit("calloc");

for (cn = 3; cn < argc; cn++) {
rtime = strtol(argv[cn], &ep, 0);

childList[cn] = fork();
if (childList[cn] == -1)
errExit("fork");

if (childList[cn] == 0) {
printf("Child %ld about to burn %d secs\n",
(long) getpid(), rtime);
burn(rtime);
printf("Child %ld done\n", (long) getpid());
exit(EXIT_SUCCESS);
} else {
if (*ep == '\0')
childList[cn] = -childList[cn]; /* Don't waitpid() */
}
}

/* Burn some CPU time ourselves */

burn(atoi(argv[1]));
printf("Finished burn\n");

/* Optionally sleep for a while, so as to allow children that we
don't wait for to terminate */

sleep(atoi(argv[2]));
printf("Finished sleep\n");

/* Wait for children */

waited = 0;
for (cn = 3; cn < argc; cn++) {
if (childList[cn] > 0) {
printf("About to wait for child %d (%s), PID %ld\n",
cn, argv[cn], (long) childList[cn]);
if (waitpid(childList[cn], NULL, 0) == -1) errExit("waitpid");
waited++;
}
}

printf("waited for %d children\n", waited);
} /* doChild */

int
main(int argc, char *argv[])
{
struct sigaction sa;
pid_t childPid;

if (argc < 4 || strcmp(argv[1], "--help") == 0)
usageErr("%s <child-run-time> <child-sleep-time> "
"<nsecs>[w]...\n", argv[0]);

sa.sa_flags = SA_SIGINFO;
sa.sa_sigaction = handler;
sigemptyset(&sa.sa_mask);
if (sigaction(SIGCHLD, &sa, NULL) == -1) errExit("sigaction");

childPid = fork();

if (childPid == -1)
errExit("fork");

if (childPid == 0) {
doChild(argc, argv);
exit(EXIT_SUCCESS);
}

pause(); /* Wait for SIGCHLD */

exit(EXIT_SUCCESS);
} /* main */

2008-06-27 16:31:17

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] make siginfo_t si_utime + si_sstime report times in USER_HZ, not HZ

On 06/27, Michael Kerrisk wrote:
>
> On Thu, Jun 26, 2008 at 5:09 PM, Michael Kerrisk
> <[email protected]> wrote:
> > On Wed, Jun 25, 2008 at 5:38 PM, Oleg Nesterov <[email protected]> wrote:
> >> On 06/25, Michael Kerrisk wrote:
> >>>
> >>> --- /home/mtk/ARCHIVE/KERNEL/linux-2.6.26-rc7/kernel/signal.c.orig 2008-06-24
> >>> 16:20:39.000000000 +0200
> >>> +++ /home/mtk/ARCHIVE/KERNEL/linux-2.6.26-rc7/kernel/signal.c 2008-06-24
> >>> 16:22:17.000000000 +0200
> >>> @@ -1379,10 +1379,9 @@
> >>>
> >>> info.si_uid = tsk->uid;
> >>>
> >>> - /* FIXME: find out whether or not this is supposed to be c*time. */
> >>> - info.si_utime = cputime_to_jiffies(cputime_add(tsk->utime,
> >>> + info.si_utime = cputime_to_clock_t(cputime_add(tsk->utime,
> >>> tsk->signal->utime));
> >>> - info.si_stime = cputime_to_jiffies(cputime_add(tsk->stime,
> >>> + info.si_stime = cputime_to_clock_t(cputime_add(tsk->stime,
> >>> tsk->signal->stime));
> >>>
> >>> info.si_status = tsk->exit_code & 0x7f;
> >>> @@ -1450,9 +1449,8 @@
> >>>
> >>> info.si_uid = tsk->uid;
> >>>
> >>> - /* FIXME: find out whether or not this is supposed to be c*time. */
> >>> - info.si_utime = cputime_to_jiffies(tsk->utime);
> >>> - info.si_stime = cputime_to_jiffies(tsk->stime);
> >>> + info.si_utime = cputime_to_clock_t(tsk->utime);
> >>> + info.si_stime = cputime_to_clock_t(tsk->stime);
> >>>
> >>> info.si_code = why;
> >>> switch (why) {
> >>
> >> This looks like the obviously good fix to me.
> >
> > Tested now, and it does what I expect.
> >
> >> The patch also deletes the comment about signal_struct->cXtime,
> >> this also looks right: why should we use cutime/cstime ?
> >
> > Hmmm -- maybe I was wrong to delete that comment. I think the point
> > of the comment was: should the time returned vie these fields of the
> > signinfo structure also include the times for (grand)children of the
> > process that had terminated and been wait()ed for. My first take on
> > that was "no". But now I'm not 100% sure. A quick test on Solaris 8
> > suggests that these fields *do* include the times of waited for
> > children. (None of this is specified in POSIX.1, which doesn't
> > specify si_utime and si_stime.) I've not yet tested FreeBSD (not sure
> > if it supports these fields or not).
>
> So, FreeBSD doesn't implement these fields. A more extensive test
> (see below) on Solaris (8) shows that it does indeed include the times
> of waited-for children in the si_utime and si_stime fields.
>
> I'm not sure if we should emulate that behavior or not - it'd be easy
> enough to do so of course. Thoughts, anyone?

Well, I don't know. But since POSIX says nothing, perhaps there is
no reason to change the historical behaviour?

Anyway, I think you were right to kill this comment, it is very old
(from 2.4.26 at least), and confusing. It looks as if it suggests
to use signal->cXtime _instead_ of utime/stime.


Hmm. do_notify_parent_cldstop() only uses tsk->xtime even if we report
CLD_STOPPED, but tsk is the "random" thread which does finish_stop().
I don't know what is the supposed behaviour in that case... but unlikely
we should use signal->cXtime, another reason to kill the comment ;)

Oleg.

2008-06-27 20:51:47

by Roland McGrath

[permalink] [raw]
Subject: Re: [patch] make siginfo_t si_utime + si_sstime report times in USER_HZ, not HZ

Like I said, what makes most sense in the abstract is not what we're doing,
and what Solaris does may make more sense. But I don't think it's worth
changing now. Noone can be successfully relying on it, it's not standard, etc.


Thanks,
Roland

2008-06-27 20:52:52

by Roland McGrath

[permalink] [raw]
Subject: Re: [patch] make siginfo_t si_utime + si_sstime report times in USER_HZ, not HZ

That looks fine. The si_utime and si_stime fields have never been
standardized and I don't really know of anything that's ever actually used
them. I'm sure whatever it has meant for ages is fine for it to mean now,
and certainly today's clock_t is what matches what this code meant when it
was written. It seems unlikely that cutime rather than utime was ever
desired here. If anything, it should probably be the group-wide utime sum
rather than the thread's, but that is not so trivial to sample and there is
no reason to worry about changing it now.


Thanks,
Roland