2004-04-13 22:38:44

by john stultz

Subject: Re: /proc or ps tools bug? 2.6.3, time is off

On Thu, 2004-02-26 at 16:20, George Anzinger wrote:
> john stultz wrote:
> > On Thu, 2004-02-26 at 15:06, George Anzinger wrote:
> >>john stultz wrote:
> >>>On Wed, 2004-02-25 at 13:10, George Anzinger wrote:
> >>>>Albert Cahalan wrote:
> >>>>
> >>>>>This is NOT sane. Remember that procps doesn't get to see HZ.
> >>>>>Only USER_HZ is available, as the AT_CLKTCK ELF note.
> >>>>>
> >>>>>I think the way to fix this is to skip or add a tick
> >>>>>every now and then, so that the long-term HZ is exact.
> >>>>>
> >>>>>Another way is to simply choose between pure old-style
> >>>>>tick-based timekeeping and pure new-style cycle-based
> >>>>>(TSC or ACPI) timekeeping. Systems with uncooperative
> >>>>>hardware have to use the old-style time keeping. This
> >>>>>should simplify the code greatly.
> >>>>
> >>>>On checking the code and thinking about this, I would suggest that we change
> >>>>start_time in the task struct to be the wall time (or monotonic time if that
> >>>>seems better). I only find two places this is used, in proc and in the
> >>>>accounting code. Both of these could easily be changed. Of course, even
> >>>>leaving it as it is, they could be changed to report more correct values by
> >>>>using the correct conversions to translate the system HZ to USER_HZ.
> >>>
> >>>
> >>>Is this close to what you're thinking of?
> >>>I can't reproduce the issue on my systems, so I'll need someone else to
> >>>test this.
> >>
> >>More or less. I wonder if:
> >
> >>static inline long jiffies_to_clock_t(long x)
> >>{
> >> u64 tmp = (u64)x * TICK_NSEC;
> >> div64(tmp, (NSEC_PER_SEC / USER_HZ));
> >> return (long)x;
> >>}
> >>might be better as it addresses the overflow issue. Should be able to toss the
> >>#if (HZ % USER_HZ)==0 test too. We could get carried away and do scaled math to
> >>eliminate the div64 but I don't think this path is used enough to justify the
> >>clarity ;) that would make.
> >
> > Sounds good to me. Would you mind sending the diff so Petri and David
> > could test it?
>
> Oops, I have been caught :) The above was composed in the email window. I
> don't have a 2.6.x kernel up at the moment and I don't have any free cycles...
> Late next week??

Finally got a chance to go through my work queue and yikes! This is
seriously stale! As neither George nor I have come to bat with a patch,
I'll attempt a swing.

Albert/David: Would you mind testing the following to see if it resolves
the issue for you?

George: Mind skimming this to make sure it's close enough to what you
intended?

thanks
-john


diff -Nru a/include/linux/times.h b/include/linux/times.h
--- a/include/linux/times.h Tue Apr 13 15:00:25 2004
+++ b/include/linux/times.h Tue Apr 13 15:00:25 2004
@@ -7,7 +7,12 @@
 #include <asm/param.h>
 
 #if (HZ % USER_HZ)==0
-# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ))
+static inline long jiffies_to_clock_t(long x)
+{
+	u64 tmp = (u64)x * TICK_NSEC;
+	x = do_div(tmp, (NSEC_PER_SEC / USER_HZ));
+	return (long)tmp;
+}
 #else
 # define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x))
 #endif
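
The u64 intermediate in the function above is presumably what George meant by
"addresses the overflow issue". The following is a minimal userspace sketch of
that point, not kernel code. It assumes USER_HZ = 100 and TICK_NSEC = 999848
(a value commonly quoted for i386 with HZ=1000; it is an assumption here and
does not appear in this thread). Both function names are made up for
illustration, and plain division stands in for do_div().

```c
#include <stdint.h>

/* Assumed values for illustration only; not taken from the thread. */
#define USER_HZ      100u
#define NSEC_PER_SEC 1000000000u
#define TICK_NSEC    999848u

/* Hypothetical: 32-bit intermediate. The product wraps modulo 2^32
 * once x * TICK_NSEC no longer fits in 32 bits. */
static uint32_t to_clock_t_narrow(uint32_t x)
{
	uint32_t tmp = x * TICK_NSEC;
	return tmp / (NSEC_PER_SEC / USER_HZ);
}

/* Hypothetical: 64-bit intermediate, mirroring the patch above. */
static uint32_t to_clock_t_wide(uint32_t x)
{
	uint64_t tmp = (uint64_t)x * TICK_NSEC;
	tmp /= (NSEC_PER_SEC / USER_HZ);
	return (uint32_t)tmp;
}
```

Under these assumptions the narrow version starts returning wrong values after
roughly 4300 jiffies (a little over four seconds at HZ=1000), while the wide
version keeps the full product before dividing.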

2004-04-13 22:59:25

by George Anzinger

Subject: Re: /proc or ps tools bug? 2.6.3, time is off

john stultz wrote:
> On Thu, 2004-02-26 at 16:20, George Anzinger wrote:
>
>>john stultz wrote:
>>
>>>On Thu, 2004-02-26 at 15:06, George Anzinger wrote:
>>>
>>>>john stultz wrote:
>>>>
>>>>>On Wed, 2004-02-25 at 13:10, George Anzinger wrote:
>>>>>
>>>>>>Albert Cahalan wrote:
>>>>>>
>>>>>>
>>>>>>>This is NOT sane. Remember that procps doesn't get to see HZ.
>>>>>>>Only USER_HZ is available, as the AT_CLKTCK ELF note.
>>>>>>>
>>>>>>>I think the way to fix this is to skip or add a tick
>>>>>>>every now and then, so that the long-term HZ is exact.
>>>>>>>
>>>>>>>Another way is to simply choose between pure old-style
>>>>>>>tick-based timekeeping and pure new-style cycle-based
>>>>>>>(TSC or ACPI) timekeeping. Systems with uncooperative
>>>>>>>hardware have to use the old-style time keeping. This
>>>>>>>should simplify the code greatly.
>>>>>>
>>>>>>On checking the code and thinking about this, I would suggest that we change
>>>>>>start_time in the task struct to be the wall time (or monotonic time if that
>>>>>>seems better). I only find two places this is used, in proc and in the
>>>>>>accounting code. Both of these could easily be changed. Of course, even
>>>>>>leaving it as it is, they could be changed to report more correct values by
>>>>>>using the correct conversions to translate the system HZ to USER_HZ.
>>>>>
>>>>>
>>>>>Is this close to what you're thinking of?
>>>>>I can't reproduce the issue on my systems, so I'll need someone else to
>>>>>test this.
>>>>
>>>>More or less. I wonder if:
>>>
>>>>static inline long jiffies_to_clock_t(long x)
>>>>{
>>>> u64 tmp = (u64)x * TICK_NSEC;
>>>> div64(tmp, (NSEC_PER_SEC / USER_HZ));
>>>> return (long)x;
>>>>}
>>>>might be better as it addresses the overflow issue. Should be able to toss the
>>>>#if (HZ % USER_HZ)==0 test too. We could get carried away and do scaled math to
>>>>eliminate the div64 but I don't think this path is used enough to justify the
>>>>clarity ;) that would make.
>>>
>>>Sounds good to me. Would you mind sending the diff so Petri and David
>>>could test it?
>>
>>Oops, I have been caught :) The above was composed in the email window. I
>>don't have a 2.6.x kernel up at the moment and I don't have any free cycles...
>>Late next week??
>
>
> Finally got a chance to go through my work queue and yikes! This is
> seriously stale! As neither George nor I have come to bat with a patch,
> I'll attempt a swing.
>
> Albert/David: Would you mind testing the following to see if it resolves
> the issue for you?
>
> George: Mind skimming this to make sure it's close enough to what you
> intended?

Looks rather like exactly what I intended.

-g
>
> thanks
> -john
>
>
> diff -Nru a/include/linux/times.h b/include/linux/times.h
> --- a/include/linux/times.h Tue Apr 13 15:00:25 2004
> +++ b/include/linux/times.h Tue Apr 13 15:00:25 2004
> @@ -7,7 +7,12 @@
> #include <asm/param.h>
>
> #if (HZ % USER_HZ)==0
> -# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ))
> +static inline long jiffies_to_clock_t(long x)
> +{
> + u64 tmp = (u64)x * TICK_NSEC;
> + x = do_div(tmp, (NSEC_PER_SEC / USER_HZ));
> + return (long)tmp;
> +}
> #else
> # define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x))
> #endif
>
>
>
>
>

--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml

2004-04-14 12:12:26

by Tim Schmielau

Subject: Re: /proc or ps tools bug? 2.6.3, time is off

> diff -Nru a/include/linux/times.h b/include/linux/times.h
> --- a/include/linux/times.h Tue Apr 13 15:00:25 2004
> +++ b/include/linux/times.h Tue Apr 13 15:00:25 2004
> @@ -7,7 +7,12 @@
> #include <asm/param.h>
>
> #if (HZ % USER_HZ)==0
> -# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ))
> +static inline long jiffies_to_clock_t(long x)
> +{
> + u64 tmp = (u64)x * TICK_NSEC;
> + x = do_div(tmp, (NSEC_PER_SEC / USER_HZ));
> + return (long)tmp;
> +}
> #else
> # define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x))
> #endif

Excuse me for barging in late and innocently, but I find this patch
hard to comprehend:
- shouldn't a foo_to_clock_t() function return a clock?
- the x = seems superfluous
- the #if is not a shortcut anymore, so why keep it?
Shouldn't this patch be more like the following
(completely untested)?

Tim


diff -urp --exclude-from dontdiff linux-2.6.5/include/linux/times.h linux-2.6.5-jfix1/include/linux/times.h
--- linux-2.6.5/include/linux/times.h 2004-02-04 04:43:09.000000000 +0100
+++ linux-2.6.5-jfix1/include/linux/times.h 2004-04-14 13:48:57.000000000 +0200
@@ -6,11 +6,16 @@
 #include <asm/types.h>
 #include <asm/param.h>
 
-#if (HZ % USER_HZ)==0
-# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ))
-#else
-# define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x))
-#endif
+static inline clock_t jiffies_to_clock_t(long x)
+{
+#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0
+	return x / (HZ / USER_HZ);
+#else
+	u64 tmp = (u64)x * TICK_NSEC;
+	do_div(tmp, (NSEC_PER_SEC / USER_HZ));
+	return (long)tmp;
+#endif
+}
 
 static inline unsigned long clock_t_to_jiffies(unsigned long x)
 {
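
On the "x = seems superfluous" point: the kernel's do_div() divides its first
argument in place and evaluates to the remainder, so the quotient ends up in
tmp and the assigned remainder is never used. Below is a small userspace
sketch of that behaviour. do_div_sketch() is a hypothetical stand-in written
as a function taking a pointer (the real do_div() is a macro that takes an
lvalue), and the TICK_NSEC and USER_HZ values are assumptions for
illustration.

```c
#include <stdint.h>

/* Assumed values for illustration only; not taken from the thread. */
#define USER_HZ      100u
#define NSEC_PER_SEC 1000000000u
#define TICK_NSEC    999848u

/* Hypothetical stand-in for do_div(): leaves the quotient in *n
 * and returns the remainder. */
static uint32_t do_div_sketch(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/* Conversion in the shape of the patch; the remainder is simply dropped. */
static long jiffies_to_clock_t_sketch(long x)
{
	uint64_t tmp = (uint64_t)x * TICK_NSEC;
	(void)do_div_sketch(&tmp, NSEC_PER_SEC / USER_HZ);
	return (long)tmp;
}
```

With these assumed constants, 1000 jiffies convert to 99 clock ticks rather
than 100, because each jiffy is slightly shorter than a nominal millisecond.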

2004-04-14 17:03:53

by George Anzinger

Subject: Re: /proc or ps tools bug? 2.6.3, time is off

Tim Schmielau wrote:
>>diff -Nru a/include/linux/times.h b/include/linux/times.h
>>--- a/include/linux/times.h Tue Apr 13 15:00:25 2004
>>+++ b/include/linux/times.h Tue Apr 13 15:00:25 2004
>>@@ -7,7 +7,12 @@
>> #include <asm/param.h>
>>
>> #if (HZ % USER_HZ)==0
>>-# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ))
>>+static inline long jiffies_to_clock_t(long x)
>>+{
>>+ u64 tmp = (u64)x * TICK_NSEC;
>>+ x = do_div(tmp, (NSEC_PER_SEC / USER_HZ));
>>+ return (long)tmp;
>>+}
>> #else
>> # define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x))
>> #endif
>
>
> Excuse me for barging in late and innocently, but I find this patch
> hard to comprehend:
> - shouldn't a foo_to_clock_t() function return a clock?
> - the x = seems superfluous
> - the #if is not a shortcut anymore, so why keep it?
> Shouldn't this patch be more like the following
> (completely untested)?
>
> Tim
>
>
> diff -urp --exclude-from dontdiff linux-2.6.5/include/linux/times.h linux-2.6.5-jfix1/include/linux/times.h
> --- linux-2.6.5/include/linux/times.h 2004-02-04 04:43:09.000000000 +0100
> +++ linux-2.6.5-jfix1/include/linux/times.h 2004-04-14 13:48:57.000000000 +0200
> @@ -6,11 +6,16 @@
> #include <asm/types.h>
> #include <asm/param.h>
>
> -#if (HZ % USER_HZ)==0
> -# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ))
> -#else
> -# define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x))
> -#endif
> +static inline clock_t jiffies_to_clock_t(long x)
> +{
> +#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0
> + return x / (HZ / USER_HZ);
> +#else
> + u64 tmp = (u64)x * TICK_NSEC;
> + do_div(tmp, (NSEC_PER_SEC / USER_HZ));
> + return (long)tmp;
> +#endif
> +}
>
> static inline unsigned long clock_t_to_jiffies(unsigned long x)
> {
>
It does look a bit better. Takes into account the issue of TICK_NSEC being what
it is.

--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml

2004-04-14 18:28:46

by john stultz

Subject: Re: /proc or ps tools bug? 2.6.3, time is off

On Wed, 2004-04-14 at 05:10, Tim Schmielau wrote:
> Excuse me for barging in late and innocently, but I find this patch
> hard to comprehend:
> - shouldn't a foo_to_clock_t() function return a clock?
> - the x = seems superfluous
> - the #if is not a shortcut anymore, so why keep it?
> Shouldn't this patch be more like the following
> (completely untested)?

Yes, your cleanups look much better! Although we have yet to
hear if it resolves the problem.

thanks
-john

2004-04-15 10:37:14

by Petri Kaukasoina

Subject: Re: /proc or ps tools bug? 2.6.3, time is off

On Wed, Apr 14, 2004 at 11:28:15AM -0700, john stultz wrote:
> On Wed, 2004-04-14 at 05:10, Tim Schmielau wrote:
> > Excuse me for barging in late and innocently, but I find this patch
> > hard to comprehend:
> > - shouldn't a foo_to_clock_t() function return a clock?
> > - the x = seems superfluous
> > - the #if is not a shortcut anymore, so why keep it?
> > Shouldn't this patch be more like the following
> > (completely untested)?
>
> Yes, your cleanups look much better! Although we have yet to
> hear if it resolves the problem.

Hi,

If we are still talking about the problem with ps showing process start
times in the future, I'm sorry to say that neither of the patches helped. The
error still grows here at a rate of 15 seconds per 24 hours, as before.

-Petri
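
For a rough sense of scale, the drift expected from treating every jiffy as
exactly 1/HZ seconds can be estimated from the real tick length. This is a
back-of-the-envelope sketch only. It assumes HZ=1000 and TICK_NSEC = 999848
(an assumed i386 value that is not stated in this thread), and drift_msec()
is a hypothetical helper name.

```c
#include <stdint.h>

/* Assumed values for illustration only; not taken from the thread. */
#define NOMINAL_TICK_NSEC 1000000ULL    /* 1/HZ with HZ=1000 */
#define TICK_NSEC         999848ULL     /* assumed real tick length */
#define NSEC_PER_SEC      1000000000ULL
#define NSEC_PER_MSEC     1000000ULL

/* Hypothetical helper: milliseconds by which a conversion based on the
 * nominal HZ runs ahead of real time after real_seconds of uptime. */
static uint64_t drift_msec(uint64_t real_seconds)
{
	/* Jiffies that actually elapse in the given real time. */
	uint64_t jiffies = real_seconds * NSEC_PER_SEC / TICK_NSEC;
	/* Each jiffy is over-counted by the nominal/real difference. */
	uint64_t excess_ns = jiffies * (NOMINAL_TICK_NSEC - TICK_NSEC);
	return excess_ns / NSEC_PER_MSEC;
}
```

Under these assumptions drift_msec(86400) comes to a little over 13 seconds
per day, which is in the same range as the 15 seconds reported above. The
exact figure depends on the real tick length of the machine in question.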

2004-04-15 11:05:52

by Tim Schmielau

Subject: Re: /proc or ps tools bug? 2.6.3, time is off

On Thu, 15 Apr 2004, Petri Kaukasoina wrote:

> If we are still talking about the problem with ps showing process start
> times in the future, I'm sorry to say that neither of the patches helped. The
> error still grows here at a rate of 15 seconds per 24 hours, as before.

Oops...
sure, it cannot. Maybe this one is better...


--- linux-2.6.5/include/linux/times.h 2004-02-04 04:43:09.000000000 +0100
+++ linux-2.6.5-jfix1/include/linux/times.h 2004-04-15 12:59:05.000000000 +0200
@@ -6,11 +6,16 @@
 #include <asm/types.h>
 #include <asm/param.h>
 
-#if (HZ % USER_HZ)==0
-# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ))
-#else
-# define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x))
-#endif
+static inline clock_t jiffies_to_clock_t(long x)
+{
+#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0
+	return x / (HZ / USER_HZ);
+#else
+	u64 tmp = (u64)x * TICK_NSEC;
+	do_div(tmp, (NSEC_PER_SEC / USER_HZ));
+	return (long)tmp;
+#endif
+}
 
 static inline unsigned long clock_t_to_jiffies(unsigned long x)
 {
@@ -34,7 +39,7 @@ static inline unsigned long clock_t_to_j
 
 static inline u64 jiffies_64_to_clock_t(u64 x)
 {
-#if (HZ % USER_HZ)==0
+#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0
 	do_div(x, HZ / USER_HZ);
 #else
 	/*
@@ -42,8 +47,8 @@ static inline u64 jiffies_64_to_clock_t(
 	 * but even this doesn't overflow in hundreds of years
 	 * in 64 bits, so..
 	 */
-	x *= USER_HZ;
-	do_div(x, HZ);
+	x *= TICK_NSEC;
+	do_div(x, (NSEC_PER_SEC / USER_HZ));
 #endif
 	return x;
 }
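
The change of the preprocessor test in jiffies_64_to_clock_t() matters because
HZ can divide evenly by USER_HZ while the real tick length does not. The
sketch below is a userspace illustration under assumed values (HZ=1000,
USER_HZ=100, TICK_NSEC=999848; none of these numbers are stated in the
thread). OLD_SHORTCUT, NEW_SHORTCUT and jiffies_64_to_clock_t_sketch() are
hypothetical names, and plain division stands in for do_div().

```c
#include <stdint.h>

/* Assumed values for illustration only; not taken from the thread. */
#define HZ           1000
#define USER_HZ      100
#define NSEC_PER_SEC 1000000000
#define TICK_NSEC    999848

/* The two compile-time tests from the patch, given names for comparison. */
#define OLD_SHORTCUT ((HZ % USER_HZ) == 0)
#define NEW_SHORTCUT ((TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0)

/* Hypothetical userspace version of the patched function. */
static uint64_t jiffies_64_to_clock_t_sketch(uint64_t x)
{
#if NEW_SHORTCUT
	return x / (HZ / USER_HZ);           /* ticks are exact multiples */
#else
	x *= TICK_NSEC;                      /* jiffies -> nanoseconds */
	return x / (NSEC_PER_SEC / USER_HZ); /* nanoseconds -> clock ticks */
#endif
}
```

With these values the old test is true, so the old code took the x / 10
shortcut and ignored the real tick length. The new test is false, so the
nanosecond-based path is compiled in instead.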

2004-04-15 16:14:41

by Petri Kaukasoina

Subject: Re: /proc or ps tools bug? 2.6.3, time is off

On Thu, Apr 15, 2004 at 01:05:17PM +0200, Tim Schmielau wrote:
> On Thu, 15 Apr 2004, Petri Kaukasoina wrote:
>
> > If we are still talking about the problem with ps showing process start
> > times in the future, I'm sorry to say that neither of the patches helped. The
> > error still grows here at a rate of 15 seconds per 24 hours, as before.
>
> Oops...
> sure, it cannot. Maybe this one is better...
>
>
> --- linux-2.6.5/include/linux/times.h 2004-02-04 04:43:09.000000000 +0100
> +++ linux-2.6.5-jfix1/include/linux/times.h 2004-04-15 12:59:05.000000000 +0200

Yes, it seems to have fixed it. There is a small error: ps shows a start
time of a new minute about four seconds too early, but the error stays
constant and does not change as a function of uptime any longer. (Actually
it still does but only at the same rate as ntpd corrects time.)

-Petri