Hi
I'm very sorry if this is already reported. In 2.6.1 (and earlier) ps does
not report the process start times correctly. (Either procps 2.0.18 or
3.1.15). The problem shows when uptime is large enough, now it is 14 days.
After bootup ps showed the right time.
For example, I started this bash process really at 21:24 (date showed 21:24
then):
kaukasoi 22108 0.0 0.2 4452 1532 pts/4 R 21:28 0:00 /bin/bash
Here is a line from w:
kaukasoi pts/4 - 9:24pm 0.00s 0.06s 0.00s w
BTW, is it normal that all the time stamps at the proc directory are not the same?
elektroni kaukasoi ~ > ls -l /proc/22108
total 0
-r-------- 1 kaukasoi fys 0 Jan 23 21:27 auxv
-r--r--r-- 1 kaukasoi fys 0 Jan 23 21:24 cmdline
lrwxrwxrwx 1 kaukasoi fys 0 Jan 23 21:27 cwd -> /home/kaukasoi
-r-------- 1 kaukasoi fys 0 Jan 23 21:24 environ
lrwxrwxrwx 1 kaukasoi fys 0 Jan 23 21:27 exe -> /bin/bash
dr-x------ 2 kaukasoi fys 0 Jan 23 21:24 fd
-r--r--r-- 1 kaukasoi fys 0 Jan 23 21:27 maps
-rw------- 1 kaukasoi fys 0 Jan 23 21:27 mem
-r--r--r-- 1 kaukasoi fys 0 Jan 23 21:27 mounts
lrwxrwxrwx 1 kaukasoi fys 0 Jan 23 21:27 root -> /
-r--r--r-- 1 kaukasoi fys 0 Jan 23 21:24 stat
-r--r--r-- 1 kaukasoi fys 0 Jan 23 21:27 statm
-r--r--r-- 1 kaukasoi fys 0 Jan 23 21:24 status
dr-xr-xr-x 3 kaukasoi fys 0 Jan 23 21:27 task
-r--r--r-- 1 kaukasoi fys 0 Jan 23 21:27 wchan
elektroni kaukasoi ~ >
On Fri, Jan 23, 2004 at 09:47:14PM +0200, I wrote:
> I'm very sorry if this is already reported. In 2.6.1 (and earlier) ps does
> not report the process start times correctly.
...
> For example, I started this bash process really at 21:24 (date showed 21:24
> then):
>
> kaukasoi 22108 0.0 0.2 4452 1532 pts/4 R 21:28 0:00 /bin/bash
OK, I would like to make my bug report more accurate: the problem seems to
be that the value of btime in /proc/stat is not correct. procps uses that to
calculate the start time of a process.
date +%s ; cat /proc/uptime ; grep btime /proc/stat
1075028140
1393207.63 1379643.46
btime 1073634689
Time now (1075028140) minus uptime (1393207.63) is 1073634932 (the correct
bootup time). btime differs from that by 243 seconds, so that's the error of
four minutes in ps output. Just to check, a line from syslog:
Jan 9 09:55:41 elektroni kernel: Linux version 2.6.1 (kaukasoi@elektroni) (gcc version 2.95.3 20010315 (release)) #1 Fri Jan 9 09:54:37 EET 2004
and from that
date +%s --date='Jan 9 09:55:41'
1073634941
ok, syslogd seems to have started 9 seconds after bootup.
This is athlon tb 1400 and I run ntpd.
On Sun, Jan 25, 2004 at 01:08:47PM +0200, I wrote:
> On Fri, Jan 23, 2004 at 09:47:14PM +0200, I wrote:
> > For example, I started this bash process really at 21:24 (date showed 21:24
> > then):
> >
> > kaukasoi 22108 0.0 0.2 4452 1532 pts/4 R 21:28 0:00 /bin/bash
>
> OK, I would like to make my bug report more accurate: the problem seems to
> be that the value of btime in /proc/stat is not correct.
btime in /proc/stat does not stay constant but decreases at a rate of 15
secs/day. (So I thought that that's why there is that four minute error in
ps output after uptime of a couple of weeks.) Maybe this has something to do
with the fact that the 'timer' line in /proc/interrupts does not seem to
increase at an exact rate of 1000 steps per second but about 1000.18 steps
per second, instead. (The relative error is the same: 0.18 divided by 1000
is equal to 15 seconds divided by 24 hours).
I made an experiment shown below. I know nothing about kernel programming,
so this is probably not correct, but at least btime seemed to stay constant.
(I don't believe this fixes procps, though. If HZ if off by 180 ppm then I
guess ps can't possibly get its calculations involving HZ right. But at
least the bootup time reported by procinfo stays constant.)
--- linux-2.6.1/fs/proc/proc_misc.c.orig 2004-01-09 08:59:09.000000000 +0200
+++ linux-2.6.1/fs/proc/proc_misc.c 2004-01-27 14:39:04.000000000 +0200
@@ -363,19 +363,13 @@
u64 jif;
unsigned int sum = 0, user = 0, nice = 0, system = 0, idle = 0, iowait = 0, irq = 0, softirq = 0;
struct timeval now;
- unsigned long seq;
-
- /* Atomically read jiffies and time of day */
- do {
- seq = read_seqbegin(&xtime_lock);
-
- jif = get_jiffies_64();
- do_gettimeofday(&now);
- } while (read_seqretry(&xtime_lock, seq));
+ struct timespec uptime;
+ do_gettimeofday(&now);
+ do_posix_clock_monotonic_gettime(&uptime);
/* calc # of seconds since boot time */
- jif -= INITIAL_JIFFIES;
- jif = ((u64)now.tv_sec * HZ) + (now.tv_usec/(1000000/HZ)) - jif;
+ jif = ((u64)now.tv_sec * HZ) + (now.tv_usec/(1000000/HZ)) \
+ - ((u64)uptime.tv_sec * HZ) - (uptime.tv_nsec/(NSEC_PER_SEC/HZ));
do_div(jif, HZ);
for (i = 0; i < NR_CPUS; i++) {
On Tue, Jan 27, 2004 at 05:52:54PM +0200, Petri Kaukasoina wrote:
> I made an experiment shown below. I know nothing about kernel programming,
> so this is probably not correct, but at least btime seemed to stay constant.
> (I don't believe this fixes procps, though. If HZ if off by 180 ppm then I
> guess ps can't possibly get its calculations involving HZ right. But at
> least the bootup time reported by procinfo stays constant.)
Yes, btime stays now constant. But to make ps work right, I replaced all uses of
variable Hertz in ps source (procps 2.0.18) with constant 100.0172.
After uptime 4 h 42 min, the error is already 3 seconds in the default ps,
and ./ps which is the "fixed" one, displays the right start time.
Tue Jan 27 19:17:57 EET 2004
kaukasoi 13388 0.0 0.1 2652 740 pts/5 R 19:17 0:00 ps uxw
kaukasoi 13392 0.0 0.1 2636 724 pts/5 R 19:17 0:00 ./ps uxw
Tue Jan 27 19:17:57 EET 2004
kaukasoi 13398 0.0 0.1 2652 740 pts/5 R 19:18 0:00 ps uxw
kaukasoi 13402 0.0 0.1 2636 724 pts/5 R 19:17 0:00 ./ps uxw
Tue Jan 27 19:17:57 EET 2004
...
Tue Jan 27 19:17:59 EET 2004
kaukasoi 13979 0.0 0.1 2652 740 pts/5 R 19:18 0:00 ps uxw
kaukasoi 13983 0.0 0.1 2636 724 pts/5 R 19:18 0:00 ./ps uxw
Tue Jan 27 19:18:00 EET 2004
So the problem is the inaccuracy of HZ.
On Tue, 2004-01-27 at 07:52, Petri Kaukasoina wrote:
> On Sun, Jan 25, 2004 at 01:08:47PM +0200, I wrote:
> > On Fri, Jan 23, 2004 at 09:47:14PM +0200, I wrote:
> > > For example, I started this bash process really at 21:24 (date showed 21:24
> > > then):
> > >
> > > kaukasoi 22108 0.0 0.2 4452 1532 pts/4 R 21:28 0:00 /bin/bash
> >
> > OK, I would like to make my bug report more accurate: the problem seems to
> > be that the value of btime in /proc/stat is not correct.
>
> btime in /proc/stat does not stay constant but decreases at a rate of 15
> secs/day. (So I thought that that's why there is that four minute error in
> ps output after uptime of a couple of weeks.) Maybe this has something to do
> with the fact that the 'timer' line in /proc/interrupts does not seem to
> increase at an exact rate of 1000 steps per second but about 1000.18 steps
> per second, instead. (The relative error is the same: 0.18 divided by 1000
> is equal to 15 seconds divided by 24 hours).
>
> I made an experiment shown below. I know nothing about kernel programming,
> so this is probably not correct, but at least btime seemed to stay constant.
> (I don't believe this fixes procps, though. If HZ if off by 180 ppm then I
> guess ps can't possibly get its calculations involving HZ right. But at
> least the bootup time reported by procinfo stays constant.)
Uh, what does your /etc/ntp/drift file show?
The basic equation is:
btime ~= gettimeodfay() - uptime
Thus if your time of day is adjusted by NTP, btime will change as well.
Uptime is calculated calculated by jiffies/HZ, and HZ is not NTP
adjusted, so if your system is running 180ppm too fast or slow, btime
would be expected to change.
I'm not yet sure how that is related to the ps start time being wrong.
thanks
-john
Thanks for answering,
On Wed, Jan 28, 2004 at 06:21:52PM -0800, john stultz wrote:
> On Tue, 2004-01-27 at 07:52, Petri Kaukasoina wrote:
> > I made an experiment shown below. I know nothing about kernel programming,
> > so this is probably not correct, but at least btime seemed to stay constant.
> > (I don't believe this fixes procps, though. If HZ if off by 180 ppm then I
> > guess ps can't possibly get its calculations involving HZ right. But at
> > least the bootup time reported by procinfo stays constant.)
>
>
> Uh, what does your /etc/ntp/drift file show?
ntp.drift is -22.251 on linux-2.6.1. But on linux-2.4.24 it stabilizes at
about -5.4.
> The basic equation is:
> btime ~= gettimeodfay() - uptime
>
> Thus if your time of day is adjusted by NTP, btime will change as well.
> Uptime is calculated calculated by jiffies/HZ, and HZ is not NTP
> adjusted, so if your system is running 180ppm too fast or slow, btime
> would be expected to change.
>
Yes, on linux-2.2.24 I can see that /proc/uptime is just the jiffies and
btime is current time - jiffies. But in linux-2.6.1 /proc/uptime is now
do_posix_clock_monotonic_gettime(), whatever that means, and /proc/uptime
gives a correct value. But btime is still gettimeofday-jiffies and it does
not stay constant. My patch changed btime to be
gettimeofday-do_posix_clock_monotonic_gettime() and after that it stays
constant.
In other words, on linux-2.2.24, if I print the current time with 'date +%s'
and subtract uptime (/proc/uptime) from that, I do get the same number that
is in the btime field in /proc/stat. But not on linux-2.6.1 without my
patch. (ntp running in both cases.)
> I'm not yet sure how that is related to the ps start time being wrong.
I guess it's not. The relative error was just the same in both. On
linux-2.2.24, ps start time is correct but on linux-2.6.1 it shows times in
future. How much in future, about minute per four days of uptime. But if I
multiply Hertz by e.g. 1.000172 in the ps source, then I get the right
results on linux-2.6.1.
I'll do an experiment and boot to linux-2.6.1 without ntpd next...
Thanks,
-Petri
On Thu, Jan 29, 2004 at 04:38:47PM +0200, Petri Kaukasoina wrote:
> I'll do an experiment and boot to linux-2.6.1 without ntpd next...
The problem does not seem to be connected to ntpd. btime goes backwards and
ps process start times forwards even without ntp.
I tried both linux-2.6.1 and linux-2.4.24 without ntpd. I read the values of
several variables, waited some time, read them again, and subtracted to
obtain the differences. Results below. The references to ntp time were taken
using 'ntpdate -q', so the clock was running freely and not controlled by
ntp.
Linux-2.6.1 without ntpd:
passed time according to 'date': 1988.941515 s
timer interrupts: 1989244
interrupts per second: 1000.152
error compared to 1000 interrupts per second: +152 ppm
(ps times go towards future about 2.1 s in 13479 s of uptime: about 160 ppm)
(Bootup time 'btime' similarly goes backwards about 2 s in the same time)
clock too fast compared to ntp time by: +0.045162 s
so 'date' is too fast compared to UT by: +22.7 ppm
Linux-2.4.24 without ntpd:
passed time according to 'date': 2711.294952 s
timer interrupts: 271129
interrupts per second: 100.000
error compared to 100 interrupts per second: 0 ppm
(ps times stay correct)
(btime stays constant)
clock too fast compared to ntp time by: +0.019216 s
so 'date' is too fast compared to UT by: +7.1 ppm
I would like to interpret these numbers so that in linux-2.6.1 the constant
HZ is 152 ppm too large and when procps doesn't know it, it calculates
process start times wrong in ps output. The timeofday somehow is not
affected because time goes only 15.6 ppm faster compared to linux-2.4.24.
Thanks,
-Petri
On Thu, 2004-01-29 at 06:38, Petri Kaukasoina wrote:
> Thanks for answering,
>
> On Wed, Jan 28, 2004 at 06:21:52PM -0800, john stultz wrote:
> > On Tue, 2004-01-27 at 07:52, Petri Kaukasoina wrote:
> > > I made an experiment shown below. I know nothing about kernel programming,
> > > so this is probably not correct, but at least btime seemed to stay constant.
> > > (I don't believe this fixes procps, though. If HZ if off by 180 ppm then I
> > > guess ps can't possibly get its calculations involving HZ right. But at
> > > least the bootup time reported by procinfo stays constant.)
> >
> >
> > Uh, what does your /etc/ntp/drift file show?
>
> ntp.drift is -22.251 on linux-2.6.1. But on linux-2.4.24 it stabilizes at
> about -5.4.
Hmm.
> > The basic equation is:
> > btime ~= gettimeodfay() - uptime
> >
> > Thus if your time of day is adjusted by NTP, btime will change as well.
> > Uptime is calculated calculated by jiffies/HZ, and HZ is not NTP
> > adjusted, so if your system is running 180ppm too fast or slow, btime
> > would be expected to change.
> >
>
> Yes, on linux-2.2.24 I can see that /proc/uptime is just the jiffies and
> btime is current time - jiffies. But in linux-2.6.1 /proc/uptime is now
> do_posix_clock_monotonic_gettime(), whatever that means, and /proc/uptime
> gives a correct value. But btime is still gettimeofday-jiffies and it does
> not stay constant. My patch changed btime to be
> gettimeofday-do_posix_clock_monotonic_gettime() and after that it stays
> constant.
Does George Anzinger's patch work as well?
> > I'm not yet sure how that is related to the ps start time being wrong.
>
> I guess it's not. The relative error was just the same in both. On
> linux-2.2.24, ps start time is correct but on linux-2.6.1 it shows times in
> future. How much in future, about minute per four days of uptime. But if I
> multiply Hertz by e.g. 1.000172 in the ps source, then I get the right
> results on linux-2.6.1.
>
> I'll do an experiment and boot to linux-2.6.1 without ntpd next...
I'd be interested in that.
thanks
-john
On Thu, Jan 29, 2004 at 11:48:49AM -0800, john stultz wrote:
> On Thu, 2004-01-29 at 06:38, Petri Kaukasoina wrote:
> > Yes, on linux-2.2.24 I can see that /proc/uptime is just the jiffies and
> > btime is current time - jiffies. But in linux-2.6.1 /proc/uptime is now
> > do_posix_clock_monotonic_gettime(), whatever that means, and /proc/uptime
> > gives a correct value. But btime is still gettimeofday-jiffies and it does
> > not stay constant. My patch changed btime to be
> > gettimeofday-do_posix_clock_monotonic_gettime() and after that it stays
> > constant.
>
> Does George Anzinger's patch work as well?
I must have missed that... Any references?
-Petri
Petri Kaukasoina <[email protected]> wrote:
>
> On Thu, Jan 29, 2004 at 11:48:49AM -0800, john stultz wrote:
> > On Thu, 2004-01-29 at 06:38, Petri Kaukasoina wrote:
> > > Yes, on linux-2.2.24 I can see that /proc/uptime is just the jiffies and
> > > btime is current time - jiffies. But in linux-2.6.1 /proc/uptime is now
> > > do_posix_clock_monotonic_gettime(), whatever that means, and /proc/uptime
> > > gives a correct value. But btime is still gettimeofday-jiffies and it does
> > > not stay constant. My patch changed btime to be
> > > gettimeofday-do_posix_clock_monotonic_gettime() and after that it stays
> > > constant.
> >
> > Does George Anzinger's patch work as well?
>
> I must have missed that... Any references?
>
This one.
diff -puN fs/proc/proc_misc.c~proc-stat-btime-fix-2 fs/proc/proc_misc.c
--- 25/fs/proc/proc_misc.c~proc-stat-btime-fix-2 Tue Jan 27 17:46:57 2004
+++ 25-akpm/fs/proc/proc_misc.c Tue Jan 27 17:46:57 2004
@@ -360,23 +360,12 @@ int show_stat(struct seq_file *p, void *
{
int i;
extern unsigned long total_forks;
- u64 jif;
+ unsigned long jif;
unsigned int sum = 0, user = 0, nice = 0, system = 0, idle = 0, iowait = 0, irq = 0, softirq = 0;
- struct timeval now;
- unsigned long seq;
- /* Atomically read jiffies and time of day */
- do {
- seq = read_seqbegin(&xtime_lock);
-
- jif = get_jiffies_64();
- do_gettimeofday(&now);
- } while (read_seqretry(&xtime_lock, seq));
-
- /* calc # of seconds since boot time */
- jif -= INITIAL_JIFFIES;
- jif = ((u64)now.tv_sec * HZ) + (now.tv_usec/(1000000/HZ)) - jif;
- do_div(jif, HZ);
+ jif = - wall_to_monotonic.tv_sec;
+ if (wall_to_monotonic.tv_nsec)
+ --jif;
for_each_cpu(i) {
int j;
_
On Thu, Jan 29, 2004 at 02:51:48PM -0800, Andrew Morton wrote:
> Petri Kaukasoina <[email protected]> wrote:
> >
> > On Thu, Jan 29, 2004 at 11:48:49AM -0800, john stultz wrote:
> > >
> > > Does George Anzinger's patch work as well?
> >
> > I must have missed that... Any references?
>
> This one.
>
> diff -puN fs/proc/proc_misc.c~proc-stat-btime-fix-2 fs/proc/proc_misc.c
Yes, it works fine: btime stays at the correct value. Just the
ps-output-in-future-problem left.
Linux-2.6.1+patch without ntpd:
passed time according to 'date': 11430.68524 s
timer interrupts: 11432383
interrupts per second: 1000.149
error compared to 1000 interrupts per second: +149 ppm
ps times go towards future about 2.0 s: about 170 ppm
btime stays constant!
clock too fast compared to ntp time by: +0.260461 s
so 'date' is too fast compared to UT by: +22.8 ppm
Linux-2.6.1+patch with ntpd:
passed time according to 'date': 14792.60765 s
timer interrupts: 14795154
interrupts per second: 1000.172
error compared to 1000 interrupts per second: +172 ppm
ps times go towards future about 2.7 s: about 180 ppm
btime stays constant!
(ntp.drift: -21.943)
-Petri