2002-04-23 10:03:53

by Denis Vlasenko

[permalink] [raw]
Subject: user/system/nice accounting

I modified proc_misc.c (see patch) to close small race
and to have jiffies printed too as a debugging aid.
But I still see 'idle' going backwards.
This happens very rarely but:

user n syste idle jiffies
....
1 cpu 13560 0 26994 746329 786883
2 cpu 13560 0 26995 746330 786885
3 cpu 13560 0 26997 746329 786886 <<<
4 cpu 13561 0 26997 746329 786887
5 cpu 13561 0 26998 746330 786889
6 cpu 13561 0 26999 746330 786890
....

I am trying to explain what we see here:
3) Two ticks were accounted as system time,
but reported jiffy is only incremented once
(A race? 'Real' jiffy is 786887?)
4) One tick got accounted as user time,
jiffy is incremented once too.
Race theory falls apart here: we should see jiffies+=2:
4 cpu 13561 0 26997 746330 786888
unless we lose race here two times in a row: highly improbable!

BTW, IMHO I closed the race in /proc/stat generation,
and jiffy is ++'ed before system/user/nice stats in timer interrupt,
so it cannot race there too.

Do we have other places where we do system/user/nice accounting
except timer int?!

Kernel is 2.4.18 _UP_. That is, uniprocessor.

Can anybody explain this?
--
vda

# diff -u proc_misc.c.orig proc_misc.c
--- proc_misc.c.orig Wed Nov 21 03:29:09 2001
+++ proc_misc.c Tue Apr 23 09:23:01 2002
@@ -240,7 +240,7 @@
{
int i, len;
extern unsigned long total_forks;
- unsigned long jif = jiffies;
+ unsigned long jif;
unsigned int sum = 0, user = 0, nice = 0, system = 0;
int major, disk;

@@ -256,8 +256,10 @@
#endif
}

- len = sprintf(page, "cpu %u %u %u %lu\n", user, nice, system,
- jif * smp_num_cpus - (user + nice + system));
+ jif = jiffies;
+ len = sprintf(page, "cpu %u %u %u %lu %lu\n", user, nice, system,
+ jif * smp_num_cpus - (user + nice + system),
+ jif);
for (i = 0 ; i < smp_num_cpus; i++)
len += sprintf(page + len, "cpu%d %u %u %u %lu\n",
i,



test_badidle2
=============
#!/bin/sh
prev=0
while true; do cat /proc/stat; done | \
grep -F 'cpu ' | \
while read line; do
next=`echo "$line" | cut -d ' ' -f 6`
diff=$(($next-$prev))
if test $diff -lt 0; then
echo "$line <<<"
else
echo "$line"
fi
prev=$next
done


2002-04-24 14:07:56

by Randy.Dunlap

[permalink] [raw]
Subject: [PATCH] Re: user/system/nice accounting

Hi Denis-

Yes, kstat_read_proc() in linux/fs/proc/proc_misc.c is
racing with the timer tick interrupt handler,
especially given your test_badidle2 script..
(I modified it to print only "error" conditions.)

I don't know how important it is to fix this, but
adding a lock at the beginning of kstat_read_proc()
allows it to run overnight with a problem [for me,
on UP].

BTW, idle shouldn't be blamed for going backwards,
since it's just the leftover time after adding
user + nice + system. ;)

I don't recall what kernel version you are using, but
here's a patch to 2.4.18.


--- linux-2418/fs/proc/proc_misc.c.JIF Tue Nov 20 21:29:09 2001
+++ linux-2418/fs/proc/proc_misc.c Wed Apr 24 07:50:43 2002
@@ -235,14 +235,20 @@
};
#endif

+extern rwlock_t xtime_lock;
+
static int kstat_read_proc(char *page, char **start, off_t off,
int count, int *eof, void *data)
{
int i, len;
extern unsigned long total_forks;
- unsigned long jif = jiffies;
+ unsigned long jif;
unsigned int sum = 0, user = 0, nice = 0, system = 0;
int major, disk;
+ unsigned long flags;
+
+ read_lock_irqsave(&xtime_lock, flags);
+ jif = jiffies;

for (i = 0 ; i < smp_num_cpus; i++) {
int cpu = cpu_logical_map(i), j;
@@ -256,8 +262,9 @@
#endif
}

- len = sprintf(page, "cpu %u %u %u %lu\n", user, nice, system,
- jif * smp_num_cpus - (user + nice + system));
+ read_unlock_irqrestore(&xtime_lock, flags);
+ len = sprintf(page, "cpu %u %u %u %lu\n", user, nice, system,
+ jif * smp_num_cpus - (user + nice + system));
for (i = 0 ; i < smp_num_cpus; i++)
len += sprintf(page + len, "cpu%d %u %u %u %lu\n",
i,


HTH.
~Randy


On Tue, 23 Apr 2002, Denis Vlasenko wrote:

| I modified proc_misc.c (see patch) to close small race
| and to have jiffies printed too as a debugging aid.
| But I still see 'idle' going backwards.
| This happens very rarely but:
|
| user n syste idle jiffies
| ....
| 1 cpu 13560 0 26994 746329 786883
| 2 cpu 13560 0 26995 746330 786885
| 3 cpu 13560 0 26997 746329 786886 <<<
| 4 cpu 13561 0 26997 746329 786887
| 5 cpu 13561 0 26998 746330 786889
| 6 cpu 13561 0 26999 746330 786890
| ....
|
| I am trying to explain what we see here:
| 3) Two ticks were accounted as system time,
| but reported jiffy is only incremented once
| (A race? 'Real' jiffy is 786887?)
| 4) One tick got accounted as user time,
| jiffy is incremented once too.
| Race theory falls apart here: we should see jiffies+=2:
| 4 cpu 13561 0 26997 746330 786888
| unless we lose race here two times in a row: highly improbable!
|
| BTW, IMHO I closed the race in /proc/stat generation,
| and jiffy is ++'ed before system/user/nice stats in timer interrupt,
| so it cannot race there too.
|
| Do we have other places where we do system/user/nice accounting
| except timer int?!

No.

| Kernel is 2.4.18 _UP_. That is, uniprocessor.
|
| Can anybody explain this?
| --