2008-08-06 20:36:20

by Rafael Almeida

Subject: Weird behaviour on /proc/stat

I ran the following script on an Intel Core 2 Quad (Linux 2.6.21.5):

for (( x = 0; x < 1800; x++ )); do
    # line 1 is the aggregate "cpu" line, lines 2-5 are cpu0..cpu3
    head -n5 /proc/stat |
    awk '{ print $2+$3+$4+$5+$6+$7+$8+$9 }' |
    awk 'BEGIN { x=0 } { if (NR == 1) y=$0; else x=x+$1; } END { print y, x }' |
    awk '{ print $0, $1-$2 }' >> values
    sleep 1
done
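The same measurement can be sketched as a single awk pass over /proc/stat. This is an illustration, not the script that was actually run: stat_diff is a hypothetical name, fields 2-9 are assumed to be user, nice, system, idle, iowait, irq, softirq and steal, and unlike head -n5 it sums every cpuN line rather than exactly four.

```shell
# Hypothetical helper: prints "<cpu line total> <sum of cpuN totals> <difference>"
# for a /proc/stat-style file (defaults to /proc/stat).
stat_diff() {
    awk '
        # Aggregate line: add up fields 2-9
        $1 == "cpu"        { for (i = 2; i <= 9; i++) total += $i }
        # Per-CPU lines (cpu0, cpu1, ...): add up fields 2-9
        $1 ~ /^cpu[0-9]+$/ { for (i = 2; i <= 9; i++) percpu += $i }
        END { print total, percpu, total - percpu }
    ' "${1:-/proc/stat}"
}
```

The loop then becomes `for (( x = 0; x < 1800; x++ )); do stat_diff >> values; sleep 1; done`, and the difference is still the third field.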

I expected the third field of the values file (the cpu line total minus
the sum of the cpu0-cpu3 totals) to always be 0. It never was: the
difference was always greater than 0. So I went to the kernel code. The
utilization is summed up here:

http://lxr.linux.no/linux+v2.6.21.5/fs/proc/proc_misc.c#L463

Judging from that code, if anything the sum of the cpuX lines should be
greater than the cpu line. After all, the cpuX lines are generated later,
so if the utilization counters are updated while the output is being
generated, the cpuX lines should see the larger values.

I also noticed that the loop at
http://lxr.linux.no/linux+v2.6.21.5/fs/proc/proc_misc.c#L463
uses for_each_possible_cpu, while the one at
http://lxr.linux.no/linux+v2.6.21.5/fs/proc/proc_misc.c#L487
uses for_each_online_cpu. All the cores on the system are online, so
where could the extra utilization added to the first line come from?
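One way to check whether the possible and online CPU sets actually differ on a machine is to compare /sys/devices/system/cpu/possible with /sys/devices/system/cpu/online. This is a sketch under assumptions: those sysfs files may not exist on a kernel as old as 2.6.21, and count_cpulist is a hypothetical helper, not a kernel or coreutils tool.

```shell
# Hypothetical helper: count the CPUs in a sysfs cpulist string
# such as "0-3" or "0,2-5".
count_cpulist() {
    local n=0 part lo hi
    local IFS=,
    # Unquoted $1 is split on commas because IFS is ","
    for part in $1; do
        if [[ $part == *-* ]]; then
            lo=${part%-*}
            hi=${part#*-}
            n=$(( n + hi - lo + 1 ))
        else
            n=$(( n + 1 ))
        fi
    done
    echo "$n"
}

# Print both masks, skipping any file this kernel does not provide.
for mask in possible online; do
    f=/sys/devices/system/cpu/$mask
    if [ -r "$f" ]; then
        echo "$mask: $(cat "$f") ($(count_cpulist "$(cat "$f")") CPUs)"
    fi
done
```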

I wish I had a 4-core machine on which I could test changes to that
code, so I could investigate a little further. But the only machine
whose kernel I can change is my home computer, which has only one
core :(.


2008-08-06 21:52:19

by Sven Wegener

Subject: Re: Weird behaviour on /proc/stat

On Wed, 6 Aug 2008, Rafael C. de Almeida wrote:

> My expectation was that the values file would have only 0s on the second
> field. It didn't happen. Actually, it was always a value greater than 0.

It's expected behaviour, but it is indeed misleading. Here's why it
happens: in the kernel we account time in units of CONFIG_HZ (which I
suspect is 1000 in your case), but we export values to userspace in
units of USER_HZ (100, for historical reasons). So we're effectively
dividing the values by 10, and that division leaves a remainder in most
cases, which is dropped. As you can see in the code, for the summary line
we first add up all the in-kernel values and only then convert the total
to userspace units (cputime64_to_clock_t). For the per-cpu lines, each
value is converted on its own and its remainder is dropped; in the
summary, those same remainders are added up before the division. This
leads to a couple of additional jiffies being accounted in the summary.
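To make the rounding concrete, here is a small bash sketch. It assumes HZ=1000 and USER_HZ=100 and models the conversion as a truncating integer division by HZ / USER_HZ; the function names and the counter values below are hypothetical, chosen only for illustration.

```shell
# Assumed values: kernel ticks at HZ, userspace sees USER_HZ.
HZ=1000
USER_HZ=100

# Convert one raw counter to USER_HZ units; integer division drops the remainder.
to_user_hz() { echo $(( $1 / (HZ / USER_HZ) )); }

# Per-CPU lines: convert each CPU's counter separately, then add them up.
sum_of_converted() {
    local total=0 v
    for v in "$@"; do
        total=$(( total + $(to_user_hz "$v") ))
    done
    echo "$total"
}

# Summary "cpu" line: add the raw counters first, then convert once.
converted_sum() {
    local raw=0 v
    for v in "$@"; do
        raw=$(( raw + v ))
    done
    to_user_hz "$raw"
}
```

With four hypothetical per-CPU counters of 1239, 1187, 1005 and 1298 raw ticks, `sum_of_converted 1239 1187 1005 1298` prints 470 while `converted_sum 1239 1187 1005 1298` prints 472: the dropped remainders (9 + 7 + 5 + 8 = 29) add up to two more whole ticks. Since each CPU drops at most 9 raw ticks per field, with 4 CPUs the summary can exceed the per-CPU sum by at most 3 ticks per field, or 24 over the 8 fields the script adds, and never by less than 0, ignoring counters that advance while the file is being generated.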

Sven