2003-02-09 12:50:00

by Chris Rankin

Subject: Linux 2.4.20-SMP: where is all my CPU time going?

Hi,

I have been running Linux-2.4.20-SMP on my dual PIII-933MHz machine
(1 GB RAM) for over 3 weeks as a desktop machine, with 2 instances of
SETI@Home running at nice 19 in the background the whole time:

# uptime
12:25:44 up 24 days, 17:51, 7 users, load average: 2.03, 2.05, 2.13

However, I have occasionally noticed that some SETI jobs have been
taking longer than I would expect. Here is what
wait4() has to say about the CPU time for each work unit over the last
few days:

TIMESTAMP [2003]   CLIENT NAME   RUN-TIME (s)   SYS-TIME (s)
Feb 07 16:16:48:   SETI-6        33215.280000     278.470000
Feb 07 18:20:19:   SETI-1        33722.820000     203.570000
Feb 08 02:27:32:   SETI-4        33207.240000     219.950000
Feb 08 04:26:54:   SETI-3        32954.330000     217.670000
Feb 08 15:24:00:   SETI-5        29543.910000    1908.440000
Feb 08 17:30:35:   SETI-6        29512.810000    1949.680000
Feb 09 06:14:08:   SETI-1        33391.280000    2165.590000
Feb 09 12:22:21:   SETI-4        32747.760000    3840.990000

Each work unit is run as a distinct process. When the process
terminates, a super-daemon program writes an entry in a log file, along
with the data from its struct rusage. However, over the last few days,
more and more time has started being spent in the kernel. The last
work unit to complete must have begun around 17:30:35 on Feb 8th, but
didn't finish until 12:22:21 on Feb 9th. That's about 19 hours of
wall-time, of which 9 hours were spent crunching the work unit and 1
hour was spent "doing things" in the kernel. That leaves 9 hours
unaccounted for! Granted, the SETI process was running at nice 19, but
the machine was also otherwise idle from at least midnight on Feb 9th.

Can anyone suggest what the kernel might be doing to generate one hour
of sys-time (compared to the more usual ~4 minutes)? And where could
those missing 9 hours of run-time have gone?

Thanks for any insights here,
Cheers,
Chris

P.S. This isn't the first time that this has happened. The *very*
first sys-time "spike" happened in Oct 2001, but the spikes have become
much more regular since about Oct 2002, now appearing roughly one or
two weeks apart.


2003-02-09 17:37:22

by Chris Rankin

Subject: Re: Linux 2.4.20-SMP: where is all my CPU time going?

> > I have been running Linux-2.4.20-SMP on my dual PIII-933MHz machine
> > (1 GB RAM) for over 3 weeks as a desktop machine, with 2 instances of
> > SETI@Home running at nice 19 in the background the whole time:
>
> but other unknown factors have changed, right? like daily use
> of the desktop?

Yes, I do *use* my desktop occasionally... ;-). However, my full "SETI
statistics" file contains daily data going back to October 2000. For
example, here is an excerpt from September 2002:

TIMESTAMP [2002]   CLIENT NAME   RUN-TIME (s)   SYS-TIME (s)
Sep 22 23:08:00:   SETI-5        20606.560000     190.480000
Sep 23 02:19:22:   SETI-1        31343.990000     289.770000
Sep 23 08:47:07:   SETI-2        32592.250000     273.890000
Sep 23 11:19:43:   SETI-3        30795.910000     265.160000
Sep 23 18:10:38:   SETI-4        32877.130000     270.660000
Sep 23 20:11:08:   SETI-6        30946.050000     265.570000
Sep 24 03:51:53:   SETI-5        33192.640000     270.920000
Sep 24 05:53:52:   SETI-1        33039.560000     279.380000
Sep 24 13:32:49:   SETI-2        32785.650000     271.290000
Sep 24 15:38:35:   SETI-3        33046.460000     282.480000
Sep 24 23:16:30:   SETI-4        32947.420000     279.430000
Sep 25 00:12:22:   SETI-6        28678.120000     259.580000
Sep 25 07:52:13:   SETI-5        28920.640000     268.700000
Sep 25 10:57:25:   SETI-1        33598.060000     554.140000
Sep 25 22:24:01:   SETI-3        16284.640000    2286.710000
Sep 26 00:34:02:   SETI-2        31303.880000    2678.700000
Sep 26 08:17:12:   SETI-6        33268.100000     281.640000
Sep 26 09:52:11:   SETI-5        31248.750000     276.190000
Sep 26 18:00:18:   SETI-1        32560.010000     274.440000
Sep 26 19:08:24:   SETI-3        31045.920000     269.040000
Sep 27 03:54:47:   SETI-2        33266.910000     276.200000
Sep 27 04:46:27:   SETI-4        32386.510000     272.180000
Sep 27 13:27:50:   SETI-6        33058.870000     267.090000
Sep 27 14:21:14:   SETI-5        33171.320000     271.520000
Sep 27 21:45:00:   SETI-1        28743.730000     251.880000
Sep 27 23:59:34:   SETI-3        33474.890000     261.430000
Sep 28 07:17:16:   SETI-2        33210.900000     256.300000

I have not changed the way I use this machine, and yet in that excerpt
the "sys-spikes" appear for only two work units. This box should also
not be suffering from memory starvation:

$ free -t
             total       used       free     shared    buffers     cached
Mem:       1033532     839516     194016          0      57468     596040
-/+ buffers/cache:     186008     847524
Swap:       498004      12000     486004
Total:     1531536     851516     680020

> > Feb 07 18:20:19: SETI-1 33722.820000 203.570000
> > Feb 09 06:14:08: SETI-1 33391.280000 2165.590000
>
> do you have any other data? if the kernel happens to pull pages away
> from the seti process, it could easily cause this sort of thing.

Well, I actually have no idea what sort of activity counts towards the
sys-time total. Is this time spent in spinlocks? Semaphores? Waiting
for DMA transfers to complete? And we're talking 36 minutes' worth of
sys-time here! It sounds as if some in-kernel process is going
completely off the deep end.
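
The only mental model I have, and it may well be wrong, is that the
user/sys split is purely statistical: every 10 ms timer tick gets
charged to whichever process it interrupts, as user or sys time
depending on the CPU mode at that instant. A toy model of that guess
(definitely not the kernel's actual code):

#include <stdio.h>

struct task {
    const char   *name;
    unsigned long utime;        /* ticks charged as user (RUN) time   */
    unsigned long stime;        /* ticks charged as system (SYS) time */
};

/* Charge one 10 ms tick to the task it happened to interrupt. */
static void account_tick(struct task *t, int in_user_mode)
{
    if (in_user_mode)
        t->utime++;
    else
        t->stime++;
}

int main(void)
{
    struct task seti = { "SETI-4", 0, 0 };
    int i;

    /* If roughly 1 tick in 10 lands while the CPU is in the kernel on
     * this task's behalf, ~10% of its CPU time shows up as sys-time. */
    for (i = 0; i < 36000; i++)           /* one hour's worth of ticks */
        account_tick(&seti, (i % 10) != 0);

    printf("%s: %lu user ticks, %lu sys ticks\n",
           seti.name, seti.utime, seti.stime);
    return 0;
}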

> check your cron jobs; I'm guessing you simply run updatedb or something
> at that interval. it does enough IO to flush pages in the seti client's
> working set.

My only cron job is one that runs "rmmod -a" every hour.

> that's just one possible perturber, though: there are plenty of other
> mechanisms that would work fine to produce this kind of effect.
> some sort of cache-thrashing in particular.

Could cache-thrashing account for the 9 hours of missing run-time? I
didn't see or hear any unusual disc activity last night, and the only
processes in my run-queue were the two setiathome ones.

Chris

2003-02-09 18:46:49

by Chris Rankin

Subject: Re: Linux 2.4.20-SMP: where is all my CPU time going?

> > TIMESTAMP [2002] CLIENT NAME RUN-TIME SYS-TIME
> > Sep 22 23:08:00: SETI-5 20606.560000 190.480000
>
> incidentally, what's the meaning of the client-name column?
> (no, I've never run seti, so don't know whether it increments
> through spurious client names on the same machine for some reason...)

My super-daemon program has 6 "client" directories, each of which
contains an instance of the setiathome program. The super-daemon keeps
two of these clients processing work units at any one time; two,
because the machine has 2 CPUs.
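
Schematically, the dispatch loop is something like this (a simplified
sketch; the real daemon also does the rusage logging and error handling,
and the directory names here are just illustrative):

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NCLIENTS 6
#define NCPUS    2

static const char *client_dir[NCLIENTS] = {
    "SETI-1", "SETI-2", "SETI-3", "SETI-4", "SETI-5", "SETI-6"
};

static void start_client(int i)
{
    if (fork() == 0) {
        /* each directory holds one instance of the client */
        if (chdir(client_dir[i]) == 0)
            execl("./setiathome", "setiathome", (char *)NULL);
        _exit(127);
    }
}

int main(void)
{
    int next = 0, running = 0;

    for (;;) {
        while (running < NCPUS) {       /* keep one client per CPU busy */
            start_client(next % NCLIENTS);
            next++;
            running++;
        }
        wait(NULL);                     /* block until a work unit ends */
        running--;                      /* then dispatch the next client */
    }
}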

> > > do you have any other data? if the kernel happens to pull pages away
> > > from the seti process, it could easily cause this sort of thing.
> >
> > Well, I actually have no idea what sort of activity counts towards the
> > sys-time total. Is this time spent in spinlocks? Semaphores? Waiting
> > for DMA transfers to complete? And we're talking 36 minutes-worth of
> > sys-time here! It sounds as if some in-kernel process is going
> > completely off the deep end.
>
> you have absolutely no data to support that. all you know is that for some
> reason, the kernel thinks seti is taking up more in-kernel time. in general,
> driver-level stuff (like irq's and dma) is not accounted by the kernel,
> and thus shows up in the user-time of whatever process happens to be running.
>
> do you have *elapsed* time per workunit? that's actually a hard number,
> not a wishywashy one like user/sys times. if there's an increase in
> elapsed time, the phenomenon is a little more interesting.

Yes, I do. The work unit began on Feb 08 2003 at 17:30:35 and finished
on Feb 09 2003 at 12:22:21. That's approximately 19 hours of wall time,
during which it accumulated 32747.76 seconds of user time and 3840.99
seconds of sys time. Compare this with a previous work unit, which
began on Feb 07 2003 at 16:16:48, finished on Feb 08 2003 at 02:27:32
(about 10 hours of wall time), and took 33207.24 and 219.95 seconds of
user and sys time respectively. This is where my "9 hours of missing
run-time" assertion comes from.
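
Spelling that arithmetic out (all numbers from the log above):

  Feb 08 17:30:35 -> Feb 09 12:22:21  =  67906 s wall time  (~18.9 h)
  user time (RUN-TIME)                =  32747.76 s         (~9.1 h)
  sys time  (SYS-TIME)                =   3840.99 s         (~1.1 h)
  unaccounted                         =  31317.25 s         (~8.7 h)

The earlier work unit had 36644 s of wall time for 33207.24 s of user
time and 219.95 s of sys time, i.e. less than an hour unaccounted for.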

> > > check your cron jobs; I'm guessing you simply run updatedb or something
> > > at that interval. it does enough IO to flush pages in the seti client's
> > > working set.
> >
> > My only cron job is one that runs "rmmod -a" every hour.
>
> really? no logwatch, logrotate, tmpclean, updatedb, tripwire,
> preen-man-pages?

Nope. This is a home-built box.

> > Could cache-thrashing account for the 9 hours of missing run-time? I
>
> sure. you have basically no useful data, since user/sys time accounting
> is purely for amusement purposes, not taken seriously by the kernel.

Terrific.

> the only way you can possibly track this down is by getting more data
> around one of these spikes. something like running vmstat periodically,
> as well as grabbing /proc/{meminfo,slabinfo}. perhaps kernel profiling
> or oprofile during non-spike and spike times.

Well, on the assumption that I'm still inside a spike, here's some
more memory info.

$ cat /proc/meminfo
total: used: free: shared: buffers: cached:
Mem: 1058336768 860172288 198164480 0 59092992 614592512
Swap: 509956096 12288000 497668096
MemTotal: 1033532 kB
MemFree: 193520 kB
MemShared: 0 kB
Buffers: 57708 kB
Cached: 598712 kB
SwapCached: 1476 kB
Active: 342780 kB
Inactive: 414884 kB
HighTotal: 130944 kB
HighFree: 2044 kB
LowTotal: 902588 kB
LowFree: 191476 kB
SwapTotal: 498004 kB
SwapFree: 486004 kB

$ cat /proc/slabinfo
slabinfo - version: 1.1 (SMP)
kmem_cache 80 80 244 5 5 1 : 252 126
uhci_urb_priv 1 63 60 1 1 1 : 252 126
fib6_nodes 226 226 32 2 2 1 : 252 126
ip6_dst_cache 60 60 192 3 3 1 : 252 126
ndisc_cache 60 60 128 2 2 1 : 252 126
nfs_write_data 147 209 352 17 19 1 : 124 62
nfs_read_data 0 0 352 0 0 1 : 124 62
nfs_page 188 440 96 7 11 1 : 252 126
hpsb_packet 0 0 100 0 0 1 : 252 126
tcp_tw_bucket 60 60 128 2 2 1 : 252 126
tcp_bind_bucket 273 339 32 3 3 1 : 252 126
tcp_open_request 80 80 96 2 2 1 : 252 126
inet_peer_cache 59 59 64 1 1 1 : 252 126
ip_fib_hash 10 113 32 1 1 1 : 252 126
ip_dst_cache 192 192 160 8 8 1 : 252 126
arp_cache 30 30 128 1 1 1 : 252 126
blkdev_requests 512 520 96 13 13 1 : 252 126
devfsd_event 169 169 20 1 1 1 : 252 126
journal_head 324 702 48 8 9 1 : 252 126
revoke_table 1 253 12 1 1 1 : 252 126
revoke_record 226 226 32 2 2 1 : 252 126
dnotify_cache 0 0 20 0 0 1 : 252 126
file_lock_cache 120 120 96 3 3 1 : 252 126
fasync_cache 3 202 16 1 1 1 : 252 126
uid_cache 3 113 32 1 1 1 : 252 126
skbuff_head_cache 522 648 160 22 27 1 : 252 126
sock 192 192 928 48 48 1 : 124 62
sigqueue 164 290 132 6 10 1 : 252 126
kiobuf 0 0 64 0 0 1 : 252 126
cdev_cache 273 295 64 5 5 1 : 252 126
bdev_cache 10 118 64 2 2 1 : 252 126
mnt_cache 18 177 64 3 3 1 : 252 126
inode_cache 26053 28416 480 3269 3552 1 : 124 62
dentry_cache 41382 43650 128 1455 1455 1 : 252 126
dquot 3 60 128 2 2 1 : 252 126
filp 1144 1200 128 40 40 1 : 252 126
names_cache 38 38 4096 38 38 1 : 60 30
buffer_head 158790 211920 96 5172 5298 1 : 252 126
mm_struct 312 312 160 13 13 1 : 252 126
vm_area_struct 1838 2360 96 59 59 1 : 252 126
fs_cache 354 354 64 6 6 1 : 252 126
files_cache 189 189 416 21 21 1 : 124 62
signal_act 132 132 1312 44 44 1 : 60 30
size-131072(DMA) 0 0 131072 0 0 32 : 0 0
size-131072 0 0 131072 0 0 32 : 0 0
size-65536(DMA) 0 0 65536 0 0 16 : 0 0
size-65536 0 0 65536 0 0 16 : 0 0
size-32768(DMA) 0 0 32768 0 0 8 : 0 0
size-32768 0 0 32768 0 0 8 : 0 0
size-16384(DMA) 0 0 16384 0 0 4 : 0 0
size-16384 2 13 16384 2 13 4 : 0 0
size-8192(DMA) 0 0 8192 0 0 2 : 0 0
size-8192 4 18 8192 4 18 2 : 0 0
size-4096(DMA) 0 0 4096 0 0 1 : 60 30
size-4096 112 142 4096 112 142 1 : 60 30
size-2048(DMA) 0 0 2048 0 0 1 : 60 30
size-2048 130 190 2048 76 95 1 : 60 30
size-1024(DMA) 0 0 1024 0 0 1 : 124 62
size-1024 288 288 1024 72 72 1 : 124 62
size-512(DMA) 0 0 512 0 0 1 : 124 62
size-512 330 392 512 49 49 1 : 124 62
size-256(DMA) 0 0 256 0 0 1 : 252 126
size-256 264 390 256 26 26 1 : 252 126
size-128(DMA) 2 30 128 1 1 1 : 252 126
size-128 2088 2970 128 99 99 1 : 252 126
size-64(DMA) 0 0 64 0 0 1 : 252 126
size-64 1119 1475 64 24 25 1 : 252 126
size-32(DMA) 2 113 32 1 1 1 : 252 126
size-32 8679 11413 32 88 101 1 : 252 126

$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 0 12000 192384 57708 598728 0 0 3 3 3 4 94 6 0 0
2 0 12000 192384 57708 598728 0 0 0 6 101 186 99 1 0 0
2 0 12000 192376 57708 598728 0 0 0 6 101 186 100 0 0 0
2 0 12000 193272 57708 598728 0 0 0 0 100 185 99 1 0 0
2 0 12000 194432 57708 598728 0 0 0 0 100 180 99 1 0 0
3 0 12000 193404 57708 598728 0 0 0 0 100 180 99 1 0 0
2 0 12000 194432 57708 598728 0 0 0 0 100 179 99 1 0 0
3 0 12000 194432 57708 598728 0 0 0 17 104 177 99 1 0 0
2 0 12000 193144 57708 598728 0 0 0 0 100 185 98 2 0 0