2004-11-23 19:30:08

by Mark_H_Johnson

[permalink] [raw]
Subject: Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm2-V0.7.30-9

>i have released the -V0.7.30-9 Real-Time Preemption patch,

The profile script (5 second wait, dump data if wait > 10 seconds) was
stuck for over 300 seconds & when it woke up there were 34 jobs waiting
to run. A couple excerpts from the data [will send that separately].
This was with -V0.7.30-7 (PREEMPT_RT) running latencytest & I believe
the disk write test (comparing date/time modified of log files...) plus
a pair of data collecting scripts.

Don't know if this is the journaling problem you mention being fixed
in -9 or not. This huge delay did not recur with the disk copy test.

The odd data point I notice here was the nearly 100% cpu time for
the shell script get_ltrace.sh which is...

#!/bin/sh

let MAX=`cat /proc/sys/kernel/preempt_max_latency`
let I=0 J=1
let MP=${1:-1000}
echo "Current Maximum is $MAX, limit will be $MP."
while (( I < 100 )) ; do
sleep 1s
let NOW=`cat /proc/sys/kernel/preempt_max_latency`
if (( MAX != NOW )) ; then
echo "New trace $I w/ $NOW usec latency."
cat /proc/latency_trace > lt.`printf "%02d" $I`
sync ; sync
let I++
let MAX=NOW
elif (( J++ >= 10 )) ; then
if (( MAX != MP )) ; then
echo "Resetting max latency from $MAX to $MP."
echo $MP > /proc/sys/kernel/preempt_max_latency
let MAX=$MP
else
echo "No new latency samples at `date`."
fi
let J=1
# else do nothing...
fi
done

Perhaps the sync caused the 100% CPU usage for the moment (but certainly
not for five minutes!). It is however suspicious that the total run time
for that script (5:33) is very close to the delay time (336 seconds) for
the profile script. Both of these run at RT FIFO priority 1. There is also
a 5 minute gap in the files generated by get_ltrace.sh as well.

--Mark

--- excerpts follow ---

Tue Nov 23 12:40:48 CST 2004
336 seconds elapsed time.
Load Average ->
36.00 24.57 12.44 36/125 8600

Tasks in RUN state...

PID PRI RTPRIO NI S TIME %CPU CMD
041123-01/prof002.txt: 1 23 - 0 R 00:00:04 0.1 init [5]
041123-01/prof002.txt: 5 34 - -10 R 00:00:00 0.0 [desched/0]
041123-01/prof002.txt: 8 34 - -10 R 00:00:00 0.0 [desched/1]
041123-01/prof002.txt: 100 24 - 0 R 00:00:00 0.0 [pdflush]
041123-01/prof002.txt: 310 24 - 0 R 00:00:00 0.0 [kirqd]
041123-01/prof002.txt: 320 24 - 0 R 00:00:02 0.1 [kjournald]
041123-01/prof002.txt: 1208 23 - 0 R 00:00:03 0.1 [kjournald]
041123-01/prof002.txt: 1803 23 - 0 R 00:00:00 0.0 /sbin/dhclient
-1 -q -lf /var/lib/dhcp/dhclient-eth0.leases -pf /var/run/dhclient-eth0.pid
-cf /etc/dhclient-eth0.conf eth0
041123-01/prof002.txt: 1838 23 - 0 R 00:00:00 0.0 syslogd -m 0
041123-01/prof002.txt: 1909 23 - 0 R 00:00:00 0.0 rpc.idmapd
041123-01/prof002.txt: 2040 23 - 0 R 00:00:18 0.7 /usr/sbin/nscd
041123-01/prof002.txt: 2072 23 - 0 R 00:00:02 0.0 cupsd
041123-01/prof002.txt: 2361 23 - 0 R 00:00:00 0.0 ntpd -U ntp -p
/var/run/ntpd.pid
041123-01/prof002.txt: 2381 23 - 0 R 00:00:00 0.0
/usr/local/nagios/sbin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
041123-01/prof002.txt: 2400 23 - 0 R 00:00:00 0.0 sendmail:
accepting connections
041123-01/prof002.txt: 2426 23 - 0 R 00:00:00 0.0 gpm -m
/dev/input/mice -t imps2
041123-01/prof002.txt: 2551 23 - 0 R 00:00:00 0.0 crond
041123-01/prof002.txt: 2574 23 - 0 R 00:00:00 0.0 xfs -droppriv
-daemon
041123-01/prof002.txt: 2593 23 - 0 R 00:00:00 0.0 /usr/sbin/atd
041123-01/prof002.txt: 2911 23 - 0 R 00:00:00 0.0
/usr/bin/ssh-agent /home/u21305/.Xclients
041123-01/prof002.txt: 2944 23 - 0 R 00:00:00 0.0 kdeinit:
klauncher
041123-01/prof002.txt: 2947 24 - 0 R 00:00:02 0.1 kdeinit: kded
041123-01/prof002.txt: 3174 24 - 0 R 00:00:03 0.1 kdeinit: knotify
041123-01/prof002.txt: 3196 23 - 0 R 00:00:00 0.0 kwrapper
ksmserver
041123-01/prof002.txt: 3201 23 - 0 R 00:00:02 0.0 kdeinit:
kdesktop
041123-01/prof002.txt: 3203 24 - 0 R 00:00:18 0.7 kdeinit: kicker
041123-01/prof002.txt: 3216 23 - 0 R 00:00:00 0.0
/sbin/pam_timestamp_check -d root
041123-01/prof002.txt: 3220 23 - 0 R 00:00:11 0.4 kdeinit: konsole
-session 11c034d757000106674694500000017150004_1101232254_900758
041123-01/prof002.txt: 3624 41 1 0 R 00:00:04 0.2 /bin/sh
./getlog.sh
041123-01/prof002.txt: 3782 4 - 10 R 00:20:05 68.5 ./cpu_burn
041123-01/prof002.txt: 6030 23 - 0 R 00:00:01 0.1
/usr/bin/kdesktop_lock
041123-01/prof002.txt: 8497 21 - 0 R 00:00:06 1.6 head -c
750000000 /dev/zero
041123-01/prof002.txt: 8594 70 1 0 R 00:05:33 99.9 /bin/sh
./get_ltrace.sh 100


2004-11-23 19:54:47

by Adam Heath

[permalink] [raw]
Subject: Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm2-V0.7.30-9

yOn Tue, 23 Nov 2004, wrote:

> >i have released the -V0.7.30-9 Real-Time Preemption patch,
>
> The profile script (5 second wait, dump data if wait > 10 seconds) was
> stuck for over 300 seconds & when it woke up there were 34 jobs waiting
> to run. A couple excerpts from the data [will send that separately].
> This was with -V0.7.30-7 (PREEMPT_RT) running latencytest & I believe
> the disk write test (comparing date/time modified of log files...) plus
> a pair of data collecting scripts.

Sounds very similiar to the problem I reported with 29-0. I left work late
saturday night, arrived back sunday around 11am, and found the machine saying
the time was 1:15am. I've seen other pauses, were nothing runs. Hitting a
key causes the machine to start responding/running again.