To: Ingo Molnar <mingo@elte.hu>
Cc: linux-kernel@vger.kernel.org, Lee Revell <rlrevell@joe-job.com>,
       Rui Nuno Capela <rncbc@rncbc.org>, Mark_H_Johnson@raytheon.com,
       "K.R. Foley" <kr@cybsft.com>, Bill Huey <bhuey@lnxw.com>,
       Adam Heath <doogie@debian.org>, Florian Schmidt <mista.tapas@gmx.net>,
       Thomas Gleixner <tglx@linutronix.de>,
       Michal Schmidt <xschmi00@stud.feec.vutbr.cz>,
       Fernando Pablo Lopez-Lezcano <nando@ccrma.Stanford.EDU>,
       Karsten Wiese <annabellesgarden@yahoo.de>,
       Gunther Persoons <gunther_persoons@spymac.com>, emann@mrv.com,
       Shane Shrybman <shrybman@aei.ca>, Amit Shah <amit.shah@codito.com>,
       Esben Nielsen <simlo@phys.au.dk>
From: Mark_H_Johnson@raytheon.com
Subject: Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm2-V0.7.30-9
Date: Tue, 23 Nov 2004 13:24:28 -0600
Message-ID: <OF103A6225.8159BCC7-ON86256F55.006A9C20-86256F55.006A9C62@raytheon.com>
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5056
Lines: 117

>i have released the -V0.7.30-9 Real-Time Preemption patch,

The profile script (5 second wait, dump data if wait > 10 seconds) was
stuck for over 300 seconds & when it woke up there were 34 jobs waiting
to run. A couple excerpts from the data [will send that separately].
This was with -V0.7.30-7 (PREEMPT_RT) running latencytest & I believe
the disk write test (comparing date/time modified of log files...) plus
a pair of data collecting scripts.

Don't know if this is the journaling problem you mention being fixed
in -9 or not. This huge delay did not recur with the disk copy test.

The odd data point I notice here was the nearly 100% cpu time for
the shell script get_ltrace.sh which is...

#!/bin/sh

let MAX=`cat /proc/sys/kernel/preempt_max_latency`
let I=0 J=1
let MP=${1:-1000}
echo "Current Maximum is $MAX, limit will be $MP."
while (( I < 100 )) ; do
    sleep 1s
    let NOW=`cat /proc/sys/kernel/preempt_max_latency`
    if (( MAX != NOW )) ; then
        echo "New trace $I w/ $NOW usec latency."
        cat /proc/latency_trace > lt.`printf "%02d" $I`
        sync ; sync
        let I++
        let MAX=NOW
    elif (( J++ >= 10 )) ; then
        if (( MAX != MP )) ; then
            echo "Resetting max latency from $MAX to $MP."
            echo $MP > /proc/sys/kernel/preempt_max_latency
            let MAX=$MP
        else
            echo "No new latency samples at `date`."
        fi
        let J=1
# else do nothing...
    fi
done

Perhaps the sync caused the 100% CPU usage for the moment (but certainly
not for five minutes!). It is however suspicious that the total run time
for that script (5:33) is very close to the delay time (336 seconds) for
the profile script. Both of these run at RT FIFO priority 1. There is also
a 5 minute gap in the files generated by get_ltrace.sh as well.

--Mark

--- excerpts follow ---

Tue Nov 23 12:40:48 CST 2004
336 seconds elapsed time.
Load Average ->
36.00 24.57 12.44 36/125 8600

Tasks in RUN state...

                        PID PRI RTPRIO  NI S     TIME %CPU CMD
041123-01/prof002.txt:    1  23      -   0 R 00:00:04  0.1 init [5]
041123-01/prof002.txt:    5  34      - -10 R 00:00:00  0.0 [desched/0]
041123-01/prof002.txt:    8  34      - -10 R 00:00:00  0.0 [desched/1]
041123-01/prof002.txt:  100  24      -   0 R 00:00:00  0.0 [pdflush]
041123-01/prof002.txt:  310  24      -   0 R 00:00:00  0.0 [kirqd]
041123-01/prof002.txt:  320  24      -   0 R 00:00:02  0.1 [kjournald]
041123-01/prof002.txt: 1208  23      -   0 R 00:00:03  0.1 [kjournald]
041123-01/prof002.txt: 1803  23      -   0 R 00:00:00  0.0 /sbin/dhclient
-1 -q -lf /var/lib/dhcp/dhclient-eth0.leases -pf /var/run/dhclient-eth0.pid
-cf /etc/dhclient-eth0.conf eth0
041123-01/prof002.txt: 1838  23      -   0 R 00:00:00  0.0 syslogd -m 0
041123-01/prof002.txt: 1909  23      -   0 R 00:00:00  0.0 rpc.idmapd
041123-01/prof002.txt: 2040  23      -   0 R 00:00:18  0.7 /usr/sbin/nscd
041123-01/prof002.txt: 2072  23      -   0 R 00:00:02  0.0 cupsd
041123-01/prof002.txt: 2361  23      -   0 R 00:00:00  0.0 ntpd -U ntp -p
/var/run/ntpd.pid
041123-01/prof002.txt: 2381  23      -   0 R 00:00:00  0.0
/usr/local/nagios/sbin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
041123-01/prof002.txt: 2400  23      -   0 R 00:00:00  0.0 sendmail:
accepting connections
041123-01/prof002.txt: 2426  23      -   0 R 00:00:00  0.0 gpm -m
/dev/input/mice -t imps2
041123-01/prof002.txt: 2551  23      -   0 R 00:00:00  0.0 crond
041123-01/prof002.txt: 2574  23      -   0 R 00:00:00  0.0 xfs -droppriv
-daemon
041123-01/prof002.txt: 2593  23      -   0 R 00:00:00  0.0 /usr/sbin/atd
041123-01/prof002.txt: 2911  23      -   0 R 00:00:00  0.0
/usr/bin/ssh-agent /home/u21305/.Xclients
041123-01/prof002.txt: 2944  23      -   0 R 00:00:00  0.0 kdeinit:
klauncher
041123-01/prof002.txt: 2947  24      -   0 R 00:00:02  0.1 kdeinit: kded
041123-01/prof002.txt: 3174  24      -   0 R 00:00:03  0.1 kdeinit: knotify
041123-01/prof002.txt: 3196  23      -   0 R 00:00:00  0.0 kwrapper
ksmserver
041123-01/prof002.txt: 3201  23      -   0 R 00:00:02  0.0 kdeinit:
kdesktop
041123-01/prof002.txt: 3203  24      -   0 R 00:00:18  0.7 kdeinit: kicker
041123-01/prof002.txt: 3216  23      -   0 R 00:00:00  0.0
/sbin/pam_timestamp_check -d root
041123-01/prof002.txt: 3220  23      -   0 R 00:00:11  0.4 kdeinit: konsole
-session 11c034d757000106674694500000017150004_1101232254_900758
041123-01/prof002.txt: 3624  41      1   0 R 00:00:04  0.2 /bin/sh
./getlog.sh
041123-01/prof002.txt: 3782   4      -  10 R 00:20:05 68.5 ./cpu_burn
041123-01/prof002.txt: 6030  23      -   0 R 00:00:01  0.1
/usr/bin/kdesktop_lock
041123-01/prof002.txt: 8497  21      -   0 R 00:00:06  1.6 head -c
750000000 /dev/zero
041123-01/prof002.txt: 8594  70      1   0 R 00:05:33 99.9 /bin/sh
./get_ltrace.sh 100

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/