From: "Andrew A." <aathan-linux-kernel-1542@cloakmail.com>
To: <linux-kernel@vger.kernel.org>
Subject: RE: Consistent lock up 2.6.10-rc1-bk7 (mutex/SCHED_RR bug?)
Date: Fri, 29 Oct 2004 11:36:24 -0400
Message-ID: <NFBBICMEBHKIKEFBPLMCCEEGJGAA.aathan-linux-kernel-1542@cloakmail.com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
In-Reply-To: <OMEGLKPBDPDHAGCIBHHJOELLFCAA.aathan-linux-kernel-1542@cloakmail.com>
Importance: Normal
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2550
Lines: 47


For whatever reason, I have been unable to send the original message below through vger.

I am therefore enclosing some of the important text here:

==========

I have in the past posted emails with the subject "Consistent kernel hang during heavy TCP connection handling load"  I then
recently saw a linux-kernel thread "PROBLEM: Consistent lock up on >=2.6.8" that seemed to be related to the problem I am
experiencing, but I am not sure of it (thus the cc:).

I have, however, now managed to formulate a series of steps which reproduce my particular lock up consistently.  When I trigger the
hang, all virtual consoles become unresponsive, and the application cannot be signaled from the keyboard.  Sysrq seems to work.

The application in question is called "tt1".  It runs several threads in SCHED_RR and uses select(), sleep() and/or nanosleep()
extensively.  I suspect there's a good chance the application calls select() with nfds=0 at some point.

Due to the SCHED_RR usage in tt1, before executing the tt1 hang, I have tried to log into a virtual console on the host and run
"nice -20 bash" as root.  THe nice'd shell is hung just like everything else.

Did I do it right?  I was trying to make sure this hang is not simply an infinite loop in a SCHED_RR high priority process (tt1).

I initially had a lot of trouble trying to capture sysrq output, but then I checked my netlog host and found (lo and behold) that it
had captured it!  Of course, that was before I went through the trouble of taking pictures of my monitor! I've included the netlog
sysrq output from two runs below.  They are at the very bottom of this email, separated by lines of '*'s  These runs are probably
DIFFERENT than the runs from which I produced the below screenshots.

So, here are those screenshots, I still welcome any comments you might have about easier ways to capture sysrq output than using
netdump!

I modified /etc/syslog.conf to say kern.* /var/log/kernel, however, output of sysrq-t and sysrq-p while in the locked up state never
ends up in the file (though, it does, when not locked up).

The sysreq output and screenshots can be found at triple w dot memeplex dot com slash

sysrq[1-2].txt.gz
lock[1-3].gif mapping to System.map-2.6.8.1.gz
lock[4-5].gif mapping to System.map-2.6.8-1.521.gz

=======


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/