Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752438AbXJURZY (ORCPT ); Sun, 21 Oct 2007 13:25:24 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751667AbXJURZM (ORCPT ); Sun, 21 Oct 2007 13:25:12 -0400 Received: from wa-out-1112.google.com ([209.85.146.181]:39883 "EHLO wa-out-1112.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751639AbXJURZL (ORCPT ); Sun, 21 Oct 2007 13:25:11 -0400 DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=NwJ8qm65Cp/fWoBDaXGUBRWg4u6d/5UC8kwuhMvXrYupqikQM71d2ts+RJ3fI9YSUJR+xnxMB9VwDZ1WKvQJoDZAEYqddU7dRJSFesj5Q//EhNGDc9V2mXdMkygX8j9bTgdP5GgSRo5M8pp/T58NOaLCee1/BtByF9D/PDJiK2M= Message-ID: Date: Sun, 21 Oct 2007 19:25:10 +0200 From: "Dmitry Adamushko" To: "Jeff Garzik" Subject: Re: [2.6.23] tasks stuck in running state? Cc: "Ingo Molnar" , "Chuck Ebbert" , "Linux Kernel Mailing List" In-Reply-To: <47192999.6010607@garzik.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <47192425.6020507@garzik.org> <47192763.9050004@redhat.com> <47192999.6010607@garzik.org> Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2367 Lines: 64 On 20/10/2007, Jeff Garzik wrote: > Chuck Ebbert wrote: > > On 10/19/2007 05:39 PM, Jeff Garzik wrote: > >> On my main devel box, vanilla 2.6.23 on x86-64/Fedora-7, I'm seeing a > >> certain behavior at least once a day. I'll start a kernel build (make > >> -sj5 on this box), and it will "hang" in the following way: > >> > > > > Can you try to strace the hanging task? > > Well, to the system it's running, so that doesn't do much of anything... > > > > > 8482 pts/0 S+ 0:00 \_ /bin/sh /garz/repo/misc-2.6/scripts/hdrcheck.sh /garz/repo/misc-2.6/usr/include /garz/repo/misc-2.6/usr/include/linux/kernelcapi.h /garz/repo/misc-2.6/usr/include/linux/.check.kernelcapi.h > > 8484 pts/0 R+ 3:10 \_ grep ^[ \t]*#[ \t]*include[ \t]*< /garz/repo/misc-2.6/usr/include/linux/kernelcapi.h > > 8486 pts/0 S+ 0:00 \_ cut -f2 -d< > > 8487 pts/0 S+ 0:00 \_ cut -f1 -d> > > 8488 pts/0 S+ 0:00 \_ egrep ^linux|^asm > > [jgarzik@pretzel misc-2.6]$ strace -p8484 > > Process 8484 attached - interrupt to quit > [sits there, chewing up CPU grepping a 47-line header file] I think, the following information would be helpful : (1) # cat /proc/PID/stat ; sleep 5; cat /proc/PID/stat as we could know : - whether utime or/and stime keeps increasing ; - one of the fields is actually 'eip' ... so we could, hopefully, more or less precisely localize the area of the code where it's looping. (do_task_stat() in fs/proc/array.c) (2) # cat /proc/sched_debug; and # cat /proc/PID/sched (if your kernel has CONFIG_SCHED_DEBUG=y) (3) is anything else able to run on this CPU (as I understand it's SMP) or the task does completely locks it up. again, one of the fields of /proc/PID/stat is the CPU# on which the task is currently active. Let's start a buzy-loop task on this cpu and see wheather it's able to make any progress (TIME counter in 'ps') # taskset -c THIS_CPU some_busy_looping_prog TIA, -- Best regards, Dmitry Adamushko - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/