Date: Tue, 11 Sep 2007 09:58:32 -0700 (PDT)
From: Linus Torvalds
To: Andy Whitcroft
cc: sct@redhat.com, akpm@linux-foundation.org, adilger@clusterfs.com,
    linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, mel@csn.ul.ie
Subject: Re: 2.6.23-rc6: hanging ext3 dbench tests
In-Reply-To: <20070911124202.GI9556@shadowen.org>
References: <20070911124202.GI9556@shadowen.org>

On Tue, 11 Sep 2007, Andy Whitcroft wrote:
>
> I have a couple of failed test runs against 2.6.23-rc6 where the
> job timed out while running dbench over ext3. Both on powerpc,
> though both significantly different hardware setups. A failed
> run like this implies that the machine was still responsive to
> other processes but the dbench was making no progress. There is
> no console diagnostics during the failure.

Since the machine seems to be otherwise alive, can you do a sysrq-W (which
is most easily done by just doing a

	echo w > /proc/sysrq-trigger

and you don't actually need any console access or anything like that).

That should give you all the blocked process traces, and if it's a
deadlock on some semaphore or other, it should all stand out quite nicely.

In fact, things like the above are probably worth scripting for any
automated testing - if you auto-fail after some time, please make the
failure case do that sysrq-W by default.

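A minimal sketch of what such a scripted failure path could look like,
assuming a Python-based test harness. The function names
(run_with_sysrq_w, dump_blocked_tasks) and the overridable trigger
argument are hypothetical and not taken from any existing harness. The
only part that comes from the text above is writing "w" to
/proc/sysrq-trigger. The write happens before the hung job is killed, so
the blocked tasks are still there to be traced.

```python
import subprocess
from pathlib import Path

# Path from the email; overridable so the helper can be tried without root.
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def dump_blocked_tasks(trigger=SYSRQ_TRIGGER):
    """Equivalent of `echo w > /proc/sysrq-trigger` (needs root on a real system)."""
    Path(trigger).write_text("w\n")


def run_with_sysrq_w(cmd, timeout_s, trigger=SYSRQ_TRIGGER):
    """Run cmd with a time limit.

    Returns the exit status if cmd finishes in time. On timeout, requests
    the blocked-task traces, kills cmd, and returns None.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            # Ask for the traces while the stuck job still exists.
            dump_blocked_tasks(trigger)
        finally:
            # Not waited on: a task in uninterruptible sleep may not exit
            # until whatever it is blocked on clears.
            proc.kill()
        return None
```

A harness might call run_with_sysrq_w(["dbench", "4"], 3600) and treat a
None result as the auto-fail case (the dbench arguments and the one-hour
limit are placeholders). The traces themselves land in the kernel log, so
the harness would still need to save that log (for example the output of
dmesg) with the failed run.
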
(The other sysrq things can be useful too - "T" shows the same as "W",
except for _all_ tasks, which is often so verbose that it hides the
problem, but is sometimes the right thing to do).

		Linus