From: Andy Whitcroft
To: Linus Torvalds
Cc: sct@redhat.com, akpm@linux-foundation.org, adilger@clusterfs.com,
    linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, mel@csn.ul.ie
Subject: Re: 2.6.23-rc6: hanging ext3 dbench tests
Date: Wed, 19 Sep 2007 19:15:46 +0100
Message-ID: <20070919181546.GA4343@shadowen.org>
In-Reply-To: <20070914094905.GA32670@shadowen.org>

On Fri, Sep 14, 2007 at 10:49:05AM +0100, Andy Whitcroft wrote:
> On Tue, Sep 11, 2007 at 06:30:49PM +0100, Andy Whitcroft wrote:
> > Annoyingly this seems to be intermittent, and I have not managed to get
> > a machine into this state again yet. Will keep trying.
>
> OK, I have been completely unsuccessful in reproducing this, despite
> having two distinct machines showing this behaviour. I have not been
> able to reproduce it on those machines with 2.6.23-rc6, nor has any of
> the testing of the -git releases which followed thrown this error. I
> have also run about 10 repeats of the jobs which failed, and none of
> those has thrown the same error.
>
> It is pretty clear from the dbench output that the problem is/was real,
> and not some artifact of the test harness. I am at a loss as to how to
> get this to trigger again.
>
> I guess I will keep monitoring the ongoing tests for new instances. I
> will also look at getting the sysrq-* stuff triggered automatically on
> job timeout, as that seems like a sane plan in all cases.
>
> Frustrated.

I have since had a single occurrence of a hang on 2.6.23-rc6-mm1. As
the base is different, I cannot say for sure that it is the same problem.
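The plan quoted above (dumping task state via sysrq automatically when a job times out) might look something like the following minimal C sketch. It is an illustration under stated assumptions, not the harness actually in use: `run_with_watchdog()` and `trigger_sysrq()` are hypothetical names, the 100 ms polling interval is arbitrary, and the trigger path is a parameter so it can point at `/proc/sysrq-trigger` (root only, kernel built with CONFIG_MAGIC_SYSRQ) or at a scratch file when testing. Writing `t` asks the kernel to dump every task's state and stack to the console, which is how traces like the one below are captured.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: write one sysrq command character (e.g. 't' =
 * dump all task states and stacks) to the given trigger file.
 * Returns 0 on success, -1 on failure. */
static int trigger_sysrq(const char *trigger_path, char cmd)
{
    int fd = open(trigger_path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, &cmd, 1);
    close(fd);
    return n == 1 ? 0 : -1;
}

/* Hypothetical watchdog: run argv[] as a child and poll it every 100 ms.
 * Returns the child's exit status if it exits within timeout_ms,
 * -2 if it timed out (after requesting a sysrq 't' dump), or -1 if
 * fork() failed or the child was terminated by a signal. */
static int run_with_watchdog(char *const argv[], long timeout_ms,
                             const char *trigger_path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* exec failed */
    }

    const struct timespec tick = { 0, 100L * 1000 * 1000 }; /* 100 ms */
    for (long waited = 0; waited < timeout_ms; waited += 100) {
        int status;
        if (waitpid(pid, &status, WNOHANG) == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        nanosleep(&tick, NULL);
    }

    /* Timed out: capture task states before tearing anything down. */
    if (trigger_sysrq(trigger_path, 't') != 0)
        perror("sysrq trigger");

    /* Deliberately no blocking waitpid() here: a child stuck in 'D'
     * state will not act on SIGKILL until it leaves the kernel, and
     * waiting for it would hang the harness as well. */
    kill(pid, SIGKILL);
    return -2;
}
```

For example, a harness could call `run_with_watchdog(argv, 30 * 60 * 1000L, "/proc/sysrq-trigger")` around each dbench run. Polling with `WNOHANG` rather than relying on `alarm()` keeps the sketch free of signal-handler state, at the cost of up to 100 ms of extra latency.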
In this new event we had a mkfs hung in a 'D' (uninterruptible sleep) wait:

=======================
mkfs.ext2     D c10220f4     0  6233   6222
       c344fc80 00000082 00000286 c10220f4 c344fc90 002ed099 c2963340 c2b9f640
       c142bce0 c2b9f640 c344fc90 002ed099 c344fcfc c344fcc0 c1219563 c1109bf2
       c344fcc4 c186e4d4 c186e4d4 002ed099 c1022612 c2b9f640 c186e000 c104000c
Call Trace:
 lock_timer_base+0x19/0x35
 schedule_timeout+0x70/0x8d
 prop_fraction_single+0x37/0x5d
 process_timeout+0x0/0x5
 task_dirty_limit+0x3a/0xb5
 io_schedule_timeout+0x1e/0x28
 congestion_wait+0x62/0x7a
 autoremove_wake_function+0x0/0x33
 get_dirty_limits+0x16a/0x172
 autoremove_wake_function+0x0/0x33
 balance_dirty_pages+0x154/0x1be
 generic_perform_write+0x168/0x18a
 generic_file_buffered_write+0x73/0x107
 __generic_file_aio_write_nolock+0x47a/0x4a5
 do_sock_write+0x92/0x99
 sock_aio_write+0x52/0x5e
 generic_file_aio_write_nolock+0x48/0x9b
 do_sync_write+0xbf/0xfc
 autoremove_wake_function+0x0/0x33
 do_page_fault+0x2cc/0x739
 vfs_write+0x8d/0x108
 sys_write+0x41/0x67
 syscall_call+0x7/0xb
=======================

-apw