Subject: Re: [BUG] fatal hang untarring 90GB file, possibly writeback related.
From: James Bottomley
To: Mel Gorman
Cc: Mel Gorman, Jan Kara, colin.king@canonical.com, Chris Mason,
 linux-fsdevel, linux-mm, linux-kernel, linux-ext4
In-Reply-To: <20110503091320.GA4542@novell.com>
References: <20110428150827.GY4658@suse.de>
 <1304006499.2598.5.camel@mulgrave.site>
 <1304009438.2598.9.camel@mulgrave.site>
 <1304009778.2598.10.camel@mulgrave.site>
 <20110428171826.GZ4658@suse.de>
 <1304015436.2598.19.camel@mulgrave.site>
 <20110428192104.GA4658@suse.de>
 <1304020767.2598.21.camel@mulgrave.site>
 <1304025145.2598.24.camel@mulgrave.site>
 <1304030629.2598.42.camel@mulgrave.site>
 <20110503091320.GA4542@novell.com>
Date: Tue, 03 May 2011 09:13:02 -0500
Message-ID: <1304431982.2576.5.camel@mulgrave.site>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2011-05-03 at 10:13 +0100, Mel Gorman wrote:
> On Thu, Apr 28, 2011 at 05:43:48PM -0500, James Bottomley wrote:
> > On Thu, 2011-04-28 at 16:12 -0500, James Bottomley wrote:
> > > On Thu, 2011-04-28 at 14:59 -0500, James Bottomley wrote:
> > > > Actually, talking to Chris, I think I can get the system up using
> > > > init=/bin/bash without systemd, so I can try the no cgroup config.
> > >
> > > OK, so a non-PREEMPT non-CGROUP kernel has survived three back to back
> > > runs of untar without locking or getting kswapd pegged, so I'm pretty
> > > certain this is cgroups related.  The next steps are to turn cgroups
> > > back on but try disabling the memory and IO controllers.
> >
> > I tried non-PREEMPT CGROUP but disabled GROUP_MEM_RES_CTLR.
> >
> > The results are curious: the tar does complete (I've done three back to
> > back).  However, I did get one soft lockup in kswapd (below).  But the
> > system recovers instead of halting I/O and hanging like it did
> > previously.
> >
> > The soft lockup is in shrink_slab, so perhaps it's a combination of slab
> > shrinker and cgroup memory controller issues?
> >
>
> So, kswapd is still looping in reclaim and spending a lot of time in
> shrink_slab but it must not be the shrinker itself or that debug patch
> would have triggered. It's curious that cgroups are involved with
> systemd considering that one would expect those groups to be fairly
> small. I still don't have a new theory but will get hold of a Fedora 15
> install CD and see can I reproduce it locally.

I've got a ftrace output of kswapd ... it's 500k compressed, so I'll
send under separate cover.

> One last thing, what is the value of /proc/sys/vm/zone_reclaim_mode? Two
> of the reporting machines could be NUMA and if that proc file reads as
> 1, I'd be interested in hearing the results of a test with it set to 0.
> Thanks.

It's zero, I'm afraid

James
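
For anyone else checking the same setting, here is a minimal Python sketch
of reading and decoding the value. The helper names are hypothetical; the
bit meanings (1, 2, 4) follow the kernel's vm sysctl documentation, and the
proc file is assumed to exist, which is only the case on NUMA-enabled
kernels.

```python
from pathlib import Path

# Bit meanings as listed in the kernel's vm sysctl documentation.
ZONE_RECLAIM_FLAGS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def decode_zone_reclaim_mode(value):
    """Return a description for each bit set in a zone_reclaim_mode value."""
    return [desc for bit, desc in ZONE_RECLAIM_FLAGS.items() if value & bit]


def read_zone_reclaim_mode(path="/proc/sys/vm/zone_reclaim_mode"):
    """Read the current setting (hypothetical helper; Linux only)."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    mode = read_zone_reclaim_mode()
    # An empty list means no bits are set, i.e. zone reclaim is off.
    print(mode, decode_zone_reclaim_mode(mode) or ["zone reclaim off"])
```

A value of 0 decodes to an empty list (zone reclaim off), which is what
this machine reports.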