From: Chris Mason
To: Colin Ian King
Cc: James Bottomley, linux-fsdevel, linux-mm, linux-kernel, linux-ext4
Subject: Re: [BUG] fatal hang untarring 90GB file, possibly writeback related.
Date: Thu, 28 Apr 2011 08:29:29 -0400
Message-Id: <1303993705-sup-5213@think>
In-reply-to: <1303990590.2081.9.camel@lenovo>
References: <1303920553.2583.7.camel@mulgrave.site>
 <1303921583-sup-4021@think> <1303923000.2583.8.camel@mulgrave.site>
 <1303923177-sup-2603@think> <1303924902.2583.13.camel@mulgrave.site>
 <1303925374-sup-7968@think> <1303926637.2583.17.camel@mulgrave.site>
 <1303934716.2583.22.camel@mulgrave.site> <1303990590.2081.9.camel@lenovo>

Excerpts from Colin Ian King's message of 2011-04-28 07:36:30 -0400:
> One more data point to add, I've been looking at an identical issue when
> copying large amounts of data. I bisected this - and the lockups occur
> with commit
> 3e7d344970673c5334cf7b5bb27c8c0942b06126 - before that I don't see the
> issue. With this commit, my file copy test locks up after ~8-10
> iterations, before this commit I can copy > 100 times and don't see the
> lockup.

Well, that's really interesting. I tried with compaction on here and
couldn't trigger it, but this (very very lightly) tested patch might help.
It moves the writeout throttle before the goto restart, and also makes
sure we do at least one cond_resched before we loop.

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6771ea7..cb08b41 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1934,12 +1934,14 @@ restart:
 	if (inactive_anon_is_low(zone, sc))
 		shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
 
+	throttle_vm_writeout(sc->gfp_mask);
+
 	/* reclaim/compaction might need reclaim to continue */
 	if (should_continue_reclaim(zone, nr_reclaimed,
-					sc->nr_scanned - nr_scanned, sc))
+					sc->nr_scanned - nr_scanned, sc)) {
+		cond_resched();
 		goto restart;
-
-	throttle_vm_writeout(sc->gfp_mask);
+	}
 }
 
 /*
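
For anyone who wants to see the control-flow change in isolation, here is a minimal userspace sketch, not kernel code. It assumes the restart condition holds for a fixed number of passes. The names struct counts, should_continue_stub, loop_before_patch and loop_after_patch are hypothetical and exist only for this illustration. The counters stand in for calls to throttle_vm_writeout() and cond_resched().

```c
/*
 * Userspace illustration only. Every name below is hypothetical; the
 * counters merely stand in for throttle_vm_writeout() and cond_resched().
 */
struct counts {
	int throttle;	/* stand-in for throttle_vm_writeout() calls */
	int resched;	/* stand-in for cond_resched() calls */
};

/* Assumption: reclaim asks to continue for the first *restarts_left passes. */
static int should_continue_stub(int *restarts_left)
{
	if (*restarts_left > 0) {
		(*restarts_left)--;
		return 1;
	}
	return 0;
}

/* Shape of the loop before the patch: the throttle is reached only on exit. */
static struct counts loop_before_patch(int restarts)
{
	struct counts c = { 0, 0 };

restart:
	/* scanning work elided */
	if (should_continue_stub(&restarts))
		goto restart;

	c.throttle++;
	return c;
}

/* Shape of the loop after the patch: throttle on every pass, resched before looping. */
static struct counts loop_after_patch(int restarts)
{
	struct counts c = { 0, 0 };

restart:
	/* scanning work elided */
	c.throttle++;

	if (should_continue_stub(&restarts)) {
		c.resched++;
		goto restart;
	}
	return c;
}
```

With three restarts, the pre-patch shape reaches the throttle once and never hits the reschedule point on the restart path, while the post-patch shape throttles on all four passes and reschedules before each of the three restarts.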