From: Frans Pop
To: Chris Mason, Mel Gorman, David Rientjes, KOSAKI Motohiro,
    "Rafael J. Wysocki", Linux Kernel Mailing List, Kernel Testers List,
    Pekka Enberg, Reinette Chatre, Bartlomiej Zolnierkiewicz,
    Karol Lewandowski, Mohamed Abbas, Jens Axboe, "John W. Linville",
    linux-mm@kvack.org
Subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn
Date: Tue, 27 Oct 2009 18:21:13 +0100
Message-Id: <200910271821.18521.elendil@planet.nl>

On Tuesday 27 October 2009, Chris Mason wrote:
> On Tue, Oct 27, 2009 at 03:52:24PM +0000, Mel Gorman wrote:
> > > So, after the move to async/sync, a lot more pages are getting
> > > queued for writeback - more than three times the number of pages are
> > > queued for writeback with the vanilla kernel. This amount of
> > > congestion might be why direct reclaimers and kswapd's timings have
> > > changed so much.
> >
> > Or more accurately, the vanilla kernel has queued up a lot more pages
> > for IO than when the patch is reverted. I'm not seeing yet why this
> > is.
>
> [ sympathies over confusion about congestion...lots of variables here ]
>
> If wb_kupdate has been able to queue more writes it is because the
> congestion logic isn't stopping it. We have congestion_wait(), but
> before calling that in the writeback paths it says: are you congested?
> and then backs off if the answer is yes.
>
> Ideally, direct reclaim will never do writeback. We want it to be able
> to find clean pages that kupdate and friends have already processed.
>
> Waiting for congestion is a funny thing, it only tells us the device has
> managed to finish some IO or that a timeout has passed. Neither event
> has any relation to figuring out if the IO for reclaimable pages has
> finished.
>
> One option is to have the VM remember the hashed waitqueue for one of
> the pages it direct reclaims and then wait on it.

What people should be aware of is the behavior of the system I see at this
point. I've already mentioned this in other mails, but it's probably good
to repeat it here.

While gitk is reading commits with vanilla .31 and .32 kernels there is at
some point a fairly long period (10-20 seconds) where I see:
- a completely frozen desktop, including frozen mouse cursor
- really very little disk activity (HD LED flashes very briefly less than
  once per second)
- reading commits stops completely during this period
- no music.

After that there is a period (another 5-15 seconds) with a huge amount of
disk activity during which the system gradually becomes responsive again
and in gitk the count of commits that have been read starts increasing
again (without a jump in the counter, which confirms that no commits were
read during the freeze).

I cannot really tell what the system is doing during those freezes.
Because of the frozen desktop I cannot for example see CPU usage.
I suspect that, as there is hardly any disk activity, the system must be
reorganizing RAM or something. But it seems quite bad that that gets
"bunched up" instead of happening more gradually.

With the congestion_wait() change reverted I never see these freezes, only
much more normal minor latencies (< 2 seconds; mostly < 0.5 seconds), which
are probably unavoidable during heavy swapping.

Hth,
FJP
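
Since a frozen desktop makes it impossible to watch anything while a stall is in progress, one way to at least timestamp the freezes and measure their length is a small watcher that sleeps in short steps and records every late wakeup. The following is only a minimal sketch and was not used for the observations above; the names find_stalls and watch, the 0.1 s interval and the 0.5 s threshold are all illustrative assumptions.

```python
import time


def find_stalls(timestamps, interval, threshold):
    """Return (start, lateness) pairs for wakeups that arrived too late.

    timestamps: monotonic clock readings taken once per sleep step.
    interval:   how long each sleep was supposed to last.
    threshold:  extra delay beyond `interval` that counts as a stall.
    """
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        late = cur - prev - interval
        if late > threshold:
            stalls.append((prev, late))
    return stalls


def watch(interval=0.1, threshold=0.5, duration=60.0):
    """Sleep in short steps for `duration` seconds and report late wakeups.

    A late wakeup is printed as soon as the process runs again, so the
    message shows up right after a freeze ends, not during it.
    """
    stamps = [time.monotonic()]
    while stamps[-1] - stamps[0] < duration:
        time.sleep(interval)
        stamps.append(time.monotonic())
        # Only the newest pair of samples needs checking on each step.
        for _, late in find_stalls(stamps[-2:], interval, threshold):
            stamp = time.strftime("%H:%M:%S")
            print(f"{stamp} wakeup was {late:.1f}s late", flush=True)
    return find_stalls(stamps, interval, threshold)


if __name__ == "__main__":
    watch()
```

Started from a terminal, with its output redirected to a file, before kicking off gitk, this would leave one line per freeze with the wall-clock time at which the system came back and roughly how long it was unresponsive. It says nothing about what the kernel was doing during the stall; it only puts numbers on the 10-20 second periods described above.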