Date: Thu, 26 Nov 2009 13:56:15 +0000
From: Mel Gorman
To: Mike Galbraith
Cc: Bartlomiej Zolnierkiewicz, Jens Axboe, Andrew Morton, Linus Torvalds,
    Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker,
    KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter,
    Stephan von Krawczynski, "Rafael J. Wysocki", linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: Re: [PATCH-RFC] cfq: Disable low_latency by default for 2.6.32
Message-ID: <20091126135615.GD13095@csn.ul.ie>
References: <20091126121945.GB13095@csn.ul.ie>
    <1259240937.7371.15.camel@marge.simson.net>
    <200911261420.57121.bzolnier@gmail.com>
    <1259242651.6622.5.camel@marge.simson.net>
In-Reply-To: <1259242651.6622.5.camel@marge.simson.net>

On Thu, Nov 26, 2009 at 02:37:31PM +0100, Mike Galbraith wrote:
> On Thu, 2009-11-26 at 14:20 +0100, Bartlomiej Zolnierkiewicz wrote:
> > On Thursday 26 November 2009 02:08:57 pm Mike Galbraith wrote:
> > > On Thu, 2009-11-26 at 12:19 +0000, Mel Gorman wrote:
> > > > (cc'ing the people from the page allocator failure thread as this might be
> > > > relevant to some of their problems)
> > > >
> > > > I know this is very last minute but I believe we should consider disabling
> > > > the "low_latency" tunable for block devices by
> > > > default for 2.6.32. There was evidence that low_latency was a problem
> > > > last week for page allocation failure reports but the reproduction-case
> > > > was unusual and involved high-order atomic allocations in low-memory
> > > > conditions. It took another few days to accurately show the problem for
> > > > more normal workloads and it's a bit more wide-spread than just
> > > > allocation failures.
> > > >
> > > > Basically, low_latency looks great as long as you have plenty of memory
> > > > but in low memory situations, it appears to cause problems that manifest
> > > > as reduced performance, desktop stalls and in some cases, page allocation
> > > > failures. I think most kernel developers are not seeing the problem as they
> > > > tend to test on beefier machines and without hitting swap or low-memory
> > > > situations for the most part. When they are hitting low-memory situations,
> > > > it tends to be for stress tests where stalls and low performance are expected.
> > >
> > > Ouch. It was bad desktop stalls under heavy write that kicked the whole
> > > thing off.
> >
> > The problem is that 'desktop' means different things for different people
> > (for some kernel developers 'desktop' is more like 'a workstation' and for
> > others it is more like 'an embedded device').

Will concede that - the term "desktop" is fuzzy at best. The characteristics
of note are a mid-range machine running workloads that are not steady, have
abrupt phase changes and are not very well sized to the available memory.
"Desktops" fall into this category, but badly or borderline provisioned
servers could also fall into it.

>
> The stalls I'm talking about were reported for garden variety desktop
> PC.

The stalls I'm seeing on the laptop are tiny but there. It's perfectly
possible that a whole host of stalls have been resolved for people but that
one corner case remains.

> I reproduced them on my supermarket special Q6600 desktop PC.
> That problem has been with us roughly forever, but I'd hoped it had
> been cured. Guess not.
>

It's possible the corner case causing stalls is specific to low memory
rather than to writes. Conceivably, what is going wrong is that writes need
to complete for pages to be clean so that they can be reclaimed. The
cleaning of pages is getting pre-empted by sync IO until pages can no longer
be reclaimed, at which point processes stall while the writes complete.

I'll prototype something to disable low_latency if kswapd is awake. If it
makes a difference, this explanation might be plausible. As Jens would say
though, this is "mostly hand-wavy nonsense".

> As an idle speculation, I wonder if the sync vs async slice ratios may
> not have been knocked out of kilter a bit by giving more to sync.
>

I don't know enough to speculate.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
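On kernels of this era with CFQ as the active I/O scheduler, the low_latency tunable discussed in this thread is exposed per device at /sys/block/<dev>/queue/iosched/low_latency (1 = enabled, 0 = disabled). The following is a minimal sketch for toggling it on every CFQ-managed device, for example to compare behaviour under low memory with the tunable on and off. It assumes the bracketed format of queue/scheduler (e.g. "noop deadline [cfq]") and root privileges for the writes. The helper names active_scheduler and set_cfq_low_latency are hypothetical. CFQ was later removed from mainline, so on current kernels the sketch simply finds nothing to change.

```python
from pathlib import Path
from typing import List


def active_scheduler(sysfs_block: Path, dev: str) -> str:
    """Return the active I/O scheduler for dev.

    queue/scheduler lists all available schedulers and marks the active
    one with brackets, e.g. "noop deadline [cfq]".
    """
    text = (sysfs_block / dev / "queue" / "scheduler").read_text()
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return ""


def set_cfq_low_latency(enabled: bool,
                        sysfs_block: Path = Path("/sys/block")) -> List[str]:
    """Write 1 or 0 to low_latency on every device whose active scheduler is cfq.

    Hypothetical helper; returns the names of the devices it changed.
    Writing to the real /sys/block requires root.
    """
    changed = []
    for dev_dir in sorted(sysfs_block.iterdir()):
        sched_file = dev_dir / "queue" / "scheduler"
        knob = dev_dir / "queue" / "iosched" / "low_latency"
        # Skip devices without a scheduler file or without the CFQ tunable.
        if not sched_file.exists() or not knob.exists():
            continue
        if active_scheduler(sysfs_block, dev_dir.name) != "cfq":
            continue
        knob.write_text("1\n" if enabled else "0\n")
        changed.append(dev_dir.name)
    return changed
```

Calling set_cfq_low_latency(False) as root before a test run and set_cfq_low_latency(True) afterwards turns the tunable off and back on. The sysfs root is a parameter so the logic can be exercised against a scratch directory rather than a live system.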