Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757559Ab0DRTKr (ORCPT ); Sun, 18 Apr 2010 15:10:47 -0400
Received: from mexforward.lss.emc.com ([128.222.32.20]:47463 "EHLO mexforward.lss.emc.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753515Ab0DRTKq (ORCPT ); Sun, 18 Apr 2010 15:10:46 -0400
Date: Sun, 18 Apr 2010 15:10:07 -0400
To: "Andrew Morton", "Mel Gorman"
Subject: Re: [PATCH] mm: disallow direct reclaim page writeback
From: "Sorin Faibish"
Cc: "Dave Chinner", "KOSAKI Motohiro", "Chris Mason", linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Content-Type: text/plain; format=flowed; delsp=yes; charset=iso-8859-15
Message-ID:
User-Agent: Opera Mail/9.63 (Win32)
MIME-Version: 1.0
References: <20100413202021.GZ13327@think> <20100414014041.GD2493@dastard> <20100414155233.D153.A69D9226@jp.fujitsu.com> <20100414072830.GK2493@dastard> <20100414085132.GJ25756@csn.ul.ie> <20100415013436.GO2493@dastard> <20100415102837.GB10966@csn.ul.ie> <20100416041412.GY2493@dastard> <20100416151403.GM19264@csn.ul.ie> <20100417203239.dda79e88.akpm@linux-foundation.org>
In-Reply-To: <20100417203239.dda79e88.akpm@linux-foundation.org>
Content-Transfer-Encoding: 8bit
X-EMM-EM: Active
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3544
Lines: 95

On Sat, 17 Apr 2010 20:32:39 -0400, Andrew Morton wrote:

>
> There are two issues here: stack utilisation and poor IO patterns in
> direct reclaim. They are different.
>
> The poor IO patterns thing is a regression. Some time several years
> ago (around 2.6.16, perhaps), page reclaim started to do a LOT more
> dirty-page writeback than it used to. AFAIK nobody attempted to work
> out why, nor attempted to try to fix it.

I for one am looking very seriously at this problem together with Bruce.
We plan to have a discussion on this topic at the next LSF meeting in
Boston.

>
>
> Doing writearound in pageout() might help. The kernel was in fact
> doing that around 2.5.10, but I took it out again because it wasn't
> obviously beneficial.
>
> Writearound is hard to do, because direct-reclaim doesn't have an easy
> way of pinning the address_space: it can disappear and get freed under
> your feet. I was able to make this happen under intense MM loads. The
> current page-at-a-time pageout code pins the address_space by taking a
> lock on one of its pages. Once that lock is released, we cannot touch
> *mapping.
>
> And lo, the pageout() code is presently buggy:
>
>         res = mapping->a_ops->writepage(page, &wbc);
>         if (res < 0)
>                 handle_write_error(mapping, page, res);
>
> The ->writepage can/will unlock the page, and we're passing a hand
> grenade into handle_write_error().
>
> Any attempt to implement writearound in pageout will need to find a way
> to safely pin that address_space. One way is to take a temporary ref
> on mapping->host, but IIRC that introduced nasties with inode_lock.
> Certainly it'll put more load on that worrisomely-singleton lock.
>
>
> Regarding simply not doing any writeout in direct reclaim (Dave's
> initial proposal): the problem is that pageout() will clean a page in
> the target zone. Normal writeout won't do that, so we could get into a
> situation where vast amounts of writeout is happening, but none of it
> is cleaning pages in the zone which we're trying to allocate from.
> It's quite possibly livelockable, too.
>
> Doing writearound (if we can get it going) will solve that adequately
> (assuming that the target page gets reliably written), but it won't
> help the stack usage problem.
>
>
> To solve the IO-pattern thing I really do think we should first work
> out ytf we started doing much more IO off the LRU. What caused it? Is
> it really unavoidable?
>
>
> To solve the stack-usage thing: dunno, really.
> One could envisage code
> which skips pageout() if we're using more than X amount of stack, but
> that sucks. Another possibility might be to hand the target page over
> to another thread (I suppose kswapd will do) and then synchronise with
> that thread - get_page()+wait_on_page_locked() is one way. The helper
> thread could of course do writearound.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

--
Best Regards
Sorin Faibish
Corporate Distinguished Engineer
Network Storage Group

        EMC² where information lives

Phone: 508-435-1000 x 48545
Cellphone: 617-510-0422
Email : sfaibish@emc.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
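
For readers following the pageout() hazard described in the quoted mail (->writepage dropping the page lock before handle_write_error() touches the mapping), here is a minimal userspace sketch of one possible guard: re-take the page lock and re-check that the page still belongs to the mapping before dereferencing it. Everything prefixed toy_ or TOY_ is a hypothetical stand-in invented for illustration, the model is single-threaded, and this is not a claim about what the kernel code does or should do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_EIO 5   /* hypothetical error code for the model */

/* Toy stand-ins; none of these are the kernel's real types. */
struct toy_mapping {
    int error;                    /* last write error recorded here */
};

struct toy_page {
    bool locked;                  /* models the page lock */
    struct toy_mapping *mapping;  /* NULL once the page is detached */
};

static void toy_lock_page(struct toy_page *page)
{
    assert(!page->locked);        /* single-threaded model: no contention */
    page->locked = true;
}

static void toy_unlock_page(struct toy_page *page)
{
    assert(page->locked);
    page->locked = false;
}

/* Models a failing ->writepage() that unlocks the page before returning. */
static int toy_writepage(struct toy_page *page)
{
    toy_unlock_page(page);
    return -TOY_EIO;
}

/*
 * Record the error only if the page still belongs to `mapping`.
 * `mapping` is compared, but not dereferenced, until the check
 * has passed with the page lock held.
 */
static bool toy_handle_write_error(struct toy_mapping *mapping,
                                   struct toy_page *page, int error)
{
    bool recorded = false;

    toy_lock_page(page);
    if (page->mapping == mapping) {
        mapping->error = error;
        recorded = true;
    }
    toy_unlock_page(page);
    return recorded;
}

/* Pageout-like flow; `detached_meanwhile` simulates a racing truncate. */
static bool toy_pageout(struct toy_page *page, bool detached_meanwhile)
{
    struct toy_mapping *mapping = page->mapping;
    int res;

    toy_lock_page(page);
    res = toy_writepage(page);    /* the page lock is dropped in here */
    if (detached_meanwhile)
        page->mapping = NULL;     /* after unlock, the mapping may go away */
    if (res < 0)
        return toy_handle_write_error(mapping, page, res);
    return false;
}
```

In this model the error is recorded when the page is still attached, and skipped (without touching the stale pointer) when the page was detached after the unlock. It does nothing for the writearound case, which still needs some way of pinning the address_space across several pages.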