Date: Mon, 1 Feb 2010 10:17:03 +0800
From: Wu Fengguang
To: Chris Frost
Cc: KAMEZAWA Hiroyuki, Andrew Morton, Steve Dickson, David Howells,
    Xu Chenfeng, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Steve VanDeBogart, Nick Piggin
Subject: Re: [PATCH] mm/readahead.c: update the LRU positions of in-core pages, too
Message-ID: <20100201021703.GA11260@localhost>
In-Reply-To: <20100201020639.GA27212@frostnet.net>

On Sun, Jan 31, 2010 at 07:06:39PM -0700, Chris Frost wrote:
> On Sun, Jan 31, 2010 at 10:31:42PM +0800, Wu Fengguang wrote:
> > On Tue, Jan 26, 2010 at 09:32:17PM +0800, Wu Fengguang wrote:
> > > On Mon, Jan 25, 2010 at 03:36:35PM -0700, Chris Frost wrote:
> > > > I changed Wu's patch to add a PageLRU() guard that I believe is
> > > > required, and optimized zone lock acquisition to only unlock and lock
> > > > at zone changes.  This optimization seems to provide a 10-20% system
> > > > time improvement for some of my GIMP benchmarks and no improvement
> > > > for other benchmarks.
> >
> > I feel very uncomfortable about this put_page() inside zone->lru_lock.
> > (It might deadlock: put_page() conditionally takes zone->lru_lock again.)
> >
> > If you really want the optimization, can we do it like this?
>
> Sorry that I was slow to respond.  (I was out of town.)
>
> Thanks for catching __page_cache_release() locking the zone.
> I think staying simple for now sounds good.  The below locks
> and unlocks the zone for each page.  Look good?

OK :)

Thanks,
Fengguang

> ---
> readahead: retain inactive lru pages to be accessed soon
> From: Chris Frost
>
> Ensure that cached pages in the inactive list are not prematurely evicted;
> move such pages to the lru head when they are covered by
> - in-kernel heuristic readahead
> - a posix_fadvise(POSIX_FADV_WILLNEED) hint from an application
>
> Before this patch, pages already in core may be evicted before the pages
> covered by the same prefetch scan that are not yet in core.  Many small
> read requests may be forced on the disk because of this behavior.
>
> In particular, posix_fadvise(... POSIX_FADV_WILLNEED) on an in-core page
> has no effect on the page's location in the LRU list, even if it is the
> next victim on the inactive list.
>
> This change helps address the performance problems we encountered
> while modifying SQLite and the GIMP to use large file prefetching.
> Overall these prefetching techniques improved the runtime of large
> benchmarks by 10-17x for these applications.
> More in the publication _Reducing Seek Overhead with Application-Directed
> Prefetching_ in USENIX ATC 2009 and at http://libprefetch.cs.ucla.edu/.
>
> Signed-off-by: Chris Frost
> Signed-off-by: Steve VanDeBogart
> Signed-off-by: KAMEZAWA Hiroyuki
> Signed-off-by: Wu Fengguang
> ---
>  readahead.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index aa1aa23..c615f96 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -9,7 +9,9 @@
>  
>  #include <linux/kernel.h>
>  #include <linux/fs.h>
> +#include <linux/memcontrol.h>
>  #include <linux/mm.h>
> +#include <linux/mm_inline.h>
>  #include <linux/module.h>
>  #include <linux/pagemap.h>
>  #include <linux/backing-dev.h>
> @@ -133,6 +135,40 @@ out:
>  }
>  
>  /*
> + * The file range is expected to be accessed in the near future.  Move such
> + * pages (possibly at the inactive lru tail) to the lru head, so that they
> + * are retained in memory for some reasonable time.
> + */
> +static void retain_inactive_pages(struct address_space *mapping,
> +				  pgoff_t index, int len)
> +{
> +	int i;
> +
> +	for (i = 0; i < len; i++) {
> +		struct page *page;
> +		struct zone *zone;
> +
> +		page = find_get_page(mapping, index + i);
> +		if (!page)
> +			continue;
> +		zone = page_zone(page);
> +		spin_lock_irq(&zone->lru_lock);
> +
> +		if (PageLRU(page) &&
> +		    !PageActive(page) &&
> +		    !PageUnevictable(page)) {
> +			int lru = page_lru_base_type(page);
> +
> +			del_page_from_lru_list(zone, page, lru);
> +			add_page_to_lru_list(zone, page, lru);
> +		}
> +
> +		spin_unlock_irq(&zone->lru_lock);
> +		put_page(page);
> +	}
> +}
> +
> +/*
>   * __do_page_cache_readahead() actually reads a chunk of disk.  It allocates all
>   * the pages first, then submits them all for I/O.  This avoids the very bad
>   * behaviour which would occur if page allocations are causing VM writeback.
> @@ -184,6 +220,14 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
>  	}
>  
>  	/*
> +	 * Normally readahead will auto stop on cached segments, so we won't
> +	 * hit many cached pages.  If it does happen, bring the inactive pages
> +	 * adjacent to the newly prefetched ones (if any).
> +	 */
> +	if (ret < nr_to_read)
> +		retain_inactive_pages(mapping, offset, page_idx);
> +
> +	/*
>  	 * Now start the IO.  We ignore I/O errors - if the page is not
>  	 * uptodate then the caller will launch readpage again, and
>  	 * will then handle the error.
>
> -- 
> Chris Frost
> http://www.frostnet.net/chris/
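
For reference, the application-directed hint the changelog describes is the
standard posix_fadvise(2) call.  Below is a minimal, hypothetical userspace
sketch (not part of the patch above) that hints a whole file before reading
it; with the patch applied, pages of that range which are already in core are
also moved to the head of the inactive LRU, so they are not evicted before
the rest of the range finishes coming in from disk.

/* willneed.c - hypothetical example, not part of the patch above.
 * Hint an entire file to the kernel before reading it.
 * Build: cc -o willneed willneed.c
 */
#define _POSIX_C_SOURCE 200112L

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	struct stat st;
	int fd, err;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0) {
		perror(argv[1]);
		return 1;
	}

	/*
	 * Ask the kernel to read the whole file ahead of time.  With the
	 * patch above, pages in this range that are already cached are
	 * also moved to the head of the inactive LRU list, so they are
	 * not evicted before the pages still being fetched from disk.
	 */
	err = posix_fadvise(fd, 0, st.st_size, POSIX_FADV_WILLNEED);
	if (err != 0)
		fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

	/* ... the application would now read the file as usual ... */

	close(fd);
	return 0;
}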