Date: Mon, 29 Nov 2010 15:49:54 +0800
From: Wu Fengguang
To: Minchan Kim
Cc: Andrew Morton, linux-mm, LKML, Ben Gamari, Peter Zijlstra,
    Rik van Riel, KOSAKI Motohiro, Johannes Weiner, Nick Piggin,
    Mel Gorman
Subject: Re: [PATCH v2 1/3] deactivate invalidated pages
Message-ID: <20101129074954.GB22803@localhost>
In-Reply-To: <7b50614882592047dfd96f6ca2bb2d0baa8f5367.1290956059.git.minchan.kim@gmail.com>

On Sun, Nov 28, 2010 at 11:02:55PM +0800, Minchan Kim wrote:
> This patch is based on mmotm-11-23.

I cannot find __pagevec_lru_deactive() in mmotm-11-23. Do you have any
more patches?

> Recently there was a report about thrashing:
> (http://marc.info/?l=rsync&m=128885034930933&w=2)
> It happens with backup workloads (e.g. nightly rsync), because such a
> workload creates use-once pages and touches each page twice. That
> promotes the pages to the active list and so evicts working set pages.
>
> Some application developers would like POSIX_FADV_NOREUSE, but other
> OSes don't support it either.
> (http://marc.info/?l=linux-mm&m=128928979512086&w=2)
>
> As another approach, application developers use POSIX_FADV_DONTNEED,
> but that has a problem: if the kernel finds a page under writeback
> during invalidate_mapping_pages, it cannot invalidate it. This makes
> it hard for application programmers to use, because they always have
> to sync the data before calling fadvise(..., POSIX_FADV_DONTNEED) to
> make sure the pages are discardable. In the end they lose the
> kernel's deferred writes and may see a performance loss.
> (http://insights.oetiker.ch/linux/fadvise.html)
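
(For reference, the workaround described above looks roughly like the
sketch below from userspace. It is only an illustration; write_chunk(),
the descriptor and the chunk geometry are made-up names, not anything
from this patch.)

#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <unistd.h>

/*
 * Write one chunk, force it to disk, then tell the kernel the pages
 * won't be reused.  Without the fdatasync(), POSIX_FADV_DONTNEED may
 * find the pages dirty or under writeback and leave them alone.
 */
static int write_chunk(int fd, const char *buf, size_t len, off_t off)
{
        if (pwrite(fd, buf, len, off) < 0)
                return -1;

        if (fdatasync(fd) < 0)
                return -1;

        return posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
}

The forced sync is exactly the extra step the changelog complains
about: it gives up the kernel's deferred writeback.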
> In fact, invalidation is a very strong hint to the reclaimer: it
> means we won't use the page any more. So let's move such a page to
> the inactive list's head.
>
> Why the head? A dirty/writeback page will be flushed sooner or later;
> this patch uses the PG_reclaim trick so that the page is moved to the
> tail of the inactive list once its writeout completes. That avoids
> writeout by pageout(), which is less efficient than the flusher's
> writeout.
>
> This patch takes page_mapped(page) as a working set hint, so a mapped
> page that is already inactive is left alone and keeps its chance to
> be activated.
>
> Originally I reused Peter's lru_demote with some changes, so I added
> his Signed-off-by.
>
> Note:
> The PG_reclaim trick for writeback pages can race with
> end_page_writeback, so this patch checks PageWriteback once more.
> That makes the race window really small; in theory a race remains,
> but it is trivial.
>
> Quote from fe3cba17, with some modification:
> "If some page gets PG_reclaim unintentionally, it will confuse
> readahead and make it restart the size rampup process. But it's a
> trivial problem, and can mostly be avoided by checking
> PageWriteback(page) first in readahead."
>
> The PG_reclaim trick for dirty pages doesn't work yet, since
> clear_page_dirty_for_io always clears PG_reclaim. The next patch will
> fix that.
>
> Reported-by: Ben Gamari
> Signed-off-by: Minchan Kim
> Signed-off-by: Peter Zijlstra
> Cc: Wu Fengguang
> Cc: Rik van Riel
> Cc: KOSAKI Motohiro
> Cc: Johannes Weiner
> Cc: Nick Piggin
> Cc: Mel Gorman
>
> Changelog since v1:
>  - modify description
>  - correct typo
>  - add some comments
>  - change deactivation policy
> ---
>  mm/swap.c |   84 +++++++++++++++++++++++++++++++++++++++++++++---------------
>  1 files changed, 63 insertions(+), 21 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 31f5ec4..345eca1 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -268,10 +268,65 @@ void add_page_to_unevictable_list(struct page *page)
>  	spin_unlock_irq(&zone->lru_lock);
>  }
>
> -static void __pagevec_lru_deactive(struct pagevec *pvec)
> +/*
> + * This function is used by invalidate_mapping_pages.
> + * If the page can't be invalidated, this function moves the page
> + * into inative list's head or tail to reclaim ASAP and evict
> + * working set page.
> + *
> + * PG_reclaim means when the page's writeback completes, the page
> + * will move into tail of inactive for reclaiming ASAP.
> + *
> + * 1. active, mapped page -> inactive, head
> + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> + * 3. inactive, mapped page -> none
> + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> + * 5. others -> none
> + *
> + * In 4, why it moves inactive's head, the VM expects the page would
> + * be writeout by flusher. The flusher's writeout is much effective than
> + * reclaimer's random writeout.
> + */
> +static void __lru_deactivate(struct page *page, struct zone *zone)
>  {
> -	int i, lru, file;
> +	int lru, file;
> +	int active = 0;
> +
> +	if (!PageLRU(page))
> +		return;
> +
> +	if (PageActive(page))
> +		active = 1;
> +	/* Some processes are using the page */
> +	if (page_mapped(page) && !active)
> +		return;

It's good to check such protections if doing heuristic demotion.

However, if it is requested explicitly by the user, I'm _much more_
inclined to act stupid&dumb and meet the user's expectation. Or will
this code be called by something other than DONTNEED? Sorry, I have no
context of the full code.

> +	else if (PageWriteback(page)) {
> +		SetPageReclaim(page);
> +		/* Check race with end_page_writeback */
> +		if (!PageWriteback(page))
> +			ClearPageReclaim(page);

Does the double check help a lot?
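
To spell out the window that remains (just my reading; this assumes
end_page_writeback() tests and clears PG_reclaim _before_ it clears
PG_writeback, as it does in mainline):

        CPU 0 (__lru_deactivate)            CPU 1 (end_page_writeback)
        ------------------------            --------------------------
        PageWriteback() == 1
                                            TestClearPageReclaim() /* not set */
        SetPageReclaim()
        PageWriteback() == 1, re-check passes
                                            test_clear_page_writeback()

The page then ends up with a stale PG_reclaim after writeback has
already completed, which I think is the "theoretical but trivial" race
the changelog mentions; the double check narrows the window but cannot
close it.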
> +	} else if (PageDirty(page))
> +		SetPageReclaim(page);

Typically there are many more dirty pages than writeback pages. I
guess this is a good place to call bdi_start_inode_writeback(), which
was posted here:

http://www.spinics.net/lists/linux-mm/msg10833.html

Thanks,
Fengguang

> +
> +	file = page_is_file_cache(page);
> +	lru = page_lru_base_type(page);
> +	del_page_from_lru_list(zone, page, lru + active);
> +	ClearPageActive(page);
> +	ClearPageReferenced(page);
> +	add_page_to_lru_list(zone, page, lru);
> +	if (active)
> +		__count_vm_event(PGDEACTIVATE);
> +
> +	update_page_reclaim_stat(zone, page, file, 0);
> +}
>
> +/*
> + * This function must be called with preemption disable.
> + */
> +static void __pagevec_lru_deactivate(struct pagevec *pvec)
> +{
> +	int i;
>  	struct zone *zone = NULL;
>
>  	for (i = 0; i < pagevec_count(pvec); i++) {
> @@ -284,21 +339,7 @@ static void __pagevec_lru_deactive(struct pagevec *pvec)
>  			zone = pagezone;
>  			spin_lock_irq(&zone->lru_lock);
>  		}
> -
> -		if (PageLRU(page)) {
> -			if (PageActive(page)) {
> -				file = page_is_file_cache(page);
> -				lru = page_lru_base_type(page);
> -				del_page_from_lru_list(zone, page,
> -							lru + LRU_ACTIVE);
> -				ClearPageActive(page);
> -				ClearPageReferenced(page);
> -				add_page_to_lru_list(zone, page, lru);
> -				__count_vm_event(PGDEACTIVATE);
> -
> -				update_page_reclaim_stat(zone, page, file, 0);
> -			}
> -		}
> +		__lru_deactivate(page, zone);
>  	}
>  	if (zone)
>  		spin_unlock_irq(&zone->lru_lock);
> @@ -336,11 +377,13 @@ static void drain_cpu_pagevecs(int cpu)
>
>  	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
>  	if (pagevec_count(pvec))
> -		__pagevec_lru_deactive(pvec);
> +		__pagevec_lru_deactivate(pvec);
>  }
>
>  /*
> - * Forecfully demote a page to the tail of the inactive list.
> + * Forcefully deactivate a page.
> + * This function is used for reclaiming the page ASAP when the page
> + * can't be invalidated by Dirty/Writeback.
>   */
>  void lru_deactivate_page(struct page *page)
>  {
> @@ -348,12 +391,11 @@ void lru_deactivate_page(struct page *page)
>  		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
>
>  		if (!pagevec_add(pvec, page))
> -			__pagevec_lru_deactive(pvec);
> +			__pagevec_lru_deactivate(pvec);
>  		put_cpu_var(lru_deactivate_pvecs);
>  	}
>  }
>
> -
>  void lru_add_drain(void)
>  {
>  	drain_cpu_pagevecs(get_cpu());
> --
> 1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/