Date: Wed, 15 Dec 2010 10:54:06 +0000
From: Mel Gorman
To: Andrew Morton
Cc: Simon Kirby, KOSAKI Motohiro, Shaohua Li, Dave Hansen, Johannes Weiner, linux-mm, linux-kernel
Subject: Re: [PATCH 2/6] mm: kswapd: Keep kswapd awake for high-order allocations until a percentage of the node is balanced
Message-ID: <20101215105406.GI13914@csn.ul.ie>
References: <1291995985-5913-1-git-send-email-mel@csn.ul.ie> <1291995985-5913-3-git-send-email-mel@csn.ul.ie> <20101214144341.71b43cb5.akpm@linux-foundation.org>
In-Reply-To: <20101214144341.71b43cb5.akpm@linux-foundation.org>

On Tue, Dec 14, 2010 at 02:43:41PM -0800, Andrew Morton wrote:
> On Fri, 10 Dec 2010 15:46:21 +0000
> Mel Gorman wrote:
>
> > When reclaiming for high-orders, kswapd is responsible for balancing a
> > node but it should not reclaim excessively. It avoids excessive reclaim by
> > considering if any zone in a node is balanced then the node is balanced.
>
> Here you're referring to your [patch 1/6] yes? Not to current upstream.
>

Yes.

> > In
> > the cases where there are imbalanced zone sizes (e.g. ZONE_DMA with both
> > ZONE_DMA32 and ZONE_NORMAL), kswapd can go to sleep prematurely as just
> > one small zone was balanced.
>
> Since [1/6]?
>

Yes.

> > This alters the sleep logic of kswapd slightly. It counts the number of pages
> > that make up the balanced zones.
> > If the total number of balanced pages is
>
> Define "balanced page"? Seems to be the sum of the total sizes of all
> zones which have reached their desired free-pages threshold?
>

Correct.

> But this includes all page orders, whereas here we're targeting a
> particular order. Although things should work out OK due to the
> scaling/sizing proportionality.
>

It's the size of the whole zone that is being accounted for and, as it's
a watermark check, the order is being taken into account.

> > more than a quarter of the zone, kswapd will go back to sleep. This should
> > keep a node balanced without reclaiming an excessive number of pages.
>
> ick.
>
> > Signed-off-by: Mel Gorman
> > Reviewed-by: Minchan Kim
> > ---
> >  mm/vmscan.c |   58 +++++++++++++++++++++++++++++++++++++++++++++++++---------
> >  1 files changed, 49 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 625dfba..6723101 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2191,10 +2191,40 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
> >  }
> >  #endif
> >
> > +/*
> > + * pgdat_balanced is used when checking if a node is balanced for high-order
> > + * allocations.
>
> Is this the correct use of the term "balanced"? I think "balanced" is
> something that happens *between* zones: They've all achieved the same
> (perhaps weighted) ratio of free pages.
>

What would be a better term? pgdat_sufficiently_but_not_fully_balanced()?
If it returns true, it can mean the node is either fully "balanced" as
you define it or that enough zones have enough free suitably-ordered pages
for allocations to succeed.

> > Only zones that meet watermarks and are in a zone allowed
> > + * by the callers classzone_idx are added to balanced_pages. The total of
>
> caller's
>

Right.

> > + * balanced pages must be at least 25% of the zones allowed by classzone_idx
> > + * for the node to be considered balanced.
> > Forcing all zones to be balanced
> > + * for high orders can cause excessive reclaim when there are imbalanced zones.
>
> Excessive reclaim of what?
>

Slab, list rotations and pages within the imbalanced zones that may never
become balanced. At a minimum, kswapd just stays awake consuming CPU.

> If one particular zone is having trouble achieving its desired level of
> free pages of a particular order, are you saying that kswapd sits there
> madly scanning other zones, which have already reached their desired
> level? If so, that would be bad.
>

As far as I can gather, yes, this is what is happening. I don't have a
local reproduction case so I'm basing this on a bug report. The reporter
has two problems: kswapd stays awake constantly and far too many pages
are free.

> I think you're saying that we just keep on scanning away at this one
> zone. But what was wrong with doing that?
>

It wastes CPU.

> > + * The choice of 25% is due to
> > + *   o a 16M DMA zone that is balanced will not balance a zone on any
> > + *     reasonable sized machine
>
> How does a zone balance another zone?
>

That should have been "will not balance a node".

> > + *   o On all other machines, the top zone must be at least a reasonable
> > + *     percentage of the middle zones. For example, on 32-bit x86, highmem
> > + *     would need to be at least 256M for it to balance a whole node.
> > + *     Similarly, on x86-64 the Normal zone would need to be at least 1G
> > + *     to balance a node on its own. These seemed like reasonable ratios.
> > + */
> > +static bool pgdat_balanced(pg_data_t *pgdat, unsigned long balanced_pages,
> > +						int classzone_idx)
> > +{
> > +	unsigned long present_pages = 0;
> > +	int i;
> > +
> > +	for (i = 0; i <= classzone_idx; i++)
> > +		present_pages += pgdat->node_zones[i].present_pages;
> > +
> > +	return balanced_pages > (present_pages >> 2);
> > +}
> > +
> >
> > ...
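To make the 25% rule concrete, here is a rough user-space sketch of the
same check. It is not the kernel code: struct zone_sketch, MB_TO_PAGES()
and pgdat_balanced_sketch() are hypothetical names made up for
illustration, 4K pages are assumed, and unlike the patch (where the caller
passes balanced_pages in) the sketch sums the balanced zones itself so it
can be read in isolation.

```c
#include <stdbool.h>

/* Assumption: 4K pages, so 1M of memory is 256 pages. */
#define MB_TO_PAGES(mb) ((unsigned long)(mb) * 256UL)

/* Hypothetical stand-in for the per-zone state the kernel tracks. */
struct zone_sketch {
	unsigned long present_pages;	/* size of the whole zone in pages */
	bool meets_watermark;		/* outcome of the order-aware watermark check */
};

/*
 * Same comparison as pgdat_balanced() in the patch: the zones that meet
 * their watermark must add up to more than a quarter of all pages in the
 * zones allowed by classzone_idx.
 */
static bool pgdat_balanced_sketch(const struct zone_sketch *zones,
				  int classzone_idx)
{
	unsigned long present_pages = 0;
	unsigned long balanced_pages = 0;
	int i;

	for (i = 0; i <= classzone_idx; i++) {
		present_pages += zones[i].present_pages;
		if (zones[i].meets_watermark)
			balanced_pages += zones[i].present_pages;
	}

	/* Strictly greater than 25%; exactly a quarter is not enough. */
	return balanced_pages > (present_pages >> 2);
}
```

With a made-up layout of a 16M DMA zone, a 3056M DMA32 zone and a 1024M
Normal zone, a balanced DMA zone on its own does not satisfy the check for
classzone_idx 2, while it trivially does when classzone_idx restricts the
caller to the DMA zone alone.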
> >
>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab