Date: Mon, 16 Mar 2009 12:02:17 +0000
From: Mel Gorman
To: Nick Piggin
Cc: Linux Memory Management List, Pekka Enberg, Rik van Riel,
    KOSAKI Motohiro, Christoph Lameter, Johannes Weiner,
    Linux Kernel Mailing List, Lin Ming, Zhang Yanmin, Peter Zijlstra
Subject: Re: [PATCH 00/35] Cleanup and optimise the page allocator V3
Message-ID: <20090316120216.GB6382@csn.ul.ie>
References: <1237196790-7268-1-git-send-email-mel@csn.ul.ie>
    <20090316104054.GA23046@wotan.suse.de>
    <20090316111906.GA6382@csn.ul.ie>
    <20090316113358.GA30802@wotan.suse.de>
In-Reply-To: <20090316113358.GA30802@wotan.suse.de>

On Mon, Mar 16, 2009 at 12:33:58PM +0100, Nick Piggin wrote:
> On Mon, Mar 16, 2009 at 11:19:06AM +0000, Mel Gorman wrote:
> > On Mon, Mar 16, 2009 at 11:40:54AM +0100, Nick Piggin wrote:
> > > That's wonderful, but it would
> > > significantly increase the fragmentation problem, wouldn't it?
> >
> > Not necessarily, anti-fragmentation groups movable pages within a
> > hugepage-aligned block and high-order allocations will trigger a merge of
> > buddies from PAGE_ALLOC_MERGE_ORDER (defined in the relevant patch) up to
> > MAX_ORDER-1. Critically, a merge is also triggered when anti-fragmentation
> > wants to fall back to another migratetype to satisfy an allocation.
> > As long as the grouping works, it doesn't matter if they were only merged up
> > to PAGE_ALLOC_MERGE_ORDER as a full merge will still free up hugepages.
> > So two slow paths are made slower but the fast path should be faster and it
> > should be causing fewer cache line bounces due to writes to struct page.
>
> Oh, but the anti-fragmentation stuff is orthogonal to this. Movable
> groups should always be defragmentable (at some cost)... the bane of
> anti-frag is fragmentation of the non-movable groups.
>

True, the reclaimable area has varying degrees of success and the
non-movable groups are almost unworkable and depend on how much of them
is pagetables.

> And one reason why buddy is so good at avoiding fragmentation is
> because it will pick up _any_ pages that go past the allocator if
> they have any free buddies. And it hands out ones that don't have
> free buddies. So in that way it is naturally continually filtering
> out pages which can be merged.
>
> Whereas if you defer this until the point you need a higher order
> page, the only thing you have to work with are the pages that are
> free *right now*.
>

Well, buddy always uses the smallest available page first. Even with
deferred coalescing, it will merge up to order-5 at least. Let's say they
could have merged up to order-10 in ordinary circumstances, they are still
avoided for as long as possible. Granted, it might mean that an order-5 is
split that could have been merged but it's hard to tell how much of a
difference that makes.

> It will almost definitely increase fragmentation of non movable zones,
> and if you have a workload doing non-trivial, non movable higher order
> allocations that are likely to cause fragmentation, it will result
> in these allocations eating movable groups sooner I think.
>

I think the effect will be the same unless the non-movable high-order
allocations are order-5 or higher in which case we are likely going to hit
trouble anyway.

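As a rough illustration of the deferred merging discussed above, here is a
toy userspace model in Python. It is only a sketch and is not the code from
the patch series: the function names, the 32-page arena, MAX_ORDER = 6 and
MERGE_ORDER = 3 (a stand-in for PAGE_ALLOC_MERGE_ORDER) are all assumptions
chosen to keep the example small. Frees merge buddies only up to the limit,
and a separate slow path finishes the deferred merges on demand.

```python
MAX_ORDER = 6      # toy value (assumption): orders 0..5, arena of 32 pages
MERGE_ORDER = 3    # hypothetical stand-in for PAGE_ALLOC_MERGE_ORDER


def new_free_lists():
    """One set of free block start-PFNs per order."""
    return [set() for _ in range(MAX_ORDER)]


def free_block(free_lists, pfn, order, limit):
    """Free a block, merging with its free buddy only while order < limit."""
    while order < limit:
        buddy = pfn ^ (1 << order)          # buddy differs in exactly one bit
        if buddy not in free_lists[order]:
            break                           # buddy is in use, stop merging
        free_lists[order].remove(buddy)
        pfn = min(pfn, buddy)               # merged block starts at lower PFN
        order += 1
    free_lists[order].add(pfn)
    return order


def full_merge(free_lists):
    """Slow path: finish the deferred merges, up to MAX_ORDER - 1."""
    for order in range(MAX_ORDER - 1):
        for pfn in sorted(free_lists[order]):
            if pfn in free_lists[order]:    # may already have been merged away
                free_lists[order].remove(pfn)
                free_block(free_lists, pfn, order, MAX_ORDER - 1)
```

In this model, freeing all 32 pages one at a time with limit MERGE_ORDER
leaves four order-3 blocks on the free lists, and a later full_merge() call
still recovers the single order-5 block. That matches the argument that a
full merge can still free up the large blocks as long as the pages that are
free really are buddies. It says nothing about the long-term fragmentation
concern, which this toy does not model.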
>
> > When I last checked (about 10 days ago), I hadn't damaged anti-fragmentation
> > but that was a lot of revisions ago. I'm redoing the tests to make sure
> > anti-fragmentation is still ok.
>
> Your anti-frag tests probably don't stress this long term fragmentation
> problem.
>

Probably not, but we have little data on long-term fragmentation other than
anecdotal evidence that it's ok these days.

> Still, it's significant enough that I think it should be made
> optional (and arguably default to on) even if it does harm higher
> order allocations a bit.
>

I could make PAGE_ALLOC_MERGE_ORDER a proc tunable? If it's placed as a
read-mostly variable beside the gfp_zone table, it might even fit in the
same cache line.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab