Date: Wed, 4 May 2005 09:20:01 +0100 (IST)
From: Mel Gorman
To: Joel Schopp
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@osdl.org
Subject: Re: [PATCH] Avoiding external fragmentation with a placement policy Version 10
In-Reply-To: <4277D13F.3050804@austin.ibm.com>
References: <20050503164452.A3AB9E592@skynet.csn.ul.ie> <4277D13F.3050804@austin.ibm.com>

On Tue, 3 May 2005, Joel Schopp wrote:

> Comments inline below.
>
> > o Tightened what pools are used for fallbacks, less likely to fragment
> > o Many micro-optimisations to have the same performance as the standard
> >   allocator. Modified allocator now faster than standard allocator using
> >   gcc 3.3.5
>
> Nice.
>
> > o Increased the size of reserve for fallbacks from 10% to 12.5%.
>
> This just screams out for a tunable.  Systems with different workloads and
> different amounts of memory will behave better with different values.  It
> would be even better if it would self tune, but that might prove difficult.
>

The 12.5% was chosen as it's slightly above what the low-level watermarks
are. I don't think it needs to be a tunable, although I agree that
calculating it based on the zone watermarks would be a better idea than me
guessing 12.5%.

> > Difference in performance operations report generated by diff-aim9.sh
> > from VMRegress 0.14
> >
> >  N  Test          Standard   MBuddy V10      Diff  % diff  Test description
> >                    Ops/sec      Ops/sec   Ops/sec
> > --  ----------- ---------- ------------ ---------  ------  ----------------
> >  1  add_double   460569.72    465222.46   4652.74   1.01%  Thousand Double Precision Additions/second
> >  2  add_float    460523.25    465322.45   4799.20   1.04%  Thousand Single Precision Additions/second
> >  3  add_long    1421763.04   1436042.64  14279.60   1.00%  Thousand Long Integer Additions/second
> >  4  add_int     1421763.04   1436042.64  14279.60   1.00%  Thousand Integer Additions/second
> >  5  add_short   1421363.11   1435760.71  14397.60   1.01%  Thousand Short Integer Additions/second
> >  7  page_test    121048.16    123059.49   2011.33   1.66%  System Allocations & Pages/second
> >  8  brk_test     445743.79    452407.93   6664.14   1.50%  System Memory Allocations/second
> >  9  jmp_test    4158416.67   4232083.33  73666.66   1.77%  Non-local gotos/second
> > 10  signal_test   94417.60     94584.24    166.64   0.18%  Signal Traps/second
> > 11  exec_test        65.04        66.69      1.65   2.54%  Program Loads/second
> > 12  fork_test      1537.82      1730.51    192.69  12.53%  Task Creations/second
> > 13  link_test      6411.28      6477.45     66.17   1.03%  Link/Unlink Pairs/second
> >
> > The aim9 results show that there are consistent improvements for common
> > page-related operations. The results are compiler dependent and there are
> > variances of 1-2% between versions.
>
> Any explanation for why fork_test shows markedly better improvement
> compared to the others?
>

Not a clue.
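Going back to the reserve sizing point above: as a rough, untested sketch
(illustrative only, not what the patch implements), deriving the fallback
reserve from the zone watermarks instead of hard-coding 12.5% could look
something like the following. The choice of pages_high/present_pages and the
scaling factor are assumptions made purely for the sketch.

/*
 * Illustrative only -- not the patch's code.  Size the per-zone
 * fallback reserve from the existing watermarks rather than a fixed
 * 12.5% of the zone.  The 1.25x scaling and the quarter-of-zone cap
 * are guesses for the sketch.
 */
static unsigned long fallback_reserve_pages(struct zone *zone)
{
	/* Start slightly above the highest watermark ... */
	unsigned long reserve = zone->pages_high + (zone->pages_high >> 2);

	/* ... but never claim more than a quarter of the zone */
	if (reserve > zone->present_pages / 4)
		reserve = zone->present_pages / 4;

	return reserve;
}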
> > -#define __GFP_BITS_SHIFT 16 /* Room for 16 __GFP_FOO bits */
> > +#define __GFP_BITS_SHIFT 18 /* Room for 16 __GFP_FOO bits */
>
> Comment should have the new 18, not the old 16.
>

Oops, correct.

> > +#ifdef CONFIG_ALLOCSTATS
> > +	/*
> > +	 * These are beancounters that track how the placement policy
> > +	 * of the buddy allocator is performing
> > +	 */
> > +	unsigned long fallback_count[ALLOC_TYPES];
> > +	unsigned long alloc_count[ALLOC_TYPES];
> > +	unsigned long reserve_count[ALLOC_TYPES];
> > +	unsigned long kernnorclm_full_steal;
> > +	unsigned long kernnorclm_partial_steal;
> > +	unsigned long bulk_requests[MAX_ORDER];
> > +	unsigned long bulk_alloced[MAX_ORDER];
> > +#endif
>
> It would be nice if all of the CONFIG_ALLOCSTATS stuff was broken out as a
> second patch.  It would make this patch much smaller and more readable.
>

I can do that. They were kept as one patch because I almost always collect
the statistics, as it's easier to figure out what is happening. I've also
found that benchmark results tend to be better if I collect statistics
(which makes no sense but was consistently true).

> > +int fallback_allocs[ALLOC_TYPES][ALLOC_TYPES] = {
> > +	{ALLOC_KERNNORCLM, ALLOC_FALLBACK, ALLOC_KERNRCLM, ALLOC_USERRCLM},
> > +	{ALLOC_KERNRCLM, ALLOC_FALLBACK, ALLOC_KERNNORCLM, ALLOC_USERRCLM},
>
> I would have thought that KernRclm would want to choose USERRCLM over
> KERNNORCLM.
>

No, because UserRclm is by far the easiest type to free pages in, whereas
KernRclm pages need a lot more work, specifically the ability to reclaim a
slab page on demand. From the point of view of fragmentation, KernNoRclm is
already lost, so I prefer to fall back there rather than anywhere else. The
problem is that fragmentation can accelerate when a KernNoRclm allocation
needs to fall back. I have not found it to be a problem in the benchmarks
I've run, but there may be cases in the future where we need to be a lot
more strict about fallbacks.

> > +	{ALLOC_USERRCLM, ALLOC_FALLBACK, ALLOC_KERNNORCLM, ALLOC_KERNRCLM},
>
> I'm almost certain the UserRclm type should prefer KERNRCLM over
> KERNNORCLM.
>

Again no, for similar reasons to why KernRclm falls back to KernNoRclm.

> > +	 * Here, the alloc type lists has been depleted as well as the global
> > +	 * pool, so fallback. When falling back, the largest possible block
> > +	 * will be taken to keep the fallbacks clustered if possible
> > +	 */
>
> I was curious if you had tried taking the smallest possible block.  I would
> think that it would reduce the amount of fallback needed, and thus increase
> the amount available for the 3 allocation types.  I would expect a net win,
> despite not clustering fallbacks particularly well.
>

In tests, I found it to be a net loss because it increased the number of
fallbacks. If we fall back with an order-0 block, we'll also fall back on
the next allocation. However, if we "steal" a large block of pages,
fallbacks will be delayed until that large block is consumed.

> > +		alloctype = fallback_list[retry_count];
> > +
> > +		/* Find a block to allocate */
> > +		area = zone->free_area_lists[alloctype] + (MAX_ORDER-1);
> > +		current_order=MAX_ORDER;
> > +		do {
> > +			current_order--;
> > +			if (list_empty(&area->free_list)) {
> > +				area--;
> > +				continue;
> > +			}
> > +
> > +			goto remove_page;
> > +		} while (current_order != order);
> > +	}
>
> This loop is a bit hard to understand.
> I think it would be easier to understand if it looked something like this
> (totally untested):
>
> +		current_order = MAX_ORDER - 1;
> +		do {
> +			if (!list_empty(&area->free_list)) {
> +				goto remove_page;
> +			}
> +
> +			area--;
> +			current_order--;
> +		} while (current_order >= order);
>

Will try it out, thanks.

--
Mel Gorman
Part-time Phd Student                          Java Applications Developer
University of Limerick                         IBM Dublin Software Lab
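As an aside (illustrative only, not code from the patch): the restructured
search could also be written as a small helper along the lines below. The
free_area_lists layout and field names are assumed from the hunks quoted
above, and a signed counter is used so that an order-0 request cannot wrap
current_order past zero the way an unsigned counter would in the do/while
form.

/*
 * Illustrative sketch of the fallback search: walk the free areas for
 * this allocation type from the largest order down to the requested
 * order and return the first non-empty one, or NULL if nothing fits.
 * The caller is expected to do the actual page removal.
 */
static struct free_area *find_fallback_area(struct zone *zone,
			int alloctype, unsigned int order)
{
	struct free_area *area;
	int current_order;

	area = zone->free_area_lists[alloctype] + (MAX_ORDER - 1);
	for (current_order = MAX_ORDER - 1; current_order >= (int)order;
			current_order--, area--) {
		if (!list_empty(&area->free_list))
			return area;
	}

	return NULL;
}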