From: Joel Schopp
Reply-To: jschopp@austin.ibm.com
Date: Tue, 03 May 2005 14:30:07 -0500
To: Mel Gorman
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@osdl.org
Subject: Re: [PATCH] Avoiding external fragmentation with a placement policy Version 10
Message-ID: <4277D13F.3050804@austin.ibm.com>
In-Reply-To: <20050503164452.A3AB9E592@skynet.csn.ul.ie>

Comments inline below.

> o Tightened what pools are used for fallbacks, less likely to fragment
> o Many micro-optimisations to have the same performance as the standard
>   allocator. Modified allocator now faster than standard allocator using
>   gcc 3.3.5

Nice.

> o Increased the size of reserve for fallbacks from 10% to 12.5%.

This just screams out for a tunable.  Systems with different workloads and
different amounts of memory will behave better with different values.  It
would be even better if it would self-tune, but that might prove difficult.

> Difference in performance operations report generated by diff-aim9.sh
> from VMRegress 0.14
>  N Test             Standard   MBuddy V10      Diff  % diff  Test description
>                      Ops/sec      Ops/sec   Ops/sec
> -- -----------     ---------   ----------  --------  ------  ----------------
>  1 add_double      460569.72    465222.46   4652.74   1.01%  Thousand Double Precision Additions/second
>  2 add_float       460523.25    465322.45   4799.20   1.04%  Thousand Single Precision Additions/second
>  3 add_long       1421763.04   1436042.64  14279.60   1.00%  Thousand Long Integer Additions/second
>  4 add_int        1421763.04   1436042.64  14279.60   1.00%  Thousand Integer Additions/second
>  5 add_short      1421363.11   1435760.71  14397.60   1.01%  Thousand Short Integer Additions/second
>  7 page_test       121048.16    123059.49   2011.33   1.66%  System Allocations & Pages/second
>  8 brk_test        445743.79    452407.93   6664.14   1.50%  System Memory Allocations/second
>  9 jmp_test       4158416.67   4232083.33  73666.66   1.77%  Non-local gotos/second
> 10 signal_test      94417.60     94584.24    166.64   0.18%  Signal Traps/second
> 11 exec_test           65.04        66.69      1.65   2.54%  Program Loads/second
> 12 fork_test         1537.82      1730.51    192.69  12.53%  Task Creations/second
> 13 link_test         6411.28      6477.45     66.17   1.03%  Link/Unlink Pairs/second
>
> The aim9 results show that there are consistent improvements for common
> page-related operations. The results are compiler dependent and there are
> variances of 1-2% between versions.

Any explanation for why fork_test shows markedly better improvement compared
to the others?

> -#define __GFP_BITS_SHIFT 16 /* Room for 16 __GFP_FOO bits */
> +#define __GFP_BITS_SHIFT 18 /* Room for 16 __GFP_FOO bits */

The comment should have the new 18, not the old 16.
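That is, the new line would presumably read:

+#define __GFP_BITS_SHIFT 18 /* Room for 18 __GFP_FOO bits */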
> +#ifdef CONFIG_ALLOCSTATS
> +	/*
> +	 * These are beancounters that track how the placement policy
> +	 * of the buddy allocator is performing
> +	 */
> +	unsigned long fallback_count[ALLOC_TYPES];
> +	unsigned long alloc_count[ALLOC_TYPES];
> +	unsigned long reserve_count[ALLOC_TYPES];
> +	unsigned long kernnorclm_full_steal;
> +	unsigned long kernnorclm_partial_steal;
> +	unsigned long bulk_requests[MAX_ORDER];
> +	unsigned long bulk_alloced[MAX_ORDER];
> +#endif

It would be nice if all of the CONFIG_ALLOCSTATS stuff was broken out as a
second patch.  It would make this patch much smaller and more readable.

> +int fallback_allocs[ALLOC_TYPES][ALLOC_TYPES] = {
> +	{ALLOC_KERNNORCLM, ALLOC_FALLBACK, ALLOC_KERNRCLM,   ALLOC_USERRCLM},
> +	{ALLOC_KERNRCLM,   ALLOC_FALLBACK, ALLOC_KERNNORCLM, ALLOC_USERRCLM},

I would have thought that KernRclm would want to choose USERRCLM over
KERNNORCLM.

> +	{ALLOC_USERRCLM,   ALLOC_FALLBACK, ALLOC_KERNNORCLM, ALLOC_KERNRCLM},

I'm almost certain the UserRclm type should prefer KERNRCLM over KERNNORCLM.
(The reordered rows are sketched at the end of this mail.)

> +	 * Here, the alloc type lists has been depleted as well as the global
> +	 * pool, so fallback. When falling back, the largest possible block
> +	 * will be taken to keep the fallbacks clustered if possible
> +	 */

I was curious if you had tried taking the smallest possible block.  I would
think that it would reduce the amount of fallback needed, and thus increase
the amount available for the 3 allocation types.  I would expect a net win,
despite not clustering fallbacks particularly well.  (A rough sketch of what
I mean is at the end of this mail.)

> +	alloctype = fallback_list[retry_count];
> +
> +	/* Find a block to allocate */
> +	area = zone->free_area_lists[alloctype] + (MAX_ORDER-1);
> +	current_order=MAX_ORDER;
> +	do {
> +		current_order--;
> +		if (list_empty(&area->free_list)) {
> +			area--;
> +			continue;
> +		}
> +
> +		goto remove_page;
> +	} while (current_order != order);
> +	}

This loop is a bit hard to understand.  I think it would be easier to
understand if it looked something like this (totally untested):

+	current_order = MAX_ORDER - 1;
+	do {
+		if (!list_empty(&area->free_list)) {
+			goto remove_page;
+		}
+
+		area--;
+		current_order--;
+	} while (current_order >= order);
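For reference, the reordering I'm suggesting above for those two fallback
rows would look roughly like this (the rest of the table left as in your
patch):

	{ALLOC_KERNRCLM,   ALLOC_FALLBACK, ALLOC_USERRCLM,   ALLOC_KERNNORCLM},
	{ALLOC_USERRCLM,   ALLOC_FALLBACK, ALLOC_KERNRCLM,   ALLOC_KERNNORCLM},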
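And here is a rough, completely untested sketch of the smallest-block-first
fallback search I was curious about, assuming the same free_area_lists
layout your patch uses:

	alloctype = fallback_list[retry_count];

	/* Walk upwards from the requested order so the smallest
	 * suitable block is stolen from the fallback pool */
	area = zone->free_area_lists[alloctype] + order;
	for (current_order = order; current_order < MAX_ORDER; current_order++) {
		if (!list_empty(&area->free_list))
			goto remove_page;
		area++;
	}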