2013-10-23 21:01:47

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH] mm: get rid of unnecessary pageblock scanning in setup_zone_migrate_reserve

From: KOSAKI Motohiro <[email protected]>

Yasuaki Ithimatsu reported memory hot-add spent more than 5 _hours_
on 9TB memory machine and we found out setup_zone_migrate_reserve
spnet >90% time.

The problem is, setup_zone_migrate_reserve scan all pageblock
unconditionally, but it is only necessary number of reserved block
was reduced (i.e. memory hot remove).
Moreover, maximum MIGRATE_RESERVE per zone are currently 2. It mean,
number of reserved pageblock are almost always unchanged.

This patch adds zone->nr_migrate_reserve_block to maintain number
of MIGRATE_RESERVE pageblock and it reduce an overhead of
setup_zone_migrate_reserve dramatically.

Cc: Mel Gorman <[email protected]>
Reported-by: Yasuaki Ishimatsu <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
include/linux/mmzone.h | 6 ++++++
mm/page_alloc.c | 13 +++++++++++++
2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index bd791e4..67ab5fe 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -490,6 +490,12 @@ struct zone {
unsigned long managed_pages;

/*
+ * Number of MIGRATE_RESEVE page block. To maintain for just
+ * optimization. Protected by zone->lock.
+ */
+ int nr_migrate_reserve_block;
+
+ /*
* rarely used fields:
*/
const char *name;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 58e67ea..76ca054 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3909,6 +3909,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
struct page *page;
unsigned long block_migratetype;
int reserve;
+ int old_reserve;

/*
* Get the start pfn, end pfn and the number of blocks to reserve
@@ -3930,6 +3931,12 @@ static void setup_zone_migrate_reserve(struct zone *zone)
* future allocation of hugepages at runtime.
*/
reserve = min(2, reserve);
+ old_reserve = zone->nr_migrate_reserve_block;
+
+ /* When memory hot-add, we almost always need to do nothing */
+ if (reserve == old_reserve)
+ return;
+ zone->nr_migrate_reserve_block = reserve;

for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
if (!pfn_valid(pfn))
@@ -3967,6 +3974,12 @@ static void setup_zone_migrate_reserve(struct zone *zone)
reserve--;
continue;
}
+ } else if (!old_reserve) {
+ /*
+ * When boot time, we don't need scan whole zone
+ * for turning off MIGRATE_RESERVE.
+ */
+ break;
}

/*
--
1.7.1


2013-10-30 15:19:10

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH] mm: get rid of unnecessary pageblock scanning in setup_zone_migrate_reserve

On Wed, Oct 23, 2013 at 05:01:32PM -0400, [email protected] wrote:
> From: KOSAKI Motohiro <[email protected]>
>
> Yasuaki Ithimatsu reported memory hot-add spent more than 5 _hours_
> on 9TB memory machine and we found out setup_zone_migrate_reserve
> spnet >90% time.
>
> The problem is, setup_zone_migrate_reserve scan all pageblock
> unconditionally, but it is only necessary number of reserved block
> was reduced (i.e. memory hot remove).
> Moreover, maximum MIGRATE_RESERVE per zone are currently 2. It mean,
> number of reserved pageblock are almost always unchanged.
>
> This patch adds zone->nr_migrate_reserve_block to maintain number
> of MIGRATE_RESERVE pageblock and it reduce an overhead of
> setup_zone_migrate_reserve dramatically.
>

It seems regrettable to expand the size of struct zone just for this.
You are right that the number of blocks does not exceed 2 because of a
check made in setup_zone_migrate_reserve so it should be possible to
special case this. I didn't test this or think about it particularly
carefully and no doubt there is a nicer way but for illustration
purposes see the patch below.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dd886fa..1aedddd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3897,6 +3897,8 @@ static int pageblock_is_reserved(unsigned long start_pfn, unsigned long end_pfn)
return 0;
}

+#define MAX_MIGRATE_RESERVE_BLOCKS 2
+
/*
* Mark a number of pageblocks as MIGRATE_RESERVE. The number
* of blocks reserved is based on min_wmark_pages(zone). The memory within
@@ -3910,6 +3912,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
struct page *page;
unsigned long block_migratetype;
int reserve;
+ int found = 0;

/*
* Get the start pfn, end pfn and the number of blocks to reserve
@@ -3926,11 +3929,11 @@ static void setup_zone_migrate_reserve(struct zone *zone)
/*
* Reserve blocks are generally in place to help high-order atomic
* allocations that are short-lived. A min_free_kbytes value that
- * would result in more than 2 reserve blocks for atomic allocations
- * is assumed to be in place to help anti-fragmentation for the
- * future allocation of hugepages at runtime.
+ * would result in more than MAX_MIGRATE_RESERVE_BLOCKS reserve blocks
+ * for atomic allocations is assumed to be in place to help
+ * anti-fragmentation for the future allocation of hugepages at runtime.
*/
- reserve = min(2, reserve);
+ reserve = min(MAX_MIGRATE_RESERVE_BLOCKS, reserve);

for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
if (!pfn_valid(pfn))
@@ -3956,6 +3959,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
/* If this block is reserved, account for it */
if (block_migratetype == MIGRATE_RESERVE) {
reserve--;
+ found++;
continue;
}

@@ -3970,6 +3974,10 @@ static void setup_zone_migrate_reserve(struct zone *zone)
}
}

+ /* If all possible reserve blocks have been found, we're done */
+ if (found >= MAX_MIGRATE_RESERVE_BLOCKS)
+ break;
+
/*
* If the reserve is met and this is a previous reserved block,
* take it back

2013-10-30 19:26:21

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] mm: get rid of unnecessary pageblock scanning in setup_zone_migrate_reserve

(10/30/13 11:19 AM), Mel Gorman wrote:
> On Wed, Oct 23, 2013 at 05:01:32PM -0400, [email protected] wrote:
>> From: KOSAKI Motohiro <[email protected]>
>>
>> Yasuaki Ithimatsu reported memory hot-add spent more than 5 _hours_
>> on 9TB memory machine and we found out setup_zone_migrate_reserve
>> spnet >90% time.
>>
>> The problem is, setup_zone_migrate_reserve scan all pageblock
>> unconditionally, but it is only necessary number of reserved block
>> was reduced (i.e. memory hot remove).
>> Moreover, maximum MIGRATE_RESERVE per zone are currently 2. It mean,
>> number of reserved pageblock are almost always unchanged.
>>
>> This patch adds zone->nr_migrate_reserve_block to maintain number
>> of MIGRATE_RESERVE pageblock and it reduce an overhead of
>> setup_zone_migrate_reserve dramatically.
>>
>
> It seems regrettable to expand the size of struct zone just for this.

This is only matter when backporting enterprise distro. But you are right
it would be nice if it's avoidable.

> You are right that the number of blocks does not exceed 2 because of a
> check made in setup_zone_migrate_reserve so it should be possible to
> special case this. I didn't test this or think about it particularly
> carefully and no doubt there is a nicer way but for illustration
> purposes see the patch below.

I'll test. A few days give me please.

2013-10-30 20:19:12

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] mm: get rid of unnecessary pageblock scanning in setup_zone_migrate_reserve

> @@ -3926,11 +3929,11 @@ static void setup_zone_migrate_reserve(struct zone *zone)
> /*
> * Reserve blocks are generally in place to help high-order atomic
> * allocations that are short-lived. A min_free_kbytes value that
> - * would result in more than 2 reserve blocks for atomic allocations
> - * is assumed to be in place to help anti-fragmentation for the
> - * future allocation of hugepages at runtime.
> + * would result in more than MAX_MIGRATE_RESERVE_BLOCKS reserve blocks
> + * for atomic allocations is assumed to be in place to help
> + * anti-fragmentation for the future allocation of hugepages at runtime.
> */
> - reserve = min(2, reserve);
> + reserve = min(MAX_MIGRATE_RESERVE_BLOCKS, reserve);
>
> for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
> if (!pfn_valid(pfn))
> @@ -3956,6 +3959,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
> /* If this block is reserved, account for it */
> if (block_migratetype == MIGRATE_RESERVE) {
> reserve--;
> + found++;
> continue;
> }
>
> @@ -3970,6 +3974,10 @@ static void setup_zone_migrate_reserve(struct zone *zone)
> }
> }
>
> + /* If all possible reserve blocks have been found, we're done */
> + if (found >= MAX_MIGRATE_RESERVE_BLOCKS)
> + break;
> +
> /*
> * If the reserve is met and this is a previous reserved block,
> * take it back

Nit. I would like to add following hunk. This is just nit because moving
reserve pageblock is extreme rare.

if (block_migratetype == MIGRATE_RESERVE) {
+ found++;
set_pageblock_migratetype(page, MIGRATE_MOVABLE);
move_freepages_block(zone, page, MIGRATE_MOVABLE);
}


2013-10-31 10:15:31

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH] mm: get rid of unnecessary pageblock scanning in setup_zone_migrate_reserve

On Wed, Oct 30, 2013 at 04:19:07PM -0400, KOSAKI Motohiro wrote:
> >@@ -3926,11 +3929,11 @@ static void setup_zone_migrate_reserve(struct zone *zone)
> > /*
> > * Reserve blocks are generally in place to help high-order atomic
> > * allocations that are short-lived. A min_free_kbytes value that
> >- * would result in more than 2 reserve blocks for atomic allocations
> >- * is assumed to be in place to help anti-fragmentation for the
> >- * future allocation of hugepages at runtime.
> >+ * would result in more than MAX_MIGRATE_RESERVE_BLOCKS reserve blocks
> >+ * for atomic allocations is assumed to be in place to help
> >+ * anti-fragmentation for the future allocation of hugepages at runtime.
> > */
> >- reserve = min(2, reserve);
> >+ reserve = min(MAX_MIGRATE_RESERVE_BLOCKS, reserve);
> >
> > for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
> > if (!pfn_valid(pfn))
> >@@ -3956,6 +3959,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
> > /* If this block is reserved, account for it */
> > if (block_migratetype == MIGRATE_RESERVE) {
> > reserve--;
> >+ found++;
> > continue;
> > }
> >
> >@@ -3970,6 +3974,10 @@ static void setup_zone_migrate_reserve(struct zone *zone)
> > }
> > }
> >
> >+ /* If all possible reserve blocks have been found, we're done */
> >+ if (found >= MAX_MIGRATE_RESERVE_BLOCKS)
> >+ break;
> >+
> > /*
> > * If the reserve is met and this is a previous reserved block,
> > * take it back
>
> Nit. I would like to add following hunk. This is just nit because moving
> reserve pageblock is extreme rare.
>
> if (block_migratetype == MIGRATE_RESERVE) {
> + found++;
> set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> move_freepages_block(zone, page, MIGRATE_MOVABLE);
> }

I don't really see the advantage but if you think it is necessary then I
do not object either.

--
Mel Gorman
SUSE Labs

2013-10-31 17:14:46

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] mm: get rid of unnecessary pageblock scanning in setup_zone_migrate_reserve

>> Nit. I would like to add following hunk. This is just nit because moving
>> reserve pageblock is extreme rare.
>>
>> if (block_migratetype == MIGRATE_RESERVE) {
>> + found++;
>> set_pageblock_migratetype(page, MIGRATE_MOVABLE);
>> move_freepages_block(zone, page, MIGRATE_MOVABLE);
>> }
>
> I don't really see the advantage but if you think it is necessary then I
> do not object either.

For example, a zone has five pageblock b1,b2,b3,b4,b5 and b1 has MIGRATE_RESERVE.
When hotremove b1 and hotadd again, your code need to scan all of blocks. But
mine only need to scan b1 and b2. I mean that's a hotplug specific optimization.