This patchset aims to increase the compaction success rate. The changes are
related to the compaction finish condition and the freepage isolation condition.
With these changes, I ran the stress-highalloc test in mmtests with a non-movable
order-7 allocation configuration. The allocation success rates (%) at phase 1 are:
Base Patch-1 Patch-3 Patch-4
55.00 57.00 62.67 64.00
And the compaction success rates (%) in the same test are:
Base Patch-1 Patch-3 Patch-4
18.47 28.94 35.13 41.50
This patchset is based on my tracepoint update on compaction.
https://lkml.org/lkml/2014/12/3/71
Joonsoo Kim (4):
mm/compaction: fix wrong order check in compact_finished()
mm/page_alloc: expands broken freepage to proper buddy list when
steal
mm/compaction: enhance compaction finish condition
mm/compaction: stop the isolation when we isolate enough freepage
include/linux/mmzone.h | 3 ++
include/trace/events/kmem.h | 7 +++--
mm/compaction.c | 48 ++++++++++++++++++++++------
mm/internal.h | 1 +
mm/page_alloc.c | 73 +++++++++++++++++++++++++------------------
5 files changed, 89 insertions(+), 43 deletions(-)
--
1.7.9.5
What we want to check here is whether a high-order freepage exists in the
buddy list of another migratetype, so that we can steal it without causing
fragmentation. But the current code checks cc->order, which is the allocation
request order rather than the order of the free_area being examined, so the
check is wrong.
Without this fix, non-movable synchronous compaction below pageblock order
does not stop until compaction completes, because most pageblocks have the
movable migratetype and the high-order freepages produced by compaction
usually end up on the movable buddy list.
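For context, the surrounding loop in __compact_finished() looks roughly as
below in the tree this series is based on (paraphrased, not a literal quote).
Here 'order' is the index of the free_area currently being examined, while
cc->order is the allocation request order:

	/* roughly, in __compact_finished() */
	for (order = cc->order; order < MAX_ORDER; order++) {
		struct free_area *area = &zone->free_area[order];

		/* Job done if page is free of the right migratetype */
		if (!list_empty(&area->free_list[migratetype]))
			return COMPACT_PARTIAL;

		/* Job done if allocation would set block type */
		if (cc->order >= pageblock_order && area->nr_free)
			return COMPACT_PARTIAL;	/* should test 'order' here */
	}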
There is a report related to this bug; see the link below.
http://www.spinics.net/lists/linux-mm/msg81666.html
Although the affected system still shows load spikes caused by compaction,
this change makes the system completely stable and responsive according to
the reporter.
The stress-highalloc test in mmtests with non-movable order-7 allocation
doesn't show any notable difference in allocation success rate, but it shows
a higher compaction success rate and reduced elapsed time.
Compaction success rate (Compaction success * 100 / Compaction stalls, %)
18.47 : 28.94
Elapsed time (sec)
1429 : 1411
Cc: <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/compaction.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index c9ee464..1a5f465 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1105,7 +1105,7 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
return COMPACT_PARTIAL;
/* Job done if allocation would set block type */
- if (cc->order >= pageblock_order && area->nr_free)
+ if (order >= pageblock_order && area->nr_free)
return COMPACT_PARTIAL;
}
--
1.7.9.5
There is odd behaviour when we steal freepages from another migratetype's
buddy list. In try_to_steal_freepages(), we move all freepages in the
pageblock that the found freepage belongs to over to the requested
migratetype in order to mitigate fragmentation. If the number of moved pages
is enough to change the pageblock migratetype, there is no problem. If it is
not enough, we don't change the pageblock migratetype, and we add the broken
freepages to the original migratetype's buddy list rather than to the
requested migratetype's one. This is odd, because we have already moved all
freepages in this pageblock to the requested migratetype. This patch fixes
the situation by adding the broken freepages to the requested migratetype's
buddy list in this case.
This patch also introduces a new function that helps decide whether we can
steal a page without causing fragmentation. It will be used by a following
patch for the compaction finish criteria.
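With this change, the caller side in __rmqueue_fallback() ends up doing
roughly the following (assembled from the hunks below, abbreviated):

	new_type = migratetype;
	...
	can_steal = can_steal_freepages(current_order,
					start_migratetype, migratetype);
	if (can_steal) {
		new_type = start_migratetype;
		/* move the block's freepages, maybe flip the pageblock type */
		try_to_steal_freepages(zone, page, start_migratetype);
	}
	...
	/* list_del() + expand() happen here; expand() is called with
	 * new_type, which is what places the broken freepages on the
	 * requested migratetype's buddy list (not shown in the hunks) */
	set_freepage_migratetype(page, new_type);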
Signed-off-by: Joonsoo Kim <[email protected]>
---
include/trace/events/kmem.h | 7 +++--
mm/page_alloc.c | 72 +++++++++++++++++++++++++------------------
2 files changed, 46 insertions(+), 33 deletions(-)
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index aece134..4ad10ba 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -268,11 +268,11 @@ TRACE_EVENT(mm_page_alloc_extfrag,
TP_PROTO(struct page *page,
int alloc_order, int fallback_order,
- int alloc_migratetype, int fallback_migratetype, int new_migratetype),
+ int alloc_migratetype, int fallback_migratetype),
TP_ARGS(page,
alloc_order, fallback_order,
- alloc_migratetype, fallback_migratetype, new_migratetype),
+ alloc_migratetype, fallback_migratetype),
TP_STRUCT__entry(
__field( struct page *, page )
@@ -289,7 +289,8 @@ TRACE_EVENT(mm_page_alloc_extfrag,
__entry->fallback_order = fallback_order;
__entry->alloc_migratetype = alloc_migratetype;
__entry->fallback_migratetype = fallback_migratetype;
- __entry->change_ownership = (new_migratetype == alloc_migratetype);
+ __entry->change_ownership = (alloc_migratetype ==
+ get_pageblock_migratetype(page));
),
TP_printk("page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d",
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7c46d0f..7b4c9aa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1139,44 +1139,50 @@ static void change_pageblock_range(struct page *pageblock_page,
* Returns the new migratetype of the pageblock (or the same old migratetype
* if it was unchanged).
*/
-static int try_to_steal_freepages(struct zone *zone, struct page *page,
- int start_type, int fallback_type)
+static void try_to_steal_freepages(struct zone *zone, struct page *page,
+ int target_mt)
{
+ int pages;
int current_order = page_order(page);
- /*
- * When borrowing from MIGRATE_CMA, we need to release the excess
- * buddy pages to CMA itself. We also ensure the freepage_migratetype
- * is set to CMA so it is returned to the correct freelist in case
- * the page ends up being not actually allocated from the pcp lists.
- */
- if (is_migrate_cma(fallback_type))
- return fallback_type;
-
/* Take ownership for orders >= pageblock_order */
if (current_order >= pageblock_order) {
- change_pageblock_range(page, current_order, start_type);
- return start_type;
+ change_pageblock_range(page, current_order, target_mt);
+ return;
}
- if (current_order >= pageblock_order / 2 ||
- start_type == MIGRATE_RECLAIMABLE ||
- page_group_by_mobility_disabled) {
- int pages;
+ pages = move_freepages_block(zone, page, target_mt);
- pages = move_freepages_block(zone, page, start_type);
+ /* Claim the whole block if over half of it is free */
+ if (pages >= (1 << (pageblock_order-1)) ||
+ page_group_by_mobility_disabled) {
- /* Claim the whole block if over half of it is free */
- if (pages >= (1 << (pageblock_order-1)) ||
- page_group_by_mobility_disabled) {
+ set_pageblock_migratetype(page, target_mt);
+ }
+}
- set_pageblock_migratetype(page, start_type);
- return start_type;
- }
+static bool can_steal_freepages(unsigned int order,
+ int start_mt, int fallback_mt)
+{
+ /*
+ * When borrowing from MIGRATE_CMA, we need to release the excess
+ * buddy pages to CMA itself. We also ensure the freepage_migratetype
+ * is set to CMA so it is returned to the correct freelist in case
+ * the page ends up being not actually allocated from the pcp lists.
+ */
+ if (is_migrate_cma(fallback_mt))
+ return false;
- }
+ /* Can take ownership for orders >= pageblock_order */
+ if (order >= pageblock_order)
+ return true;
+
+ if (order >= pageblock_order / 2 ||
+ start_mt == MIGRATE_RECLAIMABLE ||
+ page_group_by_mobility_disabled)
+ return true;
- return fallback_type;
+ return false;
}
/* Remove an element from the buddy allocator from the fallback list */
@@ -1187,6 +1193,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
unsigned int current_order;
struct page *page;
int migratetype, new_type, i;
+ bool can_steal;
/* Find the largest possible block of pages in the other list */
for (current_order = MAX_ORDER-1;
@@ -1194,6 +1201,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
--current_order) {
for (i = 0;; i++) {
migratetype = fallbacks[start_migratetype][i];
+ new_type = migratetype;
/* MIGRATE_RESERVE handled later if necessary */
if (migratetype == MIGRATE_RESERVE)
@@ -1207,9 +1215,13 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
struct page, lru);
area->nr_free--;
- new_type = try_to_steal_freepages(zone, page,
- start_migratetype,
- migratetype);
+ can_steal = can_steal_freepages(current_order,
+ start_migratetype, migratetype);
+ if (can_steal) {
+ new_type = start_migratetype;
+ try_to_steal_freepages(zone, page,
+ start_migratetype);
+ }
/* Remove the page from the freelists */
list_del(&page->lru);
@@ -1225,7 +1237,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
set_freepage_migratetype(page, new_type);
trace_mm_page_alloc_extfrag(page, order, current_order,
- start_migratetype, migratetype, new_type);
+ start_migratetype, migratetype);
return page;
}
--
1.7.9.5
Compaction has an anti-fragmentation rule: if we don't find any freepage in
the requested migratetype's buddy list, a freepage of at least pageblock
order must exist before compaction is allowed to finish. This is meant to
mitigate fragmentation, but it lacks migratetype consideration and is too
strict.
First, it doesn't consider migratetype, so the compaction finish decision
can produce false positives. For example, if the allocation request is for
the unmovable migratetype, a freepage of CMA migratetype doesn't help that
allocation, so compaction should not be stopped. The current logic, however,
concludes that compaction is no longer needed and stops it.
Second, it is too strict. We can steal freepages from another migratetype
and change the pageblock migratetype under more relaxed conditions. The page
allocator already has conditions under which stealing succeeds without
introducing fragmentation.
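For reference, those conditions are the ones encoded in can_steal_freepages()
introduced by the previous patch (reproduced here without the long CMA
comment):

	static bool can_steal_freepages(unsigned int order,
					int start_mt, int fallback_mt)
	{
		/* Never steal from CMA; excess buddy pages must stay on CMA */
		if (is_migrate_cma(fallback_mt))
			return false;

		/* Can take ownership for orders >= pageblock_order */
		if (order >= pageblock_order)
			return true;

		if (order >= pageblock_order / 2 ||
		    start_mt == MIGRATE_RECLAIMABLE ||
		    page_group_by_mobility_disabled)
			return true;

		return false;
	}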
To solve these problems, this patch borrows the anti-fragmentation logic
from the page allocator. It reduces premature compaction finishes in some
cases and avoids excessive compaction work.
The stress-highalloc test in mmtests with non-movable order-7 allocation
shows improvements in both the phase 1 allocation success rate and the
compaction success rate.
Allocation success rate on phase 1 (%)
57.00 : 63.67
Compaction success rate (Compaction success * 100 / Compaction stalls, %)
28.94 : 35.13
Signed-off-by: Joonsoo Kim <[email protected]>
---
include/linux/mmzone.h | 3 +++
mm/compaction.c | 31 +++++++++++++++++++++++++++++--
mm/internal.h | 1 +
mm/page_alloc.c | 5 ++---
4 files changed, 35 insertions(+), 5 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2f0856d..87f5bb5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -63,6 +63,9 @@ enum {
MIGRATE_TYPES
};
+#define FALLBACK_MIGRATETYPES (4)
+extern int fallbacks[MIGRATE_TYPES][FALLBACK_MIGRATETYPES];
+
#ifdef CONFIG_CMA
# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
#else
diff --git a/mm/compaction.c b/mm/compaction.c
index 1a5f465..2fd5f79 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1054,6 +1054,30 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
}
+static bool can_steal_fallbacks(struct free_area *area,
+ unsigned int order, int migratetype)
+{
+ int i;
+ int fallback_mt;
+
+ if (area->nr_free == 0)
+ return false;
+
+ for (i = 0; i < FALLBACK_MIGRATETYPES; i++) {
+ fallback_mt = fallbacks[migratetype][i];
+ if (fallback_mt == MIGRATE_RESERVE)
+ break;
+
+ if (list_empty(&area->free_list[fallback_mt]))
+ continue;
+
+ if (can_steal_freepages(order, migratetype, fallback_mt))
+ return true;
+ }
+
+ return false;
+}
+
static int __compact_finished(struct zone *zone, struct compact_control *cc,
const int migratetype)
{
@@ -1104,8 +1128,11 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
if (!list_empty(&area->free_list[migratetype]))
return COMPACT_PARTIAL;
- /* Job done if allocation would set block type */
- if (order >= pageblock_order && area->nr_free)
+ /*
+ * Job done if allocation would steal freepages from
+ * other migratetype buddy lists.
+ */
+ if (can_steal_fallbacks(area, order, migratetype))
return COMPACT_PARTIAL;
}
diff --git a/mm/internal.h b/mm/internal.h
index efad241..7028d83 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -179,6 +179,7 @@ unsigned long
isolate_migratepages_range(struct compact_control *cc,
unsigned long low_pfn, unsigned long end_pfn);
+bool can_steal_freepages(unsigned int order, int start_mt, int fallback_mt);
#endif
/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7b4c9aa..dcb8523 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1031,7 +1031,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
* This array describes the order lists are fallen back to when
* the free lists for the desirable migrate type are depleted
*/
-static int fallbacks[MIGRATE_TYPES][4] = {
+int fallbacks[MIGRATE_TYPES][FALLBACK_MIGRATETYPES] = {
[MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
#ifdef CONFIG_CMA
@@ -1161,8 +1161,7 @@ static void try_to_steal_freepages(struct zone *zone, struct page *page,
}
}
-static bool can_steal_freepages(unsigned int order,
- int start_mt, int fallback_mt)
+bool can_steal_freepages(unsigned int order, int start_mt, int fallback_mt)
{
/*
* When borrowing from MIGRATE_CMA, we need to release the excess
--
1.7.9.5
From: Joonsoo Kim <[email protected]>
Currently, freepage isolation within one pageblock doesn't consider how many
freepages we have already isolated. When I traced the flow of compaction,
compaction sometimes isolated more than 256 freepages to migrate just 32
pages.
With this patch, freepage isolation stops as soon as we have isolated at
least as many freepages as pages isolated for migration. This slows down the
free page scanner and makes the compaction success rate higher.
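The stop condition itself is small (see the first hunk below):

	/* in isolate_freepages_block(), after 'isolated' pages were taken */
	cc->nr_freepages += isolated;
	if (!strict && cc->nr_migratepages <= cc->nr_freepages) {
		/* enough freepages for the pages we plan to migrate,
		 * so stop the free scanner within this pageblock */
		blockpfn += isolated;
		break;
	}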
The stress-highalloc test in mmtests with non-movable order-7 allocation
shows an increase in compaction success rate and a slight improvement in
allocation success rate.
Allocation success rate on phase 1 (%)
62.70 : 64.00
Compaction success rate (Compaction success * 100 / Compaction stalls, %)
35.13 : 41.50
pfn where both scanners meet when compaction completes
(separate test due to enormous tracepoint buffer)
(zone_start=4096, zone_end=1048576)
586034 : 654378
Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/compaction.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 2fd5f79..12223b9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -422,6 +422,13 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
/* If a page was split, advance to the end of it */
if (isolated) {
+ cc->nr_freepages += isolated;
+ if (!strict &&
+ cc->nr_migratepages <= cc->nr_freepages) {
+ blockpfn += isolated;
+ break;
+ }
+
blockpfn += isolated - 1;
cursor += isolated - 1;
continue;
@@ -831,7 +838,6 @@ static void isolate_freepages(struct compact_control *cc)
unsigned long isolate_start_pfn; /* exact pfn we start at */
unsigned long block_end_pfn; /* end of current pageblock */
unsigned long low_pfn; /* lowest pfn scanner is able to scan */
- int nr_freepages = cc->nr_freepages;
struct list_head *freelist = &cc->freepages;
/*
@@ -856,11 +862,11 @@ static void isolate_freepages(struct compact_control *cc)
* pages on cc->migratepages. We stop searching if the migrate
* and free page scanners meet or enough free pages are isolated.
*/
- for (; block_start_pfn >= low_pfn && cc->nr_migratepages > nr_freepages;
+ for (; block_start_pfn >= low_pfn &&
+ cc->nr_migratepages > cc->nr_freepages;
block_end_pfn = block_start_pfn,
block_start_pfn -= pageblock_nr_pages,
isolate_start_pfn = block_start_pfn) {
- unsigned long isolated;
/*
* This can iterate a massively long zone without finding any
@@ -885,9 +891,8 @@ static void isolate_freepages(struct compact_control *cc)
continue;
/* Found a block suitable for isolating free pages from. */
- isolated = isolate_freepages_block(cc, &isolate_start_pfn,
+ isolate_freepages_block(cc, &isolate_start_pfn,
block_end_pfn, freelist, false);
- nr_freepages += isolated;
/*
* Remember where the free scanner should restart next time,
@@ -919,8 +924,6 @@ static void isolate_freepages(struct compact_control *cc)
*/
if (block_start_pfn < low_pfn)
cc->free_pfn = cc->migrate_pfn;
-
- cc->nr_freepages = nr_freepages;
}
/*
--
1.7.9.5
On 12/08/2014 08:16 AM, Joonsoo Kim wrote:
> What we want to check here is whether there is highorder freepage
> in buddy list of other migratetype in order to steal it without
> fragmentation. But, current code just checks cc->order which means
> allocation request order. So, this is wrong.
>
> Without this fix, non-movable synchronous compaction below pageblock order
> would not stopped until compaction is complete, because migratetype of most
> pageblocks are movable and high order freepage made by compaction is usually
> on movable type buddy list.
>
> There is some report related to this bug. See below link.
>
> http://www.spinics.net/lists/linux-mm/msg81666.html
>
> Although the issued system still has load spike comes from compaction,
> this makes that system completely stable and responsive according to
> his report.
>
> stress-highalloc test in mmtests with non movable order 7 allocation doesn't
> show any notable difference in allocation success rate, but, it shows more
> compaction success rate and reduced elapsed time.
>
> Compaction success rate (Compaction success * 100 / Compaction stalls, %)
> 18.47 : 28.94
>
> Elapsed time (sec)
> 1429 : 1411
>
> Cc: <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
> ---
> mm/compaction.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c9ee464..1a5f465 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1105,7 +1105,7 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
> return COMPACT_PARTIAL;
>
> /* Job done if allocation would set block type */
> - if (cc->order >= pageblock_order && area->nr_free)
> + if (order >= pageblock_order && area->nr_free)
> return COMPACT_PARTIAL;
> }
>
>
On 12/08/2014 08:16 AM, Joonsoo Kim wrote:
> This patchset aims at increase of compaction success rate. Changes are
> related to compaction finish condition and freepage isolation condition.
>
> From these changes, I did stress highalloc test in mmtests with nonmovable
> order 7 allocation configuration, and success rate (%) at phase 1 are,
>
> Base Patch-1 Patch-3 Patch-4
> 55.00 57.00 62.67 64.00
>
> And, compaction success rate (%) on same test are,
>
> Base Patch-1 Patch-3 Patch-4
> 18.47 28.94 35.13 41.50
Did you test Patch-2 separately? Any difference to Patch 1?
> This patchset is based on my tracepoint update on compaction.
>
> https://lkml.org/lkml/2014/12/3/71
>
> Joonsoo Kim (4):
> mm/compaction: fix wrong order check in compact_finished()
> mm/page_alloc: expands broken freepage to proper buddy list when
> steal
> mm/compaction: enhance compaction finish condition
> mm/compaction: stop the isolation when we isolate enough freepage
>
> include/linux/mmzone.h | 3 ++
> include/trace/events/kmem.h | 7 +++--
> mm/compaction.c | 48 ++++++++++++++++++++++------
> mm/internal.h | 1 +
> mm/page_alloc.c | 73 +++++++++++++++++++++++++------------------
> 5 files changed, 89 insertions(+), 43 deletions(-)
>
On 12/08/2014 08:16 AM, Joonsoo Kim wrote:
> There is odd behaviour when we steal freepages from other migratetype
> buddy list. In try_to_steal_freepages(), we move all freepages in
> the pageblock that founded freepage is belong to to the request
> migratetype in order to mitigate fragmentation. If the number of moved
> pages are enough to change pageblock migratetype, there is no problem. If
> not enough, we don't change pageblock migratetype and add broken freepages
> to the original migratetype buddy list rather than request migratetype
> one. For me, this is odd, because we already moved all freepages in this
> pageblock to the request migratetype. This patch fixes this situation to
> add broken freepages to the request migratetype buddy list in this case.
I'd rather split the fix from the refactoring. And maybe my description
is longer, but easier to understand? (I guess somebody else should judge
this)
> This patch introduce new function that can help to decide if we can
> steal the page without resulting in fragmentation. It will be used in
> following patch for compaction finish criteria.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> include/trace/events/kmem.h | 7 +++--
> mm/page_alloc.c | 72 +++++++++++++++++++++++++------------------
> 2 files changed, 46 insertions(+), 33 deletions(-)
>
> diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
> index aece134..4ad10ba 100644
> --- a/include/trace/events/kmem.h
> +++ b/include/trace/events/kmem.h
> @@ -268,11 +268,11 @@ TRACE_EVENT(mm_page_alloc_extfrag,
>
> TP_PROTO(struct page *page,
> int alloc_order, int fallback_order,
> - int alloc_migratetype, int fallback_migratetype, int new_migratetype),
> + int alloc_migratetype, int fallback_migratetype),
>
> TP_ARGS(page,
> alloc_order, fallback_order,
> - alloc_migratetype, fallback_migratetype, new_migratetype),
> + alloc_migratetype, fallback_migratetype),
>
> TP_STRUCT__entry(
> __field( struct page *, page )
> @@ -289,7 +289,8 @@ TRACE_EVENT(mm_page_alloc_extfrag,
> __entry->fallback_order = fallback_order;
> __entry->alloc_migratetype = alloc_migratetype;
> __entry->fallback_migratetype = fallback_migratetype;
> - __entry->change_ownership = (new_migratetype == alloc_migratetype);
> + __entry->change_ownership = (alloc_migratetype ==
> + get_pageblock_migratetype(page));
> ),
>
> TP_printk("page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d",
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7c46d0f..7b4c9aa 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1139,44 +1139,50 @@ static void change_pageblock_range(struct page *pageblock_page,
> * Returns the new migratetype of the pageblock (or the same old migratetype
> * if it was unchanged).
> */
> -static int try_to_steal_freepages(struct zone *zone, struct page *page,
> - int start_type, int fallback_type)
> +static void try_to_steal_freepages(struct zone *zone, struct page *page,
> + int target_mt)
> {
> + int pages;
> int current_order = page_order(page);
>
> - /*
> - * When borrowing from MIGRATE_CMA, we need to release the excess
> - * buddy pages to CMA itself. We also ensure the freepage_migratetype
> - * is set to CMA so it is returned to the correct freelist in case
> - * the page ends up being not actually allocated from the pcp lists.
> - */
> - if (is_migrate_cma(fallback_type))
> - return fallback_type;
> -
> /* Take ownership for orders >= pageblock_order */
> if (current_order >= pageblock_order) {
> - change_pageblock_range(page, current_order, start_type);
> - return start_type;
> + change_pageblock_range(page, current_order, target_mt);
> + return;
So here's a (current_order >= pageblock_order) check.
> }
>
> - if (current_order >= pageblock_order / 2 ||
> - start_type == MIGRATE_RECLAIMABLE ||
> - page_group_by_mobility_disabled) {
> - int pages;
> + pages = move_freepages_block(zone, page, target_mt);
>
> - pages = move_freepages_block(zone, page, start_type);
> + /* Claim the whole block if over half of it is free */
> + if (pages >= (1 << (pageblock_order-1)) ||
> + page_group_by_mobility_disabled) {
>
> - /* Claim the whole block if over half of it is free */
> - if (pages >= (1 << (pageblock_order-1)) ||
> - page_group_by_mobility_disabled) {
> + set_pageblock_migratetype(page, target_mt);
> + }
> +}
>
> - set_pageblock_migratetype(page, start_type);
> - return start_type;
> - }
> +static bool can_steal_freepages(unsigned int order,
> + int start_mt, int fallback_mt)
> +{
> + /*
> + * When borrowing from MIGRATE_CMA, we need to release the excess
> + * buddy pages to CMA itself. We also ensure the freepage_migratetype
> + * is set to CMA so it is returned to the correct freelist in case
> + * the page ends up being not actually allocated from the pcp lists.
> + */
> + if (is_migrate_cma(fallback_mt))
> + return false;
>
> - }
> + /* Can take ownership for orders >= pageblock_order */
> + if (order >= pageblock_order)
> + return true;
And another check.
> +
> + if (order >= pageblock_order / 2 ||
> + start_mt == MIGRATE_RECLAIMABLE ||
> + page_group_by_mobility_disabled)
> + return true;
>
> - return fallback_type;
> + return false;
> }
>
> /* Remove an element from the buddy allocator from the fallback list */
> @@ -1187,6 +1193,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
> unsigned int current_order;
> struct page *page;
> int migratetype, new_type, i;
> + bool can_steal;
>
> /* Find the largest possible block of pages in the other list */
> for (current_order = MAX_ORDER-1;
> @@ -1194,6 +1201,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
> --current_order) {
> for (i = 0;; i++) {
> migratetype = fallbacks[start_migratetype][i];
> + new_type = migratetype;
>
> /* MIGRATE_RESERVE handled later if necessary */
> if (migratetype == MIGRATE_RESERVE)
> @@ -1207,9 +1215,13 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
> struct page, lru);
> area->nr_free--;
So wouldn't it be better to handle the "order >= pageblock_order" case
separately at this level? I think it would also be better for the compaction
case (I'll explain why in my comment on the later patch).
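Just to illustrate what I mean, roughly (an untested sketch only;
can_steal_freepages() could then drop its own pageblock_order branch):

	if (!is_migrate_cma(migratetype) &&
	    current_order >= pageblock_order) {
		/* take ownership of the whole range right here */
		change_pageblock_range(page, current_order,
					start_migratetype);
		new_type = start_migratetype;
	} else if (can_steal_freepages(current_order,
				start_migratetype, migratetype)) {
		try_to_steal_freepages(zone, page, start_migratetype);
		new_type = start_migratetype;
	}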
> - new_type = try_to_steal_freepages(zone, page,
> - start_migratetype,
> - migratetype);
> + can_steal = can_steal_freepages(current_order,
> + start_migratetype, migratetype);
> + if (can_steal) {
> + new_type = start_migratetype;
> + try_to_steal_freepages(zone, page,
> + start_migratetype);
> + }
>
> /* Remove the page from the freelists */
> list_del(&page->lru);
> @@ -1225,7 +1237,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
> set_freepage_migratetype(page, new_type);
>
> trace_mm_page_alloc_extfrag(page, order, current_order,
> - start_migratetype, migratetype, new_type);
> + start_migratetype, migratetype);
>
> return page;
> }
>
On 12/08/2014 08:16 AM, Joonsoo Kim wrote:
> Compaction has anti fragmentation algorithm. It is that freepage
> should be more than pageblock order to finish the compaction if we don't
> find any freepage in requested migratetype buddy list. This is for
> mitigating fragmentation, but, it is a lack of migratetype consideration
> and too excessive.
>
> At first, it doesn't consider migratetype so there would be false positive
> on compaction finish decision. For example, if allocation request is
> for unmovable migratetype, freepage in CMA migratetype doesn't help that
> allocation, so compaction should not be stopped. But, current logic
> considers it as compaction is no longer needed and stop the compaction.
>
> Secondly, it is too excessive. We can steal freepage from other migratetype
> and change pageblock migratetype on more relaxed conditions. In page
> allocator, there is another conditions that can succeed to steal without
> introducing fragmentation.
>
> To solve these problems, this patch borrows anti fragmentation logic from
> page allocator. It will reduce premature compaction finish in some cases
> and reduce excessive compaction work.
>
> stress-highalloc test in mmtests with non movable order 7 allocation shows
> in allocation success rate on phase 1 and compaction success rate.
>
> Allocation success rate on phase 1 (%)
> 57.00 : 63.67
>
> Compaction success rate (Compaction success * 100 / Compaction stalls, %)
> 28.94 : 35.13
>
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> include/linux/mmzone.h | 3 +++
> mm/compaction.c | 31 +++++++++++++++++++++++++++++--
> mm/internal.h | 1 +
> mm/page_alloc.c | 5 ++---
> 4 files changed, 35 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 2f0856d..87f5bb5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -63,6 +63,9 @@ enum {
> MIGRATE_TYPES
> };
>
> +#define FALLBACK_MIGRATETYPES (4)
> +extern int fallbacks[MIGRATE_TYPES][FALLBACK_MIGRATETYPES];
> +
> #ifdef CONFIG_CMA
> # define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
> #else
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 1a5f465..2fd5f79 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1054,6 +1054,30 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
> }
>
> +static bool can_steal_fallbacks(struct free_area *area,
> + unsigned int order, int migratetype)
> +{
> + int i;
> + int fallback_mt;
> +
> + if (area->nr_free == 0)
> + return false;
> +
> + for (i = 0; i < FALLBACK_MIGRATETYPES; i++) {
> + fallback_mt = fallbacks[migratetype][i];
> + if (fallback_mt == MIGRATE_RESERVE)
> + break;
> +
> + if (list_empty(&area->free_list[fallback_mt]))
> + continue;
> +
> + if (can_steal_freepages(order, migratetype, fallback_mt))
> + return true;
> + }
> +
> + return false;
> +}
> +
> static int __compact_finished(struct zone *zone, struct compact_control *cc,
> const int migratetype)
> {
> @@ -1104,8 +1128,11 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
> if (!list_empty(&area->free_list[migratetype]))
> return COMPACT_PARTIAL;
>
> - /* Job done if allocation would set block type */
> - if (order >= pageblock_order && area->nr_free)
So, going through can_steal_fallbacks() -> can_steal_freepages() is quite an
involved way to get there if, in the end, we just realize that order >=
pageblock_order and we are stealing the whole pageblock. Given that
compaction is often done for THP, it would be better to check order >=
pageblock_order and handle that case upfront. This goes together with my
comments on the previous patch that order >= pageblock_order is better
handled separately.
> + /*
> + * Job done if allocation would steal freepages from
> + * other migratetype buddy lists.
> + */
> + if (can_steal_fallbacks(area, order, migratetype))
> return COMPACT_PARTIAL;
> }
>
> diff --git a/mm/internal.h b/mm/internal.h
> index efad241..7028d83 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -179,6 +179,7 @@ unsigned long
> isolate_migratepages_range(struct compact_control *cc,
> unsigned long low_pfn, unsigned long end_pfn);
>
> +bool can_steal_freepages(unsigned int order, int start_mt, int fallback_mt);
> #endif
>
> /*
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7b4c9aa..dcb8523 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1031,7 +1031,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
> * This array describes the order lists are fallen back to when
> * the free lists for the desirable migrate type are depleted
> */
> -static int fallbacks[MIGRATE_TYPES][4] = {
> +int fallbacks[MIGRATE_TYPES][FALLBACK_MIGRATETYPES] = {
> [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
> [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
> #ifdef CONFIG_CMA
> @@ -1161,8 +1161,7 @@ static void try_to_steal_freepages(struct zone *zone, struct page *page,
> }
> }
>
> -static bool can_steal_freepages(unsigned int order,
> - int start_mt, int fallback_mt)
> +bool can_steal_freepages(unsigned int order, int start_mt, int fallback_mt)
> {
> /*
> * When borrowing from MIGRATE_CMA, we need to release the excess
>
On 12/08/2014 08:16 AM, Joonsoo Kim wrote:
> From: Joonsoo Kim <[email protected]>
>
> Currently, freepage isolation in one pageblock doesn't consider how many
> freepages we isolate. When I traced flow of compaction, compaction
> sometimes isolates more than 256 freepages to migrate just 32 pages.
>
> In this patch, freepage isolation is stopped at the point that we
> have more isolated freepage than isolated page for migration. This
> results in slowing down free page scanner and make compaction success
> rate higher.
>
> stress-highalloc test in mmtests with non movable order 7 allocation shows
> increase of compaction success rate and slight improvement of allocation
> success rate.
>
> Allocation success rate on phase 1 (%)
> 62.70 : 64.00
>
> Compaction success rate (Compaction success * 100 / Compaction stalls, %)
> 35.13 : 41.50
This is weird. I could maybe understand that isolating too many freepages and
then returning them is a waste of time if compaction terminates immediately
after the following migration (otherwise we would keep those free pages for
future migrations within the same compaction run). And wasting time could
reduce success rates for async compaction terminating prematurely due to
cond_resched(), but that should be all the difference, unless there's another
subtle bug, no?
> pfn where both scanners meets on compaction complete
> (separate test due to enormous tracepoint buffer)
> (zone_start=4096, zone_end=1048576)
> 586034 : 654378
The difference here suggests that there is indeed another subtle bug related
to where the free scanner restarts, and that we must be leaving the
excessively isolated (and then returned) freepages behind. Otherwise I think
the scanners should meet at the same place regardless of your patch.
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> mm/compaction.c | 17 ++++++++++-------
> 1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 2fd5f79..12223b9 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -422,6 +422,13 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>
> /* If a page was split, advance to the end of it */
> if (isolated) {
> + cc->nr_freepages += isolated;
> + if (!strict &&
> + cc->nr_migratepages <= cc->nr_freepages) {
> + blockpfn += isolated;
> + break;
> + }
> +
> blockpfn += isolated - 1;
> cursor += isolated - 1;
> continue;
> @@ -831,7 +838,6 @@ static void isolate_freepages(struct compact_control *cc)
> unsigned long isolate_start_pfn; /* exact pfn we start at */
> unsigned long block_end_pfn; /* end of current pageblock */
> unsigned long low_pfn; /* lowest pfn scanner is able to scan */
> - int nr_freepages = cc->nr_freepages;
> struct list_head *freelist = &cc->freepages;
>
> /*
> @@ -856,11 +862,11 @@ static void isolate_freepages(struct compact_control *cc)
> * pages on cc->migratepages. We stop searching if the migrate
> * and free page scanners meet or enough free pages are isolated.
> */
> - for (; block_start_pfn >= low_pfn && cc->nr_migratepages > nr_freepages;
> + for (; block_start_pfn >= low_pfn &&
> + cc->nr_migratepages > cc->nr_freepages;
> block_end_pfn = block_start_pfn,
> block_start_pfn -= pageblock_nr_pages,
> isolate_start_pfn = block_start_pfn) {
> - unsigned long isolated;
>
> /*
> * This can iterate a massively long zone without finding any
> @@ -885,9 +891,8 @@ static void isolate_freepages(struct compact_control *cc)
> continue;
>
> /* Found a block suitable for isolating free pages from. */
> - isolated = isolate_freepages_block(cc, &isolate_start_pfn,
> + isolate_freepages_block(cc, &isolate_start_pfn,
> block_end_pfn, freelist, false);
> - nr_freepages += isolated;
>
> /*
> * Remember where the free scanner should restart next time,
> @@ -919,8 +924,6 @@ static void isolate_freepages(struct compact_control *cc)
> */
> if (block_start_pfn < low_pfn)
> cc->free_pfn = cc->migrate_pfn;
> -
> - cc->nr_freepages = nr_freepages;
> }
>
> /*
>
On Mon, Dec 08, 2014 at 10:16:34AM +0100, Vlastimil Babka wrote:
> On 12/08/2014 08:16 AM, Joonsoo Kim wrote:
> >This patchset aims at increase of compaction success rate. Changes are
> >related to compaction finish condition and freepage isolation condition.
> >
> > From these changes, I did stress highalloc test in mmtests with nonmovable
> >order 7 allocation configuration, and success rate (%) at phase 1 are,
> >
> >Base Patch-1 Patch-3 Patch-4
> >55.00 57.00 62.67 64.00
> >
> >And, compaction success rate (%) on same test are,
> >
> >Base Patch-1 Patch-3 Patch-4
> >18.47 28.94 35.13 41.50
>
> Did you test Patch-2 separately? Any difference to Patch 1?
I didn't test it separately. I guess there is no notable difference because
it only slightly changes the page-stealing logic, not the compaction logic.
The compaction success rate would not be affected by patch 2, but I will
check it next time.
Thanks.
On Mon, Dec 08, 2014 at 10:29:44AM +0100, Vlastimil Babka wrote:
> On 12/08/2014 08:16 AM, Joonsoo Kim wrote:
> >There is odd behaviour when we steal freepages from other migratetype
> >buddy list. In try_to_steal_freepages(), we move all freepages in
> >the pageblock that founded freepage is belong to to the request
> >migratetype in order to mitigate fragmentation. If the number of moved
> >pages are enough to change pageblock migratetype, there is no problem. If
> >not enough, we don't change pageblock migratetype and add broken freepages
> >to the original migratetype buddy list rather than request migratetype
> >one. For me, this is odd, because we already moved all freepages in this
> >pageblock to the request migratetype. This patch fixes this situation to
> >add broken freepages to the request migratetype buddy list in this case.
>
> I'd rather split the fix from the refactoring. And maybe my
> description is longer, but easier to understand? (I guess somebody
> else should judge this)
Your patch is much easier to understand than mine. :)
No need for somebody else to judge.
After your patch is merged, I will resubmit these on top of it.
>
> >This patch introduce new function that can help to decide if we can
> >steal the page without resulting in fragmentation. It will be used in
> >following patch for compaction finish criteria.
> >
> >Signed-off-by: Joonsoo Kim <[email protected]>
> >---
> > include/trace/events/kmem.h | 7 +++--
> > mm/page_alloc.c | 72 +++++++++++++++++++++++++------------------
> > 2 files changed, 46 insertions(+), 33 deletions(-)
> >
> >diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
> >index aece134..4ad10ba 100644
> >--- a/include/trace/events/kmem.h
> >+++ b/include/trace/events/kmem.h
> >@@ -268,11 +268,11 @@ TRACE_EVENT(mm_page_alloc_extfrag,
> >
> > TP_PROTO(struct page *page,
> > int alloc_order, int fallback_order,
> >- int alloc_migratetype, int fallback_migratetype, int new_migratetype),
> >+ int alloc_migratetype, int fallback_migratetype),
> >
> > TP_ARGS(page,
> > alloc_order, fallback_order,
> >- alloc_migratetype, fallback_migratetype, new_migratetype),
> >+ alloc_migratetype, fallback_migratetype),
> >
> > TP_STRUCT__entry(
> > __field( struct page *, page )
> >@@ -289,7 +289,8 @@ TRACE_EVENT(mm_page_alloc_extfrag,
> > __entry->fallback_order = fallback_order;
> > __entry->alloc_migratetype = alloc_migratetype;
> > __entry->fallback_migratetype = fallback_migratetype;
> >- __entry->change_ownership = (new_migratetype == alloc_migratetype);
> >+ __entry->change_ownership = (alloc_migratetype ==
> >+ get_pageblock_migratetype(page));
> > ),
> >
> > TP_printk("page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d",
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index 7c46d0f..7b4c9aa 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -1139,44 +1139,50 @@ static void change_pageblock_range(struct page *pageblock_page,
> > * Returns the new migratetype of the pageblock (or the same old migratetype
> > * if it was unchanged).
> > */
> >-static int try_to_steal_freepages(struct zone *zone, struct page *page,
> >- int start_type, int fallback_type)
> >+static void try_to_steal_freepages(struct zone *zone, struct page *page,
> >+ int target_mt)
> > {
> >+ int pages;
> > int current_order = page_order(page);
> >
> >- /*
> >- * When borrowing from MIGRATE_CMA, we need to release the excess
> >- * buddy pages to CMA itself. We also ensure the freepage_migratetype
> >- * is set to CMA so it is returned to the correct freelist in case
> >- * the page ends up being not actually allocated from the pcp lists.
> >- */
> >- if (is_migrate_cma(fallback_type))
> >- return fallback_type;
> >-
> > /* Take ownership for orders >= pageblock_order */
> > if (current_order >= pageblock_order) {
> >- change_pageblock_range(page, current_order, start_type);
> >- return start_type;
> >+ change_pageblock_range(page, current_order, target_mt);
> >+ return;
>
> So here's a (current_order >= pageblock_order) check.
>
> > }
> >
> >- if (current_order >= pageblock_order / 2 ||
> >- start_type == MIGRATE_RECLAIMABLE ||
> >- page_group_by_mobility_disabled) {
> >- int pages;
> >+ pages = move_freepages_block(zone, page, target_mt);
> >
> >- pages = move_freepages_block(zone, page, start_type);
> >+ /* Claim the whole block if over half of it is free */
> >+ if (pages >= (1 << (pageblock_order-1)) ||
> >+ page_group_by_mobility_disabled) {
> >
> >- /* Claim the whole block if over half of it is free */
> >- if (pages >= (1 << (pageblock_order-1)) ||
> >- page_group_by_mobility_disabled) {
> >+ set_pageblock_migratetype(page, target_mt);
> >+ }
> >+}
> >
> >- set_pageblock_migratetype(page, start_type);
> >- return start_type;
> >- }
> >+static bool can_steal_freepages(unsigned int order,
> >+ int start_mt, int fallback_mt)
> >+{
> >+ /*
> >+ * When borrowing from MIGRATE_CMA, we need to release the excess
> >+ * buddy pages to CMA itself. We also ensure the freepage_migratetype
> >+ * is set to CMA so it is returned to the correct freelist in case
> >+ * the page ends up being not actually allocated from the pcp lists.
> >+ */
> >+ if (is_migrate_cma(fallback_mt))
> >+ return false;
> >
> >- }
> >+ /* Can take ownership for orders >= pageblock_order */
> >+ if (order >= pageblock_order)
> >+ return true;
>
> And another check.
>
> >+
> >+ if (order >= pageblock_order / 2 ||
> >+ start_mt == MIGRATE_RECLAIMABLE ||
> >+ page_group_by_mobility_disabled)
> >+ return true;
> >
> >- return fallback_type;
> >+ return false;
> > }
> >
> > /* Remove an element from the buddy allocator from the fallback list */
> >@@ -1187,6 +1193,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
> > unsigned int current_order;
> > struct page *page;
> > int migratetype, new_type, i;
> >+ bool can_steal;
> >
> > /* Find the largest possible block of pages in the other list */
> > for (current_order = MAX_ORDER-1;
> >@@ -1194,6 +1201,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
> > --current_order) {
> > for (i = 0;; i++) {
> > migratetype = fallbacks[start_migratetype][i];
> >+ new_type = migratetype;
> >
> > /* MIGRATE_RESERVE handled later if necessary */
> > if (migratetype == MIGRATE_RESERVE)
> >@@ -1207,9 +1215,13 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
> > struct page, lru);
> > area->nr_free--;
>
> So wouldn't it be better to handle the "order >= pageblock_order"
> case separately at this level? I think it would be better also for
> the compaction case (I'll comment on the later patch why).
I will also comment on the later patch.
Thanks.
>
> >- new_type = try_to_steal_freepages(zone, page,
> >- start_migratetype,
> >- migratetype);
> >+ can_steal = can_steal_freepages(current_order,
> >+ start_migratetype, migratetype);
> >+ if (can_steal) {
> >+ new_type = start_migratetype;
> >+ try_to_steal_freepages(zone, page,
> >+ start_migratetype);
> >+ }
> >
> > /* Remove the page from the freelists */
> > list_del(&page->lru);
> >@@ -1225,7 +1237,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
> > set_freepage_migratetype(page, new_type);
> >
> > trace_mm_page_alloc_extfrag(page, order, current_order,
> >- start_migratetype, migratetype, new_type);
> >+ start_migratetype, migratetype);
> >
> > return page;
> > }
> >
>
On Mon, Dec 08, 2014 at 10:34:05AM +0100, Vlastimil Babka wrote:
> On 12/08/2014 08:16 AM, Joonsoo Kim wrote:
> >Compaction has anti fragmentation algorithm. It is that freepage
> >should be more than pageblock order to finish the compaction if we don't
> >find any freepage in requested migratetype buddy list. This is for
> >mitigating fragmentation, but, it is a lack of migratetype consideration
> >and too excessive.
> >
> >At first, it doesn't consider migratetype so there would be false positive
> >on compaction finish decision. For example, if allocation request is
> >for unmovable migratetype, freepage in CMA migratetype doesn't help that
> >allocation, so compaction should not be stopped. But, current logic
> >considers it as compaction is no longer needed and stop the compaction.
> >
> >Secondly, it is too excessive. We can steal freepage from other migratetype
> >and change pageblock migratetype on more relaxed conditions. In page
> >allocator, there is another conditions that can succeed to steal without
> >introducing fragmentation.
> >
> >To solve these problems, this patch borrows anti fragmentation logic from
> >page allocator. It will reduce premature compaction finish in some cases
> >and reduce excessive compaction work.
> >
> >stress-highalloc test in mmtests with non movable order 7 allocation shows
> >in allocation success rate on phase 1 and compaction success rate.
> >
> >Allocation success rate on phase 1 (%)
> >57.00 : 63.67
> >
> >Compaction success rate (Compaction success * 100 / Compaction stalls, %)
> >28.94 : 35.13
> >
> >Signed-off-by: Joonsoo Kim <[email protected]>
> >---
> > include/linux/mmzone.h | 3 +++
> > mm/compaction.c | 31 +++++++++++++++++++++++++++++--
> > mm/internal.h | 1 +
> > mm/page_alloc.c | 5 ++---
> > 4 files changed, 35 insertions(+), 5 deletions(-)
> >
> >diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >index 2f0856d..87f5bb5 100644
> >--- a/include/linux/mmzone.h
> >+++ b/include/linux/mmzone.h
> >@@ -63,6 +63,9 @@ enum {
> > MIGRATE_TYPES
> > };
> >
> >+#define FALLBACK_MIGRATETYPES (4)
> >+extern int fallbacks[MIGRATE_TYPES][FALLBACK_MIGRATETYPES];
> >+
> > #ifdef CONFIG_CMA
> > # define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
> > #else
> >diff --git a/mm/compaction.c b/mm/compaction.c
> >index 1a5f465..2fd5f79 100644
> >--- a/mm/compaction.c
> >+++ b/mm/compaction.c
> >@@ -1054,6 +1054,30 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> > return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
> > }
> >
> >+static bool can_steal_fallbacks(struct free_area *area,
> >+ unsigned int order, int migratetype)
> >+{
> >+ int i;
> >+ int fallback_mt;
> >+
> >+ if (area->nr_free == 0)
> >+ return false;
> >+
> >+ for (i = 0; i < FALLBACK_MIGRATETYPES; i++) {
> >+ fallback_mt = fallbacks[migratetype][i];
> >+ if (fallback_mt == MIGRATE_RESERVE)
> >+ break;
> >+
> >+ if (list_empty(&area->free_list[fallback_mt]))
> >+ continue;
> >+
> >+ if (can_steal_freepages(order, migratetype, fallback_mt))
> >+ return true;
> >+ }
> >+
> >+ return false;
> >+}
> >+
> > static int __compact_finished(struct zone *zone, struct compact_control *cc,
> > const int migratetype)
> > {
> >@@ -1104,8 +1128,11 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
> > if (!list_empty(&area->free_list[migratetype]))
> > return COMPACT_PARTIAL;
> >
> >- /* Job done if allocation would set block type */
> >- if (order >= pageblock_order && area->nr_free)
>
> So, can_steal_fallbacks() -> can_steal_freepages() is quite involved
> way if in the end we just realize that order >= pageblock_order and
> we are stealing whole pageblock. Given that often compaction is done
> for THP, it would be better to check order >= pageblock_order and
> handle it upfront. This goes together with my comments on previous
> patch that order >= pageblock_order is better handled separately.
I'd like to keep this order check in can_steal_freepages(). First, we should
check the migratetype before checking the order: if the high-order page is on
CMA, we can't steal it. Second, I think that maintaining a well-defined
function that decides whether we can steal is better than splitting the
logic. It will help future maintenance.
Thanks.
>
> >+ /*
> >+ * Job done if allocation would steal freepages from
> >+ * other migratetype buddy lists.
> >+ */
> >+ if (can_steal_fallbacks(area, order, migratetype))
> > return COMPACT_PARTIAL;
> > }
> >
> >diff --git a/mm/internal.h b/mm/internal.h
> >index efad241..7028d83 100644
> >--- a/mm/internal.h
> >+++ b/mm/internal.h
> >@@ -179,6 +179,7 @@ unsigned long
> > isolate_migratepages_range(struct compact_control *cc,
> > unsigned long low_pfn, unsigned long end_pfn);
> >
> >+bool can_steal_freepages(unsigned int order, int start_mt, int fallback_mt);
> > #endif
> >
> > /*
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index 7b4c9aa..dcb8523 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -1031,7 +1031,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
> > * This array describes the order lists are fallen back to when
> > * the free lists for the desirable migrate type are depleted
> > */
> >-static int fallbacks[MIGRATE_TYPES][4] = {
> >+int fallbacks[MIGRATE_TYPES][FALLBACK_MIGRATETYPES] = {
> > [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
> > [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
> > #ifdef CONFIG_CMA
> >@@ -1161,8 +1161,7 @@ static void try_to_steal_freepages(struct zone *zone, struct page *page,
> > }
> > }
> >
> >-static bool can_steal_freepages(unsigned int order,
> >- int start_mt, int fallback_mt)
> >+bool can_steal_freepages(unsigned int order, int start_mt, int fallback_mt)
> > {
> > /*
> > * When borrowing from MIGRATE_CMA, we need to release the excess
> >
>
On Mon, Dec 08, 2014 at 10:59:17AM +0100, Vlastimil Babka wrote:
> On 12/08/2014 08:16 AM, Joonsoo Kim wrote:
> >From: Joonsoo Kim <[email protected]>
> >
> >Currently, freepage isolation in one pageblock doesn't consider how many
> >freepages we isolate. When I traced flow of compaction, compaction
> >sometimes isolates more than 256 freepages to migrate just 32 pages.
> >
> >In this patch, freepage isolation is stopped at the point that we
> >have more isolated freepage than isolated page for migration. This
> >results in slowing down free page scanner and make compaction success
> >rate higher.
> >
> >stress-highalloc test in mmtests with non movable order 7 allocation shows
> >increase of compaction success rate and slight improvement of allocation
> >success rate.
> >
> >Allocation success rate on phase 1 (%)
> >62.70 : 64.00
> >
> >Compaction success rate (Compaction success * 100 / Compaction stalls, %)
> >35.13 : 41.50
>
> This is weird. I could maybe understand that isolating too many
In fact, I also don't fully understand why it produces this result. :)
> freepages and then returning them is a waste of time if compaction
> terminates immediately after the following migration (otherwise we
> would keep those free pages for the future migrations within same
> compaction run). And wasting time could reduce success rates for
> async compaction terminating prematurely due to cond_resched(), but
> that should be all the difference, unless there's another subtle
> bug, no?
My guess is that releasing the isolated freepages has a bad effect. In
asynchronous compaction this happens quite easily. In that case the freepages
are returned to the page allocator and may end up on a pcp list or at the
front of a buddy list, so they get used by other users first. This reduces
the freepages we can utilize, so compaction finishes earlier.
>
> >pfn where both scanners meets on compaction complete
> >(separate test due to enormous tracepoint buffer)
> >(zone_start=4096, zone_end=1048576)
> >586034 : 654378
>
> The difference here suggests that there is indeed another subtle bug
> related to where free scanner restarts, and we must be leaving the
> excessively isolated (and then returned) freepages behind. Otherwise
> I think the scanners should meet at the same place regardless of
> your patch.
I tried to find another subtle bug, but couldn't find any critical one.
Hmm...
Anyway, regardless of the reason for the result, this patch seems reasonable,
because we shouldn't waste time isolating freepages we don't need.
Thanks.
>
> >Signed-off-by: Joonsoo Kim <[email protected]>
> >---
> > mm/compaction.c | 17 ++++++++++-------
> > 1 file changed, 10 insertions(+), 7 deletions(-)
> >
> >diff --git a/mm/compaction.c b/mm/compaction.c
> >index 2fd5f79..12223b9 100644
> >--- a/mm/compaction.c
> >+++ b/mm/compaction.c
> >@@ -422,6 +422,13 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
> >
> > /* If a page was split, advance to the end of it */
> > if (isolated) {
> >+ cc->nr_freepages += isolated;
> >+ if (!strict &&
> >+ cc->nr_migratepages <= cc->nr_freepages) {
> >+ blockpfn += isolated;
> >+ break;
> >+ }
> >+
> > blockpfn += isolated - 1;
> > cursor += isolated - 1;
> > continue;
> >@@ -831,7 +838,6 @@ static void isolate_freepages(struct compact_control *cc)
> > unsigned long isolate_start_pfn; /* exact pfn we start at */
> > unsigned long block_end_pfn; /* end of current pageblock */
> > unsigned long low_pfn; /* lowest pfn scanner is able to scan */
> >- int nr_freepages = cc->nr_freepages;
> > struct list_head *freelist = &cc->freepages;
> >
> > /*
> >@@ -856,11 +862,11 @@ static void isolate_freepages(struct compact_control *cc)
> > * pages on cc->migratepages. We stop searching if the migrate
> > * and free page scanners meet or enough free pages are isolated.
> > */
> >- for (; block_start_pfn >= low_pfn && cc->nr_migratepages > nr_freepages;
> >+ for (; block_start_pfn >= low_pfn &&
> >+ cc->nr_migratepages > cc->nr_freepages;
> > block_end_pfn = block_start_pfn,
> > block_start_pfn -= pageblock_nr_pages,
> > isolate_start_pfn = block_start_pfn) {
> >- unsigned long isolated;
> >
> > /*
> > * This can iterate a massively long zone without finding any
> >@@ -885,9 +891,8 @@ static void isolate_freepages(struct compact_control *cc)
> > continue;
> >
> > /* Found a block suitable for isolating free pages from. */
> >- isolated = isolate_freepages_block(cc, &isolate_start_pfn,
> >+ isolate_freepages_block(cc, &isolate_start_pfn,
> > block_end_pfn, freelist, false);
> >- nr_freepages += isolated;
> >
> > /*
> > * Remember where the free scanner should restart next time,
> >@@ -919,8 +924,6 @@ static void isolate_freepages(struct compact_control *cc)
> > */
> > if (block_start_pfn < low_pfn)
> > cc->free_pfn = cc->migrate_pfn;
> >-
> >- cc->nr_freepages = nr_freepages;
> > }
> >
> > /*
> >
>
On 12/10/2014 07:38 AM, Joonsoo Kim wrote:
> On Mon, Dec 08, 2014 at 10:29:44AM +0100, Vlastimil Babka wrote:
>> On 12/08/2014 08:16 AM, Joonsoo Kim wrote:
>>> There is odd behaviour when we steal freepages from other migratetype
>>> buddy list. In try_to_steal_freepages(), we move all freepages in
>>> the pageblock that founded freepage is belong to to the request
>>> migratetype in order to mitigate fragmentation. If the number of moved
>>> pages are enough to change pageblock migratetype, there is no problem. If
>>> not enough, we don't change pageblock migratetype and add broken freepages
>>> to the original migratetype buddy list rather than request migratetype
>>> one. For me, this is odd, because we already moved all freepages in this
>>> pageblock to the request migratetype. This patch fixes this situation to
>>> add broken freepages to the request migratetype buddy list in this case.
>>
>> I'd rather split the fix from the refactoring. And maybe my
>> description is longer, but easier to understand? (I guess somebody
>> else should judge this)
>
> Your patch is much easier to understand than mine. :)
> No need for somebody else to judge.
> After your patch is merged, I will resubmit these on top of it.
Thanks. I'm doing another evaluation focusing on the number of unmovable
pageblocks, as Mel suggested, and will then resubmit with the tracepoint fixed.
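For that evaluation, a small userspace helper along these lines can total the
per-migratetype pageblock counts from /proc/pagetypeinfo (hypothetical sketch,
not part of any patch here; it takes the column names from the "Number of
blocks type" header instead of hardcoding the migratetype order):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/pagetypeinfo", "r");
	char line[512], names[16][32];
	long totals[16] = { 0 };
	int ncols = 0, in_table = 0, i;

	if (!f) {
		perror("/proc/pagetypeinfo");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, "Number of blocks type", 21)) {
			/* header row carries the migratetype names */
			char *tok = strtok(line + 21, " \t\n");

			ncols = 0;
			while (tok && ncols < 16) {
				snprintf(names[ncols++], 32, "%s", tok);
				tok = strtok(NULL, " \t\n");
			}
			in_table = 1;
		} else if (in_table && !strncmp(line, "Node", 4)) {
			/* "Node 0, zone   Normal  <count per type>" */
			char *tok = strtok(line, " \t\n");
			int field = 0, col = 0;

			while (tok) {
				/* skip "Node", "0,", "zone", zone name */
				if (++field > 4 && col < ncols)
					totals[col++] += atol(tok);
				tok = strtok(NULL, " \t\n");
			}
		}
	}
	fclose(f);

	for (i = 0; i < ncols; i++)
		printf("%-12s %ld pageblocks\n", names[i], totals[i]);

	return 0;
}

Running it before and after a stress-highalloc run and diffing the Unmovable
line gives the pageblock count comparison.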
Vlastimil
On 12/10/2014 08:00 AM, Joonsoo Kim wrote:
> On Mon, Dec 08, 2014 at 10:59:17AM +0100, Vlastimil Babka wrote:
>> On 12/08/2014 08:16 AM, Joonsoo Kim wrote:
>>> From: Joonsoo Kim <[email protected]>
>>>
>>> Currently, freepage isolation in one pageblock doesn't consider how many
>>> freepages we isolate. When I traced the flow of compaction, compaction
>>> sometimes isolates more than 256 freepages to migrate just 32 pages.
>>>
>>> In this patch, freepage isolation is stopped at the point where we have
>>> isolated more freepages than pages for migration. This slows down the
>>> free page scanner and makes the compaction success rate higher.
>>>
>>> The stress-highalloc test in mmtests with non-movable order-7 allocation
>>> shows an increase in compaction success rate and a slight improvement in
>>> allocation success rate.
>>>
>>> Allocation success rate on phase 1 (%)
>>> 62.70 : 64.00
>>>
>>> Compaction success rate (Compaction success * 100 / Compaction stalls, %)
>>> 35.13 : 41.50
>>
>> This is weird. I could maybe understand that isolating too many
>
> In fact, I also didn't fully understand why it produces this
> result. :)
>
>> freepages and then returning them is a waste of time if compaction
>> terminates immediately after the following migration (otherwise we
>> would keep those free pages for the future migrations within the same
>> compaction run). And wasting time could reduce success rates for
>> async compaction terminating prematurely due to cond_resched(), but
>> that should be all the difference, unless there's another subtle
>> bug, no?
>
> My guess is that there is a bad effect when we release isolated
> freepages. In asynchronous compaction, this happens quite easily.
> In this case, the freepages are returned to the page allocator and, maybe,
> they end up on a pcp list or at the front of a buddy list, so they would be
> used by another user first. This reduces the freepages we can utilize, so
> compaction finishes earlier.
Hmm, some might even stay on the pcplists and we won't isolate them
again. So we will leave them behind. I wouldn't expect such a big
difference here, but anyway...
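For reference, the tail of the order-0 free path those returned pages take
(a heavily trimmed sketch of free_hot_cold_page() from this era; locking and
statistics elided, so treat it as pseudocode):

	if (!cold)
		/* hot free: head of this CPU's pcplist, handed out first */
		list_add(&page->lru, &pcp->lists[migratetype]);
	else
		list_add_tail(&page->lru, &pcp->lists[migratetype]);
	pcp->count++;
	/* pages only reach the buddy lists once the pcplist overflows */
	if (pcp->count >= pcp->high)
		free_pcppages_bulk(zone, pcp->batch, pcp);

So whatever compaction hands back is either recycled LIFO by the next order-0
allocations on that CPU, or stays parked on the pcplist out of reach of the
free scanner.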
It might be interesting to evaluate if a pcplists drain after returning
isolated freepages (unless the scanners have already met, that's
pointless) would make any difference.
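Roughly something like this at the end of compact_zone(), where the
isolated-but-unused freepages are given back (untested sketch;
drain_local_pages(NULL) is an assumption, since the pcplist drain API is being
reworked around this time):

out:
	/* give isolated-but-unused freepages back to the allocator */
	cc->nr_freepages -= release_freepages(&cc->freepages);
	VM_BUG_ON(cc->nr_freepages != 0);

	/*
	 * The returned pages now sit on this CPU's pcplists, where the free
	 * scanner cannot isolate them again.  If the scanners have not met
	 * yet, flush them back to the buddy lists so a later pass can pick
	 * them up.
	 */
	if (cc->free_pfn > cc->migrate_pfn)
		drain_local_pages(NULL);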
>>
>>> pfn where both scanners meet on compaction complete
>>> (separate test due to enormous tracepoint buffer)
>>> (zone_start=4096, zone_end=1048576)
>>> 586034 : 654378
>>
>> The difference here suggests that there is indeed another subtle bug
>> related to where free scanner restarts, and we must be leaving the
>> excessively isolated (and then returned) freepages behind. Otherwise
>> I think the scanners should meet at the same place regardless of
>> your patch.
>
> I tried to find another subtle bug, but couldn't find any critical one.
> Hmm...
>
> Anyway, regardless of the reason for the result, this patch seems reasonable,
> because we don't need to waste time isolating unneeded freepages.
Right.
> Thanks.
>
On Wed, Dec 10, 2014 at 04:19:13PM +0100, Vlastimil Babka wrote:
>
> Hmm, some might even stay on the pcplists and we won't isolate them
> again. So we will leave them behind. I wouldn't expect such a big
> difference here, but anyway...
> It might be interesting to evaluate if a pcplists drain after
> returning isolated freepages (unless the scanners have already met,
> that's pointless) would make any difference.
Yes, I will check it.
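As an aside, the compaction success rate used throughout this thread
(compaction successes * 100 / compaction stalls) can be read straight from
/proc/vmstat on a kernel built with CONFIG_COMPACTION; a minimal sketch:

#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/vmstat", "r");
	char name[64];
	unsigned long long val, stall = 0, success = 0;

	if (!f) {
		perror("/proc/vmstat");
		return 1;
	}
	/* each line is "<counter name> <value>" */
	while (fscanf(f, "%63s %llu", name, &val) == 2) {
		if (!strcmp(name, "compact_stall"))
			stall = val;
		else if (!strcmp(name, "compact_success"))
			success = val;
	}
	fclose(f);

	if (stall)
		printf("compaction success rate: %.2f%%\n",
		       100.0 * success / stall);
	else
		printf("no compaction stalls recorded\n");

	return 0;
}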