2016-03-31 08:51:41

by Vlastimil Babka

Subject: [PATCH v2 0/4] reduce latency of direct async compaction

The goal here is to reduce latency (and increase success) of direct async
compaction by making it focus more on the goal of creating a high-order page,
at some expense of thoroughness.

This is based on an older attempt [1] which I didn't finish as it seemed that
it increased longer-term fragmentation. Now it seems it doesn't, and we have
kcompactd for that goal. The main patch (3) makes migration scanner skip whole
order-aligned blocks as soon as isolation fails in them, as it takes just one
unmigrated page to prevent a high-order buddy page from fully merging.

Patch 4 then attempts to reduce the excessive freepage scanning (such as
reported in [2]) by allocating migration targets directly from freelists. Here
we just need to be sure that the free pages are not from the same block as the
migrated pages. This is also limited to direct async compaction and is not
meant to replace the more thorough free scanner for other scenarios.

[1] https://lkml.org/lkml/2014/7/16/988
[2] http://www.spinics.net/lists/linux-mm/msg97475.html

Testing was done using stress-highalloc from mmtests, configured for order-4
GFP_KERNEL allocations:

4.6-rc1 4.6-rc1 4.6-rc1
patch2 patch3 patch4
Success 1 Min 24.00 ( 0.00%) 27.00 (-12.50%) 43.00 (-79.17%)
Success 1 Mean 30.20 ( 0.00%) 31.60 ( -4.64%) 51.60 (-70.86%)
Success 1 Max 37.00 ( 0.00%) 35.00 ( 5.41%) 73.00 (-97.30%)
Success 2 Min 42.00 ( 0.00%) 32.00 ( 23.81%) 73.00 (-73.81%)
Success 2 Mean 44.00 ( 0.00%) 44.80 ( -1.82%) 78.00 (-77.27%)
Success 2 Max 48.00 ( 0.00%) 52.00 ( -8.33%) 81.00 (-68.75%)
Success 3 Min 91.00 ( 0.00%) 92.00 ( -1.10%) 88.00 ( 3.30%)
Success 3 Mean 92.20 ( 0.00%) 92.80 ( -0.65%) 91.00 ( 1.30%)
Success 3 Max 94.00 ( 0.00%) 93.00 ( 1.06%) 94.00 ( 0.00%)

While the eager skipping of unsuitable blocks from patch 3 didn't affect
success rates, direct freepage allocation did improve them.

4.6-rc1 4.6-rc1 4.6-rc1
patch2 patch3 patch4
User 2587.42 2566.53 2413.57
System 482.89 471.20 461.71
Elapsed 1395.68 1382.00 1392.87

Times are not such a useful metric for this benchmark, as the main portion is the
interfering kernel builds, but the results do hint at reduced system times.

4.6-rc1 4.6-rc1 4.6-rc1
patch2 patch3 patch4
Direct pages scanned 163614 159608 123385
Kswapd pages scanned 2070139 2078790 2081385
Kswapd pages reclaimed 2061707 2069757 2073723
Direct pages reclaimed 163354 159505 122304

Reduced direct reclaim was unintended, but could be explained by more
successful first attempt at (async) direct compaction, which is attempted
before the first reclaim attempt in __alloc_pages_slowpath().

Compaction stalls 33052 39853 55091
Compaction success 12121 19773 37875
Compaction failures 20931 20079 17216

Compaction is indeed more successful, and thus less likely to get deferred,
so there are also more direct compaction stalls.

Page migrate success 3781876 3326819 2790838
Page migrate failure 45817 41774 38113
Compaction pages isolated 7868232 6941457 5025092
Compaction migrate scanned 168160492 127269354 87087993
Compaction migrate prescanned 0 0 0
Compaction free scanned 2522142582 2326342620 743205879
Compaction free direct alloc 0 0 920792
Compaction free dir. all. miss 0 0 5865
Compaction cost 5252 4476 3602

Patch 2 reduces migration scanned pages by 25% thanks to the eager skipping.
Patch 3 reduces free scanned pages by 70%. The portion of direct allocation
misses to all direct allocations is less than 1% which should be acceptable.
Interestingly, patch 3 also reduces migration scanned pages by another 30% on
top of patch 2. The reason is not clear, but we can rejoice nevertheless.

Vlastimil Babka (4):
mm, compaction: wrap calculating first and last pfn of pageblock
mm, compaction: reduce spurious pcplist drains
mm, compaction: skip blocks where isolation fails in async direct
compaction
mm, compaction: direct freepage allocation for async direct compaction

include/linux/vm_event_item.h | 1 +
mm/compaction.c | 189 ++++++++++++++++++++++++++++++++++--------
mm/internal.h | 5 ++
mm/page_alloc.c | 27 ++++++
mm/vmstat.c | 2 +
5 files changed, 191 insertions(+), 33 deletions(-)

--
2.7.3


2016-03-31 08:51:40

by Vlastimil Babka

Subject: [PATCH v2 3/4] mm, compaction: skip blocks where isolation fails in async direct compaction

The goal of direct compaction is to quickly make a high-order page available
for the pending allocation. Within an aligned block of pages of desired order,
a single allocated page that cannot be isolated for migration means that the
block cannot fully merge to a buddy page that would satisfy the allocation
request. Therefore we can reduce the allocation stall by skipping the rest of
the block immediately on isolation failure. For async compaction, this also
means a higher chance of succeeding until it detects contention.

We however shouldn't completely sacrifice the second objective of compaction,
which is to reduce overall long-term memory fragmentation. As a compromise,
perform the eager skipping only in direct async compaction, while sync
compaction (including kcompactd) remains thorough.
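
To illustrate the idea only, here is a minimal userspace sketch (not the kernel
code; the putback of pages already isolated in a failed block is omitted, and
all values are invented) of a scanner that jumps to the end of the cc->order
aligned block as soon as one page in it cannot be isolated:

#include <stdbool.h>
#include <stdio.h>

#define ALIGN(x, a)			(((x) + (a) - 1) & ~((unsigned long)(a) - 1))
#define block_end_pfn(pfn, order)	ALIGN((pfn) + 1, 1UL << (order))

#define ORDER	2	/* stands in for cc->order */
#define NR_PFNS	16

int main(void)
{
        /* 1 = can be isolated for migration, 0 = e.g. a pinned or non-LRU page */
        bool isolatable[NR_PFNS] = {
                1, 1, 0, 1,  1, 1, 1, 1,  0, 1, 1, 1,  1, 1, 1, 1,
        };
        unsigned long pfn, scanned = 0, isolated = 0;

        for (pfn = 0; pfn < NR_PFNS; pfn++) {
                scanned++;
                if (isolatable[pfn]) {
                        isolated++;
                        continue;
                }
                /*
                 * Isolation failed: this ORDER-aligned block can no longer
                 * merge into an ORDER-sized buddy page, so skip straight to
                 * its end (-1 because the loop increments pfn).
                 */
                pfn = block_end_pfn(pfn, ORDER) - 1;
        }
        printf("scanned %lu pfns, isolated %lu\n", scanned, isolated);
        /* prints: scanned 12 pfns, isolated 10 */
        return 0;
}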

Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 77 insertions(+), 7 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 74b4b775459e..fe94d22d9144 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -644,6 +644,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
bool locked = false;
struct page *page = NULL, *valid_page = NULL;
unsigned long start_pfn = low_pfn;
+ bool skip_on_failure = false;
+ unsigned long next_skip_pfn = 0;

/*
* Ensure that there are not too many pages isolated from the LRU
@@ -664,10 +666,37 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
if (compact_should_abort(cc))
return 0;

+ if (cc->direct_compaction && (cc->mode == MIGRATE_ASYNC)) {
+ skip_on_failure = true;
+ next_skip_pfn = block_end_pfn(low_pfn, cc->order);
+ }
+
/* Time to isolate some pages for migration */
for (; low_pfn < end_pfn; low_pfn++) {
bool is_lru;

+ if (skip_on_failure && low_pfn >= next_skip_pfn) {
+ /*
+ * We have isolated all migration candidates in the
+ * previous order-aligned block, and did not skip it due
+ * to failure. We should migrate the pages now and
+ * hopefully succeed compaction.
+ */
+ if (nr_isolated)
+ break;
+
+ /*
+ * We failed to isolate in the previous order-aligned
+ * block. Set the new boundary to the end of the
+ * current block. Note we can't simply increase
+ * next_skip_pfn by 1 << order, as low_pfn might have
+ * been incremented by a higher number due to skipping
+ * a compound or a high-order buddy page in the
+ * previous loop iteration.
+ */
+ next_skip_pfn = block_end_pfn(low_pfn, cc->order);
+ }
+
/*
* Periodically drop the lock (if held) regardless of its
* contention, to give chance to IRQs. Abort async compaction
@@ -679,7 +708,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
break;

if (!pfn_valid_within(low_pfn))
- continue;
+ goto isolate_fail;
nr_scanned++;

page = pfn_to_page(low_pfn);
@@ -734,11 +763,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
if (likely(comp_order < MAX_ORDER))
low_pfn += (1UL << comp_order) - 1;

- continue;
+ goto isolate_fail;
}

if (!is_lru)
- continue;
+ goto isolate_fail;

/*
* Migration will fail if an anonymous page is pinned in memory,
@@ -747,7 +776,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
*/
if (!page_mapping(page) &&
page_count(page) > page_mapcount(page))
- continue;
+ goto isolate_fail;

/* If we already hold the lock, we can skip some rechecking */
if (!locked) {
@@ -758,7 +787,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,

/* Recheck PageLRU and PageCompound under lock */
if (!PageLRU(page))
- continue;
+ goto isolate_fail;

/*
* Page become compound since the non-locked check,
@@ -767,7 +796,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
*/
if (unlikely(PageCompound(page))) {
low_pfn += (1UL << compound_order(page)) - 1;
- continue;
+ goto isolate_fail;
}
}

@@ -775,7 +804,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,

/* Try isolate the page */
if (__isolate_lru_page(page, isolate_mode) != 0)
- continue;
+ goto isolate_fail;

VM_BUG_ON_PAGE(PageCompound(page), page);

@@ -801,6 +830,35 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
++low_pfn;
break;
}
+
+ continue;
+isolate_fail:
+ if (!skip_on_failure)
+ continue;
+
+ /*
+ * We have isolated some pages, but then failed. Release them
+ * instead of migrating, as we cannot form the cc->order buddy
+ * page anyway.
+ */
+ if (nr_isolated) {
+ if (locked) {
+ spin_unlock_irqrestore(&zone->lru_lock, flags);
+ locked = false;
+ }
+ putback_movable_pages(migratelist);
+ nr_isolated = 0;
+ cc->last_migrated_pfn = 0;
+ }
+
+ if (low_pfn < next_skip_pfn) {
+ low_pfn = next_skip_pfn - 1;
+ /*
+ * The check near the loop beginning would have updated
+ * next_skip_pfn too, but this is a bit simpler.
+ */
+ next_skip_pfn += 1UL << cc->order;
+ }
}

/*
@@ -1409,6 +1467,18 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
ret = COMPACT_CONTENDED;
goto out;
}
+ /*
+ * We failed to migrate at least one page in the current
+ * order-aligned block, so skip the rest of it.
+ */
+ if (cc->direct_compaction &&
+ (cc->mode == MIGRATE_ASYNC)) {
+ cc->migrate_pfn = block_end_pfn(
+ cc->migrate_pfn - 1, cc->order);
+ /* Draining pcplists is useless in this case */
+ cc->last_migrated_pfn = 0;
+
+ }
}

check_drain:
--
2.7.3

2016-03-31 08:51:38

by Vlastimil Babka

Subject: [PATCH v2 1/4] mm, compaction: wrap calculating first and last pfn of pageblock

Compaction code has accumulated numerous instances of manual calculations of
the first (inclusive) and last (exclusive) pfn of a pageblock (or a smaller
block of a given order), given a pfn within the pageblock. Wrap these
calculations by introducing pageblock_start_pfn(pfn) and pageblock_end_pfn(pfn)
macros.
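
For illustration, a minimal userspace sketch (not part of the patch) of the
arithmetic the macros wrap; the pageblock_order of 9 and the sample pfn are
assumptions chosen only for this example:

#include <stdio.h>

/* Simplified stand-ins for the kernel's round_down()/ALIGN() helpers,
 * valid here because the alignments are powers of two. */
#define round_down(x, y)	((x) & ~((y) - 1))
#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))

#define pageblock_order	9	/* assumed: 4K pages, 2MB pageblocks */

#define block_start_pfn(pfn, order)	round_down(pfn, 1UL << (order))
#define block_end_pfn(pfn, order)	ALIGN((pfn) + 1, 1UL << (order))
#define pageblock_start_pfn(pfn)	block_start_pfn(pfn, pageblock_order)
#define pageblock_end_pfn(pfn)		block_end_pfn(pfn, pageblock_order)

int main(void)
{
        unsigned long pfn = 1234;

        /* first (inclusive) and last (exclusive) pfn of pfn's pageblock */
        printf("start=%lu end=%lu\n",
               pageblock_start_pfn(pfn), pageblock_end_pfn(pfn));
        /* prints: start=1024 end=1536 */
        return 0;
}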

Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 33 +++++++++++++++++++--------------
1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index ccf97b02b85f..3319145a387d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -42,6 +42,11 @@ static inline void count_compact_events(enum vm_event_item item, long delta)
#define CREATE_TRACE_POINTS
#include <trace/events/compaction.h>

+#define block_start_pfn(pfn, order) round_down(pfn, 1UL << (order))
+#define block_end_pfn(pfn, order) ALIGN((pfn) + 1, 1UL << (order))
+#define pageblock_start_pfn(pfn) block_start_pfn(pfn, pageblock_order)
+#define pageblock_end_pfn(pfn) block_end_pfn(pfn, pageblock_order)
+
static unsigned long release_freepages(struct list_head *freelist)
{
struct page *page, *next;
@@ -161,7 +166,7 @@ static void reset_cached_positions(struct zone *zone)
zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
zone->compact_cached_free_pfn =
- round_down(zone_end_pfn(zone) - 1, pageblock_nr_pages);
+ pageblock_start_pfn(zone_end_pfn(zone) - 1);
}

/*
@@ -519,10 +524,10 @@ isolate_freepages_range(struct compact_control *cc,
LIST_HEAD(freelist);

pfn = start_pfn;
- block_start_pfn = pfn & ~(pageblock_nr_pages - 1);
+ block_start_pfn = pageblock_start_pfn(pfn);
if (block_start_pfn < cc->zone->zone_start_pfn)
block_start_pfn = cc->zone->zone_start_pfn;
- block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
+ block_end_pfn = pageblock_end_pfn(pfn);

for (; pfn < end_pfn; pfn += isolated,
block_start_pfn = block_end_pfn,
@@ -538,8 +543,8 @@ isolate_freepages_range(struct compact_control *cc,
* scanning range to right one.
*/
if (pfn >= block_end_pfn) {
- block_start_pfn = pfn & ~(pageblock_nr_pages - 1);
- block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
+ block_start_pfn = pageblock_start_pfn(pfn);
+ block_end_pfn = pageblock_end_pfn(pfn);
block_end_pfn = min(block_end_pfn, end_pfn);
}

@@ -834,10 +839,10 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,

/* Scan block by block. First and last block may be incomplete */
pfn = start_pfn;
- block_start_pfn = pfn & ~(pageblock_nr_pages - 1);
+ block_start_pfn = pageblock_start_pfn(pfn);
if (block_start_pfn < cc->zone->zone_start_pfn)
block_start_pfn = cc->zone->zone_start_pfn;
- block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
+ block_end_pfn = pageblock_end_pfn(pfn);

for (; pfn < end_pfn; pfn = block_end_pfn,
block_start_pfn = block_end_pfn,
@@ -932,10 +937,10 @@ static void isolate_freepages(struct compact_control *cc)
* is using.
*/
isolate_start_pfn = cc->free_pfn;
- block_start_pfn = cc->free_pfn & ~(pageblock_nr_pages-1);
+ block_start_pfn = pageblock_start_pfn(cc->free_pfn);
block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
zone_end_pfn(zone));
- low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
+ low_pfn = pageblock_end_pfn(cc->migrate_pfn);

/*
* Isolate free pages until enough are available to migrate the
@@ -1089,12 +1094,12 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
* initialized by compact_zone()
*/
low_pfn = cc->migrate_pfn;
- block_start_pfn = cc->migrate_pfn & ~(pageblock_nr_pages - 1);
+ block_start_pfn = pageblock_start_pfn(low_pfn);
if (block_start_pfn < zone->zone_start_pfn)
block_start_pfn = zone->zone_start_pfn;

/* Only scan within a pageblock boundary */
- block_end_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages);
+ block_end_pfn = pageblock_end_pfn(low_pfn);

/*
* Iterate over whole pageblocks until we find the first suitable.
@@ -1351,7 +1356,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
cc->free_pfn = zone->compact_cached_free_pfn;
if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
- cc->free_pfn = round_down(end_pfn - 1, pageblock_nr_pages);
+ cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
zone->compact_cached_free_pfn = cc->free_pfn;
}
if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
@@ -1419,7 +1424,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
if (cc->order > 0 && cc->last_migrated_pfn) {
int cpu;
unsigned long current_block_start =
- cc->migrate_pfn & ~((1UL << cc->order) - 1);
+ block_start_pfn(cc->migrate_pfn, cc->order);

if (cc->last_migrated_pfn < current_block_start) {
cpu = get_cpu();
@@ -1444,7 +1449,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
cc->nr_freepages = 0;
VM_BUG_ON(free_pfn == 0);
/* The cached pfn is always the first in a pageblock */
- free_pfn &= ~(pageblock_nr_pages-1);
+ free_pfn = pageblock_start_pfn(free_pfn);
/*
* Only go back, not forward. The cached pfn might have been
* already reset to zone end in compact_finished()
--
2.7.3

2016-03-31 08:51:37

by Vlastimil Babka

Subject: [PATCH v2 2/4] mm, compaction: reduce spurious pcplist drains

Compaction drains the local pcplists each time the migration scanner moves away
from a cc->order aligned block where it isolated pages for migration, so that
the pages freed by migration can merge into higher orders.

The detection is currently coarser than it could be. The cc->last_migrated_pfn
variable should track the lowest pfn that was isolated for migration, but it
is set to the pfn where isolate_migratepages_block() starts scanning, which is
typically the first pfn of the pageblock. There, the scanner might fail to
isolate anything in several order-aligned blocks, and then isolate
COMPACT_CLUSTER_MAX pages in another block. This would cause the pcplists drain
to be performed even though the scanner has not yet finished the block it
isolated from.

This patch thus makes the cc->last_migrated_pfn handling more accurate by
setting it to the pfn of an actually isolated page in
isolate_migratepages_block(). Although the practical effects of this patch are
likely low, it arguably makes the intent of the code more obvious. Also, the
next patch will make async direct compaction skip blocks more aggressively, and
draining pcplists due to skipped blocks is wasteful.
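
As a toy illustration only (not the kernel code; the pfn values and the order
are invented), compare the drain decision under the coarse and the exact
tracking for the scenario described above:

#include <stdbool.h>
#include <stdio.h>

#define block_start_pfn(pfn, order)	((pfn) & ~((1UL << (order)) - 1))

/* Simplified version of the drain condition in compact_zone(): drain the
 * pcplists once the migration scanner has left the cc->order aligned block
 * it last isolated pages from. */
static bool should_drain(unsigned long last_migrated_pfn,
                         unsigned long migrate_pfn, unsigned int order)
{
        return last_migrated_pfn &&
               last_migrated_pfn < block_start_pfn(migrate_pfn, order);
}

int main(void)
{
        unsigned int order = 3;		/* stands in for cc->order */
        /*
         * The scan started at pageblock pfn 0x200, nothing could be isolated
         * before pfn 0x220, and the scanner stopped at 0x222, so the block
         * starting at 0x220 is not finished yet.
         */
        unsigned long migrate_pfn = 0x222;

        /* coarse tracking (pfn where scanning started): spurious drain */
        printf("coarse: drain=%d\n", should_drain(0x200, migrate_pfn, order));
        /* exact tracking (pfn of the first actually isolated page): no drain */
        printf("exact:  drain=%d\n", should_drain(0x220, migrate_pfn, order));
        /* prints: coarse: drain=1, exact: drain=0 */
        return 0;
}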

Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 3319145a387d..74b4b775459e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -787,6 +787,15 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
cc->nr_migratepages++;
nr_isolated++;

+ /*
+ * Record where we could have freed pages by migration and not
+ * yet flushed them to buddy allocator.
+ * - this is the lowest page that was isolated and likely be
+ * then freed by migration.
+ */
+ if (!cc->last_migrated_pfn)
+ cc->last_migrated_pfn = low_pfn;
+
/* Avoid isolating too much */
if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
++low_pfn;
@@ -1083,7 +1092,6 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
unsigned long block_start_pfn;
unsigned long block_end_pfn;
unsigned long low_pfn;
- unsigned long isolate_start_pfn;
struct page *page;
const isolate_mode_t isolate_mode =
(sysctl_compact_unevictable_allowed ? ISOLATE_UNEVICTABLE : 0) |
@@ -1138,7 +1146,6 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
continue;

/* Perform the isolation */
- isolate_start_pfn = low_pfn;
low_pfn = isolate_migratepages_block(cc, low_pfn,
block_end_pfn, isolate_mode);

@@ -1148,15 +1155,6 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
}

/*
- * Record where we could have freed pages by migration and not
- * yet flushed them to buddy allocator.
- * - this is the lowest page that could have been isolated and
- * then freed by migration.
- */
- if (cc->nr_migratepages && !cc->last_migrated_pfn)
- cc->last_migrated_pfn = isolate_start_pfn;
-
- /*
* Either we isolated something and proceed with migration. Or
* we failed and compact_zone should decide if we should
* continue or not.
--
2.7.3

2016-03-31 08:51:35

by Vlastimil Babka

Subject: [PATCH v2 4/4] mm, compaction: direct freepage allocation for async direct compaction

The goal of direct compaction is to quickly make a high-order page available
for the pending allocation. The free page scanner can add significant latency
when searching for migration targets, although for compaction to succeed, the
only important constraint on the target free pages is that they must not come
from the same order-aligned block as the migrated pages.

This patch therefore makes direct async compaction allocate freepages directly
from the freelists. Pages that do come from the same block (which we cannot
simply exclude from the freelist allocation) are put on a separate list and
released only after migration, to allow them to merge.

In addition to the reduced stall, another advantage is that we split larger
free pages for migration targets only when smaller pages are depleted, while
the free scanner can split pages up to (order - 1) as it encounters them.
However, this approach likely sacrifices some of the long-term
anti-fragmentation features of a thorough compaction, so we limit the direct
allocation approach to direct async compaction.

For observational purposes, the patch introduces two new counters to
/proc/vmstat. compact_free_direct_alloc counts how many pages were allocated
directly without scanning, and compact_free_direct_miss counts the subset of
these allocations that were from the wrong range and had to be held on the
separate list.
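
For illustration only, a standalone sketch (not the patch itself; the order and
pfn values are invented) of the same-block test that decides whether a directly
allocated free page must be held back on the separate list:

#include <stdbool.h>
#include <stdio.h>

/* A free page and the block currently being migrated from share the same
 * order-aligned block iff their pfns are equal after dropping the low
 * 'order' bits. */
static bool same_order_block(unsigned long free_pfn,
                             unsigned long migrate_pfn, unsigned int order)
{
        /* migrate_pfn points just past the last scanned pfn, hence the -1 */
        return (free_pfn >> order) == ((migrate_pfn - 1) >> order);
}

int main(void)
{
        unsigned int order = 4;			/* stands in for cc->order */
        unsigned long migrate_pfn = 0x1238;	/* scanner inside block 0x1230 */

        /* a free page from the block being migrated from: hold it back */
        printf("%d\n", same_order_block(0x1234, migrate_pfn, order));	/* 1 */
        /* a free page from any other block: usable as a migration target */
        printf("%d\n", same_order_block(0x1244, migrate_pfn, order));	/* 0 */
        return 0;
}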

Signed-off-by: Vlastimil Babka <[email protected]>
---
include/linux/vm_event_item.h | 1 +
mm/compaction.c | 52 ++++++++++++++++++++++++++++++++++++++++++-
mm/internal.h | 5 +++++
mm/page_alloc.c | 27 ++++++++++++++++++++++
mm/vmstat.c | 2 ++
5 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index ec084321fe09..9ec29406a01e 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -51,6 +51,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
#endif
#ifdef CONFIG_COMPACTION
COMPACTMIGRATE_SCANNED, COMPACTFREE_SCANNED,
+ COMPACTFREE_DIRECT_ALLOC, COMPACTFREE_DIRECT_MISS,
COMPACTISOLATED,
COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
KCOMPACTD_WAKE,
diff --git a/mm/compaction.c b/mm/compaction.c
index fe94d22d9144..215db281ecaf 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1083,6 +1083,41 @@ static void isolate_freepages(struct compact_control *cc)
cc->free_pfn = isolate_start_pfn;
}

+static void isolate_freepages_direct(struct compact_control *cc)
+{
+ unsigned long nr_pages;
+ unsigned long flags;
+
+ nr_pages = cc->nr_migratepages - cc->nr_freepages;
+
+ if (!compact_trylock_irqsave(&cc->zone->lock, &flags, cc))
+ return;
+
+ while (nr_pages) {
+ struct page *page;
+ unsigned long pfn;
+
+ page = alloc_pages_zone(cc->zone, 0, MIGRATE_MOVABLE);
+ if (!page)
+ break;
+ pfn = page_to_pfn(page);
+
+ count_compact_event(COMPACTFREE_DIRECT_ALLOC);
+
+ /* Is the free page in the block we are migrating from? */
+ if (pfn >> cc->order == (cc->migrate_pfn - 1) >> cc->order) {
+ list_add(&page->lru, &cc->freepages_held);
+ count_compact_event(COMPACTFREE_DIRECT_MISS);
+ } else {
+ list_add(&page->lru, &cc->freepages);
+ cc->nr_freepages++;
+ nr_pages--;
+ }
+ }
+
+ spin_unlock_irqrestore(&cc->zone->lock, flags);
+}
+
/*
* This is a migrate-callback that "allocates" freepages by taking pages
* from the isolated freelists in the block we are migrating to.
@@ -1099,7 +1134,12 @@ static struct page *compaction_alloc(struct page *migratepage,
* contention.
*/
if (list_empty(&cc->freepages)) {
- if (!cc->contended)
+ if (cc->contended)
+ return NULL;
+
+ if (cc->direct_compaction && (cc->mode == MIGRATE_ASYNC))
+ isolate_freepages_direct(cc);
+ else
isolate_freepages(cc);

if (list_empty(&cc->freepages))
@@ -1475,6 +1515,10 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
(cc->mode == MIGRATE_ASYNC)) {
cc->migrate_pfn = block_end_pfn(
cc->migrate_pfn - 1, cc->order);
+
+ if (!list_empty(&cc->freepages_held))
+ release_freepages(&cc->freepages_held);
+
/* Draining pcplists is useless in this case */
cc->last_migrated_pfn = 0;

@@ -1495,6 +1539,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
block_start_pfn(cc->migrate_pfn, cc->order);

if (cc->last_migrated_pfn < current_block_start) {
+ if (!list_empty(&cc->freepages_held))
+ release_freepages(&cc->freepages_held);
cpu = get_cpu();
lru_add_drain_cpu(cpu);
drain_local_pages(zone);
@@ -1525,6 +1571,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
if (free_pfn > zone->compact_cached_free_pfn)
zone->compact_cached_free_pfn = free_pfn;
}
+ if (!list_empty(&cc->freepages_held))
+ release_freepages(&cc->freepages_held);

trace_mm_compaction_end(start_pfn, cc->migrate_pfn,
cc->free_pfn, end_pfn, sync, ret);
@@ -1553,6 +1601,7 @@ static unsigned long compact_zone_order(struct zone *zone, int order,
};
INIT_LIST_HEAD(&cc.freepages);
INIT_LIST_HEAD(&cc.migratepages);
+ INIT_LIST_HEAD(&cc.freepages_held);

ret = compact_zone(zone, &cc);

@@ -1698,6 +1747,7 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
cc->zone = zone;
INIT_LIST_HEAD(&cc->freepages);
INIT_LIST_HEAD(&cc->migratepages);
+ INIT_LIST_HEAD(&cc->freepages_held);

/*
* When called via /proc/sys/vm/compact_memory
diff --git a/mm/internal.h b/mm/internal.h
index b79abb6721cf..a0c0286a9567 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -145,6 +145,8 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
}

extern int __isolate_free_page(struct page *page, unsigned int order);
+extern struct page * alloc_pages_zone(struct zone *zone, unsigned int order,
+ int migratetype);
extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
unsigned int order);
extern void prep_compound_page(struct page *page, unsigned int order);
@@ -165,6 +167,9 @@ extern int user_min_free_kbytes;
struct compact_control {
struct list_head freepages; /* List of free pages to migrate to */
struct list_head migratepages; /* List of pages being migrated */
+ struct list_head freepages_held;/* List of free pages from the block
+ * that's being migrated
+ */
unsigned long nr_freepages; /* Number of isolated free pages */
unsigned long nr_migratepages; /* Number of pages to migrate */
unsigned long free_pfn; /* isolate_freepages search base */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 59de90d5d3a3..3ee83fe02274 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2343,6 +2343,33 @@ int split_free_page(struct page *page)
}

/*
+ * Like split_free_page, but given the zone, it will grab a free page from
+ * the freelists.
+ */
+struct page *
+alloc_pages_zone(struct zone *zone, unsigned int order, int migratetype)
+{
+ struct page *page;
+ unsigned long watermark;
+
+ watermark = low_wmark_pages(zone) + (1 << order);
+ if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
+ return NULL;
+
+ page = __rmqueue(zone, order, migratetype);
+ if (!page)
+ return NULL;
+
+ __mod_zone_freepage_state(zone, -(1 << order),
+ get_pcppage_migratetype(page));
+
+ set_page_owner(page, order, __GFP_MOVABLE);
+ set_page_refcounted(page);
+
+ return page;
+}
+
+/*
* Allocate a page from the given zone. Use pcplists for order-0 allocations.
*/
static inline
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 5e4300482897..9e07d11afa0d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -822,6 +822,8 @@ const char * const vmstat_text[] = {
#ifdef CONFIG_COMPACTION
"compact_migrate_scanned",
"compact_free_scanned",
+ "compact_free_direct_alloc",
+ "compact_free_direct_miss",
"compact_isolated",
"compact_stall",
"compact_fail",
--
2.7.3

2016-04-04 09:32:05

by Mel Gorman

Subject: Re: [PATCH v2 4/4] mm, compaction: direct freepage allocation for async direct compaction

On Thu, Mar 31, 2016 at 10:50:36AM +0200, Vlastimil Babka wrote:
> The goal of direct compaction is to quickly make a high-order page available
> for the pending allocation. The free page scanner can add significant latency
> when searching for migration targets, although to succeed the compaction, the
> only important limit on the target free pages is that they must not come from
> the same order-aligned block as the migrated pages.
>

What prevents the free pages being allocated from behind the migration
scanner? Having compaction abort when the scanners meet misses
compaction opportunities but it avoids the problem of Compactor A using
pageblock X as a migration target and Compactor B using pageblock X as a
migration source.

--
Mel Gorman
SUSE Labs

2016-04-04 11:05:26

by Vlastimil Babka

Subject: Re: [PATCH v2 4/4] mm, compaction: direct freepage allocation for async direct compaction

On 04/04/2016 11:31 AM, Mel Gorman wrote:
> On Thu, Mar 31, 2016 at 10:50:36AM +0200, Vlastimil Babka wrote:
>> The goal of direct compaction is to quickly make a high-order page available
>> for the pending allocation. The free page scanner can add significant latency
>> when searching for migration targets, although to succeed the compaction, the
>> only important limit on the target free pages is that they must not come from
>> the same order-aligned block as the migrated pages.
>>
>
> What prevents the free pages being allocated from behind the migration
> scanner? Having compaction abort when the scanners meet misses
> compaction opportunities but it avoids the problem of Compactor A using
> pageblock X as a migration target and Compactor B using pageblock X as a
> migration source.

It's true that there's no complete protection, but parallel async
compactions should eventually detect contention and back off. Sync
compaction keeps using the free scanner, so this seemed like a safe
thing to attempt in the initial async compaction, without compromising
success rates thanks to the follow-up sync compaction.


2016-04-11 07:03:08

by Joonsoo Kim

Subject: Re: [PATCH v2 0/4] reduce latency of direct async compaction

On Thu, Mar 31, 2016 at 10:50:32AM +0200, Vlastimil Babka wrote:
> The goal here is to reduce latency (and increase success) of direct async
> compaction by making it focus more on the goal of creating a high-order page,
> at some expense of thoroughness.
>
> This is based on an older attempt [1] which I didn't finish as it seemed that
> it increased longer-term fragmentation. Now it seems it doesn't, and we have
> kcompactd for that goal. The main patch (3) makes migration scanner skip whole
> order-aligned blocks as soon as isolation fails in them, as it takes just one
> unmigrated page to prevent a high-order buddy page from fully merging.
>
> Patch 4 then attempts to reduce the excessive freepage scanning (such as
> reported in [2]) by allocating migration targets directly from freelists. Here
> we just need to be sure that the free pages are not from the same block as the
> migrated pages. This is also limited to direct async compaction and is not
> meant to replace the more thorough free scanner for other scenarios.

I don't like that another algorithm is introduced for async
compaction. As you know, we already suffer from corner cases that async
compaction has (such as compaction deferring not working if we only
do async compaction). It makes further analysis/improvement harder. Generally,
more divergence in async compaction would cause more problems later.

In the suggested approach, the risky places I see are the finish condition
and the deferring logic. The position where the scanners meet would be greatly
affected by system load. If there are no other processes and async compaction
isn't aborted, the freepage scanner will be at the end of the zone and
we can scan for migratable pages until we reach it. But in the other case,
where the system has some load, async compaction would be aborted easily, the
freepage scanner will be at some intermediate point of the zone, and
async compaction's scanning coverage can be limited a lot.

And, with a different algorithm, it doesn't make sense to share the same
deferring logic. Async compaction can succeed even if sync compaction
continually fails.

I hope that we don't make async/sync compaction more diverse. I'd be
happier if we could apply such a change to both async and sync direct
compaction.

>
> [1] https://lkml.org/lkml/2014/7/16/988
> [2] http://www.spinics.net/lists/linux-mm/msg97475.html
>
> Testing was done using stress-highalloc from mmtests, configured for order-4
> GFP_KERNEL allocations:
>
> 4.6-rc1 4.6-rc1 4.6-rc1
> patch2 patch3 patch4
> Success 1 Min 24.00 ( 0.00%) 27.00 (-12.50%) 43.00 (-79.17%)
> Success 1 Mean 30.20 ( 0.00%) 31.60 ( -4.64%) 51.60 (-70.86%)
> Success 1 Max 37.00 ( 0.00%) 35.00 ( 5.41%) 73.00 (-97.30%)
> Success 2 Min 42.00 ( 0.00%) 32.00 ( 23.81%) 73.00 (-73.81%)
> Success 2 Mean 44.00 ( 0.00%) 44.80 ( -1.82%) 78.00 (-77.27%)
> Success 2 Max 48.00 ( 0.00%) 52.00 ( -8.33%) 81.00 (-68.75%)
> Success 3 Min 91.00 ( 0.00%) 92.00 ( -1.10%) 88.00 ( 3.30%)
> Success 3 Mean 92.20 ( 0.00%) 92.80 ( -0.65%) 91.00 ( 1.30%)
> Success 3 Max 94.00 ( 0.00%) 93.00 ( 1.06%) 94.00 ( 0.00%)
>
> While the eager skipping of unsuitable blocks from patch 3 didn't affect
> success rates, direct freepage allocation did improve them.

Direct freepage allocation changes the compaction algorithm a lot. It
removes the limitation that we cannot get freepages from behind the
migration scanner, so we can get freepages easily. The same could be achieved
by other compaction algorithm changes (such as your pivot change or my
compaction algorithm change or this patchset). In the long term, this
limitation should be removed for sync compaction (at least direct sync
compaction), too. What's the reason that you don't apply this algorithm
to other cases? Is there any change in fragmentation?

Thanks.

>
> 4.6-rc1 4.6-rc1 4.6-rc1
> patch2 patch3 patch4
> User 2587.42 2566.53 2413.57
> System 482.89 471.20 461.71
> Elapsed 1395.68 1382.00 1392.87
>
> Times are not so useful metric for this benchmark as main portion is the
> interfering kernel builds, but results do hint at reduced system times.
>
> 4.6-rc1 4.6-rc1 4.6-rc1
> patch2 patch3 patch4
> Direct pages scanned 163614 159608 123385
> Kswapd pages scanned 2070139 2078790 2081385
> Kswapd pages reclaimed 2061707 2069757 2073723
> Direct pages reclaimed 163354 159505 122304
>
> Reduced direct reclaim was unintended, but could be explained by more
> successful first attempt at (async) direct compaction, which is attempted
> before the first reclaim attempt in __alloc_pages_slowpath().
>
> Compaction stalls 33052 39853 55091
> Compaction success 12121 19773 37875
> Compaction failures 20931 20079 17216
>
> Compaction is indeed more successful, and thus less likely to get deferred,
> so there are also more direct compaction stalls.
>
> Page migrate success 3781876 3326819 2790838
> Page migrate failure 45817 41774 38113
> Compaction pages isolated 7868232 6941457 5025092
> Compaction migrate scanned 168160492 127269354 87087993
> Compaction migrate prescanned 0 0 0
> Compaction free scanned 2522142582 2326342620 743205879
> Compaction free direct alloc 0 0 920792
> Compaction free dir. all. miss 0 0 5865
> Compaction cost 5252 4476 3602
>
> Patch 2 reduces migration scanned pages by 25% thanks to the eager skipping.
> Patch 3 reduces free scanned pages by 70%. The portion of direct allocation
> misses to all direct allocations is less than 1% which should be acceptable.
> Interestingly, patch 3 also reduces migration scanned pages by another 30% on
> top of patch 2. The reason is not clear, but we can rejoice nevertheless.

s/Patch 2/Patch 3
s/Patch 3/Patch 4

> Vlastimil Babka (4):
> mm, compaction: wrap calculating first and last pfn of pageblock
> mm, compaction: reduce spurious pcplist drains
> mm, compaction: skip blocks where isolation fails in async direct
> compaction
> mm, compaction: direct freepage allocation for async direct compaction
>
> include/linux/vm_event_item.h | 1 +
> mm/compaction.c | 189 ++++++++++++++++++++++++++++++++++--------
> mm/internal.h | 5 ++
> mm/page_alloc.c | 27 ++++++
> mm/vmstat.c | 2 +
> 5 files changed, 191 insertions(+), 33 deletions(-)
>
> --
> 2.7.3
>

2016-04-11 07:11:12

by Joonsoo Kim

Subject: Re: [PATCH v2 4/4] mm, compaction: direct freepage allocation for async direct compaction

On Thu, Mar 31, 2016 at 10:50:36AM +0200, Vlastimil Babka wrote:
> The goal of direct compaction is to quickly make a high-order page available
> for the pending allocation. The free page scanner can add significant latency
> when searching for migration targets, although to succeed the compaction, the
> only important limit on the target free pages is that they must not come from
> the same order-aligned block as the migrated pages.

If migration fails, the free pages will remain and they can interfere with
further compaction success, because they don't come from the previous
order-aligned block but can come from the next order-aligned block. Don't you
need to free the remaining freelist after a migration attempt fails?

Thanks.

>
> This patch therefore makes direct async compaction allocate freepages directly
> from freelists. Pages that do come from the same block (which we cannot simply
> exclude from the freelist allocation) are put on separate list and released
> only after migration to allow them to merge.
>
> In addition to reduced stall, another advantage is that we split larger free
> pages for migration targets only when smaller pages are depleted, while the
> free scanner can split pages up to (order - 1) as it encouters them. However,
> this approach likely sacrifices some of the long-term anti-fragmentation
> features of a thorough compaction, so we limit the direct allocation approach
> to direct async compaction.
>
> For observational purposes, the patch introduces two new counters to
> /proc/vmstat. compact_free_direct_alloc counts how many pages were allocated
> directly without scanning, and compact_free_direct_miss counts the subset of
> these allocations that were from the wrong range and had to be held on the
> separate list.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> include/linux/vm_event_item.h | 1 +
> mm/compaction.c | 52 ++++++++++++++++++++++++++++++++++++++++++-
> mm/internal.h | 5 +++++
> mm/page_alloc.c | 27 ++++++++++++++++++++++
> mm/vmstat.c | 2 ++
> 5 files changed, 86 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index ec084321fe09..9ec29406a01e 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -51,6 +51,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> #endif
> #ifdef CONFIG_COMPACTION
> COMPACTMIGRATE_SCANNED, COMPACTFREE_SCANNED,
> + COMPACTFREE_DIRECT_ALLOC, COMPACTFREE_DIRECT_MISS,
> COMPACTISOLATED,
> COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
> KCOMPACTD_WAKE,
> diff --git a/mm/compaction.c b/mm/compaction.c
> index fe94d22d9144..215db281ecaf 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1083,6 +1083,41 @@ static void isolate_freepages(struct compact_control *cc)
> cc->free_pfn = isolate_start_pfn;
> }
>
> +static void isolate_freepages_direct(struct compact_control *cc)
> +{
> + unsigned long nr_pages;
> + unsigned long flags;
> +
> + nr_pages = cc->nr_migratepages - cc->nr_freepages;
> +
> + if (!compact_trylock_irqsave(&cc->zone->lock, &flags, cc))
> + return;
> +
> + while (nr_pages) {
> + struct page *page;
> + unsigned long pfn;
> +
> + page = alloc_pages_zone(cc->zone, 0, MIGRATE_MOVABLE);
> + if (!page)
> + break;
> + pfn = page_to_pfn(page);
> +
> + count_compact_event(COMPACTFREE_DIRECT_ALLOC);
> +
> + /* Is the free page in the block we are migrating from? */
> + if (pfn >> cc->order == (cc->migrate_pfn - 1) >> cc->order) {
> + list_add(&page->lru, &cc->freepages_held);
> + count_compact_event(COMPACTFREE_DIRECT_MISS);
> + } else {
> + list_add(&page->lru, &cc->freepages);
> + cc->nr_freepages++;
> + nr_pages--;
> + }
> + }
> +
> + spin_unlock_irqrestore(&cc->zone->lock, flags);
> +}
> +
> /*
> * This is a migrate-callback that "allocates" freepages by taking pages
> * from the isolated freelists in the block we are migrating to.
> @@ -1099,7 +1134,12 @@ static struct page *compaction_alloc(struct page *migratepage,
> * contention.
> */
> if (list_empty(&cc->freepages)) {
> - if (!cc->contended)
> + if (cc->contended)
> + return NULL;
> +
> + if (cc->direct_compaction && (cc->mode == MIGRATE_ASYNC))
> + isolate_freepages_direct(cc);
> + else
> isolate_freepages(cc);
>
> if (list_empty(&cc->freepages))
> @@ -1475,6 +1515,10 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> (cc->mode == MIGRATE_ASYNC)) {
> cc->migrate_pfn = block_end_pfn(
> cc->migrate_pfn - 1, cc->order);
> +
> + if (!list_empty(&cc->freepages_held))
> + release_freepages(&cc->freepages_held);
> +
> /* Draining pcplists is useless in this case */
> cc->last_migrated_pfn = 0;
>
> @@ -1495,6 +1539,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> block_start_pfn(cc->migrate_pfn, cc->order);
>
> if (cc->last_migrated_pfn < current_block_start) {
> + if (!list_empty(&cc->freepages_held))
> + release_freepages(&cc->freepages_held);
> cpu = get_cpu();
> lru_add_drain_cpu(cpu);
> drain_local_pages(zone);
> @@ -1525,6 +1571,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> if (free_pfn > zone->compact_cached_free_pfn)
> zone->compact_cached_free_pfn = free_pfn;
> }
> + if (!list_empty(&cc->freepages_held))
> + release_freepages(&cc->freepages_held);
>
> trace_mm_compaction_end(start_pfn, cc->migrate_pfn,
> cc->free_pfn, end_pfn, sync, ret);
> @@ -1553,6 +1601,7 @@ static unsigned long compact_zone_order(struct zone *zone, int order,
> };
> INIT_LIST_HEAD(&cc.freepages);
> INIT_LIST_HEAD(&cc.migratepages);
> + INIT_LIST_HEAD(&cc.freepages_held);
>
> ret = compact_zone(zone, &cc);
>
> @@ -1698,6 +1747,7 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
> cc->zone = zone;
> INIT_LIST_HEAD(&cc->freepages);
> INIT_LIST_HEAD(&cc->migratepages);
> + INIT_LIST_HEAD(&cc->freepages_held);
>
> /*
> * When called via /proc/sys/vm/compact_memory
> diff --git a/mm/internal.h b/mm/internal.h
> index b79abb6721cf..a0c0286a9567 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -145,6 +145,8 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
> }
>
> extern int __isolate_free_page(struct page *page, unsigned int order);
> +extern struct page * alloc_pages_zone(struct zone *zone, unsigned int order,
> + int migratetype);
> extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
> unsigned int order);
> extern void prep_compound_page(struct page *page, unsigned int order);
> @@ -165,6 +167,9 @@ extern int user_min_free_kbytes;
> struct compact_control {
> struct list_head freepages; /* List of free pages to migrate to */
> struct list_head migratepages; /* List of pages being migrated */
> + struct list_head freepages_held;/* List of free pages from the block
> + * that's being migrated
> + */
> unsigned long nr_freepages; /* Number of isolated free pages */
> unsigned long nr_migratepages; /* Number of pages to migrate */
> unsigned long free_pfn; /* isolate_freepages search base */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 59de90d5d3a3..3ee83fe02274 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2343,6 +2343,33 @@ int split_free_page(struct page *page)
> }
>
> /*
> + * Like split_free_page, but given the zone, it will grab a free page from
> + * the freelists.
> + */
> +struct page *
> +alloc_pages_zone(struct zone *zone, unsigned int order, int migratetype)
> +{
> + struct page *page;
> + unsigned long watermark;
> +
> + watermark = low_wmark_pages(zone) + (1 << order);
> + if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
> + return NULL;
> +
> + page = __rmqueue(zone, order, migratetype);
> + if (!page)
> + return NULL;
> +
> + __mod_zone_freepage_state(zone, -(1 << order),
> + get_pcppage_migratetype(page));
> +
> + set_page_owner(page, order, __GFP_MOVABLE);
> + set_page_refcounted(page);
> +
> + return page;
> +}
> +
> +/*
> * Allocate a page from the given zone. Use pcplists for order-0 allocations.
> */
> static inline
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 5e4300482897..9e07d11afa0d 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -822,6 +822,8 @@ const char * const vmstat_text[] = {
> #ifdef CONFIG_COMPACTION
> "compact_migrate_scanned",
> "compact_free_scanned",
> + "compact_free_direct_alloc",
> + "compact_free_direct_miss",
> "compact_isolated",
> "compact_stall",
> "compact_fail",
> --
> 2.7.3
>

2016-04-11 07:27:08

by Vlastimil Babka

Subject: Re: [PATCH v2 4/4] mm, compaction: direct freepage allocation for async direct compaction

On 04/11/2016 09:13 AM, Joonsoo Kim wrote:
> On Thu, Mar 31, 2016 at 10:50:36AM +0200, Vlastimil Babka wrote:
>> The goal of direct compaction is to quickly make a high-order page available
>> for the pending allocation. The free page scanner can add significant latency
>> when searching for migration targets, although to succeed the compaction, the
>> only important limit on the target free pages is that they must not come from
>> the same order-aligned block as the migrated pages.
>
> If we fails migration, free pages will remain and they can interfere
> further compaction success because they doesn't come from previous
> order-aligned block but can come from next order-aligned block. You
> need to free remaining freelist after migration attempt fails?

Oh, good point, thanks!

2016-04-11 08:17:21

by Vlastimil Babka

Subject: Re: [PATCH v2 0/4] reduce latency of direct async compaction

On 04/11/2016 09:05 AM, Joonsoo Kim wrote:
> On Thu, Mar 31, 2016 at 10:50:32AM +0200, Vlastimil Babka wrote:
>> The goal here is to reduce latency (and increase success) of direct async
>> compaction by making it focus more on the goal of creating a high-order page,
>> at some expense of thoroughness.
>>
>> This is based on an older attempt [1] which I didn't finish as it seemed that
>> it increased longer-term fragmentation. Now it seems it doesn't, and we have
>> kcompactd for that goal. The main patch (3) makes migration scanner skip whole
>> order-aligned blocks as soon as isolation fails in them, as it takes just one
>> unmigrated page to prevent a high-order buddy page from fully merging.
>>
>> Patch 4 then attempts to reduce the excessive freepage scanning (such as
>> reported in [2]) by allocating migration targets directly from freelists. Here
>> we just need to be sure that the free pages are not from the same block as the
>> migrated pages. This is also limited to direct async compaction and is not
>> meant to replace the more thorough free scanner for other scenarios.
>
> I don't like that another algorithm is introduced for async
> compaction. As you know, we already suffer from corner case that async
> compaction have (such as compaction deferring doesn't work if we only
> do async compaction). It makes further analysis/improvement harder. Generally,
> more difference on async compaction would cause more problem later.

My idea is that async compaction could become "good enough" for majority
of cases, and strive for minimum latency. If it has to be different for
that goal, so be it. But of course it should not cause problems for the
sync fallback/kcompactd work.

> In suggested approach, possible risky places I think is finish condition
> and deferring logic. Scanner meet position would be greatly affected
> by system load. If there are no processes and async compaction
> isn't aborted, freepage scanner will be at the end of the zone and
> we can scan migratable page until we reach there. But, in the other case
> that the system has some load, async compaction would be aborted easily and
> freepage scanner will be at the some of point of the zone and
> async compaction's scanning power can be limited a lot.

Hmm, I thought that I've changed the migration scanner for the new mode
to stop looking at free scanner position. Looks like I forgot/it got
lost, but I definitely wanted to try that.

> And, with different algorithm, it doesn't make sense to share same deferring
> logic. Async compaction can succeed even if sync compaction continually fails.

That makes sense.

> I hope that we don't make async/sync compaction more diverse. I'd be
> more happy if we can apply such a change to both async/sync direct
> compaction.

OK, perhaps for sync direct compaction it could be tried too. But I
think not kcompactd, which has broader goals than making a single page
of given order (well, not in the initial implementation, but I'm working
on it :)

But it just occurred to me that even kcompactd could incorporate
something like patch 3 to fight fragmentation. If we can't isolate a
page, then migrating its buddy will only create order-0 freepage. That
cannot help against fragmentation, only possibly make it worse if we
have to split a larger page for migration target. The question is, to
which order to extend this logic?

>>
>> [1] https://lkml.org/lkml/2014/7/16/988
>> [2] http://www.spinics.net/lists/linux-mm/msg97475.html
>>
>> Testing was done using stress-highalloc from mmtests, configured for order-4
>> GFP_KERNEL allocations:
>>
>> 4.6-rc1 4.6-rc1 4.6-rc1
>> patch2 patch3 patch4
>> Success 1 Min 24.00 ( 0.00%) 27.00 (-12.50%) 43.00 (-79.17%)
>> Success 1 Mean 30.20 ( 0.00%) 31.60 ( -4.64%) 51.60 (-70.86%)
>> Success 1 Max 37.00 ( 0.00%) 35.00 ( 5.41%) 73.00 (-97.30%)
>> Success 2 Min 42.00 ( 0.00%) 32.00 ( 23.81%) 73.00 (-73.81%)
>> Success 2 Mean 44.00 ( 0.00%) 44.80 ( -1.82%) 78.00 (-77.27%)
>> Success 2 Max 48.00 ( 0.00%) 52.00 ( -8.33%) 81.00 (-68.75%)
>> Success 3 Min 91.00 ( 0.00%) 92.00 ( -1.10%) 88.00 ( 3.30%)
>> Success 3 Mean 92.20 ( 0.00%) 92.80 ( -0.65%) 91.00 ( 1.30%)
>> Success 3 Max 94.00 ( 0.00%) 93.00 ( 1.06%) 94.00 ( 0.00%)
>>
>> While the eager skipping of unsuitable blocks from patch 3 didn't affect
>> success rates, direct freepage allocation did improve them.
>
> Direct freepage allocation changes compaction algorithm a lot. It
> removes limitation that we cannot get freepages from behind the
> migration scanner so we can get freepage easily. It would be achieved
> by other compaction algorithm changes (such as your pivot change or my
> compaction algorithm change or this patchset).

Pivot change or your algorithm would be definitely good for kcompactd.

> For the long term, this
> limitation should be removed for sync compaction (at least direct sync
> compaction), too. What's the reason that you don't apply this algorithm
> to other cases? Is there any change in fragmentation?

I wanted to be on the safe side. As Mel pointed out, parallel
compactions could be using the same blocks for opposite purposes, so I
left a fallback mode that's not prone to that. But I'm considering that
pageblock skip bits could be repurposed as a "pageblock lock" for
compaction. Michal's oom rework experiments show that the original
purpose of the skip bits is causing problems when compaction is asked to
"try really everything you can and either succeed, or report a real
failure", and I suspect they aren't much better than random pageblock
skipping at reducing compaction latencies.

And yeah, potential long-term fragmentation was another concern, but
hopefully will be diminished by a more proactive kcompactd.

So, it seems both you and Mel have doubts about Patch 4, but patches 1-3
could be acceptable for starters?

> Thanks.
>
>>
>> 4.6-rc1 4.6-rc1 4.6-rc1
>> patch2 patch3 patch4
>> User 2587.42 2566.53 2413.57
>> System 482.89 471.20 461.71
>> Elapsed 1395.68 1382.00 1392.87
>>
>> Times are not so useful metric for this benchmark as main portion is the
>> interfering kernel builds, but results do hint at reduced system times.
>>
>> 4.6-rc1 4.6-rc1 4.6-rc1
>> patch2 patch3 patch4
>> Direct pages scanned 163614 159608 123385
>> Kswapd pages scanned 2070139 2078790 2081385
>> Kswapd pages reclaimed 2061707 2069757 2073723
>> Direct pages reclaimed 163354 159505 122304
>>
>> Reduced direct reclaim was unintended, but could be explained by more
>> successful first attempt at (async) direct compaction, which is attempted
>> before the first reclaim attempt in __alloc_pages_slowpath().
>>
>> Compaction stalls 33052 39853 55091
>> Compaction success 12121 19773 37875
>> Compaction failures 20931 20079 17216
>>
>> Compaction is indeed more successful, and thus less likely to get deferred,
>> so there are also more direct compaction stalls.
>>
>> Page migrate success 3781876 3326819 2790838
>> Page migrate failure 45817 41774 38113
>> Compaction pages isolated 7868232 6941457 5025092
>> Compaction migrate scanned 168160492 127269354 87087993
>> Compaction migrate prescanned 0 0 0
>> Compaction free scanned 2522142582 2326342620 743205879
>> Compaction free direct alloc 0 0 920792
>> Compaction free dir. all. miss 0 0 5865
>> Compaction cost 5252 4476 3602
>>
>> Patch 2 reduces migration scanned pages by 25% thanks to the eager skipping.
>> Patch 3 reduces free scanned pages by 70%. The portion of direct allocation
>> misses to all direct allocations is less than 1% which should be acceptable.
>> Interestingly, patch 3 also reduces migration scanned pages by another 30% on
>> top of patch 2. The reason is not clear, but we can rejoice nevertheless.
>
> s/Patch 2/Patch 3
> s/Patch 3/Patch 4

Thanks.

>> Vlastimil Babka (4):
>> mm, compaction: wrap calculating first and last pfn of pageblock
>> mm, compaction: reduce spurious pcplist drains
>> mm, compaction: skip blocks where isolation fails in async direct
>> compaction
>> mm, compaction: direct freepage allocation for async direct compaction
>>
>> include/linux/vm_event_item.h | 1 +
>> mm/compaction.c | 189 ++++++++++++++++++++++++++++++++++--------
>> mm/internal.h | 5 ++
>> mm/page_alloc.c | 27 ++++++
>> mm/vmstat.c | 2 +
>> 5 files changed, 191 insertions(+), 33 deletions(-)
>>
>> --
>> 2.7.3
>>

2016-04-12 04:46:47

by Joonsoo Kim

Subject: Re: [PATCH v2 0/4] reduce latency of direct async compaction

On Mon, Apr 11, 2016 at 10:17:13AM +0200, Vlastimil Babka wrote:
> On 04/11/2016 09:05 AM, Joonsoo Kim wrote:
> >On Thu, Mar 31, 2016 at 10:50:32AM +0200, Vlastimil Babka wrote:
> >>The goal here is to reduce latency (and increase success) of direct async
> >>compaction by making it focus more on the goal of creating a high-order page,
> >>at some expense of thoroughness.
> >>
> >>This is based on an older attempt [1] which I didn't finish as it seemed that
> >>it increased longer-term fragmentation. Now it seems it doesn't, and we have
> >>kcompactd for that goal. The main patch (3) makes migration scanner skip whole
> >>order-aligned blocks as soon as isolation fails in them, as it takes just one
> >>unmigrated page to prevent a high-order buddy page from fully merging.
> >>
> >>Patch 4 then attempts to reduce the excessive freepage scanning (such as
> >>reported in [2]) by allocating migration targets directly from freelists. Here
> >>we just need to be sure that the free pages are not from the same block as the
> >>migrated pages. This is also limited to direct async compaction and is not
> >>meant to replace the more thorough free scanner for other scenarios.
> >
> >I don't like that another algorithm is introduced for async
> >compaction. As you know, we already suffer from corner case that async
> >compaction have (such as compaction deferring doesn't work if we only
> >do async compaction). It makes further analysis/improvement harder. Generally,
> >more difference on async compaction would cause more problem later.
>
> My idea is that async compaction could become "good enough" for
> majority of cases, and strive for minimum latency. If it has to be
> different for that goal, so be it. But of course it should not cause
> problems for the sync fallback/kcompactd work.

Hmm... I re-read my argument and I'm not sure I expressed my opinion
properly. What I'd like to say is that differences between async and sync
compaction will make things harder. Efficiency is important, but
maintenance (analyzing/fixing bugs) is also important. Currently, async
compaction already differs slightly from sync compaction in that
it doesn't invoke deferring but resets all scanner positions. The return
value also has a different meaning. If COMPACT_COMPLETE is returned for
async compaction, it doesn't mean that all pageblocks were scanned.
It only means that all *movable* pageblocks were scanned, and this could be
a problem on some systems. This kind of difference already makes things
complicated. Introducing a new algorithm for async compaction would make
the difference larger and cause similar problems. I worry about that.

And, the current async implementation stops the compaction when contended,
at a very random time. It could not be "good enough" in this form.
You need to fix that first.

>
> >In suggested approach, possible risky places I think is finish condition
> >and deferring logic. Scanner meet position would be greatly affected
> >by system load. If there are no processes and async compaction
> >isn't aborted, freepage scanner will be at the end of the zone and
> >we can scan migratable page until we reach there. But, in the other case
> >that the system has some load, async compaction would be aborted easily and
> >freepage scanner will be at the some of point of the zone and
> >async compaction's scanning power can be limited a lot.
>
> Hmm, I thought that I've changed the migration scanner for the new
> mode to stop looking at free scanner position. Looks like I
> forgot/it got lost, but I definitely wanted to try that.
>
> >And, with different algorithm, it doesn't make sense to share same deferring
> >logic. Async compaction can succeed even if sync compaction continually fails.
>
> That makes sense.
>
> >I hope that we don't make async/sync compaction more diverse. I'd be
> >more happy if we can apply such a change to both async/sync direct
> >compaction.
>
> OK, perhaps for sync direct compaction it could be tried too. But I
> think not kcompactd, which has broader goals than making a single
> page of given order (well, not in the initial implementation, but
> I'm working on it :)

Agreed.

> But it just occured to me that even kcompactd could incorporate
> something like patch 3 to fight fragmentation. If we can't isolate a

I'm fine if patch 3 can be applied to all the cases. I think that
direct sync compaction also needs to be fast (low latency).

> page, then migrating its buddy will only create order-0 freepage.
> That cannot help against fragmentation, only possibly make it worse
> if we have to split a larger page for migration target. The question
> is, to which order to extend this logic?

One possible candidate would be PAGE_ALLOC_COSTLY_ORDER?

> >>
> >>[1] https://lkml.org/lkml/2014/7/16/988
> >>[2] http://www.spinics.net/lists/linux-mm/msg97475.html
> >>
> >>Testing was done using stress-highalloc from mmtests, configured for order-4
> >>GFP_KERNEL allocations:
> >>
> >> 4.6-rc1 4.6-rc1 4.6-rc1
> >> patch2 patch3 patch4
> >>Success 1 Min 24.00 ( 0.00%) 27.00 (-12.50%) 43.00 (-79.17%)
> >>Success 1 Mean 30.20 ( 0.00%) 31.60 ( -4.64%) 51.60 (-70.86%)
> >>Success 1 Max 37.00 ( 0.00%) 35.00 ( 5.41%) 73.00 (-97.30%)
> >>Success 2 Min 42.00 ( 0.00%) 32.00 ( 23.81%) 73.00 (-73.81%)
> >>Success 2 Mean 44.00 ( 0.00%) 44.80 ( -1.82%) 78.00 (-77.27%)
> >>Success 2 Max 48.00 ( 0.00%) 52.00 ( -8.33%) 81.00 (-68.75%)
> >>Success 3 Min 91.00 ( 0.00%) 92.00 ( -1.10%) 88.00 ( 3.30%)
> >>Success 3 Mean 92.20 ( 0.00%) 92.80 ( -0.65%) 91.00 ( 1.30%)
> >>Success 3 Max 94.00 ( 0.00%) 93.00 ( 1.06%) 94.00 ( 0.00%)
> >>
> >>While the eager skipping of unsuitable blocks from patch 3 didn't affect
> >>success rates, direct freepage allocation did improve them.
> >
> >Direct freepage allocation changes compaction algorithm a lot. It
> >removes limitation that we cannot get freepages from behind the
> >migration scanner so we can get freepage easily. It would be achieved
> >by other compaction algorithm changes (such as your pivot change or my
> >compaction algorithm change or this patchset).
>
> Pivot change or your algorithm would be definitely good for kcompactd.
>
> >For the long term, this
> >limitation should be removed for sync compaction (at least direct sync
> >compaction), too. What's the reason that you don't apply this algorithm
> >to other cases? Is there any change in fragmentation?
>
> I wanted to be on the safe side. As Mel pointed out, parallel
> compactions could be using same blocks for opposite purposes, so
> leave a fallback mode that's not prone to that. But I'm considering
> that pageblock skip bits could be repurposed as a "pageblock lock"
> for compaction. Michal's oom rework experiments show that the
> original purpose of the skip bits is causing problems when
> compaction is asked to "try really everything you can and either
> succeed, or report a real failure" and I suspect they aren't much
> better than a random pageblock skipping in reducing compaction
> latencies.
>
> And yeah, potential long-term fragmentation was another concern, but
> hopefully will be diminished by a more proactive kcompactd.
>
> So, it seems both you and Mel have doubts about Patch 4, but patches
> 1-3 could be acceptable for starters?

Patches 1-3 would be okay, but, as I said earlier, please try to apply it to
direct sync compaction as well and measure the long-term fragmentation effect.

Thanks.

>
> >Thanks.
> >
> >>
> >> 4.6-rc1 4.6-rc1 4.6-rc1
> >> patch2 patch3 patch4
> >>User 2587.42 2566.53 2413.57
> >>System 482.89 471.20 461.71
> >>Elapsed 1395.68 1382.00 1392.87
> >>
> >>Times are not so useful metric for this benchmark as main portion is the
> >>interfering kernel builds, but results do hint at reduced system times.
> >>
> >> 4.6-rc1 4.6-rc1 4.6-rc1
> >> patch2 patch3 patch4
> >>Direct pages scanned 163614 159608 123385
> >>Kswapd pages scanned 2070139 2078790 2081385
> >>Kswapd pages reclaimed 2061707 2069757 2073723
> >>Direct pages reclaimed 163354 159505 122304
> >>
> >>Reduced direct reclaim was unintended, but could be explained by more
> >>successful first attempt at (async) direct compaction, which is attempted
> >>before the first reclaim attempt in __alloc_pages_slowpath().
> >>
> >>Compaction stalls 33052 39853 55091
> >>Compaction success 12121 19773 37875
> >>Compaction failures 20931 20079 17216
> >>
> >>Compaction is indeed more successful, and thus less likely to get deferred,
> >>so there are also more direct compaction stalls.
> >>
> >>Page migrate success 3781876 3326819 2790838
> >>Page migrate failure 45817 41774 38113
> >>Compaction pages isolated 7868232 6941457 5025092
> >>Compaction migrate scanned 168160492 127269354 87087993
> >>Compaction migrate prescanned 0 0 0
> >>Compaction free scanned 2522142582 2326342620 743205879
> >>Compaction free direct alloc 0 0 920792
> >>Compaction free dir. all. miss 0 0 5865
> >>Compaction cost 5252 4476 3602
> >>
> >>Patch 2 reduces migration scanned pages by 25% thanks to the eager skipping.
> >>Patch 3 reduces free scanned pages by 70%. The portion of direct allocation
> >>misses to all direct allocations is less than 1% which should be acceptable.
> >>Interestingly, patch 3 also reduces migration scanned pages by another 30% on
> >>top of patch 2. The reason is not clear, but we can rejoice nevertheless.
> >
> >s/Patch 2/Patch 3
> >s/Patch 3/Patch 4
>
> Thanks.
>
> >>Vlastimil Babka (4):
> >> mm, compaction: wrap calculating first and last pfn of pageblock
> >> mm, compaction: reduce spurious pcplist drains
> >> mm, compaction: skip blocks where isolation fails in async direct
> >> compaction
> >> mm, compaction: direct freepage allocation for async direct compaction
> >>
> >> include/linux/vm_event_item.h | 1 +
> >> mm/compaction.c | 189 ++++++++++++++++++++++++++++++++++--------
> >> mm/internal.h | 5 ++
> >> mm/page_alloc.c | 27 ++++++
> >> mm/vmstat.c | 2 +
> >> 5 files changed, 191 insertions(+), 33 deletions(-)
> >>
> >>--
> >>2.7.3
> >>