Last year, Johannes Weiner reported a regression in page mobility
grouping [1]. While the exact cause was not found, I've come up with some
ways to improve the situation by reducing the number of allocations that fall
back to a different migratetype and cause permanent fragmentation.
Changes since v2:
- incorporated feedback (nothing major)
- updated to current mmotm
- dropped the more intrusive RFC patches 9-10
The series was tested with mmtests stress-highalloc modified to do GFP_KERNEL
order-4 allocations, on 4.9 with "mm, vmscan: fix zone balance check in
prepare_kswapd_sleep" (without that, kcompactd indeed wasn't woken up) on UMA
machine with 4GB memory. There were 5 repeats of each run, as the extfrag stats
are quite volatile (note the stats below are sums, not averages, as it was less
perl hacking for me).
Success rates are the same, already high due to the low allocation order used,
so I'm not including them.
Compaction stats:
(the patches are stacked, and I haven't measured the non-functional-changes
patches separately)
patch 1 patch 2 patch 3 patch 4 patch 7 patch 8
Compaction stalls 22449 24680 24846 19765 22059 17480
Compaction success 12971 14836 14608 10475 11632 8757
Compaction failures 9477 9843 10238 9290 10426 8722
Page migrate success 3109022 3370438 3312164 1695105 1608435 2111379
Page migrate failure 911588 1149065 1028264 1112675 1077251 1026367
Compaction pages isolated 7242983 8015530 7782467 4629063 4402787 5377665
Compaction migrate scanned 980838938 987367943 957690188 917647238 947155598 1018922197
Compaction free scanned 557926893 598946443 602236894 594024490 541169699 763651731
Compaction cost 10243 10578 10304 8286 8398 9440
Compaction stats are mostly within noise until patch 4, which decreases the
number of compactions and migrations. Part of that could be due to more
pageblocks marked as unmovable, and async compaction skipping those. This
changes a bit with patch 7, but not so much. Patch 8 increases free scanner
stats and migrations, which comes from the changed termination criteria.
Interestingly, the number of compactions decreases - probably the fully compacted
pageblock satisfies multiple subsequent allocations, so it amortizes.
Next comes the extfrag tracepoint, where "fragmenting" means that an allocation
had to fall back to a pageblock of another migratetype which wasn't fully free
(which is almost all of the fallbacks). I have locally added another tracepoint
for "Page steal" into steal_suitable_fallback() which triggers in situations
where we are allowed to do move_freepages_block(). If we decide to also do
set_pageblock_migratetype(), it's "Pages steal with pageblock", with a breakdown
of which allocation migratetype we are stealing for and which fallback
migratetype we are stealing from. The last part "due to counting" comes from
patch 4 and counts the
events where the counting of movable pages allowed us to change pageblock's
migratetype, while the number of free pages alone wouldn't be enough to cross
the threshold.
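For clarity, a minimal sketch of the condition that the "due to counting" lines
capture (my own illustration of patch 4's logic; the helper name is made up and
the threshold is the existing halfway point used by steal_suitable_fallback()):

static bool stole_due_to_counting(int free_pages, int alike_pages)
{
	int threshold = 1 << (pageblock_order - 1);

	/* free pages alone would not cross the threshold... */
	if (free_pages >= threshold)
		return false;

	/* ...but together with compatible allocated pages they do */
	return free_pages + alike_pages >= threshold;
}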
patch 1 patch 2 patch 3 patch 4 patch 7 patch 8
Page alloc extfrag event 10155066 8522968 10164959 15622080 13727068 13140319
Extfrag fragmenting 10149231 8517025 10159040 15616925 13721391 13134792
Extfrag fragmenting for unmovable 159504 168500 184177 97835 70625 56948
Extfrag fragmenting unmovable placed with movable 153613 163549 172693 91740 64099 50917
Extfrag fragmenting unmovable placed with reclaim. 5891 4951 11484 6095 6526 6031
Extfrag fragmenting for reclaimable 4738 4829 6345 4822 5640 5378
Extfrag fragmenting reclaimable placed with movable 1836 1902 1851 1579 1739 1760
Extfrag fragmenting reclaimable placed with unmov. 2902 2927 4494 3243 3901 3618
Extfrag fragmenting for movable 9984989 8343696 9968518 15514268 13645126 13072466
Pages steal 179954 192291 210880 123254 94545 81486
Pages steal with pageblock 22153 18943 20154 33562 29969 33444
Pages steal with pageblock for unmovable 14350 12858 13256 20660 19003 20852
Pages steal with pageblock for unmovable from mov. 12812 11402 11683 19072 17467 19298
Pages steal with pageblock for unmovable from recl. 1538 1456 1573 1588 1536 1554
Pages steal with pageblock for movable 7114 5489 5965 11787 10012 11493
Pages steal with pageblock for movable from unmov. 6885 5291 5541 11179 9525 10885
Pages steal with pageblock for movable from recl. 229 198 424 608 487 608
Pages steal with pageblock for reclaimable 689 596 933 1115 954 1099
Pages steal with pageblock for reclaimable from unmov. 273 219 537 658 547 667
Pages steal with pageblock for reclaimable from mov. 416 377 396 457 407 432
Pages steal with pageblock due to counting 11834 10075 7530
... for unmovable 8993 7381 4616
... for movable 2792 2653 2851
... for reclaimable 49 41 63
What we can see is that "Extfrag fragmenting for unmovable" and "... placed with
movable" drop with almost every patch, which is good, as we are polluting fewer
movable pageblocks with unmovable pages.
The most significant change is patch 4 with movable page counting. On the other
hand, it increases "Extfrag fragmenting for movable" by 50%. "Pages steal" drops,
though, so these movable allocation fallbacks find only small free pages and are
not allowed to steal whole pageblocks back. "Pages steal with pageblock" rises,
because the patch increases the chance that pageblock migratetype changes
happen. This affects all migratetypes.
The summary is that patch 4 is not a clear win wrt these stats, but I believe
that the tradeoff it makes is a good one. There's less pollution of movable
pageblocks by unmovable allocations. There's less stealing between pageblocks,
and the steals that remain have a higher chance of also changing the migratetype
of the pageblock itself, so it should more faithfully reflect the migratetype of
the pages within the pageblock. The increase in movable allocations falling back
to unmovable pageblocks might look dramatic, but those allocations can be
migrated by compaction when needed, and other patches in the series (7-8)
improve that aspect.
Patches 7 and 8 continue the trend of reduced unmovable fallbacks and also
reduce the impact on movable fallbacks from patch 4.
[1] https://www.spinics.net/lists/linux-mm/msg114237.html
Vlastimil Babka (8):
mm, compaction: reorder fields in struct compact_control
mm, compaction: remove redundant watermark check in compact_finished()
mm, page_alloc: split smallest stolen page in fallback
mm, page_alloc: count movable pages when stealing from pageblock
mm, compaction: change migrate_async_suitable() to
suitable_migration_source()
mm, compaction: add migratetype to compact_control
mm, compaction: restrict async compaction to pageblocks of same
migratetype
mm, compaction: finish whole pageblock to reduce fragmentation
include/linux/mmzone.h | 5 ++
include/linux/page-isolation.h | 5 +-
mm/compaction.c | 83 ++++++++++++++++-------
mm/internal.h | 12 ++--
mm/page_alloc.c | 148 ++++++++++++++++++++++++++++-------------
mm/page_isolation.c | 5 +-
6 files changed, 177 insertions(+), 81 deletions(-)
--
2.12.0
When detecting whether compaction has succeeded in forming a high-order page,
__compact_finished() employs a watermark check, followed by its own search for
a suitable page in the freelists. This is not ideal for two reasons:
- The watermark check also searches high-order freelists, but uses a less strict
criterion wrt fallbacks. It's therefore redundant and a waste of cycles. This was
different in the past, when the high-order watermark check attempted to apply
reserves to high-order pages.
- The watermark check might actually fail due to a lack of order-0 pages.
Compaction can't help with that, so there's no point in continuing just because
of it. It's possible that a suitable high-order page still exists, in which case
the freelist search below finds it and compaction terminates.
This patch therefore removes the watermark check. This should save some cycles
and terminate compaction sooner in some cases.
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
---
mm/compaction.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 81e1eaa2a2cf..9222ff362f33 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1280,7 +1280,6 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
const int migratetype)
{
unsigned int order;
- unsigned long watermark;
if (cc->contended || fatal_signal_pending(current))
return COMPACT_CONTENDED;
@@ -1308,13 +1307,6 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
if (is_via_compact_memory(cc->order))
return COMPACT_CONTINUE;
- /* Compaction run is not finished if the watermark is not met */
- watermark = zone->watermark[cc->alloc_flags & ALLOC_WMARK_MASK];
-
- if (!zone_watermark_ok(zone, cc->order, watermark, cc->classzone_idx,
- cc->alloc_flags))
- return COMPACT_CONTINUE;
-
/* Direct compactor: Is a suitable page free? */
for (order = cc->order; order < MAX_ORDER; order++) {
struct free_area *area = &zone->free_area[order];
--
2.12.0
Preparation patch. We are going to need migratetype at lower layers than
compact_zone() and compact_finished().
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
---
mm/compaction.c | 15 +++++++--------
mm/internal.h | 1 +
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index a7c2f0da7228..c48da73e30a5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1279,10 +1279,11 @@ static inline bool is_via_compact_memory(int order)
return order == -1;
}
-static enum compact_result __compact_finished(struct zone *zone, struct compact_control *cc,
- const int migratetype)
+static enum compact_result __compact_finished(struct zone *zone,
+ struct compact_control *cc)
{
unsigned int order;
+ const int migratetype = cc->migratetype;
if (cc->contended || fatal_signal_pending(current))
return COMPACT_CONTENDED;
@@ -1338,12 +1339,11 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
}
static enum compact_result compact_finished(struct zone *zone,
- struct compact_control *cc,
- const int migratetype)
+ struct compact_control *cc)
{
int ret;
- ret = __compact_finished(zone, cc, migratetype);
+ ret = __compact_finished(zone, cc);
trace_mm_compaction_finished(zone, cc->order, ret);
if (ret == COMPACT_NO_SUITABLE_PAGE)
ret = COMPACT_CONTINUE;
@@ -1476,9 +1476,9 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
enum compact_result ret;
unsigned long start_pfn = zone->zone_start_pfn;
unsigned long end_pfn = zone_end_pfn(zone);
- const int migratetype = gfpflags_to_migratetype(cc->gfp_mask);
const bool sync = cc->mode != MIGRATE_ASYNC;
+ cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
cc->classzone_idx);
/* Compaction is likely to fail */
@@ -1528,8 +1528,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
migrate_prep_local();
- while ((ret = compact_finished(zone, cc, migratetype)) ==
- COMPACT_CONTINUE) {
+ while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
int err;
switch (isolate_migratepages(zone, cc)) {
diff --git a/mm/internal.h b/mm/internal.h
index 05c48a95a20a..3985656ac261 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -193,6 +193,7 @@ struct compact_control {
unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
const gfp_t gfp_mask; /* gfp mask of a direct compactor */
int order; /* order a direct compactor needs */
+ int migratetype; /* migratetype of direct compactor */
const unsigned int alloc_flags; /* alloc flags of a direct compactor */
const int classzone_idx; /* zone index of a direct compactor */
enum migrate_mode mode; /* Async or sync migration mode */
--
2.12.0
When stealing pages from a pageblock of a different migratetype, we count how
many free pages were stolen and change the pageblock's migratetype if more
than half of the pageblock was free. This might be too conservative, as there
might be other pages that are not free but were allocated with the same
migratetype as our allocation requested.
While we cannot determine the migratetype of allocated pages precisely (at
least without the page_owner functionality enabled), we can count pages that
compaction would try to isolate for migration - those are either on LRU or
__PageMovable(). The rest can be assumed to be MIGRATE_RECLAIMABLE or
MIGRATE_UNMOVABLE, which we cannot easily distinguish. This counting can be
done as part of free page stealing with little additional overhead.
The page stealing code is changed so that it considers free pages plus pages
of the "good" migratetype for the decision whether to change pageblock's
migratetype.
The result should be a pageblock migratetype that more accurately reflects the
actual pages in the pageblock when stealing from semi-occupied pageblocks. This
should help the efficiency of page grouping by mobility.
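To make the new criterion concrete, here is a worked example (my own numbers,
assuming the common pageblock_order of 9, i.e. 512 pages per pageblock, and the
existing halfway threshold of 256 pages): suppose an unmovable allocation falls
back to a MOVABLE pageblock where move_freepages_block() moves 150 free pages
and counts 120 movable pages. The pages assumed compatible with the unmovable
allocation are then alike_pages = 512 - (150 + 120) = 242, and since
150 + 242 = 392 >= 256, the pageblock's migratetype is changed, whereas the 150
free pages alone would not have been enough.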
In testing based on a 4.9 kernel with stress-highalloc from mmtests configured
for order-4 GFP_KERNEL allocations, this patch has reduced the number of
unmovable allocations falling back to movable pageblocks by 47%. The number
of movable allocations falling back to other pageblocks is increased by 55%,
but these events don't cause permanent fragmentation, so the tradeoff should
be positive. Later patches also offset the movable fallback increase to some
extent.
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Mel Gorman <[email protected]>
---
include/linux/page-isolation.h | 5 +--
mm/page_alloc.c | 71 +++++++++++++++++++++++++++++++++---------
mm/page_isolation.c | 5 +--
3 files changed, 61 insertions(+), 20 deletions(-)
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 047d64706f2a..d4cd2014fa6f 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,10 +33,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
bool skip_hwpoisoned_pages);
void set_pageblock_migratetype(struct page *page, int migratetype);
int move_freepages_block(struct zone *zone, struct page *page,
- int migratetype);
-int move_freepages(struct zone *zone,
- struct page *start_page, struct page *end_page,
- int migratetype);
+ int migratetype, int *num_movable);
/*
* Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index eda7fedf6378..db96d1ebbed8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1836,9 +1836,9 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
* Note that start_page and end_pages are not aligned on a pageblock
* boundary. If alignment is required, use move_freepages_block()
*/
-int move_freepages(struct zone *zone,
+static int move_freepages(struct zone *zone,
struct page *start_page, struct page *end_page,
- int migratetype)
+ int migratetype, int *num_movable)
{
struct page *page;
unsigned int order;
@@ -1855,6 +1855,9 @@ int move_freepages(struct zone *zone,
VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
#endif
+ if (num_movable)
+ *num_movable = 0;
+
for (page = start_page; page <= end_page;) {
if (!pfn_valid_within(page_to_pfn(page))) {
page++;
@@ -1865,6 +1868,15 @@ int move_freepages(struct zone *zone,
VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
if (!PageBuddy(page)) {
+ /*
+ * We assume that pages that could be isolated for
+ * migration are movable. But we don't actually try
+ * isolating, as that would be expensive.
+ */
+ if (num_movable &&
+ (PageLRU(page) || __PageMovable(page)))
+ (*num_movable)++;
+
page++;
continue;
}
@@ -1880,7 +1892,7 @@ int move_freepages(struct zone *zone,
}
int move_freepages_block(struct zone *zone, struct page *page,
- int migratetype)
+ int migratetype, int *num_movable)
{
unsigned long start_pfn, end_pfn;
struct page *start_page, *end_page;
@@ -1897,7 +1909,8 @@ int move_freepages_block(struct zone *zone, struct page *page,
if (!zone_spans_pfn(zone, end_pfn))
return 0;
- return move_freepages(zone, start_page, end_page, migratetype);
+ return move_freepages(zone, start_page, end_page, migratetype,
+ num_movable);
}
static void change_pageblock_range(struct page *pageblock_page,
@@ -1947,22 +1960,26 @@ static bool can_steal_fallback(unsigned int order, int start_mt)
/*
* This function implements actual steal behaviour. If order is large enough,
* we can steal whole pageblock. If not, we first move freepages in this
- * pageblock and check whether half of pages are moved or not. If half of
- * pages are moved, we can change migratetype of pageblock and permanently
- * use it's pages as requested migratetype in the future.
+ * pageblock to our migratetype and determine how many already-allocated pages
+ * are there in the pageblock with a compatible migratetype. If at least half
+ * of pages are free or compatible, we can change migratetype of the pageblock
+ * itself, so pages freed in the future will be put on the correct free list.
*/
static void steal_suitable_fallback(struct zone *zone, struct page *page,
int start_type, bool whole_block)
{
unsigned int current_order = page_order(page);
struct free_area *area;
- int pages;
+ int free_pages, movable_pages, alike_pages;
+ int old_block_type;
+
+ old_block_type = get_pageblock_migratetype(page);
/*
* This can happen due to races and we want to prevent broken
* highatomic accounting.
*/
- if (is_migrate_highatomic_page(page))
+ if (is_migrate_highatomic(old_block_type))
goto single_page;
/* Take ownership for orders >= pageblock_order */
@@ -1975,10 +1992,35 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
if (!whole_block)
goto single_page;
- pages = move_freepages_block(zone, page, start_type);
+ free_pages = move_freepages_block(zone, page, start_type,
+ &movable_pages);
+ /*
+ * Determine how many pages are compatible with our allocation.
+ * For movable allocation, it's the number of movable pages which
+ * we just obtained. For other types it's a bit more tricky.
+ */
+ if (start_type == MIGRATE_MOVABLE) {
+ alike_pages = movable_pages;
+ } else {
+ /*
+ * If we are falling back a RECLAIMABLE or UNMOVABLE allocation
+ * to MOVABLE pageblock, consider all non-movable pages as
+ * compatible. If it's UNMOVABLE falling back to RECLAIMABLE or
+ * vice versa, be conservative since we can't distinguish the
+ * exact migratetype of non-movable pages.
+ */
+ if (old_block_type == MIGRATE_MOVABLE)
+ alike_pages = pageblock_nr_pages
+ - (free_pages + movable_pages);
+ else
+ alike_pages = 0;
+ }
- /* Claim the whole block if over half of it is free */
- if (pages >= (1 << (pageblock_order-1)) ||
+ /*
+ * If a sufficient number of pages in the block are either free or of
+ * comparable migratability as our allocation, claim the whole block.
+ */
+ if (free_pages + alike_pages >= (1 << (pageblock_order-1)) ||
page_group_by_mobility_disabled)
set_pageblock_migratetype(page, start_type);
@@ -2056,7 +2098,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
&& !is_migrate_cma(mt)) {
zone->nr_reserved_highatomic += pageblock_nr_pages;
set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
- move_freepages_block(zone, page, MIGRATE_HIGHATOMIC);
+ move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
}
out_unlock:
@@ -2133,7 +2175,8 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
* may increase.
*/
set_pageblock_migratetype(page, ac->migratetype);
- ret = move_freepages_block(zone, page, ac->migratetype);
+ ret = move_freepages_block(zone, page, ac->migratetype,
+ NULL);
if (ret) {
spin_unlock_irqrestore(&zone->lock, flags);
return ret;
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 7927bbb54a4e..5092e4ef00c8 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -66,7 +66,8 @@ static int set_migratetype_isolate(struct page *page,
set_pageblock_migratetype(page, MIGRATE_ISOLATE);
zone->nr_isolate_pageblock++;
- nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
+ nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
+ NULL);
__mod_zone_freepage_state(zone, -nr_pages, migratetype);
}
@@ -120,7 +121,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
* pageblock scanning for freepage moving.
*/
if (!isolated_page) {
- nr_pages = move_freepages_block(zone, page, migratetype);
+ nr_pages = move_freepages_block(zone, page, migratetype, NULL);
__mod_zone_freepage_state(zone, nr_pages, migratetype);
}
set_pageblock_migratetype(page, migratetype);
--
2.12.0
The __rmqueue_fallback() function is called when there's no free page of the
requested migratetype, so we need to steal from a pageblock of a different one.
There are various heuristics to make this event infrequent and reduce permanent
fragmentation. The main one is to try stealing from a pageblock that has the
most free pages, and possibly steal them all at once and convert the whole
pageblock. Precise searching for such a pageblock would be expensive, so instead
the heuristic walks the free lists from the highest order down to the requested
order and assumes that the block backing the highest-order free page is likely
to also have the most free pages in total.
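For illustration, a simplified sketch of that walk (not the verbatim kernel
loop; among other things it ignores the unsigned-index wraparound handling that
the real __rmqueue_fallback() needs):

	int current_order;
	int fallback_mt = -1;
	bool can_steal;

	/* try the largest free pages first, down to the requested order */
	for (current_order = MAX_ORDER - 1; current_order >= (int)order;
						current_order--) {
		struct free_area *area = &zone->free_area[current_order];

		fallback_mt = find_suitable_fallback(area, current_order,
					start_migratetype, false, &can_steal);
		if (fallback_mt != -1)
			break;
	}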
Chances are that, together with the highest-order page, we also steal pages of
lower orders from the same block. But then we still split the highest-order
page. This is wasteful and can contribute to fragmentation instead of avoiding
it.
This patch thus changes __rmqueue_fallback() to just steal the page(s) and put
them on the freelist of the requested migratetype, and only report whether it
was successful. Then we pick (and eventually split) the smallest page with
__rmqueue_smallest(). This all happens under the zone lock, so nobody can steal
the page from us in the process. This should reduce fragmentation due to
fallbacks. At worst we are only stealing a single highest-order page and wasting
some cycles by moving it between lists and then removing it, but fallback is not
exactly a hot path, so that should not be a concern. As a side benefit, the
patch removes some
duplicate code by reusing __rmqueue_smallest().
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
---
mm/page_alloc.c | 59 +++++++++++++++++++++++++++++++++------------------------
1 file changed, 34 insertions(+), 25 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5238b87aec91..eda7fedf6378 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1952,23 +1952,41 @@ static bool can_steal_fallback(unsigned int order, int start_mt)
* use it's pages as requested migratetype in the future.
*/
static void steal_suitable_fallback(struct zone *zone, struct page *page,
- int start_type)
+ int start_type, bool whole_block)
{
unsigned int current_order = page_order(page);
+ struct free_area *area;
int pages;
+ /*
+ * This can happen due to races and we want to prevent broken
+ * highatomic accounting.
+ */
+ if (is_migrate_highatomic_page(page))
+ goto single_page;
+
/* Take ownership for orders >= pageblock_order */
if (current_order >= pageblock_order) {
change_pageblock_range(page, current_order, start_type);
- return;
+ goto single_page;
}
+ /* We are not allowed to try stealing from the whole block */
+ if (!whole_block)
+ goto single_page;
+
pages = move_freepages_block(zone, page, start_type);
/* Claim the whole block if over half of it is free */
if (pages >= (1 << (pageblock_order-1)) ||
page_group_by_mobility_disabled)
set_pageblock_migratetype(page, start_type);
+
+ return;
+
+single_page:
+ area = &zone->free_area[current_order];
+ list_move(&page->lru, &area->free_list[start_type]);
}
/*
@@ -2127,8 +2145,13 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
return false;
}
-/* Remove an element from the buddy allocator from the fallback list */
-static inline struct page *
+/*
+ * Try finding a free buddy page on the fallback list and put it on the free
+ * list of requested migratetype, possibly along with other pages from the same
+ * block, depending on fragmentation avoidance heuristics. Returns true if
+ * fallback was found so that __rmqueue_smallest() can grab it.
+ */
+static inline bool
__rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
{
struct free_area *area;
@@ -2149,32 +2172,17 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
page = list_first_entry(&area->free_list[fallback_mt],
struct page, lru);
- if (can_steal && !is_migrate_highatomic_page(page))
- steal_suitable_fallback(zone, page, start_migratetype);
- /* Remove the page from the freelists */
- area->nr_free--;
- list_del(&page->lru);
- rmv_page_order(page);
-
- expand(zone, page, order, current_order, area,
- start_migratetype);
- /*
- * The pcppage_migratetype may differ from pageblock's
- * migratetype depending on the decisions in
- * find_suitable_fallback(). This is OK as long as it does not
- * differ for MIGRATE_CMA pageblocks. Those can be used as
- * fallback only via special __rmqueue_cma_fallback() function
- */
- set_pcppage_migratetype(page, start_migratetype);
+ steal_suitable_fallback(zone, page, start_migratetype,
+ can_steal);
trace_mm_page_alloc_extfrag(page, order, current_order,
start_migratetype, fallback_mt);
- return page;
+ return true;
}
- return NULL;
+ return false;
}
/*
@@ -2186,13 +2194,14 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
{
struct page *page;
+retry:
page = __rmqueue_smallest(zone, order, migratetype);
if (unlikely(!page)) {
if (migratetype == MIGRATE_MOVABLE)
page = __rmqueue_cma_fallback(zone, order);
- if (!page)
- page = __rmqueue_fallback(zone, order, migratetype);
+ if (!page && __rmqueue_fallback(zone, order, migratetype))
+ goto retry;
}
trace_mm_page_alloc_zone_locked(page, order, migratetype);
--
2.12.0
Currently there are (mostly by accident) no holes in struct compact_control on
x86_64, but we are going to add more bool flags, so place them all together at
the end of the structure. While at it, also order all fields from largest to
smallest.
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
---
mm/internal.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 823a7a89099b..05c48a95a20a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -183,6 +183,7 @@ extern int user_min_free_kbytes;
struct compact_control {
struct list_head freepages; /* List of free pages to migrate to */
struct list_head migratepages; /* List of pages being migrated */
+ struct zone *zone;
unsigned long nr_freepages; /* Number of isolated free pages */
unsigned long nr_migratepages; /* Number of pages to migrate */
unsigned long total_migrate_scanned;
@@ -190,16 +191,15 @@ struct compact_control {
unsigned long free_pfn; /* isolate_freepages search base */
unsigned long migrate_pfn; /* isolate_migratepages search base */
unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
+ const gfp_t gfp_mask; /* gfp mask of a direct compactor */
+ int order; /* order a direct compactor needs */
+ const unsigned int alloc_flags; /* alloc flags of a direct compactor */
+ const int classzone_idx; /* zone index of a direct compactor */
enum migrate_mode mode; /* Async or sync migration mode */
bool ignore_skip_hint; /* Scan blocks even if marked skip */
bool ignore_block_suitable; /* Scan blocks considered unsuitable */
bool direct_compaction; /* False from kcompactd or /proc/... */
bool whole_zone; /* Whole zone should/has been scanned */
- int order; /* order a direct compactor needs */
- const gfp_t gfp_mask; /* gfp mask of a direct compactor */
- const unsigned int alloc_flags; /* alloc flags of a direct compactor */
- const int classzone_idx; /* zone index of a direct compactor */
- struct zone *zone;
bool contended; /* Signal lock or sched contention */
};
--
2.12.0
The main goal of direct compaction is to form a high-order page for allocation,
but it should also help against long-term fragmentation when possible. Most
lower-than-pageblock-order compactions are for non-movable allocations, which
means that if we compact in a movable pageblock and terminate as soon as we
create the high-order page, it's unlikely that the fallback heuristics will
claim the whole block. Instead there might be a single unmovable page in a
pageblock full of movable pages, and the next unmovable allocation might pick
another pageblock and increase long-term fragmentation.
To help against such scenarios, this patch changes the termination criteria for
compaction so that the current pageblock is finished even if the high-order
page already exists. Note that it might be possible that the high-order page
formed elsewhere in the zone due to parallel activity, but this patch doesn't
try to detect that.
This is only done with sync compaction, because async compaction is limited to
pageblocks of the same migratetype, where it cannot result in a migratetype
fallback. (Async compaction also eagerly skips order-aligned blocks where
isolation fails, which is against the goal of migrating away as much of the
pageblock as possible.)
As a result of this patch, long-term memory fragmentation should be reduced.
In testing based on a 4.9 kernel with stress-highalloc from mmtests configured
for order-4 GFP_KERNEL allocations, this patch has reduced the number of
unmovable allocations falling back to movable pageblocks by 20%. The number
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
---
mm/compaction.c | 36 ++++++++++++++++++++++++++++++++++--
mm/internal.h | 1 +
2 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 2c288e75840d..bc7903130501 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1318,6 +1318,17 @@ static enum compact_result __compact_finished(struct zone *zone,
if (is_via_compact_memory(cc->order))
return COMPACT_CONTINUE;
+ if (cc->finishing_block) {
+ /*
+ * We have finished the pageblock, but better check again that
+ * we really succeeded.
+ */
+ if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
+ cc->finishing_block = false;
+ else
+ return COMPACT_CONTINUE;
+ }
+
/* Direct compactor: Is a suitable page free? */
for (order = cc->order; order < MAX_ORDER; order++) {
struct free_area *area = &zone->free_area[order];
@@ -1338,8 +1349,29 @@ static enum compact_result __compact_finished(struct zone *zone,
* other migratetype buddy lists.
*/
if (find_suitable_fallback(area, order, migratetype,
- true, &can_steal) != -1)
- return COMPACT_SUCCESS;
+ true, &can_steal) != -1) {
+
+ /* movable pages are OK in any pageblock */
+ if (migratetype == MIGRATE_MOVABLE)
+ return COMPACT_SUCCESS;
+
+ /*
+ * We are stealing for a non-movable allocation. Make
+ * sure we finish compacting the current pageblock
+ * first so it is as free as possible and we won't
+ * have to steal another one soon. This only applies
+ * to sync compaction, as async compaction operates
+ * on pageblocks of the same migratetype.
+ */
+ if (cc->mode == MIGRATE_ASYNC ||
+ IS_ALIGNED(cc->migrate_pfn,
+ pageblock_nr_pages)) {
+ return COMPACT_SUCCESS;
+ }
+
+ cc->finishing_block = true;
+ return COMPACT_CONTINUE;
+ }
}
return COMPACT_NO_SUITABLE_PAGE;
diff --git a/mm/internal.h b/mm/internal.h
index 3985656ac261..e417084c2fb1 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -202,6 +202,7 @@ struct compact_control {
bool direct_compaction; /* False from kcompactd or /proc/... */
bool whole_zone; /* Whole zone should/has been scanned */
bool contended; /* Signal lock or sched contention */
+ bool finishing_block; /* Finishing current pageblock */
};
unsigned long
--
2.12.0
The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
pageblocks. This is a heuristic intended to reduce latency, based on the
assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.
However, with the exception of THPs, most high-order allocations are not
movable. Should the async compaction succeed, this increases the chance that
the non-MOVABLE allocations will fall back to a MOVABLE pageblock, making the
long-term fragmentation worse.
This patch attempts to help the situation by changing async direct compaction
so that the migrate scanner only scans the pageblocks of the requested
migratetype. If it's a non-MOVABLE type and there are such pageblocks that do
contain movable pages, chances are that the allocation can succeed within one
of those pageblocks, removing the need for a fallback. If that fails, the
subsequent sync attempt will ignore this restriction.
In testing based on a 4.9 kernel with stress-highalloc from mmtests configured
for order-4 GFP_KERNEL allocations, this patch has reduced the number of
unmovable allocations falling back to movable pageblocks by 30%. The number
of movable allocations falling back is reduced by 12%.
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 11 +++++++++--
mm/page_alloc.c | 20 +++++++++++++-------
2 files changed, 22 insertions(+), 9 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index c48da73e30a5..2c288e75840d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -986,10 +986,17 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
static bool suitable_migration_source(struct compact_control *cc,
struct page *page)
{
- if (cc->mode != MIGRATE_ASYNC)
+ int block_mt;
+
+ if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction)
return true;
- return is_migrate_movable(get_pageblock_migratetype(page));
+ block_mt = get_pageblock_migratetype(page);
+
+ if (cc->migratetype == MIGRATE_MOVABLE)
+ return is_migrate_movable(block_mt);
+ else
+ return block_mt == cc->migratetype;
}
/* Returns true if the page is within a block suitable for migration to */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index db96d1ebbed8..01ce6d41cb20 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3665,6 +3665,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
struct alloc_context *ac)
{
bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
+ const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
struct page *page = NULL;
unsigned int alloc_flags;
unsigned long did_some_progress;
@@ -3732,12 +3733,17 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
/*
* For costly allocations, try direct compaction first, as it's likely
- * that we have enough base pages and don't need to reclaim. Don't try
- * that for allocations that are allowed to ignore watermarks, as the
- * ALLOC_NO_WATERMARKS attempt didn't yet happen.
+ * that we have enough base pages and don't need to reclaim. For non-
+ * movable high-order allocations, do that as well, as compaction will
+ * try prevent permanent fragmentation by migrating from blocks of the
+ * same migratetype.
+ * Don't try this for allocations that are allowed to ignore
+ * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
*/
- if (can_direct_reclaim && order > PAGE_ALLOC_COSTLY_ORDER &&
- !gfp_pfmemalloc_allowed(gfp_mask)) {
+ if (can_direct_reclaim &&
+ (costly_order ||
+ (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
+ && !gfp_pfmemalloc_allowed(gfp_mask)) {
page = __alloc_pages_direct_compact(gfp_mask, order,
alloc_flags, ac,
INIT_COMPACT_PRIORITY,
@@ -3749,7 +3755,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* Checks for costly allocations with __GFP_NORETRY, which
* includes THP page fault allocations
*/
- if (gfp_mask & __GFP_NORETRY) {
+ if (costly_order && (gfp_mask & __GFP_NORETRY)) {
/*
* If compaction is deferred for high-order allocations,
* it is because sync compaction recently failed. If
@@ -3830,7 +3836,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* Do not retry costly high order allocations unless they are
* __GFP_REPEAT
*/
- if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
+ if (costly_order && !(gfp_mask & __GFP_REPEAT))
goto nopage;
if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
--
2.12.0
Preparation for making the decisions more complex and dependent on
compact_control flags. No functional change.
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
---
include/linux/mmzone.h | 5 +++++
mm/compaction.c | 19 +++++++++++--------
2 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 446cf68c1c09..618499159a7c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -74,6 +74,11 @@ extern char * const migratetype_names[MIGRATE_TYPES];
# define is_migrate_cma_page(_page) false
#endif
+static inline bool is_migrate_movable(int mt)
+{
+ return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
+}
+
#define for_each_migratetype_order(order, type) \
for (order = 0; order < MAX_ORDER; order++) \
for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/mm/compaction.c b/mm/compaction.c
index 9222ff362f33..a7c2f0da7228 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -89,11 +89,6 @@ static void map_pages(struct list_head *list)
list_splice(&tmp_list, list);
}
-static inline bool migrate_async_suitable(int migratetype)
-{
- return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
-}
-
#ifdef CONFIG_COMPACTION
int PageMovable(struct page *page)
@@ -988,6 +983,15 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
#endif /* CONFIG_COMPACTION || CONFIG_CMA */
#ifdef CONFIG_COMPACTION
+static bool suitable_migration_source(struct compact_control *cc,
+ struct page *page)
+{
+ if (cc->mode != MIGRATE_ASYNC)
+ return true;
+
+ return is_migrate_movable(get_pageblock_migratetype(page));
+}
+
/* Returns true if the page is within a block suitable for migration to */
static bool suitable_migration_target(struct compact_control *cc,
struct page *page)
@@ -1007,7 +1011,7 @@ static bool suitable_migration_target(struct compact_control *cc,
}
/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
- if (migrate_async_suitable(get_pageblock_migratetype(page)))
+ if (is_migrate_movable(get_pageblock_migratetype(page)))
return true;
/* Otherwise skip the block */
@@ -1242,8 +1246,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
* Async compaction is optimistic to see if the minimum amount
* of work satisfies the allocation.
*/
- if (cc->mode == MIGRATE_ASYNC &&
- !migrate_async_suitable(get_pageblock_migratetype(page)))
+ if (!suitable_migration_source(cc, page))
continue;
/* Perform the isolation */
--
2.12.0
On Tue, Mar 07, 2017 at 02:15:37PM +0100, Vlastimil Babka wrote:
> Last year, Johannes Weiner reported a regression in page mobility
> grouping [1]. While the exact cause was not found, I've come up with some
> ways to improve the situation by reducing the number of allocations that fall
> back to a different migratetype and cause permanent fragmentation.
I finally managed to get a handful of our machines on 4.10 with these
patches applied and a 4.10 vanilla control group.
The sampling period is over twelve hours, which is on the short side
for evaluating that load, so take the results with a grain of salt.
The allocstall rate (events per second) is down on average, but there
are occasionally fairly high spikes that exceed the peaks in 4.10:
http://cmpxchg.org/antifrag/allocstallrate.png
Activity from the compaction free scanner is down, while the migration
scanner does more work. I would assume most of this is coming from the
same-migratetype restriction on the source blocks:
http://cmpxchg.org/antifrag/compactfreescannedrate.png
http://cmpxchg.org/antifrag/compactmigratescannedrate.png
Unfortunately, the average compaction stall rate is consistently much
higher with the patches. The 1h rate averages are 2-3x higher:
http://cmpxchg.org/antifrag/compactstallrate.png
An increase in direct compaction is a bit worrisome, but the task
completion rates - the bottom line metric for this workload - are
still too chaotic to say whether the increased allocation latency
affects us meaningfully here. I'll give it a few more days.
Is there any other data you would like me to gather?
Thanks!
On 8.3.2017 17:46, Johannes Weiner wrote:
> Is there any other data you would like me to gather?
If you can enable the extfrag tracepoint, it would be nice to have graphs of how
often unmovable allocations fall back to movable pageblocks, etc.
Possibly also /proc/pagetypeinfo for numbers of pageblock types.
Thanks!
Hello,
On Tue, Mar 07, 2017 at 02:15:39PM +0100, Vlastimil Babka wrote:
> When detecting whether compaction has succeeded in forming a high-order page,
> __compact_finished() employs a watermark check, followed by its own search for
> a suitable page in the freelists. This is not ideal for two reasons:
>
> - The watermark check also searches high-order freelists, but uses a less strict
> criterion wrt fallbacks. It's therefore redundant and a waste of cycles. This was
> different in the past, when the high-order watermark check attempted to apply
> reserves to high-order pages.
Although it looks redundant now, I don't like the removal of the watermark
check here. The criteria in the watermark check could be made more strict
later, and we could easily miss applying that on the compaction side if the
watermark check is removed.
>
> - The watermark check might actually fail due to a lack of order-0 pages.
> Compaction can't help with that, so there's no point in continuing just because
> of it. It's possible that a suitable high-order page still exists, in which case
> the freelist search below finds it and compaction terminates.
If a lack of order-0 pages is a reason for stopping compaction, we
need to insert an order-0 watermark check that breaks out of compaction,
instead of removing the check entirely. Am I missing something?
Thanks.
On Tue, Mar 07, 2017 at 02:15:41PM +0100, Vlastimil Babka wrote:
> When stealing pages from pageblock of a different migratetype, we count how
> many free pages were stolen, and change the pageblock's migratetype if more
> than half of the pageblock was free. This might be too conservative, as there
> might be other pages that are not free, but were allocated with the same
> migratetype as our allocation requested.
I think that being too conservative is good for the movable case. In my
experiments, fragmentation spreads out when an unmovable/reclaimable pageblock
is changed to a movable pageblock prematurely ('prematurely' means that
allocated unmovable pages remain). As you said below, movable allocations
falling back to other pageblocks don't cause permanent fragmentation.
Therefore, we don't need to be less conservative for movable allocations.
So, how about the following change to keep the criteria for movable
allocations conservative even with this counting improvement?
threshold = (1 << (pageblock_order - 1));
if (start_type == MIGRATE_MOVABLE)
threshold += (1 << (pageblock_order - 2));
if (free_pages + alike_pages >= threshold)
...
Thanks.
On Tue, Mar 07, 2017 at 02:15:44PM +0100, Vlastimil Babka wrote:
> The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
> pageblocks. This is a heuristic intended to reduce latency, based on the
> assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.
>
> However, with the exception of THP's, most high-order allocations are not
> movable. Should the async compaction succeed, this increases the chance that
> the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
> long-term fragmentation worse.
I agree with this idea but have some concerns about this change.
*ASYNC* compaction is designed to reduce latency, and this change doesn't fit
that. If everything works fine, there are only a few movable pages in
non-MOVABLE pageblocks, as you noted above. Moreover, there are far fewer
non-MOVABLE pageblocks than MOVABLE ones, so finding a non-MOVABLE pageblock
takes a long time. These two factors will increase the latency of *ASYNC*
compaction.
And there is a concern on the implementation side. With this change, there is
a high possibility that the compaction scanners will meet during ASYNC
compaction. That resets the scanner positions, and SYNC compaction would then
start the scan at the beginning of the zone every time. It would make the
cached positions useless and inefficient.
Thanks.
On Tue, Mar 07, 2017 at 02:15:45PM +0100, Vlastimil Babka wrote:
> The main goal of direct compaction is to form a high-order page for allocation,
> but it should also help against long-term fragmentation when possible. Most
> lower-than-pageblock-order compactions are for non-movable allocations, which
> means that if we compact in a movable pageblock and terminate as soon as we
> create the high-order page, it's unlikely that the fallback heuristics will
> claim the whole block. Instead there might be a single unmovable page in a
> pageblock full of movable pages, and the next unmovable allocation might pick
> another pageblock and increase long-term fragmentation.
>
> To help against such scenarios, this patch changes the termination criteria for
> compaction so that the current pageblock is finished even though the high-order
> page already exists. Note that it might be possible that the high-order page
> formed elsewhere in the zone due to parallel activity, but this patch doesn't
> try to detect that.
>
> This is only done with sync compaction, because async compaction is limited to
> pageblock of the same migratetype, where it cannot result in a migratetype
> fallback. (Async compaction also eagerly skips order-aligned blocks where
> isolation fails, which is against the goal of migrating away as much of the
> pageblock as possible.)
>
> As a result of this patch, long-term memory fragmentation should be reduced.
>
> In testing based on 4.9 kernel with stress-highalloc from mmtests configured
> for order-4 GFP_KERNEL allocations, this patch has reduced the number of
> unmovable allocations falling back to movable pageblocks by 20%. The number
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> Acked-by: Mel Gorman <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> ---
> mm/compaction.c | 36 ++++++++++++++++++++++++++++++++++--
> mm/internal.h | 1 +
> 2 files changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 2c288e75840d..bc7903130501 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1318,6 +1318,17 @@ static enum compact_result __compact_finished(struct zone *zone,
> if (is_via_compact_memory(cc->order))
> return COMPACT_CONTINUE;
>
> + if (cc->finishing_block) {
> + /*
> + * We have finished the pageblock, but better check again that
> + * we really succeeded.
> + */
> + if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
> + cc->finishing_block = false;
> + else
> + return COMPACT_CONTINUE;
> + }
> +
> /* Direct compactor: Is a suitable page free? */
> for (order = cc->order; order < MAX_ORDER; order++) {
> struct free_area *area = &zone->free_area[order];
> @@ -1338,8 +1349,29 @@ static enum compact_result __compact_finished(struct zone *zone,
> * other migratetype buddy lists.
> */
> if (find_suitable_fallback(area, order, migratetype,
> - true, &can_steal) != -1)
> - return COMPACT_SUCCESS;
> + true, &can_steal) != -1) {
> +
> + /* movable pages are OK in any pageblock */
> + if (migratetype == MIGRATE_MOVABLE)
> + return COMPACT_SUCCESS;
> +
> + /*
> + * We are stealing for a non-movable allocation. Make
> + * sure we finish compacting the current pageblock
> + * first so it is as free as possible and we won't
> + * have to steal another one soon. This only applies
> + * to sync compaction, as async compaction operates
> + * on pageblocks of the same migratetype.
> + */
> + if (cc->mode == MIGRATE_ASYNC ||
> + IS_ALIGNED(cc->migrate_pfn,
> + pageblock_nr_pages)) {
> + return COMPACT_SUCCESS;
> + }
If cc->migratetype and the migratetype of the pageblock at cc->migrate_pfn are
the same, stopping the compaction here doesn't cause any fragmentation. Do we
need to compact the full pageblock in this case?
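Something like the following before the alignment check, I mean (just a rough
illustration of the idea, not a tested change):

	if (get_pageblock_migratetype(pfn_to_page(cc->migrate_pfn)) ==
						cc->migratetype)
		return COMPACT_SUCCESS;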
Thanks.
On Wed, Mar 08, 2017 at 08:17:39PM +0100, Vlastimil Babka wrote:
> On 8.3.2017 17:46, Johannes Weiner wrote:
> > Is there any other data you would like me to gather?
>
> If you can enable the extfrag tracepoint, it would be nice to have graphs of how
> often unmovable allocations fall back to movable pageblocks, etc.
Okay, here we go. I recorded 24 hours worth of the extfrag tracepoint,
filtered to fallbacks from unmovable requests to movable blocks. I've
uploaded the plot here:
http://cmpxchg.org/antifrag/fallbackrate.png
but this already speaks for itself:
11G alloc-mtfallback.trace
3.3G alloc-mtfallback-patched.trace
;)
> Possibly also /proc/pagetypeinfo for numbers of pageblock types.
After a week of uptime, the patched (b) kernel has more movable blocks
than vanilla 4.10-rc8 (a):
Number of blocks type Unmovable Movable Reclaimable HighAtomic CMA Isolate
a: Node 1, zone Normal 2017 29763 987 1 0 0
b: Node 1, zone Normal 1264 30850 653 1 0 0
I sampled this somewhat sporadically over the week and it's been
reading reliably this way.
The patched kernel also consistently beats vanilla in terms of peak
job throughput.
Overall very cool!
On 03/16/2017 07:34 PM, Johannes Weiner wrote:
> On Wed, Mar 08, 2017 at 08:17:39PM +0100, Vlastimil Babka wrote:
>> On 8.3.2017 17:46, Johannes Weiner wrote:
>>> Is there any other data you would like me to gather?
>>
>> If you can enable the extfrag tracepoint, it would be nice to have graphs of how
>> often unmovable allocations fall back to movable pageblocks, etc.
>
> Okay, here we go. I recorded 24 hours worth of the extfrag tracepoint,
> filtered to fallbacks from unmovable requests to movable blocks. I've
> uploaded the plot here:
>
> http://cmpxchg.org/antifrag/fallbackrate.png
>
> but this already speaks for itself:
>
> 11G alloc-mtfallback.trace
> 3.3G alloc-mtfallback-patched.trace
>
> ;)
Great!
>> Possibly also /proc/pagetypeinfo for numbers of pageblock types.
> After a week of uptime, the patched (b) kernel has more movable blocks
> than vanilla 4.10-rc8 (a):
>
> Number of blocks type Unmovable Movable Reclaimable HighAtomic CMA Isolate
>
> a: Node 1, zone Normal 2017 29763 987 1 0 0
> b: Node 1, zone Normal 1264 30850 653 1 0 0
That's better than I expected. I wouldn't be surprised if the number of
unmovable pageblocks actually got *higher* due to the series because
previously many unmovable pages would be scattered around movable blocks.
> I sampled this somewhat sporadically over the week and it's been
> reading reliably this way.
>
> The patched kernel also consistently beats vanilla in terms of peak
> job throughput.
>
> Overall very cool!
Thanks a lot! So that means it's worth the increased compaction stats
you reported earlier?
On Fri, Mar 17, 2017 at 07:29:54PM +0100, Vlastimil Babka wrote:
> On 03/16/2017 07:34 PM, Johannes Weiner wrote:
> > The patched kernel also consistently beats vanilla in terms of peak
> > job throughput.
> >
> > Overall very cool!
>
> Thanks a lot! So that means it's worth the increased compaction stats
> you reported earlier?
Yes, from the impact this patchset has on the workload overall, I'm
assuming that the increased work pays off.
So maybe something to keep an eye out for, but IMO not a dealbreaker.
On 03/16/2017 02:30 AM, Joonsoo Kim wrote:
> Hello,
Hi, sorry for the late replies.
> On Tue, Mar 07, 2017 at 02:15:39PM +0100, Vlastimil Babka wrote:
>> When detecting whether compaction has succeeded in forming a high-order page,
>> __compact_finished() employs a watermark check, followed by its own search for
>> a suitable page in the freelists. This is not ideal for two reasons:
>>
>> - The watermark check also searches high-order freelists, but uses a less strict
>> criterion wrt fallbacks. It's therefore redundant and a waste of cycles. This was
>> different in the past, when the high-order watermark check attempted to apply
>> reserves to high-order pages.
>
> Although it looks redundant now, I don't like the removal of the watermark
> check here. The criteria in the watermark check could be made more strict
> later, and we could easily miss applying that on the compaction side if the
> watermark check is removed.
I see, but compaction is already full of various watermark(-like) checks that
have to be considered/updated if watermark checking changes significantly, or
things will go subtly wrong. I doubt this extra check can really help much in
such cases.
>>
>> - The watermark check might actually fail due to a lack of order-0 pages.
>> Compaction can't help with that, so there's no point in continuing just because
>> of it. It's possible that a suitable high-order page still exists, in which case
>> the freelist search below finds it and compaction terminates.
>
> If a lack of order-0 pages is a reason for stopping compaction, we
> need to insert an order-0 watermark check that breaks out of compaction,
> instead of removing the check entirely. Am I missing something?
You proposed that once IIRC, but didn't follow up? Currently we learn about an
insufficient order-0 watermark in __isolate_free_page() from the free scanner.
We could potentially stop compacting earlier by checking it also in
compact_finished(), but maybe it doesn't happen that often and it's just extra
checking overhead.
So I wouldn't be terribly opposed to converting the current check to an order-0
fail-compaction check (instead of removing it), but I really wouldn't like to
insert the order-0 one and also keep the current one.
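For the record, a rough sketch of what an order-0 fail-compaction check in
__compact_finished() could look like (untested, just to illustrate the idea;
it reuses the calls from the hunk this patch removes):

	unsigned long watermark = zone->watermark[cc->alloc_flags &
							ALLOC_WMARK_MASK];

	/* compaction cannot create order-0 pages, so give up */
	if (!zone_watermark_ok(zone, 0, watermark, cc->classzone_idx,
				cc->alloc_flags))
		return COMPACT_SKIPPED;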
> Thanks.
>
On 03/16/2017 02:53 AM, Joonsoo Kim wrote:
> On Tue, Mar 07, 2017 at 02:15:41PM +0100, Vlastimil Babka wrote:
>> When stealing pages from pageblock of a different migratetype, we count how
>> many free pages were stolen, and change the pageblock's migratetype if more
>> than half of the pageblock was free. This might be too conservative, as there
>> might be other pages that are not free, but were allocated with the same
>> migratetype as our allocation requested.
>
> I think that being too conservative is good for the movable case. In my
> experiments, fragmentation spreads out when an unmovable/reclaimable pageblock
> is changed to a movable pageblock prematurely ('prematurely' means that
> allocated unmovable pages remain). As you said below, movable allocations
> falling back to other pageblocks don't cause permanent fragmentation.
> Therefore, we don't need to be less conservative for movable allocations.
> So, how about the following change to keep the criteria for movable
> allocations conservative even with this counting improvement?
>
> threshold = (1 << (pageblock_order - 1));
> if (start_type == MIGRATE_MOVABLE)
> threshold += (1 << (pageblock_order - 2));
>
> if (free_pages + alike_pages >= threshold)
> ...
That could help, or also not. Keeping more pageblocks marked as unmovable also
means that more unmovable allocations will spread out to them all, even if they
would fit within fewer pageblocks. MIGRATE_MIXED was an idea to help in this
case, as truly unmovable pageblocks would be preferred to the mixed ones.
Can't decide about such a change without testing :/
> Thanks.
>
On 03/16/2017 03:14 AM, Joonsoo Kim wrote:
> On Tue, Mar 07, 2017 at 02:15:44PM +0100, Vlastimil Babka wrote:
>> The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
>> pageblocks. This is a heuristic intended to reduce latency, based on the
>> assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.
>>
>> However, with the exception of THP's, most high-order allocations are not
>> movable. Should the async compaction succeed, this increases the chance that
>> the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
>> long-term fragmentation worse.
>
> I agree with this idea but have some concerns about this change.
>
> *ASYNC* compaction is designed to reduce latency, and this change doesn't fit
> that. If everything works fine, there are only a few movable pages in
> non-MOVABLE pageblocks, as you noted above. Moreover, there are far fewer
> non-MOVABLE pageblocks than MOVABLE ones, so finding a non-MOVABLE pageblock
> takes a long time. These two factors will increase the latency of *ASYNC*
> compaction.
Right. I've lately started to doubt the whole idea of async compaction (for
non-movable allocations). It seems to be one of the compaction heuristics tuned
towards the THP usecase. But for non-movable allocations, we just can't have
both low latency and long-term fragmentation avoidance. I now see even my
own skip_on_failure mode in isolate_migratepages_block() as a mistake for
non-movable allocations.
Ideally I'd like to make async compaction redundant by kcompactd, and direct
compaction would mean a serious situation which should warrant sync compaction.
Meanwhile I see several options to modify this patch:
- async compaction for non-movable allocations will stop doing the
skip_on_failure mode, and won't restrict the pageblock at all; patch 8/8 will
make sure that this kind of compaction also finishes the whole pageblock
- non-movable allocations will skip async compaction completely and go for sync
compaction immediately
Both options mean we won't clean the unmovable/reclaimable pageblocks as
aggressively, but perhaps the tradeoff won't be bad. What do you think?
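For concreteness, the second option could be sketched roughly as below. The
helper name is made up for the example and where it would actually be called
from in the allocator slowpath is an assumption; this is only an illustration
of the idea, not a proposed patch.

#include <linux/gfp.h>
#include <linux/compaction.h>

/*
 * Sketch only: pick the initial direct compaction priority so that
 * non-movable allocations skip the async attempt and go straight to
 * sync-light compaction.
 */
static inline enum compact_priority
initial_compact_priority(gfp_t gfp_mask)
{
        if (gfpflags_to_migratetype(gfp_mask) != MIGRATE_MOVABLE)
                return COMPACT_PRIO_SYNC_LIGHT;

        return COMPACT_PRIO_ASYNC;
}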
Johannes, would you be able/willing to test such modification?
Thanks
> And, there is a concern on the implementation side. With this change, there
> is a much higher possibility that the compaction scanners will meet during
> ASYNC compaction. That resets the scanner positions, and SYNC compaction
> would then start the scan at the beginning of the zone every time. It would
> make the cached positions useless and inefficient.
>
> Thanks.
>
On 03/16/2017 03:18 AM, Joonsoo Kim wrote:
> On Tue, Mar 07, 2017 at 02:15:45PM +0100, Vlastimil Babka wrote:
>> The main goal of direct compaction is to form a high-order page for allocation,
>> but it should also help against long-term fragmentation when possible. Most
>> lower-than-pageblock-order compactions are for non-movable allocations, which
>> means that if we compact in a movable pageblock and terminate as soon as we
>> create the high-order page, it's unlikely that the fallback heuristics will
>> claim the whole block. Instead there might be a single unmovable page in a
>> pageblock full of movable pages, and the next unmovable allocation might pick
>> another pageblock and increase long-term fragmentation.
>>
>> To help against such scenarios, this patch changes the termination criteria for
>> compaction so that the current pageblock is finished even though the high-order
>> page already exists. Note that it might be possible that the high-order page
>> formed elsewhere in the zone due to parallel activity, but this patch doesn't
>> try to detect that.
>>
>> This is only done with sync compaction, because async compaction is limited to
>> pageblock of the same migratetype, where it cannot result in a migratetype
>> fallback. (Async compaction also eagerly skips order-aligned blocks where
>> isolation fails, which is against the goal of migrating away as much of the
>> pageblock as possible.)
>>
>> As a result of this patch, long-term memory fragmentation should be reduced.
>>
>> In testing based on 4.9 kernel with stress-highalloc from mmtests configured
>> for order-4 GFP_KERNEL allocations, this patch has reduced the number of
>> unmovable allocations falling back to movable pageblocks by 20%. The number
>>
>> Signed-off-by: Vlastimil Babka <[email protected]>
>> Acked-by: Mel Gorman <[email protected]>
>> Acked-by: Johannes Weiner <[email protected]>
>> ---
>> mm/compaction.c | 36 ++++++++++++++++++++++++++++++++++--
>> mm/internal.h | 1 +
>> 2 files changed, 35 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 2c288e75840d..bc7903130501 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1318,6 +1318,17 @@ static enum compact_result __compact_finished(struct zone *zone,
>>          if (is_via_compact_memory(cc->order))
>>                  return COMPACT_CONTINUE;
>>
>> +        if (cc->finishing_block) {
>> +                /*
>> +                 * We have finished the pageblock, but better check again that
>> +                 * we really succeeded.
>> +                 */
>> +                if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
>> +                        cc->finishing_block = false;
>> +                else
>> +                        return COMPACT_CONTINUE;
>> +        }
>> +
>>          /* Direct compactor: Is a suitable page free? */
>>          for (order = cc->order; order < MAX_ORDER; order++) {
>>                  struct free_area *area = &zone->free_area[order];
>> @@ -1338,8 +1349,29 @@ static enum compact_result __compact_finished(struct zone *zone,
>>                   * other migratetype buddy lists.
>>                   */
>>                  if (find_suitable_fallback(area, order, migratetype,
>> -                                        true, &can_steal) != -1)
>> -                        return COMPACT_SUCCESS;
>> +                                        true, &can_steal) != -1) {
>> +
>> +                        /* movable pages are OK in any pageblock */
>> +                        if (migratetype == MIGRATE_MOVABLE)
>> +                                return COMPACT_SUCCESS;
>> +
>> +                        /*
>> +                         * We are stealing for a non-movable allocation. Make
>> +                         * sure we finish compacting the current pageblock
>> +                         * first so it is as free as possible and we won't
>> +                         * have to steal another one soon. This only applies
>> +                         * to sync compaction, as async compaction operates
>> +                         * on pageblocks of the same migratetype.
>> +                         */
>> +                        if (cc->mode == MIGRATE_ASYNC ||
>> +                                        IS_ALIGNED(cc->migrate_pfn,
>> +                                                pageblock_nr_pages)) {
>> +                                return COMPACT_SUCCESS;
>> +                        }
>
> If cc->migratetype and the migratetype of cc->migrate_pfn's pageblock are
> the same, stopping the compaction here doesn't cause any fragmentation. Do
> we need to compact the full pageblock in this case?
Probably not, but if we make patch 7/8 less aggressive, then I'd rather keep
this chance of clearing a whole unmovable pageblock of movable pages.
I also realized that the finishing_block flag is just an unnecessary
complication. We can just keep going until migrate_pfn hits the end of the
pageblock, and only then check the termination criteria.
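To make that concrete, here is a minimal sketch of how the fallback branch in
__compact_finished() could look without the flag. It is just one way to read
the simplification above, not an actual follow-up patch.

                if (find_suitable_fallback(area, order, migratetype,
                                        true, &can_steal) != -1) {
                        /* movable pages are OK in any pageblock */
                        if (migratetype == MIGRATE_MOVABLE)
                                return COMPACT_SUCCESS;

                        /*
                         * Non-movable allocation in sync compaction: simply
                         * keep going until the migrate scanner reaches a
                         * pageblock boundary; no cc->finishing_block
                         * bookkeeping needed.
                         */
                        if (cc->mode == MIGRATE_ASYNC ||
                            IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
                                return COMPACT_SUCCESS;

                        return COMPACT_CONTINUE;
                }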
> Thanks.
>
On Wed, Mar 29, 2017 at 06:06:41PM +0200, Vlastimil Babka wrote:
> On 03/16/2017 03:14 AM, Joonsoo Kim wrote:
> > On Tue, Mar 07, 2017 at 02:15:44PM +0100, Vlastimil Babka wrote:
> >> The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE
> >> pageblocks. This is a heuristic intended to reduce latency, based on the
> >> assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.
> >>
> >> However, with the exception of THP's, most high-order allocations are not
> >> movable. Should the async compaction succeed, this increases the chance that
> >> the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the
> >> long-term fragmentation worse.
> >
> > I agree with this idea but have some concerns about this change.
> >
> > *ASYNC* compaction is designed for reducing latency and this change
> > doesn't fit that. If everything works fine, there are only a few movable
> > pages in non-MOVABLE pageblocks, as you noted above. Moreover, there are
> > far fewer non-MOVABLE pageblocks than MOVABLE ones, so finding a
> > non-MOVABLE pageblock takes a long time. These two factors will increase
> > the latency of *ASYNC* compaction.
>
> Right. I lately started to doubt the whole idea of async compaction (for
> non-movable allocations). Seems it's one of the compaction heuristics tuned
> towards the THP usecase. But for non-movable allocations, we just can't have
> both the low latency and long-term fragmentation avoidance. I see now even my
> own skip_on_failure mode in isolate_migratepages_block() as a mistake for
> non-movable allocations.
Why do you think that skip_on_failure mode is a mistake? I think that
it helps reduce the latency, and that fits the goal of async
compaction.
>
> Ideally I'd like to make async compaction redundant by kcompactd, and direct
> compaction would mean a serious situation which should warrant sync compaction.
> Meanwhile I see several options to modify this patch
> - async compaction for non-movable allocations will stop doing the
> skip_on_failure mode, and won't restrict the pageblock at all. patch 8/8 will
> make sure that also this kind of compaction finishes the whole pageblock
> - non-movable allocations will skip async compaction completely and go for sync
> compaction immediately
IMO, the concept of async compaction is also important for non-movable
allocations. Non-movable allocations are essential for some workloads, and
those workloads want low latency.
Thanks.