This is a drop-in replacement for the series currently in Andrew's tree that
incorporates static checking and compile warning fixes (Dan, YueHaibing)
and extensive review feedback from Vlastimil. Big thanks to Vlastimil as
the review was extremely detailed and a number of issues were caught. Not
all the patches have been acked but I think an update is still worthwhile.
Andrew, please drop the series you have and replace it with the following
on the off-chance we get bug reports that are fixed already. Doing this
with -fix patches would be relatively painful for little gain.
Changelog since v2
o Fix static checker warnings (dan)
o Fix unused variable warnings (yue)
o Drop patch about PageReserved as there is some abuse of the flag outside
of the mm core. (vbabka)
o Drop patch using the bulk free helper as it may be vulnerable to races
with compaction and gup (vbabka)
o Drop patch about remote compaction. It's unnecessary at this time and
unclear what the semantics should even be (vbabka)
o Changelog fixes and clarifications (vbabka)
o Free list management and search changes, confined mostly to "mm,
  compaction: Use free lists to quickly locate a migration source" which
  is arguably the most heavily modified patch in this revision (vbabka, mel)
o Some minor churn, modifications and flow changes that fall out from
  addressing the review feedback (mel)
o Minor pageblock skip changes, mostly fixing up which patch makes the
changes so the patches are incremental (mel)
This series reduces scan rates and improves success rates of compaction,
primarily by using the free lists to shorten scans, better controlling skip
information and whether multiple scanners can target the same block, and
capturing pageblocks before they are stolen by parallel requests. The series
is based on mmotm from January 9th, 2019 with the previous compaction
series reverted.
I'm mostly using thpscale to measure the impact of the series. The
benchmark creates a large file, maps it, faults it, punches holes in the
mapping so that the virtual address space is fragmented and then tries
to allocate THP. It re-executes for different numbers of threads. From a
fragmentation perspective, the workload is relatively benign but it does
stress compaction.
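For reference, the core of that workload resembles the sketch below. This is
a simplified illustration only, not the thpscale source; the file path, sizes
and hole spacing are arbitrary and error handling is omitted.

/* Rough sketch of a thpscale-like fragmentation workload (illustrative only). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t size = 1UL << 30;	/* 1G backing file, arbitrary */
	size_t hole = 2UL << 20;	/* punch a 2M hole in every 4M */
	int fd = open("/tmp/thpscale-test", O_RDWR | O_CREAT, 0600);
	char *map;
	size_t off;

	/* Create, map and fault a large file to fill memory with page cache */
	ftruncate(fd, size);
	map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	for (off = 0; off < size; off += 4096)
		map[off] = 1;

	/* Punch holes so the freed pages fragment physical memory */
	for (off = 0; off < size; off += 2 * hole)
		fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			  off, hole);

	/* Now request THP-backed anonymous memory and time the faults */
	map = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	madvise(map, size, MADV_HUGEPAGE);
	for (off = 0; off < size; off += 4096)
		map[off] = 1;

	return 0;
}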
The overall impact on latencies for a 1-socket machine is
baseline patches
Amean fault-both-3 3832.09 ( 0.00%) 2748.56 * 28.28%*
Amean fault-both-5 4933.06 ( 0.00%) 4255.52 ( 13.73%)
Amean fault-both-7 7017.75 ( 0.00%) 6586.93 ( 6.14%)
Amean fault-both-12 11610.51 ( 0.00%) 9162.34 * 21.09%*
Amean fault-both-18 17055.85 ( 0.00%) 11530.06 * 32.40%*
Amean fault-both-24 19306.27 ( 0.00%) 17956.13 ( 6.99%)
Amean fault-both-30 22516.49 ( 0.00%) 15686.47 * 30.33%*
Amean fault-both-32 23442.93 ( 0.00%) 16564.83 * 29.34%*
The allocation success rates are much improved
baseline patches
Percentage huge-3 85.99 ( 0.00%) 97.96 ( 13.92%)
Percentage huge-5 88.27 ( 0.00%) 96.87 ( 9.74%)
Percentage huge-7 85.87 ( 0.00%) 94.53 ( 10.09%)
Percentage huge-12 82.38 ( 0.00%) 98.44 ( 19.49%)
Percentage huge-18 83.29 ( 0.00%) 99.14 ( 19.04%)
Percentage huge-24 81.41 ( 0.00%) 97.35 ( 19.57%)
Percentage huge-30 80.98 ( 0.00%) 98.05 ( 21.08%)
Percentage huge-32 80.53 ( 0.00%) 97.06 ( 20.53%)
That's a nearly perfect allocation success rate.
The biggest impact is on the scan rates
Compaction migrate scanned 55893379 19341254
Compaction free scanned 474739990 11903963
The number of pages scanned for migration was reduced by 65% and the
free scanner was reduced by 97.5%. That is much less work in exchange
for lower latency and better success rates.
The series was also evaluated using a workload that heavily fragments
memory and the benefits there are also significant, albeit not presented.
It was commented that we should be rethinking scanning entirely and to
a large extent I agree. However, to achieve that you need a lot of this
series in place first, so it's best to make the linear scanners as good
as possible before ripping them out.
include/linux/compaction.h | 3 +-
include/linux/mmzone.h | 2 +
include/linux/sched.h | 4 +
kernel/sched/core.c | 3 +
mm/compaction.c | 1000 ++++++++++++++++++++++++++++++++++----------
mm/internal.h | 23 +-
mm/migrate.c | 2 +-
mm/page_alloc.c | 76 +++-
8 files changed, 888 insertions(+), 225 deletions(-)
--
2.16.4
compact_control spans two cache lines with write-intensive fields on
both. Rearrange so that the most write-intensive fields are in the same
cache line. This has a negligible impact on the overall performance of
compaction and is more a tidying exercise than anything else.
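As an aside, the intent behind the grouping can be sanity-checked with
offsetof(). The snippet below is illustrative only and uses a stand-in struct
rather than the real compact_control, assuming 64-byte cache lines.

/* Illustrative check that two fields land on the same (assumed 64-byte) line. */
#include <assert.h>
#include <stddef.h>

struct example {			/* stand-in for compact_control */
	unsigned int nr_freepages;	/* write-intensive */
	unsigned int nr_migratepages;	/* write-intensive */
	unsigned long free_pfn;		/* write-intensive */
	unsigned long migrate_pfn;	/* write-intensive */
};

#define SAME_CACHELINE(type, a, b) \
	(offsetof(type, a) / 64 == offsetof(type, b) / 64)

static_assert(SAME_CACHELINE(struct example, nr_migratepages, migrate_pfn),
	      "write-intensive fields should share a cache line");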
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/internal.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 5564841fce36..867af5425432 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -184,14 +184,14 @@ extern int user_min_free_kbytes;
struct compact_control {
struct list_head freepages; /* List of free pages to migrate to */
struct list_head migratepages; /* List of pages being migrated */
- struct zone *zone;
unsigned int nr_freepages; /* Number of isolated free pages */
unsigned int nr_migratepages; /* Number of pages to migrate */
- unsigned long total_migrate_scanned;
- unsigned long total_free_scanned;
unsigned long free_pfn; /* isolate_freepages search base */
unsigned long migrate_pfn; /* isolate_migratepages search base */
unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
+ struct zone *zone;
+ unsigned long total_migrate_scanned;
+ unsigned long total_free_scanned;
const gfp_t gfp_mask; /* gfp mask of a direct compactor */
int order; /* order a direct compactor needs */
int migratetype; /* migratetype of direct compactor */
--
2.16.4
It is dubious whether the last_migrated_pfn field really helps but,
either way, the information it carries can be inferred without increasing
the size of compact_control, so remove the field.
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 25 +++++++++----------------
mm/internal.h | 1 -
2 files changed, 9 insertions(+), 17 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index c15b4bbc9e9e..e59dd7a7564c 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -886,15 +886,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
cc->nr_migratepages++;
nr_isolated++;
- /*
- * Record where we could have freed pages by migration and not
- * yet flushed them to buddy allocator.
- * - this is the lowest page that was isolated and likely be
- * then freed by migration.
- */
- if (!cc->last_migrated_pfn)
- cc->last_migrated_pfn = low_pfn;
-
/* Avoid isolating too much */
if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
++low_pfn;
@@ -918,7 +909,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
}
putback_movable_pages(&cc->migratepages);
cc->nr_migratepages = 0;
- cc->last_migrated_pfn = 0;
nr_isolated = 0;
}
@@ -1539,6 +1529,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
enum compact_result ret;
unsigned long start_pfn = zone->zone_start_pfn;
unsigned long end_pfn = zone_end_pfn(zone);
+ unsigned long last_migrated_pfn;
const bool sync = cc->mode != MIGRATE_ASYNC;
cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
@@ -1584,7 +1575,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
cc->whole_zone = true;
}
- cc->last_migrated_pfn = 0;
+ last_migrated_pfn = 0;
trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
cc->free_pfn, end_pfn, sync);
@@ -1593,12 +1584,14 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
int err;
+ unsigned long start_pfn = cc->migrate_pfn;
switch (isolate_migratepages(zone, cc)) {
case ISOLATE_ABORT:
ret = COMPACT_CONTENDED;
putback_movable_pages(&cc->migratepages);
cc->nr_migratepages = 0;
+ last_migrated_pfn = 0;
goto out;
case ISOLATE_NONE:
/*
@@ -1608,6 +1601,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
*/
goto check_drain;
case ISOLATE_SUCCESS:
+ last_migrated_pfn = start_pfn;
;
}
@@ -1639,8 +1633,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
cc->migrate_pfn = block_end_pfn(
cc->migrate_pfn - 1, cc->order);
/* Draining pcplists is useless in this case */
- cc->last_migrated_pfn = 0;
-
+ last_migrated_pfn = 0;
}
}
@@ -1652,18 +1645,18 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
* compact_finished() can detect immediately if allocation
* would succeed.
*/
- if (cc->order > 0 && cc->last_migrated_pfn) {
+ if (cc->order > 0 && last_migrated_pfn) {
int cpu;
unsigned long current_block_start =
block_start_pfn(cc->migrate_pfn, cc->order);
- if (cc->last_migrated_pfn < current_block_start) {
+ if (last_migrated_pfn < current_block_start) {
cpu = get_cpu();
lru_add_drain_cpu(cpu);
drain_local_pages(zone);
put_cpu();
/* No more flushing until we migrate again */
- cc->last_migrated_pfn = 0;
+ last_migrated_pfn = 0;
}
}
diff --git a/mm/internal.h b/mm/internal.h
index 867af5425432..f40d06d70683 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -188,7 +188,6 @@ struct compact_control {
unsigned int nr_migratepages; /* Number of pages to migrate */
unsigned long free_pfn; /* isolate_freepages search base */
unsigned long migrate_pfn; /* isolate_migratepages search base */
- unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
struct zone *zone;
unsigned long total_migrate_scanned;
unsigned long total_free_scanned;
--
2.16.4
The isolate and migrate scanners should never isolate more than a pageblock
of pages, so unsigned int is sufficient, saving 8 bytes on a 64-bit build.
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/internal.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 536bc2a839b9..5564841fce36 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -185,8 +185,8 @@ struct compact_control {
struct list_head freepages; /* List of free pages to migrate to */
struct list_head migratepages; /* List of pages being migrated */
struct zone *zone;
- unsigned long nr_freepages; /* Number of isolated free pages */
- unsigned long nr_migratepages; /* Number of pages to migrate */
+ unsigned int nr_freepages; /* Number of isolated free pages */
+ unsigned int nr_migratepages; /* Number of pages to migrate */
unsigned long total_migrate_scanned;
unsigned long total_free_scanned;
unsigned long free_pfn; /* isolate_freepages search base */
--
2.16.4
A zone parameter is passed into a number of top-level compaction functions
despite the fact that it's already in compact_control. This is harmless but
it did need an audit to check whether the zone ever changes meaningfully.
This patch removes the parameter from a number of top-level functions. The
change could be much deeper but this was enough to briefly clarify
the flow.
No functional change.
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 54 ++++++++++++++++++++++++++----------------------------
1 file changed, 26 insertions(+), 28 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index e59dd7a7564c..163841e1b167 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1300,8 +1300,7 @@ static inline bool is_via_compact_memory(int order)
return order == -1;
}
-static enum compact_result __compact_finished(struct zone *zone,
- struct compact_control *cc)
+static enum compact_result __compact_finished(struct compact_control *cc)
{
unsigned int order;
const int migratetype = cc->migratetype;
@@ -1312,7 +1311,7 @@ static enum compact_result __compact_finished(struct zone *zone,
/* Compaction run completes if the migrate and free scanner meet */
if (compact_scanners_met(cc)) {
/* Let the next compaction start anew. */
- reset_cached_positions(zone);
+ reset_cached_positions(cc->zone);
/*
* Mark that the PG_migrate_skip information should be cleared
@@ -1321,7 +1320,7 @@ static enum compact_result __compact_finished(struct zone *zone,
* based on an allocation request.
*/
if (cc->direct_compaction)
- zone->compact_blockskip_flush = true;
+ cc->zone->compact_blockskip_flush = true;
if (cc->whole_zone)
return COMPACT_COMPLETE;
@@ -1345,7 +1344,7 @@ static enum compact_result __compact_finished(struct zone *zone,
/* Direct compactor: Is a suitable page free? */
for (order = cc->order; order < MAX_ORDER; order++) {
- struct free_area *area = &zone->free_area[order];
+ struct free_area *area = &cc->zone->free_area[order];
bool can_steal;
/* Job done if page is free of the right migratetype */
@@ -1391,13 +1390,12 @@ static enum compact_result __compact_finished(struct zone *zone,
return COMPACT_NO_SUITABLE_PAGE;
}
-static enum compact_result compact_finished(struct zone *zone,
- struct compact_control *cc)
+static enum compact_result compact_finished(struct compact_control *cc)
{
int ret;
- ret = __compact_finished(zone, cc);
- trace_mm_compaction_finished(zone, cc->order, ret);
+ ret = __compact_finished(cc);
+ trace_mm_compaction_finished(cc->zone, cc->order, ret);
if (ret == COMPACT_NO_SUITABLE_PAGE)
ret = COMPACT_CONTINUE;
@@ -1524,16 +1522,16 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
return false;
}
-static enum compact_result compact_zone(struct zone *zone, struct compact_control *cc)
+static enum compact_result compact_zone(struct compact_control *cc)
{
enum compact_result ret;
- unsigned long start_pfn = zone->zone_start_pfn;
- unsigned long end_pfn = zone_end_pfn(zone);
+ unsigned long start_pfn = cc->zone->zone_start_pfn;
+ unsigned long end_pfn = zone_end_pfn(cc->zone);
unsigned long last_migrated_pfn;
const bool sync = cc->mode != MIGRATE_ASYNC;
cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
- ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
+ ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags,
cc->classzone_idx);
/* Compaction is likely to fail */
if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
@@ -1546,8 +1544,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
* Clear pageblock skip if there were failures recently and compaction
* is about to be retried after being deferred.
*/
- if (compaction_restarting(zone, cc->order))
- __reset_isolation_suitable(zone);
+ if (compaction_restarting(cc->zone, cc->order))
+ __reset_isolation_suitable(cc->zone);
/*
* Setup to move all movable pages to the end of the zone. Used cached
@@ -1559,16 +1557,16 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
cc->migrate_pfn = start_pfn;
cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
} else {
- cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
- cc->free_pfn = zone->compact_cached_free_pfn;
+ cc->migrate_pfn = cc->zone->compact_cached_migrate_pfn[sync];
+ cc->free_pfn = cc->zone->compact_cached_free_pfn;
if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
- zone->compact_cached_free_pfn = cc->free_pfn;
+ cc->zone->compact_cached_free_pfn = cc->free_pfn;
}
if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
cc->migrate_pfn = start_pfn;
- zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
- zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
+ cc->zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
+ cc->zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
}
if (cc->migrate_pfn == start_pfn)
@@ -1582,11 +1580,11 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
migrate_prep_local();
- while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
+ while ((ret = compact_finished(cc)) == COMPACT_CONTINUE) {
int err;
unsigned long start_pfn = cc->migrate_pfn;
- switch (isolate_migratepages(zone, cc)) {
+ switch (isolate_migratepages(cc->zone, cc)) {
case ISOLATE_ABORT:
ret = COMPACT_CONTENDED;
putback_movable_pages(&cc->migratepages);
@@ -1653,7 +1651,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
if (last_migrated_pfn < current_block_start) {
cpu = get_cpu();
lru_add_drain_cpu(cpu);
- drain_local_pages(zone);
+ drain_local_pages(cc->zone);
put_cpu();
/* No more flushing until we migrate again */
last_migrated_pfn = 0;
@@ -1678,8 +1676,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
* Only go back, not forward. The cached pfn might have been
* already reset to zone end in compact_finished()
*/
- if (free_pfn > zone->compact_cached_free_pfn)
- zone->compact_cached_free_pfn = free_pfn;
+ if (free_pfn > cc->zone->compact_cached_free_pfn)
+ cc->zone->compact_cached_free_pfn = free_pfn;
}
count_compact_events(COMPACTMIGRATE_SCANNED, cc->total_migrate_scanned);
@@ -1716,7 +1714,7 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
INIT_LIST_HEAD(&cc.freepages);
INIT_LIST_HEAD(&cc.migratepages);
- ret = compact_zone(zone, &cc);
+ ret = compact_zone(&cc);
VM_BUG_ON(!list_empty(&cc.freepages));
VM_BUG_ON(!list_empty(&cc.migratepages));
@@ -1834,7 +1832,7 @@ static void compact_node(int nid)
INIT_LIST_HEAD(&cc.freepages);
INIT_LIST_HEAD(&cc.migratepages);
- compact_zone(zone, &cc);
+ compact_zone(&cc);
VM_BUG_ON(!list_empty(&cc.freepages));
VM_BUG_ON(!list_empty(&cc.migratepages));
@@ -1968,7 +1966,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
if (kthread_should_stop())
return;
- status = compact_zone(zone, &cc);
+ status = compact_zone(&cc);
if (status == COMPACT_SUCCESS) {
compaction_defer_reset(zone, cc.order, false);
--
2.16.4
When compaction is finishing, it uses a flag to ensure the pageblock
is complete but it makes sense to always complete migration of a
pageblock. Minimally, skip information is based on a pageblock and
partially scanned pageblocks may incur more scanning in the future. The
pageblock skip handling also becomes more strict later in the series and
the hint is more useful if a complete pageblock was always scanned.
This potentially impacts latency as more scanning is done, but it's not a
consistent win or loss as the scanning is not always a high percentage
of the pageblock and sometimes it is offset by future reductions
in scanning. Hence, the results are not presented this time due to a
misleading mix of gains/losses without any clear pattern. However, full
scanning of the pageblock is important for later patches.
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 19 ++++++++-----------
mm/internal.h | 1 -
2 files changed, 8 insertions(+), 12 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 32a88b49f973..3d11c209614a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1331,16 +1331,14 @@ static enum compact_result __compact_finished(struct compact_control *cc)
if (is_via_compact_memory(cc->order))
return COMPACT_CONTINUE;
- if (cc->finishing_block) {
- /*
- * We have finished the pageblock, but better check again that
- * we really succeeded.
- */
- if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
- cc->finishing_block = false;
- else
- return COMPACT_CONTINUE;
- }
+ /*
+ * Always finish scanning a pageblock to reduce the possibility of
+ * fallbacks in the future. This is particularly important when
+ * migration source is unmovable/reclaimable but it's not worth
+ * special casing.
+ */
+ if (!IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
+ return COMPACT_CONTINUE;
/* Direct compactor: Is a suitable page free? */
for (order = cc->order; order < MAX_ORDER; order++) {
@@ -1382,7 +1380,6 @@ static enum compact_result __compact_finished(struct compact_control *cc)
return COMPACT_SUCCESS;
}
- cc->finishing_block = true;
return COMPACT_CONTINUE;
}
}
diff --git a/mm/internal.h b/mm/internal.h
index f40d06d70683..9b32f4cab0ae 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -203,7 +203,6 @@ struct compact_control {
bool direct_compaction; /* False from kcompactd or /proc/... */
bool whole_zone; /* Whole zone should/has been scanned */
bool contended; /* Signal lock or sched contention */
- bool finishing_block; /* Finishing current pageblock */
};
unsigned long
--
2.16.4
Pages with no migration handler use a fallback handler which sometimes
works and sometimes persistently retries. A historical example was blockdev
pages but there are others such as odd refcounting when page->private
is used. These are retried multiple times which is wasteful during
compaction so this patch will fail migration faster unless the caller
specifies MIGRATE_SYNC.
This is not expected to help THP allocation success rates but it did
reduce latencies very slightly in some cases.
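The practical effect comes from how the migration core treats the return
value. The snippet below is a schematic model of that behaviour, not the real
migrate_pages() loop; the retry count of 10 passes is an assumption taken
from kernels of this era.

/*
 * Schematic of why returning -EBUSY fails faster than -EAGAIN: -EAGAIN
 * pages are retried across several passes while other errors fail the
 * page immediately.
 */
#include <errno.h>
#include <stdio.h>

static int try_migrate_one(int attempt)
{
	/* Pretend the fallback handler keeps failing to release page->private */
	(void)attempt;
	return -EAGAIN;		/* change to -EBUSY to see the fast-fail path */
}

int main(void)
{
	int pass, rc = -EAGAIN;

	for (pass = 0; pass < 10 && rc == -EAGAIN; pass++)
		rc = try_migrate_one(pass);
	printf("gave up after %d pass(es), rc=%d\n", pass, rc);
	return 0;
}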
1-socket thpfioscale
4.20.0 4.20.0
noreserved-v2r15 failfast-v2r15
Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%*
Amean fault-both-3 3839.67 ( 0.00%) 3833.72 ( 0.15%)
Amean fault-both-5 5177.47 ( 0.00%) 4967.15 ( 4.06%)
Amean fault-both-7 7245.03 ( 0.00%) 7139.19 ( 1.46%)
Amean fault-both-12 11534.89 ( 0.00%) 11326.30 ( 1.81%)
Amean fault-both-18 16241.10 ( 0.00%) 16270.70 ( -0.18%)
Amean fault-both-24 19075.91 ( 0.00%) 19839.65 ( -4.00%)
Amean fault-both-30 22712.11 ( 0.00%) 21707.05 ( 4.43%)
Amean fault-both-32 21692.92 ( 0.00%) 21968.16 ( -1.27%)
The 2-socket results are not materially different. Scan rates are similar
as expected.
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 52b04c217e30..4512afab46ac 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -916,7 +916,7 @@ static int fallback_migrate_page(struct address_space *mapping,
*/
if (page_has_private(page) &&
!try_to_release_page(page, GFP_KERNEL))
- return -EAGAIN;
+ return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY;
return migrate_page(mapping, newpage, page, mode);
}
--
2.16.4
The migration scanner is a linear scan of a zone with a potentially large
search space. Furthermore, many pageblocks are unusable, such as those
filled with reserved pages or partially filled with pages that cannot
migrate. These still get scanned in the common case of allocating a THP
and the cost accumulates.
The patch uses a partial search of the free lists to locate a migration
source candidate that is marked as MOVABLE when allocating a THP. It
prefers picking a block with a larger number of free pages already on
the basis that there are fewer pages to migrate to free the entire block.
The lowest PFN found during searches is tracked as the basis of the start
for the linear search after the first search of the free list fails.
After the search, the free list is shuffled so that the next search will
not encounter the same page. If the search fails then the subsequent
searches will be shorter and the linear scanner is used.
If this search fails, or if the request is for a small or
unmovable/reclaimable allocation, then the linear scanner is still used. It
is somewhat pointless to use the list search in those cases: small free
pages must be used for the search and there is no guarantee that
contiguous movable pages are located within that block.
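To illustrate how the search shortens on failure, the scan limit halves with
each failed fast search. A rough stand-alone illustration follows, assuming
COMPACT_CLUSTER_MAX is 32; it mirrors freelist_scan_limit() in the diff below.

#include <stdio.h>

int main(void)
{
	unsigned int fail;

	/* limit = (COMPACT_CLUSTER_MAX >> fast_search_fail) + 1, with 32 assumed */
	for (fail = 0; fail <= 5; fail++)
		printf("fast_search_fail=%u -> scan limit %u free pages\n",
		       fail, (32U >> fail) + 1);
	/* Prints 33, 17, 9, 5, 3, 2: failed searches quickly become cheap. */
	return 0;
}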
5.0.0-rc1 5.0.0-rc1
noboost-v3r10 findmig-v3r15
Amean fault-both-3 3771.41 ( 0.00%) 3390.40 ( 10.10%)
Amean fault-both-5 5409.05 ( 0.00%) 5082.28 ( 6.04%)
Amean fault-both-7 7040.74 ( 0.00%) 7012.51 ( 0.40%)
Amean fault-both-12 11887.35 ( 0.00%) 11346.63 ( 4.55%)
Amean fault-both-18 16718.19 ( 0.00%) 15324.19 ( 8.34%)
Amean fault-both-24 21157.19 ( 0.00%) 16088.50 * 23.96%*
Amean fault-both-30 21175.92 ( 0.00%) 18723.42 * 11.58%*
Amean fault-both-32 21339.03 ( 0.00%) 18612.01 * 12.78%*
5.0.0-rc1 5.0.0-rc1
noboost-v3r10 findmig-v3r15
Percentage huge-3 86.50 ( 0.00%) 89.83 ( 3.85%)
Percentage huge-5 92.52 ( 0.00%) 91.96 ( -0.61%)
Percentage huge-7 92.44 ( 0.00%) 92.85 ( 0.44%)
Percentage huge-12 92.98 ( 0.00%) 92.74 ( -0.25%)
Percentage huge-18 91.70 ( 0.00%) 91.71 ( 0.02%)
Percentage huge-24 91.59 ( 0.00%) 92.13 ( 0.60%)
Percentage huge-30 90.14 ( 0.00%) 93.79 ( 4.04%)
Percentage huge-32 90.03 ( 0.00%) 91.27 ( 1.37%)
This shows an improvement in allocation latencies with similar allocation
success rates. While not presented, there was a 31% reduction in migration
scanning and an 8% reduction in system CPU usage. A 2-socket machine showed
similar benefits.
[[email protected]: Migrate block that was found-fast, some optimisations]
Signed-off-by: Mel Gorman <[email protected]>
---
mm/compaction.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
mm/internal.h | 2 +
2 files changed, 175 insertions(+), 3 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 3d11c209614a..92d10eb3d1c7 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1040,6 +1040,12 @@ static bool suitable_migration_target(struct compact_control *cc,
return false;
}
+static inline unsigned int
+freelist_scan_limit(struct compact_control *cc)
+{
+ return (COMPACT_CLUSTER_MAX >> cc->fast_search_fail) + 1;
+}
+
/*
* Test whether the free scanner has reached the same or lower pageblock than
* the migration scanner, and compaction should thus terminate.
@@ -1050,6 +1056,19 @@ static inline bool compact_scanners_met(struct compact_control *cc)
<= (cc->migrate_pfn >> pageblock_order);
}
+/* Reorder the free list to reduce repeated future searches */
+static void
+move_freelist_tail(struct list_head *freelist, struct page *freepage)
+{
+ LIST_HEAD(sublist);
+
+ if (!list_is_last(freelist, &freepage->lru)) {
+ list_cut_position(&sublist, freelist, &freepage->lru);
+ if (!list_empty(&sublist))
+ list_splice_tail(&sublist, freelist);
+ }
+}
+
/*
* Based on information in the current compact_control, find blocks
* suitable for isolating free pages from and then isolate them.
@@ -1207,6 +1226,146 @@ typedef enum {
*/
int sysctl_compact_unevictable_allowed __read_mostly = 1;
+static inline void
+update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
+{
+ if (cc->fast_start_pfn == ULONG_MAX)
+ return;
+
+ if (!cc->fast_start_pfn)
+ cc->fast_start_pfn = pfn;
+
+ cc->fast_start_pfn = min(cc->fast_start_pfn, pfn);
+}
+
+static inline void
+reinit_migrate_pfn(struct compact_control *cc)
+{
+ if (!cc->fast_start_pfn || cc->fast_start_pfn == ULONG_MAX)
+ return;
+
+ cc->migrate_pfn = cc->fast_start_pfn;
+ cc->fast_start_pfn = ULONG_MAX;
+}
+
+/*
+ * Briefly search the free lists for a migration source that already has
+ * some free pages to reduce the number of pages that need migration
+ * before a pageblock is free.
+ */
+static unsigned long fast_find_migrateblock(struct compact_control *cc)
+{
+ unsigned int limit = freelist_scan_limit(cc);
+ unsigned int nr_scanned = 0;
+ unsigned long distance;
+ unsigned long pfn = cc->migrate_pfn;
+ unsigned long high_pfn;
+ int order;
+
+ /* Skip hints are relied on to avoid repeats on the fast search */
+ if (cc->ignore_skip_hint)
+ return pfn;
+
+ /*
+ * If the migrate_pfn is not at the start of a zone or the start
+ * of a pageblock then assume this is a continuation of a previous
+ * scan restarted due to COMPACT_CLUSTER_MAX.
+ */
+ if (pfn != cc->zone->zone_start_pfn && pfn != pageblock_start_pfn(pfn))
+ return pfn;
+
+ /*
+ * For smaller orders, just linearly scan as the number of pages
+ * to migrate should be relatively small and does not necessarily
+ * justify freeing up a large block for a small allocation.
+ */
+ if (cc->order <= PAGE_ALLOC_COSTLY_ORDER)
+ return pfn;
+
+ /*
+ * Only allow kcompactd and direct requests for movable pages to
+ * quickly clear out a MOVABLE pageblock for allocation. This
+ * reduces the risk that a large movable pageblock is freed for
+ * an unmovable/reclaimable small allocation.
+ */
+ if (cc->direct_compaction && cc->migratetype != MIGRATE_MOVABLE)
+ return pfn;
+
+ /*
+ * When starting the migration scanner, pick any pageblock within the
+ * first half of the search space. Otherwise try and pick a pageblock
+ * within the first eighth to reduce the chances that a migration
+ * target later becomes a source.
+ */
+ distance = (cc->free_pfn - cc->migrate_pfn) >> 1;
+ if (cc->migrate_pfn != cc->zone->zone_start_pfn)
+ distance >>= 2;
+ high_pfn = pageblock_start_pfn(cc->migrate_pfn + distance);
+
+ for (order = cc->order - 1;
+ order >= PAGE_ALLOC_COSTLY_ORDER && pfn == cc->migrate_pfn && nr_scanned < limit;
+ order--) {
+ struct free_area *area = &cc->zone->free_area[order];
+ struct list_head *freelist;
+ unsigned long flags;
+ struct page *freepage;
+
+ if (!area->nr_free)
+ continue;
+
+ spin_lock_irqsave(&cc->zone->lock, flags);
+ freelist = &area->free_list[MIGRATE_MOVABLE];
+ list_for_each_entry(freepage, freelist, lru) {
+ unsigned long free_pfn;
+
+ nr_scanned++;
+ free_pfn = page_to_pfn(freepage);
+ if (free_pfn < high_pfn) {
+ update_fast_start_pfn(cc, free_pfn);
+
+ /*
+ * Avoid if skipped recently. Ideally it would
+ * move to the tail but even safe iteration of
+ * the list assumes an entry is deleted, not
+ * reordered.
+ */
+ if (get_pageblock_skip(freepage)) {
+ if (list_is_last(freelist, &freepage->lru))
+ break;
+
+ continue;
+ }
+
+ /* Reorder to so a future search skips recent pages */
+ move_freelist_tail(freelist, freepage);
+
+ pfn = pageblock_start_pfn(free_pfn);
+ cc->fast_search_fail = 0;
+ set_pageblock_skip(freepage);
+ break;
+ }
+
+ if (nr_scanned >= limit) {
+ cc->fast_search_fail++;
+ move_freelist_tail(freelist, freepage);
+ break;
+ }
+ }
+ spin_unlock_irqrestore(&cc->zone->lock, flags);
+ }
+
+ cc->total_migrate_scanned += nr_scanned;
+
+ /*
+ * If fast scanning failed then use a cached entry for a page block
+ * that had free pages as the basis for starting a linear scan.
+ */
+ if (pfn == cc->migrate_pfn)
+ reinit_migrate_pfn(cc);
+
+ return pfn;
+}
+
/*
* Isolate all pages that can be migrated from the first suitable block,
* starting at the block pointed to by the migrate scanner pfn within
@@ -1222,16 +1381,25 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
const isolate_mode_t isolate_mode =
(sysctl_compact_unevictable_allowed ? ISOLATE_UNEVICTABLE : 0) |
(cc->mode != MIGRATE_SYNC ? ISOLATE_ASYNC_MIGRATE : 0);
+ bool fast_find_block;
/*
* Start at where we last stopped, or beginning of the zone as
- * initialized by compact_zone()
+ * initialized by compact_zone(). The first failure will use
+ * the lowest PFN as the starting point for linear scanning.
*/
- low_pfn = cc->migrate_pfn;
+ low_pfn = fast_find_migrateblock(cc);
block_start_pfn = pageblock_start_pfn(low_pfn);
if (block_start_pfn < zone->zone_start_pfn)
block_start_pfn = zone->zone_start_pfn;
+ /*
+ * fast_find_migrateblock marks a pageblock skipped so to avoid
+ * the isolation_suitable check below, check whether the fast
+ * search was successful.
+ */
+ fast_find_block = low_pfn != cc->migrate_pfn && !cc->fast_search_fail;
+
/* Only scan within a pageblock boundary */
block_end_pfn = pageblock_end_pfn(low_pfn);
@@ -1240,6 +1408,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
* Do not cross the free scanner.
*/
for (; block_end_pfn <= cc->free_pfn;
+ fast_find_block = false,
low_pfn = block_end_pfn,
block_start_pfn = block_end_pfn,
block_end_pfn += pageblock_nr_pages) {
@@ -1259,7 +1428,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
continue;
/* If isolation recently failed, do not retry */
- if (!isolation_suitable(cc, page))
+ if (!isolation_suitable(cc, page) && !fast_find_block)
continue;
/*
@@ -1550,6 +1719,7 @@ static enum compact_result compact_zone(struct compact_control *cc)
* want to compact the whole zone), but check that it is initialised
* by ensuring the values are within zone boundaries.
*/
+ cc->fast_start_pfn = 0;
if (cc->whole_zone) {
cc->migrate_pfn = start_pfn;
cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
diff --git a/mm/internal.h b/mm/internal.h
index 9b32f4cab0ae..983cb975545f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -188,9 +188,11 @@ struct compact_control {
unsigned int nr_migratepages; /* Number of pages to migrate */
unsigned long free_pfn; /* isolate_freepages search base */
unsigned long migrate_pfn; /* isolate_migratepages search base */
+ unsigned long fast_start_pfn; /* a pfn to start linear scan from */
struct zone *zone;
unsigned long total_migrate_scanned;
unsigned long total_free_scanned;
+ unsigned int fast_search_fail; /* failures to use free list searches */
const gfp_t gfp_mask; /* gfp mask of a direct compactor */
int order; /* order a direct compactor needs */
int migratetype; /* migratetype of direct compactor */
--
2.16.4
From the function name, it is not obvious that high-order free pages are
split into order-0 pages. Rename map_pages() to split_map_pages() to make
that clear.
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 163841e1b167..32a88b49f973 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -66,7 +66,7 @@ static unsigned long release_freepages(struct list_head *freelist)
return high_pfn;
}
-static void map_pages(struct list_head *list)
+static void split_map_pages(struct list_head *list)
{
unsigned int i, order, nr_pages;
struct page *page, *next;
@@ -644,7 +644,7 @@ isolate_freepages_range(struct compact_control *cc,
}
/* __isolate_free_page() does not map the pages */
- map_pages(&freelist);
+ split_map_pages(&freelist);
if (pfn < end_pfn) {
/* Loop terminated early, cleanup. */
@@ -1141,7 +1141,7 @@ static void isolate_freepages(struct compact_control *cc)
}
/* __isolate_free_page() does not map the pages */
- map_pages(freelist);
+ split_map_pages(freelist);
/*
* Record where the free scanner will restart next time. Either we
--
2.16.4
Async migration aborts on spinlock contention but contention can be high
when there are multiple compaction attempts and kswapd is active. The
consequence is that the migration scanners move forward uselessly while
still contending on locks for longer and leaving suitable migration sources
behind.
This patch will acquire the lock but track when contention occurs. When
it does, the current pageblock is finished, as compaction may succeed for
that block, and then the scan aborts. This has a variable impact on latency:
in some cases useless scanning is avoided (reducing latency) but a lock
will be contended (increasing latency) or a single contended pageblock is
scanned that would otherwise have been skipped (increasing latency).
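The locking policy change amounts to the pattern sketched below, shown as a
userspace analogue with a pthread mutex standing in for the zone/LRU
spinlock. This is illustrative only; the real helper is
compact_lock_irqsave() in the diff that follows.

#include <pthread.h>
#include <stdbool.h>

struct cc_state {
	bool async;		/* MIGRATE_ASYNC analogue */
	bool contended;		/* remembered for the rest of the pageblock */
};

/* Trylock first in async mode, record contention, but always take the lock */
void compact_lock(pthread_mutex_t *lock, struct cc_state *cc)
{
	if (cc->async && !cc->contended) {
		if (pthread_mutex_trylock(lock) == 0)
			return;			/* uncontended fast path */
		cc->contended = true;		/* finish this block, then abort */
	}
	pthread_mutex_lock(lock);		/* sync mode or contended async */
}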
5.0.0-rc1 5.0.0-rc1
norescan-v3r16 finishcontend-v3r16
Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%*
Amean fault-both-3 3002.07 ( 0.00%) 3153.17 ( -5.03%)
Amean fault-both-5 4684.47 ( 0.00%) 4280.52 ( 8.62%)
Amean fault-both-7 6815.54 ( 0.00%) 5811.50 * 14.73%*
Amean fault-both-12 10864.02 ( 0.00%) 9276.85 ( 14.61%)
Amean fault-both-18 12247.52 ( 0.00%) 11032.67 ( 9.92%)
Amean fault-both-24 15683.99 ( 0.00%) 14285.70 ( 8.92%)
Amean fault-both-30 18620.02 ( 0.00%) 16293.76 * 12.49%*
Amean fault-both-32 19250.28 ( 0.00%) 16721.02 * 13.14%*
5.0.0-rc1 5.0.0-rc1
norescan-v3r16 finishcontend-v3r16
Percentage huge-1 0.00 ( 0.00%) 0.00 ( 0.00%)
Percentage huge-3 95.00 ( 0.00%) 96.82 ( 1.92%)
Percentage huge-5 94.22 ( 0.00%) 95.40 ( 1.26%)
Percentage huge-7 92.35 ( 0.00%) 95.92 ( 3.86%)
Percentage huge-12 91.90 ( 0.00%) 96.73 ( 5.25%)
Percentage huge-18 89.58 ( 0.00%) 96.77 ( 8.03%)
Percentage huge-24 90.03 ( 0.00%) 96.05 ( 6.69%)
Percentage huge-30 89.14 ( 0.00%) 96.81 ( 8.60%)
Percentage huge-32 90.58 ( 0.00%) 97.41 ( 7.54%)
There is a variable impact that is mostly good on latency while allocation
success rates are slightly higher. System CPU usage is reduced by about
10% but scan rate impact is mixed
Compaction migrate scanned 27997659.00 20148867
Compaction free scanned 120782791.00 118324914
Migration scan rates are reduced by 28%, which is expected as the async
scanner now completes a pageblock instead of skipping it. The impact on the
free scanner is known to be variable. Overall, the primary justification for
this patch is that completing the scan of a pageblock is very important
for later patches.
[[email protected]: Fix unused variable warning]
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 90 ++++++++++++++++++++++-----------------------------------
1 file changed, 34 insertions(+), 56 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index a31fea7b3f96..b261c0bfac24 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -382,24 +382,25 @@ static bool test_and_set_skip(struct compact_control *cc, struct page *page,
/*
* Compaction requires the taking of some coarse locks that are potentially
- * very heavily contended. For async compaction, back out if the lock cannot
- * be taken immediately. For sync compaction, spin on the lock if needed.
+ * very heavily contended. For async compaction, trylock and record if the
+ * lock is contended. The lock will still be acquired but compaction will
+ * abort when the current block is finished regardless of success rate.
+ * Sync compaction acquires the lock.
*
- * Returns true if the lock is held
- * Returns false if the lock is not held and compaction should abort
+ * Always returns true which makes it easier to track lock state in callers.
*/
-static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags,
+static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags,
struct compact_control *cc)
{
- if (cc->mode == MIGRATE_ASYNC) {
- if (!spin_trylock_irqsave(lock, *flags)) {
- cc->contended = true;
- return false;
- }
- } else {
- spin_lock_irqsave(lock, *flags);
+ /* Track if the lock is contended in async mode */
+ if (cc->mode == MIGRATE_ASYNC && !cc->contended) {
+ if (spin_trylock_irqsave(lock, *flags))
+ return true;
+
+ cc->contended = true;
}
+ spin_lock_irqsave(lock, *flags);
return true;
}
@@ -432,10 +433,8 @@ static bool compact_unlock_should_abort(spinlock_t *lock,
}
if (need_resched()) {
- if (cc->mode == MIGRATE_ASYNC) {
+ if (cc->mode == MIGRATE_ASYNC)
cc->contended = true;
- return true;
- }
cond_resched();
}
@@ -455,10 +454,8 @@ static inline bool compact_should_abort(struct compact_control *cc)
{
/* async compaction aborts if contended */
if (need_resched()) {
- if (cc->mode == MIGRATE_ASYNC) {
+ if (cc->mode == MIGRATE_ASYNC)
cc->contended = true;
- return true;
- }
cond_resched();
}
@@ -535,18 +532,8 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
* recheck as well.
*/
if (!locked) {
- /*
- * The zone lock must be held to isolate freepages.
- * Unfortunately this is a very coarse lock and can be
- * heavily contended if there are parallel allocations
- * or parallel compactions. For async compaction do not
- * spin on the lock and we acquire the lock as late as
- * possible.
- */
- locked = compact_trylock_irqsave(&cc->zone->lock,
+ locked = compact_lock_irqsave(&cc->zone->lock,
&flags, cc);
- if (!locked)
- break;
/* Recheck this is a buddy page under lock */
if (!PageBuddy(page))
@@ -900,15 +887,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
/* If we already hold the lock, we can skip some rechecking */
if (!locked) {
- locked = compact_trylock_irqsave(zone_lru_lock(zone),
+ locked = compact_lock_irqsave(zone_lru_lock(zone),
&flags, cc);
- /* Allow future scanning if the lock is contended */
- if (!locked) {
- clear_pageblock_skip(page);
- break;
- }
-
/* Try get exclusive access under lock */
if (!skip_updated) {
skip_updated = true;
@@ -951,9 +932,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
/*
* Avoid isolating too much unless this block is being
- * rescanned (e.g. dirty/writeback pages, parallel allocation).
+ * rescanned (e.g. dirty/writeback pages, parallel allocation)
+ * or a lock is contended. For contention, isolate quickly to
+ * potentially remove one source of contention.
*/
- if (cc->nr_migratepages == COMPACT_CLUSTER_MAX && !cc->rescan) {
+ if (cc->nr_migratepages == COMPACT_CLUSTER_MAX &&
+ !cc->rescan && !cc->contended) {
++low_pfn;
break;
}
@@ -1416,12 +1400,8 @@ static void isolate_freepages(struct compact_control *cc)
isolate_freepages_block(cc, &isolate_start_pfn, block_end_pfn,
freelist, false);
- /*
- * If we isolated enough freepages, or aborted due to lock
- * contention, terminate.
- */
- if ((cc->nr_freepages >= cc->nr_migratepages)
- || cc->contended) {
+ /* Are enough freepages isolated? */
+ if (cc->nr_freepages >= cc->nr_migratepages) {
if (isolate_start_pfn >= block_end_pfn) {
/*
* Restart at previous pageblock if more
@@ -1463,13 +1443,8 @@ static struct page *compaction_alloc(struct page *migratepage,
struct compact_control *cc = (struct compact_control *)data;
struct page *freepage;
- /*
- * Isolate free pages if necessary, and if we are not aborting due to
- * contention.
- */
if (list_empty(&cc->freepages)) {
- if (!cc->contended)
- isolate_freepages(cc);
+ isolate_freepages(cc);
if (list_empty(&cc->freepages))
return NULL;
@@ -1731,7 +1706,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
low_pfn = isolate_migratepages_block(cc, low_pfn,
block_end_pfn, isolate_mode);
- if (!low_pfn || cc->contended)
+ if (!low_pfn)
return ISOLATE_ABORT;
/*
@@ -1761,9 +1736,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
{
unsigned int order;
const int migratetype = cc->migratetype;
-
- if (cc->contended || fatal_signal_pending(current))
- return COMPACT_CONTENDED;
+ int ret;
/* Compaction run completes if the migrate and free scanner meet */
if (compact_scanners_met(cc)) {
@@ -1798,6 +1771,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
return COMPACT_CONTINUE;
/* Direct compactor: Is a suitable page free? */
+ ret = COMPACT_NO_SUITABLE_PAGE;
for (order = cc->order; order < MAX_ORDER; order++) {
struct free_area *area = &cc->zone->free_area[order];
bool can_steal;
@@ -1837,11 +1811,15 @@ static enum compact_result __compact_finished(struct compact_control *cc)
return COMPACT_SUCCESS;
}
- return COMPACT_CONTINUE;
+ ret = COMPACT_CONTINUE;
+ break;
}
}
- return COMPACT_NO_SUITABLE_PAGE;
+ if (cc->contended || fatal_signal_pending(current))
+ ret = COMPACT_CONTENDED;
+
+ return ret;
}
static enum compact_result compact_finished(struct compact_control *cc)
--
2.16.4
Pageblocks are marked for skip when no pages are isolated after a scan.
However, it's possible to hit corner cases where the migration scanner
gets stuck near the boundary between the source and target scanner. Due
to pages being migrated in blocks of COMPACT_CLUSTER_MAX, pages that
are migrated can be reallocated before the pageblock is complete. The
pageblock is not necessarily skipped so it can be rescanned multiple
times. Similarly, a pageblock with some dirty/writeback pages may fail
to migrate and be rescanned until writeback completes which is wasteful.
This patch tracks if a pageblock is being rescanned. If so, then the entire
pageblock will be migrated as one operation. This narrows the race window
during which pages can be reallocated during migration. Secondly, if there
are pages that cannot be isolated then the pageblock will still be fully
scanned and marked for skipping. On the second rescan, the pageblock skip
is set and the migration scanner makes progress.
5.0.0-rc1 5.0.0-rc1
findfree-v3r16 norescan-v3r16
Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%*
Amean fault-both-3 3200.68 ( 0.00%) 3002.07 ( 6.21%)
Amean fault-both-5 4847.75 ( 0.00%) 4684.47 ( 3.37%)
Amean fault-both-7 6658.92 ( 0.00%) 6815.54 ( -2.35%)
Amean fault-both-12 11077.62 ( 0.00%) 10864.02 ( 1.93%)
Amean fault-both-18 12403.97 ( 0.00%) 12247.52 ( 1.26%)
Amean fault-both-24 15607.10 ( 0.00%) 15683.99 ( -0.49%)
Amean fault-both-30 18752.27 ( 0.00%) 18620.02 ( 0.71%)
Amean fault-both-32 21207.54 ( 0.00%) 19250.28 * 9.23%*
5.0.0-rc1 5.0.0-rc1
findfree-v3r16 norescan-v3r16
Percentage huge-3 96.86 ( 0.00%) 95.00 ( -1.91%)
Percentage huge-5 93.72 ( 0.00%) 94.22 ( 0.53%)
Percentage huge-7 94.31 ( 0.00%) 92.35 ( -2.08%)
Percentage huge-12 92.66 ( 0.00%) 91.90 ( -0.82%)
Percentage huge-18 91.51 ( 0.00%) 89.58 ( -2.11%)
Percentage huge-24 90.50 ( 0.00%) 90.03 ( -0.52%)
Percentage huge-30 91.57 ( 0.00%) 89.14 ( -2.65%)
Percentage huge-32 91.00 ( 0.00%) 90.58 ( -0.46%)
The difference is negligible, but this was likely a case where the specific
corner case was not hit. A previous run of the same patch based on an earlier
iteration of the series showed large differences where migration rates
could be halved when the corner case was hit.
The specific corner case, where migration scan rates go through the roof
because a dirty/writeback pageblock sits at the boundary of the
migration/free scanners, did not happen in this case. When it does happen,
the scan rates multiply by massive margins.
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 32 ++++++++++++++++++++++++++------
mm/internal.h | 1 +
2 files changed, 27 insertions(+), 6 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 19fea4a7b3f4..a31fea7b3f96 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -949,8 +949,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
cc->nr_migratepages++;
nr_isolated++;
- /* Avoid isolating too much */
- if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
+ /*
+ * Avoid isolating too much unless this block is being
+ * rescanned (e.g. dirty/writeback pages, parallel allocation).
+ */
+ if (cc->nr_migratepages == COMPACT_CLUSTER_MAX && !cc->rescan) {
++low_pfn;
break;
}
@@ -997,11 +1000,14 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
spin_unlock_irqrestore(zone_lru_lock(zone), flags);
/*
- * Updated the cached scanner pfn if the pageblock was scanned
- * without isolating a page. The pageblock may not be marked
- * skipped already if there were no LRU pages in the block.
+ * Updated the cached scanner pfn once the pageblock has been scanned
+ * Pages will either be migrated in which case there is no point
+ * scanning in the near future or migration failed in which case the
+ * failure reason may persist. The block is marked for skipping if
+ * there were no pages isolated in the block or if the block is
+ * rescanned twice in a row.
*/
- if (low_pfn == end_pfn && !nr_isolated) {
+ if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) {
if (valid_page && !skip_updated)
set_pageblock_skip(valid_page);
update_cached_migrate(cc, low_pfn);
@@ -2033,6 +2039,20 @@ static enum compact_result compact_zone(struct compact_control *cc)
int err;
unsigned long start_pfn = cc->migrate_pfn;
+ /*
+ * Avoid multiple rescans which can happen if a page cannot be
+ * isolated (dirty/writeback in async mode) or if the migrated
+ * pages are being allocated before the pageblock is cleared.
+ * The first rescan will capture the entire pageblock for
+ * migration. If it fails, it'll be marked skip and scanning
+ * will proceed as normal.
+ */
+ cc->rescan = false;
+ if (pageblock_start_pfn(last_migrated_pfn) ==
+ pageblock_start_pfn(start_pfn)) {
+ cc->rescan = true;
+ }
+
switch (isolate_migratepages(cc->zone, cc)) {
case ISOLATE_ABORT:
ret = COMPACT_CONTENDED;
diff --git a/mm/internal.h b/mm/internal.h
index 983cb975545f..d5b999e5eb5f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -205,6 +205,7 @@ struct compact_control {
bool direct_compaction; /* False from kcompactd or /proc/... */
bool whole_zone; /* Whole zone should/has been scanned */
bool contended; /* Signal lock or sched contention */
+ bool rescan; /* Rescanning the same pageblock */
};
unsigned long
--
2.16.4
When pageblocks get fragmented, watermarks are artificially boosted to
reclaim pages and avoid further fragmentation events. However, compaction
is often either fragmentation-neutral or moving movable pages away from
unmovable/reclaimable pages. As the true watermarks are preserved, allow
compaction to ignore the boost factor.
The expected impact is very slight as the main benefit is that compaction
is slightly more likely to succeed when the system has been fragmented
very recently. On both 1-socket and 2-socket machines for THP-intensive
allocation during fragmentation the success rate was increased by less
than 1%, which is marginal. However, detailed tracing indicated that
migration failures due to premature ENOMEMs triggered by the watermark
checks were eliminated.
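For context, the distinction the one-line change relies on is sketched below:
the normal accessor (paraphrased from mmzone.h of this era) folds in the
temporary boost, while reading the raw _watermark[] entry bypasses it. This
is a userspace model with made-up numbers, not kernel code.

#include <stdio.h>

enum { WMARK_MIN, WMARK_LOW, WMARK_HIGH, NR_WMARK };

struct zone_wm {
	unsigned long _watermark[NR_WMARK];
	unsigned long watermark_boost;
};

/* Paraphrase of the accessors: the boost is added here */
#define wmark_pages(z, i)	((z)->_watermark[i] + (z)->watermark_boost)
#define min_wmark_pages(z)	wmark_pages(z, WMARK_MIN)

int main(void)
{
	struct zone_wm zone = {
		._watermark = { 1024, 1280, 1536 },	/* arbitrary values */
		.watermark_boost = 4096,		/* after a fragmentation event */
	};

	printf("boosted min watermark: %lu\n", min_wmark_pages(&zone));	    /* 5120 */
	printf("true min watermark:    %lu\n", zone._watermark[WMARK_MIN]); /* 1024 */
	return 0;
}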
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fc769ff4fb2c..6607cb7131b0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3071,7 +3071,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
* watermark, because we already know our high-order page
* exists.
*/
- watermark = min_wmark_pages(zone) + (1UL << order);
+ watermark = zone->_watermark[WMARK_MIN] + (1UL << order);
if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
return 0;
--
2.16.4
Migrate has separate cached PFNs for ASYNC and SYNC* migration on the
basis that some migrations will fail in ASYNC mode. However, if the cached
PFNs match at the start of scanning and pageblocks are skipped due to
having no isolation candidates, then the sync state does not matter.
This patch keeps matching cached PFNs in sync until a pageblock with
isolation candidates is found.
The actual benefit is marginal given that the sync scanner following the
async scanner will often only be skipping a number of pageblocks, but it is
still useless work that can be avoided. Any benefit depends heavily on
whether the scanners restarted recently.
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/mm/compaction.c b/mm/compaction.c
index 14bb66d48392..829540f6f3da 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1969,6 +1969,7 @@ static enum compact_result compact_zone(struct compact_control *cc)
unsigned long end_pfn = zone_end_pfn(cc->zone);
unsigned long last_migrated_pfn;
const bool sync = cc->mode != MIGRATE_ASYNC;
+ bool update_cached;
cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags,
@@ -2016,6 +2017,17 @@ static enum compact_result compact_zone(struct compact_control *cc)
last_migrated_pfn = 0;
+ /*
+ * Migrate has separate cached PFNs for ASYNC and SYNC* migration on
+ * the basis that some migrations will fail in ASYNC mode. However,
+ * if the cached PFNs match and pageblocks are skipped due to having
+ * no isolation candidates, then the sync state does not matter.
+ * Until a pageblock with isolation candidates is found, keep the
+ * cached PFNs in sync to avoid revisiting the same blocks.
+ */
+ update_cached = !sync &&
+ cc->zone->compact_cached_migrate_pfn[0] == cc->zone->compact_cached_migrate_pfn[1];
+
trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
cc->free_pfn, end_pfn, sync);
@@ -2047,6 +2059,11 @@ static enum compact_result compact_zone(struct compact_control *cc)
last_migrated_pfn = 0;
goto out;
case ISOLATE_NONE:
+ if (update_cached) {
+ cc->zone->compact_cached_migrate_pfn[1] =
+ cc->zone->compact_cached_migrate_pfn[0];
+ }
+
/*
* We haven't isolated and migrated anything, but
* there might still be unflushed migrations from
@@ -2054,6 +2071,7 @@ static enum compact_result compact_zone(struct compact_control *cc)
*/
goto check_drain;
case ISOLATE_SUCCESS:
+ update_cached = false;
last_migrated_pfn = start_pfn;
;
}
--
2.16.4
Due to either a fast search of the free list or a linear scan, it is
possible for multiple compaction instances to pick the same pageblock
for migration. This is lucky for one scanner and means increased scanning for
all the others. It also allows a race between requests over which one
allocates the resulting free block first.
This patch tests and updates the pageblock skip for the migration scanner
carefully. When isolating a block, it will check and skip if the block is
already in use. Once the zone lock is acquired, it will be rechecked so
that only one scanner can set the pageblock skip for exclusive use. Any
scanner contending will continue with a linear scan. The skip bit is
still set if no pages can be isolated in a range. While this may result
in redundant scanning, it avoids unnecessarily acquiring the zone lock
when there are no suitable migration sources.
1-socket thpscale
Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%*
Amean fault-both-3 3390.40 ( 0.00%) 3024.41 ( 10.80%)
Amean fault-both-5 5082.28 ( 0.00%) 4749.30 ( 6.55%)
Amean fault-both-7 7012.51 ( 0.00%) 6454.95 ( 7.95%)
Amean fault-both-12 11346.63 ( 0.00%) 10324.83 ( 9.01%)
Amean fault-both-18 15324.19 ( 0.00%) 12896.82 * 15.84%*
Amean fault-both-24 16088.50 ( 0.00%) 13470.60 * 16.27%*
Amean fault-both-30 18723.42 ( 0.00%) 17143.99 ( 8.44%)
Amean fault-both-32 18612.01 ( 0.00%) 17743.91 ( 4.66%)
5.0.0-rc1 5.0.0-rc1
findmig-v3r15 isolmig-v3r15
Percentage huge-3 89.83 ( 0.00%) 92.96 ( 3.48%)
Percentage huge-5 91.96 ( 0.00%) 93.26 ( 1.41%)
Percentage huge-7 92.85 ( 0.00%) 93.63 ( 0.84%)
Percentage huge-12 92.74 ( 0.00%) 92.80 ( 0.07%)
Percentage huge-18 91.71 ( 0.00%) 91.62 ( -0.10%)
Percentage huge-24 92.13 ( 0.00%) 91.50 ( -0.69%)
Percentage huge-30 93.79 ( 0.00%) 92.73 ( -1.13%)
Percentage huge-32 91.27 ( 0.00%) 91.94 ( 0.74%)
This shows a reasonable reduction in latency, as multiple compaction
scanners no longer operate on the same blocks, with a similar allocation
success rate.
Compaction migrate scanned 41093126 25646769
Migration scan rates are reduced by 38%.
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 124 ++++++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 99 insertions(+), 25 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 92d10eb3d1c7..7c4c9cce7907 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -285,13 +285,52 @@ void reset_isolation_suitable(pg_data_t *pgdat)
}
}
+/*
+ * Sets the pageblock skip bit if it was clear. Note that this is a hint as
+ * locks are not required for read/writers. Returns true if it was already set.
+ */
+static bool test_and_set_skip(struct compact_control *cc, struct page *page,
+ unsigned long pfn)
+{
+ bool skip;
+
+ /* Do no update if skip hint is being ignored */
+ if (cc->ignore_skip_hint)
+ return false;
+
+ if (!IS_ALIGNED(pfn, pageblock_nr_pages))
+ return false;
+
+ skip = get_pageblock_skip(page);
+ if (!skip && !cc->no_set_skip_hint)
+ set_pageblock_skip(page);
+
+ return skip;
+}
+
+static void update_cached_migrate(struct compact_control *cc, unsigned long pfn)
+{
+ struct zone *zone = cc->zone;
+
+ pfn = pageblock_end_pfn(pfn);
+
+ /* Set for isolation rather than compaction */
+ if (cc->no_set_skip_hint)
+ return;
+
+ if (pfn > zone->compact_cached_migrate_pfn[0])
+ zone->compact_cached_migrate_pfn[0] = pfn;
+ if (cc->mode != MIGRATE_ASYNC &&
+ pfn > zone->compact_cached_migrate_pfn[1])
+ zone->compact_cached_migrate_pfn[1] = pfn;
+}
+
/*
* If no pages were isolated then mark this pageblock to be skipped in the
* future. The information is later cleared by __reset_isolation_suitable().
*/
static void update_pageblock_skip(struct compact_control *cc,
- struct page *page, unsigned long nr_isolated,
- bool migrate_scanner)
+ struct page *page, unsigned long nr_isolated)
{
struct zone *zone = cc->zone;
unsigned long pfn;
@@ -310,16 +349,8 @@ static void update_pageblock_skip(struct compact_control *cc,
pfn = page_to_pfn(page);
/* Update where async and sync compaction should restart */
- if (migrate_scanner) {
- if (pfn > zone->compact_cached_migrate_pfn[0])
- zone->compact_cached_migrate_pfn[0] = pfn;
- if (cc->mode != MIGRATE_ASYNC &&
- pfn > zone->compact_cached_migrate_pfn[1])
- zone->compact_cached_migrate_pfn[1] = pfn;
- } else {
- if (pfn < zone->compact_cached_free_pfn)
- zone->compact_cached_free_pfn = pfn;
- }
+ if (pfn < zone->compact_cached_free_pfn)
+ zone->compact_cached_free_pfn = pfn;
}
#else
static inline bool isolation_suitable(struct compact_control *cc,
@@ -334,10 +365,19 @@ static inline bool pageblock_skip_persistent(struct page *page)
}
static inline void update_pageblock_skip(struct compact_control *cc,
- struct page *page, unsigned long nr_isolated,
- bool migrate_scanner)
+ struct page *page, unsigned long nr_isolated)
{
}
+
+static void update_cached_migrate(struct compact_control *cc, unsigned long pfn)
+{
+}
+
+static bool test_and_set_skip(struct compact_control *cc, struct page *page,
+ unsigned long pfn)
+{
+ return false;
+}
#endif /* CONFIG_COMPACTION */
/*
@@ -567,7 +607,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
/* Update the pageblock-skip if the whole pageblock was scanned */
if (blockpfn == end_pfn)
- update_pageblock_skip(cc, valid_page, total_isolated, false);
+ update_pageblock_skip(cc, valid_page, total_isolated);
cc->total_free_scanned += nr_scanned;
if (total_isolated)
@@ -702,6 +742,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
unsigned long start_pfn = low_pfn;
bool skip_on_failure = false;
unsigned long next_skip_pfn = 0;
+ bool skip_updated = false;
/*
* Ensure that there are not too many pages isolated from the LRU
@@ -768,8 +809,19 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
page = pfn_to_page(low_pfn);
- if (!valid_page)
+ /*
+ * Check if the pageblock has already been marked skipped.
+ * Only the aligned PFN is checked as the caller isolates
+ * COMPACT_CLUSTER_MAX at a time so the second call must
+ * not falsely conclude that the block should be skipped.
+ */
+ if (!valid_page && IS_ALIGNED(low_pfn, pageblock_nr_pages)) {
+ if (!cc->ignore_skip_hint && get_pageblock_skip(page)) {
+ low_pfn = end_pfn;
+ goto isolate_abort;
+ }
valid_page = page;
+ }
/*
* Skip if free. We read page order here without zone lock
@@ -850,8 +902,19 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
if (!locked) {
locked = compact_trylock_irqsave(zone_lru_lock(zone),
&flags, cc);
- if (!locked)
+
+ /* Allow future scanning if the lock is contended */
+ if (!locked) {
+ clear_pageblock_skip(page);
break;
+ }
+
+ /* Try to get exclusive access under lock */
+ if (!skip_updated) {
+ skip_updated = true;
+ if (test_and_set_skip(cc, page, low_pfn))
+ goto isolate_abort;
+ }
/* Recheck PageLRU and PageCompound under lock */
if (!PageLRU(page))
@@ -929,15 +992,20 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
if (unlikely(low_pfn > end_pfn))
low_pfn = end_pfn;
+isolate_abort:
if (locked)
spin_unlock_irqrestore(zone_lru_lock(zone), flags);
/*
- * Update the pageblock-skip information and cached scanner pfn,
- * if the whole pageblock was scanned without isolating any page.
+ * Update the cached scanner pfn if the pageblock was scanned
+ * without isolating a page. The pageblock may not be marked
+ * skipped already if there were no LRU pages in the block.
*/
- if (low_pfn == end_pfn)
- update_pageblock_skip(cc, valid_page, nr_isolated, true);
+ if (low_pfn == end_pfn && !nr_isolated) {
+ if (valid_page && !skip_updated)
+ set_pageblock_skip(valid_page);
+ update_cached_migrate(cc, low_pfn);
+ }
trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
nr_scanned, nr_isolated);
@@ -1321,8 +1389,6 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
nr_scanned++;
free_pfn = page_to_pfn(freepage);
if (free_pfn < high_pfn) {
- update_fast_start_pfn(cc, free_pfn);
-
/*
* Avoid if skipped recently. Ideally it would
* move to the tail but even safe iteration of
@@ -1339,6 +1405,7 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
/* Reorder to so a future search skips recent pages */
move_freelist_tail(freelist, freepage);
+ update_fast_start_pfn(cc, free_pfn);
pfn = pageblock_start_pfn(free_pfn);
cc->fast_search_fail = 0;
set_pageblock_skip(freepage);
@@ -1427,8 +1494,15 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
if (!page)
continue;
- /* If isolation recently failed, do not retry */
- if (!isolation_suitable(cc, page) && !fast_find_block)
+ /*
+ * If isolation recently failed, do not retry. Only check the
+ * pageblock once. COMPACT_CLUSTER_MAX causes a pageblock
+ * to be visited multiple times. Assume skip was checked
+ * before making it "skip" so other compaction instances do
+ * not scan the same block.
+ */
+ if (IS_ALIGNED(low_pfn, pageblock_nr_pages) &&
+ !fast_find_block && !isolation_suitable(cc, page))
continue;
/*
--
2.16.4
Similar to the migration scanner, this patch uses the free lists to quickly
locate a migration target. The search is different in that lower orders
will be searched for a suitable high PFN if necessary but the search
is still bound. This is justified on the grounds that the free scanner
typically scans linearly much more than the migration scanner.
If a free page is found, it is isolated and compaction continues if enough
pages were isolated. For SYNC* scanning, the full pageblock is scanned
for any remaining free pages so that it can be marked for skipping in
the near future.
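As a rough sketch of the bounded search (illustrative userspace code,
not the kernel implementation; the PFN values, limit and helper name are
invented), the free list is walked in reverse for at most a fixed number
of entries, preferring a PFN at or above the target threshold and
remembering the best fallback above a minimum:
#include <stdio.h>

/* Walk pfns[] in reverse for at most 'limit' entries */
static long search_free_pfns(const long *pfns, int nr, int limit,
                             long low_pfn, long min_pfn)
{
        long fallback = -1;
        int scanned = 0;

        for (int i = nr - 1; i >= 0 && scanned < limit; i--, scanned++) {
                long pfn = pfns[i];

                if (pfn >= low_pfn)
                        return pfn;             /* preferred target found */
                if (pfn >= min_pfn && pfn > fallback)
                        fallback = pfn;         /* best candidate so far */
        }

        return fallback;                        /* -1 means the search failed */
}

int main(void)
{
        long pfns[] = { 1024, 5120, 2048, 9216, 3072 };

        printf("target pfn: %ld\n", search_free_pfns(pfns, 5, 4, 8192, 4096));
        return 0;
}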
1-socket thpfioscale
5.0.0-rc1 5.0.0-rc1
isolmig-v3r15 findfree-v3r16
Amean fault-both-3 3024.41 ( 0.00%) 3200.68 ( -5.83%)
Amean fault-both-5 4749.30 ( 0.00%) 4847.75 ( -2.07%)
Amean fault-both-7 6454.95 ( 0.00%) 6658.92 ( -3.16%)
Amean fault-both-12 10324.83 ( 0.00%) 11077.62 ( -7.29%)
Amean fault-both-18 12896.82 ( 0.00%) 12403.97 ( 3.82%)
Amean fault-both-24 13470.60 ( 0.00%) 15607.10 * -15.86%*
Amean fault-both-30 17143.99 ( 0.00%) 18752.27 ( -9.38%)
Amean fault-both-32 17743.91 ( 0.00%) 21207.54 * -19.52%*
The impact on latency is variable but the search is optimistic and
sensitive to the exact system state. Success rates are similar but
the major impact is on the rate of scanning
5.0.0-rc1 5.0.0-rc1
isolmig-v3r15 findfree-v3r16
Compaction migrate scanned 25646769 29507205
Compaction free scanned 201558184 100359571
The free scan rates are reduced by 50%. The 2-socket reductions for the
free scanner are more dramatic, which likely reflects the fact that the
machine has more memory.
[[email protected]: Fix static checker warning]
[[email protected]: Correct number of pages scanned for lower orders]
Signed-off-by: Mel Gorman <[email protected]>
---
mm/compaction.c | 218 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 213 insertions(+), 5 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 7c4c9cce7907..19fea4a7b3f4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1124,7 +1124,29 @@ static inline bool compact_scanners_met(struct compact_control *cc)
<= (cc->migrate_pfn >> pageblock_order);
}
-/* Reorder the free list to reduce repeated future searches */
+/*
+ * Used when scanning for a suitable migration target which scans freelists
+ * in reverse. Reorders the list such that the unscanned pages are scanned
+ * first on the next iteration of the free scanner
+ */
+static void
+move_freelist_head(struct list_head *freelist, struct page *freepage)
+{
+ LIST_HEAD(sublist);
+
+ if (!list_is_last(freelist, &freepage->lru)) {
+ list_cut_before(&sublist, freelist, &freepage->lru);
+ if (!list_empty(&sublist))
+ list_splice_tail(&sublist, freelist);
+ }
+}
+
+/*
+ * Similar to move_freelist_head except used by the migration scanner
+ * when scanning forward. It's possible for these list operations to
+ * move against each other if they search the free list exactly in
+ * lockstep.
+ */
static void
move_freelist_tail(struct list_head *freelist, struct page *freepage)
{
@@ -1137,6 +1159,186 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
}
}
+static void
+fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long nr_isolated)
+{
+ unsigned long start_pfn, end_pfn;
+ struct page *page = pfn_to_page(pfn);
+
+ /* Do not search around if there are enough pages already */
+ if (cc->nr_freepages >= cc->nr_migratepages)
+ return;
+
+ /* Minimise scanning during async compaction */
+ if (cc->direct_compaction && cc->mode == MIGRATE_ASYNC)
+ return;
+
+ /* Pageblock boundaries */
+ start_pfn = pageblock_start_pfn(pfn);
+ end_pfn = min(start_pfn + pageblock_nr_pages, zone_end_pfn(cc->zone));
+
+ /* Scan before */
+ if (start_pfn != pfn) {
+ isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, false);
+ if (cc->nr_freepages >= cc->nr_migratepages)
+ return;
+ }
+
+ /* Scan after */
+ start_pfn = pfn + nr_isolated;
+ if (start_pfn != end_pfn)
+ isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, false);
+
+ /* Skip this pageblock in the future as it's full or nearly full */
+ if (cc->nr_freepages < cc->nr_migratepages)
+ set_pageblock_skip(page);
+}
+
+static unsigned long
+fast_isolate_freepages(struct compact_control *cc)
+{
+ unsigned int limit = min(1U, freelist_scan_limit(cc) >> 1);
+ unsigned int nr_scanned = 0;
+ unsigned long low_pfn, min_pfn, high_pfn = 0, highest = 0;
+ unsigned long nr_isolated = 0;
+ unsigned long distance;
+ struct page *page = NULL;
+ bool scan_start = false;
+ int order;
+
+ /* Full compaction passes in a negative order */
+ if (cc->order <= 0)
+ return cc->free_pfn;
+
+ /*
+ * If starting the scan, use a deeper search and use the highest
+ * PFN found if a suitable one is not found.
+ */
+ if (cc->free_pfn == pageblock_start_pfn(zone_end_pfn(cc->zone) - 1)) {
+ limit = pageblock_nr_pages >> 1;
+ scan_start = true;
+ }
+
+ /*
+ * Preferred point is in the top quarter of the scan space but take
+ * a pfn from the top half if the search is problematic.
+ */
+ distance = (cc->free_pfn - cc->migrate_pfn);
+ low_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 2));
+ min_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 1));
+
+ if (WARN_ON_ONCE(min_pfn > low_pfn))
+ low_pfn = min_pfn;
+
+ for (order = cc->order - 1;
+ order >= 0 && !page;
+ order--) {
+ struct free_area *area = &cc->zone->free_area[order];
+ struct list_head *freelist;
+ struct page *freepage;
+ unsigned long flags;
+ unsigned int order_scanned = 0;
+
+ if (!area->nr_free)
+ continue;
+
+ spin_lock_irqsave(&cc->zone->lock, flags);
+ freelist = &area->free_list[MIGRATE_MOVABLE];
+ list_for_each_entry_reverse(freepage, freelist, lru) {
+ unsigned long pfn;
+
+ order_scanned++;
+ nr_scanned++;
+ pfn = page_to_pfn(freepage);
+
+ if (pfn >= highest)
+ highest = pageblock_start_pfn(pfn);
+
+ if (pfn >= low_pfn) {
+ cc->fast_search_fail = 0;
+ page = freepage;
+ break;
+ }
+
+ if (pfn >= min_pfn && pfn > high_pfn) {
+ high_pfn = pfn;
+
+ /* Shorten the scan if a candidate is found */
+ limit >>= 1;
+ }
+
+ if (order_scanned >= limit)
+ break;
+ }
+
+ /* Use a minimum pfn if a preferred one was not found */
+ if (!page && high_pfn) {
+ page = pfn_to_page(high_pfn);
+
+ /* Update freepage for the list reorder below */
+ freepage = page;
+ }
+
+ /* Reorder so that a future search skips recent pages */
+ move_freelist_head(freelist, freepage);
+
+ /* Isolate the page if available */
+ if (page) {
+ if (__isolate_free_page(page, order)) {
+ set_page_private(page, order);
+ nr_isolated = 1 << order;
+ cc->nr_freepages += nr_isolated;
+ list_add_tail(&page->lru, &cc->freepages);
+ count_compact_events(COMPACTISOLATED, nr_isolated);
+ } else {
+ /* If isolation fails, abort the search */
+ order = -1;
+ page = NULL;
+ }
+ }
+
+ spin_unlock_irqrestore(&cc->zone->lock, flags);
+
+ /*
+ * Smaller scan on next order so the total scan is related
+ * to freelist_scan_limit.
+ */
+ if (order_scanned >= limit)
+ limit = min(1U, limit >> 1);
+ }
+
+ if (!page) {
+ cc->fast_search_fail++;
+ if (scan_start) {
+ /*
+ * Use the highest PFN found above min. If one was
+ * not found, be pessimistic for direct compaction
+ * and use the min mark.
+ */
+ if (highest) {
+ page = pfn_to_page(highest);
+ cc->free_pfn = highest;
+ } else {
+ if (cc->direct_compaction) {
+ page = pfn_to_page(min_pfn);
+ cc->free_pfn = min_pfn;
+ }
+ }
+ }
+ }
+
+ if (highest && highest > cc->zone->compact_cached_free_pfn)
+ cc->zone->compact_cached_free_pfn = highest;
+
+ cc->total_free_scanned += nr_scanned;
+ if (!page)
+ return cc->free_pfn;
+
+ low_pfn = page_to_pfn(page);
+ fast_isolate_around(cc, low_pfn, nr_isolated);
+ return low_pfn;
+}
+
/*
* Based on information in the current compact_control, find blocks
* suitable for isolating free pages from and then isolate them.
@@ -1151,6 +1353,11 @@ static void isolate_freepages(struct compact_control *cc)
unsigned long low_pfn; /* lowest pfn scanner is able to scan */
struct list_head *freelist = &cc->freepages;
+ /* Try a small search of the free lists for a candidate */
+ isolate_start_pfn = fast_isolate_freepages(cc);
+ if (cc->nr_freepages)
+ goto splitmap;
+
/*
* Initialise the free scanner. The starting point is where we last
* successfully isolated from, zone-cached value, or the end of the
@@ -1163,7 +1370,7 @@ static void isolate_freepages(struct compact_control *cc)
* is using.
*/
isolate_start_pfn = cc->free_pfn;
- block_start_pfn = pageblock_start_pfn(cc->free_pfn);
+ block_start_pfn = pageblock_start_pfn(isolate_start_pfn);
block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
zone_end_pfn(zone));
low_pfn = pageblock_end_pfn(cc->migrate_pfn);
@@ -1227,9 +1434,6 @@ static void isolate_freepages(struct compact_control *cc)
}
}
- /* __isolate_free_page() does not map the pages */
- split_map_pages(freelist);
-
/*
* Record where the free scanner will restart next time. Either we
* broke from the loop and set isolate_start_pfn based on the last
@@ -1237,6 +1441,10 @@ static void isolate_freepages(struct compact_control *cc)
* and the loop terminated due to isolate_start_pfn < low_pfn
*/
cc->free_pfn = isolate_start_pfn;
+
+splitmap:
+ /* __isolate_free_page() does not map the pages */
+ split_map_pages(freelist);
}
/*
--
2.16.4
With incremental changes, compact_should_abort no longer makes
any documented sense. Rename to compact_check_resched and update the
associated comments. There is no benefit other than reducing redundant
code and making the intent slightly clearer. It could potentially be
merged with earlier patches but that would just make the review slightly
harder.
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 61 ++++++++++++++++++++++-----------------------------------
1 file changed, 23 insertions(+), 38 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 829540f6f3da..9aa71945255d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -404,6 +404,21 @@ static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags,
return true;
}
+/*
+ * Aside from avoiding lock contention, compaction also periodically checks
+ * need_resched() and records async compaction as contended if necessary.
+ */
+static inline void compact_check_resched(struct compact_control *cc)
+{
+ /* async compaction aborts if contended */
+ if (need_resched()) {
+ if (cc->mode == MIGRATE_ASYNC)
+ cc->contended = true;
+
+ cond_resched();
+ }
+}
+
/*
* Compaction requires the taking of some coarse locks that are potentially
* very heavily contended. The lock should be periodically unlocked to avoid
@@ -432,33 +447,7 @@ static bool compact_unlock_should_abort(spinlock_t *lock,
return true;
}
- if (need_resched()) {
- if (cc->mode == MIGRATE_ASYNC)
- cc->contended = true;
- cond_resched();
- }
-
- return false;
-}
-
-/*
- * Aside from avoiding lock contention, compaction also periodically checks
- * need_resched() and either schedules in sync compaction or aborts async
- * compaction. This is similar to what compact_unlock_should_abort() does, but
- * is used where no lock is concerned.
- *
- * Returns false when no scheduling was needed, or sync compaction scheduled.
- * Returns true when async compaction should abort.
- */
-static inline bool compact_should_abort(struct compact_control *cc)
-{
- /* async compaction aborts if contended */
- if (need_resched()) {
- if (cc->mode == MIGRATE_ASYNC)
- cc->contended = true;
-
- cond_resched();
- }
+ compact_check_resched(cc);
return false;
}
@@ -747,8 +736,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
return 0;
}
- if (compact_should_abort(cc))
- return 0;
+ compact_check_resched(cc);
if (cc->direct_compaction && (cc->mode == MIGRATE_ASYNC)) {
skip_on_failure = true;
@@ -1379,12 +1367,10 @@ static void isolate_freepages(struct compact_control *cc)
isolate_start_pfn = block_start_pfn) {
/*
* This can iterate a massively long zone without finding any
- * suitable migration targets, so periodically check if we need
- * to schedule, or even abort async compaction.
+ * suitable migration targets, so periodically check resched.
*/
- if (!(block_start_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages))
- && compact_should_abort(cc))
- break;
+ if (!(block_start_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages)))
+ compact_check_resched(cc);
page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
zone);
@@ -1675,11 +1661,10 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
/*
* This can potentially iterate a massively long zone with
* many pageblocks unsuitable, so periodically check if we
- * need to schedule, or even abort async compaction.
+ * need to schedule.
*/
- if (!(low_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages))
- && compact_should_abort(cc))
- break;
+ if (!(low_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages)))
+ compact_check_resched(cc);
page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
zone);
--
2.16.4
Scanning on large machines can take a considerable length of time and
eventually needs to be rescheduled. This is treated as an abort event
but that's not appropriate as the attempt is likely to be retried after
making numerous checks and taking another cycle through the page
allocator. This patch checks if rescheduling is necessary but continues
scanning instead of aborting.
The main benefit is reduced scanning when compaction is taking a long time
or the machine is over-saturated. It also avoids an unnecessary exit of
compaction that ends up being retried by the page allocator in the outer
loop.
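The behavioural change can be sketched in userspace as follows;
sched_yield() stands in for cond_resched() and the loop bounds are made
up for illustration, but the point is that a pending reschedule no
longer terminates the scan:
#include <sched.h>
#include <stdio.h>

#define RESCHED_INTERVAL 1024   /* pages scanned between yield checks */

static long scan_range(long start, long end)
{
        long scanned = 0;

        for (long pfn = start; pfn < end; pfn++) {
                scanned++;

                /* Previously the scan aborted here; now it only yields */
                if (!(pfn % RESCHED_INTERVAL))
                        sched_yield();
        }

        return scanned;
}

int main(void)
{
        printf("scanned %ld pages without aborting\n", scan_range(0, 1 << 16));
        return 0;
}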
5.0.0-rc1 5.0.0-rc1
synccached-v3r16 noresched-v3r17
Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%*
Amean fault-both-3 2958.27 ( 0.00%) 2965.68 ( -0.25%)
Amean fault-both-5 4091.90 ( 0.00%) 3995.90 ( 2.35%)
Amean fault-both-7 5803.05 ( 0.00%) 5842.12 ( -0.67%)
Amean fault-both-12 9481.06 ( 0.00%) 9550.87 ( -0.74%)
Amean fault-both-18 14141.51 ( 0.00%) 13304.72 ( 5.92%)
Amean fault-both-24 16438.00 ( 0.00%) 14618.59 ( 11.07%)
Amean fault-both-30 17531.72 ( 0.00%) 16650.96 ( 5.02%)
Amean fault-both-32 17101.96 ( 0.00%) 17145.15 ( -0.25%)
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 23 ++++-------------------
1 file changed, 4 insertions(+), 19 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 9aa71945255d..293d9a9e6f00 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -404,21 +404,6 @@ static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags,
return true;
}
-/*
- * Aside from avoiding lock contention, compaction also periodically checks
- * need_resched() and records async compaction as contended if necessary.
- */
-static inline void compact_check_resched(struct compact_control *cc)
-{
- /* async compaction aborts if contended */
- if (need_resched()) {
- if (cc->mode == MIGRATE_ASYNC)
- cc->contended = true;
-
- cond_resched();
- }
-}
-
/*
* Compaction requires the taking of some coarse locks that are potentially
* very heavily contended. The lock should be periodically unlocked to avoid
@@ -447,7 +432,7 @@ static bool compact_unlock_should_abort(spinlock_t *lock,
return true;
}
- compact_check_resched(cc);
+ cond_resched();
return false;
}
@@ -736,7 +721,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
return 0;
}
- compact_check_resched(cc);
+ cond_resched();
if (cc->direct_compaction && (cc->mode == MIGRATE_ASYNC)) {
skip_on_failure = true;
@@ -1370,7 +1355,7 @@ static void isolate_freepages(struct compact_control *cc)
* suitable migration targets, so periodically check resched.
*/
if (!(block_start_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages)))
- compact_check_resched(cc);
+ cond_resched();
page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
zone);
@@ -1664,7 +1649,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
* need to schedule.
*/
if (!(low_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages)))
- compact_check_resched(cc);
+ cond_resched();
page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
zone);
--
2.16.4
As compaction proceeds and creates high-order blocks, the free list
search gets less efficient as the larger blocks are used as compaction
targets. Eventually, the larger blocks will be behind the migration
scanner for partially migrated pageblocks and the search fails. This
patch round-robins what orders are searched so that larger blocks can be
ignored and smaller blocks found that can be used as migration targets.
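The round-robin walk can be illustrated with a standalone userspace
sketch that mirrors the next_search_order() logic in the patch below
(simplified, without the compact_control bookkeeping):
#include <stdio.h>

/* Step to the next order, wrapping below the start, stop after a full lap */
static int next_order(int order, int start, int max_order)
{
        order--;
        if (order < 0)
                order = max_order - 1;

        /* Wrapped back to the starting order: terminate the walk */
        if (order == start)
                return -1;

        return order;
}

int main(void)
{
        int max_order = 9, start = 3;

        /* Visits 3, 2, 1, 0, 8, 7, 6, 5, 4 and then stops */
        for (int order = start; order >= 0;
             order = next_order(order, start, max_order))
                printf("searching order %d\n", order);
        return 0;
}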
The overall impact was small on the 1-socket machine but it avoids corner
cases where the migration/free scanners meet prematurely or situations
where many of the pageblocks encountered by the free scanner are almost
full instead of being properly packed. Previous testing indicated that,
without this patch, there were occasional large spikes in free scanner
activity.
[[email protected]: Fix static checker warning]
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 33 ++++++++++++++++++++++++++++++---
mm/internal.h | 3 ++-
2 files changed, 32 insertions(+), 4 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 04ec7d4da719..7d39ff08269a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1147,6 +1147,24 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long
set_pageblock_skip(page);
}
+/* Search orders in round-robin fashion */
+static int next_search_order(struct compact_control *cc, int order)
+{
+ order--;
+ if (order < 0)
+ order = cc->order - 1;
+
+ /* Search wrapped around? */
+ if (order == cc->search_order) {
+ cc->search_order--;
+ if (cc->search_order < 0)
+ cc->search_order = cc->order - 1;
+ return -1;
+ }
+
+ return order;
+}
+
static unsigned long
fast_isolate_freepages(struct compact_control *cc)
{
@@ -1183,9 +1201,15 @@ fast_isolate_freepages(struct compact_control *cc)
if (WARN_ON_ONCE(min_pfn > low_pfn))
low_pfn = min_pfn;
- for (order = cc->order - 1;
- order >= 0 && !page;
- order--) {
+ /*
+ * Search starts from the last successful isolation order or the next
+ * order to search after a previous failure
+ */
+ cc->search_order = min_t(unsigned int, cc->order - 1, cc->search_order);
+
+ for (order = cc->search_order;
+ !page && order >= 0;
+ order = next_search_order(cc, order)) {
struct free_area *area = &cc->zone->free_area[order];
struct list_head *freelist;
struct page *freepage;
@@ -1209,6 +1233,7 @@ fast_isolate_freepages(struct compact_control *cc)
if (pfn >= low_pfn) {
cc->fast_search_fail = 0;
+ cc->search_order = order;
page = freepage;
break;
}
@@ -2136,6 +2161,7 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
.total_migrate_scanned = 0,
.total_free_scanned = 0,
.order = order,
+ .search_order = order,
.gfp_mask = gfp_mask,
.zone = zone,
.mode = (prio == COMPACT_PRIO_ASYNC) ?
@@ -2367,6 +2393,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
struct zone *zone;
struct compact_control cc = {
.order = pgdat->kcompactd_max_order,
+ .search_order = pgdat->kcompactd_max_order,
.total_migrate_scanned = 0,
.total_free_scanned = 0,
.classzone_idx = pgdat->kcompactd_classzone_idx,
diff --git a/mm/internal.h b/mm/internal.h
index d5b999e5eb5f..31bb0be6fd52 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -192,7 +192,8 @@ struct compact_control {
struct zone *zone;
unsigned long total_migrate_scanned;
unsigned long total_free_scanned;
- unsigned int fast_search_fail; /* failures to use free list searches */
+ unsigned short fast_search_fail;/* failures to use free list searches */
+ short search_order; /* order to start a fast search at */
const gfp_t gfp_mask; /* gfp mask of a direct compactor */
int order; /* order a direct compactor needs */
int migratetype; /* migratetype of direct compactor */
--
2.16.4
Compaction is inherently race-prone as a suitable page freed during
compaction can be allocated by any parallel task. This patch uses a
capture_control structure to isolate a page immediately when it is freed
by a direct compactor in the slow path of the page allocator. The intent
is to avoid redundant scanning.
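The capture idea can be sketched in userspace as follows. The structures
and names are purely illustrative, not the kernel interfaces: when a
block of the order a direct compactor is waiting for gets freed, it is
handed to the compactor instead of going back on the free lists where a
parallel allocation could steal it.
#include <stdio.h>

struct block {
        int order;
};

struct capture {
        int wanted_order;
        struct block *page;     /* set once a matching block is captured */
};

/* Return 1 if the freed block was captured by the waiting compactor */
static int try_capture(struct capture *capc, struct block *blk)
{
        if (!capc || capc->page || blk->order != capc->wanted_order)
                return 0;

        capc->page = blk;
        return 1;
}

int main(void)
{
        struct capture capc = { .wanted_order = 4 };
        struct block freed = { .order = 4 };

        if (try_capture(&capc, &freed))
                printf("captured an order-%d block\n", capc.page->order);
        else
                printf("block returned to the free list\n");
        return 0;
}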
5.0.0-rc1 5.0.0-rc1
selective-v3r17 capture-v3r19
Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%*
Amean fault-both-3 2582.11 ( 0.00%) 2563.68 ( 0.71%)
Amean fault-both-5 4500.26 ( 0.00%) 4233.52 ( 5.93%)
Amean fault-both-7 5819.53 ( 0.00%) 6333.65 ( -8.83%)
Amean fault-both-12 9321.18 ( 0.00%) 9759.38 ( -4.70%)
Amean fault-both-18 9782.76 ( 0.00%) 10338.76 ( -5.68%)
Amean fault-both-24 15272.81 ( 0.00%) 13379.55 * 12.40%*
Amean fault-both-30 15121.34 ( 0.00%) 16158.25 ( -6.86%)
Amean fault-both-32 18466.67 ( 0.00%) 18971.21 ( -2.73%)
Latency is only moderately affected but the devil is in the details. A
closer examination indicates that base page fault latency is reduced but
the latency of huge pages is increased as greater care is needed to
succeed. Part of the "problem" is that allocation success rates are close
to 100% even when under pressure and compaction gets harder.
5.0.0-rc1 5.0.0-rc1
selective-v3r17 capture-v3r19
Percentage huge-3 96.70 ( 0.00%) 98.23 ( 1.58%)
Percentage huge-5 96.99 ( 0.00%) 95.30 ( -1.75%)
Percentage huge-7 94.19 ( 0.00%) 97.24 ( 3.24%)
Percentage huge-12 94.95 ( 0.00%) 97.35 ( 2.53%)
Percentage huge-18 96.74 ( 0.00%) 97.30 ( 0.58%)
Percentage huge-24 97.07 ( 0.00%) 97.55 ( 0.50%)
Percentage huge-30 95.69 ( 0.00%) 98.50 ( 2.95%)
Percentage huge-32 96.70 ( 0.00%) 99.27 ( 2.65%)
And scan rates are reduced as expected by 6% for the migration scanner
and 29% for the free scanner, indicating that there is less redundant work.
Compaction migrate scanned 20815362 19573286
Compaction free scanned 16352612 11510663
Signed-off-by: Mel Gorman <[email protected]>
---
include/linux/compaction.h | 3 +-
include/linux/sched.h | 4 +++
kernel/sched/core.c | 3 ++
mm/compaction.c | 31 ++++++++++++++-----
mm/internal.h | 9 ++++++
mm/page_alloc.c | 74 +++++++++++++++++++++++++++++++++++++++++++---
6 files changed, 112 insertions(+), 12 deletions(-)
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 70d0256edd31..c960923d9ec2 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -93,7 +93,8 @@ extern int sysctl_compact_unevictable_allowed;
extern int fragmentation_index(struct zone *zone, unsigned int order);
extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
unsigned int order, unsigned int alloc_flags,
- const struct alloc_context *ac, enum compact_priority prio);
+ const struct alloc_context *ac, enum compact_priority prio,
+ struct page **page);
extern void reset_isolation_suitable(pg_data_t *pgdat);
extern enum compact_result compaction_suitable(struct zone *zone, int order,
unsigned int alloc_flags, int classzone_idx);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9a46243e6585..5e6690042497 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -47,6 +47,7 @@ struct pid_namespace;
struct pipe_inode_info;
struct rcu_node;
struct reclaim_state;
+struct capture_control;
struct robust_list_head;
struct sched_attr;
struct sched_param;
@@ -964,6 +965,9 @@ struct task_struct {
struct io_context *io_context;
+#ifdef CONFIG_COMPACTION
+ struct capture_control *capture_control;
+#endif
/* Ptrace state: */
unsigned long ptrace_message;
kernel_siginfo_t *last_siginfo;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a674c7db2f29..ae5beb3ed09e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2177,6 +2177,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
INIT_HLIST_HEAD(&p->preempt_notifiers);
#endif
+#ifdef CONFIG_COMPACTION
+ p->capture_control = NULL;
+#endif
init_numa_balancing(clone_flags, p);
}
diff --git a/mm/compaction.c b/mm/compaction.c
index de558f110319..2a6240d940e9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2055,7 +2055,8 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
return false;
}
-static enum compact_result compact_zone(struct compact_control *cc)
+static enum compact_result
+compact_zone(struct compact_control *cc, struct capture_control *capc)
{
enum compact_result ret;
unsigned long start_pfn = cc->zone->zone_start_pfn;
@@ -2224,6 +2225,11 @@ static enum compact_result compact_zone(struct compact_control *cc)
}
}
+ /* Stop if a page has been captured */
+ if (capc && capc->page) {
+ ret = COMPACT_SUCCESS;
+ break;
+ }
}
out:
@@ -2257,7 +2263,8 @@ static enum compact_result compact_zone(struct compact_control *cc)
static enum compact_result compact_zone_order(struct zone *zone, int order,
gfp_t gfp_mask, enum compact_priority prio,
- unsigned int alloc_flags, int classzone_idx)
+ unsigned int alloc_flags, int classzone_idx,
+ struct page **capture)
{
enum compact_result ret;
struct compact_control cc = {
@@ -2278,14 +2285,24 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY)
};
+ struct capture_control capc = {
+ .cc = &cc,
+ .page = NULL,
+ };
+
+ if (capture)
+ current->capture_control = &capc;
INIT_LIST_HEAD(&cc.freepages);
INIT_LIST_HEAD(&cc.migratepages);
- ret = compact_zone(&cc);
+ ret = compact_zone(&cc, &capc);
VM_BUG_ON(!list_empty(&cc.freepages));
VM_BUG_ON(!list_empty(&cc.migratepages));
+ *capture = capc.page;
+ current->capture_control = NULL;
+
return ret;
}
@@ -2303,7 +2320,7 @@ int sysctl_extfrag_threshold = 500;
*/
enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
unsigned int alloc_flags, const struct alloc_context *ac,
- enum compact_priority prio)
+ enum compact_priority prio, struct page **capture)
{
int may_perform_io = gfp_mask & __GFP_IO;
struct zoneref *z;
@@ -2331,7 +2348,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
}
status = compact_zone_order(zone, order, gfp_mask, prio,
- alloc_flags, ac_classzone_idx(ac));
+ alloc_flags, ac_classzone_idx(ac), capture);
rc = max(status, rc);
/* The allocation should succeed, stop compacting */
@@ -2399,7 +2416,7 @@ static void compact_node(int nid)
INIT_LIST_HEAD(&cc.freepages);
INIT_LIST_HEAD(&cc.migratepages);
- compact_zone(&cc);
+ compact_zone(&cc, NULL);
VM_BUG_ON(!list_empty(&cc.freepages));
VM_BUG_ON(!list_empty(&cc.migratepages));
@@ -2534,7 +2551,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
if (kthread_should_stop())
return;
- status = compact_zone(&cc);
+ status = compact_zone(&cc, NULL);
if (status == COMPACT_SUCCESS) {
compaction_defer_reset(zone, cc.order, false);
diff --git a/mm/internal.h b/mm/internal.h
index 31bb0be6fd52..9eeaf2b95166 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -209,6 +209,15 @@ struct compact_control {
bool rescan; /* Rescanning the same pageblock */
};
+/*
+ * Used in direct compaction when a page should be taken from the freelists
+ * immediately when one is created during the free path.
+ */
+struct capture_control {
+ struct compact_control *cc;
+ struct page *page;
+};
+
unsigned long
isolate_freepages_range(struct compact_control *cc,
unsigned long start_pfn, unsigned long end_pfn);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6607cb7131b0..d61174bb0333 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -786,6 +786,57 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
return 0;
}
+#ifdef CONFIG_COMPACTION
+static inline struct capture_control *task_capc(struct zone *zone)
+{
+ struct capture_control *capc = current->capture_control;
+
+ return capc &&
+ !(current->flags & PF_KTHREAD) &&
+ !capc->page &&
+ capc->cc->zone == zone &&
+ capc->cc->direct_compaction ? capc : NULL;
+}
+
+static inline bool
+compaction_capture(struct capture_control *capc, struct page *page,
+ int order, int migratetype)
+{
+ if (!capc || order != capc->cc->order)
+ return false;
+
+ /* Do not accidentally pollute CMA or isolated regions */
+ if (is_migrate_cma(migratetype) ||
+ is_migrate_isolate(migratetype))
+ return false;
+
+ /*
+ * Do not let lower order allocations pollute a movable pageblock.
+ * This might let an unmovable request use a reclaimable pageblock
+ * and vice-versa but no more than normal fallback logic which can
+ * have trouble finding a high-order free page.
+ */
+ if (order < pageblock_order && migratetype == MIGRATE_MOVABLE)
+ return false;
+
+ capc->page = page;
+ return true;
+}
+
+#else
+static inline struct capture_control *task_capc(struct zone *zone)
+{
+ return NULL;
+}
+
+static inline bool
+compaction_capture(struct capture_control *capc, struct page *page,
+ int order, int migratetype)
+{
+ return false;
+}
+#endif /* CONFIG_COMPACTION */
+
/*
* Freeing function for a buddy system allocator.
*
@@ -819,6 +870,7 @@ static inline void __free_one_page(struct page *page,
unsigned long uninitialized_var(buddy_pfn);
struct page *buddy;
unsigned int max_order;
+ struct capture_control *capc = task_capc(zone);
max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
@@ -834,6 +886,12 @@ static inline void __free_one_page(struct page *page,
continue_merging:
while (order < max_order - 1) {
+ if (compaction_capture(capc, page, order, migratetype)) {
+ if (likely(!is_migrate_isolate(migratetype)))
+ __mod_zone_freepage_state(zone, -(1 << order),
+ migratetype);
+ return;
+ }
buddy_pfn = __find_buddy_pfn(pfn, order);
buddy = page + (buddy_pfn - pfn);
@@ -3819,7 +3877,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
unsigned int alloc_flags, const struct alloc_context *ac,
enum compact_priority prio, enum compact_result *compact_result)
{
- struct page *page;
+ struct page *page = NULL;
unsigned long pflags;
unsigned int noreclaim_flag;
@@ -3830,13 +3888,15 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
noreclaim_flag = memalloc_noreclaim_save();
*compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
- prio);
+ prio, &page);
memalloc_noreclaim_restore(noreclaim_flag);
psi_memstall_leave(&pflags);
- if (*compact_result <= COMPACT_INACTIVE)
+ if (*compact_result <= COMPACT_INACTIVE) {
+ WARN_ON_ONCE(page);
return NULL;
+ }
/*
* At least in one zone compaction wasn't deferred or skipped, so let's
@@ -3844,7 +3904,13 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
*/
count_vm_event(COMPACTSTALL);
- page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
+ /* Prep a captured page if available */
+ if (page)
+ prep_new_page(page, order, gfp_mask, alloc_flags);
+
+ /* Try to get a page from the freelist if available */
+ if (!page)
+ page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
if (page) {
struct zone *zone = page_zone(page);
--
2.16.4
Once fast searching finishes, there is a possibility that the linear
scanner is scanning full blocks found by the fast scanner earlier. This
patch uses an adaptive stride to sample pageblocks for free pages. The
more consecutive full pageblocks encountered, the larger the stride until
a pageblock with free pages is found. The scanners might meet slightly
sooner but it is an acceptable risk given that the search of the free
lists may still encounter the pages and adjust the cached PFN of the free
scanner accordingly.
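A userspace sketch of the adaptive stride (array sizes and limits are
invented for illustration): every pageblock is sampled at the current
stride, the stride doubles after a block with no free pages and drops
back to 1 as soon as free pages are found.
#include <stdbool.h>
#include <stdio.h>

#define CLUSTER_MAX     32
#define BLOCK_PAGES     512

/* Count free pages in one block by sampling every 'stride' pages */
static int sample_block(const bool *free, int stride)
{
        int found = 0;

        for (int i = 0; i < BLOCK_PAGES; i += stride)
                found += free[i];

        return found;
}

int main(void)
{
        bool blocks[4][BLOCK_PAGES] = { { false } };    /* mostly-full zone */
        int stride = 1;

        blocks[3][0] = true;                    /* one block has a free page */

        for (int b = 0; b < 4; b++) {
                int nr = sample_block(blocks[b], stride);

                printf("block %d: stride %d, found %d\n", b, stride, nr);
                if (nr)
                        stride = 1;             /* free pages: scan densely */
                else if (stride < CLUSTER_MAX)
                        stride *= 2;            /* full block: sample sparser */
        }
        return 0;
}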
5.0.0-rc1 5.0.0-rc1
roundrobin-v3r17 samplefree-v3r17
Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%*
Amean fault-both-3 2752.37 ( 0.00%) 2729.95 ( 0.81%)
Amean fault-both-5 4341.69 ( 0.00%) 4397.80 ( -1.29%)
Amean fault-both-7 6308.75 ( 0.00%) 6097.61 ( 3.35%)
Amean fault-both-12 10241.81 ( 0.00%) 9407.15 ( 8.15%)
Amean fault-both-18 13736.09 ( 0.00%) 10857.63 * 20.96%*
Amean fault-both-24 16853.95 ( 0.00%) 13323.24 * 20.95%*
Amean fault-both-30 15862.61 ( 0.00%) 17345.44 ( -9.35%)
Amean fault-both-32 18450.85 ( 0.00%) 16892.00 ( 8.45%)
The latency is mildly improved, offsetting some overhead from earlier
patches that are prerequisites for the rest of the series. However, the
major impact is on the free scan rate, which is reduced by 82%.
5.0.0-rc1 5.0.0-rc1
roundrobin-v3r17 samplefree-v3r17
Compaction migrate scanned 21607271 20116887
Compaction free scanned 95336406 16668703
It's also the first time in the series that the migration scanner scans
more pages than the free scanner, due to the increased search efficiency.
Signed-off-by: Mel Gorman <[email protected]>
---
mm/compaction.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 7d39ff08269a..74bf620e3dcd 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -440,6 +440,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
unsigned long *start_pfn,
unsigned long end_pfn,
struct list_head *freelist,
+ unsigned int stride,
bool strict)
{
int nr_scanned = 0, total_isolated = 0;
@@ -449,10 +450,14 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
unsigned long blockpfn = *start_pfn;
unsigned int order;
+ /* Strict mode is for isolation, speed is secondary */
+ if (strict)
+ stride = 1;
+
cursor = pfn_to_page(blockpfn);
/* Isolate free pages. */
- for (; blockpfn < end_pfn; blockpfn++, cursor++) {
+ for (; blockpfn < end_pfn; blockpfn += stride, cursor += stride) {
int isolated;
struct page *page = cursor;
@@ -614,7 +619,7 @@ isolate_freepages_range(struct compact_control *cc,
break;
isolated = isolate_freepages_block(cc, &isolate_start_pfn,
- block_end_pfn, &freelist, true);
+ block_end_pfn, &freelist, 0, true);
/*
* In strict mode, isolate_freepages_block() returns 0 if
@@ -1132,7 +1137,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long
/* Scan before */
if (start_pfn != pfn) {
- isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, false);
+ isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, 1, false);
if (cc->nr_freepages >= cc->nr_migratepages)
return;
}
@@ -1140,7 +1145,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long
/* Scan after */
start_pfn = pfn + nr_isolated;
if (start_pfn != end_pfn)
- isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, false);
+ isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
/* Skip this pageblock in the future as it's full or nearly full */
if (cc->nr_freepages < cc->nr_migratepages)
@@ -1332,6 +1337,7 @@ static void isolate_freepages(struct compact_control *cc)
unsigned long block_end_pfn; /* end of current pageblock */
unsigned long low_pfn; /* lowest pfn scanner is able to scan */
struct list_head *freelist = &cc->freepages;
+ unsigned int stride;
/* Try a small search of the free lists for a candidate */
isolate_start_pfn = fast_isolate_freepages(cc);
@@ -1354,6 +1360,7 @@ static void isolate_freepages(struct compact_control *cc)
block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
zone_end_pfn(zone));
low_pfn = pageblock_end_pfn(cc->migrate_pfn);
+ stride = cc->mode == MIGRATE_ASYNC ? COMPACT_CLUSTER_MAX : 1;
/*
* Isolate free pages until enough are available to migrate the
@@ -1364,6 +1371,8 @@ static void isolate_freepages(struct compact_control *cc)
block_end_pfn = block_start_pfn,
block_start_pfn -= pageblock_nr_pages,
isolate_start_pfn = block_start_pfn) {
+ unsigned long nr_isolated;
+
/*
* This can iterate a massively long zone without finding any
* suitable migration targets, so periodically check resched.
@@ -1385,8 +1394,8 @@ static void isolate_freepages(struct compact_control *cc)
continue;
/* Found a block suitable for isolating free pages from. */
- isolate_freepages_block(cc, &isolate_start_pfn, block_end_pfn,
- freelist, false);
+ nr_isolated = isolate_freepages_block(cc, &isolate_start_pfn,
+ block_end_pfn, freelist, stride, false);
/* Update the skip hint if the full pageblock was scanned */
if (isolate_start_pfn == block_end_pfn)
@@ -1410,6 +1419,13 @@ static void isolate_freepages(struct compact_control *cc)
*/
break;
}
+
+ /* Adjust stride depending on isolation */
+ if (nr_isolated) {
+ stride = 1;
+ continue;
+ }
+ stride = min_t(unsigned int, COMPACT_CLUSTER_MAX, stride << 1);
}
/*
--
2.16.4
Pageblock hints are cleared when compaction restarts or kswapd makes enough
progress that it can sleep but it's over-eager in that the bit is cleared
for migration sources with no LRU pages and migration targets with no free
pages. As pageblock skip hint flushes are relatively rare and out-of-band
with respect to kswapd, this patch makes a few more expensive checks to
see if it's appropriate to even clear the bit. Every pageblock that is
not cleared will avoid 512 pages being scanned unnecessarily on x86-64.
The impact is variable with different workloads showing small differences
in latency, success rates and scan rates. This is expected as clearing
the hints is not that common but doing a small amount of work out-of-band
to avoid a large amount of work in-band later is generally a good thing.
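A userspace sketch of the sampled check (the page_state enum and the
step size are illustrative assumptions, not the kernel types): the block
is stepped through at a coarse granularity and the skip hint is only
cleared if a sample looks like a possible migration source (LRU) or
target (free).
#include <stdbool.h>
#include <stdio.h>

#define BLOCK_PAGES     512
#define SAMPLE_STEP     8       /* stand-in for 1 << PAGE_ALLOC_COSTLY_ORDER */

enum page_state { PAGE_OTHER, PAGE_LRU, PAGE_FREE };

/* Clear the hint only if a sampled page could be a source or target */
static bool should_clear_skip(const enum page_state *block,
                              bool check_source, bool check_target)
{
        for (int i = 0; i < BLOCK_PAGES; i += SAMPLE_STEP) {
                if (check_source && block[i] == PAGE_LRU)
                        return true;
                if (check_target && block[i] == PAGE_FREE)
                        return true;
        }

        return false;   /* nothing usable sampled: keep the skip hint */
}

int main(void)
{
        enum page_state block[BLOCK_PAGES] = { PAGE_OTHER };

        block[16] = PAGE_LRU;
        printf("clear for source scan: %d\n", should_clear_skip(block, true, false));
        printf("clear for target scan: %d\n", should_clear_skip(block, false, true));
        return 0;
}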
Signed-off-by: Mel Gorman <[email protected]>
---
include/linux/mmzone.h | 2 +
mm/compaction.c | 125 ++++++++++++++++++++++++++++++++++++++++++-------
2 files changed, 109 insertions(+), 18 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 842f9189537b..90c13cdeefb5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -480,6 +480,8 @@ struct zone {
unsigned long compact_cached_free_pfn;
/* pfn where async and sync compaction migration scanner should start */
unsigned long compact_cached_migrate_pfn[2];
+ unsigned long compact_init_migrate_pfn;
+ unsigned long compact_init_free_pfn;
#endif
#ifdef CONFIG_COMPACTION
diff --git a/mm/compaction.c b/mm/compaction.c
index 74bf620e3dcd..de558f110319 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -237,6 +237,71 @@ static bool pageblock_skip_persistent(struct page *page)
return false;
}
+static bool
+__reset_isolation_pfn(struct zone *zone, unsigned long pfn, bool check_source,
+ bool check_target)
+{
+ struct page *page = pfn_to_online_page(pfn);
+ struct page *end_page;
+ unsigned long block_pfn;
+
+ if (!page)
+ return false;
+ if (zone != page_zone(page))
+ return false;
+ if (pageblock_skip_persistent(page))
+ return false;
+
+ /*
+ * If skip is already cleared do no further checking once the
+ * restart points have been set.
+ */
+ if (check_source && check_target && !get_pageblock_skip(page))
+ return true;
+
+ /*
+ * If clearing skip for the target scanner, do not select a
+ * non-movable pageblock as the starting point.
+ */
+ if (!check_source && check_target &&
+ get_pageblock_migratetype(page) != MIGRATE_MOVABLE)
+ return false;
+
+ /*
+ * Only clear the hint if a sample indicates there is either a
+ * free page or an LRU page in the block. One or other condition
+ * is necessary for the block to be a migration source/target.
+ */
+ block_pfn = pageblock_start_pfn(pfn);
+ pfn = max(block_pfn, zone->zone_start_pfn);
+ page = pfn_to_page(pfn);
+ if (zone != page_zone(page))
+ return false;
+ pfn = block_pfn + pageblock_nr_pages;
+ pfn = min(pfn, zone_end_pfn(zone));
+ end_page = pfn_to_page(pfn);
+
+ do {
+ if (!pfn_valid_within(pfn))
+ continue;
+
+ if (check_source && PageLRU(page)) {
+ clear_pageblock_skip(page);
+ return true;
+ }
+
+ if (check_target && PageBuddy(page)) {
+ clear_pageblock_skip(page);
+ return true;
+ }
+
+ page += (1 << PAGE_ALLOC_COSTLY_ORDER);
+ pfn += (1 << PAGE_ALLOC_COSTLY_ORDER);
+ } while (page < end_page);
+
+ return false;
+}
+
/*
* This function is called to clear all cached information on pageblocks that
* should be skipped for page isolation when the migrate and free page scanner
@@ -244,30 +309,54 @@ static bool pageblock_skip_persistent(struct page *page)
*/
static void __reset_isolation_suitable(struct zone *zone)
{
- unsigned long start_pfn = zone->zone_start_pfn;
- unsigned long end_pfn = zone_end_pfn(zone);
- unsigned long pfn;
+ unsigned long migrate_pfn = zone->zone_start_pfn;
+ unsigned long free_pfn = zone_end_pfn(zone);
+ unsigned long reset_migrate = free_pfn;
+ unsigned long reset_free = migrate_pfn;
+ bool source_set = false;
+ bool free_set = false;
- zone->compact_blockskip_flush = false;
+ if (!zone->compact_blockskip_flush)
+ return;
- /* Walk the zone and mark every pageblock as suitable for isolation */
- for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
- struct page *page;
+ zone->compact_blockskip_flush = false;
+ /*
+ * Walk the zone and update pageblock skip information. Source looks
+ * for PageLRU while target looks for PageBuddy. When the scanner
+ * is found, both PageBuddy and PageLRU are checked as the pageblock
+ * is suitable as both source and target.
+ */
+ for (; migrate_pfn < free_pfn; migrate_pfn += pageblock_nr_pages,
+ free_pfn -= pageblock_nr_pages) {
cond_resched();
- page = pfn_to_online_page(pfn);
- if (!page)
- continue;
- if (zone != page_zone(page))
- continue;
- if (pageblock_skip_persistent(page))
- continue;
+ /* Update the migrate PFN */
+ if (__reset_isolation_pfn(zone, migrate_pfn, true, source_set) &&
+ migrate_pfn < reset_migrate) {
+ source_set = true;
+ reset_migrate = migrate_pfn;
+ zone->compact_init_migrate_pfn = reset_migrate;
+ zone->compact_cached_migrate_pfn[0] = reset_migrate;
+ zone->compact_cached_migrate_pfn[1] = reset_migrate;
+ }
- clear_pageblock_skip(page);
+ /* Update the free PFN */
+ if (__reset_isolation_pfn(zone, free_pfn, free_set, true) &&
+ free_pfn > reset_free) {
+ free_set = true;
+ reset_free = free_pfn;
+ zone->compact_init_free_pfn = reset_free;
+ zone->compact_cached_free_pfn = reset_free;
+ }
}
- reset_cached_positions(zone);
+ /* Leave no distance if no suitable block was reset */
+ if (reset_migrate >= reset_free) {
+ zone->compact_cached_migrate_pfn[0] = migrate_pfn;
+ zone->compact_cached_migrate_pfn[1] = migrate_pfn;
+ zone->compact_cached_free_pfn = free_pfn;
+ }
}
void reset_isolation_suitable(pg_data_t *pgdat)
@@ -1190,7 +1279,7 @@ fast_isolate_freepages(struct compact_control *cc)
* If starting the scan, use a deeper search and use the highest
* PFN found if a suitable one is not found.
*/
- if (cc->free_pfn == pageblock_start_pfn(zone_end_pfn(cc->zone) - 1)) {
+ if (cc->free_pfn >= cc->zone->compact_init_free_pfn) {
limit = pageblock_nr_pages >> 1;
scan_start = true;
}
@@ -2015,7 +2104,7 @@ static enum compact_result compact_zone(struct compact_control *cc)
cc->zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
}
- if (cc->migrate_pfn == start_pfn)
+ if (cc->migrate_pfn <= cc->zone->compact_init_migrate_pfn)
cc->whole_zone = true;
}
--
2.16.4
The fast isolation of free pages allows the cached PFN of the free
scanner to advance faster than necessary depending on the contents
of the free list. The key is that fast_isolate_freepages() can update
zone->compact_cached_free_pfn via isolate_freepages_block(). When the
fast search fails, the linear scan can start from a point that has skipped
valid migration targets, particularly pageblocks with just low-order
free pages. This can cause the migration source/target scanners to meet
prematurely, causing a reset.
This patch starts by avoiding an update of the pageblock skip information
and cached PFN from isolate_freepages_block() and puts the responsibility
of updating that information in the callers. The fast scanner will update
the cached PFN if and only if it finds a block that is higher than the
existing cached PFN and sets the skip if the pageblock is full or nearly
full. The linear scanner will update skipped information and the cached
PFN only when a block is completely scanned. The total impact is that
the free scanner advances more slowly as it is primarily driven by the
linear scanner instead of the fast search.
5.0.0-rc1 5.0.0-rc1
noresched-v3r17 slowfree-v3r17
Amean fault-both-3 2965.68 ( 0.00%) 3036.75 ( -2.40%)
Amean fault-both-5 3995.90 ( 0.00%) 4522.24 * -13.17%*
Amean fault-both-7 5842.12 ( 0.00%) 6365.35 ( -8.96%)
Amean fault-both-12 9550.87 ( 0.00%) 10340.93 ( -8.27%)
Amean fault-both-18 13304.72 ( 0.00%) 14732.46 ( -10.73%)
Amean fault-both-24 14618.59 ( 0.00%) 16288.96 ( -11.43%)
Amean fault-both-30 16650.96 ( 0.00%) 16346.21 ( 1.83%)
Amean fault-both-32 17145.15 ( 0.00%) 19317.49 ( -12.67%)
The impact on latency is higher than in the last version but it appears
to be due to a slight increase in the free scan rates, which is a
potential side-effect of the patch. However, this is necessary for later
patches that are more careful about how pageblocks are treated, as
earlier iterations of those patches hit corner cases where the restarts
were punishing and very visible.
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/compaction.c | 27 ++++++++++-----------------
1 file changed, 10 insertions(+), 17 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 293d9a9e6f00..04ec7d4da719 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -330,10 +330,9 @@ static void update_cached_migrate(struct compact_control *cc, unsigned long pfn)
* future. The information is later cleared by __reset_isolation_suitable().
*/
static void update_pageblock_skip(struct compact_control *cc,
- struct page *page, unsigned long nr_isolated)
+ struct page *page, unsigned long pfn)
{
struct zone *zone = cc->zone;
- unsigned long pfn;
if (cc->no_set_skip_hint)
return;
@@ -341,13 +340,8 @@ static void update_pageblock_skip(struct compact_control *cc,
if (!page)
return;
- if (nr_isolated)
- return;
-
set_pageblock_skip(page);
- pfn = page_to_pfn(page);
-
/* Update where async and sync compaction should restart */
if (pfn < zone->compact_cached_free_pfn)
zone->compact_cached_free_pfn = pfn;
@@ -365,7 +359,7 @@ static inline bool pageblock_skip_persistent(struct page *page)
}
static inline void update_pageblock_skip(struct compact_control *cc,
- struct page *page, unsigned long nr_isolated)
+ struct page *page, unsigned long pfn)
{
}
@@ -449,7 +443,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
bool strict)
{
int nr_scanned = 0, total_isolated = 0;
- struct page *cursor, *valid_page = NULL;
+ struct page *cursor;
unsigned long flags = 0;
bool locked = false;
unsigned long blockpfn = *start_pfn;
@@ -476,9 +470,6 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
if (!pfn_valid_within(blockpfn))
goto isolate_fail;
- if (!valid_page)
- valid_page = page;
-
/*
* For compound pages such as THP and hugetlbfs, we can save
* potentially a lot of iterations if we skip them at once.
@@ -566,10 +557,6 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
if (strict && blockpfn < end_pfn)
total_isolated = 0;
- /* Update the pageblock-skip if the whole pageblock was scanned */
- if (blockpfn == end_pfn)
- update_pageblock_skip(cc, valid_page, total_isolated);
-
cc->total_free_scanned += nr_scanned;
if (total_isolated)
count_compact_events(COMPACTISOLATED, total_isolated);
@@ -1293,8 +1280,10 @@ fast_isolate_freepages(struct compact_control *cc)
}
}
- if (highest && highest > cc->zone->compact_cached_free_pfn)
+ if (highest && highest >= cc->zone->compact_cached_free_pfn) {
+ highest -= pageblock_nr_pages;
cc->zone->compact_cached_free_pfn = highest;
+ }
cc->total_free_scanned += nr_scanned;
if (!page)
@@ -1374,6 +1363,10 @@ static void isolate_freepages(struct compact_control *cc)
isolate_freepages_block(cc, &isolate_start_pfn, block_end_pfn,
freelist, false);
+ /* Update the skip hint if the full pageblock was scanned */
+ if (isolate_start_pfn == block_end_pfn)
+ update_pageblock_skip(cc, page, block_start_pfn);
+
/* Are enough freepages isolated? */
if (cc->nr_freepages >= cc->nr_migratepages) {
if (isolate_start_pfn >= block_end_pfn) {
--
2.16.4
On Fri, Jan 18, 2019 at 05:51:14PM +0000, Mel Gorman wrote:
> This is a drop-in replacement for the series currently in Andrews tree that
> incorporates static checking and compile warning fixes (Dan, YueHaibing)
> and extensive review feedback from Vlastimil. Big thanks to Vlastimil as
> the review was extremely detailed and a number of issues were caught. Not
> all the patches have been acked but I think an update is still worthwhile.
>
> Andrew, please drop the series you have and replace it with the following
> on the off-chance we get bug reports that are fixed already. Doing this
> with -fix patches would be relatively painful for little gain.
>
Andrew?
--
Mel Gorman
SUSE Labs
On 1/18/19 6:51 PM, Mel Gorman wrote:
...
> + for (order = cc->order - 1;
> + order >= PAGE_ALLOC_COSTLY_ORDER && pfn == cc->migrate_pfn && nr_scanned < limit;
> + order--) {
> + struct free_area *area = &cc->zone->free_area[order];
> + struct list_head *freelist;
> + unsigned long flags;
> + struct page *freepage;
> +
> + if (!area->nr_free)
> + continue;
> +
> + spin_lock_irqsave(&cc->zone->lock, flags);
> + freelist = &area->free_list[MIGRATE_MOVABLE];
> + list_for_each_entry(freepage, freelist, lru) {
> + unsigned long free_pfn;
> +
> + nr_scanned++;
> + free_pfn = page_to_pfn(freepage);
> + if (free_pfn < high_pfn) {
> + update_fast_start_pfn(cc, free_pfn);
Shouldn't this update go below checking pageblock skip bit? We might be
caching pageblocks that will be skipped, and also potentially going
backwards from the original cc->migrate_pfn, which could perhaps explain
the reported kcompactd loops?
> +
> + /*
> + * Avoid if skipped recently. Ideally it would
> + * move to the tail but even safe iteration of
> + * the list assumes an entry is deleted, not
> + * reordered.
> + */
> + if (get_pageblock_skip(freepage)) {
> + if (list_is_last(freelist, &freepage->lru))
> + break;
> +
> + continue;
> + }
> +
> + /* Reorder to so a future search skips recent pages */
> + move_freelist_tail(freelist, freepage);
> +
> + pfn = pageblock_start_pfn(free_pfn);
> + cc->fast_search_fail = 0;
> + set_pageblock_skip(freepage);
> + break;
> + }
> +
> + if (nr_scanned >= limit) {
> + cc->fast_search_fail++;
> + move_freelist_tail(freelist, freepage);
> + break;
> + }
> + }
> + spin_unlock_irqrestore(&cc->zone->lock, flags);
> + }
> +
> + cc->total_migrate_scanned += nr_scanned;
> +
> + /*
> + * If fast scanning failed then use a cached entry for a page block
> + * that had free pages as the basis for starting a linear scan.
> + */
> + if (pfn == cc->migrate_pfn)
> + reinit_migrate_pfn(cc);
This will set cc->migrate_pfn to the lowest pfn encountered, yet return
pfn initialized by original cc->migrate_pfn.
AFAICS isolate_migratepages() will use the returned pfn for the linear
scan and then overwrite cc->migrate_pfn with wherever it advanced from
there. So whatever we stored here into cc->migrate_pfn will never get
actually used, except when isolate_migratepages() returns with
ISOLATED_ABORT.
So maybe the infinite kcompactd loop is linked to ISOLATED_ABORT?
> +
> + return pfn;
> +}
> +
> /*
> * Isolate all pages that can be migrated from the first suitable block,
> * starting at the block pointed to by the migrate scanner pfn within
> @@ -1222,16 +1381,25 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> const isolate_mode_t isolate_mode =
> (sysctl_compact_unevictable_allowed ? ISOLATE_UNEVICTABLE : 0) |
> (cc->mode != MIGRATE_SYNC ? ISOLATE_ASYNC_MIGRATE : 0);
> + bool fast_find_block;
>
> /*
> * Start at where we last stopped, or beginning of the zone as
> - * initialized by compact_zone()
> + * initialized by compact_zone(). The first failure will use
> + * the lowest PFN as the starting point for linear scanning.
> */
> - low_pfn = cc->migrate_pfn;
> + low_pfn = fast_find_migrateblock(cc);
> block_start_pfn = pageblock_start_pfn(low_pfn);
> if (block_start_pfn < zone->zone_start_pfn)
> block_start_pfn = zone->zone_start_pfn;
>
> + /*
> + * fast_find_migrateblock marks a pageblock skipped so to avoid
> + * the isolation_suitable check below, check whether the fast
> + * search was successful.
> + */
> + fast_find_block = low_pfn != cc->migrate_pfn && !cc->fast_search_fail;
> +
> /* Only scan within a pageblock boundary */
> block_end_pfn = pageblock_end_pfn(low_pfn);
>
> @@ -1240,6 +1408,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> * Do not cross the free scanner.
> */
> for (; block_end_pfn <= cc->free_pfn;
> + fast_find_block = false,
> low_pfn = block_end_pfn,
> block_start_pfn = block_end_pfn,
> block_end_pfn += pageblock_nr_pages) {
> @@ -1259,7 +1428,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> continue;
>
> /* If isolation recently failed, do not retry */
> - if (!isolation_suitable(cc, page))
> + if (!isolation_suitable(cc, page) && !fast_find_block)
> continue;
>
> /*
> @@ -1550,6 +1719,7 @@ static enum compact_result compact_zone(struct compact_control *cc)
> * want to compact the whole zone), but check that it is initialised
> * by ensuring the values are within zone boundaries.
> */
> + cc->fast_start_pfn = 0;
> if (cc->whole_zone) {
> cc->migrate_pfn = start_pfn;
> cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
> diff --git a/mm/internal.h b/mm/internal.h
> index 9b32f4cab0ae..983cb975545f 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -188,9 +188,11 @@ struct compact_control {
> unsigned int nr_migratepages; /* Number of pages to migrate */
> unsigned long free_pfn; /* isolate_freepages search base */
> unsigned long migrate_pfn; /* isolate_migratepages search base */
> + unsigned long fast_start_pfn; /* a pfn to start linear scan from */
> struct zone *zone;
> unsigned long total_migrate_scanned;
> unsigned long total_free_scanned;
> + unsigned int fast_search_fail; /* failures to use free list searches */
> const gfp_t gfp_mask; /* gfp mask of a direct compactor */
> int order; /* order a direct compactor needs */
> int migratetype; /* migratetype of direct compactor */
>
On 1/31/19 2:55 PM, Vlastimil Babka wrote:
> On 1/18/19 6:51 PM, Mel Gorman wrote:
> ...
>
>> + for (order = cc->order - 1;
>> + order >= PAGE_ALLOC_COSTLY_ORDER && pfn == cc->migrate_pfn && nr_scanned < limit;
>> + order--) {
>> + struct free_area *area = &cc->zone->free_area[order];
>> + struct list_head *freelist;
>> + unsigned long flags;
>> + struct page *freepage;
>> +
>> + if (!area->nr_free)
>> + continue;
>> +
>> + spin_lock_irqsave(&cc->zone->lock, flags);
>> + freelist = &area->free_list[MIGRATE_MOVABLE];
>> + list_for_each_entry(freepage, freelist, lru) {
>> + unsigned long free_pfn;
>> +
>> + nr_scanned++;
>> + free_pfn = page_to_pfn(freepage);
>> + if (free_pfn < high_pfn) {
>> + update_fast_start_pfn(cc, free_pfn);
>
> Shouldn't this update go below checking pageblock skip bit? We might be
> caching pageblocks that will be skipped, and also potentially going
Ah that move happens in the next patch.
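A tiny standalone sketch of the ordering being suggested, with block_is_skipped()
and cache_start_pfn() as invented stand-ins for get_pageblock_skip() and
update_fast_start_pfn() rather than the real kernel functions:

#include <stdbool.h>
#include <stdio.h>

/* Stand-in for get_pageblock_skip(): pretend odd pageblocks were skipped. */
static bool block_is_skipped(unsigned long pfn)
{
        return (pfn / 512) & 1;
}

/* Stand-in for update_fast_start_pfn(): remember the lowest candidate. */
static void cache_start_pfn(unsigned long *cached, unsigned long pfn)
{
        if (pfn < *cached)
                *cached = pfn;
}

int main(void)
{
        unsigned long cached = ~0UL;
        unsigned long candidates[] = { 512, 1024, 1536, 2048 };

        for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
                /* Check the skip hint first... */
                if (block_is_skipped(candidates[i]))
                        continue;
                /* ...and only then cache the block as a linear scan start. */
                cache_start_pfn(&cached, candidates[i]);
        }
        printf("cached start pfn: %lu\n", cached);
        return 0;
}

The point is simply that a block already marked for skipping never ends up as
the cached restart point.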
On 1/18/19 6:51 PM, Mel Gorman wrote:
> Similar to the migration scanner, this patch uses the free lists to quickly
> locate a migration target. The search is different in that lower orders
> will be searched for a suitable high PFN if necessary but the search
> is still bound. This is justified on the grounds that the free scanner
> typically scans linearly much more than the migration scanner.
>
> If a free page is found, it is isolated and compaction continues if enough
> pages were isolated. For SYNC* scanning, the full pageblock is scanned
> for any remaining free pages so that it can be marked for skipping in
> the near future.
>
> 1-socket thpfioscale
> 5.0.0-rc1 5.0.0-rc1
> isolmig-v3r15 findfree-v3r16
> Amean fault-both-3 3024.41 ( 0.00%) 3200.68 ( -5.83%)
> Amean fault-both-5 4749.30 ( 0.00%) 4847.75 ( -2.07%)
> Amean fault-both-7 6454.95 ( 0.00%) 6658.92 ( -3.16%)
> Amean fault-both-12 10324.83 ( 0.00%) 11077.62 ( -7.29%)
> Amean fault-both-18 12896.82 ( 0.00%) 12403.97 ( 3.82%)
> Amean fault-both-24 13470.60 ( 0.00%) 15607.10 * -15.86%*
> Amean fault-both-30 17143.99 ( 0.00%) 18752.27 ( -9.38%)
> Amean fault-both-32 17743.91 ( 0.00%) 21207.54 * -19.52%*
>
> The impact on latency is variable but the search is optimistic and
> sensitive to the exact system state. Success rates are similar but
> the major impact is to the rate of scanning
>
> 5.0.0-rc1 5.0.0-rc1
> isolmig-v3r15 findfree-v3r16
> Compaction migrate scanned 25646769 29507205
> Compaction free scanned 201558184 100359571
>
> The free scan rates are reduced by 50%. The 2-socket reductions for the
> free scanner are more dramatic which is a likely reflection that the
> machine has more memory.
>
> [[email protected]: Fix static checker warning]
> [[email protected]: Correct number of pages scanned for lower orders]
> Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Small fix below:
> -/* Reorder the free list to reduce repeated future searches */
> +/*
> + * Used when scanning for a suitable migration target which scans freelists
> + * in reverse. Reorders the list such as the unscanned pages are scanned
> + * first on the next iteration of the free scanner
> + */
> +static void
> +move_freelist_head(struct list_head *freelist, struct page *freepage)
> +{
> + LIST_HEAD(sublist);
> +
> + if (!list_is_last(freelist, &freepage->lru)) {
Shouldn't there be list_is_first() for symmetry?
> + list_cut_before(&sublist, freelist, &freepage->lru);
> + if (!list_empty(&sublist))
> + list_splice_tail(&sublist, freelist);
> + }
> +}
> +
> +/*
> + * Similar to move_freelist_head except used by the migration scanner
> + * when scanning forward. It's possible for these list operations to
> + * move against each other if they search the free list exactly in
> + * lockstep.
> + */
> static void
> move_freelist_tail(struct list_head *freelist, struct page *freepage)
> {
On 1/18/19 6:51 PM, Mel Gorman wrote:
> Once fast searching finishes, there is a possibility that the linear
> scanner is scanning full blocks found by the fast scanner earlier. This
> patch uses an adaptive stride to sample pageblocks for free pages. The
> more consecutive full pageblocks encountered, the larger the stride until
> a pageblock with free pages is found. The scanners might meet slightly
> sooner but it is an acceptable risk given that the search of the free
> lists may still encounter the pages and adjust the cached PFN of the free
> scanner accordingly.
>
> 5.0.0-rc1 5.0.0-rc1
> roundrobin-v3r17 samplefree-v3r17
> Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%*
> Amean fault-both-3 2752.37 ( 0.00%) 2729.95 ( 0.81%)
> Amean fault-both-5 4341.69 ( 0.00%) 4397.80 ( -1.29%)
> Amean fault-both-7 6308.75 ( 0.00%) 6097.61 ( 3.35%)
> Amean fault-both-12 10241.81 ( 0.00%) 9407.15 ( 8.15%)
> Amean fault-both-18 13736.09 ( 0.00%) 10857.63 * 20.96%*
> Amean fault-both-24 16853.95 ( 0.00%) 13323.24 * 20.95%*
> Amean fault-both-30 15862.61 ( 0.00%) 17345.44 ( -9.35%)
> Amean fault-both-32 18450.85 ( 0.00%) 16892.00 ( 8.45%)
>
> The latency is mildly improved, offsetting some overhead from earlier
> patches that are prerequisites for the rest of the series. However,
> a major impact is on the free scan rate with an 82% reduction.
>
> 5.0.0-rc1 5.0.0-rc1
> roundrobin-v3r17 samplefree-v3r17
> Compaction migrate scanned 21607271 20116887
> Compaction free scanned 95336406 16668703
>
> It's also the first time in the series where the number of pages scanned
> by the migration scanner is greater than the free scanner due to the
> increased search efficiency.
>
> Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
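A rough userspace sketch of the adaptive stride described above; block_has_free()
and the block array are invented for illustration and only mirror the shape of
the sampling loop, not the kernel scanner:

#include <stdbool.h>
#include <stdio.h>

#define NR_BLOCKS 64

/* Stand-in predicate: does this pageblock contain any free pages? */
static bool block_has_free(const unsigned int *free_in_block, unsigned int idx)
{
        return free_in_block[idx] > 0;
}

int main(void)
{
        unsigned int free_in_block[NR_BLOCKS] = { 0 };
        unsigned int stride = 1, sampled = 0;

        free_in_block[30] = 17;         /* one block with free pages */

        for (unsigned int idx = 0; idx < NR_BLOCKS; idx += stride) {
                sampled++;
                if (block_has_free(free_in_block, idx)) {
                        printf("free pages in block %u after sampling %u blocks\n",
                               idx, sampled);
                        stride = 1;     /* drop back to a dense scan from here */
                        continue;
                }
                /* Consecutive full blocks: widen the sampling stride. */
                stride <<= 1;
        }
        return 0;
}

With the data above the loop samples blocks 0, 2, 6, 14 and 30 instead of
walking all 30 of them, which is the saving the changelog is describing.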
On 1/18/19 6:51 PM, Mel Gorman wrote:
> Pageblock hints are cleared when compaction restarts or kswapd makes enough
> progress that it can sleep but it's over-eager in that the bit is cleared
> for migration sources with no LRU pages and migration targets with no free
> pages. As pageblock skip hint flushes are relatively rare and out-of-band
> with respect to kswapd, this patch makes a few more expensive checks to
> see if it's appropriate to even clear the bit. Every pageblock that is
> not cleared will avoid 512 pages being scanned unnecessarily on x86-64.
>
> The impact is variable with different workloads showing small differences
> in latency, success rates and scan rates. This is expected as clearing
> the hints is not that common but doing a small amount of work out-of-band
> to avoid a large amount of work in-band later is generally a good thing.
>
> Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
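A minimal sketch of the conditional clearing described above; block_info and its
flags are an invented per-pageblock summary, whereas the kernel inspects the
pageblock contents directly:

#include <stdbool.h>
#include <stdio.h>

struct block_info {
        bool skip_hint;         /* pageblock skip bit */
        bool has_lru_pages;     /* could serve as a migration source */
        bool has_free_pages;    /* could serve as a migration target */
};

/* Only clear the hint if a scanner could actually use the block again. */
static void maybe_clear_skip(struct block_info *b)
{
        if (!b->skip_hint)
                return;
        if (!b->has_lru_pages && !b->has_free_pages)
                return; /* leave the hint: rescanning would be wasted work */
        b->skip_hint = false;
}

int main(void)
{
        struct block_info useless = { true, false, false };
        struct block_info source  = { true, true,  false };

        maybe_clear_skip(&useless);
        maybe_clear_skip(&source);
        printf("useless block skip=%d, source block skip=%d\n",
               useless.skip_hint, source.skip_hint);
        return 0;
}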
On 1/18/19 6:51 PM, Mel Gorman wrote:
> Compaction is inherently race-prone as a suitable page freed during
> compaction can be allocated by any parallel task. This patch uses a
> capture_control structure to isolate a page immediately when it is freed
> by a direct compactor in the slow path of the page allocator. The intent
> is to avoid redundant scanning.
>
> 5.0.0-rc1 5.0.0-rc1
> selective-v3r17 capture-v3r19
> Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%*
> Amean fault-both-3 2582.11 ( 0.00%) 2563.68 ( 0.71%)
> Amean fault-both-5 4500.26 ( 0.00%) 4233.52 ( 5.93%)
> Amean fault-both-7 5819.53 ( 0.00%) 6333.65 ( -8.83%)
> Amean fault-both-12 9321.18 ( 0.00%) 9759.38 ( -4.70%)
> Amean fault-both-18 9782.76 ( 0.00%) 10338.76 ( -5.68%)
> Amean fault-both-24 15272.81 ( 0.00%) 13379.55 * 12.40%*
> Amean fault-both-30 15121.34 ( 0.00%) 16158.25 ( -6.86%)
> Amean fault-both-32 18466.67 ( 0.00%) 18971.21 ( -2.73%)
>
> Latency is only moderately affected but the devil is in the details.
> A closer examination indicates that base page fault latency is reduced
> but latency of huge pages is increased as it takes greater care to
> succeed. Part of the "problem" is that allocation success rates are close
> to 100% even when under pressure and compaction gets harder
>
> 5.0.0-rc1 5.0.0-rc1
> selective-v3r17 capture-v3r19
> Percentage huge-3 96.70 ( 0.00%) 98.23 ( 1.58%)
> Percentage huge-5 96.99 ( 0.00%) 95.30 ( -1.75%)
> Percentage huge-7 94.19 ( 0.00%) 97.24 ( 3.24%)
> Percentage huge-12 94.95 ( 0.00%) 97.35 ( 2.53%)
> Percentage huge-18 96.74 ( 0.00%) 97.30 ( 0.58%)
> Percentage huge-24 97.07 ( 0.00%) 97.55 ( 0.50%)
> Percentage huge-30 95.69 ( 0.00%) 98.50 ( 2.95%)
> Percentage huge-32 96.70 ( 0.00%) 99.27 ( 2.65%)
>
> And scan rates are reduced as expected by 6% for the migration scanner
> and 29% for the free scanner indicating that there is less redundant work.
>
> Compaction migrate scanned 20815362 19573286
> Compaction free scanned 16352612 11510663
>
> Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Nit below:
...
> @@ -819,6 +870,7 @@ static inline void __free_one_page(struct page *page,
> unsigned long uninitialized_var(buddy_pfn);
> struct page *buddy;
> unsigned int max_order;
> + struct capture_control *capc = task_capc(zone);
>
> max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
>
> @@ -834,6 +886,12 @@ static inline void __free_one_page(struct page *page,
>
> continue_merging:
> while (order < max_order - 1) {
> + if (compaction_capture(capc, page, order, migratetype)) {
> + if (likely(!is_migrate_isolate(migratetype)))
compaction_capture() won't act on isolated migratetype, so this check is
unnecessary?
> + __mod_zone_freepage_state(zone, -(1 << order),
> + migratetype);
> + return;
> + }
> buddy_pfn = __find_buddy_pfn(pfn, order);
> buddy = page + (buddy_pfn - pfn);
>
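A simplified userspace sketch of the capture idea acked above; struct capture_ctl,
try_capture() and free_one_page() are invented stand-ins that ignore migratetypes,
zones and locking, so they only mirror the shape of capture_control and
compaction_capture(), not their implementation:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for struct capture_control. */
struct capture_ctl {
        unsigned int order;     /* order the direct compactor is waiting for */
        void *captured_page;    /* filled in by the free path */
};

/* Would normally hang off the freeing task; a global keeps the sketch small. */
static struct capture_ctl *current_capc;

/* Simplified stand-in for compaction_capture(). */
static bool try_capture(void *page, unsigned int order)
{
        if (!current_capc || current_capc->captured_page)
                return false;
        if (order != current_capc->order)
                return false;
        current_capc->captured_page = page;
        return true;
}

/* Simplified stand-in for the relevant part of __free_one_page(). */
static void free_one_page(void *page, unsigned int order)
{
        if (try_capture(page, order)) {
                printf("order-%u page handed to the compactor\n", order);
                return;
        }
        printf("order-%u page merged back into the free lists\n", order);
}

int main(void)
{
        struct capture_ctl capc = { .order = 9, .captured_page = NULL };
        char page_a, page_b;

        current_capc = &capc;
        free_one_page(&page_a, 3);      /* wrong order: back to the free lists */
        free_one_page(&page_b, 9);      /* matches: captured directly */
        return 0;
}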
Vlastimil pointed out that a check for isolation is redundant in
__free_one_page as compaction_capture checks for it.
This is a fix for the mmotm patch
mm-compaction-capture-a-page-under-direct-compaction.patch
Signed-off-by: Mel Gorman <[email protected]>
---
mm/page_alloc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d61174bb0333..b2eee9b71986 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -887,8 +887,7 @@ static inline void __free_one_page(struct page *page,
continue_merging:
while (order < max_order - 1) {
if (compaction_capture(capc, page, order, migratetype)) {
- if (likely(!is_migrate_isolate(migratetype)))
- __mod_zone_freepage_state(zone, -(1 << order),
+ __mod_zone_freepage_state(zone, -(1 << order),
migratetype);
return;
}
On Thu, Jan 31, 2019 at 03:52:10PM +0100, Vlastimil Babka wrote:
> > -/* Reorder the free list to reduce repeated future searches */
> > +/*
> > + * Used when scanning for a suitable migration target which scans freelists
> > + * in reverse. Reorders the list such as the unscanned pages are scanned
> > + * first on the next iteration of the free scanner
> > + */
> > +static void
> > +move_freelist_head(struct list_head *freelist, struct page *freepage)
> > +{
> > + LIST_HEAD(sublist);
> > +
> > + if (!list_is_last(freelist, &freepage->lru)) {
>
> Shouldn't there be list_is_first() for symmetry?
>
I don't think it would help. We're reverse traversing the list when this is
called. If it's the last entry, it's moving just one page before breaking
off the search and a shuffle has minimal impact. If it's the first page
then list_cut_before moves the entire list to sublist before splicing it
back so it's a pointless operation.
--
Mel Gorman
SUSE Labs
On 2/1/19 3:51 PM, Mel Gorman wrote:
> On Thu, Jan 31, 2019 at 03:52:10PM +0100, Vlastimil Babka wrote:
>>> -/* Reorder the free list to reduce repeated future searches */
>>> +/*
>>> + * Used when scanning for a suitable migration target which scans freelists
>>> + * in reverse. Reorders the list such as the unscanned pages are scanned
>>> + * first on the next iteration of the free scanner
>>> + */
>>> +static void
>>> +move_freelist_head(struct list_head *freelist, struct page *freepage)
>>> +{
>>> + LIST_HEAD(sublist);
>>> +
>>> + if (!list_is_last(freelist, &freepage->lru)) {
>>
>> Shouldn't there be list_is_first() for symmetry?
>>
>
> I don't think it would help. We're reverse traversing the list when this is
> called. If it's the last entry, it's moving just one page before breaking
> off the search and a shuffle has minimal impact. If it's the first page
> then list_cut_before moves the entire list to sublist before splicing it
> back so it's a pointless operation.
Yeah I thought the goal was to avoid the pointless operation, which is
why it was previously added as "if (!list_is_last())" in
move_freelist_tail(). So in move_freelist_head() it would have to be as
"if (!list_is_first())" to achieve the same effect. Agree that it's
marginal but if that's so then I would just remove the checks completely
(from both functions) instead of having it subtly wrong in one of them?
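The symmetry question comes down to which end of a circular doubly-linked list
an entry sits on. A toy userspace list (not <linux/list.h>) is enough to show
the two predicates being discussed:

#include <stdbool.h>
#include <stdio.h>

struct node {
        struct node *prev, *next;
};

static void list_init(struct node *head)
{
        head->prev = head->next = head;
}

static void list_add_tail(struct node *head, struct node *n)
{
        n->prev = head->prev;
        n->next = head;
        head->prev->next = n;
        head->prev = n;
}

/* first entry: the one directly after the head */
static bool is_first(const struct node *head, const struct node *n)
{
        return head->next == n;
}

/* last entry: the one directly before the head */
static bool is_last(const struct node *head, const struct node *n)
{
        return head->prev == n;
}

int main(void)
{
        struct node head, a, b, c;

        list_init(&head);
        list_add_tail(&head, &a);
        list_add_tail(&head, &b);
        list_add_tail(&head, &c);

        printf("a: first=%d last=%d\n", is_first(&head, &a), is_last(&head, &a));
        printf("c: first=%d last=%d\n", is_first(&head, &c), is_last(&head, &c));
        return 0;
}

A forward scan encounters the first entry first and a reverse scan encounters
the last entry first, which is why the two helpers end up wanting opposite
checks to skip a no-op reorder.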
On Thu, Jan 31, 2019 at 02:55:01PM +0100, Vlastimil Babka wrote:
> > +
> > + /*
> > + * Avoid if skipped recently. Ideally it would
> > + * move to the tail but even safe iteration of
> > + * the list assumes an entry is deleted, not
> > + * reordered.
> > + */
> > + if (get_pageblock_skip(freepage)) {
> > + if (list_is_last(freelist, &freepage->lru))
> > + break;
> > +
> > + continue;
> > + }
> > +
> > + /* Reorder to so a future search skips recent pages */
> > + move_freelist_tail(freelist, freepage);
> > +
> > + pfn = pageblock_start_pfn(free_pfn);
> > + cc->fast_search_fail = 0;
> > + set_pageblock_skip(freepage);
> > + break;
> > + }
> > +
> > + if (nr_scanned >= limit) {
> > + cc->fast_search_fail++;
> > + move_freelist_tail(freelist, freepage);
> > + break;
> > + }
> > + }
> > + spin_unlock_irqrestore(&cc->zone->lock, flags);
> > + }
> > +
> > + cc->total_migrate_scanned += nr_scanned;
> > +
> > + /*
> > + * If fast scanning failed then use a cached entry for a page block
> > + * that had free pages as the basis for starting a linear scan.
> > + */
> > + if (pfn == cc->migrate_pfn)
> > + reinit_migrate_pfn(cc);
>
> This will set cc->migrate_pfn to the lowest pfn encountered, yet return
> pfn initialized by original cc->migrate_pfn.
> AFAICS isolate_migratepages() will use the returned pfn for the linear
> scan and then overwrite cc->migrate_pfn with wherever it advanced from
> there. So whatever we stored here into cc->migrate_pfn will never get
> actually used, except when isolate_migratepages() returns with
> ISOLATED_ABORT.
> So maybe the infinite kcompactd loop is linked to ISOLATED_ABORT?
>
I'm not entirely sure it would fix the infinite loop. I suspect that is
going to be a boundary condition where the two scanners are close but
do not meet if it still exists after the batch of fixes. However, you're
right that this code is problematic. I'll write a fix, test it and post
it if it's ok.
Well spotted!
--
Mel Gorman
SUSE Labs
Vlastimil correctly pointed out that when a fast search fails and cc->migrate_pfn
is reinitialised to the lowest PFN found, the caller does not use the updated
PFN.
This is a fix for the mmotm patch
mm-compaction-use-free-lists-to-quickly-locate-a-migration-source.patch
Signed-off-by: Mel Gorman <[email protected]>
diff --git a/mm/compaction.c b/mm/compaction.c
index 92d10eb3d1c7..d249f257da7e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1238,14 +1238,16 @@ update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
cc->fast_start_pfn = min(cc->fast_start_pfn, pfn);
}
-static inline void
+static inline unsigned long
reinit_migrate_pfn(struct compact_control *cc)
{
if (!cc->fast_start_pfn || cc->fast_start_pfn == ULONG_MAX)
- return;
+ return cc->migrate_pfn;
cc->migrate_pfn = cc->fast_start_pfn;
cc->fast_start_pfn = ULONG_MAX;
+
+ return cc->migrate_pfn;
}
/*
@@ -1361,7 +1363,7 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
* that had free pages as the basis for starting a linear scan.
*/
if (pfn == cc->migrate_pfn)
- reinit_migrate_pfn(cc);
+ pfn = reinit_migrate_pfn(cc);
return pfn;
}
Vlastimil correctly pointed out that when a fast search fails and cc->migrate_pfn
is reinitialised to the lowest PFN found, the caller does not use the updated
PFN.
He also pointed out that there is an inconsistency between
move_freelist_head and move_freelist_tail. This patch adds a new helper
and uses it in move_freelist_tail so that list manipulations are avoided
if the first list item traversed is a suitable migration source. The
end result is that the helpers are symmetrical. It's been confirmed that
the scan rates are slightly improved as a result of the fix, but not
enough to warrant rewriting the changelogs.
This is a fix for the mmotm patch
mm-compaction-use-free-lists-to-quickly-locate-a-migration-source.patch. It's
been provided as a combined patch as the first patch has not been picked up at
the time of writing and a rolled-up patch is less likely to fall through the
cracks.
Signed-off-by: Mel Gorman <[email protected]>
---
drivers/gpu/drm/i915/i915_utils.h | 6 ------
include/linux/list.h | 11 +++++++++++
mm/compaction.c | 10 ++++++----
3 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 9726df37c4c4..540e20eb032c 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -123,12 +123,6 @@ static inline u64 ptr_to_u64(const void *ptr)
#include <linux/list.h>
-static inline int list_is_first(const struct list_head *list,
- const struct list_head *head)
-{
- return head->next == list;
-}
-
static inline void __list_del_many(struct list_head *head,
struct list_head *first)
{
diff --git a/include/linux/list.h b/include/linux/list.h
index edb7628e46ed..79626b5ab36c 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -206,6 +206,17 @@ static inline void list_bulk_move_tail(struct list_head *head,
head->prev = last;
}
+/**
+ * list_is_first -- tests whether @ list is the first entry in list @head
+ * @list: the entry to test
+ * @head: the head of the list
+ */
+static inline int list_is_first(const struct list_head *list,
+ const struct list_head *head)
+{
+ return list->prev == head;
+}
+
/**
* list_is_last - tests whether @list is the last entry in list @head
* @list: the entry to test
diff --git a/mm/compaction.c b/mm/compaction.c
index 92d10eb3d1c7..55f7ab142af2 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1062,7 +1062,7 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
{
LIST_HEAD(sublist);
- if (!list_is_last(freelist, &freepage->lru)) {
+ if (!list_is_first(freelist, &freepage->lru)) {
list_cut_position(&sublist, freelist, &freepage->lru);
if (!list_empty(&sublist))
list_splice_tail(&sublist, freelist);
@@ -1238,14 +1238,16 @@ update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
cc->fast_start_pfn = min(cc->fast_start_pfn, pfn);
}
-static inline void
+static inline unsigned long
reinit_migrate_pfn(struct compact_control *cc)
{
if (!cc->fast_start_pfn || cc->fast_start_pfn == ULONG_MAX)
- return;
+ return cc->migrate_pfn;
cc->migrate_pfn = cc->fast_start_pfn;
cc->fast_start_pfn = ULONG_MAX;
+
+ return cc->migrate_pfn;
}
/*
@@ -1361,7 +1363,7 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
* that had free pages as the basis for starting a linear scan.
*/
if (pfn == cc->migrate_pfn)
- reinit_migrate_pfn(cc);
+ pfn = reinit_migrate_pfn(cc);
return pfn;
}
On 1/18/19 6:51 PM, Mel Gorman wrote:
> The migration scanner is a linear scan of a zone with a potentially large
> search space. Furthermore, many pageblocks are unusable such as those
> filled with reserved pages or partially filled with pages that cannot
> migrate. These still get scanned in the common case of allocating a THP
> and the cost accumulates.
>
> The patch uses a partial search of the free lists to locate a migration
> source candidate that is marked as MOVABLE when allocating a THP. It
> prefers picking a block with a larger number of free pages already on
> the basis that there are fewer pages to migrate to free the entire block.
> The lowest PFN found during searches is tracked as the basis of the start
> for the linear search after the first search of the free list fails.
> After the search, the free list is shuffled so that the next search will
> not encounter the same page. If the search fails then the subsequent
> searches will be shorter and the linear scanner is used.
>
> If this search fails, or if the request is for a small or
> unmovable/reclaimable allocation then the linear scanner is still used. It
> is somewhat pointless to use the list search in those cases. Small free
> pages must be used for the search and there is no guarantee that movable
> pages are located within that block that are contiguous.
>
> 5.0.0-rc1 5.0.0-rc1
> noboost-v3r10 findmig-v3r15
> Amean fault-both-3 3771.41 ( 0.00%) 3390.40 ( 10.10%)
> Amean fault-both-5 5409.05 ( 0.00%) 5082.28 ( 6.04%)
> Amean fault-both-7 7040.74 ( 0.00%) 7012.51 ( 0.40%)
> Amean fault-both-12 11887.35 ( 0.00%) 11346.63 ( 4.55%)
> Amean fault-both-18 16718.19 ( 0.00%) 15324.19 ( 8.34%)
> Amean fault-both-24 21157.19 ( 0.00%) 16088.50 * 23.96%*
> Amean fault-both-30 21175.92 ( 0.00%) 18723.42 * 11.58%*
> Amean fault-both-32 21339.03 ( 0.00%) 18612.01 * 12.78%*
>
> 5.0.0-rc1 5.0.0-rc1
> noboost-v3r10 findmig-v3r15
> Percentage huge-3 86.50 ( 0.00%) 89.83 ( 3.85%)
> Percentage huge-5 92.52 ( 0.00%) 91.96 ( -0.61%)
> Percentage huge-7 92.44 ( 0.00%) 92.85 ( 0.44%)
> Percentage huge-12 92.98 ( 0.00%) 92.74 ( -0.25%)
> Percentage huge-18 91.70 ( 0.00%) 91.71 ( 0.02%)
> Percentage huge-24 91.59 ( 0.00%) 92.13 ( 0.60%)
> Percentage huge-30 90.14 ( 0.00%) 93.79 ( 4.04%)
> Percentage huge-32 90.03 ( 0.00%) 91.27 ( 1.37%)
>
> This shows an improvement in allocation latencies with similar allocation
> success rates. While not presented, there was a 31% reduction in migration
> scanning and a 8% reduction on system CPU usage. A 2-socket machine showed
> similar benefits.
>
> [[email protected]: Migrate block that was found-fast, some optimisations]
> Signed-off-by: Mel Gorman <[email protected]>
With the followup fix,
Acked-by: Vlastimil Babka <[email protected]>
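A rough sketch of the bounded free-list search described in this changelog,
using made-up data; in_window and skipped are stand-ins for the distance and
pageblock-skip checks, and the real code walks the per-order MOVABLE free lists
rather than an array:

#include <stdbool.h>
#include <stdio.h>

/* Invented stand-in for a free page on a MOVABLE free list. */
struct candidate {
        unsigned long pfn;
        bool in_window;         /* stand-in for the free_pfn < high_pfn check */
        bool skipped;           /* stand-in for the pageblock skip hint */
};

int main(void)
{
        struct candidate freelist[] = {
                { 9000, false, false },
                { 4096, true,  true  },
                { 2048, true,  false },
                { 1024, true,  false },
        };
        unsigned long fallback_pfn = ~0UL, found_pfn = 0;
        unsigned int limit = 3, scanned = 0;

        for (size_t i = 0; i < sizeof(freelist) / sizeof(freelist[0]); i++) {
                if (++scanned > limit)
                        break;
                if (!freelist[i].in_window || freelist[i].skipped)
                        continue;
                /* Track the lowest usable PFN as a restart point on failure. */
                if (freelist[i].pfn < fallback_pfn)
                        fallback_pfn = freelist[i].pfn;
                /*
                 * First suitable block wins; a real search would also
                 * prefer blocks with more free pages.
                 */
                found_pfn = freelist[i].pfn;
                break;
        }

        if (found_pfn)
                printf("fast search found pfn %lu\n", found_pfn);
        else
                printf("fast search failed, linear scan from pfn %lu\n",
                       fallback_pfn);
        return 0;
}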
In fast_isolate_freepages, high_pfn will be used if a preferred candidate (PFN >= low_pfn) is not found. But high_pfn
is not reset before searching each free area, so when it is used as the freepage it may come from a free area searched earlier,
and move_freelist_head(freelist, freepage) will have unexpected behaviour (e.g. corrupt the MOVABLE freelist).
Unable to handle kernel paging request at virtual address dead000000000200
Mem abort info:
ESR = 0x96000044
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000044
CM = 0, WnR = 1
[dead000000000200] address between user and kernel address ranges
-000|list_cut_before(inline)
-000|move_freelist_head(inline)
-000|fast_isolate_freepages(inline)
-000|isolate_freepages(inline)
-000|compaction_alloc(?, ?)
-001|unmap_and_move(inline)
-001|migrate_pages([NSD:0xFFFFFF80088CBBD0] from = 0xFFFFFF80088CBD88, [NSD:0xFFFFFF80088CBBC8] get_new_p
-002|__read_once_size(inline)
-002|static_key_count(inline)
-002|static_key_false(inline)
-002|trace_mm_compaction_migratepages(inline)
-002|compact_zone(?, [NSD:0xFFFFFF80088CBCB0] capc = 0x0)
-003|kcompactd_do_work(inline)
-003|kcompactd([X19] p = 0xFFFFFF93227FBC40)
-004|kthread([X20] _create = 0xFFFFFFE1AFB26380)
-005|ret_from_fork(asm)
---|end of frame
The issue was reported on a smartphone product with 6GB of RAM and 3GB of zram as the swap device.
This patch fixes the issue by resetting high_pfn before searching each free area, which ensures that
the freepage and freelist match when move_freelist_head is called in fast_isolate_freepages().
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
---
mm/compaction.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/compaction.c b/mm/compaction.c
index cc1a7f600a86..6754229f4213 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1352,6 +1352,7 @@ fast_isolate_freepages(struct compact_control *cc)
if (!area->nr_free)
continue;
+ high_pfn = 0;
spin_lock_irqsave(&cc->zone->lock, flags);
freelist = &area->free_list[MIGRATE_MOVABLE];
list_for_each_entry_reverse(freepage, freelist, lru) {
--
2.25.1
On Tue, Jan 12, 2021 at 01:19:55PM +0800, Rokudo Yan wrote:
> In fast_isolate_freepages, high_pfn will be used if a prefered one(PFN >= low_fn) not found. But the high_pfn
> is not reset before searching an free area, so when it was used as freepage, it may from another free area searched before.
> And move_freelist_head(freelist, freepage) will have unexpected behavior(eg. corrupt the MOVABLE freelist)
>
> Unable to handle kernel paging request at virtual address dead000000000200
> Mem abort info:
> ESR = 0x96000044
> Exception class = DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> Data abort info:
> ISV = 0, ISS = 0x00000044
> CM = 0, WnR = 1
> [dead000000000200] address between user and kernel address ranges
>
> -000|list_cut_before(inline)
> -000|move_freelist_head(inline)
> -000|fast_isolate_freepages(inline)
> -000|isolate_freepages(inline)
> -000|compaction_alloc(?, ?)
> -001|unmap_and_move(inline)
> -001|migrate_pages([NSD:0xFFFFFF80088CBBD0] from = 0xFFFFFF80088CBD88, [NSD:0xFFFFFF80088CBBC8] get_new_p
> -002|__read_once_size(inline)
> -002|static_key_count(inline)
> -002|static_key_false(inline)
> -002|trace_mm_compaction_migratepages(inline)
> -002|compact_zone(?, [NSD:0xFFFFFF80088CBCB0] capc = 0x0)
> -003|kcompactd_do_work(inline)
> -003|kcompactd([X19] p = 0xFFFFFF93227FBC40)
> -004|kthread([X20] _create = 0xFFFFFFE1AFB26380)
> -005|ret_from_fork(asm)
> ---|end of frame
>
> The issue was reported on an smart phone product with 6GB ram and 3GB zram as swap device.
>
> This patch fixes the issue by reset high_pfn before searching each free area, which ensure
> freepage and freelist match when call move_freelist_head in fast_isolate_freepages().
>
> Link: http://lkml.kernel.org/r/[email protected]
> Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
In that case, please move the high_pfn declaration and initialiser within
the for (order = cc->order - 1;...) loop to make it explicit what the
scope of high_pfn is.
--
Mel Gorman
SUSE Labs
In fast_isolate_freepages, high_pfn will be used if a preferred candidate (PFN >= low_pfn) is not found. But high_pfn
is not reset before searching each free area, so when it is used as the freepage it may come from a free area searched earlier,
and move_freelist_head(freelist, freepage) will have unexpected behaviour (e.g. corrupt the MOVABLE freelist).
Unable to handle kernel paging request at virtual address dead000000000200
Mem abort info:
ESR = 0x96000044
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000044
CM = 0, WnR = 1
[dead000000000200] address between user and kernel address ranges
-000|list_cut_before(inline)
-000|move_freelist_head(inline)
-000|fast_isolate_freepages(inline)
-000|isolate_freepages(inline)
-000|compaction_alloc(?, ?)
-001|unmap_and_move(inline)
-001|migrate_pages([NSD:0xFFFFFF80088CBBD0] from = 0xFFFFFF80088CBD88, [NSD:0xFFFFFF80088CBBC8] get_new_p
-002|__read_once_size(inline)
-002|static_key_count(inline)
-002|static_key_false(inline)
-002|trace_mm_compaction_migratepages(inline)
-002|compact_zone(?, [NSD:0xFFFFFF80088CBCB0] capc = 0x0)
-003|kcompactd_do_work(inline)
-003|kcompactd([X19] p = 0xFFFFFF93227FBC40)
-004|kthread([X20] _create = 0xFFFFFFE1AFB26380)
-005|ret_from_fork(asm)
---|end of frame
The issue was reported on a smartphone product with 6GB of RAM and 3GB of zram as the swap device.
This patch fixes the issue by resetting high_pfn before searching each free area, which ensures that
the freepage and freelist match when move_freelist_head is called in fast_isolate_freepages().
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
---
mm/compaction.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index cc1a7f600a86..75f0e550b18f 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1303,7 +1303,7 @@ fast_isolate_freepages(struct compact_control *cc)
{
unsigned int limit = min(1U, freelist_scan_limit(cc) >> 1);
unsigned int nr_scanned = 0;
- unsigned long low_pfn, min_pfn, high_pfn = 0, highest = 0;
+ unsigned long low_pfn, min_pfn, highest = 0;
unsigned long nr_isolated = 0;
unsigned long distance;
struct page *page = NULL;
@@ -1348,6 +1348,7 @@ fast_isolate_freepages(struct compact_control *cc)
struct page *freepage;
unsigned long flags;
unsigned int order_scanned = 0;
+ unsigned long high_pfn = 0;
if (!area->nr_free)
continue;
--
2.25.1
On 1/12/21 10:47 AM, Rokudo Yan wrote:
> In fast_isolate_freepages, high_pfn will be used if a prefered one(PFN >= low_fn) not found. But the high_pfn
> is not reset before searching an free area, so when it was used as freepage, it may from another free area searched before.
> And move_freelist_head(freelist, freepage) will have unexpected behavior(eg. corrupt the MOVABLE freelist)
>
> Unable to handle kernel paging request at virtual address dead000000000200
> Mem abort info:
> ESR = 0x96000044
> Exception class = DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> Data abort info:
> ISV = 0, ISS = 0x00000044
> CM = 0, WnR = 1
> [dead000000000200] address between user and kernel address ranges
>
> -000|list_cut_before(inline)
> -000|move_freelist_head(inline)
> -000|fast_isolate_freepages(inline)
> -000|isolate_freepages(inline)
> -000|compaction_alloc(?, ?)
> -001|unmap_and_move(inline)
> -001|migrate_pages([NSD:0xFFFFFF80088CBBD0] from = 0xFFFFFF80088CBD88, [NSD:0xFFFFFF80088CBBC8] get_new_p
> -002|__read_once_size(inline)
> -002|static_key_count(inline)
> -002|static_key_false(inline)
> -002|trace_mm_compaction_migratepages(inline)
> -002|compact_zone(?, [NSD:0xFFFFFF80088CBCB0] capc = 0x0)
> -003|kcompactd_do_work(inline)
> -003|kcompactd([X19] p = 0xFFFFFF93227FBC40)
> -004|kthread([X20] _create = 0xFFFFFFE1AFB26380)
> -005|ret_from_fork(asm)
> ---|end of frame
>
> The issue was reported on an smart phone product with 6GB ram and 3GB zram as swap device.
>
> This patch fixes the issue by reset high_pfn before searching each free area, which ensure
> freepage and freelist match when call move_freelist_head in fast_isolate_freepages().
>
> Link: http://lkml.kernel.org/r/[email protected]
> Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
Sounds serious enough. Wonder how it wasn't reported sooner.
Cc: <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
> ---
> mm/compaction.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index cc1a7f600a86..75f0e550b18f 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1303,7 +1303,7 @@ fast_isolate_freepages(struct compact_control *cc)
> {
> unsigned int limit = min(1U, freelist_scan_limit(cc) >> 1);
> unsigned int nr_scanned = 0;
> - unsigned long low_pfn, min_pfn, high_pfn = 0, highest = 0;
> + unsigned long low_pfn, min_pfn, highest = 0;
> unsigned long nr_isolated = 0;
> unsigned long distance;
> struct page *page = NULL;
> @@ -1348,6 +1348,7 @@ fast_isolate_freepages(struct compact_control *cc)
> struct page *freepage;
> unsigned long flags;
> unsigned int order_scanned = 0;
> + unsigned long high_pfn = 0;
>
> if (!area->nr_free)
> continue;
>
On Tue, 12 Jan 2021 17:47:20 +0800 Rokudo Yan <[email protected]> wrote:
> In fast_isolate_freepages, high_pfn will be used if a prefered one(PFN >= low_fn) not found. But the high_pfn
> is not reset before searching an free area, so when it was used as freepage, it may from another free area searched before.
> And move_freelist_head(freelist, freepage) will have unexpected behavior(eg. corrupt the MOVABLE freelist)
>
> Unable to handle kernel paging request at virtual address dead000000000200
> Mem abort info:
> ESR = 0x96000044
> Exception class = DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> Data abort info:
> ISV = 0, ISS = 0x00000044
> CM = 0, WnR = 1
> [dead000000000200] address between user and kernel address ranges
>
> -000|list_cut_before(inline)
> -000|move_freelist_head(inline)
> -000|fast_isolate_freepages(inline)
> -000|isolate_freepages(inline)
> -000|compaction_alloc(?, ?)
> -001|unmap_and_move(inline)
> -001|migrate_pages([NSD:0xFFFFFF80088CBBD0] from = 0xFFFFFF80088CBD88, [NSD:0xFFFFFF80088CBBC8] get_new_p
> -002|__read_once_size(inline)
> -002|static_key_count(inline)
> -002|static_key_false(inline)
> -002|trace_mm_compaction_migratepages(inline)
> -002|compact_zone(?, [NSD:0xFFFFFF80088CBCB0] capc = 0x0)
> -003|kcompactd_do_work(inline)
> -003|kcompactd([X19] p = 0xFFFFFF93227FBC40)
> -004|kthread([X20] _create = 0xFFFFFFE1AFB26380)
> -005|ret_from_fork(asm)
> ---|end of frame
>
> The issue was reported on an smart phone product with 6GB ram and 3GB zram as swap device.
>
> This patch fixes the issue by reset high_pfn before searching each free area, which ensure
> freepage and freelist match when call move_freelist_head in fast_isolate_freepages().
>
> Link: http://lkml.kernel.org/r/[email protected]
> Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
Could you please send a Signed-off-by: for this patch, as per
Documentation/process/submitting-patches.rst?
On Mon, Jan 18, 2021 at 03:41:26PM +0800, Rokudo Yan wrote:
> In fast_isolate_freepages, high_pfn will be used if a prefered one(PFN >= low_fn) not found. But the high_pfn
> is not reset before searching an free area, so when it was used as freepage, it may from another free area searched before.
> And move_freelist_head(freelist, freepage) will have unexpected behavior(eg. corrupt the MOVABLE freelist)
>
> Unable to handle kernel paging request at virtual address dead000000000200
> Mem abort info:
> ESR = 0x96000044
> Exception class = DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> Data abort info:
> ISV = 0, ISS = 0x00000044
> CM = 0, WnR = 1
> [dead000000000200] address between user and kernel address ranges
>
> -000|list_cut_before(inline)
> -000|move_freelist_head(inline)
> -000|fast_isolate_freepages(inline)
> -000|isolate_freepages(inline)
> -000|compaction_alloc(?, ?)
> -001|unmap_and_move(inline)
> -001|migrate_pages([NSD:0xFFFFFF80088CBBD0] from = 0xFFFFFF80088CBD88, [NSD:0xFFFFFF80088CBBC8] get_new_p
> -002|__read_once_size(inline)
> -002|static_key_count(inline)
> -002|static_key_false(inline)
> -002|trace_mm_compaction_migratepages(inline)
> -002|compact_zone(?, [NSD:0xFFFFFF80088CBCB0] capc = 0x0)
> -003|kcompactd_do_work(inline)
> -003|kcompactd([X19] p = 0xFFFFFF93227FBC40)
> -004|kthread([X20] _create = 0xFFFFFFE1AFB26380)
> -005|ret_from_fork(asm)
> ---|end of frame
>
> The issue was reported on an smart phone product with 6GB ram and 3GB zram as swap device.
>
> This patch fixes the issue by reset high_pfn before searching each free area, which ensure
> freepage and freelist match when call move_freelist_head in fast_isolate_freepages().
>
> Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
>
> Signed-off-by: Rokudo Yan <[email protected]>
Other than a fixes line, I do not think this changed so my previous Ack
still applies
Acked-by: Mel Gorman <[email protected]>
--
Mel Gorman
SUSE Labs
In fast_isolate_freepages, high_pfn will be used if a preferred candidate (PFN >= low_pfn) is not found. But high_pfn
is not reset before searching each free area, so when it is used as the freepage it may come from a free area searched earlier,
and move_freelist_head(freelist, freepage) will have unexpected behaviour (e.g. corrupt the MOVABLE freelist).
Unable to handle kernel paging request at virtual address dead000000000200
Mem abort info:
ESR = 0x96000044
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000044
CM = 0, WnR = 1
[dead000000000200] address between user and kernel address ranges
-000|list_cut_before(inline)
-000|move_freelist_head(inline)
-000|fast_isolate_freepages(inline)
-000|isolate_freepages(inline)
-000|compaction_alloc(?, ?)
-001|unmap_and_move(inline)
-001|migrate_pages([NSD:0xFFFFFF80088CBBD0] from = 0xFFFFFF80088CBD88, [NSD:0xFFFFFF80088CBBC8] get_new_p
-002|__read_once_size(inline)
-002|static_key_count(inline)
-002|static_key_false(inline)
-002|trace_mm_compaction_migratepages(inline)
-002|compact_zone(?, [NSD:0xFFFFFF80088CBCB0] capc = 0x0)
-003|kcompactd_do_work(inline)
-003|kcompactd([X19] p = 0xFFFFFF93227FBC40)
-004|kthread([X20] _create = 0xFFFFFFE1AFB26380)
-005|ret_from_fork(asm)
---|end of frame
The issue was reported on a smartphone product with 6GB of RAM and 3GB of zram as the swap device.
This patch fixes the issue by resetting high_pfn before searching each free area, which ensures that
the freepage and freelist match when move_freelist_head is called in fast_isolate_freepages().
Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: Rokudo Yan <[email protected]>
---
mm/compaction.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index cc1a7f600a86..75f0e550b18f 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1303,7 +1303,7 @@ fast_isolate_freepages(struct compact_control *cc)
{
unsigned int limit = min(1U, freelist_scan_limit(cc) >> 1);
unsigned int nr_scanned = 0;
- unsigned long low_pfn, min_pfn, high_pfn = 0, highest = 0;
+ unsigned long low_pfn, min_pfn, highest = 0;
unsigned long nr_isolated = 0;
unsigned long distance;
struct page *page = NULL;
@@ -1348,6 +1348,7 @@ fast_isolate_freepages(struct compact_control *cc)
struct page *freepage;
unsigned long flags;
unsigned int order_scanned = 0;
+ unsigned long high_pfn = 0;
if (!area->nr_free)
continue;
--
2.25.1