2015-06-10 09:33:30

by Vlastimil Babka

Subject: [PATCH 0/6] Assorted compaction cleanups and optimizations

This series is partly the cleanups that were posted as part of the RFC for
changing initial scanning positions [1] and partly new relatively simple
scanner optimizations (yes, they are still possible). I've resumed working
on the bigger scanner changes, but that will take a while, so no point in
delaying these smaller patches.

The interesting patches are 4 and 5, and somewhat patch 6. In patch 4, skipping
of compound pages in a single iteration is improved for the migration scanner,
so it works also for !PageLRU compound pages such as hugetlbfs, slab, etc.
Patch 5 introduces this kind of skipping in the free scanner. The trick is that
we can read compound_order() without any protection, if we are careful to
filter out values that are not below MAX_ORDER. The only danger then is that we
skip too much. The same trick was already used for reading the freepage order
in the migrate scanner.

Patch 6 avoids some rescanning when compaction restarts from cached scanner
positions, but the benefits are small enough to be lost in the noise.

To demonstrate the improvements of patches 4 and 5, I've run stress-highalloc
from mmtests, set to simulate THP allocations (including __GFP_COMP), on a 4GB
system where 1GB was occupied by hugetlbfs pages. I'll include just the
relevant stats:

                              Patch 3    Patch 4    Patch 5    Patch 6

Compaction stalls                7523       7529       7515       7495
Compaction success                323        304        322        289
Compaction failures              7200       7224       7192       7206
Page migrate success           247778     264395     240737     248956
Page migrate failure            15358      33184      21621      23657
Compaction pages isolated      906928     980192     909983     958044
Compaction migrate scanned    2005277    1692805    1498800    1750952
Compaction free scanned      13255284   11539986    9011276    9703018
Compaction cost                   288        305        277        289

With 5 iterations per patch, the results are still noisy, but we can see that
Patch 4 does reduce migrate_scanned by 15% thanks to skipping the hugetlbfs
pages at once. Interestingly, free_scanned is also reduced and I have no idea
why. Patch 5 further reduces free_scanned as expected, by 15%. Other stats
are unaffected modulo noise. Patch 6 looks like a regression but I believe it's
just noise. I've verified that compaction now restarts from the exact pfns
where it left off, using tracepoints.

[1] https://lkml.org/lkml/2015/1/19/158

Vlastimil Babka (6):
mm, compaction: more robust check for scanners meeting
mm, compaction: simplify handling restart position in free pages
scanner
mm, compaction: encapsulate resetting cached scanner positions
mm, compaction: always skip compound pages by order in migrate scanner
mm, compaction: skip compound pages by order in free scanner
mm, compaction: decouple updating pageblock_skip and cached pfn

mm/compaction.c | 188 ++++++++++++++++++++++++++++++++++----------------------
1 file changed, 115 insertions(+), 73 deletions(-)

--
2.1.4


2015-06-10 09:33:59

by Vlastimil Babka

Subject: [PATCH 1/6] mm, compaction: more robust check for scanners meeting

Compaction should finish when the migration and free scanners meet, i.e. when
they reach the same pageblock. Currently, however, the test in
compact_finished() simply compares the exact pfns, which may yield a false
negative when the free scanner position is in the middle of a pageblock and the
migration scanner reaches the beginning of the same pageblock.

This hasn't been a problem until commit e14c720efdd7 ("mm, compaction: remember
position within pageblock in free pages scanner") allowed the free scanner
position to be in the middle of a pageblock between invocations. The hot-fix
1d5bfe1ffb5b ("mm, compaction: prevent infinite loop in compact_zone")
prevented the issue by adding a special check in the migration scanner to
satisfy the current detection of scanners meeting.

However, the proper fix is to make the detection more robust. This patch
introduces the compact_scanners_met() function that returns true when the free
scanner position is in the same or lower pageblock than the migration scanner.
The special case in isolate_migratepages() introduced by 1d5bfe1ffb5b is
removed.
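
As an illustrative example (assuming pageblock_order == 9, i.e. 512 pages per
pageblock):

	free_pfn    = 74496  ->  74496 >> pageblock_order == 145
	migrate_pfn = 74240  ->  74240 >> pageblock_order == 145

Both pfns lie in pageblock 145, so the scanners have effectively met, yet the
old test "free_pfn <= migrate_pfn" is false because 74496 > 74240. Comparing
the pfns shifted by pageblock_order detects the meeting.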

Suggested-by: Joonsoo Kim <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: David Rientjes <[email protected]>
---
mm/compaction.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 16e1b57..d46aaeb 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -902,6 +902,16 @@ static bool suitable_migration_target(struct page *page)
}

/*
+ * Test whether the free scanner has reached the same or lower pageblock than
+ * the migration scanner, and compaction should thus terminate.
+ */
+static inline bool compact_scanners_met(struct compact_control *cc)
+{
+ return (cc->free_pfn >> pageblock_order)
+ <= (cc->migrate_pfn >> pageblock_order);
+}
+
+/*
* Based on information in the current compact_control, find blocks
* suitable for isolating free pages from and then isolate them.
*/
@@ -1131,12 +1141,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
}

acct_isolated(zone, cc);
- /*
- * Record where migration scanner will be restarted. If we end up in
- * the same pageblock as the free scanner, make the scanners fully
- * meet so that compact_finished() terminates compaction.
- */
- cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
+ /* Record where migration scanner will be restarted. */
+ cc->migrate_pfn = low_pfn;

return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
}
@@ -1151,7 +1157,7 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
return COMPACT_PARTIAL;

/* Compaction run completes if the migrate and free scanner meet */
- if (cc->free_pfn <= cc->migrate_pfn) {
+ if (compact_scanners_met(cc)) {
/* Let the next compaction start anew. */
zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
@@ -1380,7 +1386,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
* migrate_pages() may return -ENOMEM when scanners meet
* and we want compact_finished() to detect it
*/
- if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
+ if (err == -ENOMEM && !compact_scanners_met(cc)) {
ret = COMPACT_PARTIAL;
goto out;
}
--
2.1.4

2015-06-10 09:33:40

by Vlastimil Babka

Subject: [PATCH 2/6] mm, compaction: simplify handling restart position in free pages scanner

Handling the position where the compaction free scanner should restart (stored
in cc->free_pfn) got more complex with commit e14c720efdd7 ("mm, compaction:
remember position within pageblock in free pages scanner"). Currently the
position is updated in each loop iteration of isolate_freepages(), although it
should be enough to update it only when breaking from the loop. There's also
an extra check outside the loop that updates the position in case we have met
the migration scanner.

This can be simplified if we move the test for having isolated enough from the
for-loop header next to the test for contention, and determine the restart
position only in these cases. We can reuse the isolate_start_pfn variable for
this instead of setting cc->free_pfn directly. Outside the loop, we can simply
set cc->free_pfn to the current value of isolate_start_pfn without any extra check.

Also add a VM_BUG_ON to catch a possible mistake in the future, in case we
later add a new condition that terminates isolate_freepages_block() prematurely
without also considering the condition in isolate_freepages().

Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: David Rientjes <[email protected]>
---
mm/compaction.c | 35 ++++++++++++++++++++---------------
1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index d46aaeb..7e0a814 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -947,8 +947,7 @@ static void isolate_freepages(struct compact_control *cc)
* pages on cc->migratepages. We stop searching if the migrate
* and free page scanners meet or enough free pages are isolated.
*/
- for (; block_start_pfn >= low_pfn &&
- cc->nr_migratepages > cc->nr_freepages;
+ for (; block_start_pfn >= low_pfn;
block_end_pfn = block_start_pfn,
block_start_pfn -= pageblock_nr_pages,
isolate_start_pfn = block_start_pfn) {
@@ -980,6 +979,8 @@ static void isolate_freepages(struct compact_control *cc)
block_end_pfn, freelist, false);

/*
+ * If we isolated enough freepages, or aborted due to async
+ * compaction being contended, terminate the loop.
* Remember where the free scanner should restart next time,
* which is where isolate_freepages_block() left off.
* But if it scanned the whole pageblock, isolate_start_pfn
@@ -988,27 +989,31 @@ static void isolate_freepages(struct compact_control *cc)
* In that case we will however want to restart at the start
* of the previous pageblock.
*/
- cc->free_pfn = (isolate_start_pfn < block_end_pfn) ?
- isolate_start_pfn :
- block_start_pfn - pageblock_nr_pages;
-
- /*
- * isolate_freepages_block() might have aborted due to async
- * compaction being contended
- */
- if (cc->contended)
+ if ((cc->nr_freepages >= cc->nr_migratepages)
+ || cc->contended) {
+ if (isolate_start_pfn >= block_end_pfn)
+ isolate_start_pfn =
+ block_start_pfn - pageblock_nr_pages;
break;
+ } else {
+ /*
+ * isolate_freepages_block() should not terminate
+ * prematurely unless contended, or isolated enough
+ */
+ VM_BUG_ON(isolate_start_pfn < block_end_pfn);
+ }
}

/* split_free_page does not map the pages */
map_pages(freelist);

/*
- * If we crossed the migrate scanner, we want to keep it that way
- * so that compact_finished() may detect this
+ * Record where the free scanner will restart next time. Either we
+ * broke from the loop and set isolate_start_pfn based on the last
+ * call to isolate_freepages_block(), or we met the migration scanner
+ * and the loop terminated due to isolate_start_pfn < low_pfn
*/
- if (block_start_pfn < low_pfn)
- cc->free_pfn = cc->migrate_pfn;
+ cc->free_pfn = isolate_start_pfn;
}

/*
--
2.1.4

2015-06-10 09:33:49

by Vlastimil Babka

Subject: [PATCH 3/6] mm, compaction: encapsulate resetting cached scanner positions

Resetting the cached compaction scanner positions is currently done implicitly
in __reset_isolation_suitable() and compact_finished(). Encapsulate the
functionality in a new function, reset_cached_positions(), and call it
explicitly where needed.

Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: David Rientjes <[email protected]>
---
mm/compaction.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 7e0a814..d334bb3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -207,6 +207,13 @@ static inline bool isolation_suitable(struct compact_control *cc,
return !get_pageblock_skip(page);
}

+static void reset_cached_positions(struct zone *zone)
+{
+ zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
+ zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
+ zone->compact_cached_free_pfn = zone_end_pfn(zone);
+}
+
/*
* This function is called to clear all cached information on pageblocks that
* should be skipped for page isolation when the migrate and free page scanner
@@ -218,9 +225,6 @@ static void __reset_isolation_suitable(struct zone *zone)
unsigned long end_pfn = zone_end_pfn(zone);
unsigned long pfn;

- zone->compact_cached_migrate_pfn[0] = start_pfn;
- zone->compact_cached_migrate_pfn[1] = start_pfn;
- zone->compact_cached_free_pfn = end_pfn;
zone->compact_blockskip_flush = false;

/* Walk the zone and mark every pageblock as suitable for isolation */
@@ -250,8 +254,10 @@ void reset_isolation_suitable(pg_data_t *pgdat)
continue;

/* Only flush if a full compaction finished recently */
- if (zone->compact_blockskip_flush)
+ if (zone->compact_blockskip_flush) {
__reset_isolation_suitable(zone);
+ reset_cached_positions(zone);
+ }
}
}

@@ -1164,9 +1170,7 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
/* Compaction run completes if the migrate and free scanner meet */
if (compact_scanners_met(cc)) {
/* Let the next compaction start anew. */
- zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
- zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
- zone->compact_cached_free_pfn = zone_end_pfn(zone);
+ reset_cached_positions(zone);

/*
* Mark that the PG_migrate_skip information should be cleared
@@ -1329,8 +1333,10 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
* is about to be retried after being deferred. kswapd does not do
* this reset as it'll reset the cached information when going to sleep.
*/
- if (compaction_restarting(zone, cc->order) && !current_is_kswapd())
+ if (compaction_restarting(zone, cc->order) && !current_is_kswapd()) {
__reset_isolation_suitable(zone);
+ reset_cached_positions(zone);
+ }

/*
* Setup to move all movable pages to the end of the zone. Used cached
--
2.1.4

2015-06-10 09:35:07

by Vlastimil Babka

Subject: [PATCH 4/6] mm, compaction: always skip compound pages by order in migrate scanner

The compaction migrate scanner tries to skip compound pages by their order, to
reduce the number of iterations for pages it cannot isolate. The check is only
done if PageLRU() is true, which means it applies to THP pages, but not e.g.
hugetlbfs pages or any other non-LRU compound pages, which we still have to
iterate over by base pages.

This limitation comes from the assumption that it's only safe to read
compound_order() when we have the zone's lru_lock and THP cannot be split under
us. But the only danger (after filtering out order values that are not below
MAX_ORDER, to prevent overflows) is that we skip too much or too little after
reading a bogus compound_order() due to a rare race. This is the same reasoning
as patch 99c0fd5e51c4 ("mm, compaction: skip buddy pages by their order in the
migrate scanner") introduced for unsafely reading PageBuddy() order.

After this patch, all pages are tested for PageCompound() and we skip them by
compound_order(). The test is done after the test for balloon_page_movable(),
as we don't want to assume whether balloon pages (or other pages with their own
isolation and migration implementation, should a generic API get implemented)
are compound or not.

When tested with stress-highalloc from mmtests on a 4GB system with 1GB of
hugetlbfs pages, the vmstat compact_migrate_scanned count decreased by 15%.

Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: David Rientjes <[email protected]>
---
mm/compaction.c | 36 +++++++++++++++++-------------------
1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index d334bb3..e37d361 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -680,6 +680,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,

/* Time to isolate some pages for migration */
for (; low_pfn < end_pfn; low_pfn++) {
+ bool is_lru;
+
/*
* Periodically drop the lock (if held) regardless of its
* contention, to give chance to IRQs. Abort async compaction
@@ -723,39 +725,35 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
* It's possible to migrate LRU pages and balloon pages
* Skip any other type of page
*/
- if (!PageLRU(page)) {
+ is_lru = PageLRU(page);
+ if (!is_lru) {
if (unlikely(balloon_page_movable(page))) {
if (balloon_page_isolate(page)) {
/* Successfully isolated */
goto isolate_success;
}
}
- continue;
}

/*
- * PageLRU is set. lru_lock normally excludes isolation
- * splitting and collapsing (collapsing has already happened
- * if PageLRU is set) but the lock is not necessarily taken
- * here and it is wasteful to take it just to check transhuge.
- * Check PageCompound without lock and skip the whole pageblock
- * if it's a transhuge page, as calling compound_order()
- * without preventing THP from splitting the page underneath us
- * may return surprising results.
- * If we happen to check a THP tail page, compound_order()
- * returns 0. It should be rare enough to not bother with
- * using compound_head() in that case.
+ * Regardless of being on LRU, compound pages such as THP and
+ * hugetlbfs are not to be compacted. We can potentially save
+ * a lot of iterations if we skip them at once. The check is
+ * racy, but we can consider only valid values and the only
+ * danger is skipping too much.
*/
if (PageCompound(page)) {
- int nr;
- if (locked)
- nr = 1 << compound_order(page);
- else
- nr = pageblock_nr_pages;
- low_pfn += nr - 1;
+ unsigned int comp_order = compound_order(page);
+
+ if (comp_order > 0 && comp_order < MAX_ORDER)
+ low_pfn += (1UL << comp_order) - 1;
+
continue;
}

+ if (!is_lru)
+ continue;
+
/*
* Migration will fail if an anonymous page is pinned in memory,
* so avoid taking lru_lock and isolating it unnecessarily in an
--
2.1.4

2015-06-10 09:35:00

by Vlastimil Babka

Subject: [PATCH 5/6] mm, compaction: skip compound pages by order in free scanner

The compaction free scanner is looking for PageBuddy() pages and skipping all
others. For large compound pages such as THP or hugetlbfs, we can save a lot
of iterations if we skip them at once using their compound_order(). This is
generally unsafe and we can read a bogus order value due to a race, but if
we are careful, the only danger is skipping too much.

When tested with stress-highalloc from mmtests on a 4GB system with 1GB of
hugetlbfs pages, the vmstat compact_free_scanned count decreased by at least 15%.

Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: David Rientjes <[email protected]>
---
mm/compaction.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index e37d361..4a14084 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -437,6 +437,24 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,

if (!valid_page)
valid_page = page;
+
+ /*
+ * For compound pages such as THP and hugetlbfs, we can save
+ * potentially a lot of iterations if we skip them at once.
+ * The check is racy, but we can consider only valid values
+ * and the only danger is skipping too much.
+ */
+ if (PageCompound(page)) {
+ unsigned int comp_order = compound_order(page);
+
+ if (comp_order > 0 && comp_order < MAX_ORDER) {
+ blockpfn += (1UL << comp_order) - 1;
+ cursor += (1UL << comp_order) - 1;
+ }
+
+ goto isolate_fail;
+ }
+
if (!PageBuddy(page))
goto isolate_fail;

@@ -496,6 +514,13 @@ isolate_fail:

}

+ /*
+ * There is a tiny chance that we have read bogus compound_order(),
+ * so be careful to not go outside of the pageblock.
+ */
+ if (unlikely(blockpfn > end_pfn))
+ blockpfn = end_pfn;
+
trace_mm_compaction_isolate_freepages(*start_pfn, blockpfn,
nr_scanned, total_isolated);

--
2.1.4

2015-06-10 09:34:53

by Vlastimil Babka

Subject: [PATCH 6/6] mm, compaction: decouple updating pageblock_skip and cached pfn

The pageblock_skip bitmap and the cached scanner pfn's are two mechanisms in
compaction to prevent rescanning pages where isolation has recently failed
or that were already scanned during the previous compaction attempt.

Currently, both kinds of information are updated via update_pageblock_skip(),
which is suboptimal for the cached scanner pfn's:

- The condition "isolation has failed in the pageblock" checked by
update_pageblock_skip() may be valid for the pageblock_skip bitmap, but makes
less sense for the cached pfn's. There's little point in having the next
compaction attempt rescan a pageblock where all pages that could be isolated
were already processed.

- Whole pageblocks can be skipped at the level of isolate_migratepages() or
isolate_freepages(), before going into the corresponding _block() function.
Not updating the cached scanner positions at this higher level may again
result in extra iterations.

This patch moves updating cached scanner pfn's from update_pageblock_skip()
to dedicated functions, which are called directly from isolate_migratepages()
and isolate_freepages(), resolving both inefficiencies.

During testing, the observed differences in compact_migrate_scanned and
compact_free_scanned were lost in the noise.

Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: David Rientjes <[email protected]>
---
mm/compaction.c | 48 +++++++++++++++++++++++++-----------------------
1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 4a14084..c326607 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -261,17 +261,31 @@ void reset_isolation_suitable(pg_data_t *pgdat)
}
}

+static inline void
+update_cached_migrate_pfn(struct zone *zone, unsigned long pfn,
+ enum migrate_mode mode)
+{
+ if (pfn > zone->compact_cached_migrate_pfn[0])
+ zone->compact_cached_migrate_pfn[0] = pfn;
+ if (mode != MIGRATE_ASYNC &&
+ pfn > zone->compact_cached_migrate_pfn[1])
+ zone->compact_cached_migrate_pfn[1] = pfn;
+}
+
+static inline void
+update_cached_free_pfn(struct zone *zone, unsigned long pfn)
+{
+ if (pfn < zone->compact_cached_free_pfn)
+ zone->compact_cached_free_pfn = pfn;
+}
+
/*
* If no pages were isolated then mark this pageblock to be skipped in the
* future. The information is later cleared by __reset_isolation_suitable().
*/
static void update_pageblock_skip(struct compact_control *cc,
- struct page *page, unsigned long nr_isolated,
- bool migrate_scanner)
+ struct page *page, unsigned long nr_isolated)
{
- struct zone *zone = cc->zone;
- unsigned long pfn;
-
if (cc->ignore_skip_hint)
return;

@@ -282,20 +296,6 @@ static void update_pageblock_skip(struct compact_control *cc,
return;

set_pageblock_skip(page);
-
- pfn = page_to_pfn(page);
-
- /* Update where async and sync compaction should restart */
- if (migrate_scanner) {
- if (pfn > zone->compact_cached_migrate_pfn[0])
- zone->compact_cached_migrate_pfn[0] = pfn;
- if (cc->mode != MIGRATE_ASYNC &&
- pfn > zone->compact_cached_migrate_pfn[1])
- zone->compact_cached_migrate_pfn[1] = pfn;
- } else {
- if (pfn < zone->compact_cached_free_pfn)
- zone->compact_cached_free_pfn = pfn;
- }
}
#else
static inline bool isolation_suitable(struct compact_control *cc,
@@ -305,8 +305,7 @@ static inline bool isolation_suitable(struct compact_control *cc,
}

static void update_pageblock_skip(struct compact_control *cc,
- struct page *page, unsigned long nr_isolated,
- bool migrate_scanner)
+ struct page *page, unsigned long nr_isolated)
{
}
#endif /* CONFIG_COMPACTION */
@@ -540,7 +539,7 @@ isolate_fail:

/* Update the pageblock-skip if the whole pageblock was scanned */
if (blockpfn == end_pfn)
- update_pageblock_skip(cc, valid_page, total_isolated, false);
+ update_pageblock_skip(cc, valid_page, total_isolated);

count_compact_events(COMPACTFREE_SCANNED, nr_scanned);
if (total_isolated)
@@ -843,7 +842,7 @@ isolate_success:
* if the whole pageblock was scanned without isolating any page.
*/
if (low_pfn == end_pfn)
- update_pageblock_skip(cc, valid_page, nr_isolated, true);
+ update_pageblock_skip(cc, valid_page, nr_isolated);

trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
nr_scanned, nr_isolated);
@@ -1043,6 +1042,7 @@ static void isolate_freepages(struct compact_control *cc)
* and the loop terminated due to isolate_start_pfn < low_pfn
*/
cc->free_pfn = isolate_start_pfn;
+ update_cached_free_pfn(zone, isolate_start_pfn);
}

/*
@@ -1177,6 +1177,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
acct_isolated(zone, cc);
/* Record where migration scanner will be restarted. */
cc->migrate_pfn = low_pfn;
+ update_cached_migrate_pfn(zone, low_pfn, cc->mode);
+

return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
}
--
2.1.4

2015-06-10 18:03:16

by Rik van Riel

Subject: Re: [PATCH 1/6] mm, compaction: more robust check for scanners meeting

On 06/10/2015 05:32 AM, Vlastimil Babka wrote:
> Compaction should finish when the migration and free scanner meet, i.e. they
> reach the same pageblock. Currently however, the test in compact_finished()
> simply just compares the exact pfns, which may yield a false negative when the
> free scanner position is in the middle of a pageblock and the migration scanner
> reaches the beginning of the same pageblock.
>
> This hasn't been a problem until commit e14c720efdd7 ("mm, compaction: remember
> position within pageblock in free pages scanner") allowed the free scanner
> position to be in the middle of a pageblock between invocations. The hot-fix
> 1d5bfe1ffb5b ("mm, compaction: prevent infinite loop in compact_zone")
> prevented the issue by adding a special check in the migration scanner to
> satisfy the current detection of scanners meeting.
>
> However, the proper fix is to make the detection more robust. This patch
> introduces the compact_scanners_met() function that returns true when the free
> scanner position is in the same or lower pageblock than the migration scanner.
> The special case in isolate_migratepages() introduced by 1d5bfe1ffb5b is
> removed.
>
> Suggested-by: Joonsoo Kim <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> Cc: Michal Nazarewicz <[email protected]>
> Cc: Naoya Horiguchi <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: David Rientjes <[email protected]>

Acked-by: Rik van Riel <[email protected]>


--
All rights reversed

2015-06-12 09:55:42

by Michal Nazarewicz

Subject: Re: [PATCH 1/6] mm, compaction: more robust check for scanners meeting

On Wed, Jun 10 2015, Vlastimil Babka wrote:
> Compaction should finish when the migration and free scanner meet, i.e. they
> reach the same pageblock. Currently however, the test in compact_finished()
> simply just compares the exact pfns, which may yield a false negative when the
> free scanner position is in the middle of a pageblock and the migration scanner
> reaches the beginning of the same pageblock.
>
> This hasn't been a problem until commit e14c720efdd7 ("mm, compaction: remember
> position within pageblock in free pages scanner") allowed the free scanner
> position to be in the middle of a pageblock between invocations. The hot-fix
> 1d5bfe1ffb5b ("mm, compaction: prevent infinite loop in compact_zone")
> prevented the issue by adding a special check in the migration scanner to
> satisfy the current detection of scanners meeting.
>
> However, the proper fix is to make the detection more robust. This patch
> introduces the compact_scanners_met() function that returns true when the free
> scanner position is in the same or lower pageblock than the migration scanner.
> The special case in isolate_migratepages() introduced by 1d5bfe1ffb5b is
> removed.
>
> Suggested-by: Joonsoo Kim <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> Cc: Michal Nazarewicz <[email protected]>

Acked-by: Michal Nazarewicz <[email protected]>

> Cc: Naoya Horiguchi <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: David Rientjes <[email protected]>
> ---
> mm/compaction.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 16e1b57..d46aaeb 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -902,6 +902,16 @@ static bool suitable_migration_target(struct page *page)
> }
>
> /*
> + * Test whether the free scanner has reached the same or lower pageblock than
> + * the migration scanner, and compaction should thus terminate.
> + */
> +static inline bool compact_scanners_met(struct compact_control *cc)
> +{
> + return (cc->free_pfn >> pageblock_order)
> + <= (cc->migrate_pfn >> pageblock_order);
> +}
> +
> +/*
> * Based on information in the current compact_control, find blocks
> * suitable for isolating free pages from and then isolate them.
> */
> @@ -1131,12 +1141,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> }
>
> acct_isolated(zone, cc);
> - /*
> - * Record where migration scanner will be restarted. If we end up in
> - * the same pageblock as the free scanner, make the scanners fully
> - * meet so that compact_finished() terminates compaction.
> - */
> - cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
> + /* Record where migration scanner will be restarted. */
> + cc->migrate_pfn = low_pfn;
>
> return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
> }
> @@ -1151,7 +1157,7 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
> return COMPACT_PARTIAL;
>
> /* Compaction run completes if the migrate and free scanner meet */
> - if (cc->free_pfn <= cc->migrate_pfn) {
> + if (compact_scanners_met(cc)) {
> /* Let the next compaction start anew. */
> zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
> zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> @@ -1380,7 +1386,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> * migrate_pages() may return -ENOMEM when scanners meet
> * and we want compact_finished() to detect it
> */
> - if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
> + if (err == -ENOMEM && !compact_scanners_met(cc)) {
> ret = COMPACT_PARTIAL;
> goto out;
> }
> --
> 2.1.4
>

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +--<[email protected]>--<xmpp:[email protected]>--ooO--(_)--Ooo--

2015-06-12 10:07:42

by Michal Nazarewicz

Subject: Re: [PATCH 3/6] mm, compaction: encapsulate resetting cached scanner positions

On Wed, Jun 10 2015, Vlastimil Babka wrote:
> Resetting the cached compaction scanner positions is now done implicitly in
> __reset_isolation_suitable() and compact_finished(). Encapsulate the
> functionality in a new function reset_cached_positions() and call it explicitly
> where needed.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> Cc: Michal Nazarewicz <[email protected]>

Acked-by: Michal Nazarewicz <[email protected]>

> Cc: Naoya Horiguchi <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: David Rientjes <[email protected]>
> ---
> mm/compaction.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 7e0a814..d334bb3 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -207,6 +207,13 @@ static inline bool isolation_suitable(struct compact_control *cc,
> return !get_pageblock_skip(page);
> }
>
> +static void reset_cached_positions(struct zone *zone)
> +{
> + zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
> + zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> + zone->compact_cached_free_pfn = zone_end_pfn(zone);
> +}
> +
> /*
> * This function is called to clear all cached information on pageblocks that
> * should be skipped for page isolation when the migrate and free page scanner
> @@ -218,9 +225,6 @@ static void __reset_isolation_suitable(struct zone *zone)
> unsigned long end_pfn = zone_end_pfn(zone);
> unsigned long pfn;
>
> - zone->compact_cached_migrate_pfn[0] = start_pfn;
> - zone->compact_cached_migrate_pfn[1] = start_pfn;
> - zone->compact_cached_free_pfn = end_pfn;
> zone->compact_blockskip_flush = false;
>
> /* Walk the zone and mark every pageblock as suitable for isolation */
> @@ -250,8 +254,10 @@ void reset_isolation_suitable(pg_data_t *pgdat)
> continue;
>
> /* Only flush if a full compaction finished recently */
> - if (zone->compact_blockskip_flush)
> + if (zone->compact_blockskip_flush) {
> __reset_isolation_suitable(zone);
> + reset_cached_positions(zone);
> + }
> }
> }
>
> @@ -1164,9 +1170,7 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
> /* Compaction run completes if the migrate and free scanner meet */
> if (compact_scanners_met(cc)) {
> /* Let the next compaction start anew. */
> - zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
> - zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> - zone->compact_cached_free_pfn = zone_end_pfn(zone);
> + reset_cached_positions(zone);
>
> /*
> * Mark that the PG_migrate_skip information should be cleared
> @@ -1329,8 +1333,10 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> * is about to be retried after being deferred. kswapd does not do
> * this reset as it'll reset the cached information when going to sleep.
> */
> - if (compaction_restarting(zone, cc->order) && !current_is_kswapd())
> + if (compaction_restarting(zone, cc->order) && !current_is_kswapd()) {
> __reset_isolation_suitable(zone);
> + reset_cached_positions(zone);
> + }
>
> /*
> * Setup to move all movable pages to the end of the zone. Used cached
> --
> 2.1.4
>

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +--<[email protected]>--<xmpp:[email protected]>--ooO--(_)--Ooo--

2015-06-12 10:11:44

by Michal Nazarewicz

Subject: Re: [PATCH 4/6] mm, compaction: always skip compound pages by order in migrate scanner

On Wed, Jun 10 2015, Vlastimil Babka wrote:
> The compaction migrate scanner tries to skip compound pages by their order, to
> reduce number of iterations for pages it cannot isolate. The check is only done
> if PageLRU() is true, which means it applies to THP pages, but not e.g.
> hugetlbfs pages or any other non-LRU compound pages, which we have to iterate
> by base pages.
>
> This limitation comes from the assumption that it's only safe to read
> compound_order() when we have the zone's lru_lock and THP cannot be split under
> us. But the only danger (after filtering out order values that are not below
> MAX_ORDER, to prevent overflows) is that we skip too much or too little after
> reading a bogus compound_order() due to a rare race. This is the same reasoning
> as patch 99c0fd5e51c4 ("mm, compaction: skip buddy pages by their order in the
> migrate scanner") introduced for unsafely reading PageBuddy() order.
>
> After this patch, all pages are tested for PageCompound() and we skip them by
> compound_order(). The test is done after the test for balloon_page_movable()
> as we don't want to assume if balloon pages (or other pages with own isolation
> and migration implementation if a generic API gets implemented) are compound
> or not.
>
> When tested with stress-highalloc from mmtests on 4GB system with 1GB hugetlbfs
> pages, the vmstat compact_migrate_scanned count decreased by 15%.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> Cc: Michal Nazarewicz <[email protected]>

Acked-by: Michal Nazarewicz <[email protected]>

> Cc: Naoya Horiguchi <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: David Rientjes <[email protected]>
> ---
> mm/compaction.c | 36 +++++++++++++++++-------------------
> 1 file changed, 17 insertions(+), 19 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index d334bb3..e37d361 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -680,6 +680,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>
> /* Time to isolate some pages for migration */
> for (; low_pfn < end_pfn; low_pfn++) {
> + bool is_lru;
> +
> /*
> * Periodically drop the lock (if held) regardless of its
> * contention, to give chance to IRQs. Abort async compaction
> @@ -723,39 +725,35 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> * It's possible to migrate LRU pages and balloon pages
> * Skip any other type of page
> */
> - if (!PageLRU(page)) {
> + is_lru = PageLRU(page);
> + if (!is_lru) {
> if (unlikely(balloon_page_movable(page))) {
> if (balloon_page_isolate(page)) {
> /* Successfully isolated */
> goto isolate_success;
> }
> }
> - continue;
> }
>
> /*
> - * PageLRU is set. lru_lock normally excludes isolation
> - * splitting and collapsing (collapsing has already happened
> - * if PageLRU is set) but the lock is not necessarily taken
> - * here and it is wasteful to take it just to check transhuge.
> - * Check PageCompound without lock and skip the whole pageblock
> - * if it's a transhuge page, as calling compound_order()
> - * without preventing THP from splitting the page underneath us
> - * may return surprising results.
> - * If we happen to check a THP tail page, compound_order()
> - * returns 0. It should be rare enough to not bother with
> - * using compound_head() in that case.
> + * Regardless of being on LRU, compound pages such as THP and
> + * hugetlbfs are not to be compacted. We can potentially save
> + * a lot of iterations if we skip them at once. The check is
> + * racy, but we can consider only valid values and the only
> + * danger is skipping too much.
> */
> if (PageCompound(page)) {
> - int nr;
> - if (locked)
> - nr = 1 << compound_order(page);
> - else
> - nr = pageblock_nr_pages;
> - low_pfn += nr - 1;
> + unsigned int comp_order = compound_order(page);
> +
> + if (comp_order > 0 && comp_order < MAX_ORDER)
> + low_pfn += (1UL << comp_order) - 1;
> +
> continue;
> }
>
> + if (!is_lru)
> + continue;
> +
> /*
> * Migration will fail if an anonymous page is pinned in memory,
> * so avoid taking lru_lock and isolating it unnecessarily in an
> --
> 2.1.4
>

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +--<[email protected]>--<xmpp:[email protected]>--ooO--(_)--Ooo--

2015-06-12 10:18:15

by Michal Nazarewicz

Subject: Re: [PATCH 5/6] mm, compaction: skip compound pages by order in free scanner

On Wed, Jun 10 2015, Vlastimil Babka wrote:
> The compaction free scanner is looking for PageBuddy() pages and skipping all
> others. For large compound pages such as THP or hugetlbfs, we can save a lot
> of iterations if we skip them at once using their compound_order(). This is
> generally unsafe and we can read a bogus value of order due to a race, but if
> we are careful, the only danger is skipping too much.
>
> When tested with stress-highalloc from mmtests on 4GB system with 1GB hugetlbfs
> pages, the vmstat compact_free_scanned count decreased by at least 15%.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> Cc: Michal Nazarewicz <[email protected]>

Acked-by: Michal Nazarewicz <[email protected]>

> Cc: Naoya Horiguchi <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: David Rientjes <[email protected]>
> ---
> mm/compaction.c | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index e37d361..4a14084 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -437,6 +437,24 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>
> if (!valid_page)
> valid_page = page;
> +
> + /*
> + * For compound pages such as THP and hugetlbfs, we can save
> + * potentially a lot of iterations if we skip them at once.
> + * The check is racy, but we can consider only valid values
> + * and the only danger is skipping too much.
> + */
> + if (PageCompound(page)) {
> + unsigned int comp_order = compound_order(page);
> +
> + if (comp_order > 0 && comp_order < MAX_ORDER) {

+ if (comp_order < MAX_ORDER) {

Might produce shorter/faster code. Dunno. Maybe. So many
micro-optimisations. Applies to the previous patch as well.

> + blockpfn += (1UL << comp_order) - 1;
> + cursor += (1UL << comp_order) - 1;
> + }
> +
> + goto isolate_fail;
> + }
> +
> if (!PageBuddy(page))
> goto isolate_fail;
>
> @@ -496,6 +514,13 @@ isolate_fail:
>
> }
>
> + /*
> + * There is a tiny chance that we have read bogus compound_order(),
> + * so be careful to not go outside of the pageblock.
> + */
> + if (unlikely(blockpfn > end_pfn))
> + blockpfn = end_pfn;
> +
> trace_mm_compaction_isolate_freepages(*start_pfn, blockpfn,
> nr_scanned, total_isolated);
>
> --
> 2.1.4
>

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +--<[email protected]>--<xmpp:[email protected]>--ooO--(_)--Ooo--

2015-06-16 05:35:45

by Joonsoo Kim

Subject: Re: [PATCH 1/6] mm, compaction: more robust check for scanners meeting

On Wed, Jun 10, 2015 at 11:32:29AM +0200, Vlastimil Babka wrote:
> Compaction should finish when the migration and free scanner meet, i.e. they
> reach the same pageblock. Currently however, the test in compact_finished()
> simply just compares the exact pfns, which may yield a false negative when the
> free scanner position is in the middle of a pageblock and the migration scanner
> reaches the beginning of the same pageblock.
>
> This hasn't been a problem until commit e14c720efdd7 ("mm, compaction: remember
> position within pageblock in free pages scanner") allowed the free scanner
> position to be in the middle of a pageblock between invocations. The hot-fix
> 1d5bfe1ffb5b ("mm, compaction: prevent infinite loop in compact_zone")
> prevented the issue by adding a special check in the migration scanner to
> satisfy the current detection of scanners meeting.
>
> However, the proper fix is to make the detection more robust. This patch
> introduces the compact_scanners_met() function that returns true when the free
> scanner position is in the same or lower pageblock than the migration scanner.
> The special case in isolate_migratepages() introduced by 1d5bfe1ffb5b is
> removed.
>
> Suggested-by: Joonsoo Kim <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>

Acked-by: Joonsoo Kim <[email protected]>

Thanks.

2015-06-16 05:36:10

by Joonsoo Kim

Subject: Re: [PATCH 2/6] mm, compaction: simplify handling restart position in free pages scanner

On Wed, Jun 10, 2015 at 11:32:30AM +0200, Vlastimil Babka wrote:
> Handling the position where compaction free scanner should restart (stored in
> cc->free_pfn) got more complex with commit e14c720efdd7 ("mm, compaction:
> remember position within pageblock in free pages scanner"). Currently the
> position is updated in each loop iteration of isolate_freepages(), although it
> should be enough to update it only when breaking from the loop. There's also
> an extra check outside the loop updates the position in case we have met the
> migration scanner.
>
> This can be simplified if we move the test for having isolated enough from the
> for loop header next to the test for contention, and determining the restart
> position only in these cases. We can reuse the isolate_start_pfn variable for
> this instead of setting cc->free_pfn directly. Outside the loop, we can simply
> set cc->free_pfn to current value of isolate_start_pfn without any extra check.
>
> Also add a VM_BUG_ON to catch possible mistake in the future, in case we later
> add a new condition that terminates isolate_freepages_block() prematurely
> without also considering the condition in isolate_freepages().
>
> Signed-off-by: Vlastimil Babka <[email protected]>

Acked-by: Joonsoo Kim <[email protected]>

Thanks.

2015-06-16 05:39:14

by Joonsoo Kim

Subject: Re: [PATCH 3/6] mm, compaction: encapsulate resetting cached scanner positions

On Wed, Jun 10, 2015 at 11:32:31AM +0200, Vlastimil Babka wrote:
> Resetting the cached compaction scanner positions is now done implicitly in
> __reset_isolation_suitable() and compact_finished(). Encapsulate the
> functionality in a new function reset_cached_positions() and call it explicitly
> where needed.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> Cc: Michal Nazarewicz <[email protected]>
> Cc: Naoya Horiguchi <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: David Rientjes <[email protected]>
> ---
> mm/compaction.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 7e0a814..d334bb3 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -207,6 +207,13 @@ static inline bool isolation_suitable(struct compact_control *cc,
> return !get_pageblock_skip(page);
> }
>
> +static void reset_cached_positions(struct zone *zone)
> +{
> + zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
> + zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> + zone->compact_cached_free_pfn = zone_end_pfn(zone);
> +}
> +
> /*
> * This function is called to clear all cached information on pageblocks that
> * should be skipped for page isolation when the migrate and free page scanner
> @@ -218,9 +225,6 @@ static void __reset_isolation_suitable(struct zone *zone)
> unsigned long end_pfn = zone_end_pfn(zone);
> unsigned long pfn;
>
> - zone->compact_cached_migrate_pfn[0] = start_pfn;
> - zone->compact_cached_migrate_pfn[1] = start_pfn;
> - zone->compact_cached_free_pfn = end_pfn;
> zone->compact_blockskip_flush = false;

Is there a valid reason not to call reset_cached_positions() in
__reset_isolation_suitable? You missed one callsite in
__compact_pgdat().

	if (cc->order == -1)
		__reset_isolation_suitable(zone);

This also needs reset_cached_positions().
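
Presumably something like this (just a sketch, matching what your patch does
for the other callsites):

	if (cc->order == -1) {
		__reset_isolation_suitable(zone);
		reset_cached_positions(zone);
	}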

Thanks.

2015-06-16 05:42:36

by Joonsoo Kim

Subject: Re: [PATCH 4/6] mm, compaction: always skip compound pages by order in migrate scanner

On Wed, Jun 10, 2015 at 11:32:32AM +0200, Vlastimil Babka wrote:
> The compaction migrate scanner tries to skip compound pages by their order, to
> reduce number of iterations for pages it cannot isolate. The check is only done
> if PageLRU() is true, which means it applies to THP pages, but not e.g.
> hugetlbfs pages or any other non-LRU compound pages, which we have to iterate
> by base pages.
>
> This limitation comes from the assumption that it's only safe to read
> compound_order() when we have the zone's lru_lock and THP cannot be split under
> us. But the only danger (after filtering out order values that are not below
> MAX_ORDER, to prevent overflows) is that we skip too much or too little after
> reading a bogus compound_order() due to a rare race. This is the same reasoning
> as patch 99c0fd5e51c4 ("mm, compaction: skip buddy pages by their order in the
> migrate scanner") introduced for unsafely reading PageBuddy() order.
>
> After this patch, all pages are tested for PageCompound() and we skip them by
> compound_order(). The test is done after the test for balloon_page_movable()
> as we don't want to assume if balloon pages (or other pages with own isolation
> and migration implementation if a generic API gets implemented) are compound
> or not.
>
> When tested with stress-highalloc from mmtests on 4GB system with 1GB hugetlbfs
> pages, the vmstat compact_migrate_scanned count decreased by 15%.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> Cc: Michal Nazarewicz <[email protected]>
> Cc: Naoya Horiguchi <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: David Rientjes <[email protected]>
> ---
> mm/compaction.c | 36 +++++++++++++++++-------------------
> 1 file changed, 17 insertions(+), 19 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index d334bb3..e37d361 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -680,6 +680,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>
> /* Time to isolate some pages for migration */
> for (; low_pfn < end_pfn; low_pfn++) {
> + bool is_lru;
> +
> /*
> * Periodically drop the lock (if held) regardless of its
> * contention, to give chance to IRQs. Abort async compaction
> @@ -723,39 +725,35 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> * It's possible to migrate LRU pages and balloon pages
> * Skip any other type of page
> */
> - if (!PageLRU(page)) {
> + is_lru = PageLRU(page);
> + if (!is_lru) {
> if (unlikely(balloon_page_movable(page))) {
> if (balloon_page_isolate(page)) {
> /* Successfully isolated */
> goto isolate_success;
> }
> }
> - continue;
> }
>
> /*
> - * PageLRU is set. lru_lock normally excludes isolation
> - * splitting and collapsing (collapsing has already happened
> - * if PageLRU is set) but the lock is not necessarily taken
> - * here and it is wasteful to take it just to check transhuge.
> - * Check PageCompound without lock and skip the whole pageblock
> - * if it's a transhuge page, as calling compound_order()
> - * without preventing THP from splitting the page underneath us
> - * may return surprising results.
> - * If we happen to check a THP tail page, compound_order()
> - * returns 0. It should be rare enough to not bother with
> - * using compound_head() in that case.
> + * Regardless of being on LRU, compound pages such as THP and
> + * hugetlbfs are not to be compacted. We can potentially save
> + * a lot of iterations if we skip them at once. The check is
> + * racy, but we can consider only valid values and the only
> + * danger is skipping too much.
> */
> if (PageCompound(page)) {
> - int nr;
> - if (locked)
> - nr = 1 << compound_order(page);
> - else
> - nr = pageblock_nr_pages;
> - low_pfn += nr - 1;
> + unsigned int comp_order = compound_order(page);
> +
> + if (comp_order > 0 && comp_order < MAX_ORDER)
> + low_pfn += (1UL << comp_order) - 1;
> +
> continue;
> }

How about moving this PageCompound() check up, next to the PageLRU check?
Is there any relationship between balloon pages and PageCompound()?
It would remove is_lru and the code would be more understandable.

Otherwise,

Acked-by: Joonsoo Kim <[email protected]>

Thanks.

2015-06-16 05:43:06

by Joonsoo Kim

Subject: Re: [PATCH 5/6] mm, compaction: skip compound pages by order in free scanner

On Wed, Jun 10, 2015 at 11:32:33AM +0200, Vlastimil Babka wrote:
> The compaction free scanner is looking for PageBuddy() pages and skipping all
> others. For large compound pages such as THP or hugetlbfs, we can save a lot
> of iterations if we skip them at once using their compound_order(). This is
> generally unsafe and we can read a bogus value of order due to a race, but if
> we are careful, the only danger is skipping too much.
>
> When tested with stress-highalloc from mmtests on 4GB system with 1GB hugetlbfs
> pages, the vmstat compact_free_scanned count decreased by at least 15%.
>
> Signed-off-by: Vlastimil Babka <[email protected]>


Acked-by: Joonsoo Kim <[email protected]>

Thanks.

2015-06-16 06:08:12

by Joonsoo Kim

Subject: Re: [PATCH 6/6] mm, compaction: decouple updating pageblock_skip and cached pfn

On Wed, Jun 10, 2015 at 11:32:34AM +0200, Vlastimil Babka wrote:
> The pageblock_skip bitmap and cached scanner pfn's are two mechanisms in
> compaction to prevent rescanning pages where isolation has recently failed
> or they were scanned during the previous compaction attempt.
>
> Currently, both kinds of information are updated via update_pageblock_skip(),
> which is suboptimal for the cached scanner pfn's:
>
> - The condition "isolation has failed in the pageblock" checked by
> update_pageblock_skip() may be valid for the pageblock_skip bitmap, but makes
> less sense for cached pfn's. There's little point for the next compaction
> attempt to scan again a pageblock where all pages that could be isolated were
> already processed.

In async compaction, compaction can be stopped due to cc->contended in the
freepage scanner, so sometimes the isolated pages are not migrated. Your
change makes the next async compaction skip these pages. This could cause
async compaction to complete prematurely.

Also, rescanning a previously attempted range can solve some races.
If a newly allocated page is waiting in a pagevec to have PageLRU set,
compaction will pass over it. If we rescan after a short time, the page will
have PageLRU set and compaction can isolate and migrate it and form a
high-order freepage. This requires some rescanning overhead, but the
additional migration overhead, which is what costs more than scanning, is
just a little. If compaction passes over it, as with this change, pages in
this area would be allocated to other requestors, and when compaction
revisits, there would be more pages to migrate.

I basically agree with this change because it is more intuitive. But I'd
like to see some improvement results, or to test this patch myself, before
merging it.

Thanks.

>
> - whole pageblocks can be skipped at the level of isolate_migratepages() or
> isolate_freepages() before going into the corresponding _block() function.
> Not updating cached scanner positions at the higher level may again result
> in extra iterations.
>
> This patch moves updating cached scanner pfn's from update_pageblock_skip()
> to dedicated functions, which are called directly from isolate_migratepages()
> and isolate_freepages(), resolving both inefficiencies.
>
> During testing, the observed differences in compact_migrate_scanned and
> compact_free_scanned were lost in the noise.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> Cc: Michal Nazarewicz <[email protected]>
> Cc: Naoya Horiguchi <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: David Rientjes <[email protected]>
> ---
> mm/compaction.c | 48 +++++++++++++++++++++++++-----------------------
> 1 file changed, 25 insertions(+), 23 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 4a14084..c326607 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -261,17 +261,31 @@ void reset_isolation_suitable(pg_data_t *pgdat)
> }
> }
>
> +static inline void
> +update_cached_migrate_pfn(struct zone *zone, unsigned long pfn,
> + enum migrate_mode mode)
> +{
> + if (pfn > zone->compact_cached_migrate_pfn[0])
> + zone->compact_cached_migrate_pfn[0] = pfn;
> + if (mode != MIGRATE_ASYNC &&
> + pfn > zone->compact_cached_migrate_pfn[1])
> + zone->compact_cached_migrate_pfn[1] = pfn;
> +}
> +
> +static inline void
> +update_cached_free_pfn(struct zone *zone, unsigned long pfn)
> +{
> + if (pfn < zone->compact_cached_free_pfn)
> + zone->compact_cached_free_pfn = pfn;
> +}
> +
> /*
> * If no pages were isolated then mark this pageblock to be skipped in the
> * future. The information is later cleared by __reset_isolation_suitable().
> */
> static void update_pageblock_skip(struct compact_control *cc,
> - struct page *page, unsigned long nr_isolated,
> - bool migrate_scanner)
> + struct page *page, unsigned long nr_isolated)
> {
> - struct zone *zone = cc->zone;
> - unsigned long pfn;
> -
> if (cc->ignore_skip_hint)
> return;
>
> @@ -282,20 +296,6 @@ static void update_pageblock_skip(struct compact_control *cc,
> return;
>
> set_pageblock_skip(page);
> -
> - pfn = page_to_pfn(page);
> -
> - /* Update where async and sync compaction should restart */
> - if (migrate_scanner) {
> - if (pfn > zone->compact_cached_migrate_pfn[0])
> - zone->compact_cached_migrate_pfn[0] = pfn;
> - if (cc->mode != MIGRATE_ASYNC &&
> - pfn > zone->compact_cached_migrate_pfn[1])
> - zone->compact_cached_migrate_pfn[1] = pfn;
> - } else {
> - if (pfn < zone->compact_cached_free_pfn)
> - zone->compact_cached_free_pfn = pfn;
> - }
> }
> #else
> static inline bool isolation_suitable(struct compact_control *cc,
> @@ -305,8 +305,7 @@ static inline bool isolation_suitable(struct compact_control *cc,
> }
>
> static void update_pageblock_skip(struct compact_control *cc,
> - struct page *page, unsigned long nr_isolated,
> - bool migrate_scanner)
> + struct page *page, unsigned long nr_isolated)
> {
> }
> #endif /* CONFIG_COMPACTION */
> @@ -540,7 +539,7 @@ isolate_fail:
>
> /* Update the pageblock-skip if the whole pageblock was scanned */
> if (blockpfn == end_pfn)
> - update_pageblock_skip(cc, valid_page, total_isolated, false);
> + update_pageblock_skip(cc, valid_page, total_isolated);
>
> count_compact_events(COMPACTFREE_SCANNED, nr_scanned);
> if (total_isolated)
> @@ -843,7 +842,7 @@ isolate_success:
> * if the whole pageblock was scanned without isolating any page.
> */
> if (low_pfn == end_pfn)
> - update_pageblock_skip(cc, valid_page, nr_isolated, true);
> + update_pageblock_skip(cc, valid_page, nr_isolated);
>
> trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
> nr_scanned, nr_isolated);
> @@ -1043,6 +1042,7 @@ static void isolate_freepages(struct compact_control *cc)
> * and the loop terminated due to isolate_start_pfn < low_pfn
> */
> cc->free_pfn = isolate_start_pfn;
> + update_cached_free_pfn(zone, isolate_start_pfn);
> }
>
> /*
> @@ -1177,6 +1177,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> acct_isolated(zone, cc);
> /* Record where migration scanner will be restarted. */
> cc->migrate_pfn = low_pfn;
> + update_cached_migrate_pfn(zone, low_pfn, cc->mode);
> +
>
> return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
> }
> --
> 2.1.4
>

2015-06-16 12:13:40

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 3/6] mm, compaction: encapsulate resetting cached scanner positions

On 06/16/2015 07:41 AM, Joonsoo Kim wrote:
> On Wed, Jun 10, 2015 at 11:32:31AM +0200, Vlastimil Babka wrote:
>> Resetting the cached compaction scanner positions is now done implicitly in
>> __reset_isolation_suitable() and compact_finished(). Encapsulate the
>> functionality in a new function reset_cached_positions() and call it explicitly
>> where needed.
>>
>> Signed-off-by: Vlastimil Babka <[email protected]>
>> Cc: Minchan Kim <[email protected]>
>> Cc: Mel Gorman <[email protected]>
>> Cc: Joonsoo Kim <[email protected]>
>> Cc: Michal Nazarewicz <[email protected]>
>> Cc: Naoya Horiguchi <[email protected]>
>> Cc: Christoph Lameter <[email protected]>
>> Cc: Rik van Riel <[email protected]>
>> Cc: David Rientjes <[email protected]>
>> ---
>> mm/compaction.c | 22 ++++++++++++++--------
>> 1 file changed, 14 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 7e0a814..d334bb3 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -207,6 +207,13 @@ static inline bool isolation_suitable(struct compact_control *cc,
>> return !get_pageblock_skip(page);
>> }
>>
>> +static void reset_cached_positions(struct zone *zone)
>> +{
>> + zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
>> + zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
>> + zone->compact_cached_free_pfn = zone_end_pfn(zone);
>> +}
>> +
>> /*
>> * This function is called to clear all cached information on pageblocks that
>> * should be skipped for page isolation when the migrate and free page scanner
>> @@ -218,9 +225,6 @@ static void __reset_isolation_suitable(struct zone *zone)
>> unsigned long end_pfn = zone_end_pfn(zone);
>> unsigned long pfn;
>>
>> - zone->compact_cached_migrate_pfn[0] = start_pfn;
>> - zone->compact_cached_migrate_pfn[1] = start_pfn;
>> - zone->compact_cached_free_pfn = end_pfn;
>> zone->compact_blockskip_flush = false;
>
> Is there a valid reason not to call reset_cached_positions() in
> __reset_isolation_suitable?

Hm, maybe not, except that it would be somewhat implicit again. It might
have had a stronger reason in the previous patchset.

> You missed one callsite in
> __compact_pgdat().
>
> if (cc->order == -1)
> __reset_isolation_suitable(zone);
>
> This also needs reset_cached_positions().

Ah, good catch. Thanks.
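
For reference, one way the fixed-up callsite could look (just a sketch; the
surrounding __compact_pgdat() code is as quoted above):

	if (cc->order == -1) {
		__reset_isolation_suitable(zone);
		reset_cached_positions(zone);
	}

Whether it ends up like this or folded into __reset_isolation_suitable() is
the open question above.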

>
> Thanks.
>

2015-06-16 12:17:03

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 4/6] mm, compaction: always skip compound pages by order in migrate scanner

On 06/16/2015 07:44 AM, Joonsoo Kim wrote:
> On Wed, Jun 10, 2015 at 11:32:32AM +0200, Vlastimil Babka wrote:

[...]

>> @@ -723,39 +725,35 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> * It's possible to migrate LRU pages and balloon pages
>> * Skip any other type of page
>> */
>> - if (!PageLRU(page)) {
>> + is_lru = PageLRU(page);
>> + if (!is_lru) {
>> if (unlikely(balloon_page_movable(page))) {
>> if (balloon_page_isolate(page)) {
>> /* Successfully isolated */
>> goto isolate_success;
>> }
>> }
>> - continue;
>> }
>>
>> /*
>> - * PageLRU is set. lru_lock normally excludes isolation
>> - * splitting and collapsing (collapsing has already happened
>> - * if PageLRU is set) but the lock is not necessarily taken
>> - * here and it is wasteful to take it just to check transhuge.
>> - * Check PageCompound without lock and skip the whole pageblock
>> - * if it's a transhuge page, as calling compound_order()
>> - * without preventing THP from splitting the page underneath us
>> - * may return surprising results.
>> - * If we happen to check a THP tail page, compound_order()
>> - * returns 0. It should be rare enough to not bother with
>> - * using compound_head() in that case.
>> + * Regardless of being on LRU, compound pages such as THP and
>> + * hugetlbfs are not to be compacted. We can potentially save
>> + * a lot of iterations if we skip them at once. The check is
>> + * racy, but we can consider only valid values and the only
>> + * danger is skipping too much.
>> */
>> if (PageCompound(page)) {
>> - int nr;
>> - if (locked)
>> - nr = 1 << compound_order(page);
>> - else
>> - nr = pageblock_nr_pages;
>> - low_pfn += nr - 1;
>> + unsigned int comp_order = compound_order(page);
>> +
>> + if (comp_order > 0 && comp_order < MAX_ORDER)
>> + low_pfn += (1UL << comp_order) - 1;
>> +
>> continue;
>> }
>
> How about moving this PageCompound() check up above the PageLRU check?
> Is there any relationship between balloon pages and PageCompound()?

I didn't want to assume whether there's a relationship or not, as per the changelog:

>> After this patch, all pages are tested for PageCompound() and we skip them by
>> compound_order(). The test is done after the test for balloon_page_movable()
>> as we don't want to assume if balloon pages (or other pages with own isolation
>> and migration implementation if a generic API gets implemented) are compound
>> or not.

> It would remove is_lru and the code would be more understandable.

Right, it just felt safer and more future-proof this way.
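
For comparison, the reordering suggested above would look roughly like this
(a sketch only, and it bakes in the assumption that balloon pages are never
compound):

	if (PageCompound(page)) {
		unsigned int comp_order = compound_order(page);

		if (comp_order > 0 && comp_order < MAX_ORDER)
			low_pfn += (1UL << comp_order) - 1;

		continue;
	}

	if (!PageLRU(page)) {
		if (unlikely(balloon_page_movable(page)) &&
		    balloon_page_isolate(page)) {
			/* Successfully isolated */
			goto isolate_success;
		}
		continue;
	}

As said, the patch keeps the balloon check first instead, so no such
assumption is needed.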

> Otherwise,
>
> Acked-by: Joonsoo Kim <[email protected]>
>
> Thanks.
>

2015-06-16 12:33:15

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 6/6] mm, compaction: decouple updating pageblock_skip and cached pfn

On 06/16/2015 08:10 AM, Joonsoo Kim wrote:
> On Wed, Jun 10, 2015 at 11:32:34AM +0200, Vlastimil Babka wrote:
>> The pageblock_skip bitmap and cached scanner pfn's are two mechanisms in
>> compaction to prevent rescanning pages where isolation has recently failed
>> or they were scanned during the previous compaction attempt.
>>
>> Currently, both kinds of information are updated via update_pageblock_skip(),
>> which is suboptimal for the cached scanner pfn's:
>>
>> - The condition "isolation has failed in the pageblock" checked by
>> update_pageblock_skip() may be valid for the pageblock_skip bitmap, but makes
>> less sense for cached pfn's. There's little point for the next compaction
>> attempt to scan again a pageblock where all pages that could be isolated were
>> already processed.
>
> In async compaction, compaction could be stopped due to cc->contended
> in the freepage scanner, so sometimes isolated pages are not migrated. Your
> change makes the next async compaction skip these pages. This could cause
> async compaction to complete prematurely.

Hm, I see, thanks. That could be fixed when returning the non-migrated pages,
just like we do for the unused freepages and cached free scanner position.
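
Something along these lines, perhaps (a rough sketch; where exactly this
would hook into the abort path is an assumption, and the helper name is
made up):

	static void rewind_cached_migrate_pfn(struct zone *zone,
					      struct compact_control *cc)
	{
		struct page *page;
		unsigned long pfn = cc->migrate_pfn;

		/* Find the lowest pfn among pages isolated but not migrated. */
		list_for_each_entry(page, &cc->migratepages, lru)
			pfn = min(pfn, page_to_pfn(page));

		if (pfn < zone->compact_cached_migrate_pfn[0])
			zone->compact_cached_migrate_pfn[0] = pfn;
		if (cc->mode != MIGRATE_ASYNC &&
		    pfn < zone->compact_cached_migrate_pfn[1])
			zone->compact_cached_migrate_pfn[1] = pfn;
	}

It would mirror update_cached_migrate_pfn() from this patch, just moving the
cached position backwards instead of forwards.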

> Also, rescanning the previously attempted range could solve some race
> problems. If an allocated page is waiting in a pagevec to have PageLRU set,
> compaction will pass over it. If we rescan after a short time, the page will
> have PageLRU and compaction can isolate and migrate it and make a high-order
> freepage. This requires some rescanning overhead, but the extra migration
> overhead, which is much costlier than scanning, is just a little. If
> compaction passes the page over, as with this change, pages in this area
> would be allocated to other requestors, and when compaction revisits, there
> would be more pages to migrate.

The same "race problem" (and many others) can happen when we don't abort and
later restart from cached pfn's, but just continue on to later pageblocks
within a single compaction run. Still, I would expect a statistically higher
chance of success in the next pageblock than from rescanning pageblock(s) we
just scanned.

> I basically agree with this change because it is more intuitive. But
> I'd like to see some improvement results, or to test this patch myself,
> before merging it.

Sure, please test. I don't expect much difference; the primary motivation was
really that the pfn's recorded by tracepoints looked much saner.

Thanks for the review!

2015-06-16 13:03:34

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH 6/6] mm, compaction: decouple updating pageblock_skip and cached pfn

2015-06-16 21:33 GMT+09:00 Vlastimil Babka <[email protected]>:
> On 06/16/2015 08:10 AM, Joonsoo Kim wrote:
>> On Wed, Jun 10, 2015 at 11:32:34AM +0200, Vlastimil Babka wrote:
>>> The pageblock_skip bitmap and cached scanner pfn's are two mechanisms in
>>> compaction to prevent rescanning pages where isolation has recently failed
>>> or they were scanned during the previous compaction attempt.
>>>
>>> Currently, both kinds of information are updated via update_pageblock_skip(),
>>> which is suboptimal for the cached scanner pfn's:
>>>
>>> - The condition "isolation has failed in the pageblock" checked by
>>> update_pageblock_skip() may be valid for the pageblock_skip bitmap, but makes
>>> less sense for cached pfn's. There's little point for the next compaction
>>> attempt to scan again a pageblock where all pages that could be isolated were
>>> already processed.
>>
>> In async compaction, compaction could be stopped due to cc->contended
>> in the freepage scanner, so sometimes isolated pages are not migrated. Your
>> change makes the next async compaction skip these pages. This could cause
>> async compaction to complete prematurely.
>
> Hm, I see, thanks. That could be fixed when returning the non-migrated pages,
> just like we do for the unused freepages and cached free scanner position.

Yes, that would work.

>> Also, rescanning the previously attempted range could solve some race
>> problems. If an allocated page is waiting in a pagevec to have PageLRU set,
>> compaction will pass over it. If we rescan after a short time, the page will
>> have PageLRU and compaction can isolate and migrate it and make a high-order
>> freepage. This requires some rescanning overhead, but the extra migration
>> overhead, which is much costlier than scanning, is just a little. If
>> compaction passes the page over, as with this change, pages in this area
>> would be allocated to other requestors, and when compaction revisits, there
>> would be more pages to migrate.
>
> The same "race problem" (and many others) can happen when we don't abort and
> later restart from cached pfn's, but just continue on to later pageblocks
> within a single compaction run. Still, I would expect a statistically higher
> chance of success in the next pageblock than from rescanning pageblock(s) we
> just scanned.

If we consider just one compaction attempt, yes, scanning the next pageblock
is the more promising way to succeed. But if we rescan the pageblock that we
just scanned and, fortunately, succeed in migrating pages and making a
high-order freepage from it, more compaction attempts can succeed before both
scanners meet, and this may result in less overhead over all compaction
attempts.

>> I basically agree with this change because it is more intuitive. But
>> I'd like to see some improvement results, or to test this patch myself,
>> before merging it.
>
> Sure, please test. I don't expect much difference; the primary motivation was
> really that the pfn's recorded by tracepoints looked much saner.

Yes, that's what I meant by *intuitive*. :)

Thanks.

2015-06-19 13:41:54

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 1/6] mm, compaction: more robust check for scanners meeting

On Wed, Jun 10, 2015 at 11:32:29AM +0200, Vlastimil Babka wrote:
> Compaction should finish when the migration and free scanner meet, i.e. they
> reach the same pageblock. Currently however, the test in compact_finished()
> simply just compares the exact pfns, which may yield a false negative when the
> free scanner position is in the middle of a pageblock and the migration scanner
> reaches the beginning of the same pageblock.
>
> This hasn't been a problem until commit e14c720efdd7 ("mm, compaction: remember
> position within pageblock in free pages scanner") allowed the free scanner
> position to be in the middle of a pageblock between invocations. The hot-fix
> 1d5bfe1ffb5b ("mm, compaction: prevent infinite loop in compact_zone")
> prevented the issue by adding a special check in the migration scanner to
> satisfy the current detection of scanners meeting.
>
> However, the proper fix is to make the detection more robust. This patch
> introduces the compact_scanners_met() function that returns true when the free
> scanner position is in the same or lower pageblock than the migration scanner.
> The special case in isolate_migratepages() introduced by 1d5bfe1ffb5b is
> removed.
>
> Suggested-by: Joonsoo Kim <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> Cc: Michal Nazarewicz <[email protected]>
> Cc: Naoya Horiguchi <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: David Rientjes <[email protected]>

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs
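
(For readers without the patch body at hand, the check described in the
changelog boils down to comparing pageblock indexes, roughly:

	static inline bool compact_scanners_met(struct compact_control *cc)
	{
		return (cc->free_pfn >> pageblock_order)
			<= (cc->migrate_pfn >> pageblock_order);
	}

i.e. the scanners are considered to have met as soon as the free scanner's
position falls into the same or a lower pageblock than the migrate scanner's.)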

2015-06-19 13:48:36

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 2/6] mm, compaction: simplify handling restart position in free pages scanner

On Wed, Jun 10, 2015 at 11:32:30AM +0200, Vlastimil Babka wrote:
> Handling the position where compaction free scanner should restart (stored in
> cc->free_pfn) got more complex with commit e14c720efdd7 ("mm, compaction:
> remember position within pageblock in free pages scanner"). Currently the
> position is updated in each loop iteration of isolate_freepages(), although it
> should be enough to update it only when breaking from the loop. There's also
> an extra check outside the loop updates the position in case we have met the
> migration scanner.
>
> This can be simplified if we move the test for having isolated enough from the
> for loop header next to the test for contention, and determining the restart
> position only in these cases. We can reuse the isolate_start_pfn variable for
> this instead of setting cc->free_pfn directly. Outside the loop, we can simply
> set cc->free_pfn to current value of isolate_start_pfn without any extra check.
>
> Also add a VM_BUG_ON to catch possible mistake in the future, in case we later
> add a new condition that terminates isolate_freepages_block() prematurely
> without also considering the condition in isolate_freepages().
>
> Signed-off-by: Vlastimil Babka <[email protected]>

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs

2015-06-19 13:58:32

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 4/6] mm, compaction: always skip compound pages by order in migrate scanner

On Wed, Jun 10, 2015 at 11:32:32AM +0200, Vlastimil Babka wrote:
> The compaction migrate scanner tries to skip compound pages by their order, to
> reduce number of iterations for pages it cannot isolate. The check is only done
> if PageLRU() is true, which means it applies to THP pages, but not e.g.
> hugetlbfs pages or any other non-LRU compound pages, which we have to iterate
> by base pages.
>
> This limitation comes from the assumption that it's only safe to read
> compound_order() when we have the zone's lru_lock and THP cannot be split under
> us. But the only danger (after filtering out order values that are not below
> MAX_ORDER, to prevent overflows) is that we skip too much or too little after
> reading a bogus compound_order() due to a rare race. This is the same reasoning
> as patch 99c0fd5e51c4 ("mm, compaction: skip buddy pages by their order in the
> migrate scanner") introduced for unsafely reading PageBuddy() order.
>
> After this patch, all pages are tested for PageCompound() and we skip them by
> compound_order(). The test is done after the test for balloon_page_movable()
> as we don't want to assume if balloon pages (or other pages with own isolation
> and migration implementation if a generic API gets implemented) are compound
> or not.
>
> When tested with stress-highalloc from mmtests on 4GB system with 1GB hugetlbfs
> pages, the vmstat compact_migrate_scanned count decreased by 15%.
>
> Signed-off-by: Vlastimil Babka <[email protected]>

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs