2024-02-14 22:07:19

by Zi Yan

Subject: [PATCH v5 0/3] Enable >0 order folio memory compaction

From: Zi Yan <[email protected]>

Hi all,

This patchset enables >0 order folio memory compaction, which is one of
the prerequisites for large folio support[1]. It is on top of
mm-everything-2024-02-14-21-42.

I am aware that splitting free pages is necessary for folio
migration in compaction, since if >0 order free pages are never split
and no order-0 free page is scanned, compaction will end prematurely
because migration returns -ENOMEM. Free page split thus becomes a must
instead of an optimization.

lkp ncompare results (on an 8-CPU (Intel Xeon E5-2650 v4 @2.20GHz) 16G VM)
for the default LRU (-no-mglru) and CONFIG_LRU_GEN are shown at the bottom,
copied from V3[4].
In sum, most vm-scalability applications see no performance change, and
the others see a ~4% to ~26% performance boost under the default LRU and
a ~2% to ~6% performance boost under CONFIG_LRU_GEN.


Changelog
===

From V4 [5]:
1. Refactored code in compaction_alloc() in Patch 3 (per Yu Zhao).


From V3 [4]:
1. Restructured isolate_migratepages_block() to minimize PageHuge() use
in Patch 1 (per Vlastimil Babka).

2. Used folio_put_testzero() instead of folio_set_count() to properly
handle free pages in compaction_free() (per Vlastimil Babka).

3. Simplified code to use struct list_head instead of a new struct page_list
(per Vlastimil Babka).

4. Restructured compaction_alloc() code to reduce indentation and
increase readability (per Vlastimil Babka).


From V2 [3]:
1. Added the missing free page count in the fast isolation path. This
fixed the weird performance outcome.


From V1 [2]:
1. Used folio_test_large() instead of folio_order() > 0. (per Matthew
Wilcox)

2. Fixed code rebase error. (per Baolin Wang)

3. Used list_splice_init() instead of list_splice(). (per Ryan Roberts)

4. Added free_pages_prepare_fpi_none() to avoid duplicate free page code
in compaction_free().

5. Dropped source page order sorting patch.


From RFC [1]:
1. Enabled >0 order folio compaction in the first patch by splitting all
to-be-migrated folios. (per Huang, Ying)

2. Stopped isolating compound pages with order greater than cc->order
to avoid wasting effort, since cc->order hints that no free pages with
order greater than it exist, so migrating such compound pages would fail.
(per Baolin Wang)

3. Retained the folio check within the lru lock. (per Baolin Wang)

4. Made isolate_freepages_block() generate multiple order-sorted lists.
(per Johannes Weiner)

Overview
===

To support >0 order folio compaction, the patchset changes how free pages
used for migration are kept during compaction. Free pages used to be split
into order-0 pages that were post-allocation processed (i.e., PageBuddy flag
cleared, page order stored in page->private zeroed, and page reference set
to 1). Now all free pages are kept in a NR_PAGE_ORDERS array of page lists
based on their order, without post-allocation processing. When
migrate_pages() asks for a new page, one of the free pages, based on the
requested page order, is then processed and given out. THPs smaller than
2MB would also need this feature.
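
A minimal user-space model of the new bookkeeping, to illustrate the idea
(fake_page, keep_free_page(), and take_free_page() are made-up stand-ins,
not kernel code; the real implementation is in patch 2 below):

#include <stdio.h>

#define NR_PAGE_ORDERS 11       /* MAX_PAGE_ORDER + 1 on a typical config */

struct fake_page {
        int order;              /* the kernel keeps this in page->private */
        struct fake_page *next;
};

static struct fake_page *freepages[NR_PAGE_ORDERS];

/* Keep an isolated free page at its original size, unprocessed. */
static void keep_free_page(struct fake_page *page)
{
        page->next = freepages[page->order];
        freepages[page->order] = page;
}

/* Hand out a free page of exactly the requested order, or NULL. */
static struct fake_page *take_free_page(int order)
{
        struct fake_page *page = freepages[order];

        if (page)
                freepages[order] = page->next;
        return page;            /* post-allocation processing happens here */
}

int main(void)
{
        struct fake_page a = { .order = 2 };

        keep_free_page(&a);
        printf("order-2 request: %s\n", take_free_page(2) ? "hit" : "miss");
        /* without free page split, an order-0 request misses (-> -ENOMEM) */
        printf("order-0 request: %s\n", take_free_page(0) ? "hit" : "miss");
        return 0;
}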


Feel free to give comments and ask questions.

Thanks.

[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/
[3] https://lore.kernel.org/linux-mm/[email protected]/
[4] https://lore.kernel.org/linux-mm/[email protected]/
[5] https://lore.kernel.org/linux-mm/[email protected]/


Hi Andrew,

Baolin's patch on nr_migratepages was based on this series; a better fixup
for it might be the one below, since before my patchset compaction only
dealt with order-0 pages.

diff --git a/mm/compaction.c b/mm/compaction.c
index 01ec85cfd623f..e60135e2019d6 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1798,7 +1798,7 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
dst = list_entry(cc->freepages.next, struct folio, lru);
list_del(&dst->lru);
cc->nr_freepages--;
- cc->nr_migratepages -= 1 << order;
+ cc->nr_migratepages--;

return dst;
}
@@ -1814,7 +1814,7 @@ static void compaction_free(struct folio *dst, unsigned long data)

list_add(&dst->lru, &cc->freepages);
cc->nr_freepages++;
- cc->nr_migratepages += 1 << order;
+ cc->nr_migratepages++;
}


vm-scalability results on CONFIG_LRU_GEN
===

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-13/defconfig/debian/300s/qemu-vm/mmap-xread-seq-mt/vm-scalability

commit:
6.8.0-rc1-mm-everything-2024-01-29-07-19+
6.8.0-rc1-split-folio-in-compaction+
6.8.0-rc1-folio-migration-in-compaction+
6.8.0-rc1-folio-migration-free-page-split+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
15107616 +3.2% 15590339 +1.3% 15297619 +3.0% 15567998 vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-13/defconfig/debian/300s/qemu-vm/mmap-pread-seq/vm-scalability

commit:
6.8.0-rc1-mm-everything-2024-01-29-07-19+
6.8.0-rc1-split-folio-in-compaction+
6.8.0-rc1-folio-migration-in-compaction+
6.8.0-rc1-folio-migration-free-page-split+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
12611785 +1.8% 12832919 +0.9% 12724223 +1.6% 12812682 vm-scalability.throughput


=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-13/defconfig/debian/300s/qemu-vm/lru-file-readtwice/vm-scalability

commit:
6.8.0-rc1-mm-everything-2024-01-29-07-19+
6.8.0-rc1-split-folio-in-compaction+
6.8.0-rc1-folio-migration-in-compaction+
6.8.0-rc1-folio-migration-free-page-split+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
9833393 +5.7% 10390190 +3.0% 10126606 +5.9% 10408804 vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-13/defconfig/debian/300s/qemu-vm/lru-file-mmap-read/vm-scalability

commit:
6.8.0-rc1-mm-everything-2024-01-29-07-19+
6.8.0-rc1-split-folio-in-compaction+
6.8.0-rc1-folio-migration-in-compaction+
6.8.0-rc1-folio-migration-free-page-split+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
7034709 ± 3% +2.9% 7241429 +3.2% 7256680 ± 2% +3.9% 7308375 vm-scalability.throughput



vm-scalability results on default LRU (with -no-mglru suffix)
===

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-13/defconfig/debian/300s/qemu-vm/mmap-xread-seq-mt/vm-scalability

commit:
6.8.0-rc1-mm-everything-2024-01-29-07-19-no-mglru+
6.8.0-rc1-split-folio-in-compaction-no-mglru+
6.8.0-rc1-folio-migration-in-compaction-no-mglru+
6.8.0-rc1-folio-migration-free-page-split-no-mglru+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
14401491 +3.7% 14940270 +2.4% 14748626 +4.0% 14975716 vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-13/defconfig/debian/300s/qemu-vm/mmap-pread-seq/vm-scalability

commit:
6.8.0-rc1-mm-everything-2024-01-29-07-19-no-mglru+
6.8.0-rc1-split-folio-in-compaction-no-mglru+
6.8.0-rc1-folio-migration-in-compaction-no-mglru+
6.8.0-rc1-folio-migration-free-page-split-no-mglru+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
11407497 +5.1% 11989632 -0.5% 11349272 +4.8% 11957423 vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-13/defconfig/debian/300s/qemu-vm/mmap-pread-seq-mt/vm-scalability

commit:
6.8.0-rc1-mm-everything-2024-01-29-07-19-no-mglru+
6.8.0-rc1-split-folio-in-compaction-no-mglru+
6.8.0-rc1-folio-migration-in-compaction-no-mglru+
6.8.0-rc1-folio-migration-free-page-split-no-mglru+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
11348474 +3.3% 11719453 -1.2% 11208759 +3.7% 11771926 vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-13/defconfig/debian/300s/qemu-vm/lru-file-readtwice/vm-scalability

commit:
6.8.0-rc1-mm-everything-2024-01-29-07-19-no-mglru+
6.8.0-rc1-split-folio-in-compaction-no-mglru+
6.8.0-rc1-folio-migration-in-compaction-no-mglru+
6.8.0-rc1-folio-migration-free-page-split-no-mglru+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
8065614 ± 3% +7.7% 8686626 ± 2% +5.0% 8467577 ± 4% +11.8% 9016077 ± 2% vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-13/defconfig/debian/300s/qemu-vm/lru-file-mmap-read/vm-scalability

commit:
6.8.0-rc1-mm-everything-2024-01-29-07-19-no-mglru+
6.8.0-rc1-split-folio-in-compaction-no-mglru+
6.8.0-rc1-folio-migration-in-compaction-no-mglru+
6.8.0-rc1-folio-migration-free-page-split-no-mglru+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
6438422 ± 2% +27.5% 8206734 ± 2% +10.6% 7118390 +26.2% 8127192 ± 4% vm-scalability.throughput

Zi Yan (3):
mm/compaction: enable compacting >0 order folios.
mm/compaction: add support for >0 order folio memory compaction.
mm/compaction: optimize >0 order folio compaction with free page
split.

mm/compaction.c | 228 +++++++++++++++++++++++++++++++++---------------
mm/internal.h | 4 +-
mm/page_alloc.c | 6 ++
3 files changed, 165 insertions(+), 73 deletions(-)

--
2.43.0



2024-02-14 22:07:29

by Zi Yan

Subject: [PATCH v5 2/3] mm/compaction: add support for >0 order folio memory compaction.

From: Zi Yan <[email protected]>

Before the last commit, memory compaction only migrates order-0 folios and
skips >0 order folios. The last commit splits all >0 order folios during
compaction. This commit migrates >0 order folios during compaction by
keeping isolated free pages at their original size, without splitting them
into order-0 pages, and using them directly during the migration process.

What is different from the prior implementation:
1. All isolated free pages are kept in a NR_PAGE_ORDERS array of page
lists, where each page list stores free pages of the same order.
2. Isolated free pages are neither post_alloc_hook() processed nor buddy
pages, although their orders are stored in the first page's private field
like buddy pages.
3. During migration, at new page allocation time (i.e., in
compaction_alloc()), free pages are then processed by post_alloc_hook().
When migration fails and a new page is returned (i.e., in
compaction_free()), free pages are restored by reversing the
post_alloc_hook() operations using the newly added
free_pages_prepare_fpi_none().

Step 3 is done for a later optimization, so that splitting and/or merging
free pages during compaction becomes easier.

Note: without splitting free pages, compaction can end prematurely because
migration returns -ENOMEM even if free pages are available. This happens
when no order-0 free pages exist and compaction_alloc() returns NULL.
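
A minimal user-space model of this allocate/restore pairing (fake_page and
the two stub functions are made-up stand-ins for the kernel calls named
above, not their implementations):

#include <stdio.h>
#include <stdbool.h>

struct fake_page {
        int refcount;
        bool prepped;           /* stand-in for post_alloc_hook() state */
};

/* stand-in for post_alloc_hook(): make a raw free page allocatable */
static void stub_post_alloc_hook(struct fake_page *page)
{
        page->refcount = 1;
        page->prepped = true;
}

/* stand-in for free_pages_prepare_fpi_none(): reverse the above */
static void stub_free_pages_prepare(struct fake_page *page)
{
        page->prepped = false;
}

int main(void)
{
        struct fake_page dst = { 0 };   /* raw page on a free list */

        stub_post_alloc_hook(&dst);     /* compaction_alloc(): hand it out */
        if (--dst.refcount == 0)        /* compaction_free() drops the ref */
                stub_free_pages_prepare(&dst);  /* back to raw state */
        printf("restored to raw free-list state: %s\n",
               dst.prepped ? "no" : "yes");
        return 0;
}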

Signed-off-by: Zi Yan <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Tested-by: Baolin Wang <[email protected]>
Tested-by: Yu Zhao <[email protected]>
Cc: Adam Manzanares <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kemeng Shi <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Vishal Moola (Oracle) <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yin Fengwei <[email protected]>
---
mm/compaction.c | 143 +++++++++++++++++++++++++++---------------------
mm/internal.h | 4 +-
mm/page_alloc.c | 6 ++
3 files changed, 91 insertions(+), 62 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index aa6aad805c4d..d0a05a621b67 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -66,45 +66,56 @@ static inline void count_compact_events(enum vm_event_item item, long delta)
#define COMPACTION_HPAGE_ORDER (PMD_SHIFT - PAGE_SHIFT)
#endif

-static unsigned long release_freepages(struct list_head *freelist)
+static void split_map_pages(struct list_head *freepages)
{
+ unsigned int i, order;
struct page *page, *next;
- unsigned long high_pfn = 0;
+ LIST_HEAD(tmp_list);

- list_for_each_entry_safe(page, next, freelist, lru) {
- unsigned long pfn = page_to_pfn(page);
- list_del(&page->lru);
- __free_page(page);
- if (pfn > high_pfn)
- high_pfn = pfn;
- }
+ for (order = 0; order < NR_PAGE_ORDERS; order++) {
+ list_for_each_entry_safe(page, next, &freepages[order], lru) {
+ unsigned int nr_pages;

- return high_pfn;
+ list_del(&page->lru);
+
+ nr_pages = 1 << order;
+
+ post_alloc_hook(page, order, __GFP_MOVABLE);
+ if (order)
+ split_page(page, order);
+
+ for (i = 0; i < nr_pages; i++) {
+ list_add(&page->lru, &tmp_list);
+ page++;
+ }
+ }
+ list_splice_init(&tmp_list, &freepages[0]);
+ }
}

-static void split_map_pages(struct list_head *list)
+static unsigned long release_free_list(struct list_head *freepages)
{
- unsigned int i, order, nr_pages;
- struct page *page, *next;
- LIST_HEAD(tmp_list);
-
- list_for_each_entry_safe(page, next, list, lru) {
- list_del(&page->lru);
+ int order;
+ unsigned long high_pfn = 0;

- order = page_private(page);
- nr_pages = 1 << order;
+ for (order = 0; order < NR_PAGE_ORDERS; order++) {
+ struct page *page, *next;

- post_alloc_hook(page, order, __GFP_MOVABLE);
- if (order)
- split_page(page, order);
+ list_for_each_entry_safe(page, next, &freepages[order], lru) {
+ unsigned long pfn = page_to_pfn(page);

- for (i = 0; i < nr_pages; i++) {
- list_add(&page->lru, &tmp_list);
- page++;
+ list_del(&page->lru);
+ /*
+ * Convert free pages into post allocation pages, so
+ * that we can free them via __free_page.
+ */
+ post_alloc_hook(page, order, __GFP_MOVABLE);
+ __free_pages(page, order);
+ if (pfn > high_pfn)
+ high_pfn = pfn;
}
}
-
- list_splice(&tmp_list, list);
+ return high_pfn;
}

#ifdef CONFIG_COMPACTION
@@ -657,7 +668,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
nr_scanned += isolated - 1;
total_isolated += isolated;
cc->nr_freepages += isolated;
- list_add_tail(&page->lru, freelist);
+ list_add_tail(&page->lru, &freelist[order]);

if (!strict && cc->nr_migratepages <= cc->nr_freepages) {
blockpfn += isolated;
@@ -722,7 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
unsigned long start_pfn, unsigned long end_pfn)
{
unsigned long isolated, pfn, block_start_pfn, block_end_pfn;
- LIST_HEAD(freelist);
+ int order;
+ struct list_head tmp_freepages[NR_PAGE_ORDERS];
+
+ for (order = 0; order < NR_PAGE_ORDERS; order++)
+ INIT_LIST_HEAD(&tmp_freepages[order]);

pfn = start_pfn;
block_start_pfn = pageblock_start_pfn(pfn);
@@ -753,7 +768,7 @@ isolate_freepages_range(struct compact_control *cc,
break;

isolated = isolate_freepages_block(cc, &isolate_start_pfn,
- block_end_pfn, &freelist, 0, true);
+ block_end_pfn, tmp_freepages, 0, true);

/*
* In strict mode, isolate_freepages_block() returns 0 if
@@ -770,15 +785,15 @@ isolate_freepages_range(struct compact_control *cc,
*/
}

- /* __isolate_free_page() does not map the pages */
- split_map_pages(&freelist);
-
if (pfn < end_pfn) {
/* Loop terminated early, cleanup. */
- release_freepages(&freelist);
+ release_free_list(tmp_freepages);
return 0;
}

+ /* __isolate_free_page() does not map the pages */
+ split_map_pages(tmp_freepages);
+
/* We don't use freelists for anything. */
return pfn;
}
@@ -1494,7 +1509,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn)
if (!page)
return;

- isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
+ isolate_freepages_block(cc, &start_pfn, end_pfn, cc->freepages, 1, false);

/* Skip this pageblock in the future as it's full or nearly full */
if (start_pfn == end_pfn && !cc->no_set_skip_hint)
@@ -1623,7 +1638,7 @@ static void fast_isolate_freepages(struct compact_control *cc)
nr_scanned += nr_isolated - 1;
total_isolated += nr_isolated;
cc->nr_freepages += nr_isolated;
- list_add_tail(&page->lru, &cc->freepages);
+ list_add_tail(&page->lru, &cc->freepages[order]);
count_compact_events(COMPACTISOLATED, nr_isolated);
} else {
/* If isolation fails, abort the search */
@@ -1700,13 +1715,12 @@ static void isolate_freepages(struct compact_control *cc)
unsigned long isolate_start_pfn; /* exact pfn we start at */
unsigned long block_end_pfn; /* end of current pageblock */
unsigned long low_pfn; /* lowest pfn scanner is able to scan */
- struct list_head *freelist = &cc->freepages;
unsigned int stride;

/* Try a small search of the free lists for a candidate */
fast_isolate_freepages(cc);
if (cc->nr_freepages)
- goto splitmap;
+ return;

/*
* Initialise the free scanner. The starting point is where we last
@@ -1766,7 +1780,7 @@ static void isolate_freepages(struct compact_control *cc)

/* Found a block suitable for isolating free pages from. */
nr_isolated = isolate_freepages_block(cc, &isolate_start_pfn,
- block_end_pfn, freelist, stride, false);
+ block_end_pfn, cc->freepages, stride, false);

/* Update the skip hint if the full pageblock was scanned */
if (isolate_start_pfn == block_end_pfn)
@@ -1807,10 +1821,6 @@ static void isolate_freepages(struct compact_control *cc)
* and the loop terminated due to isolate_start_pfn < low_pfn
*/
cc->free_pfn = isolate_start_pfn;
-
-splitmap:
- /* __isolate_free_page() does not map the pages */
- split_map_pages(freelist);
}

/*
@@ -1821,24 +1831,22 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
{
struct compact_control *cc = (struct compact_control *)data;
struct folio *dst;
+ int order = folio_order(src);

- /* this makes migrate_pages() split the source page and retry */
- if (folio_test_large(src) > 0)
- return NULL;
-
- if (list_empty(&cc->freepages)) {
+ if (list_empty(&cc->freepages[order])) {
isolate_freepages(cc);
-
- if (list_empty(&cc->freepages))
+ if (list_empty(&cc->freepages[order]))
return NULL;
}

- dst = list_entry(cc->freepages.next, struct folio, lru);
+ dst = list_first_entry(&cc->freepages[order], struct folio, lru);
list_del(&dst->lru);
- cc->nr_freepages--;
- cc->nr_migratepages -= 1 << folio_order(src);
-
- return dst;
+ post_alloc_hook(&dst->page, order, __GFP_MOVABLE);
+ if (order)
+ prep_compound_page(&dst->page, order);
+ cc->nr_freepages -= 1 << order;
+ cc->nr_migratepages -= 1 << order;
+ return page_rmappable_folio(&dst->page);
}

/*
@@ -1849,10 +1857,22 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
static void compaction_free(struct folio *dst, unsigned long data)
{
struct compact_control *cc = (struct compact_control *)data;
+ int order = folio_order(dst);
+ struct page *page = &dst->page;
+
+ if (folio_put_testzero(dst)) {
+ free_pages_prepare_fpi_none(page, order);
+
+ INIT_LIST_HEAD(&dst->lru);

- list_add(&dst->lru, &cc->freepages);
- cc->nr_freepages++;
- cc->nr_migratepages += 1 << folio_order(dst);
+ list_add(&dst->lru, &cc->freepages[order]);
+ cc->nr_freepages += 1 << order;
+ cc->nr_migratepages += 1 << order;
+ }
+ /*
+ * someone else has referenced the page, we cannot take it back to our
+ * free list.
+ */
}

/* possible outcome of isolate_migratepages */
@@ -2476,6 +2496,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
const bool sync = cc->mode != MIGRATE_ASYNC;
bool update_cached;
unsigned int nr_succeeded = 0;
+ int order;

/*
* These counters track activities during zone compaction. Initialize
@@ -2485,7 +2506,8 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
cc->total_free_scanned = 0;
cc->nr_migratepages = 0;
cc->nr_freepages = 0;
- INIT_LIST_HEAD(&cc->freepages);
+ for (order = 0; order < NR_PAGE_ORDERS; order++)
+ INIT_LIST_HEAD(&cc->freepages[order]);
INIT_LIST_HEAD(&cc->migratepages);

cc->migratetype = gfp_migratetype(cc->gfp_mask);
@@ -2671,7 +2693,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
* so we don't leave any returned pages behind in the next attempt.
*/
if (cc->nr_freepages > 0) {
- unsigned long free_pfn = release_freepages(&cc->freepages);
+ unsigned long free_pfn = release_free_list(cc->freepages);

cc->nr_freepages = 0;
VM_BUG_ON(free_pfn == 0);
@@ -2690,7 +2712,6 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)

trace_mm_compaction_end(cc, start_pfn, end_pfn, sync, ret);

- VM_BUG_ON(!list_empty(&cc->freepages));
VM_BUG_ON(!list_empty(&cc->migratepages));

return ret;
diff --git a/mm/internal.h b/mm/internal.h
index 1e29c5821a1d..9925291e7704 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -447,6 +447,8 @@ extern void prep_compound_page(struct page *page, unsigned int order);

extern void post_alloc_hook(struct page *page, unsigned int order,
gfp_t gfp_flags);
+extern bool free_pages_prepare_fpi_none(struct page *page, unsigned int order);
+
extern int user_min_free_kbytes;

extern void free_unref_page(struct page *page, unsigned int order);
@@ -481,7 +483,7 @@ int split_free_page(struct page *free_page,
* completes when free_pfn <= migrate_pfn
*/
struct compact_control {
- struct list_head freepages; /* List of free pages to migrate to */
+ struct list_head freepages[NR_PAGE_ORDERS]; /* List of free pages to migrate to */
struct list_head migratepages; /* List of pages being migrated */
unsigned int nr_freepages; /* Number of isolated free pages */
unsigned int nr_migratepages; /* Number of pages to migrate */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7ae4b74c9e5c..e6e2ac722a82 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1179,6 +1179,12 @@ static __always_inline bool free_pages_prepare(struct page *page,
return true;
}

+__always_inline bool free_pages_prepare_fpi_none(struct page *page,
+ unsigned int order)
+{
+ return free_pages_prepare(page, order, FPI_NONE);
+}
+
/*
* Frees a number of pages from the PCP lists
* Assumes all pages on list are in same zone.
--
2.43.0


2024-02-14 22:07:29

by Zi Yan

Subject: [PATCH v5 3/3] mm/compaction: optimize >0 order folio compaction with free page split.

From: Zi Yan <[email protected]>

During migration in memory compaction, free pages are placed in an array
of page lists based on their order. But the desired free page order
(i.e., the order of a source page) might not always be present, thus
leading to migration failures and premature compaction termination. Split
a high order free page when the source migration page has a lower order to
increase the migration success rate.

Note: merging free pages when a migration fails and a lower order free
page is returned via compaction_free() is possible, but it is too much
work: since the free pages are not buddy pages, it is hard to identify
them using the existing PFN-based page merging algorithm.
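
For example, taking an order-3 free page for an order-1 request peels an
order-2 and then an order-1 block back onto the free lists. A user-space
model of the same halving loop (the variable names mirror the hunk below):

#include <stdio.h>

int main(void)
{
        int order = 1;                  /* requested (source folio) order */
        int start_order = 3;            /* lowest non-empty free list found */
        unsigned long size = 1UL << start_order;

        while (start_order > order) {
                start_order--;
                size >>= 1;
                /* the upper half at offset "size" goes back on the lists */
                printf("put %lu pages at offset %lu back as order-%d\n",
                       size, size, start_order);
        }
        printf("use the remaining %lu pages as the order-%d destination\n",
               size, order);
        return 0;
}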

Signed-off-by: Zi Yan <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Tested-by: Baolin Wang <[email protected]>
Tested-by: Yu Zhao <[email protected]>
Cc: Adam Manzanares <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kemeng Shi <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Vishal Moola (Oracle) <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yin Fengwei <[email protected]>
---
mm/compaction.c | 35 ++++++++++++++++++++++++++++++-----
1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index d0a05a621b67..b261c5f13bef 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1832,15 +1832,40 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
struct compact_control *cc = (struct compact_control *)data;
struct folio *dst;
int order = folio_order(src);
+ bool has_isolated_pages = false;
+ int start_order;
+ struct page *freepage;
+ unsigned long size;
+
+again:
+ for (start_order = order; start_order < NR_PAGE_ORDERS; start_order++)
+ if (!list_empty(&cc->freepages[start_order]))
+ break;

- if (list_empty(&cc->freepages[order])) {
- isolate_freepages(cc);
- if (list_empty(&cc->freepages[order]))
+ /* no free pages in the list */
+ if (start_order == NR_PAGE_ORDERS) {
+ if (has_isolated_pages)
return NULL;
+ isolate_freepages(cc);
+ has_isolated_pages = true;
+ goto again;
+ }
+
+ freepage = list_first_entry(&cc->freepages[start_order], struct page,
+ lru);
+ size = 1 << start_order;
+
+ list_del(&freepage->lru);
+
+ while (start_order > order) {
+ start_order--;
+ size >>= 1;
+
+ list_add(&freepage[size].lru, &cc->freepages[start_order]);
+ set_page_private(&freepage[size], start_order);
}
+ dst = (struct folio *)freepage;

- dst = list_first_entry(&cc->freepages[order], struct folio, lru);
- list_del(&dst->lru);
post_alloc_hook(&dst->page, order, __GFP_MOVABLE);
if (order)
prep_compound_page(&dst->page, order);
--
2.43.0


2024-02-15 16:40:21

by Vlastimil Babka

Subject: Re: [PATCH v5 2/3] mm/compaction: add support for >0 order folio memory compaction.

On 2/14/24 23:04, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> Before last commit, memory compaction only migrates order-0 folios and
> skips >0 order folios. Last commit splits all >0 order folios during
> compaction. This commit migrates >0 order folios during compaction by
> keeping isolated free pages at their original size without splitting them
> into order-0 pages and using them directly during migration process.
>
> What is different from the prior implementation:
> 1. All isolated free pages are kept in a NR_PAGE_ORDERS array of page
> lists, where each page list stores free pages in the same order.
> 2. All free pages are not post_alloc_hook() processed nor buddy pages,
> although their orders are stored in first page's private like buddy
> pages.
> 3. During migration, in new page allocation time (i.e., in
> compaction_alloc()), free pages are then processed by post_alloc_hook().
> When migration fails and a new page is returned (i.e., in
> compaction_free()), free pages are restored by reversing the
> post_alloc_hook() operations using newly added
> free_pages_prepare_fpi_none().
>
> Step 3 is done for a latter optimization that splitting and/or merging
> free pages during compaction becomes easier.
>
> Note: without splitting free pages, compaction can end prematurely due to
> migration will return -ENOMEM even if there is free pages. This happens
> when no order-0 free page exist and compaction_alloc() return NULL.
>
> Signed-off-by: Zi Yan <[email protected]>
> Reviewed-by: Baolin Wang <[email protected]>
> Tested-by: Baolin Wang <[email protected]>
> Tested-by: Yu Zhao <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

Noticed a possible simplification:

> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -447,6 +447,8 @@ extern void prep_compound_page(struct page *page, unsigned int order);
>
> extern void post_alloc_hook(struct page *page, unsigned int order,
> gfp_t gfp_flags);
> +extern bool free_pages_prepare_fpi_none(struct page *page, unsigned int order);
> +
> extern int user_min_free_kbytes;
>
> extern void free_unref_page(struct page *page, unsigned int order);
> @@ -481,7 +483,7 @@ int split_free_page(struct page *free_page,
> * completes when free_pfn <= migrate_pfn
> */
> struct compact_control {
> - struct list_head freepages; /* List of free pages to migrate to */
> + struct list_head freepages[NR_PAGE_ORDERS]; /* List of free pages to migrate to */
> struct list_head migratepages; /* List of pages being migrated */
> unsigned int nr_freepages; /* Number of isolated free pages */
> unsigned int nr_migratepages; /* Number of pages to migrate */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7ae4b74c9e5c..e6e2ac722a82 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1179,6 +1179,12 @@ static __always_inline bool free_pages_prepare(struct page *page,
> return true;
> }
>
> +__always_inline bool free_pages_prepare_fpi_none(struct page *page,
> + unsigned int order)
> +{
> + return free_pages_prepare(page, order, FPI_NONE);

Seems like free_pages_prepare() currently only passes fpi_flags to
should_skip_kasan_poison() and that ignores them. You could remove the
parameter from both and declare and use free_pages_prepare(page, order)
directly.

> +}
> +
> /*
> * Frees a number of pages from the PCP lists
> * Assumes all pages on list are in same zone.


2024-02-15 16:41:45

by Zi Yan

Subject: Re: [PATCH v5 2/3] mm/compaction: add support for >0 order folio memory compaction.

On 15 Feb 2024, at 11:07, Vlastimil Babka wrote:

> On 2/14/24 23:04, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> Before last commit, memory compaction only migrates order-0 folios and
>> skips >0 order folios. Last commit splits all >0 order folios during
>> compaction. This commit migrates >0 order folios during compaction by
>> keeping isolated free pages at their original size without splitting them
>> into order-0 pages and using them directly during migration process.
>>
>> What is different from the prior implementation:
>> 1. All isolated free pages are kept in a NR_PAGE_ORDERS array of page
>> lists, where each page list stores free pages in the same order.
>> 2. All free pages are not post_alloc_hook() processed nor buddy pages,
>> although their orders are stored in first page's private like buddy
>> pages.
>> 3. During migration, in new page allocation time (i.e., in
>> compaction_alloc()), free pages are then processed by post_alloc_hook().
>> When migration fails and a new page is returned (i.e., in
>> compaction_free()), free pages are restored by reversing the
>> post_alloc_hook() operations using newly added
>> free_pages_prepare_fpi_none().
>>
>> Step 3 is done for a latter optimization that splitting and/or merging
>> free pages during compaction becomes easier.
>>
>> Note: without splitting free pages, compaction can end prematurely due to
>> migration will return -ENOMEM even if there is free pages. This happens
>> when no order-0 free page exist and compaction_alloc() return NULL.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> Reviewed-by: Baolin Wang <[email protected]>
>> Tested-by: Baolin Wang <[email protected]>
>> Tested-by: Yu Zhao <[email protected]>
>
> Reviewed-by: Vlastimil Babka <[email protected]>
Thanks.

>
> Noticed a possible simplification:
>
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -447,6 +447,8 @@ extern void prep_compound_page(struct page *page, unsigned int order);
>>
>> extern void post_alloc_hook(struct page *page, unsigned int order,
>> gfp_t gfp_flags);
>> +extern bool free_pages_prepare_fpi_none(struct page *page, unsigned int order);
>> +
>> extern int user_min_free_kbytes;
>>
>> extern void free_unref_page(struct page *page, unsigned int order);
>> @@ -481,7 +483,7 @@ int split_free_page(struct page *free_page,
>> * completes when free_pfn <= migrate_pfn
>> */
>> struct compact_control {
>> - struct list_head freepages; /* List of free pages to migrate to */
>> + struct list_head freepages[NR_PAGE_ORDERS]; /* List of free pages to migrate to */
>> struct list_head migratepages; /* List of pages being migrated */
>> unsigned int nr_freepages; /* Number of isolated free pages */
>> unsigned int nr_migratepages; /* Number of pages to migrate */
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 7ae4b74c9e5c..e6e2ac722a82 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1179,6 +1179,12 @@ static __always_inline bool free_pages_prepare(struct page *page,
>> return true;
>> }
>>
>> +__always_inline bool free_pages_prepare_fpi_none(struct page *page,
>> + unsigned int order)
>> +{
>> + return free_pages_prepare(page, order, FPI_NONE);
>
> Seems like free_pages_prepare() currently only passes fpi_flags to
> should_skip_kasan_poison() and that ignores them. You could remove the
> parameter from both and declare and use free_pages_prepare(page, order)
> directly.

Got it. I first thought of sending a cleanup patch after this series, but
to avoid unnecessary code churn, it is better to put the cleanup patch
before this series and use free_pages_prepare() directly. Will do it in v6.

>> +}
>> +
>> /*
>> * Frees a number of pages from the PCP lists
>> * Assumes all pages on list are in same zone.


--
Best Regards,
Yan, Zi



2024-02-15 16:52:35

by Vlastimil Babka

Subject: Re: [PATCH v5 3/3] mm/compaction: optimize >0 order folio compaction with free page split.

On 2/14/24 23:04, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> During migration in a memory compaction, free pages are placed in an array
> of page lists based on their order. But the desired free page order
> (i.e., the order of a source page) might not be always present, thus
> leading to migration failures and premature compaction termination. Split
> a high order free pages when source migration page has a lower order to
> increase migration successful rate.
>
> Note: merging free pages when a migration fails and a lower order free
> page is returned via compaction_free() is possible, but there is too much
> work. Since the free pages are not buddy pages, it is hard to identify
> these free pages using existing PFN-based page merging algorithm.
>
> Signed-off-by: Zi Yan <[email protected]>
> Reviewed-by: Baolin Wang <[email protected]>
> Tested-by: Baolin Wang <[email protected]>
> Tested-by: Yu Zhao <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

Thanks!


2024-02-15 16:59:02

by Vlastimil Babka

Subject: Re: [PATCH v5 2/3] mm/compaction: add support for >0 order folio memory compaction.

On 2/14/24 23:04, Zi Yan wrote:
> @@ -1849,10 +1857,22 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
> static void compaction_free(struct folio *dst, unsigned long data)
> {
> struct compact_control *cc = (struct compact_control *)data;
> + int order = folio_order(dst);
> + struct page *page = &dst->page;
> +
> + if (folio_put_testzero(dst)) {
> + free_pages_prepare_fpi_none(page, order);
> +
> + INIT_LIST_HEAD(&dst->lru);

(is this even needed? I think the first parameter of list_add() is
never expected to be in any particular state?)

>
> - list_add(&dst->lru, &cc->freepages);
> - cc->nr_freepages++;
> - cc->nr_migratepages += 1 << folio_order(dst);
> + list_add(&dst->lru, &cc->freepages[order]);
> + cc->nr_freepages += 1 << order;
> + cc->nr_migratepages += 1 << order;

Hm actually this increment of nr_migratepages should happen even if we lost
the free page.

> + }
> + /*
> + * someone else has referenced the page, we cannot take it back to our
> + * free list.
> + */
> }


2024-02-15 17:56:54

by Zi Yan

Subject: Re: [PATCH v5 2/3] mm/compaction: add support for >0 order folio memory compaction.

On 15 Feb 2024, at 11:57, Vlastimil Babka wrote:

> On 2/14/24 23:04, Zi Yan wrote:
>> @@ -1849,10 +1857,22 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
>> static void compaction_free(struct folio *dst, unsigned long data)
>> {
>> struct compact_control *cc = (struct compact_control *)data;
>> + int order = folio_order(dst);
>> + struct page *page = &dst->page;
>> +
>> + if (folio_put_testzero(dst)) {
>> + free_pages_prepare_fpi_none(page, order);
>> +
>> + INIT_LIST_HEAD(&dst->lru);
>
> (is this even needed? I think the state of first parameter of list_add() is
> never expected to be in particular state?)

There is a __list_add_valid() performing list corruption checks.
>
>>
>> - list_add(&dst->lru, &cc->freepages);
>> - cc->nr_freepages++;
>> - cc->nr_migratepages += 1 << folio_order(dst);
>> + list_add(&dst->lru, &cc->freepages[order]);
>> + cc->nr_freepages += 1 << order;
>> + cc->nr_migratepages += 1 << order;
>
> Hm actually this increment of nr_migratepages should happen even if we lost
> the free page.

Right, compaction_free() being called means the page was not migrated, so
nr_migratepages should be increased regardless.

Will fix it. Thanks.
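
Something like this (untested sketch, comments are mine):

static void compaction_free(struct folio *dst, unsigned long data)
{
        struct compact_control *cc = (struct compact_control *)data;
        int order = folio_order(dst);
        struct page *page = &dst->page;

        if (folio_put_testzero(dst)) {
                free_pages_prepare_fpi_none(page, order);
                INIT_LIST_HEAD(&dst->lru);
                list_add(&dst->lru, &cc->freepages[order]);
                cc->nr_freepages += 1 << order;
        }
        /* the src folio was not migrated, count it regardless */
        cc->nr_migratepages += 1 << order;
        /*
         * If someone else referenced the page, we cannot take it back to
         * our free list, but it still was not migrated.
         */
}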

>> + }
>> + /*
>> + * someone else has referenced the page, we cannot take it back to our
>> + * free list.
>> + */
>> }


--
Best Regards,
Yan, Zi



2024-02-15 20:03:08

by Vlastimil Babka

Subject: Re: [PATCH v5 2/3] mm/compaction: add support for >0 order folio memory compaction.

On 2/15/24 18:32, Zi Yan wrote:
> On 15 Feb 2024, at 11:57, Vlastimil Babka wrote:
>
>> On 2/14/24 23:04, Zi Yan wrote:
>>> @@ -1849,10 +1857,22 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
>>> static void compaction_free(struct folio *dst, unsigned long data)
>>> {
>>> struct compact_control *cc = (struct compact_control *)data;
>>> + int order = folio_order(dst);
>>> + struct page *page = &dst->page;
>>> +
>>> + if (folio_put_testzero(dst)) {
>>> + free_pages_prepare_fpi_none(page, order);
>>> +
>>> + INIT_LIST_HEAD(&dst->lru);
>>
>> (is this even needed? I think the state of first parameter of list_add() is
>> never expected to be in particular state?)
>
> There is a __list_add_valid() performing list corruption checks.

Yes, but dst->lru becomes the "new" entry in list_add() and
__list_add_valid(), and those never check the contents of new, i.e.
new->next or new->prev. We could even have done list_del(&dst->lru), which
puts poison values there, and a subsequent list_add() would still be fine.
So dst->lru does not need the init; it's just confusing. Init is for the
list's list_head, not for the list entry.
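
A minimal user-space model of the relevant list semantics (my own
re-implementation sketch, not the kernel's <linux/list.h>):

#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *head)
{
        head->next = head->prev = head;
}

static void list_add(struct list_head *new, struct list_head *head)
{
        /* new->next and new->prev are only written, never read */
        head->next->prev = new;
        new->next = head->next;
        new->prev = head;
        head->next = new;
}

int main(void)
{
        struct list_head head;
        struct list_head entry = { 0 }; /* stale entry, never initialized */

        INIT_LIST_HEAD(&head);          /* only the list head needs init */
        list_add(&entry, &head);
        printf("entry linked after head: %s\n",
               head.next == &entry && entry.prev == &head ? "yes" : "no");
        return 0;
}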

>>>
>>> - list_add(&dst->lru, &cc->freepages);
>>> - cc->nr_freepages++;
>>> - cc->nr_migratepages += 1 << folio_order(dst);
>>> + list_add(&dst->lru, &cc->freepages[order]);
>>> + cc->nr_freepages += 1 << order;
>>> + cc->nr_migratepages += 1 << order;
>>
>> Hm actually this increment of nr_migratepages should happen even if we lost
>> the free page.
>
> Because compaction_free() indicates the page is not migrated and nr_migratepages
> should be increased regardless.

Yes.

> Will fix it. Thanks.
>
>>> + }
>>> + /*
>>> + * someone else has referenced the page, we cannot take it back to our
>>> + * free list.
>>> + */
>>> }
>
>
> --
> Best Regards,
> Yan, Zi


2024-02-15 20:04:22

by Zi Yan

Subject: Re: [PATCH v5 2/3] mm/compaction: add support for >0 order folio memory compaction.

On 15 Feb 2024, at 15:02, Vlastimil Babka wrote:

> On 2/15/24 18:32, Zi Yan wrote:
>> On 15 Feb 2024, at 11:57, Vlastimil Babka wrote:
>>
>>> On 2/14/24 23:04, Zi Yan wrote:
>>>> @@ -1849,10 +1857,22 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
>>>> static void compaction_free(struct folio *dst, unsigned long data)
>>>> {
>>>> struct compact_control *cc = (struct compact_control *)data;
>>>> + int order = folio_order(dst);
>>>> + struct page *page = &dst->page;
>>>> +
>>>> + if (folio_put_testzero(dst)) {
>>>> + free_pages_prepare_fpi_none(page, order);
>>>> +
>>>> + INIT_LIST_HEAD(&dst->lru);
>>>
>>> (is this even needed? I think the state of first parameter of list_add() is
>>> never expected to be in particular state?)
>>
>> There is a __list_add_valid() performing list corruption checks.
>
> Yes, but dst->lru becomes "new" in list_add() and __list_add_valid() and
> those never check the contents of new, i.e. new->next or new->prev. We could
> have done list_del(&dst->lru) which puts poison values there and then a
> list_add() is fine. So dst->lru does not need the init, it's just confusing.
> Init is for the list's list_head, not for the list entry.

Got it. Will remove it.

>>>>
>>>> - list_add(&dst->lru, &cc->freepages);
>>>> - cc->nr_freepages++;
>>>> - cc->nr_migratepages += 1 << folio_order(dst);
>>>> + list_add(&dst->lru, &cc->freepages[order]);
>>>> + cc->nr_freepages += 1 << order;
>>>> + cc->nr_migratepages += 1 << order;
>>>
>>> Hm actually this increment of nr_migratepages should happen even if we lost
>>> the free page.
>>
>> Because compaction_free() indicates the page is not migrated and nr_migratepages
>> should be increased regardless.
>
> Yes.
>
>> Will fix it. Thanks.
>>
>>>> + }
>>>> + /*
>>>> + * someone else has referenced the page, we cannot take it back to our
>>>> + * free list.
>>>> + */
>>>> }
>>
>>
>> --
>> Best Regards,
>> Yan, Zi


--
Best Regards,
Yan, Zi

