2021-12-09 23:04:39

by Zi Yan

[permalink] [raw]
Subject: [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment.

From: Zi Yan <[email protected]>

Hi all,

This patchset tries to remove the MAX_ORDER - 1 alignment requirement for CMA
and alloc_contig_range(). It prepares for my upcoming changes to make MAX_ORDER
adjustable at boot time[1].

The MAX_ORDER - 1 alignment requirement comes from the fact that
alloc_contig_range() isolates pageblocks to remove free memory from the buddy
allocator, but isolating only a subset of the pageblocks within a page that
spans multiple pageblocks causes free page accounting issues. An isolated page
might not be put into the right free list, since the code assumes the
migratetype of the first pageblock is the migratetype of the whole free page.
This is based on the discussion at [2].
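
For example, if a free page spans two pageblocks (say an order-10 page with
pageblock_order = 9) and only its second pageblock is isolated, the whole page
still ends up on the free list of the first pageblock's migratetype once it is
freed, so the isolated pageblock is never accounted under MIGRATE_ISOLATE.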

To remove the requirement, this patchset:
1. still isolates pageblocks at MAX_ORDER - 1 granularity;
2. but saves the pageblock migratetypes outside the specified range of
alloc_contig_range() and restores them after all pages within the range
become free after __alloc_contig_migrate_range();
3. splits free pages spanning multiple pageblocks at the beginning and the end
of the range and puts the split pages to the right migratetype free lists
based on the pageblock migratetypes;
4. returns pages not in the range as it did before this patch.
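
Condensed, the resulting alloc_contig_range() flow (implemented in patch 4)
looks roughly like the sketch below. This is a simplified illustration only:
error handling, locking and the exact bookkeeping are omitted, and the helper
names are the ones introduced by the patch (save_migratetypes(),
restore_migratetypes(), split_free_page_into_pageblocks()):

        /* MAX_ORDER-1 aligned range used for isolation vs. the pageblock
         * aligned range that is actually allocated from
         */
        isolate_start = pfn_max_align_down(start);
        isolate_end   = pfn_max_align_up(end);
        alloc_start   = ALIGN_DOWN(start, pageblock_nr_pages);
        alloc_end     = ALIGN(end, pageblock_nr_pages);

        /* 1. remember migratetypes of pageblocks outside [alloc_start, alloc_end) */
        num = save_migratetypes(saved_mt, isolate_start, alloc_start);
        save_migratetypes(&saved_mt[num], alloc_end, isolate_end);

        /* 2. isolate at MAX_ORDER-1 granularity and migrate pages away */
        start_isolate_page_range(isolate_start, isolate_end, migratetype, 0);
        __alloc_contig_migrate_range(&cc, start, end);

        /* 3. restore the saved migratetypes of the pageblocks outside the range */
        num = restore_migratetypes(saved_mt, isolate_start, alloc_start);
        restore_migratetypes(&saved_mt[num], alloc_end, isolate_end);

        /* 4. split any free page straddling alloc_start/alloc_end (here called
         * boundary_free_page) so that each pageblock goes back to its own
         * migratetype free list
         */
        split_free_page_into_pageblocks(boundary_free_page, order, cc.zone);

        /* 5. grab [alloc_start, alloc_end) from the free lists, give back the
         * unrequested head/tail, and undo the isolation
         */
        isolate_freepages_range(&cc, alloc_start, alloc_end);
        free_contig_range(alloc_start, start - alloc_start);
        free_contig_range(end, alloc_end - end);
        undo_isolate_page_range(alloc_start, alloc_end, migratetype);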

Isolation needs to happen at MAX_ORDER - 1 granularity, because otherwise
either 1) extra code is needed to detect pages (free, PageHuge, THP, or
PageCompound) and to make sure all pageblocks belonging to a single page are
isolated together, with the pageblocks outside the range having their
migratetypes restored later; or 2) extra logic needs to be added at page free
time to split a free page with multi-migratetype pageblocks.

Two optimizations might come later:
1. only check unmovable pages within the range instead of the MAX_ORDER - 1
aligned range during isolation, to increase the success rate of
alloc_contig_range().
2. make MIGRATE_ISOLATE a separate bit to avoid saving and restoring existing
migratetypes before and after isolation respectively.

Feel free to give comments and suggestions. Thanks.


[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/


Zi Yan (7):
mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
mm: compaction: handle non-lru compound pages properly in
isolate_migratepages_block().
mm: migrate: allocate the right size of non hugetlb or THP compound
pages.
mm: make alloc_contig_range work at pageblock granularity
mm: cma: use pageblock_order as the single alignment
drivers: virtio_mem: use pageblock size as the minimum virtio_mem
size.
arch: powerpc: adjust fadump alignment to be pageblock aligned.

arch/powerpc/include/asm/fadump-internal.h | 4 +-
drivers/virtio/virtio_mem.c | 6 +-
include/linux/mmzone.h | 11 +-
kernel/dma/contiguous.c | 2 +-
mm/cma.c | 6 +-
mm/compaction.c | 10 +-
mm/migrate.c | 8 +-
mm/page_alloc.c | 203 +++++++++++++++++----
8 files changed, 196 insertions(+), 54 deletions(-)

--
2.33.0



2021-12-09 23:04:43

by Zi Yan

[permalink] [raw]
Subject: [RFC PATCH v2 2/7] mm: compaction: handle non-lru compound pages properly in isolate_migratepages_block().

From: Zi Yan <[email protected]>

In isolate_migratepages_block(), a !PageLRU tail page can be encountered
when the page is larger than a pageblock. Use compound head page for the
checks inside and skip the entire compound page when isolation succeeds.
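
For example, when isolation of the head succeeds and the scan is currently at
a tail page at pfn 1600 of an order-10 compound page whose head is at pfn
1024, then page - head = 576 and low_pfn is advanced by (1 << 10) - 1 - 576 =
447 to pfn 2047, so the loop's own increment continues the scan at pfn 2048,
the first page after the compound page.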

Signed-off-by: Zi Yan <[email protected]>
---
mm/compaction.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index b4e94cda3019..ad9053fbbe06 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -979,19 +979,23 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
* Skip any other type of page
*/
if (!PageLRU(page)) {
+ struct page *head = compound_head(page);
/*
* __PageMovable can return false positive so we need
* to verify it under page_lock.
*/
- if (unlikely(__PageMovable(page)) &&
- !PageIsolated(page)) {
+ if (unlikely(__PageMovable(head)) &&
+ !PageIsolated(head)) {
if (locked) {
unlock_page_lruvec_irqrestore(locked, flags);
locked = NULL;
}

- if (!isolate_movable_page(page, isolate_mode))
+ if (!isolate_movable_page(head, isolate_mode)) {
+ low_pfn += (1 << compound_order(head)) - 1 - (page - head);
+ page = head;
goto isolate_success;
+ }
}

goto isolate_fail;
--
2.33.0


2021-12-09 23:04:43

by Zi Yan

[permalink] [raw]
Subject: [RFC PATCH v2 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others.

From: Zi Yan <[email protected]>

This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance.
It prepares for the upcoming removal of the MAX_ORDER-1 alignment
requirement for CMA and alloc_contig_range().

MIGRATE_HIGHATOMIC should not merge with other migratetypes like
MIGRATE_ISOLATE and MIGRATE_CMA[1], so this commit prevents that too.
Also add MIGRATE_HIGHATOMIC to the fallbacks array for completeness.

[1] https://lore.kernel.org/linux-mm/[email protected]/
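
As a quick illustration of what "non-fallbackable" means here, the standalone
program below (not kernel code; the enum just mirrors the ordering in
include/linux/mmzone.h) shows which migratetypes the new helper treats as
having fallbacks:

#include <stdio.h>
#include <stdbool.h>

/* Mirrors the migratetype ordering in include/linux/mmzone.h */
enum migratetype {
        MIGRATE_UNMOVABLE,
        MIGRATE_MOVABLE,
        MIGRATE_RECLAIMABLE,
        MIGRATE_PCPTYPES,                       /* number of types on the pcp lists */
        MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,  /* reserved for high-order atomic allocations */
        MIGRATE_CMA,
        MIGRATE_ISOLATE,
        MIGRATE_TYPES
};

/* Same check as the new helper: only the first MIGRATE_PCPTYPES types have
 * real fallback targets in the fallbacks[] array. */
static bool migratetype_has_fallback(int mt)
{
        return mt < MIGRATE_PCPTYPES;
}

int main(void)
{
        const struct { int mt; const char *name; } types[] = {
                { MIGRATE_UNMOVABLE,   "MIGRATE_UNMOVABLE"   },
                { MIGRATE_MOVABLE,     "MIGRATE_MOVABLE"     },
                { MIGRATE_RECLAIMABLE, "MIGRATE_RECLAIMABLE" },
                { MIGRATE_HIGHATOMIC,  "MIGRATE_HIGHATOMIC"  },
                { MIGRATE_CMA,         "MIGRATE_CMA"         },
                { MIGRATE_ISOLATE,     "MIGRATE_ISOLATE"     },
        };

        for (unsigned int i = 0; i < sizeof(types) / sizeof(types[0]); i++)
                printf("%-20s %s\n", types[i].name,
                       migratetype_has_fallback(types[i].mt) ?
                       "has fallbacks" : "no fallbacks (never merged with others)");
        return 0;
}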

Signed-off-by: Zi Yan <[email protected]>
---
include/linux/mmzone.h | 6 ++++++
mm/page_alloc.c | 28 ++++++++++++++++++----------
2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 58e744b78c2c..b925431b0123 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -83,6 +83,12 @@ static inline bool is_migrate_movable(int mt)
return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
}

+/* See fallbacks[MIGRATE_TYPES][3] in page_alloc.c */
+static inline bool migratetype_has_fallback(int mt)
+{
+ return mt < MIGRATE_PCPTYPES;
+}
+
#define for_each_migratetype_order(order, type) \
for (order = 0; order < MAX_ORDER; order++) \
for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index edfd6c81af82..107a5f186d3b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1041,6 +1041,12 @@ buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn,
return page_is_buddy(higher_page, higher_buddy, order + 1);
}

+static inline bool has_non_fallback_pageblock(struct zone *zone)
+{
+ return has_isolate_pageblock(zone) || zone_cma_pages(zone) != 0 ||
+ zone->nr_reserved_highatomic != 0;
+}
+
/*
* Freeing function for a buddy system allocator.
*
@@ -1116,14 +1122,15 @@ static inline void __free_one_page(struct page *page,
}
if (order < MAX_ORDER - 1) {
/* If we are here, it means order is >= pageblock_order.
- * We want to prevent merge between freepages on isolate
- * pageblock and normal pageblock. Without this, pageblock
- * isolation could cause incorrect freepage or CMA accounting.
+ * We want to prevent merge between freepages on pageblock
+ * without fallbacks and normal pageblock. Without this,
+ * pageblock isolation could cause incorrect freepage or CMA
+ * accounting or HIGHATOMIC accounting.
*
* We don't want to hit this code for the more frequent
* low-order merging.
*/
- if (unlikely(has_isolate_pageblock(zone))) {
+ if (unlikely(has_non_fallback_pageblock(zone))) {
int buddy_mt;

buddy_pfn = __find_buddy_pfn(pfn, order);
@@ -1131,8 +1138,8 @@ static inline void __free_one_page(struct page *page,
buddy_mt = get_pageblock_migratetype(buddy);

if (migratetype != buddy_mt
- && (is_migrate_isolate(migratetype) ||
- is_migrate_isolate(buddy_mt)))
+ && (!migratetype_has_fallback(migratetype) ||
+ !migratetype_has_fallback(buddy_mt)))
goto done_merging;
}
max_order = order + 1;
@@ -2483,6 +2490,7 @@ static int fallbacks[MIGRATE_TYPES][3] = {
[MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_TYPES },
[MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_TYPES },
+ [MIGRATE_HIGHATOMIC] = { MIGRATE_TYPES }, /* Never used */
#ifdef CONFIG_CMA
[MIGRATE_CMA] = { MIGRATE_TYPES }, /* Never used */
#endif
@@ -2794,8 +2802,8 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,

/* Yoink! */
mt = get_pageblock_migratetype(page);
- if (!is_migrate_highatomic(mt) && !is_migrate_isolate(mt)
- && !is_migrate_cma(mt)) {
+ /* Only reserve normal pageblock */
+ if (migratetype_has_fallback(mt)) {
zone->nr_reserved_highatomic += pageblock_nr_pages;
set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
@@ -3544,8 +3552,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
struct page *endpage = page + (1 << order) - 1;
for (; page < endpage; page += pageblock_nr_pages) {
int mt = get_pageblock_migratetype(page);
- if (!is_migrate_isolate(mt) && !is_migrate_cma(mt)
- && !is_migrate_highatomic(mt))
+ /* Only change normal pageblock */
+ if (migratetype_has_fallback(mt))
set_pageblock_migratetype(page,
MIGRATE_MOVABLE);
}
--
2.33.0


2021-12-09 23:04:47

by Zi Yan

[permalink] [raw]
Subject: [RFC PATCH v2 4/7] mm: make alloc_contig_range work at pageblock granularity

From: Zi Yan <[email protected]>

alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
pageblocks with different migratetypes. It might unnecessarily convert
extra pageblocks at the beginning and at the end of the range. Change
alloc_contig_range() to work at pageblock granularity.

This is done by isolating at MAX_ORDER-1 granularity and migrating pages away
at pageblock granularity, then restoring the pageblock migratetypes outside
the requested range and splitting any >pageblock_order free pages at the range
boundaries. The reason for this process is that during isolation, some pages,
either free or in-use, might span more than one pageblock, and isolating only
part of them can cause free page accounting issues. Restoring the migratetypes
of the pageblocks outside the range of interest afterwards is much easier.
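
As a concrete illustration of the two kinds of ranges involved, the standalone
program below (not kernel code; it assumes MAX_ORDER_NR_PAGES = 1024 and
pageblock_nr_pages = 512, and picks an arbitrary pfn range) computes the
MAX_ORDER-1 aligned isolation range, the pageblock aligned allocation range,
and how many pageblocks outside the allocation range need their migratetype
saved and restored:

#include <stdio.h>

#define MAX_ORDER_NR_PAGES      1024UL  /* assumed: MAX_ORDER = 11 */
#define PAGEBLOCK_NR_PAGES      512UL   /* assumed: pageblock_order = 9 */

#define ALIGN_DOWN(x, a)        ((x) & ~((a) - 1))
#define ALIGN_UP(x, a)          (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
        unsigned long start = 5000, end = 7000; /* requested pfn range */

        unsigned long isolate_start = ALIGN_DOWN(start, MAX_ORDER_NR_PAGES);
        unsigned long isolate_end   = ALIGN_UP(end, MAX_ORDER_NR_PAGES);
        unsigned long alloc_start   = ALIGN_DOWN(start, PAGEBLOCK_NR_PAGES);
        unsigned long alloc_end     = ALIGN_UP(end, PAGEBLOCK_NR_PAGES);

        /* pageblocks that get isolated but lie outside the allocation range */
        unsigned long to_save = (alloc_start - isolate_start) / PAGEBLOCK_NR_PAGES +
                                (isolate_end - alloc_end) / PAGEBLOCK_NR_PAGES;

        printf("isolate range: [%lu, %lu)\n", isolate_start, isolate_end);
        printf("alloc range:   [%lu, %lu)\n", alloc_start, alloc_end);
        printf("pageblock migratetypes to save/restore: %lu\n", to_save);
        return 0;
}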

Signed-off-by: Zi Yan <[email protected]>
---
mm/page_alloc.c | 169 ++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 149 insertions(+), 20 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 107a5f186d3b..5ffbeb1b7512 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8981,8 +8981,8 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
#ifdef CONFIG_CONTIG_ALLOC
static unsigned long pfn_max_align_down(unsigned long pfn)
{
- return pfn & ~(max_t(unsigned long, MAX_ORDER_NR_PAGES,
- pageblock_nr_pages) - 1);
+ return ALIGN_DOWN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
+ pageblock_nr_pages));
}

static unsigned long pfn_max_align_up(unsigned long pfn)
@@ -9071,6 +9071,52 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
return 0;
}

+static inline int save_migratetypes(unsigned char *migratetypes,
+ unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long pfn = start_pfn;
+ int num = 0;
+
+ while (pfn < end_pfn) {
+ migratetypes[num] = get_pageblock_migratetype(pfn_to_page(pfn));
+ num++;
+ pfn += pageblock_nr_pages;
+ }
+ return num;
+}
+
+static inline int restore_migratetypes(unsigned char *migratetypes,
+ unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long pfn = start_pfn;
+ int num = 0;
+
+ while (pfn < end_pfn) {
+ set_pageblock_migratetype(pfn_to_page(pfn), migratetypes[num]);
+ num++;
+ pfn += pageblock_nr_pages;
+ }
+ return num;
+}
+
+static inline void split_free_page_into_pageblocks(struct page *free_page,
+ int order, struct zone *zone)
+{
+ unsigned long pfn;
+
+ spin_lock(&zone->lock);
+ del_page_from_free_list(free_page, zone, order);
+ for (pfn = page_to_pfn(free_page);
+ pfn < page_to_pfn(free_page) + (1UL << order);
+ pfn += pageblock_nr_pages) {
+ int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
+
+ __free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
+ mt, FPI_NONE);
+ }
+ spin_unlock(&zone->lock);
+}
+
/**
* alloc_contig_range() -- tries to allocate given range of pages
* @start: start PFN to allocate
@@ -9096,8 +9142,15 @@ int alloc_contig_range(unsigned long start, unsigned long end,
unsigned migratetype, gfp_t gfp_mask)
{
unsigned long outer_start, outer_end;
+ unsigned long isolate_start = pfn_max_align_down(start);
+ unsigned long isolate_end = pfn_max_align_up(end);
+ unsigned long alloc_start = ALIGN_DOWN(start, pageblock_nr_pages);
+ unsigned long alloc_end = ALIGN(end, pageblock_nr_pages);
+ unsigned long num_pageblock_to_save;
unsigned int order;
int ret = 0;
+ unsigned char *saved_mt;
+ int num;

struct compact_control cc = {
.nr_migratepages = 0,
@@ -9111,11 +9164,30 @@ int alloc_contig_range(unsigned long start, unsigned long end,
};
INIT_LIST_HEAD(&cc.migratepages);

+ /*
+ * TODO: make MIGRATE_ISOLATE a standalone bit to avoid overwriting
+ * the exiting migratetype. Then, we will not need the save and restore
+ * process here.
+ */
+
+ /* Save the migratepages of the pageblocks before start and after end */
+ num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
+ + (isolate_end - alloc_end) / pageblock_nr_pages;
+ saved_mt =
+ kmalloc_array(num_pageblock_to_save,
+ sizeof(unsigned char), GFP_KERNEL);
+ if (!saved_mt)
+ return -ENOMEM;
+
+ num = save_migratetypes(saved_mt, isolate_start, alloc_start);
+
+ num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);
+
/*
* What we do here is we mark all pageblocks in range as
* MIGRATE_ISOLATE. Because pageblock and max order pages may
* have different sizes, and due to the way page allocator
- * work, we align the range to biggest of the two pages so
+ * work, we align the isolation range to biggest of the two so
* that page allocator won't try to merge buddies from
* different pageblocks and change MIGRATE_ISOLATE to some
* other migration type.
@@ -9125,6 +9197,20 @@ int alloc_contig_range(unsigned long start, unsigned long end,
* we are interested in). This will put all the pages in
* range back to page allocator as MIGRATE_ISOLATE.
*
+ * Afterwards, we restore the migratetypes of the pageblocks not
+ * in range, split free pages spanning outside the range,
+ * and put split free pages (at pageblock_order) to the right
+ * migratetype list.
+ *
+ * NOTE: the above approach is used because it can cause free
+ * page accounting issues during isolation, if a page, either
+ * free or in-use, contains multiple pageblocks and we only
+ * isolate a subset of them. For example, if only the second
+ * pageblock is isolated from a page with 2 pageblocks, after
+ * the page is free, it will be put in the first pageblock
+ * migratetype list instead of having 2 pageblocks in two
+ * separate migratetype lists.
+ *
* When this is done, we take the pages in range from page
* allocator removing them from the buddy system. This way
* page allocator will never consider using them.
@@ -9135,10 +9221,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
* put back to page allocator so that buddy can use them.
*/

- ret = start_isolate_page_range(pfn_max_align_down(start),
- pfn_max_align_up(end), migratetype, 0);
+ ret = start_isolate_page_range(isolate_start, isolate_end, migratetype, 0);
if (ret)
- return ret;
+ goto done;

drain_all_pages(cc.zone);

@@ -9174,6 +9259,19 @@ int alloc_contig_range(unsigned long start, unsigned long end,
* isolated thus they won't get removed from buddy.
*/

+ /*
+ * Restore migratetypes of pageblocks outside [start, end)
+ * TODO: remove it when MIGRATE_ISOLATE becomes a standalone bit
+ */
+
+ num = restore_migratetypes(saved_mt, isolate_start, alloc_start);
+
+ num = restore_migratetypes(&saved_mt[num], alloc_end, isolate_end);
+
+ /*
+ * Split free page spanning [isolate_start, alloc_start) and put the
+ * pageblocks in the right migratetype lists.
+ */
order = 0;
outer_start = start;
while (!PageBuddy(pfn_to_page(outer_start))) {
@@ -9188,37 +9286,68 @@ int alloc_contig_range(unsigned long start, unsigned long end,
order = buddy_order(pfn_to_page(outer_start));

/*
- * outer_start page could be small order buddy page and
- * it doesn't include start page. Adjust outer_start
- * in this case to report failed page properly
- * on tracepoint in test_pages_isolated()
+ * split the free page has start page and put the pageblocks
+ * in the right migratetype list
*/
- if (outer_start + (1UL << order) <= start)
- outer_start = start;
+ if (outer_start + (1UL << order) > start) {
+ struct page *free_page = pfn_to_page(outer_start);
+
+ split_free_page_into_pageblocks(free_page, order, cc.zone);
+ }
+ }
+
+ /*
+ * Split free page spanning [alloc_end, isolate_end) and put the
+ * pageblocks in the right migratetype list
+ */
+ order = 0;
+ outer_end = end;
+ while (!PageBuddy(pfn_to_page(outer_end))) {
+ if (++order >= MAX_ORDER) {
+ outer_end = end;
+ break;
+ }
+ outer_end &= ~0UL << order;
+ }
+
+ if (outer_end != end) {
+ order = buddy_order(pfn_to_page(outer_end));
+
+ /*
+ * split the free page has start page and put the pageblocks
+ * in the right migratetype list
+ */
+ VM_BUG_ON(outer_end + (1UL << order) <= end);
+ {
+ struct page *free_page = pfn_to_page(outer_end);
+
+ split_free_page_into_pageblocks(free_page, order, cc.zone);
+ }
}

/* Make sure the range is really isolated. */
- if (test_pages_isolated(outer_start, end, 0)) {
+ if (test_pages_isolated(alloc_start, alloc_end, 0)) {
ret = -EBUSY;
goto done;
}

/* Grab isolated pages from freelists. */
- outer_end = isolate_freepages_range(&cc, outer_start, end);
+ outer_end = isolate_freepages_range(&cc, alloc_start, alloc_end);
if (!outer_end) {
ret = -EBUSY;
goto done;
}

/* Free head and tail (if any) */
- if (start != outer_start)
- free_contig_range(outer_start, start - outer_start);
- if (end != outer_end)
- free_contig_range(end, outer_end - end);
+ if (start != alloc_start)
+ free_contig_range(alloc_start, start - alloc_start);
+ if (end != alloc_end)
+ free_contig_range(end, alloc_end - end);

done:
- undo_isolate_page_range(pfn_max_align_down(start),
- pfn_max_align_up(end), migratetype);
+ kfree(saved_mt);
+ undo_isolate_page_range(alloc_start,
+ alloc_end, migratetype);
return ret;
}
EXPORT_SYMBOL(alloc_contig_range);
--
2.33.0


2021-12-09 23:04:49

by Zi Yan

[permalink] [raw]
Subject: [RFC PATCH v2 3/7] mm: migrate: allocate the right size of non hugetlb or THP compound pages.

From: Zi Yan <[email protected]>

alloc_migration_target() is used by alloc_contig_range(), and non-LRU
movable compound pages can be migrated. The current code does not allocate
the right page size for such pages. Check for THP precisely using
is_transparent_hugepage() and add allocation support for non-LRU compound
pages.

Signed-off-by: Zi Yan <[email protected]>
---
mm/migrate.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index d487a399253b..2ce3c771b1de 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1563,7 +1563,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
}

- if (PageTransHuge(page)) {
+ if (is_transparent_hugepage(page)) {
/*
* clear __GFP_RECLAIM to make the migration callback
* consistent with regular THP allocations.
@@ -1572,13 +1572,17 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
gfp_mask |= GFP_TRANSHUGE;
order = HPAGE_PMD_ORDER;
}
+ if (PageCompound(page)) {
+ gfp_mask |= __GFP_COMP;
+ order = compound_order(page);
+ }
zidx = zone_idx(page_zone(page));
if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
gfp_mask |= __GFP_HIGHMEM;

new_page = __alloc_pages(gfp_mask, order, nid, mtc->nmask);

- if (new_page && PageTransHuge(new_page))
+ if (new_page && is_transparent_hugepage(page))
prep_transhuge_page(new_page);

return new_page;
--
2.33.0


2021-12-09 23:04:51

by Zi Yan

[permalink] [raw]
Subject: [RFC PATCH v2 5/7] mm: cma: use pageblock_order as the single alignment

From: Zi Yan <[email protected]>

Now alloc_contig_range() works at pageblock granularity. Change CMA
allocation, which uses alloc_contig_range(), to use pageblock_order
alignment.
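
For example, on a typical x86_64 configuration with 4KB base pages,
MAX_ORDER = 11 and pageblock_order = 9, the minimum CMA alignment drops from
PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order) = 4MB to
PAGE_SIZE << pageblock_order = 2MB.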

Signed-off-by: Zi Yan <[email protected]>
---
include/linux/mmzone.h | 5 +----
kernel/dma/contiguous.c | 2 +-
mm/cma.c | 6 ++----
mm/page_alloc.c | 6 +++---
4 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b925431b0123..71830af35745 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -54,10 +54,7 @@ enum migratetype {
*
* The way to use it is to change migratetype of a range of
* pageblocks to MIGRATE_CMA which can be done by
- * __free_pageblock_cma() function. What is important though
- * is that a range of pageblocks must be aligned to
- * MAX_ORDER_NR_PAGES should biggest page be bigger than
- * a single pageblock.
+ * __free_pageblock_cma() function.
*/
MIGRATE_CMA,
#endif
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 3d63d91cba5c..ac35b14b0786 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -399,7 +399,7 @@ static const struct reserved_mem_ops rmem_cma_ops = {

static int __init rmem_cma_setup(struct reserved_mem *rmem)
{
- phys_addr_t align = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
+ phys_addr_t align = PAGE_SIZE << pageblock_order;
phys_addr_t mask = align - 1;
unsigned long node = rmem->fdt_node;
bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
diff --git a/mm/cma.c b/mm/cma.c
index bc9ca8f3c487..d171158bd418 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -180,8 +180,7 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
return -EINVAL;

/* ensure minimal alignment required by mm core */
- alignment = PAGE_SIZE <<
- max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
+ alignment = PAGE_SIZE << pageblock_order;

/* alignment should be aligned with order_per_bit */
if (!IS_ALIGNED(alignment >> PAGE_SHIFT, 1 << order_per_bit))
@@ -268,8 +267,7 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
* migratetype page by page allocator's buddy algorithm. In the case,
* you couldn't get a contiguous memory, which is not what we want.
*/
- alignment = max(alignment, (phys_addr_t)PAGE_SIZE <<
- max_t(unsigned long, MAX_ORDER - 1, pageblock_order));
+ alignment = max(alignment, (phys_addr_t)PAGE_SIZE << pageblock_order);
if (fixed && base & (alignment - 1)) {
ret = -EINVAL;
pr_err("Region at %pa must be aligned to %pa bytes\n",
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5ffbeb1b7512..3317f2064105 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9127,8 +9127,8 @@ static inline void split_free_page_into_pageblocks(struct page *free_page,
* be either of the two.
* @gfp_mask: GFP mask to use during compaction
*
- * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
- * aligned. The PFN range must belong to a single zone.
+ * The PFN range does not have to be pageblock aligned. The PFN range must
+ * belong to a single zone.
*
* The first thing this routine does is attempt to MIGRATE_ISOLATE all
* pageblocks in the range. Once isolated, the pageblocks should not
@@ -9243,7 +9243,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
ret = 0;

/*
- * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+ * Pages from [start, end) are within a pageblock_nr_pages
* aligned blocks that are marked as MIGRATE_ISOLATE. What's
* more, all pages in [start, end) are free in page allocator.
* What we are going to do is to allocate all pages from
--
2.33.0


2021-12-09 23:04:54

by Zi Yan

[permalink] [raw]
Subject: [RFC PATCH v2 6/7] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size.

From: Zi Yan <[email protected]>

alloc_contig_range() now only requires pageblock_order alignment, so drop
the virtio_mem requirement that the subblock size be at least the max of
MAX_ORDER_NR_PAGES and pageblock_nr_pages.

Signed-off-by: Zi Yan <[email protected]>
---
drivers/virtio/virtio_mem.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 96e5a8782769..dab4bce417fd 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2436,15 +2436,13 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);

/*
- * We want subblocks to span at least MAX_ORDER_NR_PAGES and
- * pageblock_nr_pages pages. This:
+ * We want subblocks to span at least pageblock_nr_pages pages. This:
* - Simplifies our page onlining code (virtio_mem_online_page_cb)
* and fake page onlining code (virtio_mem_fake_online).
* - Is required for now for alloc_contig_range() to work reliably -
* it doesn't properly handle smaller granularity on ZONE_NORMAL.
*/
- sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
- pageblock_nr_pages) * PAGE_SIZE;
+ sb_size = pageblock_nr_pages * PAGE_SIZE;
sb_size = max_t(uint64_t, vm->device_block_size, sb_size);

if (sb_size < memory_block_size_bytes() && !force_bbm) {
--
2.33.0


2021-12-09 23:04:56

by Zi Yan

[permalink] [raw]
Subject: [RFC PATCH v2 7/7] arch: powerpc: adjust fadump alignment to be pageblock aligned.

From: Zi Yan <[email protected]>

CMA only requires pageblock alignment now. Change CMA alignment in
fadump too.

Signed-off-by: Zi Yan <[email protected]>
---
arch/powerpc/include/asm/fadump-internal.h | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
index 8d61c8f3fec4..9198f20b6b68 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -20,9 +20,7 @@
#define memblock_num_regions(memblock_type) (memblock.memblock_type.cnt)

/* Alignment per CMA requirement. */
-#define FADUMP_CMA_ALIGNMENT (PAGE_SIZE << \
- max_t(unsigned long, MAX_ORDER - 1, \
- pageblock_order))
+#define FADUMP_CMA_ALIGNMENT (PAGE_SIZE << pageblock_order)

/* FAD commands */
#define FADUMP_REGISTER 1
--
2.33.0


2021-12-10 07:30:20

by Eric Ren

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment.

Hi Zi Yan,

On 2021/12/10 07:04, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> Hi all,
>
> This patchset tries to remove the MAX_ORDER - 1 alignment requirement for CMA
> and alloc_contig_range(). It prepares for my upcoming changes to make MAX_ORDER
> adjustable at boot time[1].
>
> The MAX_ORDER - 1 alignment requirement comes from that alloc_contig_range()
> isolates pageblocks to remove free memory from buddy allocator but isolating
> only a subset of pageblocks within a page spanning across multiple pageblocks
> causes free page accounting issues. Isolated page might not be put into the
> right free list, since the code assumes the migratetype of the first pageblock
> as the whole free page migratetype. This is based on the discussion at [2].
>
> To remove the requirement, this patchset:
> 1. still isolates pageblocks at MAX_ORDER - 1 granularity;
Then, unplug fails if either pageblock of the MAX_ORDER - 1 page has an
unmovable page, right?

Thanks,
Eric
> 2. but saves the pageblock migratetypes outside the specified range of
> alloc_contig_range() and restores them after all pages within the range
> become free after __alloc_contig_migrate_range();
> 3. splits free pages spanning multiple pageblocks at the beginning and the end
> of the range and puts the split pages to the right migratetype free lists
> based on the pageblock migratetypes;
> 4. returns pages not in the range as it did before this patch.
>
> Isolation needs to happen at MAX_ORDER - 1 granularity, because otherwise
> 1) extra code is needed to detect pages (free, PageHuge, THP, or PageCompound)
> to make sure all pageblocks belonging to a single page are isolated together
> and later pageblocks outside the range need to have their migratetypes restored;
> or 2) extra logic will need to be added during page free time to split a free
> page with multi-migratetype pageblocks.
>
> Two optimizations might come later:
> 1. only check unmovable pages within the range instead of MAX_ORDER - 1 aligned
> range during isolation to increase successful rate of alloc_contig_range().
> 2. make MIGRATE_ISOLATE a separate bit to avoid saving and restoring existing
> migratetypes before and after isolation respectively.
>
> Feel free to give comments and suggestions. Thanks.
>
>
> [1] https://lore.kernel.org/linux-mm/[email protected]/
> [2] https://lore.kernel.org/linux-mm/[email protected]/
>
>
> Zi Yan (7):
> mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
> mm: compaction: handle non-lru compound pages properly in
> isolate_migratepages_block().
> mm: migrate: allocate the right size of non hugetlb or THP compound
> pages.
> mm: make alloc_contig_range work at pageblock granularity
> mm: cma: use pageblock_order as the single alignment
> drivers: virtio_mem: use pageblock size as the minimum virtio_mem
> size.
> arch: powerpc: adjust fadump alignment to be pageblock aligned.
>
> arch/powerpc/include/asm/fadump-internal.h | 4 +-
> drivers/virtio/virtio_mem.c | 6 +-
> include/linux/mmzone.h | 11 +-
> kernel/dma/contiguous.c | 2 +-
> mm/cma.c | 6 +-
> mm/compaction.c | 10 +-
> mm/migrate.c | 8 +-
> mm/page_alloc.c | 203 +++++++++++++++++----
> 8 files changed, 196 insertions(+), 54 deletions(-)
>


2021-12-10 07:44:05

by Eric Ren

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others.

Hi,

On 2021/12/10 07:04, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance.
> It prepares for the upcoming removal of the MAX_ORDER-1 alignment
> requirement for CMA and alloc_contig_range().
>
> MIGRARTE_HIGHATOMIC should not merge with other migratetypes like
> MIGRATE_ISOLATE and MIGRARTE_CMA[1], so this commit prevents that too.
> Also add MIGRARTE_HIGHATOMIC to fallbacks array for completeness.
>
> [1] https://lore.kernel.org/linux-mm/[email protected]/
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> include/linux/mmzone.h | 6 ++++++
> mm/page_alloc.c | 28 ++++++++++++++++++----------
> 2 files changed, 24 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 58e744b78c2c..b925431b0123 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -83,6 +83,12 @@ static inline bool is_migrate_movable(int mt)
> return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
> }
>
> +/* See fallbacks[MIGRATE_TYPES][3] in page_alloc.c */
> +static inline bool migratetype_has_fallback(int mt)
> +{
> + return mt < MIGRATE_PCPTYPES;
> +}
> +

I would suggest splitting the patch into 2 parts. The first part: no
functional change, just introduce migratetype_has_fallback()
and replace the open-coded checks it applies to.

> #define for_each_migratetype_order(order, type) \
> for (order = 0; order < MAX_ORDER; order++) \
> for (type = 0; type < MIGRATE_TYPES; type++)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index edfd6c81af82..107a5f186d3b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1041,6 +1041,12 @@ buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn,
> return page_is_buddy(higher_page, higher_buddy, order + 1);
> }
>
> +static inline bool has_non_fallback_pageblock(struct zone *zone)
> +{
> + return has_isolate_pageblock(zone) || zone_cma_pages(zone) != 0 ||
> + zone->nr_reserved_highatomic != 0;

Make zone->nr_reserved_highatomic != 0 a helper, like zone_cma_pages()?
> +}
> +
> /*
> * Freeing function for a buddy system allocator.
> *
> @@ -1116,14 +1122,15 @@ static inline void __free_one_page(struct page *page,
> }
> if (order < MAX_ORDER - 1) {
> /* If we are here, it means order is >= pageblock_order.
> - * We want to prevent merge between freepages on isolate
> - * pageblock and normal pageblock. Without this, pageblock
> - * isolation could cause incorrect freepage or CMA accounting.
> + * We want to prevent merge between freepages on pageblock
> + * without fallbacks and normal pageblock. Without this,
> + * pageblock isolation could cause incorrect freepage or CMA
> + * accounting or HIGHATOMIC accounting.
> *
> * We don't want to hit this code for the more frequent
> * low-order merging.
> */
> - if (unlikely(has_isolate_pageblock(zone))) {
> + if (unlikely(has_non_fallback_pageblock(zone))) {
I'm not familiar with the code details, just wondering if this change
would have side effects on CMA pageblock merging, as it makes the condition
stronger?

Thanks,
Eric
> int buddy_mt;
>
> buddy_pfn = __find_buddy_pfn(pfn, order);
> @@ -1131,8 +1138,8 @@ static inline void __free_one_page(struct page *page,
> buddy_mt = get_pageblock_migratetype(buddy);
>
> if (migratetype != buddy_mt
> - && (is_migrate_isolate(migratetype) ||
> - is_migrate_isolate(buddy_mt)))
> + && (!migratetype_has_fallback(migratetype) ||
> + !migratetype_has_fallback(buddy_mt)))
> goto done_merging;
> }
> max_order = order + 1;
> @@ -2483,6 +2490,7 @@ static int fallbacks[MIGRATE_TYPES][3] = {
> [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_TYPES },
> [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
> [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_TYPES },
> + [MIGRATE_HIGHATOMIC] = { MIGRATE_TYPES }, /* Never used */
> #ifdef CONFIG_CMA
> [MIGRATE_CMA] = { MIGRATE_TYPES }, /* Never used */
> #endif
> @@ -2794,8 +2802,8 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
>
> /* Yoink! */
> mt = get_pageblock_migratetype(page);
> - if (!is_migrate_highatomic(mt) && !is_migrate_isolate(mt)
> - && !is_migrate_cma(mt)) {
> + /* Only reserve normal pageblock */
> + if (migratetype_has_fallback(mt)) {
> zone->nr_reserved_highatomic += pageblock_nr_pages;
> set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
> move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
> @@ -3544,8 +3552,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
> struct page *endpage = page + (1 << order) - 1;
> for (; page < endpage; page += pageblock_nr_pages) {
> int mt = get_pageblock_migratetype(page);
> - if (!is_migrate_isolate(mt) && !is_migrate_cma(mt)
> - && !is_migrate_highatomic(mt))
> + /* Only change normal pageblock */
> + if (migratetype_has_fallback(mt))
> set_pageblock_migratetype(page,
> MIGRATE_MOVABLE);
> }


2021-12-10 07:54:00

by Eric Ren

[permalink] [raw]
Subject: Re: [RFC PATCH v2 3/7] mm: migrate: allocate the right size of non hugetlb or THP compound pages.

Hi,

On 2021/12/10 07:04, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> alloc_migration_target() is used by alloc_contig_range() and non-LRU
> movable compound pages can be migrated. Current code does not allocate the
> right page size for such pages. Check THP precisely using
> is_transparent_huge() and add allocation support for non-LRU compound
> pages.
Could you elaborate on why the current code doesn't get the right size, and
how this patch fixes it?

The description sounds like it's an existing bug; if so, should the patch
subject be changed to "Fixes ..."?

>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> mm/migrate.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index d487a399253b..2ce3c771b1de 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1563,7 +1563,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
> return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
> }
>
> - if (PageTransHuge(page)) {
> + if (is_transparent_hugepage(page)) {
> /*
> * clear __GFP_RECLAIM to make the migration callback
> * consistent with regular THP allocations.
> @@ -1572,13 +1572,17 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
if (PageTransHuge(page)) {  // just give more code context
...
> gfp_mask |= GFP_TRANSHUGE;
> order = HPAGE_PMD_ORDER;
order assigned here
> }
> + if (PageCompound(page)) {
> + gfp_mask |= __GFP_COMP;
> + order = compound_order(page);
re-assigned again, as a THP is also a compound page?

Thanks,
Eric
> + }
> zidx = zone_idx(page_zone(page));
> if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
> gfp_mask |= __GFP_HIGHMEM;
>
> new_page = __alloc_pages(gfp_mask, order, nid, mtc->nmask);
>
> - if (new_page && PageTransHuge(new_page))
> + if (new_page && is_transparent_hugepage(page))
> prep_transhuge_page(new_page);
>
> return new_page;


2021-12-10 08:13:03

by Eric Ren

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/7] mm: make alloc_contig_range work at pageblock granularity

Hi,

On 2021/12/10 07:04, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
> pageblocks with different migratetypes. It might unnecessarily convert
> extra pageblocks at the beginning and at the end of the range. Change
> alloc_contig_range() to work at pageblock granularity.
>
> It is done by restoring pageblock types and split >pageblock_order free
> pages after isolating at MAX_ORDER-1 granularity and migrating pages
> away at pageblock granularity. The reason for this process is that
> during isolation, some pages, either free or in-use, might have >pageblock
> sizes and isolating part of them can cause free accounting issues.
Could you elaborate on how the accounting issue would happen, in detail?


> Restoring the migratetypes of the pageblocks not in the interesting
> range later is much easier.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> mm/page_alloc.c | 169 ++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 149 insertions(+), 20 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 107a5f186d3b..5ffbeb1b7512 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8981,8 +8981,8 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> #ifdef CONFIG_CONTIG_ALLOC
> static unsigned long pfn_max_align_down(unsigned long pfn)
> {
> - return pfn & ~(max_t(unsigned long, MAX_ORDER_NR_PAGES,
> - pageblock_nr_pages) - 1);
> + return ALIGN_DOWN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
> + pageblock_nr_pages));
> }
>
> static unsigned long pfn_max_align_up(unsigned long pfn)
> @@ -9071,6 +9071,52 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
> return 0;
> }
>
> +static inline int save_migratetypes(unsigned char *migratetypes,
> + unsigned long start_pfn, unsigned long end_pfn)
> +{
> + unsigned long pfn = start_pfn;
> + int num = 0;
> +
> + while (pfn < end_pfn) {
> + migratetypes[num] = get_pageblock_migratetype(pfn_to_page(pfn));
> + num++;
> + pfn += pageblock_nr_pages;
> + }
> + return num;
> +}
> +
> +static inline int restore_migratetypes(unsigned char *migratetypes,
> + unsigned long start_pfn, unsigned long end_pfn)
> +{
> + unsigned long pfn = start_pfn;
> + int num = 0;
> +
> + while (pfn < end_pfn) {
> + set_pageblock_migratetype(pfn_to_page(pfn), migratetypes[num]);
> + num++;
> + pfn += pageblock_nr_pages;
> + }
> + return num;
> +}
> +
> +static inline void split_free_page_into_pageblocks(struct page *free_page,
> + int order, struct zone *zone)
> +{
> + unsigned long pfn;
> +
> + spin_lock(&zone->lock);
> + del_page_from_free_list(free_page, zone, order);
> + for (pfn = page_to_pfn(free_page);
> + pfn < page_to_pfn(free_page) + (1UL << order);
> + pfn += pageblock_nr_pages) {
> + int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
> +
> + __free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
> + mt, FPI_NONE);
> + }
> + spin_unlock(&zone->lock);
> +}
> +
> /**
> * alloc_contig_range() -- tries to allocate given range of pages
> * @start: start PFN to allocate
> @@ -9096,8 +9142,15 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> unsigned migratetype, gfp_t gfp_mask)
> {
> unsigned long outer_start, outer_end;
> + unsigned long isolate_start = pfn_max_align_down(start);
> + unsigned long isolate_end = pfn_max_align_up(end);
> + unsigned long alloc_start = ALIGN_DOWN(start, pageblock_nr_pages);
> + unsigned long alloc_end = ALIGN(end, pageblock_nr_pages);
What is the difference between isolate_* and alloc_*?
> + unsigned long num_pageblock_to_save;
> unsigned int order;
> int ret = 0;
> + unsigned char *saved_mt;
> + int num;
>
> struct compact_control cc = {
> .nr_migratepages = 0,
> @@ -9111,11 +9164,30 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> };
> INIT_LIST_HEAD(&cc.migratepages);
>
> + /*
> + * TODO: make MIGRATE_ISOLATE a standalone bit to avoid overwriting
> + * the exiting migratetype. Then, we will not need the save and restore
> + * process here.
> + */
> +
> + /* Save the migratepages of the pageblocks before start and after end */
> + num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
> + + (isolate_end - alloc_end) / pageblock_nr_pages;
> + saved_mt =
> + kmalloc_array(num_pageblock_to_save,
> + sizeof(unsigned char), GFP_KERNEL);
> + if (!saved_mt)
> + return -ENOMEM;
> +
> + num = save_migratetypes(saved_mt, isolate_start, alloc_start);
> +
> + num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);
The two lines above look confusing; it's hard to figure out the logic. Could
you help explain a bit?

Thanks,
Eric
> +
> /*
> * What we do here is we mark all pageblocks in range as
> * MIGRATE_ISOLATE. Because pageblock and max order pages may
> * have different sizes, and due to the way page allocator
> - * work, we align the range to biggest of the two pages so
> + * work, we align the isolation range to biggest of the two so
> * that page allocator won't try to merge buddies from
> * different pageblocks and change MIGRATE_ISOLATE to some
> * other migration type.
> @@ -9125,6 +9197,20 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> * we are interested in). This will put all the pages in
> * range back to page allocator as MIGRATE_ISOLATE.
> *
> + * Afterwards, we restore the migratetypes of the pageblocks not
> + * in range, split free pages spanning outside the range,
> + * and put split free pages (at pageblock_order) to the right
> + * migratetype list.
> + *
> + * NOTE: the above approach is used because it can cause free
> + * page accounting issues during isolation, if a page, either
> + * free or in-use, contains multiple pageblocks and we only
> + * isolate a subset of them. For example, if only the second
> + * pageblock is isolated from a page with 2 pageblocks, after
> + * the page is free, it will be put in the first pageblock
> + * migratetype list instead of having 2 pageblocks in two
> + * separate migratetype lists.
> + *
> * When this is done, we take the pages in range from page
> * allocator removing them from the buddy system. This way
> * page allocator will never consider using them.
> @@ -9135,10 +9221,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> * put back to page allocator so that buddy can use them.
> */
>
> - ret = start_isolate_page_range(pfn_max_align_down(start),
> - pfn_max_align_up(end), migratetype, 0);
> + ret = start_isolate_page_range(isolate_start, isolate_end, migratetype, 0);
> if (ret)
> - return ret;
> + goto done;
>
> drain_all_pages(cc.zone);
>
> @@ -9174,6 +9259,19 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> * isolated thus they won't get removed from buddy.
> */
>
> + /*
> + * Restore migratetypes of pageblocks outside [start, end)
> + * TODO: remove it when MIGRATE_ISOLATE becomes a standalone bit
> + */
> +
> + num = restore_migratetypes(saved_mt, isolate_start, alloc_start);
> +
> + num = restore_migratetypes(&saved_mt[num], alloc_end, isolate_end);
> +
> + /*
> + * Split free page spanning [isolate_start, alloc_start) and put the
> + * pageblocks in the right migratetype lists.
> + */
> order = 0;
> outer_start = start;
> while (!PageBuddy(pfn_to_page(outer_start))) {
> @@ -9188,37 +9286,68 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> order = buddy_order(pfn_to_page(outer_start));
>
> /*
> - * outer_start page could be small order buddy page and
> - * it doesn't include start page. Adjust outer_start
> - * in this case to report failed page properly
> - * on tracepoint in test_pages_isolated()
> + * split the free page has start page and put the pageblocks
> + * in the right migratetype list
> */
> - if (outer_start + (1UL << order) <= start)
> - outer_start = start;
> + if (outer_start + (1UL << order) > start) {
> + struct page *free_page = pfn_to_page(outer_start);
> +
> + split_free_page_into_pageblocks(free_page, order, cc.zone);
> + }
> + }
> +
> + /*
> + * Split free page spanning [alloc_end, isolate_end) and put the
> + * pageblocks in the right migratetype list
> + */
> + order = 0;
> + outer_end = end;
> + while (!PageBuddy(pfn_to_page(outer_end))) {
> + if (++order >= MAX_ORDER) {
> + outer_end = end;
> + break;
> + }
> + outer_end &= ~0UL << order;
> + }
> +
> + if (outer_end != end) {
> + order = buddy_order(pfn_to_page(outer_end));
> +
> + /*
> + * split the free page has start page and put the pageblocks
> + * in the right migratetype list
> + */
> + VM_BUG_ON(outer_end + (1UL << order) <= end);
> + {
> + struct page *free_page = pfn_to_page(outer_end);
> +
> + split_free_page_into_pageblocks(free_page, order, cc.zone);
> + }
> }
>
> /* Make sure the range is really isolated. */
> - if (test_pages_isolated(outer_start, end, 0)) {
> + if (test_pages_isolated(alloc_start, alloc_end, 0)) {
> ret = -EBUSY;
> goto done;
> }
>
> /* Grab isolated pages from freelists. */
> - outer_end = isolate_freepages_range(&cc, outer_start, end);
> + outer_end = isolate_freepages_range(&cc, alloc_start, alloc_end);
> if (!outer_end) {
> ret = -EBUSY;
> goto done;
> }
>
> /* Free head and tail (if any) */
> - if (start != outer_start)
> - free_contig_range(outer_start, start - outer_start);
> - if (end != outer_end)
> - free_contig_range(end, outer_end - end);
> + if (start != alloc_start)
> + free_contig_range(alloc_start, start - alloc_start);
> + if (end != alloc_end)
> + free_contig_range(end, alloc_end - end);
>
> done:
> - undo_isolate_page_range(pfn_max_align_down(start),
> - pfn_max_align_up(end), migratetype);
> + kfree(saved_mt);
> + undo_isolate_page_range(alloc_start,
> + alloc_end, migratetype);
> return ret;
> }
> EXPORT_SYMBOL(alloc_contig_range);


2021-12-10 15:30:39

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment.

On 10 Dec 2021, at 2:30, Eric Ren wrote:

> Hi Zi Yan,
>
> On 2021/12/10 07:04, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> Hi all,
>>
>> This patchset tries to remove the MAX_ORDER - 1 alignment requirement for CMA
>> and alloc_contig_range(). It prepares for my upcoming changes to make MAX_ORDER
>> adjustable at boot time[1].
>>
>> The MAX_ORDER - 1 alignment requirement comes from that alloc_contig_range()
>> isolates pageblocks to remove free memory from buddy allocator but isolating
>> only a subset of pageblocks within a page spanning across multiple pageblocks
>> causes free page accounting issues. Isolated page might not be put into the
>> right free list, since the code assumes the migratetype of the first pageblock
>> as the whole free page migratetype. This is based on the discussion at [2].
>>
>> To remove the requirement, this patchset:
>> 1. still isolates pageblocks at MAX_ORDER - 1 granularity;
> Then, unplug fails if either pageblock of the  MAX_ORDER - 1 page has unmovable page, right?

Right. One of the optimizations mentioned targets this by passing the actual
range instead of the MAX_ORDER-1 aligned range, so that has_unmovable_pages()
will not give false positives, minimizing the isolation failure rate.

>
> Thanks,
> Eric
>> 2. but saves the pageblock migratetypes outside the specified range of
>> alloc_contig_range() and restores them after all pages within the range
>> become free after __alloc_contig_migrate_range();
>> 3. splits free pages spanning multiple pageblocks at the beginning and the end
>> of the range and puts the split pages to the right migratetype free lists
>> based on the pageblock migratetypes;
>> 4. returns pages not in the range as it did before this patch.
>>
>> Isolation needs to happen at MAX_ORDER - 1 granularity, because otherwise
>> 1) extra code is needed to detect pages (free, PageHuge, THP, or PageCompound)
>> to make sure all pageblocks belonging to a single page are isolated together
>> and later pageblocks outside the range need to have their migratetypes restored;
>> or 2) extra logic will need to be added during page free time to split a free
>> page with multi-migratetype pageblocks.
>>
>> Two optimizations might come later:
>> 1. only check unmovable pages within the range instead of MAX_ORDER - 1 aligned
>> range during isolation to increase successful rate of alloc_contig_range().
^^^^^^^^^^^^^^

>> 2. make MIGRATE_ISOLATE a separate bit to avoid saving and restoring existing
>> migratetypes before and after isolation respectively.
>>
>> Feel free to give comments and suggestions. Thanks.
>>
>>
>> [1] https://lore.kernel.org/linux-mm/[email protected]/
>> [2] https://lore.kernel.org/linux-mm/[email protected]/
>>
>>
>> Zi Yan (7):
>> mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
>> mm: compaction: handle non-lru compound pages properly in
>> isolate_migratepages_block().
>> mm: migrate: allocate the right size of non hugetlb or THP compound
>> pages.
>> mm: make alloc_contig_range work at pageblock granularity
>> mm: cma: use pageblock_order as the single alignment
>> drivers: virtio_mem: use pageblock size as the minimum virtio_mem
>> size.
>> arch: powerpc: adjust fadump alignment to be pageblock aligned.
>>
>> arch/powerpc/include/asm/fadump-internal.h | 4 +-
>> drivers/virtio/virtio_mem.c | 6 +-
>> include/linux/mmzone.h | 11 +-
>> kernel/dma/contiguous.c | 2 +-
>> mm/cma.c | 6 +-
>> mm/compaction.c | 10 +-
>> mm/migrate.c | 8 +-
>> mm/page_alloc.c | 203 +++++++++++++++++----
>> 8 files changed, 196 insertions(+), 54 deletions(-)
>>


--
Best Regards,
Yan, Zi



2021-12-10 15:40:15

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others.

Hi Eric,

Thanks for looking into my patch.

On 10 Dec 2021, at 2:43, Eric Ren wrote:

> Hi,
>
> On 2021/12/10 07:04, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance.
>> It prepares for the upcoming removal of the MAX_ORDER-1 alignment
>> requirement for CMA and alloc_contig_range().
>>
>> MIGRARTE_HIGHATOMIC should not merge with other migratetypes like
>> MIGRATE_ISOLATE and MIGRARTE_CMA[1], so this commit prevents that too.
>> Also add MIGRARTE_HIGHATOMIC to fallbacks array for completeness.
>>
>> [1] https://lore.kernel.org/linux-mm/[email protected]/
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> include/linux/mmzone.h | 6 ++++++
>> mm/page_alloc.c | 28 ++++++++++++++++++----------
>> 2 files changed, 24 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 58e744b78c2c..b925431b0123 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -83,6 +83,12 @@ static inline bool is_migrate_movable(int mt)
>> return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
>> }
>> +/* See fallbacks[MIGRATE_TYPES][3] in page_alloc.c */
>> +static inline bool migratetype_has_fallback(int mt)
>> +{
>> + return mt < MIGRATE_PCPTYPES;
>> +}
>> +
>
> I would suggest spliting the patch into 2 parts.  The first part: no functioning change, just introduce migratetype_has_fallback()
> and replace where it applys to.

OK. I can do that.

>
>> #define for_each_migratetype_order(order, type) \
>> for (order = 0; order < MAX_ORDER; order++) \
>> for (type = 0; type < MIGRATE_TYPES; type++)
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index edfd6c81af82..107a5f186d3b 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1041,6 +1041,12 @@ buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn,
>> return page_is_buddy(higher_page, higher_buddy, order + 1);
>> }
>> +static inline bool has_non_fallback_pageblock(struct zone *zone)
>> +{
>> + return has_isolate_pageblock(zone) || zone_cma_pages(zone) != 0 ||
>> + zone->nr_reserved_highatomic != 0;
>
> Make zone->nr_reserved_highatomic != 0 a helper as zone_cma_pages()?

I am not sure. We have zone_cma_pages() because when CMA is not enabled, 0 can
simply be returned. But MIGRATE_HIGHATOMIC is always present, so a helper
function is not that useful.

>> +}
>> +
>> /*
>> * Freeing function for a buddy system allocator.
>> *
>> @@ -1116,14 +1122,15 @@ static inline void __free_one_page(struct page *page,
>> }
>> if (order < MAX_ORDER - 1) {
>> /* If we are here, it means order is >= pageblock_order.
>> - * We want to prevent merge between freepages on isolate
>> - * pageblock and normal pageblock. Without this, pageblock
>> - * isolation could cause incorrect freepage or CMA accounting.
>> + * We want to prevent merge between freepages on pageblock
>> + * without fallbacks and normal pageblock. Without this,
>> + * pageblock isolation could cause incorrect freepage or CMA
>> + * accounting or HIGHATOMIC accounting.
>> *
>> * We don't want to hit this code for the more frequent
>> * low-order merging.
>> */
>> - if (unlikely(has_isolate_pageblock(zone))) {
>> + if (unlikely(has_non_fallback_pageblock(zone))) {
> I'm not familiar with the code details, just wondering if this change would has side effects on cma
> pageblock merging as it the condition stronger?

No impact on cma pageblock merging, AFAICT.

>
> Thanks,
> Eric
>> int buddy_mt;
>> buddy_pfn = __find_buddy_pfn(pfn, order);
>> @@ -1131,8 +1138,8 @@ static inline void __free_one_page(struct page *page,
>> buddy_mt = get_pageblock_migratetype(buddy);
>> if (migratetype != buddy_mt
>> - && (is_migrate_isolate(migratetype) ||
>> - is_migrate_isolate(buddy_mt)))
>> + && (!migratetype_has_fallback(migratetype) ||
>> + !migratetype_has_fallback(buddy_mt)))
>> goto done_merging;
>> }
>> max_order = order + 1;
>> @@ -2483,6 +2490,7 @@ static int fallbacks[MIGRATE_TYPES][3] = {
>> [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_TYPES },
>> [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
>> [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_TYPES },
>> + [MIGRATE_HIGHATOMIC] = { MIGRATE_TYPES }, /* Never used */
>> #ifdef CONFIG_CMA
>> [MIGRATE_CMA] = { MIGRATE_TYPES }, /* Never used */
>> #endif
>> @@ -2794,8 +2802,8 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
>> /* Yoink! */
>> mt = get_pageblock_migratetype(page);
>> - if (!is_migrate_highatomic(mt) && !is_migrate_isolate(mt)
>> - && !is_migrate_cma(mt)) {
>> + /* Only reserve normal pageblock */
>> + if (migratetype_has_fallback(mt)) {
>> zone->nr_reserved_highatomic += pageblock_nr_pages;
>> set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
>> move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
>> @@ -3544,8 +3552,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
>> struct page *endpage = page + (1 << order) - 1;
>> for (; page < endpage; page += pageblock_nr_pages) {
>> int mt = get_pageblock_migratetype(page);
>> - if (!is_migrate_isolate(mt) && !is_migrate_cma(mt)
>> - && !is_migrate_highatomic(mt))
>> + /* Only change normal pageblock */
>> + if (migratetype_has_fallback(mt))
>> set_pageblock_migratetype(page,
>> MIGRATE_MOVABLE);
>> }


--
Best Regards,
Yan, Zi



2021-12-10 15:49:23

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH v2 3/7] mm: migrate: allocate the right size of non hugetlb or THP compound pages.

On 10 Dec 2021, at 2:53, Eric Ren wrote:

> Hi,
>
> On 2021/12/10 07:04, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> alloc_migration_target() is used by alloc_contig_range() and non-LRU
>> movable compound pages can be migrated. Current code does not allocate the
>> right page size for such pages. Check THP precisely using
>> is_transparent_huge() and add allocation support for non-LRU compound
>> pages.
> Could you elaborate on why the current code doesn't get the right size?  how this patch fixes it.

The current code only checks PageHuge() and PageTransHuge(). Non-LRU compound
pages will be regarded as PageTransHuge() and an order-9 page is always
allocated regardless of the actual page order. This patch makes the exact
check for THPs using is_transparent_hugepage() instead of PageTransHuge() and
checks PageCompound() after PageHuge() and is_transparent_hugepage() for
non-LRU compound pages.

>
> The description sounds like it's an existing bug. If so, the patch subject should be changed to
> "Fixes ..."?

I have not seen any related bug report.

>
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> mm/migrate.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index d487a399253b..2ce3c771b1de 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1563,7 +1563,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
>> return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
>> }
>> - if (PageTransHuge(page)) {
>> + if (is_transparent_hugepage(page)) {
>> /*
>> * clear __GFP_RECLAIM to make the migration callback
>> * consistent with regular THP allocations.
>> @@ -1572,13 +1572,17 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
> if (PageTransHuge(page)) {  // just give more code context
> ...
>> gfp_mask |= GFP_TRANSHUGE;
>> order = HPAGE_PMD_ORDER;
> order assigned here
>> }
>> + if (PageCompound(page)) {
>> + gfp_mask |= __GFP_COMP;
>> + order = compound_order(page);
> re-assigned again, as a THP is a compound page?

Ah, you are right. Will use else if instead. Thanks.
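
Something along these lines (just a sketch of the intended fix; the next
version will carry the real diff):

        if (is_transparent_hugepage(page)) {
                /*
                 * clear __GFP_RECLAIM to make the migration callback
                 * consistent with regular THP allocations.
                 */
                gfp_mask &= ~__GFP_RECLAIM;
                gfp_mask |= GFP_TRANSHUGE;
                order = HPAGE_PMD_ORDER;
        } else if (PageCompound(page)) {
                /* Non-LRU movable compound page: match its actual order. */
                gfp_mask |= __GFP_COMP;
                order = compound_order(page);
        }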

> Thanks,
> Eric
>> + }
>> zidx = zone_idx(page_zone(page));
>> if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
>> gfp_mask |= __GFP_HIGHMEM;
>> new_page = __alloc_pages(gfp_mask, order, nid, mtc->nmask);
>> - if (new_page && PageTransHuge(new_page))
>> + if (new_page && is_transparent_hugepage(page))
>> prep_transhuge_page(new_page);
>> return new_page;


--
Best Regards,
Yan, Zi



2021-12-10 18:00:04

by Yang Shi

[permalink] [raw]
Subject: Re: [RFC PATCH v2 3/7] mm: migrate: allocate the right size of non hugetlb or THP compound pages.

On Fri, Dec 10, 2021 at 8:08 AM Zi Yan <[email protected]> wrote:
>
> On 10 Dec 2021, at 2:53, Eric Ren wrote:
>
> > Hi,
> >
> > On 2021/12/10 07:04, Zi Yan wrote:
> >> From: Zi Yan <[email protected]>
> >>
> >> alloc_migration_target() is used by alloc_contig_range() and non-LRU
> >> movable compound pages can be migrated. Current code does not allocate the
> >> right page size for such pages. Check THP precisely using
> >> is_transparent_hugepage() and add allocation support for non-LRU compound
> >> pages.
> > Could you elaborate on why the current code doesn't get the right size, and how this patch fixes it?
>
> The current code only checks PageHuge() and PageTransHuge(). Since PageTransHuge()
> is just a head-page test, a non-LRU movable compound page also passes it, so an
> order-9 (HPAGE_PMD_ORDER) page is always allocated regardless of the actual page
> order. This patch checks THPs precisely with is_transparent_hugepage() instead of
> PageTransHuge(), and checks PageCompound() after PageHuge() and
> is_transparent_hugepage() to handle the remaining non-LRU compound pages.
>
> >
> > The description sounds like it's an existing bug. If so, the patch subject should be changed to
> > "Fixes ..."?
>
> I have not seen any related bug report.
>
> >
> >>
> >> Signed-off-by: Zi Yan <[email protected]>
> >> ---
> >> mm/migrate.c | 8 ++++++--
> >> 1 file changed, 6 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/mm/migrate.c b/mm/migrate.c
> >> index d487a399253b..2ce3c771b1de 100644
> >> --- a/mm/migrate.c
> >> +++ b/mm/migrate.c
> >> @@ -1563,7 +1563,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
> >> return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
> >> }
> >> - if (PageTransHuge(page)) {
> >> + if (is_transparent_hugepage(page)) {
> >> /*
> >> * clear __GFP_RECLAIM to make the migration callback
> >> * consistent with regular THP allocations.
> >> @@ -1572,13 +1572,17 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
> > if (PageTransHuge(page)) { // just give more code context
> > ...
> >> gfp_mask |= GFP_TRANSHUGE;
> >> order = HPAGE_PMD_ORDER;
> > order assigned here
> >> }
> >> + if (PageCompound(page)) {
> >> + gfp_mask |= __GFP_COMP;
> >> + order = compound_order(page);
> > re-assigned again, as a THP is a compound page?
>
> Ah, you are right. Will use else if instead. Thanks.

You don't have to use else if; you could just do:

if (PageCompound(page)) {
        gfp_mask |= __GFP_COMP;
        order = compound_order(page);
}

if (is_transparent_hugepage(page)) {
        /* Manipulate THP specific gfp mask */
        ....
}
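
(FWIW, for a THP both forms should end up equivalent: the THP branch runs
after the PageCompound() one, GFP_TRANSHUGE already contains __GFP_COMP,
and HPAGE_PMD_ORDER matches compound_order() for a PMD-sized THP -- assuming
I'm reading the gfp definitions right.)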


>
> > Thanks,
> > Eric
> >> + }
> >> zidx = zone_idx(page_zone(page));
> >> if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
> >> gfp_mask |= __GFP_HIGHMEM;
> >> new_page = __alloc_pages(gfp_mask, order, nid, mtc->nmask);
> >> - if (new_page && PageTransHuge(new_page))
> >> + if (new_page && is_transparent_hugepage(page))
> >> prep_transhuge_page(new_page);
> >> return new_page;
>
>
> --
> Best Regards,
> Yan, Zi

2021-12-10 18:36:35

by David Hildenbrand

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment.

On 10.12.21 00:04, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> Hi all,

Hi,

thanks for working on that!

>
> This patchset tries to remove the MAX_ORDER - 1 alignment requirement for CMA
> and alloc_contig_range(). It prepares for my upcoming changes to make MAX_ORDER
> adjustable at boot time[1].
>
> The MAX_ORDER - 1 alignment requirement comes from that alloc_contig_range()
> isolates pageblocks to remove free memory from buddy allocator but isolating
> only a subset of pageblocks within a page spanning across multiple pageblocks
> causes free page accounting issues. Isolated page might not be put into the
> right free list, since the code assumes the migratetype of the first pageblock
> as the whole free page migratetype. This is based on the discussion at [2].
>
> To remove the requirement, this patchset:
> 1. still isolates pageblocks at MAX_ORDER - 1 granularity;
> 2. but saves the pageblock migratetypes outside the specified range of
> alloc_contig_range() and restores them after all pages within the range
> become free after __alloc_contig_migrate_range();
> 3. splits free pages spanning multiple pageblocks at the beginning and the end
> of the range and puts the split pages to the right migratetype free lists
> based on the pageblock migratetypes;
> 4. returns pages not in the range as it did before this patch.
>
> Isolation needs to happen at MAX_ORDER - 1 granularity, because otherwise
> 1) extra code is needed to detect pages (free, PageHuge, THP, or PageCompound)
> to make sure all pageblocks belonging to a single page are isolated together
> and later pageblocks outside the range need to have their migratetypes restored;
> or 2) extra logic will need to be added during page free time to split a free
> page with multi-migratetype pageblocks.
>
> Two optimizations might come later:
> 1. only check unmovable pages within the range instead of MAX_ORDER - 1 aligned
> range during isolation to increase successful rate of alloc_contig_range().

The issue with virtio-mem is that we'll need that as soon as we change
the granularity to pageblocks, because otherwise, you can heavily
degrade unplug reliability in sane setups:

Previous:
* Try to unplug a free 4M range (2 pageblocks): succeeds

Now:
* Try to unplug a 2M range (first pageblock): succeeds.
* Try to unplug the next 2M range (second pageblock): fails because the first
one contains unmovable allocations.

--
Thanks,

David / dhildenb


2021-12-10 20:18:02

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment.

On 10 Dec 2021, at 13:36, David Hildenbrand wrote:

> On 10.12.21 00:04, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> Hi all,
>
> Hi,
>
> thanks for working on that!
>
>>
>> This patchset tries to remove the MAX_ORDER - 1 alignment requirement for CMA
>> and alloc_contig_range(). It prepares for my upcoming changes to make MAX_ORDER
>> adjustable at boot time[1].
>>
>> The MAX_ORDER - 1 alignment requirement comes from that alloc_contig_range()
>> isolates pageblocks to remove free memory from buddy allocator but isolating
>> only a subset of pageblocks within a page spanning across multiple pageblocks
>> causes free page accounting issues. Isolated page might not be put into the
>> right free list, since the code assumes the migratetype of the first pageblock
>> as the whole free page migratetype. This is based on the discussion at [2].
>>
>> To remove the requirement, this patchset:
>> 1. still isolates pageblocks at MAX_ORDER - 1 granularity;
>> 2. but saves the pageblock migratetypes outside the specified range of
>> alloc_contig_range() and restores them after all pages within the range
>> become free after __alloc_contig_migrate_range();
>> 3. splits free pages spanning multiple pageblocks at the beginning and the end
>> of the range and puts the split pages to the right migratetype free lists
>> based on the pageblock migratetypes;
>> 4. returns pages not in the range as it did before this patch.
>>
>> Isolation needs to happen at MAX_ORDER - 1 granularity, because otherwise
>> 1) extra code is needed to detect pages (free, PageHuge, THP, or PageCompound)
>> to make sure all pageblocks belonging to a single page are isolated together
>> and later pageblocks outside the range need to have their migratetypes restored;
>> or 2) extra logic will need to be added during page free time to split a free
>> page with multi-migratetype pageblocks.
>>
>> Two optimizations might come later:
>> 1. only check unmovable pages within the range instead of MAX_ORDER - 1 aligned
>> range during isolation to increase successful rate of alloc_contig_range().
>
> The issue with virtio-mem is that we'll need that as soon as we change
> the granularity to pageblocks, because otherwise, you can heavily
> degrade unplug reliability in sane setups:
>
> Previous:
> * Try to unplug a free 4M range (2 pageblocks): succeeds
>
> Now:
> * Try to unplug a 2M range (first pageblock): succeeds.
> * Try to unplug the next 2M range (second pageblock): fails because the first
> one contains unmovable allocations.
>

OK. Makes sense. I will add it in the next version.


--
Best Regards,
Yan, Zi

