2022-01-21 19:58:57

by Zi Yan

Subject: [PATCH v4 0/7] Use pageblock_order for cma and alloc_contig_range alignment.

From: Zi Yan <[email protected]>

Hi all,

This patchset tries to remove the MAX_ORDER-1 alignment requirement for CMA
and alloc_contig_range(). It prepares for my upcoming changes to make
MAX_ORDER adjustable at boot time[1]. It is on top of mmotm-2021-12-29-20-07.

Changelog from RFC
===
1. Dropped two irrelevant patches on non-lru compound page handling, as
it is not supported upstream.
2. Renamed migratetype_has_fallback() to migratetype_is_mergeable().
3. Always check whether two pageblocks can be merged in
__free_one_page() when the order is >= pageblock_order, as the case
(non-mergeable pageblocks: isolated, CMA, and HIGHATOMIC) becomes
more common.
4. Moving has_unmovable_pages() is now a separate patch.
5. Removed the MAX_ORDER-1 alignment requirement from the comment in the
virtio_mem code.

Description
===

The MAX_ORDER - 1 alignment requirement comes from the fact that
alloc_contig_range() isolates pageblocks to remove free memory from the
buddy allocator, but isolating only a subset of the pageblocks within a
free page spanning multiple pageblocks causes free page accounting issues:
an isolated page might not be put on the right free list, since the code
takes the migratetype of the first pageblock as the migratetype of the
whole free page. This is based on the discussion at [2].
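
A minimal sketch of the accounting issue, abridged from the freeing path
in mm/page_alloc.c (illustrative only, not code from this patchset):

/*
 * Sketch: the free path consults only the first pageblock of a free
 * page when choosing its free list.
 */
static void free_one_page_sketch(struct zone *zone, struct page *page,
				 unsigned long pfn, unsigned int order)
{
	/*
	 * Migratetype of the first pageblock only, even when order >
	 * pageblock_order and the page spans pageblocks with different
	 * migratetypes.
	 */
	int migratetype = get_pfnblock_migratetype(page, pfn);

	/*
	 * If a later pageblock is MIGRATE_ISOLATE but the first is not,
	 * the whole page lands on a non-isolate free list and the free
	 * page accounting goes wrong.
	 */
	__free_one_page(page, pfn, zone, order, migratetype, FPI_NONE);
}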

To remove the requirement, this patchset (see the sketch after this list):
1. still isolates pageblocks at MAX_ORDER - 1 granularity;
2. but saves the pageblock migratetypes outside the specified range of
alloc_contig_range() and restores them after all pages within the range
become free following __alloc_contig_migrate_range();
3. only checks for unmovable pages within the range, instead of the whole
MAX_ORDER - 1 aligned range, during isolation, to avoid alloc_contig_range()
failures when pageblocks within a MAX_ORDER - 1 aligned range are
allocated separately;
4. splits free pages spanning multiple pageblocks at the beginning and the
end of the range, and puts the split pages on the free lists matching
their pageblock migratetypes;
5. returns pages not in the range to the buddy allocator, as it did before.
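
In code, the new flow looks roughly like the sketch below (illustrative
only; the helper names are placeholders rather than the ones used in the
patches, and locking/error handling are omitted):

static int alloc_contig_range_sketch(unsigned long start, unsigned long end)
{
	unsigned long isolate_start = ALIGN_DOWN(start, MAX_ORDER_NR_PAGES);
	unsigned long isolate_end = ALIGN(end, MAX_ORDER_NR_PAGES);
	unsigned long pfn;

	/* 1+2. isolate at MAX_ORDER - 1 granularity, remembering the old
	 * migratetypes of the pageblocks outside [start, end) */
	for (pfn = isolate_start; pfn < isolate_end; pfn += pageblock_nr_pages) {
		if (pfn < start || pfn >= end)
			save_migratetype(pfn_to_page(pfn));	/* placeholder */
		set_pageblock_migratetype(pfn_to_page(pfn), MIGRATE_ISOLATE);
	}

	/* 3. fail only if [start, end) itself, not the whole aligned
	 * block, contains unmovable pages; then migrate everything
	 * movable out so all pages in the range become free */

	/* 4. restore the saved migratetypes outside [start, end) and
	 * split any free page straddling a range boundary, putting each
	 * piece on the free list of its pageblock */

	/* 5. take the pages in [start, end); pages outside the range are
	 * returned to the buddy allocator as before */
	return 0;
}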

Isolation needs to be done at MAX_ORDER - 1 granularity because otherwise
either 1) the size of each to-be-isolated page (free, PageHuge, THP, or
other PageCompound) would have to be detected to make sure all pageblocks
belonging to a single page are isolated together, with the pageblock
migratetypes outside the range restored afterwards, or 2) if isolation
happened at pageblock granularity, a free page with pageblocks of
different migratetypes could be seen in the free page path and would need
to be split and freed at pageblock granularity.
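
For example, with x86_64 defaults (4KB base pages, MAX_ORDER = 11,
pageblock_order = 9), a MAX_ORDER - 1 free page is 4MB and spans two 2MB
pageblocks, so isolating only one of the two pageblocks runs into exactly
the detect-or-split problem described above.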

One optimization might come later:
1. make MIGRATE_ISOLATE a separate bit, so that the existing migratetypes
no longer need to be saved before isolation and restored afterwards.

Feel free to give comments and suggestions. Thanks.

[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/


Zi Yan (7):
mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
mm: page_isolation: check specified range for unmovable pages
mm: make alloc_contig_range work at pageblock granularity
mm: cma: use pageblock_order as the single alignment
drivers: virtio_mem: use pageblock size as the minimum virtio_mem
size.
arch: powerpc: adjust fadump alignment to be pageblock aligned.

arch/powerpc/include/asm/fadump-internal.h |   4 +-
drivers/virtio/virtio_mem.c                |   7 +-
include/linux/mmzone.h                     |  16 +-
include/linux/page-isolation.h             |   3 +-
kernel/dma/contiguous.c                    |   2 +-
mm/cma.c                                   |   6 +-
mm/memory_hotplug.c                        |  12 +-
mm/page_alloc.c                            | 337 +++++++++++----------
mm/page_isolation.c                        | 154 +++++++++-
9 files changed, 352 insertions(+), 189 deletions(-)

--
2.34.1


2022-01-21 19:59:03

by Zi Yan

Subject: [PATCH v4 2/7] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c

From: Zi Yan <[email protected]>

has_unmovable_pages() is only used in mm/page_isolation.c. Move it from
mm/page_alloc.c and make it static.
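
For context, a simplified sketch of the sole caller,
set_migratetype_isolate() in mm/page_isolation.c (abridged; the freepage
accounting and move_freepages_block() steps are omitted):

static int set_migratetype_isolate(struct page *page, int migratetype,
				   int isol_flags)
{
	struct zone *zone = page_zone(page);
	struct page *unmovable;
	unsigned long flags;

	spin_lock_irqsave(&zone->lock, flags);

	/* isolate the pageblock only if nothing in it is unmovable */
	unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
	if (!unmovable) {
		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
		zone->nr_isolate_pageblock++;
		spin_unlock_irqrestore(&zone->lock, flags);
		return 0;
	}
	spin_unlock_irqrestore(&zone->lock, flags);

	/* on failure, optionally report the offending page */
	if (isol_flags & REPORT_FAILURE)
		dump_page(unmovable, "unmovable page");
	return -EBUSY;
}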

Signed-off-by: Zi Yan <[email protected]>
---
include/linux/page-isolation.h |   2 -
mm/page_alloc.c                | 119 ---------------------------------
mm/page_isolation.c            | 119 +++++++++++++++++++++++++++++++++
3 files changed, 119 insertions(+), 121 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 572458016331..e14eddf6741a 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,8 +33,6 @@ static inline bool is_migrate_isolate(int migratetype)
#define MEMORY_OFFLINE 0x1
#define REPORT_FAILURE 0x2

-struct page *has_unmovable_pages(struct zone *zone, struct page *page,
- int migratetype, int flags);
void set_pageblock_migratetype(struct page *page, int migratetype);
int move_freepages_block(struct zone *zone, struct page *page,
int migratetype, int *num_movable);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 15de65215c02..1d812268c2a9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8859,125 +8859,6 @@ void *__init alloc_large_system_hash(const char *tablename,
return table;
}

-/*
- * This function checks whether pageblock includes unmovable pages or not.
- *
- * PageLRU check without isolation or lru_lock could race so that
- * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
- * check without lock_page also may miss some movable non-lru pages at
- * race condition. So you can't expect this function should be exact.
- *
- * Returns a page without holding a reference. If the caller wants to
- * dereference that page (e.g., dumping), it has to make sure that it
- * cannot get removed (e.g., via memory unplug) concurrently.
- *
- */
-struct page *has_unmovable_pages(struct zone *zone, struct page *page,
- int migratetype, int flags)
-{
- unsigned long iter = 0;
- unsigned long pfn = page_to_pfn(page);
- unsigned long offset = pfn % pageblock_nr_pages;
-
- if (is_migrate_cma_page(page)) {
- /*
- * CMA allocations (alloc_contig_range) really need to mark
- * isolate CMA pageblocks even when they are not movable in fact
- * so consider them movable here.
- */
- if (is_migrate_cma(migratetype))
- return NULL;
-
- return page;
- }
-
- for (; iter < pageblock_nr_pages - offset; iter++) {
- page = pfn_to_page(pfn + iter);
-
- /*
- * Both, bootmem allocations and memory holes are marked
- * PG_reserved and are unmovable. We can even have unmovable
- * allocations inside ZONE_MOVABLE, for example when
- * specifying "movablecore".
- */
- if (PageReserved(page))
- return page;
-
- /*
- * If the zone is movable and we have ruled out all reserved
- * pages then it should be reasonably safe to assume the rest
- * is movable.
- */
- if (zone_idx(zone) == ZONE_MOVABLE)
- continue;
-
- /*
- * Hugepages are not in LRU lists, but they're movable.
- * THPs are on the LRU, but need to be counted as #small pages.
- * We need not scan over tail pages because we don't
- * handle each tail page individually in migration.
- */
- if (PageHuge(page) || PageTransCompound(page)) {
- struct page *head = compound_head(page);
- unsigned int skip_pages;
-
- if (PageHuge(page)) {
- if (!hugepage_migration_supported(page_hstate(head)))
- return page;
- } else if (!PageLRU(head) && !__PageMovable(head)) {
- return page;
- }
-
- skip_pages = compound_nr(head) - (page - head);
- iter += skip_pages - 1;
- continue;
- }
-
- /*
- * We can't use page_count without pin a page
- * because another CPU can free compound page.
- * This check already skips compound tails of THP
- * because their page->_refcount is zero at all time.
- */
- if (!page_ref_count(page)) {
- if (PageBuddy(page))
- iter += (1 << buddy_order(page)) - 1;
- continue;
- }
-
- /*
- * The HWPoisoned page may be not in buddy system, and
- * page_count() is not 0.
- */
- if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
- continue;
-
- /*
- * We treat all PageOffline() pages as movable when offlining
- * to give drivers a chance to decrement their reference count
- * in MEM_GOING_OFFLINE in order to indicate that these pages
- * can be offlined as there are no direct references anymore.
- * For actually unmovable PageOffline() where the driver does
- * not support this, we will fail later when trying to actually
- * move these pages that still have a reference count > 0.
- * (false negatives in this function only)
- */
- if ((flags & MEMORY_OFFLINE) && PageOffline(page))
- continue;
-
- if (__PageMovable(page) || PageLRU(page))
- continue;
-
- /*
- * If there are RECLAIMABLE pages, we need to check
- * it. But now, memory offline itself doesn't call
- * shrink_node_slabs() and it still to be fixed.
- */
- return page;
- }
- return NULL;
-}
-
#ifdef CONFIG_CONTIG_ALLOC
static unsigned long pfn_max_align_down(unsigned long pfn)
{
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 6a0ddda6b3c5..6c841274bf46 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -15,6 +15,125 @@
#define CREATE_TRACE_POINTS
#include <trace/events/page_isolation.h>

+/*
+ * This function checks whether pageblock includes unmovable pages or not.
+ *
+ * PageLRU check without isolation or lru_lock could race so that
+ * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
+ * check without lock_page also may miss some movable non-lru pages at
+ * race condition. So you can't expect this function should be exact.
+ *
+ * Returns a page without holding a reference. If the caller wants to
+ * dereference that page (e.g., dumping), it has to make sure that it
+ * cannot get removed (e.g., via memory unplug) concurrently.
+ *
+ */
+static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
+ int migratetype, int flags)
+{
+ unsigned long iter = 0;
+ unsigned long pfn = page_to_pfn(page);
+ unsigned long offset = pfn % pageblock_nr_pages;
+
+ if (is_migrate_cma_page(page)) {
+ /*
+ * CMA allocations (alloc_contig_range) really need to mark
+ * isolate CMA pageblocks even when they are not movable in fact
+ * so consider them movable here.
+ */
+ if (is_migrate_cma(migratetype))
+ return NULL;
+
+ return page;
+ }
+
+ for (; iter < pageblock_nr_pages - offset; iter++) {
+ page = pfn_to_page(pfn + iter);
+
+ /*
+ * Both, bootmem allocations and memory holes are marked
+ * PG_reserved and are unmovable. We can even have unmovable
+ * allocations inside ZONE_MOVABLE, for example when
+ * specifying "movablecore".
+ */
+ if (PageReserved(page))
+ return page;
+
+ /*
+ * If the zone is movable and we have ruled out all reserved
+ * pages then it should be reasonably safe to assume the rest
+ * is movable.
+ */
+ if (zone_idx(zone) == ZONE_MOVABLE)
+ continue;
+
+ /*
+ * Hugepages are not in LRU lists, but they're movable.
+ * THPs are on the LRU, but need to be counted as #small pages.
+ * We need not scan over tail pages because we don't
+ * handle each tail page individually in migration.
+ */
+ if (PageHuge(page) || PageTransCompound(page)) {
+ struct page *head = compound_head(page);
+ unsigned int skip_pages;
+
+ if (PageHuge(page)) {
+ if (!hugepage_migration_supported(page_hstate(head)))
+ return page;
+ } else if (!PageLRU(head) && !__PageMovable(head)) {
+ return page;
+ }
+
+ skip_pages = compound_nr(head) - (page - head);
+ iter += skip_pages - 1;
+ continue;
+ }
+
+ /*
+ * We can't use page_count without pin a page
+ * because another CPU can free compound page.
+ * This check already skips compound tails of THP
+ * because their page->_refcount is zero at all time.
+ */
+ if (!page_ref_count(page)) {
+ if (PageBuddy(page))
+ iter += (1 << buddy_order(page)) - 1;
+ continue;
+ }
+
+ /*
+ * The HWPoisoned page may be not in buddy system, and
+ * page_count() is not 0.
+ */
+ if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
+ continue;
+
+ /*
+ * We treat all PageOffline() pages as movable when offlining
+ * to give drivers a chance to decrement their reference count
+ * in MEM_GOING_OFFLINE in order to indicate that these pages
+ * can be offlined as there are no direct references anymore.
+ * For actually unmovable PageOffline() where the driver does
+ * not support this, we will fail later when trying to actually
+ * move these pages that still have a reference count > 0.
+ * (false negatives in this function only)
+ */
+ if ((flags & MEMORY_OFFLINE) && PageOffline(page))
+ continue;
+
+ if (__PageMovable(page) || PageLRU(page))
+ continue;
+
+ /*
+ * If there are RECLAIMABLE pages, we need to check
+ * it. But now, memory offline itself doesn't call
+ * shrink_node_slabs() and it still to be fixed.
+ */
+ return page;
+ }
+ return NULL;
+}
+
static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
{
struct zone *zone = page_zone(page);
--
2.34.1

2022-01-21 19:59:10

by Zi Yan

Subject: [PATCH v4 5/7] mm: cma: use pageblock_order as the single alignment

From: Zi Yan <[email protected]>

Now alloc_contig_range() works at pageblock granularity. Change CMA
allocation, which uses alloc_contig_range(), to use pageblock_order
alignment.
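
To make the change concrete, the minimum CMA area alignment under x86_64
defaults (PAGE_SIZE = 4KB, MAX_ORDER = 11, pageblock_order = 9;
illustrative numbers, not part of the patch):

/* before: PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order)
 *       = 4KB << 10 = 4MB */
/* after:  PAGE_SIZE << pageblock_order
 *       = 4KB << 9  = 2MB */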

Signed-off-by: Zi Yan <[email protected]>
---
include/linux/mmzone.h  |  5 +----
kernel/dma/contiguous.c |  2 +-
mm/cma.c                |  6 ++----
mm/page_alloc.c         |  6 +++---
4 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 71b77aab748d..7bd3694b24b4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -54,10 +54,7 @@ enum migratetype {
*
* The way to use it is to change migratetype of a range of
* pageblocks to MIGRATE_CMA which can be done by
- * __free_pageblock_cma() function. What is important though
- * is that a range of pageblocks must be aligned to
- * MAX_ORDER_NR_PAGES should biggest page be bigger than
- * a single pageblock.
+ * __free_pageblock_cma() function.
*/
MIGRATE_CMA,
#endif
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 3d63d91cba5c..ac35b14b0786 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -399,7 +399,7 @@ static const struct reserved_mem_ops rmem_cma_ops = {

static int __init rmem_cma_setup(struct reserved_mem *rmem)
{
- phys_addr_t align = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
+ phys_addr_t align = PAGE_SIZE << pageblock_order;
phys_addr_t mask = align - 1;
unsigned long node = rmem->fdt_node;
bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
diff --git a/mm/cma.c b/mm/cma.c
index bc9ca8f3c487..d171158bd418 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -180,8 +180,7 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
return -EINVAL;

/* ensure minimal alignment required by mm core */
- alignment = PAGE_SIZE <<
- max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
+ alignment = PAGE_SIZE << pageblock_order;

/* alignment should be aligned with order_per_bit */
if (!IS_ALIGNED(alignment >> PAGE_SHIFT, 1 << order_per_bit))
@@ -268,8 +267,7 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
* migratetype page by page allocator's buddy algorithm. In the case,
* you couldn't get a contiguous memory, which is not what we want.
*/
- alignment = max(alignment, (phys_addr_t)PAGE_SIZE <<
- max_t(unsigned long, MAX_ORDER - 1, pageblock_order));
+ alignment = max(alignment, (phys_addr_t)PAGE_SIZE << pageblock_order);
if (fixed && base & (alignment - 1)) {
ret = -EINVAL;
pr_err("Region at %pa must be aligned to %pa bytes\n",
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6ed506234efa..a8ced1a00ce8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9008,8 +9008,8 @@ static inline void split_free_page_into_pageblocks(struct page *free_page,
* be either of the two.
* @gfp_mask: GFP mask to use during compaction
*
- * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
- * aligned. The PFN range must belong to a single zone.
+ * The PFN range does not have to be pageblock aligned. The PFN range must
+ * belong to a single zone.
*
* The first thing this routine does is attempt to MIGRATE_ISOLATE all
* pageblocks in the range. Once isolated, the pageblocks should not
@@ -9125,7 +9125,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
ret = 0;

/*
- * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+ * Pages from [start, end) are within a pageblock_nr_pages
* aligned blocks that are marked as MIGRATE_ISOLATE. What's
* more, all pages in [start, end) are free in page allocator.
* What we are going to do is to allocate all pages from
--
2.34.1

2022-01-21 19:59:15

by Zi Yan

Subject: [PATCH v4 7/7] arch: powerpc: adjust fadump alignment to be pageblock aligned.

From: Zi Yan <[email protected]>

CMA only requires pageblock alignment now. Change CMA alignment in
fadump too.

Signed-off-by: Zi Yan <[email protected]>
---
arch/powerpc/include/asm/fadump-internal.h | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
index 52189928ec08..fbfca85b4200 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -20,9 +20,7 @@
#define memblock_num_regions(memblock_type) (memblock.memblock_type.cnt)

/* Alignment per CMA requirement. */
-#define FADUMP_CMA_ALIGNMENT (PAGE_SIZE << \
- max_t(unsigned long, MAX_ORDER - 1, \
- pageblock_order))
+#define FADUMP_CMA_ALIGNMENT (PAGE_SIZE << pageblock_order)

/* FAD commands */
#define FADUMP_REGISTER 1
--
2.34.1

2022-01-25 09:00:07

by Oscar Salvador

Subject: Re: [PATCH v4 2/7] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c

On 2022-01-19 20:06, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> has_unmovable_pages() is only used in mm/page_isolation.c. Move it from
> mm/page_alloc.c and make it static.
>
> Signed-off-by: Zi Yan <[email protected]>

Reviewed-by: Oscar Salvador <[email protected]>

--
Oscar Salvador
SUSE Labs