From: Zi Yan <[email protected]>
Hi David,
This patchset tries to remove the MAX_ORDER-1 alignment requirement for CMA
and alloc_contig_range(). It prepares for my upcoming changes to make
MAX_ORDER adjustable at boot time[1]. It is on top of mmotm-2022-03-16-17-42.
Changelog
===
V8
---
1. Cleaned up has_unmovable_pages() to remove page argument.
V7
---
1. Added page validity check in isolate_single_pageblock() to avoid out
of zone pages.
2. Fixed a bug in split_free_page() to split and free pages in correct
page order.
V6
---
1. Resolved compilation error/warning reported by kernel test robot.
2. Tried to solve the coding concerns from Christophe Leroy.
3. Shortened lengthy lines (pointed out by Christoph Hellwig).
V5
---
1. Moved isolation address alignment handling into start_isolate_page_range().
2. Rewrote and simplified how alloc_contig_range() works at pageblock
granularity (Patch 3). Only two pageblock migratetypes need to be saved and
restored. start_isolate_page_range() might need to migrate pages in this
version, but it prevents the caller from worrying about
max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) alignment after the page range
is isolated.
V4
---
1. Dropped two irrelevant patches on non-lru compound page handling, as
it is not supported upstream.
2. Renamed migratetype_has_fallback() to migratetype_is_mergeable().
3. Always check whether two pageblocks can be merged in
__free_one_page() when the order is >= pageblock_order, as the
non-mergeable cases (isolated, CMA, and HIGHATOMIC pageblocks) become more common.
4. Moving has_unmovable_pages() is now a separate patch.
5. Removed the MAX_ORDER-1 alignment requirement in the comment in the virtio_mem code.
Description
===
The MAX_ORDER - 1 alignment requirement comes from the fact that
alloc_contig_range() isolates pageblocks to remove free memory from the buddy
allocator, but isolating only a subset of the pageblocks within a free page
spanning multiple pageblocks causes free page accounting issues: an isolated
page might not be put on the right free list, since the code takes the
migratetype of the first pageblock as the migratetype of the whole free page.
This is based on the discussion at [2].
To remove the requirement, this patchset:
1. isolates pages at pageblock granularity instead of
max(MAX_ORDER_NR_PAGES, pageblock_nr_pages);
2. splits free pages across the specified range or migrates in-use pages
across the specified range then splits the freed page to avoid free page
accounting issues (it happens when multiple pageblocks within a single page
have different migratetypes);
3. only checks unmovable pages within the range instead of the MAX_ORDER - 1
aligned range during isolation, to avoid alloc_contig_range() failure when
pageblocks within a MAX_ORDER - 1 aligned range are allocated separately;
4. returns pages not in the range, as it did before.
One optimization might come later:
1. make MIGRATE_ISOLATE a separate bit to be able to restore the original
migratetypes when isolation fails in the middle of the range.
Feel free to give comments and suggestions. Thanks.
[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/
Zi Yan (5):
mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
mm: page_isolation: check specified range for unmovable pages
mm: make alloc_contig_range work at pageblock granularity
mm: cma: use pageblock_order as the single alignment
drivers: virtio_mem: use pageblock size as the minimum virtio_mem
size.
drivers/virtio/virtio_mem.c | 6 +-
include/linux/cma.h | 4 +-
include/linux/mmzone.h | 5 +-
include/linux/page-isolation.h | 14 +-
mm/internal.h | 6 +
mm/memory_hotplug.c | 3 +-
mm/page_alloc.c | 246 +++++++--------------------
mm/page_isolation.c | 296 +++++++++++++++++++++++++++++++--
8 files changed, 367 insertions(+), 213 deletions(-)
--
2.35.1
From: Zi Yan <[email protected]>
Enable set_migratetype_isolate() to check the specified sub-range for
unmovable pages during isolation. Page isolation is done at
max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
pages within that granularity are intended to be isolated. For example,
alloc_contig_range(), which uses page isolation, allows ranges without
alignment. This commit makes the unmovable page check look only at the
pages of interest, so that page isolation can succeed for any
non-overlapping ranges.
Signed-off-by: Zi Yan <[email protected]>
---
include/linux/page-isolation.h | 10 +++++
mm/page_alloc.c | 13 +------
mm/page_isolation.c | 69 ++++++++++++++++++++--------------
3 files changed, 51 insertions(+), 41 deletions(-)
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index e14eddf6741a..eb4a208fe907 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -15,6 +15,16 @@ static inline bool is_migrate_isolate(int migratetype)
{
return migratetype == MIGRATE_ISOLATE;
}
+static inline unsigned long pfn_max_align_down(unsigned long pfn)
+{
+ return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
+}
+
+static inline unsigned long pfn_max_align_up(unsigned long pfn)
+{
+ return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+}
+
#else
static inline bool has_isolate_pageblock(struct zone *zone)
{
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6de57d058d3d..680580a40a35 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8937,16 +8937,6 @@ void *__init alloc_large_system_hash(const char *tablename,
}
#ifdef CONFIG_CONTIG_ALLOC
-static unsigned long pfn_max_align_down(unsigned long pfn)
-{
- return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
-}
-
-static unsigned long pfn_max_align_up(unsigned long pfn)
-{
- return ALIGN(pfn, MAX_ORDER_NR_PAGES);
-}
-
#if defined(CONFIG_DYNAMIC_DEBUG) || \
(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
/* Usage: See admin-guide/dynamic-debug-howto.rst */
@@ -9091,8 +9081,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
* put back to page allocator so that buddy can use them.
*/
- ret = start_isolate_page_range(pfn_max_align_down(start),
- pfn_max_align_up(end), migratetype, 0);
+ ret = start_isolate_page_range(start, end, migratetype, 0);
if (ret)
return ret;
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index b34f1310aeaa..419c805dbdcd 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -16,7 +16,8 @@
#include <trace/events/page_isolation.h>
/*
- * This function checks whether pageblock includes unmovable pages or not.
+ * This function checks whether pageblock within [start_pfn, end_pfn) includes
+ * unmovable pages or not.
*
* PageLRU check without isolation or lru_lock could race so that
* MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
@@ -28,27 +29,26 @@
* cannot get removed (e.g., via memory unplug) concurrently.
*
*/
-static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
- int migratetype, int flags)
+static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
+ int migratetype, int flags)
{
- unsigned long iter = 0;
- unsigned long pfn = page_to_pfn(page);
- unsigned long offset = pfn % pageblock_nr_pages;
+ unsigned long pfn = start_pfn;
- if (is_migrate_cma_page(page)) {
- /*
- * CMA allocations (alloc_contig_range) really need to mark
- * isolate CMA pageblocks even when they are not movable in fact
- * so consider them movable here.
- */
- if (is_migrate_cma(migratetype))
- return NULL;
+ for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+ struct page *page = pfn_to_page(pfn);
+ struct zone *zone = page_zone(page);
- return page;
- }
+ if (is_migrate_cma_page(page)) {
+ /*
+ * CMA allocations (alloc_contig_range) really need to mark
+ * isolate CMA pageblocks even when they are not movable in fact
+ * so consider them movable here.
+ */
+ if (is_migrate_cma(migratetype))
+ return NULL;
- for (; iter < pageblock_nr_pages - offset; iter++) {
- page = pfn_to_page(pfn + iter);
+ return page;
+ }
/*
* Both, bootmem allocations and memory holes are marked
@@ -85,7 +85,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
}
skip_pages = compound_nr(head) - (page - head);
- iter += skip_pages - 1;
+ pfn += skip_pages - 1;
continue;
}
@@ -97,7 +97,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
*/
if (!page_ref_count(page)) {
if (PageBuddy(page))
- iter += (1 << buddy_order(page)) - 1;
+ pfn += (1 << buddy_order(page)) - 1;
continue;
}
@@ -134,7 +134,13 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
return NULL;
}
-static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
+/*
+ * This function sets the pageblock migratetype to MIGRATE_ISOLATE if no
+ * unmovable page is present in [start_pfn, end_pfn). The pageblock must
+ * intersect with [start_pfn, end_pfn).
+ */
+static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
+ unsigned long start_pfn, unsigned long end_pfn)
{
struct zone *zone = page_zone(page);
struct page *unmovable;
@@ -155,8 +161,13 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
/*
* FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
* We just check MOVABLE pages.
+ *
+ * Pass the intersection of [start_pfn, end_pfn) and the page's pageblock
+ * to avoid redundant checks.
*/
- unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
+ unmovable = has_unmovable_pages(max(page_to_pfn(page), start_pfn),
+ min(ALIGN(page_to_pfn(page) + 1, pageblock_nr_pages), end_pfn),
+ migratetype, isol_flags);
if (!unmovable) {
unsigned long nr_pages;
int mt = get_pageblock_migratetype(page);
@@ -267,7 +278,6 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
* be MIGRATE_ISOLATE.
* @start_pfn: The lower PFN of the range to be isolated.
* @end_pfn: The upper PFN of the range to be isolated.
- * start_pfn/end_pfn must be aligned to pageblock_order.
* @migratetype: Migrate type to set in error recovery.
* @flags: The following flags are allowed (they can be combined in
* a bit mask)
@@ -309,15 +319,16 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
unsigned long pfn;
struct page *page;
- BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
- BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
+ unsigned long isolate_start = pfn_max_align_down(start_pfn);
+ unsigned long isolate_end = pfn_max_align_up(end_pfn);
- for (pfn = start_pfn;
- pfn < end_pfn;
+ for (pfn = isolate_start;
+ pfn < isolate_end;
pfn += pageblock_nr_pages) {
page = __first_valid_page(pfn, pageblock_nr_pages);
- if (page && set_migratetype_isolate(page, migratetype, flags)) {
- undo_isolate_page_range(start_pfn, pfn, migratetype);
+ if (page && set_migratetype_isolate(page, migratetype, flags,
+ start_pfn, end_pfn)) {
+ undo_isolate_page_range(isolate_start, pfn, migratetype);
return -EBUSY;
}
}
--
2.35.1
From: Zi Yan <[email protected]>
alloc_contig_range() now only needs to be aligned to pageblock_order, so
drop the virtio_mem size requirement that the minimum subblock size be
the max of the pageblock_order and MAX_ORDER sizes.
Signed-off-by: Zi Yan <[email protected]>
---
drivers/virtio/virtio_mem.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index e7d6b679596d..e07486f01999 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2476,10 +2476,10 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
/*
- * TODO: once alloc_contig_range() works reliably with pageblock
- * granularity on ZONE_NORMAL, use pageblock_nr_pages instead.
+ * alloc_contig_range() works reliably with pageblock
+ * granularity on ZONE_NORMAL, use pageblock_nr_pages.
*/
- sb_size = PAGE_SIZE * MAX_ORDER_NR_PAGES;
+ sb_size = PAGE_SIZE * pageblock_nr_pages;
sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
if (sb_size < memory_block_size_bytes() && !force_bbm) {
--
2.35.1
On 21 Mar 2022, at 13:30, David Hildenbrand wrote:
> On 17.03.22 16:37, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> Enable set_migratetype_isolate() to check specified sub-range for
>> unmovable pages during isolation. Page isolation is done
>> at max(MAX_ORDER_NR_PAEGS, pageblock_nr_pages) granularity, but not all
>> pages within that granularity are intended to be isolated. For example,
>> alloc_contig_range(), which uses page isolation, allows ranges without
>> alignment. This commit makes unmovable page check only look for
>> interesting pages, so that page isolation can succeed for any
>> non-overlapping ranges.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> include/linux/page-isolation.h | 10 +++++
>> mm/page_alloc.c | 13 +------
>> mm/page_isolation.c | 69 ++++++++++++++++++++--------------
>> 3 files changed, 51 insertions(+), 41 deletions(-)
>>
>> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
>> index e14eddf6741a..eb4a208fe907 100644
>> --- a/include/linux/page-isolation.h
>> +++ b/include/linux/page-isolation.h
>> @@ -15,6 +15,16 @@ static inline bool is_migrate_isolate(int migratetype)
>> {
>> return migratetype == MIGRATE_ISOLATE;
>> }
>> +static inline unsigned long pfn_max_align_down(unsigned long pfn)
>> +{
>> + return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
>> +}
>> +
>> +static inline unsigned long pfn_max_align_up(unsigned long pfn)
>> +{
>> + return ALIGN(pfn, MAX_ORDER_NR_PAGES);
>> +}
>> +
>> #else
>> static inline bool has_isolate_pageblock(struct zone *zone)
>> {
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 6de57d058d3d..680580a40a35 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -8937,16 +8937,6 @@ void *__init alloc_large_system_hash(const char *tablename,
>> }
>>
>> #ifdef CONFIG_CONTIG_ALLOC
>> -static unsigned long pfn_max_align_down(unsigned long pfn)
>> -{
>> - return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
>> -}
>> -
>> -static unsigned long pfn_max_align_up(unsigned long pfn)
>> -{
>> - return ALIGN(pfn, MAX_ORDER_NR_PAGES);
>> -}
>> -
>> #if defined(CONFIG_DYNAMIC_DEBUG) || \
>> (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
>> /* Usage: See admin-guide/dynamic-debug-howto.rst */
>> @@ -9091,8 +9081,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>> * put back to page allocator so that buddy can use them.
>> */
>>
>> - ret = start_isolate_page_range(pfn_max_align_down(start),
>> - pfn_max_align_up(end), migratetype, 0);
>> + ret = start_isolate_page_range(start, end, migratetype, 0);
>> if (ret)
>> return ret;
>
> Shouldn't we similarly adjust undo_isolate_page_range()? IOW, all users
> of pfn_max_align_down()/pfn_max_align_up(). would be gone from that file
> and you can move these defines into mm/page_isolation.c instead of
> include/linux/page-isolation.h?
undo_isolate_page_range() faces a much simpler situation: it just needs
to unset the migratetype, so we can pass it a pageblock_nr_pages-aligned
range. For start_isolate_page_range(), start and end are also used by
has_unmovable_pages() for precise unmovable page identification, so
they cannot be pageblock_nr_pages aligned. But for readability and symmetry,
yes, I can change undo_isolate_page_range() too.
>
> Maybe perform this change in a separate patch for
> start_isolate_page_range() and undo_isolate_page_range() ?
The change is trivial enough to be folded into this one.
>
>>
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index b34f1310aeaa..419c805dbdcd 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -16,7 +16,8 @@
>> #include <trace/events/page_isolation.h>
>>
>> /*
>> - * This function checks whether pageblock includes unmovable pages or not.
>> + * This function checks whether pageblock within [start_pfn, end_pfn) includes
>> + * unmovable pages or not.
>
> I think we still want to limit that to a single pageblock (see below),
> as we're going to isolate individual pageblocks. Then an updated
> description could be:
>
> "This function checks whether the range [start_pfn, end_pfn) includes
> unmovable pages or not. The range must fall into a single pageblock and
> consequently belong to a single zone."
>
Sure.
>> *
>> * PageLRU check without isolation or lru_lock could race so that
>> * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
>> @@ -28,27 +29,26 @@
>> * cannot get removed (e.g., via memory unplug) concurrently.
>> *
>> */
>> -static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>> - int migratetype, int flags)
>> +static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
>> + int migratetype, int flags)
>> {
>> - unsigned long iter = 0;
>> - unsigned long pfn = page_to_pfn(page);
>> - unsigned long offset = pfn % pageblock_nr_pages;
>> + unsigned long pfn = start_pfn;
>>
>> - if (is_migrate_cma_page(page)) {
>> - /*
>> - * CMA allocations (alloc_contig_range) really need to mark
>> - * isolate CMA pageblocks even when they are not movable in fact
>> - * so consider them movable here.
>> - */
>> - if (is_migrate_cma(migratetype))
>> - return NULL;
>
> If we're really dealing with a range that falls into a single pageblock,
> then you can leave the is_migrate_cma_page() in place and also lookup
> the zone only once. That should speed up things and minimize the
> required changes.
>
> You can then further add VM_BUG_ON()s that make sure that start_pfn and
> end_pfn-1 belong to a single pageblock.
Sure.
>
>> + for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>> + struct page *page = pfn_to_page(pfn);
>> + struct zone *zone = page_zone(page);
>>
>> - return page;
>> - }
>> + if (is_migrate_cma_page(page)) {
>> + /*
>> + * CMA allocations (alloc_contig_range) really need to mark
>> + * isolate CMA pageblocks even when they are not movable in fact
>> + * so consider them movable here.
>> + */
>> + if (is_migrate_cma(migratetype))
>> + return NULL;
>>
>> - for (; iter < pageblock_nr_pages - offset; iter++) {
>> - page = pfn_to_page(pfn + iter);
>> + return page;
>> + }
>>
>> /*
>> * Both, bootmem allocations and memory holes are marked
>> @@ -85,7 +85,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>> }
>>
>> skip_pages = compound_nr(head) - (page - head);
>> - iter += skip_pages - 1;
>> + pfn += skip_pages - 1;
>> continue;
>> }
>>
>> @@ -97,7 +97,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>> */
>> if (!page_ref_count(page)) {
>> if (PageBuddy(page))
>> - iter += (1 << buddy_order(page)) - 1;
>> + pfn += (1 << buddy_order(page)) - 1;
>> continue;
>> }
>>
>> @@ -134,7 +134,13 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>> return NULL;
>> }
>>
>> -static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
>> +/*
>> + * This function set pageblock migratetype to isolate if no unmovable page is
>> + * present in [start_pfn, end_pfn). The pageblock must intersect with
>> + * [start_pfn, end_pfn).
>> + */
>> +static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
>> + unsigned long start_pfn, unsigned long end_pfn)
>> {
>> struct zone *zone = page_zone(page);
>> struct page *unmovable;
>> @@ -155,8 +161,13 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>> /*
>> * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
>> * We just check MOVABLE pages.
>> + *
>> + * Pass the intersection of [start_pfn, end_pfn) and the page's pageblock
>> + * to avoid redundant checks.
>> */
>
> I think I'd prefer some helper variables for readability.
Will do.
>
>> - unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
>> + unmovable = has_unmovable_pages(max(page_to_pfn(page), start_pfn),
>> + min(ALIGN(page_to_pfn(page) + 1, pageblock_nr_pages), end_pfn),
>> + migratetype, isol_flags);
>> if (!unmovable) {
>> unsigned long nr_pages;
>> int mt = get_pageblock_migratetype(page);
>> @@ -267,7 +278,6 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>> * be MIGRATE_ISOLATE.
>> * @start_pfn: The lower PFN of the range to be isolated.
>> * @end_pfn: The upper PFN of the range to be isolated.
>> - * start_pfn/end_pfn must be aligned to pageblock_order.
>> * @migratetype: Migrate type to set in error recovery.
>> * @flags: The following flags are allowed (they can be combined in
>> * a bit mask)
>> @@ -309,15 +319,16 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>> unsigned long pfn;
>> struct page *page;
>>
>> - BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
>> - BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
>> + unsigned long isolate_start = pfn_max_align_down(start_pfn);
>> + unsigned long isolate_end = pfn_max_align_up(end_pfn);
>>
>> - for (pfn = start_pfn;
>> - pfn < end_pfn;
>> + for (pfn = isolate_start;
>> + pfn < isolate_end;
>> pfn += pageblock_nr_pages) {
>> page = __first_valid_page(pfn, pageblock_nr_pages);
>> - if (page && set_migratetype_isolate(page, migratetype, flags)) {
>> - undo_isolate_page_range(start_pfn, pfn, migratetype);
>> + if (page && set_migratetype_isolate(page, migratetype, flags,
>> + start_pfn, end_pfn)) {
>> + undo_isolate_page_range(isolate_start, pfn, migratetype);
>> return -EBUSY;
>> }
>> }
>
>
> --
> Thanks,
>
> David / dhildenb
Thanks for your review.
--
Best Regards,
Yan, Zi
On 17.03.22 16:37, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> Enable set_migratetype_isolate() to check specified sub-range for
> unmovable pages during isolation. Page isolation is done
> at max(MAX_ORDER_NR_PAEGS, pageblock_nr_pages) granularity, but not all
> pages within that granularity are intended to be isolated. For example,
> alloc_contig_range(), which uses page isolation, allows ranges without
> alignment. This commit makes unmovable page check only look for
> interesting pages, so that page isolation can succeed for any
> non-overlapping ranges.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> include/linux/page-isolation.h | 10 +++++
> mm/page_alloc.c | 13 +------
> mm/page_isolation.c | 69 ++++++++++++++++++++--------------
> 3 files changed, 51 insertions(+), 41 deletions(-)
>
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index e14eddf6741a..eb4a208fe907 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -15,6 +15,16 @@ static inline bool is_migrate_isolate(int migratetype)
> {
> return migratetype == MIGRATE_ISOLATE;
> }
> +static inline unsigned long pfn_max_align_down(unsigned long pfn)
> +{
> + return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
> +}
> +
> +static inline unsigned long pfn_max_align_up(unsigned long pfn)
> +{
> + return ALIGN(pfn, MAX_ORDER_NR_PAGES);
> +}
> +
> #else
> static inline bool has_isolate_pageblock(struct zone *zone)
> {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6de57d058d3d..680580a40a35 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8937,16 +8937,6 @@ void *__init alloc_large_system_hash(const char *tablename,
> }
>
> #ifdef CONFIG_CONTIG_ALLOC
> -static unsigned long pfn_max_align_down(unsigned long pfn)
> -{
> - return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
> -}
> -
> -static unsigned long pfn_max_align_up(unsigned long pfn)
> -{
> - return ALIGN(pfn, MAX_ORDER_NR_PAGES);
> -}
> -
> #if defined(CONFIG_DYNAMIC_DEBUG) || \
> (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
> /* Usage: See admin-guide/dynamic-debug-howto.rst */
> @@ -9091,8 +9081,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> * put back to page allocator so that buddy can use them.
> */
>
> - ret = start_isolate_page_range(pfn_max_align_down(start),
> - pfn_max_align_up(end), migratetype, 0);
> + ret = start_isolate_page_range(start, end, migratetype, 0);
> if (ret)
> return ret;
Shouldn't we similarly adjust undo_isolate_page_range()? IOW, all users
of pfn_max_align_down()/pfn_max_align_up(). would be gone from that file
and you can move these defines into mm/page_isolation.c instead of
include/linux/page-isolation.h?
Maybe perform this change in a separate patch for
start_isolate_page_range() and undo_isolate_page_range() ?
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index b34f1310aeaa..419c805dbdcd 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -16,7 +16,8 @@
> #include <trace/events/page_isolation.h>
>
> /*
> - * This function checks whether pageblock includes unmovable pages or not.
> + * This function checks whether pageblock within [start_pfn, end_pfn) includes
> + * unmovable pages or not.
I think we still want to limit that to a single pageblock (see below),
as we're going to isolate individual pageblocks. Then an updated
description could be:
"This function checks whether the range [start_pfn, end_pfn) includes
unmovable pages or not. The range must fall into a single pageblock and
consequently belong to a single zone."
> *
> * PageLRU check without isolation or lru_lock could race so that
> * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
> @@ -28,27 +29,26 @@
> * cannot get removed (e.g., via memory unplug) concurrently.
> *
> */
> -static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> - int migratetype, int flags)
> +static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
> + int migratetype, int flags)
> {
> - unsigned long iter = 0;
> - unsigned long pfn = page_to_pfn(page);
> - unsigned long offset = pfn % pageblock_nr_pages;
> + unsigned long pfn = start_pfn;
>
> - if (is_migrate_cma_page(page)) {
> - /*
> - * CMA allocations (alloc_contig_range) really need to mark
> - * isolate CMA pageblocks even when they are not movable in fact
> - * so consider them movable here.
> - */
> - if (is_migrate_cma(migratetype))
> - return NULL;
If we're really dealing with a range that falls into a single pageblock,
then you can leave the is_migrate_cma_page() in place and also lookup
the zone only once. That should speed up things and minimize the
required changes.
You can then further add VM_BUG_ON()s that make sure that start_pfn and
end_pfn-1 belong to a single pageblock.
> + for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> + struct page *page = pfn_to_page(pfn);
> + struct zone *zone = page_zone(page);
>
> - return page;
> - }
> + if (is_migrate_cma_page(page)) {
> + /*
> + * CMA allocations (alloc_contig_range) really need to mark
> + * isolate CMA pageblocks even when they are not movable in fact
> + * so consider them movable here.
> + */
> + if (is_migrate_cma(migratetype))
> + return NULL;
>
> - for (; iter < pageblock_nr_pages - offset; iter++) {
> - page = pfn_to_page(pfn + iter);
> + return page;
> + }
>
> /*
> * Both, bootmem allocations and memory holes are marked
> @@ -85,7 +85,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> }
>
> skip_pages = compound_nr(head) - (page - head);
> - iter += skip_pages - 1;
> + pfn += skip_pages - 1;
> continue;
> }
>
> @@ -97,7 +97,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> */
> if (!page_ref_count(page)) {
> if (PageBuddy(page))
> - iter += (1 << buddy_order(page)) - 1;
> + pfn += (1 << buddy_order(page)) - 1;
> continue;
> }
>
> @@ -134,7 +134,13 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> return NULL;
> }
>
> -static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
> +/*
> + * This function set pageblock migratetype to isolate if no unmovable page is
> + * present in [start_pfn, end_pfn). The pageblock must intersect with
> + * [start_pfn, end_pfn).
> + */
> +static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
> + unsigned long start_pfn, unsigned long end_pfn)
> {
> struct zone *zone = page_zone(page);
> struct page *unmovable;
> @@ -155,8 +161,13 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
> /*
> * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
> * We just check MOVABLE pages.
> + *
> + * Pass the intersection of [start_pfn, end_pfn) and the page's pageblock
> + * to avoid redundant checks.
> */
I think I'd prefer some helper variables for readability.
> - unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
> + unmovable = has_unmovable_pages(max(page_to_pfn(page), start_pfn),
> + min(ALIGN(page_to_pfn(page) + 1, pageblock_nr_pages), end_pfn),
> + migratetype, isol_flags);
> if (!unmovable) {
> unsigned long nr_pages;
> int mt = get_pageblock_migratetype(page);
> @@ -267,7 +278,6 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
> * be MIGRATE_ISOLATE.
> * @start_pfn: The lower PFN of the range to be isolated.
> * @end_pfn: The upper PFN of the range to be isolated.
> - * start_pfn/end_pfn must be aligned to pageblock_order.
> * @migratetype: Migrate type to set in error recovery.
> * @flags: The following flags are allowed (they can be combined in
> * a bit mask)
> @@ -309,15 +319,16 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> unsigned long pfn;
> struct page *page;
>
> - BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
> - BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
> + unsigned long isolate_start = pfn_max_align_down(start_pfn);
> + unsigned long isolate_end = pfn_max_align_up(end_pfn);
>
> - for (pfn = start_pfn;
> - pfn < end_pfn;
> + for (pfn = isolate_start;
> + pfn < isolate_end;
> pfn += pageblock_nr_pages) {
> page = __first_valid_page(pfn, pageblock_nr_pages);
> - if (page && set_migratetype_isolate(page, migratetype, flags)) {
> - undo_isolate_page_range(start_pfn, pfn, migratetype);
> + if (page && set_migratetype_isolate(page, migratetype, flags,
> + start_pfn, end_pfn)) {
> + undo_isolate_page_range(isolate_start, pfn, migratetype);
> return -EBUSY;
> }
> }
--
Thanks,
David / dhildenb
On 21.03.22 19:23, Zi Yan wrote:
> On 21 Mar 2022, at 13:30, David Hildenbrand wrote:
>
>> On 17.03.22 16:37, Zi Yan wrote:
>>> From: Zi Yan <[email protected]>
>>>
>>> Enable set_migratetype_isolate() to check specified sub-range for
>>> unmovable pages during isolation. Page isolation is done
>>> at max(MAX_ORDER_NR_PAEGS, pageblock_nr_pages) granularity, but not all
>>> pages within that granularity are intended to be isolated. For example,
>>> alloc_contig_range(), which uses page isolation, allows ranges without
>>> alignment. This commit makes unmovable page check only look for
>>> interesting pages, so that page isolation can succeed for any
>>> non-overlapping ranges.
>>>
>>> Signed-off-by: Zi Yan <[email protected]>
>>> ---
>>> include/linux/page-isolation.h | 10 +++++
>>> mm/page_alloc.c | 13 +------
>>> mm/page_isolation.c | 69 ++++++++++++++++++++--------------
>>> 3 files changed, 51 insertions(+), 41 deletions(-)
>>>
>>> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
>>> index e14eddf6741a..eb4a208fe907 100644
>>> --- a/include/linux/page-isolation.h
>>> +++ b/include/linux/page-isolation.h
>>> @@ -15,6 +15,16 @@ static inline bool is_migrate_isolate(int migratetype)
>>> {
>>> return migratetype == MIGRATE_ISOLATE;
>>> }
>>> +static inline unsigned long pfn_max_align_down(unsigned long pfn)
>>> +{
>>> + return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
>>> +}
>>> +
>>> +static inline unsigned long pfn_max_align_up(unsigned long pfn)
>>> +{
>>> + return ALIGN(pfn, MAX_ORDER_NR_PAGES);
>>> +}
>>> +
>>> #else
>>> static inline bool has_isolate_pageblock(struct zone *zone)
>>> {
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 6de57d058d3d..680580a40a35 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -8937,16 +8937,6 @@ void *__init alloc_large_system_hash(const char *tablename,
>>> }
>>>
>>> #ifdef CONFIG_CONTIG_ALLOC
>>> -static unsigned long pfn_max_align_down(unsigned long pfn)
>>> -{
>>> - return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
>>> -}
>>> -
>>> -static unsigned long pfn_max_align_up(unsigned long pfn)
>>> -{
>>> - return ALIGN(pfn, MAX_ORDER_NR_PAGES);
>>> -}
>>> -
>>> #if defined(CONFIG_DYNAMIC_DEBUG) || \
>>> (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
>>> /* Usage: See admin-guide/dynamic-debug-howto.rst */
>>> @@ -9091,8 +9081,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>>> * put back to page allocator so that buddy can use them.
>>> */
>>>
>>> - ret = start_isolate_page_range(pfn_max_align_down(start),
>>> - pfn_max_align_up(end), migratetype, 0);
>>> + ret = start_isolate_page_range(start, end, migratetype, 0);
>>> if (ret)
>>> return ret;
>>
>> Shouldn't we similarly adjust undo_isolate_page_range()? IOW, all users
>> of pfn_max_align_down()/pfn_max_align_up(). would be gone from that file
>> and you can move these defines into mm/page_isolation.c instead of
>> include/linux/page-isolation.h?
>
> undo_isolate_page_range() faces a much simpler situation, just needing
> to unset the migratetype. We can just pass a pageblock_nr_pages-aligned
> range to it. For start_isolate_page_range(), start and end are also used
> by has_unmovable_pages() for precise unmovable page identification, so
> they cannot be pageblock_nr_pages aligned. But for readability and symmetry,
> yes, I can change undo_isolate_page_range() too.
Yeah, we should call both with the same range and any extension of the
range should be handled internally.
I thought about some corner cases, especially once we relax some (CMA)
alignment thingies -- then, the CMA area might be placed at weird
locations. I haven't checked to which degree they apply, but we should
certainly keep them in mind whenever we're extending the isolation range.
We can assume that the contig range we're allocating
a) Belongs to a single zone
b) Does not contain any memory holes / mmap holes
Let's double-check:
1) Different zones in extended range
... ZONE A ][ ZONE B ....
[ Pageblock X ][ Pageblock Y ][ Pageblock Z ]
[ MAX_ORDER - 1 ]
We can never create a higher-order page between X and Y, because they
are in different zones. Most probably we should *not* extend the range
to cover pageblock X in case the zones don't match.
The same consideration applies to the end of the range, when extending
the isolation range.
But I wonder if we can have such a zone layout. At least
mm/page_alloc.c:find_zone_movable_pfns_for_nodes() makes sure to always
align the start of ZONE_MOVABLE to MAX_ORDER_NR_PAGES. I hope it applies
to all other zones as well? :/
Anyhow, it should be easy to check when isolating/un-isolating. Only
conditionally extend the range if the zones of both pageblocks match.
When eventually growing MAX_ORDER_NR_PAGES further, could we be in
trouble because we could suddenly span multiple zones with a single
MAX_ORDER - 1 page? Then we'd have to handle that I guess.
2) mmap holes
I think that's already covered by the existing __first_valid_page()
handling.
So, I feel like we might have to tackle the zones issue, especially when
extending MAX_ORDER_NR_PAGES?
--
Thanks,
David / dhildenb
On 22 Mar 2022, at 12:42, David Hildenbrand wrote:
> On 21.03.22 19:23, Zi Yan wrote:
>> On 21 Mar 2022, at 13:30, David Hildenbrand wrote:
>>
>>> On 17.03.22 16:37, Zi Yan wrote:
>>>> From: Zi Yan <[email protected]>
>>>>
>>>> Enable set_migratetype_isolate() to check specified sub-range for
>>>> unmovable pages during isolation. Page isolation is done
>>>> at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
>>>> pages within that granularity are intended to be isolated. For example,
>>>> alloc_contig_range(), which uses page isolation, allows ranges without
>>>> alignment. This commit makes unmovable page check only look for
>>>> interesting pages, so that page isolation can succeed for any
>>>> non-overlapping ranges.
>>>>
>>>> Signed-off-by: Zi Yan <[email protected]>
>>>> ---
>>>> include/linux/page-isolation.h | 10 +++++
>>>> mm/page_alloc.c | 13 +------
>>>> mm/page_isolation.c | 69 ++++++++++++++++++++--------------
>>>> 3 files changed, 51 insertions(+), 41 deletions(-)
>>>>
>>>> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
>>>> index e14eddf6741a..eb4a208fe907 100644
>>>> --- a/include/linux/page-isolation.h
>>>> +++ b/include/linux/page-isolation.h
>>>> @@ -15,6 +15,16 @@ static inline bool is_migrate_isolate(int migratetype)
>>>> {
>>>> return migratetype == MIGRATE_ISOLATE;
>>>> }
>>>> +static inline unsigned long pfn_max_align_down(unsigned long pfn)
>>>> +{
>>>> + return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
>>>> +}
>>>> +
>>>> +static inline unsigned long pfn_max_align_up(unsigned long pfn)
>>>> +{
>>>> + return ALIGN(pfn, MAX_ORDER_NR_PAGES);
>>>> +}
>>>> +
>>>> #else
>>>> static inline bool has_isolate_pageblock(struct zone *zone)
>>>> {
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 6de57d058d3d..680580a40a35 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -8937,16 +8937,6 @@ void *__init alloc_large_system_hash(const char *tablename,
>>>> }
>>>>
>>>> #ifdef CONFIG_CONTIG_ALLOC
>>>> -static unsigned long pfn_max_align_down(unsigned long pfn)
>>>> -{
>>>> - return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
>>>> -}
>>>> -
>>>> -static unsigned long pfn_max_align_up(unsigned long pfn)
>>>> -{
>>>> - return ALIGN(pfn, MAX_ORDER_NR_PAGES);
>>>> -}
>>>> -
>>>> #if defined(CONFIG_DYNAMIC_DEBUG) || \
>>>> (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
>>>> /* Usage: See admin-guide/dynamic-debug-howto.rst */
>>>> @@ -9091,8 +9081,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>>>> * put back to page allocator so that buddy can use them.
>>>> */
>>>>
>>>> - ret = start_isolate_page_range(pfn_max_align_down(start),
>>>> - pfn_max_align_up(end), migratetype, 0);
>>>> + ret = start_isolate_page_range(start, end, migratetype, 0);
>>>> if (ret)
>>>> return ret;
>>>
>>> Shouldn't we similarly adjust undo_isolate_page_range()? IOW, all users
>>> of pfn_max_align_down()/pfn_max_align_up(). would be gone from that file
>>> and you can move these defines into mm/page_isolation.c instead of
>>> include/linux/page-isolation.h?
>>
>> undo_isolate_page_range() faces a much simpler situation, just needing
>> to unset the migratetype. We can just pass a pageblock_nr_pages-aligned
>> range to it. For start_isolate_page_range(), start and end are also used
>> by has_unmovable_pages() for precise unmovable page identification, so
>> they cannot be pageblock_nr_pages aligned. But for readability and symmetry,
>> yes, I can change undo_isolate_page_range() too.
> Yeah, we should call both with the same range and any extension of the
> range should be handled internally.
>
> I thought about some corner cases, especially once we relax some (CMA)
> alignment thingies -- then, the CMA area might be placed at weird
> locations. I haven't checked to which degree they apply, but we should
> certainly keep them in mind whenever we're extending the isolation range.
>
> We can assume that the contig range we're allocating
> a) Belongs to a single zone
> b) Does not contain any memory holes / mmap holes
>
> Let's double-check:
>
>
> 1) Different zones in extended range
>
> ... ZONE A ][ ZONE B ....
> [ Pageblock X ][ Pageblock Y ][ Pageblock Z ]
> [ MAX_ORDER - 1 ]
>
> We can never create a higher-order page between X and Y, because they
> are in different zones. Most probably we should *not* extend the range
> to cover pageblock X in case the zones don't match.
>
> The same consideration applies to the end of the range, when extending
> the isolation range.
>
> But I wonder if we can have such a zone layout. At least
> mm/page_alloc.c:find_zone_movable_pfns_for_nodes() makes sure to always
> align the start of ZONE_MOVABLE to MAX_ORDER_NR_PAGES. I hope it applies
> to all other zones as well? :/
AFAICT, it does not. Physical page ranges are read from E820 on x86_64 and
put into memblocks, then added to zones. Zone ranges are not aligned
during pgdat initialization.
>
> Anyhow, it should be easy to check when isolating/un-isolating. Only
> conditionally extend the range if the zones of both pageblocks match.
>
>
> When eventually growing MAX_ORDER_NR_PAGES further, could we be in
> trouble because we could suddenly span multiple zones with a single
> MAX_ORDER - 1 page? Then we'd have to handle that I guess.
Yes. Good catch. I need to check whether the MAX_ORDER_NR_PAGES
aligned-down PFN is in the same zone as the isolation PFN. I will add
the check in the next version.
>
>
> 2) mmap holes
>
> I think that's already covered by the existing __first_valid_page()
> handling.
>
>
> So, I feel like we might have to tackle the zones issue, especially when
> extending MAX_ORDER_NR_PAGES?
Yes. Will add it in the next version.
Many thanks for pointing this out!
--
Best Regards,
Yan, Zi