2020-06-30 14:27:27

by David Hildenbrand

Subject: [PATCH v1 0/6] mm / virtio-mem: support ZONE_MOVABLE

Currently, virtio-mem does not really support ZONE_MOVABLE. While it
allows fully plugged memory blocks to be onlined to ZONE_MOVABLE, it does
not allow partially plugged memory blocks to be onlined to ZONE_MOVABLE
and will never consider such memory blocks when unplugging memory. This
might be surprising for users (especially if onlining suddenly fails).

Let's support partially plugged memory blocks in ZONE_MOVABLE, allowing
partially plugged memory blocks to be onlined to ZONE_MOVABLE and also
allowing memory to be unplugged from such memory blocks.

This is especially helpful for testing, but also paves the way for
virtio-mem optimizations, allowing more memory to get reliably unplugged.

Clean up has_unmovable_pages() and set_migratetype_isolate(), providing
better documentation of how ZONE_MOVABLE interacts with different kinds of
unmovable pages (memory offlining vs. alloc_contig_range()).

David Hildenbrand (6):
mm/page_alloc: tweak comments in has_unmovable_pages()
mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()
mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()
mm/page_isolation: cleanup set_migratetype_isolate()
mm/page_alloc: restrict ZONE_MOVABLE optimization in
has_unmovable_pages() to memory offlining
virtio-mem: don't special-case ZONE_MOVABLE

drivers/virtio/virtio_mem.c | 47 +++++++------------------------------
mm/page_alloc.c | 29 +++++++++--------------
mm/page_isolation.c | 40 ++++++++++++++-----------------
3 files changed, 36 insertions(+), 80 deletions(-)

--
2.26.2


2020-06-30 14:27:34

by David Hildenbrand

Subject: [PATCH v1 1/6] mm/page_alloc: tweak comments in has_unmovable_pages()

Let's move the split comment regarding bootmem allocations and memory
holes, especially in the context of ZONE_MOVABLE, to the PageReserved()
check.

Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/page_alloc.c | 22 ++++++----------------
1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 48eb0f1410d47..bd3ebf08f09b9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8207,14 +8207,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
unsigned long iter = 0;
unsigned long pfn = page_to_pfn(page);

- /*
- * TODO we could make this much more efficient by not checking every
- * page in the range if we know all of them are in MOVABLE_ZONE and
- * that the movable zone guarantees that pages are migratable but
- * the later is not the case right now unfortunatelly. E.g. movablecore
- * can still lead to having bootmem allocations in zone_movable.
- */
-
if (is_migrate_cma_page(page)) {
/*
* CMA allocations (alloc_contig_range) really need to mark
@@ -8233,6 +8225,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,

page = pfn_to_page(pfn + iter);

+ /*
+ * Both, bootmem allocations and memory holes are marked
+ * PG_reserved and are unmovable. We can even have unmovable
+ * allocations inside ZONE_MOVABLE, for example when
+ * specifying "movable_core".
+ */
if (PageReserved(page))
return page;

@@ -8306,14 +8304,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
* it. But now, memory offline itself doesn't call
* shrink_node_slabs() and it still to be fixed.
*/
- /*
- * If the page is not RAM, page_count()should be 0.
- * we don't need more check. This is an _used_ not-movable page.
- *
- * The problematic thing here is PG_reserved pages. PG_reserved
- * is set to both of a memory hole page and a _used_ kernel
- * page at boot.
- */
return page;
}
return NULL;
--
2.26.2
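
For readers skimming the diff above, a condensed sketch of the check the
relocated comment now documents; this paraphrases the loop in
has_unmovable_pages() and is not a literal copy of the function:

	/*
	 * Bootmem allocations and memory holes are both marked
	 * PG_reserved, so a single PageReserved() test catches either
	 * case - even inside ZONE_MOVABLE (e.g., with "movablecore" on
	 * the kernel command line).
	 */
	for (; iter < pageblock_nr_pages; iter++) {
		page = pfn_to_page(pfn + iter);

		if (PageReserved(page))
			return page;	/* unmovable page, report it */

		/* ... further movability checks follow ... */
	}
	return NULL;			/* no unmovable page in this block */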

2020-06-30 14:27:44

by David Hildenbrand

Subject: [PATCH v1 2/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()

Right now, if we have two isolations racing, we might trigger the
WARN_ON_ONCE() and end up calling dump_page(NULL), dereferencing NULL. Let's just
return directly.

In the future, we might want to report -EAGAIN to the caller instead, as
this could indicate a temporary isolation failure only.

Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/page_isolation.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f6d07c5f0d34d..553b49a34cf71 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -29,10 +29,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
/*
* We assume the caller intended to SET migrate type to isolate.
* If it is already set, then someone else must have raced and
- * set it before us. Return -EBUSY
+ * set it before us.
*/
- if (is_migrate_isolate_page(page))
- goto out;
+ if (is_migrate_isolate_page(page)) {
+ spin_unlock_irqrestore(&zone->lock, flags);
+ return -EBUSY;
+ }

/*
* FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
--
2.26.2
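
To make the fixed race concrete, an illustrative timeline (not taken from
a real trace) of two callers entering set_migratetype_isolate() for the
same pageblock before this patch:

	/*
	 * CPU0                         CPU1
	 * ----                         ----
	 * set_migratetype_isolate()
	 *   marks the pageblock
	 *   MIGRATE_ISOLATE
	 *                              set_migratetype_isolate()
	 *                                is_migrate_isolate_page() -> true
	 *                                goto out;     (ret == -EBUSY,
	 *                                               unmovable == NULL)
	 *                              error path: the WARN_ON_ONCE() fires
	 *                              and the reporting code can end up in
	 *                              dump_page(NULL)
	 *
	 * With this patch, CPU1 drops the zone lock and returns -EBUSY
	 * before ever reaching the error path.
	 */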

2020-06-30 14:28:02

by David Hildenbrand

Subject: [PATCH v1 5/6] mm/page_alloc: restrict ZONE_MOVABLE optimization in has_unmovable_pages() to memory offlining

We can already have pages in ZONE_MOVABLE that can be offlined but not
allocated - PageHWPoison pages. While these pages can be skipped when
offlining ("moving them to /dev/null"), we cannot move them when
allocating.

virtio-mem managed memory is similar. The logical memory holes
corresponding to unplugged memory ranges can be skipped when offlining;
however, the pages cannot be moved. Currently, virtio-mem special-cases
ZONE_MOVABLE, such that:
- partially plugged memory blocks it added to Linux cannot be onlined to
ZONE_MOVABLE
- when unplugging memory, it will never consider memory blocks that were
onlined to ZONE_MOVABLE

We also want to support ZONE_MOVABLE in virtio-mem for both cases. Note
that virtio-mem does not blindly try to unplug random pages within its
managed memory region. It always plugs memory left-to-right and tries to
unplug memory right-to-left - in roughly MAX_ORDER - 1 granularity. In
theory, the movable ZONE part would only shrink when unplugging memory
from ZONE_MOVABLE.

Let's perform the ZONE_MOVABLE optimization only for memory offlining,
such that we reduce the number of false positives from
has_unmovable_pages() in case of alloc_contig_range() on ZONE_MOVABLE.

Note: We currently don't seem to have any user of alloc_contig_range()
that actually uses ZONE_MOVABLE. This change is mostly valuable for the
documentation.

Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/page_alloc.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bd3ebf08f09b9..45077d74d975d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8237,9 +8237,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
/*
* If the zone is movable and we have ruled out all reserved
* pages then it should be reasonably safe to assume the rest
- * is movable.
+ * is movable. As we can have some pages in the movable zone
+ * that are only considered movable for memory offlining (esp.,
+ * PageHWPoison and PageOffline that will be skipped), we
+ * perform this optimization only for memory offlining.
*/
- if (zone_idx(zone) == ZONE_MOVABLE)
+ if ((flags & MEMORY_OFFLINE) && zone_idx(zone) == ZONE_MOVABLE)
continue;

/*
--
2.26.2
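
For context, the two callers that funnel into has_unmovable_pages() via
start_isolate_page_range(), paraphrased from the v5.8-era sources (the
exact arguments may differ between releases). Only memory offlining
passes MEMORY_OFFLINE, so only it keeps the ZONE_MOVABLE shortcut:

	/* mm/memory_hotplug.c: offline_pages() */
	ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE,
				       MEMORY_OFFLINE | REPORT_FAILURE);

	/* mm/page_alloc.c: alloc_contig_range() - no MEMORY_OFFLINE */
	ret = start_isolate_page_range(pfn_max_align_down(start),
				       pfn_max_align_up(end), migratetype, 0);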

2020-06-30 14:28:44

by David Hildenbrand

Subject: [PATCH v1 4/6] mm/page_isolation: cleanup set_migratetype_isolate()

Let's clean it up a bit, simplifying error handling and getting rid of
the label.

Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/page_isolation.c | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 02a01bff6b219..5f869bef23fa4 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -17,12 +17,9 @@

static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
{
- struct page *unmovable = NULL;
- struct zone *zone;
+ struct zone *zone = page_zone(page);
+ struct page *unmovable;
unsigned long flags;
- int ret = -EBUSY;
-
- zone = page_zone(page);

spin_lock_irqsave(&zone->lock, flags);

@@ -51,21 +48,20 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
NULL);

__mod_zone_freepage_state(zone, -nr_pages, mt);
- ret = 0;
+ spin_unlock_irqrestore(&zone->lock, flags);
+ drain_all_pages(zone);
+ return 0;
}

-out:
spin_unlock_irqrestore(&zone->lock, flags);
- if (!ret) {
- drain_all_pages(zone);
- } else if ((isol_flags & REPORT_FAILURE) && unmovable)
+ if (isol_flags & REPORT_FAILURE)
/*
* printk() with zone->lock held will likely trigger a
* lockdep splat, so defer it here.
*/
dump_page(unmovable, "unmovable page");

- return ret;
+ return -EBUSY;
}

static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
--
2.26.2

2020-06-30 14:28:50

by David Hildenbrand

Subject: [PATCH v1 6/6] virtio-mem: don't special-case ZONE_MOVABLE

Let's allow partially plugged memory blocks to be onlined to ZONE_MOVABLE
and also consider memory blocks that were onlined to ZONE_MOVABLE when
unplugging memory. While unplugged memory chunks are, in general,
unmovable, they can be skipped when offlining memory.

virtio-mem only unplugs fairly big chunks (in the megabyte range) and
rather tries to shrink the memory region than randomly choose memory. In
theory, if all other pages in the movable zone were movable, virtio-mem
would only shrink that zone and not create any kind of fragmentation.

Note: Support for defragmentation is planned, to deal with fragmentation
after unplug due to memory chunks within memory blocks that could not
get unplugged before (e.g., somebody pinning pages within ZONE_MOVABLE
for a longer time).

Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Jason Wang <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/virtio/virtio_mem.c | 47 +++++++------------------------------
1 file changed, 8 insertions(+), 39 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index f26f5f64ae822..2ddfc4a0e2ee0 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -36,18 +36,10 @@ enum virtio_mem_mb_state {
VIRTIO_MEM_MB_STATE_OFFLINE,
/* Partially plugged, fully added to Linux, offline. */
VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL,
- /* Fully plugged, fully added to Linux, online (!ZONE_MOVABLE). */
+ /* Fully plugged, fully added to Linux, online. */
VIRTIO_MEM_MB_STATE_ONLINE,
- /* Partially plugged, fully added to Linux, online (!ZONE_MOVABLE). */
+ /* Partially plugged, fully added to Linux, online. */
VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL,
- /*
- * Fully plugged, fully added to Linux, online (ZONE_MOVABLE).
- * We are not allowed to allocate (unplug) parts of this block that
- * are not movable (similar to gigantic pages). We will never allow
- * to online OFFLINE_PARTIAL to ZONE_MOVABLE (as they would contain
- * unmovable parts).
- */
- VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE,
VIRTIO_MEM_MB_STATE_COUNT
};

@@ -526,21 +518,10 @@ static bool virtio_mem_owned_mb(struct virtio_mem *vm, unsigned long mb_id)
}

static int virtio_mem_notify_going_online(struct virtio_mem *vm,
- unsigned long mb_id,
- enum zone_type zone)
+ unsigned long mb_id)
{
switch (virtio_mem_mb_get_state(vm, mb_id)) {
case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
- /*
- * We won't allow to online a partially plugged memory block
- * to the MOVABLE zone - it would contain unmovable parts.
- */
- if (zone == ZONE_MOVABLE) {
- dev_warn_ratelimited(&vm->vdev->dev,
- "memory block has holes, MOVABLE not supported\n");
- return NOTIFY_BAD;
- }
- return NOTIFY_OK;
case VIRTIO_MEM_MB_STATE_OFFLINE:
return NOTIFY_OK;
default:
@@ -560,7 +541,6 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
break;
case VIRTIO_MEM_MB_STATE_ONLINE:
- case VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE:
virtio_mem_mb_set_state(vm, mb_id,
VIRTIO_MEM_MB_STATE_OFFLINE);
break;
@@ -579,24 +559,17 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
virtio_mem_retry(vm);
}

-static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id,
- enum zone_type zone)
+static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
{
unsigned long nb_offline;

switch (virtio_mem_mb_get_state(vm, mb_id)) {
case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
- BUG_ON(zone == ZONE_MOVABLE);
virtio_mem_mb_set_state(vm, mb_id,
VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL);
break;
case VIRTIO_MEM_MB_STATE_OFFLINE:
- if (zone == ZONE_MOVABLE)
- virtio_mem_mb_set_state(vm, mb_id,
- VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE);
- else
- virtio_mem_mb_set_state(vm, mb_id,
- VIRTIO_MEM_MB_STATE_ONLINE);
+ virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_ONLINE);
break;
default:
BUG();
@@ -675,7 +648,6 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
const unsigned long start = PFN_PHYS(mhp->start_pfn);
const unsigned long size = PFN_PHYS(mhp->nr_pages);
const unsigned long mb_id = virtio_mem_phys_to_mb_id(start);
- enum zone_type zone;
int rc = NOTIFY_OK;

if (!virtio_mem_overlaps_range(vm, start, size))
@@ -717,8 +689,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
break;
}
vm->hotplug_active = true;
- zone = page_zonenum(pfn_to_page(mhp->start_pfn));
- rc = virtio_mem_notify_going_online(vm, mb_id, zone);
+ rc = virtio_mem_notify_going_online(vm, mb_id);
break;
case MEM_OFFLINE:
virtio_mem_notify_offline(vm, mb_id);
@@ -726,8 +697,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
mutex_unlock(&vm->hotplug_mutex);
break;
case MEM_ONLINE:
- zone = page_zonenum(pfn_to_page(mhp->start_pfn));
- virtio_mem_notify_online(vm, mb_id, zone);
+ virtio_mem_notify_online(vm, mb_id);
vm->hotplug_active = false;
mutex_unlock(&vm->hotplug_mutex);
break;
@@ -1906,8 +1876,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
if (vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE] ||
vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL] ||
vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE] ||
- vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL] ||
- vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE]) {
+ vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL]) {
dev_warn(&vdev->dev, "device still has system memory added\n");
} else {
virtio_mem_delete_resource(vm);
--
2.26.2
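
As a rough map for the diff above: once ONLINE_MOVABLE is gone, the
per-memory-block state machine looks roughly as sketched below (UNUSED
and PLUGGED are the earlier states of the same enum, not visible in the
hunks above; the sketch is illustrative, not copied from the code):

	/*
	 *  UNUSED -> PLUGGED -> OFFLINE         <--online/offline--> ONLINE
	 *                       OFFLINE_PARTIAL <--online/offline--> ONLINE_PARTIAL
	 *
	 * Since the zone no longer decides whether onlining is allowed,
	 * online blocks do not need a separate MOVABLE variant anymore.
	 */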

2020-06-30 14:30:21

by David Hildenbrand

Subject: [PATCH v1 3/6] mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()

Inside has_unmovable_pages(), we have a comment describing how unmovable
data could end up in ZONE_MOVABLE - via "movable_core". Also, besides
checking if the first page in the pageblock is reserved, we don't
perform any further checks in case of ZONE_MOVABLE.

In case of memory offlining, we set REPORT_FAILURE, properly
dump_page() the page and handle the error gracefully.
alloc_contig_pages() users currently never allocate from ZONE_MOVABLE.
E.g., hugetlb uses alloc_contig_pages() for the allocation of gigantic
pages only, which will never end up on the MOVABLE zone
(see htlb_alloc_mask()).

Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/page_isolation.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 553b49a34cf71..02a01bff6b219 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -58,16 +58,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
spin_unlock_irqrestore(&zone->lock, flags);
if (!ret) {
drain_all_pages(zone);
- } else {
- WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
-
- if ((isol_flags & REPORT_FAILURE) && unmovable)
- /*
- * printk() with zone->lock held will likely trigger a
- * lockdep splat, so defer it here.
- */
- dump_page(unmovable, "unmovable page");
- }
+ } else if ((isol_flags & REPORT_FAILURE) && unmovable)
+ /*
+ * printk() with zone->lock held will likely trigger a
+ * lockdep splat, so defer it here.
+ */
+ dump_page(unmovable, "unmovable page");

return ret;
}
--
2.26.2
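
The claim that gigantic pages never come from ZONE_MOVABLE can be seen in
the hugetlb gfp selection; the following is paraphrased from v5.8-era
include/linux/hugetlb.h (simplified, details may vary by release):

	static inline bool hugepage_movable_supported(struct hstate *h)
	{
		if (!hugepage_migration_supported(h))
			return false;
		if (hstate_is_gigantic(h))
			return false;	/* gigantic pages are not movable */
		return true;
	}

	static inline gfp_t htlb_alloc_mask(struct hstate *h)
	{
		/* without __GFP_MOVABLE, gfp_zone() never picks ZONE_MOVABLE */
		if (hugepage_movable_supported(h))
			return GFP_HIGHUSER_MOVABLE;
		return GFP_HIGHUSER;
	}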

2020-07-21 10:01:52

by David Hildenbrand

Subject: Re: [PATCH v1 0/6] mm / virtio-mem: support ZONE_MOVABLE

On 30.06.20 16:26, David Hildenbrand wrote:
> Currently, virtio-mem does not really support ZONE_MOVABLE. While it
> allows fully plugged memory blocks to be onlined to ZONE_MOVABLE, it does
> not allow partially plugged memory blocks to be onlined to ZONE_MOVABLE
> and will never consider such memory blocks when unplugging memory. This
> might be surprising for users (especially if onlining suddenly fails).
>
> Let's support partially plugged memory blocks in ZONE_MOVABLE, allowing
> partially plugged memory blocks to be onlined to ZONE_MOVABLE and also
> allowing memory to be unplugged from such memory blocks.
>
> This is especially helpful for testing, but also paves the way for
> virtio-mem optimizations, allowing more memory to get reliably unplugged.
>
> Clean up has_unmovable_pages() and set_migratetype_isolate(), providing
> better documentation of how ZONE_MOVABLE interacts with different kinds of
> unmovable pages (memory offlining vs. alloc_contig_range()).
>
> David Hildenbrand (6):
> mm/page_alloc: tweak comments in has_unmovable_pages()
> mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()
> mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()
> mm/page_isolation: cleanup set_migratetype_isolate()
> mm/page_alloc: restrict ZONE_MOVABLE optimization in
> has_unmovable_pages() to memory offlining
> virtio-mem: don't special-case ZONE_MOVABLE
>
> drivers/virtio/virtio_mem.c | 47 +++++++------------------------------
> mm/page_alloc.c | 29 +++++++++--------------
> mm/page_isolation.c | 40 ++++++++++++++-----------------
> 3 files changed, 36 insertions(+), 80 deletions(-)
>


Gentle ping, patches #1-#5 should be easy to review.

--
Thanks,

David / dhildenb

2020-07-27 12:26:09

by David Hildenbrand

Subject: Re: [PATCH v1 5/6] mm/page_alloc: restrict ZONE_MOVABLE optimization in has_unmovable_pages() to memory offlining

On 30.06.20 16:26, David Hildenbrand wrote:
> We can already have pages in ZONE_MOVABLE that can be offlined but not
> allocated - PageHWPoison pages. While these pages can be skipped when
> offlining ("moving them to /dev/null"), we cannot move them when
> allocating.
>
> virtio-mem managed memory is similar. The logical memory holes
> corresponding to unplugged memory ranges can be skipped when offlining;
> however, the pages cannot be moved. Currently, virtio-mem special-cases
> ZONE_MOVABLE, such that:
> - partially plugged memory blocks it added to Linux cannot be onlined to
> ZONE_MOVABLE
> - when unplugging memory, it will never consider memory blocks that were
> onlined to ZONE_MOVABLE
>
> We also want to support ZONE_MOVABLE in virtio-mem for both cases. Note
> that virtio-mem does not blindly try to unplug random pages within its
> managed memory region. It always plugs memory left-to-right and tries to
> unplug memory right-to-left - in roughly MAX_ORDER - 1 granularity. In
> theory, the movable ZONE part would only shrink when unplugging memory
> from ZONE_MOVABLE.
>
> Let's perform the ZONE_MOVABLE optimization only for memory offlining,
> such that we reduce the number of false positives from
> has_unmovable_pages() in case of alloc_contig_range() on ZONE_MOVABLE.
>
> Note: We currently don't seem to have any user of alloc_contig_range()
> that actually uses ZONE_MOVABLE. This change is mostly valuable for the
> documentation.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Michael S. Tsirkin <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> mm/page_alloc.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bd3ebf08f09b9..45077d74d975d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8237,9 +8237,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> /*
> * If the zone is movable and we have ruled out all reserved
> * pages then it should be reasonably safe to assume the rest
> - * is movable.
> + * is movable. As we can have some pages in the movable zone
> + * that are only considered movable for memory offlining (esp.,
> + * PageHWPoison and PageOffline that will be skipped), we
> + * perform this optimization only for memory offlining.
> */
> - if (zone_idx(zone) == ZONE_MOVABLE)
> + if ((flags & MEMORY_OFFLINE) && zone_idx(zone) == ZONE_MOVABLE)
> continue;
>
> /*
>

So, as we currently don't have any alloc_contig_range() users that use
ZONE_MOVABLE, and virtio-mem will be the only one for now (which
accounts for 50% of the special cases - PG_offline), I think we might
drop this patch.

Worst thing is that if we ever have other alloc_contig_range() users,
we return "false" from has_unmovable_pages() and fail later when
trying to migrate/isolate all pages. This should, however, only happen
in rare cases (and there are already other cases where we have basically
unmovable data - long-term pinnings).

On the plus side, keeping the ZONE_MOVABLE optimizations here also
allows virtio-mem to better tolerate unstable page flags when trying to
alloc_contig_range().

Thoughts?

--
Thanks,

David / dhildenb

2020-07-28 13:51:52

by Baoquan He

Subject: Re: [PATCH v1 1/6] mm/page_alloc: tweak comments in has_unmovable_pages()

On 06/30/20 at 04:26pm, David Hildenbrand wrote:
> Let's move the split comment regarding bootmem allocations and memory
> holes, especially in the context of ZONE_MOVABLE, to the PageReserved()
> check.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Michael S. Tsirkin <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> mm/page_alloc.c | 22 ++++++----------------
> 1 file changed, 6 insertions(+), 16 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 48eb0f1410d47..bd3ebf08f09b9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8207,14 +8207,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> unsigned long iter = 0;
> unsigned long pfn = page_to_pfn(page);
>
> - /*
> - * TODO we could make this much more efficient by not checking every
> - * page in the range if we know all of them are in MOVABLE_ZONE and
> - * that the movable zone guarantees that pages are migratable but
> - * the later is not the case right now unfortunatelly. E.g. movablecore
> - * can still lead to having bootmem allocations in zone_movable.
> - */
> -
> if (is_migrate_cma_page(page)) {
> /*
> * CMA allocations (alloc_contig_range) really need to mark
> @@ -8233,6 +8225,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>
> page = pfn_to_page(pfn + iter);
>
> + /*
> + * Both, bootmem allocations and memory holes are marked
> + * PG_reserved and are unmovable. We can even have unmovable
> + * allocations inside ZONE_MOVABLE, for example when
> + * specifying "movable_core".
~~~~ should be 'movablecore', we don't
have a kernel parameter 'movable_core'.

Otherwise, this looks good to me. Especially since the code comment below
had been added a very long time ago and is obsolete.

Reviewed-by: Baoquan He <[email protected]>

By the way, David, do you know in what situation we can have unmovable
allocations inside ZONE_MOVABLE when specifying 'movablecore'? I quickly
went through find_zone_movable_pfns_for_nodes(), but didn't get why.
Could you explain it in a little more detail?

Thanks
Baoquan

> + */
> if (PageReserved(page))
> return page;
>
> @@ -8306,14 +8304,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> * it. But now, memory offline itself doesn't call
> * shrink_node_slabs() and it still to be fixed.
> */
> - /*
> - * If the page is not RAM, page_count()should be 0.
> - * we don't need more check. This is an _used_ not-movable page.
> - *
> - * The problematic thing here is PG_reserved pages. PG_reserved
> - * is set to both of a memory hole page and a _used_ kernel
> - * page at boot.
> - */
> return page;
> }
> return NULL;
> --
> 2.26.2
>
>

2020-07-28 13:58:41

by Baoquan He

Subject: Re: [PATCH v1 2/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()

On 06/30/20 at 04:26pm, David Hildenbrand wrote:
> Right now, if we have two isolations racing, we might trigger the
> WARN_ON_ONCE() and end up calling dump_page(NULL), dereferencing NULL. Let's just
> return directly.
>
> In the future, we might want to report -EAGAIN to the caller instead, as
> this could indicate a temporary isolation failure only.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Michael S. Tsirkin <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> mm/page_isolation.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index f6d07c5f0d34d..553b49a34cf71 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -29,10 +29,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
> /*
> * We assume the caller intended to SET migrate type to isolate.
> * If it is already set, then someone else must have raced and
> - * set it before us. Return -EBUSY
> + * set it before us.
> */
> - if (is_migrate_isolate_page(page))
> - goto out;
> + if (is_migrate_isolate_page(page)) {
> + spin_unlock_irqrestore(&zone->lock, flags);
> + return -EBUSY;

Good catch, the fix looks good to me.

Reviewed-by: Baoquan He <[email protected]>

> + }
>
> /*
> * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
> --
> 2.26.2
>
>

2020-07-28 14:09:47

by David Hildenbrand

Subject: Re: [PATCH v1 1/6] mm/page_alloc: tweak comments in has_unmovable_pages()

On 28.07.20 15:48, Baoquan He wrote:
> On 06/30/20 at 04:26pm, David Hildenbrand wrote:
>> Let's move the split comment regarding bootmem allocations and memory
>> holes, especially in the context of ZONE_MOVABLE, to the PageReserved()
>> check.
>>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Michal Hocko <[email protected]>
>> Cc: Michael S. Tsirkin <[email protected]>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> mm/page_alloc.c | 22 ++++++----------------
>> 1 file changed, 6 insertions(+), 16 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 48eb0f1410d47..bd3ebf08f09b9 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -8207,14 +8207,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>> unsigned long iter = 0;
>> unsigned long pfn = page_to_pfn(page);
>>
>> - /*
>> - * TODO we could make this much more efficient by not checking every
>> - * page in the range if we know all of them are in MOVABLE_ZONE and
>> - * that the movable zone guarantees that pages are migratable but
>> - * the later is not the case right now unfortunatelly. E.g. movablecore
>> - * can still lead to having bootmem allocations in zone_movable.
>> - */
>> -
>> if (is_migrate_cma_page(page)) {
>> /*
>> * CMA allocations (alloc_contig_range) really need to mark
>> @@ -8233,6 +8225,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>
>> page = pfn_to_page(pfn + iter);
>>
>> + /*
>> + * Both, bootmem allocations and memory holes are marked
>> + * PG_reserved and are unmovable. We can even have unmovable
>> + * allocations inside ZONE_MOVABLE, for example when
>> + * specifying "movable_core".
> ~~~~ should be 'movablecore', we don't
> have kernel parameter 'movable_core'.

Agreed!

>
> Otherwise, this looks good to me. Especially since the code comment below
> had been added a very long time ago and is obsolete.
>
> Reviewed-by: Baoquan He <[email protected]>
>
> By the way, David, do you know in what situation we can have unmovable
> allocations inside ZONE_MOVABLE when specifying 'movablecore'? I quickly
> went through find_zone_movable_pfns_for_nodes(), but didn't get why.
> Could you explain it in a little more detail?

As far as I understand, it can happen that we have memblock allocations
during boot that fall into an area the kernel later configures to span
the movable zone (via movablecore).

>
> Thanks
> Baoquan


--
Thanks,

David / dhildenb

2020-07-29 10:50:34

by Baoquan He

Subject: Re: [PATCH v1 1/6] mm/page_alloc: tweak comments in has_unmovable_pages()

On 07/28/20 at 04:07pm, David Hildenbrand wrote:
> On 28.07.20 15:48, Baoquan He wrote:
> > On 06/30/20 at 04:26pm, David Hildenbrand wrote:
> >> Let's move the split comment regarding bootmem allocations and memory
> >> holes, especially in the context of ZONE_MOVABLE, to the PageReserved()
> >> check.
> >>
> >> Cc: Andrew Morton <[email protected]>
> >> Cc: Michal Hocko <[email protected]>
> >> Cc: Michael S. Tsirkin <[email protected]>
> >> Signed-off-by: David Hildenbrand <[email protected]>
> >> ---
> >> mm/page_alloc.c | 22 ++++++----------------
> >> 1 file changed, 6 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 48eb0f1410d47..bd3ebf08f09b9 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -8207,14 +8207,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> >> unsigned long iter = 0;
> >> unsigned long pfn = page_to_pfn(page);
> >>
> >> - /*
> >> - * TODO we could make this much more efficient by not checking every
> >> - * page in the range if we know all of them are in MOVABLE_ZONE and
> >> - * that the movable zone guarantees that pages are migratable but
> >> - * the later is not the case right now unfortunatelly. E.g. movablecore
> >> - * can still lead to having bootmem allocations in zone_movable.
> >> - */
> >> -
> >> if (is_migrate_cma_page(page)) {
> >> /*
> >> * CMA allocations (alloc_contig_range) really need to mark
> >> @@ -8233,6 +8225,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
> >>
> >> page = pfn_to_page(pfn + iter);
> >>
> >> + /*
> >> + * Both, bootmem allocations and memory holes are marked
> >> + * PG_reserved and are unmovable. We can even have unmovable
> >> + * allocations inside ZONE_MOVABLE, for example when
> >> + * specifying "movable_core".
> > ~~~~ should be 'movablecore', we don't
> > have kernel parameter 'movable_core'.
>
> Agreed!
>
> >
> > Otherwise, this looks good to me. Especially since the code comment below
> > had been added a very long time ago and is obsolete.
> >
> > Reviewed-by: Baoquan He <[email protected]>
> >
> > By the way, David, do you know in what situation we can have unmovable
> > allocations inside ZONE_MOVABLE when specifying 'movablecore'? I quickly
> > went through find_zone_movable_pfns_for_nodes(), but didn't get why.
> > Could you explain it in a little more detail?
>
> As far as I understand, it can happen that we have memblock allocations
> during boot that fall into an area the kernel later configures to span
> the movable zone (via movablecore).

Seems so, thanks a lot. I'm wondering who is still using
movablecore|kernelcore, and in what use case.

2020-07-29 12:30:30

by David Hildenbrand

Subject: Re: [PATCH v1 1/6] mm/page_alloc: tweak comments in has_unmovable_pages()

On 29.07.20 12:47, Baoquan He wrote:
> On 07/28/20 at 04:07pm, David Hildenbrand wrote:
>> On 28.07.20 15:48, Baoquan He wrote:
>>> On 06/30/20 at 04:26pm, David Hildenbrand wrote:
>>>> Let's move the split comment regarding bootmem allocations and memory
>>>> holes, especially in the context of ZONE_MOVABLE, to the PageReserved()
>>>> check.
>>>>
>>>> Cc: Andrew Morton <[email protected]>
>>>> Cc: Michal Hocko <[email protected]>
>>>> Cc: Michael S. Tsirkin <[email protected]>
>>>> Signed-off-by: David Hildenbrand <[email protected]>
>>>> ---
>>>> mm/page_alloc.c | 22 ++++++----------------
>>>> 1 file changed, 6 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 48eb0f1410d47..bd3ebf08f09b9 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -8207,14 +8207,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>> unsigned long iter = 0;
>>>> unsigned long pfn = page_to_pfn(page);
>>>>
>>>> - /*
>>>> - * TODO we could make this much more efficient by not checking every
>>>> - * page in the range if we know all of them are in MOVABLE_ZONE and
>>>> - * that the movable zone guarantees that pages are migratable but
>>>> - * the later is not the case right now unfortunatelly. E.g. movablecore
>>>> - * can still lead to having bootmem allocations in zone_movable.
>>>> - */
>>>> -
>>>> if (is_migrate_cma_page(page)) {
>>>> /*
>>>> * CMA allocations (alloc_contig_range) really need to mark
>>>> @@ -8233,6 +8225,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>
>>>> page = pfn_to_page(pfn + iter);
>>>>
>>>> + /*
>>>> + * Both, bootmem allocations and memory holes are marked
>>>> + * PG_reserved and are unmovable. We can even have unmovable
>>>> + * allocations inside ZONE_MOVABLE, for example when
>>>> + * specifying "movable_core".
>>> ~~~~ should be 'movablecore', we don't
>>> have kernel parameter 'movable_core'.
>>
>> Agreed!
>>
>>>
>>> Otherwise, this looks good to me. Especially since the code comment below
>>> had been added a very long time ago and is obsolete.
>>>
>>> Reviewed-by: Baoquan He <[email protected]>
>>>
>>> By the way, David, do you know in what situation we can have unmovable
>>> allocations inside ZONE_MOVABLE when specifying 'movablecore'? I quickly
>>> went through find_zone_movable_pfns_for_nodes(), but didn't get why.
>>> Could you explain it in a little more detail?
>>
>> As far as I understand, it can happen that we have memblock allocations
>> during boot that fall into an area the kernel later configures to span
>> the movable zone (via movablecore).
>
> Seems so, thanks a lot. I'm wondering who is still using
> movablecore|kernelcore, and in what use case.
>

AFAIK, it's the only (guaranteed) way to get ZONE_MOVABLE without
involving memory hotplug. As I learned, the movable zone is also
interesting beyond memory hotunplug. It limits fragmentation caused by
unmovable allocations and, e.g., makes THP/huge pages more likely/easier
to allocate.

--
Thanks,

David / dhildenb
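
For readers unfamiliar with these parameters, two illustrative command
line examples (the sizes are made up; see
Documentation/admin-guide/kernel-parameters.txt for the exact semantics):

	# Use 2G of memory for ZONE_MOVABLE; the rest stays usable for
	# unmovable kernel allocations.
	movablecore=2G

	# The inverse: guarantee 4G for unmovable kernel allocations; the
	# remainder becomes ZONE_MOVABLE.
	kernelcore=4G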

2020-07-29 13:25:19

by Baoquan He

Subject: Re: [PATCH v1 3/6] mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()

On 06/30/20 at 04:26pm, David Hildenbrand wrote:
> Inside has_unmovable_pages(), we have a comment describing how unmovable
> data could end up in ZONE_MOVABLE - via "movable_core". Also, besides
~~~ 'movablecore'
> checking if the first page in the pageblock is reserved, we don't
> perform any further checks in case of ZONE_MOVABLE.
>
> In case of memory offlining, we set REPORT_FAILURE, properly
> dump_page() the page and handle the error gracefully.
> alloc_contig_pages() users currently never allocate from ZONE_MOVABLE.
> E.g., hugetlb uses alloc_contig_pages() for the allocation of gigantic
> pages only, which will never end up on the MOVABLE zone
> (see htlb_alloc_mask()).
>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Michael S. Tsirkin <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> mm/page_isolation.c | 16 ++++++----------
> 1 file changed, 6 insertions(+), 10 deletions(-)
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 553b49a34cf71..02a01bff6b219 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -58,16 +58,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
> spin_unlock_irqrestore(&zone->lock, flags);
> if (!ret) {
> drain_all_pages(zone);
> - } else {
> - WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
> -
> - if ((isol_flags & REPORT_FAILURE) && unmovable)
> - /*
> - * printk() with zone->lock held will likely trigger a
> - * lockdep splat, so defer it here.
> - */
> - dump_page(unmovable, "unmovable page");
> - }
> + } else if ((isol_flags & REPORT_FAILURE) && unmovable)

This else if branch should be enclosed in braces?

> + /*
> + * printk() with zone->lock held will likely trigger a
> + * lockdep splat, so defer it here.
> + */
> + dump_page(unmovable, "unmovable page");
>
> return ret;
> }

Otherwise, the patch looks good to me.

Reviewed-by: Baoquan He <[email protected]>

2020-07-29 13:38:18

by David Hildenbrand

Subject: Re: [PATCH v1 3/6] mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()

On 29.07.20 15:24, Baoquan He wrote:
> On 06/30/20 at 04:26pm, David Hildenbrand wrote:
>> Inside has_unmovable_pages(), we have a comment describing how unmovable
>> data could end up in ZONE_MOVABLE - via "movable_core". Also, besides
> ~~~ 'movablecore'
>> checking if the first page in the pageblock is reserved, we don't
>> perform any further checks in case of ZONE_MOVABLE.
>>
>> In case of memory offlining, we set REPORT_FAILURE, properly
>> dump_page() the page and handle the error gracefully.
>> alloc_contig_pages() users currently never allocate from ZONE_MOVABLE.
>> E.g., hugetlb uses alloc_contig_pages() for the allocation of gigantic
>> pages only, which will never end up on the MOVABLE zone
>> (see htlb_alloc_mask()).
>>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Michal Hocko <[email protected]>
>> Cc: Michael S. Tsirkin <[email protected]>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> mm/page_isolation.c | 16 ++++++----------
>> 1 file changed, 6 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 553b49a34cf71..02a01bff6b219 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -58,16 +58,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>> spin_unlock_irqrestore(&zone->lock, flags);
>> if (!ret) {
>> drain_all_pages(zone);
>> - } else {
>> - WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
>> -
>> - if ((isol_flags & REPORT_FAILURE) && unmovable)
>> - /*
>> - * printk() with zone->lock held will likely trigger a
>> - * lockdep splat, so defer it here.
>> - */
>> - dump_page(unmovable, "unmovable page");
>> - }
>> + } else if ((isol_flags & REPORT_FAILURE) && unmovable)
>
> This else if branch should be enclosed in braces?
>

Not necessarily. And it will be gone in the next patch in this series :)

Thanks!


--
Thanks,

David / dhildenb

2020-07-29 14:07:24

by Baoquan He

Subject: Re: [PATCH v1 3/6] mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()

On 07/29/20 at 03:37pm, David Hildenbrand wrote:
> On 29.07.20 15:24, Baoquan He wrote:
> > On 06/30/20 at 04:26pm, David Hildenbrand wrote:
> >> Inside has_unmovable_pages(), we have a comment describing how unmovable
> >> data could end up in ZONE_MOVABLE - via "movable_core". Also, besides
> > ~~~ 'movablecore'
> >> checking if the first page in the pageblock is reserved, we don't
> >> perform any further checks in case of ZONE_MOVABLE.
> >>
> >> In case of memory offlining, we set REPORT_FAILURE, properly
> >> dump_page() the page and handle the error gracefully.
> >> alloc_contig_pages() users currently never allocate from ZONE_MOVABLE.
> >> E.g., hugetlb uses alloc_contig_pages() for the allocation of gigantic
> >> pages only, which will never end up on the MOVABLE zone
> >> (see htlb_alloc_mask()).
> >>
> >> Cc: Andrew Morton <[email protected]>
> >> Cc: Michal Hocko <[email protected]>
> >> Cc: Michael S. Tsirkin <[email protected]>
> >> Signed-off-by: David Hildenbrand <[email protected]>
> >> ---
> >> mm/page_isolation.c | 16 ++++++----------
> >> 1 file changed, 6 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> >> index 553b49a34cf71..02a01bff6b219 100644
> >> --- a/mm/page_isolation.c
> >> +++ b/mm/page_isolation.c
> >> @@ -58,16 +58,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
> >> spin_unlock_irqrestore(&zone->lock, flags);
> >> if (!ret) {
> >> drain_all_pages(zone);
> >> - } else {
> >> - WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
> >> -
> >> - if ((isol_flags & REPORT_FAILURE) && unmovable)
> >> - /*
> >> - * printk() with zone->lock held will likely trigger a
> >> - * lockdep splat, so defer it here.
> >> - */
> >> - dump_page(unmovable, "unmovable page");
> >> - }
> >> + } else if ((isol_flags & REPORT_FAILURE) && unmovable)
> >
> > This else if branch should be enclosed in braces?
> >
>
> Not necessarily. And it will be gone in the next patch in this series :)

OK, that's fine. Thanks.

2020-07-29 14:07:54

by Baoquan He

Subject: Re: [PATCH v1 4/6] mm/page_isolation: cleanup set_migratetype_isolate()

On 06/30/20 at 04:26pm, David Hildenbrand wrote:
> Let's clean it up a bit, simplifying error handling and getting rid of
> the label.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Michael S. Tsirkin <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> mm/page_isolation.c | 18 +++++++-----------
> 1 file changed, 7 insertions(+), 11 deletions(-)
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 02a01bff6b219..5f869bef23fa4 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -17,12 +17,9 @@
>
> static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
> {
> - struct page *unmovable = NULL;
> - struct zone *zone;
> + struct zone *zone = page_zone(page);
> + struct page *unmovable;
> unsigned long flags;
> - int ret = -EBUSY;
> -
> - zone = page_zone(page);
>
> spin_lock_irqsave(&zone->lock, flags);
>
> @@ -51,21 +48,20 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
> NULL);
>
> __mod_zone_freepage_state(zone, -nr_pages, mt);
> - ret = 0;
> + spin_unlock_irqrestore(&zone->lock, flags);
> + drain_all_pages(zone);
> + return 0;
> }
>
> -out:
> spin_unlock_irqrestore(&zone->lock, flags);
> - if (!ret) {
> - drain_all_pages(zone);
> - } else if ((isol_flags & REPORT_FAILURE) && unmovable)
> + if (isol_flags & REPORT_FAILURE)
> /*
> * printk() with zone->lock held will likely trigger a
> * lockdep splat, so defer it here.
> */
> dump_page(unmovable, "unmovable page");
>
> - return ret;
> + return -EBUSY;

Reviewed-by: Baoquan He <[email protected]>

2020-07-29 17:34:13

by Mike Kravetz

Subject: Re: [PATCH v1 2/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()

On 6/30/20 7:26 AM, David Hildenbrand wrote:
> Right now, if we have two isolations racing, we might trigger the
> WARN_ON_ONCE() and end up calling dump_page(NULL), dereferencing NULL. Let's just
> return directly.

Just curious, what call path has the WARN_ON_ONCE()/dump_page(NULL)?

>
> In the future, we might want to report -EAGAIN to the caller instead, as
> this could indicate a temporary isolation failure only.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Michael S. Tsirkin <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Hi David,

That 'return -EAGAIN' was added as a sort of synchronization mechanism.
See commit message for 2c7452a075d4d. Before adding the 'return -EAGAIN',
I could create races which would abandon isolated pageblocks. Repeating
those races over and over would result in a good chunk of system memory
being isolated and unusable.

Admittedly, these races are rare and I had to work really hard to produce
them. I'll try to find my testing mechanism. My concern is reintroducing
this abandoning of pageblocks. I have not looked further in your series
to see if this is potentially addressed later. If not, then we should not
remove the return code.

--
Mike Kravetz

> ---
> mm/page_isolation.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index f6d07c5f0d34d..553b49a34cf71 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -29,10 +29,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
> /*
> * We assume the caller intended to SET migrate type to isolate.
> * If it is already set, then someone else must have raced and
> - * set it before us. Return -EBUSY
> + * set it before us.
> */
> - if (is_migrate_isolate_page(page))
> - goto out;
> + if (is_migrate_isolate_page(page)) {
> + spin_unlock_irqrestore(&zone->lock, flags);
> + return -EBUSY;
> + }
>
> /*
> * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
>

2020-07-29 18:12:12

by David Hildenbrand

Subject: Re: [PATCH v1 2/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()

On 29.07.20 19:31, Mike Kravetz wrote:
> On 6/30/20 7:26 AM, David Hildenbrand wrote:
>> Right now, if we have two isolations racing, we might trigger the
>> WARN_ON_ONCE() and end up calling dump_page(NULL), dereferencing NULL. Let's just
>> return directly.
>
> Just curious, what call path has the WARN_ON_ONCE()/dump_page(NULL)?

See below, two set_migratetype_isolate() caller racing.

>
>>
>> In the future, we might want to report -EAGAIN to the caller instead, as
>> this could indicate a temporary isolation failure only.
>>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Michal Hocko <[email protected]>
>> Cc: Michael S. Tsirkin <[email protected]>
>> Signed-off-by: David Hildenbrand <[email protected]>
>
> Hi David,
>
> That 'return -EAGAIN' was added as a sort of synchronization mechanism.
> See commit message for 2c7452a075d4d. Before adding the 'return -EAGAIN',
> I could create races which would abandon isolated pageblocks. Repeating
> those races over and over would result in a good chunk of system memory
> being isolated and unusable.

It's actually -EBUSY; it should maybe later be changed to -EAGAIN (see
the comment), so the caller can decide to retry immediately. But that's
a separate discussion.

>
> Admittedly, these races are rare and I had to work really hard to produce
> them. I'll try to find my testing mechanism. My concern is reintroducing
> this abandoning of pageblocks. I have not looked further in your series
> to see if this is potentially addressed later. If not, then we should not
> remove the return code.
>

Memory offlining could race with alloc_contig_range(), e.g., called when
allocating gigantic pages, or when virtio-mem tries to unplug memory.
The latter two could also race.

We are getting more alloc_contig_range() users, which is why these races
will become more relevant.

I have no clue what you mean by "reintroducing this abandoning of
pageblocks". All this patch changes is not doing the dump_page() -
or am I missing something important?


--
Thanks,

David / dhildenb

2020-07-29 18:36:38

by Mike Kravetz

Subject: Re: [PATCH v1 2/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()

On 7/29/20 11:08 AM, David Hildenbrand wrote:
> I have no clue what you mean by "reintroducing this abandoning of
> pageblocks". All this patch changes is not doing the dump_page() -
> or am I missing something important?

My apologies!!!

I got confused when I saw 'Return -EBUSY' removed from the comment and
assumed the code would not return an error code. The code now more
explicitly does return -EBUSY. My concern arose because I incorrectly
thought you were removing the error return code. Sorry for the noise.

Acked-by: Mike Kravetz <[email protected]>
--
Mike Kravetz

2020-07-29 18:42:19

by David Hildenbrand

Subject: Re: [PATCH v1 2/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()



> On 29.07.2020 at 20:36, Mike Kravetz <[email protected]> wrote:
>
> On 7/29/20 11:08 AM, David Hildenbrand wrote:
>> I have no clue what you mean by "reintroducing this abandoning of
>> pageblocks". All this patch changes is not doing the dump_page() -
>> or am I missing something important?
>
> My apologies!!!
>

No worries, thanks for reviewing!!

> I got confused when I saw 'Return -EBUSY' removed from the comment and
> assumed the code would not return an error code. The code now more
> explicitly does return -EBUSY. My concern arose because I incorrectly
> thought you were removing the error return code. Sorry for the noise.
>
> Acked-by: Mike Kravetz <[email protected]>
> --
> Mike Kravetz
>

2020-07-30 04:25:09

by Pankaj Gupta

Subject: Re: [PATCH v1 2/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()

> Right now, if we have two isolations racing, we might trigger the
> WARN_ON_ONCE() and end up calling dump_page(NULL), dereferencing NULL. Let's just
> return directly.
>
> In the future, we might want to report -EAGAIN to the caller instead, as
> this could indicate a temporary isolation failure only.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Michael S. Tsirkin <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> mm/page_isolation.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index f6d07c5f0d34d..553b49a34cf71 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -29,10 +29,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
> /*
> * We assume the caller intended to SET migrate type to isolate.
> * If it is already set, then someone else must have raced and
> - * set it before us. Return -EBUSY
> + * set it before us.
> */
> - if (is_migrate_isolate_page(page))
> - goto out;
> + if (is_migrate_isolate_page(page)) {
> + spin_unlock_irqrestore(&zone->lock, flags);
> + return -EBUSY;
> + }
>
> /*
> * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
> --

Reviewed-by: Pankaj Gupta <[email protected]>

> 2.26.2
>
>

2020-07-30 04:32:25

by Pankaj Gupta

Subject: Re: [PATCH v1 4/6] mm/page_isolation: cleanup set_migratetype_isolate()

> Let's clean it up a bit, simplifying error handling and getting rid of
> the label.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Michael S. Tsirkin <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> mm/page_isolation.c | 18 +++++++-----------
> 1 file changed, 7 insertions(+), 11 deletions(-)
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 02a01bff6b219..5f869bef23fa4 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -17,12 +17,9 @@
>
> static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
> {
> - struct page *unmovable = NULL;
> - struct zone *zone;
> + struct zone *zone = page_zone(page);
> + struct page *unmovable;
> unsigned long flags;
> - int ret = -EBUSY;
> -
> - zone = page_zone(page);
>
> spin_lock_irqsave(&zone->lock, flags);
>
> @@ -51,21 +48,20 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
> NULL);
>
> __mod_zone_freepage_state(zone, -nr_pages, mt);
> - ret = 0;
> + spin_unlock_irqrestore(&zone->lock, flags);
> + drain_all_pages(zone);
> + return 0;
> }
>
> -out:
> spin_unlock_irqrestore(&zone->lock, flags);
> - if (!ret) {
> - drain_all_pages(zone);
> - } else if ((isol_flags & REPORT_FAILURE) && unmovable)
> + if (isol_flags & REPORT_FAILURE)
> /*
> * printk() with zone->lock held will likely trigger a
> * lockdep splat, so defer it here.
> */
> dump_page(unmovable, "unmovable page");
>
> - return ret;
> + return -EBUSY;
> }
>
> static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
> --

This cleanup looks good to me.

Reviewed-by: Pankaj Gupta <[email protected]>

> 2.26.2
>
>