For 5.10. Patch #1-#4,#6 have RBs or ACKs, patch #5 is virtio-mem stuff
maintained by me. This should go via the -mm tree.
---
When introducing virtio-mem, the semantics of ZONE_MOVABLE were rather
unclear, which is why we special-cased ZONE_MOVABLE such that partially
plugged blocks would never end up in ZONE_MOVABLE.
Now that the semantics are much clearer (and are documented in patch #6),
let's support partially plugged memory blocks in ZONE_MOVABLE, allowing
them to be onlined to ZONE_MOVABLE and also unplugging from such memory
blocks. This avoids surprises when onlining a memory block suddenly
fails just because it is not (yet) completely populated by virtio-mem.
This is especially helpful for testing, but also paves the way for
virtio-mem optimizations, allowing more memory to get reliably unplugged.
Clean up has_unmovable_pages() and set_migratetype_isolate(), providing
better documentation of how ZONE_MOVABLE interacts with different kinds of
unmovable pages (memory offlining vs. alloc_contig_range()).
v4 -> v5:
- Rename "mm/page_isolation: don't dump_page(NULL) in
set_migratetype_isolate()" to "mm/page_isolation: exit early when
pageblock is isolated in set_migratetype_isolate()" as I was messing
things up while reshuffling patches (dump_page(NULL) could never
happen, only the WARN_ON_ONCE())
-- Clarify in the description that this is currently a cleanup only
- "mm: document semantics of ZONE_MOVABLE"
-- Clarified in the memory hole case that kernelcore/movablecore is
required to create such rare special cases
v3 -> v4:
- "mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()"
-- Fix typo in description
- "virtio-mem: don't special-case ZONE_MOVABLE"
-- Add more details on why we initially special-cased ZONE_MOVABLE (via MST)
- "mm: document semantics of ZONE_MOVABLE"
-- Rephrase some parts of documentation (via Mike)
v2 -> v3:
- "mm: document semantics of ZONE_MOVABLE"
-- Fix a typo
v1 -> v2:
- "mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()"
-- Move to position 1, add Fixes: tag
-- Drop unused "out:" label
- "mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()"
-- Keep curly braces on "else" case
- Replace "[PATCH v1 5/6] mm/page_alloc: restrict ZONE_MOVABLE optimization
in has_unmovable_pages() to memory offlining"
by "mm: document semantics of ZONE_MOVABLE"
-- Brain dump of what I know about ZONE_MOVABLE
David Hildenbrand (6):
mm/page_alloc: tweak comments in has_unmovable_pages()
mm/page_isolation: exit early when pageblock is isolated in
set_migratetype_isolate()
mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()
mm/page_isolation: cleanup set_migratetype_isolate()
virtio-mem: don't special-case ZONE_MOVABLE
mm: document semantics of ZONE_MOVABLE
drivers/virtio/virtio_mem.c | 47 +++++++------------------------------
include/linux/mmzone.h | 35 +++++++++++++++++++++++++++
mm/page_alloc.c | 22 +++++------------
mm/page_isolation.c | 39 ++++++++++++++----------------
4 files changed, 66 insertions(+), 77 deletions(-)
--
2.26.2
Inside has_unmovable_pages(), we have a comment describing how unmovable
data could end up in ZONE_MOVABLE - via "movablecore". Also, besides
checking whether the first page in the pageblock is reserved, we don't
perform any further checks in the case of ZONE_MOVABLE.
In case of memory offlining, we set REPORT_FAILURE, properly
dump_page() the page and handle the error gracefully.
alloc_contig_pages() users currently never allocate from ZONE_MOVABLE.
E.g., hugetlb uses alloc_contig_pages() for the allocation of gigantic
pages only, which will never end up in the MOVABLE zone
(see htlb_alloc_mask()).
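To illustrate the zone-selection argument (a standalone, simplified sketch,
not the kernel implementation; the flag values and helper names below are
made up for illustration only): an allocation mask that does not carry the
movable bit can never be served from ZONE_MOVABLE, which is why such
alloc_contig_pages() users stay out of the movable zone.

#include <stdio.h>

/*
 * Sketch only: simplified stand-ins for the real gfp flags/zones, chosen
 * just to show the selection rule, not the kernel's actual encoding.
 */
#define SKETCH_GFP_HIGHMEM	0x02u
#define SKETCH_GFP_MOVABLE	0x08u

enum sketch_zone_type {
	SKETCH_ZONE_NORMAL,
	SKETCH_ZONE_HIGHMEM,
	SKETCH_ZONE_MOVABLE,
};

/* Only a mask carrying both HIGHMEM and MOVABLE may land in ZONE_MOVABLE. */
static enum sketch_zone_type sketch_gfp_zone(unsigned int gfp)
{
	if ((gfp & (SKETCH_GFP_HIGHMEM | SKETCH_GFP_MOVABLE)) ==
	    (SKETCH_GFP_HIGHMEM | SKETCH_GFP_MOVABLE))
		return SKETCH_ZONE_MOVABLE;
	if (gfp & SKETCH_GFP_HIGHMEM)
		return SKETCH_ZONE_HIGHMEM;
	return SKETCH_ZONE_NORMAL;
}

int main(void)
{
	/* A mask without the movable bit never resolves to ZONE_MOVABLE. */
	printf("%d\n", sketch_gfp_zone(SKETCH_GFP_HIGHMEM) ==
		       SKETCH_ZONE_MOVABLE);				/* 0 */
	printf("%d\n", sketch_gfp_zone(SKETCH_GFP_HIGHMEM |
				       SKETCH_GFP_MOVABLE) ==
		       SKETCH_ZONE_MOVABLE);				/* 1 */
	return 0;
}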
Reviewed-by: Baoquan He <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/page_isolation.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 7d7d263ce7f4b..d099aac479601 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -57,15 +57,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
spin_unlock_irqrestore(&zone->lock, flags);
if (!ret) {
drain_all_pages(zone);
- } else {
- WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
-
- if ((isol_flags & REPORT_FAILURE) && unmovable)
- /*
- * printk() with zone->lock held will likely trigger a
- * lockdep splat, so defer it here.
- */
- dump_page(unmovable, "unmovable page");
+ } else if ((isol_flags & REPORT_FAILURE) && unmovable) {
+ /*
+ * printk() with zone->lock held will likely trigger a
+ * lockdep splat, so defer it here.
+ */
+ dump_page(unmovable, "unmovable page");
}
return ret;
--
2.26.2
When introducing virtio-mem, the semantics of ZONE_MOVABLE were rather
unclear, which is why we special-cased ZONE_MOVABLE such that partially
plugged blocks would never end up in ZONE_MOVABLE.
Now that the semantics are much clearer (and will be documented in
a follow-up patch including the new virtio-mem behavior), let's allow
onlining partially plugged memory blocks to ZONE_MOVABLE and also consider
memory blocks that were onlined to ZONE_MOVABLE when unplugging memory.
While unplugged memory pages are, in general, unmovable, they can be
skipped when offlining memory.
virtio-mem only unplugs fairly big chunks (in the megabyte range) and
tries to shrink the memory region rather than randomly choosing memory. In
theory, if all other pages in the movable zone were movable, virtio-mem
would only shrink that zone and not create any kind of fragmentation.
In the future, we might want to remember the zone again and use the
information when (un)plugging memory. For now, let's keep it simple.
Note: Support for defragmentation is planned, to deal with fragmentation
after unplug due to memory chunks within memory blocks that could not
get unplugged before (e.g., somebody pinning pages within ZONE_MOVABLE
for a longer time).
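To sketch the "shrink rather than fragment" idea (a toy, self-contained
example; none of the sketch_* names or the trivially succeeding unplug
helper are the driver's real code): unplug requests walk memory blocks from
the highest block id downwards, so the managed region tends to shrink from
the top instead of getting holes punched into it at random.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of a virtio-mem-like device; not the driver's data structure. */
struct sketch_vm {
	unsigned long first_mb_id;
	unsigned long last_mb_id;
	bool plugged[16];		/* plugged state, indexed by mb_id */
};

static bool sketch_try_unplug_mb(struct sketch_vm *vm, unsigned long mb_id)
{
	/* Pretend unplugging always succeeds in this toy model. */
	vm->plugged[mb_id] = false;
	return true;
}

static int sketch_unplug_request(struct sketch_vm *vm, unsigned long nr_blocks)
{
	unsigned long mb_id;

	/* Prefer the highest plugged blocks so the region shrinks from the top. */
	for (mb_id = vm->last_mb_id; nr_blocks; mb_id--) {
		if (vm->plugged[mb_id] && sketch_try_unplug_mb(vm, mb_id))
			nr_blocks--;
		if (mb_id == vm->first_mb_id)	/* avoid unsigned wrap-around */
			break;
	}
	return nr_blocks ? -EBUSY : 0;
}

int main(void)
{
	struct sketch_vm vm = { .first_mb_id = 0, .last_mb_id = 15 };
	unsigned long i;

	for (i = 0; i <= vm.last_mb_id; i++)
		vm.plugged[i] = true;

	/* Unplug 4 blocks: ids 15..12 go away, lower blocks stay untouched. */
	printf("rc=%d, block 15 plugged=%d, block 0 plugged=%d\n",
	       sketch_unplug_request(&vm, 4), vm.plugged[15], vm.plugged[0]);
	return 0;
}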
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Baoquan He <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/virtio/virtio_mem.c | 47 +++++++------------------------------
1 file changed, 8 insertions(+), 39 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index f26f5f64ae822..2ddfc4a0e2ee0 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -36,18 +36,10 @@ enum virtio_mem_mb_state {
VIRTIO_MEM_MB_STATE_OFFLINE,
/* Partially plugged, fully added to Linux, offline. */
VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL,
- /* Fully plugged, fully added to Linux, online (!ZONE_MOVABLE). */
+ /* Fully plugged, fully added to Linux, online. */
VIRTIO_MEM_MB_STATE_ONLINE,
- /* Partially plugged, fully added to Linux, online (!ZONE_MOVABLE). */
+ /* Partially plugged, fully added to Linux, online. */
VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL,
- /*
- * Fully plugged, fully added to Linux, online (ZONE_MOVABLE).
- * We are not allowed to allocate (unplug) parts of this block that
- * are not movable (similar to gigantic pages). We will never allow
- * to online OFFLINE_PARTIAL to ZONE_MOVABLE (as they would contain
- * unmovable parts).
- */
- VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE,
VIRTIO_MEM_MB_STATE_COUNT
};
@@ -526,21 +518,10 @@ static bool virtio_mem_owned_mb(struct virtio_mem *vm, unsigned long mb_id)
}
static int virtio_mem_notify_going_online(struct virtio_mem *vm,
- unsigned long mb_id,
- enum zone_type zone)
+ unsigned long mb_id)
{
switch (virtio_mem_mb_get_state(vm, mb_id)) {
case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
- /*
- * We won't allow to online a partially plugged memory block
- * to the MOVABLE zone - it would contain unmovable parts.
- */
- if (zone == ZONE_MOVABLE) {
- dev_warn_ratelimited(&vm->vdev->dev,
- "memory block has holes, MOVABLE not supported\n");
- return NOTIFY_BAD;
- }
- return NOTIFY_OK;
case VIRTIO_MEM_MB_STATE_OFFLINE:
return NOTIFY_OK;
default:
@@ -560,7 +541,6 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
break;
case VIRTIO_MEM_MB_STATE_ONLINE:
- case VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE:
virtio_mem_mb_set_state(vm, mb_id,
VIRTIO_MEM_MB_STATE_OFFLINE);
break;
@@ -579,24 +559,17 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
virtio_mem_retry(vm);
}
-static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id,
- enum zone_type zone)
+static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id)
{
unsigned long nb_offline;
switch (virtio_mem_mb_get_state(vm, mb_id)) {
case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
- BUG_ON(zone == ZONE_MOVABLE);
virtio_mem_mb_set_state(vm, mb_id,
VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL);
break;
case VIRTIO_MEM_MB_STATE_OFFLINE:
- if (zone == ZONE_MOVABLE)
- virtio_mem_mb_set_state(vm, mb_id,
- VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE);
- else
- virtio_mem_mb_set_state(vm, mb_id,
- VIRTIO_MEM_MB_STATE_ONLINE);
+ virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_ONLINE);
break;
default:
BUG();
@@ -675,7 +648,6 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
const unsigned long start = PFN_PHYS(mhp->start_pfn);
const unsigned long size = PFN_PHYS(mhp->nr_pages);
const unsigned long mb_id = virtio_mem_phys_to_mb_id(start);
- enum zone_type zone;
int rc = NOTIFY_OK;
if (!virtio_mem_overlaps_range(vm, start, size))
@@ -717,8 +689,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
break;
}
vm->hotplug_active = true;
- zone = page_zonenum(pfn_to_page(mhp->start_pfn));
- rc = virtio_mem_notify_going_online(vm, mb_id, zone);
+ rc = virtio_mem_notify_going_online(vm, mb_id);
break;
case MEM_OFFLINE:
virtio_mem_notify_offline(vm, mb_id);
@@ -726,8 +697,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb,
mutex_unlock(&vm->hotplug_mutex);
break;
case MEM_ONLINE:
- zone = page_zonenum(pfn_to_page(mhp->start_pfn));
- virtio_mem_notify_online(vm, mb_id, zone);
+ virtio_mem_notify_online(vm, mb_id);
vm->hotplug_active = false;
mutex_unlock(&vm->hotplug_mutex);
break;
@@ -1906,8 +1876,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
if (vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE] ||
vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL] ||
vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE] ||
- vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL] ||
- vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE]) {
+ vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL]) {
dev_warn(&vdev->dev, "device still has system memory added\n");
} else {
virtio_mem_delete_resource(vm);
--
2.26.2
On 16.08.20 14:53, David Hildenbrand wrote:
> For 5.10. Patch #1-#4,#6 have RBs or ACKs, patch #5 is virtio-mem stuff
> maintained by me. This should go via the -mm tree.
>
@Andrew, can we give this a churn if there are no further comments? Thanks!
--
Thanks,
David / dhildenb
On 21.08.20 10:31, David Hildenbrand wrote:
> On 16.08.20 14:53, David Hildenbrand wrote:
>> For 5.10. Patch #1-#4,#6 have RBs or ACKs, patch #5 is virtio-mem stuff
>> maintained by me. This should go via the -mm tree.
>>
>
> @Andrew, can we give this a churn if there are no further comments? Thanks!
... I just spotted the patches in -next, strange I didn't get an email
notification. Thanks :)
--
Thanks,
David / dhildenb
On 08/21/20 at 10:31am, David Hildenbrand wrote:
> On 16.08.20 14:53, David Hildenbrand wrote:
> > For 5.10. Patch #1-#4,#6 have RBs or ACKs, patch #5 is virtio-mem stuff
> > maintained by me. This should go via the -mm tree.
> >
>
> @Andrew, can we give this a churn if there are no further comments? Thanks!
Saw this series in next already.
On 21.08.20 10:46, Baoquan He wrote:
> On 08/21/20 at 10:31am, David Hildenbrand wrote:
>> On 16.08.20 14:53, David Hildenbrand wrote:
>>> For 5.10. Patch #1-#4,#6 have RBs or ACKs, patch #5 is virtio-mem stuff
>>> maintained by me. This should go via the -mm tree.
>>>
>>
>> @Andrew, can we give this a churn if there are no further comments? Thanks!
>
> Saw this series in next already.
Hehe, yeah I also just stumbled over them while rebasing :)
--
Thanks,
David / dhildenb
On Fri, 21 Aug 2020 10:45:33 +0200 David Hildenbrand <[email protected]> wrote:
> On 21.08.20 10:31, David Hildenbrand wrote:
> > On 16.08.20 14:53, David Hildenbrand wrote:
> >> For 5.10. Patch #1-#4,#6 have RBs or ACKs, patch #5 is virtio-mem stuff
> >> maintained by me. This should go via the -mm tree.
> >>
> >
> > @Andrew, can we give this a churn if there are no further comments? Thanks!
>
> ... I just spotted the patches in -next, strange I didn't get an email
> notification. Thanks :)
https://lore.kernel.org/mm-commits/20200819025501.gJhZlolfC%[email protected]/
akpm!=spam :)
On 21.08.20 22:55, Andrew Morton wrote:
> On Fri, 21 Aug 2020 10:45:33 +0200 David Hildenbrand <[email protected]> wrote:
>
>> On 21.08.20 10:31, David Hildenbrand wrote:
>>> On 16.08.20 14:53, David Hildenbrand wrote:
>>>> For 5.10. Patch #1-#4,#6 have RBs or ACKs, patch #5 is virtio-mem stuff
>>>> maintained by me. This should go via the -mm tree.
>>>>
>>>
>>> @Andrew, can we give this a churn if there are no further comments? Thanks!
>>
>> ... I just spotted the patches in -next, strange I didn't get an email
>> notification. Thanks :)
>
> https://lore.kernel.org/mm-commits/20200819025501.gJhZlolfC%[email protected]/
>
> akpm!=spam :)
>
I would never even dare to think that ... now I have to teach my spam
filter some good manners ;)
--
Thanks,
David / dhildenb