When provisioning guests, it's desirable to be able to resize their memory on demand.
Currently, it's possible to do so by creating a guest with a small base
memory, hot-plugging all the rest, and using the 'movable_node' kernel
command-line parameter, which puts all hot-plugged memory in
ZONE_MOVABLE, allowing it to be removed whenever needed.
But there is an issue regarding guest reboot:
If memory is hot-plugged, and then the guest is rebooted, all hot-plugged
memory goes to ZONE_NORMAL, which offers no guaranteed hot-removal.
This usually prevents the memory from being hot-removed from the guest.
It's possible to use device-tree information to fix that behavior, as
it stores flags for LMB ranges on ibm,dynamic-memory-vN.
It involves marking each memblock with the correct flags as hotpluggable
memory, which mm/memblock.c puts in ZONE_MOVABLE during boot if
'movable_node' is passed.
For base memory, qemu assigns these flags to its LMBs:
(DRCONF_MEM_AI_INVALID | DRCONF_MEM_RESERVED)
For hot-plugged memory, it assigns (DRCONF_MEM_ASSIGNED).
When the guest kernel reads the device-tree, early_init_drmem_lmb() is
called for every added LMB, doing nothing for base memory, and adding
memblocks for hot-plugged memory. Skipping base memory happens here:
	if ((lmb->flags & DRCONF_MEM_RESERVED) ||
	    !(lmb->flags & DRCONF_MEM_ASSIGNED))
		return;
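As an illustration of that check, here is a minimal userspace sketch (not kernel code). The helper name lmb_is_skipped() is hypothetical, and the flag values are what I believe arch/powerpc/include/asm/drmem.h defines; treat them as assumptions and verify them against your tree.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed flag values (believed to match asm/drmem.h; verify before
 * relying on them).
 */
#define DRCONF_MEM_ASSIGNED   0x00000008u
#define DRCONF_MEM_AI_INVALID 0x00000040u
#define DRCONF_MEM_RESERVED   0x00000080u

/*
 * Hypothetical helper mirroring the early-return condition quoted
 * above: an LMB is skipped when it is reserved or not assigned.
 */
static bool lmb_is_skipped(uint32_t flags)
{
	return (flags & DRCONF_MEM_RESERVED) ||
	       !(flags & DRCONF_MEM_ASSIGNED);
}
```

With the qemu flag sets described above, a base-memory LMB (DRCONF_MEM_AI_INVALID | DRCONF_MEM_RESERVED) is skipped, while a hot-plugged LMB (DRCONF_MEM_ASSIGNED) is not.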
Marking memblocks added by this function as hotpluggable memory
is enough to get the desired behavior, and should cause no change
if the 'movable_node' parameter is not passed to the kernel.
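The intended effect can be pictured with a toy userspace model. This is only a sketch under simplifying assumptions: struct model, model_add(), model_mark_hotplug() and model_zone_of() are hypothetical names, and the real zone selection in mm/ is considerably more involved than a single boolean test.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_REGIONS 8

enum zone { ZONE_NORMAL, ZONE_MOVABLE };

/* One memory region; 'hotplug' stands in for a hotplug marking. */
struct region {
	uint64_t base;
	uint64_t size;
	bool hotplug;
};

/* Toy state: the regions plus whether 'movable_node' was passed. */
struct model {
	struct region regions[MAX_REGIONS];
	int count;
	bool movable_node;
};

/* Add a region, initially unmarked. Returns its index, or -1 if full. */
static int model_add(struct model *m, uint64_t base, uint64_t size)
{
	if (m->count >= MAX_REGIONS)
		return -1;
	m->regions[m->count] = (struct region){ base, size, false };
	return m->count++;
}

/* Mark an existing region as hotpluggable; bad indexes are ignored. */
static void model_mark_hotplug(struct model *m, int idx)
{
	if (idx >= 0 && idx < m->count)
		m->regions[idx].hotplug = true;
}

/* A marked region lands in ZONE_MOVABLE only when movable_node is set. */
static enum zone model_zone_of(const struct model *m, int idx)
{
	if (m->movable_node && m->regions[idx].hotplug)
		return ZONE_MOVABLE;
	return ZONE_NORMAL;
}
```

In this model, a marked region ends up in ZONE_MOVABLE only when movable_node is set; with the flag cleared, every region stays in ZONE_NORMAL, which is the "no change" case described above.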
Signed-off-by: Leonardo Bras <[email protected]>
---
arch/powerpc/kernel/prom.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 6620f37abe73..f4d14c67bf53 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -518,6 +518,8 @@ static void __init early_init_drmem_lmb(struct drmem_lmb *lmb,
 		DBG("Adding: %llx -> %llx\n", base, size);
 		if (validate_mem_limit(base, &size))
 			memblock_add(base, size);
+
+		early_init_dt_mark_hotplug_memory_arch(base, size);
 	} while (--rngs);
 }
 #endif /* CONFIG_PPC_PSERIES */
--
2.24.1
On Fri, Feb 28, 2020 at 11:36 AM Leonardo Bras <[email protected]> wrote:
>
> When provisioning guests, it's desirable to be able to resize their memory on demand.
>
> Currently, it's possible to do so by creating a guest with a small base
> memory, hot-plugging all the rest, and using the 'movable_node' kernel
> command-line parameter, which puts all hot-plugged memory in
> ZONE_MOVABLE, allowing it to be removed whenever needed.
>
> But there is an issue regarding guest reboot:
> If memory is hot-plugged, and then the guest is rebooted, all hot-plugged
> memory goes to ZONE_NORMAL, which offers no guaranteed hot-removal.
> This usually prevents the memory from being hot-removed from the guest.
>
> It's possible to use device-tree information to fix that behavior, as
> it stores flags for LMB ranges on ibm,dynamic-memory-vN.
> It involves marking each memblock with the correct flags as hotpluggable
> memory, which mm/memblock.c puts in ZONE_MOVABLE during boot if
> 'movable_node' is passed.
>
> For base memory, qemu assigns these flags to its LMBs:
> (DRCONF_MEM_AI_INVALID | DRCONF_MEM_RESERVED)
> For hot-plugged memory, it assigns (DRCONF_MEM_ASSIGNED).
>
> When the guest kernel reads the device-tree, early_init_drmem_lmb() is
> called for every added LMB, doing nothing for base memory, and adding
> memblocks for hot-plugged memory. Skipping base memory happens here:
>
> if ((lmb->flags & DRCONF_MEM_RESERVED) ||
> !(lmb->flags & DRCONF_MEM_ASSIGNED))
> return;
>
> Marking memblocks added by this function as hotpluggable memory
> is enough to get the desired behavior, and should cause no change
> if the 'movable_node' parameter is not passed to the kernel.
>
> Signed-off-by: Leonardo Bras <[email protected]>
> ---
> arch/powerpc/kernel/prom.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 6620f37abe73..f4d14c67bf53 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -518,6 +518,8 @@ static void __init early_init_drmem_lmb(struct drmem_lmb *lmb,
> DBG("Adding: %llx -> %llx\n", base, size);
> if (validate_mem_limit(base, &size))
> memblock_add(base, size);
> +
> + early_init_dt_mark_hotplug_memory_arch(base, size);
Hi,
I tried this a few years back
(https://patchwork.ozlabs.org/patch/800142/) and didn't pursue it
further because at that time, it was felt that the approach might not
work for PowerVM guests, because all the present memory except RMA
gets marked as hot-pluggable by PowerVM. This discussion is not
present in the above thread, but during my private discussions with
Reza and Nathan, it was noted that marking all that memory as MOVABLE
is not preferable for PowerVM guests as we might run out of memory for
kernel allocations.
Regards,
Bharata.
--
http://raobharata.wordpress.com/
Hello Bharata, thanks for this feedback!
On Wed, 2020-03-04 at 10:13 +0530, Bharata B Rao wrote:
> Hi,
>
> I tried this a few years back
> (https://patchwork.ozlabs.org/patch/800142/) and didn't pursue it
> further because at that time, it was felt that the approach might not
> work for PowerVM guests, because all the present memory except RMA
> gets marked as hot-pluggable by PowerVM. This discussion is not
> present in the above thread, but during my private discussions with
> Reza and Nathan, it was noted that marking all that memory as MOVABLE
> is not preferable for PowerVM guests as we might run out of memory for
> kernel allocations.
Hmm, this makes sense.
But with my change, these pieces of memory only get into ZONE_MOVABLE
if the boot parameter 'movable_node' gets passed to the guest kernel.
So, even if we are unable to sort out some flag combination that works
fine for both use-cases, if PowerVM doesn't pass 'movable_node' as a boot
parameter to the kernel, it will behave just as it does today.
What are your thoughts on that?
Best regards,
Leonardo Bras
On Wed, 2020-03-04 at 04:18 -0300, Leonardo Bras wrote:
> Hmm, this makes sense.
> But with my change, these pieces of memory only get into ZONE_MOVABLE
> if the boot parameter 'movable_node' gets passed to the guest kernel.
Hmm, I think your patch also does that.
> So, even if we are unable to sort out some flag combination that works
> fine for both use-cases, if PowerVM doesn't pass 'movable_node' as a boot
> parameter to the kernel, it will behave just as it does today.
Also, another option would be adding a new 'removable' flag, given that
the flags field has a lot of free space. It would only be set by qemu,
so we would be safe with PowerVM.
Then we would have:
+	if (lmb->flags & DRCONF_MEM_REMOVABLE)
+		early_init_dt_mark_hotplug_memory_arch(base, size);
Do you know if that's possible?
Would we need to update the LOPAPR?
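To make the idea concrete, here is a rough userspace sketch of how such a flag could gate the marking. Everything about DRCONF_MEM_REMOVABLE here is hypothetical: the flag does not exist in this patch, the bit value 0x100 is picked only for illustration, and the helper name is made up. The other flag values are what I believe asm/drmem.h defines and should be checked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed existing flag values (verify against asm/drmem.h). */
#define DRCONF_MEM_ASSIGNED   0x00000008u
#define DRCONF_MEM_RESERVED   0x00000080u

/* Hypothetical new flag; the bit chosen here is purely illustrative. */
#define DRCONF_MEM_REMOVABLE  0x00000100u

/*
 * Hypothetical helper: an LMB would be marked as hotpluggable only if
 * it passes the existing reserved/assigned check and the hypervisor
 * also set the proposed 'removable' flag.
 */
static bool lmb_should_mark_hotplug(uint32_t flags)
{
	if ((flags & DRCONF_MEM_RESERVED) ||
	    !(flags & DRCONF_MEM_ASSIGNED))
		return false;

	return (flags & DRCONF_MEM_REMOVABLE) != 0;
}
```

Under these assumptions, a hypervisor that never sets the new bit (as suggested for PowerVM) would see no behavior change, while qemu could opt in per LMB.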
Leonardo