Changes to support virtual cpu hotplug in QEMU[1] have been introduced to the
community as RFC. These are under review.
To support virtual cpu hotplug, the guest kernel must:
1. Identify disabled/present vcpus and set/unset the present mask of each vcpu
   during initialization and on hotplug events. It must also set the possible
   mask (which includes disabled vcpus) during guest kernel init.
2. Provide architecture specific ACPI hooks, for example to map/unmap the
   logical cpuid to hwids/MPIDR. The Linux kernel already has generic ACPI cpu
   hotplug framework support.
The changes introduced in this patch-set also ensure that initialization of the
cpus remains unaffected when virtual cpu hotplug is not supported.
Repository:
(*) Kernel changes are at,
https://github.com/salil-mehta/linux.git virt-cpuhp-arm64/rfc-v1
(*) QEMU changes for vcpu hotplug can be cloned from the repository below,
https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v1
THINGS TO DO:
1. Handling of per-cpu variables, especially the first-chunk allocations
   (which are NUMA aware), when a vcpu is hotplugged needs further attention
   and review.
2. NUMA related code has not been fully tested in either QEMU or the kernel.
3. Comprehensive testing, including the case where cpu hotplug is not supported.
4. Docs
DISCLAIMER:
This is not complete work but an effort to present the arm vcpu hotplug
implementation to the community. This RFC is being used as a way to verify
the idea mentioned above and to support the changes presented for QEMU[1] to
support vcpu hotplug. As of now this is *not* production level code and might
have bugs. Only basic testing has been done on a HiSilicon Kunpeng920 ARM64
server SoC to verify the proof-of-concept, which has been found working.
Best regards
Salil.
REFERENCES:
[1] https://www.mail-archive.com/[email protected]/msg712010.html
[2] https://lkml.org/lkml/2019/6/28/1157
[3] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-July/032316.html
Organization of Patches:
[Patch 1-3]
(*) Changes required during guest boot time to support vcpu hotplug
(*) Max cpu overflow checks
(*) Changes required to pre-setup cpu-operations even for disabled cpus
[Patch 4]
(*) Arch changes required by guest kernel ACPI CPU Hotplug framework.
Salil Mehta (4):
arm64: kernel: Handle disabled[(+)present] cpus in MADT/GICC during
init
arm64: kernel: Bound the total(present+disabled) cpus with nr_cpu_ids
arm64: kernel: Init cpu operations for all possible vcpus
arm64: kernel: Arch specific ACPI hooks(like logical cpuid<->hwid
etc.)
arch/arm64/kernel/smp.c | 153 ++++++++++++++++++++++++++++++++--------
1 file changed, 123 insertions(+), 30 deletions(-)
--
2.17.1
Bound the total number of identified cpus (including disabled cpus) by the
maximum limit allowed by the kernel. The maximum value is either specified
via the kernel parameter 'nr_cpus' or at compile time using CONFIG_NR_CPUS.
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: Xiongfeng Wang <[email protected]>
---
arch/arm64/kernel/smp.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 51a707928302..864ccd3da419 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -513,6 +513,7 @@ static int __init smp_cpu_setup(int cpu)
}
static bool bootcpu_valid __initdata;
+static bool cpus_clipped __initdata = false;
static unsigned int cpu_count = 1;
static unsigned int disabled_cpu_count;
@@ -536,6 +537,11 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
unsigned int total_cpu_count = disabled_cpu_count + cpu_count;
u64 hwid = processor->arm_mpidr;
+ if (total_cpu_count > nr_cpu_ids) {
+ cpus_clipped = true;
+ return;
+ }
+
if (!(processor->flags & ACPI_MADT_ENABLED)) {
#ifndef CONFIG_ACPI_HOTPLUG_CPU
pr_debug("skipping disabled CPU entry with 0x%llx MPIDR\n", hwid);
@@ -569,9 +575,6 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
return;
}
- if (cpu_count >= NR_CPUS)
- return;
-
/* map the logical cpu id to cpu MPIDR */
cpu_logical_map(total_cpu_count) = hwid;
@@ -688,8 +691,10 @@ static void __init of_parse_and_init_cpus(void)
continue;
}
- if (cpu_count >= NR_CPUS)
+ if (cpu_count >= NR_CPUS) {
+ cpus_clipped = true;
goto next;
+ }
pr_debug("cpu logical map 0x%llx\n", hwid);
cpu_logical_map(cpu_count) = hwid;
@@ -710,6 +715,7 @@ static void __init of_parse_and_init_cpus(void)
*/
void __init smp_init_cpus(void)
{
+ unsigned int total_cpu_count = disabled_cpu_count + cpu_count;
int i;
if (acpi_disabled)
@@ -717,9 +723,9 @@ void __init smp_init_cpus(void)
else
acpi_parse_and_init_cpus();
- if (cpu_count > nr_cpu_ids)
+ if (cpus_clipped)
pr_warn("Number of cores (%d) exceeds configured maximum of %u - clipping\n",
- cpu_count, nr_cpu_ids);
+ total_cpu_count, nr_cpu_ids);
if (!bootcpu_valid) {
pr_err("missing boot CPU MPIDR, not enabling secondaries\n");
--
2.17.1
Currently, cpu-operations are only initialized for the cpus which
already have a logical cpuid to hwid association established. This
only happens for the cpus which are present at boot time.
To support virtual cpu hotplug, we must initialize the cpu-operations
for all possible (present+disabled) vcpus. This means the logical cpuid to
hwid/mpidr association might not exist (i.e. might be INVALID_HWID)
during init. Later, when the vcpu is actually hotplugged, a logical cpuid
is allocated and associated with the hwid/mpidr.
This patch does some refactoring to support the above change.
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: Xiongfeng Wang <[email protected]>
---
arch/arm64/kernel/smp.c | 38 ++++++++++++++++----------------------
1 file changed, 16 insertions(+), 22 deletions(-)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 864ccd3da419..63f31ea23e55 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -503,13 +503,16 @@ static int __init smp_cpu_setup(int cpu)
const struct cpu_operations *ops;
if (init_cpu_ops(cpu))
- return -ENODEV;
+ goto out;
ops = get_cpu_ops(cpu);
if (ops->cpu_init(cpu))
- return -ENODEV;
+ goto out;
return 0;
+out:
+ cpu_logical_map(cpu) = INVALID_HWID;
+ return -ENODEV;
}
static bool bootcpu_valid __initdata;
@@ -547,7 +550,8 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
pr_debug("skipping disabled CPU entry with 0x%llx MPIDR\n", hwid);
#else
cpu_madt_gicc[total_cpu_count] = *processor;
- set_cpu_possible(total_cpu_count, true);
+ if (!smp_cpu_setup(total_cpu_count))
+ set_cpu_possible(total_cpu_count, true);
disabled_cpu_count++;
#endif
return;
@@ -591,8 +595,11 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
*/
acpi_set_mailbox_entry(total_cpu_count, processor);
- set_cpu_possible(total_cpu_count, true);
- set_cpu_present(total_cpu_count, true);
+ if (!smp_cpu_setup(total_cpu_count)) {
+ set_cpu_possible(total_cpu_count, true);
+ set_cpu_present(total_cpu_count, true);
+ }
+
cpu_count++;
}
@@ -701,8 +708,10 @@ static void __init of_parse_and_init_cpus(void)
early_map_cpu_to_node(cpu_count, of_node_to_nid(dn));
- set_cpu_possible(cpu_count, true);
- set_cpu_present(cpu_count, true);
+ if (!smp_cpu_setup(cpu_count)) {
+ set_cpu_possible(cpu_count, true);
+ set_cpu_present(cpu_count, true);
+ }
next:
cpu_count++;
}
@@ -716,7 +725,6 @@ static void __init of_parse_and_init_cpus(void)
void __init smp_init_cpus(void)
{
unsigned int total_cpu_count = disabled_cpu_count + cpu_count;
- int i;
if (acpi_disabled)
of_parse_and_init_cpus();
@@ -731,20 +739,6 @@ void __init smp_init_cpus(void)
pr_err("missing boot CPU MPIDR, not enabling secondaries\n");
return;
}
-
- /*
- * We need to set the cpu_logical_map entries before enabling
- * the cpus so that cpu processor description entries (DT cpu nodes
- * and ACPI MADT entries) can be retrieved by matching the cpu hwid
- * with entries in cpu_logical_map while initializing the cpus.
- * If the cpu set-up fails, invalidate the cpu_logical_map entry.
- */
- for (i = 1; i < nr_cpu_ids; i++) {
- if (cpu_logical_map(i) != INVALID_HWID) {
- if (smp_cpu_setup(i))
- cpu_logical_map(i) = INVALID_HWID;
- }
- }
}
void __init smp_prepare_cpus(unsigned int max_cpus)
--
2.17.1
To support virtual cpu hotplug, some arch specific hooks must be
provided. These hooks are called by the generic ACPI cpu hotplug
framework while handling a vcpu hot-(un)plug event. The changes
required involve:
1. Allocation of the logical cpuid corresponding to the hwid/mpidr
2. Mapping of the logical cpuid to the hwid/mpidr and marking it present
3. Removing the vcpu from the present mask during hot-unplug
4. For arm64, all possible cpus are registered within topology_init().
   Hence, we need to override the weak ACPI call arch_register_cpu()
   (which returns -ENODEV) and return success.
5. The NUMA node mapping set for this vcpu using SRAT table info during
   init time is discarded, as the logical cpu-ids used at that time
   might not be correct. The mapping is set again using the
   proximity/node info obtained by evaluating the _PXM ACPI method.
Note, during hot-unplug of a vcpu, we do not unmap the association between
the logical cpuid and the hwid/mpidr. It remains persistent.
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: Xiongfeng Wang <[email protected]>
---
arch/arm64/kernel/smp.c | 80 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 80 insertions(+)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 63f31ea23e55..f3315840e829 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -528,6 +528,86 @@ struct acpi_madt_generic_interrupt *acpi_cpu_get_madt_gicc(int cpu)
return &cpu_madt_gicc[cpu];
}
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+int arch_register_cpu(int num)
+{
+ return 0;
+}
+
+static int set_numa_node_for_cpu(acpi_handle handle, int cpu)
+{
+#ifdef CONFIG_ACPI_NUMA
+ int node_id;
+
+ /* will evaluate _PXM */
+ node_id = acpi_get_node(handle);
+ if (node_id != NUMA_NO_NODE)
+ set_cpu_numa_node(cpu, node_id);
+#endif
+ return 0;
+}
+
+static void unset_numa_node_for_cpu(int cpu)
+{
+#ifdef CONFIG_ACPI_NUMA
+ set_cpu_numa_node(cpu, NUMA_NO_NODE);
+#endif
+}
+
+static int allocate_logical_cpuid(u64 physid)
+{
+ int first_invalid_idx = -1;
+ bool first = true;
+ int i;
+
+ for_each_possible_cpu(i) {
+ /*
+ * logical cpuid<->hwid association remains persistent once
+ * established
+ */
+ if (cpu_logical_map(i) == physid)
+ return i;
+
+ if ((cpu_logical_map(i) == INVALID_HWID) && first) {
+ first_invalid_idx = i;
+ first = false;
+ }
+ }
+
+ return first_invalid_idx;
+}
+
+int acpi_unmap_cpu(int cpu)
+{
+ set_cpu_present(cpu, false);
+ unset_numa_node_for_cpu(cpu);
+
+ return 0;
+}
+
+int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, u32 acpi_id,
+ int *cpuid)
+{
+ int cpu;
+
+ cpu = allocate_logical_cpuid(physid);
+ if (cpu < 0) {
+ pr_warn("Unable to map logical cpuid to physid 0x%llx\n",
+ physid);
+ return -ENOSPC;
+ }
+
+ /* map the logical cpu id to cpu MPIDR */
+ cpu_logical_map(cpu) = physid;
+ set_numa_node_for_cpu(handle, cpu);
+
+ set_cpu_present(cpu, true);
+ *cpuid = cpu;
+
+ return 0;
+}
+#endif
+
/*
* acpi_map_gic_cpu_interface - parse processor MADT entry
*
--
2.17.1
With ACPI enabled, cpus are identified by the presence of a GICC
entry in the MADT table. Each GICC entry in the MADT marks a cpu as
enabled or disabled. As of now, the disabled cpus are skipped as
physical cpu hotplug is not supported. These remain disabled even after
the kernel has booted.
To support virtual cpu hotplug (in which case disabled vcpus can be
hotplugged even after the kernel has booted), QEMU will populate the MADT
table with appropriate GICC entry details for each possible
(present+disabled) vcpu. During init, vcpus will then be identified as
present or disabled. To achieve this, the below changes have been made to
the present/possible vcpu handling, along with the reasoning:
1. Identify all possible (present+disabled) vcpus at boot/init time
   and set their present and possible masks. In the existing code,
   cpus are marked present quite late, within the smp_prepare_cpus()
   function, which is called in the context of a kernel thread. Since
   cpu hotplug is not supported, the present cpus are always equal to
   the possible cpus. But with cpu hotplug enabled, this assumption no
   longer holds. Hence, present cpus should be marked while the MADT GICC
   entries are being parsed for each vcpu.
2. Set the possible cpus to include the disabled ones. This needs to be
   done while parsing the MADT GICC entries corresponding to each vcpu,
   as the disabled vcpu info is available only at this point; in the
   hotplug case, possible vcpus are not equal to present vcpus.
3. Store the parsed MADT/GICC entry even for the disabled vcpus during
   init time. This is needed as some modules, like the PMU, register
   IRQs for each possible vcpu during init time. Therefore, a valid
   MADT GICC entry should be present for all possible vcpus.
4. Refactoring related to DT/OF is also done to align it with the init
   changes to support vcpu hotplug.
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: Xiongfeng Wang <[email protected]>
---
arch/arm64/kernel/smp.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index e43a8ff19f0f..51a707928302 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -509,13 +509,12 @@ static int __init smp_cpu_setup(int cpu)
if (ops->cpu_init(cpu))
return -ENODEV;
- set_cpu_possible(cpu, true);
-
return 0;
}
static bool bootcpu_valid __initdata;
static unsigned int cpu_count = 1;
+static unsigned int disabled_cpu_count;
#ifdef CONFIG_ACPI
static struct acpi_madt_generic_interrupt cpu_madt_gicc[NR_CPUS];
@@ -534,10 +533,17 @@ struct acpi_madt_generic_interrupt *acpi_cpu_get_madt_gicc(int cpu)
static void __init
acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
{
+ unsigned int total_cpu_count = disabled_cpu_count + cpu_count;
u64 hwid = processor->arm_mpidr;
if (!(processor->flags & ACPI_MADT_ENABLED)) {
+#ifndef CONFIG_ACPI_HOTPLUG_CPU
pr_debug("skipping disabled CPU entry with 0x%llx MPIDR\n", hwid);
+#else
+ cpu_madt_gicc[total_cpu_count] = *processor;
+ set_cpu_possible(total_cpu_count, true);
+ disabled_cpu_count++;
+#endif
return;
}
@@ -546,7 +552,7 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
return;
}
- if (is_mpidr_duplicate(cpu_count, hwid)) {
+ if (is_mpidr_duplicate(total_cpu_count, hwid)) {
pr_err("duplicate CPU MPIDR 0x%llx in MADT\n", hwid);
return;
}
@@ -567,9 +573,9 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
return;
/* map the logical cpu id to cpu MPIDR */
- cpu_logical_map(cpu_count) = hwid;
+ cpu_logical_map(total_cpu_count) = hwid;
- cpu_madt_gicc[cpu_count] = *processor;
+ cpu_madt_gicc[total_cpu_count] = *processor;
/*
* Set-up the ACPI parking protocol cpu entries
@@ -580,8 +586,10 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
* initialize the cpu if the parking protocol is
* the only available enable method).
*/
- acpi_set_mailbox_entry(cpu_count, processor);
+ acpi_set_mailbox_entry(total_cpu_count, processor);
+ set_cpu_possible(total_cpu_count, true);
+ set_cpu_present(total_cpu_count, true);
cpu_count++;
}
@@ -614,6 +622,9 @@ static void __init acpi_parse_and_init_cpus(void)
acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_INTERRUPT,
acpi_parse_gic_cpu_interface, 0);
+ pr_debug("possible cpus(%u) present cpus(%u) disabled cpus(%u)\n",
+ cpu_count+disabled_cpu_count, cpu_count, disabled_cpu_count);
+
/*
* In ACPI, SMP and CPU NUMA information is provided in separate
* static tables, namely the MADT and the SRAT.
@@ -684,6 +695,9 @@ static void __init of_parse_and_init_cpus(void)
cpu_logical_map(cpu_count) = hwid;
early_map_cpu_to_node(cpu_count, of_node_to_nid(dn));
+
+ set_cpu_possible(cpu_count, true);
+ set_cpu_present(cpu_count, true);
next:
cpu_count++;
}
@@ -768,7 +782,6 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
if (err)
continue;
- set_cpu_present(cpu, true);
numa_store_cpu_info(cpu);
}
}
--
2.17.1
Hello,
A gentle reminder: any comments regarding this series will help us know
your opinion, confirm or correct our understanding of the topic, and will
be much appreciated.
Thanks in anticipation!
Best regards
Salil
> From: Salil Mehta
> Sent: Thursday, June 25, 2020 2:38 PM
> To: [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Linuxarm
> <[email protected]>; [email protected]; Salil Mehta
> <[email protected]>
> Subject: [PATCH RFC 0/4] Changes to Support *Virtual* CPU Hotplug for ARM64
>
> Changes to support virtual cpu hotplug in QEMU[1] have been introduced to the
> community as RFC. These are under review.
>
> To support virtual cpu hotplug guest kernel must:
> 1. Identify disabled/present vcpus and set/unset the present mask of the vcpu
> during initialization and hotplug event. It must also set the possible mask
> (which includes disabled vcpus) during init of guest kernel.
> 2. Provide architecture specific ACPI hooks, for example to map/unmap the
> logical cpuid to hwids/MPIDR. Linux kernel already has generic ACPI cpu
> hotplug framework support.
>
> Changes introduced in this patch-set also ensures that initialization of the
> cpus when virtual cpu hotplug is not supported remains un-affected.
>
> Repository:
> (*) Kernel changes are at,
> https://github.com/salil-mehta/linux.git virt-cpuhp-arm64/rfc-v1
> (*) QEMU changes for vcpu hotplug could be cloned from below site,
> https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v1
>
>
> THINGS TO DO:
> 1. Handling of per-cpu variables especially the first-chunk allocations
> (which are NUMA aware) when the vcpu is hotplugged needs further attention
> and review.
> 2. NUMA related stuff has not been fully tested both in QEMU and kernel.
> 3. Comprehensive Testing including when cpu hotplug is not supported.
> 4. Docs
>
> DISCLAIMER:
> This is not a complete work but an effort to present the arm vcpu hotplug
> implementation to the community. This RFC is being used as a way to verify
> the idea mentioned above and to support changes presented for QEMU[1] to
> support vcpu hotplug. As of now this is *not* a production level code and might
> have bugs. Only a basic testing has been done on HiSilicon Kunpeng920 ARM64
> based SoC for Servers to verify the proof-of-concept that has been found working!
>
> Best regards
> Salil.
>
> REFERENCES:
> [1] https://www.mail-archive.com/[email protected]/msg712010.html
> [2] https://lkml.org/lkml/2019/6/28/1157
> [3] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-July/032316.html
>
> Organization of Patches:
> [Patch 1-3]
> (*) Changes required during guest boot time to support vcpu hotplug
> (*) Max cpu overflow checks
> (*) Changes required to pre-setup cpu-operations even for disabled cpus
> [Patch 4]
> (*) Arch changes required by guest kernel ACPI CPU Hotplug framework.
>
>
> Salil Mehta (4):
> arm64: kernel: Handle disabled[(+)present] cpus in MADT/GICC during
> init
> arm64: kernel: Bound the total(present+disabled) cpus with nr_cpu_ids
> arm64: kernel: Init cpu operations for all possible vcpus
> arm64: kernel: Arch specific ACPI hooks(like logical cpuid<->hwid
> etc.)
>
> arch/arm64/kernel/smp.c | 153 ++++++++++++++++++++++++++++++++--------
> 1 file changed, 123 insertions(+), 30 deletions(-)
>
> --
> 2.17.1
>
Hi Salil,
On 07/07/2020 10:52, Salil Mehta wrote:
>> From: Salil Mehta
Disambiguation: by cpu-hotplug here, you don't mean CONFIG_HOTPLUG_CPU
backed by PSCI, which is commonly what we mean in the arm world. You mean:
package hot-add. A bunch of CPUs (and maybe more) that weren't present at
boot have turned up.
>> Changes to support virtual cpu hotplug in QEMU[1] have been introduced to the
>> community as RFC. These are under review.
>>
>> To support virtual cpu hotplug guest kernel must:
Surely number 1 is: know it's a virtual machine, and that whatever needs
doing/describing on a real machine, doesn't need doing or describing here...
We add support for virtual machines after support for the physical machine. Is anyone
building hardware that supports this?
We can assume some will exist during the lifetime of a stable-kernel. The stable-kernel
will claim to support this, but in reality it will crash and burn in exciting ways.
(e.g. parts of the interrupt controller in the hot-added package would need
configuring. We'd either lock up during boot when we try, but it's not there
... or not do it when the package is added because we assumed this was a VM)
I don't think Linux can support this for virtual machines until it works for
real machines too. We don't have a reliable way of determining we are running
in a VM.
This at least needs the ACPI spec updating to describe what work the OS has to do when a
package comes online, and what it can't touch until then.
I don't think this work would happen without someone building such a system.
>> 1. Identify disabled/present vcpus and set/unset the present mask of the vcpu
>> during initialization and hotplug event. It must also set the possible mask
>> (which includes disabled vcpus) during init of guest kernel.
>> 2. Provide architecture specific ACPI hooks, for example to map/unmap the
>> logical cpuid to hwids/MPIDR. Linux kernel already has generic ACPI cpu
>> hotplug framework support.
>> Changes introduced in this patch-set also ensures that initialization of the
>> cpus when virtual cpu hotplug is not supported remains un-affected.
But on a platform with physical cpu hotplug, really-bad-things will happen.
There is no description here of what problem you are trying to solve. I don't
believe 'cpu hotplug' is an end in itself.
~
Aha, its in the qemu cover letter:
| This allows scaling the guest VM compute capacity on-demand which would be
| useful for the following example scenarios,
| 1. Vertical Pod Autoscaling[3][4] in the cloud: Part of the orchestration
| framework which could adjust resource requests (CPU and Mem requests) for
| the containers in a pod, based on usage.
|2. Pay-as-you-grow Business Model: Infrastructure provider could allocate and
| restrict the total number of compute resources available to the guest VM
| according to the SLA(Service Level Agreement). VM owner could request for
| more compute to be hot-plugged for some cost.
Controlling CPU time makes perfect sense. But doesn't cgroup already do exactly this?
If a VM is restricted to 1xCPU of cpu-time, it can online as many vcpus as it
likes; it's not going to get more than 1xCPU of cpu-time.
I understand that this is how kubernetes reconfigures a VM on x86, but I'm fairly sure x86
had physical cpu hotplug before, so the firmware/OS responsibilities were well understood.
I think this series creates a support nightmare for the future.
This has come up before:
https://lore.kernel.org/kvmarm/[email protected]/
Thanks,
James
>> Repository:
>> (*) Kernel changes are at,
>> https://github.com/salil-mehta/linux.git virt-cpuhp-arm64/rfc-v1
>> (*) QEMU changes for vcpu hotplug could be cloned from below site,
>> https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v1
>>
>>
>> THINGS TO DO:
>> 1. Handling of per-cpu variables especially the first-chunk allocations
>> (which are NUMA aware) when the vcpu is hotplugged needs further attention
>> and review.
>> 2. NUMA related stuff has not been fully tested both in QEMU and kernel.
>> 3. Comprehensive Testing including when cpu hotplug is not supported.
>> 4. Docs
>>
>> DISCLAIMER:
>> This is not a complete work but an effort to present the arm vcpu hotplug
>> implementation to the community. This RFC is being used as a way to verify
>> the idea mentioned above and to support changes presented for QEMU[1] to
>> support vcpu hotplug. As of now this is *not* a production level code and might
>> have bugs. Only a basic testing has been done on HiSilicon Kunpeng920 ARM64
>> based SoC for Servers to verify the proof-of-concept that has been found working!
>>
>> Best regards
>> Salil.
>>
>> REFERENCES:
>> [1] https://www.mail-archive.com/[email protected]/msg712010.html
>> [2] https://lkml.org/lkml/2019/6/28/1157
>> [3] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-July/032316.html
>>
>> Organization of Patches:
>> [Patch 1-3]
>> (*) Changes required during guest boot time to support vcpu hotplug
>> (*) Max cpu overflow checks
>> (*) Changes required to pre-setup cpu-operations even for disabled cpus
>> [Patch 4]
>> (*) Arch changes required by guest kernel ACPI CPU Hotplug framework.
>>
>>
>> Salil Mehta (4):
>> arm64: kernel: Handle disabled[(+)present] cpus in MADT/GICC during
>> init
>> arm64: kernel: Bound the total(present+disabled) cpus with nr_cpu_ids
>> arm64: kernel: Init cpu operations for all possible vcpus
>> arm64: kernel: Arch specific ACPI hooks(like logical cpuid<->hwid
>> etc.)
>>
>> arch/arm64/kernel/smp.c | 153 ++++++++++++++++++++++++++++++++--------
>> 1 file changed, 123 insertions(+), 30 deletions(-)
>>
>> --
>> 2.17.1
>>
>
Hi James,
Thanks for taking the time to respond. Please find my replies inline.
Thanks
> From: James Morse [mailto:[email protected]]
> Sent: Wednesday, July 8, 2020 1:30 PM
> To: Salil Mehta <[email protected]>
>
> Hi Salil,
>
> On 07/07/2020 10:52, Salil Mehta wrote:
> >> From: Salil Mehta
>
>
> Disambiguation: by cpu-hotplug here, you don't mean
> CONFIG_HOTPLUG_CPU backed by PSCI, which is commonly what we mean in the arm
> world. You
> mean: package hot-add. A bunch of CPUs (and maybe more) that weren't present
> at boot have
> turned up.
Exactly, and hence the terminology of possible, present and disabled comes
from there.
Present  : cpus which are present at boot time of the guest and were presented
           as 'ENABLED' (flag set) in their ACPI MADT cpu interface entry by
           QEMU
Disabled : cpus which were not present at boot time of the guest and were
           presented as 'DISABLED' (flag not set) in their ACPI MADT cpu
           interface entry by QEMU
Possible : Present + Disabled
This series is meant to provide the infrastructure to hot-(un)plug virtual
cpus at the QEMU level. It does not assume any hardware cpu hotplug support
on the host machine; rather, it is an attempt to make virtual cpu hotplug
support fairly independent of the hardware.
> >> Changes to support virtual cpu hotplug in QEMU[1] have been introduced to the
> >> community as RFC. These are under review.
> >>
> >> To support virtual cpu hotplug guest kernel must:
>
> Surely number 1 is: know its a virtual machine, and that whatever needs
> doing/describing
> on a real machine, doesn't need doing or describing here...
>
> We add support for virtual machines after support for the physical machine. Is
> anyone building hardware that supports this?
Do we really care about that if we can make the virtual layer independent of
the hardware by pre-sizing the resources (vcpus, GIC etc.) at the QEMU
level? :)
AFAIK, right now there is *no* known real benefit of physical cpu hotplug
except for some cases which are really hard to solve *physically* and would
perhaps require a more comprehensively defined system architecture, like:
1. Scalable servers, where cards could be purchased to add resources and
   compute on demand. This might be useful for small to medium enterprises
   who would like to start with something small but want to scale up over
   time as their business grows. You would want to keep some resources
   closely coupled for 'N' reasons.
2. Die/SoC hotplug, similar to the above but more granular. This could be
   used for saving power as well.
Again, any of the above looks to be a far-fetched idea right now.
But there are definite benefits and use-cases (as described in the QEMU
patches) to supporting *virtual* cpu hotplug. Plus, we need to keep the way
we support hotplug consistent with x86. Yes, there are some inherent
differences between the x86 APIC and the ARM GIC, but those workarounds live
in QEMU; the guest kernel is agnostic of them and so is the host kernel. Why
not let the virtualizer deal with this?
BTW, if you are aware of any physical cpu hotplug implementations then please
do let us know.
> We can assume some will exist during the lifetime of a stable-kernel. The
> stable-kernel
> will claim to support this, but in reality it will crash and burn in exciting
> ways.
> (e.g. parts of the interrupt controller in the hot-added package would need
> configuring.
> We'd either lock up during boot when we try, but its not there ... or not do
> it when the
> package is added because we assumed this was a VM)
Sure, but right now we are not even aware of the physical cpu hotplug
requirements (and they look far-fetched), whereas the *virtual* cpu hotplug
requirements are very clear. As far as I can tell, the changes being
presented are non-intrusive to the host and guest kernel, but if any aspects
of the patches make you feel otherwise, please clarify objectively; that
will make our life easier.
As such, broadly two types of changes are being presented in this patch-set:
1. Arch specific
   a. Reshuffling of the code where and how the present/disabled cpus are
      counted and their corresponding masks set.
   b. Their cpu operations
2. Generic ACPI cpu hotplug hooks which land in arch specific code. These
   must be implemented in any case.
The changes in 1a and 1b (patches 01/04, 02/04, 03/04) are mere reshuffling,
to be frank. The rest of the changes, in 2, are the hooks called by the
hotplug framework.
> I don't think linux can support this for virtual machines until it works for
> real machines
> too. We don't have a reliable way of determining we are running in a VM.
x86 supports both physical and vcpu hotplug, and I could not see any code for
x86 cpu hotplug support inside the kernel which assumes this distinction.
Neither does the code under review.
IMHO, the changes should be such that we do not require that distinction.
This patch-set's changes are quite generic.
> This at least needs the ACPI spec updating to describe what work the OS has to
> do when a
> package comes online, and what it can't touch until then.
> I don't think this work would happen without someone building such a system.
Fair enough, I understand this point. But do you think this reason is good
enough to block a *real* use case (which is in demand because it helps the
business) for one which might never actually happen? :)
Not having *virtual* cpu hotplug support does impact use cases of the
business and the deployment. x86 has a wonderful way to realize those
use-cases, but ARM cannot support them just because its system architecture
does not; does that mean we are deadlocked?
Although I do understand your point, and we need more thought on this part as
well, it is important that we match x86 server features; the above deadlock
is blocking the real, useful cases which customers are demanding. They don't
want to move away from the look-and-feel of x86 in terms of how machines are
used and configured, but they do want to use ARM servers.
> >> 1. Identify disabled/present vcpus and set/unset the present mask of the vcpu
> >> during initialization and hotplug event. It must also set the possible mask
> >> (which includes disabled vcpus) during init of guest kernel.
> >> 2. Provide architecture specific ACPI hooks, for example to map/unmap the
> >> logical cpuid to hwids/MPIDR. Linux kernel already has generic ACPI cpu
> >> hotplug framework support.
>
> >> Changes introduced in this patch-set also ensures that initialization of the
> >> cpus when virtual cpu hotplug is not supported remains un-affected.
>
> But on a platform with physical cpu hotplug, really-bad-things will happen.
Ok. But if we ever get physical hotplug support, then we will know *exactly*
how it differs from x86 in terms of using the generic ACPI cpu hotplug
framework, and in terms of any architecture-specific changes beyond the ones
suggested here. Can't we deal with that grey area later?
> There is no description here of what problem you are trying to solve. I don't
> believe 'cpu hotplug' is an end in itself.
>
> ~
>
> Aha, its in the qemu cover letter:
> | This allows scaling the guest VM compute capacity on-demand which would be
> | useful for the following example scenarios,
> | 1. Vertical Pod Autoscaling[3][4] in the cloud: Part of the orchestration
> | framework which could adjust resource requests (CPU and Mem requests) for
> | the containers in a pod, based on usage.
> | 2. Pay-as-you-grow Business Model: Infrastructure provider could allocate and
> | restrict the total number of compute resources available to the guest VM
> | according to the SLA(Service Level Agreement). VM owner could request for
> | more compute to be hot-plugged for some cost.
>
> Controlling CPU time makes perfect sense. But doesn't cgroup already do exactly
> this?
I am not sure I clearly understood what you mean by 'cpu time' here, but a few
points:
1. We cannot achieve scenario 2 in any way without actual cpu hotplug support.
Users want x86-like infrastructure where their existing frameworks or scripts
for libvirt etc. keep working on ARM platforms.
2. We want to push administration of vcpus and their resource allocation down
to the layer where it belongs, i.e. QEMU, and not the guest itself. There may
be cases where the user does not want even an admin to enter the VM and use
the guest command line to configure the machine. And using a privileged mode
of the guest to restrict the user from performing that same operation from the
guest command line is inherently insecure and not an ideal solution.
> If a VM is restricted to 1xCPU of cpu-time, it can online as many vcpu as it
> likes, its
> not going to get more than 1xCPU of cpu-time.
Forgive me if I am missing something terribly simple here, but could you
please clarify? I am not able to catch what you mean here by cpu-time.
> I understand that this is how kubernetes reconfigures a VM on x86, but I'm fairly
> sure x86
> had physical cpu hotplug before, so the firmware/OS responsibilities were well
> understood.
On the last part: yes, physical cpu hotplug is supported on x86, but if you
look at the QEMU code, the only difference is clearly the Local APIC, which is
per-vcpu and whose realization can be deferred until the vcpu is hotplugged.
This is the key difference from the ARM GIC, which requires all of its
redistributors and CPU interfaces to be present and realized at GIC init time.
We solve this problem by pre-sizing the vgic with the possible vcpus in QEMU.
>
>
> I think this series creates a support nightmare for the future.
Could you please be more specific and point out which part or assumption in
the patches is objectionable?
> This has come up before:
> https://lore.kernel.org/kvmarm/[email protected]/
Yes, we are aware of that. Please check reference [2] of this patch-set, where
I have mentioned that discussion. This patch-set has changed the implementation
and is now working with a different set of QEMU changes, based upon Marc's
suggestions. Both the QEMU and kernel parts have been re-written in the past
6 months.
Link: https://lists.cs.columbia.edu/pipermail/kvmarm/2018-July/032316.html
Thanks,
Salil
> Thanks,
>
> James
>
>
> >> Repository:
> >> (*) Kernel changes are at,
> >> https://github.com/salil-mehta/linux.git virt-cpuhp-arm64/rfc-v1
> >> (*) QEMU changes for vcpu hotplug could be cloned from below site,
> >> https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v1
> >>
> >>
> >> THINGS TO DO:
> >> 1. Handling of per-cpu variables especially the first-chunk allocations
> >> (which are NUMA aware) when the vcpu is hotplugged needs further attention
> >> and review.
> >> 2. NUMA related stuff has not been fully tested both in QEMU and kernel.
> >> 3. Comprehensive Testing including when cpu hotplug is not supported.
> >> 4. Docs
> >>
> >> DISCLAIMER:
> >> This is not a complete work but an effort to present the arm vcpu hotplug
> >> implementation to the community. This RFC is being used as a way to verify
> >> the idea mentioned above and to support changes presented for QEMU[1] to
> >> support vcpu hotplug. As of now this is *not* a production level code and might
> >> have bugs. Only a basic testing has been done on HiSilicon Kunpeng920 ARM64
> >> based SoC for Servers to verify the proof-of-concept that has been found working!
> >>
> >> Best regards
> >> Salil.
> >>
> >> REFERENCES:
> >> [1] https://www.mail-archive.com/[email protected]/msg712010.html
> >> [2] https://lkml.org/lkml/2019/6/28/1157
> >> [3] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-July/032316.html
> >>
> >> Organization of Patches:
> >> [Patch 1-3]
> >> (*) Changes required during guest boot time to support vcpu hotplug
> >> (*) Max cpu overflow checks
> >> (*) Changes required to pre-setup cpu-operations even for disabled cpus
> >> [Patch 4]
> >> (*) Arch changes required by guest kernel ACPI CPU Hotplug framework.
> >>
> >>
> >> Salil Mehta (4):
> >> arm64: kernel: Handle disabled[(+)present] cpus in MADT/GICC during
> >> init
> >> arm64: kernel: Bound the total(present+disabled) cpus with nr_cpu_ids
> >> arm64: kernel: Init cpu operations for all possible vcpus
> >> arm64: kernel: Arch specific ACPI hooks(like logical cpuid<->hwid
> >> etc.)
> >>
> >> arch/arm64/kernel/smp.c | 153 ++++++++++++++++++++++++++++++++--------
> >> 1 file changed, 123 insertions(+), 30 deletions(-)
> >>
> >> --
> >> 2.17.1
> >>
> >