From: Isaku Yamahata <[email protected]>
Changes from v2:
- Replace the first patch("KVM: x86: Drop kvm_user_return_msr_cpu_online()")
with Sean's implementation
- Included all patches of "Improve KVM's interaction with CPU hotplug" [2]
Until v2, Tried to cherry-pick the least patches of it. It turned out that
all the patches are desirable.
This patch series is to implement the suggestion by Sean Christopherson [1]
to reorganize enable/disable cpu virtualization feature by replacing
the arch-generic current enable/disable logic with PM related hooks. And
convert kvm/x86 to use new hooks.
- Untable x86 hardware enable logic, snapshot MSRs for user return notifier,
enabling cpu virtualization on cpu online and platform resume. and real
enabling of CPU virtualization feature
- Introduce hooks related to PM.
- Convert kvm/x86 code to user those hooks.
- Split out hardware enabling/disabling logic into a separate file. Compile
it for non-x86 code. Once conversion of other KVM archs is done, this file
can be dropped.
- Delete cpus_hardware_enabled. 17/18 and 18/18
[1] https://lore.kernel.org/kvm/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
Changes from v1:
- Add a patch "KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section"
to make online/offline callback to run thread context to use mutex instead
of spin lock
- fixes pointed by Chao Gao
Chao Gao (4):
KVM: x86: Move check_processor_compatibility from init ops to runtime
ops
Partially revert "KVM: Pass kvm_init()'s opaque param to additional
arch funcs"
KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section
KVM: Do compatibility checks on hotplugged CPUs
Isaku Yamahata (15):
KVM: x86: Use this_cpu_ptr() instead of
per_cpu_ptr(smp_processor_id())
KVM: Do processor compatibility check on resume
KVM: Drop kvm_count_lock and instead protect kvm_usage_count with
kvm_lock
KVM: Add arch hooks for PM events with empty stub
KVM: x86: Move TSC fixup logic to KVM arch resume callback
KVM: Add arch hook when VM is added/deleted
KVM: Move out KVM arch PM hooks and hardware enable/disable logic
KVM: kvm_arch.c: Remove _nolock post fix
KVM: kvm_arch.c: Remove a global variable, hardware_enable_failed
KVM: x86: Duplicate arch callbacks related to pm events
KVM: Eliminate kvm_arch_post_init_vm()
KVM: x86: Delete kvm_arch_hardware_enable/disable()
KVM: Add config to not compile kvm_arch.c
RFC: KVM: x86: Remove cpus_hardware_enabled and related sanity check
RFC: KVM: Remove cpus_hardware_enabled and related sanity check
Marc Zyngier (1):
KVM: arm64: Simplify the CPUHP logic
Sean Christopherson (2):
KVM: x86: Drop kvm_user_return_msr_cpu_online()
KVM: Provide more information in kernel log if hardware enabling fails
Documentation/virt/kvm/locking.rst | 14 +-
arch/arm64/kvm/arch_timer.c | 27 ++--
arch/arm64/kvm/arm.c | 6 +-
arch/arm64/kvm/vgic/vgic-init.c | 19 +--
arch/mips/kvm/mips.c | 2 +-
arch/powerpc/kvm/powerpc.c | 2 +-
arch/riscv/kvm/main.c | 2 +-
arch/s390/kvm/kvm-s390.c | 2 +-
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/svm/svm.c | 4 +-
arch/x86/kvm/vmx/vmx.c | 24 ++--
arch/x86/kvm/x86.c | 177 ++++++++++++++++++++-----
include/kvm/arm_arch_timer.h | 4 +
include/kvm/arm_vgic.h | 4 +
include/linux/cpuhotplug.h | 5 +-
include/linux/kvm_host.h | 14 +-
virt/kvm/Kconfig | 3 +
virt/kvm/Makefile.kvm | 3 +
virt/kvm/kvm_arch.c | 127 ++++++++++++++++++
virt/kvm/kvm_main.c | 203 ++++++++++-------------------
22 files changed, 413 insertions(+), 233 deletions(-)
create mode 100644 virt/kvm/kvm_arch.c
base-commit: b90cb1053190353cc30f0fef0ef1f378ccc063c5
--
2.25.1
From: Isaku Yamahata <[email protected]>
convert per_cpu_ptr(smp_processor_id()) to this_cpu_ptr() as trivial
cleanup.
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Chao Gao <[email protected]>
---
arch/x86/kvm/x86.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0e200fe44b35..fd021581ca60 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -428,8 +428,7 @@ static void kvm_user_return_msr_init_cpu(struct kvm_user_return_msrs *msrs)
int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
{
- unsigned int cpu = smp_processor_id();
- struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
+ struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
int err;
kvm_user_return_msr_init_cpu(msrs);
@@ -453,8 +452,7 @@ EXPORT_SYMBOL_GPL(kvm_set_user_return_msr);
static void drop_user_return_notifiers(void)
{
- unsigned int cpu = smp_processor_id();
- struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
+ struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
if (msrs->registered)
kvm_on_user_return(&msrs->urn);
--
2.25.1
From: Chao Gao <[email protected]>
At init time, KVM does compatibility checks to ensure that all online
CPUs support hardware virtualization and a common set of features. But
KVM uses hotplugged CPUs without such compatibility checks. On Intel
CPUs, this leads to #GP if the hotplugged CPU doesn't support VMX or
vmentry failure if the hotplugged CPU doesn't meet minimal feature
requirements.
Do compatibility checks when onlining a CPU and abort the online process
if the hotplugged CPU is incompatible with online CPUs.
CPU hotplug is disabled during hardware_enable_all() to prevent the corner
case as shown below. A hotplugged CPU marks itself online in
cpu_online_mask (1) and enables interrupt (2) before invoking callbacks
registered in ONLINE section (3). So, if hardware_enable_all() is invoked
on another CPU right after (2), then on_each_cpu() in hardware_enable_all()
invokes hardware_enable_nolock() on the hotplugged CPU before
kvm_online_cpu() is called. This makes the CPU escape from compatibility
checks, which is risky.
start_secondary { ...
set_cpu_online(smp_processor_id(), true); <- 1
...
local_irq_enable(); <- 2
...
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE); <- 3
}
Keep compatibility checks at KVM init time. It can help to find
incompatibility issues earlier and refuse to load arch KVM module
(e.g., kvm-intel).
Loosen the WARN_ON in kvm_arch_check_processor_compat so that it
can be invoked from KVM's CPU hotplug callback (i.e., kvm_online_cpu).
Opportunistically, add a pr_err() for setup_vmcs_config() path in
vmx_check_processor_compatibility() so that each possible error path has
its own error message. Convert printk(KERN_ERR ... to pr_err to please
checkpatch.pl
Signed-off-by: Chao Gao <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 10 ++++++----
arch/x86/kvm/x86.c | 11 +++++++++--
virt/kvm/kvm_main.c | 18 +++++++++++++++++-
3 files changed, 32 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3cf7f18a4115..2a1ab6495299 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7421,20 +7421,22 @@ static int vmx_check_processor_compatibility(void)
{
struct vmcs_config vmcs_conf;
struct vmx_capability vmx_cap;
+ int cpu = smp_processor_id();
if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
!this_cpu_has(X86_FEATURE_VMX)) {
- pr_err("kvm: VMX is disabled on CPU %d\n", smp_processor_id());
+ pr_err("kvm: VMX is disabled on CPU %d\n", cpu);
return -EIO;
}
- if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
+ if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0) {
+ pr_err("kvm: failed to setup vmcs config on CPU %d\n", cpu);
return -EIO;
+ }
if (nested)
nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept);
if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
- printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
- smp_processor_id());
+ pr_err("kvm: CPU %d feature inconsistency!\n", cpu);
return -EIO;
}
return 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 53c8ee677f16..68def7ca224a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12000,9 +12000,16 @@ void kvm_arch_hardware_unsetup(void)
int kvm_arch_check_processor_compat(void)
{
- struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
+ int cpu = smp_processor_id();
+ struct cpuinfo_x86 *c = &cpu_data(cpu);
- WARN_ON(!irqs_disabled());
+ /*
+ * Compatibility checks are done when loading KVM or in KVM's CPU
+ * hotplug callback. It ensures all online CPUs are compatible to run
+ * vCPUs. For other cases, compatibility checks are unnecessary or
+ * even problematic. Try to detect improper usages here.
+ */
+ WARN_ON(!irqs_disabled() && cpu_active(cpu));
if (__cr4_reserved_bits(cpu_has, c) !=
__cr4_reserved_bits(cpu_has, &boot_cpu_data))
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index db1303e2abc9..0ac00c711384 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5013,7 +5013,11 @@ static void hardware_enable_nolock(void *caller_name)
static int kvm_online_cpu(unsigned int cpu)
{
- int ret = 0;
+ int ret;
+
+ ret = kvm_arch_check_processor_compat();
+ if (ret)
+ return ret;
raw_spin_lock(&kvm_count_lock);
/*
@@ -5073,6 +5077,17 @@ static int hardware_enable_all(void)
{
int r = 0;
+ /*
+ * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
+ * is called. on_each_cpu() between them includes the CPU. As a result,
+ * hardware_enable_nolock() may get invoked before kvm_online_cpu().
+ * This would enable hardware virtualization on that cpu without
+ * compatibility checks, which can potentially crash system or break
+ * running VMs.
+ *
+ * Disable CPU hotplug to prevent this case from happening.
+ */
+ cpus_read_lock();
raw_spin_lock(&kvm_count_lock);
kvm_usage_count++;
@@ -5087,6 +5102,7 @@ static int hardware_enable_all(void)
}
raw_spin_unlock(&kvm_count_lock);
+ cpus_read_unlock();
return r;
}
--
2.25.1
From: Isaku Yamahata <[email protected]>
and pass kvm_usage_count with kvm_lock. Move kvm_arch_post_init_vm() under
kvm_arch_add_vm(). Later kvm_arch_post_init_vm() is deleted once x86
overrides kvm_arch_add_vm().
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
include/linux/kvm_host.h | 2 ++
virt/kvm/kvm_arch.c | 12 +++++++++++-
virt/kvm/kvm_main.c | 21 +++++++++++++++++----
3 files changed, 30 insertions(+), 5 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index dd2a6d98d4de..f78364e01ca9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1445,6 +1445,8 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
int kvm_arch_post_init_vm(struct kvm *kvm);
+int kvm_arch_add_vm(struct kvm *kvm, int usage_count);
+int kvm_arch_del_vm(int usage_count);
void kvm_arch_pre_destroy_vm(struct kvm *kvm);
int kvm_arch_create_vm_debugfs(struct kvm *kvm);
diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index 4748a76bcb03..0eac996f4981 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -10,11 +10,21 @@
#include <linux/kvm_host.h>
+__weak int kvm_arch_post_init_vm(struct kvm *kvm)
+{
+ return 0;
+}
+
/*
* Called after the VM is otherwise initialized, but just before adding it to
* the vm_list.
*/
-__weak int kvm_arch_post_init_vm(struct kvm *kvm)
+__weak int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
+{
+ return kvm_arch_post_init_vm(kvm);
+}
+
+__weak int kvm_arch_del_vm(int usage_count)
{
return 0;
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e62240fb8474..90e1dcfc9ace 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -145,6 +145,7 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
static int hardware_enable_all(void);
static void hardware_disable_all(void);
static void hardware_disable_nolock(void *junk);
+static void kvm_del_vm(void);
static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
@@ -1215,11 +1216,12 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
if (r)
goto out_err_no_debugfs;
- r = kvm_arch_post_init_vm(kvm);
- if (r)
- goto out_err;
-
mutex_lock(&kvm_lock);
+ r = kvm_arch_add_vm(kvm, kvm_usage_count);
+ if (r) {
+ mutex_unlock(&kvm_lock);
+ goto out_err;
+ }
list_add(&kvm->vm_list, &vm_list);
mutex_unlock(&kvm_lock);
@@ -1239,6 +1241,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
#endif
out_err_no_mmu_notifier:
hardware_disable_all();
+ kvm_del_vm();
out_err_no_disable:
kvm_arch_destroy_vm(kvm);
out_err_no_arch_destroy_vm:
@@ -1319,6 +1322,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
kvm_arch_free_vm(kvm);
preempt_notifier_dec();
hardware_disable_all();
+ kvm_del_vm();
mmdrop(mm);
module_put(kvm_chardev_ops.owner);
}
@@ -5082,6 +5086,15 @@ static void hardware_disable_all(void)
cpus_read_unlock();
}
+static void kvm_del_vm(void)
+{
+ cpus_read_lock();
+ mutex_lock(&kvm_lock);
+ kvm_arch_del_vm(kvm_usage_count);
+ mutex_unlock(&kvm_lock);
+ cpus_read_unlock();
+}
+
static int hardware_enable_all(void)
{
int r = 0;
--
2.25.1
From: Isaku Yamahata <[email protected]>
KVM/X86 can change those callbacks without worrying about breaking other
archs.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 131 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 126 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f5f4d8eed588..9a28eb5fbb76 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11845,6 +11845,130 @@ void kvm_arch_hardware_disable(void)
drop_user_return_notifiers();
}
+static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
+
+int kvm_arch_post_init_vm(struct kvm *kvm)
+{
+ return kvm_mmu_post_init_vm(kvm);
+}
+
+static int __hardware_enable(void *caller_name)
+{
+ int cpu = raw_smp_processor_id();
+ int r;
+
+ WARN_ON_ONCE(preemptible());
+
+ if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
+ return 0;
+ r = kvm_arch_hardware_enable();
+ if (r)
+ pr_warn("kvm: enabling virtualization on CPU%d failed during %s()\n",
+ cpu, (const char *)caller_name);
+ else
+ cpumask_set_cpu(cpu, &cpus_hardware_enabled);
+ return r;
+}
+
+static void hardware_enable(void *arg)
+{
+ atomic_t *failed = arg;
+
+ if (__hardware_enable((void *)__func__))
+ atomic_inc(failed);
+}
+
+static void hardware_disable(void *junk)
+{
+ int cpu = raw_smp_processor_id();
+
+ WARN_ON_ONCE(preemptible());
+
+ if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
+ return;
+ cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
+ kvm_arch_hardware_disable();
+}
+
+void kvm_arch_pre_hardware_unsetup(void)
+{
+ on_each_cpu(hardware_disable, NULL, 1);
+}
+
+/*
+ * Called after the VM is otherwise initialized, but just before adding it to
+ * the vm_list.
+ */
+int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
+{
+ atomic_t failed;
+ int r = 0;
+
+ if (usage_count != 1)
+ return 0;
+
+ atomic_set(&failed, 0);
+ on_each_cpu(hardware_enable, &failed, 1);
+
+ if (atomic_read(&failed)) {
+ r = -EBUSY;
+ goto err;
+ }
+
+ r = kvm_arch_post_init_vm(kvm);
+err:
+ if (r)
+ on_each_cpu(hardware_disable, NULL, 1);
+ return r;
+}
+
+int kvm_arch_del_vm(int usage_count)
+{
+ if (usage_count)
+ return 0;
+
+ on_each_cpu(hardware_disable, NULL, 1);
+ return 0;
+}
+
+int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
+{
+ int r;
+
+ r = kvm_arch_check_processor_compat();
+ if (r)
+ return r;
+
+ if (!usage_count)
+ return 0;
+ /*
+ * Abort the CPU online process if hardware virtualization cannot
+ * be enabled. Otherwise running VMs would encounter unrecoverable
+ * errors when scheduled to this CPU.
+ */
+ return __hardware_enable((void *)__func__);
+}
+
+int kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
+{
+ if (usage_count)
+ hardware_disable(NULL);
+ return 0;
+}
+
+int kvm_arch_reboot(int val)
+{
+ on_each_cpu(hardware_disable, NULL, 1);
+ return NOTIFY_OK;
+}
+
+int kvm_arch_suspend(int usage_count)
+{
+ if (usage_count)
+ hardware_disable(NULL);
+ return 0;
+}
+
void kvm_arch_resume(int usage_count)
{
struct kvm *kvm;
@@ -11857,6 +11981,8 @@ void kvm_arch_resume(int usage_count)
if (!usage_count)
return;
+ if (kvm_arch_check_processor_compat())
+ return;
if (kvm_arch_hardware_enable())
return;
@@ -12115,11 +12241,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return ret;
}
-int kvm_arch_post_init_vm(struct kvm *kvm)
-{
- return kvm_mmu_post_init_vm(kvm);
-}
-
static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
{
vcpu_load(vcpu);
--
2.25.1
From: Sean Christopherson <[email protected]>
KVM/X86 uses user return notifier to switch MSR for guest or user space.
Snapshot host values on CPU online, change MSR values for guest, and
restore them on returning to user space. The current code abuses
kvm_arch_hardware_enable() which is called on kvm module initialization or
CPU online.
Remove such the abuse of kvm_arch_hardware_enable() by capturing the host
value on the first change of the MSR value to guest VM instead of CPU
online.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 205ebdc2b11b..0e200fe44b35 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -196,6 +196,7 @@ module_param(eager_page_split, bool, 0644);
struct kvm_user_return_msrs {
struct user_return_notifier urn;
+ bool initialized;
bool registered;
struct kvm_user_return_msr_values {
u64 host;
@@ -409,18 +410,20 @@ int kvm_find_user_return_msr(u32 msr)
}
EXPORT_SYMBOL_GPL(kvm_find_user_return_msr);
-static void kvm_user_return_msr_cpu_online(void)
+static void kvm_user_return_msr_init_cpu(struct kvm_user_return_msrs *msrs)
{
- unsigned int cpu = smp_processor_id();
- struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
u64 value;
int i;
+ if (msrs->initialized)
+ return;
+
for (i = 0; i < kvm_nr_uret_msrs; ++i) {
rdmsrl_safe(kvm_uret_msrs_list[i], &value);
msrs->values[i].host = value;
msrs->values[i].curr = value;
}
+ msrs->initialized = true;
}
int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
@@ -429,6 +432,8 @@ int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
int err;
+ kvm_user_return_msr_init_cpu(msrs);
+
value = (value & mask) | (msrs->values[slot].host & ~mask);
if (value == msrs->values[slot].curr)
return 0;
@@ -9212,7 +9217,12 @@ int kvm_arch_init(void *opaque)
return -ENOMEM;
}
- user_return_msrs = alloc_percpu(struct kvm_user_return_msrs);
+ /*
+ * __GFP_ZERO to ensure user_return_msrs.values[].initialized = false.
+ * See kvm_user_return_msr_init_cpu().
+ */
+ user_return_msrs = alloc_percpu_gfp(struct kvm_user_return_msrs,
+ GFP_KERNEL | __GFP_ZERO);
if (!user_return_msrs) {
printk(KERN_ERR "kvm: failed to allocate percpu kvm_user_return_msrs\n");
r = -ENOMEM;
@@ -11836,7 +11846,6 @@ int kvm_arch_hardware_enable(void)
u64 max_tsc = 0;
bool stable, backwards_tsc = false;
- kvm_user_return_msr_cpu_online();
ret = static_call(kvm_x86_hardware_enable)();
if (ret != 0)
return ret;
--
2.25.1
From: Isaku Yamahata <[email protected]>
Add arch hooks for reboot, suspend, resume, and CPU-online/offline events
with empty stub functions.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
include/linux/kvm_host.h | 6 +++++
virt/kvm/Makefile.kvm | 2 +-
virt/kvm/kvm_arch.c | 44 +++++++++++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 50 +++++++++++++++++++++++++---------------
4 files changed, 82 insertions(+), 20 deletions(-)
create mode 100644 virt/kvm/kvm_arch.c
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index eab352902de7..dd2a6d98d4de 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1448,6 +1448,12 @@ int kvm_arch_post_init_vm(struct kvm *kvm);
void kvm_arch_pre_destroy_vm(struct kvm *kvm);
int kvm_arch_create_vm_debugfs(struct kvm *kvm);
+int kvm_arch_suspend(int usage_count);
+void kvm_arch_resume(int usage_count);
+int kvm_arch_reboot(int val);
+int kvm_arch_online_cpu(unsigned int cpu, int usage_count);
+int kvm_arch_offline_cpu(unsigned int cpu, int usage_count);
+
#ifndef __KVM_HAVE_ARCH_VM_ALLOC
/*
* All architectures that want to use vzalloc currently also
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 2c27d5d0c367..c4210acabd35 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -5,7 +5,7 @@
KVM ?= ../../../virt/kvm
-kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
+kvm-y := $(KVM)/kvm_main.o $(KVM)/kvm_arch.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
kvm-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
new file mode 100644
index 000000000000..4748a76bcb03
--- /dev/null
+++ b/virt/kvm/kvm_arch.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * kvm_arch.c: kvm default arch hooks for hardware enabling/disabling
+ * Copyright (c) 2022 Intel Corporation.
+ *
+ * Author:
+ * Isaku Yamahata <[email protected]>
+ * <[email protected]>
+ */
+
+#include <linux/kvm_host.h>
+
+/*
+ * Called after the VM is otherwise initialized, but just before adding it to
+ * the vm_list.
+ */
+__weak int kvm_arch_post_init_vm(struct kvm *kvm)
+{
+ return 0;
+}
+
+__weak int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
+{
+ return 0;
+}
+
+__weak int kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
+{
+ return 0;
+}
+
+__weak int kvm_arch_reboot(int val)
+{
+ return NOTIFY_OK;
+}
+
+__weak int kvm_arch_suspend(int usage_count)
+{
+ return 0;
+}
+
+__weak void kvm_arch_resume(int usage_count)
+{
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 082d5dbc8d7f..e62240fb8474 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -144,6 +144,7 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
#endif
static int hardware_enable_all(void);
static void hardware_disable_all(void);
+static void hardware_disable_nolock(void *junk);
static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
@@ -1097,15 +1098,6 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, const char *fdname)
return ret;
}
-/*
- * Called after the VM is otherwise initialized, but just before adding it to
- * the vm_list.
- */
-int __weak kvm_arch_post_init_vm(struct kvm *kvm)
-{
- return 0;
-}
-
/*
* Called just after removing the VM from the vm_list, but before doing any
* other destruction.
@@ -5033,6 +5025,10 @@ static int kvm_online_cpu(unsigned int cpu)
if (atomic_read(&hardware_enable_failed)) {
atomic_set(&hardware_enable_failed, 0);
ret = -EIO;
+ } else {
+ ret = kvm_arch_online_cpu(cpu, kvm_usage_count);
+ if (ret)
+ hardware_disable_nolock(NULL);
}
}
mutex_unlock(&kvm_lock);
@@ -5053,11 +5049,19 @@ static void hardware_disable_nolock(void *junk)
static int kvm_offline_cpu(unsigned int cpu)
{
+ int ret = 0;
+
mutex_lock(&kvm_lock);
- if (kvm_usage_count)
+ if (kvm_usage_count) {
hardware_disable_nolock(NULL);
+ ret = kvm_arch_offline_cpu(cpu, kvm_usage_count);
+ if (ret) {
+ (void)hardware_enable_nolock(NULL);
+ atomic_set(&hardware_enable_failed, 0);
+ }
+ }
mutex_unlock(&kvm_lock);
- return 0;
+ return ret;
}
static void hardware_disable_all_nolock(void)
@@ -5115,6 +5119,8 @@ static int hardware_enable_all(void)
static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
void *v)
{
+ int r;
+
/*
* Some (well, at least mine) BIOSes hang on reboot if
* in vmx root mode.
@@ -5123,8 +5129,15 @@ static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
*/
pr_info("kvm: exiting hardware virtualization\n");
kvm_rebooting = true;
+
+ /* This hook is called without cpuhotplug disabled. */
+ cpus_read_lock();
+ mutex_lock(&kvm_lock);
on_each_cpu(hardware_disable_nolock, NULL, 1);
- return NOTIFY_OK;
+ r = kvm_arch_reboot(val);
+ mutex_unlock(&kvm_lock);
+ cpus_read_unlock();
+ return r;
}
static struct notifier_block kvm_reboot_notifier = {
@@ -5718,11 +5731,10 @@ static int kvm_suspend(void)
* cpu_hotplug_disable() and other CPUs are offlined. No need for
* locking.
*/
- if (kvm_usage_count) {
- lockdep_assert_not_held(&kvm_lock);
+ lockdep_assert_not_held(&kvm_lock);
+ if (kvm_usage_count)
hardware_disable_nolock(NULL);
- }
- return 0;
+ return kvm_arch_suspend(kvm_usage_count);
}
static void kvm_resume(void)
@@ -5734,10 +5746,10 @@ static void kvm_resume(void)
*/
return; /* FIXME: disable KVM */
- if (kvm_usage_count) {
- lockdep_assert_not_held(&kvm_lock);
+ lockdep_assert_not_held(&kvm_lock);
+ kvm_arch_resume(kvm_usage_count);
+ if (kvm_usage_count)
hardware_enable_nolock((void *)__func__);
- }
}
static struct syscore_ops kvm_syscore_ops = {
--
2.25.1
From: Isaku Yamahata <[email protected]>
So that kvm_arch_hardware_enable/disable() aren't defined.
Once the conversion of all KVM archs is done, this config and kvm_arch.c
should be removed.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
include/linux/kvm_host.h | 2 ++
virt/kvm/Kconfig | 3 +++
virt/kvm/Makefile.kvm | 5 ++++-
4 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index e3cbd7706136..e2e16205425d 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -25,6 +25,7 @@ config KVM
depends on X86_LOCAL_APIC
select PREEMPT_NOTIFIERS
select MMU_NOTIFIER
+ select HAVE_KVM_OVERRIDE_HARDWARE_ENABLE
select HAVE_KVM_IRQCHIP
select HAVE_KVM_PFNCACHE
select HAVE_KVM_IRQFD
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8abbf7a1773b..74111118db42 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1434,8 +1434,10 @@ void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_
static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
#endif
+#ifndef CONFIG_HAVE_KVM_OVERRIDE_HARDWARE_ENABLE
int kvm_arch_hardware_enable(void);
void kvm_arch_hardware_disable(void);
+#endif
int kvm_arch_hardware_setup(void *opaque);
void kvm_arch_pre_hardware_unsetup(void);
void kvm_arch_hardware_unsetup(void);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index a8c5c9f06b3c..917314a87696 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -72,3 +72,6 @@ config KVM_XFER_TO_GUEST_WORK
config HAVE_KVM_PM_NOTIFIER
bool
+
+config HAVE_KVM_OVERRIDE_HARDWARE_ENABLE
+ def_bool n
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index c4210acabd35..c0187ec4f83c 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -5,7 +5,10 @@
KVM ?= ../../../virt/kvm
-kvm-y := $(KVM)/kvm_main.o $(KVM)/kvm_arch.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
+kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
+ifneq ($(CONFIG_HAVE_KVM_OVERRIDE_HARDWARE_ENABLE), y)
+kvm-y += $(KVM)/kvm_arch.o
+endif
kvm-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
--
2.25.1
From: Isaku Yamahata <[email protected]>
Now they're function call and there is no point to keep them.
Opportunistically make kvm_arch_pre_hardware_unsetup() empty.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 24 ++++++++----------------
1 file changed, 8 insertions(+), 16 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fa68bea655f0..f0382f3d5baf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -354,7 +354,7 @@ static void kvm_on_user_return(struct user_return_notifier *urn)
/*
* Disabling irqs at this point since the following code could be
- * interrupted and executed through kvm_arch_hardware_disable()
+ * interrupted and executed through hardware_disable()
*/
local_irq_save(flags);
if (msrs->registered) {
@@ -11834,17 +11834,6 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);
-int kvm_arch_hardware_enable(void)
-{
- return static_call(kvm_x86_hardware_enable)();
-}
-
-void kvm_arch_hardware_disable(void)
-{
- static_call(kvm_x86_hardware_disable)();
- drop_user_return_notifiers();
-}
-
static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
static int __hardware_enable(void *caller_name)
@@ -11856,7 +11845,7 @@ static int __hardware_enable(void *caller_name)
if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
return 0;
- r = kvm_arch_hardware_enable();
+ r = static_call(kvm_x86_hardware_enable)();
if (r)
pr_warn("kvm: enabling virtualization on CPU%d failed during %s()\n",
cpu, (const char *)caller_name);
@@ -11882,12 +11871,13 @@ static void hardware_disable(void *junk)
if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
return;
cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
- kvm_arch_hardware_disable();
+ static_call(kvm_x86_hardware_disable)();
+ drop_user_return_notifiers();
}
void kvm_arch_pre_hardware_unsetup(void)
{
- on_each_cpu(hardware_disable, NULL, 1);
+ /* TODO: eliminate this function */
}
/*
@@ -11978,7 +11968,7 @@ void kvm_arch_resume(int usage_count)
if (kvm_arch_check_processor_compat())
return;
- if (kvm_arch_hardware_enable())
+ if (static_call(kvm_x86_hardware_enable)())
return;
local_tsc = rdtsc();
@@ -12119,6 +12109,8 @@ int kvm_arch_hardware_setup(void *opaque)
void kvm_arch_hardware_unsetup(void)
{
+ on_each_cpu(hardware_disable, NULL, 1);
+
kvm_unregister_perf_callbacks();
static_call(kvm_x86_hardware_unsetup)();
--
2.25.1
From: Marc Zyngier <[email protected]>
For a number of historical reasons, the KVM/arm64 hotplug setup is pretty
complicated, and we have two extra CPUHP notifiers for vGIC and timers.
It looks pretty pointless, and gets in the way of further changes.
So let's just expose some helpers that can be called from the core
CPUHP callback, and get rid of everything else.
This gives us the opportunity to drop a useless notifier entry,
as well as tidy-up the timer enable/disable, which was a bit odd.
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Chao Gao <[email protected]>
Reviewed-by: Oliver Upton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/arm64/kvm/arch_timer.c | 27 ++++++++++-----------------
arch/arm64/kvm/arm.c | 4 ++++
arch/arm64/kvm/vgic/vgic-init.c | 19 ++-----------------
include/kvm/arm_arch_timer.h | 4 ++++
include/kvm/arm_vgic.h | 4 ++++
include/linux/cpuhotplug.h | 3 ---
6 files changed, 24 insertions(+), 37 deletions(-)
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index bb24a76b4224..33fca1a691a5 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -811,10 +811,18 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
}
-static void kvm_timer_init_interrupt(void *info)
+void kvm_timer_cpu_up(void)
{
enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
- enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags);
+ if (host_ptimer_irq)
+ enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags);
+}
+
+void kvm_timer_cpu_down(void)
+{
+ disable_percpu_irq(host_vtimer_irq);
+ if (host_ptimer_irq)
+ disable_percpu_irq(host_ptimer_irq);
}
int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
@@ -976,18 +984,6 @@ void kvm_arm_timer_write_sysreg(struct kvm_vcpu *vcpu,
preempt_enable();
}
-static int kvm_timer_starting_cpu(unsigned int cpu)
-{
- kvm_timer_init_interrupt(NULL);
- return 0;
-}
-
-static int kvm_timer_dying_cpu(unsigned int cpu)
-{
- disable_percpu_irq(host_vtimer_irq);
- return 0;
-}
-
static int timer_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu)
{
if (vcpu)
@@ -1185,9 +1181,6 @@ int kvm_timer_hyp_init(bool has_gic)
goto out_free_irq;
}
- cpuhp_setup_state(CPUHP_AP_KVM_ARM_TIMER_STARTING,
- "kvm/arm/timer:starting", kvm_timer_starting_cpu,
- kvm_timer_dying_cpu);
return 0;
out_free_irq:
free_percpu_irq(host_vtimer_irq, kvm_get_running_vcpus());
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 3385fb57c11a..0a2f616c4d63 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1670,6 +1670,8 @@ static void _kvm_arch_hardware_enable(void *discard)
{
if (!__this_cpu_read(kvm_arm_hardware_enabled)) {
cpu_hyp_reinit();
+ kvm_vgic_cpu_up();
+ kvm_timer_cpu_up();
__this_cpu_write(kvm_arm_hardware_enabled, 1);
}
}
@@ -1683,6 +1685,8 @@ int kvm_arch_hardware_enable(void)
static void _kvm_arch_hardware_disable(void *discard)
{
if (__this_cpu_read(kvm_arm_hardware_enabled)) {
+ kvm_timer_cpu_down();
+ kvm_vgic_cpu_down();
cpu_hyp_reset();
__this_cpu_write(kvm_arm_hardware_enabled, 0);
}
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index f6d4f4052555..6c7f6ae21ec0 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -465,17 +465,15 @@ int kvm_vgic_map_resources(struct kvm *kvm)
/* GENERIC PROBE */
-static int vgic_init_cpu_starting(unsigned int cpu)
+void kvm_vgic_cpu_up(void)
{
enable_percpu_irq(kvm_vgic_global_state.maint_irq, 0);
- return 0;
}
-static int vgic_init_cpu_dying(unsigned int cpu)
+void kvm_vgic_cpu_down(void)
{
disable_percpu_irq(kvm_vgic_global_state.maint_irq);
- return 0;
}
static irqreturn_t vgic_maintenance_handler(int irq, void *data)
@@ -584,19 +582,6 @@ int kvm_vgic_hyp_init(void)
return ret;
}
- ret = cpuhp_setup_state(CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
- "kvm/arm/vgic:starting",
- vgic_init_cpu_starting, vgic_init_cpu_dying);
- if (ret) {
- kvm_err("Cannot register vgic CPU notifier\n");
- goto out_free_irq;
- }
-
kvm_info("vgic interrupt IRQ%d\n", kvm_vgic_global_state.maint_irq);
return 0;
-
-out_free_irq:
- free_percpu_irq(kvm_vgic_global_state.maint_irq,
- kvm_get_running_vcpus());
- return ret;
}
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index cd6d8f260eab..1638418f72dd 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -104,4 +104,8 @@ void kvm_arm_timer_write_sysreg(struct kvm_vcpu *vcpu,
u32 timer_get_ctl(struct arch_timer_context *ctxt);
u64 timer_get_cval(struct arch_timer_context *ctxt);
+/* CPU HP callbacks */
+void kvm_timer_cpu_up(void);
+void kvm_timer_cpu_down(void);
+
#endif
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 4df9e73a8bb5..fc4acc91ba06 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -431,4 +431,8 @@ int vgic_v4_load(struct kvm_vcpu *vcpu);
void vgic_v4_commit(struct kvm_vcpu *vcpu);
int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db);
+/* CPU HP callbacks */
+void kvm_vgic_cpu_up(void);
+void kvm_vgic_cpu_down(void);
+
#endif /* __KVM_ARM_VGIC_H */
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index f61447913db9..7337414e4947 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -186,9 +186,6 @@ enum cpuhp_state {
CPUHP_AP_TI_GP_TIMER_STARTING,
CPUHP_AP_HYPERV_TIMER_STARTING,
CPUHP_AP_KVM_STARTING,
- CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
- CPUHP_AP_KVM_ARM_VGIC_STARTING,
- CPUHP_AP_KVM_ARM_TIMER_STARTING,
/* Must be the last timer callback */
CPUHP_AP_DUMMY_TIMER_STARTING,
CPUHP_AP_ARM_XEN_STARTING,
--
2.25.1
From: Isaku Yamahata <[email protected]>
So far the processor compatibility check is not done on resume. It should
be done.
Signed-off-by: Isaku Yamahata <[email protected]>
---
virt/kvm/kvm_main.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0ac00c711384..fc55447c4dba 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5715,6 +5715,13 @@ static int kvm_suspend(void)
static void kvm_resume(void)
{
+ if (kvm_arch_check_processor_compat())
+ /*
+ * No warning here because kvm_arch_check_processor_compat()
+ * would have warned with more information.
+ */
+ return; /* FIXME: disable KVM */
+
if (kvm_usage_count) {
lockdep_assert_not_held(&kvm_count_lock);
hardware_enable_nolock((void *)__func__);
--
2.25.1
From: Isaku Yamahata <[email protected]>
To make clear that those files are default implementation that KVM/x86 (and
other KVM arch in future) will override them, split out those into a single
file. Once conversions for all kvm archs are done, the file will be
deleted. kvm_arch_pre_hardware_unsetup() is introduced to avoid cross-arch
code churn for now. Once it's settled down,
kvm_arch_pre_hardware_unsetup() can be merged into
kvm_arch_hardware_unsetup() in each arch code.
Signed-off-by: Isaku Yamahata <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_arch.c | 103 ++++++++++++++++++++++-
virt/kvm/kvm_main.c | 172 +++++----------------------------------
3 files changed, 124 insertions(+), 152 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f78364e01ca9..60f4ae9d6f48 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1437,6 +1437,7 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
int kvm_arch_hardware_enable(void);
void kvm_arch_hardware_disable(void);
int kvm_arch_hardware_setup(void *opaque);
+void kvm_arch_pre_hardware_unsetup(void);
void kvm_arch_hardware_unsetup(void);
int kvm_arch_check_processor_compat(void);
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index 0eac996f4981..0648d4463d9e 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -6,49 +6,148 @@
* Author:
* Isaku Yamahata <[email protected]>
* <[email protected]>
+ *
+ * TODO: Delete this file once the conversion of all KVM arch is done.
*/
#include <linux/kvm_host.h>
+static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
+static atomic_t hardware_enable_failed;
+
__weak int kvm_arch_post_init_vm(struct kvm *kvm)
{
return 0;
}
+static void hardware_enable_nolock(void *caller_name)
+{
+ int cpu = raw_smp_processor_id();
+ int r;
+
+ WARN_ON_ONCE(preemptible());
+
+ if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
+ return;
+
+ cpumask_set_cpu(cpu, &cpus_hardware_enabled);
+
+ r = kvm_arch_hardware_enable();
+
+ if (r) {
+ cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
+ atomic_inc(&hardware_enable_failed);
+ pr_warn("kvm: enabling virtualization on CPU%d failed during %s()\n",
+ cpu, (const char *)caller_name);
+ }
+}
+
+static void hardware_disable_nolock(void *junk)
+{
+ int cpu = raw_smp_processor_id();
+
+ WARN_ON_ONCE(preemptible());
+
+ if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
+ return;
+ cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
+ kvm_arch_hardware_disable();
+}
+
+__weak void kvm_arch_pre_hardware_unsetup(void)
+{
+ on_each_cpu(hardware_disable_nolock, NULL, 1);
+}
+
/*
* Called after the VM is otherwise initialized, but just before adding it to
* the vm_list.
*/
__weak int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
{
- return kvm_arch_post_init_vm(kvm);
+ int r = 0;
+
+ if (usage_count != 1)
+ return 0;
+
+ atomic_set(&hardware_enable_failed, 0);
+ on_each_cpu(hardware_enable_nolock, (void *)__func__, 1);
+
+ if (atomic_read(&hardware_enable_failed)) {
+ r = -EBUSY;
+ goto err;
+ }
+
+ r = kvm_arch_post_init_vm(kvm);
+err:
+ if (r)
+ on_each_cpu(hardware_disable_nolock, NULL, 1);
+ return r;
}
__weak int kvm_arch_del_vm(int usage_count)
{
+ if (usage_count)
+ return 0;
+
+ on_each_cpu(hardware_disable_nolock, NULL, 1);
return 0;
}
__weak int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
{
- return 0;
+ int ret = 0;
+
+ ret = kvm_arch_check_processor_compat();
+ if (ret)
+ return ret;
+
+ /*
+ * Abort the CPU online process if hardware virtualization cannot
+ * be enabled. Otherwise running VMs would encounter unrecoverable
+ * errors when scheduled to this CPU.
+ */
+ if (usage_count) {
+ WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
+
+ hardware_enable_nolock((void *)__func__);
+ if (atomic_read(&hardware_enable_failed)) {
+ atomic_set(&hardware_enable_failed, 0);
+ ret = -EIO;
+ }
+ }
+ return ret;
}
__weak int kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
{
+ if (usage_count)
+ hardware_disable_nolock(NULL);
return 0;
}
__weak int kvm_arch_reboot(int val)
{
+ on_each_cpu(hardware_disable_nolock, NULL, 1);
return NOTIFY_OK;
}
__weak int kvm_arch_suspend(int usage_count)
{
+ if (usage_count)
+ hardware_disable_nolock(NULL);
return 0;
}
__weak void kvm_arch_resume(int usage_count)
{
+ if (kvm_arch_check_processor_compat())
+ /*
+ * No warning here because kvm_arch_check_processor_compat()
+ * would have warned with more information.
+ */
+ return; /* FIXME: disable KVM */
+
+ if (usage_count)
+ hardware_enable_nolock((void *)__func__);
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 90e1dcfc9ace..5373127dcdb6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -102,9 +102,7 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
DEFINE_MUTEX(kvm_lock);
LIST_HEAD(vm_list);
-static cpumask_var_t cpus_hardware_enabled;
static int kvm_usage_count;
-static atomic_t hardware_enable_failed;
static struct kmem_cache *kvm_vcpu_cache;
@@ -142,9 +140,6 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
#define KVM_COMPAT(c) .compat_ioctl = kvm_no_compat_ioctl, \
.open = kvm_no_compat_open
#endif
-static int hardware_enable_all(void);
-static void hardware_disable_all(void);
-static void hardware_disable_nolock(void *junk);
static void kvm_del_vm(void);
static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
@@ -1196,10 +1191,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
if (r)
goto out_err_no_arch_destroy_vm;
- r = hardware_enable_all();
- if (r)
- goto out_err_no_disable;
-
#ifdef CONFIG_HAVE_KVM_IRQFD
INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list);
#endif
@@ -1216,14 +1207,28 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
if (r)
goto out_err_no_debugfs;
+ /*
+ * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
+ * is called. on_each_cpu() between them includes the CPU. As a result,
+ * hardware_enable_nolock() may get invoked before kvm_online_cpu().
+ * This would enable hardware virtualization on that cpu without
+ * compatibility checks, which can potentially crash system or break
+ * running VMs.
+ *
+ * Disable CPU hotplug to prevent this case from happening.
+ */
+ cpus_read_lock();
mutex_lock(&kvm_lock);
+ kvm_usage_count++;
r = kvm_arch_add_vm(kvm, kvm_usage_count);
if (r) {
+ /* the following kvm_del_vm() decrements kvm_usage_count. */
mutex_unlock(&kvm_lock);
goto out_err;
}
list_add(&kvm->vm_list, &vm_list);
mutex_unlock(&kvm_lock);
+ cpus_read_unlock();
preempt_notifier_inc();
kvm_init_pm_notifier(kvm);
@@ -1240,9 +1245,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
#endif
out_err_no_mmu_notifier:
- hardware_disable_all();
kvm_del_vm();
-out_err_no_disable:
kvm_arch_destroy_vm(kvm);
out_err_no_arch_destroy_vm:
WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
@@ -1321,7 +1324,6 @@ static void kvm_destroy_vm(struct kvm *kvm)
cleanup_srcu_struct(&kvm->srcu);
kvm_arch_free_vm(kvm);
preempt_notifier_dec();
- hardware_disable_all();
kvm_del_vm();
mmdrop(mm);
module_put(kvm_chardev_ops.owner);
@@ -4986,149 +4988,37 @@ static struct miscdevice kvm_dev = {
&kvm_chardev_ops,
};
-static void hardware_enable_nolock(void *caller_name)
-{
- int cpu = raw_smp_processor_id();
- int r;
-
- WARN_ON_ONCE(preemptible());
-
- if (cpumask_test_cpu(cpu, cpus_hardware_enabled))
- return;
-
- cpumask_set_cpu(cpu, cpus_hardware_enabled);
-
- r = kvm_arch_hardware_enable();
-
- if (r) {
- cpumask_clear_cpu(cpu, cpus_hardware_enabled);
- atomic_inc(&hardware_enable_failed);
- pr_warn("kvm: enabling virtualization on CPU%d failed during %s()\n",
- cpu, (const char *)caller_name);
- }
-}
-
static int kvm_online_cpu(unsigned int cpu)
{
int ret;
- ret = kvm_arch_check_processor_compat();
- if (ret)
- return ret;
-
mutex_lock(&kvm_lock);
- /*
- * Abort the CPU online process if hardware virtualization cannot
- * be enabled. Otherwise running VMs would encounter unrecoverable
- * errors when scheduled to this CPU.
- */
- if (kvm_usage_count) {
- WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
-
- hardware_enable_nolock((void *)__func__);
- if (atomic_read(&hardware_enable_failed)) {
- atomic_set(&hardware_enable_failed, 0);
- ret = -EIO;
- } else {
- ret = kvm_arch_online_cpu(cpu, kvm_usage_count);
- if (ret)
- hardware_disable_nolock(NULL);
- }
- }
+ ret = kvm_arch_online_cpu(cpu, kvm_usage_count);
mutex_unlock(&kvm_lock);
return ret;
}
-static void hardware_disable_nolock(void *junk)
-{
- int cpu = raw_smp_processor_id();
-
- WARN_ON_ONCE(preemptible());
-
- if (!cpumask_test_cpu(cpu, cpus_hardware_enabled))
- return;
- cpumask_clear_cpu(cpu, cpus_hardware_enabled);
- kvm_arch_hardware_disable();
-}
-
static int kvm_offline_cpu(unsigned int cpu)
{
- int ret = 0;
+ int ret;
mutex_lock(&kvm_lock);
- if (kvm_usage_count) {
- hardware_disable_nolock(NULL);
- ret = kvm_arch_offline_cpu(cpu, kvm_usage_count);
- if (ret) {
- (void)hardware_enable_nolock(NULL);
- atomic_set(&hardware_enable_failed, 0);
- }
- }
+ ret = kvm_arch_offline_cpu(cpu, kvm_usage_count);
mutex_unlock(&kvm_lock);
return ret;
}
-static void hardware_disable_all_nolock(void)
-{
- BUG_ON(!kvm_usage_count);
-
- kvm_usage_count--;
- if (!kvm_usage_count)
- on_each_cpu(hardware_disable_nolock, NULL, 1);
-}
-
-static void hardware_disable_all(void)
-{
- cpus_read_lock();
- mutex_lock(&kvm_lock);
- hardware_disable_all_nolock();
- mutex_unlock(&kvm_lock);
- cpus_read_unlock();
-}
-
static void kvm_del_vm(void)
{
cpus_read_lock();
mutex_lock(&kvm_lock);
+ WARN_ON_ONCE(!kvm_usage_count);
+ kvm_usage_count--;
kvm_arch_del_vm(kvm_usage_count);
mutex_unlock(&kvm_lock);
cpus_read_unlock();
}
-static int hardware_enable_all(void)
-{
- int r = 0;
-
- /*
- * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
- * is called. on_each_cpu() between them includes the CPU. As a result,
- * hardware_enable_nolock() may get invoked before kvm_online_cpu().
- * This would enable hardware virtualization on that cpu without
- * compatibility checks, which can potentially crash system or break
- * running VMs.
- *
- * Disable CPU hotplug to prevent this case from happening.
- */
- cpus_read_lock();
- mutex_lock(&kvm_lock);
-
- kvm_usage_count++;
- if (kvm_usage_count == 1) {
- atomic_set(&hardware_enable_failed, 0);
- on_each_cpu(hardware_enable_nolock, (void *)__func__, 1);
-
- if (atomic_read(&hardware_enable_failed)) {
- hardware_disable_all_nolock();
- r = -EBUSY;
- }
- }
-
- mutex_unlock(&kvm_lock);
- cpus_read_unlock();
-
- return r;
-}
-
static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
void *v)
{
@@ -5146,7 +5036,6 @@ static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
/* This hook is called without cpuhotplug disabled. */
cpus_read_lock();
mutex_lock(&kvm_lock);
- on_each_cpu(hardware_disable_nolock, NULL, 1);
r = kvm_arch_reboot(val);
mutex_unlock(&kvm_lock);
cpus_read_unlock();
@@ -5745,24 +5634,13 @@ static int kvm_suspend(void)
* locking.
*/
lockdep_assert_not_held(&kvm_lock);
- if (kvm_usage_count)
- hardware_disable_nolock(NULL);
return kvm_arch_suspend(kvm_usage_count);
}
static void kvm_resume(void)
{
- if (kvm_arch_check_processor_compat())
- /*
- * No warning here because kvm_arch_check_processor_compat()
- * would have warned with more information.
- */
- return; /* FIXME: disable KVM */
-
lockdep_assert_not_held(&kvm_lock);
kvm_arch_resume(kvm_usage_count);
- if (kvm_usage_count)
- hardware_enable_nolock((void *)__func__);
}
static struct syscore_ops kvm_syscore_ops = {
@@ -5900,11 +5778,6 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
if (r)
goto out_irqfd;
- if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL)) {
- r = -ENOMEM;
- goto out_free_0;
- }
-
r = kvm_arch_hardware_setup(opaque);
if (r < 0)
goto out_free_1;
@@ -5981,8 +5854,6 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
out_free_2:
kvm_arch_hardware_unsetup();
out_free_1:
- free_cpumask_var(cpus_hardware_enabled);
-out_free_0:
kvm_irqfd_exit();
out_irqfd:
kvm_arch_exit();
@@ -6004,11 +5875,12 @@ void kvm_exit(void)
unregister_syscore_ops(&kvm_syscore_ops);
unregister_reboot_notifier(&kvm_reboot_notifier);
cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
- on_each_cpu(hardware_disable_nolock, NULL, 1);
+ cpus_read_lock();
+ kvm_arch_pre_hardware_unsetup();
kvm_arch_hardware_unsetup();
+ cpus_read_unlock();
kvm_arch_exit();
kvm_irqfd_exit();
- free_cpumask_var(cpus_hardware_enabled);
kvm_vfio_ops_exit();
}
EXPORT_SYMBOL_GPL(kvm_exit);
--
2.25.1
From: Isaku Yamahata <[email protected]>
Now kvm_arch_post_init_vm() is used only by x86 kvm_arch_add_vm(). Other
arch doesn't define it. Merge x86 kvm_arch_post_init_vm() int x86
kvm_arch_add_vm() and eliminate kvm_arch_post_init_vm().
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 7 +------
include/linux/kvm_host.h | 1 -
virt/kvm/kvm_arch.c | 12 +-----------
3 files changed, 2 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9a28eb5fbb76..fa68bea655f0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11847,11 +11847,6 @@ void kvm_arch_hardware_disable(void)
static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
-int kvm_arch_post_init_vm(struct kvm *kvm)
-{
- return kvm_mmu_post_init_vm(kvm);
-}
-
static int __hardware_enable(void *caller_name)
{
int cpu = raw_smp_processor_id();
@@ -11915,7 +11910,7 @@ int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
goto err;
}
- r = kvm_arch_post_init_vm(kvm);
+ r = kvm_mmu_post_init_vm(kvm);
err:
if (r)
on_each_cpu(hardware_disable, NULL, 1);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 60f4ae9d6f48..8abbf7a1773b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1445,7 +1445,6 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
-int kvm_arch_post_init_vm(struct kvm *kvm);
int kvm_arch_add_vm(struct kvm *kvm, int usage_count);
int kvm_arch_del_vm(int usage_count);
void kvm_arch_pre_destroy_vm(struct kvm *kvm);
diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index f7dcde842eb5..bcd82b75fa17 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -14,11 +14,6 @@
static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
-__weak int kvm_arch_post_init_vm(struct kvm *kvm)
-{
- return 0;
-}
-
static int __hardware_enable(void *caller_name)
{
int cpu = raw_smp_processor_id();
@@ -79,13 +74,8 @@ __weak int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
if (atomic_read(&failed)) {
r = -EBUSY;
- goto err;
- }
-
- r = kvm_arch_post_init_vm(kvm);
-err:
- if (r)
on_each_cpu(hardware_disable, NULL, 1);
+ }
return r;
}
--
2.25.1
From: Isaku Yamahata <[email protected]>
commit 0dd6a6edb012 ("KVM: Dont mark TSC unstable due to S4 suspend") made
use of kvm_arch_hardware_enable() callback to detect that TSC goes backward
due to S4 suspend. It has to check it only when resuming from S4. Not
every time virtualization hardware ennoblement. Move the logic to
kvm_arch_resume() callback.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 68def7ca224a..f5f4d8eed588 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11835,18 +11835,30 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);
int kvm_arch_hardware_enable(void)
+{
+ return static_call(kvm_x86_hardware_enable)();
+}
+
+void kvm_arch_hardware_disable(void)
+{
+ static_call(kvm_x86_hardware_disable)();
+ drop_user_return_notifiers();
+}
+
+void kvm_arch_resume(int usage_count)
{
struct kvm *kvm;
struct kvm_vcpu *vcpu;
unsigned long i;
- int ret;
u64 local_tsc;
u64 max_tsc = 0;
bool stable, backwards_tsc = false;
- ret = static_call(kvm_x86_hardware_enable)();
- if (ret != 0)
- return ret;
+ if (!usage_count)
+ return;
+
+ if (kvm_arch_hardware_enable())
+ return;
local_tsc = rdtsc();
stable = !kvm_check_tsc_unstable();
@@ -11921,13 +11933,6 @@ int kvm_arch_hardware_enable(void)
}
}
- return 0;
-}
-
-void kvm_arch_hardware_disable(void)
-{
- static_call(kvm_x86_hardware_disable)();
- drop_user_return_notifiers();
}
static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
--
2.25.1
From: Isaku Yamahata <[email protected]>
A global variable hardware_enable_failed in kvm_arch.c is used only by
kvm_arch_add_vm() and hardware_enable(). It doesn't have to be a global
variable. Make it function local.
Signed-off-by: Isaku Yamahata <[email protected]>
---
virt/kvm/kvm_arch.c | 49 +++++++++++++++++++++------------------------
1 file changed, 23 insertions(+), 26 deletions(-)
diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index de39d0127584..f7dcde842eb5 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -13,14 +13,13 @@
#include <linux/kvm_host.h>
static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
-static atomic_t hardware_enable_failed;
__weak int kvm_arch_post_init_vm(struct kvm *kvm)
{
return 0;
}
-static void hardware_enable(void *caller_name)
+static int __hardware_enable(void *caller_name)
{
int cpu = raw_smp_processor_id();
int r;
@@ -28,18 +27,22 @@ static void hardware_enable(void *caller_name)
WARN_ON_ONCE(preemptible());
if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
- return;
-
- cpumask_set_cpu(cpu, &cpus_hardware_enabled);
-
+ return 0;
r = kvm_arch_hardware_enable();
-
- if (r) {
- cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
- atomic_inc(&hardware_enable_failed);
+ if (r)
pr_warn("kvm: enabling virtualization on CPU%d failed during %s()\n",
cpu, (const char *)caller_name);
- }
+ else
+ cpumask_set_cpu(cpu, &cpus_hardware_enabled);
+ return r;
+}
+
+static void hardware_enable(void *arg)
+{
+ atomic_t *failed = arg;
+
+ if (__hardware_enable((void *)__func__))
+ atomic_inc(failed);
}
static void hardware_disable(void *junk)
@@ -65,15 +68,16 @@ __weak void kvm_arch_pre_hardware_unsetup(void)
*/
__weak int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
{
+ atomic_t failed;
int r = 0;
if (usage_count != 1)
return 0;
- atomic_set(&hardware_enable_failed, 0);
- on_each_cpu(hardware_enable, (void *)__func__, 1);
+ atomic_set(&failed, 0);
+ on_each_cpu(hardware_enable, &failed, 1);
- if (atomic_read(&hardware_enable_failed)) {
+ if (atomic_read(&failed)) {
r = -EBUSY;
goto err;
}
@@ -96,27 +100,20 @@ __weak int kvm_arch_del_vm(int usage_count)
__weak int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
{
- int ret = 0;
+ int ret;
ret = kvm_arch_check_processor_compat();
if (ret)
return ret;
+ if (!usage_count)
+ return 0;
/*
* Abort the CPU online process if hardware virtualization cannot
* be enabled. Otherwise running VMs would encounter unrecoverable
* errors when scheduled to this CPU.
*/
- if (usage_count) {
- WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
-
- hardware_enable((void *)__func__);
- if (atomic_read(&hardware_enable_failed)) {
- atomic_set(&hardware_enable_failed, 0);
- ret = -EIO;
- }
- }
- return ret;
+ return __hardware_enable((void *)__func__);
}
__weak int kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
@@ -149,5 +146,5 @@ __weak void kvm_arch_resume(int usage_count)
return; /* FIXME: disable KVM */
if (usage_count)
- hardware_enable((void *)__func__);
+ (void)__hardware_enable((void *)__func__);
}
--
2.25.1
From: Sean Christopherson <[email protected]>
Provide the name of the calling function to hardware_enable_nolock() and
include it in the error message to provide additional information on
exactly what path failed.
Opportunistically bump the pr_info() to pr_warn(), failure to enable
virtualization support is warn-worthy as _something_ is wrong with the
system.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Chao Gao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Isaku Yamahata <[email protected]>
---
virt/kvm/kvm_main.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4243a9541543..278eb6cc7cbe 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4991,7 +4991,7 @@ static struct miscdevice kvm_dev = {
&kvm_chardev_ops,
};
-static void hardware_enable_nolock(void *junk)
+static void hardware_enable_nolock(void *caller_name)
{
int cpu = raw_smp_processor_id();
int r;
@@ -5006,7 +5006,8 @@ static void hardware_enable_nolock(void *junk)
if (r) {
cpumask_clear_cpu(cpu, cpus_hardware_enabled);
atomic_inc(&hardware_enable_failed);
- pr_info("kvm: enabling virtualization on CPU%d failed\n", cpu);
+ pr_warn("kvm: enabling virtualization on CPU%d failed during %s()\n",
+ cpu, (const char *)caller_name);
}
}
@@ -5014,7 +5015,7 @@ static int kvm_starting_cpu(unsigned int cpu)
{
raw_spin_lock(&kvm_count_lock);
if (kvm_usage_count)
- hardware_enable_nolock(NULL);
+ hardware_enable_nolock((void *)__func__);
raw_spin_unlock(&kvm_count_lock);
return 0;
}
@@ -5063,7 +5064,7 @@ static int hardware_enable_all(void)
kvm_usage_count++;
if (kvm_usage_count == 1) {
atomic_set(&hardware_enable_failed, 0);
- on_each_cpu(hardware_enable_nolock, NULL, 1);
+ on_each_cpu(hardware_enable_nolock, (void *)__func__, 1);
if (atomic_read(&hardware_enable_failed)) {
hardware_disable_all_nolock();
@@ -5686,7 +5687,7 @@ static void kvm_resume(void)
{
if (kvm_usage_count) {
lockdep_assert_not_held(&kvm_count_lock);
- hardware_enable_nolock(NULL);
+ hardware_enable_nolock((void *)__func__);
}
}
--
2.25.1
From: Chao Gao <[email protected]>
The CPU STARTING section doesn't allow callbacks to fail. Move KVM's
hotplug callback to ONLINE section so that it can abort onlining a CPU in
certain cases to avoid potentially breaking VMs running on existing CPUs.
For example, when kvm fails to enable hardware virtualization on the
hotplugged CPU.
Place KVM's hotplug state before CPUHP_AP_SCHED_WAIT_EMPTY as it ensures
when offlining a CPU, all user tasks and non-pinned kernel tasks have left
the CPU, i.e. there cannot be a vCPU task around. So, it is safe for KVM's
CPU offline callback to disable hardware virtualization at that point.
Likewise, KVM's online callback can enable hardware virtualization before
any vCPU task gets a chance to run on hotplugged CPUs.
KVM's CPU hotplug callbacks are renamed as well.
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Chao Gao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
include/linux/cpuhotplug.h | 2 +-
virt/kvm/kvm_main.c | 30 ++++++++++++++++++++++--------
2 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 7337414e4947..de45be38dd27 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -185,7 +185,6 @@ enum cpuhp_state {
CPUHP_AP_CSKY_TIMER_STARTING,
CPUHP_AP_TI_GP_TIMER_STARTING,
CPUHP_AP_HYPERV_TIMER_STARTING,
- CPUHP_AP_KVM_STARTING,
/* Must be the last timer callback */
CPUHP_AP_DUMMY_TIMER_STARTING,
CPUHP_AP_ARM_XEN_STARTING,
@@ -200,6 +199,7 @@ enum cpuhp_state {
/* Online section invoked on the hotplugged CPU from the hotplug thread */
CPUHP_AP_ONLINE_IDLE,
+ CPUHP_AP_KVM_ONLINE,
CPUHP_AP_SCHED_WAIT_EMPTY,
CPUHP_AP_SMPBOOT_THREADS,
CPUHP_AP_X86_VDSO_VMA_ONLINE,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 278eb6cc7cbe..db1303e2abc9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5011,13 +5011,27 @@ static void hardware_enable_nolock(void *caller_name)
}
}
-static int kvm_starting_cpu(unsigned int cpu)
+static int kvm_online_cpu(unsigned int cpu)
{
+ int ret = 0;
+
raw_spin_lock(&kvm_count_lock);
- if (kvm_usage_count)
+ /*
+ * Abort the CPU online process if hardware virtualization cannot
+ * be enabled. Otherwise running VMs would encounter unrecoverable
+ * errors when scheduled to this CPU.
+ */
+ if (kvm_usage_count) {
+ WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
+
hardware_enable_nolock((void *)__func__);
+ if (atomic_read(&hardware_enable_failed)) {
+ atomic_set(&hardware_enable_failed, 0);
+ ret = -EIO;
+ }
+ }
raw_spin_unlock(&kvm_count_lock);
- return 0;
+ return ret;
}
static void hardware_disable_nolock(void *junk)
@@ -5030,7 +5044,7 @@ static void hardware_disable_nolock(void *junk)
kvm_arch_hardware_disable();
}
-static int kvm_dying_cpu(unsigned int cpu)
+static int kvm_offline_cpu(unsigned int cpu)
{
raw_spin_lock(&kvm_count_lock);
if (kvm_usage_count)
@@ -5841,8 +5855,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
goto out_free_2;
}
- r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_STARTING, "kvm/cpu:starting",
- kvm_starting_cpu, kvm_dying_cpu);
+ r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
+ kvm_online_cpu, kvm_offline_cpu);
if (r)
goto out_free_2;
register_reboot_notifier(&kvm_reboot_notifier);
@@ -5903,7 +5917,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
kmem_cache_destroy(kvm_vcpu_cache);
out_free_3:
unregister_reboot_notifier(&kvm_reboot_notifier);
- cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
+ cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
out_free_2:
kvm_arch_hardware_unsetup();
out_free_1:
@@ -5929,7 +5943,7 @@ void kvm_exit(void)
kvm_async_pf_deinit();
unregister_syscore_ops(&kvm_syscore_ops);
unregister_reboot_notifier(&kvm_reboot_notifier);
- cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
+ cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
on_each_cpu(hardware_disable_nolock, NULL, 1);
kvm_arch_hardware_unsetup();
kvm_arch_exit();
--
2.25.1
From: Chao Gao <[email protected]>
This partially reverts commit b99040853738 ("KVM: Pass kvm_init()'s opaque
param to additional arch funcs") remove opaque from
kvm_arch_check_processor_compat because no one uses this opaque now.
Address conflicts for ARM (due to file movement) and manually handle RISC-V
which comes after the commit.
And changes about kvm_arch_hardware_setup() in original commit are still
needed so they are not reverted.
Signed-off-by: Chao Gao <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Acked-by: Anup Patel <[email protected]>
Acked-by: Claudio Imbrenda <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Kai Huang <[email protected]>
---
arch/arm64/kvm/arm.c | 2 +-
arch/mips/kvm/mips.c | 2 +-
arch/powerpc/kvm/powerpc.c | 2 +-
arch/riscv/kvm/main.c | 2 +-
arch/s390/kvm/kvm-s390.c | 2 +-
arch/x86/kvm/x86.c | 2 +-
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 16 +++-------------
8 files changed, 10 insertions(+), 20 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2ff0ef62abad..3385fb57c11a 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -68,7 +68,7 @@ int kvm_arch_hardware_setup(void *opaque)
return 0;
}
-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
return 0;
}
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index a25e0b73ee70..092d09fb6a7e 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -140,7 +140,7 @@ int kvm_arch_hardware_setup(void *opaque)
return 0;
}
-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
return 0;
}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index fb1490761c87..7b56d6ccfdfb 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -447,7 +447,7 @@ int kvm_arch_hardware_setup(void *opaque)
return 0;
}
-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
return kvmppc_core_check_processor_compat();
}
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index 1549205fe5fe..f8d6372d208f 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -20,7 +20,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
return -EINVAL;
}
-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
return 0;
}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index edfd4bbd0cba..e26d4dd85668 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -254,7 +254,7 @@ int kvm_arch_hardware_enable(void)
return 0;
}
-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
return 0;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f12a7ed6f94..53c8ee677f16 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11998,7 +11998,7 @@ void kvm_arch_hardware_unsetup(void)
static_call(kvm_x86_hardware_unsetup)();
}
-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f4519d3689e1..eab352902de7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1438,7 +1438,7 @@ int kvm_arch_hardware_enable(void);
void kvm_arch_hardware_disable(void);
int kvm_arch_hardware_setup(void *opaque);
void kvm_arch_hardware_unsetup(void);
-int kvm_arch_check_processor_compat(void *opaque);
+int kvm_arch_check_processor_compat(void);
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 584a5bab3af3..4243a9541543 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5799,22 +5799,14 @@ void kvm_unregister_perf_callbacks(void)
}
#endif
-struct kvm_cpu_compat_check {
- void *opaque;
- int *ret;
-};
-
-static void check_processor_compat(void *data)
+static void check_processor_compat(void *rtn)
{
- struct kvm_cpu_compat_check *c = data;
-
- *c->ret = kvm_arch_check_processor_compat(c->opaque);
+ *(int *)rtn = kvm_arch_check_processor_compat();
}
int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
struct module *module)
{
- struct kvm_cpu_compat_check c;
int r;
int cpu;
@@ -5842,10 +5834,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
if (r < 0)
goto out_free_1;
- c.ret = &r;
- c.opaque = opaque;
for_each_online_cpu(cpu) {
- smp_call_function_single(cpu, check_processor_compat, &c, 1);
+ smp_call_function_single(cpu, check_processor_compat, &r, 1);
if (r < 0)
goto out_free_2;
}
--
2.25.1
From: Isaku Yamahata <[email protected]>
cpus_hardware_enabled mask seems incomplete protection against other kernel
component using CPU virtualization feature. Because it's obscure and
incomplete, remove the check.
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f0382f3d5baf..15e123757b11 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11834,23 +11834,16 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);
-static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
-
static int __hardware_enable(void *caller_name)
{
- int cpu = raw_smp_processor_id();
int r;
WARN_ON_ONCE(preemptible());
- if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
- return 0;
r = static_call(kvm_x86_hardware_enable)();
if (r)
pr_warn("kvm: enabling virtualization on CPU%d failed during %s()\n",
- cpu, (const char *)caller_name);
- else
- cpumask_set_cpu(cpu, &cpus_hardware_enabled);
+ smp_processor_id(), (const char *)caller_name);
return r;
}
@@ -11864,13 +11857,7 @@ static void hardware_enable(void *arg)
static void hardware_disable(void *junk)
{
- int cpu = raw_smp_processor_id();
-
WARN_ON_ONCE(preemptible());
-
- if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
- return;
- cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
static_call(kvm_x86_hardware_disable)();
drop_user_return_notifiers();
}
--
2.25.1
From: Chao Gao <[email protected]>
so that KVM can do compatibility checks on hotplugged CPUs. Drop __init
from check_processor_compatibility() and its callees.
use a static_call() to invoke .check_processor_compatibility.
Opportunistically rename {svm,vmx}_check_processor_compat to conform
to the naming convention of fields of kvm_x86_ops.
Signed-off-by: Chao Gao <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/svm.c | 4 ++--
arch/x86/kvm/vmx/vmx.c | 14 +++++++-------
arch/x86/kvm/x86.c | 3 +--
5 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 51f777071584..3bc45932e2d1 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -129,6 +129,7 @@ KVM_X86_OP(msr_filter_changed)
KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP(check_processor_compatibility)
#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2c96c43c313a..5df5d88d345f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1445,6 +1445,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
struct kvm_x86_ops {
const char *name;
+ int (*check_processor_compatibility)(void);
int (*hardware_enable)(void);
void (*hardware_disable)(void);
void (*hardware_unsetup)(void);
@@ -1655,7 +1656,6 @@ struct kvm_x86_nested_ops {
struct kvm_x86_init_ops {
int (*cpu_has_kvm_support)(void);
int (*disabled_by_bios)(void);
- int (*check_processor_compatibility)(void);
int (*hardware_setup)(void);
unsigned int (*handle_intel_pt_intr)(void);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f3813dbacb9f..371300f03f55 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4134,7 +4134,7 @@ svm_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
hypercall[2] = 0xd9;
}
-static int __init svm_check_processor_compat(void)
+static int svm_check_processor_compatibility(void)
{
return 0;
}
@@ -4740,6 +4740,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.name = "kvm_amd",
.hardware_unsetup = svm_hardware_unsetup,
+ .check_processor_compatibility = svm_check_processor_compatibility,
.hardware_enable = svm_hardware_enable,
.hardware_disable = svm_hardware_disable,
.has_emulated_msr = svm_has_emulated_msr,
@@ -5122,7 +5123,6 @@ static struct kvm_x86_init_ops svm_init_ops __initdata = {
.cpu_has_kvm_support = has_svm,
.disabled_by_bios = is_disabled,
.hardware_setup = svm_hardware_setup,
- .check_processor_compatibility = svm_check_processor_compat,
.runtime_ops = &svm_x86_ops,
.pmu_ops = &amd_pmu_ops,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d7f8331d6f7e..3cf7f18a4115 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2495,8 +2495,8 @@ static bool cpu_has_sgx(void)
return cpuid_eax(0) >= 0x12 && (cpuid_eax(0x12) & BIT(0));
}
-static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
- u32 msr, u32 *result)
+static int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
+ u32 msr, u32 *result)
{
u32 vmx_msr_low, vmx_msr_high;
u32 ctl = ctl_min | ctl_opt;
@@ -2514,7 +2514,7 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
return 0;
}
-static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
+static u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
{
u64 allowed;
@@ -2523,8 +2523,8 @@ static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
return ctl_opt & allowed;
}
-static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
- struct vmx_capability *vmx_cap)
+static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
+ struct vmx_capability *vmx_cap)
{
u32 vmx_msr_low, vmx_msr_high;
u32 min, opt, min2, opt2;
@@ -7417,7 +7417,7 @@ static int vmx_vm_init(struct kvm *kvm)
return 0;
}
-static int __init vmx_check_processor_compat(void)
+static int vmx_check_processor_compatibility(void)
{
struct vmcs_config vmcs_conf;
struct vmx_capability vmx_cap;
@@ -8015,6 +8015,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
.hardware_unsetup = vmx_hardware_unsetup,
+ .check_processor_compatibility = vmx_check_processor_compatibility,
.hardware_enable = vmx_hardware_enable,
.hardware_disable = vmx_hardware_disable,
.has_emulated_msr = vmx_has_emulated_msr,
@@ -8404,7 +8405,6 @@ static __init int hardware_setup(void)
static struct kvm_x86_init_ops vmx_init_ops __initdata = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
- .check_processor_compatibility = vmx_check_processor_compat,
.hardware_setup = hardware_setup,
.handle_intel_pt_intr = NULL,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd021581ca60..5f12a7ed6f94 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12001,7 +12001,6 @@ void kvm_arch_hardware_unsetup(void)
int kvm_arch_check_processor_compat(void *opaque)
{
struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
- struct kvm_x86_init_ops *ops = opaque;
WARN_ON(!irqs_disabled());
@@ -12009,7 +12008,7 @@ int kvm_arch_check_processor_compat(void *opaque)
__cr4_reserved_bits(cpu_has, &boot_cpu_data))
return -EIO;
- return ops->check_processor_compatibility();
+ return static_call(kvm_x86_check_processor_compatibility)();
}
bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
--
2.25.1
From: Isaku Yamahata <[email protected]>
Now all related callbacks are called under kvm_lock, no point for _nolock
post fix. Remove _nolock post fix for readability with shorter function
names.
Signed-off-by: Isaku Yamahata <[email protected]>
---
virt/kvm/kvm_arch.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index 0648d4463d9e..de39d0127584 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -20,7 +20,7 @@ __weak int kvm_arch_post_init_vm(struct kvm *kvm)
return 0;
}
-static void hardware_enable_nolock(void *caller_name)
+static void hardware_enable(void *caller_name)
{
int cpu = raw_smp_processor_id();
int r;
@@ -42,7 +42,7 @@ static void hardware_enable_nolock(void *caller_name)
}
}
-static void hardware_disable_nolock(void *junk)
+static void hardware_disable(void *junk)
{
int cpu = raw_smp_processor_id();
@@ -56,7 +56,7 @@ static void hardware_disable_nolock(void *junk)
__weak void kvm_arch_pre_hardware_unsetup(void)
{
- on_each_cpu(hardware_disable_nolock, NULL, 1);
+ on_each_cpu(hardware_disable, NULL, 1);
}
/*
@@ -71,7 +71,7 @@ __weak int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
return 0;
atomic_set(&hardware_enable_failed, 0);
- on_each_cpu(hardware_enable_nolock, (void *)__func__, 1);
+ on_each_cpu(hardware_enable, (void *)__func__, 1);
if (atomic_read(&hardware_enable_failed)) {
r = -EBUSY;
@@ -81,7 +81,7 @@ __weak int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
r = kvm_arch_post_init_vm(kvm);
err:
if (r)
- on_each_cpu(hardware_disable_nolock, NULL, 1);
+ on_each_cpu(hardware_disable, NULL, 1);
return r;
}
@@ -90,7 +90,7 @@ __weak int kvm_arch_del_vm(int usage_count)
if (usage_count)
return 0;
- on_each_cpu(hardware_disable_nolock, NULL, 1);
+ on_each_cpu(hardware_disable, NULL, 1);
return 0;
}
@@ -110,7 +110,7 @@ __weak int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
if (usage_count) {
WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
- hardware_enable_nolock((void *)__func__);
+ hardware_enable((void *)__func__);
if (atomic_read(&hardware_enable_failed)) {
atomic_set(&hardware_enable_failed, 0);
ret = -EIO;
@@ -122,20 +122,20 @@ __weak int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
__weak int kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
{
if (usage_count)
- hardware_disable_nolock(NULL);
+ hardware_disable(NULL);
return 0;
}
__weak int kvm_arch_reboot(int val)
{
- on_each_cpu(hardware_disable_nolock, NULL, 1);
+ on_each_cpu(hardware_disable, NULL, 1);
return NOTIFY_OK;
}
__weak int kvm_arch_suspend(int usage_count)
{
if (usage_count)
- hardware_disable_nolock(NULL);
+ hardware_disable(NULL);
return 0;
}
@@ -149,5 +149,5 @@ __weak void kvm_arch_resume(int usage_count)
return; /* FIXME: disable KVM */
if (usage_count)
- hardware_enable_nolock((void *)__func__);
+ hardware_enable((void *)__func__);
}
--
2.25.1
From: Isaku Yamahata <[email protected]>
cpus_hardware_enabled mask seems incomplete protection against other kernel
component using CPU virtualization feature. Because it's obscure and
incomplete, remove the check.
Signed-off-by: Isaku Yamahata <[email protected]>
---
virt/kvm/kvm_arch.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)
diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index bcd82b75fa17..4142f2cb1922 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -12,23 +12,16 @@
#include <linux/kvm_host.h>
-static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
-
static int __hardware_enable(void *caller_name)
{
- int cpu = raw_smp_processor_id();
int r;
WARN_ON_ONCE(preemptible());
- if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
- return 0;
r = kvm_arch_hardware_enable();
if (r)
pr_warn("kvm: enabling virtualization on CPU%d failed during %s()\n",
- cpu, (const char *)caller_name);
- else
- cpumask_set_cpu(cpu, &cpus_hardware_enabled);
+ smp_processor_id(), (const char *)caller_name);
return r;
}
@@ -42,13 +35,7 @@ static void hardware_enable(void *arg)
static void hardware_disable(void *junk)
{
- int cpu = raw_smp_processor_id();
-
WARN_ON_ONCE(preemptible());
-
- if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
- return;
- cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
kvm_arch_hardware_disable();
}
--
2.25.1
From: Isaku Yamahata <[email protected]>
Because kvm_count_lock unnecessarily complicates the KVM locking convention
Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock for
simplicity.
Opportunistically add some comments on locking.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
Documentation/virt/kvm/locking.rst | 14 +++++-------
virt/kvm/kvm_main.c | 34 ++++++++++++++++++++----------
2 files changed, 28 insertions(+), 20 deletions(-)
diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 845a561629f1..8957e32aa724 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -216,15 +216,11 @@ time it will be set using the Dirty tracking mechanism described above.
:Type: mutex
:Arch: any
:Protects: - vm_list
-
-``kvm_count_lock``
-^^^^^^^^^^^^^^^^^^
-
-:Type: raw_spinlock_t
-:Arch: any
-:Protects: - hardware virtualization enable/disable
-:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
- migration.
+ - kvm_usage_count
+ - hardware virtualization enable/disable
+:Comment: Use cpus_read_lock() for hardware virtualization enable/disable
+ because hardware enabling/disabling must be atomic /wrt
+ migration. The lock order is cpus lock => kvm_lock.
``kvm->mn_invalidate_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fc55447c4dba..082d5dbc8d7f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -100,7 +100,6 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
*/
DEFINE_MUTEX(kvm_lock);
-static DEFINE_RAW_SPINLOCK(kvm_count_lock);
LIST_HEAD(vm_list);
static cpumask_var_t cpus_hardware_enabled;
@@ -4996,6 +4995,8 @@ static void hardware_enable_nolock(void *caller_name)
int cpu = raw_smp_processor_id();
int r;
+ WARN_ON_ONCE(preemptible());
+
if (cpumask_test_cpu(cpu, cpus_hardware_enabled))
return;
@@ -5019,7 +5020,7 @@ static int kvm_online_cpu(unsigned int cpu)
if (ret)
return ret;
- raw_spin_lock(&kvm_count_lock);
+ mutex_lock(&kvm_lock);
/*
* Abort the CPU online process if hardware virtualization cannot
* be enabled. Otherwise running VMs would encounter unrecoverable
@@ -5034,7 +5035,7 @@ static int kvm_online_cpu(unsigned int cpu)
ret = -EIO;
}
}
- raw_spin_unlock(&kvm_count_lock);
+ mutex_unlock(&kvm_lock);
return ret;
}
@@ -5042,6 +5043,8 @@ static void hardware_disable_nolock(void *junk)
{
int cpu = raw_smp_processor_id();
+ WARN_ON_ONCE(preemptible());
+
if (!cpumask_test_cpu(cpu, cpus_hardware_enabled))
return;
cpumask_clear_cpu(cpu, cpus_hardware_enabled);
@@ -5050,10 +5053,10 @@ static void hardware_disable_nolock(void *junk)
static int kvm_offline_cpu(unsigned int cpu)
{
- raw_spin_lock(&kvm_count_lock);
+ mutex_lock(&kvm_lock);
if (kvm_usage_count)
hardware_disable_nolock(NULL);
- raw_spin_unlock(&kvm_count_lock);
+ mutex_unlock(&kvm_lock);
return 0;
}
@@ -5068,9 +5071,11 @@ static void hardware_disable_all_nolock(void)
static void hardware_disable_all(void)
{
- raw_spin_lock(&kvm_count_lock);
+ cpus_read_lock();
+ mutex_lock(&kvm_lock);
hardware_disable_all_nolock();
- raw_spin_unlock(&kvm_count_lock);
+ mutex_unlock(&kvm_lock);
+ cpus_read_unlock();
}
static int hardware_enable_all(void)
@@ -5088,7 +5093,7 @@ static int hardware_enable_all(void)
* Disable CPU hotplug to prevent this case from happening.
*/
cpus_read_lock();
- raw_spin_lock(&kvm_count_lock);
+ mutex_lock(&kvm_lock);
kvm_usage_count++;
if (kvm_usage_count == 1) {
@@ -5101,7 +5106,7 @@ static int hardware_enable_all(void)
}
}
- raw_spin_unlock(&kvm_count_lock);
+ mutex_unlock(&kvm_lock);
cpus_read_unlock();
return r;
@@ -5708,8 +5713,15 @@ static void kvm_init_debug(void)
static int kvm_suspend(void)
{
- if (kvm_usage_count)
+ /*
+ * The caller ensures that CPU hotlug is disabled by
+ * cpu_hotplug_disable() and other CPUs are offlined. No need for
+ * locking.
+ */
+ if (kvm_usage_count) {
+ lockdep_assert_not_held(&kvm_lock);
hardware_disable_nolock(NULL);
+ }
return 0;
}
@@ -5723,7 +5735,7 @@ static void kvm_resume(void)
return; /* FIXME: disable KVM */
if (kvm_usage_count) {
- lockdep_assert_not_held(&kvm_count_lock);
+ lockdep_assert_not_held(&kvm_lock);
hardware_enable_nolock((void *)__func__);
}
}
--
2.25.1
On Thu, Sep 01, 2022 at 07:17:36PM -0700, [email protected] wrote:
>- user_return_msrs = alloc_percpu(struct kvm_user_return_msrs);
>+ /*
>+ * __GFP_ZERO to ensure user_return_msrs.values[].initialized = false.
^ should be user_return_msrs.initialized
With this fixed,
Reviewed-by: Chao Gao <[email protected]>
On Thu, Sep 01, 2022 at 07:17:37PM -0700, [email protected] wrote:
> From: Isaku Yamahata <[email protected]>
>
> convert per_cpu_ptr(smp_processor_id()) to this_cpu_ptr() as trivial
> cleanup.
Reviewed-by: Yuan Yao <[email protected]>
>
> Signed-off-by: Isaku Yamahata <[email protected]>
> Reviewed-by: Chao Gao <[email protected]>
> ---
> arch/x86/kvm/x86.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0e200fe44b35..fd021581ca60 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -428,8 +428,7 @@ static void kvm_user_return_msr_init_cpu(struct kvm_user_return_msrs *msrs)
>
> int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
> {
> - unsigned int cpu = smp_processor_id();
> - struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
> + struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
> int err;
>
> kvm_user_return_msr_init_cpu(msrs);
> @@ -453,8 +452,7 @@ EXPORT_SYMBOL_GPL(kvm_set_user_return_msr);
>
> static void drop_user_return_notifiers(void)
> {
> - unsigned int cpu = smp_processor_id();
> - struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
> + struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
>
> if (msrs->registered)
> kvm_on_user_return(&msrs->urn);
> --
> 2.25.1
>
On Thu, Sep 01, 2022 at 07:17:36PM -0700, [email protected] wrote:
> From: Sean Christopherson <[email protected]>
>
> KVM/X86 uses user return notifier to switch MSR for guest or user space.
> Snapshot host values on CPU online, change MSR values for guest, and
> restore them on returning to user space. The current code abuses
> kvm_arch_hardware_enable() which is called on kvm module initialization or
> CPU online.
>
> Remove such the abuse of kvm_arch_hardware_enable() by capturing the host
> value on the first change of the MSR value to guest VM instead of CPU
> online.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Yuan Yao <[email protected]>
> ---
> arch/x86/kvm/x86.c | 19 ++++++++++++++-----
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 205ebdc2b11b..0e200fe44b35 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -196,6 +196,7 @@ module_param(eager_page_split, bool, 0644);
>
> struct kvm_user_return_msrs {
> struct user_return_notifier urn;
> + bool initialized;
> bool registered;
> struct kvm_user_return_msr_values {
> u64 host;
> @@ -409,18 +410,20 @@ int kvm_find_user_return_msr(u32 msr)
> }
> EXPORT_SYMBOL_GPL(kvm_find_user_return_msr);
>
> -static void kvm_user_return_msr_cpu_online(void)
> +static void kvm_user_return_msr_init_cpu(struct kvm_user_return_msrs *msrs)
> {
> - unsigned int cpu = smp_processor_id();
> - struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
> u64 value;
> int i;
>
> + if (msrs->initialized)
> + return;
> +
> for (i = 0; i < kvm_nr_uret_msrs; ++i) {
> rdmsrl_safe(kvm_uret_msrs_list[i], &value);
> msrs->values[i].host = value;
> msrs->values[i].curr = value;
> }
> + msrs->initialized = true;
> }
>
> int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
> @@ -429,6 +432,8 @@ int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
> struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
> int err;
>
> + kvm_user_return_msr_init_cpu(msrs);
> +
> value = (value & mask) | (msrs->values[slot].host & ~mask);
> if (value == msrs->values[slot].curr)
> return 0;
> @@ -9212,7 +9217,12 @@ int kvm_arch_init(void *opaque)
> return -ENOMEM;
> }
>
> - user_return_msrs = alloc_percpu(struct kvm_user_return_msrs);
> + /*
> + * __GFP_ZERO to ensure user_return_msrs.values[].initialized = false.
> + * See kvm_user_return_msr_init_cpu().
> + */
> + user_return_msrs = alloc_percpu_gfp(struct kvm_user_return_msrs,
> + GFP_KERNEL | __GFP_ZERO);
> if (!user_return_msrs) {
> printk(KERN_ERR "kvm: failed to allocate percpu kvm_user_return_msrs\n");
> r = -ENOMEM;
> @@ -11836,7 +11846,6 @@ int kvm_arch_hardware_enable(void)
> u64 max_tsc = 0;
> bool stable, backwards_tsc = false;
>
> - kvm_user_return_msr_cpu_online();
> ret = static_call(kvm_x86_hardware_enable)();
> if (ret != 0)
> return ret;
> --
> 2.25.1
>
On Thu, Sep 01, 2022 at 07:17:38PM -0700, [email protected] wrote:
> From: Chao Gao <[email protected]>
>
> so that KVM can do compatibility checks on hotplugged CPUs. Drop __init
> from check_processor_compatibility() and its callees.
>
> use a static_call() to invoke .check_processor_compatibility.
>
> Opportunistically rename {svm,vmx}_check_processor_compat to conform
> to the naming convention of fields of kvm_x86_ops.
>
> Signed-off-by: Chao Gao <[email protected]>
> Reviewed-by: Sean Christopherson <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Yuan Yao <[email protected]>
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/svm/svm.c | 4 ++--
> arch/x86/kvm/vmx/vmx.c | 14 +++++++-------
> arch/x86/kvm/x86.c | 3 +--
> 5 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 51f777071584..3bc45932e2d1 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -129,6 +129,7 @@ KVM_X86_OP(msr_filter_changed)
> KVM_X86_OP(complete_emulated_msr)
> KVM_X86_OP(vcpu_deliver_sipi_vector)
> KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> +KVM_X86_OP(check_processor_compatibility)
>
> #undef KVM_X86_OP
> #undef KVM_X86_OP_OPTIONAL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 2c96c43c313a..5df5d88d345f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1445,6 +1445,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
> struct kvm_x86_ops {
> const char *name;
>
> + int (*check_processor_compatibility)(void);
> int (*hardware_enable)(void);
> void (*hardware_disable)(void);
> void (*hardware_unsetup)(void);
> @@ -1655,7 +1656,6 @@ struct kvm_x86_nested_ops {
> struct kvm_x86_init_ops {
> int (*cpu_has_kvm_support)(void);
> int (*disabled_by_bios)(void);
> - int (*check_processor_compatibility)(void);
> int (*hardware_setup)(void);
> unsigned int (*handle_intel_pt_intr)(void);
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index f3813dbacb9f..371300f03f55 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4134,7 +4134,7 @@ svm_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
> hypercall[2] = 0xd9;
> }
>
> -static int __init svm_check_processor_compat(void)
> +static int svm_check_processor_compatibility(void)
> {
> return 0;
> }
> @@ -4740,6 +4740,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> .name = "kvm_amd",
>
> .hardware_unsetup = svm_hardware_unsetup,
> + .check_processor_compatibility = svm_check_processor_compatibility,
> .hardware_enable = svm_hardware_enable,
> .hardware_disable = svm_hardware_disable,
> .has_emulated_msr = svm_has_emulated_msr,
> @@ -5122,7 +5123,6 @@ static struct kvm_x86_init_ops svm_init_ops __initdata = {
> .cpu_has_kvm_support = has_svm,
> .disabled_by_bios = is_disabled,
> .hardware_setup = svm_hardware_setup,
> - .check_processor_compatibility = svm_check_processor_compat,
>
> .runtime_ops = &svm_x86_ops,
> .pmu_ops = &amd_pmu_ops,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index d7f8331d6f7e..3cf7f18a4115 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2495,8 +2495,8 @@ static bool cpu_has_sgx(void)
> return cpuid_eax(0) >= 0x12 && (cpuid_eax(0x12) & BIT(0));
> }
>
> -static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
> - u32 msr, u32 *result)
> +static int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
> + u32 msr, u32 *result)
> {
> u32 vmx_msr_low, vmx_msr_high;
> u32 ctl = ctl_min | ctl_opt;
> @@ -2514,7 +2514,7 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
> return 0;
> }
>
> -static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
> +static u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
> {
> u64 allowed;
>
> @@ -2523,8 +2523,8 @@ static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
> return ctl_opt & allowed;
> }
>
> -static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> - struct vmx_capability *vmx_cap)
> +static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> + struct vmx_capability *vmx_cap)
> {
> u32 vmx_msr_low, vmx_msr_high;
> u32 min, opt, min2, opt2;
> @@ -7417,7 +7417,7 @@ static int vmx_vm_init(struct kvm *kvm)
> return 0;
> }
>
> -static int __init vmx_check_processor_compat(void)
> +static int vmx_check_processor_compatibility(void)
> {
> struct vmcs_config vmcs_conf;
> struct vmx_capability vmx_cap;
> @@ -8015,6 +8015,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
>
> .hardware_unsetup = vmx_hardware_unsetup,
>
> + .check_processor_compatibility = vmx_check_processor_compatibility,
> .hardware_enable = vmx_hardware_enable,
> .hardware_disable = vmx_hardware_disable,
> .has_emulated_msr = vmx_has_emulated_msr,
> @@ -8404,7 +8405,6 @@ static __init int hardware_setup(void)
> static struct kvm_x86_init_ops vmx_init_ops __initdata = {
> .cpu_has_kvm_support = cpu_has_kvm_support,
> .disabled_by_bios = vmx_disabled_by_bios,
> - .check_processor_compatibility = vmx_check_processor_compat,
> .hardware_setup = hardware_setup,
> .handle_intel_pt_intr = NULL,
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fd021581ca60..5f12a7ed6f94 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12001,7 +12001,6 @@ void kvm_arch_hardware_unsetup(void)
> int kvm_arch_check_processor_compat(void *opaque)
> {
> struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
> - struct kvm_x86_init_ops *ops = opaque;
>
> WARN_ON(!irqs_disabled());
>
> @@ -12009,7 +12008,7 @@ int kvm_arch_check_processor_compat(void *opaque)
> __cr4_reserved_bits(cpu_has, &boot_cpu_data))
> return -EIO;
>
> - return ops->check_processor_compatibility();
> + return static_call(kvm_x86_check_processor_compatibility)();
> }
>
> bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
> --
> 2.25.1
>
On Thu, Sep 01, 2022 at 07:17:40PM -0700, [email protected] wrote:
> From: Sean Christopherson <[email protected]>
>
> Provide the name of the calling function to hardware_enable_nolock() and
> include it in the error message to provide additional information on
> exactly what path failed.
>
> Opportunistically bump the pr_info() to pr_warn(), failure to enable
> virtualization support is warn-worthy as _something_ is wrong with the
> system.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> Signed-off-by: Chao Gao <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Yuan Yao <[email protected]>
> ---
> virt/kvm/kvm_main.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4243a9541543..278eb6cc7cbe 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4991,7 +4991,7 @@ static struct miscdevice kvm_dev = {
> &kvm_chardev_ops,
> };
>
> -static void hardware_enable_nolock(void *junk)
> +static void hardware_enable_nolock(void *caller_name)
> {
> int cpu = raw_smp_processor_id();
> int r;
> @@ -5006,7 +5006,8 @@ static void hardware_enable_nolock(void *junk)
> if (r) {
> cpumask_clear_cpu(cpu, cpus_hardware_enabled);
> atomic_inc(&hardware_enable_failed);
> - pr_info("kvm: enabling virtualization on CPU%d failed\n", cpu);
> + pr_warn("kvm: enabling virtualization on CPU%d failed during %s()\n",
> + cpu, (const char *)caller_name);
> }
> }
>
> @@ -5014,7 +5015,7 @@ static int kvm_starting_cpu(unsigned int cpu)
> {
> raw_spin_lock(&kvm_count_lock);
> if (kvm_usage_count)
> - hardware_enable_nolock(NULL);
> + hardware_enable_nolock((void *)__func__);
> raw_spin_unlock(&kvm_count_lock);
> return 0;
> }
> @@ -5063,7 +5064,7 @@ static int hardware_enable_all(void)
> kvm_usage_count++;
> if (kvm_usage_count == 1) {
> atomic_set(&hardware_enable_failed, 0);
> - on_each_cpu(hardware_enable_nolock, NULL, 1);
> + on_each_cpu(hardware_enable_nolock, (void *)__func__, 1);
>
> if (atomic_read(&hardware_enable_failed)) {
> hardware_disable_all_nolock();
> @@ -5686,7 +5687,7 @@ static void kvm_resume(void)
> {
> if (kvm_usage_count) {
> lockdep_assert_not_held(&kvm_count_lock);
> - hardware_enable_nolock(NULL);
> + hardware_enable_nolock((void *)__func__);
> }
> }
>
> --
> 2.25.1
>
On Thu, Sep 01, 2022 at 07:17:39PM -0700, [email protected] wrote:
> From: Chao Gao <[email protected]>
>
> This partially reverts commit b99040853738 ("KVM: Pass kvm_init()'s opaque
> param to additional arch funcs") remove opaque from
> kvm_arch_check_processor_compat because no one uses this opaque now.
> Address conflicts for ARM (due to file movement) and manually handle RISC-V
> which comes after the commit.
>
> And changes about kvm_arch_hardware_setup() in original commit are still
> needed so they are not reverted.
>
> Signed-off-by: Chao Gao <[email protected]>
> Reviewed-by: Sean Christopherson <[email protected]>
> Reviewed-by: Suzuki K Poulose <[email protected]>
> Acked-by: Anup Patel <[email protected]>
> Acked-by: Claudio Imbrenda <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Isaku Yamahata <[email protected]>
> Reviewed-by: Kai Huang <[email protected]>
Reviewed-by: Yuan Yao <[email protected]>
> ---
> arch/arm64/kvm/arm.c | 2 +-
> arch/mips/kvm/mips.c | 2 +-
> arch/powerpc/kvm/powerpc.c | 2 +-
> arch/riscv/kvm/main.c | 2 +-
> arch/s390/kvm/kvm-s390.c | 2 +-
> arch/x86/kvm/x86.c | 2 +-
> include/linux/kvm_host.h | 2 +-
> virt/kvm/kvm_main.c | 16 +++-------------
> 8 files changed, 10 insertions(+), 20 deletions(-)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 2ff0ef62abad..3385fb57c11a 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -68,7 +68,7 @@ int kvm_arch_hardware_setup(void *opaque)
> return 0;
> }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> +int kvm_arch_check_processor_compat(void)
> {
> return 0;
> }
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index a25e0b73ee70..092d09fb6a7e 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -140,7 +140,7 @@ int kvm_arch_hardware_setup(void *opaque)
> return 0;
> }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> +int kvm_arch_check_processor_compat(void)
> {
> return 0;
> }
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index fb1490761c87..7b56d6ccfdfb 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -447,7 +447,7 @@ int kvm_arch_hardware_setup(void *opaque)
> return 0;
> }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> +int kvm_arch_check_processor_compat(void)
> {
> return kvmppc_core_check_processor_compat();
> }
> diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
> index 1549205fe5fe..f8d6372d208f 100644
> --- a/arch/riscv/kvm/main.c
> +++ b/arch/riscv/kvm/main.c
> @@ -20,7 +20,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
> return -EINVAL;
> }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> +int kvm_arch_check_processor_compat(void)
> {
> return 0;
> }
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index edfd4bbd0cba..e26d4dd85668 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -254,7 +254,7 @@ int kvm_arch_hardware_enable(void)
> return 0;
> }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> +int kvm_arch_check_processor_compat(void)
> {
> return 0;
> }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5f12a7ed6f94..53c8ee677f16 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11998,7 +11998,7 @@ void kvm_arch_hardware_unsetup(void)
> static_call(kvm_x86_hardware_unsetup)();
> }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> +int kvm_arch_check_processor_compat(void)
> {
> struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index f4519d3689e1..eab352902de7 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1438,7 +1438,7 @@ int kvm_arch_hardware_enable(void);
> void kvm_arch_hardware_disable(void);
> int kvm_arch_hardware_setup(void *opaque);
> void kvm_arch_hardware_unsetup(void);
> -int kvm_arch_check_processor_compat(void *opaque);
> +int kvm_arch_check_processor_compat(void);
> int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
> bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
> int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 584a5bab3af3..4243a9541543 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5799,22 +5799,14 @@ void kvm_unregister_perf_callbacks(void)
> }
> #endif
>
> -struct kvm_cpu_compat_check {
> - void *opaque;
> - int *ret;
> -};
> -
> -static void check_processor_compat(void *data)
> +static void check_processor_compat(void *rtn)
> {
> - struct kvm_cpu_compat_check *c = data;
> -
> - *c->ret = kvm_arch_check_processor_compat(c->opaque);
> + *(int *)rtn = kvm_arch_check_processor_compat();
> }
>
> int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
> struct module *module)
> {
> - struct kvm_cpu_compat_check c;
> int r;
> int cpu;
>
> @@ -5842,10 +5834,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
> if (r < 0)
> goto out_free_1;
>
> - c.ret = &r;
> - c.opaque = opaque;
> for_each_online_cpu(cpu) {
> - smp_call_function_single(cpu, check_processor_compat, &c, 1);
> + smp_call_function_single(cpu, check_processor_compat, &r, 1);
> if (r < 0)
> goto out_free_2;
> }
> --
> 2.25.1
>
On Thu, Sep 01, 2022 at 07:17:41PM -0700, [email protected] wrote:
> From: Marc Zyngier <[email protected]>
>
> For a number of historical reasons, the KVM/arm64 hotplug setup is pretty
> complicated, and we have two extra CPUHP notifiers for vGIC and timers.
>
> It looks pretty pointless, and gets in the way of further changes.
> So let's just expose some helpers that can be called from the core
> CPUHP callback, and get rid of everything else.
>
> This gives us the opportunity to drop a useless notifier entry,
> as well as tidy-up the timer enable/disable, which was a bit odd.
>
> Signed-off-by: Marc Zyngier <[email protected]>
> Signed-off-by: Chao Gao <[email protected]>
> Reviewed-by: Oliver Upton <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> arch/arm64/kvm/arch_timer.c | 27 ++++++++++-----------------
> arch/arm64/kvm/arm.c | 4 ++++
> arch/arm64/kvm/vgic/vgic-init.c | 19 ++-----------------
> include/kvm/arm_arch_timer.h | 4 ++++
> include/kvm/arm_vgic.h | 4 ++++
> include/linux/cpuhotplug.h | 3 ---
> 6 files changed, 24 insertions(+), 37 deletions(-)
>
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index bb24a76b4224..33fca1a691a5 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -811,10 +811,18 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
> ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
> }
>
> -static void kvm_timer_init_interrupt(void *info)
> +void kvm_timer_cpu_up(void)
> {
> enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
> - enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags);
> + if (host_ptimer_irq)
> + enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags);
> +}
> +
> +void kvm_timer_cpu_down(void)
> +{
> + disable_percpu_irq(host_vtimer_irq);
> + if (host_ptimer_irq)
> + disable_percpu_irq(host_ptimer_irq);
> }
Should "host_vtimer_irq" be checked yet as host_ptimer_irq ? Because
the host_{v,p}timer_irq is set in same function kvm_irq_init() which
called AFTER the on_each_cpu(_kvm_arch_hardware_enable, NULL, 1) from
init_subsystems():
kvm_init()
kvm_arch_init()
init_subsystems()
on_each_cpu(_kvm_arch_hardware_enable, NULL, 1);
kvm_timer_hyp_init()
kvm_irq_init()
host_vtimer_irq = info->virtual_irq;
host_ptimer_irq = info->physical_irq;
hardware_enable_all()
>
> int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
> @@ -976,18 +984,6 @@ void kvm_arm_timer_write_sysreg(struct kvm_vcpu *vcpu,
> preempt_enable();
> }
>
> -static int kvm_timer_starting_cpu(unsigned int cpu)
> -{
> - kvm_timer_init_interrupt(NULL);
> - return 0;
> -}
> -
> -static int kvm_timer_dying_cpu(unsigned int cpu)
> -{
> - disable_percpu_irq(host_vtimer_irq);
> - return 0;
> -}
> -
> static int timer_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu)
> {
> if (vcpu)
> @@ -1185,9 +1181,6 @@ int kvm_timer_hyp_init(bool has_gic)
> goto out_free_irq;
> }
>
> - cpuhp_setup_state(CPUHP_AP_KVM_ARM_TIMER_STARTING,
> - "kvm/arm/timer:starting", kvm_timer_starting_cpu,
> - kvm_timer_dying_cpu);
> return 0;
> out_free_irq:
> free_percpu_irq(host_vtimer_irq, kvm_get_running_vcpus());
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 3385fb57c11a..0a2f616c4d63 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1670,6 +1670,8 @@ static void _kvm_arch_hardware_enable(void *discard)
> {
> if (!__this_cpu_read(kvm_arm_hardware_enabled)) {
> cpu_hyp_reinit();
> + kvm_vgic_cpu_up();
> + kvm_timer_cpu_up();
> __this_cpu_write(kvm_arm_hardware_enabled, 1);
> }
> }
> @@ -1683,6 +1685,8 @@ int kvm_arch_hardware_enable(void)
> static void _kvm_arch_hardware_disable(void *discard)
> {
> if (__this_cpu_read(kvm_arm_hardware_enabled)) {
> + kvm_timer_cpu_down();
> + kvm_vgic_cpu_down();
> cpu_hyp_reset();
> __this_cpu_write(kvm_arm_hardware_enabled, 0);
> }
> diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> index f6d4f4052555..6c7f6ae21ec0 100644
> --- a/arch/arm64/kvm/vgic/vgic-init.c
> +++ b/arch/arm64/kvm/vgic/vgic-init.c
> @@ -465,17 +465,15 @@ int kvm_vgic_map_resources(struct kvm *kvm)
>
> /* GENERIC PROBE */
>
> -static int vgic_init_cpu_starting(unsigned int cpu)
> +void kvm_vgic_cpu_up(void)
> {
> enable_percpu_irq(kvm_vgic_global_state.maint_irq, 0);
> - return 0;
> }
>
>
> -static int vgic_init_cpu_dying(unsigned int cpu)
> +void kvm_vgic_cpu_down(void)
> {
> disable_percpu_irq(kvm_vgic_global_state.maint_irq);
> - return 0;
> }
>
> static irqreturn_t vgic_maintenance_handler(int irq, void *data)
> @@ -584,19 +582,6 @@ int kvm_vgic_hyp_init(void)
> return ret;
> }
>
> - ret = cpuhp_setup_state(CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
> - "kvm/arm/vgic:starting",
> - vgic_init_cpu_starting, vgic_init_cpu_dying);
> - if (ret) {
> - kvm_err("Cannot register vgic CPU notifier\n");
> - goto out_free_irq;
> - }
> -
> kvm_info("vgic interrupt IRQ%d\n", kvm_vgic_global_state.maint_irq);
> return 0;
> -
> -out_free_irq:
> - free_percpu_irq(kvm_vgic_global_state.maint_irq,
> - kvm_get_running_vcpus());
> - return ret;
> }
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index cd6d8f260eab..1638418f72dd 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -104,4 +104,8 @@ void kvm_arm_timer_write_sysreg(struct kvm_vcpu *vcpu,
> u32 timer_get_ctl(struct arch_timer_context *ctxt);
> u64 timer_get_cval(struct arch_timer_context *ctxt);
>
> +/* CPU HP callbacks */
> +void kvm_timer_cpu_up(void);
> +void kvm_timer_cpu_down(void);
> +
> #endif
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 4df9e73a8bb5..fc4acc91ba06 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -431,4 +431,8 @@ int vgic_v4_load(struct kvm_vcpu *vcpu);
> void vgic_v4_commit(struct kvm_vcpu *vcpu);
> int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db);
>
> +/* CPU HP callbacks */
> +void kvm_vgic_cpu_up(void);
> +void kvm_vgic_cpu_down(void);
> +
> #endif /* __KVM_ARM_VGIC_H */
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index f61447913db9..7337414e4947 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -186,9 +186,6 @@ enum cpuhp_state {
> CPUHP_AP_TI_GP_TIMER_STARTING,
> CPUHP_AP_HYPERV_TIMER_STARTING,
> CPUHP_AP_KVM_STARTING,
> - CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
> - CPUHP_AP_KVM_ARM_VGIC_STARTING,
> - CPUHP_AP_KVM_ARM_TIMER_STARTING,
> /* Must be the last timer callback */
> CPUHP_AP_DUMMY_TIMER_STARTING,
> CPUHP_AP_ARM_XEN_STARTING,
> --
> 2.25.1
>
On Thu, Sep 01, 2022 at 07:17:42PM -0700, [email protected] wrote:
> From: Chao Gao <[email protected]>
>
> The CPU STARTING section doesn't allow callbacks to fail. Move KVM's
> hotplug callback to ONLINE section so that it can abort onlining a CPU in
> certain cases to avoid potentially breaking VMs running on existing CPUs.
> For example, when kvm fails to enable hardware virtualization on the
> hotplugged CPU.
>
> Place KVM's hotplug state before CPUHP_AP_SCHED_WAIT_EMPTY as it ensures
> when offlining a CPU, all user tasks and non-pinned kernel tasks have left
> the CPU, i.e. there cannot be a vCPU task around. So, it is safe for KVM's
> CPU offline callback to disable hardware virtualization at that point.
> Likewise, KVM's online callback can enable hardware virtualization before
> any vCPU task gets a chance to run on hotplugged CPUs.
>
> KVM's CPU hotplug callbacks are renamed as well.
>
> Suggested-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Chao Gao <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Reviewed-by: Sean Christopherson <[email protected]>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> include/linux/cpuhotplug.h | 2 +-
> virt/kvm/kvm_main.c | 30 ++++++++++++++++++++++--------
> 2 files changed, 23 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 7337414e4947..de45be38dd27 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -185,7 +185,6 @@ enum cpuhp_state {
> CPUHP_AP_CSKY_TIMER_STARTING,
> CPUHP_AP_TI_GP_TIMER_STARTING,
> CPUHP_AP_HYPERV_TIMER_STARTING,
> - CPUHP_AP_KVM_STARTING,
> /* Must be the last timer callback */
> CPUHP_AP_DUMMY_TIMER_STARTING,
> CPUHP_AP_ARM_XEN_STARTING,
> @@ -200,6 +199,7 @@ enum cpuhp_state {
>
> /* Online section invoked on the hotplugged CPU from the hotplug thread */
> CPUHP_AP_ONLINE_IDLE,
> + CPUHP_AP_KVM_ONLINE,
> CPUHP_AP_SCHED_WAIT_EMPTY,
> CPUHP_AP_SMPBOOT_THREADS,
> CPUHP_AP_X86_VDSO_VMA_ONLINE,
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 278eb6cc7cbe..db1303e2abc9 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5011,13 +5011,27 @@ static void hardware_enable_nolock(void *caller_name)
> }
> }
>
> -static int kvm_starting_cpu(unsigned int cpu)
> +static int kvm_online_cpu(unsigned int cpu)
> {
> + int ret = 0;
> +
> raw_spin_lock(&kvm_count_lock);
> - if (kvm_usage_count)
> + /*
> + * Abort the CPU online process if hardware virtualization cannot
> + * be enabled. Otherwise running VMs would encounter unrecoverable
> + * errors when scheduled to this CPU.
> + */
> + if (kvm_usage_count) {
> + WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
Reviewed-by: Yuan Yao <[email protected]>
> +
> hardware_enable_nolock((void *)__func__);
> + if (atomic_read(&hardware_enable_failed)) {
> + atomic_set(&hardware_enable_failed, 0);
> + ret = -EIO;
> + }
> + }
> raw_spin_unlock(&kvm_count_lock);
> - return 0;
> + return ret;
> }
>
> static void hardware_disable_nolock(void *junk)
> @@ -5030,7 +5044,7 @@ static void hardware_disable_nolock(void *junk)
> kvm_arch_hardware_disable();
> }
>
> -static int kvm_dying_cpu(unsigned int cpu)
> +static int kvm_offline_cpu(unsigned int cpu)
> {
> raw_spin_lock(&kvm_count_lock);
> if (kvm_usage_count)
> @@ -5841,8 +5855,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
> goto out_free_2;
> }
>
> - r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_STARTING, "kvm/cpu:starting",
> - kvm_starting_cpu, kvm_dying_cpu);
> + r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
> + kvm_online_cpu, kvm_offline_cpu);
> if (r)
> goto out_free_2;
> register_reboot_notifier(&kvm_reboot_notifier);
> @@ -5903,7 +5917,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
> kmem_cache_destroy(kvm_vcpu_cache);
> out_free_3:
> unregister_reboot_notifier(&kvm_reboot_notifier);
> - cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
> + cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
> out_free_2:
> kvm_arch_hardware_unsetup();
> out_free_1:
> @@ -5929,7 +5943,7 @@ void kvm_exit(void)
> kvm_async_pf_deinit();
> unregister_syscore_ops(&kvm_syscore_ops);
> unregister_reboot_notifier(&kvm_reboot_notifier);
> - cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
> + cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
> on_each_cpu(hardware_disable_nolock, NULL, 1);
> kvm_arch_hardware_unsetup();
> kvm_arch_exit();
> --
> 2.25.1
>
On Thu, Sep 01, 2022 at 07:17:44PM -0700, [email protected] wrote:
> From: Isaku Yamahata <[email protected]>
>
> So far the processor compatibility check is not done on resume. It should
> be done.
The resume happens for resuming from S3/S4, so the compatibility
checking is used to detecte CPU replacement, or resume from S4 on an
different machine ?
>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> virt/kvm/kvm_main.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 0ac00c711384..fc55447c4dba 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5715,6 +5715,13 @@ static int kvm_suspend(void)
>
> static void kvm_resume(void)
> {
> + if (kvm_arch_check_processor_compat())
> + /*
> + * No warning here because kvm_arch_check_processor_compat()
> + * would have warned with more information.
> + */
> + return; /* FIXME: disable KVM */
> +
> if (kvm_usage_count) {
> lockdep_assert_not_held(&kvm_count_lock);
> hardware_enable_nolock((void *)__func__);
> --
> 2.25.1
>
On Mon, Sep 05, 2022 at 04:40:14PM +0800, Yuan Yao wrote:
> On Thu, Sep 01, 2022 at 07:17:44PM -0700, [email protected] wrote:
> > From: Isaku Yamahata <[email protected]>
> >
> > So far the processor compatibility check is not done on resume. It should
> > be done.
>
> The resume happens for resuming from S3/S4, so the compatibility
> checking is used to detecte CPU replacement, or resume from S4 on an
> different machine ?
By did experiments, I found the resume is called once on CPU 0 before
other CPUs come UP, so yes it's necessary to check it.
>
> >
> > Signed-off-by: Isaku Yamahata <[email protected]>
> > ---
> > virt/kvm/kvm_main.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 0ac00c711384..fc55447c4dba 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -5715,6 +5715,13 @@ static int kvm_suspend(void)
> >
> > static void kvm_resume(void)
> > {
> > + if (kvm_arch_check_processor_compat())
> > + /*
> > + * No warning here because kvm_arch_check_processor_compat()
> > + * would have warned with more information.
> > + */
> > + return; /* FIXME: disable KVM */
> > +
> > if (kvm_usage_count) {
> > lockdep_assert_not_held(&kvm_count_lock);
> > hardware_enable_nolock((void *)__func__);
> > --
> > 2.25.1
> >
On Mon, 05 Sep 2022 08:05:09 +0100,
Yuan Yao <[email protected]> wrote:
>
> On Thu, Sep 01, 2022 at 07:17:41PM -0700, [email protected] wrote:
> > From: Marc Zyngier <[email protected]>
> >
> > For a number of historical reasons, the KVM/arm64 hotplug setup is pretty
> > complicated, and we have two extra CPUHP notifiers for vGIC and timers.
> >
> > It looks pretty pointless, and gets in the way of further changes.
> > So let's just expose some helpers that can be called from the core
> > CPUHP callback, and get rid of everything else.
> >
> > This gives us the opportunity to drop a useless notifier entry,
> > as well as tidy-up the timer enable/disable, which was a bit odd.
> >
> > Signed-off-by: Marc Zyngier <[email protected]>
> > Signed-off-by: Chao Gao <[email protected]>
> > Reviewed-by: Oliver Upton <[email protected]>
> > Link: https://lore.kernel.org/r/[email protected]
> > Signed-off-by: Isaku Yamahata <[email protected]>
> > ---
> > arch/arm64/kvm/arch_timer.c | 27 ++++++++++-----------------
> > arch/arm64/kvm/arm.c | 4 ++++
> > arch/arm64/kvm/vgic/vgic-init.c | 19 ++-----------------
> > include/kvm/arm_arch_timer.h | 4 ++++
> > include/kvm/arm_vgic.h | 4 ++++
> > include/linux/cpuhotplug.h | 3 ---
> > 6 files changed, 24 insertions(+), 37 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> > index bb24a76b4224..33fca1a691a5 100644
> > --- a/arch/arm64/kvm/arch_timer.c
> > +++ b/arch/arm64/kvm/arch_timer.c
> > @@ -811,10 +811,18 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
> > ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
> > }
> >
> > -static void kvm_timer_init_interrupt(void *info)
> > +void kvm_timer_cpu_up(void)
> > {
> > enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
> > - enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags);
> > + if (host_ptimer_irq)
> > + enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags);
> > +}
> > +
> > +void kvm_timer_cpu_down(void)
> > +{
> > + disable_percpu_irq(host_vtimer_irq);
> > + if (host_ptimer_irq)
> > + disable_percpu_irq(host_ptimer_irq);
> > }
>
> Should "host_vtimer_irq" be checked yet as host_ptimer_irq ?
No, because although the ptimer interrupt is optional (on older
systems, we fully emulate that timer, including the interrupt), the
vtimer interrupt is always present and can be used unconditionally.
> Because
> the host_{v,p}timer_irq is set in same function kvm_irq_init() which
> called AFTER the on_each_cpu(_kvm_arch_hardware_enable, NULL, 1) from
> init_subsystems():
>
> kvm_init()
> kvm_arch_init()
> init_subsystems()
> on_each_cpu(_kvm_arch_hardware_enable, NULL, 1);
> kvm_timer_hyp_init()
> kvm_irq_init()
> host_vtimer_irq = info->virtual_irq;
> host_ptimer_irq = info->physical_irq;
> hardware_enable_all()
This, however, is a very nice catch. I doubt this results in anything
really bad (the interrupt enable will fail as the interrupt number
is 0, and the disable will also fail due to no prior enable), but
that's extremely ugly anyway.
The best course of action AFAICS is to differentiate between the
arm64-specific initialisation (which is a one-off) and the runtime
stuff. Something like the hack below, that I haven't tested yet:
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 32c1022eb4b3..65d03c28f32a 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1671,23 +1671,27 @@ static void _kvm_arch_hardware_enable(void *discard)
{
if (!__this_cpu_read(kvm_arm_hardware_enabled)) {
cpu_hyp_reinit();
- kvm_vgic_cpu_up();
- kvm_timer_cpu_up();
__this_cpu_write(kvm_arm_hardware_enabled, 1);
}
}
int kvm_arch_hardware_enable(void)
{
+ int was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
+
_kvm_arch_hardware_enable(NULL);
+
+ if (!was_enabled) {
+ kvm_vgic_cpu_up();
+ kvm_timer_cpu_up();
+ }
+
return 0;
}
static void _kvm_arch_hardware_disable(void *discard)
{
if (__this_cpu_read(kvm_arm_hardware_enabled)) {
- kvm_timer_cpu_down();
- kvm_vgic_cpu_down();
cpu_hyp_reset();
__this_cpu_write(kvm_arm_hardware_enabled, 0);
}
@@ -1695,6 +1699,11 @@ static void _kvm_arch_hardware_disable(void *discard)
void kvm_arch_hardware_disable(void)
{
+ if (__this_cpu_read(kvm_arm_hardware_enabled)) {
+ kvm_timer_cpu_down();
+ kvm_vgic_cpu_down();
+ }
+
if (!is_protected_kvm_enabled())
_kvm_arch_hardware_disable(NULL);
}
This should ensure that the init still works as it used to, and that
vgic and timers get switched as per the CPUHP notifiers.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
On 2022-09-05 10:29, Marc Zyngier wrote:
> On Mon, 05 Sep 2022 08:05:09 +0100,
> Yuan Yao <[email protected]> wrote:
>>
>> On Thu, Sep 01, 2022 at 07:17:41PM -0700, [email protected]
>> wrote:
>> > From: Marc Zyngier <[email protected]>
>> >
>> > For a number of historical reasons, the KVM/arm64 hotplug setup is pretty
>> > complicated, and we have two extra CPUHP notifiers for vGIC and timers.
>> >
>> > It looks pretty pointless, and gets in the way of further changes.
>> > So let's just expose some helpers that can be called from the core
>> > CPUHP callback, and get rid of everything else.
>> >
>> > This gives us the opportunity to drop a useless notifier entry,
>> > as well as tidy-up the timer enable/disable, which was a bit odd.
>> >
>> > Signed-off-by: Marc Zyngier <[email protected]>
>> > Signed-off-by: Chao Gao <[email protected]>
>> > Reviewed-by: Oliver Upton <[email protected]>
>> > Link: https://lore.kernel.org/r/[email protected]
>> > Signed-off-by: Isaku Yamahata <[email protected]>
>> > ---
>> > arch/arm64/kvm/arch_timer.c | 27 ++++++++++-----------------
>> > arch/arm64/kvm/arm.c | 4 ++++
>> > arch/arm64/kvm/vgic/vgic-init.c | 19 ++-----------------
>> > include/kvm/arm_arch_timer.h | 4 ++++
>> > include/kvm/arm_vgic.h | 4 ++++
>> > include/linux/cpuhotplug.h | 3 ---
>> > 6 files changed, 24 insertions(+), 37 deletions(-)
>> >
>> > diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
>> > index bb24a76b4224..33fca1a691a5 100644
>> > --- a/arch/arm64/kvm/arch_timer.c
>> > +++ b/arch/arm64/kvm/arch_timer.c
>> > @@ -811,10 +811,18 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
>> > ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
>> > }
>> >
>> > -static void kvm_timer_init_interrupt(void *info)
>> > +void kvm_timer_cpu_up(void)
>> > {
>> > enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
>> > - enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags);
>> > + if (host_ptimer_irq)
>> > + enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags);
>> > +}
>> > +
>> > +void kvm_timer_cpu_down(void)
>> > +{
>> > + disable_percpu_irq(host_vtimer_irq);
>> > + if (host_ptimer_irq)
>> > + disable_percpu_irq(host_ptimer_irq);
>> > }
>>
>> Should "host_vtimer_irq" be checked yet as host_ptimer_irq ?
>
> No, because although the ptimer interrupt is optional (on older
> systems, we fully emulate that timer, including the interrupt), the
> vtimer interrupt is always present and can be used unconditionally.
>
>> Because
>> the host_{v,p}timer_irq is set in same function kvm_irq_init() which
>> called AFTER the on_each_cpu(_kvm_arch_hardware_enable, NULL, 1) from
>> init_subsystems():
>>
>> kvm_init()
>> kvm_arch_init()
>> init_subsystems()
>> on_each_cpu(_kvm_arch_hardware_enable, NULL, 1);
>> kvm_timer_hyp_init()
>> kvm_irq_init()
>> host_vtimer_irq = info->virtual_irq;
>> host_ptimer_irq = info->physical_irq;
>> hardware_enable_all()
>
> This, however, is a very nice catch. I doubt this results in anything
> really bad (the interrupt enable will fail as the interrupt number
> is 0, and the disable will also fail due to no prior enable), but
> that's extremely ugly anyway.
>
> The best course of action AFAICS is to differentiate between the
> arm64-specific initialisation (which is a one-off) and the runtime
> stuff. Something like the hack below, that I haven't tested yet:
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 32c1022eb4b3..65d03c28f32a 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1671,23 +1671,27 @@ static void _kvm_arch_hardware_enable(void
> *discard)
> {
> if (!__this_cpu_read(kvm_arm_hardware_enabled)) {
> cpu_hyp_reinit();
> - kvm_vgic_cpu_up();
> - kvm_timer_cpu_up();
> __this_cpu_write(kvm_arm_hardware_enabled, 1);
> }
> }
>
> int kvm_arch_hardware_enable(void)
> {
> + int was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
> +
> _kvm_arch_hardware_enable(NULL);
> +
> + if (!was_enabled) {
> + kvm_vgic_cpu_up();
> + kvm_timer_cpu_up();
> + }
> +
> return 0;
> }
>
> static void _kvm_arch_hardware_disable(void *discard)
> {
> if (__this_cpu_read(kvm_arm_hardware_enabled)) {
> - kvm_timer_cpu_down();
> - kvm_vgic_cpu_down();
> cpu_hyp_reset();
> __this_cpu_write(kvm_arm_hardware_enabled, 0);
> }
> @@ -1695,6 +1699,11 @@ static void _kvm_arch_hardware_disable(void
> *discard)
>
> void kvm_arch_hardware_disable(void)
> {
> + if (__this_cpu_read(kvm_arm_hardware_enabled)) {
> + kvm_timer_cpu_down();
> + kvm_vgic_cpu_down();
> + }
> +
> if (!is_protected_kvm_enabled())
> _kvm_arch_hardware_disable(NULL);
> }
OK, this seems to work here, at least based on a sample of 2
systems, bringing CPUs up and down whist a VM is pinned to
these CPUs.
Isaku, can you please squash this into the original patch
and drop Oliver's Reviewed-by: tag, as this significantly
changes the logic?
Alternatively, I can repost this patch as a standalone change.
Thanks,
M.
--
Jazz is not dead. It just smells funny...
On Fri, 02 Sep 2022 03:17:35 +0100,
[email protected] wrote:
>
> From: Isaku Yamahata <[email protected]>
>
> Changes from v2:
> - Replace the first patch("KVM: x86: Drop kvm_user_return_msr_cpu_online()")
> with Sean's implementation
> - Included all patches of "Improve KVM's interaction with CPU hotplug" [2]
> Until v2, Tried to cherry-pick the least patches of it. It turned out that
> all the patches are desirable.
>
> This patch series is to implement the suggestion by Sean Christopherson [1]
> to reorganize enable/disable cpu virtualization feature by replacing
> the arch-generic current enable/disable logic with PM related hooks. And
> convert kvm/x86 to use new hooks.
This series totally breaks on arm64 when playing with CPU hotplug. It
very much looks like preemption is now enabled in situations where we
don't expect it to (see below for the full-blown horror show). And
given the way it shows up in common code, I strongly suspect this
affects other architectures too.
Note that if I only take patch #6 (with the subsequent fix that I
posted this morning), the system is perfectly happy with CPUs being
hotplugged on/off ad-nauseam.
Thanks,
M.
[ 108.213362] WARNING: CPU: 1 PID: 18 at arch/arm64/kvm/../../../virt/kvm/kvm_arch.c:38 hardware_disable+0x40/0x5c
[ 108.222403] Modules linked in: macvtap(E) macvlan(E) tap(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) hci_uart(E) btqca(E) btrtl(E) aes_ce_blk(E) btbcm(E) btintel(E) aes_ce_cipher(E) ghash_ce(E) gf128mul(E) sha2_ce(E) sha256_arm64(E) bluetooth(E) sha1_ce(E) meson_saradc(E) panfrost(E) ecdh_generic(E) ecc(E) gpu_sched(E) rfkill(E) drm_shmem_helper(E) industrialio(E) meson_drm(E) meson_rng(E) rng_core(E) meson_dw_hdmi(E) dw_hdmi(E) drm_display_helper(E) meson_canvas(E) cec(E) display_connector(E) drm_cma_helper(E) pwm_meson(E) efi_pstore(E) drm_kms_helper(E) leds_gpio(E) cpufreq_dt(E) fuse(E) drm(E) configfs(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) xhci_plat_hcd(E) meson_gxl(E) realtek(E) dwmac_generic(E) dwc3(E) dwc2(E) ulpi(E) udc_core(E) rtc_hym8563(E) nvme(E) dwmac_meson8b(E) stmmac_platform(E) nvme_core(E) t10_pi(E) mdio_mux_meson_g12a(E) dwc3_meson_g12a(E) i2c_meson(E) mdio_mux(E) meson_gx_mmc(E) stmmac(E) pcs_xpcs(E) phylink(E) of_mdio(E) crc64_rocksoft(E) crc64(E)
[ 108.222572] fixed_phy(E) fwnode_mdio(E) libphy(E) pwm_regulator(E)
[ 108.314691] CPU: 1 PID: 18 Comm: cpuhp/1 Tainted: G E 6.0.0-rc4-00024-g202b793c12e7 #125
[ 108.324090] Hardware name: , BIOS 2021.01-rc2-00012-gde865f7ee1 11/16/2020
[ 108.330992] pstate: a0400009 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 108.337890] pc : hardware_disable+0x40/0x5c
[ 108.342031] lr : kvm_arch_offline_cpu+0x2c/0x40
[ 108.346515] sp : ffff8000080cbd20
[ 108.349793] x29: ffff8000080cbd20 x28: 0000000000000001 x27: ffff909c1f1d9000
[ 108.356865] x26: 0000000000000000 x25: 0000000000000000 x24: ffff418c3f985768
[ 108.363938] x23: ffffb0f0207ac768 x22: ffffb0f01f33e180 x21: 00000000000002f3
[ 108.371010] x20: 0000000000000001 x19: ffffb0f020d09598 x18: 0000000000000000
[ 108.378083] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 108.385155] x14: 0000000000000032 x13: 0000000000000000 x12: 0000000000000000
[ 108.392228] x11: ffff418bc0503f50 x10: 0000000000000bc0 x9 : ffffb0f01f34b55c
[ 108.399300] x8 : ffff418bc029bce0 x7 : ffff418bc7a1a098 x6 : ffffb0f020d0f3a8
[ 108.406373] x5 : ffffb0f020cfd000 x4 : 0000000000000000 x3 : ffffb0f020d09598
[ 108.413445] x2 : ffff418bc029b0c0 x1 : 0000000000000000 x0 : 0000000000000000
[ 108.420519] Call trace:
[ 108.422933] hardware_disable+0x40/0x5c
[ 108.426729] kvm_arch_offline_cpu+0x2c/0x40
[ 108.430868] kvm_offline_cpu+0x40/0x60
[ 108.434577] cpuhp_invoke_callback+0x16c/0x5b0
[ 108.438976] cpuhp_thread_fun+0xdc/0x194
[ 108.442857] smpboot_thread_fn+0x244/0x270
[ 108.446911] kthread+0xf8/0x100
[ 108.450015] ret_from_fork+0x10/0x20
[ 108.453554] ---[ end trace 0000000000000000 ]---
[ 108.458406] IRQ24: set affinity failed(-22).
[ 108.458424] IRQ29: set affinity failed(-22).
[ 108.458911] psci: CPU1 killed (polled 0 ms)
[ 109.533677] Detected VIPT I-cache on CPU1
[ 109.533817] CPU1: Booted secondary processor 0x0000000100 [0x411fd050]
[ 109.533886] ------------[ cut here ]------------
[ 109.543308] WARNING: CPU: 1 PID: 18 at arch/arm64/kvm/../../../virt/kvm/kvm_arch.c:19 __hardware_enable+0x54/0x80
[ 109.553482] Modules linked in: macvtap(E) macvlan(E) tap(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) hci_uart(E) btqca(E) btrtl(E) aes_ce_blk(E) btbcm(E) btintel(E) aes_ce_cipher(E) ghash_ce(E) gf128mul(E) sha2_ce(E) sha256_arm64(E) bluetooth(E) sha1_ce(E) meson_saradc(E) panfrost(E) ecdh_generic(E) ecc(E) gpu_sched(E) rfkill(E) drm_shmem_helper(E) industrialio(E) meson_drm(E) meson_rng(E) rng_core(E) meson_dw_hdmi(E) dw_hdmi(E) drm_display_helper(E) meson_canvas(E) cec(E) display_connector(E) drm_cma_helper(E) pwm_meson(E) efi_pstore(E) drm_kms_helper(E) leds_gpio(E) cpufreq_dt(E) fuse(E) drm(E) configfs(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) xhci_plat_hcd(E) meson_gxl(E) realtek(E) dwmac_generic(E) dwc3(E) dwc2(E) ulpi(E) udc_core(E) rtc_hym8563(E) nvme(E) dwmac_meson8b(E) stmmac_platform(E) nvme_core(E) t10_pi(E) mdio_mux_meson_g12a(E) dwc3_meson_g12a(E) i2c_meson(E) mdio_mux(E) meson_gx_mmc(E) stmmac(E) pcs_xpcs(E) phylink(E) of_mdio(E) crc64_rocksoft(E) crc64(E)
[ 109.553657] fixed_phy(E) fwnode_mdio(E) libphy(E) pwm_regulator(E)
[ 109.645771] CPU: 1 PID: 18 Comm: cpuhp/1 Tainted: G W E 6.0.0-rc4-00024-g202b793c12e7 #125
[ 109.655170] Hardware name: , BIOS 2021.01-rc2-00012-gde865f7ee1 11/16/2020
[ 109.662071] pstate: a0400009 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 109.668970] pc : __hardware_enable+0x54/0x80
[ 109.673196] lr : kvm_arch_online_cpu+0x4c/0x5c
[ 109.677594] sp : ffff8000080cbd00
[ 109.680872] x29: ffff8000080cbd00 x28: 0000000000000001 x27: ffff909c1f1d9000
[ 109.687944] x26: 0000000000000000 x25: 0000000000000000 x24: ffff418c3f985768
[ 109.695016] x23: ffffb0f0207ac768 x22: ffffb0f01f33e1e0 x21: 00000000000002f3
[ 109.702089] x20: ffffb0f01ffdc700 x19: 0000000000000001 x18: 0000000000000000
[ 109.709161] x17: 000000006a7316ce x16: 00000000dee945dd x15: 0000000000000030
[ 109.716234] x14: 0000000000000001 x13: 5d30353064663131 x12: 3478305b20303031
[ 109.723306] x11: 3030303030303078 x10: 0000000000000bc0 x9 : ffffb0f01f34b520
[ 109.730379] x8 : ffff418bc029bce0 x7 : 0000000000000003 x6 : ffffb0f020d0f3a8
[ 109.737452] x5 : ffffb0f020cfd000 x4 : 0000000000000000 x3 : ffffb0f020d09598
[ 109.744524] x2 : ffff418bc029b0c0 x1 : 0000000000000000 x0 : 0000000000000000
[ 109.751598] Call trace:
[ 109.754012] __hardware_enable+0x54/0x80
[ 109.757894] kvm_arch_online_cpu+0x4c/0x5c
[ 109.761947] kvm_online_cpu+0x40/0x60
[ 109.765569] cpuhp_invoke_callback+0x16c/0x5b0
[ 109.769968] cpuhp_thread_fun+0xdc/0x194
[ 109.773849] smpboot_thread_fn+0x244/0x270
[ 109.777903] kthread+0xf8/0x100
[ 109.781009] ret_from_fork+0x10/0x20
[ 109.784545] ---[ end trace 0000000000000000 ]---
[ 109.789178] ------------[ cut here ]------------
[ 109.793694] kernel BUG at arch/arm64/kvm/vgic/vgic-init.c:507!
[ 109.799467] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 109.804901] Modules linked in: macvtap(E) macvlan(E) tap(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) hci_uart(E) btqca(E) btrtl(E) aes_ce_blk(E) btbcm(E) btintel(E) aes_ce_cipher(E) ghash_ce(E) gf128mul(E) sha2_ce(E) sha256_arm64(E) bluetooth(E) sha1_ce(E) meson_saradc(E) panfrost(E) ecdh_generic(E) ecc(E) gpu_sched(E) rfkill(E) drm_shmem_helper(E) industrialio(E) meson_drm(E) meson_rng(E) rng_core(E) meson_dw_hdmi(E) dw_hdmi(E) drm_display_helper(E) meson_canvas(E) cec(E) display_connector(E) drm_cma_helper(E) pwm_meson(E) efi_pstore(E) drm_kms_helper(E) leds_gpio(E) cpufreq_dt(E) fuse(E) drm(E) configfs(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) xhci_plat_hcd(E) meson_gxl(E) realtek(E) dwmac_generic(E) dwc3(E) dwc2(E) ulpi(E) udc_core(E) rtc_hym8563(E) nvme(E) dwmac_meson8b(E) stmmac_platform(E) nvme_core(E) t10_pi(E) mdio_mux_meson_g12a(E) dwc3_meson_g12a(E) i2c_meson(E) mdio_mux(E) meson_gx_mmc(E) stmmac(E) pcs_xpcs(E) phylink(E) of_mdio(E) crc64_rocksoft(E) crc64(E)
[ 109.805056] fixed_phy(E) fwnode_mdio(E) libphy(E) pwm_regulator(E)
[ 109.897189] CPU: 1 PID: 18 Comm: cpuhp/1 Tainted: G W E 6.0.0-rc4-00024-g202b793c12e7 #125
[ 109.906588] Hardware name: , BIOS 2021.01-rc2-00012-gde865f7ee1 11/16/2020
[ 109.913490] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 109.920388] pc : kvm_vgic_init_cpu_hardware+0x94/0xa0
[ 109.925391] lr : _kvm_arch_hardware_enable+0xa0/0xb0
[ 109.930307] sp : ffff8000080cbcc0
[ 109.933584] x29: ffff8000080cbcc0 x28: 0000000000000001 x27: ffff909c1f1d9000
[ 109.940657] x26: 0000000000000000 x25: 0000000000000000 x24: ffff418c3f985768
[ 109.947729] x23: ffffb0f0207ac768 x22: ffffb0f01f33e1e0 x21: 00000000000002f3
[ 109.954802] x20: ffffb0f01ffdc700 x19: ffffb0f0207ab8b8 x18: 0000000000000000
[ 109.961874] x17: 000000006a7316ce x16: 00000000dee945dd x15: 0000000000000030
[ 109.968947] x14: 0000000000000001 x13: 5d30353064663131 x12: 3478305b20303031
[ 109.976019] x11: 3030303030303078 x10: 0000000000000bc0 x9 : ffffb0f01f350c30
[ 109.983092] x8 : ffff418bc029bce0 x7 : 0000000000000003 x6 : ffffb0f020d0f3a8
[ 109.990164] x5 : ffffb0f020cfd000 x4 : ffff909c1f1d9000 x3 : ffffb0f020d09598
[ 109.997237] x2 : ffffb0f0207ab8c8 x1 : 0000000000000000 x0 : 0000000000000000
[ 110.004311] Call trace:
[ 110.006725] kvm_vgic_init_cpu_hardware+0x94/0xa0
[ 110.011384] kvm_arch_hardware_enable+0x38/0x70
[ 110.015867] __hardware_enable+0x2c/0x80
[ 110.019748] kvm_arch_online_cpu+0x4c/0x5c
[ 110.023802] kvm_online_cpu+0x40/0x60
[ 110.027424] cpuhp_invoke_callback+0x16c/0x5b0
[ 110.031823] cpuhp_thread_fun+0xdc/0x194
[ 110.035704] smpboot_thread_fn+0x244/0x270
[ 110.039758] kthread+0xf8/0x100
[ 110.042864] ret_from_fork+0x10/0x20
[ 110.046406] Code: 17fffff2 d53b4221 12190020 35fffc40 (d4210000)
[ 110.052440] ---[ end trace 0000000000000000 ]---
[ 110.057009] note: cpuhp/1[18] exited with preempt_count 1
[ 110.062537] ------------[ cut here ]------------
[ 110.066932] WARNING: CPU: 1 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit.constprop.0+0x98/0xa0
[ 110.076330] Modules linked in: macvtap(E) macvlan(E) tap(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) hci_uart(E) btqca(E) btrtl(E) aes_ce_blk(E) btbcm(E) btintel(E) aes_ce_cipher(E) ghash_ce(E) gf128mul(E) sha2_ce(E) sha256_arm64(E) bluetooth(E) sha1_ce(E) meson_saradc(E) panfrost(E) ecdh_generic(E) ecc(E) gpu_sched(E) rfkill(E) drm_shmem_helper(E) industrialio(E) meson_drm(E) meson_rng(E) rng_core(E) meson_dw_hdmi(E) dw_hdmi(E) drm_display_helper(E) meson_canvas(E) cec(E) display_connector(E) drm_cma_helper(E) pwm_meson(E) efi_pstore(E) drm_kms_helper(E) leds_gpio(E) cpufreq_dt(E) fuse(E) drm(E) configfs(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) xhci_plat_hcd(E) meson_gxl(E) realtek(E) dwmac_generic(E) dwc3(E) dwc2(E) ulpi(E) udc_core(E) rtc_hym8563(E) nvme(E) dwmac_meson8b(E) stmmac_platform(E) nvme_core(E) t10_pi(E) mdio_mux_meson_g12a(E) dwc3_meson_g12a(E) i2c_meson(E) mdio_mux(E) meson_gx_mmc(E) stmmac(E) pcs_xpcs(E) phylink(E) of_mdio(E) crc64_rocksoft(E) crc64(E)
[ 110.076588] fixed_phy(E) fwnode_mdio(E) libphy(E) pwm_regulator(E)
[ 110.168618] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G D W E 6.0.0-rc4-00024-g202b793c12e7 #125
[ 110.178103] Hardware name: , BIOS 2021.01-rc2-00012-gde865f7ee1 11/16/2020
[ 110.185005] pstate: 204003c9 (nzCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 110.191904] pc : ct_kernel_exit.constprop.0+0x98/0xa0
[ 110.196906] lr : ct_idle_enter+0x10/0x20
[ 110.200787] sp : ffff8000080abda0
[ 110.204064] x29: ffff8000080abda0 x28: 0000000000000000 x27: 0000000000000000
[ 110.211137] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[ 110.218209] x23: 0000000000000000 x22: ffff909c1f1d9000 x21: ffff418bc029d140
[ 110.225281] x20: ffffb0f0207aa008 x19: ffff418c3f992b78 x18: 0000000000000001
[ 110.232354] x17: 3030303030303030 x16: 3030303030303020 x15: 0000000000000030
[ 110.239426] x14: 0000000000000012 x13: 0000000000000000 x12: 0000000000000000
[ 110.246499] x11: ffff418bc0449e60 x10: 0000000000000bc0 x9 : ffffb0f01f3eb3c0
[ 110.253572] x8 : ffff418bc029dd60 x7 : 0000000000000025 x6 : 00000000ac17d5b7
[ 110.260644] x5 : 00ffffffffffffff x4 : 4000000000000002 x3 : ffff8000080abda0
[ 110.267716] x2 : 4000000000000000 x1 : ffffb0f0207b9b78 x0 : ffffb0f0207b9b78
[ 110.274790] Call trace:
[ 110.277205] ct_kernel_exit.constprop.0+0x98/0xa0
[ 110.281863] ct_idle_enter+0x10/0x20
[ 110.285398] default_idle_call+0x58/0x198
[ 110.289366] cpuidle_idle_call+0x170/0x1c0
[ 110.293420] do_idle+0xb4/0x110
[ 110.296525] cpu_startup_entry+0x30/0x40
[ 110.300405] secondary_start_kernel+0xf0/0x144
[ 110.304805] __secondary_switched+0xb0/0xb4
[ 110.308946] ---[ end trace 0000000000000000 ]---
--
Without deviation from the norm, progress is not possible.
On Thu, Sep 01, 2022 at 07:17:43PM -0700, [email protected] wrote:
> From: Chao Gao <[email protected]>
>
> At init time, KVM does compatibility checks to ensure that all online
> CPUs support hardware virtualization and a common set of features. But
> KVM uses hotplugged CPUs without such compatibility checks. On Intel
> CPUs, this leads to #GP if the hotplugged CPU doesn't support VMX or
> vmentry failure if the hotplugged CPU doesn't meet minimal feature
> requirements.
>
> Do compatibility checks when onlining a CPU and abort the online process
> if the hotplugged CPU is incompatible with online CPUs.
>
> CPU hotplug is disabled during hardware_enable_all() to prevent the corner
> case as shown below. A hotplugged CPU marks itself online in
> cpu_online_mask (1) and enables interrupt (2) before invoking callbacks
> registered in ONLINE section (3). So, if hardware_enable_all() is invoked
> on another CPU right after (2), then on_each_cpu() in hardware_enable_all()
> invokes hardware_enable_nolock() on the hotplugged CPU before
> kvm_online_cpu() is called. This makes the CPU escape from compatibility
> checks, which is risky.
>
> start_secondary { ...
> set_cpu_online(smp_processor_id(), true); <- 1
> ...
> local_irq_enable(); <- 2
> ...
> cpu_startup_entry(CPUHP_AP_ONLINE_IDLE); <- 3
> }
>
> Keep compatibility checks at KVM init time. It can help to find
> incompatibility issues earlier and refuse to load arch KVM module
> (e.g., kvm-intel).
>
> Loosen the WARN_ON in kvm_arch_check_processor_compat so that it
> can be invoked from KVM's CPU hotplug callback (i.e., kvm_online_cpu).
>
> Opportunistically, add a pr_err() for setup_vmcs_config() path in
> vmx_check_processor_compatibility() so that each possible error path has
> its own error message. Convert printk(KERN_ERR ... to pr_err to please
> checkpatch.pl
>
> Signed-off-by: Chao Gao <[email protected]>
> Reviewed-by: Sean Christopherson <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
Reviewed-by: Yuan Yao <[email protected]>
> arch/x86/kvm/vmx/vmx.c | 10 ++++++----
> arch/x86/kvm/x86.c | 11 +++++++++--
> virt/kvm/kvm_main.c | 18 +++++++++++++++++-
> 3 files changed, 32 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 3cf7f18a4115..2a1ab6495299 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7421,20 +7421,22 @@ static int vmx_check_processor_compatibility(void)
> {
> struct vmcs_config vmcs_conf;
> struct vmx_capability vmx_cap;
> + int cpu = smp_processor_id();
>
> if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
> !this_cpu_has(X86_FEATURE_VMX)) {
> - pr_err("kvm: VMX is disabled on CPU %d\n", smp_processor_id());
> + pr_err("kvm: VMX is disabled on CPU %d\n", cpu);
> return -EIO;
> }
>
> - if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
> + if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0) {
> + pr_err("kvm: failed to setup vmcs config on CPU %d\n", cpu);
> return -EIO;
> + }
> if (nested)
> nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept);
> if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
> - printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
> - smp_processor_id());
> + pr_err("kvm: CPU %d feature inconsistency!\n", cpu);
> return -EIO;
> }
> return 0;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 53c8ee677f16..68def7ca224a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12000,9 +12000,16 @@ void kvm_arch_hardware_unsetup(void)
>
> int kvm_arch_check_processor_compat(void)
> {
> - struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
> + int cpu = smp_processor_id();
> + struct cpuinfo_x86 *c = &cpu_data(cpu);
>
> - WARN_ON(!irqs_disabled());
> + /*
> + * Compatibility checks are done when loading KVM or in KVM's CPU
> + * hotplug callback. It ensures all online CPUs are compatible to run
> + * vCPUs. For other cases, compatibility checks are unnecessary or
> + * even problematic. Try to detect improper usages here.
> + */
> + WARN_ON(!irqs_disabled() && cpu_active(cpu));
>
> if (__cr4_reserved_bits(cpu_has, c) !=
> __cr4_reserved_bits(cpu_has, &boot_cpu_data))
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index db1303e2abc9..0ac00c711384 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5013,7 +5013,11 @@ static void hardware_enable_nolock(void *caller_name)
>
> static int kvm_online_cpu(unsigned int cpu)
> {
> - int ret = 0;
> + int ret;
> +
> + ret = kvm_arch_check_processor_compat();
> + if (ret)
> + return ret;
>
> raw_spin_lock(&kvm_count_lock);
> /*
> @@ -5073,6 +5077,17 @@ static int hardware_enable_all(void)
> {
> int r = 0;
>
> + /*
> + * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
> + * is called. on_each_cpu() between them includes the CPU. As a result,
> + * hardware_enable_nolock() may get invoked before kvm_online_cpu().
> + * This would enable hardware virtualization on that cpu without
> + * compatibility checks, which can potentially crash system or break
> + * running VMs.
> + *
> + * Disable CPU hotplug to prevent this case from happening.
> + */
> + cpus_read_lock();
> raw_spin_lock(&kvm_count_lock);
>
> kvm_usage_count++;
> @@ -5087,6 +5102,7 @@ static int hardware_enable_all(void)
> }
>
> raw_spin_unlock(&kvm_count_lock);
> + cpus_read_unlock();
>
> return r;
> }
> --
> 2.25.1
>
On Thu, Sep 01, 2022 at 07:17:45PM -0700, [email protected] wrote:
> From: Isaku Yamahata <[email protected]>
>
> Because kvm_count_lock unnecessarily complicates the KVM locking convention
> Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock for
> simplicity.
>
> Opportunistically add some comments on locking.
>
> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> Documentation/virt/kvm/locking.rst | 14 +++++-------
> virt/kvm/kvm_main.c | 34 ++++++++++++++++++++----------
> 2 files changed, 28 insertions(+), 20 deletions(-)
>
> diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
> index 845a561629f1..8957e32aa724 100644
> --- a/Documentation/virt/kvm/locking.rst
> +++ b/Documentation/virt/kvm/locking.rst
> @@ -216,15 +216,11 @@ time it will be set using the Dirty tracking mechanism described above.
> :Type: mutex
> :Arch: any
> :Protects: - vm_list
> -
> -``kvm_count_lock``
> -^^^^^^^^^^^^^^^^^^
> -
> -:Type: raw_spinlock_t
> -:Arch: any
> -:Protects: - hardware virtualization enable/disable
> -:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
> - migration.
> + - kvm_usage_count
> + - hardware virtualization enable/disable
> +:Comment: Use cpus_read_lock() for hardware virtualization enable/disable
> + because hardware enabling/disabling must be atomic /wrt
> + migration. The lock order is cpus lock => kvm_lock.
>
> ``kvm->mn_invalidate_lock``
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fc55447c4dba..082d5dbc8d7f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -100,7 +100,6 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
> */
>
> DEFINE_MUTEX(kvm_lock);
> -static DEFINE_RAW_SPINLOCK(kvm_count_lock);
> LIST_HEAD(vm_list);
>
> static cpumask_var_t cpus_hardware_enabled;
> @@ -4996,6 +4995,8 @@ static void hardware_enable_nolock(void *caller_name)
> int cpu = raw_smp_processor_id();
> int r;
>
> + WARN_ON_ONCE(preemptible());
This looks incorrect, it may triggers everytime when online CPU.
Because patch 7 moved CPUHP_AP_KVM_STARTING *AFTER*
CPUHP_AP_ONLINE_IDLE as CPUHP_AP_KVM_ONLINE, then cpuhp_thread_fun()
runs the new CPUHP_AP_KVM_ONLINE in *non-atomic* context:
cpuhp_thread_fun(unsigned int cpu) {
...
if (cpuhp_is_atomic_state(state)) {
local_irq_disable();
st->result = cpuhp_invoke_callback(cpu, state, bringup, st->node, &st->last);
local_irq_enable();
WARN_ON_ONCE(st->result);
} else {
st->result = cpuhp_invoke_callback(cpu, state, bringup, st->node, &st->last);
}
...
}
static bool cpuhp_is_atomic_state(enum cpuhp_state state)
{
return CPUHP_AP_IDLE_DEAD <= state && state < CPUHP_AP_ONLINE;
}
The hardware_enable_nolock() now is called in 2 cases:
1. in atomic context by on_each_cpu().
2. From non-atomic context by CPU hotplug thread.
so how about "WARN_ONCE(preemptible() && cpu_active(cpu))" ?
> +
> if (cpumask_test_cpu(cpu, cpus_hardware_enabled))
> return;
>
> @@ -5019,7 +5020,7 @@ static int kvm_online_cpu(unsigned int cpu)
> if (ret)
> return ret;
>
> - raw_spin_lock(&kvm_count_lock);
> + mutex_lock(&kvm_lock);
> /*
> * Abort the CPU online process if hardware virtualization cannot
> * be enabled. Otherwise running VMs would encounter unrecoverable
> @@ -5034,7 +5035,7 @@ static int kvm_online_cpu(unsigned int cpu)
> ret = -EIO;
> }
> }
> - raw_spin_unlock(&kvm_count_lock);
> + mutex_unlock(&kvm_lock);
> return ret;
> }
>
> @@ -5042,6 +5043,8 @@ static void hardware_disable_nolock(void *junk)
> {
> int cpu = raw_smp_processor_id();
>
> + WARN_ON_ONCE(preemptible());
Ditto.
> +
> if (!cpumask_test_cpu(cpu, cpus_hardware_enabled))
> return;
> cpumask_clear_cpu(cpu, cpus_hardware_enabled);
> @@ -5050,10 +5053,10 @@ static void hardware_disable_nolock(void *junk)
>
> static int kvm_offline_cpu(unsigned int cpu)
> {
> - raw_spin_lock(&kvm_count_lock);
> + mutex_lock(&kvm_lock);
> if (kvm_usage_count)
> hardware_disable_nolock(NULL);
> - raw_spin_unlock(&kvm_count_lock);
> + mutex_unlock(&kvm_lock);
> return 0;
> }
>
> @@ -5068,9 +5071,11 @@ static void hardware_disable_all_nolock(void)
>
> static void hardware_disable_all(void)
> {
> - raw_spin_lock(&kvm_count_lock);
> + cpus_read_lock();
> + mutex_lock(&kvm_lock);
> hardware_disable_all_nolock();
> - raw_spin_unlock(&kvm_count_lock);
> + mutex_unlock(&kvm_lock);
> + cpus_read_unlock();
> }
>
> static int hardware_enable_all(void)
> @@ -5088,7 +5093,7 @@ static int hardware_enable_all(void)
> * Disable CPU hotplug to prevent this case from happening.
> */
> cpus_read_lock();
> - raw_spin_lock(&kvm_count_lock);
> + mutex_lock(&kvm_lock);
>
> kvm_usage_count++;
> if (kvm_usage_count == 1) {
> @@ -5101,7 +5106,7 @@ static int hardware_enable_all(void)
> }
> }
>
> - raw_spin_unlock(&kvm_count_lock);
> + mutex_unlock(&kvm_lock);
> cpus_read_unlock();
>
> return r;
> @@ -5708,8 +5713,15 @@ static void kvm_init_debug(void)
>
> static int kvm_suspend(void)
> {
> - if (kvm_usage_count)
> + /*
> + * The caller ensures that CPU hotlug is disabled by
> + * cpu_hotplug_disable() and other CPUs are offlined. No need for
> + * locking.
> + */
> + if (kvm_usage_count) {
> + lockdep_assert_not_held(&kvm_lock);
> hardware_disable_nolock(NULL);
> + }
> return 0;
> }
>
> @@ -5723,7 +5735,7 @@ static void kvm_resume(void)
> return; /* FIXME: disable KVM */
>
> if (kvm_usage_count) {
> - lockdep_assert_not_held(&kvm_count_lock);
> + lockdep_assert_not_held(&kvm_lock);
> hardware_enable_nolock((void *)__func__);
> }
> }
> --
> 2.25.1
>
On Tue, 06 Sep 2022 03:46:43 +0100,
Yuan Yao <[email protected]> wrote:
>
> On Thu, Sep 01, 2022 at 07:17:45PM -0700, [email protected] wrote:
> > From: Isaku Yamahata <[email protected]>
> >
> > Because kvm_count_lock unnecessarily complicates the KVM locking convention
> > Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock for
> > simplicity.
> >
> > Opportunistically add some comments on locking.
> >
> > Suggested-by: Sean Christopherson <[email protected]>
> > Signed-off-by: Isaku Yamahata <[email protected]>
> > ---
> > Documentation/virt/kvm/locking.rst | 14 +++++-------
> > virt/kvm/kvm_main.c | 34 ++++++++++++++++++++----------
> > 2 files changed, 28 insertions(+), 20 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
> > index 845a561629f1..8957e32aa724 100644
> > --- a/Documentation/virt/kvm/locking.rst
> > +++ b/Documentation/virt/kvm/locking.rst
> > @@ -216,15 +216,11 @@ time it will be set using the Dirty tracking mechanism described above.
> > :Type: mutex
> > :Arch: any
> > :Protects: - vm_list
> > -
> > -``kvm_count_lock``
> > -^^^^^^^^^^^^^^^^^^
> > -
> > -:Type: raw_spinlock_t
> > -:Arch: any
> > -:Protects: - hardware virtualization enable/disable
> > -:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
> > - migration.
> > + - kvm_usage_count
> > + - hardware virtualization enable/disable
> > +:Comment: Use cpus_read_lock() for hardware virtualization enable/disable
> > + because hardware enabling/disabling must be atomic /wrt
> > + migration. The lock order is cpus lock => kvm_lock.
> >
> > ``kvm->mn_invalidate_lock``
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index fc55447c4dba..082d5dbc8d7f 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -100,7 +100,6 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
> > */
> >
> > DEFINE_MUTEX(kvm_lock);
> > -static DEFINE_RAW_SPINLOCK(kvm_count_lock);
> > LIST_HEAD(vm_list);
> >
> > static cpumask_var_t cpus_hardware_enabled;
> > @@ -4996,6 +4995,8 @@ static void hardware_enable_nolock(void *caller_name)
> > int cpu = raw_smp_processor_id();
> > int r;
> >
> > + WARN_ON_ONCE(preemptible());
>
> This looks incorrect, it may triggers everytime when online CPU.
> Because patch 7 moved CPUHP_AP_KVM_STARTING *AFTER*
> CPUHP_AP_ONLINE_IDLE as CPUHP_AP_KVM_ONLINE, then cpuhp_thread_fun()
> runs the new CPUHP_AP_KVM_ONLINE in *non-atomic* context:
>
> cpuhp_thread_fun(unsigned int cpu) {
> ...
> if (cpuhp_is_atomic_state(state)) {
> local_irq_disable();
> st->result = cpuhp_invoke_callback(cpu, state, bringup, st->node, &st->last);
> local_irq_enable();
>
> WARN_ON_ONCE(st->result);
> } else {
> st->result = cpuhp_invoke_callback(cpu, state, bringup, st->node, &st->last);
> }
> ...
> }
>
> static bool cpuhp_is_atomic_state(enum cpuhp_state state)
> {
> return CPUHP_AP_IDLE_DEAD <= state && state < CPUHP_AP_ONLINE;
> }
>
> The hardware_enable_nolock() now is called in 2 cases:
> 1. in atomic context by on_each_cpu().
> 2. From non-atomic context by CPU hotplug thread.
>
> so how about "WARN_ONCE(preemptible() && cpu_active(cpu))" ?
I suspect similar changes must be applied to the arm64 side (though
I'm still looking for a good definition of cpu_active()).
M.
--
Without deviation from the norm, progress is not possible.
On Thu, Sep 01, 2022 at 07:17:46PM -0700, [email protected] wrote:
> From: Isaku Yamahata <[email protected]>
>
> Add arch hooks for reboot, suspend, resume, and CPU-online/offline events
> with empty stub functions.
>
> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> include/linux/kvm_host.h | 6 +++++
> virt/kvm/Makefile.kvm | 2 +-
> virt/kvm/kvm_arch.c | 44 +++++++++++++++++++++++++++++++++++
> virt/kvm/kvm_main.c | 50 +++++++++++++++++++++++++---------------
> 4 files changed, 82 insertions(+), 20 deletions(-)
> create mode 100644 virt/kvm/kvm_arch.c
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index eab352902de7..dd2a6d98d4de 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1448,6 +1448,12 @@ int kvm_arch_post_init_vm(struct kvm *kvm);
> void kvm_arch_pre_destroy_vm(struct kvm *kvm);
> int kvm_arch_create_vm_debugfs(struct kvm *kvm);
>
> +int kvm_arch_suspend(int usage_count);
> +void kvm_arch_resume(int usage_count);
> +int kvm_arch_reboot(int val);
> +int kvm_arch_online_cpu(unsigned int cpu, int usage_count);
> +int kvm_arch_offline_cpu(unsigned int cpu, int usage_count);
> +
> #ifndef __KVM_HAVE_ARCH_VM_ALLOC
> /*
> * All architectures that want to use vzalloc currently also
> diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
> index 2c27d5d0c367..c4210acabd35 100644
> --- a/virt/kvm/Makefile.kvm
> +++ b/virt/kvm/Makefile.kvm
> @@ -5,7 +5,7 @@
>
> KVM ?= ../../../virt/kvm
>
> -kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
> +kvm-y := $(KVM)/kvm_main.o $(KVM)/kvm_arch.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
> kvm-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
> kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
> kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
> diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
> new file mode 100644
> index 000000000000..4748a76bcb03
> --- /dev/null
> +++ b/virt/kvm/kvm_arch.c
> @@ -0,0 +1,44 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * kvm_arch.c: kvm default arch hooks for hardware enabling/disabling
> + * Copyright (c) 2022 Intel Corporation.
> + *
> + * Author:
> + * Isaku Yamahata <[email protected]>
> + * <[email protected]>
> + */
> +
> +#include <linux/kvm_host.h>
> +
> +/*
> + * Called after the VM is otherwise initialized, but just before adding it to
> + * the vm_list.
> + */
> +__weak int kvm_arch_post_init_vm(struct kvm *kvm)
> +{
> + return 0;
> +}
> +
> +__weak int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
> +{
> + return 0;
> +}
> +
> +__weak int kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
> +{
> + return 0;
> +}
> +
> +__weak int kvm_arch_reboot(int val)
> +{
> + return NOTIFY_OK;
> +}
> +
> +__weak int kvm_arch_suspend(int usage_count)
> +{
> + return 0;
> +}
> +
> +__weak void kvm_arch_resume(int usage_count)
> +{
> +}
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 082d5dbc8d7f..e62240fb8474 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -144,6 +144,7 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
> #endif
> static int hardware_enable_all(void);
> static void hardware_disable_all(void);
> +static void hardware_disable_nolock(void *junk);
>
> static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>
> @@ -1097,15 +1098,6 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, const char *fdname)
> return ret;
> }
>
> -/*
> - * Called after the VM is otherwise initialized, but just before adding it to
> - * the vm_list.
> - */
> -int __weak kvm_arch_post_init_vm(struct kvm *kvm)
> -{
> - return 0;
> -}
> -
> /*
> * Called just after removing the VM from the vm_list, but before doing any
> * other destruction.
> @@ -5033,6 +5025,10 @@ static int kvm_online_cpu(unsigned int cpu)
> if (atomic_read(&hardware_enable_failed)) {
> atomic_set(&hardware_enable_failed, 0);
> ret = -EIO;
> + } else {
> + ret = kvm_arch_online_cpu(cpu, kvm_usage_count);
> + if (ret)
> + hardware_disable_nolock(NULL);
> }
> }
> mutex_unlock(&kvm_lock);
> @@ -5053,11 +5049,19 @@ static void hardware_disable_nolock(void *junk)
>
> static int kvm_offline_cpu(unsigned int cpu)
> {
> + int ret = 0;
> +
> mutex_lock(&kvm_lock);
> - if (kvm_usage_count)
> + if (kvm_usage_count) {
> hardware_disable_nolock(NULL);
> + ret = kvm_arch_offline_cpu(cpu, kvm_usage_count);
> + if (ret) {
> + (void)hardware_enable_nolock(NULL);
> + atomic_set(&hardware_enable_failed, 0);
> + }
> + }
> mutex_unlock(&kvm_lock);
> - return 0;
> + return ret;
> }
>
> static void hardware_disable_all_nolock(void)
> @@ -5115,6 +5119,8 @@ static int hardware_enable_all(void)
> static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
> void *v)
> {
> + int r;
> +
> /*
> * Some (well, at least mine) BIOSes hang on reboot if
> * in vmx root mode.
> @@ -5123,8 +5129,15 @@ static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
> */
> pr_info("kvm: exiting hardware virtualization\n");
> kvm_rebooting = true;
> +
> + /* This hook is called without cpuhotplug disabled. */
> + cpus_read_lock();
> + mutex_lock(&kvm_lock);
> on_each_cpu(hardware_disable_nolock, NULL, 1);
> - return NOTIFY_OK;
> + r = kvm_arch_reboot(val);
> + mutex_unlock(&kvm_lock);
> + cpus_read_unlock();
> + return r;
> }
>
> static struct notifier_block kvm_reboot_notifier = {
> @@ -5718,11 +5731,10 @@ static int kvm_suspend(void)
> * cpu_hotplug_disable() and other CPUs are offlined. No need for
> * locking.
> */
> - if (kvm_usage_count) {
> - lockdep_assert_not_held(&kvm_lock);
> + lockdep_assert_not_held(&kvm_lock);
> + if (kvm_usage_count)
> hardware_disable_nolock(NULL);
> - }
> - return 0;
> + return kvm_arch_suspend(kvm_usage_count);
> }
>
e static void kvm_resume(void)
> @@ -5734,10 +5746,10 @@ static void kvm_resume(void)
> */
> return; /* FIXME: disable KVM */
>
> - if (kvm_usage_count) {
> - lockdep_assert_not_held(&kvm_lock);
> + lockdep_assert_not_held(&kvm_lock);
> + kvm_arch_resume(kvm_usage_count);
> + if (kvm_usage_count)
> hardware_enable_nolock((void *)__func__);
Is single kvm_arch_{suspend,resume} enough?
The sequence is:
kvm_arch_resume()
hardware_enable_nolock()
But in patch 12 I see below code in x86's kvm_arch_resume():
...
if (!usage_count)
return;
if (kvm_arch_hardware_enable())
return;
...
So kvm_arch_resume() may depend on hardware enable, and it checks the
usage_count again, how about call kvm_arch_resume(bool
hardware_enabled) before/after the hardware_enable_nolock() (or even
give different functions with _before/_after suffix) for better
flexibility since it's common code for all architectures, e.g.
if (kvm_usage_count) {
kvm_arch_resume(false);
hardware_enable_nolock(__func__);
kvm_arch_resumt(true);
}
> - }
> }
>
> static struct syscore_ops kvm_syscore_ops = {
> --
> 2.25.1
>
On Thu, Sep 01, 2022 at 07:17:49PM -0700, [email protected] wrote:
> From: Isaku Yamahata <[email protected]>
>
> To make clear that those files are default implementation that KVM/x86 (and
> other KVM arch in future) will override them, split out those into a single
> file. Once conversions for all kvm archs are done, the file will be
> deleted. kvm_arch_pre_hardware_unsetup() is introduced to avoid cross-arch
> code churn for now. Once it's settled down,
> kvm_arch_pre_hardware_unsetup() can be merged into
> kvm_arch_hardware_unsetup() in each arch code.
>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> include/linux/kvm_host.h | 1 +
> virt/kvm/kvm_arch.c | 103 ++++++++++++++++++++++-
> virt/kvm/kvm_main.c | 172 +++++----------------------------------
> 3 files changed, 124 insertions(+), 152 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index f78364e01ca9..60f4ae9d6f48 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1437,6 +1437,7 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
> int kvm_arch_hardware_enable(void);
> void kvm_arch_hardware_disable(void);
> int kvm_arch_hardware_setup(void *opaque);
> +void kvm_arch_pre_hardware_unsetup(void);
> void kvm_arch_hardware_unsetup(void);
> int kvm_arch_check_processor_compat(void);
> int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
> index 0eac996f4981..0648d4463d9e 100644
> --- a/virt/kvm/kvm_arch.c
> +++ b/virt/kvm/kvm_arch.c
> @@ -6,49 +6,148 @@
> * Author:
> * Isaku Yamahata <[email protected]>
> * <[email protected]>
> + *
> + * TODO: Delete this file once the conversion of all KVM arch is done.
> */
>
> #include <linux/kvm_host.h>
>
> +static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
> +static atomic_t hardware_enable_failed;
> +
> __weak int kvm_arch_post_init_vm(struct kvm *kvm)
> {
> return 0;
> }
>
> +static void hardware_enable_nolock(void *caller_name)
> +{
> + int cpu = raw_smp_processor_id();
> + int r;
> +
> + WARN_ON_ONCE(preemptible());
> +
> + if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
> + return;
> +
> + cpumask_set_cpu(cpu, &cpus_hardware_enabled);
> +
> + r = kvm_arch_hardware_enable();
> +
> + if (r) {
> + cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
> + atomic_inc(&hardware_enable_failed);
> + pr_warn("kvm: enabling virtualization on CPU%d failed during %s()\n",
> + cpu, (const char *)caller_name);
> + }
> +}
> +
> +static void hardware_disable_nolock(void *junk)
> +{
> + int cpu = raw_smp_processor_id();
> +
> + WARN_ON_ONCE(preemptible());
> +
> + if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
> + return;
> + cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
> + kvm_arch_hardware_disable();
> +}
> +
> +__weak void kvm_arch_pre_hardware_unsetup(void)
> +{
> + on_each_cpu(hardware_disable_nolock, NULL, 1);
> +}
> +
> /*
> * Called after the VM is otherwise initialized, but just before adding it to
> * the vm_list.
> */
> __weak int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
> {
> - return kvm_arch_post_init_vm(kvm);
> + int r = 0;
> +
> + if (usage_count != 1)
> + return 0;
> +
> + atomic_set(&hardware_enable_failed, 0);
> + on_each_cpu(hardware_enable_nolock, (void *)__func__, 1);
This function is called in kvm_create_vm:
kvm_create_vm {
...
enable_hardware_all()
...
kvm_arch_add_vm()
...
}
so don't need on_each_cpu(enable_hardware_nolock) here, or the
enable_hardware_all() shuold be removed from kvm_create_vm().
> +
> + if (atomic_read(&hardware_enable_failed)) {
> + r = -EBUSY;
> + goto err;
> + }
> +
> + r = kvm_arch_post_init_vm(kvm);
> +err:
> + if (r)
> + on_each_cpu(hardware_disable_nolock, NULL, 1);
> + return r;
> }
>
> __weak int kvm_arch_del_vm(int usage_count)
> {
> + if (usage_count)
> + return 0;
> +
> + on_each_cpu(hardware_disable_nolock, NULL, 1);
Same pattern as above:
kvm_destory_vm {
...
disable_hardware_all()
...
kvm_arch_del_vm()
...
}
> return 0;
> }
>
> __weak int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
> {
> - return 0;
> + int ret = 0;
> +
> + ret = kvm_arch_check_processor_compat();
> + if (ret)
> + return ret;
> +
> + /*
> + * Abort the CPU online process if hardware virtualization cannot
> + * be enabled. Otherwise running VMs would encounter unrecoverable
> + * errors when scheduled to this CPU.
> + */
> + if (usage_count) {
> + WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
> +
> + hardware_enable_nolock((void *)__func__);
> + if (atomic_read(&hardware_enable_failed)) {
> + atomic_set(&hardware_enable_failed, 0);
> + ret = -EIO;
> + }
> + }
> + return ret;
> }
>
> __weak int kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
> {
> + if (usage_count)
> + hardware_disable_nolock(NULL);
> return 0;
> }
>
> __weak int kvm_arch_reboot(int val)
> {
> + on_each_cpu(hardware_disable_nolock, NULL, 1);
> return NOTIFY_OK;
> }
>
> __weak int kvm_arch_suspend(int usage_count)
> {
> + if (usage_count)
> + hardware_disable_nolock(NULL);
> return 0;
> }
>
> __weak void kvm_arch_resume(int usage_count)
> {
> + if (kvm_arch_check_processor_compat())
> + /*
> + * No warning here because kvm_arch_check_processor_compat()
> + * would have warned with more information.
> + */
> + return; /* FIXME: disable KVM */
> +
> + if (usage_count)
> + hardware_enable_nolock((void *)__func__);
> }
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 90e1dcfc9ace..5373127dcdb6 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -102,9 +102,7 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
> DEFINE_MUTEX(kvm_lock);
> LIST_HEAD(vm_list);
>
> -static cpumask_var_t cpus_hardware_enabled;
> static int kvm_usage_count;
> -static atomic_t hardware_enable_failed;
>
> static struct kmem_cache *kvm_vcpu_cache;
>
> @@ -142,9 +140,6 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
> #define KVM_COMPAT(c) .compat_ioctl = kvm_no_compat_ioctl, \
> .open = kvm_no_compat_open
> #endif
> -static int hardware_enable_all(void);
> -static void hardware_disable_all(void);
> -static void hardware_disable_nolock(void *junk);
> static void kvm_del_vm(void);
>
> static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
> @@ -1196,10 +1191,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> if (r)
> goto out_err_no_arch_destroy_vm;
>
> - r = hardware_enable_all();
> - if (r)
> - goto out_err_no_disable;
> -
> #ifdef CONFIG_HAVE_KVM_IRQFD
> INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list);
> #endif
> @@ -1216,14 +1207,28 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> if (r)
> goto out_err_no_debugfs;
>
> + /*
> + * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
> + * is called. on_each_cpu() between them includes the CPU. As a result,
> + * hardware_enable_nolock() may get invoked before kvm_online_cpu().
> + * This would enable hardware virtualization on that cpu without
> + * compatibility checks, which can potentially crash system or break
> + * running VMs.
> + *
> + * Disable CPU hotplug to prevent this case from happening.
> + */
> + cpus_read_lock();
> mutex_lock(&kvm_lock);
> + kvm_usage_count++;
> r = kvm_arch_add_vm(kvm, kvm_usage_count);
> if (r) {
> + /* the following kvm_del_vm() decrements kvm_usage_count. */
> mutex_unlock(&kvm_lock);
> goto out_err;
> }
> list_add(&kvm->vm_list, &vm_list);
> mutex_unlock(&kvm_lock);
> + cpus_read_unlock();
>
> preempt_notifier_inc();
> kvm_init_pm_notifier(kvm);
> @@ -1240,9 +1245,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
> #endif
> out_err_no_mmu_notifier:
> - hardware_disable_all();
> kvm_del_vm();
> -out_err_no_disable:
> kvm_arch_destroy_vm(kvm);
> out_err_no_arch_destroy_vm:
> WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
> @@ -1321,7 +1324,6 @@ static void kvm_destroy_vm(struct kvm *kvm)
> cleanup_srcu_struct(&kvm->srcu);
> kvm_arch_free_vm(kvm);
> preempt_notifier_dec();
> - hardware_disable_all();
> kvm_del_vm();
> mmdrop(mm);
> module_put(kvm_chardev_ops.owner);
> @@ -4986,149 +4988,37 @@ static struct miscdevice kvm_dev = {
> &kvm_chardev_ops,
> };
>
> -static void hardware_enable_nolock(void *caller_name)
> -{
> - int cpu = raw_smp_processor_id();
> - int r;
> -
> - WARN_ON_ONCE(preemptible());
> -
> - if (cpumask_test_cpu(cpu, cpus_hardware_enabled))
> - return;
> -
> - cpumask_set_cpu(cpu, cpus_hardware_enabled);
> -
> - r = kvm_arch_hardware_enable();
> -
> - if (r) {
> - cpumask_clear_cpu(cpu, cpus_hardware_enabled);
> - atomic_inc(&hardware_enable_failed);
> - pr_warn("kvm: enabling virtualization on CPU%d failed during %s()\n",
> - cpu, (const char *)caller_name);
> - }
> -}
> -
> static int kvm_online_cpu(unsigned int cpu)
> {
> int ret;
>
> - ret = kvm_arch_check_processor_compat();
> - if (ret)
> - return ret;
> -
> mutex_lock(&kvm_lock);
> - /*
> - * Abort the CPU online process if hardware virtualization cannot
> - * be enabled. Otherwise running VMs would encounter unrecoverable
> - * errors when scheduled to this CPU.
> - */
> - if (kvm_usage_count) {
> - WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
> -
> - hardware_enable_nolock((void *)__func__);
> - if (atomic_read(&hardware_enable_failed)) {
> - atomic_set(&hardware_enable_failed, 0);
> - ret = -EIO;
> - } else {
> - ret = kvm_arch_online_cpu(cpu, kvm_usage_count);
> - if (ret)
> - hardware_disable_nolock(NULL);
> - }
> - }
> + ret = kvm_arch_online_cpu(cpu, kvm_usage_count);
> mutex_unlock(&kvm_lock);
> return ret;
> }
>
> -static void hardware_disable_nolock(void *junk)
> -{
> - int cpu = raw_smp_processor_id();
> -
> - WARN_ON_ONCE(preemptible());
> -
> - if (!cpumask_test_cpu(cpu, cpus_hardware_enabled))
> - return;
> - cpumask_clear_cpu(cpu, cpus_hardware_enabled);
> - kvm_arch_hardware_disable();
> -}
> -
> static int kvm_offline_cpu(unsigned int cpu)
> {
> - int ret = 0;
> + int ret;
>
> mutex_lock(&kvm_lock);
> - if (kvm_usage_count) {
> - hardware_disable_nolock(NULL);
> - ret = kvm_arch_offline_cpu(cpu, kvm_usage_count);
> - if (ret) {
> - (void)hardware_enable_nolock(NULL);
> - atomic_set(&hardware_enable_failed, 0);
> - }
> - }
> + ret = kvm_arch_offline_cpu(cpu, kvm_usage_count);
> mutex_unlock(&kvm_lock);
> return ret;
> }
>
> -static void hardware_disable_all_nolock(void)
> -{
> - BUG_ON(!kvm_usage_count);
> -
> - kvm_usage_count--;
> - if (!kvm_usage_count)
> - on_each_cpu(hardware_disable_nolock, NULL, 1);
> -}
> -
> -static void hardware_disable_all(void)
> -{
> - cpus_read_lock();
> - mutex_lock(&kvm_lock);
> - hardware_disable_all_nolock();
> - mutex_unlock(&kvm_lock);
> - cpus_read_unlock();
> -}
> -
> static void kvm_del_vm(void)
> {
> cpus_read_lock();
> mutex_lock(&kvm_lock);
> + WARN_ON_ONCE(!kvm_usage_count);
> + kvm_usage_count--;
> kvm_arch_del_vm(kvm_usage_count);
> mutex_unlock(&kvm_lock);
> cpus_read_unlock();
> }
>
> -static int hardware_enable_all(void)
> -{
> - int r = 0;
> -
> - /*
> - * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
> - * is called. on_each_cpu() between them includes the CPU. As a result,
> - * hardware_enable_nolock() may get invoked before kvm_online_cpu().
> - * This would enable hardware virtualization on that cpu without
> - * compatibility checks, which can potentially crash system or break
> - * running VMs.
> - *
> - * Disable CPU hotplug to prevent this case from happening.
> - */
> - cpus_read_lock();
> - mutex_lock(&kvm_lock);
> -
> - kvm_usage_count++;
> - if (kvm_usage_count == 1) {
> - atomic_set(&hardware_enable_failed, 0);
> - on_each_cpu(hardware_enable_nolock, (void *)__func__, 1);
> -
> - if (atomic_read(&hardware_enable_failed)) {
> - hardware_disable_all_nolock();
> - r = -EBUSY;
> - }
> - }
> -
> - mutex_unlock(&kvm_lock);
> - cpus_read_unlock();
> -
> - return r;
> -}
> -
> static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
> void *v)
> {
> @@ -5146,7 +5036,6 @@ static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
> /* This hook is called without cpuhotplug disabled. */
> cpus_read_lock();
> mutex_lock(&kvm_lock);
> - on_each_cpu(hardware_disable_nolock, NULL, 1);
> r = kvm_arch_reboot(val);
> mutex_unlock(&kvm_lock);
> cpus_read_unlock();
> @@ -5745,24 +5634,13 @@ static int kvm_suspend(void)
> * locking.
> */
> lockdep_assert_not_held(&kvm_lock);
> - if (kvm_usage_count)
> - hardware_disable_nolock(NULL);
> return kvm_arch_suspend(kvm_usage_count);
> }
>
> static void kvm_resume(void)
> {
> - if (kvm_arch_check_processor_compat())
> - /*
> - * No warning here because kvm_arch_check_processor_compat()
> - * would have warned with more information.
> - */
> - return; /* FIXME: disable KVM */
> -
> lockdep_assert_not_held(&kvm_lock);
> kvm_arch_resume(kvm_usage_count);
> - if (kvm_usage_count)
> - hardware_enable_nolock((void *)__func__);
> }
>
> static struct syscore_ops kvm_syscore_ops = {
> @@ -5900,11 +5778,6 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
> if (r)
> goto out_irqfd;
>
> - if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL)) {
> - r = -ENOMEM;
> - goto out_free_0;
> - }
> -
> r = kvm_arch_hardware_setup(opaque);
> if (r < 0)
> goto out_free_1;
> @@ -5981,8 +5854,6 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
> out_free_2:
> kvm_arch_hardware_unsetup();
> out_free_1:
> - free_cpumask_var(cpus_hardware_enabled);
> -out_free_0:
> kvm_irqfd_exit();
> out_irqfd:
> kvm_arch_exit();
> @@ -6004,11 +5875,12 @@ void kvm_exit(void)
> unregister_syscore_ops(&kvm_syscore_ops);
> unregister_reboot_notifier(&kvm_reboot_notifier);
> cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
> - on_each_cpu(hardware_disable_nolock, NULL, 1);
> + cpus_read_lock();
> + kvm_arch_pre_hardware_unsetup();
> kvm_arch_hardware_unsetup();
> + cpus_read_unlock();
> kvm_arch_exit();
> kvm_irqfd_exit();
> - free_cpumask_var(cpus_hardware_enabled);
> kvm_vfio_ops_exit();
> }
> EXPORT_SYMBOL_GPL(kvm_exit);
> --
> 2.25.1
>
On Tue, Sep 06, 2022 at 07:32:22AM +0100,
Marc Zyngier <[email protected]> wrote:
> On Tue, 06 Sep 2022 03:46:43 +0100,
> Yuan Yao <[email protected]> wrote:
> >
> > On Thu, Sep 01, 2022 at 07:17:45PM -0700, [email protected] wrote:
> > > From: Isaku Yamahata <[email protected]>
> > >
> > > Because kvm_count_lock unnecessarily complicates the KVM locking convention
> > > Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock for
> > > simplicity.
> > >
> > > Opportunistically add some comments on locking.
> > >
> > > Suggested-by: Sean Christopherson <[email protected]>
> > > Signed-off-by: Isaku Yamahata <[email protected]>
> > > ---
> > > Documentation/virt/kvm/locking.rst | 14 +++++-------
> > > virt/kvm/kvm_main.c | 34 ++++++++++++++++++++----------
> > > 2 files changed, 28 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
> > > index 845a561629f1..8957e32aa724 100644
> > > --- a/Documentation/virt/kvm/locking.rst
> > > +++ b/Documentation/virt/kvm/locking.rst
> > > @@ -216,15 +216,11 @@ time it will be set using the Dirty tracking mechanism described above.
> > > :Type: mutex
> > > :Arch: any
> > > :Protects: - vm_list
> > > -
> > > -``kvm_count_lock``
> > > -^^^^^^^^^^^^^^^^^^
> > > -
> > > -:Type: raw_spinlock_t
> > > -:Arch: any
> > > -:Protects: - hardware virtualization enable/disable
> > > -:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
> > > - migration.
> > > + - kvm_usage_count
> > > + - hardware virtualization enable/disable
> > > +:Comment: Use cpus_read_lock() for hardware virtualization enable/disable
> > > + because hardware enabling/disabling must be atomic /wrt
> > > + migration. The lock order is cpus lock => kvm_lock.
> > >
> > > ``kvm->mn_invalidate_lock``
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index fc55447c4dba..082d5dbc8d7f 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -100,7 +100,6 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
> > > */
> > >
> > > DEFINE_MUTEX(kvm_lock);
> > > -static DEFINE_RAW_SPINLOCK(kvm_count_lock);
> > > LIST_HEAD(vm_list);
> > >
> > > static cpumask_var_t cpus_hardware_enabled;
> > > @@ -4996,6 +4995,8 @@ static void hardware_enable_nolock(void *caller_name)
> > > int cpu = raw_smp_processor_id();
> > > int r;
> > >
> > > + WARN_ON_ONCE(preemptible());
> >
> > This looks incorrect, it may triggers everytime when online CPU.
> > Because patch 7 moved CPUHP_AP_KVM_STARTING *AFTER*
> > CPUHP_AP_ONLINE_IDLE as CPUHP_AP_KVM_ONLINE, then cpuhp_thread_fun()
> > runs the new CPUHP_AP_KVM_ONLINE in *non-atomic* context:
> >
> > cpuhp_thread_fun(unsigned int cpu) {
> > ...
> > if (cpuhp_is_atomic_state(state)) {
> > local_irq_disable();
> > st->result = cpuhp_invoke_callback(cpu, state, bringup, st->node, &st->last);
> > local_irq_enable();
> >
> > WARN_ON_ONCE(st->result);
> > } else {
> > st->result = cpuhp_invoke_callback(cpu, state, bringup, st->node, &st->last);
> > }
> > ...
> > }
> >
> > static bool cpuhp_is_atomic_state(enum cpuhp_state state)
> > {
> > return CPUHP_AP_IDLE_DEAD <= state && state < CPUHP_AP_ONLINE;
> > }
> >
> > The hardware_enable_nolock() now is called in 2 cases:
> > 1. in atomic context by on_each_cpu().
> > 2. From non-atomic context by CPU hotplug thread.
> >
> > so how about "WARN_ONCE(preemptible() && cpu_active(cpu))" ?
>
> I suspect similar changes must be applied to the arm64 side (though
> I'm still looking for a good definition of cpu_active()).
It seems plausible. I tested cpu online/offline on x86. Let me update arm64 code
too.
--
Isaku Yamahata <[email protected]>
On Mon, Sep 05, 2022 at 04:38:39PM +0100,
Marc Zyngier <[email protected]> wrote:
> On Fri, 02 Sep 2022 03:17:35 +0100,
> [email protected] wrote:
> >
> > From: Isaku Yamahata <[email protected]>
> >
> > Changes from v2:
> > - Replace the first patch("KVM: x86: Drop kvm_user_return_msr_cpu_online()")
> > with Sean's implementation
> > - Included all patches of "Improve KVM's interaction with CPU hotplug" [2]
> > Until v2, Tried to cherry-pick the least patches of it. It turned out that
> > all the patches are desirable.
> >
> > This patch series is to implement the suggestion by Sean Christopherson [1]
> > to reorganize enable/disable cpu virtualization feature by replacing
> > the arch-generic current enable/disable logic with PM related hooks. And
> > convert kvm/x86 to use new hooks.
>
> This series totally breaks on arm64 when playing with CPU hotplug. It
> very much looks like preemption is now enabled in situations where we
> don't expect it to (see below for the full-blown horror show). And
> given the way it shows up in common code, I strongly suspect this
> affects other architectures too.
>
> Note that if I only take patch #6 (with the subsequent fix that I
> posted this morning), the system is perfectly happy with CPUs being
> hotplugged on/off ad-nauseam.
>
Thanks for testing. As the discussion in 10/22, it seems like we need to relax
the condition of WARN_ON or BUG_ON of preemptible(). Let me cook a version
to relax it.
--
Isaku Yamahata <[email protected]>
On Thu, Sep 01, 2022 at 07:17:51PM -0700, [email protected] wrote:
> From: Isaku Yamahata <[email protected]>
>
> A global variable hardware_enable_failed in kvm_arch.c is used only by
> kvm_arch_add_vm() and hardware_enable(). It doesn't have to be a global
> variable. Make it function local.
>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> virt/kvm/kvm_arch.c | 49 +++++++++++++++++++++------------------------
> 1 file changed, 23 insertions(+), 26 deletions(-)
>
> diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
> index de39d0127584..f7dcde842eb5 100644
> --- a/virt/kvm/kvm_arch.c
> +++ b/virt/kvm/kvm_arch.c
> @@ -13,14 +13,13 @@
> #include <linux/kvm_host.h>
>
> static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
> -static atomic_t hardware_enable_failed;
>
> __weak int kvm_arch_post_init_vm(struct kvm *kvm)
> {
> return 0;
> }
>
> -static void hardware_enable(void *caller_name)
> +static int __hardware_enable(void *caller_name)
> {
> int cpu = raw_smp_processor_id();
> int r;
> @@ -28,18 +27,22 @@ static void hardware_enable(void *caller_name)
> WARN_ON_ONCE(preemptible());
>
> if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
> - return;
> -
> - cpumask_set_cpu(cpu, &cpus_hardware_enabled);
> -
> + return 0;
> r = kvm_arch_hardware_enable();
> -
> - if (r) {
> - cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
> - atomic_inc(&hardware_enable_failed);
> + if (r)
> pr_warn("kvm: enabling virtualization on CPU%d failed during %s()\n",
> cpu, (const char *)caller_name);
> - }
> + else
> + cpumask_set_cpu(cpu, &cpus_hardware_enabled);
> + return r;
> +}
> +
> +static void hardware_enable(void *arg)
> +{
> + atomic_t *failed = arg;
> +
> + if (__hardware_enable((void *)__func__))
> + atomic_inc(failed);
> }
A side effect: The actual caller_name information introduced in Patch
3 for hardware_enable() is lost. I tend to keep the information, but
depends on you and other guys. :-)
>
> static void hardware_disable(void *junk)
> @@ -65,15 +68,16 @@ __weak void kvm_arch_pre_hardware_unsetup(void)
> */
> __weak int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
> {
> + atomic_t failed;
> int r = 0;
>
> if (usage_count != 1)
> return 0;
>
> - atomic_set(&hardware_enable_failed, 0);
> - on_each_cpu(hardware_enable, (void *)__func__, 1);
> + atomic_set(&failed, 0);
> + on_each_cpu(hardware_enable, &failed, 1);
>
> - if (atomic_read(&hardware_enable_failed)) {
> + if (atomic_read(&failed)) {
> r = -EBUSY;
> goto err;
> }
> @@ -96,27 +100,20 @@ __weak int kvm_arch_del_vm(int usage_count)
>
> __weak int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
> {
> - int ret = 0;
> + int ret;
>
> ret = kvm_arch_check_processor_compat();
> if (ret)
> return ret;
>
> + if (!usage_count)
> + return 0;
> /*
> * Abort the CPU online process if hardware virtualization cannot
> * be enabled. Otherwise running VMs would encounter unrecoverable
> * errors when scheduled to this CPU.
> */
> - if (usage_count) {
> - WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
> -
> - hardware_enable((void *)__func__);
> - if (atomic_read(&hardware_enable_failed)) {
> - atomic_set(&hardware_enable_failed, 0);
> - ret = -EIO;
> - }
> - }
> - return ret;
> + return __hardware_enable((void *)__func__);
> }
>
> __weak int kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
> @@ -149,5 +146,5 @@ __weak void kvm_arch_resume(int usage_count)
> return; /* FIXME: disable KVM */
>
> if (usage_count)
> - hardware_enable((void *)__func__);
> + (void)__hardware_enable((void *)__func__);
> }
> --
> 2.25.1
>
On Mon, Sep 05, 2022 at 01:39:20PM +0100,
Marc Zyngier <[email protected]> wrote:
> On 2022-09-05 10:29, Marc Zyngier wrote:
> > On Mon, 05 Sep 2022 08:05:09 +0100,
> > Yuan Yao <[email protected]> wrote:
> > >
> > > On Thu, Sep 01, 2022 at 07:17:41PM -0700, [email protected]
> > > wrote:
> > > > From: Marc Zyngier <[email protected]>
> > > >
> > > > For a number of historical reasons, the KVM/arm64 hotplug setup is pretty
> > > > complicated, and we have two extra CPUHP notifiers for vGIC and timers.
> > > >
> > > > It looks pretty pointless, and gets in the way of further changes.
> > > > So let's just expose some helpers that can be called from the core
> > > > CPUHP callback, and get rid of everything else.
> > > >
> > > > This gives us the opportunity to drop a useless notifier entry,
> > > > as well as tidy-up the timer enable/disable, which was a bit odd.
> > > >
> > > > Signed-off-by: Marc Zyngier <[email protected]>
> > > > Signed-off-by: Chao Gao <[email protected]>
> > > > Reviewed-by: Oliver Upton <[email protected]>
> > > > Link: https://lore.kernel.org/r/[email protected]
> > > > Signed-off-by: Isaku Yamahata <[email protected]>
> > > > ---
> > > > arch/arm64/kvm/arch_timer.c | 27 ++++++++++-----------------
> > > > arch/arm64/kvm/arm.c | 4 ++++
> > > > arch/arm64/kvm/vgic/vgic-init.c | 19 ++-----------------
> > > > include/kvm/arm_arch_timer.h | 4 ++++
> > > > include/kvm/arm_vgic.h | 4 ++++
> > > > include/linux/cpuhotplug.h | 3 ---
> > > > 6 files changed, 24 insertions(+), 37 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> > > > index bb24a76b4224..33fca1a691a5 100644
> > > > --- a/arch/arm64/kvm/arch_timer.c
> > > > +++ b/arch/arm64/kvm/arch_timer.c
> > > > @@ -811,10 +811,18 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
> > > > ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
> > > > }
> > > >
> > > > -static void kvm_timer_init_interrupt(void *info)
> > > > +void kvm_timer_cpu_up(void)
> > > > {
> > > > enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
> > > > - enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags);
> > > > + if (host_ptimer_irq)
> > > > + enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags);
> > > > +}
> > > > +
> > > > +void kvm_timer_cpu_down(void)
> > > > +{
> > > > + disable_percpu_irq(host_vtimer_irq);
> > > > + if (host_ptimer_irq)
> > > > + disable_percpu_irq(host_ptimer_irq);
> > > > }
> > >
> > > Should "host_vtimer_irq" be checked yet as host_ptimer_irq ?
> >
> > No, because although the ptimer interrupt is optional (on older
> > systems, we fully emulate that timer, including the interrupt), the
> > vtimer interrupt is always present and can be used unconditionally.
> >
> > > Because
> > > the host_{v,p}timer_irq is set in same function kvm_irq_init() which
> > > called AFTER the on_each_cpu(_kvm_arch_hardware_enable, NULL, 1) from
> > > init_subsystems():
> > >
> > > kvm_init()
> > > kvm_arch_init()
> > > init_subsystems()
> > > on_each_cpu(_kvm_arch_hardware_enable, NULL, 1);
> > > kvm_timer_hyp_init()
> > > kvm_irq_init()
> > > host_vtimer_irq = info->virtual_irq;
> > > host_ptimer_irq = info->physical_irq;
> > > hardware_enable_all()
> >
> > This, however, is a very nice catch. I doubt this results in anything
> > really bad (the interrupt enable will fail as the interrupt number
> > is 0, and the disable will also fail due to no prior enable), but
> > that's extremely ugly anyway.
> >
> > The best course of action AFAICS is to differentiate between the
> > arm64-specific initialisation (which is a one-off) and the runtime
> > stuff. Something like the hack below, that I haven't tested yet:
> >
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 32c1022eb4b3..65d03c28f32a 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -1671,23 +1671,27 @@ static void _kvm_arch_hardware_enable(void
> > *discard)
> > {
> > if (!__this_cpu_read(kvm_arm_hardware_enabled)) {
> > cpu_hyp_reinit();
> > - kvm_vgic_cpu_up();
> > - kvm_timer_cpu_up();
> > __this_cpu_write(kvm_arm_hardware_enabled, 1);
> > }
> > }
> >
> > int kvm_arch_hardware_enable(void)
> > {
> > + int was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
> > +
> > _kvm_arch_hardware_enable(NULL);
> > +
> > + if (!was_enabled) {
> > + kvm_vgic_cpu_up();
> > + kvm_timer_cpu_up();
> > + }
> > +
> > return 0;
> > }
> >
> > static void _kvm_arch_hardware_disable(void *discard)
> > {
> > if (__this_cpu_read(kvm_arm_hardware_enabled)) {
> > - kvm_timer_cpu_down();
> > - kvm_vgic_cpu_down();
> > cpu_hyp_reset();
> > __this_cpu_write(kvm_arm_hardware_enabled, 0);
> > }
> > @@ -1695,6 +1699,11 @@ static void _kvm_arch_hardware_disable(void
> > *discard)
> >
> > void kvm_arch_hardware_disable(void)
> > {
> > + if (__this_cpu_read(kvm_arm_hardware_enabled)) {
> > + kvm_timer_cpu_down();
> > + kvm_vgic_cpu_down();
> > + }
> > +
> > if (!is_protected_kvm_enabled())
> > _kvm_arch_hardware_disable(NULL);
> > }
>
> OK, this seems to work here, at least based on a sample of 2
> systems, bringing CPUs up and down whist a VM is pinned to
> these CPUs.
>
> Isaku, can you please squash this into the original patch
> and drop Oliver's Reviewed-by: tag, as this significantly
> changes the logic?
>
> Alternatively, I can repost this patch as a standalone change.
I'll do with the next respin. Anyway feel free to go before me.
--
Isaku Yamahata <[email protected]>
On Tue, Sep 06, 2022 at 03:43:58PM +0800,
Yuan Yao <[email protected]> wrote:
> On Thu, Sep 01, 2022 at 07:17:49PM -0700, [email protected] wrote:
> > From: Isaku Yamahata <[email protected]>
> >
> > To make clear that those files are default implementation that KVM/x86 (and
> > other KVM arch in future) will override them, split out those into a single
> > file. Once conversions for all kvm archs are done, the file will be
> > deleted. kvm_arch_pre_hardware_unsetup() is introduced to avoid cross-arch
> > code churn for now. Once it's settled down,
> > kvm_arch_pre_hardware_unsetup() can be merged into
> > kvm_arch_hardware_unsetup() in each arch code.
> >
> > Signed-off-by: Isaku Yamahata <[email protected]>
> > ---
> > include/linux/kvm_host.h | 1 +
> > virt/kvm/kvm_arch.c | 103 ++++++++++++++++++++++-
> > virt/kvm/kvm_main.c | 172 +++++----------------------------------
> > 3 files changed, 124 insertions(+), 152 deletions(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index f78364e01ca9..60f4ae9d6f48 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1437,6 +1437,7 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
> > int kvm_arch_hardware_enable(void);
> > void kvm_arch_hardware_disable(void);
> > int kvm_arch_hardware_setup(void *opaque);
> > +void kvm_arch_pre_hardware_unsetup(void);
> > void kvm_arch_hardware_unsetup(void);
> > int kvm_arch_check_processor_compat(void);
> > int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
> > diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
> > index 0eac996f4981..0648d4463d9e 100644
> > --- a/virt/kvm/kvm_arch.c
> > +++ b/virt/kvm/kvm_arch.c
> > @@ -6,49 +6,148 @@
> > * Author:
> > * Isaku Yamahata <[email protected]>
> > * <[email protected]>
> > + *
> > + * TODO: Delete this file once the conversion of all KVM arch is done.
> > */
> >
> > #include <linux/kvm_host.h>
> >
> > +static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
> > +static atomic_t hardware_enable_failed;
> > +
> > __weak int kvm_arch_post_init_vm(struct kvm *kvm)
> > {
> > return 0;
> > }
> >
> > +static void hardware_enable_nolock(void *caller_name)
> > +{
> > + int cpu = raw_smp_processor_id();
> > + int r;
> > +
> > + WARN_ON_ONCE(preemptible());
> > +
> > + if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
> > + return;
> > +
> > + cpumask_set_cpu(cpu, &cpus_hardware_enabled);
> > +
> > + r = kvm_arch_hardware_enable();
> > +
> > + if (r) {
> > + cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
> > + atomic_inc(&hardware_enable_failed);
> > + pr_warn("kvm: enabling virtualization on CPU%d failed during %s()\n",
> > + cpu, (const char *)caller_name);
> > + }
> > +}
> > +
> > +static void hardware_disable_nolock(void *junk)
> > +{
> > + int cpu = raw_smp_processor_id();
> > +
> > + WARN_ON_ONCE(preemptible());
> > +
> > + if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
> > + return;
> > + cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
> > + kvm_arch_hardware_disable();
> > +}
> > +
> > +__weak void kvm_arch_pre_hardware_unsetup(void)
> > +{
> > + on_each_cpu(hardware_disable_nolock, NULL, 1);
> > +}
> > +
> > /*
> > * Called after the VM is otherwise initialized, but just before adding it to
> > * the vm_list.
> > */
> > __weak int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
> > {
> > - return kvm_arch_post_init_vm(kvm);
> > + int r = 0;
> > +
> > + if (usage_count != 1)
> > + return 0;
> > +
> > + atomic_set(&hardware_enable_failed, 0);
> > + on_each_cpu(hardware_enable_nolock, (void *)__func__, 1);
>
>
> This function is called in kvm_create_vm:
>
> kvm_create_vm {
> ...
> enable_hardware_all()
> ...
> kvm_arch_add_vm()
> ...
> }
>
> so don't need on_each_cpu(enable_hardware_nolock) here, or the
> enable_hardware_all() shuold be removed from kvm_create_vm().
Yes, it's removed. Please notice the following hunk.
@@ -1196,10 +1191,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
if (r)
goto out_err_no_arch_destroy_vm;
- r = hardware_enable_all();
- if (r)
- goto out_err_no_disable;
-
#ifdef CONFIG_HAVE_KVM_IRQFD
INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list);
#endif
--
Isaku Yamahata <[email protected]>
On Mon, Sep 05, 2022 at 05:27:12PM +0800,
Yuan Yao <[email protected]> wrote:
> On Mon, Sep 05, 2022 at 04:40:14PM +0800, Yuan Yao wrote:
> > On Thu, Sep 01, 2022 at 07:17:44PM -0700, [email protected] wrote:
> > > From: Isaku Yamahata <[email protected]>
> > >
> > > So far the processor compatibility check is not done on resume. It should
> > > be done.
> >
> > The resume happens for resuming from S3/S4, so the compatibility
> > checking is used to detecte CPU replacement, or resume from S4 on an
> > different machine ?
>
> By did experiments, I found the resume is called once on CPU 0 before
> other CPUs come UP, so yes it's necessary to check it.
I've added the commit message.
KVM: Do processor compatibility check on resume
So far the processor compatibility check is not done on resume. It should
be done. The resume is called for resuming from S3/S4. CPUs can be
replaced or the kernel can resume from S4 on a different machine. So
compatibility check is desirable.
--
Isaku Yamahata <[email protected]>
On Tue, Sep 06, 2022 at 02:25:02PM +0800,
Yuan Yao <[email protected]> wrote:
> On Thu, Sep 01, 2022 at 07:17:46PM -0700, [email protected] wrote:
> > From: Isaku Yamahata <[email protected]>
> >
> > Add arch hooks for reboot, suspend, resume, and CPU-online/offline events
> > with empty stub functions.
> >
> > Suggested-by: Sean Christopherson <[email protected]>
> > Signed-off-by: Isaku Yamahata <[email protected]>
> > ---
> > include/linux/kvm_host.h | 6 +++++
> > virt/kvm/Makefile.kvm | 2 +-
> > virt/kvm/kvm_arch.c | 44 +++++++++++++++++++++++++++++++++++
> > virt/kvm/kvm_main.c | 50 +++++++++++++++++++++++++---------------
> > 4 files changed, 82 insertions(+), 20 deletions(-)
> > create mode 100644 virt/kvm/kvm_arch.c
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index eab352902de7..dd2a6d98d4de 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1448,6 +1448,12 @@ int kvm_arch_post_init_vm(struct kvm *kvm);
> > void kvm_arch_pre_destroy_vm(struct kvm *kvm);
> > int kvm_arch_create_vm_debugfs(struct kvm *kvm);
> >
> > +int kvm_arch_suspend(int usage_count);
> > +void kvm_arch_resume(int usage_count);
> > +int kvm_arch_reboot(int val);
> > +int kvm_arch_online_cpu(unsigned int cpu, int usage_count);
> > +int kvm_arch_offline_cpu(unsigned int cpu, int usage_count);
> > +
> > #ifndef __KVM_HAVE_ARCH_VM_ALLOC
> > /*
> > * All architectures that want to use vzalloc currently also
> > diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
> > index 2c27d5d0c367..c4210acabd35 100644
> > --- a/virt/kvm/Makefile.kvm
> > +++ b/virt/kvm/Makefile.kvm
> > @@ -5,7 +5,7 @@
> >
> > KVM ?= ../../../virt/kvm
> >
> > -kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
> > +kvm-y := $(KVM)/kvm_main.o $(KVM)/kvm_arch.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
> > kvm-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
> > kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
> > kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
> > diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
> > new file mode 100644
> > index 000000000000..4748a76bcb03
> > --- /dev/null
> > +++ b/virt/kvm/kvm_arch.c
> > @@ -0,0 +1,44 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * kvm_arch.c: kvm default arch hooks for hardware enabling/disabling
> > + * Copyright (c) 2022 Intel Corporation.
> > + *
> > + * Author:
> > + * Isaku Yamahata <[email protected]>
> > + * <[email protected]>
> > + */
> > +
> > +#include <linux/kvm_host.h>
> > +
> > +/*
> > + * Called after the VM is otherwise initialized, but just before adding it to
> > + * the vm_list.
> > + */
> > +__weak int kvm_arch_post_init_vm(struct kvm *kvm)
> > +{
> > + return 0;
> > +}
> > +
> > +__weak int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
> > +{
> > + return 0;
> > +}
> > +
> > +__weak int kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
> > +{
> > + return 0;
> > +}
> > +
> > +__weak int kvm_arch_reboot(int val)
> > +{
> > + return NOTIFY_OK;
> > +}
> > +
> > +__weak int kvm_arch_suspend(int usage_count)
> > +{
> > + return 0;
> > +}
> > +
> > +__weak void kvm_arch_resume(int usage_count)
> > +{
> > +}
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 082d5dbc8d7f..e62240fb8474 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -144,6 +144,7 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
> > #endif
> > static int hardware_enable_all(void);
> > static void hardware_disable_all(void);
> > +static void hardware_disable_nolock(void *junk);
> >
> > static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
> >
> > @@ -1097,15 +1098,6 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, const char *fdname)
> > return ret;
> > }
> >
> > -/*
> > - * Called after the VM is otherwise initialized, but just before adding it to
> > - * the vm_list.
> > - */
> > -int __weak kvm_arch_post_init_vm(struct kvm *kvm)
> > -{
> > - return 0;
> > -}
> > -
> > /*
> > * Called just after removing the VM from the vm_list, but before doing any
> > * other destruction.
> > @@ -5033,6 +5025,10 @@ static int kvm_online_cpu(unsigned int cpu)
> > if (atomic_read(&hardware_enable_failed)) {
> > atomic_set(&hardware_enable_failed, 0);
> > ret = -EIO;
> > + } else {
> > + ret = kvm_arch_online_cpu(cpu, kvm_usage_count);
> > + if (ret)
> > + hardware_disable_nolock(NULL);
> > }
> > }
> > mutex_unlock(&kvm_lock);
> > @@ -5053,11 +5049,19 @@ static void hardware_disable_nolock(void *junk)
> >
> > static int kvm_offline_cpu(unsigned int cpu)
> > {
> > + int ret = 0;
> > +
> > mutex_lock(&kvm_lock);
> > - if (kvm_usage_count)
> > + if (kvm_usage_count) {
> > hardware_disable_nolock(NULL);
> > + ret = kvm_arch_offline_cpu(cpu, kvm_usage_count);
> > + if (ret) {
> > + (void)hardware_enable_nolock(NULL);
> > + atomic_set(&hardware_enable_failed, 0);
> > + }
> > + }
> > mutex_unlock(&kvm_lock);
> > - return 0;
> > + return ret;
> > }
> >
> > static void hardware_disable_all_nolock(void)
> > @@ -5115,6 +5119,8 @@ static int hardware_enable_all(void)
> > static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
> > void *v)
> > {
> > + int r;
> > +
> > /*
> > * Some (well, at least mine) BIOSes hang on reboot if
> > * in vmx root mode.
> > @@ -5123,8 +5129,15 @@ static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
> > */
> > pr_info("kvm: exiting hardware virtualization\n");
> > kvm_rebooting = true;
> > +
> > + /* This hook is called without cpuhotplug disabled. */
> > + cpus_read_lock();
> > + mutex_lock(&kvm_lock);
> > on_each_cpu(hardware_disable_nolock, NULL, 1);
> > - return NOTIFY_OK;
> > + r = kvm_arch_reboot(val);
> > + mutex_unlock(&kvm_lock);
> > + cpus_read_unlock();
> > + return r;
> > }
> >
> > static struct notifier_block kvm_reboot_notifier = {
> > @@ -5718,11 +5731,10 @@ static int kvm_suspend(void)
> > * cpu_hotplug_disable() and other CPUs are offlined. No need for
> > * locking.
> > */
> > - if (kvm_usage_count) {
> > - lockdep_assert_not_held(&kvm_lock);
> > + lockdep_assert_not_held(&kvm_lock);
> > + if (kvm_usage_count)
> > hardware_disable_nolock(NULL);
> > - }
> > - return 0;
> > + return kvm_arch_suspend(kvm_usage_count);
> > }
> >
> e static void kvm_resume(void)
> > @@ -5734,10 +5746,10 @@ static void kvm_resume(void)
> > */
> > return; /* FIXME: disable KVM */
> >
> > - if (kvm_usage_count) {
> > - lockdep_assert_not_held(&kvm_lock);
> > + lockdep_assert_not_held(&kvm_lock);
> > + kvm_arch_resume(kvm_usage_count);
> > + if (kvm_usage_count)
> > hardware_enable_nolock((void *)__func__);
>
> Is single kvm_arch_{suspend,resume} enough?
>
> The sequence is:
>
> kvm_arch_resume()
> hardware_enable_nolock()
>
> But in patch 12 I see below code in x86's kvm_arch_resume():
> ...
> if (!usage_count)
> return;
> if (kvm_arch_hardware_enable())
> return;
> ...
>
> So kvm_arch_resume() may depend on hardware enable, and it checks the
> usage_count again, how about call kvm_arch_resume(bool
> hardware_enabled) before/after the hardware_enable_nolock() (or even
> give different functions with _before/_after suffix) for better
> flexibility since it's common code for all architectures, e.g.
>
> if (kvm_usage_count) {
> kvm_arch_resume(false);
> hardware_enable_nolock(__func__);
> kvm_arch_resumt(true);
> }
Nice catch. I fixed it as follows.
kvm_resume()
if (kvm_usage_count)
hardware_enable_nolock()
kvm_arch_resume()
x86 kvm_arch_resume() doesn't call hardware_enable().
The patch of adjust compatibility check and hardware_enable().
--
Isaku Yamahata <[email protected]>
On Tue, Sep 06, 2022 at 02:44:34PM -0700,
Isaku Yamahata <[email protected]> wrote:
> On Tue, Sep 06, 2022 at 07:32:22AM +0100,
> Marc Zyngier <[email protected]> wrote:
>
> > On Tue, 06 Sep 2022 03:46:43 +0100,
> > Yuan Yao <[email protected]> wrote:
> > >
> > > On Thu, Sep 01, 2022 at 07:17:45PM -0700, [email protected] wrote:
> > > > From: Isaku Yamahata <[email protected]>
> > > >
> > > > Because kvm_count_lock unnecessarily complicates the KVM locking convention
> > > > Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock for
> > > > simplicity.
> > > >
> > > > Opportunistically add some comments on locking.
> > > >
> > > > Suggested-by: Sean Christopherson <[email protected]>
> > > > Signed-off-by: Isaku Yamahata <[email protected]>
> > > > ---
> > > > Documentation/virt/kvm/locking.rst | 14 +++++-------
> > > > virt/kvm/kvm_main.c | 34 ++++++++++++++++++++----------
> > > > 2 files changed, 28 insertions(+), 20 deletions(-)
> > > >
> > > > diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
> > > > index 845a561629f1..8957e32aa724 100644
> > > > --- a/Documentation/virt/kvm/locking.rst
> > > > +++ b/Documentation/virt/kvm/locking.rst
> > > > @@ -216,15 +216,11 @@ time it will be set using the Dirty tracking mechanism described above.
> > > > :Type: mutex
> > > > :Arch: any
> > > > :Protects: - vm_list
> > > > -
> > > > -``kvm_count_lock``
> > > > -^^^^^^^^^^^^^^^^^^
> > > > -
> > > > -:Type: raw_spinlock_t
> > > > -:Arch: any
> > > > -:Protects: - hardware virtualization enable/disable
> > > > -:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
> > > > - migration.
> > > > + - kvm_usage_count
> > > > + - hardware virtualization enable/disable
> > > > +:Comment: Use cpus_read_lock() for hardware virtualization enable/disable
> > > > + because hardware enabling/disabling must be atomic /wrt
> > > > + migration. The lock order is cpus lock => kvm_lock.
> > > >
> > > > ``kvm->mn_invalidate_lock``
> > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > index fc55447c4dba..082d5dbc8d7f 100644
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > > @@ -100,7 +100,6 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
> > > > */
> > > >
> > > > DEFINE_MUTEX(kvm_lock);
> > > > -static DEFINE_RAW_SPINLOCK(kvm_count_lock);
> > > > LIST_HEAD(vm_list);
> > > >
> > > > static cpumask_var_t cpus_hardware_enabled;
> > > > @@ -4996,6 +4995,8 @@ static void hardware_enable_nolock(void *caller_name)
> > > > int cpu = raw_smp_processor_id();
> > > > int r;
> > > >
> > > > + WARN_ON_ONCE(preemptible());
> > >
> > > This looks incorrect, it may triggers everytime when online CPU.
> > > Because patch 7 moved CPUHP_AP_KVM_STARTING *AFTER*
> > > CPUHP_AP_ONLINE_IDLE as CPUHP_AP_KVM_ONLINE, then cpuhp_thread_fun()
> > > runs the new CPUHP_AP_KVM_ONLINE in *non-atomic* context:
> > >
> > > cpuhp_thread_fun(unsigned int cpu) {
> > > ...
> > > if (cpuhp_is_atomic_state(state)) {
> > > local_irq_disable();
> > > st->result = cpuhp_invoke_callback(cpu, state, bringup, st->node, &st->last);
> > > local_irq_enable();
> > >
> > > WARN_ON_ONCE(st->result);
> > > } else {
> > > st->result = cpuhp_invoke_callback(cpu, state, bringup, st->node, &st->last);
> > > }
> > > ...
> > > }
> > >
> > > static bool cpuhp_is_atomic_state(enum cpuhp_state state)
> > > {
> > > return CPUHP_AP_IDLE_DEAD <= state && state < CPUHP_AP_ONLINE;
> > > }
> > >
> > > The hardware_enable_nolock() now is called in 2 cases:
> > > 1. in atomic context by on_each_cpu().
> > > 2. From non-atomic context by CPU hotplug thread.
> > >
> > > so how about "WARN_ONCE(preemptible() && cpu_active(cpu))" ?
> >
> > I suspect similar changes must be applied to the arm64 side (though
> > I'm still looking for a good definition of cpu_active()).
>
> It seems plausible. I tested cpu online/offline on x86. Let me update arm64 code
> too.
On second thought, I decided to add preempt_disable/enable() instead of fixing
up possible arch callback and let each arch handle it.
--
Isaku Yamahata <[email protected]>
On Wed, Sep 07, 2022 at 01:56:57PM +0800,
Yuan Yao <[email protected]> wrote:
> > +static void hardware_enable(void *arg)
> > +{
> > + atomic_t *failed = arg;
> > +
> > + if (__hardware_enable((void *)__func__))
> > + atomic_inc(failed);
> > }
>
> A side effect: The actual caller_name information introduced in Patch
> 3 for hardware_enable() is lost. I tend to keep the information, but
> depends on you and other guys. :-)
That's true. But only kvm_arch_add_vm() calls hardware_enable() and other call
sites are converted to call __hardware_enable(). There is no confusion with
other callers with this patch series. So I decided not to bother to pass
function name in addition to a failed counter.
--
Isaku Yamahata <[email protected]>