2022-09-22 18:22:07

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 00/30] KVM: hardware enable/disable reorganize

From: Isaku Yamahata <[email protected]>

I've updated the patch series based on Chao's review and eliminated
kvm_arch_pre_hardware_unsetup() and compatibility check on resume.

Those two patch are compile only tested.
- "KVM: arm64: Simplify the CPUHP logic"
- "RFC: KVM: powerpc: Move processor compatibility check to hardware setup"
- "RFC: KVM: Remove cpus_hardware_enabled and related sanity check"

Changes from v4:
- reorder/reorganize the patch. Especially introduced one patch for each PM
events. reboot, suspend, resume, and cpu online/offline.
- eliminated kvm_arch_pre_hardware_unsetup().
- removed compatibility check on resume
- "KVM: arm64: Simplify the CPUHP logic": fixed up signed-off-by and the link.
- "RFC: KVM: powerpc: Move processor compatibility check to hardware setup"
move the comment in the code to the commit message.

This patch series is to implement the suggestion by Sean Christopherson [1]
to reorganize enable/disable cpu virtualization feature by replacing
the arch-generic current enable/disable logic with PM related hooks. And
convert kvm/x86 to use new hooks.

- Untable x86 hardware enable logic, snapshot MSRs for user return notifier,
enabling cpu virtualization on cpu online and platform resume. and real
enabling of CPU virtualization feature
- Introduce hooks related to PM.
- Convert kvm/x86 code to user those hooks.
- Split out hardware enabling/disabling logic into a separate file. Compile
it for non-x86 code. Once conversion of other KVM archs is done, this file
can be dropped.
- Delete kvm_arch_check_processor_compat()
- Delete cpus_hardware_enabled. 29/30 and 30/30

[1] https://lore.kernel.org/kvm/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/

Changes from v3:
- Updated "KVM: arm64: Simplify the CPUHP logic".
- add preempt_disable/enable() around hardware_enable/disable() to keep the
assumption of the arch callback.
- Eliminated arch compat check callback, kvm_arch_check_processor_compat().

Changes from v2:
- Replace the first patch("KVM: x86: Drop kvm_user_return_msr_cpu_online()")
with Sean's implementation
- Included all patches of "Improve KVM's interaction with CPU hotplug" [2]
Until v2, Tried to cherry-pick the least patches of it. It turned out that
all the patches are desirable.

Changes from v1:
- Add a patch "KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section"
to make online/offline callback to run thread context to use mutex instead
of spin lock
- fixes pointed by Chao Gao

Chao Gao (4):
KVM: x86: Move check_processor_compatibility from init ops to runtime
ops
Partially revert "KVM: Pass kvm_init()'s opaque param to additional
arch funcs"
KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section
KVM: Do compatibility checks on hotplugged CPUs

Isaku Yamahata (24):
KVM: x86: Use this_cpu_ptr() instead of
per_cpu_ptr(smp_processor_id())
KVM: Provide more information in kernel log if hardware enabling fails
KVM: Drop kvm_count_lock and instead protect kvm_usage_count with
kvm_lock
KVM: Add arch hooks when VM is added/deleted
KVM: Add arch hook for reboot event
KVM: Add arch hook for suspend
KVM: Add arch hook for resume event
KVM: Add arch hook for cpu online event
KVM: Add arch hook for cpu offline event
KVM: Remove on_each_cpu(hardware_disable_nolock) in kvm_exit()
KVM: Move out KVM arch PM hooks and hardware enable/disable logic
KVM: kvm_arch.c: Remove _nolock post fix
KVM: kvm_arch.c: Remove a global variable, hardware_enable_failed
KVM: Introduce an arch wrapper to check all processor compatibility
KVM: x86: Duplicate arch callbacks related to pm events and compat
check
KVM: x86: Move TSC fixup logic to KVM arch resume callback
KVM: Eliminate kvm_arch_post_init_vm()
KVM: Add config to not compile kvm_arch.c
KVM: x86: Delete kvm_arch_hardware_enable/disable()
KVM: x86: Make x86 processor compat check callback empty
RFC: KVM: powerpc: Move processor compatibility check to hardware
setup
KVM: Eliminate kvm_arch_check_processor_compat()
RFC: KVM: x86: Remove cpus_hardware_enabled and related sanity check
RFC: KVM: Remove cpus_hardware_enabled and related sanity check

Marc Zyngier (1):
KVM: arm64: Simplify the CPUHP logic

Sean Christopherson (1):
KVM: x86: Drop kvm_user_return_msr_cpu_online()

Documentation/virt/kvm/locking.rst | 14 +-
arch/arm64/kvm/arch_timer.c | 27 ++--
arch/arm64/kvm/arm.c | 18 ++-
arch/arm64/kvm/vgic/vgic-init.c | 19 +--
arch/mips/kvm/mips.c | 5 -
arch/powerpc/kvm/powerpc.c | 5 -
arch/riscv/kvm/main.c | 5 -
arch/s390/kvm/kvm-s390.c | 5 -
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/svm/svm.c | 4 +-
arch/x86/kvm/vmx/vmx.c | 24 ++--
arch/x86/kvm/x86.c | 201 ++++++++++++++++++++++++----
include/kvm/arm_arch_timer.h | 4 +
include/kvm/arm_vgic.h | 4 +
include/linux/cpuhotplug.h | 5 +-
include/linux/kvm_host.h | 13 +-
virt/kvm/Kconfig | 3 +
virt/kvm/Makefile.kvm | 3 +
virt/kvm/kvm_arch.c | 132 +++++++++++++++++++
virt/kvm/kvm_main.c | 205 +++++++++--------------------
22 files changed, 443 insertions(+), 257 deletions(-)
create mode 100644 virt/kvm/kvm_arch.c


base-commit: 521a547ced6477c54b4b0cc206000406c221b4d6
--
2.25.1


2022-09-22 18:31:28

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 23/30] KVM: Eliminate kvm_arch_post_init_vm()

From: Isaku Yamahata <[email protected]>

Now kvm_arch_post_init_vm() is used only by x86 kvm_arch_add_vm(). Other
arch doesn't define it. Merge x86 kvm_arch_post_init_vm() into x86
kvm_arch_add_vm() and eliminate kvm_arch_post_init_vm().

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 7 +------
include/linux/kvm_host.h | 1 -
virt/kvm/kvm_arch.c | 16 +---------------
3 files changed, 2 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d49396bb6c2e..bf8d3b901725 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11969,11 +11969,6 @@ void kvm_arch_hardware_disable(void)

static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;

-int kvm_arch_post_init_vm(struct kvm *kvm)
-{
- return kvm_mmu_post_init_vm(kvm);
-}
-
static int __hardware_enable(void)
{
int cpu = raw_smp_processor_id();
@@ -12032,7 +12027,7 @@ int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
goto err;
}

- r = kvm_arch_post_init_vm(kvm);
+ r = kvm_mmu_post_init_vm(kvm);
err:
if (r)
on_each_cpu(hardware_disable, NULL, 1);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2c5ca8741ca5..8dfa212b4543 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1451,7 +1451,6 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
-int kvm_arch_post_init_vm(struct kvm *kvm);
int kvm_arch_add_vm(struct kvm *kvm, int usage_count);
int kvm_arch_del_vm(int usage_count);
void kvm_arch_pre_destroy_vm(struct kvm *kvm);
diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index 87ee84c09634..2509c2777a49 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -14,15 +14,6 @@

static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;

-/*
- * Called after the VM is otherwise initialized, but just before adding it to
- * the vm_list.
- */
-int __weak kvm_arch_post_init_vm(struct kvm *kvm)
-{
- return 0;
-}
-
static int __hardware_enable(void)
{
int cpu = raw_smp_processor_id();
@@ -77,13 +68,8 @@ int __weak kvm_arch_add_vm(struct kvm *kvm, int usage_count)

if (atomic_read(&failed)) {
r = -EBUSY;
- goto err;
- }
-
- r = kvm_arch_post_init_vm(kvm);
-err:
- if (r)
on_each_cpu(hardware_disable, NULL, 1);
+ }
return r;
}

--
2.25.1

2022-09-22 18:31:43

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 02/30] KVM: x86: Use this_cpu_ptr() instead of per_cpu_ptr(smp_processor_id())

From: Isaku Yamahata <[email protected]>

convert per_cpu_ptr(smp_processor_id()) to this_cpu_ptr() as trivial
cleanup.

Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Chao Gao <[email protected]>
Reviewed-by: Yuan Yao <[email protected]>
---
arch/x86/kvm/x86.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9e1b3af4a074..7fc19533a484 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -428,8 +428,7 @@ static void kvm_user_return_msr_init_cpu(struct kvm_user_return_msrs *msrs)

int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
{
- unsigned int cpu = smp_processor_id();
- struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
+ struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
int err;

kvm_user_return_msr_init_cpu(msrs);
@@ -453,8 +452,7 @@ EXPORT_SYMBOL_GPL(kvm_set_user_return_msr);

static void drop_user_return_notifiers(void)
{
- unsigned int cpu = smp_processor_id();
- struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
+ struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);

if (msrs->registered)
kvm_on_user_return(&msrs->urn);
--
2.25.1

2022-09-22 18:39:54

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 21/30] KVM: x86: Duplicate arch callbacks related to pm events and compat check

From: Isaku Yamahata <[email protected]>

KVM/X86 can change those callbacks without worrying about breaking other
archs.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 166 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 161 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f5ebb69996d5..b15eb59096b6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11960,6 +11960,167 @@ void kvm_arch_hardware_disable(void)
drop_user_return_notifiers();
}

+static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
+
+int kvm_arch_post_init_vm(struct kvm *kvm)
+{
+ return kvm_mmu_post_init_vm(kvm);
+}
+
+static int __hardware_enable(void)
+{
+ int cpu = raw_smp_processor_id();
+ int r;
+
+ WARN_ON_ONCE(preemptible());
+
+ if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
+ return 0;
+ r = kvm_arch_hardware_enable();
+ if (r)
+ pr_warn("kvm: enabling virtualization on CPU%d failed during %pSb\n",
+ cpu, __builtin_return_address(0));
+ else
+ cpumask_set_cpu(cpu, &cpus_hardware_enabled);
+ return r;
+}
+
+static void hardware_enable(void *arg)
+{
+ atomic_t *failed = arg;
+
+ if (__hardware_enable())
+ atomic_inc(failed);
+}
+
+static void hardware_disable(void *junk)
+{
+ int cpu = raw_smp_processor_id();
+
+ WARN_ON_ONCE(preemptible());
+
+ if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
+ return;
+ cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
+ kvm_arch_hardware_disable();
+}
+
+/*
+ * Called after the VM is otherwise initialized, but just before adding it to
+ * the vm_list.
+ */
+int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
+{
+ atomic_t failed;
+ int r = 0;
+
+ if (usage_count != 1)
+ return 0;
+
+ atomic_set(&failed, 0);
+ on_each_cpu(hardware_enable, &failed, 1);
+
+ if (atomic_read(&failed)) {
+ r = -EBUSY;
+ goto err;
+ }
+
+ r = kvm_arch_post_init_vm(kvm);
+err:
+ if (r)
+ on_each_cpu(hardware_disable, NULL, 1);
+ return r;
+}
+
+int kvm_arch_del_vm(int usage_count)
+{
+ if (usage_count)
+ return 0;
+
+ on_each_cpu(hardware_disable, NULL, 1);
+ return 0;
+}
+
+static void check_processor_compat(void *rtn)
+{
+ *(int *)rtn = kvm_arch_check_processor_compat();
+}
+
+int kvm_arch_check_processor_compat_all(void)
+{
+ int cpu;
+ int r;
+
+ for_each_online_cpu(cpu) {
+ smp_call_function_single(cpu, check_processor_compat, &r, 1);
+ if (r < 0)
+ return r;
+ }
+ return 0;
+}
+
+int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
+{
+ int ret;
+
+ ret = kvm_arch_check_processor_compat();
+ if (ret)
+ return ret;
+
+ if (!usage_count)
+ return 0;
+
+ /*
+ * arch callback kvm_arch_hardware_eanble() assumes that
+ * preemption is disabled for historical reason. Disable
+ * preemption until all arch callbacks are fixed.
+ */
+ preempt_disable();
+ /*
+ * Abort the CPU online process if hardware virtualization cannot
+ * be enabled. Otherwise running VMs would encounter unrecoverable
+ * errors when scheduled to this CPU.
+ */
+ ret = __hardware_enable();
+ preempt_enable();
+
+ return ret;
+}
+
+int kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
+{
+ if (usage_count) {
+ /*
+ * arch callback kvm_arch_hardware_disable() assumes that
+ * preemption is disabled for historical reason. Disable
+ * preemption until all arch callbacks are fixed.
+ */
+ preempt_disable();
+ hardware_disable(NULL);
+ preempt_enable();
+ }
+ return 0;
+}
+
+int kvm_arch_reboot(int val)
+{
+ on_each_cpu(hardware_disable, NULL, 1);
+ return NOTIFY_OK;
+}
+
+int kvm_arch_suspend(int usage_count)
+{
+ if (usage_count)
+ hardware_disable(NULL);
+ return 0;
+}
+
+void kvm_arch_resume(int usage_count)
+{
+ if (usage_count)
+ (void)__hardware_enable();
+}
+
static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
{
memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
@@ -12140,11 +12301,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return ret;
}

-int kvm_arch_post_init_vm(struct kvm *kvm)
-{
- return kvm_mmu_post_init_vm(kvm);
-}
-
static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
{
vcpu_load(vcpu);
--
2.25.1

2022-09-22 18:40:44

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 06/30] KVM: arm64: Simplify the CPUHP logic

From: Marc Zyngier <[email protected]>

For a number of historical reasons, the KVM/arm64 hotplug setup is pretty
complicated, and we have two extra CPUHP notifiers for vGIC and timers.

It looks pretty pointless, and gets in the way of further changes.
So let's just expose some helpers that can be called from the core
CPUHP callback, and get rid of everything else.

This gives us the opportunity to drop a useless notifier entry,
as well as tidy-up the timer enable/disable, which was a bit odd.

Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/arm64/kvm/arch_timer.c | 27 ++++++++++-----------------
arch/arm64/kvm/arm.c | 13 +++++++++++++
arch/arm64/kvm/vgic/vgic-init.c | 19 ++-----------------
include/kvm/arm_arch_timer.h | 4 ++++
include/kvm/arm_vgic.h | 4 ++++
include/linux/cpuhotplug.h | 3 ---
6 files changed, 33 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index bb24a76b4224..33fca1a691a5 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -811,10 +811,18 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
}

-static void kvm_timer_init_interrupt(void *info)
+void kvm_timer_cpu_up(void)
{
enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
- enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags);
+ if (host_ptimer_irq)
+ enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags);
+}
+
+void kvm_timer_cpu_down(void)
+{
+ disable_percpu_irq(host_vtimer_irq);
+ if (host_ptimer_irq)
+ disable_percpu_irq(host_ptimer_irq);
}

int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
@@ -976,18 +984,6 @@ void kvm_arm_timer_write_sysreg(struct kvm_vcpu *vcpu,
preempt_enable();
}

-static int kvm_timer_starting_cpu(unsigned int cpu)
-{
- kvm_timer_init_interrupt(NULL);
- return 0;
-}
-
-static int kvm_timer_dying_cpu(unsigned int cpu)
-{
- disable_percpu_irq(host_vtimer_irq);
- return 0;
-}
-
static int timer_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu)
{
if (vcpu)
@@ -1185,9 +1181,6 @@ int kvm_timer_hyp_init(bool has_gic)
goto out_free_irq;
}

- cpuhp_setup_state(CPUHP_AP_KVM_ARM_TIMER_STARTING,
- "kvm/arm/timer:starting", kvm_timer_starting_cpu,
- kvm_timer_dying_cpu);
return 0;
out_free_irq:
free_percpu_irq(host_vtimer_irq, kvm_get_running_vcpus());
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 3385fb57c11a..7e83498b83aa 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1676,7 +1676,15 @@ static void _kvm_arch_hardware_enable(void *discard)

int kvm_arch_hardware_enable(void)
{
+ int was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
+
_kvm_arch_hardware_enable(NULL);
+
+ if (!was_enabled) {
+ kvm_vgic_cpu_up();
+ kvm_timer_cpu_up();
+ }
+
return 0;
}

@@ -1690,6 +1698,11 @@ static void _kvm_arch_hardware_disable(void *discard)

void kvm_arch_hardware_disable(void)
{
+ if (__this_cpu_read(kvm_arm_hardware_enabled)) {
+ kvm_timer_cpu_down();
+ kvm_vgic_cpu_down();
+ }
+
if (!is_protected_kvm_enabled())
_kvm_arch_hardware_disable(NULL);
}
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index f6d4f4052555..6c7f6ae21ec0 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -465,17 +465,15 @@ int kvm_vgic_map_resources(struct kvm *kvm)

/* GENERIC PROBE */

-static int vgic_init_cpu_starting(unsigned int cpu)
+void kvm_vgic_cpu_up(void)
{
enable_percpu_irq(kvm_vgic_global_state.maint_irq, 0);
- return 0;
}


-static int vgic_init_cpu_dying(unsigned int cpu)
+void kvm_vgic_cpu_down(void)
{
disable_percpu_irq(kvm_vgic_global_state.maint_irq);
- return 0;
}

static irqreturn_t vgic_maintenance_handler(int irq, void *data)
@@ -584,19 +582,6 @@ int kvm_vgic_hyp_init(void)
return ret;
}

- ret = cpuhp_setup_state(CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
- "kvm/arm/vgic:starting",
- vgic_init_cpu_starting, vgic_init_cpu_dying);
- if (ret) {
- kvm_err("Cannot register vgic CPU notifier\n");
- goto out_free_irq;
- }
-
kvm_info("vgic interrupt IRQ%d\n", kvm_vgic_global_state.maint_irq);
return 0;
-
-out_free_irq:
- free_percpu_irq(kvm_vgic_global_state.maint_irq,
- kvm_get_running_vcpus());
- return ret;
}
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index cd6d8f260eab..1638418f72dd 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -104,4 +104,8 @@ void kvm_arm_timer_write_sysreg(struct kvm_vcpu *vcpu,
u32 timer_get_ctl(struct arch_timer_context *ctxt);
u64 timer_get_cval(struct arch_timer_context *ctxt);

+/* CPU HP callbacks */
+void kvm_timer_cpu_up(void);
+void kvm_timer_cpu_down(void);
+
#endif
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 4df9e73a8bb5..fc4acc91ba06 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -431,4 +431,8 @@ int vgic_v4_load(struct kvm_vcpu *vcpu);
void vgic_v4_commit(struct kvm_vcpu *vcpu);
int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db);

+/* CPU HP callbacks */
+void kvm_vgic_cpu_up(void);
+void kvm_vgic_cpu_down(void);
+
#endif /* __KVM_ARM_VGIC_H */
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index f61447913db9..7337414e4947 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -186,9 +186,6 @@ enum cpuhp_state {
CPUHP_AP_TI_GP_TIMER_STARTING,
CPUHP_AP_HYPERV_TIMER_STARTING,
CPUHP_AP_KVM_STARTING,
- CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
- CPUHP_AP_KVM_ARM_VGIC_STARTING,
- CPUHP_AP_KVM_ARM_TIMER_STARTING,
/* Must be the last timer callback */
CPUHP_AP_DUMMY_TIMER_STARTING,
CPUHP_AP_ARM_XEN_STARTING,
--
2.25.1

2022-09-22 18:41:55

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 15/30] KVM: Add arch hook for cpu offline event

From: Isaku Yamahata <[email protected]>

Factor out the logic on cpu offline event as arch callback. Later kvm/x86
will override it.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 30 +++++++++++++++++++-----------
2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 013b33a19030..f3a79d55ca8c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1435,6 +1435,7 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
#endif

int kvm_arch_online_cpu(unsigned int cpu, int usage_count);
+int kvm_arch_offline_cpu(unsigned int cpu, int usage_count);
int kvm_arch_reboot(int val);
int kvm_arch_suspend(int usage_count);
void kvm_arch_resume(int usage_count);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4c535ae412a7..ad9b8b7d21fa 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1174,6 +1174,21 @@ int __weak kvm_arch_online_cpu(unsigned int cpu, int usage_count)
return ret;
}

+int __weak kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
+{
+ if (usage_count) {
+ /*
+ * arch callback kvm_arch_hardware_disable() assumes that
+ * preemption is disabled for historical reason. Disable
+ * preemption until all arch callbacks are fixed.
+ */
+ preempt_disable();
+ hardware_disable_nolock(NULL);
+ preempt_enable();
+ }
+ return 0;
+}
+
int __weak kvm_arch_reboot(int val)
{
on_each_cpu(hardware_disable_nolock, NULL, 1);
@@ -5137,19 +5152,12 @@ static void hardware_disable_nolock(void *junk)

static int kvm_offline_cpu(unsigned int cpu)
{
+ int ret;
+
mutex_lock(&kvm_lock);
- if (kvm_usage_count) {
- /*
- * arch callback kvm_arch_hardware_disable() assumes that
- * preemption is disabled for historical reason. Disable
- * preemption until all arch callbacks are fixed.
- */
- preempt_disable();
- hardware_disable_nolock(NULL);
- preempt_enable();
- }
+ ret = kvm_arch_offline_cpu(cpu, kvm_usage_count);
mutex_unlock(&kvm_lock);
- return 0;
+ return ret;
}

static void kvm_del_vm(void)
--
2.25.1

2022-09-22 18:43:21

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 25/30] KVM: x86: Delete kvm_arch_hardware_enable/disable()

From: Isaku Yamahata <[email protected]>

Now they're function call and there is no point to keep them. Open code
them.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 01687b553915..aa6594de1fc1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -354,7 +354,7 @@ static void kvm_on_user_return(struct user_return_notifier *urn)

/*
* Disabling irqs at this point since the following code could be
- * interrupted and executed through kvm_arch_hardware_disable()
+ * interrupted and executed through hardware_disable()
*/
local_irq_save(flags);
if (msrs->registered) {
@@ -11864,11 +11864,6 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);

-static int kvm_arch_hardware_enable(void)
-{
- return static_call(kvm_x86_hardware_enable)();
-}
-
static int __hardware_enable(void);

void kvm_arch_resume(int usage_count)
@@ -11961,12 +11956,6 @@ void kvm_arch_resume(int usage_count)
}
}

-static void kvm_arch_hardware_disable(void)
-{
- static_call(kvm_x86_hardware_disable)();
- drop_user_return_notifiers();
-}
-
static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;

static int __hardware_enable(void)
@@ -11978,7 +11967,7 @@ static int __hardware_enable(void)

if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
return 0;
- r = kvm_arch_hardware_enable();
+ r = static_call(kvm_x86_hardware_enable)();
if (r)
pr_warn("kvm: enabling virtualization on CPU%d failed during %pSb\n",
cpu, __builtin_return_address(0));
@@ -12004,7 +11993,8 @@ static void hardware_disable(void *junk)
if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
return;
cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
- kvm_arch_hardware_disable();
+ static_call(kvm_x86_hardware_disable)();
+ drop_user_return_notifiers();
}

/*
--
2.25.1

2022-09-22 18:43:44

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 08/30] KVM: Do compatibility checks on hotplugged CPUs

From: Chao Gao <[email protected]>

At init time, KVM does compatibility checks to ensure that all online
CPUs support hardware virtualization and a common set of features. But
KVM uses hotplugged CPUs without such compatibility checks. On Intel
CPUs, this leads to #GP if the hotplugged CPU doesn't support VMX or
vmentry failure if the hotplugged CPU doesn't meet minimal feature
requirements.

Do compatibility checks when onlining a CPU and abort the online process
if the hotplugged CPU is incompatible with online CPUs.

CPU hotplug is disabled during hardware_enable_all() to prevent the corner
case as shown below. A hotplugged CPU marks itself online in
cpu_online_mask (1) and enables interrupt (2) before invoking callbacks
registered in ONLINE section (3). So, if hardware_enable_all() is invoked
on another CPU right after (2), then on_each_cpu() in hardware_enable_all()
invokes hardware_enable_nolock() on the hotplugged CPU before
kvm_online_cpu() is called. This makes the CPU escape from compatibility
checks, which is risky.

start_secondary { ...
set_cpu_online(smp_processor_id(), true); <- 1
...
local_irq_enable(); <- 2
...
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE); <- 3
}

Keep compatibility checks at KVM init time. It can help to find
incompatibility issues earlier and refuse to load arch KVM module
(e.g., kvm-intel).

Loosen the WARN_ON in kvm_arch_check_processor_compat so that it
can be invoked from KVM's CPU hotplug callback (i.e., kvm_online_cpu).
Other arch doesn't depends on prohibiting of preemption because powerpc
has "strcmp(cur_cpu_spec->cpu_name, "model name")" and other arch has
"return 0". Only x86 kvm_arch_check_processor_compat() has issue.

Opportunistically, add a pr_err() for setup_vmcs_config() path in
vmx_check_processor_compatibility() so that each possible error path has
its own error message. Convert printk(KERN_ERR ... to pr_err to please
checkpatch.pl

Signed-off-by: Chao Gao <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Yuan Yao <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 10 ++++++----
arch/x86/kvm/x86.c | 11 +++++++++--
virt/kvm/kvm_main.c | 18 +++++++++++++++++-
3 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4ad058dc9794..26f16e310869 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7420,20 +7420,22 @@ static int vmx_check_processor_compatibility(void)
{
struct vmcs_config vmcs_conf;
struct vmx_capability vmx_cap;
+ int cpu = smp_processor_id();

if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
!this_cpu_has(X86_FEATURE_VMX)) {
- pr_err("kvm: VMX is disabled on CPU %d\n", smp_processor_id());
+ pr_err("kvm: VMX is disabled on CPU %d\n", cpu);
return -EIO;
}

- if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
+ if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0) {
+ pr_err("kvm: failed to setup vmcs config on CPU %d\n", cpu);
return -EIO;
+ }
if (nested)
nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept);
if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
- printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
- smp_processor_id());
+ pr_err("kvm: CPU %d feature inconsistency!\n", cpu);
return -EIO;
}
return 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 230812d6cbfd..f5ebb69996d5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12030,9 +12030,16 @@ void kvm_arch_hardware_unsetup(void)

int kvm_arch_check_processor_compat(void)
{
- struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
+ int cpu = smp_processor_id();
+ struct cpuinfo_x86 *c = &cpu_data(cpu);

- WARN_ON(!irqs_disabled());
+ /*
+ * Compatibility checks are done when loading KVM or in KVM's CPU
+ * hotplug callback. It ensures all online CPUs are compatible to run
+ * vCPUs. For other cases, compatibility checks are unnecessary or
+ * even problematic. Try to detect improper usages here.
+ */
+ WARN_ON(!irqs_disabled() && cpu_active(cpu));

if (__cr4_reserved_bits(cpu_has, c) !=
__cr4_reserved_bits(cpu_has, &boot_cpu_data))
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1bb7038e1ecf..b1bf44af523c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5013,7 +5013,11 @@ static void hardware_enable_nolock(void *junk)

static int kvm_online_cpu(unsigned int cpu)
{
- int ret = 0;
+ int ret;
+
+ ret = kvm_arch_check_processor_compat();
+ if (ret)
+ return ret;

raw_spin_lock(&kvm_count_lock);
/*
@@ -5073,6 +5077,17 @@ static int hardware_enable_all(void)
{
int r = 0;

+ /*
+ * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
+ * is called. on_each_cpu() between them includes the CPU. As a result,
+ * hardware_enable_nolock() may get invoked before kvm_online_cpu().
+ * This would enable hardware virtualization on that cpu without
+ * compatibility checks, which can potentially crash system or break
+ * running VMs.
+ *
+ * Disable CPU hotplug to prevent this case from happening.
+ */
+ cpus_read_lock();
raw_spin_lock(&kvm_count_lock);

kvm_usage_count++;
@@ -5087,6 +5102,7 @@ static int hardware_enable_all(void)
}

raw_spin_unlock(&kvm_count_lock);
+ cpus_read_unlock();

return r;
}
--
2.25.1

2022-09-22 18:43:45

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 18/30] KVM: kvm_arch.c: Remove _nolock post fix

From: Isaku Yamahata <[email protected]>

Now all related callbacks are called under kvm_lock, no point for _nolock
post fix. Remove _nolock post fix for readability with shorter function
names.

Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Chao Gao <[email protected]>
---
virt/kvm/kvm_arch.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index bcf8b74144e3..e6bf9de18cba 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -24,7 +24,7 @@ int __weak kvm_arch_post_init_vm(struct kvm *kvm)
return 0;
}

-static void hardware_enable_nolock(void *junk)
+static void hardware_enable(void *junk)
{
int cpu = raw_smp_processor_id();
int r;
@@ -46,7 +46,7 @@ static void hardware_enable_nolock(void *junk)
}
}

-static void hardware_disable_nolock(void *junk)
+static void hardware_disable(void *junk)
{
int cpu = raw_smp_processor_id();

@@ -70,7 +70,7 @@ int __weak kvm_arch_add_vm(struct kvm *kvm, int usage_count)
return 0;

atomic_set(&hardware_enable_failed, 0);
- on_each_cpu(hardware_enable_nolock, NULL, 1);
+ on_each_cpu(hardware_enable, NULL, 1);

if (atomic_read(&hardware_enable_failed)) {
r = -EBUSY;
@@ -80,7 +80,7 @@ int __weak kvm_arch_add_vm(struct kvm *kvm, int usage_count)
r = kvm_arch_post_init_vm(kvm);
err:
if (r)
- on_each_cpu(hardware_disable_nolock, NULL, 1);
+ on_each_cpu(hardware_disable, NULL, 1);
return r;
}

@@ -89,7 +89,7 @@ int __weak kvm_arch_del_vm(int usage_count)
if (usage_count)
return 0;

- on_each_cpu(hardware_disable_nolock, NULL, 1);
+ on_each_cpu(hardware_disable, NULL, 1);
return 0;
}

@@ -115,7 +115,7 @@ int __weak kvm_arch_online_cpu(unsigned int cpu, int usage_count)
* preemption until all arch callbacks are fixed.
*/
preempt_disable();
- hardware_enable_nolock(NULL);
+ hardware_enable(NULL);
preempt_enable();
if (atomic_read(&hardware_enable_failed)) {
atomic_set(&hardware_enable_failed, 0);
@@ -134,7 +134,7 @@ int __weak kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
* preemption until all arch callbacks are fixed.
*/
preempt_disable();
- hardware_disable_nolock(NULL);
+ hardware_disable(NULL);
preempt_enable();
}
return 0;
@@ -142,7 +142,7 @@ int __weak kvm_arch_offline_cpu(unsigned int cpu, int usage_count)

int __weak kvm_arch_reboot(int val)
{
- on_each_cpu(hardware_disable_nolock, NULL, 1);
+ on_each_cpu(hardware_disable, NULL, 1);
return NOTIFY_OK;
}

@@ -153,12 +153,12 @@ int __weak kvm_arch_suspend(int usage_count)
* Because kvm_suspend() is called with interrupt disabled, no
* need to disable preemption.
*/
- hardware_disable_nolock(NULL);
+ hardware_disable(NULL);
return 0;
}

void __weak kvm_arch_resume(int usage_count)
{
if (usage_count)
- hardware_enable_nolock(NULL);
+ hardware_enable(NULL);
}
--
2.25.1

2022-09-22 18:44:05

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 10/30] KVM: Add arch hooks when VM is added/deleted

From: Isaku Yamahata <[email protected]>

and pass kvm_usage_count with kvm_lock. Move kvm_arch_post_init_vm() under
kvm_arch_add_vm(). Replace enable/disable_hardware_all() with the default
implementation of kvm_arch_add/del_vm(). Later kvm_arch_post_init_vm() is
deleted once x86 overrides kvm_arch_add_vm().

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
include/linux/kvm_host.h | 2 +
virt/kvm/kvm_main.c | 121 ++++++++++++++++++++-------------------
2 files changed, 65 insertions(+), 58 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index eab352902de7..3fbb01bbac98 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1445,6 +1445,8 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
int kvm_arch_post_init_vm(struct kvm *kvm);
+int kvm_arch_add_vm(struct kvm *kvm, int usage_count);
+int kvm_arch_del_vm(int usage_count);
void kvm_arch_pre_destroy_vm(struct kvm *kvm);
int kvm_arch_create_vm_debugfs(struct kvm *kvm);

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c4b908553726..e2c8823786ff 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -142,8 +142,9 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
#define KVM_COMPAT(c) .compat_ioctl = kvm_no_compat_ioctl, \
.open = kvm_no_compat_open
#endif
-static int hardware_enable_all(void);
-static void hardware_disable_all(void);
+static void hardware_enable_nolock(void *junk);
+static void hardware_disable_nolock(void *junk);
+static void kvm_del_vm(void);

static void kvm_io_bus_destroy(struct kvm_io_bus *bus);

@@ -1106,6 +1107,41 @@ int __weak kvm_arch_post_init_vm(struct kvm *kvm)
return 0;
}

+/*
+ * Called after the VM is otherwise initialized, but just before adding it to
+ * the vm_list.
+ */
+int __weak kvm_arch_add_vm(struct kvm *kvm, int usage_count)
+{
+ int r = 0;
+
+ if (usage_count != 1)
+ return 0;
+
+ atomic_set(&hardware_enable_failed, 0);
+ on_each_cpu(hardware_enable_nolock, NULL, 1);
+
+ if (atomic_read(&hardware_enable_failed)) {
+ r = -EBUSY;
+ goto err;
+ }
+
+ r = kvm_arch_post_init_vm(kvm);
+err:
+ if (r)
+ on_each_cpu(hardware_disable_nolock, NULL, 1);
+ return r;
+}
+
+int __weak kvm_arch_del_vm(int usage_count)
+{
+ if (usage_count)
+ return 0;
+
+ on_each_cpu(hardware_disable_nolock, NULL, 1);
+ return 0;
+}
+
/*
* Called just after removing the VM from the vm_list, but before doing any
* other destruction.
@@ -1203,10 +1239,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
if (r)
goto out_err_no_arch_destroy_vm;

- r = hardware_enable_all();
- if (r)
- goto out_err_no_disable;
-
#ifdef CONFIG_HAVE_KVM_IRQFD
INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list);
#endif
@@ -1223,13 +1255,28 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
if (r)
goto out_err_no_debugfs;

- r = kvm_arch_post_init_vm(kvm);
- if (r)
- goto out_err;
-
+ /*
+ * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
+ * is called. on_each_cpu() between them includes the CPU. As a result,
+ * hardware_enable_nolock() may get invoked before kvm_online_cpu().
+ * This would enable hardware virtualization on that cpu without
+ * compatibility checks, which can potentially crash system or break
+ * running VMs.
+ *
+ * Disable CPU hotplug to prevent this case from happening.
+ */
+ cpus_read_lock();
mutex_lock(&kvm_lock);
+ kvm_usage_count++;
+ r = kvm_arch_add_vm(kvm, kvm_usage_count);
+ if (r) {
+ /* the following kvm_del_vm() decrements kvm_usage_count. */
+ mutex_unlock(&kvm_lock);
+ goto out_err;
+ }
list_add(&kvm->vm_list, &vm_list);
mutex_unlock(&kvm_lock);
+ cpus_read_unlock();

preempt_notifier_inc();
kvm_init_pm_notifier(kvm);
@@ -1246,8 +1293,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
#endif
out_err_no_mmu_notifier:
- hardware_disable_all();
-out_err_no_disable:
+ kvm_del_vm();
kvm_arch_destroy_vm(kvm);
out_err_no_arch_destroy_vm:
WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
@@ -1326,7 +1372,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
cleanup_srcu_struct(&kvm->srcu);
kvm_arch_free_vm(kvm);
preempt_notifier_dec();
- hardware_disable_all();
+ kvm_del_vm();
mmdrop(mm);
module_put(kvm_chardev_ops.owner);
}
@@ -5075,56 +5121,15 @@ static int kvm_offline_cpu(unsigned int cpu)
return 0;
}

-static void hardware_disable_all_nolock(void)
-{
- BUG_ON(!kvm_usage_count);
-
- kvm_usage_count--;
- if (!kvm_usage_count)
- on_each_cpu(hardware_disable_nolock, NULL, 1);
-}
-
-static void hardware_disable_all(void)
+static void kvm_del_vm(void)
{
cpus_read_lock();
mutex_lock(&kvm_lock);
- hardware_disable_all_nolock();
- mutex_unlock(&kvm_lock);
- cpus_read_unlock();
-}
-
-static int hardware_enable_all(void)
-{
- int r = 0;
-
- /*
- * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
- * is called. on_each_cpu() between them includes the CPU. As a result,
- * hardware_enable_nolock() may get invoked before kvm_online_cpu().
- * This would enable hardware virtualization on that cpu without
- * compatibility checks, which can potentially crash system or break
- * running VMs.
- *
- * Disable CPU hotplug to prevent this case from happening.
- */
- cpus_read_lock();
- mutex_lock(&kvm_lock);
-
- kvm_usage_count++;
- if (kvm_usage_count == 1) {
- atomic_set(&hardware_enable_failed, 0);
- on_each_cpu(hardware_enable_nolock, NULL, 1);
-
- if (atomic_read(&hardware_enable_failed)) {
- hardware_disable_all_nolock();
- r = -EBUSY;
- }
- }
-
+ WARN_ON_ONCE(!kvm_usage_count);
+ kvm_usage_count--;
+ kvm_arch_del_vm(kvm_usage_count);
mutex_unlock(&kvm_lock);
cpus_read_unlock();
-
- return r;
}

static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
--
2.25.1

2022-09-22 18:44:55

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 19/30] KVM: kvm_arch.c: Remove a global variable, hardware_enable_failed

From: Isaku Yamahata <[email protected]>

A global variable hardware_enable_failed in kvm_arch.c is used only by
kvm_arch_add_vm() and hardware_enable(). It doesn't have to be a global
variable. Make it function local.

Signed-off-by: Isaku Yamahata <[email protected]>
---
virt/kvm/kvm_arch.c | 61 ++++++++++++++++++++++-----------------------
1 file changed, 30 insertions(+), 31 deletions(-)

diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index e6bf9de18cba..10fd29bdd6e1 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -13,7 +13,6 @@
#include <linux/kvm_host.h>

static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
-static atomic_t hardware_enable_failed;

/*
* Called after the VM is otherwise initialized, but just before adding it to
@@ -24,7 +23,7 @@ int __weak kvm_arch_post_init_vm(struct kvm *kvm)
return 0;
}

-static void hardware_enable(void *junk)
+static int __hardware_enable(void)
{
int cpu = raw_smp_processor_id();
int r;
@@ -32,18 +31,22 @@ static void hardware_enable(void *junk)
WARN_ON_ONCE(preemptible());

if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
- return;
-
- cpumask_set_cpu(cpu, &cpus_hardware_enabled);
-
+ return 0;
r = kvm_arch_hardware_enable();
-
- if (r) {
- cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
- atomic_inc(&hardware_enable_failed);
+ if (r)
pr_warn("kvm: enabling virtualization on CPU%d failed during %pSb\n",
cpu, __builtin_return_address(0));
- }
+ else
+ cpumask_set_cpu(cpu, &cpus_hardware_enabled);
+ return r;
+}
+
+static void hardware_enable(void *arg)
+{
+ atomic_t *failed = arg;
+
+ if (__hardware_enable())
+ atomic_inc(failed);
}

static void hardware_disable(void *junk)
@@ -64,15 +67,15 @@ static void hardware_disable(void *junk)
*/
int __weak kvm_arch_add_vm(struct kvm *kvm, int usage_count)
{
+ atomic_t failed = ATOMIC_INIT(0);
int r = 0;

if (usage_count != 1)
return 0;

- atomic_set(&hardware_enable_failed, 0);
- on_each_cpu(hardware_enable, NULL, 1);
+ on_each_cpu(hardware_enable, &failed, 1);

- if (atomic_read(&hardware_enable_failed)) {
+ if (atomic_read(&failed)) {
r = -EBUSY;
goto err;
}
@@ -95,33 +98,29 @@ int __weak kvm_arch_del_vm(int usage_count)

int __weak kvm_arch_online_cpu(unsigned int cpu, int usage_count)
{
- int ret = 0;
+ int ret;

ret = kvm_arch_check_processor_compat();
if (ret)
return ret;

+ if (!usage_count)
+ return 0;
+
+ /*
+ * arch callback kvm_arch_hardware_enable() assumes that
+ * preemption is disabled for historical reason. Disable
+ * preemption until all arch callbacks are fixed.
+ */
+ preempt_disable();
/*
* Abort the CPU online process if hardware virtualization cannot
* be enabled. Otherwise running VMs would encounter unrecoverable
* errors when scheduled to this CPU.
*/
- if (usage_count) {
- WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
+ ret = __hardware_enable();
+ preempt_enable();

- /*
- * arch callback kvm_arch_hardware_eanble() assumes that
- * preemption is disabled for historical reason. Disable
- * preemption until all arch callbacks are fixed.
- */
- preempt_disable();
- hardware_enable(NULL);
- preempt_enable();
- if (atomic_read(&hardware_enable_failed)) {
- atomic_set(&hardware_enable_failed, 0);
- ret = -EIO;
- }
- }
return ret;
}

@@ -160,5 +159,5 @@ int __weak kvm_arch_suspend(int usage_count)
void __weak kvm_arch_resume(int usage_count)
{
if (usage_count)
- hardware_enable(NULL);
+ (void)__hardware_enable();
}
--
2.25.1

2022-09-22 18:48:39

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 13/30] KVM: Add arch hook for resume event

From: Isaku Yamahata <[email protected]>

Factor out the logic on resume event as arch callback. Later kvm/x86 will
override it.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 12 ++++++++----
2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 861aad8812ff..1adbf74e3047 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1436,6 +1436,7 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}

int kvm_arch_reboot(int val);
int kvm_arch_suspend(int usage_count);
+void kvm_arch_resume(int usage_count);

int kvm_arch_hardware_enable(void);
void kvm_arch_hardware_disable(void);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0ebe43a695e5..1270f88c2a1e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1159,6 +1159,12 @@ int __weak kvm_arch_suspend(int usage_count)
return 0;
}

+void __weak kvm_arch_resume(int usage_count)
+{
+ if (usage_count)
+ hardware_enable_nolock(NULL);
+}
+
/*
* Called just after removing the VM from the vm_list, but before doing any
* other destruction.
@@ -5769,10 +5775,8 @@ static int kvm_suspend(void)

static void kvm_resume(void)
{
- if (kvm_usage_count) {
- lockdep_assert_not_held(&kvm_lock);
- hardware_enable_nolock(NULL);
- }
+ lockdep_assert_not_held(&kvm_lock);
+ kvm_arch_resume(kvm_usage_count);
}

static struct syscore_ops kvm_syscore_ops = {
--
2.25.1

2022-09-22 18:49:11

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 14/30] KVM: Add arch hook for cpu online event

From: Isaku Yamahata <[email protected]>

Factor out the logic on cpu hotplug event as arch callback. Later kvm/x86
will override it.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 58 +++++++++++++++++++++++-----------------
2 files changed, 34 insertions(+), 25 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1adbf74e3047..013b33a19030 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1434,6 +1434,7 @@ void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_
static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
#endif

+int kvm_arch_online_cpu(unsigned int cpu, int usage_count);
int kvm_arch_reboot(int val);
int kvm_arch_suspend(int usage_count);
void kvm_arch_resume(int usage_count);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1270f88c2a1e..4c535ae412a7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1142,6 +1142,38 @@ int __weak kvm_arch_del_vm(int usage_count)
return 0;
}

+int __weak kvm_arch_online_cpu(unsigned int cpu, int usage_count)
+{
+ int ret = 0;
+
+ ret = kvm_arch_check_processor_compat();
+ if (ret)
+ return ret;
+
+ /*
+ * Abort the CPU online process if hardware virtualization cannot
+ * be enabled. Otherwise running VMs would encounter unrecoverable
+ * errors when scheduled to this CPU.
+ */
+ if (usage_count) {
+ WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
+
+ /*
+ * arch callback kvm_arch_hardware_eanble() assumes that
+ * preemption is disabled for historical reason. Disable
+ * preemption until all arch callbacks are fixed.
+ */
+ preempt_disable();
+ hardware_enable_nolock(NULL);
+ preempt_enable();
+ if (atomic_read(&hardware_enable_failed)) {
+ atomic_set(&hardware_enable_failed, 0);
+ ret = -EIO;
+ }
+ }
+ return ret;
+}
+
int __weak kvm_arch_reboot(int val)
{
on_each_cpu(hardware_disable_nolock, NULL, 1);
@@ -5085,32 +5117,8 @@ static int kvm_online_cpu(unsigned int cpu)
{
int ret;

- ret = kvm_arch_check_processor_compat();
- if (ret)
- return ret;
-
mutex_lock(&kvm_lock);
- /*
- * Abort the CPU online process if hardware virtualization cannot
- * be enabled. Otherwise running VMs would encounter unrecoverable
- * errors when scheduled to this CPU.
- */
- if (kvm_usage_count) {
- WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
-
- /*
- * arch callback kvm_arch_hardware_eanble() assumes that
- * preemption is disabled for historical reason. Disable
- * preemption until all arch callbacks are fixed.
- */
- preempt_disable();
- hardware_enable_nolock(NULL);
- preempt_enable();
- if (atomic_read(&hardware_enable_failed)) {
- atomic_set(&hardware_enable_failed, 0);
- ret = -EIO;
- }
- }
+ ret = kvm_arch_online_cpu(cpu, kvm_usage_count);
mutex_unlock(&kvm_lock);
return ret;
}
--
2.25.1

2022-09-22 18:50:07

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 07/30] KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section

From: Chao Gao <[email protected]>

The CPU STARTING section doesn't allow callbacks to fail. Move KVM's
hotplug callback to ONLINE section so that it can abort onlining a CPU in
certain cases to avoid potentially breaking VMs running on existing CPUs.
For example, when kvm fails to enable hardware virtualization on the
hotplugged CPU.

Place KVM's hotplug state before CPUHP_AP_SCHED_WAIT_EMPTY as it ensures
when offlining a CPU, all user tasks and non-pinned kernel tasks have left
the CPU, i.e. there cannot be a vCPU task around. So, it is safe for KVM's
CPU offline callback to disable hardware virtualization at that point.
Likewise, KVM's online callback can enable hardware virtualization before
any vCPU task gets a chance to run on hotplugged CPUs.

KVM's CPU hotplug callbacks are renamed as well.

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Chao Gao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Yuan Yao <[email protected]>
---
include/linux/cpuhotplug.h | 2 +-
virt/kvm/kvm_main.c | 30 ++++++++++++++++++++++--------
2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 7337414e4947..de45be38dd27 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -185,7 +185,6 @@ enum cpuhp_state {
CPUHP_AP_CSKY_TIMER_STARTING,
CPUHP_AP_TI_GP_TIMER_STARTING,
CPUHP_AP_HYPERV_TIMER_STARTING,
- CPUHP_AP_KVM_STARTING,
/* Must be the last timer callback */
CPUHP_AP_DUMMY_TIMER_STARTING,
CPUHP_AP_ARM_XEN_STARTING,
@@ -200,6 +199,7 @@ enum cpuhp_state {

/* Online section invoked on the hotplugged CPU from the hotplug thread */
CPUHP_AP_ONLINE_IDLE,
+ CPUHP_AP_KVM_ONLINE,
CPUHP_AP_SCHED_WAIT_EMPTY,
CPUHP_AP_SMPBOOT_THREADS,
CPUHP_AP_X86_VDSO_VMA_ONLINE,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4f19c47aab1c..1bb7038e1ecf 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5011,13 +5011,27 @@ static void hardware_enable_nolock(void *junk)
}
}

-static int kvm_starting_cpu(unsigned int cpu)
+static int kvm_online_cpu(unsigned int cpu)
{
+ int ret = 0;
+
raw_spin_lock(&kvm_count_lock);
- if (kvm_usage_count)
+ /*
+ * Abort the CPU online process if hardware virtualization cannot
+ * be enabled. Otherwise running VMs would encounter unrecoverable
+ * errors when scheduled to this CPU.
+ */
+ if (kvm_usage_count) {
+ WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
+
hardware_enable_nolock(NULL);
+ if (atomic_read(&hardware_enable_failed)) {
+ atomic_set(&hardware_enable_failed, 0);
+ ret = -EIO;
+ }
+ }
raw_spin_unlock(&kvm_count_lock);
- return 0;
+ return ret;
}

static void hardware_disable_nolock(void *junk)
@@ -5030,7 +5044,7 @@ static void hardware_disable_nolock(void *junk)
kvm_arch_hardware_disable();
}

-static int kvm_dying_cpu(unsigned int cpu)
+static int kvm_offline_cpu(unsigned int cpu)
{
raw_spin_lock(&kvm_count_lock);
if (kvm_usage_count)
@@ -5841,8 +5855,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
goto out_free_2;
}

- r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_STARTING, "kvm/cpu:starting",
- kvm_starting_cpu, kvm_dying_cpu);
+ r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
+ kvm_online_cpu, kvm_offline_cpu);
if (r)
goto out_free_2;
register_reboot_notifier(&kvm_reboot_notifier);
@@ -5903,7 +5917,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
kmem_cache_destroy(kvm_vcpu_cache);
out_free_3:
unregister_reboot_notifier(&kvm_reboot_notifier);
- cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
+ cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
out_free_2:
kvm_arch_hardware_unsetup();
out_free_1:
@@ -5929,7 +5943,7 @@ void kvm_exit(void)
kvm_async_pf_deinit();
unregister_syscore_ops(&kvm_syscore_ops);
unregister_reboot_notifier(&kvm_reboot_notifier);
- cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
+ cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
on_each_cpu(hardware_disable_nolock, NULL, 1);
kvm_arch_hardware_unsetup();
kvm_arch_exit();
--
2.25.1

2022-09-22 18:50:39

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 24/30] KVM: Add config to not compile kvm_arch.c

From: Isaku Yamahata <[email protected]>

So that kvm_arch_hardware_enable/disable() aren't required.

Once the conversion of all KVM archs is done, this config and kvm_arch.c
should be removed.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/x86.c | 4 ++--
include/linux/kvm_host.h | 3 +++
virt/kvm/Kconfig | 3 +++
virt/kvm/Makefile.kvm | 5 ++++-
5 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index e3cbd7706136..e2e16205425d 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -25,6 +25,7 @@ config KVM
depends on X86_LOCAL_APIC
select PREEMPT_NOTIFIERS
select MMU_NOTIFIER
+ select HAVE_KVM_OVERRIDE_HARDWARE_ENABLE
select HAVE_KVM_IRQCHIP
select HAVE_KVM_PFNCACHE
select HAVE_KVM_IRQFD
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bf8d3b901725..01687b553915 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11864,7 +11864,7 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);

-int kvm_arch_hardware_enable(void)
+static int kvm_arch_hardware_enable(void)
{
return static_call(kvm_x86_hardware_enable)();
}
@@ -11961,7 +11961,7 @@ void kvm_arch_resume(int usage_count)
}
}

-void kvm_arch_hardware_disable(void)
+static void kvm_arch_hardware_disable(void)
{
static_call(kvm_x86_hardware_disable)();
drop_user_return_notifiers();
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8dfa212b4543..7efc4792ff5f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1440,8 +1440,11 @@ int kvm_arch_reboot(int val);
int kvm_arch_suspend(int usage_count);
void kvm_arch_resume(int usage_count);

+#ifndef CONFIG_HAVE_KVM_OVERRIDE_HARDWARE_ENABLE
int kvm_arch_hardware_enable(void);
void kvm_arch_hardware_disable(void);
+#endif
+
int kvm_arch_hardware_setup(void *opaque);
void kvm_arch_hardware_unsetup(void);
int kvm_arch_check_processor_compat(void);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index a8c5c9f06b3c..917314a87696 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -72,3 +72,6 @@ config KVM_XFER_TO_GUEST_WORK

config HAVE_KVM_PM_NOTIFIER
bool
+
+config HAVE_KVM_OVERRIDE_HARDWARE_ENABLE
+ def_bool n
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 428b09b3f80a..c0187ec4f83c 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -5,7 +5,10 @@

KVM ?= ../../../virt/kvm

-kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o $(KVM)/kvm_arch.o
+kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
+ifneq ($(CONFIG_HAVE_KVM_OVERRIDE_HARDWARE_ENABLE), y)
+kvm-y += $(KVM)/kvm_arch.o
+endif
kvm-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
--
2.25.1

2022-09-22 18:51:20

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 20/30] KVM: Introduce an arch wrapper to check all processor compatibility

From: Isaku Yamahata <[email protected]>

Introduce an arch wrapper to check all processor compatibility and define
default implementation as weak symbol to keep the current logic.

The hardware feature compatibility check is arch dependent, only x86 KVM
does cpu feature check on all processors. It doesn't make much sense to
enforce the current implementation to invoke check function on each
processors. Introduce a arch callback,
kvm_arch_check_processor_compat_all(), so that arch code can override it.

Eventually feature check should be pushed down into arch callback,
(kvm_arch_hardware_setup(), kvm_arch_online_cpu(), and kvm_arch_resume()),
the two compatibility check, kvm_arch_check_processor_compat{,_all}(), will
be eliminated. This is a transitional step for it.

Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Chao Gao <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_arch.c | 18 ++++++++++++++++++
virt/kvm/kvm_main.c | 13 +++----------
3 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f3a79d55ca8c..2c5ca8741ca5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1445,6 +1445,7 @@ void kvm_arch_hardware_disable(void);
int kvm_arch_hardware_setup(void *opaque);
void kvm_arch_hardware_unsetup(void);
int kvm_arch_check_processor_compat(void);
+int kvm_arch_check_processor_compat_all(void);
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index 10fd29bdd6e1..87ee84c09634 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -96,6 +96,24 @@ int __weak kvm_arch_del_vm(int usage_count)
return 0;
}

+static void check_processor_compat(void *rtn)
+{
+ *(int *)rtn = kvm_arch_check_processor_compat();
+}
+
+int __weak kvm_arch_check_processor_compat_all(void)
+{
+ int cpu;
+ int r;
+
+ for_each_online_cpu(cpu) {
+ smp_call_function_single(cpu, check_processor_compat, &r, 1);
+ if (r < 0)
+ return r;
+ }
+ return 0;
+}
+
int __weak kvm_arch_online_cpu(unsigned int cpu, int usage_count)
{
int ret;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b1a09d2d5982..d5f882fb9e0c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5752,11 +5752,6 @@ void kvm_unregister_perf_callbacks(void)
}
#endif

-static void check_processor_compat(void *rtn)
-{
- *(int *)rtn = kvm_arch_check_processor_compat();
-}
-
int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
struct module *module)
{
@@ -5782,11 +5777,9 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
if (r < 0)
goto out_free_1;

- for_each_online_cpu(cpu) {
- smp_call_function_single(cpu, check_processor_compat, &r, 1);
- if (r < 0)
- goto out_free_2;
- }
+ r = kvm_arch_check_processor_compat_all();
+ if (r < 0)
+ goto out_free_2;

r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
kvm_online_cpu, kvm_offline_cpu);
--
2.25.1

2022-09-22 18:53:28

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 27/30] RFC: KVM: powerpc: Move processor compatibility check to hardware setup

From: Isaku Yamahata <[email protected]>

Move processor compatibility check from kvm_arch_processor_compat() into
kvm_arch_hardware_setup(). The check does model name comparison with a
global variable, cur_cpu_spec. There is no point to check it at run time
on all processors.

kvmppc_core_check_processor_compat() checks the global variable. There are
five implementation for it as follows.

arch/powerpc/include/asm/cputable.h: extern struct cpu_spec *cur_cpu_spec;
arch/powerpc/kvm/book3s.c: return 0
arch/powerpc/kvm/e500.c: strcmp(cur_cpu_spec->cpu_name, "e500v2")
arch/powerpc/kvm/e500mc.c: strcmp(cur_cpu_spec->cpu_name, "e500mc")
strcmp(cur_cpu_spec->cpu_name, "e5500")
strcmp(cur_cpu_spec->cpu_name, "e6500")

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Cc: [email protected]
Cc: Fabiano Rosas <[email protected]>
---
arch/powerpc/kvm/powerpc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 7b56d6ccfdfb..31dc4f231e9d 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -444,12 +444,12 @@ int kvm_arch_hardware_enable(void)

int kvm_arch_hardware_setup(void *opaque)
{
- return 0;
+ return kvmppc_core_check_processor_compat();
}

int kvm_arch_check_processor_compat(void)
{
- return kvmppc_core_check_processor_compat();
+ return 0;
}

int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
--
2.25.1

2022-09-22 18:54:18

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 26/30] KVM: x86: Make x86 processor compat check callback empty

From: Isaku Yamahata <[email protected]>

Move processor compatibility check on all processors into
kvm_arch_hardware_setup() and make kvm_arch_check_processor_compat{,_all}()
empty. This is a preparation step to eliminate them.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 45 +++++++++++++++++++++++++++++----------------
1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index aa6594de1fc1..00cc74276819 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -129,6 +129,8 @@ static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu);
static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);

+static int kvm_check_processor_compatibility(void);
+
struct kvm_x86_ops kvm_x86_ops __read_mostly;

#define KVM_X86_OP(func) \
@@ -12033,21 +12035,8 @@ int kvm_arch_del_vm(int usage_count)
return 0;
}

-static void check_processor_compat(void *rtn)
-{
- *(int *)rtn = kvm_arch_check_processor_compat();
-}
-
int kvm_arch_check_processor_compat_all(void)
{
- int cpu;
- int r;
-
- for_each_online_cpu(cpu) {
- smp_call_function_single(cpu, check_processor_compat, &r, 1);
- if (r < 0)
- return r;
- }
return 0;
}

@@ -12055,7 +12044,7 @@ int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
{
int ret;

- ret = kvm_arch_check_processor_compat();
+ ret = kvm_check_processor_compatibility();
if (ret)
return ret;

@@ -12125,6 +12114,24 @@ static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
kvm_pmu_ops_update(ops->pmu_ops);
}

+static void check_processor_compat(void *rtn)
+{
+ *(int *)rtn = kvm_check_processor_compatibility();
+}
+
+static int kvm_check_processor_compatibility_all(void)
+{
+ int cpu;
+ int r;
+
+ for_each_online_cpu(cpu) {
+ smp_call_function_single(cpu, check_processor_compat, &r, 1);
+ if (r < 0)
+ return r;
+ }
+ return 0;
+}
+
int kvm_arch_hardware_setup(void *opaque)
{
struct kvm_x86_init_ops *ops = opaque;
@@ -12165,7 +12172,8 @@ int kvm_arch_hardware_setup(void *opaque)
}
kvm_caps.default_tsc_scaling_ratio = 1ULL << kvm_caps.tsc_scaling_ratio_frac_bits;
kvm_init_msr_list();
- return 0;
+
+ return kvm_check_processor_compatibility_all();
}

void kvm_arch_hardware_unsetup(void)
@@ -12175,7 +12183,7 @@ void kvm_arch_hardware_unsetup(void)
static_call(kvm_x86_hardware_unsetup)();
}

-int kvm_arch_check_processor_compat(void)
+static int kvm_check_processor_compatibility(void)
{
int cpu = smp_processor_id();
struct cpuinfo_x86 *c = &cpu_data(cpu);
@@ -12195,6 +12203,11 @@ int kvm_arch_check_processor_compat(void)
return static_call(kvm_x86_check_processor_compatibility)();
}

+int kvm_arch_check_processor_compat(void)
+{
+ return 0;
+}
+
bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
{
return vcpu->kvm->arch.bsp_vcpu_id == vcpu->vcpu_id;
--
2.25.1

2022-09-22 18:56:05

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 30/30] RFC: KVM: Remove cpus_hardware_enabled and related sanity check

From: Isaku Yamahata <[email protected]>

cpus_hardware_enabled mask seems incomplete protection against other kernel
component using CPU virtualization feature. Because it's obscure and
incomplete, remove the check.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
virt/kvm/kvm_arch.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index 076a55d59988..e7708bfa0360 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -12,23 +12,16 @@

#include <linux/kvm_host.h>

-static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
-
static int __hardware_enable(void)
{
- int cpu = raw_smp_processor_id();
int r;

WARN_ON_ONCE(preemptible());

- if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
- return 0;
r = kvm_arch_hardware_enable();
if (r)
pr_warn("kvm: enabling virtualization on CPU%d failed during %pSb\n",
- cpu, __builtin_return_address(0));
- else
- cpumask_set_cpu(cpu, &cpus_hardware_enabled);
+ smp_processor_id(), __builtin_return_address(0));
return r;
}

@@ -42,13 +35,7 @@ static void hardware_enable(void *arg)

static void hardware_disable(void *junk)
{
- int cpu = raw_smp_processor_id();
-
WARN_ON_ONCE(preemptible());
-
- if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
- return;
- cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
kvm_arch_hardware_disable();
}

--
2.25.1

2022-09-22 18:56:11

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 05/30] KVM: Provide more information in kernel log if hardware enabling fails

From: Isaku Yamahata <[email protected]>

Provide the name of the calling function to hardware_enable_nolock() and
include it in the error message to provide additional information on
exactly what path failed.

Opportunistically bump the pr_info() to pr_warn(), failure to enable
virtualization support is warn-worthy as _something_ is wrong with the
system.

Suggested-by: Chao Gao <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
virt/kvm/kvm_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4243a9541543..4f19c47aab1c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5006,7 +5006,8 @@ static void hardware_enable_nolock(void *junk)
if (r) {
cpumask_clear_cpu(cpu, cpus_hardware_enabled);
atomic_inc(&hardware_enable_failed);
- pr_info("kvm: enabling virtualization on CPU%d failed\n", cpu);
+ pr_warn("kvm: enabling virtualization on CPU%d failed during %pSb\n",
+ cpu, __builtin_return_address(0));
}
}

--
2.25.1

2022-09-22 18:56:54

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 29/30] RFC: KVM: x86: Remove cpus_hardware_enabled and related sanity check

From: Isaku Yamahata <[email protected]>

cpus_hardware_enabled mask seems incomplete protection against other kernel
component using CPU virtualization feature. Because it's obscure and
incomplete, remove the check.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f791c93391b..7505dd4ae695 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11958,23 +11958,16 @@ void kvm_arch_resume(int usage_count)
}
}

-static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
-
static int __hardware_enable(void)
{
- int cpu = raw_smp_processor_id();
int r;

WARN_ON_ONCE(preemptible());

- if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
- return 0;
r = static_call(kvm_x86_hardware_enable)();
if (r)
pr_warn("kvm: enabling virtualization on CPU%d failed during %pSb\n",
- cpu, __builtin_return_address(0));
- else
- cpumask_set_cpu(cpu, &cpus_hardware_enabled);
+ smp_processor_id(), __builtin_return_address(0));
return r;
}

@@ -11988,13 +11981,7 @@ static void hardware_enable(void *arg)

static void hardware_disable(void *junk)
{
- int cpu = raw_smp_processor_id();
-
WARN_ON_ONCE(preemptible());
-
- if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
- return;
- cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
static_call(kvm_x86_hardware_disable)();
drop_user_return_notifiers();
}
--
2.25.1

2022-09-22 18:59:30

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 28/30] KVM: Eliminate kvm_arch_check_processor_compat()

From: Isaku Yamahata <[email protected]>

Now all arch has "return 0" implementation. Eliminate it. If feature
compatibility check is needed, it should be done in
kvm_arch_hardware_setup(), kvm_arch_online_cpu(), and kvm_arch_resume().

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/arm64/kvm/arm.c | 5 -----
arch/mips/kvm/mips.c | 5 -----
arch/powerpc/kvm/powerpc.c | 5 -----
arch/riscv/kvm/main.c | 5 -----
arch/s390/kvm/kvm-s390.c | 5 -----
arch/x86/kvm/x86.c | 10 ----------
include/linux/kvm_host.h | 2 --
virt/kvm/kvm_arch.c | 22 ----------------------
virt/kvm/kvm_main.c | 4 ----
9 files changed, 63 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7e83498b83aa..de0397bd7b18 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -68,11 +68,6 @@ int kvm_arch_hardware_setup(void *opaque)
return 0;
}

-int kvm_arch_check_processor_compat(void)
-{
- return 0;
-}
-
int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
struct kvm_enable_cap *cap)
{
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 092d09fb6a7e..f4feae89075c 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -140,11 +140,6 @@ int kvm_arch_hardware_setup(void *opaque)
return 0;
}

-int kvm_arch_check_processor_compat(void)
-{
- return 0;
-}
-
extern void kvm_init_loongson_ipi(struct kvm *kvm);

int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 31dc4f231e9d..546ebbe16bfe 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -447,11 +447,6 @@ int kvm_arch_hardware_setup(void *opaque)
return kvmppc_core_check_processor_compat();
}

-int kvm_arch_check_processor_compat(void)
-{
- return 0;
-}
-
int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
{
struct kvmppc_ops *kvm_ops = NULL;
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index f8d6372d208f..ebabcb3dfb8c 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -20,11 +20,6 @@ long kvm_arch_dev_ioctl(struct file *filp,
return -EINVAL;
}

-int kvm_arch_check_processor_compat(void)
-{
- return 0;
-}
-
int kvm_arch_hardware_setup(void *opaque)
{
return 0;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index e26d4dd85668..9c5d3e4b464f 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -254,11 +254,6 @@ int kvm_arch_hardware_enable(void)
return 0;
}

-int kvm_arch_check_processor_compat(void)
-{
- return 0;
-}
-
/* forward declarations */
static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start,
unsigned long end);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 00cc74276819..1f791c93391b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12035,11 +12035,6 @@ int kvm_arch_del_vm(int usage_count)
return 0;
}

-int kvm_arch_check_processor_compat_all(void)
-{
- return 0;
-}
-
int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
{
int ret;
@@ -12203,11 +12198,6 @@ static int kvm_check_processor_compatibility(void)
return static_call(kvm_x86_check_processor_compatibility)();
}

-int kvm_arch_check_processor_compat(void)
-{
- return 0;
-}
-
bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
{
return vcpu->kvm->arch.bsp_vcpu_id == vcpu->vcpu_id;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7efc4792ff5f..87d4f42f3ff9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1447,8 +1447,6 @@ void kvm_arch_hardware_disable(void);

int kvm_arch_hardware_setup(void *opaque);
void kvm_arch_hardware_unsetup(void);
-int kvm_arch_check_processor_compat(void);
-int kvm_arch_check_processor_compat_all(void);
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index 2509c2777a49..076a55d59988 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -82,32 +82,10 @@ int __weak kvm_arch_del_vm(int usage_count)
return 0;
}

-static void check_processor_compat(void *rtn)
-{
- *(int *)rtn = kvm_arch_check_processor_compat();
-}
-
-int __weak kvm_arch_check_processor_compat_all(void)
-{
- int cpu;
- int r;
-
- for_each_online_cpu(cpu) {
- smp_call_function_single(cpu, check_processor_compat, &r, 1);
- if (r < 0)
- return r;
- }
- return 0;
-}
-
int __weak kvm_arch_online_cpu(unsigned int cpu, int usage_count)
{
int ret;

- ret = kvm_arch_check_processor_compat();
- if (ret)
- return ret;
-
if (!usage_count)
return 0;

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d5f882fb9e0c..1c1a2b0630bc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5777,10 +5777,6 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
if (r < 0)
goto out_free_1;

- r = kvm_arch_check_processor_compat_all();
- if (r < 0)
- goto out_free_2;
-
r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
kvm_online_cpu, kvm_offline_cpu);
if (r)
--
2.25.1

2022-09-22 18:59:58

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 11/30] KVM: Add arch hook for reboot event

From: Isaku Yamahata <[email protected]>

Factor out the logic on reboot event as arch hook. Later kvm/x86 overrides
it.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
include/linux/kvm_host.h | 2 ++
virt/kvm/kvm_main.c | 18 ++++++++++++++++--
2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3fbb01bbac98..084ee8a13e9f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1434,6 +1434,8 @@ void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_
static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
#endif

+int kvm_arch_reboot(int val);
+
int kvm_arch_hardware_enable(void);
void kvm_arch_hardware_disable(void);
int kvm_arch_hardware_setup(void *opaque);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e2c8823786ff..58385000b73f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1142,6 +1142,12 @@ int __weak kvm_arch_del_vm(int usage_count)
return 0;
}

+int __weak kvm_arch_reboot(int val)
+{
+ on_each_cpu(hardware_disable_nolock, NULL, 1);
+ return NOTIFY_OK;
+}
+
/*
* Called just after removing the VM from the vm_list, but before doing any
* other destruction.
@@ -5135,6 +5141,8 @@ static void kvm_del_vm(void)
static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
void *v)
{
+ int r;
+
/*
* Some (well, at least mine) BIOSes hang on reboot if
* in vmx root mode.
@@ -5143,8 +5151,14 @@ static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
*/
pr_info("kvm: exiting hardware virtualization\n");
kvm_rebooting = true;
- on_each_cpu(hardware_disable_nolock, NULL, 1);
- return NOTIFY_OK;
+
+ /* This hook is called without cpuhotplug disabled. */
+ cpus_read_lock();
+ mutex_lock(&kvm_lock);
+ r = kvm_arch_reboot(val);
+ mutex_unlock(&kvm_lock);
+ cpus_read_unlock();
+ return r;
}

static struct notifier_block kvm_reboot_notifier = {
--
2.25.1

2022-09-22 19:15:37

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 09/30] KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock

From: Isaku Yamahata <[email protected]>

Because kvm_count_lock unnecessarily complicates the KVM locking convention
Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock for
simplicity. kvm_arch_hardware_enable/disable() callbacks depend on
non-preemptiblity with the spin lock. Add preempt_disable/enable()
around hardware enable/disable callback to keep the assumption.

Because kvm_suspend() and kvm_resume() is called with interrupt disabled,
they don't need preempt_disable/enable() pair.

Opportunistically add some comments on locking.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
Documentation/virt/kvm/locking.rst | 14 +++-----
virt/kvm/kvm_main.c | 56 +++++++++++++++++++++++-------
2 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 845a561629f1..55d6559ace2a 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -216,15 +216,11 @@ time it will be set using the Dirty tracking mechanism described above.
:Type: mutex
:Arch: any
:Protects: - vm_list
-
-``kvm_count_lock``
-^^^^^^^^^^^^^^^^^^
-
-:Type: raw_spinlock_t
-:Arch: any
-:Protects: - hardware virtualization enable/disable
-:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
- migration.
+ - kvm_usage_count
+ - hardware virtualization enable/disable
+:Comment: Use cpus_read_lock() for hardware virtualization enable/disable
+ because hardware enabling/disabling must be atomic /wrt
+ CPU hotplug. The lock order is cpus lock => kvm_lock.

``kvm->mn_invalidate_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b1bf44af523c..c4b908553726 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -100,7 +100,6 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
*/

DEFINE_MUTEX(kvm_lock);
-static DEFINE_RAW_SPINLOCK(kvm_count_lock);
LIST_HEAD(vm_list);

static cpumask_var_t cpus_hardware_enabled;
@@ -4996,6 +4995,8 @@ static void hardware_enable_nolock(void *junk)
int cpu = raw_smp_processor_id();
int r;

+ WARN_ON_ONCE(preemptible());
+
if (cpumask_test_cpu(cpu, cpus_hardware_enabled))
return;

@@ -5019,7 +5020,7 @@ static int kvm_online_cpu(unsigned int cpu)
if (ret)
return ret;

- raw_spin_lock(&kvm_count_lock);
+ mutex_lock(&kvm_lock);
/*
* Abort the CPU online process if hardware virtualization cannot
* be enabled. Otherwise running VMs would encounter unrecoverable
@@ -5028,13 +5029,20 @@ static int kvm_online_cpu(unsigned int cpu)
if (kvm_usage_count) {
WARN_ON_ONCE(atomic_read(&hardware_enable_failed));

+ /*
+ * arch callback kvm_arch_hardware_eanble() assumes that
+ * preemption is disabled for historical reason. Disable
+ * preemption until all arch callbacks are fixed.
+ */
+ preempt_disable();
hardware_enable_nolock(NULL);
+ preempt_enable();
if (atomic_read(&hardware_enable_failed)) {
atomic_set(&hardware_enable_failed, 0);
ret = -EIO;
}
}
- raw_spin_unlock(&kvm_count_lock);
+ mutex_unlock(&kvm_lock);
return ret;
}

@@ -5042,6 +5050,8 @@ static void hardware_disable_nolock(void *junk)
{
int cpu = raw_smp_processor_id();

+ WARN_ON_ONCE(preemptible());
+
if (!cpumask_test_cpu(cpu, cpus_hardware_enabled))
return;
cpumask_clear_cpu(cpu, cpus_hardware_enabled);
@@ -5050,10 +5060,18 @@ static void hardware_disable_nolock(void *junk)

static int kvm_offline_cpu(unsigned int cpu)
{
- raw_spin_lock(&kvm_count_lock);
- if (kvm_usage_count)
+ mutex_lock(&kvm_lock);
+ if (kvm_usage_count) {
+ /*
+ * arch callback kvm_arch_hardware_disable() assumes that
+ * preemption is disabled for historical reason. Disable
+ * preemption until all arch callbacks are fixed.
+ */
+ preempt_disable();
hardware_disable_nolock(NULL);
- raw_spin_unlock(&kvm_count_lock);
+ preempt_enable();
+ }
+ mutex_unlock(&kvm_lock);
return 0;
}

@@ -5068,9 +5086,11 @@ static void hardware_disable_all_nolock(void)

static void hardware_disable_all(void)
{
- raw_spin_lock(&kvm_count_lock);
+ cpus_read_lock();
+ mutex_lock(&kvm_lock);
hardware_disable_all_nolock();
- raw_spin_unlock(&kvm_count_lock);
+ mutex_unlock(&kvm_lock);
+ cpus_read_unlock();
}

static int hardware_enable_all(void)
@@ -5088,7 +5108,7 @@ static int hardware_enable_all(void)
* Disable CPU hotplug to prevent this case from happening.
*/
cpus_read_lock();
- raw_spin_lock(&kvm_count_lock);
+ mutex_lock(&kvm_lock);

kvm_usage_count++;
if (kvm_usage_count == 1) {
@@ -5101,7 +5121,7 @@ static int hardware_enable_all(void)
}
}

- raw_spin_unlock(&kvm_count_lock);
+ mutex_unlock(&kvm_lock);
cpus_read_unlock();

return r;
@@ -5708,15 +5728,27 @@ static void kvm_init_debug(void)

static int kvm_suspend(void)
{
- if (kvm_usage_count)
+ /*
+ * The caller ensures that CPU hotplug is disabled by
+ * cpu_hotplug_disable() and other CPUs are offlined. No need for
+ * locking.
+ */
+ lockdep_assert_not_held(&kvm_lock);
+
+ if (kvm_usage_count) {
+ /*
+ * Because kvm_suspend() is called with interrupt disabled, no
+ * need to disable preemption.
+ */
hardware_disable_nolock(NULL);
+ }
return 0;
}

static void kvm_resume(void)
{
if (kvm_usage_count) {
- lockdep_assert_not_held(&kvm_count_lock);
+ lockdep_assert_not_held(&kvm_lock);
hardware_enable_nolock(NULL);
}
}
--
2.25.1

2022-09-22 19:19:33

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 04/30] Partially revert "KVM: Pass kvm_init()'s opaque param to additional arch funcs"

From: Chao Gao <[email protected]>

This partially reverts commit b99040853738 ("KVM: Pass kvm_init()'s opaque
param to additional arch funcs") remove opaque from
kvm_arch_check_processor_compat because no one uses this opaque now.
Address conflicts for ARM (due to file movement) and manually handle RISC-V
which comes after the commit.

And changes about kvm_arch_hardware_setup() in original commit are still
needed so they are not reverted.

Signed-off-by: Chao Gao <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Acked-by: Anup Patel <[email protected]>
Acked-by: Claudio Imbrenda <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Kai Huang <[email protected]>
Reviewed-by: Yuan Yao <[email protected]>
---
arch/arm64/kvm/arm.c | 2 +-
arch/mips/kvm/mips.c | 2 +-
arch/powerpc/kvm/powerpc.c | 2 +-
arch/riscv/kvm/main.c | 2 +-
arch/s390/kvm/kvm-s390.c | 2 +-
arch/x86/kvm/x86.c | 2 +-
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 16 +++-------------
8 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2ff0ef62abad..3385fb57c11a 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -68,7 +68,7 @@ int kvm_arch_hardware_setup(void *opaque)
return 0;
}

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
return 0;
}
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index a25e0b73ee70..092d09fb6a7e 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -140,7 +140,7 @@ int kvm_arch_hardware_setup(void *opaque)
return 0;
}

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
return 0;
}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index fb1490761c87..7b56d6ccfdfb 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -447,7 +447,7 @@ int kvm_arch_hardware_setup(void *opaque)
return 0;
}

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
return kvmppc_core_check_processor_compat();
}
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index 1549205fe5fe..f8d6372d208f 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -20,7 +20,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
return -EINVAL;
}

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
return 0;
}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index edfd4bbd0cba..e26d4dd85668 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -254,7 +254,7 @@ int kvm_arch_hardware_enable(void)
return 0;
}

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
return 0;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 34c7336ac581..230812d6cbfd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12028,7 +12028,7 @@ void kvm_arch_hardware_unsetup(void)
static_call(kvm_x86_hardware_unsetup)();
}

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f4519d3689e1..eab352902de7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1438,7 +1438,7 @@ int kvm_arch_hardware_enable(void);
void kvm_arch_hardware_disable(void);
int kvm_arch_hardware_setup(void *opaque);
void kvm_arch_hardware_unsetup(void);
-int kvm_arch_check_processor_compat(void *opaque);
+int kvm_arch_check_processor_compat(void);
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 584a5bab3af3..4243a9541543 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5799,22 +5799,14 @@ void kvm_unregister_perf_callbacks(void)
}
#endif

-struct kvm_cpu_compat_check {
- void *opaque;
- int *ret;
-};
-
-static void check_processor_compat(void *data)
+static void check_processor_compat(void *rtn)
{
- struct kvm_cpu_compat_check *c = data;
-
- *c->ret = kvm_arch_check_processor_compat(c->opaque);
+ *(int *)rtn = kvm_arch_check_processor_compat();
}

int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
struct module *module)
{
- struct kvm_cpu_compat_check c;
int r;
int cpu;

@@ -5842,10 +5834,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
if (r < 0)
goto out_free_1;

- c.ret = &r;
- c.opaque = opaque;
for_each_online_cpu(cpu) {
- smp_call_function_single(cpu, check_processor_compat, &c, 1);
+ smp_call_function_single(cpu, check_processor_compat, &r, 1);
if (r < 0)
goto out_free_2;
}
--
2.25.1

2022-09-22 19:19:33

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 17/30] KVM: Move out KVM arch PM hooks and hardware enable/disable logic

From: Isaku Yamahata <[email protected]>

To make clear that those files are default implementation and that KVM/x86
(and other KVM arch in future) will override them, split out those into a
dedicated file. Once conversions for all kvm archs are done, the file will
be deleted.

Signed-off-by: Isaku Yamahata <[email protected]>
---
virt/kvm/Makefile.kvm | 2 +-
virt/kvm/kvm_arch.c | 164 ++++++++++++++++++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 160 -----------------------------------------
3 files changed, 165 insertions(+), 161 deletions(-)
create mode 100644 virt/kvm/kvm_arch.c

diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 2c27d5d0c367..428b09b3f80a 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -5,7 +5,7 @@

KVM ?= ../../../virt/kvm

-kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
+kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o $(KVM)/kvm_arch.o
kvm-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
new file mode 100644
index 000000000000..bcf8b74144e3
--- /dev/null
+++ b/virt/kvm/kvm_arch.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * kvm_arch.c: kvm default arch hooks for hardware enabling/disabling
+ * Copyright (c) 2022 Intel Corporation.
+ *
+ * Author:
+ * Isaku Yamahata <[email protected]>
+ * <[email protected]>
+ *
+ * TODO: Delete this file once the conversion of all KVM arch is done.
+ */
+
+#include <linux/kvm_host.h>
+
+static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
+static atomic_t hardware_enable_failed;
+
+/*
+ * Called after the VM is otherwise initialized, but just before adding it to
+ * the vm_list.
+ */
+int __weak kvm_arch_post_init_vm(struct kvm *kvm)
+{
+ return 0;
+}
+
+static void hardware_enable_nolock(void *junk)
+{
+ int cpu = raw_smp_processor_id();
+ int r;
+
+ WARN_ON_ONCE(preemptible());
+
+ if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
+ return;
+
+ cpumask_set_cpu(cpu, &cpus_hardware_enabled);
+
+ r = kvm_arch_hardware_enable();
+
+ if (r) {
+ cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
+ atomic_inc(&hardware_enable_failed);
+ pr_warn("kvm: enabling virtualization on CPU%d failed during %pSb\n",
+ cpu, __builtin_return_address(0));
+ }
+}
+
+static void hardware_disable_nolock(void *junk)
+{
+ int cpu = raw_smp_processor_id();
+
+ WARN_ON_ONCE(preemptible());
+
+ if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
+ return;
+ cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
+ kvm_arch_hardware_disable();
+}
+
+/*
+ * Called after the VM is otherwise initialized, but just before adding it to
+ * the vm_list.
+ */
+int __weak kvm_arch_add_vm(struct kvm *kvm, int usage_count)
+{
+ int r = 0;
+
+ if (usage_count != 1)
+ return 0;
+
+ atomic_set(&hardware_enable_failed, 0);
+ on_each_cpu(hardware_enable_nolock, NULL, 1);
+
+ if (atomic_read(&hardware_enable_failed)) {
+ r = -EBUSY;
+ goto err;
+ }
+
+ r = kvm_arch_post_init_vm(kvm);
+err:
+ if (r)
+ on_each_cpu(hardware_disable_nolock, NULL, 1);
+ return r;
+}
+
+int __weak kvm_arch_del_vm(int usage_count)
+{
+ if (usage_count)
+ return 0;
+
+ on_each_cpu(hardware_disable_nolock, NULL, 1);
+ return 0;
+}
+
+int __weak kvm_arch_online_cpu(unsigned int cpu, int usage_count)
+{
+ int ret = 0;
+
+ ret = kvm_arch_check_processor_compat();
+ if (ret)
+ return ret;
+
+ /*
+ * Abort the CPU online process if hardware virtualization cannot
+ * be enabled. Otherwise running VMs would encounter unrecoverable
+ * errors when scheduled to this CPU.
+ */
+ if (usage_count) {
+ WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
+
+ /*
+ * arch callback kvm_arch_hardware_eanble() assumes that
+ * preemption is disabled for historical reason. Disable
+ * preemption until all arch callbacks are fixed.
+ */
+ preempt_disable();
+ hardware_enable_nolock(NULL);
+ preempt_enable();
+ if (atomic_read(&hardware_enable_failed)) {
+ atomic_set(&hardware_enable_failed, 0);
+ ret = -EIO;
+ }
+ }
+ return ret;
+}
+
+int __weak kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
+{
+ if (usage_count) {
+ /*
+ * arch callback kvm_arch_hardware_disable() assumes that
+ * preemption is disabled for historical reason. Disable
+ * preemption until all arch callbacks are fixed.
+ */
+ preempt_disable();
+ hardware_disable_nolock(NULL);
+ preempt_enable();
+ }
+ return 0;
+}
+
+int __weak kvm_arch_reboot(int val)
+{
+ on_each_cpu(hardware_disable_nolock, NULL, 1);
+ return NOTIFY_OK;
+}
+
+int __weak kvm_arch_suspend(int usage_count)
+{
+ if (usage_count)
+ /*
+ * Because kvm_suspend() is called with interrupt disabled, no
+ * need to disable preemption.
+ */
+ hardware_disable_nolock(NULL);
+ return 0;
+}
+
+void __weak kvm_arch_resume(int usage_count)
+{
+ if (usage_count)
+ hardware_enable_nolock(NULL);
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d7c3bc14691f..b1a09d2d5982 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -102,9 +102,7 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
DEFINE_MUTEX(kvm_lock);
LIST_HEAD(vm_list);

-static cpumask_var_t cpus_hardware_enabled;
static int kvm_usage_count;
-static atomic_t hardware_enable_failed;

static struct kmem_cache *kvm_vcpu_cache;

@@ -142,8 +140,6 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
#define KVM_COMPAT(c) .compat_ioctl = kvm_no_compat_ioctl, \
.open = kvm_no_compat_open
#endif
-static void hardware_enable_nolock(void *junk);
-static void hardware_disable_nolock(void *junk);
static void kvm_del_vm(void);

static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
@@ -1098,120 +1094,6 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, const char *fdname)
return ret;
}

-/*
- * Called after the VM is otherwise initialized, but just before adding it to
- * the vm_list.
- */
-int __weak kvm_arch_post_init_vm(struct kvm *kvm)
-{
- return 0;
-}
-
-/*
- * Called after the VM is otherwise initialized, but just before adding it to
- * the vm_list.
- */
-int __weak kvm_arch_add_vm(struct kvm *kvm, int usage_count)
-{
- int r = 0;
-
- if (usage_count != 1)
- return 0;
-
- atomic_set(&hardware_enable_failed, 0);
- on_each_cpu(hardware_enable_nolock, NULL, 1);
-
- if (atomic_read(&hardware_enable_failed)) {
- r = -EBUSY;
- goto err;
- }
-
- r = kvm_arch_post_init_vm(kvm);
-err:
- if (r)
- on_each_cpu(hardware_disable_nolock, NULL, 1);
- return r;
-}
-
-int __weak kvm_arch_del_vm(int usage_count)
-{
- if (usage_count)
- return 0;
-
- on_each_cpu(hardware_disable_nolock, NULL, 1);
- return 0;
-}
-
-int __weak kvm_arch_online_cpu(unsigned int cpu, int usage_count)
-{
- int ret = 0;
-
- ret = kvm_arch_check_processor_compat();
- if (ret)
- return ret;
-
- /*
- * Abort the CPU online process if hardware virtualization cannot
- * be enabled. Otherwise running VMs would encounter unrecoverable
- * errors when scheduled to this CPU.
- */
- if (usage_count) {
- WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
-
- /*
- * arch callback kvm_arch_hardware_eanble() assumes that
- * preemption is disabled for historical reason. Disable
- * preemption until all arch callbacks are fixed.
- */
- preempt_disable();
- hardware_enable_nolock(NULL);
- preempt_enable();
- if (atomic_read(&hardware_enable_failed)) {
- atomic_set(&hardware_enable_failed, 0);
- ret = -EIO;
- }
- }
- return ret;
-}
-
-int __weak kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
-{
- if (usage_count) {
- /*
- * arch callback kvm_arch_hardware_disable() assumes that
- * preemption is disabled for historical reason. Disable
- * preemption until all arch callbacks are fixed.
- */
- preempt_disable();
- hardware_disable_nolock(NULL);
- preempt_enable();
- }
- return 0;
-}
-
-int __weak kvm_arch_reboot(int val)
-{
- on_each_cpu(hardware_disable_nolock, NULL, 1);
- return NOTIFY_OK;
-}
-
-int __weak kvm_arch_suspend(int usage_count)
-{
- if (usage_count)
- /*
- * Because kvm_suspend() is called with interrupt disabled, no
- * need to disable preemption.
- */
- hardware_disable_nolock(NULL);
- return 0;
-}
-
-void __weak kvm_arch_resume(int usage_count)
-{
- if (usage_count)
- hardware_enable_nolock(NULL);
-}
-
/*
* Called just after removing the VM from the vm_list, but before doing any
* other destruction.
@@ -5106,28 +4988,6 @@ static struct miscdevice kvm_dev = {
&kvm_chardev_ops,
};

-static void hardware_enable_nolock(void *junk)
-{
- int cpu = raw_smp_processor_id();
- int r;
-
- WARN_ON_ONCE(preemptible());
-
- if (cpumask_test_cpu(cpu, cpus_hardware_enabled))
- return;
-
- cpumask_set_cpu(cpu, cpus_hardware_enabled);
-
- r = kvm_arch_hardware_enable();
-
- if (r) {
- cpumask_clear_cpu(cpu, cpus_hardware_enabled);
- atomic_inc(&hardware_enable_failed);
- pr_warn("kvm: enabling virtualization on CPU%d failed during %pSb\n",
- cpu, __builtin_return_address(0));
- }
-}
-
static int kvm_online_cpu(unsigned int cpu)
{
int ret;
@@ -5138,18 +4998,6 @@ static int kvm_online_cpu(unsigned int cpu)
return ret;
}

-static void hardware_disable_nolock(void *junk)
-{
- int cpu = raw_smp_processor_id();
-
- WARN_ON_ONCE(preemptible());
-
- if (!cpumask_test_cpu(cpu, cpus_hardware_enabled))
- return;
- cpumask_clear_cpu(cpu, cpus_hardware_enabled);
- kvm_arch_hardware_disable();
-}
-
static int kvm_offline_cpu(unsigned int cpu)
{
int ret;
@@ -5930,11 +5778,6 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
if (r)
goto out_irqfd;

- if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL)) {
- r = -ENOMEM;
- goto out_free_0;
- }
-
r = kvm_arch_hardware_setup(opaque);
if (r < 0)
goto out_free_1;
@@ -6011,8 +5854,6 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
out_free_2:
kvm_arch_hardware_unsetup();
out_free_1:
- free_cpumask_var(cpus_hardware_enabled);
-out_free_0:
kvm_irqfd_exit();
out_irqfd:
kvm_arch_exit();
@@ -6037,7 +5878,6 @@ void kvm_exit(void)
kvm_arch_hardware_unsetup();
kvm_arch_exit();
kvm_irqfd_exit();
- free_cpumask_var(cpus_hardware_enabled);
kvm_vfio_ops_exit();
}
EXPORT_SYMBOL_GPL(kvm_exit);
--
2.25.1

2022-09-22 19:21:28

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 01/30] KVM: x86: Drop kvm_user_return_msr_cpu_online()

From: Sean Christopherson <[email protected]>

KVM/X86 uses user return notifier to switch MSR for guest or user space.
Snapshot host values on CPU online, change MSR values for guest, and
restore them on returning to user space. The current code abuses
kvm_arch_hardware_enable() which is called on kvm module initialization or
CPU online.

Remove such the abuse of kvm_arch_hardware_enable() by capturing the host
value on the first change of the MSR value to guest VM instead of CPU
online.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Chao Gao <[email protected]>
Reviewed-by: Yuan Yao <[email protected]>
---
arch/x86/kvm/x86.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 43a6a7efc6ec..9e1b3af4a074 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -196,6 +196,7 @@ module_param(eager_page_split, bool, 0644);

struct kvm_user_return_msrs {
struct user_return_notifier urn;
+ bool initialized;
bool registered;
struct kvm_user_return_msr_values {
u64 host;
@@ -409,18 +410,20 @@ int kvm_find_user_return_msr(u32 msr)
}
EXPORT_SYMBOL_GPL(kvm_find_user_return_msr);

-static void kvm_user_return_msr_cpu_online(void)
+static void kvm_user_return_msr_init_cpu(struct kvm_user_return_msrs *msrs)
{
- unsigned int cpu = smp_processor_id();
- struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
u64 value;
int i;

+ if (msrs->initialized)
+ return;
+
for (i = 0; i < kvm_nr_uret_msrs; ++i) {
rdmsrl_safe(kvm_uret_msrs_list[i], &value);
msrs->values[i].host = value;
msrs->values[i].curr = value;
}
+ msrs->initialized = true;
}

int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
@@ -429,6 +432,8 @@ int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
int err;

+ kvm_user_return_msr_init_cpu(msrs);
+
value = (value & mask) | (msrs->values[slot].host & ~mask);
if (value == msrs->values[slot].curr)
return 0;
@@ -9229,7 +9234,12 @@ int kvm_arch_init(void *opaque)
return -ENOMEM;
}

- user_return_msrs = alloc_percpu(struct kvm_user_return_msrs);
+ /*
+ * __GFP_ZERO to ensure user_return_msrs.initialized = false.
+ * See kvm_user_return_msr_init_cpu().
+ */
+ user_return_msrs = alloc_percpu_gfp(struct kvm_user_return_msrs,
+ GFP_KERNEL | __GFP_ZERO);
if (!user_return_msrs) {
printk(KERN_ERR "kvm: failed to allocate percpu kvm_user_return_msrs\n");
r = -ENOMEM;
@@ -11866,7 +11876,6 @@ int kvm_arch_hardware_enable(void)
u64 max_tsc = 0;
bool stable, backwards_tsc = false;

- kvm_user_return_msr_cpu_online();
ret = static_call(kvm_x86_hardware_enable)();
if (ret != 0)
return ret;
--
2.25.1

2022-09-22 19:22:19

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 22/30] KVM: x86: Move TSC fixup logic to KVM arch resume callback

From: Isaku Yamahata <[email protected]>

commit 0dd6a6edb012 ("KVM: Dont mark TSC unstable due to S4 suspend") made
use of kvm_arch_hardware_enable() callback to detect that TSC goes backward
due to S4 suspend. It has to check it only when resuming from S4. Not
every time virtualization hardware ennoblement. Move the logic to
kvm_arch_resume() callback.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b15eb59096b6..d49396bb6c2e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11865,18 +11865,26 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);

int kvm_arch_hardware_enable(void)
+{
+ return static_call(kvm_x86_hardware_enable)();
+}
+
+static int __hardware_enable(void);
+
+void kvm_arch_resume(int usage_count)
{
struct kvm *kvm;
struct kvm_vcpu *vcpu;
unsigned long i;
- int ret;
u64 local_tsc;
u64 max_tsc = 0;
bool stable, backwards_tsc = false;

- ret = static_call(kvm_x86_hardware_enable)();
- if (ret != 0)
- return ret;
+ if (!usage_count)
+ return;
+
+ if (__hardware_enable())
+ return;

local_tsc = rdtsc();
stable = !kvm_check_tsc_unstable();
@@ -11951,7 +11959,6 @@ int kvm_arch_hardware_enable(void)
}

}
- return 0;
}

void kvm_arch_hardware_disable(void)
@@ -12115,12 +12122,6 @@ int kvm_arch_suspend(int usage_count)
return 0;
}

-void kvm_arch_resume(int usage_count)
-{
- if (usage_count)
- (void)__hardware_enable();
-}
-
static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
{
memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
--
2.25.1

2022-09-22 19:23:05

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 16/30] KVM: Remove on_each_cpu(hardware_disable_nolock) in kvm_exit()

From: Isaku Yamahata <[email protected]>

hardware_enable/disable_nolock() check if the hardware is already
enabled/disabled and work as nop when they are called multiple times.

When VM is created/destroyed, on_each_cpu(hardware_enable/disable_nolock)
via kvm_arch_add/del_vm() and module_get/put() are called. It means when
kvm module is removed, it's guaranteed that there is no vm and that
hardware_disable_nolock() was called on each cpus.

Although the module exit function, kvm_exit(), calls
on_each_cpu(hardware_disable_nolock), it's essentially nop. Eliminate nop
call in kvm_exit().

Signed-off-by: Isaku Yamahata <[email protected]>
---
virt/kvm/kvm_main.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ad9b8b7d21fa..d7c3bc14691f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6034,7 +6034,6 @@ void kvm_exit(void)
unregister_syscore_ops(&kvm_syscore_ops);
unregister_reboot_notifier(&kvm_reboot_notifier);
cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
- on_each_cpu(hardware_disable_nolock, NULL, 1);
kvm_arch_hardware_unsetup();
kvm_arch_exit();
kvm_irqfd_exit();
--
2.25.1

2022-09-22 19:27:25

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 03/30] KVM: x86: Move check_processor_compatibility from init ops to runtime ops

From: Chao Gao <[email protected]>

so that KVM can do compatibility checks on hotplugged CPUs. Drop __init
from check_processor_compatibility() and its callees.

use a static_call() to invoke .check_processor_compatibility.

Opportunistically rename {svm,vmx}_check_processor_compat to conform
to the naming convention of fields of kvm_x86_ops.

Signed-off-by: Chao Gao <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Yuan Yao <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/svm.c | 4 ++--
arch/x86/kvm/vmx/vmx.c | 14 +++++++-------
arch/x86/kvm/x86.c | 3 +--
5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 51f777071584..3bc45932e2d1 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -129,6 +129,7 @@ KVM_X86_OP(msr_filter_changed)
KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP(check_processor_compatibility)

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2c96c43c313a..5df5d88d345f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1445,6 +1445,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
struct kvm_x86_ops {
const char *name;

+ int (*check_processor_compatibility)(void);
int (*hardware_enable)(void);
void (*hardware_disable)(void);
void (*hardware_unsetup)(void);
@@ -1655,7 +1656,6 @@ struct kvm_x86_nested_ops {
struct kvm_x86_init_ops {
int (*cpu_has_kvm_support)(void);
int (*disabled_by_bios)(void);
- int (*check_processor_compatibility)(void);
int (*hardware_setup)(void);
unsigned int (*handle_intel_pt_intr)(void);

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f3813dbacb9f..371300f03f55 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4134,7 +4134,7 @@ svm_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
hypercall[2] = 0xd9;
}

-static int __init svm_check_processor_compat(void)
+static int svm_check_processor_compatibility(void)
{
return 0;
}
@@ -4740,6 +4740,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.name = "kvm_amd",

.hardware_unsetup = svm_hardware_unsetup,
+ .check_processor_compatibility = svm_check_processor_compatibility,
.hardware_enable = svm_hardware_enable,
.hardware_disable = svm_hardware_disable,
.has_emulated_msr = svm_has_emulated_msr,
@@ -5122,7 +5123,6 @@ static struct kvm_x86_init_ops svm_init_ops __initdata = {
.cpu_has_kvm_support = has_svm,
.disabled_by_bios = is_disabled,
.hardware_setup = svm_hardware_setup,
- .check_processor_compatibility = svm_check_processor_compat,

.runtime_ops = &svm_x86_ops,
.pmu_ops = &amd_pmu_ops,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c9b49a09e6b5..4ad058dc9794 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2494,8 +2494,8 @@ static bool cpu_has_sgx(void)
return cpuid_eax(0) >= 0x12 && (cpuid_eax(0x12) & BIT(0));
}

-static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
- u32 msr, u32 *result)
+static int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
+ u32 msr, u32 *result)
{
u32 vmx_msr_low, vmx_msr_high;
u32 ctl = ctl_min | ctl_opt;
@@ -2513,7 +2513,7 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
return 0;
}

-static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
+static u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
{
u64 allowed;

@@ -2522,8 +2522,8 @@ static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
return ctl_opt & allowed;
}

-static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
- struct vmx_capability *vmx_cap)
+static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
+ struct vmx_capability *vmx_cap)
{
u32 vmx_msr_low, vmx_msr_high;
u32 min, opt, min2, opt2;
@@ -7416,7 +7416,7 @@ static int vmx_vm_init(struct kvm *kvm)
return 0;
}

-static int __init vmx_check_processor_compat(void)
+static int vmx_check_processor_compatibility(void)
{
struct vmcs_config vmcs_conf;
struct vmx_capability vmx_cap;
@@ -8014,6 +8014,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {

.hardware_unsetup = vmx_hardware_unsetup,

+ .check_processor_compatibility = vmx_check_processor_compatibility,
.hardware_enable = vmx_hardware_enable,
.hardware_disable = vmx_hardware_disable,
.has_emulated_msr = vmx_has_emulated_msr,
@@ -8403,7 +8404,6 @@ static __init int hardware_setup(void)
static struct kvm_x86_init_ops vmx_init_ops __initdata = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
- .check_processor_compatibility = vmx_check_processor_compat,
.hardware_setup = hardware_setup,
.handle_intel_pt_intr = NULL,

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7fc19533a484..34c7336ac581 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12031,7 +12031,6 @@ void kvm_arch_hardware_unsetup(void)
int kvm_arch_check_processor_compat(void *opaque)
{
struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
- struct kvm_x86_init_ops *ops = opaque;

WARN_ON(!irqs_disabled());

@@ -12039,7 +12038,7 @@ int kvm_arch_check_processor_compat(void *opaque)
__cr4_reserved_bits(cpu_has, &boot_cpu_data))
return -EIO;

- return ops->check_processor_compatibility();
+ return static_call(kvm_x86_check_processor_compatibility)();
}

bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
--
2.25.1

2022-09-22 19:29:01

by Isaku Yamahata

[permalink] [raw]
Subject: [PATCH v5 12/30] KVM: Add arch hook for suspend

From: Isaku Yamahata <[email protected]>

Factor out the logic on suspend event as arch callback. Later kvm/x86 will
override it.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 21 ++++++++++++---------
2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 084ee8a13e9f..861aad8812ff 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1435,6 +1435,7 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
#endif

int kvm_arch_reboot(int val);
+int kvm_arch_suspend(int usage_count);

int kvm_arch_hardware_enable(void);
void kvm_arch_hardware_disable(void);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 58385000b73f..0ebe43a695e5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1148,6 +1148,17 @@ int __weak kvm_arch_reboot(int val)
return NOTIFY_OK;
}

+int __weak kvm_arch_suspend(int usage_count)
+{
+ if (usage_count)
+ /*
+ * Because kvm_suspend() is called with interrupt disabled, no
+ * need to disable preemption.
+ */
+ hardware_disable_nolock(NULL);
+ return 0;
+}
+
/*
* Called just after removing the VM from the vm_list, but before doing any
* other destruction.
@@ -5753,15 +5764,7 @@ static int kvm_suspend(void)
* locking.
*/
lockdep_assert_not_held(&kvm_lock);
-
- if (kvm_usage_count) {
- /*
- * Because kvm_suspend() is called with interrupt disabled, no
- * need to disable preemption.
- */
- hardware_disable_nolock(NULL);
- }
- return 0;
+ return kvm_arch_suspend(kvm_usage_count);
}

static void kvm_resume(void)
--
2.25.1

2022-09-23 07:10:36

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v5 27/30] RFC: KVM: powerpc: Move processor compatibility check to hardware setup

[email protected] writes:
> From: Isaku Yamahata <[email protected]>
>
> Move processor compatibility check from kvm_arch_processor_compat() into
^
kvm_arch_check_processor_compat()

> kvm_arch_hardware_setup(). The check does model name comparison with a
> global variable, cur_cpu_spec. There is no point to check it at run time
> on all processors.

A key detail I had to look up is that both kvm_arch_hardware_setup() and
kvm_arch_check_processor_compat() are called from kvm_init(), one after
the other. But the latter is called on each CPU.

And because the powerpc implementation of kvm_arch_check_processor_compat()
just checks a global, there's no need to call it on every CPU.

> kvmppc_core_check_processor_compat() checks the global variable. There are
> five implementation for it as follows.

There are three implementations not five.

> arch/powerpc/include/asm/cputable.h: extern struct cpu_spec *cur_cpu_spec;
> arch/powerpc/kvm/book3s.c: return 0
> arch/powerpc/kvm/e500.c: strcmp(cur_cpu_spec->cpu_name, "e500v2")
> arch/powerpc/kvm/e500mc.c: strcmp(cur_cpu_spec->cpu_name, "e500mc")
> strcmp(cur_cpu_spec->cpu_name, "e5500")
> strcmp(cur_cpu_spec->cpu_name, "e6500")
>
> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Isaku Yamahata <[email protected]>
> Cc: [email protected]
> Cc: Fabiano Rosas <[email protected]>
> ---
> arch/powerpc/kvm/powerpc.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 7b56d6ccfdfb..31dc4f231e9d 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -444,12 +444,12 @@ int kvm_arch_hardware_enable(void)
>
> int kvm_arch_hardware_setup(void *opaque)
> {
> - return 0;
> + return kvmppc_core_check_processor_compat();
> }
>
> int kvm_arch_check_processor_compat(void)
> {
> - return kvmppc_core_check_processor_compat();
> + return 0;
> }

The actual change seems OK. I gave it a quick test boot and ran some
VMs, everything seems to work as before.

Acked-by: Michael Ellerman <[email protected]> (powerpc)

cheers

2022-09-27 00:51:42

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [PATCH v5 27/30] RFC: KVM: powerpc: Move processor compatibility check to hardware setup

On Fri, Sep 23, 2022 at 04:58:41PM +1000,
Michael Ellerman <[email protected]> wrote:

> [email protected] writes:
> > From: Isaku Yamahata <[email protected]>
> >
> > Move processor compatibility check from kvm_arch_processor_compat() into
> ^
> kvm_arch_check_processor_compat()
>
> > kvm_arch_hardware_setup(). The check does model name comparison with a
> > global variable, cur_cpu_spec. There is no point to check it at run time
> > on all processors.
>
> A key detail I had to look up is that both kvm_arch_hardware_setup() and
> kvm_arch_check_processor_compat() are called from kvm_init(), one after
> the other. But the latter is called on each CPU.
>
> And because the powerpc implementation of kvm_arch_check_processor_compat()
> just checks a global, there's no need to call it on every CPU.
>
> > kvmppc_core_check_processor_compat() checks the global variable. There are
> > five implementation for it as follows.
>
> There are three implementations not five.

Thanks. I'll update the commit message.

> > arch/powerpc/include/asm/cputable.h: extern struct cpu_spec *cur_cpu_spec;
> > arch/powerpc/kvm/book3s.c: return 0
> > arch/powerpc/kvm/e500.c: strcmp(cur_cpu_spec->cpu_name, "e500v2")
> > arch/powerpc/kvm/e500mc.c: strcmp(cur_cpu_spec->cpu_name, "e500mc")
> > strcmp(cur_cpu_spec->cpu_name, "e5500")
> > strcmp(cur_cpu_spec->cpu_name, "e6500")
> >
> > Suggested-by: Sean Christopherson <[email protected]>
> > Signed-off-by: Isaku Yamahata <[email protected]>
> > Cc: [email protected]
> > Cc: Fabiano Rosas <[email protected]>
> > ---
> > arch/powerpc/kvm/powerpc.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> > index 7b56d6ccfdfb..31dc4f231e9d 100644
> > --- a/arch/powerpc/kvm/powerpc.c
> > +++ b/arch/powerpc/kvm/powerpc.c
> > @@ -444,12 +444,12 @@ int kvm_arch_hardware_enable(void)
> >
> > int kvm_arch_hardware_setup(void *opaque)
> > {
> > - return 0;
> > + return kvmppc_core_check_processor_compat();
> > }
> >
> > int kvm_arch_check_processor_compat(void)
> > {
> > - return kvmppc_core_check_processor_compat();
> > + return 0;
> > }
>
> The actual change seems OK. I gave it a quick test boot and ran some
> VMs, everything seems to work as before.
>
> Acked-by: Michael Ellerman <[email protected]> (powerpc)

Thanks so much for testing. I'll remove RFC.
--
Isaku Yamahata <[email protected]>

2022-10-04 01:14:33

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [PATCH v5 10/30] KVM: Add arch hooks when VM is added/deleted

On Thu, Sep 22, 2022 at 11:20:39AM -0700,
[email protected] wrote:

> From: Isaku Yamahata <[email protected]>
>
> and pass kvm_usage_count with kvm_lock. Move kvm_arch_post_init_vm() under
> kvm_arch_add_vm(). Replace enable/disable_hardware_all() with the default
> implementation of kvm_arch_add/del_vm(). Later kvm_arch_post_init_vm() is
> deleted once x86 overrides kvm_arch_add_vm().
>
> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> include/linux/kvm_host.h | 2 +
> virt/kvm/kvm_main.c | 121 ++++++++++++++++++++-------------------
> 2 files changed, 65 insertions(+), 58 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index eab352902de7..3fbb01bbac98 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1445,6 +1445,8 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
> bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
> bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
> int kvm_arch_post_init_vm(struct kvm *kvm);
> +int kvm_arch_add_vm(struct kvm *kvm, int usage_count);
> +int kvm_arch_del_vm(int usage_count);
> void kvm_arch_pre_destroy_vm(struct kvm *kvm);
> int kvm_arch_create_vm_debugfs(struct kvm *kvm);
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index c4b908553726..e2c8823786ff 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -142,8 +142,9 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
> #define KVM_COMPAT(c) .compat_ioctl = kvm_no_compat_ioctl, \
> .open = kvm_no_compat_open
> #endif
> -static int hardware_enable_all(void);
> -static void hardware_disable_all(void);
> +static void hardware_enable_nolock(void *junk);
> +static void hardware_disable_nolock(void *junk);
> +static void kvm_del_vm(void);
>
> static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>
> @@ -1106,6 +1107,41 @@ int __weak kvm_arch_post_init_vm(struct kvm *kvm)
> return 0;
> }
>
> +/*
> + * Called after the VM is otherwise initialized, but just before adding it to
> + * the vm_list.
> + */
> +int __weak kvm_arch_add_vm(struct kvm *kvm, int usage_count)
> +{
> + int r = 0;
> +
> + if (usage_count != 1)
> + return 0;

Oops. This line should be.
+ return kvm_arch_post_init_vm(kvm);

"KVM: x86: Duplicate arch callbacks related to pm events and compat check"
should have same line. "KVM: Eliminate kvm_arch_post_init_vm()" should include
kvm_mmu_post_init_vm().
It creates a kernel thread. kvm unit test didn't catch it.

Thanks,

> +
> + atomic_set(&hardware_enable_failed, 0);
> + on_each_cpu(hardware_enable_nolock, NULL, 1);
> +
> + if (atomic_read(&hardware_enable_failed)) {
> + r = -EBUSY;
> + goto err;
> + }
> +
> + r = kvm_arch_post_init_vm(kvm);
> +err:
> + if (r)
> + on_each_cpu(hardware_disable_nolock, NULL, 1);
> + return r;
> +}
> +
> +int __weak kvm_arch_del_vm(int usage_count)
> +{
> + if (usage_count)
> + return 0;
> +
> + on_each_cpu(hardware_disable_nolock, NULL, 1);
> + return 0;
> +}
> +
> /*
> * Called just after removing the VM from the vm_list, but before doing any
> * other destruction.
--
Isaku Yamahata <[email protected]>

2022-10-11 20:04:12

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 03/30] KVM: x86: Move check_processor_compatibility from init ops to runtime ops

On Thu, Sep 22, 2022, [email protected] wrote:
> From: Chao Gao <[email protected]>
>
> so that KVM can do compatibility checks on hotplugged CPUs. Drop __init
> from check_processor_compatibility() and its callees.
>
> use a static_call() to invoke .check_processor_compatibility.
>
> Opportunistically rename {svm,vmx}_check_processor_compat to conform
> to the naming convention of fields of kvm_x86_ops.
>
> Signed-off-by: Chao Gao <[email protected]>
> Reviewed-by: Sean Christopherson <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Isaku Yamahata <[email protected]>
> Reviewed-by: Yuan Yao <[email protected]>
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/svm/svm.c | 4 ++--
> arch/x86/kvm/vmx/vmx.c | 14 +++++++-------
> arch/x86/kvm/x86.c | 3 +--
> 5 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 51f777071584..3bc45932e2d1 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -129,6 +129,7 @@ KVM_X86_OP(msr_filter_changed)
> KVM_X86_OP(complete_emulated_msr)
> KVM_X86_OP(vcpu_deliver_sipi_vector)
> KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> +KVM_X86_OP(check_processor_compatibility)

Nit I missed in all previous versions: please add this new entry above "hardware_enable"
so as to keep the order similar to the order in struct kvm_x86_ops. E.g.

KVM_X86_OP(check_processor_compatibility)
KVM_X86_OP(hardware_enable)
...

2022-10-12 20:01:20

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 05/30] KVM: Provide more information in kernel log if hardware enabling fails

On Thu, Sep 22, 2022, [email protected] wrote:
> From: Isaku Yamahata <[email protected]>
>
> Provide the name of the calling function to hardware_enable_nolock() and
> include it in the error message to provide additional information on
> exactly what path failed.

Changelog doesn't match the code.

> Opportunistically bump the pr_info() to pr_warn(), failure to enable
> virtualization support is warn-worthy as _something_ is wrong with the
> system.
>
> Suggested-by: Chao Gao <[email protected]>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> virt/kvm/kvm_main.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4243a9541543..4f19c47aab1c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5006,7 +5006,8 @@ static void hardware_enable_nolock(void *junk)
> if (r) {
> cpumask_clear_cpu(cpu, cpus_hardware_enabled);
> atomic_inc(&hardware_enable_failed);
> - pr_info("kvm: enabling virtualization on CPU%d failed\n", cpu);
> + pr_warn("kvm: enabling virtualization on CPU%d failed during %pSb\n",
> + cpu, __builtin_return_address(0));

I vote to drop this patch. Trying to capture the caller is just a poor man's
version of WARN, and at the end of this series KVM should be at a point where KVM
can WARN when hardware enabling indicates a potentially fatal issue.

Specifically, kvm_arch_add_vm() shouldn't WARN since x86 can fail due a misbehaving
userspace. kvm_arch_online_cpu() on the other hand can and should WARN since
failure in that case means hardware enabling succeeded on other CPUs, and in the x86
case, that KVM is actively running VMs.

2022-10-12 20:57:29

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 11/30] KVM: Add arch hook for reboot event

On Thu, Sep 22, 2022, [email protected] wrote:
> From: Isaku Yamahata <[email protected]>
>
> Factor out the logic on reboot event as arch hook. Later kvm/x86 overrides
> it.
>
> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---

...

> @@ -5135,6 +5141,8 @@ static void kvm_del_vm(void)
> static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
> void *v)
> {
> + int r;
> +
> /*
> * Some (well, at least mine) BIOSes hang on reboot if
> * in vmx root mode.
> @@ -5143,8 +5151,14 @@ static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
> */
> pr_info("kvm: exiting hardware virtualization\n");
> kvm_rebooting = true;
> - on_each_cpu(hardware_disable_nolock, NULL, 1);
> - return NOTIFY_OK;
> +
> + /* This hook is called without cpuhotplug disabled. */
> + cpus_read_lock();
> + mutex_lock(&kvm_lock);
> + r = kvm_arch_reboot(val);

Unless there's a valid use case for rejecting/stopping reboot, which I'm pretty
sure there isn't, don't allow arch code to return a value. I.e. return NOTIFY_OK
unconditionally from kvm_reboot() and drop the return from kvm_arch_reboot().

> + mutex_unlock(&kvm_lock);
> + cpus_read_unlock();
> + return r;
> }
>
> static struct notifier_block kvm_reboot_notifier = {
> --
> 2.25.1
>

2022-10-12 20:58:34

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 16/30] KVM: Remove on_each_cpu(hardware_disable_nolock) in kvm_exit()

On Thu, Sep 22, 2022, [email protected] wrote:
> From: Isaku Yamahata <[email protected]>
>
> hardware_enable/disable_nolock() check if the hardware is already
> enabled/disabled and work as nop when they are called multiple times.
>
> When VM is created/destroyed, on_each_cpu(hardware_enable/disable_nolock)
> via kvm_arch_add/del_vm() and module_get/put() are called. It means when
> kvm module is removed, it's guaranteed that there is no vm and that
> hardware_disable_nolock() was called on each cpus.
>
> Although the module exit function, kvm_exit(), calls
> on_each_cpu(hardware_disable_nolock), it's essentially nop. Eliminate nop
> call in kvm_exit().

Add a WARN to "prove" that this is a nop, to guard against future bugs, and to
document that it should be impossible for KVM to be unloaded with active VMs.
E.g. do this in addition to dropping the hardware disabling.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1c1a2b0630bc..ca2251d02c77 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5855,6 +5855,8 @@ void kvm_exit(void)
{
int cpu;

+ WARN_ON_ONCE(kvm_usage_count);
+
debugfs_remove_recursive(kvm_debugfs_dir);
misc_deregister(&kvm_dev);
for_each_possible_cpu(cpu)

>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> virt/kvm/kvm_main.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ad9b8b7d21fa..d7c3bc14691f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -6034,7 +6034,6 @@ void kvm_exit(void)
> unregister_syscore_ops(&kvm_syscore_ops);
> unregister_reboot_notifier(&kvm_reboot_notifier);
> cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
> - on_each_cpu(hardware_disable_nolock, NULL, 1);
> kvm_arch_hardware_unsetup();
> kvm_arch_exit();
> kvm_irqfd_exit();


2022-10-12 20:59:38

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 09/30] KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock

On Thu, Sep 22, 2022, [email protected] wrote:
> From: Isaku Yamahata <[email protected]>
>
> Because kvm_count_lock unnecessarily complicates the KVM locking convention
> Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock for
> simplicity. kvm_arch_hardware_enable/disable() callbacks depend on
> non-preemptiblity with the spin lock. Add preempt_disable/enable()
> around hardware enable/disable callback to keep the assumption.

There's the other "minor" wrinkle that prior to patch 7, "KVM: Rename and move
CPUHP_AP_KVM_STARTING to ONLINE section, kvm_online_cpu() was called with IRQs
disabled and couldn't sleep, i.e. couldn't acquire a mutex. That's very important
to capture in the changelog.

> Because kvm_suspend() and kvm_resume() is called with interrupt disabled,
> they don't need preempt_disable/enable() pair.
>
> Opportunistically add some comments on locking.
>
> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Isaku Yamahata <[email protected]>

...

> @@ -5028,13 +5029,20 @@ static int kvm_online_cpu(unsigned int cpu)
> if (kvm_usage_count) {
> WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
>
> + /*
> + * arch callback kvm_arch_hardware_eanble() assumes that

s/eanble/enable

Though even better would be to avoid function names entirely.

> + * preemption is disabled for historical reason. Disable
> + * preemption until all arch callbacks are fixed.
> + */

Probably better to put this comment above to the WARN_ON_ONCE() in hardware_enable_nolock()
since that's where the oddity and dependency on arch behavior lies. And then it
can be turned into a FIXME, e.g.

/*
* FIXME: drop the "preemption disabled" requirement here and in the
* disable path once all arch code plays nice with preemption.
*/

> + preempt_disable();
> hardware_enable_nolock(NULL);
> + preempt_enable();
> if (atomic_read(&hardware_enable_failed)) {
> atomic_set(&hardware_enable_failed, 0);
> ret = -EIO;
> }
> }
> - raw_spin_unlock(&kvm_count_lock);
> + mutex_unlock(&kvm_lock);
> return ret;
> }
>
> @@ -5042,6 +5050,8 @@ static void hardware_disable_nolock(void *junk)
> {
> int cpu = raw_smp_processor_id();
>
> + WARN_ON_ONCE(preemptible());
> +
> if (!cpumask_test_cpu(cpu, cpus_hardware_enabled))
> return;
> cpumask_clear_cpu(cpu, cpus_hardware_enabled);
> @@ -5050,10 +5060,18 @@ static void hardware_disable_nolock(void *junk)
>
> static int kvm_offline_cpu(unsigned int cpu)
> {
> - raw_spin_lock(&kvm_count_lock);
> - if (kvm_usage_count)
> + mutex_lock(&kvm_lock);
> + if (kvm_usage_count) {
> + /*
> + * arch callback kvm_arch_hardware_disable() assumes that
> + * preemption is disabled for historical reason. Disable
> + * preemption until all arch callbacks are fixed.
> + */

I vote to drop this comment and instead document everything in the enable FIXME
(see above).

> + preempt_disable();
> hardware_disable_nolock(NULL);
> - raw_spin_unlock(&kvm_count_lock);
> + preempt_enable();
> + }
> + mutex_unlock(&kvm_lock);
> return 0;
> }

...

> @@ -5708,15 +5728,27 @@ static void kvm_init_debug(void)
>
> static int kvm_suspend(void)
> {
> - if (kvm_usage_count)
> + /*
> + * The caller ensures that CPU hotplug is disabled by
> + * cpu_hotplug_disable() and other CPUs are offlined. No need for
> + * locking.

Disabling CPU hotplug prevents racing with kvm_online_cpu()/kvm_offline_cpu(), but
doesn't prevent racing with hardware_enable_all()/hardware_disable_all().

And the lockdep doesn't mesh with the comment, which explains why kvm_lock doesn't
_need_ to be held, but not why kvm_lock _can't_ be held.

Maybe this?

/*
* Secondary CPUs and CPU hotplug are disabled across the suspend/resume
* callbacks, i.e. no need to acquire kvm_lock to ensure the usage count
* is stable. Assert that kvm_lock is not held as a paranoid sanity
* check that the system isn't suspended when KVM is enabling hardware.
*/

> + */
> + lockdep_assert_not_held(&kvm_lock);
> +
> + if (kvm_usage_count) {
> + /*
> + * Because kvm_suspend() is called with interrupt disabled, no
> + * need to disable preemption.
> + */

Add a lockdep and drop the comment, e.g. below the lockdep_assert_not_held(), add

lockdep_assert_irqs_disabled();

That covers the "why doesn't this disable preemption" _and_ enforces that IRQs are
indeed disabled.

> hardware_disable_nolock(NULL);
> + }
> return 0;
> }
>
> static void kvm_resume(void)
> {
> if (kvm_usage_count) {
> - lockdep_assert_not_held(&kvm_count_lock);
> + lockdep_assert_not_held(&kvm_lock);
> hardware_enable_nolock(NULL);
> }
> }
> --
> 2.25.1
>

2022-10-12 20:59:51

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 10/30] KVM: Add arch hooks when VM is added/deleted

On Thu, Sep 22, 2022, [email protected] wrote:
> From: Isaku Yamahata <[email protected]>
>
> and pass kvm_usage_count with kvm_lock. Move kvm_arch_post_init_vm() under
> kvm_arch_add_vm(). Replace enable/disable_hardware_all() with the default
> implementation of kvm_arch_add/del_vm(). Later kvm_arch_post_init_vm() is
> deleted once x86 overrides kvm_arch_add_vm().

This needs to explain _why_ KVM is pivoting to add/remove hooks.

> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Isaku Yamahata <[email protected]>
> ---
> include/linux/kvm_host.h | 2 +
> virt/kvm/kvm_main.c | 121 ++++++++++++++++++++-------------------
> 2 files changed, 65 insertions(+), 58 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index eab352902de7..3fbb01bbac98 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1445,6 +1445,8 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
> bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
> bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
> int kvm_arch_post_init_vm(struct kvm *kvm);
> +int kvm_arch_add_vm(struct kvm *kvm, int usage_count);
> +int kvm_arch_del_vm(int usage_count);
> void kvm_arch_pre_destroy_vm(struct kvm *kvm);
> int kvm_arch_create_vm_debugfs(struct kvm *kvm);
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index c4b908553726..e2c8823786ff 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -142,8 +142,9 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
> #define KVM_COMPAT(c) .compat_ioctl = kvm_no_compat_ioctl, \
> .open = kvm_no_compat_open
> #endif
> -static int hardware_enable_all(void);
> -static void hardware_disable_all(void);
> +static void hardware_enable_nolock(void *junk);
> +static void hardware_disable_nolock(void *junk);
> +static void kvm_del_vm(void);

I think kvm_remove_vm() will be less confusing as "remove" is almost never used
to describe freeing something, whereas "delete" is somewhat interchangeable with
"free. I.e. make it more obvious that the hook isn't intented to actually
delete/free a VM, rather it's there to remove/delete a VM from global tracking.

Ah, and this is especially true since the VM needs to be deleted from vm_list
before the is destroyed, but hardware needs to stay enabled until the VM is fully
destroyed.

Hmm, actually, I think even better would be to have kvm_remove_vm() and kvm_drop_vm()
to make it more obvious that there isn't 100% symmetry between "add" and "remove".

E.g. rename kvm_arch_pre_destroy_vm() => kvm_arch_remove_vm() and then we end up
with (see comments below for more details):

static int kvm_add_vm(struct kvm *kvm)
{
/*
* During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
* is called, i.e. on_each_cpu() includes CPUs that have not yet been
* onlined by KVM. Disable CPU hotplug to prevent enabling hardware on
* a CPU that hasn't yet done compatibility checks.
*/
cpus_read_lock();
mutex_lock(&kvm_lock);
r = kvm_arch_add_vm(kvm, ++kvm_usage_count);
if (r) {
--kvm_usage_count;
goto out;
}

list_add(&kvm->vm_list, &vm_list);

out:
mutex_unlock(&kvm_lock);
cpus_read_unlock();
}

static void kvm_remove_vm(struct kvm *kvm)
{
mutex_lock(&kvm_lock);
list_del(&kvm->vm_list);
mutex_unlock(&kvm_lock);
kvm_arch_remove_vm(kvm);
}

static void kvm_drop_vm(void)
{
cpus_read_lock();
mutex_lock(&kvm_lock);
WARN_ON_ONCE(!kvm_usage_count);
kvm_usage_count--;
kvm_arch_drop_vm(kvm_usage_count);
mutex_unlock(&kvm_lock);
cpus_read_unlock();
}

> @@ -1223,13 +1255,28 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> if (r)
> goto out_err_no_debugfs;
>
> - r = kvm_arch_post_init_vm(kvm);
> - if (r)
> - goto out_err;
> -
> + /*
> + * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
> + * is called. on_each_cpu() between them includes the CPU. As a result,
> + * hardware_enable_nolock() may get invoked before kvm_online_cpu().
> + * This would enable hardware virtualization on that cpu without
> + * compatibility checks, which can potentially crash system or break
> + * running VMs.
> + *
> + * Disable CPU hotplug to prevent this case from happening.
> + */
> + cpus_read_lock();
> mutex_lock(&kvm_lock);
> + kvm_usage_count++;
> + r = kvm_arch_add_vm(kvm, kvm_usage_count);
> + if (r) {
> + /* the following kvm_del_vm() decrements kvm_usage_count. */

This is buggy on two fronts. kvm_usage_count needs to be protected with kvm_lock,
and AFAICT cpus_read_unlock() isn't called.

Invoking kvm_del_vm() in the error path is the source of both bugs. Typically,
the paired "undo" of an operation should only be used if the "do" operation was
fully successful.

As above, move this to a helper to make juggling the locks less painful.

2022-10-12 22:03:02

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 17/30] KVM: Move out KVM arch PM hooks and hardware enable/disable logic

On Thu, Sep 22, 2022, [email protected] wrote:
> From: Isaku Yamahata <[email protected]>
>
> To make clear that those files are default implementation

Heh, this is debatable, I was quite confused/surprised by seeing kvm_arch.c. It's
also incomplete, which is further confusing, since there are a lot of default hooks
that are left behind. And we most definitely don't want to move all of them, since
many of those hooks are pure nops, i.e. get completely compiled out.

Given that we want this code to go away sooner than later, I don't think adding
a temporarily to-be-deleted file makes sense.

2022-10-13 01:40:14

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 00/30] KVM: hardware enable/disable reorganize

On Thu, Sep 22, 2022, [email protected] wrote:
> From: Isaku Yamahata <[email protected]>
>
> This patch series is to implement the suggestion by Sean Christopherson [1]
> to reorganize enable/disable cpu virtualization feature by replacing
> the arch-generic current enable/disable logic with PM related hooks. And
> convert kvm/x86 to use new hooks.

Thanks for putting this together, actually seeing code is super helpful!

Unfortunately, after seeing the code, I think my suggestion was a bad one. At
the end of this series, there's a rather gross amount of duplicate code between
x86 and common KVM, and no clear line of sight to improving things.

Even if we move ARM, s390, and PPC away from the generic hooks, MIPS and RISC-V
still need the generic implementation, i.e. we'll still have duplicate code.

Rather than force arch code to implement most/all power management hooks, I think
we can achieve a similar outcome (let ARM do its own thing, turn s390 and PPC into
nops) by wrapping the hardware enable/disable (and thus PM code) in a Kconfig,
e.g. KVM_GENERIC_HARDWARE_ENABLING.

I'll throw together a rough prototype tomorrow (got partway through and then got
distracted by other stuff) and hopefully post an RFC series.

Thanks again!

2022-10-14 04:22:35

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 00/30] KVM: hardware enable/disable reorganize

On Thu, Oct 13, 2022, Sean Christopherson wrote:
> On Thu, Sep 22, 2022, [email protected] wrote:
> > From: Isaku Yamahata <[email protected]>
> >
> > This patch series is to implement the suggestion by Sean Christopherson [1]
> > to reorganize enable/disable cpu virtualization feature by replacing
> > the arch-generic current enable/disable logic with PM related hooks. And
> > convert kvm/x86 to use new hooks.
>
> Thanks for putting this together, actually seeing code is super helpful!
>
> Unfortunately, after seeing the code, I think my suggestion was a bad one. At
> the end of this series, there's a rather gross amount of duplicate code between
> x86 and common KVM, and no clear line of sight to improving things.
>
> Even if we move ARM, s390, and PPC away from the generic hooks, MIPS and RISC-V
> still need the generic implementation, i.e. we'll still have duplicate code.
>
> Rather than force arch code to implement most/all power management hooks, I think
> we can achieve a similar outcome (let ARM do its own thing, turn s390 and PPC into
> nops) by wrapping the hardware enable/disable (and thus PM code) in a Kconfig,
> e.g. KVM_GENERIC_HARDWARE_ENABLING.
>
> I'll throw together a rough prototype tomorrow (got partway through and then got
> distracted by other stuff) and hopefully post an RFC series.

Good news and bad news.

Bad news:
The KVM_GENERIC_HARDWARE_ENABLING idea is a bit of a bust. It works, but it's
little more than a nice-to-have for s390 and PPC.

Good news (from a certain point of view):
The reason that the "generic h/w enabling" doesn't help much is because after sorting
out the myriad issues in KVM initialization, which is even more of a bug-ridden mess
than I initially thought, kvm_init() no longer has _any_ arch hooks. All the compat
checks and whatnot are handled in x86, so tweaking those for TDX should be easier,
or at the very least, we should have more options.

Sorting everything out is proving to be a nightmare, but I think I have the initial
coding done. Testing will be fun. It'll likely be next week before I can post
anything.

2022-11-02 18:21:41

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5 00/30] KVM: hardware enable/disable reorganize

On 10/14/22 06:04, Sean Christopherson wrote:
> Good news (from a certain point of view):
> The reason that the "generic h/w enabling" doesn't help much is because after sorting
> out the myriad issues in KVM initialization, which is even more of a bug-ridden mess
> than I initially thought, kvm_init() no longer has_any_ arch hooks. All the compat
> checks and whatnot are handled in x86, so tweaking those for TDX should be easier,
> or at the very least, we should have more options.
>
> Sorting everything out is proving to be a nightmare, but I think I have the initial
> coding done. Testing will be fun. It'll likely be next week before I can post
> anything.

Hi Sean,

is this the same series that you mentioned a couple hours ago in reply
to Isaku?

Thanks,

Paolo


2022-11-02 19:12:16

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 00/30] KVM: hardware enable/disable reorganize

On Wed, Nov 02, 2022, Paolo Bonzini wrote:
> On 10/14/22 06:04, Sean Christopherson wrote:
> > Good news (from a certain point of view):
> > The reason that the "generic h/w enabling" doesn't help much is because after sorting
> > out the myriad issues in KVM initialization, which is even more of a bug-ridden mess
> > than I initially thought, kvm_init() no longer has_any_ arch hooks. All the compat
> > checks and whatnot are handled in x86, so tweaking those for TDX should be easier,
> > or at the very least, we should have more options.
> >
> > Sorting everything out is proving to be a nightmare, but I think I have the initial
> > coding done. Testing will be fun. It'll likely be next week before I can post
> > anything.
>
> Hi Sean,
>
> is this the same series that you mentioned a couple hours ago in reply to
> Isaku?

Yep.