2022-11-02 23:39:56

by Sean Christopherson

Subject: [PATCH 00/44] KVM: Rework kvm_init() and hardware enabling

Non-x86 folks, please test on hardware when possible. I made a _lot_ of
mistakes when moving code around. Thankfully, x86 was the trickiest code
to deal with, and I'm fairly confident that testing caught all the bugs I
introduced there. But the number of mistakes I made and found on
x86 makes me more than a bit worried that I screwed something up in other
arch code.

This is a continuation of Chao's series to do x86 CPU compatibility checks
during virtualization hardware enabling[1], and of Isaku's series to try
and clean up the hardware enabling paths so that x86 (Intel specifically)
can temporarily enable hardware during module initialization without
causing undue pain for other architectures[2]. It also includes one patch
from another mini-series from Isaku that provides the less controversial
patches[3].

The main theme of this series is to kill off kvm_arch_init(),
kvm_arch_hardware_(un)setup(), and kvm_arch_check_processor_compat(), which
all originated in x86 code from way back when, and needlessly complicate
both common KVM code and architecture code. E.g. many architectures don't
mark functions/data as __init/__ro_after_init purely because kvm_init()
isn't marked __init to support x86's separate vendor modules.

The idea/hope is that with those hooks gone (moved to arch code), it will
be easier for x86 (and other architectures) to modify their module init
sequences as needed without having to fight common KVM code. E.g. I'm
hoping that ARM can build on this to simplify its hardware enabling logic,
especially the pKVM side of things.

There are bug fixes throughout this series. They are more scattered than
I would usually prefer, but getting the sequencing correct was a gigantic
pain for many of the x86 fixes due to needing to fix common code in order
for the x86 fix to have any meaning. And while the bugs are often fatal,
they aren't all that interesting for most users as they either require a
malicious admin or broken hardware, i.e. aren't likely to be encountered
by the vast majority of KVM users. So unless someone _really_ wants a
particular fix isolated for backporting, I'm not planning on shuffling
patches.

Tested on x86. Lightly tested on arm64. Compile tested only on all other
architectures.

[1] https://lore.kernel.org/all/[email protected]
[2] https://lore.kernel.org/all/[email protected]
[3] https://lore.kernel.org/all/[email protected]

Chao Gao (3):
KVM: x86: Do compatibility checks when onlining CPU
KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section
KVM: Disable CPU hotplug during hardware enabling

Isaku Yamahata (3):
KVM: Drop kvm_count_lock and instead protect kvm_usage_count with
kvm_lock
KVM: Remove on_each_cpu(hardware_disable_nolock) in kvm_exit()
KVM: Make hardware_enable_failed a local variable in the "enable all"
path

Marc Zyngier (1):
KVM: arm64: Simplify the CPUHP logic

Sean Christopherson (37):
KVM: Register /dev/kvm as the _very_ last thing during initialization
KVM: Initialize IRQ FD after arch hardware setup
KVM: Allocate cpus_hardware_enabled after arch hardware setup
KVM: Teardown VFIO ops earlier in kvm_exit()
KVM: s390: Unwind kvm_arch_init() piece-by-piece() if a step fails
KVM: s390: Move hardware setup/unsetup to init/exit
KVM: x86: Do timer initialization after XCR0 configuration
KVM: x86: Move hardware setup/unsetup to init/exit
KVM: Drop arch hardware (un)setup hooks
KVM: VMX: Clean up eVMCS enabling if KVM initialization fails
KVM: x86: Move guts of kvm_arch_init() to standalone helper
KVM: VMX: Do _all_ initialization before exposing /dev/kvm to
userspace
KVM: x86: Serialize vendor module initialization (hardware setup)
KVM: arm64: Free hypervisor allocations if vector slot init fails
KVM: arm64: Unregister perf callbacks if hypervisor finalization fails
KVM: arm64: Do arm/arch initialization without bouncing through
kvm_init()
KVM: arm64: Mark kvm_arm_init() and its unique descendants as __init
KVM: MIPS: Hardcode callbacks to hardware virtualization extensions
KVM: MIPS: Setup VZ emulation? directly from kvm_mips_init()
KVM: MIPS: Register die notifier prior to kvm_init()
KVM: RISC-V: Do arch init directly in riscv_kvm_init()
KVM: RISC-V: Tag init functions and data with __init, __ro_after_init
KVM: PPC: Move processor compatibility check to module init
KVM: s390: Do s390 specific init without bouncing through kvm_init()
KVM: s390: Mark __kvm_s390_init() and its descendants as __init
KVM: Drop kvm_arch_{init,exit}() hooks
KVM: VMX: Make VMCS configuration/capabilities structs read-only after
init
KVM: x86: Do CPU compatibility checks in x86 code
KVM: Drop kvm_arch_check_processor_compat() hook
KVM: x86: Use KBUILD_MODNAME to specify vendor module name
KVM: x86: Unify pr_fmt to use module name for all KVM modules
KVM: x86: Do VMX/SVM support checks directly in vendor code
KVM: VMX: Shuffle support checks and hardware enabling code around
KVM: SVM: Check for SVM support in CPU compatibility checks
KVM: Use a per-CPU variable to track which CPUs have enabled
virtualization
KVM: Register syscore (suspend/resume) ops early in kvm_init()
KVM: Opt out of generic hardware enabling on s390 and PPC

Documentation/virt/kvm/locking.rst | 18 +-
arch/arm64/include/asm/kvm_host.h | 15 +-
arch/arm64/include/asm/kvm_mmu.h | 4 +-
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/arch_timer.c | 29 +-
arch/arm64/kvm/arm.c | 93 +++---
arch/arm64/kvm/mmu.c | 12 +-
arch/arm64/kvm/reset.c | 8 +-
arch/arm64/kvm/sys_regs.c | 6 +-
arch/arm64/kvm/vgic/vgic-init.c | 19 +-
arch/arm64/kvm/vmid.c | 6 +-
arch/mips/include/asm/kvm_host.h | 3 +-
arch/mips/kvm/Kconfig | 1 +
arch/mips/kvm/Makefile | 2 +-
arch/mips/kvm/callback.c | 14 -
arch/mips/kvm/mips.c | 34 +--
arch/mips/kvm/vz.c | 7 +-
arch/powerpc/include/asm/kvm_host.h | 3 -
arch/powerpc/include/asm/kvm_ppc.h | 1 -
arch/powerpc/kvm/book3s.c | 12 +-
arch/powerpc/kvm/e500.c | 6 +-
arch/powerpc/kvm/e500mc.c | 6 +-
arch/powerpc/kvm/powerpc.c | 20 --
arch/riscv/include/asm/kvm_host.h | 7 +-
arch/riscv/kvm/Kconfig | 1 +
arch/riscv/kvm/main.c | 23 +-
arch/riscv/kvm/mmu.c | 12 +-
arch/riscv/kvm/vmid.c | 4 +-
arch/s390/include/asm/kvm_host.h | 1 -
arch/s390/kvm/interrupt.c | 2 +-
arch/s390/kvm/kvm-s390.c | 84 +++---
arch/s390/kvm/kvm-s390.h | 2 +-
arch/s390/kvm/pci.c | 2 +-
arch/s390/kvm/pci.h | 2 +-
arch/x86/include/asm/kvm_host.h | 7 +-
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/cpuid.c | 1 +
arch/x86/kvm/debugfs.c | 2 +
arch/x86/kvm/emulate.c | 1 +
arch/x86/kvm/hyperv.c | 1 +
arch/x86/kvm/i8254.c | 4 +-
arch/x86/kvm/i8259.c | 4 +-
arch/x86/kvm/ioapic.c | 1 +
arch/x86/kvm/irq.c | 1 +
arch/x86/kvm/irq_comm.c | 7 +-
arch/x86/kvm/kvm_onhyperv.c | 1 +
arch/x86/kvm/lapic.c | 8 +-
arch/x86/kvm/mmu/mmu.c | 6 +-
arch/x86/kvm/mmu/page_track.c | 1 +
arch/x86/kvm/mmu/spte.c | 4 +-
arch/x86/kvm/mmu/spte.h | 4 +-
arch/x86/kvm/mmu/tdp_iter.c | 1 +
arch/x86/kvm/mmu/tdp_mmu.c | 1 +
arch/x86/kvm/mtrr.c | 1 +
arch/x86/kvm/pmu.c | 1 +
arch/x86/kvm/smm.c | 1 +
arch/x86/kvm/svm/avic.c | 2 +-
arch/x86/kvm/svm/nested.c | 2 +-
arch/x86/kvm/svm/pmu.c | 2 +
arch/x86/kvm/svm/sev.c | 1 +
arch/x86/kvm/svm/svm.c | 90 +++---
arch/x86/kvm/svm/svm_onhyperv.c | 1 +
arch/x86/kvm/svm/svm_onhyperv.h | 4 +-
arch/x86/kvm/vmx/capabilities.h | 4 +-
arch/x86/kvm/vmx/evmcs.c | 1 +
arch/x86/kvm/vmx/evmcs.h | 4 +-
arch/x86/kvm/vmx/nested.c | 3 +-
arch/x86/kvm/vmx/pmu_intel.c | 5 +-
arch/x86/kvm/vmx/posted_intr.c | 2 +
arch/x86/kvm/vmx/sgx.c | 5 +-
arch/x86/kvm/vmx/vmcs12.c | 1 +
arch/x86/kvm/vmx/vmx.c | 438 +++++++++++++++-------------
arch/x86/kvm/vmx/vmx_ops.h | 4 +-
arch/x86/kvm/x86.c | 252 +++++++++-------
arch/x86/kvm/xen.c | 1 +
include/kvm/arm_arch_timer.h | 6 +-
include/kvm/arm_vgic.h | 4 +
include/linux/cpuhotplug.h | 5 +-
include/linux/kvm_host.h | 13 +-
virt/kvm/Kconfig | 3 +
virt/kvm/kvm_main.c | 302 ++++++++++---------
81 files changed, 861 insertions(+), 813 deletions(-)
delete mode 100644 arch/mips/kvm/callback.c


base-commit: d5af637323dd156bad071a3f8fc0d7166cca1276
--
2.38.1.431.g37b22c650d-goog



2022-11-02 23:40:14

by Sean Christopherson

Subject: [PATCH 36/44] KVM: x86: Do compatibility checks when onlining CPU

From: Chao Gao <[email protected]>

Do compatibility checks when enabling hardware to effectively add
compatibility checks when onlining a CPU. Abort enabling, i.e. the
online process, if the (hotplugged) CPU is incompatible with the known
good setup.

At init time, KVM does compatibility checks to ensure that all online
CPUs support hardware virtualization and a common set of features. But
KVM uses hotplugged CPUs without such compatibility checks. On Intel
CPUs, this leads to #GP if the hotplugged CPU doesn't support VMX, or
VM-Entry failure if the hotplugged CPU doesn't support all features
enabled by KVM.

Note, this is little more than a NOP on SVM, as SVM already checks for
full SVM support during hardware enabling.

Opportunistically add a pr_err() if setup_vmcs_config() fails, and
tweak all error messages to output which CPU failed.

Signed-off-by: Chao Gao <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/svm.c | 20 +++++++++++---------
arch/x86/kvm/vmx/vmx.c | 33 +++++++++++++++++++--------------
arch/x86/kvm/x86.c | 5 +++--
4 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f223c845ed6e..c99222b71fcc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1666,7 +1666,7 @@ struct kvm_x86_nested_ops {
};

struct kvm_x86_init_ops {
- int (*check_processor_compatibility)(void);
+ int (*check_processor_compatibility)(int cpu);
int (*hardware_setup)(void);
unsigned int (*handle_intel_pt_intr)(void);

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index efda384d29d4..4772835174dd 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -525,13 +525,13 @@ static void svm_init_osvw(struct kvm_vcpu *vcpu)
vcpu->arch.osvw.status |= 1;
}

-static bool kvm_is_svm_supported(void)
+static bool kvm_is_svm_supported(int cpu)
{
const char *msg;
u64 vm_cr;

if (!cpu_has_svm(&msg)) {
- pr_err("SVM not supported, %s\n", msg);
+ pr_err("SVM not supported by CPU %d, %s\n", cpu, msg);
return false;
}

@@ -542,16 +542,16 @@ static bool kvm_is_svm_supported(void)

rdmsrl(MSR_VM_CR, vm_cr);
if (vm_cr & (1 << SVM_VM_CR_SVM_DISABLE)) {
- pr_err("SVM disabled in MSR_VM_CR\n");
+ pr_err("SVM disabled in MSR_VM_CR on CPU %d\n", cpu);
return false;
}

return true;
}

-static int __init svm_check_processor_compat(void)
+static int svm_check_processor_compat(int cpu)
{
- if (!kvm_is_svm_supported())
+ if (!kvm_is_svm_supported(cpu))
return -EIO;

return 0;
@@ -588,14 +588,16 @@ static int svm_hardware_enable(void)
uint64_t efer;
struct desc_struct *gdt;
int me = raw_smp_processor_id();
+ int r;
+
+ r = svm_check_processor_compat(me);
+ if (r)
+ return r;

rdmsrl(MSR_EFER, efer);
if (efer & EFER_SVME)
return -EBUSY;

- if (!kvm_is_svm_supported())
- return -EINVAL;
-
sd = per_cpu(svm_data, me);
if (!sd) {
pr_err("%s: svm_data is NULL on %d\n", __func__, me);
@@ -5132,7 +5134,7 @@ static int __init svm_init(void)

__unused_size_checks();

- if (!kvm_is_svm_supported())
+ if (!kvm_is_svm_supported(raw_smp_processor_id()))
return -EOPNOTSUPP;

r = kvm_x86_vendor_init(&svm_init_ops);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 07d86535c032..2729de93e0ea 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2520,8 +2520,7 @@ static bool cpu_has_perf_global_ctrl_bug(void)
return false;
}

-static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
- u32 msr, u32 *result)
+static int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt, u32 msr, u32 *result)
{
u32 vmx_msr_low, vmx_msr_high;
u32 ctl = ctl_min | ctl_opt;
@@ -2539,7 +2538,7 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
return 0;
}

-static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
+static u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
{
u64 allowed;

@@ -2548,8 +2547,8 @@ static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
return ctl_opt & allowed;
}

-static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
- struct vmx_capability *vmx_cap)
+static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
+ struct vmx_capability *vmx_cap)
{
u32 vmx_msr_low, vmx_msr_high;
u32 _pin_based_exec_control = 0;
@@ -2710,36 +2709,38 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
return 0;
}

-static bool __init kvm_is_vmx_supported(void)
+static bool kvm_is_vmx_supported(int cpu)
{
if (!cpu_has_vmx()) {
- pr_err("CPU doesn't support VMX\n");
+ pr_err("VMX not supported by CPU %d\n", cpu);
return false;
}

if (!boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
!boot_cpu_has(X86_FEATURE_VMX)) {
- pr_err("VMX not enabled in MSR_IA32_FEAT_CTL\n");
+ pr_err("VMX not enabled in MSR_IA32_FEAT_CTL on CPU %d\n", cpu);
return false;
}

return true;
}

-static int __init vmx_check_processor_compat(void)
+static int vmx_check_processor_compat(int cpu)
{
struct vmcs_config vmcs_conf;
struct vmx_capability vmx_cap;

- if (!kvm_is_vmx_supported())
+ if (!kvm_is_vmx_supported(cpu))
return -EIO;

- if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
+ if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0) {
+ pr_err("Failed to setup VMCS config on CPU %d\n", cpu);
return -EIO;
+ }
if (nested)
nested_vmx_setup_ctls_msrs(&vmcs_conf, vmx_cap.ept);
- if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
- pr_err("CPU %d feature inconsistency!\n", smp_processor_id());
+ if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config))) {
+ pr_err("Inconsistent VMCS config on CPU %d\n", cpu);
return -EIO;
}
return 0;
@@ -2771,6 +2772,10 @@ static int vmx_hardware_enable(void)
u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
int r;

+ r = vmx_check_processor_compat(cpu);
+ if (r)
+ return r;
+
if (cr4_read_shadow() & X86_CR4_VMXE)
return -EBUSY;

@@ -8517,7 +8522,7 @@ static int __init vmx_init(void)
{
int r, cpu;

- if (!kvm_is_vmx_supported())
+ if (!kvm_is_vmx_supported(raw_smp_processor_id()))
return -EOPNOTSUPP;

hv_setup_evmcs();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0c1778f3308a..a7b1d916ecb2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9280,7 +9280,8 @@ struct kvm_cpu_compat_check {

static int kvm_x86_check_processor_compatibility(struct kvm_x86_init_ops *ops)
{
- struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
+ int cpu = smp_processor_id();
+ struct cpuinfo_x86 *c = &cpu_data(cpu);

WARN_ON(!irqs_disabled());

@@ -9288,7 +9289,7 @@ static int kvm_x86_check_processor_compatibility(struct kvm_x86_init_ops *ops)
__cr4_reserved_bits(cpu_has, &boot_cpu_data))
return -EIO;

- return ops->check_processor_compatibility();
+ return ops->check_processor_compatibility(cpu);
}

static void kvm_x86_check_cpu_compat(void *data)
--
2.38.1.431.g37b22c650d-goog
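
The flow described in this patch (re-run the compatibility check on every hardware enable, so that a hotplugged CPU is compared against the known-good setup) can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: struct feature_config, cpu_caps, module_init_sketch() and the capability values are made up for the example and are not KVM's types or data.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS 3

/* Hypothetical stand-in for struct vmcs_config; the fields are illustrative. */
struct feature_config {
	uint32_t pin_ctrl;
	uint32_t exec_ctrl;
	uint64_t secondary_ctrl;
};

/* Made-up per-CPU capabilities; CPU 2 is missing one secondary control. */
static const struct feature_config cpu_caps[NR_CPUS] = {
	{ 0x16, 0x8401, 0x3 },
	{ 0x16, 0x8401, 0x3 },
	{ 0x16, 0x8401, 0x1 },
};

/* Known-good configuration, captured once from the boot CPU (CPU 0). */
static struct feature_config known_good;
static bool hw_enabled[NR_CPUS];

static void setup_config(int cpu, struct feature_config *conf)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	/* Zero everything first so that memcmp() never compares stale bytes. */
	memset(conf, 0, sizeof(*conf));
	conf->pin_ctrl = cpu_caps[cpu].pin_ctrl;
	conf->exec_ctrl = cpu_caps[cpu].exec_ctrl;
	conf->secondary_ctrl = cpu_caps[cpu].secondary_ctrl;
}

/* Hypothetical "module init": record what the boot CPU supports. */
static void module_init_sketch(void)
{
	setup_config(0, &known_good);
}

static int check_processor_compat(int cpu)
{
	struct feature_config conf;

	setup_config(cpu, &conf);
	if (memcmp(&known_good, &conf, sizeof(conf)))
		return -EIO;
	return 0;
}

/* Every enable path, including CPU hotplug, goes through the check. */
static int hardware_enable(int cpu)
{
	int r = check_processor_compat(cpu);

	if (r)
		return r;
	hw_enabled[cpu] = true;
	return 0;
}
```

With these made-up values, enabling CPU 0 or CPU 1 succeeds, while enabling CPU 2 is refused with -EIO before any hardware state is touched.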


2022-11-02 23:41:40

by Sean Christopherson

Subject: [PATCH 21/44] KVM: MIPS: Register die notifier prior to kvm_init()

Call kvm_init() only after _all_ setup is complete, as kvm_init() exposes
/dev/kvm to userspace and thus allows userspace to create VMs (and call
other ioctls).

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/mips/kvm/mips.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 75681281e2df..ae7a24342fdf 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -1640,16 +1640,17 @@ static int __init kvm_mips_init(void)
if (ret)
return ret;

- ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
-
- if (ret)
- return ret;

if (boot_cpu_type() == CPU_LOONGSON64)
kvm_priority_to_irq = kvm_loongson3_priority_to_irq;

register_die_notifier(&kvm_mips_csr_die_notifier);

+ ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+ if (ret) {
+ unregister_die_notifier(&kvm_mips_csr_die_notifier);
+ return ret;
+ }
return 0;
}

--
2.38.1.431.g37b22c650d-goog
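
The ordering rule from this patch (finish all arch setup first, call the init that exposes the device last, and undo the arch setup if that final call fails) might look like this as a stand-alone sketch. All names ending in _sketch and the inject_failure knob are hypothetical and exist only for the illustration; they are not the MIPS or common KVM functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool notifier_registered;
static bool device_exposed;
static bool inject_failure;	/* test knob, not part of any real API */

static void register_notifier_sketch(void)
{
	notifier_registered = true;
}

static void unregister_notifier_sketch(void)
{
	notifier_registered = false;
}

/* Stand-in for the common init that makes the device node visible. */
static int common_init_sketch(void)
{
	/* The invariant the ordering buys: no arch setup is still pending. */
	assert(notifier_registered);

	if (inject_failure)
		return -ENOMEM;
	device_exposed = true;
	return 0;
}

static int arch_module_init_sketch(void)
{
	int ret;

	register_notifier_sketch();

	/* Must come last: userspace can use the device once this returns. */
	ret = common_init_sketch();
	if (ret) {
		unregister_notifier_sketch();
		return ret;
	}
	return 0;
}
```

The assertion inside common_init_sketch() would fire if the two calls were swapped back to the old order, which is the window this patch closes.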


2022-11-02 23:41:58

by Sean Christopherson

Subject: [PATCH 14/44] KVM: arm64: Simplify the CPUHP logic

From: Marc Zyngier <[email protected]>

For a number of historical reasons, the KVM/arm64 hotplug setup is pretty
complicated, and we have two extra CPUHP notifiers for vGIC and timers.

It looks pretty pointless, and gets in the way of further changes.
So let's just expose some helpers that can be called from the core
CPUHP callback, and get rid of everything else.

This gives us the opportunity to drop a useless notifier entry,
as well as tidy up the timer enable/disable, which was a bit odd.

Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/arm64/kvm/arch_timer.c | 27 ++++++++++-----------------
arch/arm64/kvm/arm.c | 13 +++++++++++++
arch/arm64/kvm/vgic/vgic-init.c | 19 ++-----------------
include/kvm/arm_arch_timer.h | 4 ++++
include/kvm/arm_vgic.h | 4 ++++
include/linux/cpuhotplug.h | 3 ---
6 files changed, 33 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index bb24a76b4224..33fca1a691a5 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -811,10 +811,18 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
}

-static void kvm_timer_init_interrupt(void *info)
+void kvm_timer_cpu_up(void)
{
enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
- enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags);
+ if (host_ptimer_irq)
+ enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags);
+}
+
+void kvm_timer_cpu_down(void)
+{
+ disable_percpu_irq(host_vtimer_irq);
+ if (host_ptimer_irq)
+ disable_percpu_irq(host_ptimer_irq);
}

int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
@@ -976,18 +984,6 @@ void kvm_arm_timer_write_sysreg(struct kvm_vcpu *vcpu,
preempt_enable();
}

-static int kvm_timer_starting_cpu(unsigned int cpu)
-{
- kvm_timer_init_interrupt(NULL);
- return 0;
-}
-
-static int kvm_timer_dying_cpu(unsigned int cpu)
-{
- disable_percpu_irq(host_vtimer_irq);
- return 0;
-}
-
static int timer_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu)
{
if (vcpu)
@@ -1185,9 +1181,6 @@ int kvm_timer_hyp_init(bool has_gic)
goto out_free_irq;
}

- cpuhp_setup_state(CPUHP_AP_KVM_ARM_TIMER_STARTING,
- "kvm/arm/timer:starting", kvm_timer_starting_cpu,
- kvm_timer_dying_cpu);
return 0;
out_free_irq:
free_percpu_irq(host_vtimer_irq, kvm_get_running_vcpus());
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2ee729f54ce0..0c328af064dd 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1670,7 +1670,15 @@ static void _kvm_arch_hardware_enable(void *discard)

int kvm_arch_hardware_enable(void)
{
+ int was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
+
_kvm_arch_hardware_enable(NULL);
+
+ if (!was_enabled) {
+ kvm_vgic_cpu_up();
+ kvm_timer_cpu_up();
+ }
+
return 0;
}

@@ -1684,6 +1692,11 @@ static void _kvm_arch_hardware_disable(void *discard)

void kvm_arch_hardware_disable(void)
{
+ if (__this_cpu_read(kvm_arm_hardware_enabled)) {
+ kvm_timer_cpu_down();
+ kvm_vgic_cpu_down();
+ }
+
if (!is_protected_kvm_enabled())
_kvm_arch_hardware_disable(NULL);
}
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index f6d4f4052555..6c7f6ae21ec0 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -465,17 +465,15 @@ int kvm_vgic_map_resources(struct kvm *kvm)

/* GENERIC PROBE */

-static int vgic_init_cpu_starting(unsigned int cpu)
+void kvm_vgic_cpu_up(void)
{
enable_percpu_irq(kvm_vgic_global_state.maint_irq, 0);
- return 0;
}


-static int vgic_init_cpu_dying(unsigned int cpu)
+void kvm_vgic_cpu_down(void)
{
disable_percpu_irq(kvm_vgic_global_state.maint_irq);
- return 0;
}

static irqreturn_t vgic_maintenance_handler(int irq, void *data)
@@ -584,19 +582,6 @@ int kvm_vgic_hyp_init(void)
return ret;
}

- ret = cpuhp_setup_state(CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
- "kvm/arm/vgic:starting",
- vgic_init_cpu_starting, vgic_init_cpu_dying);
- if (ret) {
- kvm_err("Cannot register vgic CPU notifier\n");
- goto out_free_irq;
- }
-
kvm_info("vgic interrupt IRQ%d\n", kvm_vgic_global_state.maint_irq);
return 0;
-
-out_free_irq:
- free_percpu_irq(kvm_vgic_global_state.maint_irq,
- kvm_get_running_vcpus());
- return ret;
}
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index cd6d8f260eab..1638418f72dd 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -104,4 +104,8 @@ void kvm_arm_timer_write_sysreg(struct kvm_vcpu *vcpu,
u32 timer_get_ctl(struct arch_timer_context *ctxt);
u64 timer_get_cval(struct arch_timer_context *ctxt);

+/* CPU HP callbacks */
+void kvm_timer_cpu_up(void);
+void kvm_timer_cpu_down(void);
+
#endif
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 4df9e73a8bb5..fc4acc91ba06 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -431,4 +431,8 @@ int vgic_v4_load(struct kvm_vcpu *vcpu);
void vgic_v4_commit(struct kvm_vcpu *vcpu);
int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db);

+/* CPU HP callbacks */
+void kvm_vgic_cpu_up(void);
+void kvm_vgic_cpu_down(void);
+
#endif /* __KVM_ARM_VGIC_H */
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index f61447913db9..7337414e4947 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -186,9 +186,6 @@ enum cpuhp_state {
CPUHP_AP_TI_GP_TIMER_STARTING,
CPUHP_AP_HYPERV_TIMER_STARTING,
CPUHP_AP_KVM_STARTING,
- CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
- CPUHP_AP_KVM_ARM_VGIC_STARTING,
- CPUHP_AP_KVM_ARM_TIMER_STARTING,
/* Must be the last timer callback */
CPUHP_AP_DUMMY_TIMER_STARTING,
CPUHP_AP_ARM_XEN_STARTING,
--
2.38.1.431.g37b22c650d-goog
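
The was_enabled guard this patch adds to kvm_arch_hardware_enable() and kvm_arch_hardware_disable() can be modelled in user space as below. This is a simplified sketch under assumptions: the per-CPU flag and IRQ nesting are plain arrays, the names are hypothetical, and the protected-KVM special case from the real disable path is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

static bool hw_enabled[NR_CPUS];	/* models a per-CPU "enabled" flag */
static int irq_enable_depth[NR_CPUS];	/* models per-CPU IRQ enable nesting */

static void percpu_irqs_up(int cpu)
{
	irq_enable_depth[cpu]++;
}

static void percpu_irqs_down(int cpu)
{
	/* An unbalanced disable is exactly what the guards prevent. */
	assert(irq_enable_depth[cpu] > 0);
	irq_enable_depth[cpu]--;
}

static int arch_hardware_enable_sketch(int cpu)
{
	bool was_enabled = hw_enabled[cpu];

	hw_enabled[cpu] = true;		/* low-level enable, idempotent here */

	/* Only the first enable on this CPU may turn the IRQs on. */
	if (!was_enabled)
		percpu_irqs_up(cpu);
	return 0;
}

static void arch_hardware_disable_sketch(int cpu)
{
	/* Only undo the IRQ enable if this CPU actually did it. */
	if (hw_enabled[cpu])
		percpu_irqs_down(cpu);

	hw_enabled[cpu] = false;
}
```

Calling the enable or disable helper twice in a row on the same CPU leaves the IRQ nesting balanced, which is why the dedicated vGIC and timer notifiers are no longer needed in this model.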


2022-11-02 23:42:09

by Sean Christopherson

Subject: [PATCH 05/44] KVM: s390: Unwind kvm_arch_init() piece-by-piece() if a step fails

In preparation for folding kvm_arch_hardware_setup() into kvm_arch_init(),
unwind initialization one step at a time instead of simply calling
kvm_arch_exit(). Using kvm_arch_exit() regardless of which initialization
step failed relies on all affected state playing nice with being undone
even if said state wasn't first set up. That holds true for state that is
currently configured by kvm_arch_init(), but not for state that's handled
by kvm_arch_hardware_setup(), e.g. calling gmap_unregister_pte_notifier()
without first registering a notifier would result in list corruption due
to attempting to delete an entry that was never added to the list.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/s390/kvm/kvm-s390.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 45d4b8182b07..8395433a79b2 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -490,11 +490,11 @@ int kvm_arch_init(void *opaque)

kvm_s390_dbf_uv = debug_register("kvm-uv", 32, 1, 7 * sizeof(long));
if (!kvm_s390_dbf_uv)
- goto out;
+ goto err_kvm_uv;

if (debug_register_view(kvm_s390_dbf, &debug_sprintf_view) ||
debug_register_view(kvm_s390_dbf_uv, &debug_sprintf_view))
- goto out;
+ goto err_debug_view;

kvm_s390_cpu_feat_init();

@@ -502,25 +502,32 @@ int kvm_arch_init(void *opaque)
rc = kvm_register_device_ops(&kvm_flic_ops, KVM_DEV_TYPE_FLIC);
if (rc) {
pr_err("A FLIC registration call failed with rc=%d\n", rc);
- goto out;
+ goto err_flic;
}

if (IS_ENABLED(CONFIG_VFIO_PCI_ZDEV_KVM)) {
rc = kvm_s390_pci_init();
if (rc) {
pr_err("Unable to allocate AIFT for PCI\n");
- goto out;
+ goto err_pci;
}
}

rc = kvm_s390_gib_init(GAL_ISC);
if (rc)
- goto out;
+ goto err_gib;

return 0;

-out:
- kvm_arch_exit();
+err_gib:
+ if (IS_ENABLED(CONFIG_VFIO_PCI_ZDEV_KVM))
+ kvm_s390_pci_exit();
+err_pci:
+err_flic:
+err_debug_view:
+ debug_unregister(kvm_s390_dbf_uv);
+err_kvm_uv:
+ debug_unregister(kvm_s390_dbf);
return rc;
}

--
2.38.1.431.g37b22c650d-goog
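
The piece-by-piece unwind described in this patch is the usual goto ladder. A minimal stand-alone sketch, assuming four made-up steps and a fail_at test knob (none of these names are the s390 functions), could look like this:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum step { STEP_DBF, STEP_DBF_UV, STEP_PCI, STEP_GIB, NR_STEPS };

static bool done[NR_STEPS];
static int fail_at = -1;	/* test knob: which step should fail, if any */

static int do_step(enum step s)
{
	if ((int)s == fail_at)
		return -ENOMEM;
	done[s] = true;
	return 0;
}

static void undo_step(enum step s)
{
	/* Undoing a step that never ran is the bug the ladder avoids. */
	assert(done[s]);
	done[s] = false;
}

static int arch_init_sketch(void)
{
	int rc;

	rc = do_step(STEP_DBF);
	if (rc)
		return rc;

	rc = do_step(STEP_DBF_UV);
	if (rc)
		goto err_dbf_uv;

	rc = do_step(STEP_PCI);
	if (rc)
		goto err_pci;

	rc = do_step(STEP_GIB);
	if (rc)
		goto err_gib;

	return 0;

	/* Each label undoes only the steps that completed before the failure. */
err_gib:
	undo_step(STEP_PCI);
err_pci:
	undo_step(STEP_DBF_UV);
err_dbf_uv:
	undo_step(STEP_DBF);
	return rc;
}
```

The assert() in undo_step() plays the role of the list corruption mentioned above: a catch-all exit helper would trip it whenever an early step fails, whereas the ladder never does.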


2022-11-02 23:42:19

by Sean Christopherson

Subject: [PATCH 37/44] KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section

From: Chao Gao <[email protected]>

The CPU STARTING section doesn't allow callbacks to fail. Move KVM's
hotplug callback to the ONLINE section so that it can abort onlining a CPU
in certain cases, e.g. when KVM fails to enable hardware virtualization on
the hotplugged CPU, to avoid potentially breaking VMs running on existing
CPUs.

Place KVM's hotplug state before CPUHP_AP_SCHED_WAIT_EMPTY, as that state
ensures that, when offlining a CPU, all user tasks and non-pinned kernel
tasks have left the CPU, i.e. there cannot be a vCPU task around. So, it is
safe for KVM's CPU offline callback to disable hardware virtualization at
that point. Likewise, KVM's online callback can enable hardware
virtualization before any vCPU task gets a chance to run on hotplugged CPUs.

Rename KVM's CPU hotplug callbacks accordingly.

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Chao Gao <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Yuan Yao <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
include/linux/cpuhotplug.h | 2 +-
virt/kvm/kvm_main.c | 30 ++++++++++++++++++++++--------
2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 7337414e4947..de45be38dd27 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -185,7 +185,6 @@ enum cpuhp_state {
CPUHP_AP_CSKY_TIMER_STARTING,
CPUHP_AP_TI_GP_TIMER_STARTING,
CPUHP_AP_HYPERV_TIMER_STARTING,
- CPUHP_AP_KVM_STARTING,
/* Must be the last timer callback */
CPUHP_AP_DUMMY_TIMER_STARTING,
CPUHP_AP_ARM_XEN_STARTING,
@@ -200,6 +199,7 @@ enum cpuhp_state {

/* Online section invoked on the hotplugged CPU from the hotplug thread */
CPUHP_AP_ONLINE_IDLE,
+ CPUHP_AP_KVM_ONLINE,
CPUHP_AP_SCHED_WAIT_EMPTY,
CPUHP_AP_SMPBOOT_THREADS,
CPUHP_AP_X86_VDSO_VMA_ONLINE,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dd13af9f06d5..fd9e39c85549 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5026,13 +5026,27 @@ static void hardware_enable_nolock(void *junk)
}
}

-static int kvm_starting_cpu(unsigned int cpu)
+static int kvm_online_cpu(unsigned int cpu)
{
+ int ret = 0;
+
raw_spin_lock(&kvm_count_lock);
- if (kvm_usage_count)
+ /*
+ * Abort the CPU online process if hardware virtualization cannot
+ * be enabled. Otherwise running VMs would encounter unrecoverable
+ * errors when scheduled to this CPU.
+ */
+ if (kvm_usage_count) {
+ WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
+
hardware_enable_nolock(NULL);
+ if (atomic_read(&hardware_enable_failed)) {
+ atomic_set(&hardware_enable_failed, 0);
+ ret = -EIO;
+ }
+ }
raw_spin_unlock(&kvm_count_lock);
- return 0;
+ return ret;
}

static void hardware_disable_nolock(void *junk)
@@ -5045,7 +5059,7 @@ static void hardware_disable_nolock(void *junk)
kvm_arch_hardware_disable();
}

-static int kvm_dying_cpu(unsigned int cpu)
+static int kvm_offline_cpu(unsigned int cpu)
{
raw_spin_lock(&kvm_count_lock);
if (kvm_usage_count)
@@ -5822,8 +5836,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL))
return -ENOMEM;

- r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_STARTING, "kvm/cpu:starting",
- kvm_starting_cpu, kvm_dying_cpu);
+ r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
+ kvm_online_cpu, kvm_offline_cpu);
if (r)
goto out_free_2;
register_reboot_notifier(&kvm_reboot_notifier);
@@ -5897,7 +5911,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
kmem_cache_destroy(kvm_vcpu_cache);
out_free_3:
unregister_reboot_notifier(&kvm_reboot_notifier);
- cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
+ cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
out_free_2:
free_cpumask_var(cpus_hardware_enabled);
return r;
@@ -5923,7 +5937,7 @@ void kvm_exit(void)
kvm_async_pf_deinit();
unregister_syscore_ops(&kvm_syscore_ops);
unregister_reboot_notifier(&kvm_reboot_notifier);
- cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
+ cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
on_each_cpu(hardware_disable_nolock, NULL, 1);
kvm_irqfd_exit();
free_cpumask_var(cpus_hardware_enabled);
--
2.38.1.431.g37b22c650d-goog
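
What a fallible online callback buys can be shown with a toy hotplug core. The sketch below is an assumption-laden model, not the cpuhp API: online_cpu_sketch(), bring_up_cpu_sketch() and the cpu_has_virt table are hypothetical, and locking is omitted entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 3

static int usage_count;			/* number of live "VMs" */
static const bool cpu_has_virt[NR_CPUS] = { true, true, false };
static bool hw_enabled[NR_CPUS];
static bool cpu_online[NR_CPUS];

/* Online-section callback: unlike a STARTING callback, it may fail. */
static int online_cpu_sketch(int cpu)
{
	if (!usage_count)
		return 0;	/* no VMs, nothing to enable yet */
	if (!cpu_has_virt[cpu])
		return -EIO;	/* enabling failed: veto the online */
	hw_enabled[cpu] = true;
	return 0;
}

/* Toy hotplug core: a failing callback leaves the CPU offline. */
static int bring_up_cpu_sketch(int cpu)
{
	int ret;

	assert(cpu >= 0 && cpu < NR_CPUS);

	ret = online_cpu_sketch(cpu);
	if (ret)
		return ret;
	cpu_online[cpu] = true;
	return 0;
}
```

In this model, a CPU without virtualization support can still come online while no VM exists, but is refused once usage_count is non-zero, so no vCPU task can ever be scheduled onto it.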


2022-11-02 23:42:25

by Sean Christopherson

Subject: [PATCH 29/44] KVM: x86: Do CPU compatibility checks in x86 code

Move the CPU compatibility checks to pure x86 code, i.e. drop x86's use
of the common kvm_arch_check_processor_compat() arch hook. x86 is the only
architecture that "needs" to do per-CPU compatibility checks, so moving
the logic to x86 will allow dropping the common code, and will also
give x86 more control over when/how the compatibility checks are
performed, e.g. TDX will need to enable hardware (do VMXON) in order to
perform compatibility checks.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/kvm/x86.c | 49 ++++++++++++++++++++++++++++++++----------
3 files changed, 40 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f48d07bfc3d7..368b4db4b240 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5144,7 +5144,7 @@ static int __init svm_init(void)
* Common KVM initialization _must_ come last, after this, /dev/kvm is
* exposed to userspace!
*/
- r = kvm_init(&svm_init_ops, sizeof(struct vcpu_svm),
+ r = kvm_init(NULL, sizeof(struct vcpu_svm),
__alignof__(struct vcpu_svm), THIS_MODULE);
if (r)
goto err_kvm_init;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 81690fce0eb1..26baaccb659a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8562,7 +8562,7 @@ static int __init vmx_init(void)
* Common KVM initialization _must_ come last, after this, /dev/kvm is
* exposed to userspace!
*/
- r = kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx),
+ r = kvm_init(NULL, sizeof(struct vcpu_vmx),
__alignof__(struct vcpu_vmx), THIS_MODULE);
if (r)
goto err_kvm_init;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2b4530a33298..94831f1a1d04 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9271,10 +9271,36 @@ static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
kvm_pmu_ops_update(ops->pmu_ops);
}

+struct kvm_cpu_compat_check {
+ struct kvm_x86_init_ops *ops;
+ int *ret;
+};
+
+static int kvm_x86_check_processor_compatibility(struct kvm_x86_init_ops *ops)
+{
+ struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
+
+ WARN_ON(!irqs_disabled());
+
+ if (__cr4_reserved_bits(cpu_has, c) !=
+ __cr4_reserved_bits(cpu_has, &boot_cpu_data))
+ return -EIO;
+
+ return ops->check_processor_compatibility();
+}
+
+static void kvm_x86_check_cpu_compat(void *data)
+{
+ struct kvm_cpu_compat_check *c = data;
+
+ *c->ret = kvm_x86_check_processor_compatibility(c->ops);
+}
+
static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
{
+ struct kvm_cpu_compat_check c;
u64 host_pat;
- int r;
+ int r, cpu;

if (kvm_x86_ops.hardware_enable) {
pr_err("kvm: already loaded vendor module '%s'\n", kvm_x86_ops.name);
@@ -9354,6 +9380,14 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
if (r != 0)
goto out_mmu_exit;

+ c.ret = &r;
+ c.ops = ops;
+ for_each_online_cpu(cpu) {
+ smp_call_function_single(cpu, kvm_x86_check_cpu_compat, &c, 1);
+ if (r < 0)
+ goto out_hardware_unsetup;
+ }
+
/*
* Point of no return! DO NOT add error paths below this point unless
* absolutely necessary, as most operations from this point forward
@@ -9396,6 +9430,8 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
kvm_init_msr_list();
return 0;

+out_hardware_unsetup:
+ ops->runtime_ops->hardware_unsetup();
out_mmu_exit:
kvm_mmu_vendor_module_exit();
out_free_percpu:
@@ -12002,16 +12038,7 @@ void kvm_arch_hardware_disable(void)

int kvm_arch_check_processor_compat(void *opaque)
{
- struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
- struct kvm_x86_init_ops *ops = opaque;
-
- WARN_ON(!irqs_disabled());
-
- if (__cr4_reserved_bits(cpu_has, c) !=
- __cr4_reserved_bits(cpu_has, &boot_cpu_data))
- return -EIO;
-
- return ops->check_processor_compatibility();
+ return 0;
}

bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
--
2.38.1.431.g37b22c650d-goog
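
The per-CPU loop added to __kvm_x86_vendor_init() above can be modelled in
plain userspace C. This is only a sketch under stated assumptions: every
model_* name is hypothetical, the callback is invoked directly instead of via
smp_call_function_single(), and the CPU number is passed in the struct rather
than read with smp_processor_id().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the vendor hooks; not the kernel's struct. */
struct model_init_ops {
	int (*check_processor_compatibility)(int cpu);
};

/* Mirrors the shape of struct kvm_cpu_compat_check in the patch. */
struct model_compat_check {
	const struct model_init_ops *ops;
	int cpu;	/* assumption: the kernel derives this on the target CPU */
	int *ret;
};

/* Callback with the void (*)(void *) shape a cross-CPU call helper expects. */
static void model_check_cpu_compat(void *data)
{
	struct model_compat_check *c = data;

	assert(c && c->ops && c->ret);
	*c->ret = c->ops->check_processor_compatibility(c->cpu);
}

/*
 * Run the check once per modelled CPU and stop at the first failure,
 * returning that CPU's error code (0 if every CPU passes).
 */
static int model_check_all_cpus(const struct model_init_ops *ops, int nr_cpus)
{
	struct model_compat_check c;
	int r = 0;
	int cpu;

	c.ops = ops;
	c.ret = &r;
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		c.cpu = cpu;
		/* The patch uses smp_call_function_single(); this model calls directly. */
		model_check_cpu_compat(&c);
		if (r < 0)
			return r;
	}
	return 0;
}

/* Example hooks: every CPU compatible, and only CPU 2 incompatible. */
static int model_all_ok(int cpu)
{
	(void)cpu;
	return 0;
}

static int model_cpu2_bad(int cpu)
{
	return cpu == 2 ? -EIO : 0;
}
```

Passing a pointer to the caller's return variable through the data struct is
what lets a void callback report a result, which is why the patch keeps the
"int *ret" member when it moves the struct into x86 code.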


2022-11-02 23:42:41

by Sean Christopherson

Subject: [PATCH 33/44] KVM: x86: Do VMX/SVM support checks directly in vendor code

Do basic VMX/SVM support checks directly in vendor code instead of
implementing them via kvm_x86_init_ops hooks. Beyond the superficial
benefit of providing common messages, which isn't even clearly a net
positive since vendor code can provide more precise/detailed messages,
there's zero advantage to bouncing through common x86 code.

Consolidating the checks will also simplify performing the checks
across all CPUs (in a future patch).

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 --
arch/x86/kvm/svm/svm.c | 38 +++++++++++++++------------------
arch/x86/kvm/vmx/vmx.c | 37 +++++++++++++++++---------------
arch/x86/kvm/x86.c | 11 ----------
4 files changed, 37 insertions(+), 51 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 58a7cb8d8e96..f223c845ed6e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1666,8 +1666,6 @@ struct kvm_x86_nested_ops {
};

struct kvm_x86_init_ops {
- int (*cpu_has_kvm_support)(void);
- int (*disabled_by_bios)(void);
int (*check_processor_compatibility)(void);
int (*hardware_setup)(void);
unsigned int (*handle_intel_pt_intr)(void);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3c48fb837302..3523d24d004b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -525,21 +525,28 @@ static void svm_init_osvw(struct kvm_vcpu *vcpu)
vcpu->arch.osvw.status |= 1;
}

-static int has_svm(void)
+static bool kvm_is_svm_supported(void)
{
const char *msg;
+ u64 vm_cr;

if (!cpu_has_svm(&msg)) {
- printk(KERN_INFO "has_svm: %s\n", msg);
- return 0;
+ pr_err("SVM not supported, %s\n", msg);
+ return false;
}

if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
pr_info("KVM is unsupported when running as an SEV guest\n");
- return 0;
+ return false;
}

- return 1;
+ rdmsrl(MSR_VM_CR, vm_cr);
+ if (vm_cr & (1 << SVM_VM_CR_SVM_DISABLE)) {
+ pr_err("SVM disabled in MSR_VM_CR\n");
+ return false;
+ }
+
+ return true;
}

void __svm_write_tsc_multiplier(u64 multiplier)
@@ -578,10 +585,9 @@ static int svm_hardware_enable(void)
if (efer & EFER_SVME)
return -EBUSY;

- if (!has_svm()) {
- pr_err("%s: err EOPNOTSUPP on %d\n", __func__, me);
+ if (!kvm_is_svm_supported())
return -EINVAL;
- }
+
sd = per_cpu(svm_data, me);
if (!sd) {
pr_err("%s: svm_data is NULL on %d\n", __func__, me);
@@ -4112,17 +4118,6 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
vmcb_mark_dirty(svm->vmcb, VMCB_CR);
}

-static int is_disabled(void)
-{
- u64 vm_cr;
-
- rdmsrl(MSR_VM_CR, vm_cr);
- if (vm_cr & (1 << SVM_VM_CR_SVM_DISABLE))
- return 1;
-
- return 0;
-}
-
static void
svm_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
{
@@ -5121,8 +5116,6 @@ static __init int svm_hardware_setup(void)


static struct kvm_x86_init_ops svm_init_ops __initdata = {
- .cpu_has_kvm_support = has_svm,
- .disabled_by_bios = is_disabled,
.hardware_setup = svm_hardware_setup,
.check_processor_compatibility = svm_check_processor_compat,

@@ -5136,6 +5129,9 @@ static int __init svm_init(void)

__unused_size_checks();

+ if (!kvm_is_svm_supported())
+ return -EOPNOTSUPP;
+
r = kvm_x86_vendor_init(&svm_init_ops);
if (r)
return r;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1b645f52cd8d..2a7e62d0707d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2485,17 +2485,6 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
}
}

-static __init int cpu_has_kvm_support(void)
-{
- return cpu_has_vmx();
-}
-
-static __init int vmx_disabled_by_bios(void)
-{
- return !boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
- !boot_cpu_has(X86_FEATURE_VMX);
-}
-
static int kvm_cpu_vmxon(u64 vmxon_pointer)
{
u64 msr;
@@ -7477,16 +7466,29 @@ static int vmx_vm_init(struct kvm *kvm)
return 0;
}

+static bool __init kvm_is_vmx_supported(void)
+{
+ if (!cpu_has_vmx()) {
+ pr_err("CPU doesn't support VMX\n");
+ return false;
+ }
+
+ if (!boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
+ !boot_cpu_has(X86_FEATURE_VMX)) {
+ pr_err("VMX not enabled in MSR_IA32_FEAT_CTL\n");
+ return false;
+ }
+
+ return true;
+}
+
static int __init vmx_check_processor_compat(void)
{
struct vmcs_config vmcs_conf;
struct vmx_capability vmx_cap;

- if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
- !this_cpu_has(X86_FEATURE_VMX)) {
- pr_err("VMX is disabled on CPU %d\n", smp_processor_id());
+ if (!kvm_is_vmx_supported())
return -EIO;
- }

if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
return -EIO;
@@ -8471,8 +8473,6 @@ static __init int hardware_setup(void)
}

static struct kvm_x86_init_ops vmx_init_ops __initdata = {
- .cpu_has_kvm_support = cpu_has_kvm_support,
- .disabled_by_bios = vmx_disabled_by_bios,
.check_processor_compatibility = vmx_check_processor_compat,
.hardware_setup = hardware_setup,
.handle_intel_pt_intr = NULL,
@@ -8517,6 +8517,9 @@ static int __init vmx_init(void)
{
int r, cpu;

+ if (!kvm_is_vmx_supported())
+ return -EOPNOTSUPP;
+
hv_setup_evmcs();

r = kvm_x86_vendor_init(&vmx_init_ops);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 39675b9662d7..0c1778f3308a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9309,17 +9309,6 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
return -EEXIST;
}

- if (!ops->cpu_has_kvm_support()) {
- pr_err_ratelimited("no hardware support for '%s'\n",
- ops->runtime_ops->name);
- return -EOPNOTSUPP;
- }
- if (ops->disabled_by_bios()) {
- pr_err_ratelimited("support for '%s' disabled by bios\n",
- ops->runtime_ops->name);
- return -EOPNOTSUPP;
- }
-
/*
* KVM explicitly assumes that the guest has an FPU and
* FXSAVE/FXRSTOR. For example, the KVM_GET_FPU explicitly casts the
--
2.38.1.431.g37b22c650d-goog
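
The ordering this patch establishes (the vendor module checks support itself,
then calls into common code) can be sketched in userspace C. Everything named
model_* below is hypothetical, as are the two booleans that stand in for the
CPUID and MSR state the real helpers read.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the platform state a vendor module inspects. */
struct model_cpu {
	bool has_feature;	/* CPU advertises the virtualization extension */
	bool disabled_by_fw;	/* firmware left the extension disabled */
};

/* Vendor-side check: one helper, with vendor-specific messages. */
static bool model_is_supported(const struct model_cpu *cpu)
{
	if (!cpu->has_feature) {
		fprintf(stderr, "extension not supported by CPU\n");
		return false;
	}

	if (cpu->disabled_by_fw) {
		fprintf(stderr, "extension disabled by firmware\n");
		return false;
	}

	return true;
}

/* Counts how often the common init path runs, for illustration only. */
static int model_common_init_calls;

static int model_common_vendor_init(void)
{
	model_common_init_calls++;
	return 0;
}

/* Vendor module init: bail before touching common code if unsupported. */
static int model_vendor_init(const struct model_cpu *cpu)
{
	if (!model_is_supported(cpu))
		return -EOPNOTSUPP;

	return model_common_vendor_init();
}
```

Because the check is an ordinary function rather than an ops hook, the same
helper can be reused from other vendor paths, as the patch does in
svm_hardware_enable() and vmx_check_processor_compat().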


2022-11-02 23:43:20

by Sean Christopherson

Subject: [PATCH 30/44] KVM: Drop kvm_arch_check_processor_compat() hook

Drop kvm_arch_check_processor_compat() and its support code now that all
architecture implementations are nops.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/arm64/kvm/arm.c | 7 +------
arch/mips/kvm/mips.c | 7 +------
arch/powerpc/kvm/book3s.c | 2 +-
arch/powerpc/kvm/e500.c | 2 +-
arch/powerpc/kvm/e500mc.c | 2 +-
arch/powerpc/kvm/powerpc.c | 5 -----
arch/riscv/kvm/main.c | 7 +------
arch/s390/kvm/kvm-s390.c | 7 +------
arch/x86/kvm/svm/svm.c | 4 ++--
arch/x86/kvm/vmx/vmx.c | 4 ++--
arch/x86/kvm/x86.c | 5 -----
include/linux/kvm_host.h | 4 +---
virt/kvm/kvm_main.c | 24 +-----------------------
13 files changed, 13 insertions(+), 67 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 75c5125b0dd3..ed1836b6f044 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -63,11 +63,6 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
}

-int kvm_arch_check_processor_compat(void *opaque)
-{
- return 0;
-}
-
int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
struct kvm_enable_cap *cap)
{
@@ -2268,7 +2263,7 @@ static __init int kvm_arm_init(void)
* FIXME: Do something reasonable if kvm_init() fails after pKVM
* hypervisor protection is finalized.
*/
- err = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+ err = kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE);
if (err)
goto out_subs;

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 3cade648827a..36c8991b5d39 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -135,11 +135,6 @@ void kvm_arch_hardware_disable(void)
kvm_mips_callbacks->hardware_disable();
}

-int kvm_arch_check_processor_compat(void *opaque)
-{
- return 0;
-}
-
extern void kvm_init_loongson_ipi(struct kvm *kvm);

int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
@@ -1636,7 +1631,7 @@ static int __init kvm_mips_init(void)

register_die_notifier(&kvm_mips_csr_die_notifier);

- ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+ ret = kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE);
if (ret) {
unregister_die_notifier(&kvm_mips_csr_die_notifier);
return ret;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 87283a0e33d8..57f4e7896d67 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1052,7 +1052,7 @@ static int kvmppc_book3s_init(void)
{
int r;

- r = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+ r = kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE);
if (r)
return r;
#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c
index 0ea61190ec04..b0f695428733 100644
--- a/arch/powerpc/kvm/e500.c
+++ b/arch/powerpc/kvm/e500.c
@@ -531,7 +531,7 @@ static int __init kvmppc_e500_init(void)
flush_icache_range(kvmppc_booke_handlers, kvmppc_booke_handlers +
ivor[max_ivor] + handler_len);

- r = kvm_init(NULL, sizeof(struct kvmppc_vcpu_e500), 0, THIS_MODULE);
+ r = kvm_init(sizeof(struct kvmppc_vcpu_e500), 0, THIS_MODULE);
if (r)
goto err_out;
kvm_ops_e500.owner = THIS_MODULE;
diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index 795667f7ebf0..611532a0dedc 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -404,7 +404,7 @@ static int __init kvmppc_e500mc_init(void)
*/
kvmppc_init_lpid(KVMPPC_NR_LPIDS/threads_per_core);

- r = kvm_init(NULL, sizeof(struct kvmppc_vcpu_e500), 0, THIS_MODULE);
+ r = kvm_init(sizeof(struct kvmppc_vcpu_e500), 0, THIS_MODULE);
if (r)
goto err_out;
kvm_ops_e500mc.owner = THIS_MODULE;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 34278042ad27..51268be60dac 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -441,11 +441,6 @@ int kvm_arch_hardware_enable(void)
return 0;
}

-int kvm_arch_check_processor_compat(void *opaque)
-{
- return 0;
-}
-
int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
{
struct kvmppc_ops *kvm_ops = NULL;
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index 4710a6751687..34c3dece6990 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -20,11 +20,6 @@ long kvm_arch_dev_ioctl(struct file *filp,
return -EINVAL;
}

-int kvm_arch_check_processor_compat(void *opaque)
-{
- return 0;
-}
-
int kvm_arch_hardware_enable(void)
{
unsigned long hideleg, hedeleg;
@@ -110,6 +105,6 @@ static int __init riscv_kvm_init(void)

kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits());

- return kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+ return kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE);
}
module_init(riscv_kvm_init);
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 7c1c6d81b5d7..949231f1393e 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -254,11 +254,6 @@ int kvm_arch_hardware_enable(void)
return 0;
}

-int kvm_arch_check_processor_compat(void *opaque)
-{
- return 0;
-}
-
/* forward declarations */
static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start,
unsigned long end);
@@ -5654,7 +5649,7 @@ static int __init kvm_s390_init(void)
if (r)
return r;

- r = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+ r = kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE);
if (r) {
__kvm_s390_exit();
return r;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 368b4db4b240..99c1ac2d9c84 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5144,8 +5144,8 @@ static int __init svm_init(void)
* Common KVM initialization _must_ come last, after this, /dev/kvm is
* exposed to userspace!
*/
- r = kvm_init(NULL, sizeof(struct vcpu_svm),
- __alignof__(struct vcpu_svm), THIS_MODULE);
+ r = kvm_init(sizeof(struct vcpu_svm), __alignof__(struct vcpu_svm),
+ THIS_MODULE);
if (r)
goto err_kvm_init;

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 26baaccb659a..25e28d368274 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8562,8 +8562,8 @@ static int __init vmx_init(void)
* Common KVM initialization _must_ come last, after this, /dev/kvm is
* exposed to userspace!
*/
- r = kvm_init(NULL, sizeof(struct vcpu_vmx),
- __alignof__(struct vcpu_vmx), THIS_MODULE);
+ r = kvm_init(sizeof(struct vcpu_vmx), __alignof__(struct vcpu_vmx),
+ THIS_MODULE);
if (r)
goto err_kvm_init;

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 94831f1a1d04..5b7b551ae44b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12036,11 +12036,6 @@ void kvm_arch_hardware_disable(void)
drop_user_return_notifiers();
}

-int kvm_arch_check_processor_compat(void *opaque)
-{
- return 0;
-}
-
bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
{
return vcpu->kvm->arch.bsp_vcpu_id == vcpu->vcpu_id;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6c2a28c4c684..0b96d836a051 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -936,8 +936,7 @@ static inline void kvm_irqfd_exit(void)
{
}
#endif
-int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
- struct module *module);
+int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module);
void kvm_exit(void);

void kvm_get_kvm(struct kvm *kvm);
@@ -1444,7 +1443,6 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}

int kvm_arch_hardware_enable(void);
void kvm_arch_hardware_disable(void);
-int kvm_arch_check_processor_compat(void *opaque);
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 17c852cb6842..dd13af9f06d5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5814,36 +5814,14 @@ void kvm_unregister_perf_callbacks(void)
}
#endif

-struct kvm_cpu_compat_check {
- void *opaque;
- int *ret;
-};
-
-static void check_processor_compat(void *data)
+int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
{
- struct kvm_cpu_compat_check *c = data;
-
- *c->ret = kvm_arch_check_processor_compat(c->opaque);
-}
-
-int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
- struct module *module)
-{
- struct kvm_cpu_compat_check c;
int r;
int cpu;

if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL))
return -ENOMEM;

- c.ret = &r;
- c.opaque = opaque;
- for_each_online_cpu(cpu) {
- smp_call_function_single(cpu, check_processor_compat, &c, 1);
- if (r < 0)
- goto out_free_2;
- }
-
r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_STARTING, "kvm/cpu:starting",
kvm_starting_cpu, kvm_dying_cpu);
if (r)
--
2.38.1.431.g37b22c650d-goog


2022-11-02 23:43:20

by Sean Christopherson

Subject: [PATCH 18/44] KVM: arm64: Mark kvm_arm_init() and its unique descendants as __init

Tag kvm_arm_init() and its unique helpers as __init, and tag data that is
only ever modified under the kvm_arm_init() umbrella as read-only after
init.

Opportunistically name the boolean param in kvm_timer_hyp_init()'s
prototype to match its definition.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 14 ++++++-------
arch/arm64/include/asm/kvm_mmu.h | 4 ++--
arch/arm64/kvm/arch_timer.c | 2 +-
arch/arm64/kvm/arm.c | 34 +++++++++++++++----------------
arch/arm64/kvm/mmu.c | 12 +++++------
arch/arm64/kvm/reset.c | 8 ++++----
arch/arm64/kvm/sys_regs.c | 6 +++---
arch/arm64/kvm/vmid.c | 6 +++---
include/kvm/arm_arch_timer.h | 2 +-
9 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 5d5a887e63a5..4863fe356be1 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -66,8 +66,8 @@ enum kvm_mode kvm_get_mode(void);

DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);

-extern unsigned int kvm_sve_max_vl;
-int kvm_arm_init_sve(void);
+extern unsigned int __ro_after_init kvm_sve_max_vl;
+int __init kvm_arm_init_sve(void);

u32 __attribute_const__ kvm_target_cpu(void);
int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
@@ -793,7 +793,7 @@ int kvm_handle_cp10_id(struct kvm_vcpu *vcpu);

void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);

-int kvm_sys_reg_table_init(void);
+int __init kvm_sys_reg_table_init(void);

/* MMIO helpers */
void kvm_mmio_write_buf(void *buf, unsigned int len, unsigned long data);
@@ -824,9 +824,9 @@ int kvm_arm_pvtime_get_attr(struct kvm_vcpu *vcpu,
int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);

-extern unsigned int kvm_arm_vmid_bits;
-int kvm_arm_vmid_alloc_init(void);
-void kvm_arm_vmid_alloc_free(void);
+extern unsigned int __ro_after_init kvm_arm_vmid_bits;
+int __init kvm_arm_vmid_alloc_init(void);
+void __init kvm_arm_vmid_alloc_free(void);
void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
void kvm_arm_vmid_clear_active(void);

@@ -909,7 +909,7 @@ static inline void kvm_clr_pmu_events(u32 clr) {}
void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu);
void kvm_vcpu_put_sysregs_vhe(struct kvm_vcpu *vcpu);

-int kvm_set_ipa_limit(void);
+int __init kvm_set_ipa_limit(void);

#define __KVM_HAVE_ARCH_VM_ALLOC
struct kvm *kvm_arch_alloc_vm(void);
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 7784081088e7..ced5b0028933 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -163,7 +163,7 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
void __iomem **haddr);
int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
void **haddr);
-void free_hyp_pgds(void);
+void __init free_hyp_pgds(void);

void stage2_unmap_vm(struct kvm *kvm);
int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
@@ -175,7 +175,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);

phys_addr_t kvm_mmu_get_httbr(void);
phys_addr_t kvm_get_idmap_vector(void);
-int kvm_mmu_init(u32 *hyp_va_bits);
+int __init kvm_mmu_init(u32 *hyp_va_bits);

static inline void *__kvm_vector_slot2addr(void *base,
enum arm64_hyp_spectre_vector slot)
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 33fca1a691a5..23346585a294 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -1113,7 +1113,7 @@ static int kvm_irq_init(struct arch_timer_kvm_info *info)
return 0;
}

-int kvm_timer_hyp_init(bool has_gic)
+int __init kvm_timer_hyp_init(bool has_gic)
{
struct arch_timer_kvm_info *info;
int err;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index bfa2dcd3db11..6e0061eac627 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1513,7 +1513,7 @@ static int kvm_init_vector_slots(void)
return 0;
}

-static void cpu_prepare_hyp_mode(int cpu)
+static void __init cpu_prepare_hyp_mode(int cpu)
{
struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
unsigned long tcr;
@@ -1739,26 +1739,26 @@ static struct notifier_block hyp_init_cpu_pm_nb = {
.notifier_call = hyp_init_cpu_pm_notifier,
};

-static void hyp_cpu_pm_init(void)
+static void __init hyp_cpu_pm_init(void)
{
if (!is_protected_kvm_enabled())
cpu_pm_register_notifier(&hyp_init_cpu_pm_nb);
}
-static void hyp_cpu_pm_exit(void)
+static void __init hyp_cpu_pm_exit(void)
{
if (!is_protected_kvm_enabled())
cpu_pm_unregister_notifier(&hyp_init_cpu_pm_nb);
}
#else
-static inline void hyp_cpu_pm_init(void)
+static inline void __init hyp_cpu_pm_init(void)
{
}
-static inline void hyp_cpu_pm_exit(void)
+static inline void __init hyp_cpu_pm_exit(void)
{
}
#endif

-static void init_cpu_logical_map(void)
+static void __init init_cpu_logical_map(void)
{
unsigned int cpu;

@@ -1775,7 +1775,7 @@ static void init_cpu_logical_map(void)
#define init_psci_0_1_impl_state(config, what) \
config.psci_0_1_ ## what ## _implemented = psci_ops.what

-static bool init_psci_relay(void)
+static bool __init init_psci_relay(void)
{
/*
* If PSCI has not been initialized, protected KVM cannot install
@@ -1798,7 +1798,7 @@ static bool init_psci_relay(void)
return true;
}

-static int init_subsystems(void)
+static int __init init_subsystems(void)
{
int err = 0;

@@ -1848,13 +1848,13 @@ static int init_subsystems(void)
return err;
}

-static void teardown_subsystems(void)
+static void __init teardown_subsystems(void)
{
kvm_unregister_perf_callbacks();
hyp_cpu_pm_exit();
}

-static void teardown_hyp_mode(void)
+static void __init teardown_hyp_mode(void)
{
int cpu;

@@ -1865,7 +1865,7 @@ static void teardown_hyp_mode(void)
}
}

-static int do_pkvm_init(u32 hyp_va_bits)
+static int __init do_pkvm_init(u32 hyp_va_bits)
{
void *per_cpu_base = kvm_ksym_ref(kvm_arm_hyp_percpu_base);
int ret;
@@ -1887,7 +1887,7 @@ static int do_pkvm_init(u32 hyp_va_bits)
return ret;
}

-static int kvm_hyp_init_protection(u32 hyp_va_bits)
+static int __init kvm_hyp_init_protection(u32 hyp_va_bits)
{
void *addr = phys_to_virt(hyp_mem_base);
int ret;
@@ -1917,7 +1917,7 @@ static int kvm_hyp_init_protection(u32 hyp_va_bits)
/**
* Inits Hyp-mode on all online CPUs
*/
-static int init_hyp_mode(void)
+static int __init init_hyp_mode(void)
{
u32 hyp_va_bits;
int cpu;
@@ -2099,7 +2099,7 @@ static int init_hyp_mode(void)
return err;
}

-static void _kvm_host_prot_finalize(void *arg)
+static void __init _kvm_host_prot_finalize(void *arg)
{
int *err = arg;

@@ -2107,7 +2107,7 @@ static void _kvm_host_prot_finalize(void *arg)
WRITE_ONCE(*err, -EINVAL);
}

-static int pkvm_drop_host_privileges(void)
+static int __init pkvm_drop_host_privileges(void)
{
int ret = 0;

@@ -2120,7 +2120,7 @@ static int pkvm_drop_host_privileges(void)
return ret;
}

-static int finalize_hyp_mode(void)
+static int __init finalize_hyp_mode(void)
{
if (!is_protected_kvm_enabled())
return 0;
@@ -2190,7 +2190,7 @@ void kvm_arch_irq_bypass_start(struct irq_bypass_consumer *cons)
/**
* Initialize Hyp-mode and memory mappings on all CPUs.
*/
-int kvm_arm_init(void)
+static __init int kvm_arm_init(void)
{
int err;
bool in_hyp_mode;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 60ee3d9f01f8..4633664adb11 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -25,11 +25,11 @@
static struct kvm_pgtable *hyp_pgtable;
static DEFINE_MUTEX(kvm_hyp_pgd_mutex);

-static unsigned long hyp_idmap_start;
-static unsigned long hyp_idmap_end;
-static phys_addr_t hyp_idmap_vector;
+static unsigned long __ro_after_init hyp_idmap_start;
+static unsigned long __ro_after_init hyp_idmap_end;
+static phys_addr_t __ro_after_init hyp_idmap_vector;

-static unsigned long io_map_base;
+static unsigned long __ro_after_init io_map_base;

static phys_addr_t stage2_range_addr_end(phys_addr_t addr, phys_addr_t end)
{
@@ -261,7 +261,7 @@ static void stage2_flush_vm(struct kvm *kvm)
/**
* free_hyp_pgds - free Hyp-mode page tables
*/
-void free_hyp_pgds(void)
+void __init free_hyp_pgds(void)
{
mutex_lock(&kvm_hyp_pgd_mutex);
if (hyp_pgtable) {
@@ -1615,7 +1615,7 @@ static struct kvm_pgtable_mm_ops kvm_hyp_mm_ops = {
.virt_to_phys = kvm_host_pa,
};

-int kvm_mmu_init(u32 *hyp_va_bits)
+int __init kvm_mmu_init(u32 *hyp_va_bits)
{
int err;

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 5ae18472205a..dd58a8629a2e 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -30,7 +30,7 @@
#include <asm/virt.h>

/* Maximum phys_shift supported for any VM on this host */
-static u32 kvm_ipa_limit;
+static u32 __ro_after_init kvm_ipa_limit;

/*
* ARMv8 Reset Values
@@ -41,9 +41,9 @@ static u32 kvm_ipa_limit;
#define VCPU_RESET_PSTATE_SVC (PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
PSR_AA32_I_BIT | PSR_AA32_F_BIT)

-unsigned int kvm_sve_max_vl;
+unsigned int __ro_after_init kvm_sve_max_vl;

-int kvm_arm_init_sve(void)
+int __init kvm_arm_init_sve(void)
{
if (system_supports_sve()) {
kvm_sve_max_vl = sve_max_virtualisable_vl();
@@ -352,7 +352,7 @@ u32 get_kvm_ipa_limit(void)
return kvm_ipa_limit;
}

-int kvm_set_ipa_limit(void)
+int __init kvm_set_ipa_limit(void)
{
unsigned int parange;
u64 mmfr0;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f4a7c5abcbca..0359f57c2c44 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -82,7 +82,7 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
}

/* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
-static u32 cache_levels;
+static u32 __ro_after_init cache_levels;

/* CSSELR values; used to index KVM_REG_ARM_DEMUX_ID_CCSIDR */
#define CSSELR_MAX 14
@@ -2620,7 +2620,7 @@ static void get_ctr_el0(struct kvm_vcpu *v, const struct sys_reg_desc *r)
}

/* ->val is filled in by kvm_sys_reg_table_init() */
-static struct sys_reg_desc invariant_sys_regs[] = {
+static struct sys_reg_desc invariant_sys_regs[] __ro_after_init = {
{ SYS_DESC(SYS_MIDR_EL1), NULL, get_midr_el1 },
{ SYS_DESC(SYS_REVIDR_EL1), NULL, get_revidr_el1 },
{ SYS_DESC(SYS_CLIDR_EL1), NULL, get_clidr_el1 },
@@ -2944,7 +2944,7 @@ int kvm_arm_copy_sys_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
return write_demux_regids(uindices);
}

-int kvm_sys_reg_table_init(void)
+int __init kvm_sys_reg_table_init(void)
{
bool valid = true;
unsigned int i;
diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
index d78ae63d7c15..08978d0672e7 100644
--- a/arch/arm64/kvm/vmid.c
+++ b/arch/arm64/kvm/vmid.c
@@ -16,7 +16,7 @@
#include <asm/kvm_asm.h>
#include <asm/kvm_mmu.h>

-unsigned int kvm_arm_vmid_bits;
+unsigned int __ro_after_init kvm_arm_vmid_bits;
static DEFINE_RAW_SPINLOCK(cpu_vmid_lock);

static atomic64_t vmid_generation;
@@ -172,7 +172,7 @@ void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
/*
* Initialize the VMID allocator
*/
-int kvm_arm_vmid_alloc_init(void)
+int __init kvm_arm_vmid_alloc_init(void)
{
kvm_arm_vmid_bits = kvm_get_vmid_bits();

@@ -190,7 +190,7 @@ int kvm_arm_vmid_alloc_init(void)
return 0;
}

-void kvm_arm_vmid_alloc_free(void)
+void __init kvm_arm_vmid_alloc_free(void)
{
kfree(vmid_map);
}
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 1638418f72dd..71916de7c6c4 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -60,7 +60,7 @@ struct arch_timer_cpu {
bool enabled;
};

-int kvm_timer_hyp_init(bool);
+int __init kvm_timer_hyp_init(bool has_gic);
int kvm_timer_enable(struct kvm_vcpu *vcpu);
int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu);
void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
--
2.38.1.431.g37b22c650d-goog


2022-11-02 23:44:30

by Sean Christopherson

Subject: [PATCH 28/44] KVM: VMX: Make VMCS configuration/capabilities structs read-only after init

Tag vmcs_config and vmx_capability structs as __ro_after_init. The
canonical configuration is generated during hardware_setup() and must
never be modified after that point.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/capabilities.h | 4 ++--
arch/x86/kvm/vmx/vmx.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 87c4e46daf37..2aaa0fd53d08 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -68,13 +68,13 @@ struct vmcs_config {
u64 misc;
struct nested_vmx_msrs nested;
};
-extern struct vmcs_config vmcs_config;
+extern struct vmcs_config vmcs_config __ro_after_init;

struct vmx_capability {
u32 ept;
u32 vpid;
};
-extern struct vmx_capability vmx_capability;
+extern struct vmx_capability vmx_capability __ro_after_init;

static inline bool cpu_has_vmx_basic_inout(void)
{
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6adb60485839..81690fce0eb1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -489,8 +489,8 @@ static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
static DEFINE_SPINLOCK(vmx_vpid_lock);

-struct vmcs_config vmcs_config;
-struct vmx_capability vmx_capability;
+struct vmcs_config vmcs_config __ro_after_init;
+struct vmx_capability vmx_capability __ro_after_init;

#define VMX_SEGMENT_FIELD(seg) \
[VCPU_SREG_##seg] = { \
--
2.38.1.431.g37b22c650d-goog


2022-11-02 23:45:16

by Sean Christopherson

Subject: [PATCH 22/44] KVM: RISC-V: Do arch init directly in riscv_kvm_init()

Fold the guts of kvm_arch_init() into riscv_kvm_init() instead of
bouncing through kvm_init()=>kvm_arch_init(). Functionally, this is a
glorified nop as invoking kvm_arch_init() is the very first action
performed by kvm_init().

Moving setup to riscv_kvm_init(), which is tagged __init, will allow
tagging more functions and data with __init and __ro_after_init. And
emptying kvm_arch_init() will allow dropping the hook entirely once all
architecture implementations are nops.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/riscv/kvm/main.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index a146fa0ce4d2..cb063b8a9a0f 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -66,6 +66,15 @@ void kvm_arch_hardware_disable(void)
}

int kvm_arch_init(void *opaque)
+{
+ return 0;
+}
+
+void kvm_arch_exit(void)
+{
+}
+
+static int __init riscv_kvm_init(void)
{
const char *str;

@@ -110,15 +119,6 @@ int kvm_arch_init(void *opaque)

kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits());

- return 0;
-}
-
-void kvm_arch_exit(void)
-{
-}
-
-static int __init riscv_kvm_init(void)
-{
return kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
}
module_init(riscv_kvm_init);
--
2.38.1.431.g37b22c650d-goog
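
The "glorified nop" claim in the changelog above rests on the arch hook being
the first thing common init does, so hoisting its body into the module init
function leaves the order of operations unchanged. A rough userspace model,
with all model_* names hypothetical and the setup steps reduced to trace
entries:

```c
#include <string.h>

/* Records the order in which the modelled setup steps run. */
static char model_trace[64];

static void model_log(const char *step)
{
	strncat(model_trace, step, sizeof(model_trace) - strlen(model_trace) - 1);
}

static int model_arch_setup(void)
{
	model_log("arch;");
	return 0;
}

static int model_common_setup(void)
{
	model_log("common;");
	return 0;
}

/* Before: common init invokes the arch hook as its very first action. */
static int model_kvm_init_old(void)
{
	int r = model_arch_setup();

	if (r)
		return r;
	return model_common_setup();
}

static int model_module_init_old(void)
{
	return model_kvm_init_old();
}

/* After: module init does the arch setup itself, then calls common init. */
static int model_module_init_new(void)
{
	int r = model_arch_setup();

	if (r)
		return r;
	return model_common_setup();
}
```

Both entry points produce the same trace, which is the sense in which the
patch intends no functional change.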


2022-11-02 23:45:41

by Sean Christopherson

Subject: [PATCH 34/44] KVM: VMX: Shuffle support checks and hardware enabling code around

Reorder code in vmx.c so that the VMX support check helpers reside above
the hardware enabling helpers, which will allow KVM to perform support
checks during hardware enabling (in a future patch).

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 212 ++++++++++++++++++++---------------------
1 file changed, 106 insertions(+), 106 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2a7e62d0707d..07d86535c032 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2485,77 +2485,6 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
}
}

-static int kvm_cpu_vmxon(u64 vmxon_pointer)
-{
- u64 msr;
-
- cr4_set_bits(X86_CR4_VMXE);
-
- asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t"
- _ASM_EXTABLE(1b, %l[fault])
- : : [vmxon_pointer] "m"(vmxon_pointer)
- : : fault);
- return 0;
-
-fault:
- WARN_ONCE(1, "VMXON faulted, MSR_IA32_FEAT_CTL (0x3a) = 0x%llx\n",
- rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr) ? 0xdeadbeef : msr);
- cr4_clear_bits(X86_CR4_VMXE);
-
- return -EFAULT;
-}
-
-static int vmx_hardware_enable(void)
-{
- int cpu = raw_smp_processor_id();
- u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
- int r;
-
- if (cr4_read_shadow() & X86_CR4_VMXE)
- return -EBUSY;
-
- /*
- * This can happen if we hot-added a CPU but failed to allocate
- * VP assist page for it.
- */
- if (static_branch_unlikely(&enable_evmcs) &&
- !hv_get_vp_assist_page(cpu))
- return -EFAULT;
-
- intel_pt_handle_vmx(1);
-
- r = kvm_cpu_vmxon(phys_addr);
- if (r) {
- intel_pt_handle_vmx(0);
- return r;
- }
-
- if (enable_ept)
- ept_sync_global();
-
- return 0;
-}
-
-static void vmclear_local_loaded_vmcss(void)
-{
- int cpu = raw_smp_processor_id();
- struct loaded_vmcs *v, *n;
-
- list_for_each_entry_safe(v, n, &per_cpu(loaded_vmcss_on_cpu, cpu),
- loaded_vmcss_on_cpu_link)
- __loaded_vmcs_clear(v);
-}
-
-static void vmx_hardware_disable(void)
-{
- vmclear_local_loaded_vmcss();
-
- if (cpu_vmxoff())
- kvm_spurious_fault();
-
- intel_pt_handle_vmx(0);
-}
-
/*
* There is no X86_FEATURE for SGX yet, but anyway we need to query CPUID
* directly instead of going through cpu_has(), to ensure KVM is trapping
@@ -2781,6 +2710,112 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
return 0;
}

+static bool __init kvm_is_vmx_supported(void)
+{
+ if (!cpu_has_vmx()) {
+ pr_err("CPU doesn't support VMX\n");
+ return false;
+ }
+
+ if (!boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
+ !boot_cpu_has(X86_FEATURE_VMX)) {
+ pr_err("VMX not enabled in MSR_IA32_FEAT_CTL\n");
+ return false;
+ }
+
+ return true;
+}
+
+static int __init vmx_check_processor_compat(void)
+{
+ struct vmcs_config vmcs_conf;
+ struct vmx_capability vmx_cap;
+
+ if (!kvm_is_vmx_supported())
+ return -EIO;
+
+ if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
+ return -EIO;
+ if (nested)
+ nested_vmx_setup_ctls_msrs(&vmcs_conf, vmx_cap.ept);
+ if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
+ pr_err("CPU %d feature inconsistency!\n", smp_processor_id());
+ return -EIO;
+ }
+ return 0;
+}
+
+static int kvm_cpu_vmxon(u64 vmxon_pointer)
+{
+ u64 msr;
+
+ cr4_set_bits(X86_CR4_VMXE);
+
+ asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t"
+ _ASM_EXTABLE(1b, %l[fault])
+ : : [vmxon_pointer] "m"(vmxon_pointer)
+ : : fault);
+ return 0;
+
+fault:
+ WARN_ONCE(1, "VMXON faulted, MSR_IA32_FEAT_CTL (0x3a) = 0x%llx\n",
+ rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr) ? 0xdeadbeef : msr);
+ cr4_clear_bits(X86_CR4_VMXE);
+
+ return -EFAULT;
+}
+
+static int vmx_hardware_enable(void)
+{
+ int cpu = raw_smp_processor_id();
+ u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
+ int r;
+
+ if (cr4_read_shadow() & X86_CR4_VMXE)
+ return -EBUSY;
+
+ /*
+ * This can happen if we hot-added a CPU but failed to allocate
+ * VP assist page for it.
+ */
+ if (static_branch_unlikely(&enable_evmcs) &&
+ !hv_get_vp_assist_page(cpu))
+ return -EFAULT;
+
+ intel_pt_handle_vmx(1);
+
+ r = kvm_cpu_vmxon(phys_addr);
+ if (r) {
+ intel_pt_handle_vmx(0);
+ return r;
+ }
+
+ if (enable_ept)
+ ept_sync_global();
+
+ return 0;
+}
+
+static void vmclear_local_loaded_vmcss(void)
+{
+ int cpu = raw_smp_processor_id();
+ struct loaded_vmcs *v, *n;
+
+ list_for_each_entry_safe(v, n, &per_cpu(loaded_vmcss_on_cpu, cpu),
+ loaded_vmcss_on_cpu_link)
+ __loaded_vmcs_clear(v);
+}
+
+static void vmx_hardware_disable(void)
+{
+ vmclear_local_loaded_vmcss();
+
+ if (cpu_vmxoff())
+ kvm_spurious_fault();
+
+ intel_pt_handle_vmx(0);
+}
+
struct vmcs *alloc_vmcs_cpu(bool shadow, int cpu, gfp_t flags)
{
int node = cpu_to_node(cpu);
@@ -7466,41 +7501,6 @@ static int vmx_vm_init(struct kvm *kvm)
return 0;
}

-static bool __init kvm_is_vmx_supported(void)
-{
- if (!cpu_has_vmx()) {
- pr_err("CPU doesn't support VMX\n");
- return false;
- }
-
- if (!boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
- !boot_cpu_has(X86_FEATURE_VMX)) {
- pr_err("VMX not enabled in MSR_IA32_FEAT_CTL\n");
- return false;
- }
-
- return true;
-}
-
-static int __init vmx_check_processor_compat(void)
-{
- struct vmcs_config vmcs_conf;
- struct vmx_capability vmx_cap;
-
- if (!kvm_is_vmx_supported())
- return -EIO;
-
- if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
- return -EIO;
- if (nested)
- nested_vmx_setup_ctls_msrs(&vmcs_conf, vmx_cap.ept);
- if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
- pr_err("CPU %d feature inconsistency!\n", smp_processor_id());
- return -EIO;
- }
- return 0;
-}
-
static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
{
u8 cache;
--
2.38.1.431.g37b22c650d-goog


2022-11-02 23:45:46

by Sean Christopherson

Subject: [PATCH 19/44] KVM: MIPS: Hardcode callbacks to hardware virtualization extensions

Now that KVM no longer supports trap-and-emulate (see commit 45c7e8af4a5e
"MIPS: Remove KVM_TE support"), hardcode the MIPS callbacks to the
virtualization callbacks.

Hardcoding the callbacks eliminates the technically-unnecessary check on
non-NULL kvm_mips_callbacks in kvm_arch_init(). MIPS has never supported
multiple in-tree modules, i.e. barring an out-of-tree module, where
copying and renaming kvm.ko counts as "out-of-tree", KVM could never
encounter a non-NULL set of callbacks during module init.

The callback check is also subtly broken, as it is not thread safe,
i.e. if there were multiple modules, loading both concurrently would
create a race between checking and setting kvm_mips_callbacks.

Given that out-of-tree shenanigans are not the kernel's responsibility,
hardcode the callbacks to simplify the code.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/mips/include/asm/kvm_host.h | 2 +-
arch/mips/kvm/Makefile | 2 +-
arch/mips/kvm/callback.c | 14 --------------
arch/mips/kvm/mips.c | 9 ++-------
arch/mips/kvm/vz.c | 7 ++++---
5 files changed, 8 insertions(+), 26 deletions(-)
delete mode 100644 arch/mips/kvm/callback.c

diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 28f0ba97db71..2803c9c21ef9 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -758,7 +758,7 @@ struct kvm_mips_callbacks {
void (*vcpu_reenter)(struct kvm_vcpu *vcpu);
};
extern struct kvm_mips_callbacks *kvm_mips_callbacks;
-int kvm_mips_emulation_init(struct kvm_mips_callbacks **install_callbacks);
+int kvm_mips_emulation_init(void);

/* Debug: dump vcpu state */
int kvm_arch_vcpu_dump_regs(struct kvm_vcpu *vcpu);
diff --git a/arch/mips/kvm/Makefile b/arch/mips/kvm/Makefile
index 21ff75bcdbc4..805aeea2166e 100644
--- a/arch/mips/kvm/Makefile
+++ b/arch/mips/kvm/Makefile
@@ -17,4 +17,4 @@ kvm-$(CONFIG_CPU_LOONGSON64) += loongson_ipi.o

kvm-y += vz.o
obj-$(CONFIG_KVM) += kvm.o
-obj-y += callback.o tlb.o
+obj-y += tlb.o
diff --git a/arch/mips/kvm/callback.c b/arch/mips/kvm/callback.c
deleted file mode 100644
index d88aa2173fb0..000000000000
--- a/arch/mips/kvm/callback.c
+++ /dev/null
@@ -1,14 +0,0 @@
-/*
- * This file is subject to the terms and conditions of the GNU General Public
- * License. See the file "COPYING" in the main directory of this archive
- * for more details.
- *
- * Copyright (C) 2012 MIPS Technologies, Inc. All rights reserved.
- * Authors: Yann Le Du <[email protected]>
- */
-
-#include <linux/export.h>
-#include <linux/kvm_host.h>
-
-struct kvm_mips_callbacks *kvm_mips_callbacks;
-EXPORT_SYMBOL_GPL(kvm_mips_callbacks);
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index af29490d9740..f0a6c245d1ff 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -1012,17 +1012,12 @@ long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)

int kvm_arch_init(void *opaque)
{
- if (kvm_mips_callbacks) {
- kvm_err("kvm: module already exists\n");
- return -EEXIST;
- }
-
- return kvm_mips_emulation_init(&kvm_mips_callbacks);
+ return kvm_mips_emulation_init();
}

void kvm_arch_exit(void)
{
- kvm_mips_callbacks = NULL;
+
}

int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
diff --git a/arch/mips/kvm/vz.c b/arch/mips/kvm/vz.c
index c706f5890a05..dafab003ea0d 100644
--- a/arch/mips/kvm/vz.c
+++ b/arch/mips/kvm/vz.c
@@ -3304,7 +3304,10 @@ static struct kvm_mips_callbacks kvm_vz_callbacks = {
.vcpu_reenter = kvm_vz_vcpu_reenter,
};

-int kvm_mips_emulation_init(struct kvm_mips_callbacks **install_callbacks)
+/* FIXME: Get rid of the callbacks now that trap-and-emulate is gone. */
+struct kvm_mips_callbacks *kvm_mips_callbacks = &kvm_vz_callbacks;
+
+int kvm_mips_emulation_init(void)
{
if (!cpu_has_vz)
return -ENODEV;
@@ -3318,7 +3321,5 @@ int kvm_mips_emulation_init(struct kvm_mips_callbacks **install_callbacks)
return -ENODEV;

pr_info("Starting KVM with MIPS VZ extensions\n");
-
- *install_callbacks = &kvm_vz_callbacks;
return 0;
}
--
2.38.1.431.g37b22c650d-goog
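
Reader's note: the difference between installing the callbacks at module
init and binding them at build time can be sketched in plain userspace C.
This is only an illustration under stated assumptions, not the MIPS code:
every demo_* name is hypothetical, and the struct is a one-member stand-in
for struct kvm_mips_callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-member stand-in for struct kvm_mips_callbacks. */
struct demo_callbacks {
    int (*hardware_enable)(void);
};

static int demo_vz_hardware_enable(void)
{
    return 0;
}

static struct demo_callbacks demo_vz_callbacks = {
    .hardware_enable = demo_vz_hardware_enable,
};

/*
 * Old pattern: the pointer starts out NULL and is installed at init time.
 * The check and the store are two separate steps with no locking, so two
 * concurrent callers could both observe NULL and both "win".
 */
static struct demo_callbacks *demo_installed;

int demo_install_checked(struct demo_callbacks *cb)
{
    assert(cb != NULL);

    if (demo_installed)
        return -1;      /* "module already exists" */

    demo_installed = cb;
    return 0;
}

/*
 * New pattern: the pointer is bound by a static initializer, so there is
 * nothing to check and nothing to race on at init time.
 */
struct demo_callbacks *demo_callbacks = &demo_vz_callbacks;
```

With the static initializer, demo_callbacks is valid before any init code
runs, which is why the runtime check becomes unnecessary in this model.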


2022-11-02 23:45:59

by Sean Christopherson

Subject: [PATCH 20/44] KVM: MIPS: Setup VZ emulation? directly from kvm_mips_init()

Invoke kvm_mips_emulation_init() directly from kvm_mips_init() instead
of bouncing through kvm_init()=>kvm_arch_init(). Functionally, this is
a glorified nop as invoking kvm_arch_init() is the very first action
performed by kvm_init().

Emptying kvm_arch_init() will allow dropping the hook entirely once all
architecture implementations are nops.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/mips/kvm/mips.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index f0a6c245d1ff..75681281e2df 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -1012,7 +1012,7 @@ long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)

int kvm_arch_init(void *opaque)
{
- return kvm_mips_emulation_init();
+ return 0;
}

void kvm_arch_exit(void)
@@ -1636,6 +1636,10 @@ static int __init kvm_mips_init(void)
if (ret)
return ret;

+ ret = kvm_mips_emulation_init();
+ if (ret)
+ return ret;
+
ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);

if (ret)
--
2.38.1.431.g37b22c650d-goog


2022-11-02 23:46:10

by Sean Christopherson

Subject: [PATCH 38/44] KVM: Disable CPU hotplug during hardware enabling

From: Chao Gao <[email protected]>

Disable CPU hotplug during hardware_enable_all() to prevent a corner
case in which, if the following sequence occurs:

1. A hotplugged CPU marks itself online in cpu_online_mask
2. The hotplugged CPU enables interrupts before invoking KVM's ONLINE
callback
3. hardware_enable_all() is invoked on another CPU

the hotplugged CPU will be included in on_each_cpu() and thus get sent
through hardware_enable_nolock() before kvm_online_cpu() is called.

start_secondary { ...
set_cpu_online(smp_processor_id(), true); <- 1
...
local_irq_enable(); <- 2
...
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE); <- 3
}

KVM currently fudges around this race by keeping track of which CPUs have
done hardware enabling (see commit 1b6c016818a5 "KVM: Keep track of which
cpus have virtualization enabled"), but that's an inefficient, convoluted,
and hacky solution.

Signed-off-by: Chao Gao <[email protected]>
[sean: split to separate patch, write changelog]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 8 +++++++-
virt/kvm/kvm_main.c | 10 ++++++++++
2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a7b1d916ecb2..a15e54ba0471 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9283,7 +9283,13 @@ static int kvm_x86_check_processor_compatibility(struct kvm_x86_init_ops *ops)
int cpu = smp_processor_id();
struct cpuinfo_x86 *c = &cpu_data(cpu);

- WARN_ON(!irqs_disabled());
+ /*
+ * Compatibility checks are done when loading KVM and when enabling
+ * hardware, e.g. during CPU hotplug, to ensure all online CPUs are
+ * compatible, i.e. KVM should never perform a compatibility check on
+ * an offline CPU.
+ */
+ WARN_ON(!irqs_disabled() && cpu_active(cpu));

if (__cr4_reserved_bits(cpu_has, c) !=
__cr4_reserved_bits(cpu_has, &boot_cpu_data))
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fd9e39c85549..4e765ef9f4bd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5088,6 +5088,15 @@ static int hardware_enable_all(void)
{
int r = 0;

+ /*
+ * When onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
+ * is called, and so on_each_cpu() between them includes the CPU that
+ * is being onlined. As a result, hardware_enable_nolock() may get
+ * invoked before kvm_online_cpu().
+ *
+ * Disable CPU hotplug to prevent scenarios where KVM sees
+ */
+ cpus_read_lock();
raw_spin_lock(&kvm_count_lock);

kvm_usage_count++;
@@ -5102,6 +5111,7 @@ static int hardware_enable_all(void)
}

raw_spin_unlock(&kvm_count_lock);
+ cpus_read_unlock();

return r;
}
--
2.38.1.431.g37b22c650d-goog


2022-11-02 23:46:32

by Sean Christopherson

Subject: [PATCH 12/44] KVM: VMX: Do _all_ initialization before exposing /dev/kvm to userspace

Call kvm_init() only after _all_ setup is complete, as kvm_init() exposes
/dev/kvm to userspace and thus allows userspace to create VMs (and call
other ioctls). E.g. KVM will encounter a NULL pointer when attempting to
add a vCPU to the per-CPU loaded_vmcss_on_cpu list if userspace is able to
create a VM before vmx_init() configures said list.

BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP
CPU: 6 PID: 1143 Comm: stable Not tainted 6.0.0-rc7+ #988
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:vmx_vcpu_load_vmcs+0x68/0x230 [kvm_intel]
<TASK>
vmx_vcpu_load+0x16/0x60 [kvm_intel]
kvm_arch_vcpu_load+0x32/0x1f0 [kvm]
vcpu_load+0x2f/0x40 [kvm]
kvm_arch_vcpu_create+0x231/0x310 [kvm]
kvm_vm_ioctl+0x79f/0xe10 [kvm]
? handle_mm_fault+0xb1/0x220
__x64_sys_ioctl+0x80/0xb0
do_syscall_64+0x2b/0x50
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f5a6b05743b
</TASK>
Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel(+) kvm irqbypass

Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 38 +++++++++++++++++++++++---------------
1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 118d9b29b339..6adb60485839 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8493,21 +8493,25 @@ static void vmx_cleanup_l1d_flush(void)
l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO;
}

-static void vmx_exit(void)
+static void __vmx_exit(void)
{
+ allow_smaller_maxphyaddr = false;
+
#ifdef CONFIG_KEXEC_CORE
RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL);
synchronize_rcu();
#endif
-
- kvm_exit();
- kvm_x86_vendor_exit();
-
- hv_cleanup_evmcs();
-
vmx_cleanup_l1d_flush();
+}

- allow_smaller_maxphyaddr = false;
+static void vmx_exit(void)
+{
+ kvm_exit();
+ kvm_x86_vendor_exit();
+
+ __vmx_exit();
+
+ hv_cleanup_evmcs();
}
module_exit(vmx_exit);

@@ -8521,11 +8525,6 @@ static int __init vmx_init(void)
if (r)
goto err_x86_init;

- r = kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx),
- __alignof__(struct vcpu_vmx), THIS_MODULE);
- if (r)
- goto err_kvm_init;
-
/*
* Must be called after common x86 init so enable_ept is properly set
* up. Hand the parameter mitigation value in which was stored in
@@ -8559,11 +8558,20 @@ static int __init vmx_init(void)
if (!enable_ept)
allow_smaller_maxphyaddr = true;

+ /*
+ * Common KVM initialization _must_ come last, after this, /dev/kvm is
+ * exposed to userspace!
+ */
+ r = kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx),
+ __alignof__(struct vcpu_vmx), THIS_MODULE);
+ if (r)
+ goto err_kvm_init;
+
return 0;

-err_l1d_flush:
- vmx_exit();
err_kvm_init:
+ __vmx_exit();
+err_l1d_flush:
kvm_x86_vendor_exit();
err_x86_init:
hv_cleanup_evmcs();
--
2.38.1.431.g37b22c650d-goog
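
Reader's note: the ordering rule in this patch (do every fallible setup
step first, expose the device last, unwind in reverse order on failure)
can be modeled in userspace C. This is a rough sketch, not vmx_init()
itself. All demo_* names are hypothetical, and each step only appends a
letter to a trace: 'V' for vendor init, 'L' for the L1D flush setup, 'E'
for exposing the device, and the lowercase letters for the matching
teardown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Which step, if any, the caller wants to fail (hypothetical knob). */
enum demo_step { DEMO_FAIL_NONE, DEMO_FAIL_L1D, DEMO_FAIL_EXPOSE };

static char demo_log[16];

/* Append one letter to the trace of executed steps. */
static void demo_note(char c)
{
    size_t n = strlen(demo_log);

    assert(n + 1 < sizeof(demo_log));
    demo_log[n] = c;
    demo_log[n + 1] = '\0';
}

const char *demo_trace(void)
{
    return demo_log;
}

int demo_init(enum demo_step fail)
{
    int r = 0;

    demo_log[0] = '\0';

    demo_note('V');             /* vendor init */

    if (fail == DEMO_FAIL_L1D) {
        r = -1;
        goto err_l1d_flush;
    }
    demo_note('L');             /* L1D flush setup */

    /* Exposing the device comes last: userspace can call in after this. */
    if (fail == DEMO_FAIL_EXPOSE) {
        r = -1;
        goto err_expose;
    }
    demo_note('E');
    return 0;

err_expose:
    demo_note('l');             /* undo the local setup */
err_l1d_flush:
    demo_note('v');             /* undo vendor init */
    return r;
}
```

The labels fall through in reverse order of setup, so a failure at the
final step undoes everything that ran before it, while a failure in the
middle undoes only the earlier steps.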


2022-11-02 23:46:58

by Sean Christopherson

Subject: [PATCH 31/44] KVM: x86: Use KBUILD_MODNAME to specify vendor module name

Use KBUILD_MODNAME to specify the vendor module name instead of manually
writing out the name to make it a bit more obvious that the name isn't
completely arbitrary. A future patch will also use KBUILD_MODNAME to
define pr_fmt, at which point using KBUILD_MODNAME for kvm_x86_ops.name
further reinforces the intended usage of kvm_x86_ops.name.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 99c1ac2d9c84..13457aa68112 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4737,7 +4737,7 @@ static int svm_vm_init(struct kvm *kvm)
}

static struct kvm_x86_ops svm_x86_ops __initdata = {
- .name = "kvm_amd",
+ .name = KBUILD_MODNAME,

.hardware_unsetup = svm_hardware_unsetup,
.hardware_enable = svm_hardware_enable,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 25e28d368274..a563c9756e36 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8074,7 +8074,7 @@ static void vmx_vm_destroy(struct kvm *kvm)
}

static struct kvm_x86_ops vmx_x86_ops __initdata = {
- .name = "kvm_intel",
+ .name = KBUILD_MODNAME,

.hardware_unsetup = vmx_hardware_unsetup,

--
2.38.1.431.g37b22c650d-goog
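
Reader's note: a small userspace sketch of why a single KBUILD_MODNAME
string is convenient. In a kernel build the macro is supplied by kbuild;
the fallback definition, the demo_* names, and the message text below are
assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* kbuild normally defines this; the fallback is only for this sketch. */
#ifndef KBUILD_MODNAME
#define KBUILD_MODNAME "kvm_demo"
#endif

/* Adjacent string literals are concatenated at compile time. */
#define demo_pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* Hypothetical stand-in for the .name member of struct kvm_x86_ops. */
struct demo_x86_ops {
    const char *name;
};

const struct demo_x86_ops demo_ops = {
    .name = KBUILD_MODNAME,
};

/* Format a message with the module-name prefix into buf. */
int demo_format_error(char *buf, size_t len, int cpu)
{
    assert(buf != NULL);

    return snprintf(buf, len,
                    demo_pr_fmt("enabling virtualization on CPU%d failed\n"),
                    cpu);
}
```

Because the ops name and the log prefix come from the same macro, they
cannot drift apart in this model.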


2022-11-02 23:47:47

by Sean Christopherson

Subject: [PATCH 26/44] KVM: s390: Mark __kvm_s390_init() and its descendants as __init

Tag __kvm_s390_init() and its unique helpers as __init. These functions
are only ever called during module_init(), but could not be tagged
accordingly while they were invoked from the common kvm_arch_init(),
which is not __init because of x86.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/s390/kvm/interrupt.c | 2 +-
arch/s390/kvm/kvm-s390.c | 4 ++--
arch/s390/kvm/kvm-s390.h | 2 +-
arch/s390/kvm/pci.c | 2 +-
arch/s390/kvm/pci.h | 2 +-
5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index ab569faf0df2..bf9d55fbc21a 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -3416,7 +3416,7 @@ void kvm_s390_gib_destroy(void)
gib = NULL;
}

-int kvm_s390_gib_init(u8 nisc)
+int __init kvm_s390_gib_init(u8 nisc)
{
int rc = 0;

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index e1c9980aae78..f6ae845bc1c1 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -358,7 +358,7 @@ static __always_inline void __insn32_query(unsigned int opcode, u8 *query)
#define INSN_SORTL 0xb938
#define INSN_DFLTCC 0xb939

-static void kvm_s390_cpu_feat_init(void)
+static void __init kvm_s390_cpu_feat_init(void)
{
int i;

@@ -461,7 +461,7 @@ static void kvm_s390_cpu_feat_init(void)
*/
}

-static int __kvm_s390_init(void)
+static int __init __kvm_s390_init(void)
{
int rc = -ENOMEM;

diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index f6fd668f887e..e7f6166129eb 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -467,7 +467,7 @@ void kvm_s390_gisa_clear(struct kvm *kvm);
void kvm_s390_gisa_destroy(struct kvm *kvm);
void kvm_s390_gisa_disable(struct kvm *kvm);
void kvm_s390_gisa_enable(struct kvm *kvm);
-int kvm_s390_gib_init(u8 nisc);
+int __init kvm_s390_gib_init(u8 nisc);
void kvm_s390_gib_destroy(void);

/* implemented in guestdbg.c */
diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
index c50c1645c0ae..60548791c077 100644
--- a/arch/s390/kvm/pci.c
+++ b/arch/s390/kvm/pci.c
@@ -670,7 +670,7 @@ int kvm_s390_pci_zpci_op(struct kvm *kvm, struct kvm_s390_zpci_op *args)
return r;
}

-int kvm_s390_pci_init(void)
+int __init kvm_s390_pci_init(void)
{
zpci_kvm_hook.kvm_register = kvm_s390_pci_register_kvm;
zpci_kvm_hook.kvm_unregister = kvm_s390_pci_unregister_kvm;
diff --git a/arch/s390/kvm/pci.h b/arch/s390/kvm/pci.h
index 486d06ef563f..ff0972dd5e71 100644
--- a/arch/s390/kvm/pci.h
+++ b/arch/s390/kvm/pci.h
@@ -60,7 +60,7 @@ void kvm_s390_pci_clear_list(struct kvm *kvm);

int kvm_s390_pci_zpci_op(struct kvm *kvm, struct kvm_s390_zpci_op *args);

-int kvm_s390_pci_init(void);
+int __init kvm_s390_pci_init(void);
void kvm_s390_pci_exit(void);

static inline bool kvm_s390_pci_interp_allowed(void)
--
2.38.1.431.g37b22c650d-goog


2022-11-03 00:12:46

by Sean Christopherson

Subject: [PATCH 08/44] KVM: x86: Move hardware setup/unsetup to init/exit

Now that kvm_arch_hardware_setup() is called immediately after
kvm_arch_init(), fold the guts of kvm_arch_hardware_(un)setup() into
kvm_arch_{init,exit}() as a step towards dropping one of the hooks.

To avoid having to unwind various setup, e.g. registration of several
notifiers, slot in the vendor hardware setup before the registration of
said notifiers and callbacks. Introducing a functional change while
moving code is less than ideal, but the alternative is adding a pile of
unwinding code, which is much more error prone, e.g. several attempts to
move the setup code verbatim all introduced bugs.

Add a comment to document that kvm_ops_update() is effectively the point
of no return, e.g. it sets the kvm_x86_ops.hardware_enable canary and so
needs to be unwound.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 121 +++++++++++++++++++++++----------------------
1 file changed, 63 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9a7702b1c563..80ee580a9cd4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9252,6 +9252,24 @@ static struct notifier_block pvclock_gtod_notifier = {
};
#endif

+static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
+{
+ memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
+
+#define __KVM_X86_OP(func) \
+ static_call_update(kvm_x86_##func, kvm_x86_ops.func);
+#define KVM_X86_OP(func) \
+ WARN_ON(!kvm_x86_ops.func); __KVM_X86_OP(func)
+#define KVM_X86_OP_OPTIONAL __KVM_X86_OP
+#define KVM_X86_OP_OPTIONAL_RET0(func) \
+ static_call_update(kvm_x86_##func, (void *)kvm_x86_ops.func ? : \
+ (void *)__static_call_return0);
+#include <asm/kvm-x86-ops.h>
+#undef __KVM_X86_OP
+
+ kvm_pmu_ops_update(ops->pmu_ops);
+}
+
int kvm_arch_init(void *opaque)
{
struct kvm_x86_init_ops *ops = opaque;
@@ -9325,6 +9343,24 @@ int kvm_arch_init(void *opaque)
kvm_caps.supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
}

+ rdmsrl_safe(MSR_EFER, &host_efer);
+
+ if (boot_cpu_has(X86_FEATURE_XSAVES))
+ rdmsrl(MSR_IA32_XSS, host_xss);
+
+ kvm_init_pmu_capability();
+
+ r = ops->hardware_setup();
+ if (r != 0)
+ goto out_mmu_exit;
+
+ /*
+ * Point of no return! DO NOT add error paths below this point unless
+ * absolutely necessary, as most operations from this point forward
+ * require unwinding.
+ */
+ kvm_ops_update(ops);
+
kvm_timer_init();

if (pi_inject_timer == -1)
@@ -9336,8 +9372,32 @@ int kvm_arch_init(void *opaque)
set_hv_tscchange_cb(kvm_hyperv_tsc_notifier);
#endif

+ kvm_register_perf_callbacks(ops->handle_intel_pt_intr);
+
+ if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
+ kvm_caps.supported_xss = 0;
+
+#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
+ cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
+#undef __kvm_cpu_cap_has
+
+ if (kvm_caps.has_tsc_control) {
+ /*
+ * Make sure the user can only configure tsc_khz values that
+ * fit into a signed integer.
+ * A min value is not calculated because it will always
+ * be 1 on all machines.
+ */
+ u64 max = min(0x7fffffffULL,
+ __scale_tsc(kvm_caps.max_tsc_scaling_ratio, tsc_khz));
+ kvm_caps.max_guest_tsc_khz = max;
+ }
+ kvm_caps.default_tsc_scaling_ratio = 1ULL << kvm_caps.tsc_scaling_ratio_frac_bits;
+ kvm_init_msr_list();
return 0;

+out_mmu_exit:
+ kvm_mmu_vendor_module_exit();
out_free_percpu:
free_percpu(user_return_msrs);
out_free_x86_emulator_cache:
@@ -9347,6 +9407,8 @@ int kvm_arch_init(void *opaque)

void kvm_arch_exit(void)
{
+ kvm_unregister_perf_callbacks();
+
#ifdef CONFIG_X86_64
if (hypervisor_is_type(X86_HYPER_MS_HYPERV))
clear_hv_tscchange_cb();
@@ -9362,6 +9424,7 @@ void kvm_arch_exit(void)
irq_work_sync(&pvclock_irq_work);
cancel_work_sync(&pvclock_gtod_work);
#endif
+ static_call(kvm_x86_hardware_unsetup)();
kvm_x86_ops.hardware_enable = NULL;
kvm_mmu_vendor_module_exit();
free_percpu(user_return_msrs);
@@ -11922,72 +11985,14 @@ void kvm_arch_hardware_disable(void)
drop_user_return_notifiers();
}

-static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
-{
- memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
-
-#define __KVM_X86_OP(func) \
- static_call_update(kvm_x86_##func, kvm_x86_ops.func);
-#define KVM_X86_OP(func) \
- WARN_ON(!kvm_x86_ops.func); __KVM_X86_OP(func)
-#define KVM_X86_OP_OPTIONAL __KVM_X86_OP
-#define KVM_X86_OP_OPTIONAL_RET0(func) \
- static_call_update(kvm_x86_##func, (void *)kvm_x86_ops.func ? : \
- (void *)__static_call_return0);
-#include <asm/kvm-x86-ops.h>
-#undef __KVM_X86_OP
-
- kvm_pmu_ops_update(ops->pmu_ops);
-}
-
int kvm_arch_hardware_setup(void *opaque)
{
- struct kvm_x86_init_ops *ops = opaque;
- int r;
-
- rdmsrl_safe(MSR_EFER, &host_efer);
-
- if (boot_cpu_has(X86_FEATURE_XSAVES))
- rdmsrl(MSR_IA32_XSS, host_xss);
-
- kvm_init_pmu_capability();
-
- r = ops->hardware_setup();
- if (r != 0)
- return r;
-
- kvm_ops_update(ops);
-
- kvm_register_perf_callbacks(ops->handle_intel_pt_intr);
-
- if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
- kvm_caps.supported_xss = 0;
-
-#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
- cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
-#undef __kvm_cpu_cap_has
-
- if (kvm_caps.has_tsc_control) {
- /*
- * Make sure the user can only configure tsc_khz values that
- * fit into a signed integer.
- * A min value is not calculated because it will always
- * be 1 on all machines.
- */
- u64 max = min(0x7fffffffULL,
- __scale_tsc(kvm_caps.max_tsc_scaling_ratio, tsc_khz));
- kvm_caps.max_guest_tsc_khz = max;
- }
- kvm_caps.default_tsc_scaling_ratio = 1ULL << kvm_caps.tsc_scaling_ratio_frac_bits;
- kvm_init_msr_list();
return 0;
}

void kvm_arch_hardware_unsetup(void)
{
- kvm_unregister_perf_callbacks();

- static_call(kvm_x86_hardware_unsetup)();
}

int kvm_arch_check_processor_compat(void *opaque)
--
2.38.1.431.g37b22c650d-goog
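
Reader's note: the "point of no return" idea can be sketched in userspace
C as follows. Fallible setup runs first, and the global ops table (whose
hardware_enable member doubles as the "a vendor module is loaded" canary)
is written only once nothing else can fail. This is a model under
assumptions, not the x86 code: all demo_* names are hypothetical, and the
-EEXIST check at the top is assumed for the sake of the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct demo_ops {
    int (*hardware_enable)(void);
};

struct demo_init_ops {
    int (*hardware_setup)(void);
    const struct demo_ops *runtime_ops;
};

/* Global table; a NULL hardware_enable means no vendor module is loaded. */
static struct demo_ops demo_active_ops;

static int demo_enable(void)     { return 0; }
static int demo_setup_ok(void)   { return 0; }
static int demo_setup_fail(void) { return -ENODEV; }

static const struct demo_ops demo_runtime_ops = {
    .hardware_enable = demo_enable,
};

const struct demo_init_ops demo_good_init_ops = {
    .hardware_setup = demo_setup_ok,
    .runtime_ops = &demo_runtime_ops,
};

const struct demo_init_ops demo_bad_init_ops = {
    .hardware_setup = demo_setup_fail,
    .runtime_ops = &demo_runtime_ops,
};

int demo_vendor_loaded(void)
{
    return demo_active_ops.hardware_enable != NULL;
}

int demo_arch_init(const struct demo_init_ops *ops)
{
    int r;

    assert(ops && ops->hardware_setup && ops->runtime_ops);

    if (demo_vendor_loaded())
        return -EEXIST;

    /* Fallible step: nothing global has been touched yet. */
    r = ops->hardware_setup();
    if (r)
        return r;

    /* Point of no return: from here on, failure would need unwinding. */
    memcpy(&demo_active_ops, ops->runtime_ops, sizeof(demo_active_ops));
    return 0;
}

void demo_arch_exit(void)
{
    /* Clear the canary so another vendor module could load. */
    demo_active_ops.hardware_enable = NULL;
}
```

A failed hardware_setup() leaves the canary untouched in this model, so
the error path before the commit point needs no extra cleanup.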


2022-11-03 00:14:07

by Sean Christopherson

Subject: [PATCH 42/44] KVM: Make hardware_enable_failed a local variable in the "enable all" path

From: Isaku Yamahata <[email protected]>

Rework detecting hardware enabling errors to use a local variable in the
"enable all" path to track whether or not enabling was successful across
all CPUs. Using a global variable complicates paths that enable hardware
only on the current CPU, e.g. kvm_resume() and kvm_online_cpu().

Opportunistically add a WARN if hardware enabling fails during
kvm_resume(), as KVM is all kinds of hosed if CPU0 fails to enable
hardware. The WARN is largely futile in the current code, as KVM BUG()s on
spurious faults on VMX instructions, e.g. attempting to run a vCPU on a
CPU where hardware enabling failed will explode.

------------[ cut here ]------------
kernel BUG at arch/x86/kvm/x86.c:508!
invalid opcode: 0000 [#1] SMP
CPU: 3 PID: 1009 Comm: CPU 4/KVM Not tainted 6.1.0-rc1+ #11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kvm_spurious_fault+0xa/0x10
Call Trace:
vmx_vcpu_load_vmcs+0x192/0x230 [kvm_intel]
vmx_vcpu_load+0x16/0x60 [kvm_intel]
kvm_arch_vcpu_load+0x32/0x1f0
vcpu_load+0x2f/0x40
kvm_arch_vcpu_ioctl_run+0x19/0x9d0
kvm_vcpu_ioctl+0x271/0x660
__x64_sys_ioctl+0x80/0xb0
do_syscall_64+0x2b/0x50
entry_SYSCALL_64_after_hwframe+0x46/0xb0

But, the WARN may provide a breadcrumb to understand what went awry, and
someday KVM may fix one or both of those bugs, e.g. by finding a way to
eat spurious faults no matter the context (easier said than done due to
side effects of certain operations, e.g. Intel's VMCLEAR).

Signed-off-by: Isaku Yamahata <[email protected]>
[sean: rebase, WARN on failure in kvm_resume()]
Signed-off-by: Sean Christopherson <[email protected]>
---
virt/kvm/kvm_main.c | 32 +++++++++++++++-----------------
1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 31949a89fe25..a18296ee731b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -104,7 +104,6 @@ LIST_HEAD(vm_list);

static DEFINE_PER_CPU(bool, hardware_enabled);
static int kvm_usage_count;
-static atomic_t hardware_enable_failed;

static struct kmem_cache *kvm_vcpu_cache;

@@ -5006,19 +5005,25 @@ static struct miscdevice kvm_dev = {
&kvm_chardev_ops,
};

-static void hardware_enable_nolock(void *junk)
+static int __hardware_enable_nolock(void)
{
if (__this_cpu_read(hardware_enabled))
- return;
+ return 0;

if (kvm_arch_hardware_enable()) {
- atomic_inc(&hardware_enable_failed);
pr_info("kvm: enabling virtualization on CPU%d failed\n",
raw_smp_processor_id());
- return;
+ return -EIO;
}

__this_cpu_write(hardware_enabled, true);
+ return 0;
+}
+
+static void hardware_enable_nolock(void *failed)
+{
+ if (__hardware_enable_nolock())
+ atomic_inc(failed);
}

static int kvm_online_cpu(unsigned int cpu)
@@ -5033,16 +5038,9 @@ static int kvm_online_cpu(unsigned int cpu)
* errors when scheduled to this CPU.
*/
if (kvm_usage_count) {
- WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
-
local_irq_save(flags);
- hardware_enable_nolock(NULL);
+ ret = __hardware_enable_nolock();
local_irq_restore(flags);
-
- if (atomic_read(&hardware_enable_failed)) {
- atomic_set(&hardware_enable_failed, 0);
- ret = -EIO;
- }
}
mutex_unlock(&kvm_lock);
return ret;
@@ -5094,6 +5092,7 @@ static void hardware_disable_all(void)

static int hardware_enable_all(void)
{
+ atomic_t failed = ATOMIC_INIT(0);
int r = 0;

/*
@@ -5109,10 +5108,9 @@ static int hardware_enable_all(void)

kvm_usage_count++;
if (kvm_usage_count == 1) {
- atomic_set(&hardware_enable_failed, 0);
- on_each_cpu(hardware_enable_nolock, NULL, 1);
+ on_each_cpu(hardware_enable_nolock, &failed, 1);

- if (atomic_read(&hardware_enable_failed)) {
+ if (atomic_read(&failed)) {
hardware_disable_all_nolock();
r = -EBUSY;
}
@@ -5744,7 +5742,7 @@ static void kvm_resume(void)
lockdep_assert_irqs_disabled();

if (kvm_usage_count)
- hardware_enable_nolock(NULL);
+ WARN_ON_ONCE(__hardware_enable_nolock());
}

static struct syscore_ops kvm_syscore_ops = {
--
2.38.1.431.g37b22c650d-goog
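
Reader's note: the shape of this rework, i.e. a core helper that returns
an error plus a thin callback wrapper that bumps a caller-owned counter,
can be sketched in userspace C with C11 atomics. This is a model under
assumptions, not the KVM code: all demo_* names are hypothetical, the
"CPUs" are array slots, and the stand-in for on_each_cpu() runs
sequentially and passes the CPU number explicitly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 4

static bool demo_enabled[DEMO_NR_CPUS];
static bool demo_broken[DEMO_NR_CPUS];  /* CPUs whose enabling will fail */

void demo_set_broken(int cpu, bool broken)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    demo_broken[cpu] = broken;
}

bool demo_is_enabled(int cpu)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    return demo_enabled[cpu];
}

/* Core helper: reports failure through its return value. */
static int demo_enable_one(int cpu)
{
    if (demo_enabled[cpu])
        return 0;
    if (demo_broken[cpu])
        return -1;

    demo_enabled[cpu] = true;
    return 0;
}

/* Callback-shaped wrapper: counts failures in the caller's counter. */
static void demo_enable_cb(int cpu, void *failed)
{
    if (demo_enable_one(cpu))
        atomic_fetch_add((atomic_int *)failed, 1);
}

/* Sequential stand-in for on_each_cpu(). */
static void demo_on_each_cpu(void (*fn)(int, void *), void *arg)
{
    for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        fn(cpu, arg);
}

void demo_disable_all(void)
{
    for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        demo_enabled[cpu] = false;
}

/* "Enable all" path: the failure counter lives on this function's stack. */
int demo_enable_all(void)
{
    atomic_int failed;

    atomic_init(&failed, 0);
    demo_on_each_cpu(demo_enable_cb, &failed);

    if (atomic_load(&failed)) {
        demo_disable_all();
        return -1;
    }
    return 0;
}

/* Single-CPU path: uses the return value directly, no shared counter. */
int demo_online_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    return demo_enable_one(cpu);
}
```

The single-CPU path never has to read or reset shared state, which is the
simplification the changelog describes for kvm_online_cpu() and
kvm_resume().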


2022-11-03 00:16:27

by Sean Christopherson

Subject: [PATCH 40/44] KVM: Remove on_each_cpu(hardware_disable_nolock) in kvm_exit()

From: Isaku Yamahata <[email protected]>

Drop the superfluous invocation of hardware_disable_nolock() during
kvm_exit(), as it's nothing more than a glorified nop.

KVM automatically disables hardware on all CPUs when the last VM is
destroyed, and kvm_exit() cannot be called until the last VM goes
away as the calling module is pinned by an elevated refcount of the fops
associated with /dev/kvm. This holds true even on x86, where the caller
of kvm_exit() is not kvm.ko, but is instead a dependent module, kvm_amd.ko
or kvm_intel.ko, as kvm_chardev_ops.owner is set to the module that calls
kvm_init(), not hardcoded to the base kvm.ko module.

Signed-off-by: Isaku Yamahata <[email protected]>
[sean: rework changelog]
Signed-off-by: Sean Christopherson <[email protected]>
---
virt/kvm/kvm_main.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c8d92e6c3922..4a42b78bfb0e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5966,7 +5966,6 @@ void kvm_exit(void)
unregister_syscore_ops(&kvm_syscore_ops);
unregister_reboot_notifier(&kvm_reboot_notifier);
cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
- on_each_cpu(hardware_disable_nolock, NULL, 1);
kvm_irqfd_exit();
free_cpumask_var(cpus_hardware_enabled);
}
--
2.38.1.431.g37b22c650d-goog


2022-11-03 00:19:03

by Sean Christopherson

Subject: [PATCH 44/44] KVM: Opt out of generic hardware enabling on s390 and PPC

Allow architectures to opt out of the generic hardware enabling logic,
and opt out on both s390 and PPC, which don't need to manually enable
virtualization as it's always on (when available).

In addition to letting s390 and PPC drop a bit of dead code, this will
hopefully also allow ARM to clean up its related code, e.g. ARM has its
own per-CPU flag to track which CPUs have enabled hardware due to the
need to keep hardware enabled indefinitely when pKVM is enabled.
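
The opt-out pattern can be sketched outside the kernel as follows. This is
an illustration under stated assumptions, not the patch itself: the
GENERIC_HARDWARE_ENABLING macro, create_vm(), destroy_vm() and
arch_hardware_enable() are hypothetical names standing in for
CONFIG_KVM_GENERIC_HARDWARE_ENABLING and the real KVM paths.

```c
/*
 * Hypothetical build-time switch standing in for
 * CONFIG_KVM_GENERIC_HARDWARE_ENABLING. Build with
 * -DGENERIC_HARDWARE_ENABLING to model an arch that selects the option.
 */
#ifdef GENERIC_HARDWARE_ENABLING
static int enable_calls;	/* how often the arch hook ran */

static int arch_hardware_enable(void)
{
	enable_calls++;
	return 0;
}

static int hardware_enable_all(void)
{
	return arch_hardware_enable();
}

static void hardware_disable_all(void)
{
	/* a real implementation would call the arch disable hook here */
}
#else
/* Opted out: virtualization is always on, so both helpers are stubs. */
static int hardware_enable_all(void)
{
	return 0;
}

static void hardware_disable_all(void)
{
}
#endif

/* Common code calls the helpers unconditionally in both configurations. */
static int create_vm(void)
{
	return hardware_enable_all();
}

static void destroy_vm(void)
{
	hardware_disable_all();
}
```

The point of the stubs is that callers such as create_vm() need no #ifdef
of their own; only the definitions differ between configurations.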

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/arm64/kvm/Kconfig | 1 +
arch/mips/kvm/Kconfig | 1 +
arch/powerpc/include/asm/kvm_host.h | 1 -
arch/powerpc/kvm/powerpc.c | 5 -----
arch/riscv/kvm/Kconfig | 1 +
arch/s390/include/asm/kvm_host.h | 1 -
arch/s390/kvm/kvm-s390.c | 6 ------
arch/x86/kvm/Kconfig | 1 +
include/linux/kvm_host.h | 4 ++++
virt/kvm/Kconfig | 3 +++
virt/kvm/kvm_main.c | 30 +++++++++++++++++++++++------
11 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 815cc118c675..0a7d2116b27b 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -21,6 +21,7 @@ if VIRTUALIZATION
menuconfig KVM
bool "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM
+ select KVM_GENERIC_HARDWARE_ENABLING
select MMU_NOTIFIER
select PREEMPT_NOTIFIERS
select HAVE_KVM_CPU_RELAX_INTERCEPT
diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
index 91d197bee9c0..29e51649203b 100644
--- a/arch/mips/kvm/Kconfig
+++ b/arch/mips/kvm/Kconfig
@@ -28,6 +28,7 @@ config KVM
select MMU_NOTIFIER
select SRCU
select INTERVAL_TREE
+ select KVM_GENERIC_HARDWARE_ENABLING
help
Support for hosting Guest kernels.

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 0a80e80c7b9e..959f566a455c 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -876,7 +876,6 @@ struct kvm_vcpu_arch {
#define __KVM_HAVE_ARCH_WQP
#define __KVM_HAVE_CREATE_DEVICE

-static inline void kvm_arch_hardware_disable(void) {}
static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 51268be60dac..ed426c9ee0e9 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -436,11 +436,6 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
}
EXPORT_SYMBOL_GPL(kvmppc_ld);

-int kvm_arch_hardware_enable(void)
-{
- return 0;
-}
-
int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
{
struct kvmppc_ops *kvm_ops = NULL;
diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig
index f36a737d5f96..d5a658a047a7 100644
--- a/arch/riscv/kvm/Kconfig
+++ b/arch/riscv/kvm/Kconfig
@@ -20,6 +20,7 @@ if VIRTUALIZATION
config KVM
tristate "Kernel-based Virtual Machine (KVM) support (EXPERIMENTAL)"
depends on RISCV_SBI && MMU
+ select KVM_GENERIC_HARDWARE_ENABLING
select MMU_NOTIFIER
select PREEMPT_NOTIFIERS
select KVM_MMIO
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index b1e98a9ed152..d3e4b5d7013a 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -1023,7 +1023,6 @@ extern char sie_exit;
extern int kvm_s390_gisc_register(struct kvm *kvm, u32 gisc);
extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc);

-static inline void kvm_arch_hardware_disable(void) {}
static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
static inline void kvm_arch_free_memslot(struct kvm *kvm,
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 949231f1393e..129c159ab5ee 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -248,12 +248,6 @@ debug_info_t *kvm_s390_dbf;
debug_info_t *kvm_s390_dbf_uv;

/* Section: not file related */
-int kvm_arch_hardware_enable(void)
-{
- /* every s390 is virtualization enabled ;-) */
- return 0;
-}
-
/* forward declarations */
static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start,
unsigned long end);
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index fbeaa9ddef59..8e578311ca9d 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -49,6 +49,7 @@ config KVM
select SRCU
select INTERVAL_TREE
select HAVE_KVM_PM_NOTIFIER if PM
+ select KVM_GENERIC_HARDWARE_ENABLING
help
Support hosting fully virtualized guest machines using hardware
virtualization extensions. You will need a fairly recent
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0b96d836a051..23c89c1e7788 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1441,8 +1441,10 @@ void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_
static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
#endif

+#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
int kvm_arch_hardware_enable(void);
void kvm_arch_hardware_disable(void);
+#endif
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
@@ -2074,7 +2076,9 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
}
}

+#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
extern bool kvm_rebooting;
+#endif

extern unsigned int halt_poll_ns;
extern unsigned int halt_poll_ns_grow;
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 800f9470e36b..d28df77345e1 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -86,3 +86,6 @@ config KVM_XFER_TO_GUEST_WORK

config HAVE_KVM_PM_NOTIFIER
bool
+
+config KVM_GENERIC_HARDWARE_ENABLING
+ bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 859bc27091cd..6736b36cf469 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -102,9 +102,6 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
DEFINE_MUTEX(kvm_lock);
LIST_HEAD(vm_list);

-static DEFINE_PER_CPU(bool, hardware_enabled);
-static int kvm_usage_count;
-
static struct kmem_cache *kvm_vcpu_cache;

static __read_mostly struct preempt_ops kvm_preempt_ops;
@@ -146,9 +143,6 @@ static void hardware_disable_all(void);

static void kvm_io_bus_destroy(struct kvm_io_bus *bus);

-__visible bool kvm_rebooting;
-EXPORT_SYMBOL_GPL(kvm_rebooting);
-
#define KVM_EVENT_CREATE_VM 0
#define KVM_EVENT_DESTROY_VM 1
static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm);
@@ -5005,6 +4999,13 @@ static struct miscdevice kvm_dev = {
&kvm_chardev_ops,
};

+#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
+__visible bool kvm_rebooting;
+EXPORT_SYMBOL_GPL(kvm_rebooting);
+
+static DEFINE_PER_CPU(bool, hardware_enabled);
+static int kvm_usage_count;
+
static int __hardware_enable_nolock(void)
{
if (__this_cpu_read(hardware_enabled))
@@ -5171,6 +5172,17 @@ static struct syscore_ops kvm_syscore_ops = {
.suspend = kvm_suspend,
.resume = kvm_resume,
};
+#else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
+static int hardware_enable_all(void)
+{
+ return 0;
+}
+
+static void hardware_disable_all(void)
+{
+
+}
+#endif /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */

static void kvm_io_bus_destroy(struct kvm_io_bus *bus)
{
@@ -5859,6 +5871,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
int r;
int cpu;

+#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
kvm_online_cpu, kvm_offline_cpu);
if (r)
@@ -5866,6 +5879,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)

register_reboot_notifier(&kvm_reboot_notifier);
register_syscore_ops(&kvm_syscore_ops);
+#endif

/* A kmem cache lets us meet the alignment requirements of fx_save. */
if (!vcpu_align)
@@ -5933,9 +5947,11 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
kmem_cache_destroy(kvm_vcpu_cache);
out_free_3:
+#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
unregister_syscore_ops(&kvm_syscore_ops);
unregister_reboot_notifier(&kvm_reboot_notifier);
cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
+#endif
return r;
}
EXPORT_SYMBOL_GPL(kvm_init);
@@ -5957,9 +5973,11 @@ void kvm_exit(void)
kmem_cache_destroy(kvm_vcpu_cache);
kvm_vfio_ops_exit();
kvm_async_pf_deinit();
+#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
unregister_syscore_ops(&kvm_syscore_ops);
unregister_reboot_notifier(&kvm_reboot_notifier);
cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
+#endif
kvm_irqfd_exit();
}
EXPORT_SYMBOL_GPL(kvm_exit);
--
2.38.1.431.g37b22c650d-goog


2022-11-03 00:21:20

by Sean Christopherson

Subject: [PATCH 17/44] KVM: arm64: Do arm/arch initialization without bouncing through kvm_init()

Move arm/arch specific initialization directly into arm's module_init(),
now called kvm_arm_init(), instead of bouncing through kvm_init() to
reach kvm_arch_init(). Invoking kvm_arch_init() is the very first action
performed by kvm_init(), i.e. this is a glorified nop.
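
The resulting ordering, with arch setup first, common init last, and arch
teardown if common init fails, can be sketched in isolation. All names
below (arch_setup(), common_init(), module_init_fn(), common_init_result)
are hypothetical placeholders for the arm64 and common KVM functions, and
the model ignores the pKVM caveat noted in the FIXME.

```c
/* Hypothetical state: 1 while the arch "subsystems" are initialized. */
static int subsystems_up;

/* Hypothetical knob used to simulate a failure of the common init step. */
static int common_init_result;

static int arch_setup(void)
{
	subsystems_up = 1;
	return 0;
}

static void arch_teardown(void)
{
	subsystems_up = 0;
}

/* Stand-in for the common init call; fails when the knob is non-zero. */
static int common_init(void)
{
	return common_init_result;
}

/* Arch code runs first and calls common init as its final step. */
static int module_init_fn(void)
{
	int err;

	err = arch_setup();
	if (err)
		return err;

	err = common_init();
	if (err)
		goto out_teardown;

	return 0;

out_teardown:
	arch_teardown();
	return err;
}
```

Calling common init last keeps the unwinding in one place: a failure there
simply falls into the same teardown label the arch code already needs.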

Making kvm_arch_init() a nop will allow dropping it entirely once all
other architectures follow suit.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/arm64/kvm/arm.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f400a8c029dd..bfa2dcd3db11 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2190,7 +2190,7 @@ void kvm_arch_irq_bypass_start(struct irq_bypass_consumer *cons)
/**
* Initialize Hyp-mode and memory mappings on all CPUs.
*/
-int kvm_arch_init(void *opaque)
+int kvm_arm_init(void)
{
int err;
bool in_hyp_mode;
@@ -2264,6 +2264,14 @@ int kvm_arch_init(void *opaque)
kvm_info("Hyp mode initialized successfully\n");
}

+ /*
+ * FIXME: Do something reasonable if kvm_init() fails after pKVM
+ * hypervisor protection is finalized.
+ */
+ err = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+ if (err)
+ goto out_subs;
+
return 0;

out_subs:
@@ -2276,10 +2284,15 @@ int kvm_arch_init(void *opaque)
return err;
}

+int kvm_arch_init(void *opaque)
+{
+ return 0;
+}
+
/* NOP: Compiling as a module not supported */
void kvm_arch_exit(void)
{
- kvm_unregister_perf_callbacks();
+
}

static int __init early_kvm_mode_cfg(char *arg)
@@ -2320,10 +2333,4 @@ enum kvm_mode kvm_get_mode(void)
return kvm_mode;
}

-static int arm_init(void)
-{
- int rc = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
- return rc;
-}
-
-module_init(arm_init);
+module_init(kvm_arm_init);
--
2.38.1.431.g37b22c650d-goog


2022-11-03 00:21:46

by Sean Christopherson

Subject: [PATCH 32/44] KVM: x86: Unify pr_fmt to use module name for all KVM modules

Define pr_fmt using KBUILD_MODNAME for all KVM x86 code so that printks
use consistent formatting across common x86, Intel, and AMD code. In
addition to providing consistent print formatting, using KBUILD_MODNAME,
e.g. kvm_amd and kvm_intel, allows referencing SVM and VMX (and SEV and
SGX and ...) as technologies without generating weird messages, and
without causing naming conflicts with other kernel code, e.g. "SEV: ",
"tdx: ", "sgx: ", etc. are all used by the kernel for non-KVM subsystems.

Opportunistically move away from printk() for prints that need to be
modified anyways, e.g. to drop a manual "kvm: " prefix.

Opportunistically convert a few SGX WARNs that are similarly modified to
WARN_ONCE; in the very unlikely event that the WARNs fire, odds are good
that they would fire repeatedly and spam the kernel log without providing
unique information in each print.

Note, defining pr_fmt yields undesirable results for code that uses KVM's
printk wrappers, e.g. vcpu_unimpl(). But, that's a pre-existing problem
as SVM/kvm_amd already defines a pr_fmt, and thankfully use of KVM's
wrappers is relatively limited in KVM x86 code.
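
For readers unfamiliar with pr_fmt, the prefixing is plain string-literal
concatenation done by the preprocessor. The user-space sketch below assumes
a KBUILD_MODNAME of "kvm_intel" (Kbuild normally supplies the real value)
and uses a hypothetical pr_info_buf() macro that formats into a buffer
instead of the kernel log.

```c
#include <stdio.h>

/* Assumption: Kbuild would normally define this per module. */
#ifndef KBUILD_MODNAME
#define KBUILD_MODNAME "kvm_intel"
#endif

/* Same definition as in the patch: prepend the module name to the format. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/*
 * Hypothetical stand-in for pr_info() that writes into a caller-supplied
 * buffer. It needs at least one argument after the format string.
 */
#define pr_info_buf(buf, len, fmt, ...) \
	snprintf(buf, len, pr_fmt(fmt), __VA_ARGS__)

/* Formats one of the messages touched by the patch into buf. */
static int format_example(char *buf, size_t len)
{
	return pr_info_buf(buf, len, "Nested Paging %sabled\n", "en");
}
```

Because the prefix is glued on at compile time, the call site no longer
needs a manual "kvm: " in its format string, which is why the patch drops
those prefixes.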

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/cpuid.c | 1 +
arch/x86/kvm/debugfs.c | 2 ++
arch/x86/kvm/emulate.c | 1 +
arch/x86/kvm/hyperv.c | 1 +
arch/x86/kvm/i8254.c | 4 ++--
arch/x86/kvm/i8259.c | 4 +++-
arch/x86/kvm/ioapic.c | 1 +
arch/x86/kvm/irq.c | 1 +
arch/x86/kvm/irq_comm.c | 7 +++---
arch/x86/kvm/kvm_onhyperv.c | 1 +
arch/x86/kvm/lapic.c | 8 +++----
arch/x86/kvm/mmu/mmu.c | 6 ++---
arch/x86/kvm/mmu/page_track.c | 1 +
arch/x86/kvm/mmu/spte.c | 4 ++--
arch/x86/kvm/mmu/spte.h | 4 ++--
arch/x86/kvm/mmu/tdp_iter.c | 1 +
arch/x86/kvm/mmu/tdp_mmu.c | 1 +
arch/x86/kvm/mtrr.c | 1 +
arch/x86/kvm/pmu.c | 1 +
arch/x86/kvm/smm.c | 1 +
arch/x86/kvm/svm/avic.c | 2 +-
arch/x86/kvm/svm/nested.c | 2 +-
arch/x86/kvm/svm/pmu.c | 2 ++
arch/x86/kvm/svm/sev.c | 1 +
arch/x86/kvm/svm/svm.c | 10 ++++-----
arch/x86/kvm/svm/svm_onhyperv.c | 1 +
arch/x86/kvm/svm/svm_onhyperv.h | 4 ++--
arch/x86/kvm/vmx/evmcs.c | 1 +
arch/x86/kvm/vmx/evmcs.h | 4 +---
arch/x86/kvm/vmx/nested.c | 3 ++-
arch/x86/kvm/vmx/pmu_intel.c | 5 +++--
arch/x86/kvm/vmx/posted_intr.c | 2 ++
arch/x86/kvm/vmx/sgx.c | 5 +++--
arch/x86/kvm/vmx/vmcs12.c | 1 +
arch/x86/kvm/vmx/vmx.c | 40 ++++++++++++++++-----------------
arch/x86/kvm/vmx/vmx_ops.h | 4 ++--
arch/x86/kvm/x86.c | 28 ++++++++++++-----------
arch/x86/kvm/xen.c | 1 +
38 files changed, 97 insertions(+), 70 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index fb9b139023af..e2d02f655e96 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -8,6 +8,7 @@
* Copyright 2011 Red Hat, Inc. and/or its affiliates.
* Copyright IBM Corporation, 2008
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_host.h>
#include <linux/export.h>
diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
index c1390357126a..ee8c4c3496ed 100644
--- a/arch/x86/kvm/debugfs.c
+++ b/arch/x86/kvm/debugfs.c
@@ -4,6 +4,8 @@
*
* Copyright 2016 Red Hat, Inc. and/or its affiliates.
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/kvm_host.h>
#include <linux/debugfs.h>
#include "lapic.h"
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5cc3efa0e21c..c3443045cd93 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -17,6 +17,7 @@
*
* From: xen-unstable 10676:af9809f51f81a3c43f276f00c81a52ef558afda4
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_host.h>
#include "kvm_cache_regs.h"
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 0adf4a437e85..5a5f0d882df3 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -17,6 +17,7 @@
* Ben-Ami Yassour <[email protected]>
* Andrey Smetanin <[email protected]>
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include "x86.h"
#include "lapic.h"
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index e0a7a0e7a73c..cd57a517d04a 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -30,7 +30,7 @@
* Based on QEMU and Xen.
*/

-#define pr_fmt(fmt) "pit: " fmt
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_host.h>
#include <linux/slab.h>
@@ -351,7 +351,7 @@ static void create_pit_timer(struct kvm_pit *pit, u32 val, int is_period)

if (ps->period < min_period) {
pr_info_ratelimited(
- "kvm: requested %lld ns "
+ "requested %lld ns "
"i8254 timer period limited to %lld ns\n",
ps->period, min_period);
ps->period = min_period;
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index e1bb6218bb96..4756bcb5724f 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -26,6 +26,8 @@
* Yaozu (Eddie) Dong <[email protected]>
* Port from Qemu.
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/bitops.h>
@@ -35,7 +37,7 @@
#include "trace.h"

#define pr_pic_unimpl(fmt, ...) \
- pr_err_ratelimited("kvm: pic: " fmt, ## __VA_ARGS__)
+ pr_err_ratelimited("pic: " fmt, ## __VA_ARGS__)

static void pic_irq_request(struct kvm *kvm, int level);

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 765943d7cfa5..042dee556125 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -26,6 +26,7 @@
* Yaozu (Eddie) Dong <[email protected]>
* Based on Xen 3.1 code.
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_host.h>
#include <linux/kvm.h>
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index f371f1292ca3..4e6c06632c8f 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -7,6 +7,7 @@
* Authors:
* Yaozu (Eddie) Dong <[email protected]>
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/export.h>
#include <linux/kvm_host.h>
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 0687162c4f22..d48eaeacf803 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -8,6 +8,7 @@
*
* Copyright 2010 Red Hat, Inc. and/or its affiliates.
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_host.h>
#include <linux/slab.h>
@@ -56,7 +57,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,

if (irq->dest_mode == APIC_DEST_PHYSICAL &&
irq->dest_id == 0xff && kvm_lowest_prio_delivery(irq)) {
- printk(KERN_INFO "kvm: apic: phys broadcast and lowest prio\n");
+ pr_info("apic: phys broadcast and lowest prio\n");
irq->delivery_mode = APIC_DM_FIXED;
}

@@ -199,7 +200,7 @@ int kvm_request_irq_source_id(struct kvm *kvm)
irq_source_id = find_first_zero_bit(bitmap, BITS_PER_LONG);

if (irq_source_id >= BITS_PER_LONG) {
- printk(KERN_WARNING "kvm: exhaust allocatable IRQ sources!\n");
+ pr_warn("exhausted allocatable IRQ sources!\n");
irq_source_id = -EFAULT;
goto unlock;
}
@@ -221,7 +222,7 @@ void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id)
mutex_lock(&kvm->irq_lock);
if (irq_source_id < 0 ||
irq_source_id >= BITS_PER_LONG) {
- printk(KERN_ERR "kvm: IRQ source ID out of range!\n");
+ pr_err("IRQ source ID out of range!\n");
goto unlock;
}
clear_bit(irq_source_id, &kvm->arch.irq_sources_bitmap);
diff --git a/arch/x86/kvm/kvm_onhyperv.c b/arch/x86/kvm/kvm_onhyperv.c
index ee4f696a0782..482d6639ef88 100644
--- a/arch/x86/kvm/kvm_onhyperv.c
+++ b/arch/x86/kvm/kvm_onhyperv.c
@@ -2,6 +2,7 @@
/*
* KVM L1 hypervisor optimizations on Hyper-V.
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_host.h>
#include <asm/mshyperv.h>
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 1bb63746e991..9335c4b05760 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -15,6 +15,7 @@
*
* Based on Xen 3.1 code, Copyright (c) 2004, Intel Corporation.
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_host.h>
#include <linux/kvm.h>
@@ -942,8 +943,7 @@ static void kvm_apic_disabled_lapic_found(struct kvm *kvm)
{
if (!kvm->arch.disabled_lapic_found) {
kvm->arch.disabled_lapic_found = true;
- printk(KERN_INFO
- "Disabled LAPIC found during irq injection\n");
+ pr_info("Disabled LAPIC found during irq injection\n");
}
}

@@ -1561,7 +1561,7 @@ static void limit_periodic_timer_frequency(struct kvm_lapic *apic)

if (apic->lapic_timer.period < min_period) {
pr_info_ratelimited(
- "kvm: vcpu %i: requested %lld ns "
+ "vcpu %i: requested %lld ns "
"lapic timer period limited to %lld ns\n",
apic->vcpu->vcpu_id,
apic->lapic_timer.period, min_period);
@@ -1846,7 +1846,7 @@ static bool set_target_expiration(struct kvm_lapic *apic, u32 count_reg)
deadline = apic->lapic_timer.period;
else if (unlikely(deadline > apic->lapic_timer.period)) {
pr_info_ratelimited(
- "kvm: vcpu %i: requested lapic timer restore with "
+ "vcpu %i: requested lapic timer restore with "
"starting count register %#x=%u (%lld ns) > initial count (%lld ns). "
"Using initial count to start timer.\n",
apic->vcpu->vcpu_id,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f8c92a4a35fa..e5a0252fa6ac 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -14,6 +14,7 @@
* Yaniv Kamay <[email protected]>
* Avi Kivity <[email protected]>
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include "irq.h"
#include "ioapic.h"
@@ -3399,8 +3400,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
}

if (++retry_count > 4) {
- printk_once(KERN_WARNING
- "kvm: Fast #PF retrying more than 4 times.\n");
+ pr_warn_once("Fast #PF retrying more than 4 times.\n");
break;
}

@@ -6537,7 +6537,7 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
* zap all shadow pages.
*/
if (unlikely(gen == 0)) {
- kvm_debug_ratelimited("kvm: zapping shadow pages for mmio generation wraparound\n");
+ kvm_debug_ratelimited("zapping shadow pages for mmio generation wraparound\n");
kvm_mmu_zap_all_fast(kvm);
}
}
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 2e09d1b6249f..0a2ac438d647 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -10,6 +10,7 @@
* Author:
* Xiao Guangrong <[email protected]>
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_host.h>
#include <linux/rculist.h>
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 2e08b2a45361..f00fbfdea706 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -7,7 +7,7 @@
* Copyright (C) 2006 Qumranet, Inc.
* Copyright 2020 Red Hat, Inc. and/or its affiliates.
*/
-
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_host.h>
#include "mmu.h"
@@ -340,7 +340,7 @@ u64 mark_spte_for_access_track(u64 spte)

WARN_ONCE(spte & (SHADOW_ACC_TRACK_SAVED_BITS_MASK <<
SHADOW_ACC_TRACK_SAVED_BITS_SHIFT),
- "kvm: Access Tracking saved bit locations are not zero\n");
+ "Access Tracking saved bit locations are not zero\n");

spte |= (spte & SHADOW_ACC_TRACK_SAVED_BITS_MASK) <<
SHADOW_ACC_TRACK_SAVED_BITS_SHIFT;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 79560d77aa4c..922c838ff9a6 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -418,11 +418,11 @@ static inline void check_spte_writable_invariants(u64 spte)
{
if (spte & shadow_mmu_writable_mask)
WARN_ONCE(!(spte & shadow_host_writable_mask),
- "kvm: MMU-writable SPTE is not Host-writable: %llx",
+ KBUILD_MODNAME ": MMU-writable SPTE is not Host-writable: %llx",
spte);
else
WARN_ONCE(is_writable_pte(spte),
- "kvm: Writable SPTE is not MMU-writable: %llx", spte);
+ KBUILD_MODNAME ": Writable SPTE is not MMU-writable: %llx", spte);
}

static inline bool is_mmu_writable_spte(u64 spte)
diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
index 39b48e7d7d1a..e26e744df1d1 100644
--- a/arch/x86/kvm/mmu/tdp_iter.c
+++ b/arch/x86/kvm/mmu/tdp_iter.c
@@ -1,4 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include "mmu_internal.h"
#include "tdp_iter.h"
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 672f0432d777..42bb031f33b9 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1,4 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include "mmu.h"
#include "mmu_internal.h"
diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index a8502e02f479..9fac1ec03463 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -13,6 +13,7 @@
* Paolo Bonzini <[email protected]>
* Xiao Guangrong <[email protected]>
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_host.h>
#include <asm/mtrr.h>
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index d9b9a0f0db17..db37fcb8f112 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -9,6 +9,7 @@
* Gleb Natapov <[email protected]>
* Wei Huang <[email protected]>
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/types.h>
#include <linux/kvm_host.h>
diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index 39e5e2c3498b..137575ff3074 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -1,4 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_host.h>
#include "x86.h"
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 6919dee69f18..f52f5e0dd465 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -12,7 +12,7 @@
* Avi Kivity <[email protected]>
*/

-#define pr_fmt(fmt) "SVM: " fmt
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_types.h>
#include <linux/hashtable.h>
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b258d6988f5d..b79fe8612747 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -12,7 +12,7 @@
* Avi Kivity <[email protected]>
*/

-#define pr_fmt(fmt) "SVM: " fmt
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_types.h>
#include <linux/kvm_host.h>
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index b68956299fa8..13f0eeaebd55 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -9,6 +9,8 @@
*
* Implementation is based on pmu_intel.c file
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/types.h>
#include <linux/kvm_host.h>
#include <linux/perf_event.h>
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c0c9ed5e279c..ac78ca3e87da 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -6,6 +6,7 @@
*
* Copyright 2010 Red Hat, Inc. and/or its affiliates.
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_types.h>
#include <linux/kvm_host.h>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 13457aa68112..3c48fb837302 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1,4 +1,4 @@
-#define pr_fmt(fmt) "SVM: " fmt
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_host.h>

@@ -2082,7 +2082,7 @@ static void svm_handle_mce(struct kvm_vcpu *vcpu)
* Erratum 383 triggered. Guest state is corrupt so kill the
* guest.
*/
- pr_err("KVM: Guest triggered AMD Erratum 383\n");
+ pr_err("Guest triggered AMD Erratum 383\n");

kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);

@@ -4665,7 +4665,7 @@ static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
smap = cr4 & X86_CR4_SMAP;
is_user = svm_get_cpl(vcpu) == 3;
if (smap && (!smep || is_user)) {
- pr_err_ratelimited("KVM: SEV Guest triggered AMD Erratum 1096\n");
+ pr_err_ratelimited("SEV Guest triggered AMD Erratum 1096\n");

/*
* If the fault occurred in userspace, arbitrarily inject #GP
@@ -5013,7 +5013,7 @@ static __init int svm_hardware_setup(void)
}

if (nested) {
- printk(KERN_INFO "kvm: Nested Virtualization enabled\n");
+ pr_info("Nested Virtualization enabled\n");
kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
}

@@ -5031,7 +5031,7 @@ static __init int svm_hardware_setup(void)
/* Force VM NPT level equal to the host's paging level */
kvm_configure_mmu(npt_enabled, get_npt_level(),
get_npt_level(), PG_LEVEL_1G);
- pr_info("kvm: Nested Paging %sabled\n", npt_enabled ? "en" : "dis");
+ pr_info("Nested Paging %sabled\n", npt_enabled ? "en" : "dis");

/* Setup shadow_me_value and shadow_me_mask */
kvm_mmu_set_me_spte_mask(sme_me_mask, sme_me_mask);
diff --git a/arch/x86/kvm/svm/svm_onhyperv.c b/arch/x86/kvm/svm/svm_onhyperv.c
index 8cdc62c74a96..34be6026101d 100644
--- a/arch/x86/kvm/svm/svm_onhyperv.c
+++ b/arch/x86/kvm/svm/svm_onhyperv.c
@@ -2,6 +2,7 @@
/*
* KVM L1 hypervisor optimizations on Hyper-V for SVM.
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_host.h>

diff --git a/arch/x86/kvm/svm/svm_onhyperv.h b/arch/x86/kvm/svm/svm_onhyperv.h
index e2fc59380465..28fb6706804a 100644
--- a/arch/x86/kvm/svm/svm_onhyperv.h
+++ b/arch/x86/kvm/svm/svm_onhyperv.h
@@ -32,7 +32,7 @@ static inline void svm_hv_hardware_setup(void)
{
if (npt_enabled &&
ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB) {
- pr_info("kvm: Hyper-V enlightened NPT TLB flush enabled\n");
+ pr_info(KBUILD_MODNAME ": Hyper-V enlightened NPT TLB flush enabled\n");
svm_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
svm_x86_ops.tlb_remote_flush_with_range =
hv_remote_flush_tlb_with_range;
@@ -41,7 +41,7 @@ static inline void svm_hv_hardware_setup(void)
if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH) {
int cpu;

- pr_info("kvm: Hyper-V Direct TLB Flush enabled\n");
+ pr_info(KBUILD_MODNAME ": Hyper-V Direct TLB Flush enabled\n");
for_each_online_cpu(cpu) {
struct hv_vp_assist_page *vp_ap =
hv_get_vp_assist_page(cpu);
diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index d8b23c96d627..96b705f1cfdf 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -1,4 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/errno.h>
#include <linux/smp.h>
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 6f746ef3c038..8f8bb262e07c 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -115,9 +115,7 @@ static __always_inline int get_evmcs_offset(unsigned long field,
{
int offset = evmcs_field_offset(field, clean_field);

- WARN_ONCE(offset < 0, "KVM: accessing unsupported EVMCS field %lx\n",
- field);
-
+ WARN_ONCE(offset < 0, "accessing unsupported EVMCS field %lx\n", field);
return offset;
}

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 61a2e551640a..1ff1fc6e9107 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1,4 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/objtool.h>
#include <linux/percpu.h>
@@ -204,7 +205,7 @@ static void nested_vmx_abort(struct kvm_vcpu *vcpu, u32 indicator)
{
/* TODO: not to reset guest simply here. */
kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
- pr_debug_ratelimited("kvm: nested vmx abort, indicator %d\n", indicator);
+ pr_debug_ratelimited("nested vmx abort, indicator %d\n", indicator);
}

static inline bool vmx_control_verify(u32 control, u32 low, u32 high)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 25b70a85bef5..968f9762081f 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -8,6 +8,8 @@
* Avi Kivity <[email protected]>
* Gleb Natapov <[email protected]>
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/types.h>
#include <linux/kvm_host.h>
#include <linux/perf_event.h>
@@ -763,8 +765,7 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
return;

warn:
- pr_warn_ratelimited("kvm: vcpu-%d: fail to passthrough LBR.\n",
- vcpu->vcpu_id);
+ pr_warn_ratelimited("vcpu-%d: fail to passthrough LBR.\n", vcpu->vcpu_id);
}

static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 1b56c5e5c9fb..94c38bea60e7 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -1,4 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/kvm_host.h>

#include <asm/irq_remapping.h>
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 8f95c7c01433..a6ac83d4b6ad 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2021 Intel Corporation. */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <asm/sgx.h>

@@ -164,7 +165,7 @@ static int __handle_encls_ecreate(struct kvm_vcpu *vcpu,
if (!vcpu->kvm->arch.sgx_provisioning_allowed &&
(attributes & SGX_ATTR_PROVISIONKEY)) {
if (sgx_12_1->eax & SGX_ATTR_PROVISIONKEY)
- pr_warn_once("KVM: SGX PROVISIONKEY advertised but not allowed\n");
+ pr_warn_once("SGX PROVISIONKEY advertised but not allowed\n");
kvm_inject_gp(vcpu, 0);
return 1;
}
@@ -379,7 +380,7 @@ int handle_encls(struct kvm_vcpu *vcpu)
return handle_encls_ecreate(vcpu);
if (leaf == EINIT)
return handle_encls_einit(vcpu);
- WARN(1, "KVM: unexpected exit on ENCLS[%u]", leaf);
+ WARN_ONCE(1, "unexpected exit on ENCLS[%u]", leaf);
vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
vcpu->run->hw.hardware_exit_reason = EXIT_REASON_ENCLS;
return 0;
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 2251b60920f8..106a72c923ca 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -1,4 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include "vmcs12.h"

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a563c9756e36..1b645f52cd8d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -12,6 +12,7 @@
* Avi Kivity <[email protected]>
* Yaniv Kamay <[email protected]>
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/highmem.h>
#include <linux/hrtimer.h>
@@ -445,36 +446,36 @@ void vmread_error(unsigned long field, bool fault)
if (fault)
kvm_spurious_fault();
else
- vmx_insn_failed("kvm: vmread failed: field=%lx\n", field);
+ vmx_insn_failed("vmread failed: field=%lx\n", field);
}

noinline void vmwrite_error(unsigned long field, unsigned long value)
{
- vmx_insn_failed("kvm: vmwrite failed: field=%lx val=%lx err=%u\n",
+ vmx_insn_failed("vmwrite failed: field=%lx val=%lx err=%u\n",
field, value, vmcs_read32(VM_INSTRUCTION_ERROR));
}

noinline void vmclear_error(struct vmcs *vmcs, u64 phys_addr)
{
- vmx_insn_failed("kvm: vmclear failed: %p/%llx err=%u\n",
+ vmx_insn_failed("vmclear failed: %p/%llx err=%u\n",
vmcs, phys_addr, vmcs_read32(VM_INSTRUCTION_ERROR));
}

noinline void vmptrld_error(struct vmcs *vmcs, u64 phys_addr)
{
- vmx_insn_failed("kvm: vmptrld failed: %p/%llx err=%u\n",
+ vmx_insn_failed("vmptrld failed: %p/%llx err=%u\n",
vmcs, phys_addr, vmcs_read32(VM_INSTRUCTION_ERROR));
}

noinline void invvpid_error(unsigned long ext, u16 vpid, gva_t gva)
{
- vmx_insn_failed("kvm: invvpid failed: ext=0x%lx vpid=%u gva=0x%lx\n",
+ vmx_insn_failed("invvpid failed: ext=0x%lx vpid=%u gva=0x%lx\n",
ext, vpid, gva);
}

noinline void invept_error(unsigned long ext, u64 eptp, gpa_t gpa)
{
- vmx_insn_failed("kvm: invept failed: ext=0x%lx eptp=%llx gpa=0x%llx\n",
+ vmx_insn_failed("invept failed: ext=0x%lx eptp=%llx gpa=0x%llx\n",
ext, eptp, gpa);
}

@@ -578,7 +579,7 @@ static __init void hv_setup_evmcs(void)
}

if (enlightened_vmcs) {
- pr_info("KVM: vmx: using Hyper-V Enlightened VMCS\n");
+ pr_info("Using Hyper-V Enlightened VMCS\n");
static_branch_enable(&enable_evmcs);
}

@@ -1679,8 +1680,8 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
if (!instr_len)
goto rip_updated;

- WARN(exit_reason.enclave_mode,
- "KVM: skipping instruction after SGX enclave VM-Exit");
+ WARN_ONCE(exit_reason.enclave_mode,
+ "skipping instruction after SGX enclave VM-Exit");

orig_rip = kvm_rip_read(vcpu);
rip = orig_rip + instr_len;
@@ -2986,9 +2987,8 @@ static void fix_rmode_seg(int seg, struct kvm_segment *save)
var.type = 0x3;
var.avl = 0;
if (save->base & 0xf)
- printk_once(KERN_WARNING "kvm: segment base is not "
- "paragraph aligned when entering "
- "protected mode (seg=%d)", seg);
+ pr_warn_once("segment base is not paragraph aligned "
+ "when entering protected mode (seg=%d)", seg);
}

vmcs_write16(sf->selector, var.selector);
@@ -3018,8 +3018,7 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
* vcpu. Warn the user that an update is overdue.
*/
if (!kvm_vmx->tss_addr)
- printk_once(KERN_WARNING "kvm: KVM_SET_TSS_ADDR need to be "
- "called before entering vcpu\n");
+ pr_warn_once("KVM_SET_TSS_ADDR needs to be called before running vCPU\n");

vmx_segment_cache_clear(vmx);

@@ -6880,7 +6879,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
gate_desc *desc = (gate_desc *)host_idt_base + vector;

if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm,
- "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
+ "unexpected VM-Exit interrupt info: 0x%x", intr_info))
return;

handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
@@ -7485,7 +7484,7 @@ static int __init vmx_check_processor_compat(void)

if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
!this_cpu_has(X86_FEATURE_VMX)) {
- pr_err("kvm: VMX is disabled on CPU %d\n", smp_processor_id());
+ pr_err("VMX is disabled on CPU %d\n", smp_processor_id());
return -EIO;
}

@@ -7494,8 +7493,7 @@ static int __init vmx_check_processor_compat(void)
if (nested)
nested_vmx_setup_ctls_msrs(&vmcs_conf, vmx_cap.ept);
if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
- printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
- smp_processor_id());
+ pr_err("CPU %d feature inconsistency!\n", smp_processor_id());
return -EIO;
}
return 0;
@@ -8294,7 +8292,7 @@ static __init int hardware_setup(void)
return -EIO;

if (cpu_has_perf_global_ctrl_bug())
- pr_warn_once("kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL "
+ pr_warn_once("VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL "
"does not work properly. Using workaround\n");

if (boot_cpu_has(X86_FEATURE_NX))
@@ -8302,7 +8300,7 @@ static __init int hardware_setup(void)

if (boot_cpu_has(X86_FEATURE_MPX)) {
rdmsrl(MSR_IA32_BNDCFGS, host_bndcfgs);
- WARN_ONCE(host_bndcfgs, "KVM: BNDCFGS in host will be lost");
+ WARN_ONCE(host_bndcfgs, "BNDCFGS in host will be lost");
}

if (!cpu_has_vmx_mpx())
@@ -8321,7 +8319,7 @@ static __init int hardware_setup(void)

/* NX support is required for shadow paging. */
if (!enable_ept && !boot_cpu_has(X86_FEATURE_NX)) {
- pr_err_ratelimited("kvm: NX (Execute Disable) not supported\n");
+ pr_err_ratelimited("NX (Execute Disable) not supported\n");
return -EOPNOTSUPP;
}

diff --git a/arch/x86/kvm/vmx/vmx_ops.h b/arch/x86/kvm/vmx/vmx_ops.h
index ec268df83ed6..ac52ef9ca561 100644
--- a/arch/x86/kvm/vmx/vmx_ops.h
+++ b/arch/x86/kvm/vmx/vmx_ops.h
@@ -86,8 +86,8 @@ static __always_inline unsigned long __vmcs_readl(unsigned long field)
return value;

do_fail:
- WARN_ONCE(1, "kvm: vmread failed: field=%lx\n", field);
- pr_warn_ratelimited("kvm: vmread failed: field=%lx\n", field);
+ WARN_ONCE(1, KBUILD_MODNAME ": vmread failed: field=%lx\n", field);
+ pr_warn_ratelimited(KBUILD_MODNAME ": vmread failed: field=%lx\n", field);
return 0;

do_exception:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5b7b551ae44b..39675b9662d7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -15,6 +15,7 @@
* Amit Shah <[email protected]>
* Ben-Ami Yassour <[email protected]>
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/kvm_host.h>
#include "irq.h"
@@ -2089,7 +2090,7 @@ static int kvm_emulate_monitor_mwait(struct kvm_vcpu *vcpu, const char *insn)
!guest_cpuid_has(vcpu, X86_FEATURE_MWAIT))
return kvm_handle_invalid_op(vcpu);

- pr_warn_once("kvm: %s instruction emulated as NOP!\n", insn);
+ pr_warn_once("%s instruction emulated as NOP!\n", insn);
return kvm_emulate_as_nop(vcpu);
}
int kvm_emulate_mwait(struct kvm_vcpu *vcpu)
@@ -2438,7 +2439,8 @@ static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz)
thresh_lo = adjust_tsc_khz(tsc_khz, -tsc_tolerance_ppm);
thresh_hi = adjust_tsc_khz(tsc_khz, tsc_tolerance_ppm);
if (user_tsc_khz < thresh_lo || user_tsc_khz > thresh_hi) {
- pr_debug("kvm: requested TSC rate %u falls outside tolerance [%u,%u]\n", user_tsc_khz, thresh_lo, thresh_hi);
+ pr_debug("requested TSC rate %u falls outside tolerance [%u,%u]\n",
+ user_tsc_khz, thresh_lo, thresh_hi);
use_scaling = 1;
}
return set_tsc_khz(vcpu, user_tsc_khz, use_scaling);
@@ -7687,7 +7689,7 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
return X86EMUL_CONTINUE;

emul_write:
- printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
+ pr_warn_once("emulating exchange as write\n");

return emulator_write_emulated(ctxt, addr, new, bytes, exception);
}
@@ -8248,7 +8250,7 @@ static struct x86_emulate_ctxt *alloc_emulate_ctxt(struct kvm_vcpu *vcpu)

ctxt = kmem_cache_zalloc(x86_emulator_cache, GFP_KERNEL_ACCOUNT);
if (!ctxt) {
- pr_err("kvm: failed to allocate vcpu's emulator\n");
+ pr_err("failed to allocate vcpu's emulator\n");
return NULL;
}

@@ -9303,17 +9305,17 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
int r, cpu;

if (kvm_x86_ops.hardware_enable) {
- pr_err("kvm: already loaded vendor module '%s'\n", kvm_x86_ops.name);
+ pr_err("already loaded vendor module '%s'\n", kvm_x86_ops.name);
return -EEXIST;
}

if (!ops->cpu_has_kvm_support()) {
- pr_err_ratelimited("kvm: no hardware support for '%s'\n",
+ pr_err_ratelimited("no hardware support for '%s'\n",
ops->runtime_ops->name);
return -EOPNOTSUPP;
}
if (ops->disabled_by_bios()) {
- pr_err_ratelimited("kvm: support for '%s' disabled by bios\n",
+ pr_err_ratelimited("support for '%s' disabled by bios\n",
ops->runtime_ops->name);
return -EOPNOTSUPP;
}
@@ -9324,7 +9326,7 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
* vCPU's FPU state as a fxregs_state struct.
*/
if (!boot_cpu_has(X86_FEATURE_FPU) || !boot_cpu_has(X86_FEATURE_FXSR)) {
- printk(KERN_ERR "kvm: inadequate fpu\n");
+ pr_err("inadequate fpu\n");
return -EOPNOTSUPP;
}

@@ -9342,19 +9344,19 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
*/
if (rdmsrl_safe(MSR_IA32_CR_PAT, &host_pat) ||
(host_pat & GENMASK(2, 0)) != 6) {
- pr_err("kvm: host PAT[0] is not WB\n");
+ pr_err("host PAT[0] is not WB\n");
return -EIO;
}

x86_emulator_cache = kvm_alloc_emulator_cache();
if (!x86_emulator_cache) {
- pr_err("kvm: failed to allocate cache for x86 emulator\n");
+ pr_err("failed to allocate cache for x86 emulator\n");
return -ENOMEM;
}

user_return_msrs = alloc_percpu(struct kvm_user_return_msrs);
if (!user_return_msrs) {
- printk(KERN_ERR "kvm: failed to allocate percpu kvm_user_return_msrs\n");
+ pr_err("failed to allocate percpu kvm_user_return_msrs\n");
r = -ENOMEM;
goto out_free_x86_emulator_cache;
}
@@ -11611,7 +11613,7 @@ static int sync_regs(struct kvm_vcpu *vcpu)
int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
{
if (kvm_check_tsc_unstable() && kvm->created_vcpus)
- pr_warn_once("kvm: SMP vm created on host with unstable TSC; "
+ pr_warn_once("SMP vm created on host with unstable TSC; "
"guest TSC will not be reliable\n");

if (!kvm->arch.max_vcpu_ids)
@@ -11688,7 +11690,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
goto free_wbinvd_dirty_mask;

if (!fpu_alloc_guest_fpstate(&vcpu->arch.guest_fpu)) {
- pr_err("kvm: failed to allocate vcpu's fpu\n");
+ pr_err("failed to allocate vcpu's fpu\n");
goto free_emulate_ctxt;
}

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 2dae413bd62a..74eb3c47a340 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -5,6 +5,7 @@
*
* KVM Xen emulation
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include "x86.h"
#include "xen.h"
--
2.38.1.431.g37b22c650d-goog
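
The pr_fmt() conversions above rely on compile-time concatenation of adjacent string literals. The following is a minimal userspace sketch of that mechanism, not kernel code: MODNAME is a hypothetical stand-in for KBUILD_MODNAME (normally supplied by Kbuild), and snprintf() stands in for printk() so that the result can be inspected.

```c
#include <assert.h>   /* assert() is handy for checking the result */
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for KBUILD_MODNAME, which Kbuild normally defines. */
#define MODNAME "kvm_intel"

/* Same shape as the definition added at the top of each file in the patch. */
#define pr_fmt(fmt) MODNAME ": " fmt

/*
 * Format the "VMX is disabled" message into a caller-supplied buffer.
 * pr_fmt() expands to three adjacent string literals, which the compiler
 * merges into a single format string with the module name as prefix.
 */
int format_vmx_disabled(char *buf, size_t len, int cpu)
{
	return snprintf(buf, len, pr_fmt("VMX is disabled on CPU %d\n"), cpu);
}
```

With this in place the message text no longer needs a hardcoded "kvm: " prefix; calling format_vmx_disabled() for CPU 3 yields a line that starts with the module name followed by ": ".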


2022-11-03 00:21:54

by Sean Christopherson

Subject: [PATCH 43/44] KVM: Register syscore (suspend/resume) ops early in kvm_init()

Register the suspend/resume notifier hooks at the same time KVM registers
its reboot notifier so that all the code in kvm_init() that deals with
enabling/disabling hardware is bundled together. Opportunistically move
KVM's implementations to reside near the reboot notifier code for the
same reason.

Bunching the code together will allow architectures to opt out of KVM's
generic hardware enable/disable logic with minimal #ifdeffery.
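
As a rough illustration of the bundling, the sketch below registers two hooks back to back and unregisters them in reverse order on a later failure. It is a userspace model only: the helper names are hypothetical stand-ins for the reboot-notifier and syscore helpers, and the trace string exists purely to make the ordering observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Records the order of calls: R/S = register, s/r = unregister. */
char trace[32];

static void note(const char *step)
{
	strcat(trace, step);
}

/* Hypothetical stand-ins for the real (un)registration helpers. */
static void demo_register_reboot(void)    { note("R"); }
static void demo_register_syscore(void)   { note("S"); }
static void demo_unregister_syscore(void) { note("s"); }
static void demo_unregister_reboot(void)  { note("r"); }

/* Register both hooks together; unwind in reverse order on failure. */
int demo_init(bool later_step_fails)
{
	trace[0] = '\0';

	demo_register_reboot();
	demo_register_syscore();

	if (later_step_fails)
		goto out_unregister;

	return 0;

out_unregister:
	demo_unregister_syscore();
	demo_unregister_reboot();
	return -ENOMEM;
}
```

Because both registrations sit next to each other and share one unwind label, an architecture that opts out could, in principle, wrap the pair and its unwind in a single conditional block.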

Signed-off-by: Sean Christopherson <[email protected]>
---
virt/kvm/kvm_main.c | 64 ++++++++++++++++++++++-----------------------
1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a18296ee731b..859bc27091cd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5142,6 +5142,36 @@ static struct notifier_block kvm_reboot_notifier = {
.priority = 0,
};

+static int kvm_suspend(void)
+{
+ /*
+ * Secondary CPUs and CPU hotplug are disabled across the suspend/resume
+ * callbacks, i.e. no need to acquire kvm_lock to ensure the usage count
+ * is stable. Assert that kvm_lock is not held as a paranoid sanity
+ * check that the system isn't suspended when KVM is enabling hardware.
+ */
+ lockdep_assert_not_held(&kvm_lock);
+ lockdep_assert_irqs_disabled();
+
+ if (kvm_usage_count)
+ hardware_disable_nolock(NULL);
+ return 0;
+}
+
+static void kvm_resume(void)
+{
+ lockdep_assert_not_held(&kvm_lock);
+ lockdep_assert_irqs_disabled();
+
+ if (kvm_usage_count)
+ WARN_ON_ONCE(__hardware_enable_nolock());
+}
+
+static struct syscore_ops kvm_syscore_ops = {
+ .suspend = kvm_suspend,
+ .resume = kvm_resume,
+};
+
static void kvm_io_bus_destroy(struct kvm_io_bus *bus)
{
int i;
@@ -5720,36 +5750,6 @@ static void kvm_init_debug(void)
}
}

-static int kvm_suspend(void)
-{
- /*
- * Secondary CPUs and CPU hotplug are disabled across the suspend/resume
- * callbacks, i.e. no need to acquire kvm_lock to ensure the usage count
- * is stable. Assert that kvm_lock is not held as a paranoid sanity
- * check that the system isn't suspended when KVM is enabling hardware.
- */
- lockdep_assert_not_held(&kvm_lock);
- lockdep_assert_irqs_disabled();
-
- if (kvm_usage_count)
- hardware_disable_nolock(NULL);
- return 0;
-}
-
-static void kvm_resume(void)
-{
- lockdep_assert_not_held(&kvm_lock);
- lockdep_assert_irqs_disabled();
-
- if (kvm_usage_count)
- WARN_ON_ONCE(__hardware_enable_nolock());
-}
-
-static struct syscore_ops kvm_syscore_ops = {
- .suspend = kvm_suspend,
- .resume = kvm_resume,
-};
-
static inline
struct kvm_vcpu *preempt_notifier_to_vcpu(struct preempt_notifier *pn)
{
@@ -5865,6 +5865,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
return r;

register_reboot_notifier(&kvm_reboot_notifier);
+ register_syscore_ops(&kvm_syscore_ops);

/* A kmem cache lets us meet the alignment requirements of fx_save. */
if (!vcpu_align)
@@ -5899,8 +5900,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)

kvm_chardev_ops.owner = module;

- register_syscore_ops(&kvm_syscore_ops);
-
kvm_preempt_ops.sched_in = kvm_sched_in;
kvm_preempt_ops.sched_out = kvm_sched_out;

@@ -5934,6 +5933,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
kmem_cache_destroy(kvm_vcpu_cache);
out_free_3:
+ unregister_syscore_ops(&kvm_syscore_ops);
unregister_reboot_notifier(&kvm_reboot_notifier);
cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
return r;
--
2.38.1.431.g37b22c650d-goog


2022-11-03 00:22:25

by Sean Christopherson

Subject: [PATCH 24/44] KVM: PPC: Move processor compatibility check to module init

Move KVM PPC's compatibility checks to their respective module_init()
hooks. There's no need to wait until KVM's common compat check, nor is
there a need to perform the check on every CPU (provided by common KVM's
hook), as the compatibility checks operate on global data.

arch/powerpc/include/asm/cputable.h: extern struct cpu_spec *cur_cpu_spec;
arch/powerpc/kvm/book3s.c: return 0
arch/powerpc/kvm/e500.c: strcmp(cur_cpu_spec->cpu_name, "e500v2")
arch/powerpc/kvm/e500mc.c: strcmp(cur_cpu_spec->cpu_name, "e500mc")
strcmp(cur_cpu_spec->cpu_name, "e5500")
strcmp(cur_cpu_spec->cpu_name, "e6500")
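
To make the "global data" point concrete, here is a small userspace sketch, assuming a cut-down struct cpu_spec that carries only the name. The demo_* names and the use of -EOPNOTSUPP are illustrative assumptions, not the kernel's actual code. Since the check reads a single global, running it once at module init gives the same answer as running it on every CPU.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical miniature of struct cpu_spec; only the name is modeled. */
struct cpu_spec {
	const char *cpu_name;
};

/* Global, like cur_cpu_spec: the same value no matter which CPU reads it. */
struct cpu_spec demo_cpu_spec = { .cpu_name = "e500mc" };
struct cpu_spec *cur_cpu_spec = &demo_cpu_spec;

/* Return 0 if the CPU type is one of the names the e500mc flavor accepts. */
int demo_e500mc_check_processor_compat(void)
{
	static const char * const supported[] = { "e500mc", "e5500", "e6500" };
	size_t i;

	for (i = 0; i < sizeof(supported) / sizeof(supported[0]); i++) {
		if (strcmp(cur_cpu_spec->cpu_name, supported[i]) == 0)
			return 0;
	}
	return -EOPNOTSUPP;
}

/* Module-init-style entry point: check once, up front, then continue. */
int demo_module_init(void)
{
	int r = demo_e500mc_check_processor_compat();

	if (r)
		return r;

	/* ... the rest of initialization would follow here ... */
	return 0;
}
```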

Cc: Fabiano Rosas <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/powerpc/include/asm/kvm_ppc.h | 1 -
arch/powerpc/kvm/book3s.c | 10 ----------
arch/powerpc/kvm/e500.c | 4 ++--
arch/powerpc/kvm/e500mc.c | 4 ++++
arch/powerpc/kvm/powerpc.c | 2 +-
5 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index bfacf12784dd..51a1824b0a16 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -118,7 +118,6 @@ extern int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr,
extern int kvmppc_core_vcpu_create(struct kvm_vcpu *vcpu);
extern void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu);
extern int kvmppc_core_vcpu_setup(struct kvm_vcpu *vcpu);
-extern int kvmppc_core_check_processor_compat(void);
extern int kvmppc_core_vcpu_translate(struct kvm_vcpu *vcpu,
struct kvm_translation *tr);

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 6d525285dbe8..87283a0e33d8 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -999,16 +999,6 @@ int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_store);

-int kvmppc_core_check_processor_compat(void)
-{
- /*
- * We always return 0 for book3s. We check
- * for compatibility while loading the HV
- * or PR module
- */
- return 0;
-}
-
int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hcall)
{
return kvm->arch.kvm_ops->hcall_implemented(hcall);
diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c
index c8b2b4478545..0ea61190ec04 100644
--- a/arch/powerpc/kvm/e500.c
+++ b/arch/powerpc/kvm/e500.c
@@ -314,7 +314,7 @@ static void kvmppc_core_vcpu_put_e500(struct kvm_vcpu *vcpu)
kvmppc_booke_vcpu_put(vcpu);
}

-int kvmppc_core_check_processor_compat(void)
+static int kvmppc_e500_check_processor_compat(void)
{
int r;

@@ -507,7 +507,7 @@ static int __init kvmppc_e500_init(void)
unsigned long handler_len;
unsigned long max_ivor = 0;

- r = kvmppc_core_check_processor_compat();
+ r = kvmppc_e500_check_processor_compat();
if (r)
goto err_out;

diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index 57e0ad6a2ca3..795667f7ebf0 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -388,6 +388,10 @@ static int __init kvmppc_e500mc_init(void)
{
int r;

+ r = kvmppc_e500mc_check_processor_compat();
+ if (r)
+ goto err_out;
+
r = kvmppc_booke_init();
if (r)
goto err_out;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 74ea5687ecbc..36c27381a769 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -443,7 +443,7 @@ int kvm_arch_hardware_enable(void)

int kvm_arch_check_processor_compat(void *opaque)
{
- return kvmppc_core_check_processor_compat();
+ return 0;
}

int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
--
2.38.1.431.g37b22c650d-goog


2022-11-03 00:23:46

by Sean Christopherson

Subject: [PATCH 01/44] KVM: Register /dev/kvm as the _very_ last thing during initialization

Register /dev/kvm, i.e. expose KVM to userspace, only after all other
setup has completed. Once /dev/kvm is exposed, userspace can start
invoking KVM ioctls, creating VMs, etc... If userspace creates a VM
before KVM is done with its configuration, bad things may happen, e.g.
KVM will fail to properly migrate vCPU state if a VM is created before
KVM has registered preemption notifiers.
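
The ordering requirement can be modeled in a few lines of userspace C. This is a sketch under stated assumptions: the two flags and the demo_* functions are hypothetical, with demo_create_vm() standing in for a userspace ioctl that can only arrive once the device node is visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

bool preempt_ops_ready;   /* models "all other setup has completed" */
bool dev_registered;      /* models /dev/kvm being visible to userspace */

/* Stand-in for userspace creating a VM through the device node. */
int demo_create_vm(void)
{
	if (!dev_registered)
		return -ENODEV;

	/* Safe only because registration happens after all other setup. */
	assert(preempt_ops_ready);
	return 0;
}

/* Do all internal setup first, expose the device as the very last step. */
int demo_init(bool register_fails)
{
	preempt_ops_ready = true;

	if (register_fails) {
		/* Unwind the earlier setup; nothing was ever exposed. */
		preempt_ops_ready = false;
		return -EBUSY;
	}

	dev_registered = true;
	return 0;
}
```

If the two assignments in demo_init() were swapped, a caller racing with initialization could reach the assert in demo_create_vm() before the setup it depends on had run, which is the class of bug the patch closes.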

Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
virt/kvm/kvm_main.c | 31 ++++++++++++++++++++++---------
1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f1df24c2bc84..a188d27f78af 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5900,12 +5900,6 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,

kvm_chardev_ops.owner = module;

- r = misc_register(&kvm_dev);
- if (r) {
- pr_err("kvm: misc device register failed\n");
- goto out_unreg;
- }
-
register_syscore_ops(&kvm_syscore_ops);

kvm_preempt_ops.sched_in = kvm_sched_in;
@@ -5914,11 +5908,24 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
kvm_init_debug();

r = kvm_vfio_ops_init();
- WARN_ON(r);
+ if (WARN_ON_ONCE(r))
+ goto err_vfio;
+
+ /*
+ * Registration _must_ be the very last thing done, as this exposes
+ * /dev/kvm to userspace, i.e. all infrastructure must be setup!
+ */
+ r = misc_register(&kvm_dev);
+ if (r) {
+ pr_err("kvm: misc device register failed\n");
+ goto err_register;
+ }

return 0;

-out_unreg:
+err_register:
+ kvm_vfio_ops_exit();
+err_vfio:
kvm_async_pf_deinit();
out_free_4:
for_each_possible_cpu(cpu)
@@ -5944,8 +5951,14 @@ void kvm_exit(void)
{
int cpu;

- debugfs_remove_recursive(kvm_debugfs_dir);
+ /*
+ * Note, unregistering /dev/kvm doesn't strictly need to come first,
+ * fops_get(), a.k.a. try_module_get(), prevents acquiring references
+ * to KVM while the module is being stopped.
+ */
misc_deregister(&kvm_dev);
+
+ debugfs_remove_recursive(kvm_debugfs_dir);
for_each_possible_cpu(cpu)
free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
kmem_cache_destroy(kvm_vcpu_cache);
--
2.38.1.431.g37b22c650d-goog


2022-11-03 00:58:29

by Sean Christopherson

Subject: [PATCH 39/44] KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock

From: Isaku Yamahata <[email protected]>

Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock now
that KVM hooks CPU hotplug during the ONLINE phase, which can sleep.
Previously, KVM hooked the STARTING phase, which is not allowed to sleep
and thus could not take kvm_lock (a mutex).

Explicitly disable preemptions/IRQs in the CPU hotplug paths as needed to
keep arch code happy, e.g. x86 expects IRQs to be disabled during hardware
enabling, and expects preemption to be disabled during hardware disabling.
There are no preemption/interrupt concerns in the hotplug path, i.e. the
extra disabling is done purely to allow x86 to keep its sanity checks,
which are targeted primarily at the "enable/disable all" paths.

Opportunistically update KVM's locking documentation.
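
The resulting locking scheme can be sketched as follows. This is a single-threaded userspace model with toy boolean "locks", not real synchronization: the demo_* helpers are hypothetical stand-ins for cpus_read_lock()/mutex_lock(&kvm_lock), and the asserts merely encode the documented ordering (hotplug lock outside kvm_lock) plus the first-user/last-user behavior of the usage count.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock state; the real code uses cpus_read_lock() and kvm_lock. */
bool hotplug_read_held;
bool kvm_lock_held;

int usage_count;   /* models kvm_usage_count */
bool hw_enabled;   /* models "hardware enabled on all CPUs" */

static void demo_cpus_read_lock(void)
{
	/* Ordering rule: the hotplug lock is taken outside kvm_lock. */
	assert(!kvm_lock_held);
	hotplug_read_held = true;
}

static void demo_cpus_read_unlock(void)
{
	hotplug_read_held = false;
}

static void demo_kvm_lock(void)
{
	assert(!kvm_lock_held);
	kvm_lock_held = true;
}

static void demo_kvm_unlock(void)
{
	kvm_lock_held = false;
}

void demo_hardware_enable_all(void)
{
	demo_cpus_read_lock();
	demo_kvm_lock();
	if (++usage_count == 1)
		hw_enabled = true;    /* first user enables hardware */
	demo_kvm_unlock();
	demo_cpus_read_unlock();
}

void demo_hardware_disable_all(void)
{
	demo_cpus_read_lock();
	demo_kvm_lock();
	assert(usage_count > 0);
	if (--usage_count == 0)
		hw_enabled = false;   /* last user disables hardware */
	demo_kvm_unlock();
	demo_cpus_read_unlock();
}
```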

Signed-off-by: Isaku Yamahata <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
Documentation/virt/kvm/locking.rst | 18 ++++++------
virt/kvm/kvm_main.c | 44 +++++++++++++++++++++---------
2 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 845a561629f1..4feaf527575b 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -9,6 +9,8 @@ KVM Lock Overview

The acquisition orders for mutexes are as follows:

+- cpus_read_lock() is taken outside kvm_lock
+
- kvm->lock is taken outside vcpu->mutex

- kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock
@@ -29,6 +31,8 @@ The acquisition orders for mutexes are as follows:

On x86:

+- kvm_lock is taken outside kvm->mmu_lock
+
- vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock

- kvm->arch.mmu_lock is an rwlock. kvm->arch.tdp_mmu_pages_lock and
@@ -216,15 +220,11 @@ time it will be set using the Dirty tracking mechanism described above.
:Type: mutex
:Arch: any
:Protects: - vm_list
-
-``kvm_count_lock``
-^^^^^^^^^^^^^^^^^^
-
-:Type: raw_spinlock_t
-:Arch: any
-:Protects: - hardware virtualization enable/disable
-:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
- migration.
+ - kvm_usage_count
+ - hardware virtualization enable/disable
+ - module probing (x86 only)
+:Comment: KVM also disables CPU hotplug via cpus_read_lock() during
+ enable/disable.

``kvm->mn_invalidate_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4e765ef9f4bd..c8d92e6c3922 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -100,7 +100,6 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
*/

DEFINE_MUTEX(kvm_lock);
-static DEFINE_RAW_SPINLOCK(kvm_count_lock);
LIST_HEAD(vm_list);

static cpumask_var_t cpus_hardware_enabled;
@@ -5028,9 +5027,10 @@ static void hardware_enable_nolock(void *junk)

static int kvm_online_cpu(unsigned int cpu)
{
+ unsigned long flags;
int ret = 0;

- raw_spin_lock(&kvm_count_lock);
+ mutex_lock(&kvm_lock);
/*
* Abort the CPU online process if hardware virtualization cannot
* be enabled. Otherwise running VMs would encounter unrecoverable
@@ -5039,13 +5039,16 @@ static int kvm_online_cpu(unsigned int cpu)
if (kvm_usage_count) {
WARN_ON_ONCE(atomic_read(&hardware_enable_failed));

+ local_irq_save(flags);
hardware_enable_nolock(NULL);
+ local_irq_restore(flags);
+
if (atomic_read(&hardware_enable_failed)) {
atomic_set(&hardware_enable_failed, 0);
ret = -EIO;
}
}
- raw_spin_unlock(&kvm_count_lock);
+ mutex_unlock(&kvm_lock);
return ret;
}

@@ -5061,10 +5064,13 @@ static void hardware_disable_nolock(void *junk)

static int kvm_offline_cpu(unsigned int cpu)
{
- raw_spin_lock(&kvm_count_lock);
- if (kvm_usage_count)
+ mutex_lock(&kvm_lock);
+ if (kvm_usage_count) {
+ preempt_disable();
hardware_disable_nolock(NULL);
- raw_spin_unlock(&kvm_count_lock);
+ preempt_enable();
+ }
+ mutex_unlock(&kvm_lock);
return 0;
}

@@ -5079,9 +5085,11 @@ static void hardware_disable_all_nolock(void)

static void hardware_disable_all(void)
{
- raw_spin_lock(&kvm_count_lock);
+ cpus_read_lock();
+ mutex_lock(&kvm_lock);
hardware_disable_all_nolock();
- raw_spin_unlock(&kvm_count_lock);
+ mutex_unlock(&kvm_lock);
+ cpus_read_unlock();
}

static int hardware_enable_all(void)
@@ -5097,7 +5105,7 @@ static int hardware_enable_all(void)
* Disable CPU hotplug to prevent scenarios where KVM sees
*/
cpus_read_lock();
- raw_spin_lock(&kvm_count_lock);
+ mutex_lock(&kvm_lock);

kvm_usage_count++;
if (kvm_usage_count == 1) {
@@ -5110,7 +5118,7 @@ static int hardware_enable_all(void)
}
}

- raw_spin_unlock(&kvm_count_lock);
+ mutex_unlock(&kvm_lock);
cpus_read_unlock();

return r;
@@ -5716,6 +5724,15 @@ static void kvm_init_debug(void)

static int kvm_suspend(void)
{
+ /*
+ * Secondary CPUs and CPU hotplug are disabled across the suspend/resume
+ * callbacks, i.e. no need to acquire kvm_lock to ensure the usage count
+ * is stable. Assert that kvm_lock is not held as a paranoid sanity
+ * check that the system isn't suspended when KVM is enabling hardware.
+ */
+ lockdep_assert_not_held(&kvm_lock);
+ lockdep_assert_irqs_disabled();
+
if (kvm_usage_count)
hardware_disable_nolock(NULL);
return 0;
@@ -5723,10 +5740,11 @@ static int kvm_suspend(void)

static void kvm_resume(void)
{
- if (kvm_usage_count) {
- lockdep_assert_not_held(&kvm_count_lock);
+ lockdep_assert_not_held(&kvm_lock);
+ lockdep_assert_irqs_disabled();
+
+ if (kvm_usage_count)
hardware_enable_nolock(NULL);
- }
}

static struct syscore_ops kvm_syscore_ops = {
--
2.38.1.431.g37b22c650d-goog


2022-11-03 01:02:23

by Sean Christopherson

Subject: [PATCH 02/44] KVM: Initialize IRQ FD after arch hardware setup

Move initialization of KVM's IRQ FD workqueue below arch hardware setup
as a step towards consolidating arch "init" and "hardware setup", and
eventually towards dropping the hooks entirely. There is no dependency
on the workqueue being created before hardware setup, the workqueue is
used only when destroying VMs, i.e. only needs to be created before
/dev/kvm is exposed to userspace.

Move the destruction of the workqueue before the arch hooks to maintain
symmetry, and so that arch code can move away from the hooks without
having to worry about ordering changes.

Reword the comment about kvm_irqfd_init() needing to come after
kvm_arch_init() to call out that kvm_arch_init() must come before common
KVM does _anything_, as x86 very subtly relies on that behavior to deal
with multiple calls to kvm_init(), e.g. if userspace attempts to load
kvm_amd.ko and kvm_intel.ko. Tag the code with a FIXME, as x86's subtle
requirement is gross, and invoking an arch callback as the very first
action in a helper that is called only from arch code is silly.
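
The "must be called before anything else" dependency can be illustrated with a small userspace model. All names here are hypothetical; the point is only that the duplicate-load check lives in the arch hook, so common code must call it before creating any resources of its own.

```c
#include <assert.h>
#include <errno.h>

const char *loaded_vendor;   /* models "a vendor module is already loaded" */
int workqueues_created;      /* models common resources such as the irqfd wq */

/* Stand-in for the arch hook that rejects a second vendor module. */
static int demo_arch_init(const char *vendor)
{
	if (loaded_vendor)
		return -EEXIST;

	loaded_vendor = vendor;
	return 0;
}

/* Stand-in for common init: the arch check guards everything after it. */
int demo_kvm_init(const char *vendor)
{
	int r = demo_arch_init(vendor);

	if (r)
		return r;

	workqueues_created++;
	return 0;
}
```

If the resource creation were moved above the demo_arch_init() call, a second load attempt would create (and then have to tear down) resources that the already-loaded module is using.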

Signed-off-by: Sean Christopherson <[email protected]>
---
virt/kvm/kvm_main.c | 37 ++++++++++++++++++-------------------
1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a188d27f78af..e0424af52acc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5833,24 +5833,19 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
int r;
int cpu;

+ /*
+ * FIXME: Get rid of kvm_arch_init(), vendor code should call arch code
+ * directly. Note, kvm_arch_init() _must_ be called before anything
+ * else as x86 relies on checks buried in kvm_arch_init() to guard
+ * against multiple calls to kvm_init().
+ */
r = kvm_arch_init(opaque);
if (r)
- goto out_fail;
-
- /*
- * kvm_arch_init makes sure there's at most one caller
- * for architectures that support multiple implementations,
- * like intel and amd on x86.
- * kvm_arch_init must be called before kvm_irqfd_init to avoid creating
- * conflicts in case kvm is already setup for another implementation.
- */
- r = kvm_irqfd_init();
- if (r)
- goto out_irqfd;
+ return r;

if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL)) {
r = -ENOMEM;
- goto out_free_0;
+ goto err_hw_enabled;
}

r = kvm_arch_hardware_setup(opaque);
@@ -5894,9 +5889,13 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
}
}

+ r = kvm_irqfd_init();
+ if (r)
+ goto err_irqfd;
+
r = kvm_async_pf_init();
if (r)
- goto out_free_4;
+ goto err_async_pf;

kvm_chardev_ops.owner = module;

@@ -5927,6 +5926,9 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
kvm_vfio_ops_exit();
err_vfio:
kvm_async_pf_deinit();
+err_async_pf:
+ kvm_irqfd_exit();
+err_irqfd:
out_free_4:
for_each_possible_cpu(cpu)
free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
@@ -5938,11 +5940,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
kvm_arch_hardware_unsetup();
out_free_1:
free_cpumask_var(cpus_hardware_enabled);
-out_free_0:
- kvm_irqfd_exit();
-out_irqfd:
+err_hw_enabled:
kvm_arch_exit();
-out_fail:
return r;
}
EXPORT_SYMBOL_GPL(kvm_init);
@@ -5967,9 +5966,9 @@ void kvm_exit(void)
unregister_reboot_notifier(&kvm_reboot_notifier);
cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
on_each_cpu(hardware_disable_nolock, NULL, 1);
+ kvm_irqfd_exit();
kvm_arch_hardware_unsetup();
kvm_arch_exit();
- kvm_irqfd_exit();
free_cpumask_var(cpus_hardware_enabled);
kvm_vfio_ops_exit();
}
--
2.38.1.431.g37b22c650d-goog


2022-11-03 01:06:46

by Sean Christopherson

Subject: [PATCH 41/44] KVM: Use a per-CPU variable to track which CPUs have enabled virtualization

Use a per-CPU variable instead of a shared bitmap to track which CPUs
have successfully enabled virtualization hardware. Using a per-CPU bool
avoids the need for an additional allocation, and arguably yields easier
to read code. Using a bitmap would be advantageous if KVM used it to
avoid generating IPIs to CPUs that failed to enable hardware, but that's
an extreme edge case and not worth optimizing, and the low level helpers
would still want to keep their individual checks as attempting to enable
virtualization hardware when it's already enabled can be problematic,
e.g. Intel's VMXON will fault.

Opportunistically change the order in hardware_enable_nolock() to set
the flag if and only if hardware enabling is successful, instead of
speculatively setting the flag and then clearing it on failure.
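
A userspace sketch of the "set the flag only on success" pattern is below. It assumes a plain array indexed by CPU number as a stand-in for the per-CPU variable, and a hypothetical arch hook that fails on one CPU; none of the demo_* names come from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 4

/* Stand-in for DEFINE_PER_CPU(bool, hardware_enabled). */
bool hardware_enabled[DEMO_NR_CPUS];

int enable_failed;       /* models the hardware_enable_failed counter */
int arch_enable_calls;   /* counts calls into the (fake) arch hook */

/* Hypothetical arch hook: pretend CPU 2 cannot enable virtualization. */
static int demo_arch_hardware_enable(int cpu)
{
	arch_enable_calls++;
	return cpu == 2 ? -1 : 0;
}

void demo_hardware_enable(int cpu)
{
	/* Already enabled: do not touch the hardware again. */
	if (hardware_enabled[cpu])
		return;

	if (demo_arch_hardware_enable(cpu)) {
		enable_failed++;
		return;
	}

	/* Set the flag if and only if enabling succeeded. */
	hardware_enabled[cpu] = true;
}

void demo_hardware_disable(int cpu)
{
	/* May be invoked on CPUs that never enabled hardware. */
	if (!hardware_enabled[cpu])
		return;

	hardware_enabled[cpu] = false;
}
```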

Add a comment explaining that the check in hardware_disable_nolock()
isn't simply paranoia. Waaay back when, commit 1b6c016818a5 ("KVM: Keep
track of which cpus have virtualization enabled"), added the logic as a
guard against CPU hotplug racing with hardware enable/disable. Now that
KVM has eliminated the race by taking cpu_hotplug_lock for read (via
cpus_read_lock()) when enabling or disabling hardware, at first glance it
appears that the check is now superfluous, i.e. it's tempting to remove
the per-CPU flag entirely...

Signed-off-by: Sean Christopherson <[email protected]>
---
virt/kvm/kvm_main.c | 41 ++++++++++++++++++-----------------------
1 file changed, 18 insertions(+), 23 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4a42b78bfb0e..31949a89fe25 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -102,7 +102,7 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
DEFINE_MUTEX(kvm_lock);
LIST_HEAD(vm_list);

-static cpumask_var_t cpus_hardware_enabled;
+static DEFINE_PER_CPU(bool, hardware_enabled);
static int kvm_usage_count;
static atomic_t hardware_enable_failed;

@@ -5008,21 +5008,17 @@ static struct miscdevice kvm_dev = {

static void hardware_enable_nolock(void *junk)
{
- int cpu = raw_smp_processor_id();
- int r;
-
- if (cpumask_test_cpu(cpu, cpus_hardware_enabled))
+ if (__this_cpu_read(hardware_enabled))
return;

- cpumask_set_cpu(cpu, cpus_hardware_enabled);
-
- r = kvm_arch_hardware_enable();
-
- if (r) {
- cpumask_clear_cpu(cpu, cpus_hardware_enabled);
+ if (kvm_arch_hardware_enable()) {
atomic_inc(&hardware_enable_failed);
- pr_info("kvm: enabling virtualization on CPU%d failed\n", cpu);
+ pr_info("kvm: enabling virtualization on CPU%d failed\n",
+ raw_smp_processor_id());
+ return;
}
+
+ __this_cpu_write(hardware_enabled, true);
}

static int kvm_online_cpu(unsigned int cpu)
@@ -5054,12 +5050,16 @@ static int kvm_online_cpu(unsigned int cpu)

static void hardware_disable_nolock(void *junk)
{
- int cpu = raw_smp_processor_id();
-
- if (!cpumask_test_cpu(cpu, cpus_hardware_enabled))
+ /*
+ * Note, hardware_disable_all_nolock() tells all online CPUs to disable
+ * hardware, not just CPUs that successfully enabled hardware!
+ */
+ if (!__this_cpu_read(hardware_enabled))
return;
- cpumask_clear_cpu(cpu, cpus_hardware_enabled);
+
kvm_arch_hardware_disable();
+
+ __this_cpu_write(hardware_enabled, false);
}

static int kvm_offline_cpu(unsigned int cpu)
@@ -5861,13 +5861,11 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
int r;
int cpu;

- if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL))
- return -ENOMEM;
-
r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
kvm_online_cpu, kvm_offline_cpu);
if (r)
- goto out_free_2;
+ return r;
+
register_reboot_notifier(&kvm_reboot_notifier);

/* A kmem cache lets us meet the alignment requirements of fx_save. */
@@ -5940,8 +5938,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
out_free_3:
unregister_reboot_notifier(&kvm_reboot_notifier);
cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
-out_free_2:
- free_cpumask_var(cpus_hardware_enabled);
return r;
}
EXPORT_SYMBOL_GPL(kvm_init);
@@ -5967,7 +5963,6 @@ void kvm_exit(void)
unregister_reboot_notifier(&kvm_reboot_notifier);
cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
kvm_irqfd_exit();
- free_cpumask_var(cpus_hardware_enabled);
}
EXPORT_SYMBOL_GPL(kvm_exit);

--
2.38.1.431.g37b22c650d-goog


2022-11-03 01:08:08

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 27/44] KVM: Drop kvm_arch_{init,exit}() hooks

Drop kvm_arch_init() and kvm_arch_exit() now that all implementations
are nops.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/arm64/kvm/arm.c | 11 -----------
arch/mips/kvm/mips.c | 10 ----------
arch/powerpc/include/asm/kvm_host.h | 1 -
arch/powerpc/kvm/powerpc.c | 5 -----
arch/riscv/kvm/main.c | 9 ---------
arch/s390/kvm/kvm-s390.c | 10 ----------
arch/x86/kvm/x86.c | 10 ----------
include/linux/kvm_host.h | 3 ---
virt/kvm/kvm_main.c | 19 ++-----------------
9 files changed, 2 insertions(+), 76 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 6e0061eac627..75c5125b0dd3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2284,17 +2284,6 @@ static __init int kvm_arm_init(void)
return err;
}

-int kvm_arch_init(void *opaque)
-{
- return 0;
-}
-
-/* NOP: Compiling as a module not supported */
-void kvm_arch_exit(void)
-{
-
-}
-
static int __init early_kvm_mode_cfg(char *arg)
{
if (!arg)
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index ae7a24342fdf..3cade648827a 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -1010,16 +1010,6 @@ long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
return r;
}

-int kvm_arch_init(void *opaque)
-{
- return 0;
-}
-
-void kvm_arch_exit(void)
-{
-
-}
-
int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
struct kvm_sregs *sregs)
{
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 5d2c3a487e73..0a80e80c7b9e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -881,7 +881,6 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
-static inline void kvm_arch_exit(void) {}
static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 36c27381a769..34278042ad27 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -2525,11 +2525,6 @@ void kvmppc_init_lpid(unsigned long nr_lpids_param)
}
EXPORT_SYMBOL_GPL(kvmppc_init_lpid);

-int kvm_arch_init(void *opaque)
-{
- return 0;
-}
-
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ppc_instr);

void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry)
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index cb063b8a9a0f..4710a6751687 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -65,15 +65,6 @@ void kvm_arch_hardware_disable(void)
csr_write(CSR_HIDELEG, 0);
}

-int kvm_arch_init(void *opaque)
-{
- return 0;
-}
-
-void kvm_arch_exit(void)
-{
-}
-
static int __init riscv_kvm_init(void)
{
const char *str;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index f6ae845bc1c1..7c1c6d81b5d7 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -533,16 +533,6 @@ static void __kvm_s390_exit(void)
debug_unregister(kvm_s390_dbf_uv);
}

-int kvm_arch_init(void *opaque)
-{
- return 0;
-}
-
-void kvm_arch_exit(void)
-{
-
-}
-
/* Section: device related */
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 218707597bea..2b4530a33298 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9271,16 +9271,6 @@ static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
kvm_pmu_ops_update(ops->pmu_ops);
}

-int kvm_arch_init(void *opaque)
-{
- return 0;
-}
-
-void kvm_arch_exit(void)
-{
-
-}
-
static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
{
u64 host_pat;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9b52bd40be56..6c2a28c4c684 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1423,9 +1423,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg);
int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu);

-int kvm_arch_init(void *opaque);
-void kvm_arch_exit(void);
-
void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu);

void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 27ce263a80e4..17c852cb6842 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5833,20 +5833,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
int r;
int cpu;

- /*
- * FIXME: Get rid of kvm_arch_init(), vendor code should call arch code
- * directly. Note, kvm_arch_init() _must_ be called before anything
- * else as x86 relies on checks buried in kvm_arch_init() to guard
- * against multiple calls to kvm_init().
- */
- r = kvm_arch_init(opaque);
- if (r)
- return r;
-
- if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL)) {
- r = -ENOMEM;
- goto err_hw_enabled;
- }
+ if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL))
+ return -ENOMEM;

c.ret = &r;
c.opaque = opaque;
@@ -5934,8 +5922,6 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
out_free_2:
free_cpumask_var(cpus_hardware_enabled);
-err_hw_enabled:
- kvm_arch_exit();
return r;
}
EXPORT_SYMBOL_GPL(kvm_init);
@@ -5963,7 +5949,6 @@ void kvm_exit(void)
on_each_cpu(hardware_disable_nolock, NULL, 1);
kvm_irqfd_exit();
free_cpumask_var(cpus_hardware_enabled);
- kvm_arch_exit();
}
EXPORT_SYMBOL_GPL(kvm_exit);

--
2.38.1.431.g37b22c650d-goog


2022-11-03 01:11:12

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 25/44] KVM: s390: Do s390 specific init without bouncing through kvm_init()

Move the guts of kvm_arch_init() into a new helper, __kvm_s390_init(),
and invoke the new helper directly from kvm_s390_init() instead of
bouncing through kvm_init(). Invoking kvm_arch_init() is the very
first action performed by kvm_init(), i.e. this is a glorified nop.

Moving setup to __kvm_s390_init() will allow tagging more functions as
__init, and emptying kvm_arch_init() will allow dropping the hook
entirely once all architecture implementations are nops.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/s390/kvm/kvm-s390.c | 29 +++++++++++++++++++++++++----
1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 7fcd2d3b3558..e1c9980aae78 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -461,7 +461,7 @@ static void kvm_s390_cpu_feat_init(void)
*/
}

-int kvm_arch_init(void *opaque)
+static int __kvm_s390_init(void)
{
int rc = -ENOMEM;

@@ -519,7 +519,7 @@ int kvm_arch_init(void *opaque)
return rc;
}

-void kvm_arch_exit(void)
+static void __kvm_s390_exit(void)
{
gmap_unregister_pte_notifier(&gmap_notifier);
gmap_unregister_pte_notifier(&vsie_gmap_notifier);
@@ -533,6 +533,16 @@ void kvm_arch_exit(void)
debug_unregister(kvm_s390_dbf_uv);
}

+int kvm_arch_init(void *opaque)
+{
+ return 0;
+}
+
+void kvm_arch_exit(void)
+{
+
+}
+
/* Section: device related */
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
@@ -5634,7 +5644,7 @@ static inline unsigned long nonhyp_mask(int i)

static int __init kvm_s390_init(void)
{
- int i;
+ int i, r;

if (!sclp.has_sief2) {
pr_info("SIE is not available\n");
@@ -5650,12 +5660,23 @@ static int __init kvm_s390_init(void)
kvm_s390_fac_base[i] |=
stfle_fac_list[i] & nonhyp_mask(i);

- return kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+ r = __kvm_s390_init();
+ if (r)
+ return r;
+
+ r = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+ if (r) {
+ __kvm_s390_exit();
+ return r;
+ }
+ return 0;
}

static void __exit kvm_s390_exit(void)
{
kvm_exit();
+
+ __kvm_s390_exit();
}

module_init(kvm_s390_init);
--
2.38.1.431.g37b22c650d-goog


2022-11-03 07:28:35

by Philippe Mathieu-Daudé

[permalink] [raw]
Subject: Re: [PATCH 25/44] KVM: s390: Do s390 specific init without bouncing through kvm_init()

On 3/11/22 00:18, Sean Christopherson wrote:
> Move the guts of kvm_arch_init() into a new helper, __kvm_s390_init(),
> and invoke the new helper directly from kvm_s390_init() instead of
> bouncing through kvm_init(). Invoking kvm_arch_init() is the very
> first action performed by kvm_init(), i.e. this is a glorified nop.
>
> Moving setup to __kvm_s390_init() will allow tagging more functions as
> __init, and emptying kvm_arch_init() will allow dropping the hook
> entirely once all architecture implementations are nops.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/s390/kvm/kvm-s390.c | 29 +++++++++++++++++++++++++----
> 1 file changed, 25 insertions(+), 4 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>


2022-11-03 07:48:23

by Philippe Mathieu-Daudé

[permalink] [raw]
Subject: Re: [PATCH 30/44] KVM: Drop kvm_arch_check_processor_compat() hook

On 3/11/22 00:18, Sean Christopherson wrote:
> Drop kvm_arch_check_processor_compat() and its support code now that all
> architecture implementations are nops.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/arm64/kvm/arm.c | 7 +------
> arch/mips/kvm/mips.c | 7 +------
> arch/powerpc/kvm/book3s.c | 2 +-
> arch/powerpc/kvm/e500.c | 2 +-
> arch/powerpc/kvm/e500mc.c | 2 +-
> arch/powerpc/kvm/powerpc.c | 5 -----
> arch/riscv/kvm/main.c | 7 +------
> arch/s390/kvm/kvm-s390.c | 7 +------
> arch/x86/kvm/svm/svm.c | 4 ++--
> arch/x86/kvm/vmx/vmx.c | 4 ++--
> arch/x86/kvm/x86.c | 5 -----
> include/linux/kvm_host.h | 4 +---
> virt/kvm/kvm_main.c | 24 +-----------------------
> 13 files changed, 13 insertions(+), 67 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>


2022-11-03 07:53:10

by Philippe Mathieu-Daudé

[permalink] [raw]
Subject: Re: [PATCH 20/44] KVM: MIPS: Setup VZ emulation? directly from kvm_mips_init()

On 3/11/22 00:18, Sean Christopherson wrote:
> Invoke kvm_mips_emulation_init() directly from kvm_mips_init() instead
> of bouncing through kvm_init()=>kvm_arch_init(). Functionally, this is
> a glorified nop as invoking kvm_arch_init() is the very first action
> performed by kvm_init().
>
> Emptying kvm_arch_init() will allow dropping the hook entirely once all
> architecture implementations are nops.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/mips/kvm/mips.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>


2022-11-03 07:53:24

by Philippe Mathieu-Daudé

[permalink] [raw]
Subject: Re: [PATCH 17/44] KVM: arm64: Do arm/arch initialiation without bouncing through kvm_init()

Hi Sean,

On 3/11/22 00:18, Sean Christopherson wrote:
> Move arm/arch specific initialization directly in arm's module_init(),
> now called kvm_arm_init(), instead of bouncing through kvm_init() to
> reach kvm_arch_init(). Invoking kvm_arch_init() is the very first action
> performed by kvm_init(), i.e. this is a glorified nop.
>
> Making kvm_arch_init() a nop will allow dropping it entirely once all
> other architectures follow suit.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/arm64/kvm/arm.c | 25 ++++++++++++++++---------
> 1 file changed, 16 insertions(+), 9 deletions(-)

> /* NOP: Compiling as a module not supported */
> void kvm_arch_exit(void)
> {
> - kvm_unregister_perf_callbacks();

Doesn't this belong to the previous patch?

> +
> }


2022-11-03 07:53:35

by Philippe Mathieu-Daudé

[permalink] [raw]
Subject: Re: [PATCH 21/44] KVM: MIPS: Register die notifier prior to kvm_init()

On 3/11/22 00:18, Sean Christopherson wrote:
> Call kvm_init() only after _all_ setup is complete, as kvm_init() exposes
> /dev/kvm to userspace and thus allows userspace to create VMs (and call
> other ioctls).
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/mips/kvm/mips.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>


2022-11-03 08:09:43

by Philippe Mathieu-Daudé

[permalink] [raw]
Subject: Re: [PATCH 22/44] KVM: RISC-V: Do arch init directly in riscv_kvm_init()

On 3/11/22 00:18, Sean Christopherson wrote:
> Fold the guts of kvm_arch_init() into riscv_kvm_init() instead of
> bouncing through kvm_init()=>kvm_arch_init(). Functionally, this is a
> glorified nop as invoking kvm_arch_init() is the very first action
> performed by kvm_init().
>
> Moving setup to riscv_kvm_init(), which is tagged __init, will allow
> tagging more functions and data with __init and __ro_after_init. And
> emptying kvm_arch_init() will allow dropping the hook entirely once all
> architecture implementations are nops.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/riscv/kvm/main.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>


2022-11-03 08:27:25

by Philippe Mathieu-Daudé

[permalink] [raw]
Subject: Re: [PATCH 27/44] KVM: Drop kvm_arch_{init,exit}() hooks

On 3/11/22 00:18, Sean Christopherson wrote:
> Drop kvm_arch_init() and kvm_arch_exit() now that all implementations
> are nops.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/arm64/kvm/arm.c | 11 -----------
> arch/mips/kvm/mips.c | 10 ----------
> arch/powerpc/include/asm/kvm_host.h | 1 -
> arch/powerpc/kvm/powerpc.c | 5 -----
> arch/riscv/kvm/main.c | 9 ---------
> arch/s390/kvm/kvm-s390.c | 10 ----------
> arch/x86/kvm/x86.c | 10 ----------
> include/linux/kvm_host.h | 3 ---
> virt/kvm/kvm_main.c | 19 ++-----------------
> 9 files changed, 2 insertions(+), 76 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>


2022-11-03 12:37:24

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [PATCH 00/44] KVM: Rework kvm_init() and hardware enabling

Am 03.11.22 um 00:18 schrieb Sean Christopherson:
> Non-x86 folks, please test on hardware when possible. I made a _lot_ of
> mistakes when moving code around. Thankfully, x86 was the trickiest code
> to deal with, and I'm fairly confident that I found all the bugs I
> introduced via testing. But the number of mistakes I made and found on
> x86 makes me more than a bit worried that I screwed something up in other
> arch code.
>
> This is a continuation of Chao's series to do x86 CPU compatibility checks
> during virtualization hardware enabling[1], and of Isaku's series to try
> and clean up the hardware enabling paths so that x86 (Intel specifically)
> can temporarily enable hardware during module initialization without
> causing undue pain for other architectures[2]. It also includes one patch
> from another mini-series from Isaku that provides the less controversial
> patches[3].
>
> The main theme of this series is to kill off kvm_arch_init(),
> kvm_arch_hardware_(un)setup(), and kvm_arch_check_processor_compat(), which
> all originated in x86 code from way back when, and needlessly complicate
> both common KVM code and architecture code. E.g. many architectures don't
> mark functions/data as __init/__ro_after_init purely because kvm_init()
> isn't marked __init to support x86's separate vendor modules.
>
> The idea/hope is that with those hooks gone (moved to arch code), it will
> be easier for x86 (and other architectures) to modify their module init
> sequences as needed without having to fight common KVM code. E.g. I'm
> hoping that ARM can build on this to simplify its hardware enabling logic,
> especially the pKVM side of things.
>
> There are bug fixes throughout this series. They are more scattered than
> I would usually prefer, but getting the sequencing correct was a gigantic
> pain for many of the x86 fixes due to needing to fix common code in order
> for the x86 fix to have any meaning. And while the bugs are often fatal,
> they aren't all that interesting for most users as they either require a
> malicious admin or broken hardware, i.e. aren't likely to be encountered
> by the vast majority of KVM users. So unless someone _really_ wants a
> particular fix isolated for backporting, I'm not planning on shuffling
> patches.
>
> Tested on x86. Lightly tested on arm64. Compile tested only on all other
> architectures.

Some sniff tests seem to work ok on s390.


2022-11-03 13:02:47

by Claudio Imbrenda

[permalink] [raw]
Subject: Re: [PATCH 25/44] KVM: s390: Do s390 specific init without bouncing through kvm_init()

On Wed, 2 Nov 2022 23:18:52 +0000
Sean Christopherson <[email protected]> wrote:

> Move the guts of kvm_arch_init() into a new helper, __kvm_s390_init(),
> and invoke the new helper directly from kvm_s390_init() instead of
> bouncing through kvm_init(). Invoking kvm_arch_init() is the very
> first action performed by kvm_init(), i.e. this is a glorified nop.
>
> Moving setup to __kvm_s390_init() will allow tagging more functions as
> __init, and emptying kvm_arch_init() will allow dropping the hook
> entirely once all architecture implementations are nops.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/s390/kvm/kvm-s390.c | 29 +++++++++++++++++++++++++----
> 1 file changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 7fcd2d3b3558..e1c9980aae78 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -461,7 +461,7 @@ static void kvm_s390_cpu_feat_init(void)
> */
> }
>
> -int kvm_arch_init(void *opaque)
> +static int __kvm_s390_init(void)
> {
> int rc = -ENOMEM;
>
> @@ -519,7 +519,7 @@ int kvm_arch_init(void *opaque)
> return rc;
> }
>
> -void kvm_arch_exit(void)
> +static void __kvm_s390_exit(void)
> {
> gmap_unregister_pte_notifier(&gmap_notifier);
> gmap_unregister_pte_notifier(&vsie_gmap_notifier);
> @@ -533,6 +533,16 @@ void kvm_arch_exit(void)
> debug_unregister(kvm_s390_dbf_uv);
> }
>
> +int kvm_arch_init(void *opaque)
> +{
> + return 0;
> +}
> +
> +void kvm_arch_exit(void)
> +{
> +
> +}
> +

I wonder at this point if it's possible to define kvm_arch_init and
kvm_arch_exit directly in kvm_main.c with __weak

> /* Section: device related */
> long kvm_arch_dev_ioctl(struct file *filp,
> unsigned int ioctl, unsigned long arg)
> @@ -5634,7 +5644,7 @@ static inline unsigned long nonhyp_mask(int i)
>
> static int __init kvm_s390_init(void)
> {
> - int i;
> + int i, r;
>
> if (!sclp.has_sief2) {
> pr_info("SIE is not available\n");
> @@ -5650,12 +5660,23 @@ static int __init kvm_s390_init(void)
> kvm_s390_fac_base[i] |=
> stfle_fac_list[i] & nonhyp_mask(i);
>
> - return kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> + r = __kvm_s390_init();
> + if (r)
> + return r;
> +
> + r = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> + if (r) {
> + __kvm_s390_exit();
> + return r;
> + }
> + return 0;
> }
>
> static void __exit kvm_s390_exit(void)
> {
> kvm_exit();
> +
> + __kvm_s390_exit();
> }
>
> module_init(kvm_s390_init);


2022-11-03 13:43:49

by Claudio Imbrenda

[permalink] [raw]
Subject: Re: [PATCH 25/44] KVM: s390: Do s390 specific init without bouncing through kvm_init()

On Thu, 3 Nov 2022 13:44:15 +0100
Claudio Imbrenda <[email protected]> wrote:

> On Wed, 2 Nov 2022 23:18:52 +0000
> Sean Christopherson <[email protected]> wrote:
>
> > Move the guts of kvm_arch_init() into a new helper, __kvm_s390_init(),
> > and invoke the new helper directly from kvm_s390_init() instead of
> > bouncing through kvm_init(). Invoking kvm_arch_init() is the very
> > first action performed by kvm_init(), i.e. this is a glorified nop.
> >
> > Moving setup to __kvm_s390_init() will allow tagging more functions as
> > __init, and emptying kvm_arch_init() will allow dropping the hook
> > entirely once all architecture implementations are nops.
> >
> > No functional change intended.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/s390/kvm/kvm-s390.c | 29 +++++++++++++++++++++++++----
> > 1 file changed, 25 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index 7fcd2d3b3558..e1c9980aae78 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -461,7 +461,7 @@ static void kvm_s390_cpu_feat_init(void)
> > */
> > }
> >
> > -int kvm_arch_init(void *opaque)
> > +static int __kvm_s390_init(void)
> > {
> > int rc = -ENOMEM;
> >
> > @@ -519,7 +519,7 @@ int kvm_arch_init(void *opaque)
> > return rc;
> > }
> >
> > -void kvm_arch_exit(void)
> > +static void __kvm_s390_exit(void)
> > {
> > gmap_unregister_pte_notifier(&gmap_notifier);
> > gmap_unregister_pte_notifier(&vsie_gmap_notifier);
> > @@ -533,6 +533,16 @@ void kvm_arch_exit(void)
> > debug_unregister(kvm_s390_dbf_uv);
> > }
> >
> > +int kvm_arch_init(void *opaque)
> > +{
> > + return 0;
> > +}
> > +
> > +void kvm_arch_exit(void)
> > +{
> > +
> > +}
> > +
>
> I wonder at this point if it's possible to define kvm_arch_init and
> kvm_arch_exit directly in kvm_main.c with __weak

ah, never mind, you get rid of them completely in the next patch

>
> > /* Section: device related */
> > long kvm_arch_dev_ioctl(struct file *filp,
> > unsigned int ioctl, unsigned long arg)
> > @@ -5634,7 +5644,7 @@ static inline unsigned long nonhyp_mask(int i)
> >
> > static int __init kvm_s390_init(void)
> > {
> > - int i;
> > + int i, r;
> >
> > if (!sclp.has_sief2) {
> > pr_info("SIE is not available\n");
> > @@ -5650,12 +5660,23 @@ static int __init kvm_s390_init(void)
> > kvm_s390_fac_base[i] |=
> > stfle_fac_list[i] & nonhyp_mask(i);
> >
> > - return kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> > + r = __kvm_s390_init();
> > + if (r)
> > + return r;
> > +
> > + r = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> > + if (r) {
> > + __kvm_s390_exit();
> > + return r;
> > + }
> > + return 0;
> > }
> >
> > static void __exit kvm_s390_exit(void)
> > {
> > kvm_exit();
> > +
> > + __kvm_s390_exit();
> > }
> >
> > module_init(kvm_s390_init);
>


2022-11-03 15:33:38

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 39/44] KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock

On 11/3/22 00:19, Sean Christopherson wrote:
> +- kvm_lock is taken outside kvm->mmu_lock

Not surprising since one is a mutex and one is an rwlock. :) You can
drop this hunk as well as the "Opportunistically update KVM's locking
documentation" sentence in the commit message.

> - vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock
>
> - kvm->arch.mmu_lock is an rwlock. kvm->arch.tdp_mmu_pages_lock and
> @@ -216,15 +220,11 @@ time it will be set using the Dirty tracking mechanism described above.
> :Type: mutex
> :Arch: any
> :Protects: - vm_list
> -
> -``kvm_count_lock``
> -^^^^^^^^^^^^^^^^^^
> -
> -:Type: raw_spinlock_t
> -:Arch: any
> -:Protects: - hardware virtualization enable/disable
> -:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
> - migration.
> + - kvm_usage_count
> + - hardware virtualization enable/disable
> + - module probing (x86 only)

What do you mean exactly by "module probing"? Is it anything other than
what is serialized by vendor_module_lock?

Paolo

> +:Comment: KVM also disables CPU hotplug via cpus_read_lock() during
> + enable/disable.
>
> ``kvm->mn_invalidate_lock``
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4e765ef9f4bd..c8d92e6c3922 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -100,7 +100,6 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
> */
>
> DEFINE_MUTEX(kvm_lock);
> -static DEFINE_RAW_SPINLOCK(kvm_count_lock);
> LIST_HEAD(vm_list);
>
> static cpumask_var_t cpus_hardware_enabled;
> @@ -5028,9 +5027,10 @@ static void hardware_enable_nolock(void *junk)
>
> static int kvm_online_cpu(unsigned int cpu)
> {
> + unsigned long flags;
> int ret = 0;
>
> - raw_spin_lock(&kvm_count_lock);
> + mutex_lock(&kvm_lock);
> /*
> * Abort the CPU online process if hardware virtualization cannot
> * be enabled. Otherwise running VMs would encounter unrecoverable
> @@ -5039,13 +5039,16 @@ static int kvm_online_cpu(unsigned int cpu)
> if (kvm_usage_count) {
> WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
>
> + local_irq_save(flags);
> hardware_enable_nolock(NULL);
> + local_irq_restore(flags);
> +
> if (atomic_read(&hardware_enable_failed)) {
> atomic_set(&hardware_enable_failed, 0);
> ret = -EIO;
> }
> }
> - raw_spin_unlock(&kvm_count_lock);
> + mutex_unlock(&kvm_lock);
> return ret;
> }
>
> @@ -5061,10 +5064,13 @@ static void hardware_disable_nolock(void *junk)
>
> static int kvm_offline_cpu(unsigned int cpu)
> {
> - raw_spin_lock(&kvm_count_lock);
> - if (kvm_usage_count)
> + mutex_lock(&kvm_lock);
> + if (kvm_usage_count) {
> + preempt_disable();
> hardware_disable_nolock(NULL);
> - raw_spin_unlock(&kvm_count_lock);
> + preempt_enable();
> + }
> + mutex_unlock(&kvm_lock);
> return 0;
> }
>
> @@ -5079,9 +5085,11 @@ static void hardware_disable_all_nolock(void)
>
> static void hardware_disable_all(void)
> {
> - raw_spin_lock(&kvm_count_lock);
> + cpus_read_lock();
> + mutex_lock(&kvm_lock);
> hardware_disable_all_nolock();
> - raw_spin_unlock(&kvm_count_lock);
> + mutex_unlock(&kvm_lock);
> + cpus_read_unlock();
> }
>
> static int hardware_enable_all(void)
> @@ -5097,7 +5105,7 @@ static int hardware_enable_all(void)
> * Disable CPU hotplug to prevent scenarios where KVM sees
> */
> cpus_read_lock();
> - raw_spin_lock(&kvm_count_lock);
> + mutex_lock(&kvm_lock);
>
> kvm_usage_count++;
> if (kvm_usage_count == 1) {
> @@ -5110,7 +5118,7 @@ static int hardware_enable_all(void)
> }
> }
>
> - raw_spin_unlock(&kvm_count_lock);
> + mutex_unlock(&kvm_lock);
> cpus_read_unlock();
>
> return r;
> @@ -5716,6 +5724,15 @@ static void kvm_init_debug(void)
>
> static int kvm_suspend(void)
> {
> + /*
> + * Secondary CPUs and CPU hotplug are disabled across the suspend/resume
> + * callbacks, i.e. no need to acquire kvm_lock to ensure the usage count
> + * is stable. Assert that kvm_lock is not held as a paranoid sanity
> + * check that the system isn't suspended when KVM is enabling hardware.
> + */
> + lockdep_assert_not_held(&kvm_lock);
> + lockdep_assert_irqs_disabled();
> +
> if (kvm_usage_count)
> hardware_disable_nolock(NULL);
> return 0;
> @@ -5723,10 +5740,11 @@ static int kvm_suspend(void)
>
> static void kvm_resume(void)
> {
> - if (kvm_usage_count) {
> - lockdep_assert_not_held(&kvm_count_lock);
> + lockdep_assert_not_held(&kvm_lock);
> + lockdep_assert_irqs_disabled();
> +
> + if (kvm_usage_count)
> hardware_enable_nolock(NULL);
> - }
> }
>
> static struct syscore_ops kvm_syscore_ops = {


2022-11-03 15:58:25

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 36/44] KVM: x86: Do compatibility checks when onlining CPU

On 11/3/22 00:19, Sean Christopherson wrote:
> From: Chao Gao<[email protected]>
>
> Do compatibility checks when enabling hardware to effectively add
> compatibility checks when onlining a CPU. Abort enabling, i.e. the
> online process, if the (hotplugged) CPU is incompatible with the known
> good setup.

This paragraph is not true with this patch being before "KVM: Rename and
move CPUHP_AP_KVM_STARTING to ONLINE section".

Paolo


2022-11-03 16:02:40

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 17/44] KVM: arm64: Do arm/arch initialiation without bouncing through kvm_init()

On Thu, Nov 03, 2022, Philippe Mathieu-Daudé wrote:
> Hi Sean,
>
> On 3/11/22 00:18, Sean Christopherson wrote:
> > Move arm/arch specific initialization directly in arm's module_init(),
> > now called kvm_arm_init(), instead of bouncing through kvm_init() to
> > reach kvm_arch_init(). Invoking kvm_arch_init() is the very first action
> > performed by kvm_init(), i.e. this is a glorified nop.
> >
> > Making kvm_arch_init() a nop will allow dropping it entirely once all
> > other architectures follow suit.
> >
> > No functional change intended.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/arm64/kvm/arm.c | 25 ++++++++++++++++---------
> > 1 file changed, 16 insertions(+), 9 deletions(-)
>
> > /* NOP: Compiling as a module not supported */
> > void kvm_arch_exit(void)
> > {
> > - kvm_unregister_perf_callbacks();
>
> Doesn't this belong to the previous patch?

No, but the above changelog is a lie, there is very much a functional change here.

The goal of the previous patch is to fix the error paths in kvm_arch_init(), a.k.a.
kvm_arm_init(). After fixing kvm_arch_init(), there are still bugs in the sequence
as a whole because kvm_arch_exit() doesn't unwind other state, e.g. kvm_arch_exit()
should really look something like:

void kvm_arch_exit(void)
{
teardown_subsystems();

if (!is_kernel_in_hyp_mode())
teardown_hyp_mode();

kvm_arm_vmid_alloc_free();

if (is_protected_kvm_enabled())
???
}

Because although the comment "NOP: Compiling as a module not supported" is correct
about KVM ARM always having to be built into the kernel, kvm_arch_exit() can still
be called if a later stage of kvm_init() fails.

But rather than add a patch to fix kvm_arch_exit(), I chose to fix the bug by
moving code out of kvm_arch_init() so that the unwind sequence established in the
previous patch could be reused.

Except I managed to forget those details when writing the changelog. The changelog
should instead be:

KVM: arm64: Do arm/arch initialization without bouncing through kvm_init()

Do arm/arch specific initialization directly in arm's module_init(), now
called kvm_arm_init(), instead of bouncing through kvm_init() to reach
kvm_arch_init(). Invoking kvm_arch_init() is the very first action
performed by kvm_init(), so from an initialization perspective this is a
glorified nop.

Avoiding kvm_arch_init() also fixes a mostly benign bug as kvm_arch_exit()
doesn't properly unwind if a later stage of kvm_init() fails. While the
soon-to-be-deleted comment about compiling as a module being unsupported
is correct, kvm_arch_exit() can still be called by kvm_init() if any step
after the call to kvm_arch_init() fails.

Add a FIXME to call out that pKVM initialization isn't unwound if
kvm_init() fails, which is a pre-existing problem inherited from
kvm_arch_exit().

Making kvm_arch_init() a nop will also allow dropping kvm_arch_init() and
kvm_arch_exit() entirely once all other architectures follow suit.

2022-11-03 16:03:00

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 00/44] KVM: Rework kvm_init() and hardware enabling

On 11/3/22 13:08, Christian Borntraeger wrote:
>> There are bug fixes throughout this series.  They are more scattered than
>> I would usually prefer, but getting the sequencing correct was a gigantic
>> pain for many of the x86 fixes due to needing to fix common code in order
>> for the x86 fix to have any meaning.  And while the bugs are often fatal,
>> they aren't all that interesting for most users as they either require a
>> malicious admin or broken hardware, i.e. aren't likely to be encountered
>> by the vast majority of KVM users.  So unless someone _really_ wants a
>> particular fix isolated for backporting, I'm not planning on shuffling
>> patches.
>>
>> Tested on x86.  Lightly tested on arm64.  Compile tested only on all
>> other architectures.
>
> Some sniff tests seem to work ok on s390.

Thanks. There are just a couple nits, and MIPS/PPC/RISC-V have very
small changes. Feel free to send me a pull request once Marc acks.

Paolo


2022-11-03 16:03:30

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 33/44] KVM: x86: Do VMX/SVM support checks directly in vendor code

On 11/3/22 00:19, Sean Christopherson wrote:
> + if (!boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
> + !boot_cpu_has(X86_FEATURE_VMX)) {
> + pr_err("VMX not enabled in MSR_IA32_FEAT_CTL\n");
> + return false;

I think the reference to the BIOS should remain in these messages and in
svm.c (even though these days it's much less common for vendors to
default to disabled virtualization in the system setup).

The check for X86_FEATURE_MSR_IA32_FEAT_CTL is not needed because
init_ia32_feat_ctl() will clear X86_FEATURE_VMX if the rdmsr fails (and
not set X86_FEATURE_MSR_IA32_FEAT_CTL).

Paolo


2022-11-03 18:01:50

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 36/44] KVM: x86: Do compatibility checks when onlining CPU

On 11/3/22 18:44, Sean Christopherson wrote:
>>> Do compatibility checks when enabling hardware to effectively add
>>> compatibility checks when onlining a CPU. Abort enabling, i.e. the
>>> online process, if the (hotplugged) CPU is incompatible with the known
>>> good setup.
>>
>> This paragraph is not true with this patch being before "KVM: Rename and
>> move CPUHP_AP_KVM_STARTING to ONLINE section".
>
> Argh, good eyes. Getting the ordering correct in this series has been quite the
> struggle. Assuming there are no subtle dependencies between x86 and common KVM,
> the ordering should be something like this:

It's not a problem to keep the ordering in this v1, just fix the commit
message like "Do compatibility checks when enabling hardware to
effectively add compatibility checks on CPU hotplug. For now KVM is
using a STARTING hook, which makes it impossible to abort the hotplug if
the new CPU is incompatible with the known good setup; switching to an
ONLINE hook will fix this."
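
A rough userspace model of that difference, assuming (as the suggested commit message says) that a callback in the STARTING section cannot abort the hotplug while one in the ONLINE section can. The enum, the function names, and the "CPU 3 is incompatible" rule are invented purely for illustration:

```c
#include <stdbool.h>

/* Hypothetical labels for the two hotplug sections being compared. */
enum model_section { MODEL_STARTING, MODEL_ONLINE };

/* Hypothetical compatibility check: pretend CPU 3 lacks a required feature. */
static int model_check_compat(unsigned int cpu)
{
	return cpu == 3 ? -5 : 0;	/* -5 stands in for -EIO */
}

/*
 * Returns true if the CPU ends up online.  In the STARTING model the check
 * still runs, but its result cannot stop the CPU from coming up.  In the
 * ONLINE model a failed check aborts the operation.
 */
static bool model_bring_up(unsigned int cpu, enum model_section section)
{
	int r = model_check_compat(cpu);

	if (section == MODEL_STARTING)
		return true;	/* failure cannot be propagated */

	return r == 0;		/* failure rolls the CPU back */
}
```

In this model an incompatible CPU still comes online under MODEL_STARTING, while under MODEL_ONLINE the bring-up is refused.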

Paolo

> KVM: Opt out of generic hardware enabling on s390 and PPC
> KVM: Register syscore (suspend/resume) ops early in kvm_init()
> KVM: x86: Do compatibility checks when onlining CPU
> KVM: SVM: Check for SVM support in CPU compatibility checks
> KVM: VMX: Shuffle support checks and hardware enabling code around
> KVM: x86: Do VMX/SVM support checks directly in vendor code
> KVM: x86: Unify pr_fmt to use module name for all KVM modules
> KVM: x86: Use KBUILD_MODNAME to specify vendor module name
> KVM: Make hardware_enable_failed a local variable in the "enable all" path
> KVM: Use a per-CPU variable to track which CPUs have enabled virtualization
> KVM: Remove on_each_cpu(hardware_disable_nolock) in kvm_exit()
> KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock
> KVM: Disable CPU hotplug during hardware enabling
> KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section
> KVM: Drop kvm_arch_check_processor_compat() hook
>


2022-11-03 18:22:45

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 39/44] KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock

On Thu, Nov 03, 2022, Paolo Bonzini wrote:
> On 11/3/22 00:19, Sean Christopherson wrote:
> > +- kvm_lock is taken outside kvm->mmu_lock
>
> Not surprising since one is a mutex and one is an rwlock. :)

Heh,

Signed-off-by: Captain Obvious <[email protected]>

> You can drop this hunk as well as the "Opportunistically update KVM's locking
> documentation" sentence in the commit message.

Will do.

> > - vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock
> > - kvm->arch.mmu_lock is an rwlock. kvm->arch.tdp_mmu_pages_lock and
> > @@ -216,15 +220,11 @@ time it will be set using the Dirty tracking mechanism described above.
> > :Type: mutex
> > :Arch: any
> > :Protects: - vm_list
> > -
> > -``kvm_count_lock``
> > -^^^^^^^^^^^^^^^^^^
> > -
> > -:Type: raw_spinlock_t
> > -:Arch: any
> > -:Protects: - hardware virtualization enable/disable
> > -:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
> > - migration.
> > + - kvm_usage_count
> > + - hardware virtualization enable/disable
> > + - module probing (x86 only)
>
> What do you mean exactly by "module probing"? Is it anything else than what
> is serialized by vendor_module_lock?

Ooh, I forgot to update this patch after switching to vendor_module_lock. I
added the above after fixing the first deadlock between kvm_lock and cpu_hotplug_lock,
but later gave up on trying to use kvm_lock after deadlock #2, which is when I
realized piggybacking on kvm_lock was going to be a maintenance nightmare.

2022-11-03 18:42:56

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 36/44] KVM: x86: Do compatibility checks when onlining CPU

On Thu, Nov 03, 2022, Paolo Bonzini wrote:
> On 11/3/22 00:19, Sean Christopherson wrote:
> > From: Chao Gao<[email protected]>
> >
> > Do compatibility checks when enabling hardware to effectively add
> > compatibility checks when onlining a CPU. Abort enabling, i.e. the
> > online process, if the (hotplugged) CPU is incompatible with the known
> > good setup.
>
> This paragraph is not true with this patch being before "KVM: Rename and
> move CPUHP_AP_KVM_STARTING to ONLINE section".

Argh, good eyes. Getting the ordering correct in this series has been quite the
struggle. Assuming there are no subtle dependencies between x86 and common KVM,
the ordering should be something like this:

KVM: Opt out of generic hardware enabling on s390 and PPC
KVM: Register syscore (suspend/resume) ops early in kvm_init()
KVM: x86: Do compatibility checks when onlining CPU
KVM: SVM: Check for SVM support in CPU compatibility checks
KVM: VMX: Shuffle support checks and hardware enabling code around
KVM: x86: Do VMX/SVM support checks directly in vendor code
KVM: x86: Unify pr_fmt to use module name for all KVM modules
KVM: x86: Use KBUILD_MODNAME to specify vendor module name
KVM: Make hardware_enable_failed a local variable in the "enable all" path
KVM: Use a per-CPU variable to track which CPUs have enabled virtualization
KVM: Remove on_each_cpu(hardware_disable_nolock) in kvm_exit()
KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock
KVM: Disable CPU hotplug during hardware enabling
KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section
KVM: Drop kvm_arch_check_processor_compat() hook

2022-11-03 18:46:21

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 33/44] KVM: x86: Do VMX/SVM support checks directly in vendor code

On Thu, Nov 03, 2022, Paolo Bonzini wrote:
> On 11/3/22 00:19, Sean Christopherson wrote:
> > + if (!boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
> > + !boot_cpu_has(X86_FEATURE_VMX)) {
> > + pr_err("VMX not enabled in MSR_IA32_FEAT_CTL\n");
> > + return false;
>
> I think the reference to the BIOS should remain in these messages and in
> svm.c (even though these days it's much less common for vendors to default
> to disabled virtualization in the system setup).

Ya, I'll figure out a way to mention BIOS/firmware.

> The check for X86_FEATURE_MSR_IA32_FEAT_CTL is not needed because
> init_ia32_feat_ctl() will clear X86_FEATURE_VMX if the rdmsr fail (and not
> set X86_FEATURE_MSR_IA32_FEAT_CTL).

It's technically required. IA32_FEAT_CTL, and thus KVM_INTEL, depends on any of
CPU_SUP_{INTEL,CENTAUR,ZHAOXIN}, but init_ia32_feat_ctl() is invoked if and only
if the actual CPU type matches one of the aforementioned CPU_SUP_*.

E.g. running a kernel built with

CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
# CONFIG_CPU_SUP_HYGON is not set
# CONFIG_CPU_SUP_CENTAUR is not set
# CONFIG_CPU_SUP_ZHAOXIN is not set

on a Centaur or Zhaoxin CPU will leave X86_FEATURE_VMX set but not set
X86_FEATURE_MSR_IA32_FEAT_CTL. If VMX isn't enabled in MSR_IA32_FEAT_CTL, KVM
will get unexpected #UDs when trying to enable VMX.
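
The scenario can be sketched in userspace as follows. This is a deliberately simplified model: the structs and function names are hypothetical, and the real init_ia32_feat_ctl() handles more cases (e.g. a failing rdmsr, locking the MSR) than the single "vendor init ran" flag used here.

```c
#include <stdbool.h>

/* Hypothetical description of the CPU a kernel boots on. */
struct model_cpu {
	bool cpuid_vmx;		/* CPUID advertises VMX */
	bool vendor_init_ran;	/* vendor init code, and thus FEAT_CTL setup, ran */
	bool msr_vmx_enabled;	/* firmware enabled VMX in the MSR */
};

/* The two feature flags the check in the patch looks at. */
struct model_caps {
	bool vmx;
	bool feat_ctl;
};

static struct model_caps model_detect(const struct model_cpu *cpu)
{
	/* VMX starts out as whatever CPUID reports. */
	struct model_caps caps = { .vmx = cpu->cpuid_vmx, .feat_ctl = false };

	/* Only the vendor init path sets FEAT_CTL and sanitizes VMX. */
	if (cpu->vendor_init_ran) {
		caps.feat_ctl = true;
		if (!cpu->msr_vmx_enabled)
			caps.vmx = false;
	}
	return caps;
}

/* Check that looks at the VMX flag alone. */
static bool model_vmx_ok_vmx_only(struct model_caps caps)
{
	return caps.vmx;
}

/* Check that requires both flags, as in the patch. */
static bool model_vmx_ok_both(struct model_caps caps)
{
	return caps.feat_ctl && caps.vmx;
}
```

For a CPU whose vendor init never ran and whose firmware left VMX disabled, the VMX-only check passes while the two-flag check correctly refuses.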

2022-11-03 19:25:53

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 33/44] KVM: x86: Do VMX/SVM support checks directly in vendor code

On Thu, Nov 03, 2022, Paolo Bonzini wrote:
> On 11/3/22 19:35, Sean Christopherson wrote:
> > It's technically required. IA32_FEAT_CTL and thus KVM_INTEL depends on any of
> > CPU_SUP_{INTEL,CENTAUR,ZHAOXIN}, but init_ia32_feat_ctl() is invoked if and only
> > if the actual CPU type matches one of the aforementioned CPU_SUP_*.
> >
> > E.g. running a kernel built with
> >
> > CONFIG_CPU_SUP_INTEL=y
> > CONFIG_CPU_SUP_AMD=y
> > # CONFIG_CPU_SUP_HYGON is not set
> > # CONFIG_CPU_SUP_CENTAUR is not set
> > # CONFIG_CPU_SUP_ZHAOXIN is not set
> >
> > on a Centaur or Zhaoxin CPU will leave X86_FEATURE_VMX set but not set
> > X86_FEATURE_MSR_IA32_FEAT_CTL. If VMX isn't enabled in MSR_IA32_FEAT_CTL, KVM
> > will get unexpected #UDs when trying to enable VMX.
>
> Oh, I see. Perhaps X86_FEATURE_VMX and X86_FEATURE_SGX should be moved to
> one of the software words instead of using cpuid. Nothing that you should
> care about for this series though.

Or maybe something like this?

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3e508f239098..ebe617ab0b37 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -191,6 +191,8 @@ static void default_init(struct cpuinfo_x86 *c)
 			strcpy(c->x86_model_id, "386");
 	}
 #endif
+
+	clear_cpu_cap(c, X86_FEATURE_MSR_IA32_FEAT_CTL);
 }

 static const struct cpu_dev default_cpu = {
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index c881bcafba7d..3a7ae67f5a5e 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -72,6 +72,8 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_AVX512_FP16, X86_FEATURE_AVX512BW },
 	{ X86_FEATURE_ENQCMD, X86_FEATURE_XSAVES },
 	{ X86_FEATURE_PER_THREAD_MBA, X86_FEATURE_MBA },
+	{ X86_FEATURE_VMX, X86_FEATURE_MSR_IA32_FEAT_CTL },
+	{ X86_FEATURE_SGX, X86_FEATURE_MSR_IA32_FEAT_CTL },
 	{ X86_FEATURE_SGX_LC, X86_FEATURE_SGX },
 	{ X86_FEATURE_SGX1, X86_FEATURE_SGX },
 	{ X86_FEATURE_SGX2, X86_FEATURE_SGX1 },
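
For readers unfamiliar with dependency tables of this kind, the idea can be modeled in userspace roughly as below: clearing one feature also clears everything that depends on it, directly or transitively. The feature indices, struct, and function are hypothetical, and the loop is only an approximation of how the kernel propagates a clear, not a copy of it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature indices; the real X86_FEATURE_* values differ. */
enum model_feature {
	F_FEAT_CTL,	/* stands in for X86_FEATURE_MSR_IA32_FEAT_CTL */
	F_VMX,
	F_SGX,
	F_SGX_LC,
	F_NR
};

/* One table row: "feature" is only valid if "depends" is set. */
struct model_dep {
	enum model_feature feature;
	enum model_feature depends;
};

static const struct model_dep model_deps[] = {
	{ F_VMX,    F_FEAT_CTL },
	{ F_SGX,    F_FEAT_CTL },
	{ F_SGX_LC, F_SGX },
};

/* Clear feature f, then clear every feature whose dependency is now unset. */
static void model_clear_cap(bool caps[F_NR], enum model_feature f)
{
	bool changed;
	size_t i;

	caps[f] = false;

	/* Repeat until a full pass over the table clears nothing new. */
	do {
		changed = false;
		for (i = 0; i < sizeof(model_deps) / sizeof(model_deps[0]); i++) {
			const struct model_dep *d = &model_deps[i];

			if (!caps[d->depends] && caps[d->feature]) {
				caps[d->feature] = false;
				changed = true;
			}
		}
	} while (changed);
}
```

In this model, clearing F_FEAT_CTL takes F_VMX and F_SGX with it, and F_SGX_LC follows because it depends on F_SGX.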


2022-11-03 19:50:49

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 33/44] KVM: x86: Do VMX/SVM support checks directly in vendor code

On 11/3/22 19:35, Sean Christopherson wrote:
> It's technically required. IA32_FEAT_CTL and thus KVM_INTEL depends on any of
> CPU_SUP_{INTEL,CENTAUR,ZHAOXIN}, but init_ia32_feat_ctl() is invoked if and only
> if the actual CPU type matches one of the aforementioned CPU_SUP_*.
>
> E.g. running a kernel built with
>
> CONFIG_CPU_SUP_INTEL=y
> CONFIG_CPU_SUP_AMD=y
> # CONFIG_CPU_SUP_HYGON is not set
> # CONFIG_CPU_SUP_CENTAUR is not set
> # CONFIG_CPU_SUP_ZHAOXIN is not set
>
> on a Centaur or Zhaoxin CPU will leave X86_FEATURE_VMX set but not set
> X86_FEATURE_MSR_IA32_FEAT_CTL. If VMX isn't enabled in MSR_IA32_FEAT_CTL, KVM
> will get unexpected #UDs when trying to enable VMX.

Oh, I see. Perhaps X86_FEATURE_VMX and X86_FEATURE_SGX should be moved
to one of the software words instead of using cpuid. Nothing that you
should care about for this series though.

Paolo


2022-11-03 22:18:21

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [PATCH 36/44] KVM: x86: Do compatibility checks when onlining CPU

On Wed, Nov 02, 2022 at 11:19:03PM +0000,
Sean Christopherson <[email protected]> wrote:

> From: Chao Gao <[email protected]>
>
> Do compatibility checks when enabling hardware to effectively add
> compatibility checks when onlining a CPU. Abort enabling, i.e. the
> online process, if the (hotplugged) CPU is incompatible with the known
> good setup.
>
> At init time, KVM does compatibility checks to ensure that all online
> CPUs support hardware virtualization and a common set of features. But
> KVM uses hotplugged CPUs without such compatibility checks. On Intel
> CPUs, this leads to #GP if the hotplugged CPU doesn't support VMX, or
> VM-Entry failure if the hotplugged CPU doesn't support all features
> enabled by KVM.
>
> Note, this is little more than a NOP on SVM, as SVM already checks for
> full SVM support during hardware enabling.
>
> Opportunistically add a pr_err() if setup_vmcs_config() fails, and
> tweak all error messages to output which CPU failed.
>
> Signed-off-by: Chao Gao <[email protected]>
> Co-developed-by: Sean Christopherson <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/svm/svm.c | 20 +++++++++++---------
> arch/x86/kvm/vmx/vmx.c | 33 +++++++++++++++++++--------------
> arch/x86/kvm/x86.c | 5 +++--
> 4 files changed, 34 insertions(+), 26 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f223c845ed6e..c99222b71fcc 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1666,7 +1666,7 @@ struct kvm_x86_nested_ops {
> };
>
> struct kvm_x86_init_ops {
> - int (*check_processor_compatibility)(void);
> + int (*check_processor_compatibility)(int cpu);

Is this cpu argument used only so that the error messages can include the CPU
number without repeating raw_smp_processor_id() in every pr_err()?
The actual check is done on the current executing cpu.

If cpu != raw_smp_processor_id(), cpu is wrong. Although the function is called
in non-preemptive context, it's a bit confusing. So voting to remove it and
to use.

Thanks,
--
Isaku Yamahata <[email protected]>

2022-11-03 22:59:31

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 36/44] KVM: x86: Do compatibility checks when onlining CPU

On Thu, Nov 03, 2022, Isaku Yamahata wrote:
> On Wed, Nov 02, 2022 at 11:19:03PM +0000,
> Sean Christopherson <[email protected]> wrote:
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index f223c845ed6e..c99222b71fcc 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1666,7 +1666,7 @@ struct kvm_x86_nested_ops {
> > };
> >
> > struct kvm_x86_init_ops {
> > - int (*check_processor_compatibility)(void);
> > + int (*check_processor_compatibility)(int cpu);
>
> Is this cpu argument used only for error message to include cpu number
> with avoiding repeating raw_smp_processor_id() in pr_err()?

Yep.

> The actual check is done on the current executing cpu.
>
> If cpu != raw_smp_processor_id(), cpu is wrong. Although the function is called
> in non-preemptive context, it's a bit confusing. So voting to remove it and
> to use.

What if I rename the param to this_cpu? I 100% agree the argument is confusing
as-is, but forcing all the helpers to manually grab the cpu is quite annoying.
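
A small userspace sketch of the pattern under discussion: the top-level check reads the current CPU once and hands it to each helper as this_cpu, so the helpers can report it without looking it up themselves. All names are hypothetical, and model_current_cpu merely stands in for raw_smp_processor_id().

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for raw_smp_processor_id(); fixed for the sake of the sketch. */
static int model_current_cpu = 2;

/* Helper: reports a missing feature using the caller-supplied CPU number. */
static int model_check_feature(int this_cpu, const char *name, int present,
			       char *msg, size_t len)
{
	if (present)
		return 0;

	snprintf(msg, len, "%s missing on CPU %d", name, this_cpu);
	return -1;
}

/* Top-level check: grabs the CPU number once and passes it down. */
static int model_check_compat(char *msg, size_t len)
{
	int this_cpu = model_current_cpu;

	if (model_check_feature(this_cpu, "feature A", 1, msg, len))
		return -1;

	return model_check_feature(this_cpu, "feature B", 0, msg, len);
}
```

The name this_cpu signals that the value must describe the CPU the check is running on, which is the source of confusion with a plain cpu parameter.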

2022-11-04 01:06:20

by Chao Gao

[permalink] [raw]
Subject: Re: [PATCH 02/44] KVM: Initialize IRQ FD after arch hardware setup

On Wed, Nov 02, 2022 at 11:18:29PM +0000, Sean Christopherson wrote:
>
>+ r = kvm_irqfd_init();
>+ if (r)
>+ goto err_irqfd;
>+
> r = kvm_async_pf_init();
> if (r)
>- goto out_free_4;
>+ goto err_async_pf;
>
> kvm_chardev_ops.owner = module;
>
>@@ -5927,6 +5926,9 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
> kvm_vfio_ops_exit();
> err_vfio:
> kvm_async_pf_deinit();
>+err_async_pf:
>+ kvm_irqfd_exit();

>+err_irqfd:
> out_free_4:

Do you mind removing one of the two labels?

2022-11-04 07:12:23

by Yuan Yao

[permalink] [raw]
Subject: Re: [PATCH 08/44] KVM: x86: Move hardware setup/unsetup to init/exit

On Wed, Nov 02, 2022 at 11:18:35PM +0000, Sean Christopherson wrote:
> Now that kvm_arch_hardware_setup() is called immediately after
> kvm_arch_init(), fold the guts of kvm_arch_hardware_(un)setup() into
> kvm_arch_{init,exit}() as a step towards dropping one of the hooks.
>
> To avoid having to unwind various setup, e.g registration of several
> notifiers, slot in the vendor hardware setup before the registration of
> said notifiers and callbacks. Introducing a functional change while
> moving code is less than ideal, but the alternative is adding a pile of
> unwinding code, which is much more error prone, e.g. several attempts to
> move the setup code verbatim all introduced bugs.
>
> Add a comment to document that kvm_ops_update() is effectively the point
> of no return, e.g. it sets the kvm_x86_ops.hardware_enable canary and so
> needs to be unwound.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/x86.c | 121 +++++++++++++++++++++++----------------------
> 1 file changed, 63 insertions(+), 58 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9a7702b1c563..80ee580a9cd4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9252,6 +9252,24 @@ static struct notifier_block pvclock_gtod_notifier = {
> };
> #endif
>
> +static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
> +{
> + memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
> +
> +#define __KVM_X86_OP(func) \
> + static_call_update(kvm_x86_##func, kvm_x86_ops.func);
> +#define KVM_X86_OP(func) \
> + WARN_ON(!kvm_x86_ops.func); __KVM_X86_OP(func)
> +#define KVM_X86_OP_OPTIONAL __KVM_X86_OP
> +#define KVM_X86_OP_OPTIONAL_RET0(func) \
> + static_call_update(kvm_x86_##func, (void *)kvm_x86_ops.func ? : \
> + (void *)__static_call_return0);
> +#include <asm/kvm-x86-ops.h>
> +#undef __KVM_X86_OP
> +
> + kvm_pmu_ops_update(ops->pmu_ops);
> +}
> +
> int kvm_arch_init(void *opaque)
> {
> struct kvm_x86_init_ops *ops = opaque;
> @@ -9325,6 +9343,24 @@ int kvm_arch_init(void *opaque)
> kvm_caps.supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
> }
>
> + rdmsrl_safe(MSR_EFER, &host_efer);
> +
> + if (boot_cpu_has(X86_FEATURE_XSAVES))
> + rdmsrl(MSR_IA32_XSS, host_xss);
> +
> + kvm_init_pmu_capability();
> +
> + r = ops->hardware_setup();
> + if (r != 0)
> + goto out_mmu_exit;

Before this patch, a failure of ops->hardware_setup() was unwound
by kvm_arch_exit(). Do we need to keep that old behavior?

> +
> + /*
> + * Point of no return! DO NOT add error paths below this point unless
> + * absolutely necessary, as most operations from this point forward
> + * require unwinding.
> + */
> + kvm_ops_update(ops);
> +
> kvm_timer_init();
>
> if (pi_inject_timer == -1)
> @@ -9336,8 +9372,32 @@ int kvm_arch_init(void *opaque)
> set_hv_tscchange_cb(kvm_hyperv_tsc_notifier);
> #endif
>
> + kvm_register_perf_callbacks(ops->handle_intel_pt_intr);
> +
> + if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
> + kvm_caps.supported_xss = 0;
> +
> +#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
> + cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
> +#undef __kvm_cpu_cap_has
> +
> + if (kvm_caps.has_tsc_control) {
> + /*
> + * Make sure the user can only configure tsc_khz values that
> + * fit into a signed integer.
> + * A min value is not calculated because it will always
> + * be 1 on all machines.
> + */
> + u64 max = min(0x7fffffffULL,
> + __scale_tsc(kvm_caps.max_tsc_scaling_ratio, tsc_khz));
> + kvm_caps.max_guest_tsc_khz = max;
> + }
> + kvm_caps.default_tsc_scaling_ratio = 1ULL << kvm_caps.tsc_scaling_ratio_frac_bits;
> + kvm_init_msr_list();
> return 0;
>
> +out_mmu_exit:
> + kvm_mmu_vendor_module_exit();
> out_free_percpu:
> free_percpu(user_return_msrs);
> out_free_x86_emulator_cache:
> @@ -9347,6 +9407,8 @@ int kvm_arch_init(void *opaque)
>
> void kvm_arch_exit(void)
> {
> + kvm_unregister_perf_callbacks();
> +
> #ifdef CONFIG_X86_64
> if (hypervisor_is_type(X86_HYPER_MS_HYPERV))
> clear_hv_tscchange_cb();
> @@ -9362,6 +9424,7 @@ void kvm_arch_exit(void)
> irq_work_sync(&pvclock_irq_work);
> cancel_work_sync(&pvclock_gtod_work);
> #endif
> + static_call(kvm_x86_hardware_unsetup)();
> kvm_x86_ops.hardware_enable = NULL;
> kvm_mmu_vendor_module_exit();
> free_percpu(user_return_msrs);
> @@ -11922,72 +11985,14 @@ void kvm_arch_hardware_disable(void)
> drop_user_return_notifiers();
> }
>
> -static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
> -{
> - memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
> -
> -#define __KVM_X86_OP(func) \
> - static_call_update(kvm_x86_##func, kvm_x86_ops.func);
> -#define KVM_X86_OP(func) \
> - WARN_ON(!kvm_x86_ops.func); __KVM_X86_OP(func)
> -#define KVM_X86_OP_OPTIONAL __KVM_X86_OP
> -#define KVM_X86_OP_OPTIONAL_RET0(func) \
> - static_call_update(kvm_x86_##func, (void *)kvm_x86_ops.func ? : \
> - (void *)__static_call_return0);
> -#include <asm/kvm-x86-ops.h>
> -#undef __KVM_X86_OP
> -
> - kvm_pmu_ops_update(ops->pmu_ops);
> -}
> -
> int kvm_arch_hardware_setup(void *opaque)
> {
> - struct kvm_x86_init_ops *ops = opaque;
> - int r;
> -
> - rdmsrl_safe(MSR_EFER, &host_efer);
> -
> - if (boot_cpu_has(X86_FEATURE_XSAVES))
> - rdmsrl(MSR_IA32_XSS, host_xss);
> -
> - kvm_init_pmu_capability();
> -
> - r = ops->hardware_setup();
> - if (r != 0)
> - return r;
> -
> - kvm_ops_update(ops);
> -
> - kvm_register_perf_callbacks(ops->handle_intel_pt_intr);
> -
> - if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
> - kvm_caps.supported_xss = 0;
> -
> -#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
> - cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
> -#undef __kvm_cpu_cap_has
> -
> - if (kvm_caps.has_tsc_control) {
> - /*
> - * Make sure the user can only configure tsc_khz values that
> - * fit into a signed integer.
> - * A min value is not calculated because it will always
> - * be 1 on all machines.
> - */
> - u64 max = min(0x7fffffffULL,
> - __scale_tsc(kvm_caps.max_tsc_scaling_ratio, tsc_khz));
> - kvm_caps.max_guest_tsc_khz = max;
> - }
> - kvm_caps.default_tsc_scaling_ratio = 1ULL << kvm_caps.tsc_scaling_ratio_frac_bits;
> - kvm_init_msr_list();
> return 0;
> }
>
> void kvm_arch_hardware_unsetup(void)
> {
> - kvm_unregister_perf_callbacks();
>
> - static_call(kvm_x86_hardware_unsetup)();
> }
>
> int kvm_arch_check_processor_compat(void *opaque)
> --
> 2.38.1.431.g37b22c650d-goog
>

2022-11-04 07:29:25

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [PATCH 00/44] KVM: Rework kvm_init() and hardware enabling

On Wed, Nov 02, 2022 at 11:18:27PM +0000,
Sean Christopherson <[email protected]> wrote:

> Non-x86 folks, please test on hardware when possible. I made a _lot_ of
> mistakes when moving code around. Thankfully, x86 was the trickiest code
> to deal with, and I'm fairly confident that I found all the bugs I
> introduced via testing. But the number of mistakes I made and found on
> x86 makes me more than a bit worried that I screwed something up in other
> arch code.
>
> This is a continuation of Chao's series to do x86 CPU compatibility checks
> during virtualization hardware enabling[1], and of Isaku's series to try
> and clean up the hardware enabling paths so that x86 (Intel specifically)
> can temporarily enable hardware during module initialization without
> causing undue pain for other architectures[2]. It also includes one patch
> from another mini-series from Isaku that provides the less controversial
> patches[3].
>
> The main theme of this series is to kill off kvm_arch_init(),
> kvm_arch_hardware_(un)setup(), and kvm_arch_check_processor_compat(), which
> all originated in x86 code from way back when, and needlessly complicate
> both common KVM code and architecture code. E.g. many architectures don't
> mark functions/data as __init/__ro_after_init purely because kvm_init()
> isn't marked __init to support x86's separate vendor modules.
>
> The idea/hope is that with those hooks gone (moved to arch code), it will
> be easier for x86 (and other architectures) to modify their module init
> sequences as needed without having to fight common KVM code. E.g. I'm
> hoping that ARM can build on this to simplify its hardware enabling logic,
> especially the pKVM side of things.
>
> There are bug fixes throughout this series. They are more scattered than
> I would usually prefer, but getting the sequencing correct was a gigantic
> pain for many of the x86 fixes due to needing to fix common code in order
> for the x86 fix to have any meaning. And while the bugs are often fatal,
> they aren't all that interesting for most users as they either require a
> malicious admin or broken hardware, i.e. aren't likely to be encountered
> by the vast majority of KVM users. So unless someone _really_ wants a
> particular fix isolated for backporting, I'm not planning on shuffling
> patches.
>
> Tested on x86. Lightly tested on arm64. Compile tested only on all other
> architectures.

Thanks for the patch series. I rebased the TDX KVM patch series on it and it worked.
Since CPU offline needs to be rejected in some cases (to keep at least one CPU
online in a package), an arch hook for CPU offline is needed.
I can keep it in the TDX KVM patch series.

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 23c0f4bc63f1..ef7bcb845d42 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -17,6 +17,7 @@ BUILD_BUG_ON(1)
 KVM_X86_OP(hardware_enable)
 KVM_X86_OP(hardware_disable)
 KVM_X86_OP(hardware_unsetup)
+KVM_X86_OP_OPTIONAL_RET0(offline_cpu)
 KVM_X86_OP(has_emulated_msr)
 KVM_X86_OP(vcpu_after_set_cpuid)
 KVM_X86_OP(is_vm_type_supported)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 496c7c6eaff9..c420409aa96f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1468,6 +1468,7 @@ struct kvm_x86_ops {
 	int (*hardware_enable)(void);
 	void (*hardware_disable)(void);
 	void (*hardware_unsetup)(void);
+	int (*offline_cpu)(void);
 	bool (*has_emulated_msr)(struct kvm *kvm, u32 index);
 	void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2ed5a017f7bc..17c5d6a76c93 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12039,6 +12039,11 @@ void kvm_arch_hardware_disable(void)
 	drop_user_return_notifiers();
 }

+int kvm_arch_offline_cpu(unsigned int cpu)
+{
+	return static_call(kvm_x86_offline_cpu)();
+}
+
 bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
 {
 	return vcpu->kvm->arch.bsp_vcpu_id == vcpu->vcpu_id;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 620489b9aa93..4df79443fd11 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1460,6 +1460,7 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
 int kvm_arch_hardware_enable(void);
 void kvm_arch_hardware_disable(void);
 #endif
+int kvm_arch_offline_cpu(unsigned int cpu);
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f6b6dcedaa0a..f770fdc662d0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5396,16 +5396,24 @@ static void hardware_disable_nolock(void *junk)
 	__this_cpu_write(hardware_enabled, false);
 }

+__weak int kvm_arch_offline_cpu(unsigned int cpu)
+{
+	return 0;
+}
+
 static int kvm_offline_cpu(unsigned int cpu)
 {
+	int r = 0;
+
 	mutex_lock(&kvm_lock);
-	if (kvm_usage_count) {
+	r = kvm_arch_offline_cpu(cpu);
+	if (!r && kvm_usage_count) {
 		preempt_disable();
 		hardware_disable_nolock(NULL);
 		preempt_enable();
 	}
 	mutex_unlock(&kvm_lock);
-	return 0;
+	return r;
 }

 static void hardware_disable_all_nolock(void)
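
The control flow of the last hunk can be modeled in userspace as follows. This is only a sketch: locking and preemption handling are left out, the globals and the "last CPU in the package" rule are hypothetical stand-ins for whatever the TDX code would actually check, and MODEL_EBUSY merely mimics an errno value.

```c
#include <stdbool.h>

#define MODEL_EBUSY 16	/* stand-in for the kernel's EBUSY */

/* Hypothetical global state for the sketch. */
static bool model_hw_enabled = true;
static int model_usage_count = 1;
static int model_online_in_package = 1;	/* online CPUs left in the package */

/* Arch hook: refuse to offline the last online CPU of a package. */
static int model_arch_offline_cpu(unsigned int cpu)
{
	(void)cpu;
	return model_online_in_package <= 1 ? -MODEL_EBUSY : 0;
}

/* Common code: ask the arch hook first, disable hardware only on success. */
static int model_offline_cpu(unsigned int cpu)
{
	int r = model_arch_offline_cpu(cpu);

	if (!r && model_usage_count)
		model_hw_enabled = false;

	return r;
}
```

When the arch hook rejects the request, the error is returned and hardware stays enabled on that CPU; otherwise the disable proceeds as before.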

--
Isaku Yamahata <[email protected]>

2022-11-04 07:44:06

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [PATCH 36/44] KVM: x86: Do compatibility checks when onlining CPU

On Thu, Nov 03, 2022 at 10:34:10PM +0000,
Sean Christopherson <[email protected]> wrote:

> On Thu, Nov 03, 2022, Isaku Yamahata wrote:
> > On Wed, Nov 02, 2022 at 11:19:03PM +0000,
> > Sean Christopherson <[email protected]> wrote:
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index f223c845ed6e..c99222b71fcc 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1666,7 +1666,7 @@ struct kvm_x86_nested_ops {
> > > };
> > >
> > > struct kvm_x86_init_ops {
> > > - int (*check_processor_compatibility)(void);
> > > + int (*check_processor_compatibility)(int cpu);
> >
> > Is this cpu argument used only for error message to include cpu number
> > with avoiding repeating raw_smp_processor_id() in pr_err()?
>
> Yep.
>
> > The actual check is done on the current executing cpu.
> >
> > If cpu != raw_smp_processor_id(), cpu is wrong. Although the function is called
> > in non-preemptive context, it's a bit confusing. So voting to remove it and
> > to use.
>
> What if I rename the param to this_cpu? I 100% agree the argument is confusing
> as-is, but forcing all the helpers to manually grab the cpu is quite annoying.

Makes sense. Let's settle it with this_cpu.
--
Isaku Yamahata <[email protected]>

2022-11-04 08:16:17

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 33/44] KVM: x86: Do VMX/SVM support checks directly in vendor code

On 11/3/22 19:58, Sean Christopherson wrote:
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 3e508f239098..ebe617ab0b37 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -191,6 +191,8 @@ static void default_init(struct cpuinfo_x86 *c)
> strcpy(c->x86_model_id, "386");
> }
> #endif
> +
> + clear_cpu_cap(c, X86_FEATURE_MSR_IA32_FEAT_CTL);
> }
>
> static const struct cpu_dev default_cpu = {

Not needed I think? default_init does not call init_ia32_feat_ctl.

> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> index c881bcafba7d..3a7ae67f5a5e 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -72,6 +72,8 @@ static const struct cpuid_dep cpuid_deps[] = {
> { X86_FEATURE_AVX512_FP16, X86_FEATURE_AVX512BW },
> { X86_FEATURE_ENQCMD, X86_FEATURE_XSAVES },
> { X86_FEATURE_PER_THREAD_MBA, X86_FEATURE_MBA },
> + { X86_FEATURE_VMX, X86_FEATURE_MSR_IA32_FEAT_CTL },
> + { X86_FEATURE_SGX, X86_FEATURE_MSR_IA32_FEAT_CTL },
> { X86_FEATURE_SGX_LC, X86_FEATURE_SGX },
> { X86_FEATURE_SGX1, X86_FEATURE_SGX },
> { X86_FEATURE_SGX2, X86_FEATURE_SGX1 },
>

Yes, good idea.

Paolo


2022-11-04 08:57:53

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 00/44] KVM: Rework kvm_init() and hardware enabling

On 11/4/22 08:17, Isaku Yamahata wrote:
> On Wed, Nov 02, 2022 at 11:18:27PM +0000,
> Sean Christopherson <[email protected]> wrote:
>
>> Non-x86 folks, please test on hardware when possible. I made a _lot_ of
>> mistakes when moving code around. Thankfully, x86 was the trickiest code
>> to deal with, and I'm fairly confident that I found all the bugs I
>> introduced via testing. But the number of mistakes I made and found on
>> x86 makes me more than a bit worried that I screwed something up in other
>> arch code.
>>
>> This is a continuation of Chao's series to do x86 CPU compatibility checks
>> during virtualization hardware enabling[1], and of Isaku's series to try
>> and clean up the hardware enabling paths so that x86 (Intel specifically)
>> can temporarily enable hardware during module initialization without
>> causing undue pain for other architectures[2]. It also includes one patch
>> from another mini-series from Isaku that provides the less controversial
>> patches[3].
>>
>> The main theme of this series is to kill off kvm_arch_init(),
>> kvm_arch_hardware_(un)setup(), and kvm_arch_check_processor_compat(), which
>> all originated in x86 code from way back when, and needlessly complicate
>> both common KVM code and architecture code. E.g. many architectures don't
>> mark functions/data as __init/__ro_after_init purely because kvm_init()
>> isn't marked __init to support x86's separate vendor modules.
>>
>> The idea/hope is that with those hooks gone (moved to arch code), it will
>> be easier for x86 (and other architectures) to modify their module init
>> sequences as needed without having to fight common KVM code. E.g. I'm
>> hoping that ARM can build on this to simplify its hardware enabling logic,
>> especially the pKVM side of things.
>>
>> There are bug fixes throughout this series. They are more scattered than
>> I would usually prefer, but getting the sequencing correct was a gigantic
>> pain for many of the x86 fixes due to needing to fix common code in order
>> for the x86 fix to have any meaning. And while the bugs are often fatal,
>> they aren't all that interesting for most users as they either require a
>> malicious admin or broken hardware, i.e. aren't likely to be encountered
>> by the vast majority of KVM users. So unless someone _really_ wants a
>> particular fix isolated for backporting, I'm not planning on shuffling
>> patches.
>>
>> Tested on x86. Lightly tested on arm64. Compile tested only on all other
>> architectures.
>
> Thanks for the patch series. I rebased the TDX KVM patch series and it worked.
> Since cpu offline needs to be rejected in some cases (to keep at least one cpu
> on a package), an arch hook for cpu offline is needed.
> I can keep it in TDX KVM patch series.

Yes, this patch looks good.

Paolo

> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 23c0f4bc63f1..ef7bcb845d42 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -17,6 +17,7 @@ BUILD_BUG_ON(1)
> KVM_X86_OP(hardware_enable)
> KVM_X86_OP(hardware_disable)
> KVM_X86_OP(hardware_unsetup)
> +KVM_X86_OP_OPTIONAL_RET0(offline_cpu)
> KVM_X86_OP(has_emulated_msr)
> KVM_X86_OP(vcpu_after_set_cpuid)
> KVM_X86_OP(is_vm_type_supported)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 496c7c6eaff9..c420409aa96f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1468,6 +1468,7 @@ struct kvm_x86_ops {
> int (*hardware_enable)(void);
> void (*hardware_disable)(void);
> void (*hardware_unsetup)(void);
> + int (*offline_cpu)(void);
> bool (*has_emulated_msr)(struct kvm *kvm, u32 index);
> void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2ed5a017f7bc..17c5d6a76c93 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12039,6 +12039,11 @@ void kvm_arch_hardware_disable(void)
> drop_user_return_notifiers();
> }
>
> +int kvm_arch_offline_cpu(unsigned int cpu)
> +{
> + return static_call(kvm_x86_offline_cpu)();
> +}
> +
> bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
> {
> return vcpu->kvm->arch.bsp_vcpu_id == vcpu->vcpu_id;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 620489b9aa93..4df79443fd11 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1460,6 +1460,7 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
> int kvm_arch_hardware_enable(void);
> void kvm_arch_hardware_disable(void);
> #endif
> +int kvm_arch_offline_cpu(unsigned int cpu);
> int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
> bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
> int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f6b6dcedaa0a..f770fdc662d0 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5396,16 +5396,24 @@ static void hardware_disable_nolock(void *junk)
> __this_cpu_write(hardware_enabled, false);
> }
>
> +__weak int kvm_arch_offline_cpu(unsigned int cpu)
> +{
> + return 0;
> +}
> +
> static int kvm_offline_cpu(unsigned int cpu)
> {
> + int r = 0;
> +
> mutex_lock(&kvm_lock);
> - if (kvm_usage_count) {
> + r = kvm_arch_offline_cpu(cpu);
> + if (!r && kvm_usage_count) {
> preempt_disable();
> hardware_disable_nolock(NULL);
> preempt_enable();
> }
> mutex_unlock(&kvm_lock);
> - return 0;
> + return r;
> }
>
> static void hardware_disable_all_nolock(void)
>
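
The control flow of the quoted kvm_offline_cpu() change can be modeled in userspace. This is only a sketch under stated assumptions: model_offline_cpu(), arch_offline_hook, reject_cpu0() and the four-entry hw_enabled array are hypothetical stand-ins, not kernel symbols, and locking and preemption are left out. It shows the one property the patch adds: the arch hook runs first, and a non-zero return both skips the hardware disable and is propagated to the caller.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4

static int usage_count;            /* stands in for kvm_usage_count */
static bool hw_enabled[NR_CPUS];   /* stands in for the per-CPU flag */

/* Hypothetical arch hook; NULL behaves like the __weak default (return 0). */
static int (*arch_offline_hook)(unsigned int cpu);

static int model_offline_cpu(unsigned int cpu)
{
	int r = arch_offline_hook ? arch_offline_hook(cpu) : 0;

	/* Disable hardware only if the arch did not veto and KVM is in use. */
	if (!r && usage_count)
		hw_enabled[cpu] = false;
	return r;
}

/* Example policy (made up): CPU 0 may never go offline. */
static int reject_cpu0(unsigned int cpu)
{
	return cpu == 0 ? -EBUSY : 0;
}
```

With reject_cpu0 installed, offlining CPU 0 returns -EBUSY and leaves its flag set, while any other CPU is disabled as before.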


2022-11-04 15:55:26

by Sean Christopherson

Subject: Re: [PATCH 33/44] KVM: x86: Do VMX/SVM support checks directly in vendor code

On Fri, Nov 04, 2022, Paolo Bonzini wrote:
> On 11/3/22 19:58, Sean Christopherson wrote:
> >
> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index 3e508f239098..ebe617ab0b37 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -191,6 +191,8 @@ static void default_init(struct cpuinfo_x86 *c)
> > strcpy(c->x86_model_id, "386");
> > }
> > #endif
> > +
> > + clear_cpu_cap(c, X86_FEATURE_MSR_IA32_FEAT_CTL);
> > }
> > static const struct cpu_dev default_cpu = {
>
> Not needed I think? default_init does not call init_ia32_feat_ctl.

cpuid_deps is only processed by do_clear_cpu_cap(), so unless there's an explicit
"clear" action, the dependencies will not be updated. It kinda makes sense since
hardware-based features shouldn't end up with scenarios where a dependent feature
exists but the base feature does not (barring bad KVM setups :-) ).

That said, this seems like a bug waiting to happen, and unless I'm missing something
it's quite straightforward to process all dependencies during setup. Time to find
out if Boris and co. agree :-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 1a85e1fb0922..c4408d03b180 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -147,6 +147,7 @@ extern const char * const x86_bug_flags[NBUGINTS*32];

extern void setup_clear_cpu_cap(unsigned int bit);
extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
+extern void apply_cpuid_deps(struct cpuinfo_x86 *c);

#define setup_force_cpu_cap(bit) do { \
set_cpu_cap(&boot_cpu_data, bit); \
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3e508f239098..28ce31dadd7f 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1884,6 +1884,8 @@ static void identify_cpu(struct cpuinfo_x86 *c)
c->x86_capability[i] |= boot_cpu_data.x86_capability[i];
}

+ apply_cpuid_deps(c);
+
ppin_init(c);

/* Init Machine Check Exception if available. */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index c881bcafba7d..7e91e97973ca 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -138,3 +138,13 @@ void setup_clear_cpu_cap(unsigned int feature)
{
do_clear_cpu_cap(NULL, feature);
}
+
+void apply_cpuid_deps(struct cpuinfo_x86 *c)
+{
+ const struct cpuid_dep *d;
+
+ for (d = cpuid_deps; d->feature; d++) {
+ if (!cpu_has(c, d->depends))
+ clear_cpu_cap(c, d->feature);
+ }
+}
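
The idea of applying dependencies at setup time can be illustrated with a small userspace model. This is a sketch, not the kernel implementation: the FEAT_* IDs, struct dep, deps[] and apply_deps() are hypothetical names, and a plain bool array replaces the capability bitmap. Each table entry has the same shape as the quoted cpuid_deps rows, i.e. { feature, the feature it depends on }. The loop repeats until nothing changes because this model has no transitive clearing of its own.

```c
#include <stdbool.h>

/* Hypothetical feature IDs; 0 is reserved as the table terminator. */
enum { FEAT_NONE, FEAT_CTL, FEAT_VMX, FEAT_SGX, FEAT_SGX1, FEAT_SGX2, FEAT_MAX };

struct dep {
	int feature;	/* dependent feature */
	int depends;	/* base feature it requires */
};

static const struct dep deps[] = {
	{ FEAT_VMX,  FEAT_CTL  },
	{ FEAT_SGX,  FEAT_CTL  },
	{ FEAT_SGX1, FEAT_SGX  },
	{ FEAT_SGX2, FEAT_SGX1 },
	{ FEAT_NONE, FEAT_NONE },
};

/* Clear every feature whose base feature is absent; repeat until stable. */
static void apply_deps(bool caps[FEAT_MAX])
{
	bool changed;

	do {
		changed = false;
		for (const struct dep *d = deps; d->feature; d++) {
			if (caps[d->feature] && !caps[d->depends]) {
				caps[d->feature] = false;
				changed = true;
			}
		}
	} while (changed);
}
```

Starting from a set with VMX and SGX* but without the base FEAT_CTL bit, one call leaves all dependents cleared, with no explicit "clear" of the base feature.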


2022-11-04 17:05:43

by Sean Christopherson

Subject: Re: [PATCH 08/44] KVM: x86: Move hardware setup/unsetup to init/exit

On Fri, Nov 04, 2022, Yuan Yao wrote:
> On Wed, Nov 02, 2022 at 11:18:35PM +0000, Sean Christopherson wrote:
> > To avoid having to unwind various setup, e.g. registration of several
> > notifiers, slot in the vendor hardware setup before the registration of
> > said notifiers and callbacks. Introducing a functional change while
> > moving code is less than ideal, but the alternative is adding a pile of
> > unwinding code, which is much more error prone, e.g. several attempts to
> > move the setup code verbatim all introduced bugs.

...

> > @@ -9325,6 +9343,24 @@ int kvm_arch_init(void *opaque)
> > kvm_caps.supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
> > }
> >
> > + rdmsrl_safe(MSR_EFER, &host_efer);
> > +
> > + if (boot_cpu_has(X86_FEATURE_XSAVES))
> > + rdmsrl(MSR_IA32_XSS, host_xss);
> > +
> > + kvm_init_pmu_capability();
> > +
> > + r = ops->hardware_setup();
> > + if (r != 0)
> > + goto out_mmu_exit;
>
> The failure case of ops->hardware_setup() is unwound
> by kvm_arch_exit() before this patch, do we need to
> keep that old behavior ?

As called out in the changelog, the call to ops->hardware_setup() was deliberately
slotted in before the call to kvm_timer_init() so that kvm_arch_init() wouldn't
need to unwind more stuff if hardware_setup() fails.

> > + /*
> > + * Point of no return! DO NOT add error paths below this point unless
> > + * absolutely necessary, as most operations from this point forward
> > + * require unwinding.
> > + */
> > + kvm_ops_update(ops);
> > +
> > kvm_timer_init();
> >
> > if (pi_inject_timer == -1)
> > @@ -9336,8 +9372,32 @@ int kvm_arch_init(void *opaque)
> > set_hv_tscchange_cb(kvm_hyperv_tsc_notifier);
> > #endif
> >
> > + kvm_register_perf_callbacks(ops->handle_intel_pt_intr);
> > +
> > + if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
> > + kvm_caps.supported_xss = 0;
> > +
> > +#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
> > + cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
> > +#undef __kvm_cpu_cap_has
> > +
> > + if (kvm_caps.has_tsc_control) {
> > + /*
> > + * Make sure the user can only configure tsc_khz values that
> > + * fit into a signed integer.
> > + * A min value is not calculated because it will always
> > + * be 1 on all machines.
> > + */
> > + u64 max = min(0x7fffffffULL,
> > + __scale_tsc(kvm_caps.max_tsc_scaling_ratio, tsc_khz));
> > + kvm_caps.max_guest_tsc_khz = max;
> > + }
> > + kvm_caps.default_tsc_scaling_ratio = 1ULL << kvm_caps.tsc_scaling_ratio_frac_bits;
> > + kvm_init_msr_list();
> > return 0;
> >
> > +out_mmu_exit:
> > + kvm_mmu_vendor_module_exit();
> > out_free_percpu:
> > free_percpu(user_return_msrs);
> > out_free_x86_emulator_cache:
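
The ordering argument above (run the last fallible step before anything that would need unwinding) can be shown in miniature. The sketch below is a userspace model with hypothetical names (module_init_model(), mmu_ready, timer_registered, fail_hw_setup); it is not the x86 code, and the boolean flags stand in for real setup and registration work.

```c
#include <errno.h>
#include <stdbool.h>

static bool mmu_ready;		/* earlier step that has an undo */
static bool timer_registered;	/* later step with no undo in this path */
static bool fail_hw_setup;	/* test knob: make the fallible step fail */

static int hardware_setup(void)
{
	return fail_hw_setup ? -EIO : 0;
}

static int module_init_model(void)
{
	int r;

	mmu_ready = true;

	/* Last fallible step, slotted in before any registration. */
	r = hardware_setup();
	if (r)
		goto out_mmu_exit;

	/* Point of no return: nothing below fails, so nothing needs undoing. */
	timer_registered = true;
	return 0;

out_mmu_exit:
	mmu_ready = false;
	return r;
}
```

Because the registration happens only after hardware_setup() succeeds, the error path needs a single label; moving the fallible call below the registration would require an extra unwind step for each thing registered before it.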

2022-11-04 20:39:50

by Sean Christopherson

Subject: Re: [PATCH 02/44] KVM: Initialize IRQ FD after arch hardware setup

On Fri, Nov 04, 2022, Chao Gao wrote:
> On Wed, Nov 02, 2022 at 11:18:29PM +0000, Sean Christopherson wrote:
> >
> >+ r = kvm_irqfd_init();
> >+ if (r)
> >+ goto err_irqfd;
> >+
> > r = kvm_async_pf_init();
> > if (r)
> >- goto out_free_4;
> >+ goto err_async_pf;
> >
> > kvm_chardev_ops.owner = module;
> >
> >@@ -5927,6 +5926,9 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
> > kvm_vfio_ops_exit();
> > err_vfio:
> > kvm_async_pf_deinit();
> >+err_async_pf:
> >+ kvm_irqfd_exit();
>
> >+err_irqfd:
> > out_free_4:
>
> Do you mind removing one of the two labels?

Ah, I meant to tack on a patch at the very end to clean up these labels once the
dust had settled, e.g. to also resolve the "err" vs. "out" mess I created (on
purpose, because trying to describe the "out" path was frustrating and generated
too much churn).

2022-11-04 20:40:56

by Sean Christopherson

Subject: Re: [PATCH 00/44] KVM: Rework kvm_init() and hardware enabling

On Fri, Nov 04, 2022, Isaku Yamahata wrote:
> Thanks for the patch series. I rebased the TDX KVM patch series and it worked.
> Since cpu offline needs to be rejected in some cases (to keep at least one cpu
> on a package), an arch hook for cpu offline is needed.

I hate to bring this up because I doubt there's a real use case for SUSPEND with
TDX, but the CPU offline path isn't just for true offlining of CPUs. When the
system enters SUSPEND, only the initiating CPU goes through kvm_suspend()+kvm_resume();
all responding CPUs go through CPU offline+online. I.e. disallowing all CPUs from
going "offline" will prevent suspending the system.

I don't see anything in the TDX series or the specs that suggests suspend+resume
is disallowed when TDX is enabled, so blocking that seems just as wrong as
preventing software from soft-offlining CPUs.
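
The interaction described here can be reproduced with a toy model. Everything in it is an assumption made for illustration: four CPUs in two packages, an offline_cpu() policy that refuses to offline the last online CPU of a package, and a suspend_from() helper that offlines every CPU except the initiating one. Real suspend also rolls back on failure, which the sketch omits.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS		4
#define CPUS_PER_PKG	2

static bool online[NR_CPUS] = { true, true, true, true };

/* Hypothetical policy: refuse to offline the last online CPU of a package. */
static int offline_cpu(unsigned int cpu)
{
	unsigned int first = cpu / CPUS_PER_PKG * CPUS_PER_PKG;
	int others = 0;

	for (unsigned int i = first; i < first + CPUS_PER_PKG; i++)
		others += (i != cpu && online[i]);
	if (!others)
		return -EBUSY;
	online[cpu] = false;
	return 0;
}

/* Suspend model: every CPU except the initiating one is taken offline. */
static int suspend_from(unsigned int initiator)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu != initiator) {
			int r = offline_cpu(cpu);

			if (r)
				return r;
		}
	}
	return 0;
}
```

Suspending from CPU 0 succeeds for package 0, where the initiator stays online, but fails with -EBUSY on the last CPU of package 1, so the system as a whole cannot suspend under this policy.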

2022-11-07 03:23:52

by Anup Patel

Subject: Re: [PATCH 22/44] KVM: RISC-V: Do arch init directly in riscv_kvm_init()

On Thu, Nov 3, 2022 at 4:49 AM Sean Christopherson <[email protected]> wrote:
>
> Fold the guts of kvm_arch_init() into riscv_kvm_init() instead of
> bouncing through kvm_init()=>kvm_arch_init(). Functionally, this is a
> glorified nop as invoking kvm_arch_init() is the very first action
> performed by kvm_init().
>
> Moving setup to riscv_kvm_init(), which is tagged __init, will allow
> tagging more functions and data with __init and __ro_after_init. And
> emptying kvm_arch_init() will allow dropping the hook entirely once all
> architecture implementations are nops.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>

For KVM RISC-V:
Acked-by: Anup Patel <[email protected]>

Regards,
Anup

> ---
> arch/riscv/kvm/main.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
> index a146fa0ce4d2..cb063b8a9a0f 100644
> --- a/arch/riscv/kvm/main.c
> +++ b/arch/riscv/kvm/main.c
> @@ -66,6 +66,15 @@ void kvm_arch_hardware_disable(void)
> }
>
> int kvm_arch_init(void *opaque)
> +{
> + return 0;
> +}
> +
> +void kvm_arch_exit(void)
> +{
> +}
> +
> +static int __init riscv_kvm_init(void)
> {
> const char *str;
>
> @@ -110,15 +119,6 @@ int kvm_arch_init(void *opaque)
>
> kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits());
>
> - return 0;
> -}
> -
> -void kvm_arch_exit(void)
> -{
> -}
> -
> -static int __init riscv_kvm_init(void)
> -{
> return kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> }
> module_init(riscv_kvm_init);
> --
> 2.38.1.431.g37b22c650d-goog
>

2022-11-07 03:36:18

by Anup Patel

Subject: Re: [PATCH 44/44] KVM: Opt out of generic hardware enabling on s390 and PPC

On Thu, Nov 3, 2022 at 4:50 AM Sean Christopherson <[email protected]> wrote:
>
> Allow architectures to opt out of the generic hardware enabling logic,
> and opt out on both s390 and PPC, which don't need to manually enable
> virtualization as it's always on (when available).
>
> In addition to letting s390 and PPC drop a bit of dead code, this will
> hopefully also allow ARM to clean up its related code, e.g. ARM has its
> own per-CPU flag to track which CPUs have enabled hardware due to the
> need to keep hardware enabled indefinitely when pKVM is enabled.
>
> Signed-off-by: Sean Christopherson <[email protected]>

For KVM RISC-V:
Acked-by: Anup Patel <[email protected]>

Thanks,
Anup

> ---
> arch/arm64/kvm/Kconfig | 1 +
> arch/mips/kvm/Kconfig | 1 +
> arch/powerpc/include/asm/kvm_host.h | 1 -
> arch/powerpc/kvm/powerpc.c | 5 -----
> arch/riscv/kvm/Kconfig | 1 +
> arch/s390/include/asm/kvm_host.h | 1 -
> arch/s390/kvm/kvm-s390.c | 6 ------
> arch/x86/kvm/Kconfig | 1 +
> include/linux/kvm_host.h | 4 ++++
> virt/kvm/Kconfig | 3 +++
> virt/kvm/kvm_main.c | 30 +++++++++++++++++++++++------
> 11 files changed, 35 insertions(+), 19 deletions(-)
>
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 815cc118c675..0a7d2116b27b 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -21,6 +21,7 @@ if VIRTUALIZATION
> menuconfig KVM
> bool "Kernel-based Virtual Machine (KVM) support"
> depends on HAVE_KVM
> + select KVM_GENERIC_HARDWARE_ENABLING
> select MMU_NOTIFIER
> select PREEMPT_NOTIFIERS
> select HAVE_KVM_CPU_RELAX_INTERCEPT
> diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
> index 91d197bee9c0..29e51649203b 100644
> --- a/arch/mips/kvm/Kconfig
> +++ b/arch/mips/kvm/Kconfig
> @@ -28,6 +28,7 @@ config KVM
> select MMU_NOTIFIER
> select SRCU
> select INTERVAL_TREE
> + select KVM_GENERIC_HARDWARE_ENABLING
> help
> Support for hosting Guest kernels.
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 0a80e80c7b9e..959f566a455c 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -876,7 +876,6 @@ struct kvm_vcpu_arch {
> #define __KVM_HAVE_ARCH_WQP
> #define __KVM_HAVE_CREATE_DEVICE
>
> -static inline void kvm_arch_hardware_disable(void) {}
> static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
> static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 51268be60dac..ed426c9ee0e9 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -436,11 +436,6 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
> }
> EXPORT_SYMBOL_GPL(kvmppc_ld);
>
> -int kvm_arch_hardware_enable(void)
> -{
> - return 0;
> -}
> -
> int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> {
> struct kvmppc_ops *kvm_ops = NULL;
> diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig
> index f36a737d5f96..d5a658a047a7 100644
> --- a/arch/riscv/kvm/Kconfig
> +++ b/arch/riscv/kvm/Kconfig
> @@ -20,6 +20,7 @@ if VIRTUALIZATION
> config KVM
> tristate "Kernel-based Virtual Machine (KVM) support (EXPERIMENTAL)"
> depends on RISCV_SBI && MMU
> + select KVM_GENERIC_HARDWARE_ENABLING
> select MMU_NOTIFIER
> select PREEMPT_NOTIFIERS
> select KVM_MMIO
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index b1e98a9ed152..d3e4b5d7013a 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -1023,7 +1023,6 @@ extern char sie_exit;
> extern int kvm_s390_gisc_register(struct kvm *kvm, u32 gisc);
> extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc);
>
> -static inline void kvm_arch_hardware_disable(void) {}
> static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> static inline void kvm_arch_free_memslot(struct kvm *kvm,
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 949231f1393e..129c159ab5ee 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -248,12 +248,6 @@ debug_info_t *kvm_s390_dbf;
> debug_info_t *kvm_s390_dbf_uv;
>
> /* Section: not file related */
> -int kvm_arch_hardware_enable(void)
> -{
> - /* every s390 is virtualization enabled ;-) */
> - return 0;
> -}
> -
> /* forward declarations */
> static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start,
> unsigned long end);
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index fbeaa9ddef59..8e578311ca9d 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -49,6 +49,7 @@ config KVM
> select SRCU
> select INTERVAL_TREE
> select HAVE_KVM_PM_NOTIFIER if PM
> + select KVM_GENERIC_HARDWARE_ENABLING
> help
> Support hosting fully virtualized guest machines using hardware
> virtualization extensions. You will need a fairly recent
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 0b96d836a051..23c89c1e7788 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1441,8 +1441,10 @@ void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_
> static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
> #endif
>
> +#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
> int kvm_arch_hardware_enable(void);
> void kvm_arch_hardware_disable(void);
> +#endif
> int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
> bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
> int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
> @@ -2074,7 +2076,9 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
> }
> }
>
> +#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
> extern bool kvm_rebooting;
> +#endif
>
> extern unsigned int halt_poll_ns;
> extern unsigned int halt_poll_ns_grow;
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 800f9470e36b..d28df77345e1 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -86,3 +86,6 @@ config KVM_XFER_TO_GUEST_WORK
>
> config HAVE_KVM_PM_NOTIFIER
> bool
> +
> +config KVM_GENERIC_HARDWARE_ENABLING
> + bool
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 859bc27091cd..6736b36cf469 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -102,9 +102,6 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
> DEFINE_MUTEX(kvm_lock);
> LIST_HEAD(vm_list);
>
> -static DEFINE_PER_CPU(bool, hardware_enabled);
> -static int kvm_usage_count;
> -
> static struct kmem_cache *kvm_vcpu_cache;
>
> static __read_mostly struct preempt_ops kvm_preempt_ops;
> @@ -146,9 +143,6 @@ static void hardware_disable_all(void);
>
> static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>
> -__visible bool kvm_rebooting;
> -EXPORT_SYMBOL_GPL(kvm_rebooting);
> -
> #define KVM_EVENT_CREATE_VM 0
> #define KVM_EVENT_DESTROY_VM 1
> static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm);
> @@ -5005,6 +4999,13 @@ static struct miscdevice kvm_dev = {
> &kvm_chardev_ops,
> };
>
> +#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
> +__visible bool kvm_rebooting;
> +EXPORT_SYMBOL_GPL(kvm_rebooting);
> +
> +static DEFINE_PER_CPU(bool, hardware_enabled);
> +static int kvm_usage_count;
> +
> static int __hardware_enable_nolock(void)
> {
> if (__this_cpu_read(hardware_enabled))
> @@ -5171,6 +5172,17 @@ static struct syscore_ops kvm_syscore_ops = {
> .suspend = kvm_suspend,
> .resume = kvm_resume,
> };
> +#else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
> +static int hardware_enable_all(void)
> +{
> + return 0;
> +}
> +
> +static void hardware_disable_all(void)
> +{
> +
> +}
> +#endif /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
>
> static void kvm_io_bus_destroy(struct kvm_io_bus *bus)
> {
> @@ -5859,6 +5871,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
> int r;
> int cpu;
>
> +#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
> r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
> kvm_online_cpu, kvm_offline_cpu);
> if (r)
> @@ -5866,6 +5879,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
>
> register_reboot_notifier(&kvm_reboot_notifier);
> register_syscore_ops(&kvm_syscore_ops);
> +#endif
>
> /* A kmem cache lets us meet the alignment requirements of fx_save. */
> if (!vcpu_align)
> @@ -5933,9 +5947,11 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
> free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
> kmem_cache_destroy(kvm_vcpu_cache);
> out_free_3:
> +#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
> unregister_syscore_ops(&kvm_syscore_ops);
> unregister_reboot_notifier(&kvm_reboot_notifier);
> cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
> +#endif
> return r;
> }
> EXPORT_SYMBOL_GPL(kvm_init);
> @@ -5957,9 +5973,11 @@ void kvm_exit(void)
> kmem_cache_destroy(kvm_vcpu_cache);
> kvm_vfio_ops_exit();
> kvm_async_pf_deinit();
> +#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
> unregister_syscore_ops(&kvm_syscore_ops);
> unregister_reboot_notifier(&kvm_reboot_notifier);
> cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
> +#endif
> kvm_irqfd_exit();
> }
> EXPORT_SYMBOL_GPL(kvm_exit);
> --
> 2.38.1.431.g37b22c650d-goog
>
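
The opt-out mechanism in this patch is a compile-time switch with no-op stubs, which can be modeled in a few lines. In the sketch below, MODEL_GENERIC_HARDWARE_ENABLING is a hypothetical macro standing in for the Kconfig symbol, and create_vm()/destroy_vm() are made-up callers; the usage counting is simplified and has no locking.

```c
#include <stdbool.h>

/* Define MODEL_GENERIC_HARDWARE_ENABLING (hypothetical) to opt in. */
#ifdef MODEL_GENERIC_HARDWARE_ENABLING
static int usage_count;
static bool hardware_enabled;

static int hardware_enable_all(void)
{
	/* First user turns hardware on. */
	if (usage_count++ == 0)
		hardware_enabled = true;
	return 0;
}

static void hardware_disable_all(void)
{
	/* Last user turns hardware off. */
	if (--usage_count == 0)
		hardware_enabled = false;
}
#else
/* Opted out: virtualization is always on, so these are no-ops. */
static int hardware_enable_all(void)
{
	return 0;
}

static void hardware_disable_all(void)
{
}
#endif

/* Callers are identical in both configurations. */
static int create_vm(void)
{
	return hardware_enable_all();
}

static void destroy_vm(void)
{
	hardware_disable_all();
}
```

The point of the stub pair is that common code keeps calling the same two functions, while an architecture that opts out carries neither the per-CPU state nor the hotplug, reboot and syscore registrations.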

2022-11-07 03:58:51

by Anup Patel

Subject: Re: [PATCH 30/44] KVM: Drop kvm_arch_check_processor_compat() hook

On Thu, Nov 3, 2022 at 4:50 AM Sean Christopherson <[email protected]> wrote:
>
> Drop kvm_arch_check_processor_compat() and its support code now that all
> architecture implementations are nops.
>
> Signed-off-by: Sean Christopherson <[email protected]>

For KVM RISC-V:
Acked-by: Anup Patel <[email protected]>

Thanks,
Anup

> ---
> arch/arm64/kvm/arm.c | 7 +------
> arch/mips/kvm/mips.c | 7 +------
> arch/powerpc/kvm/book3s.c | 2 +-
> arch/powerpc/kvm/e500.c | 2 +-
> arch/powerpc/kvm/e500mc.c | 2 +-
> arch/powerpc/kvm/powerpc.c | 5 -----
> arch/riscv/kvm/main.c | 7 +------
> arch/s390/kvm/kvm-s390.c | 7 +------
> arch/x86/kvm/svm/svm.c | 4 ++--
> arch/x86/kvm/vmx/vmx.c | 4 ++--
> arch/x86/kvm/x86.c | 5 -----
> include/linux/kvm_host.h | 4 +---
> virt/kvm/kvm_main.c | 24 +-----------------------
> 13 files changed, 13 insertions(+), 67 deletions(-)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 75c5125b0dd3..ed1836b6f044 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -63,11 +63,6 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
> }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> -{
> - return 0;
> -}
> -
> int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> struct kvm_enable_cap *cap)
> {
> @@ -2268,7 +2263,7 @@ static __init int kvm_arm_init(void)
> * FIXME: Do something reasonable if kvm_init() fails after pKVM
> * hypervisor protection is finalized.
> */
> - err = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> + err = kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> if (err)
> goto out_subs;
>
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index 3cade648827a..36c8991b5d39 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -135,11 +135,6 @@ void kvm_arch_hardware_disable(void)
> kvm_mips_callbacks->hardware_disable();
> }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> -{
> - return 0;
> -}
> -
> extern void kvm_init_loongson_ipi(struct kvm *kvm);
>
> int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> @@ -1636,7 +1631,7 @@ static int __init kvm_mips_init(void)
>
> register_die_notifier(&kvm_mips_csr_die_notifier);
>
> - ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> + ret = kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> if (ret) {
> unregister_die_notifier(&kvm_mips_csr_die_notifier);
> return ret;
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 87283a0e33d8..57f4e7896d67 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -1052,7 +1052,7 @@ static int kvmppc_book3s_init(void)
> {
> int r;
>
> - r = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> + r = kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> if (r)
> return r;
> #ifdef CONFIG_KVM_BOOK3S_32_HANDLER
> diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c
> index 0ea61190ec04..b0f695428733 100644
> --- a/arch/powerpc/kvm/e500.c
> +++ b/arch/powerpc/kvm/e500.c
> @@ -531,7 +531,7 @@ static int __init kvmppc_e500_init(void)
> flush_icache_range(kvmppc_booke_handlers, kvmppc_booke_handlers +
> ivor[max_ivor] + handler_len);
>
> - r = kvm_init(NULL, sizeof(struct kvmppc_vcpu_e500), 0, THIS_MODULE);
> + r = kvm_init(sizeof(struct kvmppc_vcpu_e500), 0, THIS_MODULE);
> if (r)
> goto err_out;
> kvm_ops_e500.owner = THIS_MODULE;
> diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
> index 795667f7ebf0..611532a0dedc 100644
> --- a/arch/powerpc/kvm/e500mc.c
> +++ b/arch/powerpc/kvm/e500mc.c
> @@ -404,7 +404,7 @@ static int __init kvmppc_e500mc_init(void)
> */
> kvmppc_init_lpid(KVMPPC_NR_LPIDS/threads_per_core);
>
> - r = kvm_init(NULL, sizeof(struct kvmppc_vcpu_e500), 0, THIS_MODULE);
> + r = kvm_init(sizeof(struct kvmppc_vcpu_e500), 0, THIS_MODULE);
> if (r)
> goto err_out;
> kvm_ops_e500mc.owner = THIS_MODULE;
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 34278042ad27..51268be60dac 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -441,11 +441,6 @@ int kvm_arch_hardware_enable(void)
> return 0;
> }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> -{
> - return 0;
> -}
> -
> int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> {
> struct kvmppc_ops *kvm_ops = NULL;
> diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
> index 4710a6751687..34c3dece6990 100644
> --- a/arch/riscv/kvm/main.c
> +++ b/arch/riscv/kvm/main.c
> @@ -20,11 +20,6 @@ long kvm_arch_dev_ioctl(struct file *filp,
> return -EINVAL;
> }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> -{
> - return 0;
> -}
> -
> int kvm_arch_hardware_enable(void)
> {
> unsigned long hideleg, hedeleg;
> @@ -110,6 +105,6 @@ static int __init riscv_kvm_init(void)
>
> kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits());
>
> - return kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> + return kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> }
> module_init(riscv_kvm_init);
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 7c1c6d81b5d7..949231f1393e 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -254,11 +254,6 @@ int kvm_arch_hardware_enable(void)
> return 0;
> }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> -{
> - return 0;
> -}
> -
> /* forward declarations */
> static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start,
> unsigned long end);
> @@ -5654,7 +5649,7 @@ static int __init kvm_s390_init(void)
> if (r)
> return r;
>
> - r = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> + r = kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> if (r) {
> __kvm_s390_exit();
> return r;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 368b4db4b240..99c1ac2d9c84 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5144,8 +5144,8 @@ static int __init svm_init(void)
> * Common KVM initialization _must_ come last, after this, /dev/kvm is
> * exposed to userspace!
> */
> - r = kvm_init(NULL, sizeof(struct vcpu_svm),
> - __alignof__(struct vcpu_svm), THIS_MODULE);
> + r = kvm_init(sizeof(struct vcpu_svm), __alignof__(struct vcpu_svm),
> + THIS_MODULE);
> if (r)
> goto err_kvm_init;
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 26baaccb659a..25e28d368274 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8562,8 +8562,8 @@ static int __init vmx_init(void)
> * Common KVM initialization _must_ come last, after this, /dev/kvm is
> * exposed to userspace!
> */
> - r = kvm_init(NULL, sizeof(struct vcpu_vmx),
> - __alignof__(struct vcpu_vmx), THIS_MODULE);
> + r = kvm_init(sizeof(struct vcpu_vmx), __alignof__(struct vcpu_vmx),
> + THIS_MODULE);
> if (r)
> goto err_kvm_init;
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 94831f1a1d04..5b7b551ae44b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12036,11 +12036,6 @@ void kvm_arch_hardware_disable(void)
> drop_user_return_notifiers();
> }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> -{
> - return 0;
> -}
> -
> bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
> {
> return vcpu->kvm->arch.bsp_vcpu_id == vcpu->vcpu_id;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 6c2a28c4c684..0b96d836a051 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -936,8 +936,7 @@ static inline void kvm_irqfd_exit(void)
> {
> }
> #endif
> -int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
> - struct module *module);
> +int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module);
> void kvm_exit(void);
>
> void kvm_get_kvm(struct kvm *kvm);
> @@ -1444,7 +1443,6 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
>
> int kvm_arch_hardware_enable(void);
> void kvm_arch_hardware_disable(void);
> -int kvm_arch_check_processor_compat(void *opaque);
> int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
> bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
> int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 17c852cb6842..dd13af9f06d5 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5814,36 +5814,14 @@ void kvm_unregister_perf_callbacks(void)
> }
> #endif
>
> -struct kvm_cpu_compat_check {
> - void *opaque;
> - int *ret;
> -};
> -
> -static void check_processor_compat(void *data)
> +int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
> {
> - struct kvm_cpu_compat_check *c = data;
> -
> - *c->ret = kvm_arch_check_processor_compat(c->opaque);
> -}
> -
> -int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
> - struct module *module)
> -{
> - struct kvm_cpu_compat_check c;
> int r;
> int cpu;
>
> if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL))
> return -ENOMEM;
>
> - c.ret = &r;
> - c.opaque = opaque;
> - for_each_online_cpu(cpu) {
> - smp_call_function_single(cpu, check_processor_compat, &c, 1);
> - if (r < 0)
> - goto out_free_2;
> - }
> -
> r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_STARTING, "kvm/cpu:starting",
> kvm_starting_cpu, kvm_dying_cpu);
> if (r)
> --
> 2.38.1.431.g37b22c650d-goog
>

2022-11-07 04:18:18

by Anup Patel

Subject: Re: [PATCH 27/44] KVM: Drop kvm_arch_{init,exit}() hooks

On Thu, Nov 3, 2022 at 4:50 AM Sean Christopherson <[email protected]> wrote:
>
> Drop kvm_arch_init() and kvm_arch_exit() now that all implementations
> are nops.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>

For KVM RISC-V:
Acked-by: Anup Patel <[email protected]>

Thanks,
Anup

> ---
> arch/arm64/kvm/arm.c | 11 -----------
> arch/mips/kvm/mips.c | 10 ----------
> arch/powerpc/include/asm/kvm_host.h | 1 -
> arch/powerpc/kvm/powerpc.c | 5 -----
> arch/riscv/kvm/main.c | 9 ---------
> arch/s390/kvm/kvm-s390.c | 10 ----------
> arch/x86/kvm/x86.c | 10 ----------
> include/linux/kvm_host.h | 3 ---
> virt/kvm/kvm_main.c | 19 ++-----------------
> 9 files changed, 2 insertions(+), 76 deletions(-)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 6e0061eac627..75c5125b0dd3 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -2284,17 +2284,6 @@ static __init int kvm_arm_init(void)
> return err;
> }
>
> -int kvm_arch_init(void *opaque)
> -{
> - return 0;
> -}
> -
> -/* NOP: Compiling as a module not supported */
> -void kvm_arch_exit(void)
> -{
> -
> -}
> -
> static int __init early_kvm_mode_cfg(char *arg)
> {
> if (!arg)
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index ae7a24342fdf..3cade648827a 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -1010,16 +1010,6 @@ long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
> return r;
> }
>
> -int kvm_arch_init(void *opaque)
> -{
> - return 0;
> -}
> -
> -void kvm_arch_exit(void)
> -{
> -
> -}
> -
> int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
> struct kvm_sregs *sregs)
> {
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 5d2c3a487e73..0a80e80c7b9e 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -881,7 +881,6 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
> static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
> static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> -static inline void kvm_arch_exit(void) {}
> static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
>
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 36c27381a769..34278042ad27 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -2525,11 +2525,6 @@ void kvmppc_init_lpid(unsigned long nr_lpids_param)
> }
> EXPORT_SYMBOL_GPL(kvmppc_init_lpid);
>
> -int kvm_arch_init(void *opaque)
> -{
> - return 0;
> -}
> -
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ppc_instr);
>
> void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry)
> diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
> index cb063b8a9a0f..4710a6751687 100644
> --- a/arch/riscv/kvm/main.c
> +++ b/arch/riscv/kvm/main.c
> @@ -65,15 +65,6 @@ void kvm_arch_hardware_disable(void)
> csr_write(CSR_HIDELEG, 0);
> }
>
> -int kvm_arch_init(void *opaque)
> -{
> - return 0;
> -}
> -
> -void kvm_arch_exit(void)
> -{
> -}
> -
> static int __init riscv_kvm_init(void)
> {
> const char *str;
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index f6ae845bc1c1..7c1c6d81b5d7 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -533,16 +533,6 @@ static void __kvm_s390_exit(void)
> debug_unregister(kvm_s390_dbf_uv);
> }
>
> -int kvm_arch_init(void *opaque)
> -{
> - return 0;
> -}
> -
> -void kvm_arch_exit(void)
> -{
> -
> -}
> -
> /* Section: device related */
> long kvm_arch_dev_ioctl(struct file *filp,
> unsigned int ioctl, unsigned long arg)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 218707597bea..2b4530a33298 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9271,16 +9271,6 @@ static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
> kvm_pmu_ops_update(ops->pmu_ops);
> }
>
> -int kvm_arch_init(void *opaque)
> -{
> - return 0;
> -}
> -
> -void kvm_arch_exit(void)
> -{
> -
> -}
> -
> static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> {
> u64 host_pat;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9b52bd40be56..6c2a28c4c684 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1423,9 +1423,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
> struct kvm_guest_debug *dbg);
> int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu);
>
> -int kvm_arch_init(void *opaque);
> -void kvm_arch_exit(void);
> -
> void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu);
>
> void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 27ce263a80e4..17c852cb6842 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5833,20 +5833,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
> int r;
> int cpu;
>
> - /*
> - * FIXME: Get rid of kvm_arch_init(), vendor code should call arch code
> - * directly. Note, kvm_arch_init() _must_ be called before anything
> - * else as x86 relies on checks buried in kvm_arch_init() to guard
> - * against multiple calls to kvm_init().
> - */
> - r = kvm_arch_init(opaque);
> - if (r)
> - return r;
> -
> - if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL)) {
> - r = -ENOMEM;
> - goto err_hw_enabled;
> - }
> + if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL))
> + return -ENOMEM;
>
> c.ret = &r;
> c.opaque = opaque;
> @@ -5934,8 +5922,6 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
> cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
> out_free_2:
> free_cpumask_var(cpus_hardware_enabled);
> -err_hw_enabled:
> - kvm_arch_exit();
> return r;
> }
> EXPORT_SYMBOL_GPL(kvm_init);
> @@ -5963,7 +5949,6 @@ void kvm_exit(void)
> on_each_cpu(hardware_disable_nolock, NULL, 1);
> kvm_irqfd_exit();
> free_cpumask_var(cpus_hardware_enabled);
> - kvm_arch_exit();
> }
> EXPORT_SYMBOL_GPL(kvm_exit);
>
> --
> 2.38.1.431.g37b22c650d-goog
>

2022-11-07 18:59:50

by Eric Farman

Subject: Re: [PATCH 05/44] KVM: s390: Unwind kvm_arch_init() piece-by-piece() if a step fails

On Wed, 2022-11-02 at 23:18 +0000, Sean Christopherson wrote:
> In preparation for folding kvm_arch_hardware_setup() into
> kvm_arch_init(),
> unwind initialization one step at a time instead of simply calling
> kvm_arch_exit().  Using kvm_arch_exit() regardless of which
> initialization
> step failed relies on all affected state playing nice with being
> undone
> even if said state wasn't first setup.  That holds true for state
> that is
> currently configured by kvm_arch_init(), but not for state that's
> handled
> by kvm_arch_hardware_setup(), e.g. calling
> gmap_unregister_pte_notifier()
> without first registering a notifier would result in list corruption
> due
> to attempting to delete an entry that was never added to the list.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
>  arch/s390/kvm/kvm-s390.c | 21 ++++++++++++++-------
>  1 file changed, 14 insertions(+), 7 deletions(-)

Reviewed-by: Eric Farman <[email protected]>
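
The step-by-step unwinding described in the changelog can be sketched in plain userspace C. This is only an illustration under stated assumptions: the step names (register_debug(), register_notifier(), final_step()) and the fail_step knob are hypothetical and are not the s390 code. The point is that each error label undoes only what was actually set up, so a teardown helper is never called for state that was never initialized.

```c
#include <stdbool.h>

/* Hypothetical init state; names are illustrative, not the s390 code. */
static bool debug_registered;
static bool notifier_registered;
static int bogus_unregisters;	/* counts teardown of never-registered state */
static int fail_step;		/* 0 = no failure, N = make step N fail */

static int register_debug(void)
{
	if (fail_step == 1)
		return -1;
	debug_registered = true;
	return 0;
}

static void unregister_debug(void)
{
	debug_registered = false;
}

static int register_notifier(void)
{
	if (fail_step == 2)
		return -1;
	notifier_registered = true;
	return 0;
}

static void unregister_notifier(void)
{
	/* Models the list corruption case: removing what was never added. */
	if (!notifier_registered)
		bogus_unregisters++;
	notifier_registered = false;
}

static int final_step(void)
{
	return fail_step == 3 ? -1 : 0;
}

/* Unwind in reverse order, starting from the last step that succeeded. */
static int example_init(int fail_at)
{
	int r;

	fail_step = fail_at;

	r = register_debug();
	if (r)
		return r;

	r = register_notifier();
	if (r)
		goto err_debug;

	r = final_step();
	if (r)
		goto err_notifier;

	return 0;

err_notifier:
	unregister_notifier();
err_debug:
	unregister_debug();
	return r;
}
```

Calling example_init(2) makes the notifier registration fail, and the unwind path skips unregister_notifier() entirely. A single catch-all exit routine would have called it unconditionally.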

2022-11-07 19:10:15

by Eric Farman

Subject: Re: [PATCH 26/44] KVM: s390: Mark __kvm_s390_init() and its descendants as __init

On Wed, 2022-11-02 at 23:18 +0000, Sean Christopherson wrote:
> Tag __kvm_s390_init() and its unique helpers as __init.  These
> functions
> are only ever called during module_init(), but could not be tagged
> accordingly while they were invoked from the common kvm_arch_init(),
> which is not __init because of x86.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
>  arch/s390/kvm/interrupt.c | 2 +-
>  arch/s390/kvm/kvm-s390.c  | 4 ++--
>  arch/s390/kvm/kvm-s390.h  | 2 +-
>  arch/s390/kvm/pci.c       | 2 +-
>  arch/s390/kvm/pci.h       | 2 +-
>  5 files changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Eric Farman <[email protected]>

2022-11-07 19:10:56

by Eric Farman

Subject: Re: [PATCH 25/44] KVM: s390: Do s390 specific init without bouncing through kvm_init()

On Wed, 2022-11-02 at 23:18 +0000, Sean Christopherson wrote:
> Move the guts of kvm_arch_init() into a new helper,
> __kvm_s390_init(),
> and invoke the new helper directly from kvm_s390_init() instead of
> bouncing through kvm_init().  Invoking kvm_arch_init() is the very
> first action performed by kvm_init(), i.e. this is a glorified nop.
>
> Moving setup to __kvm_s390_init() will allow tagging more functions
> as
> __init, and emptying kvm_arch_init() will allow dropping the hook
> entirely once all architecture implementations are nops.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
>  arch/s390/kvm/kvm-s390.c | 29 +++++++++++++++++++++++++----
>  1 file changed, 25 insertions(+), 4 deletions(-)

Reviewed-by: Eric Farman <[email protected]>


2022-11-07 19:26:11

by Eric Farman

Subject: Re: [PATCH 30/44] KVM: Drop kvm_arch_check_processor_compat() hook

On Wed, 2022-11-02 at 23:18 +0000, Sean Christopherson wrote:
> Drop kvm_arch_check_processor_compat() and its support code now that
> all
> architecture implementations are nops.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
>  arch/arm64/kvm/arm.c       |  7 +------
>  arch/mips/kvm/mips.c       |  7 +------
>  arch/powerpc/kvm/book3s.c  |  2 +-
>  arch/powerpc/kvm/e500.c    |  2 +-
>  arch/powerpc/kvm/e500mc.c  |  2 +-
>  arch/powerpc/kvm/powerpc.c |  5 -----
>  arch/riscv/kvm/main.c      |  7 +------
>  arch/s390/kvm/kvm-s390.c   |  7 +------
>  arch/x86/kvm/svm/svm.c     |  4 ++--
>  arch/x86/kvm/vmx/vmx.c     |  4 ++--
>  arch/x86/kvm/x86.c         |  5 -----
>  include/linux/kvm_host.h   |  4 +---
>  virt/kvm/kvm_main.c        | 24 +-----------------------
>  13 files changed, 13 insertions(+), 67 deletions(-)

Reviewed-by: Eric Farman <[email protected]> # s390

2022-11-07 19:29:11

by Eric Farman

Subject: Re: [PATCH 27/44] KVM: Drop kvm_arch_{init,exit}() hooks

On Wed, 2022-11-02 at 23:18 +0000, Sean Christopherson wrote:
> Drop kvm_arch_init() and kvm_arch_exit() now that all implementations
> are nops.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
>  arch/arm64/kvm/arm.c                | 11 -----------
>  arch/mips/kvm/mips.c                | 10 ----------
>  arch/powerpc/include/asm/kvm_host.h |  1 -
>  arch/powerpc/kvm/powerpc.c          |  5 -----
>  arch/riscv/kvm/main.c               |  9 ---------
>  arch/s390/kvm/kvm-s390.c            | 10 ----------
>  arch/x86/kvm/x86.c                  | 10 ----------
>  include/linux/kvm_host.h            |  3 ---
>  virt/kvm/kvm_main.c                 | 19 ++-----------------
>  9 files changed, 2 insertions(+), 76 deletions(-)

Reviewed-by: Eric Farman <[email protected]> # s390

2022-11-07 22:04:18

by Isaku Yamahata

Subject: Re: [PATCH 00/44] KVM: Rework kvm_init() and hardware enabling

On Fri, Nov 04, 2022 at 08:27:14PM +0000,
Sean Christopherson <[email protected]> wrote:

> On Fri, Nov 04, 2022, Isaku Yamahata wrote:
> > Thanks for the patch series. I the rebased TDX KVM patch series and it worked.
> > Since cpu offline needs to be rejected in some cases(To keep at least one cpu
> > on a package), arch hook for cpu offline is needed.
>
> I hate to bring this up because I doubt there's a real use case for SUSPEND with
> TDX, but the CPU offline path isn't just for true offlining of CPUs. When the
> system enters SUSPEND, only the initiating CPU goes through kvm_suspend()+kvm_resume(),
> all responding CPUs go through CPU offline+online. I.e. disallowing all CPUs from
> going "offline" will prevent suspending the system.

The current TDX KVM implementation prevents a CPU package from going offline
only while TDs are running. If no TD is running, CPU offline is allowed. So
before SUSPEND, TDs need to be killed via systemd or something similar. After
the TDs are killed, the system can enter the SUSPEND state.


> I don't see anything in the TDX series or the specs that suggests suspend+resume
> is disallowed when TDX is enabled, so blocking that seems just as wrong as
> preventing software from soft-offlining CPUs.

SUSPEND here means suspend-to-idle, ACPI S1, S3, or S4.
suspend-to-idle doesn't require CPU offline.

Although the CPU-related specs don't mention S3, the ACPI spec says:

7.4.2.2 System _S1 State (Sleeping with Processor Context Maintained)
The processor-complex context is maintained.

7.4.2.4 System _S3 State or 7.4.2.5 System _S4 State
The processor-complex context is not maintained.

It's safe to say the processor context related to TDX is complex, I think.
Let me summarize the situation. What do you think?

- While no TD is running:
  No additional limitation on CPU offline.

- On TD creation:
  If any whole CPU package is software-offlined, TD creation fails.
  Alternative: forcibly online the necessary CPUs, create the TD, and offline
  the CPUs again.

- While a TD is running:
  Although it's not required to keep all CPU packages online, keep CPU
  packages from going offline so that TD destruction can succeed.

- On TD destruction:
  If any whole CPU package is software-offlined, TD destruction fails.
  The current implementation prevents any CPU package from going offline
  while a TD is running.
  Alternatives:
  - Forcibly online the necessary CPUs, destroy the TD, offline the CPUs
    again, and allow the CPU package to go offline.
  - Stash the TDX resources somewhere. When the CPU packages are onlined,
    free those resources.

- On SUSPEND:
  TODO: Allow CPU offline if S1 is requested.
  - suspend-to-idle: nothing to do because CPU offline isn't required.
  - ACPI S1: Need to allow offlining CPUs. This can be implemented by checking
    whether suspend_state_t pm_suspend_target_state is PM_SUSPEND_STANDBY.
  - ACPI S3/S4: refuse CPU offline. The system needs to kill all TDs before
    starting the SUSPEND process. This is what is implemented.
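
The summary above can be read as a small decision function. The following is a rough userspace sketch, not the TDX KVM code: the enum values stand in for the kernel's suspend_state_t, and the helper name tdx_may_offline_cpu() and its parameters are hypothetical. It also folds in the S1 TODO as if it were already implemented.

```c
#include <stdbool.h>

/* Stand-ins for the suspend target; assumptions made for this sketch only. */
enum suspend_target {
	TARGET_NONE,	/* not suspending: a plain CPU offline request */
	TARGET_TO_IDLE,	/* suspend-to-idle: CPUs are not offlined at all */
	TARGET_STANDBY,	/* ACPI S1: processor context is maintained */
	TARGET_MEM,	/* ACPI S3: processor context is not maintained */
	TARGET_DISK,	/* ACPI S4: processor context is not maintained */
};

/*
 * Hypothetical policy helper: returns true if offlining this CPU may proceed.
 */
static bool tdx_may_offline_cpu(bool td_running, bool last_cpu_in_package,
				enum suspend_target target)
{
	/* No TD running: no additional limitation on CPU offline. */
	if (!td_running)
		return true;

	/* Another CPU of the package stays online, so the package stays usable. */
	if (!last_cpu_in_package)
		return true;

	/* S1 keeps processor context, so offline could be allowed (the TODO). */
	if (target == TARGET_STANDBY)
		return true;

	/* Plain offline, S3, or S4 while TDs are running: refuse. */
	return false;
}
```

With this shape, an S3 request with a running TD fails when the last CPU of a package is asked to go offline. That matches the "kill all TDs before starting SUSPEND" behavior described above.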

Thanks,
--
Isaku Yamahata <[email protected]>

2022-11-08 02:03:22

by Huang, Kai

Subject: Re: [PATCH 00/44] KVM: Rework kvm_init() and hardware enabling

On Mon, 2022-11-07 at 13:46 -0800, Isaku Yamahata wrote:
> > On Fri, Nov 04, 2022, Isaku Yamahata wrote:
> > > Thanks for the patch series. I the rebased TDX KVM patch series and it
> > > worked.
> > > Since cpu offline needs to be rejected in some cases(To keep at least one
> > > cpu
> > > on a package), arch hook for cpu offline is needed.
> >
> > I hate to bring this up because I doubt there's a real use case for SUSPEND
> > with
> > TDX, but the CPU offline path isn't just for true offlining of CPUs.  When
> > the
> > system enters SUSPEND, only the initiating CPU goes through
> > kvm_suspend()+kvm_resume(),
> > all responding CPUs go through CPU offline+online.  I.e. disallowing all
> > CPUs from
> > going "offline" will prevent suspending the system.
>
> The current TDX KVM implementation disallows CPU package from offline only
> when
> TDs are running.  If no TD is running, CPU offline is allowed.  So before
> SUSPEND, TDs need to be killed via systemd or something.  After killing TDs,
> the
> system can enter into SUSPEND state.

This doesn't seem correct. You need one CPU in each package to be online in
order to create a TD as well, since TDH.MNG.KEY.CONFIG needs to be called on
all packages, correct?

2022-11-08 05:52:19

by Isaku Yamahata

Subject: Re: [PATCH 00/44] KVM: Rework kvm_init() and hardware enabling

On Tue, Nov 08, 2022 at 01:09:27AM +0000,
"Huang, Kai" <[email protected]> wrote:

> On Mon, 2022-11-07 at 13:46 -0800, Isaku Yamahata wrote:
> > > On Fri, Nov 04, 2022, Isaku Yamahata wrote:
> > > > Thanks for the patch series. I the rebased TDX KVM patch series and it
> > > > worked.
> > > > Since cpu offline needs to be rejected in some cases(To keep at least one
> > > > cpu
> > > > on a package), arch hook for cpu offline is needed.
> > >
> > > I hate to bring this up because I doubt there's a real use case for SUSPEND
> > > with
> > > TDX, but the CPU offline path isn't just for true offlining of CPUs.  When
> > > the
> > > system enters SUSPEND, only the initiating CPU goes through
> > > kvm_suspend()+kvm_resume(),
> > > all responding CPUs go through CPU offline+online.  I.e. disallowing all
> > > CPUs from
> > > going "offline" will prevent suspending the system.
> >
> > The current TDX KVM implementation disallows CPU package from offline only
> > when
> > TDs are running.  If no TD is running, CPU offline is allowed.  So before
> > SUSPEND, TDs need to be killed via systemd or something.  After killing TDs,
> > the
> > system can enter into SUSPEND state.
>
> This seems not correct. You need one cpu for each to be online in order to
> create TD as well, as TDH.MNG.KEY.CONFIG needs to be called on all packages,
> correct?

That's correct. In that case, TD creation fails. TD creation checks that at
least one CPU is online in every CPU package and returns an error if not.
--
Isaku Yamahata <[email protected]>

2022-11-08 09:11:27

by Huang, Kai

Subject: Re: [PATCH 00/44] KVM: Rework kvm_init() and hardware enabling

On Mon, 2022-11-07 at 21:43 -0800, Isaku Yamahata wrote:
> On Tue, Nov 08, 2022 at 01:09:27AM +0000,
> "Huang, Kai" <[email protected]> wrote:
>
> > On Mon, 2022-11-07 at 13:46 -0800, Isaku Yamahata wrote:
> > > > On Fri, Nov 04, 2022, Isaku Yamahata wrote:
> > > > > Thanks for the patch series. I the rebased TDX KVM patch series and it
> > > > > worked.
> > > > > Since cpu offline needs to be rejected in some cases(To keep at least one
> > > > > cpu
> > > > > on a package), arch hook for cpu offline is needed.
> > > >
> > > > I hate to bring this up because I doubt there's a real use case for SUSPEND
> > > > with
> > > > TDX, but the CPU offline path isn't just for true offlining of CPUs.  When
> > > > the
> > > > system enters SUSPEND, only the initiating CPU goes through
> > > > kvm_suspend()+kvm_resume(),
> > > > all responding CPUs go through CPU offline+online.  I.e. disallowing all
> > > > CPUs from
> > > > going "offline" will prevent suspending the system.
> > >
> > > The current TDX KVM implementation disallows CPU package from offline only
> > > when
> > > TDs are running.  If no TD is running, CPU offline is allowed.  So before
> > > SUSPEND, TDs need to be killed via systemd or something.  After killing TDs,
> > > the
> > > system can enter into SUSPEND state.
> >
> > This seems not correct. You need one cpu for each to be online in order to
> > create TD as well, as TDH.MNG.KEY.CONFIG needs to be called on all packages,
> > correct?
>
> That's correct. In such case, the creation of TD fails. TD creation checks if
> at least one cpu is online on all CPU packages. If no, error.

I think we can just always refuse to offline the last cpu for each package when
TDX is enabled. It's simpler I guess.

2022-11-08 10:54:26

by Huang, Kai

Subject: Re: [PATCH 00/44] KVM: Rework kvm_init() and hardware enabling

On Tue, 2022-11-08 at 08:56 +0000, Huang, Kai wrote:
> On Mon, 2022-11-07 at 21:43 -0800, Isaku Yamahata wrote:
> > On Tue, Nov 08, 2022 at 01:09:27AM +0000,
> > "Huang, Kai" <[email protected]> wrote:
> >
> > > On Mon, 2022-11-07 at 13:46 -0800, Isaku Yamahata wrote:
> > > > > On Fri, Nov 04, 2022, Isaku Yamahata wrote:
> > > > > > Thanks for the patch series. I the rebased TDX KVM patch series and it
> > > > > > worked.
> > > > > > Since cpu offline needs to be rejected in some cases(To keep at least one
> > > > > > cpu
> > > > > > on a package), arch hook for cpu offline is needed.
> > > > >
> > > > > I hate to bring this up because I doubt there's a real use case for SUSPEND
> > > > > with
> > > > > TDX, but the CPU offline path isn't just for true offlining of CPUs.  When
> > > > > the
> > > > > system enters SUSPEND, only the initiating CPU goes through
> > > > > kvm_suspend()+kvm_resume(),
> > > > > all responding CPUs go through CPU offline+online.  I.e. disallowing all
> > > > > CPUs from
> > > > > going "offline" will prevent suspending the system.
> > > >
> > > > The current TDX KVM implementation disallows CPU package from offline only
> > > > when
> > > > TDs are running.  If no TD is running, CPU offline is allowed.  So before
> > > > SUSPEND, TDs need to be killed via systemd or something.  After killing TDs,
> > > > the
> > > > system can enter into SUSPEND state.
> > >
> > > This seems not correct. You need one cpu for each to be online in order to
> > > create TD as well, as TDH.MNG.KEY.CONFIG needs to be called on all packages,
> > > correct?
> >
> > That's correct. In such case, the creation of TD fails. TD creation checks if
> > at least one cpu is online on all CPU packages. If no, error.
>
> I think we can just always refuse to offline the last cpu for each package when
> TDX is enabled. It's simpler I guess.

Sorry I wasn't reading carefully. Please ignore. We need to support suspend :)

2022-11-08 18:12:59

by Sean Christopherson

Subject: Re: [PATCH 00/44] KVM: Rework kvm_init() and hardware enabling

On Mon, Nov 07, 2022, Isaku Yamahata wrote:
> On Fri, Nov 04, 2022 at 08:27:14PM +0000,
> Sean Christopherson <[email protected]> wrote:
>
> > On Fri, Nov 04, 2022, Isaku Yamahata wrote:
> > > Thanks for the patch series. I the rebased TDX KVM patch series and it worked.
> > > Since cpu offline needs to be rejected in some cases(To keep at least one cpu
> > > on a package), arch hook for cpu offline is needed.
> >
> > I hate to bring this up because I doubt there's a real use case for SUSPEND with
> > TDX, but the CPU offline path isn't just for true offlining of CPUs. When the
> > system enters SUSPEND, only the initiating CPU goes through kvm_suspend()+kvm_resume(),
> > all responding CPUs go through CPU offline+online. I.e. disallowing all CPUs from
> > going "offline" will prevent suspending the system.
>
> The current TDX KVM implementation disallows CPU package from offline only when
> TDs are running. If no TD is running, CPU offline is allowed. So before
> SUSPEND, TDs need to be killed via systemd or something. After killing TDs, the
> system can enter into SUSPEND state.

Ah, I assumed offlining was disallowed if TDX was enabled.

> > I don't see anything in the TDX series or the specs that suggests suspend+resume
> > is disallowed when TDX is enabled, so blocking that seems just as wrong as
> > preventing software from soft-offlining CPUs.
>
> When it comes to SUSPEND, it means suspend-to-idle, ACPI S1, S3, or S4.
> suspend-to-idle doesn't require CPU offline.
>
> Although CPU related spec doesn't mention about S3, the ACPI spec says
>
> 7.4.2.2 System _S1 State (Sleeping with Processor Context Maintained)
> The processor-complex context is maintained.
>
> 7.4.2.4 System _S3 State or 7.4.2.5 System _S4 State
> The processor-complex context is not maintained.
>
> It's safe to say the processor context related to TDX is complex, I think.
> Let me summarize the situation. What do you think?
>
> - While no TD running:
> No additional limitation on CPU offline.
>
> - On TD creation:
> If any of whole cpu package is software offlined, TD creation fails.
> Alternative: forcibly online necessary CPUs, create TD, and offline CPUs

The alternative isn't really viable because there's no way the kernel can guarantee
a CPU can be onlined, i.e. the kernel would need to fall back to disallowing TD
creation anyway.

> - TD running:
> Although it's not required to keep all CPU packages online, keep CPU package
> from offlining for TD destruction.
>
> - TD destruction:
> If any of whole cpu package is software offlined, TD destruction fails.
> The current implementation prevents any cpu package from offlinining during
> TD running.
> Alternative:
> - forcibly online necessary CPUs, destruct TD, and offline CPUs again and
> allow CPU package to offline
> - Stash TDX resources somewhere. When cpu packages are onlined, free those
> release.
>
> - On SUSPEND:
> TODO: Allow CPU offline if S1 is requested.

Is this actually a TODO? I assume the kernel doesn't actually try to offline
CPUs in this case, i.e. it Just Works.

> - suspend-to-idle: nothing to do because cpu offline isn't required
> - ACPI S1: Need to allow offline CPUs. This can be implemented by referencing
> suspend_state_t pm_suspend_target_state is PM_SUSPEND_TO_STANBY.
> - ACPI S3/S4: refuse cpu offline. The system needs to kill all TDs before
> starting SUSPEND process. This is what is implemented.

Looks good, disallowing SUSPEND with active TDs is a reasonable tradeoff. As
above, I highly doubt anyone actually cares.

2022-11-10 01:44:19

by Huang, Kai

Subject: Re: [PATCH 38/44] KVM: Disable CPU hotplug during hardware enabling

On Wed, 2022-11-02 at 23:19 +0000, Sean Christopherson wrote:
> From: Chao Gao <[email protected]>
>
> Disable CPU hotplug during hardware_enable_all() to prevent the corner
> case where if the following sequence occurs:
>
>   1. A hotplugged CPU marks itself online in cpu_online_mask
>   2. The hotplugged CPU enables interrupt before invoking KVM's ONLINE
>      callback
>   3  hardware_enable_all() is invoked on another CPU right
>
> the hotplugged CPU will be included in on_each_cpu() and thus get sent
> through hardware_enable_nolock() before kvm_online_cpu() is called.
>
>         start_secondary { ...
>                 set_cpu_online(smp_processor_id(), true); <- 1
>                 ...
>                 local_irq_enable();  <- 2
>                 ...
>                 cpu_startup_entry(CPUHP_AP_ONLINE_IDLE); <- 3
>         }
>
> KVM currently fudges around this race by keeping track of which CPUs have
> done hardware enabling (see commit 1b6c016818a5 "KVM: Keep track of which
> cpus have virtualization enabled"), but that's an inefficient, convoluted,
> and hacky solution.
>
> Signed-off-by: Chao Gao <[email protected]>
> [sean: split to separate patch, write changelog]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
>  arch/x86/kvm/x86.c  |  8 +++++++-
>  virt/kvm/kvm_main.c | 10 ++++++++++
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a7b1d916ecb2..a15e54ba0471 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9283,7 +9283,13 @@ static int kvm_x86_check_processor_compatibility(struct kvm_x86_init_ops *ops)
>   int cpu = smp_processor_id();
>   struct cpuinfo_x86 *c = &cpu_data(cpu);
>  
> - WARN_ON(!irqs_disabled());
> + /*
> + * Compatibility checks are done when loading KVM and when enabling
> + * hardware, e.g. during CPU hotplug, to ensure all online CPUs are
> + * compatible, i.e. KVM should never perform a compatibility check on
> + * an offline CPU.
> + */
> + WARN_ON(!irqs_disabled() && cpu_active(cpu));
>  

Also, the logic of:

!irqs_disabled() && cpu_active(cpu)

is quite weird.

The original "WARN_ON(!irqs_disabled())" is reasonable because IRQs are indeed
disabled in the STARTING section.

But this doesn't make sense anymore after we move to the ONLINE section, in which
IRQs have already been enabled (see start_secondary()). IIUC, the only reason the
WARN_ON() doesn't fire is the additional cpu_active(cpu) check.

So, a more reasonable check should be something like:

WARN_ON(irqs_disabled() || cpu_active(cpu) || !cpu_online(cpu));

Or we can simply do:

WARN_ON(!cpu_online(cpu) || cpu_active(cpu));

(because I don't know whether it's possible that IRQs can somehow get disabled in
the ONLINE section).

Btw, the above is based purely on code analysis; I haven't done any testing.
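
To compare the conditions without booting anything, they can be evaluated over a few CPU states in a userspace model. This is a sketch under stated assumptions: struct cpu_state and the two helper names are hypothetical, and the states fed in (for example, "IPI on an already-active CPU with IRQs disabled" for the module-load path) are assumptions about when the check runs, not something verified on hardware.

```c
#include <stdbool.h>

/* Snapshot of the CPU state the checks look at (userspace model). */
struct cpu_state {
	bool irqs_disabled;
	bool online;
	bool active;
};

/* Condition from the patch: warn if IRQs are enabled on an active CPU. */
static bool warn_patch(struct cpu_state s)
{
	return !s.irqs_disabled && s.active;
}

/* Simpler alternative from above: warn if the CPU is offline or active. */
static bool warn_simple(struct cpu_state s)
{
	return !s.online || s.active;
}
```

For the hotplug ONLINE case (IRQs enabled, online, not yet active) neither condition warns. For a check run on an active CPU with IRQs disabled, warn_patch() stays quiet while warn_simple() fires, so the two are not interchangeable if the check is also run that way when KVM is loaded.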

2022-11-10 01:53:35

by Huang, Kai

Subject: Re: [PATCH 38/44] KVM: Disable CPU hotplug during hardware enabling

On Wed, 2022-11-02 at 23:19 +0000, Sean Christopherson wrote:
> From: Chao Gao <[email protected]>
>
> Disable CPU hotplug during hardware_enable_all() to prevent the corner
> case where if the following sequence occurs:
>
> 1. A hotplugged CPU marks itself online in cpu_online_mask
> 2. The hotplugged CPU enables interrupt before invoking KVM's ONLINE
> callback
> 3 hardware_enable_all() is invoked on another CPU right
>
> the hotplugged CPU will be included in on_each_cpu() and thus get sent
> through hardware_enable_nolock() before kvm_online_cpu() is called.
>
> start_secondary { ...
> set_cpu_online(smp_processor_id(), true); <- 1
> ...
> local_irq_enable(); <- 2
> ...
> cpu_startup_entry(CPUHP_AP_ONLINE_IDLE); <- 3
> }
>
> KVM currently fudges around this race by keeping track of which CPUs have
> done hardware enabling (see commit 1b6c016818a5 "KVM: Keep track of which
> cpus have virtualization enabled"), but that's an inefficient, convoluted,
> and hacky solution.
>
> Signed-off-by: Chao Gao <[email protected]>
> [sean: split to separate patch, write changelog]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/x86.c | 8 +++++++-
> virt/kvm/kvm_main.c | 10 ++++++++++
> 2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a7b1d916ecb2..a15e54ba0471 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9283,7 +9283,13 @@ static int kvm_x86_check_processor_compatibility(struct kvm_x86_init_ops *ops)
> int cpu = smp_processor_id();
> struct cpuinfo_x86 *c = &cpu_data(cpu);
>
> - WARN_ON(!irqs_disabled());
> + /*
> + * Compatibility checks are done when loading KVM and when enabling
> + * hardware, e.g. during CPU hotplug, to ensure all online CPUs are
> + * compatible, i.e. KVM should never perform a compatibility check on
> + * an offline CPU.
> + */
> + WARN_ON(!irqs_disabled() && cpu_active(cpu));

The comment doesn't match the code?

"KVM should never perform a compatibility check on an offline CPU" would
correspond to something like:

WARN_ON(!cpu_online(cpu));

So, should the comment be something like below?

"KVM compatibility check happens before CPU is marked as active".

>
> if (__cr4_reserved_bits(cpu_has, c) !=
> __cr4_reserved_bits(cpu_has, &boot_cpu_data))
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fd9e39c85549..4e765ef9f4bd 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5088,6 +5088,15 @@ static int hardware_enable_all(void)
> {
> int r = 0;
>
> + /*
> + * When onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
> + * is called, and so on_each_cpu() between them includes the CPU that
> + * is being onlined. As a result, hardware_enable_nolock() may get
> + * invoked before kvm_online_cpu().
> + *
> + * Disable CPU hotplug to prevent scenarios where KVM sees
> + */

The above sentence is broken.

I think the comment below, quoted from Isaku's series, should be OK?

/*
 * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
 * is called. on_each_cpu() between them includes the CPU. As a result,
 * hardware_enable_nolock() may get invoked before kvm_online_cpu().
 * This would enable hardware virtualization on that cpu without
 * compatibility checks, which can potentially crash system or break
 * running VMs.
 *
 * Disable CPU hotplug to prevent this case from happening.
 */

> + cpus_read_lock();
> raw_spin_lock(&kvm_count_lock);
>
> kvm_usage_count++;
> @@ -5102,6 +5111,7 @@ static int hardware_enable_all(void)
> }
>
> raw_spin_unlock(&kvm_count_lock);
> + cpus_read_unlock();
>
> return r;
> }

2022-11-10 02:38:06

by Huang, Kai

Subject: Re: [PATCH 38/44] KVM: Disable CPU hotplug during hardware enabling

On Thu, 2022-11-10 at 01:33 +0000, Huang, Kai wrote:
> > @@ -9283,7 +9283,13 @@ static int
> > kvm_x86_check_processor_compatibility(struct kvm_x86_init_ops *ops)
> >   int cpu = smp_processor_id();
> >   struct cpuinfo_x86 *c = &cpu_data(cpu);
> >  
> > - WARN_ON(!irqs_disabled());
> > + /*
> > + * Compatibility checks are done when loading KVM and when enabling
> > + * hardware, e.g. during CPU hotplug, to ensure all online CPUs are
> > + * compatible, i.e. KVM should never perform a compatibility check
> > on
> > + * an offline CPU.
> > + */
> > + WARN_ON(!irqs_disabled() && cpu_active(cpu));
> >  
>
> Also, the logic of:
>
> !irqs_disabled() && cpu_active(cpu)
>
> is quite weird.
>
> The original "WARN(!irqs_disabled())" is reasonable because in STARTING
> section
> the IRQ is indeed disabled.
>
> But this doesn't make sense anymore after we move to ONLINE section, in which
> IRQ has already been enabled (see start_secondary()).  IIUC the WARN_ON()
> doesn't get exploded is purely because there's an additional cpu_active(cpu)
> check.
>
> So, a more reasonable check should be something like:
>
> WARN_ON(irqs_disabled() || cpu_active(cpu) || !cpu_online(cpu));
>
> Or we can simply do:
>
> WARN_ON(!cpu_online(cpu) || cpu_active(cpu));
>
> (because I don't know whether it's possible IRQ can somehow get disabled in
> ONLINE section).
>
> Btw above is purely based on code analysis, but I haven't done any test.

Hmm.. I wasn't thinking thoroughly. I forgot that the CPU compatibility check also
happens on all online CPUs when loading KVM. In that case, IRQs are disabled and
cpu_active() is true. In the hotplug case, IRQs are enabled but cpu_active() is
false.

So WARN_ON(!irqs_disabled() && cpu_active(cpu)) looks reasonable. Sorry for the
noise. I just needed some time to connect the comment with the code.
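
To make the two cases described in this message concrete, here is a small userspace sketch of the WARN condition. It only models the reasoning as stated above; struct cpu_ctx and the three example contexts are hypothetical names invented for illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state the WARN inspects; not a kernel structure. */
struct cpu_ctx {
	bool irqs_disabled;	/* what irqs_disabled() would return */
	bool cpu_active;	/* what cpu_active(cpu) would return */
};

/* Mirrors the patch's condition: WARN_ON(!irqs_disabled() && cpu_active(cpu)). */
bool compat_warn_fires(struct cpu_ctx c)
{
	return !c.irqs_disabled && c.cpu_active;
}

/* Loading KVM: the check runs on an already-active CPU with IRQs disabled. */
const struct cpu_ctx kvm_load_ctx = { .irqs_disabled = true, .cpu_active = true };

/* Hotplug as described above: IRQs enabled, CPU not yet marked active. */
const struct cpu_ctx hotplug_ctx = { .irqs_disabled = false, .cpu_active = false };

/* The combination the WARN is meant to flag: active CPU with IRQs enabled. */
const struct cpu_ctx unexpected_ctx = { .irqs_disabled = false, .cpu_active = true };
```

Under this model the WARN stays silent for kvm_load_ctx and hotplug_ctx and fires only for unexpected_ctx.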

2022-11-10 02:38:55

by Huang, Kai

Subject: Re: [PATCH 38/44] KVM: Disable CPU hotplug during hardware enabling

On Thu, 2022-11-10 at 01:08 +0000, Huang, Kai wrote:
> > - WARN_ON(!irqs_disabled());
> > + /*
> > + * Compatibility checks are done when loading KVM and when enabling
> > + * hardware, e.g. during CPU hotplug, to ensure all online CPUs are
> > + * compatible, i.e. KVM should never perform a compatibility check
> > on
> > + * an offline CPU.
> > + */
> > + WARN_ON(!irqs_disabled() && cpu_active(cpu));
>
> Comment doesn't match with the code?
>
> "KVM should never perform a compatibility check on an offline CPU" should be
> something like:
>
> WARN_ON(!cpu_online(cpu));
>
> So, should the comment be something like below?
>
> "KVM compatibility check happens before CPU is marked as active".

Please also ignore this one, as I only thought about the hotplug case.

2022-11-10 07:42:09

by Robert Hoo

Subject: Re: [PATCH 37/44] KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section

On Wed, 2022-11-02 at 23:19 +0000, Sean Christopherson wrote:
> From: Chao Gao <[email protected]>
>
> The CPU STARTING section doesn't allow callbacks to fail. Move KVM's
> hotplug callback to ONLINE section so that it can abort onlining a
> CPU in
> certain cases to avoid potentially breaking VMs running on existing
> CPUs.
> For example, when KVM fails to enable hardware virtualization on the
> hotplugged CPU.
>
> Place KVM's hotplug state before CPUHP_AP_SCHED_WAIT_EMPTY as it
> ensures
> when offlining a CPU, all user tasks and non-pinned kernel tasks have
> left
> the CPU, i.e. there cannot be a vCPU task around. So, it is safe for
> KVM's
> CPU offline callback to disable hardware virtualization at that
> point.
> Likewise, KVM's online callback can enable hardware virtualization
> before
> any vCPU task gets a chance to run on hotplugged CPUs.
>
> Rename KVM's CPU hotplug callbacks accordingly.
>
> Suggested-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Chao Gao <[email protected]>
> Reviewed-by: Sean Christopherson <[email protected]>
> Signed-off-by: Isaku Yamahata <[email protected]>
> Reviewed-by: Yuan Yao <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> include/linux/cpuhotplug.h | 2 +-
> virt/kvm/kvm_main.c | 30 ++++++++++++++++++++++--------
> 2 files changed, 23 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 7337414e4947..de45be38dd27 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -185,7 +185,6 @@ enum cpuhp_state {
> CPUHP_AP_CSKY_TIMER_STARTING,
> CPUHP_AP_TI_GP_TIMER_STARTING,
> CPUHP_AP_HYPERV_TIMER_STARTING,
> - CPUHP_AP_KVM_STARTING,
> /* Must be the last timer callback */
> CPUHP_AP_DUMMY_TIMER_STARTING,
> CPUHP_AP_ARM_XEN_STARTING,
> @@ -200,6 +199,7 @@ enum cpuhp_state {
>
> /* Online section invoked on the hotplugged CPU from the
> hotplug thread */
> CPUHP_AP_ONLINE_IDLE,
> + CPUHP_AP_KVM_ONLINE,
> CPUHP_AP_SCHED_WAIT_EMPTY,
> CPUHP_AP_SMPBOOT_THREADS,
> CPUHP_AP_X86_VDSO_VMA_ONLINE,
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index dd13af9f06d5..fd9e39c85549 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5026,13 +5026,27 @@ static void hardware_enable_nolock(void
> *junk)
> }
> }
>
> -static int kvm_starting_cpu(unsigned int cpu)
> +static int kvm_online_cpu(unsigned int cpu)
> {
> + int ret = 0;
> +
> raw_spin_lock(&kvm_count_lock);
> - if (kvm_usage_count)
> + /*
> + * Abort the CPU online process if hardware virtualization
> cannot
> + * be enabled. Otherwise running VMs would encounter
> unrecoverable
> + * errors when scheduled to this CPU.
> + */
> + if (kvm_usage_count) {
> + WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
> +
> hardware_enable_nolock(NULL);
> + if (atomic_read(&hardware_enable_failed)) {
> + atomic_set(&hardware_enable_failed, 0);

I see other places modify hardware_enable_failed with atomic_inc();
should this use atomic_dec() instead of setting it straight to 0?
Although this is protected by the spin lock, can hardware_enable_nolock()
be invoked from other places in parallel?

Fortunately, the global hardware_enable_failed is removed by the end of
this patch set.

> + ret = -EIO;
> + }
> + }
> raw_spin_unlock(&kvm_count_lock);
> - return 0;
> + return ret;
> }
>
> static void hardware_disable_nolock(void *junk)
> @@ -5045,7 +5059,7 @@ static void hardware_disable_nolock(void *junk)
> kvm_arch_hardware_disable();
> }
>
> -static int kvm_dying_cpu(unsigned int cpu)
> +static int kvm_offline_cpu(unsigned int cpu)
> {
> raw_spin_lock(&kvm_count_lock);
> if (kvm_usage_count)
> @@ -5822,8 +5836,8 @@ int kvm_init(unsigned vcpu_size, unsigned
> vcpu_align, struct module *module)
> if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL))
> return -ENOMEM;
>
> - r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_STARTING,
> "kvm/cpu:starting",
> - kvm_starting_cpu, kvm_dying_cpu);
> + r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_ONLINE,
> "kvm/cpu:online",
> + kvm_online_cpu, kvm_offline_cpu);
> if (r)
> goto out_free_2;
> register_reboot_notifier(&kvm_reboot_notifier);
> @@ -5897,7 +5911,7 @@ int kvm_init(unsigned vcpu_size, unsigned
> vcpu_align, struct module *module)
> kmem_cache_destroy(kvm_vcpu_cache);
> out_free_3:
> unregister_reboot_notifier(&kvm_reboot_notifier);
> - cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
> + cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
> out_free_2:
> free_cpumask_var(cpus_hardware_enabled);
> return r;
> @@ -5923,7 +5937,7 @@ void kvm_exit(void)
> kvm_async_pf_deinit();
> unregister_syscore_ops(&kvm_syscore_ops);
> unregister_reboot_notifier(&kvm_reboot_notifier);
> - cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
> + cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
> on_each_cpu(hardware_disable_nolock, NULL, 1);
> kvm_irqfd_exit();
> free_cpumask_var(cpus_hardware_enabled);


2022-11-10 07:55:14

by Robert Hoo

Subject: Re: [PATCH 32/44] KVM: x86: Unify pr_fmt to use module name for all KVM modules

On Wed, 2022-11-02 at 23:18 +0000, Sean Christopherson wrote:
> Define pr_fmt using KBUILD_MODNAME for all KVM x86 code so that
> printks
> use consistent formatting across common x86, Intel, and AMD code. In
> addition to providing consistent print formatting, using
> KBUILD_MODNAME,
> e.g. kvm_amd and kvm_intel, allows referencing SVM and VMX (and SEV
> and
> SGX and ...) as technologies without generating weird messages, and
> without causing naming conflicts with other kernel code, e.g. "SEV:
> ",
> "tdx: ", "sgx: " etc.. are all used by the kernel for non-KVM
> subsystems.
>
> Opportunistically move away from printk() for prints that need to be
> modified anyways, e.g. to drop a manual "kvm: " prefix.
>
> Opportunistically convert a few SGX WARNs that are similarly modified
> to
> WARN_ONCE; in the very unlikely event that the WARNs fire, odds are
> good
> that they would fire repeatedly and spam the kernel log without
> providing
> unique information in each print.
>
> Note, defining pr_fmt yields undesirable results for code that uses
> KVM's
> printk wrappers, e.g. vcpu_unimpl(). But, that's a pre-existing
> problem
> as SVM/kvm_amd already defines a pr_fmt, and thankfully use of KVM's
> wrappers is relatively limited in KVM x86 code.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/cpuid.c | 1 +
> arch/x86/kvm/debugfs.c | 2 ++
> arch/x86/kvm/emulate.c | 1 +
> arch/x86/kvm/hyperv.c | 1 +
> arch/x86/kvm/i8254.c | 4 ++--
> arch/x86/kvm/i8259.c | 4 +++-
> arch/x86/kvm/ioapic.c | 1 +
> arch/x86/kvm/irq.c | 1 +
> arch/x86/kvm/irq_comm.c | 7 +++---
> arch/x86/kvm/kvm_onhyperv.c | 1 +
> arch/x86/kvm/lapic.c | 8 +++----
> arch/x86/kvm/mmu/mmu.c | 6 ++---
> arch/x86/kvm/mmu/page_track.c | 1 +
> arch/x86/kvm/mmu/spte.c | 4 ++--
> arch/x86/kvm/mmu/spte.h | 4 ++--
> arch/x86/kvm/mmu/tdp_iter.c | 1 +
> arch/x86/kvm/mmu/tdp_mmu.c | 1 +
> arch/x86/kvm/mtrr.c | 1 +
> arch/x86/kvm/pmu.c | 1 +
> arch/x86/kvm/smm.c | 1 +
> arch/x86/kvm/svm/avic.c | 2 +-
> arch/x86/kvm/svm/nested.c | 2 +-
> arch/x86/kvm/svm/pmu.c | 2 ++
> arch/x86/kvm/svm/sev.c | 1 +
> arch/x86/kvm/svm/svm.c | 10 ++++-----
> arch/x86/kvm/svm/svm_onhyperv.c | 1 +
> arch/x86/kvm/svm/svm_onhyperv.h | 4 ++--
> arch/x86/kvm/vmx/evmcs.c | 1 +
> arch/x86/kvm/vmx/evmcs.h | 4 +---
> arch/x86/kvm/vmx/nested.c | 3 ++-
> arch/x86/kvm/vmx/pmu_intel.c | 5 +++--
> arch/x86/kvm/vmx/posted_intr.c | 2 ++
> arch/x86/kvm/vmx/sgx.c | 5 +++--
> arch/x86/kvm/vmx/vmcs12.c | 1 +
> arch/x86/kvm/vmx/vmx.c | 40 ++++++++++++++++-----------------
> arch/x86/kvm/vmx/vmx_ops.h | 4 ++--
> arch/x86/kvm/x86.c | 28 ++++++++++++-----------
> arch/x86/kvm/xen.c | 1 +
> 38 files changed, 97 insertions(+), 70 deletions(-)
>
After this patch set, I still find some printk()s left in arch/x86/kvm/*;
consider cleaning all of them up?

arch/x86/kvm/lapic.c:1215: printk(KERN_ERR "TODO: unsupported delivery mode %x\n",
arch/x86/kvm/lapic.c:1506: printk(KERN_ERR "Local APIC read with len = %x, "
arch/x86/kvm/lapic.c:2586: printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
arch/x86/kvm/ioapic.h:95: printk(KERN_EMERG "assertion failed %s: %d: %s\n", \
arch/x86/kvm/ioapic.c:614: printk(KERN_WARNING "ioapic: wrong length %d\n", len);
arch/x86/kvm/ioapic.c:641: printk(KERN_WARNING "ioapic: Unsupported size %d\n", len);
arch/x86/kvm/mmu/mmu.c:1652: printk(KERN_ERR "%s: %p %llx\n", __func__,
arch/x86/kvm/svm/svm.c:3450: printk(KERN_ERR "%s: unexpected exit_int_info 0x%x "
arch/x86/kvm/vmx/posted_intr.c:322: printk(KERN_INFO
arch/x86/kvm/vmx/posted_intr.c:343: printk(KERN_INFO "%s: failed to update PI IRTE\n",
arch/x86/kvm/vmx/vmx.c:6507: printk(KERN_WARNING "%s: Breaking out of NMI-blocked "
arch/x86/kvm/x86.c:13027: printk(KERN_INFO "irq bypass consumer (token %p) unregistration"
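
For reference, the pr_fmt convention that the patch applies can be sketched in userspace roughly as follows. This is an illustration only: demo_pr_err() and format_unsupported_mode() are made-up names, the message text is loosely based on one of the call sites listed above, and KBUILD_MODNAME is hard-coded here whereas in the kernel it is supplied by the build system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Normally supplied by Kbuild; hard-coded here for illustration only. */
#define KBUILD_MODNAME "kvm"

/* Same shape as the kernel idiom: prefix every format with the module name. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/*
 * Hypothetical stand-in for pr_err() that formats into a buffer so the
 * result can be inspected; the real pr_err() logs through printk().
 */
#define demo_pr_err(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), __VA_ARGS__)

/* Formats a message the way a converted call site might. */
int format_unsupported_mode(char *buf, size_t len, unsigned int mode)
{
	return demo_pr_err(buf, len, "unsupported delivery mode %x\n", mode);
}
```

With the prefix coming from pr_fmt, a converted call site no longer needs to spell out a manual "kvm: " string in each message.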


2022-11-10 17:07:55

by Sean Christopherson

Subject: Re: [PATCH 37/44] KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section

On Thu, Nov 10, 2022, Robert Hoo wrote:
> > -static int kvm_starting_cpu(unsigned int cpu)
> > +static int kvm_online_cpu(unsigned int cpu)
> > {
> > + int ret = 0;
> > +
> > raw_spin_lock(&kvm_count_lock);
> > - if (kvm_usage_count)
> > + /*
> > + * Abort the CPU online process if hardware virtualization
> > cannot
> > + * be enabled. Otherwise running VMs would encounter
> > unrecoverable
> > + * errors when scheduled to this CPU.
> > + */
> > + if (kvm_usage_count) {
> > + WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
> > +
> > hardware_enable_nolock(NULL);
> > + if (atomic_read(&hardware_enable_failed)) {
> > + atomic_set(&hardware_enable_failed, 0);
>
> I see other places using this hardware_enable_failed with atomic_inc(),
> should here use atomic_dec() instead of straightly set to 0?

Meh, both options are flawed. E.g. if hardware_enable_failed was left dangling
(the WARN above), then atomic_dec() won't remedy the problem and KVM will reject
onlining CPUs indefinitely. Forcing the atomic back to '0' will remedy that
particular issue, but could lead to problems if there are other bugs.
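
The trade-off described above can be illustrated with a small userspace model using C11 atomics. This is only a sketch: the function names are hypothetical, and the counter stands in for hardware_enable_failed without reproducing any of KVM's locking.

```c
#include <assert.h>
#include <stdatomic.h>

/* Simulates an enable attempt that fails on this CPU (atomic_inc() in KVM). */
void fake_enable_fails(atomic_int *failed)
{
	atomic_fetch_add(failed, 1);
}

/* Option 1: undo only this attempt's failure, as atomic_dec() would. */
int recover_with_dec(atomic_int *failed)
{
	atomic_fetch_sub(failed, 1);
	return atomic_load(failed);
}

/* Option 2: force the counter back to zero, as atomic_set(..., 0) does. */
int recover_with_set(atomic_int *failed)
{
	atomic_store(failed, 0);
	return atomic_load(failed);
}

/*
 * Runs one failed online attempt, starting from a counter that already
 * holds 'stale' leftover failures, and returns the counter after recovery.
 */
int online_attempt(int stale, int use_dec)
{
	atomic_int failed;

	atomic_init(&failed, stale);
	fake_enable_fails(&failed);
	return use_dec ? recover_with_dec(&failed) : recover_with_set(&failed);
}
```

With a stale count of 1, the decrement variant leaves the counter at 1, so every later online attempt would still look like a failure, while the set variant clears it (and would equally hide a failure recorded by some other, buggy path).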

> Though here is embraced by spin_lock, hardware_enable_nolock() can be
> invoked in other places in parallel?

Only because of a KVM bug, which gets fixed in the next patch:

KVM: Disable CPU hotplug during hardware enabling

2022-11-10 17:13:03

by Sean Christopherson

Subject: Re: [PATCH 32/44] KVM: x86: Unify pr_fmt to use module name for all KVM modules

On Thu, Nov 10, 2022, Robert Hoo wrote:
> After this patch set, still find some printk()s left in arch/x86/kvm/*,
> consider clean all of them up?

Hmm, yeah, I suppose at this point it makes sense to tack on a patch to clean
them up.

2022-11-10 17:15:52

by Sean Christopherson

Subject: Re: [PATCH 38/44] KVM: Disable CPU hotplug during hardware enabling

On Thu, Nov 10, 2022, Huang, Kai wrote:
> On Thu, 2022-11-10 at 01:33 +0000, Huang, Kai wrote:
> > > @@ -9283,7 +9283,13 @@ static int
> > > kvm_x86_check_processor_compatibility(struct kvm_x86_init_ops *ops)
> > >   int cpu = smp_processor_id();
> > >   struct cpuinfo_x86 *c = &cpu_data(cpu);
> > >
> > > - WARN_ON(!irqs_disabled());
> > > + /*
> > > + * Compatibility checks are done when loading KVM and when enabling
> > > + * hardware, e.g. during CPU hotplug, to ensure all online CPUs are
> > > + * compatible, i.e. KVM should never perform a compatibility check
> > > on
> > > + * an offline CPU.
> > > + */
> > > + WARN_ON(!irqs_disabled() && cpu_active(cpu));
> > >
> >
> > Also, the logic of:
> >
> > !irqs_disabled() && cpu_active(cpu)
> >
> > is quite weird.
> >
> > The original "WARN(!irqs_disabled())" is reasonable because in STARTING
> > section
> > the IRQ is indeed disabled.
> >
> > But this doesn't make sense anymore after we move to ONLINE section, in which
> > > IRQ has already been enabled (see start_secondary()).  IIUC the WARN_ON()
> > doesn't get exploded is purely because there's an additional cpu_active(cpu)
> > check.
> >
> > So, a more reasonable check should be something like:
> >
> > WARN_ON(irqs_disabled() || cpu_active(cpu) || !cpu_online(cpu));
> >
> > Or we can simply do:
> >
> > WARN_ON(!cpu_online(cpu) || cpu_active(cpu));
> >
> > (because I don't know whether it's possible IRQ can somehow get disabled in
> > ONLINE section).
> >
> > Btw above is purely based on code analysis, but I haven't done any test.
>
> Hmm.. I wasn't thinking thoroughly. I forgot CPU compatibility check also
> happens on all online cpus when loading KVM. For this case, IRQ is disabled and
> cpu_active() is true. For the hotplug case, IRQ is enabled but cpu_active() is
> false.
>
> So WARN_ON(!irqs_disabled() && cpu_active(cpu)) looks reasonable. Sorry for the
> noise. Just needed some time to connect the comment with the code.

No worries, more than once while working through this code, I've considered setting
up one of those evidence boards from the movies with string and push pins to help
connect all the dots.

2022-11-11 00:20:15

by Sean Christopherson

Subject: Re: [PATCH 36/44] KVM: x86: Do compatibility checks when onlining CPU

On Fri, Nov 04, 2022, Isaku Yamahata wrote:
> On Thu, Nov 03, 2022 at 10:34:10PM +0000,
> Sean Christopherson <[email protected]> wrote:
>
> > On Thu, Nov 03, 2022, Isaku Yamahata wrote:
> > > On Wed, Nov 02, 2022 at 11:19:03PM +0000,
> > > Sean Christopherson <[email protected]> wrote:
> > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > index f223c845ed6e..c99222b71fcc 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -1666,7 +1666,7 @@ struct kvm_x86_nested_ops {
> > > > };
> > > >
> > > > struct kvm_x86_init_ops {
> > > > - int (*check_processor_compatibility)(void);
> > > > + int (*check_processor_compatibility)(int cpu);
> > >
> > > Is this cpu argument used only for error message to include cpu number
> > > with avoiding repeating raw_smp_processor_id() in pr_err()?
> >
> > Yep.
> >
> > > The actual check is done on the current executing cpu.
> > >
> > > If cpu != raw_smp_processor_id(), cpu is wrong. Although the function is called
> > > in non-preemptive context, it's a bit confusing. So voting to remove it and
> > > to use.
> >
> > What if I rename the param to this_cpu? I 100% agree the argument is confusing
> > as-is, but forcing all the helpers to manually grab the cpu is quite annoying.
>
> Makes sense. Let's settle it with this_cpu.

Finally got to actually change the code, and am not a fan of passing "this_cpu"
everywhere. It's not terrible, but it's not clearly better than just grabbing
the CPU on-demand. And while manually grabbing the CPU in the helpers is annoying,
in at least two cases the pain is just shifted to the caller.

I'm going with your original suggestion of just grabbing raw_smp_processor_id()
in the helpers that print the error message.

2022-11-15 20:25:03

by Sean Christopherson

Subject: Re: [PATCH 38/44] KVM: Disable CPU hotplug during hardware enabling

On Thu, Nov 10, 2022, Huang, Kai wrote:
> On Thu, 2022-11-10 at 01:33 +0000, Huang, Kai wrote:
> > > @@ -9283,7 +9283,13 @@ static int
> > > kvm_x86_check_processor_compatibility(struct kvm_x86_init_ops *ops)
> > >   int cpu = smp_processor_id();
> > >   struct cpuinfo_x86 *c = &cpu_data(cpu);
> > >
> > > - WARN_ON(!irqs_disabled());
> > > + /*
> > > + * Compatibility checks are done when loading KVM and when enabling
> > > + * hardware, e.g. during CPU hotplug, to ensure all online CPUs are
> > > + * compatible, i.e. KVM should never perform a compatibility check
> > > on
> > > + * an offline CPU.
> > > + */
> > > + WARN_ON(!irqs_disabled() && cpu_active(cpu));
> > >
> >
> > Also, the logic of:
> >
> > !irqs_disabled() && cpu_active(cpu)
> >
> > is quite weird.
> >
> > The original "WARN(!irqs_disabled())" is reasonable because in STARTING
> > section
> > the IRQ is indeed disabled.
> >
> > But this doesn't make sense anymore after we move to ONLINE section, in which
> > > IRQ has already been enabled (see start_secondary()).  IIUC the WARN_ON()
> > doesn't get exploded is purely because there's an additional cpu_active(cpu)
> > check.
> >
> > So, a more reasonable check should be something like:
> >
> > WARN_ON(irqs_disabled() || cpu_active(cpu) || !cpu_online(cpu));
> >
> > Or we can simply do:
> >
> > WARN_ON(!cpu_online(cpu) || cpu_active(cpu));
> >
> > (because I don't know whether it's possible IRQ can somehow get disabled in
> > ONLINE section).
> >
> > Btw above is purely based on code analysis, but I haven't done any test.
>
> Hmm.. I wasn't thinking thoroughly. I forgot CPU compatibility check also
> happens on all online cpus when loading KVM. For this case, IRQ is disabled and
> cpu_active() is true. For the hotplug case, IRQ is enabled but cpu_active() is
> false.

Actually, you're right (and wrong). You're right in that the WARN is flawed. And
the reason for that is because you're wrong about the hotplug case. In this version
of things, the compatibility checks are routed through hardware enabling, i.e. this
flow is used only when loading KVM. This helper should only be called via SMP function
call, which means that IRQs should always be disabled.

2022-11-15 20:43:01

by Sean Christopherson

Subject: Re: [PATCH 38/44] KVM: Disable CPU hotplug during hardware enabling

On Tue, Nov 15, 2022, Sean Christopherson wrote:
> On Thu, Nov 10, 2022, Huang, Kai wrote:
> > On Thu, 2022-11-10 at 01:33 +0000, Huang, Kai wrote:
> > > > @@ -9283,7 +9283,13 @@ static int
> > > > kvm_x86_check_processor_compatibility(struct kvm_x86_init_ops *ops)
> > > >   int cpu = smp_processor_id();
> > > >   struct cpuinfo_x86 *c = &cpu_data(cpu);
> > > >
> > > > - WARN_ON(!irqs_disabled());
> > > > + /*
> > > > + * Compatibility checks are done when loading KVM and when enabling
> > > > + * hardware, e.g. during CPU hotplug, to ensure all online CPUs are
> > > > + * compatible, i.e. KVM should never perform a compatibility check
> > > > on
> > > > + * an offline CPU.
> > > > + */
> > > > + WARN_ON(!irqs_disabled() && cpu_active(cpu));
> > > >
> > >
> > > Also, the logic of:
> > >
> > > !irqs_disabled() && cpu_active(cpu)
> > >
> > > is quite weird.
> > >
> > > The original "WARN(!irqs_disabled())" is reasonable because in STARTING
> > > section
> > > the IRQ is indeed disabled.
> > >
> > > But this doesn't make sense anymore after we move to ONLINE section, in which
> > > > IRQ has already been enabled (see start_secondary()).  IIUC the WARN_ON()
> > > doesn't get exploded is purely because there's an additional cpu_active(cpu)
> > > check.
> > >
> > > So, a more reasonable check should be something like:
> > >
> > > WARN_ON(irqs_disabled() || cpu_active(cpu) || !cpu_online(cpu));
> > >
> > > Or we can simply do:
> > >
> > > WARN_ON(!cpu_online(cpu) || cpu_active(cpu));
> > >
> > > (because I don't know whether it's possible IRQ can somehow get disabled in
> > > ONLINE section).
> > >
> > > Btw above is purely based on code analysis, but I haven't done any test.
> >
> > Hmm.. I wasn't thinking thoroughly. I forgot CPU compatibility check also
> > happens on all online cpus when loading KVM. For this case, IRQ is disabled and
> > cpu_active() is true. For the hotplug case, IRQ is enabled but cpu_active() is
> > false.
>
> Actually, you're right (and wrong). You're right in that the WARN is flawed. And
> the reason for that is because you're wrong about the hotplug case. In this version
> of things, the compatibility checks are routed through hardware enabling, i.e. this
> flow is used only when loading KVM. This helper should only be called via SMP function
> call, which means that IRQs should always be disabled.

Grr, but not routing through this helper is flawed in that KVM doesn't do the
CR4 checks in the hardware enabling case. Don't think that changes the WARN, but
other patches in this series need tweaks.

2022-11-15 23:38:32

by Huang, Kai

Subject: Re: [PATCH 33/44] KVM: x86: Do VMX/SVM support checks directly in vendor code

On Wed, 2022-11-02 at 23:19 +0000, Sean Christopherson wrote:
> +static bool __init kvm_is_vmx_supported(void)
> +{
> + if (!cpu_has_vmx()) {
> + pr_err("CPU doesn't support VMX\n");
> + return false;
> + }
> +
> + if (!boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
> +     !boot_cpu_has(X86_FEATURE_VMX)) {
> + pr_err("VMX not enabled in MSR_IA32_FEAT_CTL\n");
> + return false;
> + }
> +
> + return true;
> +}
> +
>  static int __init vmx_check_processor_compat(void)
>  {
>   struct vmcs_config vmcs_conf;
>   struct vmx_capability vmx_cap;
>  
> - if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
> -     !this_cpu_has(X86_FEATURE_VMX)) {
> - pr_err("VMX is disabled on CPU %d\n", smp_processor_id());
> + if (!kvm_is_vmx_supported())
>   return -EIO;
> - }
>  

Looks like there's a functional change here -- the old code checks the local CPU's
feature bits but the new code always checks the BSP's feature bits. Should be no
problem I think, though.

2022-11-16 02:11:17

by Sean Christopherson

Subject: Re: [PATCH 33/44] KVM: x86: Do VMX/SVM support checks directly in vendor code

On Tue, Nov 15, 2022, Huang, Kai wrote:
> On Wed, 2022-11-02 at 23:19 +0000, Sean Christopherson wrote:
> > +static bool __init kvm_is_vmx_supported(void)
> > +{
> > + if (!cpu_has_vmx()) {
> > + pr_err("CPU doesn't support VMX\n");
> > + return false;
> > + }
> > +
> > + if (!boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
> > +     !boot_cpu_has(X86_FEATURE_VMX)) {
> > + pr_err("VMX not enabled in MSR_IA32_FEAT_CTL\n");
> > + return false;
> > + }
> > +
> > + return true;
> > +}
> > +
> >  static int __init vmx_check_processor_compat(void)
> >  {
> >   struct vmcs_config vmcs_conf;
> >   struct vmx_capability vmx_cap;
> >
> > - if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
> > -     !this_cpu_has(X86_FEATURE_VMX)) {
> > - pr_err("VMX is disabled on CPU %d\n", smp_processor_id());
> > + if (!kvm_is_vmx_supported())
> >   return -EIO;
> > - }
> >
>
> Looks there's a functional change here -- the old code checks local cpu's
> feature bits but the new code always checks bsp's feature bits. Should have no
> problem I think, though.

Ouch. The bad check will defeat the purpose of doing compat checks. Nice catch!
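
To spell out why checking the boot CPU's bits defeats a per-CPU compatibility check, here is a minimal userspace model. The two-CPU table and every demo_* name are assumptions made up for this sketch; they only loosely mirror boot_cpu_has() and this_cpu_has().

```c
#include <assert.h>
#include <stdbool.h>

#define NR_DEMO_CPUS	2
#define DEMO_BOOT_CPU	0

/* Hypothetical per-CPU "VMX enabled" flags; CPU 1 has it disabled. */
const bool demo_vmx_enabled[NR_DEMO_CPUS] = { true, false };

/* Analogous to boot_cpu_has(): always consults the boot CPU's bits. */
bool demo_boot_cpu_has_vmx(void)
{
	return demo_vmx_enabled[DEMO_BOOT_CPU];
}

/* Analogous to this_cpu_has(): consults the CPU the check runs on. */
bool demo_this_cpu_has_vmx(int cpu)
{
	return demo_vmx_enabled[cpu];
}

/* Returns true if a compatibility check executed on 'cpu' would pass. */
bool compat_check_passes(int cpu, bool use_boot_cpu_bits)
{
	return use_boot_cpu_bits ? demo_boot_cpu_has_vmx()
				 : demo_this_cpu_has_vmx(cpu);
}
```

In this model the check on CPU 1 passes when it looks at the boot CPU's bits, even though CPU 1 itself has the feature disabled, which is exactly the case a per-CPU check is supposed to reject.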

2022-11-16 13:28:32

by Huang, Kai

Subject: Re: [PATCH 38/44] KVM: Disable CPU hotplug during hardware enabling

On Tue, 2022-11-15 at 20:16 +0000, Sean Christopherson wrote:
> On Thu, Nov 10, 2022, Huang, Kai wrote:
> > On Thu, 2022-11-10 at 01:33 +0000, Huang, Kai wrote:
> > > > @@ -9283,7 +9283,13 @@ static int
> > > > kvm_x86_check_processor_compatibility(struct kvm_x86_init_ops *ops)
> > > >   int cpu = smp_processor_id();
> > > >   struct cpuinfo_x86 *c = &cpu_data(cpu);
> > > >  
> > > > - WARN_ON(!irqs_disabled());
> > > > + /*
> > > > + * Compatibility checks are done when loading KVM and when enabling
> > > > + * hardware, e.g. during CPU hotplug, to ensure all online CPUs are
> > > > + * compatible, i.e. KVM should never perform a compatibility check
> > > > on
> > > > + * an offline CPU.
> > > > + */
> > > > + WARN_ON(!irqs_disabled() && cpu_active(cpu));
> > > >  
> > >
> > > Also, the logic of:
> > >
> > > !irqs_disabled() && cpu_active(cpu)
> > >
> > > is quite weird.
> > >
> > > The original "WARN(!irqs_disabled())" is reasonable because in STARTING
> > > section
> > > the IRQ is indeed disabled.
> > >
> > > But this doesn't make sense anymore after we move to ONLINE section, in which
> > > IRQ has already been enabled (see start_secondary()).  IIUC the WARN_ON()
> > > doesn't get exploded is purely because there's an additional cpu_active(cpu)
> > > check.
> > >
> > > So, a more reasonable check should be something like:
> > >
> > > WARN_ON(irqs_disabled() || cpu_active(cpu) || !cpu_online(cpu));
> > >
> > > Or we can simply do:
> > >
> > > WARN_ON(!cpu_online(cpu) || cpu_active(cpu));
> > >
> > > (because I don't know whether it's possible IRQ can somehow get disabled in
> > > ONLINE section).
> > >
> > > Btw above is purely based on code analysis, but I haven't done any test.
> >
> > Hmm.. I wasn't thinking thoroughly. I forgot CPU compatibility check also
> > happens on all online cpus when loading KVM. For this case, IRQ is disabled and
> > cpu_active() is true. For the hotplug case, IRQ is enabled but cpu_active() is
> > false.
>
> Actually, you're right (and wrong). You're right in that the WARN is flawed. And
> the reason for that is because you're wrong about the hotplug case. In this version
> of things, the compatibility checks are routed through hardware enabling, i.e. this
> flow is used only when loading KVM. This helper should only be called via SMP function
> call, which means that IRQs should always be disabled.

Did you mean the code change below, from the later patch "[PATCH 39/44] KVM: Drop
kvm_count_lock and instead protect kvm_usage_count with kvm_lock"?

/*
* Abort the CPU online process if hardware virtualization cannot
* be enabled. Otherwise running VMs would encounter unrecoverable
@@ -5039,13 +5039,16 @@ static int kvm_online_cpu(unsigned int cpu)
if (kvm_usage_count) {
WARN_ON_ONCE(atomic_read(&hardware_enable_failed));

+ local_irq_save(flags);
hardware_enable_nolock(NULL);
+ local_irq_restore(flags);
+

2022-11-16 18:44:18

by Sean Christopherson

Subject: Re: [PATCH 38/44] KVM: Disable CPU hotplug during hardware enabling

On Wed, Nov 16, 2022, Huang, Kai wrote:
> On Tue, 2022-11-15 at 20:16 +0000, Sean Christopherson wrote:
> > On Thu, Nov 10, 2022, Huang, Kai wrote:
> > > On Thu, 2022-11-10 at 01:33 +0000, Huang, Kai wrote:
> > > Hmm.. I wasn't thinking thoroughly. I forgot CPU compatibility check also
> > > happens on all online cpus when loading KVM. For this case, IRQ is disabled and
> > > cpu_active() is true. For the hotplug case, IRQ is enabled but cpu_active() is
> > > false.
> >
> > Actually, you're right (and wrong). You're right in that the WARN is flawed. And
> > the reason for that is because you're wrong about the hotplug case. In this version
> > of things, the compatibility checks are routed through hardware enabling, i.e. this
> > flow is used only when loading KVM. This helper should only be called via SMP function
> > call, which means that IRQs should always be disabled.
>
> Did you mean below code change in later patch "[PATCH 39/44] KVM: Drop
> kvm_count_lock and instead protect kvm_usage_count with kvm_lock"?
>
> /*
> * Abort the CPU online process if hardware virtualization cannot
> * be enabled. Otherwise running VMs would encounter unrecoverable
> @@ -5039,13 +5039,16 @@ static int kvm_online_cpu(unsigned int cpu)
> if (kvm_usage_count) {
> WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
>
> + local_irq_save(flags);
> hardware_enable_nolock(NULL);
> + local_irq_restore(flags);

Sort of. What I was saying is that in this v1, the compatibility checks that are
done during hardware enabling are initiated from vendor code, i.e. VMX and SVM call
{svm,vmx}_check_processor_compat() directly. As a result, the compat checks that
are handled in common code:

if (__cr4_reserved_bits(cpu_has, c) !=
__cr4_reserved_bits(cpu_has, &boot_cpu_data))
return -EIO;

are skipped. And if that's fixed, then the above hardware_enable_nolock() call
will bounce through kvm_x86_check_processor_compatibility() with IRQs enabled
once the KVM hotplug hook is moved to the ONLINE section.

As above, the simple "fix" would be to disable IRQs, but that's not actually
necessary. The only requirement is that preemption is disabled so that the checks
are done on the current CPU. The "IRQs disabled" check was a deliberately
aggressive WARN that was added to guard against doing compatibility checks from
the "wrong" location.

E.g. this is what I ended up with for a changelog to drop the irqs_disabled()
check and for the end code (though it's not tested yet...)

Drop kvm_x86_check_processor_compatibility()'s WARN that IRQs are
disabled, as the ONLINE section runs with IRQs disabled. The WARN wasn't
intended to be a requirement, e.g. disabling preemption is sufficient,
the IRQ thing was purely an aggressive sanity check since the helper was
only ever invoked via SMP function call.


static int kvm_x86_check_processor_compatibility(void)
{
	int cpu = smp_processor_id();
	struct cpuinfo_x86 *c = &cpu_data(cpu);

	/*
	 * Compatibility checks are done when loading KVM and when enabling
	 * hardware, e.g. during CPU hotplug, to ensure all online CPUs are
	 * compatible, i.e. KVM should never perform a compatibility check on
	 * an offline CPU.
	 */
	WARN_ON(!cpu_online(cpu));

	if (__cr4_reserved_bits(cpu_has, c) !=
	    __cr4_reserved_bits(cpu_has, &boot_cpu_data))
		return -EIO;

	return static_call(kvm_x86_check_processor_compatibility)();
}


int kvm_arch_hardware_enable(void)
{
	struct kvm *kvm;
	struct kvm_vcpu *vcpu;
	unsigned long i;
	int ret;
	u64 local_tsc;
	u64 max_tsc = 0;
	bool stable, backwards_tsc = false;

	kvm_user_return_msr_cpu_online();

	ret = kvm_x86_check_processor_compatibility();
	if (ret)
		return ret;

	ret = static_call(kvm_x86_hardware_enable)();
	if (ret != 0)
		return ret;


	....
}
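
For readers following along, the ordering above (common CR4 check first, then
the vendor check, and only then hardware enabling) can be modeled in plain
user-space C. This is only a rough sketch and not kernel code: struct cpu_model,
vendor_check_compat(), check_processor_compat() and hardware_enable() are
made-up stand-ins for the helpers quoted above, and the vendor hook is reduced
to a single flag.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpuinfo_x86: only what the sketch needs. */
struct cpu_model {
	uint64_t cr4_reserved_bits;	/* models __cr4_reserved_bits(cpu_has, c) */
	int vendor_compat_ok;		/* models the VMX/SVM-specific checks */
};

/* Models the vendor hook that the real code reaches via static_call(). */
static int vendor_check_compat(const struct cpu_model *c)
{
	return c->vendor_compat_ok ? 0 : -EIO;
}

/* Common check first, vendor check second, mirroring the helper above. */
static int check_processor_compat(const struct cpu_model *c,
				  const struct cpu_model *boot)
{
	if (c->cr4_reserved_bits != boot->cr4_reserved_bits)
		return -EIO;

	return vendor_check_compat(c);
}

/* Models kvm_arch_hardware_enable(): bail out before enabling on mismatch. */
static int hardware_enable(const struct cpu_model *c,
			   const struct cpu_model *boot, int *enabled)
{
	int ret = check_processor_compat(c, boot);

	if (ret)
		return ret;

	*enabled = 1;
	return 0;
}
```

The point of routing everything through one helper is that a CPU whose CR4
reserved bits differ from the boot CPU is rejected no matter which vendor
module is loaded, and hardware is never enabled on it.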

2022-11-17 01:56:11

by Huang, Kai

Subject: Re: [PATCH 38/44] KVM: Disable CPU hotplug during hardware enabling

On Wed, 2022-11-16 at 17:11 +0000, Sean Christopherson wrote:
> On Wed, Nov 16, 2022, Huang, Kai wrote:
> > On Tue, 2022-11-15 at 20:16 +0000, Sean Christopherson wrote:
> > > On Thu, Nov 10, 2022, Huang, Kai wrote:
> > > > On Thu, 2022-11-10 at 01:33 +0000, Huang, Kai wrote:
> > > > Hmm.. I wasn't thinking thoroughly. I forgot CPU compatibility check also
> > > > happens on all online cpus when loading KVM. For this case, IRQ is disabled and
> > > > cpu_active() is true. For the hotplug case, IRQ is enabled but cpu_active() is
> > > > false.
> > >
> > > Actually, you're right (and wrong). You're right in that the WARN is flawed. And
> > > the reason for that is because you're wrong about the hotplug case. In this version
> > > of things, the compatibility checks are routed through hardware enabling, i.e. this
> > > flow is used only when loading KVM. This helper should only be called via SMP function
> > > call, which means that IRQs should always be disabled.
> >
> > Did you mean below code change in later patch "[PATCH 39/44] KVM: Drop
> > kvm_count_lock and instead protect kvm_usage_count with kvm_lock"?
> >
> > /*
> > * Abort the CPU online process if hardware virtualization cannot
> > * be enabled. Otherwise running VMs would encounter unrecoverable
> > @@ -5039,13 +5039,16 @@ static int kvm_online_cpu(unsigned int cpu)
> > if (kvm_usage_count) {
> > WARN_ON_ONCE(atomic_read(&hardware_enable_failed));
> >
> > + local_irq_save(flags);
> > hardware_enable_nolock(NULL);
> > + local_irq_restore(flags);
>
> Sort of. What I was saying is that in this v1, the compatibility checks that are
> done during hardware enabling are initiated from vendor code, i.e. VMX and SVM call
> {svm,vmx}_check_processor_compat() directly. As a result, the compat checks that
> are handled in common code:
>
> 	if (__cr4_reserved_bits(cpu_has, c) !=
> 	    __cr4_reserved_bits(cpu_has, &boot_cpu_data))
> 		return -EIO;
>
> are skipped. And if that's fixed, then the above hardware_enable_nolock() call
> will bounce through kvm_x86_check_processor_compatibility() with IRQs enabled
> once the KVM hotplug hook is moved to the ONLINE section.

Oh I see. So you still want the kvm_x86_ops->check_processor_compatibility(),
in order to avoid duplicating the above code in SVM and VMX.

>
> As above, the simple "fix" would be to disable IRQs, but that's not actually
> necessary. The only requirement is that preemption is disabled so that the checks
> are done on the current CPU.  
>

Preemption can probably even stay enabled, as long as the compatibility check
can't be migrated to another CPU.


> The "IRQs disabled" check was a deliberately
> aggressive WARN that was added to guard against doing compatibility checks from
> the "wrong" location.
>
> E.g. this is what I ended up with for a changelog to drop the irqs_disabled()
> check and for the end code (though it's not tested yet...)
>
> Drop kvm_x86_check_processor_compatibility()'s WARN that IRQs are
> disabled, as the ONLINE section runs with IRQs disabled. The WARN wasn't
                                                 ^
                                                 enabled.

> intended to be a requirement, e.g. disabling preemption is sufficient,
> the IRQ thing was purely an aggressive sanity check since the helper was
> only ever invoked via SMP function call.
>
>
> static int kvm_x86_check_processor_compatibility(void)
> {
> 	int cpu = smp_processor_id();
> 	struct cpuinfo_x86 *c = &cpu_data(cpu);
>
> 	/*
> 	 * Compatibility checks are done when loading KVM and when enabling
> 	 * hardware, e.g. during CPU hotplug, to ensure all online CPUs are
> 	 * compatible, i.e. KVM should never perform a compatibility check on
> 	 * an offline CPU.
> 	 */
> 	WARN_ON(!cpu_online(cpu));

Looks good to me. Perhaps this also can be removed, though.

And IMHO the removal of WARN_ON(!irqs_disabled()) should be folded into the patch
"[PATCH 37/44] KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section",
because moving from the STARTING section to the ONLINE section changes the IRQ
state when the compatibility check is called.

>
> 	if (__cr4_reserved_bits(cpu_has, c) !=
> 	    __cr4_reserved_bits(cpu_has, &boot_cpu_data))
> 		return -EIO;
>
> 	return static_call(kvm_x86_check_processor_compatibility)();
> }
>
>
> int kvm_arch_hardware_enable(void)
> {
> 	struct kvm *kvm;
> 	struct kvm_vcpu *vcpu;
> 	unsigned long i;
> 	int ret;
> 	u64 local_tsc;
> 	u64 max_tsc = 0;
> 	bool stable, backwards_tsc = false;
>
> 	kvm_user_return_msr_cpu_online();
>
> 	ret = kvm_x86_check_processor_compatibility();
> 	if (ret)
> 		return ret;
>
> 	ret = static_call(kvm_x86_hardware_enable)();
> 	if (ret != 0)
> 		return ret;
>
>
> 	....
> }

2022-11-17 15:40:18

by Sean Christopherson

Subject: Re: [PATCH 38/44] KVM: Disable CPU hotplug during hardware enabling

On Thu, Nov 17, 2022, Huang, Kai wrote:
> On Wed, 2022-11-16 at 17:11 +0000, Sean Christopherson wrote:
> > static int kvm_x86_check_processor_compatibility(void)
> > {
> > 	int cpu = smp_processor_id();
> > 	struct cpuinfo_x86 *c = &cpu_data(cpu);
> >
> > 	/*
> > 	 * Compatibility checks are done when loading KVM and when enabling
> > 	 * hardware, e.g. during CPU hotplug, to ensure all online CPUs are
> > 	 * compatible, i.e. KVM should never perform a compatibility check on
> > 	 * an offline CPU.
> > 	 */
> > 	WARN_ON(!cpu_online(cpu));
>
> Looks good to me. Perhaps this also can be removed, though.

Hmm, it's a bit superfluous, but I think it could fire if KVM messed up CPU
hotplug again, e.g. if the for_each_online_cpu() => IPI raced with CPU unplug.

> And IMHO the removal of WARN_ON(!irqs_disabled()) should be folded into the patch
> "[PATCH 37/44] KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section",
> because moving from the STARTING section to the ONLINE section changes the IRQ
> state when the compatibility check is called.

Yep, that's what I have coded up, just smushed it all together here.
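
To make the "raced with CPU unplug" scenario concrete, here is a toy
user-space model of why the cpu_online() sanity check can still catch
something. It is an illustration only: model_cpu_online[], model_warn_count,
model_check_compat() and model_racy_check_all() are invented names, and the
"race" is simply a CPU being marked offline between a snapshot of the online
mask and the per-CPU check, which is an assumption about what a buggy caller
might look like rather than a description of real kernel behavior.

```c
#include <stdbool.h>

#define NR_MODEL_CPUS 4

/* Hypothetical model of the online mask and of WARN_ON() hits. */
static bool model_cpu_online[NR_MODEL_CPUS];
static int model_warn_count;

/* Models the check running for @cpu; counts a "WARN" if @cpu is offline. */
static void model_check_compat(int cpu)
{
	if (!model_cpu_online[cpu])
		model_warn_count++;	/* stands in for WARN_ON(!cpu_online(cpu)) */
}

/*
 * Models a buggy caller: snapshot the online CPUs, let @unplugged go
 * offline, then run the check for every CPU in the stale snapshot.
 */
static void model_racy_check_all(int unplugged)
{
	bool snapshot[NR_MODEL_CPUS];
	int cpu;

	for (cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		snapshot[cpu] = model_cpu_online[cpu];

	model_cpu_online[unplugged] = false;	/* the "race" */

	for (cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		if (snapshot[cpu])
			model_check_compat(cpu);
}
```

With all CPUs initially online, a call such as model_racy_check_all(2) trips
the modeled WARN once, which is the kind of regression the real WARN_ON() is
meant to flag.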

2022-11-30 23:10:45

by Sean Christopherson

Subject: Re: [PATCH 32/44] KVM: x86: Unify pr_fmt to use module name for all KVM modules

On Thu, Nov 10, 2022, Sean Christopherson wrote:
> On Thu, Nov 10, 2022, Robert Hoo wrote:
> > After this patch set, still find some printk()s left in arch/x86/kvm/*,
> > consider clean all of them up?
>
> Hmm, yeah, I suppose at this point it makes sense to tack on a patch to clean
> them up.

Actually, I'm going to pass on this for now. The series is already too big. I'll
add this to my todo list for the future.

2022-12-01 01:50:39

by Robert Hoo

Subject: Re: [PATCH 32/44] KVM: x86: Unify pr_fmt to use module name for all KVM modules

On Wed, 2022-11-30 at 23:02 +0000, Sean Christopherson wrote:
> On Thu, Nov 10, 2022, Sean Christopherson wrote:
> > On Thu, Nov 10, 2022, Robert Hoo wrote:
> > > After this patch set, still find some printk()s left in arch/x86/kvm/*,
> > > consider clean all of them up?
> >
> > Hmm, yeah, I suppose at this point it makes sense to tack on a patch to clean
> > them up.
>
> Actually, I'm going to pass on this for now. The series is already too big. I'll
> add this to my todo list for the future.

That's all right, thanks for the update.