The non-x86 part of this series is wholly contained in patch 01. Compared
to other recent kvm-wide changes, this one is very straightforward (famous
last words).
Like a few other architectures, e.g. PPC, x86 uses a set of global hooks
to call back into vendor code on demand. A handlful of the x86 hooks are
used only within the scope of kvm_init(). This series moves the init-only
hooks to a separate struct, partly to clean up the code a bit, but mainly
so that the runtime hooks can be made available only after the x86 vendor
has completed its ->hardware_setup(). While working on a different series
I spent a fair bit of time scratching my as to why a kvm_x86_ops wasn't
working, and eventually realized VMX's callback wasn't "ready" because the
vmcs_config hadn't yet been populated.
Due to lack of a cross-compiling setup, the non-x86 changes in patch 01
are untested.
v3:
- Rebase to kvm/queue, d55c9d4009c7 ("KVM: nSVM: check for EFER ... ").
Conflicts galore, but all mechanical in nature.
- Drop an SVM patch that was obsoleted by kvm/queue.
- Collect an ack. [Marc]
v2:
- Rebase to kvm/queue, 2c2787938512 ("KVM: selftests: Stop ...")
- Collect tags. [Cornelia]
- Add a patch to make kvm_x86_ops its own instance and copy
{vmx,svm}_x86_ops by value, which saves a memory access on every
invocation of a kvm_x86_ops hook. [Paolo]
- Add patches to tag {vmx,svm}_x86_ops as __initdata after they're
copied by value.
Sean Christopherson (9):
KVM: Pass kvm_init()'s opaque param to additional arch funcs
KVM: x86: Move init-only kvm_x86_ops to separate struct
KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
KVM: VMX: Configure runtime hooks using vmx_x86_ops
KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
KVM: VMX: Annotate vmx_x86_ops as __initdata
KVM: SVM: Annotate svm_x86_ops as __initdata
arch/mips/kvm/mips.c | 4 +-
arch/powerpc/kvm/powerpc.c | 4 +-
arch/s390/kvm/kvm-s390.c | 4 +-
arch/x86/include/asm/kvm_host.h | 33 +--
arch/x86/kvm/cpuid.c | 4 +-
arch/x86/kvm/hyperv.c | 8 +-
arch/x86/kvm/kvm_cache_regs.h | 10 +-
arch/x86/kvm/lapic.c | 30 +--
arch/x86/kvm/mmu.h | 8 +-
arch/x86/kvm/mmu/mmu.c | 32 +--
arch/x86/kvm/pmu.c | 30 +--
arch/x86/kvm/pmu.h | 2 +-
arch/x86/kvm/svm.c | 19 +-
arch/x86/kvm/trace.h | 4 +-
arch/x86/kvm/vmx/nested.c | 17 +-
arch/x86/kvm/vmx/nested.h | 3 +-
arch/x86/kvm/vmx/vmx.c | 371 ++++++++++++++++----------------
arch/x86/kvm/x86.c | 370 +++++++++++++++----------------
arch/x86/kvm/x86.h | 4 +-
include/linux/kvm_host.h | 4 +-
virt/kvm/arm/arm.c | 4 +-
virt/kvm/kvm_main.c | 18 +-
22 files changed, 507 insertions(+), 476 deletions(-)
--
2.24.1
Configure VMX's runtime hooks by modifying vmx_x86_ops directly instead
of using the global kvm_x86_ops. This sets the stage for waiting until
after ->hardware_setup() to set kvm_x86_ops with the vendor's
implementation.
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 15 ++++++++-------
arch/x86/kvm/vmx/nested.h | 3 ++-
arch/x86/kvm/vmx/vmx.c | 27 ++++++++++++++-------------
3 files changed, 24 insertions(+), 21 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4ff859c99946..87fea22c3799 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6241,7 +6241,8 @@ void nested_vmx_hardware_unsetup(void)
}
}
-__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
+__init int nested_vmx_hardware_setup(struct kvm_x86_ops *ops,
+ int (*exit_handlers[])(struct kvm_vcpu *))
{
int i;
@@ -6277,12 +6278,12 @@ __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
exit_handlers[EXIT_REASON_INVVPID] = handle_invvpid;
exit_handlers[EXIT_REASON_VMFUNC] = handle_vmfunc;
- kvm_x86_ops->check_nested_events = vmx_check_nested_events;
- kvm_x86_ops->get_nested_state = vmx_get_nested_state;
- kvm_x86_ops->set_nested_state = vmx_set_nested_state;
- kvm_x86_ops->get_vmcs12_pages = nested_get_vmcs12_pages;
- kvm_x86_ops->nested_enable_evmcs = nested_enable_evmcs;
- kvm_x86_ops->nested_get_evmcs_version = nested_get_evmcs_version;
+ ops->check_nested_events = vmx_check_nested_events;
+ ops->get_nested_state = vmx_get_nested_state;
+ ops->set_nested_state = vmx_set_nested_state;
+ ops->get_vmcs12_pages = nested_get_vmcs12_pages;
+ ops->nested_enable_evmcs = nested_enable_evmcs;
+ ops->nested_get_evmcs_version = nested_get_evmcs_version;
return 0;
}
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index f70968b76d33..ac56aefa49e3 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -19,7 +19,8 @@ enum nvmx_vmentry_status {
void vmx_leave_nested(struct kvm_vcpu *vcpu);
void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps);
void nested_vmx_hardware_unsetup(void);
-__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
+__init int nested_vmx_hardware_setup(struct kvm_x86_ops *ops,
+ int (*exit_handlers[])(struct kvm_vcpu *));
void nested_vmx_set_vmcs_shadowing_bitmap(void);
void nested_vmx_free_vcpu(struct kvm_vcpu *vcpu);
enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 82dab775d520..cfa9093bdc06 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7860,16 +7860,16 @@ static __init int hardware_setup(void)
* using the APIC_ACCESS_ADDR VMCS field.
*/
if (!flexpriority_enabled)
- kvm_x86_ops->set_apic_access_page_addr = NULL;
+ vmx_x86_ops.set_apic_access_page_addr = NULL;
if (!cpu_has_vmx_tpr_shadow())
- kvm_x86_ops->update_cr8_intercept = NULL;
+ vmx_x86_ops.update_cr8_intercept = NULL;
#if IS_ENABLED(CONFIG_HYPERV)
if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
&& enable_ept) {
- kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
- kvm_x86_ops->tlb_remote_flush_with_range =
+ vmx_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
+ vmx_x86_ops.tlb_remote_flush_with_range =
hv_remote_flush_tlb_with_range;
}
#endif
@@ -7884,7 +7884,7 @@ static __init int hardware_setup(void)
if (!cpu_has_vmx_apicv()) {
enable_apicv = 0;
- kvm_x86_ops->sync_pir_to_irr = NULL;
+ vmx_x86_ops.sync_pir_to_irr = NULL;
}
if (cpu_has_vmx_tsc_scaling()) {
@@ -7916,10 +7916,10 @@ static __init int hardware_setup(void)
enable_pml = 0;
if (!enable_pml) {
- kvm_x86_ops->slot_enable_log_dirty = NULL;
- kvm_x86_ops->slot_disable_log_dirty = NULL;
- kvm_x86_ops->flush_log_dirty = NULL;
- kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
+ vmx_x86_ops.slot_enable_log_dirty = NULL;
+ vmx_x86_ops.slot_disable_log_dirty = NULL;
+ vmx_x86_ops.flush_log_dirty = NULL;
+ vmx_x86_ops.enable_log_dirty_pt_masked = NULL;
}
if (!cpu_has_vmx_preemption_timer())
@@ -7947,9 +7947,9 @@ static __init int hardware_setup(void)
}
if (!enable_preemption_timer) {
- kvm_x86_ops->set_hv_timer = NULL;
- kvm_x86_ops->cancel_hv_timer = NULL;
- kvm_x86_ops->request_immediate_exit = __kvm_request_immediate_exit;
+ vmx_x86_ops.set_hv_timer = NULL;
+ vmx_x86_ops.cancel_hv_timer = NULL;
+ vmx_x86_ops.request_immediate_exit = __kvm_request_immediate_exit;
}
kvm_set_posted_intr_wakeup_handler(wakeup_handler);
@@ -7965,7 +7965,8 @@ static __init int hardware_setup(void)
nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
vmx_capability.ept);
- r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
+ r = nested_vmx_hardware_setup(&vmx_x86_ops,
+ kvm_vmx_exit_handlers);
if (r)
return r;
}
--
2.24.1
Remove the __exit annotation from VMX hardware_unsetup(), the hook
can be reached during kvm_init() by way of kvm_arch_hardware_unsetup()
if failure occurs at various points during initialization.
Note, there is no known functional issue with the __exit annotation, the
above is merely justification for its removal. The real motivation is
to be able to annotate vmx_x86_ops and svm_x86_ops with __initdata,
which makes objtool complain because objtool doesn't understand that the
vendor specific __initdata is being copied by value to a non-__initdata
instance.
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/vmx/vmx.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 54f991244fae..42a2d0d3984a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1056,7 +1056,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
struct kvm_x86_ops {
int (*hardware_enable)(void);
void (*hardware_disable)(void);
- void (*hardware_unsetup)(void); /* __exit */
+ void (*hardware_unsetup)(void);
bool (*cpu_has_accelerated_tpr)(void);
bool (*has_emulated_msr)(int index);
void (*cpuid_update)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4bbe0d165a0c..fac22e316417 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7652,7 +7652,7 @@ static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
return to_vmx(vcpu)->nested.vmxon;
}
-static __exit void hardware_unsetup(void)
+static void hardware_unsetup(void)
{
if (nested)
nested_vmx_hardware_unsetup();
--
2.24.1
Move the kvm_x86_ops functions that are used only within the scope of
kvm_init() into a separate struct, kvm_x86_init_ops. In addition to
identifying the init-only functions without restorting to code comments,
this also sets the stage for waiting until after ->hardware_setup() to
set kvm_x86_ops. Setting kvm_x86_ops after ->hardware_setup() is
desirable as many of the hooks are not usable until ->hardware_setup()
completes.
No functional change intended.
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 13 +++++++++----
arch/x86/kvm/svm.c | 15 ++++++++++-----
arch/x86/kvm/vmx/vmx.c | 16 +++++++++++-----
arch/x86/kvm/x86.c | 10 ++++++----
4 files changed, 36 insertions(+), 18 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9a183e9d4cb1..f4c5b49299ff 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1054,12 +1054,8 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
}
struct kvm_x86_ops {
- int (*cpu_has_kvm_support)(void); /* __init */
- int (*disabled_by_bios)(void); /* __init */
int (*hardware_enable)(void);
void (*hardware_disable)(void);
- int (*check_processor_compatibility)(void);/* __init */
- int (*hardware_setup)(void); /* __init */
void (*hardware_unsetup)(void); /* __exit */
bool (*cpu_has_accelerated_tpr)(void);
bool (*has_emulated_msr)(int index);
@@ -1260,6 +1256,15 @@ struct kvm_x86_ops {
int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
};
+struct kvm_x86_init_ops {
+ int (*cpu_has_kvm_support)(void);
+ int (*disabled_by_bios)(void);
+ int (*check_processor_compatibility)(void);
+ int (*hardware_setup)(void);
+
+ struct kvm_x86_ops *runtime_ops;
+};
+
struct kvm_arch_async_pf {
u32 token;
gfn_t gfn;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2125c6ae5951..33e67c3389c2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7351,11 +7351,7 @@ static void svm_pre_update_apicv_exec_ctrl(struct kvm *kvm, bool activate)
}
static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
- .cpu_has_kvm_support = has_svm,
- .disabled_by_bios = is_disabled,
- .hardware_setup = svm_hardware_setup,
.hardware_unsetup = svm_hardware_teardown,
- .check_processor_compatibility = svm_check_processor_compat,
.hardware_enable = svm_hardware_enable,
.hardware_disable = svm_hardware_disable,
.cpu_has_accelerated_tpr = svm_cpu_has_accelerated_tpr,
@@ -7480,9 +7476,18 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
.check_nested_events = svm_check_nested_events,
};
+static struct kvm_x86_init_ops svm_init_ops __initdata = {
+ .cpu_has_kvm_support = has_svm,
+ .disabled_by_bios = is_disabled,
+ .hardware_setup = svm_hardware_setup,
+ .check_processor_compatibility = svm_check_processor_compat,
+
+ .runtime_ops = &svm_x86_ops,
+};
+
static int __init svm_init(void)
{
- return kvm_init(&svm_x86_ops, sizeof(struct vcpu_svm),
+ return kvm_init(&svm_init_ops, sizeof(struct vcpu_svm),
__alignof__(struct vcpu_svm), THIS_MODULE);
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 07299a957d4a..ffcdcc86f5b7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7842,11 +7842,8 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
}
static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
- .cpu_has_kvm_support = cpu_has_kvm_support,
- .disabled_by_bios = vmx_disabled_by_bios,
- .hardware_setup = hardware_setup,
.hardware_unsetup = hardware_unsetup,
- .check_processor_compatibility = vmx_check_processor_compat,
+
.hardware_enable = hardware_enable,
.hardware_disable = hardware_disable,
.cpu_has_accelerated_tpr = report_flexpriority,
@@ -7981,6 +7978,15 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
};
+static struct kvm_x86_init_ops vmx_init_ops __initdata = {
+ .cpu_has_kvm_support = cpu_has_kvm_support,
+ .disabled_by_bios = vmx_disabled_by_bios,
+ .check_processor_compatibility = vmx_check_processor_compat,
+ .hardware_setup = hardware_setup,
+
+ .runtime_ops = &vmx_x86_ops,
+};
+
static void vmx_cleanup_l1d_flush(void)
{
if (vmx_l1d_flush_pages) {
@@ -8065,7 +8071,7 @@ static int __init vmx_init(void)
}
#endif
- r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
+ r = kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx),
__alignof__(struct vcpu_vmx), THIS_MODULE);
if (r)
return r;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0f08e1b4e762..20f989d1bba8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7298,8 +7298,8 @@ static struct notifier_block pvclock_gtod_notifier = {
int kvm_arch_init(void *opaque)
{
+ struct kvm_x86_init_ops *ops = opaque;
int r;
- struct kvm_x86_ops *ops = opaque;
if (kvm_x86_ops) {
printk(KERN_ERR "kvm: already loaded the other module\n");
@@ -7354,7 +7354,7 @@ int kvm_arch_init(void *opaque)
if (r)
goto out_free_percpu;
- kvm_x86_ops = ops;
+ kvm_x86_ops = ops->runtime_ops;
kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
PT_DIRTY_MASK, PT64_NX_MASK, 0,
@@ -9623,6 +9623,7 @@ void kvm_arch_hardware_disable(void)
int kvm_arch_hardware_setup(void *opaque)
{
+ struct kvm_x86_init_ops *ops = opaque;
int r;
rdmsrl_safe(MSR_EFER, &host_efer);
@@ -9630,7 +9631,7 @@ int kvm_arch_hardware_setup(void *opaque)
if (boot_cpu_has(X86_FEATURE_XSAVES))
rdmsrl(MSR_IA32_XSS, host_xss);
- r = kvm_x86_ops->hardware_setup();
+ r = ops->hardware_setup();
if (r != 0)
return r;
@@ -9665,13 +9666,14 @@ void kvm_arch_hardware_unsetup(void)
int kvm_arch_check_processor_compat(void *opaque)
{
struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
+ struct kvm_x86_init_ops *ops = opaque;
WARN_ON(!irqs_disabled());
if (kvm_host_cr4_reserved_bits(c) != cr4_reserved_bits)
return -EIO;
- return kvm_x86_ops->check_processor_compatibility();
+ return ops->check_processor_compatibility();
}
bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
--
2.24.1
Pass @opaque to kvm_arch_hardware_setup() and
kvm_arch_check_processor_compat() to allow architecture specific code to
reference @opaque without having to stash it away in a temporary global
variable. This will enable x86 to separate its vendor specific callback
ops, which are passed via @opaque, into "init" and "runtime" ops without
having to stash away the "init" ops.
No functional change intended.
Reviewed-by: Cornelia Huck <[email protected]>
Tested-by: Cornelia Huck <[email protected]> #s390
Acked-by: Marc Zyngier <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/mips/kvm/mips.c | 4 ++--
arch/powerpc/kvm/powerpc.c | 4 ++--
arch/s390/kvm/kvm-s390.c | 4 ++--
arch/x86/kvm/x86.c | 4 ++--
include/linux/kvm_host.h | 4 ++--
virt/kvm/arm/arm.c | 4 ++--
virt/kvm/kvm_main.c | 18 ++++++++++++++----
7 files changed, 26 insertions(+), 16 deletions(-)
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 78507757ba9a..8f05dd0a0f4e 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -118,12 +118,12 @@ void kvm_arch_hardware_disable(void)
kvm_mips_callbacks->hardware_disable();
}
-int kvm_arch_hardware_setup(void)
+int kvm_arch_hardware_setup(void *opaque)
{
return 0;
}
-int kvm_arch_check_processor_compat(void)
+int kvm_arch_check_processor_compat(void *opaque)
{
return 0;
}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 62ee66d5eb6f..f54574564f1e 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -415,12 +415,12 @@ int kvm_arch_hardware_enable(void)
return 0;
}
-int kvm_arch_hardware_setup(void)
+int kvm_arch_hardware_setup(void *opaque)
{
return 0;
}
-int kvm_arch_check_processor_compat(void)
+int kvm_arch_check_processor_compat(void *opaque)
{
return kvmppc_core_check_processor_compat();
}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 807ed6d722dd..0ac8eb29e5aa 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -235,7 +235,7 @@ int kvm_arch_hardware_enable(void)
return 0;
}
-int kvm_arch_check_processor_compat(void)
+int kvm_arch_check_processor_compat(void *opaque)
{
return 0;
}
@@ -302,7 +302,7 @@ static struct notifier_block kvm_clock_notifier = {
.notifier_call = kvm_clock_sync,
};
-int kvm_arch_hardware_setup(void)
+int kvm_arch_hardware_setup(void *opaque)
{
gmap_notifier.notifier_call = kvm_gmap_notifier;
gmap_register_pte_notifier(&gmap_notifier);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e54c6ad628a8..0f08e1b4e762 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9621,7 +9621,7 @@ void kvm_arch_hardware_disable(void)
drop_user_return_notifiers();
}
-int kvm_arch_hardware_setup(void)
+int kvm_arch_hardware_setup(void *opaque)
{
int r;
@@ -9662,7 +9662,7 @@ void kvm_arch_hardware_unsetup(void)
kvm_x86_ops->hardware_unsetup();
}
-int kvm_arch_check_processor_compat(void)
+int kvm_arch_check_processor_compat(void *opaque)
{
struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 35bc52e187a2..bc48d8fef3e1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -886,9 +886,9 @@ void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu);
int kvm_arch_hardware_enable(void);
void kvm_arch_hardware_disable(void);
-int kvm_arch_hardware_setup(void);
+int kvm_arch_hardware_setup(void *opaque);
void kvm_arch_hardware_unsetup(void);
-int kvm_arch_check_processor_compat(void);
+int kvm_arch_check_processor_compat(void *opaque);
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index bfdba1caf59d..57e5a143b0db 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -64,12 +64,12 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
}
-int kvm_arch_hardware_setup(void)
+int kvm_arch_hardware_setup(void *opaque)
{
return 0;
}
-int kvm_arch_check_processor_compat(void)
+int kvm_arch_check_processor_compat(void *opaque)
{
return 0;
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 28eae681859f..f4e79c891c25 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4645,14 +4645,22 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
return &kvm_running_vcpu;
}
-static void check_processor_compat(void *rtn)
+struct kvm_cpu_compat_check {
+ void *opaque;
+ int *ret;
+};
+
+static void check_processor_compat(void *data)
{
- *(int *)rtn = kvm_arch_check_processor_compat();
+ struct kvm_cpu_compat_check *c = data;
+
+ *c->ret = kvm_arch_check_processor_compat(c->opaque);
}
int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
struct module *module)
{
+ struct kvm_cpu_compat_check c;
int r;
int cpu;
@@ -4676,12 +4684,14 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
goto out_free_0;
}
- r = kvm_arch_hardware_setup();
+ r = kvm_arch_hardware_setup(opaque);
if (r < 0)
goto out_free_1;
+ c.ret = &r;
+ c.opaque = opaque;
for_each_online_cpu(cpu) {
- smp_call_function_single(cpu, check_processor_compat, &r, 1);
+ smp_call_function_single(cpu, check_processor_compat, &c, 1);
if (r < 0)
goto out_free_2;
}
--
2.24.1
Move VMX's hardware_setup() below its vmx_x86_ops definition so that a
future patch can refactor hardware_setup() to modify vmx_x86_ops
directly instead of indirectly modifying the ops via the global
kvm_x86_ops.
No functional change intended.
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 346 ++++++++++++++++++++---------------------
1 file changed, 173 insertions(+), 173 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ffcdcc86f5b7..82dab775d520 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7652,179 +7652,6 @@ static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
return to_vmx(vcpu)->nested.vmxon;
}
-static __init int hardware_setup(void)
-{
- unsigned long host_bndcfgs;
- struct desc_ptr dt;
- int r, i, ept_lpage_level;
-
- store_idt(&dt);
- host_idt_base = dt.address;
-
- for (i = 0; i < ARRAY_SIZE(vmx_msr_index); ++i)
- kvm_define_shared_msr(i, vmx_msr_index[i]);
-
- if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
- return -EIO;
-
- if (boot_cpu_has(X86_FEATURE_NX))
- kvm_enable_efer_bits(EFER_NX);
-
- if (boot_cpu_has(X86_FEATURE_MPX)) {
- rdmsrl(MSR_IA32_BNDCFGS, host_bndcfgs);
- WARN_ONCE(host_bndcfgs, "KVM: BNDCFGS in host will be lost");
- }
-
- if (!cpu_has_vmx_mpx())
- supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
- XFEATURE_MASK_BNDCSR);
-
- if (!cpu_has_vmx_vpid() || !cpu_has_vmx_invvpid() ||
- !(cpu_has_vmx_invvpid_single() || cpu_has_vmx_invvpid_global()))
- enable_vpid = 0;
-
- if (!cpu_has_vmx_ept() ||
- !cpu_has_vmx_ept_4levels() ||
- !cpu_has_vmx_ept_mt_wb() ||
- !cpu_has_vmx_invept_global())
- enable_ept = 0;
-
- if (!cpu_has_vmx_ept_ad_bits() || !enable_ept)
- enable_ept_ad_bits = 0;
-
- if (!cpu_has_vmx_unrestricted_guest() || !enable_ept)
- enable_unrestricted_guest = 0;
-
- if (!cpu_has_vmx_flexpriority())
- flexpriority_enabled = 0;
-
- if (!cpu_has_virtual_nmis())
- enable_vnmi = 0;
-
- /*
- * set_apic_access_page_addr() is used to reload apic access
- * page upon invalidation. No need to do anything if not
- * using the APIC_ACCESS_ADDR VMCS field.
- */
- if (!flexpriority_enabled)
- kvm_x86_ops->set_apic_access_page_addr = NULL;
-
- if (!cpu_has_vmx_tpr_shadow())
- kvm_x86_ops->update_cr8_intercept = NULL;
-
-#if IS_ENABLED(CONFIG_HYPERV)
- if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
- && enable_ept) {
- kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
- kvm_x86_ops->tlb_remote_flush_with_range =
- hv_remote_flush_tlb_with_range;
- }
-#endif
-
- if (!cpu_has_vmx_ple()) {
- ple_gap = 0;
- ple_window = 0;
- ple_window_grow = 0;
- ple_window_max = 0;
- ple_window_shrink = 0;
- }
-
- if (!cpu_has_vmx_apicv()) {
- enable_apicv = 0;
- kvm_x86_ops->sync_pir_to_irr = NULL;
- }
-
- if (cpu_has_vmx_tsc_scaling()) {
- kvm_has_tsc_control = true;
- kvm_max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
- kvm_tsc_scaling_ratio_frac_bits = 48;
- }
-
- set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
-
- if (enable_ept)
- vmx_enable_tdp();
-
- if (!enable_ept)
- ept_lpage_level = 0;
- else if (cpu_has_vmx_ept_1g_page())
- ept_lpage_level = PT_PDPE_LEVEL;
- else if (cpu_has_vmx_ept_2m_page())
- ept_lpage_level = PT_DIRECTORY_LEVEL;
- else
- ept_lpage_level = PT_PAGE_TABLE_LEVEL;
- kvm_configure_mmu(enable_ept, ept_lpage_level);
-
- /*
- * Only enable PML when hardware supports PML feature, and both EPT
- * and EPT A/D bit features are enabled -- PML depends on them to work.
- */
- if (!enable_ept || !enable_ept_ad_bits || !cpu_has_vmx_pml())
- enable_pml = 0;
-
- if (!enable_pml) {
- kvm_x86_ops->slot_enable_log_dirty = NULL;
- kvm_x86_ops->slot_disable_log_dirty = NULL;
- kvm_x86_ops->flush_log_dirty = NULL;
- kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
- }
-
- if (!cpu_has_vmx_preemption_timer())
- enable_preemption_timer = false;
-
- if (enable_preemption_timer) {
- u64 use_timer_freq = 5000ULL * 1000 * 1000;
- u64 vmx_msr;
-
- rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
- cpu_preemption_timer_multi =
- vmx_msr & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
-
- if (tsc_khz)
- use_timer_freq = (u64)tsc_khz * 1000;
- use_timer_freq >>= cpu_preemption_timer_multi;
-
- /*
- * KVM "disables" the preemption timer by setting it to its max
- * value. Don't use the timer if it might cause spurious exits
- * at a rate faster than 0.1 Hz (of uninterrupted guest time).
- */
- if (use_timer_freq > 0xffffffffu / 10)
- enable_preemption_timer = false;
- }
-
- if (!enable_preemption_timer) {
- kvm_x86_ops->set_hv_timer = NULL;
- kvm_x86_ops->cancel_hv_timer = NULL;
- kvm_x86_ops->request_immediate_exit = __kvm_request_immediate_exit;
- }
-
- kvm_set_posted_intr_wakeup_handler(wakeup_handler);
-
- kvm_mce_cap_supported |= MCG_LMCE_P;
-
- if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
- return -EINVAL;
- if (!enable_ept || !cpu_has_vmx_intel_pt())
- pt_mode = PT_MODE_SYSTEM;
-
- if (nested) {
- nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
- vmx_capability.ept);
-
- r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
- if (r)
- return r;
- }
-
- vmx_set_cpu_caps();
-
- r = alloc_kvm_area();
- if (r)
- nested_vmx_hardware_unsetup();
- return r;
-}
-
static __exit void hardware_unsetup(void)
{
if (nested)
@@ -7978,6 +7805,179 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
};
+static __init int hardware_setup(void)
+{
+ unsigned long host_bndcfgs;
+ struct desc_ptr dt;
+ int r, i, ept_lpage_level;
+
+ store_idt(&dt);
+ host_idt_base = dt.address;
+
+ for (i = 0; i < ARRAY_SIZE(vmx_msr_index); ++i)
+ kvm_define_shared_msr(i, vmx_msr_index[i]);
+
+ if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
+ return -EIO;
+
+ if (boot_cpu_has(X86_FEATURE_NX))
+ kvm_enable_efer_bits(EFER_NX);
+
+ if (boot_cpu_has(X86_FEATURE_MPX)) {
+ rdmsrl(MSR_IA32_BNDCFGS, host_bndcfgs);
+ WARN_ONCE(host_bndcfgs, "KVM: BNDCFGS in host will be lost");
+ }
+
+ if (!cpu_has_vmx_mpx())
+ supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
+ XFEATURE_MASK_BNDCSR);
+
+ if (!cpu_has_vmx_vpid() || !cpu_has_vmx_invvpid() ||
+ !(cpu_has_vmx_invvpid_single() || cpu_has_vmx_invvpid_global()))
+ enable_vpid = 0;
+
+ if (!cpu_has_vmx_ept() ||
+ !cpu_has_vmx_ept_4levels() ||
+ !cpu_has_vmx_ept_mt_wb() ||
+ !cpu_has_vmx_invept_global())
+ enable_ept = 0;
+
+ if (!cpu_has_vmx_ept_ad_bits() || !enable_ept)
+ enable_ept_ad_bits = 0;
+
+ if (!cpu_has_vmx_unrestricted_guest() || !enable_ept)
+ enable_unrestricted_guest = 0;
+
+ if (!cpu_has_vmx_flexpriority())
+ flexpriority_enabled = 0;
+
+ if (!cpu_has_virtual_nmis())
+ enable_vnmi = 0;
+
+ /*
+ * set_apic_access_page_addr() is used to reload apic access
+ * page upon invalidation. No need to do anything if not
+ * using the APIC_ACCESS_ADDR VMCS field.
+ */
+ if (!flexpriority_enabled)
+ kvm_x86_ops->set_apic_access_page_addr = NULL;
+
+ if (!cpu_has_vmx_tpr_shadow())
+ kvm_x86_ops->update_cr8_intercept = NULL;
+
+#if IS_ENABLED(CONFIG_HYPERV)
+ if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
+ && enable_ept) {
+ kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
+ kvm_x86_ops->tlb_remote_flush_with_range =
+ hv_remote_flush_tlb_with_range;
+ }
+#endif
+
+ if (!cpu_has_vmx_ple()) {
+ ple_gap = 0;
+ ple_window = 0;
+ ple_window_grow = 0;
+ ple_window_max = 0;
+ ple_window_shrink = 0;
+ }
+
+ if (!cpu_has_vmx_apicv()) {
+ enable_apicv = 0;
+ kvm_x86_ops->sync_pir_to_irr = NULL;
+ }
+
+ if (cpu_has_vmx_tsc_scaling()) {
+ kvm_has_tsc_control = true;
+ kvm_max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
+ kvm_tsc_scaling_ratio_frac_bits = 48;
+ }
+
+ set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
+
+ if (enable_ept)
+ vmx_enable_tdp();
+
+ if (!enable_ept)
+ ept_lpage_level = 0;
+ else if (cpu_has_vmx_ept_1g_page())
+ ept_lpage_level = PT_PDPE_LEVEL;
+ else if (cpu_has_vmx_ept_2m_page())
+ ept_lpage_level = PT_DIRECTORY_LEVEL;
+ else
+ ept_lpage_level = PT_PAGE_TABLE_LEVEL;
+ kvm_configure_mmu(enable_ept, ept_lpage_level);
+
+ /*
+ * Only enable PML when hardware supports PML feature, and both EPT
+ * and EPT A/D bit features are enabled -- PML depends on them to work.
+ */
+ if (!enable_ept || !enable_ept_ad_bits || !cpu_has_vmx_pml())
+ enable_pml = 0;
+
+ if (!enable_pml) {
+ kvm_x86_ops->slot_enable_log_dirty = NULL;
+ kvm_x86_ops->slot_disable_log_dirty = NULL;
+ kvm_x86_ops->flush_log_dirty = NULL;
+ kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
+ }
+
+ if (!cpu_has_vmx_preemption_timer())
+ enable_preemption_timer = false;
+
+ if (enable_preemption_timer) {
+ u64 use_timer_freq = 5000ULL * 1000 * 1000;
+ u64 vmx_msr;
+
+ rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
+ cpu_preemption_timer_multi =
+ vmx_msr & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
+
+ if (tsc_khz)
+ use_timer_freq = (u64)tsc_khz * 1000;
+ use_timer_freq >>= cpu_preemption_timer_multi;
+
+ /*
+ * KVM "disables" the preemption timer by setting it to its max
+ * value. Don't use the timer if it might cause spurious exits
+ * at a rate faster than 0.1 Hz (of uninterrupted guest time).
+ */
+ if (use_timer_freq > 0xffffffffu / 10)
+ enable_preemption_timer = false;
+ }
+
+ if (!enable_preemption_timer) {
+ kvm_x86_ops->set_hv_timer = NULL;
+ kvm_x86_ops->cancel_hv_timer = NULL;
+ kvm_x86_ops->request_immediate_exit = __kvm_request_immediate_exit;
+ }
+
+ kvm_set_posted_intr_wakeup_handler(wakeup_handler);
+
+ kvm_mce_cap_supported |= MCG_LMCE_P;
+
+ if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
+ return -EINVAL;
+ if (!enable_ept || !cpu_has_vmx_intel_pt())
+ pt_mode = PT_MODE_SYSTEM;
+
+ if (nested) {
+ nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
+ vmx_capability.ept);
+
+ r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
+ if (r)
+ return r;
+ }
+
+ vmx_set_cpu_caps();
+
+ r = alloc_kvm_area();
+ if (r)
+ nested_vmx_hardware_unsetup();
+ return r;
+}
+
static struct kvm_x86_init_ops vmx_init_ops __initdata = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
--
2.24.1
Sean Christopherson <[email protected]> writes:
> Move the kvm_x86_ops functions that are used only within the scope of
> kvm_init() into a separate struct, kvm_x86_init_ops. In addition to
> identifying the init-only functions without restorting to code comments,
> this also sets the stage for waiting until after ->hardware_setup() to
> set kvm_x86_ops. Setting kvm_x86_ops after ->hardware_setup() is
> desirable as many of the hooks are not usable until ->hardware_setup()
> completes.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 13 +++++++++----
> arch/x86/kvm/svm.c | 15 ++++++++++-----
> arch/x86/kvm/vmx/vmx.c | 16 +++++++++++-----
> arch/x86/kvm/x86.c | 10 ++++++----
> 4 files changed, 36 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9a183e9d4cb1..f4c5b49299ff 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1054,12 +1054,8 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
> }
>
> struct kvm_x86_ops {
> - int (*cpu_has_kvm_support)(void); /* __init */
> - int (*disabled_by_bios)(void); /* __init */
> int (*hardware_enable)(void);
> void (*hardware_disable)(void);
> - int (*check_processor_compatibility)(void);/* __init */
> - int (*hardware_setup)(void); /* __init */
> void (*hardware_unsetup)(void); /* __exit */
> bool (*cpu_has_accelerated_tpr)(void);
> bool (*has_emulated_msr)(int index);
> @@ -1260,6 +1256,15 @@ struct kvm_x86_ops {
> int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> };
>
> +struct kvm_x86_init_ops {
> + int (*cpu_has_kvm_support)(void);
> + int (*disabled_by_bios)(void);
> + int (*check_processor_compatibility)(void);
> + int (*hardware_setup)(void);
> +
> + struct kvm_x86_ops *runtime_ops;
> +};
> +
> struct kvm_arch_async_pf {
> u32 token;
> gfn_t gfn;
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 2125c6ae5951..33e67c3389c2 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7351,11 +7351,7 @@ static void svm_pre_update_apicv_exec_ctrl(struct kvm *kvm, bool activate)
> }
>
> static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> - .cpu_has_kvm_support = has_svm,
> - .disabled_by_bios = is_disabled,
> - .hardware_setup = svm_hardware_setup,
> .hardware_unsetup = svm_hardware_teardown,
> - .check_processor_compatibility = svm_check_processor_compat,
> .hardware_enable = svm_hardware_enable,
> .hardware_disable = svm_hardware_disable,
> .cpu_has_accelerated_tpr = svm_cpu_has_accelerated_tpr,
> @@ -7480,9 +7476,18 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> .check_nested_events = svm_check_nested_events,
> };
>
> +static struct kvm_x86_init_ops svm_init_ops __initdata = {
> + .cpu_has_kvm_support = has_svm,
> + .disabled_by_bios = is_disabled,
> + .hardware_setup = svm_hardware_setup,
> + .check_processor_compatibility = svm_check_processor_compat,
> +
> + .runtime_ops = &svm_x86_ops,
> +};
Unrelated to your patch but I think we can make the naming of some of
these functions more consistend on SVM/VMX, in particular I'd suggest
has_svm() -> cpu_has_svm_support()
is_disabled -> svm_disabled_by_bios()
...
(see below for VMX)
> +
> static int __init svm_init(void)
> {
> - return kvm_init(&svm_x86_ops, sizeof(struct vcpu_svm),
> + return kvm_init(&svm_init_ops, sizeof(struct vcpu_svm),
> __alignof__(struct vcpu_svm), THIS_MODULE);
> }
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 07299a957d4a..ffcdcc86f5b7 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7842,11 +7842,8 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
> }
>
> static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> - .cpu_has_kvm_support = cpu_has_kvm_support,
> - .disabled_by_bios = vmx_disabled_by_bios,
> - .hardware_setup = hardware_setup,
> .hardware_unsetup = hardware_unsetup,
> - .check_processor_compatibility = vmx_check_processor_compat,
> +
> .hardware_enable = hardware_enable,
> .hardware_disable = hardware_disable,
> .cpu_has_accelerated_tpr = report_flexpriority,
> @@ -7981,6 +7978,15 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> };
>
> +static struct kvm_x86_init_ops vmx_init_ops __initdata = {
> + .cpu_has_kvm_support = cpu_has_kvm_support,
> + .disabled_by_bios = vmx_disabled_by_bios,
> + .check_processor_compatibility = vmx_check_processor_compat,
> + .hardware_setup = hardware_setup,
cpu_has_kvm_support() -> cpu_has_vmx_support()
hardware_setup() -> vmx_hardware_setup()
> +
> + .runtime_ops = &vmx_x86_ops,
> +};
> +
> static void vmx_cleanup_l1d_flush(void)
> {
> if (vmx_l1d_flush_pages) {
> @@ -8065,7 +8071,7 @@ static int __init vmx_init(void)
> }
> #endif
>
> - r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
> + r = kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx),
> __alignof__(struct vcpu_vmx), THIS_MODULE);
> if (r)
> return r;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0f08e1b4e762..20f989d1bba8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7298,8 +7298,8 @@ static struct notifier_block pvclock_gtod_notifier = {
>
> int kvm_arch_init(void *opaque)
> {
> + struct kvm_x86_init_ops *ops = opaque;
> int r;
> - struct kvm_x86_ops *ops = opaque;
>
> if (kvm_x86_ops) {
> printk(KERN_ERR "kvm: already loaded the other module\n");
> @@ -7354,7 +7354,7 @@ int kvm_arch_init(void *opaque)
> if (r)
> goto out_free_percpu;
>
> - kvm_x86_ops = ops;
> + kvm_x86_ops = ops->runtime_ops;
>
> kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
> PT_DIRTY_MASK, PT64_NX_MASK, 0,
> @@ -9623,6 +9623,7 @@ void kvm_arch_hardware_disable(void)
>
> int kvm_arch_hardware_setup(void *opaque)
> {
> + struct kvm_x86_init_ops *ops = opaque;
> int r;
>
> rdmsrl_safe(MSR_EFER, &host_efer);
> @@ -9630,7 +9631,7 @@ int kvm_arch_hardware_setup(void *opaque)
> if (boot_cpu_has(X86_FEATURE_XSAVES))
> rdmsrl(MSR_IA32_XSS, host_xss);
>
> - r = kvm_x86_ops->hardware_setup();
> + r = ops->hardware_setup();
> if (r != 0)
> return r;
>
> @@ -9665,13 +9666,14 @@ void kvm_arch_hardware_unsetup(void)
> int kvm_arch_check_processor_compat(void *opaque)
> {
> struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
> + struct kvm_x86_init_ops *ops = opaque;
>
> WARN_ON(!irqs_disabled());
>
> if (kvm_host_cr4_reserved_bits(c) != cr4_reserved_bits)
> return -EIO;
>
> - return kvm_x86_ops->check_processor_compatibility();
> + return ops->check_processor_compatibility();
> }
>
> bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
The patch itself looks good,
Reviewed-by: Vitaly Kuznetsov <[email protected]>
--
Vitaly
Sean Christopherson <[email protected]> writes:
> Move VMX's hardware_setup() below its vmx_x86_ops definition so that a
> future patch can refactor hardware_setup() to modify vmx_x86_ops
> directly instead of indirectly modifying the ops via the global
> kvm_x86_ops.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 346 ++++++++++++++++++++---------------------
> 1 file changed, 173 insertions(+), 173 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ffcdcc86f5b7..82dab775d520 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7652,179 +7652,6 @@ static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
> return to_vmx(vcpu)->nested.vmxon;
> }
>
> -static __init int hardware_setup(void)
> -{
> - unsigned long host_bndcfgs;
> - struct desc_ptr dt;
> - int r, i, ept_lpage_level;
> -
> - store_idt(&dt);
> - host_idt_base = dt.address;
> -
> - for (i = 0; i < ARRAY_SIZE(vmx_msr_index); ++i)
> - kvm_define_shared_msr(i, vmx_msr_index[i]);
> -
> - if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
> - return -EIO;
> -
> - if (boot_cpu_has(X86_FEATURE_NX))
> - kvm_enable_efer_bits(EFER_NX);
> -
> - if (boot_cpu_has(X86_FEATURE_MPX)) {
> - rdmsrl(MSR_IA32_BNDCFGS, host_bndcfgs);
> - WARN_ONCE(host_bndcfgs, "KVM: BNDCFGS in host will be lost");
> - }
> -
> - if (!cpu_has_vmx_mpx())
> - supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
> - XFEATURE_MASK_BNDCSR);
> -
> - if (!cpu_has_vmx_vpid() || !cpu_has_vmx_invvpid() ||
> - !(cpu_has_vmx_invvpid_single() || cpu_has_vmx_invvpid_global()))
> - enable_vpid = 0;
> -
> - if (!cpu_has_vmx_ept() ||
> - !cpu_has_vmx_ept_4levels() ||
> - !cpu_has_vmx_ept_mt_wb() ||
> - !cpu_has_vmx_invept_global())
> - enable_ept = 0;
> -
> - if (!cpu_has_vmx_ept_ad_bits() || !enable_ept)
> - enable_ept_ad_bits = 0;
> -
> - if (!cpu_has_vmx_unrestricted_guest() || !enable_ept)
> - enable_unrestricted_guest = 0;
> -
> - if (!cpu_has_vmx_flexpriority())
> - flexpriority_enabled = 0;
> -
> - if (!cpu_has_virtual_nmis())
> - enable_vnmi = 0;
> -
> - /*
> - * set_apic_access_page_addr() is used to reload apic access
> - * page upon invalidation. No need to do anything if not
> - * using the APIC_ACCESS_ADDR VMCS field.
> - */
> - if (!flexpriority_enabled)
> - kvm_x86_ops->set_apic_access_page_addr = NULL;
> -
> - if (!cpu_has_vmx_tpr_shadow())
> - kvm_x86_ops->update_cr8_intercept = NULL;
> -
> -#if IS_ENABLED(CONFIG_HYPERV)
> - if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
> - && enable_ept) {
> - kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
> - kvm_x86_ops->tlb_remote_flush_with_range =
> - hv_remote_flush_tlb_with_range;
> - }
> -#endif
> -
> - if (!cpu_has_vmx_ple()) {
> - ple_gap = 0;
> - ple_window = 0;
> - ple_window_grow = 0;
> - ple_window_max = 0;
> - ple_window_shrink = 0;
> - }
> -
> - if (!cpu_has_vmx_apicv()) {
> - enable_apicv = 0;
> - kvm_x86_ops->sync_pir_to_irr = NULL;
> - }
> -
> - if (cpu_has_vmx_tsc_scaling()) {
> - kvm_has_tsc_control = true;
> - kvm_max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
> - kvm_tsc_scaling_ratio_frac_bits = 48;
> - }
> -
> - set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
> -
> - if (enable_ept)
> - vmx_enable_tdp();
> -
> - if (!enable_ept)
> - ept_lpage_level = 0;
> - else if (cpu_has_vmx_ept_1g_page())
> - ept_lpage_level = PT_PDPE_LEVEL;
> - else if (cpu_has_vmx_ept_2m_page())
> - ept_lpage_level = PT_DIRECTORY_LEVEL;
> - else
> - ept_lpage_level = PT_PAGE_TABLE_LEVEL;
> - kvm_configure_mmu(enable_ept, ept_lpage_level);
> -
> - /*
> - * Only enable PML when hardware supports PML feature, and both EPT
> - * and EPT A/D bit features are enabled -- PML depends on them to work.
> - */
> - if (!enable_ept || !enable_ept_ad_bits || !cpu_has_vmx_pml())
> - enable_pml = 0;
> -
> - if (!enable_pml) {
> - kvm_x86_ops->slot_enable_log_dirty = NULL;
> - kvm_x86_ops->slot_disable_log_dirty = NULL;
> - kvm_x86_ops->flush_log_dirty = NULL;
> - kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
> - }
> -
> - if (!cpu_has_vmx_preemption_timer())
> - enable_preemption_timer = false;
> -
> - if (enable_preemption_timer) {
> - u64 use_timer_freq = 5000ULL * 1000 * 1000;
> - u64 vmx_msr;
> -
> - rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
> - cpu_preemption_timer_multi =
> - vmx_msr & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
> -
> - if (tsc_khz)
> - use_timer_freq = (u64)tsc_khz * 1000;
> - use_timer_freq >>= cpu_preemption_timer_multi;
> -
> - /*
> - * KVM "disables" the preemption timer by setting it to its max
> - * value. Don't use the timer if it might cause spurious exits
> - * at a rate faster than 0.1 Hz (of uninterrupted guest time).
> - */
> - if (use_timer_freq > 0xffffffffu / 10)
> - enable_preemption_timer = false;
> - }
> -
> - if (!enable_preemption_timer) {
> - kvm_x86_ops->set_hv_timer = NULL;
> - kvm_x86_ops->cancel_hv_timer = NULL;
> - kvm_x86_ops->request_immediate_exit = __kvm_request_immediate_exit;
> - }
> -
> - kvm_set_posted_intr_wakeup_handler(wakeup_handler);
> -
> - kvm_mce_cap_supported |= MCG_LMCE_P;
> -
> - if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
> - return -EINVAL;
> - if (!enable_ept || !cpu_has_vmx_intel_pt())
> - pt_mode = PT_MODE_SYSTEM;
> -
> - if (nested) {
> - nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
> - vmx_capability.ept);
> -
> - r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
> - if (r)
> - return r;
> - }
> -
> - vmx_set_cpu_caps();
> -
> - r = alloc_kvm_area();
> - if (r)
> - nested_vmx_hardware_unsetup();
> - return r;
> -}
> -
> static __exit void hardware_unsetup(void)
> {
> if (nested)
> @@ -7978,6 +7805,179 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> };
>
> +static __init int hardware_setup(void)
> +{
> + unsigned long host_bndcfgs;
> + struct desc_ptr dt;
> + int r, i, ept_lpage_level;
> +
> + store_idt(&dt);
> + host_idt_base = dt.address;
> +
> + for (i = 0; i < ARRAY_SIZE(vmx_msr_index); ++i)
> + kvm_define_shared_msr(i, vmx_msr_index[i]);
> +
> + if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
> + return -EIO;
> +
> + if (boot_cpu_has(X86_FEATURE_NX))
> + kvm_enable_efer_bits(EFER_NX);
> +
> + if (boot_cpu_has(X86_FEATURE_MPX)) {
> + rdmsrl(MSR_IA32_BNDCFGS, host_bndcfgs);
> + WARN_ONCE(host_bndcfgs, "KVM: BNDCFGS in host will be lost");
> + }
> +
> + if (!cpu_has_vmx_mpx())
> + supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
> + XFEATURE_MASK_BNDCSR);
> +
> + if (!cpu_has_vmx_vpid() || !cpu_has_vmx_invvpid() ||
> + !(cpu_has_vmx_invvpid_single() || cpu_has_vmx_invvpid_global()))
> + enable_vpid = 0;
> +
> + if (!cpu_has_vmx_ept() ||
> + !cpu_has_vmx_ept_4levels() ||
> + !cpu_has_vmx_ept_mt_wb() ||
> + !cpu_has_vmx_invept_global())
> + enable_ept = 0;
> +
> + if (!cpu_has_vmx_ept_ad_bits() || !enable_ept)
> + enable_ept_ad_bits = 0;
> +
> + if (!cpu_has_vmx_unrestricted_guest() || !enable_ept)
> + enable_unrestricted_guest = 0;
> +
> + if (!cpu_has_vmx_flexpriority())
> + flexpriority_enabled = 0;
> +
> + if (!cpu_has_virtual_nmis())
> + enable_vnmi = 0;
> +
> + /*
> + * set_apic_access_page_addr() is used to reload apic access
> + * page upon invalidation. No need to do anything if not
> + * using the APIC_ACCESS_ADDR VMCS field.
> + */
> + if (!flexpriority_enabled)
> + kvm_x86_ops->set_apic_access_page_addr = NULL;
> +
> + if (!cpu_has_vmx_tpr_shadow())
> + kvm_x86_ops->update_cr8_intercept = NULL;
> +
> +#if IS_ENABLED(CONFIG_HYPERV)
> + if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
> + && enable_ept) {
> + kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
> + kvm_x86_ops->tlb_remote_flush_with_range =
> + hv_remote_flush_tlb_with_range;
> + }
> +#endif
> +
> + if (!cpu_has_vmx_ple()) {
> + ple_gap = 0;
> + ple_window = 0;
> + ple_window_grow = 0;
> + ple_window_max = 0;
> + ple_window_shrink = 0;
> + }
> +
> + if (!cpu_has_vmx_apicv()) {
> + enable_apicv = 0;
> + kvm_x86_ops->sync_pir_to_irr = NULL;
> + }
> +
> + if (cpu_has_vmx_tsc_scaling()) {
> + kvm_has_tsc_control = true;
> + kvm_max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
> + kvm_tsc_scaling_ratio_frac_bits = 48;
> + }
> +
> + set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
> +
> + if (enable_ept)
> + vmx_enable_tdp();
> +
> + if (!enable_ept)
> + ept_lpage_level = 0;
> + else if (cpu_has_vmx_ept_1g_page())
> + ept_lpage_level = PT_PDPE_LEVEL;
> + else if (cpu_has_vmx_ept_2m_page())
> + ept_lpage_level = PT_DIRECTORY_LEVEL;
> + else
> + ept_lpage_level = PT_PAGE_TABLE_LEVEL;
> + kvm_configure_mmu(enable_ept, ept_lpage_level);
> +
> + /*
> + * Only enable PML when hardware supports PML feature, and both EPT
> + * and EPT A/D bit features are enabled -- PML depends on them to work.
> + */
> + if (!enable_ept || !enable_ept_ad_bits || !cpu_has_vmx_pml())
> + enable_pml = 0;
> +
> + if (!enable_pml) {
> + kvm_x86_ops->slot_enable_log_dirty = NULL;
> + kvm_x86_ops->slot_disable_log_dirty = NULL;
> + kvm_x86_ops->flush_log_dirty = NULL;
> + kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
> + }
> +
> + if (!cpu_has_vmx_preemption_timer())
> + enable_preemption_timer = false;
> +
> + if (enable_preemption_timer) {
> + u64 use_timer_freq = 5000ULL * 1000 * 1000;
> + u64 vmx_msr;
> +
> + rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
> + cpu_preemption_timer_multi =
> + vmx_msr & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
> +
> + if (tsc_khz)
> + use_timer_freq = (u64)tsc_khz * 1000;
> + use_timer_freq >>= cpu_preemption_timer_multi;
> +
> + /*
> + * KVM "disables" the preemption timer by setting it to its max
> + * value. Don't use the timer if it might cause spurious exits
> + * at a rate faster than 0.1 Hz (of uninterrupted guest time).
> + */
> + if (use_timer_freq > 0xffffffffu / 10)
> + enable_preemption_timer = false;
> + }
> +
> + if (!enable_preemption_timer) {
> + kvm_x86_ops->set_hv_timer = NULL;
> + kvm_x86_ops->cancel_hv_timer = NULL;
> + kvm_x86_ops->request_immediate_exit = __kvm_request_immediate_exit;
> + }
> +
> + kvm_set_posted_intr_wakeup_handler(wakeup_handler);
> +
> + kvm_mce_cap_supported |= MCG_LMCE_P;
> +
> + if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
> + return -EINVAL;
> + if (!enable_ept || !cpu_has_vmx_intel_pt())
> + pt_mode = PT_MODE_SYSTEM;
> +
> + if (nested) {
> + nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
> + vmx_capability.ept);
> +
> + r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
> + if (r)
> + return r;
> + }
> +
> + vmx_set_cpu_caps();
> +
> + r = alloc_kvm_area();
> + if (r)
> + nested_vmx_hardware_unsetup();
> + return r;
> +}
> +
> static struct kvm_x86_init_ops vmx_init_ops __initdata = {
> .cpu_has_kvm_support = cpu_has_kvm_support,
> .disabled_by_bios = vmx_disabled_by_bios,
Reviewed-by: Vitaly Kuznetsov <[email protected]>
--
Vitaly
Sean Christopherson <[email protected]> writes:
> Configure VMX's runtime hooks by modifying vmx_x86_ops directly instead
> of using the global kvm_x86_ops. This sets the stage for waiting until
> after ->hardware_setup() to set kvm_x86_ops with the vendor's
> implementation.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/nested.c | 15 ++++++++-------
> arch/x86/kvm/vmx/nested.h | 3 ++-
> arch/x86/kvm/vmx/vmx.c | 27 ++++++++++++++-------------
> 3 files changed, 24 insertions(+), 21 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 4ff859c99946..87fea22c3799 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -6241,7 +6241,8 @@ void nested_vmx_hardware_unsetup(void)
> }
> }
>
> -__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
> +__init int nested_vmx_hardware_setup(struct kvm_x86_ops *ops,
> + int (*exit_handlers[])(struct kvm_vcpu *))
> {
> int i;
>
> @@ -6277,12 +6278,12 @@ __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
> exit_handlers[EXIT_REASON_INVVPID] = handle_invvpid;
> exit_handlers[EXIT_REASON_VMFUNC] = handle_vmfunc;
>
> - kvm_x86_ops->check_nested_events = vmx_check_nested_events;
> - kvm_x86_ops->get_nested_state = vmx_get_nested_state;
> - kvm_x86_ops->set_nested_state = vmx_set_nested_state;
> - kvm_x86_ops->get_vmcs12_pages = nested_get_vmcs12_pages;
> - kvm_x86_ops->nested_enable_evmcs = nested_enable_evmcs;
> - kvm_x86_ops->nested_get_evmcs_version = nested_get_evmcs_version;
> + ops->check_nested_events = vmx_check_nested_events;
> + ops->get_nested_state = vmx_get_nested_state;
> + ops->set_nested_state = vmx_set_nested_state;
> + ops->get_vmcs12_pages = nested_get_vmcs12_pages;
> + ops->nested_enable_evmcs = nested_enable_evmcs;
> + ops->nested_get_evmcs_version = nested_get_evmcs_version;
A lazy guy like me would appreciate 'ops' -> 'vmx_x86_ops' rename as it
would make 'git grep vmx_x86_ops' output more complete.
>
> return 0;
> }
> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> index f70968b76d33..ac56aefa49e3 100644
> --- a/arch/x86/kvm/vmx/nested.h
> +++ b/arch/x86/kvm/vmx/nested.h
> @@ -19,7 +19,8 @@ enum nvmx_vmentry_status {
> void vmx_leave_nested(struct kvm_vcpu *vcpu);
> void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps);
> void nested_vmx_hardware_unsetup(void);
> -__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
> +__init int nested_vmx_hardware_setup(struct kvm_x86_ops *ops,
> + int (*exit_handlers[])(struct kvm_vcpu *));
> void nested_vmx_set_vmcs_shadowing_bitmap(void);
> void nested_vmx_free_vcpu(struct kvm_vcpu *vcpu);
> enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 82dab775d520..cfa9093bdc06 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7860,16 +7860,16 @@ static __init int hardware_setup(void)
> * using the APIC_ACCESS_ADDR VMCS field.
> */
> if (!flexpriority_enabled)
> - kvm_x86_ops->set_apic_access_page_addr = NULL;
> + vmx_x86_ops.set_apic_access_page_addr = NULL;
>
> if (!cpu_has_vmx_tpr_shadow())
> - kvm_x86_ops->update_cr8_intercept = NULL;
> + vmx_x86_ops.update_cr8_intercept = NULL;
>
> #if IS_ENABLED(CONFIG_HYPERV)
> if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
> && enable_ept) {
> - kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
> - kvm_x86_ops->tlb_remote_flush_with_range =
> + vmx_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
> + vmx_x86_ops.tlb_remote_flush_with_range =
> hv_remote_flush_tlb_with_range;
> }
> #endif
> @@ -7884,7 +7884,7 @@ static __init int hardware_setup(void)
>
> if (!cpu_has_vmx_apicv()) {
> enable_apicv = 0;
> - kvm_x86_ops->sync_pir_to_irr = NULL;
> + vmx_x86_ops.sync_pir_to_irr = NULL;
> }
>
> if (cpu_has_vmx_tsc_scaling()) {
> @@ -7916,10 +7916,10 @@ static __init int hardware_setup(void)
> enable_pml = 0;
>
> if (!enable_pml) {
> - kvm_x86_ops->slot_enable_log_dirty = NULL;
> - kvm_x86_ops->slot_disable_log_dirty = NULL;
> - kvm_x86_ops->flush_log_dirty = NULL;
> - kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
> + vmx_x86_ops.slot_enable_log_dirty = NULL;
> + vmx_x86_ops.slot_disable_log_dirty = NULL;
> + vmx_x86_ops.flush_log_dirty = NULL;
> + vmx_x86_ops.enable_log_dirty_pt_masked = NULL;
> }
>
> if (!cpu_has_vmx_preemption_timer())
> @@ -7947,9 +7947,9 @@ static __init int hardware_setup(void)
> }
>
> if (!enable_preemption_timer) {
> - kvm_x86_ops->set_hv_timer = NULL;
> - kvm_x86_ops->cancel_hv_timer = NULL;
> - kvm_x86_ops->request_immediate_exit = __kvm_request_immediate_exit;
> + vmx_x86_ops.set_hv_timer = NULL;
> + vmx_x86_ops.cancel_hv_timer = NULL;
> + vmx_x86_ops.request_immediate_exit = __kvm_request_immediate_exit;
> }
>
> kvm_set_posted_intr_wakeup_handler(wakeup_handler);
> @@ -7965,7 +7965,8 @@ static __init int hardware_setup(void)
> nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
> vmx_capability.ept);
>
> - r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
> + r = nested_vmx_hardware_setup(&vmx_x86_ops,
> + kvm_vmx_exit_handlers);
> if (r)
> return r;
> }
Reviewed-by: Vitaly Kuznetsov <[email protected]>
--
Vitaly
Sean Christopherson <[email protected]> writes:
> Remove the __exit annotation from VMX hardware_unsetup(), the hook
> can be reached during kvm_init() by way of kvm_arch_hardware_unsetup()
> if failure occurs at various points during initialization.
>
> Note, there is no known functional issue with the __exit annotation, the
> above is merely justification for its removal. The real motivation is
> to be able to annotate vmx_x86_ops and svm_x86_ops with __initdata,
> which makes objtool complain because objtool doesn't understand that the
> vendor specific __initdata is being copied by value to a non-__initdata
> instance.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/vmx/vmx.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 54f991244fae..42a2d0d3984a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1056,7 +1056,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
> struct kvm_x86_ops {
> int (*hardware_enable)(void);
> void (*hardware_disable)(void);
> - void (*hardware_unsetup)(void); /* __exit */
> + void (*hardware_unsetup)(void);
> bool (*cpu_has_accelerated_tpr)(void);
> bool (*has_emulated_msr)(int index);
> void (*cpuid_update)(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4bbe0d165a0c..fac22e316417 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7652,7 +7652,7 @@ static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
> return to_vmx(vcpu)->nested.vmxon;
> }
>
> -static __exit void hardware_unsetup(void)
> +static void hardware_unsetup(void)
> {
> if (nested)
> nested_vmx_hardware_unsetup();
Reviewed-by: Vitaly Kuznetsov <[email protected]>
--
Vitaly
On Mon, Mar 23, 2020 at 01:10:40PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <[email protected]> writes:
>
> > +
> > + .runtime_ops = &svm_x86_ops,
> > +};
>
> Unrelated to your patch but I think we can make the naming of some of
> these functions more consistend on SVM/VMX, in particular I'd suggest
>
> has_svm() -> cpu_has_svm_support()
> is_disabled -> svm_disabled_by_bios()
> ...
> (see below for VMX)
>
> > +
> > static int __init svm_init(void)
> > {
> > - return kvm_init(&svm_x86_ops, sizeof(struct vcpu_svm),
> > + return kvm_init(&svm_init_ops, sizeof(struct vcpu_svm),
> > __alignof__(struct vcpu_svm), THIS_MODULE);
> > }
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 07299a957d4a..ffcdcc86f5b7 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -7842,11 +7842,8 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
> > }
> >
> > static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > - .cpu_has_kvm_support = cpu_has_kvm_support,
> > - .disabled_by_bios = vmx_disabled_by_bios,
> > - .hardware_setup = hardware_setup,
> > .hardware_unsetup = hardware_unsetup,
> > - .check_processor_compatibility = vmx_check_processor_compat,
> > +
> > .hardware_enable = hardware_enable,
> > .hardware_disable = hardware_disable,
> > .cpu_has_accelerated_tpr = report_flexpriority,
> > @@ -7981,6 +7978,15 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> > };
> >
> > +static struct kvm_x86_init_ops vmx_init_ops __initdata = {
> > + .cpu_has_kvm_support = cpu_has_kvm_support,
> > + .disabled_by_bios = vmx_disabled_by_bios,
> > + .check_processor_compatibility = vmx_check_processor_compat,
> > + .hardware_setup = hardware_setup,
>
> cpu_has_kvm_support() -> cpu_has_vmx_support()
> hardware_setup() -> vmx_hardware_setup()
Preaching to the choir on this one. The VMX functions without prefixes in
in particular annoy me to no end, e.g. hardware_setup(). Though the worst
is probably ".vcpu_create = vmx_create_vcpu", if I had a nickel for every
time I've tried to find vmx_vcpu_create()...
What if we added a macro to auto-generate the common/required hooks? E.g.:
static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
MANDATORY_KVM_X86_OPS(vmx),
.pmu_ops = &intel_pmu_ops,
...
};
That'd enforce consistent naming, and would provide a bit of documentation
as to which hooks are optional, e.g. many of the nested hooks, and which
must be defined for KVM to function.
On Mon, Mar 23, 2020 at 01:27:28PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <[email protected]> writes:
>
> > Configure VMX's runtime hooks by modifying vmx_x86_ops directly instead
> > of using the global kvm_x86_ops. This sets the stage for waiting until
> > after ->hardware_setup() to set kvm_x86_ops with the vendor's
> > implementation.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/kvm/vmx/nested.c | 15 ++++++++-------
> > arch/x86/kvm/vmx/nested.h | 3 ++-
> > arch/x86/kvm/vmx/vmx.c | 27 ++++++++++++++-------------
> > 3 files changed, 24 insertions(+), 21 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 4ff859c99946..87fea22c3799 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -6241,7 +6241,8 @@ void nested_vmx_hardware_unsetup(void)
> > }
> > }
> >
> > -__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
> > +__init int nested_vmx_hardware_setup(struct kvm_x86_ops *ops,
> > + int (*exit_handlers[])(struct kvm_vcpu *))
> > {
> > int i;
> >
> > @@ -6277,12 +6278,12 @@ __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
> > exit_handlers[EXIT_REASON_INVVPID] = handle_invvpid;
> > exit_handlers[EXIT_REASON_VMFUNC] = handle_vmfunc;
> >
> > - kvm_x86_ops->check_nested_events = vmx_check_nested_events;
> > - kvm_x86_ops->get_nested_state = vmx_get_nested_state;
> > - kvm_x86_ops->set_nested_state = vmx_set_nested_state;
> > - kvm_x86_ops->get_vmcs12_pages = nested_get_vmcs12_pages;
> > - kvm_x86_ops->nested_enable_evmcs = nested_enable_evmcs;
> > - kvm_x86_ops->nested_get_evmcs_version = nested_get_evmcs_version;
> > + ops->check_nested_events = vmx_check_nested_events;
> > + ops->get_nested_state = vmx_get_nested_state;
> > + ops->set_nested_state = vmx_set_nested_state;
> > + ops->get_vmcs12_pages = nested_get_vmcs12_pages;
> > + ops->nested_enable_evmcs = nested_enable_evmcs;
> > + ops->nested_get_evmcs_version = nested_get_evmcs_version;
>
>
> A lazy guy like me would appreciate 'ops' -> 'vmx_x86_ops' rename as it
> would make 'git grep vmx_x86_ops' output more complete.
Ah, didn't think about that, obviously.
Sean Christopherson <[email protected]> writes:
> On Mon, Mar 23, 2020 at 01:10:40PM +0100, Vitaly Kuznetsov wrote:
>> Sean Christopherson <[email protected]> writes:
>>
>> > +
>> > + .runtime_ops = &svm_x86_ops,
>> > +};
>>
>> Unrelated to your patch but I think we can make the naming of some of
>> these functions more consistend on SVM/VMX, in particular I'd suggest
>>
>> has_svm() -> cpu_has_svm_support()
>> is_disabled -> svm_disabled_by_bios()
>> ...
>> (see below for VMX)
>>
>> > +
>> > static int __init svm_init(void)
>> > {
>> > - return kvm_init(&svm_x86_ops, sizeof(struct vcpu_svm),
>> > + return kvm_init(&svm_init_ops, sizeof(struct vcpu_svm),
>> > __alignof__(struct vcpu_svm), THIS_MODULE);
>> > }
>> >
>> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> > index 07299a957d4a..ffcdcc86f5b7 100644
>> > --- a/arch/x86/kvm/vmx/vmx.c
>> > +++ b/arch/x86/kvm/vmx/vmx.c
>> > @@ -7842,11 +7842,8 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
>> > }
>> >
>> > static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>> > - .cpu_has_kvm_support = cpu_has_kvm_support,
>> > - .disabled_by_bios = vmx_disabled_by_bios,
>> > - .hardware_setup = hardware_setup,
>> > .hardware_unsetup = hardware_unsetup,
>> > - .check_processor_compatibility = vmx_check_processor_compat,
>> > +
>> > .hardware_enable = hardware_enable,
>> > .hardware_disable = hardware_disable,
>> > .cpu_has_accelerated_tpr = report_flexpriority,
>> > @@ -7981,6 +7978,15 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>> > .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
>> > };
>> >
>> > +static struct kvm_x86_init_ops vmx_init_ops __initdata = {
>> > + .cpu_has_kvm_support = cpu_has_kvm_support,
>> > + .disabled_by_bios = vmx_disabled_by_bios,
>> > + .check_processor_compatibility = vmx_check_processor_compat,
>> > + .hardware_setup = hardware_setup,
>>
>> cpu_has_kvm_support() -> cpu_has_vmx_support()
>> hardware_setup() -> vmx_hardware_setup()
>
> Preaching to the choir on this one. The VMX functions without prefixes in
> in particular annoy me to no end, e.g. hardware_setup(). Though the worst
> is probably ".vcpu_create = vmx_create_vcpu", if I had a nickel for every
> time I've tried to find vmx_vcpu_create()...
>
> What if we added a macro to auto-generate the common/required hooks? E.g.:
>
> static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> MANDATORY_KVM_X86_OPS(vmx),
>
> .pmu_ops = &intel_pmu_ops,
>
> ...
> };
>
> That'd enforce consistent naming, and would provide a bit of documentation
> as to which hooks are optional, e.g. many of the nested hooks, and which
> must be defined for KVM to function.
Sounds cool! (not sure that with only two implementations people won't
call it 'over-engineered' but cool). My personal wish would just be that
function names in function implementations are not auto-generated so
e.g. a simple 'git grep vmx_hardware_setup' works but the way how we
fill vmx_x86_ops in can be macroed I guess.
--
Vitaly
On Mon, Mar 23, 2020 at 05:24:56PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <[email protected]> writes:
>
> > On Mon, Mar 23, 2020 at 01:10:40PM +0100, Vitaly Kuznetsov wrote:
> >> Sean Christopherson <[email protected]> writes:
> >>
> >> > +
> >> > + .runtime_ops = &svm_x86_ops,
> >> > +};
> >>
> >> Unrelated to your patch but I think we can make the naming of some of
> >> these functions more consistend on SVM/VMX, in particular I'd suggest
> >>
> >> has_svm() -> cpu_has_svm_support()
> >> is_disabled -> svm_disabled_by_bios()
> >> ...
> >> (see below for VMX)
> >>
> >> > +
> >> > static int __init svm_init(void)
> >> > {
> >> > - return kvm_init(&svm_x86_ops, sizeof(struct vcpu_svm),
> >> > + return kvm_init(&svm_init_ops, sizeof(struct vcpu_svm),
> >> > __alignof__(struct vcpu_svm), THIS_MODULE);
> >> > }
> >> >
> >> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >> > index 07299a957d4a..ffcdcc86f5b7 100644
> >> > --- a/arch/x86/kvm/vmx/vmx.c
> >> > +++ b/arch/x86/kvm/vmx/vmx.c
> >> > @@ -7842,11 +7842,8 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
> >> > }
> >> >
> >> > static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> >> > - .cpu_has_kvm_support = cpu_has_kvm_support,
> >> > - .disabled_by_bios = vmx_disabled_by_bios,
> >> > - .hardware_setup = hardware_setup,
> >> > .hardware_unsetup = hardware_unsetup,
> >> > - .check_processor_compatibility = vmx_check_processor_compat,
> >> > +
> >> > .hardware_enable = hardware_enable,
> >> > .hardware_disable = hardware_disable,
> >> > .cpu_has_accelerated_tpr = report_flexpriority,
> >> > @@ -7981,6 +7978,15 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> >> > .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> >> > };
> >> >
> >> > +static struct kvm_x86_init_ops vmx_init_ops __initdata = {
> >> > + .cpu_has_kvm_support = cpu_has_kvm_support,
> >> > + .disabled_by_bios = vmx_disabled_by_bios,
> >> > + .check_processor_compatibility = vmx_check_processor_compat,
> >> > + .hardware_setup = hardware_setup,
> >>
> >> cpu_has_kvm_support() -> cpu_has_vmx_support()
> >> hardware_setup() -> vmx_hardware_setup()
> >
> > Preaching to the choir on this one. The VMX functions without prefixes in
> > in particular annoy me to no end, e.g. hardware_setup(). Though the worst
> > is probably ".vcpu_create = vmx_create_vcpu", if I had a nickel for every
> > time I've tried to find vmx_vcpu_create()...
> >
> > What if we added a macro to auto-generate the common/required hooks? E.g.:
> >
> > static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > MANDATORY_KVM_X86_OPS(vmx),
> >
> > .pmu_ops = &intel_pmu_ops,
> >
> > ...
> > };
> >
> > That'd enforce consistent naming, and would provide a bit of documentation
> > as to which hooks are optional, e.g. many of the nested hooks, and which
> > must be defined for KVM to function.
>
> Sounds cool! (not sure that with only two implementations people won't
> call it 'over-engineered' but cool). My personal wish would just be that
> function names in function implementations are not auto-generated so
> e.g. a simple 'git grep vmx_hardware_setup' works but the way how we
> fill vmx_x86_ops in can be macroed I guess.
Ya, I was thinking of just the macro. Even that has downsides though, e.g.
chasing kvm_x86_ops.hardware_setup() to find VMX's hardware_setup() becomes
a bit kludgy. On the other hand, _if_ you know how the fill macro works,
getting to the implementation should be easier.
On 23/03/20 17:24, Vitaly Kuznetsov wrote:
> Sounds cool! (not sure that with only two implementations people won't
> call it 'over-engineered' but cool).
Yes, something like
#define KVM_X86_OP(name) .name = vmx_##name
(svm_##name for svm.c) and then
KVM_X86_OP(check_nested_events)
etc.
> My personal wish would just be that
> function names in function implementations are not auto-generated so
> e.g. a simple 'git grep vmx_hardware_setup' works
Yes, absolutely; the function names would still be written by hand.
Paolo
> but the way how we
> fill vmx_x86_ops in can be macroed I guess.
On 23/03/20 13:27, Vitaly Kuznetsov wrote:
>> - kvm_x86_ops->check_nested_events = vmx_check_nested_events;
>> - kvm_x86_ops->get_nested_state = vmx_get_nested_state;
>> - kvm_x86_ops->set_nested_state = vmx_set_nested_state;
>> - kvm_x86_ops->get_vmcs12_pages = nested_get_vmcs12_pages;
>> - kvm_x86_ops->nested_enable_evmcs = nested_enable_evmcs;
>> - kvm_x86_ops->nested_get_evmcs_version = nested_get_evmcs_version;
>> + ops->check_nested_events = vmx_check_nested_events;
>> + ops->get_nested_state = vmx_get_nested_state;
>> + ops->set_nested_state = vmx_set_nested_state;
>> + ops->get_vmcs12_pages = nested_get_vmcs12_pages;
>> + ops->nested_enable_evmcs = nested_enable_evmcs;
>> + ops->nested_get_evmcs_version = nested_get_evmcs_version;
>
> A lazy guy like me would appreciate 'ops' -> 'vmx_x86_ops' rename as it
> would make 'git grep vmx_x86_ops' output more complete.
>
I would prefer even more a kvm_x86_ops.nested struct but I would be okay
with a separate patch.
Paolo
On Sat, Mar 21, 2020 at 01:25:55PM -0700, Sean Christopherson wrote:
> Pass @opaque to kvm_arch_hardware_setup() and
> kvm_arch_check_processor_compat() to allow architecture specific code to
> reference @opaque without having to stash it away in a temporary global
> variable. This will enable x86 to separate its vendor specific callback
> ops, which are passed via @opaque, into "init" and "runtime" ops without
> having to stash away the "init" ops.
>
> No functional change intended.
>
> Reviewed-by: Cornelia Huck <[email protected]>
> Tested-by: Cornelia Huck <[email protected]> #s390
> Acked-by: Marc Zyngier <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
Acked-by: Paul Mackerras <[email protected]>