2020-03-21 20:27:24

by Sean Christopherson

Subject: [PATCH v3 0/9] KVM: Move x86 init ops to separate struct

The non-x86 part of this series is wholly contained in patch 01. Compared
to other recent kvm-wide changes, this one is very straightforward (famous
last words).

Like a few other architectures, e.g. PPC, x86 uses a set of global hooks
to call back into vendor code on demand. A handful of the x86 hooks are
used only within the scope of kvm_init(). This series moves the init-only
hooks to a separate struct, partly to clean up the code a bit, but mainly
so that the runtime hooks can be made available only after the x86 vendor
has completed its ->hardware_setup(). While working on a different series
I spent a fair bit of time scratching my head as to why a kvm_x86_ops hook
wasn't working, and eventually realized VMX's callback wasn't "ready" because the
vmcs_config hadn't yet been populated.
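
As a quick reference, the end state looks roughly like this (field lists
heavily trimmed, see patch 2 for the real definitions):

  struct kvm_x86_ops {
          int (*hardware_enable)(void);
          void (*hardware_disable)(void);
          void (*hardware_unsetup)(void);
          /* ... runtime hooks only ... */
  };

  struct kvm_x86_init_ops {
          int (*cpu_has_kvm_support)(void);
          int (*disabled_by_bios)(void);
          int (*check_processor_compatibility)(void);
          int (*hardware_setup)(void);

          struct kvm_x86_ops *runtime_ops;
  };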

Due to lack of a cross-compiling setup, the non-x86 changes in patch 01
are untested.

v3:
- Rebase to kvm/queue, d55c9d4009c7 ("KVM: nSVM: check for EFER ... ").
Conflicts galore, but all mechanical in nature.
- Drop an SVM patch that was obsoleted by kvm/queue.
- Collect an ack. [Marc]

v2:
- Rebase to kvm/queue, 2c2787938512 ("KVM: selftests: Stop ...")
- Collect tags. [Cornelia]
- Add a patch to make kvm_x86_ops its own instance and copy
{vmx,svm}_x86_ops by value, which saves a memory access on every
invocation of a kvm_x86_ops hook. [Paolo]
- Add patches to tag {vmx,svm}_x86_ops as __initdata after they're
copied by value.


Sean Christopherson (9):
KVM: Pass kvm_init()'s opaque param to additional arch funcs
KVM: x86: Move init-only kvm_x86_ops to separate struct
KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
KVM: VMX: Configure runtime hooks using vmx_x86_ops
KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
KVM: VMX: Annotate vmx_x86_ops as __initdata
KVM: SVM: Annotate svm_x86_ops as __initdata

arch/mips/kvm/mips.c | 4 +-
arch/powerpc/kvm/powerpc.c | 4 +-
arch/s390/kvm/kvm-s390.c | 4 +-
arch/x86/include/asm/kvm_host.h | 33 +--
arch/x86/kvm/cpuid.c | 4 +-
arch/x86/kvm/hyperv.c | 8 +-
arch/x86/kvm/kvm_cache_regs.h | 10 +-
arch/x86/kvm/lapic.c | 30 +--
arch/x86/kvm/mmu.h | 8 +-
arch/x86/kvm/mmu/mmu.c | 32 +--
arch/x86/kvm/pmu.c | 30 +--
arch/x86/kvm/pmu.h | 2 +-
arch/x86/kvm/svm.c | 19 +-
arch/x86/kvm/trace.h | 4 +-
arch/x86/kvm/vmx/nested.c | 17 +-
arch/x86/kvm/vmx/nested.h | 3 +-
arch/x86/kvm/vmx/vmx.c | 371 ++++++++++++++++----------------
arch/x86/kvm/x86.c | 370 +++++++++++++++----------------
arch/x86/kvm/x86.h | 4 +-
include/linux/kvm_host.h | 4 +-
virt/kvm/arm/arm.c | 4 +-
virt/kvm/kvm_main.c | 18 +-
22 files changed, 507 insertions(+), 476 deletions(-)

--
2.24.1


2020-03-21 20:27:25

by Sean Christopherson

Subject: [PATCH v3 4/9] KVM: VMX: Configure runtime hooks using vmx_x86_ops

Configure VMX's runtime hooks by modifying vmx_x86_ops directly instead
of using the global kvm_x86_ops. This sets the stage for waiting until
after ->hardware_setup() to set kvm_x86_ops with the vendor's
implementation.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 15 ++++++++-------
arch/x86/kvm/vmx/nested.h | 3 ++-
arch/x86/kvm/vmx/vmx.c | 27 ++++++++++++++-------------
3 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4ff859c99946..87fea22c3799 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6241,7 +6241,8 @@ void nested_vmx_hardware_unsetup(void)
}
}

-__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
+__init int nested_vmx_hardware_setup(struct kvm_x86_ops *ops,
+ int (*exit_handlers[])(struct kvm_vcpu *))
{
int i;

@@ -6277,12 +6278,12 @@ __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
exit_handlers[EXIT_REASON_INVVPID] = handle_invvpid;
exit_handlers[EXIT_REASON_VMFUNC] = handle_vmfunc;

- kvm_x86_ops->check_nested_events = vmx_check_nested_events;
- kvm_x86_ops->get_nested_state = vmx_get_nested_state;
- kvm_x86_ops->set_nested_state = vmx_set_nested_state;
- kvm_x86_ops->get_vmcs12_pages = nested_get_vmcs12_pages;
- kvm_x86_ops->nested_enable_evmcs = nested_enable_evmcs;
- kvm_x86_ops->nested_get_evmcs_version = nested_get_evmcs_version;
+ ops->check_nested_events = vmx_check_nested_events;
+ ops->get_nested_state = vmx_get_nested_state;
+ ops->set_nested_state = vmx_set_nested_state;
+ ops->get_vmcs12_pages = nested_get_vmcs12_pages;
+ ops->nested_enable_evmcs = nested_enable_evmcs;
+ ops->nested_get_evmcs_version = nested_get_evmcs_version;

return 0;
}
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index f70968b76d33..ac56aefa49e3 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -19,7 +19,8 @@ enum nvmx_vmentry_status {
void vmx_leave_nested(struct kvm_vcpu *vcpu);
void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps);
void nested_vmx_hardware_unsetup(void);
-__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
+__init int nested_vmx_hardware_setup(struct kvm_x86_ops *ops,
+ int (*exit_handlers[])(struct kvm_vcpu *));
void nested_vmx_set_vmcs_shadowing_bitmap(void);
void nested_vmx_free_vcpu(struct kvm_vcpu *vcpu);
enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 82dab775d520..cfa9093bdc06 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7860,16 +7860,16 @@ static __init int hardware_setup(void)
* using the APIC_ACCESS_ADDR VMCS field.
*/
if (!flexpriority_enabled)
- kvm_x86_ops->set_apic_access_page_addr = NULL;
+ vmx_x86_ops.set_apic_access_page_addr = NULL;

if (!cpu_has_vmx_tpr_shadow())
- kvm_x86_ops->update_cr8_intercept = NULL;
+ vmx_x86_ops.update_cr8_intercept = NULL;

#if IS_ENABLED(CONFIG_HYPERV)
if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
&& enable_ept) {
- kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
- kvm_x86_ops->tlb_remote_flush_with_range =
+ vmx_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
+ vmx_x86_ops.tlb_remote_flush_with_range =
hv_remote_flush_tlb_with_range;
}
#endif
@@ -7884,7 +7884,7 @@ static __init int hardware_setup(void)

if (!cpu_has_vmx_apicv()) {
enable_apicv = 0;
- kvm_x86_ops->sync_pir_to_irr = NULL;
+ vmx_x86_ops.sync_pir_to_irr = NULL;
}

if (cpu_has_vmx_tsc_scaling()) {
@@ -7916,10 +7916,10 @@ static __init int hardware_setup(void)
enable_pml = 0;

if (!enable_pml) {
- kvm_x86_ops->slot_enable_log_dirty = NULL;
- kvm_x86_ops->slot_disable_log_dirty = NULL;
- kvm_x86_ops->flush_log_dirty = NULL;
- kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
+ vmx_x86_ops.slot_enable_log_dirty = NULL;
+ vmx_x86_ops.slot_disable_log_dirty = NULL;
+ vmx_x86_ops.flush_log_dirty = NULL;
+ vmx_x86_ops.enable_log_dirty_pt_masked = NULL;
}

if (!cpu_has_vmx_preemption_timer())
@@ -7947,9 +7947,9 @@ static __init int hardware_setup(void)
}

if (!enable_preemption_timer) {
- kvm_x86_ops->set_hv_timer = NULL;
- kvm_x86_ops->cancel_hv_timer = NULL;
- kvm_x86_ops->request_immediate_exit = __kvm_request_immediate_exit;
+ vmx_x86_ops.set_hv_timer = NULL;
+ vmx_x86_ops.cancel_hv_timer = NULL;
+ vmx_x86_ops.request_immediate_exit = __kvm_request_immediate_exit;
}

kvm_set_posted_intr_wakeup_handler(wakeup_handler);
@@ -7965,7 +7965,8 @@ static __init int hardware_setup(void)
nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
vmx_capability.ept);

- r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
+ r = nested_vmx_hardware_setup(&vmx_x86_ops,
+ kvm_vmx_exit_handlers);
if (r)
return r;
}
--
2.24.1

2020-03-21 20:27:27

by Sean Christopherson

Subject: [PATCH v3 7/9] KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()

Remove the __exit annotation from VMX hardware_unsetup(), as the hook
can be reached during kvm_init() by way of kvm_arch_hardware_unsetup()
if failure occurs at various points during initialization.

Note, there is no known functional issue with the __exit annotation; the
above is merely justification for its removal. The real motivation is
to be able to annotate vmx_x86_ops and svm_x86_ops with __initdata,
which makes objtool complain because objtool doesn't understand that the
vendor-specific __initdata struct is being copied by value to a non-__initdata
instance.
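
For context, the copy-by-value in question amounts to roughly the
following (a simplified sketch of the earlier patch in this series, not
the exact diff):

  /* kvm_x86_ops is now a standalone instance rather than a pointer. */
  struct kvm_x86_ops kvm_x86_ops;

  int kvm_arch_hardware_setup(void *opaque)
  {
          struct kvm_x86_init_ops *ops = opaque;
          int r;

          r = ops->hardware_setup();
          if (r != 0)
                  return r;

          /* Copy by value; {vmx,svm}_x86_ops are not used after init. */
          kvm_x86_ops = *ops->runtime_ops;
          return 0;
  }

With kvm_x86_ops holding its own copy, the vendor structs are only
dereferenced during init, which is what allows tagging them __initdata in
the following patches.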

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/vmx/vmx.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 54f991244fae..42a2d0d3984a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1056,7 +1056,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
struct kvm_x86_ops {
int (*hardware_enable)(void);
void (*hardware_disable)(void);
- void (*hardware_unsetup)(void); /* __exit */
+ void (*hardware_unsetup)(void);
bool (*cpu_has_accelerated_tpr)(void);
bool (*has_emulated_msr)(int index);
void (*cpuid_update)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4bbe0d165a0c..fac22e316417 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7652,7 +7652,7 @@ static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
return to_vmx(vcpu)->nested.vmxon;
}

-static __exit void hardware_unsetup(void)
+static void hardware_unsetup(void)
{
if (nested)
nested_vmx_hardware_unsetup();
--
2.24.1

2020-03-21 20:27:48

by Sean Christopherson

Subject: [PATCH v3 2/9] KVM: x86: Move init-only kvm_x86_ops to separate struct

Move the kvm_x86_ops functions that are used only within the scope of
kvm_init() into a separate struct, kvm_x86_init_ops. In addition to
identifying the init-only functions without resorting to code comments,
this also sets the stage for waiting until after ->hardware_setup() to
set kvm_x86_ops. Setting kvm_x86_ops after ->hardware_setup() is
desirable as many of the hooks are not usable until ->hardware_setup()
completes.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 13 +++++++++----
arch/x86/kvm/svm.c | 15 ++++++++++-----
arch/x86/kvm/vmx/vmx.c | 16 +++++++++++-----
arch/x86/kvm/x86.c | 10 ++++++----
4 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9a183e9d4cb1..f4c5b49299ff 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1054,12 +1054,8 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
}

struct kvm_x86_ops {
- int (*cpu_has_kvm_support)(void); /* __init */
- int (*disabled_by_bios)(void); /* __init */
int (*hardware_enable)(void);
void (*hardware_disable)(void);
- int (*check_processor_compatibility)(void);/* __init */
- int (*hardware_setup)(void); /* __init */
void (*hardware_unsetup)(void); /* __exit */
bool (*cpu_has_accelerated_tpr)(void);
bool (*has_emulated_msr)(int index);
@@ -1260,6 +1256,15 @@ struct kvm_x86_ops {
int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
};

+struct kvm_x86_init_ops {
+ int (*cpu_has_kvm_support)(void);
+ int (*disabled_by_bios)(void);
+ int (*check_processor_compatibility)(void);
+ int (*hardware_setup)(void);
+
+ struct kvm_x86_ops *runtime_ops;
+};
+
struct kvm_arch_async_pf {
u32 token;
gfn_t gfn;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2125c6ae5951..33e67c3389c2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7351,11 +7351,7 @@ static void svm_pre_update_apicv_exec_ctrl(struct kvm *kvm, bool activate)
}

static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
- .cpu_has_kvm_support = has_svm,
- .disabled_by_bios = is_disabled,
- .hardware_setup = svm_hardware_setup,
.hardware_unsetup = svm_hardware_teardown,
- .check_processor_compatibility = svm_check_processor_compat,
.hardware_enable = svm_hardware_enable,
.hardware_disable = svm_hardware_disable,
.cpu_has_accelerated_tpr = svm_cpu_has_accelerated_tpr,
@@ -7480,9 +7476,18 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
.check_nested_events = svm_check_nested_events,
};

+static struct kvm_x86_init_ops svm_init_ops __initdata = {
+ .cpu_has_kvm_support = has_svm,
+ .disabled_by_bios = is_disabled,
+ .hardware_setup = svm_hardware_setup,
+ .check_processor_compatibility = svm_check_processor_compat,
+
+ .runtime_ops = &svm_x86_ops,
+};
+
static int __init svm_init(void)
{
- return kvm_init(&svm_x86_ops, sizeof(struct vcpu_svm),
+ return kvm_init(&svm_init_ops, sizeof(struct vcpu_svm),
__alignof__(struct vcpu_svm), THIS_MODULE);
}

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 07299a957d4a..ffcdcc86f5b7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7842,11 +7842,8 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
}

static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
- .cpu_has_kvm_support = cpu_has_kvm_support,
- .disabled_by_bios = vmx_disabled_by_bios,
- .hardware_setup = hardware_setup,
.hardware_unsetup = hardware_unsetup,
- .check_processor_compatibility = vmx_check_processor_compat,
+
.hardware_enable = hardware_enable,
.hardware_disable = hardware_disable,
.cpu_has_accelerated_tpr = report_flexpriority,
@@ -7981,6 +7978,15 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
};

+static struct kvm_x86_init_ops vmx_init_ops __initdata = {
+ .cpu_has_kvm_support = cpu_has_kvm_support,
+ .disabled_by_bios = vmx_disabled_by_bios,
+ .check_processor_compatibility = vmx_check_processor_compat,
+ .hardware_setup = hardware_setup,
+
+ .runtime_ops = &vmx_x86_ops,
+};
+
static void vmx_cleanup_l1d_flush(void)
{
if (vmx_l1d_flush_pages) {
@@ -8065,7 +8071,7 @@ static int __init vmx_init(void)
}
#endif

- r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
+ r = kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx),
__alignof__(struct vcpu_vmx), THIS_MODULE);
if (r)
return r;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0f08e1b4e762..20f989d1bba8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7298,8 +7298,8 @@ static struct notifier_block pvclock_gtod_notifier = {

int kvm_arch_init(void *opaque)
{
+ struct kvm_x86_init_ops *ops = opaque;
int r;
- struct kvm_x86_ops *ops = opaque;

if (kvm_x86_ops) {
printk(KERN_ERR "kvm: already loaded the other module\n");
@@ -7354,7 +7354,7 @@ int kvm_arch_init(void *opaque)
if (r)
goto out_free_percpu;

- kvm_x86_ops = ops;
+ kvm_x86_ops = ops->runtime_ops;

kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
PT_DIRTY_MASK, PT64_NX_MASK, 0,
@@ -9623,6 +9623,7 @@ void kvm_arch_hardware_disable(void)

int kvm_arch_hardware_setup(void *opaque)
{
+ struct kvm_x86_init_ops *ops = opaque;
int r;

rdmsrl_safe(MSR_EFER, &host_efer);
@@ -9630,7 +9631,7 @@ int kvm_arch_hardware_setup(void *opaque)
if (boot_cpu_has(X86_FEATURE_XSAVES))
rdmsrl(MSR_IA32_XSS, host_xss);

- r = kvm_x86_ops->hardware_setup();
+ r = ops->hardware_setup();
if (r != 0)
return r;

@@ -9665,13 +9666,14 @@ void kvm_arch_hardware_unsetup(void)
int kvm_arch_check_processor_compat(void *opaque)
{
struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
+ struct kvm_x86_init_ops *ops = opaque;

WARN_ON(!irqs_disabled());

if (kvm_host_cr4_reserved_bits(c) != cr4_reserved_bits)
return -EIO;

- return kvm_x86_ops->check_processor_compatibility();
+ return ops->check_processor_compatibility();
}

bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
--
2.24.1

2020-03-21 20:27:58

by Sean Christopherson

Subject: [PATCH v3 1/9] KVM: Pass kvm_init()'s opaque param to additional arch funcs

Pass @opaque to kvm_arch_hardware_setup() and
kvm_arch_check_processor_compat() to allow architecture specific code to
reference @opaque without having to stash it away in a temporary global
variable. This will enable x86 to separate its vendor specific callback
ops, which are passed via @opaque, into "init" and "runtime" ops without
having to stash away the "init" ops.

No functional change intended.

Reviewed-by: Cornelia Huck <[email protected]>
Tested-by: Cornelia Huck <[email protected]> #s390
Acked-by: Marc Zyngier <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/mips/kvm/mips.c | 4 ++--
arch/powerpc/kvm/powerpc.c | 4 ++--
arch/s390/kvm/kvm-s390.c | 4 ++--
arch/x86/kvm/x86.c | 4 ++--
include/linux/kvm_host.h | 4 ++--
virt/kvm/arm/arm.c | 4 ++--
virt/kvm/kvm_main.c | 18 ++++++++++++++----
7 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 78507757ba9a..8f05dd0a0f4e 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -118,12 +118,12 @@ void kvm_arch_hardware_disable(void)
kvm_mips_callbacks->hardware_disable();
}

-int kvm_arch_hardware_setup(void)
+int kvm_arch_hardware_setup(void *opaque)
{
return 0;
}

-int kvm_arch_check_processor_compat(void)
+int kvm_arch_check_processor_compat(void *opaque)
{
return 0;
}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 62ee66d5eb6f..f54574564f1e 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -415,12 +415,12 @@ int kvm_arch_hardware_enable(void)
return 0;
}

-int kvm_arch_hardware_setup(void)
+int kvm_arch_hardware_setup(void *opaque)
{
return 0;
}

-int kvm_arch_check_processor_compat(void)
+int kvm_arch_check_processor_compat(void *opaque)
{
return kvmppc_core_check_processor_compat();
}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 807ed6d722dd..0ac8eb29e5aa 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -235,7 +235,7 @@ int kvm_arch_hardware_enable(void)
return 0;
}

-int kvm_arch_check_processor_compat(void)
+int kvm_arch_check_processor_compat(void *opaque)
{
return 0;
}
@@ -302,7 +302,7 @@ static struct notifier_block kvm_clock_notifier = {
.notifier_call = kvm_clock_sync,
};

-int kvm_arch_hardware_setup(void)
+int kvm_arch_hardware_setup(void *opaque)
{
gmap_notifier.notifier_call = kvm_gmap_notifier;
gmap_register_pte_notifier(&gmap_notifier);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e54c6ad628a8..0f08e1b4e762 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9621,7 +9621,7 @@ void kvm_arch_hardware_disable(void)
drop_user_return_notifiers();
}

-int kvm_arch_hardware_setup(void)
+int kvm_arch_hardware_setup(void *opaque)
{
int r;

@@ -9662,7 +9662,7 @@ void kvm_arch_hardware_unsetup(void)
kvm_x86_ops->hardware_unsetup();
}

-int kvm_arch_check_processor_compat(void)
+int kvm_arch_check_processor_compat(void *opaque)
{
struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 35bc52e187a2..bc48d8fef3e1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -886,9 +886,9 @@ void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu);

int kvm_arch_hardware_enable(void);
void kvm_arch_hardware_disable(void);
-int kvm_arch_hardware_setup(void);
+int kvm_arch_hardware_setup(void *opaque);
void kvm_arch_hardware_unsetup(void);
-int kvm_arch_check_processor_compat(void);
+int kvm_arch_check_processor_compat(void *opaque);
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index bfdba1caf59d..57e5a143b0db 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -64,12 +64,12 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
}

-int kvm_arch_hardware_setup(void)
+int kvm_arch_hardware_setup(void *opaque)
{
return 0;
}

-int kvm_arch_check_processor_compat(void)
+int kvm_arch_check_processor_compat(void *opaque)
{
return 0;
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 28eae681859f..f4e79c891c25 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4645,14 +4645,22 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
return &kvm_running_vcpu;
}

-static void check_processor_compat(void *rtn)
+struct kvm_cpu_compat_check {
+ void *opaque;
+ int *ret;
+};
+
+static void check_processor_compat(void *data)
{
- *(int *)rtn = kvm_arch_check_processor_compat();
+ struct kvm_cpu_compat_check *c = data;
+
+ *c->ret = kvm_arch_check_processor_compat(c->opaque);
}

int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
struct module *module)
{
+ struct kvm_cpu_compat_check c;
int r;
int cpu;

@@ -4676,12 +4684,14 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
goto out_free_0;
}

- r = kvm_arch_hardware_setup();
+ r = kvm_arch_hardware_setup(opaque);
if (r < 0)
goto out_free_1;

+ c.ret = &r;
+ c.opaque = opaque;
for_each_online_cpu(cpu) {
- smp_call_function_single(cpu, check_processor_compat, &r, 1);
+ smp_call_function_single(cpu, check_processor_compat, &c, 1);
if (r < 0)
goto out_free_2;
}
--
2.24.1

2020-03-21 20:28:56

by Sean Christopherson

Subject: [PATCH v3 3/9] KVM: VMX: Move hardware_setup() definition below vmx_x86_ops

Move VMX's hardware_setup() below its vmx_x86_ops definition so that a
future patch can refactor hardware_setup() to modify vmx_x86_ops
directly instead of indirectly modifying the ops via the global
kvm_x86_ops.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 346 ++++++++++++++++++++---------------------
1 file changed, 173 insertions(+), 173 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ffcdcc86f5b7..82dab775d520 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7652,179 +7652,6 @@ static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
return to_vmx(vcpu)->nested.vmxon;
}

-static __init int hardware_setup(void)
-{
- unsigned long host_bndcfgs;
- struct desc_ptr dt;
- int r, i, ept_lpage_level;
-
- store_idt(&dt);
- host_idt_base = dt.address;
-
- for (i = 0; i < ARRAY_SIZE(vmx_msr_index); ++i)
- kvm_define_shared_msr(i, vmx_msr_index[i]);
-
- if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
- return -EIO;
-
- if (boot_cpu_has(X86_FEATURE_NX))
- kvm_enable_efer_bits(EFER_NX);
-
- if (boot_cpu_has(X86_FEATURE_MPX)) {
- rdmsrl(MSR_IA32_BNDCFGS, host_bndcfgs);
- WARN_ONCE(host_bndcfgs, "KVM: BNDCFGS in host will be lost");
- }
-
- if (!cpu_has_vmx_mpx())
- supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
- XFEATURE_MASK_BNDCSR);
-
- if (!cpu_has_vmx_vpid() || !cpu_has_vmx_invvpid() ||
- !(cpu_has_vmx_invvpid_single() || cpu_has_vmx_invvpid_global()))
- enable_vpid = 0;
-
- if (!cpu_has_vmx_ept() ||
- !cpu_has_vmx_ept_4levels() ||
- !cpu_has_vmx_ept_mt_wb() ||
- !cpu_has_vmx_invept_global())
- enable_ept = 0;
-
- if (!cpu_has_vmx_ept_ad_bits() || !enable_ept)
- enable_ept_ad_bits = 0;
-
- if (!cpu_has_vmx_unrestricted_guest() || !enable_ept)
- enable_unrestricted_guest = 0;
-
- if (!cpu_has_vmx_flexpriority())
- flexpriority_enabled = 0;
-
- if (!cpu_has_virtual_nmis())
- enable_vnmi = 0;
-
- /*
- * set_apic_access_page_addr() is used to reload apic access
- * page upon invalidation. No need to do anything if not
- * using the APIC_ACCESS_ADDR VMCS field.
- */
- if (!flexpriority_enabled)
- kvm_x86_ops->set_apic_access_page_addr = NULL;
-
- if (!cpu_has_vmx_tpr_shadow())
- kvm_x86_ops->update_cr8_intercept = NULL;
-
-#if IS_ENABLED(CONFIG_HYPERV)
- if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
- && enable_ept) {
- kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
- kvm_x86_ops->tlb_remote_flush_with_range =
- hv_remote_flush_tlb_with_range;
- }
-#endif
-
- if (!cpu_has_vmx_ple()) {
- ple_gap = 0;
- ple_window = 0;
- ple_window_grow = 0;
- ple_window_max = 0;
- ple_window_shrink = 0;
- }
-
- if (!cpu_has_vmx_apicv()) {
- enable_apicv = 0;
- kvm_x86_ops->sync_pir_to_irr = NULL;
- }
-
- if (cpu_has_vmx_tsc_scaling()) {
- kvm_has_tsc_control = true;
- kvm_max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
- kvm_tsc_scaling_ratio_frac_bits = 48;
- }
-
- set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
-
- if (enable_ept)
- vmx_enable_tdp();
-
- if (!enable_ept)
- ept_lpage_level = 0;
- else if (cpu_has_vmx_ept_1g_page())
- ept_lpage_level = PT_PDPE_LEVEL;
- else if (cpu_has_vmx_ept_2m_page())
- ept_lpage_level = PT_DIRECTORY_LEVEL;
- else
- ept_lpage_level = PT_PAGE_TABLE_LEVEL;
- kvm_configure_mmu(enable_ept, ept_lpage_level);
-
- /*
- * Only enable PML when hardware supports PML feature, and both EPT
- * and EPT A/D bit features are enabled -- PML depends on them to work.
- */
- if (!enable_ept || !enable_ept_ad_bits || !cpu_has_vmx_pml())
- enable_pml = 0;
-
- if (!enable_pml) {
- kvm_x86_ops->slot_enable_log_dirty = NULL;
- kvm_x86_ops->slot_disable_log_dirty = NULL;
- kvm_x86_ops->flush_log_dirty = NULL;
- kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
- }
-
- if (!cpu_has_vmx_preemption_timer())
- enable_preemption_timer = false;
-
- if (enable_preemption_timer) {
- u64 use_timer_freq = 5000ULL * 1000 * 1000;
- u64 vmx_msr;
-
- rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
- cpu_preemption_timer_multi =
- vmx_msr & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
-
- if (tsc_khz)
- use_timer_freq = (u64)tsc_khz * 1000;
- use_timer_freq >>= cpu_preemption_timer_multi;
-
- /*
- * KVM "disables" the preemption timer by setting it to its max
- * value. Don't use the timer if it might cause spurious exits
- * at a rate faster than 0.1 Hz (of uninterrupted guest time).
- */
- if (use_timer_freq > 0xffffffffu / 10)
- enable_preemption_timer = false;
- }
-
- if (!enable_preemption_timer) {
- kvm_x86_ops->set_hv_timer = NULL;
- kvm_x86_ops->cancel_hv_timer = NULL;
- kvm_x86_ops->request_immediate_exit = __kvm_request_immediate_exit;
- }
-
- kvm_set_posted_intr_wakeup_handler(wakeup_handler);
-
- kvm_mce_cap_supported |= MCG_LMCE_P;
-
- if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
- return -EINVAL;
- if (!enable_ept || !cpu_has_vmx_intel_pt())
- pt_mode = PT_MODE_SYSTEM;
-
- if (nested) {
- nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
- vmx_capability.ept);
-
- r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
- if (r)
- return r;
- }
-
- vmx_set_cpu_caps();
-
- r = alloc_kvm_area();
- if (r)
- nested_vmx_hardware_unsetup();
- return r;
-}
-
static __exit void hardware_unsetup(void)
{
if (nested)
@@ -7978,6 +7805,179 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
};

+static __init int hardware_setup(void)
+{
+ unsigned long host_bndcfgs;
+ struct desc_ptr dt;
+ int r, i, ept_lpage_level;
+
+ store_idt(&dt);
+ host_idt_base = dt.address;
+
+ for (i = 0; i < ARRAY_SIZE(vmx_msr_index); ++i)
+ kvm_define_shared_msr(i, vmx_msr_index[i]);
+
+ if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
+ return -EIO;
+
+ if (boot_cpu_has(X86_FEATURE_NX))
+ kvm_enable_efer_bits(EFER_NX);
+
+ if (boot_cpu_has(X86_FEATURE_MPX)) {
+ rdmsrl(MSR_IA32_BNDCFGS, host_bndcfgs);
+ WARN_ONCE(host_bndcfgs, "KVM: BNDCFGS in host will be lost");
+ }
+
+ if (!cpu_has_vmx_mpx())
+ supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
+ XFEATURE_MASK_BNDCSR);
+
+ if (!cpu_has_vmx_vpid() || !cpu_has_vmx_invvpid() ||
+ !(cpu_has_vmx_invvpid_single() || cpu_has_vmx_invvpid_global()))
+ enable_vpid = 0;
+
+ if (!cpu_has_vmx_ept() ||
+ !cpu_has_vmx_ept_4levels() ||
+ !cpu_has_vmx_ept_mt_wb() ||
+ !cpu_has_vmx_invept_global())
+ enable_ept = 0;
+
+ if (!cpu_has_vmx_ept_ad_bits() || !enable_ept)
+ enable_ept_ad_bits = 0;
+
+ if (!cpu_has_vmx_unrestricted_guest() || !enable_ept)
+ enable_unrestricted_guest = 0;
+
+ if (!cpu_has_vmx_flexpriority())
+ flexpriority_enabled = 0;
+
+ if (!cpu_has_virtual_nmis())
+ enable_vnmi = 0;
+
+ /*
+ * set_apic_access_page_addr() is used to reload apic access
+ * page upon invalidation. No need to do anything if not
+ * using the APIC_ACCESS_ADDR VMCS field.
+ */
+ if (!flexpriority_enabled)
+ kvm_x86_ops->set_apic_access_page_addr = NULL;
+
+ if (!cpu_has_vmx_tpr_shadow())
+ kvm_x86_ops->update_cr8_intercept = NULL;
+
+#if IS_ENABLED(CONFIG_HYPERV)
+ if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
+ && enable_ept) {
+ kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
+ kvm_x86_ops->tlb_remote_flush_with_range =
+ hv_remote_flush_tlb_with_range;
+ }
+#endif
+
+ if (!cpu_has_vmx_ple()) {
+ ple_gap = 0;
+ ple_window = 0;
+ ple_window_grow = 0;
+ ple_window_max = 0;
+ ple_window_shrink = 0;
+ }
+
+ if (!cpu_has_vmx_apicv()) {
+ enable_apicv = 0;
+ kvm_x86_ops->sync_pir_to_irr = NULL;
+ }
+
+ if (cpu_has_vmx_tsc_scaling()) {
+ kvm_has_tsc_control = true;
+ kvm_max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
+ kvm_tsc_scaling_ratio_frac_bits = 48;
+ }
+
+ set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
+
+ if (enable_ept)
+ vmx_enable_tdp();
+
+ if (!enable_ept)
+ ept_lpage_level = 0;
+ else if (cpu_has_vmx_ept_1g_page())
+ ept_lpage_level = PT_PDPE_LEVEL;
+ else if (cpu_has_vmx_ept_2m_page())
+ ept_lpage_level = PT_DIRECTORY_LEVEL;
+ else
+ ept_lpage_level = PT_PAGE_TABLE_LEVEL;
+ kvm_configure_mmu(enable_ept, ept_lpage_level);
+
+ /*
+ * Only enable PML when hardware supports PML feature, and both EPT
+ * and EPT A/D bit features are enabled -- PML depends on them to work.
+ */
+ if (!enable_ept || !enable_ept_ad_bits || !cpu_has_vmx_pml())
+ enable_pml = 0;
+
+ if (!enable_pml) {
+ kvm_x86_ops->slot_enable_log_dirty = NULL;
+ kvm_x86_ops->slot_disable_log_dirty = NULL;
+ kvm_x86_ops->flush_log_dirty = NULL;
+ kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
+ }
+
+ if (!cpu_has_vmx_preemption_timer())
+ enable_preemption_timer = false;
+
+ if (enable_preemption_timer) {
+ u64 use_timer_freq = 5000ULL * 1000 * 1000;
+ u64 vmx_msr;
+
+ rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
+ cpu_preemption_timer_multi =
+ vmx_msr & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
+
+ if (tsc_khz)
+ use_timer_freq = (u64)tsc_khz * 1000;
+ use_timer_freq >>= cpu_preemption_timer_multi;
+
+ /*
+ * KVM "disables" the preemption timer by setting it to its max
+ * value. Don't use the timer if it might cause spurious exits
+ * at a rate faster than 0.1 Hz (of uninterrupted guest time).
+ */
+ if (use_timer_freq > 0xffffffffu / 10)
+ enable_preemption_timer = false;
+ }
+
+ if (!enable_preemption_timer) {
+ kvm_x86_ops->set_hv_timer = NULL;
+ kvm_x86_ops->cancel_hv_timer = NULL;
+ kvm_x86_ops->request_immediate_exit = __kvm_request_immediate_exit;
+ }
+
+ kvm_set_posted_intr_wakeup_handler(wakeup_handler);
+
+ kvm_mce_cap_supported |= MCG_LMCE_P;
+
+ if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
+ return -EINVAL;
+ if (!enable_ept || !cpu_has_vmx_intel_pt())
+ pt_mode = PT_MODE_SYSTEM;
+
+ if (nested) {
+ nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
+ vmx_capability.ept);
+
+ r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
+ if (r)
+ return r;
+ }
+
+ vmx_set_cpu_caps();
+
+ r = alloc_kvm_area();
+ if (r)
+ nested_vmx_hardware_unsetup();
+ return r;
+}
+
static struct kvm_x86_init_ops vmx_init_ops __initdata = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
--
2.24.1

2020-03-23 12:12:53

by Vitaly Kuznetsov

Subject: Re: [PATCH v3 2/9] KVM: x86: Move init-only kvm_x86_ops to separate struct

Sean Christopherson <[email protected]> writes:

> Move the kvm_x86_ops functions that are used only within the scope of
> kvm_init() into a separate struct, kvm_x86_init_ops. In addition to
> identifying the init-only functions without restorting to code comments,
> this also sets the stage for waiting until after ->hardware_setup() to
> set kvm_x86_ops. Setting kvm_x86_ops after ->hardware_setup() is
> desirable as many of the hooks are not usable until ->hardware_setup()
> completes.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 13 +++++++++----
> arch/x86/kvm/svm.c | 15 ++++++++++-----
> arch/x86/kvm/vmx/vmx.c | 16 +++++++++++-----
> arch/x86/kvm/x86.c | 10 ++++++----
> 4 files changed, 36 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9a183e9d4cb1..f4c5b49299ff 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1054,12 +1054,8 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
> }
>
> struct kvm_x86_ops {
> - int (*cpu_has_kvm_support)(void); /* __init */
> - int (*disabled_by_bios)(void); /* __init */
> int (*hardware_enable)(void);
> void (*hardware_disable)(void);
> - int (*check_processor_compatibility)(void);/* __init */
> - int (*hardware_setup)(void); /* __init */
> void (*hardware_unsetup)(void); /* __exit */
> bool (*cpu_has_accelerated_tpr)(void);
> bool (*has_emulated_msr)(int index);
> @@ -1260,6 +1256,15 @@ struct kvm_x86_ops {
> int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> };
>
> +struct kvm_x86_init_ops {
> + int (*cpu_has_kvm_support)(void);
> + int (*disabled_by_bios)(void);
> + int (*check_processor_compatibility)(void);
> + int (*hardware_setup)(void);
> +
> + struct kvm_x86_ops *runtime_ops;
> +};
> +
> struct kvm_arch_async_pf {
> u32 token;
> gfn_t gfn;
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 2125c6ae5951..33e67c3389c2 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7351,11 +7351,7 @@ static void svm_pre_update_apicv_exec_ctrl(struct kvm *kvm, bool activate)
> }
>
> static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> - .cpu_has_kvm_support = has_svm,
> - .disabled_by_bios = is_disabled,
> - .hardware_setup = svm_hardware_setup,
> .hardware_unsetup = svm_hardware_teardown,
> - .check_processor_compatibility = svm_check_processor_compat,
> .hardware_enable = svm_hardware_enable,
> .hardware_disable = svm_hardware_disable,
> .cpu_has_accelerated_tpr = svm_cpu_has_accelerated_tpr,
> @@ -7480,9 +7476,18 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> .check_nested_events = svm_check_nested_events,
> };
>
> +static struct kvm_x86_init_ops svm_init_ops __initdata = {
> + .cpu_has_kvm_support = has_svm,
> + .disabled_by_bios = is_disabled,
> + .hardware_setup = svm_hardware_setup,
> + .check_processor_compatibility = svm_check_processor_compat,
> +
> + .runtime_ops = &svm_x86_ops,
> +};

Unrelated to your patch but I think we can make the naming of some of
these functions more consistent across SVM/VMX, in particular I'd suggest

has_svm() -> cpu_has_svm_support()
is_disabled -> svm_disabled_by_bios()
...
(see below for VMX)

> +
> static int __init svm_init(void)
> {
> - return kvm_init(&svm_x86_ops, sizeof(struct vcpu_svm),
> + return kvm_init(&svm_init_ops, sizeof(struct vcpu_svm),
> __alignof__(struct vcpu_svm), THIS_MODULE);
> }
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 07299a957d4a..ffcdcc86f5b7 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7842,11 +7842,8 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
> }
>
> static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> - .cpu_has_kvm_support = cpu_has_kvm_support,
> - .disabled_by_bios = vmx_disabled_by_bios,
> - .hardware_setup = hardware_setup,
> .hardware_unsetup = hardware_unsetup,
> - .check_processor_compatibility = vmx_check_processor_compat,
> +
> .hardware_enable = hardware_enable,
> .hardware_disable = hardware_disable,
> .cpu_has_accelerated_tpr = report_flexpriority,
> @@ -7981,6 +7978,15 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> };
>
> +static struct kvm_x86_init_ops vmx_init_ops __initdata = {
> + .cpu_has_kvm_support = cpu_has_kvm_support,
> + .disabled_by_bios = vmx_disabled_by_bios,
> + .check_processor_compatibility = vmx_check_processor_compat,
> + .hardware_setup = hardware_setup,

cpu_has_kvm_support() -> cpu_has_vmx_support()
hardware_setup() -> vmx_hardware_setup()

> +
> + .runtime_ops = &vmx_x86_ops,
> +};
> +
> static void vmx_cleanup_l1d_flush(void)
> {
> if (vmx_l1d_flush_pages) {
> @@ -8065,7 +8071,7 @@ static int __init vmx_init(void)
> }
> #endif
>
> - r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx),
> + r = kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx),
> __alignof__(struct vcpu_vmx), THIS_MODULE);
> if (r)
> return r;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0f08e1b4e762..20f989d1bba8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7298,8 +7298,8 @@ static struct notifier_block pvclock_gtod_notifier = {
>
> int kvm_arch_init(void *opaque)
> {
> + struct kvm_x86_init_ops *ops = opaque;
> int r;
> - struct kvm_x86_ops *ops = opaque;
>
> if (kvm_x86_ops) {
> printk(KERN_ERR "kvm: already loaded the other module\n");
> @@ -7354,7 +7354,7 @@ int kvm_arch_init(void *opaque)
> if (r)
> goto out_free_percpu;
>
> - kvm_x86_ops = ops;
> + kvm_x86_ops = ops->runtime_ops;
>
> kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
> PT_DIRTY_MASK, PT64_NX_MASK, 0,
> @@ -9623,6 +9623,7 @@ void kvm_arch_hardware_disable(void)
>
> int kvm_arch_hardware_setup(void *opaque)
> {
> + struct kvm_x86_init_ops *ops = opaque;
> int r;
>
> rdmsrl_safe(MSR_EFER, &host_efer);
> @@ -9630,7 +9631,7 @@ int kvm_arch_hardware_setup(void *opaque)
> if (boot_cpu_has(X86_FEATURE_XSAVES))
> rdmsrl(MSR_IA32_XSS, host_xss);
>
> - r = kvm_x86_ops->hardware_setup();
> + r = ops->hardware_setup();
> if (r != 0)
> return r;
>
> @@ -9665,13 +9666,14 @@ void kvm_arch_hardware_unsetup(void)
> int kvm_arch_check_processor_compat(void *opaque)
> {
> struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
> + struct kvm_x86_init_ops *ops = opaque;
>
> WARN_ON(!irqs_disabled());
>
> if (kvm_host_cr4_reserved_bits(c) != cr4_reserved_bits)
> return -EIO;
>
> - return kvm_x86_ops->check_processor_compatibility();
> + return ops->check_processor_compatibility();
> }
>
> bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)

The patch itself looks good,

Reviewed-by: Vitaly Kuznetsov <[email protected]>

--
Vitaly

2020-03-23 12:13:55

by Vitaly Kuznetsov

Subject: Re: [PATCH v3 3/9] KVM: VMX: Move hardware_setup() definition below vmx_x86_ops

Sean Christopherson <[email protected]> writes:

> Move VMX's hardware_setup() below its vmx_x86_ops definition so that a
> future patch can refactor hardware_setup() to modify vmx_x86_ops
> directly instead of indirectly modifying the ops via the global
> kvm_x86_ops.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 346 ++++++++++++++++++++---------------------
> 1 file changed, 173 insertions(+), 173 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ffcdcc86f5b7..82dab775d520 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7652,179 +7652,6 @@ static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
> return to_vmx(vcpu)->nested.vmxon;
> }
>
> -static __init int hardware_setup(void)
> -{
> - unsigned long host_bndcfgs;
> - struct desc_ptr dt;
> - int r, i, ept_lpage_level;
> -
> - store_idt(&dt);
> - host_idt_base = dt.address;
> -
> - for (i = 0; i < ARRAY_SIZE(vmx_msr_index); ++i)
> - kvm_define_shared_msr(i, vmx_msr_index[i]);
> -
> - if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
> - return -EIO;
> -
> - if (boot_cpu_has(X86_FEATURE_NX))
> - kvm_enable_efer_bits(EFER_NX);
> -
> - if (boot_cpu_has(X86_FEATURE_MPX)) {
> - rdmsrl(MSR_IA32_BNDCFGS, host_bndcfgs);
> - WARN_ONCE(host_bndcfgs, "KVM: BNDCFGS in host will be lost");
> - }
> -
> - if (!cpu_has_vmx_mpx())
> - supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
> - XFEATURE_MASK_BNDCSR);
> -
> - if (!cpu_has_vmx_vpid() || !cpu_has_vmx_invvpid() ||
> - !(cpu_has_vmx_invvpid_single() || cpu_has_vmx_invvpid_global()))
> - enable_vpid = 0;
> -
> - if (!cpu_has_vmx_ept() ||
> - !cpu_has_vmx_ept_4levels() ||
> - !cpu_has_vmx_ept_mt_wb() ||
> - !cpu_has_vmx_invept_global())
> - enable_ept = 0;
> -
> - if (!cpu_has_vmx_ept_ad_bits() || !enable_ept)
> - enable_ept_ad_bits = 0;
> -
> - if (!cpu_has_vmx_unrestricted_guest() || !enable_ept)
> - enable_unrestricted_guest = 0;
> -
> - if (!cpu_has_vmx_flexpriority())
> - flexpriority_enabled = 0;
> -
> - if (!cpu_has_virtual_nmis())
> - enable_vnmi = 0;
> -
> - /*
> - * set_apic_access_page_addr() is used to reload apic access
> - * page upon invalidation. No need to do anything if not
> - * using the APIC_ACCESS_ADDR VMCS field.
> - */
> - if (!flexpriority_enabled)
> - kvm_x86_ops->set_apic_access_page_addr = NULL;
> -
> - if (!cpu_has_vmx_tpr_shadow())
> - kvm_x86_ops->update_cr8_intercept = NULL;
> -
> -#if IS_ENABLED(CONFIG_HYPERV)
> - if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
> - && enable_ept) {
> - kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
> - kvm_x86_ops->tlb_remote_flush_with_range =
> - hv_remote_flush_tlb_with_range;
> - }
> -#endif
> -
> - if (!cpu_has_vmx_ple()) {
> - ple_gap = 0;
> - ple_window = 0;
> - ple_window_grow = 0;
> - ple_window_max = 0;
> - ple_window_shrink = 0;
> - }
> -
> - if (!cpu_has_vmx_apicv()) {
> - enable_apicv = 0;
> - kvm_x86_ops->sync_pir_to_irr = NULL;
> - }
> -
> - if (cpu_has_vmx_tsc_scaling()) {
> - kvm_has_tsc_control = true;
> - kvm_max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
> - kvm_tsc_scaling_ratio_frac_bits = 48;
> - }
> -
> - set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
> -
> - if (enable_ept)
> - vmx_enable_tdp();
> -
> - if (!enable_ept)
> - ept_lpage_level = 0;
> - else if (cpu_has_vmx_ept_1g_page())
> - ept_lpage_level = PT_PDPE_LEVEL;
> - else if (cpu_has_vmx_ept_2m_page())
> - ept_lpage_level = PT_DIRECTORY_LEVEL;
> - else
> - ept_lpage_level = PT_PAGE_TABLE_LEVEL;
> - kvm_configure_mmu(enable_ept, ept_lpage_level);
> -
> - /*
> - * Only enable PML when hardware supports PML feature, and both EPT
> - * and EPT A/D bit features are enabled -- PML depends on them to work.
> - */
> - if (!enable_ept || !enable_ept_ad_bits || !cpu_has_vmx_pml())
> - enable_pml = 0;
> -
> - if (!enable_pml) {
> - kvm_x86_ops->slot_enable_log_dirty = NULL;
> - kvm_x86_ops->slot_disable_log_dirty = NULL;
> - kvm_x86_ops->flush_log_dirty = NULL;
> - kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
> - }
> -
> - if (!cpu_has_vmx_preemption_timer())
> - enable_preemption_timer = false;
> -
> - if (enable_preemption_timer) {
> - u64 use_timer_freq = 5000ULL * 1000 * 1000;
> - u64 vmx_msr;
> -
> - rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
> - cpu_preemption_timer_multi =
> - vmx_msr & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
> -
> - if (tsc_khz)
> - use_timer_freq = (u64)tsc_khz * 1000;
> - use_timer_freq >>= cpu_preemption_timer_multi;
> -
> - /*
> - * KVM "disables" the preemption timer by setting it to its max
> - * value. Don't use the timer if it might cause spurious exits
> - * at a rate faster than 0.1 Hz (of uninterrupted guest time).
> - */
> - if (use_timer_freq > 0xffffffffu / 10)
> - enable_preemption_timer = false;
> - }
> -
> - if (!enable_preemption_timer) {
> - kvm_x86_ops->set_hv_timer = NULL;
> - kvm_x86_ops->cancel_hv_timer = NULL;
> - kvm_x86_ops->request_immediate_exit = __kvm_request_immediate_exit;
> - }
> -
> - kvm_set_posted_intr_wakeup_handler(wakeup_handler);
> -
> - kvm_mce_cap_supported |= MCG_LMCE_P;
> -
> - if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
> - return -EINVAL;
> - if (!enable_ept || !cpu_has_vmx_intel_pt())
> - pt_mode = PT_MODE_SYSTEM;
> -
> - if (nested) {
> - nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
> - vmx_capability.ept);
> -
> - r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
> - if (r)
> - return r;
> - }
> -
> - vmx_set_cpu_caps();
> -
> - r = alloc_kvm_area();
> - if (r)
> - nested_vmx_hardware_unsetup();
> - return r;
> -}
> -
> static __exit void hardware_unsetup(void)
> {
> if (nested)
> @@ -7978,6 +7805,179 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> };
>
> +static __init int hardware_setup(void)
> +{
> + unsigned long host_bndcfgs;
> + struct desc_ptr dt;
> + int r, i, ept_lpage_level;
> +
> + store_idt(&dt);
> + host_idt_base = dt.address;
> +
> + for (i = 0; i < ARRAY_SIZE(vmx_msr_index); ++i)
> + kvm_define_shared_msr(i, vmx_msr_index[i]);
> +
> + if (setup_vmcs_config(&vmcs_config, &vmx_capability) < 0)
> + return -EIO;
> +
> + if (boot_cpu_has(X86_FEATURE_NX))
> + kvm_enable_efer_bits(EFER_NX);
> +
> + if (boot_cpu_has(X86_FEATURE_MPX)) {
> + rdmsrl(MSR_IA32_BNDCFGS, host_bndcfgs);
> + WARN_ONCE(host_bndcfgs, "KVM: BNDCFGS in host will be lost");
> + }
> +
> + if (!cpu_has_vmx_mpx())
> + supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
> + XFEATURE_MASK_BNDCSR);
> +
> + if (!cpu_has_vmx_vpid() || !cpu_has_vmx_invvpid() ||
> + !(cpu_has_vmx_invvpid_single() || cpu_has_vmx_invvpid_global()))
> + enable_vpid = 0;
> +
> + if (!cpu_has_vmx_ept() ||
> + !cpu_has_vmx_ept_4levels() ||
> + !cpu_has_vmx_ept_mt_wb() ||
> + !cpu_has_vmx_invept_global())
> + enable_ept = 0;
> +
> + if (!cpu_has_vmx_ept_ad_bits() || !enable_ept)
> + enable_ept_ad_bits = 0;
> +
> + if (!cpu_has_vmx_unrestricted_guest() || !enable_ept)
> + enable_unrestricted_guest = 0;
> +
> + if (!cpu_has_vmx_flexpriority())
> + flexpriority_enabled = 0;
> +
> + if (!cpu_has_virtual_nmis())
> + enable_vnmi = 0;
> +
> + /*
> + * set_apic_access_page_addr() is used to reload apic access
> + * page upon invalidation. No need to do anything if not
> + * using the APIC_ACCESS_ADDR VMCS field.
> + */
> + if (!flexpriority_enabled)
> + kvm_x86_ops->set_apic_access_page_addr = NULL;
> +
> + if (!cpu_has_vmx_tpr_shadow())
> + kvm_x86_ops->update_cr8_intercept = NULL;
> +
> +#if IS_ENABLED(CONFIG_HYPERV)
> + if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
> + && enable_ept) {
> + kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
> + kvm_x86_ops->tlb_remote_flush_with_range =
> + hv_remote_flush_tlb_with_range;
> + }
> +#endif
> +
> + if (!cpu_has_vmx_ple()) {
> + ple_gap = 0;
> + ple_window = 0;
> + ple_window_grow = 0;
> + ple_window_max = 0;
> + ple_window_shrink = 0;
> + }
> +
> + if (!cpu_has_vmx_apicv()) {
> + enable_apicv = 0;
> + kvm_x86_ops->sync_pir_to_irr = NULL;
> + }
> +
> + if (cpu_has_vmx_tsc_scaling()) {
> + kvm_has_tsc_control = true;
> + kvm_max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
> + kvm_tsc_scaling_ratio_frac_bits = 48;
> + }
> +
> + set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
> +
> + if (enable_ept)
> + vmx_enable_tdp();
> +
> + if (!enable_ept)
> + ept_lpage_level = 0;
> + else if (cpu_has_vmx_ept_1g_page())
> + ept_lpage_level = PT_PDPE_LEVEL;
> + else if (cpu_has_vmx_ept_2m_page())
> + ept_lpage_level = PT_DIRECTORY_LEVEL;
> + else
> + ept_lpage_level = PT_PAGE_TABLE_LEVEL;
> + kvm_configure_mmu(enable_ept, ept_lpage_level);
> +
> + /*
> + * Only enable PML when hardware supports PML feature, and both EPT
> + * and EPT A/D bit features are enabled -- PML depends on them to work.
> + */
> + if (!enable_ept || !enable_ept_ad_bits || !cpu_has_vmx_pml())
> + enable_pml = 0;
> +
> + if (!enable_pml) {
> + kvm_x86_ops->slot_enable_log_dirty = NULL;
> + kvm_x86_ops->slot_disable_log_dirty = NULL;
> + kvm_x86_ops->flush_log_dirty = NULL;
> + kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
> + }
> +
> + if (!cpu_has_vmx_preemption_timer())
> + enable_preemption_timer = false;
> +
> + if (enable_preemption_timer) {
> + u64 use_timer_freq = 5000ULL * 1000 * 1000;
> + u64 vmx_msr;
> +
> + rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
> + cpu_preemption_timer_multi =
> + vmx_msr & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
> +
> + if (tsc_khz)
> + use_timer_freq = (u64)tsc_khz * 1000;
> + use_timer_freq >>= cpu_preemption_timer_multi;
> +
> + /*
> + * KVM "disables" the preemption timer by setting it to its max
> + * value. Don't use the timer if it might cause spurious exits
> + * at a rate faster than 0.1 Hz (of uninterrupted guest time).
> + */
> + if (use_timer_freq > 0xffffffffu / 10)
> + enable_preemption_timer = false;
> + }
> +
> + if (!enable_preemption_timer) {
> + kvm_x86_ops->set_hv_timer = NULL;
> + kvm_x86_ops->cancel_hv_timer = NULL;
> + kvm_x86_ops->request_immediate_exit = __kvm_request_immediate_exit;
> + }
> +
> + kvm_set_posted_intr_wakeup_handler(wakeup_handler);
> +
> + kvm_mce_cap_supported |= MCG_LMCE_P;
> +
> + if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
> + return -EINVAL;
> + if (!enable_ept || !cpu_has_vmx_intel_pt())
> + pt_mode = PT_MODE_SYSTEM;
> +
> + if (nested) {
> + nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
> + vmx_capability.ept);
> +
> + r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
> + if (r)
> + return r;
> + }
> +
> + vmx_set_cpu_caps();
> +
> + r = alloc_kvm_area();
> + if (r)
> + nested_vmx_hardware_unsetup();
> + return r;
> +}
> +
> static struct kvm_x86_init_ops vmx_init_ops __initdata = {
> .cpu_has_kvm_support = cpu_has_kvm_support,
> .disabled_by_bios = vmx_disabled_by_bios,

Reviewed-by: Vitaly Kuznetsov <[email protected]>

--
Vitaly

2020-03-23 12:29:32

by Vitaly Kuznetsov

Subject: Re: [PATCH v3 4/9] KVM: VMX: Configure runtime hooks using vmx_x86_ops

Sean Christopherson <[email protected]> writes:

> Configure VMX's runtime hooks by modifying vmx_x86_ops directly instead
> of using the global kvm_x86_ops. This sets the stage for waiting until
> after ->hardware_setup() to set kvm_x86_ops with the vendor's
> implementation.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/nested.c | 15 ++++++++-------
> arch/x86/kvm/vmx/nested.h | 3 ++-
> arch/x86/kvm/vmx/vmx.c | 27 ++++++++++++++-------------
> 3 files changed, 24 insertions(+), 21 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 4ff859c99946..87fea22c3799 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -6241,7 +6241,8 @@ void nested_vmx_hardware_unsetup(void)
> }
> }
>
> -__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
> +__init int nested_vmx_hardware_setup(struct kvm_x86_ops *ops,
> + int (*exit_handlers[])(struct kvm_vcpu *))
> {
> int i;
>
> @@ -6277,12 +6278,12 @@ __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
> exit_handlers[EXIT_REASON_INVVPID] = handle_invvpid;
> exit_handlers[EXIT_REASON_VMFUNC] = handle_vmfunc;
>
> - kvm_x86_ops->check_nested_events = vmx_check_nested_events;
> - kvm_x86_ops->get_nested_state = vmx_get_nested_state;
> - kvm_x86_ops->set_nested_state = vmx_set_nested_state;
> - kvm_x86_ops->get_vmcs12_pages = nested_get_vmcs12_pages;
> - kvm_x86_ops->nested_enable_evmcs = nested_enable_evmcs;
> - kvm_x86_ops->nested_get_evmcs_version = nested_get_evmcs_version;
> + ops->check_nested_events = vmx_check_nested_events;
> + ops->get_nested_state = vmx_get_nested_state;
> + ops->set_nested_state = vmx_set_nested_state;
> + ops->get_vmcs12_pages = nested_get_vmcs12_pages;
> + ops->nested_enable_evmcs = nested_enable_evmcs;
> + ops->nested_get_evmcs_version = nested_get_evmcs_version;


A lazy guy like me would appreciate an 'ops' -> 'vmx_x86_ops' rename, as it
would make 'git grep vmx_x86_ops' output more complete.

>
> return 0;
> }
> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> index f70968b76d33..ac56aefa49e3 100644
> --- a/arch/x86/kvm/vmx/nested.h
> +++ b/arch/x86/kvm/vmx/nested.h
> @@ -19,7 +19,8 @@ enum nvmx_vmentry_status {
> void vmx_leave_nested(struct kvm_vcpu *vcpu);
> void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps);
> void nested_vmx_hardware_unsetup(void);
> -__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
> +__init int nested_vmx_hardware_setup(struct kvm_x86_ops *ops,
> + int (*exit_handlers[])(struct kvm_vcpu *));
> void nested_vmx_set_vmcs_shadowing_bitmap(void);
> void nested_vmx_free_vcpu(struct kvm_vcpu *vcpu);
> enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 82dab775d520..cfa9093bdc06 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7860,16 +7860,16 @@ static __init int hardware_setup(void)
> * using the APIC_ACCESS_ADDR VMCS field.
> */
> if (!flexpriority_enabled)
> - kvm_x86_ops->set_apic_access_page_addr = NULL;
> + vmx_x86_ops.set_apic_access_page_addr = NULL;
>
> if (!cpu_has_vmx_tpr_shadow())
> - kvm_x86_ops->update_cr8_intercept = NULL;
> + vmx_x86_ops.update_cr8_intercept = NULL;
>
> #if IS_ENABLED(CONFIG_HYPERV)
> if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
> && enable_ept) {
> - kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
> - kvm_x86_ops->tlb_remote_flush_with_range =
> + vmx_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
> + vmx_x86_ops.tlb_remote_flush_with_range =
> hv_remote_flush_tlb_with_range;
> }
> #endif
> @@ -7884,7 +7884,7 @@ static __init int hardware_setup(void)
>
> if (!cpu_has_vmx_apicv()) {
> enable_apicv = 0;
> - kvm_x86_ops->sync_pir_to_irr = NULL;
> + vmx_x86_ops.sync_pir_to_irr = NULL;
> }
>
> if (cpu_has_vmx_tsc_scaling()) {
> @@ -7916,10 +7916,10 @@ static __init int hardware_setup(void)
> enable_pml = 0;
>
> if (!enable_pml) {
> - kvm_x86_ops->slot_enable_log_dirty = NULL;
> - kvm_x86_ops->slot_disable_log_dirty = NULL;
> - kvm_x86_ops->flush_log_dirty = NULL;
> - kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
> + vmx_x86_ops.slot_enable_log_dirty = NULL;
> + vmx_x86_ops.slot_disable_log_dirty = NULL;
> + vmx_x86_ops.flush_log_dirty = NULL;
> + vmx_x86_ops.enable_log_dirty_pt_masked = NULL;
> }
>
> if (!cpu_has_vmx_preemption_timer())
> @@ -7947,9 +7947,9 @@ static __init int hardware_setup(void)
> }
>
> if (!enable_preemption_timer) {
> - kvm_x86_ops->set_hv_timer = NULL;
> - kvm_x86_ops->cancel_hv_timer = NULL;
> - kvm_x86_ops->request_immediate_exit = __kvm_request_immediate_exit;
> + vmx_x86_ops.set_hv_timer = NULL;
> + vmx_x86_ops.cancel_hv_timer = NULL;
> + vmx_x86_ops.request_immediate_exit = __kvm_request_immediate_exit;
> }
>
> kvm_set_posted_intr_wakeup_handler(wakeup_handler);
> @@ -7965,7 +7965,8 @@ static __init int hardware_setup(void)
> nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
> vmx_capability.ept);
>
> - r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
> + r = nested_vmx_hardware_setup(&vmx_x86_ops,
> + kvm_vmx_exit_handlers);
> if (r)
> return r;
> }

Reviewed-by: Vitaly Kuznetsov <[email protected]>

--
Vitaly

2020-03-23 12:48:28

by Vitaly Kuznetsov

Subject: Re: [PATCH v3 7/9] KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()

Sean Christopherson <[email protected]> writes:

> Remove the __exit annotation from VMX hardware_unsetup(), the hook
> can be reached during kvm_init() by way of kvm_arch_hardware_unsetup()
> if failure occurs at various points during initialization.
>
> Note, there is no known functional issue with the __exit annotation, the
> above is merely justification for its removal. The real motivation is
> to be able to annotate vmx_x86_ops and svm_x86_ops with __initdata,
> which makes objtool complain because objtool doesn't understand that the
> vendor specific __initdata is being copied by value to a non-__initdata
> instance.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/vmx/vmx.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 54f991244fae..42a2d0d3984a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1056,7 +1056,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
> struct kvm_x86_ops {
> int (*hardware_enable)(void);
> void (*hardware_disable)(void);
> - void (*hardware_unsetup)(void); /* __exit */
> + void (*hardware_unsetup)(void);
> bool (*cpu_has_accelerated_tpr)(void);
> bool (*has_emulated_msr)(int index);
> void (*cpuid_update)(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4bbe0d165a0c..fac22e316417 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7652,7 +7652,7 @@ static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
> return to_vmx(vcpu)->nested.vmxon;
> }
>
> -static __exit void hardware_unsetup(void)
> +static void hardware_unsetup(void)
> {
> if (nested)
> nested_vmx_hardware_unsetup();

Reviewed-by: Vitaly Kuznetsov <[email protected]>

--
Vitaly

2020-03-23 15:29:59

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 2/9] KVM: x86: Move init-only kvm_x86_ops to separate struct

On Mon, Mar 23, 2020 at 01:10:40PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <[email protected]> writes:
>
> > +
> > + .runtime_ops = &svm_x86_ops,
> > +};
>
> Unrelated to your patch but I think we can make the naming of some of
> these functions more consistent on SVM/VMX, in particular I'd suggest
>
> has_svm() -> cpu_has_svm_support()
> is_disabled -> svm_disabled_by_bios()
> ...
> (see below for VMX)
>
> > +
> > static int __init svm_init(void)
> > {
> > - return kvm_init(&svm_x86_ops, sizeof(struct vcpu_svm),
> > + return kvm_init(&svm_init_ops, sizeof(struct vcpu_svm),
> > __alignof__(struct vcpu_svm), THIS_MODULE);
> > }
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 07299a957d4a..ffcdcc86f5b7 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -7842,11 +7842,8 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
> > }
> >
> > static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > - .cpu_has_kvm_support = cpu_has_kvm_support,
> > - .disabled_by_bios = vmx_disabled_by_bios,
> > - .hardware_setup = hardware_setup,
> > .hardware_unsetup = hardware_unsetup,
> > - .check_processor_compatibility = vmx_check_processor_compat,
> > +
> > .hardware_enable = hardware_enable,
> > .hardware_disable = hardware_disable,
> > .cpu_has_accelerated_tpr = report_flexpriority,
> > @@ -7981,6 +7978,15 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> > };
> >
> > +static struct kvm_x86_init_ops vmx_init_ops __initdata = {
> > + .cpu_has_kvm_support = cpu_has_kvm_support,
> > + .disabled_by_bios = vmx_disabled_by_bios,
> > + .check_processor_compatibility = vmx_check_processor_compat,
> > + .hardware_setup = hardware_setup,
>
> cpu_has_kvm_support() -> cpu_has_vmx_support()
> hardware_setup() -> vmx_hardware_setup()

Preaching to the choir on this one. The VMX functions without prefixes
in particular annoy me to no end, e.g. hardware_setup(). Though the worst
is probably ".vcpu_create = vmx_create_vcpu", if I had a nickel for every
time I've tried to find vmx_vcpu_create()...

What if we added a macro to auto-generate the common/required hooks? E.g.:

static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
MANDATORY_KVM_X86_OPS(vmx),

.pmu_ops = &intel_pmu_ops,

...
};

That'd enforce consistent naming, and would provide a bit of documentation
as to which hooks are optional, e.g. many of the nested hooks, and which
must be defined for KVM to function.
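
To make the idea concrete, a minimal sketch of such a fill macro; the
macro name, the token-pasting scheme and the specific hooks listed are
purely hypothetical, and it assumes the vendor implementations get
renamed to a uniform <vendor>_<hook> pattern:

  /* Hypothetical: stamp out the must-have hooks for a given vendor. */
  #define MANDATORY_KVM_X86_OPS(vendor) \
          .hardware_enable = vendor##_hardware_enable, \
          .hardware_disable = vendor##_hardware_disable, \
          .vcpu_create = vendor##_vcpu_create, \
          .vcpu_free = vendor##_vcpu_free, \
          .vcpu_run = vendor##_vcpu_run

Expanding MANDATORY_KVM_X86_OPS(vmx) in vmx_x86_ops above would then
fail to compile unless vmx_vcpu_create() and friends actually exist,
which is what would enforce the naming.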

2020-03-23 16:24:43

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] KVM: VMX: Configure runtime hooks using vmx_x86_ops

On Mon, Mar 23, 2020 at 01:27:28PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <[email protected]> writes:
>
> > Configure VMX's runtime hooks by modifying vmx_x86_ops directly instead
> > of using the global kvm_x86_ops. This sets the stage for waiting until
> > after ->hardware_setup() to set kvm_x86_ops with the vendor's
> > implementation.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/kvm/vmx/nested.c | 15 ++++++++-------
> > arch/x86/kvm/vmx/nested.h | 3 ++-
> > arch/x86/kvm/vmx/vmx.c | 27 ++++++++++++++-------------
> > 3 files changed, 24 insertions(+), 21 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 4ff859c99946..87fea22c3799 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -6241,7 +6241,8 @@ void nested_vmx_hardware_unsetup(void)
> > }
> > }
> >
> > -__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
> > +__init int nested_vmx_hardware_setup(struct kvm_x86_ops *ops,
> > + int (*exit_handlers[])(struct kvm_vcpu *))
> > {
> > int i;
> >
> > @@ -6277,12 +6278,12 @@ __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
> > exit_handlers[EXIT_REASON_INVVPID] = handle_invvpid;
> > exit_handlers[EXIT_REASON_VMFUNC] = handle_vmfunc;
> >
> > - kvm_x86_ops->check_nested_events = vmx_check_nested_events;
> > - kvm_x86_ops->get_nested_state = vmx_get_nested_state;
> > - kvm_x86_ops->set_nested_state = vmx_set_nested_state;
> > - kvm_x86_ops->get_vmcs12_pages = nested_get_vmcs12_pages;
> > - kvm_x86_ops->nested_enable_evmcs = nested_enable_evmcs;
> > - kvm_x86_ops->nested_get_evmcs_version = nested_get_evmcs_version;
> > + ops->check_nested_events = vmx_check_nested_events;
> > + ops->get_nested_state = vmx_get_nested_state;
> > + ops->set_nested_state = vmx_set_nested_state;
> > + ops->get_vmcs12_pages = nested_get_vmcs12_pages;
> > + ops->nested_enable_evmcs = nested_enable_evmcs;
> > + ops->nested_get_evmcs_version = nested_get_evmcs_version;
>
>
> A lazy guy like me would appreciate an 'ops' -> 'vmx_x86_ops' rename as it
> would make 'git grep vmx_x86_ops' output more complete.

Ah, didn't think about that, obviously.

2020-03-23 16:26:20

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH v3 2/9] KVM: x86: Move init-only kvm_x86_ops to separate struct

Sean Christopherson <[email protected]> writes:

> On Mon, Mar 23, 2020 at 01:10:40PM +0100, Vitaly Kuznetsov wrote:
>> Sean Christopherson <[email protected]> writes:
>>
>> > +
>> > + .runtime_ops = &svm_x86_ops,
>> > +};
>>
>> Unrelated to your patch but I think we can make the naming of some of
>> these functions more consistent on SVM/VMX, in particular I'd suggest
>>
>> has_svm() -> cpu_has_svm_support()
>> is_disabled -> svm_disabled_by_bios()
>> ...
>> (see below for VMX)
>>
>> > +
>> > static int __init svm_init(void)
>> > {
>> > - return kvm_init(&svm_x86_ops, sizeof(struct vcpu_svm),
>> > + return kvm_init(&svm_init_ops, sizeof(struct vcpu_svm),
>> > __alignof__(struct vcpu_svm), THIS_MODULE);
>> > }
>> >
>> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> > index 07299a957d4a..ffcdcc86f5b7 100644
>> > --- a/arch/x86/kvm/vmx/vmx.c
>> > +++ b/arch/x86/kvm/vmx/vmx.c
>> > @@ -7842,11 +7842,8 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
>> > }
>> >
>> > static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>> > - .cpu_has_kvm_support = cpu_has_kvm_support,
>> > - .disabled_by_bios = vmx_disabled_by_bios,
>> > - .hardware_setup = hardware_setup,
>> > .hardware_unsetup = hardware_unsetup,
>> > - .check_processor_compatibility = vmx_check_processor_compat,
>> > +
>> > .hardware_enable = hardware_enable,
>> > .hardware_disable = hardware_disable,
>> > .cpu_has_accelerated_tpr = report_flexpriority,
>> > @@ -7981,6 +7978,15 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>> > .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
>> > };
>> >
>> > +static struct kvm_x86_init_ops vmx_init_ops __initdata = {
>> > + .cpu_has_kvm_support = cpu_has_kvm_support,
>> > + .disabled_by_bios = vmx_disabled_by_bios,
>> > + .check_processor_compatibility = vmx_check_processor_compat,
>> > + .hardware_setup = hardware_setup,
>>
>> cpu_has_kvm_support() -> cpu_has_vmx_support()
>> hardware_setup() -> vmx_hardware_setup()
>
> Preaching to the choir on this one. The VMX functions without prefixes
> in particular annoy me to no end, e.g. hardware_setup(). Though the worst
> is probably ".vcpu_create = vmx_create_vcpu", if I had a nickel for every
> time I've tried to find vmx_vcpu_create()...
>
> What if we added a macro to auto-generate the common/required hooks? E.g.:
>
> static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> MANDATORY_KVM_X86_OPS(vmx),
>
> .pmu_ops = &intel_pmu_ops,
>
> ...
> };
>
> That'd enforce consistent naming, and would provide a bit of documentation
> as to which hooks are optional, e.g. many of the nested hooks, and which
> must be defined for KVM to function.

Sounds cool! (not sure that with only two implementations people won't
call it 'over-engineered' but cool). My personal wish would just be that
function names in function implementations are not auto-generated so
e.g. a simple 'git grep vmx_hardware_setup' works, but the way we
fill in vmx_x86_ops can be macroed, I guess.

--
Vitaly

2020-03-23 16:34:24

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 2/9] KVM: x86: Move init-only kvm_x86_ops to separate struct

On Mon, Mar 23, 2020 at 05:24:56PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <[email protected]> writes:
>
> > On Mon, Mar 23, 2020 at 01:10:40PM +0100, Vitaly Kuznetsov wrote:
> >> Sean Christopherson <[email protected]> writes:
> >>
> >> > +
> >> > + .runtime_ops = &svm_x86_ops,
> >> > +};
> >>
> >> Unrelated to your patch but I think we can make the naming of some of
> >> these functions more consistent on SVM/VMX, in particular I'd suggest
> >>
> >> has_svm() -> cpu_has_svm_support()
> >> is_disabled -> svm_disabled_by_bios()
> >> ...
> >> (see below for VMX)
> >>
> >> > +
> >> > static int __init svm_init(void)
> >> > {
> >> > - return kvm_init(&svm_x86_ops, sizeof(struct vcpu_svm),
> >> > + return kvm_init(&svm_init_ops, sizeof(struct vcpu_svm),
> >> > __alignof__(struct vcpu_svm), THIS_MODULE);
> >> > }
> >> >
> >> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >> > index 07299a957d4a..ffcdcc86f5b7 100644
> >> > --- a/arch/x86/kvm/vmx/vmx.c
> >> > +++ b/arch/x86/kvm/vmx/vmx.c
> >> > @@ -7842,11 +7842,8 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
> >> > }
> >> >
> >> > static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> >> > - .cpu_has_kvm_support = cpu_has_kvm_support,
> >> > - .disabled_by_bios = vmx_disabled_by_bios,
> >> > - .hardware_setup = hardware_setup,
> >> > .hardware_unsetup = hardware_unsetup,
> >> > - .check_processor_compatibility = vmx_check_processor_compat,
> >> > +
> >> > .hardware_enable = hardware_enable,
> >> > .hardware_disable = hardware_disable,
> >> > .cpu_has_accelerated_tpr = report_flexpriority,
> >> > @@ -7981,6 +7978,15 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> >> > .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> >> > };
> >> >
> >> > +static struct kvm_x86_init_ops vmx_init_ops __initdata = {
> >> > + .cpu_has_kvm_support = cpu_has_kvm_support,
> >> > + .disabled_by_bios = vmx_disabled_by_bios,
> >> > + .check_processor_compatibility = vmx_check_processor_compat,
> >> > + .hardware_setup = hardware_setup,
> >>
> >> cpu_has_kvm_support() -> cpu_has_vmx_support()
> >> hardware_setup() -> vmx_hardware_setup()
> >
> > Preaching to the choir on this one. The VMX functions without prefixes
> > in particular annoy me to no end, e.g. hardware_setup(). Though the worst
> > is probably ".vcpu_create = vmx_create_vcpu", if I had a nickel for every
> > time I've tried to find vmx_vcpu_create()...
> >
> > What if we added a macro to auto-generate the common/required hooks? E.g.:
> >
> > static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > MANDATORY_KVM_X86_OPS(vmx),
> >
> > .pmu_ops = &intel_pmu_ops,
> >
> > ...
> > };
> >
> > That'd enforce consistent naming, and would provide a bit of documentation
> > as to which hooks are optional, e.g. many of the nested hooks, and which
> > must be defined for KVM to function.
>
> Sounds cool! (not sure that with only two implementations people won't
> call it 'over-engineered' but cool). My personal wish would just be that
> function names in function implementations are not auto-generated so
> e.g. a simple 'git grep vmx_hardware_setup' works, but the way we
> fill in vmx_x86_ops can be macroed, I guess.

Ya, I was thinking of just the macro. Even that has downsides though, e.g.
chasing kvm_x86_ops.hardware_setup() to find VMX's hardware_setup() becomes
a bit kludgy. On the other hand, _if_ you know how the fill macro works,
getting to the implementation should be easier.

2020-03-23 19:48:54

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v3 2/9] KVM: x86: Move init-only kvm_x86_ops to separate struct

On 23/03/20 17:24, Vitaly Kuznetsov wrote:
> Sounds cool! (not sure that with only two implementations people won't
> call it 'over-engineered' but cool).

Yes, something like

#define KVM_X86_OP(name) .name = vmx_##name

(svm_##name for svm.c) and then

KVM_X86_OP(check_nested_events)

etc.
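
Using that define, a rough sketch of how vmx_x86_ops could then be
filled in; the hooks picked are arbitrary, and it again assumes the
implementations carry the vmx_ prefix:

  /* As above; svm.c would define the macro with svm_##name instead. */
  #define KVM_X86_OP(name) .name = vmx_##name

  static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
          KVM_X86_OP(hardware_unsetup),
          KVM_X86_OP(vcpu_create),  /* .vcpu_create = vmx_vcpu_create, */
          KVM_X86_OP(vcpu_run),
          /* ... remaining hooks ... */
  };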

> My personal wish would just be that
> function names in function implementations are not auto-generated so
> e.g. a simple 'git grep vmx_hardware_setup' works

Yes, absolutely; the function names would still be written by hand.

Paolo

> but the way we
> fill in vmx_x86_ops can be macroed, I guess.

2020-03-23 20:03:05

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] KVM: VMX: Configure runtime hooks using vmx_x86_ops

On 23/03/20 13:27, Vitaly Kuznetsov wrote:
>> - kvm_x86_ops->check_nested_events = vmx_check_nested_events;
>> - kvm_x86_ops->get_nested_state = vmx_get_nested_state;
>> - kvm_x86_ops->set_nested_state = vmx_set_nested_state;
>> - kvm_x86_ops->get_vmcs12_pages = nested_get_vmcs12_pages;
>> - kvm_x86_ops->nested_enable_evmcs = nested_enable_evmcs;
>> - kvm_x86_ops->nested_get_evmcs_version = nested_get_evmcs_version;
>> + ops->check_nested_events = vmx_check_nested_events;
>> + ops->get_nested_state = vmx_get_nested_state;
>> + ops->set_nested_state = vmx_set_nested_state;
>> + ops->get_vmcs12_pages = nested_get_vmcs12_pages;
>> + ops->nested_enable_evmcs = nested_enable_evmcs;
>> + ops->nested_get_evmcs_version = nested_get_evmcs_version;
>
> A lazy guy like me would appreciate an 'ops' -> 'vmx_x86_ops' rename as it
> would make 'git grep vmx_x86_ops' output more complete.
>

I would prefer a kvm_x86_ops.nested struct even more, but I would be okay
with doing that in a separate patch.
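
A rough sketch of the shape that could take; the member names and
prototypes below are only approximations of the existing hooks, nothing
here is meant to be final:

  /* Sketch: group the nested hooks behind a dedicated ops struct. */
  struct kvm_x86_nested_ops {
          int (*check_events)(struct kvm_vcpu *vcpu);
          int (*get_state)(struct kvm_vcpu *vcpu,
                           struct kvm_nested_state __user *user_kvm_nested_state,
                           unsigned int user_data_size);
          int (*set_state)(struct kvm_vcpu *vcpu,
                           struct kvm_nested_state __user *user_kvm_nested_state,
                           struct kvm_nested_state *kvm_state);
          int (*get_vmcs12_pages)(struct kvm_vcpu *vcpu);
          int (*enable_evmcs)(struct kvm_vcpu *vcpu, uint16_t *vmcs_version);
          uint32_t (*get_evmcs_version)(struct kvm_vcpu *vcpu);
  };

nested_vmx_hardware_setup() could then set a single "nested" member
(embedded struct or pointer) instead of poking six individual hooks.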

Paolo

2020-03-24 01:19:35

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH v3 1/9] KVM: Pass kvm_init()'s opaque param to additional arch funcs

On Sat, Mar 21, 2020 at 01:25:55PM -0700, Sean Christopherson wrote:
> Pass @opaque to kvm_arch_hardware_setup() and
> kvm_arch_check_processor_compat() to allow architecture specific code to
> reference @opaque without having to stash it away in a temporary global
> variable. This will enable x86 to separate its vendor specific callback
> ops, which are passed via @opaque, into "init" and "runtime" ops without
> having to stash away the "init" ops.
>
> No functional change intended.
>
> Reviewed-by: Cornelia Huck <[email protected]>
> Tested-by: Cornelia Huck <[email protected]> #s390
> Acked-by: Marc Zyngier <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>

Acked-by: Paul Mackerras <[email protected]>