2019-06-16 09:58:52

by Tao Xu

[permalink] [raw]
Subject: [PATCH RESEND v3 0/3] KVM: x86: Enable user wait instructions

UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions.

UMONITOR arms address monitoring hardware using an address. A store
to an address within the specified address range triggers the
monitoring hardware to wake up the processor waiting in umwait.

UMWAIT instructs the processor to enter an implementation-dependent
optimized state while monitoring a range of addresses. The optimized
state may be either a light-weight power/performance optimized state
(c0.1 state) or an improved power/performance optimized state
(c0.2 state).

TPAUSE instructs the processor to enter an implementation-dependent
optimized state c0.1 or c0.2 state and wake up when time-stamp counter
reaches specified timeout.

Availability of the user wait instructions is indicated by the presence
of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].

The patches enable the umonitor, umwait and tpause features in KVM.
Because umwait and tpause can put a (psysical) CPU into a power saving
state, by default we dont't expose it to kvm and enable it only when
guest CPUID has it. If the instruction causes a delay, the amount
of time delayed is called here the physical delay. The physical delay is
first computed by determining the virtual delay (the time to delay
relative to the VM’s timestamp counter).

The release document ref below link:
Intel 64 and IA-32 Architectures Software Developer's Manual,
https://software.intel.com/sites/default/files/\
managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf
This patch has a dependency on https://lkml.org/lkml/2019/6/7/1206

Changelog:
v3:
Simplify the patches, expose user wait instructions when the
guest has CPUID (Paolo)
Use mwait_control_cached to avoid frequently rdmsr of
IA32_UMWAIT_CONTROL (Paolo and Xiaoyao)
Handle vm-exit for UMWAIT and TPAUSE as "never happen" (Paolo)
v2:
Separated from the series https://lkml.org/lkml/2018/7/10/160
Add provide a capability to enable UMONITOR, UMWAIT and TPAUSE
v1:
Sent out with MOVDIRI/MOVDIR64B instructions patches

Tao Xu (3):
KVM: x86: add support for user wait instructions
KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL
KVM: vmx: handle vm-exit for UMWAIT and TPAUSE

arch/x86/include/asm/vmx.h | 1 +
arch/x86/include/uapi/asm/vmx.h | 6 +++-
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/vmx/capabilities.h | 6 ++++
arch/x86/kvm/vmx/vmx.c | 56 +++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 3 ++
arch/x86/power/umwait.c | 3 +-
7 files changed, 74 insertions(+), 3 deletions(-)

--
2.20.1


2019-06-16 09:58:57

by Tao Xu

[permalink] [raw]
Subject: [PATCH RESEND v3 1/3] KVM: x86: add support for user wait instructions

UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions.
This patch adds support for user wait instructions in KVM. Availability
of the user wait instructions is indicated by the presence of the CPUID
feature flag WAITPKG CPUID.0x07.0x0:ECX[5]. User wait instructions may
be executed at any privilege level, and use IA32_UMWAIT_CONTROL MSR to
set the maximum time.

The behavior of user wait instructions in VMX non-root operation is
determined first by the setting of the "enable user wait and pause"
secondary processor-based VM-execution control bit 26.
If the VM-execution control is 0, UMONITOR/UMWAIT/TPAUSE cause
an invalid-opcode exception (#UD).
If the VM-execution control is 1, treatment is based on the
setting of the “RDTSC exiting” VM-execution control. Because KVM never
enables RDTSC exiting, if the instruction causes a delay, the amount of
time delayed is called here the physical delay. The physical delay is
first computed by determining the virtual delay. If
IA32_UMWAIT_CONTROL[31:2] is zero, the virtual delay is the value in
EDX:EAX minus the value that RDTSC would return; if
IA32_UMWAIT_CONTROL[31:2] is not zero, the virtual delay is the minimum
of that difference and AND(IA32_UMWAIT_CONTROL,FFFFFFFCH).

Because umwait and tpause can put a (psysical) CPU into a power saving
state, by default we dont't expose it to kvm and enable it only when
guest CPUID has it.

Detailed information about user wait instructions can be found in the
latest Intel 64 and IA-32 Architectures Software Developer's Manual.

Co-developed-by: Jingqi Liu <[email protected]>
Signed-off-by: Jingqi Liu <[email protected]>
Signed-off-by: Tao Xu <[email protected]>
---
arch/x86/include/asm/vmx.h | 1 +
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/vmx/capabilities.h | 6 ++++++
arch/x86/kvm/vmx/vmx.c | 4 ++++
4 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index a39136b0d509..8f00882664d3 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -69,6 +69,7 @@
#define SECONDARY_EXEC_PT_USE_GPA 0x01000000
#define SECONDARY_EXEC_MODE_BASED_EPT_EXEC 0x00400000
#define SECONDARY_EXEC_TSC_SCALING 0x02000000
+#define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE 0x04000000

#define PIN_BASED_EXT_INTR_MASK 0x00000001
#define PIN_BASED_NMI_EXITING 0x00000008
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index e18a9f9f65b5..48bd851a6ae5 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -405,7 +405,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ |
F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
- F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B);
+ F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/;

/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index d6664ee3d127..fd77e17651b4 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -253,6 +253,12 @@ static inline bool cpu_has_vmx_tsc_scaling(void)
SECONDARY_EXEC_TSC_SCALING;
}

+static inline bool vmx_waitpkg_supported(void)
+{
+ return vmcs_config.cpu_based_2nd_exec_ctrl &
+ SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;
+}
+
static inline bool cpu_has_vmx_apicv(void)
{
return cpu_has_vmx_apic_register_virt() &&
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b93e36ddee5e..b35bfac30a34 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2250,6 +2250,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
SECONDARY_EXEC_RDRAND_EXITING |
SECONDARY_EXEC_ENABLE_PML |
SECONDARY_EXEC_TSC_SCALING |
+ SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
SECONDARY_EXEC_PT_USE_GPA |
SECONDARY_EXEC_PT_CONCEAL_VMX |
SECONDARY_EXEC_ENABLE_VMFUNC |
@@ -3987,6 +3988,9 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
}
}

+ if (!guest_cpuid_has(vcpu, X86_FEATURE_WAITPKG))
+ exec_control &= ~SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;
+
vmx->secondary_exec_control = exec_control;
}

--
2.20.1

2019-06-16 09:59:04

by Tao Xu

[permalink] [raw]
Subject: [PATCH RESEND v3 3/3] KVM: vmx: handle vm-exit for UMWAIT and TPAUSE

As the latest Intel 64 and IA-32 Architectures Software Developer's
Manual, UMWAIT and TPAUSE instructions cause a VM exit if the
RDTSC exiting and enable user wait and pause VM-execution
controls are both 1.

This patch is to handle the vm-exit for UMWAIT and TPAUSE as this
should never happen.

Co-developed-by: Jingqi Liu <[email protected]>
Signed-off-by: Jingqi Liu <[email protected]>
Signed-off-by: Tao Xu <[email protected]>
---
arch/x86/include/uapi/asm/vmx.h | 6 +++++-
arch/x86/kvm/vmx/vmx.c | 16 ++++++++++++++++
2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index d213ec5c3766..d88d7a68849b 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -85,6 +85,8 @@
#define EXIT_REASON_PML_FULL 62
#define EXIT_REASON_XSAVES 63
#define EXIT_REASON_XRSTORS 64
+#define EXIT_REASON_UMWAIT 67
+#define EXIT_REASON_TPAUSE 68

#define VMX_EXIT_REASONS \
{ EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \
@@ -142,7 +144,9 @@
{ EXIT_REASON_RDSEED, "RDSEED" }, \
{ EXIT_REASON_PML_FULL, "PML_FULL" }, \
{ EXIT_REASON_XSAVES, "XSAVES" }, \
- { EXIT_REASON_XRSTORS, "XRSTORS" }
+ { EXIT_REASON_XRSTORS, "XRSTORS" }, \
+ { EXIT_REASON_UMWAIT, "UMWAIT" }, \
+ { EXIT_REASON_TPAUSE, "TPAUSE" }

#define VMX_ABORT_SAVE_GUEST_MSR_FAIL 1
#define VMX_ABORT_LOAD_HOST_PDPTE_FAIL 2
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f33a25e82cb8..386bd68f8d0b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5335,6 +5335,20 @@ static int handle_monitor(struct kvm_vcpu *vcpu)
return handle_nop(vcpu);
}

+static int handle_umwait(struct kvm_vcpu *vcpu)
+{
+ kvm_skip_emulated_instruction(vcpu);
+ WARN(1, "this should never happen\n");
+ return 1;
+}
+
+static int handle_tpause(struct kvm_vcpu *vcpu)
+{
+ kvm_skip_emulated_instruction(vcpu);
+ WARN(1, "this should never happen\n");
+ return 1;
+}
+
static int handle_invpcid(struct kvm_vcpu *vcpu)
{
u32 vmx_instruction_info;
@@ -5545,6 +5559,8 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
[EXIT_REASON_VMFUNC] = handle_vmx_instruction,
[EXIT_REASON_PREEMPTION_TIMER] = handle_preemption_timer,
[EXIT_REASON_ENCLS] = handle_encls,
+ [EXIT_REASON_UMWAIT] = handle_umwait,
+ [EXIT_REASON_TPAUSE] = handle_tpause,
};

static const int kvm_vmx_max_exit_handlers =
--
2.20.1

2019-06-16 09:59:24

by Tao Xu

[permalink] [raw]
Subject: [PATCH RESEND v3 2/3] KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL

UMWAIT and TPAUSE instructions use IA32_UMWAIT_CONTROL at MSR index E1H
to determines the maximum time in TSC-quanta that the processor can reside
in either C0.1 or C0.2.

This patch emulates MSR IA32_UMWAIT_CONTROL in guest and differentiate
IA32_UMWAIT_CONTROL between host and guest. The variable
mwait_control_cached in arch/x86/power/umwait.c caches the MSR value, so
this patch uses it to avoid frequently rdmsr of IA32_UMWAIT_CONTROL.

Co-developed-by: Jingqi Liu <[email protected]>
Signed-off-by: Jingqi Liu <[email protected]>
Signed-off-by: Tao Xu <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 36 ++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 3 +++
arch/x86/power/umwait.c | 3 ++-
3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b35bfac30a34..f33a25e82cb8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1679,6 +1679,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
#endif
case MSR_EFER:
return kvm_get_msr_common(vcpu, msr_info);
+ case MSR_IA32_UMWAIT_CONTROL:
+ if (!vmx_waitpkg_supported())
+ return 1;
+
+ msr_info->data = vmx->msr_ia32_umwait_control;
+ break;
case MSR_IA32_SPEC_CTRL:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
@@ -1841,6 +1847,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
vmcs_write64(GUEST_BNDCFGS, data);
break;
+ case MSR_IA32_UMWAIT_CONTROL:
+ if (!vmx_waitpkg_supported())
+ return 1;
+
+ if (!data)
+ break;
+
+ vmx->msr_ia32_umwait_control = data;
+ break;
case MSR_IA32_SPEC_CTRL:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
@@ -4126,6 +4141,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vmx->rmode.vm86_active = 0;
vmx->spec_ctrl = 0;

+ vmx->msr_ia32_umwait_control = 0;
+
vcpu->arch.microcode_version = 0x100000000ULL;
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(vcpu, 0);
@@ -6339,6 +6356,23 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
msrs[i].host, false);
}

+static void atomic_switch_ia32_umwait_control(struct vcpu_vmx *vmx)
+{
+ u64 host_umwait_control;
+
+ if (!vmx_waitpkg_supported())
+ return;
+
+ host_umwait_control = umwait_control_cached;
+
+ if (vmx->msr_ia32_umwait_control != host_umwait_control)
+ add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL,
+ vmx->msr_ia32_umwait_control,
+ host_umwait_control, false);
+ else
+ clear_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL);
+}
+
static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
{
vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
@@ -6447,6 +6481,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)

atomic_switch_perf_msrs(vmx);

+ atomic_switch_ia32_umwait_control(vmx);
+
vmx_update_hv_timer(vcpu);

/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 61128b48c503..8485bec7c38a 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -14,6 +14,8 @@
extern const u32 vmx_msr_index[];
extern u64 host_efer;

+extern u32 umwait_control_cached;
+
#define MSR_TYPE_R 1
#define MSR_TYPE_W 2
#define MSR_TYPE_RW 3
@@ -194,6 +196,7 @@ struct vcpu_vmx {
#endif

u64 spec_ctrl;
+ u64 msr_ia32_umwait_control;

u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
index 7fa381e3fd4e..2e6ce4cbccb3 100644
--- a/arch/x86/power/umwait.c
+++ b/arch/x86/power/umwait.c
@@ -9,7 +9,8 @@
* MSR value. By default, umwait max time is 100000 in TSC-quanta and C0.2
* is enabled
*/
-static u32 umwait_control_cached = 100000;
+u32 umwait_control_cached = 100000;
+EXPORT_SYMBOL_GPL(umwait_control_cached);

/*
* Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR
--
2.20.1

2019-06-17 03:33:18

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [PATCH RESEND v3 2/3] KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL



On 6/16/2019 5:55 PM, Tao Xu wrote:
> UMWAIT and TPAUSE instructions use IA32_UMWAIT_CONTROL at MSR index E1H
> to determines the maximum time in TSC-quanta that the processor can reside
> in either C0.1 or C0.2.
>
> This patch emulates MSR IA32_UMWAIT_CONTROL in guest and differentiate
> IA32_UMWAIT_CONTROL between host and guest. The variable
> mwait_control_cached in arch/x86/power/umwait.c caches the MSR value, so
> this patch uses it to avoid frequently rdmsr of IA32_UMWAIT_CONTROL.
>
> Co-developed-by: Jingqi Liu <[email protected]>
> Signed-off-by: Jingqi Liu <[email protected]>
> Signed-off-by: Tao Xu <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 36 ++++++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/vmx.h | 3 +++
> arch/x86/power/umwait.c | 3 ++-
> 3 files changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index b35bfac30a34..f33a25e82cb8 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1679,6 +1679,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> #endif
> case MSR_EFER:
> return kvm_get_msr_common(vcpu, msr_info);
> + case MSR_IA32_UMWAIT_CONTROL:
> + if (!vmx_waitpkg_supported())
> + return 1;
> +
> + msr_info->data = vmx->msr_ia32_umwait_control;
> + break;
> case MSR_IA32_SPEC_CTRL:
> if (!msr_info->host_initiated &&
> !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> @@ -1841,6 +1847,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> return 1;
> vmcs_write64(GUEST_BNDCFGS, data);
> break;
> + case MSR_IA32_UMWAIT_CONTROL:
> + if (!vmx_waitpkg_supported())
> + return 1;
> +
> + if (!data)
> + break;
> +

Why cannot clear it to zero?

> + vmx->msr_ia32_umwait_control = data;
> + break;
> case MSR_IA32_SPEC_CTRL:
> if (!msr_info->host_initiated &&
> !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> @@ -4126,6 +4141,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> vmx->rmode.vm86_active = 0;
> vmx->spec_ctrl = 0;
>
> + vmx->msr_ia32_umwait_control = 0;
> +
> vcpu->arch.microcode_version = 0x100000000ULL;
> vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
> kvm_set_cr8(vcpu, 0);
> @@ -6339,6 +6356,23 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
> msrs[i].host, false);
> }
>
> +static void atomic_switch_ia32_umwait_control(struct vcpu_vmx *vmx)
> +{
> + u64 host_umwait_control;
> +
> + if (!vmx_waitpkg_supported())
> + return;
> +
> + host_umwait_control = umwait_control_cached;
> +

It's redundant to define host_umwait_control and this line, we can just
use umwait_control_cached.

> + if (vmx->msr_ia32_umwait_control != host_umwait_control)
> + add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL,
> + vmx->msr_ia32_umwait_control,
> + host_umwait_control, false);

The bit 1 is reserved, at least, we need to do below to ensure not
modifying the reserved bit:

guest_val = (vmx->msr_ia32_umwait_control & ~BIT_ULL(1)) |
(host_val & BIT_ULL(1))

> + else
> + clear_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL);
> +}
> +
> static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
> {
> vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
> @@ -6447,6 +6481,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>
> atomic_switch_perf_msrs(vmx);
>
> + atomic_switch_ia32_umwait_control(vmx);
> +
> vmx_update_hv_timer(vcpu);
>
> /*
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 61128b48c503..8485bec7c38a 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -14,6 +14,8 @@
> extern const u32 vmx_msr_index[];
> extern u64 host_efer;
>
> +extern u32 umwait_control_cached;
> +
> #define MSR_TYPE_R 1
> #define MSR_TYPE_W 2
> #define MSR_TYPE_RW 3
> @@ -194,6 +196,7 @@ struct vcpu_vmx {
> #endif
>
> u64 spec_ctrl;
> + u64 msr_ia32_umwait_control;
>
> u32 vm_entry_controls_shadow;
> u32 vm_exit_controls_shadow;
> diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
> index 7fa381e3fd4e..2e6ce4cbccb3 100644
> --- a/arch/x86/power/umwait.c
> +++ b/arch/x86/power/umwait.c
> @@ -9,7 +9,8 @@
> * MSR value. By default, umwait max time is 100000 in TSC-quanta and C0.2
> * is enabled
> */
> -static u32 umwait_control_cached = 100000;
> +u32 umwait_control_cached = 100000;
> +EXPORT_SYMBOL_GPL(umwait_control_cached);
>
> /*
> * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR
>

2019-06-17 06:32:29

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [PATCH RESEND v3 2/3] KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL



On 6/17/2019 11:32 AM, Xiaoyao Li wrote:
>
>
> On 6/16/2019 5:55 PM, Tao Xu wrote:
>> UMWAIT and TPAUSE instructions use IA32_UMWAIT_CONTROL at MSR index E1H
>> to determines the maximum time in TSC-quanta that the processor can
>> reside
>> in either C0.1 or C0.2.
>>
>> This patch emulates MSR IA32_UMWAIT_CONTROL in guest and differentiate
>> IA32_UMWAIT_CONTROL between host and guest. The variable
>> mwait_control_cached in arch/x86/power/umwait.c caches the MSR value, so
>> this patch uses it to avoid frequently rdmsr of IA32_UMWAIT_CONTROL.
>>
>> Co-developed-by: Jingqi Liu <[email protected]>
>> Signed-off-by: Jingqi Liu <[email protected]>
>> Signed-off-by: Tao Xu <[email protected]>
>> ---
>>   arch/x86/kvm/vmx/vmx.c  | 36 ++++++++++++++++++++++++++++++++++++
>>   arch/x86/kvm/vmx/vmx.h  |  3 +++
>>   arch/x86/power/umwait.c |  3 ++-
>>   3 files changed, 41 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index b35bfac30a34..f33a25e82cb8 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -1679,6 +1679,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu,
>> struct msr_data *msr_info)
>>   #endif
>>       case MSR_EFER:
>>           return kvm_get_msr_common(vcpu, msr_info);
>> +    case MSR_IA32_UMWAIT_CONTROL:
>> +        if (!vmx_waitpkg_supported())
>> +            return 1;
>> +
>> +        msr_info->data = vmx->msr_ia32_umwait_control;
>> +        break;
>>       case MSR_IA32_SPEC_CTRL:
>>           if (!msr_info->host_initiated &&
>>               !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
>> @@ -1841,6 +1847,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu,
>> struct msr_data *msr_info)
>>               return 1;
>>           vmcs_write64(GUEST_BNDCFGS, data);
>>           break;
>> +    case MSR_IA32_UMWAIT_CONTROL:
>> +        if (!vmx_waitpkg_supported())
>> +            return 1;
>> +
>> +        if (!data)
>> +            break;
>> +
>
> Why cannot clear it to zero?
>
>> +        vmx->msr_ia32_umwait_control = data;
>> +        break;
>>       case MSR_IA32_SPEC_CTRL:
>>           if (!msr_info->host_initiated &&
>>               !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
>> @@ -4126,6 +4141,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu
>> *vcpu, bool init_event)
>>       vmx->rmode.vm86_active = 0;
>>       vmx->spec_ctrl = 0;
>> +    vmx->msr_ia32_umwait_control = 0;
>> +
>>       vcpu->arch.microcode_version = 0x100000000ULL;
>>       vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
>>       kvm_set_cr8(vcpu, 0);
>> @@ -6339,6 +6356,23 @@ static void atomic_switch_perf_msrs(struct
>> vcpu_vmx *vmx)
>>                       msrs[i].host, false);
>>   }
>> +static void atomic_switch_ia32_umwait_control(struct vcpu_vmx *vmx)
>> +{
>> +    u64 host_umwait_control;
>> +
>> +    if (!vmx_waitpkg_supported())
>> +        return;
>> +
>> +    host_umwait_control = umwait_control_cached;
>> +
>
> It's redundant to define host_umwait_control and this line, we can just
> use umwait_control_cached.
>
>> +    if (vmx->msr_ia32_umwait_control != host_umwait_control)
>> +        add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL,
>> +                      vmx->msr_ia32_umwait_control,
>> +                      host_umwait_control, false);
>
> The bit 1 is reserved, at least, we need to do below to ensure not
> modifying the reserved bit:
>
>     guest_val = (vmx->msr_ia32_umwait_control & ~BIT_ULL(1)) |
>             (host_val & BIT_ULL(1))
>

I find a better solution to ensure reserved bit 1 not being modified in
vmx_set_msr() as below:

if((data ^ umwait_control_cached) & BIT_ULL(1))
return 1;

>> +    else
>> +        clear_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL);
>> +}
>> +
>>   static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
>>   {
>>       vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
>> @@ -6447,6 +6481,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>       atomic_switch_perf_msrs(vmx);
>> +    atomic_switch_ia32_umwait_control(vmx);
>> +
>>       vmx_update_hv_timer(vcpu);
>>       /*
>> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
>> index 61128b48c503..8485bec7c38a 100644
>> --- a/arch/x86/kvm/vmx/vmx.h
>> +++ b/arch/x86/kvm/vmx/vmx.h
>> @@ -14,6 +14,8 @@
>>   extern const u32 vmx_msr_index[];
>>   extern u64 host_efer;
>> +extern u32 umwait_control_cached;
>> +
>>   #define MSR_TYPE_R    1
>>   #define MSR_TYPE_W    2
>>   #define MSR_TYPE_RW    3
>> @@ -194,6 +196,7 @@ struct vcpu_vmx {
>>   #endif
>>       u64              spec_ctrl;
>> +    u64              msr_ia32_umwait_control;
>>       u32 vm_entry_controls_shadow;
>>       u32 vm_exit_controls_shadow;
>> diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
>> index 7fa381e3fd4e..2e6ce4cbccb3 100644
>> --- a/arch/x86/power/umwait.c
>> +++ b/arch/x86/power/umwait.c
>> @@ -9,7 +9,8 @@
>>    * MSR value. By default, umwait max time is 100000 in TSC-quanta
>> and C0.2
>>    * is enabled
>>    */
>> -static u32 umwait_control_cached = 100000;
>> +u32 umwait_control_cached = 100000;
>> +EXPORT_SYMBOL_GPL(umwait_control_cached);
>>   /*
>>    * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL
>> MSR
>>

2019-06-17 15:52:46

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PATCH RESEND v3 2/3] KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL

2019-06-17 14:31+0800, Xiaoyao Li:
> On 6/17/2019 11:32 AM, Xiaoyao Li wrote:
> > On 6/16/2019 5:55 PM, Tao Xu wrote:
> > > +    if (vmx->msr_ia32_umwait_control != host_umwait_control)
> > > +        add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL,
> > > +                      vmx->msr_ia32_umwait_control,
> > > +                      host_umwait_control, false);
> >
> > The bit 1 is reserved, at least, we need to do below to ensure not
> > modifying the reserved bit:
> >
> >     guest_val = (vmx->msr_ia32_umwait_control & ~BIT_ULL(1)) |
> >             (host_val & BIT_ULL(1))
> >
>
> I find a better solution to ensure reserved bit 1 not being modified in
> vmx_set_msr() as below:
>
> if((data ^ umwait_control_cached) & BIT_ULL(1))
> return 1;

We could just be checking

if (data & BIT_ULL(1))

because the guest cannot change its visible reserved value and KVM
currently initializes the value to 0.

The arch/x86/kernel/cpu/umwait.c series assumes that the reserved bit
is 0 (hopefully deliberately) and I would do the same in KVM as it
simplifies the logic. (We don't have to even think about migrations
between machines with a different reserved value and making it play
nicely with possible future implementations of that bit.)

Thanks.

2019-06-17 15:55:13

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [PATCH RESEND v3 2/3] KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL



On 6/17/2019 11:50 PM, Radim Krčmář wrote:
> 2019-06-17 14:31+0800, Xiaoyao Li:
>> On 6/17/2019 11:32 AM, Xiaoyao Li wrote:
>>> On 6/16/2019 5:55 PM, Tao Xu wrote:
>>>> +    if (vmx->msr_ia32_umwait_control != host_umwait_control)
>>>> +        add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL,
>>>> +                      vmx->msr_ia32_umwait_control,
>>>> +                      host_umwait_control, false);
>>>
>>> The bit 1 is reserved, at least, we need to do below to ensure not
>>> modifying the reserved bit:
>>>
>>>     guest_val = (vmx->msr_ia32_umwait_control & ~BIT_ULL(1)) |
>>>             (host_val & BIT_ULL(1))
>>>
>>
>> I find a better solution to ensure reserved bit 1 not being modified in
>> vmx_set_msr() as below:
>>
>> if((data ^ umwait_control_cached) & BIT_ULL(1))
>> return 1;
>
> We could just be checking
>
> if (data & BIT_ULL(1))
>
> because the guest cannot change its visible reserved value and KVM
> currently initializes the value to 0.
>
> The arch/x86/kernel/cpu/umwait.c series assumes that the reserved bit
> is 0 (hopefully deliberately) and I would do the same in KVM as it
> simplifies the logic. (We don't have to even think about migrations
> between machines with a different reserved value and making it play
> nicely with possible future implementations of that bit.)
>

Got it, thanks.

> Thanks.
>

2019-06-18 02:41:06

by Tao Xu

[permalink] [raw]
Subject: Re: [PATCH RESEND v3 2/3] KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL

On 6/17/2019 11:50 PM, Radim Krčmář wrote:
> 2019-06-17 14:31+0800, Xiaoyao Li:
>> On 6/17/2019 11:32 AM, Xiaoyao Li wrote:
>>> On 6/16/2019 5:55 PM, Tao Xu wrote:
>>>> +    if (vmx->msr_ia32_umwait_control != host_umwait_control)
>>>> +        add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL,
>>>> +                      vmx->msr_ia32_umwait_control,
>>>> +                      host_umwait_control, false);
>>>
>>> The bit 1 is reserved, at least, we need to do below to ensure not
>>> modifying the reserved bit:
>>>
>>>     guest_val = (vmx->msr_ia32_umwait_control & ~BIT_ULL(1)) |
>>>             (host_val & BIT_ULL(1))
>>>
>>
>> I find a better solution to ensure reserved bit 1 not being modified in
>> vmx_set_msr() as below:
>>
>> if((data ^ umwait_control_cached) & BIT_ULL(1))
>> return 1;
>
> We could just be checking
>
> if (data & BIT_ULL(1))
>
> because the guest cannot change its visible reserved value and KVM
> currently initializes the value to 0.
>
> The arch/x86/kernel/cpu/umwait.c series assumes that the reserved bit
> is 0 (hopefully deliberately) and I would do the same in KVM as it
> simplifies the logic. (We don't have to even think about migrations
> between machines with a different reserved value and making it play
> nicely with possible future implementations of that bit.)
>
> Thanks.
>
Thank you Radim and xiaoyao's review, I will improve it in the next
version. Xiaoyao's suggestion remind me another thing. And I am
wondering if we need to initialize the value of MSR_IA32_UMWAIT_CONTROL
in KVM to 0x186a0(umwait_control = 100000, as host does).

Although the guest with new kernel(has umwait host patch)can initialize
the value to 0x186a0. But there is a case that a guest with a old kernel
and the host with the new kernel and has the cpuid of WAITPKG. Because
the msr value is 0, the guest umwait will have no max time by default.

2019-06-18 03:05:19

by Tao Xu

[permalink] [raw]
Subject: Re: [PATCH RESEND v3 2/3] KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL

On 6/17/2019 11:32 AM, Xiaoyao Li wrote:
>
>
> On 6/16/2019 5:55 PM, Tao Xu wrote:
>> UMWAIT and TPAUSE instructions use IA32_UMWAIT_CONTROL at MSR index E1H
>> to determines the maximum time in TSC-quanta that the processor can
>> reside
>> in either C0.1 or C0.2.
>>
>> This patch emulates MSR IA32_UMWAIT_CONTROL in guest and differentiate
>> IA32_UMWAIT_CONTROL between host and guest. The variable
>> mwait_control_cached in arch/x86/power/umwait.c caches the MSR value, so
>> this patch uses it to avoid frequently rdmsr of IA32_UMWAIT_CONTROL.
>>
>> Co-developed-by: Jingqi Liu <[email protected]>
>> Signed-off-by: Jingqi Liu <[email protected]>
>> Signed-off-by: Tao Xu <[email protected]>
>> ---
>>   arch/x86/kvm/vmx/vmx.c  | 36 ++++++++++++++++++++++++++++++++++++
>>   arch/x86/kvm/vmx/vmx.h  |  3 +++
>>   arch/x86/power/umwait.c |  3 ++-
>>   3 files changed, 41 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index b35bfac30a34..f33a25e82cb8 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -1679,6 +1679,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu,
>> struct msr_data *msr_info)
>>   #endif
>>       case MSR_EFER:
>>           return kvm_get_msr_common(vcpu, msr_info);
>> +    case MSR_IA32_UMWAIT_CONTROL:
>> +        if (!vmx_waitpkg_supported())
>> +            return 1;
>> +
>> +        msr_info->data = vmx->msr_ia32_umwait_control;
>> +        break;
>>       case MSR_IA32_SPEC_CTRL:
>>           if (!msr_info->host_initiated &&
>>               !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
>> @@ -1841,6 +1847,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu,
>> struct msr_data *msr_info)
>>               return 1;
>>           vmcs_write64(GUEST_BNDCFGS, data);
>>           break;
>> +    case MSR_IA32_UMWAIT_CONTROL:
>> +        if (!vmx_waitpkg_supported())
>> +            return 1;
>> +
>> +        if (!data)
>> +            break;
>> +
>
> Why cannot clear it to zero?
>

After read the kernel code of umwait again and test it again, host can
set it to 0, when we set sys max_time to 0. So I am wondering to remove
the "if (!data)" and set the value of msr value to 0x186a0(maxtime =
100000) when KVM initialization.

And considering we use "-overcommit cpu-pm=on" to use umwait in QEMU
side. It means guest can over-commit the host CPU, so set to 0 make sense.

>> +        vmx->msr_ia32_umwait_control = data;
>> +        break;
>>       case MSR_IA32_SPEC_CTRL:
>>           if (!msr_info->host_initiated &&
>>               !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
>> @@ -4126,6 +4141,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu
>> *vcpu, bool init_event)
>>       vmx->rmode.vm86_active = 0;
>>       vmx->spec_ctrl = 0;
>> +    vmx->msr_ia32_umwait_control = 0;
>> +
>>       vcpu->arch.microcode_version = 0x100000000ULL;
>>       vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
>>       kvm_set_cr8(vcpu, 0);
>> @@ -6339,6 +6356,23 @@ static void atomic_switch_perf_msrs(struct
>> vcpu_vmx *vmx)
>>                       msrs[i].host, false);
>>   }
>> +static void atomic_switch_ia32_umwait_control(struct vcpu_vmx *vmx)
>> +{
>> +    u64 host_umwait_control;
>> +
>> +    if (!vmx_waitpkg_supported())
>> +        return;
>> +
>> +    host_umwait_control = umwait_control_cached;
>> +
>
> It's redundant to define host_umwait_control and this line, we can just
> use umwait_control_cached.

Thanks for reminding.
>
>> +    if (vmx->msr_ia32_umwait_control != host_umwait_control)
>> +        add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL,
>> +                      vmx->msr_ia32_umwait_control,
>> +                      host_umwait_control, false);
>
> The bit 1 is reserved, at least, we need to do below to ensure not
> modifying the reserved bit:
>
>     guest_val = (vmx->msr_ia32_umwait_control & ~BIT_ULL(1)) |
>             (host_val & BIT_ULL(1))
>
>> +    else
>> +        clear_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL);
>> +}
>> +
>>   static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
>>   {
>>       vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
>> @@ -6447,6 +6481,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>       atomic_switch_perf_msrs(vmx);
>> +    atomic_switch_ia32_umwait_control(vmx);
>> +
>>       vmx_update_hv_timer(vcpu);
>>       /*
>> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
>> index 61128b48c503..8485bec7c38a 100644
>> --- a/arch/x86/kvm/vmx/vmx.h
>> +++ b/arch/x86/kvm/vmx/vmx.h
>> @@ -14,6 +14,8 @@
>>   extern const u32 vmx_msr_index[];
>>   extern u64 host_efer;
>> +extern u32 umwait_control_cached;
>> +
>>   #define MSR_TYPE_R    1
>>   #define MSR_TYPE_W    2
>>   #define MSR_TYPE_RW    3
>> @@ -194,6 +196,7 @@ struct vcpu_vmx {
>>   #endif
>>       u64              spec_ctrl;
>> +    u64              msr_ia32_umwait_control;
>>       u32 vm_entry_controls_shadow;
>>       u32 vm_exit_controls_shadow;
>> diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c
>> index 7fa381e3fd4e..2e6ce4cbccb3 100644
>> --- a/arch/x86/power/umwait.c
>> +++ b/arch/x86/power/umwait.c
>> @@ -9,7 +9,8 @@
>>    * MSR value. By default, umwait max time is 100000 in TSC-quanta
>> and C0.2
>>    * is enabled
>>    */
>> -static u32 umwait_control_cached = 100000;
>> +u32 umwait_control_cached = 100000;
>> +EXPORT_SYMBOL_GPL(umwait_control_cached);
>>   /*
>>    * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL
>> MSR
>>