2019-06-21 06:00:54

by Tao Xu

[permalink] [raw]
Subject: [PATCH v6 0/3] KVM: x86: Enable user wait instructions

UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions.

UMONITOR arms address monitoring hardware using an address. A store
to an address within the specified address range triggers the
monitoring hardware to wake up the processor waiting in umwait.

UMWAIT instructs the processor to enter an implementation-dependent
optimized state while monitoring a range of addresses. The optimized
state may be either a light-weight power/performance optimized state
(c0.1 state) or an improved power/performance optimized state
(c0.2 state).

TPAUSE instructs the processor to enter an implementation-dependent
optimized state c0.1 or c0.2 state and wake up when time-stamp counter
reaches specified timeout.

Availability of the user wait instructions is indicated by the presence
of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].

The patches enable the umonitor, umwait and tpause features in KVM.
Because umwait and tpause can put a (psysical) CPU into a power saving
state, by default we dont't expose it to kvm and enable it only when
guest CPUID has it. If the instruction causes a delay, the amount
of time delayed is called here the physical delay. The physical delay is
first computed by determining the virtual delay (the time to delay
relative to the VM’s timestamp counter).

The release document ref below link:
Intel 64 and IA-32 Architectures Software Developer's Manual,
https://software.intel.com/sites/default/files/\
managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf
This patch has a dependency on https://lkml.org/lkml/2019/6/19/972

Changelog:
v6:
add check msr_info->host_initiated in get/set msr(Xiaoyao)
restore the atomic_switch_umwait_control_msr()(Xiaoyao)
v5:
remove vmx_waitpkg_supported() to fix guest can rdmsr or wrmsr
when the feature is off (Xiaoyao)
rebase the patch because the kernel dependcy patch updated to
v5: https://lkml.org/lkml/2019/6/19/972
v4:
Set msr of IA32_UMWAIT_CONTROL can be 0 and add the check of
reserved bit 1 (Radim and Xiaoyao)
Use umwait_control_cached directly and add the IA32_UMWAIT_CONTROL
in msrs_to_save[] to support migration (Xiaoyao)
v3:
Simplify the patches, expose user wait instructions when the
guest has CPUID (Paolo)
Use mwait_control_cached to avoid frequently rdmsr of
IA32_UMWAIT_CONTROL (Paolo and Xiaoyao)
Handle vm-exit for UMWAIT and TPAUSE as "never happen" (Paolo)
v2:
Separated from the series https://lkml.org/lkml/2018/7/10/160
Add provide a capability to enable UMONITOR, UMWAIT and TPAUSE
v1:
Sent out with MOVDIRI/MOVDIR64B instructions patches

Tao Xu (3):
KVM: x86: add support for user wait instructions
KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL
KVM: vmx: handle vm-exit for UMWAIT and TPAUSE

arch/x86/include/asm/vmx.h | 1 +
arch/x86/include/uapi/asm/vmx.h | 6 +++-
arch/x86/kernel/cpu/umwait.c | 3 +-
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 53 +++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 3 ++
arch/x86/kvm/x86.c | 1 +
7 files changed, 66 insertions(+), 3 deletions(-)

--
2.20.1


2019-06-21 06:00:57

by Tao Xu

[permalink] [raw]
Subject: [PATCH v6 2/3] KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL

UMWAIT and TPAUSE instructions use IA32_UMWAIT_CONTROL at MSR index E1H
to determines the maximum time in TSC-quanta that the processor can reside
in either C0.1 or C0.2.

This patch emulates MSR IA32_UMWAIT_CONTROL in guest and differentiate
IA32_UMWAIT_CONTROL between host and guest. The variable
mwait_control_cached in arch/x86/power/umwait.c caches the MSR value, so
this patch uses it to avoid frequently rdmsr of IA32_UMWAIT_CONTROL.

Co-developed-by: Jingqi Liu <[email protected]>
Signed-off-by: Jingqi Liu <[email protected]>
Signed-off-by: Tao Xu <[email protected]>
---

Changes in v6:
add check msr_info->host_initiated in get/set msr(Xiaoyao)
restore the atomic_switch_umwait_control_msr()(Xiaoyao)
rebase the patch because the kernel dependcy patch updated to
v5: https://lkml.org/lkml/2019/6/19/972
---
arch/x86/kernel/cpu/umwait.c | 3 ++-
arch/x86/kvm/vmx/vmx.c | 33 +++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 3 +++
arch/x86/kvm/x86.c | 1 +
4 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index 4b2aff7b2d4d..db5c193ef136 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -15,7 +15,8 @@
* MSR value. By default, umwait max time is 100000 in TSC-quanta and C0.2
* is enabled
*/
-static u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLED);
+u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLED);
+EXPORT_SYMBOL_GPL(umwait_control_cached);

/*
* Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b35bfac30a34..0d81cb9b96cf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1679,6 +1679,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
#endif
case MSR_EFER:
return kvm_get_msr_common(vcpu, msr_info);
+ case MSR_IA32_UMWAIT_CONTROL:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_WAITPKG))
+ return 1;
+
+ msr_info->data = vmx->msr_ia32_umwait_control;
+ break;
case MSR_IA32_SPEC_CTRL:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
@@ -1841,6 +1848,17 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
vmcs_write64(GUEST_BNDCFGS, data);
break;
+ case MSR_IA32_UMWAIT_CONTROL:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_WAITPKG))
+ return 1;
+
+ /* The reserved bit IA32_UMWAIT_CONTROL[1] should be zero */
+ if (data & BIT_ULL(1))
+ return 1;
+
+ vmx->msr_ia32_umwait_control = data;
+ break;
case MSR_IA32_SPEC_CTRL:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
@@ -4126,6 +4144,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vmx->rmode.vm86_active = 0;
vmx->spec_ctrl = 0;

+ vmx->msr_ia32_umwait_control = 0;
+
vcpu->arch.microcode_version = 0x100000000ULL;
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(vcpu, 0);
@@ -6339,6 +6359,16 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
msrs[i].host, false);
}

+static void atomic_switch_umwait_control_msr(struct vcpu_vmx *vmx)
+{
+ if (vmx->msr_ia32_umwait_control != umwait_control_cached)
+ add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL,
+ vmx->msr_ia32_umwait_control,
+ umwait_control_cached, false);
+ else
+ clear_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL);
+}
+
static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
{
vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
@@ -6447,6 +6477,9 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)

atomic_switch_perf_msrs(vmx);

+ if (guest_cpuid_has(vcpu, X86_FEATURE_WAITPKG))
+ atomic_switch_umwait_control_msr(vmx);
+
vmx_update_hv_timer(vcpu);

/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 61128b48c503..8485bec7c38a 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -14,6 +14,8 @@
extern const u32 vmx_msr_index[];
extern u64 host_efer;

+extern u32 umwait_control_cached;
+
#define MSR_TYPE_R 1
#define MSR_TYPE_W 2
#define MSR_TYPE_RW 3
@@ -194,6 +196,7 @@ struct vcpu_vmx {
#endif

u64 spec_ctrl;
+ u64 msr_ia32_umwait_control;

u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83aefd759846..4480de459bf4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1138,6 +1138,7 @@ static u32 msrs_to_save[] = {
MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
+ MSR_IA32_UMWAIT_CONTROL,
};

static unsigned num_msrs_to_save;
--
2.20.1

2019-06-21 06:02:05

by Tao Xu

[permalink] [raw]
Subject: [PATCH v6 3/3] KVM: vmx: handle vm-exit for UMWAIT and TPAUSE

As the latest Intel 64 and IA-32 Architectures Software Developer's
Manual, UMWAIT and TPAUSE instructions cause a VM exit if the
RDTSC exiting and enable user wait and pause VM-execution
controls are both 1.

This patch is to handle the vm-exit for UMWAIT and TPAUSE as this
should never happen.

Co-developed-by: Jingqi Liu <[email protected]>
Signed-off-by: Jingqi Liu <[email protected]>
Signed-off-by: Tao Xu <[email protected]>
---

No changes in v6
---
arch/x86/include/uapi/asm/vmx.h | 6 +++++-
arch/x86/kvm/vmx/vmx.c | 16 ++++++++++++++++
2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index d213ec5c3766..d88d7a68849b 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -85,6 +85,8 @@
#define EXIT_REASON_PML_FULL 62
#define EXIT_REASON_XSAVES 63
#define EXIT_REASON_XRSTORS 64
+#define EXIT_REASON_UMWAIT 67
+#define EXIT_REASON_TPAUSE 68

#define VMX_EXIT_REASONS \
{ EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \
@@ -142,7 +144,9 @@
{ EXIT_REASON_RDSEED, "RDSEED" }, \
{ EXIT_REASON_PML_FULL, "PML_FULL" }, \
{ EXIT_REASON_XSAVES, "XSAVES" }, \
- { EXIT_REASON_XRSTORS, "XRSTORS" }
+ { EXIT_REASON_XRSTORS, "XRSTORS" }, \
+ { EXIT_REASON_UMWAIT, "UMWAIT" }, \
+ { EXIT_REASON_TPAUSE, "TPAUSE" }

#define VMX_ABORT_SAVE_GUEST_MSR_FAIL 1
#define VMX_ABORT_LOAD_HOST_PDPTE_FAIL 2
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0d81cb9b96cf..26696679f4ca 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5338,6 +5338,20 @@ static int handle_monitor(struct kvm_vcpu *vcpu)
return handle_nop(vcpu);
}

+static int handle_umwait(struct kvm_vcpu *vcpu)
+{
+ kvm_skip_emulated_instruction(vcpu);
+ WARN(1, "this should never happen\n");
+ return 1;
+}
+
+static int handle_tpause(struct kvm_vcpu *vcpu)
+{
+ kvm_skip_emulated_instruction(vcpu);
+ WARN(1, "this should never happen\n");
+ return 1;
+}
+
static int handle_invpcid(struct kvm_vcpu *vcpu)
{
u32 vmx_instruction_info;
@@ -5548,6 +5562,8 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
[EXIT_REASON_VMFUNC] = handle_vmx_instruction,
[EXIT_REASON_PREEMPTION_TIMER] = handle_preemption_timer,
[EXIT_REASON_ENCLS] = handle_encls,
+ [EXIT_REASON_UMWAIT] = handle_umwait,
+ [EXIT_REASON_TPAUSE] = handle_tpause,
};

static const int kvm_vmx_max_exit_handlers =
--
2.20.1

2019-07-03 00:33:25

by Tao Xu

[permalink] [raw]
Subject: Re: [PATCH v6 2/3] KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL

Ping ;)
On 6/21/2019 1:57 PM, Tao Xu wrote:
> UMWAIT and TPAUSE instructions use IA32_UMWAIT_CONTROL at MSR index E1H
> to determines the maximum time in TSC-quanta that the processor can reside
> in either C0.1 or C0.2.
>
> This patch emulates MSR IA32_UMWAIT_CONTROL in guest and differentiate
> IA32_UMWAIT_CONTROL between host and guest. The variable
> mwait_control_cached in arch/x86/power/umwait.c caches the MSR value, so
> this patch uses it to avoid frequently rdmsr of IA32_UMWAIT_CONTROL.
>
> Co-developed-by: Jingqi Liu <[email protected]>
> Signed-off-by: Jingqi Liu <[email protected]>
> Signed-off-by: Tao Xu <[email protected]>
> ---
>
> Changes in v6:
> add check msr_info->host_initiated in get/set msr(Xiaoyao)
> restore the atomic_switch_umwait_control_msr()(Xiaoyao)
> rebase the patch because the kernel dependcy patch updated to
> v5: https://lkml.org/lkml/2019/6/19/972
> ---
> arch/x86/kernel/cpu/umwait.c | 3 ++-
> arch/x86/kvm/vmx/vmx.c | 33 +++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/vmx.h | 3 +++
> arch/x86/kvm/x86.c | 1 +
> 4 files changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
> index 4b2aff7b2d4d..db5c193ef136 100644
> --- a/arch/x86/kernel/cpu/umwait.c
> +++ b/arch/x86/kernel/cpu/umwait.c
> @@ -15,7 +15,8 @@
> * MSR value. By default, umwait max time is 100000 in TSC-quanta and C0.2
> * is enabled
> */
> -static u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLED);
> +u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLED);
> +EXPORT_SYMBOL_GPL(umwait_control_cached);
>
> /*
> * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index b35bfac30a34..0d81cb9b96cf 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1679,6 +1679,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> #endif
> case MSR_EFER:
> return kvm_get_msr_common(vcpu, msr_info);
> + case MSR_IA32_UMWAIT_CONTROL:
> + if (!msr_info->host_initiated &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_WAITPKG))
> + return 1;
> +
> + msr_info->data = vmx->msr_ia32_umwait_control;
> + break;
> case MSR_IA32_SPEC_CTRL:
> if (!msr_info->host_initiated &&
> !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> @@ -1841,6 +1848,17 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> return 1;
> vmcs_write64(GUEST_BNDCFGS, data);
> break;
> + case MSR_IA32_UMWAIT_CONTROL:
> + if (!msr_info->host_initiated &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_WAITPKG))
> + return 1;
> +
> + /* The reserved bit IA32_UMWAIT_CONTROL[1] should be zero */
> + if (data & BIT_ULL(1))
> + return 1;
> +
> + vmx->msr_ia32_umwait_control = data;
> + break;
> case MSR_IA32_SPEC_CTRL:
> if (!msr_info->host_initiated &&
> !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> @@ -4126,6 +4144,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> vmx->rmode.vm86_active = 0;
> vmx->spec_ctrl = 0;
>
> + vmx->msr_ia32_umwait_control = 0;
> +
> vcpu->arch.microcode_version = 0x100000000ULL;
> vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
> kvm_set_cr8(vcpu, 0);
> @@ -6339,6 +6359,16 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
> msrs[i].host, false);
> }
>
> +static void atomic_switch_umwait_control_msr(struct vcpu_vmx *vmx)
> +{
> + if (vmx->msr_ia32_umwait_control != umwait_control_cached)
> + add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL,
> + vmx->msr_ia32_umwait_control,
> + umwait_control_cached, false);
> + else
> + clear_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL);
> +}
> +
> static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
> {
> vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
> @@ -6447,6 +6477,9 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>
> atomic_switch_perf_msrs(vmx);
>
> + if (guest_cpuid_has(vcpu, X86_FEATURE_WAITPKG))
> + atomic_switch_umwait_control_msr(vmx);
> +
> vmx_update_hv_timer(vcpu);
>
> /*
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 61128b48c503..8485bec7c38a 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -14,6 +14,8 @@
> extern const u32 vmx_msr_index[];
> extern u64 host_efer;
>
> +extern u32 umwait_control_cached;
> +
> #define MSR_TYPE_R 1
> #define MSR_TYPE_W 2
> #define MSR_TYPE_RW 3
> @@ -194,6 +196,7 @@ struct vcpu_vmx {
> #endif
>
> u64 spec_ctrl;
> + u64 msr_ia32_umwait_control;
>
> u32 vm_entry_controls_shadow;
> u32 vm_exit_controls_shadow;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 83aefd759846..4480de459bf4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1138,6 +1138,7 @@ static u32 msrs_to_save[] = {
> MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
> MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
> MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
> + MSR_IA32_UMWAIT_CONTROL,
> };
>
> static unsigned num_msrs_to_save;
>

2019-07-11 13:42:41

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v6 2/3] KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL

On 21/06/19 07:57, Tao Xu wrote:
> + if (guest_cpuid_has(vcpu, X86_FEATURE_WAITPKG))
> + atomic_switch_umwait_control_msr(vmx);
> +

guest_cpuid_has is slow. Please replace it with a test on
secondary_exec_controls_get(vmx).

Are you going to look into nested virtualization support? This should
include only 1) allowing setting the enable bit in secondary execution
controls, and passing it through in prepare_vmcs02_early; 2) reflecting
the vmexit in nested_vmx_exit_reflected.

Thanks,

Paolo

2019-07-12 00:43:59

by Tao Xu

[permalink] [raw]
Subject: Re: [PATCH v6 2/3] KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL

On 7/11/2019 9:25 PM, Paolo Bonzini wrote:
> On 21/06/19 07:57, Tao Xu wrote:
>> + if (guest_cpuid_has(vcpu, X86_FEATURE_WAITPKG))
>> + atomic_switch_umwait_control_msr(vmx);
>> +
>
> guest_cpuid_has is slow. Please replace it with a test on
> secondary_exec_controls_get(vmx).

Thank you paolo, I will improve it.

>
> Are you going to look into nested virtualization support? This should
> include only 1) allowing setting the enable bit in secondary execution
> controls, and passing it through in prepare_vmcs02_early; 2) reflecting
> the vmexit in nested_vmx_exit_reflected.
>

I will add nested support in next version.

> Thanks,
>
> Paolo
>