2019-05-21 06:08:21

by Wanpeng Li

Subject: [PATCH 1/3] KVM: Documentation: Add disable pause exits to KVM_CAP_X86_DISABLE_EXITS

From: Wanpeng Li <[email protected]>

Commit b31c114b ("KVM: X86: Provide a capability to disable PAUSE intercepts")
forgot to add KVM_X86_DISABLE_EXITS_PAUSE to the api doc. This patch
adds it.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Liran Alon <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
Documentation/virtual/kvm/api.txt | 1 +
1 file changed, 1 insertion(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index ba6c42c..33cd92d 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4893,6 +4893,7 @@ Valid bits in args[0] are

#define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
+#define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)

Enabling this capability on a VM provides userspace with a way to no
longer intercept some instructions for improved latency in some
--
2.7.4



2019-05-21 06:08:28

by Wanpeng Li

Subject: [PATCH v2 3/3] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit

From: Wanpeng Li <[email protected]>

MSR IA32_MISC_ENABLE bit 18, according to SDM:

| When this bit is set to 0, the MONITOR feature flag is not set (CPUID.01H:ECX[bit 3] = 0).
| This indicates that MONITOR/MWAIT are not supported.
|
| Software attempts to execute MONITOR/MWAIT will cause #UD when this bit is 0.
|
| When this bit is set to 1 (default), MONITOR/MWAIT are supported (CPUID.01H:ECX[bit 3] = 1).

CPUID.01H:ECX[bit 3] ought to mirror the value of the MSR bit; it is a
better guard than kvm_mwait_in_guest(), which affects the behavior of
MONITOR/MWAIT but not their guest visibility.

This patch implements toggling of the CPUID bit based on guest writes
to the MSR.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Liran Alon <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
v1 -> v2:
* hide behind KVM_CAP_DISABLE_QUIRKS

arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/cpuid.c | 10 ++++++++++
arch/x86/kvm/x86.c | 10 ++++++++++
3 files changed, 21 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 7a0e64c..e3ae96b5 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -382,6 +382,7 @@ struct kvm_sync_regs {
#define KVM_X86_QUIRK_CD_NW_CLEARED (1 << 1)
#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE (1 << 2)
#define KVM_X86_QUIRK_OUT_7E_INC_RIP (1 << 3)
+#define KVM_X86_QUIRK_MISC_ENABLE_MWAIT (1 << 4)

#define KVM_STATE_NESTED_GUEST_MODE 0x00000001
#define KVM_STATE_NESTED_RUN_PENDING 0x00000002
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index e18a9f9..f54d266 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -137,6 +137,16 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);

+ if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_MWAIT)) {
+ best = kvm_find_cpuid_entry(vcpu, 0x1, 0);
+ if (best) {
+ if (vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT)
+ best->ecx |= F(MWAIT);
+ else
+ best->ecx &= ~F(MWAIT);
+ }
+ }
+
/* Update physical-address width */
vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
kvm_mmu_reset_context(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 765fe59..a4eb711 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2547,6 +2547,16 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
break;
case MSR_IA32_MISC_ENABLE:
+ if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_MWAIT) &&
+ ((vcpu->arch.ia32_misc_enable_msr ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
+ if ((vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT) &&
+ !(data & MSR_IA32_MISC_ENABLE_MWAIT)) {
+ if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
+ return 1;
+ }
+ vcpu->arch.ia32_misc_enable_msr = data;
+ kvm_update_cpuid(vcpu);
+ }
vcpu->arch.ia32_misc_enable_msr = data;
break;
case MSR_IA32_SMBASE:
--
2.7.4


2019-05-21 06:09:25

by Wanpeng Li

Subject: [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts

From: Wanpeng Li <[email protected]>

Allow the guest to read CORE cstate MSRs when host CPU power management
capabilities are exposed to the guest. PKG cstate remains intercepted to
prevent a guest from obtaining whole-package information in multi-tenant
scenarios.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Liran Alon <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
v1 -> v2:
* use a separate bit for KVM_CAP_X86_DISABLE_EXITS

Documentation/virtual/kvm/api.txt | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx/vmx.c | 6 ++++++
arch/x86/kvm/x86.c | 5 ++++-
arch/x86/kvm/x86.h | 5 +++++
include/uapi/linux/kvm.h | 4 +++-
tools/include/uapi/linux/kvm.h | 4 +++-
7 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 33cd92d..91fd86f 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4894,6 +4894,7 @@ Valid bits in args[0] are
#define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
#define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)
+#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)

Enabling this capability on a VM provides userspace with a way to no
longer intercept some instructions for improved latency in some
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d5457c7..1ce8289 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -882,6 +882,7 @@ struct kvm_arch {
bool mwait_in_guest;
bool hlt_in_guest;
bool pause_in_guest;
+ bool cstate_in_guest;

unsigned long irq_sources_bitmap;
s64 kvmclock_offset;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0861c71..da24f18 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6637,6 +6637,12 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
+ if (kvm_cstate_in_guest(kvm)) {
+ vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C1_RES, MSR_TYPE_R);
+ vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
+ vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
+ vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
+ }
vmx->msr_bitmap_mode = 0;

vmx->loaded_vmcs = &vmx->vmcs01;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 202048e..765fe59 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3098,7 +3098,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = KVM_CLOCK_TSC_STABLE;
break;
case KVM_CAP_X86_DISABLE_EXITS:
- r |= KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE;
+ r |= KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE |
+ KVM_X86_DISABLE_EXITS_CSTATE;
if(kvm_can_mwait_in_guest())
r |= KVM_X86_DISABLE_EXITS_MWAIT;
break;
@@ -4612,6 +4613,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
kvm->arch.hlt_in_guest = true;
if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
kvm->arch.pause_in_guest = true;
+ if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
+ kvm->arch.cstate_in_guest = true;
r = 0;
break;
case KVM_CAP_MSR_PLATFORM_INFO:
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a470ff0..275b3b6 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -333,6 +333,11 @@ static inline bool kvm_pause_in_guest(struct kvm *kvm)
return kvm->arch.pause_in_guest;
}

+static inline bool kvm_cstate_in_guest(struct kvm *kvm)
+{
+ return kvm->arch.cstate_in_guest;
+}
+
DECLARE_PER_CPU(struct kvm_vcpu *, current_vcpu);

static inline void kvm_before_interrupt(struct kvm_vcpu *vcpu)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2fe12b4..c2152f3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -696,9 +696,11 @@ struct kvm_ioeventfd {
#define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
#define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)
+#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)
#define KVM_X86_DISABLE_VALID_EXITS (KVM_X86_DISABLE_EXITS_MWAIT | \
KVM_X86_DISABLE_EXITS_HLT | \
- KVM_X86_DISABLE_EXITS_PAUSE)
+ KVM_X86_DISABLE_EXITS_PAUSE | \
+ KVM_X86_DISABLE_EXITS_CSTATE)

/* for KVM_ENABLE_CAP */
struct kvm_enable_cap {
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index 6d4ea4b..ef3303f 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -696,9 +696,11 @@ struct kvm_ioeventfd {
#define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
#define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)
+#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)
#define KVM_X86_DISABLE_VALID_EXITS (KVM_X86_DISABLE_EXITS_MWAIT | \
KVM_X86_DISABLE_EXITS_HLT | \
- KVM_X86_DISABLE_EXITS_PAUSE)
+ KVM_X86_DISABLE_EXITS_PAUSE | \
+ KVM_X86_DISABLE_EXITS_CSTATE)

/* for KVM_ENABLE_CAP */
struct kvm_enable_cap {
--
2.7.4


2019-06-04 16:56:01

by Paolo Bonzini

Subject: Re: [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts

On 21/05/19 08:06, Wanpeng Li wrote:
> From: Wanpeng Li <[email protected]>
>
> Allow guest reads CORE cstate when exposing host CPU power management capabilities
> to the guest. PKG cstate is restricted to avoid a guest to get the whole package
> information in multi-tenant scenario.
>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Sean Christopherson <[email protected]>
> Cc: Liran Alon <[email protected]>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> v1 -> v2:
> * use a separate bit for KVM_CAP_X86_DISABLE_EXITS
>
> Documentation/virtual/kvm/api.txt | 1 +
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/vmx/vmx.c | 6 ++++++
> arch/x86/kvm/x86.c | 5 ++++-
> arch/x86/kvm/x86.h | 5 +++++
> include/uapi/linux/kvm.h | 4 +++-
> tools/include/uapi/linux/kvm.h | 4 +++-
> 7 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 33cd92d..91fd86f 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -4894,6 +4894,7 @@ Valid bits in args[0] are
> #define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
> #define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
> #define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)
> +#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)
>
> Enabling this capability on a VM provides userspace with a way to no
> longer intercept some instructions for improved latency in some
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d5457c7..1ce8289 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -882,6 +882,7 @@ struct kvm_arch {
> bool mwait_in_guest;
> bool hlt_in_guest;
> bool pause_in_guest;
> + bool cstate_in_guest;
>
> unsigned long irq_sources_bitmap;
> s64 kvmclock_offset;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 0861c71..da24f18 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6637,6 +6637,12 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
> vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
> vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
> vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
> + if (kvm_cstate_in_guest(kvm)) {
> + vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C1_RES, MSR_TYPE_R);
> + vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
> + vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
> + vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);

I think I have changed my mind on the implementation of this, sorry.

1) We should emulate these MSRs always, otherwise the guest API changes
between different values of KVM_CAP_X86_DISABLE_EXITS which is not
intended. Also, KVM_CAP_X86_DISABLE_EXITS does not prevent live
migration, so it should be possible to set the MSRs in the host to
change the delta between the host and guest values.

2) If both KVM_X86_DISABLE_EXITS_HLT and KVM_X86_DISABLE_EXITS_MWAIT are
disabled (i.e. exits happen), the MSRs will be purely emulated.
C3/C6/C7 residency will never increase (it will remain the value that is
set by the host). When the VM executes an hlt vmexit, it should save
the current TSC. When it comes back, the C1 residency MSR should be
increased by the time that has passed.

3) If KVM_X86_DISABLE_EXITS_HLT is enabled but
KVM_X86_DISABLE_EXITS_MWAIT is disabled (i.e. mwait exits happen),
C3/C6/C7 residency will also never increase, but the C1 residency value
should be read using rdmsr from the host, with a delta added from the
host value.

4) If KVM_X86_DISABLE_EXITS_HLT and KVM_X86_DISABLE_EXITS_MWAIT are both
enabled (i.e. mwait exits do not happen), all four residency values
should be read using rdmsr from the host, with a delta added from the
host value.

5) If KVM_X86_DISABLE_EXITS_HLT is disabled and
KVM_X86_DISABLE_EXITS_MWAIT is enabled, the configuration makes no sense
so it's okay not to be very optimized. In this case, the residency
value should be read as in (4), but hlt vmexits will be accounted as in
(2) so we need to be careful not to double-count the residency during
hlt. This means doing four rdmsr before the beginning of the hlt vmexit
and four at the end of the hlt vmexit.

Therefore the data structure should be something like

struct kvm_residency_msr {
	u16 index;
	u64 value;
	bool delta_from_host;
	bool count_with_host;
};

u64 kvm_residency_read_host(struct kvm_residency_msr *msr)
{
	u64 unscaled_value = rdmsrl(msr->index);
	// apply TSC scaling...
	return ...
}

u64 kvm_residency_read(struct kvm_residency_msr *msr)
{
	return msr->value +
		(msr->delta_from_host ? kvm_residency_read_host(msr) : 0);
}

void kvm_residency_write(struct kvm_residency_msr *msr, u64 value)
{
	msr->value = value -
		(msr->delta_from_host ? kvm_residency_read_host(msr) : 0);
}

// count_with_host is true for C1 iff any of KVM_CAP_DISABLE_EXITS_HLT
// or KVM_CAP_DISABLE_EXITS_MWAIT is set
// count_with_host is true for C3/C6/C7 iff KVM_CAP_DISABLE_EXITS_MWAIT
// is set
void kvm_residency_setup(struct kvm_residency_msr *msr, u16 index,
			 bool count_with_host)
{
	/* Preserve value on calls after the first */
	u64 value = msr->index ? kvm_residency_read(msr) : 0;
	msr->delta_from_host = msr->count_with_host = count_with_host;
	msr->index = index;
	kvm_residency_write(msr, value);
}

// The following functions are called from hlt vmexits.

void kvm_residency_start_hlt(struct kvm_residency_msr *msr)
{
	if (msr->count_with_host) {
		WARN_ON(msr->delta_from_host);
		msr->value += kvm_residency_read_host(msr);
		msr->delta_from_host = false;
	}
}

// host_tsc_waited is 0 except for MSR_CORE_C1_RES
void kvm_residency_end_hlt(struct kvm_residency_msr *msr,
			   u64 host_tsc_waited)
{
	if (msr->count_with_host) {
		WARN_ON(!msr->delta_from_host);
		msr->value -= kvm_residency_read_host(msr);
		msr->delta_from_host = true;
	}
	if (host_tsc_waited) {
		// ... apply TSC scaling to host_tsc_waited ...
		msr->value += ...;
	}
}

Thanks,

Paolo

2019-06-04 17:01:06

by Paolo Bonzini

Subject: Re: [PATCH v2 3/3] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit

On 21/05/19 08:06, Wanpeng Li wrote:
>
> The CPUID.01H:ECX[bit 3] ought to mirror the value of the MSR bit,
> CPUID.01H:ECX[bit 3] is a better guard than kvm_mwait_in_guest().
> kvm_mwait_in_guest() affects the behavior of MONITOR/MWAIT, not its
> guest visibility.

This needs some adjustment so that the default is backwards-compatible:

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index e3ae96b52a16..f9b021e16ebc 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -378,11 +378,11 @@ struct kvm_sync_regs {
struct kvm_vcpu_events events;
};

-#define KVM_X86_QUIRK_LINT0_REENABLED (1 << 0)
-#define KVM_X86_QUIRK_CD_NW_CLEARED (1 << 1)
-#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE (1 << 2)
-#define KVM_X86_QUIRK_OUT_7E_INC_RIP (1 << 3)
-#define KVM_X86_QUIRK_MISC_ENABLE_MWAIT (1 << 4)
+#define KVM_X86_QUIRK_LINT0_REENABLED (1 << 0)
+#define KVM_X86_QUIRK_CD_NW_CLEARED (1 << 1)
+#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE (1 << 2)
+#define KVM_X86_QUIRK_OUT_7E_INC_RIP (1 << 3)
+#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)

#define KVM_STATE_NESTED_GUEST_MODE 0x00000001
#define KVM_STATE_NESTED_RUN_PENDING 0x00000002
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f54d266fd3b5..bfa1341ce6f1 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -137,10 +137,10 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);

- if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_MWAIT)) {
+ if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
best = kvm_find_cpuid_entry(vcpu, 0x1, 0);
if (best) {
- if (vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT)
+ if (vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_NO_MWAIT)
best->ecx |= F(MWAIT);
else
best->ecx &= ~F(MWAIT);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 528935733fe0..0c1498da46c7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2548,17 +2548,15 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
break;
case MSR_IA32_MISC_ENABLE:
- if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_MWAIT) &&
- ((vcpu->arch.ia32_misc_enable_msr ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
- if ((vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT) &&
- !(data & MSR_IA32_MISC_ENABLE_MWAIT)) {
- if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
- return 1;
- }
+ if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
+ ((vcpu->arch.ia32_misc_enable_msr ^ data) & MSR_IA32_MISC_ENABLE_NO_MWAIT)) {
+ if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
+ return 1;
vcpu->arch.ia32_misc_enable_msr = data;
kvm_update_cpuid(vcpu);
+ } else {
+ vcpu->arch.ia32_misc_enable_msr = data;
}
- vcpu->arch.ia32_misc_enable_msr = data;
break;
case MSR_IA32_SMBASE:
if (!msr_info->host_initiated)

Please double check, in the meanwhile I've queued the patch.

Paolo

2019-06-05 10:57:56

by Wanpeng Li

Subject: Re: [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts

On Wed, 5 Jun 2019 at 00:53, Paolo Bonzini <[email protected]> wrote:
>
> On 21/05/19 08:06, Wanpeng Li wrote:
> > From: Wanpeng Li <[email protected]>
> >
> > Allow guest reads CORE cstate when exposing host CPU power management capabilities
> > to the guest. PKG cstate is restricted to avoid a guest to get the whole package
> > information in multi-tenant scenario.
> >
> > Cc: Paolo Bonzini <[email protected]>
> > Cc: Radim Krčmář <[email protected]>
> > Cc: Sean Christopherson <[email protected]>
> > Cc: Liran Alon <[email protected]>
> > Signed-off-by: Wanpeng Li <[email protected]>
> > ---
> > v1 -> v2:
> > * use a separate bit for KVM_CAP_X86_DISABLE_EXITS
> >
> > Documentation/virtual/kvm/api.txt | 1 +
> > arch/x86/include/asm/kvm_host.h | 1 +
> > arch/x86/kvm/vmx/vmx.c | 6 ++++++
> > arch/x86/kvm/x86.c | 5 ++++-
> > arch/x86/kvm/x86.h | 5 +++++
> > include/uapi/linux/kvm.h | 4 +++-
> > tools/include/uapi/linux/kvm.h | 4 +++-
> > 7 files changed, 23 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index 33cd92d..91fd86f 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -4894,6 +4894,7 @@ Valid bits in args[0] are
> > #define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
> > #define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
> > #define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)
> > +#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)
> >
> > Enabling this capability on a VM provides userspace with a way to no
> > longer intercept some instructions for improved latency in some
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index d5457c7..1ce8289 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -882,6 +882,7 @@ struct kvm_arch {
> > bool mwait_in_guest;
> > bool hlt_in_guest;
> > bool pause_in_guest;
> > + bool cstate_in_guest;
> >
> > unsigned long irq_sources_bitmap;
> > s64 kvmclock_offset;
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 0861c71..da24f18 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6637,6 +6637,12 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
> > vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
> > vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
> > vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
> > + if (kvm_cstate_in_guest(kvm)) {
> > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C1_RES, MSR_TYPE_R);
> > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
> > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
> > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
>
> I think I have changed my mind on the implementation of this, sorry.
>
> 1) We should emulate these MSRs always, otherwise the guest API changes
> between different values of KVM_CAP_X86_DISABLE_EXITS which is not
> intended. Also, KVM_CAP_X86_DISABLE_EXITS does not prevent live
> migration, so it should be possible to set the MSRs in the host to
> change the delta between the host and guest values.
>
> 2) If both KVM_X86_DISABLE_EXITS_HLT and KVM_X86_DISABLE_EXITS_MWAIT are
> disabled (i.e. exit happens), the MSRs will be purely emulated.
> C3/C6/C7 residency will never increase (it will remain the value that is
> set by the host). When the VM executes an hlt vmexit, it should save
> the current TSC. When it comes back, the C1 residency MSR should be
> increased by the time that has passed.
>
> 3) If KVM_X86_DISABLE_EXITS_HLT is enabled but
> KVM_X86_DISABLE_EXITS_MWAIT is disabled (i.e. mwait exits happen),
> C3/C6/C7 residency will also never increase, but the C1 residency value
> should be read using rdmsr from the host, with a delta added from the
> host value.
>
> 4) If KVM_X86_DISABLE_EXITS_HLT and KVM_X86_DISABLE_EXITS_MWAIT are both
> enabled (i.e. mwait exits do not happen), all four residency values
> should be read using rdmsr from the host, with a delta added from the
> host value.
>
> 5) If KVM_X86_DISABLE_EXITS_HLT is disabled and
> KVM_X86_DISABLE_EXITS_MWAIT is enabled, the configuration makes no sense
> so it's okay not to be very optimized. In this case, the residency
> value should be read as in (4), but hlt vmexits will be accounted as in
> (2) so we need to be careful not to double-count the residency during
> hlt. This means doing four rdmsr before the beginning of the hlt vmexit
> and four at the end of the hlt vmexit.

I will have a try, thanks Paolo! :)

Regards,
Wanpeng Li

>
> Therefore the data structure should be something like
>
> struct kvm_residency_msr {
> u64 value;
> bool delta_from_host;
> bool count_with_host;
> }
>
> u64 kvm_residency_read_host(struct kvm_residency_msr *msr)
> {
> u64 unscaled_value = rdmsrl(msr->index);
> // apply TSC scaling...
> return ...
> }
>
> u64 kvm_residency_read(struct kvm_residency_msr *msr)
> {
> return msr->value +
> (msr->delta_from_host ? kvm_residency_read_host(msr) : 0);
> }
>
> void kvm_residency_write(struct kvm_residency_msr *msr,
> u64 value)
> {
> msr->value = value -
> (msr->delta_from_host ? kvm_residency_read_host(msr) : 0);
> }
>
> // count_with_host is true for C1 iff any of KVM_CAP_DISABLE_EXITS_HLT
> // or KVM_CAP_DISABLE_EXITS_MWAIT is set
> // count_with_host is true for C3/C6/C7 iff KVM_CAP_DISABLE_EXITS_MWAIT
> // is set
> void kvm_residency_setup(struct kvm_residency_msr *msr, u16 index,
> bool count_with_host)
> {
> /* Preserve value on calls after the first */
> u64 value = msr->index ? kvm_residency_read(msr) : 0;
> msr->delta_from_host = msr->count_with_host = count_with_host;
> msr->index = index;
> kvm_residency_write(msr, value);
> }
>
> // The following functions are called from hlt vmexits.
>
> void kvm_residency_start_hlt(struct kvm_residency_msr *msr)
> {
> if (msr->count_with_host) {
> WARN_ON(msr->delta_from_host);
> msr->value += kvm_residency_read_host(msr);
> msr->delta_from_host = false;
> }
> }
>
> // host_tsc_waited is 0 except for MSR_CORE_C1_RES
> void kvm_residency_end_hlt(struct kvm_residency_msr *msr,
> u64 host_tsc_waited)
> {
> if (msr->count_with_host) {
> WARN_ON(!msr->delta_from_host);
> msr->value -= kvm_residency_read_host(msr);
> msr->delta_from_host = true;
> }
> if (host_tsc_waited) {
> // ... apply TSC scaling to host_tsc_waited ...
> msr->value += ...;
> }
> }
>
> Thanks,
>
> Paolo

2019-06-05 11:02:18

by Wanpeng Li

Subject: Re: [PATCH v2 3/3] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit

On Wed, 5 Jun 2019 at 00:59, Paolo Bonzini <[email protected]> wrote:
>
> On 21/05/19 08:06, Wanpeng Li wrote:
> >
> > The CPUID.01H:ECX[bit 3] ought to mirror the value of the MSR bit,
> > CPUID.01H:ECX[bit 3] is a better guard than kvm_mwait_in_guest().
> > kvm_mwait_in_guest() affects the behavior of MONITOR/MWAIT, not its
> > guest visibility.
>
> This needs some adjustment so that the default is backwards-compatible:
>
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index e3ae96b52a16..f9b021e16ebc 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -378,11 +378,11 @@ struct kvm_sync_regs {
> struct kvm_vcpu_events events;
> };
>
> -#define KVM_X86_QUIRK_LINT0_REENABLED (1 << 0)
> -#define KVM_X86_QUIRK_CD_NW_CLEARED (1 << 1)
> -#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE (1 << 2)
> -#define KVM_X86_QUIRK_OUT_7E_INC_RIP (1 << 3)
> -#define KVM_X86_QUIRK_MISC_ENABLE_MWAIT (1 << 4)
> +#define KVM_X86_QUIRK_LINT0_REENABLED (1 << 0)
> +#define KVM_X86_QUIRK_CD_NW_CLEARED (1 << 1)
> +#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE (1 << 2)
> +#define KVM_X86_QUIRK_OUT_7E_INC_RIP (1 << 3)
> +#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)
>
> #define KVM_STATE_NESTED_GUEST_MODE 0x00000001
> #define KVM_STATE_NESTED_RUN_PENDING 0x00000002
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index f54d266fd3b5..bfa1341ce6f1 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -137,10 +137,10 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> (best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
> best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
>
> - if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_MWAIT)) {
> + if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
> best = kvm_find_cpuid_entry(vcpu, 0x1, 0);
> if (best) {
> - if (vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT)
> + if (vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_NO_MWAIT)
> best->ecx |= F(MWAIT);
> else
> best->ecx &= ~F(MWAIT);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 528935733fe0..0c1498da46c7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2548,17 +2548,15 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> }
> break;
> case MSR_IA32_MISC_ENABLE:
> - if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_MWAIT) &&
> - ((vcpu->arch.ia32_misc_enable_msr ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
> - if ((vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT) &&
> - !(data & MSR_IA32_MISC_ENABLE_MWAIT)) {
> - if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
> - return 1;
> - }
> + if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
> + ((vcpu->arch.ia32_misc_enable_msr ^ data) & MSR_IA32_MISC_ENABLE_NO_MWAIT)) {
> + if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
> + return 1;
> vcpu->arch.ia32_misc_enable_msr = data;
> kvm_update_cpuid(vcpu);
> + } else {
> + vcpu->arch.ia32_misc_enable_msr = data;
> }
> - vcpu->arch.ia32_misc_enable_msr = data;
> break;
> case MSR_IA32_SMBASE:
> if (!msr_info->host_initiated)
>
> Please double check, in the meanwhile I've queued the patch.

Looks good to me, thanks. :)

Regards,
Wanpeng Li

2019-06-11 07:38:17

by Wanpeng Li

Subject: Re: [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts

On Wed, 5 Jun 2019 at 00:53, Paolo Bonzini <[email protected]> wrote:
>
> On 21/05/19 08:06, Wanpeng Li wrote:
> > From: Wanpeng Li <[email protected]>
> >
> > Allow guest reads CORE cstate when exposing host CPU power management capabilities
> > to the guest. PKG cstate is restricted to avoid a guest to get the whole package
> > information in multi-tenant scenario.
> >
> > Cc: Paolo Bonzini <[email protected]>
> > Cc: Radim Krčmář <[email protected]>
> > Cc: Sean Christopherson <[email protected]>
> > Cc: Liran Alon <[email protected]>
> > Signed-off-by: Wanpeng Li <[email protected]>
> > ---
> > v1 -> v2:
> > * use a separate bit for KVM_CAP_X86_DISABLE_EXITS
> >
> > Documentation/virtual/kvm/api.txt | 1 +
> > arch/x86/include/asm/kvm_host.h | 1 +
> > arch/x86/kvm/vmx/vmx.c | 6 ++++++
> > arch/x86/kvm/x86.c | 5 ++++-
> > arch/x86/kvm/x86.h | 5 +++++
> > include/uapi/linux/kvm.h | 4 +++-
> > tools/include/uapi/linux/kvm.h | 4 +++-
> > 7 files changed, 23 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index 33cd92d..91fd86f 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -4894,6 +4894,7 @@ Valid bits in args[0] are
> > #define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
> > #define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
> > #define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)
> > +#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)
> >
> > Enabling this capability on a VM provides userspace with a way to no
> > longer intercept some instructions for improved latency in some
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index d5457c7..1ce8289 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -882,6 +882,7 @@ struct kvm_arch {
> > bool mwait_in_guest;
> > bool hlt_in_guest;
> > bool pause_in_guest;
> > + bool cstate_in_guest;
> >
> > unsigned long irq_sources_bitmap;
> > s64 kvmclock_offset;
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 0861c71..da24f18 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6637,6 +6637,12 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
> > vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
> > vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
> > vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
> > + if (kvm_cstate_in_guest(kvm)) {
> > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C1_RES, MSR_TYPE_R);
> > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
> > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
> > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
>
> I think I have changed my mind on the implementation of this, sorry.
>
> 1) We should emulate these MSRs always, otherwise the guest API changes
> between different values of KVM_CAP_X86_DISABLE_EXITS which is not
> intended. Also, KVM_CAP_X86_DISABLE_EXITS does not prevent live
> migration, so it should be possible to set the MSRs in the host to
> change the delta between the host and guest values.
>
> 2) If both KVM_X86_DISABLE_EXITS_HLT and KVM_X86_DISABLE_EXITS_MWAIT are
> disabled (i.e. exit happens), the MSRs will be purely emulated.
> C3/C6/C7 residency will never increase (it will remain the value that is
> set by the host). When the VM executes an hlt vmexit, it should save
> the current TSC. When it comes back, the C1 residency MSR should be
> increased by the time that has passed.
>
> 3) If KVM_X86_DISABLE_EXITS_HLT is enabled but
> KVM_X86_DISABLE_EXITS_MWAIT is disabled (i.e. mwait exits happen),
> C3/C6/C7 residency will also never increase, but the C1 residency value
> should be read using rdmsr from the host, with a delta added from the
> host value.
>
> 4) If KVM_X86_DISABLE_EXITS_HLT and KVM_X86_DISABLE_EXITS_MWAIT are both
> enabled (i.e. mwait exits do not happen), all four residency values
> should be read using rdmsr from the host, with a delta added from the
> host value.
>
> 5) If KVM_X86_DISABLE_EXITS_HLT is disabled and
> KVM_X86_DISABLE_EXITS_MWAIT is enabled, the configuration makes no sense
> so it's okay not to be very optimized. In this case, the residency
> value should be read as in (4), but hlt vmexits will be accounted as in
> (2) so we need to be careful not to double-count the residency during
> hlt. This means doing four rdmsr before the beginning of the hlt vmexit
> and four at the end of the hlt vmexit.
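[Editor's note: the counting rules implied by cases (2)-(5) above can be
summarized in a small helper. This is an illustrative sketch, not KVM code;
the names are made up.]

```c
#include <stdbool.h>

struct counting_mode {
	bool c1_with_host;	/* C1 residency tracks the host counter */
	bool deep_with_host;	/* C3/C6/C7 residency tracks the host */
};

/*
 * Which residency counters should follow the host MSRs, per the case
 * analysis above: C1 accrues with the host whenever hlt or mwait stays
 * in the guest; deep C-states accrue only when mwait stays in the guest.
 */
static struct counting_mode residency_mode(bool hlt_in_guest,
					   bool mwait_in_guest)
{
	struct counting_mode m;

	m.c1_with_host = hlt_in_guest || mwait_in_guest;
	m.deep_with_host = mwait_in_guest;
	return m;
}
```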

MSR_CORE_C1_RES is unreadable except for ATOM platform, so I think we
can avoid the complex logic to handle C1 now. :)

Regards,
Wanpeng Li

>
> Therefore the data structure should be something like
>
> struct kvm_residency_msr {
> u64 value;
> u16 index;
> bool delta_from_host;
> bool count_with_host;
> };
>
> u64 kvm_residency_read_host(struct kvm_residency_msr *msr)
> {
> u64 unscaled_value;
>
> rdmsrl(msr->index, unscaled_value);
> // apply TSC scaling...
> return ...
> }
>
> u64 kvm_residency_read(struct kvm_residency_msr *msr)
> {
> return msr->value +
> (msr->delta_from_host ? kvm_residency_read_host(msr) : 0);
> }
>
> void kvm_residency_write(struct kvm_residency_msr *msr,
> u64 value)
> {
> msr->value = value -
> (msr->delta_from_host ? kvm_residency_read_host(msr) : 0);
> }
>
> // count_with_host is true for C1 iff any of KVM_CAP_DISABLE_EXITS_HLT
> // or KVM_CAP_DISABLE_EXITS_MWAIT is set
> // count_with_host is true for C3/C6/C7 iff KVM_CAP_DISABLE_EXITS_MWAIT
> // is set
> void kvm_residency_setup(struct kvm_residency_msr *msr, u16 index,
> bool count_with_host)
> {
> /* Preserve value on calls after the first */
> u64 value = msr->index ? kvm_residency_read(msr) : 0;
> msr->delta_from_host = msr->count_with_host = count_with_host;
> msr->index = index;
> kvm_residency_write(msr, value);
> }
>
> // The following functions are called from hlt vmexits.
>
> void kvm_residency_start_hlt(struct kvm_residency_msr *msr)
> {
> if (msr->count_with_host) {
> WARN_ON(!msr->delta_from_host);
> msr->value += kvm_residency_read_host(msr);
> msr->delta_from_host = false;
> }
> }
>
> // host_tsc_waited is 0 except for MSR_CORE_C1_RES
> void kvm_residency_end_hlt(struct kvm_residency_msr *msr,
> u64 host_tsc_waited)
> {
> if (msr->count_with_host) {
> WARN_ON(msr->delta_from_host);
> msr->value -= kvm_residency_read_host(msr);
> msr->delta_from_host = true;
> }
> if (host_tsc_waited) {
> // ... apply TSC scaling to host_tsc_waited ...
> msr->value += ...;
> }
> }
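[Editor's note: the delta_from_host bookkeeping sketched above can be
exercised in userspace. In this minimal simulation, fake_host_counter stands
in for an rdmsr of the host residency MSR, count_with_host is assumed true,
and TSC scaling is omitted; all names are illustrative.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for rdmsr of a host residency MSR. */
static uint64_t fake_host_counter;

struct residency_msr {
	uint64_t value;		/* stored delta, or frozen absolute value */
	bool delta_from_host;	/* reads add the live host counter */
};

static uint64_t residency_read(struct residency_msr *m)
{
	return m->value + (m->delta_from_host ? fake_host_counter : 0);
}

static void residency_write(struct residency_msr *m, uint64_t v)
{
	m->value = v - (m->delta_from_host ? fake_host_counter : 0);
}

/* hlt vmexit entry: freeze the counter at its current guest value. */
static void residency_start_hlt(struct residency_msr *m)
{
	if (m->delta_from_host) {
		m->value += fake_host_counter;
		m->delta_from_host = false;
	}
}

/* hlt vmexit exit: resume tracking, crediting the time we waited. */
static void residency_end_hlt(struct residency_msr *m, uint64_t waited)
{
	if (!m->delta_from_host) {
		m->value -= fake_host_counter;
		m->delta_from_host = true;
	}
	m->value += waited;
}
```

Host counter increments that happen while the vCPU sits in the hlt vmexit are
excluded, and only the explicitly credited wait time is added back.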
>
> Thanks,
>
> Paolo

2019-06-11 12:20:22

by Paolo Bonzini

Subject: Re: [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts

On 11/06/19 09:38, Wanpeng Li wrote:
> MSR_CORE_C1_RES is unreadable except for ATOM platform, so I think we
> can avoid the complex logic to handle C1 now. :)

I disagree. Linux uses it on all platforms where it is available, and
virtual machines that don't pass mwait through _only_ have C1, so it
would be less useful to have deep C-state residency MSRs and not C1
residency.

But turbostat can get the information from sysfs, so what are these MSRs
used for?

Paolo

2019-06-11 12:23:20

by Wanpeng Li

Subject: Re: [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts

On Tue, 11 Jun 2019 at 19:09, Paolo Bonzini <[email protected]> wrote:
>
> On 11/06/19 09:38, Wanpeng Li wrote:
> > MSR_CORE_C1_RES is unreadable except for ATOM platform, so I think we
> > can avoid the complex logic to handle C1 now. :)
>
> I disagree. Linux uses it on all platforms where it is available, and
> virtual machines that don't pass mwait through _only_ have C1, so it
> would be less useful to have deep C-state residency MSRs and not C1
> residency.
>
> But turbostat can get the information from sysfs, so what are these MSRs
> used for?

The sysfs statistics are not accurate: the time accumulated in
cpuidle_enter_state() reflects the C-state the governor requested, not
the C-state the hardware actually entered. For example, we found that
several SKX/CLX models can't enter deeper C-states in non-root mode;
the Intel hardware team has been working on this according to our
recent report. The bare-metal C-state residency MSRs don't increase in
non-root mode, yet the time under
/sys/devices/system/cpu/cpux/cpuidle/statex/ keeps increasing in the
guest.

Regards,
Wanpeng Li

2019-06-12 07:32:20

by Wanpeng Li

Subject: Re: [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts

On Tue, 11 Jun 2019 at 19:09, Paolo Bonzini <[email protected]> wrote:
>
> On 11/06/19 09:38, Wanpeng Li wrote:
> > MSR_CORE_C1_RES is unreadable except for ATOM platform, so I think we
> > can avoid the complex logic to handle C1 now. :)
>
> I disagree. Linux uses it on all platforms is available, and virtual
> machines that don't pass mwait through _only_ have C1, so it would be
> less useful to have deep C-state residency MSRs and not C1 residency.

Your design heavily depends on reading the host MSR_CORE_C1_RES for C1
residency MSR emulation; however, the host MSR_CORE_C1_RES is
unreadable:
#rdmsr 0x660
rdmsr: CPU 0 cannot read MSR 0x00000660

Refer to turbostat codes,
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/power/x86/turbostat/turbostat.c#n1335
C1 is derived from other parameters; use_c1_residency_msr is true
only for the ATOM platform.
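
[Editor's note: turbostat's fallback when the C1 MSR is unreadable can be
sketched roughly as follows. This is an illustrative approximation, not
turbostat's exact code; the set of subtracted counters varies by CPU model.]

```c
#include <stdint.h>

/*
 * Derive C1 residency on platforms where MSR_CORE_C1_RES is
 * unreadable: whatever share of the TSC interval is not accounted to
 * C0 (MPERF) or to the deeper C-state residency counters is
 * attributed to C1.  All arguments are deltas over the measurement
 * interval.
 */
static uint64_t derive_c1(uint64_t tsc, uint64_t mperf,
			  uint64_t c3, uint64_t c6, uint64_t c7)
{
	uint64_t accounted = mperf + c3 + c6 + c7;

	/* Counter skew can push the sum past the TSC delta; clamp. */
	return tsc > accounted ? tsc - accounted : 0;
}
```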

Regards,
Wanpeng Li