2019-02-14 17:30:06

by Wang, Wei W

Subject: [PATCH v5 00/12] Guest LBR Enabling

Last Branch Recording (LBR) is a performance monitoring unit (PMU) feature
on Intel CPUs that captures branch-related information. This patch series
enables the feature for KVM guests.

Here is a summary of the fundamental methods that we use:
1) the LBR feature is enabled per guest by QEMU via
KVM_CAP_X86_GUEST_LBR;
2) the LBR stack is passed through to the guest for direct access after
the guest's first access to any of the LBR-related MSRs;
3) the host helps save/restore the LBR stack when the vCPU is
scheduled out/in.
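
As a rough illustration of step 1, here is a minimal userspace sketch of
the request a VMM might build before calling
ioctl(vm_fd, KVM_ENABLE_CAP, &cap). It assumes the capability number 168
proposed in patch 03 (not an upstream value), uses a local mirror of the
uapi struct kvm_enable_cap layout so it builds without kernel headers, and
uses a hypothetical lbr_stack_mirror whose field names follow how the
patches use struct x86_perf_lbr_stack (the field types are assumptions).

```c
#include <stdint.h>
#include <string.h>

/* Capability number proposed by this series (assumption: not upstream). */
#define KVM_CAP_X86_GUEST_LBR 168

/* Local mirror of the uapi struct kvm_enable_cap layout. */
struct enable_cap_mirror {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t pad[64];
};

/* Hypothetical mirror of the LBR stack description KVM copies back. */
struct lbr_stack_mirror {
	unsigned int nr, tos, from, to, info;
};

/* Fill the request; args[0] = enable flag, args[1] = output buffer. */
static void prepare_guest_lbr_cap(struct enable_cap_mirror *cap, int enable,
				  struct lbr_stack_mirror *out)
{
	memset(cap, 0, sizeof(*cap));
	cap->cap = KVM_CAP_X86_GUEST_LBR;
	cap->args[0] = enable ? 1 : 0;
	/* KVM copies the host LBR stack description to this address. */
	cap->args[1] = (uint64_t)(uintptr_t)out;
}
```

Note that, as written in patch 03, KVM calls copy_to_user() on args[1]
even when args[0] is 0, so the buffer pointer must be valid in both cases.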

ChangeLog:
- KVM/x86:
  - patch 4: enable guest LBR when the guest LBR MSR indices and the host
    LBR MSR indices match;
  - patch 5: change kvm_pmu_get_msr to take the msr_data struct;
  - patch 8: remove the PF_VCPU and is_kernel_event checks;
  - patch 10:
    - move the LBR virtualization code from vmx.c to pmu_intel.c;
    - save the LBR stack even when the guest is not using the user
      callstack mode, in case some tools (e.g. autofdo) complain
      about bogus samples;
  - patch 11: remove the common handling of the debugctl MSR;
  - patch 12: support reporting GLOBAL_STATUS_LBRS_FROZEN.
previous:
https://lkml.org/lkml/2018/12/26/82

Like Xu (1):
KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr
stack

Wei Wang (11):
perf/x86: fix the variable type of the LBR MSRs
perf/x86: add a function to get the lbr stack
KVM/x86: KVM_CAP_X86_GUEST_LBR
KVM/x86: intel_pmu_lbr_enable
KVM/x86/vPMU: tweak kvm_pmu_get_msr
KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
perf/x86: no counter allocation support
perf/x86: save/restore LBR_SELECT on vCPU switching
KVM/x86/lbr: lazy save the guest lbr stack
KVM/x86: remove the common handling of the debugctl msr
KVM/VMX/vPMU: support to report GLOBAL_STATUS_LBRS_FROZEN

arch/x86/events/core.c | 12 ++
arch/x86/events/intel/lbr.c | 42 +++-
arch/x86/events/perf_event.h | 6 +-
arch/x86/include/asm/kvm_host.h | 5 +
arch/x86/include/asm/perf_event.h | 16 ++
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/cpuid.h | 8 +
arch/x86/kvm/pmu.c | 18 +-
arch/x86/kvm/pmu.h | 11 +-
arch/x86/kvm/pmu_amd.c | 7 +-
arch/x86/kvm/vmx/pmu_intel.c | 398 +++++++++++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/vmx.c | 4 +-
arch/x86/kvm/vmx/vmx.h | 2 +
arch/x86/kvm/x86.c | 33 ++--
include/linux/perf_event.h | 5 +
include/uapi/linux/kvm.h | 1 +
include/uapi/linux/perf_event.h | 3 +-
17 files changed, 535 insertions(+), 38 deletions(-)

--
2.7.4



2019-02-14 17:29:35

by Wang, Wei W

Subject: [PATCH v5 03/12] KVM/x86: KVM_CAP_X86_GUEST_LBR

Introduce KVM_CAP_X86_GUEST_LBR to allow per-VM enabling of the guest
lbr feature.
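
To make the semantics of the new KVM_ENABLE_CAP case easier to follow,
below is a small userspace model of it (not kernel code). The helper
host_get_lbr_stack() is a hypothetical stand-in for
x86_perf_get_lbr_stack(), the MSR values it fills in are illustrative
only, and a NULL output pointer is used to model copy_to_user() failing.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of struct x86_perf_lbr_stack. */
struct lbr_stack_model { unsigned int nr, tos, from, to, info; };

struct vm_model {
	struct lbr_stack_model lbr_stack; /* kvm->arch.lbr_stack */
	bool lbr_in_guest;                /* kvm->arch.lbr_in_guest */
};

/* Stand-in for x86_perf_get_lbr_stack(): nonzero means failure. */
static int host_get_lbr_stack(struct lbr_stack_model *s, bool host_has_lbr)
{
	if (!host_has_lbr)
		return -1;
	/* Illustrative values only. */
	s->nr = 32; s->tos = 0x1c9; s->from = 0x680; s->to = 0x6c0; s->info = 0xdc0;
	return 0;
}

/* Mirrors the control flow of the KVM_CAP_X86_GUEST_LBR case. */
static int enable_guest_lbr(struct vm_model *vm, unsigned long long enable,
			    struct lbr_stack_model *user_out, bool host_has_lbr)
{
	if (enable && host_get_lbr_stack(&vm->lbr_stack, host_has_lbr))
		return -EINVAL;
	if (!user_out) /* models a failing copy_to_user() */
		return -EINVAL;
	memcpy(user_out, &vm->lbr_stack, sizeof(*user_out));
	vm->lbr_in_guest = enable;
	return 0;
}
```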

Signed-off-by: Wei Wang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/x86.c | 15 +++++++++++++++
include/uapi/linux/kvm.h | 1 +
3 files changed, 18 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4660ce9..e6f6760 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -869,6 +869,7 @@ struct kvm_arch {
atomic_t vapics_in_nmi_mode;
struct mutex apic_map_lock;
struct kvm_apic_map *apic_map;
+ struct x86_perf_lbr_stack lbr_stack;

bool apic_access_page_done;

@@ -877,6 +878,7 @@ struct kvm_arch {
bool mwait_in_guest;
bool hlt_in_guest;
bool pause_in_guest;
+ bool lbr_in_guest;

unsigned long irq_sources_bitmap;
s64 kvmclock_offset;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3d27206..2cdebe7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3028,6 +3028,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_GET_MSR_FEATURES:
case KVM_CAP_MSR_PLATFORM_INFO:
case KVM_CAP_EXCEPTION_PAYLOAD:
+ case KVM_CAP_X86_GUEST_LBR:
r = 1;
break;
case KVM_CAP_SYNC_REGS:
@@ -4562,6 +4563,20 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
kvm->arch.exception_payload_enabled = cap->args[0];
r = 0;
break;
+ case KVM_CAP_X86_GUEST_LBR:
+ r = -EINVAL;
+ if (cap->args[0] &&
+ x86_perf_get_lbr_stack(&kvm->arch.lbr_stack)) {
+ pr_err("Failed to enable the guest lbr feature\n");
+ break;
+ }
+ if (copy_to_user((void __user *)cap->args[1],
+ &kvm->arch.lbr_stack,
+ sizeof(struct x86_perf_lbr_stack)))
+ break;
+ kvm->arch.lbr_in_guest = cap->args[0];
+ r = 0;
+ break;
default:
r = -EINVAL;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6d4ea4b..a7cac96 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -988,6 +988,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_ARM_VM_IPA_SIZE 165
#define KVM_CAP_MANUAL_DIRTY_LOG_PROTECT 166
#define KVM_CAP_HYPERV_CPUID 167
+#define KVM_CAP_X86_GUEST_LBR 168

#ifdef KVM_CAP_IRQ_ROUTING

--
2.7.4


2019-02-14 17:29:52

by Wang, Wei W

Subject: [PATCH v5 06/12] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest

Bits [5:0] of MSR_IA32_PERF_CAPABILITIES report the format of the
addresses stored in the LBR stack. Expose those bits to the guest when
the guest LBR feature is enabled, and expose the PDCM CPUID bit so that
the guest can discover the MSR.
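
The value the guest reads back can be modeled with a few lines of C.
This is only a sketch of the masking done in intel_pmu_get_msr(), using
the X86_PERF_CAP_MASK_LBR_FMT value added by this patch; the host
capability value in the test is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits [5:0]: LBR format field, as defined in this patch. */
#define X86_PERF_CAP_MASK_LBR_FMT 0x3f

/*
 * Guest-visible MSR_IA32_PERF_CAPABILITIES: only the LBR format field
 * is passed through, and only when guest LBR is enabled for the VM.
 */
static uint64_t guest_perf_capabilities(uint64_t host_caps, bool lbr_in_guest)
{
	return lbr_in_guest ? (host_caps & X86_PERF_CAP_MASK_LBR_FMT) : 0;
}
```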

Signed-off-by: Wei Wang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/include/asm/perf_event.h | 2 ++
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/vmx/pmu_intel.c | 16 ++++++++++++++++
3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 2f82795..eee09b7 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -87,6 +87,8 @@
#define ARCH_PERFMON_BRANCH_MISSES_RETIRED 6
#define ARCH_PERFMON_EVENTS_COUNT 7

+#define X86_PERF_CAP_MASK_LBR_FMT 0x3f
+
/*
* Intel "Architectural Performance Monitoring" CPUID
* detection/enumeration details:
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index bbffa6c..708df80 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -363,7 +363,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
0 /* DS-CPL, VMX, SMX, EST */ |
0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
- F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
+ F(FMA) | F(CX16) | 0 /* xTPR Update*/ | F(PDCM) |
F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index cd3b5d2..cbc6015 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -153,6 +153,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
case MSR_CORE_PERF_GLOBAL_STATUS:
case MSR_CORE_PERF_GLOBAL_CTRL:
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+ case MSR_IA32_PERF_CAPABILITIES:
ret = pmu->version > 1;
break;
default:
@@ -318,6 +319,19 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
msr_info->data = pmu->global_ovf_ctrl;
return 0;
+ case MSR_IA32_PERF_CAPABILITIES: {
+ u64 data;
+
+ if (!boot_cpu_has(X86_FEATURE_PDCM) ||
+ (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_PDCM)))
+ return 1;
+ data = native_read_msr(MSR_IA32_PERF_CAPABILITIES);
+ msr_info->data = 0;
+ if (vcpu->kvm->arch.lbr_in_guest)
+ msr_info->data |= (data & X86_PERF_CAP_MASK_LBR_FMT);
+ return 0;
+ }
default:
if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
(pmc = get_fixed_pmc(pmu, msr))) {
@@ -370,6 +384,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 0;
}
break;
+ case MSR_IA32_PERF_CAPABILITIES:
+ return 1; /* RO MSR */
default:
if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
(pmc = get_fixed_pmc(pmu, msr))) {
--
2.7.4


2019-02-14 17:30:01

by Wang, Wei W

Subject: [PATCH v5 05/12] KVM/x86/vPMU: tweak kvm_pmu_get_msr

This patch changes kvm_pmu_get_msr to take the msr_data struct, because
get_msr may need the struct's host_initiated field. This also makes the
API consistent with kvm_pmu_set_msr.
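
A minimal sketch of why the struct matters, assuming a trimmed-down
mirror of KVM's struct msr_data: with the old (vcpu, msr, u64 *data)
signature, the callee cannot tell a userspace (host-initiated) read from
a guest RDMSR. The function below is a hypothetical model of the
MSR_IA32_PERF_CAPABILITIES check added in patch 06, where returning 1
means "inject #GP".

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_PERF_CAPABILITIES 0x345

/* Trimmed-down mirror of KVM's struct msr_data. */
struct msr_data_model {
	bool host_initiated;
	uint32_t index;
	uint64_t data;
};

/* Hypothetical get_msr callback that needs host_initiated. */
static int pmu_get_msr(struct msr_data_model *msr, bool guest_has_pdcm,
		       uint64_t lbr_fmt)
{
	switch (msr->index) {
	case MSR_IA32_PERF_CAPABILITIES:
		/* A guest RDMSR without PDCM faults; userspace may still read. */
		if (!msr->host_initiated && !guest_has_pdcm)
			return 1;
		msr->data = lbr_fmt;
		return 0;
	default:
		return 1;
	}
}
```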

Signed-off-by: Wei Wang <[email protected]>
---
arch/x86/kvm/pmu.c | 4 ++--
arch/x86/kvm/pmu.h | 4 ++--
arch/x86/kvm/pmu_amd.c | 7 ++++---
arch/x86/kvm/vmx/pmu_intel.c | 15 ++++++++-------
arch/x86/kvm/x86.c | 4 ++--
5 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index b438ffa..57e0df3 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -318,9 +318,9 @@ bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
return kvm_x86_ops->pmu_ops->is_valid_msr(vcpu, msr);
}

-int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
+int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
- return kvm_x86_ops->pmu_ops->get_msr(vcpu, msr, data);
+ return kvm_x86_ops->pmu_ops->get_msr(vcpu, msr_info);
}

int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 5f3c7a4..e1bcd2b 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -29,7 +29,7 @@ struct kvm_pmu_ops {
int (*is_valid_msr_idx)(struct kvm_vcpu *vcpu, unsigned idx);
bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr);
bool (*lbr_enable)(struct kvm_vcpu *vcpu);
- int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
+ int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
void (*refresh)(struct kvm_vcpu *vcpu);
void (*init)(struct kvm_vcpu *vcpu);
@@ -113,7 +113,7 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
int kvm_pmu_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx);
bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr);
-int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
+int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
void kvm_pmu_refresh(struct kvm_vcpu *vcpu);
void kvm_pmu_reset(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index 1495a73..99c4097 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -210,21 +210,22 @@ static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
return ret;
}

-static int amd_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
+static int amd_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
struct kvm_pmc *pmc;
+ u32 msr = msr_info->index;

/* MSR_PERFCTRn */
pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
if (pmc) {
- *data = pmc_read_counter(pmc);
+ msr_info->data = pmc_read_counter(pmc);
return 0;
}
/* MSR_EVNTSELn */
pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL);
if (pmc) {
- *data = pmc->eventsel;
+ msr_info->data = pmc->eventsel;
return 0;
}

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 6728701..cd3b5d2 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -299,31 +299,32 @@ static bool intel_pmu_lbr_enable(struct kvm_vcpu *vcpu)
return true;
}

-static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
+static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
struct kvm_pmc *pmc;
+ u32 msr = msr_info->index;

switch (msr) {
case MSR_CORE_PERF_FIXED_CTR_CTRL:
- *data = pmu->fixed_ctr_ctrl;
+ msr_info->data = pmu->fixed_ctr_ctrl;
return 0;
case MSR_CORE_PERF_GLOBAL_STATUS:
- *data = pmu->global_status;
+ msr_info->data = pmu->global_status;
return 0;
case MSR_CORE_PERF_GLOBAL_CTRL:
- *data = pmu->global_ctrl;
+ msr_info->data = pmu->global_ctrl;
return 0;
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
- *data = pmu->global_ovf_ctrl;
+ msr_info->data = pmu->global_ovf_ctrl;
return 0;
default:
if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
(pmc = get_fixed_pmc(pmu, msr))) {
- *data = pmc_read_counter(pmc);
+ msr_info->data = pmc_read_counter(pmc);
return 0;
} else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) {
- *data = pmc->eventsel;
+ msr_info->data = pmc->eventsel;
return 0;
}
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2a41aab..c8f32e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2741,7 +2741,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
- return kvm_pmu_get_msr(vcpu, msr_info->index, &msr_info->data);
+ return kvm_pmu_get_msr(vcpu, msr_info);
msr_info->data = 0;
break;
case MSR_IA32_UCODE_REV:
@@ -2884,7 +2884,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
default:
if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
- return kvm_pmu_get_msr(vcpu, msr_info->index, &msr_info->data);
+ return kvm_pmu_get_msr(vcpu, msr_info);
if (!ignore_msrs) {
vcpu_debug_ratelimited(vcpu, "unhandled rdmsr: 0x%x\n",
msr_info->index);
--
2.7.4


2019-02-14 17:30:33

by Wang, Wei W

Subject: [PATCH v5 01/12] perf/x86: fix the variable type of the LBR MSRs

MSR indices fit in an "unsigned int", which uses less memory than an
"unsigned long". lbr_nr is never negative, so make it "unsigned int" as
well.
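
The saving can be seen by comparing the two field layouts in isolation.
The structs below are illustrative extracts of just the LBR fields of
struct x86_pmu, not the full structure. MSR indices are 32-bit values
(RDMSR/WRMSR take the index in ECX), so narrowing the type loses nothing.

```c
/* Old layout: three unsigned long MSR bases plus an int stack size. */
struct lbr_fields_old {
	unsigned long lbr_tos, lbr_from, lbr_to;
	int lbr_nr;
};

/* New layout from this patch: all four fields are unsigned int. */
struct lbr_fields_new {
	unsigned int lbr_tos, lbr_from, lbr_to, lbr_nr;
};
```

On an LP64 build such as x86_64 the old fields typically occupy 32 bytes
(including tail padding) versus 16 bytes for the new ones.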

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/events/perf_event.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 78d7b70..1f78d85 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -619,8 +619,8 @@ struct x86_pmu {
/*
* Intel LBR
*/
- unsigned long lbr_tos, lbr_from, lbr_to; /* MSR base regs */
- int lbr_nr; /* hardware stack size */
+ unsigned int lbr_tos, lbr_from, lbr_to,
+ lbr_nr; /* lbr stack and size */
u64 lbr_sel_mask; /* LBR_SELECT valid bits */
const int *lbr_sel_map; /* lbr_select mappings */
bool lbr_double_abort; /* duplicated lbr aborts */
--
2.7.4


2019-02-14 17:30:45

by Wang, Wei W

Subject: [PATCH v5 11/12] KVM/x86: remove the common handling of the debugctl msr

The debugctl MSR is not identical on AMD and Intel CPUs; for example,
FREEZE_LBRS_ON_PMI is supported only by Intel CPUs. This MSR is now
handled separately in svm.c and pmu_intel.c, so remove the common
debugctl MSR handling code from kvm_get/set_msr_common.
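
To see why the common check had to go, here is a small model comparing
it with the Intel-side filtering added in patch 10. The bit positions
follow arch/x86/include/asm/msr-index.h; the functions are illustrative
reductions of the kernel code, not the code itself.

```c
#include <stdint.h>

#define DEBUGCTLMSR_LBR                (1ULL << 0)
#define DEBUGCTLMSR_BTF                (1ULL << 1)
#define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI (1ULL << 11)

/*
 * Removed common check in kvm_set_msr_common(): any bit other than LBR
 * or BTF returns 1 (#GP), which rejects FREEZE_LBRS_ON_PMI on Intel.
 */
static int common_debugctl_check(uint64_t data)
{
	return (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) ? 1 : 0;
}

/* Intel-side handling (patch 10): keep only the supported bits. */
static uint64_t intel_debugctl_filter(uint64_t data)
{
	return data & (DEBUGCTLMSR_FREEZE_LBRS_ON_PMI | DEBUGCTLMSR_LBR);
}
```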

Signed-off-by: Wei Wang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/x86/kvm/x86.c | 13 -------------
1 file changed, 13 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8e663c1..729da4e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2463,18 +2463,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
}
break;
- case MSR_IA32_DEBUGCTLMSR:
- if (!data) {
- /* We support the non-activated case already */
- break;
- } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) {
- /* Values other than LBR and BTF are vendor-specific,
- thus reserved and should throw a #GP */
- return 1;
- }
- vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
- __func__, data);
- break;
case 0x200 ... 0x2ff:
return kvm_mtrr_set_msr(vcpu, msr, data);
case MSR_IA32_APICBASE:
@@ -2716,7 +2704,6 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
switch (msr_info->index) {
case MSR_IA32_PLATFORM_ID:
case MSR_IA32_EBL_CR_POWERON:
- case MSR_IA32_DEBUGCTLMSR:
case MSR_IA32_LASTBRANCHFROMIP:
case MSR_IA32_LASTBRANCHTOIP:
case MSR_IA32_LASTINTFROMIP:
--
2.7.4


2019-02-14 17:31:18

by Wang, Wei W

Subject: [PATCH v5 12/12] KVM/VMX/vPMU: support to report GLOBAL_STATUS_LBRS_FROZEN

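Architectural PMU version 4 supports streamlined Freeze_LBR_On_PMI, so
set the GLOBAL_STATUS_LBRS_FROZEN bit when the guest reads the global
status MSR while LBR freezing is in use (both DEBUGCTLMSR_LBR and
DEBUGCTLMSR_FREEZE_LBRS_ON_PMI set in the guest's DEBUGCTL).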
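
The decision made by intel_pmu_get_global_status() can be modeled as a
pure function. This sketch assumes GLOBAL_STATUS_LBRS_FROZEN is bit 58
of IA32_PERF_GLOBAL_STATUS (the LBR_Frz bit in the Intel SDM) and takes
the guest DEBUGCTL value as a parameter instead of reading the VMCS.

```c
#include <stdint.h>

#define DEBUGCTLMSR_LBR                (1ULL << 0)
#define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI (1ULL << 11)
/* Assumed: LBR_Frz is bit 58 of IA32_PERF_GLOBAL_STATUS. */
#define GLOBAL_STATUS_LBRS_FROZEN      (1ULL << 58)

static uint64_t guest_global_status(uint64_t global_status,
				    uint8_t pmu_version,
				    uint64_t guest_debugctl)
{
	const uint64_t freeze = DEBUGCTLMSR_FREEZE_LBRS_ON_PMI | DEBUGCTLMSR_LBR;
	uint64_t data = global_status;

	/* No pending status: report 0, as the patch does. */
	if (!global_status)
		return 0;
	/* Arch PMU v4+ with LBR freezing enabled: report LBRs frozen. */
	if (pmu_version >= 4 && (guest_debugctl & freeze) == freeze)
		data |= GLOBAL_STATUS_LBRS_FROZEN;
	return data;
}
```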

Signed-off-by: Wei Wang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index bf40941..80d3fcf 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -420,6 +420,25 @@ static bool intel_pmu_access_lbr_msr(struct kvm_vcpu *vcpu,
return ret;
}

+static void intel_pmu_get_global_status(struct kvm_pmu *pmu,
+ struct msr_data *msr_info)
+{
+ u64 guest_debugctl, freeze_lbr_bits = DEBUGCTLMSR_FREEZE_LBRS_ON_PMI |
+ DEBUGCTLMSR_LBR;
+
+ if (!pmu->global_status) {
+ msr_info->data = 0;
+ return;
+ }
+
+ msr_info->data = pmu->global_status;
+ if (pmu->version >= 4) {
+ guest_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
+ if ((guest_debugctl & freeze_lbr_bits) == freeze_lbr_bits)
+ msr_info->data |= GLOBAL_STATUS_LBRS_FROZEN;
+ }
+}
+
static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -431,7 +450,7 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data = pmu->fixed_ctr_ctrl;
return 0;
case MSR_CORE_PERF_GLOBAL_STATUS:
- msr_info->data = pmu->global_status;
+ intel_pmu_get_global_status(pmu, msr_info);
return 0;
case MSR_CORE_PERF_GLOBAL_CTRL:
msr_info->data = pmu->global_ctrl;
--
2.7.4


2019-02-14 17:31:28

by Wang, Wei W

Subject: [PATCH v5 10/12] KVM/x86/lbr: lazy save the guest lbr stack

When the vCPU is scheduled in:
- if the LBR feature was used in the last vCPU time slice, intercept
accesses to the LBR stack MSRs again, so that the host can detect
whether the LBR feature is used in this time slice;
- if the LBR feature wasn't used in the last vCPU time slice (and the
guest has cleared DEBUGCTLMSR_LBR), disable the host-side save/restore
of the guest LBR stack.

Upon the first access to one of the LBR-related MSRs (since the vCPU was
scheduled in):
- record that the guest has used the LBR;
- create a host perf event to help save/restore the guest LBR stack;
- pass the stack through to the guest.
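
The two rules above form a small state machine. The following is a
userspace model of it under simplifying assumptions: the struct and its
fields are hypothetical, "intercepted" stands for the state of the LBR
stack MSRs in the MSR bitmap, and "host_event" stands for the existence
of pmu->vcpu_lbr_event.

```c
#include <stdbool.h>

/* Hypothetical per-vCPU state; names loosely follow the patch. */
struct vcpu_lbr_model {
	bool lbr_used;           /* pmu->lbr_used */
	bool intercepted;        /* LBR stack MSRs trapped */
	bool host_event;         /* host perf event saving the stack */
	bool guest_debugctl_lbr; /* DEBUGCTLMSR_LBR set by the guest */
};

/* First trapped access to an LBR MSR in this time slice. */
static void lbr_msr_access(struct vcpu_lbr_model *v)
{
	if (!v->lbr_used) {
		v->lbr_used = true;
		v->intercepted = false; /* pass the stack through */
		v->host_event = true;   /* enable host-side save/restore */
	}
}

/* Models intel_pmu_sched_in(). */
static void lbr_sched_in(struct vcpu_lbr_model *v)
{
	if (v->lbr_used) {
		v->lbr_used = false;
		v->intercepted = true;  /* re-arm usage detection */
	} else if (v->host_event && !v->guest_debugctl_lbr) {
		v->host_event = false;  /* disable host-side save/restore */
	}
}
```

Tearing down the host event only when DEBUGCTLMSR_LBR is clear avoids
losing the stack of a guest that keeps LBR recording enabled without
touching the LBR MSRs during a slice.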

Suggested-by: Andi Kleen <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/pmu.c | 6 ++
arch/x86/kvm/pmu.h | 2 +
arch/x86/kvm/vmx/pmu_intel.c | 146 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.c | 4 +-
arch/x86/kvm/vmx/vmx.h | 2 +
arch/x86/kvm/x86.c | 2 +
7 files changed, 162 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2b75c63..22b56d3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -469,6 +469,8 @@ struct kvm_pmu {
u64 counter_bitmask[2];
u64 global_ctrl_mask;
u64 reserved_bits;
+ /* Indicate if the lbr msrs were accessed in this vCPU time slice */
+ bool lbr_used;
u8 version;
struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 57e0df3..51e8cb8 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -328,6 +328,12 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return kvm_x86_ops->pmu_ops->set_msr(vcpu, msr_info);
}

+void kvm_pmu_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+ if (kvm_x86_ops->pmu_ops->sched_in)
+ kvm_x86_ops->pmu_ops->sched_in(vcpu, cpu);
+}
+
/* refresh PMU settings. This function generally is called when underlying
* settings are changed (such as changes of PMU CPUID by guest VMs), which
* should rarely happen.
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 009be7a..34fb5bf 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -31,6 +31,7 @@ struct kvm_pmu_ops {
bool (*lbr_enable)(struct kvm_vcpu *vcpu);
int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
+ void (*sched_in)(struct kvm_vcpu *vcpu, int cpu);
void (*refresh)(struct kvm_vcpu *vcpu);
void (*init)(struct kvm_vcpu *vcpu);
void (*reset)(struct kvm_vcpu *vcpu);
@@ -115,6 +116,7 @@ int kvm_pmu_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx);
bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr);
int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
+void kvm_pmu_sched_in(struct kvm_vcpu *vcpu, int cpu);
void kvm_pmu_refresh(struct kvm_vcpu *vcpu);
void kvm_pmu_reset(struct kvm_vcpu *vcpu);
void kvm_pmu_init(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index b00f094..bf40941 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -16,10 +16,12 @@
#include <linux/perf_event.h>
#include <asm/perf_event.h>
#include <asm/intel-family.h>
+#include <asm/vmx.h>
#include "x86.h"
#include "cpuid.h"
#include "lapic.h"
#include "pmu.h"
+#include "vmx.h"

static struct kvm_event_hw_type_mapping intel_arch_events[] = {
/* Index must match CPUID 0x0A.EBX bit vector */
@@ -143,6 +145,17 @@ static struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu,
return &counters[idx];
}

+static inline bool msr_is_lbr_stack(struct kvm_vcpu *vcpu, u32 index)
+{
+ struct x86_perf_lbr_stack *stack = &vcpu->kvm->arch.lbr_stack;
+ int nr = stack->nr;
+
+ return !!(index == stack->tos ||
+ (index >= stack->from && index < stack->from + nr) ||
+ (index >= stack->to && index < stack->to + nr) ||
+ (stack->info && index >= stack->info && index < stack->info + nr));
+}
+
static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -154,9 +167,13 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
case MSR_CORE_PERF_GLOBAL_CTRL:
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
case MSR_IA32_PERF_CAPABILITIES:
+ case MSR_IA32_DEBUGCTLMSR:
+ case MSR_LBR_SELECT:
ret = pmu->version > 1;
break;
default:
+ if (msr_is_lbr_stack(vcpu, msr))
+ return pmu->version > 1;
ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0) ||
get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0) ||
get_fixed_pmc(pmu, msr);
@@ -300,6 +317,109 @@ static bool intel_pmu_lbr_enable(struct kvm_vcpu *vcpu)
return true;
}

+static void intel_pmu_set_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu,
+ bool set)
+{
+ unsigned long *msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
+ struct x86_perf_lbr_stack *stack = &vcpu->kvm->arch.lbr_stack;
+ int nr = stack->nr;
+ int i;
+
+ vmx_set_intercept_for_msr(msr_bitmap, stack->tos, MSR_TYPE_RW, set);
+ for (i = 0; i < nr; i++) {
+ vmx_set_intercept_for_msr(msr_bitmap, stack->from + i,
+ MSR_TYPE_RW, set);
+ vmx_set_intercept_for_msr(msr_bitmap, stack->to + i,
+ MSR_TYPE_RW, set);
+ if (stack->info)
+ vmx_set_intercept_for_msr(msr_bitmap, stack->info + i,
+ MSR_TYPE_RW, set);
+ }
+}
+
+static bool intel_pmu_get_lbr_msr(struct kvm_vcpu *vcpu,
+ struct msr_data *msr_info)
+{
+ u32 index = msr_info->index;
+ bool ret = false;
+
+ switch (index) {
+ case MSR_IA32_DEBUGCTLMSR:
+ msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
+ ret = true;
+ break;
+ case MSR_LBR_SELECT:
+ ret = true;
+ rdmsrl(index, msr_info->data);
+ break;
+ default:
+ if (msr_is_lbr_stack(vcpu, index)) {
+ ret = true;
+ rdmsrl(index, msr_info->data);
+ }
+ }
+
+ return ret;
+}
+
+static bool intel_pmu_set_lbr_msr(struct kvm_vcpu *vcpu,
+ struct msr_data *msr_info)
+{
+ u32 index = msr_info->index;
+ u64 data = msr_info->data;
+ bool ret = false;
+
+ switch (index) {
+ case MSR_IA32_DEBUGCTLMSR:
+ ret = true;
+ /*
+ * Currently, only FREEZE_LBRS_ON_PMI and DEBUGCTLMSR_LBR are
+ * supported.
+ */
+ data &= (DEBUGCTLMSR_FREEZE_LBRS_ON_PMI | DEBUGCTLMSR_LBR);
+ vmcs_write64(GUEST_IA32_DEBUGCTL, data);
+ break;
+ case MSR_LBR_SELECT:
+ ret = true;
+ wrmsrl(index, data);
+ break;
+ default:
+ if (msr_is_lbr_stack(vcpu, index)) {
+ ret = true;
+ wrmsrl(index, data);
+ }
+ }
+
+ return ret;
+}
+
+static bool intel_pmu_access_lbr_msr(struct kvm_vcpu *vcpu,
+ struct msr_data *msr_info,
+ bool set)
+{
+ bool ret = false;
+
+ /*
+ * Some userspace implementations (e.g. QEMU) expect the MSRs to
+ * always be accessible.
+ */
+ if (!msr_info->host_initiated && !vcpu->kvm->arch.lbr_in_guest)
+ return false;
+
+ if (set)
+ ret = intel_pmu_set_lbr_msr(vcpu, msr_info);
+ else
+ ret = intel_pmu_get_lbr_msr(vcpu, msr_info);
+
+ if (ret && !vcpu->arch.pmu.lbr_used) {
+ vcpu->arch.pmu.lbr_used = true;
+ intel_pmu_set_intercept_for_lbr_msrs(vcpu, false);
+ intel_pmu_enable_save_guest_lbr(vcpu);
+ }
+
+ return ret;
+}
+
static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -340,6 +460,8 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
} else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) {
msr_info->data = pmc->eventsel;
return 0;
+ } else if (intel_pmu_access_lbr_msr(vcpu, msr_info, false)) {
+ return 0;
}
}

@@ -400,12 +522,33 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
reprogram_gp_counter(pmc, data);
return 0;
}
+ } else if (intel_pmu_access_lbr_msr(vcpu, msr_info, true)) {
+ return 0;
}
}

return 1;
}

+static void intel_pmu_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+ u64 guest_debugctl;
+
+ if (pmu->lbr_used) {
+ pmu->lbr_used = false;
+ intel_pmu_set_intercept_for_lbr_msrs(vcpu, true);
+ } else if (pmu->vcpu_lbr_event) {
+ /*
+ * The lbr feature wasn't used during the last vCPU time
+ * slice, so it's time to disable the vCPU side save/restore.
+ */
+ guest_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
+ if (!(guest_debugctl & DEBUGCTLMSR_LBR))
+ intel_pmu_disable_save_guest_lbr(vcpu);
+ }
+}
+
static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -492,6 +635,8 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)

pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status =
pmu->global_ovf_ctrl = 0;
+
+ intel_pmu_disable_save_guest_lbr(vcpu);
}

int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)
@@ -571,6 +716,7 @@ struct kvm_pmu_ops intel_pmu_ops = {
.lbr_enable = intel_pmu_lbr_enable,
.get_msr = intel_pmu_get_msr,
.set_msr = intel_pmu_set_msr,
+ .sched_in = intel_pmu_sched_in,
.refresh = intel_pmu_refresh,
.init = intel_pmu_init,
.reset = intel_pmu_reset,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4341175..dabf6ca 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3526,8 +3526,8 @@ static __always_inline void vmx_enable_intercept_for_msr(unsigned long *msr_bitm
}
}

-static __always_inline void vmx_set_intercept_for_msr(unsigned long *msr_bitmap,
- u32 msr, int type, bool value)
+void vmx_set_intercept_for_msr(unsigned long *msr_bitmap, u32 msr, int type,
+ bool value)
{
if (value)
vmx_enable_intercept_for_msr(msr_bitmap, msr, type);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 9932895..f4b904e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -314,6 +314,8 @@ void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu);
void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
+void vmx_set_intercept_for_msr(unsigned long *msr_bitmap, u32 msr, int type,
+ bool value);
struct shared_msr_entry *find_msr_entry(struct vcpu_vmx *vmx, u32 msr);
void pt_update_intercept_for_msr(struct vcpu_vmx *vmx);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c8f32e7..8e663c1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9101,6 +9101,8 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
{
vcpu->arch.l1tf_flush_l1d = true;
+
+ kvm_pmu_sched_in(vcpu, cpu);
kvm_x86_ops->sched_in(vcpu, cpu);
}

--
2.7.4


2019-02-14 17:31:35

by Wang, Wei W

Subject: [PATCH v5 07/12] perf/x86: no counter allocation support

In some cases, an event may be created without needing a counter
allocation. For example, the host may create an LBR event only to help
save/restore the LBR stack on vCPU context switches.

This patch adds a "no_counter" bit to perf_event_attr to let callers
explicitly tell the perf core that no counter is needed.
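
The effect on x86_pmu_add() can be sketched with a tiny model in which
a no_counter event skips collect_events() and never consumes a counter
slot. The structs and MAX_COUNTERS are hypothetical simplifications of
struct cpu_hw_events, not the real layout.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_COUNTERS 8 /* illustrative */

struct event_model { bool no_counter; };

/* Simplified stand-in for struct cpu_hw_events. */
struct cpuc_model {
	int n_events;
	const struct event_model *event_list[MAX_COUNTERS];
};

/* Models x86_pmu_add(): no_counter events bypass counter collection. */
static int pmu_add(struct cpuc_model *cpuc, const struct event_model *ev)
{
	if (ev->no_counter)
		return 0;
	if (cpuc->n_events >= MAX_COUNTERS)
		return -EINVAL; /* out of counters */
	cpuc->event_list[cpuc->n_events++] = ev;
	return 0;
}
```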

Signed-off-by: Wei Wang <[email protected]>
---
arch/x86/events/core.c | 12 ++++++++++++
include/linux/perf_event.h | 5 +++++
include/uapi/linux/perf_event.h | 3 ++-
3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 374a197..c09f03b 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -410,6 +410,9 @@ int x86_setup_perfctr(struct perf_event *event)
struct hw_perf_event *hwc = &event->hw;
u64 config;

+ if (is_no_counter_event(event))
+ return 0;
+
if (!is_sampling_event(event)) {
hwc->sample_period = x86_pmu.max_period;
hwc->last_period = hwc->sample_period;
@@ -1209,6 +1212,12 @@ static int x86_pmu_add(struct perf_event *event, int flags)
hwc = &event->hw;

n0 = cpuc->n_events;
+
+ if (is_no_counter_event(event)) {
+ n = n0;
+ goto done_collect;
+ }
+
ret = n = collect_events(cpuc, event, false);
if (ret < 0)
goto out;
@@ -1387,6 +1396,9 @@ static void x86_pmu_del(struct perf_event *event, int flags)
if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
goto do_del;

+ if (is_no_counter_event(event))
+ goto do_del;
+
/*
* Not a TXN, therefore cleanup properly.
*/
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1d5c551..3993e66 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1009,6 +1009,11 @@ static inline bool is_sampling_event(struct perf_event *event)
return event->attr.sample_period != 0;
}

+static inline bool is_no_counter_event(struct perf_event *event)
+{
+ return event->attr.no_counter;
+}
+
/*
* Return 1 for a software event, 0 for a hardware event
*/
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 9de8780..ec97a70 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -372,7 +372,8 @@ struct perf_event_attr {
context_switch : 1, /* context switch data */
write_backward : 1, /* Write ring buffer from end to beginning */
namespaces : 1, /* include namespaces data */
- __reserved_1 : 35;
+ no_counter : 1, /* no counter allocation */
+ __reserved_1 : 34;

union {
__u32 wakeup_events; /* wakeup every n events */
--
2.7.4


2019-02-14 17:32:26

by Wang, Wei W

Subject: [PATCH v5 08/12] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack

From: Like Xu <[email protected]>

This patch adds support to enable/disable the host-side save/restore
of the guest lbr stack on vCPU switching. To enable it, the host
creates a perf event for the vCPU and sets the event attributes to the
user callstack mode lbr. This meets all the conditions the host perf
subsystem requires for saving the lbr stack on task switching.

The host-side lbr perf event is created only to save and restore the
lbr stack. There is no need to enable the lbr functionality for this
perf event, because the feature is actually used by the vCPU. So set
"no_counter=true" to have the perf core not allocate a counter for
this event.

The vcpu_lbr field is added to cpuc to indicate that the lbr perf event
is used by the vCPU only for context switching. When the perf subsystem
handles this event (e.g. enabling lbr, or reading the lbr stack on a
PMI) and finds vcpu_lbr non-zero, it simply returns.
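The vcpu_lbr gating can be illustrated with a small userspace model. This sketch rests on simplifying assumptions. The struct and function names are hypothetical, and only the fields the patch touches are kept. hw_lbr_enabled stands in for what __intel_pmu_lbr_enable() would program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced model of the LBR fields in struct cpu_hw_events. */
struct lbr_cpu_model {
	int lbr_users;
	uint8_t vcpu_lbr;
	bool hw_lbr_enabled;	/* stands in for DEBUGCTL.LBR being set */
};

/* Hypothetical, reduced view of the attributes the patch checks. */
struct lbr_event_model {
	bool exclude_guest;
	bool no_counter;
};

static bool is_vcpu_lbr_event(const struct lbr_event_model *e)
{
	return e->exclude_guest && e->no_counter;
}

/* Mirrors intel_pmu_lbr_add(): mark the CPU as hosting the vCPU event. */
static void model_lbr_add(struct lbr_cpu_model *c, const struct lbr_event_model *e)
{
	if (is_vcpu_lbr_event(e))
		c->vcpu_lbr = 1;
	c->lbr_users++;
}

/* Mirrors intel_pmu_lbr_del(). */
static void model_lbr_del(struct lbr_cpu_model *c, const struct lbr_event_model *e)
{
	if (is_vcpu_lbr_event(e))
		c->vcpu_lbr = 0;
	c->lbr_users--;
	assert(c->lbr_users >= 0);	/* WARN_ON_ONCE() in the kernel */
}

/* Mirrors intel_pmu_lbr_enable_all(): skip the hardware when vcpu_lbr is set. */
static void model_lbr_enable_all(struct lbr_cpu_model *c)
{
	if (c->lbr_users && !c->vcpu_lbr)
		c->hw_lbr_enabled = true;
}
```

enable_all checks only the per-CPU flag. So in this model, a regular host LBR user sharing the CPU with the vCPU event also stays disabled while vcpu_lbr is set. As far as the flag logic goes, the patch behaves the same way.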

Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/x86/events/intel/lbr.c | 12 ++++++--
arch/x86/events/perf_event.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/pmu.h | 3 ++
arch/x86/kvm/vmx/pmu_intel.c | 66 +++++++++++++++++++++++++++++++++++++++++
5 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 594a91b..7951b22 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -462,6 +462,9 @@ void intel_pmu_lbr_add(struct perf_event *event)
if (!x86_pmu.lbr_nr)
return;

+ if (event->attr.exclude_guest && event->attr.no_counter)
+ cpuc->vcpu_lbr = 1;
+
cpuc->br_sel = event->hw.branch_reg.reg;

if (branch_user_callstack(cpuc->br_sel) && event->ctx->task_ctx_data) {
@@ -507,6 +510,9 @@ void intel_pmu_lbr_del(struct perf_event *event)
task_ctx->lbr_callstack_users--;
}

+ if (event->attr.exclude_guest && event->attr.no_counter)
+ cpuc->vcpu_lbr = 0;
+
cpuc->lbr_users--;
WARN_ON_ONCE(cpuc->lbr_users < 0);
perf_sched_cb_dec(event->ctx->pmu);
@@ -516,7 +522,7 @@ void intel_pmu_lbr_enable_all(bool pmi)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);

- if (cpuc->lbr_users)
+ if (cpuc->lbr_users && !cpuc->vcpu_lbr)
__intel_pmu_lbr_enable(pmi);
}

@@ -524,7 +530,7 @@ void intel_pmu_lbr_disable_all(void)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);

- if (cpuc->lbr_users)
+ if (cpuc->lbr_users && !cpuc->vcpu_lbr)
__intel_pmu_lbr_disable();
}

@@ -658,7 +664,7 @@ void intel_pmu_lbr_read(void)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);

- if (!cpuc->lbr_users)
+ if (!cpuc->lbr_users || cpuc->vcpu_lbr)
return;

if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_32)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 1f78d85..bbea559 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -210,6 +210,7 @@ struct cpu_hw_events {
/*
* Intel LBR bits
*/
+ u8 vcpu_lbr;
int lbr_users;
struct perf_branch_stack lbr_stack;
struct perf_branch_entry lbr_entries[MAX_LBR_ENTRIES];
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e6f6760..2b75c63 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -474,6 +474,7 @@ struct kvm_pmu {
struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
struct irq_work irq_work;
u64 reprogram_pmi;
+ struct perf_event *vcpu_lbr_event;
};

struct kvm_pmu_ops;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index e1bcd2b..009be7a 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -122,6 +122,9 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu);

bool is_vmware_backdoor_pmc(u32 pmc_idx);

+extern int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu);
+extern void intel_pmu_disable_save_guest_lbr(struct kvm_vcpu *vcpu);
+
extern struct kvm_pmu_ops intel_pmu_ops;
extern struct kvm_pmu_ops amd_pmu_ops;
#endif /* __KVM_X86_PMU_H */
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index cbc6015..b00f094 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -494,6 +494,72 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
pmu->global_ovf_ctrl = 0;
}

+int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)
+{
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+ struct perf_event *event;
+
+ /*
+ * The main purpose of this perf event is to have the host perf core
+ * help save/restore the guest lbr stack on vcpu switching. No perf
+ * counter is allocated for the event.
+ *
+ * About the attr:
+ * exclude_guest: set to true to indicate that the event runs on the
+ * host only.
+ * no_counter: set to true to tell the perf core that this event
+ * doesn't need a counter.
+ * pinned: set to false, so that the FLEXIBLE events will not
+ * be rescheduled for this event which actually doesn't
+ * need a perf counter.
+ * config: Actually this field won't be used by the perf core
+ * as this event doesn't have a perf counter.
+ * sample_period: Same as above.
+ * sample_type: tells the perf core that it is an lbr event.
+ * branch_sample_type: tells the perf core that the lbr event works in
+ * the user callstack mode so that the lbr stack will be
+ * saved/restored on vCPU switching.
+ */
+ struct perf_event_attr attr = {
+ .type = PERF_TYPE_RAW,
+ .size = sizeof(attr),
+ .no_counter = true,
+ .exclude_guest = true,
+ .pinned = false,
+ .config = 0,
+ .sample_period = 0,
+ .sample_type = PERF_SAMPLE_BRANCH_STACK,
+ .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
+ PERF_SAMPLE_BRANCH_USER,
+ };
+
+ if (pmu->vcpu_lbr_event)
+ return 0;
+
+ event = perf_event_create_kernel_counter(&attr, -1, current, NULL,
+ NULL);
+ if (IS_ERR(event)) {
+ pr_err("%s: failed %ld\n", __func__, PTR_ERR(event));
+ return -ENOENT;
+ }
+ pmu->vcpu_lbr_event = event;
+
+ return 0;
+}
+
+void intel_pmu_disable_save_guest_lbr(struct kvm_vcpu *vcpu)
+{
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+ struct perf_event *event = pmu->vcpu_lbr_event;
+
+ if (!event)
+ return;
+
+ perf_event_release_kernel(event);
+ pmu->vcpu_lbr_event = NULL;
+}
+
+
struct kvm_pmu_ops intel_pmu_ops = {
.find_arch_event = intel_find_arch_event,
.find_fixed_event = intel_find_fixed_event,
--
2.7.4
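intel_pmu_enable_save_guest_lbr() and intel_pmu_disable_save_guest_lbr() are both idempotent. That is what lets intel_pmu_reset() call the disable path unconditionally later in the series. The hedged userspace sketch below models that contract. malloc()/free() stand in for perf_event_create_kernel_counter() and perf_event_release_kernel(), and all names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a perf event; only its lifetime matters. */
struct fake_event {
	int id;
};

/* Hypothetical reduction of struct kvm_pmu to the field the patch adds. */
struct pmu_model {
	struct fake_event *vcpu_lbr_event;
};

static int events_created;	/* counts simulated event creations */

static int model_enable_save_guest_lbr(struct pmu_model *pmu)
{
	struct fake_event *ev;

	if (pmu->vcpu_lbr_event)	/* already set up: nothing to do */
		return 0;

	ev = malloc(sizeof(*ev));
	if (!ev)
		return -ENOENT;		/* the patch reports creation failure as -ENOENT */
	ev->id = ++events_created;
	pmu->vcpu_lbr_event = ev;
	return 0;
}

static void model_disable_save_guest_lbr(struct pmu_model *pmu)
{
	if (!pmu->vcpu_lbr_event)	/* safe to call from reset paths */
		return;
	free(pmu->vcpu_lbr_event);
	pmu->vcpu_lbr_event = NULL;
}
```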


2019-02-14 17:32:37

by Wang, Wei W

Subject: [PATCH v5 04/12] KVM/x86: intel_pmu_lbr_enable

The lbr stack is architecturally specific. For example, SKX has 32 lbr
stack entries while HSW has 16, so a HSW guest running on a SKX machine
may not get accurate perf results. Currently, we forbid enabling the
guest lbr when the guest and host see different numbers of lbr stack
entries or different lbr stack msr indices.
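The rule in this commit message comes down to comparing two numbers on each side. The minimal sketch below assumes a hypothetical struct lbr_geometry that holds the entry count and the first LBR_FROM MSR index. These are the same two fields intel_pmu_lbr_enable() compares against kvm->arch.lbr_stack. 0x680 is MSR_LBR_NHM_FROM from msr-index.h.

```c
#include <assert.h>
#include <stdbool.h>

#define MSR_LBR_NHM_FROM 0x00000680	/* value from asm/msr-index.h */

/* Hypothetical: the two values the patch compares per side. */
struct lbr_geometry {
	unsigned int nr;	/* number of LBR stack entries */
	unsigned int from;	/* index of the first LBR_FROM MSR */
};

/* Guest LBR is allowed only if the guest model implies exactly the
 * host's entry count and FROM MSR base. */
static bool lbr_geometry_compatible(struct lbr_geometry guest,
				    struct lbr_geometry host)
{
	return guest.nr == host.nr && guest.from == host.from;
}
```

An unknown guest model ends up with {0, 0}, which is the default branch of the switch in intel_pmu_lbr_enable(). That never matches a host that has LBRs, so the same comparison rejects it.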

Signed-off-by: Wei Wang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/x86/kvm/cpuid.h | 8 +++
arch/x86/kvm/pmu.c | 8 +++
arch/x86/kvm/pmu.h | 2 +
arch/x86/kvm/vmx/pmu_intel.c | 136 +++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 3 +-
5 files changed, 155 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 9a327d5..92bdc7d 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -123,6 +123,14 @@ static inline bool guest_cpuid_is_amd(struct kvm_vcpu *vcpu)
return best && best->ebx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx;
}

+static inline bool guest_cpuid_is_intel(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpuid_entry2 *best;
+
+ best = kvm_find_cpuid_entry(vcpu, 0, 0);
+ return best && best->ebx == X86EMUL_CPUID_VENDOR_GenuineIntel_ebx;
+}
+
static inline int guest_cpuid_family(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 58ead7d..b438ffa 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -299,6 +299,14 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
return 0;
}

+bool kvm_pmu_lbr_enable(struct kvm_vcpu *vcpu)
+{
+ if (guest_cpuid_is_intel(vcpu))
+ return kvm_x86_ops->pmu_ops->lbr_enable(vcpu);
+
+ return false;
+}
+
void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
{
if (lapic_in_kernel(vcpu))
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index ba8898e..5f3c7a4 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -28,6 +28,7 @@ struct kvm_pmu_ops {
struct kvm_pmc *(*msr_idx_to_pmc)(struct kvm_vcpu *vcpu, unsigned idx);
int (*is_valid_msr_idx)(struct kvm_vcpu *vcpu, unsigned idx);
bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr);
+ bool (*lbr_enable)(struct kvm_vcpu *vcpu);
int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
void (*refresh)(struct kvm_vcpu *vcpu);
@@ -106,6 +107,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel);
void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx);
void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx);

+bool kvm_pmu_lbr_enable(struct kvm_vcpu *vcpu);
void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 5ab4a36..6728701 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -15,6 +15,7 @@
#include <linux/kvm_host.h>
#include <linux/perf_event.h>
#include <asm/perf_event.h>
+#include <asm/intel-family.h>
#include "x86.h"
#include "cpuid.h"
#include "lapic.h"
@@ -164,6 +165,140 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
return ret;
}

+static bool intel_pmu_lbr_enable(struct kvm_vcpu *vcpu)
+{
+ struct kvm *kvm = vcpu->kvm;
+ u8 vcpu_model = guest_cpuid_model(vcpu);
+ unsigned int vcpu_lbr_from, vcpu_lbr_nr;
+
+ if (x86_perf_get_lbr_stack(&kvm->arch.lbr_stack))
+ return false;
+
+ if (guest_cpuid_family(vcpu) != boot_cpu_data.x86)
+ return false;
+
+ /*
+ * It is possible for vcpus of an older model to run on physical cpus
+ * of a newer model, for example a BDW guest on a SKX machine (but
+ * not the other way around).
+ * The BDW guest may not get accurate results on a SKX machine as it
+ * only reads 16 entries of the lbr stack while there are 32 entries
+ * of recordings. We currently forbid the lbr enabling when the vcpu
+ * and physical cpu see different lbr stack entries or the guest lbr
+ * msr indices are not compatible with the host.
+ */
+ switch (vcpu_model) {
+ case INTEL_FAM6_CORE2_MEROM:
+ case INTEL_FAM6_CORE2_MEROM_L:
+ case INTEL_FAM6_CORE2_PENRYN:
+ case INTEL_FAM6_CORE2_DUNNINGTON:
+ /* intel_pmu_lbr_init_core() */
+ vcpu_lbr_nr = 4;
+ vcpu_lbr_from = MSR_LBR_CORE_FROM;
+ break;
+ case INTEL_FAM6_NEHALEM:
+ case INTEL_FAM6_NEHALEM_EP:
+ case INTEL_FAM6_NEHALEM_EX:
+ /* intel_pmu_lbr_init_nhm() */
+ vcpu_lbr_nr = 16;
+ vcpu_lbr_from = MSR_LBR_NHM_FROM;
+ break;
+ case INTEL_FAM6_ATOM_BONNELL:
+ case INTEL_FAM6_ATOM_BONNELL_MID:
+ case INTEL_FAM6_ATOM_SALTWELL:
+ case INTEL_FAM6_ATOM_SALTWELL_MID:
+ case INTEL_FAM6_ATOM_SALTWELL_TABLET:
+ /* intel_pmu_lbr_init_atom() */
+ vcpu_lbr_nr = 8;
+ vcpu_lbr_from = MSR_LBR_CORE_FROM;
+ break;
+ case INTEL_FAM6_ATOM_SILVERMONT:
+ case INTEL_FAM6_ATOM_SILVERMONT_X:
+ case INTEL_FAM6_ATOM_SILVERMONT_MID:
+ case INTEL_FAM6_ATOM_AIRMONT:
+ case INTEL_FAM6_ATOM_AIRMONT_MID:
+ /* intel_pmu_lbr_init_slm() */
+ vcpu_lbr_nr = 8;
+ vcpu_lbr_from = MSR_LBR_CORE_FROM;
+ break;
+ case INTEL_FAM6_ATOM_GOLDMONT:
+ case INTEL_FAM6_ATOM_GOLDMONT_X:
+ /* intel_pmu_lbr_init_skl(); */
+ vcpu_lbr_nr = 32;
+ vcpu_lbr_from = MSR_LBR_NHM_FROM;
+ break;
+ case INTEL_FAM6_ATOM_GOLDMONT_PLUS:
+ /* intel_pmu_lbr_init_skl()*/
+ vcpu_lbr_nr = 32;
+ vcpu_lbr_from = MSR_LBR_NHM_FROM;
+ break;
+ case INTEL_FAM6_WESTMERE:
+ case INTEL_FAM6_WESTMERE_EP:
+ case INTEL_FAM6_WESTMERE_EX:
+ /* intel_pmu_lbr_init_nhm() */
+ vcpu_lbr_nr = 16;
+ vcpu_lbr_from = MSR_LBR_NHM_FROM;
+ break;
+ case INTEL_FAM6_SANDYBRIDGE:
+ case INTEL_FAM6_SANDYBRIDGE_X:
+ /* intel_pmu_lbr_init_snb() */
+ vcpu_lbr_nr = 16;
+ vcpu_lbr_from = MSR_LBR_NHM_FROM;
+ break;
+ case INTEL_FAM6_IVYBRIDGE:
+ case INTEL_FAM6_IVYBRIDGE_X:
+ /* intel_pmu_lbr_init_snb() */
+ vcpu_lbr_nr = 16;
+ vcpu_lbr_from = MSR_LBR_NHM_FROM;
+ break;
+ case INTEL_FAM6_HASWELL_CORE:
+ case INTEL_FAM6_HASWELL_X:
+ case INTEL_FAM6_HASWELL_ULT:
+ case INTEL_FAM6_HASWELL_GT3E:
+ /* intel_pmu_lbr_init_hsw() */
+ vcpu_lbr_nr = 16;
+ vcpu_lbr_from = MSR_LBR_NHM_FROM;
+ break;
+ case INTEL_FAM6_BROADWELL_CORE:
+ case INTEL_FAM6_BROADWELL_XEON_D:
+ case INTEL_FAM6_BROADWELL_GT3E:
+ case INTEL_FAM6_BROADWELL_X:
+ /* intel_pmu_lbr_init_hsw() */
+ vcpu_lbr_nr = 16;
+ vcpu_lbr_from = MSR_LBR_NHM_FROM;
+ break;
+ case INTEL_FAM6_XEON_PHI_KNL:
+ case INTEL_FAM6_XEON_PHI_KNM:
+ /* intel_pmu_lbr_init_knl() */
+ vcpu_lbr_nr = 8;
+ vcpu_lbr_from = MSR_LBR_NHM_FROM;
+ break;
+ case INTEL_FAM6_SKYLAKE_MOBILE:
+ case INTEL_FAM6_SKYLAKE_DESKTOP:
+ case INTEL_FAM6_SKYLAKE_X:
+ case INTEL_FAM6_KABYLAKE_MOBILE:
+ case INTEL_FAM6_KABYLAKE_DESKTOP:
+ /* intel_pmu_lbr_init_skl() */
+ vcpu_lbr_nr = 32;
+ vcpu_lbr_from = MSR_LBR_NHM_FROM;
+ break;
+ default:
+ vcpu_lbr_nr = 0;
+ vcpu_lbr_from = 0;
+ pr_warn("%s: vcpu model not supported %d\n", __func__,
+ vcpu_model);
+ }
+
+ if (vcpu_lbr_nr != kvm->arch.lbr_stack.nr ||
+ vcpu_lbr_from != kvm->arch.lbr_stack.from) {
+ pr_warn("%s: vcpu model %x incompatible with pcpu %x\n",
+ __func__, vcpu_model, boot_cpu_data.x86_model);
+ return false;
+ }
+
+ return true;
+}
+
static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -350,6 +485,7 @@ struct kvm_pmu_ops intel_pmu_ops = {
.msr_idx_to_pmc = intel_msr_idx_to_pmc,
.is_valid_msr_idx = intel_is_valid_msr_idx,
.is_valid_msr = intel_is_valid_msr,
+ .lbr_enable = intel_pmu_lbr_enable,
.get_msr = intel_pmu_get_msr,
.set_msr = intel_pmu_set_msr,
.refresh = intel_pmu_refresh,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2cdebe7..2a41aab 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4565,8 +4565,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
break;
case KVM_CAP_X86_GUEST_LBR:
r = -EINVAL;
- if (cap->args[0] &&
- x86_perf_get_lbr_stack(&kvm->arch.lbr_stack)) {
+ if (cap->args[0] && !kvm_pmu_lbr_enable(kvm->vcpus[0])) {
pr_err("Failed to enable the guest lbr feature\n");
break;
}
--
2.7.4
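Many cases in the intel_pmu_lbr_enable() switch above share the same (nr, from) pair. One possible refactoring is a lookup table. The sketch uses hypothetical model-class names instead of the INTEL_FAM6_* constants. The MSR bases are the MSR_LBR_CORE_FROM and MSR_LBR_NHM_FROM values from msr-index.h. It illustrates a design alternative and is not what the patch does.

```c
#include <assert.h>
#include <stddef.h>

/* MSR base indices as defined in asm/msr-index.h. */
#define MSR_LBR_CORE_FROM 0x00000040
#define MSR_LBR_NHM_FROM  0x00000680

/* Hypothetical model classes; the patch switches on INTEL_FAM6_* values. */
enum cpu_class {
	CLS_CORE2, CLS_NEHALEM, CLS_ATOM, CLS_SILVERMONT, CLS_GOLDMONT,
	CLS_SANDYBRIDGE, CLS_HASWELL, CLS_KNL, CLS_SKYLAKE, CLS_UNKNOWN,
};

struct lbr_desc {
	enum cpu_class cls;
	unsigned int nr;	/* LBR stack entries */
	unsigned int from;	/* first LBR_FROM MSR */
};

/* Same (nr, from) pairs as the switch in intel_pmu_lbr_enable(). */
static const struct lbr_desc lbr_table[] = {
	{ CLS_CORE2,        4, MSR_LBR_CORE_FROM },
	{ CLS_NEHALEM,     16, MSR_LBR_NHM_FROM  },	/* also Westmere */
	{ CLS_ATOM,         8, MSR_LBR_CORE_FROM },	/* Bonnell, Saltwell */
	{ CLS_SILVERMONT,   8, MSR_LBR_CORE_FROM },	/* also Airmont */
	{ CLS_GOLDMONT,    32, MSR_LBR_NHM_FROM  },	/* also Goldmont Plus */
	{ CLS_SANDYBRIDGE, 16, MSR_LBR_NHM_FROM  },	/* also Ivy Bridge */
	{ CLS_HASWELL,     16, MSR_LBR_NHM_FROM  },	/* also Broadwell */
	{ CLS_KNL,          8, MSR_LBR_NHM_FROM  },	/* also KNM */
	{ CLS_SKYLAKE,     32, MSR_LBR_NHM_FROM  },	/* also Kaby Lake */
};

/* Returns 0 on a hit; unknown classes get {0, 0}, like the default case. */
static int lbr_lookup(enum cpu_class cls, unsigned int *nr, unsigned int *from)
{
	size_t i;

	for (i = 0; i < sizeof(lbr_table) / sizeof(lbr_table[0]); i++) {
		if (lbr_table[i].cls == cls) {
			*nr = lbr_table[i].nr;
			*from = lbr_table[i].from;
			return 0;
		}
	}
	*nr = 0;
	*from = 0;
	return -1;
}
```

Keeping the geometry as data makes it easier to see that it matches the intel_pmu_lbr_init_*() helpers referenced in the comments. The switch keeps the per-model constants visible at the call site.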


2019-02-14 17:33:09

by Wang, Wei W

Subject: [PATCH v5 02/12] perf/x86: add a function to get the lbr stack

The LBR stack MSRs are architecturally specific. The perf subsystem has
already assigned the abstracted MSR values based on the CPU architecture.

This patch enables a caller outside the perf subsystem to get the LBR
stack info. This is useful for hypervisors to prepare the lbr feature
for the guest.
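x86_perf_get_lbr_stack() hands out base MSR indices rather than a list. A consumer such as KVM therefore has to derive membership from the single TOS MSR plus the half-open ranges [base, base + nr) for FROM, TO and INFO. The INFO range must use nr just like FROM and TO. The hedged sketch below shows that derivation. The helper names are hypothetical, and the struct mirrors the x86_perf_lbr_stack fields added here.

```c
#include <assert.h>
#include <stdbool.h>

/* Same fields as struct x86_perf_lbr_stack in this patch. */
struct x86_perf_lbr_stack {
	unsigned int nr;
	unsigned int tos;
	unsigned int from;
	unsigned int to;
	unsigned int info;	/* 0 when the LBR format has no LBR_INFO MSRs */
};

/* Hypothetical helper: is index inside [base, base + nr)? */
static bool in_block(unsigned int index, unsigned int base, unsigned int nr)
{
	return base && index >= base && index < base + nr;
}

/* True if index is the TOS MSR or one of the FROM/TO/INFO MSRs. */
static bool lbr_stack_has_msr(const struct x86_perf_lbr_stack *s,
			      unsigned int index)
{
	return index == s->tos ||
	       in_block(index, s->from, s->nr) ||
	       in_block(index, s->to, s->nr) ||
	       in_block(index, s->info, s->nr);
}

/* Hypothetical helper: number of MSRs whose interception would be toggled. */
static unsigned int lbr_msr_count(const struct x86_perf_lbr_stack *s)
{
	return 1 + 2 * s->nr + (s->info ? s->nr : 0);
}
```

Take nr = 32, tos = 0x1c9, from = 0x680, to = 0x6c0 and info = 0xdc0, the Skylake-style MSRs in msr-index.h. For that stack, lbr_msr_count() gives 1 + 64 + 32 = 97.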

Signed-off-by: Wei Wang <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/x86/events/intel/lbr.c | 23 +++++++++++++++++++++++
arch/x86/include/asm/perf_event.h | 14 ++++++++++++++
2 files changed, 37 insertions(+)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index c88ed39..594a91b 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1277,3 +1277,26 @@ void intel_pmu_lbr_init_knl(void)
if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_LIP)
x86_pmu.intel_cap.lbr_format = LBR_FORMAT_EIP_FLAGS;
}
+
+/**
+ * x86_perf_get_lbr_stack - get the lbr stack related MSRs
+ *
+ * @stack: the caller's memory to get the lbr stack
+ *
+ * Returns: 0 indicates that the lbr stack has been successfully obtained.
+ */
+int x86_perf_get_lbr_stack(struct x86_perf_lbr_stack *stack)
+{
+ stack->nr = x86_pmu.lbr_nr;
+ stack->tos = x86_pmu.lbr_tos;
+ stack->from = x86_pmu.lbr_from;
+ stack->to = x86_pmu.lbr_to;
+
+ if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
+ stack->info = MSR_LBR_INFO_0;
+ else
+ stack->info = 0;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(x86_perf_get_lbr_stack);
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 8bdf749..2f82795 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -275,7 +275,16 @@ struct perf_guest_switch_msr {
u64 host, guest;
};

+struct x86_perf_lbr_stack {
+ unsigned int nr;
+ unsigned int tos;
+ unsigned int from;
+ unsigned int to;
+ unsigned int info;
+};
+
extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
+extern int x86_perf_get_lbr_stack(struct x86_perf_lbr_stack *stack);
extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
extern void perf_check_microcode(void);
extern int x86_perf_rdpmc_index(struct perf_event *event);
@@ -286,6 +295,11 @@ static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
return NULL;
}

+static inline int x86_perf_get_lbr_stack(struct x86_perf_lbr_stack *stack)
+{
+ return -1;
+}
+
static inline void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
{
memset(cap, 0, sizeof(*cap));
--
2.7.4


2019-02-14 17:33:11

by Wang, Wei W

Subject: [PATCH v5 09/12] perf/x86: save/restore LBR_SELECT on vCPU switching

The vCPU lbr event relies on the host to save/restore all the lbr
related MSRs. So add the LBR_SELECT save/restore to the related
functions for the vCPU case.
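The sketch below suggests why LBR_SELECT has to travel with the saved stack in the vCPU case. Perf never programs the filter for the vCPU event, because the enable path is skipped when vcpu_lbr is set. Presumably, then, the guest-written value would otherwise be lost when another task reprograms LBR_SELECT. The fake MSR accessors and struct names are hypothetical stand-ins for rdmsrl()/wrmsrl() and x86_perf_task_context. 0x1c8 is MSR_LBR_SELECT.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_LBR_SELECT 0x000001c8	/* value from asm/msr-index.h */

/* Hypothetical single-register "MSR file" replacing rdmsrl()/wrmsrl(). */
static uint64_t fake_lbr_select;

static void fake_rdmsrl(uint32_t msr, uint64_t *val)
{
	assert(msr == MSR_LBR_SELECT);	/* only LBR_SELECT is modelled */
	*val = fake_lbr_select;
}

static void fake_wrmsrl(uint32_t msr, uint64_t val)
{
	assert(msr == MSR_LBR_SELECT);
	fake_lbr_select = val;
}

/* Only the field added by this patch is kept from x86_perf_task_context. */
struct task_ctx_model {
	uint64_t lbr_sel;
};

static void model_lbr_save(struct task_ctx_model *ctx, int vcpu_lbr)
{
	/* The real code saves TOS and the FROM/TO/INFO entries first. */
	if (vcpu_lbr)
		fake_rdmsrl(MSR_LBR_SELECT, &ctx->lbr_sel);
}

static void model_lbr_restore(const struct task_ctx_model *ctx, int vcpu_lbr)
{
	/* The real code restores the entries and TOS first. */
	if (vcpu_lbr)
		fake_wrmsrl(MSR_LBR_SELECT, ctx->lbr_sel);
}
```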

Signed-off-by: Wei Wang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/events/intel/lbr.c | 7 +++++++
arch/x86/events/perf_event.h | 1 +
2 files changed, 8 insertions(+)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 7951b22..740c097 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -383,6 +383,9 @@ static void __intel_pmu_lbr_restore(struct x86_perf_task_context *task_ctx)

wrmsrl(x86_pmu.lbr_tos, tos);
task_ctx->lbr_stack_state = LBR_NONE;
+
+ if (cpuc->vcpu_lbr)
+ wrmsrl(MSR_LBR_SELECT, task_ctx->lbr_sel);
}

static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx)
@@ -409,6 +412,10 @@ static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx)
if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
rdmsrl(MSR_LBR_INFO_0 + lbr_idx, task_ctx->lbr_info[i]);
}
+
+ if (cpuc->vcpu_lbr)
+ rdmsrl(MSR_LBR_SELECT, task_ctx->lbr_sel);
+
task_ctx->valid_lbrs = i;
task_ctx->tos = tos;
task_ctx->lbr_stack_state = LBR_VALID;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index bbea559..ccd0215 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -653,6 +653,7 @@ struct x86_perf_task_context {
u64 lbr_from[MAX_LBR_ENTRIES];
u64 lbr_to[MAX_LBR_ENTRIES];
u64 lbr_info[MAX_LBR_ENTRIES];
+ u64 lbr_sel;
int tos;
int valid_lbrs;
int lbr_callstack_users;
--
2.7.4


2019-02-15 00:53:23

by Andi Kleen

Subject: Re: [PATCH v5 07/12] perf/x86: no counter allocation support

> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 9de8780..ec97a70 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -372,7 +372,8 @@ struct perf_event_attr {
> context_switch : 1, /* context switch data */
> write_backward : 1, /* Write ring buffer from end to beginning */
> namespaces : 1, /* include namespaces data */
> - __reserved_1 : 35;
> + no_counter : 1, /* no counter allocation */

Not sure we really want to expose this in the user ABI. Perhaps make
it a feature of the in kernel API only?

-Andi

2019-02-15 00:54:14

by Andi Kleen

Subject: Re: [PATCH v5 03/12] KVM/x86: KVM_CAP_X86_GUEST_LBR

> + case KVM_CAP_X86_GUEST_LBR:
> + r = -EINVAL;
> + if (cap->args[0] &&
> + x86_perf_get_lbr_stack(&kvm->arch.lbr_stack)) {
> + pr_err("Failed to enable the guest lbr feature\n");

Remove the pr_err. We don't want unprivileged users to trigger unlimited
kernel printk.




2019-02-15 00:54:18

by Andi Kleen

Subject: Re: [PATCH v5 12/12] KVM/VMX/vPMU: support to report GLOBAL_STATUS_LBRS_FROZEN

> +static void intel_pmu_get_global_status(struct kvm_pmu *pmu,
> + struct msr_data *msr_info)
> +{
> + u64 guest_debugctl, freeze_lbr_bits = DEBUGCTLMSR_FREEZE_LBRS_ON_PMI |
> + DEBUGCTLMSR_LBR;
> +
> + if (!pmu->global_status) {
> + msr_info->data = 0;
> + return;
> + }
> +
> + msr_info->data = pmu->global_status;
> + if (pmu->version >= 4) {
> + guest_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
> + if ((guest_debugctl & freeze_lbr_bits) == freeze_lbr_bits)

It should only check for the freeze bit; the freeze bit can be set
even when LBRs are disabled.

Also you seem to set the bit unconditionally?
That doesn't seem right. It should only be set after an overflow.

So the PMI injection needs to set it.

-Andi
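Following this review, GLOBAL_STATUS_LBRS_FROZEN would be latched at PMI injection time, based only on DEBUGCTL.FREEZE_LBRS_ON_PMI. It would then stay set until the guest clears it through IA32_PERF_GLOBAL_STATUS_RESET. The minimal model below uses hypothetical names. Bit 0 is DEBUGCTLMSR_LBR, bit 11 is DEBUGCTLMSR_FREEZE_LBRS_ON_PMI, and bit 58 is the LBR_Frz bit of IA32_PERF_GLOBAL_STATUS.

```c
#include <assert.h>
#include <stdint.h>

#define DEBUGCTLMSR_LBR                (1ULL << 0)
#define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI (1ULL << 11)
#define GLOBAL_STATUS_LBRS_FROZEN      (1ULL << 58)	/* LBR_Frz */

/* Hypothetical reduction of the vPMU state involved. */
struct vpmu_model {
	uint8_t version;
	uint64_t guest_debugctl;
	uint64_t global_status;
};

/* Hypothetical PMI-injection hook: record the overflow and, on v4+,
 * latch the frozen bit based on the freeze bit alone (not DEBUGCTL.LBR). */
static void model_inject_pmi(struct vpmu_model *p, int overflow_bit)
{
	p->global_status |= 1ULL << overflow_bit;
	if (p->version >= 4 &&
	    (p->guest_debugctl & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI))
		p->global_status |= GLOBAL_STATUS_LBRS_FROZEN;
}

/* Reading GLOBAL_STATUS just returns what has been latched. */
static uint64_t model_read_global_status(const struct vpmu_model *p)
{
	return p->global_status;
}

/* Writing 1s to GLOBAL_STATUS_RESET clears the matching status bits. */
static void model_write_status_reset(struct vpmu_model *p, uint64_t val)
{
	p->global_status &= ~val;
}
```

Here the frozen bit is reported even though DEBUGCTL.LBR is clear, which is the case pointed out above. It appears only after an overflow has been injected.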

2019-02-15 02:28:20

by Like Xu

Subject: Re: [PATCH v5 10/12] KVM/x86/lbr: lazy save the guest lbr stack

On 2019/2/14 17:06, Wei Wang wrote:
> When the vCPU is scheduled in:
> - if the lbr feature was used in the last vCPU time slice, set the lbr
> stack to be interceptible, so that the host can capture whether the
> lbr feature will be used in this time slice;
> - if the lbr feature wasn't used in the last vCPU time slice, disable
> the vCPU support of the guest lbr switching.
>
> Upon the first access to one of the lbr related MSRs (since the vCPU was
> scheduled in):
> - record that the guest has used the lbr;
> - create a host perf event to help save/restore the guest lbr stack;

Based on commit "15ad71460" and the guest-only lbr usage, would it be
possible to not create any host perf event for the vcpu, and instead
simply reuse __intel_pmu_lbr_save/restore in intel_pmu_sched_out/in,
keeping the lbr_stack in sync with kvm_pmu->lbr_stack rather than with
the task_ctx of the perf_event?

> - pass the stack through to the guest.
>
> Suggested-by: Andi Kleen <[email protected]>
> Signed-off-by: Wei Wang <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/pmu.c | 6 ++
> arch/x86/kvm/pmu.h | 2 +
> arch/x86/kvm/vmx/pmu_intel.c | 146 ++++++++++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/vmx.c | 4 +-
> arch/x86/kvm/vmx/vmx.h | 2 +
> arch/x86/kvm/x86.c | 2 +
> 7 files changed, 162 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 2b75c63..22b56d3 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -469,6 +469,8 @@ struct kvm_pmu {
> u64 counter_bitmask[2];
> u64 global_ctrl_mask;
> u64 reserved_bits;
> + /* Indicate if the lbr msrs were accessed in this vCPU time slice */
> + bool lbr_used;
> u8 version;
> struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
> struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 57e0df3..51e8cb8 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -328,6 +328,12 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> return kvm_x86_ops->pmu_ops->set_msr(vcpu, msr_info);
> }
>
> +void kvm_pmu_sched_in(struct kvm_vcpu *vcpu, int cpu)
> +{
> + if (kvm_x86_ops->pmu_ops->sched_in)
> + kvm_x86_ops->pmu_ops->sched_in(vcpu, cpu);
> +}
> +
> /* refresh PMU settings. This function generally is called when underlying
> * settings are changed (such as changes of PMU CPUID by guest VMs), which
> * should rarely happen.
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index 009be7a..34fb5bf 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -31,6 +31,7 @@ struct kvm_pmu_ops {
> bool (*lbr_enable)(struct kvm_vcpu *vcpu);
> int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
> int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
> + void (*sched_in)(struct kvm_vcpu *vcpu, int cpu);
> void (*refresh)(struct kvm_vcpu *vcpu);
> void (*init)(struct kvm_vcpu *vcpu);
> void (*reset)(struct kvm_vcpu *vcpu);
> @@ -115,6 +116,7 @@ int kvm_pmu_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx);
> bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr);
> int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
> int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
> +void kvm_pmu_sched_in(struct kvm_vcpu *vcpu, int cpu);
> void kvm_pmu_refresh(struct kvm_vcpu *vcpu);
> void kvm_pmu_reset(struct kvm_vcpu *vcpu);
> void kvm_pmu_init(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index b00f094..bf40941 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -16,10 +16,12 @@
> #include <linux/perf_event.h>
> #include <asm/perf_event.h>
> #include <asm/intel-family.h>
> +#include <asm/vmx.h>
> #include "x86.h"
> #include "cpuid.h"
> #include "lapic.h"
> #include "pmu.h"
> +#include "vmx.h"
>
> static struct kvm_event_hw_type_mapping intel_arch_events[] = {
> /* Index must match CPUID 0x0A.EBX bit vector */
> @@ -143,6 +145,17 @@ static struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu,
> return &counters[idx];
> }
>
> +static inline bool msr_is_lbr_stack(struct kvm_vcpu *vcpu, u32 index)
> +{
> + struct x86_perf_lbr_stack *stack = &vcpu->kvm->arch.lbr_stack;
> + int nr = stack->nr;
> +
> + return !!(index == stack->tos ||
> + (index >= stack->from && index < stack->from + nr) ||
> + (index >= stack->to && index < stack->to + nr) ||
> + (index >= stack->info && index < stack->info));
> +}
> +
> static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
> {
> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> @@ -154,9 +167,13 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
> case MSR_CORE_PERF_GLOBAL_CTRL:
> case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> case MSR_IA32_PERF_CAPABILITIES:
> + case MSR_IA32_DEBUGCTLMSR:
> + case MSR_LBR_SELECT:
> ret = pmu->version > 1;
> break;
> default:
> + if (msr_is_lbr_stack(vcpu, msr))
> + return pmu->version > 1;
> ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0) ||
> get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0) ||
> get_fixed_pmc(pmu, msr);
> @@ -300,6 +317,109 @@ static bool intel_pmu_lbr_enable(struct kvm_vcpu *vcpu)
> return true;
> }
>
> +static void intel_pmu_set_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu,
> + bool set)
> +{
> + unsigned long *msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
> + struct x86_perf_lbr_stack *stack = &vcpu->kvm->arch.lbr_stack;
> + int nr = stack->nr;
> + int i;
> +
> + vmx_set_intercept_for_msr(msr_bitmap, stack->tos, MSR_TYPE_RW, set);
> + for (i = 0; i < nr; i++) {
> + vmx_set_intercept_for_msr(msr_bitmap, stack->from + i,
> + MSR_TYPE_RW, set);
> + vmx_set_intercept_for_msr(msr_bitmap, stack->to + i,
> + MSR_TYPE_RW, set);
> + if (stack->info)
> + vmx_set_intercept_for_msr(msr_bitmap, stack->info + i,
> + MSR_TYPE_RW, set);
> + }
> +}
> +
> +static bool intel_pmu_get_lbr_msr(struct kvm_vcpu *vcpu,
> + struct msr_data *msr_info)
> +{
> + u32 index = msr_info->index;
> + bool ret = false;
> +
> + switch (index) {
> + case MSR_IA32_DEBUGCTLMSR:
> + msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
> + ret = true;
> + break;
> + case MSR_LBR_SELECT:
> + ret = true;
> + rdmsrl(index, msr_info->data);
> + break;
> + default:
> + if (msr_is_lbr_stack(vcpu, index)) {
> + ret = true;
> + rdmsrl(index, msr_info->data);
> + }
> + }
> +
> + return ret;
> +}
> +
> +static bool intel_pmu_set_lbr_msr(struct kvm_vcpu *vcpu,
> + struct msr_data *msr_info)
> +{
> + u32 index = msr_info->index;
> + u64 data = msr_info->data;
> + bool ret = false;
> +
> + switch (index) {
> + case MSR_IA32_DEBUGCTLMSR:
> + ret = true;
> + /*
> + * Currently, only FREEZE_LBRS_ON_PMI and DEBUGCTLMSR_LBR are
> + * supported.
> + */
> + data &= (DEBUGCTLMSR_FREEZE_LBRS_ON_PMI | DEBUGCTLMSR_LBR);
> + vmcs_write64(GUEST_IA32_DEBUGCTL, data);
> + break;
> + case MSR_LBR_SELECT:
> + ret = true;
> + wrmsrl(index, data);
> + break;
> + default:
> + if (msr_is_lbr_stack(vcpu, index)) {
> + ret = true;
> + wrmsrl(index, data);
> + }
> + }
> +
> + return ret;
> +}
> +
> +static bool intel_pmu_access_lbr_msr(struct kvm_vcpu *vcpu,
> + struct msr_data *msr_info,
> + bool set)
> +{
> + bool ret = false;
> +
> + /*
> + * Some userspace implementations (e.g. QEMU) expect the msrs to be
> + * always accessible.
> + */
> + if (!msr_info->host_initiated && !vcpu->kvm->arch.lbr_in_guest)
> + return false;
> +
> + if (set)
> + ret = intel_pmu_set_lbr_msr(vcpu, msr_info);
> + else
> + ret = intel_pmu_get_lbr_msr(vcpu, msr_info);
> +
> + if (ret && !vcpu->arch.pmu.lbr_used) {
> + vcpu->arch.pmu.lbr_used = true;
> + intel_pmu_set_intercept_for_lbr_msrs(vcpu, false);
> + intel_pmu_enable_save_guest_lbr(vcpu);
> + }
> +
> + return ret;
> +}
> +
> static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> {
> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> @@ -340,6 +460,8 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> } else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) {
> msr_info->data = pmc->eventsel;
> return 0;
> + } else if (intel_pmu_access_lbr_msr(vcpu, msr_info, false)) {
> + return 0;
> }
> }
>
> @@ -400,12 +522,33 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> reprogram_gp_counter(pmc, data);
> return 0;
> }
> + } else if (intel_pmu_access_lbr_msr(vcpu, msr_info, true)) {
> + return 0;
> }
> }
>
> return 1;
> }
>
> +static void intel_pmu_sched_in(struct kvm_vcpu *vcpu, int cpu)
> +{
> + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> + u64 guest_debugctl;
> +
> + if (pmu->lbr_used) {
> + pmu->lbr_used = false;
> + intel_pmu_set_intercept_for_lbr_msrs(vcpu, true);
> + } else if (pmu->vcpu_lbr_event) {
> + /*
> + * The lbr feature wasn't used during that last vCPU time
> + * slice, so it's time to disable the vCPU side save/restore.
> + */
> + guest_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
> + if (!(guest_debugctl & DEBUGCTLMSR_LBR))
> + intel_pmu_disable_save_guest_lbr(vcpu);
> + }
> +}
> +
> static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
> {
> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> @@ -492,6 +635,8 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
>
> pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status =
> pmu->global_ovf_ctrl = 0;
> +
> + intel_pmu_disable_save_guest_lbr(vcpu);
> }
>
> int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)
> @@ -571,6 +716,7 @@ struct kvm_pmu_ops intel_pmu_ops = {
> .lbr_enable = intel_pmu_lbr_enable,
> .get_msr = intel_pmu_get_msr,
> .set_msr = intel_pmu_set_msr,
> + .sched_in = intel_pmu_sched_in,
> .refresh = intel_pmu_refresh,
> .init = intel_pmu_init,
> .reset = intel_pmu_reset,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4341175..dabf6ca 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -3526,8 +3526,8 @@ static __always_inline void vmx_enable_intercept_for_msr(unsigned long *msr_bitm
> }
> }
>
> -static __always_inline void vmx_set_intercept_for_msr(unsigned long *msr_bitmap,
> - u32 msr, int type, bool value)
> +void vmx_set_intercept_for_msr(unsigned long *msr_bitmap, u32 msr, int type,
> + bool value)
> {
> if (value)
> vmx_enable_intercept_for_msr(msr_bitmap, msr, type);
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 9932895..f4b904e 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -314,6 +314,8 @@ void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
> bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu);
> void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
> void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
> +void vmx_set_intercept_for_msr(unsigned long *msr_bitmap, u32 msr, int type,
> + bool value);
> struct shared_msr_entry *find_msr_entry(struct vcpu_vmx *vmx, u32 msr);
> void pt_update_intercept_for_msr(struct vcpu_vmx *vmx);
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c8f32e7..8e663c1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9101,6 +9101,8 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
> void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
> {
> vcpu->arch.l1tf_flush_l1d = true;
> +
> + kvm_pmu_sched_in(vcpu, cpu);
> kvm_x86_ops->sched_in(vcpu, cpu);
> }
>
>
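The lazy scheme in the quoted commit message is essentially a small per-vCPU state machine. The hedged sketch below replaces the MSR-bitmap updates and the host perf event with booleans. All names are hypothetical, and it leaves out the host_initiated/lbr_in_guest checks in intel_pmu_access_lbr_msr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEBUGCTLMSR_LBR (1ULL << 0)

/* Hypothetical per-vCPU state mirroring what patch 10 tracks. */
struct lazy_lbr_vcpu {
	bool lbr_used;		/* pmu->lbr_used: LBR MSRs touched this slice */
	bool intercepted;	/* LBR MSR accesses trap to KVM */
	bool save_event;	/* pmu->vcpu_lbr_event exists */
	uint64_t guest_debugctl;
};

/* First trapped LBR MSR access in a slice: record the use, pass the
 * MSRs through, and make sure the host saves/restores the stack. */
static void on_lbr_msr_access(struct lazy_lbr_vcpu *v)
{
	if (!v->lbr_used) {
		v->lbr_used = true;
		v->intercepted = false;
		v->save_event = true;
	}
}

/* vCPU scheduled in: re-arm the trap if LBR was used in the last slice;
 * otherwise drop the save/restore event once the guest has LBR off. */
static void on_sched_in(struct lazy_lbr_vcpu *v)
{
	if (v->lbr_used) {
		v->lbr_used = false;
		v->intercepted = true;
	} else if (v->save_event && !(v->guest_debugctl & DEBUGCTLMSR_LBR)) {
		v->save_event = false;
	}
}
```

An idle slice alone does not drop the host event. The event is released only when the guest has also cleared DEBUGCTL.LBR, mirroring intel_pmu_sched_in().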


2019-02-15 15:38:20

by Wang, Wei W

Subject: RE: [PATCH v5 07/12] perf/x86: no counter allocation support

On Friday, February 15, 2019 12:26 AM, Andi Kleen wrote:
> > diff --git a/include/uapi/linux/perf_event.h
> > b/include/uapi/linux/perf_event.h index 9de8780..ec97a70 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -372,7 +372,8 @@ struct perf_event_attr {
> > context_switch : 1, /* context switch data */
> > write_backward : 1, /* Write ring buffer from end to beginning */
> > namespaces : 1, /* include namespaces data */
> > - __reserved_1 : 35;
> > + no_counter : 1, /* no counter allocation */
>
> Not sure we really want to expose this in the user ABI. Perhaps make it a
> feature of the in-kernel API only?

OK. I plan to move it to the perf_event struct.

Best,
Wei
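
Andi's concern is that the quoted hunk places the new bit in the uapi flags word of struct perf_event_attr. That word is exactly 64 bits wide, so a new 1-bit field can only be added by shrinking the reserved field by one bit. The result is then visible to, and frozen for, user space. The simplified model below collapses the 29 named flag bits that precede __reserved_1 into one field, and renames the reserved field to reserved_1. It is illustrative only. Keeping the flag in the kernel-internal struct perf_event, as planned above, leaves this layout untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-patch shape: 29 named flag bits (disabled ... namespaces) + 35 reserved. */
struct attr_flags_before {
	uint64_t existing_flags : 29,
		 reserved_1     : 35;
};

/* Shape with the proposed uapi bit: the reserved field must shrink by one. */
struct attr_flags_after {
	uint64_t existing_flags : 29,
		 no_counter     : 1,  /* the bit proposed in this patch */
		 reserved_1     : 34;
};

/* Both layouts must occupy exactly one u64 so the uapi struct size is unchanged. */
static_assert(sizeof(struct attr_flags_before) == sizeof(uint64_t), "one u64");
static_assert(sizeof(struct attr_flags_after) == sizeof(uint64_t), "one u64");
```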

2019-02-15 15:38:33

by Wang, Wei W

Subject: RE: [PATCH v5 12/12] KVM/VMX/vPMU: support to report GLOBAL_STATUS_LBRS_FROZEN

On Friday, February 15, 2019 12:32 AM, Andi Kleen wrote:
>
> > +static void intel_pmu_get_global_status(struct kvm_pmu *pmu,
> > + struct msr_data *msr_info)
> > +{
> > + u64 guest_debugctl, freeze_lbr_bits = DEBUGCTLMSR_FREEZE_LBRS_ON_PMI |
> > + DEBUGCTLMSR_LBR;
> > +
> > + if (!pmu->global_status) {
> > + msr_info->data = 0;
> > + return;
> > + }
> > +
> > + msr_info->data = pmu->global_status;
> > + if (pmu->version >= 4) {
> > + guest_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
> > + if ((guest_debugctl & freeze_lbr_bits) == freeze_lbr_bits)
>
> It should only check for the freeze bit; the freeze bit can be set even when
> LBRs are disabled.
>
> Also you seem to set the bit unconditionally?
> That doesn't seem right. It should only be set after an overflow.
>
> So the PMI injection needs to set it.

OK. The freeze bits need to be cleared via IA32_PERF_GLOBAL_STATUS_RESET, which does not seem to be supported by the perf code yet (so the guest won't clear them). Would handle_irq_v4 also need to be changed to support that?

Best,
Wei
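
Here is a minimal sketch of the behaviour Andi describes, under stated assumptions. LBRS_FROZEN is latched into the guest's GLOBAL_STATUS when the PMI is injected, not on every read. Only DEBUGCTL.FREEZE_LBRS_ON_PMI is checked, because it may be set while DEBUGCTL.LBR is clear. The bit positions follow the Intel SDM and Linux headers. struct vpmu_model and the vpmu_* functions are hypothetical and do not correspond to KVM's struct kvm_pmu.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as in the Intel SDM / Linux msr-index.h and perf_event.h. */
#define DEBUGCTLMSR_LBR                 (1ULL << 0)
#define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI  (1ULL << 11)
#define GLOBAL_STATUS_LBRS_FROZEN       (1ULL << 58)

/* Hypothetical, simplified vPMU state; not KVM's struct kvm_pmu. */
struct vpmu_model {
	unsigned int version;    /* perfmon version exposed to the guest */
	uint64_t global_status;  /* guest view of IA32_PERF_GLOBAL_STATUS */
	uint64_t guest_debugctl; /* guest IA32_DEBUGCTL */
};

/* Called when a counter-overflow PMI is injected into the guest. */
static void vpmu_inject_pmi(struct vpmu_model *pmu, int counter)
{
	assert(counter >= 0 && counter < 64);
	pmu->global_status |= 1ULL << counter;
	/* Check only FREEZE_LBRS_ON_PMI; DEBUGCTL.LBR may legitimately be clear. */
	if (pmu->version >= 4 &&
	    (pmu->guest_debugctl & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI))
		pmu->global_status |= GLOBAL_STATUS_LBRS_FROZEN;
}

/* A guest RDMSR of GLOBAL_STATUS just returns the latched state. */
static uint64_t vpmu_read_global_status(const struct vpmu_model *pmu)
{
	return pmu->global_status;
}
```

With this split, reading GLOBAL_STATUS never needs to inspect DEBUGCTL. Clearing the bit is deliberately left out, since the clearing mechanism depends on the perfmon version, as discussed in this thread.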

2019-02-15 15:39:48

by Wang, Wei W

[permalink] [raw]
Subject: RE: [PATCH v5 10/12] KVM/x86/lbr: lazy save the guest lbr stack

On Friday, February 15, 2019 9:50 AM, Like Xu wrote:
>
> On 2019/2/14 17:06, Wei Wang wrote:
> > When the vCPU is scheduled in:
> > - if the lbr feature was used in the last vCPU time slice, set the lbr
> > stack to be interceptible, so that the host can capture whether the
> > lbr feature will be used in this time slice;
> > - if the lbr feature wasn't used in the last vCPU time slice, disable
> > the vCPU support of the guest lbr switching.
> >
> > Upon the first access to one of the lbr related MSRs (since the vCPU
> > was scheduled in):
> > - record that the guest has used the lbr;
> > - create a host perf event to help save/restore the guest lbr stack;
>
> Based on commit "15ad71460" and the guest-use-lbr-only usage, is it possible
> to create no host perf event for the vCPU and simply reuse
> __intel_pmu_lbr_save/restore in intel_pmu_sched_out/in, keeping the
> lbr_stack in sync with kvm_pmu->lbr_stack rather than the task_ctx of the perf_event?

Yes, both methods should work. People may have different opinions about whether
kvm or perf should own the job of lbr switching. Let's see: if most people vote for
kvm to do the switching, I'll make that change.
Otherwise, I'll stay with what we already have.

Best,
Wei
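
The lazy-save scheme quoted above from patch 10 can be read as a small per-vCPU state machine. On sched-in, LBR switching is kept only if the guest used LBR in the previous slice. In both cases the LBR MSRs are intercepted again, so the first use in the new slice is observed. That last point is this sketch's reading of "disable the vCPU support of the guest lbr switching". The state struct and the lbr_* functions are hypothetical. The booleans stand in for the real MSR-bitmap update and host perf event handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vCPU state; names are illustrative, not KVM's. */
struct vcpu_lbr_state {
	bool used_in_last_slice; /* guest touched an LBR MSR since last sched-in */
	bool msrs_passthrough;   /* LBR MSRs currently not intercepted */
	bool host_event_active;  /* host perf event saving/restoring the LBR stack */
};

/* vCPU scheduled in: decide whether to keep LBR switching for this slice. */
static void lbr_sched_in(struct vcpu_lbr_state *s)
{
	/* Not used in the last slice: stop saving/restoring the guest LBR stack. */
	if (!s->used_in_last_slice)
		s->host_event_active = false;
	/* Either way, intercept again so the first use in this slice is seen. */
	s->msrs_passthrough = false;
	s->used_in_last_slice = false;
}

/* First access to an LBR-related MSR since sched-in (always an intercept). */
static void lbr_msr_access(struct vcpu_lbr_state *s)
{
	assert(!s->msrs_passthrough); /* only reachable while intercepted */
	s->used_in_last_slice = true;
	s->host_event_active = true;  /* create the host event if not present */
	s->msrs_passthrough = true;   /* later accesses go straight to hardware */
}
```

The design keeps the cost of an idle-LBR guest at one intercepted access per time slice. A guest that stops using LBR loses the save/restore overhead after a single slice.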

2019-02-15 16:17:26

by Andi Kleen

Subject: Re: [PATCH v5 12/12] KVM/VMX/vPMU: support to report GLOBAL_STATUS_LBRS_FROZEN

On Fri, Feb 15, 2019 at 08:56:02AM +0000, Wang, Wei W wrote:
> On Friday, February 15, 2019 12:32 AM, Andi Kleen wrote:
> >
> > > +static void intel_pmu_get_global_status(struct kvm_pmu *pmu,
> > > + struct msr_data *msr_info)
> > > +{
> > > + u64 guest_debugctl, freeze_lbr_bits = DEBUGCTLMSR_FREEZE_LBRS_ON_PMI |
> > > + DEBUGCTLMSR_LBR;
> > > +
> > > + if (!pmu->global_status) {
> > > + msr_info->data = 0;
> > > + return;
> > > + }
> > > +
> > > + msr_info->data = pmu->global_status;
> > > + if (pmu->version >= 4) {
> > > + guest_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
> > > + if ((guest_debugctl & freeze_lbr_bits) == freeze_lbr_bits)
> >
> > It should only check for the freeze bit; the freeze bit can be set even when
> > LBRs are disabled.
> >
> > Also you seem to set the bit unconditionally?
> > That doesn't seem right. It should only be set after an overflow.
> >
> > So the PMI injection needs to set it.
>
> OK. The freeze bits need to be cleared via IA32_PERF_GLOBAL_STATUS_RESET, which does not seem to be supported by the perf code yet (so the guest won't clear them). Would handle_irq_v4 also need to be changed to support that?

In Arch Perfmon v4 it is cleared by the MSR_CORE_PERF_GLOBAL_OVF_CTRL write.
But the guest KVM pmu doesn't support v4 so far, so the only way to clear it is through DEBUGCTL.

STATUS_RESET would only be needed to set it from the guest, which is not necessary, at least for now
(and would also be v4).

At some point the guest PMU should probably be updated for v4, but it can be done
separately from this.

-Andi
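
Both v4 status-control MSRs use write-1 semantics. Each 1 bit written to IA32_PERF_GLOBAL_STATUS_RESET clears that bit in GLOBAL_STATUS, and each 1 bit written to IA32_PERF_GLOBAL_STATUS_SET sets it, while GLOBAL_STATUS itself is read-only. In the SDM's architectural MSR table, STATUS_RESET (v4 and later) sits at address 0x390, the same address as IA32_PERF_GLOBAL_OVF_CTRL on v1 to v3; that is worth double-checking against the current SDM. The model below is a hypothetical sketch of emulating those writes. The macro and function names are illustrative, and reserved-bit checks are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural MSR addresses (Intel SDM); macro names are illustrative. */
#define MSR_PERF_GLOBAL_STATUS        0x38e
#define MSR_PERF_GLOBAL_STATUS_RESET  0x390 /* GLOBAL_OVF_CTRL on v1..v3 */
#define MSR_PERF_GLOBAL_STATUS_SET    0x391 /* v4 and later */

/* Hypothetical emulated guest state. */
struct global_status_model {
	uint64_t status;
};

/* Emulate a guest WRMSR; returns 0 on success, -1 if the write should fault. */
static int global_status_wrmsr(struct global_status_model *m, uint32_t msr,
			       uint64_t data)
{
	assert(m);
	switch (msr) {
	case MSR_PERF_GLOBAL_STATUS_RESET:
		m->status &= ~data; /* write 1 to clear */
		return 0;
	case MSR_PERF_GLOBAL_STATUS_SET:
		m->status |= data;  /* write 1 to set */
		return 0;
	case MSR_PERF_GLOBAL_STATUS:
	default:
		return -1;          /* GLOBAL_STATUS is read-only */
	}
}
```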

2019-02-18 01:57:43

by Wang, Wei W

Subject: Re: [PATCH v5 12/12] KVM/VMX/vPMU: support to report GLOBAL_STATUS_LBRS_FROZEN

On 02/15/2019 09:10 PM, Andi Kleen wrote:
>
> OK. The freeze bits need to be cleared via IA32_PERF_GLOBAL_STATUS_RESET, which does not seem to be supported by the perf code yet (so the guest won't clear them). Would handle_irq_v4 also need to be changed to support that?
> In Arch Perfmon v4 it is cleared by the MSR_CORE_PERF_GLOBAL_OVF_CTRL write.

I'm not sure about this one. Section 18.2.4.2 of the spec says
"IA32_PERF_GLOBAL_STATUS_RESET provides additional bit fields to clear
the new indicators..."
IIUIC, the new freeze bits can only be cleared via RESET.


> But the guest KVM pmu doesn't support v4 so far, so the only way to clear it is through DEBUGCTL.
>
> STATUS_RESET would only be needed to set it from the guest, which is not necessary, at least for now
> (and would also be v4)
>
> At some point the guest PMU should probably be updated for v4, but it can be done
> separately from this.
>

Agreed. I think the guest perf won't work in v4 mode if the KVM vPMU
exposes itself as v3.
Perhaps we could also leave the freeze bits virtualization support to
a separate series that adds vPMU v4 support?
In v4 we would also need to use STATUS_SET to set the freeze bits of
GLOBAL_STATUS when entering the guest (instead of clearing the guest
debugctl), so that we could achieve architectural emulation.

Best,
Wei
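
The guest decides whether to drive the PMU in v4 mode from the version it reads in CPUID leaf 0xA, EAX[7:0]. If the vPMU reports 3, the guest is not expected to use v4-only facilities such as STATUS_SET/STATUS_RESET or LBR_Frz. Below is a small decoder for that register using the SDM's field layout. pmu_has_v4_features() is a hypothetical helper, not the Linux perf code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of CPUID.0AH:EAX as defined in the Intel SDM. */
struct pmu_cpuid_eax {
	unsigned int version;      /* bits 7:0   architectural perfmon version */
	unsigned int num_counters; /* bits 15:8  general-purpose counters per LP */
	unsigned int bit_width;    /* bits 23:16 counter bit width */
	unsigned int mask_length;  /* bits 31:24 length of the EBX event mask */
};

static struct pmu_cpuid_eax decode_pmu_eax(uint32_t eax)
{
	struct pmu_cpuid_eax d = {
		.version      = eax & 0xff,
		.num_counters = (eax >> 8) & 0xff,
		.bit_width    = (eax >> 16) & 0xff,
		.mask_length  = (eax >> 24) & 0xff,
	};
	return d;
}

/* Hypothetical gate: v4-only features (STATUS_SET/RESET, LBR_Frz) need >= 4. */
static bool pmu_has_v4_features(uint32_t eax)
{
	return decode_pmu_eax(eax).version >= 4;
}
```

For instance, EAX = 0x07300803 decodes to version 3, 8 counters, 48-bit width and a 7-bit event mask, so a guest seeing it would stay on the v3 paths.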