2020-07-26 15:35:12

by Like Xu

[permalink] [raw]
Subject: [PATCH v13 00/10] Guest Last Branch Recording Enabling (KVM part)

Hi Paolo,

Please review this new version for the Kernel 5.9 release, and
Sean may not review them as he said in the previous email
https://lore.kernel.org/kvm/[email protected]/

You may cherry-pick the perf patches "3cb9d5464c1c..e1ad1ac2deb8"
from the branch "tip/perf/core" of scm/linux/kernel/git/tip/tip.git
as PeterZ said in the previous email
https://lore.kernel.org/kvm/[email protected]/

We may also apply the qemu-devel patch to the upstream qemu and try
the QEMU command lines with '-cpu host' or '-cpu host,pmu=true,lbr=true'.

The following error will be gone forever with the patchset:

$ perf record -b lbr ${WORKLOAD}
or $ perf record --call-graph lbr ${WORKLOAD}
Error:
cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

Please check more details in each commit and feel free to test.

v12->v13 Changelog:
- remove perf patches since they're queued in the tip/perf/core;
- add a minor patch to refactor MSR_IA32_DEBUGCTLMSR set/get handler;
- add a minor patch to expose vmx_set_intercept_for_msr();
- add a minor patch to initialize perf_capabilities in the intel_pmu_init();
- spilt the big patch to three pieces (0004-0006) for better understanding and review
- make the LBR_FMT exposure patch as the last step to enable guest LBR;

Previous:
https://lore.kernel.org/kvm/[email protected]/

---

The last branch recording (LBR) is a performance monitor unit (PMU)
feature on Intel processors that records a running trace of the most
recent branches taken by the processor in the LBR stack. This patch
series is going to enable this feature for plenty of KVM guests.

The user space could configure whether it's enabled or not for each
guest via MSR_IA32_PERF_CAPABILITIES msr. As a first step, a guest
could only enable LBR feature if its cpu model is the same as the
host since the LBR feature is still one of model specific features.

If it's enabled on the guest, the guest LBR driver would accesses the
LBR MSR (including IA32_DEBUGCTLMSR and records MSRs) as host does.
The first guest access on the LBR related MSRs is always interceptible.
The KVM trap would create a special LBR event (called guest LBR event)
which enables the callstack mode and none of hardware counter is assigned.
The host perf would enable and schedule this event as usual.

Guest's first access to a LBR registers gets trapped to KVM, which
creates a guest LBR perf event. It's a regular LBR perf event which gets
the LBR facility assigned from the perf subsystem. Once that succeeds,
the LBR stack msrs are passed through to the guest for efficient accesses.
However, if another host LBR event comes in and takes over the LBR
facility, the LBR msrs will be made interceptible, and guest following
accesses to the LBR msrs will be trapped and meaningless.

Because saving/restoring tens of LBR MSRs (e.g. 32 LBR stack entries) in
VMX transition brings too excessive overhead to frequent vmx transition
itself, the guest LBR event would help save/restore the LBR stack msrs
during the context switching with the help of native LBR event callstack
mechanism, including LBR_SELECT msr.

If the guest no longer accesses the LBR-related MSRs within a scheduling
time slice and the LBR enable bit is unset, vPMU would release its guest
LBR event as a normal event of a unused vPMC and the pass-through
state of the LBR stack msrs would be canceled.

---

LBR testcase:
echo 1 > /proc/sys/kernel/watchdog
echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent
echo 5000 > /proc/sys/kernel/perf_event_max_sample_rate
echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
./perf record -b ./br_instr a

- Perf report on the host:
Samples: 72K of event 'cycles', Event count (approx.): 72512
Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles
12.12% br_instr br_instr [.] cmp_end [.] lfsr_cond 1
11.05% br_instr br_instr [.] lfsr_cond [.] cmp_end 5
8.81% br_instr br_instr [.] lfsr_cond [.] cmp_end 4
5.04% br_instr br_instr [.] cmp_end [.] lfsr_cond 20
4.92% br_instr br_instr [.] lfsr_cond [.] cmp_end 6
4.88% br_instr br_instr [.] cmp_end [.] lfsr_cond 6
4.58% br_instr br_instr [.] cmp_end [.] lfsr_cond 5

- Perf report on the guest:
Samples: 92K of event 'cycles', Event count (approx.): 92544
Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles
12.03% br_instr br_instr [.] cmp_end [.] lfsr_cond 1
11.09% br_instr br_instr [.] lfsr_cond [.] cmp_end 5
8.57% br_instr br_instr [.] lfsr_cond [.] cmp_end 4
5.08% br_instr br_instr [.] lfsr_cond [.] cmp_end 6
5.06% br_instr br_instr [.] cmp_end [.] lfsr_cond 20
4.87% br_instr br_instr [.] cmp_end [.] lfsr_cond 6
4.70% br_instr br_instr [.] cmp_end [.] lfsr_cond 5

Conclusion: the profiling results on the guest are similar to that on the host.

Like Xu (10):
KVM: x86: Move common set/get handler of MSR_IA32_DEBUGCTLMSR to VMX
KVM: x86/vmx: Make vmx_set_intercept_for_msr() non-static and expose it
KVM: vmx/pmu: Initialize vcpu perf_capabilities once in intel_pmu_init()
KVM: vmx/pmu: Clear PMU_CAP_LBR_FMT when guest LBR is disabled
KVM: vmx/pmu: Create a guest LBR event when vcpu sets DEBUGCTLMSR_LBR
KVM: vmx/pmu: Pass-through LBR msrs to when the guest LBR event is ACTIVE
KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation
KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMI
KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES
KVM: vmx/pmu: Release guest LBR event via lazy release mechanism

arch/x86/kvm/pmu.c | 12 +-
arch/x86/kvm/pmu.h | 5 +
arch/x86/kvm/vmx/capabilities.h | 22 ++-
arch/x86/kvm/vmx/pmu_intel.c | 296 +++++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/vmx.c | 44 ++++-
arch/x86/kvm/vmx/vmx.h | 28 +++
arch/x86/kvm/x86.c | 15 +-
7 files changed, 395 insertions(+), 27 deletions(-)

--
2.21.3


2020-07-26 15:35:16

by Like Xu

[permalink] [raw]
Subject: [PATCH v13 01/10] KVM: x86: Move common set/get handler of MSR_IA32_DEBUGCTLMSR to VMX

SVM already has specific handlers of MSR_IA32_DEBUGCTLMSR in the
svm_get/set_msr, so the x86 common part can be safely moved to VMX.

Add vmx_supported_debugctl() to refactor the throwing logic of #GP.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/capabilities.h | 5 +++++
arch/x86/kvm/vmx/vmx.c | 11 ++++++++---
arch/x86/kvm/x86.c | 13 -------------
3 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 4bbd8b448d22..c199885af7c7 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -378,4 +378,9 @@ static inline u64 vmx_get_perf_capabilities(void)
return PMU_CAP_FW_WRITES;
}

+static inline u64 vmx_supported_debugctl(void)
+{
+ return DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF;
+}
+
#endif /* __KVM_X86_VMX_CAPS_H */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1bb59ae5016d..dcde73a230c6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1956,6 +1956,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
!guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
return 1;
goto find_shared_msr;
+ case MSR_IA32_DEBUGCTLMSR:
+ msr_info->data = 0;
+ break;
default:
find_shared_msr:
msr = find_msr_entry(vmx, msr_info->index);
@@ -2034,9 +2037,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
VM_EXIT_SAVE_DEBUG_CONTROLS)
get_vmcs12(vcpu)->guest_ia32_debugctl = data;

- ret = kvm_set_msr_common(vcpu, msr_info);
- break;
-
+ if (data & ~vmx_supported_debugctl())
+ return 1;
+ vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
+ __func__, data);
+ return 0;
case MSR_IA32_BNDCFGS:
if (!kvm_mpx_supported() ||
(!msr_info->host_initiated &&
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 95ef62922869..c79953b49c77 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2897,18 +2897,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
}
break;
- case MSR_IA32_DEBUGCTLMSR:
- if (!data) {
- /* We support the non-activated case already */
- break;
- } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) {
- /* Values other than LBR and BTF are vendor-specific,
- thus reserved and should throw a #GP */
- return 1;
- }
- vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
- __func__, data);
- break;
case 0x200 ... 0x2ff:
return kvm_mtrr_set_msr(vcpu, msr, data);
case MSR_IA32_APICBASE:
@@ -3167,7 +3155,6 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
switch (msr_info->index) {
case MSR_IA32_PLATFORM_ID:
case MSR_IA32_EBL_CR_POWERON:
- case MSR_IA32_DEBUGCTLMSR:
case MSR_IA32_LASTBRANCHFROMIP:
case MSR_IA32_LASTBRANCHTOIP:
case MSR_IA32_LASTINTFROMIP:
--
2.21.3

2020-07-26 15:35:30

by Like Xu

[permalink] [raw]
Subject: [PATCH v13 02/10] KVM: x86/vmx: Make vmx_set_intercept_for_msr() non-static and expose it

It's reasonable to call vmx_set_intercept_for_msr() in other vmx-specific
files (e.g. pmu_intel.c), so expose it without semantic changes hopefully.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 4 ++--
arch/x86/kvm/vmx/vmx.h | 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index dcde73a230c6..162c668d58f5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3772,8 +3772,8 @@ static __always_inline void vmx_enable_intercept_for_msr(unsigned long *msr_bitm
}
}

-static __always_inline void vmx_set_intercept_for_msr(unsigned long *msr_bitmap,
- u32 msr, int type, bool value)
+__always_inline void vmx_set_intercept_for_msr(unsigned long *msr_bitmap,
+ u32 msr, int type, bool value)
{
if (value)
vmx_enable_intercept_for_msr(msr_bitmap, msr, type);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 0d06951e607c..08c850596cfc 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -356,6 +356,8 @@ void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp);
int vmx_find_msr_index(struct vmx_msrs *m, u32 msr);
int vmx_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
struct x86_exception *e);
+void vmx_set_intercept_for_msr(unsigned long *msr_bitmap,
+ u32 msr, int type, bool value);

#define POSTED_INTR_ON 0
#define POSTED_INTR_SN 1
--
2.21.3

2020-07-26 15:35:39

by Like Xu

[permalink] [raw]
Subject: [PATCH v13 03/10] KVM: vmx/pmu: Initialize vcpu perf_capabilities once in intel_pmu_init()

The guest hypervisor may configure MSR_IA32_PERF_CAPABILITIES to unmask
some vmx-supported bits in vcpu->arch.perf_capabilities, and the requested
value could affect the exposure of features in the intel_pmu_refresh().

Refactoring its initialization path clears the way for the above usage.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 6 +++---
arch/x86/kvm/x86.c | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index a886a47daebd..f8083ecf8c7b 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -327,7 +327,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
pmu->version = 0;
pmu->reserved_bits = 0xffffffff00200000ull;
- vcpu->arch.perf_capabilities = 0;

entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
if (!entry)
@@ -340,8 +339,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
return;

perf_get_x86_pmu_capability(&x86_pmu);
- if (guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
- vcpu->arch.perf_capabilities = vmx_get_perf_capabilities();

pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
x86_pmu.num_counters_gp);
@@ -401,6 +398,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
pmu->fixed_counters[i].idx = i + INTEL_PMC_IDX_FIXED;
pmu->fixed_counters[i].current_config = 0;
}
+
+ vcpu->arch.perf_capabilities = guest_cpuid_has(vcpu, X86_FEATURE_PDCM) ?
+ vmx_get_perf_capabilities() : 0;
}

static void intel_pmu_reset(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c79953b49c77..8a58d0355a99 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2871,7 +2871,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;

vcpu->arch.perf_capabilities = data;
-
+ kvm_pmu_refresh(vcpu);
return 0;
}
case MSR_EFER:
--
2.21.3

2020-07-26 15:35:46

by Like Xu

[permalink] [raw]
Subject: [PATCH v13 05/10] KVM: vmx/pmu: Create a guest LBR event when vcpu sets DEBUGCTLMSR_LBR

When vcpu sets DEBUGCTLMSR_LBR in the MSR_IA32_DEBUGCTLMSR, the KVM handler
would create a guest LBR event which enables the callstack mode and none of
hardware counter is assigned. The host perf would schedule and enable this
event as usual but in an exclusive way.

The guest LBR event will be released when the vPMU is reset but soon,
the lazy release mechanism would be applied to this event like a vPMC.

Adding vcpu_supported_debugctl() to throw #GP per per-guest setting.

Suggested-by: Andi Kleen <[email protected]>
Co-developed-by: Wei Wang <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/capabilities.h | 7 +++-
arch/x86/kvm/vmx/pmu_intel.c | 61 +++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.c | 27 ++++++++++++---
arch/x86/kvm/vmx/vmx.h | 10 ++++++
4 files changed, 99 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index d818081f37e1..26e77c6edcda 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -381,7 +381,12 @@ static inline u64 vmx_get_perf_capabilities(void)

static inline u64 vmx_supported_debugctl(void)
{
- return DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF;
+ u64 debugctl = DEBUGCTLMSR_BTF;
+
+ if (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT)
+ debugctl |= DEBUGCTLMSR_LBR;
+
+ return debugctl;
}

#endif /* __KVM_X86_VMX_CAPS_H */
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 91212fe5ec56..db1d78ddabac 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -235,6 +235,65 @@ static struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
return pmc;
}

+static inline void intel_pmu_release_guest_lbr_event(struct kvm_vcpu *vcpu)
+{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ if (lbr_desc->event) {
+ perf_event_release_kernel(lbr_desc->event);
+ lbr_desc->event = NULL;
+ vcpu_to_pmu(vcpu)->event_count--;
+ }
+}
+
+int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu)
+{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+ struct perf_event *event;
+
+ /*
+ * The perf_event_attr is constructed in the minimum efficient way:
+ * - set 'pinned = true' to make it task pinned so that if another
+ * cpu pinned event reclaims LBR, the event->oncpu will be set to -1;
+ * - set '.exclude_host = true' to record guest branches behavior;
+ *
+ * - set '.config = INTEL_FIXED_VLBR_EVENT' to indicates host perf
+ * schedule the event without a real HW counter but a fake one;
+ * check is_guest_lbr_event() and __intel_get_event_constraints();
+ *
+ * - set 'sample_type = PERF_SAMPLE_BRANCH_STACK' and
+ * 'branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
+ * PERF_SAMPLE_BRANCH_USER' to configure it as a LBR callstack
+ * event, which helps KVM to save/restore guest LBR records
+ * during host context switches and reduces quite a lot overhead,
+ * check branch_user_callstack() and intel_pmu_lbr_sched_task();
+ */
+ struct perf_event_attr attr = {
+ .type = PERF_TYPE_RAW,
+ .size = sizeof(attr),
+ .config = INTEL_FIXED_VLBR_EVENT,
+ .sample_type = PERF_SAMPLE_BRANCH_STACK,
+ .pinned = true,
+ .exclude_host = true,
+ .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
+ PERF_SAMPLE_BRANCH_USER,
+ };
+
+ if (unlikely(lbr_desc->event))
+ return 0;
+
+ event = perf_event_create_kernel_counter(&attr, -1,
+ current, NULL, NULL);
+ if (IS_ERR(event)) {
+ pr_debug_ratelimited("%s: failed %ld\n",
+ __func__, PTR_ERR(event));
+ return -ENOENT;
+ }
+ lbr_desc->event = event;
+ vcpu_to_pmu(vcpu)->event_count++;
+ return 0;
+}
+
static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -441,6 +500,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
vcpu->arch.perf_capabilities = guest_cpuid_has(vcpu, X86_FEATURE_PDCM) ?
vmx_get_perf_capabilities() : 0;
lbr_desc->records.nr = 0;
+ lbr_desc->event = NULL;
}

static void intel_pmu_reset(struct kvm_vcpu *vcpu)
@@ -465,6 +525,7 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)

pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status =
pmu->global_ovf_ctrl = 0;
+ intel_pmu_release_guest_lbr_event(vcpu);
}

struct kvm_pmu_ops intel_pmu_ops = {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 162c668d58f5..1204dc730e4f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1957,8 +1957,8 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
goto find_shared_msr;
case MSR_IA32_DEBUGCTLMSR:
- msr_info->data = 0;
- break;
+ msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
+ return 0;
default:
find_shared_msr:
msr = find_msr_entry(vmx, msr_info->index);
@@ -1982,6 +1982,16 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
return (unsigned long)data;
}

+static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu)
+{
+ u64 debugctl = vmx_supported_debugctl();
+
+ if (!intel_pmu_lbr_is_enabled(vcpu))
+ debugctl &= ~DEBUGCTLMSR_LBR;
+
+ return debugctl;
+}
+
/*
* Writes msr value into the appropriate "register".
* Returns 0 on success, non-0 otherwise.
@@ -2037,10 +2047,17 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
VM_EXIT_SAVE_DEBUG_CONTROLS)
get_vmcs12(vcpu)->guest_ia32_debugctl = data;

- if (data & ~vmx_supported_debugctl())
+ if (data & ~vcpu_supported_debugctl(vcpu))
return 1;
- vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
- __func__, data);
+ if (data & DEBUGCTLMSR_BTF) {
+ vcpu_unimpl(vcpu, "%s: BTF in MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
+ __func__, data);
+ data &= ~DEBUGCTLMSR_BTF;
+ }
+ vmcs_write64(GUEST_IA32_DEBUGCTL, data);
+ if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
+ (data & DEBUGCTLMSR_LBR))
+ intel_pmu_create_guest_lbr_event(vcpu);
return 0;
case MSR_IA32_BNDCFGS:
if (!kvm_mpx_supported() ||
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c24d89ea70c5..eef8eaaec031 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -98,9 +98,19 @@ struct pt_desc {
bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu);
bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu);

+int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu);
+
struct lbr_desc {
/* Basic info about guest LBR records. */
struct x86_pmu_lbr records;
+
+ /*
+ * Emulate LBR feature via passthrough LBR registers when the
+ * per-vcpu guest LBR event is scheduled on the current pcpu.
+ *
+ * The records may be inaccurate if the host reclaims the LBR.
+ */
+ struct perf_event *event;
};

/*
--
2.21.3

2020-07-26 15:35:55

by Like Xu

[permalink] [raw]
Subject: [PATCH v13 07/10] KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation

When the LBR records msrs has already been pass-through, there is no
need to call vmx_update_intercept_for_lbr_msrs() again and again, and
vice versa.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 13 +++++++++++++
arch/x86/kvm/vmx/vmx.h | 3 +++
2 files changed, 16 insertions(+)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 0358ceea34d4..08d195e08deb 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -562,6 +562,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
vmx_get_perf_capabilities() : 0;
lbr_desc->records.nr = 0;
lbr_desc->event = NULL;
+ lbr_desc->already_passthrough = false;
}

static void intel_pmu_reset(struct kvm_vcpu *vcpu)
@@ -611,12 +612,24 @@ static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)

static inline void vmx_disable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ if (!lbr_desc->already_passthrough)
+ return;
+
vmx_update_intercept_for_lbr_msrs(vcpu, true);
+ lbr_desc->already_passthrough = false;
}

static inline void vmx_enable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ if (lbr_desc->already_passthrough)
+ return;
+
vmx_update_intercept_for_lbr_msrs(vcpu, false);
+ lbr_desc->already_passthrough = true;
}

/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index dd029b57215c..f95d61942a1c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -112,6 +112,9 @@ struct lbr_desc {
* The records may be inaccurate if the host reclaims the LBR.
*/
struct perf_event *event;
+
+ /* A flag to reduce the overhead of LBR pass-through or cancellation. */
+ bool already_passthrough;
};

/*
--
2.21.3

2020-07-26 15:35:59

by Like Xu

[permalink] [raw]
Subject: [PATCH v13 08/10] KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMI

The current vPMU only supports Architecture Version 2. According to
Intel SDM "17.4.7 Freezing LBR and Performance Counters on PMI", if
IA32_DEBUGCTL.Freeze_LBR_On_PMI = 1, the LBR is frozen on the virtual
PMI and the KVM would emulate to clear the LBR bit (bit 0) in
IA32_DEBUGCTL. Also, guest needs to re-enable IA32_DEBUGCTL.LBR
to resume recording branches.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/pmu.c | 5 ++++-
arch/x86/kvm/pmu.h | 1 +
arch/x86/kvm/vmx/capabilities.h | 4 +++-
arch/x86/kvm/vmx/pmu_intel.c | 30 ++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.c | 2 +-
5 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 67741d2a0308..405890c723a1 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -383,8 +383,11 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)

void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
{
- if (lapic_in_kernel(vcpu))
+ if (lapic_in_kernel(vcpu)) {
+ if (kvm_x86_ops.pmu_ops->deliver_pmi)
+ kvm_x86_ops.pmu_ops->deliver_pmi(vcpu);
kvm_apic_local_deliver(vcpu->arch.apic, APIC_LVTPC);
+ }
}

bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 067fef51760c..742a4e98df8c 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -39,6 +39,7 @@ struct kvm_pmu_ops {
void (*refresh)(struct kvm_vcpu *vcpu);
void (*init)(struct kvm_vcpu *vcpu);
void (*reset)(struct kvm_vcpu *vcpu);
+ void (*deliver_pmi)(struct kvm_vcpu *vcpu);
};

static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 26e77c6edcda..f17ab6cf152e 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -21,6 +21,8 @@ extern int __read_mostly pt_mode;
#define PMU_CAP_FW_WRITES (1ULL << 13)
#define PMU_CAP_LBR_FMT 0x3f

+#define DEBUGCTLMSR_LBR_MASK (DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI)
+
struct nested_vmx_msrs {
/*
* We only store the "true" versions of the VMX capability MSRs. We
@@ -384,7 +386,7 @@ static inline u64 vmx_supported_debugctl(void)
u64 debugctl = DEBUGCTLMSR_BTF;

if (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT)
- debugctl |= DEBUGCTLMSR_LBR;
+ debugctl |= DEBUGCTLMSR_LBR_MASK;

return debugctl;
}
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 08d195e08deb..bcac0f59bbf1 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -590,6 +590,35 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
intel_pmu_release_guest_lbr_event(vcpu);
}

+/*
+ * Emulate LBR_On_PMI behavior for 1 < pmu.version < 4.
+ *
+ * If Freeze_LBR_On_PMI = 1, the LBR is frozen on PMI and
+ * the KVM emulates to clear the LBR bit (bit 0) in IA32_DEBUGCTL.
+ *
+ * Guest needs to re-enable LBR to resume branches recording.
+ */
+static void intel_pmu_legacy_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu)
+{
+ u64 data = vmcs_read64(GUEST_IA32_DEBUGCTL);
+
+ if (data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) {
+ data &= ~DEBUGCTLMSR_LBR;
+ vmcs_write64(GUEST_IA32_DEBUGCTL, data);
+ }
+}
+
+static void intel_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
+{
+ u8 version = vcpu_to_pmu(vcpu)->version;
+
+ if (!intel_pmu_lbr_is_enabled(vcpu))
+ return;
+
+ if (version > 1 && version < 4)
+ intel_pmu_legacy_freezing_lbrs_on_pmi(vcpu);
+}
+
static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)
{
unsigned long *msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
@@ -680,4 +709,5 @@ struct kvm_pmu_ops intel_pmu_ops = {
.refresh = intel_pmu_refresh,
.init = intel_pmu_init,
.reset = intel_pmu_reset,
+ .deliver_pmi = intel_pmu_deliver_pmi,
};
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index daab79d1ccc3..b72fbbef56fa 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1987,7 +1987,7 @@ static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu)
u64 debugctl = vmx_supported_debugctl();

if (!intel_pmu_lbr_is_enabled(vcpu))
- debugctl &= ~DEBUGCTLMSR_LBR;
+ debugctl &= ~DEBUGCTLMSR_LBR_MASK;

return debugctl;
}
--
2.21.3

2020-07-26 15:36:17

by Like Xu

[permalink] [raw]
Subject: [PATCH v13 10/10] KVM: vmx/pmu: Release guest LBR event via lazy release mechanism

The vPMU uses GUEST_LBR_IN_USE_IDX (bit 58) in 'pmu->pmc_in_use' to
indicate whether a guest LBR event is still needed by the vcpu. If the
vcpu no longer accesses LBR related registers within a scheduling time
slice, and the enable bit of LBR has been unset, vPMU will treat the
guest LBR event as a bland event of a vPMC counter and release it
as usual. Also, the pass-through state of LBR records msrs is cancelled.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/pmu.c | 7 +++++++
arch/x86/kvm/pmu.h | 4 ++++
arch/x86/kvm/vmx/pmu_intel.c | 17 ++++++++++++++++-
3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 405890c723a1..e7c72eea07d4 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -463,6 +463,7 @@ void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
struct kvm_pmc *pmc = NULL;
DECLARE_BITMAP(bitmask, X86_PMC_IDX_MAX);
int i;
+ bool extra_cleanup = false;

pmu->need_cleanup = false;

@@ -474,8 +475,14 @@ void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)

if (pmc && pmc->perf_event && !pmc_speculative_in_use(pmc))
pmc_stop_counter(pmc);
+
+ if (i == INTEL_GUEST_LBR_INUSE)
+ extra_cleanup = true;
}

+ if (extra_cleanup && kvm_x86_ops.pmu_ops->cleanup)
+ kvm_x86_ops.pmu_ops->cleanup(vcpu);
+
bitmap_zero(pmu->pmc_in_use, X86_PMC_IDX_MAX);
}

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 742a4e98df8c..c8b650866f56 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -15,6 +15,9 @@
#define VMWARE_BACKDOOR_PMC_REAL_TIME 0x10001
#define VMWARE_BACKDOOR_PMC_APPARENT_TIME 0x10002

+/* Indicates whether Intel LBR msrs were accessed during the last time slice. */
+#define INTEL_GUEST_LBR_INUSE INTEL_PMC_IDX_FIXED_VLBR
+
#define MAX_FIXED_COUNTERS 3

struct kvm_event_hw_type_mapping {
@@ -40,6 +43,7 @@ struct kvm_pmu_ops {
void (*init)(struct kvm_vcpu *vcpu);
void (*reset)(struct kvm_vcpu *vcpu);
void (*deliver_pmi)(struct kvm_vcpu *vcpu);
+ void (*cleanup)(struct kvm_vcpu *vcpu);
};

static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index bcac0f59bbf1..d61a30d3a6ed 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -310,6 +310,7 @@ int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu)
}
lbr_desc->event = event;
vcpu_to_pmu(vcpu)->event_count++;
+ __set_bit(INTEL_GUEST_LBR_INUSE, vcpu_to_pmu(vcpu)->pmc_in_use);
return 0;
}

@@ -342,10 +343,12 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
rdmsrl(index, msr_info->data);
else
wrmsrl(index, msr_info->data);
+ __set_bit(INTEL_GUEST_LBR_INUSE, vcpu_to_pmu(vcpu)->pmc_in_use);
local_irq_enable();
return true;
}
local_irq_enable();
+ clear_bit(INTEL_GUEST_LBR_INUSE, vcpu_to_pmu(vcpu)->pmc_in_use);

dummy:
if (read)
@@ -496,7 +499,8 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
if (!intel_pmu_lbr_is_enabled(vcpu)) {
vcpu->arch.perf_capabilities &= ~PMU_CAP_LBR_FMT;
lbr_desc->records.nr = 0;
- }
+ } else
+ bitmap_set(pmu->all_valid_pmc_idx, INTEL_GUEST_LBR_INUSE, 1);

pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
x86_pmu.num_counters_gp);
@@ -673,17 +677,21 @@ static inline void vmx_enable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
*/
void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
{
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);

if (!lbr_desc->event) {
vmx_disable_lbr_msrs_passthrough(vcpu);
if (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)
goto warn;
+ if (test_bit(INTEL_GUEST_LBR_INUSE, pmu->pmc_in_use))
+ goto warn;
return;
}

if (lbr_desc->event->state < PERF_EVENT_STATE_ACTIVE) {
vmx_disable_lbr_msrs_passthrough(vcpu);
+ __clear_bit(INTEL_GUEST_LBR_INUSE, pmu->pmc_in_use);
goto warn;
} else
vmx_enable_lbr_msrs_passthrough(vcpu);
@@ -695,6 +703,12 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
vcpu->vcpu_id);
}

+static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
+{
+ if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
+ intel_pmu_release_guest_lbr_event(vcpu);
+}
+
struct kvm_pmu_ops intel_pmu_ops = {
.find_arch_event = intel_find_arch_event,
.find_fixed_event = intel_find_fixed_event,
@@ -710,4 +724,5 @@ struct kvm_pmu_ops intel_pmu_ops = {
.init = intel_pmu_init,
.reset = intel_pmu_reset,
.deliver_pmi = intel_pmu_deliver_pmi,
+ .cleanup = intel_pmu_cleanup,
};
--
2.21.3

2020-07-26 15:37:06

by Like Xu

[permalink] [raw]
Subject: [PATCH v13 06/10] KVM: vmx/pmu: Pass-through LBR msrs to when the guest LBR event is ACTIVE

In addition to DEBUGCTLMSR_LBR, any KVM trap caused by LBR msrs access
will result in a creation of guest LBR event per-vcpu.

If the guest LBR event is scheduled on with the corresponding vcpu context,
KVM will pass-through all LBR records msrs to the guest. The LBR callstack
mechanism implemented in the host could help save/restore the guest LBR
records during the event context switches, which reduces a lot of overhead
if we save/restore tens of LBR msrs (e.g. 32 LBR records entries) in the
much more frequent VMX transitions.

To avoid reclaiming LBR resources from any higher priority event on host,
KVM would always check the exist of guest LBR event and its state before
vm-entry as late as possible. A negative result would cancel the
pass-through state, and it also prevents real registers accesses and
potential data leakage. If host reclaims the LBR between two checks, the
interception state and LBR records can be safely preserved due to native
save/restore support from guest LBR event.

The KVM emits a pr_warn() when the LBR hardware is unavailable to the
guest LBR event. The administer is supposed to reminder users that the
guest result may be inaccurate if someone is using LBR to record
hypervisor on the host side.

Suggested-by: Andi Kleen <[email protected]>
Co-developed-by: Wei Wang <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 131 ++++++++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/vmx.c | 2 +
arch/x86/kvm/vmx/vmx.h | 1 +
3 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index db1d78ddabac..0358ceea34d4 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -201,6 +201,24 @@ bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu)
return true;
}

+static bool intel_pmu_is_valid_lbr_msr(struct kvm_vcpu *vcpu, u32 index)
+{
+ struct x86_pmu_lbr *records = vcpu_to_lbr_records(vcpu);
+ bool ret = false;
+
+ if (!intel_pmu_lbr_is_enabled(vcpu))
+ return ret;
+
+ ret = (index == MSR_LBR_SELECT) || (index == MSR_LBR_TOS) ||
+ (index >= records->from && index < records->from + records->nr) ||
+ (index >= records->to && index < records->to + records->nr);
+
+ if (!ret && records->info)
+ ret = (index >= records->info && index < records->info + records->nr);
+
+ return ret;
+}
+
static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -216,7 +234,8 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
default:
ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0) ||
get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0) ||
- get_fixed_pmc(pmu, msr) || get_fw_gp_pmc(pmu, msr);
+ get_fixed_pmc(pmu, msr) || get_fw_gp_pmc(pmu, msr) ||
+ intel_pmu_is_valid_lbr_msr(vcpu, msr);
break;
}

@@ -294,6 +313,46 @@ int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu)
return 0;
}

+/*
+ * It's safe to access LBR msrs from guest when they have not
+ * been passthrough since the host would help restore or reset
+ * the LBR msrs records when the guest LBR event is scheduled in.
+ */
+static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
+ struct msr_data *msr_info, bool read)
+{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+ u32 index = msr_info->index;
+
+ if (!intel_pmu_is_valid_lbr_msr(vcpu, index))
+ return false;
+
+ if (!lbr_desc->event && !intel_pmu_create_guest_lbr_event(vcpu))
+ goto dummy;
+
+ /*
+ * Disable irq to ensure the LBR feature doesn't get reclaimed by the
+ * host at the time the value is read from the msr, and this avoids the
+ * host LBR value to be leaked to the guest. If LBR has been reclaimed,
+ * return 0 on guest reads.
+ */
+ local_irq_disable();
+ if (lbr_desc->event->state == PERF_EVENT_STATE_ACTIVE) {
+ if (read)
+ rdmsrl(index, msr_info->data);
+ else
+ wrmsrl(index, msr_info->data);
+ local_irq_enable();
+ return true;
+ }
+ local_irq_enable();
+
+dummy:
+ if (read)
+ msr_info->data = 0;
+ return true;
+}
+
static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -328,7 +387,8 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
} else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) {
msr_info->data = pmc->eventsel;
return 0;
- }
+ } else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, true))
+ return 0;
}

return 1;
@@ -399,7 +459,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
reprogram_gp_counter(pmc, data);
return 0;
}
- }
+ } else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, false))
+ return 0;
}

return 1;
@@ -528,6 +589,70 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
intel_pmu_release_guest_lbr_event(vcpu);
}

+static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)
+{
+ unsigned long *msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
+ struct x86_pmu_lbr *lbr = vcpu_to_lbr_records(vcpu);
+ int i;
+
+ for (i = 0; i < lbr->nr; i++) {
+ vmx_set_intercept_for_msr(msr_bitmap,
+ lbr->from + i, MSR_TYPE_RW, set);
+ vmx_set_intercept_for_msr(msr_bitmap,
+ lbr->to + i, MSR_TYPE_RW, set);
+ if (lbr->info)
+ vmx_set_intercept_for_msr(msr_bitmap,
+ lbr->info + i, MSR_TYPE_RW, set);
+ }
+
+ vmx_set_intercept_for_msr(msr_bitmap, MSR_LBR_SELECT, MSR_TYPE_RW, set);
+ vmx_set_intercept_for_msr(msr_bitmap, MSR_LBR_TOS, MSR_TYPE_RW, set);
+}
+
+static inline void vmx_disable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
+{
+ vmx_update_intercept_for_lbr_msrs(vcpu, true);
+}
+
+static inline void vmx_enable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
+{
+ vmx_update_intercept_for_lbr_msrs(vcpu, false);
+}
+
+/*
+ * Higher priority host perf events (e.g. cpu pinned) could reclaim the
+ * pmu resources (e.g. LBR) that were assigned to the guest. This is
+ * usually done via ipi calls (more details in perf_install_in_context).
+ *
+ * Before entering the non-root mode (with irq disabled here), double
+ * confirm that the pmu features enabled to the guest are not reclaimed
+ * by higher priority host events. Otherwise, disallow vcpu's access to
+ * the reclaimed features.
+ */
+void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
+{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ if (!lbr_desc->event) {
+ vmx_disable_lbr_msrs_passthrough(vcpu);
+ if (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)
+ goto warn;
+ return;
+ }
+
+ if (lbr_desc->event->state < PERF_EVENT_STATE_ACTIVE) {
+ vmx_disable_lbr_msrs_passthrough(vcpu);
+ goto warn;
+ } else
+ vmx_enable_lbr_msrs_passthrough(vcpu);
+
+ return;
+
+warn:
+ pr_warn_ratelimited("kvm: vcpu-%d: fail to passthrough LBR.\n",
+ vcpu->vcpu_id);
+}
+
struct kvm_pmu_ops intel_pmu_ops = {
.find_arch_event = intel_find_arch_event,
.find_fixed_event = intel_find_fixed_event,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1204dc730e4f..daab79d1ccc3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6822,6 +6822,8 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
pt_guest_enter(vmx);

atomic_switch_perf_msrs(vmx);
+ if (intel_pmu_lbr_is_enabled(vcpu))
+ vmx_passthrough_lbr_msrs(vcpu);

if (enable_preemption_timer)
vmx_update_hv_timer(vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index eef8eaaec031..dd029b57215c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -99,6 +99,7 @@ bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu);
bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu);

int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu);
+void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu);

struct lbr_desc {
/* Basic info about guest LBR records. */
--
2.21.3

2020-07-26 15:37:12

by Like Xu

[permalink] [raw]
Subject: [PATCH v13 04/10] KVM: vmx/pmu: Clear PMU_CAP_LBR_FMT when guest LBR is disabled

The LBR could be enabled on the guest if host perf supports LBR
(checked via x86_perf_get_lbr()) and the vcpu model is compatible
with the host one.

If LBR is disabled on the guest, the bits [0, 5] of the read-only
MSR_IA32_PERF_CAPABILITIES which tells about the record format
stored in the LBR records would be cleared.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/capabilities.h | 1 +
arch/x86/kvm/vmx/pmu_intel.c | 40 +++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 12 ++++++++++
3 files changed, 53 insertions(+)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index c199885af7c7..d818081f37e1 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -19,6 +19,7 @@ extern int __read_mostly pt_mode;
#define PT_MODE_HOST_GUEST 1

#define PMU_CAP_FW_WRITES (1ULL << 13)
+#define PMU_CAP_LBR_FMT 0x3f

struct nested_vmx_msrs {
/*
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index f8083ecf8c7b..91212fe5ec56 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -168,6 +168,39 @@ static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
return get_gp_pmc(pmu, msr, MSR_IA32_PMC0);
}

+bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu)
+{
+ struct x86_pmu_lbr *lbr = vcpu_to_lbr_records(vcpu);
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+ if (pmu->version < 2)
+ return false;
+
+ /*
+ * As a first step, a guest could only enable LBR feature if its
+ * cpu model is the same as the host because the LBR registers
+ * would be pass-through to the guest and they're model specific.
+ */
+ if (boot_cpu_data.x86_model != guest_cpuid_model(vcpu))
+ return false;
+
+ return !x86_perf_get_lbr(lbr);
+}
+
+bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu)
+{
+ struct x86_pmu_lbr *lbr = vcpu_to_lbr_records(vcpu);
+ u64 lbr_fmt = vcpu->arch.perf_capabilities & PMU_CAP_LBR_FMT;
+
+ if (lbr->nr && lbr_fmt)
+ return true;
+
+ if (!lbr_fmt || !intel_pmu_lbr_is_compatible(vcpu))
+ return false;
+
+ return true;
+}
+
static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -320,6 +353,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
struct kvm_cpuid_entry2 *entry;
union cpuid10_eax eax;
union cpuid10_edx edx;
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);

pmu->nr_arch_gp_counters = 0;
pmu->nr_arch_fixed_counters = 0;
@@ -339,6 +373,10 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
return;

perf_get_x86_pmu_capability(&x86_pmu);
+ if (!intel_pmu_lbr_is_enabled(vcpu)) {
+ vcpu->arch.perf_capabilities &= ~PMU_CAP_LBR_FMT;
+ lbr_desc->records.nr = 0;
+ }

pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
x86_pmu.num_counters_gp);
@@ -384,6 +422,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
{
int i;
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);

for (i = 0; i < INTEL_PMC_MAX_GENERIC; i++) {
pmu->gp_counters[i].type = KVM_PMC_GP;
@@ -401,6 +440,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)

vcpu->arch.perf_capabilities = guest_cpuid_has(vcpu, X86_FEATURE_PDCM) ?
vmx_get_perf_capabilities() : 0;
+ lbr_desc->records.nr = 0;
}

static void intel_pmu_reset(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 08c850596cfc..c24d89ea70c5 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -92,6 +92,17 @@ struct pt_desc {
struct pt_ctx guest;
};

+#define vcpu_to_lbr_desc(vcpu) (&to_vmx(vcpu)->lbr_desc)
+#define vcpu_to_lbr_records(vcpu) (&to_vmx(vcpu)->lbr_desc.records)
+
+bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu);
+bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu);
+
+struct lbr_desc {
+ /* Basic info about guest LBR records. */
+ struct x86_pmu_lbr records;
+};
+
/*
* The nested_vmx structure is part of vcpu_vmx, and holds information we need
* for correct emulation of VMX (i.e., nested VMX) on this vcpu.
@@ -301,6 +312,7 @@ struct vcpu_vmx {
u64 ept_pointer;

struct pt_desc pt_desc;
+ struct lbr_desc lbr_desc;
};

enum ept_pointers_status {
--
2.21.3

2020-07-26 15:38:07

by Like Xu

[permalink] [raw]
Subject: [PATCH v13 09/10] KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES

Userspace could enable guest LBR feature when the exactly supported
LBR format value is initialized to the MSR_IA32_PERF_CAPABILITIES
and the LBR is also compatible with vPMU version and host cpu model.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/capabilities.h | 9 ++++++++-
arch/x86/kvm/vmx/vmx.c | 10 ++++++++++
2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index f17ab6cf152e..5829f4e9a7e0 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -378,7 +378,14 @@ static inline u64 vmx_get_perf_capabilities(void)
* Since counters are virtualized, KVM would support full
* width counting unconditionally, even if the host lacks it.
*/
- return PMU_CAP_FW_WRITES;
+ u64 perf_cap = PMU_CAP_FW_WRITES;
+
+ if (boot_cpu_has(X86_FEATURE_PDCM))
+ rdmsrl(MSR_IA32_PERF_CAPABILITIES, perf_cap);
+
+ perf_cap |= perf_cap & PMU_CAP_LBR_FMT;
+
+ return perf_cap;
}

static inline u64 vmx_supported_debugctl(void)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b72fbbef56fa..6d8b1d98ae1d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2251,6 +2251,16 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if ((data >> 32) != 0)
return 1;
goto find_shared_msr;
+ case MSR_IA32_PERF_CAPABILITIES:
+ if (data & PMU_CAP_LBR_FMT) {
+ if ((data & PMU_CAP_LBR_FMT) !=
+ (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT))
+ return 1;
+ if (!intel_pmu_lbr_is_compatible(vcpu))
+ return 1;
+ }
+ ret = kvm_set_msr_common(vcpu, msr_info);
+ break;

default:
find_shared_msr:
--
2.21.3

2020-07-26 15:38:50

by Like Xu

[permalink] [raw]
Subject: [PATCH] target/i386: add -cpu,lbr=true support to enable guest LBR

The LBR feature would be enabled on the guest if:
- the KVM is enabled and the PMU is enabled and,
- the msr-based-feature IA32_PERF_CAPABILITIES is supporterd and,
- the supported returned value for lbr_fmt from this msr is not zero.

The LBR feature would be disabled on the guest if:
- the msr-based-feature IA32_PERF_CAPABILITIES is unsupporterd OR,
- qemu set the IA32_PERF_CAPABILITIES msr feature without lbr_fmt values OR,
- the requested guest vcpu model doesn't support PDCM.

Cc: Paolo Bonzini <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Eduardo Habkost <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Marcel Apfelbaum <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: [email protected]
Signed-off-by: Like Xu <[email protected]>
---
hw/i386/pc.c | 1 +
target/i386/cpu.c | 24 ++++++++++++++++++++++--
target/i386/cpu.h | 2 ++
target/i386/kvm.c | 7 ++++++-
4 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 3d419d5991..857aff75bb 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -318,6 +318,7 @@ GlobalProperty pc_compat_1_5[] = {
{ "Nehalem-" TYPE_X86_CPU, "min-level", "2" },
{ "virtio-net-pci", "any_layout", "off" },
{ TYPE_X86_CPU, "pmu", "on" },
+ { TYPE_X86_CPU, "lbr", "on" },
{ "i440FX-pcihost", "short_root_bus", "0" },
{ "q35-pcihost", "short_root_bus", "0" },
};
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 588f32e136..c803994887 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1142,8 +1142,8 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
[FEAT_PERF_CAPABILITIES] = {
.type = MSR_FEATURE_WORD,
.feat_names = {
- NULL, NULL, NULL, NULL,
- NULL, NULL, NULL, NULL,
+ "lbr-fmt-bit-0", "lbr-fmt-bit-1", "lbr-fmt-bit-2", "lbr-fmt-bit-3",
+ "lbr-fmt-bit-4", "lbr-fmt-bit-5", NULL, NULL,
NULL, NULL, NULL, NULL,
NULL, "full-width-write", NULL, NULL,
NULL, NULL, NULL, NULL,
@@ -4224,6 +4224,12 @@ static bool lmce_supported(void)
return !!(mce_cap & MCG_LMCE_P);
}

+static inline bool lbr_supported(void)
+{
+ return kvm_enabled() && (kvm_arch_get_supported_msr_feature(kvm_state,
+ MSR_IA32_PERF_CAPABILITIES) & PERF_CAP_LBR_FMT);
+}
+
#define CPUID_MODEL_ID_SZ 48

/**
@@ -4327,6 +4333,9 @@ static void max_x86_cpu_initfn(Object *obj)
}

object_property_set_bool(OBJECT(cpu), "pmu", true, &error_abort);
+ if (lbr_supported()) {
+ object_property_set_bool(OBJECT(cpu), "lbr", true, &error_abort);
+ }
}

static const TypeInfo max_x86_cpu_type_info = {
@@ -5535,6 +5544,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
}
if (!cpu->enable_pmu) {
*ecx &= ~CPUID_EXT_PDCM;
+ if (cpu->enable_lbr) {
+ warn_report("LBR is unsupported since guest PMU is disabled.");
+ exit(1);
+ }
}
break;
case 2:
@@ -6553,6 +6566,12 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
}
}

+ if (!cpu->max_features && cpu->enable_lbr &&
+ !(env->features[FEAT_1_ECX] & CPUID_EXT_PDCM)) {
+ warn_report("requested vcpu model doesn't support PDCM for LBR.");
+ exit(1);
+ }
+
if (cpu->ucode_rev == 0) {
/* The default is the same as KVM's. */
if (IS_AMD_CPU(env)) {
@@ -7187,6 +7206,7 @@ static Property x86_cpu_properties[] = {
#endif
DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID),
DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false),
+ DEFINE_PROP_BOOL("lbr", X86CPU, enable_lbr, false),

DEFINE_PROP_UINT32("hv-spinlocks", X86CPU, hyperv_spinlock_attempts,
HYPERV_SPINLOCK_NEVER_RETRY),
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index e1a5c174dc..a059913e26 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -357,6 +357,7 @@ typedef enum X86Seg {
#define ARCH_CAP_TSX_CTRL_MSR (1<<7)

#define MSR_IA32_PERF_CAPABILITIES 0x345
+#define PERF_CAP_LBR_FMT 0x3f

#define MSR_IA32_TSX_CTRL 0x122
#define MSR_IA32_TSCDEADLINE 0x6e0
@@ -1702,6 +1703,7 @@ struct X86CPU {
* capabilities) directly to the guest.
*/
bool enable_pmu;
+ bool enable_lbr;

/* LMCE support can be enabled/disabled via cpu option 'lmce=on/off'. It is
* disabled by default to avoid breaking migration between QEMU with
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index b8455c89ed..feb33d5472 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -2690,8 +2690,10 @@ static void kvm_msr_entry_add_perf(X86CPU *cpu, FeatureWordArray f)
uint64_t kvm_perf_cap =
kvm_arch_get_supported_msr_feature(kvm_state,
MSR_IA32_PERF_CAPABILITIES);
-
if (kvm_perf_cap) {
+ if (!cpu->enable_lbr) {
+ kvm_perf_cap &= ~PERF_CAP_LBR_FMT;
+ }
kvm_msr_entry_add(cpu, MSR_IA32_PERF_CAPABILITIES,
kvm_perf_cap & f[FEAT_PERF_CAPABILITIES]);
}
@@ -2731,6 +2733,9 @@ static void kvm_init_msrs(X86CPU *cpu)

if (has_msr_perf_capabs && cpu->enable_pmu) {
kvm_msr_entry_add_perf(cpu, env->features);
+ } else if (!has_msr_perf_capabs && cpu->enable_lbr) {
+ warn_report("KVM doesn't support MSR_IA32_PERF_CAPABILITIES for LBR.");
+ exit(1);
}

if (has_msr_ucode_rev) {
--
2.21.3

2020-08-14 09:53:25

by Xu, Like

[permalink] [raw]
Subject: Re: [PATCH v13 00/10] Guest Last Branch Recording Enabling (KVM part)

Are there no interested reviewers or users?

Just a kindly ping.

On 2020/7/26 23:32, Like Xu wrote:
> Hi Paolo,
>
> Please review this new version for the Kernel 5.9 release, and
> Sean may not review them as he said in the previous email
> https://lore.kernel.org/kvm/[email protected]/
>
> You may cherry-pick the perf patches "3cb9d5464c1c..e1ad1ac2deb8"
> from the branch "tip/perf/core" of scm/linux/kernel/git/tip/tip.git
> as PeterZ said in the previous email
> https://lore.kernel.org/kvm/[email protected]/
>
> We may also apply the qemu-devel patch to the upstream qemu and try
> the QEMU command lines with '-cpu host' or '-cpu host,pmu=true,lbr=true'.
>
> The following error will be gone forever with the patchset:
>
> $ perf record -b lbr ${WORKLOAD}
> or $ perf record --call-graph lbr ${WORKLOAD}
> Error:
> cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
>
> Please check more details in each commit and feel free to test.
>
> v12->v13 Changelog:
> - remove perf patches since they're queued in the tip/perf/core;
> - add a minor patch to refactor MSR_IA32_DEBUGCTLMSR set/get handler;
> - add a minor patch to expose vmx_set_intercept_for_msr();
> - add a minor patch to initialize perf_capabilities in the intel_pmu_init();
> - spilt the big patch to three pieces (0004-0006) for better understanding and review
> - make the LBR_FMT exposure patch as the last step to enable guest LBR;
>
> Previous:
> https://lore.kernel.org/kvm/[email protected]/
>
> ---
>
> The last branch recording (LBR) is a performance monitor unit (PMU)
> feature on Intel processors that records a running trace of the most
> recent branches taken by the processor in the LBR stack. This patch
> series is going to enable this feature for plenty of KVM guests.
>
> The user space could configure whether it's enabled or not for each
> guest via MSR_IA32_PERF_CAPABILITIES msr. As a first step, a guest
> could only enable LBR feature if its cpu model is the same as the
> host since the LBR feature is still one of model specific features.
>
> If it's enabled on the guest, the guest LBR driver would accesses the
> LBR MSR (including IA32_DEBUGCTLMSR and records MSRs) as host does.
> The first guest access on the LBR related MSRs is always interceptible.
> The KVM trap would create a special LBR event (called guest LBR event)
> which enables the callstack mode and none of hardware counter is assigned.
> The host perf would enable and schedule this event as usual.
>
> Guest's first access to a LBR registers gets trapped to KVM, which
> creates a guest LBR perf event. It's a regular LBR perf event which gets
> the LBR facility assigned from the perf subsystem. Once that succeeds,
> the LBR stack msrs are passed through to the guest for efficient accesses.
> However, if another host LBR event comes in and takes over the LBR
> facility, the LBR msrs will be made interceptible, and guest following
> accesses to the LBR msrs will be trapped and meaningless.
>
> Because saving/restoring tens of LBR MSRs (e.g. 32 LBR stack entries) in
> VMX transition brings too excessive overhead to frequent vmx transition
> itself, the guest LBR event would help save/restore the LBR stack msrs
> during the context switching with the help of native LBR event callstack
> mechanism, including LBR_SELECT msr.
>
> If the guest no longer accesses the LBR-related MSRs within a scheduling
> time slice and the LBR enable bit is unset, vPMU would release its guest
> LBR event as a normal event of a unused vPMC and the pass-through
> state of the LBR stack msrs would be canceled.
>
> ---
>
> LBR testcase:
> echo 1 > /proc/sys/kernel/watchdog
> echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent
> echo 5000 > /proc/sys/kernel/perf_event_max_sample_rate
> echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
> ./perf record -b ./br_instr a
>
> - Perf report on the host:
> Samples: 72K of event 'cycles', Event count (approx.): 72512
> Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles
> 12.12% br_instr br_instr [.] cmp_end [.] lfsr_cond 1
> 11.05% br_instr br_instr [.] lfsr_cond [.] cmp_end 5
> 8.81% br_instr br_instr [.] lfsr_cond [.] cmp_end 4
> 5.04% br_instr br_instr [.] cmp_end [.] lfsr_cond 20
> 4.92% br_instr br_instr [.] lfsr_cond [.] cmp_end 6
> 4.88% br_instr br_instr [.] cmp_end [.] lfsr_cond 6
> 4.58% br_instr br_instr [.] cmp_end [.] lfsr_cond 5
>
> - Perf report on the guest:
> Samples: 92K of event 'cycles', Event count (approx.): 92544
> Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles
> 12.03% br_instr br_instr [.] cmp_end [.] lfsr_cond 1
> 11.09% br_instr br_instr [.] lfsr_cond [.] cmp_end 5
> 8.57% br_instr br_instr [.] lfsr_cond [.] cmp_end 4
> 5.08% br_instr br_instr [.] lfsr_cond [.] cmp_end 6
> 5.06% br_instr br_instr [.] cmp_end [.] lfsr_cond 20
> 4.87% br_instr br_instr [.] cmp_end [.] lfsr_cond 6
> 4.70% br_instr br_instr [.] cmp_end [.] lfsr_cond 5
>
> Conclusion: the profiling results on the guest are similar to that on the host.
>
> Like Xu (10):
> KVM: x86: Move common set/get handler of MSR_IA32_DEBUGCTLMSR to VMX
> KVM: x86/vmx: Make vmx_set_intercept_for_msr() non-static and expose it
> KVM: vmx/pmu: Initialize vcpu perf_capabilities once in intel_pmu_init()
> KVM: vmx/pmu: Clear PMU_CAP_LBR_FMT when guest LBR is disabled
> KVM: vmx/pmu: Create a guest LBR event when vcpu sets DEBUGCTLMSR_LBR
> KVM: vmx/pmu: Pass-through LBR msrs to when the guest LBR event is ACTIVE
> KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation
> KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMI
> KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES
> KVM: vmx/pmu: Release guest LBR event via lazy release mechanism
>
> arch/x86/kvm/pmu.c | 12 +-
> arch/x86/kvm/pmu.h | 5 +
> arch/x86/kvm/vmx/capabilities.h | 22 ++-
> arch/x86/kvm/vmx/pmu_intel.c | 296 +++++++++++++++++++++++++++++++-
> arch/x86/kvm/vmx/vmx.c | 44 ++++-
> arch/x86/kvm/vmx/vmx.h | 28 +++
> arch/x86/kvm/x86.c | 15 +-
> 7 files changed, 395 insertions(+), 27 deletions(-)
>

2020-09-04 01:58:55

by Xu, Like

[permalink] [raw]
Subject: Re: [PATCH v13 00/10] Guest Last Branch Recording Enabling (KVM part)

Hi Paolo,

Do you have time or plan to review this patch series in this kernel cycle
since we have merged perf patches in the upstream to make it happen ?

Thanks,
Like Xu

On 2020/8/14 16:48, Xu, Like wrote:
> Are there no interested reviewers or users?
>
> Just a kindly ping.
>
> On 2020/7/26 23:32, Like Xu wrote:
>> Hi Paolo,
>>
>> Please review this new version for the Kernel 5.9 release, and
>> Sean may not review them as he said in the previous email
>> https://lore.kernel.org/kvm/[email protected]/
>>
>> You may cherry-pick the perf patches "3cb9d5464c1c..e1ad1ac2deb8"
>> from the branch "tip/perf/core" of scm/linux/kernel/git/tip/tip.git
>> as PeterZ said in the previous email
>> https://lore.kernel.org/kvm/[email protected]/
>>
>>
>> We may also apply the qemu-devel patch to the upstream qemu and try
>> the QEMU command lines with '-cpu host' or '-cpu host,pmu=true,lbr=true'.
>>
>> The following error will be gone forever with the patchset:
>>
>>    $ perf record -b lbr ${WORKLOAD}
>>    or $ perf record --call-graph lbr ${WORKLOAD}
>>    Error:
>>    cycles: PMU Hardware doesn't support sampling/overflow-interrupts.
>> Try 'perf stat'
>>
>> Please check more details in each commit and feel free to test.
>>
>> v12->v13 Changelog:
>> - remove perf patches since they're queued in the tip/perf/core;
>> - add a minor patch to refactor MSR_IA32_DEBUGCTLMSR set/get handler;
>> - add a minor patch to expose vmx_set_intercept_for_msr();
>> - add a minor patch to initialize perf_capabilities in the
>> intel_pmu_init();
>> - spilt the big patch to three pieces (0004-0006) for better
>> understanding and review
>> - make the LBR_FMT exposure patch as the last step to enable guest LBR;
>>
>> Previous:
>> https://lore.kernel.org/kvm/[email protected]/
>>
>>
>> ---
>>
>> The last branch recording (LBR) is a performance monitor unit (PMU)
>> feature on Intel processors that records a running trace of the most
>> recent branches taken by the processor in the LBR stack. This patch
>> series is going to enable this feature for plenty of KVM guests.
>>
>> The user space could configure whether it's enabled or not for each
>> guest via MSR_IA32_PERF_CAPABILITIES msr. As a first step, a guest
>> could only enable LBR feature if its cpu model is the same as the
>> host since the LBR feature is still one of model specific features.
>>
>> If it's enabled on the guest, the guest LBR driver would accesses the
>> LBR MSR (including IA32_DEBUGCTLMSR and records MSRs) as host does.
>> The first guest access on the LBR related MSRs is always interceptible.
>> The KVM trap would create a special LBR event (called guest LBR event)
>> which enables the callstack mode and none of hardware counter is assigned.
>> The host perf would enable and schedule this event as usual.
>>
>> Guest's first access to a LBR registers gets trapped to KVM, which
>> creates a guest LBR perf event. It's a regular LBR perf event which gets
>> the LBR facility assigned from the perf subsystem. Once that succeeds,
>> the LBR stack msrs are passed through to the guest for efficient accesses.
>> However, if another host LBR event comes in and takes over the LBR
>> facility, the LBR msrs will be made interceptible, and guest following
>> accesses to the LBR msrs will be trapped and meaningless.
>>
>> Because saving/restoring tens of LBR MSRs (e.g. 32 LBR stack entries) in
>> VMX transition brings too excessive overhead to frequent vmx transition
>> itself, the guest LBR event would help save/restore the LBR stack msrs
>> during the context switching with the help of native LBR event callstack
>> mechanism, including LBR_SELECT msr.
>>
>> If the guest no longer accesses the LBR-related MSRs within a scheduling
>> time slice and the LBR enable bit is unset, vPMU would release its guest
>> LBR event as a normal event of a unused vPMC and the pass-through
>> state of the LBR stack msrs would be canceled.
>>
>> ---
>>
>> LBR testcase:
>> echo 1 > /proc/sys/kernel/watchdog
>> echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent
>> echo 5000 > /proc/sys/kernel/perf_event_max_sample_rate
>> echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
>> ./perf record -b ./br_instr a
>>
>> - Perf report on the host:
>> Samples: 72K of event 'cycles', Event count (approx.): 72512
>> Overhead  Command   Source Shared Object           Source
>> Symbol                           Target Symbol                          
>> Basic Block Cycles
>>    12.12%  br_instr  br_instr                       [.]
>> cmp_end                             [.]
>> lfsr_cond                           1
>>    11.05%  br_instr  br_instr                       [.]
>> lfsr_cond                           [.]
>> cmp_end                             5
>>     8.81%  br_instr  br_instr                       [.]
>> lfsr_cond                           [.]
>> cmp_end                             4
>>     5.04%  br_instr  br_instr                       [.]
>> cmp_end                             [.]
>> lfsr_cond                           20
>>     4.92%  br_instr  br_instr                       [.]
>> lfsr_cond                           [.]
>> cmp_end                             6
>>     4.88%  br_instr  br_instr                       [.]
>> cmp_end                             [.]
>> lfsr_cond                           6
>>     4.58%  br_instr  br_instr                       [.]
>> cmp_end                             [.]
>> lfsr_cond                           5
>>
>> - Perf report on the guest:
>> Samples: 92K of event 'cycles', Event count (approx.): 92544
>> Overhead  Command   Source Shared Object  Source
>> Symbol                                   Target
>> Symbol                                   Basic Block Cycles
>>    12.03%  br_instr  br_instr              [.]
>> cmp_end                                     [.]
>> lfsr_cond                                   1
>>    11.09%  br_instr  br_instr              [.]
>> lfsr_cond                                   [.]
>> cmp_end                                     5
>>     8.57%  br_instr  br_instr              [.]
>> lfsr_cond                                   [.]
>> cmp_end                                     4
>>     5.08%  br_instr  br_instr              [.]
>> lfsr_cond                                   [.]
>> cmp_end                                     6
>>     5.06%  br_instr  br_instr              [.]
>> cmp_end                                     [.]
>> lfsr_cond                                   20
>>     4.87%  br_instr  br_instr              [.]
>> cmp_end                                     [.]
>> lfsr_cond                                   6
>>     4.70%  br_instr  br_instr              [.]
>> cmp_end                                     [.]
>> lfsr_cond                                   5
>>
>> Conclusion: the profiling results on the guest are similar to that on
>> the host.
>>
>> Like Xu (10):
>>    KVM: x86: Move common set/get handler of MSR_IA32_DEBUGCTLMSR to VMX
>>    KVM: x86/vmx: Make vmx_set_intercept_for_msr() non-static and expose it
>>    KVM: vmx/pmu: Initialize vcpu perf_capabilities once in intel_pmu_init()
>>    KVM: vmx/pmu: Clear PMU_CAP_LBR_FMT when guest LBR is disabled
>>    KVM: vmx/pmu: Create a guest LBR event when vcpu sets DEBUGCTLMSR_LBR
>>    KVM: vmx/pmu: Pass-through LBR msrs to when the guest LBR event is
>> ACTIVE
>>    KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation
>>    KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMI
>>    KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES
>>    KVM: vmx/pmu: Release guest LBR event via lazy release mechanism
>>
>>   arch/x86/kvm/pmu.c              |  12 +-
>>   arch/x86/kvm/pmu.h              |   5 +
>>   arch/x86/kvm/vmx/capabilities.h |  22 ++-
>>   arch/x86/kvm/vmx/pmu_intel.c    | 296 +++++++++++++++++++++++++++++++-
>>   arch/x86/kvm/vmx/vmx.c          |  44 ++++-
>>   arch/x86/kvm/vmx/vmx.h          |  28 +++
>>   arch/x86/kvm/x86.c              |  15 +-
>>   7 files changed, 395 insertions(+), 27 deletions(-)
>>
>

2020-09-24 22:07:08

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [PATCH] target/i386: add -cpu,lbr=true support to enable guest LBR

I've just noticed this on my review queue (apologies for the long
delay). Comments below:

On Sun, Jul 26, 2020 at 11:32:20PM +0800, Like Xu wrote:
> The LBR feature would be enabled on the guest if:
> - the KVM is enabled and the PMU is enabled and,
> - the msr-based-feature IA32_PERF_CAPABILITIES is supporterd and,
> - the supported returned value for lbr_fmt from this msr is not zero.
>
> The LBR feature would be disabled on the guest if:
> - the msr-based-feature IA32_PERF_CAPABILITIES is unsupporterd OR,
> - qemu set the IA32_PERF_CAPABILITIES msr feature without lbr_fmt values OR,
> - the requested guest vcpu model doesn't support PDCM.
>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Richard Henderson <[email protected]>
> Cc: Eduardo Habkost <[email protected]>
> Cc: "Michael S. Tsirkin" <[email protected]>
> Cc: Marcel Apfelbaum <[email protected]>
> Cc: Marcelo Tosatti <[email protected]>
> Cc: [email protected]
> Signed-off-by: Like Xu <[email protected]>
> ---
> hw/i386/pc.c | 1 +
> target/i386/cpu.c | 24 ++++++++++++++++++++++--
> target/i386/cpu.h | 2 ++
> target/i386/kvm.c | 7 ++++++-
> 4 files changed, 31 insertions(+), 3 deletions(-)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 3d419d5991..857aff75bb 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -318,6 +318,7 @@ GlobalProperty pc_compat_1_5[] = {
> { "Nehalem-" TYPE_X86_CPU, "min-level", "2" },
> { "virtio-net-pci", "any_layout", "off" },
> { TYPE_X86_CPU, "pmu", "on" },
> + { TYPE_X86_CPU, "lbr", "on" },

Why is this line here?

> { "i440FX-pcihost", "short_root_bus", "0" },
> { "q35-pcihost", "short_root_bus", "0" },
> };
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 588f32e136..c803994887 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -1142,8 +1142,8 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
> [FEAT_PERF_CAPABILITIES] = {
> .type = MSR_FEATURE_WORD,
> .feat_names = {
> - NULL, NULL, NULL, NULL,
> - NULL, NULL, NULL, NULL,
> + "lbr-fmt-bit-0", "lbr-fmt-bit-1", "lbr-fmt-bit-2", "lbr-fmt-bit-3",
> + "lbr-fmt-bit-4", "lbr-fmt-bit-5", NULL, NULL,

What about a separate "lbr-fmt" int property instead of
individual bit properties?

What happens if LBR_FMT on the host (returned by
kvm_arch_get_supported_msr_feature(MSR_IA32_PERF_CAPABILITIES) is
different than the one configured for the guest? Can KVM emulate
a CPU with different LBR_FMT, or it must match the host?

If LBR_FMT must always match the host, the feature needs to block
live migration. I guess this is already the case because PDCM is
cleared if !cpu->enable_pmu. Adding PDCM to .unmigratable_flags
is probably a good idea, though.



> NULL, NULL, NULL, NULL,
> NULL, "full-width-write", NULL, NULL,
> NULL, NULL, NULL, NULL,
> @@ -4224,6 +4224,12 @@ static bool lmce_supported(void)
> return !!(mce_cap & MCG_LMCE_P);
> }
>
> +static inline bool lbr_supported(void)
> +{
> + return kvm_enabled() && (kvm_arch_get_supported_msr_feature(kvm_state,
> + MSR_IA32_PERF_CAPABILITIES) & PERF_CAP_LBR_FMT);
> +}

You can rewrite this is an accelerator-independent way as:
(x86_cpu_get_supported_feature_word(FEAT_PERF_CAPABILITIES) & PERF_CAP_LBR_FMT)

However, is this really supposed to return false if LBR_FMT is 000000?

> +
> #define CPUID_MODEL_ID_SZ 48
>
> /**
> @@ -4327,6 +4333,9 @@ static void max_x86_cpu_initfn(Object *obj)
> }
>
> object_property_set_bool(OBJECT(cpu), "pmu", true, &error_abort);
> + if (lbr_supported()) {
> + object_property_set_bool(OBJECT(cpu), "lbr", true, &error_abort);

Why is this necessary?

If kvm_arch_get_supported_msr_feature(MSR_IA32_PERF_CAPABILITIES)
return the PERF_CAP_LBR_FMT bits set,
x86_cpu_get_supported_feature_word() will return those bits, and
they will be automatically set at
env->features[FEAT_PERF_CAPABILITIES].

> + }
> }
>
> static const TypeInfo max_x86_cpu_type_info = {
> @@ -5535,6 +5544,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> }
> if (!cpu->enable_pmu) {
> *ecx &= ~CPUID_EXT_PDCM;
> + if (cpu->enable_lbr) {
> + warn_report("LBR is unsupported since guest PMU is disabled.");
> + exit(1);
> + }
> }
> break;
> case 2:
> @@ -6553,6 +6566,12 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> }
> }
> + if (!cpu->max_features && cpu->enable_lbr &&

Why do we need to check for !cpu->max_features here?

> + !(env->features[FEAT_1_ECX] & CPUID_EXT_PDCM)) {
> + warn_report("requested vcpu model doesn't support PDCM for LBR.");
> + exit(1);

Please report errors using error_setg(errp, ...) instead.

> + }
> +
> if (cpu->ucode_rev == 0) {
> /* The default is the same as KVM's. */
> if (IS_AMD_CPU(env)) {
> @@ -7187,6 +7206,7 @@ static Property x86_cpu_properties[] = {
> #endif
> DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID),
> DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false),
> + DEFINE_PROP_BOOL("lbr", X86CPU, enable_lbr, false),

When exactly do we want to set lbr=off explicitly? What's the
expected outcome when lbr=off?


>
> DEFINE_PROP_UINT32("hv-spinlocks", X86CPU, hyperv_spinlock_attempts,
> HYPERV_SPINLOCK_NEVER_RETRY),
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index e1a5c174dc..a059913e26 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -357,6 +357,7 @@ typedef enum X86Seg {
> #define ARCH_CAP_TSX_CTRL_MSR (1<<7)
>
> #define MSR_IA32_PERF_CAPABILITIES 0x345
> +#define PERF_CAP_LBR_FMT 0x3f
>
> #define MSR_IA32_TSX_CTRL 0x122
> #define MSR_IA32_TSCDEADLINE 0x6e0
> @@ -1702,6 +1703,7 @@ struct X86CPU {
> * capabilities) directly to the guest.
> */
> bool enable_pmu;
> + bool enable_lbr;

This is a good place to document what enable_lbr=true|false
means (see questions above).


>
> /* LMCE support can be enabled/disabled via cpu option 'lmce=on/off'. It is
> * disabled by default to avoid breaking migration between QEMU with
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index b8455c89ed..feb33d5472 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -2690,8 +2690,10 @@ static void kvm_msr_entry_add_perf(X86CPU *cpu, FeatureWordArray f)
> uint64_t kvm_perf_cap =
> kvm_arch_get_supported_msr_feature(kvm_state,
> MSR_IA32_PERF_CAPABILITIES);
> -
> if (kvm_perf_cap) {
> + if (!cpu->enable_lbr) {
> + kvm_perf_cap &= ~PERF_CAP_LBR_FMT;
> + }

Why is this necessary? If enable_lbr is false,
f[FEAT_PERF_CAPABILITIES] should not have those bits set at all.

> kvm_msr_entry_add(cpu, MSR_IA32_PERF_CAPABILITIES,
> kvm_perf_cap & f[FEAT_PERF_CAPABILITIES]);
> }
> @@ -2731,6 +2733,9 @@ static void kvm_init_msrs(X86CPU *cpu)
>
> if (has_msr_perf_capabs && cpu->enable_pmu) {
> kvm_msr_entry_add_perf(cpu, env->features);
> + } else if (!has_msr_perf_capabs && cpu->enable_lbr) {
> + warn_report("KVM doesn't support MSR_IA32_PERF_CAPABILITIES for LBR.");
> + exit(1);

This is not the appropriate place to check for unsupported
features. x86_cpu_realizefn() and/or x86_cpu_filter_features()
is.

> }
>
> if (has_msr_ucode_rev) {
> --
> 2.21.3
>

--
Eduardo

2020-09-28 15:44:59

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [PATCH] target/i386: add -cpu,lbr=true support to enable guest LBR

On Mon, Sep 28, 2020 at 10:51:03PM +0800, Xu, Like wrote:
> Hi Eduardo,
>
> Thanks for your detailed review.
>
> On 2020/9/25 6:05, Eduardo Habkost wrote:
> > I've just noticed this on my review queue (apologies for the long
> > delay). Comments below:
> >
> > On Sun, Jul 26, 2020 at 11:32:20PM +0800, Like Xu wrote:
> > > The LBR feature would be enabled on the guest if:
> > > - the KVM is enabled and the PMU is enabled and,
> > > - the msr-based-feature IA32_PERF_CAPABILITIES is supporterd and,
> > > - the supported returned value for lbr_fmt from this msr is not zero.
> > >
> > > The LBR feature would be disabled on the guest if:
> > > - the msr-based-feature IA32_PERF_CAPABILITIES is unsupporterd OR,
> > > - qemu set the IA32_PERF_CAPABILITIES msr feature without lbr_fmt values OR,
> > > - the requested guest vcpu model doesn't support PDCM.
> > >
> > > Cc: Paolo Bonzini <[email protected]>
> > > Cc: Richard Henderson <[email protected]>
> > > Cc: Eduardo Habkost <[email protected]>
> > > Cc: "Michael S. Tsirkin" <[email protected]>
> > > Cc: Marcel Apfelbaum <[email protected]>
> > > Cc: Marcelo Tosatti <[email protected]>
> > > Cc: [email protected]
> > > Signed-off-by: Like Xu <[email protected]>
> > > ---
> > > hw/i386/pc.c | 1 +
> > > target/i386/cpu.c | 24 ++++++++++++++++++++++--
> > > target/i386/cpu.h | 2 ++
> > > target/i386/kvm.c | 7 ++++++-
> > > 4 files changed, 31 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > index 3d419d5991..857aff75bb 100644
> > > --- a/hw/i386/pc.c
> > > +++ b/hw/i386/pc.c
> > > @@ -318,6 +318,7 @@ GlobalProperty pc_compat_1_5[] = {
> > > { "Nehalem-" TYPE_X86_CPU, "min-level", "2" },
> > > { "virtio-net-pci", "any_layout", "off" },
> > > { TYPE_X86_CPU, "pmu", "on" },
> > > + { TYPE_X86_CPU, "lbr", "on" },
> > Why is this line here?
> I'll remove it.
> >
> > > { "i440FX-pcihost", "short_root_bus", "0" },
> > > { "q35-pcihost", "short_root_bus", "0" },
> > > };
> > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > > index 588f32e136..c803994887 100644
> > > --- a/target/i386/cpu.c
> > > +++ b/target/i386/cpu.c
> > > @@ -1142,8 +1142,8 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
> > > [FEAT_PERF_CAPABILITIES] = {
> > > .type = MSR_FEATURE_WORD,
> > > .feat_names = {
> > > - NULL, NULL, NULL, NULL,
> > > - NULL, NULL, NULL, NULL,
> > > + "lbr-fmt-bit-0", "lbr-fmt-bit-1", "lbr-fmt-bit-2", "lbr-fmt-bit-3",
> > > + "lbr-fmt-bit-4", "lbr-fmt-bit-5", NULL, NULL,
> > What about a separate "lbr-fmt" int property instead of
> > individual bit properties?
>
> I'm not sure if you mean adding a "separate lbr-fmt int property"
> like "uint64_t tcg_features" to 'struct FeatureWordInfo'.
>
> Would you mind providing more implementation hints,
> considering the PEBS_FMT will be added later ?

You can add a regular uint8_t field to X86CPU, use
DEFINE_PROP_UINT8 at x86_cpu_properties[], and just validate/copy
the bits to cpu->features[FEAT_PERF_CAPABILITIES][bits 0:5] on
x86_cpu_realizefn().


>
> >
> > What happens if LBR_FMT on the host (returned by
> > kvm_arch_get_supported_msr_feature(MSR_IA32_PERF_CAPABILITIES) is
> > different than the one configured for the guest?
> To enable guest LBR, guest LBR_FMT must be the same as host LBR_FMT.
> > Can KVM emulate
> > a CPU with different LBR_FMT, or it must match the host?
> It must match the host since the LBR registers are model specified.

OK, this means the value set in cpu->features[] need to be
validated against the host in x86_cpu_filter_features().

It can be similar to what's done for intel-pt bits, but instead
of comparing to constants (the intel-pt bits in CPUID are
constant today), you can compare the host value with
cpu->features[FEAT_PERF_CAPABILITIES].

Maybe a FeatureWordInfo.validate_feature(X86CPU *, FeatureWord)
callback could be added, so we could just define separate
validation functions for each feature word, to be called
automatically by x86_cpu_filter_features(). This could be done
as a follow-up patch, though.


> >
> > If LBR_FMT must always match the host, the feature needs to block
> > live migration.
> It's migrable enough of the perf cap LBR version matches,
> don't need full model number match.

As long as the requirements are validated inside
x86_cpu_filter_features(), it should be OK to make it migratable.

>
> For example it's fine to migrate from SKY to CLX.
> > I guess this is already the case because PDCM is
> > cleared if !cpu->enable_pmu. Adding PDCM to .unmigratable_flags
> > is probably a good idea, though.
> I'm trying to make LBR migration-friendly as much as possible w/ your help.
>
> If Arch LBR is enabled for SPR guest, the situation will be different
> hence adding PDCM to .unmigratable_flags may not help it.

OK, in this case forget what I said about setting it on
.unmigratable_flags. The constraints for making the feature
migratable should be same ones mentioned for intel-pt at:
https://lore.kernel.org/qemu-devel/[email protected]/


> >
> >
> >
> > > NULL, NULL, NULL, NULL,
> > > NULL, "full-width-write", NULL, NULL,
> > > NULL, NULL, NULL, NULL,
> > > @@ -4224,6 +4224,12 @@ static bool lmce_supported(void)
> > > return !!(mce_cap & MCG_LMCE_P);
> > > }
> > > +static inline bool lbr_supported(void)
> > > +{
> > > + return kvm_enabled() && (kvm_arch_get_supported_msr_feature(kvm_state,
> > > + MSR_IA32_PERF_CAPABILITIES) & PERF_CAP_LBR_FMT);
> > > +}
> > You can rewrite this is an accelerator-independent way as:
> > (x86_cpu_get_supported_feature_word(FEAT_PERF_CAPABILITIES) & PERF_CAP_LBR_FMT)
> Thanks, I'll apply it.
> >
> > However, is this really supposed to return false if LBR_FMT is 000000?
> I think it's fine to return false if LBR_FMT is 000000.

Don't we want to support hosts that have PDCM
(CPUID[1].ECX[bit 15]) = 1 and
IA32_PERF_CAPABILITIES.LBR_FMT[bits 5:0] = 000000 ?

> >
> > > +
> > > #define CPUID_MODEL_ID_SZ 48
> > > /**
> > > @@ -4327,6 +4333,9 @@ static void max_x86_cpu_initfn(Object *obj)
> > > }
> > > object_property_set_bool(OBJECT(cpu), "pmu", true, &error_abort);
> > > + if (lbr_supported()) {
> > > + object_property_set_bool(OBJECT(cpu), "lbr", true, &error_abort);
> > Why is this necessary?
> >
> > If kvm_arch_get_supported_msr_feature(MSR_IA32_PERF_CAPABILITIES)
> > return the PERF_CAP_LBR_FMT bits set,
> > x86_cpu_get_supported_feature_word() will return those bits, and
> > they will be automatically set at
> > env->features[FEAT_PERF_CAPABILITIES].
> Thanks, I'll remove it.
> > > + }
> > > }
> > > static const TypeInfo max_x86_cpu_type_info = {
> > > @@ -5535,6 +5544,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> > > }
> > > if (!cpu->enable_pmu) {
> > > *ecx &= ~CPUID_EXT_PDCM;
> > > + if (cpu->enable_lbr) {
> > > + warn_report("LBR is unsupported since guest PMU is disabled.");
> > > + exit(1);
> > > + }
> > > }
> > > break;
> > > case 2:
> > > @@ -6553,6 +6566,12 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> > > }
> > > }
> > > + if (!cpu->max_features && cpu->enable_lbr &&
> > Why do we need to check for !cpu->max_features here?
> I'll remove it.
> >
> > > + !(env->features[FEAT_1_ECX] & CPUID_EXT_PDCM)) {
> > > + warn_report("requested vcpu model doesn't support PDCM for LBR.");
> > > + exit(1);
> > Please report errors using error_setg(errp, ...) instead.
> I'll apply it.
> >
> > > + }
> > > +
> > > if (cpu->ucode_rev == 0) {
> > > /* The default is the same as KVM's. */
> > > if (IS_AMD_CPU(env)) {
> > > @@ -7187,6 +7206,7 @@ static Property x86_cpu_properties[] = {
> > > #endif
> > > DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID),
> > > DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false),
> > > + DEFINE_PROP_BOOL("lbr", X86CPU, enable_lbr, false),
> > When exactly do we want to set lbr=off explicitly? What's the
> > expected outcome when lbr=off?
> We set pmu=off explicitly, so does lbr=off.
>
> When set lbr=off, the LBR-related registers accesses from guest bring #GP
> and expected outcome is just like pmu=off.

How are those registers enumerated? Maybe I'm looking at an
outdated version of the Intel SDM or I couldn't find the right
section.

> >
> > > DEFINE_PROP_UINT32("hv-spinlocks", X86CPU, hyperv_spinlock_attempts,
> > > HYPERV_SPINLOCK_NEVER_RETRY),
> > > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > > index e1a5c174dc..a059913e26 100644
> > > --- a/target/i386/cpu.h
> > > +++ b/target/i386/cpu.h
> > > @@ -357,6 +357,7 @@ typedef enum X86Seg {
> > > #define ARCH_CAP_TSX_CTRL_MSR (1<<7)
> > > #define MSR_IA32_PERF_CAPABILITIES 0x345
> > > +#define PERF_CAP_LBR_FMT 0x3f
> > > #define MSR_IA32_TSX_CTRL 0x122
> > > #define MSR_IA32_TSCDEADLINE 0x6e0
> > > @@ -1702,6 +1703,7 @@ struct X86CPU {
> > > * capabilities) directly to the guest.
> > > */
> > > bool enable_pmu;
> > > + bool enable_lbr;
> > This is a good place to document what enable_lbr=true|false
> > means (see questions above).
> >
> I'll document it here.
> > > /* LMCE support can be enabled/disabled via cpu option 'lmce=on/off'. It is
> > > * disabled by default to avoid breaking migration between QEMU with
> > > diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> > > index b8455c89ed..feb33d5472 100644
> > > --- a/target/i386/kvm.c
> > > +++ b/target/i386/kvm.c
> > > @@ -2690,8 +2690,10 @@ static void kvm_msr_entry_add_perf(X86CPU *cpu, FeatureWordArray f)
> > > uint64_t kvm_perf_cap =
> > > kvm_arch_get_supported_msr_feature(kvm_state,
> > > MSR_IA32_PERF_CAPABILITIES);
> > > -
> > > if (kvm_perf_cap) {
> > > + if (!cpu->enable_lbr) {
> > > + kvm_perf_cap &= ~PERF_CAP_LBR_FMT;
> > > + }
> > Why is this necessary? If enable_lbr is false,
> > f[FEAT_PERF_CAPABILITIES] should not have those bits set at all.
> I'll remove it.
> >
> > > kvm_msr_entry_add(cpu, MSR_IA32_PERF_CAPABILITIES,
> > > kvm_perf_cap & f[FEAT_PERF_CAPABILITIES]);
> > > }
> > > @@ -2731,6 +2733,9 @@ static void kvm_init_msrs(X86CPU *cpu)
> > > if (has_msr_perf_capabs && cpu->enable_pmu) {
> > > kvm_msr_entry_add_perf(cpu, env->features);
> > > + } else if (!has_msr_perf_capabs && cpu->enable_lbr) {
> > > + warn_report("KVM doesn't support MSR_IA32_PERF_CAPABILITIES for LBR.");
> > > + exit(1);
> > This is not the appropriate place to check for unsupported
> > features. x86_cpu_realizefn() and/or x86_cpu_filter_features()
> > is.
> Thanks, I'll apply it in the x86_cpu_filter_features().
>
> Please let me if you have more comments.
>
> Thanks,
> Like Xu
> > > }
> > > if (has_msr_ucode_rev) {
> > > --
> > > 2.21.3
> > >
>

--
Eduardo

2020-09-29 03:16:15

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v13 02/10] KVM: x86/vmx: Make vmx_set_intercept_for_msr() non-static and expose it

On Sun, Jul 26, 2020 at 11:32:21PM +0800, Like Xu wrote:
> It's reasonable to call vmx_set_intercept_for_msr() in other vmx-specific
> files (e.g. pmu_intel.c), so expose it without semantic changes hopefully.

I suppose it's reasonable, but you still need to state what is actually
going to use it.

> Signed-off-by: Like Xu <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 4 ++--
> arch/x86/kvm/vmx/vmx.h | 2 ++
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index dcde73a230c6..162c668d58f5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -3772,8 +3772,8 @@ static __always_inline void vmx_enable_intercept_for_msr(unsigned long *msr_bitm
> }
> }
>
> -static __always_inline void vmx_set_intercept_for_msr(unsigned long *msr_bitmap,
> - u32 msr, int type, bool value)
> +__always_inline void vmx_set_intercept_for_msr(unsigned long *msr_bitmap,
> + u32 msr, int type, bool value)
> {
> if (value)
> vmx_enable_intercept_for_msr(msr_bitmap, msr, type);
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 0d06951e607c..08c850596cfc 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -356,6 +356,8 @@ void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp);
> int vmx_find_msr_index(struct vmx_msrs *m, u32 msr);
> int vmx_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
> struct x86_exception *e);
> +void vmx_set_intercept_for_msr(unsigned long *msr_bitmap,
> + u32 msr, int type, bool value);

This completely defeats the purpose of __always_inline.

>
> #define POSTED_INTR_ON 0
> #define POSTED_INTR_SN 1
> --
> 2.21.3
>

2020-09-29 06:48:08

by Xu, Like

[permalink] [raw]
Subject: Re: [PATCH] target/i386: add -cpu,lbr=true support to enable guest LBR

Hi Eduardo,

On 2020/9/28 23:41, Eduardo Habkost wrote:
> On Mon, Sep 28, 2020 at 10:51:03PM +0800, Xu, Like wrote:
>> Hi Eduardo,
>>
>> Thanks for your detailed review.
>>
>> On 2020/9/25 6:05, Eduardo Habkost wrote:
>>> I've just noticed this on my review queue (apologies for the long
>>> delay). Comments below:
>>>
>>> On Sun, Jul 26, 2020 at 11:32:20PM +0800, Like Xu wrote:
>>>> The LBR feature would be enabled on the guest if:
>>>> - the KVM is enabled and the PMU is enabled and,
>>>> - the msr-based-feature IA32_PERF_CAPABILITIES is supporterd and,
>>>> - the supported returned value for lbr_fmt from this msr is not zero.
>>>>
>>>> The LBR feature would be disabled on the guest if:
>>>> - the msr-based-feature IA32_PERF_CAPABILITIES is unsupporterd OR,
>>>> - qemu set the IA32_PERF_CAPABILITIES msr feature without lbr_fmt values OR,
>>>> - the requested guest vcpu model doesn't support PDCM.
>>>>
>>>> Cc: Paolo Bonzini <[email protected]>
>>>> Cc: Richard Henderson <[email protected]>
>>>> Cc: Eduardo Habkost <[email protected]>
>>>> Cc: "Michael S. Tsirkin" <[email protected]>
>>>> Cc: Marcel Apfelbaum <[email protected]>
>>>> Cc: Marcelo Tosatti <[email protected]>
>>>> Cc: [email protected]
>>>> Signed-off-by: Like Xu <[email protected]>
>>>> ---
>>>> hw/i386/pc.c | 1 +
>>>> target/i386/cpu.c | 24 ++++++++++++++++++++++--
>>>> target/i386/cpu.h | 2 ++
>>>> target/i386/kvm.c | 7 ++++++-
>>>> 4 files changed, 31 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>>> index 3d419d5991..857aff75bb 100644
>>>> --- a/hw/i386/pc.c
>>>> +++ b/hw/i386/pc.c
>>>> @@ -318,6 +318,7 @@ GlobalProperty pc_compat_1_5[] = {
>>>> { "Nehalem-" TYPE_X86_CPU, "min-level", "2" },
>>>> { "virtio-net-pci", "any_layout", "off" },
>>>> { TYPE_X86_CPU, "pmu", "on" },
>>>> + { TYPE_X86_CPU, "lbr", "on" },
>>> Why is this line here?
>> I'll remove it.
>>>> { "i440FX-pcihost", "short_root_bus", "0" },
>>>> { "q35-pcihost", "short_root_bus", "0" },
>>>> };
>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>>> index 588f32e136..c803994887 100644
>>>> --- a/target/i386/cpu.c
>>>> +++ b/target/i386/cpu.c
>>>> @@ -1142,8 +1142,8 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>>>> [FEAT_PERF_CAPABILITIES] = {
>>>> .type = MSR_FEATURE_WORD,
>>>> .feat_names = {
>>>> - NULL, NULL, NULL, NULL,
>>>> - NULL, NULL, NULL, NULL,
>>>> + "lbr-fmt-bit-0", "lbr-fmt-bit-1", "lbr-fmt-bit-2", "lbr-fmt-bit-3",
>>>> + "lbr-fmt-bit-4", "lbr-fmt-bit-5", NULL, NULL,
>>> What about a separate "lbr-fmt" int property instead of
>>> individual bit properties?
>> I'm not sure if you mean adding a "separate lbr-fmt int property"
>> like "uint64_t tcg_features" to 'struct FeatureWordInfo'.
>>
>> Would you mind providing more implementation hints,
>> considering the PEBS_FMT will be added later ?
> You can add a regular uint8_t field to X86CPU, use
> DEFINE_PROP_UINT8 at x86_cpu_properties[], and just validate/copy
> the bits to cpu->features[FEAT_PERF_CAPABILITIES][bits 0:5] on
> x86_cpu_realizefn().
>
Thanks, I'll apply it and enable "-cpu,lbr-fmt=*" from command line.
>>> What happens if LBR_FMT on the host (returned by
>>> kvm_arch_get_supported_msr_feature(MSR_IA32_PERF_CAPABILITIES) is
>>> different than the one configured for the guest?
>> To enable guest LBR, guest LBR_FMT must be the same as host LBR_FMT.
>>> Can KVM emulate
>>> a CPU with different LBR_FMT, or it must match the host?
>> It must match the host since the LBR registers are model specified.
> OK, this means the value set in cpu->features[] need to be
> validated against the host in x86_cpu_filter_features().
>
> It can be similar to what's done for intel-pt bits, but instead
> of comparing to constants (the intel-pt bits in CPUID are
> constant today), you can compare the host value with
> cpu->features[FEAT_PERF_CAPABILITIES].
I assume you mean
    env->features[FEAT_PERF_CAPABILITIES]
for
    cpu->features[FEAT_PERF_CAPABILITIES].

Thanks, I'll apply it.
>
> Maybe a FeatureWordInfo.validate_feature(X86CPU *, FeatureWord)
> callback could be added, so we could just define separate
> validation functions for each feature word, to be called
> automatically by x86_cpu_filter_features(). This could be done
> as a follow-up patch, though.
Sure, let me see if there is extra value in adding separate
verification functions for each feature word.
>
>
>>> If LBR_FMT must always match the host, the feature needs to block
>>> live migration.
>> It's migrable enough of the perf cap LBR version matches,
>> don't need full model number match.
> As long as the requirements are validated inside
> x86_cpu_filter_features(), it should be OK to make it migratable.
Thanks.
>
>> For example it's fine to migrate from SKY to CLX.
>>> I guess this is already the case because PDCM is
>>> cleared if !cpu->enable_pmu. Adding PDCM to .unmigratable_flags
>>> is probably a good idea, though.
>> I'm trying to make LBR migration-friendly as much as possible w/ your help.
>>
>> If Arch LBR is enabled for SPR guest, the situation will be different
>> hence adding PDCM to .unmigratable_flags may not help it.
> OK, in this case forget what I said about setting it on
> .unmigratable_flags. The constraints for making the feature
> migratable should be same ones mentioned for intel-pt at:
> https://lore.kernel.org/qemu-devel/[email protected]/
Thanks for clarification.

Please let me know if the v2 patch doesn't follow you guide:
https://lore.kernel.org/qemu-devel/[email protected]/

>
>
>>>
>>>
>>>> NULL, NULL, NULL, NULL,
>>>> NULL, "full-width-write", NULL, NULL,
>>>> NULL, NULL, NULL, NULL,
>>>> @@ -4224,6 +4224,12 @@ static bool lmce_supported(void)
>>>> return !!(mce_cap & MCG_LMCE_P);
>>>> }
>>>> +static inline bool lbr_supported(void)
>>>> +{
>>>> + return kvm_enabled() && (kvm_arch_get_supported_msr_feature(kvm_state,
>>>> + MSR_IA32_PERF_CAPABILITIES) & PERF_CAP_LBR_FMT);
>>>> +}
>>> You can rewrite this is an accelerator-independent way as:
>>> (x86_cpu_get_supported_feature_word(FEAT_PERF_CAPABILITIES) & PERF_CAP_LBR_FMT)
>> Thanks, I'll apply it.
>>> However, is this really supposed to return false if LBR_FMT is 000000?
>> I think it's fine to return false if LBR_FMT is 000000.
> Don't we want to support hosts that have PDCM
> (CPUID[1].ECX[bit 15]) = 1 and
> IA32_PERF_CAPABILITIES.LBR_FMT[bits 5:0] = 000000 ?
Real hardware may always has PDCM and non-zero LBR_FMT.

If we are talking about supporting guest with PDCM and zero LBR_FMT,
it has been supported due to the present of "full-width-write" bit.
>
>>>> +
>>>> #define CPUID_MODEL_ID_SZ 48
>>>> /**
>>>> @@ -4327,6 +4333,9 @@ static void max_x86_cpu_initfn(Object *obj)
>>>> }
>>>> object_property_set_bool(OBJECT(cpu), "pmu", true, &error_abort);
>>>> + if (lbr_supported()) {
>>>> + object_property_set_bool(OBJECT(cpu), "lbr", true, &error_abort);
>>> Why is this necessary?
>>>
>>> If kvm_arch_get_supported_msr_feature(MSR_IA32_PERF_CAPABILITIES)
>>> return the PERF_CAP_LBR_FMT bits set,
>>> x86_cpu_get_supported_feature_word() will return those bits, and
>>> they will be automatically set at
>>> env->features[FEAT_PERF_CAPABILITIES].
>> Thanks, I'll remove it.
>>>> + }
>>>> }
>>>> static const TypeInfo max_x86_cpu_type_info = {
>>>> @@ -5535,6 +5544,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>>>> }
>>>> if (!cpu->enable_pmu) {
>>>> *ecx &= ~CPUID_EXT_PDCM;
>>>> + if (cpu->enable_lbr) {
>>>> + warn_report("LBR is unsupported since guest PMU is disabled.");
>>>> + exit(1);
>>>> + }
>>>> }
>>>> break;
>>>> case 2:
>>>> @@ -6553,6 +6566,12 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>>>> }
>>>> }
>>>> + if (!cpu->max_features && cpu->enable_lbr &&
>>> Why do we need to check for !cpu->max_features here?
>> I'll remove it.
>>>> + !(env->features[FEAT_1_ECX] & CPUID_EXT_PDCM)) {
>>>> + warn_report("requested vcpu model doesn't support PDCM for LBR.");
>>>> + exit(1);
>>> Please report errors using error_setg(errp, ...) instead.
>> I'll apply it.
>>>> + }
>>>> +
>>>> if (cpu->ucode_rev == 0) {
>>>> /* The default is the same as KVM's. */
>>>> if (IS_AMD_CPU(env)) {
>>>> @@ -7187,6 +7206,7 @@ static Property x86_cpu_properties[] = {
>>>> #endif
>>>> DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID),
>>>> DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false),
>>>> + DEFINE_PROP_BOOL("lbr", X86CPU, enable_lbr, false),
>>> When exactly do we want to set lbr=off explicitly? What's the
>>> expected outcome when lbr=off?
>> We set pmu=off explicitly, so does lbr=off.
>>
>> When set lbr=off, the LBR-related registers accesses from guest bring #GP
>> and expected outcome is just like pmu=off.
> How are those registers enumerated? Maybe I'm looking at an
> outdated version of the Intel SDM or I couldn't find the right
> section.
For model-specified LBR, please refer to:

    17.4 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING OVERVIEW

For Architecture LBR, please refer to:

CHAPTER 7 ARCHITECTURAL LAST BRANCH RECORDS (LBRS)

    in the "JUNE 2020" of Intel® Architecture Instruction Set Extensions \
    and Future Features Programming Reference

Or, you can also ask me for any relevant details.

Thanks,
Like Xu

>>>> DEFINE_PROP_UINT32("hv-spinlocks", X86CPU, hyperv_spinlock_attempts,
>>>> HYPERV_SPINLOCK_NEVER_RETRY),
>>>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>>>> index e1a5c174dc..a059913e26 100644
>>>> --- a/target/i386/cpu.h
>>>> +++ b/target/i386/cpu.h
>>>> @@ -357,6 +357,7 @@ typedef enum X86Seg {
>>>> #define ARCH_CAP_TSX_CTRL_MSR (1<<7)
>>>> #define MSR_IA32_PERF_CAPABILITIES 0x345
>>>> +#define PERF_CAP_LBR_FMT 0x3f
>>>> #define MSR_IA32_TSX_CTRL 0x122
>>>> #define MSR_IA32_TSCDEADLINE 0x6e0
>>>> @@ -1702,6 +1703,7 @@ struct X86CPU {
>>>> * capabilities) directly to the guest.
>>>> */
>>>> bool enable_pmu;
>>>> + bool enable_lbr;
>>> This is a good place to document what enable_lbr=true|false
>>> means (see questions above).
>>>
>> I'll document it here.
>>>> /* LMCE support can be enabled/disabled via cpu option 'lmce=on/off'. It is
>>>> * disabled by default to avoid breaking migration between QEMU with
>>>> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
>>>> index b8455c89ed..feb33d5472 100644
>>>> --- a/target/i386/kvm.c
>>>> +++ b/target/i386/kvm.c
>>>> @@ -2690,8 +2690,10 @@ static void kvm_msr_entry_add_perf(X86CPU *cpu, FeatureWordArray f)
>>>> uint64_t kvm_perf_cap =
>>>> kvm_arch_get_supported_msr_feature(kvm_state,
>>>> MSR_IA32_PERF_CAPABILITIES);
>>>> -
>>>> if (kvm_perf_cap) {
>>>> + if (!cpu->enable_lbr) {
>>>> + kvm_perf_cap &= ~PERF_CAP_LBR_FMT;
>>>> + }
>>> Why is this necessary? If enable_lbr is false,
>>> f[FEAT_PERF_CAPABILITIES] should not have those bits set at all.
>> I'll remove it.
>>>> kvm_msr_entry_add(cpu, MSR_IA32_PERF_CAPABILITIES,
>>>> kvm_perf_cap & f[FEAT_PERF_CAPABILITIES]);
>>>> }
>>>> @@ -2731,6 +2733,9 @@ static void kvm_init_msrs(X86CPU *cpu)
>>>> if (has_msr_perf_capabs && cpu->enable_pmu) {
>>>> kvm_msr_entry_add_perf(cpu, env->features);
>>>> + } else if (!has_msr_perf_capabs && cpu->enable_lbr) {
>>>> + warn_report("KVM doesn't support MSR_IA32_PERF_CAPABILITIES for LBR.");
>>>> + exit(1);
>>> This is not the appropriate place to check for unsupported
>>> features. x86_cpu_realizefn() and/or x86_cpu_filter_features()
>>> is.
>> Thanks, I'll apply it in the x86_cpu_filter_features().
>>
>> Please let me if you have more comments.
>>
>> Thanks,
>> Like Xu
>>>> }
>>>> if (has_msr_ucode_rev) {
>>>> --
>>>> 2.21.3
>>>>

2020-09-29 07:14:21

by Xu, Like

[permalink] [raw]
Subject: Re: [PATCH v13 02/10] KVM: x86/vmx: Make vmx_set_intercept_for_msr() non-static and expose it

Hi Sean,

On 2020/9/29 11:13, Sean Christopherson wrote:
> On Sun, Jul 26, 2020 at 11:32:21PM +0800, Like Xu wrote:
>> It's reasonable to call vmx_set_intercept_for_msr() in other vmx-specific
>> files (e.g. pmu_intel.c), so expose it without semantic changes hopefully.
> I suppose it's reasonable, but you still need to state what is actually
> going to use it.

Sure, I will add more here that
one of its usage is to pass through LBR-related msrs later.

>> Signed-off-by: Like Xu <[email protected]>
>> ---
>> arch/x86/kvm/vmx/vmx.c | 4 ++--
>> arch/x86/kvm/vmx/vmx.h | 2 ++
>> 2 files changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index dcde73a230c6..162c668d58f5 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -3772,8 +3772,8 @@ static __always_inline void vmx_enable_intercept_for_msr(unsigned long *msr_bitm
>> }
>> }
>>
>> -static __always_inline void vmx_set_intercept_for_msr(unsigned long *msr_bitmap,
>> - u32 msr, int type, bool value)
>> +__always_inline void vmx_set_intercept_for_msr(unsigned long *msr_bitmap,
>> + u32 msr, int type, bool value)
>> {
>> if (value)
>> vmx_enable_intercept_for_msr(msr_bitmap, msr, type);
>> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
>> index 0d06951e607c..08c850596cfc 100644
>> --- a/arch/x86/kvm/vmx/vmx.h
>> +++ b/arch/x86/kvm/vmx/vmx.h
>> @@ -356,6 +356,8 @@ void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp);
>> int vmx_find_msr_index(struct vmx_msrs *m, u32 msr);
>> int vmx_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
>> struct x86_exception *e);
>> +void vmx_set_intercept_for_msr(unsigned long *msr_bitmap,
>> + u32 msr, int type, bool value);
> This completely defeats the purpose of __always_inline.
My motivation is to use vmx_set_intercept_for_msr() in pmu_intel.c,
which helps to extract pmu-specific code from vmx.c

I assume modern compilers will still make it inline even in this way,
or do you have a better solution for this?

Please let me if you have more comments on the patch series.

Thanks,
Like Xu
>
>>
>> #define POSTED_INTR_ON 0
>> #define POSTED_INTR_SN 1
>> --
>> 2.21.3
>>

2020-09-30 02:25:32

by Xu, Like

[permalink] [raw]
Subject: Re: [PATCH v13 00/10] Guest Last Branch Recording Enabling (KVM part)

Are there volunteers or maintainer to help review this patch-set ?

Just a kindly ping.

Please let me know if you need a re-based version.

Thanks,
Like Xu

On 2020/8/14 16:48, Xu, Like wrote:
> Are there no interested reviewers or users?
>
> Just a kindly ping.
>
> On 2020/7/26 23:32, Like Xu wrote:
>> Hi Paolo,
>>
>> Please review this new version for the Kernel 5.9 release, and
>> Sean may not review them as he said in the previous email
>> https://lore.kernel.org/kvm/[email protected]/
>>
>> You may cherry-pick the perf patches "3cb9d5464c1c..e1ad1ac2deb8"
>> from the branch "tip/perf/core" of scm/linux/kernel/git/tip/tip.git
>> as PeterZ said in the previous email
>> https://lore.kernel.org/kvm/[email protected]/
>>
>>
>> We may also apply the qemu-devel patch to the upstream qemu and try
>> the QEMU command lines with '-cpu host' or '-cpu host,pmu=true,lbr=true'.
>>
>> The following error will be gone forever with the patchset:
>>
>>    $ perf record -b lbr ${WORKLOAD}
>>    or $ perf record --call-graph lbr ${WORKLOAD}
>>    Error:
>>    cycles: PMU Hardware doesn't support sampling/overflow-interrupts.
>> Try 'perf stat'
>>
>> Please check more details in each commit and feel free to test.
>>
>> v12->v13 Changelog:
>> - remove perf patches since they're queued in the tip/perf/core;
>> - add a minor patch to refactor MSR_IA32_DEBUGCTLMSR set/get handler;
>> - add a minor patch to expose vmx_set_intercept_for_msr();
>> - add a minor patch to initialize perf_capabilities in the
>> intel_pmu_init();
>> - spilt the big patch to three pieces (0004-0006) for better
>> understanding and review
>> - make the LBR_FMT exposure patch as the last step to enable guest LBR;
>>
>> Previous:
>> https://lore.kernel.org/kvm/[email protected]/
>>
>>
>> ---
>>
>> The last branch recording (LBR) is a performance monitor unit (PMU)
>> feature on Intel processors that records a running trace of the most
>> recent branches taken by the processor in the LBR stack. This patch
>> series is going to enable this feature for plenty of KVM guests.
>>
>> The user space could configure whether it's enabled or not for each
>> guest via MSR_IA32_PERF_CAPABILITIES msr. As a first step, a guest
>> could only enable LBR feature if its cpu model is the same as the
>> host since the LBR feature is still one of model specific features.
>>
>> If it's enabled on the guest, the guest LBR driver would accesses the
>> LBR MSR (including IA32_DEBUGCTLMSR and records MSRs) as host does.
>> The first guest access on the LBR related MSRs is always interceptible.
>> The KVM trap would create a special LBR event (called guest LBR event)
>> which enables the callstack mode and none of hardware counter is assigned.
>> The host perf would enable and schedule this event as usual.
>>
>> Guest's first access to a LBR registers gets trapped to KVM, which
>> creates a guest LBR perf event. It's a regular LBR perf event which gets
>> the LBR facility assigned from the perf subsystem. Once that succeeds,
>> the LBR stack msrs are passed through to the guest for efficient accesses.
>> However, if another host LBR event comes in and takes over the LBR
>> facility, the LBR msrs will be made interceptible, and guest following
>> accesses to the LBR msrs will be trapped and meaningless.
>>
>> Because saving/restoring tens of LBR MSRs (e.g. 32 LBR stack entries) in
>> VMX transition brings too excessive overhead to frequent vmx transition
>> itself, the guest LBR event would help save/restore the LBR stack msrs
>> during the context switching with the help of native LBR event callstack
>> mechanism, including LBR_SELECT msr.
>>
>> If the guest no longer accesses the LBR-related MSRs within a scheduling
>> time slice and the LBR enable bit is unset, vPMU would release its guest
>> LBR event as a normal event of a unused vPMC and the pass-through
>> state of the LBR stack msrs would be canceled.
>>
>> ---
>>
>> LBR testcase:
>> echo 1 > /proc/sys/kernel/watchdog
>> echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent
>> echo 5000 > /proc/sys/kernel/perf_event_max_sample_rate
>> echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
>> ./perf record -b ./br_instr a
>>
>> - Perf report on the host:
>> Samples: 72K of event 'cycles', Event count (approx.): 72512
>> Overhead  Command   Source Shared Object           Source
>> Symbol                           Target Symbol                          
>> Basic Block Cycles
>>    12.12%  br_instr  br_instr                       [.]
>> cmp_end                             [.]
>> lfsr_cond                           1
>>    11.05%  br_instr  br_instr                       [.]
>> lfsr_cond                           [.]
>> cmp_end                             5
>>     8.81%  br_instr  br_instr                       [.]
>> lfsr_cond                           [.]
>> cmp_end                             4
>>     5.04%  br_instr  br_instr                       [.]
>> cmp_end                             [.]
>> lfsr_cond                           20
>>     4.92%  br_instr  br_instr                       [.]
>> lfsr_cond                           [.]
>> cmp_end                             6
>>     4.88%  br_instr  br_instr                       [.]
>> cmp_end                             [.]
>> lfsr_cond                           6
>>     4.58%  br_instr  br_instr                       [.]
>> cmp_end                             [.]
>> lfsr_cond                           5
>>
>> - Perf report on the guest:
>> Samples: 92K of event 'cycles', Event count (approx.): 92544
>> Overhead  Command   Source Shared Object  Source
>> Symbol                                   Target
>> Symbol                                   Basic Block Cycles
>>    12.03%  br_instr  br_instr              [.]
>> cmp_end                                     [.]
>> lfsr_cond                                   1
>>    11.09%  br_instr  br_instr              [.]
>> lfsr_cond                                   [.]
>> cmp_end                                     5
>>     8.57%  br_instr  br_instr              [.]
>> lfsr_cond                                   [.]
>> cmp_end                                     4
>>     5.08%  br_instr  br_instr              [.]
>> lfsr_cond                                   [.]
>> cmp_end                                     6
>>     5.06%  br_instr  br_instr              [.]
>> cmp_end                                     [.]
>> lfsr_cond                                   20
>>     4.87%  br_instr  br_instr              [.]
>> cmp_end                                     [.]
>> lfsr_cond                                   6
>>     4.70%  br_instr  br_instr              [.]
>> cmp_end                                     [.]
>> lfsr_cond                                   5
>>
>> Conclusion: the profiling results on the guest are similar to that on
>> the host.
>>
>> Like Xu (10):
>>    KVM: x86: Move common set/get handler of MSR_IA32_DEBUGCTLMSR to VMX
>>    KVM: x86/vmx: Make vmx_set_intercept_for_msr() non-static and expose it
>>    KVM: vmx/pmu: Initialize vcpu perf_capabilities once in intel_pmu_init()
>>    KVM: vmx/pmu: Clear PMU_CAP_LBR_FMT when guest LBR is disabled
>>    KVM: vmx/pmu: Create a guest LBR event when vcpu sets DEBUGCTLMSR_LBR
>>    KVM: vmx/pmu: Pass-through LBR msrs to when the guest LBR event is
>> ACTIVE
>>    KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation
>>    KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMI
>>    KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES
>>    KVM: vmx/pmu: Release guest LBR event via lazy release mechanism
>>
>>   arch/x86/kvm/pmu.c              |  12 +-
>>   arch/x86/kvm/pmu.h              |   5 +
>>   arch/x86/kvm/vmx/capabilities.h |  22 ++-
>>   arch/x86/kvm/vmx/pmu_intel.c    | 296 +++++++++++++++++++++++++++++++-
>>   arch/x86/kvm/vmx/vmx.c          |  44 ++++-
>>   arch/x86/kvm/vmx/vmx.h          |  28 +++
>>   arch/x86/kvm/x86.c              |  15 +-
>>   7 files changed, 395 insertions(+), 27 deletions(-)
>>
>