2021-01-08 01:46:39

by Like Xu

Subject: [RESEND v13 00/10] KVM: x86/pmu: Guest Last Branch Recording Enabling

Hi geniuses,

Please help review this rebased version which enables the guest LBR.

The guest LBR support on the host perf side has already been upstreamed;
please check each commit for more details and feel free to test and comment.

v13->v13 RESEND Changelog:
- Reviewed-by: Andi Kleen <[email protected]>

v12->v13 Changelog:
- remove perf patches since they're merged already;
- add a minor patch to refactor MSR_IA32_DEBUGCTLMSR set/get handler;
- add a minor patch to expose vmx_set_intercept_for_msr();
- add a minor patch to adjust features visibility via IA32_PERF_CAPABILITIES;
- split the big patch into three pieces (0004-0006) for easier understanding and review;
- make the LBR_FMT exposure patch as the last step to enable guest LBR;

Previous:
https://lore.kernel.org/kvm/[email protected]/

---

Last Branch Recording (LBR) is a performance monitoring unit (PMU)
feature on Intel processors that records a running trace of the most
recent branches taken by the processor in the LBR stack. This patch
series enables this feature for KVM guests.

With this patch set, the following error goes away for good, and cloud
developers can better understand their programs with less profiling overhead:

$ perf record -b lbr ${WORKLOAD}
or $ perf record --call-graph lbr ${WORKLOAD}
Error:
cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

User space can configure whether the feature is enabled for each
guest via the MSR_IA32_PERF_CAPABILITIES MSR. As a first step, a guest
can only enable LBR if its CPU model is the same as the host's, since
LBR is still a model-specific feature.
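
As an aside, a minimal user-space sketch of such a configuration might
look like the following (enable_guest_lbr() is a hypothetical helper; a
real VMM would first read the host value back via KVM_GET_MSRS, and the
structures come from <linux/kvm.h>):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define MSR_IA32_PERF_CAPABILITIES	0x345
#define PMU_CAP_LBR_FMT			0x3f

/* Hypothetical helper: expose only the host's LBR format bits to the guest. */
static int enable_guest_lbr(int vcpu_fd, uint64_t host_perf_cap)
{
	struct {
		struct kvm_msrs info;
		struct kvm_msr_entry entries[1];
	} msr_data = {};

	msr_data.info.nmsrs = 1;
	msr_data.entries[0].index = MSR_IA32_PERF_CAPABILITIES;
	msr_data.entries[0].data = host_perf_cap & PMU_CAP_LBR_FMT;

	/* KVM_SET_MSRS returns the number of MSRs successfully written. */
	return ioctl(vcpu_fd, KVM_SET_MSRS, &msr_data) == 1 ? 0 : -1;
}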

If it is enabled on the guest, the guest LBR driver accesses the LBR MSRs
(including IA32_DEBUGCTLMSR and the record MSRs) just as the host does.
The first guest access to the LBR-related MSRs is always intercepted.
The resulting KVM trap creates a special LBR event (called the guest LBR
event) which enables callstack mode and has no hardware counter assigned.
Host perf enables and schedules this event as usual.

The guest's first access to an LBR register is trapped by KVM, which
creates a guest LBR perf event. It is a regular LBR perf event which gets
the LBR facility assigned by the perf subsystem. Once that succeeds, the
LBR stack MSRs are passed through to the guest for efficient access.
However, if another host LBR event comes in and takes over the LBR
facility, the LBR MSRs are made interceptible again, and subsequent guest
accesses to the LBR MSRs are trapped and return meaningless values.
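
In short, the check made right before each vm-entry (see
vmx_passthrough_lbr_msrs() in patches 0006-0007) boils down to the
following condensed sketch:

	/* Condensed sketch of the vm-entry decision; the real code is in 0006-0007. */
	if (lbr_desc->event && lbr_desc->event->state == PERF_EVENT_STATE_ACTIVE)
		vmx_enable_lbr_msrs_passthrough(vcpu);	/* LBR still ours: no interception */
	else
		vmx_disable_lbr_msrs_passthrough(vcpu);	/* reclaimed or released: intercept again */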

Because saving/restoring tens of LBR MSRs (e.g. 32 LBR stack entries) on
every VMX transition would add excessive overhead to the already frequent
transitions, the guest LBR event instead saves/restores the LBR stack MSRs
(including the LBR_SELECT MSR) during perf context switches, using the
native LBR event callstack mechanism.

If the guest no longer accesses the LBR-related MSRs within a scheduling
time slice and the LBR enable bit is unset, the vPMU releases its guest
LBR event as it would a normal event of an unused vPMC, and the
pass-through state of the LBR stack MSRs is canceled.

---

LBR testcase:
echo 1 > /proc/sys/kernel/watchdog
echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent
echo 5000 > /proc/sys/kernel/perf_event_max_sample_rate
echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
./perf record -b ./br_instr a

- Perf report on the host:
Samples: 72K of event 'cycles', Event count (approx.): 72512
Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles
12.12% br_instr br_instr [.] cmp_end [.] lfsr_cond 1
11.05% br_instr br_instr [.] lfsr_cond [.] cmp_end 5
8.81% br_instr br_instr [.] lfsr_cond [.] cmp_end 4
5.04% br_instr br_instr [.] cmp_end [.] lfsr_cond 20
4.92% br_instr br_instr [.] lfsr_cond [.] cmp_end 6
4.88% br_instr br_instr [.] cmp_end [.] lfsr_cond 6
4.58% br_instr br_instr [.] cmp_end [.] lfsr_cond 5

- Perf report on the guest:
Samples: 92K of event 'cycles', Event count (approx.): 92544
Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles
12.03% br_instr br_instr [.] cmp_end [.] lfsr_cond 1
11.09% br_instr br_instr [.] lfsr_cond [.] cmp_end 5
8.57% br_instr br_instr [.] lfsr_cond [.] cmp_end 4
5.08% br_instr br_instr [.] lfsr_cond [.] cmp_end 6
5.06% br_instr br_instr [.] cmp_end [.] lfsr_cond 20
4.87% br_instr br_instr [.] cmp_end [.] lfsr_cond 6
4.70% br_instr br_instr [.] cmp_end [.] lfsr_cond 5

Conclusion: the profiling results on the guest are similar to those on the host.

Like Xu (10):
KVM: x86: Move common set/get handler of MSR_IA32_DEBUGCTLMSR to VMX
KVM: x86/vmx: Make vmx_set_intercept_for_msr() non-static
KVM: x86/pmu: Use IA32_PERF_CAPABILITIES to adjust features visibility
KVM: vmx/pmu: Clear PMU_CAP_LBR_FMT when guest LBR is disabled
KVM: vmx/pmu: Create a guest LBR event when vcpu sets DEBUGCTLMSR_LBR
KVM: vmx/pmu: Pass-through LBR msrs when the guest LBR event is ACTIVE
KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation
KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMI
KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES
KVM: vmx/pmu: Release guest LBR event via lazy release mechanism

arch/x86/kvm/pmu.c | 12 +-
arch/x86/kvm/pmu.h | 5 +
arch/x86/kvm/vmx/capabilities.h | 22 ++-
arch/x86/kvm/vmx/pmu_intel.c | 292 +++++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/vmx.c | 52 +++++-
arch/x86/kvm/vmx/vmx.h | 28 +++
arch/x86/kvm/x86.c | 15 +-
7 files changed, 400 insertions(+), 26 deletions(-)

--
2.29.2


2021-01-08 01:46:44

by Like Xu

Subject: [RESEND v13 01/10] KVM: x86: Move common set/get handler of MSR_IA32_DEBUGCTLMSR to VMX

SVM already has its own handlers for MSR_IA32_DEBUGCTLMSR in
svm_get/set_msr(), so the handling in the common x86 code can be safely
moved to VMX.

Add vmx_supported_debugctl() to factor out the logic that raises #GP.

Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx/capabilities.h | 5 +++++
arch/x86/kvm/vmx/vmx.c | 19 ++++++++++++++++---
arch/x86/kvm/x86.c | 13 -------------
3 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 3a1861403d73..a58cf3655351 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -378,4 +378,9 @@ static inline u64 vmx_get_perf_capabilities(void)
return PMU_CAP_FW_WRITES;
}

+static inline u64 vmx_supported_debugctl(void)
+{
+ return DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF;
+}
+
#endif /* __KVM_X86_VMX_CAPS_H */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2af05d3b0590..23b46327527e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1924,6 +1924,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
!guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
return 1;
goto find_uret_msr;
+ case MSR_IA32_DEBUGCTLMSR:
+ msr_info->data = 0;
+ break;
default:
find_uret_msr:
msr = vmx_find_uret_msr(vmx, msr_info->index);
@@ -2002,9 +2005,19 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
VM_EXIT_SAVE_DEBUG_CONTROLS)
get_vmcs12(vcpu)->guest_ia32_debugctl = data;

- ret = kvm_set_msr_common(vcpu, msr_info);
- break;
-
+ if (!data) {
+ /* We support the non-activated case already */
+ return 0;
+ } else if (data & ~vmx_supported_debugctl()) {
+ /*
+ * Values other than LBR and BTF are vendor-specific,
+ * thus reserved and should throw a #GP.
+ */
+ return 1;
+ }
+ vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
+ __func__, data);
+ return 0;
case MSR_IA32_BNDCFGS:
if (!kvm_mpx_supported() ||
(!msr_info->host_initiated &&
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0287840b93e0..c765fd72a66c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3063,18 +3063,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
}
break;
- case MSR_IA32_DEBUGCTLMSR:
- if (!data) {
- /* We support the non-activated case already */
- break;
- } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) {
- /* Values other than LBR and BTF are vendor-specific,
- thus reserved and should throw a #GP */
- return 1;
- } else if (report_ignored_msrs)
- vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
- __func__, data);
- break;
case 0x200 ... 0x2ff:
return kvm_mtrr_set_msr(vcpu, msr, data);
case MSR_IA32_APICBASE:
@@ -3347,7 +3335,6 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
switch (msr_info->index) {
case MSR_IA32_PLATFORM_ID:
case MSR_IA32_EBL_CR_POWERON:
- case MSR_IA32_DEBUGCTLMSR:
case MSR_IA32_LASTBRANCHFROMIP:
case MSR_IA32_LASTBRANCHTOIP:
case MSR_IA32_LASTINTFROMIP:
--
2.29.2

2021-01-08 01:48:05

by Like Xu

Subject: [RESEND v13 02/10] KVM: x86/vmx: Make vmx_set_intercept_for_msr() non-static

To keep code responsibilities clear, vmx_set_intercept_for_msr() may be
reused and invoked from other VMX-specific files (e.g. pmu_intel.c), so
expose it; it will be used later to pass through the LBR MSRs.

Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/kvm/vmx/vmx.h | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 23b46327527e..035bd6d279a4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3787,7 +3787,7 @@ static __always_inline void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu,
vmx_set_msr_bitmap_write(msr_bitmap, msr);
}

-static __always_inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu,
+void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu,
u32 msr, int type, bool value)
{
if (value)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 9d3a557949ac..adc40d36909c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -341,6 +341,8 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu);
void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp);
int vmx_find_loadstore_msr_slot(struct vmx_msrs *m, u32 msr);
void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu);
+void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu,
+ u32 msr, int type, bool value);

static inline u8 vmx_get_rvi(void)
{
--
2.29.2

2021-01-08 01:48:34

by Like Xu

Subject: [RESEND v13 04/10] KVM: vmx/pmu: Clear PMU_CAP_LBR_FMT when guest LBR is disabled

LBR can be enabled on the guest if host perf supports LBR
(checked via x86_perf_get_lbr()) and the vcpu model is compatible
with the host one.

If LBR is disabled on the guest, bits [5:0] of the read-only
MSR_IA32_PERF_CAPABILITIES, which describe the format of the
records stored in the LBR stack, are cleared.

Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx/capabilities.h | 1 +
arch/x86/kvm/vmx/pmu_intel.c | 40 +++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 12 ++++++++++
3 files changed, 53 insertions(+)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index a58cf3655351..db1178a66d93 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -19,6 +19,7 @@ extern int __read_mostly pt_mode;
#define PT_MODE_HOST_GUEST 1

#define PMU_CAP_FW_WRITES (1ULL << 13)
+#define PMU_CAP_LBR_FMT 0x3f

struct nested_vmx_msrs {
/*
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index f8083ecf8c7b..91212fe5ec56 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -168,6 +168,39 @@ static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
return get_gp_pmc(pmu, msr, MSR_IA32_PMC0);
}

+bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu)
+{
+ struct x86_pmu_lbr *lbr = vcpu_to_lbr_records(vcpu);
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+ if (pmu->version < 2)
+ return false;
+
+ /*
+ * As a first step, a guest could only enable LBR feature if its
+ * cpu model is the same as the host because the LBR registers
+ * would be pass-through to the guest and they're model specific.
+ */
+ if (boot_cpu_data.x86_model != guest_cpuid_model(vcpu))
+ return false;
+
+ return !x86_perf_get_lbr(lbr);
+}
+
+bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu)
+{
+ struct x86_pmu_lbr *lbr = vcpu_to_lbr_records(vcpu);
+ u64 lbr_fmt = vcpu->arch.perf_capabilities & PMU_CAP_LBR_FMT;
+
+ if (lbr->nr && lbr_fmt)
+ return true;
+
+ if (!lbr_fmt || !intel_pmu_lbr_is_compatible(vcpu))
+ return false;
+
+ return true;
+}
+
static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -320,6 +353,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
struct kvm_cpuid_entry2 *entry;
union cpuid10_eax eax;
union cpuid10_edx edx;
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);

pmu->nr_arch_gp_counters = 0;
pmu->nr_arch_fixed_counters = 0;
@@ -339,6 +373,10 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
return;

perf_get_x86_pmu_capability(&x86_pmu);
+ if (!intel_pmu_lbr_is_enabled(vcpu)) {
+ vcpu->arch.perf_capabilities &= ~PMU_CAP_LBR_FMT;
+ lbr_desc->records.nr = 0;
+ }

pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
x86_pmu.num_counters_gp);
@@ -384,6 +422,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
{
int i;
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);

for (i = 0; i < INTEL_PMC_MAX_GENERIC; i++) {
pmu->gp_counters[i].type = KVM_PMC_GP;
@@ -401,6 +440,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)

vcpu->arch.perf_capabilities = guest_cpuid_has(vcpu, X86_FEATURE_PDCM) ?
vmx_get_perf_capabilities() : 0;
+ lbr_desc->records.nr = 0;
}

static void intel_pmu_reset(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index adc40d36909c..1b0bbfffa1f0 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -70,6 +70,17 @@ struct pt_desc {
struct pt_ctx guest;
};

+#define vcpu_to_lbr_desc(vcpu) (&to_vmx(vcpu)->lbr_desc)
+#define vcpu_to_lbr_records(vcpu) (&to_vmx(vcpu)->lbr_desc.records)
+
+bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu);
+bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu);
+
+struct lbr_desc {
+ /* Basic info about guest LBR records. */
+ struct x86_pmu_lbr records;
+};
+
/*
* The nested_vmx structure is part of vcpu_vmx, and holds information we need
* for correct emulation of VMX (i.e., nested VMX) on this vcpu.
@@ -279,6 +290,7 @@ struct vcpu_vmx {
u64 ept_pointer;

struct pt_desc pt_desc;
+ struct lbr_desc lbr_desc;

/* Save desired MSR intercept (read: pass-through) state */
#define MAX_POSSIBLE_PASSTHROUGH_MSRS 13
--
2.29.2

2021-01-08 01:49:08

by Like Xu

Subject: [RESEND v13 06/10] KVM: vmx/pmu: Pass-through LBR msrs when the guest LBR event is ACTIVE

In addition to the DEBUGCTLMSR_LBR write, any KVM trap caused by an LBR
MSR access results in the creation of a per-vcpu guest LBR event.

If the guest LBR event is scheduled in along with the corresponding vcpu
context, KVM passes through all LBR record MSRs to the guest. The LBR
callstack mechanism implemented in the host saves/restores the guest LBR
records across event context switches, which avoids the much larger
overhead of saving/restoring tens of LBR MSRs (e.g. 32 LBR record entries)
on the far more frequent VMX transitions.

To avoid reclaiming LBR resources from any higher-priority event on the
host, KVM always checks the existence and the state of the guest LBR event
as late as possible before vm-entry. A negative result cancels the
pass-through state, which also prevents real register accesses and
potential data leakage. If the host reclaims the LBR between two checks,
the interception state and the LBR records are safely preserved thanks to
the native save/restore support of the guest LBR event.

KVM emits a pr_warn() when the LBR hardware is unavailable to the guest
LBR event. The administrator is expected to remind users that the guest
results may be inaccurate if someone on the host side is using LBR to
record the hypervisor.

Suggested-by: Andi Kleen <[email protected]>
Co-developed-by: Wei Wang <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 127 ++++++++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/vmx.c | 10 +++
arch/x86/kvm/vmx/vmx.h | 1 +
3 files changed, 135 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index db1d78ddabac..6b3319dbfa72 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -201,6 +201,24 @@ bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu)
return true;
}

+static bool intel_pmu_is_valid_lbr_msr(struct kvm_vcpu *vcpu, u32 index)
+{
+ struct x86_pmu_lbr *records = vcpu_to_lbr_records(vcpu);
+ bool ret = false;
+
+ if (!intel_pmu_lbr_is_enabled(vcpu))
+ return ret;
+
+ ret = (index == MSR_LBR_SELECT) || (index == MSR_LBR_TOS) ||
+ (index >= records->from && index < records->from + records->nr) ||
+ (index >= records->to && index < records->to + records->nr);
+
+ if (!ret && records->info)
+ ret = (index >= records->info && index < records->info + records->nr);
+
+ return ret;
+}
+
static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -216,7 +234,8 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
default:
ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0) ||
get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0) ||
- get_fixed_pmc(pmu, msr) || get_fw_gp_pmc(pmu, msr);
+ get_fixed_pmc(pmu, msr) || get_fw_gp_pmc(pmu, msr) ||
+ intel_pmu_is_valid_lbr_msr(vcpu, msr);
break;
}

@@ -294,6 +313,46 @@ int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu)
return 0;
}

+/*
+ * It's safe to access LBR msrs from guest when they have not
+ * been passthrough since the host would help restore or reset
+ * the LBR msrs records when the guest LBR event is scheduled in.
+ */
+static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
+ struct msr_data *msr_info, bool read)
+{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+ u32 index = msr_info->index;
+
+ if (!intel_pmu_is_valid_lbr_msr(vcpu, index))
+ return false;
+
+ if (!lbr_desc->event && !intel_pmu_create_guest_lbr_event(vcpu))
+ goto dummy;
+
+ /*
+ * Disable irq to ensure the LBR feature doesn't get reclaimed by the
+ * host at the time the value is read from the msr, and this avoids the
+ * host LBR value to be leaked to the guest. If LBR has been reclaimed,
+ * return 0 on guest reads.
+ */
+ local_irq_disable();
+ if (lbr_desc->event->state == PERF_EVENT_STATE_ACTIVE) {
+ if (read)
+ rdmsrl(index, msr_info->data);
+ else
+ wrmsrl(index, msr_info->data);
+ local_irq_enable();
+ return true;
+ }
+ local_irq_enable();
+
+dummy:
+ if (read)
+ msr_info->data = 0;
+ return true;
+}
+
static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -328,7 +387,8 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
} else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) {
msr_info->data = pmc->eventsel;
return 0;
- }
+ } else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, true))
+ return 0;
}

return 1;
@@ -399,7 +459,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
reprogram_gp_counter(pmc, data);
return 0;
}
- }
+ } else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, false))
+ return 0;
}

return 1;
@@ -528,6 +589,66 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
intel_pmu_release_guest_lbr_event(vcpu);
}

+static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)
+{
+ struct x86_pmu_lbr *lbr = vcpu_to_lbr_records(vcpu);
+ int i;
+
+ for (i = 0; i < lbr->nr; i++) {
+ vmx_set_intercept_for_msr(vcpu, lbr->from + i, MSR_TYPE_RW, set);
+ vmx_set_intercept_for_msr(vcpu, lbr->to + i, MSR_TYPE_RW, set);
+ if (lbr->info)
+ vmx_set_intercept_for_msr(vcpu, lbr->info + i, MSR_TYPE_RW, set);
+ }
+
+ vmx_set_intercept_for_msr(vcpu, MSR_LBR_SELECT, MSR_TYPE_RW, set);
+ vmx_set_intercept_for_msr(vcpu, MSR_LBR_TOS, MSR_TYPE_RW, set);
+}
+
+static inline void vmx_disable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
+{
+ vmx_update_intercept_for_lbr_msrs(vcpu, true);
+}
+
+static inline void vmx_enable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
+{
+ vmx_update_intercept_for_lbr_msrs(vcpu, false);
+}
+
+/*
+ * Higher priority host perf events (e.g. cpu pinned) could reclaim the
+ * pmu resources (e.g. LBR) that were assigned to the guest. This is
+ * usually done via ipi calls (more details in perf_install_in_context).
+ *
+ * Before entering the non-root mode (with irq disabled here), double
+ * confirm that the pmu features enabled to the guest are not reclaimed
+ * by higher priority host events. Otherwise, disallow vcpu's access to
+ * the reclaimed features.
+ */
+void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
+{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ if (!lbr_desc->event) {
+ vmx_disable_lbr_msrs_passthrough(vcpu);
+ if (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)
+ goto warn;
+ return;
+ }
+
+ if (lbr_desc->event->state < PERF_EVENT_STATE_ACTIVE) {
+ vmx_disable_lbr_msrs_passthrough(vcpu);
+ goto warn;
+ } else
+ vmx_enable_lbr_msrs_passthrough(vcpu);
+
+ return;
+
+warn:
+ pr_warn_ratelimited("kvm: vcpu-%d: fail to passthrough LBR.\n",
+ vcpu->vcpu_id);
+}
+
struct kvm_pmu_ops intel_pmu_ops = {
.find_arch_event = intel_find_arch_event,
.find_fixed_event = intel_find_fixed_event,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2494e9f233dc..e1e9e0d1c414 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -658,6 +658,14 @@ static bool is_valid_passthrough_msr(u32 msr)
case MSR_IA32_RTIT_CR3_MATCH:
case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
/* PT MSRs. These are handled in pt_update_intercept_for_msr() */
+ case MSR_LBR_SELECT:
+ case MSR_LBR_TOS:
+ case MSR_LBR_INFO_0 ... MSR_LBR_INFO_0 + 31:
+ case MSR_LBR_NHM_FROM ... MSR_LBR_NHM_FROM + 31:
+ case MSR_LBR_NHM_TO ... MSR_LBR_NHM_TO + 31:
+ case MSR_LBR_CORE_FROM ... MSR_LBR_CORE_FROM + 8:
+ case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
+ /* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
return true;
}

@@ -6721,6 +6729,8 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
pt_guest_enter(vmx);

atomic_switch_perf_msrs(vmx);
+ if (intel_pmu_lbr_is_enabled(vcpu))
+ vmx_passthrough_lbr_msrs(vcpu);

if (enable_preemption_timer)
vmx_update_hv_timer(vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index ae645c2082ba..863bb3fe73d4 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -77,6 +77,7 @@ bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu);
bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu);

int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu);
+void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu);

struct lbr_desc {
/* Basic info about guest LBR records. */
--
2.29.2

2021-01-08 01:49:20

by Like Xu

Subject: [RESEND v13 08/10] KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMI

The current vPMU only supports architectural PMU version 2. According to
Intel SDM section 17.4.7, "Freezing LBR and Performance Counters on PMI",
if IA32_DEBUGCTL.Freeze_LBR_On_PMI = 1, the LBR is frozen on the virtual
PMI and KVM emulates this by clearing the LBR bit (bit 0) in
IA32_DEBUGCTL. The guest then needs to re-enable IA32_DEBUGCTL.LBR to
resume recording branches.

Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/pmu.c | 5 ++++-
arch/x86/kvm/pmu.h | 1 +
arch/x86/kvm/vmx/capabilities.h | 4 +++-
arch/x86/kvm/vmx/pmu_intel.c | 30 ++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.c | 2 +-
5 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 67741d2a0308..405890c723a1 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -383,8 +383,11 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)

void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
{
- if (lapic_in_kernel(vcpu))
+ if (lapic_in_kernel(vcpu)) {
+ if (kvm_x86_ops.pmu_ops->deliver_pmi)
+ kvm_x86_ops.pmu_ops->deliver_pmi(vcpu);
kvm_apic_local_deliver(vcpu->arch.apic, APIC_LVTPC);
+ }
}

bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 067fef51760c..742a4e98df8c 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -39,6 +39,7 @@ struct kvm_pmu_ops {
void (*refresh)(struct kvm_vcpu *vcpu);
void (*init)(struct kvm_vcpu *vcpu);
void (*reset)(struct kvm_vcpu *vcpu);
+ void (*deliver_pmi)(struct kvm_vcpu *vcpu);
};

static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 62aa7a701ebb..57b940c613ab 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -21,6 +21,8 @@ extern int __read_mostly pt_mode;
#define PMU_CAP_FW_WRITES (1ULL << 13)
#define PMU_CAP_LBR_FMT 0x3f

+#define DEBUGCTLMSR_LBR_MASK (DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI)
+
struct nested_vmx_msrs {
/*
* We only store the "true" versions of the VMX capability MSRs. We
@@ -384,7 +386,7 @@ static inline u64 vmx_supported_debugctl(void)
u64 debugctl = DEBUGCTLMSR_BTF;

if (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT)
- debugctl |= DEBUGCTLMSR_LBR;
+ debugctl |= DEBUGCTLMSR_LBR_MASK;

return debugctl;
}
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 72a6dd6ca0ac..8120685c43d4 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -590,6 +590,35 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
intel_pmu_release_guest_lbr_event(vcpu);
}

+/*
+ * Emulate LBR_On_PMI behavior for 1 < pmu.version < 4.
+ *
+ * If Freeze_LBR_On_PMI = 1, the LBR is frozen on PMI and
+ * the KVM emulates to clear the LBR bit (bit 0) in IA32_DEBUGCTL.
+ *
+ * Guest needs to re-enable LBR to resume branches recording.
+ */
+static void intel_pmu_legacy_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu)
+{
+ u64 data = vmcs_read64(GUEST_IA32_DEBUGCTL);
+
+ if (data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) {
+ data &= ~DEBUGCTLMSR_LBR;
+ vmcs_write64(GUEST_IA32_DEBUGCTL, data);
+ }
+}
+
+static void intel_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
+{
+ u8 version = vcpu_to_pmu(vcpu)->version;
+
+ if (!intel_pmu_lbr_is_enabled(vcpu))
+ return;
+
+ if (version > 1 && version < 4)
+ intel_pmu_legacy_freezing_lbrs_on_pmi(vcpu);
+}
+
static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)
{
struct x86_pmu_lbr *lbr = vcpu_to_lbr_records(vcpu);
@@ -676,4 +705,5 @@ struct kvm_pmu_ops intel_pmu_ops = {
.refresh = intel_pmu_refresh,
.init = intel_pmu_init,
.reset = intel_pmu_reset,
+ .deliver_pmi = intel_pmu_deliver_pmi,
};
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e1e9e0d1c414..ad3b079f6700 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1963,7 +1963,7 @@ static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu)
u64 debugctl = vmx_supported_debugctl();

if (!intel_pmu_lbr_is_enabled(vcpu))
- debugctl &= ~DEBUGCTLMSR_LBR;
+ debugctl &= ~DEBUGCTLMSR_LBR_MASK;

return debugctl;
}
--
2.29.2

2021-01-08 01:49:22

by Like Xu

Subject: [RESEND v13 05/10] KVM: vmx/pmu: Create a guest LBR event when vcpu sets DEBUGCTLMSR_LBR

When the vcpu sets DEBUGCTLMSR_LBR in MSR_IA32_DEBUGCTLMSR, the KVM handler
creates a guest LBR event which enables callstack mode and has no hardware
counter assigned. Host perf schedules and enables this event as usual, but
in an exclusive way.

For now, the guest LBR event is released when the vPMU is reset; a later
patch applies the lazy release mechanism to this event, as is done for a vPMC.

Add vcpu_supported_debugctl() to raise #GP for DEBUGCTLMSR_LBR based on
the per-guest LBR setting.

Suggested-by: Andi Kleen <[email protected]>
Co-developed-by: Wei Wang <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx/capabilities.h | 7 +++-
arch/x86/kvm/vmx/pmu_intel.c | 61 +++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.c | 31 +++++++++++------
arch/x86/kvm/vmx/vmx.h | 10 ++++++
4 files changed, 97 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index db1178a66d93..62aa7a701ebb 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -381,7 +381,12 @@ static inline u64 vmx_get_perf_capabilities(void)

static inline u64 vmx_supported_debugctl(void)
{
- return DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF;
+ u64 debugctl = DEBUGCTLMSR_BTF;
+
+ if (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT)
+ debugctl |= DEBUGCTLMSR_LBR;
+
+ return debugctl;
}

#endif /* __KVM_X86_VMX_CAPS_H */
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 91212fe5ec56..db1d78ddabac 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -235,6 +235,65 @@ static struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
return pmc;
}

+static inline void intel_pmu_release_guest_lbr_event(struct kvm_vcpu *vcpu)
+{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ if (lbr_desc->event) {
+ perf_event_release_kernel(lbr_desc->event);
+ lbr_desc->event = NULL;
+ vcpu_to_pmu(vcpu)->event_count--;
+ }
+}
+
+int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu)
+{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+ struct perf_event *event;
+
+ /*
+ * The perf_event_attr is constructed in the minimum efficient way:
+ * - set 'pinned = true' to make it task pinned so that if another
+ * cpu pinned event reclaims LBR, the event->oncpu will be set to -1;
+ * - set '.exclude_host = true' to record guest branches behavior;
+ *
+ * - set '.config = INTEL_FIXED_VLBR_EVENT' to indicates host perf
+ * schedule the event without a real HW counter but a fake one;
+ * check is_guest_lbr_event() and __intel_get_event_constraints();
+ *
+ * - set 'sample_type = PERF_SAMPLE_BRANCH_STACK' and
+ * 'branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
+ * PERF_SAMPLE_BRANCH_USER' to configure it as a LBR callstack
+ * event, which helps KVM to save/restore guest LBR records
+ * during host context switches and reduces quite a lot overhead,
+ * check branch_user_callstack() and intel_pmu_lbr_sched_task();
+ */
+ struct perf_event_attr attr = {
+ .type = PERF_TYPE_RAW,
+ .size = sizeof(attr),
+ .config = INTEL_FIXED_VLBR_EVENT,
+ .sample_type = PERF_SAMPLE_BRANCH_STACK,
+ .pinned = true,
+ .exclude_host = true,
+ .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
+ PERF_SAMPLE_BRANCH_USER,
+ };
+
+ if (unlikely(lbr_desc->event))
+ return 0;
+
+ event = perf_event_create_kernel_counter(&attr, -1,
+ current, NULL, NULL);
+ if (IS_ERR(event)) {
+ pr_debug_ratelimited("%s: failed %ld\n",
+ __func__, PTR_ERR(event));
+ return -ENOENT;
+ }
+ lbr_desc->event = event;
+ vcpu_to_pmu(vcpu)->event_count++;
+ return 0;
+}
+
static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -441,6 +500,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
vcpu->arch.perf_capabilities = guest_cpuid_has(vcpu, X86_FEATURE_PDCM) ?
vmx_get_perf_capabilities() : 0;
lbr_desc->records.nr = 0;
+ lbr_desc->event = NULL;
}

static void intel_pmu_reset(struct kvm_vcpu *vcpu)
@@ -465,6 +525,7 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)

pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status =
pmu->global_ovf_ctrl = 0;
+ intel_pmu_release_guest_lbr_event(vcpu);
}

struct kvm_pmu_ops intel_pmu_ops = {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 085a14579bbe..2494e9f233dc 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1925,7 +1925,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
goto find_uret_msr;
case MSR_IA32_DEBUGCTLMSR:
- msr_info->data = 0;
+ msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
break;
default:
find_uret_msr:
@@ -1950,6 +1950,16 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
return (unsigned long)data;
}

+static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu)
+{
+ u64 debugctl = vmx_supported_debugctl();
+
+ if (!intel_pmu_lbr_is_enabled(vcpu))
+ debugctl &= ~DEBUGCTLMSR_LBR;
+
+ return debugctl;
+}
+
/*
* Writes msr value into the appropriate "register".
* Returns 0 on success, non-0 otherwise.
@@ -2005,18 +2015,17 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
VM_EXIT_SAVE_DEBUG_CONTROLS)
get_vmcs12(vcpu)->guest_ia32_debugctl = data;

- if (!data) {
- /* We support the non-activated case already */
- return 0;
- } else if (data & ~vmx_supported_debugctl()) {
- /*
- * Values other than LBR and BTF are vendor-specific,
- * thus reserved and should throw a #GP.
- */
+ if (data & ~vcpu_supported_debugctl(vcpu))
return 1;
+ if (data & DEBUGCTLMSR_BTF) {
+ vcpu_unimpl(vcpu, "%s: BTF in MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
+ __func__, data);
+ data &= ~DEBUGCTLMSR_BTF;
}
- vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
- __func__, data);
+ vmcs_write64(GUEST_IA32_DEBUGCTL, data);
+ if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
+ (data & DEBUGCTLMSR_LBR))
+ intel_pmu_create_guest_lbr_event(vcpu);
return 0;
case MSR_IA32_BNDCFGS:
if (!kvm_mpx_supported() ||
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 1b0bbfffa1f0..ae645c2082ba 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -76,9 +76,19 @@ struct pt_desc {
bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu);
bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu);

+int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu);
+
struct lbr_desc {
/* Basic info about guest LBR records. */
struct x86_pmu_lbr records;
+
+ /*
+ * Emulate LBR feature via passthrough LBR registers when the
+ * per-vcpu guest LBR event is scheduled on the current pcpu.
+ *
+ * The records may be inaccurate if the host reclaims the LBR.
+ */
+ struct perf_event *event;
};

/*
--
2.29.2

2021-01-08 01:49:41

by Like Xu

Subject: [RESEND v13 07/10] KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation

When the LBR record MSRs have already been passed through, there is no
need to call vmx_update_intercept_for_lbr_msrs() again and again, and
vice versa for the intercepted case.

Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 13 +++++++++++++
arch/x86/kvm/vmx/vmx.h | 3 +++
2 files changed, 16 insertions(+)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 6b3319dbfa72..72a6dd6ca0ac 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -562,6 +562,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
vmx_get_perf_capabilities() : 0;
lbr_desc->records.nr = 0;
lbr_desc->event = NULL;
+ lbr_desc->already_passthrough = false;
}

static void intel_pmu_reset(struct kvm_vcpu *vcpu)
@@ -607,12 +608,24 @@ static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)

static inline void vmx_disable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ if (!lbr_desc->already_passthrough)
+ return;
+
vmx_update_intercept_for_lbr_msrs(vcpu, true);
+ lbr_desc->already_passthrough = false;
}

static inline void vmx_enable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ if (lbr_desc->already_passthrough)
+ return;
+
vmx_update_intercept_for_lbr_msrs(vcpu, false);
+ lbr_desc->already_passthrough = true;
}

/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 863bb3fe73d4..b79244bd44bb 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -90,6 +90,9 @@ struct lbr_desc {
* The records may be inaccurate if the host reclaims the LBR.
*/
struct perf_event *event;
+
+ /* A flag to reduce the overhead of LBR pass-through or cancellation. */
+ bool already_passthrough;
};

/*
--
2.29.2

2021-01-08 01:49:45

by Like Xu

Subject: [RESEND v13 03/10] KVM: x86/pmu: Use IA32_PERF_CAPABILITIES to adjust features visibility

On Intel platforms, the KVM user space agent can configure
MSR_IA32_PERF_CAPABILITIES (e.g. unmask some VMX-supported bits in
vcpu->arch.perf_capabilities) to adjust the visibility of guest PMU
features for vPMU-enabled guests.

Once MSR_IA32_PERF_CAPABILITIES is validly changed via vmx_set_msr(),
the adjustment in intel_pmu_refresh() is triggered. To ensure the new
value persists, the default initialization path is moved to
intel_pmu_init().

Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 6 +++---
arch/x86/kvm/vmx/vmx.c | 5 +++++
arch/x86/kvm/x86.c | 2 +-
3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index a886a47daebd..f8083ecf8c7b 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -327,7 +327,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
pmu->version = 0;
pmu->reserved_bits = 0xffffffff00200000ull;
- vcpu->arch.perf_capabilities = 0;

entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
if (!entry)
@@ -340,8 +339,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
return;

perf_get_x86_pmu_capability(&x86_pmu);
- if (guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
- vcpu->arch.perf_capabilities = vmx_get_perf_capabilities();

pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
x86_pmu.num_counters_gp);
@@ -401,6 +398,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
pmu->fixed_counters[i].idx = i + INTEL_PMC_IDX_FIXED;
pmu->fixed_counters[i].current_config = 0;
}
+
+ vcpu->arch.perf_capabilities = guest_cpuid_has(vcpu, X86_FEATURE_PDCM) ?
+ vmx_get_perf_capabilities() : 0;
}

static void intel_pmu_reset(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 035bd6d279a4..085a14579bbe 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2209,6 +2209,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if ((data >> 32) != 0)
return 1;
goto find_uret_msr;
+ case MSR_IA32_PERF_CAPABILITIES:
+ if (data && !vcpu_to_pmu(vcpu)->version)
+ return 1;
+ ret = kvm_set_msr_common(vcpu, msr_info);
+ break;

default:
find_uret_msr:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c765fd72a66c..a1b251ae648a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3037,7 +3037,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;

vcpu->arch.perf_capabilities = data;
-
+ kvm_pmu_refresh(vcpu);
return 0;
}
case MSR_EFER:
--
2.29.2

2021-01-08 01:50:07

by Like Xu

Subject: [RESEND v13 09/10] KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES

Userspace can enable the guest LBR feature when the exact LBR format
value supported by the host is written to MSR_IA32_PERF_CAPABILITIES
and LBR is also compatible with the vPMU version and the host CPU model.

Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx/capabilities.h | 9 ++++++++-
arch/x86/kvm/vmx/vmx.c | 7 +++++++
2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 57b940c613ab..a9a7c4d1b634 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -378,7 +378,14 @@ static inline u64 vmx_get_perf_capabilities(void)
* Since counters are virtualized, KVM would support full
* width counting unconditionally, even if the host lacks it.
*/
- return PMU_CAP_FW_WRITES;
+ u64 perf_cap = PMU_CAP_FW_WRITES;
+
+ if (boot_cpu_has(X86_FEATURE_PDCM))
+ rdmsrl(MSR_IA32_PERF_CAPABILITIES, perf_cap);
+
+ perf_cap |= perf_cap & PMU_CAP_LBR_FMT;
+
+ return perf_cap;
}

static inline u64 vmx_supported_debugctl(void)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ad3b079f6700..9cb5b1e4fc27 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2229,6 +2229,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_PERF_CAPABILITIES:
if (data && !vcpu_to_pmu(vcpu)->version)
return 1;
+ if (data & PMU_CAP_LBR_FMT) {
+ if ((data & PMU_CAP_LBR_FMT) !=
+ (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT))
+ return 1;
+ if (!intel_pmu_lbr_is_compatible(vcpu))
+ return 1;
+ }
ret = kvm_set_msr_common(vcpu, msr_info);
break;

--
2.29.2

2021-01-08 01:50:59

by Like Xu

Subject: [RESEND v13 10/10] KVM: vmx/pmu: Release guest LBR event via lazy release mechanism

The vPMU uses GUEST_LBR_IN_USE_IDX (bit 58) in 'pmu->pmc_in_use' to
indicate whether a guest LBR event is still needed by the vcpu. If the
vcpu no longer accesses the LBR-related registers within a scheduling time
slice, and the LBR enable bit has been unset, the vPMU treats the guest
LBR event like a plain event of an unused vPMC and releases it as usual.
The pass-through state of the LBR record MSRs is also cancelled.

Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/pmu.c | 7 +++++++
arch/x86/kvm/pmu.h | 4 ++++
arch/x86/kvm/vmx/pmu_intel.c | 17 ++++++++++++++++-
3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 405890c723a1..e7c72eea07d4 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -463,6 +463,7 @@ void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
struct kvm_pmc *pmc = NULL;
DECLARE_BITMAP(bitmask, X86_PMC_IDX_MAX);
int i;
+ bool extra_cleanup = false;

pmu->need_cleanup = false;

@@ -474,8 +475,14 @@ void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)

if (pmc && pmc->perf_event && !pmc_speculative_in_use(pmc))
pmc_stop_counter(pmc);
+
+ if (i == INTEL_GUEST_LBR_INUSE)
+ extra_cleanup = true;
}

+ if (extra_cleanup && kvm_x86_ops.pmu_ops->cleanup)
+ kvm_x86_ops.pmu_ops->cleanup(vcpu);
+
bitmap_zero(pmu->pmc_in_use, X86_PMC_IDX_MAX);
}

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 742a4e98df8c..c8b650866f56 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -15,6 +15,9 @@
#define VMWARE_BACKDOOR_PMC_REAL_TIME 0x10001
#define VMWARE_BACKDOOR_PMC_APPARENT_TIME 0x10002

+/* Indicates whether Intel LBR msrs were accessed during the last time slice. */
+#define INTEL_GUEST_LBR_INUSE INTEL_PMC_IDX_FIXED_VLBR
+
#define MAX_FIXED_COUNTERS 3

struct kvm_event_hw_type_mapping {
@@ -40,6 +43,7 @@ struct kvm_pmu_ops {
void (*init)(struct kvm_vcpu *vcpu);
void (*reset)(struct kvm_vcpu *vcpu);
void (*deliver_pmi)(struct kvm_vcpu *vcpu);
+ void (*cleanup)(struct kvm_vcpu *vcpu);
};

static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 8120685c43d4..4d10f564607d 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -310,6 +310,7 @@ int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu)
}
lbr_desc->event = event;
vcpu_to_pmu(vcpu)->event_count++;
+ __set_bit(INTEL_GUEST_LBR_INUSE, vcpu_to_pmu(vcpu)->pmc_in_use);
return 0;
}

@@ -342,10 +343,12 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
rdmsrl(index, msr_info->data);
else
wrmsrl(index, msr_info->data);
+ __set_bit(INTEL_GUEST_LBR_INUSE, vcpu_to_pmu(vcpu)->pmc_in_use);
local_irq_enable();
return true;
}
local_irq_enable();
+ clear_bit(INTEL_GUEST_LBR_INUSE, vcpu_to_pmu(vcpu)->pmc_in_use);

dummy:
if (read)
@@ -496,7 +499,8 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
if (!intel_pmu_lbr_is_enabled(vcpu)) {
vcpu->arch.perf_capabilities &= ~PMU_CAP_LBR_FMT;
lbr_desc->records.nr = 0;
- }
+ } else
+ bitmap_set(pmu->all_valid_pmc_idx, INTEL_GUEST_LBR_INUSE, 1);

pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
x86_pmu.num_counters_gp);
@@ -669,17 +673,21 @@ static inline void vmx_enable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
*/
void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
{
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);

if (!lbr_desc->event) {
vmx_disable_lbr_msrs_passthrough(vcpu);
if (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)
goto warn;
+ if (test_bit(INTEL_GUEST_LBR_INUSE, pmu->pmc_in_use))
+ goto warn;
return;
}

if (lbr_desc->event->state < PERF_EVENT_STATE_ACTIVE) {
vmx_disable_lbr_msrs_passthrough(vcpu);
+ __clear_bit(INTEL_GUEST_LBR_INUSE, pmu->pmc_in_use);
goto warn;
} else
vmx_enable_lbr_msrs_passthrough(vcpu);
@@ -691,6 +699,12 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
vcpu->vcpu_id);
}

+static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
+{
+ if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
+ intel_pmu_release_guest_lbr_event(vcpu);
+}
+
struct kvm_pmu_ops intel_pmu_ops = {
.find_arch_event = intel_find_arch_event,
.find_fixed_event = intel_find_fixed_event,
@@ -706,4 +720,5 @@ struct kvm_pmu_ops intel_pmu_ops = {
.init = intel_pmu_init,
.reset = intel_pmu_reset,
.deliver_pmi = intel_pmu_deliver_pmi,
+ .cleanup = intel_pmu_cleanup,
};
--
2.29.2

2021-01-15 08:48:11

by Alex Shi

Subject: Re: [RESEND v13 00/10] KVM: x86/pmu: Guest Last Branch Recording Enabling



On 2021/1/8 9:36 AM, Like Xu wrote:
> Because saving/restoring tens of LBR MSRs (e.g. 32 LBR stack entries) in
> VMX transition brings too excessive overhead to frequent vmx transition
> itself, the guest LBR event would help save/restore the LBR stack msrs
> during the context switching with the help of native LBR event callstack
> mechanism, including LBR_SELECT msr.
>

Sounds like this feature is very helpful for VMM guest performance tuning.
Good job!

Thanks
Alex

2021-01-15 08:53:14

by Xu, Like

Subject: Re: [RESEND v13 00/10] KVM: x86/pmu: Guest Last Branch Recording Enabling

Hi Alex,

Thank you for trying this guest feature on multiple Intel platforms!
If you have more specific comments or any concerns, just let me know.

---
thx, likexu

On 2021/1/15 16:19, Alex Shi wrote:
>
> On 2021/1/8 9:36 AM, Like Xu wrote:
>> Because saving/restoring tens of LBR MSRs (e.g. 32 LBR stack entries) in
>> VMX transition brings too excessive overhead to frequent vmx transition
>> itself, the guest LBR event would help save/restore the LBR stack msrs
>> during the context switching with the help of native LBR event callstack
>> mechanism, including LBR_SELECT msr.
>>
> Sounds like this feature is very helpful for VMM guest performance tuning.
> Good job!
>
> Thanks
> Alex

2021-01-26 11:29:19

by Paolo Bonzini

Subject: Re: [RESEND v13 09/10] KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES

On 08/01/21 02:37, Like Xu wrote:
> Userspace could enable guest LBR feature when the exactly supported
> LBR format value is initialized to the MSR_IA32_PERF_CAPABILITIES
> and the LBR is also compatible with vPMU version and host cpu model.
>
> Signed-off-by: Like Xu <[email protected]>
> Reviewed-by: Andi Kleen <[email protected]>
> ---
> arch/x86/kvm/vmx/capabilities.h | 9 ++++++++-
> arch/x86/kvm/vmx/vmx.c | 7 +++++++
> 2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
> index 57b940c613ab..a9a7c4d1b634 100644
> --- a/arch/x86/kvm/vmx/capabilities.h
> +++ b/arch/x86/kvm/vmx/capabilities.h
> @@ -378,7 +378,14 @@ static inline u64 vmx_get_perf_capabilities(void)
> * Since counters are virtualized, KVM would support full
> * width counting unconditionally, even if the host lacks it.
> */
> - return PMU_CAP_FW_WRITES;
> + u64 perf_cap = PMU_CAP_FW_WRITES;
> +
> + if (boot_cpu_has(X86_FEATURE_PDCM))
> + rdmsrl(MSR_IA32_PERF_CAPABILITIES, perf_cap);
> +
> + perf_cap |= perf_cap & PMU_CAP_LBR_FMT;
> +
> + return perf_cap;
> }
>
> static inline u64 vmx_supported_debugctl(void)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ad3b079f6700..9cb5b1e4fc27 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2229,6 +2229,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_IA32_PERF_CAPABILITIES:
> if (data && !vcpu_to_pmu(vcpu)->version)
> return 1;
> + if (data & PMU_CAP_LBR_FMT) {
> + if ((data & PMU_CAP_LBR_FMT) !=
> + (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT))
> + return 1;
> + if (!intel_pmu_lbr_is_compatible(vcpu))
> + return 1;
> + }
> ret = kvm_set_msr_common(vcpu, msr_info);
> break;
>
>

Please move this hunk to patch 4.

Paolo

2021-01-26 12:01:45

by Paolo Bonzini

Subject: Re: [RESEND v13 03/10] KVM: x86/pmu: Use IA32_PERF_CAPABILITIES to adjust features visibility

On 08/01/21 02:36, Like Xu wrote:
>
> @@ -401,6 +398,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
> pmu->fixed_counters[i].idx = i + INTEL_PMC_IDX_FIXED;
> pmu->fixed_counters[i].current_config = 0;
> }
> +
> + vcpu->arch.perf_capabilities = guest_cpuid_has(vcpu, X86_FEATURE_PDCM) ?
> + vmx_get_perf_capabilities() : 0;

There is one thing I don't understand with this patch: intel_pmu_init is
not called when CPUID is changed. So I would have thought that anything
that uses guest_cpuid_has must stay in intel_pmu_refresh. As I
understand it vcpu->arch.perf_capabilities is always set to 0
(vmx_get_perf_capabilities is never called), and kvm_set_msr_common
would fail to set any bit in the MSR. What am I missing?

In addition, the code of patch 4:

+ if (!intel_pmu_lbr_is_enabled(vcpu)) {
+ vcpu->arch.perf_capabilities &= ~PMU_CAP_LBR_FMT;
+ lbr_desc->records.nr = 0;
+ }

is not okay after MSR changes. The value written by the host must be
either rejected (with "return 1") or applied unchanged.

Fortunately I think this code is dead if you move the check in
kvm_set_msr from patch 9 to patch 4. However, in patch 9
vmx_get_perf_capabilities() must only set the LBR format bits if
intel_pmu_lbr_is_compatible(vcpu).


The patches look good apart from these issues and the other nits I
pointed out. However, you need testcases here, for both kvm-unit-tests
and tools/testing/selftests/kvm.

For KVM, it would be at least a basic check that looks for the MSR LBR
(using the MSR indices for the various processors), does a branch, and
checks that the FROM_IP/TO_IP are good. You can write the
kvm-unit-tests using the QEMU option "-cpu host,migratable=no": if you
do this, QEMU will pick the KVM_GET_SUPPORTED_CPUID bits and move them
more or less directly into the guest CPUID.

For tools/testing/selftests/kvm, your test needs to check the effect of
various CPUID settings on the PERF_CAPABILITIES MSR, check that whatever
you write with KVM_SET_MSR is _not_ modified and can be retrieved with
KVM_GET_MSR, and check that invalid LBR formats are rejected.
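
(For illustration only, a rough sketch of the basic kvm-unit-tests check
described above might look like the following, assuming Skylake-style LBR
MSR indices and the usual rdmsr()/wrmsr()/report() helpers and u64 type
from libcflat/processor.h; a real test would pick the MSR indices based on
the CPU model:)

#define MSR_IA32_DEBUGCTLMSR	0x1d9
#define MSR_LBR_TOS		0x1c9
#define MSR_LBR_NHM_FROM	0x680
#define MSR_LBR_NHM_TO		0x6c0
#define DEBUGCTLMSR_LBR		(1ULL << 0)

static __attribute__((noinline)) void lbr_test_branch(void) { }

static void test_lbr_basic(void)
{
	u64 tos, from, to;

	wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
	lbr_test_branch();			/* record at least one branch */
	wrmsr(MSR_IA32_DEBUGCTLMSR, 0);		/* stop recording */

	tos  = rdmsr(MSR_LBR_TOS);
	from = rdmsr(MSR_LBR_NHM_FROM + tos);
	to   = rdmsr(MSR_LBR_NHM_TO + tos);

	/* The most recent entry should contain plausible FROM_IP/TO_IP. */
	report(from != 0 && to != 0, "LBR FROM_IP/TO_IP recorded");
}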

I'm really, really sorry for leaving these patches on my todo list for
months, but you guys need to understand the main reason for this: they
come with no testcases. A large patch series adding userspace APIs and
complicated CPUID/MSR processing *automatically* goes to the bottom of
my queue, because:

- I need to go with a fine comb over all the userspace API changes, I
cannot just look at test code and see if it works.

- I will have no way to test its correctness after it's committed.

For you, the work ends when your patch is accepted. For me, that's when
the work begins, and I need to make sure that the patch will be
maintainable in the future.

Thanks, and sorry again for the delay.

Paolo

2021-01-26 15:02:26

by Paolo Bonzini

Subject: Re: [RESEND v13 10/10] KVM: vmx/pmu: Release guest LBR event via lazy release mechanism

On 08/01/21 02:37, Like Xu wrote:
> The vPMU uses GUEST_LBR_IN_USE_IDX (bit 58) in 'pmu->pmc_in_use' to
> indicate whether a guest LBR event is still needed by the vcpu. If the
> vcpu no longer accesses LBR related registers within a scheduling time
> slice, and the enable bit of LBR has been unset, vPMU will treat the
> guest LBR event as a bland event of a vPMC counter and release it
> as usual. Also, the pass-through state of LBR records msrs is cancelled.
>
> Signed-off-by: Like Xu <[email protected]>
> Reviewed-by: Andi Kleen <[email protected]>
> ---
> arch/x86/kvm/pmu.c | 7 +++++++
> arch/x86/kvm/pmu.h | 4 ++++
> arch/x86/kvm/vmx/pmu_intel.c | 17 ++++++++++++++++-
> 3 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 405890c723a1..e7c72eea07d4 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -463,6 +463,7 @@ void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
> struct kvm_pmc *pmc = NULL;
> DECLARE_BITMAP(bitmask, X86_PMC_IDX_MAX);
> int i;
> + bool extra_cleanup = false;
>
> pmu->need_cleanup = false;
>
> @@ -474,8 +475,14 @@ void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
>
> if (pmc && pmc->perf_event && !pmc_speculative_in_use(pmc))
> pmc_stop_counter(pmc);
> +
> + if (i == INTEL_GUEST_LBR_INUSE)
> + extra_cleanup = true;
> }
>
> + if (extra_cleanup && kvm_x86_ops.pmu_ops->cleanup)
> + kvm_x86_ops.pmu_ops->cleanup(vcpu);
> +

You can call this function always, it's cleaner than hardcoding
INTEL_GUEST_LBR_INUSE.

You can also use INTEL_PMC_IDX_FIXED_VLBR directly instead of
INTEL_GUEST_LBR_INUSE.

Paolo

2021-01-26 16:22:12

by Paolo Bonzini

Subject: Re: [RESEND v13 07/10] KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation

On 08/01/21 02:37, Like Xu wrote:
> +
> + /* A flag to reduce the overhead of LBR pass-through or cancellation. */
> + bool already_passthrough;

/* True if LBRs are marked as not intercepted in the MSR bitmap */
bool msr_passthrough;

> };
>
> /*
>

2021-01-26 20:38:44

by Paolo Bonzini

Subject: Re: [RESEND v13 01/10] KVM: x86: Move common set/get handler of MSR_IA32_DEBUGCTLMSR to VMX

On 08/01/21 02:36, Like Xu wrote:
> SVM already has specific handlers of MSR_IA32_DEBUGCTLMSR in the
> svm_get/set_msr, so the x86 common part can be safely moved to VMX.
>
> Add vmx_supported_debugctl() to refactor the throwing logic of #GP.
>
> Signed-off-by: Like Xu <[email protected]>
> Reviewed-by: Andi Kleen <[email protected]>
> ---
> arch/x86/kvm/vmx/capabilities.h | 5 +++++
> arch/x86/kvm/vmx/vmx.c | 19 ++++++++++++++++---
> arch/x86/kvm/x86.c | 13 -------------
> 3 files changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
> index 3a1861403d73..a58cf3655351 100644
> --- a/arch/x86/kvm/vmx/capabilities.h
> +++ b/arch/x86/kvm/vmx/capabilities.h
> @@ -378,4 +378,9 @@ static inline u64 vmx_get_perf_capabilities(void)
> return PMU_CAP_FW_WRITES;
> }
>
> +static inline u64 vmx_supported_debugctl(void)
> +{
> + return DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF;
> +}
> +
> #endif /* __KVM_X86_VMX_CAPS_H */
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 2af05d3b0590..23b46327527e 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1924,6 +1924,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> return 1;
> goto find_uret_msr;
> + case MSR_IA32_DEBUGCTLMSR:
> + msr_info->data = 0;
> + break;
> default:
> find_uret_msr:
> msr = vmx_find_uret_msr(vmx, msr_info->index);
> @@ -2002,9 +2005,19 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> VM_EXIT_SAVE_DEBUG_CONTROLS)
> get_vmcs12(vcpu)->guest_ia32_debugctl = data;
>
> - ret = kvm_set_msr_common(vcpu, msr_info);
> - break;
> -
> + if (!data) {
> + /* We support the non-activated case already */
> + return 0;
> + } else if (data & ~vmx_supported_debugctl()) {
> + /*
> + * Values other than LBR and BTF are vendor-specific,
> + * thus reserved and should throw a #GP.
> + */
> + return 1;
> + }
> + vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
> + __func__, data);
> + return 0;
> case MSR_IA32_BNDCFGS:
> if (!kvm_mpx_supported() ||
> (!msr_info->host_initiated &&
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0287840b93e0..c765fd72a66c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3063,18 +3063,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> return 1;
> }
> break;
> - case MSR_IA32_DEBUGCTLMSR:
> - if (!data) {
> - /* We support the non-activated case already */
> - break;
> - } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) {
> - /* Values other than LBR and BTF are vendor-specific,
> - thus reserved and should throw a #GP */
> - return 1;
> - } else if (report_ignored_msrs)
> - vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
> - __func__, data);
> - break;
> case 0x200 ... 0x2ff:
> return kvm_mtrr_set_msr(vcpu, msr, data);
> case MSR_IA32_APICBASE:
> @@ -3347,7 +3335,6 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> switch (msr_info->index) {
> case MSR_IA32_PLATFORM_ID:
> case MSR_IA32_EBL_CR_POWERON:
> - case MSR_IA32_DEBUGCTLMSR:
> case MSR_IA32_LASTBRANCHFROMIP:
> case MSR_IA32_LASTBRANCHTOIP:
> case MSR_IA32_LASTINTFROMIP:
>

Queued, thanks.

Paolo

2021-01-27 21:24:48

by Xu, Like

[permalink] [raw]
Subject: Re: [RESEND v13 09/10] KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES

On 2021/1/26 17:30, Paolo Bonzini wrote:
> On 08/01/21 02:37, Like Xu wrote:
>> Userspace could enable the guest LBR feature when the exact supported
>> LBR format value is written to MSR_IA32_PERF_CAPABILITIES and the LBR
>> is also compatible with the vPMU version and host CPU model.
>>
>> Signed-off-by: Like Xu <[email protected]>
>> Reviewed-by: Andi Kleen <[email protected]>
>> ---
>>   arch/x86/kvm/vmx/capabilities.h | 9 ++++++++-
>>   arch/x86/kvm/vmx/vmx.c          | 7 +++++++
>>   2 files changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/vmx/capabilities.h
>> b/arch/x86/kvm/vmx/capabilities.h
>> index 57b940c613ab..a9a7c4d1b634 100644
>> --- a/arch/x86/kvm/vmx/capabilities.h
>> +++ b/arch/x86/kvm/vmx/capabilities.h
>> @@ -378,7 +378,14 @@ static inline u64 vmx_get_perf_capabilities(void)
>>        * Since counters are virtualized, KVM would support full
>>        * width counting unconditionally, even if the host lacks it.
>>        */
>> -    return PMU_CAP_FW_WRITES;
>> +    u64 perf_cap = PMU_CAP_FW_WRITES;
>> +
>> +    if (boot_cpu_has(X86_FEATURE_PDCM))
>> +        rdmsrl(MSR_IA32_PERF_CAPABILITIES, perf_cap);
>> +
>> +    perf_cap |= perf_cap & PMU_CAP_LBR_FMT;
>> +
>> +    return perf_cap;
>>   }
>>     static inline u64 vmx_supported_debugctl(void)
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index ad3b079f6700..9cb5b1e4fc27 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -2229,6 +2229,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu,
>> struct msr_data *msr_info)
>>       case MSR_IA32_PERF_CAPABILITIES:
>>           if (data && !vcpu_to_pmu(vcpu)->version)
>>               return 1;
>> +        if (data & PMU_CAP_LBR_FMT) {
>> +            if ((data & PMU_CAP_LBR_FMT) !=
>> +                (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT))
>> +                return 1;
>> +            if (!intel_pmu_lbr_is_compatible(vcpu))
>> +                return 1;
>> +        }
>>           ret = kvm_set_msr_common(vcpu, msr_info);
>>           break;
>>
>
> Please move this hunk to patch 4.
>
> Paolo
>
Thanks, I'll do this part early in the next version.

I would have thought that exposing the interface needs to
be the last enabling step.

2021-01-27 21:27:18

by Like Xu

[permalink] [raw]
Subject: Re: [RESEND v13 03/10] KVM: x86/pmu: Use IA32_PERF_CAPABILITIES to adjust features visibility

On 2021/1/26 17:42, Paolo Bonzini wrote:
> On 08/01/21 02:36, Like Xu wrote:
>>
>> @@ -401,6 +398,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
>>          pmu->fixed_counters[i].idx = i + INTEL_PMC_IDX_FIXED;
>>          pmu->fixed_counters[i].current_config = 0;
>>      }
>> +
>> +    vcpu->arch.perf_capabilities = guest_cpuid_has(vcpu,
>> X86_FEATURE_PDCM) ?
>> +        vmx_get_perf_capabilities() : 0;
>
> There is one thing I don't understand with this patch: intel_pmu_init is
> not called when CPUID is changed.  So I would have thought that anything
> that uses guest_cpuid_has must stay in intel_pmu_refresh.  As I understand
> it vcpu->arch.perf_capabilities is always set to 0
> (vmx_get_perf_capabilities is never called), and kvm_set_msr_common would
> fail to set any bit in the MSR.  What am I missing?
>
> In addition, the code of patch 4:
>
> +    if (!intel_pmu_lbr_is_enabled(vcpu)) {
> +        vcpu->arch.perf_capabilities &= ~PMU_CAP_LBR_FMT;
> +        lbr_desc->records.nr = 0;
> +    }
>
> is not okay after MSR changes.  The value written by the host must be
> either rejected (with "return 1") or applied unchanged.
>
> Fortunately I think this code is dead if you move the check in kvm_set_msr
> from patch 9 to patch 4.  However, in patch 9 vmx_get_perf_capabilities()
> must only set the LBR format bits if intel_pmu_lbr_is_compatible(vcpu).

Thanks for the guidance. How about handling it in this way:

In the intel_pmu_init():

    vcpu->arch.perf_capabilities = 0;
    lbr_desc->records.nr = 0;

In the intel_pmu_refresh():

    if (guest_cpuid_has(vcpu, X86_FEATURE_PDCM)) {
        vcpu->arch.perf_capabilities = vmx_get_perf_capabilities();
        if (!lbr_desc->records.nr)
            vcpu->arch.perf_capabilities &= ~PMU_CAP_LBR_FMT;
    }

In the vmx_set_msr():

    case MSR_IA32_PERF_CAPABILITIES:
        // set up lbr_desc->records.nr
        if (!intel_pmu_lbr_is_compatible(vcpu))
            return 1;
        ret = kvm_set_msr_common(vcpu, msr_info);

In the kvm_set_msr_common():

    case MSR_IA32_PERF_CAPABILITIES:
        vcpu->arch.perf_capabilities = data;
        kvm_pmu_refresh(vcpu);

>
>
> The patches look good apart from these issues and the other nits I pointed
> out.  However, you need testcases here, for both kvm-unit-tests and
> tools/testing/selftests/kvm.
>
> For KVM, it would be at least a basic check that looks for the MSR LBR
> (using the MSR indices for the various processors), does a branch, and
> checks that the FROM_IP/TO_IP are good.  You can write the kvm-unit-tests
> using the QEMU option "-cpu host,migratable=no": if you do this, QEMU will
> pick the KVM_GET_SUPPORTED_CPUID bits and move them more or less directly
> into the guest CPUID.
>
> For tools/testing/selftests/kvm, your test need to check the effect of
> various CPUID settings on the PERF_CAPABILITIES MSR, check that whatever
> you write with KVM_SET_MSR is _not_ modified and can be retrieved with
> KVM_GET_MSR, and check that invalid LBR formats are rejected.

Thanks, I will add the above tests in the next version.
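
For the kvm-unit-tests part, something roughly like the sketch below is
what I have in mind (the MSR addresses are hard-coded for a Skylake-like
LBR layout purely for illustration; rdmsr()/wrmsr()/report() are the
existing kvm-unit-tests helpers):

    #include "libcflat.h"
    #include "processor.h"

    #define MSR_IA32_DEBUGCTLMSR    0x1d9
    #define DEBUGCTLMSR_LBR         (1ull << 0)
    #define MSR_LBR_TOS             0x1c9
    #define MSR_LBR_FROM_0          0x680
    #define MSR_LBR_TO_0            0x6c0

    static __attribute__((noinline)) void lbr_branch(void)
    {
        /* the call/ret pair itself is the recorded branch */
        asm volatile("nop");
    }

    int main(void)
    {
        u64 tos, from, to;

        wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
        report(rdmsr(MSR_IA32_DEBUGCTLMSR) & DEBUGCTLMSR_LBR,
               "DEBUGCTL.LBR can be enabled");

        lbr_branch();

        tos  = rdmsr(MSR_LBR_TOS) & 0x1f;      /* 32 entries on this model */
        from = rdmsr(MSR_LBR_FROM_0 + tos);
        to   = rdmsr(MSR_LBR_TO_0 + tos);
        report(from && to, "FROM_IP/TO_IP are non-zero after a branch");

        wrmsr(MSR_IA32_DEBUGCTLMSR, 0);
        return report_summary();
    }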

>
> I'm really, really sorry for leaving these patches on my todo list for
> months, but you guys need to understand the main reason for this: they come
> with no testcases.  A large patch series adding userspace APIs and
> complicated CPUID/MSR processing *automatically* goes to the bottom of my
> queue, because:
>
> - I need to go with a fine comb over all the userspace API changes, I
> cannot just look at test code and see if it works.
>
> - I will have no way to test its correctness after it's committed.
>
> For you, the work ends when your patch is accepted.  For me, that's when
> the work begins, and I need to make sure that the patch will be
> maintainable in the future.
>
> Thanks, and sorry again for the delay.
>
> Paolo
>

2021-01-27 21:54:33

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [RESEND v13 03/10] KVM: x86/pmu: Use IA32_PERF_CAPABILITIES to adjust features visibility

On 27/01/21 07:04, Like Xu wrote:
> On 2021/1/26 17:42, Paolo Bonzini wrote:
>> On 08/01/21 02:36, Like Xu wrote:
>>>
>>> @@ -401,6 +398,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
>>>          pmu->fixed_counters[i].idx = i + INTEL_PMC_IDX_FIXED;
>>>          pmu->fixed_counters[i].current_config = 0;
>>>      }
>>> +
>>> +    vcpu->arch.perf_capabilities = guest_cpuid_has(vcpu,
>>> X86_FEATURE_PDCM) ?
>>> +        vmx_get_perf_capabilities() : 0;
>>
>> There is one thing I don't understand with this patch: intel_pmu_init
>> is not called when CPUID is changed.  So I would have thought that
>> anything that uses guest_cpuid_has must stay in intel_pmu_refresh.  As
>> I understand it vcpu->arch.perf_capabilities is always set to 0
>> (vmx_get_perf_capabilities is never called), and kvm_set_msr_common
>> would fail to set any bit in the MSR.  What am I missing?
>>
>> In addition, the code of patch 4:
>>
>> +    if (!intel_pmu_lbr_is_enabled(vcpu)) {
>> +        vcpu->arch.perf_capabilities &= ~PMU_CAP_LBR_FMT;
>> +        lbr_desc->records.nr = 0;
>> +    }
>>
>> is not okay after MSR changes.  The value written by the host must be
>> either rejected (with "return 1") or applied unchanged.
>>
>> Fortunately I think this code is dead if you move the check in
>> kvm_set_msr from patch 9 to patch 4.  However, in patch 9
>> vmx_get_perf_capabilities() must only set the LBR format bits if
>> intel_pmu_lbr_is_compatible(vcpu).
>
> Thanks for the guidance. How about handling it in this way:
>
> In the intel_pmu_init():
>
>     vcpu->arch.perf_capabilities = 0;
>     lbr_desc->records.nr = 0;
>
> In the intel_pmu_refresh():
>
>     if (guest_cpuid_has(vcpu, X86_FEATURE_PDCM)) {
>         vcpu->arch.perf_capabilities = vmx_get_perf_capabilities();
>         if (!lbr_desc->records.nr)
>             vcpu->arch.perf_capabilities &= ~PMU_CAP_LBR_FMT;
>     }
>
> In the vmx_set_msr():
>
>     case MSR_IA32_PERF_CAPABILITIES:
>         // set up lbr_desc->records.nr
>         if (!intel_pmu_lbr_is_compatible(vcpu))

Maybe pass msr_info.data to intel_pmu_lbr_is_compatible? Otherwise
seems okay, thanks Like.
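
I.e., roughly (the two-argument intel_pmu_lbr_is_compatible() here is the
hypothetical variant that takes the value being written, rather than
re-reading vcpu->arch.perf_capabilities):

    case MSR_IA32_PERF_CAPABILITIES:
        if (data && !vcpu_to_pmu(vcpu)->version)
            return 1;
        if (data & PMU_CAP_LBR_FMT) {
            if ((data & PMU_CAP_LBR_FMT) !=
                (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT))
                return 1;
            /* check against the new value, not the stale MSR contents */
            if (!intel_pmu_lbr_is_compatible(vcpu, data))
                return 1;
        }
        ret = kvm_set_msr_common(vcpu, msr_info);
        break;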

Paolo

2021-01-27 21:55:55

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [RESEND v13 09/10] KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES

On 27/01/21 06:45, Xu, Like wrote:
> On 2021/1/26 17:30, Paolo Bonzini wrote:
>> On 08/01/21 02:37, Like Xu wrote:
>>> Userspace could enable the guest LBR feature when the exact supported
>>> LBR format value is written to MSR_IA32_PERF_CAPABILITIES and the LBR
>>> is also compatible with the vPMU version and host CPU model.
>>>
>>> Signed-off-by: Like Xu <[email protected]>
>>> Reviewed-by: Andi Kleen <[email protected]>
>>> ---
>>>   arch/x86/kvm/vmx/capabilities.h | 9 ++++++++-
>>>   arch/x86/kvm/vmx/vmx.c          | 7 +++++++
>>>   2 files changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx/capabilities.h
>>> b/arch/x86/kvm/vmx/capabilities.h
>>> index 57b940c613ab..a9a7c4d1b634 100644
>>> --- a/arch/x86/kvm/vmx/capabilities.h
>>> +++ b/arch/x86/kvm/vmx/capabilities.h
>>> @@ -378,7 +378,14 @@ static inline u64 vmx_get_perf_capabilities(void)
>>>        * Since counters are virtualized, KVM would support full
>>>        * width counting unconditionally, even if the host lacks it.
>>>        */
>>> -    return PMU_CAP_FW_WRITES;
>>> +    u64 perf_cap = PMU_CAP_FW_WRITES;
>>> +
>>> +    if (boot_cpu_has(X86_FEATURE_PDCM))
>>> +        rdmsrl(MSR_IA32_PERF_CAPABILITIES, perf_cap);
>>> +
>>> +    perf_cap |= perf_cap & PMU_CAP_LBR_FMT;
>>> +
>>> +    return perf_cap;
>>>   }
>>>     static inline u64 vmx_supported_debugctl(void)
>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>> index ad3b079f6700..9cb5b1e4fc27 100644
>>> --- a/arch/x86/kvm/vmx/vmx.c
>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>> @@ -2229,6 +2229,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu,
>>> struct msr_data *msr_info)
>>>       case MSR_IA32_PERF_CAPABILITIES:
>>>           if (data && !vcpu_to_pmu(vcpu)->version)
>>>               return 1;
>>> +        if (data & PMU_CAP_LBR_FMT) {
>>> +            if ((data & PMU_CAP_LBR_FMT) !=
>>> +                (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT))
>>> +                return 1;
>>> +            if (!intel_pmu_lbr_is_compatible(vcpu))
>>> +                return 1;
>>> +        }
>>>           ret = kvm_set_msr_common(vcpu, msr_info);
>>>           break;
>>>
>>
>> Please move this hunk to patch 4.
>>
>> Paolo
>>
> Thanks, I'll do this part early in the next version.
>
> I would have thought that exposing the interface needs to
> be the last enabling step.

That's the right thing to do for vmx_get_perf_capabilities(). However,
checking the values of MSR_IA32_PERF_CAPABILITIES can be done early.

Paolo

2021-02-01 05:40:52

by Like Xu

[permalink] [raw]
Subject: Re: [RESEND v13 03/10] KVM: x86/pmu: Use IA32_PERF_CAPABILITIES to adjust features visibility

Hi Paolo,

On 2021/1/27 14:04, Like Xu wrote:
> On 2021/1/26 17:42, Paolo Bonzini wrote:
>> On 08/01/21 02:36, Like Xu wrote:
>>>
>>> @@ -401,6 +398,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
>>>          pmu->fixed_counters[i].idx = i + INTEL_PMC_IDX_FIXED;
>>>          pmu->fixed_counters[i].current_config = 0;
>>>      }
>>> +
>>> +    vcpu->arch.perf_capabilities = guest_cpuid_has(vcpu,
>>> X86_FEATURE_PDCM) ?
>>> +        vmx_get_perf_capabilities() : 0;
>>
>> There is one thing I don't understand with this patch: intel_pmu_init is
>> not called when CPUID is changed.  So I would have thought that anything
>> that uses guest_cpuid_has must stay in intel_pmu_refresh.  As I
>> understand it vcpu->arch.perf_capabilities is always set to 0
>> (vmx_get_perf_capabilities is never called), and kvm_set_msr_common
>> would fail to set any bit in the MSR.  What am I missing?
>>
>> In addition, the code of patch 4:
>>
>> +    if (!intel_pmu_lbr_is_enabled(vcpu)) {
>> +        vcpu->arch.perf_capabilities &= ~PMU_CAP_LBR_FMT;
>> +        lbr_desc->records.nr = 0;
>> +    }
>>
>> is not okay after MSR changes.  The value written by the host must be
>> either rejected (with "return 1") or applied unchanged.
>>
>> Fortunately I think this code is dead if you move the check in
>> kvm_set_msr from patch 9 to patch 4.  However, in patch 9
>> vmx_get_perf_capabilities() must only set the LBR format bits if
>> intel_pmu_lbr_is_compatible(vcpu).
>
> Thanks for the guidance. How about handling it in this way:
>
> In the intel_pmu_init():
>
>     vcpu->arch.perf_capabilities = 0;
>     lbr_desc->records.nr = 0;
>
> In the intel_pmu_refresh():
>
>     if (guest_cpuid_has(vcpu, X86_FEATURE_PDCM)) {
>         vcpu->arch.perf_capabilities = vmx_get_perf_capabilities();
>         if (!lbr_desc->records.nr)
>             vcpu->arch.perf_capabilities &= ~PMU_CAP_LBR_FMT;
>     }
>
> In the vmx_set_msr():
>
>     case MSR_IA32_PERF_CAPABILITIES:
>         // set up lbr_desc->records.nr
>         if (!intel_pmu_lbr_is_compatible(vcpu))
>             return 1;
>         ret = kvm_set_msr_common(vcpu, msr_info);
>
> In the kvm_set_msr_common():
>
>     case MSR_IA32_PERF_CAPABILITIES:
>         vcpu->arch.perf_capabilities = data;
>         kvm_pmu_refresh(vcpu);

The new version will make the vcpu->arch.perf_capabilities handling as
simple as possible.

>
>>
>>
>> The patches look good apart from these issues and the other nits I
>> pointed out.  However, you need testcases here, for both kvm-unit-tests
>> and tools/testing/selftests/kvm.
>>
>> For KVM, it would be at least a basic check that looks for the MSR LBR
>> (using the MSR indices for the various processors), does a branch, and
>> checks that the FROM_IP/TO_IP are good.  You can write the
>> kvm-unit-tests using the QEMU option "-cpu host,migratable=no": if you
>> do this, QEMU will pick the KVM_GET_SUPPORTED_CPUID bits and move them
>> more or less directly into the guest CPUID.
>>
>> For tools/testing/selftests/kvm, your tests need to check the effect of
>> various CPUID settings on the PERF_CAPABILITIES MSR, check that whatever
>> you write with KVM_SET_MSR is _not_ modified and can be retrieved with
>> KVM_GET_MSR, and check that invalid LBR formats are rejected.
>
> Thanks, I will add the above tests in the next version.
>
>>
>> I'm really, really sorry for leaving these patches on my todo list for
>> months, but you guys need to understand the main reason for this: they
>> come with no testcases.  A large patch series adding userspace APIs and
>> complicated CPUID/MSR processing *automatically* goes to the bottom of
>> my queue, because:

Please review the new version
https://lore.kernel.org/kvm/[email protected]/T/#t
and kvm-unit-tests:
https://lore.kernel.org/kvm/[email protected],
so that the patch set is not *automatically* sent to the bottom of
your queue.

I'll also add tests for other new vPMU features.

---
thx, likexu

>>
>> - I need to go with a fine comb over all the userspace API changes, I
>> cannot just look at test code and see if it works.
>>
>> - I will have no way to test its correctness after it's committed.
>>
>> For you, the work ends when your patch is accepted.  For me, that's when
>> the work begins, and I need to make sure that the patch will be
>> maintainable in the future.
>>
>> Thanks, and sorry again for the delay.
>>
>> Paolo
>>
>