2021-02-01 05:21:19

by Like Xu

Subject: [PATCH v14 00/11] KVM: x86/pmu: Guest Last Branch Recording Enabling

Hi geniuses,

Please help review this new version which enables the guest LBR.

Guest LBR support in host perf is already upstream. Please see each
commit for details, and feel free to test and comment.

QEMU part: https://lore.kernel.org/qemu-devel/[email protected]
kvm-unit-tests: https://lore.kernel.org/kvm/[email protected]

v13-v14 Changelog:
- Rewrite crud about vcpu->arch.perf_capabilities;
- Add PERF_CAPABILITIES testcases to tools/testing/selftests/kvm;
- Add basic LBR testcases to the kvm-unit-tests (w/ QEMU patches);
- Apply rewritten commit log from Paolo;
- Queued the first patch "KVM: x86: Move common set/get handler ...";
- Rename 'already_passthrough' to 'msr_passthrough';
- Check the values of MSR_IA32_PERF_CAPABILITIES early;
- Call kvm_x86_ops.pmu_ops->cleanup() always and drop extra_cleanup;
- Use INTEL_PMC_IDX_FIXED_VLBR directly;
- Fix a bug in the vmx_get_perf_capabilities();

Previous:
https://lore.kernel.org/kvm/[email protected]/

---

Last branch recording (LBR) is a performance monitoring unit (PMU)
feature on Intel processors that records a running trace of the most
recent branches taken by the processor in the LBR stack. This patch
series enables this feature for KVM guests.

With this patch set, the following error no longer occurs, and cloud
developers can better understand their programs with less profiling overhead:

$ perf record -b lbr ${WORKLOAD}
or $ perf record --call-graph lbr ${WORKLOAD}
Error:
cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

User space can configure whether LBR is enabled for each guest via the
MSR_IA32_PERF_CAPABILITIES MSR. As a first step, a guest can enable the
LBR feature only if its CPU model is the same as the host's, since LBR
is still a model-specific feature.

If it is enabled on the guest, the guest LBR driver accesses the LBR
MSRs (including IA32_DEBUGCTLMSR and the record MSRs) as the host does.
The first guest access to the LBR-related MSRs is always intercepted.
The KVM trap handler creates a special LBR event (called the guest LBR
event) which enables the callstack mode and has no hardware counter
assigned. The host perf enables and schedules this event as usual.

The guest's first access to an LBR register is trapped to KVM, which
creates a guest LBR perf event. It is a regular LBR perf event which gets
the LBR facility assigned from the perf subsystem. Once that succeeds,
the LBR stack MSRs are passed through to the guest for efficient access.
However, if another host LBR event comes in and takes over the LBR
facility, the LBR MSRs are made interceptible again, and subsequent guest
accesses to them are trapped and return no meaningful data.
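
As an illustration only, the trap / pass-through / reclaim sequence
described above can be modelled as a small state machine. This is a
sketch under stated assumptions, not KVM code: struct lbr_model and the
model_*() helpers are hypothetical names, and the model assumes host
perf can always schedule a newly created event.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one vCPU's guest LBR state; not a KVM structure. */
struct lbr_model {
	bool has_event;    /* a guest LBR perf event exists */
	bool event_active; /* host perf currently assigns LBR to that event */
	bool passthrough;  /* LBR MSRs are not intercepted */
};

/* A trapped guest access to an LBR MSR creates the event if needed. */
static void model_trap_access(struct lbr_model *m)
{
	assert(m);
	if (!m->has_event) {
		m->has_event = true;
		m->event_active = true; /* assumption: scheduling succeeds */
	}
}

/* Re-evaluated before each VM entry: pass through only while active. */
static void model_before_entry(struct lbr_model *m)
{
	assert(m);
	m->passthrough = m->has_event && m->event_active;
}

/* A higher-priority host LBR event takes over the LBR facility. */
static void model_host_reclaim(struct lbr_model *m)
{
	assert(m);
	m->event_active = false;
}
```

Starting from an all-zero model, a trapped access followed by the
pre-entry check yields pass-through; after a host reclaim, the next
pre-entry check falls back to interception.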

Because saving/restoring tens of LBR MSRs (e.g. 32 LBR stack entries) on
every VMX transition would add excessive overhead to the frequent VMX
transitions, the guest LBR event instead saves/restores the LBR stack MSRs
(including the LBR_SELECT MSR) during context switches, using the native
LBR event callstack mechanism.

If the guest no longer accesses the LBR-related MSRs within a scheduling
time slice and the LBR enable bit is unset, the vPMU releases its guest
LBR event like a normal event of an unused vPMC, and the pass-through
state of the LBR stack MSRs is canceled.

---

LBR testcase:
echo 1 > /proc/sys/kernel/watchdog
echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent
echo 5000 > /proc/sys/kernel/perf_event_max_sample_rate
echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
perf record -b ./br_instr a
(perf record --call-graph lbr ./br_instr a)

- Perf report on the host:
Samples: 72K of event 'cycles', Event count (approx.): 72512
Overhead  Command   Source Shared Object  Source Symbol  Target Symbol  Basic Block Cycles
  12.12%  br_instr  br_instr              [.] cmp_end    [.] lfsr_cond  1
  11.05%  br_instr  br_instr              [.] lfsr_cond  [.] cmp_end    5
   8.81%  br_instr  br_instr              [.] lfsr_cond  [.] cmp_end    4
   5.04%  br_instr  br_instr              [.] cmp_end    [.] lfsr_cond  20
   4.92%  br_instr  br_instr              [.] lfsr_cond  [.] cmp_end    6
   4.88%  br_instr  br_instr              [.] cmp_end    [.] lfsr_cond  6
   4.58%  br_instr  br_instr              [.] cmp_end    [.] lfsr_cond  5

- Perf report on the guest:
Samples: 92K of event 'cycles', Event count (approx.): 92544
Overhead  Command   Source Shared Object  Source Symbol  Target Symbol  Basic Block Cycles
  12.03%  br_instr  br_instr              [.] cmp_end    [.] lfsr_cond  1
  11.09%  br_instr  br_instr              [.] lfsr_cond  [.] cmp_end    5
   8.57%  br_instr  br_instr              [.] lfsr_cond  [.] cmp_end    4
   5.08%  br_instr  br_instr              [.] lfsr_cond  [.] cmp_end    6
   5.06%  br_instr  br_instr              [.] cmp_end    [.] lfsr_cond  20
   4.87%  br_instr  br_instr              [.] cmp_end    [.] lfsr_cond  6
   4.70%  br_instr  br_instr              [.] cmp_end    [.] lfsr_cond  5

Conclusion: the profiling results on the guest are similar to those on the host.

Like Xu (11):
KVM: x86/vmx: Make vmx_set_intercept_for_msr() non-static
KVM: x86/pmu: Set up IA32_PERF_CAPABILITIES if PDCM bit is available
KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabled
KVM: vmx/pmu: Expose DEBUGCTLMSR_LBR in the MSR_IA32_DEBUGCTLMSR
KVM: vmx/pmu: Create a guest LBR event when vcpu sets DEBUGCTLMSR_LBR
KVM: vmx/pmu: Pass-through LBR msrs when the guest LBR event is ACTIVE
KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation
KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMI
KVM: vmx/pmu: Release guest LBR event via lazy release mechanism
KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES
selftests: kvm/x86: add test for pmu msr MSR_IA32_PERF_CAPABILITIES

arch/x86/kvm/pmu.c | 8 +-
arch/x86/kvm/pmu.h | 2 +
arch/x86/kvm/vmx/capabilities.h | 19 +-
arch/x86/kvm/vmx/pmu_intel.c | 281 +++++++++++++++++-
arch/x86/kvm/vmx/vmx.c | 55 +++-
arch/x86/kvm/vmx/vmx.h | 28 ++
arch/x86/kvm/x86.c | 2 +-
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/x86_64/vmx_pmu_msrs_test.c | 149 ++++++++++
10 files changed, 524 insertions(+), 22 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_pmu_msrs_test.c

--
2.29.2


2021-02-01 05:22:34

by Like Xu

Subject: [PATCH v14 01/11] KVM: x86/vmx: Make vmx_set_intercept_for_msr() non-static

To keep code responsibilities clear, vmx_set_intercept_for_msr() may be
reused and invoked from other vmx-specific files (e.g. pmu_intel.c), so
expose it in order to pass through the LBR MSRs later.

Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/kvm/vmx/vmx.h | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6f94620fd0db..4aa378c20986 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3787,7 +3787,7 @@ static __always_inline void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu,
vmx_set_msr_bitmap_write(msr_bitmap, msr);
}

-static __always_inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu,
+void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu,
u32 msr, int type, bool value)
{
if (value)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 9d3a557949ac..adc40d36909c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -341,6 +341,8 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu);
void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp);
int vmx_find_loadstore_msr_slot(struct vmx_msrs *m, u32 msr);
void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu);
+void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu,
+ u32 msr, int type, bool value);

static inline u8 vmx_get_rvi(void)
{
--
2.29.2

2021-02-01 05:24:07

by Like Xu

Subject: [PATCH v14 02/11] KVM: x86/pmu: Set up IA32_PERF_CAPABILITIES if PDCM bit is available

On Intel platforms, KVM userspace will be able to configure
MSR_IA32_PERF_CAPABILITIES to adjust the visibility of guest
PMU features for vPMU-enabled guests.

Once MSR_IA32_PERF_CAPABILITIES is changed via vmx_set_msr(),
the adjustment in intel_pmu_refresh() will be triggered. To
ensure that the new value is kept, the default initialization
path is moved to intel_pmu_init().

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 5 ++---
arch/x86/kvm/vmx/vmx.c | 5 +++++
arch/x86/kvm/x86.c | 2 +-
3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index cdf5f34518f4..f632039173ff 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -327,7 +327,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
pmu->version = 0;
pmu->reserved_bits = 0xffffffff00200000ull;
- vcpu->arch.perf_capabilities = 0;

entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
if (!entry)
@@ -340,8 +339,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
return;

perf_get_x86_pmu_capability(&x86_pmu);
- if (guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
- vcpu->arch.perf_capabilities = vmx_get_perf_capabilities();

pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
x86_pmu.num_counters_gp);
@@ -405,6 +402,8 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
pmu->fixed_counters[i].idx = i + INTEL_PMC_IDX_FIXED;
pmu->fixed_counters[i].current_config = 0;
}
+
+ vcpu->arch.perf_capabilities = 0;
}

static void intel_pmu_reset(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4aa378c20986..387adaa1194f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2209,6 +2209,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if ((data >> 32) != 0)
return 1;
goto find_uret_msr;
+ case MSR_IA32_PERF_CAPABILITIES:
+ if (data && !vcpu_to_pmu(vcpu)->version)
+ return 1;
+ ret = kvm_set_msr_common(vcpu, msr_info);
+ break;

default:
find_uret_msr:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 42ef3659b20a..bdb0b3a37147 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3038,7 +3038,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;

vcpu->arch.perf_capabilities = data;
-
+ kvm_pmu_refresh(vcpu);
return 0;
}
case MSR_EFER:
--
2.29.2

2021-02-01 05:25:47

by Like Xu

Subject: [PATCH v14 03/11] KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabled

Userspace can set bits [5:0] of the IA32_PERF_CAPABILITIES MSR, which
describe the format of the records stored in the LBR stack.

The LBR will be enabled on the guest if host perf supports LBR
(checked via x86_perf_get_lbr()) and the vcpu model is compatible
with the host one.
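
For illustration, the LBR_FMT part of this check can be sketched as a
standalone predicate. This is a simplified model, not the patch itself:
lbr_fmt_write_allowed() is a hypothetical name, host_caps stands in for
the value returned by vmx_get_perf_capabilities(), and model_compatible
stands in for the result of intel_pmu_lbr_is_compatible(). The separate
vPMU version check done in vmx_set_msr() is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PMU_CAP_LBR_FMT 0x3f /* bits 5:0, as defined in this patch */

/* The mask must be a contiguous run of low bits for the checks below. */
static_assert((PMU_CAP_LBR_FMT & (PMU_CAP_LBR_FMT + 1)) == 0,
	      "LBR_FMT must be a low-bit mask");

/*
 * Hypothetical helper: returns true if a userspace write of 'data' to
 * IA32_PERF_CAPABILITIES would pass the LBR_FMT checks.
 */
static bool lbr_fmt_write_allowed(uint64_t data, uint64_t host_caps,
				  bool model_compatible)
{
	uint64_t fmt = data & PMU_CAP_LBR_FMT;

	/* No LBR format requested: nothing LBR-specific to validate. */
	if (!fmt)
		return true;

	/* The requested format must match the host's format exactly. */
	if (fmt != (host_caps & PMU_CAP_LBR_FMT))
		return false;

	/* Host perf must support LBR and the CPU models must match. */
	return model_compatible;
}
```

A write that leaves bits 5:0 clear always passes this part of the check,
so guests without LBR are unaffected.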

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/capabilities.h | 1 +
arch/x86/kvm/vmx/pmu_intel.c | 17 +++++++++++++++++
arch/x86/kvm/vmx/vmx.c | 7 +++++++
arch/x86/kvm/vmx/vmx.h | 11 +++++++++++
4 files changed, 36 insertions(+)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index a58cf3655351..db1178a66d93 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -19,6 +19,7 @@ extern int __read_mostly pt_mode;
#define PT_MODE_HOST_GUEST 1

#define PMU_CAP_FW_WRITES (1ULL << 13)
+#define PMU_CAP_LBR_FMT 0x3f

struct nested_vmx_msrs {
/*
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index f632039173ff..01b2cd8eca47 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -168,6 +168,21 @@ static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
return get_gp_pmc(pmu, msr, MSR_IA32_PMC0);
}

+bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu)
+{
+ struct x86_pmu_lbr *lbr = vcpu_to_lbr_records(vcpu);
+
+ /*
+ * As a first step, a guest could only enable LBR feature if its
+ * cpu model is the same as the host because the LBR registers
+ * would be pass-through to the guest and they're model specific.
+ */
+ if (boot_cpu_data.x86_model != guest_cpuid_model(vcpu))
+ return false;
+
+ return !x86_perf_get_lbr(lbr);
+}
+
static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -388,6 +403,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
{
int i;
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);

for (i = 0; i < INTEL_PMC_MAX_GENERIC; i++) {
pmu->gp_counters[i].type = KVM_PMC_GP;
@@ -404,6 +420,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
}

vcpu->arch.perf_capabilities = 0;
+ lbr_desc->records.nr = 0;
}

static void intel_pmu_reset(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 387adaa1194f..af9c7632ecfa 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2212,6 +2212,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_PERF_CAPABILITIES:
if (data && !vcpu_to_pmu(vcpu)->version)
return 1;
+ if (data & PMU_CAP_LBR_FMT) {
+ if ((data & PMU_CAP_LBR_FMT) !=
+ (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT))
+ return 1;
+ if (!intel_pmu_lbr_is_compatible(vcpu))
+ return 1;
+ }
ret = kvm_set_msr_common(vcpu, msr_info);
break;

diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index adc40d36909c..095e357e5316 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -70,6 +70,16 @@ struct pt_desc {
struct pt_ctx guest;
};

+#define vcpu_to_lbr_desc(vcpu) (&to_vmx(vcpu)->lbr_desc)
+#define vcpu_to_lbr_records(vcpu) (&to_vmx(vcpu)->lbr_desc.records)
+
+bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu);
+
+struct lbr_desc {
+ /* Basic info about guest LBR records. */
+ struct x86_pmu_lbr records;
+};
+
/*
* The nested_vmx structure is part of vcpu_vmx, and holds information we need
* for correct emulation of VMX (i.e., nested VMX) on this vcpu.
@@ -279,6 +289,7 @@ struct vcpu_vmx {
u64 ept_pointer;

struct pt_desc pt_desc;
+ struct lbr_desc lbr_desc;

/* Save desired MSR intercept (read: pass-through) state */
#define MAX_POSSIBLE_PASSTHROUGH_MSRS 13
--
2.29.2

2021-02-01 05:25:49

by Like Xu

Subject: [PATCH v14 05/11] KVM: vmx/pmu: Create a guest LBR event when vcpu sets DEBUGCTLMSR_LBR

When the vcpu sets DEBUGCTLMSR_LBR in MSR_IA32_DEBUGCTLMSR, the KVM handler
creates a guest LBR event which enables the callstack mode and has no
hardware counter assigned. The host perf schedules and enables this
event as usual, but in an exclusive way.

For now, the guest LBR event is released when the vPMU is reset; later in
this series, the lazy release mechanism is applied to it as for a vPMC.

Suggested-by: Andi Kleen <[email protected]>
Co-developed-by: Wei Wang <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 63 ++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.c | 3 ++
arch/x86/kvm/vmx/vmx.h | 10 ++++++
3 files changed, 76 insertions(+)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index e75a957b2068..22c271a1c7a4 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -224,6 +224,66 @@ static struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
return pmc;
}

+static inline void intel_pmu_release_guest_lbr_event(struct kvm_vcpu *vcpu)
+{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ if (lbr_desc->event) {
+ perf_event_release_kernel(lbr_desc->event);
+ lbr_desc->event = NULL;
+ vcpu_to_pmu(vcpu)->event_count--;
+ }
+}
+
+int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu)
+{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+ struct perf_event *event;
+
+ /*
+ * The perf_event_attr is constructed in the minimum efficient way:
+ * - set 'pinned = true' to make it task pinned so that if another
+ * cpu pinned event reclaims LBR, the event->oncpu will be set to -1;
+ * - set '.exclude_host = true' to record guest branches behavior;
+ *
+ * - set '.config = INTEL_FIXED_VLBR_EVENT' to indicate that host perf
+ * should schedule the event without a real HW counter but a fake one;
+ * check is_guest_lbr_event() and __intel_get_event_constraints();
+ *
+ * - set 'sample_type = PERF_SAMPLE_BRANCH_STACK' and
+ * 'branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
+ * PERF_SAMPLE_BRANCH_USER' to configure it as a LBR callstack
+ * event, which helps KVM to save/restore guest LBR records
+ * during host context switches and reduces quite a lot overhead,
+ * check branch_user_callstack() and intel_pmu_lbr_sched_task();
+ */
+ struct perf_event_attr attr = {
+ .type = PERF_TYPE_RAW,
+ .size = sizeof(attr),
+ .config = INTEL_FIXED_VLBR_EVENT,
+ .sample_type = PERF_SAMPLE_BRANCH_STACK,
+ .pinned = true,
+ .exclude_host = true,
+ .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
+ PERF_SAMPLE_BRANCH_USER,
+ };
+
+ if (unlikely(lbr_desc->event))
+ return 0;
+
+ event = perf_event_create_kernel_counter(&attr, -1,
+ current, NULL, NULL);
+ if (IS_ERR(event)) {
+ pr_debug_ratelimited("%s: failed %ld\n",
+ __func__, PTR_ERR(event));
+ return -ENOENT;
+ }
+ lbr_desc->event = event;
+ pmu->event_count++;
+ return 0;
+}
+
static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -428,6 +488,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)

vcpu->arch.perf_capabilities = 0;
lbr_desc->records.nr = 0;
+ lbr_desc->event = NULL;
}

static void intel_pmu_reset(struct kvm_vcpu *vcpu)
@@ -452,6 +513,8 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)

pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status =
pmu->global_ovf_ctrl = 0;
+
+ intel_pmu_release_guest_lbr_event(vcpu);
}

struct kvm_pmu_ops intel_pmu_ops = {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3c008dec407c..c85a42b39bed 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2023,6 +2023,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
data &= ~DEBUGCTLMSR_BTF;
}
vmcs_write64(GUEST_IA32_DEBUGCTL, data);
+ if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
+ (data & DEBUGCTLMSR_LBR))
+ intel_pmu_create_guest_lbr_event(vcpu);
return 0;
case MSR_IA32_BNDCFGS:
if (!kvm_mpx_supported() ||
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 1b0bbfffa1f0..ae645c2082ba 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -76,9 +76,19 @@ struct pt_desc {
bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu);
bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu);

+int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu);
+
struct lbr_desc {
/* Basic info about guest LBR records. */
struct x86_pmu_lbr records;
+
+ /*
+ * Emulate LBR feature via passthrough LBR registers when the
+ * per-vcpu guest LBR event is scheduled on the current pcpu.
+ *
+ * The records may be inaccurate if the host reclaims the LBR.
+ */
+ struct perf_event *event;
};

/*
--
2.29.2

2021-02-01 05:26:29

by Like Xu

Subject: [PATCH v14 04/11] KVM: vmx/pmu: Expose DEBUGCTLMSR_LBR in the MSR_IA32_DEBUGCTLMSR

When the DEBUGCTLMSR_LBR bit 0 is set, the processor records a running
trace of the most recent branches, interrupts, and/or exceptions taken
by the processor (prior to a debug exception being generated) in the
last branch record (LBR) stack.

Add vcpu_supported_debugctl() to throw #GP for DEBUGCTLMSR_LBR
based on the per-guest LBR setting.
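
The resulting write path can be sketched as follows. This is a
simplified userspace model rather than the kernel code: debugctl_write()
is a hypothetical helper, host_lbr stands in for the host advertising an
LBR format in its perf capabilities, and guest_lbr_enabled stands in for
intel_pmu_lbr_is_enabled(). A false return models the #GP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEBUGCTLMSR_LBR (1ULL << 0) /* enable last branch recording */
#define DEBUGCTLMSR_BTF (1ULL << 1) /* single-step on branches */

/*
 * Hypothetical model of a guest write to IA32_DEBUGCTL. On success the
 * value that would be written to the VMCS is stored in *vmcs_val.
 */
static bool debugctl_write(uint64_t data, bool host_lbr,
			   bool guest_lbr_enabled, uint64_t *vmcs_val)
{
	uint64_t supported = DEBUGCTLMSR_BTF;

	assert(vmcs_val);

	/* LBR is only a supported bit when both host and guest allow it. */
	if (host_lbr && guest_lbr_enabled)
		supported |= DEBUGCTLMSR_LBR;

	/* Any unsupported bit is rejected. */
	if (data & ~supported)
		return false;

	/* BTF is accepted but not implemented, so it is dropped. */
	*vmcs_val = data & ~DEBUGCTLMSR_BTF;
	return true;
}
```

In this model a guest without LBR enabled is rejected when it sets bit 0,
while BTF is silently cleared before the value reaches the VMCS.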

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/capabilities.h | 7 ++++++-
arch/x86/kvm/vmx/pmu_intel.c | 7 +++++++
arch/x86/kvm/vmx/vmx.c | 28 +++++++++++++++++-----------
arch/x86/kvm/vmx/vmx.h | 1 +
4 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index db1178a66d93..62aa7a701ebb 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -381,7 +381,12 @@ static inline u64 vmx_get_perf_capabilities(void)

static inline u64 vmx_supported_debugctl(void)
{
- return DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF;
+ u64 debugctl = DEBUGCTLMSR_BTF;
+
+ if (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT)
+ debugctl |= DEBUGCTLMSR_LBR;
+
+ return debugctl;
}

#endif /* __KVM_X86_VMX_CAPS_H */
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 01b2cd8eca47..e75a957b2068 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -183,6 +183,13 @@ bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu)
return !x86_perf_get_lbr(lbr);
}

+bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu)
+{
+ struct x86_pmu_lbr *lbr = vcpu_to_lbr_records(vcpu);
+
+ return lbr->nr && (vcpu->arch.perf_capabilities & PMU_CAP_LBR_FMT);
+}
+
static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index af9c7632ecfa..3c008dec407c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1925,7 +1925,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
goto find_uret_msr;
case MSR_IA32_DEBUGCTLMSR:
- msr_info->data = 0;
+ msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
break;
default:
find_uret_msr:
@@ -1950,6 +1950,16 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
return (unsigned long)data;
}

+static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu)
+{
+ u64 debugctl = vmx_supported_debugctl();
+
+ if (!intel_pmu_lbr_is_enabled(vcpu))
+ debugctl &= ~DEBUGCTLMSR_LBR;
+
+ return debugctl;
+}
+
/*
* Writes msr value into the appropriate "register".
* Returns 0 on success, non-0 otherwise.
@@ -2005,18 +2015,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
VM_EXIT_SAVE_DEBUG_CONTROLS)
get_vmcs12(vcpu)->guest_ia32_debugctl = data;

- if (!data) {
- /* We support the non-activated case already */
- return 0;
- } else if (data & ~vmx_supported_debugctl()) {
- /*
- * Values other than LBR and BTF are vendor-specific,
- * thus reserved and should throw a #GP.
- */
+ if (data & ~vcpu_supported_debugctl(vcpu))
return 1;
+ if (data & DEBUGCTLMSR_BTF) {
+ vcpu_unimpl(vcpu, "%s: BTF in MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
+ __func__, data);
+ data &= ~DEBUGCTLMSR_BTF;
}
- vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
- __func__, data);
+ vmcs_write64(GUEST_IA32_DEBUGCTL, data);
return 0;
case MSR_IA32_BNDCFGS:
if (!kvm_mpx_supported() ||
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 095e357e5316..1b0bbfffa1f0 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -74,6 +74,7 @@ struct pt_desc {
#define vcpu_to_lbr_records(vcpu) (&to_vmx(vcpu)->lbr_desc.records)

bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu);
+bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu);

struct lbr_desc {
/* Basic info about guest LBR records. */
--
2.29.2

2021-02-01 05:28:26

by Like Xu

Subject: [PATCH v14 06/11] KVM: vmx/pmu: Pass-through LBR msrs when the guest LBR event is ACTIVE

In addition to DEBUGCTLMSR_LBR, any KVM trap caused by an LBR MSR access
results in the creation of a per-vcpu guest LBR event.

If the guest LBR event is scheduled on with the corresponding vcpu context,
KVM passes all LBR record MSRs through to the guest. The LBR callstack
mechanism implemented in the host helps save/restore the guest LBR
records during event context switches, which avoids the overhead of
saving/restoring tens of LBR MSRs (e.g. 32 LBR record entries) in the
much more frequent VMX transitions.

To avoid reclaiming LBR resources from any higher priority event on the
host, KVM always checks for the existence of the guest LBR event and its
state as late as possible before vm-entry. A negative result cancels the
pass-through state, which also prevents accesses to the real registers
and potential data leakage. If the host reclaims the LBR between two
checks, the interception state and LBR records are safely preserved
thanks to the native save/restore support of the guest LBR event.

KVM emits a pr_warn() when the LBR hardware is unavailable to the
guest LBR event. The administrator is expected to remind users that the
guest results may be inaccurate if someone is using LBR to profile the
hypervisor on the host side.
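
As a rough sketch, the pre-entry decision described above reduces to a
three-way outcome. The enum and lbr_entry_decision() are hypothetical
names used only for illustration; the inputs stand in for the presence
of the guest LBR event, whether its state is ACTIVE, and whether the
guest has DEBUGCTLMSR_LBR set.

```c
#include <assert.h>
#include <stdbool.h>

enum lbr_entry_action {
	LBR_PASSTHROUGH,    /* guest accesses the LBR MSRs directly */
	LBR_INTERCEPT,      /* LBR MSR accesses trap to KVM */
	LBR_INTERCEPT_WARN, /* trap, and warn that LBR is unavailable */
};

/* Hypothetical model of the check done right before VM entry. */
static enum lbr_entry_action lbr_entry_decision(bool has_event,
						bool event_active,
						bool guest_lbr_bit)
{
	/* An event cannot be active if it does not exist. */
	assert(has_event || !event_active);

	/* No event: intercept, and warn only if the guest expects LBR. */
	if (!has_event)
		return guest_lbr_bit ? LBR_INTERCEPT_WARN : LBR_INTERCEPT;

	/* The host reclaimed the LBR facility from the guest event. */
	if (!event_active)
		return LBR_INTERCEPT_WARN;

	return LBR_PASSTHROUGH;
}
```

Only the case where the event exists and is active leads to pass-through;
every other combination keeps the MSRs intercepted.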

Suggested-by: Andi Kleen <[email protected]>
Co-developed-by: Wei Wang <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 127 ++++++++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/vmx.c | 10 +++
arch/x86/kvm/vmx/vmx.h | 1 +
3 files changed, 135 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 22c271a1c7a4..287fc14f0445 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -190,6 +190,24 @@ bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu)
return lbr->nr && (vcpu->arch.perf_capabilities & PMU_CAP_LBR_FMT);
}

+static bool intel_pmu_is_valid_lbr_msr(struct kvm_vcpu *vcpu, u32 index)
+{
+ struct x86_pmu_lbr *records = vcpu_to_lbr_records(vcpu);
+ bool ret = false;
+
+ if (!intel_pmu_lbr_is_enabled(vcpu))
+ return ret;
+
+ ret = (index == MSR_LBR_SELECT) || (index == MSR_LBR_TOS) ||
+ (index >= records->from && index < records->from + records->nr) ||
+ (index >= records->to && index < records->to + records->nr);
+
+ if (!ret && records->info)
+ ret = (index >= records->info && index < records->info + records->nr);
+
+ return ret;
+}
+
static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -205,7 +223,8 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
default:
ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0) ||
get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0) ||
- get_fixed_pmc(pmu, msr) || get_fw_gp_pmc(pmu, msr);
+ get_fixed_pmc(pmu, msr) || get_fw_gp_pmc(pmu, msr) ||
+ intel_pmu_is_valid_lbr_msr(vcpu, msr);
break;
}

@@ -284,6 +303,46 @@ int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu)
return 0;
}

+/*
+ * It's safe to access LBR msrs from guest when they have not
+ * been passthrough since the host would help restore or reset
+ * the LBR msrs records when the guest LBR event is scheduled in.
+ */
+static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
+ struct msr_data *msr_info, bool read)
+{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+ u32 index = msr_info->index;
+
+ if (!intel_pmu_is_valid_lbr_msr(vcpu, index))
+ return false;
+
+ if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
+ goto dummy;
+
+ /*
+ * Disable irq to ensure the LBR feature doesn't get reclaimed by the
+ * host at the time the value is read from the msr, and this avoids the
+ * host LBR value to be leaked to the guest. If LBR has been reclaimed,
+ * return 0 on guest reads.
+ */
+ local_irq_disable();
+ if (lbr_desc->event->state == PERF_EVENT_STATE_ACTIVE) {
+ if (read)
+ rdmsrl(index, msr_info->data);
+ else
+ wrmsrl(index, msr_info->data);
+ local_irq_enable();
+ return true;
+ }
+ local_irq_enable();
+
+dummy:
+ if (read)
+ msr_info->data = 0;
+ return true;
+}
+
static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -318,7 +377,8 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
} else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) {
msr_info->data = pmc->eventsel;
return 0;
- }
+ } else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, true))
+ return 0;
}

return 1;
@@ -389,7 +449,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
reprogram_gp_counter(pmc, data);
return 0;
}
- }
+ } else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, false))
+ return 0;
}

return 1;
@@ -517,6 +578,66 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
intel_pmu_release_guest_lbr_event(vcpu);
}

+static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)
+{
+ struct x86_pmu_lbr *lbr = vcpu_to_lbr_records(vcpu);
+ int i;
+
+ for (i = 0; i < lbr->nr; i++) {
+ vmx_set_intercept_for_msr(vcpu, lbr->from + i, MSR_TYPE_RW, set);
+ vmx_set_intercept_for_msr(vcpu, lbr->to + i, MSR_TYPE_RW, set);
+ if (lbr->info)
+ vmx_set_intercept_for_msr(vcpu, lbr->info + i, MSR_TYPE_RW, set);
+ }
+
+ vmx_set_intercept_for_msr(vcpu, MSR_LBR_SELECT, MSR_TYPE_RW, set);
+ vmx_set_intercept_for_msr(vcpu, MSR_LBR_TOS, MSR_TYPE_RW, set);
+}
+
+static inline void vmx_disable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
+{
+ vmx_update_intercept_for_lbr_msrs(vcpu, true);
+}
+
+static inline void vmx_enable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
+{
+ vmx_update_intercept_for_lbr_msrs(vcpu, false);
+}
+
+/*
+ * Higher priority host perf events (e.g. cpu pinned) could reclaim the
+ * pmu resources (e.g. LBR) that were assigned to the guest. This is
+ * usually done via ipi calls (more details in perf_install_in_context).
+ *
+ * Before entering the non-root mode (with irq disabled here), double
+ * confirm that the pmu features enabled to the guest are not reclaimed
+ * by higher priority host events. Otherwise, disallow vcpu's access to
+ * the reclaimed features.
+ */
+void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
+{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ if (!lbr_desc->event) {
+ vmx_disable_lbr_msrs_passthrough(vcpu);
+ if (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)
+ goto warn;
+ return;
+ }
+
+ if (lbr_desc->event->state < PERF_EVENT_STATE_ACTIVE) {
+ vmx_disable_lbr_msrs_passthrough(vcpu);
+ goto warn;
+ } else
+ vmx_enable_lbr_msrs_passthrough(vcpu);
+
+ return;
+
+warn:
+ pr_warn_ratelimited("kvm: vcpu-%d: fail to passthrough LBR.\n",
+ vcpu->vcpu_id);
+}
+
struct kvm_pmu_ops intel_pmu_ops = {
.find_arch_event = intel_find_arch_event,
.find_fixed_event = intel_find_fixed_event,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c85a42b39bed..40fdeb394328 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -658,6 +658,14 @@ static bool is_valid_passthrough_msr(u32 msr)
case MSR_IA32_RTIT_CR3_MATCH:
case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
/* PT MSRs. These are handled in pt_update_intercept_for_msr() */
+ case MSR_LBR_SELECT:
+ case MSR_LBR_TOS:
+ case MSR_LBR_INFO_0 ... MSR_LBR_INFO_0 + 31:
+ case MSR_LBR_NHM_FROM ... MSR_LBR_NHM_FROM + 31:
+ case MSR_LBR_NHM_TO ... MSR_LBR_NHM_TO + 31:
+ case MSR_LBR_CORE_FROM ... MSR_LBR_CORE_FROM + 8:
+ case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
+ /* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
return true;
}

@@ -6730,6 +6738,8 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
pt_guest_enter(vmx);

atomic_switch_perf_msrs(vmx);
+ if (intel_pmu_lbr_is_enabled(vcpu))
+ vmx_passthrough_lbr_msrs(vcpu);

if (enable_preemption_timer)
vmx_update_hv_timer(vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index ae645c2082ba..863bb3fe73d4 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -77,6 +77,7 @@ bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu);
bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu);

int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu);
+void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu);

struct lbr_desc {
/* Basic info about guest LBR records. */
--
2.29.2

2021-02-01 05:35:16

by Like Xu

Subject: [PATCH v14 09/11] KVM: vmx/pmu: Release guest LBR event via lazy release mechanism

The vPMU uses INTEL_PMC_IDX_FIXED_VLBR (bit 58) in 'pmu->pmc_in_use' to
indicate whether a guest LBR event is still needed by the vcpu. If the
vcpu no longer accesses LBR-related registers within a scheduling time
slice, and the LBR enable bit has been unset, the vPMU treats the
guest LBR event as a normal event of a vPMC and releases it as usual.
The pass-through state of the LBR record MSRs is also cancelled.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/pmu.c | 3 +++
arch/x86/kvm/pmu.h | 1 +
arch/x86/kvm/vmx/pmu_intel.c | 21 ++++++++++++++++++++-
3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 405890c723a1..136dc2f3c5d3 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -476,6 +476,9 @@ void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
pmc_stop_counter(pmc);
}

+ if (kvm_x86_ops.pmu_ops->cleanup)
+ kvm_x86_ops.pmu_ops->cleanup(vcpu);
+
bitmap_zero(pmu->pmc_in_use, X86_PMC_IDX_MAX);
}

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 742a4e98df8c..7b30bc967af3 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -40,6 +40,7 @@ struct kvm_pmu_ops {
void (*init)(struct kvm_vcpu *vcpu);
void (*reset)(struct kvm_vcpu *vcpu);
void (*deliver_pmi)(struct kvm_vcpu *vcpu);
+ void (*cleanup)(struct kvm_vcpu *vcpu);
};

static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 51edd9c1adfa..23cd31b849f4 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -288,8 +288,10 @@ int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu)
PERF_SAMPLE_BRANCH_USER,
};

- if (unlikely(lbr_desc->event))
+ if (unlikely(lbr_desc->event)) {
+ __set_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use);
return 0;
+ }

event = perf_event_create_kernel_counter(&attr, -1,
current, NULL, NULL);
@@ -300,6 +302,7 @@ int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu)
}
lbr_desc->event = event;
pmu->event_count++;
+ __set_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use);
return 0;
}

@@ -332,9 +335,11 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
rdmsrl(index, msr_info->data);
else
wrmsrl(index, msr_info->data);
+ __set_bit(INTEL_PMC_IDX_FIXED_VLBR, vcpu_to_pmu(vcpu)->pmc_in_use);
local_irq_enable();
return true;
}
+ clear_bit(INTEL_PMC_IDX_FIXED_VLBR, vcpu_to_pmu(vcpu)->pmc_in_use);
local_irq_enable();

dummy:
@@ -463,6 +468,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
struct kvm_cpuid_entry2 *entry;
union cpuid10_eax eax;
union cpuid10_edx edx;
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);

pmu->nr_arch_gp_counters = 0;
pmu->nr_arch_fixed_counters = 0;
@@ -482,6 +488,8 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
return;

perf_get_x86_pmu_capability(&x86_pmu);
+ if (lbr_desc->records.nr)
+ bitmap_set(pmu->all_valid_pmc_idx, INTEL_PMC_IDX_FIXED_VLBR, 1);

pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
x86_pmu.num_counters_gp);
@@ -658,17 +666,21 @@ static inline void vmx_enable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
*/
void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
{
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);

if (!lbr_desc->event) {
vmx_disable_lbr_msrs_passthrough(vcpu);
if (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)
goto warn;
+ if (test_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use))
+ goto warn;
return;
}

if (lbr_desc->event->state < PERF_EVENT_STATE_ACTIVE) {
vmx_disable_lbr_msrs_passthrough(vcpu);
+ __clear_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use);
goto warn;
} else
vmx_enable_lbr_msrs_passthrough(vcpu);
@@ -680,6 +692,12 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
vcpu->vcpu_id);
}

+static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
+{
+ if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
+ intel_pmu_release_guest_lbr_event(vcpu);
+}
+
struct kvm_pmu_ops intel_pmu_ops = {
.find_arch_event = intel_find_arch_event,
.find_fixed_event = intel_find_fixed_event,
@@ -695,4 +713,5 @@ struct kvm_pmu_ops intel_pmu_ops = {
.init = intel_pmu_init,
.reset = intel_pmu_reset,
.deliver_pmi = intel_pmu_deliver_pmi,
+ .cleanup = intel_pmu_cleanup,
};
--
2.29.2
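
For readers following the lazy release logic, here is a minimal user-space
sketch of the idea described in the commit message above. It is not the KVM
code: the toy_* names and the struct are hypothetical, and the real
kvm_pmu_cleanup() path has more conditions than this model. It only shows
how an "in use" bit that is set on access and cleared at each cleanup gives
the event one idle window before it is released.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_VLBR_IDX     58          /* stands in for INTEL_PMC_IDX_FIXED_VLBR */
#define TOY_DEBUGCTL_LBR (1ULL << 0) /* stands in for DEBUGCTLMSR_LBR */

/* Hypothetical, heavily reduced view of the per-vcpu state. */
struct toy_vcpu {
	uint64_t pmc_in_use; /* one bit per virtual counter, bit 58 for LBR */
	uint64_t debugctl;   /* guest IA32_DEBUGCTL value */
	bool lbr_event;      /* true while a guest LBR event is held */
};

/* The guest touched an LBR MSR: hold an event and mark it as in use. */
void toy_lbr_access(struct toy_vcpu *v)
{
	v->lbr_event = true;
	v->pmc_in_use |= 1ULL << TOY_VLBR_IDX;
}

/*
 * Periodic cleanup. The event is released only if it was idle during the
 * last window and the guest has LBR disabled. Returns true on release.
 */
bool toy_cleanup(struct toy_vcpu *v)
{
	bool idle = !(v->pmc_in_use & (1ULL << TOY_VLBR_IDX));
	bool released = false;

	if (v->lbr_event && idle && !(v->debugctl & TOY_DEBUGCTL_LBR)) {
		v->lbr_event = false;
		released = true;
	}
	v->pmc_in_use = 0; /* start a fresh usage window */
	return released;
}
```

In this model an event survives the cleanup that follows an access and is
dropped by the next one, unless the guest still has the LBR enable bit set.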

2021-02-01 06:37:16

by Like Xu

Subject: [PATCH v14 07/11] KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation

When the LBR record MSRs are already passed through, there is no
need to call vmx_update_intercept_for_lbr_msrs() again to pass them
through, and likewise when they are already intercepted.

Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 13 +++++++++++++
arch/x86/kvm/vmx/vmx.h | 3 +++
2 files changed, 16 insertions(+)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 287fc14f0445..60f395e18446 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -550,6 +550,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
vcpu->arch.perf_capabilities = 0;
lbr_desc->records.nr = 0;
lbr_desc->event = NULL;
+ lbr_desc->msr_passthrough = false;
}

static void intel_pmu_reset(struct kvm_vcpu *vcpu)
@@ -596,12 +597,24 @@ static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)

static inline void vmx_disable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ if (!lbr_desc->msr_passthrough)
+ return;
+
vmx_update_intercept_for_lbr_msrs(vcpu, true);
+ lbr_desc->msr_passthrough = false;
}

static inline void vmx_enable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ if (lbr_desc->msr_passthrough)
+ return;
+
vmx_update_intercept_for_lbr_msrs(vcpu, false);
+ lbr_desc->msr_passthrough = true;
}

/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 863bb3fe73d4..4d6a2624a204 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -90,6 +90,9 @@ struct lbr_desc {
* The records may be inaccurate if the host reclaims the LBR.
*/
struct perf_event *event;
+
+ /* True if LBRs are marked as not intercepted in the MSR bitmap */
+ bool msr_passthrough;
};

/*
--
2.29.2
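
The caching pattern in this patch can be sketched outside the kernel as
follows. This is only an illustration under assumptions: the toy_* names
are hypothetical, and toy_update_intercept() merely counts calls in place
of the real MSR bitmap update.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct lbr_desc, reduced to the cached flag. */
struct toy_lbr_desc {
	bool msr_passthrough; /* mirrors the field added by the patch */
	unsigned int updates; /* how many times the costly update ran */
};

/* Stand-in for vmx_update_intercept_for_lbr_msrs(); it only counts calls. */
void toy_update_intercept(struct toy_lbr_desc *d, bool intercept)
{
	(void)intercept;
	d->updates++;
}

/* Pass the MSRs through, skipping the update if that is already the state. */
void toy_enable_passthrough(struct toy_lbr_desc *d)
{
	if (d->msr_passthrough)
		return;
	toy_update_intercept(d, false);
	d->msr_passthrough = true;
}

/* Intercept the MSRs again, skipping the update if already intercepted. */
void toy_disable_passthrough(struct toy_lbr_desc *d)
{
	if (!d->msr_passthrough)
		return;
	toy_update_intercept(d, true);
	d->msr_passthrough = false;
}
```

Since the check runs on every VM entry, repeated calls in the same state
cost one flag test instead of a walk over all LBR MSRs.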

2021-02-01 06:37:37

by Like Xu

Subject: [PATCH v14 11/11] selftests: kvm/x86: add test for pmu msr MSR_IA32_PERF_CAPABILITIES

This test checks the effect of various CPUID settings on the
MSR_IA32_PERF_CAPABILITIES MSR, checks that whatever user space writes
with KVM_SET_MSR is _not_ modified by the guest and can be retrieved
with KVM_GET_MSR, and checks that invalid LBR formats are rejected.

Signed-off-by: Like Xu <[email protected]>
---
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/x86_64/vmx_pmu_msrs_test.c | 149 ++++++++++++++++++
3 files changed, 151 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_pmu_msrs_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index ce8f4ad39684..28b71efe52a0 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -25,6 +25,7 @@
/x86_64/vmx_set_nested_state_test
/x86_64/vmx_tsc_adjust_test
/x86_64/xss_msr_test
+/x86_64/vmx_pmu_msrs_test
/demand_paging_test
/dirty_log_test
/dirty_log_perf_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index fe41c6a0fa67..cf8737828dd4 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -59,6 +59,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
TEST_GEN_PROGS_x86_64 += x86_64/xss_msr_test
TEST_GEN_PROGS_x86_64 += x86_64/debug_regs
TEST_GEN_PROGS_x86_64 += x86_64/tsc_msrs_test
+TEST_GEN_PROGS_x86_64 += x86_64/vmx_pmu_msrs_test
TEST_GEN_PROGS_x86_64 += demand_paging_test
TEST_GEN_PROGS_x86_64 += dirty_log_test
TEST_GEN_PROGS_x86_64 += dirty_log_perf_test
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_pmu_msrs_test.c b/tools/testing/selftests/kvm/x86_64/vmx_pmu_msrs_test.c
new file mode 100644
index 000000000000..b3ad63e6ff12
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/vmx_pmu_msrs_test.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * VMX-pmu related msrs test
+ *
+ * Copyright (C) 2021 Intel Corporation
+ *
+ * Test to check the effect of various CPUID settings
+ * on the MSR_IA32_PERF_CAPABILITIES MSR, and check that
+ * whatever we write with KVM_SET_MSR is _not_ modified
+ * in the guest and test it can be retrieved with KVM_GET_MSR.
+ *
+ * Test to check that invalid LBR formats are rejected.
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <sys/ioctl.h>
+
+#include "kvm_util.h"
+#include "vmx.h"
+
+#define VCPU_ID 0
+
+#define X86_FEATURE_PDCM (1<<15)
+#define PMU_CAP_FW_WRITES (1ULL << 13)
+#define PMU_CAP_LBR_FMT 0x3f
+
+union cpuid10_eax {
+ struct {
+ unsigned int version_id:8;
+ unsigned int num_counters:8;
+ unsigned int bit_width:8;
+ unsigned int mask_length:8;
+ } split;
+ unsigned int full;
+};
+
+union perf_capabilities {
+ struct {
+ u64 lbr_format:6;
+ u64 pebs_trap:1;
+ u64 pebs_arch_reg:1;
+ u64 pebs_format:4;
+ u64 smm_freeze:1;
+ u64 full_width_write:1;
+ u64 pebs_baseline:1;
+ u64 perf_metrics:1;
+ u64 pebs_output_pt_available:1;
+ u64 anythread_deprecated:1;
+ };
+ u64 capabilities;
+};
+
+uint64_t rdmsr_on_cpu(uint32_t reg)
+{
+ uint64_t data;
+ int fd;
+ char msr_file[64];
+
+ sprintf(msr_file, "/dev/cpu/%d/msr", 0);
+ fd = open(msr_file, O_RDONLY);
+ if (fd < 0)
+ exit(KSFT_SKIP);
+
+ if (pread(fd, &data, sizeof(data), reg) != sizeof(data))
+ exit(KSFT_SKIP);
+
+ close(fd);
+ return data;
+}
+
+static void guest_code(void)
+{
+ wrmsr(MSR_IA32_PERF_CAPABILITIES, PMU_CAP_LBR_FMT);
+}
+
+int main(int argc, char *argv[])
+{
+ struct kvm_cpuid2 *cpuid;
+ struct kvm_cpuid_entry2 *entry_1_0;
+ struct kvm_cpuid_entry2 *entry_a_0;
+ bool pdcm_supported = false;
+ struct kvm_vm *vm;
+ int ret;
+ union cpuid10_eax eax;
+ union perf_capabilities host_cap;
+
+ host_cap.capabilities = rdmsr_on_cpu(MSR_IA32_PERF_CAPABILITIES);
+ host_cap.capabilities &= (PMU_CAP_FW_WRITES | PMU_CAP_LBR_FMT);
+
+ /* Create VM */
+ vm = vm_create_default(VCPU_ID, 0, guest_code);
+ cpuid = kvm_get_supported_cpuid();
+
+ if (kvm_get_cpuid_max_basic() >= 0xa) {
+ entry_1_0 = kvm_get_supported_cpuid_index(1, 0);
+ entry_a_0 = kvm_get_supported_cpuid_index(0xa, 0);
+ pdcm_supported = entry_1_0 && !!(entry_1_0->ecx & X86_FEATURE_PDCM);
+ eax.full = entry_a_0->eax;
+ }
+ if (!pdcm_supported) {
+ print_skip("MSR_IA32_PERF_CAPABILITIES is not supported by the vCPU");
+ exit(KSFT_SKIP);
+ }
+ if (!eax.split.version_id) {
+ print_skip("PMU is not supported by the vCPU");
+ exit(KSFT_SKIP);
+ }
+
+ /* testcase 1, set capabilities when we have PDCM bit */
+ vcpu_set_cpuid(vm, VCPU_ID, cpuid);
+ vcpu_set_msr(vm, 0, MSR_IA32_PERF_CAPABILITIES, PMU_CAP_FW_WRITES);
+
+ /* check capabilities can be retrieved with KVM_GET_MSR */
+ ASSERT_EQ(vcpu_get_msr(vm, VCPU_ID, MSR_IA32_PERF_CAPABILITIES), PMU_CAP_FW_WRITES);
+
+ /* check whatever we write with KVM_SET_MSR is _not_ modified */
+ vcpu_run(vm, VCPU_ID);
+ ASSERT_EQ(vcpu_get_msr(vm, VCPU_ID, MSR_IA32_PERF_CAPABILITIES), PMU_CAP_FW_WRITES);
+
+ /* testcase 2, check valid LBR formats are accepted */
+ vcpu_set_msr(vm, 0, MSR_IA32_PERF_CAPABILITIES, 0);
+ ASSERT_EQ(vcpu_get_msr(vm, VCPU_ID, MSR_IA32_PERF_CAPABILITIES), 0);
+
+ vcpu_set_msr(vm, 0, MSR_IA32_PERF_CAPABILITIES, host_cap.lbr_format);
+ ASSERT_EQ(vcpu_get_msr(vm, VCPU_ID, MSR_IA32_PERF_CAPABILITIES), (u64)host_cap.lbr_format);
+
+ /* testcase 3, check invalid LBR format is rejected */
+ ret = _vcpu_set_msr(vm, 0, MSR_IA32_PERF_CAPABILITIES, PMU_CAP_LBR_FMT);
+ TEST_ASSERT(ret == 0, "Bad PERF_CAPABILITIES didn't fail.");
+
+ /* testcase 4, set capabilities when we don't have PDCM bit */
+ entry_1_0->ecx &= ~X86_FEATURE_PDCM;
+ vcpu_set_cpuid(vm, VCPU_ID, cpuid);
+ ret = _vcpu_set_msr(vm, 0, MSR_IA32_PERF_CAPABILITIES, host_cap.capabilities);
+ TEST_ASSERT(ret == 0, "Bad PERF_CAPABILITIES didn't fail.");
+
+ /* testcase 5, set capabilities when we don't have PMU version bits */
+ entry_1_0->ecx |= X86_FEATURE_PDCM;
+ eax.split.version_id = 0;
+ entry_a_0->eax = eax.full;
+ vcpu_set_cpuid(vm, VCPU_ID, cpuid);
+ ret = _vcpu_set_msr(vm, 0, MSR_IA32_PERF_CAPABILITIES, PMU_CAP_FW_WRITES);
+ TEST_ASSERT(ret == 0, "Bad PERF_CAPABILITIES didn't fail.");
+
+ vcpu_set_msr(vm, 0, MSR_IA32_PERF_CAPABILITIES, 0);
+ ASSERT_EQ(vcpu_get_msr(vm, VCPU_ID, MSR_IA32_PERF_CAPABILITIES), 0);
+
+ kvm_vm_free(vm);
+}
--
2.29.2
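
The expectations encoded by the five testcases can be summarised as a small
predicate. This is a sketch of what the test expects from a host-initiated
write, not KVM's actual validation code; toy_perf_cap_write_ok() and its
parameters are hypothetical, and the host LBR format is passed in rather
than read from the MSR.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_CAP_FW_WRITES (1ULL << 13) /* same value as PMU_CAP_FW_WRITES */
#define TOY_CAP_LBR_FMT   0x3fULL      /* same value as PMU_CAP_LBR_FMT */

/*
 * Returns true if the test expects a write of 'data' to
 * MSR_IA32_PERF_CAPABILITIES to be accepted.
 */
bool toy_perf_cap_write_ok(uint64_t data, bool has_pdcm,
			   unsigned int pmu_version, uint64_t host_lbr_fmt)
{
	uint64_t fmt = data & TOY_CAP_LBR_FMT;

	/* Writing zero is expected to succeed in every testcase. */
	if (data == 0)
		return true;

	/* Testcases 4 and 5: no PDCM bit or no PMU version, so reject. */
	if (!has_pdcm || pmu_version == 0)
		return false;

	/* Only the bits covered by the test are modelled here. */
	if (data & ~(TOY_CAP_FW_WRITES | TOY_CAP_LBR_FMT))
		return false;

	/* Testcases 2 and 3: the format must be zero or match the host. */
	return fmt == 0 || fmt == host_lbr_fmt;
}
```

With an illustrative host format of 0x5, the all-ones format 0x3f is
rejected while 0x5 itself is accepted, which is the distinction testcases
2 and 3 rely on.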

2021-02-01 06:40:24

by Like Xu

Subject: [PATCH v14 08/11] KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMI

The current vPMU only supports Architecture Version 2. According to
Intel SDM "17.4.7 Freezing LBR and Performance Counters on PMI", if
IA32_DEBUGCTL.Freeze_LBR_On_PMI = 1, the LBR is frozen on the virtual
PMI, and KVM emulates this by clearing the LBR bit (bit 0) in
IA32_DEBUGCTL. The guest then needs to re-enable IA32_DEBUGCTL.LBR
to resume recording branches.

Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/pmu.c | 5 ++++-
arch/x86/kvm/pmu.h | 1 +
arch/x86/kvm/vmx/capabilities.h | 4 +++-
arch/x86/kvm/vmx/pmu_intel.c | 30 ++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.c | 2 +-
5 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 67741d2a0308..405890c723a1 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -383,8 +383,11 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)

void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
{
- if (lapic_in_kernel(vcpu))
+ if (lapic_in_kernel(vcpu)) {
+ if (kvm_x86_ops.pmu_ops->deliver_pmi)
+ kvm_x86_ops.pmu_ops->deliver_pmi(vcpu);
kvm_apic_local_deliver(vcpu->arch.apic, APIC_LVTPC);
+ }
}

bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 067fef51760c..742a4e98df8c 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -39,6 +39,7 @@ struct kvm_pmu_ops {
void (*refresh)(struct kvm_vcpu *vcpu);
void (*init)(struct kvm_vcpu *vcpu);
void (*reset)(struct kvm_vcpu *vcpu);
+ void (*deliver_pmi)(struct kvm_vcpu *vcpu);
};

static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 62aa7a701ebb..57b940c613ab 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -21,6 +21,8 @@ extern int __read_mostly pt_mode;
#define PMU_CAP_FW_WRITES (1ULL << 13)
#define PMU_CAP_LBR_FMT 0x3f

+#define DEBUGCTLMSR_LBR_MASK (DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI)
+
struct nested_vmx_msrs {
/*
* We only store the "true" versions of the VMX capability MSRs. We
@@ -384,7 +386,7 @@ static inline u64 vmx_supported_debugctl(void)
u64 debugctl = DEBUGCTLMSR_BTF;

if (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT)
- debugctl |= DEBUGCTLMSR_LBR;
+ debugctl |= DEBUGCTLMSR_LBR_MASK;

return debugctl;
}
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 60f395e18446..51edd9c1adfa 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -579,6 +579,35 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
intel_pmu_release_guest_lbr_event(vcpu);
}

+/*
+ * Emulate LBR_On_PMI behavior for 1 < pmu.version < 4.
+ *
+ * If Freeze_LBR_On_PMI = 1, the LBR is frozen on PMI and
+ * the KVM emulates to clear the LBR bit (bit 0) in IA32_DEBUGCTL.
+ *
+ * Guest needs to re-enable LBR to resume branches recording.
+ */
+static void intel_pmu_legacy_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu)
+{
+ u64 data = vmcs_read64(GUEST_IA32_DEBUGCTL);
+
+ if (data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) {
+ data &= ~DEBUGCTLMSR_LBR;
+ vmcs_write64(GUEST_IA32_DEBUGCTL, data);
+ }
+}
+
+static void intel_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
+{
+ u8 version = vcpu_to_pmu(vcpu)->version;
+
+ if (!intel_pmu_lbr_is_enabled(vcpu))
+ return;
+
+ if (version > 1 && version < 4)
+ intel_pmu_legacy_freezing_lbrs_on_pmi(vcpu);
+}
+
static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)
{
struct x86_pmu_lbr *lbr = vcpu_to_lbr_records(vcpu);
@@ -665,4 +694,5 @@ struct kvm_pmu_ops intel_pmu_ops = {
.refresh = intel_pmu_refresh,
.init = intel_pmu_init,
.reset = intel_pmu_reset,
+ .deliver_pmi = intel_pmu_deliver_pmi,
};
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 40fdeb394328..5389032ca4ad 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1963,7 +1963,7 @@ static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu)
u64 debugctl = vmx_supported_debugctl();

if (!intel_pmu_lbr_is_enabled(vcpu))
- debugctl &= ~DEBUGCTLMSR_LBR;
+ debugctl &= ~DEBUGCTLMSR_LBR_MASK;

return debugctl;
}
--
2.29.2
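
The freeze emulation boils down to a pure function on the guest's
IA32_DEBUGCTL value, sketched below. The toy_* names are hypothetical; the
bit positions (LBR is bit 0, FREEZE_LBRS_ON_PMI is bit 11) follow the
architectural layout, and the version window matches the
"version > 1 && version < 4" check in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_DEBUGCTL_LBR                (1ULL << 0)
#define TOY_DEBUGCTL_FREEZE_LBRS_ON_PMI (1ULL << 11)

/*
 * Returns the IA32_DEBUGCTL value the guest should observe after a
 * virtual PMI has been delivered.
 */
uint64_t toy_debugctl_on_pmi(uint64_t debugctl, unsigned int pmu_version,
			     bool guest_lbr_enabled)
{
	/* Nothing to emulate if LBR is not exposed to this guest. */
	if (!guest_lbr_enabled)
		return debugctl;

	/* The legacy freeze only applies for 1 < version < 4. */
	if (pmu_version <= 1 || pmu_version >= 4)
		return debugctl;

	/* Freeze by clearing the enable bit; the guest must set it again. */
	if (debugctl & TOY_DEBUGCTL_FREEZE_LBRS_ON_PMI)
		debugctl &= ~TOY_DEBUGCTL_LBR;

	return debugctl;
}
```

For example, a version 2 guest with both bits set (0x801) would see 0x800
after the PMI, while a guest that never set the freeze bit keeps recording.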

2021-02-01 06:48:31

by Like Xu

Subject: [PATCH v14 10/11] KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES

Userspace can enable the guest LBR feature once the exact LBR format
value supported by the host is written to MSR_IA32_PERF_CAPABILITIES,
provided that the LBR is also compatible with the vPMU version and the
host CPU model.

The LBR can be enabled on the guest if host perf supports LBR
(checked via x86_perf_get_lbr()) and the vCPU model is compatible
with the host one.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/capabilities.h | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 57b940c613ab..c49f3ee8eca8 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -374,11 +374,18 @@ static inline bool vmx_pt_mode_is_host_guest(void)

static inline u64 vmx_get_perf_capabilities(void)
{
+ u64 perf_cap = 0;
+
+ if (boot_cpu_has(X86_FEATURE_PDCM))
+ rdmsrl(MSR_IA32_PERF_CAPABILITIES, perf_cap);
+
+ perf_cap &= PMU_CAP_LBR_FMT;
+
/*
* Since counters are virtualized, KVM would support full
* width counting unconditionally, even if the host lacks it.
*/
- return PMU_CAP_FW_WRITES;
+ return PMU_CAP_FW_WRITES | perf_cap;
}

static inline u64 vmx_supported_debugctl(void)
--
2.29.2
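
The capability computation in this patch can be sketched as a standalone function. This is an illustration under stated assumptions, not the kernel code: guest_perf_capabilities(), has_pdcm and host_perf_cap are hypothetical stand-ins for vmx_get_perf_capabilities(), boot_cpu_has(X86_FEATURE_PDCM) and the rdmsrl() result. The two masks are the ones defined in the series. Note that the local value has to start from zero so that a host without PDCM reports no LBR format bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define PMU_CAP_FW_WRITES (1ULL << 13) /* full-width counter writes */
#define PMU_CAP_LBR_FMT   0x3f         /* LBR format, bits 5:0 */

/*
 * Hypothetical stand-in for vmx_get_perf_capabilities(): host_perf_cap is
 * the value the host would read from MSR_IA32_PERF_CAPABILITIES, and
 * has_pdcm says whether the host advertises X86_FEATURE_PDCM.
 */
static uint64_t guest_perf_capabilities(bool has_pdcm, uint64_t host_perf_cap)
{
	uint64_t perf_cap = 0;

	/* Only the LBR format field is inherited from the host. */
	if (has_pdcm)
		perf_cap = host_perf_cap & PMU_CAP_LBR_FMT;

	/* Full-width writes are always offered, since counters are virtualized. */
	return PMU_CAP_FW_WRITES | perf_cap;
}
```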

2021-02-01 06:53:00

by Like Xu

Subject: [PATCH v14 07/11] KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation

When the LBR record MSRs have already been passed through, there is no
need to call vmx_update_intercept_for_lbr_msrs() again and again, and
vice versa.

Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 13 +++++++++++++
arch/x86/kvm/vmx/vmx.h | 3 +++
2 files changed, 16 insertions(+)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 287fc14f0445..60f395e18446 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -550,6 +550,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
vcpu->arch.perf_capabilities = 0;
lbr_desc->records.nr = 0;
lbr_desc->event = NULL;
+ lbr_desc->msr_passthrough = false;
}

static void intel_pmu_reset(struct kvm_vcpu *vcpu)
@@ -596,12 +597,24 @@ static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)

static inline void vmx_disable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ if (!lbr_desc->msr_passthrough)
+ return;
+
vmx_update_intercept_for_lbr_msrs(vcpu, true);
+ lbr_desc->msr_passthrough = false;
}

static inline void vmx_enable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
{
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ if (lbr_desc->msr_passthrough)
+ return;
+
vmx_update_intercept_for_lbr_msrs(vcpu, false);
+ lbr_desc->msr_passthrough = true;
}

/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 863bb3fe73d4..4d6a2624a204 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -90,6 +90,9 @@ struct lbr_desc {
* The records may be inaccurate if the host reclaims the LBR.
*/
struct perf_event *event;
+
+ /* True if LBRs are marked as not intercepted in the MSR bitmap */
+ bool msr_passthrough;
};

/*
--
2.29.2
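
The effect of the msr_passthrough flag can be sketched with a small userspace model. This is a hedged illustration only: struct lbr_state, update_intercepts() and the bitmap_updates counter are hypothetical stand-ins for struct lbr_desc and vmx_update_intercept_for_lbr_msrs(), used here to count how often the MSR bitmap would be rewritten.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-vCPU LBR descriptor. */
struct lbr_state {
	bool msr_passthrough;        /* mirrors lbr_desc->msr_passthrough */
	unsigned int bitmap_updates; /* counts the expensive bitmap rewrites */
};

/* Stand-in for vmx_update_intercept_for_lbr_msrs(); it only counts calls. */
static void update_intercepts(struct lbr_state *s, bool intercept)
{
	(void)intercept;
	s->bitmap_updates++;
}

static void enable_passthrough(struct lbr_state *s)
{
	if (s->msr_passthrough)
		return; /* already passed through, nothing to rewrite */

	update_intercepts(s, false);
	s->msr_passthrough = true;
}

static void disable_passthrough(struct lbr_state *s)
{
	if (!s->msr_passthrough)
		return; /* already intercepted, nothing to rewrite */

	update_intercepts(s, true);
	s->msr_passthrough = false;
}
```

Calling enable_passthrough() repeatedly rewrites the bitmap once; the same holds for disable_passthrough().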

2021-02-02 11:51:57

by Paolo Bonzini

Subject: Re: [PATCH v14 02/11] KVM: x86/pmu: Set up IA32_PERF_CAPABILITIES if PDCM bit is available

On 01/02/21 06:10, Like Xu wrote:
>
> - if (guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
> - vcpu->arch.perf_capabilities = vmx_get_perf_capabilities();

Why remove this "if"?

> pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
> x86_pmu.num_counters_gp);
> @@ -405,6 +402,8 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
> pmu->fixed_counters[i].idx = i + INTEL_PMC_IDX_FIXED;
> pmu->fixed_counters[i].current_config = 0;
> }
> +
> + vcpu->arch.perf_capabilities = 0;

Paolo

2021-02-02 12:05:21

by Paolo Bonzini

Subject: Re: [PATCH v14 03/11] KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabled

On 01/02/21 06:10, Like Xu wrote:
> Userspace could set the bits [0, 5] of the IA32_PERF_CAPABILITIES
> MSR which tells about the record format stored in the LBR records.
>
> The LBR will be enabled on the guest if host perf supports LBR
> (checked via x86_perf_get_lbr()) and the vcpu model is compatible
> with the host one.
>
> Signed-off-by: Like Xu <[email protected]>
> ---
> arch/x86/kvm/vmx/capabilities.h | 1 +
> arch/x86/kvm/vmx/pmu_intel.c | 17 +++++++++++++++++
> arch/x86/kvm/vmx/vmx.c | 7 +++++++
> arch/x86/kvm/vmx/vmx.h | 11 +++++++++++
> 4 files changed, 36 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
> index a58cf3655351..db1178a66d93 100644
> --- a/arch/x86/kvm/vmx/capabilities.h
> +++ b/arch/x86/kvm/vmx/capabilities.h
> @@ -19,6 +19,7 @@ extern int __read_mostly pt_mode;
> #define PT_MODE_HOST_GUEST 1
>
> #define PMU_CAP_FW_WRITES (1ULL << 13)
> +#define PMU_CAP_LBR_FMT 0x3f
>
> struct nested_vmx_msrs {
> /*
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index f632039173ff..01b2cd8eca47 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -168,6 +168,21 @@ static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
> return get_gp_pmc(pmu, msr, MSR_IA32_PMC0);
> }
>
> +bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu)
> +{
> + struct x86_pmu_lbr *lbr = vcpu_to_lbr_records(vcpu);
> +
> + /*
> + * As a first step, a guest could only enable LBR feature if its
> + * cpu model is the same as the host because the LBR registers
> + * would be pass-through to the guest and they're model specific.
> + */
> + if (boot_cpu_data.x86_model != guest_cpuid_model(vcpu))
> + return false;
> +
> + return !x86_perf_get_lbr(lbr);

This seems like the wrong place to me. What about adding

+ if (intel_pmu_lbr_is_compatible(vcpu))
+ x86_perf_get_lbr(&lbr_desc->records);
+ else
+ lbr_desc->records.nr = 0;
}

at the end of intel_pmu_refresh() instead?

Paolo
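
The shape of this suggestion, deciding once at refresh time whether the vCPU gets an LBR description at all, can be sketched as follows. This is only an illustration: struct lbr_records and refresh_lbr_records() are hypothetical, and the model comparison stands in for the boot_cpu_data.x86_model versus guest_cpuid_model() check quoted above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct x86_pmu_lbr; only the depth is kept. */
struct lbr_records {
	unsigned int nr;
};

/*
 * Hypothetical refresh step: copy the host LBR description into the vCPU
 * when the guest CPU model matches the host, otherwise report no LBR.
 */
static void refresh_lbr_records(struct lbr_records *guest,
				const struct lbr_records *host,
				unsigned int host_model,
				unsigned int guest_model)
{
	if (host_model == guest_model)
		*guest = *host;
	else
		guest->nr = 0;
}
```

Later code then only has to test whether the record count is non-zero.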

2021-02-02 21:26:04

by Paolo Bonzini

Subject: Re: [PATCH v14 00/11] KVM: x86/pmu: Guest Last Branch Recording Enabling

On 01/02/21 06:10, Like Xu wrote:
> Hi geniuses,
>
> Please help review this new version which enables the guest LBR.
>
> We already upstreamed the guest LBR support in the host perf, please
> check more details in each commit and feel free to test and comment.
>
> QEMU part: https://lore.kernel.org/qemu-devel/[email protected]
> kvm-unit-tests: https://lore.kernel.org/kvm/[email protected]
>
> v13-v14 Changelog:
> - Rewrite crud about vcpu->arch.perf_capabilities;
> - Add PERF_CAPABILITIES testcases to tools/testing/selftests/kvm;
> - Add basic LBR testcases to the kvm-unit-tests (w/ QEMU patches);
> - Apply rewritten commit log from Paolo;
> - Queued the first patch "KVM: x86: Move common set/get handler ...";
> - Rename 'already_passthrough' to 'msr_passthrough';
> - Check the values of MSR_IA32_PERF_CAPABILITIES early;
> - Call kvm_x86_ops.pmu_ops->cleanup() always and drop extra_cleanup;
> - Use INTEL_PMC_IDX_FIXED_VLBR directly;
> - Fix a bug in the vmx_get_perf_capabilities();
>
> Previous:
> https://lore.kernel.org/kvm/[email protected]/

Queued, thanks. There were some conflicts with the bus lock detection
patches, so I had to tweak the DEBUGCTL MSR handling a bit.

Paolo

> ---
>
> The last branch recording (LBR) is a performance monitor unit (PMU)
> feature on Intel processors that records a running trace of the most
> recent branches taken by the processor in the LBR stack. This patch
> series is going to enable this feature for plenty of KVM guests.
>
> with this patch set, the following error will be gone forever and cloud
> developers can better understand their programs with less profiling overhead:
>
> $ perf record -b lbr ${WORKLOAD}
> or $ perf record --call-graph lbr ${WORKLOAD}
> Error:
> cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
>
> The user space could configure whether it's enabled or not for each
> guest via MSR_IA32_PERF_CAPABILITIES msr. As a first step, a guest
> could only enable LBR feature if its cpu model is the same as the
> host since the LBR feature is still one of model specific features.
>
> If it's enabled on the guest, the guest LBR driver would accesses the
> LBR MSR (including IA32_DEBUGCTLMSR and records MSRs) as host does.
> The first guest access on the LBR related MSRs is always interceptible.
> The KVM trap would create a special LBR event (called guest LBR event)
> which enables the callstack mode and none of hardware counter is assigned.
> The host perf would enable and schedule this event as usual.
>
> Guest's first access to a LBR registers gets trapped to KVM, which
> creates a guest LBR perf event. It's a regular LBR perf event which gets
> the LBR facility assigned from the perf subsystem. Once that succeeds,
> the LBR stack msrs are passed through to the guest for efficient accesses.
> However, if another host LBR event comes in and takes over the LBR
> facility, the LBR msrs will be made interceptible, and guest following
> accesses to the LBR msrs will be trapped and meaningless.
>
> Because saving/restoring tens of LBR MSRs (e.g. 32 LBR stack entries) in
> VMX transition brings too excessive overhead to frequent vmx transition
> itself, the guest LBR event would help save/restore the LBR stack msrs
> during the context switching with the help of native LBR event callstack
> mechanism, including LBR_SELECT msr.
>
> If the guest no longer accesses the LBR-related MSRs within a scheduling
> time slice and the LBR enable bit is unset, vPMU would release its guest
> LBR event as a normal event of a unused vPMC and the pass-through
> state of the LBR stack msrs would be canceled.
>
> ---
>
> LBR testcase:
> echo 1 > /proc/sys/kernel/watchdog
> echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent
> echo 5000 > /proc/sys/kernel/perf_event_max_sample_rate
> echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
> perf record -b ./br_instr a
> (perf record --call-graph lbr ./br_instr a)
>
> - Perf report on the host:
> Samples: 72K of event 'cycles', Event count (approx.): 72512
> Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles
> 12.12% br_instr br_instr [.] cmp_end [.] lfsr_cond 1
> 11.05% br_instr br_instr [.] lfsr_cond [.] cmp_end 5
> 8.81% br_instr br_instr [.] lfsr_cond [.] cmp_end 4
> 5.04% br_instr br_instr [.] cmp_end [.] lfsr_cond 20
> 4.92% br_instr br_instr [.] lfsr_cond [.] cmp_end 6
> 4.88% br_instr br_instr [.] cmp_end [.] lfsr_cond 6
> 4.58% br_instr br_instr [.] cmp_end [.] lfsr_cond 5
>
> - Perf report on the guest:
> Samples: 92K of event 'cycles', Event count (approx.): 92544
> Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles
> 12.03% br_instr br_instr [.] cmp_end [.] lfsr_cond 1
> 11.09% br_instr br_instr [.] lfsr_cond [.] cmp_end 5
> 8.57% br_instr br_instr [.] lfsr_cond [.] cmp_end 4
> 5.08% br_instr br_instr [.] lfsr_cond [.] cmp_end 6
> 5.06% br_instr br_instr [.] cmp_end [.] lfsr_cond 20
> 4.87% br_instr br_instr [.] cmp_end [.] lfsr_cond 6
> 4.70% br_instr br_instr [.] cmp_end [.] lfsr_cond 5
>
> Conclusion: the profiling results on the guest are similar to that on the host.
>
> Like Xu (11):
> KVM: x86/vmx: Make vmx_set_intercept_for_msr() non-static
> KVM: x86/pmu: Set up IA32_PERF_CAPABILITIES if PDCM bit is available
> KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabled
> KVM: vmx/pmu: Expose DEBUGCTLMSR_LBR in the MSR_IA32_DEBUGCTLMSR
> KVM: vmx/pmu: Create a guest LBR event when vcpu sets DEBUGCTLMSR_LBR
> KVM: vmx/pmu: Pass-through LBR msrs when the guest LBR event is ACTIVE
> KVM: vmx/pmu: Reduce the overhead of LBR pass-through or cancellation
> KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMI
> KVM: vmx/pmu: Release guest LBR event via lazy release mechanism
> KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES
> selftests: kvm/x86: add test for pmu msr MSR_IA32_PERF_CAPABILITIES
>
> arch/x86/kvm/pmu.c | 8 +-
> arch/x86/kvm/pmu.h | 2 +
> arch/x86/kvm/vmx/capabilities.h | 19 +-
> arch/x86/kvm/vmx/pmu_intel.c | 281 +++++++++++++++++-
> arch/x86/kvm/vmx/vmx.c | 55 +++-
> arch/x86/kvm/vmx/vmx.h | 28 ++
> arch/x86/kvm/x86.c | 2 +-
> tools/testing/selftests/kvm/.gitignore | 1 +
> tools/testing/selftests/kvm/Makefile | 1 +
> .../selftests/kvm/x86_64/vmx_pmu_msrs_test.c | 149 ++++++++++
> 10 files changed, 524 insertions(+), 22 deletions(-)
> create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_pmu_msrs_test.c
>

2021-02-02 22:29:48

by Paolo Bonzini

Subject: Re: [PATCH v14 11/11] selftests: kvm/x86: add test for pmu msr MSR_IA32_PERF_CAPABILITIES

On 01/02/21 07:01, Like Xu wrote:
>
> +uint64_t rdmsr_on_cpu(uint32_t reg)
> +{
> + uint64_t data;
> + int fd;
> + char msr_file[64];
> +
> + sprintf(msr_file, "/dev/cpu/%d/msr", 0);
> + fd = open(msr_file, O_RDONLY);
> + if (fd < 0)
> + exit(KSFT_SKIP);
> +
> + if (pread(fd, &data, sizeof(data), reg) != sizeof(data))
> + exit(KSFT_SKIP);
> +
> + close(fd);
> + return data;
> +}

In order to allow running as non-root, it's better to use the
KVM_GET_MSRS ioctl on the /dev/kvm file descriptor.

The tests pass, but please take a look at the kvm/queue branch to see if
everything is ok.

Paolo
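
A minimal sketch of the suggested approach, reading the MSR through KVM_GET_MSRS on the /dev/kvm file descriptor instead of /dev/cpu/0/msr. The helper name kvm_read_feature_msr() is hypothetical, and the sketch assumes an x86 Linux host whose KVM exposes the requested index as a feature MSR. The anonymous wrapper struct relies on the GNU extension that allows a struct with a flexible array member to be followed by another field.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define MSR_IA32_PERF_CAPABILITIES 0x345

/*
 * Hypothetical helper: read one feature MSR through the KVM_GET_MSRS
 * system ioctl. Returns 0 and stores the value on success, -1 otherwise.
 */
static int kvm_read_feature_msr(uint32_t index, uint64_t *value)
{
	struct {
		struct kvm_msrs header;
		struct kvm_msr_entry entry;
	} buf;
	int fd, ret;

	fd = open("/dev/kvm", O_RDWR);
	if (fd < 0)
		return -1;

	memset(&buf, 0, sizeof(buf));
	buf.header.nmsrs = 1;
	buf.entry.index = index;

	/* On success the ioctl returns the number of MSRs it read. */
	ret = ioctl(fd, KVM_GET_MSRS, &buf);
	close(fd);
	if (ret != 1)
		return -1;

	*value = buf.entry.data;
	return 0;
}
```

A test could call kvm_read_feature_msr(MSR_IA32_PERF_CAPABILITIES, &data) and exit with KSFT_SKIP when it returns -1.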

2021-07-29 12:42:01

by Liuxiangdong

Subject: Re: [PATCH v14 00/11] KVM: x86/pmu: Guest Last Branch Recording Enabling

Hi, Like.

This patch set has been merged into the 5.12 kernel tree, so we can use
LBR in the guest.
Does it have any CPU requirements?
I can use LBR in the guest on Skylake and Icelake, but not on IvyBridge.

I can see the LBR format (000011b) in the perf_capabilities MSR (0x345),
but there is still an error when I try:

$ perf record -b
Error:
cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

Host CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
Stepping: 4


Thanks!
Xiangdong Liu

2021-07-30 03:17:38

by Liuxiangdong

Subject: Re: [PATCH v14 00/11] KVM: x86/pmu: Guest Last Branch Recording Enabling

Hi, Like.

Does using LBR in the guest have any CPU requirements?

I have tried linux-5.14-rc3 on different CPUs. I can use LBR on
Haswell, Broadwell, Skylake and Icelake, but not on IvyBridge.

Thanks!


2021-07-30 03:32:31

by Like Xu

Subject: Re: [PATCH v14 00/11] KVM: x86/pmu: Guest Last Branch Recording Enabling

On 30/7/2021 11:15 am, Liuxiangdong wrote:
> Hi, like.
>
> Does it have requirement on CPU if we want to use LBR in Guest?

As long as "dmesg | grep -i LBR" shows valid output such as
"XX-deep LBR", you can use LBR on the host and, in theory, on most
Intel guests.

But I don't have a variety of Intel machine types to test on.

As far as I know, the guest LBR doesn't work on platforms where
MSR_LBR_SELECT is defined per physical core rather than per logical core.

I will fix this issue by making KVM aware of the recent core scheduling
policy.

>
> I have tried linux-5.14-rc3 on different CPUs. And I can use lbr on
> Haswell, Broadwell, skylake and icelake, but I cannot use lbr on IvyBridge.

I suppose INTEL_FAM6_IVYBRIDGE and INTEL_FAM6_IVYBRIDGE_X do support LBR.

You may check the return values of x86_perf_get_lbr() or
cpuid_model_is_consistent() in KVM for more details.


2022-09-14 00:02:19

by Jim Mattson

Subject: Re: [PATCH v14 00/11] KVM: x86/pmu: Guest Last Branch Recording Enabling

On Sun, Jan 31, 2021 at 9:17 PM Like Xu <[email protected]> wrote:
>
> Hi geniuses,
>
> Please help review this new version which enables the guest LBR.
>
> We already upstreamed the guest LBR support in the host perf, please
> check more details in each commit and feel free to test and comment.
>
> QEMU part: https://lore.kernel.org/qemu-devel/[email protected]
> kvm-unit-tests: https://lore.kernel.org/kvm/[email protected]
>
> v13-v14 Changelog:
> - Rewrite crud about vcpu->arch.perf_capabilities;
> - Add PERF_CAPABILITIES testcases to tools/testing/selftests/kvm;
> - Add basic LBR testcases to the kvm-unit-tests (w/ QEMU patches);
> - Apply rewritten commit log from Paolo;
> - Queued the first patch "KVM: x86: Move common set/get handler ...";
> - Rename 'already_passthrough' to 'msr_passthrough';
> - Check the values of MSR_IA32_PERF_CAPABILITIES early;
> - Call kvm_x86_ops.pmu_ops->cleanup() always and drop extra_cleanup;
> - Use INTEL_PMC_IDX_FIXED_VLBR directly;
> - Fix a bug in the vmx_get_perf_capabilities();
>
> Previous:
> https://lore.kernel.org/kvm/[email protected]/
>
> ---
>
> The last branch recording (LBR) is a performance monitor unit (PMU)
> feature on Intel processors that records a running trace of the most
> recent branches taken by the processor in the LBR stack. This patch
> series is going to enable this feature for plenty of KVM guests.
>
> with this patch set, the following error will be gone forever and cloud
> developers can better understand their programs with less profiling overhead:
>
> $ perf record -b lbr ${WORKLOAD}
> or $ perf record --call-graph lbr ${WORKLOAD}
> Error:
> cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
>
> The user space could configure whether it's enabled or not for each
> guest via MSR_IA32_PERF_CAPABILITIES msr. As a first step, a guest
> could only enable LBR feature if its cpu model is the same as the
> host since the LBR feature is still one of model specific features.
>
> If it's enabled on the guest, the guest LBR driver would accesses the
> LBR MSR (including IA32_DEBUGCTLMSR and records MSRs) as host does.
> The first guest access on the LBR related MSRs is always interceptible.
> The KVM trap would create a special LBR event (called guest LBR event)
> which enables the callstack mode and none of hardware counter is assigned.
> The host perf would enable and schedule this event as usual.
>
> Guest's first access to a LBR registers gets trapped to KVM, which
> creates a guest LBR perf event. It's a regular LBR perf event which gets
> the LBR facility assigned from the perf subsystem. Once that succeeds,
> the LBR stack msrs are passed through to the guest for efficient accesses.
> However, if another host LBR event comes in and takes over the LBR
> facility, the LBR msrs will be made interceptible, and guest following
> accesses to the LBR msrs will be trapped and meaningless.
>
> Because saving/restoring tens of LBR MSRs (e.g. 32 LBR stack entries) in
> VMX transition brings too excessive overhead to frequent vmx transition
> itself, the guest LBR event would help save/restore the LBR stack msrs
> during the context switching with the help of native LBR event callstack
> mechanism, including LBR_SELECT msr.
>
> If the guest no longer accesses the LBR-related MSRs within a scheduling
> time slice and the LBR enable bit is unset, vPMU would release its guest
> LBR event as a normal event of a unused vPMC and the pass-through
> state of the LBR stack msrs would be canceled.

How does live migration work? I don't see any mechanism for recording
the current LBR MSRs on suspend or restoring them on resume.

2022-09-19 07:48:24

by Like Xu

Subject: Re: [PATCH v14 00/11] KVM: x86/pmu: Guest Last Branch Recording Enabling

On 14/9/2022 7:42 am, Jim Mattson wrote:
> How does live migration work? I don't see any mechanism for recording
> the current LBR MSRs on suspend or restoring them on resume.

Considering that LBR is still a model-specific feature, migration is of
little value unless the LBR_FMT values on both sides of the migration are
the same. A compatibility check (based on CPU models) is required for
that (it is gathering dust in my to-do list).

Another dusty missing piece is how to ensure that the vCPU can get the
LBR hardware in the VMX transition when the KVM LBR event loses the
competition with a host LBR event. The complexity here is that the host
and the guest may have different LBR filtering options.

The good news is that Architectural LBR will add save/restore support,
since Paolo is not averse to putting more MSRs into msrs_to_save_all[];
perhaps a dynamic addition mechanism is a prerequisite.

Please let me know what your priority preferences are for the tasks above.

2022-09-19 18:28:53

by Jim Mattson

Subject: Re: [PATCH v14 00/11] KVM: x86/pmu: Guest Last Branch Recording Enabling

On Mon, Sep 19, 2022 at 12:26 AM Like Xu <[email protected]> wrote:
>
> On 14/9/2022 7:42 am, Jim Mattson wrote:
> > How does live migration work? I don't see any mechanism for recording
> > the current LBR MSRs on suspend or restoring them on resume.
>
> Considering that LBR is still a model specific feature, migration is less
> valuable unless both LBR_FMT values of the migration side are the same, the
> compatibility check (based on cpu models) is required (gathering dust in my
> to-do list);

This seems like a problem best solved in the control plane.

> and there is another dusty missing piece is how to ensure that vcpu can get
> LBR hardware in vmx transition when KVM lbr event fails in host lbr event
> competition, the complexity here is that the host and guest may have
> different LBR filtering options.

In case of a conflict, who currently wins? The host or the guest? I'd
like the guest to win, but others may feel differently. Maybe we need
a configuration knob?

> The good news is the Architecture LBR will add save/restore support since
> Paolo is not averse to putting more msr's into msrs_to_save_all[], perhaps a
> dynamic addition mechanism is a prerequisite.
>
> Please let me know what your priority preferences are for these tasks above.