Date: 2023-10-23 23:40:48
From: Sean Christopherson
Subject: [PATCH 0/6] KVM: x86/pmu: Clean up emulated PMC event handling

The ultimate goal of this series is to track emulated counter events using
a dedicated variable instead of trying to track the previous counter value.
Tracking the previous counter value is flawed as it takes a snapshot at
every emulated event, but only checks for overflow prior to VM-Enter, i.e.
KVM could miss an overflow if KVM ever supports emulating event types that
can occur multiple times in a single VM-Exit.
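
Concretely, an illustrative trace of two back-to-back emulated events before
KVM_REQ_PMU is serviced (statements mirror the current kvm_pmu_incr_counter();
assume the counter starts at its maximum value, i.e. the next event overflows):

	pmc->prev_counter = pmc->counter;			/* event #1 */
	pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);	/* wraps to 0 */

	pmc->prev_counter = pmc->counter;			/* event #2 clobbers the snapshot */
	pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);	/* now 1 */

	/* KVM_REQ_PMU handling prior to VM-Enter: counter (1) is not less
	 * than prev_counter (0), so the overflow from event #1 is lost. */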

Patches 1-5 are (some loosely, some tightly) related fixes and cleanups to
simplify the emulated counter approach implementation. The fixes are
tagged for stable as userspace could cause some weirdness around perf
events, but I doubt any real world VMM is actually affected.

Sean Christopherson (6):
KVM: x86/pmu: Move PMU reset logic to common x86 code
KVM: x86/pmu: Reset the PMU, i.e. stop counters, before refreshing
KVM: x86/pmu: Stop calling kvm_pmu_reset() at RESET (it's redundant)
KVM: x86/pmu: Remove manual clearing of fields in kvm_pmu_init()
KVM: x86/pmu: Update sample period in pmc_write_counter()
KVM: x86/pmu: Track emulated counter events instead of previous
counter

arch/x86/include/asm/kvm-x86-pmu-ops.h | 2 +-
arch/x86/include/asm/kvm_host.h | 17 +++-
arch/x86/kvm/pmu.c | 128 +++++++++++++++++++++----
arch/x86/kvm/pmu.h | 47 +--------
arch/x86/kvm/svm/pmu.c | 17 ----
arch/x86/kvm/vmx/pmu_intel.c | 22 -----
arch/x86/kvm/x86.c | 1 -
7 files changed, 127 insertions(+), 107 deletions(-)


base-commit: ec2f1daad460c6201338dae606466220ccaa96d5
--
2.42.0.758.gaed0368e0e-goog


Date: 2023-10-23 23:41:23
From: Sean Christopherson
Subject: [PATCH 1/6] KVM: x86/pmu: Move PMU reset logic to common x86 code

Move the common (or at least "ignored") aspects of resetting the vPMU to
common x86 code, along with the stop/release helpers that are now used only
by common pmu.c.

There is no need to manually handle fixed counters as all_valid_pmc_idx
tracks both fixed and general purpose counters, and resetting the vPMU is
far from a hot path, i.e. the extra bit of overhead to get the PMC from the
index is a non-issue.

Zero fixed_ctr_ctrl in common code even though it's Intel specific.
Ensuring it's zero doesn't harm AMD/SVM in any way, and stopping the fixed
counters via all_valid_pmc_idx, but not clearing the associated control
bits, would be odd/confusing.

Make the .reset() hook optional as SVM no longer needs vendor specific
handling.

Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm-x86-pmu-ops.h | 2 +-
arch/x86/kvm/pmu.c | 40 +++++++++++++++++++++++++-
arch/x86/kvm/pmu.h | 18 ------------
arch/x86/kvm/svm/pmu.c | 16 -----------
arch/x86/kvm/vmx/pmu_intel.c | 20 -------------
5 files changed, 40 insertions(+), 56 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h b/arch/x86/include/asm/kvm-x86-pmu-ops.h
index 6c98f4bb4228..058bc636356a 100644
--- a/arch/x86/include/asm/kvm-x86-pmu-ops.h
+++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h
@@ -22,7 +22,7 @@ KVM_X86_PMU_OP(get_msr)
KVM_X86_PMU_OP(set_msr)
KVM_X86_PMU_OP(refresh)
KVM_X86_PMU_OP(init)
-KVM_X86_PMU_OP(reset)
+KVM_X86_PMU_OP_OPTIONAL(reset)
KVM_X86_PMU_OP_OPTIONAL(deliver_pmi)
KVM_X86_PMU_OP_OPTIONAL(cleanup)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 9ae07db6f0f6..027e9c3c2b93 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -250,6 +250,24 @@ static bool pmc_resume_counter(struct kvm_pmc *pmc)
return true;
}

+static void pmc_release_perf_event(struct kvm_pmc *pmc)
+{
+ if (pmc->perf_event) {
+ perf_event_release_kernel(pmc->perf_event);
+ pmc->perf_event = NULL;
+ pmc->current_config = 0;
+ pmc_to_pmu(pmc)->event_count--;
+ }
+}
+
+static void pmc_stop_counter(struct kvm_pmc *pmc)
+{
+ if (pmc->perf_event) {
+ pmc->counter = pmc_read_counter(pmc);
+ pmc_release_perf_event(pmc);
+ }
+}
+
static int filter_cmp(const void *pa, const void *pb, u64 mask)
{
u64 a = *(u64 *)pa & mask;
@@ -654,7 +672,27 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)

void kvm_pmu_reset(struct kvm_vcpu *vcpu)
{
- static_call(kvm_x86_pmu_reset)(vcpu);
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+ struct kvm_pmc *pmc;
+ int i;
+
+ bitmap_zero(pmu->reprogram_pmi, X86_PMC_IDX_MAX);
+
+ for_each_set_bit(i, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX) {
+ pmc = static_call(kvm_x86_pmu_pmc_idx_to_pmc)(pmu, i);
+ if (!pmc)
+ continue;
+
+ pmc_stop_counter(pmc);
+ pmc->counter = 0;
+
+ if (pmc_is_gp(pmc))
+ pmc->eventsel = 0;
+ }
+
+ pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status = 0;
+
+ static_call_cond(kvm_x86_pmu_reset)(vcpu);
}

void kvm_pmu_init(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 1d64113de488..a46aa9b25150 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -80,24 +80,6 @@ static inline void pmc_write_counter(struct kvm_pmc *pmc, u64 val)
pmc->counter &= pmc_bitmask(pmc);
}

-static inline void pmc_release_perf_event(struct kvm_pmc *pmc)
-{
- if (pmc->perf_event) {
- perf_event_release_kernel(pmc->perf_event);
- pmc->perf_event = NULL;
- pmc->current_config = 0;
- pmc_to_pmu(pmc)->event_count--;
- }
-}
-
-static inline void pmc_stop_counter(struct kvm_pmc *pmc)
-{
- if (pmc->perf_event) {
- pmc->counter = pmc_read_counter(pmc);
- pmc_release_perf_event(pmc);
- }
-}
-
static inline bool pmc_is_gp(struct kvm_pmc *pmc)
{
return pmc->type == KVM_PMC_GP;
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 373ff6a6687b..3fd47de14b38 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -233,21 +233,6 @@ static void amd_pmu_init(struct kvm_vcpu *vcpu)
}
}

-static void amd_pmu_reset(struct kvm_vcpu *vcpu)
-{
- struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
- int i;
-
- for (i = 0; i < KVM_AMD_PMC_MAX_GENERIC; i++) {
- struct kvm_pmc *pmc = &pmu->gp_counters[i];
-
- pmc_stop_counter(pmc);
- pmc->counter = pmc->prev_counter = pmc->eventsel = 0;
- }
-
- pmu->global_ctrl = pmu->global_status = 0;
-}
-
struct kvm_pmu_ops amd_pmu_ops __initdata = {
.hw_event_available = amd_hw_event_available,
.pmc_idx_to_pmc = amd_pmc_idx_to_pmc,
@@ -259,7 +244,6 @@ struct kvm_pmu_ops amd_pmu_ops __initdata = {
.set_msr = amd_pmu_set_msr,
.refresh = amd_pmu_refresh,
.init = amd_pmu_init,
- .reset = amd_pmu_reset,
.EVENTSEL_EVENT = AMD64_EVENTSEL_EVENT,
.MAX_NR_GP_COUNTERS = KVM_AMD_PMC_MAX_GENERIC,
.MIN_NR_GP_COUNTERS = AMD64_NUM_COUNTERS,
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 820d3e1f6b4f..90c1f7f07e53 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -632,26 +632,6 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)

static void intel_pmu_reset(struct kvm_vcpu *vcpu)
{
- struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
- struct kvm_pmc *pmc = NULL;
- int i;
-
- for (i = 0; i < KVM_INTEL_PMC_MAX_GENERIC; i++) {
- pmc = &pmu->gp_counters[i];
-
- pmc_stop_counter(pmc);
- pmc->counter = pmc->prev_counter = pmc->eventsel = 0;
- }
-
- for (i = 0; i < KVM_PMC_MAX_FIXED; i++) {
- pmc = &pmu->fixed_counters[i];
-
- pmc_stop_counter(pmc);
- pmc->counter = pmc->prev_counter = 0;
- }
-
- pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status = 0;
-
intel_pmu_release_guest_lbr_event(vcpu);
}

--
2.42.0.758.gaed0368e0e-goog

Date: 2023-10-23 23:41:39
From: Sean Christopherson
Subject: [PATCH 4/6] KVM: x86/pmu: Remove manual clearing of fields in kvm_pmu_init()

Remove code that unnecessarily clears event_count and need_cleanup in
kvm_pmu_init(); the entire kvm_pmu is zeroed just a few lines earlier.
Vendor code doesn't set event_count or need_cleanup during .init(), and
if either VMX or SVM did set those fields it would be a flagrant bug.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/pmu.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 458e836c6efe..c06090196b00 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -710,8 +710,6 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)

memset(pmu, 0, sizeof(*pmu));
static_call(kvm_x86_pmu_init)(vcpu);
- pmu->event_count = 0;
- pmu->need_cleanup = false;
kvm_pmu_refresh(vcpu);
}

--
2.42.0.758.gaed0368e0e-goog

Date: 2023-10-23 23:41:40
From: Sean Christopherson
Subject: [PATCH 3/6] KVM: x86/pmu: Stop calling kvm_pmu_reset() at RESET (it's redundant)

Drop kvm_vcpu_reset()'s call to kvm_pmu_reset(); the call is performed
only for RESET, which is really just the same thing as vCPU creation,
and kvm_arch_vcpu_create() *just* called kvm_pmu_init(), i.e. there can't
possibly be any work to do.

Unlike Intel, AMD's amd_pmu_refresh() does fill all_valid_pmc_idx even if
guest CPUID is empty, but everything that is at all dynamic is guaranteed
to be '0'/NULL, e.g. it should be impossible for KVM to have already
created a perf event.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/pmu.c | 2 +-
arch/x86/kvm/pmu.h | 1 -
arch/x86/kvm/x86.c | 1 -
3 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index dc8e8e907cfb..458e836c6efe 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -657,7 +657,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 0;
}

-void kvm_pmu_reset(struct kvm_vcpu *vcpu)
+static void kvm_pmu_reset(struct kvm_vcpu *vcpu)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
struct kvm_pmc *pmc;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index a46aa9b25150..db9a12c0a2ef 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -243,7 +243,6 @@ bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr);
int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
void kvm_pmu_refresh(struct kvm_vcpu *vcpu);
-void kvm_pmu_reset(struct kvm_vcpu *vcpu);
void kvm_pmu_init(struct kvm_vcpu *vcpu);
void kvm_pmu_cleanup(struct kvm_vcpu *vcpu);
void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0c9686207996..c8ac83a7764e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12207,7 +12207,6 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
}

if (!init_event) {
- kvm_pmu_reset(vcpu);
vcpu->arch.smbase = 0x30000;

vcpu->arch.msr_misc_features_enables = 0;
--
2.42.0.758.gaed0368e0e-goog

Date: 2023-10-23 23:41:40
From: Sean Christopherson
Subject: [PATCH 2/6] KVM: x86/pmu: Reset the PMU, i.e. stop counters, before refreshing

Stop all counters and release all perf events before refreshing the vPMU,
i.e. before reconfiguring the vPMU to respond to changes in the vCPU
model.

Clear need_cleanup in kvm_pmu_reset() as well so that KVM doesn't
prematurely stop counters, e.g. if KVM enters the guest and enables
counters before the vCPU is scheduled out.

Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/pmu.c | 35 ++++++++++++++++++++++-------------
1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 027e9c3c2b93..dc8e8e907cfb 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -657,25 +657,14 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 0;
}

-/* refresh PMU settings. This function generally is called when underlying
- * settings are changed (such as changes of PMU CPUID by guest VMs), which
- * should rarely happen.
- */
-void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
-{
- if (KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm))
- return;
-
- bitmap_zero(vcpu_to_pmu(vcpu)->all_valid_pmc_idx, X86_PMC_IDX_MAX);
- static_call(kvm_x86_pmu_refresh)(vcpu);
-}
-
void kvm_pmu_reset(struct kvm_vcpu *vcpu)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
struct kvm_pmc *pmc;
int i;

+ pmu->need_cleanup = false;
+
bitmap_zero(pmu->reprogram_pmi, X86_PMC_IDX_MAX);

for_each_set_bit(i, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX) {
@@ -695,6 +684,26 @@ void kvm_pmu_reset(struct kvm_vcpu *vcpu)
static_call_cond(kvm_x86_pmu_reset)(vcpu);
}

+
+/*
+ * Refresh the PMU configuration for the vCPU, e.g. if userspace changes CPUID
+ * and/or PERF_CAPABILITIES.
+ */
+void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
+{
+ if (KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm))
+ return;
+
+ /*
+ * Stop/release all existing counters/events before realizing the new
+ * vPMU model.
+ */
+ kvm_pmu_reset(vcpu);
+
+ bitmap_zero(vcpu_to_pmu(vcpu)->all_valid_pmc_idx, X86_PMC_IDX_MAX);
+ static_call(kvm_x86_pmu_refresh)(vcpu);
+}
+
void kvm_pmu_init(struct kvm_vcpu *vcpu)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
--
2.42.0.758.gaed0368e0e-goog

Date: 2023-10-23 23:41:40
From: Sean Christopherson
Subject: [PATCH 5/6] KVM: x86/pmu: Update sample period in pmc_write_counter()

Update a PMC's sample period in pmc_write_counter() to deduplicate code
across all callers of pmc_write_counter(). Opportunistically move
pmc_write_counter() into pmu.c now that it's doing more work.  WRMSR isn't
such a hot path that an extra CALL+RET pair will be problematic, and the
order of function definitions needs to be changed anyways, i.e. now is a
convenient time to eat the churn.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/pmu.c | 27 +++++++++++++++++++++++++++
arch/x86/kvm/pmu.h | 25 +------------------------
arch/x86/kvm/svm/pmu.c | 1 -
arch/x86/kvm/vmx/pmu_intel.c | 2 --
4 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index c06090196b00..3725d001239d 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -161,6 +161,15 @@ static u64 pmc_get_pebs_precise_level(struct kvm_pmc *pmc)
return 1;
}

+static u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
+{
+ u64 sample_period = (-counter_value) & pmc_bitmask(pmc);
+
+ if (!sample_period)
+ sample_period = pmc_bitmask(pmc) + 1;
+ return sample_period;
+}
+
static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
bool exclude_user, bool exclude_kernel,
bool intr)
@@ -268,6 +277,24 @@ static void pmc_stop_counter(struct kvm_pmc *pmc)
}
}

+static void pmc_update_sample_period(struct kvm_pmc *pmc)
+{
+ if (!pmc->perf_event || pmc->is_paused ||
+ !is_sampling_event(pmc->perf_event))
+ return;
+
+ perf_event_period(pmc->perf_event,
+ get_sample_period(pmc, pmc->counter));
+}
+
+void pmc_write_counter(struct kvm_pmc *pmc, u64 val)
+{
+ pmc->counter += val - pmc_read_counter(pmc);
+ pmc->counter &= pmc_bitmask(pmc);
+ pmc_update_sample_period(pmc);
+}
+EXPORT_SYMBOL_GPL(pmc_write_counter);
+
static int filter_cmp(const void *pa, const void *pb, u64 mask)
{
u64 a = *(u64 *)pa & mask;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index db9a12c0a2ef..cae85e550f60 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -74,11 +74,7 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
return counter & pmc_bitmask(pmc);
}

-static inline void pmc_write_counter(struct kvm_pmc *pmc, u64 val)
-{
- pmc->counter += val - pmc_read_counter(pmc);
- pmc->counter &= pmc_bitmask(pmc);
-}
+void pmc_write_counter(struct kvm_pmc *pmc, u64 val);

static inline bool pmc_is_gp(struct kvm_pmc *pmc)
{
@@ -128,25 +124,6 @@ static inline struct kvm_pmc *get_fixed_pmc(struct kvm_pmu *pmu, u32 msr)
return NULL;
}

-static inline u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
-{
- u64 sample_period = (-counter_value) & pmc_bitmask(pmc);
-
- if (!sample_period)
- sample_period = pmc_bitmask(pmc) + 1;
- return sample_period;
-}
-
-static inline void pmc_update_sample_period(struct kvm_pmc *pmc)
-{
- if (!pmc->perf_event || pmc->is_paused ||
- !is_sampling_event(pmc->perf_event))
- return;
-
- perf_event_period(pmc->perf_event,
- get_sample_period(pmc, pmc->counter));
-}
-
static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
{
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 3fd47de14b38..b6a7ad4d6914 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -161,7 +161,6 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
if (pmc) {
pmc_write_counter(pmc, data);
- pmc_update_sample_period(pmc);
return 0;
}
/* MSR_EVNTSELn */
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 90c1f7f07e53..a6216c874729 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -437,11 +437,9 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
!(msr & MSR_PMC_FULL_WIDTH_BIT))
data = (s64)(s32)data;
pmc_write_counter(pmc, data);
- pmc_update_sample_period(pmc);
break;
} else if ((pmc = get_fixed_pmc(pmu, msr))) {
pmc_write_counter(pmc, data);
- pmc_update_sample_period(pmc);
break;
} else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) {
reserved_bits = pmu->reserved_bits;
--
2.42.0.758.gaed0368e0e-goog

Date: 2023-10-23 23:41:40
From: Sean Christopherson
Subject: [PATCH 6/6] KVM: x86/pmu: Track emulated counter events instead of previous counter

Explicitly track emulated counter events instead of using the common
counter value that's shared with the hardware counter owned by perf.
Bumping the common counter requires snapshotting the pre-increment value
in order to detect overflow from emulation, and the snapshot approach is
inherently flawed.

Snapshotting the previous counter at every increment assumes that there is
at most one emulated counter event per emulated instruction (or rather,
between checks for KVM_REQ_PMU).  That mostly holds true today because
KVM only emulates (branch) instructions retired, but the approach will
fall apart if KVM ever supports event types that don't have a 1:1
relationship with instructions.

And KVM already has a relevant bug, as handle_invalid_guest_state()
emulates multiple instructions without checking KVM_REQ_PMU, i.e. could
miss an overflow event due to clobbering pmc->prev_counter. Not checking
KVM_REQ_PMU is problematic in both cases, but at least with the emulated
counter approach, the resulting behavior is delayed overflow detection,
as opposed to completely lost detection.
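
For contrast, an illustrative trace of the same two back-to-back events with
the dedicated field added by this patch:

	pmc->emulated_counter++;	/* event #1, pmc->counter untouched */
	pmc->emulated_counter++;	/* event #2, nothing is clobbered   */

	/* When the pending events are folded into pmc->counter at the next
	 * reprogram, the sum still wraps past pmc_bitmask(pmc), so the
	 * overflow is detected, just later than architecturally ideal. */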

Cc: Mingwei Zhang <[email protected]>
Cc: Roman Kagan <[email protected]>
Cc: Jim Mattson <[email protected]>
Cc: Dapeng Mi <[email protected]>
Cc: Like Xu <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 17 +++++++++++++++-
arch/x86/kvm/pmu.c | 36 +++++++++++++++++++++++----------
arch/x86/kvm/pmu.h | 3 ++-
3 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d7036982332e..d8bc9ba88cfc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -500,8 +500,23 @@ struct kvm_pmc {
u8 idx;
bool is_paused;
bool intr;
+ /*
+ * Base value of the PMC counter, relative to the *consumed* count in
+ * the associated perf_event. This value includes counter updates from
+ * the perf_event and emulated_count since the last time the counter
+ * was reprogrammed, but it is *not* the current value as seen by the
+ * guest or userspace.
+ *
+ * The count is relative to the associated perf_event so that KVM
+ * doesn't need to reprogram the perf_event every time the guest writes
+ * to the counter.
+ */
u64 counter;
- u64 prev_counter;
+ /*
+ * PMC events triggered by KVM emulation that haven't been fully
+ * processed, i.e. haven't undergone overflow detection.
+ */
+ u64 emulated_counter;
u64 eventsel;
struct perf_event *perf_event;
struct kvm_vcpu *vcpu;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 3725d001239d..f02cee222e9a 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -127,9 +127,9 @@ static void kvm_perf_overflow(struct perf_event *perf_event,
struct kvm_pmc *pmc = perf_event->overflow_handler_context;

/*
- * Ignore overflow events for counters that are scheduled to be
- * reprogrammed, e.g. if a PMI for the previous event races with KVM's
- * handling of a related guest WRMSR.
+ * Ignore asynchronous overflow events for counters that are scheduled
+ * to be reprogrammed, e.g. if a PMI for the previous event races with
+ * KVM's handling of a related guest WRMSR.
*/
if (test_and_set_bit(pmc->idx, pmc_to_pmu(pmc)->reprogram_pmi))
return;
@@ -226,13 +226,19 @@ static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,

static void pmc_pause_counter(struct kvm_pmc *pmc)
{
- u64 counter = pmc->counter;
+ /*
+ * Accumulate emulated events, even if the PMC was already paused, e.g.
+ * if KVM emulated an event after a WRMSR, but before reprogramming, or
+ * if KVM couldn't create a perf event.
+ */
+ u64 counter = pmc->counter + pmc->emulated_counter;

- if (!pmc->perf_event || pmc->is_paused)
- return;
+ pmc->emulated_counter = 0;

/* update counter, reset event value to avoid redundant accumulation */
- counter += perf_event_pause(pmc->perf_event, true);
+ if (pmc->perf_event && !pmc->is_paused)
+ counter += perf_event_pause(pmc->perf_event, true);
+
pmc->counter = counter & pmc_bitmask(pmc);
pmc->is_paused = true;
}
@@ -289,6 +295,14 @@ static void pmc_update_sample_period(struct kvm_pmc *pmc)

void pmc_write_counter(struct kvm_pmc *pmc, u64 val)
{
+ /*
+ * Drop any unconsumed accumulated counts, the WRMSR is a write, not a
+ * read-modify-write. Adjust the counter value so that it's value is
+ * relative to the current perf_event (if there is one), as reading the
+ * current count is faster than pausing and repgrogramming the event in
+ * order to reset it to '0'.
+ */
+ pmc->emulated_counter = 0;
pmc->counter += val - pmc_read_counter(pmc);
pmc->counter &= pmc_bitmask(pmc);
pmc_update_sample_period(pmc);
@@ -426,6 +440,7 @@ static bool pmc_event_is_allowed(struct kvm_pmc *pmc)
static void reprogram_counter(struct kvm_pmc *pmc)
{
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+ u64 prev_counter = pmc->counter;
u64 eventsel = pmc->eventsel;
u64 new_config = eventsel;
u8 fixed_ctr_ctrl;
@@ -435,7 +450,7 @@ static void reprogram_counter(struct kvm_pmc *pmc)
if (!pmc_event_is_allowed(pmc))
goto reprogram_complete;

- if (pmc->counter < pmc->prev_counter)
+ if (pmc->counter < prev_counter)
__kvm_perf_overflow(pmc, false);

if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
@@ -475,7 +490,6 @@ static void reprogram_counter(struct kvm_pmc *pmc)

reprogram_complete:
clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
- pmc->prev_counter = 0;
}

void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
@@ -701,6 +715,7 @@ static void kvm_pmu_reset(struct kvm_vcpu *vcpu)

pmc_stop_counter(pmc);
pmc->counter = 0;
+ pmc->emulated_counter = 0;

if (pmc_is_gp(pmc))
pmc->eventsel = 0;
@@ -772,8 +787,7 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)

static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
{
- pmc->prev_counter = pmc->counter;
- pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);
+ pmc->emulated_counter++;
kvm_pmu_request_counter_reprogram(pmc);
}

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index cae85e550f60..7caeb3d8d4fd 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -66,7 +66,8 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
{
u64 counter, enabled, running;

- counter = pmc->counter;
+ counter = pmc->counter + pmc->emulated_counter;
+
if (pmc->perf_event && !pmc->is_paused)
counter += perf_event_read_value(pmc->perf_event,
&enabled, &running);
--
2.42.0.758.gaed0368e0e-goog

Date: 2023-10-27 20:55:21
From: Sean Christopherson
Subject: Re: [PATCH 6/6] KVM: x86/pmu: Track emulated counter events instead of previous counter

On Mon, Oct 23, 2023, Sean Christopherson wrote:
> @@ -226,13 +226,19 @@ static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
>
> static void pmc_pause_counter(struct kvm_pmc *pmc)
> {
> - u64 counter = pmc->counter;
> + /*
> + * Accumulate emulated events, even if the PMC was already paused, e.g.
> + * if KVM emulated an event after a WRMSR, but before reprogramming, or
> + * if KVM couldn't create a perf event.
> + */
> + u64 counter = pmc->counter + pmc->emulated_counter;
>
> - if (!pmc->perf_event || pmc->is_paused)
> - return;
> + pmc->emulated_counter = 0;

As pointed out by Mingwei, who _very_ patiently explained to me what is broken, the
snapshot used to detect overflow due to emulated_counter events needs to be taken
_after_ pausing the perf event, i.e. the count from the perf event needs to be
excluded. If overflow happens from pmc->counter => pmc->counter + pmc->perf_event,
then hardware (via perf) will detect overflow. I.e. KVM is only responsible for
detecting overflow solely due to emulated_counter.  Including the count from the
perf event can lead to KVM generating multiple overflow events, where architecturally
only one should occur.

> /* update counter, reset event value to avoid redundant accumulation */
> - counter += perf_event_pause(pmc->perf_event, true);
> + if (pmc->perf_event && !pmc->is_paused)
> + counter += perf_event_pause(pmc->perf_event, true);
> +
> pmc->counter = counter & pmc_bitmask(pmc);
> pmc->is_paused = true;
> }
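
A minimal sketch of that ordering (assuming the helpers and fields from this
series; the bool return and its consumption by reprogram_counter() are
assumptions about the respin, not settled code): fold in the perf event's
count, take the snapshot, and only then add emulated_counter, so that KVM's
check fires solely for emulated events.

static bool pmc_pause_counter(struct kvm_pmc *pmc)
{
	u64 counter = pmc->counter;
	u64 prev_counter;

	/* Accumulate the perf event's count first; overflow of this portion
	 * is detected by hardware/perf via the PMI callback, not by KVM. */
	if (pmc->perf_event && !pmc->is_paused)
		counter += perf_event_pause(pmc->perf_event, true);

	/* Snapshot *after* the perf count, *before* emulated events. */
	prev_counter = counter & pmc_bitmask(pmc);

	counter += pmc->emulated_counter;
	pmc->emulated_counter = 0;

	pmc->counter = counter & pmc_bitmask(pmc);
	pmc->is_paused = true;

	/* Let the caller emulate a PMI iff emulated events caused a wrap. */
	return pmc->counter < prev_counter;
}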

Date: 2023-11-01 09:37:48
From: Mi, Dapeng
Subject: Re: [PATCH 0/6] KVM: x86/pmu: Clean up emulated PMC event handling


On 10/24/2023 7:39 AM, Sean Christopherson wrote:
> The ultimate goal of this series is to track emulated counter events using
> a dedicated variable instead of trying to track the previous counter value.
> Tracking the previous counter value is flawed as it takes a snapshot at
> every emulated event, but only checks for overflow prior to VM-Enter, i.e.
> KVM could miss an overflow if KVM ever supports emulating event types that
> can occur multiple times in a single VM-Exit.
>
> Patches 1-5 are (some loosely, some tightly) related fixes and cleanups to
> simplify the emulated counter approach implementation. The fixes are
> tagged for stable as userspace could cause some weirdness around perf
> events, but I doubt any real world VMM is actually affected.
>
> Sean Christopherson (6):
> KVM: x86/pmu: Move PMU reset logic to common x86 code
> KVM: x86/pmu: Reset the PMU, i.e. stop counters, before refreshing
> KVM: x86/pmu: Stop calling kvm_pmu_reset() at RESET (it's redundant)
> KVM: x86/pmu: Remove manual clearing of fields in kvm_pmu_init()
> KVM: x86/pmu: Update sample period in pmc_write_counter()
> KVM: x86/pmu: Track emulated counter events instead of previous
> counter
>
> arch/x86/include/asm/kvm-x86-pmu-ops.h | 2 +-
> arch/x86/include/asm/kvm_host.h | 17 +++-
> arch/x86/kvm/pmu.c | 128 +++++++++++++++++++++----
> arch/x86/kvm/pmu.h | 47 +--------
> arch/x86/kvm/svm/pmu.c | 17 ----
> arch/x86/kvm/vmx/pmu_intel.c | 22 -----
> arch/x86/kvm/x86.c | 1 -
> 7 files changed, 127 insertions(+), 107 deletions(-)
>
>
> base-commit: ec2f1daad460c6201338dae606466220ccaa96d5

Reviewed-by: Dapeng Mi <[email protected]>