2022-09-19 09:51:55

by Like Xu

[permalink] [raw]
Subject: [PATCH v2 0/3] KVM: x86: Add AMD Guest PerfMonV2 PMU support

Starting with Zen4, core PMU on AMD platforms such as Genoa and
Ryzen-7000 will support PerfMonV2, and it is also compatible with
legacy PERFCTR_CORE behavior and msr addresses.

If you don't have access to the hardware specification, the commits
d6d0c7f681fd..7685665c390d for host perf can also bring a quick
overview. Its main change is the addition of three msr's equivalent
to Intel V2, namely global_ctrl, global_status, global_status_clear.

It is worth noting that this feature is very attractive for reducing the
overhead of PMU virtualization, since multiple msr accesses to multiple
counters will be replaced by a single access to the global register,
plus more accuracy gain when multiple guest counters are used.

The KVM part is based on the latest vPMU fixes patch set [1], while
the kvm-unit-test patch has been move to the preemptive test cases
effort [2] for testing leagcy AMD vPMU, which didn't exist before.

All related testcases are passed on a Genoa box.
Please feel free to run more tests, add more or share comments.

[1] https://lore.kernel.org/kvm/[email protected]/
[2] https://lore.kernel.org/kvm/[email protected]/

Previous:
https://lore.kernel.org/kvm/[email protected]/

V1 -> V2 Changelog:
- The patch limiting the x86 GP counters is moved to specific patch set;
- MSR scope is bounded by pmu->nr_arch_gp_counters (Jim);
- Fn8000_0022_EBX can never report less than four counters \
(or six, if Fn8000_0001_ECX[PerfCtrExtCore] is set); (Jim)
- Use "kvm_pmu_cap.version > 1" check instead of "guest perfmon_v2 bit";

Like Xu (2):
KVM: x86/pmu: Make part of the Intel v2 PMU MSRs handling x86 generic
KVM: x86/svm/pmu: Add AMD PerfMonV2 support

Sandipan Das (1):
KVM: x86/cpuid: Add AMD CPUID ExtPerfMonAndDbg leaf 0x80000022

arch/x86/include/asm/kvm-x86-pmu-ops.h | 1 -
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/perf_event.h | 8 +++
arch/x86/kvm/cpuid.c | 32 ++++++++++-
arch/x86/kvm/pmu.c | 61 +++++++++++++++++++--
arch/x86/kvm/pmu.h | 30 +++++++++-
arch/x86/kvm/svm/pmu.c | 76 +++++++++++++++++++-------
arch/x86/kvm/vmx/pmu_intel.c | 58 +-------------------
arch/x86/kvm/x86.c | 14 ++++-
9 files changed, 193 insertions(+), 88 deletions(-)

--
2.37.3


2022-09-19 10:17:43

by Like Xu

[permalink] [raw]
Subject: [PATCH v2 2/3] KVM: x86/svm/pmu: Add AMD PerfMonV2 support

From: Like Xu <[email protected]>

If AMD Performance Monitoring Version 2 (PerfMonV2) is detected
by the guest, it can use a new scheme to manage the Core PMCs using
the new global control and status registers.

In addition to benefiting from the PerfMonV2 functionality in the same
way as the host (higher precision), the guest also can reduce the number
of vm-exits by lowering the total number of MSRs accesses.

In terms of implementation details, amd_is_valid_msr() is resurrected
since three newly added MSRs could not be mapped to one vPMC.
The possibility of emulating PerfMonV2 on the mainframe has also
been eliminated for reasons of precision.

Co-developed-by: Sandipan Das <[email protected]>
Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/pmu.c | 6 +++
arch/x86/kvm/svm/pmu.c | 67 +++++++++++++++++++++++++++------
arch/x86/kvm/x86.c | 14 ++++++-
4 files changed, 74 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 323f2fa46bb4..925d07431155 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -509,6 +509,7 @@ struct kvm_pmc {
#define MSR_ARCH_PERFMON_EVENTSEL_MAX (MSR_ARCH_PERFMON_EVENTSEL0 + KVM_INTEL_PMC_MAX_GENERIC - 1)
#define KVM_PMC_MAX_FIXED 3
#define KVM_AMD_PMC_MAX_GENERIC AMD64_NUM_COUNTERS_CORE
+#define MSR_F15H_PERF_MSR_MAX (MSR_F15H_PERF_CTR0 + 2 * (KVM_AMD_PMC_MAX_GENERIC - 1))
struct kvm_pmu {
unsigned nr_arch_gp_counters;
unsigned nr_arch_fixed_counters;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 2643a3994624..8348c9ba5b37 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -455,12 +455,15 @@ int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)

switch (msr) {
case MSR_CORE_PERF_GLOBAL_STATUS:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
msr_info->data = pmu->global_status;
return 0;
case MSR_CORE_PERF_GLOBAL_CTRL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
msr_info->data = pmu->global_ctrl;
return 0;
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
msr_info->data = 0;
return 0;
default:
@@ -479,12 +482,14 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)

switch (msr) {
case MSR_CORE_PERF_GLOBAL_STATUS:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
if (msr_info->host_initiated) {
pmu->global_status = data;
return 0;
}
break; /* RO MSR */
case MSR_CORE_PERF_GLOBAL_CTRL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
if (pmu->global_ctrl == data)
return 0;
if (kvm_valid_perf_global_ctrl(pmu, data)) {
@@ -495,6 +500,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
break;
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
if (!(data & pmu->global_ovf_ctrl_mask)) {
if (!msr_info->host_initiated)
pmu->global_status &= ~data;
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 3a20972e9f1a..8a39e9e33b06 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -92,12 +92,6 @@ static struct kvm_pmc *amd_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
return amd_pmc_idx_to_pmc(vcpu_to_pmu(vcpu), idx & ~(3u << 30));
}

-static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
-{
- /* All MSRs refer to exactly one PMC, so msr_idx_to_pmc is enough. */
- return false;
-}
-
static struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -109,6 +103,29 @@ static struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
return pmc;
}

+static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
+{
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+ switch (msr) {
+ case MSR_K7_EVNTSEL0 ... MSR_K7_PERFCTR3:
+ return pmu->version > 0;
+ case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
+ return guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE);
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
+ return pmu->version > 1;
+ default:
+ if (msr > MSR_F15H_PERF_CTR5 &&
+ msr < MSR_F15H_PERF_CTL0 + 2 * pmu->nr_arch_gp_counters)
+ return pmu->version > 1;
+ break;
+ }
+
+ return amd_msr_idx_to_pmc(vcpu, msr);
+}
+
static int amd_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -162,20 +179,43 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+ struct kvm_cpuid_entry2 *entry;
+ union cpuid_0x80000022_ebx ebx;

- if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
- pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;
- else
- pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
+ pmu->version = 1;
+ if (kvm_pmu_cap.version > 1) {
+ pmu->version = min_t(unsigned int, 2, kvm_pmu_cap.version);
+ entry = kvm_find_cpuid_entry_index(vcpu, 0x80000022, 0);
+ if (entry) {
+ ebx.full = entry->ebx;
+ pmu->nr_arch_gp_counters = min3((unsigned int)ebx.split.num_core_pmc,
+ (unsigned int)kvm_pmu_cap.num_counters_gp,
+ (unsigned int)KVM_AMD_PMC_MAX_GENERIC);
+ }
+ }
+
+ /* Commitment to minimal PMCs, regardless of CPUID.80000022 */
+ if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE)) {
+ pmu->nr_arch_gp_counters = max_t(unsigned int,
+ pmu->nr_arch_gp_counters,
+ AMD64_NUM_COUNTERS_CORE);
+ } else {
+ pmu->nr_arch_gp_counters = max_t(unsigned int,
+ pmu->nr_arch_gp_counters,
+ AMD64_NUM_COUNTERS);
+ }
+
+ if (pmu->version > 1) {
+ pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
+ pmu->global_ovf_ctrl_mask = pmu->global_ctrl_mask;
+ }

pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
pmu->reserved_bits = 0xfffffff000280000ull;
pmu->raw_event_mask = AMD64_RAW_EVENT_MASK;
- pmu->version = 1;
/* not applicable to AMD; but clean them to prevent any fall out */
pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
pmu->nr_arch_fixed_counters = 0;
- pmu->global_status = 0;
bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
}

@@ -186,6 +226,7 @@ static void amd_pmu_init(struct kvm_vcpu *vcpu)

BUILD_BUG_ON(AMD64_NUM_COUNTERS_CORE > KVM_AMD_PMC_MAX_GENERIC);
BUILD_BUG_ON(KVM_AMD_PMC_MAX_GENERIC > INTEL_PMC_MAX_GENERIC);
+ BUILD_BUG_ON(KVM_AMD_PMC_MAX_GENERIC < 1);

for (i = 0; i < KVM_AMD_PMC_MAX_GENERIC ; i++) {
pmu->gp_counters[i].type = KVM_PMC_GP;
@@ -206,6 +247,8 @@ static void amd_pmu_reset(struct kvm_vcpu *vcpu)
pmc_stop_counter(pmc);
pmc->counter = pmc->prev_counter = pmc->eventsel = 0;
}
+
+ pmu->global_ctrl = pmu->global_status = 0;
}

struct kvm_pmu_ops amd_pmu_ops __initdata = {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1d28d147fc34..34ceae4e2354 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1445,6 +1445,10 @@ static const u32 msrs_to_save_all[] = {
MSR_F15H_PERF_CTR0, MSR_F15H_PERF_CTR1, MSR_F15H_PERF_CTR2,
MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5,

+ MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+ MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+ MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+
MSR_IA32_XFD, MSR_IA32_XFD_ERR,
};

@@ -3848,7 +3852,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_PEBS_ENABLE:
case MSR_IA32_DS_AREA:
case MSR_PEBS_DATA_CFG:
- case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
+ case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_MSR_MAX:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
if (kvm_pmu_is_valid_msr(vcpu, msr))
return kvm_pmu_set_msr(vcpu, msr_info);
/*
@@ -3951,7 +3958,10 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_PEBS_ENABLE:
case MSR_IA32_DS_AREA:
case MSR_PEBS_DATA_CFG:
- case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
+ case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_MSR_MAX:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
return kvm_pmu_get_msr(vcpu, msr_info);
/*
--
2.37.3

2022-09-19 10:18:01

by Like Xu

[permalink] [raw]
Subject: [PATCH v2 3/3] KVM: x86/cpuid: Add AMD CPUID ExtPerfMonAndDbg leaf 0x80000022

From: Sandipan Das <[email protected]>

From: Sandipan Das <[email protected]>

CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some
new performance monitoring features for AMD processors.

Bit 0 of EAX indicates support for Performance Monitoring
Version 2 (PerfMonV2) features. If found to be set during
PMU initialization, the EBX bits of the same CPUID function
can be used to determine the number of available PMCs for
different PMU types.

Expose the relevant bits via KVM_GET_SUPPORTED_CPUID so
that guests can make use of the PerfMonV2 features.

Co-developed-by: Like Xu <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Sandipan Das <[email protected]>
---
arch/x86/include/asm/perf_event.h | 8 ++++++++
arch/x86/kvm/cpuid.c | 32 ++++++++++++++++++++++++++++++-
2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index f6fc8dd51ef4..c848f504e467 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -214,6 +214,14 @@ union cpuid_0x80000022_ebx {
unsigned int full;
};

+union cpuid_0x80000022_eax {
+ struct {
+ /* Performance Monitoring Version 2 Supported */
+ unsigned int perfmon_v2:1;
+ } split;
+ unsigned int full;
+};
+
struct x86_pmu_capability {
int version;
int num_counters_gp;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 75dcf7a72605..34ba845c91b7 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1094,7 +1094,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->edx = 0;
break;
case 0x80000000:
- entry->eax = min(entry->eax, 0x80000021);
+ entry->eax = min(entry->eax, 0x80000022);
/*
* Serializing LFENCE is reported in a multitude of ways, and
* NullSegClearsBase is not reported in CPUID on Zen2; help
@@ -1203,6 +1203,36 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
if (!static_cpu_has_bug(X86_BUG_NULL_SEG))
entry->eax |= BIT(6);
break;
+ /* AMD Extended Performance Monitoring and Debug */
+ case 0x80000022: {
+ union cpuid_0x80000022_eax eax;
+ union cpuid_0x80000022_ebx ebx;
+
+ entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
+ if (!enable_pmu)
+ break;
+
+ if (kvm_pmu_cap.version > 1) {
+ /* AMD PerfMon is only supported up to V2 in the KVM. */
+ eax.split.perfmon_v2 = 1;
+ ebx.split.num_core_pmc = min(kvm_pmu_cap.num_counters_gp,
+ KVM_AMD_PMC_MAX_GENERIC);
+ }
+
+ if (kvm_cpu_cap_has(X86_FEATURE_PERFCTR_CORE)) {
+ ebx.split.num_core_pmc = max_t(unsigned int,
+ ebx.split.num_core_pmc,
+ AMD64_NUM_COUNTERS_CORE);
+ } else {
+ ebx.split.num_core_pmc = max_t(unsigned int,
+ ebx.split.num_core_pmc,
+ AMD64_NUM_COUNTERS);
+ }
+
+ entry->eax = eax.full;
+ entry->ebx = ebx.full;
+ break;
+ }
/*Add support for Centaur's CPUID instruction*/
case 0xC0000000:
/*Just support up to 0xC0000004 now*/
--
2.37.3

2022-09-21 00:16:34

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] KVM: x86/cpuid: Add AMD CPUID ExtPerfMonAndDbg leaf 0x80000022

On Mon, Sep 19, 2022 at 2:35 AM Like Xu <[email protected]> wrote:
>
> From: Sandipan Das <[email protected]>
>
> From: Sandipan Das <[email protected]>
>
> CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some
> new performance monitoring features for AMD processors.
>
> Bit 0 of EAX indicates support for Performance Monitoring
> Version 2 (PerfMonV2) features. If found to be set during
> PMU initialization, the EBX bits of the same CPUID function
> can be used to determine the number of available PMCs for
> different PMU types.
>
> Expose the relevant bits via KVM_GET_SUPPORTED_CPUID so
> that guests can make use of the PerfMonV2 features.
>
> Co-developed-by: Like Xu <[email protected]>
> Signed-off-by: Like Xu <[email protected]>
> Signed-off-by: Sandipan Das <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>

2022-09-21 00:40:33

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] KVM: x86/svm/pmu: Add AMD PerfMonV2 support

On Mon, Sep 19, 2022 at 2:35 AM Like Xu <[email protected]> wrote:
>
> From: Like Xu <[email protected]>
>
> If AMD Performance Monitoring Version 2 (PerfMonV2) is detected
> by the guest, it can use a new scheme to manage the Core PMCs using
> the new global control and status registers.
>
> In addition to benefiting from the PerfMonV2 functionality in the same
> way as the host (higher precision), the guest also can reduce the number
> of vm-exits by lowering the total number of MSRs accesses.
>
> In terms of implementation details, amd_is_valid_msr() is resurrected
> since three newly added MSRs could not be mapped to one vPMC.
> The possibility of emulating PerfMonV2 on the mainframe has also
> been eliminated for reasons of precision.
>
> Co-developed-by: Sandipan Das <[email protected]>
> Signed-off-by: Sandipan Das <[email protected]>
> Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>

2022-10-27 23:09:04

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] KVM: x86/svm/pmu: Add AMD PerfMonV2 support

On Mon, Sep 19, 2022, Like Xu wrote:
> @@ -162,20 +179,43 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
> {
> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> + struct kvm_cpuid_entry2 *entry;
> + union cpuid_0x80000022_ebx ebx;
>
> - if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
> - pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;
> - else
> - pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
> + pmu->version = 1;
> + if (kvm_pmu_cap.version > 1) {
> + pmu->version = min_t(unsigned int, 2, kvm_pmu_cap.version);

This is wrong, it forces the guest PMU verson to match the max version supported
by KVM. E.g. if userspace wants to expose v1 for whatever reason, pmu->version
will still end up 2+.

> + entry = kvm_find_cpuid_entry_index(vcpu, 0x80000022, 0);
> + if (entry) {
> + ebx.full = entry->ebx;
> + pmu->nr_arch_gp_counters = min3((unsigned int)ebx.split.num_core_pmc,
> + (unsigned int)kvm_pmu_cap.num_counters_gp,
> + (unsigned int)KVM_AMD_PMC_MAX_GENERIC);

This is technically wrong, the number of counters is supposed to be valid if and
only if v2 is supported. On a related topic, does KVM explode if userspace
specifies a bogus PMU version on Intel? I don't see any sanity checks there...

With a proper feature flag

pmu->version = 1;
if (kvm_cpu_has(X86_FEATURE_AMD_PMU_V2) &&
guest_cpuid_has(X86_FEATURE_AMD_PMU_V2)) {
pmu->version = 2;

entry = kvm_find_cpuid_entry_index(vcpu, 0x80000022, 0);
if (entry) {
...

Though technically the "if (entry)" check is unnecesary.

> + }
> + }
> +
> + /* Commitment to minimal PMCs, regardless of CPUID.80000022 */
> + if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE)) {

Unnecessary braces.

> + pmu->nr_arch_gp_counters = max_t(unsigned int,
> + pmu->nr_arch_gp_counters,
> + AMD64_NUM_COUNTERS_CORE);

What happens if userspace sets X86_FEATURE_PERFCTR_CORE when its not supported?
E.g. will KVM be coerced into taking a #GP on a non-existent counter?


> + } else {
> + pmu->nr_arch_gp_counters = max_t(unsigned int,
> + pmu->nr_arch_gp_counters,
> + AMD64_NUM_COUNTERS);
> + }

2022-10-27 23:09:09

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] KVM: x86/cpuid: Add AMD CPUID ExtPerfMonAndDbg leaf 0x80000022

On Mon, Sep 19, 2022, Like Xu wrote:
> From: Sandipan Das <[email protected]>
>
> From: Sandipan Das <[email protected]>

Duplicate "From:"s.

> CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some
> new performance monitoring features for AMD processors.

Wrap changelogs closer to ~75 chars.

> Bit 0 of EAX indicates support for Performance Monitoring
> Version 2 (PerfMonV2) features. If found to be set during
> PMU initialization, the EBX bits of the same CPUID function
> can be used to determine the number of available PMCs for
> different PMU types.
>
> Expose the relevant bits via KVM_GET_SUPPORTED_CPUID so
> that guests can make use of the PerfMonV2 features.
>
> Co-developed-by: Like Xu <[email protected]>
> Signed-off-by: Like Xu <[email protected]>
> Signed-off-by: Sandipan Das <[email protected]>
> ---
> arch/x86/include/asm/perf_event.h | 8 ++++++++
> arch/x86/kvm/cpuid.c | 32 ++++++++++++++++++++++++++++++-
> 2 files changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> index f6fc8dd51ef4..c848f504e467 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -214,6 +214,14 @@ union cpuid_0x80000022_ebx {
> unsigned int full;
> };
>
> +union cpuid_0x80000022_eax {
> + struct {
> + /* Performance Monitoring Version 2 Supported */
> + unsigned int perfmon_v2:1;
> + } split;
> + unsigned int full;
> +};

I'm not a fan of perf's unions, but I at least understand the value added for
CPUID entries that are a bunch of multi-bit values. However, this leaf appears
to be a pure features leaf. In which case a union just makes life painful.

Please add a CPUID_8000_0022_EAX kvm_only_cpuid_leafs entry (details in link[*]
below) so that KVM can write sane code like

guest_cpuid_has(X86_FEATURE_AMD_PMU_V2)

and cpuid_entry_override() instead of manually filling in information.

where appropriate.

[*] https://lore.kernel.org/all/[email protected]

> struct x86_pmu_capability {
> int version;
> int num_counters_gp;
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 75dcf7a72605..34ba845c91b7 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1094,7 +1094,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
> entry->edx = 0;
> break;
> case 0x80000000:
> - entry->eax = min(entry->eax, 0x80000021);
> + entry->eax = min(entry->eax, 0x80000022);
> /*
> * Serializing LFENCE is reported in a multitude of ways, and
> * NullSegClearsBase is not reported in CPUID on Zen2; help
> @@ -1203,6 +1203,36 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
> if (!static_cpu_has_bug(X86_BUG_NULL_SEG))
> entry->eax |= BIT(6);
> break;
> + /* AMD Extended Performance Monitoring and Debug */
> + case 0x80000022: {
> + union cpuid_0x80000022_eax eax;
> + union cpuid_0x80000022_ebx ebx;
> +
> + entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
> + if (!enable_pmu)

Shouldn't

case 0xa: { /* Architectural Performance Monitoring */

also check enable_pmu instead of X86_FEATURE_ARCH_PERFMON?

> + break;
> +
> + if (kvm_pmu_cap.version > 1) {
> + /* AMD PerfMon is only supported up to V2 in the KVM. */
> + eax.split.perfmon_v2 = 1;

With a proper CPUID_8000_0022_EAX, this becomes:

entry->ecx = entry->edx = 0;
if (!enable_pmu || !kvm_cpu_cap_has(X86_FEATURE_AMD_PMU_V2)) {
entry->eax = entry->ebx;
break;
}

cpuid_entry_override(entry, CPUID_8000_0022_EAX);

...

entry->ebx = ebx.full;

2022-11-09 10:11:13

by Like Xu

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] KVM: x86/svm/pmu: Add AMD PerfMonV2 support

On 28/10/2022 6:47 am, Sean Christopherson wrote:
> What happens if userspace sets X86_FEATURE_PERFCTR_CORE when its not supported?
> E.g. will KVM be coerced into taking a #GP on a non-existent counter?

I'm getting a bit tired of this generic issue, what does KVM need to do when the
KVM user space
sets a capability that KVM doesn't support (like cpuid). Should it change the
guest cpuid audibly
or silently ? Should it report an error when the guest uses this unsupported
capability at run time,
or should it just let the KVM report an error when user space setting the cpuid ?

For vPMU, I may prefer to report an error when setting the vpmu capability (via
cpuid or perf_cap).

Please let me know what you think and make it law as an example to others.

Thanks,
Like Xu

2022-11-09 14:55:05

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] KVM: x86/svm/pmu: Add AMD PerfMonV2 support

On Wed, Nov 09, 2022, Like Xu wrote:
> On 28/10/2022 6:47 am, Sean Christopherson wrote:
> > What happens if userspace sets X86_FEATURE_PERFCTR_CORE when its not supported?
> > E.g. will KVM be coerced into taking a #GP on a non-existent counter?
>
> I'm getting a bit tired of this generic issue, what does KVM need to do when
> the KVM user space sets a capability that KVM doesn't support (like cpuid).
> Should it change the guest cpuid audibly or silently ? Should it report an
> error when the guest uses this unsupported capability at run time, or should
> it just let the KVM report an error when user space setting the cpuid ?

KVM should do nothing unless the bogus CPUID can be used to trigger unexpected
and/or unwanted behavior in the kernel, which is why I asked if KVM might end up
taking a #GP.

2022-11-10 09:37:54

by Like Xu

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] KVM: x86/cpuid: Add AMD CPUID ExtPerfMonAndDbg leaf 0x80000022

On 28/10/2022 6:37 am, Sean Christopherson wrote:
> On Mon, Sep 19, 2022, Like Xu wrote:
>> From: Sandipan Das <[email protected]>
>>
>> From: Sandipan Das <[email protected]>
>
> Duplicate "From:"s.
>
>> CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some
>> new performance monitoring features for AMD processors.
>
> Wrap changelogs closer to ~75 chars.
>
>> Bit 0 of EAX indicates support for Performance Monitoring
>> Version 2 (PerfMonV2) features. If found to be set during
>> PMU initialization, the EBX bits of the same CPUID function
>> can be used to determine the number of available PMCs for
>> different PMU types.
>>
>> Expose the relevant bits via KVM_GET_SUPPORTED_CPUID so
>> that guests can make use of the PerfMonV2 features.
>>
>> Co-developed-by: Like Xu <[email protected]>
>> Signed-off-by: Like Xu <[email protected]>
>> Signed-off-by: Sandipan Das <[email protected]>
>> ---
>> arch/x86/include/asm/perf_event.h | 8 ++++++++
>> arch/x86/kvm/cpuid.c | 32 ++++++++++++++++++++++++++++++-
>> 2 files changed, 39 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
>> index f6fc8dd51ef4..c848f504e467 100644
>> --- a/arch/x86/include/asm/perf_event.h
>> +++ b/arch/x86/include/asm/perf_event.h
>> @@ -214,6 +214,14 @@ union cpuid_0x80000022_ebx {
>> unsigned int full;
>> };
>>
>> +union cpuid_0x80000022_eax {
>> + struct {
>> + /* Performance Monitoring Version 2 Supported */
>> + unsigned int perfmon_v2:1;
>> + } split;
>> + unsigned int full;
>> +};
>
> I'm not a fan of perf's unions, but I at least understand the value added for
> CPUID entries that are a bunch of multi-bit values. However, this leaf appears
> to be a pure features leaf. In which case a union just makes life painful.
>
> Please add a CPUID_8000_0022_EAX kvm_only_cpuid_leafs entry (details in link[*]
> below) so that KVM can write sane code like
>
> guest_cpuid_has(X86_FEATURE_AMD_PMU_V2)
>
> and cpuid_entry_override() instead of manually filling in information.
>
> where appropriate.
>
> [*] https://lore.kernel.org/all/[email protected]

When someone is selling syntactic sugar in the kernel space, extra attention
needs to be paid to runtime performance (union) and memory footprint
(reverse_cpuid).

Applied for this case, while the cpuid_0x80000022_eax will be used again
in the perf core since the other new AMD PMU features are pacing at the door.

>
>> struct x86_pmu_capability {
>> int version;
>> int num_counters_gp;
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index 75dcf7a72605..34ba845c91b7 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -1094,7 +1094,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>> entry->edx = 0;
>> break;
>> case 0x80000000:
>> - entry->eax = min(entry->eax, 0x80000021);
>> + entry->eax = min(entry->eax, 0x80000022);
>> /*
>> * Serializing LFENCE is reported in a multitude of ways, and
>> * NullSegClearsBase is not reported in CPUID on Zen2; help
>> @@ -1203,6 +1203,36 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>> if (!static_cpu_has_bug(X86_BUG_NULL_SEG))
>> entry->eax |= BIT(6);
>> break;
>> + /* AMD Extended Performance Monitoring and Debug */
>> + case 0x80000022: {
>> + union cpuid_0x80000022_eax eax;
>> + union cpuid_0x80000022_ebx ebx;
>> +
>> + entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
>> + if (!enable_pmu)
>
> Shouldn't
>
> case 0xa: { /* Architectural Performance Monitoring */
>
> also check enable_pmu instead of X86_FEATURE_ARCH_PERFMON?

Applied as a separate patch, though KVM will have zero-padded kvm_pmu_cap to do
subsequent assignments when !enable_pmu but that doesn't hurt.

>
>> + break;
>> +
>> + if (kvm_pmu_cap.version > 1) {
>> + /* AMD PerfMon is only supported up to V2 in the KVM. */
>> + eax.split.perfmon_v2 = 1;
>
> With a proper CPUID_8000_0022_EAX, this becomes:
>
> entry->ecx = entry->edx = 0;
> if (!enable_pmu || !kvm_cpu_cap_has(X86_FEATURE_AMD_PMU_V2)) {
> entry->eax = entry->ebx;
> break;
> }
>
> cpuid_entry_override(entry, CPUID_8000_0022_EAX);
>
> ...

Then in this code block, we will have:

/* AMD PerfMon is only supported up to V2 in the KVM. */
entry->eax |= BIT(0);

to cover AMD Perfmon V3+, any better move ?

>
> entry->ebx = ebx.full;

2022-11-10 18:05:26

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] KVM: x86/cpuid: Add AMD CPUID ExtPerfMonAndDbg leaf 0x80000022

On Thu, Nov 10, 2022, Like Xu wrote:
> On 28/10/2022 6:37 am, Sean Christopherson wrote:
> > I'm not a fan of perf's unions, but I at least understand the value added for
> > CPUID entries that are a bunch of multi-bit values. However, this leaf appears
> > to be a pure features leaf. In which case a union just makes life painful.
> >
> > Please add a CPUID_8000_0022_EAX kvm_only_cpuid_leafs entry (details in link[*]
> > below) so that KVM can write sane code like
> >
> > guest_cpuid_has(X86_FEATURE_AMD_PMU_V2)
> >
> > and cpuid_entry_override() instead of manually filling in information.
> >
> > where appropriate.
> >
> > [*] https://lore.kernel.org/all/[email protected]
>
> When someone is selling syntactic sugar in the kernel space, extra attention
> needs to be paid to runtime performance (union) and memory footprint
> (reverse_cpuid).

No. Just no.

First off, this is more than syntactic sugar. KVM has had multiple bugs in the
past due to manually querying/filling CPUID entries. The reverse-CPUID infrastructure
guards against some of those bugs by limiting potential bugs to the leaf definition
and the feature definition. I.e. we only need to get the cpuid_leafs+X86_FEATURE_*
definitions correct.

Second, this code is not remotely performance sensitive, and the memory footprint
of the reverse_cpuid table is laughably small. It's literally 8 bytes per entry
FOR THE ENTIRE KERNEL. And that's ignoring the fact that the table might even be
optimized away entirely since it's really just a switch statement that doesn't
use a helper function.

> > With a proper CPUID_8000_0022_EAX, this becomes:
> >
> > entry->ecx = entry->edx = 0;
> > if (!enable_pmu || !kvm_cpu_cap_has(X86_FEATURE_AMD_PMU_V2)) {
> > entry->eax = entry->ebx;
> > break;
> > }
> >
> > cpuid_entry_override(entry, CPUID_8000_0022_EAX);
> >
> > ...
>
> Then in this code block, we will have:
>
> /* AMD PerfMon is only supported up to V2 in the KVM. */
> entry->eax |= BIT(0);

I can't tell exactly what you're suggesting, but if you're implying that you don't
want to add CPUID_8000_0022_EAX, then NAK. Open coding CPUID feature bit
manipulations in KVM is not acceptable.

If I'm misunderstanding and there's something that isn't handled by
cpuid_entry_override(), then the correct way to force a CPUID feature bit is:

cpuid_entry_set(entry, X86_FEATURE_AMD_PMU_V2);

> to cover AMD Perfmon V3+, any better move ?

Huh? If/when V3+ comes along, the above

cpuid_entry_override(entry, CPUID_8000_0022_EAX);

will continue to do the right thing because KVM will (a) advertise V2 if it's
supported in hardware and (b) NOT advertise V3+ because the relevant CPUID bit(s)
will not be set in kvm_cpu_caps until KVM gains the necessary support.