2022-10-06 00:04:58

by Sean Christopherson

Subject: [PATCH v5 0/8] KVM: x86: Intel LBR related perf cleanups

PeterZ, I dropped your ACK from v4 because the perf patches were completely
broken.

Fix a bug where KVM incorrectly advertises PMU_CAP_LBR_FMT to userspace if
perf has disabled LBRs, e.g. because probing one or more LBR MSRs during
setup hit a #GP.
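
For context, a hedged userspace sketch of how the advertised value is
observed: when KVM_CAP_GET_MSR_FEATURES is available, KVM_GET_MSRS on the
system fd reads KVM's supported feature MSR values (the constants are
spelled out here purely for illustration, and error handling is minimal):

  #include <stdio.h>
  #include <string.h>
  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  #define MSR_IA32_PERF_CAPABILITIES 0x345
  #define PMU_CAP_LBR_FMT            0x3f

  int main(void)
  {
          int kvm_fd = open("/dev/kvm", O_RDWR);
          struct {
                  struct kvm_msrs hdr;
                  struct kvm_msr_entry entry;
          } msrs;

          memset(&msrs, 0, sizeof(msrs));
          msrs.hdr.nmsrs = 1;
          msrs.entry.index = MSR_IA32_PERF_CAPABILITIES;

          /* System-scope read of KVM's supported PERF_CAPABILITIES. */
          if (kvm_fd >= 0 && ioctl(kvm_fd, KVM_GET_MSRS, &msrs) == 1)
                  printf("LBR format: 0x%llx\n",
                         (unsigned long long)(msrs.entry.data & PMU_CAP_LBR_FMT));
          return 0;
  }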

The non-KVM patch cleans up a KVM-specific perf API to fix a benign bug
where KVM ignores the error return from the API.

The remaining patches clean up KVM's PERF_CAPABILITIES mess, which makes
everything far more complex than it needs to be by computing the supported
capabilities on the fly instead of tracking them in kvm_caps.

v5:
- Drop perf patches that removed stubs. The stubs are sadly necessary
when CPU_SUP_INTEL=n && KVM_INTEL={m,y}, which is possible due to
KVM_INTEL effectively depending on INTEL || CENTAUR || ZHAOXIN.
[hint provided by kernel test robot].
- Add a patch to ignore guest CPUID on host userspace MSR writes.
- Add supported PERF_CAPABILITIES to kvm_caps to simplify code for all
parties.

v4:
- https://lore.kernel.org/all/[email protected]
- Make vmx_get_perf_capabilities() non-inline to avoid references to
x86_perf_get_lbr() when CPU_SUP_INTEL=n. [kernel test robot]

v3:
- https://lore.kernel.org/all/[email protected]
- Drop patches for bug #1 (already merged).
- Drop misguided "clean up the capability check" patch. [Like]

v2:
- https://lore.kernel.org/all/[email protected]
- Add patches to fix bug #2. [Like]
- Add a patch to clean up the capability check.
- Tweak the changelog for the PMU refresh bug fix to call out that
KVM should disallow changing feature MSRs after KVM_RUN. [Like]

v1: https://lore.kernel.org/all/[email protected]

Sean Christopherson (8):
perf/x86/core: Zero @lbr instead of returning -1 in x86_perf_get_lbr()
stub
KVM: VMX: Advertise PMU LBRs if and only if perf supports LBRs
KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()
KVM: VMX: Ignore guest CPUID for host userspace writes to DEBUGCTL
KVM: x86: Track supported PERF_CAPABILITIES in kvm_caps
KVM: x86: Init vcpu->arch.perf_capabilities in common x86 code
KVM: x86: Handle PERF_CAPABILITIES in common x86's
kvm_get_msr_feature()
KVM: x86: Directly query supported PERF_CAPABILITIES for WRMSR checks

arch/x86/events/intel/lbr.c | 6 +---
arch/x86/include/asm/perf_event.h | 6 ++--
arch/x86/kvm/svm/svm.c | 3 +-
arch/x86/kvm/vmx/capabilities.h | 37 ----------------------
arch/x86/kvm/vmx/pmu_intel.c | 1 -
arch/x86/kvm/vmx/vmx.c | 51 +++++++++++++++++++++++--------
arch/x86/kvm/x86.c | 14 ++++-----
arch/x86/kvm/x86.h | 1 +
8 files changed, 52 insertions(+), 67 deletions(-)


base-commit: e18d6152ff0f41b7f01f9817372022df04e0d354
--
2.38.0.rc1.362.ged0d419d3c-goog


2022-10-06 00:05:09

by Sean Christopherson

Subject: [PATCH v5 4/8] KVM: VMX: Ignore guest CPUID for host userspace writes to DEBUGCTL

Ignore guest CPUID for host userspace writes to the DEBUGCTL MSR. KVM's
ABI is that setting CPUID vs. state can be done in any order, i.e. KVM
allows userspace to stuff MSRs prior to setting the guest's CPUID that
makes the new MSR "legal".

Keep the vmx_get_perf_capabilities() check for guest writes, even though
it's technically unnecessary since the vCPU's PERF_CAPABILITIES is
consulted when refreshing LBR support. A future patch will clean up
vmx_get_perf_capabilities() to avoid the RDMSR on every call, at which
point the paranoia will incur no meaningful overhead.

Note, prior to vmx_get_perf_capabilities() checking that the host fully
supports LBRs via x86_perf_get_lbr(), KVM effectively relied on
intel_pmu_lbr_is_enabled() to guard against host userspace enabling LBRs
on platforms without full support.
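
For illustration only, a minimal userspace sketch of the ordering this
preserves (the helper name and lack of error handling are hypothetical;
KVM_SET_MSRS and KVM_SET_CPUID2 are the actual vCPU ioctls):

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  #define MSR_IA32_DEBUGCTLMSR 0x1d9
  #define DEBUGCTLMSR_LBR      (1ULL << 0)

  /* Stuff DEBUGCTL via a host-initiated write before guest CPUID is set. */
  static void stuff_debugctl_first(int vcpu_fd)
  {
          struct {
                  struct kvm_msrs hdr;
                  struct kvm_msr_entry entry;
          } msrs;

          memset(&msrs, 0, sizeof(msrs));
          msrs.hdr.nmsrs = 1;
          msrs.entry.index = MSR_IA32_DEBUGCTLMSR;
          msrs.entry.data = DEBUGCTLMSR_LBR;

          /* Host-initiated write; legal before CPUID is set, assuming the
           * host itself supports LBRs. */
          ioctl(vcpu_fd, KVM_SET_MSRS, &msrs);

          /* KVM_SET_CPUID2 with an LBR-capable model can come later. */
  }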

Fixes: c646236344e9 ("KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabled")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 97fc873c37fa..e70ac91cd2fb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2021,16 +2021,16 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
return (unsigned long)data;
}

-static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu)
+static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
{
u64 debugctl = 0;

if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) &&
- guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
+ (host_initiated || guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT)))
debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT;

if ((vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT) &&
- intel_pmu_lbr_is_enabled(vcpu))
+ (host_initiated || intel_pmu_lbr_is_enabled(vcpu)))
debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;

return debugctl;
@@ -2105,7 +2105,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vmcs_writel(GUEST_SYSENTER_ESP, data);
break;
case MSR_IA32_DEBUGCTLMSR: {
- u64 invalid = data & ~vcpu_supported_debugctl(vcpu);
+ u64 invalid;
+
+ invalid = data & ~vmx_get_supported_debugctl(vcpu, msr_info->host_initiated);
if (invalid & (DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR)) {
if (report_ignored_msrs)
vcpu_unimpl(vcpu, "%s: BTF|LBR in IA32_DEBUGCTLMSR 0x%llx, nop\n",
--
2.38.0.rc1.362.ged0d419d3c-goog

2022-10-06 00:05:23

by Sean Christopherson

Subject: [PATCH v5 3/8] KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()

Fold vmx_supported_debugctl() into vcpu_supported_debugctl(), its only
caller. Setting bits only to clear them a few instructions later is
rather silly, and splitting the logic makes things seem more complicated
than they actually are.

Opportunistically drop DEBUGCTLMSR_LBR_MASK now that there's a single
reference to the pair of bits. The extra layer of indirection provides
no meaningful value and makes it unnecessarily tedious to understand
what KVM is doing.

No functional change.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/capabilities.h | 15 ---------------
arch/x86/kvm/vmx/vmx.c | 12 +++++++-----
2 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index a6689bf06542..479124e49bbd 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -24,8 +24,6 @@ extern int __read_mostly pt_mode;
#define PMU_CAP_FW_WRITES (1ULL << 13)
#define PMU_CAP_LBR_FMT 0x3f

-#define DEBUGCTLMSR_LBR_MASK (DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI)
-
struct nested_vmx_msrs {
/*
* We only store the "true" versions of the VMX capability MSRs. We
@@ -422,19 +420,6 @@ static inline u64 vmx_get_perf_capabilities(void)
return perf_cap;
}

-static inline u64 vmx_supported_debugctl(void)
-{
- u64 debugctl = 0;
-
- if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
- debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT;
-
- if (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT)
- debugctl |= DEBUGCTLMSR_LBR_MASK;
-
- return debugctl;
-}
-
static inline bool cpu_has_notify_vmexit(void)
{
return vmcs_config.cpu_based_2nd_exec_ctrl &
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9dba04b6b019..97fc873c37fa 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2023,13 +2023,15 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,

static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu)
{
- u64 debugctl = vmx_supported_debugctl();
+ u64 debugctl = 0;

- if (!intel_pmu_lbr_is_enabled(vcpu))
- debugctl &= ~DEBUGCTLMSR_LBR_MASK;
+ if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
+ debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT;

- if (!guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
- debugctl &= ~DEBUGCTLMSR_BUS_LOCK_DETECT;
+ if ((vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT) &&
+ intel_pmu_lbr_is_enabled(vcpu))
+ debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;

return debugctl;
}
--
2.38.0.rc1.362.ged0d419d3c-goog

2022-10-06 00:06:38

by Sean Christopherson

Subject: [PATCH v5 1/8] perf/x86/core: Zero @lbr instead of returning -1 in x86_perf_get_lbr() stub

Drop the return value from x86_perf_get_lbr() and have the stub zero out
the @lbr structure instead of returning -1 to indicate "no LBR support".
KVM doesn't actually check the return value, and instead subtly relies on
zeroing the number of LBRs in intel_pmu_init().

Formalize "nr=0 means unsupported" so that KVM doesn't need to add a
pointless check on the return value to fix KVM's benign bug.

Note, the stub is necessary even though KVM x86 selects PERF_EVENTS and
the caller exists only when CONFIG_KVM_INTEL=y. Despite the name,
KVM_INTEL doesn't strictly require CPU_SUP_INTEL; it can be built with
any of INTEL || CENTAUR || ZHAOXIN CPUs.
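
For illustration, a hypothetical caller-side sketch of the formalized
contract (this mirrors the check KVM adds in the next patch; the helper
name is made up):

  /* "No LBR support" is now communicated purely via lbr.nr == 0. */
  static bool lbr_is_fully_supported(void)
  {
          struct x86_pmu_lbr lbr;

          x86_perf_get_lbr(&lbr);
          return lbr.nr != 0;
  }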

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/events/intel/lbr.c | 6 +-----
arch/x86/include/asm/perf_event.h | 6 +++---
2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 47fca6a7a8bc..3abf7b041220 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1876,10 +1876,8 @@ void __init intel_pmu_arch_lbr_init(void)
* x86_perf_get_lbr - get the LBR records information
*
* @lbr: the caller's memory to store the LBR records information
- *
- * Returns: 0 indicates the LBR info has been successfully obtained
*/
-int x86_perf_get_lbr(struct x86_pmu_lbr *lbr)
+void x86_perf_get_lbr(struct x86_pmu_lbr *lbr)
{
int lbr_fmt = x86_pmu.intel_cap.lbr_format;

@@ -1887,8 +1885,6 @@ int x86_perf_get_lbr(struct x86_pmu_lbr *lbr)
lbr->from = x86_pmu.lbr_from;
lbr->to = x86_pmu.lbr_to;
lbr->info = (lbr_fmt == LBR_FORMAT_INFO) ? x86_pmu.lbr_info : 0;
-
- return 0;
}
EXPORT_SYMBOL_GPL(x86_perf_get_lbr);

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index f6fc8dd51ef4..18c105571d11 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -542,12 +542,12 @@ static inline void perf_check_microcode(void) { }

#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, void *data);
-extern int x86_perf_get_lbr(struct x86_pmu_lbr *lbr);
+extern void x86_perf_get_lbr(struct x86_pmu_lbr *lbr);
#else
struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, void *data);
-static inline int x86_perf_get_lbr(struct x86_pmu_lbr *lbr)
+static inline void x86_perf_get_lbr(struct x86_pmu_lbr *lbr)
{
- return -1;
+ memset(lbr, 0, sizeof(*lbr));
}
#endif

--
2.38.0.rc1.362.ged0d419d3c-goog

2022-10-06 00:07:27

by Sean Christopherson

Subject: [PATCH v5 7/8] KVM: x86: Handle PERF_CAPABILITIES in common x86's kvm_get_msr_feature()

Handle PERF_CAPABILITIES directly in kvm_get_msr_feature() now that the
supported value is available in kvm_caps.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 3 ---
arch/x86/kvm/vmx/vmx.c | 3 ---
arch/x86/kvm/x86.c | 3 +++
3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6b680b249975..0d8935e7a943 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2714,9 +2714,6 @@ static int svm_get_msr_feature(struct kvm_msr_entry *msr)
if (boot_cpu_has(X86_FEATURE_LFENCE_RDTSC))
msr->data |= MSR_F10H_DECFG_LFENCE_SERIALIZE;
break;
- case MSR_IA32_PERF_CAPABILITIES:
- msr->data = kvm_caps.supported_perf_cap;
- return 0;
default:
return KVM_MSR_RET_INVALID;
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 850ff6e683d1..6ff832178e48 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1849,9 +1849,6 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
if (!nested)
return 1;
return vmx_get_vmx_msr(&vmcs_config.nested, msr->index, &msr->data);
- case MSR_IA32_PERF_CAPABILITIES:
- msr->data = kvm_caps.supported_perf_cap;
- return 0;
default:
return KVM_MSR_RET_INVALID;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b6a973d53f93..9443ddb358e6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1653,6 +1653,9 @@ static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
case MSR_IA32_ARCH_CAPABILITIES:
msr->data = kvm_get_arch_capabilities();
break;
+ case MSR_IA32_PERF_CAPABILITIES:
+ msr->data = kvm_caps.supported_perf_cap;
+ break;
case MSR_IA32_UCODE_REV:
rdmsrl_safe(msr->index, &msr->data);
break;
--
2.38.0.rc1.362.ged0d419d3c-goog

2022-10-06 00:07:48

by Sean Christopherson

Subject: [PATCH v5 6/8] KVM: x86: Init vcpu->arch.perf_capabilities in common x86 code

Initialize vcpu->arch.perf_capabilities in x86's kvm_arch_vcpu_create()
instead of deferring initialization to vendor code. For better or worse,
common x86 handles reads and writes to the MSR, and so common x86 should
also handle initializing the MSR.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 1 -
arch/x86/kvm/x86.c | 2 ++
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 24f4c22691f8..49343ee48062 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -631,7 +631,6 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
pmu->fixed_counters[i].current_config = 0;
}

- vcpu->arch.perf_capabilities = kvm_caps.supported_perf_cap;
lbr_desc->records.nr = 0;
lbr_desc->event = NULL;
lbr_desc->msr_passthrough = false;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4bd5f8a751de..b6a973d53f93 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11821,6 +11821,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
vcpu->arch.pat = MSR_IA32_CR_PAT_DEFAULT;

kvm_async_pf_hash_reset(vcpu);
+
+ vcpu->arch.perf_capabilities = kvm_caps.supported_perf_cap;
kvm_pmu_init(vcpu);

vcpu->arch.pending_external_vector = -1;
--
2.38.0.rc1.362.ged0d419d3c-goog

2022-10-06 00:18:40

by Sean Christopherson

Subject: [PATCH v5 8/8] KVM: x86: Directly query supported PERF_CAPABILITIES for WRMSR checks

Use kvm_caps.supported_perf_cap directly instead of bouncing through
kvm_get_msr_feature() when checking the incoming value for writes to
PERF_CAPABILITIES.

Note, kvm_get_msr_feature() is guaranteed to succeed when getting
PERF_CAPABILITIES, i.e. dropping that check is a nop.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9443ddb358e6..3afe5f4b1a40 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3568,20 +3568,15 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
vcpu->arch.arch_capabilities = data;
break;
- case MSR_IA32_PERF_CAPABILITIES: {
- struct kvm_msr_entry msr_ent = {.index = msr, .data = 0};
-
+ case MSR_IA32_PERF_CAPABILITIES:
if (!msr_info->host_initiated)
return 1;
- if (kvm_get_msr_feature(&msr_ent))
- return 1;
- if (data & ~msr_ent.data)
+ if (data & ~kvm_caps.supported_perf_cap)
return 1;

vcpu->arch.perf_capabilities = data;
kvm_pmu_refresh(vcpu);
return 0;
- }
case MSR_EFER:
return set_efer(vcpu, msr_info);
case MSR_K7_HWCR:
--
2.38.0.rc1.362.ged0d419d3c-goog

2022-10-06 00:32:45

by Sean Christopherson

Subject: [PATCH v5 2/8] KVM: VMX: Advertise PMU LBRs if and only if perf supports LBRs

Advertise LBR support to userspace via MSR_IA32_PERF_CAPABILITIES if and
only if perf fully supports LBRs. Perf may disable LBRs (by zeroing the
number of LBRs) even on platforms that allegedly support LBRs, e.g. if
probing any LBR MSRs during setup fails.

Fixes: be635e34c284 ("KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES")
Reported-by: Like Xu <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/capabilities.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 87c4e46daf37..a6689bf06542 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -400,6 +400,7 @@ static inline bool vmx_pebs_supported(void)
static inline u64 vmx_get_perf_capabilities(void)
{
u64 perf_cap = PMU_CAP_FW_WRITES;
+ struct x86_pmu_lbr lbr;
u64 host_perf_cap = 0;

if (!enable_pmu)
@@ -408,7 +409,9 @@ static inline u64 vmx_get_perf_capabilities(void)
if (boot_cpu_has(X86_FEATURE_PDCM))
rdmsrl(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);

- perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
+ x86_perf_get_lbr(&lbr);
+ if (lbr.nr)
+ perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;

if (vmx_pebs_supported()) {
perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK;
--
2.38.0.rc1.362.ged0d419d3c-goog

2022-10-06 00:35:35

by Sean Christopherson

Subject: [PATCH v5 5/8] KVM: x86: Track supported PERF_CAPABILITIES in kvm_caps

Track KVM's supported PERF_CAPABILITIES in kvm_caps instead of computing
the supported capabilities on the fly every time. Using kvm_caps will
also allow for future cleanups as the kvm_caps values can be used
directly in common x86 code.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 2 ++
arch/x86/kvm/vmx/capabilities.h | 25 ------------------------
arch/x86/kvm/vmx/pmu_intel.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 34 +++++++++++++++++++++++++++++----
arch/x86/kvm/x86.h | 1 +
5 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 58f0077d9357..6b680b249975 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2715,6 +2715,7 @@ static int svm_get_msr_feature(struct kvm_msr_entry *msr)
msr->data |= MSR_F10H_DECFG_LFENCE_SERIALIZE;
break;
case MSR_IA32_PERF_CAPABILITIES:
+ msr->data = kvm_caps.supported_perf_cap;
return 0;
default:
return KVM_MSR_RET_INVALID;
@@ -4898,6 +4899,7 @@ static __init void svm_set_cpu_caps(void)
{
kvm_set_cpu_caps();

+ kvm_caps.supported_perf_cap = 0;
kvm_caps.supported_xss = 0;

/* CPUID 0x80000001 and 0x8000000A (SVM features) */
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 479124e49bbd..cd2ac9536c99 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -395,31 +395,6 @@ static inline bool vmx_pebs_supported(void)
return boot_cpu_has(X86_FEATURE_PEBS) && kvm_pmu_cap.pebs_ept;
}

-static inline u64 vmx_get_perf_capabilities(void)
-{
- u64 perf_cap = PMU_CAP_FW_WRITES;
- struct x86_pmu_lbr lbr;
- u64 host_perf_cap = 0;
-
- if (!enable_pmu)
- return 0;
-
- if (boot_cpu_has(X86_FEATURE_PDCM))
- rdmsrl(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);
-
- x86_perf_get_lbr(&lbr);
- if (lbr.nr)
- perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
-
- if (vmx_pebs_supported()) {
- perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK;
- if ((perf_cap & PERF_CAP_PEBS_FORMAT) < 4)
- perf_cap &= ~PERF_CAP_PEBS_BASELINE;
- }
-
- return perf_cap;
-}
-
static inline bool cpu_has_notify_vmexit(void)
{
return vmcs_config.cpu_based_2nd_exec_ctrl &
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 25b70a85bef5..24f4c22691f8 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -631,7 +631,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
pmu->fixed_counters[i].current_config = 0;
}

- vcpu->arch.perf_capabilities = vmx_get_perf_capabilities();
+ vcpu->arch.perf_capabilities = kvm_caps.supported_perf_cap;
lbr_desc->records.nr = 0;
lbr_desc->event = NULL;
lbr_desc->msr_passthrough = false;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e70ac91cd2fb..850ff6e683d1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1850,7 +1850,7 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
return 1;
return vmx_get_vmx_msr(&vmcs_config.nested, msr->index, &msr->data);
case MSR_IA32_PERF_CAPABILITIES:
- msr->data = vmx_get_perf_capabilities();
+ msr->data = kvm_caps.supported_perf_cap;
return 0;
default:
return KVM_MSR_RET_INVALID;
@@ -2029,7 +2029,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
(host_initiated || guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT)))
debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT;

- if ((vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT) &&
+ if ((kvm_caps.supported_perf_cap & PMU_CAP_LBR_FMT) &&
(host_initiated || intel_pmu_lbr_is_enabled(vcpu)))
debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;

@@ -2342,14 +2342,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
if (data & PMU_CAP_LBR_FMT) {
if ((data & PMU_CAP_LBR_FMT) !=
- (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT))
+ (kvm_caps.supported_perf_cap & PMU_CAP_LBR_FMT))
return 1;
if (!cpuid_model_is_consistent(vcpu))
return 1;
}
if (data & PERF_CAP_PEBS_FORMAT) {
if ((data & PERF_CAP_PEBS_MASK) !=
- (vmx_get_perf_capabilities() & PERF_CAP_PEBS_MASK))
+ (kvm_caps.supported_perf_cap & PERF_CAP_PEBS_MASK))
return 1;
if (!guest_cpuid_has(vcpu, X86_FEATURE_DS))
return 1;
@@ -7669,6 +7669,31 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
vmx_update_exception_bitmap(vcpu);
}

+static u64 vmx_get_perf_capabilities(void)
+{
+ u64 perf_cap = PMU_CAP_FW_WRITES;
+ struct x86_pmu_lbr lbr;
+ u64 host_perf_cap = 0;
+
+ if (!enable_pmu)
+ return 0;
+
+ if (boot_cpu_has(X86_FEATURE_PDCM))
+ rdmsrl(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);
+
+ x86_perf_get_lbr(&lbr);
+ if (lbr.nr)
+ perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
+
+ if (vmx_pebs_supported()) {
+ perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK;
+ if ((perf_cap & PERF_CAP_PEBS_FORMAT) < 4)
+ perf_cap &= ~PERF_CAP_PEBS_BASELINE;
+ }
+
+ return perf_cap;
+}
+
static __init void vmx_set_cpu_caps(void)
{
kvm_set_cpu_caps();
@@ -7691,6 +7716,7 @@ static __init void vmx_set_cpu_caps(void)

if (!enable_pmu)
kvm_cpu_cap_clear(X86_FEATURE_PDCM);
+ kvm_caps.supported_perf_cap = vmx_get_perf_capabilities();

if (!enable_sgx) {
kvm_cpu_cap_clear(X86_FEATURE_SGX);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 829d3134c1eb..9de72586f406 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -27,6 +27,7 @@ struct kvm_caps {
u64 supported_mce_cap;
u64 supported_xcr0;
u64 supported_xss;
+ u64 supported_perf_cap;
};

void kvm_spurious_fault(void);
--
2.38.0.rc1.362.ged0d419d3c-goog

2022-10-14 09:42:37

by Like Xu

Subject: Re: [PATCH v5 5/8] KVM: x86: Track supported PERF_CAPABILITIES in kvm_caps

On 6/10/2022 8:03 am, Sean Christopherson wrote:
> Track KVM's supported PERF_CAPABILITIES in kvm_caps instead of computing
> the supported capabilities on the fly every time. Using kvm_caps will
> also allow for future cleanups as the kvm_caps values can be used
> directly in common x86 code.
>
> Signed-off-by: Sean Christopherson <[email protected]>

A similar change was made to one of my git branches. Please move forward.

Acked-by: Like Xu <[email protected]>

> ---
> arch/x86/kvm/svm/svm.c | 2 ++
> arch/x86/kvm/vmx/capabilities.h | 25 ------------------------
> arch/x86/kvm/vmx/pmu_intel.c | 2 +-
> arch/x86/kvm/vmx/vmx.c | 34 +++++++++++++++++++++++++++++----
> arch/x86/kvm/x86.h | 1 +
> 5 files changed, 34 insertions(+), 30 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 58f0077d9357..6b680b249975 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2715,6 +2715,7 @@ static int svm_get_msr_feature(struct kvm_msr_entry *msr)
> msr->data |= MSR_F10H_DECFG_LFENCE_SERIALIZE;
> break;
> case MSR_IA32_PERF_CAPABILITIES:
> + msr->data = kvm_caps.supported_perf_cap;
> return 0;
> default:
> return KVM_MSR_RET_INVALID;
> @@ -4898,6 +4899,7 @@ static __init void svm_set_cpu_caps(void)
> {
> kvm_set_cpu_caps();
>
> + kvm_caps.supported_perf_cap = 0;
> kvm_caps.supported_xss = 0;
>
> /* CPUID 0x80000001 and 0x8000000A (SVM features) */
> diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
> index 479124e49bbd..cd2ac9536c99 100644
> --- a/arch/x86/kvm/vmx/capabilities.h
> +++ b/arch/x86/kvm/vmx/capabilities.h
> @@ -395,31 +395,6 @@ static inline bool vmx_pebs_supported(void)
> return boot_cpu_has(X86_FEATURE_PEBS) && kvm_pmu_cap.pebs_ept;
> }
>
> -static inline u64 vmx_get_perf_capabilities(void)
> -{
> - u64 perf_cap = PMU_CAP_FW_WRITES;
> - struct x86_pmu_lbr lbr;
> - u64 host_perf_cap = 0;
> -
> - if (!enable_pmu)
> - return 0;
> -
> - if (boot_cpu_has(X86_FEATURE_PDCM))
> - rdmsrl(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);
> -
> - x86_perf_get_lbr(&lbr);
> - if (lbr.nr)
> - perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
> -
> - if (vmx_pebs_supported()) {
> - perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK;
> - if ((perf_cap & PERF_CAP_PEBS_FORMAT) < 4)
> - perf_cap &= ~PERF_CAP_PEBS_BASELINE;
> - }
> -
> - return perf_cap;
> -}
> -
> static inline bool cpu_has_notify_vmexit(void)
> {
> return vmcs_config.cpu_based_2nd_exec_ctrl &
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 25b70a85bef5..24f4c22691f8 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -631,7 +631,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
> pmu->fixed_counters[i].current_config = 0;
> }
>
> - vcpu->arch.perf_capabilities = vmx_get_perf_capabilities();
> + vcpu->arch.perf_capabilities = kvm_caps.supported_perf_cap;
> lbr_desc->records.nr = 0;
> lbr_desc->event = NULL;
> lbr_desc->msr_passthrough = false;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e70ac91cd2fb..850ff6e683d1 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1850,7 +1850,7 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> return 1;
> return vmx_get_vmx_msr(&vmcs_config.nested, msr->index, &msr->data);
> case MSR_IA32_PERF_CAPABILITIES:
> - msr->data = vmx_get_perf_capabilities();
> + msr->data = kvm_caps.supported_perf_cap;
> return 0;
> default:
> return KVM_MSR_RET_INVALID;
> @@ -2029,7 +2029,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
> (host_initiated || guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT)))
> debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT;
>
> - if ((vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT) &&
> + if ((kvm_caps.supported_perf_cap & PMU_CAP_LBR_FMT) &&
> (host_initiated || intel_pmu_lbr_is_enabled(vcpu)))
> debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
>
> @@ -2342,14 +2342,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> return 1;
> if (data & PMU_CAP_LBR_FMT) {
> if ((data & PMU_CAP_LBR_FMT) !=
> - (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT))
> + (kvm_caps.supported_perf_cap & PMU_CAP_LBR_FMT))
> return 1;
> if (!cpuid_model_is_consistent(vcpu))
> return 1;
> }
> if (data & PERF_CAP_PEBS_FORMAT) {
> if ((data & PERF_CAP_PEBS_MASK) !=
> - (vmx_get_perf_capabilities() & PERF_CAP_PEBS_MASK))
> + (kvm_caps.supported_perf_cap & PERF_CAP_PEBS_MASK))
> return 1;
> if (!guest_cpuid_has(vcpu, X86_FEATURE_DS))
> return 1;
> @@ -7669,6 +7669,31 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> vmx_update_exception_bitmap(vcpu);
> }
>
> +static u64 vmx_get_perf_capabilities(void)
> +{
> + u64 perf_cap = PMU_CAP_FW_WRITES;
> + struct x86_pmu_lbr lbr;
> + u64 host_perf_cap = 0;
> +
> + if (!enable_pmu)
> + return 0;
> +
> + if (boot_cpu_has(X86_FEATURE_PDCM))
> + rdmsrl(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);
> +
> + x86_perf_get_lbr(&lbr);
> + if (lbr.nr)
> + perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
> +
> + if (vmx_pebs_supported()) {
> + perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK;
> + if ((perf_cap & PERF_CAP_PEBS_FORMAT) < 4)
> + perf_cap &= ~PERF_CAP_PEBS_BASELINE;
> + }
> +
> + return perf_cap;
> +}
> +
> static __init void vmx_set_cpu_caps(void)
> {
> kvm_set_cpu_caps();
> @@ -7691,6 +7716,7 @@ static __init void vmx_set_cpu_caps(void)
>
> if (!enable_pmu)
> kvm_cpu_cap_clear(X86_FEATURE_PDCM);
> + kvm_caps.supported_perf_cap = vmx_get_perf_capabilities();
>
> if (!enable_sgx) {
> kvm_cpu_cap_clear(X86_FEATURE_SGX);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 829d3134c1eb..9de72586f406 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -27,6 +27,7 @@ struct kvm_caps {
> u64 supported_mce_cap;
> u64 supported_xcr0;
> u64 supported_xss;
> + u64 supported_perf_cap;
> };
>
> void kvm_spurious_fault(void);

2022-11-02 18:06:31

by Paolo Bonzini

Subject: Re: [PATCH v5 0/8] KVM: x86: Intel LBR related perf cleanups

On 10/6/22 02:03, Sean Christopherson wrote:
> PeterZ, I dropped your ACK from v4 because the perf patches were completely
> broken.
>
> Fix a bug where KVM incorrectly advertises PMU_CAP_LBR_FMT to userspace if
> perf has disabled LBRs, e.g. because probing one or more LBR MSRs during
> setup hit a #GP.
>
> The non-KVM patch cleans up a KVM-specific perf API to fix a benign bug
> where KVM ignores the error return from the API.
>
> The remaining patches clean up KVM's PERF_CAPABILITIES mess, which makes
> everything far more complex than it needs to be by computing the supported
> capabilities on the fly instead of tracking them in kvm_caps.
>
> v5:
> - Drop perf patches that removed stubs. The stubs are sadly necessary
> when CPU_SUP_INTEL=n && KVM_INTEL={m,y}, which is possible due to
> KVM_INTEL effectively depending on INTEL || CENTAUR || ZHAOXIN.
> [hint provided by kernel test robot].
> - Add a patch to ignore guest CPUID on host userspace MSR writes.
> - Add supported PERF_CAPABILITIES to kvm_caps to simplify code for all
> parties.
>
> v4:
> - https://lore.kernel.org/all/[email protected]
> - Make vmx_get_perf_capabilities() non-inline to avoid references to
> x86_perf_get_lbr() when CPU_SUP_INTEL=n. [kernel test robot]
>
> v3:
> - https://lore.kernel.org/all/[email protected]
> - Drop patches for bug #1 (already merged).
> - Drop misguided "clean up the capability check" patch. [Like]
>
> v2:
> - https://lore.kernel.org/all/[email protected]
> - Add patches to fix bug #2. [Like]
> - Add a patch to clean up the capability check.
> - Tweak the changelog for the PMU refresh bug fix to call out that
> KVM should disallow changing feature MSRs after KVM_RUN. [Like]
>
> v1: https://lore.kernel.org/all/[email protected]
>
> Sean Christopherson (8):
> perf/x86/core: Zero @lbr instead of returning -1 in x86_perf_get_lbr()
> stub
> KVM: VMX: Advertise PMU LBRs if and only if perf supports LBRs
> KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()
> KVM: VMX: Ignore guest CPUID for host userspace writes to DEBUGCTL
> KVM: x86: Track supported PERF_CAPABILITIES in kvm_caps
> KVM: x86: Init vcpu->arch.perf_capabilities in common x86 code
> KVM: x86: Handle PERF_CAPABILITIES in common x86's
> kvm_get_msr_feature()
> KVM: x86: Directly query supported PERF_CAPABILITIES for WRMSR checks
>
> arch/x86/events/intel/lbr.c | 6 +---
> arch/x86/include/asm/perf_event.h | 6 ++--
> arch/x86/kvm/svm/svm.c | 3 +-
> arch/x86/kvm/vmx/capabilities.h | 37 ----------------------
> arch/x86/kvm/vmx/pmu_intel.c | 1 -
> arch/x86/kvm/vmx/vmx.c | 51 +++++++++++++++++++++++--------
> arch/x86/kvm/x86.c | 14 ++++-----
> arch/x86/kvm/x86.h | 1 +
> 8 files changed, 52 insertions(+), 67 deletions(-)
>
>
> base-commit: e18d6152ff0f41b7f01f9817372022df04e0d354

Queued, with patches 2-4 destined to kvm/master.

Paolo