2022-09-22 05:29:38

by Like Xu

[permalink] [raw]
Subject: [PATCH v2 2/2] KVM: x86/pmu: Add PEBS support for SPR and future non-hybird models

From: Like Xu <[email protected]>

Virtualization support for SPR PEBS has officially available in the
Intel SDM (June 2022) and has been validated on late stepping machines:

Compared to Ice Lake Server, the PDIR counter available (Fixed 0) on SPR
is unchanged, but the capability is enhanced to Instruction-Accurate PDIR
(PDIR++), where PEBS is taken on the next instruction after the one that
caused the overflow. Also, it introduces a new Precise Distribution (PDist)
facility that eliminates the skid when a precise event is programmed
on general programmable counter 0.

For guest usage, KVM will require the max precise level in both cases
mentioned above (other conditions may apply later), requesting the
correct hardware counter (PRIR++ or PDist) from host perf as usual.

Signed-off-by: Like Xu <[email protected]>
---
Previous:
https://lore.kernel.org/kvm/[email protected]/

V1 -> V2 Changelog:
- move the check into a function; (Kan)

arch/x86/kvm/pmu.c | 25 ++++++++++++++++++++++---
arch/x86/kvm/vmx/capabilities.h | 4 +++-
2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 02f9e4f245bd..5e9b0b3ea42d 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -28,9 +28,18 @@
struct x86_pmu_capability __read_mostly kvm_pmu_cap;
EXPORT_SYMBOL_GPL(kvm_pmu_cap);

-static const struct x86_cpu_id vmx_icl_pebs_cpu[] = {
+/* Precise Distribution of Instructions Retired (PDIR) */
+static const struct x86_cpu_id vmx_pebs_pdir_cpu[] = {
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D, NULL),
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, NULL),
+ /* Instruction-Accurate PDIR (PDIR++) */
+ X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, NULL),
+ {}
+};
+
+/* Precise Distribution (PDist) */
+static const struct x86_cpu_id vmx_pebs_pdist_cpu[] = {
+ X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, NULL),
{}
};

@@ -140,6 +149,16 @@ static void kvm_perf_overflow(struct perf_event *perf_event,
__kvm_perf_overflow(pmc, true);
}

+static bool need_max_precise(struct kvm_pmc *pmc)
+{
+ if (pmc->idx == 0 && x86_match_cpu(vmx_pebs_pdist_cpu))
+ return true;
+ if (pmc->idx == 32 && x86_match_cpu(vmx_pebs_pdir_cpu))
+ return true;
+
+ return false;
+}
+
static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
u64 config, bool exclude_user,
bool exclude_kernel, bool intr)
@@ -181,11 +200,11 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
* the accuracy of the PEBS profiling result, because the "event IP"
* in the PEBS record is calibrated on the guest side.
*
- * On Icelake everything is fine. Other hardware (GLC+, TNT+) that
+ * On Icelake everything is fine. Other hardware (TNT+) that
* could possibly care here is unsupported and needs changes.
*/
attr.precise_ip = 1;
- if (x86_match_cpu(vmx_icl_pebs_cpu) && pmc->idx == 32)
+ if (need_max_precise(pmc))
attr.precise_ip = 3;
}

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index c5e5dfef69c7..4dc4bbe18821 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -398,7 +398,9 @@ static inline bool vmx_pt_mode_is_host_guest(void)

static inline bool vmx_pebs_supported(void)
{
- return boot_cpu_has(X86_FEATURE_PEBS) && kvm_pmu_cap.pebs_ept;
+ return boot_cpu_has(X86_FEATURE_PEBS) &&
+ !boot_cpu_has(X86_FEATURE_HYBRID_CPU) &&
+ kvm_pmu_cap.pebs_ept;
}

static inline u64 vmx_get_perf_capabilities(void)
--
2.37.3


2022-10-11 19:41:20

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] KVM: x86/pmu: Add PEBS support for SPR and future non-hybird models

Kind of a nit, but I would prefer a shortlog that talks about the pdit/pdir
stuff and not a "enable PEBS" bucket.

On Thu, Sep 22, 2022, Like Xu wrote:
> @@ -140,6 +149,16 @@ static void kvm_perf_overflow(struct perf_event *perf_event,
> __kvm_perf_overflow(pmc, true);
> }
>
> +static bool need_max_precise(struct kvm_pmc *pmc)
> +{
> + if (pmc->idx == 0 && x86_match_cpu(vmx_pebs_pdist_cpu))
> + return true;
> + if (pmc->idx == 32 && x86_match_cpu(vmx_pebs_pdir_cpu))
> + return true;
> +
> + return false;
> +}
> +
> static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
> u64 config, bool exclude_user,
> bool exclude_kernel, bool intr)
> @@ -181,11 +200,11 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
> * the accuracy of the PEBS profiling result, because the "event IP"
> * in the PEBS record is calibrated on the guest side.
> *
> - * On Icelake everything is fine. Other hardware (GLC+, TNT+) that
> + * On Icelake everything is fine. Other hardware (TNT+) that
> * could possibly care here is unsupported and needs changes.

This part of the comment is still somewhat stale, and for me at least it's not at
all helpful.

> */
> attr.precise_ip = 1;
> - if (x86_match_cpu(vmx_icl_pebs_cpu) && pmc->idx == 32)
> + if (need_max_precise(pmc))
> attr.precise_ip = 3;

What about writing this as:

attr.precise_ip = pmc_get_pebs_precision(pmc);

(or whatever name is appropriate for "pebs_precision").

Then in the helper, add comments to explaint the magic numbers and the interaction
with PDIST and PDIR. Bonus points if #defines for the the magic numbers can be
added somewher

* 0 - SAMPLE_IP can have arbitrary skid
* 1 - SAMPLE_IP must have constant skid
* 2 - SAMPLE_IP requested to have 0 skid
* 3 - SAMPLE_IP must have 0 skid

static u64 pmc_get_pebs_precision(struct kvm_pmc *pmc)
{
/* Comment that explains why PDIST/PDIR require 0 skid? */
if ((pmc->idx == 0 && x86_match_cpu(vmx_pebs_pdist_cpu)) ||
(pmc->idx == 32 && x86_match_cpu(vmx_pebs_pdir_cpu)))
return 3;

/* Comment about constant skid? */
return 1;
}


> }
>
> diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
> index c5e5dfef69c7..4dc4bbe18821 100644
> --- a/arch/x86/kvm/vmx/capabilities.h
> +++ b/arch/x86/kvm/vmx/capabilities.h
> @@ -398,7 +398,9 @@ static inline bool vmx_pt_mode_is_host_guest(void)
>
> static inline bool vmx_pebs_supported(void)
> {
> - return boot_cpu_has(X86_FEATURE_PEBS) && kvm_pmu_cap.pebs_ept;
> + return boot_cpu_has(X86_FEATURE_PEBS) &&
> + !boot_cpu_has(X86_FEATURE_HYBRID_CPU) &&

This belongs in a separate patch, and it should be ordered before patch 1 so that
there's no window where KVM can see pebs_ept==1 on a hybrid CPU.

Actually, shouldn't everything in this patch land before core enabling?

> + kvm_pmu_cap.pebs_ept;

Please align indentation:

return boot_cpu_has(X86_FEATURE_PEBS) &&
!boot_cpu_has(X86_FEATURE_HYBRID_CPU) &&
kvm_pmu_cap.pebs_ept;

2022-10-14 08:51:37

by Like Xu

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] KVM: x86/pmu: Add PEBS support for SPR and future non-hybird models

On 12/10/2022 3:09 am, Sean Christopherson wrote:
> Kind of a nit, but I would prefer a shortlog that talks about the pdit/pdir
> stuff and not a "enable PEBS" bucket.
>
> On Thu, Sep 22, 2022, Like Xu wrote:
>> @@ -140,6 +149,16 @@ static void kvm_perf_overflow(struct perf_event *perf_event,
>> __kvm_perf_overflow(pmc, true);
>> }
>>
>> +static bool need_max_precise(struct kvm_pmc *pmc)
>> +{
>> + if (pmc->idx == 0 && x86_match_cpu(vmx_pebs_pdist_cpu))
>> + return true;
>> + if (pmc->idx == 32 && x86_match_cpu(vmx_pebs_pdir_cpu))
>> + return true;
>> +
>> + return false;
>> +}
>> +
>> static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
>> u64 config, bool exclude_user,
>> bool exclude_kernel, bool intr)
>> @@ -181,11 +200,11 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
>> * the accuracy of the PEBS profiling result, because the "event IP"
>> * in the PEBS record is calibrated on the guest side.
>> *
>> - * On Icelake everything is fine. Other hardware (GLC+, TNT+) that
>> + * On Icelake everything is fine. Other hardware (TNT+) that
>> * could possibly care here is unsupported and needs changes.
>
> This part of the comment is still somewhat stale, and for me at least it's not at
> all helpful.

SPR has GLC core, and this patch adds this support and removes the unsupported
statements.
Not sure how else I may help you.

>
>> */
>> attr.precise_ip = 1;
>> - if (x86_match_cpu(vmx_icl_pebs_cpu) && pmc->idx == 32)
>> + if (need_max_precise(pmc))
>> attr.precise_ip = 3;
>
> What about writing this as:
>
> attr.precise_ip = pmc_get_pebs_precision(pmc);
>
> (or whatever name is appropriate for "pebs_precision").

The comment says, "the difference in the software precision levels of guest and
host PEBS events will not affect the accuracy of the PEBS profiling result".

>
> Then in the helper, add comments to explaint the magic numbers and the interaction
> with PDIST and PDIR. Bonus points if #defines for the the magic numbers can be
> added somewher

KVM just uses "precision_ip = 3" to request hw pdit/pdir counters from host perf,
It only works at 3. For 1 or 2, there is no difference.

>
> * 0 - SAMPLE_IP can have arbitrary skid
> * 1 - SAMPLE_IP must have constant skid
> * 2 - SAMPLE_IP requested to have 0 skid
> * 3 - SAMPLE_IP must have 0 skid
>
> static u64 pmc_get_pebs_precision(struct kvm_pmc *pmc)
> {
> /* Comment that explains why PDIST/PDIR require 0 skid? */
> if ((pmc->idx == 0 && x86_match_cpu(vmx_pebs_pdist_cpu)) ||
> (pmc->idx == 32 && x86_match_cpu(vmx_pebs_pdir_cpu)))
> return 3;
>
> /* Comment about constant skid? */
> return 1;
> }
, therefore 0 or constant skid makes no difference.

>
>
>> }
>>
>> diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
>> index c5e5dfef69c7..4dc4bbe18821 100644
>> --- a/arch/x86/kvm/vmx/capabilities.h
>> +++ b/arch/x86/kvm/vmx/capabilities.h
>> @@ -398,7 +398,9 @@ static inline bool vmx_pt_mode_is_host_guest(void)
>>
>> static inline bool vmx_pebs_supported(void)
>> {
>> - return boot_cpu_has(X86_FEATURE_PEBS) && kvm_pmu_cap.pebs_ept;
>> + return boot_cpu_has(X86_FEATURE_PEBS) &&
>> + !boot_cpu_has(X86_FEATURE_HYBRID_CPU) &&
>
> This belongs in a separate patch, and it should be ordered before patch 1 so that
> there's no window where KVM can see pebs_ept==1 on a hybrid CPU.
>
> Actually, shouldn't everything in this patch land before core enabling?

Perf core enabling about SPR is already upstreamed.
KVM support is not too late either, thanks to SPR's relentless delays.

>
>> + kvm_pmu_cap.pebs_ept;
>
> Please align indentation:
>
> return boot_cpu_has(X86_FEATURE_PEBS) &&
> !boot_cpu_has(X86_FEATURE_HYBRID_CPU) &&
> kvm_pmu_cap.pebs_ept;
>

2022-10-14 16:59:42

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] KVM: x86/pmu: Add PEBS support for SPR and future non-hybird models

On Fri, Oct 14, 2022, Like Xu wrote:
> On 12/10/2022 3:09 am, Sean Christopherson wrote:
> > > @@ -181,11 +200,11 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
> > > * the accuracy of the PEBS profiling result, because the "event IP"
> > > * in the PEBS record is calibrated on the guest side.
> > > *
> > > - * On Icelake everything is fine. Other hardware (GLC+, TNT+) that
> > > + * On Icelake everything is fine. Other hardware (TNT+) that
> > > * could possibly care here is unsupported and needs changes.
> >
> > This part of the comment is still somewhat stale, and for me at least it's not at
> > all helpful.
>
> SPR has GLC core, and this patch adds this support and removes the
> unsupported statements.

It's stale in the sense that it says "Icelake is fine", which implies that SPR is
NOT fine.

> Not sure how else I may help you.

Write comments that don't require the reader to have in-depth knowledge of the PMU's
magic numbers. Explain what "unsupported" means. Explain what changes might need
to be made. Use search-friendly words, e.g. I assume TNT means Tremont, but Googling
for "Intel TNT" doesn't yield anything useful.

The above comment essentially says "Icelake is good, Tremont is broken", but that
doesn't help the reader understand what this code actually does.

> > > attr.precise_ip = 1;
> > > - if (x86_match_cpu(vmx_icl_pebs_cpu) && pmc->idx == 32)
> > > + if (need_max_precise(pmc))
> > > attr.precise_ip = 3;
> >
> > What about writing this as:
> >
> > attr.precise_ip = pmc_get_pebs_precision(pmc);
> >
> > (or whatever name is appropriate for "pebs_precision").
>
> The comment says, "the difference in the software precision levels of guest and
> host PEBS events will not affect the accuracy of the PEBS profiling result".
>
> >
> > Then in the helper, add comments to explaint the magic numbers and the interaction
> > with PDIST and PDIR. Bonus points if #defines for the the magic numbers can be
> > added somewher
>
> KVM just uses "precision_ip = 3" to request hw pdit/pdir counters from host perf,
> It only works at 3. For 1 or 2, there is no difference.

What only works at 3? Are 0..3 architectural PMU values, or are they arbitrary
numbers perf uses?

> >
> > * 0 - SAMPLE_IP can have arbitrary skid
> > * 1 - SAMPLE_IP must have constant skid
> > * 2 - SAMPLE_IP requested to have 0 skid
> > * 3 - SAMPLE_IP must have 0 skid
> >
> > static u64 pmc_get_pebs_precision(struct kvm_pmc *pmc)
> > {
> > /* Comment that explains why PDIST/PDIR require 0 skid? */
> > if ((pmc->idx == 0 && x86_match_cpu(vmx_pebs_pdist_cpu)) ||
> > (pmc->idx == 32 && x86_match_cpu(vmx_pebs_pdir_cpu)))
> > return 3;
> >
> > /* Comment about constant skid? */
> > return 1;
> > }
> , therefore 0 or constant skid makes no difference.

Then why '1'?

> > > diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
> > > index c5e5dfef69c7..4dc4bbe18821 100644
> > > --- a/arch/x86/kvm/vmx/capabilities.h
> > > +++ b/arch/x86/kvm/vmx/capabilities.h
> > > @@ -398,7 +398,9 @@ static inline bool vmx_pt_mode_is_host_guest(void)
> > > static inline bool vmx_pebs_supported(void)
> > > {
> > > - return boot_cpu_has(X86_FEATURE_PEBS) && kvm_pmu_cap.pebs_ept;
> > > + return boot_cpu_has(X86_FEATURE_PEBS) &&
> > > + !boot_cpu_has(X86_FEATURE_HYBRID_CPU) &&
> >
> > This belongs in a separate patch, and it should be ordered before patch 1 so that
> > there's no window where KVM can see pebs_ept==1 on a hybrid CPU.
> >
> > Actually, shouldn't everything in this patch land before core enabling?
>
> Perf core enabling about SPR is already upstreamed.

I'm referring to patch 1, "perf/x86/intel: Expose EPT-friendly PEBS for SPR and
future models". Before that patch, x86_pmu.pebs_ept will be set only for Icelake,
i.e. PEBS for KVM is effecitively not enabled in the core perf code until patch 1.
KVM should get all its support lined up, and then enable PEBS support for SPR and
beyond in core perf. E.g. if someone applied only patch 1, they would be able to
use KVM PEBS on any CPU with PEBS v5, even if it were a hybrid CPU.