2023-03-22 09:37:23

by Like Xu

Subject: [PATCH v2] KVM: x86/pmu: Fix emulation on Intel counters' bit width

From: Like Xu <[email protected]>

Per the Intel SDM, the bit width of a PMU counter is specified via CPUID
only if the vCPU has FW_WRITE[bit 13] set in IA32_PERF_CAPABILITIES.
When the FW_WRITE bit is not set, only EAX is valid and writes to
out-of-bounds bits do not generate #GP. Conversely, when this bit is set,
writes to out-of-bounds bits of the fixed counters will also generate #GP.
The vPMU currently does not support emulating bit widths lower than 32
bits or higher than the host's capability.

Signed-off-by: Like Xu <[email protected]>
---
Previous:
https://lore.kernel.org/kvm/[email protected]/

V1 -> V2 Changelog:
- Apply #GP rule to fixed counters when guest has FW_WRITE;
- Apply sign-extension rule to fixed counters when guest doesn't have FW_WRITE;
- Counters' bit width set by CPUID cannot be less than 32 bits;

arch/x86/kvm/vmx/pmu_intel.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index e8a3be0b9df9..d38b820d6b9e 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -470,6 +470,12 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			pmc_update_sample_period(pmc);
 			return 0;
 		} else if ((pmc = get_fixed_pmc(pmu, msr))) {
+			if (fw_writes_is_enabled(vcpu)) {
+				if (data & ~pmu->counter_bitmask[KVM_PMC_FIXED])
+					return 1;
+			} else if (!msr_info->host_initiated) {
+				data = (s64)(s32)data;
+			}
 			pmc->counter += data - pmc_read_counter(pmc);
 			pmc_update_sample_period(pmc);
 			return 0;
@@ -516,6 +522,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	union cpuid10_edx edx;
 	u64 perf_capabilities;
 	u64 counter_mask;
+	bool fw_wr = fw_writes_is_enabled(vcpu);
 	int i;
 
 	pmu->nr_arch_gp_counters = 0;
@@ -543,6 +550,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 
 	pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
 					 kvm_pmu_cap.num_counters_gp);
+	eax.split.bit_width = fw_wr ? max_t(int, 32, eax.split.bit_width) : 32;
 	eax.split.bit_width = min_t(int, eax.split.bit_width,
 				    kvm_pmu_cap.bit_width_gp);
 	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << eax.split.bit_width) - 1;
@@ -558,6 +566,8 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 			min3(ARRAY_SIZE(fixed_pmc_events),
 			     (size_t) edx.split.num_counters_fixed,
 			     (size_t)kvm_pmu_cap.num_counters_fixed);
+		edx.split.bit_width_fixed = fw_wr ?
+			max_t(int, 32, edx.split.bit_width_fixed) : 32;
 		edx.split.bit_width_fixed = min_t(int, edx.split.bit_width_fixed,
 						  kvm_pmu_cap.bit_width_fixed);
 		pmu->counter_bitmask[KVM_PMC_FIXED] =

base-commit: d8708b80fa0e6e21bc0c9e7276ad0bccef73b6e7
--
2.40.0
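To make the rules in the first hunk concrete, here is a minimal standalone model of the fixed-counter WRMSR path as the patch changes it. It is a sketch, not KVM code: the struct and the helpers model_bitmask(), sign_extend_32() and model_set_fixed() are hypothetical, and storing `data & bitmask` stands in for KVM's pmc->counter / pmc_read_counter() bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a fixed counter: a value plus its width mask. */
struct model_fixed_pmc {
	uint64_t counter;  /* value a guest read would observe */
	uint64_t bitmask;  /* (1 << bit_width_fixed) - 1 */
};

/* Build the counter mask; shifting a 64-bit 1 by 64 would be undefined. */
static uint64_t model_bitmask(unsigned int width)
{
	assert(width >= 1 && width < 64);
	return (UINT64_C(1) << width) - 1;
}

/* Same result as the kernel's (s64)(s32)data: sign-extend bits 31:0. */
static uint64_t sign_extend_32(uint64_t data)
{
	uint64_t lo = data & UINT64_C(0xFFFFFFFF);

	return (lo & UINT64_C(0x80000000)) ? (lo | UINT64_C(0xFFFFFFFF00000000)) : lo;
}

/*
 * Mirrors the branch structure the patch adds for fixed counters.
 * Returns 1 where KVM would fail the WRMSR (#GP), otherwise 0.
 */
static int model_set_fixed(struct model_fixed_pmc *pmc, uint64_t data,
			   bool fw_writes, bool host_initiated)
{
	if (fw_writes) {
		/* FW_WRITE: any bit above the counter width is reserved. */
		if (data & ~pmc->bitmask)
			return 1;
	} else if (!host_initiated) {
		/* No FW_WRITE, guest write: only EAX counts, sign-extended. */
		data = sign_extend_32(data);
	}
	/* Simplification: keep only the implemented bits. */
	pmc->counter = data & pmc->bitmask;
	return 0;
}
```

With a 48-bit width, a guest with FW_WRITE that sets bit 48 gets a failed write, while a guest without FW_WRITE writing 0xFFFFFFFF ends up with a counter of 0xFFFFFFFFFFFF; host-initiated writes without FW_WRITE skip the sign extension.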


2023-03-27 14:33:17

by Paolo Bonzini

Subject: Re: [PATCH v2] KVM: x86/pmu: Fix emulation on Intel counters' bit width

On Wed, Mar 22, 2023 at 10:31 AM Like Xu <[email protected]> wrote:
>
> From: Like Xu <[email protected]>
>
> Per the Intel SDM, the bit width of a PMU counter is specified via CPUID
> only if the vCPU has FW_WRITE[bit 13] set in IA32_PERF_CAPABILITIES.
> When the FW_WRITE bit is not set, only EAX is valid and writes to
> out-of-bounds bits do not generate #GP. Conversely, when this bit is set,
> writes to out-of-bounds bits of the fixed counters will also generate #GP.
> The vPMU currently does not support emulating bit widths lower than 32
> bits or higher than the host's capability.

Can you please point out the date and paragraph of the SDM?

Paolo

2023-03-28 09:18:18

by Like Xu

Subject: Re: [PATCH v2] KVM: x86/pmu: Fix emulation on Intel counters' bit width

On 27/3/2023 10:30 pm, Paolo Bonzini wrote:
> On Wed, Mar 22, 2023 at 10:31 AM Like Xu <[email protected]> wrote:
>>
>> From: Like Xu <[email protected]>
>>
>> Per the Intel SDM, the bit width of a PMU counter is specified via CPUID
>> only if the vCPU has FW_WRITE[bit 13] set in IA32_PERF_CAPABILITIES.
>> When the FW_WRITE bit is not set, only EAX is valid and writes to
>> out-of-bounds bits do not generate #GP. Conversely, when this bit is set,
>> writes to out-of-bounds bits of the fixed counters will also generate #GP.
>> The vPMU currently does not support emulating bit widths lower than 32
>> bits or higher than the host's capability.
>
> Can you please point out the date and paragraph of the SDM?
>
> Paolo
>

25462-078US, December 2022
20.2.6 Full-Width Writes to Performance Counter Registers

The general-purpose performance counter registers IA32_PMCx are writable via
WRMSR instruction. However, the value written into IA32_PMCx by WRMSR is the
signed extended 64-bit value of the EAX[31:0] input of WRMSR.

A processor that supports full-width writes to the general-purpose performance
counters enumerated by CPUID.0AH:EAX[15:8] will set IA32_PERF_CAPABILITIES[13]
to enumerate its full-width-write capability See Figure 20-65.

If IA32_PERF_CAPABILITIES.FW_WRITE[bit 13] =1, each IA32_PMCi is accompanied by a
corresponding alias address starting at 4C1H for IA32_A_PMC0.

The bit width of the performance monitoring counters is specified in
CPUID.0AH:EAX[23:16]. If IA32_A_PMCi is present, the 64-bit input value
(EDX:EAX) of WRMSR to IA32_A_PMCi will cause IA32_PMCi to be updated by:

    COUNTERWIDTH =
        CPUID.0AH:EAX[23:16] bit width of the performance monitoring counter
    IA32_PMCi[COUNTERWIDTH-1:32] := EDX[COUNTERWIDTH-33:0]);
    IA32_PMCi[31:0] := EAX[31:0];
    EDX[63:COUNTERWIDTH] are reserved
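The SDM update rule quoted above can be sketched in C as two write paths for one counter. This assumes, as KVM already does for IA32_A_PMCi, that setting a reserved bit through the alias raises #GP; width_mask(), legacy_write() and full_width_write() are illustrative names, and widths are restricted to 32..63 so that the pseudocode's [COUNTERWIDTH-1:32] range is meaningful.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Implemented-bits mask for a counter of the given width (32..63 here). */
static uint64_t width_mask(unsigned int width)
{
	assert(width >= 32 && width < 64);
	return (UINT64_C(1) << width) - 1;
}

/*
 * Legacy WRMSR to IA32_PMCi: only EAX[31:0] is used, sign-extended to
 * 64 bits; the counter then holds the low COUNTERWIDTH bits.
 */
static uint64_t legacy_write(uint32_t eax, unsigned int width)
{
	uint64_t v = eax;

	if (eax & UINT32_C(0x80000000))
		v |= UINT64_C(0xFFFFFFFF00000000);
	return v & width_mask(width);
}

/*
 * WRMSR to the IA32_A_PMCi alias, per the SDM pseudocode:
 *   IA32_PMCi[COUNTERWIDTH-1:32] := EDX[COUNTERWIDTH-33:0]
 *   IA32_PMCi[31:0]              := EAX[31:0]
 * Input bits at or above COUNTERWIDTH are reserved. Returns false and
 * leaves the counter unchanged where the write is assumed to raise #GP.
 */
static bool full_width_write(uint32_t edx, uint32_t eax, unsigned int width,
			     uint64_t *counter)
{
	uint64_t v = ((uint64_t)edx << 32) | eax;

	if (v & ~width_mask(width))
		return false;
	*counter = v;
	return true;
}
```

For a 48-bit counter, writing -1 through the legacy address yields 0xFFFFFFFFFFFF, whereas the same all-ones EDX:EAX through the alias hits reserved bits and would fault.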

---

Some might argue that this is all talking about GP counters, not fixed counters.
In fact, the full-width write hw behaviour is presumed to do the same thing for
all counters.

Commercial hardware will not use fewer than 32 bits, or a bit width like 46 bits.
KVM userspace (such as selftests) may set an unusual bit width, for example
33 bits, and with the current code, writing the reserved bits of the fixed
counters doesn't cause #GP.

Also, when the guest does not have the full-width write feature, CPUID can
report fixed counters wider than 32 bits while the GP counters are only
32 bits wide, which is also anomalous.

KVM is also currently unable to emulate counter overflow when KVM userspace
sets a bit width of less than 32 bits with FW_WRITE.
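The width rules argued for here, which the patch applies in intel_pmu_refresh(), can be modeled as a small pure function. effective_width() and width_to_mask() are hypothetical helpers, and host_width stands in for kvm_pmu_cap.bit_width_gp / kvm_pmu_cap.bit_width_fixed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the patch's width rule: without FW_WRITE the
 * width is forced to 32, with FW_WRITE it is raised to at least 32, and
 * in both cases it is then capped at the host's width.
 */
static unsigned int effective_width(unsigned int cpuid_width, bool fw_wr,
				    unsigned int host_width)
{
	unsigned int w = 32;

	if (fw_wr && cpuid_width > 32)
		w = cpuid_width;
	return w < host_width ? w : host_width;
}

/* Same formula as counter_bitmask[]: ((u64)1 << width) - 1, for width < 64. */
static uint64_t width_to_mask(unsigned int width)
{
	assert(width >= 1 && width < 64);
	return (UINT64_C(1) << width) - 1;
}
```

Under this model a requested width of 33 survives only with FW_WRITE (and then bits 33 and above are reserved), while 16 is raised to 32 and 64 is capped on a 48-bit host.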

The above SDM-undefined behaviour led to this fix, which may lift some of the fog.

2023-03-28 09:24:31

by Paolo Bonzini

Subject: Re: [PATCH v2] KVM: x86/pmu: Fix emulation on Intel counters' bit width

On 3/28/23 11:16, Like Xu wrote:
>
>
> If IA32_PERF_CAPABILITIES.FW_WRITE[bit 13] =1, each IA32_PMCi is
> accompanied by a
> corresponding alias address starting at 4C1H for IA32_A_PMC0.
>
> The bit width of the performance monitoring counters is specified in
> CPUID.0AH:EAX[23:16].
> If IA32_A_PMCi is present, the 64-bit input value (EDX:EAX) of WRMSR to
> IA32_A_PMCi will cause
> IA32_PMCi to be updated by:
>
>     COUNTERWIDTH =
>         CPUID.0AH:EAX[23:16] bit width of the performance monitoring
> counter
>     IA32_PMCi[COUNTERWIDTH-1:32] := EDX[COUNTERWIDTH-33:0]);
>     IA32_PMCi[31:0] := EAX[31:0];
>     EDX[63:COUNTERWIDTH] are reserved
>
> ---
>
> Some might argue that this is all talking about GP counters, not
> fixed counters. In fact, the full-width write hw behaviour is
> presumed to do the same thing for all counters.
But the above behavior, and the #GP, is only true for IA32_A_PMCi (the
full-width MSR). Did I understand correctly that the behavior for fixed
counters is changed without introducing an alias MSR?

Paolo

2023-03-28 10:09:30

by Like Xu

Subject: Re: [PATCH v2] KVM: x86/pmu: Fix emulation on Intel counters' bit width

On 28/3/2023 5:20 pm, Paolo Bonzini wrote:
> On 3/28/23 11:16, Like Xu wrote:
>>
>>
>> If IA32_PERF_CAPABILITIES.FW_WRITE[bit 13] =1, each IA32_PMCi is accompanied by a
>> corresponding alias address starting at 4C1H for IA32_A_PMC0.
>>
>> The bit width of the performance monitoring counters is specified in
>> CPUID.0AH:EAX[23:16].
>> If IA32_A_PMCi is present, the 64-bit input value (EDX:EAX) of WRMSR to
>> IA32_A_PMCi will cause
>> IA32_PMCi to be updated by:
>>
>>      COUNTERWIDTH =
>>          CPUID.0AH:EAX[23:16] bit width of the performance monitoring counter
>>      IA32_PMCi[COUNTERWIDTH-1:32] := EDX[COUNTERWIDTH-33:0]);
>>      IA32_PMCi[31:0] := EAX[31:0];
>>      EDX[63:COUNTERWIDTH] are reserved
>>
>> ---
>>
>> Some might argue that this is all talking about GP counters, not
>> fixed counters. In fact, the full-width write hw behaviour is
>> presumed to do the same thing for all counters.
> But the above behavior, and the #GP, is only true for IA32_A_PMCi (the
> full-width MSR).  Did I understand correctly that the behavior for fixed
> counters is changed without introducing an alias MSR?
>
> Paolo
>

If that were true, why introduce those alias MSRs? My archaeological findings are:

A platform without full-width writes, such as Westmere (which already has 3 fixed
counters), is declared to have a counter width of (R:48, W:32), and its successor
Sandy Bridge has (R:48, W:32/48).

Thus I think the behaviour of the fixed counters changed starting with that
generation, and the alias GP MSRs were introduced to keep supporting 32-bit
writes to the GP counters (via the original addresses).

[*] Intel® 64 and IA-32 Architectures Software Developer’s Manual Documentation
Changes (252046-030, January 2011), Table 30-18, "Core PMU Comparison".

2023-05-24 20:36:18

by Sean Christopherson

Subject: Re: [PATCH v2] KVM: x86/pmu: Fix emulation on Intel counters' bit width

On Tue, Mar 28, 2023, Like Xu wrote:
> On 28/3/2023 5:20 pm, Paolo Bonzini wrote:
> > On 3/28/23 11:16, Like Xu wrote:
> > >
> > >
> > > If IA32_PERF_CAPABILITIES.FW_WRITE[bit 13] =1, each IA32_PMCi is accompanied by a
> > > corresponding alias address starting at 4C1H for IA32_A_PMC0.
> > >
> > > The bit width of the performance monitoring counters is specified in
> > > CPUID.0AH:EAX[23:16].
> > > If IA32_A_PMCi is present, the 64-bit input value (EDX:EAX) of WRMSR
> > > to IA32_A_PMCi will cause
> > > IA32_PMCi to be updated by:
> > >
> > >      COUNTERWIDTH =
> > >          CPUID.0AH:EAX[23:16] bit width of the performance monitoring counter
> > >      IA32_PMCi[COUNTERWIDTH-1:32] := EDX[COUNTERWIDTH-33:0]);
> > >      IA32_PMCi[31:0] := EAX[31:0];
> > >      EDX[63:COUNTERWIDTH] are reserved
> > >
> > > ---
> > >
> > > Some might argue that this is all talking about GP counters, not
> > > fixed counters. In fact, the full-width write hw behaviour is
> > > presumed to do the same thing for all counters.
> > But the above behavior, and the #GP, is only true for IA32_A_PMCi (the
> > full-width MSR).  Did I understand correctly that the behavior for fixed
> > counters is changed without introducing an alias MSR?
> >
> > Paolo
> >
>
> If that were true, why introduce those alias MSRs?

My guess is that there is/was software in the field that wrote -1 to the GP
counters, i.e. software that would have been broken by the new #GP behavior.

> My archaeological findings are:
>
> A platform without full-width writes, such as Westmere (which already has 3 fixed
> counters), is declared to have a counter width of (R:48, W:32), and its successor
> Sandy Bridge has (R:48, W:32/48).
>
> Thus I think the behaviour of the fixed counters changed starting with that
> generation, and the alias GP MSRs were introduced to keep supporting 32-bit
> writes to the GP counters (via the original addresses).

FWIW, I see the #GP behavior for fixed counters on Haswell, so this does seem to
be the case. That said, I would like to get confirmation from Intel that this is
architectural and/or working as intended.

Like, can you follow up with Intel to get clarification/confirmation? And ideally
an SDM update...