From: Like Xu <[email protected]>
In either case, counters points to either the fixed or the GP PMC array, so
the counter type is already available as counters->type. Indexing
counter_bitmask[] with it is a simple memory load and avoids a conditional
that could disturb the branch predictor.
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index e5cec07ca8d9..28b0a784f6e9 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -142,7 +142,7 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
 	}
 	if (idx >= num_counters)
 		return NULL;
-	*mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
+	*mask &= pmu->counter_bitmask[counters->type];
 	return &counters[array_index_nospec(idx, num_counters)];
 }
--
2.38.1
On Mon, Dec 05, 2022, Like Xu wrote:
> From: Like Xu <[email protected]>
>
> In either case, the counters will point to fixed or gp pmc array, and
> taking advantage of the C pointer, it's reasonable to use an almost known
> mem load operation directly without disturbing the branch predictor.
The compiler is extremely unlikely to generate a branch for this, e.g. gcc-12 uses
setne and clang-14 shifts "fixed" by 30. FWIW, clang is also clever enough to
use a cmov to load the address of counters, i.e. the happy path will have no taken
branches for either type of counter.
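A small standalone sketch (not kernel code) of why no branch is needed. It assumes, as in the kernel's enum pmc_type, that KVM_PMC_GP is 0 and KVM_PMC_FIXED is 1, and that bit 30 of the RDPMC index selects the fixed counters, as intel_rdpmc_ecx_to_pmc() does. Under those assumptions the ternary is equivalent to extracting bit 30, which is the kind of transformation that lets a compiler emit setne or a shift. The helper names type_ternary() and type_shift() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumption: mirrors the kernel's enum pmc_type, where
 * KVM_PMC_GP == 0 and KVM_PMC_FIXED == 1.
 */
enum pmc_type {
	KVM_PMC_GP = 0,
	KVM_PMC_FIXED,
};

/* type_shift() below is only valid if the enum has exactly these values. */
static_assert(KVM_PMC_GP == 0 && KVM_PMC_FIXED == 1,
	      "type_shift() relies on these enum values");

/* The form kept by the current code: a ternary on the decoded "fixed" flag. */
static enum pmc_type type_ternary(unsigned int ecx)
{
	bool fixed = ecx & (1u << 30);	/* bit 30 selects fixed counters */

	return fixed ? KVM_PMC_FIXED : KVM_PMC_GP;
}

/*
 * A branch-free equivalent a compiler is free to derive: because the
 * enum values are 0 and 1, the array index is simply bit 30 of ECX.
 */
static enum pmc_type type_shift(unsigned int ecx)
{
	return (enum pmc_type)((ecx >> 30) & 1u);
}
```

For example, type_ternary((3u << 30) | 1) and type_shift((3u << 30) | 1) both yield KVM_PMC_FIXED. The sketch only shows that the two expressions are equivalent; the instructions actually emitted depend on the compiler and its flags.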
> Signed-off-by: Like Xu <[email protected]>
> ---
> arch/x86/kvm/vmx/pmu_intel.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index e5cec07ca8d9..28b0a784f6e9 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -142,7 +142,7 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
> }
> if (idx >= num_counters)
> return NULL;
> - *mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
> + *mask &= pmu->counter_bitmask[counters->type];
In terms of readability, I have a slight preference for the current code as I
don't have to look at counters->type to understand its possible values.
On 6/12/2022 12:46 am, Sean Christopherson wrote:
> On Mon, Dec 05, 2022, Like Xu wrote:
>> From: Like Xu <[email protected]>
>>
>> In either case, the counters will point to fixed or gp pmc array, and
>> taking advantage of the C pointer, it's reasonable to use an almost known
>> mem load operation directly without disturbing the branch predictor.
>
> The compiler is extremely unlikely to generate a branch for this, e.g. gcc-12 uses
> setne and clang-14 shifts "fixed" by 30. FWIW, clang is also clever enough to
> use a cmov to load the address of counters, i.e. the happy path will have no taken
> branches for either type of counter.
If so, that is good news for users of newer toolchains. I assume the Linux project
also cares, even if only a little, about supporting older ones.
>
>> Signed-off-by: Like Xu <[email protected]>
>> ---
>> arch/x86/kvm/vmx/pmu_intel.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index e5cec07ca8d9..28b0a784f6e9 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> @@ -142,7 +142,7 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
>> }
>> if (idx >= num_counters)
>> return NULL;
>> - *mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
>> + *mask &= pmu->counter_bitmask[counters->type];
>
> In terms of readability, I have a slight preference for the current code as I
> don't have to look at counters->type to understand its possible values.
If someone adds a new PMC type, this code breaks. Also, this change makes all
uses of pmu->counter_bitmask[] more consistent.
Please reconsider this minor change if it does no harm.
On Tue, Dec 06, 2022, Like Xu wrote:
> > > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > > index e5cec07ca8d9..28b0a784f6e9 100644
> > > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > > @@ -142,7 +142,7 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
> > > }
> > > if (idx >= num_counters)
> > > return NULL;
> > > - *mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
> > > + *mask &= pmu->counter_bitmask[counters->type];
> >
> > In terms of readability, I have a slight preference for the current code as I
> > don't have to look at counters->type to understand its possible values.
> When someone tries to add a new type of pmc type, the code bugs up.
Are there new types coming along? If so, I definitely would not object to refactoring
this code in the context of a series that adds a new type(s). But "fixing" this one
case is not sufficient to support a new type, e.g. intel_is_valid_rdpmc_ecx() also
needs to be updated. Actually, even this function would need additional updates
to perform a similar sanity check.
	if (fixed) {
		counters = pmu->fixed_counters;
		num_counters = pmu->nr_arch_fixed_counters;
	} else {
		counters = pmu->gp_counters;
		num_counters = pmu->nr_arch_gp_counters;
	}
	if (idx >= num_counters)
		return NULL;
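The selection logic above can be modelled outside the kernel to check that the two mask lookups agree. This is a simplified, hypothetical sketch: struct pmc_model and struct pmu_model stand in for struct kvm_pmc and struct kvm_pmu, the counter counts and bitmask widths are arbitrary example values, and array_index_nospec() is dropped. It also assumes bit 30 of ECX selects the fixed counters, as in the kernel function. The assert() records the invariant the patch depends on: the selected array carries the matching type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: same values as the kernel's enum pmc_type. */
enum pmc_type { KVM_PMC_GP = 0, KVM_PMC_FIXED };

/* Hypothetical, heavily trimmed stand-ins for struct kvm_pmc / kvm_pmu. */
struct pmc_model {
	enum pmc_type type;
};

struct pmu_model {
	struct pmc_model gp_counters[8];
	struct pmc_model fixed_counters[3];
	unsigned int nr_arch_gp_counters;
	unsigned int nr_arch_fixed_counters;
	uint64_t counter_bitmask[2];	/* indexed by enum pmc_type */
};

/*
 * Simplified model of intel_rdpmc_ecx_to_pmc(). use_type picks the mask
 * lookup: the ternary on "fixed" (current code) or counters->type (patch).
 */
static struct pmc_model *ecx_to_pmc(struct pmu_model *pmu, unsigned int idx,
				    uint64_t *mask, bool use_type)
{
	bool fixed = idx & (1u << 30);
	struct pmc_model *counters;
	unsigned int num_counters;

	idx &= ~(3u << 30);
	if (fixed) {
		counters = pmu->fixed_counters;
		num_counters = pmu->nr_arch_fixed_counters;
	} else {
		counters = pmu->gp_counters;
		num_counters = pmu->nr_arch_gp_counters;
	}
	if (idx >= num_counters)
		return NULL;

	/* Invariant the patch relies on: the array's type matches "fixed". */
	assert(counters->type == (fixed ? KVM_PMC_FIXED : KVM_PMC_GP));

	if (use_type)
		*mask &= pmu->counter_bitmask[counters->type];
	else
		*mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];

	return &counters[idx];
}

/* Set up types and (example) bitmask widths as both forms expect. */
static void pmu_model_init(struct pmu_model *pmu)
{
	for (unsigned int i = 0; i < 8; i++)
		pmu->gp_counters[i].type = KVM_PMC_GP;
	for (unsigned int i = 0; i < 3; i++)
		pmu->fixed_counters[i].type = KVM_PMC_FIXED;
	pmu->nr_arch_gp_counters = 8;
	pmu->nr_arch_fixed_counters = 3;
	pmu->counter_bitmask[KVM_PMC_GP] = (1ull << 48) - 1;
	pmu->counter_bitmask[KVM_PMC_FIXED] = (1ull << 40) - 1;
}

/* True if both lookup forms return the same counter and the same mask. */
static bool masks_agree(struct pmu_model *pmu, unsigned int idx)
{
	uint64_t a = ~0ull, b = ~0ull;
	struct pmc_model *pa = ecx_to_pmc(pmu, idx, &a, false);
	struct pmc_model *pb = ecx_to_pmc(pmu, idx, &b, true);

	return pa == pb && a == b;
}
```

Under these assumptions both forms select the same counter and produce the same mask, including for out-of-range indices where both return NULL and leave the mask untouched; the difference between them is one of readability, not behaviour.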
> And, this one will make all usage of pmu->counter_bitmask[] more consistent.
How's that? There's literally one instance of using ->type:

static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
{
	struct kvm_pmu *pmu = pmc_to_pmu(pmc);

	return pmu->counter_bitmask[pmc->type];
}

Everything else is hardcoded. And using pmc->type makes perfect sense in
that case. But in intel_rdpmc_ecx_to_pmc(), there is already usage of "fixed",
so IMO switching to ->type makes that function somewhat inconsistent with itself.
On 7/12/2022 1:19 am, Sean Christopherson wrote:
> On Tue, Dec 06, 2022, Like Xu wrote:
>>>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>>>> index e5cec07ca8d9..28b0a784f6e9 100644
>>>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>>>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>>>> @@ -142,7 +142,7 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
>>>> }
>>>> if (idx >= num_counters)
>>>> return NULL;
>>>> - *mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
>>>> + *mask &= pmu->counter_bitmask[counters->type];
>>>
>>> In terms of readability, I have a slight preference for the current code as I
>>> don't have to look at counters->type to understand its possible values.

IMO, using counters->type directly, just like pmc_bitmask() does, improves
readability and opportunistically helps some older compilers generate better code.
>> When someone tries to add a new type of pmc type, the code bugs up.
>
> Are there new types coming along? If so, I definitely would not object to refactoring
> this code in the context of a series that adds a new type(s). But "fixing" this one
> case is not sufficient to support a new type, e.g. intel_is_valid_rdpmc_ecx() also
> needs to be updated. Actually, even this function would need additional updates
> to perform a similar sanity check.
True, but that part of the change is semantically relevant, so it should not be
mixed into a harmless, generic optimization like this one. Right?
>
> if (fixed) {
> counters = pmu->fixed_counters;
> num_counters = pmu->nr_arch_fixed_counters;
> } else {
> counters = pmu->gp_counters;
> num_counters = pmu->nr_arch_gp_counters;
> }
> if (idx >= num_counters)
> return NULL;
>
>> And, this one will make all usage of pmu->counter_bitmask[] more consistent.
>
> How's that? There's literally one instance of using ->type
>
> static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
> {
> struct kvm_pmu *pmu = pmc_to_pmu(pmc);
>
> return pmu->counter_bitmask[pmc->type];
> }
>
> everything else is hardcoded. And using pmc->type there make perfect sense in
> that case. But in intel_rdpmc_ecx_to_pmc(), there is already usage of "fixed",
> so IMO switching to ->type makes that function somewhat inconsistent with itself.
Moreover, code like "[a ? b : c]" is rare in both KVM and x86 code.
Good (branchless) practice should be spread everywhere, not the other way
around.
I have absolutely no objection to your "slight preference". Thanks for taking the
time to review this.
On Wed, Dec 07, 2022, Like Xu wrote:
> On 7/12/2022 1:19 am, Sean Christopherson wrote:
> > On Tue, Dec 06, 2022, Like Xu wrote:
> > > > > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > > > > index e5cec07ca8d9..28b0a784f6e9 100644
> > > > > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > > > > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > > > > @@ -142,7 +142,7 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
> > > > > }
> > > > > if (idx >= num_counters)
> > > > > return NULL;
> > > > > - *mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
> > > > > + *mask &= pmu->counter_bitmask[counters->type];
> > > >
> > > > In terms of readability, I have a slight preference for the current code as I
>
> IMO, using counters->type directly just like pmc_bitmask() will add more readability
> and opportunistically helps some stale compilers behave better.
Anyone that cares about this level of micro-optimization absolutely should be
using a toolchain that's at or near the bleeding edge.
> > > > don't have to look at counters->type to understand its possible values.
> > > When someone tries to add a new type of pmc type, the code bugs up.
> >
> > Are there new types coming along? If so, I definitely would not object to refactoring
> > this code in the context of a series that adds a new type(s). But "fixing" this one
> > case is not sufficient to support a new type, e.g. intel_is_valid_rdpmc_ecx() also
> > needs to be updated. Actually, even this function would need additional updates
> > to perform a similar sanity check.
>
> True but this part of the change is semantically relevant, which should not
> be present in a harmless generic optimization like this one. Right ?
For modern compilers, it's not an optimization.
> > if (fixed) {
> > counters = pmu->fixed_counters;
> > num_counters = pmu->nr_arch_fixed_counters;
> > } else {
> > counters = pmu->gp_counters;
> > num_counters = pmu->nr_arch_gp_counters;
> > }
> > if (idx >= num_counters)
> > return NULL;
> >
> > > And, this one will make all usage of pmu->counter_bitmask[] more consistent.
> >
> > How's that? There's literally one instance of using ->type
> >
> > static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
> > {
> > struct kvm_pmu *pmu = pmc_to_pmu(pmc);
> >
> > return pmu->counter_bitmask[pmc->type];
> > }
> >
> > everything else is hardcoded. And using pmc->type there make perfect sense in
> > that case. But in intel_rdpmc_ecx_to_pmc(), there is already usage of "fixed",
> > so IMO switching to ->type makes that function somewhat inconsistent with itself.
>
> More, it's rare to see code like " [ a ? b : c] " in the world of both KVM and x86.
There are a few false positives here, but ternary operators are common:
$ git grep '?' arch/x86/kvm | wc -l
292
If you're saying that indexing an array with a ternary operator is rare, then sure,
but only because there is almost never anything that fits such a pattern, not because
it's an inherently bad pattern.
> Good practice (branchless) should be scattered everywhere and not the other
> way around.
Once again, modern compilers will not generate branches for this code.