2022-12-05 11:52:23

by Like Xu

Subject: [PATCH] KVM: x86/pmu: Avoid ternary operator by directly referring to counters->type

From: Like Xu <[email protected]>

In either case, counters points to either the fixed or the GP pmc array, so
it's reasonable to take advantage of that pointer and load counters->type
directly, without disturbing the branch predictor.

Signed-off-by: Like Xu <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index e5cec07ca8d9..28b0a784f6e9 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -142,7 +142,7 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
 	}
 	if (idx >= num_counters)
 		return NULL;
-	*mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
+	*mask &= pmu->counter_bitmask[counters->type];
 	return &counters[array_index_nospec(idx, num_counters)];
 }

--
2.38.1
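A minimal user-space sketch of what the patch changes may help here. It is not KVM code: `struct pmc_model`, `struct pmu_model`, the helper names, and the counter widths are hypothetical, and the enum is assumed to mirror KVM's `enum pmc_type` with `KVM_PMC_GP` = 0 and `KVM_PMC_FIXED` = 1. It shows that the ternary on `fixed` and the load of `counters->type` select the same bitmask, provided every counter's `type` matches the array it lives in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror KVM's enum pmc_type: GP = 0, FIXED = 1. */
enum pmc_type { KVM_PMC_GP = 0, KVM_PMC_FIXED = 1 };

/* Hypothetical, trimmed-down stand-ins for struct kvm_pmc / struct kvm_pmu. */
struct pmc_model {
	enum pmc_type type;
};

struct pmu_model {
	uint64_t counter_bitmask[2];
	struct pmc_model gp_counters[8];
	struct pmc_model fixed_counters[3];
};

/* Fill in arbitrary example widths and tag every counter with its type. */
static void pmu_model_init(struct pmu_model *pmu)
{
	pmu->counter_bitmask[KVM_PMC_GP] = (1ull << 48) - 1;
	pmu->counter_bitmask[KVM_PMC_FIXED] = (1ull << 40) - 1;
	for (int i = 0; i < 8; i++)
		pmu->gp_counters[i].type = KVM_PMC_GP;
	for (int i = 0; i < 3; i++)
		pmu->fixed_counters[i].type = KVM_PMC_FIXED;
}

/* Select the counter array from the "fixed" flag, as the function does. */
static const struct pmc_model *select_counters(const struct pmu_model *pmu,
					       bool fixed)
{
	return fixed ? pmu->fixed_counters : pmu->gp_counters;
}

/* Pre-patch form: index derived from the local "fixed" flag. */
static uint64_t mask_by_flag(const struct pmu_model *pmu, bool fixed)
{
	return pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
}

/* Post-patch form: index read through the already-selected array pointer. */
static uint64_t mask_by_type(const struct pmu_model *pmu,
			     const struct pmc_model *counters)
{
	/* Only meaningful if the element's type matches its array. */
	assert(counters->type == KVM_PMC_GP || counters->type == KVM_PMC_FIXED);
	return pmu->counter_bitmask[counters->type];
}
```

Both helpers agree for either value of `fixed`; the second form simply trades the flag for a field read through the pointer that was already chosen.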


2022-12-05 17:05:24

by Sean Christopherson

Subject: Re: [PATCH] KVM: x86/pmu: Avoid ternary operator by directly referring to counters->type

On Mon, Dec 05, 2022, Like Xu wrote:
> From: Like Xu <[email protected]>
>
> In either case, counters points to either the fixed or the GP pmc array, so
> it's reasonable to take advantage of that pointer and load counters->type
> directly, without disturbing the branch predictor.

The compiler is extremely unlikely to generate a branch for this, e.g. gcc-12 uses
setne and clang-14 shifts "fixed" by 30. FWIW, clang is also clever enough to
use a cmov to load the address of counters, i.e. the happy path will have no taken
branches for either type of counter.
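To illustrate why no branch is required, the sketch below spells out in C two branch-free ways to compute the same index: a `!= 0` comparison (the value a `setne`-style sequence produces) and shifting bit 30 down to bit 0. It assumes, without the patch showing it, that `fixed` comes from bit 30 of the RDPMC index and that `KVM_PMC_GP` is 0 and `KVM_PMC_FIXED` is 1; the function names are hypothetical, and nothing here claims what a given compiler actually emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum pmc_type { KVM_PMC_GP = 0, KVM_PMC_FIXED = 1 };

/* The equivalences below only hold for these enum values. */
_Static_assert(KVM_PMC_GP == 0 && KVM_PMC_FIXED == 1,
	       "branch-free index forms rely on GP == 0 and FIXED == 1");

/* Assumption: bit 30 of the RDPMC index selects the fixed counters. */
#define RDPMC_FIXED_BIT 30

/* The source-level form under discussion. */
static unsigned int index_ternary(uint32_t ecx)
{
	bool fixed = ecx & (1u << RDPMC_FIXED_BIT);

	return fixed ? KVM_PMC_FIXED : KVM_PMC_GP;
}

/* Same value as "fixed != 0", i.e. what a test + setne sequence yields. */
static unsigned int index_compare(uint32_t ecx)
{
	return (ecx & (1u << RDPMC_FIXED_BIT)) != 0;
}

/* Same value obtained by shifting the selector bit down to bit 0. */
static unsigned int index_shift(uint32_t ecx)
{
	return (ecx >> RDPMC_FIXED_BIT) & 1u;
}
```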

> Signed-off-by: Like Xu <[email protected]>
> ---
> arch/x86/kvm/vmx/pmu_intel.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index e5cec07ca8d9..28b0a784f6e9 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -142,7 +142,7 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
>  	}
>  	if (idx >= num_counters)
>  		return NULL;
> -	*mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
> +	*mask &= pmu->counter_bitmask[counters->type];

In terms of readability, I have a slight preference for the current code as I
don't have to look at counters->type to understand its possible values.

2022-12-06 02:51:44

by Like Xu

Subject: Re: [PATCH] KVM: x86/pmu: Avoid ternary operator by directly referring to counters->type

On 6/12/2022 12:46 am, Sean Christopherson wrote:
> On Mon, Dec 05, 2022, Like Xu wrote:
>> From: Like Xu <[email protected]>
>>
>> In either case, counters points to either the fixed or the GP pmc array, so
>> it's reasonable to take advantage of that pointer and load counters->type
>> directly, without disturbing the branch predictor.
>
> The compiler is extremely unlikely to generate a branch for this, e.g. gcc-12 uses
> setne and clang-14 shifts "fixed" by 30. FWIW, clang is also clever enough to
> use a cmov to load the address of counters, i.e. the happy path will have no taken
> branches for either type of counter.

If so, that's good news for users of newer toolchains. Still, I assume the Linux
project deserves credit for accommodating older toolchains too, even if only a little.

>
>> Signed-off-by: Like Xu <[email protected]>
>> ---
>> arch/x86/kvm/vmx/pmu_intel.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index e5cec07ca8d9..28b0a784f6e9 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> @@ -142,7 +142,7 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
>>  	}
>>  	if (idx >= num_counters)
>>  		return NULL;
>> -	*mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
>> +	*mask &= pmu->counter_bitmask[counters->type];
>
> In terms of readability, I have a slight preference for the current code as I
> don't have to look at counters->type to understand its possible values.
When someone tries to add a new pmc type, this code will break. Also,
this change will make all uses of pmu->counter_bitmask[] more consistent.

Please reconsider this minor diff if it does no harm.

2022-12-06 17:37:45

by Sean Christopherson

Subject: Re: [PATCH] KVM: x86/pmu: Avoid ternary operator by directly referring to counters->type

On Tue, Dec 06, 2022, Like Xu wrote:
> > > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > > index e5cec07ca8d9..28b0a784f6e9 100644
> > > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > > @@ -142,7 +142,7 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
> > >  	}
> > >  	if (idx >= num_counters)
> > >  		return NULL;
> > > -	*mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
> > > +	*mask &= pmu->counter_bitmask[counters->type];
> >
> > In terms of readability, I have a slight preference for the current code as I
> > don't have to look at counters->type to understand its possible values.
> When someone tries to add a new pmc type, this code will break.

Are there new types coming along? If so, I definitely would not object to refactoring
this code in the context of a series that adds a new type(s). But "fixing" this one
case is not sufficient to support a new type, e.g. intel_is_valid_rdpmc_ecx() also
needs to be updated. Actually, even this function would need additional updates
to perform a similar sanity check.

	if (fixed) {
		counters = pmu->fixed_counters;
		num_counters = pmu->nr_arch_fixed_counters;
	} else {
		counters = pmu->gp_counters;
		num_counters = pmu->nr_arch_gp_counters;
	}
	if (idx >= num_counters)
		return NULL;
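For context, the fragment above sits in a decode path roughly like the following hypothetical user-space model. Assumptions: bit 30 of the RDPMC index selects the fixed counters and bits 31:30 are stripped before the bounds check; `struct pmu_model`, `pmu_model_init()`, and `rdpmc_ecx_to_pmc_model()` are illustrative names; the kernel's `array_index_nospec()` is replaced by plain indexing after the bounds check, so the model has none of its speculation hardening. The two-way `if (fixed)` split is where a new counter type would need its own array and bounds check, regardless of how `counter_bitmask[]` is indexed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum pmc_type { KVM_PMC_GP = 0, KVM_PMC_FIXED = 1 };

/* Hypothetical stand-ins for struct kvm_pmc / struct kvm_pmu. */
struct pmc_model {
	uint64_t counter;
};

struct pmu_model {
	uint64_t counter_bitmask[2];
	struct pmc_model gp_counters[8];
	struct pmc_model fixed_counters[3];
	unsigned int nr_arch_gp_counters;
	unsigned int nr_arch_fixed_counters;
};

/* Arbitrary example widths; counts must fit the fixed-size arrays above. */
static void pmu_model_init(struct pmu_model *pmu, unsigned int nr_gp,
			   unsigned int nr_fixed)
{
	assert(nr_gp <= 8 && nr_fixed <= 3);
	*pmu = (struct pmu_model){
		.counter_bitmask = { [KVM_PMC_GP] = (1ull << 48) - 1,
				     [KVM_PMC_FIXED] = (1ull << 40) - 1 },
		.nr_arch_gp_counters = nr_gp,
		.nr_arch_fixed_counters = nr_fixed,
	};
}

/* Decode the RDPMC index, bounds-check it, narrow *mask, return the counter. */
static struct pmc_model *rdpmc_ecx_to_pmc_model(struct pmu_model *pmu,
						uint32_t idx, uint64_t *mask)
{
	bool fixed = idx & (1u << 30);
	struct pmc_model *counters;
	unsigned int num_counters;

	idx &= ~(3u << 30);	/* strip selector bits 31:30 (assumed layout) */
	if (fixed) {
		counters = pmu->fixed_counters;
		num_counters = pmu->nr_arch_fixed_counters;
	} else {
		counters = pmu->gp_counters;
		num_counters = pmu->nr_arch_gp_counters;
	}
	if (idx >= num_counters)
		return NULL;	/* *mask is left untouched on failure */

	*mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
	/* The kernel clamps with array_index_nospec(); plain indexing here. */
	return &counters[idx];
}
```

On an out-of-range index the model returns NULL and leaves `*mask` untouched, mirroring the early `return NULL` in the fragment.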

> Also, this change will make all uses of pmu->counter_bitmask[] more consistent.

How's that? There's literally one instance of using ->type

static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
{
	struct kvm_pmu *pmu = pmc_to_pmu(pmc);

	return pmu->counter_bitmask[pmc->type];
}

everything else is hardcoded. And using pmc->type there makes perfect sense in
that case. But in intel_rdpmc_ecx_to_pmc(), there is already usage of "fixed",
so IMO switching to ->type makes that function somewhat inconsistent with itself.

2022-12-07 09:18:52

by Like Xu

Subject: Re: [PATCH] KVM: x86/pmu: Avoid ternary operator by directly referring to counters->type

On 7/12/2022 1:19 am, Sean Christopherson wrote:
> On Tue, Dec 06, 2022, Like Xu wrote:
>>>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>>>> index e5cec07ca8d9..28b0a784f6e9 100644
>>>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>>>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>>>> @@ -142,7 +142,7 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
>>>>  	}
>>>>  	if (idx >= num_counters)
>>>>  		return NULL;
>>>> -	*mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
>>>> +	*mask &= pmu->counter_bitmask[counters->type];
>>>
>>> In terms of readability, I have a slight preference for the current code as I

IMO, using counters->type directly, just like pmc_bitmask() does, improves readability
and opportunistically helps some older compilers behave better.

>>> don't have to look at counters->type to understand its possible values.
>> When someone tries to add a new pmc type, this code will break.
>
> Are there new types coming along? If so, I definitely would not object to refactoring
> this code in the context of a series that adds a new type(s). But "fixing" this one
> case is not sufficient to support a new type, e.g. intel_is_valid_rdpmc_ecx() also
> needs to be updated. Actually, even this function would need additional updates
> to perform a similar sanity check.

True, but that part of the change is semantically relevant, so it should not
be mixed into a harmless, generic optimization like this one. Right?

>
> 	if (fixed) {
> 		counters = pmu->fixed_counters;
> 		num_counters = pmu->nr_arch_fixed_counters;
> 	} else {
> 		counters = pmu->gp_counters;
> 		num_counters = pmu->nr_arch_gp_counters;
> 	}
> 	if (idx >= num_counters)
> 		return NULL;
>
>> Also, this change will make all uses of pmu->counter_bitmask[] more consistent.
>
> How's that? There's literally one instance of using ->type
>
> static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
> {
> 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
>
> 	return pmu->counter_bitmask[pmc->type];
> }
>
> everything else is hardcoded. And using pmc->type there makes perfect sense in
> that case. But in intel_rdpmc_ecx_to_pmc(), there is already usage of "fixed",
> so IMO switching to ->type makes that function somewhat inconsistent with itself.

Moreover, code like "[a ? b : c]" is rare in both KVM and x86.
Good (branchless) practice should be spread everywhere, not the other way
around.

I have absolutely no objection to your "slight preference". Thanks for your time
in reviewing this.

2022-12-07 18:29:53

by Sean Christopherson

Subject: Re: [PATCH] KVM: x86/pmu: Avoid ternary operator by directly referring to counters->type

On Wed, Dec 07, 2022, Like Xu wrote:
> On 7/12/2022 1:19 am, Sean Christopherson wrote:
> > On Tue, Dec 06, 2022, Like Xu wrote:
> > > > > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > > > > index e5cec07ca8d9..28b0a784f6e9 100644
> > > > > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > > > > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > > > > @@ -142,7 +142,7 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
> > > > >  	}
> > > > >  	if (idx >= num_counters)
> > > > >  		return NULL;
> > > > > -	*mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
> > > > > +	*mask &= pmu->counter_bitmask[counters->type];
> > > >
> > > > In terms of readability, I have a slight preference for the current code as I
>
> IMO, using counters->type directly, just like pmc_bitmask() does, improves readability
> and opportunistically helps some older compilers behave better.

Anyone that cares about this level of micro-optimization absolutely should be
using a toolchain that's at or near the bleeding edge.

> > > > don't have to look at counters->type to understand its possible values.
> > > When someone tries to add a new pmc type, this code will break.
> >
> > Are there new types coming along? If so, I definitely would not object to refactoring
> > this code in the context of a series that adds a new type(s). But "fixing" this one
> > case is not sufficient to support a new type, e.g. intel_is_valid_rdpmc_ecx() also
> > needs to be updated. Actually, even this function would need additional updates
> > to perform a similar sanity check.
>
> True, but that part of the change is semantically relevant, so it should not
> be mixed into a harmless, generic optimization like this one. Right?

For modern compilers, it's not an optimization.

> > 	if (fixed) {
> > 		counters = pmu->fixed_counters;
> > 		num_counters = pmu->nr_arch_fixed_counters;
> > 	} else {
> > 		counters = pmu->gp_counters;
> > 		num_counters = pmu->nr_arch_gp_counters;
> > 	}
> > 	if (idx >= num_counters)
> > 		return NULL;
> >
> > > Also, this change will make all uses of pmu->counter_bitmask[] more consistent.
> >
> > How's that? There's literally one instance of using ->type
> >
> > static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
> > {
> > 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
> >
> > 	return pmu->counter_bitmask[pmc->type];
> > }
> >
> > everything else is hardcoded. And using pmc->type there makes perfect sense in
> > that case. But in intel_rdpmc_ecx_to_pmc(), there is already usage of "fixed",
> > so IMO switching to ->type makes that function somewhat inconsistent with itself.
>
> Moreover, code like "[a ? b : c]" is rare in both KVM and x86.

There are a few false positives here, but ternary operators are common.

$ git grep '?' arch/x86/kvm | wc -l
292

If you're saying that indexing an array with a ternary operator is rare, then sure,
but only because there is almost never anything that fits such a pattern, not because
it's an inherently bad pattern.

> Good (branchless) practice should be spread everywhere, not the other
> way around.

Once again, modern compilers will not generate branches for this code.