I used the unlikely() macro on the return values of the k.alloc
calls and found that it changes the code generation a bit.
Optimize all return paths of k.alloc calls by improving
branch prediction on return value of k.alloc.
Signed-off-by: Kautuk Consul <[email protected]>
---
arch/powerpc/kvm/book3s_hv_nested.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 5a64a1341e6f..dbf2dd073e1f 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -446,7 +446,7 @@ long kvmhv_nested_init(void)
ptb_order = 12;
pseries_partition_tb = kmalloc(sizeof(struct patb_entry) << ptb_order,
GFP_KERNEL);
- if (!pseries_partition_tb) {
+ if (unlikely(!pseries_partition_tb)) {
pr_err("kvm-hv: failed to allocated nested partition table\n");
return -ENOMEM;
}
@@ -575,7 +575,7 @@ long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu)
return H_PARAMETER;
buf = kzalloc(n, GFP_KERNEL | __GFP_NOWARN);
- if (!buf)
+ if (unlikely(!buf))
return H_NO_MEM;
gp = kvmhv_get_nested(vcpu->kvm, l1_lpid, false);
@@ -689,7 +689,7 @@ static struct kvm_nested_guest *kvmhv_alloc_nested(struct kvm *kvm, unsigned int
long shadow_lpid;
gp = kzalloc(sizeof(*gp), GFP_KERNEL);
- if (!gp)
+ if (unlikely(!gp))
return NULL;
gp->l1_host = kvm;
gp->l1_lpid = lpid;
@@ -1633,7 +1633,7 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
/* 4. Insert the pte into our shadow_pgtable */
n_rmap = kzalloc(sizeof(*n_rmap), GFP_KERNEL);
- if (!n_rmap)
+ if (unlikely(!n_rmap))
return RESUME_GUEST; /* Let the guest try again */
n_rmap->rmap = (n_gpa & RMAP_NESTED_GPA_MASK) |
(((unsigned long) gp->l1_lpid) << RMAP_NESTED_LPID_SHIFT);
--
2.39.2
On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> I used the unlikely() macro on the return values of the k.alloc
> calls and found that it changes the code generation a bit.
> Optimize all return paths of k.alloc calls by improving
> branch prediction on return value of k.alloc.
What about below?
"Improve branch prediction on kmalloc() and kzalloc() call by using
unlikely() macro to optimize their return paths."
That is, try to avoid the first-person construct ("I").
Thanks.
--
An old man doll... just what I always wanted! - Clara
On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> > I used the unlikely() macro on the return values of the k.alloc
> > calls and found that it changes the code generation a bit.
> > Optimize all return paths of k.alloc calls by improving
> > branch prediction on return value of k.alloc.
Nit, this is improving code generation, not branch prediction.
> What about below?
>
> "Improve branch prediction on kmalloc() and kzalloc() call by using
> unlikely() macro to optimize their return paths."
Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
As above, it does often improve code generation for the happy path, but that doesn't
always equate to improved performance, e.g. if the CPU can easily predict the branch
and/or there is no impact on the cache footprint.
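To make that concrete, here is a rough sketch of what the hint boils down
to. The two macro definitions are the kernel's own (simplified from
include/linux/compiler.h); example_alloc() is a made-up function purely
for illustration:

  /* From include/linux/compiler.h (simplified):
   *   #define likely(x)   __builtin_expect(!!(x), 1)
   *   #define unlikely(x) __builtin_expect(!!(x), 0)
   */
  #include <linux/errno.h>
  #include <linux/slab.h>

  /* Made-up example: the hint mainly influences code layout, i.e. which
   * block the compiler places on the straight-line path; it is not a
   * runtime branch-predictor control.
   */
  static int example_alloc(void)
  {
          void *buf = kzalloc(64, GFP_KERNEL);

          if (unlikely(!buf))     /* failure path emitted out of line */
                  return -ENOMEM;

          kfree(buf);             /* happy path stays in line */
          return 0;
  }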
On 2023-04-07 09:01:29, Sean Christopherson wrote:
> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> > > I used the unlikely() macro on the return values of the k.alloc
> > > calls and found that it changes the code generation a bit.
> > > Optimize all return paths of k.alloc calls by improving
> > > branch prediction on return value of k.alloc.
>
> Nit, this is improving code generation, not branch prediction.
Sorry my mistake.
>
> > What about below?
> >
> > "Improve branch prediction on kmalloc() and kzalloc() call by using
> > unlikely() macro to optimize their return paths."
>
> Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
> As above, it does often improve code generation for the happy path, but that doesn't
> always equate to improved performance, e.g. if the CPU can easily predict the branch
> and/or there is no impact on the cache footprint.
I see. I will submit a v2 of the patch with a better and more accurate
description. Does anyone else have any comments before I do so?
Kautuk Consul <[email protected]> writes:
> On 2023-04-07 09:01:29, Sean Christopherson wrote:
>> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
>> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
>> > > I used the unlikely() macro on the return values of the k.alloc
>> > > calls and found that it changes the code generation a bit.
>> > > Optimize all return paths of k.alloc calls by improving
>> > > branch prediction on return value of k.alloc.
>>
>> Nit, this is improving code generation, not branch prediction.
> Sorry my mistake.
>>
>> > What about below?
>> >
>> > "Improve branch prediction on kmalloc() and kzalloc() call by using
>> > unlikely() macro to optimize their return paths."
>>
>> Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
>> As above, it does often improve code generation for the happy path, but that doesn't
>> always equate to improved performance, e.g. if the CPU can easily predict the branch
>> and/or there is no impact on the cache footprint.
> I see. I will submit a v2 of the patch with a better and more accurate
> description. Does anyone else have any comments before I do so?
In general I think unlikely should be saved for cases where either the
compiler is generating terrible code, or the likeliness of the condition
might be surprising to a human reader.
eg. if you had some code that does a NULL check and it's *expected* that
the value is NULL, then wrapping that check in likely() actually adds
information for a human reader.
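For example (all the names below are made up, just to illustrate the
point):

  #include <linux/compiler.h>     /* for likely() */

  struct cache;
  struct entry;
  struct entry *cache_lookup(struct cache *c, unsigned long key);
  struct entry *create_entry(struct cache *c, unsigned long key);

  /* A lookup where a miss is the *expected* outcome: the likely()
   * tells the reader (as well as the compiler) that NULL is the
   * common case here.
   */
  static struct entry *get_or_create(struct cache *c, unsigned long key)
  {
          struct entry *e = cache_lookup(c, key);

          if (likely(!e))          /* misses dominate by design */
                  return create_entry(c, key);

          return e;
  }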
Also please don't use unlikely in init paths or other cold paths, it
clutters the code (only slightly but a little) and that's not worth the
possible tiny benefit for code that only runs once or infrequently.
I would expect the compilers to do the right thing in all
these cases without the unlikely. But if you can demonstrate that they
meaningfully improve the code generation with a before/after
disassembly then I'd be interested.
cheers
Sorry, my last email was rejected by the mailing lists.
Can you please look at the attached diff file?
On Tue, Apr 11, 2023 at 2:14 PM Kautuk Consul
<[email protected]> wrote:
>
> Hi,
>
> Sorry, I'm replying using my private gmail ID as I can't figure out
> how to attach multiple files using mutt.
>
> On Tue, Apr 11, 2023 at 12:05 PM Michael Ellerman <[email protected]> wrote:
> >
> > Kautuk Consul <[email protected]> writes:
> > > On 2023-04-07 09:01:29, Sean Christopherson wrote:
> > >> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> > >> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> > >> > > I used the unlikely() macro on the return values of the k.alloc
> > >> > > calls and found that it changes the code generation a bit.
> > >> > > Optimize all return paths of k.alloc calls by improving
> > >> > > branch prediction on return value of k.alloc.
> > >>
> > >> Nit, this is improving code generation, not branch prediction.
> > > Sorry my mistake.
> > >>
> > >> > What about below?
> > >> >
> > >> > "Improve branch prediction on kmalloc() and kzalloc() call by using
> > >> > unlikely() macro to optimize their return paths."
> > >>
> > >> Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
> > >> As above, it does often improve code generation for the happy path, but that doesn't
> > >> always equate to improved performance, e.g. if the CPU can easily predict the branch
> > >> and/or there is no impact on the cache footprint.
> >
> > > I see. I will submit a v2 of the patch with a better and more accurate
> > > description. Does anyone else have any comments before I do so?
> >
> > In general I think unlikely should be saved for cases where either the
> > compiler is generating terrible code, or the likeliness of the condition
> > might be surprising to a human reader.
> >
> > eg. if you had some code that does a NULL check and it's *expected* that
> > the value is NULL, then wrapping that check in likely() actually adds
> > information for a human reader.
> >
> > Also please don't use unlikely in init paths or other cold paths, it
> > clutters the code (only slightly but a little) and that's not worth the
> > possible tiny benefit for code that only runs once or infrequently.
> >
> > I would expect the compilers to do the right thing in all
> > these cases without the unlikely. But if you can demonstrate that they
> > meaningfully improve the code generation with a before/after
> > disassembly then I'd be interested.
> >
> There are surprisingly many changes to code generation before and after
> using these instances of the unlikely macro. I couldn't really analyze all
> of them well enough to state that they are indeed improving performance in
> some way. I assumed the compiler would generate optimal code for these
> unlikely paths. Please find the before and after files attached to this
> email.
>
> > cheers
Hi,
On 2023-04-11 16:35:10, Michael Ellerman wrote:
> Kautuk Consul <[email protected]> writes:
> > On 2023-04-07 09:01:29, Sean Christopherson wrote:
> >> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> >> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> >> > > I used the unlikely() macro on the return values of the k.alloc
> >> > > calls and found that it changes the code generation a bit.
> >> > > Optimize all return paths of k.alloc calls by improving
> >> > > branch prediction on return value of k.alloc.
> >>
> >> Nit, this is improving code generation, not branch prediction.
> > Sorry my mistake.
> >>
> >> > What about below?
> >> >
> >> > "Improve branch prediction on kmalloc() and kzalloc() call by using
> >> > unlikely() macro to optimize their return paths."
> >>
> >> Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
> >> As above, it does often improve code generation for the happy path, but that doesn't
> >> always equate to improved performance, e.g. if the CPU can easily predict the branch
> >> and/or there is no impact on the cache footprint.
>
> > I see. I will submit a v2 of the patch with a better and more accurate
> > description. Does anyone else have any comments before I do so?
>
> In general I think unlikely should be saved for cases where either the
> compiler is generating terrible code, or the likeliness of the condition
> might be surprising to a human reader.
>
> eg. if you had some code that does a NULL check and it's *expected* that
> the value is NULL, then wrapping that check in likely() actually adds
> information for a human reader.
>
> Also please don't use unlikely in init paths or other cold paths, it
> clutters the code (only slightly but a little) and that's not worth the
> possible tiny benefit for code that only runs once or infrequently.
>
> I would expect the compilers to do the right thing in all
> these cases without the unlikely. But if you can demonstrate that they
> meaningfully improve the code generation with a before/after
> disassembly then I'd be interested.
Just FYI, the last email by [email protected] was by me.
That last email contains a diff file attachment which compares 2 files:
before my changes and after my changes.
This diff file shows a lot of changes in code generation. I'm assuming
all those changes are made by the compiler towards optimizing all return
paths to k.alloc calls.
Kindly review and comment.
> cheers
On 2023-04-12 12:34:13, Kautuk Consul wrote:
> Hi,
>
> On 2023-04-11 16:35:10, Michael Ellerman wrote:
> > Kautuk Consul <[email protected]> writes:
> > > On 2023-04-07 09:01:29, Sean Christopherson wrote:
> > >> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> > >> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> > >> > > I used the unlikely() macro on the return values of the k.alloc
> > >> > > calls and found that it changes the code generation a bit.
> > >> > > Optimize all return paths of k.alloc calls by improving
> > >> > > branch prediction on return value of k.alloc.
> > >>
> > >> Nit, this is improving code generation, not branch prediction.
> > > Sorry my mistake.
> > >>
> > >> > What about below?
> > >> >
> > >> > "Improve branch prediction on kmalloc() and kzalloc() call by using
> > >> > unlikely() macro to optimize their return paths."
> > >>
> > >> Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
> > >> As above, it does often improve code generation for the happy path, but that doesn't
> > >> always equate to improved performance, e.g. if the CPU can easily predict the branch
> > >> and/or there is no impact on the cache footprint.
> >
> > > I see. I will submit a v2 of the patch with a better and more accurate
> > > description. Does anyone else have any comments before I do so?
> >
> > In general I think unlikely should be saved for cases where either the
> > compiler is generating terrible code, or the likeliness of the condition
> > might be surprising to a human reader.
> >
> > eg. if you had some code that does a NULL check and it's *expected* that
> > the value is NULL, then wrapping that check in likely() actually adds
> > information for a human reader.
> >
> > Also please don't use unlikely in init paths or other cold paths, it
> > clutters the code (only slightly but a little) and that's not worth the
> > possible tiny benefit for code that only runs once or infrequently.
> >
> > I would expect the compilers to do the right thing in all
> > these cases without the unlikely. But if you can demonstrate that they
> > meaningfully improve the code generation with a before/after
> > disassembly then I'd be interested.
> Just FYI, the last email by [email protected] was by me.
> That last email contains a diff file attachment which compares 2 files:
> before my changes and after my changes.
> This diff file shows a lot of changes in code generation. I'm assuming
> all those changes are made by the compiler towards optimizing all return
> paths to k.alloc calls.
> Kindly review and comment.
Any comments on the numerous code generation changes shown by the files
I attached to this mail chain? Sorry, I don't have concrete figures of
any kind to prove that this leads to a measurable performance
improvement. I am just assuming that the compiler's modified code
generation (due to the use of the unlikely macro) would be optimal.
Thanks.
> > cheers