2020-10-15 12:05:30

by Christophe Leroy

[permalink] [raw]
Subject: [PATCH] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10

ppc-linux-objdump -d vmlinux | grep -e "<csum_partial>" -e "<__csum_partial>"

With gcc9 I get:

c0017ef8 <__csum_partial>:
c00182fc: 4b ff fb fd bl c0017ef8 <__csum_partial>
c0018478: 4b ff fa 80 b c0017ef8 <__csum_partial>
c03e8458: 4b c2 fa a0 b c0017ef8 <__csum_partial>
c03e8518: 4b c2 f9 e1 bl c0017ef8 <__csum_partial>
c03ef410: 4b c2 8a e9 bl c0017ef8 <__csum_partial>
c03f0b24: 4b c2 73 d5 bl c0017ef8 <__csum_partial>
c04279a4: 4b bf 05 55 bl c0017ef8 <__csum_partial>
c0429820: 4b be e6 d9 bl c0017ef8 <__csum_partial>
c0429944: 4b be e5 b5 bl c0017ef8 <__csum_partial>
c042b478: 4b be ca 81 bl c0017ef8 <__csum_partial>
c042b554: 4b be c9 a5 bl c0017ef8 <__csum_partial>
c045f15c: 4b bb 8d 9d bl c0017ef8 <__csum_partial>
c0492190: 4b b8 5d 69 bl c0017ef8 <__csum_partial>
c0492310: 4b b8 5b e9 bl c0017ef8 <__csum_partial>
c0495594: 4b b8 29 65 bl c0017ef8 <__csum_partial>
c049c420: 4b b7 ba d9 bl c0017ef8 <__csum_partial>
c049c870: 4b b7 b6 89 bl c0017ef8 <__csum_partial>
c049c930: 4b b7 b5 c9 bl c0017ef8 <__csum_partial>
c04a9ca0: 4b b6 e2 59 bl c0017ef8 <__csum_partial>
c04bdde4: 4b b5 a1 15 bl c0017ef8 <__csum_partial>
c04be480: 4b b5 9a 79 bl c0017ef8 <__csum_partial>
c04be710: 4b b5 97 e9 bl c0017ef8 <__csum_partial>
c04c969c: 4b b4 e8 5d bl c0017ef8 <__csum_partial>
c04ca2fc: 4b b4 db fd bl c0017ef8 <__csum_partial>
c04cf5bc: 4b b4 89 3d bl c0017ef8 <__csum_partial>
c04d0440: 4b b4 7a b9 bl c0017ef8 <__csum_partial>

With gcc10 I get:

c0018d08 <__csum_partial>:
c0019020 <csum_partial>:
c0019020: 4b ff fc e8 b c0018d08 <__csum_partial>
c001914c: 4b ff fe d4 b c0019020 <csum_partial>
c0019250: 4b ff fd d1 bl c0019020 <csum_partial>
c03e404c <csum_partial>:
c03e404c: 4b c3 4c bc b c0018d08 <__csum_partial>
c03e4050: 4b ff ff fc b c03e404c <csum_partial>
c03e40fc: 4b ff ff 51 bl c03e404c <csum_partial>
c03e6680: 4b ff d9 cd bl c03e404c <csum_partial>
c03e68c4: 4b ff d7 89 bl c03e404c <csum_partial>
c03e7934: 4b ff c7 19 bl c03e404c <csum_partial>
c03e7bf8: 4b ff c4 55 bl c03e404c <csum_partial>
c03eb148: 4b ff 8f 05 bl c03e404c <csum_partial>
c03ecf68: 4b c2 bd a1 bl c0018d08 <__csum_partial>
c04275b8 <csum_partial>:
c04275b8: 4b bf 17 50 b c0018d08 <__csum_partial>
c0427884: 4b ff fd 35 bl c04275b8 <csum_partial>
c0427b18: 4b ff fa a1 bl c04275b8 <csum_partial>
c0427bd8: 4b ff f9 e1 bl c04275b8 <csum_partial>
c0427cd4: 4b ff f8 e5 bl c04275b8 <csum_partial>
c0427e34: 4b ff f7 85 bl c04275b8 <csum_partial>
c045a1c0: 4b bb eb 49 bl c0018d08 <__csum_partial>
c0489464 <csum_partial>:
c0489464: 4b b8 f8 a4 b c0018d08 <__csum_partial>
c04896b0: 4b ff fd b5 bl c0489464 <csum_partial>
c048982c: 4b ff fc 39 bl c0489464 <csum_partial>
c0490158: 4b b8 8b b1 bl c0018d08 <__csum_partial>
c0492f0c <csum_partial>:
c0492f0c: 4b b8 5d fc b c0018d08 <__csum_partial>
c049326c: 4b ff fc a1 bl c0492f0c <csum_partial>
c049333c: 4b ff fb d1 bl c0492f0c <csum_partial>
c0493b18: 4b ff f3 f5 bl c0492f0c <csum_partial>
c0493f50: 4b ff ef bd bl c0492f0c <csum_partial>
c0493ffc: 4b ff ef 11 bl c0492f0c <csum_partial>
c04a0f78: 4b b7 7d 91 bl c0018d08 <__csum_partial>
c04b3e3c: 4b b6 4e cd bl c0018d08 <__csum_partial>
c04b40d0 <csum_partial>:
c04b40d0: 4b b6 4c 38 b c0018d08 <__csum_partial>
c04b4448: 4b ff fc 89 bl c04b40d0 <csum_partial>
c04b46f4: 4b ff f9 dd bl c04b40d0 <csum_partial>
c04bf448: 4b b5 98 c0 b c0018d08 <__csum_partial>
c04c5264: 4b b5 3a a5 bl c0018d08 <__csum_partial>
c04c61e4: 4b b5 2b 25 bl c0018d08 <__csum_partial>

gcc10 defines multiple versions of csum_partial() which are just
an unconditionnal branch to __csum_partial().

To enforce inlining of that branch to __csum_partial(),
mark csum_partial() as __always_inline.

With this patch with gcc10:

c0018d08 <__csum_partial>:
c0019148: 4b ff fb c0 b c0018d08 <__csum_partial>
c001924c: 4b ff fa bd bl c0018d08 <__csum_partial>
c03e40ec: 4b c3 4c 1d bl c0018d08 <__csum_partial>
c03e4120: 4b c3 4b e8 b c0018d08 <__csum_partial>
c03eb004: 4b c2 dd 05 bl c0018d08 <__csum_partial>
c03ecef4: 4b c2 be 15 bl c0018d08 <__csum_partial>
c0427558: 4b bf 17 b1 bl c0018d08 <__csum_partial>
c04286e4: 4b bf 06 25 bl c0018d08 <__csum_partial>
c0428cd8: 4b bf 00 31 bl c0018d08 <__csum_partial>
c0428d84: 4b be ff 85 bl c0018d08 <__csum_partial>
c045a17c: 4b bb eb 8d bl c0018d08 <__csum_partial>
c0489450: 4b b8 f8 b9 bl c0018d08 <__csum_partial>
c0491860: 4b b8 74 a9 bl c0018d08 <__csum_partial>
c0492eec: 4b b8 5e 1d bl c0018d08 <__csum_partial>
c04a0eac: 4b b7 7e 5d bl c0018d08 <__csum_partial>
c04b3e34: 4b b6 4e d5 bl c0018d08 <__csum_partial>
c04b426c: 4b b6 4a 9d bl c0018d08 <__csum_partial>
c04b463c: 4b b6 46 cd bl c0018d08 <__csum_partial>
c04c004c: 4b b5 8c bd bl c0018d08 <__csum_partial>
c04c0368: 4b b5 89 a1 bl c0018d08 <__csum_partial>
c04c5254: 4b b5 3a b5 bl c0018d08 <__csum_partial>
c04c60d4: 4b b5 2c 35 bl c0018d08 <__csum_partial>

Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/include/asm/checksum.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h
index 9cce06194dcc..cab8fec60005 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -164,7 +164,7 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
*/
__wsum __csum_partial(const void *buff, int len, __wsum sum);

-static inline __wsum csum_partial(const void *buff, int len, __wsum sum)
+static __always_inline __wsum csum_partial(const void *buff, int len, __wsum sum)
{
if (__builtin_constant_p(len) && len <= 16 && (len & 1) == 0) {
if (len == 2)
--
2.25.0


2020-10-15 13:34:06

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10



Le 15/10/2020 à 15:25, Segher Boessenkool a écrit :
> Hi!
>
> On Thu, Oct 15, 2020 at 10:52:20AM +0000, Christophe Leroy wrote:
>> With gcc9 I get:
>
> <snip>
>
>> With gcc10 I get:
>
> <snip>
>
>> gcc10 defines multiple versions of csum_partial() which are just
>> an unconditionnal branch to __csum_partial().
>
> It doesn't inline it, yes.
>
> Could you open a GCC PR for this please?

Sure.

I also have get_order() 75 times in my vmlinux, all the same as the following:

c0016790 <get_order>:
c0016790: 38 63 ff ff addi r3,r3,-1
c0016794: 54 63 a3 3e rlwinm r3,r3,20,12,31
c0016798: 7c 63 00 34 cntlzw r3,r3
c001679c: 20 63 00 20 subfic r3,r3,32
c00167a0: 4e 80 00 20 blr

Christophe

2020-10-15 13:36:16

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10

Hi!

On Thu, Oct 15, 2020 at 10:52:20AM +0000, Christophe Leroy wrote:
> With gcc9 I get:

<snip>

> With gcc10 I get:

<snip>

> gcc10 defines multiple versions of csum_partial() which are just
> an unconditionnal branch to __csum_partial().

It doesn't inline it, yes.

Could you open a GCC PR for this please?

> To enforce inlining of that branch to __csum_partial(),
> mark csum_partial() as __always_inline.

That should be fine of course, but it would be good if we could fix this
in the compiler :-)

Reviewed-by: Segher Boessenkool <[email protected]>


Segher

2020-10-15 14:05:35

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10



Le 15/10/2020 à 15:25, Segher Boessenkool a écrit :
> Hi!
>
> On Thu, Oct 15, 2020 at 10:52:20AM +0000, Christophe Leroy wrote:
>> With gcc9 I get:
>
> <snip>
>
>> With gcc10 I get:
>
> <snip>
>
>> gcc10 defines multiple versions of csum_partial() which are just
>> an unconditionnal branch to __csum_partial().
>
> It doesn't inline it, yes.
>
> Could you open a GCC PR for this please?
>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

Christophe

2020-12-21 11:06:02

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10

On Thu, 15 Oct 2020 10:52:20 +0000 (UTC), Christophe Leroy wrote:
> ppc-linux-objdump -d vmlinux | grep -e "<csum_partial>" -e "<__csum_partial>"
>
> With gcc9 I get:
>
> c0017ef8 <__csum_partial>:
> c00182fc: 4b ff fb fd bl c0017ef8 <__csum_partial>
> c0018478: 4b ff fa 80 b c0017ef8 <__csum_partial>
> c03e8458: 4b c2 fa a0 b c0017ef8 <__csum_partial>
> c03e8518: 4b c2 f9 e1 bl c0017ef8 <__csum_partial>
> c03ef410: 4b c2 8a e9 bl c0017ef8 <__csum_partial>
> c03f0b24: 4b c2 73 d5 bl c0017ef8 <__csum_partial>
> c04279a4: 4b bf 05 55 bl c0017ef8 <__csum_partial>
> c0429820: 4b be e6 d9 bl c0017ef8 <__csum_partial>
> c0429944: 4b be e5 b5 bl c0017ef8 <__csum_partial>
> c042b478: 4b be ca 81 bl c0017ef8 <__csum_partial>
> c042b554: 4b be c9 a5 bl c0017ef8 <__csum_partial>
> c045f15c: 4b bb 8d 9d bl c0017ef8 <__csum_partial>
> c0492190: 4b b8 5d 69 bl c0017ef8 <__csum_partial>
> c0492310: 4b b8 5b e9 bl c0017ef8 <__csum_partial>
> c0495594: 4b b8 29 65 bl c0017ef8 <__csum_partial>
> c049c420: 4b b7 ba d9 bl c0017ef8 <__csum_partial>
> c049c870: 4b b7 b6 89 bl c0017ef8 <__csum_partial>
> c049c930: 4b b7 b5 c9 bl c0017ef8 <__csum_partial>
> c04a9ca0: 4b b6 e2 59 bl c0017ef8 <__csum_partial>
> c04bdde4: 4b b5 a1 15 bl c0017ef8 <__csum_partial>
> c04be480: 4b b5 9a 79 bl c0017ef8 <__csum_partial>
> c04be710: 4b b5 97 e9 bl c0017ef8 <__csum_partial>
> c04c969c: 4b b4 e8 5d bl c0017ef8 <__csum_partial>
> c04ca2fc: 4b b4 db fd bl c0017ef8 <__csum_partial>
> c04cf5bc: 4b b4 89 3d bl c0017ef8 <__csum_partial>
> c04d0440: 4b b4 7a b9 bl c0017ef8 <__csum_partial>
>
> [...]

Applied to powerpc/next.

[1/1] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
https://git.kernel.org/powerpc/c/328e7e487a464aad024fbde6663b7859df082b7b

cheers