2018-05-24 11:23:25

by Christophe Leroy

[permalink] [raw]
Subject: [PATCH] powerpc/32: Optimise __csum_partial()

Improve __csum_partial by interleaving loads and adds.

On a 8xx, it brings neither improvement nor degradation.
On a 83xx, it brings a 25% improvement.

Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/lib/checksum_32.S | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index d2238ea82209..aa224069f93a 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -47,16 +47,25 @@ _GLOBAL(__csum_partial)
bdnz 2b
21: srwi. r6,r4,4 /* # blocks of 4 words to do */
beq 3f
+ lwz r0,4(r3)
mtctr r6
-22: lwz r0,4(r3)
lwz r6,8(r3)
+ adde r5,r5,r0
lwz r7,12(r3)
+ adde r5,r5,r6
lwzu r8,16(r3)
+ adde r5,r5,r7
+ bdz 23f
+22: lwz r0,4(r3)
+ adde r5,r5,r8
+ lwz r6,8(r3)
adde r5,r5,r0
+ lwz r7,12(r3)
adde r5,r5,r6
+ lwzu r8,16(r3)
adde r5,r5,r7
- adde r5,r5,r8
bdnz 22b
+23: adde r5,r5,r8
3: andi. r0,r4,2
beq+ 4f
lhz r0,4(r3)
--
2.13.3



2018-05-25 02:40:58

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH] powerpc/32: Optimise __csum_partial()

On Thu, May 24, 2018 at 11:22:27AM +0000, Christophe Leroy wrote:
> Improve __csum_partial by interleaving loads and adds.
>
> On a 8xx, it brings neither improvement nor degradation.
> On a 83xx, it brings a 25% improvement.

Thanks! Looks fine to me.

> Signed-off-by: Christophe Leroy <[email protected]>

Reviewed-by: Segher Boessenkool <[email protected]>

> ---
> arch/powerpc/lib/checksum_32.S | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
> index d2238ea82209..aa224069f93a 100644
> --- a/arch/powerpc/lib/checksum_32.S
> +++ b/arch/powerpc/lib/checksum_32.S
> @@ -47,16 +47,25 @@ _GLOBAL(__csum_partial)
> bdnz 2b
> 21: srwi. r6,r4,4 /* # blocks of 4 words to do */
> beq 3f
> + lwz r0,4(r3)
> mtctr r6
> -22: lwz r0,4(r3)
> lwz r6,8(r3)
> + adde r5,r5,r0
> lwz r7,12(r3)
> + adde r5,r5,r6
> lwzu r8,16(r3)
> + adde r5,r5,r7
> + bdz 23f
> +22: lwz r0,4(r3)
> + adde r5,r5,r8
> + lwz r6,8(r3)
> adde r5,r5,r0
> + lwz r7,12(r3)
> adde r5,r5,r6
> + lwzu r8,16(r3)
> adde r5,r5,r7
> - adde r5,r5,r8
> bdnz 22b
> +23: adde r5,r5,r8
> 3: andi. r0,r4,2
> beq+ 4f
> lhz r0,4(r3)
> --
> 2.13.3

2018-06-04 14:15:53

by Michael Ellerman

[permalink] [raw]
Subject: Re: powerpc/32: Optimise __csum_partial()

On Thu, 2018-05-24 at 11:22:27 UTC, Christophe Leroy wrote:
> Improve __csum_partial by interleaving loads and adds.
>
> On a 8xx, it brings neither improvement nor degradation.
> On a 83xx, it brings a 25% improvement.
>
> Signed-off-by: Christophe Leroy <[email protected]>
> Reviewed-by: Segher Boessenkool <[email protected]>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/373e098e1e788d7b89ec0f31765a6c

cheers