2022-02-25 14:56:21

by Michal Simek

[permalink] [raw]
Subject: [PATCH v2 2/3] microblaze: Do loop unrolling for optimized memset implementation

Align implementation with memcpy and memmove where also remaining bytes are
copied via final switch case instead of using simple implementations which
loop. But this alignment has much stronger reason and definitely aligning
implementation is not the key point here. It is just good to have in mind
that the same technique is used already there.

In GCC 10, now -ftree-loop-distribute-patterns optimization is on at O2.
This optimization causes GCC to convert the while loop in memset.c into a
call to memset.
So this optimization is transforming a loop in a memset/memcpy into a call
to the function itself. This makes the memset implementation as recursive.
"-freestanding" option will disable the built-in library function but it
has been added in generic library implementation.

In default microblaze kernel defconfig we have CONFIG_OPT_LIB_FUNCTION
enabled so it will always pick optimized version of memset which is target
specific so we are replacing the while() loop with switch case to avoid
recursive memset call.

Issue with freestanding was already discussed in connection to commit
33d0f96ffd73 ("lib/string.c: Use freestanding environment") and also this
is topic in glibc and gcc.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888
http://patchwork.ozlabs.org/project/glibc/patch/[email protected]/

Signed-off-by: Michal Simek <[email protected]>
Signed-off-by: Mahesh Bodapati <[email protected]>
---

Changes in v2:
- missing patch in v1

arch/microblaze/lib/memset.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/microblaze/lib/memset.c b/arch/microblaze/lib/memset.c
index 615a2f8f53cb..7c2352d56bb0 100644
--- a/arch/microblaze/lib/memset.c
+++ b/arch/microblaze/lib/memset.c
@@ -74,8 +74,19 @@ void *memset(void *v_src, int c, __kernel_size_t n)
}

/* Simple, byte oriented memset or the rest of count. */
- while (n--)
+ switch (n) {
+ case 3:
*src++ = c;
+ fallthrough;
+ case 2:
+ *src++ = c;
+ fallthrough;
+ case 1:
+ *src++ = c;
+ break;
+ default:
+ break;
+ }

return v_src;
}
--
2.35.1


2022-02-26 01:54:51

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v2 2/3] microblaze: Do loop unrolling for optimized memset implementation

From: Michal Simek
> Sent: 25 February 2022 13:56
>
> Align implementation with memcpy and memmove where also remaining bytes are
> copied via final switch case instead of using simple implementations which
> loop. But this alignment has much stronger reason and definitely aligning
> implementation is not the key point here. It is just good to have in mind
> that the same technique is used already there.
>
> In GCC 10, now -ftree-loop-distribute-patterns optimization is on at O2.
> This optimization causes GCC to convert the while loop in memset.c into a
> call to memset.

Gah...
That is nearly as brain dead as another compiler that would convert
any byte copy loop (on x86) into 'rep movsb'.

If I want to call memcpy() I'll call memcpy.
If I'm copying a few bytes I might write the loop to avoid
the cost of the call and all the conditional tests for
buffer length and alignment.

Don't the compiler writers have better things to do?

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2022-02-28 08:11:54

by Michal Simek

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] microblaze: Do loop unrolling for optimized memset implementation



On 2/25/22 22:50, David Laight wrote:
> From: Michal Simek
>> Sent: 25 February 2022 13:56
>>
>> Align implementation with memcpy and memmove where also remaining bytes are
>> copied via final switch case instead of using simple implementations which
>> loop. But this alignment has much stronger reason and definitely aligning
>> implementation is not the key point here. It is just good to have in mind
>> that the same technique is used already there.
>>
>> In GCC 10, now -ftree-loop-distribute-patterns optimization is on at O2.
>> This optimization causes GCC to convert the while loop in memset.c into a
>> call to memset.
>
> Gah...
> That is nearly as brain dead as another compiler that would convert
> any byte copy loop (on x86) into 'rep movsb'.
>
> If I want to call memcpy() I'll call memcpy.
> If I'm copying a few bytes I might write the loop to avoid
> the cost of the call and all the conditional tests for
> buffer length and alignment.
>
> Don't the compiler writers have better things to do?

Not sure what you want me to say about it. It is current gcc behavior and I
can't see the way back. I don't think doing loop unrolling here is a big deal
for me because the same technique is used for years in memcpy and memmove.

Thanks,
Michal