2022-07-10 14:45:58

by Yu-Jen Chang

[permalink] [raw]
Subject: [PATCH 0/2] Optimize memchr()

*** BLURB HERE ***
This patche series optimized "memchr()" and add a macro for
"memchr_inv()" so that both funtions can use it to generate bit mask.

The original implementaion of "memchr()" is based on byte-wise comparison,
which do not fully use 64-bit or 32-bit register in CPU. We implement a
word-wise comparison so that at least 4 bytes can be compared at the same
time. The optimized "memchr()" is nearly 4x faster than the original one
for long strings. In Linux Kernel, we find that the length of the string
searched by "memchr()" is up to 512 bytes in drivers/misc/lkdtm/heap.c.
In our test, the optimized version is about 20% faster if the target
character is at the end of the string when going through a 512-byte
string.

We recompile the 5.18 kernel with optimized "memchr()" in 32-bit and
64-bit. They run correctly.

Yu-Jen Chang (2):
lib/string.c: Add a macro for memchr_inv()
lib/string.c: Optimize memchr()

lib/string.c | 62 ++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 45 insertions(+), 17 deletions(-)

--
2.25.1


2022-08-12 20:09:52

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH 0/2] Optimize memchr()

Hi!

> This patche series optimized "memchr()" and add a macro for
> "memchr_inv()" so that both funtions can use it to generate bit mask.
>
> The original implementaion of "memchr()" is based on byte-wise comparison,
> which do not fully use 64-bit or 32-bit register in CPU. We implement a
> word-wise comparison so that at least 4 bytes can be compared at the same
> time. The optimized "memchr()" is nearly 4x faster than the original one
> for long strings. In Linux Kernel, we find that the length of the string

Well... how much slower is it for short strings?

> searched by "memchr()" is up to 512 bytes in drivers/misc/lkdtm/heap.c.
> In our test, the optimized version is about 20% faster if the target
> character is at the end of the string when going through a 512-byte
> string.

"What is the average length passed to memchr" would be more useful question.

Best regards,
Pavel

2022-08-15 11:03:38

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 0/2] Optimize memchr()

From: Pavel Machek
> Sent: 12 August 2022 20:07
>
> Hi!
>
> > This patche series optimized "memchr()" and add a macro for
> > "memchr_inv()" so that both funtions can use it to generate bit mask.
> >
> > The original implementaion of "memchr()" is based on byte-wise comparison,
> > which do not fully use 64-bit or 32-bit register in CPU. We implement a
> > word-wise comparison so that at least 4 bytes can be compared at the same
> > time. The optimized "memchr()" is nearly 4x faster than the original one
> > for long strings. In Linux Kernel, we find that the length of the string
>
> Well... how much slower is it for short strings?

And cold cache??

David

> > searched by "memchr()" is up to 512 bytes in drivers/misc/lkdtm/heap.c.
> > In our test, the optimized version is about 20% faster if the target
> > character is at the end of the string when going through a 512-byte
> > string.
>
> "What is the average length passed to memchr" would be more useful question.
>
> Best regards,
> Pavel

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)