2006-12-10 21:23:54

by folkert

[permalink] [raw]
Subject: optimalisation for strlcpy (lib/string.c)

Hi,

Like the other patch (by that other person), I think it is faster to not
do a strlen first.
E.g. replace this:
ize_t strlcpy(char *dest, const char *src, size_t size)
{
size_t ret = strlen(src);

if (size) {
size_t len = (ret >= size) ? size - 1 : ret;
memcpy(dest, src, len);
dest[len] = '\0';
}
return ret;
}
by this:
size_t strlcpy(char *dest, const char *src, size_t size)
{
char *tmp = dest;

for(;;)
{
*dest = *src;
if (!*src)
break;

if (--size == 0)
break;

dest++;
src++;
}

*dest = 0x00;

return dest - tmp;
}
patch:
diff -uNrBbd lib/string.c string-new.c
--- lib/string.c 2006-11-04 02:33:58.000000000 +0100
+++ string-new.c 2006-12-10 22:22:08.000000000 +0100
@@ -121,14 +121,24 @@
*/
size_t strlcpy(char *dest, const char *src, size_t size)
{
- size_t ret = strlen(src);
+ char *tmp = dest;

- if (size) {
- size_t len = (ret >= size) ? size - 1 : ret;
- memcpy(dest, src, len);
- dest[len] = '\0';
+ for(;;)
+ {
+ *dest = *src;
+ if (!*src)
+ break;
+
+ if (--size == 0)
+ break;
+
+ dest++;
+ src++;
}
- return ret;
+
+ *dest = 0x00;
+
+ return dest - tmp;
}
EXPORT_SYMBOL(strlcpy);
#endif


I've tested the speed difference with this:
http://www.vanheusden.com/misc/kernel-strlcpy-opt-test.c
and the speed difference is quite a bit on a P4: 28% faster.


Signed-off by: Folkert van Heusden <[email protected]>


Folkert van Heusden

--
http://www.vanheusden.com/multitail - multitail is tail on steroids. multiple
windows, filtering, coloring, anything you can think of
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com


2006-12-10 21:36:39

by Mitchell Blank Jr

[permalink] [raw]
Subject: Re: optimalisation for strlcpy (lib/string.c)

Folkert van Heusden wrote:
> Like the other patch (by that other person), I think it is faster to not
> do a strlen first.

Debatable. You've replaced a call to the highly-optimized memcpy function
with a byte-by-byte copy. It's probably a wash performance wise (have you
benchmarked?) and is certainly less clear.

-Mitch

2006-12-10 21:41:15

by folkert

[permalink] [raw]
Subject: Re: optimalisation for strlcpy (lib/string.c)

> > Like the other patch (by that other person), I think it is faster to not
> > do a strlen first.
> Debatable. You've replaced a call to the highly-optimized memcpy function
> with a byte-by-byte copy. It's probably a wash performance wise (have you
> benchmarked?) and is certainly less clear.

I tested it outside the kernel.
The test source can be found here:
http://www.vanheusden.com/misc/kernel-strlcpy-opt-test.c


Folkert van Heusden

--
Feeling generous? -> http://www.vanheusden.com/wishlist.php
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com

2006-12-10 21:49:50

by Eyal Lebedinsky

[permalink] [raw]
Subject: Re: optimalisation for strlcpy (lib/string.c)

Folkert van Heusden wrote:
> Hi,
>
> Like the other patch (by that other person), I think it is faster to not
> do a strlen first.
> E.g. replace this:
> ize_t strlcpy(char *dest, const char *src, size_t size)
> {
> size_t ret = strlen(src);
>
> if (size) {
> size_t len = (ret >= size) ? size - 1 : ret;
> memcpy(dest, src, len);
> dest[len] = '\0';
> }
> return ret;
> }
> by this:
> size_t strlcpy(char *dest, const char *src, size_t size)
> {
> char *tmp = dest;
>
> for(;;)
> {
> *dest = *src;
> if (!*src)
> break;
>
> if (--size == 0)
> break;
>
> dest++;
> src++;
> }
>
> *dest = 0x00;
>
> return dest - tmp;
> }
> patch:
> diff -uNrBbd lib/string.c string-new.c
> --- lib/string.c 2006-11-04 02:33:58.000000000 +0100
> +++ string-new.c 2006-12-10 22:22:08.000000000 +0100
> @@ -121,14 +121,24 @@
> */
> size_t strlcpy(char *dest, const char *src, size_t size)
> {
> - size_t ret = strlen(src);
> + char *tmp = dest;
>
> - if (size) {
> - size_t len = (ret >= size) ? size - 1 : ret;
> - memcpy(dest, src, len);
> - dest[len] = '\0';
> + for(;;)
> + {
> + *dest = *src;
> + if (!*src)
> + break;
> +
> + if (--size == 0)
> + break;
> +
> + dest++;
> + src++;
> }
> - return ret;
> +
> + *dest = 0x00;
> +
> + return dest - tmp;
> }
> EXPORT_SYMBOL(strlcpy);
> #endif
>
>
> I've tested the speed difference with this:
> http://www.vanheusden.com/misc/kernel-strlcpy-opt-test.c
> and the speed difference is quite a bit on a P4: 28% faster.
>
>
> Signed-off by: Folkert van Heusden <[email protected]>
>
>
> Folkert van Heusden

The two do not do exactly the same. The first one handles 'size == 0'
(should not happen?) safely while the other does not.

--
Eyal Lebedinsky ([email protected]) <http://samba.org/eyal/>
attach .zip as .dat

2006-12-10 22:04:23

by folkert

[permalink] [raw]
Subject: Re: optimalisation for strlcpy (lib/string.c)

> > by this:
> > size_t strlcpy(char *dest, const char *src, size_t size)
> > {
> > char *tmp = dest;
> >
> > for(;;)
> > {
> > *dest = *src;
> > if (!*src)
> > break;
> >
> > if (--size == 0)
> > break;
> >
> > dest++;
> > src++;
> > }
> >
> > *dest = 0x00;
> >
> > return dest - tmp;
> > }
> >
> > I've tested the speed difference with this:
> > http://www.vanheusden.com/misc/kernel-strlcpy-opt-test.c
> > and the speed difference is quite a bit on a P4: 28% faster.
>
> The two do not do exactly the same. The first one handles 'size == 0'
> (should not happen?) safely while the other does not.

Ok, small change:
diff -uNrBbd lib/string.c string-new.c
--- lib/string.c 2006-11-04 02:33:58.000000000 +0100
+++ string-new.c 2006-12-10 23:02:26.000000000 +0100
@@ -121,14 +122,27 @@
*/
size_t strlcpy(char *dest, const char *src, size_t size)
{
- size_t ret = strlen(src);
+ char *tmp = dest;

- if (size) {
- size_t len = (ret >= size) ? size - 1 : ret;
- memcpy(dest, src, len);
- dest[len] = '\0';
+ if (likeley(size > 0))
+ {
+ for(;;)
+ {
+ *dest = *src;
+ if (unlikely(!*src))
+ break;
+
+ if (unlikely(--size == 0))
+ break;
+
+ dest++;
+ src++;
}
- return ret;
+ }
+
+ *dest = 0x00;
+
+ return dest - tmp;
}
EXPORT_SYMBOL(strlcpy);
#endif


Folkert van Heusden

--
Ever wonder what is out there? Any alien races? Then please support
the seti@home project: setiathome.ssl.berkeley.edu
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com

2006-12-10 23:12:56

by Adrian Bunk

[permalink] [raw]
Subject: Re: optimalisation for strlcpy (lib/string.c)

On Sun, Dec 10, 2006 at 10:23:51PM +0100, Folkert van Heusden wrote:
> Hi,
>
> Like the other patch (by that other person), I think it is faster to not
> do a strlen first.
>...
> --- lib/string.c 2006-11-04 02:33:58.000000000 +0100
> +++ string-new.c 2006-12-10 22:22:08.000000000 +0100
> @@ -121,14 +121,24 @@
> */
> size_t strlcpy(char *dest, const char *src, size_t size)
> {
> - size_t ret = strlen(src);
> + char *tmp = dest;
>
> - if (size) {
> - size_t len = (ret >= size) ? size - 1 : ret;
> - memcpy(dest, src, len);
> - dest[len] = '\0';
> + for(;;)
> + {
> + *dest = *src;
> + if (!*src)
> + break;
> +
> + if (--size == 0)
> + break;
> +
> + dest++;
> + src++;
> }
> - return ret;
> +
> + *dest = 0x00;
> +
> + return dest - tmp;
>...

Two bugs in your code:
- you copy a maximum of size bytes _plus_ \0
- size == 0 is no longer handled correctly

> I've tested the speed difference with this:
> http://www.vanheusden.com/misc/kernel-strlcpy-opt-test.c
> and the speed difference is quite a bit on a P4: 28% faster.
>...

My Athlon says:
org: 2.400000
new: 6.710000

IOW, your version is much slower.

But the main question is actually:
Does the performance of this function matter anywhere inside the kernel?
Is strlcpy() used in any fast path?
If not, there's no point in trying to optimize it.

> Folkert van Heusden

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-12-11 05:26:52

by Clemens Koller

[permalink] [raw]
Subject: Re: optimalisation for strlcpy (lib/string.c)

Hi, Adrian & Friends!

>> Like the other patch (by that other person), I think it is faster to not
>> do a strlen first.

There are PowerPC architectures, where a strlen is a matter of a single
instruction of the CPU.

Quote: "PowerPC 405 and 440 instruction sets offer the powerful
Determine Left-Most Zero Byte (DLMZB) instruction."

That's propably hard to beat.

So, if we really want to speed up these things, we should write
code which helps the compiler get that optimized instructions for
us - by improving the code of strlen() i.e.

Reference:
http://www-03.ibm.com/chips/power/powerpc/newsletter/sep2003/ppc_process_at_work2.html

Greets,

Clemens Koller
_______________________________
R&D Imaging Devices
Anagramm GmbH
Rupert-Mayer-Str. 45/1
81379 Muenchen
Germany

http://www.anagramm-technology.com
Phone: +49-89-741518-50
Fax: +49-89-741518-19