2007-08-22 09:34:59

by lode leroy

[permalink] [raw]
Subject: [PATCH] memchr (trivial) optimization

While profiling something completely unrelated, I noticed
that on the workloads I used memchr for, I saw a 30%-40% improvement
in performance, with the following trivial changes...
(basically, it saves 3 operations for each call)

$ diff -Nurp linux/lib/string.c{-old,}
--- linux/lib/string.c-old 2007-08-22 11:18:54.000000000 +0200
+++ linux/lib/string.c 2007-08-22 11:19:20.000000000 +0200
@@ -623,10 +623,12 @@ EXPORT_SYMBOL(strstr);
void *memchr(const void *s, int c, size_t n)
{
const unsigned char *p = s;
- while (n-- != 0) {
- if ((unsigned char)c == *p++) {
- return (void *)(p - 1);
+ while (n != 0) {
+ if ((unsigned char)c == *p) {
+ return (void *)p;
}
+ n--;
+ p++;
}
return NULL;
}

_________________________________________________________________
A lot of passions? Collect all your personal info on one central location ,
for free! http://get.live.com/live/features


2007-08-22 10:03:17

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] memchr (trivial) optimization

"lode leroy" <[email protected]> writes:

> While profiling something completely unrelated, I noticed
> that on the workloads I used memchr for, I saw a 30%-40% improvement
> in performance, with the following trivial changes...
> (basically, it saves 3 operations for each call)

What kind of workload? I didn't think anything in tree
was memchr intensive.

-Andi

2007-08-23 00:05:31

by Ingo Oeser

[permalink] [raw]
Subject: Re: [PATCH] memchr (trivial) optimization

On Wednesday 22 August 2007, lode leroy wrote:
> While profiling something completely unrelated, I noticed
> that on the workloads I used memchr for, I saw a 30%-40% improvement
> in performance, with the following trivial changes...
> (basically, it saves 3 operations for each call)

Yes, but then you could be a bit more explicit to the compiler
on what you are doing here:

void *memchr(const void *s, int c, size_t n)
{
const unsigned char *p = s;

for (; n != 0; n--, p++) {
if ((unsigned char)c == *p) {
return (void *)p;
}
return NULL;
}

Now the compiler should see the loop more clearly.


Best Regards

Ingo Oeser

2007-08-24 00:12:28

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH] memchr (trivial) optimization

On Thu, Aug 23, 2007 at 02:13:20AM +0200, Ingo Oeser wrote:
> On Wednesday 22 August 2007, lode leroy wrote:
> > While profiling something completely unrelated, I noticed
> > that on the workloads I used memchr for, I saw a 30%-40% improvement
> > in performance, with the following trivial changes...
> > (basically, it saves 3 operations for each call)
>
> Yes, but then you could be a bit more explicit to the compiler
> on what you are doing here:
>
> void *memchr(const void *s, int c, size_t n)
> {
> const unsigned char *p = s;
>
> for (; n != 0; n--, p++) {
> if ((unsigned char)c == *p) {
> return (void *)p;
> }
> return NULL;
> }
>
> Now the compiler should see the loop more clearly.

And you can do even better with this:

void *memchr(const void *s, int c, size_t n)
{
const unsigned char *p = s, *e = s + n;
const unsigned char *e = p + n;

for (; p < e ; p++)
if ((unsigned char)c == *p)
return (void *)p;

return NULL;
}

which changes the inner loop from:

50: 38 08 cmp %cl,(%eax)
52: 74 08 je 5c <memchr2+0x1a>
54: 4a dec %edx
55: 40 inc %eax
56: 85 d2 test %edx,%edx
58: 75 f6 jne 50 <memchr2+0xe>

to:

6e: 38 08 cmp %cl,(%eax)
70: 74 07 je 79 <memchr3+0x1b>
72: 40 inc %eax
73: 39 d0 cmp %edx,%eax
75: 72 f7 jb 6e <memchr3+0x10>


--
Mathematics is the supreme nostalgia of our time.

2007-08-24 01:03:39

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH] memchr (trivial) optimization

Matt Mackall wrote:
> 6e: 38 08 cmp %cl,(%eax)
> 70: 74 07 je 79 <memchr3+0x1b>
> 72: 40 inc %eax
>
It's a bit gross that the compiler is using inc here rather than lea or
add, but still...

Er, something's spending 30% of its time in memchr? This is not the
code to fix.

J

2007-08-24 02:18:36

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH] memchr (trivial) optimization

On Thu, Aug 23, 2007 at 06:03:29PM -0700, Jeremy Fitzhardinge wrote:
> Matt Mackall wrote:
> > 6e: 38 08 cmp %cl,(%eax)
> > 70: 74 07 je 79 <memchr3+0x1b>
> > 72: 40 inc %eax
> >
> It's a bit gross that the compiler is using inc here rather than lea or
> add, but still...
>
> Er, something's spending 30% of its time in memchr? This is not the
> code to fix.

Indeed. I'm just pointing out the general optimization of walking
one counter instead of two.

Hmm, perhaps the culprit is validate_nla?

--
Mathematics is the supreme nostalgia of our time.

2007-08-24 12:56:00

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [PATCH] memchr (trivial) optimization


On Aug 23 2007 19:13, Matt Mackall wrote:
>
>And you can do even better with this:
>
>void *memchr(const void *s, int c, size_t n)
>{
> const unsigned char *p = s, *e = s + n;
> const unsigned char *e = p + n;

Uhm, you have two "e"s in there.

> for (; p < e ; p++)
> if ((unsigned char)c == *p)
> return (void *)p;
>
> return NULL;
>}

Or do it glibc-style

void *memchr(const void *s, unsigned char c, size_t n)
{
...
for (; p + 3 < e; p += 4) {
if (c == p[0])
return (void *)&p[0];
if (c == p[1])
return (void *)&p[1];
if (c == p[2])
return (void *)&p[2];
if (c == p[3])
return (void *)&p[3];
}
... /* check the rest */
}


Jan
--

2007-08-24 15:56:24

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH] memchr (trivial) optimization

On Fri, Aug 24, 2007 at 02:54:48PM +0200, Jan Engelhardt wrote:
>
> On Aug 23 2007 19:13, Matt Mackall wrote:
> >
> >And you can do even better with this:
> >
> >void *memchr(const void *s, int c, size_t n)
> >{
> > const unsigned char *p = s, *e = s + n;
> > const unsigned char *e = p + n;
>
> Uhm, you have two "e"s in there.

Yep, that's what I get for editing in email.

> Or do it glibc-style
>
> void *memchr(const void *s, unsigned char c, size_t n)
> {
> ...
> for (; p + 3 < e; p += 4) {
> if (c == p[0])
> return (void *)&p[0];
> if (c == p[1])
> return (void *)&p[1];
> if (c == p[2])
> return (void *)&p[2];
> if (c == p[3])
> return (void *)&p[3];
> }
> ... /* check the rest */
> }

Yes, very funny.

--
Mathematics is the supreme nostalgia of our time.