2000-12-20 14:40:00

by Steve Grubb

[permalink] [raw]
Subject: [Patch] performance enhancement for simple_strtoul

Hello,

The following patch is a faster implementation of the simple_strtoul
function. This function differs from the original in that it reduces the
multiplies to shifts and logical operations wherever possible. My testing
shows that it adds around 100 bytes, but is about 6% faster on a K6-2. (It
is 40% faster that glibc's strtoul, but that's a different story.) My guess
is that the performance gain will be higher on platforms with slower
multiplication instructions. I have tested it for numerical accuracy so I
think this is safe to apply. If anyone is interested, I can also supply a
test application that demonstrates the performance gain. This patch was
generated against 2.2.16, but should apply to 2.2.19 cleanly. In
2.4.0-test9, simple_strtoul starts on line 19 rather than 17, hopefully
that's not a problem.

Cheers,
Steve Grubb

-------------------------

--- lib/vsprintf.orig Fri Dec 1 08:58:02 2000
+++ lib/vsprintf.c Wed Dec 20 08:42:26 2000
@@ -14,10 +14,13 @@
#include <linux/string.h>
#include <linux/ctype.h>

+/*
+* This function converts base 8, 10, or 16 only - Steve Grubb
+*/
unsigned long simple_strtoul(const char *cp,char **endp,unsigned int base)
{
- unsigned long result = 0,value;
-
+ unsigned char c;
+ unsigned long result = 0;
if (!base) {
base = 10;
if (*cp == '0') {
@@ -29,11 +32,36 @@
}
}
}
- while (isxdigit(*cp) && (value = isdigit(*cp) ? *cp-'0' : (islower(*cp)
- ? toupper(*cp) : *cp)-'A'+10) < base) {
- result = result*base + value;
- cp++;
- }
+ c = *cp;
+ switch (base) {
+ case 10:
+ while (isdigit(c)) {
+ result = (result*10) + (c & 0x0f);
+ c = *(++cp);
+ }
+ break;
+ case 16:
+ while (isxdigit(c)) {
+ result = result<<4;
+ if (c&0x40)
+ result += (c & 0x07) + 9;
+ else
+ result += c & 0x0f;
+ c = *(++cp);
+ }
+ break;
+ case 8:
+ while (isdigit(c)) {
+ if ((c&0x37) == c)
+ result = (result<<3) + (c & 0x07);
+ else
+ break;
+ c = *(++cp);
+ }
+ break;
+ default: /* Is anything else used by the kernel? */
+ break;
+ }
if (endp)
*endp = (char *)cp;
return result;




2000-12-20 16:35:42

by Jeff Epler

[permalink] [raw]
Subject: Re: [Patch] performance enhancement for simple_strtoul

On Wed, Dec 20, 2000 at 09:09:03AM -0500, Steve Grubb wrote:
> Hello,
>
> The following patch is a faster implementation of the simple_strtoul
> function.
[snip]

Why not preserve the existing code for bases other than 8, 10, and 16?
Admittedly, the only other case that is likely to be used would be base
2, but surely there's only a penalty of a few dozen bytes for the
following code..
> - while (isxdigit(*cp) && (value = isdigit(*cp) ? *cp-'0' : (islower(*cp)
> - ? toupper(*cp) : *cp)-'A'+10) < base) {
> - result = result*base + value;
> - cp++;
> - }

Jeff

2000-12-20 17:06:33

by Steve Grubb

[permalink] [raw]
Subject: Re: [Patch] performance enhancement for simple_strtoul

Hello,

I thought about that. This would be my recommendation for glibc where the
general public may be doing scientific applications. But this is the kernel
and there are people that would reject my patch purely on the basis that it
adds precious bytes to the kernel. But since the kernel is "controllable" &
printf() and its variants only support 8, 10, & 16, perhaps a better
solution might be to trap the odd case and write something for it if its
that important, or simply don't allow it.

The base guessing part at the beginning of the function only supports base
8, 10, & 16. Therefore, the only way to require another base is to specify
it in the function call (param - unsigned int base). A quick scan of the
current linux source shows no one using something odd. So...

If the maintainers of vsprintf.c want support for all number bases, that's
fine with me. Just say the word & I'll gen up another patch...but it will be
more bytes.

Cheers,
Steve Grubb


2000-12-20 19:13:44

by Steve Grubb

[permalink] [raw]
Subject: Re: [Patch] performance enhancement for simple_strtoul

Hello,

I continued experimenting with the Test Case and found a further speed
improvement & I am re-submiting the patch. It is the same as the first one
with the two local variables changed to register storage types.

On a K6-2, I now see:
Base 10 - 28% speedup
Base 16 - 24% speedup
Base 8 - 30% speedup

On a P3 system, I now see:
Base 10 - 25% speedup
Base 16 - 17% speedup
Base 8 - 20% speedup

It seems gcc creates much better code with the variables set to register
types. Please apply the following patch. It should apply to any recent 2.2.x
without problems. In 2.4 the function starts 2 lines later.

Cheers,
Steve Grubb

----------------------------------

--- lib/vsprintf.orig Fri Dec 1 08:58:02 2000
+++ lib/vsprintf.c Wed Dec 20 13:14:13 2000
@@ -14,10 +14,13 @@
#include <linux/string.h>
#include <linux/ctype.h>

+/*
+* This function converts base 8, 10, or 16 only - Steve Grubb
+*/
unsigned long simple_strtoul(const char *cp,char **endp,unsigned int base)
{
- unsigned long result = 0,value;
-
+ register unsigned char c;
+ register unsigned long result = 0;
if (!base) {
base = 10;
if (*cp == '0') {
@@ -29,11 +32,36 @@
}
}
}
- while (isxdigit(*cp) && (value = isdigit(*cp) ? *cp-'0' : (islower(*cp)
- ? toupper(*cp) : *cp)-'A'+10) < base) {
- result = result*base + value;
- cp++;
- }
+ c = *cp;
+ switch (base) {
+ case 10:
+ while (isdigit(c)) {
+ result = (result*10) + (c & 0x0f);
+ c = *(++cp);
+ }
+ break;
+ case 16:
+ while (isxdigit(c)) {
+ result = result<<4;
+ if (c&0x40)
+ result += (c & 0x07) + 9;
+ else
+ result += c & 0x0f;
+ c = *(++cp);
+ }
+ break;
+ case 8:
+ while (isdigit(c)) {
+ if ((c&0x37) == c)
+ result = (result<<3) + (c & 0x07);
+ else
+ break;
+ c = *(++cp);
+ }
+ break;
+ default: /* Is anything else used by the kernel? */
+ break;
+ }
if (endp)
*endp = (char *)cp;
return result;



2000-12-21 00:40:28

by Jamie Lokier

[permalink] [raw]
Subject: Re: [Patch] performance enhancement for simple_strtoul

Steve Grubb wrote:
> It seems gcc creates much better code with the variables set to register
> types.

Curious. GCC should be generating the same code regardless; ah well.

Is strtoul actually used in the kernel other than for the occasional
(rare) write to /proc/sys and parsing boot options?

> But this is the kernel and there are people that would reject my patch
> purely on the basis that it adds precious bytes to the kernel.

Perhaps I am mistaken but I'd expect it to be called what, ten times at
boot time, and a couple of times when X loads the MTRRs?

Sounds like the neatest trick would be reducing bytes used here...

-- Jamie

2000-12-21 09:55:45

by Matthias Andree

[permalink] [raw]
Subject: Re: [Patch] performance enhancement for simple_strtoul

On Wed, 20 Dec 2000, Steve Grubb wrote:

> + while (isdigit(c)) {
> + result = (result*10) + (c & 0x0f);
> + c = *(++cp);
> + }

x * 10 can be written as:

(x << 2 + x) << 1 = (4x+x) * 2
(x << 3) + (x << 1) = 8x + 2x

Not sure if that/which one is faster, you may want to benchmark.
However, on machines that I have seen, multiplication times are either
constant or depend on the count of set bits in the second divisor, so
it's something like 6 + 2s.

However, I have only m68k data books here, and it will gain nothing on
an 'C68060 since those beasts ram down multiplications in 2 cycles, so
we'd gain nothing on those chips (OK, the shifts take 1 cycles each and
are scheduled in parallel, and the add takes an additional cycle after
the shifts have completed). Not sure about the ix86, alpha or sparc
series.

--
Matthias Andree

2000-12-21 10:59:33

by Bernd Schmidt

[permalink] [raw]
Subject: Re: [Patch] performance enhancement for simple_strtoul

On Thu, 21 Dec 2000, Matthias Andree wrote:
>
> x * 10 can be written as:
>
> (x << 2 + x) << 1 = (4x+x) * 2
> (x << 3) + (x << 1) = 8x + 2x

Or as "x * 10". Which has the advantage of actually being readable, and
letting the compiler optimize it into one of the other forms if that's
profitable on the machine you are compiling for.

Why do you guys bother making strtoul run fast anyway? Surely it's not on
any critical path in the kernel?


Bernd

2000-12-21 16:36:06

by Alan

[permalink] [raw]
Subject: Re: [Patch] performance enhancement for simple_strtoul

> On Wed, 20 Dec 2000, Steve Grubb wrote:
>
> > + while (isdigit(c)) {
> > + result = (result*10) + (c & 0x0f);
> > + c = *(++cp);
> > + }
>
> x * 10 can be written as:
>
> (x << 2 + x) << 1 = (4x+x) * 2
> (x << 3) + (x << 1) = 8x + 2x

Since when has printk been performance critical. It isnt worth microoptimising
(or in your case for some cpus micropessimising) that stuff. Besides, gcc should
work it out if its worth doing

2000-12-22 10:29:30

by Pavel Machek

[permalink] [raw]
Subject: Re: [Patch] performance enhancement for simple_strtoul

Hi!

> The following patch is a faster implementation of the simple_strtoul
> function. This function differs from the original in that it reduces the
> multiplies to shifts and logical operations wherever possible. My testing
> shows that it adds around 100 bytes, but is about 6% faster on a K6-2. (It
> is 40% faster that glibc's strtoul, but that's a different story.) My guess
> is that the performance gain will be higher on platforms with slower
> multiplication instructions. I have tested it for numerical accuracy so I
> think this is safe to apply. If anyone is interested, I can also supply a
> test application that demonstrates the performance gain. This patch was
> generated against 2.2.16, but should apply to 2.2.19 cleanly. In
> 2.4.0-test9, simple_strtoul starts on line 19 rather than 17, hopefully
> that's not a problem.

Simple question: who cares about performance of simple_strtoul?

Original is shorter, and simple_strtoul is not performace
critical. Keep it as it is.

--
I'm [email protected]. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [email protected]

2000-12-22 10:32:00

by Pavel Machek

[permalink] [raw]
Subject: Re: [Patch] performance enhancement for simple_strtoul

Hi!

> > It seems gcc creates much better code with the variables set to register
> > types.
>
> Curious. GCC should be generating the same code regardless; ah well.
>
> Is strtoul actually used in the kernel other than for the occasional
> (rare) write to /proc/sys and parsing boot options?
>
> > But this is the kernel and there are people that would reject my patch
> > purely on the basis that it adds precious bytes to the kernel.
>
> Perhaps I am mistaken but I'd expect it to be called what, ten times at
> boot time, and a couple of times when X loads the MTRRs?

On second thought, ps -auxl maybe stresses simple_strtoul a little
bit. Not sure.

> Sounds like the neatest trick would be reducing bytes used here...

Pavel
--
I'm [email protected]. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [email protected]

2000-12-28 05:45:24

by Jamie Lokier

[permalink] [raw]
Subject: Re: [Patch] performance enhancement for simple_strtoul

Pavel Machek wrote:
> > [about strtoul]
> > Perhaps I am mistaken but I'd expect it to be called what, ten times at
> > boot time, and a couple of times when X loads the MTRRs?
>
> On second thought, ps -auxl maybe stresses simple_strtoul a little
> bit. Not sure.

Nah. proc_pid_lookup does its own conversion from string to number, and
the rest are conversions from numbers to strings in sprintf.

-- Jamie