2005-12-18 08:23:27

by Steven Rostedt

Subject: [PATCH] micro optimization of cache_estimate in slab.c

OK, I know this is really just a micro optimization, but I figured I'd
submit it anyway. Looking into the slab code, I came across the
cache_estimate function, which has the following code:

i = 0;
while (i*size + ALIGN(base+i*extra, align) <= wastage)
        i++;
if (i > 0)
        i--;

This counts up linearly, so it is O(n), where n is the number of
objects in the page order, and n can be as large as 202 (according to
my /proc/slabinfo). So I figured there must be a better way, and added
this:

i = 0;
do {
        x = 1;
        while ((x+i)*size + ALIGN(base+(x+i)*extra, align) <= wastage)
                x <<= 1;
        i += (x >> 1);
} while (x > 1);

That makes it O(log n): it basically does a binary search for the
upper bound. I tested this code in userspace, and it gives the same
results as the original code, but with fewer iterations.
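
Something along these lines is enough to compare the two loops in
userspace (ALIGN here is a local stand-in for the kernel macro, and
the base/extra/align numbers below are made-up placeholders, not the
real kernel values):

#include <stdio.h>
#include <stddef.h>

#define ALIGN(x, a)  ((((x) + (a) - 1) / (a)) * (a))

static size_t count_linear(size_t size, size_t base, size_t extra,
                           size_t align, size_t wastage)
{
        size_t i = 0;

        /* the current O(n) loop */
        while (i*size + ALIGN(base + i*extra, align) <= wastage)
                i++;
        if (i > 0)
                i--;
        return i;
}

static size_t count_gallop(size_t size, size_t base, size_t extra,
                           size_t align, size_t wastage)
{
        size_t i = 0, x;

        /* the proposed galloping loop */
        do {
                x = 1;
                while ((x + i)*size + ALIGN(base + (x + i)*extra, align) <= wastage)
                        x <<= 1;
                i += x >> 1;
        } while (x > 1);
        return i;
}

int main(void)
{
        size_t wastage = 4096;          /* PAGE_SIZE << 0 */
        size_t base = 24, extra = 4;    /* placeholder management sizes */
        size_t align = 8;
        size_t size;

        for (size = align; size <= wastage; size += align) {
                size_t a = count_linear(size, base, extra, align, wastage);
                size_t b = count_gallop(size, base, extra, align, wastage);

                if (a != b)
                        printf("mismatch at size %zu: %zu vs %zu\n",
                               size, a, b);
        }
        printf("done\n");
        return 0;
}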

I know this is really a micro optimization, but if you think every usec
counts, then we can use this patch ;)

-- Steve

Index: linux-2.6.15-rc5/mm/slab.c
===================================================================
--- linux-2.6.15-rc5.orig/mm/slab.c 2005-12-16 16:24:09.000000000 -0500
+++ linux-2.6.15-rc5/mm/slab.c 2005-12-18 03:16:18.000000000 -0500
@@ -700,6 +700,7 @@
int flags, size_t *left_over, unsigned int *num)
{
int i;
+ int x;
size_t wastage = PAGE_SIZE<<gfporder;
size_t extra = 0;
size_t base = 0;
@@ -709,10 +710,12 @@
extra = sizeof(kmem_bufctl_t);
}
i = 0;
- while (i*size + ALIGN(base+i*extra, align) <= wastage)
- i++;
- if (i > 0)
- i--;
+ do {
+ x = 1;
+ while ((x+i)*size + ALIGN(base+(x+i)*extra, align) <= wastage)
+ x <<= 1;
+ i += (x >> 1);
+ } while (x > 1);

if (i > SLAB_LIMIT)
i = SLAB_LIMIT;



2005-12-18 17:27:40

by Pekka Enberg

Subject: Re: [PATCH] micro optimization of cache_estimate in slab.c

Hi Steven,

On 12/18/05, Steven Rostedt <[email protected]> wrote:
> + do {
> + x = 1;
> + while ((x+i)*size + ALIGN(base+(x+i)*extra, align) <= wastage)
> + x <<= 1;
> + i += (x >> 1);
> + } while (x > 1);

The above is pretty hard to read. Perhaps we could give x and i better
names? Also, couldn't we move the left-hand side of the expression into
a separate static inline function for readability?
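
Something like this, perhaps (slab_space_needed is just an off-the-cuff
name, and it reuses the ALIGN() macro already used there):

static inline size_t slab_space_needed(int nr_objs, size_t size,
                                       size_t base, size_t extra,
                                       size_t align)
{
        /* space taken by nr_objs objects plus the management data */
        return nr_objs*size + ALIGN(base + nr_objs*extra, align);
}

so the loop condition would read something like

        while (slab_space_needed(x + i, size, base, extra, align) <= wastage)
                x <<= 1;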

Pekka

2005-12-18 18:38:48

by Steven Rostedt

Subject: Re: [PATCH] micro optimization of cache_estimate in slab.c

On Sun, 2005-12-18 at 19:27 +0200, Pekka Enberg wrote:
> Hi Steven,
>
> On 12/18/05, Steven Rostedt <[email protected]> wrote:
> > + do {
> > + x = 1;
> > + while ((x+i)*size + ALIGN(base+(x+i)*extra, align) <= wastage)
> > + x <<= 1;
> > + i += (x >> 1);
> > + } while (x > 1);
>
> The above is pretty hard to read. Perhaps we could give x and i better
> names? Also, couldn't we move left part of the expression into a
> separate static inline function for readability?

Actually, Luuk sent me this patch, which Balbir Singh made a while
ago.

extra = sizeof(kmem_bufctl_t);
}
- i = 0;
+ i = (wastage - base)/(size + extra);
while (i*size + L1_CACHE_ALIGN(base+i*extra) <= wastage)
i++;

- if (i > 0)
+ while (i*size + L1_CACHE_ALIGN(base+i*extra) > wastage)
i--;


This is actually O(1) with a constant K=2. Analyzing it further, I've
come up with the patch below. It removes the need for the second while
loop and adds a comment explaining why: the size is already calculated
to be no smaller than the alignment, so the division can overshoot the
needed count by at most 1, and the single decrement after the while
loop is all that is needed. So this patch is O(1) with K=1.
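
To spell out why that division can only be exact or one too high
(writing N for the exact object count and n for the result of the
division):

Since ALIGN(x, align) >= x, any i that passes the test

        i*size + ALIGN(base + i*extra, align) <= wastage

also satisfies i*(size + extra) + base <= wastage, so N <= n: the
division never undershoots. In the other direction, using
ALIGN(x, align) <= x + align - 1 and n*(size + extra) <= wastage - base,
we get for n >= 1:

        (n-1)*size + ALIGN(base + (n-1)*extra, align)
                <= (n-1)*(size + extra) + base + align - 1
                <= wastage - (size + extra) + align - 1
                <= wastage - 1            (since size >= align)

so n-1 always passes the test and n <= N + 1. Either way, the single
decrement after the while loop lands exactly on N.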

-- Steve

Index: linux-2.6.15-rc5/mm/slab.c
===================================================================
--- linux-2.6.15-rc5.orig/mm/slab.c 2005-12-16 16:24:09.000000000 -0500
+++ linux-2.6.15-rc5/mm/slab.c 2005-12-18 13:30:13.000000000 -0500
@@ -708,7 +708,14 @@
base = sizeof(struct slab);
extra = sizeof(kmem_bufctl_t);
}
- i = 0;
+ /*
+ * Divide the amount we have by the amount we need for
+ * each object. Since the size is already calculated
+ * to be no less than the alignment, this result will
+ * be at most 1 greater than the count we need, and the
+ * excess is removed by the decrement after the while loop.
+ */
+ i = (wastage - base)/(size + extra);
while (i*size + ALIGN(base+i*extra, align) <= wastage)
i++;
if (i > 0)


2005-12-18 19:29:35

by Luuk van der Duim

Subject: Re: [PATCH] micro optimization of cache_estimate in slab.c

> Index: linux-2.6.15-rc5/mm/slab.c
> ===================================================================
> --- linux-2.6.15-rc5.orig/mm/slab.c 2005-12-16 16:24:09.000000000 -0500
> +++ linux-2.6.15-rc5/mm/slab.c 2005-12-18 13:30:13.000000000 -0500
> @@ -708,7 +708,14 @@
> base = sizeof(struct slab);
> extra = sizeof(kmem_bufctl_t);
> }
> - i = 0;
> + /*
> + * Divide the amount we have by the amount we need for
> + * each object. Since the size is already calculated
> + * to be no less than the alignment, this result will
> + * be at most 1 greater than the count we need, and the
> + * excess is removed by the decrement after the while loop.
> + */
> + i = (wastage - base)/(size + extra);
> while (i*size + ALIGN(base+i*extra, align) <= wastage)
> i++;
> if (i > 0)


Yes, I recognised the patch because I passed Balbir's version over to
Micheal Cohen in early 2002. The patch was originally intended for 2.4.

The complexity of Steven's patch is gone, readability has improved,
Steven has taken advantage of the surrounding code, and the rationale
is explained in the comment.

Sounds fine to me.

Luuk van der Duim

Partners aan het Werk

2005-12-19 07:43:17

by Pekka Enberg

Subject: Re: [PATCH] micro optimization of cache_estimate in slab.c

On Sun, 18 Dec 2005, Steven Rostedt wrote:
> Index: linux-2.6.15-rc5/mm/slab.c
> ===================================================================
> --- linux-2.6.15-rc5.orig/mm/slab.c 2005-12-16 16:24:09.000000000 -0500
> +++ linux-2.6.15-rc5/mm/slab.c 2005-12-18 13:30:13.000000000 -0500
> @@ -708,7 +708,14 @@
> base = sizeof(struct slab);
> extra = sizeof(kmem_bufctl_t);
> }
> - i = 0;
> + /*
> + * Divide the amount we have by the amount we need for
> + * each object. Since the size is already calculated
> + * to be no less than the alignment, this result will
> + * be at most 1 greater than the count we need, and the
> + * excess is removed by the decrement after the while loop.
> + */
> + i = (wastage - base)/(size + extra);
> while (i*size + ALIGN(base+i*extra, align) <= wastage)
> i++;
> if (i > 0)

Thanks. Looks much better but I am still wondering if we could get rid of
the loop altogether. Hmm.

Pekka