2002-01-06 12:40:22

by Anton Blanchard

[permalink] [raw]
Subject: [PATCH] Remove 8 bytes from struct page on 64bit archs


Hi,

It seems shortening struct page is all the rage at the moment and I
didnt want to be left out. On some 64bit architectures (sparc64 and
ppc64 for example) all memory is allocated in the DMA zone. Therefore
there is no reason to waste 8 bytes per page when every page points to
the same zone!

Here is a very simple patch (ppc64 only so far). For archs that have
more than one memory zone, they should define the following:

#define ARCH_NR_ZONES 3
#define GET_PAGE_ZONE(page) (page)->zone
#define SET_PAGE_ZONE(page, __zone) (page)->zone = (__zone)

Next up will be modifying the VM so it doesnt iterate over all 3 zones
if ARCH_NR_ZONES = 1. I did this a while ago with the old VM and it made
a noticable difference but I need to retry it to see if this is still
the case.

Anton

diff -ru --exclude-from=exclude linuxppc_2_4_devel_work/include/asm-ppc64/page.h linuxppc_2_4_devel_work_onezone/include/asm-ppc64/page.h
--- linuxppc_2_4_devel_work/include/asm-ppc64/page.h Tue Dec 18 18:50:35 2001
+++ linuxppc_2_4_devel_work_onezone/include/asm-ppc64/page.h Sun Jan 6 22:18:45 2002
@@ -223,5 +223,9 @@

#define MAP_NR(addr) (__pa(addr) >> PAGE_SHIFT)

+#define ARCH_NR_ZONES 1
+#define GET_PAGE_ZONE(page) (contig_page_data.node_zones)
+#define SET_PAGE_ZONE(page, zone)
+
#endif /* __KERNEL__ */
#endif /* _PPC64_PAGE_H */
diff -ru --exclude-from=exclude linuxppc_2_4_devel_work/include/linux/mm.h linuxppc_2_4_devel_work_onezone/include/linux/mm.h
--- linuxppc_2_4_devel_work/include/linux/mm.h Sun Jan 6 15:50:56 2002
+++ linuxppc_2_4_devel_work_onezone/include/linux/mm.h Sun Jan 6 23:19:58 2002
@@ -164,7 +164,9 @@
struct buffer_head * buffers; /* Buffer maps us to a disk block. */
void *virtual; /* Kernel virtual address (NULL if
not kmapped, ie. highmem) */
+#if ARCH_NR_ZONES > 1
struct zone_struct *zone; /* Memory zone we are in. */
+#endif
} mem_map_t;

/*
diff -ru --exclude-from=exclude linuxppc_2_4_devel_work/mm/page_alloc.c linuxppc_2_4_devel_work_onezone/mm/page_alloc.c
--- linuxppc_2_4_devel_work/mm/page_alloc.c Wed Nov 21 13:43:40 2001
+++ linuxppc_2_4_devel_work_onezone/mm/page_alloc.c Sun Jan 6 22:18:45 2002
@@ -54,7 +54,11 @@
/*
* Temporary debugging check.
*/
+#if ARCH_NR_ZONES > 1
#define BAD_RANGE(zone,x) (((zone) != (x)->zone) || (((x)-mem_map) < (zone)->zone_start_mapnr) || (((x)-mem_map) >= (zone)->zone_start_mapnr+(zone)->size))
+#else
+#define BAD_RANGE(zone,x) 0
+#endif

/*
* Buddy system. Hairy. You really aren't expected to understand this
@@ -90,7 +94,7 @@
goto local_freelist;
back_local_freelist:

- zone = page->zone;
+ zone = GET_PAGE_ZONE(page);

mask = (~0UL) << order;
base = zone->zone_mem_map;
@@ -255,7 +259,7 @@
entry = local_pages->next;
do {
tmp = list_entry(entry, struct page, list);
- if (tmp->index == order && memclass(tmp->zone, classzone)) {
+ if (tmp->index == order && memclass(GET_PAGE_ZONE(tmp), classzone)) {
list_del(entry);
current->nr_local_pages--;
set_page_count(tmp, 1);
@@ -732,7 +736,7 @@

for (i = 0; i < size; i++) {
struct page *page = mem_map + offset + i;
- page->zone = zone;
+ SET_PAGE_ZONE(page, zone);
if (j != ZONE_HIGHMEM)
page->virtual = __va(zone_start_paddr);
zone_start_paddr += PAGE_SIZE;
diff -ru --exclude-from=exclude linuxppc_2_4_devel_work/mm/vmscan.c linuxppc_2_4_devel_work_onezone/mm/vmscan.c
--- linuxppc_2_4_devel_work/mm/vmscan.c Sat Dec 22 11:50:21 2001
+++ linuxppc_2_4_devel_work_onezone/mm/vmscan.c Sun Jan 6 22:18:45 2002
@@ -58,7 +58,7 @@
return 0;

/* Don't bother replenishing zones not under pressure.. */
- if (!memclass(page->zone, classzone))
+ if (!memclass(GET_PAGE_ZONE(page), classzone))
return 0;

if (TryLockPage(page))
@@ -369,7 +369,7 @@
if (unlikely(!page_count(page)))
continue;

- if (!memclass(page->zone, classzone))
+ if (!memclass(GET_PAGE_ZONE(page), classzone))
continue;

/* Racy check to avoid trylocking when not worthwhile */


2002-01-06 13:07:27

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH] Remove 8 bytes from struct page on 64bit archs

On Sun, 6 Jan 2002, Anton Blanchard wrote:

> Therefore there is no reason to waste 8 bytes per page when every page
> points to the same zone!

Some of the low end single zone machines (m68k, sparc32, arm etc)
could benefit from losing ->virtual too. wli has patches in
his dir on kernel.org that do this (and other struct page reductions)
The newer ones are against Rik's rmap vm though iirc, so you may have to
frob a bit to get them to play with the stock vm.

It'd be nice to see all these patches reducing this struct consolidated,
right now they're all ifdefing different bits with different names..

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-01-06 13:12:17

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] Remove 8 bytes from struct page on 64bit archs

On Sun, Jan 06, 2002 at 11:39:14PM +1100, Anton Blanchard wrote:
> It seems shortening struct page is all the rage at the moment and I
> didnt want to be left out. On some 64bit architectures (sparc64 and
> ppc64 for example) all memory is allocated in the DMA zone. Therefore
> there is no reason to waste 8 bytes per page when every page points to
> the same zone!

Very true. I devised something to address this that appears to work on
multiple architectures already by folding ->zone into ->flags, which
could be useful. (Dave Jones recommended I just let arch maintainers
for things other than i386 mess with page_address() for other arches.)
OTOH, I'm more interested in getting it trimmed down than getting credit.


Cheers,
Bill

P.S.:

My i386 version, which makes ->virtual conditional on CONFIG_HIGHMEM as
well, is at:

ftp://ftp.kernel.org/pub/linux/kernel/people/wli/vm/struct_page/struct_page-2.4.17-rc2

2002-01-06 13:14:07

by Momchil Velikov

[permalink] [raw]
Subject: Re: [PATCH] Remove 8 bytes from struct page on 64bit archs

>>>>> "Dave" == Dave Jones <[email protected]> writes:
Dave> It'd be nice to see all these patches reducing this struct consolidated,
Dave> right now they're all ifdefing different bits with different names..

They're pretty much independent of each other and, IMHO, having them
as a single large dropping^h^h^h^h is not the preferred way of
submitting patches.

Regards,
-velco


2002-01-06 13:25:40

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH] Remove 8 bytes from struct page on 64bit archs

On 6 Jan 2002, Momchil Velikov wrote:

> They're pretty much independent of each other

Hardly. wli removes the zone ptr completely by encoding it into flags,
whilst anton makes it conditional. As it stands, they don't
play together.

> as a single large dropping^h^h^h^h is not the preferred way of
> submitting patches.

They both have the same (or similar) goal. Having them both go
off in different directions to achieve that goal is counterproductive
if the end result is the same.

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-01-06 13:28:57

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] Remove 8 bytes from struct page on 64bit archs

On Sun, 6 Jan 2002, Anton Blanchard wrote:
>> Therefore there is no reason to waste 8 bytes per page when every page
>> points to the same zone!

On Sun, Jan 06, 2002 at 02:07:05PM +0100, Dave Jones wrote:
> Some of the low end single zone machines (m68k, sparc32, arm etc)
> could benefit from losing ->virtual too. wli has patches in
> his dir on kernel.org that do this (and other struct page reductions)
> The newer ones are against Rik's rmap vm though iirc, so you may have to
> frob a bit to get them to play with the stock vm.

I can personally (and quite quickly) port the page size reductions to
other VM's, as the changes required are not particularly invasive.

On Sun, Jan 06, 2002 at 02:07:05PM +0100, Dave Jones wrote:
> It'd be nice to see all these patches reducing this struct consolidated,
> right now they're all ifdefing different bits with different names..

I can correct style issues and the like for at least the bits I'm
responsible for in very short order. I'll consolidate the waitqueue,
->virtual and ->zone removals within the hour, since that appears to
be wanted/needed.

Anton, do you want to take on mixing the zone into ->flags folding with
your approach?

Cheers,
Bill

2002-01-06 13:38:40

by Anton Blanchard

[permalink] [raw]
Subject: Re: [PATCH] Remove 8 bytes from struct page on 64bit archs


Hi Bill,

> Very true. I devised something to address this that appears to work on
> multiple architectures already by folding ->zone into ->flags, which
> could be useful. (Dave Jones recommended I just let arch maintainers
> for things other than i386 mess with page_address() for other arches.)
> OTOH, I'm more interested in getting it trimmed down than getting credit.

For archs that need ->zone, merging it with ->flags sounds like a
great idea. Id like to cram something into ->flags on 64 bit archs
since its only a long due to bitop constraints. I had thought of
stuffing ->count in the high word but now Im just getting silly
since all non atomic accesses to ->flags would then have to be word
ones.

> My i386 version, which makes ->virtual conditional on CONFIG_HIGHMEM as
> well, is at:

Id like to do redo some profiling, on ppc64 we had page_address() doing
pointer arithmetic (instead of page->virtual) and the compiler created
an awful sequence of instructions in the acenic interrupt handler.
A zero copy TCP benchmark made it rather obvious.

Anton

2002-01-06 13:52:23

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] Remove 8 bytes from struct page on 64bit archs

On Mon, Jan 07, 2002 at 12:33:26AM +1100, Anton Blanchard wrote:
> For archs that need ->zone, merging it with ->flags sounds like a
> great idea. Id like to cram something into ->flags on 64 bit archs
> since its only a long due to bitop constraints. I had thought of
> stuffing ->count in the high word but now Im just getting silly
> since all non atomic accesses to ->flags would then have to be word
> ones.

At some point in the past, I wrote:
>> My i386 version, which makes ->virtual conditional on CONFIG_HIGHMEM as
>> well, is at:

On Mon, Jan 07, 2002 at 12:33:26AM +1100, Anton Blanchard wrote:
> Id like to do redo some profiling, on ppc64 we had page_address() doing
> pointer arithmetic (instead of page->virtual) and the compiler created
> an awful sequence of instructions in the acenic interrupt handler.
> A zero copy TCP benchmark made it rather obvious.

I'm not entirely surprised at this, CONFIG_HIGHMEM is probably just
not quite a strict enough criterion for eliminating ->virtual as it's
a time/space tradeoff that just happens to be bad on ppc64. Maybe we
should figure out some other criterion in addition to it.


Cheers,
Bill

2002-01-06 14:09:37

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] Remove 8 bytes from struct page on 64bit archs

From: Dave Jones <[email protected]>
Date: Sun, 6 Jan 2002 14:07:05 +0100 (CET)

Some of the low end single zone machines (m68k, sparc32, arm etc)
could benefit from losing ->virtual too.

Sparc32 has kmapping, so it would need virtual.

2002-01-06 14:44:23

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH] Remove 8 bytes from struct page on 64bit archs

On Sun, 6 Jan 2002, David S. Miller wrote:

> Some of the low end single zone machines (m68k, sparc32, arm etc)
> could benefit from losing ->virtual too.
> Sparc32 has kmapping, so it would need virtual.

I'm curious to see how large the tradeoff is with calculating
virtual in page_address(). The overhead there may be larger than
the win we get from better cacheline footprint in struct page

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-01-06 16:08:25

by Horst von Brand

[permalink] [raw]
Subject: Re: [PATCH] Remove 8 bytes from struct page on 64bit archs

Anton Blanchard <[email protected]> said:

[...]

> Here is a very simple patch (ppc64 only so far). For archs that have
> more than one memory zone, they should define the following:
>
> #define ARCH_NR_ZONES 3
> #define GET_PAGE_ZONE(page) (page)->zone
> #define SET_PAGE_ZONE(page, __zone) (page)->zone = (__zone)

Better sprinkle in a few more ()'s...
--
Horst von Brand [email protected]
Casilla 9G, Vin~a del Mar, Chile +56 32 672616

2002-01-07 01:27:15

by Anton Blanchard

[permalink] [raw]
Subject: results: Remove 8 bytes from struct page on 64bit archs


Hi,

> I'm curious to see how large the tradeoff is with calculating
> virtual in page_address(). The overhead there may be larger than
> the win we get from better cacheline footprint in struct page

Check out:

http://samba.org/~anton/linux/struct_page/

"struct page can be hazardous to your health"

On this machine (2 way power3) we see a ~5% improvement on dbench by
removing ->zone. Quite a nice improvement.

When we also remove ->virtual and use pointer arithmetic to do
page_address() performance drops back down. The reason for this can
be found in:

http://samba.org/~anton/linux/struct_page/2.4.18pre1-nopagezone-nopagevirtual/2/6.html

Search for divd, remembering the percentage to the left is shifted one
instruction down. Yes divides really hurt.

Basically the difference is:


page_address() using page->virtual:

ld 25,112(31)


page_address() using pointer arithmetic:

.LC1:
.quad mem_map
.LC2:
.quad 0xc000000000000000

...

ld 9,.LC1
li 11,120 ; sizeof(struct page)
ld 10,.LC2
ld 0,0(9)
subf 0,0,31
divd 0,0,11
sldi 0,0,12
add 25,0,10


Perhaps the compiler should be optimising this better (can we replace
the divide?)

Anton

2002-01-07 03:22:32

by Richard Henderson

[permalink] [raw]
Subject: Re: results: Remove 8 bytes from struct page on 64bit archs

On Mon, Jan 07, 2002 at 12:25:55PM +1100, Anton Blanchard wrote:
> li 11,120 ; sizeof(struct page)
[...]
> divd 0,0,11
>
> Perhaps the compiler should be optimising this better (can we replace
> the divide?)

The powerpc backend claims to have a fast divide instruction
(via RTX_COST if you care about such things). We'll replace
with shift when dividing by powers of 2, but won't try the
multiply by a constant inverse trick.


r~

2002-01-07 05:15:19

by David Miller

[permalink] [raw]
Subject: Re: results: Remove 8 bytes from struct page on 64bit archs

From: Anton Blanchard <[email protected]>
Date: Mon, 7 Jan 2002 12:25:55 +1100

Perhaps the compiler should be optimising this better (can we replace
the divide?)

As mentioned the PPC backend of gcc thinks that divides are cheaper
than multiplies. On sparc64 for example we get multiplies by a
constant when emitting constant divides.

If you can make the struct a power of two as well as smaller, that
would eliminate both problems here but not fix the apparent bug in the
PPC backend.

2002-01-15 00:31:30

by Anton Blanchard

[permalink] [raw]
Subject: Re: results: Remove 8 bytes from struct page on 64bit archs


> The powerpc backend claims to have a fast divide instruction
> (via RTX_COST if you care about such things). We'll replace
> with shift when dividing by powers of 2, but won't try the
> multiply by a constant inverse trick.

To follow up: Alan Modra found and fixed the bug, it seems we were only
using the optimisation when the arguments were <= 32bit.

The target we use is RS64a which has a cost of 60 odd instructions
for divide.

Anton

2002-01-15 00:46:21

by Dave Jones

[permalink] [raw]
Subject: Re: results: Remove 8 bytes from struct page on 64bit archs

On Mon, 14 Jan 2002, Anton Blanchard wrote:

> To follow up: Alan Modra found and fixed the bug, it seems we were only
> using the optimisation when the arguments were <= 32bit.
> The target we use is RS64a which has a cost of 60 odd instructions
> for divide.

Look forward to seeing the updated benchmarks.

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs