2008-03-21 06:24:22

by Christoph Lameter

Subject: [13/14] vcompound: Use vcompound for swap_map

Use virtual compound pages for the large swap maps. This only works for
swap maps that are smaller than a MAX_ORDER block though. If the swap map
is larger then there is no way around the use of vmalloc.

Signed-off-by: Christoph Lameter <[email protected]>

---
mm/swapfile.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6.25-rc5-mm1/mm/swapfile.c
===================================================================
--- linux-2.6.25-rc5-mm1.orig/mm/swapfile.c 2008-03-20 20:32:12.793950570 -0700
+++ linux-2.6.25-rc5-mm1/mm/swapfile.c 2008-03-20 20:37:43.367821147 -0700
@@ -1312,7 +1312,7 @@ asmlinkage long sys_swapoff(const char _
p->flags = 0;
spin_unlock(&swap_lock);
mutex_unlock(&swapon_mutex);
- vfree(swap_map);
+ __free_vcompound(swap_map);
inode = mapping->host;
if (S_ISBLK(inode->i_mode)) {
struct block_device *bdev = I_BDEV(inode);
@@ -1636,13 +1636,13 @@ asmlinkage long sys_swapon(const char __
goto bad_swap;

/* OK, set up the swap map and apply the bad block list */
- if (!(p->swap_map = vmalloc(maxpages * sizeof(short)))) {
+ if (!(p->swap_map = __alloc_vcompound(GFP_KERNEL | __GFP_ZERO,
+ get_order(maxpages * sizeof(short))))) {
error = -ENOMEM;
goto bad_swap;
}

error = 0;
- memset(p->swap_map, 0, maxpages * sizeof(short));
for (i = 0; i < swap_header->info.nr_badpages; i++) {
int page_nr = swap_header->info.badpages[i];
if (page_nr <= 0 || page_nr >= swap_header->info.last_page)
@@ -1718,7 +1718,7 @@ bad_swap_2:
if (!(swap_flags & SWAP_FLAG_PREFER))
++least_priority;
spin_unlock(&swap_lock);
- vfree(swap_map);
+ __free_vcompound(swap_map);
if (swap_file)
filp_close(swap_file, NULL);
out:

--


2008-03-21 21:26:07

by Andi Kleen

Subject: Re: [13/14] vcompound: Use vcompound for swap_map

Christoph Lameter <[email protected]> writes:

> Use virtual compound pages for the large swap maps. This only works for
> swap maps that are smaller than a MAX_ORDER block though. If the swap map
> is larger then there is no way around the use of vmalloc.

Have you considered the potential memory wastage from rounding up
to the next page order now? (similar in all the other patches
to change vmalloc). e.g. if the old size was 64k + 1 byte it will
suddenly get 128k now. That is actually not an uncommon situation
in my experience; there are often power of two buffers with
some small headers.
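
To make the arithmetic concrete, here is a small user-space sketch (not kernel code). It assumes 4 KiB pages; `order_for_size()` and `rounding_loss()` are hypothetical helper names, with `order_for_size()` standing in for the kernel's `get_order()`.

```c
#include <stddef.h>

#define PAGE_SHIFT 12                   /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/*
 * Hypothetical user-space stand-in for get_order(): the smallest order
 * such that (PAGE_SIZE << order) >= size.
 */
static int order_for_size(unsigned long size)
{
	int order = 0;

	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Bytes allocated but left unused when size is rounded up to a full order. */
static unsigned long rounding_loss(unsigned long size)
{
	return (PAGE_SIZE << order_for_size(size)) - size;
}
```

Under these assumptions, `order_for_size(64 * 1024 + 1)` is 5, so 128 KiB is allocated and `rounding_loss()` reports 65535 unused bytes, while an exact 64 KiB request stays at order 4 with no loss.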

A long time ago (in 2.4-aa) I did something similar for module loading
as an experiment to avoid too many TLB misses. The module loader
would first try to get a continuous range in the direct mapping and
only then fall back to vmalloc.

But I used a simple trick to avoid the waste problem: it allocated a
continuous range rounded up to the next page-size order and then freed
the excess pages back into the page allocator. That was called
alloc_exact(). If you replace vmalloc with alloc_pages you should
use something like that too I think.
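
A rough user-space model of that trick follows. It only does the bookkeeping (which order to request, how many pages to keep, how many tail pages to hand back) and does not touch a real allocator. The 4 KiB page size, `struct exact_plan`, and `plan_exact_alloc()` are assumptions made for illustration; this is not the original alloc_exact() code.

```c
#include <stddef.h>

#define PAGE_SHIFT 12                   /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Hypothetical description of an "allocate high order, free the tail" request. */
struct exact_plan {
	int order;              /* order requested from the page allocator */
	unsigned long kept;     /* pages actually needed for the buffer */
	unsigned long returned; /* excess tail pages freed back again */
};

static struct exact_plan plan_exact_alloc(unsigned long size)
{
	struct exact_plan p;

	/* Number of whole pages needed to hold size bytes. */
	p.kept = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;

	/* Smallest power-of-two block of pages that covers them. */
	p.order = 0;
	while ((1UL << p.order) < p.kept)
		p.order++;

	/* Everything past the last needed page goes back to the allocator. */
	p.returned = (1UL << p.order) - p.kept;
	return p;
}
```

For the 64k + 1 byte case this plans an order-5 request (32 pages), keeps 17 pages and returns 15, so the remaining waste is less than one page.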

-Andi

2008-03-21 21:35:09

by Christoph Lameter

Subject: Re: [13/14] vcompound: Use vcompound for swap_map

On Fri, 21 Mar 2008, Andi Kleen wrote:

> > is larger then there is no way around the use of vmalloc.
>
> Have you considered the potential memory wastage from rounding up
> to the next page order now? (similar in all the other patches
> to change vmalloc). e.g. if the old size was 64k + 1 byte it will
> suddenly get 128k now. That is actually not a uncommon situation
> in my experience; there are often power of two buffers with
> some small headers.

Yes the larger the order the more significant the problem becomes.

> A long time ago (in 2.4-aa) I did something similar for module loading
> as an experiment to avoid too many TLB misses. The module loader
> would first try to get a continuous range in the direct mapping and
> only then fall back to vmalloc.
>
> But I used a simple trick to avoid the waste problem: it allocated a
> continuous range rounded up to the next page-size order and then freed
> the excess pages back into the page allocator. That was called
> alloc_exact(). If you replace vmalloc with alloc_pages you should
> use something like that too I think.

That trick is still in use for alloc_large_system_hash....

But cutting off the tail of compound pages would make treating them as
order N pages difficult. The vmalloc fallback situation is easy to deal
with.

Maybe we can think about making a compound page N consecutive pages
of PAGE_SIZE rather than an order O page? The API would be a bit
different then and it would require changes to the page allocator. There
would also be more fragmentation when pages like that are freed.

2008-03-24 19:56:54

by Christoph Lameter

Subject: Re: [13/14] vcompound: Use vcompound for swap_map

On Fri, 21 Mar 2008, Andi Kleen wrote:

> But I used a simple trick to avoid the waste problem: it allocated a
> continuous range rounded up to the next page-size order and then freed
> the excess pages back into the page allocator. That was called
> alloc_exact(). If you replace vmalloc with alloc_pages you should
> use something like that too I think.

One way of dealing with it would be to define an additional allocation
variant that allows the limiting of the loss? I noted that both the swap
and the wait tables vary significantly between allocations. So we could
specify an upper boundary of a loss that is acceptable. If too much memory
would be lost then use vmalloc unconditionally.

---
include/linux/vmalloc.h | 12 ++++++++----
mm/page_alloc.c | 4 ++--
mm/swapfile.c | 4 ++--
mm/vmalloc.c | 34 ++++++++++++++++++++++++++++++++++
4 files changed, 46 insertions(+), 8 deletions(-)

Index: linux-2.6.25-rc5-mm1/include/linux/vmalloc.h
===================================================================
--- linux-2.6.25-rc5-mm1.orig/include/linux/vmalloc.h 2008-03-24 12:51:47.457231129 -0700
+++ linux-2.6.25-rc5-mm1/include/linux/vmalloc.h 2008-03-24 12:52:05.449313572 -0700
@@ -88,14 +88,18 @@ extern void free_vm_area(struct vm_struc
/*
* Support for virtual compound pages.
*
- * Calls to vcompound alloc will result in the allocation of normal compound
- * pages unless memory is fragmented. If insufficient physical linear memory
- * is available then a virtually contiguous area of memory will be created
- * using the vmalloc functionality.
+ * Calls to vcompound_alloc and friends will result in the allocation of
+ * a normal physically contiguous compound page unless memory is fragmented.
+ * If insufficient physical linear memory is available then a virtually
+ * contiguous area of memory will be created using vmalloc.
*/
struct page *alloc_vcompound(gfp_t flags, int order);
+struct page *alloc_vcompound_maxloss(gfp_t flags, unsigned long size,
+ unsigned long maxloss);
void free_vcompound(struct page *);
void *__alloc_vcompound(gfp_t flags, int order);
+void *__alloc_vcompound_maxloss(gfp_t flags, unsigned long size,
+ unsigned long maxloss);
void __free_vcompound(void *addr);
struct page *vcompound_head_page(const void *x);

Index: linux-2.6.25-rc5-mm1/mm/vmalloc.c
===================================================================
--- linux-2.6.25-rc5-mm1.orig/mm/vmalloc.c 2008-03-24 12:51:47.485231279 -0700
+++ linux-2.6.25-rc5-mm1/mm/vmalloc.c 2008-03-24 12:52:05.453313419 -0700
@@ -1198,3 +1198,37 @@ void *__alloc_vcompound(gfp_t flags, int

return NULL;
}
+
+/*
+ * Functions to avoid losing memory because of the rounding up to
+ * power of two sizes for compound page allocation. If the loss would
+ * be too great then use vmalloc regardless of the fragmentation
+ * situation.
+ */
+struct page *alloc_vcompound_maxloss(gfp_t flags, unsigned long size,
+ unsigned long maxloss)
+{
+ int order = get_order(size);
+ unsigned long loss = (PAGE_SIZE << order) - size;
+ void *addr;
+
+ if (loss < maxloss)
+ return alloc_vcompound(flags, order);
+
+ addr = __vmalloc(size, flags, PAGE_KERNEL);
+ if (!addr)
+ return NULL;
+ return vmalloc_to_page(addr);
+}
+
+void *__alloc_vcompound_maxloss(gfp_t flags, unsigned long size,
+ unsigned long maxloss)
+{
+ int order = get_order(size);
+ unsigned long loss = (PAGE_SIZE << order) - size;
+
+ if (loss < maxloss)
+ return __alloc_vcompound(flags, order);
+
+ return __vmalloc(size, flags, PAGE_KERNEL);
+}
Index: linux-2.6.25-rc5-mm1/mm/swapfile.c
===================================================================
--- linux-2.6.25-rc5-mm1.orig/mm/swapfile.c 2008-03-24 12:52:05.441314302 -0700
+++ linux-2.6.25-rc5-mm1/mm/swapfile.c 2008-03-24 12:52:05.453313419 -0700
@@ -1636,8 +1636,8 @@ asmlinkage long sys_swapon(const char __
goto bad_swap;

/* OK, set up the swap map and apply the bad block list */
- if (!(p->swap_map = __alloc_vcompound(GFP_KERNEL | __GFP_ZERO,
- get_order(maxpages * sizeof(short))))) {
+ if (!(p->swap_map = __alloc_vcompound_maxloss(GFP_KERNEL | __GFP_ZERO,
+ maxpages * sizeof(short), 16 * PAGE_SIZE))) {
error = -ENOMEM;
goto bad_swap;
}
Index: linux-2.6.25-rc5-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.25-rc5-mm1.orig/mm/page_alloc.c 2008-03-24 12:52:05.389313168 -0700
+++ linux-2.6.25-rc5-mm1/mm/page_alloc.c 2008-03-24 12:52:07.493322559 -0700
@@ -2866,8 +2866,8 @@ int zone_wait_table_init(struct zone *zo
* To use this new node's memory, further consideration will be
* necessary.
*/
- zone->wait_table = __alloc_vcompound(GFP_KERNEL,
- get_order(alloc_size));
+ zone->wait_table = __alloc_vcompound_maxloss(GFP_KERNEL,
+ alloc_size, 32 * PAGE_SIZE);
}
if (!zone->wait_table)
return -ENOMEM;
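
The decision made by the maxloss variants in the patch can be modelled in user space as below. This is only a sketch of the `loss < maxloss` test, assuming 4 KiB pages; `enum backing` and `choose_backing()` are hypothetical names, and nothing is actually allocated.

```c
#include <stddef.h>

#define PAGE_SHIFT 12                   /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Hypothetical labels for the two paths taken by the maxloss variants. */
enum backing {
	BACKING_VCOMPOUND,      /* try a compound page, fall back if fragmented */
	BACKING_VMALLOC         /* go to vmalloc unconditionally */
};

static enum backing choose_backing(unsigned long size, unsigned long maxloss)
{
	int order = 0;
	unsigned long loss;

	/* Stand-in for get_order(): smallest order covering size. */
	while ((PAGE_SIZE << order) < size)
		order++;

	/* Bytes wasted by rounding up to a power-of-two number of pages. */
	loss = (PAGE_SIZE << order) - size;

	/* Same strict comparison as in the patch. */
	return loss < maxloss ? BACKING_VCOMPOUND : BACKING_VMALLOC;
}
```

With the swap_map limit of `16 * PAGE_SIZE`, a 64k + 1 byte request loses 65535 bytes and still takes the compound path, whereas a 128k + 1 byte request would lose 131071 bytes and goes straight to vmalloc.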

2008-03-25 07:49:37

by Andi Kleen

Subject: Re: [13/14] vcompound: Use vcompound for swap_map

On Mon, Mar 24, 2008 at 12:54:54PM -0700, Christoph Lameter wrote:
> On Fri, 21 Mar 2008, Andi Kleen wrote:
>
> > But I used a simple trick to avoid the waste problem: it allocated a
> > continuous range rounded up to the next page-size order and then freed
> > the excess pages back into the page allocator. That was called
> > alloc_exact(). If you replace vmalloc with alloc_pages you should
> > use something like that too I think.
>
> One way of dealing with it would be to define an additional allocation
> variant that allows the limiting of the loss? I noted that both the swap
> and the wait tables vary significantly between allocations. So we could
> specify an upper boundary of a loss that is acceptable. If too much memory
> would be lost then use vmalloc unconditionally.

I liked your idea of fixing compound pages to not rely on order
better. Ok it is likely more work to implement @)

Also, if anything, preserving memory should be the default, but maybe
skippable with a __GFP_GO_FAST flag.

-Andi

2008-03-25 17:47:14

by Christoph Lameter

Subject: Re: [13/14] vcompound: Use vcompound for swap_map

On Tue, 25 Mar 2008, Andi Kleen wrote:

> I liked your idea of fixing compound pages to not rely on order
> better. Ok it is likely more work to implement @)

Right. It just requires a page allocator rewrite. Which is overdue
anyways given the fastpath issues. Volunteers?

> Also if anything preserving memory should be default, but maybe
> skippable a with __GFP_GO_FAST flag.

Well. Guess we need a definition of preserving memory. All allocations
typically have some kind of overhead.

2008-03-25 17:52:12

by Andi Kleen

Subject: Re: [13/14] vcompound: Use vcompound for swap_map

On Tue, Mar 25, 2008 at 10:45:06AM -0700, Christoph Lameter wrote:
> On Tue, 25 Mar 2008, Andi Kleen wrote:
>
> > I liked your idea of fixing compound pages to not rely on order
> > better. Ok it is likely more work to implement @)
>
> Right. It just requires a page allocator rewrite.

Not when the trick of getting high order, returning left over pages
is used. I meant just updating the GFP_COMPOUND code to always
use number of pages instead of order so that it could deal with a compound
where the excess pages are already returned. That is not actually that
much work (I reimplemented this recently for dma alloc and it's < 20 LOC)

Of course the full rewrite would be also great, agreed :)

-Andi

2008-03-25 17:53:19

by Christoph Lameter

Subject: Re: [13/14] vcompound: Use vcompound for swap_map

On Tue, 25 Mar 2008, Andi Kleen wrote:

> Not when the trick of getting high order, returning left over pages
> is used. I meant just updating the GFP_COMPOUND code to always
> use number of pages instead of order so that it could deal with a compound
> where the excess pages are already returned. That is not actually that
> much work (I reimplemented this recently for dma alloc and it's < 20 LOC)

Would you post the patch here?