2011-04-11 22:03:59

by Dave Hansen

[permalink] [raw]
Subject: [PATCH 1/3] rename alloc_pages_exact()


alloc_pages_exact() returns a virtual address. But, alloc_pages() returns
a 'struct page *'. That makes for very confused kernel hackers.

__get_free_pages(), on the other hand, returns virtual addresses. That
makes alloc_pages_exact() a much closer match to __get_free_pages(), so
rename it to get_free_pages_exact(). Also change the arguments to have
flags first, just like __get_free_pages().

Note that alloc_pages_exact()'s partner, free_pages_exact() already
matches free_pages(), so we do not have to touch the free side of things.

Signed-off-by: Dave Hansen <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Acked-by: David Rientjes <[email protected]>
---

linux-2.6.git-dave/drivers/video/fsl-diu-fb.c | 2 +-
linux-2.6.git-dave/drivers/video/mxsfb.c | 2 +-
linux-2.6.git-dave/drivers/video/pxafb.c | 6 +++---
linux-2.6.git-dave/drivers/virtio/virtio_pci.c | 2 +-
linux-2.6.git-dave/include/linux/gfp.h | 2 +-
linux-2.6.git-dave/kernel/profile.c | 4 ++--
linux-2.6.git-dave/mm/page_alloc.c | 18 +++++++++---------
linux-2.6.git-dave/mm/page_cgroup.c | 2 +-
8 files changed, 19 insertions(+), 19 deletions(-)

diff -puN drivers/video/fsl-diu-fb.c~change-alloc_pages_exact-name drivers/video/fsl-diu-fb.c
--- linux-2.6.git/drivers/video/fsl-diu-fb.c~change-alloc_pages_exact-name 2011-04-11 15:01:16.453823153 -0700
+++ linux-2.6.git-dave/drivers/video/fsl-diu-fb.c 2011-04-11 15:01:16.489823137 -0700
@@ -294,7 +294,7 @@ static void *fsl_diu_alloc(size_t size,

pr_debug("size=%zu\n", size);

- virt = alloc_pages_exact(size, GFP_DMA | __GFP_ZERO);
+ virt = get_free_pages_exact(GFP_DMA | __GFP_ZERO, size);
if (virt) {
*phys = virt_to_phys(virt);
pr_debug("virt=%p phys=%llx\n", virt,
diff -puN drivers/video/mxsfb.c~change-alloc_pages_exact-name drivers/video/mxsfb.c
--- linux-2.6.git/drivers/video/mxsfb.c~change-alloc_pages_exact-name 2011-04-11 15:01:16.457823151 -0700
+++ linux-2.6.git-dave/drivers/video/mxsfb.c 2011-04-11 15:01:16.489823137 -0700
@@ -718,7 +718,7 @@ static int __devinit mxsfb_init_fbinfo(s
} else {
if (!fb_size)
fb_size = SZ_2M; /* default */
- fb_virt = alloc_pages_exact(fb_size, GFP_DMA);
+ fb_virt = get_free_pages_exact(GFP_DMA, fb_size);
if (!fb_virt)
return -ENOMEM;

diff -puN drivers/video/pxafb.c~change-alloc_pages_exact-name drivers/video/pxafb.c
--- linux-2.6.git/drivers/video/pxafb.c~change-alloc_pages_exact-name 2011-04-11 15:01:16.461823149 -0700
+++ linux-2.6.git-dave/drivers/video/pxafb.c 2011-04-11 15:01:16.493823135 -0700
@@ -905,8 +905,8 @@ static int __devinit pxafb_overlay_map_v
/* We assume that user will use at most video_mem_size for overlay fb,
* anyway, it's useless to use 16bpp main plane and 24bpp overlay
*/
- ofb->video_mem = alloc_pages_exact(PAGE_ALIGN(pxafb->video_mem_size),
- GFP_KERNEL | __GFP_ZERO);
+ ofb->video_mem = get_free_pages_exact(GFP_KERNEL | __GFP_ZERO,
+ PAGE_ALIGN(pxafb->video_mem_size));
if (ofb->video_mem == NULL)
return -ENOMEM;

@@ -1714,7 +1714,7 @@ static int __devinit pxafb_init_video_me
{
int size = PAGE_ALIGN(fbi->video_mem_size);

- fbi->video_mem = alloc_pages_exact(size, GFP_KERNEL | __GFP_ZERO);
+ fbi->video_mem = get_free_pages_exact(GFP_KERNEL | __GFP_ZERO, size);
if (fbi->video_mem == NULL)
return -ENOMEM;

diff -puN drivers/virtio/virtio_pci.c~change-alloc_pages_exact-name drivers/virtio/virtio_pci.c
--- linux-2.6.git/drivers/virtio/virtio_pci.c~change-alloc_pages_exact-name 2011-04-11 15:01:16.465823147 -0700
+++ linux-2.6.git-dave/drivers/virtio/virtio_pci.c 2011-04-11 15:01:16.493823135 -0700
@@ -385,7 +385,7 @@ static struct virtqueue *setup_vq(struct
info->msix_vector = msix_vec;

size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN));
- info->queue = alloc_pages_exact(size, GFP_KERNEL|__GFP_ZERO);
+ info->queue = get_free_pages_exact(GFP_KERNEL|__GFP_ZERO, size);
if (info->queue == NULL) {
err = -ENOMEM;
goto out_info;
diff -puN include/linux/gfp.h~change-alloc_pages_exact-name include/linux/gfp.h
--- linux-2.6.git/include/linux/gfp.h~change-alloc_pages_exact-name 2011-04-11 15:01:16.469823145 -0700
+++ linux-2.6.git-dave/include/linux/gfp.h 2011-04-11 15:01:16.493823135 -0700
@@ -351,7 +351,7 @@ extern struct page *alloc_pages_vma(gfp_
extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
extern unsigned long get_zeroed_page(gfp_t gfp_mask);

-void *alloc_pages_exact(size_t size, gfp_t gfp_mask);
+void *get_free_pages_exact(gfp_t gfp_mask, size_t size);
void free_pages_exact(void *virt, size_t size);

#define __get_free_page(gfp_mask) \
diff -puN kernel/profile.c~change-alloc_pages_exact-name kernel/profile.c
--- linux-2.6.git/kernel/profile.c~change-alloc_pages_exact-name 2011-04-11 15:01:16.473823143 -0700
+++ linux-2.6.git-dave/kernel/profile.c 2011-04-11 15:01:16.497823133 -0700
@@ -121,8 +121,8 @@ int __ref profile_init(void)
if (prof_buffer)
return 0;

- prof_buffer = alloc_pages_exact(buffer_bytes,
- GFP_KERNEL|__GFP_ZERO|__GFP_NOWARN);
+ prof_buffer = get_free_pages_exact(GFP_KERNEL|__GFP_ZERO|__GFP_NOWARN,
+ buffer_bytes);
if (prof_buffer)
return 0;

diff -puN mm/page_alloc.c~change-alloc_pages_exact-name mm/page_alloc.c
--- linux-2.6.git/mm/page_alloc.c~change-alloc_pages_exact-name 2011-04-11 15:01:16.477823141 -0700
+++ linux-2.6.git-dave/mm/page_alloc.c 2011-04-11 15:01:16.501823131 -0700
@@ -2318,7 +2318,7 @@ void free_pages(unsigned long addr, unsi
EXPORT_SYMBOL(free_pages);

/**
- * alloc_pages_exact - allocate an exact number physically-contiguous pages.
+ * get_free_pages_exact - allocate an exact number physically-contiguous pages.
* @size: the number of bytes to allocate
* @gfp_mask: GFP flags for the allocation
*
@@ -2330,7 +2330,7 @@ EXPORT_SYMBOL(free_pages);
*
* Memory allocated by this function must be released by free_pages_exact().
*/
-void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
+void *get_free_pages_exact(gfp_t gfp_mask, size_t size)
{
unsigned int order = get_order(size);
unsigned long addr;
@@ -2349,14 +2349,14 @@ void *alloc_pages_exact(size_t size, gfp

return (void *)addr;
}
-EXPORT_SYMBOL(alloc_pages_exact);
+EXPORT_SYMBOL(get_free_pages_exact);

/**
- * free_pages_exact - release memory allocated via alloc_pages_exact()
- * @virt: the value returned by alloc_pages_exact.
- * @size: size of allocation, same value as passed to alloc_pages_exact().
+ * free_pages_exact - release memory allocated via get_free_pages_exact()
+ * @virt: the value returned by get_free_pages_exact.
+ * @size: size of allocation, same value as passed to get_free_pages_exact().
*
- * Release the memory allocated by a previous call to alloc_pages_exact.
+ * Release the memory allocated by a previous call to get_free_pages_exact().
*/
void free_pages_exact(void *virt, size_t size)
{
@@ -5308,10 +5308,10 @@ void *__init alloc_large_system_hash(con
/*
* If bucketsize is not a power-of-two, we may free
* some pages at the end of hash table which
- * alloc_pages_exact() automatically does
+ * get_free_pages_exact() automatically does
*/
if (get_order(size) < MAX_ORDER) {
- table = alloc_pages_exact(size, GFP_ATOMIC);
+ table = get_free_pages_exact(size, GFP_ATOMIC);
kmemleak_alloc(table, size, 1, GFP_ATOMIC);
}
}
diff -puN mm/page_cgroup.c~change-alloc_pages_exact-name mm/page_cgroup.c
--- linux-2.6.git/mm/page_cgroup.c~change-alloc_pages_exact-name 2011-04-11 15:01:16.481823139 -0700
+++ linux-2.6.git-dave/mm/page_cgroup.c 2011-04-11 15:01:16.501823131 -0700
@@ -134,7 +134,7 @@ static void *__init_refok alloc_page_cgr
{
void *addr = NULL;

- addr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN);
+ addr = get_free_pages_exact(GFP_KERNEL | __GFP_NOWARN, size);
if (addr)
return addr;

_


2011-04-11 22:04:08

by Dave Hansen

[permalink] [raw]
Subject: [PATCH 2/3] make new alloc_pages_exact()


What I really wanted in the end was a highmem-capable alloc_pages_exact(),
so here it is. This function can be used to allocate unmapped (like
highmem) non-power-of-two-sized areas of memory. This is in constast to
get_free_pages_exact() which can only allocate from lowmem.

My plan is to use this in the virtio_balloon driver to allocate large,
oddly-sized contiguous areas.

The new __alloc_pages_exact() now takes a size in numbers of pages,
and returns a 'struct page', which means it can now address highmem.

It's a bit unfortunate that this introduces __free_pages_exact()
alongside free_pages_exact(). But that mess already exists with
__free_pages() vs. free_pages_exact(). So, at worst, this mirrors
the mess that we already have.

I'm also a bit worried that I've not put in something named
alloc_pages_exact(), but that behaves differently than it did before this
set. I got all of the in-tree cases, but I'm a bit worried about
stragglers elsewhere. So, I'm calling this __alloc_pages_exact() for
the moment. We can take out the __ some day if it bothers people.

Note that the __get_free_pages() has a !GFP_HIGHMEM check. Now that
we are using alloc_pages_exact() instead of __get_free_pages() for
get_free_pages_exact(), we had to add a new check in
get_free_pages_exact().

This has been compile and boot tested, and I checked that

echo 2 > /sys/kernel/profiling

still works, since it uses get_free_pages_exact().

Signed-off-by: Dave Hansen <[email protected]>
---

linux-2.6.git-dave/include/linux/gfp.h | 4 +
linux-2.6.git-dave/mm/page_alloc.c | 84 ++++++++++++++++++++++++---------
2 files changed, 67 insertions(+), 21 deletions(-)

diff -puN include/linux/gfp.h~make_new_alloc_pages_exact include/linux/gfp.h
--- linux-2.6.git/include/linux/gfp.h~make_new_alloc_pages_exact 2011-04-11 15:01:17.165822836 -0700
+++ linux-2.6.git-dave/include/linux/gfp.h 2011-04-11 15:01:17.177822831 -0700
@@ -351,6 +351,10 @@ extern struct page *alloc_pages_vma(gfp_
extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
extern unsigned long get_zeroed_page(gfp_t gfp_mask);

+/* 'struct page' version */
+struct page *__alloc_pages_exact(gfp_t gfp_mask, size_t size);
+void __free_pages_exact(struct page *page, size_t size);
+/* virtual address version */
void *get_free_pages_exact(gfp_t gfp_mask, size_t size);
void free_pages_exact(void *virt, size_t size);

diff -puN mm/page_alloc.c~make_new_alloc_pages_exact mm/page_alloc.c
--- linux-2.6.git/mm/page_alloc.c~make_new_alloc_pages_exact 2011-04-11 15:01:17.169822835 -0700
+++ linux-2.6.git-dave/mm/page_alloc.c 2011-04-11 15:01:17.177822831 -0700
@@ -2318,9 +2318,10 @@ void free_pages(unsigned long addr, unsi
EXPORT_SYMBOL(free_pages);

/**
- * get_free_pages_exact - allocate an exact number physically-contiguous pages.
- * @size: the number of bytes to allocate
+ * __alloc_pages_exact - allocate an exact number physically-contiguous pages.
+ * @nr_pages: the number of pages to allocate
* @gfp_mask: GFP flags for the allocation
+ * returns: struct page for allocated memory
*
* This function is similar to alloc_pages(), except that it allocates the
* minimum number of pages to satisfy the request. alloc_pages() can only
@@ -2330,29 +2331,75 @@ EXPORT_SYMBOL(free_pages);
*
* Memory allocated by this function must be released by free_pages_exact().
*/
-void *get_free_pages_exact(gfp_t gfp_mask, size_t size)
+struct page *__alloc_pages_exact(gfp_t gfp_mask, size_t nr_pages)
{
- unsigned int order = get_order(size);
- unsigned long addr;
+ unsigned int order = get_order(nr_pages * PAGE_SIZE);
+ struct page *page;

- addr = __get_free_pages(gfp_mask, order);
- if (addr) {
- unsigned long alloc_end = addr + (PAGE_SIZE << order);
- unsigned long used = addr + PAGE_ALIGN(size);
+ page = alloc_pages(gfp_mask, order);
+ if (page) {
+ struct page *alloc_end = page + (1 << order);
+ struct page *used = page + nr_pages;

- split_page(virt_to_page((void *)addr), order);
+ split_page(page, order);
while (used < alloc_end) {
- free_page(used);
- used += PAGE_SIZE;
+ __free_page(used);
+ used++;
}
}

- return (void *)addr;
+ return page;
+}
+EXPORT_SYMBOL(__alloc_pages_exact);
+
+/**
+ * __free_pages_exact - release memory allocated via __alloc_pages_exact()
+ * @virt: the value returned by get_free_pages_exact.
+ * @nr_pages: size in pages, same value as passed to __alloc_pages_exact().
+ *
+ * Release the memory allocated by a previous call to __alloc_pages_exact().
+ */
+void __free_pages_exact(struct page *page, size_t nr_pages)
+{
+ struct page *end = page + nr_pages;
+
+ while (page < end) {
+ __free_page(page);
+ page++;
+ }
+}
+EXPORT_SYMBOL(__free_pages_exact);
+
+/**
+ * get_free_pages_exact - allocate an exact number physically-contiguous pages.
+ * @gfp_mask: GFP flags for the allocation
+ * @size: the number of bytes to allocate
+ * returns: virtual address of allocated memory
+ *
+ * This function is similar to __get_free_pages(), except that it allocates the
+ * minimum number of pages to satisfy the request. get_free_pages() can only
+ * allocate memory in power-of-two pages.
+ *
+ * This function is also limited by MAX_ORDER.
+ *
+ * Memory allocated by this function must be released by free_pages_exact().
+ */
+void *get_free_pages_exact(gfp_t gfp_mask, size_t size)
+{
+ struct page *page;
+
+ /* If we are using page_address(), we can not allow highmem */
+ VM_BUG_ON((gfp_mask & __GFP_HIGHMEM) != 0);
+
+ page = __alloc_pages_exact(gfp_mask, size * PAGE_SIZE);
+ if (page)
+ return (void *) page_address(page);
+ return NULL;
}
EXPORT_SYMBOL(get_free_pages_exact);

/**
- * free_pages_exact - release memory allocated via get_free_pages_exact()
+ * __free_pages_exact - release memory allocated via get_free_pages_exact()
* @virt: the value returned by get_free_pages_exact.
* @size: size of allocation, same value as passed to get_free_pages_exact().
*
@@ -2360,13 +2407,8 @@ EXPORT_SYMBOL(get_free_pages_exact);
*/
void free_pages_exact(void *virt, size_t size)
{
- unsigned long addr = (unsigned long)virt;
- unsigned long end = addr + PAGE_ALIGN(size);
-
- while (addr < end) {
- free_page(addr);
- addr += PAGE_SIZE;
- }
+ int nr_pages = PAGE_ALIGN(size)/PAGE_SIZE;
+ __free_pages_exact(virt_to_page(virt), nr_pages);
}
EXPORT_SYMBOL(free_pages_exact);

_

2011-04-11 22:03:59

by Dave Hansen

[permalink] [raw]
Subject: [PATCH 3/3] reuse __free_pages_exact() in __alloc_pages_exact()


Michal Nazarewicz noticed that __alloc_pages_exact()'s
__free_page() loop was really close to something he was
using in one of his patches. That made me realize that
it was actually very similar to __free_pages_exact().

This uses __free_pages_exact() in place of the loop
that we had in __alloc_pages_exact(). Since we had to
change the temporary variables around anyway, I gave
them some better names to hopefully address some other
review comments.

---

linux-2.6.git-dave/mm/page_alloc.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)

diff -puN mm/page_alloc.c~reuse-free-exact mm/page_alloc.c
--- linux-2.6.git/mm/page_alloc.c~reuse-free-exact 2011-04-11 15:01:17.701822598 -0700
+++ linux-2.6.git-dave/mm/page_alloc.c 2011-04-11 15:01:17.713822594 -0700
@@ -2338,14 +2338,11 @@ struct page *__alloc_pages_exact(gfp_t g

page = alloc_pages(gfp_mask, order);
if (page) {
- struct page *alloc_end = page + (1 << order);
- struct page *used = page + nr_pages;
+ struct page *unused_start = page + nr_pages;
+ int nr_unused = (1 << order) - nr_pages;

split_page(page, order);
- while (used < alloc_end) {
- __free_page(used);
- used++;
- }
+ __free_pages_exact(unused_start, nr_unused);
}

return page;
_

2011-04-11 22:23:34

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2/3] make new alloc_pages_exact()

On Mon, 11 Apr 2011 15:03:46 -0700
Dave Hansen <[email protected]> wrote:

>
> What I really wanted in the end was a highmem-capable alloc_pages_exact(),
> so here it is. This function can be used to allocate unmapped (like
> highmem) non-power-of-two-sized areas of memory. This is in constast to
> get_free_pages_exact() which can only allocate from lowmem.
>
> My plan is to use this in the virtio_balloon driver to allocate large,
> oddly-sized contiguous areas.
>
> The new __alloc_pages_exact() now takes a size in numbers of pages,
> and returns a 'struct page', which means it can now address highmem.
>
> It's a bit unfortunate that this introduces __free_pages_exact()
> alongside free_pages_exact(). But that mess already exists with
> __free_pages() vs. free_pages_exact(). So, at worst, this mirrors
> the mess that we already have.
>
> I'm also a bit worried that I've not put in something named
> alloc_pages_exact(), but that behaves differently than it did before this
> set. I got all of the in-tree cases, but I'm a bit worried about
> stragglers elsewhere. So, I'm calling this __alloc_pages_exact() for
> the moment. We can take out the __ some day if it bothers people.

Yup, that's fair enough.

> Note that the __get_free_pages() has a !GFP_HIGHMEM check. Now that
> we are using alloc_pages_exact() instead of __get_free_pages() for
> get_free_pages_exact(), we had to add a new check in
> get_free_pages_exact().
>
> This has been compile and boot tested, and I checked that
>
> echo 2 > /sys/kernel/profiling
>
> still works, since it uses get_free_pages_exact().
>
> Signed-off-by: Dave Hansen <[email protected]>
> ---
>
> linux-2.6.git-dave/include/linux/gfp.h | 4 +
> linux-2.6.git-dave/mm/page_alloc.c | 84 ++++++++++++++++++++++++---------
> 2 files changed, 67 insertions(+), 21 deletions(-)
>
> diff -puN include/linux/gfp.h~make_new_alloc_pages_exact include/linux/gfp.h
> --- linux-2.6.git/include/linux/gfp.h~make_new_alloc_pages_exact 2011-04-11 15:01:17.165822836 -0700
> +++ linux-2.6.git-dave/include/linux/gfp.h 2011-04-11 15:01:17.177822831 -0700
> @@ -351,6 +351,10 @@ extern struct page *alloc_pages_vma(gfp_
> extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
> extern unsigned long get_zeroed_page(gfp_t gfp_mask);
>
> +/* 'struct page' version */
> +struct page *__alloc_pages_exact(gfp_t gfp_mask, size_t size);
> +void __free_pages_exact(struct page *page, size_t size);

The declarations use "size", but the definitions use "nr_pages".
"nr_pages" is way better.

Should it really be size_t? size_t's units are "bytes", usually.

> -void *get_free_pages_exact(gfp_t gfp_mask, size_t size)
> +struct page *__alloc_pages_exact(gfp_t gfp_mask, size_t nr_pages)

Most allocation functions are of the form foo(size, gfp_t), but this
one has the args reversed. Was there a reason for that?


> {
> - unsigned int order = get_order(size);
> - unsigned long addr;
> + unsigned int order = get_order(nr_pages * PAGE_SIZE);
> + struct page *page;
>
> - addr = __get_free_pages(gfp_mask, order);
> - if (addr) {
> - unsigned long alloc_end = addr + (PAGE_SIZE << order);
> - unsigned long used = addr + PAGE_ALIGN(size);
> + page = alloc_pages(gfp_mask, order);
> + if (page) {
> + struct page *alloc_end = page + (1 << order);
> + struct page *used = page + nr_pages;
>
> - split_page(virt_to_page((void *)addr), order);
> + split_page(page, order);
> while (used < alloc_end) {
> - free_page(used);
> - used += PAGE_SIZE;
> + __free_page(used);
> + used++;
> }
> }
>
> - return (void *)addr;
> + return page;
> +}
> +EXPORT_SYMBOL(__alloc_pages_exact);
> +
> +/**
> + * __free_pages_exact - release memory allocated via __alloc_pages_exact()
> + * @virt: the value returned by get_free_pages_exact.
> + * @nr_pages: size in pages, same value as passed to __alloc_pages_exact().
> + *
> + * Release the memory allocated by a previous call to __alloc_pages_exact().
> + */
> +void __free_pages_exact(struct page *page, size_t nr_pages)
> +{
> + struct page *end = page + nr_pages;
> +
> + while (page < end) {

Hand-optimised. Old school. Doesn't trust the compiler :)

> + __free_page(page);
> + page++;
> + }
> +}
> +EXPORT_SYMBOL(__free_pages_exact);

Really, this function duplicates release_pages(). release_pages() is
big and fat and complex and is a crime against uniprocessor but it does
make some effort to reduce the spinlocking frequency and in many
situations, release_pages() will cause vastly less locked bus traffic
than your __free_pages_exact(). And who knows, smart use of
release_pages()'s "cold" hint may provide some benefits.

2011-04-11 22:36:12

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 2/3] make new alloc_pages_exact()

On Mon, 2011-04-11 at 15:22 -0700, Andrew Morton wrote:
> > +/* 'struct page' version */
> > +struct page *__alloc_pages_exact(gfp_t gfp_mask, size_t size);
> > +void __free_pages_exact(struct page *page, size_t size);
>
> The declarations use "size", but the definitions use "nr_pages".
> "nr_pages" is way better.

I'll fix that.

> Should it really be size_t? size_t's units are "bytes", usually.

Yeah, the nr_pages one should probably be an unsigned long.

> > -void *get_free_pages_exact(gfp_t gfp_mask, size_t size)
> > +struct page *__alloc_pages_exact(gfp_t gfp_mask, size_t nr_pages)
>
> Most allocation functions are of the form foo(size, gfp_t), but this
> one has the args reversed. Was there a reason for that?

I'm trying to make this a clone of alloc_pages(), which does:

#define alloc_pages(gfp_mask, order)

It needs a note in the changelog on why I did it.

> > {
> > - unsigned int order = get_order(size);
> > - unsigned long addr;
> > + unsigned int order = get_order(nr_pages * PAGE_SIZE);
> > + struct page *page;
> >
> > - addr = __get_free_pages(gfp_mask, order);
> > - if (addr) {
> > - unsigned long alloc_end = addr + (PAGE_SIZE << order);
> > - unsigned long used = addr + PAGE_ALIGN(size);
> > + page = alloc_pages(gfp_mask, order);
> > + if (page) {
> > + struct page *alloc_end = page + (1 << order);
> > + struct page *used = page + nr_pages;
> >
> > - split_page(virt_to_page((void *)addr), order);
> > + split_page(page, order);
> > while (used < alloc_end) {
> > - free_page(used);
> > - used += PAGE_SIZE;
> > + __free_page(used);
> > + used++;
> > }
> > }
> >
> > - return (void *)addr;
> > + return page;
> > +}
> > +EXPORT_SYMBOL(__alloc_pages_exact);
> > +
> > +/**
> > + * __free_pages_exact - release memory allocated via __alloc_pages_exact()
> > + * @virt: the value returned by get_free_pages_exact.
> > + * @nr_pages: size in pages, same value as passed to __alloc_pages_exact().
> > + *
> > + * Release the memory allocated by a previous call to __alloc_pages_exact().
> > + */
> > +void __free_pages_exact(struct page *page, size_t nr_pages)
> > +{
> > + struct page *end = page + nr_pages;
> > +
> > + while (page < end) {
>
> Hand-optimised. Old school. Doesn't trust the compiler :)

Hey, ask the dude who put free_pages_exact() in there! :)

> > + __free_page(page);
> > + page++;
> > + }
> > +}
> > +EXPORT_SYMBOL(__free_pages_exact);
>
> Really, this function duplicates release_pages(). release_pages() is
> big and fat and complex and is a crime against uniprocessor but it does
> make some effort to reduce the spinlocking frequency and in many
> situations, release_pages() will cause vastly less locked bus traffic
> than your __free_pages_exact(). And who knows, smart use of
> release_pages()'s "cold" hint may provide some benefits.

Seems like a decent enough thing to try. I'll give it a shot and make
sure it's OK to use.

-- Dave

2011-04-11 22:42:43

by Timur Tabi

[permalink] [raw]
Subject: Re: [PATCH 2/3] make new alloc_pages_exact()

Dave Hansen wrote:
>> > Hand-optimised. Old school. Doesn't trust the compiler :)

> Hey, ask the dude who put free_pages_exact() in there! :)

Ugh, I don't remember at all why I wrote it the way I did. I'm pretty sure I
copied the style from somewhere else, since these functions are not really my
area of expertise.

When you guys finally agree on the order of the parameters :) I'll test out the
functions on my board.

--
Timur Tabi
Linux kernel developer at Freescale

2011-04-12 10:28:26

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCH 2/3] make new alloc_pages_exact()

> Dave Hansen <[email protected]> wrote:
>> +void __free_pages_exact(struct page *page, size_t nr_pages)
>> +{
>> + struct page *end = page + nr_pages;
>> +
>> + while (page < end) {
>> + __free_page(page);
>> + page++;
>> + }
>> +}
>> +EXPORT_SYMBOL(__free_pages_exact);

On Tue, 12 Apr 2011 00:22:23 +0200, Andrew Morton wrote:
> Really, this function duplicates release_pages().

It requires an array of pointers to pages which is not great though if one
just wants to free a contiguous sequence of pages.

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +-----<email/xmpp: [email protected]>-----ooO--(_)--Ooo--

2011-04-12 10:29:57

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCH 3/3] reuse __free_pages_exact() in __alloc_pages_exact()

On Tue, 12 Apr 2011 00:03:48 +0200, Dave Hansen <[email protected]>
wrote:
> diff -puN mm/page_alloc.c~reuse-free-exact mm/page_alloc.c
> --- linux-2.6.git/mm/page_alloc.c~reuse-free-exact 2011-04-11
> 15:01:17.701822598 -0700
> +++ linux-2.6.git-dave/mm/page_alloc.c 2011-04-11 15:01:17.713822594
> -0700
> @@ -2338,14 +2338,11 @@ struct page *__alloc_pages_exact(gfp_t g
> page = alloc_pages(gfp_mask, order);
> if (page) {
> - struct page *alloc_end = page + (1 << order);
> - struct page *used = page + nr_pages;
> + struct page *unused_start = page + nr_pages;
> + int nr_unused = (1 << order) - nr_pages;

How about unsigned long?

> split_page(page, order);
> - while (used < alloc_end) {
> - __free_page(used);
> - used++;
> - }
> + __free_pages_exact(unused_start, nr_unused);
> }
> return page;

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +-----<email/xmpp: [email protected]>-----ooO--(_)--Ooo--

2011-04-12 15:04:35

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 2/3] make new alloc_pages_exact()

On Tue, 2011-04-12 at 12:28 +0200, Michal Nazarewicz wrote:
> > Dave Hansen <[email protected]> wrote:
> >> +void __free_pages_exact(struct page *page, size_t nr_pages)
> >> +{
> >> + struct page *end = page + nr_pages;
> >> +
> >> + while (page < end) {
> >> + __free_page(page);
> >> + page++;
> >> + }
> >> +}
> >> +EXPORT_SYMBOL(__free_pages_exact);
>
> On Tue, 12 Apr 2011 00:22:23 +0200, Andrew Morton wrote:
> > Really, this function duplicates release_pages().
>
> It requires an array of pointers to pages which is not great though if one
> just wants to free a contiguous sequence of pages.

Actually, the various mem_map[]s _are_ arrays, at least up to
MAX_ORDER_NR_PAGES at a time. We can use that property here.

-- Dave

2011-04-12 15:24:37

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 3/3] reuse __free_pages_exact() in __alloc_pages_exact()

On Tue, 2011-04-12 at 12:29 +0200, Michal Nazarewicz wrote:
> On Tue, 12 Apr 2011 00:03:48 +0200, Dave Hansen <[email protected]>
> wrote:
> > diff -puN mm/page_alloc.c~reuse-free-exact mm/page_alloc.c
> > --- linux-2.6.git/mm/page_alloc.c~reuse-free-exact 2011-04-11
> > 15:01:17.701822598 -0700
> > +++ linux-2.6.git-dave/mm/page_alloc.c 2011-04-11 15:01:17.713822594
> > -0700
> > @@ -2338,14 +2338,11 @@ struct page *__alloc_pages_exact(gfp_t g
> > page = alloc_pages(gfp_mask, order);
> > if (page) {
> > - struct page *alloc_end = page + (1 << order);
> > - struct page *used = page + nr_pages;
> > + struct page *unused_start = page + nr_pages;
> > + int nr_unused = (1 << order) - nr_pages;
>
> How about unsigned long?

Personally, I'd rather leave this up to the poor sucker that tries to
set MAX_ORDER to 33. If someone did that, we'd end up with kernels that
couldn't even boot on systems with less than 16GB of RAM since the
(required) flatmem mem_map[] would take up ~14.3GB. They couldn't
handle memory holes and couldn't be NUMA-aware, either.

So, if someone had a system like that, fixed up all the other spots
where we store numbers of pages in ints, and then did an 8TB+4k
allocation, yes, this would matter. I'd rather save the 10 bytes of
source code and 4 bytes of stack than account for such an impossibly
improbable system.

-- Dave

2011-04-12 15:57:27

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCH 3/3] reuse __free_pages_exact() in __alloc_pages_exact()

On Tue, 12 Apr 2011 17:24:24 +0200, Dave Hansen <[email protected]>
wrote:

> On Tue, 2011-04-12 at 12:29 +0200, Michal Nazarewicz wrote:
>> On Tue, 12 Apr 2011 00:03:48 +0200, Dave Hansen
>> <[email protected]>
>> wrote:
>> > diff -puN mm/page_alloc.c~reuse-free-exact mm/page_alloc.c
>> > --- linux-2.6.git/mm/page_alloc.c~reuse-free-exact 2011-04-11
>> > 15:01:17.701822598 -0700
>> > +++ linux-2.6.git-dave/mm/page_alloc.c 2011-04-11 15:01:17.713822594
>> > -0700
>> > @@ -2338,14 +2338,11 @@ struct page *__alloc_pages_exact(gfp_t g
>> > page = alloc_pages(gfp_mask, order);
>> > if (page) {
>> > - struct page *alloc_end = page + (1 << order);
>> > - struct page *used = page + nr_pages;
>> > + struct page *unused_start = page + nr_pages;
>> > + int nr_unused = (1 << order) - nr_pages;
>>
>> How about unsigned long?
>
> Personally, I'd rather leave this up to the poor sucker that tries to
> set MAX_ORDER to 33. If someone did that, we'd end up with kernels that
> couldn't even boot on systems with less than 16GB of RAM since the
> (required) flatmem mem_map[] would take up ~14.3GB. They couldn't
> handle memory holes and couldn't be NUMA-aware, either.

I was thinking more about the fact that the int will get converted
anyway when calling __free_pages_exact() and it makes no sense for
number of pages to be negative. Just a suggestion, no strong
feelings.

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +-----<email/xmpp: [email protected]>-----ooO--(_)--Ooo--

2011-04-12 15:59:04

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCH 2/3] make new alloc_pages_exact()

On Tue, 12 Apr 2011 17:04:13 +0200, Dave Hansen <[email protected]>
wrote:

> On Tue, 2011-04-12 at 12:28 +0200, Michal Nazarewicz wrote:
>> > Dave Hansen <[email protected]> wrote:
>> >> +void __free_pages_exact(struct page *page, size_t nr_pages)
>> >> +{
>> >> + struct page *end = page + nr_pages;
>> >> +
>> >> + while (page < end) {
>> >> + __free_page(page);
>> >> + page++;
>> >> + }
>> >> +}
>> >> +EXPORT_SYMBOL(__free_pages_exact);
>>
>> On Tue, 12 Apr 2011 00:22:23 +0200, Andrew Morton wrote:
>> > Really, this function duplicates release_pages().
>>
>> It requires an array of pointers to pages which is not great though if
>> one
>> just wants to free a contiguous sequence of pages.
>
> Actually, the various mem_map[]s _are_ arrays, at least up to
> MAX_ORDER_NR_PAGES at a time. We can use that property here.

In that case, waiting eagerly for the new patch. :)

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +-----<email/xmpp: [email protected]>-----ooO--(_)--Ooo--

2011-04-12 17:08:06

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 1/3] rename alloc_pages_exact()

Dave Hansen <[email protected]> writes:

> alloc_pages_exact() returns a virtual address. But, alloc_pages() returns
> a 'struct page *'. That makes for very confused kernel hackers.
>
> __get_free_pages(), on the other hand, returns virtual addresses. That
> makes alloc_pages_exact() a much closer match to __get_free_pages(), so
> rename it to get_free_pages_exact(). Also change the arguments to have
> flags first, just like __get_free_pages().
>
> Note that alloc_pages_exact()'s partner, free_pages_exact() already
> matches free_pages(), so we do not have to touch the free side of things.
>
> Signed-off-by: Dave Hansen <[email protected]>
> Acked-by: Andi Kleen <[email protected]>
> Acked-by: David Rientjes <[email protected]>
> ---
>
> linux-2.6.git-dave/drivers/video/fsl-diu-fb.c | 2 +-
> linux-2.6.git-dave/drivers/video/mxsfb.c | 2 +-
> linux-2.6.git-dave/drivers/video/pxafb.c | 6 +++---
> linux-2.6.git-dave/drivers/virtio/virtio_pci.c | 2 +-
> linux-2.6.git-dave/include/linux/gfp.h | 2 +-
> linux-2.6.git-dave/kernel/profile.c | 4 ++--
> linux-2.6.git-dave/mm/page_alloc.c | 18 +++++++++---------
> linux-2.6.git-dave/mm/page_cgroup.c | 2 +-
> 8 files changed, 19 insertions(+), 19 deletions(-)
>
> diff -puN drivers/video/fsl-diu-fb.c~change-alloc_pages_exact-name drivers/video/fsl-diu-fb.c
> --- linux-2.6.git/drivers/video/fsl-diu-fb.c~change-alloc_pages_exact-name 2011-04-11 15:01:16.453823153 -0700
> +++ linux-2.6.git-dave/drivers/video/fsl-diu-fb.c 2011-04-11 15:01:16.489823137 -0700
> @@ -294,7 +294,7 @@ static void *fsl_diu_alloc(size_t size,
>
> pr_debug("size=%zu\n", size);
>
> - virt = alloc_pages_exact(size, GFP_DMA | __GFP_ZERO);
> + virt = get_free_pages_exact(GFP_DMA | __GFP_ZERO, size);
> if (virt) {
> *phys = virt_to_phys(virt);
> pr_debug("virt=%p phys=%llx\n", virt,
> diff -puN drivers/video/mxsfb.c~change-alloc_pages_exact-name drivers/video/mxsfb.c
> --- linux-2.6.git/drivers/video/mxsfb.c~change-alloc_pages_exact-name 2011-04-11 15:01:16.457823151 -0700
> +++ linux-2.6.git-dave/drivers/video/mxsfb.c 2011-04-11 15:01:16.489823137 -0700
> @@ -718,7 +718,7 @@ static int __devinit mxsfb_init_fbinfo(s
> } else {
> if (!fb_size)
> fb_size = SZ_2M; /* default */
> - fb_virt = alloc_pages_exact(fb_size, GFP_DMA);
> + fb_virt = get_free_pages_exact(GFP_DMA, fb_size);
> if (!fb_virt)
> return -ENOMEM;
>
> diff -puN drivers/video/pxafb.c~change-alloc_pages_exact-name drivers/video/pxafb.c
> --- linux-2.6.git/drivers/video/pxafb.c~change-alloc_pages_exact-name 2011-04-11 15:01:16.461823149 -0700
> +++ linux-2.6.git-dave/drivers/video/pxafb.c 2011-04-11 15:01:16.493823135 -0700
> @@ -905,8 +905,8 @@ static int __devinit pxafb_overlay_map_v
> /* We assume that user will use at most video_mem_size for overlay fb,
> * anyway, it's useless to use 16bpp main plane and 24bpp overlay
> */
> - ofb->video_mem = alloc_pages_exact(PAGE_ALIGN(pxafb->video_mem_size),
> - GFP_KERNEL | __GFP_ZERO);
> + ofb->video_mem = get_free_pages_exact(GFP_KERNEL | __GFP_ZERO,
> + PAGE_ALIGN(pxafb->video_mem_size));
> if (ofb->video_mem == NULL)
> return -ENOMEM;
>
> @@ -1714,7 +1714,7 @@ static int __devinit pxafb_init_video_me
> {
> int size = PAGE_ALIGN(fbi->video_mem_size);
>
> - fbi->video_mem = alloc_pages_exact(size, GFP_KERNEL | __GFP_ZERO);
> + fbi->video_mem = get_free_pages_exact(GFP_KERNEL | __GFP_ZERO, size);
> if (fbi->video_mem == NULL)
> return -ENOMEM;
>
> diff -puN drivers/virtio/virtio_pci.c~change-alloc_pages_exact-name drivers/virtio/virtio_pci.c
> --- linux-2.6.git/drivers/virtio/virtio_pci.c~change-alloc_pages_exact-name 2011-04-11 15:01:16.465823147 -0700
> +++ linux-2.6.git-dave/drivers/virtio/virtio_pci.c 2011-04-11 15:01:16.493823135 -0700
> @@ -385,7 +385,7 @@ static struct virtqueue *setup_vq(struct
> info->msix_vector = msix_vec;
>
> size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN));
> - info->queue = alloc_pages_exact(size, GFP_KERNEL|__GFP_ZERO);
> + info->queue = get_free_pages_exact(GFP_KERNEL|__GFP_ZERO, size);
> if (info->queue == NULL) {
> err = -ENOMEM;
> goto out_info;
> diff -puN include/linux/gfp.h~change-alloc_pages_exact-name include/linux/gfp.h
> --- linux-2.6.git/include/linux/gfp.h~change-alloc_pages_exact-name 2011-04-11 15:01:16.469823145 -0700
> +++ linux-2.6.git-dave/include/linux/gfp.h 2011-04-11 15:01:16.493823135 -0700
> @@ -351,7 +351,7 @@ extern struct page *alloc_pages_vma(gfp_
> extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
> extern unsigned long get_zeroed_page(gfp_t gfp_mask);
>
> -void *alloc_pages_exact(size_t size, gfp_t gfp_mask);
> +void *get_free_pages_exact(gfp_t gfp_mask, size_t size);
> void free_pages_exact(void *virt, size_t size);
>
> #define __get_free_page(gfp_mask) \
> diff -puN kernel/profile.c~change-alloc_pages_exact-name kernel/profile.c
> --- linux-2.6.git/kernel/profile.c~change-alloc_pages_exact-name 2011-04-11 15:01:16.473823143 -0700
> +++ linux-2.6.git-dave/kernel/profile.c 2011-04-11 15:01:16.497823133 -0700
> @@ -121,8 +121,8 @@ int __ref profile_init(void)
> if (prof_buffer)
> return 0;
>
> - prof_buffer = alloc_pages_exact(buffer_bytes,
> - GFP_KERNEL|__GFP_ZERO|__GFP_NOWARN);
> + prof_buffer = get_free_pages_exact(GFP_KERNEL|__GFP_ZERO|__GFP_NOWARN,
> + buffer_bytes);
> if (prof_buffer)
> return 0;
>
> diff -puN mm/page_alloc.c~change-alloc_pages_exact-name mm/page_alloc.c
> --- linux-2.6.git/mm/page_alloc.c~change-alloc_pages_exact-name 2011-04-11 15:01:16.477823141 -0700
> +++ linux-2.6.git-dave/mm/page_alloc.c 2011-04-11 15:01:16.501823131 -0700
> @@ -2318,7 +2318,7 @@ void free_pages(unsigned long addr, unsi
> EXPORT_SYMBOL(free_pages);
>
> /**
> - * alloc_pages_exact - allocate an exact number physically-contiguous pages.
> + * get_free_pages_exact - allocate an exact number physically-contiguous pages.
> * @size: the number of bytes to allocate
> * @gfp_mask: GFP flags for the allocation
> *
> @@ -2330,7 +2330,7 @@ EXPORT_SYMBOL(free_pages);
> *
> * Memory allocated by this function must be released by free_pages_exact().
> */
> -void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
> +void *get_free_pages_exact(gfp_t gfp_mask, size_t size)
> {
> unsigned int order = get_order(size);
> unsigned long addr;
> @@ -2349,14 +2349,14 @@ void *alloc_pages_exact(size_t size, gfp
>
> return (void *)addr;
> }
> -EXPORT_SYMBOL(alloc_pages_exact);
> +EXPORT_SYMBOL(get_free_pages_exact);
>
> /**
> - * free_pages_exact - release memory allocated via alloc_pages_exact()
> - * @virt: the value returned by alloc_pages_exact.
> - * @size: size of allocation, same value as passed to alloc_pages_exact().
> + * free_pages_exact - release memory allocated via get_free_pages_exact()
> + * @virt: the value returned by get_free_pages_exact.
> + * @size: size of allocation, same value as passed to get_free_pages_exact().
> *
> - * Release the memory allocated by a previous call to alloc_pages_exact.
> + * Release the memory allocated by a previous call to get_free_pages_exact().
> */
> void free_pages_exact(void *virt, size_t size)
> {
> @@ -5308,10 +5308,10 @@ void *__init alloc_large_system_hash(con
> /*
> * If bucketsize is not a power-of-two, we may free
> * some pages at the end of hash table which
> - * alloc_pages_exact() automatically does
> + * get_free_pages_exact() automatically does
> */
> if (get_order(size) < MAX_ORDER) {
> - table = alloc_pages_exact(size, GFP_ATOMIC);
> + table = get_free_pages_exact(size, GFP_ATOMIC);

This should be table = get_free_pages_exact(GFP_ATOMIC, size);

Thanks.


> kmemleak_alloc(table, size, 1, GFP_ATOMIC);
> }
> }
> diff -puN mm/page_cgroup.c~change-alloc_pages_exact-name mm/page_cgroup.c
> --- linux-2.6.git/mm/page_cgroup.c~change-alloc_pages_exact-name 2011-04-11 15:01:16.481823139 -0700
> +++ linux-2.6.git-dave/mm/page_cgroup.c 2011-04-11 15:01:16.501823131 -0700
> @@ -134,7 +134,7 @@ static void *__init_refok alloc_page_cgr
> {
> void *addr = NULL;
>
> - addr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN);
> + addr = get_free_pages_exact(GFP_KERNEL | __GFP_NOWARN, size);
> if (addr)
> return addr;
>
> _
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2011-04-12 17:11:53

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 1/3] rename alloc_pages_exact()

On Wed, 2011-04-13 at 02:07 +0900, Namhyung Kim wrote:
> > if (get_order(size) < MAX_ORDER) {
> > - table = alloc_pages_exact(size, GFP_ATOMIC);
> > + table = get_free_pages_exact(size, GFP_ATOMIC);
>
> This should be table = get_free_pages_exact(GFP_ATOMIC, size);

It's clear that I'm retarded and need to learn how to use sparse. :)

Thanks!

-- Dave

2011-04-13 23:23:37

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 2/3] make new alloc_pages_exact()

On Tue, 2011-04-12 at 17:58 +0200, Michal Nazarewicz wrote:
> > Actually, the various mem_map[]s _are_ arrays, at least up to
> > MAX_ORDER_NR_PAGES at a time. We can use that property here.
>
> In that case, waiting eagerly for the new patch. :)

I misunderstood earlier. release_pages() takes an array of 'struct page
*', not an array of 'struct page'. To use it here, we'd need to
construct temporary arrays. If we're going to do that, we should
probably just use pagevecs, and if we're going to do _that_, we don't
need release_pages().

Nobody calls free_pages_exact() in any kind of hot path these days.
Most of the users are like kernel profiling where they *never* free the
memory. I'm not sure it's worth the complexity to optimize this.

-- Dave