alloc_pages_exact() is similar to alloc_pages(), except that it allocates
the minimum number of pages to fulfill the request. This is useful if you
want to allocate a very large buffer that is slightly larger than an
even power-of-two number of pages. In that case, alloc_pages() will waste
a lot of memory.
Signed-off-by: Timur Tabi <[email protected]>
---
I have a video driver that wants to allocate a 5MB buffer. alloc_pages()
will waste 3MB of physically-contiguous memory. Therefore, I would
like to see alloc_pages_exact() added to 2.6.27.
Please note that I am not a Linux VM expert. I wrote these functions based
on guidance from Andi Kleen. I have no familiarity with NUMA, so I don't know
how to handle that. Any and all suggestions are welcome.
include/linux/gfp.h | 3 ++
mm/page_alloc.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 56 insertions(+), 0 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b414be3..1054801 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -215,6 +215,9 @@ extern struct page *alloc_page_vma(gfp_t gfp_mask,
extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
extern unsigned long get_zeroed_page(gfp_t gfp_mask);
+void *alloc_pages_exact(size_t size, gfp_t gfp_mask);
+void free_pages_exact(void *virt, size_t size);
+
#define __get_free_page(gfp_mask) \
__get_free_pages((gfp_mask),0)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2f55295..08bf9d7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1711,6 +1711,59 @@ void free_pages(unsigned long addr, unsigned int order)
EXPORT_SYMBOL(free_pages);
+/**
+ * alloc_pages_exact - allocate an exact number physically-contiguous pages.
+ * @size: the number of bytes to allocate
+ * @gfp_mask: GFP flags for the allocation
+ *
+ * This function is similar to alloc_pages(), except that it allocates the
+ * minimum number of pages to satisfy the request. alloc_pages() can only
+ * allocate memory in power-of-two pages.
+ *
+ * This function is also limited by MAX_ORDER.
+ *
+ * Memory allocated by this function must be released by free_pages_exact().
+ */
+void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
+{
+ unsigned int order = get_order(size);
+ unsigned long addr;
+
+ addr = __get_free_pages(gfp_mask, order);
+ if (addr) {
+ unsigned long alloc_end = addr + (PAGE_SIZE << order);
+ unsigned long used = addr + PAGE_ALIGN(size);
+
+ split_page(virt_to_page(addr), order);
+ while (used < alloc_end) {
+ free_page(used);
+ used += PAGE_SIZE;
+ }
+ }
+
+ return (void *)addr;
+}
+EXPORT_SYMBOL(alloc_pages_exact);
+
+/**
+ * free_pages_exact - release memory allocated via alloc_pages_exact()
+ * @virt: the value returned by alloc_pages_exact.
+ * @size: size of allocation, same value as passed to alloc_pages_exact().
+ *
+ * Release the memory allocated by a previous call to alloc_pages_exact.
+ */
+void free_pages_exact(void *virt, size_t size)
+{
+ unsigned long addr = (unsigned long)virt;
+ unsigned long end = addr + PAGE_ALIGN(size);
+
+ while (addr < end) {
+ free_page(addr);
+ addr += PAGE_SIZE;
+ }
+}
+EXPORT_SYMBOL(free_pages_exact);
+
static unsigned int nr_free_zone_pages(int offset)
{
struct zoneref *z;
--
1.5.5
On Tue, 2008-06-24 at 11:40 -0500, Timur Tabi wrote:
> +void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
> +{
> + unsigned int order = get_order(size);
> + unsigned long addr;
> +
> + addr = __get_free_pages(gfp_mask, order);
> + if (addr) {
> + unsigned long alloc_end = addr + (PAGE_SIZE << order);
> + unsigned long used = addr + PAGE_ALIGN(size);
> +
> + split_page(virt_to_page(addr), order);
> + while (used < alloc_end) {
> + free_page(used);
> + used += PAGE_SIZE;
> + }
> + }
> +
> + return (void *)addr;
> +}
Hi Timur,
This looks like a really good idea. It looks pretty good to me, no
functional problems. My brain had a really hard time parsing that code
for some reason, though. Could just be a lack of coffee.
I think the thing that confused me was trying to figure out if
'alloc_end' was the end of what we *did* allocate from
__get_free_pages() or if it was the *goal* allocation end.
'used' also seemed like a slightly strange variable name because it
points to the memory which is about to be freed and ends up *unused*.
I'll offer this up just in case you like it better. For me, it is
easier to parse, and should do the exact same thing. I also think it's
slightly nicer to do the arithmetic on 'struct page *' rather than
vaddrs in 'unsigned long'. It is _slightly_ cheaper not having to do a
virt_to_page() on each free_page() call. The same would go for the free
side as well.
All of the 'struct page *' arithmetic is OK since it is all done inside
one MAX_ORDER area.
void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
{
unsigned int order = get_order(size);
void *alloc;
struct page *surplus_start;
struct page *surplus_end;
struct page *page;
size = PAGE_ALIGN(size);
alloc = (void *)__get_free_pages(gfp_mask, order);
if (!alloc)
return NULL;
/* Turn the big allocation into a bunch of single pages */
split_page(virt_to_page(alloc), order);
surplus_start = virt_to_page(alloc + size);
surplus_end = surplus_start + (1 << order);
for (page = surplus_start; page < surplus_end; page++)
__free_page(page);
return alloc;
}
-- Dave
Dave Hansen wrote:
> This looks like a really good idea. It looks pretty good to me, no
> functional problems.
Thanks.
> I think the thing that confused me was trying to figure out if
> 'alloc_end' was the end of what we *did* allocate from
> __get_free_pages() or if it was the *goal* allocation end.
>
> 'used' also seemed like a slightly strange variable name because it
> points to the memory which is about to be freed and ends up *unused*.
This function is taken almost verbatim from alloc_large_system_hash(). I
figured if the terminology was good for that function, it's good for mine.
> I'll offer this up just in case you like it better. For me, it is
> easier to parse, and should do the exact same thing. I also think it's
> slightly nicer to do the arithmetic on 'struct page *' rather than
> vaddrs in 'unsigned long'. It is _slightly_ cheaper not having to do a
> virt_to_page() on each free_page() call. The same would go for the free
> side as well.
It does seem to be an improvement, although AKPM just accepted this patch. I'd
hate to bother him with a replacement patch for something so minor.
--
Timur Tabi
Linux kernel developer at Freescale