2010-12-13 11:27:12

by Michal Nazarewicz

Subject: [PATCHv7 00/10] Contiguous Memory Allocator

Hello everyone,

This is yet another version of CMA, this time stripped of a lot of
code and with a working migration implementation.

The Contiguous Memory Allocator (CMA) makes it possible for
device drivers to allocate big contiguous chunks of memory after
the system has booted.

For more information see 8th patch in the set.
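
For the impatient, here is a minimal, hypothetical sketch of the
driver-side flow using the API from that patch (error handling
trimmed; "cma_ctx" stands for whatever CMA context the platform
code hands to the driver):

	struct cm *chunk;
	unsigned long phys;

	/* 2 MiB chunk, aligned to 1 MiB */
	chunk = cm_alloc(cma_ctx, 2 << 20, 1 << 20);
	if (IS_ERR(chunk))
		return PTR_ERR(chunk);

	phys = cm_pin(chunk);	/* physical address, valid while pinned */
	/* ... program the hardware with "phys", run the transfer ... */
	cm_unpin(chunk);	/* chunk may be moved until pinned again */

	cm_free(chunk);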


The current version is just an allocator that handles allocation of
contiguous memory blocks. The differences between this patchset and
Kamezawa's alloc_contig_pages() are:

1. alloc_contig_pages() requires MAX_ORDER alignment of allocations,
which may be unsuitable for embedded systems where only a few MiB
are required.

2. CMA uses its own migratetype (MIGRATE_CMA), as was suggested
during one of the previous iterations. The migrate type behaves
similarly to ZONE_MOVABLE but can be put in arbitrary places.

This is required for us since we need to define two disjoint memory
ranges inside the system's RAM (i.e. in two memory banks; not to be
confused with nodes).

3. alloc_contig_pages() scans memory in search of a range that could
be migrated. CMA, on the other hand, maintains its own allocator to
decide where to allocate memory for device drivers and then tries
to migrate pages from that part if needed. This is not strictly
required, but I somehow feel it might be faster.


Links to previous versions of the patchset:
v6: <http://article.gmane.org/gmane.linux.kernel.mm/55626/>
v5: (intentionally left out as CMA v5 was identical to CMA v4)
v4: <http://article.gmane.org/gmane.linux.kernel.mm/52010/>
v3: <http://article.gmane.org/gmane.linux.kernel.mm/51573/>
v2: <http://article.gmane.org/gmane.linux.kernel.mm/50986/>
v1: <http://article.gmane.org/gmane.linux.kernel.mm/50669/>


Changelog:

v7: 1. A lot of functionality that handled driver->allocator_context
mapping has been removed from the patchset. This is not to say
that this code is not needed, it's just not worth posting
everything in one patchset.

Currently, CMA is "just" an allocator. It uses its own
migratetype (MIGRATE_CMA) for defining ranges of pageblocks
which behave just like ZONE_MOVABLE but, unlike the latter, can
be put in arbitrary places.

2. The migration code that was introduced in the previous version
actually started working.


v6: 1. Most importantly, v6 introduces support for memory migration.
The implementation is not yet complete though.

Migration support means that when CMA is not using memory
reserved for it, the page allocator can allocate pages from it.
When CMA wants to use the memory, the pages have to be moved
and/or evicted so as to make room for CMA.

To make this possible, it must be guaranteed that only movable and
reclaimable pages are allocated in CMA-controlled regions.
This is done by introducing a MIGRATE_CMA migrate type that
guarantees exactly that.

Some of the migration code is "borrowed" from Kamezawa
Hiroyuki's alloc_contig_pages() implementation. The main
difference is that, thanks to the MIGRATE_CMA migrate type, CMA
assumes that memory controlled by CMA is always movable or
reclaimable, so it makes allocation decisions regardless of
whether some pages are actually allocated and migrates them
if needed.

The most interesting patches from the patchset that implement
the functionality are:

09/13: mm: alloc_contig_free_pages() added
10/13: mm: MIGRATE_CMA migration type added
11/13: mm: MIGRATE_CMA isolation functions added
12/13: mm: cma: Migration support added [wip]

Currently, the kernel panics in some situations, which I am trying
to investigate.

2. cma_pin() and cma_unpin() functions have been added (after
a conversation with Johan Mossberg). The idea is that whenever
hardware does not use the memory (no transaction is in progress)
the chunk can be moved around. This would allow defragmentation
to be implemented if desired. No defragmentation algorithm is
provided at this time.

3. Sysfs support has been replaced with debugfs. I always felt
unsure about the sysfs interface, and when Greg KH pointed it
out I finally got around to rewriting it to use debugfs.


v5: (intentionally left out as CMA v5 was identical to CMA v4)


v4: 1. The "asterisk" flag has been removed in favour of requiring
that the platform provide a "*=<regions>" rule in the map
attribute.

2. The terminology has been changed slightly, renaming "kind" to
"type" of memory. In the previous revisions, the documentation
indicated that device drivers define memory kinds; now they
define memory types.

v3: 1. The command line parameters have been removed (and moved to
a separate patch, the fourth one). As a consequence, the
cma_set_defaults() function has been changed -- it no longer
accepts a string with a list of regions but an array of regions.

2. The "asterisk" attribute has been removed. Now, each region
has an "asterisk" flag which lets one specify whether this
region should by considered "asterisk" region.

3. SysFS support has been moved to a separate patch (the third one
in the series) and now also includes list of regions.

v2: 1. The "cma_map" command line parameter has been removed. In
exchange, a SysFS entry has been created under kernel/mm/contiguous.

The intended way of specifying the attributes is
a cma_set_defaults() function called by platform initialisation
code. The "regions" attribute (the string specified by the "cma"
command line parameter) can be overwritten with a command line
parameter; the other attributes can be changed at run-time
using the SysFS entries.

2. The behaviour of the "map" attribute has been modified
slightly. Currently, if no rule matches a given device, it is
assigned the regions specified by the "asterisk" attribute. It is
by default built from the region names given in the "regions"
attribute.

3. Devices can register private regions as well as regions that
can be shared but are not reserved using standard CMA
mechanisms. A private region has no name and can be accessed
only by devices that have the pointer to it.

4. The way allocators are registered has changed. Currently,
a cma_allocator_register() function is used for that purpose.
Moreover, allocators are attached to regions the first time
memory is registered from the region or when the allocator is
registered, which means that allocators can be dynamic modules
that are loaded after the kernel has booted (of course, it won't
be possible to allocate a chunk of memory from a region if the
allocator is not loaded).

5. Index of new functions:

+static inline dma_addr_t __must_check
+cma_alloc_from(const char *regions, size_t size,
+ dma_addr_t alignment)

+static inline int
+cma_info_about(struct cma_info *info, const char *regions)

+int __must_check cma_region_register(struct cma_region *reg);

+dma_addr_t __must_check
+cma_alloc_from_region(struct cma_region *reg,
+ size_t size, dma_addr_t alignment);

+static inline dma_addr_t __must_check
+cma_alloc_from(const char *regions,
+ size_t size, dma_addr_t alignment);

+int cma_allocator_register(struct cma_allocator *alloc);


Patches in this patchset:

mm: migrate.c: fix compilation error

I had some strange compilation errors; this patch fixes them.

lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()
lib: genalloc: Generic allocator improvements

Some improvements to the genalloc API (most importantly, the ability
to allocate memory with an alignment requirement).

mm: move some functions from memory_hotplug.c to page_isolation.c
mm: alloc_contig_free_pages() added

Code "stolen" from Kamezawa. The first patch just moves code
around and the second provide function for "allocates" already
freed memory.

mm: MIGRATE_CMA migration type added
mm: MIGRATE_CMA isolation functions added

Introduction of a new migratetype.

mm: cma: Contiguous Memory Allocator added

The CMA code.

mm: cma: Test device and application added

Test device and application. Not really for merging; just for
testing. Maybe the whole thing should be moved to tools?

ARM: cma: Added CMA to Aquila, Goni and c210 universal boards

A stub integration with some ARM machines. Mostly to get the cma
testing device working. Again, not for merging.

arch/arm/mach-s5pv210/mach-aquila.c | 2 +
arch/arm/mach-s5pv210/mach-goni.c | 2 +
arch/arm/mach-s5pv310/mach-universal_c210.c | 2 +
arch/arm/plat-s5p/Makefile | 2 +
arch/arm/plat-s5p/cma-stub.c | 49 +++
arch/arm/plat-s5p/include/plat/cma-stub.h | 21 ++
drivers/misc/Kconfig | 10 +
drivers/misc/Makefile | 1 +
drivers/misc/cma-dev.c | 238 +++++++++++++
include/linux/bitmap.h | 24 +-
include/linux/cma.h | 252 ++++++++++++++
include/linux/genalloc.h | 46 ++--
include/linux/mmzone.h | 30 ++-
include/linux/page-isolation.h | 47 ++-
lib/bitmap.c | 22 +-
lib/genalloc.c | 182 ++++++-----
mm/Kconfig | 40 +++
mm/Makefile | 1 +
mm/cma.c | 477 +++++++++++++++++++++++++++
mm/compaction.c | 10 +
mm/internal.h | 3 +
mm/memory_hotplug.c | 108 ------
mm/migrate.c | 2 +
mm/page_alloc.c | 145 +++++++--
mm/page_isolation.c | 126 +++++++-
tools/cma/cma-test.c | 466 ++++++++++++++++++++++++++
26 files changed, 2040 insertions(+), 268 deletions(-)
create mode 100644 arch/arm/plat-s5p/cma-stub.c
create mode 100644 arch/arm/plat-s5p/include/plat/cma-stub.h
create mode 100644 drivers/misc/cma-dev.c
create mode 100644 include/linux/cma.h
create mode 100644 mm/cma.c
create mode 100644 tools/cma/cma-test.c

--
1.7.2.3


2010-12-13 11:27:07

by Michal Nazarewicz

Subject: [PATCHv7 03/10] lib: genalloc: Generic allocator improvements

This commit adds a gen_pool_alloc_aligned() function to the
generic allocator API. It allows specifying alignment for the
allocated block. This feature uses
the bitmap_find_next_zero_area_off() function.
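
As a rough illustration (not part of the patch; "reserved_phys" is
an assumed, already-reserved and page-aligned physical address):

	struct gen_pool *pool;
	unsigned long addr;

	pool = gen_pool_create(PAGE_SHIFT, -1);	/* one bit per page */
	if (!pool)
		return -ENOMEM;

	if (gen_pool_add(pool, reserved_phys, 1 << 20, -1)) {
		gen_pool_destroy(pool);
		return -ENOMEM;
	}

	/* 64 KiB chunk aligned to 2^16 bytes */
	addr = gen_pool_alloc_aligned(pool, 64 << 10, 16);
	if (addr)
		gen_pool_free(pool, addr, 64 << 10);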

It also fixes a possible issue with the bitmap's last element not
being fully allocated (i.e. when the space allocated for chunk->bits
is not a multiple of sizeof(long)).

It also makes some other smaller changes:
- moves structure definitions out of the header file,
- adds __must_check to functions returning a value,
- makes gen_pool_add() return -ENOMEM rather than -1 on error,
- changes list_for_each to list_for_each_entry, and
- makes use of bitmap_clear().

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/genalloc.h | 46 ++++++------
lib/genalloc.c | 182 ++++++++++++++++++++++++++-------------------
2 files changed, 129 insertions(+), 99 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 9869ef3..8ac7337 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -8,29 +8,31 @@
* Version 2. See the file COPYING for more details.
*/

+struct gen_pool;

-/*
- * General purpose special memory pool descriptor.
- */
-struct gen_pool {
- rwlock_t lock;
- struct list_head chunks; /* list of chunks in this pool */
- int min_alloc_order; /* minimum allocation order */
-};
+struct gen_pool *__must_check gen_pool_create(unsigned order, int nid);

-/*
- * General purpose special memory pool chunk descriptor.
+int __must_check gen_pool_add(struct gen_pool *pool, unsigned long addr,
+ size_t size, int nid);
+
+void gen_pool_destroy(struct gen_pool *pool);
+
+unsigned long __must_check
+gen_pool_alloc_aligned(struct gen_pool *pool, size_t size,
+ unsigned alignment_order);
+
+/**
+ * gen_pool_alloc() - allocate special memory from the pool
+ * @pool: Pool to allocate from.
+ * @size: Number of bytes to allocate from the pool.
+ *
+ * Allocate the requested number of bytes from the specified pool.
+ * Uses a first-fit algorithm.
*/
-struct gen_pool_chunk {
- spinlock_t lock;
- struct list_head next_chunk; /* next chunk in pool */
- unsigned long start_addr; /* starting address of memory chunk */
- unsigned long end_addr; /* ending address of memory chunk */
- unsigned long bits[0]; /* bitmap for allocating memory chunk */
-};
+static inline unsigned long __must_check
+gen_pool_alloc(struct gen_pool *pool, size_t size)
+{
+ return gen_pool_alloc_aligned(pool, size, 0);
+}

-extern struct gen_pool *gen_pool_create(int, int);
-extern int gen_pool_add(struct gen_pool *, unsigned long, size_t, int);
-extern void gen_pool_destroy(struct gen_pool *);
-extern unsigned long gen_pool_alloc(struct gen_pool *, size_t);
-extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size);
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 1923f14..0761079 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -16,53 +16,80 @@
#include <linux/genalloc.h>


+/* General purpose special memory pool descriptor. */
+struct gen_pool {
+ rwlock_t lock; /* protects chunks list */
+ struct list_head chunks; /* list of chunks in this pool */
+ unsigned order; /* minimum allocation order */
+};
+
+/* General purpose special memory pool chunk descriptor. */
+struct gen_pool_chunk {
+ spinlock_t lock; /* protects bits */
+ struct list_head next_chunk; /* next chunk in pool */
+ unsigned long start; /* start of memory chunk */
+ unsigned long size; /* number of bits */
+ unsigned long bits[0]; /* bitmap for allocating memory chunk */
+};
+
+
/**
- * gen_pool_create - create a new special memory pool
- * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
- * @nid: node id of the node the pool structure should be allocated on, or -1
+ * gen_pool_create() - create a new special memory pool
+ * @order: Log base 2 of number of bytes each bitmap bit
+ * represents.
+ * @nid: Node id of the node the pool structure should be allocated
+ * on, or -1. This will be also used for other allocations.
*
* Create a new special memory pool that can be used to manage special purpose
* memory not managed by the regular kmalloc/kfree interface.
*/
-struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
+struct gen_pool *__must_check gen_pool_create(unsigned order, int nid)
{
struct gen_pool *pool;

- pool = kmalloc_node(sizeof(struct gen_pool), GFP_KERNEL, nid);
- if (pool != NULL) {
+ if (WARN_ON(order >= BITS_PER_LONG))
+ return NULL;
+
+ pool = kmalloc_node(sizeof *pool, GFP_KERNEL, nid);
+ if (pool) {
rwlock_init(&pool->lock);
INIT_LIST_HEAD(&pool->chunks);
- pool->min_alloc_order = min_alloc_order;
+ pool->order = order;
}
return pool;
}
EXPORT_SYMBOL(gen_pool_create);

/**
- * gen_pool_add - add a new chunk of special memory to the pool
- * @pool: pool to add new memory chunk to
- * @addr: starting address of memory chunk to add to pool
- * @size: size in bytes of the memory chunk to add to pool
- * @nid: node id of the node the chunk structure and bitmap should be
- * allocated on, or -1
+ * gen_pool_add() - add a new chunk of special memory to the pool
+ * @pool: Pool to add new memory chunk to.
+ * @addr: Starting address of memory chunk to add to pool.
+ * @size: Size in bytes of the memory chunk to add to pool.
*
* Add a new chunk of special memory to the specified pool.
*/
-int gen_pool_add(struct gen_pool *pool, unsigned long addr, size_t size,
- int nid)
+int __must_check
+gen_pool_add(struct gen_pool *pool, unsigned long addr, size_t size, int nid)
{
struct gen_pool_chunk *chunk;
- int nbits = size >> pool->min_alloc_order;
- int nbytes = sizeof(struct gen_pool_chunk) +
- (nbits + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
+ size_t nbytes;
+
+ if (WARN_ON(!addr || addr + size < addr ||
+ (addr & ((1 << pool->order) - 1))))
+ return -EINVAL;

- chunk = kmalloc_node(nbytes, GFP_KERNEL | __GFP_ZERO, nid);
- if (unlikely(chunk == NULL))
- return -1;
+ size = size >> pool->order;
+ if (WARN_ON(!size))
+ return -EINVAL;
+
+ nbytes = sizeof *chunk + BITS_TO_LONGS(size) * sizeof *chunk->bits;
+ chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
+ if (!chunk)
+ return -ENOMEM;

spin_lock_init(&chunk->lock);
- chunk->start_addr = addr;
- chunk->end_addr = addr + size;
+ chunk->start = addr >> pool->order;
+ chunk->size = size;

write_lock(&pool->lock);
list_add(&chunk->next_chunk, &pool->chunks);
@@ -73,115 +100,116 @@ int gen_pool_add(struct gen_pool *pool, unsigned long addr, size_t size,
EXPORT_SYMBOL(gen_pool_add);

/**
- * gen_pool_destroy - destroy a special memory pool
- * @pool: pool to destroy
+ * gen_pool_destroy() - destroy a special memory pool
+ * @pool: Pool to destroy.
*
* Destroy the specified special memory pool. Verifies that there are no
* outstanding allocations.
*/
void gen_pool_destroy(struct gen_pool *pool)
{
- struct list_head *_chunk, *_next_chunk;
struct gen_pool_chunk *chunk;
- int order = pool->min_alloc_order;
- int bit, end_bit;
-
+ int bit;

- list_for_each_safe(_chunk, _next_chunk, &pool->chunks) {
- chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
+ while (!list_empty(&pool->chunks)) {
+ chunk = list_entry(pool->chunks.next, struct gen_pool_chunk,
+ next_chunk);
list_del(&chunk->next_chunk);

- end_bit = (chunk->end_addr - chunk->start_addr) >> order;
- bit = find_next_bit(chunk->bits, end_bit, 0);
- BUG_ON(bit < end_bit);
+ bit = find_next_bit(chunk->bits, chunk->size, 0);
+ BUG_ON(bit < chunk->size);

kfree(chunk);
}
kfree(pool);
- return;
}
EXPORT_SYMBOL(gen_pool_destroy);

/**
- * gen_pool_alloc - allocate special memory from the pool
- * @pool: pool to allocate from
- * @size: number of bytes to allocate from the pool
+ * gen_pool_alloc_aligned() - allocate special memory from the pool
+ * @pool: Pool to allocate from.
+ * @size: Number of bytes to allocate from the pool.
+ * @alignment_order: Order the allocated space should be
+ * aligned to (eg. 20 means allocated space
+ * must be aligned to 1MiB).
*
* Allocate the requested number of bytes from the specified pool.
* Uses a first-fit algorithm.
*/
-unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
+unsigned long __must_check
+gen_pool_alloc_aligned(struct gen_pool *pool, size_t size,
+ unsigned alignment_order)
{
- struct list_head *_chunk;
+ unsigned long addr, align_mask = 0, flags, start;
struct gen_pool_chunk *chunk;
- unsigned long addr, flags;
- int order = pool->min_alloc_order;
- int nbits, start_bit, end_bit;

if (size == 0)
return 0;

- nbits = (size + (1UL << order) - 1) >> order;
+ if (alignment_order > pool->order)
+ align_mask = (1 << (alignment_order - pool->order)) - 1;

- read_lock(&pool->lock);
- list_for_each(_chunk, &pool->chunks) {
- chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
+ size = (size + (1UL << pool->order) - 1) >> pool->order;

- end_bit = (chunk->end_addr - chunk->start_addr) >> order;
+ read_lock(&pool->lock);
+ list_for_each_entry(chunk, &pool->chunks, next_chunk) {
+ if (chunk->size < size)
+ continue;

spin_lock_irqsave(&chunk->lock, flags);
- start_bit = bitmap_find_next_zero_area(chunk->bits, end_bit, 0,
- nbits, 0);
- if (start_bit >= end_bit) {
+ start = bitmap_find_next_zero_area_off(chunk->bits, chunk->size,
+ 0, size, align_mask,
+ chunk->start);
+ if (start >= chunk->size) {
spin_unlock_irqrestore(&chunk->lock, flags);
continue;
}

- addr = chunk->start_addr + ((unsigned long)start_bit << order);
-
- bitmap_set(chunk->bits, start_bit, nbits);
+ bitmap_set(chunk->bits, start, size);
spin_unlock_irqrestore(&chunk->lock, flags);
- read_unlock(&pool->lock);
- return addr;
+ addr = (chunk->start + start) << pool->order;
+ goto done;
}
+
+ addr = 0;
+done:
read_unlock(&pool->lock);
- return 0;
+ return addr;
}
-EXPORT_SYMBOL(gen_pool_alloc);
+EXPORT_SYMBOL(gen_pool_alloc_aligned);

/**
- * gen_pool_free - free allocated special memory back to the pool
- * @pool: pool to free to
- * @addr: starting address of memory to free back to pool
- * @size: size in bytes of memory to free
+ * gen_pool_free() - free allocated special memory back to the pool
+ * @pool: Pool to free to.
+ * @addr: Starting address of memory to free back to pool.
+ * @size: Size in bytes of memory to free.
*
* Free previously allocated special memory back to the specified pool.
*/
void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
{
- struct list_head *_chunk;
struct gen_pool_chunk *chunk;
unsigned long flags;
- int order = pool->min_alloc_order;
- int bit, nbits;

- nbits = (size + (1UL << order) - 1) >> order;
+ if (!size)
+ return;

- read_lock(&pool->lock);
- list_for_each(_chunk, &pool->chunks) {
- chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
+ addr = addr >> pool->order;
+ size = (size + (1UL << pool->order) - 1) >> pool->order;
+
+ BUG_ON(addr + size < addr);

- if (addr >= chunk->start_addr && addr < chunk->end_addr) {
- BUG_ON(addr + size > chunk->end_addr);
+ read_lock(&pool->lock);
+ list_for_each_entry(chunk, &pool->chunks, next_chunk)
+ if (addr >= chunk->start &&
+ addr + size <= chunk->start + chunk->size) {
spin_lock_irqsave(&chunk->lock, flags);
- bit = (addr - chunk->start_addr) >> order;
- while (nbits--)
- __clear_bit(bit++, chunk->bits);
+ bitmap_clear(chunk->bits, addr - chunk->start, size);
spin_unlock_irqrestore(&chunk->lock, flags);
- break;
+ goto done;
}
- }
- BUG_ON(nbits > 0);
+ BUG_ON(1);
+done:
read_unlock(&pool->lock);
}
EXPORT_SYMBOL(gen_pool_free);
--
1.7.2.3

2010-12-13 11:27:16

by Michal Nazarewicz

Subject: [PATCHv7 08/10] mm: cma: Contiguous Memory Allocator added

The Contiguous Memory Allocator is a set of functions that lets
one initialise a region of memory from which contiguous memory
chunks can then be allocated. The implementation uses the
MIGRATE_CMA migration type, which means that the memory is shared
with the standard page allocator, i.e. when CMA is not using the
memory, the page allocator can allocate movable pages from the
region.
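
A rough, hypothetical sketch of the platform integration described
in the header comment below (the board callback and globals are
made up for illustration; only cma_reserve()/cma_create() come from
this patch):

	/* In the board's early "reserve" callback, while memblock works: */
	static unsigned long board_cma_area;

	static void __init board_reserve(void)
	{
		board_cma_area = cma_reserve(0, 16 << 20, 0);	/* 16 MiB */
		if (IS_ERR_VALUE(board_cma_area))
			board_cma_area = 0;
	}

	/* Later, once SLAB is up, e.g. from a device_initcall(): */
	static struct cma *board_cma_ctx;

	static int __init board_cma_init(void)
	{
		if (!board_cma_area)
			return -ENOMEM;
		board_cma_ctx = cma_create(board_cma_area, 16 << 20);
		return IS_ERR(board_cma_ctx) ? PTR_ERR(board_cma_ctx) : 0;
	}
	device_initcall(board_cma_init);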

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/cma.h | 223 ++++++++++++++++++++++++
mm/Kconfig | 32 ++++
mm/Makefile | 1 +
mm/cma.c | 477 +++++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 733 insertions(+), 0 deletions(-)
create mode 100644 include/linux/cma.h
create mode 100644 mm/cma.c

diff --git a/include/linux/cma.h b/include/linux/cma.h
new file mode 100644
index 0000000..25728a3
--- /dev/null
+++ b/include/linux/cma.h
@@ -0,0 +1,223 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ * The Contiguous Memory Allocator (CMA) makes it possible for
+ * device drivers to allocate big contiguous chunks of memory after
+ * the system has booted.
+ *
+ * It requires some machine- and/or platform-specific initialisation
+ * code which prepares memory ranges to be used with CMA and later,
+ * device drivers can allocate memory from those ranges.
+ *
+ * Why is it needed?
+ *
+ * Various devices on embedded systems have no scatter-gather and/or
+ * IO map support and require contiguous blocks of memory to
+ * operate. They include devices such as cameras, hardware video
+ * coders, etc.
+ *
+ * Such devices often require big memory buffers (a full HD frame
+ * is, for instance, more than 2 megapixels large, i.e. more than 6
+ * MB of memory), which makes mechanisms such as kmalloc() or
+ * alloc_page() ineffective.
+ *
+ * At the same time, a solution where a big memory region is
+ * reserved for a device is suboptimal since often more memory is
+ * reserved than strictly required and, moreover, the memory is
+ * inaccessible to page system even if device drivers don't use it.
+ *
+ * CMA tries to solve this issue by operating on memory regions
+ * where only movable pages can be allocated from. This way, kernel
+ * can use the memory for pagecache and when device driver requests
+ * it, allocated pages can be migrated.
+ *
+ * Driver usage
+ *
+ * For device driver to use CMA it needs to have a pointer to a CMA
+ * context represented by a struct cma (which is an opaque data
+ * type).
+ *
+ * Once such pointer is obtained, device driver may allocate
+ * contiguous memory chunk using the following function:
+ *
+ * cm_alloc()
+ *
+ * This function returns a pointer to struct cm (another opaque data
+ * type) which represent a contiguous memory chunk. This pointer
+ * may be used with the following functions:
+ *
+ * cm_free() -- frees allocated contiguous memory
+ * cm_pin() -- pins memory
+ * cm_unpin() -- unpins memory
+ * cm_vmap() -- maps memory in kernel space
+ * cm_vunmap() -- unmaps memory from kernel space
+ *
+ * See the respective functions for more information.
+ *
+ * Platform/machine integration
+ *
+ * For device drivers to be able to use CMA platform or machine
+ * initialisation code must create a CMA context and pass it to
+ * device drivers. The latter may be done by a global variable or
+ * a platform/machine specific function. For the former CMA
+ * provides the following functions:
+ *
+ * cma_init()
+ * cma_reserve()
+ * cma_create()
+ *
+ * The first one initialises a portion of reserved memory so that it
+ * can be used with CMA. The second first tries to reserve memory
+ * (using memblock) and then initialise it.
+ *
+ * The cma_reserve() function must be called when memblock is still
+ * operational and reserving memory with it is still possible. On
+ * ARM platform the "reserve" machine callback is a perfect place to
+ * call it.
+ *
+ * The last function creates a CMA context on a range of previously
+ * initialised memory addresses. Because it uses kmalloc() it needs
+ * to be called after SLAB is initialised.
+ */
+
+/***************************** Kernel level API *****************************/
+
+#if defined __KERNEL__ && defined CONFIG_CMA
+
+/* CMA context */
+struct cma;
+/* Contiguous Memory chunk */
+struct cm;
+
+/**
+ * cma_init() - initialises range of physical memory to be used with CMA.
+ * @start: start address of the memory range in bytes.
+ * @size: size of the memory range in bytes.
+ *
+ * The range must be MAX_ORDER-1 aligned and it must have been already
+ * reserved (eg. with memblock).
+ *
+ * Returns zero on success or negative error.
+ */
+int cma_init(unsigned long start, unsigned long size);
+
+/**
+ * cma_reserve() - reserves and initialises memory to be used with CMA.
+ * @start: start address of the memory range in bytes hint; if unsure
+ * pass zero (will be down-aligned to MAX_ORDER-1).
+ * @size: size of the memory to reserve in bytes (will be up-aligned
+ * to MAX_ORDER-1).
+ * @alignment: desired alignment in bytes (must be power of two or zero).
+ *
+ * It will use memblock to allocate memory and then initialise it for
+ * use with CMA by invoking cma_init(). It must be called early in
+ * boot process while memblock is still operational.
+ *
+ * Returns the reserved area's physical address or a value that yields true
+ * when checked with IS_ERR_VALUE().
+ */
+unsigned long cma_reserve(unsigned long start, unsigned long size,
+ unsigned long alignment);
+
+/**
+ * cma_create() - creates CMA context.
+ * @start: start address of the context in bytes.
+ * @size: size of the context in bytes.
+ *
+ * The range must be page aligned. The range must have been already
+ * initialised with cma_init(). Different contexts cannot overlap.
+ *
+ * Because this function uses kmalloc() it must be called after SLAB
+ * is initialised. This in particular means that it cannot be called
+ * just after cma_reserve() since the former needs to be run way
+ * earlier.
+ *
+ * Returns pointer to CMA context or a pointer-error on error.
+ */
+struct cma *cma_create(unsigned long start, unsigned long size);
+
+/**
+ * cma_destroy() - destroys CMA context.
+ * @cma: context to destroy.
+ */
+void cma_destroy(struct cma *cma);
+
+/**
+ * cm_alloc() - allocates contiguous memory.
+ * @cma: CMA context to use.
+ * @size: desired chunk size in bytes (must be non-zero).
+ * @alignent: desired minimal alignment in bytes (must be power of two
+ * or zero).
+ *
+ * Returns pointer to structure representing contiguous memory or
+ * a pointer-error on error.
+ */
+struct cm *cm_alloc(struct cma *cma, unsigned long size,
+ unsigned long alignment);
+
+/**
+ * cm_free() - frees contiguous memory.
+ * @cm: contiguous memory to free.
+ *
+ * The contiguous memory must not be pinned (see cma_pin()) and
+ * must not be mapped to kernel space (cma_vmap()).
+ */
+void cm_free(struct cm *cm);
+
+/**
+ * cm_pin() - pins contiguous memory.
+ * @cm: contiguous memory to pin.
+ *
+ * Pinning is required to obtain contiguous memory's physical address.
+ * While memory is pinned the memory will remain valid; it may change
+ * if memory is unpinned and then pinned again. This facility is
+ * provided so that memory defragmentation can be implemented inside
+ * CMA.
+ *
+ * Each call to cm_pin() must be accompanied by call to cm_unpin() and
+ * the calls may be nested.
+ *
+ * Returns chunk's physical address or a value that yields true when
+ * tested with IS_ERR_VALUE().
+ */
+unsigned long cm_pin(struct cm *cm);
+
+/**
+ * cm_unpin() - unpins contiguous memory.
+ * @cm: contiguous memory to unpin.
+ *
+ * See cm_pin().
+ */
+void cm_unpin(struct cm *cm);
+
+/**
+ * cm_vmap() - maps memory to kernel space (or returns existing mapping).
+ * @cm: contiguous memory to map.
+ *
+ * Each call to cm_vmap() must be accompanied with call to cm_vunmap()
+ * and the calls may be nested.
+ *
+ * Returns kernel virtual address or a pointer-error.
+ */
+void *cm_vmap(struct cm *cm);
+
+/**
+ * cm_vunmap() - unmaps memory from kernel space.
+ * @cm: contiguous memory to unmap.
+ *
+ * See cm_vmap().
+ */
+void cm_vunmap(struct cm *cm);
+
+#endif
+
+#endif
diff --git a/mm/Kconfig b/mm/Kconfig
index 7818b07..743893b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -339,3 +339,35 @@ config CLEANCACHE
in a negligible performance hit.

If unsure, say Y to enable cleancache
+
+
+config CMA
+ bool "Contiguous Memory Allocator framework"
+ # Currently there is only one allocator so force it on
+ select MIGRATION
+ select MIGRATE_CMA
+ select GENERIC_ALLOCATOR
+ help
+ This enables the Contiguous Memory Allocator framework which
+ allows drivers to allocate big physically-contiguous blocks of
+ memory for use with hardware components that do not support I/O
+ map nor scatter-gather.
+
+ If you select this option you will also have to select at least
+ one allocator algorithm below.
+
+ To make use of CMA you need to specify the regions and
+ driver->region mapping on command line when booting the kernel.
+
+ For more information see <include/linux/cma.h>. If unsure, say "n".
+
+config CMA_DEBUG
+ bool "CMA debug messages (DEVELOPEMENT)"
+ depends on CMA
+ help
+ Turns on debug messages in CMA. This produces KERN_DEBUG
+ messages for every CMA call as well as various messages while
+ processing calls such as cma_alloc(). This option does not
+ affect warning and error messages.
+
+ This is mostly used during development. If unsure, say "n".
diff --git a/mm/Makefile b/mm/Makefile
index 0b08d1c..c6a84f1 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -43,3 +43,4 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
obj-$(CONFIG_CLEANCACHE) += cleancache.o
+obj-$(CONFIG_CMA) += cma.o
diff --git a/mm/cma.c b/mm/cma.c
new file mode 100644
index 0000000..401e604
--- /dev/null
+++ b/mm/cma.c
@@ -0,0 +1,477 @@
+/*
+ * Contiguous Memory Allocator framework
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your option) any later version of the license.
+ */
+
+/*
+ * See include/linux/cma.h for details.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+# define DEBUG
+#endif
+
+#include <linux/cma.h>
+
+#ifndef CONFIG_NO_BOOTMEM
+# include <linux/bootmem.h>
+#endif
+#ifdef CONFIG_HAVE_MEMBLOCK
+# include <linux/memblock.h>
+#endif
+
+#include <linux/err.h>
+#include <linux/genalloc.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+
+#include "internal.h"
+
+
+/************************* Initialise CMA *************************/
+
+static struct cma_grabbed {
+ unsigned long start;
+ unsigned long size;
+} cma_grabbed[8] __initdata;
+static unsigned cma_grabbed_count __initdata;
+
+int cma_init(unsigned long start, unsigned long size)
+{
+ pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+ if (!size)
+ return -EINVAL;
+ if ((start | size) & ((MAX_ORDER_NR_PAGES << PAGE_SHIFT) - 1))
+ return -EINVAL;
+ if (start + size < start)
+ return -EOVERFLOW;
+
+ if (cma_grabbed_count == ARRAY_SIZE(cma_grabbed))
+ return -ENOSPC;
+
+ cma_grabbed[cma_grabbed_count].start = start;
+ cma_grabbed[cma_grabbed_count].size = size;
+ ++cma_grabbed_count;
+ return 0;
+}
+
+unsigned long cma_reserve(unsigned long start, unsigned long size,
+ unsigned long alignment)
+{
+ u64 addr;
+ int ret;
+
+ pr_debug("%s(%p+%p/%p)\n", __func__, (void *)start, (void *)size,
+ (void *)alignment);
+
+ /* Sanity checks */
+ if (!size || (alignment & (alignment - 1)))
+ return (unsigned long)-EINVAL;
+
+ /* Sanitise input arguments */
+ start = ALIGN(start, MAX_ORDER_NR_PAGES << PAGE_SHIFT);
+ size &= ~((MAX_ORDER_NR_PAGES << PAGE_SHIFT) - 1);
+ if (alignment < (MAX_ORDER_NR_PAGES << PAGE_SHIFT))
+ alignment = MAX_ORDER_NR_PAGES << PAGE_SHIFT;
+
+ /* Reserve memory */
+ if (start) {
+ if (memblock_is_region_reserved(start, size) ||
+ memblock_reserve(start, size) < 0)
+ return (unsigned long)-EBUSY;
+ } else {
+ /*
+ * Use __memblock_alloc_base() since
+ * memblock_alloc_base() panic()s.
+ */
+ addr = __memblock_alloc_base(size, alignment, 0);
+ if (!addr) {
+ return (unsigned long)-ENOMEM;
+ } else if (addr + size > ~(unsigned long)0) {
+ memblock_free(addr, size);
+ return (unsigned long)-EOVERFLOW;
+ } else {
+ start = addr;
+ }
+ }
+
+ /* CMA Initialise */
+ ret = cma_init(start, size);
+ if (ret < 0) {
+ memblock_free(start, size);
+ return ret;
+ }
+ return start;
+}
+
+static int __init cma_give_back(void)
+{
+ struct cma_grabbed *r = cma_grabbed;
+ unsigned i = cma_grabbed_count;
+
+ pr_debug("%s(): will give %u range(s)\n", __func__, i);
+
+ for (; i; --i, ++r) {
+ struct page *p = phys_to_page(r->start);
+ unsigned j = r->size >> (PAGE_SHIFT + pageblock_order);
+
+ pr_debug("%s(): giving (%p+%p)\n", __func__,
+ (void *)r->start, (void *)r->size);
+
+ do {
+ __free_pageblock_cma(p);
+ p += pageblock_nr_pages;
+ } while (--j);
+ }
+
+ return 0;
+}
+subsys_initcall(cma_give_back);
+
+
+/************************** CMA context ***************************/
+
+/* struct cma is just an alias for struct gen_alloc */
+
+struct cma *cma_create(unsigned long start, unsigned long size)
+{
+ struct gen_pool *pool;
+ int ret;
+
+ pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+ if (!size)
+ return ERR_PTR(-EINVAL);
+ if ((start | size) & (PAGE_SIZE - 1))
+ return ERR_PTR(-EINVAL);
+ if (start + size < start)
+ return ERR_PTR(-EOVERFLOW);
+
+ pool = gen_pool_create(PAGE_SHIFT, -1);
+ if (unlikely(!pool))
+ return ERR_PTR(-ENOMEM);
+
+ ret = gen_pool_add(pool, start, size, -1);
+ if (unlikely(ret)) {
+ gen_pool_destroy(pool);
+ return ERR_PTR(ret);
+ }
+
+ pr_debug("%s: returning <%p>\n", __func__, (void *)pool);
+ return (void *)pool;
+}
+
+void cma_destroy(struct cma *cma)
+{
+ pr_debug("%s(<%p>)\n", __func__, (void *)cma);
+ gen_pool_destroy((void *)cma);
+}
+
+
+/************************* Allocate and free *************************/
+
+struct cm {
+ struct gen_pool *pool;
+ unsigned long phys, size;
+ atomic_t pinned, mapped;
+};
+
+/* Protects cm_alloc(), cm_free(), __cm_alloc() and __cm_free(). */
+static DEFINE_MUTEX(cma_mutex);
+
+/* Must hold cma_mutex to call these. */
+static int __cm_alloc(unsigned long start, unsigned long size);
+static void __cm_free(unsigned long start, unsigned long size);
+
+struct cm *cm_alloc(struct cma *cma, unsigned long size,
+ unsigned long alignment)
+{
+ unsigned long start;
+ int ret = -ENOMEM;
+ struct cm *cm;
+
+ pr_debug("%s(<%p>, %p/%p)\n", __func__, (void *)cma,
+ (void *)size, (void *)alignment);
+
+ if (!size || (alignment & (alignment - 1)))
+ return ERR_PTR(-EINVAL);
+ size = PAGE_ALIGN(size);
+
+ cm = kmalloc(sizeof *cm, GFP_KERNEL);
+ if (!cm)
+ return ERR_PTR(-ENOMEM);
+
+ mutex_lock(&cma_mutex);
+
+ start = gen_pool_alloc_aligned((void *)cma, size,
+ alignment ? ffs(alignment) - 1 : 0);
+ if (!start)
+ goto error1;
+
+ ret = __cm_alloc(start, size);
+ if (ret)
+ goto error2;
+
+ mutex_unlock(&cma_mutex);
+
+ cm->pool = (void *)cma;
+ cm->phys = start;
+ cm->size = size;
+ atomic_set(&cm->pinned, 0);
+ atomic_set(&cm->mapped, 0);
+
+ pr_debug("%s(): returning [%p]\n", __func__, (void *)cm);
+ return cm;
+
+error2:
+ gen_pool_free((void *)cma, start, size);
+error1:
+ mutex_unlock(&cma_mutex);
+ kfree(cm);
+ return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(cm_alloc);
+
+void cm_free(struct cm *cm)
+{
+ pr_debug("%s([%p])\n", __func__, (void *)cm);
+
+ if (WARN_ON(atomic_read(&cm->pinned) || atomic_read(&cm->mapped)))
+ return;
+
+ mutex_lock(&cma_mutex);
+
+ gen_pool_free(cm->pool, cm->phys, cm->size);
+ __cm_free(cm->phys, cm->size);
+
+ mutex_unlock(&cma_mutex);
+
+ kfree(cm);
+}
+EXPORT_SYMBOL_GPL(cm_free);
+
+
+/************************* Mapping and addresses *************************/
+
+/*
+ * Currently no-operations but keep reference counters for error
+ * checking.
+ */
+
+unsigned long cm_pin(struct cm *cm)
+{
+ pr_debug("%s([%p])\n", __func__, (void *)cm);
+ atomic_inc(&cm->pinned);
+ return cm->phys;
+}
+EXPORT_SYMBOL_GPL(cm_pin);
+
+void cm_unpin(struct cm *cm)
+{
+ pr_debug("%s([%p])\n", __func__, (void *)cm);
+ WARN_ON(!atomic_add_unless(&cm->pinned, -1, 0));
+}
+EXPORT_SYMBOL_GPL(cm_unpin);
+
+void *cm_vmap(struct cm *cm)
+{
+ pr_debug("%s([%p])\n", __func__, (void *)cm);
+ atomic_inc(&cm->mapped);
+ /*
+ * Keep it simple... We should do something more clever in
+ * the future.
+ */
+ return phys_to_virt(cm->phys);
+}
+EXPORT_SYMBOL_GPL(cm_vmap);
+
+void cm_vunmap(struct cm *cm)
+{
+ pr_debug("%s([%p])\n", __func__, (void *)cm);
+ WARN_ON(!atomic_add_unless(&cm->mapped, -1, 0));
+}
+EXPORT_SYMBOL_GPL(cm_vunmap);
+
+
+/************************* Migration stuff *************************/
+
+/* XXX Revisit */
+#ifdef phys_to_pfn
+/* nothing to do */
+#elif defined __phys_to_pfn
+# define phys_to_pfn __phys_to_pfn
+#else
+# warning correct phys_to_pfn implementation needed
+static unsigned long phys_to_pfn(phys_addr_t phys)
+{
+ return virt_to_pfn(phys_to_virt(phys));
+}
+#endif
+
+static unsigned long pfn_to_maxpage(unsigned long pfn)
+{
+ return pfn & ~(MAX_ORDER_NR_PAGES - 1);
+}
+
+static unsigned long pfn_to_maxpage_up(unsigned long pfn)
+{
+ return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+}
+
+#define MIGRATION_RETRY 5
+static int __cm_migrate(unsigned long start, unsigned long end)
+{
+ int migration_failed = 0, ret;
+ unsigned long pfn = start;
+
+ pr_debug("%s(%p..%p)\n", __func__, (void *)start, (void *)end);
+
+ /*
+ * Some code "borrowed" from KAMEZAWA Hiroyuki's
+ * __alloc_contig_pages().
+ */
+
+ for (;;) {
+ pfn = scan_lru_pages(pfn, end);
+ if (!pfn || pfn >= end)
+ break;
+
+ ret = do_migrate_range(pfn, end);
+ if (!ret) {
+ migration_failed = 0;
+ } else if (ret != -EBUSY
+ || ++migration_failed >= MIGRATION_RETRY) {
+ return ret;
+ } else {
+ /* There are unstable pages on the pagevec. */
+ lru_add_drain_all();
+ /*
+ * there may be pages on pcplist before
+ * we mark the range as ISOLATED.
+ */
+ drain_all_pages();
+ }
+ cond_resched();
+ }
+
+ if (!migration_failed) {
+ /* drop all pages in pagevec and pcp list */
+ lru_add_drain_all();
+ drain_all_pages();
+ }
+
+ /* Make sure all pages are isolated */
+ if (WARN_ON(test_pages_isolated(start, end)))
+ return -EBUSY;
+
+ return 0;
+}
+
+static int __cm_alloc(unsigned long start, unsigned long size)
+{
+ unsigned long end, _start, _end;
+ int ret;
+
+ pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+ /*
+ * What we do here is we mark all pageblocks in range as
+ * MIGRATE_ISOLATE. Because of the way the page allocator works, we
+ * align the range to MAX_ORDER pages so that page allocator
+ * won't try to merge buddies from different pageblocks and
+ * change MIGRATE_ISOLATE to some other migration type.
+ *
+ * Once the pageblocks are marked as MIGRATE_ISOLATE, we
+ * migrate the pages from an unaligned range (ie. pages that
+ * we are interested in). This will put all the pages in
+ * range back to page allocator as MIGRATE_ISOLATE.
+ *
+ * When this is done, we take the pages in range from page
+ * allocator removing them from the buddy system. This way
+ * page allocator will never consider using them.
+ *
+ * This lets us mark the pageblocks back as MIGRATE_CMA so
+ * that free pages in the MAX_ORDER aligned range but not in
+ * the unaligned, original range are put back to page
+ * allocator so that buddy can use them.
+ */
+
+ start = phys_to_pfn(start);
+ end = start + (size >> PAGE_SHIFT);
+
+ pr_debug("\tisolate range(%lx, %lx)\n",
+ pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+ ret = __start_isolate_page_range(pfn_to_maxpage(start),
+ pfn_to_maxpage_up(end), MIGRATE_CMA);
+ if (ret)
+ goto done;
+
+ pr_debug("\tmigrate range(%lx, %lx)\n", start, end);
+ ret = __cm_migrate(start, end);
+ if (ret)
+ goto done;
+
+ /*
+ * Pages from [start, end) are within a MAX_ORDER aligned
+ * blocks that are marked as MIGRATE_ISOLATE. What's more,
+ * all pages in [start, end) are free in page allocator. What
+ * we are going to do is to allocate all pages from [start,
+ * end) (that is, remove them from the page allocator).
+ *
+ * The only problem is that pages at the beginning and at the
+ * end of interesting range may be not aligned with pages that
+ * page allocator holds, ie. they can be part of higher order
+ * pages. Because of this, we reserve the bigger range and
+ * once this is done free the pages we are not interested in.
+ */
+
+ pr_debug("\tfinding buddy\n");
+ ret = 0;
+ while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
+ if (WARN_ON(++ret >= MAX_ORDER))
+ return -EINVAL;
+
+ _start = start & (~0UL << ret);
+ pr_debug("\talloc freed(%lx, %lx)\n", _start, end);
+ _end = alloc_contig_freed_pages(_start, end, 0);
+
+ /* Free head and tail (if any) */
+ pr_debug("\tfree contig(%lx, %lx)\n", _start, start);
+ free_contig_pages(pfn_to_page(_start), start - _start);
+ pr_debug("\tfree contig(%lx, %lx)\n", end, _end);
+ free_contig_pages(pfn_to_page(end), _end - end);
+
+ ret = 0;
+
+done:
+ pr_debug("\tundo isolate range(%lx, %lx)\n",
+ pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+ __undo_isolate_page_range(pfn_to_maxpage(start),
+ pfn_to_maxpage_up(end), MIGRATE_CMA);
+
+ pr_debug("ret = %d\n", ret);
+ return ret;
+}
+
+static void __cm_free(unsigned long start, unsigned long size)
+{
+ pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+ free_contig_pages(pfn_to_page(phys_to_pfn(start)),
+ size >> PAGE_SHIFT);
+}
--
1.7.2.3

2010-12-13 11:27:32

by Michal Nazarewicz

Subject: [PATCHv7 09/10] mm: cma: Test device and application added

This patch adds a "cma" misc device which lets user space use the
CMA API. This device is meant for testing. A testing application
is also provided.
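
A trimmed-down, hypothetical user-space sequence (the full version
lives in tools/cma/cma-test.c; needs <fcntl.h>, <sys/ioctl.h>,
<sys/mman.h>, <unistd.h> and <linux/cma.h>):

	struct cma_alloc_request req = {
		.magic = CMA_MAGIC,
		.size = 1 << 20,	/* 1 MiB */
		.alignment = 0,
	};
	int fd = open("/dev/cma", O_RDWR);
	void *mem;

	if (fd < 0 || ioctl(fd, IOCTL_CMA_ALLOC, &req) < 0)
		return 1;

	/* req.start now holds the chunk's physical address */
	mem = mmap(NULL, req.size, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, 0);
	/* ... write and read the buffer ... */
	munmap(mem, req.size);
	close(fd);		/* closing the fd frees the chunk */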

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
drivers/misc/Kconfig | 10 +
drivers/misc/Makefile | 1 +
drivers/misc/cma-dev.c | 238 ++++++++++++++++++++++++
include/linux/cma.h | 29 +++
tools/cma/cma-test.c | 466 ++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 744 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/cma-dev.c
create mode 100644 tools/cma/cma-test.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 1e1a4be..0dd1e8c 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -458,4 +458,14 @@ source "drivers/misc/cb710/Kconfig"
source "drivers/misc/iwmc3200top/Kconfig"
source "drivers/misc/ti-st/Kconfig"

+config CMA_DEVICE
+ tristate "CMA misc device (DEVELOPEMENT)"
+ depends on CMA
+ help
+ The CMA misc device allows allocating contiguous memory areas
+ from user space. This is mostly for testing of the CMA
+ framework.
+
+ If unsure, say "n"
+
endif # MISC_DEVICES
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 98009cc..f8eadd4 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -42,3 +42,4 @@ obj-$(CONFIG_ARM_CHARLCD) += arm-charlcd.o
obj-$(CONFIG_PCH_PHUB) += pch_phub.o
obj-y += ti-st/
obj-$(CONFIG_AB8500_PWM) += ab8500-pwm.o
+obj-$(CONFIG_CMA_DEVICE) += cma-dev.o
diff --git a/drivers/misc/cma-dev.c b/drivers/misc/cma-dev.c
new file mode 100644
index 0000000..6c36064
--- /dev/null
+++ b/drivers/misc/cma-dev.c
@@ -0,0 +1,238 @@
+/*
+ * Contiguous Memory Allocator userspace driver
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your option) any later version of the license.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+# define DEBUG
+#endif
+
+#include <linux/errno.h> /* Error numbers */
+#include <linux/err.h> /* IS_ERR_VALUE() */
+#include <linux/fs.h> /* struct file */
+#include <linux/mm.h> /* Memory stuff */
+#include <linux/mman.h>
+#include <linux/slab.h>
+#include <linux/module.h> /* Standard module stuff */
+#include <linux/device.h> /* struct device, dev_dbg() */
+#include <linux/types.h> /* Just to be safe ;) */
+#include <linux/uaccess.h> /* __copy_{to,from}_user */
+#include <linux/miscdevice.h> /* misc_register() and company */
+
+#include <linux/cma.h>
+
+#include <plat/cma-stub.h>
+
+static int cma_file_open(struct inode *inode, struct file *file);
+static int cma_file_release(struct inode *inode, struct file *file);
+static long cma_file_ioctl(struct file *file, unsigned cmd, unsigned long arg);
+static int cma_file_mmap(struct file *file, struct vm_area_struct *vma);
+
+static struct miscdevice cma_miscdev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "cma",
+ .fops = &(const struct file_operations) {
+ .owner = THIS_MODULE,
+ .open = cma_file_open,
+ .release = cma_file_release,
+ .unlocked_ioctl = cma_file_ioctl,
+ .mmap = cma_file_mmap,
+ },
+};
+#define cma_dev (cma_miscdev.this_device)
+
+struct cma_private_data {
+ struct cm *cm;
+ unsigned long size;
+ unsigned long phys;
+};
+
+static int cma_file_open(struct inode *inode, struct file *file)
+{
+ struct cma_private_data *prv;
+
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ if (!cma_ctx)
+ return -EOPNOTSUPP;
+
+ prv = kzalloc(sizeof *prv, GFP_KERNEL);
+ if (!prv)
+ return -ENOMEM;
+
+ file->private_data = prv;
+
+ return 0;
+}
+
+static int cma_file_release(struct inode *inode, struct file *file)
+{
+ struct cma_private_data *prv = file->private_data;
+
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ if (prv->cm) {
+ cm_unpin(prv->cm);
+ cm_free(prv->cm);
+ }
+ kfree(prv);
+
+ return 0;
+}
+
+static long cma_file_ioctl_req(struct cma_private_data *prv, unsigned long arg)
+{
+ struct cma_alloc_request req;
+ struct cm *cm;
+
+ dev_dbg(cma_dev, "%s()\n", __func__);
+
+ if (!arg)
+ return -EINVAL;
+
+ if (copy_from_user(&req, (void *)arg, sizeof req))
+ return -EFAULT;
+
+ if (req.magic != CMA_MAGIC)
+ return -ENOTTY;
+
+ /* May happen on 32 bit system. */
+ if (req.size > ~(unsigned long)0 || req.alignment > ~(unsigned long)0)
+ return -EINVAL;
+
+ req.size = PAGE_ALIGN(req.size);
+ if (req.size > ~(unsigned long)0)
+ return -EINVAL;
+
+ cm = cm_alloc(cma_ctx, req.size, req.alignment);
+ if (IS_ERR(cm))
+ return PTR_ERR(cm);
+
+ prv->phys = cm_pin(cm);
+ prv->size = req.size;
+ req.start = prv->phys;
+ if (copy_to_user((void *)arg, &req, sizeof req)) {
+ cm_free(cm);
+ return -EFAULT;
+ }
+ prv->cm = cm;
+
+ dev_dbg(cma_dev, "allocated %p@%p\n",
+ (void *)prv->size, (void *)prv->phys);
+
+ return 0;
+}
+
+static long
+cma_file_ioctl_pattern(struct cma_private_data *prv, unsigned long arg)
+{
+ unsigned long *_it, *it, *end, v;
+
+ dev_dbg(cma_dev, "%s(%s)\n", __func__, arg ? "fill" : "verify");
+
+ _it = phys_to_virt(prv->phys);
+ end = _it + prv->size / sizeof *_it;
+
+ if (arg)
+ for (v = 0, it = _it; it != end; ++v, ++it)
+ *it = v;
+
+ for (v = 0, it = _it; it != end; ++v, ++it)
+ if (*it != v)
+ goto error;
+
+ return prv->size;
+
+error:
+ dev_dbg(cma_dev, "at %p + %x got %lx, expected %lx\n",
+ (void *)_it, (it - _it) * sizeof *it, *it, v);
+ print_hex_dump(KERN_DEBUG, "cma: ", DUMP_PREFIX_ADDRESS,
+ 16, sizeof *it, it,
+ min_t(size_t, 128, (end - it) * sizeof *it), 0);
+ return (it - _it) * sizeof *it;
+}
+
+static long cma_file_ioctl_dump(struct cma_private_data *prv, unsigned long len)
+{
+ unsigned long *it;
+
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)len);
+
+ it = phys_to_virt(prv->phys);
+ len = min(len & ~(sizeof *it - 1), prv->size);
+ print_hex_dump(KERN_DEBUG, "cma: ", DUMP_PREFIX_ADDRESS,
+ 16, sizeof *it, it, len, 0);
+
+ return 0;
+}
+
+static long cma_file_ioctl(struct file *file, unsigned cmd, unsigned long arg)
+{
+ struct cma_private_data *prv = file->private_data;
+
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ if ((cmd == IOCTL_CMA_ALLOC) != !prv->cm)
+ return -EBADFD;
+
+ switch (cmd) {
+ case IOCTL_CMA_ALLOC:
+ return cma_file_ioctl_req(prv, arg);
+
+ case IOCTL_CMA_PATTERN:
+ return cma_file_ioctl_pattern(prv, arg);
+
+ case IOCTL_CMA_DUMP:
+ return cma_file_ioctl_dump(prv, arg);
+
+ default:
+ return -ENOTTY;
+ }
+}
+
+static int cma_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct cma_private_data *prv = file->private_data;
+ unsigned long pgoff, offset, length;
+
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ if (!prv->cm)
+ return -EBADFD;
+
+ pgoff = vma->vm_pgoff;
+ offset = pgoff << PAGE_SHIFT;
+ length = vma->vm_end - vma->vm_start;
+
+ if (offset >= prv->size
+ || length > prv->size
+ || offset + length > prv->size)
+ return -ENOSPC;
+
+ return remap_pfn_range(vma, vma->vm_start,
+ __phys_to_pfn(prv->phys) + pgoff,
+ length, vma->vm_page_prot);
+}
+
+static int __init cma_dev_init(void)
+{
+ int ret = misc_register(&cma_miscdev);
+ pr_debug("miscdev: register returned: %d\n", ret);
+ return ret;
+}
+module_init(cma_dev_init);
+
+static void __exit cma_dev_exit(void)
+{
+ dev_dbg(cma_dev, "deregistering\n");
+ misc_deregister(&cma_miscdev);
+}
+module_exit(cma_dev_exit);
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 25728a3..cd03ec0 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -89,6 +89,35 @@
* to be called after SLAB is initialised.
*/

+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+
+#define CMA_MAGIC (('c' << 24) | ('M' << 16) | ('a' << 8) | 0x42)
+
+/**
+ * Information about an area, exportable to user space.
+ * @magic: must always be CMA_MAGIC.
+ * @_pad: padding (ignored).
+ * @size: size of the chunk to allocate.
+ * @alignment: desired alignment of the chunk (must be power of two or zero).
+ * @start: when ioctl() finishes this stores physical address of the chunk.
+ */
+struct cma_alloc_request {
+ __u32 magic;
+ __u32 _pad;
+
+ /* __u64 to be compatible across 32 and 64 bit systems. */
+ __u64 size;
+ __u64 alignment;
+ __u64 start;
+};
+
+#define IOCTL_CMA_ALLOC _IOWR('p', 0, struct cma_alloc_request)
+#define IOCTL_CMA_PATTERN _IO('p', 1)
+#define IOCTL_CMA_DUMP _IO('p', 2)
+
+
/***************************** Kernel level API *****************************/

#if defined __KERNEL__ && defined CONFIG_CMA
diff --git a/tools/cma/cma-test.c b/tools/cma/cma-test.c
new file mode 100644
index 0000000..56bff8a
--- /dev/null
+++ b/tools/cma/cma-test.c
@@ -0,0 +1,466 @@
+/*
+ * cma-test.c -- CMA testing application
+ *
+ * Copyright (C) 2010 Samsung Electronics
+ * Author: Michal Nazarewicz <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+/* $(CROSS_COMPILE)gcc -Wall -Wextra -g -o cma-test cma-test.c */
+
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/mman.h>
+
+#include <fcntl.h>
+#include <unistd.h>
+
+#include <ctype.h>
+#include <errno.h>
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+
+#include <linux/cma.h>
+
+
+/****************************** Chunks management ******************************/
+
+struct chunk {
+ struct chunk *next, *prev;
+ int fd;
+ unsigned long size;
+ unsigned long start;
+};
+
+static struct chunk root = {
+ .next = &root,
+ .prev = &root,
+};
+
+#define for_each(a) for (a = root.next; a != &root; a = a->next)
+
+static struct chunk *chunk_create(const char *prefix)
+{
+ struct chunk *chunk;
+ int fd;
+
+ chunk = malloc(sizeof *chunk);
+ if (!chunk) {
+ fprintf(stderr, "%s: %s\n", prefix, strerror(errno));
+ return NULL;
+ }
+
+ fd = open("/dev/cma", O_RDWR);
+ if (fd < 0) {
+ fprintf(stderr, "%s: /dev/cma: %s\n", prefix, strerror(errno));
+ return NULL;
+ }
+
+ chunk->prev = chunk;
+ chunk->next = chunk;
+ chunk->fd = fd;
+ return chunk;
+}
+
+static void chunk_destroy(struct chunk *chunk)
+{
+ chunk->prev->next = chunk->next;
+ chunk->next->prev = chunk->prev;
+ close(chunk->fd);
+}
+
+static void chunk_add(struct chunk *chunk)
+{
+ chunk->next = &root;
+ chunk->prev = root.prev;
+ root.prev->next = chunk;
+ root.prev = chunk;
+}
+
+
+/****************************** Commands ******************************/
+
+/* Parsing helpers */
+#define SKIP_SPACE(ch) do { while (isspace(*(ch))) ++(ch); } while (0)
+
+static int memparse(char *ptr, char **retptr, unsigned long *ret)
+{
+ unsigned long val;
+
+ SKIP_SPACE(ptr);
+
+ errno = 0;
+ val = strtoul(ptr, &ptr, 0);
+ if (errno)
+ return -1;
+
+ switch (*ptr) {
+ case 'G':
+ case 'g':
+ val <<= 10;
+ case 'M':
+ case 'm':
+ val <<= 10;
+ case 'K':
+ case 'k':
+ val <<= 10;
+ ++ptr;
+ }
+
+ if (retptr) {
+ SKIP_SPACE(ptr);
+ *retptr = ptr;
+ }
+
+ *ret = val;
+ return 0;
+}
+
+static void cmd_list(char *name, char *line, int arg)
+{
+ struct chunk *chunk;
+
+ (void)name; (void)line; (void)arg;
+
+ for_each(chunk)
+ printf("%3d: %p@%p\n", chunk->fd,
+ (void *)chunk->size, (void *)chunk->start);
+}
+
+static void cmd_alloc(char *name, char *line, int arg)
+{
+ unsigned long size, alignment = 0;
+ struct cma_alloc_request req;
+ struct chunk *chunk;
+ int ret;
+
+ (void)arg;
+
+ if (memparse(line, &line, &size) < 0 || !size) {
+ fprintf(stderr, "%s: invalid size\n", name);
+ return;
+ }
+
+ if (*line == '/')
+ if (memparse(line, &line, &alignment) < 0) {
+ fprintf(stderr, "%s: invalid alignment\n", name);
+ return;
+ }
+
+ SKIP_SPACE(line);
+ if (*line) {
+ fprintf(stderr, "%s: unknown argument(s) at the end: %s\n",
+ name, line);
+ return;
+ }
+
+ chunk = chunk_create(name);
+ if (!chunk)
+ return;
+
+ fprintf(stderr, "%s: allocating %p/%p\n", name,
+ (void *)size, (void *)alignment);
+
+ req.magic = CMA_MAGIC;
+ req.size = size;
+ req.alignment = alignment;
+ req.start = 0;
+
+ ret = ioctl(chunk->fd, IOCTL_CMA_ALLOC, &req);
+ if (ret < 0) {
+ fprintf(stderr, "%s: cma_alloc: %s\n", name, strerror(errno));
+ chunk_destroy(chunk);
+ } else {
+ chunk->size = req.size;
+ chunk->start = req.start;
+ chunk_add(chunk);
+
+ printf("%3d: %p@%p\n", chunk->fd,
+ (void *)chunk->size, (void *)chunk->start);
+ }
+}
+
+static struct chunk *_cmd_numbered(char *name, char *line)
+{
+ struct chunk *chunk;
+
+ SKIP_SPACE(line);
+
+ if (*line) {
+ unsigned long num;
+
+ errno = 0;
+ num = strtoul(line, &line, 10);
+
+ if (errno || num > INT_MAX) {
+ fprintf(stderr, "%s: invalid number\n", name);
+ return NULL;
+ }
+
+ SKIP_SPACE(line);
+ if (*line) {
+ fprintf(stderr,
+ "%s: unknown arguments at the end: %s\n",
+ name, line);
+ return NULL;
+ }
+
+ for_each(chunk)
+ if (chunk->fd == (int)num)
+ return chunk;
+ fprintf(stderr, "%s: no chunk %3lu\n", name, num);
+ return NULL;
+
+ } else {
+ chunk = root.prev;
+ if (chunk == &root) {
+ fprintf(stderr, "%s: no chunks\n", name);
+ return NULL;
+ }
+ return chunk;
+ }
+}
+
+static void cmd_free(char *name, char *line, int arg)
+{
+ struct chunk *chunk = _cmd_numbered(name, line);
+ (void)arg;
+ if (chunk) {
+ fprintf(stderr, "%s: freeing %p@%p\n", name,
+ (void *)chunk->size, (void *)chunk->start);
+ chunk_destroy(chunk);
+ }
+}
+
+static void _cmd_pattern(char *name, unsigned long *ptr, unsigned long size,
+ int arg)
+{
+ unsigned long *end = ptr + size / sizeof *ptr, *it, v;
+
+ if (arg)
+ for (v = 0, it = ptr; it != end; ++v, ++it)
+ *it = v;
+
+ for (v = 0, it = ptr; it != end && *it == v; ++v, ++it)
+ /* nop */;
+
+ if (it != end)
+ fprintf(stderr, "%s: at +[%lx] got %lx, expected %lx\n",
+ name, (unsigned long)(it - ptr) * sizeof *it, *it, v);
+ else
+ fprintf(stderr, "%s: done\n", name);
+}
+
+static void _cmd_dump(char *name, uint32_t *ptr)
+{
+ unsigned lines = 32, groups;
+ uint32_t *it = ptr;
+
+ do {
+ printf("%s: %04lx:", name,
+ (unsigned long)(it - ptr) * sizeof *it);
+
+ groups = 4;
+ do {
+ printf(" %08lx", (unsigned long)*it);
+ ++it;
+ } while (--groups);
+
+ putchar('\n');
+ } while (--lines);
+}
+
+static void cmd_mapped(char *name, char *line, int arg)
+{
+ struct chunk *chunk = _cmd_numbered(name, line);
+ void *ptr;
+
+ if (!chunk)
+ return;
+
+ ptr = mmap(NULL, chunk->size,
+ arg != 2 ? PROT_READ | PROT_WRITE : PROT_READ,
+ MAP_SHARED, chunk->fd, 0);
+
+ if (ptr == (void *)-1) {
+ fprintf(stderr, "%s: mapping failed: %s\n", name,
+ strerror(errno));
+ return;
+ }
+
+ switch (arg) {
+ case 0:
+ case 1:
+ _cmd_pattern(name, ptr, chunk->size, arg);
+ break;
+
+ case 2:
+ _cmd_dump(name, ptr);
+ }
+
+ munmap(ptr, chunk->size);
+}
+
+static void cmd_kpattern(char *name, char *line, int arg)
+{
+ struct chunk *chunk = _cmd_numbered(name, line);
+ if (chunk) {
+ int ret;
+
+ fprintf(stderr, "%s: requesting kernel to %s %p@%p\n",
+ name, arg ? "fill" : "verify",
+ (void *)chunk->size, (void *)chunk->start);
+
+ ret = ioctl(chunk->fd, IOCTL_CMA_PATTERN, arg);
+ if (ret < 0)
+ fprintf(stderr, "%s: %s\n", name, strerror(errno));
+ else if ((unsigned long)ret < chunk->size)
+ fprintf(stderr, "%s: failed at +[%x]\n", name, ret);
+ else
+ fprintf(stderr, "%s: done\n", name);
+ }
+}
+
+static void cmd_kdump(char *name, char *line, int arg)
+{
+ struct chunk *chunk = _cmd_numbered(name, line);
+
+ (void)arg;
+
+ if (chunk) {
+ int ret;
+
+ fprintf(stderr, "%s: requesting kernel to dump 256B@%p\n",
+ name, (void *)chunk->start);
+
+ ret = ioctl(chunk->fd, IOCTL_CMA_DUMP, 256);
+ if (ret < 0)
+ fprintf(stderr, "%s: %s\n", name, strerror(errno));
+ else
+ fprintf(stderr, "%s: done\n", name);
+ }
+}
+
+static const struct command {
+ const char short_name;
+ const char name[8];
+ void (*handle)(char *name, char *line, int arg);
+ int arg;
+ const char *help_args, *help;
+} commands[] = {
+ { 'l', "list", cmd_list, 0,
+ "", "list allocated chunks" },
+ { 'a', "alloc", cmd_alloc, 0,
+ "<size>[/<alignment>]", "allocate chunk" },
+ { 'f', "free", cmd_free, 0,
+ "[<num>]", "free an chunk" },
+ { 'w', "write", cmd_mapped, 1,
+ "[<num>]", "write data to chunk" },
+ { 'W', "kwrite", cmd_kpattern, 1,
+ "[<num>]", "let kernel write data to chunk" },
+ { 'v', "verify", cmd_mapped, 0,
+ "[<num>]", "verify chunk's content" },
+ { 'V', "kverify", cmd_kpattern, 0,
+ "[<num>]", "let kernel verify chunk's contet" },
+ { 'd', "dump", cmd_mapped, 2,
+ "[<num>]", "dump (some) content" },
+ { 'D', "kdump", cmd_kdump, 0,
+ "[<num>]", "let kernel dump (some) content" },
+ { '\0', "", NULL, 0, NULL, NULL }
+};
+
+static void handle_command(char *line)
+{
+ static char last_line[1024];
+
+ const struct command *cmd;
+ char *name, short_name = '\0';
+
+ SKIP_SPACE(line);
+ if (*line == '#')
+ return;
+
+ if (!*line)
+ strcpy(line, last_line);
+ else
+ strcpy(last_line, line);
+
+ name = line;
+ while (*line && !isspace(*line))
+ ++line;
+
+ if (*line) {
+ *line = '\0';
+ ++line;
+ }
+
+ if (!name[1])
+ short_name = name[0];
+
+ for (cmd = commands; *(cmd->name); ++cmd)
+ if (short_name
+ ? short_name == cmd->short_name
+ : !strcmp(name, cmd->name)) {
+ cmd->handle(name, line, cmd->arg);
+ return;
+ }
+
+ fprintf(stderr, "%s: unknown command\n", name);
+}
+
+
+/****************************** Main ******************************/
+
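+/*
+ * Example session (a hypothetical transcript; chunk numbers are the
+ * file descriptors returned by open() and will vary):
+ *
+ *   a 4M/1M    allocate a 4 MiB chunk aligned to 1 MiB
+ *   l          list allocated chunks
+ *   w          fill the most recent chunk with a test pattern
+ *   v          verify the pattern through the userspace mapping
+ *   D          ask the kernel to dump the first 256 bytes
+ *   f          free the most recent chunk
+ */
+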
+int main(void)
+{
+ const struct command *cmd = commands;
+ unsigned no = 1;
+ char line[1024];
+ int skip = 0;
+
+ fputs("commands:\n", stderr);
+ do {
+ fprintf(stderr, " %c or %-7s %-10s %s\n",
+ cmd->short_name, cmd->name, cmd->help_args, cmd->help);
+ } while ((++cmd)->handle);
+ fputs(" # ... comment\n"
+ " <empty line> repeat previous\n"
+ "\n", stderr);
+
+ while (fgets(line, sizeof line, stdin)) {
+ char *nl = strchr(line, '\n');
+ if (nl) {
+ if (skip) {
+ fprintf(stderr, "cma: %d: line too long\n", no);
+ skip = 0;
+ } else {
+ *nl = '\0';
+ handle_command(line);
+ }
+ ++no;
+ } else {
+ skip = 1;
+ }
+ }
+
+ if (skip)
+ fprintf(stderr, "cma: %d: no new line at EOF\n", no);
+ return 0;
+}
--
1.7.2.3

2010-12-13 11:27:04

by Michal Nazarewicz

[permalink] [raw]
Subject: [PATCHv7 02/10] lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()

This commit adds a bitmap_find_next_zero_area_off() function which
works like bitmap_find_next_zero_area() except that it allows an
offset to be specified for the alignment check. This lets the caller
request a bit range whose starting bit number plus the offset is
aligned according to the mask.
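
As a quick illustration (a sketch only, not part of the patch), a caller
whose bitmap bit 0 corresponds to some base PFN can now ask for a range
that is aligned in PFN space rather than in bitmap space; bitmap, nbits
and base_pfn below are hypothetical:

	/* Find 16 free bits such that (bit number + base_pfn) is a
	 * multiple of 8, then claim them. */
	unsigned long bitno;

	bitno = bitmap_find_next_zero_area_off(bitmap, nbits, 0, 16,
					       7, base_pfn & 7);
	if (bitno + 16 <= nbits)
		bitmap_set(bitmap, bitno, 16);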

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/bitmap.h | 24 +++++++++++++++++++-----
lib/bitmap.c | 22 ++++++++++++----------
2 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index daf8c48..c0528d1 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -45,6 +45,7 @@
* bitmap_set(dst, pos, nbits) Set specified bit area
* bitmap_clear(dst, pos, nbits) Clear specified bit area
* bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area
+ * bitmap_find_next_zero_area_off(buf, len, pos, n, mask, off) as above, with alignment offset
* bitmap_shift_right(dst, src, n, nbits) *dst = *src >> n
* bitmap_shift_left(dst, src, n, nbits) *dst = *src << n
* bitmap_remap(dst, src, old, new, nbits) *dst = map(old, new)(src)
@@ -113,11 +114,24 @@ extern int __bitmap_weight(const unsigned long *bitmap, int bits);

extern void bitmap_set(unsigned long *map, int i, int len);
extern void bitmap_clear(unsigned long *map, int start, int nr);
-extern unsigned long bitmap_find_next_zero_area(unsigned long *map,
- unsigned long size,
- unsigned long start,
- unsigned int nr,
- unsigned long align_mask);
+
+extern unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
+ unsigned long size,
+ unsigned long start,
+ unsigned int nr,
+ unsigned long align_mask,
+ unsigned long align_offset);
+
+static inline unsigned long
+bitmap_find_next_zero_area(unsigned long *map,
+ unsigned long size,
+ unsigned long start,
+ unsigned int nr,
+ unsigned long align_mask)
+{
+ return bitmap_find_next_zero_area_off(map, size, start, nr,
+ align_mask, 0);
+}

extern int bitmap_scnprintf(char *buf, unsigned int len,
const unsigned long *src, int nbits);
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 741fae9..8e75a6f 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -315,30 +315,32 @@ void bitmap_clear(unsigned long *map, int start, int nr)
}
EXPORT_SYMBOL(bitmap_clear);

-/*
+/**
* bitmap_find_next_zero_area - find a contiguous aligned zero area
* @map: The address to base the search on
* @size: The bitmap size in bits
* @start: The bitnumber to start searching at
* @nr: The number of zeroed bits we're looking for
* @align_mask: Alignment mask for zero area
+ * @align_offset: Alignment offset for zero area.
*
* The @align_mask should be one less than a power of 2; the effect is that
- * the bit offset of all zero areas this function finds is multiples of that
- * power of 2. A @align_mask of 0 means no alignment is required.
+ * the bit offset of all zero areas this function finds plus @align_offset
+ * is a multiple of that power of 2.
*/
-unsigned long bitmap_find_next_zero_area(unsigned long *map,
- unsigned long size,
- unsigned long start,
- unsigned int nr,
- unsigned long align_mask)
+unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
+ unsigned long size,
+ unsigned long start,
+ unsigned int nr,
+ unsigned long align_mask,
+ unsigned long align_offset)
{
unsigned long index, end, i;
again:
index = find_next_zero_bit(map, size, start);

/* Align allocation */
- index = __ALIGN_MASK(index, align_mask);
+ index = __ALIGN_MASK(index + align_offset, align_mask) - align_offset;

end = index + nr;
if (end > size)
@@ -350,7 +352,7 @@ again:
}
return index;
}
-EXPORT_SYMBOL(bitmap_find_next_zero_area);
+EXPORT_SYMBOL(bitmap_find_next_zero_area_off);

/*
* Bitmap printing & parsing functions: first version by Bill Irwin,
--
1.7.2.3

2010-12-13 11:28:27

by Michal Nazarewicz

[permalink] [raw]
Subject: [PATCHv7 10/10] ARM: cma: Added CMA to Aquila, Goni and c210 universal boards

This commit adds CMA memory reservation code to Aquila, Goni and c210
universal boards.

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
arch/arm/mach-s5pv210/mach-aquila.c | 2 +
arch/arm/mach-s5pv210/mach-goni.c | 2 +
arch/arm/mach-s5pv310/mach-universal_c210.c | 2 +
arch/arm/plat-s5p/Makefile | 2 +
arch/arm/plat-s5p/cma-stub.c | 49 +++++++++++++++++++++++++++
arch/arm/plat-s5p/include/plat/cma-stub.h | 21 +++++++++++
6 files changed, 78 insertions(+), 0 deletions(-)
create mode 100644 arch/arm/plat-s5p/cma-stub.c
create mode 100644 arch/arm/plat-s5p/include/plat/cma-stub.h

diff --git a/arch/arm/mach-s5pv210/mach-aquila.c b/arch/arm/mach-s5pv210/mach-aquila.c
index 28677ca..8608a16 100644
--- a/arch/arm/mach-s5pv210/mach-aquila.c
+++ b/arch/arm/mach-s5pv210/mach-aquila.c
@@ -39,6 +39,7 @@
#include <plat/fb.h>
#include <plat/fimc-core.h>
#include <plat/sdhci.h>
+#include <plat/cma-stub.h>

/* Following are default values for UCON, ULCON and UFCON UART registers */
#define AQUILA_UCON_DEFAULT (S3C2410_UCON_TXILEVEL | \
@@ -690,4 +691,5 @@ MACHINE_START(AQUILA, "Aquila")
.map_io = aquila_map_io,
.init_machine = aquila_machine_init,
.timer = &s3c24xx_timer,
+ .reserve = cma_mach_reserve,
MACHINE_END
diff --git a/arch/arm/mach-s5pv210/mach-goni.c b/arch/arm/mach-s5pv210/mach-goni.c
index b1dcf96..b1bf079 100644
--- a/arch/arm/mach-s5pv210/mach-goni.c
+++ b/arch/arm/mach-s5pv210/mach-goni.c
@@ -45,6 +45,7 @@
#include <plat/keypad.h>
#include <plat/sdhci.h>
#include <plat/clock.h>
+#include <plat/cma-stub.h>

/* Following are default values for UCON, ULCON and UFCON UART registers */
#define GONI_UCON_DEFAULT (S3C2410_UCON_TXILEVEL | \
@@ -865,4 +866,5 @@ MACHINE_START(GONI, "GONI")
.map_io = goni_map_io,
.init_machine = goni_machine_init,
.timer = &s3c24xx_timer,
+ .reserve = cma_mach_reserve,
MACHINE_END
diff --git a/arch/arm/mach-s5pv310/mach-universal_c210.c b/arch/arm/mach-s5pv310/mach-universal_c210.c
index 16d8fc0..d65703a 100644
--- a/arch/arm/mach-s5pv310/mach-universal_c210.c
+++ b/arch/arm/mach-s5pv310/mach-universal_c210.c
@@ -21,6 +21,7 @@
#include <plat/s5pv310.h>
#include <plat/cpu.h>
#include <plat/devs.h>
+#include <plat/cma-stub.h>

#include <mach/map.h>

@@ -152,6 +153,7 @@ MACHINE_START(UNIVERSAL_C210, "UNIVERSAL_C210")
.boot_params = S5P_PA_SDRAM + 0x100,
.init_irq = s5pv310_init_irq,
.map_io = universal_map_io,
+ .reserve = cma_mach_reserve,
.init_machine = universal_machine_init,
.timer = &s5pv310_timer,
MACHINE_END
diff --git a/arch/arm/plat-s5p/Makefile b/arch/arm/plat-s5p/Makefile
index de65238..6fdb6ce 100644
--- a/arch/arm/plat-s5p/Makefile
+++ b/arch/arm/plat-s5p/Makefile
@@ -28,3 +28,5 @@ obj-$(CONFIG_S5P_DEV_FIMC0) += dev-fimc0.o
obj-$(CONFIG_S5P_DEV_FIMC1) += dev-fimc1.o
obj-$(CONFIG_S5P_DEV_FIMC2) += dev-fimc2.o
obj-$(CONFIG_S5P_DEV_ONENAND) += dev-onenand.o
+
+obj-$(CONFIG_CMA) += cma-stub.o
diff --git a/arch/arm/plat-s5p/cma-stub.c b/arch/arm/plat-s5p/cma-stub.c
new file mode 100644
index 0000000..716e56d
--- /dev/null
+++ b/arch/arm/plat-s5p/cma-stub.c
@@ -0,0 +1,49 @@
+/*
+ * This file is just a quick and dirty hack to get the CMA testing device
+ * working. cma_mach_reserve() should be called as the machine's reserve
+ * callback. The CMA testing device will use cma_ctx for allocations.
+ */
+
+#include <plat/cma-stub.h>
+
+#include <linux/cma.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+
+struct cma *cma_ctx;
+
+#define cma_size (32UL << 20) /* 32 MiB */
+
+static unsigned long cma_start __initdata;
+
+void __init cma_mach_reserve(void)
+{
+ unsigned long start = cma_reserve(0, cma_size, 0);
+ if (IS_ERR_VALUE(start))
+ printk(KERN_WARNING "cma: unable to reserve %lu for CMA: %d\n",
+ cma_size >> 20, (int)start);
+ else
+ cma_start = start;
+}
+
+static int __init cma_mach_init(void)
+{
+ int ret = -ENOMEM;
+
+ if (cma_start) {
+ struct cma *ctx = cma_create(cma_start, cma_size);
+ if (IS_ERR(ctx)) {
+ ret = PTR_ERR(ctx);
+ printk(KERN_WARNING
+ "cma: cma_create(%p, %p) failed: %d\n",
+ (void *)cma_start, (void *)cma_size, ret);
+ } else {
+ cma_ctx = ctx;
+ ret = 0;
+ }
+ }
+
+ return ret;
+}
+device_initcall(cma_mach_init);
diff --git a/arch/arm/plat-s5p/include/plat/cma-stub.h b/arch/arm/plat-s5p/include/plat/cma-stub.h
new file mode 100644
index 0000000..a24a03b
--- /dev/null
+++ b/arch/arm/plat-s5p/include/plat/cma-stub.h
@@ -0,0 +1,21 @@
+/*
+ * This file is just a quick and dirty hack to get the CMA testing device
+ * working. cma_mach_reserve() should be called as the machine's reserve
+ * callback. The CMA testing device will use cma_ctx for allocations.
+ */
+
+struct cma;
+
+#ifdef CONFIG_CMA
+
+extern struct cma *cma_ctx;
+
+void cma_mach_reserve(void);
+
+#else
+
+#define cma_ctx ((struct cma *)NULL)
+
+#define cma_mach_reserve NULL
+
+#endif
--
1.7.2.3

2010-12-13 11:28:45

by Michal Nazarewicz

[permalink] [raw]
Subject: [PATCHv7 05/10] mm: alloc_contig_free_pages() added

From: KAMEZAWA Hiroyuki <[email protected]>

This commit introduces the alloc_contig_free_pages() function,
which allocates (i.e. removes from the buddy system) free pages
in a range. The caller has to guarantee that all pages in the
range are in the buddy system.

Along with alloc_contig_free_pages(), a free_contig_pages()
function is provided which frees all (or a subset of the) pages
allocated with alloc_contig_free_pages().

I, Michal Nazarewicz, have modified the
alloc_contig_free_pages() function slightly from the original
version, mostly to make it easier to allocate pages that are not
MAX_ORDER aligned. This is done by making the function return
the pfn of the page one past the last page allocated, which may
be further than the caller requested.
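
For illustration (a hedged sketch, not code from this patchset, using the
function names as they appear in the diff below), a caller that has already
isolated a pfn range and made sure every page in it is free in the buddy
system could use the pair roughly like this; start_pfn and end_pfn are
hypothetical:

	unsigned long allocated_end;

	/* Pull [start_pfn, end_pfn) out of the buddy system; the return
	 * value may overshoot end_pfn because whole buddy blocks are
	 * removed. */
	allocated_end = alloc_contig_freed_pages(start_pfn, end_pfn,
						 GFP_KERNEL);

	/* ... use the contiguous pages ... */

	/* Give back everything that was pulled out, overshoot included. */
	free_contig_pages(pfn_to_page(start_pfn),
			  allocated_end - start_pfn);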

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/page-isolation.h | 3 ++
mm/page_alloc.c | 42 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 58cdbac..f1417ed 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -32,6 +32,9 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
*/
extern int set_migratetype_isolate(struct page *page);
extern void unset_migratetype_isolate(struct page *page);
+extern unsigned long alloc_contig_freed_pages(unsigned long start,
+ unsigned long end, gfp_t flag);
+extern void free_contig_pages(struct page *page, int nr_pages);

/*
* For migration.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 826ba69..997f6c8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5425,6 +5425,48 @@ out:
spin_unlock_irqrestore(&zone->lock, flags);
}

+unsigned long alloc_contig_freed_pages(unsigned long start,
+ unsigned long end, gfp_t flag)
+{
+ unsigned long pfn = start, count;
+ struct page *page;
+ struct zone *zone;
+ int order;
+
+ VM_BUG_ON(!pfn_valid(pfn));
+ page = pfn_to_page(pfn);
+
+ zone = page_zone(page);
+ spin_lock_irq(&zone->lock);
+ for (;;) {
+ VM_BUG_ON(page_count(page) || !PageBuddy(page));
+ list_del(&page->lru);
+ order = page_order(page);
+ zone->free_area[order].nr_free--;
+ rmv_page_order(page);
+ __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+ pfn += 1 << order;
+ if (pfn >= end)
+ break;
+ VM_BUG_ON(!pfn_valid(pfn));
+ page += 1 << order;
+ }
+ spin_unlock_irq(&zone->lock);
+
+ /* After this, pages in the range can be freed one by one */
+ page = pfn_to_page(start);
+ for (count = pfn - start; count; --count, ++page)
+ prep_new_page(page, 0, flag);
+
+ return pfn;
+}
+
+void free_contig_pages(struct page *page, int nr_pages)
+{
+ for (; nr_pages; --nr_pages, ++page)
+ __free_page(page);
+}
+
#ifdef CONFIG_MEMORY_HOTREMOVE
/*
* All pages in the range must be isolated before calling this.
--
1.7.2.3

2010-12-13 11:28:06

by Michal Nazarewicz

[permalink] [raw]
Subject: [PATCHv7 07/10] mm: MIGRATE_CMA isolation functions added

This commit changes the various functions that switch pages' and
pageblocks' migrate type between MIGRATE_ISOLATE and
MIGRATE_MOVABLE so that they can also work with the MIGRATE_CMA
migrate type.
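
For illustration (a sketch of the intended use, not code taken from this
patchset), a CMA-style caller is expected to isolate a pageblock-aligned
range and restore it as MIGRATE_CMA, while existing callers keep using the
MIGRATE_MOVABLE wrappers; start_pfn and end_pfn are hypothetical:

	if (__start_isolate_page_range(start_pfn, end_pfn, MIGRATE_CMA))
		return -EBUSY;

	/* ... migrate movable pages out and allocate the range ... */

	/* Hand the pageblocks back as MIGRATE_CMA, not MIGRATE_MOVABLE. */
	__undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_CMA);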

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/page-isolation.h | 39 ++++++++++++++++++++++++++-------------
mm/page_alloc.c | 6 +++---
mm/page_isolation.c | 15 ++++++++-------
3 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index f1417ed..56f0e13 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -3,35 +3,49 @@

/*
* Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- * If specified range includes migrate types other than MOVABLE,
+ * If specified range includes migrate types other than MOVABLE or CMA,
* this will fail with -EBUSY.
*
* For isolating all pages in the range finally, the caller have to
* free all pages in the range. test_page_isolated() can be used for
* test it.
*/
-extern int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+ unsigned migratetype);
+
+static inline int
+start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+ return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
+
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+ unsigned migratetype);

/*
* Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
* target range is [start_pfn, end_pfn)
*/
-extern int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+static inline int
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+ return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}

/*
- * test all pages in [start_pfn, end_pfn)are isolated or not.
+ * Test all pages in [start_pfn, end_pfn) are isolated or not.
*/
-extern int
-test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);

/*
- * Internal funcs.Changes pageblock's migrate type.
- * Please use make_pagetype_isolated()/make_pagetype_movable().
+ * Internal functions. Changes pageblock's migrate type.
*/
-extern int set_migratetype_isolate(struct page *page);
-extern void unset_migratetype_isolate(struct page *page);
+int set_migratetype_isolate(struct page *page);
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype);
+static inline void unset_migratetype_isolate(struct page *page)
+{
+ __unset_migratetype_isolate(page, MIGRATE_MOVABLE);
+}
extern unsigned long alloc_contig_freed_pages(unsigned long start,
unsigned long end, gfp_t flag);
extern void free_contig_pages(struct page *page, int nr_pages);
@@ -39,7 +53,6 @@ extern void free_contig_pages(struct page *page, int nr_pages);
/*
* For migration.
*/
-
int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
unsigned long scan_lru_pages(unsigned long start, unsigned long end);
int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 537d1f6..084f1a9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5474,7 +5474,7 @@ out:
return ret;
}

-void unset_migratetype_isolate(struct page *page)
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype)
{
struct zone *zone;
unsigned long flags;
@@ -5482,8 +5482,8 @@ void unset_migratetype_isolate(struct page *page)
spin_lock_irqsave(&zone->lock, flags);
if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
goto out;
- set_pageblock_migratetype(page, MIGRATE_MOVABLE);
- move_freepages_block(zone, page, MIGRATE_MOVABLE);
+ set_pageblock_migratetype(page, migratetype);
+ move_freepages_block(zone, page, migratetype);
out:
spin_unlock_irqrestore(&zone->lock, flags);
}
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 077cf19..ea9781e 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -23,10 +23,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
}

/*
- * start_isolate_page_range() -- make page-allocation-type of range of pages
+ * __start_isolate_page_range() -- make page-allocation-type of range of pages
* to be MIGRATE_ISOLATE.
* @start_pfn: The lower PFN of the range to be isolated.
* @end_pfn: The upper PFN of the range to be isolated.
+ * @migratetype: migrate type to set in error recovery.
*
* Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
* the range will never be allocated. Any free pages and pages freed in the
@@ -35,8 +36,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
* start_pfn/end_pfn must be aligned to pageblock_order.
* Returns 0 on success and -EBUSY if any part of range cannot be isolated.
*/
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+ unsigned migratetype)
{
unsigned long pfn;
unsigned long undo_pfn;
@@ -59,7 +60,7 @@ undo:
for (pfn = start_pfn;
pfn < undo_pfn;
pfn += pageblock_nr_pages)
- unset_migratetype_isolate(pfn_to_page(pfn));
+ __unset_migratetype_isolate(pfn_to_page(pfn), migratetype);

return -EBUSY;
}
@@ -67,8 +68,8 @@ undo:
/*
* Make isolated pages available again.
*/
-int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+ unsigned migratetype)
{
unsigned long pfn;
struct page *page;
@@ -80,7 +81,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
page = __first_valid_page(pfn, pageblock_nr_pages);
if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
continue;
- unset_migratetype_isolate(page);
+ __unset_migratetype_isolate(page, migratetype);
}
return 0;
}
--
1.7.2.3

2010-12-13 11:27:03

by Michal Nazarewicz

[permalink] [raw]
Subject: [PATCHv7 01/10] mm: migrate.c: fix compilation error

GCC complained about update_mmu_cache() not being defined
in migrate.c. Including <asm/tlbflush.h> seems to solve the problem.

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
mm/migrate.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index fe5a3c6..6ae8a66 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -35,6 +35,8 @@
#include <linux/hugetlb.h>
#include <linux/gfp.h>

+#include <asm/tlbflush.h>
+
#include "internal.h"

#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
--
1.7.2.3

2010-12-13 11:29:36

by Michal Nazarewicz

[permalink] [raw]
Subject: [PATCHv7 06/10] mm: MIGRATE_CMA migration type added

The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA
pageblocks and (ii) the page allocator will never change the
migration type of MIGRATE_CMA pageblocks.

This guarantees that a page in a MIGRATE_CMA pageblock can
always be migrated somewhere else (unless there's no memory left
in the system).

It is designed to be used with the Contiguous Memory Allocator
(CMA) for allocating big chunks (eg. 10MiB) of physically
contiguous memory. Once a driver requests contiguous memory,
CMA will migrate pages out of MIGRATE_CMA pageblocks.

To minimise the number of migrations, the MIGRATE_CMA migration
type is the last type tried when the page allocator falls back
to migration types other than the one requested.
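
As an aside, here is a hypothetical helper (not part of this patch)
showing how the new macros are meant to be combined; is_migrate_cma()
compiles to false when CONFIG_MIGRATE_CMA is disabled, so such a check
costs nothing there:

	/* Can this pageblock's migrate type safely be changed? */
	static bool pageblock_type_can_change(struct page *page)
	{
		int mt = get_pageblock_migratetype(page);

		return mt != MIGRATE_ISOLATE && !is_migrate_cma(mt);
	}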

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/mmzone.h | 30 +++++++++++---
mm/Kconfig | 8 ++++
mm/compaction.c | 10 +++++
mm/internal.h | 3 +
mm/page_alloc.c | 97 +++++++++++++++++++++++++++++++++++++++--------
5 files changed, 124 insertions(+), 24 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 39c24eb..1b95899 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,24 @@
*/
#define PAGE_ALLOC_COSTLY_ORDER 3

-#define MIGRATE_UNMOVABLE 0
-#define MIGRATE_RECLAIMABLE 1
-#define MIGRATE_MOVABLE 2
-#define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE 3
-#define MIGRATE_ISOLATE 4 /* can't allocate from here */
-#define MIGRATE_TYPES 5
+enum {
+ MIGRATE_UNMOVABLE,
+ MIGRATE_RECLAIMABLE,
+ MIGRATE_MOVABLE,
+ MIGRATE_PCPTYPES, /* the number of types on the pcp lists */
+ MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+ MIGRATE_ISOLATE, /* can't allocate from here */
+#ifdef CONFIG_MIGRATE_CMA
+ MIGRATE_CMA, /* only movable */
+#endif
+ MIGRATE_TYPES
+};
+
+#ifdef CONFIG_MIGRATE_CMA
+# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+# define is_migrate_cma(migratetype) false
+#endif

#define for_each_migratetype_order(order, type) \
for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +65,11 @@ static inline int get_pageblock_migratetype(struct page *page)
return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
}

+static inline bool is_pageblock_cma(struct page *page)
+{
+ return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
struct free_area {
struct list_head free_list[MIGRATE_TYPES];
unsigned long nr_free;
diff --git a/mm/Kconfig b/mm/Kconfig
index b911ad3..7818b07 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1,3 +1,11 @@
+config MIGRATE_CMA
+ bool
+ help
+ This option should be selected by code that requires MIGRATE_CMA
+ migration type to be present. Once a page block has this
+ migration type, only movable pages can be allocated from it and
+ the page block never changes its migration type.
+
config SELECT_MEMORY_MODEL
def_bool y
depends on EXPERIMENTAL || ARCH_SELECT_MEMORY_MODEL
diff --git a/mm/compaction.c b/mm/compaction.c
index 4d709ee..c5e404b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -113,6 +113,16 @@ static bool suitable_migration_target(struct page *page)
if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
return false;

+ /* Keep MIGRATE_CMA alone as well. */
+ /*
+ * XXX Revisit. We currently cannot let compaction touch CMA
+ * pages since compaction insists on changing their migration
+ * type to MIGRATE_MOVABLE (see split_free_page() called from
+ * isolate_freepages_block() above).
+ */
+ if (is_migrate_cma(migratetype))
+ return false;
+
/* If the page is a large free page, then allow migration */
if (PageBuddy(page) && page_order(page) >= pageblock_order)
return true;
diff --git a/mm/internal.h b/mm/internal.h
index dedb0af..cc24e74 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -49,6 +49,9 @@ extern void putback_lru_page(struct page *page);
* in mm/page_alloc.c
*/
extern void __free_pages_bootmem(struct page *page, unsigned int order);
+#ifdef CONFIG_MIGRATE_CMA
+extern void __free_pageblock_cma(struct page *page);
+#endif
extern void prep_compound_page(struct page *page, unsigned long order);
#ifdef CONFIG_MEMORY_FAILURE
extern bool is_free_buddy_page(struct page *page);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 997f6c8..537d1f6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -717,6 +717,30 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
}
}

+#ifdef CONFIG_MIGRATE_CMA
+
+/*
+ * Free a whole pageblock and set its migration type to MIGRATE_CMA.
+ */
+void __init __free_pageblock_cma(struct page *page)
+{
+ struct page *p = page;
+ unsigned i = pageblock_nr_pages;
+
+ prefetchw(p);
+ do {
+ if (--i)
+ prefetchw(p + 1);
+ __ClearPageReserved(p);
+ set_page_count(p, 0);
+ } while (++p, i);
+
+ set_page_refcounted(page);
+ set_pageblock_migratetype(page, MIGRATE_CMA);
+ __free_pages(page, pageblock_order);
+}
+
+#endif

/*
* The order of subdivision here is critical for the IO subsystem.
@@ -824,11 +848,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
* This array describes the order lists are fallen back to when
* the free lists for the desirable migrate type are depleted
*/
-static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
+static int fallbacks[MIGRATE_TYPES][4] = {
[MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
+#ifdef CONFIG_MIGRATE_CMA
+ [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA , MIGRATE_RESERVE },
+#else
[MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
- [MIGRATE_RESERVE] = { MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE }, /* Never used */
+#endif
+ [MIGRATE_RESERVE] = { MIGRATE_RESERVE }, /* Never used */
};

/*
@@ -924,12 +952,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
/* Find the largest possible block of pages in the other list */
for (current_order = MAX_ORDER-1; current_order >= order;
--current_order) {
- for (i = 0; i < MIGRATE_TYPES - 1; i++) {
+ for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
migratetype = fallbacks[start_migratetype][i];

/* MIGRATE_RESERVE handled later if necessary */
if (migratetype == MIGRATE_RESERVE)
- continue;
+ break;

area = &(zone->free_area[current_order]);
if (list_empty(&area->free_list[migratetype]))
@@ -944,19 +972,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
* pages to the preferred allocation list. If falling
* back for a reclaimable kernel allocation, be more
* agressive about taking ownership of free pages
+ *
+ * On the other hand, never change migration
+ * type of MIGRATE_CMA pageblocks nor move CMA
+ * pages on different free lists. We don't
+ * want unmovable pages to be allocated from
+ * MIGRATE_CMA areas.
*/
- if (unlikely(current_order >= (pageblock_order >> 1)) ||
- start_migratetype == MIGRATE_RECLAIMABLE ||
- page_group_by_mobility_disabled) {
- unsigned long pages;
+ if (!is_pageblock_cma(page) &&
+ (unlikely(current_order >= (pageblock_order >> 1)) ||
+ start_migratetype == MIGRATE_RECLAIMABLE ||
+ page_group_by_mobility_disabled)) {
+ int pages;
pages = move_freepages_block(zone, page,
- start_migratetype);
+ start_migratetype);

- /* Claim the whole block if over half of it is free */
+ /*
+ * Claim the whole block if over half
+ * of it is free
+ */
if (pages >= (1 << (pageblock_order-1)) ||
- page_group_by_mobility_disabled)
+ page_group_by_mobility_disabled)
set_pageblock_migratetype(page,
- start_migratetype);
+ start_migratetype);

migratetype = start_migratetype;
}
@@ -966,11 +1004,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
rmv_page_order(page);

/* Take ownership for orders >= pageblock_order */
- if (current_order >= pageblock_order)
+ if (current_order >= pageblock_order &&
+ !is_pageblock_cma(page))
change_pageblock_range(page, current_order,
start_migratetype);

- expand(zone, page, order, current_order, area, migratetype);
+ expand(zone, page, order, current_order, area,
+ is_migrate_cma(start_migratetype)
+ ? start_migratetype : migratetype);

trace_mm_page_alloc_extfrag(page, order, current_order,
start_migratetype, migratetype);
@@ -1042,7 +1083,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
list_add(&page->lru, list);
else
list_add_tail(&page->lru, list);
- set_page_private(page, migratetype);
+#ifdef CONFIG_MIGRATE_CMA
+ if (is_pageblock_cma(page))
+ set_page_private(page, MIGRATE_CMA);
+ else
+#endif
+ set_page_private(page, migratetype);
list = &page->lru;
}
__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1181,9 +1227,16 @@ void free_hot_cold_page(struct page *page, int cold)
* offlined but treat RESERVE as movable pages so we can get those
* areas back if necessary. Otherwise, we may have to free
* excessively into the page allocator
+ *
+ * Still, do not change migration type of MIGRATE_CMA pages (if
+ * they'd be recorded as MIGRATE_MOVABLE an unmovable page could
+ * be allocated from MIGRATE_CMA block and we don't want to allow
+ * that). In this respect, treat MIGRATE_CMA like
+ * MIGRATE_ISOLATE.
*/
if (migratetype >= MIGRATE_PCPTYPES) {
- if (unlikely(migratetype == MIGRATE_ISOLATE)) {
+ if (unlikely(migratetype == MIGRATE_ISOLATE
+ || is_migrate_cma(migratetype))) {
free_one_page(zone, page, 0, migratetype);
goto out;
}
@@ -1272,7 +1325,8 @@ int split_free_page(struct page *page)
if (order >= pageblock_order - 1) {
struct page *endpage = page + (1 << order) - 1;
for (; page < endpage; page += pageblock_nr_pages)
- set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+ if (!is_pageblock_cma(page))
+ set_pageblock_migratetype(page, MIGRATE_MOVABLE);
}

return 1 << order;
@@ -5366,6 +5420,15 @@ int set_migratetype_isolate(struct page *page)
zone_idx = zone_idx(zone);

spin_lock_irqsave(&zone->lock, flags);
+ /*
+ * Treat MIGRATE_CMA specially since it may contain immobile
+ * CMA pages -- that's fine. CMA is likely going to touch
+ * only the mobile pages in the pageblock.
+ */
+ if (is_pageblock_cma(page)) {
+ ret = 0;
+ goto out;
+ }

pfn = page_to_pfn(page);
arg.start_pfn = pfn;
--
1.7.2.3

2010-12-13 11:29:05

by Michal Nazarewicz

[permalink] [raw]
Subject: [PATCHv7 04/10] mm: move some functions from memory_hotplug.c to page_isolation.c

From: KAMEZAWA Hiroyuki <[email protected]>

Memory hotplug contains logic for making pages in a specified pfn
range unused, so some of its core routines can be reused for other
purposes, such as allocating a very large contiguous memory block.

This patch moves some functions from mm/memory_hotplug.c to
mm/page_isolation.c. This makes it easier to add a large-allocation
function to page_isolation.c based on the memory-unplug technique.

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
[mina86: reworded commit message]
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/page-isolation.h | 7 +++
mm/memory_hotplug.c | 108 --------------------------------------
mm/page_isolation.c | 111 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 118 insertions(+), 108 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 051c1b1..58cdbac 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
extern int set_migratetype_isolate(struct page *page);
extern void unset_migratetype_isolate(struct page *page);

+/*
+ * For migration.
+ */
+
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
+unsigned long scan_lru_pages(unsigned long start, unsigned long end);
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);

#endif
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2c6523a..2b18cb5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -634,114 +634,6 @@ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
}

/*
- * Confirm all pages in a range [start, end) is belongs to the same zone.
- */
-static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
-{
- unsigned long pfn;
- struct zone *zone = NULL;
- struct page *page;
- int i;
- for (pfn = start_pfn;
- pfn < end_pfn;
- pfn += MAX_ORDER_NR_PAGES) {
- i = 0;
- /* This is just a CONFIG_HOLES_IN_ZONE check.*/
- while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
- i++;
- if (i == MAX_ORDER_NR_PAGES)
- continue;
- page = pfn_to_page(pfn + i);
- if (zone && page_zone(page) != zone)
- return 0;
- zone = page_zone(page);
- }
- return 1;
-}
-
-/*
- * Scanning pfn is much easier than scanning lru list.
- * Scan pfn from start to end and Find LRU page.
- */
-static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
-{
- unsigned long pfn;
- struct page *page;
- for (pfn = start; pfn < end; pfn++) {
- if (pfn_valid(pfn)) {
- page = pfn_to_page(pfn);
- if (PageLRU(page))
- return pfn;
- }
- }
- return 0;
-}
-
-static struct page *
-hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
-{
- /* This should be improooooved!! */
- return alloc_page(GFP_HIGHUSER_MOVABLE);
-}
-
-#define NR_OFFLINE_AT_ONCE_PAGES (256)
-static int
-do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
-{
- unsigned long pfn;
- struct page *page;
- int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
- int not_managed = 0;
- int ret = 0;
- LIST_HEAD(source);
-
- for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
- if (!pfn_valid(pfn))
- continue;
- page = pfn_to_page(pfn);
- if (!page_count(page))
- continue;
- /*
- * We can skip free pages. And we can only deal with pages on
- * LRU.
- */
- ret = isolate_lru_page(page);
- if (!ret) { /* Success */
- list_add_tail(&page->lru, &source);
- move_pages--;
- inc_zone_page_state(page, NR_ISOLATED_ANON +
- page_is_file_cache(page));
-
- } else {
-#ifdef CONFIG_DEBUG_VM
- printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
- pfn);
- dump_page(page);
-#endif
- /* Becasue we don't have big zone->lock. we should
- check this again here. */
- if (page_count(page)) {
- not_managed++;
- ret = -EBUSY;
- break;
- }
- }
- }
- if (!list_empty(&source)) {
- if (not_managed) {
- putback_lru_pages(&source);
- goto out;
- }
- /* this function returns # of failed pages */
- ret = migrate_pages(&source, hotremove_migrate_alloc, 0, 1);
- if (ret)
- putback_lru_pages(&source);
- }
-out:
- return ret;
-}
-
-/*
* remove from free_area[] and mark all as Reserved.
*/
static int
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 4ae42bb..077cf19 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -5,6 +5,9 @@
#include <linux/mm.h>
#include <linux/page-isolation.h>
#include <linux/pageblock-flags.h>
+#include <linux/memcontrol.h>
+#include <linux/migrate.h>
+#include <linux/mm_inline.h>
#include "internal.h"

static inline struct page *
@@ -139,3 +142,111 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
spin_unlock_irqrestore(&zone->lock, flags);
return ret ? 0 : -EBUSY;
}
+
+
+/*
+ * Confirm all pages in a range [start, end) is belongs to the same zone.
+ */
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long pfn;
+ struct zone *zone = NULL;
+ struct page *page;
+ int i;
+ for (pfn = start_pfn;
+ pfn < end_pfn;
+ pfn += MAX_ORDER_NR_PAGES) {
+ i = 0;
+ /* This is just a CONFIG_HOLES_IN_ZONE check.*/
+ while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
+ i++;
+ if (i == MAX_ORDER_NR_PAGES)
+ continue;
+ page = pfn_to_page(pfn + i);
+ if (zone && page_zone(page) != zone)
+ return 0;
+ zone = page_zone(page);
+ }
+ return 1;
+}
+
+/*
+ * Scanning pfn is much easier than scanning lru list.
+ * Scan pfn from start to end and Find LRU page.
+ */
+unsigned long scan_lru_pages(unsigned long start, unsigned long end)
+{
+ unsigned long pfn;
+ struct page *page;
+ for (pfn = start; pfn < end; pfn++) {
+ if (pfn_valid(pfn)) {
+ page = pfn_to_page(pfn);
+ if (PageLRU(page))
+ return pfn;
+ }
+ }
+ return 0;
+}
+
+struct page *
+hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
+{
+ /* This should be improooooved!! */
+ return alloc_page(GFP_HIGHUSER_MOVABLE);
+}
+
+#define NR_OFFLINE_AT_ONCE_PAGES (256)
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long pfn;
+ struct page *page;
+ int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
+ int not_managed = 0;
+ int ret = 0;
+ LIST_HEAD(source);
+
+ for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+ if (!pfn_valid(pfn))
+ continue;
+ page = pfn_to_page(pfn);
+ if (!page_count(page))
+ continue;
+ /*
+ * We can skip free pages. And we can only deal with pages on
+ * LRU.
+ */
+ ret = isolate_lru_page(page);
+ if (!ret) { /* Success */
+ list_add_tail(&page->lru, &source);
+ move_pages--;
+ inc_zone_page_state(page, NR_ISOLATED_ANON +
+ page_is_file_cache(page));
+
+ } else {
+#ifdef CONFIG_DEBUG_VM
+ printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
+ pfn);
+ dump_page(page);
+#endif
+ /* Because we don't have big zone->lock. we should
+ check this again here. */
+ if (page_count(page)) {
+ not_managed++;
+ ret = -EBUSY;
+ break;
+ }
+ }
+ }
+ if (!list_empty(&source)) {
+ if (not_managed) {
+ putback_lru_pages(&source);
+ goto out;
+ }
+ /* this function returns # of failed pages */
+ ret = migrate_pages(&source, hotremove_migrate_alloc, 0, 1);
+ if (ret)
+ putback_lru_pages(&source);
+ }
+out:
+ return ret;
+}
--
1.7.2.3

2010-12-14 01:21:13

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCHv7 06/10] mm: MIGRATE_CMA migration type added

On Mon, 13 Dec 2010 12:26:47 +0100
Michal Nazarewicz <[email protected]> wrote:

> The MIGRATE_CMA migration type has two main characteristics:
> (i) only movable pages can be allocated from MIGRATE_CMA
> pageblocks and (ii) page allocator will never change migration
> type of MIGRATE_CMA pageblocks.
>
> This guarantees that page in a MIGRATE_CMA page block can
> always be migrated somewhere else (unless there's no memory left
> in the system).
>
> It is designed to be used with Contiguous Memory Allocator
> (CMA) for allocating big chunks (eg. 10MiB) of physically
> contiguous memory. Once driver requests contiguous memory,
> CMA will migrate pages from MIGRATE_CMA pageblocks.
>
> To minimise number of migrations, MIGRATE_CMA migration type
> is the last type tried when page allocator falls back to other
> migration types then requested.
>
> Signed-off-by: Michal Nazarewicz <[email protected]>
> Signed-off-by: Kyungmin Park <[email protected]>
> ---
> include/linux/mmzone.h | 30 +++++++++++---
> mm/Kconfig | 8 ++++
> mm/compaction.c | 10 +++++
> mm/internal.h | 3 +
> mm/page_alloc.c | 97 +++++++++++++++++++++++++++++++++++++++--------
> 5 files changed, 124 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 39c24eb..1b95899 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -35,13 +35,24 @@
> */
> #define PAGE_ALLOC_COSTLY_ORDER 3
>
> -#define MIGRATE_UNMOVABLE 0
> -#define MIGRATE_RECLAIMABLE 1
> -#define MIGRATE_MOVABLE 2
> -#define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */
> -#define MIGRATE_RESERVE 3
> -#define MIGRATE_ISOLATE 4 /* can't allocate from here */
> -#define MIGRATE_TYPES 5
> +enum {
> + MIGRATE_UNMOVABLE,
> + MIGRATE_RECLAIMABLE,
> + MIGRATE_MOVABLE,
> + MIGRATE_PCPTYPES, /* the number of types on the pcp lists */
> + MIGRATE_RESERVE = MIGRATE_PCPTYPES,
> + MIGRATE_ISOLATE, /* can't allocate from here */
> +#ifdef CONFIG_MIGRATE_CMA
> + MIGRATE_CMA, /* only movable */
> +#endif
> + MIGRATE_TYPES
> +};

A nitpick.

I personally would like MIGRATE_ISOLATE to be at the bottom of the enum because it
means _not_for_allocation.


> +
> +#ifdef CONFIG_MIGRATE_CMA
> +# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
> +#else
> +# define is_migrate_cma(migratetype) false
> +#endif
>
> #define for_each_migratetype_order(order, type) \
> for (order = 0; order < MAX_ORDER; order++) \
> @@ -54,6 +65,11 @@ static inline int get_pageblock_migratetype(struct page *page)
> return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
> }
>
> +static inline bool is_pageblock_cma(struct page *page)
> +{
> + return is_migrate_cma(get_pageblock_migratetype(page));
> +}
> +
> struct free_area {
> struct list_head free_list[MIGRATE_TYPES];
> unsigned long nr_free;
> diff --git a/mm/Kconfig b/mm/Kconfig
> index b911ad3..7818b07 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -1,3 +1,11 @@
> +config MIGRATE_CMA
> + bool
> + help
> + This option should be selected by code that requires MIGRATE_CMA
> + migration type to be present. Once a page block has this
> + migration type, only movable pages can be allocated from it and
> + the page block never changes it's migration type.
> +
> config SELECT_MEMORY_MODEL
> def_bool y
> depends on EXPERIMENTAL || ARCH_SELECT_MEMORY_MODEL
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 4d709ee..c5e404b 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -113,6 +113,16 @@ static bool suitable_migration_target(struct page *page)
> if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
> return false;
>
> + /* Keep MIGRATE_CMA alone as well. */
> + /*
> + * XXX Revisit. We currently cannot let compaction touch CMA
> + * pages since compaction insists on changing their migration
> + * type to MIGRATE_MOVABLE (see split_free_page() called from
> + * isolate_freepages_block() above).
> + */
> + if (is_migrate_cma(migratetype))
> + return false;
> +
> /* If the page is a large free page, then allow migration */
> if (PageBuddy(page) && page_order(page) >= pageblock_order)
> return true;
> diff --git a/mm/internal.h b/mm/internal.h
> index dedb0af..cc24e74 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -49,6 +49,9 @@ extern void putback_lru_page(struct page *page);
> * in mm/page_alloc.c
> */
> extern void __free_pages_bootmem(struct page *page, unsigned int order);
> +#ifdef CONFIG_MIGRATE_CMA
> +extern void __free_pageblock_cma(struct page *page);
> +#endif
> extern void prep_compound_page(struct page *page, unsigned long order);
> #ifdef CONFIG_MEMORY_FAILURE
> extern bool is_free_buddy_page(struct page *page);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 997f6c8..537d1f6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -717,6 +717,30 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
> }
> }
>
> +#ifdef CONFIG_MIGRATE_CMA
> +
> +/*
> + * Free whole pageblock and set it's migration type to MIGRATE_CMA.
> + */
> +void __init __free_pageblock_cma(struct page *page)
> +{
> + struct page *p = page;
> + unsigned i = pageblock_nr_pages;
> +
> + prefetchw(p);
> + do {
> + if (--i)
> + prefetchw(p + 1);
> + __ClearPageReserved(p);
> + set_page_count(p, 0);
> + } while (++p, i);
> +
> + set_page_refcounted(page);
> + set_pageblock_migratetype(page, MIGRATE_CMA);
> + __free_pages(page, pageblock_order);
> +}
> +
> +#endif
>
> /*
> * The order of subdivision here is critical for the IO subsystem.
> @@ -824,11 +848,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
> * This array describes the order lists are fallen back to when
> * the free lists for the desirable migrate type are depleted
> */
> -static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
> +static int fallbacks[MIGRATE_TYPES][4] = {
> [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
> [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
> +#ifdef CONFIG_MIGRATE_CMA
> + [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA , MIGRATE_RESERVE },
> +#else
> [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
> - [MIGRATE_RESERVE] = { MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE }, /* Never used */
> +#endif
> + [MIGRATE_RESERVE] = { MIGRATE_RESERVE }, /* Never used */
> };
>
> /*
> @@ -924,12 +952,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> /* Find the largest possible block of pages in the other list */
> for (current_order = MAX_ORDER-1; current_order >= order;
> --current_order) {
> - for (i = 0; i < MIGRATE_TYPES - 1; i++) {
> + for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {

Why fallbacks[0] ? and why do you need to change this ?

> migratetype = fallbacks[start_migratetype][i];
>
> /* MIGRATE_RESERVE handled later if necessary */
> if (migratetype == MIGRATE_RESERVE)
> - continue;
> + break;
>
Isn't this change enough for your purpose ?


> area = &(zone->free_area[current_order]);
> if (list_empty(&area->free_list[migratetype]))
> @@ -944,19 +972,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> * pages to the preferred allocation list. If falling
> * back for a reclaimable kernel allocation, be more
> * agressive about taking ownership of free pages
> + *
> + * On the other hand, never change migration
> + * type of MIGRATE_CMA pageblocks nor move CMA
> + * pages on different free lists. We don't
> + * want unmovable pages to be allocated from
> + * MIGRATE_CMA areas.
> */
> - if (unlikely(current_order >= (pageblock_order >> 1)) ||
> - start_migratetype == MIGRATE_RECLAIMABLE ||
> - page_group_by_mobility_disabled) {
> - unsigned long pages;
> + if (!is_pageblock_cma(page) &&
> + (unlikely(current_order >= (pageblock_order >> 1)) ||
> + start_migratetype == MIGRATE_RECLAIMABLE ||
> + page_group_by_mobility_disabled)) {
> + int pages;
> pages = move_freepages_block(zone, page,
> - start_migratetype);
> + start_migratetype);
>
> - /* Claim the whole block if over half of it is free */
> + /*
> + * Claim the whole block if over half
> + * of it is free
> + */
> if (pages >= (1 << (pageblock_order-1)) ||
> - page_group_by_mobility_disabled)
> + page_group_by_mobility_disabled)
> set_pageblock_migratetype(page,
> - start_migratetype);
> + start_migratetype);
>
> migratetype = start_migratetype;
> }
> @@ -966,11 +1004,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> rmv_page_order(page);
>
> /* Take ownership for orders >= pageblock_order */
> - if (current_order >= pageblock_order)
> + if (current_order >= pageblock_order &&
> + !is_pageblock_cma(page))
> change_pageblock_range(page, current_order,
> start_migratetype);
>
> - expand(zone, page, order, current_order, area, migratetype);
> + expand(zone, page, order, current_order, area,
> + is_migrate_cma(start_migratetype)
> + ? start_migratetype : migratetype);
>
> trace_mm_page_alloc_extfrag(page, order, current_order,
> start_migratetype, migratetype);
> @@ -1042,7 +1083,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
> list_add(&page->lru, list);
> else
> list_add_tail(&page->lru, list);
> - set_page_private(page, migratetype);
> +#ifdef CONFIG_MIGRATE_CMA
> + if (is_pageblock_cma(page))
> + set_page_private(page, MIGRATE_CMA);
> + else
> +#endif
> + set_page_private(page, migratetype);

Hmm, doesn't this conflict with your change which makes MIGRATE_CMA > MIGRATE_PCPTYPES ?
And I think putting a mixture of pages with different migrate types onto a pcp list is ugly.
Could you make this cleaner ?


> list = &page->lru;
> }
> __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
> @@ -1181,9 +1227,16 @@ void free_hot_cold_page(struct page *page, int cold)
> * offlined but treat RESERVE as movable pages so we can get those
> * areas back if necessary. Otherwise, we may have to free
> * excessively into the page allocator
> + *
> + * Still, do not change migration type of MIGRATE_CMA pages (if
> + * they'd be recorded as MIGRATE_MOVABLE an unmovable page could
> + * be allocated from MIGRATE_CMA block and we don't want to allow
> + * that). In this respect, treat MIGRATE_CMA like
> + * MIGRATE_ISOLATE.
> */
> if (migratetype >= MIGRATE_PCPTYPES) {
> - if (unlikely(migratetype == MIGRATE_ISOLATE)) {
> + if (unlikely(migratetype == MIGRATE_ISOLATE
> + || is_migrate_cma(migratetype))) {
> free_one_page(zone, page, 0, migratetype);
> goto out;
> }

Doesn't this add a *BAD* performance impact for the usual use of pages marked as
MIGRATE_CMA ? IIUC, all pcp pages must be _drained_ at page migration after
the migrate type is made ISOLATE, so this change should be unnecessary.

BTW, how about making MIGRATE_CMA < MIGRATE_PCPTYPES and allowing it to have
its own pcp list ?

I think
==
again:
if (likely(order == 0)) {
struct per_cpu_pages *pcp;
struct list_head *list;

local_irq_save(flags);
pcp = &this_cpu_ptr(zone->pageset)->pcp;
list = &pcp->lists[migratetype];
if (list_empty(list)) {
pcp->count += rmqueue_bulk(zone, 0,
pcp->batch, list,
migratetype, cold);
if (unlikely(list_empty(list))) {
+ if (migratetype == MIGRATE_MOVABLE) { /* allow extra fallback */
+ migratetype = MIGRATE_CMA;
+ goto again;
+ }
+ }
goto failed;
}

if (cold)
page = list_entry(list->prev, struct page, lru);
else
page = list_entry(list->next, struct page, lru);

list_del(&page->lru);
pcp->count--;
==
This would work well enough as a fallback path which allows allocating memory from
the CMA area if there aren't enough free pages (and make the fallback list

fallbacks[MIGRATE_CMA] = {?????},

so that MIGRATE_CMA itself has no fallbacks).


> @@ -1272,7 +1325,8 @@ int split_free_page(struct page *page)
> if (order >= pageblock_order - 1) {
> struct page *endpage = page + (1 << order) - 1;
> for (; page < endpage; page += pageblock_nr_pages)
> - set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> + if (!is_pageblock_cma(page))
> + set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> }
>
> return 1 << order;
> @@ -5366,6 +5420,15 @@ int set_migratetype_isolate(struct page *page)
> zone_idx = zone_idx(zone);
>
> spin_lock_irqsave(&zone->lock, flags);
> + /*
> + * Treat MIGRATE_CMA specially since it may contain immobile
> + * CMA pages -- that's fine. CMA is likely going to touch
> + * only the mobile pages in the pageblokc.
> + */
> + if (is_pageblock_cma(page)) {
> + ret = 0;
> + goto out;
> + }
>
> pfn = page_to_pfn(page);
> arg.start_pfn = pfn;

Hmm, I'm not sure why you don't have any change in __free_one_page(), which overwrites
the pageblock type. Is the MIGRATE_CMA range aligned to MAX_ORDER ? If so, please
mention it in the patch description or a comment, because of the patch ordering.


Thanks,
-Kame


2010-12-14 01:29:56

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCHv7 08/10] mm: cma: Contiguous Memory Allocator added

On Mon, 13 Dec 2010 12:26:49 +0100
Michal Nazarewicz <[email protected]> wrote:

> The Contiguous Memory Allocator is a set of functions that lets
> one initialise a region of memory which then can be used to perform
> allocations of contiguous memory chunks from. The implementation
> uses MIGRATE_CMA migration type which means that the memory is
> shared with standard page allocator, ie. when CMA is not using
> the memory, page allocator can allocate movable pages from the
> region.
>
> Signed-off-by: Michal Nazarewicz <[email protected]>
> Signed-off-by: Kyungmin Park <[email protected]>
> ---
> include/linux/cma.h | 223 ++++++++++++++++++++++++
> mm/Kconfig | 32 ++++
> mm/Makefile | 1 +
> mm/cma.c | 477 +++++++++++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 733 insertions(+), 0 deletions(-)
> create mode 100644 include/linux/cma.h
> create mode 100644 mm/cma.c
>
> diff --git a/include/linux/cma.h b/include/linux/cma.h
> new file mode 100644
> index 0000000..25728a3
> --- /dev/null
> +++ b/include/linux/cma.h
> @@ -0,0 +1,223 @@
> +#ifndef __LINUX_CMA_H
> +#define __LINUX_CMA_H
> +
> +/*
> + * Contiguous Memory Allocator
> + * Copyright (c) 2010 by Samsung Electronics.
> + * Written by Michal Nazarewicz ([email protected])
> + */
> +
> +/*
> + * Contiguous Memory Allocator
> + *
> + * The Contiguous Memory Allocator (CMA) makes it possible for
> + * device drivers to allocate big contiguous chunks of memory after
> + * the system has booted.
> + *
> + * It requires some machine- and/or platform-specific initialisation
> + * code which prepares memory ranges to be used with CMA and later,
> + * device drivers can allocate memory from those ranges.
> + *
> + * Why is it needed?
> + *
> + * Various devices on embedded systems have no scatter-gather and/or
> + * IO map support and require contiguous blocks of memory to
> + * operate. They include devices such as cameras, hardware video
> + * coders, etc.
> + *
> + * Such devices often require big memory buffers (a full HD frame
> + * is, for instance, more than 2 megapixels large, i.e. more than 6
> + * MB of memory), which makes mechanisms such as kmalloc() or
> + * alloc_page() ineffective.
> + *
> + * At the same time, a solution where a big memory region is
> + * reserved for a device is suboptimal since often more memory is
> + * reserved than strictly required and, moreover, the memory is
> + * inaccessible to page system even if device drivers don't use it.
> + *
> + * CMA tries to solve this issue by operating on memory regions
> + * where only movable pages can be allocated from. This way, kernel
> + * can use the memory for pagecache and when device driver requests
> + * it, allocated pages can be migrated.
> + *
> + * Driver usage
> + *
> + * For device driver to use CMA it needs to have a pointer to a CMA
> + * context represented by a struct cma (which is an opaque data
> + * type).
> + *
> + * Once such pointer is obtained, device driver may allocate
> + * contiguous memory chunk using the following function:
> + *
> + * cm_alloc()
> + *
> + * This function returns a pointer to struct cm (another opaque data
> + * type) which represents a contiguous memory chunk. This pointer
> + * may be used with the following functions:
> + *
> + * cm_free() -- frees allocated contiguous memory
> + * cm_pin() -- pins memory
> + * cm_unpin() -- unpins memory
> + * cm_vmap() -- maps memory in kernel space
> + * cm_vunmap() -- unmaps memory from kernel space
> + *
> + * See the respective functions for more information.
> + *
> + * Platform/machine integration
> + *
> + * For device drivers to be able to use CMA platform or machine
> + * initialisation code must create a CMA context and pass it to
> + * device drivers. The latter may be done by a global variable or
> + * a platform/machine specific function. For the former CMA
> + * provides the following functions:
> + *
> + * cma_init()
> + * cma_reserve()
> + * cma_create()
> + *
> + * The first one initialises a portion of reserved memory so that it
> + * can be used with CMA. The second one first tries to reserve memory
> + * (using memblock) and then initialises it.
> + *
> + * The cma_reserve() function must be called when memblock is still
> + * operational and reserving memory with it is still possible. On
> + * ARM platform the "reserve" machine callback is a perfect place to
> + * call it.
> + *
> + * The last function creates a CMA context on a range of previously
> + * initialised memory addresses. Because it uses kmalloc() it needs
> + * to be called after SLAB is initialised.
> + */
> +
> +/***************************** Kernel level API *****************************/
> +
> +#if defined __KERNEL__ && defined CONFIG_CMA
> +
> +/* CMA context */
> +struct cma;
> +/* Contiguous Memory chunk */
> +struct cm;
> +
> +/**
> + * cma_init() - initialises range of physical memory to be used with CMA.
> + * @start: start address of the memory range in bytes.
> + * @size: size of the memory range in bytes.
> + *
> + * The range must be MAX_ORDER-1 aligned and it must have been already
> + * reserved (eg. with memblock).
> + *
> + * Returns zero on success or negative error.
> + */
> +int cma_init(unsigned long start, unsigned long end);
> +
> +/**
> + * cma_reserve() - reserves and initialises memory to be used with CMA.
> + * @start: start address of the memory range in bytes hint; if unsure
> + * pass zero (will be down-aligned to MAX_ORDER-1).
> + * @size: size of the memory to reserve in bytes (will be up-aligned
> + * to MAX_ORDER-1).
> + * @alignment: desired alignment in bytes (must be power of two or zero).
> + *
> + * It will use memblock to allocate memory and then initialise it for
> + * use with CMA by invoking cma_init(). It must be called early in
> + * boot process while memblock is still operational.
> + *
> + * Returns the reserved area's physical address or a value that yields true
> + * when checked with IS_ERR_VALUE().
> + */
> +unsigned long cma_reserve(unsigned long start, unsigned long size,
> + unsigned long alignment);
> +
> +/**
> + * cma_create() - creates CMA context.
> + * @start: start address of the context in bytes.
> + * @size: size of the context in bytes.
> + *
> + * The range must be page aligned. The range must have been already
> + * initialised with cma_init(). Different contexts cannot overlap.
> + *
> + * Because this function uses kmalloc() it must be called after SLAB
> + * is initialised. This in particular means that it cannot be called
> + * just after cma_reserve() since the former needs to be run way
> + * earlier.
> + *
> + * Returns pointer to CMA context or a pointer-error on error.
> + */
> +struct cma *cma_create(unsigned long start, unsigned long size);
> +
> +/**
> + * cma_destroy() - destroys CMA context.
> + * @cma: context to destroy.
> + */
> +void cma_destroy(struct cma *cma);
> +
> +/**
> + * cm_alloc() - allocates contiguous memory.
> + * @cma: CMA context to use.
> + * @size: desired chunk size in bytes (must be non-zero).
> + * @alignment: desired minimal alignment in bytes (must be power of two
> + * or zero).
> + *
> + * Returns pointer to structure representing contiguous memory or
> + * a pointer-error on error.
> + */
> +struct cm *cm_alloc(struct cma *cma, unsigned long size,
> + unsigned long alignment);
> +
> +/**
> + * cm_free() - frees contiguous memory.
> + * @cm: contiguous memory to free.
> + *
> + * The contiguous memory must not be pinned (see cm_pin()) and
> + * must not be mapped to kernel space (see cm_vmap()).
> + */
> +void cm_free(struct cm *cm);
> +
> +/**
> + * cm_pin() - pins contiguous memory.
> + * @cm: contiguous memory to pin.
> + *
> + * Pinning is required to obtain contiguous memory's physical address.
> + * While the memory is pinned its physical address remains valid; it may
> + * change if the memory is unpinned and then pinned again. This facility is
> + * provided so that memory defragmentation can be implemented inside
> + * CMA.
> + *
> + * Each call to cm_pin() must be accompanied by a call to cm_unpin() and
> + * the calls may be nested.
> + *
> + * Returns chunk's physical address or a value that yields true when
> + * tested with IS_ERR_VALUE().
> + */
> +unsigned long cm_pin(struct cm *cm);
> +
> +/**
> + * cm_unpin() - unpins contiguous memory.
> + * @cm: contiguous memory to unpin.
> + *
> + * See cm_pin().
> + */
> +void cm_unpin(struct cm *cm);
> +
> +/**
> + * cm_vmap() - maps memory to kernel space (or returns existing mapping).
> + * @cm: contiguous memory to map.
> + *
> + * Each call to cm_vmap() must be accompanied by a call to cm_vunmap()
> + * and the calls may be nested.
> + *
> + * Returns kernel virtual address or a pointer-error.
> + */
> +void *cm_vmap(struct cm *cm);
> +
> +/**
> + * cm_vunmap() - unmaps memory from kernel space.
> + * @cm: contiguous memory to unmap.
> + *
> + * See cm_vmap().
> + */
> +void cm_vunmap(struct cm *cm);
> +
> +#endif
> +
> +#endif
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 7818b07..743893b 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -339,3 +339,35 @@ config CLEANCACHE
> in a negligible performance hit.
>
> If unsure, say Y to enable cleancache
> +
> +
> +config CMA
> + bool "Contiguous Memory Allocator framework"
> + # Currently there is only one allocator so force it on
> + select MIGRATION
> + select MIGRATE_CMA
> + select GENERIC_ALLOCATOR
> + help
> + This enables the Contiguous Memory Allocator framework which
> + allows drivers to allocate big physically-contiguous blocks of
> + memory for use with hardware components that do not support I/O
> + map nor scatter-gather.
> +
> + If you select this option you will also have to select at least
> + one allocator algorithm below.
> +
> + To make use of CMA you need to specify the regions and
> + driver->region mapping on command line when booting the kernel.
> +
> + For more information see <include/linux/cma.h>. If unsure, say "n".
> +
> +config CMA_DEBUG
> + bool "CMA debug messages (DEVELOPEMENT)"
> + depends on CMA
> + help
> + Turns on debug messages in CMA. This produces KERN_DEBUG
> + messages for every CMA call as well as various messages while
> + processing calls such as cm_alloc(). This option does not
> + affect warning and error messages.
> +
> + This is mostly used during development. If unsure, say "n".
> diff --git a/mm/Makefile b/mm/Makefile
> index 0b08d1c..c6a84f1 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -43,3 +43,4 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
> obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
> obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
> obj-$(CONFIG_CLEANCACHE) += cleancache.o
> +obj-$(CONFIG_CMA) += cma.o
> diff --git a/mm/cma.c b/mm/cma.c
> new file mode 100644
> index 0000000..401e604
> --- /dev/null
> +++ b/mm/cma.c
> @@ -0,0 +1,477 @@
> +/*
> + * Contiguous Memory Allocator framework
> + * Copyright (c) 2010 by Samsung Electronics.
> + * Written by Michal Nazarewicz ([email protected])
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 of the
> + * License or (at your option) any later version of the license.
> + */
> +
> +/*
> + * See include/linux/cma.h for details.
> + */
> +
> +#define pr_fmt(fmt) "cma: " fmt
> +
> +#ifdef CONFIG_CMA_DEBUG
> +# define DEBUG
> +#endif
> +
> +#include <linux/cma.h>
> +
> +#ifndef CONFIG_NO_BOOTMEM
> +# include <linux/bootmem.h>
> +#endif
> +#ifdef CONFIG_HAVE_MEMBLOCK
> +# include <linux/memblock.h>
> +#endif
> +
> +#include <linux/err.h>
> +#include <linux/genalloc.h>
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/page-isolation.h>
> +#include <linux/slab.h>
> +#include <linux/swap.h>
> +
> +#include <asm/page.h>
> +
> +#include "internal.h"
> +
> +
> +/************************* Initialise CMA *************************/
> +
> +static struct cma_grabbed {
> + unsigned long start;
> + unsigned long size;
> +} cma_grabbed[8] __initdata;
> +static unsigned cma_grabbed_count __initdata;
> +
> +int cma_init(unsigned long start, unsigned long size)
> +{
> + pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
> +
> + if (!size)
> + return -EINVAL;
> + if ((start | size) & ((MAX_ORDER_NR_PAGES << PAGE_SHIFT) - 1))
> + return -EINVAL;
> + if (start + size < start)
> + return -EOVERFLOW;
> +
> + if (cma_grabbed_count == ARRAY_SIZE(cma_grabbed))
> + return -ENOSPC;
> +
> + cma_grabbed[cma_grabbed_count].start = start;
> + cma_grabbed[cma_grabbed_count].size = size;
> + ++cma_grabbed_count;
> + return 0;
> +}
> +

Is it guaranteed that there are no memory holes or zone overlaps
in the range? I think the correctness of the range must be checked.




> +unsigned long cma_reserve(unsigned long start, unsigned long size,
> + unsigned long alignment)
> +{
> + u64 addr;
> + int ret;
> +
> + pr_debug("%s(%p+%p/%p)\n", __func__, (void *)start, (void *)size,
> + (void *)alignment);
> +
> + /* Sanity checks */
> + if (!size || (alignment & (alignment - 1)))
> + return (unsigned long)-EINVAL;
> +
> + /* Sanitise input arguments */
> + start = ALIGN(start, MAX_ORDER_NR_PAGES << PAGE_SHIFT);
> + size &= ~((MAX_ORDER_NR_PAGES << PAGE_SHIFT) - 1);
> + if (alignment < (MAX_ORDER_NR_PAGES << PAGE_SHIFT))
> + alignment = MAX_ORDER_NR_PAGES << PAGE_SHIFT;
> +
> + /* Reserve memory */
> + if (start) {
> + if (memblock_is_region_reserved(start, size) ||
> + memblock_reserve(start, size) < 0)
> + return (unsigned long)-EBUSY;
> + } else {
> + /*
> + * Use __memblock_alloc_base() since
> + * memblock_alloc_base() panic()s.
> + */
> + addr = __memblock_alloc_base(size, alignment, 0);
> + if (!addr) {
> + return (unsigned long)-ENOMEM;
> + } else if (addr + size > ~(unsigned long)0) {
> + memblock_free(addr, size);
> + return (unsigned long)-EOVERFLOW;
> + } else {
> + start = addr;
> + }
> + }
> +
> + /* CMA Initialise */
> + ret = cma_init(start, size);
> + if (ret < 0) {
> + memblock_free(start, size);
> + return ret;
> + }
> + return start;
> +}
> +
> +static int __init cma_give_back(void)
> +{
> + struct cma_grabbed *r = cma_grabbed;
> + unsigned i = cma_grabbed_count;
> +
> + pr_debug("%s(): will give %u range(s)\n", __func__, i);
> +
> + for (; i; --i, ++r) {
> + struct page *p = phys_to_page(r->start);
> + unsigned j = r->size >> (PAGE_SHIFT + pageblock_order);
> +
> + pr_debug("%s(): giving (%p+%p)\n", __func__,
> + (void *)r->start, (void *)r->size);
> +
> + do {
> + __free_pageblock_cma(p);
> + p += pageblock_nr_pages;
> + } while (--j);
> + }
> +
> + return 0;
> +}
> +subsys_initcall(cma_give_back);
> +
> +
> +/************************** CMA context ***************************/
> +
> +/* struct cma is just an alias for struct gen_alloc */
> +
> +struct cma *cma_create(unsigned long start, unsigned long size)
> +{
> + struct gen_pool *pool;
> + int ret;
> +
> + pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
> +
> + if (!size)
> + return ERR_PTR(-EINVAL);
> + if ((start | size) & (PAGE_SIZE - 1))
> + return ERR_PTR(-EINVAL);
> + if (start + size < start)
> + return ERR_PTR(-EOVERFLOW);
> +
> + pool = gen_pool_create(PAGE_SHIFT, -1);
> + if (unlikely(!pool))
> + return ERR_PTR(-ENOMEM);
> +
> + ret = gen_pool_add(pool, start, size, -1);
> + if (unlikely(ret)) {
> + gen_pool_destroy(pool);
> + return ERR_PTR(ret);
> + }
> +
> + pr_debug("%s: returning <%p>\n", __func__, (void *)pool);
> + return (void *)pool;
> +}
> +
> +void cma_destroy(struct cma *cma)
> +{
> + pr_debug("%s(<%p>)\n", __func__, (void *)cma);
> + gen_pool_destroy((void *)cma);
> +}
> +
> +
> +/************************* Allocate and free *************************/
> +
> +struct cm {
> + struct gen_pool *pool;
> + unsigned long phys, size;
> + atomic_t pinned, mapped;
> +};
> +
> +/* Protects cm_alloc(), cm_free(), __cm_alloc() and __cm_free(). */
> +static DEFINE_MUTEX(cma_mutex);
> +
> +/* Must hold cma_mutex to call these. */
> +static int __cm_alloc(unsigned long start, unsigned long size);
> +static void __cm_free(unsigned long start, unsigned long size);
> +
> +struct cm *cm_alloc(struct cma *cma, unsigned long size,
> + unsigned long alignment)
> +{
> + unsigned long start;
> + int ret = -ENOMEM;
> + struct cm *cm;
> +
> + pr_debug("%s(<%p>, %p/%p)\n", __func__, (void *)cma,
> + (void *)size, (void *)alignment);
> +
> + if (!size || (alignment & (alignment - 1)))
> + return ERR_PTR(-EINVAL);
> + size = PAGE_ALIGN(size);
> +
> + cm = kmalloc(sizeof *cm, GFP_KERNEL);
> + if (!cm)
> + return ERR_PTR(-ENOMEM);
> +
> + mutex_lock(&cma_mutex);
> +
> + start = gen_pool_alloc_aligned((void *)cma, size,
> + alignment ? ffs(alignment) - 1 : 0);
> + if (!start)
> + goto error1;
> +
> + ret = __cm_alloc(start, size);
> + if (ret)
> + goto error2;
> +
> + mutex_unlock(&cma_mutex);
> +
> + cm->pool = (void *)cma;
> + cm->phys = start;
> + cm->size = size;
> + atomic_set(&cm->pinned, 0);
> + atomic_set(&cm->mapped, 0);
> +
> + pr_debug("%s(): returning [%p]\n", __func__, (void *)cm);
> + return cm;
> +
> +error2:
> + gen_pool_free((void *)cma, start, size);
> +error1:
> + mutex_unlock(&cma_mutex);
> + kfree(cm);
> + return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL_GPL(cm_alloc);
> +
> +void cm_free(struct cm *cm)
> +{
> + pr_debug("%s([%p])\n", __func__, (void *)cm);
> +
> + if (WARN_ON(atomic_read(&cm->pinned) || atomic_read(&cm->mapped)))
> + return;
> +
> + mutex_lock(&cma_mutex);
> +
> + gen_pool_free(cm->pool, cm->phys, cm->size);
> + __cm_free(cm->phys, cm->size);
> +
> + mutex_unlock(&cma_mutex);
> +
> + kfree(cm);
> +}
> +EXPORT_SYMBOL_GPL(cm_free);
> +
> +
> +/************************* Mapping and addresses *************************/
> +
> +/*
> + * Currently no-operations but keep reference counters for error
> + * checking.
> + */
> +
> +unsigned long cm_pin(struct cm *cm)
> +{
> + pr_debug("%s([%p])\n", __func__, (void *)cm);
> + atomic_inc(&cm->pinned);
> + return cm->phys;
> +}
> +EXPORT_SYMBOL_GPL(cm_pin);
> +
> +void cm_unpin(struct cm *cm)
> +{
> + pr_debug("%s([%p])\n", __func__, (void *)cm);
> + WARN_ON(!atomic_add_unless(&cm->pinned, -1, 0));
> +}
> +EXPORT_SYMBOL_GPL(cm_unpin);
> +
> +void *cm_vmap(struct cm *cm)
> +{
> + pr_debug("%s([%p])\n", __func__, (void *)cm);
> + atomic_inc(&cm->mapped);
> + /*
> + * Keep it simple... We should do something more clever in
> + * the future.
> + */
> + return phys_to_virt(cm->phys);
> +}
> +EXPORT_SYMBOL_GPL(cm_vmap);
> +
> +void cm_vunmap(struct cm *cm)
> +{
> + pr_debug("%s([%p])\n", __func__, (void *)cm);
> + WARN_ON(!atomic_add_unless(&cm->mapped, -1, 0));
> +}
> +EXPORT_SYMBOL_GPL(cm_vunmap);
> +
> +
> +/************************* Migration stuff *************************/
> +
> +/* XXX Revisit */
> +#ifdef phys_to_pfn
> +/* nothing to do */
> +#elif defined __phys_to_pfn
> +# define phys_to_pfn __phys_to_pfn
> +#else
> +# warning correct phys_to_pfn implementation needed
> +static unsigned long phys_to_pfn(phys_addr_t phys)
> +{
> + return virt_to_pfn(phys_to_virt(phys));
> +}
> +#endif
> +
> +static unsigned long pfn_to_maxpage(unsigned long pfn)
> +{
> + return pfn & ~(MAX_ORDER_NR_PAGES - 1);
> +}
> +
> +static unsigned long pfn_to_maxpage_up(unsigned long pfn)
> +{
> + return ALIGN(pfn, MAX_ORDER_NR_PAGES);
> +}
> +
> +#define MIGRATION_RETRY 5
> +static int __cm_migrate(unsigned long start, unsigned long end)
> +{
> + int migration_failed = 0, ret;
> + unsigned long pfn = start;
> +
> + pr_debug("%s(%p..%p)\n", __func__, (void *)start, (void *)end);
> +
> + /*
> + * Some code "borrowed" from KAMEZAWA Hiroyuki's
> + * __alloc_contig_pages().
> + */
> +
> + for (;;) {
> + pfn = scan_lru_pages(pfn, end);
> + if (!pfn || pfn >= end)
> + break;
> +
> + ret = do_migrate_range(pfn, end);
> + if (!ret) {
> + migration_failed = 0;
> + } else if (ret != -EBUSY
> + || ++migration_failed >= MIGRATION_RETRY) {
> + return ret;
> + } else {
> + /* There are unstable pages on the pagevec. */
> + lru_add_drain_all();
> + /*
> + * there may be pages on pcplist before
> + * we mark the range as ISOLATED.
> + */
> + drain_all_pages();
> + }
> + cond_resched();
> + }
> +
> + if (!migration_failed) {
> + /* drop all pages in pagevec and pcp list */
> + lru_add_drain_all();
> + drain_all_pages();
> + }
> +
> + /* Make sure all pages are isolated */
> + if (WARN_ON(test_pages_isolated(start, end)))
> + return -EBUSY;
> +
> + return 0;
> +}
> +
> +static int __cm_alloc(unsigned long start, unsigned long size)
> +{
> + unsigned long end, _start, _end;
> + int ret;
> +
> + pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
> +
> + /*
> + * What we do here is we mark all pageblocks in range as
> + * MIGRATE_ISOLATE. Because of the way the page allocator works, we
> + * align the range to MAX_ORDER pages so that the page allocator
> + * won't try to merge buddies from different pageblocks and
> + * change MIGRATE_ISOLATE to some other migration type.
> + *
> + * Once the pageblocks are marked as MIGRATE_ISOLATE, we
> + * migrate the pages from an unaligned range (ie. pages that
> + * we are interested in). This will put all the pages in
> + * range back to page allocator as MIGRATE_ISOLATE.
> + *
> + * When this is done, we take the pages in range from page
> + * allocator removing them from the buddy system. This way
> + * page allocator will never consider using them.
> + *
> + * This lets us mark the pageblocks back as MIGRATE_CMA so
> + * that free pages in the MAX_ORDER aligned range but not in
> + * the unaligned, original range are put back to page
> + * allocator so that buddy can use them.
> + */
> +
> + start = phys_to_pfn(start);
> + end = start + (size >> PAGE_SHIFT);
> +
> + pr_debug("\tisolate range(%lx, %lx)\n",
> + pfn_to_maxpage(start), pfn_to_maxpage_up(end));
> + ret = __start_isolate_page_range(pfn_to_maxpage(start),
> + pfn_to_maxpage_up(end), MIGRATE_CMA);
> + if (ret)
> + goto done;
> +
> + pr_debug("\tmigrate range(%lx, %lx)\n", start, end);
> + ret = __cm_migrate(start, end);
> + if (ret)
> + goto done;
> +
> + /*
> + * Pages from [start, end) are within MAX_ORDER aligned
> + * blocks that are marked as MIGRATE_ISOLATE. What's more,
> + * all pages in [start, end) are free in page allocator. What
> + * we are going to do is to allocate all pages from [start,
> + * end) (that is, remove them from the page allocator).
> + *
> + * The only problem is that pages at the beginning and at the
> + * end of the interesting range may not be aligned with pages that
> + * page allocator holds, ie. they can be part of higher order
> + * pages. Because of this, we reserve the bigger range and
> + * once this is done free the pages we are not interested in.
> + */
> +
> + pr_debug("\tfinding buddy\n");
> + ret = 0;
> + while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
> + if (WARN_ON(++ret >= MAX_ORDER))
> + return -EINVAL;
> +
> + _start = start & (~0UL << ret);
> + pr_debug("\talloc freed(%lx, %lx)\n", _start, end);
> + _end = alloc_contig_freed_pages(_start, end, 0);
> +
> + /* Free head and tail (if any) */
> + pr_debug("\tfree contig(%lx, %lx)\n", _start, start);
> + free_contig_pages(pfn_to_page(_start), start - _start);
> + pr_debug("\tfree contig(%lx, %lx)\n", end, _end);
> + free_contig_pages(pfn_to_page(end), _end - end);
> +
> + ret = 0;
> +
> +done:
> + pr_debug("\tundo isolate range(%lx, %lx)\n",
> + pfn_to_maxpage(start), pfn_to_maxpage_up(end));
> + __undo_isolate_page_range(pfn_to_maxpage(start),
> + pfn_to_maxpage_up(end), MIGRATE_CMA);
> +
> + pr_debug("ret = %d\n", ret);
> + return ret;
> +}
> +
> +static void __cm_free(unsigned long start, unsigned long size)
> +{
> + pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
> +
> + free_contig_pages(pfn_to_page(phys_to_pfn(start)),
> + size >> PAGE_SHIFT);
> +}

Hmm, it seems __cm_alloc() and __cm_migrate() have no CMA-specific code.
I'd like to reuse this for my own contig page allocator.
So, could you make these functions more generic (in name)? Something like

__alloc_range(start, size, migrate_type);

Then, all I have to do is add "search range" functions.

Thanks,
-Kame
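
To make the kernel-level API documented in the header above concrete, a minimal
driver-side sketch may help. It is purely illustrative: the driver, the buffer
size and the camera_cma pointer are hypothetical, and only calls documented in
include/linux/cma.h above are used.

    /* Hypothetical driver code -- not part of the posted patchset. */
    static struct cma *camera_cma;  /* handed over by platform/machine code */

    static int camera_grab_buffer(void)
    {
            struct cm *chunk;
            unsigned long phys;
            void *virt;

            chunk = cm_alloc(camera_cma, 6 << 20, 0); /* ~6 MiB, no alignment hint */
            if (IS_ERR(chunk))
                    return PTR_ERR(chunk);

            phys = cm_pin(chunk);   /* physical address for the device */
            virt = cm_vmap(chunk);  /* kernel mapping, if CPU access is needed */
            /* (error checks on phys/virt omitted for brevity) */

            /* ... program the hardware with phys, fill the buffer via virt ... */

            cm_vunmap(chunk);
            cm_unpin(chunk);
            cm_free(chunk);         /* only legal once unmapped and unpinned */
            return 0;
    }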



2010-12-14 10:18:17

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCHv7 06/10] mm: MIGRATE_CMA migration type added

> On Mon, 13 Dec 2010 12:26:47 +0100
> Michal Nazarewicz <[email protected]> wrote:
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> @@ -35,13 +35,24 @@
>> */
>> #define PAGE_ALLOC_COSTLY_ORDER 3
>>
>> -#define MIGRATE_UNMOVABLE 0
>> -#define MIGRATE_RECLAIMABLE 1
>> -#define MIGRATE_MOVABLE 2
>> -#define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */
>> -#define MIGRATE_RESERVE 3
>> -#define MIGRATE_ISOLATE 4 /* can't allocate from here */
>> -#define MIGRATE_TYPES 5
>> +enum {
>> + MIGRATE_UNMOVABLE,
>> + MIGRATE_RECLAIMABLE,
>> + MIGRATE_MOVABLE,
>> + MIGRATE_PCPTYPES, /* the number of types on the pcp lists */
>> + MIGRATE_RESERVE = MIGRATE_PCPTYPES,
>> + MIGRATE_ISOLATE, /* can't allocate from here */
>> +#ifdef CONFIG_MIGRATE_CMA
>> + MIGRATE_CMA, /* only movable */
>> +#endif
>> + MIGRATE_TYPES
>> +};
>

KAMEZAWA Hiroyuki <[email protected]> writes:
> I personally would like to put MIGRATE_ISOLATE at the bottom of the enum
> because it means _not_for_allocation.

Will change. I didn't want to change the value of MIGRATE_ISOLATE for
fear of breaking something, but hopefully no one depends on
MIGRATE_ISOLATE's value.
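
For illustration, the reordered enum being discussed could end up looking
roughly like this (a sketch only; the final ordering was left for the next
revision of the patch):

    enum {
            MIGRATE_UNMOVABLE,
            MIGRATE_RECLAIMABLE,
            MIGRATE_MOVABLE,
            MIGRATE_PCPTYPES,       /* the number of types on the pcp lists */
            MIGRATE_RESERVE = MIGRATE_PCPTYPES,
    #ifdef CONFIG_MIGRATE_CMA
            MIGRATE_CMA,            /* only movable pages allowed */
    #endif
            MIGRATE_ISOLATE,        /* can't allocate from here -- last on purpose */
            MIGRATE_TYPES
    };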

>> diff --git a/mm/compaction.c b/mm/compaction.c
>> @@ -824,11 +848,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>> * This array describes the order lists are fallen back to when
>> * the free lists for the desirable migrate type are depleted
>> */
>> -static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
>> +static int fallbacks[MIGRATE_TYPES][4] = {
>> [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
>> [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
>> +#ifdef CONFIG_MIGRATE_CMA
>> + [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA , MIGRATE_RESERVE },
>> +#else
>> [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
>> - [MIGRATE_RESERVE] = { MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE }, /* Never used */
>> +#endif
>> + [MIGRATE_RESERVE] = { MIGRATE_RESERVE }, /* Never used */
>> };
>>
>> /*
>> @@ -924,12 +952,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>> /* Find the largest possible block of pages in the other list */
>> for (current_order = MAX_ORDER-1; current_order >= order;
>> --current_order) {
>> - for (i = 0; i < MIGRATE_TYPES - 1; i++) {
>> + for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {

> Why fallbacks[0] ? and why do you need to change this ?

I've changed the dimensions of the fallbacks matrix, in particular the second
dimension from MIGRATE_TYPES - 1 to 4, so this place needed to be changed
as well. I think changing to ARRAY_SIZE() is simply the safest
option available. This is actually just a minor optimisation.

>> migratetype = fallbacks[start_migratetype][i];
>>
>> /* MIGRATE_RESERVE handled later if necessary */
>> if (migratetype == MIGRATE_RESERVE)
>> - continue;
>> + break;
>>

> Isn't this change enough for your purpose ?

This is mostly just an optimisation really. I'm not sure what you think
is my purpose here. ;) It does fix the issue of some of the
fallbacks[*] rows effectively having MIGRATE_UNMOVABLE at the end.

>> @@ -1042,7 +1083,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>> list_add(&page->lru, list);
>> else
>> list_add_tail(&page->lru, list);
>> - set_page_private(page, migratetype);
>> +#ifdef CONFIG_MIGRATE_CMA
>> + if (is_pageblock_cma(page))
>> + set_page_private(page, MIGRATE_CMA);
>> + else
>> +#endif
>> + set_page_private(page, migratetype);

> Hmm, doesn't this conflict with your change which makes MIGRATE_CMA >
> MIGRATE_PCPTYPES? And I think putting a mixture of migrate types
> onto one pcp list is ugly.

You mean that the pcp list a page is on can disagree with its
page_private()? I didn't think this was such a big deal honestly.
Unless MIGRATE_CMA < MIGRATE_PCPTYPES, a special case is needed either
here or in free_pcppages_bulk(), so I think this comes down to whether
to make MIGRATE_CMA < MIGRATE_PCPTYPES, for which see below.
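
The is_migrate_cma() and is_pageblock_cma() helpers used above come from an
earlier patch in the series; a plausible minimal form (an assumption here,
not a quote from that patch) would be:

    #ifdef CONFIG_MIGRATE_CMA
    #  define is_migrate_cma(migratetype) ((migratetype) == MIGRATE_CMA)
    #  define is_pageblock_cma(page) \
            is_migrate_cma(get_pageblock_migratetype(page))
    #else
    #  define is_migrate_cma(migratetype) false
    #  define is_pageblock_cma(page) false
    #endif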

>> @@ -1181,9 +1227,16 @@ void free_hot_cold_page(struct page *page, int cold)
>> * offlined but treat RESERVE as movable pages so we can get those
>> * areas back if necessary. Otherwise, we may have to free
>> * excessively into the page allocator
>> + *
>> + * Still, do not change migration type of MIGRATE_CMA pages (if
>> + * they'd be recorded as MIGRATE_MOVABLE an unmovable page could
>> + * be allocated from MIGRATE_CMA block and we don't want to allow
>> + * that). In this respect, treat MIGRATE_CMA like
>> + * MIGRATE_ISOLATE.
>> */
>> if (migratetype >= MIGRATE_PCPTYPES) {
>> - if (unlikely(migratetype == MIGRATE_ISOLATE)) {
>> + if (unlikely(migratetype == MIGRATE_ISOLATE
>> + || is_migrate_cma(migratetype))) {
>> free_one_page(zone, page, 0, migratetype);
>> goto out;
>> }

> Doesn't this add a *BAD* performance impact for the usual use of pages
> marked as MIGRATE_CMA? IIUC, all pcp pages must be _drained_ at page
> migration, after the range is marked MIGRATE_ISOLATE. So, this change
> should be unnecessary.

Come to think of it, it would appear that you are right. I'll remove
this change.

> BTW, how about making MIGRATE_CMA < MIGRATE_PCPTYPES and allowing it to
> have its own pcp list?
>
> I think
> ==
> again:
> if (likely(order == 0)) {
> struct per_cpu_pages *pcp;
> struct list_head *list;
>
> local_irq_save(flags);
> pcp = &this_cpu_ptr(zone->pageset)->pcp;
> list = &pcp->lists[migratetype];
> if (list_empty(list)) {
> pcp->count += rmqueue_bulk(zone, 0,
> pcp->batch, list,
> migratetype, cold);
> if (unlikely(list_empty(list))) {
> + if (migrate_type == MIGRATE_MOVABLE) { /*allow extra fallback*/
> + migrate_type = MIGRATE_CMA;
> + goto again;

(This unbalances local_irq_save but that's just a minor note.)

> + }
> + }
> goto failed;
> }
>
> if (cold)
> page = list_entry(list->prev, struct page, lru);
> else
> page = list_entry(list->next, struct page, lru);
>
> list_del(&page->lru);
> pcp->count--;
> ==
> Will work well enough as a fallback path which allows allocating memory
> from the CMA area if there aren't enough free pages (and defines the
> fallback entry as)
>
> fallbacks[MIGRATE_CMA] = {?????},
>
> so that it has no fallbacks.

Yes, I think that would work. I didn't want to create a new pcp list
especially since in most respects it behaves just like MIGRATE_MOVABLE.
Moreover, with MIGRATE_MOVABLE and MIGRATE_CMA sharing the same pcp list
the above additional fallback path is not necessary and instead the
already existing __rmqueue_fallback() path can be used.

>> @@ -1272,7 +1325,8 @@ int split_free_page(struct page *page)
>> if (order >= pageblock_order - 1) {
>> struct page *endpage = page + (1 << order) - 1;
>> for (; page < endpage; page += pageblock_nr_pages)
>> - set_pageblock_migratetype(page, MIGRATE_MOVABLE);
>> + if (!is_pageblock_cma(page))
>> + set_pageblock_migratetype(page, MIGRATE_MOVABLE);
>> }
>>
>> return 1 << order;
>> @@ -5366,6 +5420,15 @@ int set_migratetype_isolate(struct page *page)
>> zone_idx = zone_idx(zone);
>>
>> spin_lock_irqsave(&zone->lock, flags);
>> + /*
>> + * Treat MIGRATE_CMA specially since it may contain immobile
>> + * CMA pages -- that's fine. CMA is likely going to touch
>> + * only the movable pages in the pageblock.
>> + */
>> + if (is_pageblock_cma(page)) {
>> + ret = 0;
>> + goto out;
>> + }
>>
>> pfn = page_to_pfn(page);
>> arg.start_pfn = pfn;

> Hmm, I'm not sure why you don't have any change in __free_one_page(),
> which overwrites the pageblock type. Is the MIGRATE_CMA range aligned
> to MAX_ORDER? If so, please mention it in the patch description or
> a comment, because of the patch order.

Yep, you're correct. For MIGRATE_CMA to be usable, ranges marked with it
must be aligned to MAX_ORDER_NR_PAGES. I'll add that in a comment
somewhere.
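
One natural spot for such a comment is next to the alignment check that the
posted cma_init() already performs; the wording below is only a suggestion,
not text from the patch:

    /*
     * MIGRATE_CMA ranges must cover whole MAX_ORDER blocks: buddy
     * merging never crosses a MAX_ORDER boundary, so a CMA pageblock
     * can never be merged with -- and re-typed by -- a non-CMA
     * neighbour in __free_one_page().
     */
    if ((start | size) & ((MAX_ORDER_NR_PAGES << PAGE_SHIFT) - 1))
            return -EINVAL;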

--
Best regards, _ _
.o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +--<mina86-tlen.pl>--<jid:mina86-jabber.org>--ooO--(_)--Ooo--

2010-12-14 10:23:21

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCHv7 08/10] mm: cma: Contiguous Memory Allocator added

> On Mon, 13 Dec 2010 12:26:49 +0100
> Michal Nazarewicz <[email protected]> wrote:
>> +/************************* Initialise CMA *************************/
>> +
>> +static struct cma_grabbed {
>> + unsigned long start;
>> + unsigned long size;
>> +} cma_grabbed[8] __initdata;
>> +static unsigned cma_grabbed_count __initdata;
>> +
>> +int cma_init(unsigned long start, unsigned long size)
>> +{
>> + pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
>> +
>> + if (!size)
>> + return -EINVAL;
>> + if ((start | size) & ((MAX_ORDER_NR_PAGES << PAGE_SHIFT) - 1))
>> + return -EINVAL;
>> + if (start + size < start)
>> + return -EOVERFLOW;
>> +
>> + if (cma_grabbed_count == ARRAY_SIZE(cma_grabbed))
>> + return -ENOSPC;
>> +
>> + cma_grabbed[cma_grabbed_count].start = start;
>> + cma_grabbed[cma_grabbed_count].size = size;
>> + ++cma_grabbed_count;
>> + return 0;
>> +}
>> +

KAMEZAWA Hiroyuki <[email protected]> writes:
> Is it guaranteed that there are no memory holes or zone overlaps
> in the range? I think the correctness of the range must be checked.

I keep thinking about it myself. The idea is that the memory range gets
reserved using memblock (or some such), thus it should not contain any
memory holes. I'm not entirely sure about it spanning different zones.
I'll add the checking code.
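
A minimal sketch of what such a check might look like -- hypothetical, and it
would have to run once struct pages and zones are set up (e.g. from the
initcall that gives the ranges back to the allocator) rather than from the
early reservation path:

    static int cma_range_sane(unsigned long start, unsigned long size)
    {
            unsigned long pfn = start >> PAGE_SHIFT;
            unsigned long last = (start + size) >> PAGE_SHIFT;
            struct zone *zone = NULL;

            for (; pfn < last; ++pfn) {
                    if (!pfn_valid(pfn))
                            return 0;       /* memory hole in the range */
                    if (!zone)
                            zone = page_zone(pfn_to_page(pfn));
                    else if (page_zone(pfn_to_page(pfn)) != zone)
                            return 0;       /* range spans two zones */
            }
            return 1;
    }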

>> +#define MIGRATION_RETRY 5
>> +static int __cm_migrate(unsigned long start, unsigned long end)
>> +{
[...]
>> +}
>> +
>> +static int __cm_alloc(unsigned long start, unsigned long size)
>> +{
>> + unsigned long end, _start, _end;
>> + int ret;
>> +
[...]
>> +
>> + start = phys_to_pfn(start);
>> + end = start + (size >> PAGE_SHIFT);
>> +
>> + pr_debug("\tisolate range(%lx, %lx)\n",
>> + pfn_to_maxpage(start), pfn_to_maxpage_up(end));
>> + ret = __start_isolate_page_range(pfn_to_maxpage(start),
>> + pfn_to_maxpage_up(end), MIGRATE_CMA);
>> + if (ret)
>> + goto done;
>> +
>> + pr_debug("\tmigrate range(%lx, %lx)\n", start, end);
>> + ret = __cm_migrate(start, end);
>> + if (ret)
>> + goto done;
>> +
[...]
>> +
>> + pr_debug("\tfinding buddy\n");
>> + ret = 0;
>> + while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
>> + if (WARN_ON(++ret >= MAX_ORDER))
>> + return -EINVAL;
>> +
>> + _start = start & (~0UL << ret);
>> + pr_debug("\talloc freed(%lx, %lx)\n", _start, end);
>> + _end = alloc_contig_freed_pages(_start, end, 0);
>> +
>> + /* Free head and tail (if any) */
>> + pr_debug("\tfree contig(%lx, %lx)\n", _start, start);
>> + free_contig_pages(pfn_to_page(_start), start - _start);
>> + pr_debug("\tfree contig(%lx, %lx)\n", end, _end);
>> + free_contig_pages(pfn_to_page(end), _end - end);
>> +
>> + ret = 0;
>> +
>> +done:
>> + pr_debug("\tundo isolate range(%lx, %lx)\n",
>> + pfn_to_maxpage(start), pfn_to_maxpage_up(end));
>> + __undo_isolate_page_range(pfn_to_maxpage(start),
>> + pfn_to_maxpage_up(end), MIGRATE_CMA);
>> +
>> + pr_debug("ret = %d\n", ret);
>> + return ret;
>> +}
>> +
>> +static void __cm_free(unsigned long start, unsigned long size)
>> +{
>> + pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
>> +
>> + free_contig_pages(pfn_to_page(phys_to_pfn(start)),
>> + size >> PAGE_SHIFT);
>> +}

> Hmm, it seems __cm_alloc() and __cm_migrate() have no CMA-specific code.
> I'd like to reuse this for my own contig page allocator.
> So, could you make these functions more generic (in name)? Something like
>
> __alloc_range(start, size, migrate_type);
>
> Then, all I have to do is add "search range" functions.

Sure thing. I'll post it tomorrow or Friday. How about
alloc_contig_range() maybe?
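
A rough sketch of the generic entry point being proposed -- the name and
signature are only a guess at this point in the thread, nothing was posted
yet:

    /* hypothetical generic form of __cm_alloc() / __alloc_range() */
    int alloc_contig_range(unsigned long start, unsigned long end,
                           unsigned migratetype);

With something like that, __cm_alloc() would shrink to a thin wrapper passing
MIGRATE_CMA, and other callers could pass their own migrate type.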

--
Pozdrawiam _ _
.o. | Wasal Jasnie Oswieconej Pani Informatyki o' \,=./ `o
..o | Michal "mina86" Nazarewicz <mina86*tlen.pl> (o o)
ooo +---<jid:mina86-jabber.org>---<tlen:mina86>---ooO--(_)--Ooo--

2010-12-14 23:56:52

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCHv7 08/10] mm: cma: Contiguous Memory Allocator added

On Tue, 14 Dec 2010 11:23:15 +0100
Michal Nazarewicz <[email protected]> wrote:

> > Hmm, it seems __cm_alloc() and __cm_migrate() have no CMA-specific code.
> > I'd like to reuse this for my own contig page allocator.
> > So, could you make these functions more generic (in name)? Something like
> >
> > __alloc_range(start, size, migrate_type);
> >
> > Then, all I have to do is add "search range" functions.
>
> Sure thing. I'll post it tomorrow or Friday. How about
> alloc_contig_range() maybe?
>

That sounds great. Thank you.

-Kame