2010-11-19 15:58:54

by Michal Nazarewicz

Subject: [RFCv6 00/13] The Contiguous Memory Allocator framework

Hello everyone,

A few people asked about CMA at the LPC, so even though I have not
yet finished working on the new CMA, here it is so that all
interested parties can take a look and decide whether it can be used
for their use cases.

In particular, this version adds not-yet-complete support for memory
migration and the cma_pin()/cma_unpin() calls.


For those who have not yet stumbled across CMA, here is an excerpt
from the documentation:

The Contiguous Memory Allocator (CMA) is a framework which allows
setting up a machine-specific configuration for physically-contiguous
memory management. Memory for devices is then allocated according
to that configuration.

The main role of the framework is not to allocate memory, but to
parse and manage memory configurations, as well as to act as an
intermediary between device drivers and pluggable allocators. It is
thus not tied to any memory allocation method or strategy.

For more information, please refer to the fourth patch of the
patchset, which contains the documentation.
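
As a rough illustration of what such a configuration looks like, here
is a minimal sketch of a platform's reservation hook, modelled on the
board code in the last patch of this series (the region names, sizes
and the "s5p-mfc5" device name are only examples):

    #ifdef CONFIG_CMA
    static void __init example_reserve(void)
    {
            /* CMA_REGION(name, size, alignment[, start]) */
            static struct cma_region regions[] = {
                    CMA_REGION("fw", 1 << 20, 128 << 10, 0x32000000),
                    CMA_REGION("b1", 32 << 20, 0,        0x33000000),
                    { }     /* sentinel */
            };

            /* Device "s5p-mfc5", kind "f" gets "fw"; all else gets "b1". */
            static const char map[] __initconst = "s5p-mfc5/f=fw;*=b1";

            cma_set_defaults(regions, map);
            cma_early_regions_reserve(NULL);
    }
    #endif

The platform then points its machine's .reserve callback at such
a function, just as the board patches at the end of the series do.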


Links to the previous versions of the patch set:
v5: (intentionally left out as CMA v5 was identical to CMA v4)
v4: <http://article.gmane.org/gmane.linux.kernel.mm/52010/>
v3: <http://article.gmane.org/gmane.linux.kernel.mm/51573/>
v2: <http://article.gmane.org/gmane.linux.kernel.mm/50986/>
v1: <http://article.gmane.org/gmane.linux.kernel.mm/50669/>


Changelog:

v6: 1. Most importantly, v6 introduces support for memory migration.
The implementation is not yet complete, though.

Migration support means that when CMA is not using memory
reserved for it, the page allocator can allocate pages from it.
When CMA wants to use the memory, the pages have to be moved
and/or evicted so as to make room for CMA.

To make this possible, it must be guaranteed that only movable
and reclaimable pages are allocated in CMA-controlled regions.
This is done by introducing a MIGRATE_CMA migrate type that
guarantees exactly that.

Some of the migration code is "borrowed" from Kamezawa
Hiroyuki's alloc_contig_pages() implementation. The main
difference is that, thanks to the MIGRATE_CMA migrate type, CMA
assumes that memory it controls is always movable or
reclaimable, so it makes allocation decisions regardless of
whether some pages are actually allocated and migrates them
if needed.

The most interesting patches from the patchset that implement
the functionality are:

09/13: mm: alloc_contig_free_pages() added
10/13: mm: MIGRATE_CMA migration type added
11/13: mm: MIGRATE_CMA isolation functions added
12/13: mm: cma: Migration support added [wip]

Currently, the kernel panics in some situations, which I am still
investigating.

2. cma_pin() and cma_unpin() functions have been added (after
a conversation with Johan Mossberg). The idea is that whenever
the hardware is not using the memory (no transaction is in
progress) the chunk can be moved around. This would allow
defragmentation to be implemented if desired. No defragmentation
algorithm is provided at this time.

3. Sysfs support has been replaced with debugfs. I have always felt
unsure about the sysfs interface, and when Greg KH pointed it
out I finally got around to rewriting it to use debugfs.


v5: (intentionally left out as CMA v5 was identical to CMA v4)


v4: 1. The "asterisk" flag has been removed in favour of requiring
that platform will provide a "*=<regions>" rule in the map
attribute.

2. The terminology has been changed slightly, renaming "kind" of
memory to "type" of memory. In previous revisions, the
documentation referred to device drivers defining memory kinds;
these are now called memory types.

v3: 1. The command line parameters have been removed (and moved to
a separate patch, the fourth one). As a consequence, the
cma_set_defaults() function has been changed -- it no longer
accepts a string with a list of regions but an array of regions.

2. The "asterisk" attribute has been removed. Now, each region
has an "asterisk" flag which lets one specify whether this
region should by considered "asterisk" region.

3. SysFS support has been moved to a separate patch (the third one
in the series) and now also includes a list of regions.

v2: 1. The "cma_map" command line have been removed. In exchange,
a SysFS entry has been created under kernel/mm/contiguous.

The intended way of specifying the attributes is via the
cma_set_defaults() function called by platform initialisation
code. The "regions" attribute (the string specified by the "cma"
command line parameter) can be overridden with a command line
parameter; the other attributes can be changed at run time
using the SysFS entries.

2. The behaviour of the "map" attribute has been modified
slightly. Currently, if no rule matches a given device, it is
assigned the regions specified by the "asterisk" attribute,
which is by default built from the region names given in the
"regions" attribute.

3. Devices can register private regions as well as regions that
can be shared but are not reserved using standard CMA
mechanisms. A private region has no name and can be accessed
only by devices that have the pointer to it.

4. The way allocators are registered has changed. Currently,
a cma_allocator_register() function is used for that purpose.
Moreover, allocators are attached to regions the first time
memory is allocated from the region or when the allocator is
registered, which means that allocators can be dynamic modules
loaded after the kernel has booted (of course, it won't be
possible to allocate a chunk of memory from a region if its
allocator is not loaded).

5. Index of new functions:

+static inline dma_addr_t __must_check
+cma_alloc_from(const char *regions, size_t size,
+ dma_addr_t alignment)

+static inline int
+cma_info_about(struct cma_info *info, const char *regions)

+int __must_check cma_region_register(struct cma_region *reg);

+dma_addr_t __must_check
+cma_alloc_from_region(struct cma_region *reg,
+ size_t size, dma_addr_t alignment);

+int cma_allocator_register(struct cma_allocator *alloc);
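
To give a feel for how these calls (together with the cma_pin() and
cma_unpin() added in v6) are meant to be used from a driver, here is
a minimal sketch based on the cma-dev.c test driver further down in
the series; the region name "b1" is only an example and error
handling is reduced to the bare minimum. Note that in v6 the
allocation calls return a chunk handle rather than a bare address:

    const struct cma *chunk;
    dma_addr_t phys;

    /* Allocate 1 MiB from region "b1", with no particular alignment. */
    chunk = cma_alloc_from("b1", 1 << 20, 0);
    if (IS_ERR(chunk))
            return PTR_ERR(chunk);

    /* Pin the chunk while the hardware uses it; it cannot move now. */
    phys = cma_pin(chunk);

    /* ... program the device with 'phys' and run the transfer ... */

    /* Once unpinned, the chunk may be moved (e.g. for defragmentation). */
    cma_unpin(chunk);
    cma_free(chunk);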


The whole patch set includes the following patches:

lib: rbtree: rb_root_init() function added
lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()
lib: genalloc: Generic allocator improvements

The above three are not really related to CMA as such; they
only modify various library routines which are then used by CMA.

mm: cma: Contiguous Memory Allocator added

This is the main file implementing CMA. No migration support
here. Half of the patch is documentation and a header file with
kernel-doc, so it may be worth reading if you're interested.

mm: cma: debugfs support added

This adds debugfs support to CMA. This patch is not really
important so you can safely skip it if you are in a hurry. ;)

mm: cma: Best-fit algorithm added

This adds a best-fit allocator. Again, this patch is not that
important even though it shows how a custom allocator can be
implemented and added to the CMA framework.

mm: cma: Test device and application added

A simple "testing" device and application. This lets allocate
chunks form user space as to test basic functionality. Once
again, you may safely ignore this patch.

mm: move some functions to page_isolation.c

This is Kamezawa Hiroyuki's patch. It moves some migration
related code from mm/memory_hotplug.c to mm/page_isolation.c so
that it can be used even if memory hotplug is not enabled.

mm: alloc_contig_free_pages() added

This is taken from KAMEZAWA Hiroyuki's patch. It implements the
alloc_contig_free_pages() and free_contig_pages() functions. The
first allocates a range of pages and the second frees them. The
pages that are allocated must be in the buddy system.

mm: MIGRATE_CMA migration type added

This patch adds a new migration type: MIGRATE_CMA. Its
characteristic is that only movable and reclaimable pages can be
allocated from a MIGRATE_CMA-marked pageblock, and once
a pageblock's migrate type is set to MIGRATE_CMA it is never
changed by the page allocator to anything else.

mm: MIGRATE_CMA isolation functions added

This changes several functions that change a pageblock's migrate
type to MIGRATE_MOVABLE to take an argument specifying which
type to change the pageblock's migrate type to. This is then
used with MIGRATE_CMA pageblocks.

mm: cma: Migration support added [wip]

This adds support for migrating pages from CMA managed regions.
This means that when CMA is not using part of a region, that
part is given to the page allocator to use.

ARM: cma: Added CMA to Aquila, Goni and c210 universal boards

This commit adds support for CMA to three ARM boards.

Documentation/00-INDEX | 2 +
Documentation/contiguous-memory.txt | 577 +++++++++
arch/arm/mach-s5pv210/mach-aquila.c | 26 +
arch/arm/mach-s5pv210/mach-goni.c | 26 +
arch/arm/mach-s5pv310/mach-universal_c210.c | 17 +
drivers/misc/Kconfig | 8 +
drivers/misc/Makefile | 1 +
drivers/misc/cma-dev.c | 263 +++++
include/linux/bitmap.h | 24 +-
include/linux/cma.h | 569 +++++++++
include/linux/genalloc.h | 46 +-
include/linux/mmzone.h | 30 +-
include/linux/page-isolation.h | 47 +-
include/linux/rbtree.h | 11 +
lib/bitmap.c | 22 +-
lib/genalloc.c | 182 ++--
mm/Kconfig | 98 ++
mm/Makefile | 2 +
mm/cma-best-fit.c | 382 ++++++
mm/cma.c | 1671 +++++++++++++++++++++++++++
mm/compaction.c | 10 +
mm/internal.h | 3 +
mm/memory_hotplug.c | 108 --
mm/page_alloc.c | 131 ++-
mm/page_isolation.c | 126 ++-
tools/cma/cma-test.c | 459 ++++++++
26 files changed, 4575 insertions(+), 266 deletions(-)
create mode 100644 Documentation/contiguous-memory.txt
create mode 100644 drivers/misc/cma-dev.c
create mode 100644 include/linux/cma.h
create mode 100644 mm/cma-best-fit.c
create mode 100644 mm/cma.c
create mode 100644 tools/cma/cma-test.c

--
1.7.2.3


2010-11-19 15:58:35

by Michal Nazarewicz

Subject: [RFCv6 13/13] ARM: cma: Added CMA to Aquila, Goni and c210 universal boards

This commit adds CMA memory reservation code to Aquila, Goni and c210
universal boards.

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
arch/arm/mach-s5pv210/mach-aquila.c | 26 ++++++++++++++++++++++++++
arch/arm/mach-s5pv210/mach-goni.c | 26 ++++++++++++++++++++++++++
arch/arm/mach-s5pv310/mach-universal_c210.c | 17 +++++++++++++++++
3 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mach-s5pv210/mach-aquila.c b/arch/arm/mach-s5pv210/mach-aquila.c
index 28677ca..f1feb73 100644
--- a/arch/arm/mach-s5pv210/mach-aquila.c
+++ b/arch/arm/mach-s5pv210/mach-aquila.c
@@ -21,6 +21,7 @@
#include <linux/gpio_keys.h>
#include <linux/input.h>
#include <linux/gpio.h>
+#include <linux/cma.h>

#include <asm/mach/arch.h>
#include <asm/mach/map.h>
@@ -650,6 +651,30 @@ static void __init aquila_sound_init(void)
__raw_writel(__raw_readl(S5P_OTHERS) | (0x3 << 8), S5P_OTHERS);
}

+#ifdef CONFIG_CMA
+
+static void __init aquila_reserve(void)
+{
+ static struct cma_region regions[] = {
+ CMA_REGION("fw", 1 << 20, 128 << 10, 0x32000000),
+ CMA_REGION("b1", 32 << 20, 0, 0x33000000),
+ CMA_REGION("b2", 16 << 20, 0, 0x44000000),
+ { }
+ };
+
+ static const char map[] __initconst =
+ "s5p-mfc5/f=fw;s5p-mfc5/a=b1;s5p-mfc5/b=b2;*=b1,b2";
+
+ cma_set_defaults(regions, map);
+ cma_early_regions_reserve(NULL);
+}
+
+#else
+
+#define aquila_reserve NULL
+
+#endif
+
static void __init aquila_map_io(void)
{
s5p_init_io(NULL, 0, S5P_VA_CHIPID);
@@ -690,4 +715,5 @@ MACHINE_START(AQUILA, "Aquila")
.map_io = aquila_map_io,
.init_machine = aquila_machine_init,
.timer = &s3c24xx_timer,
+ .reserve = aquila_reserve,
MACHINE_END
diff --git a/arch/arm/mach-s5pv210/mach-goni.c b/arch/arm/mach-s5pv210/mach-goni.c
index b1dcf96..0bda14f 100644
--- a/arch/arm/mach-s5pv210/mach-goni.c
+++ b/arch/arm/mach-s5pv210/mach-goni.c
@@ -25,6 +25,7 @@
#include <linux/gpio_keys.h>
#include <linux/input.h>
#include <linux/gpio.h>
+#include <linux/cma.h>

#include <asm/mach/arch.h>
#include <asm/mach/map.h>
@@ -809,6 +810,30 @@ static void __init goni_sound_init(void)
__raw_writel(__raw_readl(S5P_OTHERS) | (0x3 << 8), S5P_OTHERS);
}

+#ifdef CONFIG_CMA
+
+static void __init goni_reserve(void)
+{
+ static struct cma_region regions[] = {
+ CMA_REGION("fw", 1 << 20, 128 << 10, 0x32000000),
+ CMA_REGION("b1", 32 << 20, 0, 0x33000000),
+ CMA_REGION("b2", 16 << 20, 0, 0x44000000),
+ { }
+ };
+
+ static const char map[] __initconst =
+ "s5p-mfc5/f=fw;s5p-mfc5/a=b1;s5p-mfc5/b=b2;*=b1,b2";
+
+ cma_set_defaults(regions, map);
+ cma_early_regions_reserve(NULL);
+}
+
+#else
+
+#define goni_reserve NULL
+
+#endif
+
static void __init goni_map_io(void)
{
s5p_init_io(NULL, 0, S5P_VA_CHIPID);
@@ -865,4 +890,5 @@ MACHINE_START(GONI, "GONI")
.map_io = goni_map_io,
.init_machine = goni_machine_init,
.timer = &s3c24xx_timer,
+ .reserve = goni_reserve,
MACHINE_END
diff --git a/arch/arm/mach-s5pv310/mach-universal_c210.c b/arch/arm/mach-s5pv310/mach-universal_c210.c
index 16d8fc0..90a2296 100644
--- a/arch/arm/mach-s5pv310/mach-universal_c210.c
+++ b/arch/arm/mach-s5pv310/mach-universal_c210.c
@@ -13,6 +13,7 @@
#include <linux/i2c.h>
#include <linux/gpio_keys.h>
#include <linux/gpio.h>
+#include <linux/cma.h>

#include <asm/mach/arch.h>
#include <asm/mach-types.h>
@@ -138,6 +139,21 @@ static void __init universal_map_io(void)
s3c24xx_init_uarts(universal_uartcfgs, ARRAY_SIZE(universal_uartcfgs));
}

+static void __init universal_reserve(void)
+{
+ static struct cma_region regions[] = {
+ CMA_REGION("r" , 64 << 20, 0, 0),
+ CMA_REGION("fw", 1 << 20, 128 << 10),
+ { }
+ };
+
+ static const char map[] __initconst =
+ "s3c-mfc5/f=fw;*=r";
+
+ cma_set_defaults(regions, map);
+ cma_early_regions_reserve(NULL);
+}
+
static void __init universal_machine_init(void)
{
i2c_register_board_info(0, i2c0_devs, ARRAY_SIZE(i2c0_devs));
@@ -152,6 +168,7 @@ MACHINE_START(UNIVERSAL_C210, "UNIVERSAL_C210")
.boot_params = S5P_PA_SDRAM + 0x100,
.init_irq = s5pv310_init_irq,
.map_io = universal_map_io,
+ .reserve = universal_reserve,
.init_machine = universal_machine_init,
.timer = &s5pv310_timer,
MACHINE_END
--
1.7.2.3

2010-11-19 15:59:13

by Michal Nazarewicz

Subject: [RFCv6 10/13] mm: MIGRATE_CMA migration type added

The MIGRATE_CMA migration type has two main characteristics:
(i) only movable and reclaimable pages can be allocated from
MIGRATE_CMA page blocks and (ii) the page allocator will never
change the migration type of MIGRATE_CMA page blocks.

This guarantees that a page in a MIGRATE_CMA page block can
always be freed (by reclaiming it or moving it somewhere else).

It is designed to be used with the Contiguous Memory Allocator
(CMA) for allocating big chunks (e.g. 10 MiB) of physically
contiguous memory. Once a driver requests contiguous memory,
CMA migrates or reclaims pages from MIGRATE_CMA page blocks.

To minimise the number of migrations, the MIGRATE_CMA migration
type is the last type tried when the page allocator falls back
to migration types other than the one requested.

To use this new migration type, one can use the
__free_pageblock_cma() function, which frees a whole page block
to the buddy allocator, marking it (and thus all pages in
it) as MIGRATE_CMA.
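
For illustration only, the expected caller (CMA's reservation code,
which is not part of this patch) would hand a reserved,
pageblock-aligned range over to the page allocator roughly like
this; start_pfn and end_pfn are hypothetical:

    unsigned long pfn = start_pfn;

    while (pfn < end_pfn) {
            /* Marks the whole block MIGRATE_CMA and returns it
             * to the buddy allocator. */
            __free_pageblock_cma(pfn_to_page(pfn));
            pfn += pageblock_nr_pages;
    }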

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/mmzone.h | 30 +++++++++++++----
mm/Kconfig | 9 +++++
mm/compaction.c | 10 ++++++
mm/internal.h | 3 ++
mm/page_alloc.c | 83 +++++++++++++++++++++++++++++++++++++++---------
5 files changed, 113 insertions(+), 22 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 39c24eb..317da6b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,24 @@
*/
#define PAGE_ALLOC_COSTLY_ORDER 3

-#define MIGRATE_UNMOVABLE 0
-#define MIGRATE_RECLAIMABLE 1
-#define MIGRATE_MOVABLE 2
-#define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE 3
-#define MIGRATE_ISOLATE 4 /* can't allocate from here */
-#define MIGRATE_TYPES 5
+enum {
+ MIGRATE_UNMOVABLE,
+ MIGRATE_RECLAIMABLE,
+ MIGRATE_MOVABLE,
+ MIGRATE_PCPTYPES, /* the number of types on the pcp lists */
+ MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+ MIGRATE_ISOLATE, /* can't allocate from here */
+#ifdef CONFIG_MIGRATE_CMA
+ MIGRATE_CMA, /* only movable & reclaimable */
+#endif
+ MIGRATE_TYPES
+};
+
+#ifdef CONFIG_MIGRATE_CMA
+# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+# define is_migrate_cma(migratetype) false
+#endif

#define for_each_migratetype_order(order, type) \
for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +65,11 @@ static inline int get_pageblock_migratetype(struct page *page)
return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
}

+static inline bool is_pageblock_cma(struct page *page)
+{
+ return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
struct free_area {
struct list_head free_list[MIGRATE_TYPES];
unsigned long nr_free;
diff --git a/mm/Kconfig b/mm/Kconfig
index 5ad2471..4aee3c5 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1,3 +1,12 @@
+config MIGRATE_CMA
+ bool
+ help
+ This option should be selected by code that requires MIGRATE_CMA
+ migration type to be present. Once a page block has this
+ migration type, only movable and reclaimable pages can be
+ allocated from it and the page block never changes it's
+ migration type.
+
config SELECT_MEMORY_MODEL
def_bool y
depends on EXPERIMENTAL || ARCH_SELECT_MEMORY_MODEL
diff --git a/mm/compaction.c b/mm/compaction.c
index 4d709ee..c5e404b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -113,6 +113,16 @@ static bool suitable_migration_target(struct page *page)
if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
return false;

+ /* Keep MIGRATE_CMA alone as well. */
+ /*
+ * XXX Revisit. We currently cannot let compaction touch CMA
+ * pages since compaction insists on changing their migration
+ * type to MIGRATE_MOVABLE (see split_free_page() called from
+ * isolate_freepages_block() above).
+ */
+ if (is_migrate_cma(migratetype))
+ return false;
+
/* If the page is a large free page, then allow migration */
if (PageBuddy(page) && page_order(page) >= pageblock_order)
return true;
diff --git a/mm/internal.h b/mm/internal.h
index dedb0af..cc24e74 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -49,6 +49,9 @@ extern void putback_lru_page(struct page *page);
* in mm/page_alloc.c
*/
extern void __free_pages_bootmem(struct page *page, unsigned int order);
+#ifdef CONFIG_MIGRATE_CMA
+extern void __free_pageblock_cma(struct page *page);
+#endif
extern void prep_compound_page(struct page *page, unsigned long order);
#ifdef CONFIG_MEMORY_FAILURE
extern bool is_free_buddy_page(struct page *page);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6dd2854..91daf22 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -712,6 +712,30 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
}
}

+#ifdef CONFIG_MIGRATE_CMA
+
+/*
+ * Free whole pageblock and set it's migration type to MIGRATE_CMA.
+ */
+void __init __free_pageblock_cma(struct page *page)
+{
+ struct page *p = page;
+ unsigned i = pageblock_nr_pages;
+
+ prefetchw(p);
+ do {
+ if (--i)
+ prefetchw(p + 1);
+ __ClearPageReserved(p);
+ set_page_count(p, 0);
+ } while (++p, i);
+
+ set_page_refcounted(page);
+ set_pageblock_migratetype(page, MIGRATE_CMA);
+ __free_pages(page, pageblock_order);
+}
+
+#endif

/*
* The order of subdivision here is critical for the IO subsystem.
@@ -819,11 +843,16 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
* This array describes the order lists are fallen back to when
* the free lists for the desirable migrate type are depleted
*/
-static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
+static int fallbacks[MIGRATE_TYPES][4] = {
[MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
+#ifdef CONFIG_MIGRATE_CMA
+ [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_CMA , MIGRATE_RESERVE },
+ [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA , MIGRATE_RESERVE },
+#else
[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
[MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
- [MIGRATE_RESERVE] = { MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE }, /* Never used */
+#endif
+ [MIGRATE_RESERVE] = { MIGRATE_RESERVE }, /* Never used */
};

/*
@@ -919,12 +948,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
/* Find the largest possible block of pages in the other list */
for (current_order = MAX_ORDER-1; current_order >= order;
--current_order) {
- for (i = 0; i < MIGRATE_TYPES - 1; i++) {
+ for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
migratetype = fallbacks[start_migratetype][i];

/* MIGRATE_RESERVE handled later if necessary */
if (migratetype == MIGRATE_RESERVE)
- continue;
+ break;

area = &(zone->free_area[current_order]);
if (list_empty(&area->free_list[migratetype]))
@@ -941,17 +970,28 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
* agressive about taking ownership of free pages
*/
if (unlikely(current_order >= (pageblock_order >> 1)) ||
- start_migratetype == MIGRATE_RECLAIMABLE ||
- page_group_by_mobility_disabled) {
- unsigned long pages;
+ start_migratetype == MIGRATE_RECLAIMABLE ||
+ page_group_by_mobility_disabled) {
+ int pages;
pages = move_freepages_block(zone, page,
- start_migratetype);
+ start_migratetype);

- /* Claim the whole block if over half of it is free */
- if (pages >= (1 << (pageblock_order-1)) ||
- page_group_by_mobility_disabled)
+ /*
+ * Claim the whole block if over half
+ * of it is free
+ *
+ * On the other hand, never change
+ * migration type of MIGRATE_CMA
+ * pageblockss. We don't want
+ * unmovable or unreclaimable pages to
+ * be allocated from MIGRATE_CMA
+ * areas.
+ */
+ if (!is_pageblock_cma(page) &&
+ (pages >= (1 << (pageblock_order-1)) ||
+ page_group_by_mobility_disabled))
set_pageblock_migratetype(page,
- start_migratetype);
+ start_migratetype);

migratetype = start_migratetype;
}
@@ -961,7 +1001,8 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
rmv_page_order(page);

/* Take ownership for orders >= pageblock_order */
- if (current_order >= pageblock_order)
+ if (current_order >= pageblock_order &&
+ !is_pageblock_cma(page))
change_pageblock_range(page, current_order,
start_migratetype);

@@ -1176,9 +1217,16 @@ void free_hot_cold_page(struct page *page, int cold)
* offlined but treat RESERVE as movable pages so we can get those
* areas back if necessary. Otherwise, we may have to free
* excessively into the page allocator
+ *
+ * Still, do not change migration type of MIGRATE_CMA pages (if
+ * they'd be recorded as MIGRATE_MOVABLE an unmovable page could
+ * be allocated from MIGRATE_CMA block and we don't want to allow
+ * that). In this respect, treat MIGRATE_CMA like
+ * MIGRATE_ISOLATE.
*/
if (migratetype >= MIGRATE_PCPTYPES) {
- if (unlikely(migratetype == MIGRATE_ISOLATE)) {
+ if (unlikely(migratetype == MIGRATE_ISOLATE
+ || is_migrate_cma(migratetype))) {
free_one_page(zone, page, 0, migratetype);
goto out;
}
@@ -1267,7 +1315,8 @@ int split_free_page(struct page *page)
if (order >= pageblock_order - 1) {
struct page *endpage = page + (1 << order) - 1;
for (; page < endpage; page += pageblock_nr_pages)
- set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+ if (!is_pageblock_cma(page))
+ set_pageblock_migratetype(page, MIGRATE_MOVABLE);
}

return 1 << order;
@@ -5365,6 +5414,10 @@ int set_migratetype_isolate(struct page *page)
zone_idx = zone_idx(zone);

spin_lock_irqsave(&zone->lock, flags);
+ if (is_pageblock_cma(page)) {
+ ret = 0;
+ goto out;
+ }

pfn = page_to_pfn(page);
arg.start_pfn = pfn;
--
1.7.2.3

2010-11-19 15:58:37

by Michal Nazarewicz

Subject: [RFCv6 07/13] mm: cma: Test device and application added

This patch adds a "cma" misc device which lets user space use the
CMA API. This device is meant for testing. A testing application
is also provided.
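
For reference, a minimal sketch of how a user-space program can talk
to the device through the IOCTL_CMA_ALLOC interface added below (the
region name and size are only examples; the full test application is
part of this patch):

    #include <sys/ioctl.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <linux/cma.h>

    int main(void)
    {
            struct cma_alloc_request req = {
                    .magic = CMA_MAGIC,
                    .type  = CMA_REQ_FROM_REG,
                    .size  = 1 << 20,       /* 1 MiB */
                    .spec  = "b1",          /* example region name */
            };
            int fd = open("/dev/cma", O_RDWR);

            if (fd < 0 || ioctl(fd, IOCTL_CMA_ALLOC, &req) < 0) {
                    perror("cma");
                    return 1;
            }
            printf("allocated %llu B at %#llx\n",
                   (unsigned long long)req.size,
                   (unsigned long long)req.start);
            /* The chunk is freed when the descriptor is closed. */
            return 0;
    }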

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
drivers/misc/Kconfig | 8 +
drivers/misc/Makefile | 1 +
drivers/misc/cma-dev.c | 263 +++++++++++++++++++++++++++
include/linux/cma.h | 40 +++++
tools/cma/cma-test.c | 459 ++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 771 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/cma-dev.c
create mode 100644 tools/cma/cma-test.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 1e1a4be..519a291 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -458,4 +458,12 @@ source "drivers/misc/cb710/Kconfig"
source "drivers/misc/iwmc3200top/Kconfig"
source "drivers/misc/ti-st/Kconfig"

+config CMA_DEVICE
+ tristate "CMA misc device (DEVELOPEMENT)"
+ depends on CMA_DEVELOPEMENT
+ help
+ The CMA misc device allows allocating contiguous memory areas
+ from user space. This is mostly for testing of the CMA
+ framework.
+
endif # MISC_DEVICES
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 98009cc..f8eadd4 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -42,3 +42,4 @@ obj-$(CONFIG_ARM_CHARLCD) += arm-charlcd.o
obj-$(CONFIG_PCH_PHUB) += pch_phub.o
obj-y += ti-st/
obj-$(CONFIG_AB8500_PWM) += ab8500-pwm.o
+obj-$(CONFIG_CMA_DEVICE) += cma-dev.o
diff --git a/drivers/misc/cma-dev.c b/drivers/misc/cma-dev.c
new file mode 100644
index 0000000..dce418a
--- /dev/null
+++ b/drivers/misc/cma-dev.c
@@ -0,0 +1,263 @@
+/*
+ * Contiguous Memory Allocator userspace driver
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+# define DEBUG
+#endif
+
+#include <linux/errno.h> /* Error numbers */
+#include <linux/err.h> /* IS_ERR_VALUE() */
+#include <linux/fs.h> /* struct file */
+#include <linux/mm.h> /* Memory stuff */
+#include <linux/mman.h>
+#include <linux/slab.h>
+#include <linux/module.h> /* Standard module stuff */
+#include <linux/device.h> /* struct device, dev_dbg() */
+#include <linux/types.h> /* Just to be safe ;) */
+#include <linux/uaccess.h> /* __copy_{to,from}_user */
+#include <linux/miscdevice.h> /* misc_register() and company */
+
+#include <linux/cma.h>
+
+static int cma_file_open(struct inode *inode, struct file *file);
+static int cma_file_release(struct inode *inode, struct file *file);
+static long cma_file_ioctl(struct file *file, unsigned cmd, unsigned long arg);
+static int cma_file_mmap(struct file *file, struct vm_area_struct *vma);
+
+static struct miscdevice cma_miscdev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "cma",
+ .fops = &(const struct file_operations) {
+ .owner = THIS_MODULE,
+ .open = cma_file_open,
+ .release = cma_file_release,
+ .unlocked_ioctl = cma_file_ioctl,
+ .mmap = cma_file_mmap,
+ },
+};
+#define cma_dev (cma_miscdev.this_device)
+
+static int cma_file_open(struct inode *inode, struct file *file)
+{
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ file->private_data = NULL;
+
+ return 0;
+}
+
+static int cma_file_release(struct inode *inode, struct file *file)
+{
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ if (file->private_data) {
+ cma_unpin(file->private_data);
+ cma_free(file->private_data);
+ }
+
+ return 0;
+}
+
+static long cma_file_ioctl_req(struct file *file, unsigned long arg)
+{
+ struct cma_alloc_request req;
+ const struct cma *chunk;
+
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ if (!arg)
+ return -EINVAL;
+
+ if (file->private_data) /* Already allocated */
+ return -EBADFD;
+
+ if (copy_from_user(&req, (void *)arg, sizeof req))
+ return -EFAULT;
+
+ if (req.magic != CMA_MAGIC)
+ return -ENOTTY;
+
+ if (req.type != CMA_REQ_DEV_KIND && req.type != CMA_REQ_FROM_REG)
+ return -EINVAL;
+
+ /* May happen on 32 bit system. */
+ if (req.size > ~(typeof(req.size))0 ||
+ req.alignment > ~(typeof(req.alignment))0)
+ return -EINVAL;
+
+ if (strnlen(req.spec, sizeof req.spec) >= sizeof req.spec)
+ return -EINVAL;
+
+ if (req.type == CMA_REQ_DEV_KIND) {
+ struct device fake_device;
+ char *kind;
+
+ fake_device.init_name = req.spec;
+ fake_device.kobj.name = req.spec;
+
+ kind = strrchr(req.spec, '/');
+ if (kind)
+ *kind++ = '\0';
+
+ chunk = cma_alloc(&fake_device, kind, req.size, req.alignment);
+ } else {
+ chunk = cma_alloc_from(req.spec, req.size, req.alignment);
+ }
+
+ if (IS_ERR(chunk))
+ return PTR_ERR(chunk);
+
+ req.start = cma_pin(chunk);
+ if (put_user(req.start,
+ (typeof(req.start) *)
+ (arg + offsetof(typeof(req), start)))) {
+ cma_free(chunk);
+ return -EFAULT;
+ }
+
+ file->private_data = (void *)chunk;
+
+ dev_dbg(cma_dev, "allocated %p@%p\n",
+ (void *)(unsigned long)req.size,
+ (void *)(unsigned long)req.start);
+
+ return 0;
+}
+
+#ifdef DEBUG
+
+static long __cma_pattern_failed(unsigned long *_it, unsigned long *it,
+ unsigned long *end, unsigned long v)
+{
+ dev_dbg(cma_dev, "at %p + %x got %lx, expected %lx\n",
+ (void *)_it, (it - _it) * sizeof *it, *it, v);
+ print_hex_dump(KERN_DEBUG, "cma: ", DUMP_PREFIX_ADDRESS,
+ 16, sizeof *it, it,
+ min_t(size_t, 128, (end - it) * sizeof *it), 0);
+ return (it - _it) * sizeof *it;
+}
+
+#else
+
+static long __cma_pattern_failed(unsigned long *_it, unsigned long *it,
+ unsigned long *end, unsigned long v)
+{
+ return (it - _it) * sizeof *it;
+}
+
+#endif
+
+static long cma_file_ioctl_pattern(struct file *file, unsigned long arg)
+{
+ const struct cma *chunk;
+ unsigned long *_it, *it, *end, v;
+
+ dev_dbg(cma_dev, "%s(%p, %s)\n", __func__, (void *)file,
+ arg ? "fill" : "verify");
+
+ if (!file->private_data)
+ return -EBADFD;
+
+ chunk = file->private_data;
+ _it = phys_to_virt(cma_phys(chunk));
+ end = _it + cma_size(chunk);
+
+ if (arg)
+ for (v = 0, it = _it; it != end; ++v, ++it)
+ *it = v;
+
+ for (v = 0, it = _it; it != end; ++v, ++it)
+ if (*it != v)
+ return __cma_pattern_failed(_it, it, end, v);
+
+ return cma_size(chunk);
+}
+
+static long cma_file_ioctl_dump(struct file *file, unsigned long len)
+{
+ const struct cma *chunk;
+ unsigned long *it;
+
+ dev_dbg(cma_dev, "%s(%p, %p)\n", __func__, (void *)file, (void *)len);
+
+ if (!file->private_data)
+ return -EBADFD;
+
+ chunk = file->private_data;
+ it = phys_to_virt(cma_phys(chunk));
+ len = min(len & ~(sizeof *it - 1), (unsigned long)cma_size(chunk));
+ print_hex_dump(KERN_DEBUG, "cma: ", DUMP_PREFIX_ADDRESS,
+ 16, sizeof *it, it, len, 0);
+
+ return 0;
+}
+
+static long cma_file_ioctl(struct file *file, unsigned cmd, unsigned long arg)
+{
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ switch (cmd) {
+ case IOCTL_CMA_ALLOC:
+ return cma_file_ioctl_req(file, arg);
+
+ case IOCTL_CMA_PATTERN:
+ return cma_file_ioctl_pattern(file, arg);
+
+ case IOCTL_CMA_DUMP:
+ return cma_file_ioctl_dump(file, arg);
+
+ default:
+ /* Dead code */
+ return -ENOTTY;
+ }
+}
+
+static int cma_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ unsigned long pgoff, offset, length;
+ const struct cma *chunk;
+
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ if (!file->private_data)
+ return -EBADFD;
+
+ pgoff = vma->vm_pgoff;
+ offset = pgoff << PAGE_SHIFT;
+ length = vma->vm_end - vma->vm_start;
+
+ chunk = file->private_data;
+ if (offset >= cma_size(chunk)
+ || length > cma_size(chunk)
+ || offset + length > cma_size(chunk))
+ return -ENOSPC;
+
+ return remap_pfn_range(vma, vma->vm_start,
+ __phys_to_pfn(cma_phys(chunk)) + pgoff,
+ length, vma->vm_page_prot);
+}
+
+static int __init cma_dev_init(void)
+{
+ int ret = misc_register(&cma_miscdev);
+ pr_debug("miscdev: register returned: %d\n", ret);
+ return ret;
+}
+module_init(cma_dev_init);
+
+static void __exit cma_dev_exit(void)
+{
+ dev_dbg(cma_dev, "deregisterring\n");
+ misc_deregister(&cma_miscdev);
+}
+module_exit(cma_dev_exit);
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 8437104..56ed021 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -11,6 +11,46 @@
* See Documentation/contiguous-memory.txt for details.
*/

+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+
+#define CMA_MAGIC (('c' << 24) | ('M' << 16) | ('a' << 8) | 0x42)
+
+enum {
+ CMA_REQ_DEV_KIND,
+ CMA_REQ_FROM_REG
+};
+
+/**
+ * An information about area exportable to user space.
+ * @magic: must always be CMA_MAGIC.
+ * @type: type of the request.
+ * @spec: either "dev/kind\0" or "regions\0" depending on @type.
+ * In any case, the string must be NUL terminated.
+ * additionally, in the latter case scanning stops at
+ * semicolon (';').
+ * @size: size of the chunk to allocate.
+ * @alignment: desired alignment of the chunk (must be power of two or zero).
+ * @start: when ioctl() finishes this stores physical address of the chunk.
+ */
+struct cma_alloc_request {
+ __u32 magic;
+ __u32 type;
+
+ /* __u64 to be compatible accross 32 and 64 bit systems. */
+ __u64 size;
+ __u64 alignment;
+ __u64 start;
+
+ char spec[32];
+};
+
+#define IOCTL_CMA_ALLOC _IOWR('p', 0, struct cma_alloc_request)
+#define IOCTL_CMA_PATTERN _IO('p', 1)
+#define IOCTL_CMA_DUMP _IO('p', 2)
+
+
/***************************** Kernel level API *****************************/

#if defined __KERNEL__ && defined CONFIG_CMA
diff --git a/tools/cma/cma-test.c b/tools/cma/cma-test.c
new file mode 100644
index 0000000..6de155f
--- /dev/null
+++ b/tools/cma/cma-test.c
@@ -0,0 +1,459 @@
+/*
+ * cma-test.c -- CMA testing application
+ *
+ * Copyright (C) 2010 Samsung Electronics
+ * Author: Michal Nazarewicz <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+/* $(CROSS_COMPILE)gcc -Wall -Wextra -g -o cma-test cma-test.c */
+
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/mman.h>
+
+#include <fcntl.h>
+#include <unistd.h>
+
+#include <ctype.h>
+#include <errno.h>
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <linux/cma.h>
+
+
+/****************************** Chunks management ******************************/
+
+struct chunk {
+ struct chunk *next, *prev;
+ int fd;
+ unsigned long size;
+ unsigned long start;
+};
+
+static struct chunk root = {
+ .next = &root,
+ .prev = &root,
+};
+
+#define for_each(a) for (a = root.next; a != &root; a = a->next)
+
+static struct chunk *chunk_create(const char *prefix)
+{
+ struct chunk *chunk;
+ int fd;
+
+ chunk = malloc(sizeof *chunk);
+ if (!chunk) {
+ fprintf(stderr, "%s: %s\n", prefix, strerror(errno));
+ return NULL;
+ }
+
+ fd = open("/dev/cma", O_RDWR);
+ if (fd < 0) {
+ fprintf(stderr, "%s: /dev/cma: %s\n", prefix, strerror(errno));
+ return NULL;
+ }
+
+ chunk->prev = chunk;
+ chunk->next = chunk;
+ chunk->fd = fd;
+ return chunk;
+}
+
+static void chunk_destroy(struct chunk *chunk)
+{
+ chunk->prev->next = chunk->next;
+ chunk->next->prev = chunk->prev;
+ close(chunk->fd);
+}
+
+static void chunk_add(struct chunk *chunk)
+{
+ chunk->next = &root;
+ chunk->prev = root.prev;
+ root.prev->next = chunk;
+ root.prev = chunk;
+}
+
+
+/****************************** Commands ******************************/
+
+/* Parsing helpers */
+#define SKIP_SPACE(ch) do { while (isspace(*(ch))) ++(ch); } while (0)
+
+static int memparse(char *ptr, char **retptr, unsigned long *ret)
+{
+ unsigned long val;
+
+ SKIP_SPACE(ptr);
+
+ errno = 0;
+ val = strtoul(ptr, &ptr, 0);
+ if (errno)
+ return -1;
+
+ switch (*ptr) {
+ case 'G':
+ case 'g':
+ val <<= 10;
+ case 'M':
+ case 'm':
+ val <<= 10;
+ case 'K':
+ case 'k':
+ val <<= 10;
+ ++ptr;
+ }
+
+ if (retptr) {
+ SKIP_SPACE(ptr);
+ *retptr = ptr;
+ }
+
+ *ret = val;
+ return 0;
+}
+
+static void cmd_list(char *name, char *line, int arg)
+{
+ struct chunk *chunk;
+
+ (void)name; (void)line; (void)arg;
+
+ for_each(chunk)
+ printf("%3d: %p@%p\n", chunk->fd,
+ (void *)chunk->size, (void *)chunk->start);
+}
+
+static void cmd_alloc(char *name, char *line, int from)
+{
+ static const char *what[2] = { "dev/kind", "regions" };
+
+ unsigned long size, alignment = 0;
+ struct cma_alloc_request req;
+ struct chunk *chunk;
+ char *spec;
+ size_t n;
+ int ret;
+
+ SKIP_SPACE(line);
+ if (!*line) {
+ fprintf(stderr, "%s: expecting %s\n", name, what[from]);
+ return;
+ }
+
+ for (spec = line; *line && !isspace(*line); ++line)
+ /* nothing */;
+
+ if (!*line) {
+ fprintf(stderr, "%s: expecting size after %s\n",
+ name, what[from]);
+ return;
+ }
+
+ *line++ = '\0';
+ n = line - spec;
+ if (n > sizeof req.spec) {
+ fprintf(stderr, "%s: %s too long\n", name, what[from]);
+ return;
+ }
+
+ if (memparse(line, &line, &size) < 0 || !size) {
+ fprintf(stderr, "%s: invalid size\n", name);
+ return;
+ }
+
+ if (*line == '/')
+ if (memparse(line, &line, &alignment) < 0) {
+ fprintf(stderr, "%s: invalid alignment\n", name);
+ return;
+ }
+
+ SKIP_SPACE(line);
+ if (*line) {
+ fprintf(stderr, "%s: unknown argument(s) at the end: %s\n",
+ name, line);
+ return;
+ }
+
+ chunk = chunk_create(name);
+ if (!chunk)
+ return;
+
+ fprintf(stderr, "%s: allocating %p/%p\n", name,
+ (void *)size, (void *)alignment);
+
+ req.magic = CMA_MAGIC;
+ req.type = from ? CMA_REQ_FROM_REG : CMA_REQ_DEV_KIND;
+ req.size = size;
+ req.alignment = alignment;
+ req.start = 0;
+
+ memcpy(req.spec, spec, n);
+ memset(req.spec + n, '\0', sizeof req.spec - n);
+
+ ret = ioctl(chunk->fd, IOCTL_CMA_ALLOC, &req);
+ if (ret < 0) {
+ fprintf(stderr, "%s: cma_alloc: %s\n", name, strerror(errno));
+ chunk_destroy(chunk);
+ } else {
+ chunk_add(chunk);
+ chunk->size = req.size;
+ chunk->start = req.start;
+
+ printf("%3d: %p@%p\n", chunk->fd,
+ (void *)chunk->size, (void *)chunk->start);
+ }
+}
+
+static struct chunk *__cmd_numbered(char *name, char *line)
+{
+ struct chunk *chunk;
+
+ SKIP_SPACE(line);
+
+ if (*line) {
+ unsigned long num;
+
+ errno = 0;
+ num = strtoul(line, &line, 10);
+
+ if (errno || num > INT_MAX) {
+ fprintf(stderr, "%s: invalid number\n", name);
+ return NULL;
+ }
+
+ SKIP_SPACE(line);
+ if (*line) {
+ fprintf(stderr,
+ "%s: unknown arguments at the end: %s\n",
+ name, line);
+ return NULL;
+ }
+
+ for_each(chunk)
+ if (chunk->fd == (int)num)
+ return chunk;
+ fprintf(stderr, "%s: no chunk %3lu\n", name, num);
+ return NULL;
+
+ } else {
+ chunk = root.prev;
+ if (chunk == &root) {
+ fprintf(stderr, "%s: no chunks\n", name);
+ return NULL;
+ }
+ return chunk;
+ }
+}
+
+static void cmd_free(char *name, char *line, int arg)
+{
+ struct chunk *chunk = __cmd_numbered(name, line);
+ (void)arg;
+ if (chunk) {
+ fprintf(stderr, "%s: freeing %p@%p\n", name,
+ (void *)chunk->size, (void *)chunk->start);
+ chunk_destroy(chunk);
+ }
+}
+
+static void cmd_mapped(char *name, char *line, int arg)
+{
+ struct chunk *chunk = __cmd_numbered(name, line);
+ unsigned long *ptr, *it, *end, v;
+
+ if (!chunk)
+ return;
+
+ ptr = mmap(NULL, chunk->size, PROT_READ | PROT_WRITE,
+ MAP_SHARED, chunk->fd, 0);
+
+ if (ptr == (void *)-1) {
+ fprintf(stderr, "%s: mapping failed: %s\n", name,
+ strerror(errno));
+ return;
+ }
+
+ end = ptr + chunk->size / sizeof *it;
+
+ if (arg)
+ for (v = 0, it = ptr; it != end; ++v, ++it)
+ *it = v;
+
+ for (v = 0, it = ptr; it != end && *it == v; ++v, ++it)
+ /* nop */;
+
+ if (it != end)
+ fprintf(stderr, "%s: at +[%x] got %lx, expected %lx\n",
+ name, (it - ptr) * sizeof *it, *it, v);
+ else
+ fprintf(stderr, "%s: done\n", name);
+
+ munmap(it, chunk->size);
+}
+
+static void cmd_pattern(char *name, char *line, int arg)
+{
+ struct chunk *chunk = __cmd_numbered(name, line);
+ if (chunk) {
+ int ret;
+
+ fprintf(stderr, "%s: requesting kernel to %s %p@%p\n",
+ name, arg ? "fill" : "verify",
+ (void *)chunk->size, (void *)chunk->start);
+
+ ret = ioctl(chunk->fd, IOCTL_CMA_PATTERN, arg);
+ if (ret < 0)
+ fprintf(stderr, "%s: %s\n", name, strerror(errno));
+ else if ((unsigned long)ret < chunk->size)
+ fprintf(stderr, "%s: failed at +[%x]\n", name, ret);
+ else
+ fprintf(stderr, "%s: done\n", name);
+ }
+}
+
+static void cmd_kdump(char *name, char *line, int arg)
+{
+ struct chunk *chunk = __cmd_numbered(name, line);
+
+ (void)arg;
+
+ if (chunk) {
+ int ret;
+
+ fprintf(stderr, "%s: requesting kernel to dump 256B@%p\n",
+ name, (void *)chunk->start);
+
+ ret = ioctl(chunk->fd, IOCTL_CMA_DUMP, 256);
+ if (ret < 0)
+ fprintf(stderr, "%s: %s\n", name, strerror(errno));
+ else
+ fprintf(stderr, "%s: done\n", name);
+ }
+}
+
+static const struct command {
+ const char short_name;
+ const char name[8];
+ void (*handle)(char *name, char *line, int arg);
+ int arg;
+ const char *help_args, *help;
+} commands[] = {
+ { 'l', "list", cmd_list, 0,
+ "", "list allocated chunks" },
+ { 'a', "alloc", cmd_alloc, 0,
+ "<dev>/<kind> <size>[/<alignment>]", "allocate chunk" },
+ { 'A', "afrom", cmd_alloc, 1,
+ "<regions> <size>[/<alignment>]", "allocate from region(s)" },
+ { 'f', "free", cmd_free, 0,
+ "[<num>]", "free an chunk" },
+ { 'w', "write", cmd_mapped, 1,
+ "[<num>]", "write data to chunk" },
+ { 'W', "kwrite", cmd_pattern, 1,
+ "[<num>]", "verify chunk's contet" },
+ { 'v', "verify", cmd_mapped, 0,
+ "[<num>]", "let kernel write data to chunk" },
+ { 'V', "kverify", cmd_pattern, 0,
+ "[<num>]", "let kernel verify verify chunk's contet" },
+ { 'D', "kdump", cmd_kdump, 0,
+ "[<num>]", "make kernel dump content" },
+ { '\0', "", NULL, 0, NULL, NULL }
+};
+
+static void handle_command(char *line)
+{
+ static char last_line[1024];
+
+ const struct command *cmd;
+ char *name, short_name = '\0';
+
+ SKIP_SPACE(line);
+ if (*line == '#')
+ return;
+
+ if (!*line)
+ strcpy(line, last_line);
+ else
+ strcpy(last_line, line);
+
+ name = line;
+ while (*line && !isspace(*line))
+ ++line;
+
+ if (*line) {
+ *line = '\0';
+ ++line;
+ }
+
+ if (!name[1])
+ short_name = name[0];
+
+ for (cmd = commands; *(cmd->name); ++cmd)
+ if (short_name
+ ? short_name == cmd->short_name
+ : !strcmp(name, cmd->name)) {
+ cmd->handle(name, line, cmd->arg);
+ return;
+ }
+
+ fprintf(stderr, "%s: unknown command\n", name);
+}
+
+
+/****************************** Main ******************************/
+
+int main(void)
+{
+ const struct command *cmd = commands;
+ unsigned no = 1;
+ char line[1024];
+ int skip = 0;
+
+ fputs("commands:\n", stderr);
+ do {
+ fprintf(stderr, " %c or %-7s %-10s %s\n",
+ cmd->short_name, cmd->name, cmd->help_args, cmd->help);
+ } while ((++cmd)->handle);
+ fputs(" # ... comment\n"
+ " <empty line> repeat previous\n"
+ "\n", stderr);
+
+ while (fgets(line, sizeof line, stdin)) {
+ char *nl = strchr(line, '\n');
+ if (nl) {
+ if (skip) {
+ fprintf(stderr, "cma: %d: line too long\n", no);
+ skip = 0;
+ } else {
+ *nl = '\0';
+ handle_command(line);
+ }
+ ++no;
+ } else {
+ skip = 1;
+ }
+ }
+
+ if (skip)
+ fprintf(stderr, "cma: %d: no new line at EOF\n", no);
+ return 0;
+}
--
1.7.2.3

2010-11-19 15:58:30

by Michal Nazarewicz

Subject: [RFCv6 02/13] lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()

This commit adds a bitmap_find_next_zero_area_off() function which
works like the bitmap_find_next_zero_area() function except that it
allows an offset to be specified when alignment is checked. This
lets the caller request a bit such that its number plus the offset
is aligned according to the mask.
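
A hedged example of the intended semantics (the bitmap, its size and
the offset are placeholders): to find 8 clear bits whose position,
counted from the start of the underlying resource rather than from
the start of the bitmap, is a multiple of 16, one would call:

    /* 'off' is how far the bitmap's bit 0 is from the resource start. */
    idx = bitmap_find_next_zero_area_off(map, size, 0, 8, 16 - 1, off);
    /* (idx + off) is now a multiple of 16, e.g. off = 5 gives idx = 11. */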

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/bitmap.h | 24 +++++++++++++++++++-----
lib/bitmap.c | 22 ++++++++++++----------
2 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index daf8c48..c0528d1 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -45,6 +45,7 @@
* bitmap_set(dst, pos, nbits) Set specified bit area
* bitmap_clear(dst, pos, nbits) Clear specified bit area
* bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area
+ * bitmap_find_next_zero_area_off(buf, len, pos, n, mask) as above
* bitmap_shift_right(dst, src, n, nbits) *dst = *src >> n
* bitmap_shift_left(dst, src, n, nbits) *dst = *src << n
* bitmap_remap(dst, src, old, new, nbits) *dst = map(old, new)(src)
@@ -113,11 +114,24 @@ extern int __bitmap_weight(const unsigned long *bitmap, int bits);

extern void bitmap_set(unsigned long *map, int i, int len);
extern void bitmap_clear(unsigned long *map, int start, int nr);
-extern unsigned long bitmap_find_next_zero_area(unsigned long *map,
- unsigned long size,
- unsigned long start,
- unsigned int nr,
- unsigned long align_mask);
+
+extern unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
+ unsigned long size,
+ unsigned long start,
+ unsigned int nr,
+ unsigned long align_mask,
+ unsigned long align_offset);
+
+static inline unsigned long
+bitmap_find_next_zero_area(unsigned long *map,
+ unsigned long size,
+ unsigned long start,
+ unsigned int nr,
+ unsigned long align_mask)
+{
+ return bitmap_find_next_zero_area_off(map, size, start, nr,
+ align_mask, 0);
+}

extern int bitmap_scnprintf(char *buf, unsigned int len,
const unsigned long *src, int nbits);
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 741fae9..8e75a6f 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -315,30 +315,32 @@ void bitmap_clear(unsigned long *map, int start, int nr)
}
EXPORT_SYMBOL(bitmap_clear);

-/*
+/**
* bitmap_find_next_zero_area - find a contiguous aligned zero area
* @map: The address to base the search on
* @size: The bitmap size in bits
* @start: The bitnumber to start searching at
* @nr: The number of zeroed bits we're looking for
* @align_mask: Alignment mask for zero area
+ * @align_offset: Alignment offset for zero area.
*
* The @align_mask should be one less than a power of 2; the effect is that
- * the bit offset of all zero areas this function finds is multiples of that
- * power of 2. A @align_mask of 0 means no alignment is required.
+ * the bit offset of all zero areas this function finds plus @align_offset
+ * is multiple of that power of 2.
*/
-unsigned long bitmap_find_next_zero_area(unsigned long *map,
- unsigned long size,
- unsigned long start,
- unsigned int nr,
- unsigned long align_mask)
+unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
+ unsigned long size,
+ unsigned long start,
+ unsigned int nr,
+ unsigned long align_mask,
+ unsigned long align_offset)
{
unsigned long index, end, i;
again:
index = find_next_zero_bit(map, size, start);

/* Align allocation */
- index = __ALIGN_MASK(index, align_mask);
+ index = __ALIGN_MASK(index + align_offset, align_mask) - align_offset;

end = index + nr;
if (end > size)
@@ -350,7 +352,7 @@ again:
}
return index;
}
-EXPORT_SYMBOL(bitmap_find_next_zero_area);
+EXPORT_SYMBOL(bitmap_find_next_zero_area_off);

/*
* Bitmap printing & parsing functions: first version by Bill Irwin,
--
1.7.2.3

2010-11-19 15:58:32

by Michal Nazarewicz

Subject: [RFCv6 05/13] mm: cma: debugfs support added

The debugfs development interface lets one change the map attribute
at run time as well as observe what regions have been reserved.
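
For example, the current map rules can be read from, and new rules
written to, the "map" file in the "contiguous" directory of the
debugfs root; the value uses the same syntax as the map attribute
set by the platform, e.g. (taken from the board patches):

    s5p-mfc5/f=fw;s5p-mfc5/a=b1;s5p-mfc5/b=b2;*=b1,b2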

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
Documentation/contiguous-memory.txt | 4 +
include/linux/cma.h | 11 +
mm/Kconfig | 25 ++-
mm/cma.c | 501 ++++++++++++++++++++++++++++++++++-
4 files changed, 537 insertions(+), 4 deletions(-)

diff --git a/Documentation/contiguous-memory.txt b/Documentation/contiguous-memory.txt
index f1715ba..ec09d8e 100644
--- a/Documentation/contiguous-memory.txt
+++ b/Documentation/contiguous-memory.txt
@@ -258,6 +258,10 @@
iff it matched in previous pattern. If the second part is
omitted it will mach any type of memory requested by device.

+ If debugfs support is enabled, this attribute is accessible via
+ debugfs and can be changed at run-time by writing to
+ contiguous/map.
+
Some examples (whitespace added for better readability):

cma_map = foo/quaz = r1;
diff --git a/include/linux/cma.h b/include/linux/cma.h
index a6031a7..8437104 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -24,6 +24,7 @@

struct device;
struct cma_info;
+struct dentry;

/**
* struct cma - an allocated contiguous chunk of memory.
@@ -276,6 +277,11 @@ struct cma_region {
unsigned users;
struct list_head list;

+#if defined CONFIG_CMA_DEBUGFS
+ const char *to_alloc_link, *from_alloc_link;
+ struct dentry *dir, *to_alloc, *from_alloc;
+#endif
+
unsigned used:1;
unsigned registered:1;
unsigned reserved:1;
@@ -382,6 +388,11 @@ struct cma_allocator {
void (*unpin)(struct cma *chunk);

struct list_head list;
+
+#if defined CONFIG_CMA_DEBUGFS
+ const char *dir_name;
+ struct dentry *regs_dir;
+#endif
};

/**
diff --git a/mm/Kconfig b/mm/Kconfig
index c7eb1bc..a5480ea 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -351,16 +351,35 @@ config CMA
For more information see <Documentation/contiguous-memory.txt>.
If unsure, say "n".

-config CMA_DEBUG
- bool "CMA debug messages (DEVELOPEMENT)"
+config CMA_DEVELOPEMENT
+ bool "Include CMA developement features"
depends on CMA
help
+ This lets you enable some developement features of the CMA
+ framework. It does not add any code to the kernel.
+
+ Those options are mostly usable during development and testing.
+ If unsure, say "n".
+
+config CMA_DEBUG
+ bool "CMA debug messages"
+ depends on CMA_DEVELOPEMENT
+ help
Turns on debug messages in CMA. This produces KERN_DEBUG
messages for every CMA call as well as various messages while
processing calls such as cma_alloc(). This option does not
affect warning and error messages.

- This is mostly used during development. If unsure, say "n".
+config CMA_DEBUGFS
+ bool "CMA debugfs interface support"
+ depends on CMA_DEVELOPEMENT && DEBUG_FS
+ help
+ Enable support for debugfs interface. It is available under the
+ "contiguous" directory in the debugfs root directory. Each
+ region and allocator is represented there.
+
+ For more information consult
+ <Documentation/contiguous-memory.txt>.

config CMA_GENERIC_ALLOCATOR
bool "CMA generic allocator"
diff --git a/mm/cma.c b/mm/cma.c
index 17276b3..dfdeeb7 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -34,11 +34,16 @@
#include <linux/slab.h> /* kmalloc() */
#include <linux/string.h> /* str*() */
#include <linux/genalloc.h> /* gen_pool_*() */
+#include <linux/debugfs.h> /* debugfs stuff */
+#include <linux/uaccess.h> /* copy_{to,from}_user */

#include <linux/cma.h>


-/* Protects cma_regions, cma_allocators, cma_map and cma_map_length. */
+/*
+ * Protects cma_regions, cma_allocators, cma_map, cma_map_length,
+ * cma_dfs_regions and cma_dfs_allocators.
+ */
static DEFINE_MUTEX(cma_mutex);


@@ -139,7 +144,13 @@ int __init __must_check cma_early_region_register(struct cma_region *reg)

/************************* Regions & Allocators *************************/

+static void __cma_dfs_region_add(struct cma_region *reg);
+static void __cma_dfs_region_alloc_update(struct cma_region *reg);
+static void __cma_dfs_allocator_add(struct cma_allocator *alloc);
+
static int __cma_region_attach_alloc(struct cma_region *reg);
+static void __maybe_unused __cma_region_detach_alloc(struct cma_region *reg);
+

/* List of all regions. Named regions are kept before unnamed. */
static LIST_HEAD(cma_regions);
@@ -222,6 +233,8 @@ int __must_check cma_region_register(struct cma_region *reg)
else
list_add_tail(&reg->list, &cma_regions);

+ __cma_dfs_region_add(reg);
+
done:
mutex_unlock(&cma_mutex);

@@ -298,6 +311,8 @@ int cma_allocator_register(struct cma_allocator *alloc)
__cma_region_attach_alloc(reg);
}

+ __cma_dfs_allocator_add(alloc);
+
mutex_unlock(&cma_mutex);

pr_debug("%s: allocator registered\n", alloc->name ?: "(unnamed)");
@@ -481,6 +496,476 @@ static int __init cma_init(void)
subsys_initcall(cma_init);


+/************************* Debugfs *************************/
+
+#if defined CONFIG_CMA_DEBUGFS
+
+static struct dentry *cma_dfs_regions, *cma_dfs_allocators;
+
+struct cma_dfs_file {
+ const char *name;
+ const struct file_operations *ops;
+};
+
+static struct dentry *
+cma_dfs_create_file(const char *name, struct dentry *parent,
+ void *priv, const struct file_operations *ops)
+{
+ struct dentry *d;
+ d = debugfs_create_file(name, ops->write ? 0644 : 0444,
+ parent, priv, ops);
+ if (IS_ERR_OR_NULL(d)) {
+ pr_err("debugfs: %s: %s: unable to create\n",
+ parent->d_iname, name);
+ return NULL;
+ }
+
+ return d;
+}
+
+static void cma_dfs_create_files(const struct cma_dfs_file *files,
+ struct dentry *parent, void *priv)
+{
+ while (files->name
+ && cma_dfs_create_file(files->name, parent, priv, files->ops))
+ ++files;
+}
+
+static struct dentry *
+cma_dfs_create_dir(const char *name, struct dentry *parent)
+{
+ struct dentry *d = debugfs_create_dir(name, parent);
+
+ if (IS_ERR_OR_NULL(d)) {
+ pr_err("debugfs: %s: %s: unable to create\n",
+ parent ? (const char *)parent->d_iname : "<root>", name);
+ return NULL;
+ }
+
+ return d;
+}
+
+static struct dentry *
+cma_dfs_create_lnk(const char *name, struct dentry *parent, const char *target)
+{
+ struct dentry *d = debugfs_create_symlink(name, parent, target);
+
+ if (IS_ERR_OR_NULL(d)) {
+ pr_err("debugfs: %s: %s: unable to create\n",
+ parent->d_iname, name);
+ return NULL;
+ }
+
+ return d;
+}
+
+static int cma_dfs_open(struct inode *inode, struct file *file)
+{
+ file->private_data = inode->i_private;
+ return 0;
+}
+
+static ssize_t cma_dfs_map_read(struct file *file, char __user *buf,
+ size_t size, loff_t *offp)
+{
+ ssize_t len;
+
+ if (!cma_map_length || *offp)
+ return 0;
+
+ mutex_lock(&cma_mutex);
+
+ /* may have changed */
+ len = cma_map_length;
+ if (!len)
+ goto done;
+
+ len = min_t(size_t, size, len);
+ if (copy_to_user(buf, cma_map, len))
+ len = -EFAULT;
+ else if ((size_t)len < size && put_user('\n', buf + len++))
+ len = -EFAULT;
+
+done:
+ mutex_unlock(&cma_mutex);
+
+ if (len > 0)
+ *offp = len;
+
+ return len;
+}
+
+static ssize_t cma_dfs_map_write(struct file *file, const char __user *buf,
+ size_t size, loff_t *offp)
+{
+ char *val, *v;
+ ssize_t len;
+
+ if (size >= PAGE_SIZE || *offp)
+ return -ENOSPC;
+
+ val = kmalloc(size + 1, GFP_KERNEL);
+ if (!val)
+ return -ENOMEM;
+
+ if (copy_from_user(val, buf, size)) {
+ len = -EFAULT;
+ goto done;
+ }
+ val[size] = '\0';
+
+ len = cma_map_validate(val);
+ if (len < 0)
+ goto done;
+ val[len] = '\0';
+
+ mutex_lock(&cma_mutex);
+ v = (char *)cma_map;
+ cma_map = val;
+ val = v;
+ cma_map_length = len;
+ mutex_unlock(&cma_mutex);
+
+done:
+ kfree(val);
+
+ if (len > 0)
+ *offp = len;
+
+ return len;
+}
+
+static int __init cma_dfs_init(void)
+{
+ static const struct file_operations map_ops = {
+ .read = cma_dfs_map_read,
+ .write = cma_dfs_map_write,
+ };
+
+ struct dentry *root, *a, *r;
+
+ root = cma_dfs_create_dir("contiguous", NULL);
+ if (!root)
+ return 0;
+
+ if (!cma_dfs_create_file("map", root, NULL, &map_ops))
+ goto error;
+
+ a = cma_dfs_create_dir("allocators", root);
+ if (!a)
+ goto error;
+
+ r = cma_dfs_create_dir("regions", root);
+ if (!r)
+ goto error;
+
+ mutex_lock(&cma_mutex);
+ {
+ struct cma_allocator *alloc;
+ cma_dfs_allocators = a;
+ cma_foreach_allocator(alloc)
+ __cma_dfs_allocator_add(alloc);
+ }
+
+ {
+ struct cma_region *reg;
+ cma_dfs_regions = r;
+ cma_foreach_region(reg)
+ __cma_dfs_region_add(reg);
+ }
+ mutex_unlock(&cma_mutex);
+
+ return 0;
+
+error:
+ debugfs_remove_recursive(root);
+ return 0;
+}
+device_initcall(cma_dfs_init);
+
+static ssize_t cma_dfs_region_name_read(struct file *file, char __user *buf,
+ size_t size, loff_t *offp)
+{
+ struct cma_region *reg = file->private_data;
+ size_t len;
+
+ if (!reg->name || *offp)
+ return 0;
+
+ len = min(strlen(reg->name), size);
+ if (copy_to_user(buf, reg->name, len))
+ return -EFAULT;
+ if (len < size && put_user('\n', buf + len++))
+ return -EFAULT;
+
+ *offp = len;
+ return len;
+}
+
+static ssize_t cma_dfs_region_info_read(struct file *file, char __user *buf,
+ size_t size, loff_t *offp)
+{
+ struct cma_region *reg = file->private_data;
+ char str[min((size_t)63, size) + 1];
+ int len;
+
+ if (*offp)
+ return 0;
+
+ len = snprintf(str, sizeof str, "%p %p %p\n",
+ (void *)reg->start, (void *)reg->size,
+ (void *)reg->free_space);
+
+ if (copy_to_user(buf, str, len))
+ return -EFAULT;
+
+ *offp = len;
+ return len;
+}
+
+static ssize_t cma_dfs_region_alloc_read(struct file *file, char __user *buf,
+ size_t size, loff_t *offp)
+{
+ struct cma_region *reg = file->private_data;
+ char str[min((size_t)63, size) + 1];
+ const char *fmt;
+ const void *arg;
+ int len = 0;
+
+ if (*offp)
+ return 0;
+
+ mutex_lock(&cma_mutex);
+
+ if (reg->alloc) {
+ if (reg->alloc->name) {
+ fmt = "%s\n";
+ arg = reg->alloc->name;
+ } else {
+ fmt = "0x%p\n";
+ arg = (void *)reg->alloc;
+ }
+ } else if (reg->alloc_name) {
+ fmt = "[%s]\n";
+ arg = reg->alloc_name;
+ } else {
+ goto done;
+ }
+
+ len = snprintf(str, sizeof str, fmt, arg);
+
+done:
+ mutex_unlock(&cma_mutex);
+
+ if (len) {
+ if (copy_to_user(buf, str, len))
+ return -EFAULT;
+ *offp = len;
+ }
+ return len;
+}
+
+static ssize_t
+cma_dfs_region_alloc_write(struct file *file, const char __user *buf,
+ size_t size, loff_t *offp)
+{
+ struct cma_region *reg = file->private_data;
+ ssize_t ret;
+ char *s, *t;
+
+ if (size > 64 || *offp)
+ return -ENOSPC;
+
+ if (reg->alloc && reg->users)
+ return -EBUSY;
+
+ s = kmalloc(size + 1, GFP_KERNEL);
+ if (!s)
+ return -ENOMEM;
+
+ if (copy_from_user(s, buf, size)) {
+ ret = -EFAULT;
+ goto done_free;
+ }
+
+ s[size] = '\0';
+ t = strchr(s, '\n');
+ if (t == s) {
+ kfree(s);
+ s = NULL;
+ }
+ if (t)
+ *t = '\0';
+
+ mutex_lock(&cma_mutex);
+
+ /* things may have changed while we were acquiring lock */
+ if (reg->alloc && reg->users) {
+ ret = -EBUSY;
+ } else {
+ if (reg->alloc)
+ __cma_region_detach_alloc(reg);
+
+ t = s;
+ s = reg->free_alloc_name ? (char *)reg->alloc_name : NULL;
+
+ reg->alloc_name = t;
+ reg->free_alloc_name = 1;
+
+ ret = size;
+ }
+
+ mutex_unlock(&cma_mutex);
+
+done_free:
+ kfree(s);
+
+ if (ret > 0)
+ *offp = ret;
+ return ret;
+}
+
+static const struct cma_dfs_file __cma_dfs_region_files[] = {
+ {
+ "name", &(const struct file_operations){
+ .open = cma_dfs_open,
+ .read = cma_dfs_region_name_read,
+ },
+ },
+ {
+ "info", &(const struct file_operations){
+ .open = cma_dfs_open,
+ .read = cma_dfs_region_info_read,
+ },
+ },
+ {
+ "alloc", &(const struct file_operations){
+ .open = cma_dfs_open,
+ .read = cma_dfs_region_alloc_read,
+ .write = cma_dfs_region_alloc_write,
+ },
+ },
+ { }
+};
+
+static void __cma_dfs_region_add(struct cma_region *reg)
+{
+ struct dentry *d;
+
+ if (!cma_dfs_regions || reg->dir)
+ return;
+
+ /* Region's directory */
+ reg->from_alloc_link = kasprintf(GFP_KERNEL, "../../regions/0x%p",
+ (void *)reg->start);
+ if (!reg->from_alloc_link)
+ return;
+
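+	/* Skip the "../../regions/" prefix (14 characters) so the directory is named just "0x<start>". */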
+ d = cma_dfs_create_dir(reg->from_alloc_link + 14, cma_dfs_regions);
+ if (!d) {
+ kfree(reg->from_alloc_link);
+ return;
+ }
+
+ if (reg->name)
+ cma_dfs_create_lnk(reg->name, cma_dfs_regions,
+ reg->from_alloc_link + 14);
+
+ reg->dir = d;
+
+ /* Files */
+ cma_dfs_create_files(__cma_dfs_region_files, d, reg);
+
+ /* Link to allocator */
+ __cma_dfs_region_alloc_update(reg);
+}
+
+static void __cma_dfs_region_alloc_update(struct cma_region *reg)
+{
+ if (!cma_dfs_regions || !cma_dfs_allocators || !reg->dir)
+ return;
+
+ /* Remove stale links */
+ if (reg->to_alloc) {
+ debugfs_remove(reg->to_alloc);
+ reg->to_alloc = NULL;
+ }
+
+ if (reg->from_alloc) {
+ debugfs_remove(reg->from_alloc);
+ reg->from_alloc = NULL;
+ }
+
+ if (reg->to_alloc_link) {
+ kfree(reg->to_alloc_link);
+ reg->to_alloc_link = NULL;
+ }
+
+ if (!reg->alloc)
+ return;
+
+ /* Create new links */
+ if (reg->alloc->regs_dir)
+ reg->from_alloc =
+ cma_dfs_create_lnk(reg->from_alloc_link + 14,
+ reg->alloc->regs_dir,
+ reg->from_alloc_link);
+
+ if (!reg->alloc->dir_name)
+ return;
+
+ reg->to_alloc_link = kasprintf(GFP_KERNEL, "../allocators/%s",
+ reg->alloc->dir_name);
+ if (reg->to_alloc_link &&
+ !cma_dfs_create_lnk("allocator", reg->dir, reg->to_alloc_link)) {
+ kfree(reg->to_alloc_link);
+ reg->to_alloc_link = NULL;
+ }
+}
+
+static inline void __cma_dfs_allocator_add(struct cma_allocator *alloc)
+{
+ struct dentry *d;
+
+ if (!cma_dfs_allocators || alloc->dir_name)
+ return;
+
+ alloc->dir_name = alloc->name ?:
+ kasprintf(GFP_KERNEL, "0x%p", (void *)alloc);
+ if (!alloc->dir_name)
+ return;
+
+ d = cma_dfs_create_dir(alloc->dir_name, cma_dfs_allocators);
+ if (!d) {
+ if (!alloc->name)
+ kfree(alloc->dir_name);
+ alloc->dir_name = NULL;
+ return;
+ }
+
+ alloc->regs_dir = cma_dfs_create_dir("regions", d);
+}
+
+#else
+
+static inline void __cma_dfs_region_add(struct cma_region *reg)
+{
+ /* nop */
+}
+
+static inline void __cma_dfs_allocator_add(struct cma_allocator *alloc)
+{
+ /* nop */
+}
+
+static inline void __cma_dfs_region_alloc_update(struct cma_region *reg)
+{
+ /* nop */
+}
+
+#endif
+
+
/************************* The Device API *************************/

static const char *__must_check
@@ -731,10 +1216,24 @@ static int __cma_region_attach_alloc(struct cma_region *reg)
reg->alloc = alloc;
pr_debug("init: %s: %s: initialised allocator\n",
reg->name ?: "(private)", alloc->name ?: "(unnamed)");
+ __cma_dfs_region_alloc_update(reg);
}
return ret;
}

+static void __cma_region_detach_alloc(struct cma_region *reg)
+{
+ if (!reg->alloc)
+ return;
+
+ if (reg->alloc->cleanup)
+ reg->alloc->cleanup(reg);
+
+ reg->alloc = NULL;
+ reg->used = 1;
+ __cma_dfs_region_alloc_update(reg);
+}
+

/*
* s ::= rules
--
1.7.2.3

2010-11-19 15:59:47

by Michal Nazarewicz

[permalink] [raw]
Subject: [RFCv6 12/13] mm: cma: Migration support added [wip]

This commit adds a cma_early_grab_pageblocks() function and makes
cma_early_region_reserve() use the former when some conditions are
met.

Grabbed pageblocks are later given back to the page allocator with
their migration type set to MIGRATE_CMA. This guarantees that only
movable and reclaimable pages are allocated from those pageblocks.

* * * THIS COMMIT IS NOT YET FINISHED * * *

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/cma.h | 38 +++++++-
mm/Kconfig | 15 +++
mm/cma-best-fit.c | 12 +++-
mm/cma.c | 239 +++++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 299 insertions(+), 5 deletions(-)

diff --git a/include/linux/cma.h b/include/linux/cma.h
index 56ed021..6a56e2a 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -310,10 +310,6 @@ struct cma_region {
const char *alloc_name;
void *private_data;

-#ifdef CONFIG_CMA_USE_MIGRATE_CMA
- unsigned short *isolation_map;
-#endif
-
unsigned users;
struct list_head list;

@@ -327,7 +323,9 @@ struct cma_region {
unsigned reserved:1;
unsigned copy_name:1;
unsigned free_alloc_name:1;
+#ifdef CONFIG_CMA_USE_MIGRATE_CMA
unsigned use_isolate:1;
+#endif
};

/**
@@ -449,6 +447,38 @@ struct cma_allocator {
*/
int cma_allocator_register(struct cma_allocator *alloc);

+/**
+ * __cma_grab() - migrates all pages from range and reserves them for CMA
+ * @reg: Region this call is made in context of. If the region is
+ * not marked needing grabbing the function does nothing.
+ * @start: Address in bytes of the first byte to grab.
+ * @size: Size of the region to grab.
+ *
+ * This function should be used when an allocator wants to allocate some
+ * physical memory to make sure that it is not used for any movable or
+ * reclaimable pages (e.g. page cache).
+ *
+ * In essence, this function migrates all movable and reclaimable
+ * pages from the range and then removes them from buddy system so
+ * page allocator won't consider them when allocating space.
+ *
+ * The allocator may assume that this function is unlikely to fail, so
+ * if it does fail, the allocator should simply recover and return an error.
+ */
+int __cma_grab(struct cma_region *reg, phys_addr_t start, size_t size);
+
+/**
+ * __cma_ungrab() - frees pages from the range back to the buddy system.
+ * @reg: Region this call is made in context of. If the region is
+ * not marked needing grabbing the function does nothing.
+ * @start: Address in bytes of the first byte to free.
+ * @size: Size of the region to free.
+ *
+ * This is the reverse of __cma_grab(). The allocator should use it when
+ * the physical memory is no longer used.
+ */
+void __cma_ungrab(struct cma_region *reg, phys_addr_t start, size_t size);
+

/**************************** Initialisation API ****************************/

diff --git a/mm/Kconfig b/mm/Kconfig
index 4aee3c5..80fd6bd 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -363,6 +363,21 @@ config CMA
For more information see <Documentation/contiguous-memory.txt>.
If unsure, say "n".

+config CMA_USE_MIGRATE_CMA
+ bool "Use MIGRATE_CMA"
+ depends on CMA
+ default y
+ select MIGRATION
+ select MIGRATE_CMA
+ help
+ This makes CMA use the MIGRATE_CMA migration type for regions
+ maintained by CMA. This makes it possible for the standard page
+ allocator to use pages from such regions, which in turn may
+ make the whole system run faster as there will be more space
+ for page caches, etc.
+
+ If unsure, say "y".
+
config CMA_DEVELOPEMENT
bool "Include CMA developement features"
depends on CMA
diff --git a/mm/cma-best-fit.c b/mm/cma-best-fit.c
index 5ed1168..15f4206 100644
--- a/mm/cma-best-fit.c
+++ b/mm/cma-best-fit.c
@@ -145,6 +145,8 @@ static void cma_bf_cleanup(struct cma_region *reg)
kfree(prv);
}

+static void __cma_bf_free(struct cma_region *reg, union cma_bf_item *chunk);
+
struct cma *cma_bf_alloc(struct cma_region *reg,
size_t size, unsigned long alignment)
{
@@ -281,10 +283,17 @@ case_2:

item->chunk.phys = start;
item->chunk.size = size;
+
+ ret = __cma_grab(reg, start, size);
+ if (ret) {
+ __cma_bf_free(reg, item);
+ return ERR_PTR(ret);
+ }
+
return &item->chunk;
}

-static void cma_bf_free(struct cma_chunk *chunk)
+static void __cma_bf_free(struct cma_region *reg, union cma_bf_item *item)
{
struct cma_bf_private *prv = reg->private_data;
union cma_bf_item *prev;
@@ -350,6 +359,7 @@ next:
}
}

+static void cma_bf_free(struct cma *chunk)
{
__cma_ungrab(chunk->reg, chunk->phys, chunk->size);
__cma_bf_free(chunk->reg,
diff --git a/mm/cma.c b/mm/cma.c
index dfdeeb7..510181a 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -39,6 +39,13 @@

#include <linux/cma.h>

+#ifdef CONFIG_CMA_USE_MIGRATE_CMA
+#include <linux/page-isolation.h>
+
+#include <asm/page.h>
+
+#include "internal.h" /* __free_pageblock_cma() */
+#endif

/*
* Protects cma_regions, cma_allocators, cma_map, cma_map_length,
@@ -410,6 +417,222 @@ __cma_early_reserve(struct cma_region *reg)
return tried ? -ENOMEM : -EOPNOTSUPP;
}

+#ifdef CONFIG_CMA_USE_MIGRATE_CMA
+
+static struct cma_grabbed_early {
+ phys_addr_t start;
+ size_t size;
+} cma_grabbed_early[16] __initdata;
+static unsigned cma_grabbed_early_count __initdata;
+
+/* XXX Revisit */
+#ifdef phys_to_pfn
+/* nothing to do */
+#elif defined __phys_to_pfn
+# define phys_to_pfn __phys_to_pfn
+#else
+# warning correct phys_to_pfn implementation needed
+static unsigned long phys_to_pfn(phys_addr_t phys)
+{
+ return virt_to_pfn(phys_to_virt(phys));
+}
+#endif
+
+static unsigned long pfn_to_maxpage(unsigned long pfn)
+{
+ return pfn & ~(MAX_ORDER_NR_PAGES - 1);
+}
+
+static unsigned long pfn_to_maxpage_up(unsigned long pfn)
+{
+ return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+}
+
+static int __init cma_free_grabbed(void)
+{
+ struct cma_grabbed_early *r = cma_grabbed_early;
+ unsigned i = cma_grabbed_early_count;
+
+ for (; i; --i, ++r) {
+ struct page *p = phys_to_page(r->start);
+ unsigned j = r->size >> (PAGE_SHIFT + pageblock_order);
+
+ pr_debug("feeding buddy with: %p + %u * %luM\n",
+ (void *)r->start,
+ j, 1ul << (PAGE_SHIFT + pageblock_order - 20));
+
+ do {
+ __free_pageblock_cma(p);
+ p += pageblock_nr_pages;
+ } while (--j);
+ }
+
+ return 0;
+}
+module_init(cma_free_grabbed);
+
+static phys_addr_t
+__cma_early_region_reserve_try_migrate_cma(struct cma_region *reg)
+{
+ int ret;
+
+ if (((reg->start | reg->size) & ((PAGE_SIZE << MAX_ORDER) - 1)))
+ return -EOPNOTSUPP;
+
+ /*
+ * XXX Revisit: Do we need to check if the region is
+ * consistent? For instance, are all pages valid and part of
+ * the same zone?
+ */
+
+ if (cma_grabbed_early_count >= ARRAY_SIZE(cma_grabbed_early)) {
+ static bool once = true;
+ if (once) {
+ pr_warn("grabbed too many ranges, not all will be MIGRATE_CMA");
+ once = false;
+ }
+ return -EOPNOTSUPP;
+ }
+
+ pr_debug("init: reserving region as MIGRATE_CMA\n");
+
+ reg->alignment = max(reg->alignment,
+ (unsigned long)PAGE_SIZE << MAX_ORDER);
+
+ ret = __cma_early_reserve(reg);
+ if (ret)
+ return ret;
+
+ cma_grabbed_early[cma_grabbed_early_count].start = reg->start;
+ cma_grabbed_early[cma_grabbed_early_count].size = reg->size;
+ ++cma_grabbed_early_count;
+
+ reg->use_isolate = 1;
+
+ return 0;
+}
+
+int __cma_grab(struct cma_region *reg, phys_addr_t start_addr, size_t size)
+{
+ unsigned long start, end, _start, _end;
+ int ret;
+
+ if (!reg->use_isolate)
+ return 0;
+
+ pr_debug("%s\n", __func__);
+
+ /*
+ * What we do here is we mark all pageblocks in range as
+ * MIGRATE_ISOLATE. Because of the way the page allocator works, we
+ * align the range to MAX_ORDER pages so that page allocator
+ * won't try to merge buddies from different pageblocks and
+ * change MIGRATE_ISOLATE to some other migration type.
+ *
+ * Once the pageblocks are marked as MIGRATE_ISOLATE, we
+ * migrate the pages from an unaligned range (ie. pages that
+ * we are interested in). This will put all the pages in
+ * range back to page allocator as MIGRATE_ISOLATE.
+ *
+ * When this is done, we take the pages in range from page
+ * allocator removing them from the buddy system. This way
+ * page allocator will never consider using them.
+ *
+ * This lets us mark the pageblocks back as MIGRATE_CMA so
+ * that free pages in the MAX_ORDER aligned range but not in
+ * the unaligned, original range are put back to page
+ * allocator so that buddy can use them.
+ */
+
+ start = phys_to_pfn(start_addr);
+ end = start + (size >> PAGE_SHIFT);
+
+ pr_debug("\tisolate range(%lx, %lx)\n",
+ pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+ ret = __start_isolate_page_range(pfn_to_maxpage(start),
+ pfn_to_maxpage_up(end), MIGRATE_CMA);
+ if (ret)
+ goto done;
+
+ pr_debug("\tmigrate range(%lx, %lx)\n", start, end);
+ ret = do_migrate_range(start, end);
+ if (ret)
+ goto done;
+
+ /*
+ * Pages from [start, end) are within MAX_ORDER aligned
+ * blocks that are marked as MIGRATE_ISOLATE. What's more,
+ * all pages in [start, end) are free in the page allocator. What
+ * we are going to do is to allocate all pages from [start,
+ * end) (that is, remove them from the page allocator).
+ *
+ * The only problem is that pages at the beginning and at the
+ * end of the interesting range may not be aligned with pages that
+ * the page allocator holds, i.e. they can be part of higher order
+ * pages. Because of this, we reserve the bigger range and,
+ * once this is done, free the pages we are not interested in.
+ */
+
+ pr_debug("\tfinding buddy\n");
+ ret = 0;
+ while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
+ if (WARN_ON(++ret > MAX_ORDER))
+ return -EINVAL;
+
+ _start = start & (~0UL << ret);
+ pr_debug("\talloc freed(%lx, %lx)\n", _start, end);
+ _end = alloc_contig_freed_pages(_start, end, 0);
+
+ /* Free head and tail (if any) */
+ pr_debug("\tfree contig(%lx, %lx)\n", _start, start);
+ free_contig_pages(pfn_to_page(_start), start - _start);
+ pr_debug("\tfree contig(%lx, %lx)\n", end, _end);
+ free_contig_pages(pfn_to_page(end), _end - end);
+
+ ret = 0;
+
+done:
+ pr_debug("\tundo isolate range(%lx, %lx)\n",
+ pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+ __undo_isolate_page_range(pfn_to_maxpage(start),
+ pfn_to_maxpage_up(end), MIGRATE_CMA);
+
+ pr_debug("ret = %d\n", ret);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(__cma_grab);
+
+void __cma_ungrab(struct cma_region *reg, phys_addr_t start, size_t size)
+{
+ if (reg->use_isolate)
+ free_contig_pages(pfn_to_page(phys_to_pfn(start)),
+ size >> PAGE_SHIFT);
+}
+EXPORT_SYMBOL_GPL(__cma_ungrab);
+
+#else
+
+static inline phys_addr_t
+__cma_early_region_reserve_try_migrate_cma(struct cma_region *reg)
+{
+ return -EOPNOTSUPP;
+}
+
+int __cma_grab(struct cma_region *reg, phys_addr_t start, size_t size)
+{
+ (void)reg; (void)start; (void)size;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(__cma_grab);
+
+void __cma_ungrab(struct cma_region *reg, phys_addr_t start, size_t size)
+{
+ (void)reg; (void)start; (void)size;
+}
+EXPORT_SYMBOL_GPL(__cma_ungrab);
+
+#endif
+
int __init cma_early_region_reserve(struct cma_region *reg)
{
int ret;
@@ -420,6 +643,13 @@ int __init cma_early_region_reserve(struct cma_region *reg)
reg->reserved)
return -EINVAL;

+ /*
+ * Try reserving the region as MIGRATE_CMA if the requested
+ * region's start is aligned to PAGE_SIZE << MAX_ORDER and
+ * its size is a multiple of PAGE_SIZE << MAX_ORDER.
+ */
+ ret = __cma_early_region_reserve_try_migrate_cma(reg);
+ if (ret == -EOPNOTSUPP)
ret = __cma_early_reserve(reg);
if (!ret)
reg->reserved = 1;
@@ -1393,6 +1623,7 @@ struct cma *cma_gen_alloc(struct cma_region *reg,
{
unsigned long start;
struct cma *chunk;
+ int ret;

chunk = kmalloc(sizeof *chunk, GFP_KERNEL);
if (unlikely(!chunk))
@@ -1405,6 +1636,13 @@ struct cma *cma_gen_alloc(struct cma_region *reg,
return ERR_PTR(-ENOMEM);
}

+ ret = __cma_grab(reg, start, size);
+ if (ret) {
+ gen_pool_free(reg->private_data, start, size);
+ kfree(chunk);
+ return ERR_PTR(ret);
+ }
+
chunk->phys = start;
chunk->size = size;
return chunk;
@@ -1413,6 +1651,7 @@ struct cma *cma_gen_alloc(struct cma_region *reg,
static void cma_gen_free(struct cma *chunk)
{
gen_pool_free(chunk->reg->private_data, chunk->phys, chunk->size);
+ __cma_ungrab(chunk->reg, chunk->phys, chunk->size);
kfree(chunk);
}

--
1.7.2.3

2010-11-19 15:59:48

by Michal Nazarewicz

[permalink] [raw]
Subject: [RFCv6 09/13] mm: alloc_contig_free_pages() added

From: KAMEZAWA Hiroyuki <[email protected]>

This commit introduces the alloc_contig_free_pages() function,
which allocates (i.e. removes from the buddy system) free pages
in a range. The caller has to guarantee that all pages in the
range are in the buddy system.

Along with alloc_contig_free_pages(), a free_contig_pages()
function is provided which frees pages (or a subset of them)
allocated with alloc_contig_free_pages().

I, Michal Nazarewicz, have modified the
alloc_contig_free_pages() function slightly from the original
version, mostly to make it easier to allocate pages that are not
MAX_ORDER aligned. This is done by making the function return
the pfn of the page one past the last one allocated, which may be
further than the caller requested.

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/page-isolation.h | 3 ++
mm/page_alloc.c | 42 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 58cdbac..f1417ed 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -32,6 +32,9 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
*/
extern int set_migratetype_isolate(struct page *page);
extern void unset_migratetype_isolate(struct page *page);
+extern unsigned long alloc_contig_freed_pages(unsigned long start,
+ unsigned long end, gfp_t flag);
+extern void free_contig_pages(struct page *page, int nr_pages);

/*
* For migration.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 07a6544..6dd2854 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5424,6 +5424,48 @@ out:
spin_unlock_irqrestore(&zone->lock, flags);
}

+unsigned long alloc_contig_freed_pages(unsigned long start,
+ unsigned long end, gfp_t flag)
+{
+ unsigned long pfn = start, count;
+ struct page *page;
+ struct zone *zone;
+ int order;
+
+ VM_BUG_ON(!pfn_valid(pfn));
+ page = pfn_to_page(pfn);
+
+ zone = page_zone(page);
+ spin_lock_irq(&zone->lock);
+ for (;;) {
+ VM_BUG_ON(page_count(page) || !PageBuddy(page));
+ list_del(&page->lru);
+ order = page_order(page);
+ zone->free_area[order].nr_free--;
+ rmv_page_order(page);
+ __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+ pfn += 1 << order;
+ if (pfn >= end)
+ break;
+ VM_BUG_ON(!pfn_valid(pfn));
+ page += 1 << order;
+ }
+ spin_unlock_irq(&zone->lock);
+
+ /* After this, pages in the range can be freed one by one */
+ page = pfn_to_page(start);
+ for (count = pfn - start; count; --count, ++page)
+ prep_new_page(page, 0, flag);
+
+ return pfn;
+}
+
+void free_contig_pages(struct page *page, int nr_pages)
+{
+ for (; nr_pages; --nr_pages, ++page)
+ __free_page(page);
+}
+
#ifdef CONFIG_MEMORY_HOTREMOVE
/*
* All pages in the range must be isolated before calling this.
--
1.7.2.3

2010-11-19 16:00:22

by Michal Nazarewicz

[permalink] [raw]
Subject: [RFCv6 04/13] mm: cma: Contiguous Memory Allocator added

The Contiguous Memory Allocator framework is a set of APIs for
allocating physically contiguous chunks of memory.

Various chips require contiguous blocks of memory to operate. Those
chips include devices such as cameras, hardware video decoders and
encoders, etc.

The code is highly modular and customisable to suit the needs of
various users. The set of regions reserved for CMA can be configured
per platform and it is easy to add custom allocator algorithms if the
need arises.

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
Documentation/00-INDEX | 2 +
Documentation/contiguous-memory.txt | 573 +++++++++++++++++++++
include/linux/cma.h | 488 ++++++++++++++++++
mm/Kconfig | 41 ++
mm/Makefile | 1 +
mm/cma.c | 933 +++++++++++++++++++++++++++++++++++
6 files changed, 2038 insertions(+), 0 deletions(-)
create mode 100644 Documentation/contiguous-memory.txt
create mode 100644 include/linux/cma.h
create mode 100644 mm/cma.c

diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index 8dfc670..f93e787 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -94,6 +94,8 @@ connector/
- docs on the netlink based userspace<->kernel space communication mod.
console/
- documentation on Linux console drivers.
+contiguous-memory.txt
+ - documentation on physically-contiguous memory allocation framework.
cpu-freq/
- info on CPU frequency and voltage scaling.
cpu-hotplug.txt
diff --git a/Documentation/contiguous-memory.txt b/Documentation/contiguous-memory.txt
new file mode 100644
index 0000000..f1715ba
--- /dev/null
+++ b/Documentation/contiguous-memory.txt
@@ -0,0 +1,573 @@
+ -*- org -*-
+
+* Contiguous Memory Allocator
+
+ The Contiguous Memory Allocator (CMA) is a framework, which allows
+ setting up a machine-specific configuration for physically-contiguous
+ memory management. Memory for devices is then allocated according
+ to that configuration.
+
+ The main role of the framework is not to allocate memory, but to
+ parse and manage memory configurations, as well as to act as an
+ in-between between device drivers and pluggable allocators. It is
+ thus not tied to any memory allocation method or strategy.
+
+** Why is it needed?
+
+ Various devices on embedded systems have no scatter-gather and/or
+ IO map support and as such require contiguous blocks of memory to
+ operate. They include devices such as cameras, hardware video
+ decoders and encoders, etc.
+
+ Such devices often require big memory buffers (a full HD frame is,
+ for instance, more than 2 megapixels large, i.e. more than 6 MB
+ of memory), which makes mechanisms such as kmalloc() ineffective.
+
+ Some embedded devices impose additional requirements on the
+ buffers, e.g. they can operate only on buffers allocated in
+ a particular location/memory bank (if the system has more than one
+ memory bank) or on buffers aligned to a particular memory boundary.
+
+ Development of embedded devices has seen a big rise recently
+ (especially in the V4L area) and many such drivers include their
+ own memory allocation code. Most of them use bootmem-based methods.
+ The CMA framework is an attempt to unify contiguous memory allocation
+ mechanisms and provide a simple API for device drivers, while
+ staying as customisable and modular as possible.
+
+** Design
+
+ The main design goal for the CMA was to provide a customisable and
+ modular framework, which could be configured to suit the needs of
+ individual systems. Configuration specifies a list of memory
+ regions, which then are assigned to devices. Memory regions can
+ be shared among many device drivers or assigned exclusively to
+ one. This has been achieved in the following ways:
+
+ 1. The core of the CMA does not handle allocation of memory and
+ management of free space. Dedicated allocators are used for
+ that purpose.
+
+ This way, if the provided solution does not match demands
+ imposed on a given system, one can develop a new algorithm and
+ easily plug it into the CMA framework.
+
+ 2. When requesting memory, devices have to introduce themselves.
+ This way CMA knows who the memory is allocated for. This
+ allows the system architect to specify which memory regions
+ each device should use.
+
+ 3. Memory regions are grouped in various "types". When device
+ requests a chunk of memory, it can specify what type of memory
+ it needs. If no type is specified, "common" is assumed.
+
+ This makes it possible to configure the system in such a way
+ that a single device may get memory from different memory
+ regions, depending on the "type" of memory it requested. For
+ example, a video codec driver might want to allocate some
+ shared buffers from the first memory bank and the other from
+ the second to get the highest possible memory throughput.
+
+ 4. For greater flexibility and extensibility, the framework allows
+ device drivers to register private regions of reserved memory
+ which then may be used only by them.
+
+ In effect, even if a driver does not use the rest of the CMA
+ interface, it can still use CMA allocators and other
+ mechanisms.
+
+ 4a. Early in the boot process, device drivers can also request the
+ CMA framework to reserve a region of memory for them
+ which will then be used as a private region.
+
+ This way, drivers do not need to directly call bootmem,
+ memblock or similar early allocator but merely register an
+ early region and the framework will handle the rest
+ including choosing the right early allocator.
+
+ 5. Even though a memory region is allocated, it can be moved around
+ unless the driver pins it. This makes it possible to develop
+ a defragmentation scheme which would move buffers around when
+ they are not used by the hardware at a given moment.
+
+** Use cases
+
+ Let's analyse an imaginary system that uses CMA to see how
+ the framework can be used and configured.
+
+
+ We have a platform with a hardware video decoder and a camera, each
+ needing 20 MiB of memory in the worst case. Our system is written
+ in such a way, though, that the two devices are never used at the
+ same time and memory for them may be shared. In such a system the
+ following configuration would be used in the platform
+ initialisation code:
+
+ static struct cma_region regions[] = {
+ CMA_REGION("region", 20 << 20, 0, 0),
+ { }
+ }
+ static const char map[] __initconst = "video,camera=region";
+
+ cma_set_defaults(regions, map);
+
+ The regions array defines a single 20-MiB region named "region".
+ The map says that drivers named "video" and "camera" are to be
+ granted memory from the previously defined region.
+
+ A shorter map can be used as well:
+
+ static const char map[] __initconst = "*=region";
+
+ The asterisk ("*") matches all devices thus all devices will use
+ the region named "region".
+
+ We can see that, because the devices share the same memory region,
+ we save 20 MiB compared to the situation where each of the devices
+ reserves 20 MiB of memory for itself.
+
+
+ Now, let's say that we also have many other smaller devices and we
+ want them to share a smaller pool of memory, for instance 5
+ MiB. This can be achieved in the following way:
+
+ static struct cma_region regions[] = {
+ CMA_REGION("region", 20 << 20, 0, 0),
+ CMA_REGION("common", 5 << 20, 0, 0),
+ { }
+ }
+ static const char map[] __initconst =
+ "video,camera=region;*=common";
+
+ cma_set_defaults(regions, map);
+
+ This instructs CMA to reserve two regions and let video and camera
+ use region "region" whereas all other devices should use region
+ "common".
+
+
+ Later on, after some development of the system, it can now run
+ the video decoder and the camera at the same time. The 20 MiB region is
+ no longer enough for the two to share. A quick fix can be made to
+ grant each of those devices a separate region:
+
+ static struct cma_region regions[] = {
+ CMA_REGION("v", 20 << 20, 0, 0),
+ CMA_REGION("c", 20 << 20, 0, 0),
+ CMA_REGION("common", 5 << 20, 0, 0),
+ { }
+ }
+ static const char map[] __initconst = "video=v;camera=c;*=common";
+
+ cma_set_defaults(regions, map);
+
+ This solution also shows how with CMA you can assign private pools
+ of memory to each device if that is required.
+
+
+ Allocation mechanisms can be replaced dynamically in a similar
+ manner as well. Let's say that during testing, it has been
+ discovered that, for a given shared region of 40 MiB,
+ fragmentation has become a problem. It has been observed that,
+ after some time, it becomes impossible to allocate buffers of the
+ required sizes. So to satisfy our requirements, we would have to
+ reserve a larger shared region beforehand.
+
+ But fortunately, you have also managed to develop a new allocation
+ algorithm -- Neat Allocation Algorithm or "na" for short -- which
+ satisfies the needs for both devices even on a 30 MiB region. The
+ configuration can be then quickly changed to:
+
+ static struct cma_region regions[] = {
+ CMA_REGION("region", 30 << 20, 0, 0, .alloc_name = "na"),
+ CMA_REGION("common", 5 << 20, 0, 0),
+ { }
+ }
+ static const char map[] __initconst = "video,camera=region;*=common";
+
+ cma_set_defaults(regions, map);
+
+ This shows how you can develop your own allocation algorithms if
+ the ones provided with CMA do not suit your needs, and easily
+ replace them without the need to modify the CMA core or even
+ recompile the kernel.
+
+** Technical Details
+
+*** The attributes
+
+ As shown above, CMA is configured by two attributes: a list of
+ regions and a map. The first one specifies the regions that are to be
+ reserved for CMA. The second one specifies which regions each
+ device is assigned to.
+
+**** Regions
+
+ The regions attribute is a list of regions terminated by a region
+ with size equal to zero. The following fields may be set:
+
+ - size -- size of the region (required, must not be zero)
+ - alignment -- alignment of the region; must be power of two or
+ zero (optional)
+ - start -- where the region has to start (optional)
+ - alloc_name -- the name of allocator to use (optional)
+ - alloc -- allocator to use (optional; besides,
+ alloc_name is probably what you want)
+
+ size, alignment and start are specified in bytes. size will be
+ aligned up to PAGE_SIZE. If alignment is less than PAGE_SIZE,
+ it will be set to PAGE_SIZE. start will be aligned to the
+ alignment.
+
+**** Map
+
+ The format of the "map" attribute is as follows:
+
+ map-attr ::= [ rules [ ';' ] ]
+ rules ::= rule [ ';' rules ]
+ rule ::= patterns '=' regions
+
+ patterns ::= pattern [ ',' patterns ]
+
+ regions ::= REG-NAME [ ',' regions ]
+ // list of regions to try to allocate memory
+ // from
+
+ pattern ::= dev-pattern [ '/' TYPE-NAME ] | '/' TYPE-NAME
+ // pattern that the request must match for the rule
+ // to apply; the first rule that matches is
+ // applied; if the dev-pattern part is omitted,
+ // a value identical to the one used in the previous
+ // pattern is assumed.
+
+ dev-pattern ::= PATTERN
+ // pattern that the device name must match for the
+ // rule to apply; it may contain question marks
+ // which match any single character and may end with an
+ // asterisk which matches the rest of the string
+ // (including nothing).
+
+ It is a sequence of rules which specify which regions a given
+ (device, type) pair should use. The first rule that matches is applied.
+
+ For a rule to match, the pattern must match the (dev, type) pair.
+ A pattern consists of the part before and the part after the slash.
+ The first part must match the device name and the second part must
+ match the type.
+
+ If the first part is empty, the device name is assumed to match
+ iff it matched in the previous pattern. If the second part is
+ omitted, it will match any type of memory requested by the device.
+
+ Some examples (whitespace added for better readability):
+
+ cma_map = foo/quaz = r1;
+ // device foo with type == "quaz" uses region r1
+
+ foo/* = r2; // OR:
+ /* = r2;
+ // device foo with any other type uses region r2
+
+ bar = r1,r2;
+ // device bar uses region r1 or r2
+
+ baz?/a , baz?/b = r3;
+ // devices named baz? where ? is any character
+ // with type being "a" or "b" use r3
+
+*** The device and types of memory
+
+ The name of the device is taken from the device structure. It is
+ not possible to use CMA if a driver does not register a device
+ (actually this can be overcome if a fake device structure is
+ provided with at least the name set).
+
+ The type of memory is an optional argument provided by the device
+ whenever it requests a memory chunk. In many cases this can be
+ ignored but sometimes it may be required for some devices.
+
+ For instance, let's say that there are two memory banks and for
+ performance reasons a device uses buffers in both of them.
+ The platform defines memory types "a" and "b" for regions in the two
+ banks. The device driver would then use those two types to
+ request memory chunks from different banks. The CMA attributes could
+ look as follows:
+
+ static struct cma_region regions[] = {
+ CMA_REGION("a", 32 << 20, 0, 0),
+ CMA_REGION("b", 32 << 20, 0, 512 << 20),
+ { }
+ }
+ static const char map[] __initconst = "foo/a=a;foo/b=b;*=a,b";
+
+ And whenever the driver allocates memory, it specifies the
+ type of memory:
+
+ buffer1 = cma_alloc(dev, "a", 1 << 20, 0);
+ buffer2 = cma_alloc(dev, "b", 1 << 20, 0);
+
+ If it were necessary to also try to allocate from the other bank when
+ the dedicated one is full, the map attribute could be changed to:
+
+ static const char map[] __initconst = "foo/a=a,b;foo/b=b,a;*=a,b";
+
+ On the other hand, if the same driver was used on a system with
+ only one bank, the configuration could be changed just to:
+
+ static struct cma_region regions[] = {
+ CMA_REGION("r", 64 << 20, 0, 0),
+ { }
+ }
+ static const char map[] __initconst = "*=r";
+
+ without the need to change the driver at all.
+
+*** Device API
+
+ There are three basic calls provided by the CMA framework to
+ devices. To allocate a chunk of memory, the cma_alloc() function needs
+ to be used:
+
+ const struct cma *
+ cma_alloc(const struct device *dev, const char *type,
+ size_t size, unsigned long alignment);
+
+ If required, the device may specify an alignment in bytes that the chunk
+ needs to satisfy. It has to be a power of two or zero. The
+ chunks are always aligned at least to a page.
+
+ The type specifies the type of memory as described in the
+ previous subsection. If the device driver does not care about the memory
+ type, it can safely pass NULL as the type, which is the same as
+ passing "common".
+
+ The basic usage of the function is just:
+
+ chunk = cma_alloc(dev, NULL, size, 0);
+
+ The function returns a pointer to an opaque structure (not really
+ opaque, since its definition is in the header, but from the device's point
+ of view it is opaque, i.e. the device must never touch its internals).
+ On error an error-pointer is returned, so the correct way of
+ checking for errors is:
+
+ const struct cma *chunk = cma_alloc(dev, NULL, size, 0);
+ if (IS_ERR(chunk))
+ /* Error */
+ return PTR_ERR(chunk);
+ /* Allocated */
+
+ (Make sure to include <linux/err.h> which contains the definition
+ of the IS_ERR() and PTR_ERR() macros.)
+
+
+ Allocated chunk is freed via a cma_free() function:
+
+ void cma_free(const struct cma *chunk);
+
+
+ To use the chunk, the device must first pin it with a call to the
+ cma_pin() function:
+
+ void cma_pin(const struct cma *chunk);
+
+ Once the chunk is pinned, its physical address may be queried with a
+ call to the cma_phys() function:
+
+ phys_addr_t cma_phys(const struct cma *chunk);
+
+ If device no longer needs the chunk to stay in the same place in
+ memory (but, obviously, requires its content not to be lost), it
+ should unpin the chunk with the call to cma_unpin():
+
+ void cma_unpin(const struct cma *chunk);
+
+ Unpinned chunks may be subject to defragmentation and can be
+ moved around by the allocator so as to join several small free areas
+ into one bigger one (you know what defragmentation is about).
+
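+ Putting these calls together, a driver's use of a chunk might look
+ roughly like this (error handling omitted; only a sketch):
+
+     const struct cma *chunk = cma_alloc(dev, NULL, size, 0);
+     phys_addr_t phys;
+
+     cma_pin(chunk);
+     phys = cma_phys(chunk);   /* valid only while the chunk is pinned */
+     /* ... program the hardware with phys and wait for it to finish ... */
+     cma_unpin(chunk);         /* the chunk may now be moved around */
+
+     cma_free(chunk);
+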
+
+ The last function is cma_info(), which returns information
+ about the regions assigned to a given (dev, type) pair. Its syntax is:
+
+ int cma_info(struct cma_info *info,
+ const struct device *dev,
+ const char *type);
+
+ On successful exit it fills the info structure with the lower and
+ upper bound of the regions, the total size and the number of regions
+ assigned to the given (dev, type) pair.
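+
+ For example, a driver could query and report the totals like this
+ (only a sketch):
+
+     struct cma_info info;
+
+     if (!cma_info(&info, dev, NULL))
+             pr_info("cma: %u region(s), %zu bytes total, %zu free\n",
+                     info.count, info.total_size, info.free_size);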
+
+**** Dynamic and private regions
+
+ In the basic setup, regions are provided and initialised by
+ platform initialisation code (which usually uses
+ cma_set_defaults() for that purpose).
+
+ It is, however, possible to create and add regions dynamically
+ using the cma_region_register() function.
+
+ int cma_region_register(struct cma_region *reg);
+
+ The region does not have to have a name. If it does not, it won't
+ be accessible via the standard mapping (the one provided with the map
+ attribute). Such regions are private, and to allocate a chunk from
+ them, one needs to call:
+
+ const struct cma *
+ cma_alloc_from_region(struct cma_region *reg,
+ size_t size, unsigned long alignment);
+
+ It is just like cma_alloc() except that one specifies which region to
+ allocate memory from. The region must have been registered.
+
+**** Allocating from region specified by name
+
+ If a driver prefers allocating from a region, or a list of regions,
+ whose names it knows, it can use a different call similar to the
+ previous one:
+
+ const struct cma *
+ cma_alloc_from(const char *regions,
+ size_t size, unsigned long alignment);
+
+ The first argument is a comma-separated list of regions the
+ driver desires CMA to try and allocate from. The list is
+ terminated by a NUL byte or a semicolon.
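+
+ For instance, to try the region named "a" first and fall back to "b"
+ (assuming both regions have been registered):
+
+     chunk = cma_alloc_from("a,b", 1 << 20, 0);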
+
+ Similarly, there is a call for requesting information about named
+ regions:
+
+ int cma_info_about(struct cma_info *info, const char *regions);
+
+ Generally, there should be no need to use those interfaces, but
+ they are provided nevertheless.
+
+**** Registering early regions
+
+ An early region is a region that is managed by CMA early during
+ the boot process. It is the platform's responsibility to reserve memory
+ for early regions. Later on, when CMA initialises, early regions
+ with reserved memory are registered as normal regions.
+ Registering an early region may be a way for a device to request
+ a private pool of memory without worrying about actually
+ reserving the memory:
+
+ int cma_early_region_register(struct cma_region *reg);
+
+ This needs to be done quite early in the boot process, before
+ the platform traverses the cma_early_regions list to reserve memory.
+
+ When the boot process ends, the device driver may check whether the region
+ was reserved (by checking the reg->reserved flag) and, if so, whether
+ it was successfully registered as a normal region (by checking
+ the reg->registered flag). If that is the case, the device driver
+ can use normal API calls to use the region.
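+
+ For illustration only (the function name below is made up, and the
+ exact early hook it is called from is platform specific), a driver
+ could register its private early region like this:
+
+     static struct cma_region foo_region = {
+             .size = 8 << 20,
+     };
+
+     int __init foo_early_init(void)
+     {
+             return cma_early_region_register(&foo_region);
+     }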
+
+*** Allocator operations
+
+ Creating an allocator for CMA requires four functions to be
+ implemented.
+
+
+ The first two are used to initialise an allocator for a given region
+ and clean up afterwards:
+
+ int cma_foo_init(struct cma_region *reg);
+ void cma_foo_cleanup(struct cma_region *reg);
+
+ The first is called when an allocator is attached to a region. When
+ the function is called, the cma_region structure is fully
+ initialised (i.e. the starting address and size have correct values).
+ As a matter of fact, the allocator should never modify the cma_region
+ structure other than the private_data field, which it may use to
+ point to its private data.
+
+ The second call cleans up and frees all resources the allocator
+ has allocated for the region. The function can assume that all
+ chunks allocated from this region have been freed and thus the whole
+ region is free.
+
+
+ Two other calls are used for allocating and freeing chunks. They
+ are:
+
+ struct cma *
+ cma_foo_alloc(struct cma_region *reg,
+ size_t size, unsigned long alignment);
+ void cma_foo_free(struct cma *chunk);
+
+ As the names imply, the first allocates a chunk and the other frees
+ a chunk of memory. The first one must also initialise the size and
+ phys fields of the returned structure; on error, it must return an
+ error-pointer.
+
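+ For illustration, a skeleton of the alloc callback could look roughly
+ as follows; cma_foo_find_free() is a made-up helper standing in for
+ the allocator's actual free-space search:
+
+     struct cma *cma_foo_alloc(struct cma_region *reg,
+                               size_t size, unsigned long alignment)
+     {
+             struct cma *chunk = kmalloc(sizeof *chunk, GFP_KERNEL);
+             phys_addr_t phys;
+
+             if (!chunk)
+                     return ERR_PTR(-ENOMEM);
+
+             /* find a suitable free range inside reg (allocator specific) */
+             phys = cma_foo_find_free(reg, size, alignment);
+             if (!phys) {
+                     kfree(chunk);
+                     return ERR_PTR(-ENOMEM);
+             }
+
+             chunk->phys = phys;
+             chunk->size = size;
+             return chunk;
+     }
+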
+
+ If an allocator supports pinning chunks, it needs to implement two
+ more functions:
+
+ void cma_foo_pin(struct cma *chunk);
+ void cma_foo_unpin(struct cma *chunk);
+
+ Among other things that depend on the allocator's internal pinning
+ implementation, the first function must also update the phys field
+ of the object pointed to by chunk.
+
+
+ Any of the above functions may assume that it is the only
+ thread accessing the region. Therefore, the allocator does not need
+ to worry about concurrency. Moreover, all arguments are
+ guaranteed to be valid (i.e. a page-aligned size and a power-of-two
+ alignment no lower than the page size).
+
+
+ When the allocator is ready, all that is left is to register it by
+ calling the cma_allocator_register() function:
+
+ int cma_allocator_register(struct cma_allocator *alloc);
+
+ The argument is a structure with pointers to the above functions
+ and the allocator's name. The whole call may look something like
+ this:
+
+ static struct cma_allocator alloc = {
+ .name = "foo",
+ .init = cma_foo_init,
+ .cleanup = cma_foo_cleanup,
+ .alloc = cma_foo_alloc,
+ .free = cma_foo_free,
+ .pin = cma_foo_pin, /* optional */
+ .unpin = cma_foo_unpin, /* optional */
+ };
+ return cma_allocator_register(&alloc);
+
+ The name ("foo") will be used when a this particular allocator is
+ requested as an allocator for given region.
+
+*** Integration with platform
+
+ There is one function that needs to be called from platform
+ initialisation code. That is the cma_early_regions_reserve()
+ function:
+
+ void cma_early_regions_reserve(int (*reserve)(struct cma_region *reg));
+
+ It traverses the list of all the early regions provided by the platform
+ and registered by drivers, and reserves memory for them. The only
+ argument is a callback function used to reserve each region.
+ Passing NULL as the argument is the same as passing the
+ cma_early_region_reserve() function, which uses bootmem or
+ memblock for allocating.
+
+ Alternatively, platform code could traverse the cma_early_regions
+ list by itself but this should never be necessary.
+
+
+ The platform also has a way of providing default attributes for CMA;
+ the cma_set_defaults() function is used for that purpose:
+
+ int cma_set_defaults(struct cma_region *regions, const char *map)
+
+ It needs to be called prior to reserving regions. It lets one
+ specify the list of regions defined by the platform and the map
+ attribute. The map may point to a string in __initdata. See
+ above in this document for example usage of this function.
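+
+ For example, reusing the regions and map arrays from the earlier
+ examples, a platform's early reserve hook (its name and exact place
+ in the boot sequence are platform specific; this is only a sketch)
+ might simply do:
+
+     void __init foo_machine_reserve(void)
+     {
+             cma_set_defaults(regions, map);
+             cma_early_regions_reserve(NULL);
+     }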
diff --git a/include/linux/cma.h b/include/linux/cma.h
new file mode 100644
index 0000000..a6031a7
--- /dev/null
+++ b/include/linux/cma.h
@@ -0,0 +1,488 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator framework
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ */
+
+/*
+ * See Documentation/contiguous-memory.txt for details.
+ */
+
+/***************************** Kernel level API *****************************/
+
+#if defined __KERNEL__ && defined CONFIG_CMA
+
+#include <linux/rbtree.h>
+#include <linux/list.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/err.h>
+
+
+struct device;
+struct cma_info;
+
+/**
+ * struct cma - an allocated contiguous chunk of memory.
+ * @phys: Chunk's physical address in bytes.
+ * @size: Chunk's size in bytes.
+ * @pinned: Number of times chunk has been pinned.
+ * @reg: Region this chunk belongs to.
+ *
+ * Fields of this structure should never be accessed directly by
+ * anything other than CMA core and allocators.
+ *
+ * Normal code should use cma_pin(), cma_unpin(), cma_phys(),
+ * cma_size() and cma_free() functions when dealing with struct cma.
+ *
+ * The allocator must fill the @size and @phys fields when a chunk is
+ * created. If the allocator supports pinning, @phys may be initialised as
+ * zero and updated by the pin operation; unpin may then set it back to
+ * zero.
+ */
+struct cma {
+ phys_addr_t phys;
+ size_t size;
+ unsigned pinned;
+ struct cma_region *reg;
+};
+
+/*
+ * Don't call it directly, use cma_alloc(), cma_alloc_from() or
+ * cma_alloc_from_region().
+ */
+const struct cma *__must_check
+__cma_alloc(const struct device *dev, const char *type,
+ size_t size, unsigned long alignment);
+
+/* Don't call it directly, use cma_info() or cma_info_about(). */
+int
+__cma_info(struct cma_info *info, const struct device *dev, const char *type);
+
+
+/**
+ * cma_alloc() - allocates contiguous chunk of memory.
+ * @dev: The device to perform allocation for.
+ * @type: A type of memory to allocate. Platform may define
+ * several different types of memory and device drivers
+ * can then request chunks of different types. Usually it's
+ * safe to pass NULL here which is the same as passing
+ * "common".
+ * @size: Size of the memory to allocate in bytes.
+ * @alignment: Desired alignment in bytes. Must be a power of two or
+ * zero. If alignment is less than a page size it will be
+ * set to the page size. If unsure, pass zero here.
+ *
+ * On error returns a pointer-error. Otherwise struct cma is returned
+ * which can be used with other CMA functions.
+ */
+static inline const struct cma *__must_check
+cma_alloc(const struct device *dev, const char *type,
+ size_t size, unsigned long alignment)
+{
+ return dev ? __cma_alloc(dev, type, size, alignment) : ERR_PTR(-EINVAL);
+}
+
+/**
+ * cma_free() - frees a chunk of memory.
+ * @chunk: Chunk to free. This must be a structure returned by
+ * cma_alloc() (or family). This may be NULL.
+ */
+void cma_free(const struct cma *chunk);
+
+/**
+ * cma_pin() - pins a chunk of memory.
+ * @chunk: Chunk to pin.
+ *
+ * Pinned chunk is one that cannot move in memory. Device drivers
+ * must pin chunk before they start using it. If chunk is unpinned it
+ * can be subject to memory defragmentation which in effect means that
+ * the chunk will change its address.
+ *
+ * In particular, if a device driver unpins a memory chunk, it must assume
+ * that the previously used memory address is no longer valid.
+ *
+ * To unpin a chunk, the driver shall use the cma_unpin() function.
+ *
+ * Chunk may be pinned several times. Each call to cma_pin() must be
+ * paired with a call to cma_unpin() and only the last one will really
+ * unpin the chunk.
+ *
+ * Returns chunk's physical address.
+ */
+phys_addr_t cma_pin(const struct cma *chunk);
+
+/**
+ * cma_unpin() - unpins a chunk of memory.
+ * @chunk: Chunk to unpin.
+ *
+ * See cma_pin().
+ */
+void cma_unpin(const struct cma *chunk);
+
+/**
+ * cma_phys() - returns chunk's physical address in bytes.
+ * @chunk: Chunk to query information about.
+ *
+ * The chunk must be pinned.
+ */
+static inline phys_addr_t cma_phys(const struct cma *chunk) {
+#ifdef CONFIG_CMA_DEBUG
+ WARN_ON(!chunk->pinned);
+#endif
+ return chunk->phys;
+}
+
+/**
+ * cma_size() - returns chunk's size in bytes.
+ * @chunk: Chunk to query information about.
+ */
+static inline size_t cma_size(const struct cma *chunk) {
+ return chunk->size;
+}
+
+/**
+ * struct cma_info - information about regions returned by cma_info().
+ * @lower_bound: The smallest address that is possible to be
+ * allocated for given (dev, type) pair.
+ * @upper_bound: The one byte after the biggest address that is
+ * possible to be allocated for given (dev, type)
+ * pair.
+ * @total_size: Total size of regions mapped to (dev, type) pair.
+ * @free_size: Total free size in all of the regions mapped to (dev, type)
+ * pair. Because of possible race conditions, it is not
+ * guaranteed that the value will be correct -- it gives only
+ * an approximation.
+ * @count: Number of regions mapped to (dev, type) pair.
+ */
+struct cma_info {
+ phys_addr_t lower_bound, upper_bound;
+ size_t total_size, free_size;
+ unsigned count;
+};
+
+/**
+ * cma_info - queries information about regions.
+ * @info: Pointer to a structure where to save the information.
+ * @dev: The device to query information for.
+ * @type: A type of memory to query information for.
+ * If unsure, pass NULL here which is equal to passing
+ * "common".
+ *
+ * On error returns a negative error, zero otherwise.
+ */
+static inline int
+cma_info(struct cma_info *info, const struct device *dev, const char *type)
+{
+ return dev ? __cma_info(info, dev, type) : -EINVAL;
+}
+
+
+/****************************** Lower lever API *****************************/
+
+/**
+ * cma_alloc_from - allocates contiguous chunk of memory from named regions.
+ * @regions: Comma separated list of region names. Terminated by NUL
+ * byte or a semicolon.
+ * @size: Size of the memory to allocate in bytes.
+ * @alignment: Desired alignment in bytes. Must be a power of two or
+ * zero. If alignment is less than a page size it will be
+ * set to the page size. If unsure, pass zero here.
+ *
+ * On error returns a pointer-error. Otherwise struct cma is returned
+ * holding information about allocated chunk.
+ */
+static inline const struct cma *__must_check
+cma_alloc_from(const char *regions, size_t size, unsigned long alignment)
+{
+ return __cma_alloc(NULL, regions, size, alignment);
+}
+
+/**
+ * cma_info_about - queries information about named regions.
+ * @info: Pointer to a structure where to save the information.
+ * @regions: Comma separated list of region names. Terminated by NUL
+ * byte or a semicolon.
+ *
+ * On error returns a negative error, zero otherwise.
+ */
+static inline int
+cma_info_about(struct cma_info *info, const char *regions)
+{
+ return __cma_info(info, NULL, regions);
+}
+
+struct cma_allocator;
+
+/**
+ * struct cma_region - a region reserved for CMA allocations.
+ * @name: Unique name of the region. Read only.
+ * @start: physical address of the region in bytes.
+ * @size: size of the region in bytes.
+ * @free_space: Free space in the region. Read only.
+ * @alignment: Desired alignment of the region in bytes. A power of two,
+ * always at least page size. Early.
+ * @alloc: Allocator used with this region. On error an error-pointer
+ * should be returned. Private.
+ * @alloc_name: Allocator name read from cmdline. Private. This may be
+ * different from @alloc->name.
+ * @private_data: Allocator's private data.
+ * @users: Number of chunks allocated in this region.
+ * @list: Entry in list of regions. Private.
+ * @used: Whether the region was already used, i.e. there was at least
+ * one allocation request for it. Private.
+ * @registered: Whether this region has been registered. Read only.
+ * @reserved: Whether this region has been reserved. Early. Read only.
+ * @copy_name: Whether @name and @alloc_name needs to be copied when
+ * this region is converted from early to normal. Early.
+ * Private.
+ * @free_alloc_name: Whether @alloc_name was kmalloced(). Private.
+ * @use_isolate: Whether to use MIGRATE_CMA. Private.
+ *
+ * Regions come in two types: an early region and normal region. The
+ * former can be reserved or not-reserved. Fields marked as "early"
+ * are only meaningful in early regions.
+ *
+ * Early regions are important only during initialisation. The list
+ * of early regions is built from the "cma" command line argument or
+ * platform defaults. Platform initialisation code is responsible for
+ * reserving space for unreserved regions that are placed on
+ * cma_early_regions list.
+ *
+ * Later, during CMA initialisation all reserved regions from the
+ * cma_early_regions list are registered as normal regions and can be
+ * used using standard mechanisms.
+ */
+struct cma_region {
+ const char *name;
+ phys_addr_t start;
+ size_t size;
+ union {
+ size_t free_space; /* Normal region */
+ unsigned long alignment; /* Early region */
+ };
+
+ struct cma_allocator *alloc;
+ const char *alloc_name;
+ void *private_data;
+
+#ifdef CONFIG_CMA_USE_MIGRATE_CMA
+ unsigned short *isolation_map;
+#endif
+
+ unsigned users;
+ struct list_head list;
+
+ unsigned used:1;
+ unsigned registered:1;
+ unsigned reserved:1;
+ unsigned copy_name:1;
+ unsigned free_alloc_name:1;
+ unsigned use_isolate:1;
+};
+
+/**
+ * CMA_REGION() - helper macro for defining struct cma_region objects.
+ * @name: name of the structure.
+ * @_size: size of the structure in bytes.
+ * @_alignment: desired alignment of the region in bytes, must be power
+ * of two or zero.
+ * @_start: desired starting address of the region, may be zero.
+ * @rest: any additional initializers.
+ */
+#define CMA_REGION(name, _size, _alignment, _start, rest...) { \
+ (name), \
+ .start = (_start), \
+ .size = (_size), \
+ { .alignment = (_alignment) }, \
+ rest \
+ }
+
+/**
+ * cma_region_register() - registers a region.
+ * @reg: Region to register.
+ *
+ * Region's start and size must be set.
+ *
+ * If name is set the region will be accessible using normal mechanism
+ * like mapping or cma_alloc_from() function otherwise it will be
+ * a private region and accessible only using the
+ * cma_alloc_from_region() function.
+ *
+ * If alloc is set, the function will try to initialise the given allocator
+ * (and will return an error if it fails). Otherwise alloc_name may
+ * point to the name of an allocator to use (if not set, the default
+ * will be used).
+ *
+ * All other fields are ignored and/or overwritten.
+ *
+ * Returns zero or a negative error. In particular, -EADDRINUSE if
+ * the region overlaps with an already existing region.
+ */
+int __must_check cma_region_register(struct cma_region *reg);
+
+/**
+ * cma_alloc_from_region() - allocates contiguous chunk of memory from region.
+ * @reg: Region to allocate chunk from.
+ * @size: Size of the memory to allocate in bytes.
+ * @alignment: Desired alignment in bytes. Must be a power of two or
+ * zero. If alignment is less than a page size it will be
+ * set to the page size. If unsure, pass zero here.
+ *
+ * On error returns a pointer-error. Otherwise struct cma is returned
+ * holding information about allocated chunk.
+ */
+const struct cma *__must_check
+cma_alloc_from_region(struct cma_region *reg,
+ size_t size, unsigned long alignment);
+
+
+
+/****************************** Allocators API ******************************/
+
+/**
+ * struct cma_allocator - a CMA allocator.
+ * @name: Allocator's unique name
+ * @init: Initialises an allocator on given region.
+ * @cleanup: Cleans up after init. May assume that there are no chunks
+ * allocated in given region.
+ * @alloc: Allocates a chunk of memory of given size in bytes and
+ * with given alignment. Alignment is a power of
+ * two (thus non-zero) and callback does not need to check it.
+ * May also assume that it is the only call that uses given
+ * region (ie. access to the region is synchronised with
+ * a mutex). This has to allocate the chunk object (it may be
+ * embeded in a bigger structure with allocator-specific data.
+ * Required.
+ * @free: Frees allocated chunk. May also assume that it is the only
+ * call that uses given region. This has to free() the chunk
+ * object as well. Required.
+ * @pin: Pins chunk. Optional.
+ * @unpin: Unpins chunk. Optional.
+ * @list: Entry in list of allocators. Private.
+ *
+ * The allocator has to initialise the size field of struct cma in alloc
+ * and correctly manage its phys field. The size field may be more
+ * than requested in the alloc call. If the allocator supports pinning, alloc
+ * may initialise phys to zero but it then has to be updated when pin
+ * is called.
+ */
+struct cma_allocator {
+ const char *name;
+
+ int (*init)(struct cma_region *reg);
+ void (*cleanup)(struct cma_region *reg);
+ struct cma *(*alloc)(struct cma_region *reg, size_t size,
+ unsigned long alignment);
+ void (*free)(struct cma *chunk);
+ void (*pin)(struct cma *chunk);
+ void (*unpin)(struct cma *chunk);
+
+ struct list_head list;
+};
+
+/**
+ * cma_allocator_register() - Registers an allocator.
+ * @alloc: Allocator to register.
+ *
+ * Adds allocator to the list of allocators managed by CMA.
+ *
+ * All of the fields of the cma_allocator structure must be set except for
+ * the optional name and the list head, which will be overridden
+ * anyway.
+ *
+ * Returns zero or negative error code.
+ */
+int cma_allocator_register(struct cma_allocator *alloc);
+
+
+/**************************** Initialisation API ****************************/
+
+/**
+ * cma_set_defaults() - specifies default command line parameters.
+ * @regions: An array of early regions terminated by a zero-sized
+ * entry. This array must not be placed in the __initdata section.
+ * @map: Map attribute.
+ *
+ * This function should be called prior to cma_early_regions_reserve()
+ * and after early parameters have been parsed.
+ *
+ * Returns zero or negative error.
+ */
+int __init cma_set_defaults(struct cma_region *regions, const char *map);
+
+/**
+ * cma_early_regions - a list of early regions.
+ *
+ * The platform needs to reserve space for each of the regions before
+ * initcalls are executed. If space is reserved, the reserved flag
+ * must be set. Platform initialisation code may choose to use
+ * cma_early_regions_reserve().
+ *
+ * Later, during CMA initialisation all reserved regions from the
+ * cma_early_regions list are registered as normal regions and can be
+ * used using standard mechanisms.
+ */
+extern struct list_head cma_early_regions __initdata;
+
+/**
+ * cma_early_region_register() - registers an early region.
+ * @reg: Region to add.
+ *
+ * The region's size, start and alignment must be set (however the
+ * last two can be zero). If name is set, the region will be
+ * accessible using normal mechanisms such as mapping or the
+ * cma_alloc_from() function; otherwise it will be a private region
+ * accessible only using cma_alloc_from_region().
+ *
+ * During platform initialisation, space is reserved for early
+ * regions. Later, when CMA initialises, the early regions are
+ * "converted" into normal regions. If cma_region::alloc is set, CMA
+ * will then try to set up the given allocator on the region. Failure
+ * to do so will result in the region not being registered even though
+ * the space for it will still be reserved. If cma_region::alloc is
+ * not set, an allocator will be attached to the region on first use,
+ * and the value of cma_region::alloc_name will be taken into account
+ * if set.
+ *
+ * All other fields are ignored and/or overwritten.
+ *
+ * Returns zero or a negative error. No check for overlapping regions
+ * is performed.
+ */
+int __init __must_check cma_early_region_register(struct cma_region *reg);
+
+/**
+ * cma_early_region_reserve() - reserves a physically contiguous memory region.
+ * @reg: Early region to reserve memory for.
+ *
+ * If the platform supports bootmem, this is the first allocator this
+ * function tries to use. If that fails (or bootmem is not
+ * supported), the function tries to use memblock if it is available.
+ *
+ * On success sets reg->reserved flag.
+ *
+ * Returns zero or negative error.
+ */
+int __init cma_early_region_reserve(struct cma_region *reg);
+
+/**
+ * cma_early_regions_reserve() - helper function for reserving early regions.
+ * @reserve: Callback function used to reserve space for a region.
+ * Needs to return non-negative if the reservation succeeded,
+ * a negative error otherwise. NULL means
+ * cma_early_region_reserve() will be used.
+ *
+ * This function traverses the %cma_early_regions list and tries to
+ * reserve memory for each early region. It uses the @reserve
+ * callback function for that purpose. The reserved flag of each
+ * region is updated accordingly.
+ */
+void __init cma_early_regions_reserve(int (*reserve)(struct cma_region *reg));
+
+#endif
+
+#endif
diff --git a/mm/Kconfig b/mm/Kconfig
index b911ad3..c7eb1bc 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -331,3 +331,44 @@ config CLEANCACHE
in a negligible performance hit.

If unsure, say Y to enable cleancache
+
+config CMA
+ bool "Contiguous Memory Allocator framework"
+ # Currently there is only one allocator so force it on
+ select CMA_GENERIC_ALLOCATOR
+ help
+ This enables the Contiguous Memory Allocator framework which
+ allows drivers to allocate big physically-contiguous blocks of
+ memory for use with hardware components that support neither I/O
+ mapping nor scatter-gather.
+
+ If you select this option you will also have to select at least
+ one allocator algorithm below.
+
+ To make use of CMA you need to specify the regions and
+ driver->region mapping on command line when booting the kernel.
+
+ For more information see <Documentation/contiguous-memory.txt>.
+ If unsure, say "n".
+
+config CMA_DEBUG
+ bool "CMA debug messages (DEVELOPEMENT)"
+ depends on CMA
+ help
+ Turns on debug messages in CMA. This produces KERN_DEBUG
+ messages for every CMA call as well as various messages while
+ processing calls such as cma_alloc(). This option does not
+ affect warning and error messages.
+
+ This is mostly used during development. If unsure, say "n".
+
+config CMA_GENERIC_ALLOCATOR
+ bool "CMA generic allocator"
+ depends on CMA
+ select GENERIC_ALLOCATOR
+ help
+ This is an allocator that uses a generic allocator API provided
+ by kernel. The generic allocator can use either of two
+ implementations: the first-fit, bitmap-based algorithm or
+ a best-fit, red-black tree-based algorithm. The algorithm can
+ be changed under "Library routines".
diff --git a/mm/Makefile b/mm/Makefile
index 0b08d1c..c6a84f1 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -43,3 +43,4 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
obj-$(CONFIG_CLEANCACHE) += cleancache.o
+obj-$(CONFIG_CMA) += cma.o
diff --git a/mm/cma.c b/mm/cma.c
new file mode 100644
index 0000000..17276b3
--- /dev/null
+++ b/mm/cma.c
@@ -0,0 +1,933 @@
+/*
+ * Contiguous Memory Allocator framework
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ */
+
+/*
+ * See Documentation/contiguous-memory.txt for details.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+# define DEBUG
+#endif
+
+#ifndef CONFIG_NO_BOOTMEM
+# include <linux/bootmem.h> /* alloc_bootmem_pages_nopanic() */
+#endif
+#ifdef CONFIG_HAVE_MEMBLOCK
+# include <linux/memblock.h> /* memblock*() */
+#endif
+#include <linux/device.h> /* struct device, dev_name() */
+#include <linux/errno.h> /* Error numbers */
+#include <linux/err.h> /* IS_ERR, PTR_ERR, etc. */
+#include <linux/mm.h> /* PAGE_ALIGN() */
+#include <linux/module.h> /* EXPORT_SYMBOL_GPL() */
+#include <linux/mutex.h> /* mutex */
+#include <linux/slab.h> /* kmalloc() */
+#include <linux/string.h> /* str*() */
+#include <linux/genalloc.h> /* gen_pool_*() */
+
+#include <linux/cma.h>
+
+
+/* Protects cma_regions, cma_allocators, cma_map and cma_map_length. */
+static DEFINE_MUTEX(cma_mutex);
+
+
+/************************* Map attribute *************************/
+
+static const char *cma_map;
+static size_t cma_map_length;
+
+/*
+ * map-attr ::= [ rules [ ';' ] ]
+ * rules ::= rule [ ';' rules ]
+ * rule ::= patterns '=' regions
+ * patterns ::= pattern [ ',' patterns ]
+ * regions ::= REG-NAME [ ',' regions ]
+ * pattern ::= dev-pattern [ '/' TYPE-NAME ] | '/' TYPE-NAME
+ *
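+ * An example map attribute (with made-up device and region names)
+ * could look like: "video,camera=multimedia;*=general".
+ *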
+ * See Documentation/contiguous-memory.txt for details.
+ */
+static ssize_t cma_map_validate(const char *param)
+{
+ const char *ch = param;
+
+ if (*ch == '\0' || *ch == '\n')
+ return 0;
+
+ for (;;) {
+ const char *start = ch;
+
+ while (*ch && *ch != '\n' && *ch != ';' && *ch != '=')
+ ++ch;
+
+ if (*ch != '=' || start == ch) {
+ pr_err("map: expecting \"<patterns>=<regions>\" near %s\n",
+ start);
+ return -EINVAL;
+ }
+
+ while (*++ch != ';')
+ if (*ch == '\0' || *ch == '\n')
+ return ch - param;
+ if (ch[1] == '\0' || ch[1] == '\n')
+ return ch - param;
+ ++ch;
+ }
+}
+
+static int __init cma_map_param(char *param)
+{
+ ssize_t len;
+
+ pr_debug("param: map: %s\n", param);
+
+ len = cma_map_validate(param);
+ if (len < 0)
+ return len;
+
+ cma_map = param;
+ cma_map_length = len;
+ return 0;
+}
+
+
+/************************* Early regions *************************/
+
+struct list_head cma_early_regions __initdata =
+ LIST_HEAD_INIT(cma_early_regions);
+
+
+int __init __must_check cma_early_region_register(struct cma_region *reg)
+{
+ unsigned long alignment;
+ phys_addr_t start;
+ size_t size;
+
+ if (reg->alignment & (reg->alignment - 1))
+ return -EINVAL;
+
+ alignment = max(reg->alignment, (unsigned long)PAGE_SIZE);
+ start = ALIGN(reg->start, alignment);
+ size = PAGE_ALIGN(reg->size);
+
+ if (start + size < start)
+ return -EINVAL;
+
+ reg->size = size;
+ reg->start = start;
+ reg->alignment = alignment;
+
+ list_add_tail(&reg->list, &cma_early_regions);
+
+ pr_debug("param: registering early region %s (%p@%p/%p)\n",
+ reg->name, (void *)reg->size, (void *)reg->start,
+ (void *)reg->alignment);
+
+ return 0;
+}
+
+
+/************************* Regions & Allocators *************************/
+
+static int __cma_region_attach_alloc(struct cma_region *reg);
+
+/* List of all regions. Named regions are kept before unnamed. */
+static LIST_HEAD(cma_regions);
+
+#define cma_foreach_region(reg) \
+ list_for_each_entry(reg, &cma_regions, list)
+
+int __must_check cma_region_register(struct cma_region *reg)
+{
+ const char *name, *alloc_name;
+ struct cma_region *r;
+ char *ch = NULL;
+ int ret = 0;
+
+ if (!reg->size || reg->start + reg->size < reg->start)
+ return -EINVAL;
+
+ reg->users = 0;
+ reg->used = 0;
+ reg->private_data = NULL;
+ reg->registered = 0;
+ reg->free_space = reg->size;
+
+ /* Copy name and alloc_name */
+ name = reg->name;
+ alloc_name = reg->alloc_name;
+ if (reg->copy_name && (reg->name || reg->alloc_name)) {
+ size_t name_size, alloc_size;
+
+ name_size = reg->name ? strlen(reg->name) + 1 : 0;
+ alloc_size = reg->alloc_name ? strlen(reg->alloc_name) + 1 : 0;
+
+ ch = kmalloc(name_size + alloc_size, GFP_KERNEL);
+ if (!ch) {
+ pr_err("%s: not enough memory to allocate name\n",
+ reg->name ?: "(private)");
+ return -ENOMEM;
+ }
+
+ if (name_size) {
+ memcpy(ch, reg->name, name_size);
+ name = ch;
+ ch += name_size;
+ }
+
+ if (alloc_size) {
+ memcpy(ch, reg->alloc_name, alloc_size);
+ alloc_name = ch;
+ }
+ }
+
+ mutex_lock(&cma_mutex);
+
+ /* Don't let regions overlap */
+ cma_foreach_region(r)
+ if (r->start + r->size > reg->start &&
+ r->start < reg->start + reg->size) {
+ ret = -EADDRINUSE;
+ goto done;
+ }
+
+ if (reg->alloc) {
+ ret = __cma_region_attach_alloc(reg);
+ if (unlikely(ret < 0))
+ goto done;
+ }
+
+ reg->name = name;
+ reg->alloc_name = alloc_name;
+ reg->registered = 1;
+ ch = NULL;
+
+ /*
+ * Keep named at the beginning and unnamed (private) at the
+ * end. This helps in traversal when named region is looked
+ * for.
+ */
+ if (name)
+ list_add(&reg->list, &cma_regions);
+ else
+ list_add_tail(&reg->list, &cma_regions);
+
+done:
+ mutex_unlock(&cma_mutex);
+
+ pr_debug("%s: region %sregistered\n",
+ reg->name ?: "(private)", ret ? "not " : "");
+ kfree(ch);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(cma_region_register);
+
+static struct cma_region *__must_check
+__cma_region_find(const char **namep)
+{
+ struct cma_region *reg;
+ const char *ch, *name;
+ size_t n;
+
+ ch = *namep;
+ while (*ch && *ch != ',' && *ch != ';')
+ ++ch;
+ name = *namep;
+ *namep = *ch == ',' ? ch + 1 : ch;
+ n = ch - name;
+
+ /*
+ * Named regions are kept in front of unnamed so if we
+ * encounter unnamed region we can stop.
+ */
+ cma_foreach_region(reg)
+ if (!reg->name)
+ break;
+ else if (!strncmp(name, reg->name, n) && !reg->name[n])
+ return reg;
+
+ return NULL;
+}
+
+/* List of all allocators. */
+static LIST_HEAD(cma_allocators);
+
+#define cma_foreach_allocator(alloc) \
+ list_for_each_entry(alloc, &cma_allocators, list)
+
+int cma_allocator_register(struct cma_allocator *alloc)
+{
+ struct cma_region *reg;
+ int first;
+
+ if (!alloc->alloc || !alloc->free)
+ return -EINVAL;
+
+ mutex_lock(&cma_mutex);
+
+ first = list_empty(&cma_allocators);
+
+ list_add_tail(&alloc->list, &cma_allocators);
+
+ /*
+ * Attach this allocator to all allocator-less regions that
+ * request this particular allocator (reg->alloc_name equals
+ * alloc->name) or if region wants the first available
+ * allocator and we are the first.
+ */
+ cma_foreach_region(reg) {
+ if (reg->alloc)
+ continue;
+ if (!(reg->alloc_name
+ ? alloc->name && !strcmp(alloc->name, reg->alloc_name)
+ : (!reg->used && first)))
+ continue;
+
+ reg->alloc = alloc;
+ __cma_region_attach_alloc(reg);
+ }
+
+ mutex_unlock(&cma_mutex);
+
+ pr_debug("%s: allocator registered\n", alloc->name ?: "(unnamed)");
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(cma_allocator_register);
+
+static struct cma_allocator *__must_check
+__cma_allocator_find(const char *name)
+{
+ struct cma_allocator *alloc;
+
+ if (!name)
+ return list_empty(&cma_allocators)
+ ? NULL
+ : list_entry(cma_allocators.next,
+ struct cma_allocator, list);
+
+ cma_foreach_allocator(alloc)
+ if (alloc->name && !strcmp(name, alloc->name))
+ return alloc;
+
+ return NULL;
+}
+
+
+/************************* Initialise CMA *************************/
+
+int __init cma_set_defaults(struct cma_region *regions, const char *map)
+{
+ if (map) {
+ int ret = cma_map_param((char *)map);
+ if (unlikely(ret < 0))
+ return ret;
+ }
+
+ if (!regions)
+ return 0;
+
+ for (; regions->size; ++regions) {
+ int ret = cma_early_region_register(regions);
+ if (unlikely(ret < 0))
+ return ret;
+ }
+
+ return 0;
+}
+
+static int __init
+__cma_early_reserve(struct cma_region *reg)
+{
+ bool tried = false;
+
+#ifndef CONFIG_NO_BOOTMEM
+
+ tried = true;
+
+ {
+ void *ptr = __alloc_bootmem_nopanic(reg->size, reg->alignment,
+ reg->start);
+ if (ptr) {
+ reg->start = virt_to_phys(ptr);
+ return 0;
+ }
+ }
+
+#endif
+
+#ifdef CONFIG_HAVE_MEMBLOCK
+
+ tried = true;
+
+ if (reg->start) {
+ if (!memblock_is_region_reserved(reg->start, reg->size) &&
+ memblock_reserve(reg->start, reg->size) >= 0)
+ return 0;
+ } else {
+ /*
+ * Use __memblock_alloc_base() since
+ * memblock_alloc_base() panic()s.
+ */
+ u64 ret = __memblock_alloc_base(reg->size, reg->alignment, 0);
+ if (ret && ret + reg->size < ~(phys_addr_t)0) {
+ reg->start = ret;
+ return 0;
+ }
+
+ if (ret)
+ memblock_free(ret, reg->size);
+ }
+
+#endif
+
+ return tried ? -ENOMEM : -EOPNOTSUPP;
+}
+
+int __init cma_early_region_reserve(struct cma_region *reg)
+{
+ int ret;
+
+ pr_debug("%s\n", __func__);
+
+ if (!reg->size || (reg->alignment & (reg->alignment - 1)) ||
+ reg->reserved)
+ return -EINVAL;
+
+ ret = __cma_early_reserve(reg);
+ if (!ret)
+ reg->reserved = 1;
+ return ret;
+}
+
+void __init cma_early_regions_reserve(int (*reserve)(struct cma_region *reg))
+{
+ struct cma_region *reg;
+
+ pr_debug("init: reserving early regions\n");
+
+ if (!reserve)
+ reserve = cma_early_region_reserve;
+
+ list_for_each_entry(reg, &cma_early_regions, list) {
+ if (reg->reserved) {
+ /* nothing */
+ } else if (reserve(reg) >= 0) {
+ pr_debug("init: %s: reserved %p@%p\n",
+ reg->name ?: "(private)",
+ (void *)reg->size,
+ (void *)reg->start);
+ reg->reserved = 1;
+ } else {
+ pr_warn("init: %s: unable to reserve %p@%p/%p\n",
+ reg->name ?: "(private)",
+ (void *)reg->size,
+ (void *)reg->start,
+ (void *)reg->alignment);
+ }
+ }
+}
+
+static int __init cma_init(void)
+{
+ struct cma_region *reg, *n;
+
+ pr_debug("init: initialising\n");
+
+ if (cma_map) {
+ char *val = kmemdup(cma_map, cma_map_length + 1, GFP_KERNEL);
+ cma_map = val;
+ if (!val)
+ return -ENOMEM;
+ val[cma_map_length] = '\0';
+ }
+
+ list_for_each_entry_safe(reg, n, &cma_early_regions, list) {
+ INIT_LIST_HEAD(&reg->list);
+ /*
+ * We don't care if there was an error. It's a pity
+ * but there's not much we can do about it anyway.
+ * If the error is on a region that was parsed from
+ * the command line then it will stay and waste a bit
+ * of space; if it was registered using
+ * cma_early_region_register() it is the caller's
+ * responsibility to do something about it.
+ */
+ if (reg->reserved && cma_region_register(reg) < 0)
+ /* ignore error */;
+ }
+
+ INIT_LIST_HEAD(&cma_early_regions);
+
+ return 0;
+}
+/*
+ * We want to be initialised earlier than module_init/__initcall so
+ * that drivers that want to grab memory at boot time will get CMA
+ * ready. subsys_initcall() seems early enough and not too early at
+ * the same time.
+ */
+subsys_initcall(cma_init);
+
+
+/************************* The Device API *************************/
+
+static const char *__must_check
+__cma_where_from(const struct device *dev, const char *type);
+
+/* Allocate. */
+static const struct cma *__must_check
+__cma_alloc_from_region(struct cma_region *reg,
+ size_t size, unsigned long alignment)
+{
+ struct cma *chunk;
+
+ pr_debug("allocate %p/%p from %s\n",
+ (void *)size, (void *)alignment,
+ reg ? reg->name ?: "(private)" : "(null)");
+
+ if (!reg || reg->free_space < size)
+ return ERR_PTR(-ENOSPC);
+
+ if (!reg->alloc) {
+ if (!reg->used)
+ __cma_region_attach_alloc(reg);
+ if (!reg->alloc)
+ return ERR_PTR(-ENOMEM);
+ }
+
+ chunk = reg->alloc->alloc(reg, size, alignment);
+ if (IS_ERR_OR_NULL(chunk))
+ return chunk ? ERR_CAST(chunk) : ERR_PTR(-ENOMEM);
+
+ chunk->pinned = 0;
+ chunk->reg = reg;
+ ++reg->users;
+ reg->free_space -= chunk->size;
+ pr_debug("allocated (at %p)\n", (void *)chunk->phys);
+ return chunk;
+}
+
+const struct cma *__must_check
+cma_alloc_from_region(struct cma_region *reg,
+ size_t size, unsigned long alignment)
+{
+ const struct cma *chunk;
+
+ pr_debug("allocate %p/%p from %s\n",
+ (void *)size, (void *)alignment,
+ reg ? reg->name ?: "(private)" : "(null)");
+
+ if (!size || alignment & (alignment - 1) || !reg)
+ return ERR_PTR(-EINVAL);
+
+ mutex_lock(&cma_mutex);
+
+ if (reg->registered) {
+ if (alignment < PAGE_SIZE)
+ alignment = PAGE_SIZE;
+ chunk = __cma_alloc_from_region(reg, PAGE_ALIGN(size), alignment);
+ } else {
+ chunk = ERR_PTR(-EINVAL);
+ }
+
+ mutex_unlock(&cma_mutex);
+
+ return chunk;
+}
+EXPORT_SYMBOL_GPL(cma_alloc_from_region);
+
+const struct cma *__must_check
+__cma_alloc(const struct device *dev, const char *type,
+ size_t size, unsigned long alignment)
+{
+ struct cma_region *reg;
+ const struct cma *chunk;
+ const char *from;
+
+ if (dev)
+ pr_debug("allocate %p/%p for %s/%s\n",
+ (void *)size, (void *)alignment,
+ dev_name(dev), type ?: "");
+
+ if (!size || alignment & (alignment - 1))
+ return ERR_PTR(-EINVAL);
+
+ size = PAGE_ALIGN(size);
+ if (alignment < PAGE_SIZE)
+ alignment = PAGE_SIZE;
+
+ mutex_lock(&cma_mutex);
+
+ from = __cma_where_from(dev, type);
+ if (unlikely(IS_ERR(from))) {
+ chunk = ERR_CAST(from);
+ goto done;
+ }
+
+ pr_debug("allocate %p/%p from one of %s\n",
+ (void *)size, (void *)alignment, from);
+
+ while (*from && *from != ';') {
+ reg = __cma_region_find(&from);
+ chunk = __cma_alloc_from_region(reg, size, alignment);
+ if (!IS_ERR(chunk))
+ goto done;
+ }
+
+ pr_debug("not enough memory\n");
+ chunk = ERR_PTR(-ENOMEM);
+
+done:
+ mutex_unlock(&cma_mutex);
+
+ return chunk;
+}
+EXPORT_SYMBOL_GPL(__cma_alloc);
+
+/* Query information about regions. */
+int
+__cma_info(struct cma_info *infop, const struct device *dev, const char *type)
+{
+ struct cma_info info = { ~(phys_addr_t)0, 0, 0, 0, 0 };
+ struct cma_region *reg;
+ const char *from;
+ int ret;
+
+ if (unlikely(!infop))
+ return -EINVAL;
+
+ mutex_lock(&cma_mutex);
+
+ from = __cma_where_from(dev, type);
+ if (IS_ERR(from)) {
+ ret = PTR_ERR(from);
+ info.lower_bound = 0;
+ goto done;
+ }
+
+ while (*from && *from != ';') {
+ reg = __cma_region_find(&from);
+ if (reg) {
+ info.total_size += reg->size;
+ info.free_size += reg->free_space;
+ if (info.lower_bound > reg->start)
+ info.lower_bound = reg->start;
+ if (info.upper_bound < reg->start + reg->size)
+ info.upper_bound = reg->start + reg->size;
+ ++info.count;
+ }
+ }
+
+ ret = 0;
+done:
+ mutex_unlock(&cma_mutex);
+
+ memcpy(infop, &info, sizeof info);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(__cma_info);
+
+/* Freeing. */
+void cma_free(const struct cma *_chunk)
+{
+ pr_debug("cma_free([%p])\n", (void *)_chunk);
+
+ if (_chunk) {
+ struct cma *chunk = (struct cma *)_chunk;
+
+ mutex_lock(&cma_mutex);
+
+ if (WARN_ON(chunk->pinned) && chunk->reg->alloc->unpin)
+ chunk->reg->alloc->unpin(chunk);
+
+ --chunk->reg->users;
+ chunk->reg->free_space += chunk->size;
+ chunk->reg->alloc->free(chunk);
+
+ mutex_unlock(&cma_mutex);
+ }
+}
+EXPORT_SYMBOL_GPL(cma_free);
+
+/* Pinning */
+phys_addr_t cma_pin(const struct cma *_chunk)
+{
+ struct cma *chunk = (struct cma *)_chunk;
+
+ pr_debug("cma_pin([%p])\n", (void *)chunk);
+
+ mutex_lock(&cma_mutex);
+
+ if (++chunk->pinned == 1 && chunk->reg->alloc->pin)
+ chunk->reg->alloc->pin(chunk);
+
+ mutex_unlock(&cma_mutex);
+
+ return chunk->phys;
+}
+EXPORT_SYMBOL_GPL(cma_pin);
+
+void cma_unpin(const struct cma *_chunk)
+{
+ struct cma *chunk = (struct cma *)_chunk;
+
+ pr_debug("cma_unpin([%p])\n", (void *)chunk);
+
+ mutex_lock(&cma_mutex);
+
+ if (!--chunk->pinned && chunk->reg->alloc->unpin)
+ chunk->reg->alloc->unpin(chunk);
+
+ mutex_unlock(&cma_mutex);
+}
+EXPORT_SYMBOL_GPL(cma_unpin);
+
+
+/************************* Miscellaneous *************************/
+
+static int __cma_region_attach_alloc(struct cma_region *reg)
+{
+ struct cma_allocator *alloc;
+ int ret;
+
+ /*
+ * If reg->alloc is set then caller wants us to use this
+ * allocator. Otherwise we need to find one by name.
+ */
+ if (reg->alloc) {
+ alloc = reg->alloc;
+ } else {
+ alloc = __cma_allocator_find(reg->alloc_name);
+ if (!alloc) {
+ pr_warn("init: %s: %s: no such allocator\n",
+ reg->name ?: "(private)",
+ reg->alloc_name ?: "(default)");
+ reg->used = 1;
+ return -ENOENT;
+ }
+ }
+
+ /* Try to initialise the allocator. */
+ reg->private_data = NULL;
+ ret = alloc->init ? alloc->init(reg) : 0;
+ if (unlikely(ret < 0)) {
+ pr_err("init: %s: %s: unable to initialise allocator\n",
+ reg->name ?: "(private)", alloc->name ?: "(unnamed)");
+ reg->alloc = NULL;
+ reg->used = 1;
+ } else {
+ reg->alloc = alloc;
+ pr_debug("init: %s: %s: initialised allocator\n",
+ reg->name ?: "(private)", alloc->name ?: "(unnamed)");
+ }
+ return ret;
+}
+
+
+/*
+ * s ::= rules
+ * rules ::= rule [ ';' rules ]
+ * rule ::= patterns '=' regions
+ * patterns ::= pattern [ ',' patterns ]
+ * regions ::= REG-NAME [ ',' regions ]
+ * pattern ::= dev-pattern [ '/' TYPE-NAME ] | '/' TYPE-NAME
+ */
+static const char *__must_check
+__cma_where_from(const struct device *dev, const char *type)
+{
+ /*
+ * This function matches the patterns from the map attribute
+ * against the given device name and type. The type may of
+ * course be NULL or an empty string.
+ */
+
+ const char *s, *name;
+ int name_matched = 0;
+
+ /*
+ * If dev is NULL we were called in alternative form where
+ * type is the from string. All we have to do is return it.
+ */
+ if (!dev)
+ return type ?: ERR_PTR(-EINVAL);
+
+ if (!cma_map)
+ return ERR_PTR(-ENOENT);
+
+ name = dev_name(dev);
+ if (WARN_ON(!name || !*name))
+ return ERR_PTR(-EINVAL);
+
+ if (!type)
+ type = "common";
+
+ /*
+ * Now we go through the cma_map attribute.
+ */
+ for (s = cma_map; *s; ++s) {
+ const char *c;
+
+ /*
+ * If the pattern starts with a slash, the device part of the
+ * pattern matches if it matched previously.
+ */
+ if (*s == '/') {
+ if (!name_matched)
+ goto look_for_next;
+ goto match_type;
+ }
+
+ /*
+ * We are now trying to match the device name. This also
+ * updates the name_matched variable. If, while reading the
+ * spec, we encounter a comma, it means that the pattern does
+ * not match and we need to start over with another pattern
+ * (the one after the comma). If we encounter an equals sign,
+ * we need to start over with another rule. If there is a
+ * character that does not match, we need to look for a comma
+ * (to get another pattern) or a semicolon (to get another
+ * rule) and try again if there is one somewhere.
+ */
+
+ name_matched = 0;
+
+ for (c = name; *s != '*' && *c; ++c, ++s)
+ if (*s == '=')
+ goto next_rule;
+ else if (*s == ',')
+ goto next_pattern;
+ else if (*s != '?' && *c != *s)
+ goto look_for_next;
+ if (*s == '*')
+ ++s;
+
+ name_matched = 1;
+
+ /*
+ * Now we need to match the type part of the pattern. If the
+ * pattern does not include it, we match only if type points to
+ * an empty string. Otherwise we try to match it just like the
+ * name.
+ */
+ if (*s == '/') {
+match_type: /* s points to '/' */
+ ++s;
+
+ for (c = type; *s && *c; ++c, ++s)
+ if (*s == '=')
+ goto next_rule;
+ else if (*s == ',')
+ goto next_pattern;
+ else if (*c != *s)
+ goto look_for_next;
+ }
+
+ /* Return the string behind the '=' sign of the rule. */
+ if (*s == '=')
+ return s + 1;
+ else if (*s == ',')
+ return strchr(s, '=') + 1;
+
+ /* Pattern did not match */
+
+look_for_next:
+ do {
+ ++s;
+ } while (*s != ',' && *s != '=');
+ if (*s == ',')
+ continue;
+
+next_rule: /* s points to '=' */
+ s = strchr(s, ';');
+ if (!s)
+ break;
+
+next_pattern:
+ continue;
+ }
+
+ return ERR_PTR(-ENOENT);
+}
+
+
+/************************* Generic allocator *************************/
+
+#ifdef CONFIG_CMA_GENERIC_ALLOCATOR
+
+static int cma_gen_init(struct cma_region *reg)
+{
+ struct gen_pool *pool;
+ int ret;
+
+ pool = gen_pool_create(PAGE_SHIFT, -1);
+ if (unlikely(!pool))
+ return -ENOMEM;
+
+ ret = gen_pool_add(pool, reg->start, reg->size, -1);
+ if (unlikely(ret)) {
+ gen_pool_destroy(pool);
+ return ret;
+ }
+
+ reg->private_data = pool;
+ return 0;
+}
+
+static void cma_gen_cleanup(struct cma_region *reg)
+{
+ gen_pool_destroy(reg->private_data);
+}
+
+struct cma *cma_gen_alloc(struct cma_region *reg,
+ size_t size, unsigned long alignment)
+{
+ unsigned long start;
+ struct cma *chunk;
+
+ chunk = kmalloc(sizeof *chunk, GFP_KERNEL);
+ if (unlikely(!chunk))
+ return ERR_PTR(-ENOMEM);
+
+ start = gen_pool_alloc_aligned(reg->private_data, size,
+ alignment ? ffs(alignment) - 1 : 0);
+ if (!start) {
+ kfree(chunk);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ chunk->phys = start;
+ chunk->size = size;
+ return chunk;
+}
+
+static void cma_gen_free(struct cma *chunk)
+{
+ gen_pool_free(chunk->reg->private_data, chunk->phys, chunk->size);
+ kfree(chunk);
+}
+
+static int cma_gen_module_init(void)
+{
+ static struct cma_allocator alloc = {
+ .name = "gen",
+ .init = cma_gen_init,
+ .cleanup = cma_gen_cleanup,
+ .alloc = cma_gen_alloc,
+ .free = cma_gen_free,
+ };
+ return cma_allocator_register(&alloc);
+}
+module_init(cma_gen_module_init);
+
+#endif
--
1.7.2.3
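
To illustrate how the pieces above are meant to fit together, here is
a rough sketch (the region name, sizes, map string and the functions
shown are made up for illustration, and error handling is simplified)
of a platform registering a default region and a driver allocating
from it:

	#include <linux/cma.h>

	/* Platform code; the array must not live in __initdata. */
	static struct cma_region foo_regions[] = {
		CMA_REGION("cam", 32 << 20, 0, 0),
		{ }
	};

	static void __init foo_reserve(void)
	{
		/* Map "camera" allocations (and everything else) to "cam". */
		cma_set_defaults(foo_regions, "camera=cam;*=cam");
		cma_early_regions_reserve(NULL);
	}

	/* Driver code, run after CMA has initialised. */
	static int foo_grab_buffer(void)
	{
		const struct cma *chunk;
		phys_addr_t phys;

		chunk = cma_alloc_from_region(&foo_regions[0], 1 << 20, 0);
		if (IS_ERR(chunk))
			return PTR_ERR(chunk);

		phys = cma_pin(chunk);	/* address is stable while pinned */
		/* ... program the hardware with phys ... */
		cma_unpin(chunk);
		cma_free(chunk);
		return 0;
	}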

2010-11-19 16:00:59

by Michal Nazarewicz

[permalink] [raw]
Subject: [RFCv6 08/13] mm: move some functions to page_isolation.c

From: KAMEZAWA Hiroyuki <[email protected]>

Memory hotplug contains logic for making pages unused in a specified
range of PFNs, so some of its core logic can be reused for other
purposes, such as allocating a very large contiguous memory block.

This patch moves some functions from mm/memory_hotplug.c to
mm/page_isolation.c. This helps in adding a large-allocation function
to page_isolation.c that uses the memory-unplug technique.
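
For illustration only, the function below is a made-up sketch (the
real users of these helpers appear later in this series) showing how
the now-shared routines could be combined to empty a PFN range:

	/* Assumes <linux/page-isolation.h> for the declarations below. */
	static int evacuate_range(unsigned long start_pfn, unsigned long end_pfn)
	{
		unsigned long pfn;
		int ret;

		/* All pages must live in one zone for migration to make sense. */
		if (!test_pages_in_a_zone(start_pfn, end_pfn))
			return -EINVAL;

		/* Keep migrating LRU pages away until none are left in the range. */
		while ((pfn = scan_lru_pages(start_pfn, end_pfn)) != 0) {
			ret = do_migrate_range(pfn, end_pfn);
			if (ret)	/* failed pages or -EBUSY: give up */
				return ret > 0 ? -EBUSY : ret;
		}
		return 0;
	}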

Changelog: 2010/10/26
- adjusted to mmotm-1024 + Bob's 3 clean ups.
Changelog: 2010/10/21
- adjusted to mmotm-1020

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/page-isolation.h | 7 +++
mm/memory_hotplug.c | 108 --------------------------------------
mm/page_isolation.c | 111 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 118 insertions(+), 108 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 051c1b1..58cdbac 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
extern int set_migratetype_isolate(struct page *page);
extern void unset_migratetype_isolate(struct page *page);

+/*
+ * For migration.
+ */
+
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
+unsigned long scan_lru_pages(unsigned long start, unsigned long end);
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);

#endif
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9260314..23f4e36 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -617,114 +617,6 @@ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
}

/*
- * Confirm all pages in a range [start, end) is belongs to the same zone.
- */
-static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
-{
- unsigned long pfn;
- struct zone *zone = NULL;
- struct page *page;
- int i;
- for (pfn = start_pfn;
- pfn < end_pfn;
- pfn += MAX_ORDER_NR_PAGES) {
- i = 0;
- /* This is just a CONFIG_HOLES_IN_ZONE check.*/
- while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
- i++;
- if (i == MAX_ORDER_NR_PAGES)
- continue;
- page = pfn_to_page(pfn + i);
- if (zone && page_zone(page) != zone)
- return 0;
- zone = page_zone(page);
- }
- return 1;
-}
-
-/*
- * Scanning pfn is much easier than scanning lru list.
- * Scan pfn from start to end and Find LRU page.
- */
-static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
-{
- unsigned long pfn;
- struct page *page;
- for (pfn = start; pfn < end; pfn++) {
- if (pfn_valid(pfn)) {
- page = pfn_to_page(pfn);
- if (PageLRU(page))
- return pfn;
- }
- }
- return 0;
-}
-
-static struct page *
-hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
-{
- /* This should be improooooved!! */
- return alloc_page(GFP_HIGHUSER_MOVABLE);
-}
-
-#define NR_OFFLINE_AT_ONCE_PAGES (256)
-static int
-do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
-{
- unsigned long pfn;
- struct page *page;
- int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
- int not_managed = 0;
- int ret = 0;
- LIST_HEAD(source);
-
- for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
- if (!pfn_valid(pfn))
- continue;
- page = pfn_to_page(pfn);
- if (!page_count(page))
- continue;
- /*
- * We can skip free pages. And we can only deal with pages on
- * LRU.
- */
- ret = isolate_lru_page(page);
- if (!ret) { /* Success */
- list_add_tail(&page->lru, &source);
- move_pages--;
- inc_zone_page_state(page, NR_ISOLATED_ANON +
- page_is_file_cache(page));
-
- } else {
-#ifdef CONFIG_DEBUG_VM
- printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
- pfn);
- dump_page(page);
-#endif
- /* Becasue we don't have big zone->lock. we should
- check this again here. */
- if (page_count(page)) {
- not_managed++;
- ret = -EBUSY;
- break;
- }
- }
- }
- if (!list_empty(&source)) {
- if (not_managed) {
- putback_lru_pages(&source);
- goto out;
- }
- /* this function returns # of failed pages */
- ret = migrate_pages(&source, hotremove_migrate_alloc, 0, 1);
- if (ret)
- putback_lru_pages(&source);
- }
-out:
- return ret;
-}
-
-/*
* remove from free_area[] and mark all as Reserved.
*/
static int
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 4ae42bb..077cf19 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -5,6 +5,9 @@
#include <linux/mm.h>
#include <linux/page-isolation.h>
#include <linux/pageblock-flags.h>
+#include <linux/memcontrol.h>
+#include <linux/migrate.h>
+#include <linux/mm_inline.h>
#include "internal.h"

static inline struct page *
@@ -139,3 +142,111 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
spin_unlock_irqrestore(&zone->lock, flags);
return ret ? 0 : -EBUSY;
}
+
+
+/*
+ * Confirm all pages in a range [start, end) is belongs to the same zone.
+ */
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long pfn;
+ struct zone *zone = NULL;
+ struct page *page;
+ int i;
+ for (pfn = start_pfn;
+ pfn < end_pfn;
+ pfn += MAX_ORDER_NR_PAGES) {
+ i = 0;
+ /* This is just a CONFIG_HOLES_IN_ZONE check.*/
+ while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
+ i++;
+ if (i == MAX_ORDER_NR_PAGES)
+ continue;
+ page = pfn_to_page(pfn + i);
+ if (zone && page_zone(page) != zone)
+ return 0;
+ zone = page_zone(page);
+ }
+ return 1;
+}
+
+/*
+ * Scanning pfn is much easier than scanning lru list.
+ * Scan pfn from start to end and Find LRU page.
+ */
+unsigned long scan_lru_pages(unsigned long start, unsigned long end)
+{
+ unsigned long pfn;
+ struct page *page;
+ for (pfn = start; pfn < end; pfn++) {
+ if (pfn_valid(pfn)) {
+ page = pfn_to_page(pfn);
+ if (PageLRU(page))
+ return pfn;
+ }
+ }
+ return 0;
+}
+
+struct page *
+hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
+{
+ /* This should be improooooved!! */
+ return alloc_page(GFP_HIGHUSER_MOVABLE);
+}
+
+#define NR_OFFLINE_AT_ONCE_PAGES (256)
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long pfn;
+ struct page *page;
+ int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
+ int not_managed = 0;
+ int ret = 0;
+ LIST_HEAD(source);
+
+ for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+ if (!pfn_valid(pfn))
+ continue;
+ page = pfn_to_page(pfn);
+ if (!page_count(page))
+ continue;
+ /*
+ * We can skip free pages. And we can only deal with pages on
+ * LRU.
+ */
+ ret = isolate_lru_page(page);
+ if (!ret) { /* Success */
+ list_add_tail(&page->lru, &source);
+ move_pages--;
+ inc_zone_page_state(page, NR_ISOLATED_ANON +
+ page_is_file_cache(page));
+
+ } else {
+#ifdef CONFIG_DEBUG_VM
+ printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
+ pfn);
+ dump_page(page);
+#endif
+ /* Because we don't hold the big zone->lock, we should
+ check this again here. */
+ if (page_count(page)) {
+ not_managed++;
+ ret = -EBUSY;
+ break;
+ }
+ }
+ }
+ if (!list_empty(&source)) {
+ if (not_managed) {
+ putback_lru_pages(&source);
+ goto out;
+ }
+ /* this function returns # of failed pages */
+ ret = migrate_pages(&source, hotremove_migrate_alloc, 0, 1);
+ if (ret)
+ putback_lru_pages(&source);
+ }
+out:
+ return ret;
+}
--
1.7.2.3

2010-11-19 16:00:58

by Michal Nazarewicz

[permalink] [raw]
Subject: [RFCv6 11/13] mm: MIGRATE_CMA isolation functions added

This commit changes various functions that switch pages and
pageblocks between the MIGRATE_ISOLATE and MIGRATE_MOVABLE migrate
types so that they can also work with the MIGRATE_CMA migrate type.
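
For illustration, the CMA migration code elsewhere in this series is
expected to call the new variants together with the MIGRATE_CMA type
(defined in another patch of the series) roughly along these lines;
the function below is a simplified, hypothetical sketch and assumes
start_pfn/end_pfn are pageblock aligned:

	static int cma_try_isolate(unsigned long start_pfn, unsigned long end_pfn)
	{
		int ret;

		/* On failure, already-isolated pageblocks go back to MIGRATE_CMA. */
		ret = __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_CMA);
		if (ret)
			return ret;

		/* ... migrate or reclaim the pages in the range here ... */

		if (test_pages_isolated(start_pfn, end_pfn))
			ret = -EBUSY;	/* some pages are still in use */

		/* Return the pageblocks to the CMA pool rather than to MOVABLE. */
		__undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_CMA);
		return ret;
	}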

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/page-isolation.h | 39 ++++++++++++++++++++++++++-------------
mm/page_alloc.c | 6 +++---
mm/page_isolation.c | 15 ++++++++-------
3 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index f1417ed..56f0e13 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -3,35 +3,49 @@

/*
* Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- * If specified range includes migrate types other than MOVABLE,
+ * If specified range includes migrate types other than MOVABLE or CMA,
* this will fail with -EBUSY.
*
* For isolating all pages in the range finally, the caller have to
* free all pages in the range. test_page_isolated() can be used for
* test it.
*/
-extern int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+ unsigned migratetype);
+
+static inline int
+start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+ return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
+
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+ unsigned migratetype);

/*
* Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
* target range is [start_pfn, end_pfn)
*/
-extern int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+static inline int
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+ return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}

/*
- * test all pages in [start_pfn, end_pfn)are isolated or not.
+ * Test all pages in [start_pfn, end_pfn) are isolated or not.
*/
-extern int
-test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);

/*
- * Internal funcs.Changes pageblock's migrate type.
- * Please use make_pagetype_isolated()/make_pagetype_movable().
+ * Internal functions. Changes pageblock's migrate type.
*/
-extern int set_migratetype_isolate(struct page *page);
-extern void unset_migratetype_isolate(struct page *page);
+int set_migratetype_isolate(struct page *page);
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype);
+static inline void unset_migratetype_isolate(struct page *page)
+{
+ __unset_migratetype_isolate(page, MIGRATE_MOVABLE);
+}
extern unsigned long alloc_contig_freed_pages(unsigned long start,
unsigned long end, gfp_t flag);
extern void free_contig_pages(struct page *page, int nr_pages);
@@ -39,7 +53,6 @@ extern void free_contig_pages(struct page *page, int nr_pages);
/*
* For migration.
*/
-
int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
unsigned long scan_lru_pages(unsigned long start, unsigned long end);
int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 91daf22..a24193e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5463,7 +5463,7 @@ out:
return ret;
}

-void unset_migratetype_isolate(struct page *page)
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype)
{
struct zone *zone;
unsigned long flags;
@@ -5471,8 +5471,8 @@ void unset_migratetype_isolate(struct page *page)
spin_lock_irqsave(&zone->lock, flags);
if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
goto out;
- set_pageblock_migratetype(page, MIGRATE_MOVABLE);
- move_freepages_block(zone, page, MIGRATE_MOVABLE);
+ set_pageblock_migratetype(page, migratetype);
+ move_freepages_block(zone, page, migratetype);
out:
spin_unlock_irqrestore(&zone->lock, flags);
}
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 077cf19..ea9781e 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -23,10 +23,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
}

/*
- * start_isolate_page_range() -- make page-allocation-type of range of pages
+ * __start_isolate_page_range() -- make page-allocation-type of range of pages
* to be MIGRATE_ISOLATE.
* @start_pfn: The lower PFN of the range to be isolated.
* @end_pfn: The upper PFN of the range to be isolated.
+ * @migratetype: migrate type to set in error recovery.
*
* Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
* the range will never be allocated. Any free pages and pages freed in the
@@ -35,8 +36,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
* start_pfn/end_pfn must be aligned to pageblock_order.
* Returns 0 on success and -EBUSY if any part of range cannot be isolated.
*/
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+ unsigned migratetype)
{
unsigned long pfn;
unsigned long undo_pfn;
@@ -59,7 +60,7 @@ undo:
for (pfn = start_pfn;
pfn < undo_pfn;
pfn += pageblock_nr_pages)
- unset_migratetype_isolate(pfn_to_page(pfn));
+ __unset_migratetype_isolate(pfn_to_page(pfn), migratetype);

return -EBUSY;
}
@@ -67,8 +68,8 @@ undo:
/*
* Make isolated pages available again.
*/
-int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+ unsigned migratetype)
{
unsigned long pfn;
struct page *page;
@@ -80,7 +81,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
page = __first_valid_page(pfn, pageblock_nr_pages);
if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
continue;
- unset_migratetype_isolate(page);
+ __unset_migratetype_isolate(page, migratetype);
}
return 0;
}
--
1.7.2.3

2010-11-19 16:01:39

by Michal Nazarewicz

[permalink] [raw]
Subject: [RFCv6 06/13] mm: cma: Best-fit algorithm added

This commit adds a best-fit algorithm to the set of
algorithms supported by CMA.
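
For illustration (the region below is made up), a platform can request
this allocator for a region by name through the alloc_name field:

	static struct cma_region foo_regions[] = {
		CMA_REGION("mm", 16 << 20, 0, 0, .alloc_name = "bf"),
		{ }
	};

If alloc_name is left unset, the region falls back to the default,
i.e. the first registered allocator.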

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
mm/Kconfig | 16 ++-
mm/Makefile | 1 +
mm/cma-best-fit.c | 372 +++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 388 insertions(+), 1 deletions(-)
create mode 100644 mm/cma-best-fit.c

diff --git a/mm/Kconfig b/mm/Kconfig
index a5480ea..5ad2471 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -332,10 +332,13 @@ config CLEANCACHE

If unsure, say Y to enable cleancache

+config CMA_HAS_ALLOCATOR
+ bool
+
config CMA
bool "Contiguous Memory Allocator framework"
# Currently there is only one allocator so force it on
- select CMA_GENERIC_ALLOCATOR
+ select CMA_GENERIC_ALLOCATOR if !CMA_HAS_ALLOCATOR
help
This enables the Contiguous Memory Allocator framework which
allows drivers to allocate big physically-contiguous blocks of
@@ -391,3 +394,14 @@ config CMA_GENERIC_ALLOCATOR
implementations: the first-fit, bitmap-based algorithm or
a best-fit, red-black tree-based algorithm. The algorithm can
be changed under "Library routines".
+
+config CMA_BEST_FIT
+ bool "CMA best-fit allocator"
+ depends on CMA
+ select CMA_HAS_ALLOCATOR
+ help
+ This is a best-fit algorithm running in O(n log n) time where
+ n is the number of existing holes (which is never greater than
+ the number of allocated regions and usually much smaller). It
+ allocates an area from the smallest hole that is big enough for
+ the allocation in question.
diff --git a/mm/Makefile b/mm/Makefile
index c6a84f1..2cb2569 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -44,3 +44,4 @@ obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
obj-$(CONFIG_CLEANCACHE) += cleancache.o
obj-$(CONFIG_CMA) += cma.o
+obj-$(CONFIG_CMA_BEST_FIT) += cma-best-fit.o
diff --git a/mm/cma-best-fit.c b/mm/cma-best-fit.c
new file mode 100644
index 0000000..5ed1168
--- /dev/null
+++ b/mm/cma-best-fit.c
@@ -0,0 +1,372 @@
+/*
+ * Contiguous Memory Allocator framework: Best Fit allocator
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "cma: bf: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+# define DEBUG
+#endif
+
+#include <linux/errno.h> /* Error numbers */
+#include <linux/slab.h> /* kmalloc() */
+
+#include <linux/cma.h> /* CMA structures */
+
+
+/************************* Data Types *************************/
+
+struct cma_bf_node {
+ unsigned long v;
+ struct rb_node n;
+};
+
+union cma_bf_item {
+ struct cma chunk;
+ struct {
+ struct cma_bf_node start, size;
+ };
+};
+
+struct cma_bf_private {
+ struct rb_root by_start_root;
+ struct rb_root by_size_root;
+ bool warned;
+};
+
+
+/************************* Basic Tree Manipulation *************************/
+
+static int cma_bf_node_add(struct cma_bf_node *node, struct rb_root *root,
+ bool unique)
+{
+ struct rb_node **link = &root->rb_node, *parent = NULL;
+ const unsigned long v = node->v;
+
+ while (*link) {
+ struct cma_bf_node *n;
+ parent = *link;
+ n = rb_entry(parent, struct cma_bf_node, n);
+
+ if (unlikely(unique && v == n->v))
+ return -EBUSY;
+
+ link = v <= n->v ? &parent->rb_left : &parent->rb_right;
+ }
+
+ rb_link_node(&node->n, parent, link);
+ rb_insert_color(&node->n, root);
+
+ return 0;
+}
+
+static void cma_bf_node_del(struct cma_bf_node *node, struct rb_root *root)
+{
+ rb_erase(&node->n, root);
+}
+
+static int cma_bf_item_add_by_start(union cma_bf_item *item,
+ struct cma_bf_private *prv)
+{
+ int ret = cma_bf_node_add(&item->start, &prv->by_start_root, true);
+ if (WARN_ON(ret && !prv->warned))
+ prv->warned = true;
+ return ret;
+}
+
+static void cma_bf_item_del_by_start(union cma_bf_item *item,
+ struct cma_bf_private *prv)
+{
+ cma_bf_node_del(&item->start, &prv->by_start_root);
+}
+
+static void cma_bf_item_add_by_size(union cma_bf_item *item,
+ struct cma_bf_private *prv)
+{
+ cma_bf_node_add(&item->size, &prv->by_size_root, false);
+}
+
+static void cma_bf_item_del_by_size(union cma_bf_item *item,
+ struct cma_bf_private *prv)
+{
+ cma_bf_node_del(&item->size, &prv->by_size_root);
+}
+
+
+/************************* Device API *************************/
+
+static int cma_bf_init(struct cma_region *reg)
+{
+ struct cma_bf_private *prv;
+ union cma_bf_item *item;
+
+ prv = kzalloc(sizeof *prv, GFP_KERNEL);
+ if (unlikely(!prv))
+ return -ENOMEM;
+
+ item = kzalloc(sizeof *item, GFP_KERNEL);
+ if (unlikely(!item)) {
+ kfree(prv);
+ return -ENOMEM;
+ }
+
+ item->start.v = reg->start;
+ item->size.v = reg->size;
+
+ rb_root_init(&prv->by_start_root, &item->start.n);
+ rb_root_init(&prv->by_size_root, &item->size.n);
+ prv->warned = false;
+
+ reg->private_data = prv;
+ return 0;
+}
+
+static void cma_bf_cleanup(struct cma_region *reg)
+{
+ struct cma_bf_private *prv = reg->private_data;
+ union cma_bf_item *item =
+ rb_entry(prv->by_start_root.rb_node,
+ union cma_bf_item, start.n);
+
+ /* There should be only one item. */
+ WARN_ON(!prv->warned &&
+ (!item ||
+ item->start.n.rb_left || item->start.n.rb_right ||
+ item->size.n.rb_left || item->size.n.rb_right));
+
+ kfree(item);
+ kfree(prv);
+}
+
+struct cma *cma_bf_alloc(struct cma_region *reg,
+ size_t size, unsigned long alignment)
+{
+ struct cma_bf_private *prv = reg->private_data;
+ struct rb_node *node = prv->by_size_root.rb_node;
+ union cma_bf_item *hole = NULL, *item;
+ unsigned long start;
+ int ret;
+
+ /* First find hole that is large enough */
+ while (node) {
+ union cma_bf_item *item =
+ rb_entry(node, union cma_bf_item, size.n);
+
+ if (item->size.v < size) {
+ node = node->rb_right;
+ } else if (item->size.v >= size) {
+ node = node->rb_left;
+ hole = item;
+ }
+ }
+ if (!hole)
+ return ERR_PTR(-ENOMEM);
+
+ /* Now look for items which can satisfy alignment requirements */
+ for (;;) {
+ unsigned long end = hole->start.v + hole->size.v;
+ start = ALIGN(hole->start.v, alignment);
+ if (start < end && end - start >= size)
+ break;
+
+ node = rb_next(node);
+ if (!node)
+ return ERR_PTR(-ENOMEM);
+
+ hole = rb_entry(node, union cma_bf_item, size.n);
+ }
+
+ /* And finally, take part of the hole */
+
+ /*
+ * There are three cases:
+ * 1. the chunk takes the whole hole,
+ * 2. the chunk is at the beginning or at the end of the hole, or
+ * 3. the chunk is in the middle of the hole.
+ */
+
+ /* Case 1, the whole hole */
+ if (size == hole->size.v) {
+ ret = __cma_grab(reg, start, size);
+ if (ret)
+ return ERR_PTR(ret);
+
+ cma_bf_item_del_by_start(hole, prv);
+ cma_bf_item_del_by_size(hole, prv);
+
+ hole->chunk.phys = start;
+ hole->chunk.size = size;
+ return &hole->chunk;
+ }
+
+ /* Allocate (so we can test early if there's enough memory) */
+ item = kmalloc(sizeof *item, GFP_KERNEL);
+ if (unlikely(!item))
+ return ERR_PTR(-ENOMEM);
+
+ /* Case 3, in the middle */
+ if (start != hole->start.v &&
+ start + size != hole->start.v + hole->size.v) {
+ union cma_bf_item *tail;
+
+ /*
+ * Space between the end of the chunk and the end of
+ * the region, ie. space left after the end of the
+ * chunk. If this is dividable by alignment we can
+ * move the chunk to the end of the hole.
+ */
+ unsigned long left =
+ hole->start.v + hole->size.v - (start + size);
+ if ((left & (alignment - 1)) == 0) {
+ start += left;
+ /* And so, we have reduced problem to case 2. */
+ goto case_2;
+ }
+
+ /*
+ * We are going to add a hole at the end. This way,
+ * we will reduce the problem to case 2 -- the chunk
+ * will be at the end of a reduced hole.
+ */
+ tail = kmalloc(sizeof *tail, GFP_KERNEL);
+ if (unlikely(!tail)) {
+ kfree(item);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ tail->start.v = start + size;
+ tail->size.v =
+ hole->start.v + hole->size.v - tail->start.v;
+
+ if (cma_bf_item_add_by_start(tail, prv))
+ /*
+ * Things are broken beyond repair... Abort
+ * inserting the hole but continue with the
+ * item. We will lose some memory but we're
+ * screwed anyway.
+ */
+ kfree(tail);
+ else
+ cma_bf_item_add_by_size(tail, prv);
+
+ /*
+ * It's important that we first insert the new hole in
+ * the tree sorted by size and later reduce the size
+ * of the old hole. We will update the position of
+ * the old hole in the rb tree in code that handles
+ * case 2.
+ */
+ hole->size.v = tail->start.v - hole->start.v;
+
+ /* Go to case 2 */
+ }
+
+ /* Case 2, at the beginning or at the end */
+case_2:
+ /* No need to update the tree; order preserved. */
+ if (start == hole->start.v)
+ hole->start.v += size;
+
+ /* Alter hole's size */
+ hole->size.v -= size;
+ cma_bf_item_del_by_size(hole, prv);
+ cma_bf_item_add_by_size(hole, prv);
+
+ item->chunk.phys = start;
+ item->chunk.size = size;
+ return &item->chunk;
+}
+
+static void __cma_bf_free(struct cma_region *reg, union cma_bf_item *item)
+{
+ struct cma_bf_private *prv = reg->private_data;
+ union cma_bf_item *prev;
+ struct rb_node *node;
+ int twice;
+
+ {
+ unsigned long start = item->chunk.phys;
+ unsigned long size = item->chunk.size;
+ item->start.v = start;
+ item->size.v = size;
+ }
+
+ /* Add new hole */
+ if (cma_bf_item_add_by_start(item, prv)) {
+ /*
+ * We're screwed... Just free the item and forget
+ * about it. Things are broken beyond repair so no
+ * sense in trying to recover.
+ */
+ kfree(item);
+ return;
+ }
+
+ cma_bf_item_add_by_size(item, prv);
+
+ /* Merge with prev or next sibling */
+ twice = 2;
+ node = rb_prev(&item->start.n);
+ if (unlikely(!node))
+ goto next;
+ prev = rb_entry(node, union cma_bf_item, start.n);
+
+ for (;;) {
+ if (prev->start.v + prev->size.v == item->start.v) {
+ /* Remove previous hole from trees */
+ cma_bf_item_del_by_start(prev, prv);
+ cma_bf_item_del_by_size(prev, prv);
+
+ /* Alter this hole */
+ item->start.v = prev->start.v;
+ item->size.v += prev->size.v;
+ cma_bf_item_del_by_size(item, prv);
+ cma_bf_item_add_by_size(item, prv);
+ /*
+ * No need to update the by start trees as we
+ * do not break sequence order.
+ */
+
+ /* Free prev hole */
+ kfree(prev);
+ }
+
+next:
+ if (!--twice)
+ break;
+
+ node = rb_next(&item->start.n);
+ if (unlikely(!node))
+ break;
+ prev = item;
+ item = rb_entry(node, union cma_bf_item, start.n);
+ }
+}
+
+static void cma_bf_free(struct cma *chunk)
+{
+ __cma_ungrab(chunk->reg, chunk->phys, chunk->size);
+ __cma_bf_free(chunk->reg,
+ container_of(chunk, union cma_bf_item, chunk));
+}
+
+/************************* Register *************************/
+
+static int cma_bf_module_init(void)
+{
+ static struct cma_allocator alloc = {
+ .name = "bf",
+ .init = cma_bf_init,
+ .cleanup = cma_bf_cleanup,
+ .alloc = cma_bf_alloc,
+ .free = cma_bf_free,
+ };
+ return cma_allocator_register(&alloc);
+}
+module_init(cma_bf_module_init);
--
1.7.2.3

2010-11-19 16:01:55

by Michal Nazarewicz

[permalink] [raw]
Subject: [RFCv6 03/13] lib: genalloc: Generic allocator improvements

This commit adds a gen_pool_alloc_aligned() function to the
generic allocator API. It allows specifying alignment for the
allocated block. This feature uses
the bitmap_find_next_zero_area_off() function.

It also fixes a possible issue with the bitmap's last element not
being fully allocated (i.e. the space allocated for chunk->bits is
not a multiple of sizeof(long)).

It also makes some other smaller changes:
- moves structure definitions out of the header file,
- adds __must_check to functions returning value,
- makes gen_pool_add() return -ENOMEM rather than -1 on error,
- changes list_for_each to list_for_each_entry, and
- makes use of bitmap_clear().
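
For illustration (the pool placement and sizes below are made up), the
new aligned allocation call could be used like this:

	static int __init genalloc_example(void)
	{
		struct gen_pool *pool;
		unsigned long addr;

		pool = gen_pool_create(PAGE_SHIFT, -1);	/* one bit per page */
		if (!pool)
			return -ENOMEM;

		if (gen_pool_add(pool, 0x40000000UL, 64 << 20, -1)) {
			gen_pool_destroy(pool);
			return -ENOMEM;
		}

		/* Carve out a 1 MiB chunk aligned to 1 MiB (order 20). */
		addr = gen_pool_alloc_aligned(pool, 1 << 20, 20);
		if (addr) {
			/* ... use [addr, addr + 1 MiB) ... */
			gen_pool_free(pool, addr, 1 << 20);
		}

		gen_pool_destroy(pool);
		return addr ? 0 : -ENOMEM;
	}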

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/genalloc.h | 46 ++++++------
lib/genalloc.c | 182 ++++++++++++++++++++++++++-------------------
2 files changed, 129 insertions(+), 99 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 9869ef3..8ac7337 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -8,29 +8,31 @@
* Version 2. See the file COPYING for more details.
*/

+struct gen_pool;

-/*
- * General purpose special memory pool descriptor.
- */
-struct gen_pool {
- rwlock_t lock;
- struct list_head chunks; /* list of chunks in this pool */
- int min_alloc_order; /* minimum allocation order */
-};
+struct gen_pool *__must_check gen_pool_create(unsigned order, int nid);

-/*
- * General purpose special memory pool chunk descriptor.
+int __must_check gen_pool_add(struct gen_pool *pool, unsigned long addr,
+ size_t size, int nid);
+
+void gen_pool_destroy(struct gen_pool *pool);
+
+unsigned long __must_check
+gen_pool_alloc_aligned(struct gen_pool *pool, size_t size,
+ unsigned alignment_order);
+
+/**
+ * gen_pool_alloc() - allocate special memory from the pool
+ * @pool: Pool to allocate from.
+ * @size: Number of bytes to allocate from the pool.
+ *
+ * Allocate the requested number of bytes from the specified pool.
+ * Uses a first-fit algorithm.
*/
-struct gen_pool_chunk {
- spinlock_t lock;
- struct list_head next_chunk; /* next chunk in pool */
- unsigned long start_addr; /* starting address of memory chunk */
- unsigned long end_addr; /* ending address of memory chunk */
- unsigned long bits[0]; /* bitmap for allocating memory chunk */
-};
+static inline unsigned long __must_check
+gen_pool_alloc(struct gen_pool *pool, size_t size)
+{
+ return gen_pool_alloc_aligned(pool, size, 0);
+}

-extern struct gen_pool *gen_pool_create(int, int);
-extern int gen_pool_add(struct gen_pool *, unsigned long, size_t, int);
-extern void gen_pool_destroy(struct gen_pool *);
-extern unsigned long gen_pool_alloc(struct gen_pool *, size_t);
-extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size);
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 1923f14..dc6d833 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -15,54 +15,81 @@
#include <linux/bitmap.h>
#include <linux/genalloc.h>

+/* General purpose special memory pool descriptor. */
+struct gen_pool {
+ rwlock_t lock; /* protects chunks list */
+ struct list_head chunks; /* list of chunks in this pool */
+ unsigned order; /* minimum allocation order */
+};
+
+/* General purpose special memory pool chunk descriptor. */
+struct gen_pool_chunk {
+ spinlock_t lock; /* protects bits */
+ struct list_head next_chunk; /* next chunk in pool */
+ unsigned long start; /* start of memory chunk */
+ unsigned long size; /* number of bits */
+ unsigned long bits[0]; /* bitmap for allocating memory chunk */
+};

/**
- * gen_pool_create - create a new special memory pool
- * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
- * @nid: node id of the node the pool structure should be allocated on, or -1
+ * gen_pool_create() - create a new special memory pool
+ * @order: Log base 2 of number of bytes each bitmap bit
+ * represents.
+ * @nid: Node id of the node the pool structure should be allocated
+ * on, or -1.
*
* Create a new special memory pool that can be used to manage special purpose
* memory not managed by the regular kmalloc/kfree interface.
*/
-struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
+struct gen_pool *__must_check gen_pool_create(unsigned order, int nid)
{
struct gen_pool *pool;

- pool = kmalloc_node(sizeof(struct gen_pool), GFP_KERNEL, nid);
- if (pool != NULL) {
+ if (WARN_ON(order >= BITS_PER_LONG))
+ return NULL;
+
+ pool = kmalloc_node(sizeof *pool, GFP_KERNEL, nid);
+ if (pool) {
rwlock_init(&pool->lock);
INIT_LIST_HEAD(&pool->chunks);
- pool->min_alloc_order = min_alloc_order;
+ pool->order = order;
}
return pool;
}
EXPORT_SYMBOL(gen_pool_create);

/**
- * gen_pool_add - add a new chunk of special memory to the pool
- * @pool: pool to add new memory chunk to
- * @addr: starting address of memory chunk to add to pool
- * @size: size in bytes of the memory chunk to add to pool
- * @nid: node id of the node the chunk structure and bitmap should be
- * allocated on, or -1
+ * gen_pool_add() - add a new chunk of special memory to the pool
+ * @pool: Pool to add new memory chunk to.
+ * @addr: Starting address of memory chunk to add to pool.
+ * @size: Size in bytes of the memory chunk to add to pool.
+ * @nid: Node id of the node the pool structure should be allocated
+ * on, or -1.
*
* Add a new chunk of special memory to the specified pool.
*/
-int gen_pool_add(struct gen_pool *pool, unsigned long addr, size_t size,
- int nid)
+int __must_check
+gen_pool_add(struct gen_pool *pool, unsigned long addr, size_t size, int nid)
{
struct gen_pool_chunk *chunk;
- int nbits = size >> pool->min_alloc_order;
- int nbytes = sizeof(struct gen_pool_chunk) +
- (nbits + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
+ size_t nbytes;
+
+ if (WARN_ON(!addr || addr + size < addr ||
+ (addr & ((1 << pool->order) - 1))))
+ return -EINVAL;

- chunk = kmalloc_node(nbytes, GFP_KERNEL | __GFP_ZERO, nid);
- if (unlikely(chunk == NULL))
- return -1;
+ size = size >> pool->order;
+ if (WARN_ON(!size))
+ return -EINVAL;
+
+ nbytes = sizeof *chunk + BITS_TO_LONGS(size) * sizeof *chunk->bits;
+ chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
+ if (!chunk)
+ return -ENOMEM;

spin_lock_init(&chunk->lock);
- chunk->start_addr = addr;
- chunk->end_addr = addr + size;
+ chunk->start = addr >> pool->order;
+ chunk->size = size;

write_lock(&pool->lock);
list_add(&chunk->next_chunk, &pool->chunks);
@@ -73,115 +100,116 @@ int gen_pool_add(struct gen_pool *pool, unsigned long addr, size_t size,
EXPORT_SYMBOL(gen_pool_add);

/**
- * gen_pool_destroy - destroy a special memory pool
- * @pool: pool to destroy
+ * gen_pool_destroy() - destroy a special memory pool
+ * @pool: Pool to destroy.
*
* Destroy the specified special memory pool. Verifies that there are no
* outstanding allocations.
*/
void gen_pool_destroy(struct gen_pool *pool)
{
- struct list_head *_chunk, *_next_chunk;
struct gen_pool_chunk *chunk;
- int order = pool->min_alloc_order;
- int bit, end_bit;
-
+ int bit;

- list_for_each_safe(_chunk, _next_chunk, &pool->chunks) {
- chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
+ while (!list_empty(&pool->chunks)) {
+ chunk = list_entry(pool->chunks.next, struct gen_pool_chunk,
+ next_chunk);
list_del(&chunk->next_chunk);

- end_bit = (chunk->end_addr - chunk->start_addr) >> order;
- bit = find_next_bit(chunk->bits, end_bit, 0);
- BUG_ON(bit < end_bit);
+ bit = find_next_bit(chunk->bits, chunk->size, 0);
+ BUG_ON(bit < chunk->size);

kfree(chunk);
}
kfree(pool);
- return;
}
EXPORT_SYMBOL(gen_pool_destroy);

/**
- * gen_pool_alloc - allocate special memory from the pool
- * @pool: pool to allocate from
- * @size: number of bytes to allocate from the pool
+ * gen_pool_alloc_aligned() - allocate special memory from the pool
+ * @pool: Pool to allocate from.
+ * @size: Number of bytes to allocate from the pool.
+ * @alignment_order: Order the allocated space should be
+ * aligned to (eg. 20 means allocated space
+ * must be aligned to 1MiB).
*
* Allocate the requested number of bytes from the specified pool.
* Uses a first-fit algorithm.
*/
-unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
+unsigned long __must_check
+gen_pool_alloc_aligned(struct gen_pool *pool, size_t size,
+ unsigned alignment_order)
{
- struct list_head *_chunk;
+ unsigned long addr, align_mask = 0, flags, start;
struct gen_pool_chunk *chunk;
- unsigned long addr, flags;
- int order = pool->min_alloc_order;
- int nbits, start_bit, end_bit;

if (size == 0)
return 0;

- nbits = (size + (1UL << order) - 1) >> order;
+ if (alignment_order > pool->order)
+ align_mask = (1 << (alignment_order - pool->order)) - 1;

- read_lock(&pool->lock);
- list_for_each(_chunk, &pool->chunks) {
- chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
+ size = (size + (1UL << pool->order) - 1) >> pool->order;

- end_bit = (chunk->end_addr - chunk->start_addr) >> order;
+ read_lock(&pool->lock);
+ list_for_each_entry(chunk, &pool->chunks, next_chunk) {
+ if (chunk->size < size)
+ continue;

spin_lock_irqsave(&chunk->lock, flags);
- start_bit = bitmap_find_next_zero_area(chunk->bits, end_bit, 0,
- nbits, 0);
- if (start_bit >= end_bit) {
+ start = bitmap_find_next_zero_area_off(chunk->bits, chunk->size,
+ 0, size, align_mask,
+ chunk->start);
+ if (start >= chunk->size) {
spin_unlock_irqrestore(&chunk->lock, flags);
continue;
}

- addr = chunk->start_addr + ((unsigned long)start_bit << order);
-
- bitmap_set(chunk->bits, start_bit, nbits);
+ bitmap_set(chunk->bits, start, size);
spin_unlock_irqrestore(&chunk->lock, flags);
- read_unlock(&pool->lock);
- return addr;
+ addr = (chunk->start + start) << pool->order;
+ goto done;
}
+
+ addr = 0;
+done:
read_unlock(&pool->lock);
- return 0;
+ return addr;
}
-EXPORT_SYMBOL(gen_pool_alloc);
+EXPORT_SYMBOL(gen_pool_alloc_aligned);

/**
- * gen_pool_free - free allocated special memory back to the pool
- * @pool: pool to free to
- * @addr: starting address of memory to free back to pool
- * @size: size in bytes of memory to free
+ * gen_pool_free() - free allocated special memory back to the pool
+ * @pool: Pool to free to.
+ * @addr: Starting address of memory to free back to pool.
+ * @size: Size in bytes of memory to free.
*
* Free previously allocated special memory back to the specified pool.
*/
void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
{
- struct list_head *_chunk;
struct gen_pool_chunk *chunk;
unsigned long flags;
- int order = pool->min_alloc_order;
- int bit, nbits;

- nbits = (size + (1UL << order) - 1) >> order;
+ if (!size)
+ return;

- read_lock(&pool->lock);
- list_for_each(_chunk, &pool->chunks) {
- chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
+ addr = addr >> pool->order;
+ size = (size + (1UL << pool->order) - 1) >> pool->order;

- if (addr >= chunk->start_addr && addr < chunk->end_addr) {
- BUG_ON(addr + size > chunk->end_addr);
+ BUG_ON(addr + size < addr);
+
+ read_lock(&pool->lock);
+ list_for_each_entry(chunk, &pool->chunks, next_chunk)
+ if (addr >= chunk->start &&
+ addr + size <= chunk->start + chunk->size) {
spin_lock_irqsave(&chunk->lock, flags);
- bit = (addr - chunk->start_addr) >> order;
- while (nbits--)
- __clear_bit(bit++, chunk->bits);
+ bitmap_clear(chunk->bits, addr - chunk->start, size);
spin_unlock_irqrestore(&chunk->lock, flags);
- break;
+ goto done;
}
- }
- BUG_ON(nbits > 0);
+ BUG_ON(1);
+done:
read_unlock(&pool->lock);
}
EXPORT_SYMBOL(gen_pool_free);
--
1.7.2.3
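
For illustration (not part of the patch), here is a minimal usage
sketch of the genalloc interface after this change; the base address,
region size and orders below are made-up example values:

	#include <linux/genalloc.h>

	static int example_init(void)
	{
		struct gen_pool *pool;
		unsigned long addr;
		int err;

		/* One bitmap bit covers 2^12 == 4 KiB; allocate on any node. */
		pool = gen_pool_create(12, -1);
		if (!pool)
			return -ENOMEM;

		/* Hand a (hypothetical) 16 MiB region at 0x40000000 to the pool. */
		err = gen_pool_add(pool, 0x40000000, 16 << 20, -1);
		if (err) {
			gen_pool_destroy(pool);
			return err;
		}

		/* Ask for 1 MiB aligned to 1 MiB (alignment_order == 20). */
		addr = gen_pool_alloc_aligned(pool, 1 << 20, 20);
		if (addr)
			gen_pool_free(pool, addr, 1 << 20);

		/* gen_pool_alloc(pool, size) is now just an unaligned wrapper. */
		gen_pool_destroy(pool);
		return 0;
	}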

2010-11-19 15:58:29

by Michal Nazarewicz

[permalink] [raw]
Subject: [RFCv6 01/13] lib: rbtree: rb_root_init() function added

Added an rb_root_init() function which initialises an rb_root
structure as a red-black tree with at most one element. The
rationale is that using rb_root_init(root, node) is more
straightforward and cleaner than first initialising an empty
tree and then inserting a single node into it.
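
For example (a sketch only; "node" stands for an rb_node that would
normally be embedded in a caller-owned object):

	struct rb_root root;
	struct rb_node node;

	/* Before: build a one-element tree via an explicit insert. */
	root = RB_ROOT;
	rb_link_node(&node, NULL, &root.rb_node);
	rb_insert_color(&node, &root);

	/* After: the same result in a single call. */
	rb_root_init(&root, &node);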

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/rbtree.h | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index 7066acb..5b6dc66 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -130,6 +130,17 @@ static inline void rb_set_color(struct rb_node *rb, int color)
}

#define RB_ROOT (struct rb_root) { NULL, }
+
+static inline void rb_root_init(struct rb_root *root, struct rb_node *node)
+{
+ root->rb_node = node;
+ if (node) {
+ node->rb_parent_color = RB_BLACK; /* black, no parent */
+ node->rb_left = NULL;
+ node->rb_right = NULL;
+ }
+}
+
#define rb_entry(ptr, type, member) container_of(ptr, type, member)

#define RB_EMPTY_ROOT(root) ((root)->rb_node == NULL)
--
1.7.2.3