2010-12-15 20:41:26

by Michal Nazarewicz

Subject: [PATCHv8 00/12] Contiguous Memory Allocator

Hello everyone,

This is yet another version of CMA, this time stripped of a lot of
code and with a working migration implementation.

The Contiguous Memory Allocator (CMA) makes it possible for
device drivers to allocate big contiguous chunks of memory after
the system has booted.

For more information see the 7th patch in the set.


This version fixes some things Kamezawa suggested and separates the
code that uses MIGRATE_CMA from the rest of the code. This, I hope,
will make it easier to grasp the overall idea of CMA.


The current version is just an allocator that handles allocation of
contiguous memory blocks. The differences between this patchset and
Kamezawa's alloc_contig_pages() are:

1. alloc_contig_pages() requires MAX_ORDER alignment of allocations
which may be unsuitable for embedded systems where a few MiBs are
required.

The lack of an alignment requirement means that several threads
might try to access the same pageblock/page. To prevent this from
happening, CMA uses a mutex so that only one cm_alloc()/cm_free()
call may run at a time (a short usage sketch of these calls follows
this list).

2. CMA may use its own migratetype (MIGRATE_CMA) which behaves
similarly to ZONE_MOVABLE but can be put in arbitrary places.

This is required for us since we need to define two disjoint memory
ranges inside system RAM (i.e. in two memory banks; do not confuse
these with nodes).

3. alloc_contig_pages() scans memory in search of a range that could be
migrated. CMA, on the other hand, maintains its own allocator to
decide where to allocate memory for device drivers and then tries
to migrate pages from that part if needed. This is not strictly
required, but I somehow feel it might be faster.
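
For illustration, the driver side of the allocator boils down to the
following sketch (based on the cm_*() API introduced in patch 7;
cma_ctx stands for a CMA context pointer obtained from platform code,
as in the cma-stub later in the series, and error handling is
abbreviated):

    struct cm *chunk;
    unsigned long phys;

    /* Allocate a 1 MiB physically contiguous chunk from the context. */
    chunk = cm_alloc(cma_ctx, 1 << 20, 0);
    if (IS_ERR(chunk))
        return PTR_ERR(chunk);

    /* The physical address stays put only while the chunk is pinned. */
    phys = cm_pin(chunk);
    /* ... program the device with phys ... */
    cm_unpin(chunk);

    cm_free(chunk);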


Links to previous versions of the patchset:
v7: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
v6: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
v5: (intentionally left out as CMA v5 was identical to CMA v4)
v4: <http://article.gmane.org/gmane.linux.kernel.mm/52010>
v3: <http://article.gmane.org/gmane.linux.kernel.mm/51573>
v2: <http://article.gmane.org/gmane.linux.kernel.mm/50986>
v1: <http://article.gmane.org/gmane.linux.kernel.mm/50669>


Changelog:

v8: 1. The alloc_contig_range() function has now been separated from
CMA and put in page_alloc.c. This function tries to migrate
all LRU pages in the specified range and then allocate the
range using alloc_contig_freed_pages().

2. Support for MIGRATE_CMA has been separated from the CMA code.
I have not tested whether CMA works with ZONE_MOVABLE but I see
no reason why it shouldn't.

3. I have added a @private argument when creating CMA contexts so
that one can reserve memory and not share it with the rest of
the system. This way, CMA acts only as an allocation algorithm.

v7: 1. A lot of the functionality that handled the driver->allocator_context
mapping has been removed from the patchset. This is not to say
that this code is not needed; it's just not worth posting
everything in one patchset.

Currently, CMA is "just" an allocator. It uses its own
migratetype (MIGRATE_CMA) for defining ranges of pageblocks
which behave just like ZONE_MOVABLE but, unlike the latter, can
be put in arbitrary places.

2. The migration code that was introduced in the previous version
actually started working.


v6: 1. Most importantly, v6 introduces support for memory migration.
The implementation is not yet complete though.

Migration support means that when CMA is not using memory
reserved for it, the page allocator can allocate pages from it.
When CMA wants to use the memory, the pages have to be moved
and/or evicted so as to make room for CMA.

To make this possible, it must be guaranteed that only movable and
reclaimable pages are allocated in CMA-controlled regions.
This is done by introducing a MIGRATE_CMA migrate type that
guarantees exactly that.

Some of the migration code is "borrowed" from Kamezawa
Hiroyuki's alloc_contig_pages() implementation. The main
difference is that, thanks to the MIGRATE_CMA migrate type, CMA
assumes that memory controlled by CMA is always movable or
reclaimable, so it makes allocation decisions regardless of
whether some pages are actually allocated and migrates them
if needed.

The most interesting patches from the patchset that implement
the functionality are:

09/13: mm: alloc_contig_free_pages() added
10/13: mm: MIGRATE_CMA migration type added
11/13: mm: MIGRATE_CMA isolation functions added
12/13: mm: cma: Migration support added [wip]

Currently, the kernel panics in some situations, which I am
trying to investigate.

2. cma_pin() and cma_unpin() functions have been added (after
a conversation with Johan Mossberg). The idea is that whenever
hardware does not use the memory (no transaction is in progress)
the chunk can be moved around. This would allow defragmentation
to be implemented if desired. No defragmentation algorithm is
provided at this time.

3. Sysfs support has been replaced with debugfs. I always felt
unsure about the sysfs interface and when Greg KH pointed it
out I finally got around to rewriting it for debugfs.


v5: (intentionally left out as CMA v5 was identical to CMA v4)


v4: 1. The "asterisk" flag has been removed in favour of requiring
that platform will provide a "*=<regions>" rule in the map
attribute.

2. The terminology has been changed slightly, renaming "kind" of
memory to "type" of memory. In the previous revisions, the
documentation indicated that device drivers define memory kinds;
now they define memory types.

v3: 1. The command line parameters have been removed (and moved to
a separate patch, the fourth one). As a consequence, the
cma_set_defaults() function has been changed -- it no longer
accepts a string with a list of regions but an array of regions.

2. The "asterisk" attribute has been removed. Now, each region
has an "asterisk" flag which lets one specify whether this
region should by considered "asterisk" region.

3. SysFS support has been moved to a separate patch (the third one
in the series) and now also includes a list of regions.

v2: 1. The "cma_map" command line have been removed. In exchange,
a SysFS entry has been created under kernel/mm/contiguous.

The intended way of specifying the attributes is via
a cma_set_defaults() function called by platform initialisation
code. The "regions" attribute (the string specified by the "cma"
command line parameter) can be overridden with a command line
parameter; the other attributes can be changed at run-time
using the SysFS entries.

2. The behaviour of the "map" attribute has been modified
slightly. Currently, if no rule matches given device it is
assigned regions specified by the "asterisk" attribute. It is
by default built from the region names given in "regions"
attribute.

3. Devices can register private regions as well as regions that
can be shared but are not reserved using standard CMA
mechanisms. A private region has no name and can be accessed
only by devices that have the pointer to it.

4. The way allocators are registered has changed. Currently,
a cma_allocator_register() function is used for that purpose.
Moreover, allocators are attached to regions the first time
memory is registered from the region or when the allocator is
registered, which means that allocators can be dynamic modules
that are loaded after the kernel has booted (of course, it won't
be possible to allocate a chunk of memory from a region if its
allocator is not loaded).

5. Index of new functions:

+static inline dma_addr_t __must_check
+cma_alloc_from(const char *regions, size_t size,
+               dma_addr_t alignment)

+static inline int
+cma_info_about(struct cma_info *info, const char *regions)

+int __must_check cma_region_register(struct cma_region *reg);

+dma_addr_t __must_check
+cma_alloc_from_region(struct cma_region *reg,
+                      size_t size, dma_addr_t alignment);

+static inline dma_addr_t __must_check
+cma_alloc_from(const char *regions,
+               size_t size, dma_addr_t alignment);

+int cma_allocator_register(struct cma_allocator *alloc);


Patches in this patchset:

mm: migrate.c: fix compilation error

I had some strange compilation errors; this patch fixes them.

lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()
lib: genalloc: Generic allocator improvements

Some improvements to the genalloc API (most importantly, the
possibility to allocate memory with an alignment requirement).

mm: move some functions from memory_hotplug.c to page_isolation.c
mm: alloc_contig_freed_pages() added

Code "stolen" from Kamezawa. The first patch just moves code
around and the second provide function for "allocates" already
freed memory.

mm: alloc_contig_range() added

This is what Kamezawa asked for: a function that tries to migrate
all pages from a given range and then uses alloc_contig_freed_pages()
(defined by the previous commit) to allocate those pages.

mm: cma: Contiguous Memory Allocator added

The CMA code but with no MIGRATE_CMA support yet. This assumes
that one uses ZONE_MOVABLE. I must admit I have not tested that
setup yet but I don't see any reason why it should not work.

mm: MIGRATE_CMA migration type added
mm: MIGRATE_CMA isolation functions added
mm: MIGRATE_CMA support added to CMA

Introduction of the new migratetype and support for it in CMA.
MIGRATE_CMA works similarly to ZONE_MOVABLE except that almost
any memory range can be marked as one.

mm: cma: Test device and application added

Test device and application. Not really for merging; just for
testing.

ARM: cma: Added CMA to Aquila, Goni and c210 universal boards

A stub integration with some ARM machines, mostly to get the CMA
testing device working. Again, not for merging, just an example.


arch/arm/mach-s5pv210/Kconfig | 2 +
arch/arm/mach-s5pv210/mach-aquila.c | 2 +
arch/arm/mach-s5pv210/mach-goni.c | 2 +
arch/arm/mach-s5pv310/Kconfig | 1 +
arch/arm/mach-s5pv310/mach-universal_c210.c | 2 +
arch/arm/plat-s5p/Makefile | 2 +
arch/arm/plat-s5p/cma-stub.c | 49 +++
arch/arm/plat-s5p/include/plat/cma-stub.h | 21 ++
drivers/misc/Kconfig | 28 ++
drivers/misc/Makefile | 1 +
drivers/misc/cma-dev.c | 238 ++++++++++++++
include/linux/bitmap.h | 24 ++-
include/linux/cma.h | 290 +++++++++++++++++
include/linux/genalloc.h | 46 ++--
include/linux/mmzone.h | 43 ++-
include/linux/page-isolation.h | 50 +++-
lib/bitmap.c | 22 +-
lib/genalloc.c | 182 ++++++-----
mm/Kconfig | 36 ++
mm/Makefile | 1 +
mm/cma.c | 455 ++++++++++++++++++++++++++
mm/compaction.c | 10 +
mm/internal.h | 3 +
mm/memory_hotplug.c | 108 -------
mm/migrate.c | 2 +
mm/page_alloc.c | 286 +++++++++++++++--
mm/page_isolation.c | 126 +++++++-
tools/cma/cma-test.c | 457 +++++++++++++++++++++++++++
28 files changed, 2219 insertions(+), 270 deletions(-)
create mode 100644 arch/arm/plat-s5p/cma-stub.c
create mode 100644 arch/arm/plat-s5p/include/plat/cma-stub.h
create mode 100644 drivers/misc/cma-dev.c
create mode 100644 include/linux/cma.h
create mode 100644 mm/cma.c
create mode 100644 tools/cma/cma-test.c

--
1.7.2.3


2010-12-15 20:38:48

by Michal Nazarewicz

Subject: [PATCHv8 12/12] ARM: cma: Added CMA to Aquila, Goni and c210 universal boards

This commit adds CMA memory reservation code to Aquila, Goni and c210
universal boards.

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
arch/arm/mach-s5pv210/Kconfig | 2 +
arch/arm/mach-s5pv210/mach-aquila.c | 2 +
arch/arm/mach-s5pv210/mach-goni.c | 2 +
arch/arm/mach-s5pv310/Kconfig | 1 +
arch/arm/mach-s5pv310/mach-universal_c210.c | 2 +
arch/arm/plat-s5p/Makefile | 2 +
arch/arm/plat-s5p/cma-stub.c | 49 +++++++++++++++++++++++++++
arch/arm/plat-s5p/include/plat/cma-stub.h | 21 +++++++++++
8 files changed, 81 insertions(+), 0 deletions(-)
create mode 100644 arch/arm/plat-s5p/cma-stub.c
create mode 100644 arch/arm/plat-s5p/include/plat/cma-stub.h

diff --git a/arch/arm/mach-s5pv210/Kconfig b/arch/arm/mach-s5pv210/Kconfig
index 53aabef..b395a16 100644
--- a/arch/arm/mach-s5pv210/Kconfig
+++ b/arch/arm/mach-s5pv210/Kconfig
@@ -68,6 +68,7 @@ config MACH_AQUILA
select S5P_DEV_ONENAND
select S5PV210_SETUP_FB_24BPP
select S5PV210_SETUP_SDHCI
+ select CMA_DEVICE_POSSIBLE
help
Machine support for the Samsung Aquila target based on S5PC110 SoC

@@ -92,6 +93,7 @@ config MACH_GONI
select S5PV210_SETUP_I2C2
select S5PV210_SETUP_KEYPAD
select S5PV210_SETUP_SDHCI
+ select CMA_DEVICE_POSSIBLE
help
Machine support for Samsung GONI board
S5PC110(MCP) is one of package option of S5PV210
diff --git a/arch/arm/mach-s5pv210/mach-aquila.c b/arch/arm/mach-s5pv210/mach-aquila.c
index 28677ca..8608a16 100644
--- a/arch/arm/mach-s5pv210/mach-aquila.c
+++ b/arch/arm/mach-s5pv210/mach-aquila.c
@@ -39,6 +39,7 @@
#include <plat/fb.h>
#include <plat/fimc-core.h>
#include <plat/sdhci.h>
+#include <plat/cma-stub.h>

/* Following are default values for UCON, ULCON and UFCON UART registers */
#define AQUILA_UCON_DEFAULT (S3C2410_UCON_TXILEVEL | \
@@ -690,4 +691,5 @@ MACHINE_START(AQUILA, "Aquila")
.map_io = aquila_map_io,
.init_machine = aquila_machine_init,
.timer = &s3c24xx_timer,
+ .reserve = cma_mach_reserve,
MACHINE_END
diff --git a/arch/arm/mach-s5pv210/mach-goni.c b/arch/arm/mach-s5pv210/mach-goni.c
index b1dcf96..b1bf079 100644
--- a/arch/arm/mach-s5pv210/mach-goni.c
+++ b/arch/arm/mach-s5pv210/mach-goni.c
@@ -45,6 +45,7 @@
#include <plat/keypad.h>
#include <plat/sdhci.h>
#include <plat/clock.h>
+#include <plat/cma-stub.h>

/* Following are default values for UCON, ULCON and UFCON UART registers */
#define GONI_UCON_DEFAULT (S3C2410_UCON_TXILEVEL | \
@@ -865,4 +866,5 @@ MACHINE_START(GONI, "GONI")
.map_io = goni_map_io,
.init_machine = goni_machine_init,
.timer = &s3c24xx_timer,
+ .reserve = cma_mach_reserve,
MACHINE_END
diff --git a/arch/arm/mach-s5pv310/Kconfig b/arch/arm/mach-s5pv310/Kconfig
index d64efe0..ae4e0da 100644
--- a/arch/arm/mach-s5pv310/Kconfig
+++ b/arch/arm/mach-s5pv310/Kconfig
@@ -85,6 +85,7 @@ config MACH_UNIVERSAL_C210
select S5P_DEV_ONENAND
select S3C_DEV_I2C1
select S5PV310_SETUP_I2C1
+ select CMA_DEVICE_POSSIBLE
help
Machine support for Samsung Mobile Universal S5PC210 Reference
Board. S5PC210(MCP) is one of package option of S5PV310
diff --git a/arch/arm/mach-s5pv310/mach-universal_c210.c b/arch/arm/mach-s5pv310/mach-universal_c210.c
index 16d8fc0..d65703a 100644
--- a/arch/arm/mach-s5pv310/mach-universal_c210.c
+++ b/arch/arm/mach-s5pv310/mach-universal_c210.c
@@ -21,6 +21,7 @@
#include <plat/s5pv310.h>
#include <plat/cpu.h>
#include <plat/devs.h>
+#include <plat/cma-stub.h>

#include <mach/map.h>

@@ -152,6 +153,7 @@ MACHINE_START(UNIVERSAL_C210, "UNIVERSAL_C210")
.boot_params = S5P_PA_SDRAM + 0x100,
.init_irq = s5pv310_init_irq,
.map_io = universal_map_io,
+ .reserve = cma_mach_reserve,
.init_machine = universal_machine_init,
.timer = &s5pv310_timer,
MACHINE_END
diff --git a/arch/arm/plat-s5p/Makefile b/arch/arm/plat-s5p/Makefile
index de65238..6fdb6ce 100644
--- a/arch/arm/plat-s5p/Makefile
+++ b/arch/arm/plat-s5p/Makefile
@@ -28,3 +28,5 @@ obj-$(CONFIG_S5P_DEV_FIMC0) += dev-fimc0.o
obj-$(CONFIG_S5P_DEV_FIMC1) += dev-fimc1.o
obj-$(CONFIG_S5P_DEV_FIMC2) += dev-fimc2.o
obj-$(CONFIG_S5P_DEV_ONENAND) += dev-onenand.o
+
+obj-$(CONFIG_CMA) += cma-stub.o
diff --git a/arch/arm/plat-s5p/cma-stub.c b/arch/arm/plat-s5p/cma-stub.c
new file mode 100644
index 0000000..c175ba8
--- /dev/null
+++ b/arch/arm/plat-s5p/cma-stub.c
@@ -0,0 +1,49 @@
+/*
+ * This file is just a quick and dirty hack to get CMA testing device
+ * working. The cma_mach_reserve() should be called as mach's reserve
+ * callback. CMA testing device will use cma_ctx for allocations.
+ */
+
+#include <plat/cma-stub.h>
+
+#include <linux/cma.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+
+struct cma *cma_ctx;
+
+#define cma_size (32UL << 20) /* 32 MiB */
+
+static unsigned long cma_start __initdata;
+
+void __init cma_mach_reserve(void)
+{
+ unsigned long start = cma_reserve(0, cma_size, 0, true);
+ if (IS_ERR_VALUE(start))
+ printk(KERN_WARNING "cma: unable to reserve %lu for CMA: %d\n",
+ cma_size >> 20, (int)start);
+ else
+ cma_start = start;
+}
+
+static int __init cma_mach_init(void)
+{
+ int ret = -ENOMEM;
+
+ if (cma_start) {
+ struct cma *ctx = cma_create(cma_start, cma_size);
+ if (IS_ERR(ctx)) {
+ ret = PTR_ERR(ctx);
+ printk(KERN_WARNING
+ "cma: cma_create(%p, %p) failed: %d\n",
+ (void *)cma_start, (void *)cma_size, ret);
+ } else {
+ cma_ctx = ctx;
+ ret = 0;
+ }
+ }
+
+ return ret;
+}
+device_initcall(cma_mach_init);
diff --git a/arch/arm/plat-s5p/include/plat/cma-stub.h b/arch/arm/plat-s5p/include/plat/cma-stub.h
new file mode 100644
index 0000000..a24a03b
--- /dev/null
+++ b/arch/arm/plat-s5p/include/plat/cma-stub.h
@@ -0,0 +1,21 @@
+/*
+ * This file is just a quick and dirty hack to get CMA testing device
+ * working. The cma_mach_reserve() should be called as mach's reserve
+ * callback. CMA testing device will use cma_ctx for allocations.
+ */
+
+struct cma;
+
+#ifdef CONFIG_CMA
+
+extern struct cma *cma_ctx;
+
+void cma_mach_reserve(void);
+
+#else
+
+#define cma_ctx ((struct cma *)NULL)
+
+#define cma_mach_reserve NULL
+
+#endif
--
1.7.2.3

2010-12-15 20:38:51

by Michal Nazarewicz

Subject: [PATCHv8 07/12] mm: cma: Contiguous Memory Allocator added

The Contiguous Memory Allocator is a set of functions that lets
one initialise a region of memory which can then be used to
allocate contiguous memory chunks.

CMA allows for the creation of private and non-private contexts.
The former is reserved for CMA and no other kernel subsystem can
use it. The latter allows movable pages to be allocated within
CMA's managed memory so that it can be used for page cache when
CMA devices do not use it.
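
The cma-stub added for the ARM boards elsewhere in this series is
a complete example of the platform side; condensed, and using only the
declarations from include/linux/cma.h below, the flow is roughly as
follows (function and variable names here are made up, and
cma_reserve() grows an extra argument in a later patch):

    /* In the machine's reserve callback, while memblock is still usable: */
    static unsigned long my_cma_start __initdata;

    void __init my_mach_reserve(void)
    {
        my_cma_start = cma_reserve(0, 16 << 20, 0);    /* 16 MiB, anywhere */
    }

    /* Once SLAB is up (e.g. from an initcall), turn it into a context: */
    static struct cma *my_cma_ctx;

    static int __init my_cma_init(void)
    {
        if (!my_cma_start || IS_ERR_VALUE(my_cma_start))
            return -ENOMEM;
        my_cma_ctx = cma_create(my_cma_start, 16 << 20, 0, true); /* private */
        return IS_ERR(my_cma_ctx) ? PTR_ERR(my_cma_ctx) : 0;
    }

A private context such as this one skips the ZONE_MOVABLE check and
never migrates pages; the reserved range is managed by CMA alone.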

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/cma.h | 219 ++++++++++++++++++++++++++++++++++
mm/Kconfig | 22 ++++
mm/Makefile | 1 +
mm/cma.c | 328 +++++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 570 insertions(+), 0 deletions(-)
create mode 100644 include/linux/cma.h
create mode 100644 mm/cma.c

diff --git a/include/linux/cma.h b/include/linux/cma.h
new file mode 100644
index 0000000..e9575fd
--- /dev/null
+++ b/include/linux/cma.h
@@ -0,0 +1,219 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ * The Contiguous Memory Allocator (CMA) makes it possible for
+ * device drivers to allocate big contiguous chunks of memory after
+ * the system has booted.
+ *
+ * It requires some machine- and/or platform-specific initialisation
+ * code which prepares memory ranges to be used with CMA and later,
+ * device drivers can allocate memory from those ranges.
+ *
+ * Why is it needed?
+ *
+ * Various devices on embedded systems have no scatter-gather and/or
+ * IO map support and require contiguous blocks of memory to
+ * operate. They include devices such as cameras, hardware video
+ * coders, etc.
+ *
+ * Such devices often require big memory buffers (a full HD frame
+ * is, for instance, more than 2 mega pixels large, i.e. more than 6
+ * MB of memory), which makes mechanisms such as kmalloc() or
+ * alloc_page() ineffective.
+ *
+ * At the same time, a solution where a big memory region is
+ * reserved for a device is suboptimal since often more memory is
+ * reserved than strictly required and, moreover, the memory is
+ * inaccessible to page system even if device drivers don't use it.
+ *
+ * CMA tries to solve this issue by operating on memory regions
+ * where only movable pages can be allocated from. This way, kernel
+ * can use the memory for pagecache and when device driver requests
+ * it, allocated pages can be migrated.
+ *
+ * Driver usage
+ *
+ * For device driver to use CMA it needs to have a pointer to a CMA
+ * context represented by a struct cma (which is an opaque data
+ * type).
+ *
+ * Once such pointer is obtained, device driver may allocate
+ * contiguous memory chunk using the following function:
+ *
+ * cm_alloc()
+ *
+ * This function returns a pointer to struct cm (another opaque data
+ * type) which represents a contiguous memory chunk. This pointer
+ * may be used with the following functions:
+ *
+ * cm_free() -- frees allocated contiguous memory
+ * cm_pin() -- pins memory
+ * cm_unpin() -- unpins memory
+ * cm_vmap() -- maps memory in kernel space
+ * cm_vunmap() -- unmaps memory from kernel space
+ *
+ * See the respective functions for more information.
+ *
+ * Platform/machine integration
+ *
+ * For device drivers to be able to use CMA platform or machine
+ * initialisation code must create a CMA context and pass it to
+ * device drivers. The latter may be done by a global variable or
+ * a platform/machine specific function. For the former CMA
+ * provides the following functions:
+ *
+ * cma_reserve()
+ * cma_create()
+ *
+ * The cma_reserve() function must be called when memblock is still
+ * operational and reserving memory with it is still possible. On
+ * ARM platform the "reserve" machine callback is a perfect place to
+ * call it.
+ *
+ * The last function creates a CMA context on a range of previously
+ * initialised memory addresses. Because it uses kmalloc() it needs
+ * to be called after SLAB is initialised.
+ */
+
+/***************************** Kernel level API *****************************/
+
+#if defined __KERNEL__ && defined CONFIG_CMA
+
+/* CMA context */
+struct cma;
+/* Contiguous Memory chunk */
+struct cm;
+
+/**
+ * cma_reserve() - reserves memory.
+ * @start: start address of the memory range in bytes hint; if unsure
+ * pass zero.
+ * @size: size of the memory to reserve in bytes.
+ * @alignment: desired alignment in bytes (must be power of two or zero).
+ *
+ * It will use memblock to allocate memory. @start and @size will be
+ * aligned to PAGE_SIZE.
+ *
+ * Returns reserved's area physical address or value that yields true
+ * when checked with IS_ERR_VALUE().
+ */
+unsigned long cma_reserve(unsigned long start, unsigned long size,
+ unsigned long alignment);
+
+/**
+ * cma_create() - creates a CMA context.
+ * @start: start address of the context in bytes.
+ * @size: size of the context in bytes.
+ * @min_alignment: minimal desired alignment or zero.
+ * @private: whether to create private context.
+ *
+ * The range must be page aligned. Different contexts cannot overlap.
+ *
+ * Unless @private is true the memory range must lay in ZONE_MOVABLE.
+ * If @private is true no underlaying memory checking is done and
+ * during allocation no pages migration will be performed - it is
+ * assumed that the memory is reserved and only CMA manages it.
+ *
+ * @start and @size must be page and @min_alignment alignment.
+ * @min_alignment specifies the minimal alignment that user will be
+ * able to request through cm_alloc() function. In most cases one
+ * will probably pass zero as @min_alignment but if the CMA context
+ * will be used only for, say, 1 MiB blocks passing 1 << 20 as
+ * @min_alignment may increase performance and reduce memory usage
+ * slightly.
+ *
+ * Because this function uses kmalloc() it must be called after SLAB
+ * is initialised. This in particular means that it cannot be called
+ * just after cma_reserve() since the former needs to be run way
+ * earlier.
+ *
+ * Returns pointer to CMA context or a pointer-error on error.
+ */
+struct cma *cma_create(unsigned long start, unsigned long size,
+ unsigned long min_alignment, _Bool private);
+
+/**
+ * cma_destroy() - destroys CMA context.
+ * @cma: context to destroy.
+ */
+void cma_destroy(struct cma *cma);
+
+/**
+ * cm_alloc() - allocates contiguous memory.
+ * @cma: CMA context to use.
+ * @size: desired chunk size in bytes (must be non-zero).
+ * @alignent: desired minimal alignment in bytes (must be power of two
+ * or zero).
+ *
+ * Returns pointer to structure representing contiguous memory or
+ * a pointer-error on error.
+ */
+struct cm *cm_alloc(struct cma *cma, unsigned long size,
+ unsigned long alignment);
+
+/**
+ * cm_free() - frees contiguous memory.
+ * @cm: contiguous memory to free.
+ *
+ * The contiguous memory must not be pinned (see cm_pin()) and
+ * must not be mapped to kernel space (cm_vmap()).
+ */
+void cm_free(struct cm *cm);
+
+/**
+ * cm_pin() - pins contiguous memory.
+ * @cm: contiguous memory to pin.
+ *
+ * Pinning is required to obtain contiguous memory's physical address.
+ * While memory is pinned the memory will remain valid; it may change
+ * if memory is unpinned and then pinned again. This facility is
+ * provided so that memory defragmentation can be implemented inside
+ * CMA.
+ *
+ * Each call to cm_pin() must be accompanied by call to cm_unpin() and
+ * the calls may be nested.
+ *
+ * Returns chunk's physical address or a value that yields true when
+ * tested with IS_ERR_VALUE().
+ */
+unsigned long cm_pin(struct cm *cm);
+
+/**
+ * cm_unpin() - unpins contiguous memory.
+ * @cm: contiguous memory to unpin.
+ *
+ * See cm_pin().
+ */
+void cm_unpin(struct cm *cm);
+
+/**
+ * cm_vmap() - maps memory to kernel space (or returns existing mapping).
+ * @cm: contiguous memory to map.
+ *
+ * Each call to cm_vmap() must be accompanied with call to cm_vunmap()
+ * and the calls may be nested.
+ *
+ * Returns kernel virtual address or a pointer-error.
+ */
+void *cm_vmap(struct cm *cm);
+
+/**
+ * cm_vunmap() - unmaps memory from kernel space.
+ * @cm: contiguous memory to unmap.
+ *
+ * See cm_vmap().
+ */
+void cm_vunmap(struct cm *cm);
+
+#endif
+
+#endif
diff --git a/mm/Kconfig b/mm/Kconfig
index b911ad3..2beab4d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -331,3 +331,25 @@ config CLEANCACHE
in a negligible performance hit.

If unsure, say Y to enable cleancache
+
+config CMA
+ bool "Contiguous Memory Allocator"
+ select MIGRATION
+ select GENERIC_ALLOCATOR
+ help
+ This enables the Contiguous Memory Allocator which allows drivers
+ to allocate big physically-contiguous blocks of memory for use with
+ hardware components that do not support I/O map nor scatter-gather.
+
+ For more information see <include/linux/cma.h>. If unsure, say "n".
+
+config CMA_DEBUG
+ bool "CMA debug messages (DEVELOPEMENT)"
+ depends on CMA
+ help
+ Turns on debug messages in CMA. This produces KERN_DEBUG
+ messages for every CMA call as well as various messages while
+ processing calls such as cm_alloc(). This option does not
+ affect warning and error messages.
+
+ This is mostly used during development. If unsure, say "n".
diff --git a/mm/Makefile b/mm/Makefile
index 0b08d1c..c6a84f1 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -43,3 +43,4 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
obj-$(CONFIG_CLEANCACHE) += cleancache.o
+obj-$(CONFIG_CMA) += cma.o
diff --git a/mm/cma.c b/mm/cma.c
new file mode 100644
index 0000000..d82361b
--- /dev/null
+++ b/mm/cma.c
@@ -0,0 +1,328 @@
+/*
+ * Contiguous Memory Allocator framework
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+/*
+ * See include/linux/cma.h for details.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+# define DEBUG
+#endif
+
+#include <linux/cma.h>
+
+#ifndef CONFIG_NO_BOOTMEM
+# include <linux/bootmem.h>
+#endif
+#ifdef CONFIG_HAVE_MEMBLOCK
+# include <linux/memblock.h>
+#endif
+
+#include <linux/err.h>
+#include <linux/genalloc.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+
+#include "internal.h"
+
+/* XXX Revisit */
+#ifdef phys_to_pfn
+/* nothing to do */
+#elif defined __phys_to_pfn
+# define phys_to_pfn __phys_to_pfn
+#else
+# warning correct phys_to_pfn implementation needed
+static unsigned long phys_to_pfn(phys_addr_t phys)
+{
+ return virt_to_pfn(phys_to_virt(phys));
+}
+#endif
+
+
+/************************* Initialise CMA *************************/
+
+unsigned long cma_reserve(unsigned long start, unsigned long size,
+ unsigned long alignment)
+{
+ pr_debug("%s(%p+%p/%p)\n", __func__, (void *)start, (void *)size,
+ (void *)alignment);
+
+ /* Sanity checks */
+ if (!size || (alignment & (alignment - 1)))
+ return (unsigned long)-EINVAL;
+
+ /* Sanitise input arguments */
+ start = PAGE_ALIGN(start);
+ size = PAGE_ALIGN(size);
+ if (alignment < PAGE_SIZE)
+ alignment = PAGE_SIZE;
+
+ /* Reserve memory */
+ if (start) {
+ if (memblock_is_region_reserved(start, size) ||
+ memblock_reserve(start, size) < 0)
+ return (unsigned long)-EBUSY;
+ } else {
+ /*
+ * Use __memblock_alloc_base() since
+ * memblock_alloc_base() panic()s.
+ */
+ u64 addr = __memblock_alloc_base(size, alignment, 0);
+ if (!addr) {
+ return (unsigned long)-ENOMEM;
+ } else if (addr + size > ~(unsigned long)0) {
+ memblock_free(addr, size);
+ return (unsigned long)-EOVERFLOW;
+ } else {
+ start = addr;
+ }
+ }
+
+ return start;
+}
+
+
+/************************** CMA context ***************************/
+
+struct cma {
+ bool migrate;
+ struct gen_pool *pool;
+};
+
+static int __cma_check_range(unsigned long start, unsigned long size)
+{
+ unsigned long pfn, count;
+ struct page *page;
+ struct zone *zone;
+
+ start = phys_to_pfn(start);
+ if (WARN_ON(!pfn_valid(start)))
+ return -EINVAL;
+
+ if (WARN_ON(page_zonenum(pfn_to_page(start)) != ZONE_MOVABLE))
+ return -EINVAL;
+
+ /* First check if all pages are valid and in the same zone */
+ zone = page_zone(pfn_to_page(start));
+ count = size >> PAGE_SHIFT;
+ pfn = start;
+ while (++pfn, --count) {
+ if (WARN_ON(!pfn_valid(pfn)) ||
+ WARN_ON(page_zone(pfn_to_page(pfn)) != zone))
+ return -EINVAL;
+ }
+
+ /* Now check migratetype of their pageblocks. */
+ start = start & ~(pageblock_nr_pages - 1);
+ pfn = ALIGN(pfn, pageblock_nr_pages);
+ page = pfn_to_page(start);
+ count = (pfn - start) >> PAGE_SHIFT;
+ do {
+ if (WARN_ON(get_pageblock_migratetype(page) != MIGRATE_MOVABLE))
+ return -EINVAL;
+ page += pageblock_nr_pages;
+ } while (--count);
+
+ return 0;
+}
+
+struct cma *cma_create(unsigned long start, unsigned long size,
+ unsigned long min_alignment, bool private)
+{
+ struct gen_pool *pool;
+ struct cma *cma;
+ int ret;
+
+ pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+ if (!size)
+ return ERR_PTR(-EINVAL);
+ if (min_alignment & (min_alignment - 1))
+ return ERR_PTR(-EINVAL);
+ if (min_alignment < PAGE_SIZE)
+ min_alignment = PAGE_SIZE;
+ if ((start | size) & (min_alignment - 1))
+ return ERR_PTR(-EINVAL);
+ if (start + size < start)
+ return ERR_PTR(-EOVERFLOW);
+
+ if (!private) {
+ ret = __cma_check_range(start, size);
+ if (ret < 0)
+ return ERR_PTR(ret);
+ }
+
+ cma = kmalloc(sizeof *cma, GFP_KERNEL);
+ if (!cma)
+ return ERR_PTR(-ENOMEM);
+
+ pool = gen_pool_create(ffs(min_alignment) - 1, -1);
+ if (!pool) {
+ ret = -ENOMEM;
+ goto error1;
+ }
+
+ ret = gen_pool_add(pool, start, size, -1);
+ if (unlikely(ret))
+ goto error2;
+
+ cma->migrate = !private;
+ cma->pool = pool;
+
+ pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
+ return cma;
+
+error2:
+ gen_pool_destroy(pool);
+error1:
+ kfree(cma);
+ return ERR_PTR(ret);
+}
+
+void cma_destroy(struct cma *cma)
+{
+ pr_debug("%s(<%p>)\n", __func__, (void *)cma);
+ gen_pool_destroy(cma->pool);
+}
+
+
+/************************* Allocate and free *************************/
+
+struct cm {
+ struct cma *cma;
+ unsigned long phys, size;
+ atomic_t pinned, mapped;
+};
+
+/* Protects cm_alloc(), cm_free() as well as gen_pools of each cm. */
+static DEFINE_MUTEX(cma_mutex);
+
+struct cm *cm_alloc(struct cma *cma, unsigned long size,
+ unsigned long alignment)
+{
+ unsigned long start;
+ int ret = -ENOMEM;
+ struct cm *cm;
+
+ pr_debug("%s(<%p>, %p/%p)\n", __func__, (void *)cma,
+ (void *)size, (void *)alignment);
+
+ if (!size || (alignment & (alignment - 1)))
+ return ERR_PTR(-EINVAL);
+ size = PAGE_ALIGN(size);
+
+ cm = kmalloc(sizeof *cm, GFP_KERNEL);
+ if (!cm)
+ return ERR_PTR(-ENOMEM);
+
+ mutex_lock(&cma_mutex);
+
+ start = gen_pool_alloc_aligned(cma->pool, size,
+ alignment ? ffs(alignment) - 1 : 0);
+ if (!start)
+ goto error1;
+
+ if (cma->migrate) {
+ unsigned long pfn = phys_to_pfn(start);
+ ret = alloc_contig_range(pfn, pfn + (size >> PAGE_SHIFT), 0);
+ if (ret)
+ goto error2;
+ }
+
+ mutex_unlock(&cma_mutex);
+
+ cm->cma = cma;
+ cm->phys = start;
+ cm->size = size;
+ atomic_set(&cm->pinned, 0);
+ atomic_set(&cm->mapped, 0);
+
+ pr_debug("%s(): returning [%p]\n", __func__, (void *)cm);
+ return cm;
+
+error2:
+ gen_pool_free(cma->pool, start, size);
+error1:
+ mutex_unlock(&cma_mutex);
+ kfree(cm);
+ return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(cm_alloc);
+
+void cm_free(struct cm *cm)
+{
+ pr_debug("%s([%p])\n", __func__, (void *)cm);
+
+ if (WARN_ON(atomic_read(&cm->pinned) || atomic_read(&cm->mapped)))
+ return;
+
+ mutex_lock(&cma_mutex);
+
+ gen_pool_free(cm->cma->pool, cm->phys, cm->size);
+ if (cm->cma->migrate)
+ free_contig_pages(phys_to_page(cm->phys),
+ cm->size >> PAGE_SHIFT);
+
+ mutex_unlock(&cma_mutex);
+
+ kfree(cm);
+}
+EXPORT_SYMBOL_GPL(cm_free);
+
+
+/************************* Mapping and addresses *************************/
+
+/*
+ * Currently no-operations but keep reference counters for error
+ * checking.
+ */
+
+unsigned long cm_pin(struct cm *cm)
+{
+ pr_debug("%s([%p])\n", __func__, (void *)cm);
+ atomic_inc(&cm->pinned);
+ return cm->phys;
+}
+EXPORT_SYMBOL_GPL(cm_pin);
+
+void cm_unpin(struct cm *cm)
+{
+ pr_debug("%s([%p])\n", __func__, (void *)cm);
+ WARN_ON(!atomic_add_unless(&cm->pinned, -1, 0));
+}
+EXPORT_SYMBOL_GPL(cm_unpin);
+
+void *cm_vmap(struct cm *cm)
+{
+ pr_debug("%s([%p])\n", __func__, (void *)cm);
+ atomic_inc(&cm->mapped);
+ /*
+ * XXX We should probably do something more clever in the
+ * future. The memory might be highmem after all.
+ */
+ return phys_to_virt(cm->phys);
+}
+EXPORT_SYMBOL_GPL(cm_vmap);
+
+void cm_vunmap(struct cm *cm)
+{
+ pr_debug("%s([%p])\n", __func__, (void *)cm);
+ WARN_ON(!atomic_add_unless(&cm->mapped, -1, 0));
+}
+EXPORT_SYMBOL_GPL(cm_vunmap);
--
1.7.2.3

2010-12-15 20:38:45

by Michal Nazarewicz

Subject: [PATCHv8 05/12] mm: alloc_contig_freed_pages() added

From: KAMEZAWA Hiroyuki <[email protected]>

This commit introduces the alloc_contig_freed_pages() function
which allocates (i.e. removes from the buddy system) free pages
in a range. The caller has to guarantee that all pages in the
range are in the buddy system.

Along with this function, a free_contig_pages() function is
provided which frees all (or a subset of) the pages allocated
with alloc_contig_freed_pages().

Michal Nazarewicz has modified the function to make it easier
to allocate ranges that are not MAX_ORDER_NR_PAGES aligned by
making it return the pfn of one past the last allocated page.
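
To illustrate the return-value convention, a hypothetical caller that
wants exactly the range [start, end) could do something like the
sketch below (alloc_contig_range(), added by the next patch, is the
real user):

    unsigned long allocated_end;

    /* All pages in [start, end) must already be free and in the buddy
     * system; the last buddy block removed may reach past 'end'. */
    allocated_end = alloc_contig_freed_pages(start, end, GFP_KERNEL);

    /* Hand back the unwanted tail, if any. */
    if (allocated_end > end)
        free_contig_pages(pfn_to_page(end), allocated_end - end);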

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Michal Nazarewicz <[email protected]>
---
include/linux/page-isolation.h | 3 ++
mm/page_alloc.c | 44 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 58cdbac..f1417ed 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -32,6 +32,9 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
*/
extern int set_migratetype_isolate(struct page *page);
extern void unset_migratetype_isolate(struct page *page);
+extern unsigned long alloc_contig_freed_pages(unsigned long start,
+ unsigned long end, gfp_t flag);
+extern void free_contig_pages(struct page *page, int nr_pages);

/*
* For migration.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 826ba69..be240a3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5425,6 +5425,50 @@ out:
spin_unlock_irqrestore(&zone->lock, flags);
}

+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
+ gfp_t flag)
+{
+ unsigned long pfn = start, count;
+ struct page *page;
+ struct zone *zone;
+ int order;
+
+ VM_BUG_ON(!pfn_valid(start));
+ zone = page_zone(pfn_to_page(start));
+
+ spin_lock_irq(&zone->lock);
+
+ page = pfn_to_page(pfn);
+ for (;;) {
+ VM_BUG_ON(page_count(page) || !PageBuddy(page));
+ list_del(&page->lru);
+ order = page_order(page);
+ zone->free_area[order].nr_free--;
+ rmv_page_order(page);
+ __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+ pfn += 1 << order;
+ if (pfn >= end)
+ break;
+ VM_BUG_ON(!pfn_valid(pfn));
+ page += 1 << order;
+ }
+
+ spin_unlock_irq(&zone->lock);
+
+ /* After this, pages in the range can be freed one by one */
+ page = pfn_to_page(start);
+ for (count = pfn - start; count; --count, ++page)
+ prep_new_page(page, 0, flag);
+
+ return pfn;
+}
+
+void free_contig_pages(struct page *page, int nr_pages)
+{
+ for (; nr_pages; --nr_pages, ++page)
+ __free_page(page);
+}
+
#ifdef CONFIG_MEMORY_HOTREMOVE
/*
* All pages in the range must be isolated before calling this.
--
1.7.2.3

2010-12-15 20:38:47

by Michal Nazarewicz

Subject: [PATCHv8 04/12] mm: move some functions from memory_hotplug.c to page_isolation.c

From: KAMEZAWA Hiroyuki <[email protected]>

Memory hotplug is logic for making pages unused in a specified
range of pfns, so some of its core logic can be used for other
purposes, such as allocating a very large contiguous memory block.

This patch moves some functions from mm/memory_hotplug.c to
mm/page_isolation.c. This helps in adding a function for large
allocations to page_isolation.c using the memory-unplug technique.
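
With the helpers exported, a caller can drive them roughly as in the
sketch below (a simplified version of the pattern memory offlining
uses; real callers also check test_pages_in_a_zone() first and retry
when migration fails):

    /* Try to migrate every LRU page found in [start_pfn, end_pfn). */
    unsigned long pfn = start_pfn;

    while (pfn < end_pfn) {
        pfn = scan_lru_pages(pfn, end_pfn);  /* next LRU page, 0 if none */
        if (!pfn)
            break;
        if (do_migrate_range(pfn, end_pfn))  /* non-zero means failures */
            break;
    }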

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
[mina86: reworded commit message]
Signed-off-by: Michal Nazarewicz <[email protected]>
---
include/linux/page-isolation.h | 7 +++
mm/memory_hotplug.c | 108 --------------------------------------
mm/page_isolation.c | 111 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 118 insertions(+), 108 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 051c1b1..58cdbac 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
extern int set_migratetype_isolate(struct page *page);
extern void unset_migratetype_isolate(struct page *page);

+/*
+ * For migration.
+ */
+
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
+unsigned long scan_lru_pages(unsigned long start, unsigned long end);
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);

#endif
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2c6523a..2b18cb5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -634,114 +634,6 @@ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
}

/*
- * Confirm all pages in a range [start, end) is belongs to the same zone.
- */
-static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
-{
- unsigned long pfn;
- struct zone *zone = NULL;
- struct page *page;
- int i;
- for (pfn = start_pfn;
- pfn < end_pfn;
- pfn += MAX_ORDER_NR_PAGES) {
- i = 0;
- /* This is just a CONFIG_HOLES_IN_ZONE check.*/
- while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
- i++;
- if (i == MAX_ORDER_NR_PAGES)
- continue;
- page = pfn_to_page(pfn + i);
- if (zone && page_zone(page) != zone)
- return 0;
- zone = page_zone(page);
- }
- return 1;
-}
-
-/*
- * Scanning pfn is much easier than scanning lru list.
- * Scan pfn from start to end and Find LRU page.
- */
-static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
-{
- unsigned long pfn;
- struct page *page;
- for (pfn = start; pfn < end; pfn++) {
- if (pfn_valid(pfn)) {
- page = pfn_to_page(pfn);
- if (PageLRU(page))
- return pfn;
- }
- }
- return 0;
-}
-
-static struct page *
-hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
-{
- /* This should be improooooved!! */
- return alloc_page(GFP_HIGHUSER_MOVABLE);
-}
-
-#define NR_OFFLINE_AT_ONCE_PAGES (256)
-static int
-do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
-{
- unsigned long pfn;
- struct page *page;
- int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
- int not_managed = 0;
- int ret = 0;
- LIST_HEAD(source);
-
- for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
- if (!pfn_valid(pfn))
- continue;
- page = pfn_to_page(pfn);
- if (!page_count(page))
- continue;
- /*
- * We can skip free pages. And we can only deal with pages on
- * LRU.
- */
- ret = isolate_lru_page(page);
- if (!ret) { /* Success */
- list_add_tail(&page->lru, &source);
- move_pages--;
- inc_zone_page_state(page, NR_ISOLATED_ANON +
- page_is_file_cache(page));
-
- } else {
-#ifdef CONFIG_DEBUG_VM
- printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
- pfn);
- dump_page(page);
-#endif
- /* Becasue we don't have big zone->lock. we should
- check this again here. */
- if (page_count(page)) {
- not_managed++;
- ret = -EBUSY;
- break;
- }
- }
- }
- if (!list_empty(&source)) {
- if (not_managed) {
- putback_lru_pages(&source);
- goto out;
- }
- /* this function returns # of failed pages */
- ret = migrate_pages(&source, hotremove_migrate_alloc, 0, 1);
- if (ret)
- putback_lru_pages(&source);
- }
-out:
- return ret;
-}
-
-/*
* remove from free_area[] and mark all as Reserved.
*/
static int
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 4ae42bb..077cf19 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -5,6 +5,9 @@
#include <linux/mm.h>
#include <linux/page-isolation.h>
#include <linux/pageblock-flags.h>
+#include <linux/memcontrol.h>
+#include <linux/migrate.h>
+#include <linux/mm_inline.h>
#include "internal.h"

static inline struct page *
@@ -139,3 +142,111 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
spin_unlock_irqrestore(&zone->lock, flags);
return ret ? 0 : -EBUSY;
}
+
+
+/*
+ * Confirm all pages in a range [start, end) is belongs to the same zone.
+ */
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long pfn;
+ struct zone *zone = NULL;
+ struct page *page;
+ int i;
+ for (pfn = start_pfn;
+ pfn < end_pfn;
+ pfn += MAX_ORDER_NR_PAGES) {
+ i = 0;
+ /* This is just a CONFIG_HOLES_IN_ZONE check.*/
+ while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
+ i++;
+ if (i == MAX_ORDER_NR_PAGES)
+ continue;
+ page = pfn_to_page(pfn + i);
+ if (zone && page_zone(page) != zone)
+ return 0;
+ zone = page_zone(page);
+ }
+ return 1;
+}
+
+/*
+ * Scanning pfn is much easier than scanning lru list.
+ * Scan pfn from start to end and Find LRU page.
+ */
+unsigned long scan_lru_pages(unsigned long start, unsigned long end)
+{
+ unsigned long pfn;
+ struct page *page;
+ for (pfn = start; pfn < end; pfn++) {
+ if (pfn_valid(pfn)) {
+ page = pfn_to_page(pfn);
+ if (PageLRU(page))
+ return pfn;
+ }
+ }
+ return 0;
+}
+
+struct page *
+hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
+{
+ /* This should be improooooved!! */
+ return alloc_page(GFP_HIGHUSER_MOVABLE);
+}
+
+#define NR_OFFLINE_AT_ONCE_PAGES (256)
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long pfn;
+ struct page *page;
+ int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
+ int not_managed = 0;
+ int ret = 0;
+ LIST_HEAD(source);
+
+ for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+ if (!pfn_valid(pfn))
+ continue;
+ page = pfn_to_page(pfn);
+ if (!page_count(page))
+ continue;
+ /*
+ * We can skip free pages. And we can only deal with pages on
+ * LRU.
+ */
+ ret = isolate_lru_page(page);
+ if (!ret) { /* Success */
+ list_add_tail(&page->lru, &source);
+ move_pages--;
+ inc_zone_page_state(page, NR_ISOLATED_ANON +
+ page_is_file_cache(page));
+
+ } else {
+#ifdef CONFIG_DEBUG_VM
+ printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
+ pfn);
+ dump_page(page);
+#endif
+ /* Because we don't have big zone->lock. we should
+ check this again here. */
+ if (page_count(page)) {
+ not_managed++;
+ ret = -EBUSY;
+ break;
+ }
+ }
+ }
+ if (!list_empty(&source)) {
+ if (not_managed) {
+ putback_lru_pages(&source);
+ goto out;
+ }
+ /* this function returns # of failed pages */
+ ret = migrate_pages(&source, hotremove_migrate_alloc, 0, 1);
+ if (ret)
+ putback_lru_pages(&source);
+ }
+out:
+ return ret;
+}
--
1.7.2.3

2010-12-15 20:39:34

by Michal Nazarewicz

Subject: [PATCHv8 10/12] mm: MIGRATE_CMA support added to CMA

This commit adds MIGRATE_CMA migratetype support to CMA.
The advantage is that an (almost) arbitrary memory range can
be marked as MIGRATE_CMA, which may not be the case with
ZONE_MOVABLE.
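
In terms of the API, this lets the board stub posted earlier in this
series sit outside ZONE_MOVABLE. Condensed (the size and the variables
are placeholders), the flow becomes:

    /* In the machine's reserve hook: reserve 32 MiB and mark its
     * pageblocks MIGRATE_CMA (init_migratetype = true). */
    start = cma_reserve(0, 32 << 20, 0, true);

    /* After SLAB is up: create a non-private context on that range;
     * cm_alloc() will migrate whatever movable pages ended up there. */
    ctx = cma_create(start, 32 << 20, 0, false);

Memory reserved by other means can instead be handed over directly
with cma_init_migratetype(), provided the range is MAX_ORDER_NR_PAGES
aligned.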

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/cma.h | 58 ++++++++++++++++---
mm/cma.c | 161 +++++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 194 insertions(+), 25 deletions(-)

diff --git a/include/linux/cma.h b/include/linux/cma.h
index e9575fd..8952531 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -71,9 +71,14 @@
* a platform/machine specific function. For the former CMA
* provides the following functions:
*
+ * cma_init_migratetype()
* cma_reserve()
* cma_create()
*
+ * The first one initialises a portion of reserved memory so that it
+ * can be used with CMA. The second one first tries to reserve memory
+ * (using memblock) and then initialises it.
+ *
* The cma_reserve() function must be called when memblock is still
* operational and reserving memory with it is still possible. On
* ARM platform the "reserve" machine callback is a perfect place to
@@ -93,21 +98,56 @@ struct cma;
/* Contiguous Memory chunk */
struct cm;

+#ifdef CONFIG_MIGRATE_CMA
+
+/**
+ * cma_init_migratetype() - initialises range of physical memory to be used
+ * with CMA context.
+ * @start: start address of the memory range in bytes.
+ * @size: size of the memory range in bytes.
+ *
+ * The range must be MAX_ORDER_NR_PAGES aligned and it must have been
+ * already reserved (eg. with memblock).
+ *
+ * The actual initialisation is deferred until subsys initcalls are
+ * evaluated (unless this has already happened).
+ *
+ * Returns zero on success or negative error.
+ */
+int cma_init_migratetype(unsigned long start, unsigned long end);
+
+#else
+
+static inline int cma_init_migratetype(unsigned long start, unsigned long end)
+{
+ (void)start; (void)end;
+ return -EOPNOTSUPP;
+}
+
+#endif
+
/**
* cma_reserve() - reserves memory.
* @start: start address of the memory range in bytes hint; if unsure
* pass zero.
* @size: size of the memory to reserve in bytes.
* @alignment: desired alignment in bytes (must be power of two or zero).
+ * @init_migratetype: whether to initialise pageblocks.
+ *
+ * It will use memblock to allocate memory. If @init_migratetype is
+ * true, the function will also call cma_init_migratetype() on
+ * reserved region so that a non-private CMA context can be created on
+ * given range.
*
- * It will use memblock to allocate memory. @start and @size will be
- * aligned to PAGE_SIZE.
+ * @start and @size will be aligned to PAGE_SIZE if @init_migratetype
+ * is false or to (MAX_ORDER_NR_PAGES << PAGE_SHIFT) if
+ * @init_migratetype is true.
*
* Returns reserved's area physical address or value that yields true
* when checked with IS_ERR_VALUE().
*/
unsigned long cma_reserve(unsigned long start, unsigned long size,
- unsigned long alignment);
+ unsigned long alignment, _Bool init_migratetype);

/**
* cma_create() - creates a CMA context.
@@ -118,12 +158,14 @@ unsigned long cma_reserve(unsigned long start, unsigned long size,
*
* The range must be page aligned. Different contexts cannot overlap.
*
- * Unless @private is true the memory range must lay in ZONE_MOVABLE.
- * If @private is true no underlaying memory checking is done and
- * during allocation no pages migration will be performed - it is
- * assumed that the memory is reserved and only CMA manages it.
+ * Unless @private is true the memory range must either lay in
+ * ZONE_MOVABLE or must have been initialised with
+ * cma_init_migratetype() function. If @private is true no
+ * underlaying memory checking is done and during allocation no pages
+ * migration will be performed - it is assumed that the memory is
+ * reserved and only CMA manages it.
*
- * @start and @size must be page and @min_alignment alignment.
+ * @start and @size must be page and @min_alignment aligned.
* @min_alignment specifies the minimal alignment that user will be
* able to request through cm_alloc() function. In most cases one
* will probably pass zero as @min_alignment but if the CMA context
diff --git a/mm/cma.c b/mm/cma.c
index d82361b..4017dee 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -57,21 +57,130 @@ static unsigned long phys_to_pfn(phys_addr_t phys)

/************************* Initialise CMA *************************/

+#ifdef CONFIG_MIGRATE_CMA
+
+static struct cma_grabbed {
+ unsigned long start;
+ unsigned long size;
+} cma_grabbed[8] __initdata;
+static unsigned cma_grabbed_count __initdata;
+
+#ifdef CONFIG_DEBUG_VM
+
+static int __cma_give_back(unsigned long start, unsigned long size)
+{
+ unsigned long pfn = phys_to_pfn(start);
+ unsigned i = size >> PAGE_SHIFT;
+ struct zone *zone;
+
+ pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+ VM_BUG_ON(!pfn_valid(pfn));
+ zone = page_zone(pfn_to_page(pfn));
+
+ do {
+ VM_BUG_ON(!pfn_valid(pfn));
+ VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
+ if (!(pfn & (pageblock_nr_pages - 1)))
+ __free_pageblock_cma(pfn_to_page(pfn));
+ ++pfn;
+ } while (--i);
+
+ return 0;
+}
+
+#else
+
+static int __cma_give_back(unsigned long start, unsigned long size)
+{
+ unsigned i = size >> (PAGE_SHIFT + pageblock_order);
+ struct page *p = phys_to_page(start);
+
+ pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+ do {
+ __free_pageblock_cma(p);
+ p += pageblock_nr_pages;
+ } while (--i);
+
+ return 0;
+}
+
+#endif
+
+static int __init __cma_queue_give_back(unsigned long start, unsigned long size)
+{
+ if (cma_grabbed_count == ARRAY_SIZE(cma_grabbed))
+ return -ENOSPC;
+
+ cma_grabbed[cma_grabbed_count].start = start;
+ cma_grabbed[cma_grabbed_count].size = size;
+ ++cma_grabbed_count;
+ return 0;
+}
+
+static int (*cma_give_back)(unsigned long start, unsigned long size) =
+ __cma_queue_give_back;
+
+static int __init cma_give_back_queued(void)
+{
+ struct cma_grabbed *r = cma_grabbed;
+ unsigned i = cma_grabbed_count;
+
+ pr_debug("%s(): will give %u range(s)\n", __func__, i);
+
+ cma_give_back = __cma_give_back;
+
+ for (; i; --i, ++r)
+ __cma_give_back(r->start, r->size);
+
+ return 0;
+}
+subsys_initcall(cma_give_back_queued);
+
+int __ref cma_init_migratetype(unsigned long start, unsigned long size)
+{
+ pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+ if (!size)
+ return -EINVAL;
+ if ((start | size) & ((MAX_ORDER_NR_PAGES << PAGE_SHIFT) - 1))
+ return -EINVAL;
+ if (start + size < start)
+ return -EOVERFLOW;
+
+ return cma_give_back(start, size);
+}
+
+#endif
+
unsigned long cma_reserve(unsigned long start, unsigned long size,
- unsigned long alignment)
+ unsigned long alignment, bool init_migratetype)
{
pr_debug("%s(%p+%p/%p)\n", __func__, (void *)start, (void *)size,
(void *)alignment);

+#ifndef CONFIG_MIGRATE_CMA
+ if (init_migratetype)
+ return -EOPNOTSUPP;
+#endif
+
/* Sanity checks */
if (!size || (alignment & (alignment - 1)))
return (unsigned long)-EINVAL;

/* Sanitise input arguments */
- start = PAGE_ALIGN(start);
- size = PAGE_ALIGN(size);
- if (alignment < PAGE_SIZE)
- alignment = PAGE_SIZE;
+ if (init_migratetype) {
+ start = ALIGN(start, MAX_ORDER_NR_PAGES << PAGE_SHIFT);
+ size = ALIGN(size , MAX_ORDER_NR_PAGES << PAGE_SHIFT);
+ if (alignment < (MAX_ORDER_NR_PAGES << PAGE_SHIFT))
+ alignment = MAX_ORDER_NR_PAGES << PAGE_SHIFT;
+ } else {
+ start = PAGE_ALIGN(start);
+ size = PAGE_ALIGN(size);
+ if (alignment < PAGE_SIZE)
+ alignment = PAGE_SIZE;
+ }

/* Reserve memory */
if (start) {
@@ -94,6 +203,15 @@ unsigned long cma_reserve(unsigned long start, unsigned long size,
}
}

+ /* CMA Initialise */
+ if (init_migratetype) {
+ int ret = cma_init_migratetype(start, size);
+ if (ret < 0) {
+ memblock_free(start, size);
+ return ret;
+ }
+ }
+
return start;
}

@@ -101,12 +219,13 @@ unsigned long cma_reserve(unsigned long start, unsigned long size,
/************************** CMA context ***************************/

struct cma {
- bool migrate;
+ int migratetype;
struct gen_pool *pool;
};

static int __cma_check_range(unsigned long start, unsigned long size)
{
+ int migratetype = MIGRATE_MOVABLE;
unsigned long pfn, count;
struct page *page;
struct zone *zone;
@@ -115,8 +234,13 @@ static int __cma_check_range(unsigned long start, unsigned long size)
if (WARN_ON(!pfn_valid(start)))
return -EINVAL;

+#ifdef CONFIG_MIGRATE_CMA
+ if (page_zonenum(pfn_to_page(start)) != ZONE_MOVABLE)
+ migratetype = MIGRATE_CMA;
+#else
if (WARN_ON(page_zonenum(pfn_to_page(start)) != ZONE_MOVABLE))
return -EINVAL;
+#endif

/* First check if all pages are valid and in the same zone */
zone = page_zone(pfn_to_page(start));
@@ -134,20 +258,20 @@ static int __cma_check_range(unsigned long start, unsigned long size)
page = pfn_to_page(start);
count = (pfn - start) >> PAGE_SHIFT;
do {
- if (WARN_ON(get_pageblock_migratetype(page) != MIGRATE_MOVABLE))
+ if (WARN_ON(get_pageblock_migratetype(page) != migratetype))
return -EINVAL;
page += pageblock_nr_pages;
} while (--count);

- return 0;
+ return migratetype;
}

struct cma *cma_create(unsigned long start, unsigned long size,
unsigned long min_alignment, bool private)
{
struct gen_pool *pool;
+ int migratetype, ret;
struct cma *cma;
- int ret;

pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);

@@ -162,10 +286,12 @@ struct cma *cma_create(unsigned long start, unsigned long size,
if (start + size < start)
return ERR_PTR(-EOVERFLOW);

- if (!private) {
- ret = __cma_check_range(start, size);
- if (ret < 0)
- return ERR_PTR(ret);
+ if (private) {
+ migratetype = 0;
+ } else {
+ migratetype = __cma_check_range(start, size);
+ if (migratetype < 0)
+ return ERR_PTR(migratetype);
}

cma = kmalloc(sizeof *cma, GFP_KERNEL);
@@ -182,7 +308,7 @@ struct cma *cma_create(unsigned long start, unsigned long size,
if (unlikely(ret))
goto error2;

- cma->migrate = !private;
+ cma->migratetype = migratetype;
cma->pool = pool;

pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
@@ -238,9 +364,10 @@ struct cm *cm_alloc(struct cma *cma, unsigned long size,
if (!start)
goto error1;

- if (cma->migrate) {
+ if (cma->migratetype) {
unsigned long pfn = phys_to_pfn(start);
- ret = alloc_contig_range(pfn, pfn + (size >> PAGE_SHIFT), 0);
+ ret = alloc_contig_range(pfn, pfn + (size >> PAGE_SHIFT),
+ 0, cma->migratetype);
if (ret)
goto error2;
}
@@ -275,7 +402,7 @@ void cm_free(struct cm *cm)
mutex_lock(&cma_mutex);

gen_pool_free(cm->cma->pool, cm->phys, cm->size);
- if (cm->cma->migrate)
+ if (cm->cma->migratetype)
free_contig_pages(phys_to_page(cm->phys),
cm->size >> PAGE_SHIFT);

--
1.7.2.3

2010-12-15 20:38:42

by Michal Nazarewicz

Subject: [PATCHv8 01/12] mm: migrate.c: fix compilation error

GCC complained about update_mmu_cache() not being defined
in migrate.c. Including <asm/tlbflush.h> seems to solve the problem.

Signed-off-by: Michal Nazarewicz <[email protected]>
---
mm/migrate.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index fe5a3c6..6ae8a66 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -35,6 +35,8 @@
#include <linux/hugetlb.h>
#include <linux/gfp.h>

+#include <asm/tlbflush.h>
+
#include "internal.h"

#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
--
1.7.2.3

2010-12-15 20:38:43

by Michal Nazarewicz

Subject: [PATCHv8 02/12] lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()

From: Michal Nazarewicz <[email protected]>

This commit adds a bitmap_find_next_zero_area_off() function which
works like the bitmap_find_next_zero_area() function except that it
allows an offset to be specified when alignment is checked. This
lets the caller request a bit such that its number plus the offset
is aligned according to the mask.
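
This is presumably what the genalloc changes later in the series need
when a pool chunk does not start on an aligned boundary. As a small
hypothetical example, if bit i of a bitmap stands for pfn (base_pfn + i)
and the caller wants the resulting pfn (rather than the bit number)
aligned to a power-of-two 'align', the base's misalignment is passed
as the offset:

    unsigned long align_mask = align - 1;    /* align is a power of two */
    unsigned long bit;

    bit = bitmap_find_next_zero_area_off(map, map_bits, 0, nr_pages,
                                         align_mask,
                                         base_pfn & align_mask);
    if (bit >= map_bits)
        return -ENOMEM;                 /* no suitably aligned free area */
    bitmap_set(map, bit, nr_pages);     /* base_pfn + bit is now aligned */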

Signed-off-by: Michal Nazarewicz <[email protected]>
---
include/linux/bitmap.h | 24 +++++++++++++++++++-----
lib/bitmap.c | 22 ++++++++++++----------
2 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index daf8c48..c0528d1 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -45,6 +45,7 @@
* bitmap_set(dst, pos, nbits) Set specified bit area
* bitmap_clear(dst, pos, nbits) Clear specified bit area
* bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area
+ * bitmap_find_next_zero_area_off(buf, len, pos, n, mask) as above
* bitmap_shift_right(dst, src, n, nbits) *dst = *src >> n
* bitmap_shift_left(dst, src, n, nbits) *dst = *src << n
* bitmap_remap(dst, src, old, new, nbits) *dst = map(old, new)(src)
@@ -113,11 +114,24 @@ extern int __bitmap_weight(const unsigned long *bitmap, int bits);

extern void bitmap_set(unsigned long *map, int i, int len);
extern void bitmap_clear(unsigned long *map, int start, int nr);
-extern unsigned long bitmap_find_next_zero_area(unsigned long *map,
- unsigned long size,
- unsigned long start,
- unsigned int nr,
- unsigned long align_mask);
+
+extern unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
+ unsigned long size,
+ unsigned long start,
+ unsigned int nr,
+ unsigned long align_mask,
+ unsigned long align_offset);
+
+static inline unsigned long
+bitmap_find_next_zero_area(unsigned long *map,
+ unsigned long size,
+ unsigned long start,
+ unsigned int nr,
+ unsigned long align_mask)
+{
+ return bitmap_find_next_zero_area_off(map, size, start, nr,
+ align_mask, 0);
+}

extern int bitmap_scnprintf(char *buf, unsigned int len,
const unsigned long *src, int nbits);
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 741fae9..8e75a6f 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -315,30 +315,32 @@ void bitmap_clear(unsigned long *map, int start, int nr)
}
EXPORT_SYMBOL(bitmap_clear);

-/*
+/**
* bitmap_find_next_zero_area - find a contiguous aligned zero area
* @map: The address to base the search on
* @size: The bitmap size in bits
* @start: The bitnumber to start searching at
* @nr: The number of zeroed bits we're looking for
* @align_mask: Alignment mask for zero area
+ * @align_offset: Alignment offset for zero area.
*
* The @align_mask should be one less than a power of 2; the effect is that
- * the bit offset of all zero areas this function finds is multiples of that
- * power of 2. A @align_mask of 0 means no alignment is required.
+ * the bit offset of all zero areas this function finds plus @align_offset
+ * is multiple of that power of 2.
*/
-unsigned long bitmap_find_next_zero_area(unsigned long *map,
- unsigned long size,
- unsigned long start,
- unsigned int nr,
- unsigned long align_mask)
+unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
+ unsigned long size,
+ unsigned long start,
+ unsigned int nr,
+ unsigned long align_mask,
+ unsigned long align_offset)
{
unsigned long index, end, i;
again:
index = find_next_zero_bit(map, size, start);

/* Align allocation */
- index = __ALIGN_MASK(index, align_mask);
+ index = __ALIGN_MASK(index + align_offset, align_mask) - align_offset;

end = index + nr;
if (end > size)
@@ -350,7 +352,7 @@ again:
}
return index;
}
-EXPORT_SYMBOL(bitmap_find_next_zero_area);
+EXPORT_SYMBOL(bitmap_find_next_zero_area_off);

/*
* Bitmap printing & parsing functions: first version by Bill Irwin,
--
1.7.2.3

2010-12-15 20:39:50

by Michal Nazarewicz

[permalink] [raw]
Subject: [PATCHv8 11/12] mm: cma: Test device and application added

This patch adds a "cma" misc device which lets user space use the
CMA API. This device is meant for testing. A testing application
is also provided.
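
In other words, a trimmed, hedged sketch of what the test application
below does (most error handling and the pattern/dump ioctls omitted;
the 1 MiB size is an arbitrary example):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <linux/cma.h>

int main(void)
{
	struct cma_alloc_request req = {
		.magic     = CMA_MAGIC,
		.size      = 1 << 20,	/* 1 MiB */
		.alignment = 0,
	};
	int fd = open("/dev/cma", O_RDWR);
	void *p;

	if (fd < 0 || ioctl(fd, IOCTL_CMA_ALLOC, &req) < 0)
		return 1;
	printf("got chunk at 0x%llx\n", (unsigned long long)req.start);

	/* The chunk can be mapped and used like ordinary memory. */
	p = mmap(NULL, req.size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p != MAP_FAILED)
		munmap(p, req.size);

	close(fd);	/* closing the fd frees the chunk */
	return 0;
}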

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
drivers/misc/Kconfig | 14 ++
drivers/misc/Makefile | 1 +
drivers/misc/cma-dev.c | 238 +++++++++++++++++++++++++
include/linux/cma.h | 29 +++
tools/cma/cma-test.c | 457 ++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 739 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/cma-dev.c
create mode 100644 tools/cma/cma-test.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 1e1a4be..b90e36b 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -458,4 +458,18 @@ source "drivers/misc/cb710/Kconfig"
source "drivers/misc/iwmc3200top/Kconfig"
source "drivers/misc/ti-st/Kconfig"

+# Select this for platforms/machines that implement code making
+# the CMA test device usable.
+config CMA_DEVICE_POSSIBLE
+ bool
+
+config CMA_DEVICE
+ tristate "CMA test device (DEVELOPEMENT)"
+ depends on CMA && CMA_DEVICE_POSSIBLE
+ help
+ The CMA misc device allows allocating contiguous memory areas
+ from user space. This is for testing of the CMA framework.
+
+ If unsure, say "n"
+
endif # MISC_DEVICES
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 98009cc..f8eadd4 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -42,3 +42,4 @@ obj-$(CONFIG_ARM_CHARLCD) += arm-charlcd.o
obj-$(CONFIG_PCH_PHUB) += pch_phub.o
obj-y += ti-st/
obj-$(CONFIG_AB8500_PWM) += ab8500-pwm.o
+obj-$(CONFIG_CMA_DEVICE) += cma-dev.o
diff --git a/drivers/misc/cma-dev.c b/drivers/misc/cma-dev.c
new file mode 100644
index 0000000..6c36064
--- /dev/null
+++ b/drivers/misc/cma-dev.c
@@ -0,0 +1,238 @@
+/*
+ * Contiguous Memory Allocator userspace driver
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version of the license.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+# define DEBUG
+#endif
+
+#include <linux/errno.h> /* Error numbers */
+#include <linux/err.h> /* IS_ERR_VALUE() */
+#include <linux/fs.h> /* struct file */
+#include <linux/mm.h> /* Memory stuff */
+#include <linux/mman.h>
+#include <linux/slab.h>
+#include <linux/module.h> /* Standard module stuff */
+#include <linux/device.h> /* struct device, dev_dbg() */
+#include <linux/types.h> /* Just to be safe ;) */
+#include <linux/uaccess.h> /* __copy_{to,from}_user */
+#include <linux/miscdevice.h> /* misc_register() and company */
+
+#include <linux/cma.h>
+
+#include <plat/cma-stub.h>
+
+static int cma_file_open(struct inode *inode, struct file *file);
+static int cma_file_release(struct inode *inode, struct file *file);
+static long cma_file_ioctl(struct file *file, unsigned cmd, unsigned long arg);
+static int cma_file_mmap(struct file *file, struct vm_area_struct *vma);
+
+static struct miscdevice cma_miscdev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "cma",
+ .fops = &(const struct file_operations) {
+ .owner = THIS_MODULE,
+ .open = cma_file_open,
+ .release = cma_file_release,
+ .unlocked_ioctl = cma_file_ioctl,
+ .mmap = cma_file_mmap,
+ },
+};
+#define cma_dev (cma_miscdev.this_device)
+
+struct cma_private_data {
+ struct cm *cm;
+ unsigned long size;
+ unsigned long phys;
+};
+
+static int cma_file_open(struct inode *inode, struct file *file)
+{
+ struct cma_private_data *prv;
+
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ if (!cma_ctx)
+ return -EOPNOTSUPP;
+
+ prv = kzalloc(sizeof *prv, GFP_KERNEL);
+ if (!prv)
+ return -ENOMEM;
+
+ file->private_data = prv;
+
+ return 0;
+}
+
+static int cma_file_release(struct inode *inode, struct file *file)
+{
+ struct cma_private_data *prv = file->private_data;
+
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ if (prv->cm) {
+ cm_unpin(prv->cm);
+ cm_free(prv->cm);
+ }
+ kfree(prv);
+
+ return 0;
+}
+
+static long cma_file_ioctl_req(struct cma_private_data *prv, unsigned long arg)
+{
+ struct cma_alloc_request req;
+ struct cm *cm;
+
+ dev_dbg(cma_dev, "%s()\n", __func__);
+
+ if (!arg)
+ return -EINVAL;
+
+ if (copy_from_user(&req, (void *)arg, sizeof req))
+ return -EFAULT;
+
+ if (req.magic != CMA_MAGIC)
+ return -ENOTTY;
+
+ /* May happen on 32 bit system. */
+ if (req.size > ~(unsigned long)0 || req.alignment > ~(unsigned long)0)
+ return -EINVAL;
+
+ req.size = PAGE_ALIGN(req.size);
+ if (req.size > ~(unsigned long)0)
+ return -EINVAL;
+
+ cm = cm_alloc(cma_ctx, req.size, req.alignment);
+ if (IS_ERR(cm))
+ return PTR_ERR(cm);
+
+ prv->phys = cm_pin(cm);
+ prv->size = req.size;
+ req.start = prv->phys;
+ if (copy_to_user((void *)arg, &req, sizeof req)) {
+ cm_free(cm);
+ return -EFAULT;
+ }
+ prv->cm = cm;
+
+ dev_dbg(cma_dev, "allocated %p@%p\n",
+ (void *)prv->size, (void *)prv->phys);
+
+ return 0;
+}
+
+static long
+cma_file_ioctl_pattern(struct cma_private_data *prv, unsigned long arg)
+{
+ unsigned long *_it, *it, *end, v;
+
+ dev_dbg(cma_dev, "%s(%s)\n", __func__, arg ? "fill" : "verify");
+
+ _it = phys_to_virt(prv->phys);
+ end = _it + prv->size / sizeof *_it;
+
+ if (arg)
+ for (v = 0, it = _it; it != end; ++v, ++it)
+ *it = v;
+
+ for (v = 0, it = _it; it != end; ++v, ++it)
+ if (*it != v)
+ goto error;
+
+ return prv->size;
+
+error:
+ dev_dbg(cma_dev, "at %p + %x got %lx, expected %lx\n",
+ (void *)_it, (it - _it) * sizeof *it, *it, v);
+ print_hex_dump(KERN_DEBUG, "cma: ", DUMP_PREFIX_ADDRESS,
+ 16, sizeof *it, it,
+ min_t(size_t, 128, (end - it) * sizeof *it), 0);
+ return (it - _it) * sizeof *it;
+}
+
+static long cma_file_ioctl_dump(struct cma_private_data *prv, unsigned long len)
+{
+ unsigned long *it;
+
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)len);
+
+ it = phys_to_virt(prv->phys);
+ len = min(len & ~(sizeof *it - 1), prv->size);
+ print_hex_dump(KERN_DEBUG, "cma: ", DUMP_PREFIX_ADDRESS,
+ 16, sizeof *it, it, len, 0);
+
+ return 0;
+}
+
+static long cma_file_ioctl(struct file *file, unsigned cmd, unsigned long arg)
+{
+ struct cma_private_data *prv = file->private_data;
+
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ if ((cmd == IOCTL_CMA_ALLOC) != !prv->cm)
+ return -EBADFD;
+
+ switch (cmd) {
+ case IOCTL_CMA_ALLOC:
+ return cma_file_ioctl_req(prv, arg);
+
+ case IOCTL_CMA_PATTERN:
+ return cma_file_ioctl_pattern(prv, arg);
+
+ case IOCTL_CMA_DUMP:
+ return cma_file_ioctl_dump(prv, arg);
+
+ default:
+ return -ENOTTY;
+ }
+}
+
+static int cma_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct cma_private_data *prv = file->private_data;
+ unsigned long pgoff, offset, length;
+
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ if (!prv->cm)
+ return -EBADFD;
+
+ pgoff = vma->vm_pgoff;
+ offset = pgoff << PAGE_SHIFT;
+ length = vma->vm_end - vma->vm_start;
+
+ if (offset >= prv->size
+ || length > prv->size
+ || offset + length > prv->size)
+ return -ENOSPC;
+
+ return remap_pfn_range(vma, vma->vm_start,
+ __phys_to_pfn(prv->phys) + pgoff,
+ length, vma->vm_page_prot);
+}
+
+static int __init cma_dev_init(void)
+{
+ int ret = misc_register(&cma_miscdev);
+ pr_debug("miscdev: register returned: %d\n", ret);
+ return ret;
+}
+module_init(cma_dev_init);
+
+static void __exit cma_dev_exit(void)
+{
+ dev_dbg(cma_dev, "deregisterring\n");
+ misc_deregister(&cma_miscdev);
+}
+module_exit(cma_dev_exit);
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 8952531..fe5d2ba 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -89,6 +89,35 @@
* to be called after SLAB is initialised.
*/

+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+
+#define CMA_MAGIC (('c' << 24) | ('M' << 16) | ('a' << 8) | 0x42)
+
+/**
+ * Information about an area, exportable to user space.
+ * @magic: must always be CMA_MAGIC.
+ * @_pad: padding (ignored).
+ * @size: size of the chunk to allocate.
+ * @alignment: desired alignment of the chunk (must be power of two or zero).
+ * @start: when ioctl() finishes this stores physical address of the chunk.
+ */
+struct cma_alloc_request {
+ __u32 magic;
+ __u32 _pad;
+
+ /* __u64 to be compatible across 32 and 64 bit systems. */
+ __u64 size;
+ __u64 alignment;
+ __u64 start;
+};
+
+#define IOCTL_CMA_ALLOC _IOWR('p', 0, struct cma_alloc_request)
+#define IOCTL_CMA_PATTERN _IO('p', 1)
+#define IOCTL_CMA_DUMP _IO('p', 2)
+
+
/***************************** Kernel level API *****************************/

#if defined __KERNEL__ && defined CONFIG_CMA
diff --git a/tools/cma/cma-test.c b/tools/cma/cma-test.c
new file mode 100644
index 0000000..6275ef5
--- /dev/null
+++ b/tools/cma/cma-test.c
@@ -0,0 +1,457 @@
+/*
+ * cma-test.c -- CMA testing application
+ *
+ * Copyright (C) 2010 Samsung Electronics
+ * Author: Michal Nazarewicz <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+/* $(CROSS_COMPILE)gcc -Wall -Wextra -g -o cma-test cma-test.c */
+
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/mman.h>
+
+#include <fcntl.h>
+#include <unistd.h>
+
+#include <ctype.h>
+#include <errno.h>
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+
+#include <linux/cma.h>
+
+struct chunk {
+ struct chunk *next, *prev;
+ int fd;
+ unsigned long size;
+ unsigned long start;
+};
+
+static struct chunk root = {
+ .next = &root,
+ .prev = &root,
+};
+
+#define for_each(a) for (a = root.next; a != &root; a = a->next)
+
+static struct chunk *chunk_create(const char *prefix)
+{
+ struct chunk *chunk;
+ int fd;
+
+ chunk = malloc(sizeof *chunk);
+ if (!chunk) {
+ fprintf(stderr, "%s: %s\n", prefix, strerror(errno));
+ return NULL;
+ }
+
+ fd = open("/dev/cma", O_RDWR);
+ if (fd < 0) {
+ fprintf(stderr, "%s: /dev/cma: %s\n", prefix, strerror(errno));
+ return NULL;
+ }
+
+ chunk->prev = chunk;
+ chunk->next = chunk;
+ chunk->fd = fd;
+ return chunk;
+}
+
+static void chunk_destroy(struct chunk *chunk)
+{
+ chunk->prev->next = chunk->next;
+ chunk->next->prev = chunk->prev;
+ close(chunk->fd);
+}
+
+static void chunk_add(struct chunk *chunk)
+{
+ chunk->next = &root;
+ chunk->prev = root.prev;
+ root.prev->next = chunk;
+ root.prev = chunk;
+}
+
+/* Parsing helpers */
+#define SKIP_SPACE(ch) do { while (isspace(*(ch))) ++(ch); } while (0)
+
+static int memparse(char *ptr, char **retptr, unsigned long *ret)
+{
+ unsigned long val;
+
+ SKIP_SPACE(ptr);
+
+ errno = 0;
+ val = strtoul(ptr, &ptr, 0);
+ if (errno)
+ return -1;
+
+ switch (*ptr) {
+ case 'G':
+ case 'g':
+ val <<= 10;
+ case 'M':
+ case 'm':
+ val <<= 10;
+ case 'K':
+ case 'k':
+ val <<= 10;
+ ++ptr;
+ }
+
+ if (retptr) {
+ SKIP_SPACE(ptr);
+ *retptr = ptr;
+ }
+
+ *ret = val;
+ return 0;
+}
+
+static void cmd_list(char *name, char *line, int arg)
+{
+ struct chunk *chunk;
+
+ (void)name; (void)line; (void)arg;
+
+ for_each(chunk)
+ printf("%3d: %p@%p\n", chunk->fd,
+ (void *)chunk->size, (void *)chunk->start);
+}
+
+static void cmd_alloc(char *name, char *line, int arg)
+{
+ unsigned long size, alignment = 0;
+ struct cma_alloc_request req;
+ struct chunk *chunk;
+ int ret;
+
+ (void)arg;
+
+ if (memparse(line, &line, &size) < 0 || !size) {
+ fprintf(stderr, "%s: invalid size\n", name);
+ return;
+ }
+
+ if (*line == '/')
+ if (memparse(line, &line, &alignment) < 0) {
+ fprintf(stderr, "%s: invalid alignment\n", name);
+ return;
+ }
+
+ SKIP_SPACE(line);
+ if (*line) {
+ fprintf(stderr, "%s: unknown argument(s) at the end: %s\n",
+ name, line);
+ return;
+ }
+
+ chunk = chunk_create(name);
+ if (!chunk)
+ return;
+
+ fprintf(stderr, "%s: allocating %p/%p\n", name,
+ (void *)size, (void *)alignment);
+
+ req.magic = CMA_MAGIC;
+ req.size = size;
+ req.alignment = alignment;
+ req.start = 0;
+
+ ret = ioctl(chunk->fd, IOCTL_CMA_ALLOC, &req);
+ if (ret < 0) {
+ fprintf(stderr, "%s: cma_alloc: %s\n", name, strerror(errno));
+ chunk_destroy(chunk);
+ } else {
+ chunk->size = req.size;
+ chunk->start = req.start;
+ chunk_add(chunk);
+
+ printf("%3d: %p@%p\n", chunk->fd,
+ (void *)chunk->size, (void *)chunk->start);
+ }
+}
+
+static struct chunk *_cmd_numbered(char *name, char *line)
+{
+ struct chunk *chunk;
+
+ SKIP_SPACE(line);
+
+ if (*line) {
+ unsigned long num;
+
+ errno = 0;
+ num = strtoul(line, &line, 10);
+
+ if (errno || num > INT_MAX) {
+ fprintf(stderr, "%s: invalid number\n", name);
+ return NULL;
+ }
+
+ SKIP_SPACE(line);
+ if (*line) {
+ fprintf(stderr,
+ "%s: unknown arguments at the end: %s\n",
+ name, line);
+ return NULL;
+ }
+
+ for_each(chunk)
+ if (chunk->fd == (int)num)
+ return chunk;
+ fprintf(stderr, "%s: no chunk %3lu\n", name, num);
+ return NULL;
+
+ } else {
+ chunk = root.prev;
+ if (chunk == &root) {
+ fprintf(stderr, "%s: no chunks\n", name);
+ return NULL;
+ }
+ return chunk;
+ }
+}
+
+static void cmd_free(char *name, char *line, int arg)
+{
+ struct chunk *chunk = _cmd_numbered(name, line);
+ (void)arg;
+ if (chunk) {
+ fprintf(stderr, "%s: freeing %p@%p\n", name,
+ (void *)chunk->size, (void *)chunk->start);
+ chunk_destroy(chunk);
+ }
+}
+
+static void _cmd_pattern(char *name, unsigned long *ptr, unsigned long size,
+ int arg)
+{
+ unsigned long *end = ptr + size / sizeof *ptr, *it, v;
+
+ if (arg)
+ for (v = 0, it = ptr; it != end; ++v, ++it)
+ *it = v;
+
+ for (v = 0, it = ptr; it != end && *it == v; ++v, ++it)
+ /* nop */;
+
+ if (it != end)
+ fprintf(stderr, "%s: at +[%lx] got %lx, expected %lx\n",
+ name, (unsigned long)(it - ptr) * sizeof *it, *it, v);
+ else
+ fprintf(stderr, "%s: done\n", name);
+}
+
+static void _cmd_dump(char *name, uint32_t *ptr)
+{
+ unsigned lines = 32, groups;
+ uint32_t *it = ptr;
+
+ do {
+ printf("%s: %04lx:", name,
+ (unsigned long)(it - ptr) * sizeof *it);
+
+ groups = 4;
+ do {
+ printf(" %08lx", (unsigned long)*it);
+ ++it;
+ } while (--groups);
+
+ putchar('\n');
+ } while (--lines);
+}
+
+static void cmd_mapped(char *name, char *line, int arg)
+{
+ struct chunk *chunk = _cmd_numbered(name, line);
+ void *ptr;
+
+ if (!chunk)
+ return;
+
+ ptr = mmap(NULL, chunk->size,
+ arg != 2 ? PROT_READ | PROT_WRITE : PROT_READ,
+ MAP_SHARED, chunk->fd, 0);
+
+ if (ptr == (void *)-1) {
+ fprintf(stderr, "%s: mapping failed: %s\n", name,
+ strerror(errno));
+ return;
+ }
+
+ switch (arg) {
+ case 0:
+ case 1:
+ _cmd_pattern(name, ptr, chunk->size, arg);
+ break;
+
+ case 2:
+ _cmd_dump(name, ptr);
+ }
+
+ munmap(ptr, chunk->size);
+}
+
+static void cmd_kpattern(char *name, char *line, int arg)
+{
+ struct chunk *chunk = _cmd_numbered(name, line);
+ if (chunk) {
+ int ret;
+
+ fprintf(stderr, "%s: requesting kernel to %s %p@%p\n",
+ name, arg ? "fill" : "verify",
+ (void *)chunk->size, (void *)chunk->start);
+
+ ret = ioctl(chunk->fd, IOCTL_CMA_PATTERN, arg);
+ if (ret < 0)
+ fprintf(stderr, "%s: %s\n", name, strerror(errno));
+ else if ((unsigned long)ret < chunk->size)
+ fprintf(stderr, "%s: failed at +[%x]\n", name, ret);
+ else
+ fprintf(stderr, "%s: done\n", name);
+ }
+}
+
+static void cmd_kdump(char *name, char *line, int arg)
+{
+ struct chunk *chunk = _cmd_numbered(name, line);
+
+ (void)arg;
+
+ if (chunk) {
+ int ret;
+
+ fprintf(stderr, "%s: requesting kernel to dump 256B@%p\n",
+ name, (void *)chunk->start);
+
+ ret = ioctl(chunk->fd, IOCTL_CMA_DUMP, 256);
+ if (ret < 0)
+ fprintf(stderr, "%s: %s\n", name, strerror(errno));
+ else
+ fprintf(stderr, "%s: done\n", name);
+ }
+}
+
+static const struct command {
+ const char short_name;
+ const char name[8];
+ void (*handle)(char *name, char *line, int arg);
+ int arg;
+ const char *help_args, *help;
+} commands[] = {
+ { 'l', "list", cmd_list, 0,
+ "", "list allocated chunks" },
+ { 'a', "alloc", cmd_alloc, 0,
+ "<size>[/<alignment>]", "allocate chunk" },
+ { 'f', "free", cmd_free, 0,
+ "[<num>]", "free an chunk" },
+ { 'w', "write", cmd_mapped, 1,
+ "[<num>]", "write data to chunk" },
+ { 'W', "kwrite", cmd_kpattern, 1,
+ "[<num>]", "let kernel write data to chunk" },
+ { 'v', "verify", cmd_mapped, 0,
+ "[<num>]", "verify chunk's content" },
+ { 'V', "kverify", cmd_kpattern, 0,
+ "[<num>]", "let kernel verify chunk's contet" },
+ { 'd', "dump", cmd_mapped, 2,
+ "[<num>]", "dump (some) content" },
+ { 'D', "kdump", cmd_kdump, 0,
+ "[<num>]", "let kernel dump (some) content" },
+ { '\0', "", NULL, 0, NULL, NULL }
+};
+
+static void handle_command(char *line)
+{
+ static char last_line[1024];
+
+ const struct command *cmd;
+ char *name, short_name = '\0';
+
+ SKIP_SPACE(line);
+ if (*line == '#')
+ return;
+
+ if (!*line)
+ strcpy(line, last_line);
+ else
+ strcpy(last_line, line);
+
+ name = line;
+ while (*line && !isspace(*line))
+ ++line;
+
+ if (*line) {
+ *line = '\0';
+ ++line;
+ }
+
+ if (!name[1])
+ short_name = name[0];
+
+ for (cmd = commands; *(cmd->name); ++cmd)
+ if (short_name
+ ? short_name == cmd->short_name
+ : !strcmp(name, cmd->name)) {
+ cmd->handle(name, line, cmd->arg);
+ return;
+ }
+
+ fprintf(stderr, "%s: unknown command\n", name);
+}
+
+int main(void)
+{
+ const struct command *cmd = commands;
+ unsigned no = 1;
+ char line[1024];
+ int skip = 0;
+
+ fputs("commands:\n", stderr);
+ do {
+ fprintf(stderr, " %c or %-7s %-10s %s\n",
+ cmd->short_name, cmd->name, cmd->help_args, cmd->help);
+ } while ((++cmd)->handle);
+ fputs(" # ... comment\n"
+ " <empty line> repeat previous\n"
+ "\n", stderr);
+
+ while (fgets(line, sizeof line, stdin)) {
+ char *nl = strchr(line, '\n');
+ if (nl) {
+ if (skip) {
+ fprintf(stderr, "cma: %d: line too long\n", no);
+ skip = 0;
+ } else {
+ *nl = '\0';
+ handle_command(line);
+ }
+ ++no;
+ } else {
+ skip = 1;
+ }
+ }
+
+ if (skip)
+ fprintf(stderr, "cma: %d: no new line at EOF\n", no);
+ return 0;
+}
--
1.7.2.3

2010-12-15 20:40:14

by Michal Nazarewicz

[permalink] [raw]
Subject: [PATCHv8 08/12] mm: MIGRATE_CMA migration type added

The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA
pageblocks and (ii) the page allocator will never change the
migration type of MIGRATE_CMA pageblocks.

This guarantees that a page in a MIGRATE_CMA pageblock can
always be migrated somewhere else (unless there's no memory left
in the system).

It is designed to be used with the Contiguous Memory Allocator
(CMA) for allocating big chunks (eg. 10MiB) of physically
contiguous memory. Once a driver requests contiguous memory,
CMA will migrate pages from MIGRATE_CMA pageblocks.

To minimise the number of migrations, the MIGRATE_CMA migration
type is the last type tried when the page allocator falls back to
migration types other than the one requested.
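
As a hedged sketch of how a platform might hand reserved memory over as
MIGRATE_CMA pageblocks: the helper name example_mark_cma() is made up, and
the pfn range is assumed to be suitably aligned and backed by reserved
pages, as the comment in the mmzone.h hunk below requires.

static void __init example_mark_cma(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long end = pfn + nr_pages;

	/* __free_pageblock_cma() is added by this patch in mm/page_alloc.c. */
	for (; pfn < end; pfn += pageblock_nr_pages)
		__free_pageblock_cma(pfn_to_page(pfn));
}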

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/mmzone.h | 43 ++++++++++++++++++++----
mm/Kconfig | 14 ++++++++
mm/compaction.c | 10 +++++
mm/internal.h | 3 ++
mm/page_alloc.c | 87 +++++++++++++++++++++++++++++++++++++----------
5 files changed, 131 insertions(+), 26 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 39c24eb..cc798b1 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,37 @@
*/
#define PAGE_ALLOC_COSTLY_ORDER 3

-#define MIGRATE_UNMOVABLE 0
-#define MIGRATE_RECLAIMABLE 1
-#define MIGRATE_MOVABLE 2
-#define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE 3
-#define MIGRATE_ISOLATE 4 /* can't allocate from here */
-#define MIGRATE_TYPES 5
+enum {
+ MIGRATE_UNMOVABLE,
+ MIGRATE_RECLAIMABLE,
+ MIGRATE_MOVABLE,
+ MIGRATE_PCPTYPES, /* the number of types on the pcp lists */
+ MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+#ifdef CONFIG_MIGRATE_CMA
+ /*
+ * MIGRATE_CMA migration type is designed to mimic the way
+ * ZONE_MOVABLE works. Only movable pages can be allocated
+ * from MIGRATE_CMA pageblocks and the page allocator never
+ * implicitly changes the migration type of a MIGRATE_CMA pageblock.
+ *
+ * The way to use it is to change the migratetype of a range of
+ * pageblocks to MIGRATE_CMA, which can be done with the
+ * __free_pageblock_cma() function. What is important though
+ * is that the range of pageblocks must be aligned to
+ * MAX_ORDER_NR_PAGES should the biggest page be bigger than
+ * a single pageblock.
+ */
+ MIGRATE_CMA,
+#endif
+ MIGRATE_ISOLATE, /* can't allocate from here */
+ MIGRATE_TYPES
+};
+
+#ifdef CONFIG_MIGRATE_CMA
+# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+# define is_migrate_cma(migratetype) false
+#endif

#define for_each_migratetype_order(order, type) \
for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +78,11 @@ static inline int get_pageblock_migratetype(struct page *page)
return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
}

+static inline bool is_pageblock_cma(struct page *page)
+{
+ return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
struct free_area {
struct list_head free_list[MIGRATE_TYPES];
unsigned long nr_free;
diff --git a/mm/Kconfig b/mm/Kconfig
index 2beab4d..32fb085 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -343,6 +343,20 @@ config CMA

For more information see <include/linux/cma.h>. If unsure, say "n".

+config MIGRATE_CMA
+ bool "Use MIGRATE_CMA migratetype"
+ depends on CMA
+ default y
+ help
This enables the use of the MIGRATE_CMA migrate type in CMA.
MIGRATE_CMA lets CMA work on almost arbitrary memory ranges and
+ not only inside ZONE_MOVABLE.
+
+ This option can also be selected by code that uses MIGRATE_CMA
+ even if CMA is not present.
+
+ If unsure, say "y".
+
config CMA_DEBUG
bool "CMA debug messages (DEVELOPEMENT)"
depends on CMA
diff --git a/mm/compaction.c b/mm/compaction.c
index 4d709ee..c5e404b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -113,6 +113,16 @@ static bool suitable_migration_target(struct page *page)
if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
return false;

+ /* Keep MIGRATE_CMA alone as well. */
+ /*
+ * XXX Revisit. We currently cannot let compaction touch CMA
+ * pages since compaction insists on changing their migration
+ * type to MIGRATE_MOVABLE (see split_free_page() called from
+ * isolate_freepages_block() above).
+ */
+ if (is_migrate_cma(migratetype))
+ return false;
+
/* If the page is a large free page, then allow migration */
if (PageBuddy(page) && page_order(page) >= pageblock_order)
return true;
diff --git a/mm/internal.h b/mm/internal.h
index dedb0af..cc24e74 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -49,6 +49,9 @@ extern void putback_lru_page(struct page *page);
* in mm/page_alloc.c
*/
extern void __free_pages_bootmem(struct page *page, unsigned int order);
+#ifdef CONFIG_MIGRATE_CMA
+extern void __free_pageblock_cma(struct page *page);
+#endif
extern void prep_compound_page(struct page *page, unsigned long order);
#ifdef CONFIG_MEMORY_FAILURE
extern bool is_free_buddy_page(struct page *page);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 008a6e8..e706282 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -717,6 +717,30 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
}
}

+#ifdef CONFIG_MIGRATE_CMA
+
+/*
+ * Free a whole pageblock and set its migration type to MIGRATE_CMA.
+ */
+void __init __free_pageblock_cma(struct page *page)
+{
+ struct page *p = page;
+ unsigned i = pageblock_nr_pages;
+
+ prefetchw(p);
+ do {
+ if (--i)
+ prefetchw(p + 1);
+ __ClearPageReserved(p);
+ set_page_count(p, 0);
+ } while (++p, i);
+
+ set_page_refcounted(page);
+ set_pageblock_migratetype(page, MIGRATE_CMA);
+ __free_pages(page, pageblock_order);
+}
+
+#endif

/*
* The order of subdivision here is critical for the IO subsystem.
@@ -824,11 +848,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
* This array describes the order lists are fallen back to when
* the free lists for the desirable migrate type are depleted
*/
-static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
+static int fallbacks[MIGRATE_TYPES][4] = {
[MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
+#ifdef CONFIG_MIGRATE_CMA
+ [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA , MIGRATE_RESERVE },
+#else
[MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
- [MIGRATE_RESERVE] = { MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE }, /* Never used */
+#endif
+ [MIGRATE_RESERVE] = { MIGRATE_RESERVE }, /* Never used */
};

/*
@@ -924,12 +952,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
/* Find the largest possible block of pages in the other list */
for (current_order = MAX_ORDER-1; current_order >= order;
--current_order) {
- for (i = 0; i < MIGRATE_TYPES - 1; i++) {
+ for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
migratetype = fallbacks[start_migratetype][i];

/* MIGRATE_RESERVE handled later if necessary */
if (migratetype == MIGRATE_RESERVE)
- continue;
+ break;

area = &(zone->free_area[current_order]);
if (list_empty(&area->free_list[migratetype]))
@@ -944,19 +972,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
* pages to the preferred allocation list. If falling
* back for a reclaimable kernel allocation, be more
* agressive about taking ownership of free pages
+ *
+ * On the other hand, never change migration
+ * type of MIGRATE_CMA pageblocks nor move CMA
+ * pages on different free lists. We don't
+ * want unmovable pages to be allocated from
+ * MIGRATE_CMA areas.
*/
- if (unlikely(current_order >= (pageblock_order >> 1)) ||
- start_migratetype == MIGRATE_RECLAIMABLE ||
- page_group_by_mobility_disabled) {
- unsigned long pages;
+ if (!is_pageblock_cma(page) &&
+ (unlikely(current_order >= pageblock_order / 2) ||
+ start_migratetype == MIGRATE_RECLAIMABLE ||
+ page_group_by_mobility_disabled)) {
+ int pages;
pages = move_freepages_block(zone, page,
- start_migratetype);
+ start_migratetype);

- /* Claim the whole block if over half of it is free */
+ /*
+ * Claim the whole block if over half
+ * of it is free
+ */
if (pages >= (1 << (pageblock_order-1)) ||
- page_group_by_mobility_disabled)
+ page_group_by_mobility_disabled)
set_pageblock_migratetype(page,
- start_migratetype);
+ start_migratetype);

migratetype = start_migratetype;
}
@@ -966,11 +1004,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
rmv_page_order(page);

/* Take ownership for orders >= pageblock_order */
- if (current_order >= pageblock_order)
+ if (current_order >= pageblock_order &&
+ !is_pageblock_cma(page))
change_pageblock_range(page, current_order,
start_migratetype);

- expand(zone, page, order, current_order, area, migratetype);
+ expand(zone, page, order, current_order, area,
+ is_migrate_cma(start_migratetype)
+ ? start_migratetype : migratetype);

trace_mm_page_alloc_extfrag(page, order, current_order,
start_migratetype, migratetype);
@@ -1042,7 +1083,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
list_add(&page->lru, list);
else
list_add_tail(&page->lru, list);
- set_page_private(page, migratetype);
+#ifdef CONFIG_MIGRATE_CMA
+ if (is_pageblock_cma(page))
+ set_page_private(page, MIGRATE_CMA);
+ else
+#endif
+ set_page_private(page, migratetype);
list = &page->lru;
}
__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1178,8 +1224,8 @@ void free_hot_cold_page(struct page *page, int cold)
/*
* We only track unmovable, reclaimable and movable on pcp lists.
* Free ISOLATE pages back to the allocator because they are being
- * offlined but treat RESERVE as movable pages so we can get those
- * areas back if necessary. Otherwise, we may have to free
+ * offlined but treat RESERVE and CMA as movable pages so we can get
+ * those areas back if necessary. Otherwise, we may have to free
* excessively into the page allocator
*/
if (migratetype >= MIGRATE_PCPTYPES) {
@@ -1272,7 +1318,9 @@ int split_free_page(struct page *page)
if (order >= pageblock_order - 1) {
struct page *endpage = page + (1 << order) - 1;
for (; page < endpage; page += pageblock_nr_pages)
- set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+ if (!is_pageblock_cma(page))
+ set_pageblock_migratetype(page,
+ MIGRATE_MOVABLE);
}

return 1 << order;
@@ -5309,7 +5357,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
if (zone_idx(zone) == ZONE_MOVABLE)
return true;

- if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
+ if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
+ is_pageblock_cma(page))
return true;

pfn = page_to_pfn(page);
--
1.7.2.3

2010-12-15 20:40:17

by Michal Nazarewicz

[permalink] [raw]
Subject: [PATCHv8 09/12] mm: MIGRATE_CMA isolation functions added

This commit changes various functions that change a page's or
pageblock's migrate type between MIGRATE_ISOLATE and
MIGRATE_MOVABLE in such a way as to allow them to work with the
MIGRATE_CMA migrate type as well.
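
For illustration, a hedged fragment (start_pfn/end_pfn are placeholders,
assumed pageblock aligned, and the helper name is made up) showing why the
migratetype is now passed explicitly: error recovery and the final undo
restore MIGRATE_CMA rather than MIGRATE_MOVABLE.

static int example_claim_cma(unsigned long start_pfn, unsigned long end_pfn)
{
	int ret = __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_CMA);

	if (ret)
		return ret;
	/* ... migrate and take the now-isolated pages ... */
	return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_CMA);
}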

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/page-isolation.h | 40 +++++++++++++++++++++++++++-------------
mm/page_alloc.c | 19 ++++++++++++-------
mm/page_isolation.c | 15 ++++++++-------
3 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index c5d1a7c..177b307 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -3,39 +3,53 @@

/*
* Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- * If specified range includes migrate types other than MOVABLE,
+ * If specified range includes migrate types other than MOVABLE or CMA,
* this will fail with -EBUSY.
*
* For isolating all pages in the range finally, the caller have to
* free all pages in the range. test_page_isolated() can be used for
* test it.
*/
-extern int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+ unsigned migratetype);
+
+static inline int
+start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+ return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
+
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+ unsigned migratetype);

/*
* Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
* target range is [start_pfn, end_pfn)
*/
-extern int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+static inline int
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+ return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}

/*
- * test all pages in [start_pfn, end_pfn)are isolated or not.
+ * Test all pages in [start_pfn, end_pfn) are isolated or not.
*/
-extern int
-test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);

/*
- * Internal funcs.Changes pageblock's migrate type.
- * Please use make_pagetype_isolated()/make_pagetype_movable().
+ * Internal functions. Changes pageblock's migrate type.
*/
-extern int set_migratetype_isolate(struct page *page);
-extern void unset_migratetype_isolate(struct page *page);
+int set_migratetype_isolate(struct page *page);
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype);
+static inline void unset_migratetype_isolate(struct page *page)
+{
+ __unset_migratetype_isolate(page, MIGRATE_MOVABLE);
+}
extern unsigned long alloc_contig_freed_pages(unsigned long start,
unsigned long end, gfp_t flag);
extern int alloc_contig_range(unsigned long start, unsigned long end,
- gfp_t flags);
+ gfp_t flags, unsigned migratetype);
extern void free_contig_pages(struct page *page, int nr_pages);

/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e706282..7f913d1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5460,7 +5460,7 @@ out:
return ret;
}

-void unset_migratetype_isolate(struct page *page)
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype)
{
struct zone *zone;
unsigned long flags;
@@ -5468,8 +5468,8 @@ void unset_migratetype_isolate(struct page *page)
spin_lock_irqsave(&zone->lock, flags);
if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
goto out;
- set_pageblock_migratetype(page, MIGRATE_MOVABLE);
- move_freepages_block(zone, page, MIGRATE_MOVABLE);
+ set_pageblock_migratetype(page, migratetype);
+ move_freepages_block(zone, page, migratetype);
out:
spin_unlock_irqrestore(&zone->lock, flags);
}
@@ -5574,6 +5574,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
* @start: start PFN to allocate
* @end: one-past-the-last PFN to allocate
* @flags: flags passed to alloc_contig_freed_pages().
+ * @migratetype: migratetype of the underlying pageblocks (either
+ * #MIGRATE_MOVABLE or #MIGRATE_CMA). All pageblocks
+ * in range must have the same migratetype and it must
+ * be either of the two.
*
* The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
* aligned, however it is the caller's responsibility to guarantee that we
@@ -5585,7 +5589,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
* need to be freed with free_contig_pages().
*/
int alloc_contig_range(unsigned long start, unsigned long end,
- gfp_t flags)
+ gfp_t flags, unsigned migratetype)
{
unsigned long _start, _end;
int ret;
@@ -5613,8 +5617,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
* them.
*/

- ret = start_isolate_page_range(pfn_to_maxpage(start),
- pfn_to_maxpage_up(end));
+ ret = __start_isolate_page_range(pfn_to_maxpage(start),
+ pfn_to_maxpage_up(end), migratetype);
if (ret)
goto done;

@@ -5652,7 +5656,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,

ret = 0;
done:
- undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+ __undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end),
+ migratetype);
return ret;
}

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 077cf19..ea9781e 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -23,10 +23,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
}

/*
- * start_isolate_page_range() -- make page-allocation-type of range of pages
+ * __start_isolate_page_range() -- make page-allocation-type of range of pages
* to be MIGRATE_ISOLATE.
* @start_pfn: The lower PFN of the range to be isolated.
* @end_pfn: The upper PFN of the range to be isolated.
+ * @migratetype: migrate type to set in error recovery.
*
* Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
* the range will never be allocated. Any free pages and pages freed in the
@@ -35,8 +36,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
* start_pfn/end_pfn must be aligned to pageblock_order.
* Returns 0 on success and -EBUSY if any part of range cannot be isolated.
*/
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+ unsigned migratetype)
{
unsigned long pfn;
unsigned long undo_pfn;
@@ -59,7 +60,7 @@ undo:
for (pfn = start_pfn;
pfn < undo_pfn;
pfn += pageblock_nr_pages)
- unset_migratetype_isolate(pfn_to_page(pfn));
+ __unset_migratetype_isolate(pfn_to_page(pfn), migratetype);

return -EBUSY;
}
@@ -67,8 +68,8 @@ undo:
/*
* Make isolated pages available again.
*/
-int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+ unsigned migratetype)
{
unsigned long pfn;
struct page *page;
@@ -80,7 +81,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
page = __first_valid_page(pfn, pageblock_nr_pages);
if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
continue;
- unset_migratetype_isolate(page);
+ __unset_migratetype_isolate(page, migratetype);
}
return 0;
}
--
1.7.2.3

2010-12-15 20:40:47

by Michal Nazarewicz

[permalink] [raw]
Subject: [PATCHv8 06/12] mm: alloc_contig_range() added

This commit adds the alloc_contig_range() function, which tries
to allocate a given range of pages. It first tries to migrate all
already-allocated pages that fall in the range, thus freeing them.
Once all pages in the range are freed, they are removed from the
buddy system and thus allocated for the caller to use.
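
A hedged usage sketch: pfn, nr_pages and the helper name are placeholders,
and whether GFP_KERNEL is the right flag set depends on the caller, so
treat it as an assumption rather than a recommendation.

static int example_alloc_range(unsigned long pfn, unsigned long nr_pages)
{
	int ret = alloc_contig_range(pfn, pfn + nr_pages, GFP_KERNEL);

	if (ret)
		return ret;
	/* Pages in [pfn, pfn + nr_pages) now belong to the caller. */
	free_contig_pages(pfn_to_page(pfn), nr_pages);
	return 0;
}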

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/page-isolation.h | 2 +
mm/page_alloc.c | 144 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 146 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index f1417ed..c5d1a7c 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -34,6 +34,8 @@ extern int set_migratetype_isolate(struct page *page);
extern void unset_migratetype_isolate(struct page *page);
extern unsigned long alloc_contig_freed_pages(unsigned long start,
unsigned long end, gfp_t flag);
+extern int alloc_contig_range(unsigned long start, unsigned long end,
+ gfp_t flags);
extern void free_contig_pages(struct page *page, int nr_pages);

/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index be240a3..008a6e8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5463,6 +5463,150 @@ unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
return pfn;
}

+static unsigned long pfn_to_maxpage(unsigned long pfn)
+{
+ return pfn & ~(MAX_ORDER_NR_PAGES - 1);
+}
+
+static unsigned long pfn_to_maxpage_up(unsigned long pfn)
+{
+ return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+}
+
+#define MIGRATION_RETRY 5
+static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
+{
+ int migration_failed = 0, ret;
+ unsigned long pfn = start;
+
+ /*
+ * Some code "borrowed" from KAMEZAWA Hiroyuki's
+ * __alloc_contig_pages().
+ */
+
+ for (;;) {
+ pfn = scan_lru_pages(pfn, end);
+ if (!pfn || pfn >= end)
+ break;
+
+ ret = do_migrate_range(pfn, end);
+ if (!ret) {
+ migration_failed = 0;
+ } else if (ret != -EBUSY
+ || ++migration_failed >= MIGRATION_RETRY) {
+ return ret;
+ } else {
+ /* There are unstable pages on the pagevec. */
+ lru_add_drain_all();
+ /*
+ * there may be pages on pcplist before
+ * we mark the range as ISOLATED.
+ */
+ drain_all_pages();
+ }
+ cond_resched();
+ }
+
+ if (!migration_failed) {
+ /* drop all pages in pagevec and pcp list */
+ lru_add_drain_all();
+ drain_all_pages();
+ }
+
+ /* Make sure all pages are isolated */
+ if (WARN_ON(test_pages_isolated(start, end)))
+ return -EBUSY;
+
+ return 0;
+}
+
+/**
+ * alloc_contig_range() -- tries to allocate given range of pages
+ * @start: start PFN to allocate
+ * @end: one-past-the-last PFN to allocate
+ * @flags: flags passed to alloc_contig_freed_pages().
+ *
+ * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
+ * aligned, however it is the caller's responsibility to guarantee that we
+ * are the only thread that changes migrate type of pageblocks the
+ * pages fall in.
+ *
+ * Returns zero on success or negative error code. On success all
+ * pages which PFN is in (start, end) are allocated for the caller and
+ * need to be freed with free_contig_pages().
+ */
+int alloc_contig_range(unsigned long start, unsigned long end,
+ gfp_t flags)
+{
+ unsigned long _start, _end;
+ int ret;
+
+ /*
+ * What we do here is we mark all pageblocks in range as
+ * MIGRATE_ISOLATE. Because of the way the page allocator works, we
+ * align the range to MAX_ORDER pages so that page allocator
+ * won't try to merge buddies from different pageblocks and
+ * change MIGRATE_ISOLATE to some other migration type.
+ *
+ * Once the pageblocks are marked as MIGRATE_ISOLATE, we
+ * migrate the pages from an unaligned range (ie. pages that
+ * we are interested in). This will put all the pages in
+ * range back to page allocator as MIGRATE_ISOLATE.
+ *
+ * When this is done, we take the pages in range from page
+ * allocator removing them from the buddy system. This way
+ * page allocator will never consider using them.
+ *
+ * This lets us mark the pageblocks back as
+ * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
+ * MAX_ORDER aligned range but not in the unaligned, original
+ * range are put back to page allocator so that buddy can use
+ * them.
+ */
+
+ ret = start_isolate_page_range(pfn_to_maxpage(start),
+ pfn_to_maxpage_up(end));
+ if (ret)
+ goto done;
+
+ ret = __alloc_contig_migrate_range(start, end);
+ if (ret)
+ goto done;
+
+ /*
+ * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+ * aligned blocks that are marked as MIGRATE_ISOLATE. What's
+ * more, all pages in [start, end) are free in page allocator.
+ * What we are going to do is to allocate all pages from
+ * [start, end) (that is, remove them from the page allocator).
+ *
+ * The only problem is that pages at the beginning and at the
+ * end of interesting range may be not aligned with pages that
+ * page allocator holds, ie. they can be part of higher order
+ * pages. Because of this, we reserve the bigger range and
+ * once this is done free the pages we are not interested in.
+ */
+
+ ret = 0;
+ while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
+ if (WARN_ON(++ret >= MAX_ORDER))
+ return -EINVAL;
+
+ _start = start & (~0UL << ret);
+ _end = alloc_contig_freed_pages(_start, end, flags);
+
+ /* Free head and tail (if any) */
+ if (start != _start)
+ free_contig_pages(pfn_to_page(_start), start - _start);
+ if (end != _end)
+ free_contig_pages(pfn_to_page(end), _end - end);
+
+ ret = 0;
+done:
+ undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+ return ret;
+}
+
void free_contig_pages(struct page *page, int nr_pages)
{
for (; nr_pages; --nr_pages, ++page)
--
1.7.2.3

2010-12-15 20:40:51

by Michal Nazarewicz

[permalink] [raw]
Subject: [PATCHv8 03/12] lib: genalloc: Generic allocator improvements

This commit adds a gen_pool_alloc_aligned() function to the
generic allocator API. It allows specifying alignment for the
allocated block. This feature uses
the bitmap_find_next_zero_area_off() function.

It also fixes a possible issue with the bitmap's last element not
being fully allocated (ie. space allocated for chunk->bits is
not a multiple of sizeof(long)).

It also makes some other smaller changes:
- moves structure definitions out of the header file,
- adds __must_check to functions returning value,
- makes gen_pool_add() return -ENOMEM rather than -1 on error,
- changes list_for_each to list_for_each_entry, and
- makes use of bitmap_clear().
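
A hedged example of the new aligned allocation call: the base address,
sizes, orders and the function name example_genpool() are made-up values,
not taken from the patchset.

static int __init example_genpool(void)
{
	struct gen_pool *pool;
	unsigned long addr;

	pool = gen_pool_create(PAGE_SHIFT, -1);	/* one bit per page */
	if (!pool)
		return -ENOMEM;

	if (gen_pool_add(pool, 0x40000000, 16 << 20, -1)) {
		gen_pool_destroy(pool);
		return -ENOMEM;
	}

	/* 1 MiB, aligned to 1 MiB (alignment given as an order). */
	addr = gen_pool_alloc_aligned(pool, 1 << 20, 20);
	if (addr)
		gen_pool_free(pool, addr, 1 << 20);

	gen_pool_destroy(pool);
	return 0;
}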

Signed-off-by: Michal Nazarewicz <[email protected]>
---
include/linux/genalloc.h | 46 ++++++------
lib/genalloc.c | 182 ++++++++++++++++++++++++++-------------------
2 files changed, 129 insertions(+), 99 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 9869ef3..8ac7337 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -8,29 +8,31 @@
* Version 2. See the file COPYING for more details.
*/

+struct gen_pool;

-/*
- * General purpose special memory pool descriptor.
- */
-struct gen_pool {
- rwlock_t lock;
- struct list_head chunks; /* list of chunks in this pool */
- int min_alloc_order; /* minimum allocation order */
-};
+struct gen_pool *__must_check gen_pool_create(unsigned order, int nid);

-/*
- * General purpose special memory pool chunk descriptor.
+int __must_check gen_pool_add(struct gen_pool *pool, unsigned long addr,
+ size_t size, int nid);
+
+void gen_pool_destroy(struct gen_pool *pool);
+
+unsigned long __must_check
+gen_pool_alloc_aligned(struct gen_pool *pool, size_t size,
+ unsigned alignment_order);
+
+/**
+ * gen_pool_alloc() - allocate special memory from the pool
+ * @pool: Pool to allocate from.
+ * @size: Number of bytes to allocate from the pool.
+ *
+ * Allocate the requested number of bytes from the specified pool.
+ * Uses a first-fit algorithm.
*/
-struct gen_pool_chunk {
- spinlock_t lock;
- struct list_head next_chunk; /* next chunk in pool */
- unsigned long start_addr; /* starting address of memory chunk */
- unsigned long end_addr; /* ending address of memory chunk */
- unsigned long bits[0]; /* bitmap for allocating memory chunk */
-};
+static inline unsigned long __must_check
+gen_pool_alloc(struct gen_pool *pool, size_t size)
+{
+ return gen_pool_alloc_aligned(pool, size, 0);
+}

-extern struct gen_pool *gen_pool_create(int, int);
-extern int gen_pool_add(struct gen_pool *, unsigned long, size_t, int);
-extern void gen_pool_destroy(struct gen_pool *);
-extern unsigned long gen_pool_alloc(struct gen_pool *, size_t);
-extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size);
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 1923f14..0761079 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -16,53 +16,80 @@
#include <linux/genalloc.h>


+/* General purpose special memory pool descriptor. */
+struct gen_pool {
+ rwlock_t lock; /* protects chunks list */
+ struct list_head chunks; /* list of chunks in this pool */
+ unsigned order; /* minimum allocation order */
+};
+
+/* General purpose special memory pool chunk descriptor. */
+struct gen_pool_chunk {
+ spinlock_t lock; /* protects bits */
+ struct list_head next_chunk; /* next chunk in pool */
+ unsigned long start; /* start of memory chunk */
+ unsigned long size; /* number of bits */
+ unsigned long bits[0]; /* bitmap for allocating memory chunk */
+};
+
+
/**
- * gen_pool_create - create a new special memory pool
- * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
- * @nid: node id of the node the pool structure should be allocated on, or -1
+ * gen_pool_create() - create a new special memory pool
+ * @order: Log base 2 of number of bytes each bitmap bit
+ * represents.
+ * @nid: Node id of the node the pool structure should be allocated
+ * on, or -1. This will be also used for other allocations.
*
* Create a new special memory pool that can be used to manage special purpose
* memory not managed by the regular kmalloc/kfree interface.
*/
-struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
+struct gen_pool *__must_check gen_pool_create(unsigned order, int nid)
{
struct gen_pool *pool;

- pool = kmalloc_node(sizeof(struct gen_pool), GFP_KERNEL, nid);
- if (pool != NULL) {
+ if (WARN_ON(order >= BITS_PER_LONG))
+ return NULL;
+
+ pool = kmalloc_node(sizeof *pool, GFP_KERNEL, nid);
+ if (pool) {
rwlock_init(&pool->lock);
INIT_LIST_HEAD(&pool->chunks);
- pool->min_alloc_order = min_alloc_order;
+ pool->order = order;
}
return pool;
}
EXPORT_SYMBOL(gen_pool_create);

/**
- * gen_pool_add - add a new chunk of special memory to the pool
- * @pool: pool to add new memory chunk to
- * @addr: starting address of memory chunk to add to pool
- * @size: size in bytes of the memory chunk to add to pool
- * @nid: node id of the node the chunk structure and bitmap should be
- * allocated on, or -1
+ * gen_pool_add() - add a new chunk of special memory to the pool
+ * @pool: Pool to add new memory chunk to.
+ * @addr: Starting address of memory chunk to add to pool.
+ * @size: Size in bytes of the memory chunk to add to pool.
*
* Add a new chunk of special memory to the specified pool.
*/
-int gen_pool_add(struct gen_pool *pool, unsigned long addr, size_t size,
- int nid)
+int __must_check
+gen_pool_add(struct gen_pool *pool, unsigned long addr, size_t size, int nid)
{
struct gen_pool_chunk *chunk;
- int nbits = size >> pool->min_alloc_order;
- int nbytes = sizeof(struct gen_pool_chunk) +
- (nbits + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
+ size_t nbytes;
+
+ if (WARN_ON(!addr || addr + size < addr ||
+ (addr & ((1 << pool->order) - 1))))
+ return -EINVAL;

- chunk = kmalloc_node(nbytes, GFP_KERNEL | __GFP_ZERO, nid);
- if (unlikely(chunk == NULL))
- return -1;
+ size = size >> pool->order;
+ if (WARN_ON(!size))
+ return -EINVAL;
+
+ nbytes = sizeof *chunk + BITS_TO_LONGS(size) * sizeof *chunk->bits;
+ chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
+ if (!chunk)
+ return -ENOMEM;

spin_lock_init(&chunk->lock);
- chunk->start_addr = addr;
- chunk->end_addr = addr + size;
+ chunk->start = addr >> pool->order;
+ chunk->size = size;

write_lock(&pool->lock);
list_add(&chunk->next_chunk, &pool->chunks);
@@ -73,115 +100,116 @@ int gen_pool_add(struct gen_pool *pool, unsigned long addr, size_t size,
EXPORT_SYMBOL(gen_pool_add);

/**
- * gen_pool_destroy - destroy a special memory pool
- * @pool: pool to destroy
+ * gen_pool_destroy() - destroy a special memory pool
+ * @pool: Pool to destroy.
*
* Destroy the specified special memory pool. Verifies that there are no
* outstanding allocations.
*/
void gen_pool_destroy(struct gen_pool *pool)
{
- struct list_head *_chunk, *_next_chunk;
struct gen_pool_chunk *chunk;
- int order = pool->min_alloc_order;
- int bit, end_bit;
-
+ int bit;

- list_for_each_safe(_chunk, _next_chunk, &pool->chunks) {
- chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
+ while (!list_empty(&pool->chunks)) {
+ chunk = list_entry(pool->chunks.next, struct gen_pool_chunk,
+ next_chunk);
list_del(&chunk->next_chunk);

- end_bit = (chunk->end_addr - chunk->start_addr) >> order;
- bit = find_next_bit(chunk->bits, end_bit, 0);
- BUG_ON(bit < end_bit);
+ bit = find_next_bit(chunk->bits, chunk->size, 0);
+ BUG_ON(bit < chunk->size);

kfree(chunk);
}
kfree(pool);
- return;
}
EXPORT_SYMBOL(gen_pool_destroy);

/**
- * gen_pool_alloc - allocate special memory from the pool
- * @pool: pool to allocate from
- * @size: number of bytes to allocate from the pool
+ * gen_pool_alloc_aligned() - allocate special memory from the pool
+ * @pool: Pool to allocate from.
+ * @size: Number of bytes to allocate from the pool.
+ * @alignment_order: Order the allocated space should be
+ * aligned to (eg. 20 means allocated space
+ * must be aligned to 1MiB).
*
* Allocate the requested number of bytes from the specified pool.
* Uses a first-fit algorithm.
*/
-unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
+unsigned long __must_check
+gen_pool_alloc_aligned(struct gen_pool *pool, size_t size,
+ unsigned alignment_order)
{
- struct list_head *_chunk;
+ unsigned long addr, align_mask = 0, flags, start;
struct gen_pool_chunk *chunk;
- unsigned long addr, flags;
- int order = pool->min_alloc_order;
- int nbits, start_bit, end_bit;

if (size == 0)
return 0;

- nbits = (size + (1UL << order) - 1) >> order;
+ if (alignment_order > pool->order)
+ align_mask = (1 << (alignment_order - pool->order)) - 1;

- read_lock(&pool->lock);
- list_for_each(_chunk, &pool->chunks) {
- chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
+ size = (size + (1UL << pool->order) - 1) >> pool->order;

- end_bit = (chunk->end_addr - chunk->start_addr) >> order;
+ read_lock(&pool->lock);
+ list_for_each_entry(chunk, &pool->chunks, next_chunk) {
+ if (chunk->size < size)
+ continue;

spin_lock_irqsave(&chunk->lock, flags);
- start_bit = bitmap_find_next_zero_area(chunk->bits, end_bit, 0,
- nbits, 0);
- if (start_bit >= end_bit) {
+ start = bitmap_find_next_zero_area_off(chunk->bits, chunk->size,
+ 0, size, align_mask,
+ chunk->start);
+ if (start >= chunk->size) {
spin_unlock_irqrestore(&chunk->lock, flags);
continue;
}

- addr = chunk->start_addr + ((unsigned long)start_bit << order);
-
- bitmap_set(chunk->bits, start_bit, nbits);
+ bitmap_set(chunk->bits, start, size);
spin_unlock_irqrestore(&chunk->lock, flags);
- read_unlock(&pool->lock);
- return addr;
+ addr = (chunk->start + start) << pool->order;
+ goto done;
}
+
+ addr = 0;
+done:
read_unlock(&pool->lock);
- return 0;
+ return addr;
}
-EXPORT_SYMBOL(gen_pool_alloc);
+EXPORT_SYMBOL(gen_pool_alloc_aligned);

/**
- * gen_pool_free - free allocated special memory back to the pool
- * @pool: pool to free to
- * @addr: starting address of memory to free back to pool
- * @size: size in bytes of memory to free
+ * gen_pool_free() - free allocated special memory back to the pool
+ * @pool: Pool to free to.
+ * @addr: Starting address of memory to free back to pool.
+ * @size: Size in bytes of memory to free.
*
* Free previously allocated special memory back to the specified pool.
*/
void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
{
- struct list_head *_chunk;
struct gen_pool_chunk *chunk;
unsigned long flags;
- int order = pool->min_alloc_order;
- int bit, nbits;

- nbits = (size + (1UL << order) - 1) >> order;
+ if (!size)
+ return;

- read_lock(&pool->lock);
- list_for_each(_chunk, &pool->chunks) {
- chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
+ addr = addr >> pool->order;
+ size = (size + (1UL << pool->order) - 1) >> pool->order;
+
+ BUG_ON(addr + size < addr);

- if (addr >= chunk->start_addr && addr < chunk->end_addr) {
- BUG_ON(addr + size > chunk->end_addr);
+ read_lock(&pool->lock);
+ list_for_each_entry(chunk, &pool->chunks, next_chunk)
+ if (addr >= chunk->start &&
+ addr + size <= chunk->start + chunk->size) {
spin_lock_irqsave(&chunk->lock, flags);
- bit = (addr - chunk->start_addr) >> order;
- while (nbits--)
- __clear_bit(bit++, chunk->bits);
+ bitmap_clear(chunk->bits, addr - chunk->start, size);
spin_unlock_irqrestore(&chunk->lock, flags);
- break;
+ goto done;
}
- }
- BUG_ON(nbits > 0);
+ BUG_ON(1);
+done:
read_unlock(&pool->lock);
}
EXPORT_SYMBOL(gen_pool_free);
--
1.7.2.3
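
A minimal usage sketch of the new alignment-aware call. This assumes
the stock gen_pool_create()/gen_pool_add() setup is unchanged by the
patch; the carveout address, the 4 MiB/1 MiB sizes and the function
names are hypothetical, purely for illustration:

  #include <linux/genalloc.h>
  #include <linux/mm.h>            /* PAGE_SHIFT */

  static struct gen_pool *example_pool;

  /* 'carveout' is an assumed, platform-provided region of memory that
   * has already been reserved and handed to this driver. */
  static int __init example_pool_init(unsigned long carveout, size_t carveout_size)
  {
          unsigned long addr;

          /* Smallest allocation unit is one page. */
          example_pool = gen_pool_create(PAGE_SHIFT, -1);
          if (!example_pool)
                  return -ENOMEM;

          if (gen_pool_add(example_pool, carveout, carveout_size, -1)) {
                  gen_pool_destroy(example_pool);
                  return -ENOMEM;
          }

          /* 4 MiB allocation aligned to 1 MiB: alignment_order == 20. */
          addr = gen_pool_alloc_aligned(example_pool, 4 << 20, 20);
          if (!addr)
                  return -ENOMEM;

          /* ... use [addr, addr + 4 MiB) ... */

          gen_pool_free(example_pool, addr, 4 << 20);
          return 0;
  }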

2010-12-23 09:31:03

by Kyungmin Park

[permalink] [raw]
Subject: Re: [PATCHv8 00/12] Contiguous Memory Allocator

Hi Andrew,

Any comments? What's the next step to merge it for the 2.6.38 kernel? We
want to use this feature in the mainline kernel.

Any ideas and comments are welcome.

Thank you,
Kyungmin Park

On Thu, Dec 16, 2010 at 5:34 AM, Michal Nazarewicz
<[email protected]> wrote:
> [Full cover letter quoted; see the original posting at the top of the
> thread.]

2010-12-23 10:07:43

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCHv8 00/12] Contiguous Memory Allocator

On Thu, Dec 23, 2010 at 06:30:57PM +0900, Kyungmin Park wrote:
> Hi Andrew,
>
> any comments? what's the next step to merge it for 2.6.38 kernel. we
> want to use this feature at mainline kernel.

Has anyone addressed my issue with it that this is wide-open for
abuse by allocating large chunks of memory, and then remapping
them in some way with different attributes, thereby violating the
ARM architecture specification?

In other words, do we _actually_ have a use for this which doesn't
involve doing something like allocating 32MB of memory from it,
remapping it so that it's DMA coherent, and then performing DMA
on the resulting buffer?

2010-12-23 10:58:20

by Marek Szyprowski

[permalink] [raw]
Subject: RE: [PATCHv8 00/12] Contiguous Memory Allocator

Hello,

On Thursday, December 23, 2010 11:07 AM Russell King - ARM Linux wrote:

> On Thu, Dec 23, 2010 at 06:30:57PM +0900, Kyungmin Park wrote:
> > Hi Andrew,
> >
> > any comments? what's the next step to merge it for 2.6.38 kernel. we
> > want to use this feature at mainline kernel.
>
> Has anyone addressed my issue with it that this is wide-open for
> abuse by allocating large chunks of memory, and then remapping
> them in some way with different attributes, thereby violating the
> ARM architecture specification?

Actually this contiguous memory allocator is a better replacement for
alloc_pages() which is used by dma_alloc_coherent(). It is a generic
framework that is not tied only to ARM architecture.

> In other words, do we _actually_ have a use for this which doesn't
> involve doing something like allocating 32MB of memory from it,
> remapping it so that it's DMA coherent, and then performing DMA
> on the resulting buffer?

This is an ARM-specific problem, also related to the dma_alloc_coherent()
allocator. To be 100% conformant with the ARM specification we would
probably need to unmap all pages used by the dma_coherent allocator
from the LOW MEM area. This is doable, but completely unrelated
to CMA and this patch series.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center

2010-12-23 12:20:26

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCHv8 00/12] Contiguous Memory Allocator

On Thu, Dec 23, 2010 at 11:58:08AM +0100, Marek Szyprowski wrote:
> Actually this contiguous memory allocator is a better replacement for
> alloc_pages() which is used by dma_alloc_coherent(). It is a generic
> framework that is not tied only to ARM architecture.

... which is open to abuse. What I'm trying to find out is - if it
can't be used for DMA, what is it to be used for?

Or are we inventing an everything-but-ARM framework?

> > In other words, do we _actually_ have a use for this which doesn't
> > involve doing something like allocating 32MB of memory from it,
> > remapping it so that it's DMA coherent, and then performing DMA
> > on the resulting buffer?
>
> This is an arm specific problem, also related to dma_alloc_coherent()
> allocator. To be 100% conformant with ARM specification we would
> probably need to unmap all pages used by the dma_coherent allocator
> from the LOW MEM area. This is doable, but completely not related
> to the CMA and this patch series.

You've already been told why we can't unmap pages from the kernel
direct mapping.

Okay, so I'm just going to assume that CMA has _no_ _business_ being
used on ARM, and is not something that should interest anyone in the
ARM community.

2010-12-23 13:09:55

by Marek Szyprowski

[permalink] [raw]
Subject: RE: [PATCHv8 00/12] Contiguous Memory Allocator

Hello,

On Thursday, December 23, 2010 1:19 PM Russell King - ARM Linux wrote:

> On Thu, Dec 23, 2010 at 11:58:08AM +0100, Marek Szyprowski wrote:
> > Actually this contiguous memory allocator is a better replacement for
> > alloc_pages() which is used by dma_alloc_coherent(). It is a generic
> > framework that is not tied only to ARM architecture.
>
> ... which is open to abuse. What I'm trying to find out is - if it
> can't be used for DMA, what is it to be used for?
>
> Or are we inventing an everything-but-ARM framework?

We are trying to get something that really works and SOLVES some of the
problems with real devices that require contiguous memory for DMA.

> > > In other words, do we _actually_ have a use for this which doesn't
> > > involve doing something like allocating 32MB of memory from it,
> > > remapping it so that it's DMA coherent, and then performing DMA
> > > on the resulting buffer?
> >
> > This is an arm specific problem, also related to dma_alloc_coherent()
> > allocator. To be 100% conformant with ARM specification we would
> > probably need to unmap all pages used by the dma_coherent allocator
> > from the LOW MEM area. This is doable, but completely not related
> > to the CMA and this patch series.
>
> You've already been told why we can't unmap pages from the kernel
> direct mapping.

It requires some amount of work, but I see no reason why we shouldn't be
able to unmap those pages to stay 100% conformant with the ARM spec.

Please notice that there are also use cases where the memory will not be
accessed by the CPU at all (like DMA transfers between multimedia devices
and the system memory).

> Okay, so I'm just going to assume that CMA has _no_ _business_ being
> used on ARM, and is not something that should interest anyone in the
> ARM community.

Go ahead! Remember to remove dma_coherent because it also breaks the spec. :)
Oh, I forgot. We can also remove all device drivers that might use DMA. :)



Merry Christmas and Happy New Year for everyone! :)

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center

2010-12-23 13:35:07

by Tomasz Fujak

[permalink] [raw]
Subject: Re: [PATCHv8 00/12] Contiguous Memory Allocator

Dear Mr. King,

AFAIK CMA is the fourth attempt since 2008 to solve the multimedia
memory allocation issue on some embedded devices, most notably on ARM,
which happens to be present in the SoCs we care about alongside the
IOMMU-incapable multimedia IPs.

I understand that your guidelines come from the ARM specification, but
this approach is not helping us. The mainline kernel is server- and
desktop-centric for various reasons I am not going to dwell on. We have
been trying hard to solve the physical memory fragmentation issue for
some time now, only to hear "this is not acceptable, go somewhere else".
So we did - the CMA is targeted at mm, NOT at ARM. While I do not know
exactly how you see your role in ARM kernel development, we have shown
a few times that this issue is important for us, and we'd like to solve
it. So if you could give a glimpse of what is acceptable, given the
existing circumstances, we could possibly help develop that solution.
Namely:
1. ARM-compatible SoC
2. Multimedia IP blocks requiring large amounts of contiguous memory
3. No IOMMU or SG in said blocks
4. Unused memory reserved for said multimedia drivers should be used by
the kernel
5. Multimedia allocation scenarios must always work (under some
constraints, of course) within a sane time limit
6. The solution shall have minimal delta to upstream Linux (none?)

While the obvious CMA uses are the ones you'd most like to avoid, we
haven't tried to post anything like that. This way no obvious spec abuse
is made, and we minimize the delta to upstream; it's even better than
the current state, where you have dma coherent memory doing exactly what
you claim is forbidden (unpredictable results could possibly happen).

As the feedback from the first CMA patches confirms, the issue we're
trying to solve here is real. Yet no real solution exists to my
knowledge. I understand ARM Holdings may try to just wait until all the
relevant chips have an IOMMU, but here and now there is a SoC we are
going to use. No IOMMU, no SG. So would you please help us - or if for
some reason you can't, just not make our work any harder?

BTW, why is the lowmem unmap not feasible? Is it the section entries in
the page tables scattered throughout the system? I have been unable to
find the answer so far.

Best regards
--
Tomasz Fujak
Samsung Electronics Poland R&D

On 2010-12-23 13:19, Russell King - ARM Linux wrote:
> On Thu, Dec 23, 2010 at 11:58:08AM +0100, Marek Szyprowski wrote:
>> Actually this contiguous memory allocator is a better replacement for
>> alloc_pages() which is used by dma_alloc_coherent(). It is a generic
>> framework that is not tied only to ARM architecture.
> ... which is open to abuse. What I'm trying to find out is - if it
> can't be used for DMA, what is it to be used for?
>
> Or are we inventing an everything-but-ARM framework?
>
>>> In other words, do we _actually_ have a use for this which doesn't
>>> involve doing something like allocating 32MB of memory from it,
>>> remapping it so that it's DMA coherent, and then performing DMA
>>> on the resulting buffer?
>> This is an arm specific problem, also related to dma_alloc_coherent()
>> allocator. To be 100% conformant with ARM specification we would
>> probably need to unmap all pages used by the dma_coherent allocator
>> from the LOW MEM area. This is doable, but completely not related
>> to the CMA and this patch series.
> You've already been told why we can't unmap pages from the kernel
> direct mapping.
>
> Okay, so I'm just going to assume that CMA has _no_ _business_ being
> used on ARM, and is not something that should interest anyone in the
> ARM community.
>

2010-12-23 13:41:34

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCHv8 00/12] Contiguous Memory Allocator

Russell King - ARM Linux <[email protected]> writes:
> Has anyone addressed my issue with it that this is wide-open for
> abuse by allocating large chunks of memory, and then remapping
> them in some way with different attributes, thereby violating the
> ARM architecture specification?
>
> In other words, do we _actually_ have a use for this which doesn't
> involve doing something like allocating 32MB of memory from it,
> remapping it so that it's DMA coherent, and then performing DMA
> on the resulting buffer?

Huge pages.

Also, don't treat it as coherent memory and just flush/clear/invalidate
cache before and after each DMA transaction. I never understood what's
wrong with that approach.

--
Best regards, _ _
.o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +--<mina86-tlen.pl>--<jid:mina86-jabber.org>--ooO--(_)--Ooo--
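
The per-transfer cache maintenance Michal describes above is
essentially what the existing streaming DMA API already provides; a
minimal sketch, in which dev, buf and len are hypothetical parameters
supplied by the caller:

  #include <linux/dma-mapping.h>

  static int example_dma_to_device(struct device *dev, void *buf, size_t len)
  {
          dma_addr_t handle;

          /* Performs whatever CPU cache maintenance the architecture
           * needs before the device reads the buffer. */
          handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
          if (dma_mapping_error(dev, handle))
                  return -EIO;

          /* ... program the device with 'handle', wait for completion ... */

          /* Completes the maintenance for this transfer direction. */
          dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
          return 0;
  }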

2010-12-23 13:45:34

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCHv8 00/12] Contiguous Memory Allocator

On Thu, Dec 23, 2010 at 02:09:44PM +0100, Marek Szyprowski wrote:
> Hello,
>
> On Thursday, December 23, 2010 1:19 PM Russell King - ARM Linux wrote:
>
> > On Thu, Dec 23, 2010 at 11:58:08AM +0100, Marek Szyprowski wrote:
> > > Actually this contiguous memory allocator is a better replacement for
> > > alloc_pages() which is used by dma_alloc_coherent(). It is a generic
> > > framework that is not tied only to ARM architecture.
> >
> > ... which is open to abuse. What I'm trying to find out is - if it
> > can't be used for DMA, what is it to be used for?
> >
> > Or are we inventing an everything-but-ARM framework?
>
> We are trying to get something that really works and SOLVES some of the
> problems with real devices that require contiguous memory for DMA.

So, here you've confirmed that it's for DMA.

> > > > In other words, do we _actually_ have a use for this which doesn't
> > > > involve doing something like allocating 32MB of memory from it,
> > > > remapping it so that it's DMA coherent, and then performing DMA
> > > > on the resulting buffer?
> > >
> > > This is an arm specific problem, also related to dma_alloc_coherent()
> > > allocator. To be 100% conformant with ARM specification we would
> > > probably need to unmap all pages used by the dma_coherent allocator
> > > from the LOW MEM area. This is doable, but completely not related
> > > to the CMA and this patch series.
> >
> > You've already been told why we can't unmap pages from the kernel
> > direct mapping.
>
> It requires some amount of work but I see no reason why we shouldn't be
> able to unmap that pages to stay 100% conformant with ARM spec.

I have considered - and tried - to do that with the dma_alloc_coherent()
spec, but it is NOT POSSIBLE to do so - too many factors stand in the
way of making it work, such as the need to bring the system to a
complete halt to modify all the L1 page tables and broadcast the TLB
operations to invalidate the old mappings. None of that can be done
from all the contexts in which dma_alloc_coherent() is called.

> Please notice that there are also use cases where the memory will not be
> accessed by the CPU at all (like DMA transfers between multimedia devices
> and the system memory).

Rubbish - if you think that, then you have very little understanding of
modern CPUs. Modern CPUs speculatively access _any_ memory which is
visible to them, and as the ARM architecture progresses, the speculative
prefetching will become more aggressive. So if you have memory mapped
in the kernel direct map, then you _have_ to assume that the CPU will
fire off accesses to that memory at any time, loading it into its cache.

> > Okay, so I'm just going to assume that CMA has _no_ _business_ being
> > used on ARM, and is not something that should interest anyone in the
> > ARM community.
>
> Go ahead! Remeber to remove dma_coherent because it also breaks the spec. :)
> Oh, I forgot. We can also remove all device drivers that might use DMA. :)

The only solution I've come up for dma_alloc_coherent() is to reserve
the entire coherent DMA region at boot time, taking it out of the
kernel's view of available memory and thereby preventing it from ever
being mapped or the kernel using that memory for any other purpose.
That's about the best we can realistically do for ARM to conform to the
spec.

Every time I've brought this issue up with you, you've brushed it aside.
So if you feel that the right thing to do is to ignore such issues, you
won't be surprised if I keep opposing your efforts to get this into
mainline.

If you're serious about making this work, then provide some proper code
which shows how to use this for DMA on ARM systems without violating
the architecture specification. Until you do, I see no hope that CMA
will ever be suitable for use on ARM.

2010-12-23 13:49:34

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCHv8 00/12] Contiguous Memory Allocator

On Thu, Dec 23, 2010 at 02:35:00PM +0100, Tomasz Fujak wrote:
> Dear Mr. King,
>
> AFAIK the CMA is the fourth attempt since 2008 taken to solve the
> multimedia memory allocation issue on some embedded devices. Most
> notably on ARM, that happens to be present in the SoCs we care about
> along the IOMMU-incapable multimedia IPs.
>
> I understand that you have your guidelines taken from the ARM
> specification, but this approach is not helping us.

I'm sorry you feel like that, but I'm living in reality. If we didn't
have these architecture restrictions then we wouldn't have this problem
in the first place.

What I'm trying to do here is to ensure that we remain _legal_ to the
architecture specification - which for this issue means that we avoid
corrupting people's data.

Maybe you like having a system which randomly corrupts people's data?
I most certainly don't. But that's the way CMA is heading at the moment
on ARM.

It is not up to me to solve these problems - that's for the proposer of
the new API to do so. So, please, don't try to lump this problem on
my shoulders. It's not my problem to sort out.

2010-12-23 13:52:08

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCHv8 00/12] Contiguous Memory Allocator

On Thu, Dec 23, 2010 at 02:41:26PM +0100, Michal Nazarewicz wrote:
> Russell King - ARM Linux <[email protected]> writes:
> > Has anyone addressed my issue with it that this is wide-open for
> > abuse by allocating large chunks of memory, and then remapping
> > them in some way with different attributes, thereby violating the
> > ARM architecture specification?
> >
> > In other words, do we _actually_ have a use for this which doesn't
> > involve doing something like allocating 32MB of memory from it,
> > remapping it so that it's DMA coherent, and then performing DMA
> > on the resulting buffer?
>
> Huge pages.
>
> Also, don't treat it as coherent memory and just flush/clear/invalidate
> cache before and after each DMA transaction. I never understood what's
> wrong with that approach.

If you've ever used an ARM system with a VIVT cache, you'll know what's
wrong with this approach.

ARM systems with VIVT caches have extremely poor task switching
performance because they flush the entire data cache at every task switch
- to the extent that it makes system performance drop dramatically when
they become loaded.

Doing that for every DMA operation will kill the advantage we've gained
from having VIPT caches and ASIDs stone dead.

2010-12-23 14:04:12

by Tomasz Fujak

[permalink] [raw]
Subject: Re: [PATCHv8 00/12] Contiguous Memory Allocator

On 2010-12-23 14:48, Russell King - ARM Linux wrote:
> On Thu, Dec 23, 2010 at 02:35:00PM +0100, Tomasz Fujak wrote:
>> Dear Mr. King,
>>
>> AFAIK the CMA is the fourth attempt since 2008 taken to solve the
>> multimedia memory allocation issue on some embedded devices. Most
>> notably on ARM, that happens to be present in the SoCs we care about
>> along the IOMMU-incapable multimedia IPs.
>>
>> I understand that you have your guidelines taken from the ARM
>> specification, but this approach is not helping us.
> I'm sorry you feel like that, but I'm living in reality. If we didn't
> have these architecture restrictions then we wouldn't have this problem
> in the first place.
Do we really have them, or do the documents just say they exist?
> What I'm trying to do here is to ensure that we remain _legal_ to the
> architecture specification - which for this issue means that we avoid
> corrupting people's data.
As legal as the mentioned dma_coherent?
> Maybe you like having a system which randomly corrupts people's data?
> I most certainly don't. But that's the way CMA is heading at the moment
> on ARM.
Has this been experienced? I had some ARM-compatible boards on my desk
(xscale, v6 and v7) and none of them crashed due to this behavior. And
we *do* have multiple memory mappings, with different attributes.
> It is not up to me to solve these problems - that's for the proposer of
> the new API to do so. So, please, don't try to lump this problem on
> my shoulders. It's not my problem to sort out.
Just great. Nothing short of spectacular - this way IA32 is going to
take the embedded market piece by piece once the big two advance their
foundry processes, despite the translator, all the burden of the legacy
ISA, and the fact that most embedded engineers at the high end are
accustomed to ARM.

In other words, should we take your response as yet another NAK?
Or would you try harder and at least point us in some direction that
would not doom the effort from the very beginning?
I understand that the role of an oracle is so much easier, but time is
running out, and devising one solution after another is not a good use
of engineers' time.

Best regards
---
Tomasz Fujak


2010-12-23 14:08:27

by Tomasz Fujak

[permalink] [raw]
Subject: Re: [PATCHv8 00/12] Contiguous Memory Allocator

On 2010-12-23 14:51, Russell King - ARM Linux wrote:
> On Thu, Dec 23, 2010 at 02:41:26PM +0100, Michal Nazarewicz wrote:
>> Russell King - ARM Linux <[email protected]> writes:
>>> Has anyone addressed my issue with it that this is wide-open for
>>> abuse by allocating large chunks of memory, and then remapping
>>> them in some way with different attributes, thereby violating the
>>> ARM architecture specification?
>>>
>>> In other words, do we _actually_ have a use for this which doesn't
>>> involve doing something like allocating 32MB of memory from it,
>>> remapping it so that it's DMA coherent, and then performing DMA
>>> on the resulting buffer?
>> Huge pages.
>>
>> Also, don't treat it as coherent memory and just flush/clear/invalidate
>> cache before and after each DMA transaction. I never understood what's
>> wrong with that approach.
> If you've ever used an ARM system with a VIVT cache, you'll know what's
> wrong with this approach.
>
> ARM systems with VIVT caches have extremely poor task switching
> performance because they flush the entire data cache at every task switch
> - to the extent that it makes system performance drop dramatically when
> they become loaded.
>
> Doing that for every DMA operation will kill the advantage we've gained
> from having VIPT caches and ASIDs stone dead.
This statement effectively means: don't map dma-able memory to the CPU
unless it's uncached. Have I missed anything?


2010-12-23 14:17:14

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCHv8 00/12] Contiguous Memory Allocator

On Thu, Dec 23, 2010 at 03:04:07PM +0100, Tomasz Fujak wrote:
> On 2010-12-23 14:48, Russell King - ARM Linux wrote:
> > On Thu, Dec 23, 2010 at 02:35:00PM +0100, Tomasz Fujak wrote:
> >> Dear Mr. King,
> >>
> >> AFAIK the CMA is the fourth attempt since 2008 taken to solve the
> >> multimedia memory allocation issue on some embedded devices. Most
> >> notably on ARM, that happens to be present in the SoCs we care about
> >> along the IOMMU-incapable multimedia IPs.
> >>
> >> I understand that you have your guidelines taken from the ARM
> >> specification, but this approach is not helping us.
> > I'm sorry you feel like that, but I'm living in reality. If we didn't
> > have these architecture restrictions then we wouldn't have this problem
> > in the first place.
> Do we really have them, or just the documents say they exist?

Yes. We have seen CPUs which lock up or crash as a result of mismatched
attributes in the page tables.

> > What I'm trying to do here is to ensure that we remain _legal_ to the
> > architecture specification - which for this issue means that we avoid
> > corrupting people's data.
> As legal as the mentioned dma_coherent?

See my other comment in an earlier email. See the patch which prevents
ioremap() being used on system memory. There is active movement at the
present time to sorting these violations out and find solutions for
them.

The last thing we need is a new API which introduces new violations.

> > Maybe you like having a system which randomly corrupts people's data?
> > I most certainly don't. But that's the way CMA is heading at the moment
> > on ARM.
> Has this been experienced? I had some ARM-compatible boards on my desk
> (xscale, v6 and v7) and none of them crashed due to this behavior. And
> we *do* have multiple memory mappings, with different attributes.

Xscale doesn't suffer from the problem. V6 doesn't aggressively speculate.
V7 speculates more aggressively, and corruption has been seen there.

> > It is not up to me to solve these problems - that's for the proposer of
> > the new API to do so. So, please, don't try to lump this problem on
> > my shoulders. It's not my problem to sort out.
> Just great. Nothing short of spectacular - this way the IA32 is going to
> take the embedded market piece by piece once the big two advance their
> foundry processes.

Look, I've been pointing out this problem ever since the very _first_
CMA patches were posted to the list, yet the CMA proponents have decided
to brush those problems aside each and every time I've raised them.

So, you should be asking _why_ the CMA proponents are choosing to ignore
this issue completely, rather than working to resolve it.

If it's resolved, then the problem goes away.

> In other words, should we take your response as yet another NAK?
> Or would you try harder and at least point us to some direction that
> would not doom the effort from the very beginning.

What the fsck do you think I've been doing? This is NOT THE FIRST time
I've raised this issue. I gave up raising it after the first couple
of attempts because I wasn't being listened to.

You say about _me_ not being very helpful. How about the CMA proponents
start taking the issue I've raised seriously, and try to work out how
to solve it? And how about blaming them for the months of wasted time
on this issue _because_ _they_ have chosen to ignore it?

2010-12-23 14:21:27

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCHv8 00/12] Contiguous Memory Allocator

On Thu, Dec 23, 2010 at 03:08:21PM +0100, Tomasz Fujak wrote:
> On 2010-12-23 14:51, Russell King - ARM Linux wrote:
> > On Thu, Dec 23, 2010 at 02:41:26PM +0100, Michal Nazarewicz wrote:
> >> Russell King - ARM Linux <[email protected]> writes:
> >>> Has anyone addressed my issue with it that this is wide-open for
> >>> abuse by allocating large chunks of memory, and then remapping
> >>> them in some way with different attributes, thereby violating the
> >>> ARM architecture specification?
> >>>
> >>> In other words, do we _actually_ have a use for this which doesn't
> >>> involve doing something like allocating 32MB of memory from it,
> >>> remapping it so that it's DMA coherent, and then performing DMA
> >>> on the resulting buffer?
> >> Huge pages.
> >>
> >> Also, don't treat it as coherent memory and just flush/clear/invalidate
> >> cache before and after each DMA transaction. I never understood what's
> >> wrong with that approach.
> > If you've ever used an ARM system with a VIVT cache, you'll know what's
> > wrong with this approach.
> >
> > ARM systems with VIVT caches have extremely poor task switching
> > performance because they flush the entire data cache at every task switch
> > - to the extent that it makes system performance drop dramatically when
> > they become loaded.
> >
> > Doing that for every DMA operation will kill the advantage we've gained
> > from having VIPT caches and ASIDs stone dead.
> This statement effectively means: don't map dma-able memory to the CPU
> unless it's uncached. Have I missed anything?

I'll give you another solution to the problem - lobby ARM Ltd to have
this restriction lifted from the architecture specification, which
will probably result in the speculative prefetching also having to be
removed.

That would be my preferred solution if I had the power to do so, but
I have to live with what ARM Ltd (and their partners such as yourselves)
decide should end up in the architecture specification.

2010-12-23 14:43:00

by Felipe Contreras

[permalink] [raw]
Subject: Re: [PATCHv8 00/12] Contiguous Memory Allocator

On Thu, Dec 23, 2010 at 4:16 PM, Russell King - ARM Linux
<[email protected]> wrote:
> On Thu, Dec 23, 2010 at 03:04:07PM +0100, Tomasz Fujak wrote:
>> In other words, should we take your response as yet another NAK?
>> Or would you try harder and at least point us to some direction that
>> would not doom the effort from the very beginning.
>
> What the fsck do you think I've been doing?  This is NOT THE FIRST time
> I've raised this issue.  I gave up raising it after the first couple
> of attempts because I wasn't being listened to.
>
> You say about _me_ not being very helpful.  How about the CMA proponents
> start taking the issue I've raised seriously, and try to work out how
> to solve it?  And how about blaming them for the months of wasted time
> on this issue _because_ _they_ have chosen to ignore it?

I've also raised the issue for ARM. However, I don't see what the big
problem is.

A generic solution (that I think I already proposed) would be to
reserve a chunk of memory for the CMA that can be removed from the
normally mapped kernel memory through memblock at boot time. The size
of this memory region would be configurable through kconfig. Then, the
CMA would have a "dma" flag or something, and take chunks out of it
until there's no more, and then return errors. That would work for
ARM.

Cheers.

--
Felipe Contreras
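
A rough sketch of the boot-time reservation Felipe describes, written
against the memblock API of this (2.6.37-era) kernel; the size, the
1 MiB alignment and the function name are assumptions, not something
taken from the patchset:

  #include <linux/init.h>
  #include <linux/memblock.h>

  static phys_addr_t example_dma_base;
  static phys_addr_t example_dma_size = 16 << 20;  /* e.g. from Kconfig */

  /* Hypothetical: called from the machine's early reservation hook. */
  void __init example_reserve_dma_area(void)
  {
          example_dma_base = memblock_alloc(example_dma_size, 1 << 20);
          if (!example_dma_base)
                  return;

          /* Take the range out of the kernel's view of memory entirely
           * so that it never receives a cacheable lowmem mapping. */
          memblock_free(example_dma_base, example_dma_size);
          memblock_remove(example_dma_base, example_dma_size);
  }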

2010-12-23 15:02:15

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCHv8 00/12] Contiguous Memory Allocator

>> On Thu, Dec 23, 2010 at 03:04:07PM +0100, Tomasz Fujak wrote:
>>> In other words, should we take your response as yet another NAK?
>>> Or would you try harder and at least point us to some direction that
>>> would not doom the effort from the very beginning.

> On Thu, Dec 23, 2010 at 4:16 PM, Russell King - ARM Linux
> <[email protected]> wrote:
>> What the fsck do you think I've been doing?  This is NOT THE FIRST time
>> I've raised this issue.  I gave up raising it after the first couple
>> of attempts because I wasn't being listened to.
>>
>> You say about _me_ not being very helpful.  How about the CMA proponents
>> start taking the issue I've raised seriously, and try to work out how
>> to solve it?  And how about blaming them for the months of wasted time
>> on this issue _because_ _they_ have chosen to ignore it?

Felipe Contreras <[email protected]> writes:
> I've also raised the issue for ARM. However, I don't see what is the
> big problem.
>
> A generic solution (that I think I already proposed) would be to
> reserve a chunk of memory for the CMA that can be removed from the
> normally mapped kernel memory through memblock at boot time. The size
> of this memory region would be configurable through kconfig. Then, the
> CMA would have a "dma" flag or something,

Having exactly that usage in mind, in v8 I've added the notion of
private CMA contexts, which can be used for DMA coherent RAM as well as
memory-mapped devices.

> and take chunks out of it until there's no more, and then return
> errors. That would work for ARM.

--
Best regards, _ _
.o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +--<mina86-tlen.pl>--<jid:mina86-jabber.org>--ooO--(_)--Ooo--

2010-12-23 15:35:34

by Tomasz Fujak

[permalink] [raw]
Subject: Re: [PATCHv8 00/12] Contiguous Memory Allocator

On 2010-12-23 15:20, Russell King - ARM Linux wrote:
> On Thu, Dec 23, 2010 at 03:08:21PM +0100, Tomasz Fujak wrote:
>> On 2010-12-23 14:51, Russell King - ARM Linux wrote:
>>> On Thu, Dec 23, 2010 at 02:41:26PM +0100, Michal Nazarewicz wrote:
>>>> Russell King - ARM Linux <[email protected]> writes:
>>>>> Has anyone addressed my issue with it that this is wide-open for
>>>>> abuse by allocating large chunks of memory, and then remapping
>>>>> them in some way with different attributes, thereby violating the
>>>>> ARM architecture specification?
>>>>>
>>>>> In other words, do we _actually_ have a use for this which doesn't
>>>>> involve doing something like allocating 32MB of memory from it,
>>>>> remapping it so that it's DMA coherent, and then performing DMA
>>>>> on the resulting buffer?
>>>> Huge pages.
>>>>
>>>> Also, don't treat it as coherent memory and just flush/clear/invalidate
>>>> cache before and after each DMA transaction. I never understood what's
>>>> wrong with that approach.
>>> If you've ever used an ARM system with a VIVT cache, you'll know what's
>>> wrong with this approach.
>>>
>>> ARM systems with VIVT caches have extremely poor task switching
>>> performance because they flush the entire data cache at every task switch
>>> - to the extent that it makes system performance drop dramatically when
>>> they become loaded.
>>>
>>> Doing that for every DMA operation will kill the advantage we've gained
>>> from having VIPT caches and ASIDs stone dead.
>> This statement effectively means: don't map dma-able memory to the CPU
>> unless it's uncached. Have I missed anything?
> I'll give you another solution to the problem - lobby ARM Ltd to have
> this restriction lifted from the architecture specification, which
> will probably result in the speculative prefetching also having to be
> removed.
>
Isn't disabling the forwarding of speculative accesses to the AXI bus
the solution to our woes?
At least on the A8, which happens to be paired with non-IOMMU-capable
IPs on our SoCs.
On the A9 the bit is gone (or has it moved?), but we have an IOMMU
there, so CMA isn't needed.

http://infocenter.arm.com/
Cortex-A8 Technical Reference Manual Revision: r3p2
3.2.26. c1, Auxiliary Control Register
CP15, c1, c0, bit 4: Enables speculative accesses on AXI
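
For reference, a sketch of how that register could at least be
inspected from kernel code; whether the bit can be cleared from the
non-secure side at all depends on the SoC and its firmware, so treat
this purely as an illustration:

  /* Bit 4 of the Cortex-A8 Auxiliary Control Register, per the TRM
   * reference above (illustrative name). */
  #define EXAMPLE_A8_ACTLR_AXI_SPECULATIVE  (1 << 4)

  /* Read the Auxiliary Control Register (CP15 c1, c0, 1). */
  static inline unsigned int example_read_actlr(void)
  {
          unsigned int v;

          asm volatile("mrc p15, 0, %0, c1, c0, 1" : "=r" (v));
          return v;
  }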

> That would be my preferred solution if I had the power to do so, but
> I have to live with what ARM Ltd (and their partners such as yourselves)
> decide should end up in the architecture specification.

2010-12-23 18:04:53

by David Brown

[permalink] [raw]
Subject: Re: [PATCHv8 00/12] Contiguous Memory Allocator

Felipe Contreras <[email protected]> writes:

> On Thu, Dec 23, 2010 at 4:16 PM, Russell King - ARM Linux

> A generic solution (that I think I already proposed) would be to
> reserve a chunk of memory for the CMA that can be removed from the
> normally mapped kernel memory through memblock at boot time. The size
> of this memory region would be configurable through kconfig. Then, the
> CMA would have a "dma" flag or something, and take chunks out of it
> until there's no more, and then return errors. That would work for
> ARM.

That sounds an awful lot like the Android kernel's pmem implementation.

Solving this problem is important for us as well, but I'm not sure I
see a better solution than something like Felipe suggests.

The disadvantage, of course, is that the memory isn't available to the
system when the user isn't doing multimedia.

David

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2011-02-02 12:43:47

by Ankita Garg

[permalink] [raw]
Subject: Re: [PATCHv8 07/12] mm: cma: Contiguous Memory Allocator added

Hi Michal,

On Wed, Dec 15, 2010 at 09:34:27PM +0100, Michal Nazarewicz wrote:
> The Contiguous Memory Allocator is a set of functions that lets
> one initialise a region of memory which then can be used to perform
> allocations of contiguous memory chunks from.
>
> CMA allows for creation of private and non-private contexts.
> The former is reserved for CMA and no other kernel subsystem can
> use it. The latter allows for movable pages to be allocated within
> CMA's managed memory so that it can be used for page cache when
> CMA devices do not use it.
>
> Signed-off-by: Michal Nazarewicz <[email protected]>
> Signed-off-by: Kyungmin Park <[email protected]>
> ---
>

<snip>

> +/************************* Initialise CMA *************************/
> +
> +unsigned long cma_reserve(unsigned long start, unsigned long size,
> + unsigned long alignment)
> +{
> + pr_debug("%s(%p+%p/%p)\n", __func__, (void *)start, (void *)size,
> + (void *)alignment);
> +
> + /* Sanity checks */
> + if (!size || (alignment & (alignment - 1)))
> + return (unsigned long)-EINVAL;
> +
> + /* Sanitise input arguments */
> + start = PAGE_ALIGN(start);
> + size = PAGE_ALIGN(size);
> + if (alignment < PAGE_SIZE)
> + alignment = PAGE_SIZE;
> +
> + /* Reserve memory */
> + if (start) {
> + if (memblock_is_region_reserved(start, size) ||
> + memblock_reserve(start, size) < 0)
> + return (unsigned long)-EBUSY;
> + } else {
> + /*
> + * Use __memblock_alloc_base() since
> + * memblock_alloc_base() panic()s.
> + */
> + u64 addr = __memblock_alloc_base(size, alignment, 0);
> + if (!addr) {
> + return (unsigned long)-ENOMEM;
> + } else if (addr + size > ~(unsigned long)0) {
> + memblock_free(addr, size);
> + return (unsigned long)-EOVERFLOW;
> + } else {
> + start = addr;
> + }
> + }
> +

Reserving the areas of memory belonging to CMA using memblock_reserve
would exclude that range from the zones, so it would not be available
for buddy allocations, right?

> + return start;
> +}
> +
> +

--
Regards,
Ankita Garg ([email protected])
Linux Technology Center
IBM India Systems & Technology Labs,
Bangalore, India

2011-02-02 14:58:54

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCHv8 07/12] mm: cma: Contiguous Memory Allocator added

> On Wed, Dec 15, 2010 at 09:34:27PM +0100, Michal Nazarewicz wrote:
>> +unsigned long cma_reserve(unsigned long start, unsigned long size,
>> + unsigned long alignment)
>> +{
>> + pr_debug("%s(%p+%p/%p)\n", __func__, (void *)start, (void *)size,
>> + (void *)alignment);
>> +
>> + /* Sanity checks */
>> + if (!size || (alignment & (alignment - 1)))
>> + return (unsigned long)-EINVAL;
>> +
>> + /* Sanitise input arguments */
>> + start = PAGE_ALIGN(start);
>> + size = PAGE_ALIGN(size);
>> + if (alignment < PAGE_SIZE)
>> + alignment = PAGE_SIZE;
>> +
>> + /* Reserve memory */
>> + if (start) {
>> + if (memblock_is_region_reserved(start, size) ||
>> + memblock_reserve(start, size) < 0)
>> + return (unsigned long)-EBUSY;
>> + } else {
>> + /*
>> + * Use __memblock_alloc_base() since
>> + * memblock_alloc_base() panic()s.
>> + */
>> + u64 addr = __memblock_alloc_base(size, alignment, 0);
>> + if (!addr) {
>> + return (unsigned long)-ENOMEM;
>> + } else if (addr + size > ~(unsigned long)0) {
>> + memblock_free(addr, size);
>> + return (unsigned long)-EOVERFLOW;
>> + } else {
>> + start = addr;
>> + }
>> + }
>> +

On Wed, 02 Feb 2011 13:43:33 +0100, Ankita Garg <[email protected]> wrote:
> Reserving the areas of memory belonging to CMA using memblock_reserve,
> would preclude that range from the zones, due to which it would not be
> available for buddy allocations right ?

Correct. CMA, however, injects allocated pageblocks into buddy so they
end up there with their migratetype set to MIGRATE_CMA.

>> + return start;
>> +}

--
Best regards, _ _
.o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +-<email/jid: [email protected]>--------ooO--(_)--Ooo--
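
A rough sketch of that injection step (not the patchset's exact code,
and it relies on mm-internal helpers, so think of it as living in
mm/page_alloc.c): every pageblock in the reserved range is turned back
into ordinary free pages, marked MIGRATE_CMA, and handed to the buddy
allocator as one high-order block:

  /* 'page' is assumed to be the first page of a pageblock-aligned,
   * pageblock-sized range that was reserved at boot. */
  static void __init example_release_pageblock_to_buddy(struct page *page)
  {
          unsigned long i = pageblock_nr_pages;
          struct page *p = page;

          /* Boot-reserved pages carry PG_reserved; clear it and reset
           * the reference counts before buddy sees them. */
          do {
                  __ClearPageReserved(p);
                  set_page_count(p, 0);
          } while (++p, --i);

          set_pageblock_migratetype(page, MIGRATE_CMA);

          /* Free the whole pageblock as a single high-order block. */
          set_page_refcounted(page);
          __free_pages(page, pageblock_order);
  }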