2011-03-31 13:19:36

by Marek Szyprowski

Subject: [PATCHv9 0/12] Contiguous Memory Allocator

Hello everyone,

Things have been a bit quiet around the Contiguous Memory Allocator for the
last few months. This is mainly because Michal (the author of the CMA
patches) has left our team and we needed some time to take over the
development.

This version is mainly a rebase and adaptation to the 2.6.39-rc1 kernel
release. I've also managed to fix a bunch of nasty bugs that caused
problems when an allocation failed.

A few words for those who are seeing CMA for the first time:

The Contiguous Memory Allocator (CMA) makes it possible for device
drivers to allocate big contiguous chunks of memory after the system
has booted.

The main difference from similar frameworks is that CMA allows the
memory region reserved for big chunk allocations to be transparently
reused as system memory, so no memory is wasted when no big chunk is
allocated. Once an allocation request is issued, the framework migrates
system pages to create the required big chunk of physically contiguous
memory.
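
To make the idea concrete for first-time readers, here is a minimal,
hypothetical driver-side sketch (the function name and the 2 MiB size are
made up); it relies only on the cm_alloc()/cm_free() calls declared in
include/linux/cma.h later in this series:

#include <linux/cma.h>
#include <linux/err.h>

/* Hypothetical sketch: grab a 2 MiB physically contiguous chunk from a
 * CMA context provided by platform code, use it, then release it. */
static int example_use_cma(struct cma *ctx)
{
	struct cm *chunk;

	chunk = cm_alloc(ctx, 2 << 20, 0);	/* 2 MiB, default alignment */
	if (IS_ERR(chunk))
		return PTR_ERR(chunk);

	/* ... hand the chunk to the device (see cm_pin() in patch 06) ... */

	cm_free(chunk);
	return 0;
}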

For more information see the changelog and the links to previous versions
of the CMA framework.

The current version is just an allocator that handles allocation of
contiguous memory blocks. The differences between this patchset and
Kamezawa's alloc_contig_pages() are:

1. alloc_contig_pages() requires MAX_ORDER alignment of allocations
which may be unsuitable for embedded systems where only a few MiBs are
required.

The lack of an alignment requirement means that several threads
might try to access the same pageblock/page. To prevent this from
happening, CMA uses a mutex so that only one cm_alloc()/cm_free()
call may run at a time (a short sketch of this serialisation follows
the list below).

2. CMA may use its own migratetype (MIGRATE_CMA) which behaves
similarly to ZONE_MOVABLE but can be put in arbitrary places.

This is required for us since we need to define two disjoint memory
ranges inside system RAM (i.e. in two memory banks; do not confuse
these with NUMA nodes).

3. alloc_contig_pages() scans memory in search of a range that could be
migrated. CMA, on the other hand, maintains its own allocator to
decide where to allocate memory for device drivers and then tries
to migrate pages from that range if needed. This is not strictly
required, but I somehow feel it might be faster.
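
The serialisation mentioned in point 1 is implemented with a single mutex
in mm/cma.c later in the series; stripped of the allocator details it
amounts to something like this (simplified sketch, cma_serialised() is an
invented helper, not the actual code):

#include <linux/mutex.h>

/* One global lock: at most one allocation or free touches pageblocks at
 * any time, so the missing alignment requirement cannot lead to two
 * threads working on the same pageblock. */
static DEFINE_MUTEX(cma_mutex);

static int cma_serialised(int (*op)(void *arg), void *arg)
{
	int ret;

	mutex_lock(&cma_mutex);
	ret = op(arg);		/* e.g. the body of cm_alloc() or cm_free() */
	mutex_unlock(&cma_mutex);

	return ret;
}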

The current version doesn't handle any DMA mapping for the allocated
chunks. This topic will be handled by a separate patch series in the
future.

Despite that, we were able to test this version with real multimedia
hardware on the Samsung S5PC110 platform. We used V4L2 drivers that already
contain support for the CMA-based memory allocator in the videobuf2
framework. Please refer to the last 3 patches of this series to get an idea
of how it can be used.


Links to previous versions of the patchset:
v8: <http://article.gmane.org/gmane.linux.kernel.mm/56855>
v7: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
v6: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
v5: (intentionally left out as CMA v5 was identical to CMA v4)
v4: <http://article.gmane.org/gmane.linux.kernel.mm/52010>
v3: <http://article.gmane.org/gmane.linux.kernel.mm/51573>
v2: <http://article.gmane.org/gmane.linux.kernel.mm/50986>
v1: <http://article.gmane.org/gmane.linux.kernel.mm/50669>


Changelog:

v9: 1. Rebased onto 2.6.39-rc1 and resolved all conflicts

2. Fixed a bunch of nasty bugs that happened when the allocation
failed (mainly kernel oops due to NULL ptr dereference).

3. Introduced testing code: cma-regions compatibility layer and
videobuf2-cma memory allocator module.

v8: 1. The alloc_contig_range() function has now been separated from
CMA and put in page_alloc.c. This function tries to
migrate all LRU pages in the specified range and then allocates the
range using alloc_contig_freed_pages().

2. Support for MIGRATE_CMA has been separated from the CMA code.
I have not tested if CMA works with ZONE_MOVABLE but I see no
reasons why it shouldn't.

3. I have added a @private argument when creating CMA contexts so
that one can reserve memory and not share it with the rest of
the system. This way, CMA acts only as an allocation algorithm;
a minimal sketch of creating such a private context follows.
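
#include <linux/cma.h>

/* Hypothetical helper: the region [base, base + size) is reserved
 * exclusively for the device (e.g. by an earlier cma_reserve() call), so
 * CMA acts purely as an allocator over it and never migrates pages. */
static struct cma *example_private_ctx(unsigned long base, unsigned long size)
{
	return cma_create(base, size, 0, /* private = */ true);
}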

v7: 1. A lot of the functionality that handled the driver->allocator_context
mapping has been removed from the patchset. This is not to say
that this code is not needed; it's just not worth posting
everything in one patchset.

Currently, CMA is "just" an allocator. It uses its own
migratetype (MIGRATE_CMA) for defining ranges of pageblocks
which behave just like ZONE_MOVABLE but, unlike the latter, can
be put in arbitrary places.

2. The migration code that was introduced in the previous version
actually started working.


v6: 1. Most importantly, v6 introduces support for memory migration.
The implementation is not yet complete though.

Migration support means that when CMA is not using memory
reserved for it, the page allocator can allocate pages from it.
When CMA wants to use the memory, the pages have to be moved
and/or evicted so as to make room for CMA.

To make this possible, it must be guaranteed that only movable and
reclaimable pages are allocated in CMA-controlled regions.
This is done by introducing a MIGRATE_CMA migrate type that
guarantees exactly that.

Some of the migration code is "borrowed" from Kamezawa
Hiroyuki's alloc_contig_pages() implementation. The main
difference is that, thanks to the MIGRATE_CMA migrate type, CMA
assumes that memory controlled by CMA is always movable or
reclaimable, so it makes allocation decisions regardless of
whether some pages are actually allocated, and migrates them
if needed.

The most interesting patches from the patchset that implement
the functionality are:

09/13: mm: alloc_contig_free_pages() added
10/13: mm: MIGRATE_CMA migration type added
11/13: mm: MIGRATE_CMA isolation functions added
12/13: mm: cma: Migration support added [wip]

Currently, the kernel panics in some situations, which I am
trying to investigate.

2. cma_pin() and cma_unpin() functions have been added (after
a conversation with Johan Mossberg). The idea is that whenever
the hardware is not using the memory (no transaction is in
progress) the chunk can be moved around. This would allow
defragmentation to be implemented if desired. No defragmentation
algorithm is provided at this time.

3. Sysfs support has been replaced with debugfs. I had always felt
unsure about the sysfs interface, and when Greg KH pointed it
out I finally got around to rewriting it for debugfs.


v5: (intentionally left out as CMA v5 was identical to CMA v4)


v4: 1. The "asterisk" flag has been removed in favour of requiring
that the platform provide a "*=<regions>" rule in the map
attribute.

2. The terminology has been changed slightly, renaming "kind" of
memory to "type" of memory. In the previous revisions, the
documentation indicated that device drivers define memory kinds;
now they define memory types.

v3: 1. The command line parameters have been removed (and moved to
a separate patch, the fourth one). As a consequence, the
cma_set_defaults() function has been changed -- it no longer
accepts a string with a list of regions but an array of regions.

2. The "asterisk" attribute has been removed. Now, each region
has an "asterisk" flag which lets one specify whether this
region should by considered "asterisk" region.

3. SysFS support has been moved to a separate patch (the third one
in the series) and now also includes list of regions.

v2: 1. The "cma_map" command line argument has been removed. In exchange,
a SysFS entry has been created under kernel/mm/contiguous.

The intended way of specifying the attributes is
a cma_set_defaults() function called by platform initialisation
code. The "regions" attribute (the string specified by the "cma"
command line parameter) can be overwritten with a command line
parameter; the other attributes can be changed at run-time
using the SysFS entries.

2. The behaviour of the "map" attribute has been modified
slightly. Currently, if no rule matches a given device, it is
assigned the regions specified by the "asterisk" attribute, which
is by default built from the region names given in the "regions"
attribute.

3. Devices can register private regions as well as regions that
can be shared but are not reserved using standard CMA
mechanisms. A private region has no name and can be accessed
only by devices that have the pointer to it.

4. The way allocators are registered has changed. Currently,
a cma_allocator_register() function is used for that purpose.
Moreover, allocators are attached to regions the first time
memory is registered from the region or when the allocator is
registered, which means that allocators can be dynamic modules
that are loaded after the kernel has booted (of course, it won't
be possible to allocate a chunk of memory from a region if its
allocator is not loaded).

5. Index of new functions:

+static inline dma_addr_t __must_check
+cma_alloc_from(const char *regions, size_t size,
+ dma_addr_t alignment)

+static inline int
+cma_info_about(struct cma_info *info, const char *regions)

+int __must_check cma_region_register(struct cma_region *reg);

+dma_addr_t __must_check
+cma_alloc_from_region(struct cma_region *reg,
+ size_t size, dma_addr_t alignment);

+int cma_allocator_register(struct cma_allocator *alloc);


Patches in this patchset:

lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()
lib: genalloc: Generic allocator improvements

Some improvements to the genalloc API (most importantly, the possibility
to allocate memory with an alignment requirement); a rough usage sketch
follows.
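
The sketch below mirrors the calls made from mm/cma.c later in the series;
gen_pool_alloc_aligned() is the new call added by this patch, while the
example_* names and the sizes are invented for illustration:

#include <linux/genalloc.h>

static struct gen_pool *example_pool;

/* Manage a 16 MiB physical range with page-sized granules. */
static int __init example_pool_init(unsigned long base)
{
	example_pool = gen_pool_create(PAGE_SHIFT, -1);
	if (!example_pool)
		return -ENOMEM;

	return gen_pool_add(example_pool, base, 16 << 20, -1);
}

/* Carve out a 1 MiB chunk aligned to 1 MiB; the alignment is passed as
 * an order: ffs(1 MiB) - 1 == 20.  Returns 0 when the pool is exhausted. */
static unsigned long example_pool_alloc(void)
{
	return gen_pool_alloc_aligned(example_pool, 1 << 20, 20);
}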

mm: move some functions from memory_hotplug.c to page_isolation.c
mm: alloc_contig_freed_pages() added

Code "stolen" from Kamezawa. The first patch just moves code
around and the second provide function for "allocates" already
freed memory.

mm: alloc_contig_range() added

This is what Kamezawa asked for: a function that tries to migrate all
pages from a given range and then uses alloc_contig_freed_pages()
(defined by the previous commit) to allocate those pages. A short
caller sketch follows.
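
The caller passes a pfn range and, on success, owns the pages in it. The
sketch below follows the way mm/cma.c calls these functions later in the
series (the three-argument alloc_contig_range() shown here gains a
migratetype argument in a later patch; example_grab_range() is invented,
and the declarations are assumed to live in <linux/page-isolation.h> as
the diffstat suggests):

#include <linux/mm.h>
#include <linux/page-isolation.h>

/* Sketch: allocate 'count' contiguous pages starting at 'pfn' and give
 * them straight back. */
static int example_grab_range(unsigned long pfn, unsigned long count)
{
	int ret;

	ret = alloc_contig_range(pfn, pfn + count, 0);
	if (ret)
		return ret;	/* migration or allocation failed */

	/* ... the range [pfn, pfn + count) now belongs to the caller ... */

	free_contig_pages(pfn_to_page(pfn), count);
	return 0;
}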

mm: cma: Contiguous Memory Allocator added

The CMA code, but with no MIGRATE_CMA support yet. This assumes
that one uses ZONE_MOVABLE.

mm: MIGRATE_CMA migration type added
mm: MIGRATE_CMA isolation functions added
mm: MIGRATE_CMA support added to CMA

Introduction of the new migratetype and support for it in CMA.
MIGRATE_CMA works similarly to ZONE_MOVABLE except that almost any
memory range can be marked as one.

mm: cma: add CMA 'regions style' API (for testing)

This is a compatibility layer with the older CMA v1 API. It is mostly
intended to test the CMA framework with the multimedia drivers
that we have available for the Samsung S5PC110 platform. Not for
merging, just an example.

v4l: videobuf2: add CMA allocator (for testing)

The main client of the CMA 'regions style' framework. Updated to the
latest changes in the cma and cma-regions API. Used for testing the
multimedia drivers that we have available for the Samsung S5PC110
platform. Not for merging, just an example.

ARM: S5PC110: Added CMA regions to Aquila and Goni boards

A stub integration with some ARM machines. Mostly to get the cma
testing device working. Not for merging, just an example.



Patch summary:

KAMEZAWA Hiroyuki (2):
mm: move some functions from memory_hotplug.c to page_isolation.c
mm: alloc_contig_freed_pages() added

Marek Szyprowski (3):
mm: cma: add CMA 'regions style' API (for testing)
v4l: videobuf2: add CMA allocator (for testing)
ARM: cma: Added CMA regions to Aquila and Goni boards

Michal Nazarewicz (7):
lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()
lib: genalloc: Generic allocator improvements
mm: alloc_contig_range() added
mm: cma: Contiguous Memory Allocator added
mm: MIGRATE_CMA migration type added
mm: MIGRATE_CMA isolation functions added
mm: MIGRATE_CMA support added to CMA

arch/arm/mach-s5pv210/mach-aquila.c | 31 ++
arch/arm/mach-s5pv210/mach-goni.c | 31 ++
drivers/media/video/Kconfig | 6 +
drivers/media/video/Makefile | 1 +
drivers/media/video/videobuf2-cma.c | 227 +++++++++++
include/linux/bitmap.h | 24 +-
include/linux/cma-regions.h | 340 ++++++++++++++++
include/linux/cma.h | 264 ++++++++++++
include/linux/genalloc.h | 46 ++-
include/linux/mmzone.h | 43 ++-
include/linux/page-isolation.h | 50 ++-
include/media/videobuf2-cma.h | 40 ++
lib/bitmap.c | 22 +-
lib/genalloc.c | 182 +++++----
mm/Kconfig | 34 ++
mm/Makefile | 1 +
mm/cma-regions.c | 759 +++++++++++++++++++++++++++++++++++
mm/cma.c | 457 +++++++++++++++++++++
mm/compaction.c | 10 +
mm/internal.h | 3 +
mm/memory_hotplug.c | 109 -----
mm/page_alloc.c | 292 +++++++++++++-
mm/page_isolation.c | 126 ++++++-
23 files changed, 2827 insertions(+), 271 deletions(-)
create mode 100644 drivers/media/video/videobuf2-cma.c
create mode 100644 include/linux/cma-regions.h
create mode 100644 include/linux/cma.h
create mode 100644 include/media/videobuf2-cma.h
create mode 100644 mm/cma-regions.c
create mode 100644 mm/cma.c

--
1.7.1.569.g6f426


2011-03-31 13:16:25

by Marek Szyprowski

Subject: [PATCH 03/12] mm: move some functions from memory_hotplug.c to page_isolation.c

From: KAMEZAWA Hiroyuki <[email protected]>

Memory hotplug is logic for making pages unused in a specified range
of pfns, so some of its core logic can be reused for other purposes,
such as allocating a very large contiguous memory block.

This patch moves some functions from mm/memory_hotplug.c to
mm/page_isolation.c. This helps with adding a function for large
allocations to page_isolation.c that uses the memory-unplug technique.
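
For reference, a hedged sketch of how a large-allocation path might drive
the helpers exported here (example_empty_range() is illustration only;
the real caller is the alloc_contig_range() patch later in this series):

#include <linux/page-isolation.h>

/* Illustration only: try to push every LRU page out of
 * [start_pfn, end_pfn) using the functions made non-static by this patch. */
static int example_empty_range(unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long pfn;
	int pass, ret;

	if (!test_pages_in_a_zone(start_pfn, end_pfn))
		return -EINVAL;		/* range spans more than one zone */

	/* do_migrate_range() moves at most 256 pages per call, so retry
	 * a bounded number of times. */
	for (pass = 0; pass < 16; pass++) {
		pfn = scan_lru_pages(start_pfn, end_pfn);
		if (!pfn)
			return 0;	/* no LRU pages left in the range */

		ret = do_migrate_range(pfn, end_pfn);
		if (ret)
			return ret;
	}

	return -EBUSY;
}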

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
[m.nazarewicz: reworded commit message]
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
[m.szyprowski: rebase and updated to v2.6.39-rc1 changes]
Signed-off-by: Marek Szyprowski <[email protected]>
CC: Michal Nazarewicz <[email protected]>
---
include/linux/page-isolation.h | 7 +++
mm/memory_hotplug.c | 109 ---------------------------------------
mm/page_isolation.c | 111 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 118 insertions(+), 109 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 051c1b1..58cdbac 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
extern int set_migratetype_isolate(struct page *page);
extern void unset_migratetype_isolate(struct page *page);

+/*
+ * For migration.
+ */
+
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn);
+unsigned long scan_lru_pages(unsigned long start, unsigned long end);
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);

#endif
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 321fc74..746f6ed 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -640,115 +640,6 @@ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
}

/*
- * Confirm all pages in a range [start, end) is belongs to the same zone.
- */
-static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
-{
- unsigned long pfn;
- struct zone *zone = NULL;
- struct page *page;
- int i;
- for (pfn = start_pfn;
- pfn < end_pfn;
- pfn += MAX_ORDER_NR_PAGES) {
- i = 0;
- /* This is just a CONFIG_HOLES_IN_ZONE check.*/
- while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
- i++;
- if (i == MAX_ORDER_NR_PAGES)
- continue;
- page = pfn_to_page(pfn + i);
- if (zone && page_zone(page) != zone)
- return 0;
- zone = page_zone(page);
- }
- return 1;
-}
-
-/*
- * Scanning pfn is much easier than scanning lru list.
- * Scan pfn from start to end and Find LRU page.
- */
-static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
-{
- unsigned long pfn;
- struct page *page;
- for (pfn = start; pfn < end; pfn++) {
- if (pfn_valid(pfn)) {
- page = pfn_to_page(pfn);
- if (PageLRU(page))
- return pfn;
- }
- }
- return 0;
-}
-
-static struct page *
-hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
-{
- /* This should be improooooved!! */
- return alloc_page(GFP_HIGHUSER_MOVABLE);
-}
-
-#define NR_OFFLINE_AT_ONCE_PAGES (256)
-static int
-do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
-{
- unsigned long pfn;
- struct page *page;
- int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
- int not_managed = 0;
- int ret = 0;
- LIST_HEAD(source);
-
- for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
- if (!pfn_valid(pfn))
- continue;
- page = pfn_to_page(pfn);
- if (!page_count(page))
- continue;
- /*
- * We can skip free pages. And we can only deal with pages on
- * LRU.
- */
- ret = isolate_lru_page(page);
- if (!ret) { /* Success */
- list_add_tail(&page->lru, &source);
- move_pages--;
- inc_zone_page_state(page, NR_ISOLATED_ANON +
- page_is_file_cache(page));
-
- } else {
-#ifdef CONFIG_DEBUG_VM
- printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
- pfn);
- dump_page(page);
-#endif
- /* Becasue we don't have big zone->lock. we should
- check this again here. */
- if (page_count(page)) {
- not_managed++;
- ret = -EBUSY;
- break;
- }
- }
- }
- if (!list_empty(&source)) {
- if (not_managed) {
- putback_lru_pages(&source);
- goto out;
- }
- /* this function returns # of failed pages */
- ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
- true, true);
- if (ret)
- putback_lru_pages(&source);
- }
-out:
- return ret;
-}
-
-/*
* remove from free_area[] and mark all as Reserved.
*/
static int
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 4ae42bb..8a3122c 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -5,6 +5,9 @@
#include <linux/mm.h>
#include <linux/page-isolation.h>
#include <linux/pageblock-flags.h>
+#include <linux/memcontrol.h>
+#include <linux/migrate.h>
+#include <linux/mm_inline.h>
#include "internal.h"

static inline struct page *
@@ -139,3 +142,111 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
spin_unlock_irqrestore(&zone->lock, flags);
return ret ? 0 : -EBUSY;
}
+
+
+/*
+ * Confirm all pages in a range [start, end) is belongs to the same zone.
+ */
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long pfn;
+ struct zone *zone = NULL;
+ struct page *page;
+ int i;
+ for (pfn = start_pfn;
+ pfn < end_pfn;
+ pfn += MAX_ORDER_NR_PAGES) {
+ i = 0;
+ /* This is just a CONFIG_HOLES_IN_ZONE check.*/
+ while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
+ i++;
+ if (i == MAX_ORDER_NR_PAGES)
+ continue;
+ page = pfn_to_page(pfn + i);
+ if (zone && page_zone(page) != zone)
+ return 0;
+ zone = page_zone(page);
+ }
+ return 1;
+}
+
+/*
+ * Scanning pfn is much easier than scanning lru list.
+ * Scan pfn from start to end and Find LRU page.
+ */
+unsigned long scan_lru_pages(unsigned long start, unsigned long end)
+{
+ unsigned long pfn;
+ struct page *page;
+ for (pfn = start; pfn < end; pfn++) {
+ if (pfn_valid(pfn)) {
+ page = pfn_to_page(pfn);
+ if (PageLRU(page))
+ return pfn;
+ }
+ }
+ return 0;
+}
+
+struct page *
+hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
+{
+ /* This should be improooooved!! */
+ return alloc_page(GFP_HIGHUSER_MOVABLE);
+}
+
+#define NR_OFFLINE_AT_ONCE_PAGES (256)
+int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long pfn;
+ struct page *page;
+ int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
+ int not_managed = 0;
+ int ret = 0;
+ LIST_HEAD(source);
+
+ for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+ if (!pfn_valid(pfn))
+ continue;
+ page = pfn_to_page(pfn);
+ if (!page_count(page))
+ continue;
+ /*
+ * We can skip free pages. And we can only deal with pages on
+ * LRU.
+ */
+ ret = isolate_lru_page(page);
+ if (!ret) { /* Success */
+ list_add_tail(&page->lru, &source);
+ move_pages--;
+ inc_zone_page_state(page, NR_ISOLATED_ANON +
+ page_is_file_cache(page));
+
+ } else {
+#ifdef CONFIG_DEBUG_VM
+ printk(KERN_ALERT "removing pfn %lx from LRU failed\n",
+ pfn);
+ dump_page(page);
+#endif
+ /* Because we don't have big zone->lock. we should
+ check this again here. */
+ if (page_count(page)) {
+ not_managed++;
+ ret = -EBUSY;
+ break;
+ }
+ }
+ }
+ if (!list_empty(&source)) {
+ if (not_managed) {
+ putback_lru_pages(&source);
+ goto out;
+ }
+ /* this function returns # of failed pages */
+ ret = migrate_pages(&source, hotremove_migrate_alloc, 0, true, true);
+ if (ret)
+ putback_lru_pages(&source);
+ }
+out:
+ return ret;
+}
--
1.7.1.569.g6f426

2011-03-31 13:16:34

by Marek Szyprowski

Subject: [PATCH 09/12] mm: MIGRATE_CMA support added to CMA

From: Michal Nazarewicz <[email protected]>

This commit adds MIGRATE_CMA migratetype support to CMA.
The advantage is that an (almost) arbitrary memory range can
be marked as MIGRATE_CMA, which may not be the case with
ZONE_MOVABLE.

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
Signed-off-by: Marek Szyprowski <[email protected]>
CC: Michal Nazarewicz <[email protected]>
---
include/linux/cma.h | 58 +++++++++++++++---
mm/cma.c | 167 ++++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 197 insertions(+), 28 deletions(-)
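
Before the diff, a hedged sketch of what the new flag enables at platform
level (all example_* names are invented; only cma_reserve() and
cma_create() as modified by this patch are assumed):

#include <linux/cma.h>
#include <linux/err.h>
#include <linux/init.h>

/* Reserve 32 MiB anywhere in RAM and mark it MIGRATE_CMA via the new
 * init_migratetype flag, so that a non-private context can later be
 * created on it without relying on ZONE_MOVABLE. */
static unsigned long example_base __initdata;

static void __init example_reserve(void)
{
	example_base = cma_reserve(0, 32 << 20, 0, true);
	if (IS_ERR_VALUE(example_base))
		example_base = 0;
}

/* The MIGRATE_CMA conversion is deferred to a subsys initcall, so the
 * context is created afterwards, once SLAB is available as well. */
static int __init example_create_ctx(void)
{
	struct cma *ctx;

	if (!example_base)
		return -ENOMEM;

	ctx = cma_create(example_base, 32 << 20, 0, false);
	return IS_ERR(ctx) ? PTR_ERR(ctx) : 0;
}
subsys_initcall_sync(example_create_ctx);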

diff --git a/include/linux/cma.h b/include/linux/cma.h
index e9575fd..8952531 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -71,9 +71,14 @@
* a platform/machine specific function. For the former CMA
* provides the following functions:
*
+ * cma_init_migratetype()
* cma_reserve()
* cma_create()
*
+ * The first one initialises a portion of reserved memory so that it
+ * can be used with CMA. The second first tries to reserve memory
+ * (using memblock) and then initialise it.
+ *
* The cma_reserve() function must be called when memblock is still
* operational and reserving memory with it is still possible. On
* ARM platform the "reserve" machine callback is a perfect place to
@@ -93,21 +98,56 @@ struct cma;
/* Contiguous Memory chunk */
struct cm;

+#ifdef CONFIG_MIGRATE_CMA
+
+/**
+ * cma_init_migratetype() - initialises range of physical memory to be used
+ * with CMA context.
+ * @start: start address of the memory range in bytes.
+ * @size: size of the memory range in bytes.
+ *
+ * The range must be MAX_ORDER_NR_PAGES aligned and it must have been
+ * already reserved (eg. with memblock).
+ *
+ * The actual initialisation is deferred until subsys initcalls are
+ * evaluated (unless this has already happened).
+ *
+ * Returns zero on success or negative error.
+ */
+int cma_init_migratetype(unsigned long start, unsigned long end);
+
+#else
+
+static inline int cma_init_migratetype(unsigned long start, unsigned long end)
+{
+ (void)start; (void)end;
+ return -EOPNOTSUPP;
+}
+
+#endif
+
/**
* cma_reserve() - reserves memory.
* @start: start address of the memory range in bytes hint; if unsure
* pass zero.
* @size: size of the memory to reserve in bytes.
* @alignment: desired alignment in bytes (must be power of two or zero).
+ * @init_migratetype: whether to initialise pageblocks.
+ *
+ * It will use memblock to allocate memory. If @init_migratetype is
+ * true, the function will also call cma_init_migratetype() on
+ * reserved region so that a non-private CMA context can be created on
+ * given range.
*
- * It will use memblock to allocate memory. @start and @size will be
- * aligned to PAGE_SIZE.
+ * @start and @size will be aligned to PAGE_SIZE if @init_migratetype
+ * is false or to (MAX_ORDER_NR_PAGES << PAGE_SHIFT) if
+ * @init_migratetype is true.
*
* Returns reserved's area physical address or value that yields true
* when checked with IS_ERR_VALUE().
*/
unsigned long cma_reserve(unsigned long start, unsigned long size,
- unsigned long alignment);
+ unsigned long alignment, _Bool init_migratetype);

/**
* cma_create() - creates a CMA context.
@@ -118,12 +158,14 @@ unsigned long cma_reserve(unsigned long start, unsigned long size,
*
* The range must be page aligned. Different contexts cannot overlap.
*
- * Unless @private is true the memory range must lay in ZONE_MOVABLE.
- * If @private is true no underlaying memory checking is done and
- * during allocation no pages migration will be performed - it is
- * assumed that the memory is reserved and only CMA manages it.
+ * Unless @private is true the memory range must either lay in
+ * ZONE_MOVABLE or must have been initialised with
+ * cma_init_migratetype() function. If @private is true no
+ * underlaying memory checking is done and during allocation no pages
+ * migration will be performed - it is assumed that the memory is
+ * reserved and only CMA manages it.
*
- * @start and @size must be page and @min_alignment alignment.
+ * @start and @size must be page and @min_alignment aligned.
* @min_alignment specifies the minimal alignment that user will be
* able to request through cm_alloc() function. In most cases one
* will probably pass zero as @min_alignment but if the CMA context
diff --git a/mm/cma.c b/mm/cma.c
index f212920..ded91cab 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -57,21 +57,132 @@ static unsigned long phys_to_pfn(phys_addr_t phys)

/************************* Initialise CMA *************************/

+#ifdef CONFIG_MIGRATE_CMA
+
+static struct cma_grabbed {
+ unsigned long start;
+ unsigned long size;
+} cma_grabbed[8] __initdata;
+static unsigned cma_grabbed_count __initdata;
+
+#ifdef CONFIG_DEBUG_VM
+
+static int __cma_give_back(unsigned long start, unsigned long size)
+{
+ unsigned long pfn = phys_to_pfn(start);
+ unsigned i = size >> PAGE_SHIFT;
+ struct zone *zone;
+
+ pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+ VM_BUG_ON(!pfn_valid(pfn));
+ zone = page_zone(pfn_to_page(pfn));
+
+ do {
+ VM_BUG_ON(!pfn_valid(pfn));
+ VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
+ if (!(pfn & (pageblock_nr_pages - 1)))
+ __free_pageblock_cma(pfn_to_page(pfn));
+ ++pfn;
+ ++totalram_pages;
+ } while (--i);
+
+ return 0;
+}
+
+#else
+
+static int __cma_give_back(unsigned long start, unsigned long size)
+{
+ unsigned i = size >> (PAGE_SHIFT + pageblock_order);
+ struct page *p = phys_to_page(start);
+
+ pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+ do {
+ __free_pageblock_cma(p);
+ p += pageblock_nr_pages;
+ totalram_pages += pageblock_nr_pages;
+ } while (--i);
+
+ return 0;
+}
+
+#endif
+
+static int __init __cma_queue_give_back(unsigned long start, unsigned long size)
+{
+ if (cma_grabbed_count == ARRAY_SIZE(cma_grabbed))
+ return -ENOSPC;
+
+ cma_grabbed[cma_grabbed_count].start = start;
+ cma_grabbed[cma_grabbed_count].size = size;
+ ++cma_grabbed_count;
+ return 0;
+}
+
+static int (*cma_give_back)(unsigned long start, unsigned long size) =
+ __cma_queue_give_back;
+
+static int __init cma_give_back_queued(void)
+{
+ struct cma_grabbed *r = cma_grabbed;
+ unsigned i = cma_grabbed_count;
+
+ pr_debug("%s(): will give %u range(s)\n", __func__, i);
+
+ cma_give_back = __cma_give_back;
+
+ for (; i; --i, ++r)
+ __cma_give_back(r->start, r->size);
+
+ return 0;
+}
+subsys_initcall(cma_give_back_queued);
+
+int __ref cma_init_migratetype(unsigned long start, unsigned long size)
+{
+ pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+ if (!size)
+ return -EINVAL;
+ if ((start | size) & ((MAX_ORDER_NR_PAGES << PAGE_SHIFT) - 1))
+ return -EINVAL;
+ if (start + size < start)
+ return -EOVERFLOW;
+
+ return cma_give_back(start, size);
+}
+
+#endif
+
unsigned long cma_reserve(unsigned long start, unsigned long size,
- unsigned long alignment)
+ unsigned long alignment, bool init_migratetype)
{
pr_debug("%s(%p+%p/%p)\n", __func__, (void *)start, (void *)size,
(void *)alignment);

+#ifndef CONFIG_MIGRATE_CMA
+ if (init_migratetype)
+ return -EOPNOTSUPP;
+#endif
+
/* Sanity checks */
if (!size || (alignment & (alignment - 1)))
return (unsigned long)-EINVAL;

/* Sanitise input arguments */
- start = PAGE_ALIGN(start);
- size = PAGE_ALIGN(size);
- if (alignment < PAGE_SIZE)
- alignment = PAGE_SIZE;
+ if (init_migratetype) {
+ start = ALIGN(start, MAX_ORDER_NR_PAGES << PAGE_SHIFT);
+ size = ALIGN(size , MAX_ORDER_NR_PAGES << PAGE_SHIFT);
+ if (alignment < (MAX_ORDER_NR_PAGES << PAGE_SHIFT))
+ alignment = MAX_ORDER_NR_PAGES << PAGE_SHIFT;
+ } else {
+ start = PAGE_ALIGN(start);
+ size = PAGE_ALIGN(size);
+ if (alignment < PAGE_SIZE)
+ alignment = PAGE_SIZE;
+ }

/* Reserve memory */
if (start) {
@@ -94,6 +205,15 @@ unsigned long cma_reserve(unsigned long start, unsigned long size,
}
}

+ /* CMA Initialise */
+ if (init_migratetype) {
+ int ret = cma_init_migratetype(start, size);
+ if (ret < 0) {
+ memblock_free(start, size);
+ return ret;
+ }
+ }
+
return start;
}

@@ -101,12 +221,13 @@ unsigned long cma_reserve(unsigned long start, unsigned long size,
/************************** CMA context ***************************/

struct cma {
- bool migrate;
+ int migratetype;
struct gen_pool *pool;
};

static int __cma_check_range(unsigned long start, unsigned long size)
{
+ int migratetype = MIGRATE_MOVABLE;
unsigned long pfn, count;
struct page *page;
struct zone *zone;
@@ -115,8 +236,13 @@ static int __cma_check_range(unsigned long start, unsigned long size)
if (WARN_ON(!pfn_valid(start)))
return -EINVAL;

+#ifdef CONFIG_MIGRATE_CMA
+ if (page_zonenum(pfn_to_page(start)) != ZONE_MOVABLE)
+ migratetype = MIGRATE_CMA;
+#else
if (WARN_ON(page_zonenum(pfn_to_page(start)) != ZONE_MOVABLE))
return -EINVAL;
+#endif

/* First check if all pages are valid and in the same zone */
zone = page_zone(pfn_to_page(start));
@@ -134,20 +260,20 @@ static int __cma_check_range(unsigned long start, unsigned long size)
page = pfn_to_page(start);
count = (pfn - start) >> PAGE_SHIFT;
do {
- if (WARN_ON(get_pageblock_migratetype(page) != MIGRATE_MOVABLE))
+ if (WARN_ON(get_pageblock_migratetype(page) != migratetype))
return -EINVAL;
page += pageblock_nr_pages;
} while (--count);

- return 0;
+ return migratetype;
}

struct cma *cma_create(unsigned long start, unsigned long size,
unsigned long min_alignment, bool private)
{
struct gen_pool *pool;
+ int migratetype, ret;
struct cma *cma;
- int ret;

pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);

@@ -162,10 +288,12 @@ struct cma *cma_create(unsigned long start, unsigned long size,
if (start + size < start)
return ERR_PTR(-EOVERFLOW);

- if (!private) {
- ret = __cma_check_range(start, size);
- if (ret < 0)
- return ERR_PTR(ret);
+ if (private) {
+ migratetype = 0;
+ } else {
+ migratetype = __cma_check_range(start, size);
+ if (migratetype < 0)
+ return ERR_PTR(migratetype);
}

cma = kmalloc(sizeof *cma, GFP_KERNEL);
@@ -182,7 +310,7 @@ struct cma *cma_create(unsigned long start, unsigned long size,
if (unlikely(ret))
goto error2;

- cma->migrate = !private;
+ cma->migratetype = migratetype;
cma->pool = pool;

pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
@@ -238,13 +366,12 @@ struct cm *cm_alloc(struct cma *cma, unsigned long size,
if (!start)
goto error1;

- if (cma->migrate) {
+ if (cma->migratetype) {
unsigned long pfn = phys_to_pfn(start);
- ret = alloc_contig_range(pfn, pfn + (size >> PAGE_SHIFT), 0);
- if (ret) {
- pr_info("cma allocation failed\n");
+ ret = alloc_contig_range(pfn, pfn + (size >> PAGE_SHIFT),
+ 0, cma->migratetype);
+ if (ret)
goto error2;
- }
}

mutex_unlock(&cma_mutex);
@@ -277,7 +404,7 @@ void cm_free(struct cm *cm)
mutex_lock(&cma_mutex);

gen_pool_free(cm->cma->pool, cm->phys, cm->size);
- if (cm->cma->migrate)
+ if (cm->cma->migratetype)
free_contig_pages(phys_to_page(cm->phys),
cm->size >> PAGE_SHIFT);

--
1.7.1.569.g6f426

2011-03-31 13:16:33

by Marek Szyprowski

Subject: [PATCH 07/12] mm: MIGRATE_CMA migration type added

From: Michal Nazarewicz <[email protected]>

The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA
pageblocks and (ii) the page allocator will never change the
migration type of MIGRATE_CMA pageblocks.

This guarantees that a page in a MIGRATE_CMA pageblock can
always be migrated somewhere else (unless there's no memory left
in the system).

It is designed to be used with the Contiguous Memory Allocator
(CMA) for allocating big chunks (e.g. 10 MiB) of physically
contiguous memory. Once a driver requests contiguous memory,
CMA will migrate pages out of MIGRATE_CMA pageblocks.

To minimise the number of migrations, the MIGRATE_CMA migration
type is the last type tried when the page allocator falls back
to migration types other than the requested one.
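
Concretely, a range is turned into MIGRATE_CMA pageblocks by feeding it,
one pageblock at a time, to the __free_pageblock_cma() helper added below.
A trimmed sketch of such a conversion loop (it mirrors __cma_give_back()
from the follow-up CMA patch; example_mark_cma() is an invented name, and
the code would live in mm/ so that "internal.h" provides the declaration):

#include <linux/mm.h>
#include "internal.h"	/* __free_pageblock_cma() */

/* Hand a MAX_ORDER_NR_PAGES-aligned, already reserved range back to the
 * buddy allocator with every pageblock marked MIGRATE_CMA. */
static void __init example_mark_cma(unsigned long base_pfn,
				    unsigned long nr_pages)
{
	unsigned long pfn;

	for (pfn = base_pfn; pfn < base_pfn + nr_pages;
	     pfn += pageblock_nr_pages) {
		__free_pageblock_cma(pfn_to_page(pfn));
		totalram_pages += pageblock_nr_pages;
	}
}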

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
Signed-off-by: Marek Szyprowski <[email protected]>
CC: Michal Nazarewicz <[email protected]>
---
include/linux/mmzone.h | 43 ++++++++++++++++++----
mm/Kconfig | 28 +++++++++------
mm/compaction.c | 10 +++++
mm/internal.h | 3 ++
mm/page_alloc.c | 93 ++++++++++++++++++++++++++++++++++++++----------
5 files changed, 140 insertions(+), 37 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e56f835..db8495e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,37 @@
*/
#define PAGE_ALLOC_COSTLY_ORDER 3

-#define MIGRATE_UNMOVABLE 0
-#define MIGRATE_RECLAIMABLE 1
-#define MIGRATE_MOVABLE 2
-#define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE 3
-#define MIGRATE_ISOLATE 4 /* can't allocate from here */
-#define MIGRATE_TYPES 5
+enum {
+ MIGRATE_UNMOVABLE,
+ MIGRATE_RECLAIMABLE,
+ MIGRATE_MOVABLE,
+ MIGRATE_PCPTYPES, /* the number of types on the pcp lists */
+ MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+#ifdef CONFIG_MIGRATE_CMA
+ /*
+ * MIGRATE_CMA migration type is designed to mimic the way
+ * ZONE_MOVABLE works. Only movable pages can be allocated
+ * from MIGRATE_CMA pageblocks and page allocator never
+ * implicitly change migration type of MIGRATE_CMA pageblock.
+ *
+ * The way to use it is to change migratetype of a range of
+ * pageblocks to MIGRATE_CMA which can be done by
+ * __free_pageblock_cma() function. What is important though
+ * is that a range of pageblocks must be aligned to
+ * MAX_ORDER_NR_PAGES should biggest page be bigger then
+ * a single pageblock.
+ */
+ MIGRATE_CMA,
+#endif
+ MIGRATE_ISOLATE, /* can't allocate from here */
+ MIGRATE_TYPES
+};
+
+#ifdef CONFIG_MIGRATE_CMA
+# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+# define is_migrate_cma(migratetype) false
+#endif

#define for_each_migratetype_order(order, type) \
for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +78,11 @@ static inline int get_pageblock_migratetype(struct page *page)
return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
}

+static inline bool is_pageblock_cma(struct page *page)
+{
+ return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
struct free_area {
struct list_head free_list[MIGRATE_TYPES];
unsigned long nr_free;
diff --git a/mm/Kconfig b/mm/Kconfig
index ac40779..844455e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -341,23 +341,29 @@ choice
endchoice

config CMA
- bool "Contiguous Memory Allocator framework"
- # Currently there is only one allocator so force it on
+ bool "Contiguous Memory Allocator"
select MIGRATION
select GENERIC_ALLOCATOR
help
- This enables the Contiguous Memory Allocator framework which
- allows drivers to allocate big physically-contiguous blocks of
- memory for use with hardware components that do not support I/O
- map nor scatter-gather.
+ This enables the Contiguous Memory Allocator which allows drivers
+ to allocate big physically-contiguous blocks of memory for use with
+ hardware components that do not support I/O map nor scatter-gather.

- If you select this option you will also have to select at least
- one allocator algorithm below.
+ For more information see <include/linux/cma.h>. If unsure, say "n".
+
+config MIGRATE_CMA
+ bool "Use MIGRATE_CMA migratetype"
+ depends on CMA
+ default y
+ help
+ This enables the use the MIGRATE_CMA migrate type in the CMA.
+ MIGRATE_CMA lets CMA work on almost arbitrary memory range and
+ not only inside ZONE_MOVABLE.

- To make use of CMA you need to specify the regions and
- driver->region mapping on command line when booting the kernel.
+ This option can also be selected by code that uses MIGRATE_CMA
+ even if CMA is not present.

- For more information see <include/linux/cma.h>. If unsure, say "n".
+ If unsure, say "y".

config CMA_DEBUG
bool "CMA debug messages (DEVELOPEMENT)"
diff --git a/mm/compaction.c b/mm/compaction.c
index 021a296..6d013c3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -119,6 +119,16 @@ static bool suitable_migration_target(struct page *page)
if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
return false;

+ /* Keep MIGRATE_CMA alone as well. */
+ /*
+ * XXX Revisit. We currently cannot let compaction touch CMA
+ * pages since compaction insists on changing their migration
+ * type to MIGRATE_MOVABLE (see split_free_page() called from
+ * isolate_freepages_block() above).
+ */
+ if (is_migrate_cma(migratetype))
+ return false;
+
/* If the page is a large free page, then allow migration */
if (PageBuddy(page) && page_order(page) >= pageblock_order)
return true;
diff --git a/mm/internal.h b/mm/internal.h
index 3438dd4..0df30da 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -49,6 +49,9 @@ extern void putback_lru_page(struct page *page);
* in mm/page_alloc.c
*/
extern void __free_pages_bootmem(struct page *page, unsigned int order);
+#ifdef CONFIG_MIGRATE_CMA
+extern void __free_pageblock_cma(struct page *page);
+#endif
extern void prep_compound_page(struct page *page, unsigned long order);
#ifdef CONFIG_MEMORY_FAILURE
extern bool is_free_buddy_page(struct page *page);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index be21ac9..24f795e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -716,6 +716,30 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
}
}

+#ifdef CONFIG_MIGRATE_CMA
+
+/*
+ * Free whole pageblock and set it's migration type to MIGRATE_CMA.
+ */
+void __init __free_pageblock_cma(struct page *page)
+{
+ struct page *p = page;
+ unsigned i = pageblock_nr_pages;
+
+ prefetchw(p);
+ do {
+ if (--i)
+ prefetchw(p + 1);
+ __ClearPageReserved(p);
+ set_page_count(p, 0);
+ } while (++p, i);
+
+ set_page_refcounted(page);
+ set_pageblock_migratetype(page, MIGRATE_CMA);
+ __free_pages(page, pageblock_order);
+}
+
+#endif

/*
* The order of subdivision here is critical for the IO subsystem.
@@ -824,11 +848,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
* This array describes the order lists are fallen back to when
* the free lists for the desirable migrate type are depleted
*/
-static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
+static int fallbacks[MIGRATE_TYPES][4] = {
[MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
+#ifdef CONFIG_MIGRATE_CMA
+ [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA , MIGRATE_RESERVE },
+#else
[MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
- [MIGRATE_RESERVE] = { MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE }, /* Never used */
+#endif
+ [MIGRATE_RESERVE] = { MIGRATE_RESERVE }, /* Never used */
};

/*
@@ -923,12 +951,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
/* Find the largest possible block of pages in the other list */
for (current_order = MAX_ORDER-1; current_order >= order;
--current_order) {
- for (i = 0; i < MIGRATE_TYPES - 1; i++) {
+ for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
migratetype = fallbacks[start_migratetype][i];

/* MIGRATE_RESERVE handled later if necessary */
if (migratetype == MIGRATE_RESERVE)
- continue;
+ break;

area = &(zone->free_area[current_order]);
if (list_empty(&area->free_list[migratetype]))
@@ -943,19 +971,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
* pages to the preferred allocation list. If falling
* back for a reclaimable kernel allocation, be more
* agressive about taking ownership of free pages
+ *
+ * On the other hand, never change migration
+ * type of MIGRATE_CMA pageblocks nor move CMA
+ * pages on different free lists. We don't
+ * want unmovable pages to be allocated from
+ * MIGRATE_CMA areas.
*/
- if (unlikely(current_order >= (pageblock_order >> 1)) ||
- start_migratetype == MIGRATE_RECLAIMABLE ||
- page_group_by_mobility_disabled) {
- unsigned long pages;
+ if (!is_pageblock_cma(page) &&
+ (unlikely(current_order >= pageblock_order / 2) ||
+ start_migratetype == MIGRATE_RECLAIMABLE ||
+ page_group_by_mobility_disabled)) {
+ int pages;
pages = move_freepages_block(zone, page,
- start_migratetype);
+ start_migratetype);

- /* Claim the whole block if over half of it is free */
+ /*
+ * Claim the whole block if over half
+ * of it is free
+ */
if (pages >= (1 << (pageblock_order-1)) ||
- page_group_by_mobility_disabled)
+ page_group_by_mobility_disabled)
set_pageblock_migratetype(page,
- start_migratetype);
+ start_migratetype);

migratetype = start_migratetype;
}
@@ -965,11 +1003,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
rmv_page_order(page);

/* Take ownership for orders >= pageblock_order */
- if (current_order >= pageblock_order)
+ if (current_order >= pageblock_order &&
+ !is_pageblock_cma(page))
change_pageblock_range(page, current_order,
start_migratetype);

- expand(zone, page, order, current_order, area, migratetype);
+ expand(zone, page, order, current_order, area,
+ is_migrate_cma(start_migratetype)
+ ? start_migratetype : migratetype);

trace_mm_page_alloc_extfrag(page, order, current_order,
start_migratetype, migratetype);
@@ -1041,7 +1082,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
list_add(&page->lru, list);
else
list_add_tail(&page->lru, list);
- set_page_private(page, migratetype);
+#ifdef CONFIG_MIGRATE_CMA
+ if (is_pageblock_cma(page))
+ set_page_private(page, MIGRATE_CMA);
+ else
+#endif
+ set_page_private(page, migratetype);
list = &page->lru;
}
__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1182,9 +1228,16 @@ void free_hot_cold_page(struct page *page, int cold)
* offlined but treat RESERVE as movable pages so we can get those
* areas back if necessary. Otherwise, we may have to free
* excessively into the page allocator
+ *
+ * Still, do not change migration type of MIGRATE_CMA pages (if
+ * they'd be recorded as MIGRATE_MOVABLE an unmovable page could
+ * be allocated from MIGRATE_CMA block and we don't want to allow
+ * that). In this respect, treat MIGRATE_CMA like
+ * MIGRATE_ISOLATE.
*/
if (migratetype >= MIGRATE_PCPTYPES) {
- if (unlikely(migratetype == MIGRATE_ISOLATE)) {
+ if (unlikely(migratetype == MIGRATE_ISOLATE
+ || is_migrate_cma(migratetype))) {
free_one_page(zone, page, 0, migratetype);
goto out;
}
@@ -1273,7 +1326,9 @@ int split_free_page(struct page *page)
if (order >= pageblock_order - 1) {
struct page *endpage = page + (1 << order) - 1;
for (; page < endpage; page += pageblock_nr_pages)
- set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+ if (!is_pageblock_cma(page))
+ set_pageblock_migratetype(page,
+ MIGRATE_MOVABLE);
}

return 1 << order;
@@ -5429,8 +5484,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
*/
if (zone_idx(zone) == ZONE_MOVABLE)
return true;
-
- if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
+ if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
+ is_pageblock_cma(page))
return true;

pfn = page_to_pfn(page);
--
1.7.1.569.g6f426

2011-03-31 13:16:18

by Marek Szyprowski

Subject: [PATCH 06/12] mm: cma: Contiguous Memory Allocator added

From: Michal Nazarewicz <[email protected]>

The Contiguous Memory Allocator is a set of functions that lets
one initialise a region of memory which can then be used to
allocate contiguous memory chunks from.

CMA allows for the creation of private and non-private contexts.
The former is reserved for CMA and no other kernel subsystem can
use it. The latter allows movable pages to be allocated within
CMA-managed memory so that it can be used for page cache when
CMA devices are not using it.
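
A hypothetical driver fragment built only from the kernel-doc added in
include/linux/cma.h below (the function name and the 1 MiB size are
invented): allocate a chunk, pin it for DMA, map it for the CPU, and tear
everything down in reverse order:

#include <linux/cma.h>
#include <linux/err.h>

static int example_use_chunk(struct cma *ctx)
{
	struct cm *chunk;
	unsigned long phys;
	void *virt;

	chunk = cm_alloc(ctx, 1 << 20, 0);
	if (IS_ERR(chunk))
		return PTR_ERR(chunk);

	phys = cm_pin(chunk);			/* stable physical address */
	if (IS_ERR_VALUE(phys)) {
		cm_free(chunk);
		return (int)phys;
	}

	virt = cm_vmap(chunk);			/* CPU-visible mapping */
	if (IS_ERR(virt)) {
		cm_unpin(chunk);
		cm_free(chunk);
		return PTR_ERR(virt);
	}

	/* ... program the device with 'phys', fill the buffer via 'virt' ... */

	cm_vunmap(chunk);
	cm_unpin(chunk);
	cm_free(chunk);
	return 0;
}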

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
Signed-off-by: Marek Szyprowski <[email protected]>
CC: Michal Nazarewicz <[email protected]>
---
include/linux/cma.h | 219 ++++++++++++++++++++++++++++++++++
mm/Kconfig | 28 +++++
mm/Makefile | 1 +
mm/cma.c | 330 +++++++++++++++++++++++++++++++++++++++++++++++++++
mm/page_alloc.c | 2 +-
5 files changed, 579 insertions(+), 1 deletions(-)
create mode 100644 include/linux/cma.h
create mode 100644 mm/cma.c

diff --git a/include/linux/cma.h b/include/linux/cma.h
new file mode 100644
index 0000000..e9575fd
--- /dev/null
+++ b/include/linux/cma.h
@@ -0,0 +1,219 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ * The Contiguous Memory Allocator (CMA) makes it possible for
+ * device drivers to allocate big contiguous chunks of memory after
+ * the system has booted.
+ *
+ * It requires some machine- and/or platform-specific initialisation
+ * code which prepares memory ranges to be used with CMA and later,
+ * device drivers can allocate memory from those ranges.
+ *
+ * Why is it needed?
+ *
+ * Various devices on embedded systems have no scatter-getter and/or
+ * IO map support and require contiguous blocks of memory to
+ * operate. They include devices such as cameras, hardware video
+ * coders, etc.
+ *
+ * Such devices often require big memory buffers (a full HD frame
+ * is, for instance, more then 2 mega pixels large, i.e. more than 6
+ * MB of memory), which makes mechanisms such as kmalloc() or
+ * alloc_page() ineffective.
+ *
+ * At the same time, a solution where a big memory region is
+ * reserved for a device is suboptimal since often more memory is
+ * reserved then strictly required and, moreover, the memory is
+ * inaccessible to page system even if device drivers don't use it.
+ *
+ * CMA tries to solve this issue by operating on memory regions
+ * where only movable pages can be allocated from. This way, kernel
+ * can use the memory for pagecache and when device driver requests
+ * it, allocated pages can be migrated.
+ *
+ * Driver usage
+ *
+ * For device driver to use CMA it needs to have a pointer to a CMA
+ * context represented by a struct cma (which is an opaque data
+ * type).
+ *
+ * Once such pointer is obtained, device driver may allocate
+ * contiguous memory chunk using the following function:
+ *
+ * cm_alloc()
+ *
+ * This function returns a pointer to struct cm (another opaque data
+ * type) which represent a contiguous memory chunk. This pointer
+ * may be used with the following functions:
+ *
+ * cm_free() -- frees allocated contiguous memory
+ * cm_pin() -- pins memory
+ * cm_unpin() -- unpins memory
+ * cm_vmap() -- maps memory in kernel space
+ * cm_vunmap() -- unmaps memory from kernel space
+ *
+ * See the respective functions for more information.
+ *
+ * Platform/machine integration
+ *
+ * For device drivers to be able to use CMA platform or machine
+ * initialisation code must create a CMA context and pass it to
+ * device drivers. The latter may be done by a global variable or
+ * a platform/machine specific function. For the former CMA
+ * provides the following functions:
+ *
+ * cma_reserve()
+ * cma_create()
+ *
+ * The cma_reserve() function must be called when memblock is still
+ * operational and reserving memory with it is still possible. On
+ * ARM platform the "reserve" machine callback is a perfect place to
+ * call it.
+ *
+ * The last function creates a CMA context on a range of previously
+ * initialised memory addresses. Because it uses kmalloc() it needs
+ * to be called after SLAB is initialised.
+ */
+
+/***************************** Kernel level API *****************************/
+
+#if defined __KERNEL__ && defined CONFIG_CMA
+
+/* CMA context */
+struct cma;
+/* Contiguous Memory chunk */
+struct cm;
+
+/**
+ * cma_reserve() - reserves memory.
+ * @start: start address of the memory range in bytes hint; if unsure
+ * pass zero.
+ * @size: size of the memory to reserve in bytes.
+ * @alignment: desired alignment in bytes (must be power of two or zero).
+ *
+ * It will use memblock to allocate memory. @start and @size will be
+ * aligned to PAGE_SIZE.
+ *
+ * Returns reserved's area physical address or value that yields true
+ * when checked with IS_ERR_VALUE().
+ */
+unsigned long cma_reserve(unsigned long start, unsigned long size,
+ unsigned long alignment);
+
+/**
+ * cma_create() - creates a CMA context.
+ * @start: start address of the context in bytes.
+ * @size: size of the context in bytes.
+ * @min_alignment: minimal desired alignment or zero.
+ * @private: whether to create private context.
+ *
+ * The range must be page aligned. Different contexts cannot overlap.
+ *
+ * Unless @private is true the memory range must lay in ZONE_MOVABLE.
+ * If @private is true no underlaying memory checking is done and
+ * during allocation no pages migration will be performed - it is
+ * assumed that the memory is reserved and only CMA manages it.
+ *
+ * @start and @size must be page and @min_alignment alignment.
+ * @min_alignment specifies the minimal alignment that user will be
+ * able to request through cm_alloc() function. In most cases one
+ * will probably pass zero as @min_alignment but if the CMA context
+ * will be used only for, say, 1 MiB blocks passing 1 << 20 as
+ * @min_alignment may increase performance and reduce memory usage
+ * slightly.
+ *
+ * Because this function uses kmalloc() it must be called after SLAB
+ * is initialised. This in particular means that it cannot be called
+ * just after cma_reserve() since the former needs to be run way
+ * earlier.
+ *
+ * Returns pointer to CMA context or a pointer-error on error.
+ */
+struct cma *cma_create(unsigned long start, unsigned long size,
+ unsigned long min_alignment, _Bool private);
+
+/**
+ * cma_destroy() - destroys CMA context.
+ * @cma: context to destroy.
+ */
+void cma_destroy(struct cma *cma);
+
+/**
+ * cm_alloc() - allocates contiguous memory.
+ * @cma: CMA context to use.
+ * @size: desired chunk size in bytes (must be non-zero).
+ * @alignent: desired minimal alignment in bytes (must be power of two
+ * or zero).
+ *
+ * Returns pointer to structure representing contiguous memory or
+ * a pointer-error on error.
+ */
+struct cm *cm_alloc(struct cma *cma, unsigned long size,
+ unsigned long alignment);
+
+/**
+ * cm_free() - frees contiguous memory.
+ * @cm: contiguous memory to free.
+ *
+ * The contiguous memory must be not be pinned (see cma_pin()) and
+ * must not be mapped to kernel space (cma_vmap()).
+ */
+void cm_free(struct cm *cm);
+
+/**
+ * cm_pin() - pins contiguous memory.
+ * @cm: contiguous memory to pin.
+ *
+ * Pinning is required to obtain contiguous memory's physical address.
+ * While memory is pinned the memory will remain valid it may change
+ * if memory is unpinned and then pinned again. This facility is
+ * provided so that memory defragmentation can be implemented inside
+ * CMA.
+ *
+ * Each call to cm_pin() must be accompanied by call to cm_unpin() and
+ * the calls may be nested.
+ *
+ * Returns chunk's physical address or a value that yields true when
+ * tested with IS_ERR_VALUE().
+ */
+unsigned long cm_pin(struct cm *cm);
+
+/**
+ * cm_unpin() - unpins contiguous memory.
+ * @cm: contiguous memory to unpin.
+ *
+ * See cm_pin().
+ */
+void cm_unpin(struct cm *cm);
+
+/**
+ * cm_vmap() - maps memory to kernel space (or returns existing mapping).
+ * @cm: contiguous memory to map.
+ *
+ * Each call to cm_vmap() must be accompanied with call to cm_vunmap()
+ * and the calls may be nested.
+ *
+ * Returns kernel virtual address or a pointer-error.
+ */
+void *cm_vmap(struct cm *cm);
+
+/**
+ * cm_vunmap() - unmpas memory from kernel space.
+ * @cm: contiguous memory to unmap.
+ *
+ * See cm_vmap().
+ */
+void cm_vunmap(struct cm *cm);
+
+#endif
+
+#endif
diff --git a/mm/Kconfig b/mm/Kconfig
index e9c0c61..ac40779 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -340,6 +340,34 @@ choice
benefit.
endchoice

+config CMA
+ bool "Contiguous Memory Allocator framework"
+ # Currently there is only one allocator so force it on
+ select MIGRATION
+ select GENERIC_ALLOCATOR
+ help
+ This enables the Contiguous Memory Allocator framework which
+ allows drivers to allocate big physically-contiguous blocks of
+ memory for use with hardware components that do not support I/O
+ map nor scatter-gather.
+
+ If you select this option you will also have to select at least
+ one allocator algorithm below.
+
+ To make use of CMA you need to specify the regions and
+ driver->region mapping on command line when booting the kernel.
+
+ For more information see <include/linux/cma.h>. If unsure, say "n".
+
+config CMA_DEBUG
+ bool "CMA debug messages (DEVELOPEMENT)"
+ depends on CMA
+ help
+ Turns on debug messages in CMA. This produces KERN_DEBUG
+ messages for every CMA call as well as various messages while
+ processing calls such as cma_alloc(). This option does not
+ affect warning and error messages.
+
#
# UP and nommu archs use km based percpu allocator
#
diff --git a/mm/Makefile b/mm/Makefile
index 42a8326..01c3b20 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -49,3 +49,4 @@ obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
+obj-$(CONFIG_CMA) += cma.o
diff --git a/mm/cma.c b/mm/cma.c
new file mode 100644
index 0000000..f212920
--- /dev/null
+++ b/mm/cma.c
@@ -0,0 +1,330 @@
+/*
+ * Contiguous Memory Allocator framework
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+/*
+ * See include/linux/cma.h for details.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+# define DEBUG
+#endif
+
+#include <linux/cma.h>
+
+#ifndef CONFIG_NO_BOOTMEM
+# include <linux/bootmem.h>
+#endif
+#ifdef CONFIG_HAVE_MEMBLOCK
+# include <linux/memblock.h>
+#endif
+
+#include <linux/err.h>
+#include <linux/genalloc.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+
+#include "internal.h"
+
+/* XXX Revisit */
+#ifdef phys_to_pfn
+/* nothing to do */
+#elif defined __phys_to_pfn
+# define phys_to_pfn __phys_to_pfn
+#else
+# warning correct phys_to_pfn implementation needed
+static unsigned long phys_to_pfn(phys_addr_t phys)
+{
+ return virt_to_pfn(phys_to_virt(phys));
+}
+#endif
+
+
+/************************* Initialise CMA *************************/
+
+unsigned long cma_reserve(unsigned long start, unsigned long size,
+ unsigned long alignment)
+{
+ pr_debug("%s(%p+%p/%p)\n", __func__, (void *)start, (void *)size,
+ (void *)alignment);
+
+ /* Sanity checks */
+ if (!size || (alignment & (alignment - 1)))
+ return (unsigned long)-EINVAL;
+
+ /* Sanitise input arguments */
+ start = PAGE_ALIGN(start);
+ size = PAGE_ALIGN(size);
+ if (alignment < PAGE_SIZE)
+ alignment = PAGE_SIZE;
+
+ /* Reserve memory */
+ if (start) {
+ if (memblock_is_region_reserved(start, size) ||
+ memblock_reserve(start, size) < 0)
+ return (unsigned long)-EBUSY;
+ } else {
+ /*
+ * Use __memblock_alloc_base() since
+ * memblock_alloc_base() panic()s.
+ */
+ u64 addr = __memblock_alloc_base(size, alignment, 0);
+ if (!addr) {
+ return (unsigned long)-ENOMEM;
+ } else if (addr + size > ~(unsigned long)0) {
+ memblock_free(addr, size);
+ return (unsigned long)-EOVERFLOW;
+ } else {
+ start = addr;
+ }
+ }
+
+ return start;
+}
+
+
+/************************** CMA context ***************************/
+
+struct cma {
+ bool migrate;
+ struct gen_pool *pool;
+};
+
+static int __cma_check_range(unsigned long start, unsigned long size)
+{
+ unsigned long pfn, count;
+ struct page *page;
+ struct zone *zone;
+
+ start = phys_to_pfn(start);
+ if (WARN_ON(!pfn_valid(start)))
+ return -EINVAL;
+
+ if (WARN_ON(page_zonenum(pfn_to_page(start)) != ZONE_MOVABLE))
+ return -EINVAL;
+
+ /* First check if all pages are valid and in the same zone */
+ zone = page_zone(pfn_to_page(start));
+ count = size >> PAGE_SHIFT;
+ pfn = start;
+ while (++pfn, --count) {
+ if (WARN_ON(!pfn_valid(pfn)) ||
+ WARN_ON(page_zone(pfn_to_page(pfn)) != zone))
+ return -EINVAL;
+ }
+
+ /* Now check migratetype of their pageblocks. */
+ start = start & ~(pageblock_nr_pages - 1);
+ pfn = ALIGN(pfn, pageblock_nr_pages);
+ page = pfn_to_page(start);
+ count = (pfn - start) >> pageblock_order;
+ do {
+ if (WARN_ON(get_pageblock_migratetype(page) != MIGRATE_MOVABLE))
+ return -EINVAL;
+ page += pageblock_nr_pages;
+ } while (--count);
+
+ return 0;
+}
+
+struct cma *cma_create(unsigned long start, unsigned long size,
+ unsigned long min_alignment, bool private)
+{
+ struct gen_pool *pool;
+ struct cma *cma;
+ int ret;
+
+ pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+ if (!size)
+ return ERR_PTR(-EINVAL);
+ if (min_alignment & (min_alignment - 1))
+ return ERR_PTR(-EINVAL);
+ if (min_alignment < PAGE_SIZE)
+ min_alignment = PAGE_SIZE;
+ if ((start | size) & (min_alignment - 1))
+ return ERR_PTR(-EINVAL);
+ if (start + size < start)
+ return ERR_PTR(-EOVERFLOW);
+
+ if (!private) {
+ ret = __cma_check_range(start, size);
+ if (ret < 0)
+ return ERR_PTR(ret);
+ }
+
+ cma = kmalloc(sizeof *cma, GFP_KERNEL);
+ if (!cma)
+ return ERR_PTR(-ENOMEM);
+
+ pool = gen_pool_create(ffs(min_alignment) - 1, -1);
+ if (!pool) {
+ ret = -ENOMEM;
+ goto error1;
+ }
+
+ ret = gen_pool_add(pool, start, size, -1);
+ if (unlikely(ret))
+ goto error2;
+
+ cma->migrate = !private;
+ cma->pool = pool;
+
+ pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
+ return cma;
+
+error2:
+ gen_pool_destroy(pool);
+error1:
+ kfree(cma);
+ return ERR_PTR(ret);
+}
+
+void cma_destroy(struct cma *cma)
+{
+ pr_debug("%s(<%p>)\n", __func__, (void *)cma);
+ gen_pool_destroy(cma->pool);
+}
+
+
+/************************* Allocate and free *************************/
+
+struct cm {
+ struct cma *cma;
+ unsigned long phys, size;
+ atomic_t pinned, mapped;
+};
+
+/* Protects cm_alloc() and cm_free() as well as the gen_pool of each cma. */
+static DEFINE_MUTEX(cma_mutex);
+
+struct cm *cm_alloc(struct cma *cma, unsigned long size,
+ unsigned long alignment)
+{
+ unsigned long start;
+ int ret = -ENOMEM;
+ struct cm *cm;
+
+ pr_debug("%s(<%p>, %p/%p)\n", __func__, (void *)cma,
+ (void *)size, (void *)alignment);
+
+ if (!size || (alignment & (alignment - 1)))
+ return ERR_PTR(-EINVAL);
+ size = PAGE_ALIGN(size);
+
+ cm = kmalloc(sizeof *cm, GFP_KERNEL);
+ if (!cm)
+ return ERR_PTR(-ENOMEM);
+
+ mutex_lock(&cma_mutex);
+
+ start = gen_pool_alloc_aligned(cma->pool, size,
+ alignment ? ffs(alignment) - 1 : 0);
+ if (!start)
+ goto error1;
+
+ if (cma->migrate) {
+ unsigned long pfn = phys_to_pfn(start);
+ ret = alloc_contig_range(pfn, pfn + (size >> PAGE_SHIFT), 0);
+ if (ret) {
+ pr_info("cma allocation failed\n");
+ goto error2;
+ }
+ }
+
+ mutex_unlock(&cma_mutex);
+
+ cm->cma = cma;
+ cm->phys = start;
+ cm->size = size;
+ atomic_set(&cm->pinned, 0);
+ atomic_set(&cm->mapped, 0);
+
+ pr_debug("%s(): returning [%p]\n", __func__, (void *)cm);
+ return cm;
+
+error2:
+ gen_pool_free(cma->pool, start, size);
+error1:
+ mutex_unlock(&cma_mutex);
+ kfree(cm);
+ return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(cm_alloc);
+
+void cm_free(struct cm *cm)
+{
+ pr_debug("%s([%p])\n", __func__, (void *)cm);
+
+ if (WARN_ON(atomic_read(&cm->pinned) || atomic_read(&cm->mapped)))
+ return;
+
+ mutex_lock(&cma_mutex);
+
+ gen_pool_free(cm->cma->pool, cm->phys, cm->size);
+ if (cm->cma->migrate)
+ free_contig_pages(phys_to_page(cm->phys),
+ cm->size >> PAGE_SHIFT);
+
+ mutex_unlock(&cma_mutex);
+
+ kfree(cm);
+}
+EXPORT_SYMBOL_GPL(cm_free);
+
+
+/************************* Mapping and addresses *************************/
+
+/*
+ * Currently these are no-ops, but reference counters are kept for
+ * error checking.
+ */
+
+unsigned long cm_pin(struct cm *cm)
+{
+ pr_debug("%s([%p])\n", __func__, (void *)cm);
+ atomic_inc(&cm->pinned);
+ return cm->phys;
+}
+EXPORT_SYMBOL_GPL(cm_pin);
+
+void cm_unpin(struct cm *cm)
+{
+ pr_debug("%s([%p])\n", __func__, (void *)cm);
+ WARN_ON(!atomic_add_unless(&cm->pinned, -1, 0));
+}
+EXPORT_SYMBOL_GPL(cm_unpin);
+
+void *cm_vmap(struct cm *cm)
+{
+ pr_debug("%s([%p])\n", __func__, (void *)cm);
+ atomic_inc(&cm->mapped);
+ /*
+ * XXX We should probably do something more clever in the
+ * future. The memory might be highmem after all.
+ */
+ return phys_to_virt(cm->phys);
+}
+EXPORT_SYMBOL_GPL(cm_vmap);
+
+void cm_vunmap(struct cm *cm)
+{
+ pr_debug("%s([%p])\n", __func__, (void *)cm);
+ WARN_ON(!atomic_add_unless(&cm->mapped, -1, 0));
+}
+EXPORT_SYMBOL_GPL(cm_vunmap);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0a270a5..be21ac9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5713,7 +5713,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
return -EINVAL;

_start = start & (~0UL << ret);
- _end = alloc_contig_freed_pages(_start, end, flag);
+ _end = alloc_contig_freed_pages(_start, end, flags);

/* Free head and tail (if any) */
if (start != _start)
--
1.7.1.569.g6f426

2011-03-31 13:16:29

by Marek Szyprowski

[permalink] [raw]
Subject: [PATCH 11/12] v4l: videobuf2: add CMA allocator (for testing)

Add support for the CMA contiguous memory allocator to videobuf2.
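
A minimal sketch of how a capture driver could wire this allocator into its
vb2 queue, for reviewers who have not used the videobuf2 memory-ops model.
Only vb2_cma_init() and vb2_cma_memops come from this patch; the
queue_setup() prototype follows the videobuf2 API of this kernel tree and
may differ elsewhere, and my_cma_ctx, my_queue_setup(), my_vb2_init() and
the 2 MiB buffer size are made-up names used only for illustration:

#include <linux/device.h>
#include <linux/err.h>
#include <media/videobuf2-core.h>
#include <media/videobuf2-cma.h>

static void *my_cma_ctx;        /* one allocator context per device */

/* vb2_ops.queue_setup: hand the CMA context to vb2 for the single plane */
static int my_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
                          unsigned int *nplanes, unsigned long sizes[],
                          void *alloc_ctxs[])
{
        *nplanes = 1;
        sizes[0] = 2 * 1024 * 1024;     /* example: 2 MiB per buffer */
        alloc_ctxs[0] = my_cma_ctx;
        return 0;
}

static int my_vb2_init(struct device *dev, struct vb2_queue *vq)
{
        /* NULL type selects the "common" memory type, 0 = page alignment */
        my_cma_ctx = vb2_cma_init(dev, NULL, 0);
        if (IS_ERR(my_cma_ctx))
                return PTR_ERR(my_cma_ctx);

        vq->mem_ops = &vb2_cma_memops;  /* CMA-backed MMAP buffers */
        return vb2_queue_init(vq);
}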

Signed-off-by: Marek Szyprowski <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
CC: Pawel Osciak <[email protected]>
CC: Michal Nazarewicz <[email protected]>
---
drivers/media/video/Kconfig | 6 +
drivers/media/video/Makefile | 1 +
drivers/media/video/videobuf2-cma.c | 227 +++++++++++++++++++++++++++++++++++
include/media/videobuf2-cma.h | 40 ++++++
4 files changed, 274 insertions(+), 0 deletions(-)
create mode 100644 drivers/media/video/videobuf2-cma.c
create mode 100644 include/media/videobuf2-cma.h

diff --git a/drivers/media/video/Kconfig b/drivers/media/video/Kconfig
index 4498b94..d80c9d6 100644
--- a/drivers/media/video/Kconfig
+++ b/drivers/media/video/Kconfig
@@ -66,6 +66,12 @@ config VIDEOBUF2_DMA_SG
select VIDEOBUF2_CORE
select VIDEOBUF2_MEMOPS
tristate
+
+config VIDEOBUF2_CMA
+ depends on CMA
+ select VIDEOBUF2_CORE
+ select VIDEOBUF2_MEMOPS
+ tristate
#
# Multimedia Video device configuration
#
diff --git a/drivers/media/video/Makefile b/drivers/media/video/Makefile
index ace5d8b..86e98df 100644
--- a/drivers/media/video/Makefile
+++ b/drivers/media/video/Makefile
@@ -118,6 +118,7 @@ obj-$(CONFIG_VIDEOBUF2_MEMOPS) += videobuf2-memops.o
obj-$(CONFIG_VIDEOBUF2_VMALLOC) += videobuf2-vmalloc.o
obj-$(CONFIG_VIDEOBUF2_DMA_CONTIG) += videobuf2-dma-contig.o
obj-$(CONFIG_VIDEOBUF2_DMA_SG) += videobuf2-dma-sg.o
+obj-$(CONFIG_VIDEOBUF2_CMA) += videobuf2-cma.o

obj-$(CONFIG_V4L2_MEM2MEM_DEV) += v4l2-mem2mem.o

diff --git a/drivers/media/video/videobuf2-cma.c b/drivers/media/video/videobuf2-cma.c
new file mode 100644
index 0000000..5dc7e89
--- /dev/null
+++ b/drivers/media/video/videobuf2-cma.c
@@ -0,0 +1,227 @@
+/*
+ * videobuf2-cma.c - CMA memory allocator for videobuf2
+ *
+ * Copyright (C) 2010-2011 Samsung Electronics
+ *
+ * Author: Pawel Osciak <[email protected]>
+ * Marek Szyprowski <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/cma.h>
+#include <linux/mm.h>
+#include <linux/sched.h>
+#include <linux/file.h>
+
+#include <media/videobuf2-core.h>
+#include <media/videobuf2-memops.h>
+
+#include <asm/io.h>
+
+struct vb2_cma_conf {
+ struct device *dev;
+ const char *type;
+ unsigned long alignment;
+};
+
+struct vb2_cma_buf {
+ struct vb2_cma_conf *conf;
+ struct cm *cm;
+ dma_addr_t paddr;
+ unsigned long size;
+ struct vm_area_struct *vma;
+ atomic_t refcount;
+ struct vb2_vmarea_handler handler;
+};
+
+static void vb2_cma_put(void *buf_priv);
+
+static void *vb2_cma_alloc(void *alloc_ctx, unsigned long size)
+{
+ struct vb2_cma_conf *conf = alloc_ctx;
+ struct vb2_cma_buf *buf;
+
+ buf = kzalloc(sizeof *buf, GFP_KERNEL);
+ if (!buf)
+ return ERR_PTR(-ENOMEM);
+
+ buf->cm = cma_alloc(conf->dev, conf->type, size, conf->alignment);
+ if (IS_ERR(buf->cm)) {
+ printk(KERN_ERR "cma_alloc of size %ld failed\n", size);
+ kfree(buf);
+ return ERR_PTR(-ENOMEM);
+ }
+ buf->paddr = cm_pin(buf->cm);
+ buf->conf = conf;
+ buf->size = size;
+
+ buf->handler.refcount = &buf->refcount;
+ buf->handler.put = vb2_cma_put;
+ buf->handler.arg = buf;
+
+ atomic_inc(&buf->refcount);
+
+ return buf;
+}
+
+static void vb2_cma_put(void *buf_priv)
+{
+ struct vb2_cma_buf *buf = buf_priv;
+
+ if (atomic_dec_and_test(&buf->refcount)) {
+ cm_unpin(buf->cm);
+ cma_free(buf->cm);
+ kfree(buf);
+ }
+}
+
+static void *vb2_cma_cookie(void *buf_priv)
+{
+ struct vb2_cma_buf *buf = buf_priv;
+
+ return (void *)buf->paddr;
+}
+
+static unsigned int vb2_cma_num_users(void *buf_priv)
+{
+ struct vb2_cma_buf *buf = buf_priv;
+
+ return atomic_read(&buf->refcount);
+}
+
+static int vb2_cma_mmap(void *buf_priv, struct vm_area_struct *vma)
+{
+ struct vb2_cma_buf *buf = buf_priv;
+
+ if (!buf) {
+ printk(KERN_ERR "No buffer to map\n");
+ return -EINVAL;
+ }
+
+ return vb2_mmap_pfn_range(vma, buf->paddr, buf->size,
+ &vb2_common_vm_ops, &buf->handler);
+}
+
+static void *vb2_cma_get_userptr(void *alloc_ctx, unsigned long vaddr,
+ unsigned long size, int write)
+{
+ struct vb2_cma_buf *buf;
+ struct vm_area_struct *vma;
+ dma_addr_t paddr = 0;
+ int ret;
+
+ buf = kzalloc(sizeof *buf, GFP_KERNEL);
+ if (!buf)
+ return ERR_PTR(-ENOMEM);
+
+ ret = vb2_get_contig_userptr(vaddr, size, &vma, &paddr);
+ if (ret) {
+ printk(KERN_ERR "Failed acquiring VMA for vaddr 0x%08lx\n",
+ vaddr);
+ kfree(buf);
+ return ERR_PTR(ret);
+ }
+
+ buf->size = size;
+ buf->paddr = paddr;
+ buf->vma = vma;
+
+ return buf;
+}
+
+static void vb2_cma_put_userptr(void *mem_priv)
+{
+ struct vb2_cma_buf *buf = mem_priv;
+
+ if (!buf)
+ return;
+
+ vb2_put_vma(buf->vma);
+ kfree(buf);
+}
+
+static void *vb2_cma_vaddr(void *mem_priv)
+{
+ struct vb2_cma_buf *buf = mem_priv;
+ if (!buf)
+ return NULL;
+
+ return phys_to_virt(buf->paddr);
+}
+
+const struct vb2_mem_ops vb2_cma_memops = {
+ .alloc = vb2_cma_alloc,
+ .put = vb2_cma_put,
+ .cookie = vb2_cma_cookie,
+ .mmap = vb2_cma_mmap,
+ .get_userptr = vb2_cma_get_userptr,
+ .put_userptr = vb2_cma_put_userptr,
+ .num_users = vb2_cma_num_users,
+ .vaddr = vb2_cma_vaddr,
+};
+EXPORT_SYMBOL_GPL(vb2_cma_memops);
+
+void *vb2_cma_init(struct device *dev, const char *type,
+ unsigned long alignment)
+{
+ struct vb2_cma_conf *conf;
+
+ conf = kzalloc(sizeof *conf, GFP_KERNEL);
+ if (!conf)
+ return ERR_PTR(-ENOMEM);
+
+ conf->dev = dev;
+ conf->type = type;
+ conf->alignment = alignment;
+
+ return conf;
+}
+EXPORT_SYMBOL_GPL(vb2_cma_init);
+
+void vb2_cma_cleanup(void *conf)
+{
+ kfree(conf);
+}
+EXPORT_SYMBOL_GPL(vb2_cma_cleanup);
+
+void **vb2_cma_init_multi(struct device *dev,
+ unsigned int num_planes,
+ const char *types[],
+ unsigned long alignments[])
+{
+ struct vb2_cma_conf *cma_conf;
+ void **alloc_ctxes;
+ unsigned int i;
+
+ alloc_ctxes = kzalloc((sizeof *alloc_ctxes + sizeof *cma_conf)
+ * num_planes, GFP_KERNEL);
+ if (!alloc_ctxes)
+ return ERR_PTR(-ENOMEM);
+
+ cma_conf = (void *)(alloc_ctxes + num_planes);
+
+ for (i = 0; i < num_planes; ++i, ++cma_conf) {
+ alloc_ctxes[i] = cma_conf;
+ cma_conf->dev = dev;
+ cma_conf->type = types[i];
+ cma_conf->alignment = alignments[i];
+ }
+
+ return alloc_ctxes;
+}
+EXPORT_SYMBOL_GPL(vb2_cma_init_multi);
+
+void vb2_cma_cleanup_multi(void **alloc_ctxes)
+{
+ kfree(alloc_ctxes);
+}
+EXPORT_SYMBOL_GPL(vb2_cma_cleanup_multi);
+
+MODULE_DESCRIPTION("CMA allocator handling routines for videobuf2");
+MODULE_AUTHOR("Pawel Osciak");
+MODULE_LICENSE("GPL");
diff --git a/include/media/videobuf2-cma.h b/include/media/videobuf2-cma.h
new file mode 100644
index 0000000..ba8dad5
--- /dev/null
+++ b/include/media/videobuf2-cma.h
@@ -0,0 +1,40 @@
+/*
+ * videobuf2-cma.h - CMA memory allocator for videobuf2
+ *
+ * Copyright (C) 2010-2011 Samsung Electronics
+ *
+ * Author: Pawel Osciak <[email protected]>
+ * Marek Szyprowski <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef _MEDIA_VIDEOBUF2_CMA_H
+#define _MEDIA_VIDEOBUF2_CMA_H
+
+#include <media/videobuf2-core.h>
+
+static inline unsigned long vb2_cma_plane_paddr(struct vb2_buffer *vb,
+ unsigned int plane_no)
+{
+ return (unsigned long)vb2_plane_cookie(vb, plane_no);
+}
+
+struct vb2_alloc_ctx *vb2_cma_init(struct device *dev, const char *type,
+ unsigned long alignment);
+void vb2_cma_cleanup(struct vb2_alloc_ctx *alloc_ctx);
+
+struct vb2_alloc_ctx **vb2_cma_init_multi(struct device *dev,
+ unsigned int num_planes, const char *types[],
+ unsigned long alignments[]);
+void vb2_cma_cleanup_multi(struct vb2_alloc_ctx **alloc_ctxes);
+
+extern const struct vb2_mem_ops vb2_cma_memops;
+
+#endif
--
1.7.1.569.g6f426

2011-03-31 13:16:31

by Marek Szyprowski

[permalink] [raw]
Subject: [PATCH 12/12] ARM: S5PC110: Added CMA regions to Aquila and Goni boards

This commit adds CMA memory regions to Aquila and Goni boards.
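
With the map used below ("s5p-mfc/f=fw;s5p-mfc/a=b1;s5p-mfc/b=b2;*=b1,b2"),
an allocation of type "f" requested by the s5p-mfc device is served from the
"fw" region, types "a" and "b" from "b1" and "b2" respectively, and any other
device falls back to "b1" and then "b2". A rough sketch of the driver side,
using the regions API added earlier in this series (mfc_alloc_fw(),
mfc_free_fw(), mfc_dev and the 1 MiB size are placeholders, not code taken
from the s5p-mfc driver):

#include <linux/cma.h>
#include <linux/device.h>
#include <linux/err.h>

/* grab and pin a 1 MiB chunk of "f" (firmware) memory for the MFC device */
static int mfc_alloc_fw(struct device *mfc_dev, struct cm **chunk,
                        dma_addr_t *paddr)
{
        *chunk = cma_alloc(mfc_dev, "f", 1 << 20, 0);
        if (IS_ERR(*chunk))
                return PTR_ERR(*chunk);

        *paddr = cm_pin(*chunk);        /* bus address to program into the IP */
        return 0;
}

static void mfc_free_fw(struct cm *chunk)
{
        cm_unpin(chunk);
        cma_free(chunk);
}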

Signed-off-by: Marek Szyprowski <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
arch/arm/mach-s5pv210/mach-aquila.c | 31 +++++++++++++++++++++++++++++++
arch/arm/mach-s5pv210/mach-goni.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mach-s5pv210/mach-aquila.c b/arch/arm/mach-s5pv210/mach-aquila.c
index 4e1d8ff..de3c41f 100644
--- a/arch/arm/mach-s5pv210/mach-aquila.c
+++ b/arch/arm/mach-s5pv210/mach-aquila.c
@@ -21,6 +21,7 @@
#include <linux/gpio_keys.h>
#include <linux/input.h>
#include <linux/gpio.h>
+#include <linux/cma.h>

#include <asm/mach/arch.h>
#include <asm/mach/map.h>
@@ -642,6 +643,35 @@ static void __init aquila_sound_init(void)
__raw_writel(__raw_readl(S5P_OTHERS) | (0x3 << 8), S5P_OTHERS);
}

+static void __init aquila_reserve(void)
+{
+ static struct cma_region regions[] = {
+ {
+ .name = "fw",
+ .size = 16 << 20,
+ .alignment = 128 << 10,
+ .start = 0x32000000,
+ },
+ {
+ .name = "b1",
+ .size = 32 << 20,
+ .start = 0x33000000,
+ },
+ {
+ .name = "b2",
+ .size = 32 << 20,
+ .start = 0x44000000,
+ },
+ { }
+ };
+
+ static const char map[] __initconst =
+ "s5p-mfc/f=fw;s5p-mfc/a=b1;s5p-mfc/b=b2;*=b1,b2";
+
+ cma_set_defaults(regions, map);
+ cma_early_regions_reserve(NULL);
+}
+
static void __init aquila_map_io(void)
{
s5p_init_io(NULL, 0, S5P_VA_CHIPID);
@@ -683,4 +713,5 @@ MACHINE_START(AQUILA, "Aquila")
.map_io = aquila_map_io,
.init_machine = aquila_machine_init,
.timer = &s5p_timer,
+ .reserve = aquila_reserve,
MACHINE_END
diff --git a/arch/arm/mach-s5pv210/mach-goni.c b/arch/arm/mach-s5pv210/mach-goni.c
index 31d5aa7..24f3f5c 100644
--- a/arch/arm/mach-s5pv210/mach-goni.c
+++ b/arch/arm/mach-s5pv210/mach-goni.c
@@ -26,6 +26,7 @@
#include <linux/input.h>
#include <linux/gpio.h>
#include <linux/interrupt.h>
+#include <linux/cma.h>

#include <asm/mach/arch.h>
#include <asm/mach/map.h>
@@ -833,6 +834,35 @@ static void __init goni_sound_init(void)
__raw_writel(__raw_readl(S5P_OTHERS) | (0x3 << 8), S5P_OTHERS);
}

+static void __init goni_reserve(void)
+{
+ static struct cma_region regions[] = {
+ {
+ .name = "fw",
+ .size = 16 << 20,
+ .alignment = 128 << 10,
+ .start = 0x31000000,
+ },
+ {
+ .name = "b1",
+ .size = 32 << 20,
+ .start = 0x33000000,
+ },
+ {
+ .name = "b2",
+ .size = 32 << 20,
+ .start = 0x44000000,
+ },
+ { }
+ };
+
+ static const char map[] __initconst =
+ "s5p-mfc/f=fw;s5p-mfc/a=b1;s5p-mfc/b=b2;*=b1,b2";
+
+ cma_set_defaults(regions, map);
+ cma_early_regions_reserve(NULL);
+}
+
static void __init goni_map_io(void)
{
s5p_init_io(NULL, 0, S5P_VA_CHIPID);
@@ -893,4 +923,5 @@ MACHINE_START(GONI, "GONI")
.map_io = goni_map_io,
.init_machine = goni_machine_init,
.timer = &s5p_timer,
+ .reserve = goni_reserve,
MACHINE_END
--
1.7.1.569.g6f426

2011-03-31 13:17:53

by Marek Szyprowski

[permalink] [raw]
Subject: [PATCH 05/12] mm: alloc_contig_range() added

From: Michal Nazarewicz <[email protected]>

This commit adds the alloc_contig_range() function, which tries
to allocate a given range of pages. It tries to migrate all
already-allocated pages that fall within the range, thus freeing them.
Once all pages in the range are freed, they are removed from the
buddy system and are thus allocated for the caller to use.
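
A sketch of the intended calling sequence (grab_range() and its PFN
arguments are placeholders; as the kerneldoc below spells out, the caller
must also be the only thread changing the migrate type of the affected
pageblocks):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/page-isolation.h>

static int grab_range(unsigned long pfn, unsigned long nr_pages)
{
        int ret;

        ret = alloc_contig_range(pfn, pfn + nr_pages, GFP_KERNEL);
        if (ret)
                return ret;     /* isolation or migration failed */

        /* pages in [pfn, pfn + nr_pages) now belong exclusively to the caller */

        free_contig_pages(pfn_to_page(pfn), nr_pages);
        return 0;
}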

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
Signed-off-by: Marek Szyprowski <[email protected]>
CC: Michal Nazarewicz <[email protected]>
---
include/linux/page-isolation.h | 2 +
mm/page_alloc.c | 144 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 146 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index f1417ed..c5d1a7c 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -34,6 +34,8 @@ extern int set_migratetype_isolate(struct page *page);
extern void unset_migratetype_isolate(struct page *page);
extern unsigned long alloc_contig_freed_pages(unsigned long start,
unsigned long end, gfp_t flag);
+extern int alloc_contig_range(unsigned long start, unsigned long end,
+ gfp_t flags);
extern void free_contig_pages(struct page *page, int nr_pages);

/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 11f8bcb..0a270a5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5583,6 +5583,150 @@ unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
return pfn;
}

+static unsigned long pfn_to_maxpage(unsigned long pfn)
+{
+ return pfn & ~(MAX_ORDER_NR_PAGES - 1);
+}
+
+static unsigned long pfn_to_maxpage_up(unsigned long pfn)
+{
+ return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+}
+
+#define MIGRATION_RETRY 5
+static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
+{
+ int migration_failed = 0, ret;
+ unsigned long pfn = start;
+
+ /*
+ * Some code "borrowed" from KAMEZAWA Hiroyuki's
+ * __alloc_contig_pages().
+ */
+
+ for (;;) {
+ pfn = scan_lru_pages(pfn, end);
+ if (!pfn || pfn >= end)
+ break;
+
+ ret = do_migrate_range(pfn, end);
+ if (!ret) {
+ migration_failed = 0;
+ } else if (ret != -EBUSY
+ || ++migration_failed >= MIGRATION_RETRY) {
+ return ret;
+ } else {
+ /* There are unstable pages on the pagevec. */
+ lru_add_drain_all();
+ /*
+ * there may be pages on pcplist before
+ * we mark the range as ISOLATED.
+ */
+ drain_all_pages();
+ }
+ cond_resched();
+ }
+
+ if (!migration_failed) {
+ /* drop all pages in pagevec and pcp list */
+ lru_add_drain_all();
+ drain_all_pages();
+ }
+
+ /* Make sure all pages are isolated */
+ if (WARN_ON(test_pages_isolated(start, end)))
+ return -EBUSY;
+
+ return 0;
+}
+
+/**
+ * alloc_contig_range() -- tries to allocate given range of pages
+ * @start: start PFN to allocate
+ * @end: one-past-the-last PFN to allocate
+ * @flags: flags passed to alloc_contig_freed_pages().
+ *
+ * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
+ * aligned, however it is the caller's responsibility to guarantee that we
+ * are the only thread that changes migrate type of pageblocks the
+ * pages fall in.
+ *
+ * Returns zero on success or a negative error code. On success all
+ * pages whose PFN is in [start, end) are allocated for the caller and
+ * need to be freed with free_contig_pages().
+ */
+int alloc_contig_range(unsigned long start, unsigned long end,
+ gfp_t flags)
+{
+ unsigned long _start, _end;
+ int ret;
+
+ /*
+ * What we do here is mark all pageblocks in the range as
+ * MIGRATE_ISOLATE. Because of the way the page allocator works, we
+ * align the range to MAX_ORDER pages so that the page allocator
+ * won't try to merge buddies from different pageblocks and
+ * change MIGRATE_ISOLATE to some other migration type.
+ *
+ * Once the pageblocks are marked as MIGRATE_ISOLATE, we
+ * migrate the pages from an unaligned range (ie. pages that
+ * we are interested in). This will put all the pages in
+ * range back to page allocator as MIGRATE_ISOLATE.
+ *
+ * When this is done, we take the pages in range from page
+ * allocator removing them from the buddy system. This way
+ * page allocator will never consider using them.
+ *
+ * This lets us mark the pageblocks back as
+ * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
+ * MAX_ORDER aligned range but not in the unaligned, original
+ * range are put back to page allocator so that buddy can use
+ * them.
+ */
+
+ ret = start_isolate_page_range(pfn_to_maxpage(start),
+ pfn_to_maxpage_up(end));
+ if (ret)
+ goto done;
+
+ ret = __alloc_contig_migrate_range(start, end);
+ if (ret)
+ goto done;
+
+ /*
+ * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+ * aligned blocks that are marked as MIGRATE_ISOLATE. What's
+ * more, all pages in [start, end) are free in the page allocator.
+ * What we are going to do is to allocate all pages from
+ * [start, end) (that is, remove them from the page allocator).
+ *
+ * The only problem is that pages at the beginning and at the
+ * end of the range may not be aligned with pages that the page
+ * allocator holds, i.e. they can be part of higher-order
+ * pages. Because of this, we reserve the bigger range and,
+ * once this is done, free the pages we are not interested in.
+ */
+
+ ret = 0;
+ while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
+ if (WARN_ON(++ret >= MAX_ORDER))
+ return -EINVAL;
+
+ _start = start & (~0UL << ret);
+ _end = alloc_contig_freed_pages(_start, end, flag);
+
+ /* Free head and tail (if any) */
+ if (start != _start)
+ free_contig_pages(pfn_to_page(_start), start - _start);
+ if (end != _end)
+ free_contig_pages(pfn_to_page(end), _end - end);
+
+ ret = 0;
+done:
+ undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+ return ret;
+}
+
void free_contig_pages(struct page *page, int nr_pages)
{
for (; nr_pages; --nr_pages, ++page)
--
1.7.1.569.g6f426

2011-03-31 13:16:16

by Marek Szyprowski

[permalink] [raw]
Subject: [PATCH 04/12] mm: alloc_contig_freed_pages() added

From: KAMEZAWA Hiroyuki <[email protected]>

This commit introduces the alloc_contig_freed_pages() function,
which allocates (i.e. removes from the buddy system) free pages
in a range. The caller has to guarantee that all pages in the
range are in the buddy system.

Along with this function, a free_contig_pages() function is
provided which frees all (or a subset of the) pages allocated
with alloc_contig_freed_pages().

Michal Nazarewicz has modified the function to make it easier
to allocate ranges that are not MAX_ORDER_NR_PAGES aligned, by
making it return the PFN of the page one past the last one allocated.
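
A sketch of that return-value contract, roughly as alloc_contig_range()
uses it later in this series (take_freed_range() and its arguments are
placeholders for a range that is already free, isolated and pfn_valid()):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/page-isolation.h>

static void take_freed_range(unsigned long start, unsigned long end)
{
        unsigned long last;

        /* every page in [start, end) must already sit in the buddy system */
        last = alloc_contig_freed_pages(start, end, GFP_KERNEL);

        /*
         * Pages are pulled out in buddy-order chunks, so 'last' may point
         * past 'end'; the overshoot is handed back immediately and the
         * caller keeps [start, end).
         */
        if (last != end)
                free_contig_pages(pfn_to_page(end), last - end);
}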

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
Signed-off-by: Marek Szyprowski <[email protected]>
CC: Michal Nazarewicz <[email protected]>
---
include/linux/page-isolation.h | 3 ++
mm/page_alloc.c | 44 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 58cdbac..f1417ed 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -32,6 +32,9 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
*/
extern int set_migratetype_isolate(struct page *page);
extern void unset_migratetype_isolate(struct page *page);
+extern unsigned long alloc_contig_freed_pages(unsigned long start,
+ unsigned long end, gfp_t flag);
+extern void free_contig_pages(struct page *page, int nr_pages);

/*
* For migration.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d6e7ba7..11f8bcb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5545,6 +5545,50 @@ out:
spin_unlock_irqrestore(&zone->lock, flags);
}

+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
+ gfp_t flag)
+{
+ unsigned long pfn = start, count;
+ struct page *page;
+ struct zone *zone;
+ int order;
+
+ VM_BUG_ON(!pfn_valid(start));
+ zone = page_zone(pfn_to_page(start));
+
+ spin_lock_irq(&zone->lock);
+
+ page = pfn_to_page(pfn);
+ for (;;) {
+ VM_BUG_ON(page_count(page) || !PageBuddy(page));
+ list_del(&page->lru);
+ order = page_order(page);
+ zone->free_area[order].nr_free--;
+ rmv_page_order(page);
+ __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+ pfn += 1 << order;
+ if (pfn >= end)
+ break;
+ VM_BUG_ON(!pfn_valid(pfn));
+ page += 1 << order;
+ }
+
+ spin_unlock_irq(&zone->lock);
+
+ /* After this, pages in the range can be freed one by one */
+ page = pfn_to_page(start);
+ for (count = pfn - start; count; --count, ++page)
+ prep_new_page(page, 0, flag);
+
+ return pfn;
+}
+
+void free_contig_pages(struct page *page, int nr_pages)
+{
+ for (; nr_pages; --nr_pages, ++page)
+ __free_page(page);
+}
+
#ifdef CONFIG_MEMORY_HOTREMOVE
/*
* All pages in the range must be isolated before calling this.
--
1.7.1.569.g6f426

2011-03-31 13:18:24

by Marek Szyprowski

[permalink] [raw]
Subject: [PATCH 08/12] mm: MIGRATE_CMA isolation functions added

From: Michal Nazarewicz <[email protected]>

This commit changes the various functions that switch the migrate
type of pages and pageblocks between MIGRATE_ISOLATE and
MIGRATE_MOVABLE so that they can also work with the MIGRATE_CMA
migrate type.
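
With this in place, a caller that owns MIGRATE_CMA pageblocks (the
migratetype introduced earlier in the series) can, for example, do the
following; grab_cma_range() and its PFN arguments are placeholders, not
code from this patch:

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/page-isolation.h>

static int grab_cma_range(unsigned long pfn, unsigned long nr_pages)
{
        int ret;

        /*
         * The pageblocks backing [pfn, pfn + nr_pages) are assumed to
         * have been marked MIGRATE_CMA when the region was set up.
         */
        ret = alloc_contig_range(pfn, pfn + nr_pages, GFP_KERNEL,
                                 MIGRATE_CMA);
        if (ret)
                return ret;

        /* ... the caller now owns the pages ... */

        free_contig_pages(pfn_to_page(pfn), nr_pages);
        return 0;
}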

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
Signed-off-by: Marek Szyprowski <[email protected]>
CC: Michal Nazarewicz <[email protected]>
---
include/linux/page-isolation.h | 40 +++++++++++++++++++++++++++-------------
mm/page_alloc.c | 19 ++++++++++++-------
mm/page_isolation.c | 15 ++++++++-------
3 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index c5d1a7c..177b307 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -3,39 +3,53 @@

/*
* Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- * If specified range includes migrate types other than MOVABLE,
+ * If specified range includes migrate types other than MOVABLE or CMA,
* this will fail with -EBUSY.
*
* For isolating all pages in the range finally, the caller have to
* free all pages in the range. test_page_isolated() can be used for
* test it.
*/
-extern int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+ unsigned migratetype);
+
+static inline int
+start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+ return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}
+
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+ unsigned migratetype);

/*
* Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
* target range is [start_pfn, end_pfn)
*/
-extern int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+static inline int
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+ return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+}

/*
- * test all pages in [start_pfn, end_pfn)are isolated or not.
+ * Test whether all pages in [start_pfn, end_pfn) are isolated.
*/
-extern int
-test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);

/*
- * Internal funcs.Changes pageblock's migrate type.
- * Please use make_pagetype_isolated()/make_pagetype_movable().
+ * Internal functions. Changes pageblock's migrate type.
*/
-extern int set_migratetype_isolate(struct page *page);
-extern void unset_migratetype_isolate(struct page *page);
+int set_migratetype_isolate(struct page *page);
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype);
+static inline void unset_migratetype_isolate(struct page *page)
+{
+ __unset_migratetype_isolate(page, MIGRATE_MOVABLE);
+}
extern unsigned long alloc_contig_freed_pages(unsigned long start,
unsigned long end, gfp_t flag);
extern int alloc_contig_range(unsigned long start, unsigned long end,
- gfp_t flags);
+ gfp_t flags, unsigned migratetype);
extern void free_contig_pages(struct page *page, int nr_pages);

/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 24f795e..5f4232d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5586,7 +5586,7 @@ out:
return ret;
}

-void unset_migratetype_isolate(struct page *page)
+void __unset_migratetype_isolate(struct page *page, unsigned migratetype)
{
struct zone *zone;
unsigned long flags;
@@ -5594,8 +5594,8 @@ void unset_migratetype_isolate(struct page *page)
spin_lock_irqsave(&zone->lock, flags);
if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
goto out;
- set_pageblock_migratetype(page, MIGRATE_MOVABLE);
- move_freepages_block(zone, page, MIGRATE_MOVABLE);
+ set_pageblock_migratetype(page, migratetype);
+ move_freepages_block(zone, page, migratetype);
out:
spin_unlock_irqrestore(&zone->lock, flags);
}
@@ -5700,6 +5700,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
* @start: start PFN to allocate
* @end: one-past-the-last PFN to allocate
* @flags: flags passed to alloc_contig_freed_pages().
+ * @migratetype: migratetype of the underlying pageblocks (either
+ * #MIGRATE_MOVABLE or #MIGRATE_CMA). All pageblocks
+ * in range must have the same migratetype and it must
+ * be either of the two.
*
* The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
 * aligned, however it is the caller's responsibility to guarantee that we
@@ -5711,7 +5715,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
* need to be freed with free_contig_pages().
*/
int alloc_contig_range(unsigned long start, unsigned long end,
- gfp_t flags)
+ gfp_t flags, unsigned migratetype)
{
unsigned long _start, _end;
int ret;
@@ -5739,8 +5743,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
* them.
*/

- ret = start_isolate_page_range(pfn_to_maxpage(start),
- pfn_to_maxpage_up(end));
+ ret = __start_isolate_page_range(pfn_to_maxpage(start),
+ pfn_to_maxpage_up(end), migratetype);
if (ret)
goto done;

@@ -5778,7 +5782,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,

ret = 0;
done:
- undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end));
+ __undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end),
+ migratetype);
return ret;
}

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 8a3122c..d36f082 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -23,10 +23,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
}

/*
- * start_isolate_page_range() -- make page-allocation-type of range of pages
+ * __start_isolate_page_range() -- make page-allocation-type of range of pages
* to be MIGRATE_ISOLATE.
* @start_pfn: The lower PFN of the range to be isolated.
* @end_pfn: The upper PFN of the range to be isolated.
+ * @migratetype: migrate type to set in error recovery.
*
* Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
* the range will never be allocated. Any free pages and pages freed in the
@@ -35,8 +36,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
* start_pfn/end_pfn must be aligned to pageblock_order.
* Returns 0 on success and -EBUSY if any part of range cannot be isolated.
*/
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+ unsigned migratetype)
{
unsigned long pfn;
unsigned long undo_pfn;
@@ -59,7 +60,7 @@ undo:
for (pfn = start_pfn;
pfn < undo_pfn;
pfn += pageblock_nr_pages)
- unset_migratetype_isolate(pfn_to_page(pfn));
+ __unset_migratetype_isolate(pfn_to_page(pfn), migratetype);

return -EBUSY;
}
@@ -67,8 +68,8 @@ undo:
/*
* Make isolated pages available again.
*/
-int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+ unsigned migratetype)
{
unsigned long pfn;
struct page *page;
@@ -80,7 +81,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
page = __first_valid_page(pfn, pageblock_nr_pages);
if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
continue;
- unset_migratetype_isolate(page);
+ __unset_migratetype_isolate(page, migratetype);
}
return 0;
}
--
1.7.1.569.g6f426

2011-03-31 13:19:15

by Marek Szyprowski

[permalink] [raw]
Subject: [PATCH 10/12] mm: cma: add CMA 'regions style' API (for testing)

This patch adds a CMA 'regions style' API (almost compatible with CMA v1).
It is intended mainly for testing the real contiguous memory allocator
and page migration with devices that use the older CMA API.

Based on previous work by Michal Nazarewicz.
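
Besides allocation (see the board patches later in the series for how
regions are defined), the compatibility layer lets a driver query the
regions mapped to it; a small sketch (print_cma_budget() and my_dev are
placeholders):

#include <linux/cma.h>
#include <linux/device.h>

static void print_cma_budget(struct device *my_dev)
{
        struct cma_info info;

        /* NULL type is equivalent to asking about the "common" memory type */
        if (cma_info(&info, my_dev, NULL))
                return;

        pr_info("%u region(s), %zu bytes total, %zu bytes free\n",
                info.count, info.total_size, info.free_size);
}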

Signed-off-by: Marek Szyprowski <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
CC: Michal Nazarewicz <[email protected]>
---
include/linux/cma-regions.h | 340 +++++++++++++++++++
include/linux/cma.h | 3 +
mm/Makefile | 2 +-
mm/cma-regions.c | 759 +++++++++++++++++++++++++++++++++++++++++++
4 files changed, 1103 insertions(+), 1 deletions(-)
create mode 100644 include/linux/cma-regions.h
create mode 100644 mm/cma-regions.c

diff --git a/include/linux/cma-regions.h b/include/linux/cma-regions.h
new file mode 100644
index 0000000..f3a2f4d
--- /dev/null
+++ b/include/linux/cma-regions.h
@@ -0,0 +1,340 @@
+#ifndef __LINUX_CMA_REGIONS_H
+#define __LINUX_CMA_REGIONS_H
+
+/*
+ * Contiguous Memory Allocator framework - memory region management
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ * Updated by Marek Szyprowski ([email protected])
+ */
+
+/*
+ * See Documentation/contiguous-memory.txt for details.
+ */
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+#include <linux/err.h>
+#include <linux/init.h>
+
+/***************************** Kernel level API *****************************/
+
+#include <linux/rbtree.h>
+#include <linux/list.h>
+
+
+struct device;
+struct cma_info;
+struct cm;
+
+/*
+ * Don't call it directly, use cma_alloc(), cma_alloc_from() or
+ * cma_alloc_from_region().
+ */
+struct cm * __must_check
+__cma_alloc(const struct device *dev, const char *type,
+ size_t size, dma_addr_t alignment);
+
+/* Don't call it directly, use cma_info() or cma_info_about(). */
+int
+__cma_info(struct cma_info *info, const struct device *dev, const char *type);
+
+
+/**
+ * cma_alloc - allocates contiguous chunk of memory.
+ * @dev: The device to perform allocation for.
+ * @type: A type of memory to allocate. Platform may define
+ * several different types of memory and device drivers
+ * can then request chunks of different types. Usually it's
+ * safe to pass NULL here which is the same as passing
+ * "common".
+ * @size: Size of the memory to allocate in bytes.
+ * @alignment: Desired alignment in bytes. Must be a power of two or
+ * zero. If alignment is less than a page size it will be
+ * set to page size. If unsure, pass zero here.
+ *
+ * On error returns an ERR_PTR()-encoded negative error. Use
+ * IS_ERR() to check if the returned value is indeed an error.
+ * Otherwise a handle to the allocated chunk is returned.
+ */
+static inline struct cm * __must_check
+cma_alloc(const struct device *dev, const char *type,
+ size_t size, dma_addr_t alignment)
+{
+ return dev ? __cma_alloc(dev, type, size, alignment) : ERR_PTR(-EINVAL);
+}
+
+
+/**
+ * struct cma_info - information about regions returned by cma_info().
+ * @lower_bound: The smallest address that can possibly be
+ * allocated for the given (dev, type) pair.
+ * @upper_bound: One byte past the biggest address that can
+ * possibly be allocated for the given (dev, type)
+ * pair.
+ * @total_size: Total size of regions mapped to (dev, type) pair.
+ * @free_size: Total free size in all of the regions mapped to (dev, type)
+ * pair. Because of possible race conditions, it is not
+ * guaranteed that the value will be correct -- it gives only
+ * an approximation.
+ * @count: Number of regions mapped to (dev, type) pair.
+ */
+struct cma_info {
+ dma_addr_t lower_bound, upper_bound;
+ size_t total_size, free_size;
+ unsigned count;
+};
+
+/**
+ * cma_info - queries information about regions.
+ * @info: Pointer to a structure where to save the information.
+ * @dev: The device to query information for.
+ * @type: A type of memory to query information for.
+ * If unsure, pass NULL here which is equal to passing
+ * "common".
+ *
+ * On error returns a negative error, zero otherwise.
+ */
+static inline int
+cma_info(struct cma_info *info, const struct device *dev, const char *type)
+{
+ return dev ? __cma_info(info, dev, type) : -EINVAL;
+}
+
+
+/**
+ * cma_free - frees a chunk of memory.
+ * @addr: Beginning of the chunk.
+ *
+ * Returns -ENOENT if there is no chunk at given location; otherwise
+ * zero. In the former case issues a warning.
+ */
+int cma_free(struct cm *addr);
+
+
+
+/****************************** Lower lever API *****************************/
+
+/**
+ * cma_alloc_from - allocates contiguous chunk of memory from named regions.
+ * @regions: Comma separated list of region names. Terminated by NUL
+ * byte or a semicolon.
+ * @size: Size of the memory to allocate in bytes.
+ * @alignment: Desired alignment in bytes. Must be a power of two or
+ * zero. If alignment is less than a page size it will be
+ * set to page size. If unsure, pass zero here.
+ *
+ * On error returns an ERR_PTR()-encoded negative error. Use
+ * IS_ERR() to check if the returned value is indeed an error.
+ * Otherwise a handle to the allocated chunk is returned.
+ */
+static inline struct cm * __must_check
+cma_alloc_from(const char *regions, size_t size, dma_addr_t alignment)
+{
+ return __cma_alloc(NULL, regions, size, alignment);
+}
+
+/**
+ * cma_info_about - queries information about named regions.
+ * @info: Pointer to a structure where to save the information.
+ * @regions: Comma separated list of region names. Terminated by NUL
+ * byte or a semicolon.
+ *
+ * On error returns a negative error, zero otherwise.
+ */
+static inline int
+cma_info_about(struct cma_info *info, const char *regions)
+{
+ return __cma_info(info, NULL, regions);
+}
+
+
+/**
+ * struct cma_region - a region reserved for CMA allocations.
+ * @name: Unique name of the region. Read only.
+ * @start: Bus address of the region in bytes. Always aligned at
+ * least to a full page. Read only.
+ * @size: Size of the region in bytes. Multiply of a page size.
+ * Read only.
+ * @alignment: Desired alignment of the region in bytes. A power of two,
+ * always at least page size. Early.
+ * @cma: Low level memory area registered with the kernel mm
+ * subsystem. Private.
+ * @list: Entry in list of regions. Private.
+ * @used: Whether the region was already used, i.e. there was at
+ * least one allocation request for it. Private.
+ * @registered: Whether this region has been registered. Read only.
+ * @reserved: Whether this region has been reserved. Early. Read only.
+ * @copy_name: Whether @name needs to be copied when this region is
+ * converted from early to normal. Early. Private.
+ *
+ * Regions come in two types: early regions and normal regions. The
+ * former can be reserved or not reserved. Fields marked as "early"
+ * are only meaningful in early regions.
+ *
+ * Early regions are important only during initialisation. The list
+ * of early regions is built from the "cma" command line argument or
+ * platform defaults. Platform initialisation code is responsible for
+ * reserving space for unreserved regions that are placed on
+ * cma_early_regions list.
+ *
+ * Later, during CMA initialisation all reserved regions from the
+ * cma_early_regions list are registered as normal regions and can be
+ * used using standard mechanisms.
+ */
+struct cma_region {
+ const char *name;
+ dma_addr_t start;
+ size_t size;
+ dma_addr_t alignment; /* Early region */
+
+ struct cma *cma;
+ struct list_head list;
+
+ unsigned used:1;
+ unsigned registered:1;
+ unsigned reserved:1;
+ unsigned copy_name:1;
+};
+
+
+/**
+ * cma_region_register() - registers a region.
+ * @reg: Region to register.
+ *
+ * Region's start and size must be set.
+ *
+ * If name is set, the region will be accessible through the normal
+ * mechanisms such as the map attribute or the cma_alloc_from()
+ * function; otherwise it will be a private region, accessible only
+ * using the cma_alloc_from_region() function.
+ *
+ * If alloc is set, the function will try to initialise the given
+ * allocator (and will return an error if that fails). Otherwise
+ * alloc_name may point to the name of an allocator to use (if not
+ * set, the default will be used).
+ *
+ * All other fields are ignored and/or overwritten.
+ *
+ * Returns zero or a negative error. In particular, -EADDRINUSE if
+ * the region overlaps an already existing region.
+ */
+int __must_check cma_region_register(struct cma_region *reg);
+
+/**
+ * cma_region_unregister() - unregisters a region.
+ * @reg: Region to unregister.
+ *
+ * Region is unregistered only if there are no chunks allocated for
+ * it. Otherwise, function returns -EBUSY.
+ *
+ * On success returns zero.
+ */
+int __must_check cma_region_unregister(struct cma_region *reg);
+
+
+/**
+ * cma_alloc_from_region() - allocates contiguous chunk of memory from region.
+ * @reg: Region to allocate chunk from.
+ * @size: Size of the memory to allocate in bytes.
+ * @alignment: Desired alignment in bytes. Must be a power of two or
+ * zero. If alignment is less then a page size it will be
+ * set to page size. If unsure, pass zero here.
+ *
+ * On error returns a negative error cast to dma_addr_t. Use
+ * IS_ERR_VALUE() to check if returned value is indeed an error.
+ * Otherwise bus address of the chunk is returned.
+ */
+struct cm * __must_check
+cma_alloc_from_region(struct cma_region *reg,
+ size_t size, dma_addr_t alignment);
+
+
+/**************************** Initialisation API ****************************/
+
+/**
+ * cma_set_defaults() - specifies default command line parameters.
+ * @regions: A zero-sized entry terminated list of early regions.
+ * This array must not be placed in __initdata section.
+ * @map: Map attribute.
+ *
+ * This function should be called prior to cma_early_regions_reserve()
+ * and after early parameters have been parsed.
+ *
+ * Returns zero or negative error.
+ */
+int __init cma_set_defaults(struct cma_region *regions, const char *map);
+
+
+/**
+ * cma_early_regions - a list of early regions.
+ *
+ * The platform needs to reserve space for each of the regions before
+ * initcalls are executed. If space is reserved, the reserved flag
+ * must be set. Platform initialisation code may choose to use
+ * cma_early_regions_reserve().
+ *
+ * Later, during CMA initialisation all reserved regions from the
+ * cma_early_regions list are registered as normal regions and can be
+ * used using standard mechanisms.
+ */
+extern struct list_head cma_early_regions __initdata;
+
+
+/**
+ * cma_early_region_register() - registers an early region.
+ * @reg: Region to add.
+ *
+ * Region's size, start and alignment must be set (however the last
+ * two can be zero). If name is set, the region will be accessible
+ * through the normal mechanisms such as the map attribute or the
+ * cma_alloc_from() function; otherwise it will be a private region
+ * accessible only using cma_alloc_from_region().
+ *
+ * During platform initialisation, space is reserved for early
+ * regions. Later, when CMA initialises, the early regions are
+ * "converted" into normal regions. If cma_region::alloc is set, CMA
+ * will then try to setup given allocator on the region. Failure to
+ * do so will result in the region not being registered even though
+ * the space for it will still be reserved. If cma_region::alloc is
+ * not set, allocator will be attached to the region on first use and
+ * the value of cma_region::alloc_name will be taken into account if
+ * set.
+ *
+ * All other fields are ignored and/or overwritten.
+ *
+ * Returns zero or negative error. No checking if regions overlap is
+ * performed.
+ */
+int __init __must_check cma_early_region_register(struct cma_region *reg);
+
+
+/**
+ * cma_early_region_reserve() - reserves a physically contiguous memory region.
+ * @reg: Early region to reserve memory for.
+ *
+ * If the platform supports bootmem, this is the first allocator this
+ * function tries to use. If that fails (or bootmem is not
+ * supported), the function tries to use memblock if it is available.
+ *
+ * On success sets reg->reserved flag.
+ *
+ * Returns zero or negative error.
+ */
+int __init cma_early_region_reserve(struct cma_region *reg);
+
+/**
+ * cma_early_regions_reserve() - helper function for reserving early regions.
+ * @reserve: Callback function used to reserve space for a region.
+ * Needs to return non-negative if the reservation succeeded,
+ * a negative error otherwise. NULL means
+ * cma_early_region_reserve() will be used.
+ *
+ * This function traverses the %cma_early_regions list and tries to
+ * reserve memory for each early region. It uses the @reserve
+ * callback function for that purpose. The reserved flag of each
+ * region is updated accordingly.
+ */
+void __init cma_early_regions_reserve(int (*reserve)(struct cma_region *reg));
+
+#endif
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 8952531..755f22e 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -1,3 +1,4 @@
+
#ifndef __LINUX_CMA_H
#define __LINUX_CMA_H

@@ -258,4 +259,6 @@ void cm_vunmap(struct cm *cm);

#endif

+#include <linux/cma-regions.h>
+
#endif
diff --git a/mm/Makefile b/mm/Makefile
index 01c3b20..33de730 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -49,4 +49,4 @@ obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
-obj-$(CONFIG_CMA) += cma.o
+obj-$(CONFIG_CMA) += cma.o cma-regions.o
diff --git a/mm/cma-regions.c b/mm/cma-regions.c
new file mode 100644
index 0000000..f37f58b
--- /dev/null
+++ b/mm/cma-regions.c
@@ -0,0 +1,759 @@
+/*
+ * Contiguous Memory Allocator framework - memory region management
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ * Updated by Marek Szyprowski ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ */
+
+/*
+ * See Documentation/contiguous-memory.txt for details.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+# define DEBUG
+#endif
+
+#include <linux/device.h> /* struct device, dev_name() */
+#include <linux/errno.h> /* Error numbers */
+#include <linux/err.h> /* IS_ERR, PTR_ERR, etc. */
+#include <linux/mm.h> /* PAGE_ALIGN() */
+#include <linux/module.h> /* EXPORT_SYMBOL_GPL() */
+#include <linux/mutex.h> /* mutex */
+#include <linux/slab.h> /* kmalloc() */
+#include <linux/string.h> /* str*() */
+
+#include <linux/cma.h>
+
+/*
+ * Protects cma_regions, cma_map, cma_map_length.
+ */
+static DEFINE_MUTEX(cma_mutex);
+
+
+/************************* Map attribute *************************/
+
+static const char *cma_map;
+static size_t cma_map_length;
+
+/*
+ * map-attr ::= [ rules [ ';' ] ]
+ * rules ::= rule [ ';' rules ]
+ * rule ::= patterns '=' regions
+ * patterns ::= pattern [ ',' patterns ]
+ * regions ::= REG-NAME [ ',' regions ]
+ * pattern ::= dev-pattern [ '/' TYPE-NAME ] | '/' TYPE-NAME
+ *
+ * See Documentation/contiguous-memory.txt for details.
+ */
+static ssize_t cma_map_validate(const char *param)
+{
+ const char *ch = param;
+
+ if (*ch == '\0' || *ch == '\n')
+ return 0;
+
+ for (;;) {
+ const char *start = ch;
+
+ while (*ch && *ch != '\n' && *ch != ';' && *ch != '=')
+ ++ch;
+
+ if (*ch != '=' || start == ch) {
+ pr_err("map: expecting \"<patterns>=<regions>\" near %s\n",
+ start);
+ return -EINVAL;
+ }
+
+ while (*++ch != ';')
+ if (*ch == '\0' || *ch == '\n')
+ return ch - param;
+ if (ch[1] == '\0' || ch[1] == '\n')
+ return ch - param;
+ ++ch;
+ }
+}
+
+static int __init cma_map_param(char *param)
+{
+ ssize_t len;
+
+ pr_debug("param: map: %s\n", param);
+
+ len = cma_map_validate(param);
+ if (len < 0)
+ return len;
+
+ cma_map = param;
+ cma_map_length = len;
+ return 0;
+}
+
+#if defined CONFIG_CMA_CMDLINE
+
+early_param("cma.map", cma_map_param);
+
+#endif
+
+
+
+/************************* Early regions *************************/
+
+struct list_head cma_early_regions __initdata =
+ LIST_HEAD_INIT(cma_early_regions);
+
+#ifdef CONFIG_CMA_CMDLINE
+
+/*
+ * regions-attr ::= [ regions [ ';' ] ]
+ * regions ::= region [ ';' regions ]
+ *
+ * region ::= [ '-' ] reg-name
+ * '=' size
+ * [ '@' start ]
+ * [ '/' alignment ]
+ *
+ * See Documentation/contiguous-memory.txt for details.
+ *
+ * Example:
+ * cma=reg1=64M;reg2=32M@0x100000;reg3=64M/1M
+ *
+ */
+
+#define NUMPARSE(cond_ch, type, cond) ({ \
+ unsigned long long v = 0; \
+ if (*param == (cond_ch)) { \
+ const char *const msg = param + 1; \
+ v = memparse(msg, &param); \
+ if (!v || v > ~(type)0 || !(cond)) { \
+ pr_err("param: invalid value near %s\n", msg); \
+ ret = -EINVAL; \
+ break; \
+ } \
+ } \
+ v; \
+ })
+
+static int __init cma_param_parse(char *param)
+{
+ static struct cma_region regions[16];
+
+ size_t left = ARRAY_SIZE(regions);
+ struct cma_region *reg = regions;
+ int ret = 0;
+
+ pr_debug("param: %s\n", param);
+
+ for (; *param; ++reg) {
+ dma_addr_t start, alignment;
+ size_t size;
+
+ if (unlikely(!--left)) {
+ pr_err("param: too many early regions\n");
+ return -ENOSPC;
+ }
+
+ /* Parse name */
+ reg->name = param;
+ param = strchr(param, '=');
+ if (!param || param == reg->name) {
+ pr_err("param: expected \"<name>=\" near %s\n",
+ reg->name);
+ ret = -EINVAL;
+ break;
+ }
+ *param = '\0';
+
+ /* Parse numbers */
+ size = NUMPARSE('\0', size_t, true);
+ start = NUMPARSE('@', dma_addr_t, true);
+ alignment = NUMPARSE('/', dma_addr_t, (v & (v - 1)) == 0);
+
+ alignment = max(alignment, (dma_addr_t)PAGE_SIZE);
+ start = ALIGN(start, alignment);
+ size = PAGE_ALIGN(size);
+ if (start + size < start) {
+ pr_err("param: invalid start, size combination\n");
+ ret = -EINVAL;
+ break;
+ }
+
+ /* Go to next */
+ if (*param == ';') {
+ *param = '\0';
+ ++param;
+ } else if (*param) {
+ pr_err("param: expecting ';' or end of parameter near %s\n",
+ param);
+ ret = -EINVAL;
+ break;
+ }
+
+ /* Add */
+ reg->size = size;
+ reg->start = start;
+ reg->alignment = alignment;
+ reg->copy_name = 1;
+
+ list_add_tail(&reg->list, &cma_early_regions);
+
+ pr_debug("param: registering early region %s (%p@%p/%p)\n",
+ reg->name, (void *)reg->size, (void *)reg->start,
+ (void *)reg->alignment);
+ }
+
+ return ret;
+}
+early_param("cma", cma_param_parse);
+
+#undef NUMPARSE
+
+#endif
+
+
+int __init __must_check cma_early_region_register(struct cma_region *reg)
+{
+ dma_addr_t start, alignment;
+ size_t size;
+
+ if (reg->alignment & (reg->alignment - 1))
+ return -EINVAL;
+
+ alignment = max(reg->alignment, (dma_addr_t)PAGE_SIZE);
+ start = ALIGN(reg->start, alignment);
+ size = PAGE_ALIGN(reg->size);
+
+ if (start + size < start)
+ return -EINVAL;
+
+ reg->size = size;
+ reg->start = start;
+ reg->alignment = alignment;
+
+ list_add_tail(&reg->list, &cma_early_regions);
+
+ pr_debug("param: registering early region %s (%p@%p/%p)\n",
+ reg->name, (void *)reg->size, (void *)reg->start,
+ (void *)reg->alignment);
+
+ return 0;
+}
+
+
+
+/************************* Regions ******************************/
+
+/* List of all regions. Named regions are kept before unnamed. */
+static LIST_HEAD(cma_regions);
+
+#define cma_foreach_region(reg) \
+ list_for_each_entry(reg, &cma_regions, list)
+
+int __must_check cma_region_register(struct cma_region *reg)
+{
+ const char *name;
+ struct cma_region *r;
+ char *ch = NULL;
+ int ret = 0;
+
+ if (!reg->size || reg->start + reg->size < reg->start)
+ return -EINVAL;
+
+ reg->used = 0;
+ reg->registered = 0;
+
+ /* Copy name */
+ name = reg->name;
+ if (reg->copy_name && (reg->name)) {
+ size_t name_size;
+
+ name_size = reg->name ? strlen(reg->name) + 1 : 0;
+
+ ch = kmalloc(name_size, GFP_KERNEL);
+ if (!ch) {
+ pr_err("%s: not enough memory to allocate name\n",
+ reg->name ?: "(private)");
+ return -ENOMEM;
+ }
+
+ if (name_size) {
+ memcpy(ch, reg->name, name_size);
+ name = ch;
+ ch += name_size;
+ }
+ }
+
+ mutex_lock(&cma_mutex);
+
+ /* Don't let regions overlap */
+ cma_foreach_region(r)
+ if (r->start + r->size > reg->start &&
+ r->start < reg->start + reg->size) {
+ ret = -EADDRINUSE;
+ goto done;
+ }
+
+ reg->name = name;
+ reg->registered = 1;
+ reg->cma = cma_create(reg->start, reg->size, reg->alignment, false);
+
+ if (IS_ERR(reg->cma)) {
+ pr_err("error, cma create failed: %d\n", -(int)reg->cma);
+ ret = PTR_ERR(reg->cma);
+ goto done;
+ }
+ pr_info("created cma %p\n", reg->cma);
+ ch = NULL;
+
+ /*
+ * Keep named at the beginning and unnamed (private) at the
+ * end. This helps in traversal when named region is looked
+ * for.
+ */
+ if (name)
+ list_add(&reg->list, &cma_regions);
+ else
+ list_add_tail(&reg->list, &cma_regions);
+
+done:
+ mutex_unlock(&cma_mutex);
+
+ pr_debug("%s: region %sregistered\n",
+ reg->name ?: "(private)", ret ? "not " : "");
+ kfree(ch);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(cma_region_register);
+
+static struct cma_region *__must_check
+__cma_region_find(const char **namep)
+{
+ struct cma_region *reg;
+ const char *ch, *name;
+ size_t n;
+
+ ch = *namep;
+ while (*ch && *ch != ',' && *ch != ';')
+ ++ch;
+ name = *namep;
+ *namep = *ch == ',' ? ch + 1 : ch;
+ n = ch - name;
+
+ /*
+ * Named regions are kept in front of unnamed so if we
+ * encounter unnamed region we can stop.
+ */
+ cma_foreach_region(reg)
+ if (!reg->name)
+ break;
+ else if (!strncmp(name, reg->name, n) && !reg->name[n])
+ return reg;
+
+ return NULL;
+}
+
+/************************* Initialise CMA *************************/
+
+int __init cma_set_defaults(struct cma_region *regions, const char *map)
+{
+ if (map) {
+ int ret = cma_map_param((char *)map);
+ if (unlikely(ret < 0))
+ return ret;
+ }
+
+ if (!regions)
+ return 0;
+
+ for (; regions->size; ++regions) {
+ int ret = cma_early_region_register(regions);
+ if (unlikely(ret < 0))
+ return ret;
+ }
+
+ return 0;
+}
+
+
+int __init cma_early_region_reserve(struct cma_region *reg)
+{
+ int ret;
+
+ if (!reg->size || (reg->alignment & (reg->alignment - 1)) ||
+ reg->reserved)
+ return -EINVAL;
+
+ ret = cma_reserve(reg->start, reg->size, reg->alignment);
+
+ if (ret >= 0)
+ reg->start = ret;
+
+ return ret;
+}
+
+void __init cma_early_regions_reserve(int (*reserve)(struct cma_region *reg))
+{
+ struct cma_region *reg;
+
+ pr_debug("init: reserving early regions\n");
+
+ if (!reserve)
+ reserve = cma_early_region_reserve;
+
+ list_for_each_entry(reg, &cma_early_regions, list) {
+ if (reg->reserved) {
+ /* nothing */
+ } else if (reserve(reg) >= 0) {
+ pr_debug("init: %s: reserved %p@%p\n",
+ reg->name ?: "(private)",
+ (void *)reg->size, (void *)reg->start);
+ reg->reserved = 1;
+ } else {
+ pr_warn("init: %s: unable to reserve %p@%p/%p\n",
+ reg->name ?: "(private)",
+ (void *)reg->size, (void *)reg->start,
+ (void *)reg->alignment);
+ }
+ }
+}
+
+
+static int __init cma_init(void)
+{
+ struct cma_region *reg, *n;
+
+ pr_debug("init: initialising\n");
+
+ if (cma_map) {
+ char *val = kmemdup(cma_map, cma_map_length + 1, GFP_KERNEL);
+ cma_map = val;
+ if (!val)
+ return -ENOMEM;
+ val[cma_map_length] = '\0';
+ }
+
+ list_for_each_entry_safe(reg, n, &cma_early_regions, list) {
+ INIT_LIST_HEAD(&reg->list);
+ /*
+ * We don't care if there was an error. It's a pity
+ * but there's not much we can do about it anyway.
+ * If the error is on a region that was parsed from
+ * command line then it will stay and waste a bit of
+ * space; if it was registered using
+ * cma_early_region_register() it's caller's
+ * responsibility to do something about it.
+ */
+ if (reg->reserved && cma_region_register(reg) < 0)
+ /* ignore error */;
+ }
+
+ INIT_LIST_HEAD(&cma_early_regions);
+
+ return 0;
+}
+/*
+ * We want to be initialised earlier than module_init/__initcall so
+ * that drivers that want to grab memory at boot time will get CMA
+ * ready. subsys_initcall() seems early enough and not too early at
+ * the same time.
+ */
+subsys_initcall(cma_init);
+
+/************************* The Device API *************************/
+
+static const char *__must_check
+__cma_where_from(const struct device *dev, const char *type);
+
+
+/* Allocate. */
+
+static struct cm * __must_check
+__cma_alloc_from_region(struct cma_region *reg,
+ size_t size, dma_addr_t alignment)
+{
+ struct cm *cm;
+
+ pr_debug("allocate %p/%p from %s\n",
+ (void *)size, (void *)alignment,
+ reg ? reg->name ?: "(private)" : "(null)");
+
+ if (!reg)
+ return ERR_PTR(-ENOMEM);
+
+ cm = cm_alloc(reg->cma, size, alignment);
+ if (IS_ERR(cm)) {
+ pr_err("failed to allocate\n");
+ return ERR_PTR(-ENOMEM);
+ }
+ return cm;
+}
+
+struct cm * __must_check
+cma_alloc_from_region(struct cma_region *reg,
+ size_t size, dma_addr_t alignment)
+{
+ struct cm *cm;
+
+ pr_debug("allocate %p/%p from %s\n",
+ (void *)size, (void *)alignment,
+ reg ? reg->name ?: "(private)" : "(null)");
+
+ if (!size || alignment & (alignment - 1) || !reg)
+ return ERR_PTR(-EINVAL);
+
+ mutex_lock(&cma_mutex);
+
+ cm = reg->registered ?
+ __cma_alloc_from_region(reg, PAGE_ALIGN(size),
+ max(alignment, (dma_addr_t)PAGE_SIZE)) :
+ ERR_PTR(-EINVAL);
+
+ mutex_unlock(&cma_mutex);
+
+ return cm;
+}
+EXPORT_SYMBOL_GPL(cma_alloc_from_region);
+
+struct cm * __must_check
+__cma_alloc(const struct device *dev, const char *type,
+ dma_addr_t size, dma_addr_t alignment)
+{
+ struct cma_region *reg;
+ const char *from;
+ struct cm *cm;
+
+ if (dev)
+ pr_debug("allocate %p/%p for %s/%s\n",
+ (void *)size, (void *)alignment,
+ dev_name(dev), type ?: "");
+
+ if (!size || alignment & (alignment - 1))
+ return ERR_PTR(-EINVAL);
+
+ size = PAGE_ALIGN(size);
+ if (alignment < PAGE_SIZE)
+ alignment = PAGE_SIZE;
+
+ mutex_lock(&cma_mutex);
+
+ from = __cma_where_from(dev, type);
+ if (unlikely(IS_ERR(from))) {
+ cm = ERR_PTR(PTR_ERR(from));
+ goto done;
+ }
+
+ pr_debug("allocate %p/%p from one of %s\n",
+ (void *)size, (void *)alignment, from);
+
+ while (*from && *from != ';') {
+ reg = __cma_region_find(&from);
+ cm = __cma_alloc_from_region(reg, size, alignment);
+ if (!IS_ERR(cm))
+ goto done;
+ }
+
+ pr_debug("not enough memory\n");
+ cm = ERR_PTR(-ENOMEM);
+
+done:
+ mutex_unlock(&cma_mutex);
+
+ return cm;
+}
+EXPORT_SYMBOL_GPL(__cma_alloc);
+
+
+/* Query information about regions. */
+static void __cma_info_add(struct cma_info *infop, struct cma_region *reg)
+{
+ infop->total_size += reg->size;
+ if (infop->lower_bound > reg->start)
+ infop->lower_bound = reg->start;
+ if (infop->upper_bound < reg->start + reg->size)
+ infop->upper_bound = reg->start + reg->size;
+ ++infop->count;
+}
+
+int
+__cma_info(struct cma_info *infop, const struct device *dev, const char *type)
+{
+ struct cma_info info = { ~(dma_addr_t)0, 0, 0, 0, 0 };
+ struct cma_region *reg;
+ const char *from;
+ int ret;
+
+ if (unlikely(!infop))
+ return -EINVAL;
+
+ mutex_lock(&cma_mutex);
+
+ from = __cma_where_from(dev, type);
+ if (IS_ERR(from)) {
+ ret = PTR_ERR(from);
+ info.lower_bound = 0;
+ goto done;
+ }
+
+ while (*from && *from != ';') {
+ reg = __cma_region_find(&from);
+ if (reg)
+ __cma_info_add(&info, reg);
+ }
+
+ ret = 0;
+done:
+ mutex_unlock(&cma_mutex);
+
+ memcpy(infop, &info, sizeof info);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(__cma_info);
+
+
+/* Freeing. */
+int cma_free(struct cm *cm)
+{
+ mutex_lock(&cma_mutex);
+
+ if (cm)
+ pr_debug("free(%p): freed\n", (void *)cm);
+
+ cm_free(cm);
+ mutex_unlock(&cma_mutex);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(cma_free);
+
+
+/************************* Miscellaneous *************************/
+
+/*
+ * s ::= rules
+ * rules ::= rule [ ';' rules ]
+ * rule ::= patterns '=' regions
+ * patterns ::= pattern [ ',' patterns ]
+ * regions ::= REG-NAME [ ',' regions ]
+ * pattern ::= dev-pattern [ '/' TYPE-NAME ] | '/' TYPE-NAME
+ */
+static const char *__must_check
+__cma_where_from(const struct device *dev, const char *type)
+{
+ /*
+ * This function matches the pattern from the map attribute
+ * against the given device name and type. Type may of course
+ * be NULL or an empty string.
+ */
+
+ const char *s, *name;
+ int name_matched = 0;
+
+ /*
+ * If dev is NULL we were called in alternative form where
+ * type is the from string. All we have to do is return it.
+ */
+ if (!dev)
+ return type ?: ERR_PTR(-EINVAL);
+
+ if (!cma_map)
+ return ERR_PTR(-ENOENT);
+
+ name = dev_name(dev);
+ if (WARN_ON(!name || !*name))
+ return ERR_PTR(-EINVAL);
+
+ if (!type)
+ type = "common";
+
+ /*
+ * Now we go through the cma_map attribute.
+ */
+ for (s = cma_map; *s; ++s) {
+ const char *c;
+
+ /*
+ * If the pattern starts with a slash, the device part of the
+ * pattern matches if it matched previously.
+ */
+ if (*s == '/') {
+ if (!name_matched)
+ goto look_for_next;
+ goto match_type;
+ }
+
+ /*
+ * We are now trying to match the device name. This also
+ * updates the name_matched variable. If, while reading the
+ * spec, we encounter a comma it means that the pattern does
+ * not match and we need to start over with another pattern
+ * (the one after the comma). If we encounter an equal sign
+ * we need to start over with another rule. If there is a
+ * character that does not match, we need to look for a comma
+ * (to get another pattern) or a semicolon (to get another
+ * rule) and try again if there is one somewhere.
+ */
+
+ name_matched = 0;
+
+ for (c = name; *s != '*' && *c; ++c, ++s)
+ if (*s == '=')
+ goto next_rule;
+ else if (*s == ',')
+ goto next_pattern;
+ else if (*s != '?' && *c != *s)
+ goto look_for_next;
+ if (*s == '*')
+ ++s;
+
+ name_matched = 1;
+
+ /*
+ * Now we need to match the type part of the pattern. If the
+ * pattern is missing it, we match only if type points to an
+ * empty string. Otherwise we try to match it just like the name.
+ */
+ if (*s == '/') {
+match_type: /* s points to '/' */
+ ++s;
+
+ for (c = type; *s && *c; ++c, ++s)
+ if (*s == '=')
+ goto next_rule;
+ else if (*s == ',')
+ goto next_pattern;
+ else if (*c != *s)
+ goto look_for_next;
+ }
+
+ /* Return the string behind the '=' sign of the rule. */
+ if (*s == '=')
+ return s + 1;
+ else if (*s == ',')
+ return strchr(s, '=') + 1;
+
+ /* Pattern did not match */
+
+look_for_next:
+ do {
+ ++s;
+ } while (*s != ',' && *s != '=');
+ if (*s == ',')
+ continue;
+
+next_rule: /* s points to '=' */
+ s = strchr(s, ';');
+ if (!s)
+ break;
+
+next_pattern:
+ continue;
+ }
+
+ return ERR_PTR(-ENOENT);
+}
--
1.7.1.569.g6f426

2011-03-31 13:16:23

by Marek Szyprowski

[permalink] [raw]
Subject: [PATCH 02/12] lib: genalloc: Generic allocator improvements

From: Michal Nazarewicz <[email protected]>

This commit adds a gen_pool_alloc_aligned() function to the
generic allocator API. It allows specifying alignment for the
allocated block. This feature uses
the bitmap_find_next_zero_area_off() function.

It also fixes a possible issue with the bitmap's last element not
being fully allocated (ie. the space allocated for chunk->bits not
being a multiple of sizeof(long)).

It also makes some other smaller changes:
- moves structure definitions out of the header file,
- adds __must_check to functions returning value,
- makes gen_pool_add() return -ENOMEM rather than -1 on error,
- changes list_for_each to list_for_each_entry, and
- makes use of bitmap_clear().
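
For illustration, a hypothetical driver managing its own carveout could
use the new call like this (carveout_base, carveout_size and addr are
made-up names; error clean-up is omitted):

	struct gen_pool *pool;
	unsigned long addr;

	/* One bitmap bit per page; -1 lets allocations come from any node. */
	pool = gen_pool_create(PAGE_SHIFT, -1);
	if (!pool || gen_pool_add(pool, carveout_base, carveout_size, -1))
		return -ENOMEM;

	/* Allocate 1 MiB aligned to 1 MiB (alignment order 20). */
	addr = gen_pool_alloc_aligned(pool, 1 << 20, 20);
	if (!addr)
		return -ENOMEM;

	gen_pool_free(pool, addr, 1 << 20);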

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
Signed-off-by: Marek Szyprowski <[email protected]>
CC: Michal Nazarewicz <[email protected]>
---
include/linux/genalloc.h | 46 ++++++------
lib/genalloc.c | 182 ++++++++++++++++++++++++++-------------------
2 files changed, 129 insertions(+), 99 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 9869ef3..8ac7337 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -8,29 +8,31 @@
* Version 2. See the file COPYING for more details.
*/

+struct gen_pool;

-/*
- * General purpose special memory pool descriptor.
- */
-struct gen_pool {
- rwlock_t lock;
- struct list_head chunks; /* list of chunks in this pool */
- int min_alloc_order; /* minimum allocation order */
-};
+struct gen_pool *__must_check gen_pool_create(unsigned order, int nid);

-/*
- * General purpose special memory pool chunk descriptor.
+int __must_check gen_pool_add(struct gen_pool *pool, unsigned long addr,
+ size_t size, int nid);
+
+void gen_pool_destroy(struct gen_pool *pool);
+
+unsigned long __must_check
+gen_pool_alloc_aligned(struct gen_pool *pool, size_t size,
+ unsigned alignment_order);
+
+/**
+ * gen_pool_alloc() - allocate special memory from the pool
+ * @pool: Pool to allocate from.
+ * @size: Number of bytes to allocate from the pool.
+ *
+ * Allocate the requested number of bytes from the specified pool.
+ * Uses a first-fit algorithm.
*/
-struct gen_pool_chunk {
- spinlock_t lock;
- struct list_head next_chunk; /* next chunk in pool */
- unsigned long start_addr; /* starting address of memory chunk */
- unsigned long end_addr; /* ending address of memory chunk */
- unsigned long bits[0]; /* bitmap for allocating memory chunk */
-};
+static inline unsigned long __must_check
+gen_pool_alloc(struct gen_pool *pool, size_t size)
+{
+ return gen_pool_alloc_aligned(pool, size, 0);
+}

-extern struct gen_pool *gen_pool_create(int, int);
-extern int gen_pool_add(struct gen_pool *, unsigned long, size_t, int);
-extern void gen_pool_destroy(struct gen_pool *);
-extern unsigned long gen_pool_alloc(struct gen_pool *, size_t);
-extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size);
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 1923f14..0761079 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -16,53 +16,80 @@
#include <linux/genalloc.h>


+/* General purpose special memory pool descriptor. */
+struct gen_pool {
+ rwlock_t lock; /* protects chunks list */
+ struct list_head chunks; /* list of chunks in this pool */
+ unsigned order; /* minimum allocation order */
+};
+
+/* General purpose special memory pool chunk descriptor. */
+struct gen_pool_chunk {
+ spinlock_t lock; /* protects bits */
+ struct list_head next_chunk; /* next chunk in pool */
+ unsigned long start; /* start of memory chunk */
+ unsigned long size; /* number of bits */
+ unsigned long bits[0]; /* bitmap for allocating memory chunk */
+};
+
+
/**
- * gen_pool_create - create a new special memory pool
- * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
- * @nid: node id of the node the pool structure should be allocated on, or -1
+ * gen_pool_create() - create a new special memory pool
+ * @order: Log base 2 of number of bytes each bitmap bit
+ * represents.
+ * @nid: Node id of the node the pool structure should be allocated
+ * on, or -1. This will be also used for other allocations.
*
* Create a new special memory pool that can be used to manage special purpose
* memory not managed by the regular kmalloc/kfree interface.
*/
-struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
+struct gen_pool *__must_check gen_pool_create(unsigned order, int nid)
{
struct gen_pool *pool;

- pool = kmalloc_node(sizeof(struct gen_pool), GFP_KERNEL, nid);
- if (pool != NULL) {
+ if (WARN_ON(order >= BITS_PER_LONG))
+ return NULL;
+
+ pool = kmalloc_node(sizeof *pool, GFP_KERNEL, nid);
+ if (pool) {
rwlock_init(&pool->lock);
INIT_LIST_HEAD(&pool->chunks);
- pool->min_alloc_order = min_alloc_order;
+ pool->order = order;
}
return pool;
}
EXPORT_SYMBOL(gen_pool_create);

/**
- * gen_pool_add - add a new chunk of special memory to the pool
- * @pool: pool to add new memory chunk to
- * @addr: starting address of memory chunk to add to pool
- * @size: size in bytes of the memory chunk to add to pool
- * @nid: node id of the node the chunk structure and bitmap should be
- * allocated on, or -1
+ * gen_pool_add() - add a new chunk of special memory to the pool
+ * @pool: Pool to add new memory chunk to.
+ * @addr: Starting address of memory chunk to add to pool.
+ * @size: Size in bytes of the memory chunk to add to pool.
*
* Add a new chunk of special memory to the specified pool.
*/
-int gen_pool_add(struct gen_pool *pool, unsigned long addr, size_t size,
- int nid)
+int __must_check
+gen_pool_add(struct gen_pool *pool, unsigned long addr, size_t size, int nid)
{
struct gen_pool_chunk *chunk;
- int nbits = size >> pool->min_alloc_order;
- int nbytes = sizeof(struct gen_pool_chunk) +
- (nbits + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
+ size_t nbytes;
+
+ if (WARN_ON(!addr || addr + size < addr ||
+ (addr & ((1 << pool->order) - 1))))
+ return -EINVAL;

- chunk = kmalloc_node(nbytes, GFP_KERNEL | __GFP_ZERO, nid);
- if (unlikely(chunk == NULL))
- return -1;
+ size = size >> pool->order;
+ if (WARN_ON(!size))
+ return -EINVAL;
+
+ nbytes = sizeof *chunk + BITS_TO_LONGS(size) * sizeof *chunk->bits;
+ chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
+ if (!chunk)
+ return -ENOMEM;

spin_lock_init(&chunk->lock);
- chunk->start_addr = addr;
- chunk->end_addr = addr + size;
+ chunk->start = addr >> pool->order;
+ chunk->size = size;

write_lock(&pool->lock);
list_add(&chunk->next_chunk, &pool->chunks);
@@ -73,115 +100,116 @@ int gen_pool_add(struct gen_pool *pool, unsigned long addr, size_t size,
EXPORT_SYMBOL(gen_pool_add);

/**
- * gen_pool_destroy - destroy a special memory pool
- * @pool: pool to destroy
+ * gen_pool_destroy() - destroy a special memory pool
+ * @pool: Pool to destroy.
*
* Destroy the specified special memory pool. Verifies that there are no
* outstanding allocations.
*/
void gen_pool_destroy(struct gen_pool *pool)
{
- struct list_head *_chunk, *_next_chunk;
struct gen_pool_chunk *chunk;
- int order = pool->min_alloc_order;
- int bit, end_bit;
-
+ int bit;

- list_for_each_safe(_chunk, _next_chunk, &pool->chunks) {
- chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
+ while (!list_empty(&pool->chunks)) {
+ chunk = list_entry(pool->chunks.next, struct gen_pool_chunk,
+ next_chunk);
list_del(&chunk->next_chunk);

- end_bit = (chunk->end_addr - chunk->start_addr) >> order;
- bit = find_next_bit(chunk->bits, end_bit, 0);
- BUG_ON(bit < end_bit);
+ bit = find_next_bit(chunk->bits, chunk->size, 0);
+ BUG_ON(bit < chunk->size);

kfree(chunk);
}
kfree(pool);
- return;
}
EXPORT_SYMBOL(gen_pool_destroy);

/**
- * gen_pool_alloc - allocate special memory from the pool
- * @pool: pool to allocate from
- * @size: number of bytes to allocate from the pool
+ * gen_pool_alloc_aligned() - allocate special memory from the pool
+ * @pool: Pool to allocate from.
+ * @size: Number of bytes to allocate from the pool.
+ * @alignment_order: Order the allocated space should be
+ * aligned to (eg. 20 means allocated space
+ * must be aligned to 1MiB).
*
* Allocate the requested number of bytes from the specified pool.
* Uses a first-fit algorithm.
*/
-unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
+unsigned long __must_check
+gen_pool_alloc_aligned(struct gen_pool *pool, size_t size,
+ unsigned alignment_order)
{
- struct list_head *_chunk;
+ unsigned long addr, align_mask = 0, flags, start;
struct gen_pool_chunk *chunk;
- unsigned long addr, flags;
- int order = pool->min_alloc_order;
- int nbits, start_bit, end_bit;

if (size == 0)
return 0;

- nbits = (size + (1UL << order) - 1) >> order;
+ if (alignment_order > pool->order)
+ align_mask = (1 << (alignment_order - pool->order)) - 1;

- read_lock(&pool->lock);
- list_for_each(_chunk, &pool->chunks) {
- chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
+ size = (size + (1UL << pool->order) - 1) >> pool->order;

- end_bit = (chunk->end_addr - chunk->start_addr) >> order;
+ read_lock(&pool->lock);
+ list_for_each_entry(chunk, &pool->chunks, next_chunk) {
+ if (chunk->size < size)
+ continue;

spin_lock_irqsave(&chunk->lock, flags);
- start_bit = bitmap_find_next_zero_area(chunk->bits, end_bit, 0,
- nbits, 0);
- if (start_bit >= end_bit) {
+ start = bitmap_find_next_zero_area_off(chunk->bits, chunk->size,
+ 0, size, align_mask,
+ chunk->start);
+ if (start >= chunk->size) {
spin_unlock_irqrestore(&chunk->lock, flags);
continue;
}

- addr = chunk->start_addr + ((unsigned long)start_bit << order);
-
- bitmap_set(chunk->bits, start_bit, nbits);
+ bitmap_set(chunk->bits, start, size);
spin_unlock_irqrestore(&chunk->lock, flags);
- read_unlock(&pool->lock);
- return addr;
+ addr = (chunk->start + start) << pool->order;
+ goto done;
}
+
+ addr = 0;
+done:
read_unlock(&pool->lock);
- return 0;
+ return addr;
}
-EXPORT_SYMBOL(gen_pool_alloc);
+EXPORT_SYMBOL(gen_pool_alloc_aligned);

/**
- * gen_pool_free - free allocated special memory back to the pool
- * @pool: pool to free to
- * @addr: starting address of memory to free back to pool
- * @size: size in bytes of memory to free
+ * gen_pool_free() - free allocated special memory back to the pool
+ * @pool: Pool to free to.
+ * @addr: Starting address of memory to free back to pool.
+ * @size: Size in bytes of memory to free.
*
* Free previously allocated special memory back to the specified pool.
*/
void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
{
- struct list_head *_chunk;
struct gen_pool_chunk *chunk;
unsigned long flags;
- int order = pool->min_alloc_order;
- int bit, nbits;

- nbits = (size + (1UL << order) - 1) >> order;
+ if (!size)
+ return;

- read_lock(&pool->lock);
- list_for_each(_chunk, &pool->chunks) {
- chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
+ addr = addr >> pool->order;
+ size = (size + (1UL << pool->order) - 1) >> pool->order;
+
+ BUG_ON(addr + size < addr);

- if (addr >= chunk->start_addr && addr < chunk->end_addr) {
- BUG_ON(addr + size > chunk->end_addr);
+ read_lock(&pool->lock);
+ list_for_each_entry(chunk, &pool->chunks, next_chunk)
+ if (addr >= chunk->start &&
+ addr + size <= chunk->start + chunk->size) {
spin_lock_irqsave(&chunk->lock, flags);
- bit = (addr - chunk->start_addr) >> order;
- while (nbits--)
- __clear_bit(bit++, chunk->bits);
+ bitmap_clear(chunk->bits, addr - chunk->start, size);
spin_unlock_irqrestore(&chunk->lock, flags);
- break;
+ goto done;
}
- }
- BUG_ON(nbits > 0);
+ BUG_ON(1);
+done:
read_unlock(&pool->lock);
}
EXPORT_SYMBOL(gen_pool_free);
--
1.7.1.569.g6f426

2011-03-31 13:18:55

by Marek Szyprowski

[permalink] [raw]
Subject: [PATCH 01/12] lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()

From: Michal Nazarewicz <[email protected]>

This commit adds a bitmap_find_next_zero_area_off() function which
works like the bitmap_find_next_zero_area() function except that it
allows an offset to be specified when alignment is checked. This lets
the caller request a bit such that its number plus the offset is aligned
according to the mask.
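
For illustration, with made-up values: to find 8 clear bits whose bit
number plus an offset of 2 is a multiple of 4 (align_mask 3,
align_offset 2):

	DECLARE_BITMAP(map, 128);
	unsigned long pos;

	bitmap_zero(map, 128);
	pos = bitmap_find_next_zero_area_off(map, 128, 0, 8, 3, 2);
	/* (pos + 2) & 3 == 0 holds; with an empty map pos comes out as 2. */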

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
Signed-off-by: Marek Szyprowski <[email protected]>
CC: Michal Nazarewicz <[email protected]>
---
include/linux/bitmap.h | 24 +++++++++++++++++++-----
lib/bitmap.c | 22 ++++++++++++----------
2 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index daf8c48..c0528d1 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -45,6 +45,7 @@
* bitmap_set(dst, pos, nbits) Set specified bit area
* bitmap_clear(dst, pos, nbits) Clear specified bit area
* bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area
+ * bitmap_find_next_zero_area_off(buf, len, pos, n, mask, off) as above, with offset
* bitmap_shift_right(dst, src, n, nbits) *dst = *src >> n
* bitmap_shift_left(dst, src, n, nbits) *dst = *src << n
* bitmap_remap(dst, src, old, new, nbits) *dst = map(old, new)(src)
@@ -113,11 +114,24 @@ extern int __bitmap_weight(const unsigned long *bitmap, int bits);

extern void bitmap_set(unsigned long *map, int i, int len);
extern void bitmap_clear(unsigned long *map, int start, int nr);
-extern unsigned long bitmap_find_next_zero_area(unsigned long *map,
- unsigned long size,
- unsigned long start,
- unsigned int nr,
- unsigned long align_mask);
+
+extern unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
+ unsigned long size,
+ unsigned long start,
+ unsigned int nr,
+ unsigned long align_mask,
+ unsigned long align_offset);
+
+static inline unsigned long
+bitmap_find_next_zero_area(unsigned long *map,
+ unsigned long size,
+ unsigned long start,
+ unsigned int nr,
+ unsigned long align_mask)
+{
+ return bitmap_find_next_zero_area_off(map, size, start, nr,
+ align_mask, 0);
+}

extern int bitmap_scnprintf(char *buf, unsigned int len,
const unsigned long *src, int nbits);
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 741fae9..8e75a6f 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -315,30 +315,32 @@ void bitmap_clear(unsigned long *map, int start, int nr)
}
EXPORT_SYMBOL(bitmap_clear);

-/*
+/**
* bitmap_find_next_zero_area - find a contiguous aligned zero area
* @map: The address to base the search on
* @size: The bitmap size in bits
* @start: The bitnumber to start searching at
* @nr: The number of zeroed bits we're looking for
* @align_mask: Alignment mask for zero area
+ * @align_offset: Alignment offset for zero area.
*
* The @align_mask should be one less than a power of 2; the effect is that
- * the bit offset of all zero areas this function finds is multiples of that
- * power of 2. A @align_mask of 0 means no alignment is required.
+ * the bit offset of all zero areas this function finds plus @align_offset
+ * is a multiple of that power of 2.
*/
-unsigned long bitmap_find_next_zero_area(unsigned long *map,
- unsigned long size,
- unsigned long start,
- unsigned int nr,
- unsigned long align_mask)
+unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
+ unsigned long size,
+ unsigned long start,
+ unsigned int nr,
+ unsigned long align_mask,
+ unsigned long align_offset)
{
unsigned long index, end, i;
again:
index = find_next_zero_bit(map, size, start);

/* Align allocation */
- index = __ALIGN_MASK(index, align_mask);
+ index = __ALIGN_MASK(index + align_offset, align_mask) - align_offset;

end = index + nr;
if (end > size)
@@ -350,7 +352,7 @@ again:
}
return index;
}
-EXPORT_SYMBOL(bitmap_find_next_zero_area);
+EXPORT_SYMBOL(bitmap_find_next_zero_area_off);

/*
* Bitmap printing & parsing functions: first version by Bill Irwin,
--
1.7.1.569.g6f426

2011-03-31 15:58:15

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added

On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote:
>
> +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
> + gfp_t flag)
> +{
> + unsigned long pfn = start, count;
> + struct page *page;
> + struct zone *zone;
> + int order;
> +
> + VM_BUG_ON(!pfn_valid(start));

This seems kinda mean. Could we return an error? I understand that
this is largely going to be an early-boot thing, but surely trying to
punt on crappy input beats a full-on BUG().

if (!pfn_valid(start))
return -1;

> + zone = page_zone(pfn_to_page(start));
> +
> + spin_lock_irq(&zone->lock);
> +
> + page = pfn_to_page(pfn);
> + for (;;) {
> + VM_BUG_ON(page_count(page) || !PageBuddy(page));
> + list_del(&page->lru);
> + order = page_order(page);
> + zone->free_area[order].nr_free--;
> + rmv_page_order(page);
> + __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
> + pfn += 1 << order;
> + if (pfn >= end)
> + break;

If start->end happens to span the end of a zone, I believe this will
jump out of the zone. It will still be pfn_valid(), but potentially not
in the same zone.

> + VM_BUG_ON(!pfn_valid(pfn));
> + page += 1 << order;
> + }

That will break on SPARSEMEM. You potentially need to revalidate the
pfn->page mapping on every MAX_ORDER pfn change. It's easiest to just
do pfn_to_page() on each loop iteration.
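
Something along these lines (completely untested) is what I have in
mind:

	for (;;) {
		/* re-derive the struct page from the pfn each time so
		 * SPARSEMEM section boundaries are handled for us */
		page = pfn_to_page(pfn);
		VM_BUG_ON(page_count(page) || !PageBuddy(page));
		list_del(&page->lru);
		order = page_order(page);
		zone->free_area[order].nr_free--;
		rmv_page_order(page);
		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
		pfn += 1 << order;
		if (pfn >= end)
			break;
		VM_BUG_ON(!pfn_valid(pfn));
	}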

> +
> + spin_unlock_irq(&zone->lock);
>
> +void free_contig_pages(struct page *page, int nr_pages)
> +{
> + for (; nr_pages; --nr_pages, ++page)
> + __free_page(page);
> +}

Can't help but notice that this resembles a bit of a patch I posted last
week:

http://www.spinics.net/lists/linux-mm/msg16364.html

We'll have to make sure we only have one copy of this in the end.

-- Dave

2011-03-31 16:02:56

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 05/12] mm: alloc_contig_range() added

On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote:
> + ret = 0;
> + while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
> + if (WARN_ON(++ret >= MAX_ORDER))
> + return -EINVAL;

Holy cow, that's dense. Is there really no more straightforward way to
do that?

In any case, please pull the ++ret bit out of the WARN_ON(). Some
people like to do:

#define WARN_ON(...) do{}while(0)

to save space on some systems.
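
I.e. keep the increment outside of the macro, something like (untested):

	ret = 0;
	while (!PageBuddy(pfn_to_page(start & (~0UL << ret)))) {
		++ret;
		if (WARN_ON(ret >= MAX_ORDER))
			return -EINVAL;
	}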

-- Dave

2011-03-31 16:26:52

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCH 05/12] mm: alloc_contig_range() added

> On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote:
>> + ret = 0;
>> + while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
>> + if (WARN_ON(++ret >= MAX_ORDER))
>> + return -EINVAL;

On Thu, 31 Mar 2011 18:02:41 +0200, Dave Hansen wrote:
> Holy cow, that's dense. Is there really no more straightforward way to
> do that?

Which part exactly is dense? What would qualify as a more
straightforward way?

> In any case, please pull the ++ret bit out of the WARN_ON(). Some
> people like to do:
>
> #define WARN_ON(...) do{}while(0)
>
> to save space on some systems.

I don't think that's the case. Even if WARN_ON() decides not to print
a warning, it will still return the value of the argument. If not,
a lot of code will break.

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +-----<email/xmpp: [email protected]>-----ooO--(_)--Ooo--

2011-03-31 16:27:19

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 05/12] mm: alloc_contig_range() added

On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote:
> + _start = start & (~0UL << ret);
> + _end = alloc_contig_freed_pages(_start, end, flag);

These names are a wee bit lacking. Care to give them proper names that
might let a reader figure out how the "_" makes the variable different
from its nearly-identical twin?

-- Dave

2011-03-31 19:24:36

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added

On Thu, Mar 31, 2011 at 08:58:03AM -0700, Dave Hansen wrote:
> On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote:
> >
> > +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
> > + gfp_t flag)
> > +{
> > + unsigned long pfn = start, count;
> > + struct page *page;
> > + struct zone *zone;
> > + int order;
> > +
> > + VM_BUG_ON(!pfn_valid(start));
>
> This seems kinda mean. Could we return an error? I understand that
> this is largely going to be an early-boot thing, but surely trying to
> punt on crappy input beats a full-on BUG().
>
> if (!pfn_valid(start))
> return -1;

But still keep the warning?

if (WARN_ON(!pfn_valid(start)))
return -1;

-- Steve

2011-03-31 19:26:55

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 05/12] mm: alloc_contig_range() added

On Thu, Mar 31, 2011 at 09:02:41AM -0700, Dave Hansen wrote:
> On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote:
> > + ret = 0;
> > + while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
> > + if (WARN_ON(++ret >= MAX_ORDER))
> > + return -EINVAL;
>
> Holy cow, that's dense. Is there really no more straightforward way to
> do that?
>
> In any case, please pull the ++ret bit out of the WARN_ON(). Some
> people like to do:
>
> #define WARN_ON(...) do{}while(0)
>
> to save space on some systems.

That should be fixed, as the if (WARN_ON()) has become a standard in
most of the kernel. Removing WARN_ON() should be:

#define WARN_ON(x) ({0;})

But I agree that there should be no "side effects" inside a WARN_ON(),
of which that "++ret" is definitely one.

-- Steve

2011-03-31 19:28:27

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 05/12] mm: alloc_contig_range() added

On Thu, Mar 31, 2011 at 06:26:45PM +0200, Michal Nazarewicz wrote:
> >In any case, please pull the ++ret bit out of the WARN_ON(). Some
> >people like to do:
> >
> >#define WARN_ON(...) do{}while(0)
> >
> >to save space on some systems.
>
> I don't think that's the case. Even if WARN_ON() decides not to print
> a warning, it will still return the value of the argument. If not,
> a lot of code will break.
>

WARN_ON() should never do anything but test. That ret++ does not belong
inside the WARN_ON() condition. If there are other locations in the
kernel that do that, then those locations need to be fixed.

-- Steve

2011-03-31 19:52:33

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCH 05/12] mm: alloc_contig_range() added

On Thu, 31 Mar 2011 21:28:21 +0200, Steven Rostedt wrote:
> WARN_ON() should never do anything but test. That ret++ does not belong
> inside the WARN_ON() condition. If there are other locations in the
> kernel that do that, then those locations need to be fixed.

Testing implies evaluating, so if we allow:

if (++i == end) { /* ... */ }

I see no reason why not to allow:

if (WARN_ON(++i == end)) { /* ... */ }

In both cases the condition is tested.

>> On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote:
>>> + ret = 0;
>>> + while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
>>> + if (WARN_ON(++ret >= MAX_ORDER))
>>> + return -EINVAL;

> On Thu, Mar 31, 2011 at 09:02:41AM -0700, Dave Hansen wrote:
>> In any case, please pull the ++ret bit out of the WARN_ON(). Some
>> people like to do:
>>
>> #define WARN_ON(...) do{}while(0)
>>
>> to save space on some systems.

On Thu, 31 Mar 2011 21:26:50 +0200, Steven Rostedt wrote:
> That should be fixed, as the if (WARN_ON()) has become a standard in
> most of the kernel. Removing WARN_ON() should be:
>
> #define WARN_ON(x) ({0;})

This would break a lot of code which expects the test to take place.
Also see <http://lxr.linux.no/linux+*/include/asm-generic/bug.h#L108>.

> But I agree that there should be no "side effects" inside a WARN_ON(),
> of which that "++ret" is definitely one.

Thus I don't really agree with this point.

At any rate, I don't really care.

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +-----<email/xmpp: [email protected]>-----ooO--(_)--Ooo--

2011-03-31 20:28:53

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 05/12] mm: alloc_contig_range() added

On Thu, 2011-03-31 at 18:26 +0200, Michal Nazarewicz wrote:
> > On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote:
> >> + ret = 0;
> >> + while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
> >> + if (WARN_ON(++ret >= MAX_ORDER))
> >> + return -EINVAL;
>
> On Thu, 31 Mar 2011 18:02:41 +0200, Dave Hansen wrote:
> > Holy cow, that's dense. Is there really no more straightforward way to
> > do that?
>
> Which part exactly is dense? What would qualify as a more
> straightforward way?

I'm still not 100% sure what it's trying to do. It looks like it
attempts to check all of "start"'s buddy pages.

unsigned long find_buddy(unsigned long pfn, int order)
{
	unsigned long page_idx = pfn & ((1 << MAX_ORDER) - 1); // You had a macro for this I think
	unsigned long buddy_idx = __find_buddy_index(page_idx, order);
	return page_idx + buddy_idx;
}

Is something like this equivalent?

	int order;
	for (order = 0; order <= MAX_ORDER; order++) {
		unsigned long buddy_pfn = find_buddy(start, order);
		struct page *buddy = pfn_to_page(buddy_pfn);
		if (PageBuddy(buddy))
			break;
		WARN();
		return -EINVAL;
	}

I'm wondering also if you can share some code with __rmqueue().

> > In any case, please pull the ++ret bit out of the WARN_ON(). Some
> > people like to do:
> >
> > #define WARN_ON(...) do{}while(0)
> >
> > to save space on some systems.
>
> I don't think that's the case. Even if WARN_ON() decides not to print
> a warning, it will still return the value of the argument. If not,
> a lot of code will break.

Bah, sorry. I'm confusing WARN_ON() and WARN().

-- Dave

2011-03-31 20:33:48

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added

On Thu, 2011-03-31 at 15:24 -0400, Steven Rostedt wrote:
> On Thu, Mar 31, 2011 at 08:58:03AM -0700, Dave Hansen wrote:
> > On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote:
> > >
> > > +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
> > > + gfp_t flag)
> > > +{
> > > + unsigned long pfn = start, count;
> > > + struct page *page;
> > > + struct zone *zone;
> > > + int order;
> > > +
> > > + VM_BUG_ON(!pfn_valid(start));
> >
> > This seems kinda mean. Could we return an error? I understand that
> > this is largely going to be an early-boot thing, but surely trying to
> > punt on crappy input beats a full-on BUG().
> >
> > if (!pfn_valid(start))
> > return -1;
>
> But still keep the warning?
>
> if (WARN_ON(!pfn_valid(start)))
> return -1;

Sure. You might also want to make sure you're respecting __GFP_NOWARN
if you're going to do that, or maybe just warn once per boot.
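
Something like this, perhaps (untested, and the message text is just an
example):

	if (!pfn_valid(start)) {
		if (!(flag & __GFP_NOWARN))
			WARN_ONCE(1, "invalid start pfn %lu\n", start);
		return -1;
	}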


-- Dave

2011-03-31 21:09:48

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added

On Thu, 31 Mar 2011 17:58:03 +0200, Dave Hansen <[email protected]>
wrote:

> On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote:
>>
>> +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
>> + gfp_t flag)
>> +{
>> + unsigned long pfn = start, count;
>> + struct page *page;
>> + struct zone *zone;
>> + int order;
>> +
>> + VM_BUG_ON(!pfn_valid(start));
>
> This seems kinda mean. Could we return an error? I understand that
> this is largely going to be an early-boot thing, but surely trying to
> punt on crappy input beats a full-on BUG().

Actually, I would have to check but I think that the usage of this
function (in this patchset) is that the caller expects the function to
succeed. It is quite a low-level function so before running it a lot of
preparation is needed and the caller must make sure that several
conditions are met. I don't really see the advantage of returning a
value rather than BUG()ing.

Also, CMA does not call this function at boot time.

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +-----<email/xmpp: [email protected]>-----ooO--(_)--Ooo--

2011-03-31 21:14:46

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added

On Thu, 2011-03-31 at 23:09 +0200, Michal Nazarewicz wrote:
> On Thu, 31 Mar 2011 17:58:03 +0200, Dave Hansen <[email protected]>
> wrote:
> > On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote:
> >> +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
> >> + gfp_t flag)
> >> +{
> >> + unsigned long pfn = start, count;
> >> + struct page *page;
> >> + struct zone *zone;
> >> + int order;
> >> +
> >> + VM_BUG_ON(!pfn_valid(start));
> >
> > This seems kinda mean. Could we return an error? I understand that
> > this is largely going to be an early-boot thing, but surely trying to
> > punt on crappy input beats a full-on BUG().
>
> Actually, I would have to check but I think that the usage of this function
> (in this patchset) is that the caller expects the function to succeed. It
> is quite a low-level function so before running it a lot of preparation is
> needed and the caller must make sure that several conditions are met. I don't
> really see the advantage of returning a value rather than BUG()ing.
>
> Also, CMA does not call this function at boot time.

We BUG_ON() in bootmem. Basically if we try to allocate an early-boot
structure and fail, we're screwed. We can't keep running without an
inode hash, or a mem_map[].

This looks like it's going to at least get partially used in drivers, at
least from the examples. Are these the kinds of things where, if the driver
fails to load, the system is useless and hosed? Or, is it
something where we might limp along to figure out what went wrong before
we reboot?

-- Dave

2011-03-31 21:17:57

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCH 05/12] mm: alloc_contig_range() added

On Thu, 31 Mar 2011 22:28:42 +0200, Dave Hansen <[email protected]>
wrote:

> On Thu, 2011-03-31 at 18:26 +0200, Michal Nazarewicz wrote:
>> > On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote:
>> >> + ret = 0;
>> >> + while (!PageBuddy(pfn_to_page(start & (~0UL << ret))))
>> >> + if (WARN_ON(++ret >= MAX_ORDER))
>> >> + return -EINVAL;
>>
>> On Thu, 31 Mar 2011 18:02:41 +0200, Dave Hansen wrote:
>> > Holy cow, that's dense. Is there really no more straightforward way to
>> > do that?
>>
>> Which part exactly is dense? What would qualify as a more
>> straightforward way?
>
> I'm still not 100% sure what it's trying to do. It looks like it
> attempts to check all of "start"'s buddy pages.

No. I'm going up through parents. This is because even though start
falls in a free block (ie. one that the page allocator tracks), the
actual page that is in the buddy system is larger than start, and this
loop looks for the beginning of that page.

> int order;
> for (order = 0; order <= MAX_ORDER; order++) {
> unsigned long buddy_pfn = find_buddy(start, order);
> struct page *buddy = pfn_to_page(buddy_pfn);
> if (PageBuddy(buddy)
> break;
> WARN();
> return -EINVAL;
> }

The WARN() and return would have to be outside of the loop and, as I
described, instead of find_buddy() something like find_parent() would
have to be used.

> I'm wondering also if you can share some code with __rmqueue().

Doubtful since start does not (have to) point to a page that is tracked
by the page allocator but to a page inside such a page.

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +-----<email/xmpp: [email protected]>-----ooO--(_)--Ooo--

2011-03-31 22:18:16

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added

> On Thu, 2011-03-31 at 15:16 +0200, Marek Szyprowski wrote:
>> +unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
>> + gfp_t flag)
>> +{
>> + unsigned long pfn = start, count;
>> + struct page *page;
>> + struct zone *zone;
>> + int order;
>> +
>> + VM_BUG_ON(!pfn_valid(start));

On Thu, 31 Mar 2011 23:14:38 +0200, Dave Hansen wrote:
> We BUG_ON() in bootmem. Basically if we try to allocate an early-boot
> structure and fail, we're screwed. We can't keep running without an
> inode hash, or a mem_map[].
>
> This looks like it's going to at least get partially used in drivers, at
> least from the examples. Are these the kinds of things where, if the driver
> fails to load, the system is useless and hosed? Or, is it
> something where we might limp along to figure out what went wrong before
> we reboot?

Bug in the above place does not mean that we could not allocate memory. It
means caller is broken.

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +-----<email/xmpp: [email protected]>-----ooO--(_)--Ooo--

2011-03-31 22:27:06

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added

On Fri, 2011-04-01 at 00:18 +0200, Michal Nazarewicz wrote:
> On Thu, 31 Mar 2011 23:14:38 +0200, Dave Hansen wrote:
> > We BUG_ON() in bootmem. Basically if we try to allocate an early-boot
> > structure and fail, we're screwed. We can't keep running without an
> > inode hash, or a mem_map[].
> >
> > This looks like it's going to at least get partially used in drivers, at
> > least from the examples. Are these kinds of things that, if the driver
> > fails to load, that the system is useless and hosed? Or, is it
> > something where we might limp along to figure out what went wrong before
> > we reboot?
>
> Bug in the above place does not mean that we could not allocate memory. It
> means caller is broken.

Could you explain that a bit?

Is this a case where a device is mapped to a very *specific* range of
physical memory and nowhere else? What are the reasons for not marking
it off limits at boot? I also saw some bits of isolation and migration
in those patches. Can't the migration fail?

-- Dave

2011-03-31 22:51:38

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added

On Fri, 01 Apr 2011 00:26:51 +0200, Dave Hansen <[email protected]>
wrote:

> On Fri, 2011-04-01 at 00:18 +0200, Michal Nazarewicz wrote:
>> On Thu, 31 Mar 2011 23:14:38 +0200, Dave Hansen wrote:
>> > We BUG_ON() in bootmem. Basically if we try to allocate an early-boot
>> > structure and fail, we're screwed. We can't keep running without an
>> > inode hash, or a mem_map[].
>> >
>> > This looks like it's going to at least get partially used in drivers, at
>> > least from the examples. Are these the kinds of things where, if the
>> > driver fails to load, the system is useless and hosed? Or, is it
>> > something where we might limp along to figure out what went wrong before
>> > we reboot?
>>
>> Bug in the above place does not mean that we could not allocate
>> memory. It means caller is broken.
>
> Could you explain that a bit?
>
> Is this a case where a device is mapped to a very *specific* range of
> physical memory and nowhere else? What are the reasons for not marking
> it off limits at boot? I also saw some bits of isolation and migration
> in those patches. Can't the migration fail?

The function is called from alloc_contig_range() (see patch 05/12) which
makes sure that the PFN is valid. Situation where there is not enough
space is caught earlier in alloc_contig_range().

alloc_contig_freed_pages() must be given a valid PFN range such that all
the pages in that range are free (as in are within the region tracked by
page allocator) and of MIGRATE_ISOLATE so that page allocator won't
touch them.

That's why invalid PFN is a bug in the caller and not an exception that
has to be handled.

Also, the function is not called during boot time. It is called while
system is already running.

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +-----<email/xmpp: [email protected]>-----ooO--(_)--Ooo--

2011-04-01 14:03:46

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added

On Fri, 2011-04-01 at 00:51 +0200, Michal Nazarewicz wrote:
> On Fri, 01 Apr 2011 00:26:51 +0200, Dave Hansen <[email protected]>
> wrote:
> >> Bug in the above place does not mean that we could not allocate
> >> memory. It means caller is broken.
> >
> > Could you explain that a bit?
> >
> > Is this a case where a device is mapped to a very *specific* range of
> > physical memory and no where else? What are the reasons for not marking
> > it off limits at boot? I also saw some bits of isolation and migration
> > in those patches. Can't the migration fail?
>
> The function is called from alloc_contig_range() (see patch 05/12) which
> makes sure that the PFN is valid. Situation where there is not enough
> space is caught earlier in alloc_contig_range().
>
> alloc_contig_freed_pages() must be given a valid PFN range such that all
> the pages in that range are free (as in are within the region tracked by
> page allocator) and of MIGRATE_ISOLATE so that page allocator won't
> touch them.

OK, so it really is a low-level function only. How about a comment that
explicitly says this? "Only called from $FOO with the area already
isolated." It probably also deserves an __ prefix.

> That's why invalid PFN is a bug in the caller and not an exception that
> has to be handled.
>
> Also, the function is not called during boot time. It is called while
> system is already running.

What kind of success have you had running this in practice? I'd be
worried that some silly task or a sticky dentry would end up in the
range that you want to allocate in.


-- Dave

2011-04-04 13:15:14

by Michal Nazarewicz

[permalink] [raw]
Subject: Re: [PATCH 04/12] mm: alloc_contig_freed_pages() added

> On Fri, 2011-04-01 at 00:51 +0200, Michal Nazarewicz wrote:
>> The function is called from alloc_contig_range() (see patch 05/12) which
>> makes sure that the PFN is valid. Situation where there is not enough
>> space is caught earlier in alloc_contig_range().
>>
>> alloc_contig_freed_pages() must be given a valid PFN range such that all
>> the pages in that range are free (as in are within the region tracked by
>> page allocator) and of MIGRATE_ISOLATE so that page allocator won't
>> touch them.

On Fri, 01 Apr 2011 16:03:16 +0200, Dave Hansen wrote:
> OK, so it really is a low-level function only. How about a comment that
> explicitly says this? "Only called from $FOO with the area already
> isolated." It probably also deserves an __ prefix.

Yes, it's not really for general use. Comment may indeed be useful here.

>> That's why invalid PFN is a bug in the caller and not an exception that
>> has to be handled.
>>
>> Also, the function is not called during boot time. It is called while
>> system is already running.

> What kind of success have you had running this in practice? I'd be
> worried that some silly task or a sticky dentry would end up in the
> range that you want to allocate in.

I'm not sure what you are asking.

The function requires the range to be marked as MIGRATE_ISOLATE and all
pages being free, so nothing can be allocated there while the function
is running.

If you are asking about CMA in general, the range that CMA uses is marked
as MIGRATE_CMA (a new migrate type) which means that only MIGRATE_MOVABLE
pages can be allocated there. This means, that in theory, if there is
enough memory the pages can always be moved out of the region. At least
that's my understanding of the type. If this is correct, the allocation
should always succeed provided enough memory for the pages within the
region to be moved to is available.

As for practice, I have run some simple tests to see if the code works
and they succeeded. Also, Marek has run some tests with actual hardware
and those worked well too (but I'll let Marek talk about any details).

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +-----<email/xmpp: [email protected]>-----ooO--(_)--Ooo--

2011-04-05 07:24:49

by Marek Szyprowski

[permalink] [raw]
Subject: RE: [PATCH 04/12] mm: alloc_contig_freed_pages() added

Hello,

On Monday, April 04, 2011 3:15 PM Michał Nazarewicz wrote:

> > What kind of success have you had running this in practice? I'd be
> > worried that some silly task or a sticky dentry would end up in the
> > range that you want to allocate in.
>
> I'm not sure what you are asking.
>
> The function requires the range to be marked as MIGRATE_ISOLATE and all
> pages being free, so nothing can be allocated there while the function
> is running.
>
> If you are asking about CMA in general, the range that CMA uses is marked
> as MIGRATE_CMA (a new migrate type) which means that only MIGRATE_MOVABLE
> pages can be allocated there. This means, that in theory, if there is
> enough memory the pages can always be moved out of the region. At least
> that's my understanding of the type. If this is correct, the allocation
> should always succeed provided enough memory for the pages within the
> region to be moved to is available.
>
> As for practice, I have run some simple tests to see if the code works
> and they succeeded. Also, Marek has run some tests with actual hardware
> and those worked well too (but I'll let Marek talk about any details).

We did the tests with real multimedia drivers - video codec and video
converter (s5p-mfc and s5p-fimc). These drivers allocate large contiguous
buffers for video data. The allocation is performed when the driver is
opened by a user space application.

First we consumed system memory by running a set of simple applications
that just did some malloc() and filled memory with a random pattern to
consume free pages. Then some of that memory was freed and we ran the
video decoding application. The multimedia drivers successfully managed
to allocate the required contiguous buffers from MIGRATE_CMA ranges.

The tests have been performed with different system usage patterns (malloc(),
heavy filesystem load, anonymous memory mapping). In all these cases CMA
worked surprisingly well, allowing the drivers to allocate the required
contiguous buffers.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center